PHP preg_match: matching a class and getting the content after it

$str = '<div class="rss"><img src="http://www.wired.com/images_blogs/gadgetlab/2013/10/1125_hbogo_660-660x436.jpg" alt="You Can Now Get HBO GO Without Paying for Other Channels">
</div>Fans of';
I'm trying to get hold of the text after the <div class="rss"></div> but each expression I use doesn't seem to work.
matching .rss
if(preg_match('/^(<div class=\"rss\">[\S\s]+?</div>([\S\s]*)$/i', $item_content, $matches)) {
Could someone please help with this expression?
Originally I had this expression to match an image tag instead of a div and this worked fine by using
if(preg_match('/^(<img[\S\s]+?>)([\S\s]*)$/i', $item_content, $matches)) {

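I didn't look deeply at the regex, but yours works once two syntax problems are fixed: the opening group was never closed, and the slash in </div> was not escaped even though / is the delimiter.
It should be:
/^<div class=\"rss\">[\S\s]+?<\/div>([\S\s]*)$/i
With this version the text after the div ends up in $matches[1].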

This may help:
<?php
$item_content = '<div class="rss"><img src="http://www.wired.com/images_blogs/gadgetlab/2013/10/1125_hbogo_660-660x436.jpg" alt="You Can Now Get HBO GO Without Paying for Other Channels">
</div>Fans of';
if(preg_match('/^(<div class=\"rss\">[\S\s]+?<\/div>)([\S\s]*)$/i', $item_content, $matches)) {
$div = $matches[1];
$text = $matches[2];
echo "<textarea style=\"width: 600px; height: 300px;\">";
echo $div . "\n";
echo $text . "\n";
echo "</textarea>";
}
?>
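For this particular input a regex is not strictly needed. As a minimal sketch, assuming the rss block never contains a nested </div>, the string can be split once at the first closing tag. The name $parts is introduced here for illustration only:

```php
<?php
$item_content = '<div class="rss"><img src="http://www.wired.com/images_blogs/gadgetlab/2013/10/1125_hbogo_660-660x436.jpg" alt="You Can Now Get HBO GO Without Paying for Other Channels">
</div>Fans of';

// Split into at most two pieces at the first "</div>".
// Assumption: there is no nested </div> inside the rss block.
$parts = explode('</div>', $item_content, 2);

// If no closing tag was found, there is no trailing text to report.
$text = isset($parts[1]) ? $parts[1] : '';

echo $text, "\n";
```

This trades flexibility for simplicity: it does not check that the string actually starts with the rss div, which the anchored regex does.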

Related

BBCODE IMG TAG variations with REGEX

I need to convert BBCODE IMG TAG to HTML.
The problem is: the IMG TAG has multiple variations.
[img]img_patch[/img]
[img=200x150]img_patch[/img]
[img width=200 height=150]img_patch[/img]
[img=width=200xheight=150]img_patch[/img]
[img width=200]img_patch[/img]
The regexes below cover the first and the second variation.
'#\[img](.+)\[/img]#Usi',
'#\[img=?(\d+)?x?(\d+)?\](.*?)\[/img\]#Usi',
I need help with the other variations, or with combining all the variations into a single regex.
I really appreciate your help!
This should cover all cases:
<?php
$data = <<<DATA
[img]img_patch[/img]
[img=200x150]img_patch[/img]
[img width=200 height=150]img_patch[/img]
[img=width=200xheight=150]img_patch[/img]
[img width=200]img_patch[/img]
DATA;
$regex = '~
(?P<tag>\[img[^][]*\])
(?P<src>.+?)
\[/img]
~x';
$inner = '~\b(?P<key>width|height)?=(?P<value>[^\s\]]+)~';
$values = '~\d+~';
$data = preg_replace_callback($regex,
function($match) use($inner, $values) {
$attr = [];
preg_match_all($inner, $match['tag'], $attributes, PREG_SET_ORDER);
foreach($attributes as $attribute) {
if (!empty($attribute["key"])) $attr[$attribute["key"]] = $attribute["value"];
else {
preg_match_all($values, $attribute["value"], $width_height);
list($attr["width"], $attr["height"]) = array($width_height[0][0], $width_height[0][1]);
}
}
// do the actual replacement here
$attr["src"] = $match["src"];
$ret = "<img";
foreach ($attr as $key => $value) $ret .= " $key='$value'";
$ret .= '>';
return $ret;
},
$data);
echo $data;
?>
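And yields (attribute order as on PHP 5; on PHP 7 and later, list() assigns left to right, so width comes before height in the second and fourth lines)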
<img src='img_patch'>
<img height='150' width='200' src='img_patch'>
<img width='200' height='150' src='img_patch'>
<img height='150' width='200' src='img_patch'>
<img width='200' src='img_patch'>
The code uses a multi-step approach: it first matches all the tags, then analyzes their attributes, and finally builds the new string.
See a demo on ideone.com.
Note: As opposed to your (nickname) son, now you actually do know something, don't you?
Hey Jan, thank you very much for your help! Yes, I don't know everything, but I do know something. Actually, I have created the following regexes, which work fine and cover all the IMG tags:
'#\[img=(.+)\]#Usi',
'#\[img=+(\d+)x+(\d+)\](.+)\[/img\]#Usi',
'#\[img[\s|=]+[width=]+([0-9]+)?[\s|x]+[height=]+([0-9]+)\](.+)\[/img\]#Usi',
'#\[img[\s]+[width=]+([0-9]+)\](.+)\[/img\]#Usi',
I hope this post may help others on their own projects!

Regex not interworking with mysqli

I have the following php code:
//Usual mysqli stuff before here --> works
while($post = $result->fetch_assoc()) {
$post_text = (string)$post['post_text'];
$regex = "/(" . (string)$some_var . " - )(.*?)( [A-Z]{4})/";
}
echo $post_text . "<br />";
echo $regex . "<br />";
preg_match($regex, $post_text, $matches);
echo count($matches);
Unfortunately I don't get any results back, even though my regex seems to work with this tool here: http://www.phpliveregex.com/
I also tried to put the result manually in a string like this:
$my_string = "blablablablabla - proper stuff where the regex will find smth";
Using this string, the code works, but just not with the string I get back from the db. What am I doing wrong here?
Thx in advance.
I made it work myself. The problem was that the string contained newlines, which . does not match by default, so I now run the following replacement on the text received from the db:
preg_replace('/\s+/', ' ', $post_text)
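A minimal sketch of why the newline breaks the match, and two ways around it. The values of $some_var and $post_text are hypothetical stand-ins for the database row, and the trailing "ABCD" is added only so that the [A-Z]{4} part has something to match:

```php
<?php
// Hypothetical stand-ins for $some_var and a database row containing a line break.
$some_var  = 'blablablablabla';
$post_text = "blablablablabla - proper stuff\nwhere the regex will find smth ABCD";

// preg_quote keeps any special characters in $some_var from being read as regex syntax.
$regex = '/(' . preg_quote($some_var, '/') . ' - )(.*?)( [A-Z]{4})/';

// By default "." does not match a newline, so the lazy group cannot cross the line break.
$plain = preg_match($regex, $post_text, $m1);

// Option 1: collapse all whitespace runs (including newlines) into single spaces first.
$normalized     = preg_replace('/\s+/', ' ', $post_text);
$afterNormalize = preg_match($regex, $normalized, $m2);

// Option 2: leave the text untouched and add the "s" (dotall) modifier instead.
$dotall = preg_match($regex . 's', $post_text, $m3);

var_dump($plain, $afterNormalize, $dotall);
```

Option 1 changes the captured text (the newline becomes a space), while option 2 keeps the original line break inside the capture; which one is preferable depends on what is done with the match afterwards.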

Regex match full hyperlink only with certain class

I have a string that contains some hyperlinks. I want to match only one particular link with a regex. I can't know whether the href or the class comes first; it may vary.
This is an example string:
<div class='wp-pagenavi'>
<span class='pages'>Page 1 of 8</span><span class='current'>1</span>
<a href='http://stv.localhost/channel/political/page/2' class='page'>2</a>
»eee<span class='extend'>...</span><a href='http://stv.localhost/channel/political/page/8' class='last'>lastן »</a>
<a class="cccc">xxx</a>
</div>
I want to select from the above string only the link that has the class nextpostslink.
So, the match in this example should return this -
»eee
This regex is the closest I could get -
/<a\s?(href=)?('|")(.*)('|") class=('|")nextpostslink('|")>.{1,6}<\/a>/
But it is selecting the links from the start of the string.
I think my problem is in the (.*) , but I can't figure out how to change this to select only the needed link.
I would appreciate your help.
It's much better to use a genuine HTML parser for this. Abandon all attempts to use regular expressions on HTML.
Use PHP's DOMDocument instead:
$dom = new DOMDocument;
$dom->loadHTML($yourHTML);
foreach ($dom->getElementsByTagName('a') as $link) {
$classes = explode(' ', $link->getAttribute('class'));
if (in_array('nextpostslink', $classes)) {
// $link has the class "nextpostslink"
}
}
Not sure if that's what you're after, but anyway: it's a bad idea to parse HTML with regex. Use an XPath implementation to reach the desired elements. The following XPath expression gives you all the 'a' elements whose class attribute contains "nextpostslink":
//a[contains(@class,"nextpostslink")]
There is plenty of XPath information around. Since you didn't mention your programming language, here is a quick XPath tutorial using Java: http://www.ibm.com/developerworks/library/x-javaxpathapi/index.html
Edit:
php + xpath + html: http://dev.juokaz.com/php/web-scraping-with-php-and-xpath
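The XPath approach can be sketched in PHP with DOMDocument and DOMXPath. The sample markup below is hypothetical (the href values and link texts are assumptions for illustration), and the query is a slightly stricter variant that matches the class as a whole token rather than as a substring:

```php
<?php
// Hypothetical sample markup; href values and link texts are made up for illustration.
$html = <<<HTML
<div class='wp-pagenavi'>
<a href='http://stv.localhost/channel/political/page/2' class='page'>2</a>
<a class='nextpostslink' href='http://stv.localhost/channel/political/page/2'>Next</a>
<a href='#' class='not-nextpostslink'>no</a>
<a class="cccc">xxx</a>
</div>
HTML;

// Collect parser warnings internally instead of printing them for imperfect markup.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Pad the class list with spaces so only the exact token "nextpostslink" matches;
// a plain contains(@class, "nextpostslink") would also accept "not-nextpostslink".
$query = '//a[contains(concat(" ", normalize-space(@class), " "), " nextpostslink ")]';

$links = [];
foreach ($xpath->query($query) as $a) {
    $links[] = ['href' => $a->getAttribute('href'), 'text' => trim($a->textContent)];
}

print_r($links);
```

Because the parser handles attributes, the order of href and class and the quoting style no longer matter.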
This would work in php:
/<a[^>]+href=(\"|')([^\"']*)('|\")[^>]+class=(\"|')[^'\"]*nextpostslink[^'\"]*('|\")[^>]*>(.{1,6})<\/a>/m
This is of course assuming that the class attribute always comes after the href attribute.
This is a code snippet:
$html = <<<EOD
<div class='wp-pagenavi'>
<span class='pages'>Page 1 of 8</span><span class='current'>1</span>
<a href='http://stv.localhost/channel/political/page/2' class='page'>2</a>
»eee<span class='extend'>...</span><a href='http://stv.localhost/channel/political/page/8' class='last'>lastן »</a>
<a class="cccc">xxx</a>
</div>
EOD;
$regexp = "/<a[^>]+href=(\"|')([^\"']*)('|\")[^>]+class=(\"|')[^'\"]*nextpostslink[^'\"]*('|\")[^>]*>(.{1,6})<\/a>/m";
$matches = array();
if(preg_match($regexp, $html, $matches)) {
echo "URL: " . $matches[2] . "\n";
echo "Text: " . $matches[6] . "\n";
}
I would however suggest first matching the link and then getting the url so that the order of the attributes doesn't matter:
<?php
$html = <<<EOD
<div class='wp-pagenavi'>
<span class='pages'>Page 1 of 8</span><span class='current'>1</span>
<a href='http://stv.localhost/channel/political/page/2' class='page'>2</a>
»eee<span class='extend'>...</span><a href='http://stv.localhost/channel/political/page/8' class='last'>lastן »</a>
<a class="cccc">xxx</a>
</div>
EOD;
$regexp = "/(<a[^>]+class=(\"|')[^'\"]*nextpostslink[^'\"]*('|\")[^>]*>(.{1,6})<\/a>)/m";
$matches = array();
if(preg_match($regexp, $html, $matches)) {
$link = $matches[0];
$text = $matches[4];
$regexp = "/href=(\"|')([^'\"]*)(\"|')/";
$matches = array();
if(preg_match($regexp, $link, $matches)) {
$url = $matches[2];
echo "URL: $url\n";
echo "Text: $text\n";
}
}
You could of course extend the regexp to match either of the two variants (class first vs. href first), but it would be very long and I don't think it would improve performance.
Just as a proof of concept I created a regexp that doesn't care about the order:
/<a[^>]+(href=(\"|')([^\"']*)('|\")[^>]+class=(\"|')[^'\"]*nextpostslink[^'\"]*(\"|')|class=(\"|')[^'\"]*nextpostslink[^'\"]*(\"|')[^>]+href=(\"|')([^\"']*)('|\"))[^>]*>(.{1,6})<\/a>/m
The text will be in group 12 and the URL will be in either group 3 or group 10 depending on the order.
Since the question asks for a regex, here is one: <a\s[^>]*class=["|']nextpostslink["|'][^>]*>(.*)<\/a>
The order of the attributes doesn't matter, and it accepts both single and double quotes.
Check the regex online: https://regex101.com/r/DX03KD/1/
I replaced the (.*) with [^'"]+ as follows:
<a\s*(href=)?('|")[^'"]+('|") class=('|")nextpostslink('|")>.{1,6}</a>
Note: I tried this with RegexBuddy, so I didn't need to escape the <>'s or /

Split variable content into multiple paragraphs

Hey there,
I have this little php code:
<p class="category_text"><? echo $category_text; ?></p>
I want to split the $category_text and get something like this:
This is sentence 1 of category_text
This is sentence 2 of category_text
and so on...
$category_text has about 300 words and, let's say, 6 sentences. How could I split the text into multiple paragraphs (delimited by the full stop sign ".")?
Thank you very much!
echo '<p class="category_text">'
. implode('</p><p class="category_text">', explode('.',$string))
.'</p>';
You can just replace each "." with "." followed by a <br /> tag:
<p class="category_text"><? echo str_replace('.', '.<br />', $category_text); ?></p>
It's not a perfect solution! But if your text is simple enough, this little trick should work.
For example if you have a line with 3 dots:
$category_text = "Ok...";
It will show up like that:
Ok.
.
.
Also if your sentences finish by "?" or "!" you can also use that:
<p class="category_text"><? echo str_replace(array('.', '!', '?'), array('.<br />', '!<br />', '?<br />'), $category_text); ?></p>
PS: My solution will create a single paragraph (<p>) with multiple line breaks.
Try creating an array, and then output the lines one by one. A sentence ending in "..." would still be recognized, since it still ends in ". ".
$sentences = explode('. ', $category_text);
foreach($sentences as $val)
{
echo $val . ".<br /><br />";
}
You want to split a text into sentences, which is not trivial: using explode(".", $string) often does not give good results.
Search Stack Overflow for "php split sentence", or directly try the solution to PHP: Parse document / text into sentences:
http://www.zubrag.com/scripts/text-splitter.php
Once you have an array with sentences, use
echo '<p>' . implode('</p><p>', $sentences) . '</p>';
to echo them out.
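As a rough sketch of the splitting step itself, one option is preg_split with a lookbehind, so that the punctuation stays attached to its sentence. The sample text is hypothetical, and this is only an approximation: abbreviations such as "e.g. " would still be split incorrectly.

```php
<?php
// Hypothetical sample text standing in for $category_text.
$category_text = 'This is sentence 1. Is this sentence 2? Ok... Done!';

// Split on whitespace that directly follows ".", "!" or "?".
// The lookbehind does not consume the punctuation, so each sentence keeps its ending.
$sentences = preg_split('/(?<=[.!?])\s+/', trim($category_text), -1, PREG_SPLIT_NO_EMPTY);

// Wrap every sentence in its own paragraph, escaping it for HTML output.
$out = '';
foreach ($sentences as $sentence) {
    $out .= '<p class="category_text">' . htmlspecialchars($sentence) . "</p>\n";
}

echo $out;
```

Unlike explode('.', ...), this keeps "Ok..." together as one sentence and leaves no empty trailing paragraph.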

Replacing symbol strange situation

foreach($ret as $object)
{
$res = $object->...;
$img_src = $res[0]->src;
echo $img_src . '<br />';
echo str_replace("&size=2", "", $img_src) . '<br /><br />';
}
$img_src ~ 'http://site.com/img.jpg&size=2'
I need to get the same link but without &size=2. Why doesn't the last line of my code work? It shows the same URL.
Are you absolutely certain there aren't any goofy unprintable characters in your source string? Try debugging with this:
printf("%s\n", join(':', str_split($img_src)));
And make sure you really have &size=2 in your string. If you see two consecutive colons, you've got something like a \0 or some other character mucking up the works in the middle of your string.
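One way the string can look right in the browser and still not contain &size=2 is an HTML-encoded ampersand. Whether that is the cause here is an assumption, not something the question confirms; the sketch below uses a hypothetical value to show how such a case can be detected and handled:

```php
<?php
// Hypothetical value: the ampersand is HTML-encoded, which a browser renders as a plain "&".
$img_src = 'http://site.com/img.jpg&amp;size=2';

// The literal search string is not present, so nothing is replaced.
$unchanged = str_replace('&size=2', '', $img_src);

// Decoding entities first lets the literal replacement succeed.
$decoded = html_entity_decode($img_src);
$cleaned = str_replace('&size=2', '', $decoded);

// bin2hex exposes every byte, including ones a browser would hide or render differently.
echo bin2hex($img_src), "\n";
echo $unchanged, "\n";
echo $cleaned, "\n";
```

Viewing the page source instead of the rendered page, or running the script from the command line, shows the same difference without any extra code.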
Seems to work on this end:
http://site.com/img.jpg&size=2
http://site.com/img.jpg
from
<?php
$img_src = 'http://site.com/img.jpg&size=2';
echo $img_src.'<br />';
echo str_replace("&size=2", "", $img_src).'<br/><br/>';
?>
Use preg_replace:
$c=preg_replace("/&size=2/","",$img_src);
Example of usage
<?php
$sr="http://site.com/img.jpg&size=2";
echo preg_replace("/&size=2/","",$sr);
?>
This will output
http://site.com/img.jpg
