PHP preg_match_all and preg_replace

[caption id="attachment_1342" align="alignleft" width="300" caption="Cheers... "Forward" diversifying innovation to secure first place. "][/caption] A group of 35 students from...
I'm reading this data from an API. I want the text to start with "A group of 35 students from...". Please help me remove the caption tag (replace it with an empty string). This is what I tried:
echo "<table>";
echo "<td>".$obj[0]['title']."</td>";
echo "<td>".$obj[0]['content']."</td>";
echo "</table>";
$html = $obj[0]['content'];
preg_match_all('/<caption>(.*?)<\/caption>/s', $html, $matches);
preg_replace('',$matches, $obj[0]['content']);
Any help would be appreciated.

$pattern = "/\[caption (.*?)\](.*?)\[\/caption\]/i";
$removed = preg_replace($pattern, "", $html);

echo preg_replace("#\[caption.*\[/caption\]#u", "", $str);

The regex search pattern in the question's snippet is incorrect: there is no bare <caption> tag in the input, only a tag with attributes (<caption id=...).
Second, the preg_replace call in the question doesn't serve any purpose. preg_replace expects three arguments: first the regex pattern to search for, second the replacement string, and third the input string.
The following snippet, using preg_match, will work.
<?php
//The input string from API
$inputString = '<caption id="attachment_1342" align="alignleft" width="300" caption="Cheers... "Forward" diversifying innovation to secure first place. "></caption> A group of 35 students from';
//Search Regex
$pattern = '/<caption(.*?)<\/caption>(.*?)$/';
//preg_match searches inputString for a match to the regular expression given in pattern
//The matches are placed in the third argument.
preg_match($pattern, $inputString, $matches);
//$matches[0] is the whole match, $matches[1] is the text inside the caption tag, $matches[2] is the text after the caption.
echo $matches[2];
// var_dump($matches);
?>
If you still want to use preg_match_all for some reason, the following snippet is a modification of the one in the question:
<?php
//Sample Object for test
$obj = array(
array(
'title' => 'test',
'content' => '<caption id="attachment_1342" align="alignleft" width="300" caption="Cheers... "Forward" diversifying innovation to secure first place. "></caption> A group of 35 students from'
)
);
echo "<table border='1'>";
echo "<td>".$obj[0]['title']."</td>";
echo "<td>".$obj[0]['content']."</td>";
echo "</table>";
$html = $obj[0]['content'];
//preg_match_all will put the caption tag in first match
preg_match_all('/<caption(.*?)<\/caption>/s', $html, $matches2);
//var_dump($matches2);
//use replace to remove the chunk from content
$obj[0]['content'] = str_replace($matches2[0], '', $obj[0]['content']);
//var_dump($obj);
?>
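The argument order described above (pattern, replacement, subject) can also be applied directly with preg_replace, without a separate matching step. This is a minimal sketch, assuming the API returns the square-bracket [caption ...][/caption] form shown in the question; the function name strip_caption and the sample text are hypothetical.

```php
<?php
// Hypothetical helper: removes every [caption ...]...[/caption] block.
function strip_caption(string $text): string
{
    // Arguments are pattern, replacement, subject.
    // The s modifier lets . match newlines; the lazy .*? stops at the first [/caption].
    $without = preg_replace('/\[caption\b.*?\[\/caption\]/s', '', $text);

    // Drop the whitespace left behind where the caption used to be.
    return ltrim($without);
}

// Hypothetical sample, simplified from the text in the question.
$content = '[caption id="attachment_1342" align="alignleft" width="300"][/caption] A group of 35 students from...';
echo strip_caption($content);
```

Note that preg_replace returns the new string rather than modifying its input, so the result has to be assigned or echoed.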

Thank you, guys. I used the explode function to do this.
$html = $obj[0]['content'];
$code = explode("[/caption]", $html);
//print the text after the closing caption tag, if there is any
if(isset($code[1]) && $code[1] != ''){
echo $code[1];
}

Related

Display Longtext Description Output

Hello, how can I display content stored in the database correctly?
[center][youtube]vn9mMeWcgoM[/youtube] [/center]
[center]This is a test youtube post video [/center][center][img]http://3.bp.blogspot.com/-RgszeTgP4eA/Vck93de-LZI/AAAAAAAAaOQ/F0s-XK5Zh4c/w1200-h630-p-k-no-nu/samabawan_island_leyte_philippines.jpg[/img][/center]
That is the current output; it should display the image and the video instead.
This is my display code:
<?php echo nl2br($item['content']); ?>
You can do this with preg_replace().
Here is a simple function I have written; I hope it helps.
$text = "[center][youtube]vn9mMeWcgoM[/youtube] [/center] [center]This is a test youtube post video [/center][center][img]http://3.bp.blogspot.com/-RgszeTgP4eA/Vck93de-LZI/AAAAAAAAaOQ/F0s-XK5Zh4c/w1200-h630-p-k-no-nu/samabawan_island_leyte_philippines.jpg[/img][/center]";
function replace($string){
$string = preg_replace("/\[center\](.*?)\[\/center]/", "<div align='center'>$1</div>", $string);
$string = preg_replace("/\[youtube\](.*?)\[\/youtube]/", "<iframe src=\"https://www.youtube.com/embed/$1\"></iframe>", $string);
$string = preg_replace("/\[img\](.*?)\[\/img]/", "<img src='$1' />", $string);
return $string;
}
echo replace($text);
For your code, call it like this: echo nl2br(replace($item['content']));
EDITED
Here is how you can add more tags.
First, read about the function preg_replace().
Also read "Possible modifiers in regex patterns" for more information.
Now you can add
$string = preg_replace("/\[url\](.*?)\[\/url]/", "<a href=\"$1\" >$1</a>", $string);
after the last $string assignment in my simple function.
This means: replace what is inside the URL tags with <a href="$1" >$1</a>.
Here $1 is the URL, the text captured by the first group (.*?); in most cases $1 is the element you want.
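As the number of tags grows, the rules can be kept in one array, since preg_replace accepts arrays of patterns and replacements and applies them in order. The following is a minimal sketch under stated assumptions: the function name bbcode_to_html is hypothetical, and escaping the input with htmlspecialchars first and restricting [img] and [url] to http(s) URLs are additions that are not part of the answer above.

```php
<?php
// Hypothetical helper: converts a small set of BBCode tags to HTML.
function bbcode_to_html(string $text): string
{
    // Escape any raw HTML in the stored text before adding our own markup.
    $text = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // pattern => replacement; $1 is the text captured by the first group.
    $rules = [
        '/\[center\](.*?)\[\/center\]/s'            => '<div style="text-align:center">$1</div>',
        '/\[youtube\]([\w-]+)\[\/youtube\]/'        => '<iframe src="https://www.youtube.com/embed/$1"></iframe>',
        '/\[img\](https?:\/\/[^\s\[\]]+)\[\/img\]/' => '<img src="$1" />',
        '/\[url\](https?:\/\/[^\s\[\]]+)\[\/url\]/' => '<a href="$1">$1</a>',
    ];

    // The rules are applied one after another, in array order.
    return preg_replace(array_keys($rules), array_values($rules), $text);
}

echo bbcode_to_html('[center][url]http://example.com/[/url][/center]');
```

Adding a tag then means adding one line to $rules rather than another preg_replace call.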

PHP: Find (and replace) string between two different strings

I have a string that looks like this: "<html>". What I want to do is get all the text between the "<" and the ">", and this should apply to any text, so that "<hello>" or "<p>" would also work. Then I want to replace this string with a string that contains the text between the tags.
For example
In:
<[STRING]>
Out:
<this is [STRING]>
Where [STRING] is the string between the tags.
Use a capture group to match everything after < that isn't >, and substitute that into the replacement string.
preg_replace('/<([^>]*)>/', '<this is $1>', $string);
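A runnable sketch of the capture-group approach, applied to the three sample strings from the question. The function name wrap_tag_text is hypothetical.

```php
<?php
// Hypothetical helper: rewrites <X> as <this is X> for any X without a '>'.
function wrap_tag_text(string $input): string
{
    // ([^>]*) captures everything between '<' and the next '>'; $1 inserts it.
    return preg_replace('/<([^>]*)>/', '<this is $1>', $input);
}

foreach (['<html>', '<hello>', '<p>'] as $sample) {
    echo wrap_tag_text($sample), "\n";
}
```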
Here is a solution that first tests whether the pattern exists, then captures the match, and finally modifies it:
<?php
$str = '<[STRING]>';
$pattern = '#<(\[.*\])>#';
if(preg_match($pattern, $str, $matches)):
var_dump($matches);
$str = preg_replace($pattern, '<this is '.$matches[1].'>', $str);
endif;
echo $str;
?>
You can test here: http://ideone.com/uVqV0u
I don't know if this is useful to you.
A regular expression is the best way, but you can also consider a little function that removes the first < and the last > character from your string.
This is my solution:
<?php
/*Vars to test*/
$var1="<HTML>";
$var2="<P>";
$var3="<ALL YOU WANT>";
/*function*/
function replace($string_tag) {
$newString="";
for ($i=1; $i<(strlen($string_tag)-1); $i++){
$newString.=$string_tag[$i];
}
return $newString;
}
/*Output*/
echo (replace($var1));
echo "\r\n";
echo (replace($var2));
echo "\r\n";
echo (replace($var3));
?>
The output is:
HTML
P
ALL YOU WANT
Tested on https://ideone.com/2RnbnY
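The character loop above can be shortened with substr, which accepts a negative length to stop before the end of the string. A minimal sketch, with a hypothetical function name and an added check (not in the answer above) that the brackets are actually present:

```php
<?php
// Hypothetical helper: returns the text between a leading '<' and a trailing '>'.
function strip_angle_brackets(string $tag): string
{
    // Only strip when the string really is wrapped in angle brackets.
    if (strlen($tag) > 2 && $tag[0] === '<' && substr($tag, -1) === '>') {
        // Start at offset 1 and stop one character before the end.
        return substr($tag, 1, -1);
    }
    return $tag;
}

echo strip_angle_brackets('<HTML>'), "\n";
echo strip_angle_brackets('<ALL YOU WANT>'), "\n";
```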

How would one use PHP preg_match_all to differentiate anchor elements identified by attribute of inner HTML element?

I have sets of HTML anchor elements enclosing image elements. For each set, using PHP-CLI, I want to pull the URLs and classify them according to their types. The type of anchor can only be determined by an attribute of its child image element. It would be easy if there was only one of each type per set. My problem is when two anchor elements of one type are separated by one or more of the other types. My non-greedy parenthesized sub-pattern seems to become greedy and expands to find the second relevant child attribute. In my test script I'm trying to pull the 'Userlink' URLs from amongst the other types. Using a simple pattern like:
#<a href="(.*?)" custattr="value1"><img alt="Userlink"#
On a set like:
<li><img alt="Userlink" class="common_link_class" height="123" src="pic0.png" width="123" style="width: 123px;"></li><li><img alt="Socnet1" class="common_link_class" height="123" src="pic1.png" width="123" style="width: 123px;"></li><li><img alt="Socnet2" class="common_link_class" height="123" src="pic2.png" width="123" style="width: 123px;"></li><li><img alt="Usermail" class="common_link_class" height="123" src="pic3.png" width="123" style="width: 123px;"></li><li><img alt="Userlink" class="common_link_class" height="123" src="pic4.png" width="123" style="width: 123px;"></li>
(sorry, but the actual html is on one line like that)
My sub-pattern captures from the beginning of the first "Userlink" URL to the end of the last one.
I've tried many variations of look-aheads, not sure I should list them all here. So far they've either returned no match at all or the same as described above.
Here's my test script (running in a Bash shell):
#!/usr/bin/php
<?
$lines = 0;
$input = "";
$matches = array();
while ($line = fgets(STDIN)){
$input .= $line;
$lines++;
}
fwrite(STDERR, "Processing $lines\n");
$pcre = '#<a href="(.*?)" custattr="value1"><img alt="Userlink"#';
if (preg_match_all($pcre,$input,$matches)){
fwrite(STDERR, "\$matches has " . count($matches) . " elements\n");
foreach ($matches[1] as $match){
fwrite(STDOUT, $match . "\n");
}
}
?>
What PCRE pattern for PHP's preg_match_all() would return the two "Userlink" URLs in the above example?
I have taken the liberty of changing your variable names:
$pattern = '~<a href="([^"]++)" custattr="value1"><img alt="Userlink"~';
if ($nb = preg_match_all($pattern, $input, $matches)) {
fwrite(STDERR, "\$matches has " . $nb . " elements\n");
fwrite(STDOUT, implode("\n", $matches[1]) . "\n");
}
Note that the preg_match_all function returns the number of matches.
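The reason the negated class helps: a lazy (.*?) still keeps expanding, across quotes and tags, until the rest of the pattern can match, whereas [^"] can never cross the closing quote of the href value. The sketch below compares the two on hypothetical markup (the anchor tags are missing from the sample in the question, so the URLs and structure here are assumptions).

```php
<?php
// Hypothetical input: three anchors, the middle one of another type.
$input = '<li><a href="http://u1.example/" custattr="value1"><img alt="Userlink"></a></li>'
       . '<li><a href="http://s1.example/" custattr="value1"><img alt="Socnet1"></a></li>'
       . '<li><a href="http://u2.example/" custattr="value1"><img alt="Userlink"></a></li>';

// Lazy dot: may run past the closing quote to reach a later "Userlink".
$lazy = '#<a href="(.*?)" custattr="value1"><img alt="Userlink"#';
// Negated class: the capture must end at the first closing quote.
$strict = '#<a href="([^"]+)" custattr="value1"><img alt="Userlink"#';

preg_match_all($lazy, $input, $lazyMatches);
preg_match_all($strict, $input, $strictMatches);

// The second lazy capture starts at the Socnet1 URL and swallows markup.
var_dump($lazyMatches[1]);
// The strict captures are just the two Userlink URLs.
var_dump($strictMatches[1]);
```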
This regex should work -
<a href="([^"]*?)"[^>]*\><img alt="Userlink"
You can see how it works here.
Testing it -
$pcre = '/<a href="([^"]*?)"[^>]*\><img alt="Userlink"/';
if (preg_match_all($pcre,$input,$matches)){
var_dump($matches);
//$matches[1] will be the array containing the urls.
}
/*
OUTPUT-
array
0 =>
array
0 => string '<a href="http://www.userlink1.com/my/page.html" custattr="value1"><img alt="Userlink"' (length=85)
1 => string '<a href="http://www.userlink2.com/my/page.html" custattr="value1"><img alt="Userlink"' (length=85)
1 =>
array
0 => string 'http://www.userlink1.com/my/page.html' (length=37)
1 => string 'http://www.userlink2.com/my/page.html' (length=37)
*/

preg_replace image tag to other format string

I have to convert image tags with PHP.
Here is the source string:
'picture number is <img src=get_blob.php?id=77 border=0> howto use'
the result should be like this
'picture number is #77# howto use'
I have already tried a lot, but I only get the number of the image as the result.
This is my last attempt:
$content = 'picture number is <img src=get_blob.php?id=77 border=0> howto use';
$content = preg_replace('|\<img src=get_blob.php\?id=(\d+)+( border\=0\>)|e', '$1', $content);
Now $content is 77.
I hope someone can help me.
Almost correct. Just drop the e flag:
$content = 'picture number is <img src=get_blob.php?id=77 border=0> howto use';
$content = preg_replace('/\<img src=get_blob.php\?id=(\d+)+( border\=0\>)/', '#$1#', $content);
echo $content;
Outputs:
picture number is #77# howto use
See documentation for more information about regular expression modifiers in PHP.
Don't use the e flag; it's not necessary for regex placeholders. Just try this:
preg_replace('/\<.*\?id\=([0-9]+)[^>]*>/', '#$1#', $string);
This regex assumes id is the first parameter of the src URL. If that isn't always going to be the case, use this:
preg_replace('/\<.*[?&]id\=([0-9]+)[^>]*>/', '#$1#', $string);
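A minimal sketch of the same replacement for text that may contain several image tags. It is a variation on the patterns above, not the answerers' exact regex: [^>]*? replaces .* so that a match cannot run from one tag into the next. The function name img_tags_to_markers and the second sample tag are hypothetical.

```php
<?php
// Hypothetical helper: replaces each <img ...id=N...> tag with #N#.
function img_tags_to_markers(string $content): string
{
    // [^>]*? stays inside one tag; [?&] allows id to be any query parameter.
    return preg_replace('/<img\b[^>]*?[?&]id=(\d+)[^>]*>/', '#$1#', $content);
}

echo img_tags_to_markers('picture number is <img src=get_blob.php?id=77 border=0> howto use');
```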

Regex match full hyperlink only with certain class

I have a string that contains some hyperlinks. I want to match only a certain link among them with a regex. I can't know whether the href or the class comes first; it may vary.
This is an example string:
<div class='wp-pagenavi'>
<span class='pages'>Page 1 of 8</span><span class='current'>1</span>
<a href='http://stv.localhost/channel/political/page/2' class='page'>2</a>
»eee<span class='extend'>...</span><a href='http://stv.localhost/channel/political/page/8' class='last'>lastן »</a>
<a class="cccc">xxx</a>
</div>
I want to select from the above string only the link that has the class nextpostslink.
So the match in this example should return this:
»eee
This regex is the closest I could get:
/<a\s?(href=)?('|")(.*)('|") class=('|")nextpostslink('|")>.{1,6}<\/a>/
But it selects everything from the first link at the start of the string.
I think my problem is in the (.*), but I can't figure out how to change it to select only the needed link.
I would appreciate your help.
It's much better to use a genuine HTML parser for this. Abandon all attempts to use regular expressions on HTML.
Use PHP's DOMDocument instead:
$dom = new DOMDocument;
$dom->loadHTML($yourHTML);
foreach ($dom->getElementsByTagName('a') as $link) {
$classes = explode(' ', $link->getAttribute('class'));
if (in_array('nextpostslink', $classes)) {
// $link has the class "nextpostslink"
}
}
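A fuller sketch of the DOMDocument approach that collects the href and text of each matching link. It assumes the DOM extension is available; the function name find_links_by_class and the sample markup are hypothetical, and libxml warnings are silenced because real-world HTML is often not well-formed.

```php
<?php
// Hypothetical helper: returns href and text of every <a> carrying $class.
function find_links_by_class(string $html, string $class): array
{
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $found = [];
    foreach ($dom->getElementsByTagName('a') as $link) {
        // The class attribute is a whitespace-separated list of names.
        $classes = preg_split('/\s+/', trim($link->getAttribute('class')));
        if (in_array($class, $classes, true)) {
            $found[] = [
                'href' => $link->getAttribute('href'),
                'text' => $link->textContent,
            ];
        }
    }
    return $found;
}

// Hypothetical markup; note that attribute order does not matter here.
$html = "<div><a href='/page/2' class='page'>2</a>"
      . "<a class='nextpostslink' href='/page/2'>next</a></div>";
print_r(find_links_by_class($html, 'nextpostslink'));
```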
Not sure if that's what you're after, but anyway: it's a bad idea to parse HTML with regex. Use an XPath implementation to reach the desired elements. The following XPath expression would give you all the 'a' elements with class "nextpostslink":
//a[contains(@class,"nextpostslink")]
There is plenty of XPath information around. Since you didn't mention your programming language, here is a quick XPath tutorial using Java: http://www.ibm.com/developerworks/library/x-javaxpathapi/index.html
Edit:
php + xpath + html: http://dev.juokaz.com/php/web-scraping-with-php-and-xpath
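In PHP, the XPath route can be sketched with DOMXPath. A plain contains() on the class attribute also matches longer names that merely include the text, so this sketch pads the attribute with spaces and looks for the whole token, which is a common XPath idiom rather than part of the answer above. The function name xpath_links_by_class is hypothetical, and $class is assumed to be a simple class name without quotes.

```php
<?php
// Hypothetical helper: returns the href of every <a> whose class list contains $class.
function xpath_links_by_class(string $html, string $class): array
{
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Pad the class list with spaces so only whole class names match.
    $query = sprintf(
        '//a[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );

    $urls = [];
    foreach ($xpath->query($query) as $node) {
        $urls[] = $node->getAttribute('href');
    }
    return $urls;
}

// Hypothetical markup: the second link only resembles the wanted class.
$html = '<a class="x nextpostslink" href="/p/2">n</a><a class="nextpostslinkx" href="/p/3">m</a>';
print_r(xpath_links_by_class($html, 'nextpostslink'));
```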
This would work in php:
/<a[^>]+href=(\"|')([^\"']*)('|\")[^>]+class=(\"|')[^'\"]*nextpostslink[^'\"]*('|\")[^>]*>(.{1,6})<\/a>/m
This is of course assuming that the class attribute always comes after the href attribute.
This is a code snippet:
$html = <<<EOD
<div class='wp-pagenavi'>
<span class='pages'>Page 1 of 8</span><span class='current'>1</span>
<a href='http://stv.localhost/channel/political/page/2' class='page'>2</a>
»eee<span class='extend'>...</span><a href='http://stv.localhost/channel/political/page/8' class='last'>lastן »</a>
<a class="cccc">xxx</a>
</div>
EOD;
$regexp = "/<a[^>]+href=(\"|')([^\"']*)('|\")[^>]+class=(\"|')[^'\"]*nextpostslink[^'\"]*('|\")[^>]*>(.{1,6})<\/a>/m";
$matches = array();
if(preg_match($regexp, $html, $matches)) {
echo "URL: " . $matches[2] . "\n";
echo "Text: " . $matches[6] . "\n";
}
I would however suggest first matching the link and then getting the url so that the order of the attributes doesn't matter:
<?php
$html = <<<EOD
<div class='wp-pagenavi'>
<span class='pages'>Page 1 of 8</span><span class='current'>1</span>
<a href='http://stv.localhost/channel/political/page/2' class='page'>2</a>
»eee<span class='extend'>...</span><a href='http://stv.localhost/channel/political/page/8' class='last'>lastן »</a>
<a class="cccc">xxx</a>
</div>
EOD;
$regexp = "/(<a[^>]+class=(\"|')[^'\"]*nextpostslink[^'\"]*('|\")[^>]*>(.{1,6})<\/a>)/m";
$matches = array();
if(preg_match($regexp, $html, $matches)) {
$link = $matches[0];
$text = $matches[4];
$regexp = "/href=(\"|')([^'\"]*)(\"|')/";
$matches = array();
if(preg_match($regexp, $link, $matches)) {
$url = $matches[2];
echo "URL: $url\n";
echo "Text: $text\n";
}
}
You could of course extend the regexp to match either variant (class first vs. href first), but it would be very long, and I don't think it would improve performance.
Just as a proof of concept I created a regexp that doesn't care about the order:
/<a[^>]+(href=(\"|')([^\"']*)('|\")[^>]+class=(\"|')[^'\"]*nextpostslink[^'\"]*(\"|')|class=(\"|')[^'\"]*nextpostslink[^'\"]*(\"|')[^>]+href=(\"|')([^\"']*)('|\"))[^>]*>(.{1,6})<\/a>/m
The text will be in group 12 and the URL will be in either group 3 or group 10 depending on the order.
As the question asks for a regex, here is one: <a\s[^>]*class=["|']nextpostslink["|'][^>]*>(.*)<\/a>.
The order of the attributes doesn't matter, and it accepts both single and double quotes.
Check the regex online: https://regex101.com/r/DX03KD/1/
I replaced the (.*) with [^'"]+ as follows:
<a\s*(href=)?('|")[^'"]+('|") class=('|")nextpostslink('|")>.{1,6}</a>
Note: I tried this with RegexBuddy, so I didn't need to escape the <, >, ' or / characters.
