Getting the src from text using preg_match - php

I am adding images to a WP site via a shortcode:
[figure src="" url="" caption=""]
Where the src is the image source, the url is the link to a larger image (if wanted), and the caption is the caption.
I am trying to get the src from the above, basing it on this code:
$pattern = '/<img[^>]*src=\"?(?<src>[^\"]*)\"?[^>]*>/im';
preg_match( $pattern, $html, $matches );
if ( ! empty( $matches['src'] ) ) {
    return $matches['src'];
}
But I can't figure out how to match the [figure] shortcode instead.

/\[figure(( src="(?<src>[^"]+)")?|( url="(?<url>[^"]+)")?|( caption="(?<caption>[^"]+)")?)*\]/i
[figure url="http://example.com/large.gif" caption="my caption" src="http://example.com/figure.gif"]
Array
(
[0] => [figure url="http://example.com/large.gif" caption="my caption" src="http://example.com/figure.gif"]
[1] =>
[2] => src="http://example.com/figure.gif"
[src] => http://example.com/figure.gif
[3] => http://example.com/figure.gif
[4] => url="http://example.com/large.gif"
[url] => http://example.com/large.gif
[5] => http://example.com/large.gif
[6] => caption="my caption"
[caption] => my caption
[7] => my caption
)
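As a hedged alternative to one large pattern, the shortcode can be matched first and its name="value" pairs collected in a second pass, so the attribute order does not matter. This is a minimal sketch, not the answer's method: the helper name figure_attributes is hypothetical, and it assumes attribute values are double-quoted and never contain a closing square bracket.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the
// attributes of the first [figure ...] shortcode as an associative array.
function figure_attributes(string $text): array
{
    // Step 1: find the shortcode and keep everything between "figure" and "]".
    // Assumption: no attribute value contains a "]".
    if (!preg_match('/\[figure\b([^\]]*)\]/i', $text, $tag)) {
        return [];
    }

    // Step 2: collect every name="value" pair, in whatever order they appear.
    preg_match_all('/(\w+)="([^"]*)"/', $tag[1], $pairs, PREG_SET_ORDER);

    $attributes = [];
    foreach ($pairs as $pair) {
        $attributes[strtolower($pair[1])] = $pair[2];
    }
    return $attributes;
}

$text = '[figure url="http://example.com/large.gif" caption="my caption" src="http://example.com/figure.gif"]';
$attributes = figure_attributes($text);
echo $attributes['src'] ?? '', "\n";
```

Missing attributes are simply absent from the returned array, which is why the usage line falls back to an empty string.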

Try this:
$foo = '[figure src="" url="" caption=""]';
// The parentheses capture only the value between the quotes
if ( preg_match( '/src="([^"]*)"/i', $foo, $array ) ) {
    // $array[0] is the whole match (src="..."), $array[1] is just the value
    echo $array[1];
}

Related

php explode doesn't second item

I'm trying to split this image string: $output = "<img typeof="foaf:Image" src="http://asite.dev/sites/default/files/video/MBI_Part%201_v9.jpg" width="1920" height="1080" alt="" />
I'm doing it like this: $split = explode('"', $output);
But when I print_r($split); it returns:
Array ( [0] => typeof="foaf:Image" [2] => src="http://makingitcount.dev/sites/default/files/video/MBI_Part%201_v9.jpg" [3] => width="1920" [4] => height="1080" [5] => alt="" [6] => /> )
No second value! Where'd it go? split[1] throws an error, of course. I also notice that the "<img" part of the string isn't in the array either.
The problem stems from how the HTML tag is parsed. If you remove the <img at the beginning of the HTML string, you'll notice that the rest of the attributes parse into an array with a proper number sequence (including a '1' element). You can solve your problem by fixing your quotes, for example by wrapping the whole tag in single quotes, so that PHP does not parse the HTML and treats the entire unit strictly as a string.
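A small sketch of that idea, under the assumption that the array is being inspected in a browser, where a raw <img would be interpreted as markup rather than shown as text. The tag is held in a single-quoted string and the dump is escaped before output; the variable names $dump and $escaped are illustrative only.

```php
<?php
// Single quotes let the double quotes inside the tag stay literal.
$output = '<img typeof="foaf:Image" src="http://asite.dev/sites/default/files/video/MBI_Part%201_v9.jpg" width="1920" height="1080" alt="" />';

$split = explode('"', $output);

// print_r() with true as the second argument returns the dump instead of printing it.
$dump = print_r($split, true);

// Escaping turns "<" into "&lt;", so a browser displays the tag text
// instead of treating it as an element and hiding part of the array.
$escaped = htmlspecialchars($dump);
echo '<pre>', $escaped, '</pre>';
```

With the escaping in place, every element of $split, including the piece that starts with <img, stays visible in the rendered page.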
If you want to bypass this whole mess, you can also use regular expression matching to collect the tag information into an array. $matches[0] will contain all of your tag attributes (names and quoted values), and $matches[1] contains the tag name itself (img):
$output = '<img typeof="Image" src="http://asite.dev/sites/default/files/video/MBI_Part%201_v9.jpg" width="1920" height="1080" alt="" />';
$pattern = '( \w+|".*?")';
preg_match_all($pattern, $output, $matches);
preg_match("[\w+]",$output,$matches[1]);
print_r($matches);
which gives you
Array ( [0] => Array ( [0] => typeof [1] => "Image" [2] => src [3] => "http://asite.dev/sites/default/files/video/MBI_Part%201_v9.jpg" [4] => width [5] => "1920" [6] => height [7] => "1080" [8] => alt [9] => "" )
[1] => Array ( [0] => img ) )

use regex to get all attributes

My input string is
$center="[{video('whats-new/reaction-vid1.jpg')} width=\"580\" height=\"326\" alt=\"\" video=\"Zb36h4K2IKQ\"]";
$pattern="/\[{video\('([a-zA-Z0-9\/\-\_]+)'\)}\s+(width|height|alt|video)=\"[^\"]+\"\]/";
if(preg_match($pattern,$center,$matches)){
print_r($matches);exit;
}
However, it is not working. Basically I want to extract the attributes width, height, alt and video. I have tried for half an hour. Can anyone point me in the right direction?
Use preg_match_all instead of preg_match
<?php
$center="[{video('whats-new/reaction-vid1.jpg')} width=\"580\" height=\"326\" alt=\"\" video=\"Zb36h4K2IKQ\"]";
$pattern="~video\('\K[^']*|(?:width|height|alt|video)=\"\K[^\"]*~";
preg_match_all($pattern, $center, $matches);
print_r($matches);
?>
Output:
Array
(
[0] => Array
(
[0] => whats-new/reaction-vid1.jpg
[1] => 580
[2] => 326
[3] =>
[4] => Zb36h4K2IKQ
)
)
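The pattern above relies on \K, which drops everything matched so far from the reported match, so only the value is returned. As a hedged sketch, the same values can also be read with ordinary capture groups, which makes it easy to key them by attribute name; the variable names $withK, $pairs and $attrs are illustrative and not part of the original answer.

```php
<?php
$center = '[{video(\'whats-new/reaction-vid1.jpg\')} width="580" height="326" alt="" video="Zb36h4K2IKQ"]';

// Variant 1: \K resets the start of the match, so $withK[0] holds only the values.
preg_match_all('~(?:width|height|alt|video)="\K[^"]*~', $center, $withK);

// Variant 2: capture groups keep the attribute name (group 1) next to its value (group 2).
preg_match_all('~(width|height|alt|video)="([^"]*)"~', $center, $pairs, PREG_SET_ORDER);

$attrs = [];
foreach ($pairs as $pair) {
    $attrs[$pair[1]] = $pair[2];
}

print_r($attrs);
```

The second variant is a little longer, but code that needs $attrs['width'] rather than a positional index may find it easier to read.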

need find from text all "src" elements

I need to get all "src" values from a text. The "src" value can be quoted with " or '.
The values are found fine, but if the element has other attributes (id, style, ...), they are grabbed as well.
I need only the src value.
My code:
$html = 'text text <img src="img1.png"/> as as <img src=\'second.gif\' id ="test" /> as';
preg_match_all('/src=("|\')([^"]*)("|\')/', $html, $htmlSrc);
echo '<pre>';
print_r($htmlSrc);
Array
(
[0] => Array
(
[0] => src="img1.png"
[1] => src='second.gif' id ="
)
[1] => Array
(
[0] => "
[1] => '
)
[2] => Array
(
[0] => img1.png
[1] => second.gif' id =
)
[3] => Array
(
[0] => "
[1] => "
)
)
Regexp is a bad idea and you will probably end up with unmaintainable and unreliable code. It would be easy and reliable if you use an HTML parser. You can find an example here: http://simplehtmldom.sourceforge.net/
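As one possible illustration of the parser approach, here is a minimal sketch that uses PHP's bundled DOMDocument class instead of the library linked above. It assumes the dom extension is enabled and that the input is non-empty; the function name image_sources is hypothetical.

```php
<?php
// Hypothetical helper: returns the src value of every <img> in an HTML fragment.
function image_sources(string $html): array
{
    $dom = new DOMDocument();

    // Collect libxml warnings internally instead of emitting them for loose markup.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($dom->getElementsByTagName('img') as $img) {
        if ($img->hasAttribute('src')) {
            $sources[] = $img->getAttribute('src');
        }
    }
    return $sources;
}

$html = 'text text <img src="img1.png"/> as as <img src=\'second.gif\' id ="test" /> as';
print_r(image_sources($html));
```

Because the parser understands attributes, the quote style and any neighbouring id or style attributes no longer affect the result.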
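preg_match_all('/src=("|\')([^"\']*)\1/', $html, $htmlSrc);
print_r($htmlSrc[2]);
This seems to work better: the value stops at the first quote of either kind, and \1 requires the closing quote to be the same as the opening one.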

php regular expression to match shorttags

This is close, but is failing to match successive "attributes":
$string = "single attribute [include file=\"bob.txt\"] multiple attributes [another prop=\"val\" attr=\"one\"] no attributes [tag] etc";
preg_match_all('/\[((\w+)((\s(\w+)="([^"]+)"))*)\]/', $string, $matches, PREG_SET_ORDER);
print '<pre>' . print_r($matches, TRUE) . '</pre>';
Gives back the following:
Array
(
[0] => Array
(
[0] => [include file="bob.txt"]
[1] => include file="bob.txt"
[2] => include
[3] => file="bob.txt"
[4] => file="bob.txt"
[5] => file
[6] => bob.txt
)
[1] => Array
(
[0] => [another prop="val" attr="one"]
[1] => another prop="val" attr="one"
[2] => another
[3] => attr="one"
[4] => attr="one"
[5] => attr
[6] => one
)
[2] => Array
(
[0] => [tag]
[1] => tag
[2] => tag
)
)
Where [2] is the tag name, [5] is the attribute name and [6] is the attribute value.
The failure is on the second node - it catches attr="one" but not prop="val"
TYIA.
(this is only meant for limited, controlled use - not broad distribution - so I don't need to worry about single quotes or escaped double quotes)
Unfortunately, there is no way to collect every repetition of a capture group like that: a repeated group only keeps its last match. Personally, I would use preg_match_all to match the tags themselves (i.e. remove all the extra parentheses inside the regex); then, for each match, you can extract the attributes. Something like this:
$string = "single attribute [include file=\"bob.txt\"] multiple attributes [another prop=\"val\" attr=\"one\"] no attributes [tag] etc";
preg_match_all('/\[\w+(?:\s\w+="[^"]+")*\]/', $string, $matches);
foreach($matches[0] as $m) {
    // $m starts with "[", so skip the bracket before reading the tag name
    preg_match('/^\[(\w+)/', $m, $tagname); $tagname = $tagname[1];
    preg_match_all('/\s(\w+)="([^"]+)"/', $m, $attrs, PREG_SET_ORDER);
    // do something with $tagname and $attrs
}
Note that if you intend to replace the tag with some content, you should use preg_replace_callback like so (the callback is the second argument and the subject string the third):
$string = "single attribute [include file=\"bob.txt\"] multiple attributes [another prop=\"val\" attr=\"one\"] no attributes [tag] etc";
$output = preg_replace_callback('/\[\w+(?:\s\w+="[^"]+")*\]/', function($match) {
    $m = $match[0]; // the full tag, e.g. [include file="bob.txt"]
    preg_match('/^\[(\w+)/', $m, $tagname); $tagname = $tagname[1];
    preg_match_all('/\s(\w+)="([^"]+)"/', $m, $attrs, PREG_SET_ORDER);
    $result = $m; // placeholder: build the replacement from $tagname and $attrs
    return $result;
}, $string);
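The two-pass idea can also be wrapped into one function that returns each tag with its attributes as an associative array. This is only a sketch under the same assumptions as the question (double-quoted values, no escaped quotes); the function name parse_shorttags and the 'tag' / 'attributes' keys are hypothetical.

```php
<?php
// Hypothetical helper: returns a list of ['tag' => ..., 'attributes' => [...]] entries.
function parse_shorttags(string $text): array
{
    $tags = [];

    // Group 1 is the tag name; group 2 wraps the whole repeated attribute list,
    // so it holds every attribute rather than only the last repetition.
    preg_match_all('/\[(\w+)((?:\s\w+="[^"]+")*)\]/', $text, $matches, PREG_SET_ORDER);

    foreach ($matches as $match) {
        $attributes = [];
        // Second pass: split the attribute list into name/value pairs.
        preg_match_all('/\s(\w+)="([^"]+)"/', $match[2], $pairs, PREG_SET_ORDER);
        foreach ($pairs as $pair) {
            $attributes[$pair[1]] = $pair[2];
        }
        $tags[] = ['tag' => $match[1], 'attributes' => $attributes];
    }
    return $tags;
}

$string = "single attribute [include file=\"bob.txt\"] multiple attributes [another prop=\"val\" attr=\"one\"] no attributes [tag] etc";
print_r(parse_shorttags($string));
```

For the sample string this yields three entries, and the second one carries both prop and attr.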

Extract HTML Tags using preg_split

I have a string:
$string = 'this is test <b>bold</b> this is another test <img src="#"> image' ;
I want to split it so that HTML tags and normal text end up as separate items.
I need output like the following:
[0] => this is test
[1] => <b>bold</b>
[2] => this is another test
[3] => <img src="#">
[4] => image
I am using this code:
$strip = preg_split('/\s+(?![^<>]+>)/m', $string , -1, PREG_SPLIT_DELIM_CAPTURE) ;
Output:
[0] => this
[1] => is
[2] => test
[3] => <b>bold</b>
[4] => this
[5] => .....
I'm a newbie. Please help!
I find it easier to get that result using preg_match_all:
$string = 'this is test <b>bold</b> this is another test <img src="#"> image <hr/>';
preg_match_all('/<([^\s>]+)(.*?)>((.*?)<\/\1>)?|(?<=^|>)(.+?)(?=$|<)/i',$string,$result);
$result = $result[0];
// assign the result to the variable
foreach ($result as &$group) {
$group = preg_replace('/^\s*(.*?)\s*$/','$1',$group);
// this is to eliminate preceding and trailing spaces
}
print_r($result);
EDIT:
I was assuming there should be at least 1 character in between the opening and the closing of a tag, but it's not necessary so I changed the second + into an * and I took into account the possibility of case insensitivity in tags.
Output:
Array
(
[0] => this is test
[1] => <b>bold</b>
[2] => this is another test
[3] => <img src="#">
[4] => image
[5] => <hr/>
)
EDIT 2:
This won't work with irregular situations such as those exemplified in the comments:
foo<b>bar<i>ital</b>ic</i> or foo<b>bar<b>baz</b>fail</b>
To make it work the RegEx should be tweaked to look inside the matches and process them accordingly.
