PHP explode() doesn't return the second item

I'm trying to split this image string: $output = '<img typeof="foaf:Image" src="http://asite.dev/sites/default/files/video/MBI_Part%201_v9.jpg" width="1920" height="1080" alt="" />';
I'm doing it like this: $split = explode('"', $output);
But when I print_r($split); it returns:
Array ( [0] => typeof="foaf:Image" [2] => src="http://makingitcount.dev/sites/default/files/video/MBI_Part%201_v9.jpg" [3] => width="1920" [4] => height="1080" [5] => alt="" [6] => /> )
No second value! Where'd it go? $split[1] throws an error, of course. I also notice that the "<img" part of the string isn't in the array either.

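The problem stems from the <img tag, and most likely from how the output is displayed rather than from explode() itself. PHP does not parse HTML inside a string, but a browser does: when the print_r() output is rendered as HTML, the browser reads everything from "<img" up to the first ">" (the one in "[1] =>") as a tag and hides it, so element 0 and the "[1]" label disappear from view while the array itself is intact. If you remove the <img at the beginning of the html string, you'll notice the rest of the attributes print with a proper number sequence (including a '1' element). To see the real contents, view the page source or escape the output with htmlspecialchars(). Also wrap the string in single quotes so that the double quotes inside it don't end the string early and PHP treats the entire unit strictly as a string.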
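A minimal sketch of checking what the array really contains. It assumes the string was split on spaces (the printed output in the question looks like a space split, even though the posted code uses a double quote as the delimiter) and that the result is echoed into an HTML page:

```php
<?php
$output = '<img typeof="foaf:Image" src="http://asite.dev/sites/default/files/video/MBI_Part%201_v9.jpg" width="1920" height="1080" alt="" />';

// Assumption: a space delimiter, which matches the shape of the printed array.
$split = explode(' ', $output);

// print_r(..., true) returns the dump as a string instead of printing it.
// htmlspecialchars() turns "<" and ">" into entities, so a browser shows
// "<img" as text instead of treating it as the start of a tag.
echo '<pre>' . htmlspecialchars(print_r($split, true)) . '</pre>';
```

With the output escaped, index 0 ("<img") and index 1 should both be visible in the browser.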
If you want to bypass this whole mess, you can also use regular expression matching to collect the tag information and pass it into an array. $matches[0][*] will contain all of your tag attributes (names and quoted values), and $matches[1] contains the tag itself (img).
$output = '<img typeof="Image" src="http://asite.dev/sites/default/files/video/MBI_Part%201_v9.jpg" width="1920" height="1080" alt="" />';
// Match either a space followed by a word (an attribute name) or a double-quoted value.
$pattern = '/ \w+|".*?"/';
preg_match_all($pattern, $output, $matches);
// The first word in the string is the tag name; store it under $matches[1].
preg_match('/\w+/', $output, $matches[1]);
print_r($matches);
which gives you
Array ( [0] => Array ( [0] => typeof [1] => "Image" [2] => src [3] => "http://asite.dev/sites/default/files/video/MBI_Part%201_v9.jpg" [4] => width [5] => "1920" [6] => height [7] => "1080" [8] => alt [9] => "" )
[1] => Array ( [0] => img ) )

Related

use regex to get all attributes

My input string is
$center="[{video('whats-new/reaction-vid1.jpg')} width=\"580\" height=\"326\" alt=\"\" video=\"Zb36h4K2IKQ\"]";
$pattern="/\[{video\('([a-zA-Z0-9\/\-\_]+)'\)}\s+(width|height|alt|video)=\"[^\"]+\"\]/";
if(preg_match($pattern,$center,$matches)){
print_r($matches);exit;
}
However, it is not working. Basically, I want to extract the attributes width, height, alt and video. I have tried for half an hour. Can anyone point me in the right direction?
Use preg_match_all instead of preg_match
<?php
$center="[{video('whats-new/reaction-vid1.jpg')} width=\"580\" height=\"326\" alt=\"\" video=\"Zb36h4K2IKQ\"]";
$pattern="~video\('\K[^']*|(?:width|height|alt|video)=\"\K[^\"]*~";
preg_match_all($pattern, $center, $matches);
print_r($matches);
?>
Output:
Array
(
[0] => Array
(
[0] => whats-new/reaction-vid1.jpg
[1] => 580
[2] => 326
[3] =>
[4] => Zb36h4K2IKQ
)
)
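If the attribute names are needed alongside the values, one possible variation is to capture both and build an associative array. This is only a sketch; the variable names $pairs and $attrs are made up for the illustration:

```php
<?php
$center = "[{video('whats-new/reaction-vid1.jpg')} width=\"580\" height=\"326\" alt=\"\" video=\"Zb36h4K2IKQ\"]";

// Group 1 is the attribute name, group 2 the value between the quotes.
// PREG_SET_ORDER returns one [full match, name, value] array per attribute.
preg_match_all('/(width|height|alt|video)="([^"]*)"/', $center, $pairs, PREG_SET_ORDER);

$attrs = [];
foreach ($pairs as $pair) {
    $attrs[$pair[1]] = $pair[2];
}

print_r($attrs);
```

The leading video('...') part is skipped here because it is followed by "(" rather than "=".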

Need to find all "src" elements in text

I need to get all "src" values from some text. The value can be quoted with " or '.
The values are found, but if the element also has id, style... those get grabbed as well.
I need only the src value.
My code:
$html = 'text text <img src="img1.png"/> as as <img src=\'second.gif\' id ="test" /> as';
preg_match_all('/src=("|\')([^"]*)("|\')/', $html, $htmlSrc);
echo '<pre>';
print_r($htmlSrc);
Array
(
[0] => Array
(
[0] => src="img1.png"
[1] => src='second.gif' id ="
)
[1] => Array
(
[0] => "
[1] => '
)
[2] => Array
(
[0] => img1.png
[1] => second.gif' id =
)
[3] => Array
(
[0] => "
[1] => "
)
)
Regular expressions are a poor fit here: you will probably end up with unmaintainable and unreliable code. An HTML parser makes this easy and reliable. You can find an example here: http://simplehtmldom.sourceforge.net/
preg_match_all('/src=("|\')([^"\']*)("|\')/', $html, $htmlSrc);
print_r($htmlSrc[2]);
Seems to work better.
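For comparison, here is a rough sketch of the parser route using PHP's bundled DOMDocument class instead of the library linked above. It assumes the dom extension is enabled, which is the usual default:

```php
<?php
$html = 'text text <img src="img1.png"/> as as <img src=\'second.gif\' id ="test" /> as';

$doc = new DOMDocument();
// Collect libxml warnings internally instead of emitting them for sloppy markup.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$sources = [];
foreach ($doc->getElementsByTagName('img') as $img) {
    // getAttribute() returns the value without the surrounding quotes,
    // whichever quote style the markup used.
    $sources[] = $img->getAttribute('src');
}

print_r($sources);
```

Other attributes such as id or style are simply not read, so they cannot leak into the result.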

Getting the src from text using preg_match

I am adding images to a WP site via a shortcode:
[figure src="" url="" caption=""]
Where the src is the image source, the url is the link to a larger image (if wanted), and the caption is the caption.
I am trying to get the src from the above basing it off this code:
$pattern = '/<img[^>]*src=\"?(?<src>[^\"]*)\"?[^>]*>/im';
preg_match( $pattern, $html, $matches );
if($matches['src']) {
return $matches['src'];
}
But am trying to figure out how to get the [figure] match.
/\[figure(( src="(?<src>[^"]+)")?|( url="(?<url>[^"]+)")?|( caption="(?<caption>[^"]+)")?)*\]/i
[figure url="http://example.com/large.gif" caption="my caption" src="http://example.com/figure.gif"]
Array
(
[0] => [figure url="http://example.com/large.gif" caption="my caption" src="http://example.com/figure.gif"]
[1] =>
[2] => src="http://example.com/figure.gif"
[src] => http://example.com/figure.gif
[3] => http://example.com/figure.gif
[4] => url="http://example.com/large.gif"
[url] => http://example.com/large.gif
[5] => http://example.com/large.gif
[6] => caption="my caption"
[caption] => my caption
[7] => my caption
)
Try this
$foo = '[figure src="" url="" caption=""]';
// The capture group holds the value between the quotes, so no explode() is needed.
if (preg_match('/src="([^"]*)"/i', $foo, $array)) {
echo $array[1];
}
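Another way to approach the [figure] shortcode, sketched under a few assumptions: first grab everything between "[figure" and "]", then read the name="value" pairs in whatever order they appear. The helper name parseFigure is hypothetical, and the sketch assumes that values never contain a double quote or a closing bracket:

```php
<?php
// Hypothetical helper: returns the three attributes, or null if no shortcode is found.
function parseFigure(string $text): ?array
{
    // Step 1: isolate the attribute part of the shortcode.
    if (!preg_match('/\[figure\b([^\]]*)\]/i', $text, $tag)) {
        return null;
    }

    // Step 2: collect every name="value" pair, independent of order.
    preg_match_all('/(\w+)="([^"]*)"/', $tag[1], $pairs, PREG_SET_ORDER);

    $attrs = ['src' => '', 'url' => '', 'caption' => ''];
    foreach ($pairs as $pair) {
        $name = strtolower($pair[1]);
        if (array_key_exists($name, $attrs)) {
            $attrs[$name] = $pair[2];
        }
    }
    return $attrs;
}

$figure = parseFigure('[figure url="http://example.com/large.gif" caption="my caption" src="http://example.com/figure.gif"]');
print_r($figure);
```

Empty values such as src="" are kept as empty strings, and unknown attribute names are ignored.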

Get also non-matching strings from preg_match_all

I have some text with <img> tags in it that I need to divvy up. It's in the format
<img.../> Text text text <img.../>text text text<img.../> text text text
I have my regex working in preg_match_all so that I get
Array
(
[0] => Array
(
[0] => <img ... />
[1] => <img ... />
[2] => <img ... />
[3] => <img ... />
)
But it would be really nice if I could get
Array
(
[0] => Array
(
[0] => <img ... />
[1] => text text text
[2] => <img ... />
[3] => text text text
[4] => <img ... />
[5] => text text text
)
I've tried a few things but I really don't have a good understanding of PCREs. I don't want to use preg_split if I can avoid it because each of the image tags is different.
(I understand that a general HTML parser cannot be written with regular expressions, but in this case, I think this will work because the input data that I'm working is in the form I described. There aren't going to be any nested image tags that I'll need to worry about.)
PS I've tried /!<img.+>/, /!(<img.+>)/, and /(!(<img.+>))/ to get the non-matches, but it returns an empty array. I don't know a good way to debug regexes to know what I'm doing wrong.
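I don't know what your issue (or actual code, for that matter) is, but:
// The capture group makes PREG_SPLIT_DELIM_CAPTURE return the <img> tags as well;
// PREG_SPLIT_NO_EMPTY drops the empty piece before the first tag.
$r = preg_split('#(<img[^>]+>)#', $source, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);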
results in:
Array
(
[0] => <img.../>
[1] => Text text text
[2] => <img.../>
[3] => text text text
[4] => <img.../>
[5] => text text text
)
In place of a proper regex, you can of course keep using your fixed strings (I presume) with #(<img1>|<img2>|<img3>)#; the parentheses are what make PREG_SPLIT_DELIM_CAPTURE return the tags.
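A small runnable sketch of the preg_split approach. The sample $source string and its file names are invented for the illustration:

```php
<?php
// Invented sample input in the shape described in the question.
$source = '<img src="a.png"/> Text one <img src="b.png"/>text two<img src="c.png"/> text three';

// The parentheses capture each delimiter so it is kept in the result;
// PREG_SPLIT_NO_EMPTY removes the empty string before the leading <img>.
$parts = preg_split('#(<img[^>]+>)#', $source, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

// Strip the whitespace that surrounded each text run.
$parts = array_map('trim', $parts);

print_r($parts);
```

Note that two tags separated only by whitespace would leave an empty string after trim(), so a real input may need an extra array_filter() pass.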
You can get the information you want, just not quite in the right format, by doing this:
preg_match_all('~(<img[^>]*>)([^<]+)~', $str, $matches);
//if inside your "text text text" areas you have other html tags, use this:
preg_match_all('~(<img[^>]*>)(.+?)(?=<img|$)~', $str, $matches);
At this point, $matches[0] contains the entire matched string. $matches[1] contains all of the matches from the first set of parentheses and $matches[2] contains all of the matches from the second set of parentheses.
Array (
[0] => Array (
[0] => <img.../> Text text text
[1] => <img.../>text text text
[2] => <img.../> text text text
)
[1] => Array (
[0] => <img.../>
[1] => <img.../>
[2] => <img.../>
)
[2] => Array (
[0] => Text text text
[1] => text text text
[2] => text text text
)
)
Now if you really need it formatted the way you would like, just add these lines of code:
$answer = array();
foreach ($matches[0] as $i => $match) {
    $answer[] = $matches[1][$i];
    $answer[] = $matches[2][$i];
}
$answer now looks like this:
Array (
[0] => <img ... />
[1] => Text text text
[2] => <img ... />
[3] => text text text
[4] => <img ... />
[5] => text text text
)

Extract HTML Tags using preg_split

I have a string
$string = 'this is test <b>bold</b> this is another test <img src="#"> image' ;
I want to split it so that HTML tags and normal text end up in separate elements.
I need output like the following:
[0] => this is test
[1] => <b>bold</b>
[2] => this is another test
[3] => <img src="#">
[4] => image
using this code.
$strip = preg_split('/\s+(?![^<>]+>)/m', $string , -1, PREG_SPLIT_DELIM_CAPTURE) ;
output.
[0] => this
[1] => is
[2] => test
[3] => <b>bold</b>
[4] => this
[5] => .....
I'm a newbie. Please help!
I find it easier to get that result using preg_match_all:
$string = 'this is test <b>bold</b> this is another test <img src="#"> image <hr/>';
preg_match_all('/<([^\s>]+)(.*?)>((.*?)<\/\1>)?|(?<=^|>)(.+?)(?=$|<)/i',$string,$result);
// $result[0] holds the full matches: complete tags and the text runs between them.
// trim() eliminates the preceding and trailing spaces of each piece.
$result = array_map('trim', $result[0]);
print_r($result);
EDIT:
I was assuming there should be at least 1 character in between the opening and the closing of a tag, but it's not necessary so I changed the second + into an * and I took into account the possibility of case insensitivity in tags.
Output:
Array
(
[0] => this is test
[1] => <b>bold</b>
[2] => this is another test
[3] => <img src="#">
[4] => image
[5] => <hr/>
)
EDIT 2:
This won't work with irregular situations such as those exemplified in the comments:
foo<b>bar<i>ital</b>ic</i> or foo<b>bar<b>baz</b>fail</b>
To make it work the RegEx should be tweaked to look inside the matches and process them accordingly.
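As an alternative to tweaking the regex, here is a hedged sketch that lets PHP's bundled DOMDocument do the splitting. The helper name splitTopLevel is hypothetical, the dom extension is assumed to be enabled, and how malformed markup gets repaired depends on libxml:

```php
<?php
// Hypothetical helper: split markup into its top-level tags and text runs.
function splitTopLevel(string $markup): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // keep libxml parse warnings quiet
    // Wrap the input in a <div> so every piece is a direct child of one known node.
    $doc->loadHTML('<div>' . $markup . '</div>');
    libxml_clear_errors();

    $pieces = [];
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    foreach ($wrapper->childNodes as $node) {
        // saveHTML($node) serializes one node: a whole element or a text run.
        $piece = trim($doc->saveHTML($node));
        if ($piece !== '') {
            $pieces[] = $piece;
        }
    }
    return $pieces;
}

$pieces = splitTopLevel('this is test <b>bold</b> this is another test <img src="#"> image');
print_r($pieces);
```

Because the parser builds a tree, an element is returned together with everything nested inside it, rather than being cut at the first closing tag that happens to match.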
