Regular expression in PHP to return array with all images from html, eg: all src="images/header.jpg" instances - php

I'd like to return an array listing all images (the src="" values) from a piece of HTML, e.g.:
[0] = "images/header.jpg"
[1] = "images/person.jpg"
Is there a regular expression that can do this?
Many thanks in advance!

Welcome to the world of the millionth "how to extract these values using regex" question ;-) I suggest using the search tool before asking. Here is just a handful of topics that provide code to do exactly what you need:
replacing all image src tags in HTML text
getting image src in php
How to extract img src, title and alt from html using php?
Matching SRC attribute of IMG tag using preg_match
php regex : get src value
Dynamically replace the “src” attributes of all <img> tags (redux)
preg_match_all , get all img tag that include a string

/src="([^"]+)"/
The image will be in group 1.
Example:
preg_match_all('/src="([^"]+)"/', '<img src="lol"><img src="wat">', $arr, PREG_PATTERN_ORDER);
$arr then contains:
Array
(
[0] => Array
(
[0] => src="lol"
[1] => src="wat"
)
[1] => Array
(
[0] => lol
[1] => wat
)
)

Here is a more polished version of the regular expression provided by Håvard:
/(?<=src=")[^"]+(?=")/
This expression uses Lookahead & Lookbehind Assertions to get only what you want.
$str = '<img src="/img/001.jpg"><img src="/img/002.jpg">';
preg_match_all('/(?<=src=")[^"]+(?=")/', $str, $srcs, PREG_PATTERN_ORDER);
print_r($srcs);
The output will look like the following:
Array
(
[0] => Array
(
[0] => /img/001.jpg
[1] => /img/002.jpg
)
)

I see that many people struggle with Håvard's post and the <script> issue. Here is the same solution in a stricter form:
<img.*?src="([^"]+)".*?>
Example:
preg_match_all('/<img.*?src="([^"]+)".*?>/', '<img src="lol"><img src="wat">', $arr, PREG_PATTERN_ORDER);
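$arr then contains:
Array
(
[0] => Array
(
[0] => <img src="lol">
[1] => <img src="wat">
)
[1] => Array
(
[0] => lol
[1] => wat
)
)
Because the pattern is anchored to <img, the src attributes of other tags such as <script> are not matched.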
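Since the question asks for a flat array, here is a minimal sketch that wraps the stricter pattern in a helper and returns only the captured values. The function name extract_img_srcs is hypothetical, and accepting single quotes as well as double quotes is an added assumption, not part of the answers above.

```php
<?php
// Hypothetical helper: returns the src values of all <img> tags as a flat array.
// Assumes attribute values are quoted with either " or '.
function extract_img_srcs(string $html): array
{
    // Group 1 captures the opening quote, group 2 the src value,
    // and \1 requires the same quote character to close it.
    $pattern = '/<img\b[^>]*?\bsrc\s*=\s*(["\'])(.*?)\1/is';
    preg_match_all($pattern, $html, $matches, PREG_PATTERN_ORDER);

    return $matches[2];
}

$html = '<script src="app.js"></script>'
      . '<img src="images/header.jpg" alt="">'
      . "<img alt='x' src='images/person.jpg'>";

print_r(extract_img_srcs($html));
```

The script tag is skipped because the pattern only starts matching at <img. For arbitrary real-world HTML a DOM parser is still the sturdier choice.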

Related

PHP preg_match_all cutting out text elements

Got a question.
I have a HTML string that I have imported into PHP. file_get_contents.
What I want to do is to cut out some strings at the hand of tags like {{ or [tagname] and [/tagname]
And then return them in an array so I can process them.
For example :
{{header.tpl}} to have PHP load the header.tpl file and replace {{header.tpl}} with its contents.
I THINK that regex is the way to go. But that's exactly where my weak spot is. I have tried but to no avail.
I got as far as the following code:
<?php
$text = '
Hi this is a text<br />
[#title]
<br />
{{header.tpl}}
<br />link{{menu.tpl}}<br />
<hr/>
<h1>[#subtitle]</h1>
[#content]
{submenu}
{itemactive}<strong>[#link]</strong>{/itemactive}
{itema}[#link]{/item}
{/submenu}
';
$pattern = '^\{{.*}}^';
preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);
print_r($matches);
?>
It does give some results:
Array
(
[0] => Array
(
[0] => {{header.tpl}}
)
[1] => Array
(
[0] => {{menu.tpl}}
)
)
Is this what I want?
No, but... close!
The problem shows up when I collapse the nicely formatted $text string into one long line, like this:
$text = 'Hi this is a text<br />[#title]<br />{{header.tpl}}<br />link{{menu.tpl}}<br /><hr/><h1>[#subtitle]</h1>[#content]';
It goes wrong!
The result will become:
Array
(
[0] => Array
(
[0] => {{header.tpl}}<br />link{{menu.tpl}}
)
)
And even then I want the result to be just like the one above!
Then the second problem...
I think I should use the same option for getting the submenu.
Something like :
$pattern = '^\{submenu}.*{/submenu}^';
But strangely that does not work. :-(
And all that I get is:
Array
(
)
Would anyone be able to tell me what I am doing wrong?
TIAD!!
You were close.
The problem with ^\{{.*}}^ is that .* is greedy: it matches as much as possible, so on a single line it runs through to the last }}. Either make it non-greedy with .*? or exclude the closing brace, as in the regex below.
A better regex would be
\{\{[^}]+}}
Example : http://regex101.com/r/gF4jZ6/1
\{\{ matches the {{
[^}]+ matches anything other than a }
}} matches }}
This gives the output:
Array
(
[0] => Array
(
[0] => {{header.tpl}}
)
[1] => Array
(
[0] => {{menu.tpl}}
)
)
Note the difference between the two fixes: .*? stops at the first }} it finds, while [^}]+ cannot run past any } at all.
Now, in order to match the submenu, just add the s flag so that . matches newlines as well:
$pattern = '/\{submenu}.*?{\/submenu}/s';
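To make the greedy versus non-greedy behaviour concrete, here is a small sketch that runs the three variants on a one-line string, followed by the submenu pattern with and without the s flag. The shortened $text and $tpl strings are made up for illustration.

```php
<?php
// Shortened one-line sample (an assumption, modelled on the question's input).
$text = 'Hi<br />{{header.tpl}}<br />link{{menu.tpl}}<br />';

preg_match_all('/\{\{.*\}\}/', $text, $greedy);      // greedy: runs to the last }}
preg_match_all('/\{\{.*?\}\}/', $text, $lazy);       // lazy: stops at the first }}
preg_match_all('/\{\{[^}]+\}\}/', $text, $negated);  // negated class: cannot cross a }

print_r($greedy[0]);   // one match spanning both tags
print_r($lazy[0]);     // {{header.tpl}} and {{menu.tpl}}
print_r($negated[0]);  // {{header.tpl}} and {{menu.tpl}}

// The submenu block spans several lines, so . must be allowed to match newlines.
$tpl = "{submenu}\n{item}[#link]{/item}\n{/submenu}";

$withoutS = preg_match('/\{submenu\}(.*?)\{\/submenu\}/', $tpl, $none);  // 0: no match
$withS    = preg_match('/\{submenu\}(.*?)\{\/submenu\}/s', $tpl, $sub);  // 1: match

echo $sub[1];  // the text between {submenu} and {/submenu}
```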

How to get a particular string using preg_replace?

I want to get a particular value from a string in PHP. This is the string:
$string = 'users://data01=[1,2]/data02=[2,3]/*';
preg_replace('/(.*)\[(.*)\](.*)\[(.*)\](.*)/', '$2', $string);
I want to get the value of data01, i.e. [1,2].
How can I achieve this using preg_replace?
preg_replace() is the wrong tool for this. I have used preg_match_all() instead, in case you need the other item later, and trimmed your regex down to capture just the part of the string you are looking for.
$string = 'users://data01=[1,2]/data02=[2,3]/*';
preg_match_all('/\[([0-9,]+)\]/',$string,$match);
print_r($match);
/*
print_r($match) output:
Array
(
[0] => Array
(
[0] => [1,2]
[1] => [2,3]
)
[1] => Array
(
[0] => 1,2
[1] => 2,3
)
)
*/
echo "Your match: " . $match[1][0];
This gives you both the full matched pattern and the captured characters, so you can have [1,2] or just 1,2.
preg_replace is used to replace text via a regular expression.
I think you want preg_match_all() instead, to get each data attribute from the string.
The regex you want is:
$string = 'users://data01=[1,2]/data02=[2,3]/*';
preg_match_all('#data[0-9]{2}=(\[[0-9,]+\])#',$string,$matches);
print_r($matches);
Array
(
[0] => Array
(
[0] => data01=[1,2]
[1] => data02=[2,3]
)
[1] => Array
(
[0] => [1,2]
[1] => [2,3]
)
)
I have tested this as working.
preg_replace is for replacing stuff. preg_match is for extracting stuff.
So you want:
preg_match('/(.*?)\[(.*?)\](.*?)\[(.*?)\](.*)/', $string, $match);
var_dump($match);
See what you get, and work from there.
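If only one named value such as data01 is needed, the pattern can be anchored to that key. The sketch below assumes a hypothetical helper get_data_value (it needs PHP 7.1 or later for the nullable return type), and also shows how the same result could be forced out of preg_replace, as the question originally asked.

```php
<?php
$string = 'users://data01=[1,2]/data02=[2,3]/*';

// Hypothetical helper: returns the bracketed value for one key, or null if absent.
function get_data_value(string $input, string $key): ?string
{
    // preg_quote() escapes any regex metacharacters in the key.
    $pattern = '/' . preg_quote($key, '/') . '=(\[[^\]]*\])/';

    return preg_match($pattern, $input, $m) === 1 ? $m[1] : null;
}

echo get_data_value($string, 'data01'), "\n";  // [1,2]
echo get_data_value($string, 'data02'), "\n";  // [2,3]

// With preg_replace, the pattern has to consume the whole string
// so that only the captured group is left after the replacement.
$viaReplace = preg_replace('/^.*?data01=(\[[^\]]*\]).*$/', '$1', $string);
echo $viaReplace, "\n";  // [1,2]
```

The preg_replace variant returns the input unchanged when the key is missing, which is one reason preg_match is the more natural fit here.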

How to replace html tag with other tags using preg-match?

I have a string like the following.
<label>value1<label>:value<br>
<label>value2<label>:value<br>
<label>value3<label>:value<br>
and I need to rearrange it as follows:
<li><label>value1<label><span>value</span><li>
I have been trying for the last 2 days, but no luck. Any help?
This really isn't something you should do with regex. You might be able to fudge together a solution that works provided it makes a lot of assumptions about the content it's parsing, but it will always be fragile and liable to break should that content deviate from the expected by any significant degree.
A better bet is using PHP's DOM family of classes. I'm not really at liberty to write the code for you (and that's not what SO is for anyway), but I can give you a pointer regarding the steps you need to follow.
Locate text nodes that follow a label and precede a BR (XPath may be useful here)
Put the text node into a span.
Insert the span into the DOM after the label
Remove the BR.
Wrap label and span in an li.
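The steps above can be sketched roughly as follows with DOMDocument and DOMXPath. This is only an illustration under two assumptions that differ from the question's sample: the labels are properly closed with </label>, and the input is wrapped in a <div>.

```php
<?php
// Assumption: well-formed labels (</label>), wrapped in a div for parsing.
$html = "<div><label>value1</label>:value<br>\n"
      . "<label>value2</label>:value<br>\n"
      . "<label>value3</label>:value<br></div>";

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$items = [];
foreach (iterator_to_array($xpath->query('//label')) as $label) {
    // Step 1: the text node that directly follows the label.
    $text = $label->nextSibling;
    if ($text === null || $text->nodeType !== XML_TEXT_NODE) {
        continue;
    }
    $br = $text->nextSibling;

    // Step 2: put the text (minus the leading colon) into a span.
    $span = $doc->createElement('span');
    $span->appendChild($doc->createTextNode(ltrim(trim($text->nodeValue), ':')));

    // Steps 3 and 5: wrap label and span in an li placed where the label was.
    $li = $doc->createElement('li');
    $label->parentNode->insertBefore($li, $label);
    $li->appendChild($label);  // appendChild moves the existing node
    $li->appendChild($span);

    // Step 4: remove the original text node and the br.
    $text->parentNode->removeChild($text);
    if ($br !== null && $br->nodeName === 'br') {
        $br->parentNode->removeChild($br);
    }

    $items[] = $doc->saveHTML($li);
}

echo implode("\n", $items), "\n";
```

Depending on the libxml version, the serializer may add line breaks inside each li, so the exact whitespace of the output can vary.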
If you must use regex anyway, you can proceed as follows:
$string = <<<TOK
<label>value1<label>:value<br>
<label>value2<label>:value<br>
<label>value3<label>:value<br>
TOK;
preg_match_all('/<label>(.*?)<label>\:(.*?)<br>/s', $string, $matches);
print_r($matches);
/*
Array
(
[0] => Array
(
[0] => <label>value1<label>:value<br>
[1] => <label>value2<label>:value<br>
[2] => <label>value3<label>:value<br>
)
[1] => Array
(
[0] => value1
[1] => value2
[2] => value3
)
[2] => Array
(
[0] => value
[1] => value
[2] => value
)
)
*/
$content = "";
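// Iterate over the captured labels: $matches itself holds one entry per capture group, not per row
foreach($matches[1] as $key => $label)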
{
$content.= "<li><label>{$matches[1][$key]}<label><span>{$matches[2][$key]}</span><li>\n";
}
echo($content);
/*
Output:
<li><label>value1<label><span>value</span><li>
<li><label>value2<label><span>value</span><li>
<li><label>value3<label><span>value</span><li>
*/

Regular Expression with wordpress shortcodes

I'm trying to find all shortcodes within a string which looks like this:
 [a_col] One
 [/a_col]
outside
[b_col]
Two
[/b_col] [c_col] Three [/c_col]
I need the content (eg "Three") and the letter from the col (a, b or c)
Here's the expression I'm using
preg_match_all('#\[(a|b|c)_col\](.*)\[\/\1_col\]#m', $string, $hits);
but $hits contains only the last one.
The content can have any character even "[" or "]"
EDIT:
I would like to get "outside" as well which can be any string (except these cols). How can I handle that or should I parse this in a second step?
This will capture anything in the content, as well as attributes, and will allow any characters in the content.
<?php
$input = '[a_col some="thing"] One[/a_col]
[b_col] Two [/b_col]
[c_col] [Three] [/c_col] ';
preg_match_all('#\[(a|b|c)_col([^\[]*)\](.*?)\[\/\1_col\]#msi', $input, $matches);
print_r($matches);
?>
EDIT:
You may want to then trim the matches, since it appears there may be some whitespace. Alternatively, you can use regex for removing the whitespace in the content:
preg_match_all('#\[(a|b|c)_col([^\[]*)\]\s*(.*?)\s*\[\/\1_col\]#msi', $input, $matches);
OUTPUT:
Array
(
[0] => Array
(
[0] => [a_col some="thing"] One[/a_col]
[1] => [b_col] Two [/b_col]
[2] => [c_col] [Three] [/c_col]
)
[1] => Array
(
[0] => a
[1] => b
[2] => c
)
[2] => Array
(
[0] => some="thing"
[1] =>
[2] =>
)
[3] => Array
(
[0] => One
[1] => Two
[2] => [Three]
)
)
It might also be helpful to use this for capturing the attribute names and values stored in $matches[2]. Consider $atts to be the first element in $matches[2]. Of course, you would iterate over the array of attributes and perform this on each one.
preg_match_all('#([^="\'\s]+)[\t ]*=[\t ]*("|\')(.*?)\2#', $atts, $att_matches);
This gives an array where the names are stored in $att_matches[1] and their corresponding values are stored in $att_matches[3].
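Putting the two patterns together, a possible sketch of that iteration looks like this. The sample input and the shape of the resulting $shortcodes array are assumptions made for illustration; PREG_SET_ORDER is used so that each match arrives as one row.

```php
<?php
// Made-up sample input with two attributes on the first shortcode.
$input = '[a_col some="thing" other=\'x y\'] One[/a_col] [b_col] Two [/b_col]';

preg_match_all(
    '#\[(a|b|c)_col([^\[]*)\]\s*(.*?)\s*\[\/\1_col\]#msi',
    $input,
    $matches,
    PREG_SET_ORDER
);

$shortcodes = [];
foreach ($matches as $m) {
    // $m[1] = column letter, $m[2] = raw attribute string, $m[3] = content.
    $atts = [];
    preg_match_all(
        '#([^="\'\s]+)[\t ]*=[\t ]*("|\')(.*?)\2#',
        $m[2],
        $att_matches,
        PREG_SET_ORDER
    );
    foreach ($att_matches as $a) {
        $atts[$a[1]] = $a[3];  // attribute name => attribute value
    }

    $shortcodes[] = ['col' => $m[1], 'atts' => $atts, 'content' => $m[3]];
}

print_r($shortcodes);
```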
Use ((.|\n)*) instead of (.*) to capture multiple lines:
<?php
$string = "
[a_col] One
[/a_col]
[b_col]
Two
[/b_col] [c_col] Three [/c_col]";
preg_match_all('#\[(a|b|c)_col\]((.|\n)*)\[\/\1_col\]#m', $string, $hits);
echo "<textarea style='width:90%;height:90%;'>";
print_r($hits);
echo "</textarea>";
?>
I don't have an environment I can test with here but you could use a look behind and look ahead assertion and a back reference to match tags around the content. Something like this.
(?<=\[(\w)\]).*(?=\[\/\1\])

PHP Regexp: ignoring everything before a defined substring

I'm trying to parse a web page.
Basically it gets stored in a string that will look like this:
"[HTML CODE ...]world:[HTML CODE ...]my_number[REST OF HTML_CODE ...]"
Of course "world:" and "MY_NUMBER" are part of the html code, however I would like to ignore everything before the first occurrence of "world:". What I need is the first number that appears after the first occurrence of "world:", keeping in mind that a bunch of html code will be between those.
I could substring the html code but I would like to do this all just by using a single regex if possible.
This is the regular expression I tried to match:
'/(?<=world:)\D+?[0-9]+/'
But this returns me all the html stuff between "world:" and my number.
Thanks!
I think you were close to getting it. I was able to use this on the string you provided.
$subject = "[HTML CODE ...]world:[HTML CODE ...]3334[REST OF HTML_CODE ...]";
$pattern = "/world:\D+?(?<my_number>[0-9]+)/";
$matches = array();
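// preg_match_all() already takes $matches by reference; call-time &$matches is a fatal error since PHP 5.4
$result = preg_match_all($pattern, $subject, $matches);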
print_r($matches);
Results in:
Array
(
[0] => Array
(
[0] => world:[HTML CODE ...]3334
)
[my_number] => Array
(
[0] => 3334
)
[1] => Array
(
[0] => 3334
)
)
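Because only the first number after the first "world:" is wanted, preg_match (which stops at the first match) may be enough. The sketch below shows two options: a plain capture group, and \K, which discards everything matched so far from the reported match. The extra "world: 42" at the end of the sample string is an assumption added to show that later occurrences are ignored.

```php
<?php
$subject = "[HTML CODE ...]world:[HTML CODE ...]3334[REST OF HTML_CODE ...]world: 42";

// Option 1: capture group. The number is in $m[1]; $m[0] still contains "world:" and the HTML.
preg_match('/world:\D*?([0-9]+)/', $subject, $m);
echo $m[1], "\n";  // 3334

// Option 2: \K resets the start of the match, so $k[0] is the number alone.
preg_match('/world:\D*\K[0-9]+/', $subject, $k);
echo $k[0], "\n";  // 3334
```

The original lookbehind attempt returned the HTML as well because \D+? sits outside the lookbehind and is therefore part of the match itself.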
