I need to extract all "src" values from a piece of text. The "src" attribute can be quoted with either " or '.
The match itself works, but if the element has other attributes (id, style, ...), they are captured as well.
I need only the src value.
My code:
$html = 'text text <img src="img1.png"/> as as <img src=\'second.gif\' id ="test" /> as';
preg_match_all('/src=("|\')([^"]*)("|\')/', $html, $htmlSrc);
echo '<pre>';
print_r($htmlSrc);
Array
(
[0] => Array
(
[0] => src="img1.png"
[1] => src='second.gif' id ="
)
[1] => Array
(
[0] => "
[1] => '
)
[2] => Array
(
[0] => img1.png
[1] => second.gif' id =
)
[3] => Array
(
[0] => "
[1] => "
)
)
A regex is a poor fit for this: you will probably end up with unmaintainable and unreliable code. An HTML parser is easier and more reliable. You can find an example here: http://simplehtmldom.sourceforge.net/
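As a minimal sketch of the parser approach, the following uses PHP's bundled DOM extension (DOMDocument) rather than the library linked above; that substitution is an assumption made only to keep the example dependency-free. The $sources variable name is illustrative.

```php
<?php
// Sample input taken from the question.
$html = 'text text <img src="img1.png"/> as as <img src=\'second.gif\' id ="test" /> as';

// Collect parser warnings internally instead of printing them,
// since the fragment is not a complete HTML document.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Read the src attribute of every <img> element, whatever quoting it uses.
$sources = [];
foreach ($doc->getElementsByTagName('img') as $img) {
    $sources[] = $img->getAttribute('src');
}

print_r($sources);
```

Because the parser handles quoting and attribute order itself, extra attributes such as id or style do not leak into the result.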
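preg_match_all('/src=("|\')([^"\']*)\1/', $html, $htmlSrc);
print_r($htmlSrc[2]);
Seems to work better: the value class excludes both quote characters, and \1 requires the closing quote to match the opening one.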
I'm trying to find a regex capable of capturing the content of shortcodes produced in WordPress.
My short codes have the following structure:
[shortcode name param1="value1" param2="value2" param3="value3"]
The number of parameters is variable.
I need to capture the shortcode name, plus each parameter name and its value.
The closest result I have achieved is with this:
/(?:\[(.*?)|\G(?!^))(?=[^][]*])\h+([^\s=]+)="([^\s"]+)"/
If I have the following content in the same string:
[specs product="test" category="body"]
[pricelist keyword="216"]
[specs product="test2" category="network"]
I get this:
0=>array(
0=>[specs product="test"
1=> category="body"
2=>[pricelist keyword="216"
3=>[specs product="test2"
4=> category="network")
1=>array(
0=>specs
1=>
2=>pricelist
3=>specs
4=>)
2=>array(
0=>product
1=>category
2=>keyword
3=>product
4=>category)
3=>array(
0=>test
1=>body
2=>216
3=>test2
4=>network)
)
I have tried different regex patterns, but I always end up with the same issue: if there is more than one parameter, it fails to detect it.
Do you have any idea how I could achieve this?
Thanks
Laurent
You could make use of the \G anchor with 3 capture groups, where group 1 is the name of the shortcode and groups 2 and 3 are the key-value pairs.
Then you can remove the first entry of the array (the full matches) and remove the empty entries from the remaining groups.
This is a slightly updated pattern:
(?:\[(?=[^][]*])(\w+)|\G(?!^))\h+(\w+)="([^"]+)"
Example
$pattern = '/(?:\[(?=[^][]*])(\w+)|\G(?!^))\h+(\w+)="([^"]+)"/';
$strings = [
    '[specs product="test" category="body"]',
    '[pricelist keyword="216"]',
    '[specs product="test2" category="network" key="value"]'
];
foreach ($strings as $s) {
    if (preg_match_all($pattern, $s, $matches)) {
        // Drop the full-match entries; only the capture groups are needed.
        unset($matches[0]);
        // Remove the empty strings that group 1 yields for every parameter after the first.
        // Note: array_filter also drops a literal "0" value.
        $matches = array_map('array_filter', $matches);
        print_r($matches);
    }
}
Output
Array
(
[1] => Array
(
[0] => specs
)
[2] => Array
(
[0] => product
[1] => category
)
[3] => Array
(
[0] => test
[1] => body
)
)
Array
(
[1] => Array
(
[0] => pricelist
)
[2] => Array
(
[0] => keyword
)
[3] => Array
(
[0] => 216
)
)
Array
(
[1] => Array
(
[0] => specs
)
[2] => Array
(
[0] => product
[1] => category
[2] => key
)
[3] => Array
(
[0] => test2
[1] => network
[2] => value
)
)
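If one associative array per shortcode is more convenient than three parallel arrays, a two-step variant may be easier to follow. This is only a sketch, not the pattern from the answer above: the parseShortcodes() helper and the 'name' / 'params' keys are hypothetical, and it assumes every parameter value is double-quoted.

```php
<?php
// Hypothetical helper: returns one ['name' => ..., 'params' => [...]] entry per shortcode.
function parseShortcodes(string $text): array
{
    $result = [];
    // Step 1: capture the shortcode name (group 1) and its raw parameter string (group 2).
    $tagPattern = '/\[(\w+)((?:\h+\w+="[^"]*")*)\h*\]/';
    if (!preg_match_all($tagPattern, $text, $tags, PREG_SET_ORDER)) {
        return $result;
    }
    foreach ($tags as $tag) {
        // Step 2: split the parameter string into parallel name and value lists.
        preg_match_all('/(\w+)="([^"]*)"/', $tag[2], $attrs);
        $result[] = [
            'name'   => $tag[1],
            'params' => array_combine($attrs[1], $attrs[2]),
        ];
    }
    return $result;
}

print_r(parseShortcodes('[specs product="test" category="body"] [pricelist keyword="216"]'));
```

Splitting the work into two patterns avoids the \G anchor entirely, at the cost of scanning each parameter string twice.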
I need to fix this regex, which I use with PHP's preg_match_all function to extract HTML attributes into an array:
(\S+)=["']?((?:.(?!["']?\s+(?:\S+)=|[>"']))+.)["']?
An example attribute string:
style="width: 462px;" src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAg4AAALoCAYAAAAQpn2mAAAABHNCSVQICAgIfAhkiAAAABl0RVh0U29mdHdhcmUAZ25vbWUtc2NyZWVuc2hvdO8Dv4AACAASURBVHic7L15fNTVufj/PjOTyWSyTfaEJBD2EJBNQFQEtFVRXMD7VQG1dfu2tLW92t77unaxam+t9nbTXze9tW61Vdqvgre9FXcqUHFBFiUEkX0PgSQkmf1zzu+Pzz6ZhBBwg3l4kZn5fM7yPM8553me85znnAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAqIiy66SDXM/SW7DyUQgEIBAiFAKTOZQn8p7N/OQhB6PgFCgUI43ull6mmwyhUolFWJMB.......=" data-filename="Screenshot from 2016-02-09 21:54:47.png"
Working example on regex101: https://regex101.com/r/QE9XGD/1
Because of the equals sign at the end of the src attribute, I get the wrong array:
Array
(
[0] => Array
(
[0] => style="width: 462px;"
[1] => src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAg4AAALoCAYAAAAQpn2mAAAABHNCSVQICAgIfAhkiAAAABl0RVh0U29mdHdhcmUAZ25vbWUtc2NyZWVuc2hvdO8Dv4AACAASURBVHic7L15fNTVufj/PjOTyWSyTfaEJBD2EJBNQFQEtFVRXMD7VQG1dfu2tLW92t77unaxam+t9nbTXze9tW61Vdqvgre9FXcqUHFBFiUEkX0PgSQkmf1zzu+Pzz6ZhBBwg3l4kZn5fM7yPM8553me85znnAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAqIiy66SDXM/SW7DyUQgEIBAiFAKTOZQn8p7N/OQhB6PgFCgUI43ull6mmwyhUolFWJMB.......=" data-filename="
)
[1] => Array
(
[0] => style
[1] => src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAg4AAALoCAYAAAAQpn2mAAAABHNCSVQICAgIfAhkiAAAABl0RVh0U29mdHdhcmUAZ25vbWUtc2NyZWVuc2hvdO8Dv4AACAASURBVHic7L15fNTVufj/PjOTyWSyTfaEJBD2EJBNQFQEtFVRXMD7VQG1dfu2tLW92t77unaxam+t9nbTXze9tW61Vdqvgre9FXcqUHFBFiUEkX0PgSQkmf1zzu+Pzz6ZhBBwg3l4kZn5fM7yPM8553me85znnAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAqIiy66SDXM/SW7DyUQgEIBAiFAKTOZQn8p7N/OQhB6PgFCgUI43ull6mmwyhUolFWJMB.......
)
[2] => Array
(
[0] => width: 462px;
[1] => data-filename=
)
)
The correct array should look like this:
Array
(
[0] => Array
(
[0] => style="width: 462px;"
[1] => src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAg4AAALoCAYAAAAQpn2mAAAABHNCSVQICAgIfAhkiAAAABl0RVh0U29mdHdhcmUAZ25vbWUtc2NyZWVuc2hvdO8Dv4AACAASURBVHic7L15fNTVufj/PjOTyWSyTfaEJBD2EJBNQFQEtFVRXMD7VQG1dfu2tLW92t77unaxam+t9nbTXze9tW61Vdqvgre9FXcqUHFBFiUEkX0PgSQkmf1zzu+Pzz6ZhBBwg3l4kZn5fM7yPM8553me85znnAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAqIiy66SDXM/SW7DyUQgEIBAiFAKTOZQn8p7N/OQhB6PgFCgUI43ull6mmwyhUolFWJMB.......="
[2] => data-filename="Screenshot from 2016-02-09 21:54:47.png"
)
[1] => Array
(
[0] => style
[1] => src
[2] => data-filename
)
[2] => Array
(
[0] => width: 462px;
[1] => data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAg4AAALoCAYAAAAQpn2mAAAABHNCSVQICAgIfAhkiAAAABl0RVh0U29mdHdhcmUAZ25vbWUtc2NyZWVuc2hvdO8Dv4AACAASURBVHic7L15fNTVufj/PjOTyWSyTfaEJBD2EJBNQFQEtFVRXMD7VQG1dfu2tLW92t77unaxam+t9nbTXze9tW61Vdqvgre9FXcqUHFBFiUEkX0PgSQkmf1zzu+Pzz6ZhBBwg3l4kZn5fM7yPM8553me85znnAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAqIiy66SDXM/SW7DyUQgEIBAiFAKTOZQn8p7N/OQhB6PgFCgUI43ull6mmwyhUolFWJMB.......=
[2] => Screenshot from 2016-02-09 21:54:47.png
)
)
How can I fix this regex to get the correct result?
Keep in mind that I use this regex not just for extracting image attributes; it is meant to be a universal regex for all types of HTML tags.
(\S+?)=["']?((?:.(?!["']?\s+(?:\S+)=|[>"']))+.)["']?
The change is to make the attribute name evaluation lazy, so it only eats until it finds an =.
That being said, I'm fairly confident this regex can be reduced.
([^\s=]+)=('?)("?)([^>"']*)\2\3 is probably the best option.
It takes about 2% of the time of the lazy version and handles both single- and double-quoted attributes. The big change here is that the capture groups you want are now the 1st and 4th. As far as I'm aware, this will work on any HTML except mixed quoting such as tag='"value'
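A short sketch of how that pattern might be used from PHP follows. The input is the question's attribute string with the base64 payload shortened to AAAA= (a stand-in, not the real data), and the $map variable is illustrative. It assumes every value is quoted; note that the value class [^>"']* also stops at any quote or > inside a value.

```php
<?php
// Shortened version of the question's attribute string (the base64 payload is a stand-in).
$attrs = 'style="width: 462px;" src="data:image/png;base64,AAAA=" '
       . 'data-filename="Screenshot from 2016-02-09 21:54:47.png"';

// Group 1 is the attribute name, group 4 the value;
// groups 2 and 3 only remember which quote character opened the value.
$pattern = '/([^\s=]+)=(\'?)("?)([^>"\']*)\2\3/';
preg_match_all($pattern, $attrs, $m);

// Pair names with values to get a name => value map.
$map = array_combine($m[1], $m[4]);
print_r($map);
```

The trailing = of the src value stays inside group 4 because the name group [^\s=]+ can never start in the middle of a quoted value that has already been consumed.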
regex101
I'm trying to split this image string: $output = "<img typeof="foaf:Image" src="http://asite.dev/sites/default/files/video/MBI_Part%201_v9.jpg" width="1920" height="1080" alt="" />
I'm doing it like this: $split = explode('"', $output);
But when I call print_r($split); it returns:
Array ( [0] => typeof="foaf:Image" [2] => src="http://makingitcount.dev/sites/default/files/video/MBI_Part%201_v9.jpg" [3] => width="1920" [4] => height="1080" [5] => alt="" [6] => /> )
No second value! Where did it go? $split[1] throws an error, of course. I also notice that the "<img" part of the string isn't in the array either.
The problem stems from the parsing of the HTML tag. If you remove the <img at the beginning of the HTML string, you'll notice that the rest of the attributes parse into an array with a proper index sequence (including a '1' element). You can solve your problem by formatting your quotes so that PHP does not parse the HTML and treats the entire unit strictly as a string.
If you want to bypass this whole mess, you can also use regular expression matching to collect the tag information and pass it into an array. $matches[0][*] will contain all of your tag attributes, and $matches[1] contains the tag itself (img):
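One plausible way to check whether the elements are really missing is to escape the print_r output before it reaches the browser. This sketch rests on two assumptions that the question does not confirm: that the output was viewed in a browser, and that the string was split on spaces (the indices in the posted output suggest a space delimiter).

```php
<?php
// The image string from the question, single-quoted so the inner double quotes are safe.
$output = '<img typeof="foaf:Image" src="http://asite.dev/sites/default/files/video/MBI_Part%201_v9.jpg" width="1920" height="1080" alt="" />';

// Assumption: split on spaces, which matches the indices shown in the question.
$split = explode(' ', $output);

// Escape the dump so a browser shows "<img" as text instead of rendering it as a tag.
echo '<pre>' . htmlspecialchars(print_r($split, true)) . '</pre>';
```

Under these assumptions $split[0] holds "<img" and $split[1] holds the typeof attribute, so nothing is lost from the array itself.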
$output = '<img typeof="Image" src="http://asite.dev/sites/default/files/video/MBI_Part%201_v9.jpg" width="1920" height="1080" alt="" />';
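// Match each attribute name (a space followed by word characters) or each double-quoted value.
$pattern = '/ \w+|".*?"/';
preg_match_all($pattern, $output, $matches);
// The first run of word characters in the string is the tag name (img).
preg_match('/\w+/', $output, $matches[1]);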
print_r($matches);
which gives you
Array ( [0] => Array ( [0] => typeof [1] => "Image" [2] => src [3] => "http://asite.dev/sites/default/files/video/MBI_Part%201_v9.jpg" [4] => width [5] => "1920" [6] => height [7] => "1080" [8] => alt [9] => "" )
[1] => Array ( [0] => img ) )
I have an input string from users. The input is unpredictable: users can enter any string they like.
I would like to filter the input that matches the following pattern and return it as an array.
The following strings should work:
product=bag, product=tshirt, product=shoes
product=bag status=sold, product=jeans, product=shoes
product=all
I would like the output to be an array like the one below:
Array(
[0] => Array
(
[product] => bag
[status] => sold
)
[1] => Array
(
[product] => jeans
)
[2] => Array
(
[product] => shoes
)
)
I guess it can be achieved with preg_match_all() alongside explode(). Can anyone give me an example using preg_match_all()? Any other approach is fine too, as long as it is the best method.
$string = 'product=bag status=sold, product=tshirt, product=shoes';
$m = preg_match_all('/needregexrulehere/', $string, $matches);
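You don't need a regular expression for this. You can do something like this:
$return = array();
// str_getcsv() splits the input on commas into one entry per product.
foreach (str_getcsv($string) as $line) {
    // Turn "a=1 b=2" into "a=1&b=2" so parse_str() can read it like a query string.
    parse_str(str_replace(' ', '&', $line), $temp);
    $return[] = $temp;
}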
This will output:
Array
(
[0] => Array
(
[product] => bag
[status] => sold
)
[1] => Array
(
[product] => tshirt
)
[2] => Array
(
[product] => shoes
)
)
I will leave error checking / input sanitation up to the OP.
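Since the question also asks for a preg_match_all example, here is a minimal sketch of that route. The parsePairs() helper is hypothetical, and it assumes keys and values consist only of word characters (letters, digits, underscore); comma-separated segments that contain no key=value pair are skipped.

```php
<?php
// Hypothetical helper: one associative array per comma-separated segment.
function parsePairs(string $input): array
{
    $rows = [];
    foreach (explode(',', $input) as $segment) {
        // Collect every key=value pair in this segment; ignore segments without any.
        if (preg_match_all('/(\w+)=(\w+)/', $segment, $m)) {
            $rows[] = array_combine($m[1], $m[2]);
        }
    }
    return $rows;
}

print_r(parsePairs('product=bag status=sold, product=jeans, product=shoes'));
```

Unlike the parse_str() version, this ignores free text around the pairs, which may suit unpredictable user input better.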
I have some text strings like this
{hello|hi}{there|you}
I want to count the instances of {..anything..}, so in the example above, I would want to return:
hello|hi
there|you
in the matches array created by preg_match_all()
Right now my code looks like:
preg_match_all('/{(.*?)}/', $text,$text_pieces);
And $text_pieces contains:
Array ( [0] => Array ( [0] => {hello|hi} [1] => {there|you} ) [1] => Array ( [0] => hello|hi [1] => there|you ) )
All I need is this:
[0] => hello|hi [1] => there|you
preg_match_all always returns the full-pattern matches at index 0 alongside the subpattern matches, so the simplest solution is to set $text_pieces to $text_pieces[1] after the function call:
if(preg_match_all('/{(.*?)}/', $text,$text_pieces))
{
$text_pieces = $text_pieces[1];
}
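The same idea can be wrapped in a small function so the caller only ever sees the inner texts. This is a sketch; the bracedGroups() name is hypothetical, and the braces are escaped purely for readability.

```php
<?php
// Hypothetical helper: returns only the text between each pair of braces.
function bracedGroups(string $text): array
{
    // preg_match_all() returns the number of matches; 0 (or false) yields an empty list.
    return preg_match_all('/\{(.*?)\}/', $text, $m) ? $m[1] : [];
}

$pieces = bracedGroups('{hello|hi}{there|you}');
print_r($pieces);

// count() gives the number of {...} groups found.
echo count($pieces), PHP_EOL;
```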