How to replace html tag with other tags using preg-match? - php

I have a string like the following.
<label>value1<label>:value<br>
<label>value2<label>:value<br>
<label>value3<label>:value<br>
and I need to rearrange it as follows:
<li><label>value1<label><span>value</span><li>
I have been trying for the last 2 days, but no luck. Any help?

This really isn't something you should do with regex. You might be able to fudge together a solution that works provided it makes a lot of assumptions about the content it's parsing, but it will always be fragile and liable to break should that content deviate from the expected by any significant degree.
A better bet is using PHP's DOM family of classes. I'm not really at liberty to write the code for you (and that's not what SO is for anyway), but I can give you a pointer regarding the steps you need to follow.
Locate text nodes that follow a label and precede a BR (XPath may be useful here)
Put the text node into a span.
Insert the span into the DOM after the label
Remove the BR.
Wrap the label and the span in an li.
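The steps above can be sketched roughly as follows. This is only an illustration, not a drop-in solution: it assumes the labels are properly closed with </label> (unlike the sample in the question), that PHP's DOM extension is available, and it uses a wrapper div that is purely an artifact of this sketch.

```php
<?php
// Assumption: well-formed labels (</label>), unlike the markup in the question.
$html = "<label>value1</label>:value<br>\n<label>value2</label>:value<br>";

$doc = new DOMDocument();
// Wrap the fragment in a div so there is a single container to read back from.
$doc->loadHTML('<div>' . $html . '</div>');
$xpath = new DOMXPath($doc);

// Copy the node list to an array first, because the loop modifies the tree.
foreach (iterator_to_array($xpath->query('//label')) as $label) {
    $text = $label->nextSibling;
    if (!$text instanceof DOMText) {
        continue; // no text node directly after the label
    }
    $br = $text->nextSibling;
    if (!$br instanceof DOMElement || $br->nodeName !== 'br') {
        continue; // the text is not followed by a <br>
    }

    // Put the text (minus the leading colon) into a span.
    $span = $doc->createElement('span');
    $span->appendChild($doc->createTextNode(ltrim(trim($text->nodeValue), ':')));

    // Wrap label and span in an li, then drop the old text node and the <br>.
    $li = $doc->createElement('li');
    $label->parentNode->replaceChild($li, $label);
    $li->appendChild($label);
    $li->appendChild($span);
    $text->parentNode->removeChild($text);
    $br->parentNode->removeChild($br);
}

// Serialize only the children of the wrapper div.
$result = '';
foreach ($doc->getElementsByTagName('div')->item(0)->childNodes as $child) {
    $result .= $doc->saveHTML($child);
}
echo $result;
```

Because the tree is edited node by node, extra attributes or whitespace in the markup do not break the transformation the way they would break a fixed pattern.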

If you must use a regex anyway, you can do it as follows:
$string = <<<TOK
<label>value1<label>:value<br>
<label>value2<label>:value<br>
<label>value3<label>:value<br>
TOK;
preg_match_all('/<label>(.*?)<label>\:(.*?)<br>/s', $string, $matches);
print_r($matches);
/*
Array
(
[0] => Array
(
[0] => <label>value1<label>:value<br>
[1] => <label>value2<label>:value<br>
[2] => <label>value3<label>:value<br>
)
[1] => Array
(
[0] => value1
[1] => value2
[2] => value3
)
[2] => Array
(
[0] => value
[1] => value
[2] => value
)
)
*/
$content = "";
// Iterate over the captured labels; $matches itself only holds the three capture groups.
foreach ($matches[1] as $key => $label)
{
$content .= "<li><label>{$label}<label><span>{$matches[2][$key]}</span><li>\n";
}
echo($content);
/*
Output:
<li><label>value1<label><span>value</span><li>
<li><label>value2<label><span>value</span><li>
<li><label>value3<label><span>value</span><li>
*/
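Since the goal is a replacement rather than an extraction, the same pattern could also be handed to preg_replace(), which skips the loop entirely. This is a sketch of that alternative, not part of the answer above; it keeps the unclosed label and li tags exactly as the question wrote them.

```php
<?php
$string = "<label>value1<label>:value<br>\n"
        . "<label>value2<label>:value<br>\n"
        . "<label>value3<label>:value<br>";

// $1 is the label text, $2 is the value after the colon.
$result = preg_replace(
    '/<label>(.*?)<label>:(.*?)<br>/s',
    '<li><label>$1<label><span>$2</span><li>',
    $string
);

echo $result;
```

The same fragility warning applies: any variation in the markup (attributes, closed tags, extra whitespace) means the pattern silently stops matching.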

Related

PHP Regex containing its delimiters as occurrences

I have this string:
{include="folder/file" vars="key:value"}
I have a regex to catch the file and the vars like this:
|\{include\=[\'\"](.*)\/(.*)[\'\"](.*)\}|U
First (.*) = folder
Second (.*) = file
Third (.*) = params (and I have some functions to parse it)
But there are some cases where I need to capture params that contain brackets {}. Like this:
{include="file" vars="key:{value}"}
The regex works, but it captures only up to the first closing bracket, like this:
{include="file" vars="key:{value}
So part of the text is left out.
How can I allow those brackets as part of the result instead of treating them as the closing delimiter?
Thanks!
You can use this regex:
\{include=['"](?:(.*)\/(.*?)|(\w+))['"] vars="(.*?)"\}
Matches from the demo, run against both sample strings:
MATCH 1
1. [10-16] `folder`
2. [17-21] `file`
4. [29-38] `key:value`
MATCH 2
3. [51-55] `file`
4. [63-74] `key:{value}`
Bearing in mind what @naomik said, I think I should change my regex.
What I want to do now is detect this structure:
{word="value" word="value" ... n times}
I have this regex: (\w+)=['"](.*?)['"]
It detects the pairs in:
{include="folder/file"}
{include="folder/file" vars="key:value"}
{vars="key:{value}" include="folder/file"} (order changed)
It works fine, but I don't know how to add the opening and closing brackets to the regex. When I add them, it no longer works the way I want.
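Another, more robust regex that covers your first question:
// Use explicit / delimiters: if the pattern string itself starts with { and ends with },
// PHP treats those braces as the delimiters and never matches them.
preg_match_all('/\{include=["\']([^"\']+)["\'] vars=["\']([^"]+)["\']\}/', $str, $matches);
You'll get this kind of result in $matches: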
Array
(
[0] => Array
(
[0] => {include="folder/file" vars="key:{value}"}
[1] => {include="folder/file" vars="key:value"}
[2] => {include="folder/file" vars="key:value"}
[3] => {include="file" vars="key:{value}"}
)
[1] => Array
(
[0] => folder/file
[1] => folder/file
[2] => folder/file
[3] => file
)
[2] => Array
(
[0] => key:{value}
[1] => key:value
[2] => key:value
[3] => key:{value}
)
)
You can access what matters this way: $matches[1][0] and $matches[2][0] for the first element, $matches[1][1] and $matches[2][1] for the second, etc.
It does not store folder and file as separate results. For that, you'll have to write an extra piece of code. There is no elegant way to write a regex that covers both include="folder/file" and include="file".
It does not support swapping the order of include and vars. If you want to support this, you'll have to split your input data into chunks (line by line, or the text between braces) before you try to match the content with something like this:
preg_match_all('/(\w+)=["\']([^"\']+)["\']/', $chunk, $matches);
Then $matches will contain something like this:
Array
(
[0] => Array
(
[0] => vars="key:{value}"
[1] => include="folder/file"
)
[1] => Array
(
[0] => vars
[1] => include
)
[2] => Array
(
[0] => key:{value}
[1] => folder/file
)
)
Since $matches[1][0] contains 'vars', you can get the vars value from $matches[2][0]. Likewise, $matches[1][1] contains 'include', so you can get 'folder/file' from $matches[2][1].
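The two-step idea described above (first isolate each {...} chunk, then pull the word="value" pairs out of it) might look like the sketch below. It assumes double-quoted values only and that a value never contains a double quote; the variable names are made up for the illustration.

```php
<?php
$str = '{include="folder/file" vars="key:value"} '
     . '{vars="key:{value}" include="file"}';

// Step 1: a tag is an opening brace, one or more word="value" pairs, and a closing brace.
// Braces inside the quoted values are consumed by [^"]*, so they cannot end the tag early.
preg_match_all('/\{((?:\s*\w+="[^"]*")+)\s*\}/', $str, $tags);

// Step 2: split each chunk into its key/value pairs, in whatever order they appear.
$result = [];
foreach ($tags[1] as $chunk) {
    preg_match_all('/(\w+)="([^"]*)"/', $chunk, $pairs, PREG_SET_ORDER);
    $attributes = [];
    foreach ($pairs as $pair) {
        $attributes[$pair[1]] = $pair[2];
    }
    $result[] = $attributes;
}

print_r($result);
```

Keying the result by attribute name means the code that consumes it no longer cares whether include or vars came first.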

Regex - Does not contain certain Characters preg_match

I need a regex that matches when an array element contains a certain word, which could appear anywhere in it. For example, given this array:
Array
(
[1] => Array
(
[0] => http://www.test1.com
[1] => 4
[2] => 4
)
[2] => Array
(
[0] => http://www.test2.fr/blabla.html
[1] => 2
[2] => 2
)
[3] => Array
(
[0] => http://www.stuff.com/admin/index.php
[1] => 2
[2] => 2
)
[4] => Array
(
[0] => http://www.test3.com/blabla/bla.html
[1] => 2
[2] => 2
)
[5] => Array
(
[0] => http://www.stuff.com/bla.html
[1] => 2
[2] => 2
)
)
I want to return every element except those that contain the word stuff, but when I test with this it doesn't quite work:
return !preg_match('/(stuff)$/i', $element[0]);
Is there a solution for that?
Thanks
You don't need a regular expression for performing a simple search. Use array_filter() in conjunction with strpos():
$result = array_filter($array, function ($elem) {
// Keep only the elements that do NOT contain "stuff".
return (strpos($elem[0], 'stuff') === FALSE);
});
Now, to answer your question, your current regex pattern will only match strings that contain stuff at the end of the line. You don't want that, so get rid of the "end of the line" anchor $ from your regex.
The updated regex should look like below:
return !preg_match('/stuff/i', $element[0]);
If the actual use case is different from what is shown in your question, and the operation involves more than simple pattern matching, then preg_match() is the right tool. As shown above, this can be used with array_filter() to create a new array that satisfies your requirements.
Here's how you'd do it with a callback function:
$result = array_filter($array, function ($elem) {
// Keep only the elements whose URL does not match.
return !preg_match('/stuff/i', $elem[0]);
});
Note: The actual regex might be more complex; I've used /stuff/ as an example. The negation ! is what excludes the matching elements.
Your pattern will only match a string where stuff appears at the end of the string or line. To fix this, just get rid of the end anchor ($):
return !preg_match('/stuff/i', $element[0]);
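For reference, here is a self-contained run of the filtering approach against the sample data from the question. It is a sketch that swaps in stripos() so the search is case-insensitive like the /i regex; that substitution is an assumption about what is wanted.

```php
<?php
$array = [
    1 => ['http://www.test1.com', 4, 4],
    2 => ['http://www.test2.fr/blabla.html', 2, 2],
    3 => ['http://www.stuff.com/admin/index.php', 2, 2],
    4 => ['http://www.test3.com/blabla/bla.html', 2, 2],
    5 => ['http://www.stuff.com/bla.html', 2, 2],
];

// Keep every element whose URL does not contain "stuff" (case-insensitive).
$result = array_filter($array, function ($elem) {
    return stripos($elem[0], 'stuff') === false;
});

// array_filter() preserves the original keys; use array_values() to reindex.
print_r(array_keys($result));
```

The strict === false comparison matters: stripos() returns 0 when the word is found at the very start of the string, and 0 is falsy in a loose comparison.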

Regular Expression with wordpress shortcodes

I'm trying to find all shortcodes within a string which looks like this:
 [a_col] One
 [/a_col]
outside
[b_col]
Two
[/b_col] [c_col] Three [/c_col]
I need the content (eg "Three") and the letter from the col (a, b or c)
Here's the expression I'm using
preg_match_all('#\[(a|b|c)_col\](.*)\[\/\1_col\]#m', $string, $hits);
but $hits contains only the last one.
The content can have any character even "[" or "]"
EDIT:
I would like to get "outside" as well which can be any string (except these cols). How can I handle that or should I parse this in a second step?
This will capture anything in the content, as well as attributes, and will allow any characters in the content.
<?php
$input = '[a_col some="thing"] One[/a_col]
[b_col] Two [/b_col]
[c_col] [Three] [/c_col] ';
preg_match_all('#\[(a|b|c)_col([^\[]*)\](.*?)\[\/\1_col\]#msi', $input, $matches);
print_r($matches);
?>
EDIT:
You may want to then trim the matches, since it appears there may be some whitespace. Alternatively, you can use regex for removing the whitespace in the content:
preg_match_all('#\[(a|b|c)_col([^\[]*)\]\s*(.*?)\s*\[\/\1_col\]#msi', $input, $matches);
OUTPUT:
Array
(
[0] => Array
(
[0] => [a_col some="thing"] One[/a_col]
[1] => [b_col] Two [/b_col]
[2] => [c_col] [Three] [/c_col]
)
[1] => Array
(
[0] => a
[1] => b
[2] => c
)
[2] => Array
(
[0] => some="thing"
[1] =>
[2] =>
)
[3] => Array
(
[0] => One
[1] => Two
[2] => [Three]
)
)
It might also be helpful to use this for capturing the attribute names and values stored in $matches[2]. Consider $atts to be the first element in $matches[2]. Of course, you would iterate over the array of attributes and perform this on each one.
preg_match_all('#([^="\'\s]+)[\t ]*=[\t ]*("|\')(.*?)\2#', $atts, $att_matches);
This gives an array where the names are stored in $att_matches[1] and their corresponding values are stored in $att_matches[3].
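Putting the two patterns from this answer together, a sketch of the whole pipeline could look like this. The sample input, the $shortcodes array, and its key names are invented for the illustration.

```php
<?php
$input = '[a_col some="thing" other=\'x y\'] One [/a_col] [b_col] Two [/b_col]';

// Pattern from the answer above: column letter, raw attribute string, trimmed content.
preg_match_all(
    '#\[(a|b|c)_col([^\[]*)\]\s*(.*?)\s*\[\/\1_col\]#msi',
    $input,
    $matches,
    PREG_SET_ORDER
);

$shortcodes = [];
foreach ($matches as $m) {
    // Split the raw attribute string into name => value pairs.
    preg_match_all('#([^="\'\s]+)[\t ]*=[\t ]*("|\')(.*?)\2#', $m[2], $att_matches, PREG_SET_ORDER);
    $atts = [];
    foreach ($att_matches as $a) {
        $atts[$a[1]] = $a[3];
    }
    $shortcodes[] = ['col' => $m[1], 'atts' => $atts, 'content' => $m[3]];
}

print_r($shortcodes);
```

PREG_SET_ORDER groups the captures per match, which makes it easier to keep each shortcode's letter, attributes, and content together.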
Use ((.|\n)*) instead of (.*) to capture across multiple lines:
<?php
$string = "
[a_col] One
[/a_col]
[b_col]
Two
[/b_col] [c_col] Three [/c_col]";
preg_match_all('#\[(a|b|c)_col\]((.|\n)*)\[\/\1_col\]#m', $string, $hits);
echo "<textarea style='width:90%;height:90%;'>";
print_r($hits);
echo "</textarea>";
?>
I don't have an environment I can test with here, but you could use a lookbehind and a lookahead assertion, plus a backreference, to match the tags around the content. Something like this:
(?<=\[(\w)\]).*(?=\[\/\1\])
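As for the follow-up in the question about also getting the text outside the columns, one possible second step is to split the string on the shortcode pattern and keep whatever is left. This is only a sketch; it reuses the a/b/c column names from the question and assumes columns are not nested.

```php
<?php
$string = "[a_col] One\n[/a_col]\noutside\n[b_col]\nTwo\n[/b_col] [c_col] Three [/c_col]";

// The s modifier lets . match newlines; .*? stops at the first matching closing tag.
$pattern = '#\[(a|b|c)_col\](.*?)\[/\1_col\]#s';

// Everything between the shortcodes is returned as separate pieces.
$pieces = preg_split($pattern, $string);

// Trim the pieces and drop the ones that were only whitespace.
$outside = array_values(array_filter(array_map('trim', $pieces), 'strlen'));

print_r($outside);
```

Running preg_match_all() with the same pattern gives the column contents, so the two calls together cover both halves of the input.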

I'm trying to parse some text in html with custom nested tags

I would like to parse some text into an array:
My text looks like this:
You've come to the {right; correct; appropriate} place! Start by {searching; probing; inquiring} our site below, or {browse; {search; lookup; examine}} our list of popular support articles.
The third group of words has nested tags. How can I ignore the opening and closing nested tags to achieve an array such as
$tags[0][0] = 'right';
$tags[0][1] = 'correct';
$tags[0][2] = 'appropriate';
$tags[1][0] = 'searching';
$tags[1][1] = 'probing';
$tags[1][2] = 'inquiring';
$tags[2][0] = 'browse';
$tags[2][1] = 'search';
$tags[2][2] = 'lookup';
$tags[2][3] = 'examine';
Essentially ignoring the nesting of the tags.
Any help would be greatly appreciated.
My only current idea for this is to traverse the text character by character until I find a {, which would increment a "depth" variable. I would capture the words in between until I find a }, which decreases the depth variable, and stop capturing once it returns to zero. I was just wondering if there's a much easier way of doing this. Thanks.
Thanks for your excellent help, I modified it a bit to come up with the following solution.
$code = "You've come to {the right; the correct; the appropriate} place!
Start by {searching; probing; inquiring} our site below, or
{browse; {search; {foo; bar}; lookup}; examine} our list of
popular support articles.";
echo $code."\r\n\r\n";
preg_match_all('/{((?:[^{}]*|(?R))*)}/', $code, $matches);
$arr = array();
$r = array('{','}');
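foreach ($matches[1] as $k1 => $m)
{
// Strip the nested braces, then split the flattened group on ";".
$ths = explode(';', str_replace($r, '', $m));
foreach ($ths as $val)
{
if (trim($val) != '')
$arr[$k1][] = trim($val);
}
// Replace the whole group once, after its words have been collected.
$code = str_replace($matches[0][$k1], '[[rep'.$k1.']]', $code);
}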
echo $code;
Returns
You've come to {the right; the correct; the appropriate} place! Start by {searching; probing; inquiring} our site below, or {browse; {search; {foo; bar}; lookup}; examine} our list of popular support articles.
You've come to [[rep0]] place! Start by [[rep1]] our site below, or [[rep2]] our list of popular support articles.
My only current idea for this is to traverse the text character by character until I find a {, which would increment a "depth" variable. I would capture the words in between until I find a }, which decreases the depth variable, and stop capturing once it returns to zero. I was just wondering if there's a much easier way of doing this.
That sounds like a reasonable way to do it. Another way to do this is by using a bit of regex, although that might result in a solution that is (far) less readable (and therefore less maintainable) than your own solution.
<?php
$text = "You've come to the {right; correct; appropriate} place!
Start by {searching; probing; inquiring} our site below, or
{browse; {search; {foo; bar}; lookup}; examine} our list of
popular support articles. {the right; the correct; the appropriate}";
preg_match_all('/{((?:[^{}]*|(?R))*)}/', $text, $matches);
$arr = array();
foreach($matches[1] as $m) {
preg_match_all('/\w([\w\s]*\w)?/', $m, $words);
$arr[] = $words[0];
}
print_r($arr);
?>
would produce:
Array
(
[0] => Array
(
[0] => right
[1] => correct
[2] => appropriate
)
[1] => Array
(
[0] => searching
[1] => probing
[2] => inquiring
)
[2] => Array
(
[0] => browse
[1] => search
[2] => foo
[3] => bar
[4] => lookup
[5] => examine
)
[3] => Array
(
[0] => the right
[1] => the correct
[2] => the appropriate
)
)
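For comparison, the character-by-character approach described in the question could be sketched like this. The function name collectChoices is made up for the illustration, and stray closing braces are simply ignored.

```php
<?php
// Walk the text once, tracking brace depth, and collect the words of each top-level group.
function collectChoices(string $text): array
{
    $groups = [];
    $depth = 0;
    $buffer = '';
    $length = strlen($text);

    for ($i = 0; $i < $length; $i++) {
        $ch = $text[$i];
        if ($ch === '{') {
            $depth++;
        } elseif ($ch === '}') {
            if ($depth === 0) {
                continue; // stray closing brace outside any group
            }
            $depth--;
            if ($depth === 0) {
                // A top-level group just closed: split its flattened content on ";".
                $words = array_map('trim', explode(';', $buffer));
                $groups[] = array_values(array_filter($words, 'strlen'));
                $buffer = '';
            }
        } elseif ($depth > 0) {
            // Inside a group: keep everything except the braces themselves.
            $buffer .= $ch;
        }
    }

    return $groups;
}

$text = "You've come to the {right; correct; appropriate} place! "
      . "Start by {searching; probing; inquiring} our site below, or "
      . "{browse; {search; lookup; examine}} our list of popular support articles.";

$tags = collectChoices($text);
print_r($tags);
```

Nested braces are dropped rather than tracked as separate groups, which matches the "ignore the nesting" requirement in the question.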

Regular expression in PHP to return array with all images from html, eg: all src="images/header.jpg" instances

I'd like to be able to return an array with a list of all images (src="" values) from html
[0] = "images/header.jpg"
[1] = "images/person.jpg"
is there a regular expression that can do this?
Many thanks in advance!
Welcome to the world of the millionth "how to extract these values using regex" question ;-) I suggest using the search tool before asking. Here is just a handful of topics that provide code to do exactly what you need:
replacing all image src tags in HTML text
getting image src in php
How to extract img src, title and alt from html using php?
Matching SRC attribute of IMG tag using preg_match
php regex : get src value
Dynamically replace the “src” attributes of all <img> tags (redux)
preg_match_all , get all img tag that include a string
/src="([^"]+)"/
The image will be in group 1.
Example:
preg_match_all('/src="([^"]+)"/', '<img src="lol"><img src="wat">', $arr, PREG_PATTERN_ORDER);
Returns:
Array
(
[0] => Array
(
[0] => src="lol"
[1] => src="wat"
)
[1] => Array
(
[0] => lol
[1] => wat
)
)
Here is a more polished version of the regular expression provided by Håvard:
/(?<=src=")[^"]+(?=")/
This expression uses Lookahead & Lookbehind Assertions to get only what you want.
$str = '<img src="/img/001.jpg"><img src="/img/002.jpg">';
preg_match_all('/(?<=src=")[^"]+(?=")/', $str, $srcs, PREG_PATTERN_ORDER);
print_r($srcs);
The output will look like the following:
Array
(
[0] => Array
(
[0] => /img/001.jpg
[1] => /img/002.jpg
)
)
I see that many people struggle with Håvard's post and the <script> issue. Here is the same solution in a stricter form:
<img.*?src="([^"]+)".*?>
Example:
preg_match_all('/<img.*?src="([^"]+)".*?>/', '<img src="lol"><img src="wat">', $arr, PREG_PATTERN_ORDER);
Returns:
Array
(
[0] => Array
(
[0] => <img src="lol">
[1] => <img src="wat">
)
[1] => Array
(
[0] => lol
[1] => wat
)
)
This avoids matching the src attribute of other tags.
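To make the difference between the two patterns concrete, here is a small comparison on markup that also contains a script tag. The sample HTML is invented for the illustration.

```php
<?php
$html = '<script src="app.js"></script>'
      . '<img class="a" src="images/header.jpg">'
      . '<img src="images/person.jpg" alt="x">';

// Loose pattern: picks up every src attribute, including the script's.
preg_match_all('/src="([^"]+)"/', $html, $loose);

// Stricter pattern: only src attributes that come after an opening <img.
preg_match_all('/<img.*?src="([^"]+)".*?>/', $html, $strict);

print_r($loose[1]);
print_r($strict[1]);
```

Even the stricter pattern relies on double-quoted attributes and on every img tag actually having a src; for arbitrary HTML, a DOM parser remains the safer choice.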
