Splitting a string containing <if> <else> with regexps - php

I'm very poor with regexps but this should be very simple for someone who knows regexps.
Basically I will have a string like this:
<if>abc <else>xyz
I would like a regexp so if the string contains <if> <else>, it splits the string into two parts and returns the two strings after <if> and <else>. In the above example it might return an array with the first element being abc, second xyz. I'm open to approaches not using regexps too.
Any thoughts?

// $subject is your variable containing the string you want to run the regex on
$result = preg_match('/<if>(.*)<else>(.*)/i', $subject, $matches);
// $matches[0] has the full matched text
// $matches[1] has if part
// $matches[2] has else part
The /i at the end makes the search case-insensitive, so it matches both <if><else> and <IF> <elSE>.

Regular Expressions will work for this in simple cases. For more complicated cases (where nesting occurs), you may find it easier to parse the string and use a simple stack to grab the data you need.
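Since the question is also open to non-regex approaches, here is a minimal sketch of the simple (non-nested) case using strpos() and substr(). It assumes the string holds exactly one <if> followed by one <else>; the function name splitIfElse is hypothetical. Unlike the /i regex above, strpos() is case-sensitive (stripos() is the case-insensitive variant).

```php
<?php
// Hypothetical helper: returns [ifPart, elsePart], or null when either tag is missing.
// Assumes a single <if>...<else> pair, with <if> appearing before <else>.
function splitIfElse(string $s): ?array
{
    $ifPos = strpos($s, '<if>');
    if ($ifPos === false) {
        return null;
    }
    $start = $ifPos + strlen('<if>');
    $elsePos = strpos($s, '<else>', $start); // only look for <else> after <if>
    if ($elsePos === false) {
        return null;
    }
    return [
        trim(substr($s, $start, $elsePos - $start)),   // text between <if> and <else>
        trim(substr($s, $elsePos + strlen('<else>'))), // everything after <else>
    ];
}

print_r(splitIfElse('<if>abc <else>xyz'));
```

For nested tags this breaks down, which is where the stack-based parsing mentioned above becomes worthwhile.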

The regular expression you are looking for is:
/<if>(.*)<else>(.*)/
You'd have a problem if there are multiple <else>'s, though: because (.*) is greedy, the first group would run up to the last <else>. The pattern can be adjusted depending on what you want the result to be.
Also note that this regular expression would work even if there is something before <if>.
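In PHP, you'd want code like the following:
<?php
$string = '<if>abc <else>xyz';
preg_match_all('/<if>(.*)<else>(.*)/', $string, $matches, PREG_SET_ORDER);
// With PREG_SET_ORDER each $match holds one full match plus its capture groups
foreach($matches as $match)
{
echo trim($match[1]), "\n"; // the part after <if>
echo trim($match[2]), "\n"; // the part after <else>
}
?>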
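To make the multiple-<else> caveat concrete, here is a small sketch comparing the greedy (.*) with a lazy (.*?) on a made-up input. Greedy makes the first group run up to the last <else>; lazy stops at the first one. Which behaviour is right depends on what result you want.

```php
<?php
$input = '<if>a<else>b<else>c';

preg_match('/<if>(.*)<else>(.*)/', $input, $greedy);
preg_match('/<if>(.*?)<else>(.*)/', $input, $lazy);

// Greedy: group 1 stops at the LAST <else>
echo $greedy[1], ' | ', $greedy[2], "\n";
// Lazy: group 1 stops at the FIRST <else>
echo $lazy[1], ' | ', $lazy[2], "\n";
```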

Related

PHP regular expression with nth occurrence

Here's a string:
n%3A171717%2Cn%3A%747474%2Cn%3A555666%2Cn%3A1234567&bbn=555666
From this string how can I extract 1234567 ? Need a good logic / syntax.
I guess preg_match would be a better option than the explode function in PHP.
It's for a PHP script that extracts data. The numbers can vary, and so can how many of them there are; only %2Cn%3A will always be in front of the numbers, and the end will always be &bbn=anyNumber.
That looks like part of an encoded URL, so there are bound to be better ways to do it, but after urldecode() your string looks like:
n:171717,n:t7474,n:555666,n:1234567&bbn=555666
So:
preg_match_all('/n:(\d+)/', urldecode($string), $matches);
echo array_pop($matches[1]);
Parenthesized matches are in $matches[1] so just array_pop() to get the last element.
If &bbn= can be anywhere (except for at the beginning) then:
preg_match('/n:(\d+)&bbn=/', urldecode($string), $matches);
echo $matches[1];
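A self-contained run of both approaches from this answer on the question's sample string; the variable names $decoded, $last and $beforeBbn are only illustrative. Note that %74 in the sample decodes to t, which is why \d+ skips that value.

```php
<?php
$string = 'n%3A171717%2Cn%3A%747474%2Cn%3A555666%2Cn%3A1234567&bbn=555666';
$decoded = urldecode($string); // %3A becomes ":", %2C becomes ",", %74 becomes "t"

// Approach 1: collect every n:<digits> value and take the last one
preg_match_all('/n:(\d+)/', $decoded, $all);
$last = array_pop($all[1]);

// Approach 2: take the n:<digits> value directly in front of &bbn=
preg_match('/n:(\d+)&bbn=/', $decoded, $m);
$beforeBbn = $m[1];

echo $last, "\n", $beforeBbn, "\n";
```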
only %2Cn%3A will always be there in front of the numbers
The URL-decoded equivalent of %2Cn%3A is ",n:", while the closing "enclosing boundary" &bbn stays as it is.
The preg_match function will do the job:
preg_match("/(?<=,n:)\d+(?=&bbn)/", urldecode("n%3A171717%2Cn%3A%747474%2Cn%3A555666%2Cn%3A1234567&bbn=555666"), $m);
print_r($m[0]); // "1234567"

Extracting substrings between curly brackets inside a string into an array using PHP

I need help extracting all the substrings between curly brackets found inside a specific string.
I found some solutions in javascript but I need it for PHP.
$string = "www.example.com/?foo={foo}&test={test}";
$subStrings = HELPME($string);
print_r($subStrings);
The result should be:
array( [0] => foo, [1] => test )
I tried playing with preg_match but I got confused.
I'd appreciate it if whoever manages to get it working with preg_match could also explain the logic behind it.
You could use this regex to capture the strings between {}
\{([^}]*)\}
Explanation:
\{ Matches a literal {
([^}]*) Captures any character except }, zero or more times, so it captures everything up to the next } symbol.
\} Matches a literal }
Your code would be,
<?php
$regex = '~\{([^}]*)\}~';
$string = "www.example.com/?foo={foo}&test={test}";
preg_match_all($regex, $string, $matches);
var_dump($matches[1]);
?>
Output:
array(2) {
[0]=>
string(3) "foo"
[1]=>
string(4) "test"
}
Regex Pattern: \{(\w+)\}
Get all the matches captured by the parentheses (). The pattern captures one or more word characters (letters, digits, underscore) enclosed in {...}.
Sample code:
$regex = '/\{(\w{1,})\}/';
$testString = ''; // Fill this in
preg_match_all($regex, $testString, $matches);
// the $matches variable contains the list of matches
If you want to capture any type of character inside the {...}, try the regex pattern below.
Regex : \{(.*?)\}
Sample code:
$regex = '/\{(.{0,}?)\}/';
$testString = ''; // Fill this in
preg_match_all($regex, $testString, $matches);
// the $matches variable contains the list of matches
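The three patterns in these answers behave differently on edge cases. A small comparison sketch on a made-up string: \w+ skips braces containing non-word characters or nothing at all, .*? accepts any character except a newline (without the /s modifier), and [^}]* accepts everything up to the next }, newlines included.

```php
<?php
$string = "a={foo}&b={foo-bar}&c={}&d={x\ny}";
$patterns = [
    'word chars only' => '/\{(\w+)\}/',
    'lazy any char'   => '/\{(.*?)\}/',
    'negated class'   => '/\{([^}]*)\}/',
];

$results = [];
foreach ($patterns as $label => $regex) {
    preg_match_all($regex, $string, $matches);
    $results[$label] = $matches[1]; // group 1 = text inside the braces
    echo $label, ': ', implode(', ', $matches[1]), "\n";
}
```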
<?php
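$string = "www.example.com/?foo={foo}&test={test}";
// preg_match_all finds every {...}; preg_match would stop after the first one
$found = preg_match_all('/\{([^}]*)\}/',$string, $subStrings);
if($found){
print_r($subStrings[1]); // group 1 holds the text inside the braces
}else{
echo 'NOPE !!';
}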
The parse_url function parses a URL and returns its components, including the query string.
Try This:
preg_match_all("/\{(.*?)\}/", $string, $subStrings);
var_dump($subStrings[1]); // capture group 1 holds the text without the braces
Good Luck!
You can use the expression (?<=\{).*?(?=\}) to match any string of text enclosed in {}.
$string = "www.example.com/?foo={foo}&test={test}";
preg_match_all("/(?<=\{).*?(?=\})/",$string,$matches);
print_r($matches[0]);
Regex explained:
(?<=\{) is a positive lookbehind, asserting that the match is preceded by a {.
Similarly, (?=\}) is a positive lookahead asserting that it is followed by a }. .* matches zero or more characters of any type except a newline, and the ? in .*? makes it match as few characters as possible (meaning it matches foo in {foo} and {bar}, as opposed to foo} and {bar).
$matches[0] contains an array of all the matched strings.
I see answers here using regular expressions with capture groups, lookarounds, and lazy quantifiers. All of these techniques slow the pattern down, although the difference is very unlikely to be noticeable in the majority of use cases. Because we are meant to offer solutions that are suitable for more scenarios than just the posted question, I'll offer a few solutions that deliver the expected result and explain the differences, using the OP's www.example.com/?foo={foo}&test={test} string assigned to $url. For information about the function calls, see the PHP manual. For an in-depth breakdown of the regex patterns, I recommend regex101.com, a free online tool that lets you test patterns against strings, shows the results as both highlighted text and a grouped list, and explains character by character how the regex engine is interpreting your pattern.
#1 Because your input string is a URL, a non-regex technique is appropriate, since PHP has native functions to parse it: parse_url() with parse_str(). Unfortunately, your requirements go beyond extracting the query string's values; you also wish to re-index the array and remove the curly braces from the values.
parse_str(parse_url($url, PHP_URL_QUERY), $assocArray);
$values = array_map(function($v) {return trim($v, '{}');}, array_values($assocArray));
var_export($values);
While this approach is deliberate and makes fair use of native functions that were built for these jobs, it ends up making longer, more convoluted code which is somewhat unpleasant in terms of readability. Nonetheless, it provides the desired output array and should be considered as a viable process.
#2 preg_match_all() is a super brief and highly efficient technique to extract the values. One drawback of using regular expressions is that the regex engine is completely "unaware" of any special meaning that a formatted input string may have. In this case, I don't see any negative impacts, but when hiccups do arise, often the solution is to use a parser that is "format/data-type aware".
var_export(preg_match_all('~\{\K[^}]*~', $url, $matches) ? $matches[0] : []);
Notice that my pattern does not need capture groups or lookarounds, nor does my answer suffer from the use of a lazy quantifier. \K is used to "restart the fullstring match" (in other words, forget any characters matched up to that point). All of these features mean that the regex engine can traverse the string with peak efficiency. If there are downsides to using the function, they are:
that a multi-dimensional array is generated while you only want a one-dimensional array
that the function creates a reference variable instead of returning the results
#3 preg_split() most closely aligns with the plain-English intent of your task AND it provides the exact output as its return value.
var_export(preg_split('~(?:(?:^|})[^{]*{)|}[^{]*$~', $url, 0, PREG_SPLIT_NO_EMPTY));
My pattern, while admittedly unsavoury to the novice regex pattern designer AND slightly less efficient because it is making "branched" matches (|), basically says: "Split the string at the following delimiters:
from the start of the string or from a }, including all non-{ characters, then the first encountered { (this is the end of the delimiter).
from the last }, including all non-{ characters until the end of the string."
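A sketch that runs the three techniques side by side on the OP's $url, to confirm they agree on this input (the variable names are only illustrative):

```php
<?php
$url = 'www.example.com/?foo={foo}&test={test}';

// #1 Parser-based: parse_url() extracts the query string, parse_str() splits it
parse_str(parse_url($url, PHP_URL_QUERY), $assocArray);
$viaParser = array_map(function ($v) { return trim($v, '{}'); }, array_values($assocArray));

// #2 preg_match_all() with \K: each full match starts right after a "{"
$viaMatchAll = preg_match_all('~\{\K[^}]*~', $url, $matches) ? $matches[0] : [];

// #3 preg_split(): everything outside the braces is treated as a delimiter
$viaSplit = preg_split('~(?:(?:^|})[^{]*{)|}[^{]*$~', $url, 0, PREG_SPLIT_NO_EMPTY);

var_export([$viaParser, $viaMatchAll, $viaSplit]);
```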

Regex pattern for matching mm<sup>3</sup>

I’m trying to write a regular expression to change mm3 to mL:
<?php
$match = 'mm<sup>3</sup>';
if(preg_match('/\b(mm<sup>3</sup>)\b/', $match))
{
$replacement = 'ml';
$replac = preg_replace('/\b(mm<sup>3</sup>)\b/', $replacement, $match);
echo $replac;
}
?>
But my regular expression doesn't capture the content in $match variable, and the $replac value isn't output. What am I doing wrong?
Change:
if(preg_match('/\b(mm<sup>3</sup>)\b/',$match))
to:
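if(preg_match('#\bmm<sup>3</sup>#',$match))
and similarly in the preg_replace call.
Since your regular expression contains /, you need to either escape it or use a different delimiter around the regular expression.
The trailing \b has to go as well: when the input ends right after </sup>, as yours does, > and the end of the string are both non-word characters, so there is no word boundary there and the pattern can never match.
There's also no need for the parentheses, since you're not doing anything with the groups.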
You need to either use preg_quote to escape that / in your regexp, or use a different delimiter (usually #).
Also, the \b after the > should be dropped (there is no word boundary between > and the end of your string, so it prevents the match), and the parentheses aren't needed since you don't seem to be doing any capturing; you're basically doing a more expensive str_replace.
Finally, you can do everything in one move. If there's no match, nothing will happen.
<?php
$match = 'mm<sup>3</sup>';
$replacement='ML';
$replac = preg_replace('#\\bmm<sup>3</sup>#',
$replacement,
$match);
echo $replac;
?>
If you want to be picky, I guess you should also replace with 'ml', not 'ML' :-)
(for replacement of multiple strings, preg_replace supports arrays).
Note: unless you're sure that this is exactly the HTML you want replaced, you might try a pattern that tolerates whitespace, such as
$pattern = '#mm\s*<sup>\s*3\s*</sup>#';
in order to catch mm <sup>3</sup> (with a space) and similar variants in addition to mm<sup>3</sup> (in some circumstances they may look alike, and some editors might use or automatically "correct" either form into the other).
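A sketch of a whitespace-tolerant pattern used with preg_replace(); the sample inputs are made up, and 'ml' follows the question's chosen replacement. preg_replace() also accepts an array of subjects and then returns an array.

```php
<?php
// \s* allows optional whitespace around the tags and the digit;
// # is the delimiter, so the / in </sup> needs no escaping.
$pattern = '#\bmm\s*<sup>\s*3\s*</sup>#';

$inputs = ['mm<sup>3</sup>', '5 mm <sup>3</sup>', 'mm<sup> 3 </sup>', 'cm<sup>3</sup>'];
$outputs = preg_replace($pattern, 'ml', $inputs); // array in, array out

print_r($outputs);
```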

How to match a regular expression that contains an arbitrary string, and get only that arbitrary string into a variable using PHP?

I'm trying to write a very simple markup language in PHP that contains tags like [x=123], and I need to be able to match that tag and extract only the value of x.
I'm assuming the answer involves regex but maybe I'm wrong.
So if we had a string:
$str = "F9F[x=]]^$^$[x=123]#3j3E]]#J";
And a regular expression to match:
/^\[x=.+\]$/
How would we get only the ".+" portion of the matching string into a variable?
You can use preg_match to search a string for a regular expression.
Check out the documentation here: http://www.php.net/manual/en/function.preg-match.php for more information on how to use it (as well as some examples). You might also want to take a look at preg_grep.
Following code should work for you:
$str = "F9F[x=]]^$^$[x=123]#3j3E]]#J";
if (preg_match('~\[x=(?<valX>\d+)\]~', $str, $match))
echo $match['valX'] . "\n";
OUTPUT:
123
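To see why the question's own pattern fails, here is a sketch contrasting it with the capture-group approach on the same $str; the unanchored greedy variant is added only for comparison. Single quotes keep PHP from treating any $ as variable interpolation.

```php
<?php
$str = 'F9F[x=]]^$^$[x=123]#3j3E]]#J';

// The question's pattern: ^ and $ require the tag to be the whole string, so no match
$anchored = preg_match('/^\[x=.+\]$/', $str);

// Without anchors but still greedy, .+ runs on to the last "]" in the string
preg_match('/\[x=(.+)\]/', $str, $greedy);

// A capture group limited to digits picks out only the value
preg_match('/\[x=(\d+)\]/', $str, $digits);

var_dump($anchored, $greedy[1], $digits[1]);
```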

How do you perform a preg_match where the pattern is an array, in php?

I have an array full of patterns that I need matched. Is there any way to do that other than a for() loop? I'm trying to do it in the least CPU-intensive way, since I will be doing dozens of these every minute.
A real-world example: I'm building a link status checker, which will check links to various online video sites to ensure that the videos are still live. Each domain has several "dead keywords"; if these are found in the HTML of a page, the file was deleted. These are stored in the array. I need to match the contents of the array against the HTML output of the page.
First of all, if you literally are only doing dozens every minute, then I wouldn't worry terribly about the performance in this case. These matches are pretty quick, and I don't think you're going to have a performance problem by iterating through your patterns array and calling preg_match separately like this:
$matches = false;
foreach ($pattern_array as $pattern)
{
if (preg_match($pattern, $page))
{
$matches = true;
}
}
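You can indeed combine all the patterns into one using the alternation operator | like some people are suggesting, but don't just slap them together with a |. Patterns that work on their own can interfere once joined: for example, an inline modifier such as (?i) in one pattern carries over into the alternatives that follow it in the same group.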
I would recommend at least grouping your patterns using parentheses like:
// assumes the patterns are stored without delimiters
$grouped_patterns = array();
foreach ($patterns as $pattern)
{
$grouped_patterns[] = "(" . $pattern . ")";
}
$master_pattern = '/' . implode("|", $grouped_patterns) . '/';
But... I'm not really sure if this ends up being faster. Something has to loop through them, whether it's the preg_match or PHP. If I had to guess I'd guess that individual matches would be close to as fast and easier to read and maintain.
Lastly, if performance is what you're looking for here, I think the most important thing to do is pull out the non regex matches into a simple "string contains" check. I would imagine that some of your checks must be simple string checks like looking to see if "This Site is Closed" is on the page.
So doing this:
foreach ($strings_to_match as $string_to_match)
{
if (strpos($page, $string_to_match) !== false)
{
// etc.
break;
}
}
foreach ($pattern_array as $pattern)
{
if (preg_match($pattern, $page))
{
// etc.
break;
}
}
and avoiding as many preg_match() as possible is probably going to be your best gain. strpos() is a lot faster than preg_match().
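The two-stage check described above can be wrapped in one function. A minimal sketch; the name looksDead and the split into literal keywords and regex patterns are assumptions, not part of the question:

```php
<?php
// Hypothetical helper: true if any dead keyword or dead pattern occurs in $page.
// Literal strings are checked with strpos() first; regexes run only if none hit.
function looksDead(string $page, array $deadStrings, array $deadPatterns): bool
{
    foreach ($deadStrings as $needle) {
        if (strpos($page, $needle) !== false) {
            return true; // cheap literal hit, no regex needed
        }
    }
    foreach ($deadPatterns as $pattern) {
        if (preg_match($pattern, $page)) {
            return true;
        }
    }
    return false;
}
```

Returning on the first hit replaces the `// etc.` plus `break` placeholders in the loops above.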
// assuming you have something like this
$patterns = array('a','b','\w');
// converts the array into a regex friendly or list
$patterns_flattened = implode('|', $patterns);
if ( preg_match('/'. $patterns_flattened .'/', $string, $matches) )
{
}
// PS: that's off the top of my head, I didn't check it in a code editor
If your patterns don't contain much whitespace, another option would be to skip the array and use the /x modifier. Your list of regular expressions would then look like this:
$regex = "/
pattern1| # search for occurrences of 'pattern1'
pa..ern2| # wildcard search for occurrences of 'pa..ern2'
pat[ ]tern| # search for 'pat tern'; the space is kept because it sits in a character class
mypat # Note that the last pattern does NOT have a pipe char
/x";
With the /x modifier, whitespace is completely ignored, except when in a character class or preceded by a backslash. Comments like above are also allowed.
This would avoid the looping through the array.
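A quick check of how such an /x pattern behaves, using a copy of the example above with the comments reworded; the candidate strings are made up.

```php
<?php
// Unescaped whitespace and # comments are ignored under /x,
// so only the [ ] character class contributes a literal space.
$regex = "/
    pattern1|   # literal 'pattern1'
    pa..ern2|   # 'pa', any two characters, then 'ern2'
    pat[ ]tern| # 'pat tern' with a real space
    mypat       # the last alternative has no trailing pipe
/x";

foreach (['pattern1', 'patxern2', 'pat tern', 'mypat', 'pattern'] as $candidate) {
    echo $candidate, ' => ', preg_match($regex, $candidate), "\n";
}
```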
If you're merely searching for the presence of a string in another string, use strpos as it is faster.
Otherwise, you could just iterate over the array of patterns, calling preg_match each time.
If you have a bunch of patterns, what you can do is concatenate them in a single regular expression and match that. No need for a loop.
What about doing a str_replace() with your array on the HTML you get, and then checking whether the result is equal to the original HTML? This would be very fast:
$sites = array(
'you_tube' => array('dead', 'moved'),
...
);
foreach ($sites as $site => $deadArray) {
// get $html
if ($html == str_replace($deadArray, '', $html)) {
// video is live
}
}
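The same str_replace() idea as a small self-contained function; isLive is a hypothetical name and the sample HTML is made up. Keep in mind that str_replace() is case-sensitive (str_ireplace() is the case-insensitive variant).

```php
<?php
// Hypothetical helper: the page counts as live when removing every dead keyword
// leaves the HTML unchanged, i.e. none of the keywords occurred in it.
function isLive(string $html, array $deadKeywords): bool
{
    return $html === str_replace($deadKeywords, '', $html);
}

var_dump(isLive('<p>Enjoy the video</p>', ['dead', 'moved']));
var_dump(isLive('<p>This video has moved</p>', ['dead', 'moved']));
```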
You can combine all the patterns from the list to single regular expression using implode() php function. Then test your string at once using preg_match() php function.
$patterns = array(
'abc',
'\d+h',
'[abc]{6,8}\-\s*[xyz]{6,8}',
);
$master_pattern = '/(' . implode(')|(', $patterns) . ')/';
if(preg_match($master_pattern, $string_to_check))
{
//do something
}
Of course, you could write even less code by calling implode() inline in the if() condition instead of using the $master_pattern variable.
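The implode() approach assumes every entry is already a valid regex fragment. If some dead keywords are plain text, as in the question, a minimal sketch escapes them with preg_quote() first and uses a non-capturing group, since nothing is captured. The keywords and page content here are made up.

```php
<?php
// Plain-text keywords may contain regex metacharacters such as . ? ( ),
// so each one is escaped with preg_quote() before joining them.
$keywords = ['File not found.', 'Video removed (copyright)', 'Is this video gone?'];

$escaped = array_map(function ($k) { return preg_quote($k, '/'); }, $keywords);
$master_pattern = '/(?:' . implode('|', $escaped) . ')/i';

$page = '<h1>VIDEO REMOVED (COPYRIGHT)</h1>';
var_dump(preg_match($master_pattern, $page)); // int(1)
```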
