Regular expression not returning proper matches - php

I am currently experiencing some problems with my regular expression on my PHP server.
This is my current regular expression:
/\{content="(?:([^"\|]*)\|?)+"\}/
And I want it to match:
{content="default|test|content|text"}
And then return this in the matches:
default
test
content
text
But when I currently execute it I get back this in my matches:
array (
0 => '{content="default|test|content|text"}',
1 => '',
)
Does anyone have an idea what I am doing wrong?
With kind regards,
Youri Arktesteijn

You can use positive lookaheads and positive lookbehinds.
The pattern has three alternatives:
Non-pipes between the opening quote and a pipe. The lookbehind asserts the quote and the lookahead asserts the pipe, so neither ends up in the output.
Non-pipes between two pipes.
Non-pipes and non-quotes between a pipe and the closing quote.
Here's the code.
<?php
$string = '{content="default|test|content|text"}';
$my_matches = preg_match_all('!((?<=")([^|]+)(?=[|])|(?<=[|])([^|]+)(?=[|])|(?<=[|])([^|"]+)(?="))!',$string,$matches);
print_r($matches[0]);
?>
Output
Array
(
[0] => default
[1] => test
[2] => content
[3] => text
)
Once you have the logic working, you can merge the lookbehind characters into one character class and the lookahead characters into another to shorten the pattern.
$my_matches = preg_match_all('!(?<=["|])([^|"]+)(?=[|"])!',$string,$matches);
Output
Array
(
[0] => default
[1] => test
[2] => content
[3] => text
)

I don't know whether this is possible with a single regular expression. In any case, try the following code:
<?php
$sContent = '{content="default|test|content|text"}';
if (preg_match('/\{content="([^"]+)"\}/', $sContent, $matches) > 0) {
$result = explode('|', $matches[1]);
} else {
$result = array();
}
echo '<pre>' . print_r($result, true) . '</pre>';
?>
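On the single-regex question: a repeated capturing group keeps only its last iteration, which is why group 1 in the original pattern cannot return every option. One possible single-pattern approach is sketched below, assuming the input looks exactly like the example. It uses \G to continue from the end of the previous match and \K to drop the matched prefix from the result. The variable names are illustrative, and the sketch does not verify the closing "}.

```php
<?php
$subject = '{content="default|test|content|text"}';

// First alternative: anchor on the opening {content=" of the tag.
// Second alternative: \G continues exactly where the previous match ended
// (the (?!\A) guard stops it from matching at the very start of the subject)
// and consumes the pipe. \K then discards everything matched so far, so only
// the option text is reported.
$pattern = '/(?:\{content="|\G(?!\A)\|)\K[^"|]+/';

preg_match_all($pattern, $subject, $matches);
$options = $matches[0];

print_r($options);
```

Capturing the whole quoted value once and calling explode() on it, as in the answer above, is easier to read. The \G variant is mainly useful when everything has to happen in a single preg_match_all call.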


Explode a string where the explode condition is bunch of specific characters

I'm looking for a way to explode a string. For example, I have the following string: (we don't count the beginning - 0x)
0xa9059xbb000000000000000000000000fc7a5f48a1a1b3f48e7dcb1f23a1ea24199af4d00000000000000000000000000000000000000000000000000000000000054368
which is actually an ETH transaction input. I need to explode this string into 3 parts. Imagine 1 bunch of zeros is actually a single space and these spaces define the gates where the string should be exploded.
How can I do that?
preg_split()
This function uses a regular expression to split a string.
So in this example at two or more 0 in a row:
$arr = preg_split('/[0]{2,}/', $string);
print_r($arr);
echo PHP_EOL;
This will output the following:
Array
(
[0] => a9059xbb
[1] => fc7a5f48a1a1b3f48e7dcb1f23a1ea24199af4d
[2] => 54368
)
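Be aware that this breaks as soon as a value itself contains 00, or starts or ends with a 0. In the output above, element [1] has already lost its final 0, because that 0 was swallowed together with the zeros that follow it. The zeros in this input are padding to a fixed width, so splitting at fixed offsets is more reliable.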
preg_match()
This is an example using regular expressions. You can split at arbitrary points.
$string = 'a9059xbb000000000000000000000000fc7a5f48a1a1b3f48e7dcb1f23a1ea24199af4d00000000000000000000000000000000000000000000000000000000000054368';
print_r($string);
echo PHP_EOL;
$res = preg_match('/(.{4})(.{32})(.{32})/', $string, $matches);
print_r($matches);
echo PHP_EOL;
This outputs:
a9059xbb000000000000000000000000fc7a5f48a1a1b3f48e7dcb1f23a1ea24199af4d00000000000000000000000000000000000000000000000000000000000054368
Array
(
[0] => a9059xbb000000000000000000000000fc7a5f48a1a1b3f48e7dcb1f23a1ea24199a
[1] => a905
[2] => 9xbb000000000000000000000000fc7a
[3] => 5f48a1a1b3f48e7dcb1f23a1ea24199a
)
As you can see, /(.{4})(.{32})(.{32})/ matches 4 characters, then 32, then 32 more. Each . is one character (one hex digit here), not one byte. Capturing groups are created by putting () around what you want to extract. They appear in the $matches array (index 0 always holds the whole matched string).
In case you want to ignore certain parts you can express that as well:
/(.{4})9x(.{32}).{4}(.{32})/
This changes the found string:
Array
(
[0] => a9059xbb000000000000000000000000fc7a5f48a1a1b3f48e7dcb1f23a1ea24199af4d000
[1] => a905
[2] => bb000000000000000000000000fc7a5f
[3] => a1b3f48e7dcb1f23a1ea24199af4d000
)
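The same fixed-width idea can be written with named groups and anchors, so that each part is addressed by name and the match fails if the payload has an unexpected length. This is a sketch only: the group names selector, to and amount are hypothetical labels, and the widths (8, 64, 64 hex digits) are an assumption about the layout of this particular input.

```php
<?php
// Rebuild the example payload (without the 0x prefix) from its three fields
// so that the field widths are explicit.
$hex = 'a9059xbb'
    . str_pad('fc7a5f48a1a1b3f48e7dcb1f23a1ea24199af4d0', 64, '0', STR_PAD_LEFT)
    . str_pad('54368', 64, '0', STR_PAD_LEFT);

// ^ and $ make the match fail unless the payload is 8 + 64 + 64 characters.
$pattern = '/^(?<selector>.{8})(?<to>.{64})(?<amount>.{64})$/';

if (preg_match($pattern, $hex, $m)) {
    // Named groups are available under their name and their numeric index.
    echo $m['selector'], PHP_EOL;
    echo $m['to'], PHP_EOL;
    echo $m['amount'], PHP_EOL;
}
```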
Links
PHP documentation for the mentioned functions:
https://www.php.net/manual/en/function.preg-split.php
https://www.php.net/manual/en/book.pcre.php
Play around with the second regular expression using this demo:
https://regex101.com/r/pfZtH8/1
If you always split the string at the same points (4 bytes (8 hexadecimal digits), 32 bytes (64 hexadecimal digits), 32 bytes (64 hexadecimal digits)), you could use substr().
$input = "0xa9059xbb000000000000000000000000fc7a5f48a1a1b3f48e7dcb1f23a1ea24199af4d00000000000000000000000000000000000000000000000000000000000054368";
$first = substr($input,2,8);
$second = substr($input,10,64);
$third = substr($input,74,64);
print_r($first);
print "<br>";
print_r($second);
print "<br>";
print_r($third);
print "<br>";
This outputs:
a9059xbb
000000000000000000000000fc7a5f48a1a1b3f48e7dcb1f23a1ea24199af4d0
0000000000000000000000000000000000000000000000000000000000054368
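If the padded values also need to be decoded, the fixed offsets can be combined with str_split() and hexdec(). The sketch below assumes that the first word is a left-padded 20-byte address and the second a left-padded number. The variable names are illustrative, and hexdec() only returns an exact integer while the value fits into a PHP int.

```php
<?php
// The example input, rebuilt from its three fields.
$input = '0x' . 'a9059xbb'
    . str_pad('fc7a5f48a1a1b3f48e7dcb1f23a1ea24199af4d0', 64, '0', STR_PAD_LEFT)
    . str_pad('54368', 64, '0', STR_PAD_LEFT);

$payload  = substr($input, 2);                   // drop the 0x prefix
$selector = substr($payload, 0, 8);              // first 4 bytes = 8 hex digits
$words    = str_split(substr($payload, 8), 64);  // remaining 32-byte words

$address = substr($words[0], -40);  // assumption: 20-byte value, left-padded
$amount  = hexdec($words[1]);       // leading zeros do not change the value

echo $selector, PHP_EOL;
echo $address, PHP_EOL;
echo $amount, PHP_EOL;
```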

php preg_split ignore duplicate delimiters

I want to split strings produced by an older version of phpstan we are constrained to use (v0.9).
Each error string is separated by :, but there are sometimes static calls marked with :: which I want to ignore.
My code:
$error = '/path/to/file/namespace/filename:line_number:error message Namespace\ClassName::method().';
$output = preg_split('/:/', $error);
Printing $output gives this:
Array
(
[0] => /path/to/file/namespace/filename
[1] => line_number
[2] => error message Namespace\ClassName
[3] =>
[4] => method().
)
The result I want is this:
Array
(
[0] => /path/to/file/namespace/filename
[1] => line_number
[2] => error message Namespace\ClassName::method().
)
I was hoping this could be solved with regex.
I have been reading similar questions and have tried variations of regex, none of which worked.
You can use lookahead and lookbehind for your split:
$error = '/path/to/file/namespace/filename:line_number:error message Namespace\ClassName::method().';
$arr = preg_split('/(?<!:):(?!:)/', $error, -1, PREG_SPLIT_NO_EMPTY);
print_r($arr);
Array
(
[0] => /path/to/file/namespace/filename
[1] => line_number
[2] => error message Namespace\ClassName::method().
)
RegEx Details:
(?<!:) is a negative lookbehind that fails the match if there is a : directly before the current position
: matches a single :
(?!:) is a negative lookahead that fails the match if there is a : directly after it
Another option is to match 2 or more occurrences of : and use (*SKIP)(*F). Then match a single : to split on.
:{2,}(*SKIP)(*F)|:
Explanation
:{2,}(*SKIP)(*F) Match 2 or more occurrences of :, then force a failure and skip past the matched characters so that none of them is tried again
| Or
: Match a single :
$error = '/path/to/file/namespace/filename:line_number:error message Namespace\ClassName::method().';
$output = preg_split('/:{2,}(*SKIP)(*F)|:/', $error);
print_r($output);
Output
Array
(
[0] => /path/to/file/namespace/filename
[1] => line_number
[2] => error message Namespace\ClassName::method().
)
Using preg_match_all (matching the parts is sometimes simpler than splitting):
preg_match_all('~[^:]+(?>::[^:]*)*~', $error, $matches);
print_r($matches[0]);
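If the error message is always the last field, the split limit may be enough and no lookarounds are needed. The following is a sketch under that assumption, and under the additional assumption that the file path and line number never contain a colon themselves (a Windows drive letter would break it). The variable names are illustrative.

```php
<?php
$error = '/path/to/file/namespace/filename:line_number:error message Namespace\ClassName::method().';

// A limit of 3 stops splitting after the second colon, so everything that
// follows (including any ::) stays in the last element.
[$file, $line, $message] = explode(':', $error, 3);

echo $file, PHP_EOL;
echo $line, PHP_EOL;
echo $message, PHP_EOL;
```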

Strange behavior of preg_match_all php

I have a very long string of HTML. From this string I want to parse pairs of Russian and English names of cities. An example of this string:
$html = '
Абакан
Хакасия республика
Абан
Красноярский край
Абатский
Тюменская область
';
My code is:
$subject = $this->html;
$pattern = '/<a href="([\/a-zA-Z0-9-"]*)">([а-яА-Я]*)/';
preg_match_all($pattern, $subject, $matches);
For testing I use RegExr. You can see it here: http://regexr.com/399co
In that test I used the global modifier, /g.
Because PHP has no /g modifier, I use the preg_match_all function. But the result of preg_match_all is very strange:
Array
(
[0] => Array
(
[0] => <a href="/forecasts5000/russia/republic-khakassia/abakan">Абакан
[1] => <a href="/forecasts5000/russia/krasnoyarsk-territory/aban">Абан
[2] => <a href="/forecasts5000/russia/tyumen-area/abatskij">Аба�
[3] => <a href="/forecasts5000/russia/arkhangelsk-area/abramovskij-ma">Аб�
)
[1] => Array
(
[0] => /forecasts5000/russia/republic-khakassia/abakan
[1] => /forecasts5000/russia/krasnoyarsk-territory/aban
[2] => /forecasts5000/russia/tyumen-area/abatskij
[3] => /forecasts5000/russia/arkhangelsk-area/abramovskij-ma
)
[2] => Array
(
[0] => Абакан
[1] => Абан
[2] => Аба�
[3] => Аб�
)
)
First, it found only the first match (but I need an array with all matches).
Second, the result looks very strange to me. I want to get the following result:
pairs of /forecasts5000/russia/republic-khakassia/abakan and Абакан
What am I doing wrong?
Element 0 of the result is an array of each of the full matches of the regexp. Element 1 is an array of all the matches for capture group 1, element 2 contains capture group 2, and so on.
You can invert this by using the PREG_SET_ORDER flag. Then element 0 will contain all the results from the first match, element 1 will contain all the results from the second match, and so on. Within each of these, [0] will be the full match, and the remaining elements will be the capture groups.
If you use this option, you can then get the information you want with:
foreach ($matches as $match) {
$url = $match[1];
$text = $match[2];
// Do something with $url and $text
}
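Put together, the PREG_SET_ORDER approach might look like the sketch below. The markup is hypothetical, reconstructed from the matches shown in the question. The u modifier is an addition that the answer above does not mention: the cut-off names in the question's output (such as Аба�) are consistent with a Cyrillic character class being applied byte by byte, which is what happens without it. The file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical markup, reconstructed from the matches shown in the question.
$html = '
<a href="/forecasts5000/russia/republic-khakassia/abakan">Абакан</a>
<a href="/forecasts5000/russia/krasnoyarsk-territory/aban">Абан</a>
<a href="/forecasts5000/russia/tyumen-area/abatskij">Абатский</a>
';

// The u modifier makes PCRE treat pattern and subject as UTF-8, so the
// ranges in the second character class work on whole characters.
$pattern = '~<a href="([/a-zA-Z0-9-]*)">([а-яА-ЯёЁ]*)~u';

preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

// Each $match holds one link: [0] full match, [1] URL, [2] link text.
$pairs = [];
foreach ($matches as $match) {
    $pairs[$match[1]] = $match[2];
}

print_r($pairs);
```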
You can also use the T-Regx library, which has separate methods for each case :)
pattern('<a href="([/a-zA-Z0-9-"]*)">([а-яА-Я]*)')
->match($this->html)
->forEach(function (Match $match) {
$text = $match->text();
$group = $match->group(1);
echo "Match $text with group $group";
});
It also has automatic delimiters.

Pattern for preg_match

I have a string that contains the pattern "[link:activate/$id/$test_code]". Whenever the pattern [link.....] occurs, I need to extract the word activate, $id and $test_code from it.
I also tried to get the inner items by using grouping, but I only get activate and $test_code, not $id. Please help me get the action name and all the parameters in an array.
Below is my code and output
Code
function match_test()
{
$string = "Sample string contains [link:activate/\$id/\$test_code] again [link:anotheraction/\$key/\$second_param]]] also how the other ationc like [link:action] works";
$pattern = '/\[link:([a-z\_]+)(\/\$[a-z\_]+)+\]/i';
preg_match_all($pattern,$string,$matches);
print_r($matches);
}
Output
Array
(
[0] => Array
(
[0] => [link:activate/$id/$test_code]
[1] => [link:anotheraction/$key/$second_param]
)
[1] => Array
(
[0] => activate
[1] => anotheraction
)
[2] => Array
(
[0] => /$test_code
[1] => /$second_param
)
)
Try this:
$subject = <<<'LOD'
Sample string contains [link:activate/$id/$test_code] again [link:anotheraction/$key/$second_param]]] also how the other ationc like [link:action] works
LOD;
$pattern = '~\[link:([a-z_]+)((?:/\$[a-z_]+)*)]~i';
preg_match_all($pattern, $subject, $matches);
print_r($matches);
If you need to have $id and $test_code in separate groups, you can use this instead:
$pattern = '~\[link:([a-z_]+)(/\$[a-z_]+)?(/\$[a-z_]+)?]~i';
Is this what you are looking for?
/\[link:([\w\d]+)\/(\$[\w\d]+)\/(\$[\w\d]+)\]/
Edit:
Also the problem with your expression is this part:
(\/\$[a-z\_]+)+
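Although you have repeated the group, the match returns only one value for it, because it is still a single group declaration. Each repetition overwrites the previous capture, so only the last one (/$test_code) is kept. The regex engine won't create additional group numbers for you (not that I've ever seen, anyway).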
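One way around the single-group limitation, for any number of parameters, is to capture the whole parameter list in one group and split it afterwards. This is a sketch that assumes parameters always have the form /$name. The array keys action and params are illustrative.

```php
<?php
$subject = 'Sample string contains [link:activate/$id/$test_code] again '
    . '[link:anotheraction/$key/$second_param]]] also [link:action] works';

// Group 1: the action name. Group 2: the whole "/$a/$b/..." parameter list.
$pattern = '~\[link:([a-z_]+)((?:/\$[a-z_]+)*)\]~i';

preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);

$links = [];
foreach ($matches as $match) {
    $links[] = [
        'action' => $match[1],
        // Split the list on "/" and drop the empty piece before the first "/".
        'params' => preg_split('~/~', $match[2], -1, PREG_SPLIT_NO_EMPTY),
    ];
}

print_r($links);
```

A link without parameters, such as [link:action], ends up with an empty params array.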

How does this RegEx for parsing emails work in PHP?

Okay, I have the following PHP code to extract an email address of the following two forms:
Random Stranger <email#domain.com>
email#domain.com
Here is the PHP code:
// The first example
$sender = "Random Stranger <email#domain.com>";
$pattern = '/([\w_-]*#[\w-\.]*)|.*<([\w_-]*#[\w-\.]*)>/';
preg_match($pattern,$sender,$matches,PREG_OFFSET_CAPTURE);
echo "<pre>";
print_r($matches);
echo "</pre><hr>";
// The second example
$sender = "user#domain.com";
preg_match($pattern,$sender,$matches,PREG_OFFSET_CAPTURE);
echo "<pre>";
print_r($matches);
echo "</pre>";
My question is... what is in $matches? It seems to be a strange collection of arrays. Which index holds the match from the parenthesis? How can I be sure I'm getting the email address and only the email address?
Update:
Here is the output:
Array
(
[0] => Array
(
[0] => Random Stranger
[1] => 0
)
[1] => Array
(
[0] =>
[1] => -1
)
[2] => Array
(
[0] => user#domain.com
[1] => 5
)
)
Array
(
[0] => Array
(
[0] => user#domain.com
[1] => 0
)
[1] => Array
(
[0] => user#domain.com
[1] => 0
)
)
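This doesn't help you with your preg question, but it will simplify your code. Since those are the only 2 options, don't use regular expressions. (end() takes its argument by reference, so the exploded array goes into a variable first.)
$parts = explode( '<', rtrim( $sender, '>' ) );
echo end( $parts );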
The following is copied directly from the help doc at http://us.php.net/preg_match
If matches is provided, then it is filled with the results of search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.
The preg_match() manual page explains how $matches works. It's an optional parameter that gets filled with the results of any bracketed sub-expression from your regexp, in the order that they matched. $matches[0] is always the entire expression match, followed by the sub-expressions.
So for example, that pattern contains two sub-expressions, ([\w_-]*#[\w-\.]*) and ([\w_-]*#[\w-\.]*), one in each alternative. The parts matching those two expressions will be put into $matches[1] and $matches[2], respectively. For Random Stranger <email#domain.com>, only the second alternative matches, so the first group stays empty and, without any flags, you would have something like this in $matches:
Array(
0 => "Random Stranger <email#domain.com>",
1 => "",
2 => "email#domain.com"
)
Think of it as passing an array named $matches by reference, that gets filled with all the sub-parts that are matched.
Edit - note that you are using the PREG_OFFSET_CAPTURE flag, which alters the behaviour of how $matches gets filled, so your result won't match my example. The manual explains how this flag alters the capture as well. In this case, instead of a set of matched sub-expressions, you get a multidimensional array of each expression with the position it was found at in the string.
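To get the address and nothing else, one option is to wrap the match in a small helper that returns whichever group was filled. This is a sketch, not the original poster's code: the function name extractAddress and the constant ADDRESS_PATTERN are hypothetical, and the pattern is a slightly tightened variant of the one in the question (+ instead of *, hyphen placed last in each character class). The # separator is kept exactly as it appears in the question's examples.

```php
<?php
// Two alternatives: a bare address (group 1) or an address in <...> (group 2).
const ADDRESS_PATTERN = '/([\w-]+#[\w.-]+)|.*<([\w-]+#[\w.-]+)>/';

function extractAddress(string $sender): ?string
{
    if (!preg_match(ADDRESS_PATTERN, $sender, $m)) {
        return null; // no address found
    }
    // Group 1 is filled for a bare address. Otherwise it is an empty string
    // and group 2 holds the address found between < and >.
    return $m[1] !== '' ? $m[1] : $m[2];
}

echo extractAddress('Random Stranger <email#domain.com>'), PHP_EOL;
echo extractAddress('user#domain.com'), PHP_EOL;
```

Without PREG_OFFSET_CAPTURE each entry of $m is a plain string, which keeps the check simple.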
