PHP preg_match_all RegEx conflict

if (preg_match_all ("/\[protected\]\s*(((?!\[protected\]|\[/protected\]).)+)\s*\[/protected\]/g", $text, $matches)) {
var_dump($matches);
var_dump($text);
}
The text is
<p>SDGDSFGDFGdsgdfog<br>
[protected]<br> STUFFFFFF<br>
[/protected]<br> SDGDSFGDFGdsgdfog</p>
But when I var_dump $matches (outside the if statement), it gives NULL.
Help people!

You're using / (slash) as the regex delimiter, but you also have unescaped slashes in the regex. Either escape them or (preferably) use a different delimiter.
There's no g modifier in PHP regexes. If you want a global match, you use preg_match_all(); otherwise you use preg_match().
...but there is an s modifier, and you should be using it. That's what enables . to match newlines.
After changing your regex to this:
'~\[protected\]\s*((?:(?!\[/?protected\]).)+?)\s*\[/protected\]~s'
...I get this output:
array(2) {
[0]=>
array(1) {
[0]=>
string(42) "[protected]<br> STUFFFFFF<br>
[/protected]"
}
[1]=>
array(1) {
[0]=>
string(18) "<br> STUFFFFFF<br>"
}
}
string(93) "<p>SDGDSFGDFGdsgdfog<br>
[protected]<br> STUFFFFFF<br>
[/protected]<br> SDGDSFGDFGdsgdfog</p>"
Additional changes:
I switched to using single-quotes around the regex; double-quotes are subject to $variable interpolation and {embedded code} evaluation.
I shortened the lookahead expression by using an optional slash (/?).
I switched to using a reluctant plus (+?) so the whitespace following the closing tag doesn't get included in the capture group.
I changed the innermost group from capturing to non-capturing; it was only saving the last character in the matched text, which seems pointless.
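As a minimal sketch of how the corrected pattern behaves when the text holds more than one block, the snippet below runs it against a made-up input. The sample string and the variable names are assumptions for illustration, not taken from the question.

```php
<?php
// Hypothetical input with two [protected] blocks; the first one spans several lines.
$text = "intro [protected]\n first \n[/protected] middle [protected]second[/protected] end";

// Single-quoted, ~ as the delimiter, s modifier so that . also matches newlines.
$pattern = '~\[protected\]\s*((?:(?!\[/?protected\]).)+?)\s*\[/protected\]~s';

$count = preg_match_all($pattern, $text, $matches);

// $matches[1] lists what capture group 1 held in each match.
// The reluctant +? leaves the surrounding whitespace to the two \s* parts.
var_dump($count, $matches[1]);
```

With this input, the capture group should hold first and second without the whitespace around them.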

$text= '<p>SDGDSFGDFGdsgdfog<br>
[protected]<br> STUFFFFFF<br>
[/protected]<br> SDGDSFGDFGdsgdfog</p>';
if (preg_match_all ("/\[protected\]\s*(((?!\[protected\]|\[\/protected\]).)+)\s*\[\/protected\]/x", $text, $matches)) {
var_dump($matches);
var_dump($text);
}
There is no g modifier in PHP's preg functions; you can read more under Pattern Modifiers in the PHP manual. Using the x modifier instead works fine, though.
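The effect of an unknown modifier can be checked directly. The following is only a sketch with a made-up pattern and subject: it shows that preg_match_all() returns false (rather than 0) when the pattern cannot be compiled, which is worth testing for with ===.

```php
<?php
// The g modifier does not exist in PHP, so compiling this pattern fails.
// The @ only hides the "Unknown modifier" warning to keep the demo output clean.
$bad = @preg_match_all('/ab/g', 'ab ab', $ignored);

// Without g, preg_match_all() already finds every match.
$good = preg_match_all('/ab/', 'ab ab', $found);

var_dump($bad);   // bool(false)
var_dump($good);  // int(2)
```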

Related

PHP preg_match returns two matches instead of one

I have this string: ATL.556808.UMO20.02 and I want to get only UMO20.02.
Here is my preg_match:
$e = preg_match('"\.[^\.]+\.(.*?)$"si', $t, $m);
But this code returns two matches instead of one. I get:
array(2) {
[0]=> string(16) ".556808.UMO20.02"
[1]=> string(8) "UMO20.02"
}
But I want to get one match:
array(1) {
[0]=> string(8) "UMO20.02"
}
Where is the problem?

You don't have to use the s and i flags as there are no specific cases for upper or lowercase chars, and the dot does not have to match a newline in the example data.
You can use
\.[^.]+\.\K.+$
\. Match .
[^.]+\. Match 1+ times any char except a ., then a .
\K Forget what is matched
.+ Match any char 1+ times
$ End of string
Regex demo
Example code
$re = '/\.[^.]+\.\K.+$/';
$str = 'ATL.556808.UMO20.02';
preg_match($re, $str, $matches);
print_r($matches);
Output
Array
(
[0] => UMO20.02
)
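To make the difference visible, here is a small sketch that runs a capture-group version and the \K version side by side. The variable names $withGroup and $withK are made up for this comparison.

```php
<?php
$str = 'ATL.556808.UMO20.02';

// With a capture group, index 0 is the whole match and index 1 is the group.
preg_match('/\.[^.]+\.(.*)$/', $str, $withGroup);

// With \K, everything matched before it is dropped from index 0, and no group is needed.
preg_match('/\.[^.]+\.\K.+$/', $str, $withK);

print_r($withGroup); // two entries: ".556808.UMO20.02" and "UMO20.02"
print_r($withK);     // one entry: "UMO20.02"
```

If keeping the capture group is preferable, reading $withGroup[1] gives the same value.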

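preg_match() always stores the whole match in $m[0] and each capture group in the elements after it, so a pattern with one group yields two entries. Your \.[^\.]+\.(.*?)$ regex matches a ., then any one or more chars other than a dot, then a dot, and then any zero or more chars as few as possible (but as many as necessary to complete a match) up to the end of string. The .*? must be tempered to match any chars but dots.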
To remove all up to and including the second dot, you can use
$t = 'ATL.556808.UMO20.02';
echo preg_replace('~^(?:[^.]+\.){2}~', '', $t);
// => UMO20.02
See the PHP demo. See the regex demo. Details:
^ - start of string
(?:[^.]+\.){2} - two occurrences of any one or more chars other than a . and then a . char
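For input that is always dot-separated like this, a non-regex alternative may also be enough. This is a different technique from the answers above (plain explode() with a limit), sketched under the assumption that the string always contains at least two dots.

```php
<?php
$t = 'ATL.556808.UMO20.02';

// A limit of 3 splits on the first two dots only; the rest stays in the last element.
$parts = explode('.', $t, 3);

// Assumption: at least two dots are present, otherwise index 2 does not exist.
$tail = $parts[2];

echo $tail, "\n"; // UMO20.02
```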

Match string with 1 or more trailing substrings

I have an input that goes like this
[d/D/d1/d2/d3/d4/d5/d6/d7/D1/D2/D3/D4/D5/D6/D7]+[\.]+[r1/r2/r3/r4/r5/r6/R1/R2/R3/R4/R5/R6]+[\.]+[number 1 to 37]+[#]+[number 0 - 9 ]
An example would be "d2.r1.4#100.37#1.9#2.3#1(can have as many 1-37 # 0-9 as needed)"
How do I write a regex match that can allow the last part of the string to be dynamic (matches as many groups as needed as inputted)
I've tried this expression:
[dD1-7]+\.[rR1-5]+\.
and I'm not sure how to match the dynamic group that comes after the "d2.r1." part.

Assuming you merely need to validate the string (and not capture/extract specific substrings), the following pattern provides the same result as Emma's answer but with a tighter syntax.
The i pattern modifier means you only have to write the two letters in lowercase. I don't use any excess non-capturing groups. Two-character character classes don't need a hyphen. \d is the shorter way of expressing [0-9].
Wrapping the final/repeating characters in parentheses then writing * means the sequence in the parentheses may repeat zero or more times.
Code: (Demo)
$inputs = [
'd2.r1.4#100.37#1.9#2.3#1',
'd2.r1.4#100.37#1.9#2.38#1.8#22',
'd2.r1.4#100.37#1.9#2.3#1.12#2.30#2',
];
$pattern = '/^d[1-7]\.r[1-6](?:\.(?:3[0-7]|[12]\d|[1-9])#\d+)*$/i';
foreach ($inputs as $input) {
echo "\n{$input}: ";
var_export((bool)preg_match($pattern, $input));
}
Output:
d2.r1.4#100.37#1.9#2.3#1: true
d2.r1.4#100.37#1.9#2.38#1.8#22: false
d2.r1.4#100.37#1.9#2.3#1.12#2.30#2: true
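If the individual number pairs are needed after validation, one possible approach is a second pass with preg_match_all(). This is only a sketch: the extraction pattern and the $pairs structure are assumptions, not part of the answer above.

```php
<?php
$input     = 'd2.r1.4#100.37#1.9#2.3#1';
$validator = '/^d[1-7]\.r[1-6](?:\.(?:3[0-7]|[12]\d|[1-9])#\d+)*$/i';

$pairs = [];
if (preg_match($validator, $input)) {
    // Each set is [full match, number before #, number after #].
    preg_match_all('/\.(\d+)#(\d+)/', $input, $sets, PREG_SET_ORDER);
    foreach ($sets as $set) {
        $pairs[] = [(int) $set[1], (int) $set[2]];
    }
}

print_r($pairs);
```

Validating first keeps the extraction pattern simple, because it can rely on the string already having the expected shape.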

I'm guessing that maybe some expression similar to,
^[dD][1-7]\.[rR][1-6](?:(?:\.(?:3[0-7]|[1-2]\d|[1-9]))#[0-9]+)*$
or with some slight changes, would likely work here.
Test
$re = '/^[dD][1-7]\.[rR][1-6](?:(?:\.(?:3[0-7]|[1-2]\d|[1-9]))#[0-9]+)*$/m';
$str = 'd2.r1.4#100.37#1.9#2.3#1
d2.r1.4#100.37#1.9#2.3#1.37#1.9#2.3#1.37#1.9#2.3#1.37#1.9#2.3#1
d2.r1.4#100.38#1.9#2.3#1
d2.r1.4#100.0#1.9#2.3#1
';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
var_dump($matches);
Output
array(2) {
[0]=>
array(1) {
[0]=>
string(24) "d2.r1.4#100.37#1.9#2.3#1"
}
[1]=>
array(1) {
[0]=>
string(63) "d2.r1.4#100.37#1.9#2.3#1.37#1.9#2.3#1.37#1.9#2.3#1.37#1.9#2.3#1"
}
}
If you wish to simplify/modify/explore the expression, it's been explained on the top right panel of regex101.com. If you'd like, you can also watch in this link, how it would match against some sample inputs.
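The 1 to 37 part of the expression can also be checked on its own. The sketch below anchors just that alternation and tries a few boundary values; the candidate list is made up for illustration.

```php
<?php
// 3[0-7] covers 30-37, [12]\d covers 10-29, [1-9] covers 1-9.
$range = '/^(?:3[0-7]|[12]\d|[1-9])$/';

$results = [];
foreach (['0', '1', '9', '10', '29', '37', '38', '100'] as $candidate) {
    $results[$candidate] = (bool) preg_match($range, $candidate);
}

// Only 1, 9, 10, 29 and 37 should come out as true.
var_export($results);
```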

find innerhtml in html tag excluding that tag using php preg_match_all

Using preg_match_all(), I want to match something like:
"<table class='list2'><tr><th>States</th><th width='55' class='view'>Localities</th></tr></table>"
This is exactly the string I'm trying to extract data from, including the brackets, quotes, angle-brackets etc...
I want "th" innerhtml using preg_match_all().
I am using this expression
(?=<th[^>]*>)(.*?)(?=<\/th>)
it will give me
"<th>States" and "<th width='55' class='view'>Localities"
but I want only "States" and "Localities".

You could use the regex below to match States and Localities. \K discards everything matched so far, so only the text after it is returned.
<th.*?>\K[^<]*
DEMO
Your PHP code would be,
<?php
$data = "\"<table class='list2'><tr><th>States</th><th width='55' class='view'>Localities</th></tr></table>\"";
$regex = '~<th.*?>\K[^<]*~';
preg_match_all($regex, $data, $matches);
var_dump($matches);
?>
Output:
array(1) {
[0]=>
array(2) {
[0]=>
string(6) "States"
[1]=>
string(10) "Localities"
}
}
Explanation:
<th.*?> Matches <th up to the first occurrence of >
\K Discards the previously matched characters from the result.
[^<]* Matches any character other than < zero or more times.
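A capture group is another way to get the same values. The pattern below is an assumption for illustration (it is not from the answer): the \b keeps it from also matching tags such as <thead>, and the text is read from $matches[1] instead of $matches[0].

```php
<?php
$data = "<table class='list2'><tr><th>States</th><th width='55' class='view'>Localities</th></tr></table>";

// Group 1 captures whatever sits between the opening <th ...> and the closing </th>.
preg_match_all('~<th\b[^>]*>(.*?)</th>~s', $data, $matches);

print_r($matches[1]); // States, Localities
```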

php regex find text within parenthesis

Using PHP or Powershell I need help in finding a text in a text file.txt, within parenthesis then output the value.
Example:
file.txt looks like this:
This is a test I (MyTest: Test) in a parenthesis
Another Testing (MyTest: JohnSmith) again. Not another testing testing (MyTest: 123)
My code:
$content = file_get_contents('file.txt');
$needle = "MyTest";
preg_match('~^(.*'.$needle.'.*)$~', $content, $line);
Output to a new text file will be:
123Test, JohnSmith,123,

Use this pattern:
~\(%s:\s*(.*?)\)~s
Note that %s here is not a part of the actual pattern. It's used by sprintf() to substitute the values that are passed as arguments. %s stands for string, %d for signed integer etc.
Explanation:
~ - starting delimiter
\( - match a literal (
%s - a placeholder for the $needle value
: - match a literal :
\s* - zero or more whitespace characters
(.*?) - match (and capture) anything inside the parentheses
\) - match a literal )
~ - ending delimiter
s - a pattern modifier that makes . match newlines as well
Code:
$needle = 'MyTest';
$pattern = sprintf('~\(%s:\s*(.*?)\)~s', preg_quote($needle, '~'));
preg_match_all($pattern, $content, $matches);
var_dump($matches[1]);
Output:
array(3) {
[0]=>
string(4) "Test"
[1]=>
string(9) "JohnSmith"
[2]=>
string(3) "123"
}
Demo
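To get from the array to the comma-separated line the question asks for, the captured values can be joined with implode(). A minimal sketch, in which the inline $content stands in for file_get_contents('file.txt') and output.txt is a made-up file name:

```php
<?php
// Stand-in for file_get_contents('file.txt'), so the sketch runs without a file.
$content = "This is a test I (MyTest: Test) in a parenthesis\n"
         . "Another Testing (MyTest: JohnSmith) again. Not another testing testing (MyTest: 123)";

$needle  = 'MyTest';
$pattern = sprintf('~\(%s:\s*(.*?)\)~s', preg_quote($needle, '~'));
preg_match_all($pattern, $content, $matches);

// Join the captured values into one line.
$line = implode(',', $matches[1]);
echo $line, "\n"; // Test,JohnSmith,123

// To save the result, file_put_contents('output.txt', $line) could be used.
```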

Here's a Powershell solution:
@'
This is a test I (MyTest: Test) in a parenthesis
Another Testing (MyTest: JohnSmith) again. Not another testing testing (MyTest: 123)
'@ | set-content test.txt
([regex]::Matches((get-content test.txt),'\([^:]+:\s*([^)]+)')|
foreach {$_.groups[1].value}) -join ','
Test,JohnSmith,123
You can add that trailing comma after it's done if you really did want that there....

php preg_match get numbers between two strings

Hi I'm starting to learn php regex and have the following problem:
I need to extract the numbers inside $string.
The regex I use returns "NULL".
$string = 'Clasificación</a> (2194) </li>';
$regex = '/Clasificación</a>((.*?))</li>/';
preg_match($regex , $string, $match);
var_dump($match);
Thanks in advance.

There are three problems with your regex:
You aren't escaping the forward slash. You're using the forward slash as a delimiter, so if you want to use it as a literal character inside the expression, you need to escape it (or pick a different delimiter).
((.*?)) doesn't do what you think it does. It creates two capturing groups, one nested inside the other. I assume you're trying to capture what's inside the parentheses. For that, you'll need to escape the ( and ) characters. The expression would become: \((.*?)\)
Your expression doesn't handle whitespace. In the string you've given, there is whitespace between the </a> and the beginning of the number: </a> (2194). To skip the whitespace and capture just the number, you need to use \s (which matches any whitespace character). For that, you need to write \s*\((.*?)\)\s*.
The final regular expression after fixing all the above errors, will look like:
$regex = '~Clasificación</a>\s*\((.*?)\)\s*</li>~';
Full code:
$string = 'Clasificación</a> (2194) </li>';
$regex = '~Clasificación</a>\s*\((.*?)\)\s*</li>~';
preg_match($regex , $string, $match);
var_dump($match);
Output:
array(2) {
[0]=>
string(32) "Clasificación</a> (2194) </li>"
[1]=>
string(4) "2194"
}
Demo.

You forgot to escape / in your regex; since you're using / as the delimiter, it has to be escaped inside the pattern:
$regex = '/Clasificación<\/a>((.*?))<\/li>/';
//        ^ delimiter    ^^               ^ delimiter
//                                   ^^ a / inside the pattern, which is escaped
Another way is to change the delimiter, and then you do not have to escape the slash:
$regex = '#Clasificación</a>((.*?))</li>#';
See the PHP documentation for more information.
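As a quick check that the delimiter choice does not change what is matched, the sketch below runs both spellings against the string from the question. The variable names are made up; note that with the parentheses left unescaped, the group captures the surrounding spaces and the literal parentheses as well.

```php
<?php
$string = 'Clasificación</a> (2194) </li>';

// Same expression with two delimiters: only the / version needs \/ for literal slashes.
preg_match('/Clasificación<\/a>((.*?))<\/li>/', $string, $withSlash);
preg_match('#Clasificación</a>((.*?))</li>#', $string, $withHash);

var_dump($withSlash === $withHash); // bool(true)
var_dump($withSlash[1]);            // " (2194) " including spaces and parentheses
```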

You will have to escape the special characters that you want to match:
$regex = '/Clasificación<\/a> \((.*?)\) <\/li>/';
You may also want to make the match a little more specific where it matters (depending on your use case):
$regex = '/Clasificación<\/a>\s*\(([0-9]+)\)\s*<\/li>/';
That allows zero or more whitespace characters before and after the (1234) and only matches if the parentheses contain nothing but digits.
I just tried this in PHP:
php > preg_match($regex , $string, $match);
php > var_dump($match);
array(2) {
[0]=>
string(30) "Clasificacin</a> (2194) </li>"
[1]=>
string(4) "2194"
}
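The stricter [0-9]+ version can be contrasted with a non-numeric value. This is a sketch only; the second subject string is invented to show the rejection case.

```php
<?php
$regex = '/Clasificación<\/a>\s*\(([0-9]+)\)\s*<\/li>/';

// Digits inside the parentheses: the pattern matches and group 1 holds the number.
$numeric = preg_match($regex, 'Clasificación</a> (2194) </li>', $match);

// Hypothetical non-numeric content: the pattern does not match at all.
$other = preg_match($regex, 'Clasificación</a> (n/a) </li>', $noMatch);

var_dump($numeric, $match[1], $other); // int(1), string(4) "2194", int(0)
```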
