php regex find text within parenthesis


Using PHP or PowerShell, I need help finding text within parentheses in a text file (file.txt) and then outputting the value.
Example:
file.txt looks like this:
This is a test I (MyTest: Test) in a parenthesis
Another Testing (MyTest: JohnSmith) again. Not another testing testing (MyTest: 123)
My code:
$content = file_get_contents('file.txt');
$needle = "MyTest";
preg_match('~^(.*'.$needle.'.*)$~', $content, $line);
Output to a new text file will be:
Test,JohnSmith,123,

Use this pattern:
~\(%s:\s*(.*?)\)~s
Note that %s here is not a part of the actual pattern. It's used by sprintf() to substitute the values that are passed as arguments. %s stands for string, %d for signed integer etc.
Explanation:
~ - starting delimiter
\( - match a literal (
%s - a placeholder for the $needle value
: - match a literal :
\s* - zero or more whitespace characters
(.*?) - match (and capture) anything inside the parentheses
\) - match a literal )
~ - ending delimiter
s - a pattern modifier that makes . match newlines as well
Code:
$needle = 'MyTest';
$pattern = sprintf('~\(%s:\s*(.*?)\)~s', preg_quote($needle, '~'));
preg_match_all($pattern, $content, $matches);
var_dump($matches[1]);
Output:
array(3) {
[0]=>
string(4) "Test"
[1]=>
string(9) "JohnSmith"
[2]=>
string(3) "123"
}
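To get from the array above to the comma-separated line the question asks for, the captured values can be joined and written out. This is only a minimal sketch: the helper name extractParenValues and the output location are assumptions, not part of the original answer.

```php
<?php
// Hypothetical helper: returns every value that follows "$needle:" inside parentheses.
function extractParenValues(string $content, string $needle): array
{
    // preg_quote() escapes regex metacharacters in the needle, plus the ~ delimiter.
    $pattern = sprintf('~\(%s:\s*(.*?)\)~s', preg_quote($needle, '~'));
    preg_match_all($pattern, $content, $matches);
    return $matches[1]; // group 1 holds the text after "needle:"
}

// Inline sample standing in for file_get_contents('file.txt').
$content = "This is a test I (MyTest: Test) in a parenthesis\n"
         . "Another Testing (MyTest: JohnSmith) again. Not another testing testing (MyTest: 123)\n";

$values = extractParenValues($content, 'MyTest');
$line = implode(',', $values) . ','; // trailing comma, as in the question

$outFile = sys_get_temp_dir() . '/output.txt'; // assumed output location
file_put_contents($outFile, $line);

echo $line, PHP_EOL; // Test,JohnSmith,123,
```

Because the needle goes through preg_quote(), a needle such as My.Test would be matched literally rather than treating the dot as a wildcard.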

Here's a PowerShell solution:
@'
This is a test I (MyTest: Test) in a parenthesis
Another Testing (MyTest: JohnSmith) again. Not another testing testing (MyTest: 123)
'@ | set-content test.txt
([regex]::Matches((get-content test.txt),'\([^:]+:\s*([^)]+)')|
foreach {$_.groups[1].value}) -join ','
Test,JohnSmith,123
You can add that trailing comma after it's done if you really did want that there....

Related

PHP preg_match returns two matches instead of one

I have this string: ATL.556808.UMO20.02 and I want to get only UMO20.02.
Here is my preg_match:
$e = preg_match('"\.[^\.]+\.(.*?)$"si', $t, $m);
But this code returns two matches instead of one. I got:
array(2) {
[0]=> string(16) ".556808.UMO20.02"
[1]=> string(8) "UMO20.02"
}
But I want to get one match:
array(1) {
[0]=> string(8) "UMO20.02"
}
Where is the problem?

You don't have to use the s and i flags as there are no specific cases for upper or lowercase chars, and the dot does not have to match a newline in the example data.
You can use
\.[^.]+\.\K.+$
\. Match .
[^.]+\. Match any char except a . one or more times, then match a .
\K Forget what is matched
.+ Match any char 1+ times
$ End of string
Example code
$re = '/\.[^.]+\.\K.+$/';
$str = 'ATL.556808.UMO20.02';
preg_match($re, $str, $matches);
print_r($matches);
Output
Array
(
[0] => UMO20.02
)

Your \.[^\.]+\.(.*?)$ regex matches a ., then any one or more chars other than a dot, then a dot, and then any zero or more chars as few as possible (but as many as necessary to complete a match) up to the end of string. The .*? must be tempered to match any chars but dots.
To remove all up to and including the second dot, you can use
$t = 'ATL.556808.UMO20.02';
echo preg_replace('~^(?:[^.]+\.){2}~', '', $t);
// => UMO20.02
Details:
^ - start of string
(?:[^.]+\.){2} - two occurrences of any one or more chars other than a . and then a . char
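It may help to see the approaches side by side. preg_match() always stores the whole match at index 0 and each capture group at the following indexes, so the array(2) in the question is a single match plus one group, not two matches. A small sketch (the variable names are illustrative only):

```php
<?php
$t = 'ATL.556808.UMO20.02';

// With a capture group: index 0 is the whole match, index 1 is group 1.
preg_match('~\.[^.]+\.(.*?)$~', $t, $withGroup);

// With \K: everything matched before \K is dropped from the reported match.
preg_match('~\.[^.]+\.\K.+$~', $t, $withK);

// With preg_replace: strip the first two dot-terminated segments.
$stripped = preg_replace('~^(?:[^.]+\.){2}~', '', $t);

print_r($withGroup); // [0] => .556808.UMO20.02, [1] => UMO20.02
print_r($withK);     // [0] => UMO20.02
echo $stripped, PHP_EOL; // UMO20.02
```

If the capture-group form is kept, reading $withGroup[1] gives the wanted value without changing the pattern.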

php preg_match get numbers between two strings

Hi I'm starting to learn php regex and have the following problem:
I need to extract the numbers inside $string.
The regex I use returns "NULL".
$string = 'Clasificación</a> (2194) </li>';
$regex = '/Clasificación</a>((.*?))</li>/';
preg_match($regex , $string, $match);
var_dump($match);
Thanks in advance.

There are three problems with your regex:
You aren't escaping the forward slash. You're using the forward slash as the delimiter, so to use it as a literal character inside the expression you need to escape it (or choose a different delimiter, such as ~).
((.*?)) doesn't do what you think it does. It creates two capturing groups -- one nested inside the other. I assume, you're trying to capture what's inside the parentheses. For that, you'll need to escape the ( and ) characters. The expression would become: \((.*?)\)
Your expression doesn't handle whitespace. In the string you've given, there is whitespace between the </a> and the beginning of the number -- </a> (2194). To ignore the whitespace and capture just the number, you need to use \s (which matches any whitespace character). For that, you need to write \s*\((.*?)\)\s*.
The final regular expression after fixing all the above errors, will look like:
$regex = '~Clasificación</a>\s*\((.*?)\)\s*</li>~';
Full code:
$string = 'Clasificación</a> (2194) </li>';
$regex = '~Clasificación</a>\s*\((.*?)\)\s*</li>~';
preg_match($regex , $string, $match);
var_dump($match);
Output:
array(2) {
[0]=>
string(32) "Clasificación</a> (2194) </li>"
[1]=>
string(4) "2194"
}
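The fixed expression can be wrapped in a small function that also handles the no-match case. This is a sketch under two assumptions: the helper name extractCount is made up for illustration, and it uses \d+ instead of the answer's (.*?) so that only digits are accepted.

```php
<?php
// Hypothetical helper: returns the number in parentheses as an int, or null if absent.
function extractCount(string $html): ?int
{
    // ~ is the delimiter, so the / in </a> and </li> needs no escaping.
    $regex = '~Clasificación</a>\s*\((\d+)\)\s*</li>~';
    if (preg_match($regex, $html, $match) === 1) {
        return (int) $match[1];
    }
    return null; // no match (preg_match returned 0) or error (false)
}

var_dump(extractCount('Clasificación</a> (2194) </li>')); // int(2194)
var_dump(extractCount('Clasificación</a> </li>'));        // NULL
```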

You forgot to escape the / in your regex; this is required because you're using / as the delimiter:
$regex = '/Clasificación<\/a>((.*?))<\/li>/';
//        ^ delimiter                     ^ delimiter
//                       ^^         ^^ escaped / inside the pattern
Another way is to change the delimiter, so that you no longer have to escape the slash:
$regex = '#Clasificación</a>((.*?))</li>#';
See the PHP documentation for more information.

You will have to escape the special characters that you want to match literally:
$regex = '/Clasificación<\/a> \((.*?)\) <\/li>/';
You may also want to make the match more specific where it matters (depending on your use case):
$regex = '/Clasificación<\/a>\s*\(([0-9]+)\)\s*<\/li>/';
That allows zero or more whitespace characters before and after the (1234) and only matches if the parentheses contain nothing but digits.
I just tried this in php:
php > preg_match($regex , $string, $match);
php > var_dump($match);
array(2) {
[0]=>
string(30) "Clasificación</a> (2194) </li>"
[1]=>
string(4) "2194"
}

PHP regex - Take the short one

I have the string: This is a [[bla]] and i want a [[burp]] and I need to put the two strings [[bla]] and [[burp]] into an array.
The regexp I am trying to use is:
$pattern = "/\[\[.+\]\]/";
The problem is that the output is: [[bla]] and i want a [[burp]], because I suppose it takes the first [[ together with the last ]].
How can I fix the pattern?

Make it ungreedy:
/\[\[.+?\]\]/
or use a negated character class:
/\[\[[^\]]+\]\]/
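The difference between the three patterns is easiest to see by running them on the same input. A short sketch (the variable names are illustrative):

```php
<?php
$string = 'This is a [[bla]] and i want a [[burp]]';

// Greedy: .+ runs to the last ]] in the string, giving one long match.
preg_match_all('/\[\[.+\]\]/', $string, $greedy);

// Lazy: .+? stops at the first ]] it can reach.
preg_match_all('/\[\[.+?\]\]/', $string, $lazy);

// Negated class: [^\]]+ cannot cross a ] at all.
preg_match_all('/\[\[[^\]]+\]\]/', $string, $negated);

print_r($greedy[0]);  // [[bla]] and i want a [[burp]]
print_r($lazy[0]);    // [[bla]], [[burp]]
print_r($negated[0]); // [[bla]], [[burp]]
```

The lazy and negated-class versions give the same result here; they would differ on content that contains a single ] inside the double brackets, which only the lazy version accepts.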

You need ungreedy (lazy) repetition here -> *? to get only the text between one [[ ]] pair and not everything from the first [[ to the last ]]:
$pattern = "/\[\[(.*?)\]\]/";
You also need a capturing group to get only the text between the square brackets and not the brackets themselves -> (.*?)
Example:
$string = "This is a [[bla]] and i want a [[burp]]";
$pattern = "/\[\[(.*?)\]\]/";
preg_match_all($pattern , $string, $matches);
var_dump($matches[1]);
Output:
array(2) {
[0]=>
string(3) "bla"
[1]=>
string(4) "burp"
}

preg_match or similar to get value from a string

I am not good with preg_match or similar functions which are not deprecated.
Here are 2 strings:
/Cluster-Computers-c-10.html
/Mega-Clusters-c-15_32.html
I would like to find out:
In example 1, how to get the value between -c- and .html (the value in the example is 10). The value is always an integer (numeric).
In example 2, how to get the value between -c- and .html (the value in the example is 15_32). The value is always two integers separated by _.
Basically what I want to do is check if a string has either c-10.html or c-15_32.html and get the value and pass it to the database.

You can do:
preg_match('/-c-(\d+(?:_\d+)?)\.html$/i', $str, $matches);
Explanation:
-c- : A literal -c-
( : Beginning of capturing group
\d+ : one or more digits, that is a number
(?: : Beginning of a non-capturing group
_\d+ : _ followed by a number
) : End of non-capturing group
? : Makes the last group optional
) : End of capturing group
\. : . is a metacharacter that matches any char (except newline); to match
a literal . you need to escape it.
html : a literal html
$ : End anchor. Without it the pattern could match anywhere in
the input string, not just at the end.
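Since the question wants to pass the value on to the database, the captured string can be split into integers first. A minimal sketch, assuming a made-up helper name parseCategoryIds; it uses the pattern explained above and returns an empty array when the URL does not match.

```php
<?php
// Hypothetical helper: returns the ids after "-c-" as integers, or [] if there is no match.
function parseCategoryIds(string $url): array
{
    if (preg_match('/-c-(\d+(?:_\d+)?)\.html$/i', $url, $m) !== 1) {
        return [];
    }
    // "15_32" becomes [15, 32]; "10" becomes [10].
    return array_map('intval', explode('_', $m[1]));
}

var_dump(parseCategoryIds('/Cluster-Computers-c-10.html')); // [10]
var_dump(parseCategoryIds('/Mega-Clusters-c-15_32.html'));  // [15, 32]
var_dump(parseCategoryIds('/About-Us.html'));               // []
```

Casting to int before building the query keeps non-numeric input out of the database layer; a prepared statement would still be advisable.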

preg_match('~-c-(.*?)\.html$~', $str, $matches);
var_dump($matches);

/-c-(\d+(?:_\d+)?)\.html$/i
-c- look for -c-
(\d+(?:_\d+)?) match number or number-underscore-number
\.html a period and trailing html
$ force it to match the end of the line
i case-insensitive match
Example:
<?php
header('Content-Type: text/plain');
$t = Array(
'1) /Cluster-Computers-c-10.html',
'2) /Mega-Clusters-c-15_32.html'
);
foreach ($t as $test){
$_ = null;
if (preg_match('/-c-(\d+(?:_\d+)?)\.html$/i',$test,$_))
var_dump($_);
echo "\r\n";
}
?>
output:
array(2) {
[0]=>
string(10) "-c-10.html"
[1]=>
string(2) "10"
}
array(2) {
[0]=>
string(13) "-c-15_32.html"
[1]=>
string(5) "15_32"
}
Working Code: http://www.ideone.com/B70AQ

The simplest way I see would be:
preg_match( '/-c-([^.]+)\.html/i', $url, $matches );
var_dump( $matches );

PHP preg_match_all RegEx conflict

if (preg_match_all ("/\[protected\]\s*(((?!\[protected\]|\[/protected\]).)+)\s*\[/protected\]/g", $text, $matches)) {
var_dump($matches);
var_dump($text);
}
The text is
<p>SDGDSFGDFGdsgdfog<br>
[protected]<br> STUFFFFFF<br>
[/protected]<br> SDGDSFGDFGdsgdfog</p>
But when $matches is var_dumped (outside the if statement), it gives NULL.
Help people!

You're using / (slash) as the regex delimiter, but you also have unescaped slashes in the regex. Either escape them or (preferably) use a different delimiter.
There's no g modifier in PHP regexes. If you want a global match, you use preg_match_all(); otherwise you use preg_match().
...but there is an s modifier, and you should be using it. That's what enables . to match newlines.
After changing your regex to this:
'~\[protected\]\s*((?:(?!\[/?protected\]).)+?)\s*\[/protected\]~s'
...I get this output:
array(2) {
[0]=>
array(1) {
[0]=>
string(42) "[protected]<br> STUFFFFFF<br>
[/protected]"
}
[1]=>
array(1) {
[0]=>
string(18) "<br> STUFFFFFF<br>"
}
}
string(93) "<p>SDGDSFGDFGdsgdfog<br>
[protected]<br> STUFFFFFF<br>
[/protected]<br> SDGDSFGDFGdsgdfog</p>"
Additional changes:
I switched to using single-quotes around the regex; double-quotes are subject to $variable interpolation and {embedded code} evaluation.
I shortened the lookahead expression by using an optional slash (/?).
I switched to using a reluctant plus (+?) so the whitespace following the closing tag doesn't get included in the capture group.
I changed the innermost group from capturing to non-capturing; it was only saving the last character in the matched text, which seems pointless.
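To check that the corrected pattern also copes with more than one block, it can be run against a text with two [protected] sections. This is a sketch with made-up sample text; only the pattern comes from the answer.

```php
<?php
// Sample text (assumed for illustration) with two protected blocks, one spanning lines.
$text = "[protected] first [/protected] middle [protected]\nsecond\n[/protected]";

// (?!\[/?protected\]). consumes one char at a time, refusing to step onto an
// opening or closing tag; the s modifier lets . match the newlines.
$pattern = '~\[protected\]\s*((?:(?!\[/?protected\]).)+?)\s*\[/protected\]~s';

$count = preg_match_all($pattern, $text, $matches);

var_dump($count);      // int(2)
print_r($matches[1]);  // first, second
```

The surrounding \s* together with the lazy +? is what trims the whitespace and newlines from the captured content.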

$text= '<p>SDGDSFGDFGdsgdfog<br>
[protected]<br> STUFFFFFF<br>
[/protected]<br> SDGDSFGDFGdsgdfog</p>';
if (preg_match_all ("/\[protected\]\s*(((?!\[protected\]|\[\/protected\]).)+)\s*\[\/protected\]/x", $text, $matches)) {
var_dump($matches);
var_dump($text);
}
There is no g modifier in PHP's preg functions; you can read more under Pattern Modifiers in the PHP manual. With the slashes escaped, the x modifier works fine, though it is not required here.
