preg_match or similar to get value from a string - php

I am not good with preg_match or similar functions which are not deprecated.
Here are 2 strings:
/Cluster-Computers-c-10.html
/Mega-Clusters-c-15_32.html
I would like to find out:
In example 1, how to get the value between -c- and .html (the value in the example is 10). The value is always an integer (numeric).
In example 2, how to get the value between -c- and .html (the value in the example is 15_32). The value is always integers separated by _.
Basically what I want to do is check if a string has either c-10.html or c-15_32.html and get the value and pass it to the database.

You can do:
preg_match('/-c-(\d+(?:_\d+)?)\.html$/i', $str, $matches); // the captured value ends up in $matches[1]
Explanation:
-c- : A literal -c-
( : Beginning of capturing group
\d+ : one or more digits, that is a number
(?: : Beginning of a non-capturing group
_\d+ : _ followed by a number
) : End of non-capturing group
? : Makes the last group optional
) : End of capturing group
\. : . is a metacharacter that matches any char (except newline); to match
a literal . you need to escape it.
html : a literal html
$ : End anchor. Without it the above pattern will match any part
of the input string not just the end.
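A minimal sketch of how the captured value might be prepared before it goes to the database, assuming the two URL shapes from the question. The helper name extractCategoryIds is hypothetical, and splitting on _ into integers is an assumption about how the ids are meant to be stored.

```php
<?php
// Hypothetical helper: returns the ids found after -c-, or null when the path does not match.
function extractCategoryIds(string $path): ?array
{
    if (preg_match('/-c-(\d+(?:_\d+)?)\.html$/i', $path, $m) !== 1) {
        return null;
    }
    // "15_32" becomes [15, 32]; "10" becomes [10].
    return array_map('intval', explode('_', $m[1]));
}

foreach (['/Cluster-Computers-c-10.html', '/Mega-Clusters-c-15_32.html'] as $path) {
    $ids = extractCategoryIds($path);
    echo $path, ' => ', $ids === null ? 'no match' : implode(', ', $ids), PHP_EOL;
}
```

Because the pattern only lets digits and a single underscore through, the resulting integers can be bound to a prepared statement without further cleaning.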

preg_match('~-c-(.*?)\.html$~', $str, $matches);
var_dump($matches);

/-c-(\d+(?:_\d+)?)\.html$/i
-c- look for -c-
(\d+(?:_\d+)?) match number or number-underscore-number
\.html a period and trailing html
$ force it to match the end of the line
i case-insensitive match
Example:
<?php
header('Content-Type: text/plain');
$t = Array(
'1) /Cluster-Computers-c-10.html',
'2) /Mega-Clusters-c-15_32.html'
);
foreach ($t as $test){
$_ = null;
if (preg_match('/-c-(\d+(?:_\d+)?)\.html$/i',$test,$_))
var_dump($_);
echo "\r\n";
}
?>
output:
array(2) {
[0]=>
string(10) "-c-10.html"
[1]=>
string(2) "10"
}
array(2) {
[0]=>
string(13) "-c-15_32.html"
[1]=>
string(5) "15_32"
}
Working Code: http://www.ideone.com/B70AQ

The simplest way I see would be:
preg_match( '/-c-([^.]+)\.html/i', $url, $matches );
var_dump( $matches );
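One caveat worth illustrating: [^.]+ accepts anything that is not a dot, so non-numeric values get through as well. The sketch below compares it with the stricter pattern from the earlier answer; the sample URL is made up for the illustration.

```php
<?php
$loose  = '/-c-([^.]+)\.html/i';
$strict = '/-c-(\d+(?:_\d+)?)\.html$/i';

$url = '/Cluster-Computers-c-abc.html'; // hypothetical input with a non-numeric value

$looseHit  = preg_match($loose, $url, $looseMatches);
$strictHit = preg_match($strict, $url, $strictMatches);

// The loose pattern captures "abc"; the strict one rejects the URL.
var_dump($looseHit, $looseMatches[1] ?? null, $strictHit);
```

If the value is going into a query, the stricter pattern (or an explicit numeric check afterwards) is probably the safer choice.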

Related

PHP preg_match returns two matches instead of one

I have this string: ATL.556808.UMO20.02 and I want to get only UMO20.02.
Here is my preg_match:
$e = preg_match('"\.[^\.]+\.(.*?)$"si', $t, $m);
But this code return two matches instead of one. I got:
array(2) {
[0]=> string(16) ".556808.UMO20.02"
[1]=> string(8) "UMO20.02"
}
But I want to get one match:
array(1) {
[0]=> string(8) "UMO20.02"
}
Where is the problem?
You don't have to use the s and i flags as there are no specific cases for upper or lowercase chars, and the dot does not have to match a newline in the example data.
You can use
\.[^.]+\.\K.+$
\. Match .
[^.]+\. Match any char except a . one or more times, then match a .
\K Forget what is matched
.+ Match any char 1+ times
$ End of string
Example code
$re = '/\.[^.]+\.\K.+$/';
$str = 'ATL.556808.UMO20.02';
preg_match($re, $str, $matches);
print_r($matches);
Output
Array
(
[0] => UMO20.02
)
Your \.[^\.]+\.(.*?)$ regex matches a ., then any one or more chars other than a dot, then a dot, and then any zero or more chars as few as possible (but as many as necessary to complete a match) up to the end of string. The .*? must be tempered to match any chars but dots.
To remove all up to and including the second dot, you can use
$t = 'ATL.556808.UMO20.02';
echo preg_replace('~^(?:[^.]+\.){2}~', '', $t);
// => UMO20.02
Details:
^ - start of string
(?:[^.]+\.){2} - two occurrences of any one or more chars other than a . and then a . char
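If a regex is not strictly required, the same result can be sketched with explode() and a limit, which is a different technique from the answers above. It assumes the input has at least two dots; the null fallback is an illustrative choice. The second half shows that a capture-group pattern in the style of the question already holds the wanted value in group 1.

```php
<?php
$t = 'ATL.556808.UMO20.02';

// Split into at most 3 parts, so the third part keeps any remaining dots.
$parts = explode('.', $t, 3);
$tail  = $parts[2] ?? null;
echo $tail, PHP_EOL;

// With a capturing group, index 0 is always the full match and index 1 the group.
preg_match('/\.[^.]+\.(.*)$/', $t, $m);
echo $m[1], PHP_EOL;
```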

Problem regular express pattern hunting similar matches

I have one of these four patterns:
"Test"
'Test'
`Test`
(Test)
Is it possible to get "Test" with a single preg_match call?
I tried the following:
if ( preg_match( '/^(?:"(.*)"|\'(.*)\'|`(.*)`|\((.*)\')$/iu', $pattern, $matches ) )
... but this gives me five elements of $matches back. But I would like to have two only (One for the whole match and one for the found match with "Test" in it.)
To make sure that a single quote, backtick, or double quote is closed by the same character, you might use a capturing group with a backreference to that group.
To get the same group in the alternation to also match ( with the closing ) you might use a branch reset group.
The match for Test is in group 2
(?|(["'`])(Test)\1|\(((Test)\)))
Explanation
(?| Branch reset group
(["'`]) Capture in group 1 any of the listed
(Test)\1 Capture in group 2 matching Test followed by a backreference \1 to group 1
| Or
\(((Test)\)) Match (, capture in group 2 matching Test followed by )
) Close branch reset group
For example:
$strings = [
"\"Test\"",
"'Test'",
"`Test`",
"(Test)",
"Test\"",
"'Test",
"Test`",
"(Test",
"\"Test'",
"'Test\"",
"`Test",
"Test)",
];
$pattern = '/(?|(["\'`])(Test)\1|\(((Test)\)))/';
foreach ($strings as $string){
$isMatch = preg_match($pattern, $string, $matches);
if ($isMatch) {
echo "Match $string ==> " . $matches[2] . PHP_EOL;
}
}
Result
Match "Test" ==> Test
Match 'Test' ==> Test
Match `Test` ==> Test
Match (Test) ==> Test
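The branch reset idea can also be applied to the pattern from the question, so that arbitrary content (not just the literal Test) lands in group 1 and $matches has exactly two elements. This is a sketch under the assumption that the whole string is the quoted value; the sample inputs are illustrative.

```php
<?php
// Inside (?| ... ) every alternative numbers its capture group as group 1.
$pattern = '/^(?|"(.*)"|\'(.*)\'|`(.*)`|\((.*)\))$/u';

$inputs  = ['"Test"', "'Test'", '`Test`', '(Test)', '"Test\''];
$results = [];
foreach ($inputs as $input) {
    // Store the matches array on success, null when the delimiters do not pair up.
    $results[$input] = preg_match($pattern, $input, $matches) === 1 ? $matches : null;
}
print_r($results);
```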
You can use a dot to match the characters around the word and use array_unique to remove duplicates.
preg_match_all("/.(\w+)./", $str,$match);
foreach($match as &$m) $m = array_unique($m);
var_dump($match);
https://3v4l.org/T2hnh
array(2) {
[0]=>
array(4) {
[0]=>
string(6) ""Test""
[1]=>
string(6) "'Test'"
[2]=>
string(6) "`Test`"
[3]=>
string(6) "(Test)"
}
[1]=>
&array(1) {
[0]=>
string(4) "Test"
}
}
You can use non-capturing groups :
'/^(?:"|\'|`|\()(.*)(?:"|\'|`|\))$/iu'
So just the (.*) group will capture data.
Your regex could be:
^['"`(](.+)['"`)]$
Which would give the following code in PHP (note that the pattern needs delimiters):
if(preg_match('/^[\'"`(](.+)[\'"`)]$/', $pattern, $matches))
Explanation
In regex, a character class (marked by enclosing square brackets []) matches any one of the characters inside it.

php preg_match_all returning array of arrays

I want to replace some template tags:
$tags = '{name} text {first}';
preg_match_all('~\{(\w+)\}~', $tags, $matches);
var_dump($matches);
output is:
array(2) {
[0]=> array(2) {
[0]=> string(6) "{name}"
[1]=> string(7) "{first}"
}
[1]=> array(2) {
[0]=> string(4) "name"
[1]=> string(5) "first"
}
}
why are there inside 2 arrays? How to achieve only second one?
The short answer:
Is there an alternative? Of course there is: lookaround assertions allow you to use zero-width (non-captured) single char matches easily:
preg_match_all('/(?<=\{)\w+(?=})/', $tags, $matches);
var_dump($matches);
Will dump this:
array(1) {
[0]=>
array(2) {
[0]=>
string(4) "name"
[1]=>
string(5) "first"
}
}
The pattern:
(?<=\{): positive lookbehind - only match the rest of the pattern if there's a { character in front of it (but don't capture it)
\w+: one or more word characters are matched
(?=}): only match preceding pattern if it is followed by a } character (but don't capture the } char)
It's that simple: the pattern uses the {} delimiter chars as conditions for the matches, but doesn't capture them
Explaining this $matches array structure a bit:
The reason why $matches looks the way it does is quite simple: when using preg_match(_all), the first entry in the match array will always be the entire string matched by the given regex. That's why I used zero-width lookaround assertions, instead of groups. Your expression matches "{name}" in its entirety, and extracts "name" through grouping.
The matches array will hold the full match on index 0, and add groups at every subsequent index, in your case that means that:
$matches[0] will contain all substrings matching /\{\w+\}/ as a pattern.
$matches[1] will contain all substrings that were captured (/\{(\w+)\}/ captures (\w+)).
If you were to have a regex like this: /\{((\w)([^}]+))}/ the matches array will look something like this:
[
0 => [
'{name}',//as if you'd written /\{\w[^}]+}/
],
1 => [
'name',//matches group (\w)([^}]+), as if you wrote (\w[^}]+)
],
2 => [
'n',//matches (\w) group
],
3 => [
'ame',//and this is the ([^}]+) group obviously
]
]
Why? Simply because the pattern contains 3 capturing groups. Like I said: the first index in the matches array will always be the full match, regardless of capture groups. The groups are then appended to the array in the order they appear in the expression. So if we analyze the expression:
\{: not captured, but part of the pattern, will only be in the $matches[0] values
((\w)([^}]+)): Start of first matching group, \w[^}]+ match is grouped here, $matches[1] will contain these values
(\w): Second group, a single \w char (i.e. the first character after {). $matches[2] will therefore contain all first characters after a {
([^}]+): Third group, matches rest of string after {\w until a } is encountered, this will make out the $matches[3] values
To better understand, and be able to predict the way $matches will get populated, I'd strongly recommend you use this site: regex101. Write your expression there, and it'll break it all down for you on the right hand side, listing the groups. For example:
/\{((\w)([^}]+))}/
Is broken down like this:
/\{((\w)([^}]+))}/
\{ matches the character { literally
1st Capturing group ((\w)([^}]+))
2nd Capturing group (\w)
\w match any word character [a-zA-Z0-9_]
3rd Capturing group ([^}]+)
[^}]+ match a single character not present in the list below
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
} the literal character }
} matches the character } literally
Looking at the capturing groups, you can now confidently say what $matches will look like, and you can safely say that $matches[2] will be an array of single characters.
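The nested-group layout can be checked directly against the string from the question. A small sketch that uses only the pattern and input already shown above:

```php
<?php
$tags = '{name} text {first}';
preg_match_all('/\{((\w)([^}]+))}/', $tags, $matches);

// One sub-array for the full match, plus one per capture group.
foreach ($matches as $index => $values) {
    printf("%d: %s\n", $index, implode(', ', $values));
}
```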
Of course, this may leave you wondering why $matches is a 2D array. Well, that again is really quite easy: what you can predict is how many match indexes a $matches array will contain: 1 for the full pattern, then +1 for each capture group. What you can't predict, though, is how many matches you'll find.
So what preg_match_all does is really quite simple: fill $matches[0] with all substrings that match the entire pattern, then extract each group substring from these matches and append that value onto the respective $matches arrays. In other words, the number of arrays that you can find in $matches is a given: it depends on the pattern. The number of keys you can find in the sub-arrays of $matches is an unknown, it depends on the string you're processing. If preg_match_all were to return a 1D array, it would be a lot harder to process the matches, now you can simply write this:
$total = count($matches);
foreach ($matches[0] as $k => $full) {
echo $full . ' contains: ' . PHP_EOL;
for ($i=1;$i<$total;++$i) {
printf(
'Group %d: %s' . PHP_EOL,
$i, $matches[$i][$k]
);
}
}
If preg_match_all created a flat array, you'd have to keep track of the number of groups in your pattern. Whenever the pattern changes, you'd also have to make sure to update the rest of the code to reflect the changes made to the pattern, making your code harder to maintain, whilst making it more error-prone, too.
That's because your regex could have multiple match groups: if you have more (..) you would have more entries in your array. The first one [0] is always the whole match.
If you want a different ordering of the array, you can pass PREG_SET_ORDER as the fourth argument to preg_match_all. Doing this would result in the following:
array(2) {
[0]=> array(2) {
[0]=> string(6) "{name}"
[1]=> string(4) "name"
}
[1]=> array(2) {
[0]=> string(7) "{first}"
[1]=> string(5) "first"
}
}
This could be easier if you loop over your result in a foreach loop.
If you are only interested in the captured values, you should stay with the default PREG_PATTERN_ORDER and just use $matches[1].
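Since the stated goal was to replace template tags, here is a hedged sketch of both steps: looping over PREG_SET_ORDER results, and doing the replacement itself with preg_replace_callback. The $values array and its contents are hypothetical; leaving unknown tags untouched is an assumption about the desired behaviour.

```php
<?php
$tags = '{name} text {first}';

// Each element of $sets is one match: [full match, group 1].
preg_match_all('~\{(\w+)\}~', $tags, $sets, PREG_SET_ORDER);
foreach ($sets as $set) {
    printf("%s => %s\n", $set[0], $set[1]);
}

// Hypothetical replacement values keyed by tag name.
$values = ['name' => 'Ada', 'first' => 'one'];

$output = preg_replace_callback(
    '~\{(\w+)\}~',
    function (array $m) use ($values) {
        // Unknown tags are left as they are.
        return $values[$m[1]] ?? $m[0];
    },
    $tags
);
echo $output, PHP_EOL;
```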

php regex find text within parenthesis

Using PHP or Powershell I need help in finding a text in a text file.txt, within parenthesis then output the value.
Example:
file.txt looks like this:
This is a test I (MyTest: Test) in a parenthesis
Another Testing (MyTest: JohnSmith) again. Not another testing testing (MyTest: 123)
My code:
$content = file_get_contents('file.txt');
$needle = "MyTest";
preg_match('~^(.*'.$needle.'.*)$~', $content, $line);
Output to a new text file will be:
123Test, JohnSmith,123,
Use this pattern:
~\(%s:\s*(.*?)\)~s
Note that %s here is not a part of the actual pattern. It's used by sprintf() to substitute the values that are passed as arguments. %s stands for string, %d for signed integer etc.
Explanation:
~ - starting delimiter
\( - match a literal (
%s - a placeholder for the $needle value
: - match a literal :
\s* - zero or more whitespace characters
(.*?) - match (and capture) anything inside the parentheses
\) - match a literal )
~ - ending delimiter
s - a pattern modifier that makes . match newlines as well
Code:
$needle = 'MyTest';
$pattern = sprintf('~\(%s:\s*(.*?)\)~s', preg_quote($needle, '~'));
preg_match_all($pattern, $content, $matches);
var_dump($matches[1]);
Output:
array(3) {
[0]=>
string(4) "Test"
[1]=>
string(9) "JohnSmith"
[2]=>
string(3) "123"
}
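To get from the array to the comma-separated line the question asks for, the captured values can be joined with implode(). A sketch that inlines the sample text instead of reading file.txt so it runs on its own; the output file name in the comment is hypothetical.

```php
<?php
$content = "This is a test I (MyTest: Test) in a parenthesis\n"
    . "Another Testing (MyTest: JohnSmith) again. Not another testing testing (MyTest: 123)\n";

$needle  = 'MyTest';
// preg_quote() escapes regex metacharacters in $needle, including the ~ delimiter.
$pattern = sprintf('~\(%s:\s*(.*?)\)~s', preg_quote($needle, '~'));

preg_match_all($pattern, $content, $matches);
$line = implode(',', $matches[1]);
echo $line, PHP_EOL;

// Writing the result to a file could then look like this (hypothetical file name):
// file_put_contents('output.txt', $line);
```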
Here's a Powershell solution:
@'
This is a test I (MyTest: Test) in a parenthesis
Another Testing (MyTest: JohnSmith) again. Not another testing testing (MyTest: 123)
'@ | set-content test.txt
([regex]::Matches((get-content test.txt),'\([^:]+:\s*([^)]+)')|
foreach {$_.groups[1].value}) -join ','
Test,JohnSmith,123
You can add that trailing comma after it's done if you really did want that there....

php preg_match get numbers between two strings

Hi I'm starting to learn php regex and have the following problem:
I need to extract the numbers inside $string.
The regex I use returns "NULL".
$string = 'Clasificación</a> (2194) </li>';
$regex = '/Clasificación</a>((.*?))</li>/';
preg_match($regex , $string, $match);
var_dump($match);
Thanks in advance.
There are three problems with your regex:
You aren't escaping the forward slash. You're using the forward slash as a delimiter, so if you want to use it as a literal character inside the expression, you need to escape it
((.*?)) doesn't do what you think it does. It creates two capturing groups -- one nested inside the other. I assume, you're trying to capture what's inside the parentheses. For that, you'll need to escape the ( and ) characters. The expression would become: \((.*?)\)
Your expression doesn't handle whitespace. In the string you've given, there is whitespace between the </a> and the beginning of the number -- </a> (2194). To ignore the whitespace and capture just the number, you need to use \s (which matches any whitespace character). For that, you need to write \s*\((.*?)\)\s*.
The final regular expression after fixing all the above errors, will look like:
$regex = '~Clasificación</a>\s*\((.*?)\)\s*</li>~';
Full code:
$string = 'Clasificación</a> (2194) </li>';
$regex = '~Clasificación</a>\s*\((.*?)\)\s*</li>~';
preg_match($regex , $string, $match);
var_dump($match);
Output:
array(2) {
[0]=>
string(32) "Clasificación</a> (2194) </li>"
[1]=>
string(4) "2194"
}
Demo.
You forgot to escape the / in your regex, since you're using / as the delimiter:
$regex = '/Clasificación<\/a>((.*?))<\/li>/';
// The first and last / are the delimiters.
// Each \/ inside the pattern is an escaped, literal slash.
Another way can be to change that delimiter, and then you will not have to escape it:
$regex = '#Clasificación</a>((.*?))</li>#';
See the PHP documentation for more information.
You will have to escape the special characters that you want to match literally:
$regex = '/Clasificación<\/a> \((.*?)\) <\/li>/';
You may also want to make your match a little more specific where it matters (depending on your use case):
$regex = '/Clasificación<\/a>\s*\(([0-9]+)\)\s*<\/li>/';
That will allow for 0 or more whitespace characters before or after the (1234), and it only matches if there are only digits inside the ().
I just tried this in php:
php > preg_match($regex , $string, $match);
php > var_dump($match);
array(2) {
[0]=>
string(30) "Clasificacin</a> (2194) </li>"
[1]=>
string(4) "2194"
}
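Putting the answers together, a minimal sketch of why the original call produced NULL and how the corrected pattern yields the number. The u modifier and the cast to int are additions for the illustration, assuming the source file is saved as UTF-8.

```php
<?php
$string = 'Clasificación</a> (2194) </li>';

// The unescaped / in </a> ends the pattern early, so PHP warns about an unknown
// modifier and returns false. The @ only silences that warning for this demonstration.
$broken = @preg_match('/Clasificación</a>((.*?))</li>/', $string, $ignored);

// Same intent with ~ as delimiter, escaped literal parentheses, and digits only.
$regex  = '~Clasificación</a>\s*\((\d+)\)\s*</li>~u';
$number = preg_match($regex, $string, $match) === 1 ? (int) $match[1] : null;

var_dump($broken, $number);
```

Checking the return value of preg_match() before reading $match makes the failure case visible instead of silently dumping an empty result.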
