Regex to match a number followed by a specific string - php

I need to find a number followed by a specific string, within another string.
The initial string could be:
some text 0.25mcg some more text
some text 25mcg some more text
So the number could be a decimal. I need to return the number (0.25 or 25) wherever the number is followed by 'mcg'.
Can anyone help me out? This doesn't work:
if(preg_match('(\d+mcg)', $item, $match))

Another option is to capture one or more digits with an optional decimal part, \d+(?:\.\d+)?, and use a word boundary \b to prevent the match from being part of a larger word.
\b(\d+(?:\.\d+)?)mcg\b
Code example
$re = '/\b(\d+(?:\.\d+)?)mcg\b/';
$str = 'some text 0.25mcg some more text some text 25mcg some more text';
preg_match_all($re, $str, $matches);
print_r($matches[1]);
Output
Array
(
[0] => 0.25
[1] => 25
)
If you want only the match instead of a capturing group, you could also use a positive lookahead (?=...).
\b\d+(?:\.\d+)?(?=mcg\b)
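As a minimal sketch of the lookahead variant, assuming the same sample string as the capturing example above: because the lookahead only asserts "mcg" without consuming it, the full match in $matches[0] should already hold the bare numbers.

```php
<?php
// Lookahead variant: "mcg" is asserted but not consumed,
// so the whole match ($matches[0]) is just the number.
$re = '/\b\d+(?:\.\d+)?(?=mcg\b)/';
$str = 'some text 0.25mcg some more text some text 25mcg some more text';

preg_match_all($re, $str, $matches);
print_r($matches[0]); // the two numbers, 0.25 and 25
```

No capturing group is needed here, so index 0 is used instead of index 1.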

It's a job for preg_match_all
preg_match_all('/([\d.]+)mcg/', $item, $matches);
[\d.]+ matches one or more digits or dots.
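One thing to keep in mind with [\d.]+ is that it accepts any run of digits and dots, not only well-formed numbers. The sketch below uses a made-up input ('batch 1.2.3mcg') to illustrate this, and shows is_numeric() as one possible way to validate the capture afterwards.

```php
<?php
// [\d.]+ accepts any run of digits and dots, including malformed numbers.
$item = 'batch 1.2.3mcg'; // hypothetical input, not from the question

preg_match_all('/([\d.]+)mcg/', $item, $loose);
$candidate = $loose[1][0];          // "1.2.3"

// is_numeric() is one way to reject captures that are not real numbers.
$isValid = is_numeric($candidate);  // false for "1.2.3"
var_dump($candidate, $isValid);
```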

Here is a simple version:
<?php
$item1 = 'some text 0.25mcg some more text';
$item2 = 'some text 25mcg some more text';
if (preg_match('/([0-9\\.]+)\\s*mcg/', $item1, $match)) echo $match[1] . '<br>';
if (preg_match('/([0-9\\.]+)\\s*mcg/', $item2, $match)) echo $match[1] . '<br>';

Related

preg_match for one single digit number with a blank space on both sides

I need to output the single 3 in the array below using preg_match or preg_split. How can I accomplish this? The possibilities are 1 through 8.
VMName  Count CompatibilityForMigrationEnabled CompatibilityForOlderOperatingSystemsEnabled
------  ----- -------------------------------- --------------------------------------------
ap-1-38 3     False                            False
I have tried the following with no success using both preg_match and preg_split:
('\\s+\\d\\s+', $output)
('\\s+[\\d]\\s+', $output)
("^[\s0-9\s]+$", $output)
("/(.*), (.*)/", $output)
Give the following preg_match a try
<?php
$matched = preg_match('/( [0-9] )/', $string, $matches);
if($matched){
print_r($matches);
}
Hope this helps!
Try this:
preg_match("/( \d{1} )/", $input_line, $output_array);
Examples: http://www.phpliveregex.com/p/luf
To match a 1 to 8 number that is in between whitespaces, you may use
preg_match('~(?<!\S)[1-8](?!\S)~', $s, $match)
Details
(?<!\S) - a whitespace or start of string required immediately to the left of the current location
[1-8] - a digit from 1 to 8
(?!\S) - a whitespace or end of string required immediately to the right of the current location
See PHP demo:
$str = 'VMName Count CompatibilityForMigrationEnabled CompatibilityForOlderOperatingSystemsEnabled ------ ----- -------------------------------- -------------------------------------------- ap-1-38 3 False False';
if (preg_match('/(?<!\S)[1-8](?!\S)/', $str, $match)) {
echo $match[0];
}
// => 3
Note you may also use a capturing approach:
if (preg_match('/(?:^|\s)([1-8])(?:$|\s)/', $str, $match)) {
echo $match[1];
}
Here, (?:^|\s) is a non-capturing alternation group matching the start of string (^) or (|) a whitespace (\s), then a digit from 1 to 8 is captured (with ([1-8])), and then (?:$|\s) matches the end of string ($) or a whitespace. $match[1] holds the necessary output.
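To illustrate how the whitespace lookarounds behave, here is a small sketch. The function name find_single_count is hypothetical, and the second input line is made up to show a case that should not match.

```php
<?php
// Hypothetical helper name; returns the lone 1-8 digit or null.
function find_single_count(string $line): ?string
{
    // (?<!\S) and (?!\S) require whitespace (or a string edge) on each side.
    return preg_match('/(?<!\S)[1-8](?!\S)/', $line, $m) ? $m[0] : null;
}

$found   = find_single_count('ap-1-38 3 False False');   // "3"
$missing = find_single_count('ap-1-38 12 False False');  // null: 12 has two digits
var_dump($found, $missing);
```

The digits inside ap-1-38 are skipped because each of them has a non-whitespace character directly to its left.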

regex inside tags with specified string

I'm not very good at regex, but I have a string like this:
$str = '<span id="MainStatuSSpan" style="background: brown;"> Incoming: 012345678 Group- SUPERMONEY Fronter: - 992236 UID: Y3281602190002004448</span>';
$pattern = '/(?:Fronter: - )[0-9]{1,6}/i';
preg_match($pattern, $str, $matches);
print_r($matches);
/*** ^^^^^^^ This prints :*/
Array ( [0] => Fronter: - 992236 )
If Fronter is not followed by - or spaces, I don't get the Fronter number.
Can anyone help with an example that works in every case? There is always a Fronter and a number.
You can use Fronter:\W*[0-9]{1,6}
Fronter: : matches the literal text Fronter:
\W* : zero or more non-word characters
[0-9]{1,6} : one to six digits
This regex will also find a match in Fronter:99222236 (its first six digits), so add \b to reject numbers longer than six digits:
Fronter:[- ]*[0-9]{1,6}\b
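As a rough sketch, the two suggestions can be combined: \W* for the separators plus \b for the length check, with a capturing group so that only the number is returned. The function name fronter_id and the shortened test strings are assumptions made for illustration.

```php
<?php
// Hypothetical helper name; captures the number after "Fronter:".
function fronter_id(string $text): ?string
{
    // \W* skips any separators (spaces, dashes); \b rejects runs longer than 6 digits.
    return preg_match('/Fronter:\W*([0-9]{1,6})\b/i', $text, $m) ? $m[1] : null;
}

$a = fronter_id('Group- SUPERMONEY Fronter: - 992236 UID: Y3281602190002004448');
$b = fronter_id('Fronter:992236 UID: Y1');
$c = fronter_id('Fronter:99222236');  // eight digits, so no match
var_dump($a, $b, $c);
```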

Regex with PHP preg_replace() need to find the nearest name in string

I need to find the nearest name in a string. How would I do this?
The closest I got was the opposite: it finds the occurrence furthest from the start of the string:
$string = "joe,bob,luis,sancho,bob,marco,lura,hannah,bob,marco,luis";
$new_string = preg_replace('/(bob(?!.*bob))/', 'found it!', $string);
echo $new_string;
<!-- outputs: joe,bob,luis,sancho,bob,marco,lura,hannah,found it!,marco,luis -->
How would I do the opposite and get an output like this:
<!-- outputs: joe,found it!,luis,sancho,bob,marco,lura,hannah,bob,marco,luis -->
The regex you are using, (bob(?!.*bob)), matches the last occurrence of bob (not as a whole word) on a line, because . matches any character but a newline, and the negative lookahead makes sure there is no bob after bob.
You may use
$re = '/\bbob\b/';
$str = "joe,bob,luis,sancho,bob,marco,lura,hannah,bob,marco,luis";
$result = preg_replace($re, 'found it!', $str, 1);
The regex \bbob\b matches bob as a whole word, and the limit argument makes preg_replace replace only the first occurrence.
See preg_replace help:
limit
The maximum possible replacements for each pattern in each subject string. Defaults to -1 (no limit).
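A short sketch of the limit argument in action, using the string from the question. The optional fifth argument of preg_replace, which receives the number of replacements made, is added here only to make the effect visible.

```php
<?php
$str = 'joe,bob,luis,sancho,bob,marco,lura,hannah,bob,marco,luis';

// Fourth argument (limit) = 1: stop after the first replacement.
// Fifth argument receives the number of replacements actually made.
$result = preg_replace('/\bbob\b/', 'found it!', $str, 1, $count);

echo $result, PHP_EOL; // only the first bob is replaced
echo $count, PHP_EOL;  // 1
```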
You can try a negative lookbehind instead, like this
$string = "joe,bob,luis,sancho,bob,marco,lura,hannah,bob,marco,luis";
$new_string = preg_replace('/((?<!bob)bob)/', 'found it!', $string, 1);
echo $new_string;
<!-- outputs: joe,found it!,luis,sancho,bob,marco,lura,hannah,bob,marco,luis -->
As Wiktor said, use the limit option to match only the first occurrence of the name.

PHP's preg_match() returns the position of the last match

With
preg_match($pattern, $subject, $matches, PREG_OFFSET_CAPTURE);
is it possible to search a string in reverse? I.e., return the position of the last occurrence of the pattern in the subject similar to strripos.
Or do I have to return the position of all matches with preg_match_all and use the last element of $matches?
PHP doesn't have a regex method that searches a string from right to left (like in .NET). There are several possible recipes to solve that (this list isn't exhaustive, but it may provide ideas for your own workaround):
using preg_match_all with the PREG_SET_ORDER flag; end($matches) will give you the last match set
reversing the string with strrev and building a "reversed" pattern to be used with preg_match
using preg_match and building a pattern that ensures there are no more occurrences of the searched mask between the match and the end of the string
using a greedy quantifier before the target and \K to start the match result at the position you want. Once the end of the string is reached, the regex engine will backtrack until it finds a match.
Examples with the string $str = 'xxABC1xxxABC2xx' for the pattern /x[A-Z]+\d/
Way 1: finds all matches and displays the last one.
if ( preg_match_all('/x[A-Z]+\d/', $str, $matches, PREG_SET_ORDER) )
print_r(end($matches)[0]);
Way 2: find the first match of the reversed string with a reversed pattern, and displays the reversed result.
if ( preg_match('/\d[A-Z]+x/', strrev($str), $match) )
print_r(strrev($match[0]));
Note that it isn't always so easy to reverse a pattern.
Way 3: jumps from x to x and uses a negative lookahead to check that no other x[A-Z]+\d match occurs before the end of the string.
if ( preg_match('/x[A-Z]+\d(?!.*x[A-Z]+\d)/', $str, $match) )
print_r($match[0]);
Variants:
With a lazy quantifier
if ( preg_match('/x[A-Z]+\d(?!.*?x[A-Z]+\d)/', $str, $match) )
print_r($match[0]);
or with a "tempered quantifier"
if ( preg_match('/x[A-Z]+\d(?=(?:(?!x[A-Z]+\d).)*$)/', $str, $match) )
print_r($match[0]);
Choosing between these variants can be worthwhile when you know in advance where a match is most likely to occur.
Way 4: goes to the end of the string and backtracks until it finds a x[A-Z]+\d match. The \K removes the start of the string from the match result.
if ( preg_match('/^.*\Kx[A-Z]+\d/', $str, $match) )
print_r($match[0]);
Way 4 (a more hand-driven variant): to limit backtracking steps, you can greedily advance from the start of the string, atomic group by atomic group, and backtrack in the same way by atomic groups, instead of by characters.
if ( preg_match('/^(?>[^x]*\Kx)+[A-Z]+\d/', $str, $match) )
print_r($match[0]);
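The four approaches can be put side by side in one script to check that they agree. This is only a sketch on the sample string above. Way 1 is run with PREG_OFFSET_CAPTURE here (an addition to the original snippet) so that the position of the last match is available too, which is what the question asks for.

```php
<?php
$str = 'xxABC1xxxABC2xx';

// Way 1: collect every match with its offset, then take the last one.
preg_match_all('/x[A-Z]+\d/', $str, $all, PREG_OFFSET_CAPTURE);
[$text1, $offset1] = end($all[0]);

// Way 2: reversed subject with a reversed pattern.
preg_match('/\d[A-Z]+x/', strrev($str), $m2);
$text2 = strrev($m2[0]);

// Way 3: negative lookahead forbids a later match.
preg_match('/x[A-Z]+\d(?!.*x[A-Z]+\d)/', $str, $m3);

// Way 4: greedy prefix, \K drops it from the reported match.
preg_match('/^.*\Kx[A-Z]+\d/', $str, $m4);

var_dump($text1, $offset1, $text2, $m3[0], $m4[0]);
```

All four should report xABC2, and Way 1 additionally reports offset 8.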
"Greedy" is the keyword here.
* is by default greedy, and *? limits greediness to the bare minimum.
So the solution is to use the combination, e.g. (searching for last period followed by a whitespace),
/^.*\.\s(.*?)$/s
^ is the beginning of text
.* eats as much as it can, including matching patterns
\.\s is the period followed by a whitespace (what I am looking for)
(.*?) eats as little as possible. Capture group () so I could address it as a match group.
$ end of text
s - the DOTALL modifier: . (dot) also matches newlines, so .* can run across line breaks (it does not change how ^ and $ behave)
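Here is a small sketch of that pattern on a made-up two-line text. It uses PREG_OFFSET_CAPTURE to get the position as well, and runs the same pattern without the s modifier to show what the modifier changes.

```php
<?php
$text = "Line one.\nLine two. Tail"; // hypothetical sample text

// With the s modifier the greedy .* can cross the newline,
// so the engine backtracks to the LAST ". " in the whole text.
preg_match('/^.*\.\s(.*?)$/s', $text, $withS, PREG_OFFSET_CAPTURE);
[$tail, $tailOffset] = $withS[1];

// Without s, .* stops at the first newline, so an earlier period is used.
preg_match('/^.*\.\s(.*?)$/', $text, $withoutS);

var_dump($tail, $tailOffset, $withoutS[1]);
```

With s the capture is "Tail" at offset 20. Without it, the period ending the first line is the one found and the capture is "Line two. Tail".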
I did not understand exactly what you want, because it depends on how many groups are captured.
I made a function that returns the offset of the last match for a given group number. My pattern has three groups: group 0 is the full match and the other two are sub-groups.
Pattern sample code:
$pattern = "/<a[^\x3e]{0,}href=\x22([^\x22]*)\x22>([^\x3c]*)<\/a>/";
HTML sample code:
$subject = '<ul>
<li>Search Engines</li>
<li>Google</li>
<li>Bing</li>
<li>DuckDuckGo</li>
</ul>';
My function returns the offset of the last match, and you can indicate the group number:
function get_offset_last_match( $pattern, $subject, $number ) {
if ( preg_match_all( $pattern, $subject, $matches, PREG_OFFSET_CAPTURE ) == false ) {
return false;
}
return $matches[$number][count( $matches[0] ) - 1][1];
}
Detailed information about preg_match_all is available in the official documentation.
Using my pattern for example:
0 => all text
1 => href value
2 => innerHTML
echo '<pre>';
echo get_offset_last_match( $pattern, $subject, 0 ) . PHP_EOL; // all text
echo get_offset_last_match( $pattern, $subject, 1 ) . PHP_EOL; // href value
echo get_offset_last_match( $pattern, $subject, 2 ) . PHP_EOL; // innerHTML
echo '</pre>';
die();
The output is:
140
149
174
My function (text):
function get_text_last_match( $pattern, $subject, $number ) {
if ( preg_match_all( $pattern, $subject, $matches, PREG_OFFSET_CAPTURE ) == false ) {
return false;
}
return $matches[$number][count( $matches[0] ) - 1][0];
}
Sample code:
echo '<textarea style="font-family: Consolas; font-size: 14px; height: 200px; tab-size: 4; width: 90%;">';
echo 'ALL = ' . get_text_last_match( $pattern, $subject, 0 ) . PHP_EOL; // all text
echo 'HREF = ' . get_text_last_match( $pattern, $subject, 1 ) . PHP_EOL; // href value
echo 'INNER = ' . get_text_last_match( $pattern, $subject, 2 ) . PHP_EOL; // innerHTML
echo '</textarea>';
The output is:
ALL = DuckDuckGo
HREF = https://duckduckgo.com/
INNER = DuckDuckGo
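The two functions above can also be folded into one that returns both the text and the offset. The following is only a sketch: the name last_match, the simplified pattern, and the example.com markup are assumptions made for illustration and are not part of the answer.

```php
<?php
// Hypothetical helper: returns [text, offset] of the last match
// for the given group number, or null when nothing matches.
function last_match(string $pattern, string $subject, int $group = 0): ?array
{
    if (!preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE)) {
        return null;
    }
    $last = count($matches[0]) - 1;
    return $matches[$group][$last];
}

// Hypothetical markup with placeholder URLs.
$pattern = '/<a[^>]*href="([^"]*)">([^<]*)<\/a>/';
$subject = '<li><a href="https://example.com/one">One</a></li>'
         . '<li><a href="https://example.com/two">Two</a></li>';

[$hrefText, $hrefOffset] = last_match($pattern, $subject, 1);
echo $hrefText, ' at offset ', $hrefOffset, PHP_EOL;
```

Returning the pair keeps a single preg_match_all call per lookup instead of one call for the text and one for the offset.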
preg_match() does not support reverse searching, but it is not necessary.
You can create a regular expression that starts with a greedy (the default) .* that matches anything, like .*(stuff) or .*\Kstuff. This way, you get the last occurrence of your match. Note that a lookbehind such as (?<=.*)stuff does not work, because PCRE lookbehinds cannot contain an unbounded quantifier.
Detailed information is in the official documentation for preg_match.

Regex to match a set of characters but only if particular characters are not grouped

This is a tricky one, I have a string:
This is some text with a {%TAG IN IT%} and some more text then {%ANOTHER TAG%} with some more text at the end.
I have a regex to match the tags:
({%\w+[\w =!:;,\.\$%"'#\?\-\+\{}]*%})
Which will match a starting tag with any alphanumeric character followed by any number of other ansi characters (sample set specified in the regex above).
However (in PHP using "preg_match_all" and "preg_split" at least) the fact that the set contains both the percent (%) and the curly braces ({}) means that the regex matches too much if there are two tags on the same line.
e.g, in the example given, the following is matched:
{%TAG IN IT%} and some more text then {%ANOTHER TAG%}
As you can see, the %}...{% were matched. So, what I need is to allow the "%" but NOT when followed by "}"
I've tried non-greedy matching and negative lookahead, but the negative lookahead won't work in a character set (i.e. everything in the [\w...]* set).
I'm stuck!
You could use alternation to achieve this:
/\{%(?:[^%]|%(?!}))*%\}/
It matches either characters that aren't %, or % characters that aren't followed by } (using a look-ahead assertion).
$str = 'This is some text with a {%tag with % and } inside%} and some more text then {%ANOTHER TAG%} with some more text at the end.';
$pattern = '/\{%(?:[^%]|%(?!}))*%\}/';
preg_match_all($pattern, $str, $matches);
print_r($matches[0]);
Output:
Array
(
[0] => {%tag with % and } inside%}
[1] => {%ANOTHER TAG%}
)
A slight modification of your regexp works (just add a question mark to make it non-greedy):
<?php
$input = "This is some text with a {%TAG % }IT%%} and some more text then {%ANOTHER TAG%} with some more text at the end.";
$regexp = "/{%\w+[\w =!:;,\.\$%\"'#\?\-\+\{}]*?%}/";
// Note the ? added after the * quantifier in the pattern above
if(preg_match_all($regexp, $input, $matches, PREG_SET_ORDER)) {
foreach($matches as $match) {
var_dump($match);
echo "\r\n";
}
unset($match);
}
/*
Outputs:
array
0 => string '{%TAG % }IT%%}' (length=14)
array
0 => string '{%ANOTHER TAG%}' (length=15)
*/
?>
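To see the difference the question mark makes, the greedy and lazy forms can be run against the string from the question. This is a sketch; the pattern is rewritten in single quotes with the character class factored into a variable ($class is a name chosen here), which should be equivalent to the original class.

```php
<?php
$input = 'This is some text with a {%TAG IN IT%} and some more text then {%ANOTHER TAG%} with some more text at the end.';

// Same character class; only the quantifier differs (* versus *?).
$class  = '[\w =!:;,.$%"\'#?\-+{}]';
$greedy = '/\{%\w+' . $class . '*%\}/';
$lazy   = '/\{%\w+' . $class . '*?%\}/';

preg_match_all($greedy, $input, $g);
preg_match_all($lazy, $input, $l);

print_r($g[0]);  // one match spanning both tags
print_r($l[0]);  // two separate tags
```

The greedy form runs to the last %} it can reach, while the lazy form stops at the first one.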
