I have a few hundred thousand strings that are laid out like the following
AX23784268B2
LJ93842938A1
MN39423287S
IY289383N2
With PHP I'm racking my brain how to return B2, A1, S, and N2.
Tried all sorts of substr, strstr, strlen manipulation and am coming up short.
echo substr('MN39423287S', -2); // returns 7S, not S
This is a simpler regexp than the other answer:
preg_match('/[A-Z][^A-Z]*$/', $token, $matches);
echo $matches[0];
[A-Z] matches an uppercase letter, [^A-Z] matches any character that is not an uppercase letter. * makes the preceding pattern match any number of times (including 0), and $ matches the end of the string.
So this matches a letter followed by any number of non-letters at the end of the string.
$matches[0] contains the portion of the string that the entire regexp matched.
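As a quick check, the pattern can be run over all four sample strings from the question. This is only a minimal sketch: the helper name trailingCode is made up for illustration, and returning null when there is no uppercase letter is an assumption about what you'd want in that case.

```php
<?php
// Hypothetical helper: returns the last uppercase letter plus everything
// after it, or null if the string contains no uppercase letter.
function trailingCode(string $token): ?string
{
    return preg_match('/[A-Z][^A-Z]*$/', $token, $matches) === 1 ? $matches[0] : null;
}

$tokens = ['AX23784268B2', 'LJ93842938A1', 'MN39423287S', 'IY289383N2'];
$codes = array_map('trailingCode', $tokens);

echo implode(', ', $codes), PHP_EOL; // prints: B2, A1, S, N2
```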
There are many ways to do this.
One example would be a regex:
<?php
$regex = "/.+([A-Z].?+)$/";
$tokens = [
'AX23784268B2',
'LJ93842938A1',
'MN39423287S',
'IY289383N2',
];
foreach($tokens as $token)
{
preg_match($regex, $token, $matches);
var_dump($matches[1]);
// B2, A1, S, N2
}
How the regex works:
.+ - one or more of any character except newline
( - create a group
[A-Z] - match any A-Z character
.?+ - also match at most one more character after it, if any (?+ is a possessive "zero or one")
) - end group
$ - match the end of the string
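One thing worth checking with this pattern is what happens when more than one character follows the last letter. The sketch below compares the pattern above with a variant that uses .* instead of .?+; the variant and the sample token 'XY123C45' are my own assumptions, not part of the question's data.

```php
<?php
$possessive = '/.+([A-Z].?+)$/'; // pattern from above: at most one char after the letter
$greedy     = '/.+([A-Z].*)$/';  // assumed variant: any number of chars after the letter

$token = 'XY123C45'; // made-up token with two characters after the last letter

$first  = preg_match($possessive, $token, $m1); // 0, the suffix "C45" is too long
$second = preg_match($greedy, $token, $m2);     // 1, group 1 is "C45"

echo $first, ' ', $second, ' ', $m2[1], PHP_EOL; // prints: 0 1 C45
```

For the four strings in the question both patterns give the same results, so this only matters if longer suffixes can occur.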
Related
I am trying to get zipcodes out of address strings.
Zipcodes may look like this: 23-123 or 50-530.
Strings usually look like this: Street 50, 50-123 City
What I tried to do is find the position of the zipcode and cut the next 6 characters starting from that point. Unfortunately, strpos returns false all the time.
$zipCodePosition = strpos($form->address, "\d{2}-\d{3}");
$zipCode = $zipCodePosition ? substr($form->address, $zipCodePosition , 6) : '';
strpos does not accept a regex as an argument; it only searches for a literal substring.
You need a preg_match / preg_match_all here:
// Get the first match
if (preg_match('~\b\d{2}-\d{3}\b~', $text, $match)) {
echo $match[0];
}
// Get all matches
if (preg_match_all('~\b\d{2}-\d{3}\b~', $text, $matches)) {
print_r($matches[0]);
}
The regex matches
\b - a word boundary
\d{2} - two digits
- - a hyphen
\d{3} - three digits
\b - a word boundary
See the regex demo.
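Applied to the address format from the question, this could be wrapped up roughly as follows. The function name extractZipCode is hypothetical, and falling back to an empty string mirrors the ternary in the question rather than anything PHP requires.

```php
<?php
// Hypothetical helper: returns the first NN-NNN zipcode found in $address,
// or an empty string if there is none.
function extractZipCode(string $address): string
{
    return preg_match('~\b\d{2}-\d{3}\b~', $address, $match) === 1 ? $match[0] : '';
}

echo extractZipCode('Street 50, 50-123 City'), PHP_EOL; // prints: 50-123
```

The house number 50 is skipped because it is not followed by a hyphen and three digits.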
http://www.tehplayground.com/#0qrTOzTh3
$inputs = array(
'2', // no match
'29.2', // no match
'2.48',
'8.06.16', // no match
'-2.41',
'-.54', // no match
'4.492', // no match
'4.194,32',
'39,299.39',
'329.382,39',
'-188.392,49',
'293.392,193', // no match
'-.492.183,33', // no match
'3.492.249,11',
'29.439.834,13',
'-392.492.492,43'
);
$number_pattern = '-?(?:[0-9]|[0-9]{2}|[0-9]{3}[\.,]?)?(?:[0-9]|[0-9]{2}|[0-9]{3})[\.,][0-9]{2}(?!\d)';
foreach($inputs as $input){
preg_match_all('/'.$number_pattern.'/m', $input, $matches);
print_r($matches);
}
It seems you are looking for
$number_pattern = '-?(?<![\d.,])\d{1,3}(?:[,.]\d{3})*[.,]\d{2}(?![\d.])';
See the PHP demo and a regex demo.
No anchors are used; there are lookarounds on both sides of the pattern instead.
Pattern details:
-? - an optional hyphen
(?<![\d.,]) - there cannot be a digit, comma or dot before the current location
\d{1,3} - 1 to 3 digits
(?:[,.]\d{3})* - zero or more sequences of a comma or dot followed with 3 digits
[.,] - a comma or dot
\d{2} - 2 digits that are
(?![\d.]) - not followed with a digit or dot.
Note that in PHP you do not need to specify the /m MULTILINE modifier or use the $ end-of-string anchor;
preg_match_all('/'.$number_pattern.'/', $input, $matches);
is enough to match the numbers you need in larger texts.
If you need to match them as standalone strings, use a simpler
^-?\d{1,3}(?:[,.]\d{3})*[.,]\d{2}$
See the regex demo.
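To see the two patterns side by side, here is a small sketch. The shortened input list is taken from the question; the sample sentence in $text is made up for illustration.

```php
<?php
// Lookaround version, for finding numbers inside larger text.
$embedded = '/-?(?<![\d.,])\d{1,3}(?:[,.]\d{3})*[.,]\d{2}(?![\d.])/';
// Anchored version, for validating a standalone string.
$standalone = '/^-?\d{1,3}(?:[,.]\d{3})*[.,]\d{2}$/';

$inputs = ['2', '2.48', '8.06.16', '-.54', '4.194,32', '293.392,193', '-392.492.492,43'];
// preg_grep keeps the entries that match; array_values reindexes them.
$valid = array_values(preg_grep($standalone, $inputs));
echo implode(' | ', $valid), PHP_EOL; // prints: 2.48 | 4.194,32 | -392.492.492,43

// Made-up sentence to show extraction from running text.
$text = 'Paid 4.194,32 and -2.41 for build 8.06.16';
preg_match_all($embedded, $text, $matches);
echo implode(' | ', $matches[0]), PHP_EOL; // prints: 4.194,32 | -2.41
```

Note that the version string 8.06.16 is rejected by both patterns, which is the behaviour asked for in the question.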
Let's say I want to split this string in two variables:
$string = "levis 501";
I will use
preg_match('/\d+/', $string, $num);
preg_match('/\D+/', $string, $text);
but then let's say I want to split this one in two
$string = "levis 5° 501";
as $text = "levis 5°"; and $num = "501";
So my guess is I should add a rule to the preg_match('/\d+/', $string, $num); that looks for numbers only at the END of the string and I want it to be between 2 and 3 digits.
But also the $text match now has one number inside...
How would you do it?
To split a string in two parts, use any of the following:
preg_match('~^(.*?)\s*(\d+)\D*$~s', $s, $matches);
This regex matches:
^ - the start of the string
(.*?) - Group 1 capturing zero or more characters, as few as possible (as *? is a "lazy" quantifier) up to...
\s* - zero or more whitespace symbols
(\d+) - Group 2 capturing 1 or more digits
\D* - zero or more characters other than digit (it is the opposite shorthand character class to \d)
$ - end of string.
The s modifier (DOTALL) makes the . match any character, including a newline, which it does not match without this modifier.
Or
preg_split('~\s*(?=\s*\d+\D*$)~', $s, 2);
This \s*(?=\s*\d+\D*$) pattern:
\s* - zero or more whitespaces, but only if followed by...
(?=\s*\d+\D*$) - zero or more whitespaces followed with 1+ digits followed with 0+ characters other than digits followed with end of string.
The (?=...) construct is a positive lookahead that does not consume characters and just checks if the pattern inside matches and if yes, returns "true", and if not, no match occurs.
See IDEONE demo:
$s = "levis 5° 501";
preg_match('~^(.*?)\s*(\d+)\D*$~s', $s, $matches);
print_r($matches[1] . ": ". $matches[2]. PHP_EOL);
print_r(preg_split('~\s*(?=\s*\d+\D*$)~', $s, 2));
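If you need this in more than one place, the preg_match variant could be wrapped in a small function. This is just a sketch: the name splitTrailingNumber and the choice to return null when the string has no digits are assumptions on my part.

```php
<?php
// Hypothetical helper: splits $s into [text, trailing number], or returns
// null when the string contains no digits to split on.
function splitTrailingNumber(string $s): ?array
{
    if (preg_match('~^(.*?)\s*(\d+)\D*$~s', $s, $m) !== 1) {
        return null;
    }
    return [$m[1], $m[2]];
}

[$text, $num] = splitTrailingNumber('levis 5° 501');
echo $text, ' / ', $num, PHP_EOL; // prints: levis 5° / 501
```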
I'm trying to get the string that matches the original name and has a number at the end.
I got these strings:
mod_courts2
mod_courts_config
mod_courts_config2
From these strings I only want the one that is "mod_courts" followed by a number at the end.
I'm doing this:
if (strpos($t, "mod_courts") !== FALSE) {
preg_match('/^\w+(\d+)$/U', $t, $match);
echo $match;
}
This returns me "mod_courts2" and "mod_courts_config2", I just want "mod_courts2"
Use the following regex:
/^[a-z]+_[a-z]+(\d+)$/
Explanation:
^ - assert position at the beginning of the string
[a-z]+ - match any lowercase letter one or more times
_ - match a literal underscore character
[a-z]+ - match any lowercase letter one or more times
(\d+) - match (and capture) any digit from 0 to 9 one or more times
$ - assert position at the end of the string
Test cases:
$array = array(
'mod_courts2',
'mod_courts_config',
'mod_courts_config2'
);
foreach ($array as $string) {
if(preg_match('/^[a-z]+_[a-z]+(\d+)$/i', $string, $matches)) {
print_r($matches);
}
}
Output:
Array
(
[0] => mod_courts2
[1] => 2
)
Very simply, you can do:
/^(mod_courts\d+)$/
However, if you want exactly the following format: sometext_sometext2, you can use the following regex:
/^([a-zA-Z]+_[a-zA-Z]+\d+)$/
or
/^([^_]+_[^_]+\d+)$/
Demos
http://regex101.com/r/jP8iC1
http://regex101.com/r/tI1uX8
http://regex101.com/r/fX8pO5
^mod_courts\d+$
this should do it
You can just use
^mod_courts[0-9]+$
Meaning mod_courts followed by a number (and only that, thanks to ^$ matching the beginning and end of the string). No need for the strpos check.
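For a whole list of names, preg_grep can do the filtering in one call. A minimal sketch using the three strings from the question:

```php
<?php
$names = ['mod_courts2', 'mod_courts_config', 'mod_courts_config2'];

// Keep only names that are exactly "mod_courts" followed by one or more digits.
// array_values reindexes the result, since preg_grep preserves the original keys.
$matching = array_values(preg_grep('/^mod_courts[0-9]+$/', $names));

print_r($matching); // only mod_courts2 is left
```

One caveat: without the D modifier, $ also matches just before a trailing newline, so trim the input first if that could matter.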
How can I extract hyphenated strings from this string line?
ADW-CFS-WE CI SLA Def No SLANAME CI Max Outage Service
I just want to extract "ADW-CFS-WE" from it but have been unsuccessful for the past few hours. I'm stuck with the simple regex "(.*)", which selects the whole string shown above.
You can probably use:
preg_match("/\w+(-\w+)+/", ...)
The \w+ matches one or more word characters (letters, digits or underscore), i.e. one word. The group (-\w+)+ then matches one or more further words, each preceded by a hyphen.
The trick with regular expressions is often specificity. Using .* will often match too much.
$input = "ADW-CFS-WE X-Y CI SLA Def No SLANAME CI Max Outage Service";
preg_match_all('/[A-Z]+-[A-Z-]+/', $input, $matches);
foreach ($matches[0] as $m) {
echo $m . "\n";
}
Note that this solution assumes that only uppercase A-Z can match. If that's not the case, insert the correct character class. For example, if you want to allow arbitrary letters (like a and Ä), replace [A-Z] with \p{L} and add the u modifier so the input is treated as UTF-8.
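A sketch of the Unicode-aware idea might look like this. Note that it uses a slightly stricter pattern than the one above (every hyphen must be followed by at least one letter), and the trailing "Über-Maß" in the input is a made-up addition to show non-ASCII letters.

```php
<?php
$input = 'ADW-CFS-WE X-Y CI SLA Def No SLANAME Über-Maß';

// \p{L} matches any Unicode letter; the u modifier makes PCRE treat the
// pattern and subject as UTF-8, so multi-byte letters like Ü and ß match whole.
preg_match_all('/\p{L}+(?:-\p{L}+)+/u', $input, $matches);

echo implode(', ', $matches[0]), PHP_EOL; // prints: ADW-CFS-WE, X-Y, Über-Maß
```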
Just catch every whitespace-free word ([^\s]) that contains at least one '-'.
The following expression will do it:
<?php
$z = "ADW-CFS-WE CI SLA Def No SLANAME CI Max Outage Service";
$r = preg_match('#([^\s]*-[^\s]*)#', $z, $matches);
var_dump($matches);
The following pattern assumes the data is at the beginning of the string, contains only capitalized letters and may contain a hyphen before each group of one or more of those letters:
<?php
$str = 'ADW-CFS-WE CI SLA Def No SLANAME CI Max Outage Service';
if (preg_match('/^(?:-?[A-Z]+)+/', $str, $matches) === 1) // preg_match returns 0 (not false) when nothing matches
var_dump($matches);
Result:
array(1) {
[0]=>
string(10) "ADW-CFS-WE"
}