I am trying to get zipcodes out of address strings.
Zipcodes may look like this: 23-123 or 50-530.
Strings usually look like this: Street 50, 50-123 City
What I tried was to find the position of the zipcode and take the next 6 characters starting from that point. Unfortunately, strpos returns false every time.
$zipCodePosition = strpos($form->address, "\d{2}-\d{3}");
$zipCode = $zipCodePosition ? substr($form->address, $zipCodePosition , 6) : '';
strpos does not accept a regex as an argument; it searches for a literal substring.
You need preg_match / preg_match_all here:
// Get the first match
if (preg_match('~\b\d{2}-\d{3}\b~', $text, $match)) {
echo $match[0];
}
// Get all matches
if (preg_match_all('~\b\d{2}-\d{3}\b~', $text, $matches)) {
print_r($matches[0]);
}
The regex matches
\b - a word boundary
\d{2} - two digits
- - a hyphen
\d{3} - three digits
\b - a word boundary
See the regex demo.
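If you need this in more than one place, the first-match variant can be wrapped in a small function. This is only a sketch: the name extractZipCode is made up for illustration, and it assumes the NN-NNN format described in the question.

```php
<?php
// Hypothetical helper (not from the original answer): returns the first
// NN-NNN code found in $address, or null when there is none.
function extractZipCode(string $address): ?string
{
    if (preg_match('~\b\d{2}-\d{3}\b~', $address, $match)) {
        return $match[0];
    }
    return null;
}

var_dump(extractZipCode('Street 50, 50-123 City')); // string(6) "50-123"
var_dump(extractZipCode('Street 50, City'));        // NULL
```

Returning null instead of an empty string makes "no zipcode found" easy to tell apart from a real value.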
Related
I have a few hundred thousand strings that are laid out like the following
AX23784268B2
LJ93842938A1
MN39423287S
IY289383N2
With PHP I'm racking my brain how to return B2, A1, S, and N2.
Tried all sorts of substr, strstr, strlen manipulation and am coming up short.
substr('MN39423287S', -2); // returns 7S, not S
This is a simpler regexp than the other answer:
preg_match('/[A-Z][^A-Z]*$/', $token, $matches);
echo $matches[0];
[A-Z] matches an uppercase letter, [^A-Z] matches any character that is not an uppercase letter. * makes the preceding pattern match any number of times (including 0), and $ matches the end of the string.
So this matches a letter followed by any number of non-letters at the end of the string.
$matches[0] contains the portion of the string that the entire regexp matched.
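Applied to the sample strings from the question, it could look like the sketch below. The function name trailingCode is hypothetical, and the null return for strings without any uppercase letter is an assumption about what you would want in that case.

```php
<?php
// Hypothetical helper: returns the last uppercase letter plus everything
// after it, or null if the token contains no uppercase letter.
function trailingCode(string $token): ?string
{
    return preg_match('/[A-Z][^A-Z]*$/', $token, $m) ? $m[0] : null;
}

$tokens = ['AX23784268B2', 'LJ93842938A1', 'MN39423287S', 'IY289383N2'];
$codes = array_map('trailingCode', $tokens);

print_r($codes); // B2, A1, S, N2
```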
There are many ways to do this.
One example would be a regex:
<?php
$regex = "/.+([A-Z].*)$/";
$tokens = [
'AX23784268B2',
'LJ93842938A1',
'MN39423287S',
'IY289383N2',
];
foreach($tokens as $token)
{
preg_match($regex, $token, $matches);
var_dump($matches[1]);
// B2, A1, S, N2
}
How the regex works:
.+ - one or more of any character except newline (greedy, so the group starts at the last uppercase letter)
( - create a group
[A-Z] - match any A-Z character
.* - also match any characters after it, if any
) - end group
$ - match the end of the string
I have a string like "some words 12345cm some more words"
and I want to extract the 12345cm bit from that string. So I get the position of the first number:
$position_of_first_number = strcspn( "some words 12345cm some more words" , '0123456789' );
Then the position of the first space after $position_of_first_number
$position_of_space_after_numbers = strpos("some words 12345cm some more words", " ", $position_of_first_number);
Then I want to have a function which return the portion of the string between $position_of_first_number and $position_of_space_after_numbers.
How do I do it?
You can use the substr function. Note that it takes a starting position and a length, which you can calculate as the difference between the start and end positions.
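A minimal sketch of that calculation, reusing the two positions from the question (the shorter variable names and the fallback for a missing trailing space are my own additions):

```php
<?php
$s = "some words 12345cm some more words";

// Offset of the first digit (strcspn returns strlen($s) if there is none).
$start = strcspn($s, '0123456789');
// Offset of the first space at or after $start, or false if there is none.
$end = strpos($s, ' ', $start);

// substr() takes (string, start, length), so the length is $end - $start.
// Without a trailing space, take everything up to the end of the string.
$result = $end === false ? substr($s, $start) : substr($s, $start, $end - $start);

echo $result; // 12345cm
```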
Since you are looking for a pattern like blank-digits-letters-blank, I would recommend a regular expression using preg_match:
$s = "some words 12345cm some more words";
preg_match("/\s(?P<result>\d+[^\W\d_]+)\s/", $s, $matches);
echo $matches["result"];
12345cm
Explaining the pattern:
"/.../" delimits the pattern in PHP
\s matches any whitespace character
(?P<name>...) names the following pattern
\d+ matches 1 or more digits
[^\W\d_]+ matches 1 or more letters (i.e. any word character that is neither a digit nor an underscore; see this answer)
I'm trying to grab everything after the following digits, so I end up with just the store name in this string:
full string: /stores/1077029-gacha-pins
what I want to ignore: /stores/1077029-
what I need to grab: gacha-pins
Those digits can change at any time so it's not specifically that ID, but any numbers after /stores/
My attempt so far is only grabbing /stores/1
\/stores\/[0-9]
I'm still trying, just thought I would see if I can get some help in the meantime too, will post an answer if I solve.
You may use
'~/stores/\d+-\K[^/]+$~'
Or a more specific one:
'~/stores/\d+-\K\w+(?:-\w+)*$~'
See the regex demo and this regex demo.
Details
/stores/ - a literal string
\d+ - 1+ digits
- - a hyphen
\K - match reset operator
[^/]+ - any 1+ chars other than /
\w+(?:-\w+)* - 1+ word chars and then 0+ sequences of - and 1+ word chars
$ - end of string.
See the PHP demo:
$s = "/stores/1077029-gacha-pins";
$rx = '~/stores/\d+-\K[^/]+$~';
if (preg_match($rx, $s, $matches)) {
echo "Result: " . $matches[0];
}
// => Result: gacha-pins
You should do it like this:
$string = '/stores/1077029-gacha-pins';
preg_match('#/stores/[0-9-]+(.*)#', $string, $matches);
$part = $matches[1];
print_r($part);
I have to check csv files live and match some expression to get data.
These files can contain different types of messages, and therefore need different matching expressions.
The message can be something like this:
GuiPrinter.ProcessPrint of 116806 25374 K356 S Black Face.png 229 at 1 table
And I want to get 116806 25374 K356 S Black Face.png. So the regex associated with this kind of file would be something like (GuiPrinter.ProcessPrint of )(.*)([.][png|jpg|jpeg|PNG|JPG|JPEG]*) and I can return $result[2]
But the message and the regex can change, so I need a common function that returns the string I want based on the regex; the function would take the message and the regex as parameters. For another file the string I want might be in the first position, so $result[2] won't work.
How can I make sure I always return the string that I want to match?
Use
preg_match('/GuiPrinter\.ProcessPrint of (.*?\.(?:gif|png|bmp|jpe?g))/', $str, $match);
print_r($match[1]);
You could match the text GuiPrinter.ProcessPrint and then use \K to reset the starting point of the reported match.
Match any character zero or more times, non-greedily, with .*?; then match a literal dot \. and any of the image extensions in a non-capturing group (?:gif|png|bmp|jpe?g), followed by a word boundary \b:
GuiPrinter\.ProcessPrint of \K.*?\.(?:gif|png|bmp|jpe?g)\b
Note that to match the dot literally you have to escape it: \.
For example to return 1 match using preg_match:
$str = 'GuiPrinter.ProcessPrint of 116806 25374 K356 S Black Face.png 229 at 1 table';
$re = '/GuiPrinter\.ProcessPrint of \K.*?\.(?:gif|png|bmp|jpe?g)\b/';
function findMatch($message, $regex) {
preg_match($regex, $message, $matches);
return array_shift($matches);
}
$result = findMatch($str, $re);
if ($result) {
echo "Found: $result";
} else {
echo "No match.";
}
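If some of your patterns cannot easily use \K, another option is to agree on a named group and let the function return that. The sketch below assumes a convention that every pattern names its wanted part "result"; both that name and the function name extractByPattern are made up for illustration.

```php
<?php
// Hypothetical helper: returns the named group "result" when the pattern
// defines one, otherwise the whole match; null when nothing matches.
function extractByPattern(string $message, string $regex): ?string
{
    if (!preg_match($regex, $message, $m)) {
        return null;
    }
    return $m['result'] ?? $m[0];
}

$msg = 'GuiPrinter.ProcessPrint of 116806 25374 K356 S Black Face.png 229 at 1 table';
$re  = '/GuiPrinter\.ProcessPrint of (?P<result>.*?\.(?:gif|png|bmp|jpe?g))\b/i';

echo extractByPattern($msg, $re); // 116806 25374 K356 S Black Face.png
```

This way the position of the group inside the regex no longer matters; the caller never has to know whether it is group 1 or group 2.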
Demo
http://www.tehplayground.com/#0qrTOzTh3
$inputs = array(
'2', // no match
'29.2', // no match
'2.48',
'8.06.16', // no match
'-2.41',
'-.54', // no match
'4.492', // no match
'4.194,32',
'39,299.39',
'329.382,39',
'-188.392,49',
'293.392,193', // no match
'-.492.183,33', // no match
'3.492.249,11',
'29.439.834,13',
'-392.492.492,43'
);
$number_pattern = '-?(?:[0-9]|[0-9]{2}|[0-9]{3}[\.,]?)?(?:[0-9]|[0-9]{2}|[0-9]{3})[\.,][0-9]{2}(?!\d)';
foreach($inputs as $input){
preg_match_all('/'.$number_pattern.'/m', $input, $matches);
print_r($matches);
}
It seems you are looking for
$number_pattern = '-?(?<![\d.,])\d{1,3}(?:[,.]\d{3})*[.,]\d{2}(?![\d.])';
See the PHP demo and a regex demo.
The anchors are not used, there are lookarounds on both sides of the pattern instead.
Pattern details:
-? - an optional hyphen
(?<![\d.,]) - there cannot be a digit, comma or dot before the current location
\d{1,3} - 1 to 3 digits
(?:[,.]\d{3})* - zero or more sequences of a comma or dot followed with 3 digits
[.,] - a comma or dot
\d{2} - 2 digits that are
(?![\d.]) - not followed with a digit or dot.
Note that in PHP you do not need to specify the /m MULTILINE modifier or use the $ end-of-string anchor;
preg_match_all('/'.$number_pattern.'/', $input, $matches);
is enough to match the numbers you need in larger texts.
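As a rough illustration of the "larger texts" case, here is the lookaround pattern run over a made-up sentence (the sample text is my own, not from the question):

```php
<?php
$number_pattern = '-?(?<![\d.,])\d{1,3}(?:[,.]\d{3})*[.,]\d{2}(?![\d.])';

// Made-up sample text: two valid amounts and one value (4.492) that must be skipped.
$text = 'Totals: 2.48 and -188.392,49 but not 4.492 here';

preg_match_all('/' . $number_pattern . '/', $text, $matches);
print_r($matches[0]); // 2.48 and -188.392,49
```

Note that because of the trailing (?![\d.]) lookahead, a number immediately followed by a sentence-ending dot would not be matched.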
If you need to match them as standalone strings, use a simpler
^-?\d{1,3}(?:[,.]\d{3})*[.,]\d{2}$
See the regex demo.
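To check the standalone variant against a few of the inputs from the question, something like the following sketch can be used. preg_grep is a standard PHP function; the shortened input list is only for brevity.

```php
<?php
$standalone = '/^-?\d{1,3}(?:[,.]\d{3})*[.,]\d{2}$/';

// A subset of the inputs from the question.
$inputs = ['2', '2.48', '8.06.16', '-.54', '4.194,32', '293.392,193', '-392.492.492,43'];

// preg_grep keeps the entries that match; array_values reindexes the result.
$matched = array_values(preg_grep($standalone, $inputs));
print_r($matched); // 2.48, 4.194,32, -392.492.492,43
```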