Regular expression for highlighting numbers between words - php

Site users enter numbers in different ways, for example:
from 8 000 packs
432534534
from 344454 packs
45054 packs
04 555
434654
54 564 packs
I am looking for a regular expression that captures the words before the digits (if there are any), the digits in any format, and the words after them (if there are any). Ideally the spaces should be excluded.
I currently have the following pattern, but it does not work correctly.
(^[0-9|a-zA-Z].*?)\s([0-9].*?)\s([a-zA-Z]*$)
The main purpose of this is to put the strings in order, bring them to the same form, format them in PHP digit format, etc.
As a result, I need to get the text before the digits, the digits themselves and the text after them into the variables separately.
$before = 'from';
$num = '8000';
$after = 'packs';
Thank you for any help in this matter.

I think you may try this:
^(\D+)?([\d \t]+)(\D+)?$
group 1: optional (?) group that matches anything but digits
group 2: mandatory group that matches only digits and whitespace characters such as space and tab
group 3: optional (?) group that matches anything but digits
Demo
Source (run)
$re = '/^(\D+)?([\d \t]+)(\D+)?$/m';
$str = 'from 8 000 packs
432534534
from 344454 packs
45054 packs
04 555
434654
54 564 packs
';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
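foreach ($matches as $matchgroup)
{
    echo "before: ".$matchgroup[1]."\n";
    echo "number:".preg_replace('/\D/m','',$matchgroup[2])."\n";
    // With PREG_SET_ORDER a trailing optional group that did not match is absent, hence the ?? fallback.
    echo "after:".($matchgroup[3] ?? '')."";
    echo "\n\n\n";
}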

I corrected your regex and added groups, the regex looks like this:
^(?<before>[a-zA-Z]+)?\s?(?<number>[0-9].*?)\s?(?<after>[a-zA-Z]+)?$
Test regex here: https://regex101.com/r/QLEC9g/2
By using groups you can easily separate the words and numbers, and handle them any way you want.
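A minimal sketch of how the named groups could be read in PHP. The helper name splitLine and the keys of the returned array are illustrative assumptions, not part of the answer; only the pattern is taken from it.

```php
<?php
// Hypothetical helper (name and return shape are assumptions for illustration).
function splitLine(string $line): ?array
{
    $re = '/^(?<before>[a-zA-Z]+)?\s?(?<number>[0-9].*?)\s?(?<after>[a-zA-Z]+)?$/';
    if (!preg_match($re, trim($line), $m)) {
        return null; // the line does not fit the pattern, e.g. it has no digits
    }
    return [
        'before' => $m['before'] ?? '',
        // strip the spaces (and any other non-digits) inside the number
        'num'    => preg_replace('/\D+/', '', $m['number']),
        // a trailing group that did not match is absent from $m
        'after'  => $m['after'] ?? '',
    ];
}

print_r(splitLine('from 8 000 packs'));
```

Returning null for lines that do not match leaves it to the caller to decide how to treat input without digits.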

Your pattern does not match because there are 4 required parts that all expect 1 character to be present:
(^[0-9|a-zA-Z].*?)\s([0-9].*?)\s([a-zA-Z]*$)
  ^^^^^^^^^^^^    ^^ ^^^^^    ^^
The other thing to note is that the first character class [0-9|a-zA-Z] can also match digits (you can omit the | as it would match a literal pipe char)
If you want to allow any characters other than digits on the left and right, and at least a single digit must be present, you can use a negated character class [^\d\r\n]* that optionally matches any character except a digit or a newline:
^([^\d\r\n]*)\h*(\d+(?:\h+\d+)*)\h*([^\d\r\n]*)$
^ Start of string
([^\d\r\n]*) Capture group 1, match any char except a digit or a newline
\h* Match optional horizontal whitespace chars
(\d+(?:\h+\d+)*) Capture group 2, match 1+ digits and optionally repeat matching spaces and 1+ digits
\h* Match optional horizontal whitespace chars
([^\d\r\n]*) Capture group 3, match any char except a digit or a newline
$ End of string
See a regex demo and a PHP demo.
For example
$re = '/^([^\d\r\n]*)\h*(\d+(?:\h+\d+)*)\h*([^\d\r\n]*)$/m';
$str = 'from 8 000 packs
test from 8 000 packs test
432534534
from 344454 packs
45054 packs
04 555
434654
54 564 packs';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
foreach($matches as $match) {
    list(,$before, $num, $after) = $match;
    echo sprintf(
        "before: %s\nnum:%s\nafter:%s\n--------------------\n",
        $before, preg_replace("/\h+/", "", $num), $after
    );
}
Output
before: from
num:8000
after:packs
--------------------
before: test from
num:8000
after:packs test
--------------------
before:
num:432534534
after:
--------------------
before: from
num:344454
after:packs
--------------------
before:
num:45054
after:packs
--------------------
before:
num:04555
after:
--------------------
before:
num:434654
after:
--------------------
before:
num:54564
after:packs
--------------------
If there should be at least a single digit present, and the only allowed characters are a-z for the word(s), you can use a case insensitive pattern:
(?i)^((?:[a-z]+(?:\h+[a-z]+)*)?)\h*(\d+(?:\h+\d+)*)\h*((?:[a-z]+(?:\h+[a-z]+)*)?)?$
See another regex demo and a php demo.
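As a rough sketch, the letters-only pattern could be wrapped as follows. The function name parseLettersOnly and the shape of the returned array are assumptions made for illustration; the pattern itself is the one quoted above.

```php
<?php
// Hypothetical helper; returns [before, number without spaces, after] or null.
function parseLettersOnly(string $line): ?array
{
    $re = '/(?i)^((?:[a-z]+(?:\h+[a-z]+)*)?)\h*(\d+(?:\h+\d+)*)\h*((?:[a-z]+(?:\h+[a-z]+)*)?)?$/';
    if (!preg_match($re, $line, $m)) {
        return null; // a character outside a-z, digits and horizontal whitespace was found
    }
    return [$m[1], preg_replace('/\h+/', '', $m[2]), $m[3] ?? ''];
}

print_r(parseLettersOnly('test from 8 000 packs test'));
var_dump(parseLettersOnly('from: 8 000')); // NULL, the colon is not allowed by [a-z]
```

Unlike the negated character class version, the word groups here cannot start or end with whitespace, so the captured text needs no trimming.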

Related

regex to remove variable prefix with or without a delimiter

I am trying to process historic military service numbers which have a very variable format. The key thing is to remove any prefix, but also to keep any suffix. Prefixes most commonly have a delimiter of a space, slash or dash, but sometimes they do not. In these cases the prefix is always one or more uppercase letters. In all other cases both prefixes and suffixes can contain letters or numbers and whilst typically uppercase, can be lower!
Currently my php code is
$cleanServiceNumber = preg_replace("/^.*[\/\s-]/","",$serviceNumber);
and typical values and desired results are
AB/12345 => 12345
CD-23456 => 23456
EF 34567 => 34567
5/45678 => 45678
GH/56789/A => 56789/A
GH/56789B => 56789B
XY67890 => 67890 <<< fails to do any replace and returns XY67890
I'm afraid my basic regex skills are failing me in terms of sorting the last example!
This regex matches zero or more digits followed by one or more non-digits at the beginning of the string, so replacing the match with an empty string removes the prefix: /^\d*\D+/
Demo
$serviceNumbers = array(
'AB/12345',
'CD-23456',
'EF 34567',
'5/45678',
'GH/56789/A',
'GH/56789B',
'XY67890');
foreach ($serviceNumbers as $serviceNumber) {
$cleanServiceNumber = preg_replace("/^\d*\D+/","",$serviceNumber);
echo $cleanServiceNumber . "\n";
}
Output:
12345
23456
34567
45678
56789/A
56789B
67890
You can add an alternation of [A-Z]+, but you should also make the other alternation more efficient by searching for non-delimiter characters followed by a delimiter:
$cleanServiceNumber = preg_replace("/^(?:[^\/ -]+[\/ -]|[A-Z]+)/","",$serviceNumber);
Demo on regex101
PHP demo on 3v4l.org
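A short sketch that runs the alternation pattern over the sample values from the question. The variable name $cleaned is only illustrative.

```php
<?php
// Sample values from the question.
$serviceNumbers = ['AB/12345', 'CD-23456', 'EF 34567', '5/45678', 'GH/56789/A', 'GH/56789B', 'XY67890'];

$cleaned = [];
foreach ($serviceNumbers as $serviceNumber) {
    // First alternative: non-delimiter characters followed by one delimiter.
    // Second alternative: an uppercase-only prefix without a delimiter.
    $cleaned[$serviceNumber] = preg_replace('/^(?:[^\/ -]+[\/ -]|[A-Z]+)/', '', $serviceNumber);
}

print_r($cleaned);
```

The first alternative is tried first, so the uppercase-only branch is used only when no delimiter is found anywhere in the value.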
Here is another try for a regex which looks like:
/^([A-Za-z]+(\d+\W|\W)?|\d+\W)/
It has 2 parts which detect the type of prefix you have:
[A-Za-z]+(\d+\W|\W)? => One or more letters, optionally followed by either digits plus a non-word character or a single non-word character. The ? at the end makes that ending optional.
\d+\W => One or more digits followed by a non-word character.
Snippet:
<?php
$tests = [
'AB/12345',
'CD-23456',
'EF 34567',
'5/45678',
'GH/56789/A',
'GH/56789B',
'XY67890',
'XY67890/90/A'
];
foreach($tests as $test){
echo $test," => ",preg_replace("/^([A-Za-z]+(\d+\W|\W)?|\d+\W)/","",$test),PHP_EOL;
}
Demo: https://3v4l.org/9hJLJ
The pattern you tried ^.*[\/\s-] first matches until the end of the string because the dot is greedy. Then it will backtrack until it can match either a /, - or a whitespace char.
This will not work for GH/56789/A as it will backtrack until the last / and it will not work for XY67890 as it does not match any of the characters in the character class.
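The two failure cases described above can be reproduced with a small sketch. It uses the pattern from the question unchanged; the variable names are illustrative.

```php
<?php
$pattern = '/^.*[\/\s-]/';

// The greedy .* backtracks only as far as the LAST delimiter, so the suffix is all that survives.
$withSuffix = preg_replace($pattern, '', 'GH/56789/A');

// No delimiter at all: the pattern cannot match and the string is returned unchanged.
$noDelimiter = preg_replace($pattern, '', 'XY67890');

echo $withSuffix, PHP_EOL;  // A
echo $noDelimiter, PHP_EOL; // XY67890
```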
You could match from the start of the string either 1 or more chars a-zA-Z or 1 or more digits 0-9 and at the end match an optional /, - or a horizontal whitespace character.
^(?:[A-Za-z]+|\d+)[/\h-]?
Regex demo | Php demo
For example
$serviceNumbers = [
"AB/12345",
"CD-23456",
"EF 34567",
"5/45678",
"GH/56789/A",
"GH/56789B",
"XY67890"
];
foreach ($serviceNumbers as $serviceNumber) {
echo preg_replace("~^(?:[A-Za-z]+|\d+)[/\h-]?~","",$serviceNumber) . PHP_EOL;
}
Output
12345
23456
34567
45678
56789/A
56789B
67890

Extract last 2 snippets from string (PHP)

In my code I output the following variable:
{$item.articlename}
this one has the content:
"Red Blue Green Yellow Black"
I just want to have the last two words in the string.
"Yellow Black"
I tried to delete the first words with regex_replace,
{$item.articlename|regex_replace:"/^(\w+\s)/":" "}
but the number of words at the beginning varies, so I always want to have the last two words.
I would appreciate any hint.
You could match the last 2 words using \w+ to match 1+ word characters and \h+ to match 1+ horizontal whitespace characters. Use an anchor $ to assert the end of the string.
Note that \s also matches a newline.
\w+\h+\w+$
Regex demo
If you want to use a replacement, you could replace using the first capturing group and use a word boundary \b before the first \w+
^.*(\b\w+\h+\w+)$
^ Start of string
.* Match any char except a newline 0+ times
( Capture group 1
\b\w+\h+\w+ Wordboundary, 1+ word chars, 1+ horizontal whitespace chars, 1+ word chars
) Close group 1
$ End of string
Regex demo
In the replacement use group 1 $1
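Both variants can be tried in plain PHP with a sketch like the one below. The variable names are illustrative, and how the pattern is wired into the template engine is left aside.

```php
<?php
$str = 'Red Blue Green Yellow Black';

// Variant 1: match the last two words directly.
preg_match('/\w+\h+\w+$/', $str, $m);
$matched = $m[0] ?? '';

// Variant 2: replace the whole string with capture group 1.
$replaced = preg_replace('/^.*(\b\w+\h+\w+)$/', '$1', $str);

echo $matched, PHP_EOL;  // Yellow Black
echo $replaced, PHP_EOL; // Yellow Black
```

If the string holds fewer than two words, the first variant finds nothing and the second returns the input unchanged.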
How about this:
$string = "Red Blue Green Yellow Black";
$arr = explode(" ", $string);
$arr = array_slice($arr, -2, 2, true);
$result = implode(" ", $arr);
Assuming last 2 words would always exist, you can use simple explode() and array_slice() with a negative offset to get them. Later, you can glue them using join.
<?php
$str = "Red Blue Green Yellow Black";
echo join(" ",array_slice(explode(" ",trim($str)),-2));
Demo: https://3v4l.org/7FJ9n
In your code, it would look like
{{ join(" ",array_slice(explode(" ",trim($item.articlename)),-2)) }}

Split address street name house number and room number

I need to split the address Main Str. 202-52 into
street=Main Str.
house No.=202
room No.=52
I tried to use this:
$data['address'] = "Main Str. 202-52";
$data['street'] = explode(" ", $data['address']);
$data['building'] = explode("-", $data['street'][0]);
It works when the street name is a single word. How can I split the address when the street name has several words?
I also tried $data['street'] = preg_split('/[0-9]/', $data['address']); but that returns only the street name.
You may use a regular expression like
/^(.*)\s(\d+)\W+(\d+)$/
if you need all up to the last whitespace into group 1, the next digits into Group 2 and the last digits into Group 3. \W+ matches 1+ chars other than word chars, so it matches - and more. If you have a - there, just use the hyphen instead of \W+.
See the regex demo and a PHP demo:
$s = "Main Str. 202-52";
if (preg_match('~^(.*)\s(\d+)\W+(\d+)$~', $s, $m)) {
echo $m[1] . "\n"; // Main Str.
echo $m[2] . "\n"; // 202
echo $m[3]; // 52
}
Pattern details:
^ - start of string
(.*) - Group 1 capturing any 0+ chars other than line break chars as many as possible up to the last....
\s - whitespace, followed with...
(\d+) - Group 2: one or more digits
\W+ - 1+ non-word chars
(\d+) - Group 3: one or more digits
$ - end of string.
Also, note that in case the last part can be optional, wrap the \W+(\d+) with an optional non-capturing group (i.e. (?:...)?, (?:\W+(\d+))?).
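A minimal sketch of the variant with an optional room number. The helper name splitAddress and the array keys are assumptions made for illustration.

```php
<?php
// Hypothetical helper: street, house number and an optional room number.
function splitAddress(string $address): ?array
{
    if (!preg_match('~^(.*)\s(\d+)(?:\W+(\d+))?$~', $address, $m)) {
        return null;
    }
    return [
        'street' => $m[1],
        'house'  => $m[2],
        // group 3 is absent from $m when there is no room part
        'room'   => $m[3] ?? null,
    ];
}

print_r(splitAddress('Main Str. 202-52'));
print_r(splitAddress('Old Town Main Str. 202'));
```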

split string in numbers and text but accept text with a single digit inside

Let's say I want to split this string in two variables:
$string = "levis 501";
I will use
preg_match('/\d+/', $string, $num);
preg_match('/\D+/', $string, $text);
but then let's say I want to split this one in two
$string = "levis 5° 501";
as $text = "levis 5°"; and $num = "501";
So my guess is I should add a rule to the preg_match('/\d+/', $string, $num); that looks for numbers only at the END of the string and I want it to be between 2 and 3 digits.
But also the $text match now has one number inside...
How would you do it?
To split a string in two parts, use either of the following:
preg_match('~^(.*?)\s*(\d+)\D*$~s', $s, $matches);
This regex matches:
^ - the start of the string
(.*?) - Group 1 capturing any zero or more characters, as few as possible (as *? is a "lazy" quantifier) up to...
\s* - zero or more whitespace symbols
(\d+) - Group 2 capturing 1 or more digits
\D* - zero or more characters other than digit (it is the opposite shorthand character class to \d)
$ - end of string.
The s modifier (placed after the closing ~ delimiter) is the DOTALL flag: it makes . match any character, including a newline, which it does not match without this modifier.
Or
preg_split('~\s*(?=\s*\d+\D*$)~', $s, 2);
This \s*(?=\s*\d+\D*$) pattern:
\s* - zero or more whitespaces, but only if followed by...
(?=\s*\d+\D*$) - zero or more whitespaces followed with 1+ digits followed with 0+ characters other than digits followed with end of string.
The (?=...) construct is a positive lookahead that does not consume characters and just checks if the pattern inside matches and if yes, returns "true", and if not, no match occurs.
See IDEONE demo:
$s = "levis 5° 501";
preg_match('~^(.*?)\s*(\d+)\D*$~s', $s, $matches);
print_r($matches[1] . ": ". $matches[2]. PHP_EOL);
print_r(preg_split('~\s*(?=\s*\d+\D*$)~', $s, 2));
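The question also asks for the trailing number to be 2 or 3 digits long, which the patterns above do not enforce. One possible sketch of that restriction follows; the helper name splitTrailingNumber, the {2,3} quantifier and the lookbehind are assumptions added here, not part of the answer.

```php
<?php
// Hypothetical helper: the string must end with a number of exactly 2 or 3 digits.
function splitTrailingNumber(string $s): ?array
{
    // (?<!\d) stops a longer digit run from being cut down to its last 3 digits.
    if (!preg_match('~^(.*?)\s*(?<!\d)(\d{2,3})$~s', $s, $m)) {
        return null;
    }
    return ['text' => $m[1], 'num' => $m[2]];
}

print_r(splitTrailingNumber('levis 5° 501'));
var_dump(splitTrailingNumber('levis 1501')); // NULL, four digits at the end
```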

What would be Regex to match the following 10-digit numbers?

What would be Regex to match the following 10-digit numbers:
0108889999 //can contain nothing except 10 digits
011 8889999 //can contain a whitespace at that place
012 888 9999 //can contain two whitespaces like that
013-8889999 // can contain one dash
014-888-9999 // can contain two dashes
If you're just looking for the regex itself, try this:
^(\d{3}(\s|\-)?){2}\d{4}$
Put slightly more legibly:
^ # start at the beginning of the line (or input)
(
\d{3} # find three digits
(
\s # followed by a space
| # OR
\- # a hyphen
)? # neither of which might actually be there
){2} # do this twice,
\d{4} # then find four more digits
$ # finish at the end of the line (or input)
EDIT: Oops! The above was correct, but it was also too lenient. It would match things like 01088899996 (one too many characters) because it liked the first (or the last) 10 of them. Now it's more strict (I added the ^ and $).
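The effect of the anchors mentioned in the edit can be checked with a small sketch. The variable names are illustrative.

```php
<?php
$loose  = '/(\d{3}(\s|\-)?){2}\d{4}/';   // no anchors
$strict = '/^(\d{3}(\s|\-)?){2}\d{4}$/'; // anchored at both ends

$tooLong = '01088899996'; // 11 digits

var_dump(preg_match($loose, $tooLong));        // int(1), the first 10 digits match
var_dump(preg_match($strict, $tooLong));       // int(0)
var_dump(preg_match($strict, '014-888-9999')); // int(1)
```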
I'm assuming you want a single regex to match any of these examples:
if (preg_match('/(\d{3})[ \-]?(\d{3})[ \-]?(\d{4})/', $value, $matches)) {
$number = $matches[1] . $matches[2] . $matches[3];
}
preg_match('/\d{3}[\s-]?\d{3}[\s-]?\d{4}/', $string);
0108889999 // true
011 8889999 // true
012 888 9999 // true
013-8889999 // true
014-888-9999 // true
To match the specific parts:
preg_match('/(\d{3})[\s-]?(\d{3})[\s-]?(\d{4})/', $string, $matches);
echo $matches[1]; // first 3 numbers
echo $matches[2]; // next 3 numbers
echo $matches[3]; // next 4 numbers
You can try this pattern. It satisfies your requirements.
[0-9]{3}[-\s]?[0-9]{3}[-\s]?[0-9]{4}
Also, you can add more conditions to the last character by appending [\s.,]+: (phone# ending with space, dot or comma)
[0-9]{3}[-\s]?[0-9]{3}[-\s]?[0-9]{4}[\s.,]+
