preg_replace how to remove all numbers except alphanumeric - php

How do I remove all numbers except those that are part of alphanumeric text? For example, if I have a string like this:
Abs_1234abcd_636950806858590746.lands
so that it becomes:
Abs_1234abcd_.lands

It is probably done like this
Find (?i)(?<![a-z\d])\d+(?![a-z\d])
Replace with nothing.
Explained:
Note that the class [a-z\d] inside both assertions includes a digit.
Without it, the engine could backtrack and match part of the digits in "abc901234def".
(?i) # Case insensitive
(?<! [a-z\d] ) # Behind, not a letter nor digit
\d+ # Many digits
(?! [a-z\d] ) # Ahead, not a letter nor digit
Note - a speedier version exists (?i)\d(?<!\d[a-z\d])\d*(?![a-z\d])
Regex1: (?i)\d(?<!\d[a-z\d])\d*(?![a-z\d])
Completed iterations: 50 / 50 ( x 1000 )
Matches found per iteration: 2
Elapsed Time: 0.53 s, 530.56 ms, 530564 µs
Matches per sec: 188,478
Regex2: (?i)(?<![a-z\d])\d+(?![a-z\d])
Completed iterations: 50 / 50 ( x 1000 )
Matches found per iteration: 2
Elapsed Time: 0.91 s, 909.58 ms, 909577 µs
Matches per sec: 109,941
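As a rough PHP sketch of the first pattern (the variable names are only illustrative), the calls below apply it to the string from the question and also show what happens when \d is left out of the lookaround classes:

```php
<?php
// Pattern from the answer: a run of digits with no letter or digit on either side.
$strict = '/(?<![a-z\d])\d+(?![a-z\d])/i';
// Same pattern without \d inside the lookarounds, for comparison only.
$loose  = '/(?<![a-z])\d+(?![a-z])/i';

$input = 'Abs_1234abcd_636950806858590746.lands';
$clean = preg_replace($strict, '', $input);
echo $clean, PHP_EOL; // Abs_1234abcd_.lands

// Digits embedded in a word are left alone by the strict pattern...
$kept = preg_replace($strict, '', 'abc901234def');
// ...but the loose one backtracks and removes the inner "0123".
$damaged = preg_replace($loose, '', 'abc901234def');
echo $kept, ' / ', $damaged, PHP_EOL; // abc901234def / abc94def
```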

In this specific example, we can simply use _ as a left boundary and . as the right boundary, collect our digits, and replace:
Test
$re = '/(.+[_])[0-9]+(\..+)/m';
$str = 'Abs_1234abcd_636950806858590746.lands';
$subst = '$1$2';
$result = preg_replace($re, $subst, $str);
echo $result;

For your example data, you could also match not a word character or an underscore [\W_] using a character class. Then forget what is matched using \K.
Match 1+ digits that you want to replace with an empty string, and assert that what follows on the right is again a non-word character or an underscore.
[\W_]\K\d+(?=[\W_])
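A minimal sketch of how this pattern might be used with preg_replace (the variable names are assumptions for illustration). Because a delimiter is required on both sides, digits at the very start or end of the string are left untouched:

```php
<?php
// [\W_] must precede the digits, \K drops it from the match,
// and the lookahead requires another [\W_] right after the digits.
$re = '/[\W_]\K\d+(?=[\W_])/';

$result = preg_replace($re, '', 'Abs_1234abcd_636950806858590746.lands');
echo $result, PHP_EOL; // Abs_1234abcd_.lands

// No delimiter before "123" and none after "456", so nothing is removed.
$edge = preg_replace($re, '', '123_456');
echo $edge, PHP_EOL; // 123_456
```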


Regular expression for highlighting numbers between words

Site users enter numbers in different ways, example:
from 8 000 packs
432534534
from 344454 packs
45054 packs
04 555
434654
54 564 packs
I am looking for a regular expression that captures the words before the digits (if there are any), the digits in any format, and the words after them (if there are any). Spaces should preferably be excluded.
This is what I have so far, but it does not work correctly.
(^[0-9|a-zA-Z].*?)\s([0-9].*?)\s([a-zA-Z]*$)
The main purpose of this is to put the strings in order, bring them to the same form, format them in PHP digit format, etc.
As a result, I need to get the text before the digits, the digits themselves and the text after them into the variables separately.
$before = 'from';
$num = '8000';
$after = 'packs';
Thank you for any help with this.
I think you may try this:
^(\D+)?([\d \t]+)(\D+)?$
group 1: optional(?) group that will contain anything but digit
group 2: mandatory group that will contain only digits and
white space character like space and tab
group 3: optional(?) group that will contain anything but digit
Demo
Source (run)
$re = '/^(\D+)?([\d \t]+)(\D+)?$/m';
$str = 'from 8 000 packs
432534534
from 344454 packs
45054 packs
04 555
434654
54 564 packs
';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
foreach ($matches as $matchgroup)
{
echo "before: ".$matchgroup[1]."\n";
echo "number:".preg_replace('/\D/m','',$matchgroup[2])."\n";
echo "after:".($matchgroup[3] ?? ''); // group 3 is absent from the array when nothing follows the digits
echo "\n\n\n";
}
I corrected your regex and added groups, the regex looks like this:
^(?<before>[a-zA-Z]+)?\s?(?<number>[0-9].*?)\s?(?<after>[a-zA-Z]+)?$
Test regex here: https://regex101.com/r/QLEC9g/2
By using groups you can easily separate the words and numbers, and handle them any way you want.
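As a sketch of how the named groups could be read in PHP (splitLine is a hypothetical helper, not part of the answer), the function below returns the three parts and strips the spaces inside the number:

```php
<?php
// Hypothetical helper: splits one line into before / number / after.
function splitLine(string $line): ?array
{
    $re = '/^(?<before>[a-zA-Z]+)?\s?(?<number>[0-9].*?)\s?(?<after>[a-zA-Z]+)?$/';
    if (!preg_match($re, $line, $m)) {
        return null; // no digits where the pattern expects them
    }
    return [
        'before' => $m['before'] ?? '',
        // Remove the spaces used as thousands separators.
        'number' => preg_replace('/\s+/', '', $m['number']),
        // A trailing group that did not take part in the match is absent from $m.
        'after'  => $m['after'] ?? '',
    ];
}

print_r(splitLine('from 8 000 packs'));
```

The null coalescing operator is used because preg_match leaves unmatched trailing groups out of the result array.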
Your pattern does not match every line because it has four required parts, each of which needs at least one character to be present:
(^[0-9|a-zA-Z].*?)\s([0-9].*?)\s([a-zA-Z]*$)
  ^^^^^^^^^^^^    ^^ ^^^^^    ^^
The other thing to note is that the first character class [0-9|a-zA-Z] can also match digits (you can omit the | as it would match a literal pipe char)
If you would allow all other chars than digits on the left and right, and there should be at least a single digit present, you can use a negated character class [^\d\r\n]* optionally matching any character except a digit or a newline:
^([^\d\r\n]*)\h*(\d+(?:\h+\d+)*)\h*([^\d\r\n]*)$
^ Start of string
([^\d\r\n]*) Capture group 1, match any char except a digit or a newline
\h* Match optional horizontal whitespace chars
(\d+(?:\h+\d+)*) Capture group 2, match 1+ digits and optionally repeat matching spaces and 1+ digits
\h* Match optional horizontal whitespace chars
([^\d\r\n]*) Capture group 3, match any char except a digit or a newline
$ End of string
See a regex demo and a PHP demo.
For example
$re = '/^([^\d\r\n]*)\h*(\d+(?:\h+\d+)*)\h*([^\d\r\n]*)$/m';
$str = 'from 8 000 packs
test from 8 000 packs test
432534534
from 344454 packs
45054 packs
04 555
434654
54 564 packs';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
foreach($matches as $match) {
list(,$before, $num, $after) = $match;
echo sprintf(
"before: %s\nnum:%s\nafter:%s\n--------------------\n",
$before, preg_replace("/\h+/", "", $num), $after
);
}
Output
before: from
num:8000
after:packs
--------------------
before: test from
num:8000
after:packs test
--------------------
before:
num:432534534
after:
--------------------
before: from
num:344454
after:packs
--------------------
before:
num:45054
after:packs
--------------------
before:
num:04555
after:
--------------------
before:
num:434654
after:
--------------------
before:
num:54564
after:packs
--------------------
If there should be at least a single digit present, and the only allowed characters are a-z for the word(s), you can use a case insensitive pattern:
(?i)^((?:[a-z]+(?:\h+[a-z]+)*)?)\h*(\d+(?:\h+\d+)*)\h*((?:[a-z]+(?:\h+[a-z]+)*)?)?$
See another regex demo and a php demo.

How to extract an ID number from a string?

How do I retrieve the middle value using regex or preg_match?
$str = 'fxs_124024574287414=base_domain=.example.com; datr=KWHazxXEIkldzBaVq_of--syv5; csrftoken=szcwad; ds_user_id=219132; mid=XN4bpAAEAAHOyBRR4V17xfbaosyN; sessionid=14811313756%12fasda%3A27; rur=VLL;';
How do I get only the value of ds_user_id using regex or preg_match?
Use preg_match to match ds_user_id=, then forget those matched characters with \K, then match one or more digits. No capture groups, no lookarounds, no parsing all the key-value pairs, no exploding.
Code: (Demo)
$str = 'fxs_124024574287414=base_domain=.example.com; datr=KWHazxXEIkldzBaVq_of--syv5; csrftoken=szcwad; ds_user_id=219132; mid=XN4bpAAEAAHOyBRR4V17xfbaosyN; sessionid=14811313756%12fasda%3A27; rur=VLL;';
echo preg_match('~ds_user_id=\K\d+~', $str, $out) ? $out[0] : 'no match';
Output:
219132
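The same \K idea can be wrapped in a small function that looks up any key. This is only a sketch: cookieValue is a hypothetical name, and the (?:^|;\s*) prefix is an added assumption so that a key such as user_id cannot match inside ds_user_id.

```php
<?php
// Hypothetical helper: returns the value stored under $name, or null.
function cookieValue(string $cookies, string $name): ?string
{
    // (?:^|;\s*) pins the key to the start of a pair;
    // \K then discards everything matched up to and including "=".
    $re = '~(?:^|;\s*)' . preg_quote($name, '~') . '=\K[^;]*~';
    return preg_match($re, $cookies, $m) ? $m[0] : null;
}

$str = 'fxs_124024574287414=base_domain=.example.com; datr=KWHazxXEIkldzBaVq_of--syv5; csrftoken=szcwad; ds_user_id=219132; mid=XN4bpAAEAAHOyBRR4V17xfbaosyN; sessionid=14811313756%12fasda%3A27; rur=VLL;';

echo cookieValue($str, 'ds_user_id'), PHP_EOL; // 219132
```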
Ok, nothing can beat the mickmackusa \K construct.
But, for the \K impaired engines, this is the next best thing
(\d(?<=ds_user_id=\d)\d*)(?=;)
Explained
( # (1 start), Consume many ID digits
\d # First digit of ID
(?<= ds_user_id= \d ) # Look behind, assert ID key exists before digit
\d* # Optional the rest of the digits
) # (1 end)
(?= ; ) # Look ahead, assert a semicolon exists
This one is a verb solution (no \K), about 30% faster.
( # (1 start), Consume many ID digits
\d # First digit of ID
(?:
(?<! ds_user_id= \d ) # Look behind, if not ID,
\d* # get rest of digits
(*SKIP) # Fail, then start after this
(?!)
|
\d* # Rest of ID digits
)
) # (1 end)
(?= ; ) # Look ahead, assert a semicolon exists
Some benchmarks for comparison
Regex1: (\d(?:(?<!ds_user_id=\d)\d*(*SKIP)(?!)|\d*))(?=;)
Options: < none >
Completed iterations: 50 / 50 ( x 1000 )
Matches found per iteration: 1
Elapsed Time: 0.53 s, 534.47 ms, 534473 µs
Matches per sec: 93,550
Regex2: (\d(?<=ds_user_id=\d)\d*)(?=;)
Options: < none >
Completed iterations: 50 / 50 ( x 1000 )
Matches found per iteration: 1
Elapsed Time: 0.80 s, 796.97 ms, 796971 µs
Matches per sec: 62,737
Regex3: ds_user_id=\K\d+(?=;)
Options: < none >
Completed iterations: 50 / 50 ( x 1000 )
Matches found per iteration: 1
Elapsed Time: 0.21 s, 214.55 ms, 214549 µs
Matches per sec: 233,046
Regex4: ds_user_id=(\d+)(?=;)
Options: < none >
Completed iterations: 50 / 50 ( x 1000 )
Matches found per iteration: 1
Elapsed Time: 0.23 s, 231.23 ms, 231233 µs
Matches per sec: 216,232
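To repeat a comparison like this in PHP itself, a rough timing loop along the following lines could be used. It is not the tool that produced the figures above; the labels and the iteration count are arbitrary, and the timings will differ from machine to machine.

```php
<?php
$str = 'fxs_124024574287414=base_domain=.example.com; datr=KWHazxXEIkldzBaVq_of--syv5; csrftoken=szcwad; ds_user_id=219132; mid=XN4bpAAEAAHOyBRR4V17xfbaosyN; sessionid=14811313756%12fasda%3A27; rur=VLL;';

// The four patterns from the benchmark, in the same order.
$patterns = [
    'verb (*SKIP)'  => '/(\d(?:(?<!ds_user_id=\d)\d*(*SKIP)(?!)|\d*))(?=;)/',
    'lookbehind'    => '/(\d(?<=ds_user_id=\d)\d*)(?=;)/',
    '\K'            => '/ds_user_id=\K\d+(?=;)/',
    'capture group' => '/ds_user_id=(\d+)(?=;)/',
];

$ids = [];
foreach ($patterns as $label => $re) {
    $start = hrtime(true); // nanoseconds from a monotonic clock
    for ($i = 0; $i < 50000; $i++) {
        preg_match($re, $str, $m);
    }
    $ms = (hrtime(true) - $start) / 1e6;
    // The ID is the last element of $m: group 1 if the pattern has one,
    // otherwise the whole match.
    $ids[$label] = end($m);
    printf("%-14s %s  %.2f ms\n", $label, $ids[$label], $ms);
}
```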
If we wish to use explode:
$str = 'fxs_124024574287414=base_domain=.example.com; datr=KWHazxXEIkldzBaVq_of--syv5; csrftoken=szcwad; ds_user_id=219132; mid=XN4bpAAEAAHOyBRR4V17xfbaosyN; sessionid=14811313756%12fasda%3A27; rur=VLL;';
$arr = explode(';', $str);
foreach ($arr as $key => $value) {
if (preg_match('/ds_user_id/s', $value)) {
$ds_user_id = explode('=', $value);
echo $ds_user_id[1];
}
}
Output
219132
Here, we can also use two non-capturing groups with a capturing group:
(?:ds_user_id=)(.+?)(?:;)
where we have a left boundary:
(?:ds_user_id=)
and a right boundary:
(?:;)
and we collect our desired digits or anything else that we wish to have using:
(.+?)
If we wish to validate our ID number, we can use:
(?:ds_user_id=)([0-9]+?)(?:;)
DEMO
and our desired value can be simply called using var_dump($matches[0][1]);.
Test
$re = '/(?:ds_user_id=)(.+?)(?:;)/m';
$str = 'fxs_124024574287414=base_domain=.example.com; datr=KWHazxXEIkldzBaVq_of--syv5; csrftoken=szcwad; ds_user_id=219132; mid=XN4bpAAEAAHOyBRR4V17xfbaosyN; sessionid=14811313756%12fasda%3A27; rur=VLL;';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
// Print the entire match result
var_dump($matches);
Output
array(1) {
[0]=>
array(2) {
[0]=>
string(18) "ds_user_id=219132;"
[1]=>
string(6) "219132"
}
}
DEMO

split string in numbers and text but accept text with a single digit inside

Let's say I want to split this string in two variables:
$string = "levis 501";
I will use
preg_match('/\d+/', $string, $num);
preg_match('/\D+/', $string, $text);
but then let's say I want to split this one in two
$string = "levis 5° 501";
as $text = "levis 5°"; and $num = "501";
So my guess is I should add a rule to the preg_match('/\d+/', $string, $num); that looks for numbers only at the END of the string and I want it to be between 2 and 3 digits.
But also the $text match now has one number inside...
How would you do it?
To split a string in two parts, use any of the following:
preg_match('~^(.*?)\s*(\d+)\D*$~s', $s, $matches);
This regex matches:
^ - the start of the string
(.*?) - Group 1 capturing zero or more characters, as few as possible (as *? is a "lazy" quantifier) up to...
\s* - zero or more whitespace symbols
(\d+) - Group 2 capturing 1 or more digits
\D* - zero or more characters other than digit (it is the opposite shorthand character class to \d)
$ - end of string.
The s modifier is the DOTALL flag: it makes . match any character, including a newline, which it does not match without this modifier.
Or, passing a limit of 2 so that the string is cut only once:
preg_split('~\s*(?=\s*\d+\D*$)~', $s, 2);
This \s*(?=\s*\d+\D*$) pattern:
\s* - zero or more whitespaces, but only if followed by...
(?=\s*\d+\D*$) - zero or more whitespaces followed with 1+ digits followed with 0+ characters other than digits followed with end of string.
The (?=...) construct is a positive lookahead that does not consume characters and just checks if the pattern inside matches and if yes, returns "true", and if not, no match occurs.
See IDEONE demo:
$s = "levis 5° 501";
preg_match('~^(.*?)\s*(\d+)\D*$~s', $s, $matches);
print_r($matches[1] . ": ". $matches[2]. PHP_EOL);
print_r(preg_split('~\s*(?=\s*\d+\D*$)~', $s, 2));
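The question also asks for the trailing number to be 2 to 3 digits long. One way to sketch that, assuming the number must be the very last thing in the string, is to swap \d+ for \d{2,3} and add a lookbehind. splitTrailingNumber is a hypothetical helper name.

```php
<?php
// Hypothetical helper: returns [text, number], or null when the string
// does not end in a standalone run of 2 or 3 digits.
function splitTrailingNumber(string $s): ?array
{
    // (?<!\d) stops the engine from cutting a longer number such as "12501" in two.
    if (!preg_match('~^(.*?)\s*(?<!\d)(\d{2,3})$~s', $s, $m)) {
        return null;
    }
    return [$m[1], $m[2]];
}

[$text, $num] = splitTrailingNumber('levis 5° 501');
echo $text, ' | ', $num, PHP_EOL; // levis 5° | 501
```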

Regex validation for North American phone numbers

I am having trouble finding a pattern that would detect the following
909-999-9999
909 999 9999
(909) 999-9999
(909) 999 9999
999 999 9999
9999999999
\A[(]?[0-9]{3}[)]?[ ,-][0-9]{3}[ ,-][0-9]{3}\z
I tried it, but it doesn't work for all the instances. I was thinking I could divide the problem by putting each character into an array and then checking it, but then the code would be too long.
You have 4 digits in the last group, and you specify 3 in the regex.
You also need to apply a ? quantifier (1 or 0 occurrence) to the separators since they are optional.
Use
^[(]?[0-9]{3}[)]?[ ,-]?[0-9]{3}[ ,-]?[0-9]{4}$
See the demo here
PHP demo:
$re = "/\A[(]?[0-9]{3}[)]?[ ,-]?[0-9]{3}[ ,-]?[0-9]{4}\z/";
$strs = array("909-999-9999", "909 999 9999", "(909) 999-9999", "(909) 999 9999", "999 999 9999","9999999999");
$vals = preg_grep($re, $strs);
print_r($vals);
And another one:
$re = "/\A[(]?[0-9]{3}[)]?[ ,-]?[0-9]{3}[ ,-]?[0-9]{4}\z/";
$str = "909-999-9999";
if (preg_match($re, $str, $m)) {
echo "MATCHED!";
}
BTW, optional ? subpatterns perform better than alternations.
Try this regex:
^(?:\(\d{3}\)|\d{3})[- ]?\d{3}[- ]?\d{4}$
Explaining:
^ # from start
(?: # one of
\(\d{3}\) # '(999)' sequence
| # OR
\d{3} # '999' sequence
) #
[- ]? # may exist space or hyphen
\d{3} # three digits
[- ]? # may exist space or hyphen
\d{4} # four digits
$ # end of string
Hope it helps.
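A short PHP sketch comparing the two patterns (the variable names are illustrative). The alternation version requires the parentheses to be balanced, while the version with two independent optional parentheses also accepts an unbalanced one:

```php
<?php
// Alternation pattern from the answer above, anchored with \A and \z.
$strict = '/\A(?:\(\d{3}\)|\d{3})[- ]?\d{3}[- ]?\d{4}\z/';
// Pattern with independently optional parentheses, for comparison.
$optionalParens = '/\A[(]?[0-9]{3}[)]?[ ,-]?[0-9]{3}[ ,-]?[0-9]{4}\z/';

$samples = ['909-999-9999', '909 999 9999', '(909) 999-9999',
            '(909) 999 9999', '999 999 9999', '9999999999'];

// preg_grep keeps only the entries that match the pattern.
$accepted = preg_grep($strict, $samples);
echo count($accepted), PHP_EOL; // 6

// An opening parenthesis without its closing partner.
$unbalanced = '(909 999-9999';
$strictResult = preg_match($strict, $unbalanced);         // 0
$looseResult  = preg_match($optionalParens, $unbalanced); // 1
```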

What would be Regex to match the following 10-digit numbers?

What would be Regex to match the following 10-digit numbers:
0108889999 //can contain nothing except 10 digits
011 8889999 //can contain a whitespace at that place
012 888 9999 //can contain two whitespaces like that
013-8889999 // can contain one dash
014-888-9999 // can contain two dashes
If you're just looking for the regex itself, try this:
^(\d{3}(\s|\-)?){2}\d{4}$
Put slightly more legibly:
^ # start at the beginning of the line (or input)
(
\d{3} # find three digits
(
\s # followed by a space
| # OR
\- # a hyphen
)? # either of which may be absent
){2} # do this twice,
\d{4} # then find four more digits
$ # finish at the end of the line (or input)
EDIT: Oops! The above was correct, but it was also too lenient. It would match things like 01088899996 (one too many characters) because it liked the first (or the last) 10 of them. Now it's more strict (I added the ^ and $).
I'm assuming you want a single regex to match any of these examples:
if (preg_match('/(\d{3})[ \-]?(\d{3})[ \-]?(\d{4})/', $value, $matches)) {
$number = $matches[1] . $matches[2] . $matches[3];
}
preg_match('/\d{3}[\s-]?\d{3}[\s-]?\d{4}/', $string);
0108889999 // true
011 8889999 // true
012 888 9999 // true
013-8889999 // true
014-888-9999 // true
To match the specific parts:
preg_match('/(\d{3})[\s-]?(\d{3})[\s-]?(\d{4})/', $string, $matches);
echo $matches[1]; // first 3 numbers
echo $matches[2]; // next 3 numbers
echo $matches[3]; // next 4 numbers
You can try this pattern. It satisfies your requirements.
[0-9]{3}[-\s]?[0-9]{3}[-\s]?[0-9]{4}
Also, you can add more conditions to the last character by appending [\s.,]+: (phone# ending with space, dot or comma)
[0-9]{3}[-\s]?[0-9]{3}[-\s]?[0-9]{4}[\s.,]+
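Several of the patterns above are unanchored, so they also accept longer strings that merely contain ten digits. A minimal sketch that combines the anchors with the capture groups might look like this (normalizeTenDigits is a hypothetical helper name):

```php
<?php
// Hypothetical helper: returns the ten digits without separators, or null.
function normalizeTenDigits(string $value): ?string
{
    // ^ and $ with the D modifier force the whole string to match.
    if (!preg_match('/^(\d{3})[\s-]?(\d{3})[\s-]?(\d{4})$/D', $value, $m)) {
        return null;
    }
    return $m[1] . $m[2] . $m[3];
}

echo normalizeTenDigits('014-888-9999'), PHP_EOL; // 0148889999

// Eleven digits: rejected by the anchored helper,
// accepted by the unanchored pattern.
$tooLong = '01088899996';
$unanchored = preg_match('/\d{3}[\s-]?\d{3}[\s-]?\d{4}/', $tooLong); // 1
```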
