How to extract an ID number from a string?


How do I retrieve the middle value using regex or preg_match?
$str = 'fxs_124024574287414=base_domain=.example.com; datr=KWHazxXEIkldzBaVq_of--syv5; csrftoken=szcwad; ds_user_id=219132; mid=XN4bpAAEAAHOyBRR4V17xfbaosyN; sessionid=14811313756%12fasda%3A27; rur=VLL;'
How do I only get values from ds_user_id using regex or preg_match?

Use preg_match to match ds_user_id=, then forget those matched characters with \K, then match one or more digits. No capture groups, no lookarounds, no parsing all the key-value pairs, no exploding.
Code:
$str = 'fxs_124024574287414=base_domain=.example.com; datr=KWHazxXEIkldzBaVq_of--syv5; csrftoken=szcwad; ds_user_id=219132; mid=XN4bpAAEAAHOyBRR4V17xfbaosyN; sessionid=14811313756%12fasda%3A27; rur=VLL;';
echo preg_match('~ds_user_id=\K\d+~', $str, $out) ? $out[0] : 'no match';
Output:
219132
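If the same lookup is needed for more than one key, the \K pattern above can be wrapped in a small helper. This is only a sketch and not part of the original answer: the function name extract_numeric_value and the leading \b word boundary are assumptions added for illustration.

```php
<?php
// Hypothetical helper (name chosen for illustration only).
// Returns the run of digits that follows "$key=", or null when there is none.
function extract_numeric_value(string $key, string $subject): ?string
{
    // preg_quote() escapes regex metacharacters in $key; '~' is the delimiter.
    // \b is an added assumption so that "ds_user_id" does not match "xds_user_id".
    $pattern = '~\b' . preg_quote($key, '~') . '=\K\d+~';
    return preg_match($pattern, $subject, $out) ? $out[0] : null;
}

$str = 'csrftoken=szcwad; ds_user_id=219132; mid=XN4bpAAEAAHOyBRR4V17xfbaosyN;';
var_dump(extract_numeric_value('ds_user_id', $str)); // string(6) "219132"
var_dump(extract_numeric_value('mid', $str));        // NULL, the value does not start with a digit
```

Returning null instead of the string 'no match' lets the caller tell a missing key apart from a real value.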

OK, nothing can beat mickmackusa's \K construct.
But for engines that don't support \K, this is the next best thing:
(\d(?<=ds_user_id=\d)\d*)(?=;)
Explained
( # (1 start), Consume many ID digits
\d # First digit of ID
(?<= ds_user_id= \d ) # Look behind, assert ID key exists before digit
\d* # Optionally, the rest of the digits
) # (1 end)
(?= ; ) # Look ahead, assert a semicolon exists
This one is a backtracking-verb solution (no \K), about 30% faster than the lookbehind version.
( # (1 start), Consume many ID digits
\d # First digit of ID
(?:
(?<! ds_user_id= \d ) # Look behind, if not ID,
\d* # get rest of digits
(*SKIP) # Fail, then start after this
(?!)
|
\d* # Rest of ID digits
)
) # (1 end)
(?= ; ) # Look ahead, assert a semicolon exists
Some benchmarks for comparison
Regex1: (\d(?:(?<!ds_user_id=\d)\d*(*SKIP)(?!)|\d*))(?=;)
Options: < none >
Completed iterations: 50 / 50 ( x 1000 )
Matches found per iteration: 1
Elapsed Time: 0.53 s, 534.47 ms, 534473 µs
Matches per sec: 93,550
Regex2: (\d(?<=ds_user_id=\d)\d*)(?=;)
Options: < none >
Completed iterations: 50 / 50 ( x 1000 )
Matches found per iteration: 1
Elapsed Time: 0.80 s, 796.97 ms, 796971 µs
Matches per sec: 62,737
Regex3: ds_user_id=\K\d+(?=;)
Options: < none >
Completed iterations: 50 / 50 ( x 1000 )
Matches found per iteration: 1
Elapsed Time: 0.21 s, 214.55 ms, 214549 µs
Matches per sec: 233,046
Regex4: ds_user_id=(\d+)(?=;)
Options: < none >
Completed iterations: 50 / 50 ( x 1000 )
Matches found per iteration: 1
Elapsed Time: 0.23 s, 231.23 ms, 231233 µs
Matches per sec: 216,232
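The figures above come from a dedicated regex benchmarking tool. A rough way to repeat the comparison in PHP itself is sketched below; the loop count, the labels, and the use of hrtime() are choices made for this sketch, and absolute timings will differ from machine to machine.

```php
<?php
$str = 'fxs_124024574287414=base_domain=.example.com; datr=KWHazxXEIkldzBaVq_of--syv5; csrftoken=szcwad; ds_user_id=219132; mid=XN4bpAAEAAHOyBRR4V17xfbaosyN; sessionid=14811313756%12fasda%3A27; rur=VLL;';

// Each entry: [pattern, index of the match element that holds the ID].
$patterns = [
    'skip verb'     => ['~(\d(?:(?<!ds_user_id=\d)\d*(*SKIP)(?!)|\d*))(?=;)~', 1],
    'lookbehind'    => ['~(\d(?<=ds_user_id=\d)\d*)(?=;)~', 1],
    '\K'            => ['~ds_user_id=\K\d+(?=;)~', 0],
    'capture group' => ['~ds_user_id=(\d+)(?=;)~', 1],
];

$ids = [];
foreach ($patterns as $label => [$regex, $group]) {
    $start = hrtime(true); // monotonic clock in nanoseconds (PHP 7.3+)
    for ($i = 0; $i < 10000; $i++) {
        preg_match($regex, $str, $m);
    }
    $ms = (hrtime(true) - $start) / 1e6;
    $ids[$label] = $m[$group];
    printf("%-14s %s  %.2f ms\n", $label, $m[$group], $ms);
}
```

All four patterns should report the same ID; only the elapsed time differs.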

If we wish to use explode:
$str = 'fxs_124024574287414=base_domain=.example.com; datr=KWHazxXEIkldzBaVq_of--syv5; csrftoken=szcwad; ds_user_id=219132; mid=XN4bpAAEAAHOyBRR4V17xfbaosyN; sessionid=14811313756%12fasda%3A27; rur=VLL;';
$arr = explode(';', $str);
foreach ($arr as $key => $value) {
    if (preg_match('/ds_user_id/s', $value)) {
        $ds_user_id = explode('=', $value);
        echo $ds_user_id[1];
    }
}
Output
219132
Here, we can also use two non-capturing groups with a capturing group:
(?:ds_user_id=)(.+?)(?:;)
where we have a left boundary:
(?:ds_user_id=)
and a right boundary:
(?:;)
and we collect our desired digits or anything else that we wish to have using:
(.+?)
If we wish to validate our ID number, we can use:
(?:ds_user_id=)([0-9]+?)(?:;)
and our desired value can then be read with var_dump($matches[0][1]); when preg_match_all is called with PREG_SET_ORDER.
Test
$re = '/(?:ds_user_id=)(.+?)(?:;)/m';
$str = 'fxs_124024574287414=base_domain=.example.com; datr=KWHazxXEIkldzBaVq_of--syv5; csrftoken=szcwad; ds_user_id=219132; mid=XN4bpAAEAAHOyBRR4V17xfbaosyN; sessionid=14811313756%12fasda%3A27; rur=VLL;';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
// Print the entire match result
var_dump($matches);
Output
array(1) {
[0]=>
array(2) {
[0]=>
string(18) "ds_user_id=219132;"
[1]=>
string(6) "219132"
}
}
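When only the first occurrence is needed, the same capture-group idea also works with preg_match. The sketch below uses a shortened sample string and a made-up invalid input ($bad) to show the difference between (.+?) and ([0-9]+); the group delimiters (?:...) are dropped because they do not change what is matched.

```php
<?php
$str = 'csrftoken=szcwad; ds_user_id=219132; mid=XN4bpAAEAAHOyBRR4V17xfbaosyN;';

// With preg_match() the captured value is at index 1, not [0][1].
if (preg_match('/ds_user_id=([0-9]+);/', $str, $m)) {
    echo $m[1], PHP_EOL; // 219132
}

// (.+?) accepts any characters, while ([0-9]+) rejects a non-numeric value.
$bad = 'ds_user_id=abc;'; // hypothetical malformed input
var_dump(preg_match('/ds_user_id=(.+?);/', $bad));    // int(1)
var_dump(preg_match('/ds_user_id=([0-9]+);/', $bad)); // int(0)
```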

Related

preg_replace how to remove all numbers except alphanumeric

How do I remove all numbers except those that are part of an alphanumeric sequence? For example, if I have a string like this:
Abs_1234abcd_636950806858590746.lands
I want it to become:
Abs_1234abcd_.lands

It can probably be done like this:
Find (?i)(?<![a-z\d])\d+(?![a-z\d])
Replace with nothing.
Explained:
Note that the class [a-z\d] inside the assertions includes \d.
Without it, part of the digit run in "abc901234def" could match.
(?i) # Case insensitive
(?<! [a-z\d] ) # Behind, not a letter nor digit
\d+ # Many digits
(?! [a-z\d] ) # Ahead, not a letter nor digit
Note - a speedier version exists (?i)\d(?<!\d[a-z\d])\d*(?![a-z\d])
Regex1: (?i)\d(?<!\d[a-z\d])\d*(?![a-z\d])
Completed iterations: 50 / 50 ( x 1000 )
Matches found per iteration: 2
Elapsed Time: 0.53 s, 530.56 ms, 530564 µs
Matches per sec: 188,478
Regex2: (?i)(?<![a-z\d])\d+(?![a-z\d])
Completed iterations: 50 / 50 ( x 1000 )
Matches found per iteration: 2
Elapsed Time: 0.91 s, 909.58 ms, 909577 µs
Matches per sec: 109,941
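In PHP, the find-and-replace above maps onto preg_replace. The sketch below applies the lookaround pattern to the question's string, and then runs a deliberately weakened variant (the \d removed from the class, for demonstration only) to show why the digit in the class matters.

```php
<?php
$str = 'Abs_1234abcd_636950806858590746.lands';

// The pattern from the answer, with the i modifier instead of inline (?i).
$strict = '~(?<![a-z\d])\d+(?![a-z\d])~i';
$result = preg_replace($strict, '', $str);
echo $result, PHP_EOL; // Abs_1234abcd_.lands

// Weakened variant, for demonstration only: the class no longer contains \d,
// so the engine can start and stop in the middle of a digit run.
$loose = '~(?<![a-z])\d+(?![a-z])~i';
$strictOut = preg_replace($strict, '', 'abc901234def');
$looseOut  = preg_replace($loose, '', 'abc901234def');
echo $strictOut, PHP_EOL; // abc901234def
echo $looseOut, PHP_EOL;  // abc94def
```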

In this specific example, we can simply use _ as a left boundary and . as the right boundary, collect our digits, and replace:
Test
$re = '/(.+[_])[0-9]+(\..+)/m';
$str = 'Abs_1234abcd_636950806858590746.lands';
$subst = '$1$2';
$result = preg_replace($re, $subst, $str);
echo $result;

For your example data, you could also match a character that is either a non-word character or an underscore using the character class [\W_], then forget what was matched using \K.
Next, match the 1+ digits that you want to replace with an empty string, and assert that what follows is again a non-word character or an underscore.
[\W_]\K\d+(?=[\W_])
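A short sketch of this pattern with preg_replace follows. The second input ('123_abc') is a made-up edge case: because [\W_] has to consume a real character, a number at the very start of the string is left alone by this pattern, whereas the lookaround version from the earlier answer removes it.

```php
<?php
$str = 'Abs_1234abcd_636950806858590746.lands';

$withK = '~[\W_]\K\d+(?=[\W_])~';
$result = preg_replace($withK, '', $str);
echo $result, PHP_EOL; // Abs_1234abcd_.lands

// Hypothetical edge case: digits at the start have no preceding character.
$edge = '123_abc';
$edgeWithK = preg_replace($withK, '', $edge);
$edgeLookaround = preg_replace('~(?<![a-z\d])\d+(?![a-z\d])~i', '', $edge);
echo $edgeWithK, PHP_EOL;      // 123_abc
echo $edgeLookaround, PHP_EOL; // _abc
```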

Get string with number of specified length

Supposing there's an array as the following:
$arr = array('foo1234bar', 'foo1234', '1234bar', 'foo12345bar');
and I just need the elements that contain a run of exactly 4 digits. So, except for foo12345bar, the other 3 elements are valid.
Because '\d{4}' would also match foo12345bar, I tried the following clumsy approach:
$arr = array('foo1234bar', 'foo1234', '1234bar', 'foo12345bar');
$result = array();
foreach ($arr as $value) {
    preg_match('/\d+/', $value, $match);
    if (strlen($match[0]) != 4) {
        continue;
    }
    $result[] = $value;
}
var_dump($result); //array('foo1234bar', 'foo1234', '1234bar')
Is there a regular expression that matches this directly (so the if condition can be omitted)? Thank you in advance.

This is easy to handle with a look-around regex and the preg_grep function:
$arr = array('foo1234bar', 'foo1234', '1234bar', 'foo12345bar');
print_r(preg_grep('/(?<!\d)\d{4}(?!\d)/', $arr));
RegEx Breakup:
(?<!\d) # assert previous char is not a digit
\d{4} # match exact 4 digits
(?!\d) # assert next char is not a digit
Output:
Array
(
[0] => foo1234bar
[1] => foo1234
[2] => 1234bar
)

Assuming the characters before and after the number are always alphabetical, you can use this regex:
^[a-zA-Z]*\d{4}[a-zA-Z]*$

Modify your regex as follows
/^\D*\d{4}\D*$/
Explanation
^ your string must start with
\D any non-digit char
* repeated from 0 to infinite times
\d{4} followed by any digit repeated EXACTLY 4 times
\D followed by any non-digit char
* repeated from 0 to infinite times
$ end of the string
Moreover you could modify your code as follows
$arr = array('foo1234bar', 'foo1234', '1234bar', 'foo12345bar');
$result = array_filter(
    $arr,
    function($element) {
        return preg_match('/^\D*\d{4}\D*$/', $element);
    }
);
var_dump($result);
Pay attention
As the OP didn't specify it, this regex will also match 1234 (a four-digit string with no non-digit characters before or after). If at least one character is required before and/or after, this regex must be changed.

the regexp will be \d{4}
preg_match('/\d{4}/', $value, $match);
expect will help

You might try the following: preg_match('/\D\d{4}\D/', $value, $match); it searches for:
a non-digit (\D)
4 digits (\d{4})
another non-digit (\D)

This regular expression will work on all your examples:
'/^\D*(\d{4})\D*$/'
^ start of string
\D* zero or more non-digits
(\d{4}) four digits (capture group 1)
\D* zero or more non-digits
$ end of string
It doesn't work if there are multiple number groups in the string ('123abc1234o').
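The limitation with multiple number groups can be seen by running the anchored pattern next to the look-around pattern from the earlier answer. This is a small comparison sketch; the helper name has_four_digit_run is made up for illustration.

```php
<?php
$anchored   = '/^\D*(\d{4})\D*$/';
$lookaround = '/(?<!\d)\d{4}(?!\d)/';

// Hypothetical helper: true when $pattern finds a match in $value.
function has_four_digit_run(string $pattern, string $value): bool
{
    return preg_match($pattern, $value) === 1;
}

foreach (['foo1234bar', 'foo12345bar', '123abc1234o'] as $value) {
    printf(
        "%-12s anchored=%s lookaround=%s\n",
        $value,
        var_export(has_four_digit_run($anchored, $value), true),
        var_export(has_four_digit_run($lookaround, $value), true)
    );
}
// foo1234bar   anchored=true lookaround=true
// foo12345bar  anchored=false lookaround=false
// 123abc1234o  anchored=false lookaround=true
```

Which behavior is wanted for '123abc1234o' depends on the requirement: the anchored form insists that the four digits are the only digits in the string, while the look-around form accepts any isolated run of exactly four.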

Regex validation for North American phone numbers

I am having trouble finding a pattern that would detect the following
909-999-9999
909 999 9999
(909) 999-9999
(909) 999 9999
999 999 9999
9999999999
\A[(]?[0-9]{3}[)]?[ ,-][0-9]{3}[ ,-][0-9]{3}\z
I tried it, but it doesn't work for all of the instances. I was thinking I could split the problem up by putting each character into an array and checking it, but then the code would be too long.

You have 4 digits in the last group, and you specify 3 in the regex.
You also need to apply a ? quantifier (1 or 0 occurrence) to the separators since they are optional.
Use
^[(]?[0-9]{3}[)]?[ ,-]?[0-9]{3}[ ,-]?[0-9]{4}$
PHP demo:
$re = "/\A[(]?[0-9]{3}[)]?[ ,-]?[0-9]{3}[ ,-]?[0-9]{4}\z/";
$strs = array("909-999-9999", "909 999 9999", "(909) 999-9999", "(909) 999 9999", "999 999 9999","9999999999");
$vals = preg_grep($re, $strs);
print_r($vals);
And another one:
$re = "/\A[(]?[0-9]{3}[)]?[ ,-]?[0-9]{3}[ ,-]?[0-9]{4}\z/";
$str = "909-999-9999";
if (preg_match($re, $str, $m)) {
    echo "MATCHED!";
}
BTW, optional ? subpatterns perform better than alternations.

Try this regex:
^(?:\(\d{3}\)|\d{3})[- ]?\d{3}[- ]?\d{4}$
Explaining:
^ # from start
(?: # one of
\(\d{3}\) # '(999)' sequence
| # OR
\d{3} # '999' sequence
) #
[- ]? # may exist space or hyphen
\d{3} # three digits
[- ]? # may exist space or hyphen
\d{4} # four digits
$ # end of string
Hope it helps.
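The two patterns suggested in the answers above accept the same well-formed numbers but differ on unbalanced parentheses. The sketch below compares them; the last two sample inputs are made up for the comparison and do not come from the question.

```php
<?php
// Pattern with optional parentheses (from the first answer).
$optional = '/\A[(]?[0-9]{3}[)]?[ ,-]?[0-9]{3}[ ,-]?[0-9]{4}\z/';
// Pattern with an alternation for the area code (from the second answer).
$alternation = '/^(?:\(\d{3}\)|\d{3})[- ]?\d{3}[- ]?\d{4}$/';

$samples = [
    '909-999-9999',
    '(909) 999 9999',
    '9999999999',
    '(909 999-9999', // hypothetical input: missing closing parenthesis
    '909-999-999',   // hypothetical input: last group too short
];

foreach ($samples as $s) {
    printf(
        "%-15s optional=%d alternation=%d\n",
        $s,
        preg_match($optional, $s),
        preg_match($alternation, $s)
    );
}
```

Because [(]? and [)]? are independent of each other, the first pattern accepts '(909 999-9999', while the alternation requires both parentheses or neither.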

php regular expression minimum and maximum length doesn't work as expected

I want to create a regular expression in PHP that allows the user to enter a phone number in any of the formats below.
345-234 898
345 234-898
235-123-456
548 812 346
The minimum length of number should be 7 and maximum length should be 12.
The problem is that the regular expression ignores the minimum and maximum length. I don't know what is wrong with it. Please help me solve it. Here is the regular expression:
if (preg_match("/^([0-9]+((\s?|-?)[0-9]+)*){7,12}$/", $string)) {
    echo "ok";
} else {
    echo "not ok";
}
Thanks for reading my question. I will wait for responses.

You should use the start (^) and end ($) anchors in your pattern:
$subject = "123456789";
$pattern = '/^[0-9]{7,9}$/i';
if (preg_match($pattern, $subject)) {
    echo 'matched';
} else {
    echo 'not matched';
}

You can use preg_replace to strip out non-digit symbols and check length of resulting string.
$onlyDigits = preg_replace('/\\D/', '', $string);
$length = strlen($onlyDigits);
if ($length < 7 OR $length > 12)
    echo "not ok";
else
    echo "ok";

Simply do this:
if (preg_match("/^\d{3}[ -]\d{3}[ -]\d{3}$/", $string)) {
Here \d means any digit from 0-9, and [ -] means either a space or a hyphen.

You can check the length with a lookahead assertion (?=...) at the beginning of the pattern:
/^(?=.{7,12}$)[0-9]+(?:[\s-]?[0-9]+)*$/
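A quick way to see what the lookahead does is to run the pattern against a few inputs. In this sketch the function name is_acceptable_number and the last three sample strings are made up for illustration. Note that .{7,12} counts every character, separators included.

```php
<?php
// Hypothetical wrapper around the pattern from the answer above.
function is_acceptable_number(string $input): bool
{
    // (?=.{7,12}$) checks the total length first, then the rest of the
    // pattern checks the digits-and-separators structure.
    return preg_match('/^(?=.{7,12}$)[0-9]+(?:[\s-]?[0-9]+)*$/', $input) === 1;
}

var_dump(is_acceptable_number('345-234 898'));   // bool(true), 11 characters
var_dump(is_acceptable_number('1234567'));       // bool(true), 7 characters
var_dump(is_acceptable_number('123456'));        // bool(false), too short
var_dump(is_acceptable_number('1234567890123')); // bool(false), too long
var_dump(is_acceptable_number('123--4567'));     // bool(false), two separators in a row
```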

Breaking down your original regex, it can read like the following:
^ # start of input
(
[0-9]+ # any number, 1 or more times
(
(\s?|-?) # a space, or a dash.. maybe
[0-9]+ # any number, 1 or more times
)* # repeat group 0 or more times
)
{7,12} # repeat full group 7 to 12 times
$ # end of input
So, basically, you're allowing "any number, 1 or more times" followed by a group of "any number 1 or more times, 0 or more times" repeat "7 to 12 times" - which kind of kills your length check.
You could take a more restricted approach and write out each individual number block:
(
\d{3} # any 3 numbers
(?:[ ]+|-)? # any (optional) spaces or a hyphen
\d{3} # any 3 numbers
(?:[ ]+|-)? # any (optional) spaces or a hyphen
\d{3} # any 3 numbers
)
Simplified:
if (preg_match('/^(\d{3}(?:[ ]+|-)?\d{3}(?:[ ]+|-)?\d{3})$/', $string)) {
If you want to restrict the separators to be only a single space or a hyphen, you can update the regex to use [ -] instead of (?:[ ]+|-); if you want this to be "optional" (i.e. there can be no separator between number groups), add in a ? to the end of each.
if (preg_match('/^(\d{3}[ -]\d{3}[ -]\d{3})$/', $string)) {

This may help you out:
Validator::extend('price', function ($attribute, $value, $args) {
    return preg_match('/^\d{0,8}(\.\d{1,2})?$/', $value);
});

How to work around PHP lookbehind fixed width limitation?

I ran into a problem when trying to match all numbers found between specific words on my page. How would you match all the numbers in the following text, but only between the word "begin" and "end"?
11
a
b
13
begin
t
899
y
50
f
end
91
h
This works:
preg_match("/begin(.*?)end/s", $text, $out);
preg_match_all("/[0-9]{1,}/", $out[1], $result);
But can it be done in one expression?
I tried this, but it doesn't do the trick:
preg_match_all("/begin.*([0-9]{1,}).*end/s", $text, $out);

You can make use of the \G anchor like this, and some lookaheads to make sure that you're not going 'out of territory' (out of the area between the two words):
(?:begin|(?!^)\G)(?:(?=(?:(?!begin).)*end)\D)*?(\d+)
(?: # Begin of first non-capture group
begin # Match 'begin'
| # Or
(?!^)\G # Start the match from the previous end of match
) # End of first non-capture group
(?: # Second non-capture group
(?= # Positive lookahead
(?:(?!begin).)* # Negative lookahead to prevent running into another 'begin'
end # And make sure that there's an 'end' ahead
) # End positive lookahead
\D # Match non-digits
)*? # Second non-capture group repeated many times, lazily
(\d+) # Capture digits
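In PHP, this pattern can be used with preg_match_all as sketched below. The s modifier is an assumption added here so that the dot inside the lookahead can cross the line breaks in the question's text; the delimiter and variable names are likewise choices made for this sketch.

```php
<?php
$text = "11\na\nb\n13\nbegin\nt\n899\ny\n50\nf\nend\n91\nh";

// \G continues from the end of the previous match; (?!^) stops it from
// matching at the very start of the subject.
$pattern = '~(?:begin|(?!^)\G)(?:(?=(?:(?!begin).)*end)\D)*?(\d+)~s';

preg_match_all($pattern, $text, $matches);
$numbers = $matches[1]; // contents of the capture group for each match
print_r($numbers);
// Array
// (
//     [0] => 899
//     [1] => 50
// )
```

The numbers 11 and 13 (before "begin") and 91 (after "end") are not captured, because a match can only start at "begin" or continue from a previous match.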

Ideal solution
What is really needed here is a positive lookbehind with variable width. The regex would end up like this:
~(?<=begin.*)\d+(?=.*end)~s
However, as of this writing, the PHP regex flavor doesn't support this feature; only fixed-width lookbehind is supported (the .NET flavor does support it).
Workaround
To achieve our goal, we can use preg_replace_callback with the following regex:
~(?<token>begin|end)|(?<number>\d+)|.*?~s
Sample code
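function extract_number($input) {
    $in_region = false; // true while scanning between "begin" and "end"
    // A closure avoids redeclaring a named function on repeated calls
    $callback = function ($match) use (&$in_region) {
        // Groups that did not participate may be missing from $match
        $token = $match['token'] ?? '';
        $number = $match['number'] ?? '';
        if ($token === 'begin') {
            $in_region = true;
        } elseif ($token === 'end') {
            $in_region = false;
        }
        return ($in_region && $number !== '') ? $number . ',' : '';
    };
    $ret = preg_replace_callback('~(?<token>begin|end)|(?<number>\d+)|.*?~s', $callback, $input);
    // 'strlen' drops only the empty trailing element and keeps a literal "0"
    return array_values(array_filter(explode(',', $ret), 'strlen'));
}
echo '<pre>';
var_dump(extract_number($str)); // $str holds the OP's example text
echo '</pre>';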
Output (with OP's example)
array(2) {
[0]=>
string(3) "899"
[1]=>
string(2) "50"
}

Assuming your project data only has one begin and end "marker" in the text, you can build a more direct and efficient pattern...
Code:
$text = "11
a
b
13
begin
t
899
y
50
f
end
91
h";
var_export(preg_match_all('~(?:begin|\G(?!^))(?:(?!end)\D)+\K\d+~s', $text, $out) ? $out[0] : 'no matches');
Output:
array (
0 => '899',
1 => '50',
)
Layman's Breakdown:
(?:begin|\G(?!^)) #match "begin" or continue matching from the position immediately after previous match
(?:(?!end)\D)+ #match one or more occurrences of any non-digit character while screening for "end". If "end" is found, the match attempt fails at that point.
\K #restart the fullstring match from this position; this avoids the expense of using a capture group on the desired digits
\d+ #match one or more digits (as much as possible)
