Suppose there's an array like the following:
$arr = array('foo1234bar', 'foo1234', '1234bar', 'foo12345bar');
and I only need the elements that contain exactly 4 consecutive digits. So, except for foo12345bar, the other 3 elements are valid.
Because '\d{4}' would also match foo12345bar, I tried the following clumsy approach:
$arr = array('foo1234bar', 'foo1234', '1234bar', 'foo12345bar');
$result = array();
foreach ($arr as $value) {
preg_match('/\d+/', $value, $match);
if (strlen($match[0]) != 4) {
continue;
}
$result[] = $value;
}
var_dump($result); //array('foo1234bar', 'foo1234', '1234bar')
Is there a regular expression that matches directly (so the if condition can be omitted)? Thank you in advance.
This is easy to handle with a lookaround regex and the preg_grep function:
$arr = array('foo1234bar', 'foo1234', '1234bar', 'foo12345bar');
print_r(preg_grep('/(?<!\d)\d{4}(?!\d)/', $arr));
Regex breakdown:
(?<!\d) # assert previous char is not a digit
\d{4} # match exactly 4 digits
(?!\d) # assert next char is not a digit
Output:
Array
(
[0] => foo1234bar
[1] => foo1234
[2] => 1234bar
)
Assuming the characters in front of and after the numbers will always be alphabetical, you can use this regex:
^[a-zA-Z]*\d{4}[a-zA-Z]*$
Modify your regex as follows
/^\D*\d{4}\D*$/
Explanation
^ your string must start with
\D any non-digit char
* repeated from 0 to infinite times
\d{4} followed by any digit repeated EXACTLY 4 times
\D followed by any non-digit char
* repeated from 0 to infinite times
$ end of the string
Moreover you could modify your code as follows
$arr = array('foo1234bar', 'foo1234', '1234bar', 'foo12345bar');
$result = array_filter(
$arr,
function($element) {
return preg_match('/^\D*\d{4}\D*$/', $element);
}
);
var_dump($result);
Pay attention
As the OP didn't specify it, this regex will also match 1234 (any four-digit string with no non-digit characters before or after it). If at least one character is required before and/or after the digits, the regex must be changed.
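If at least one non-digit character is required next to the four digits, one possible adjustment is sketched below. The alternation and the extra sample input '1234' are assumptions added for illustration; they are not part of the original answer.

```php
<?php
// Assumption: "valid" means exactly four digits with at least one
// non-digit character before or after them.
$arr = array('foo1234bar', 'foo1234', '1234bar', 'foo12345bar', '1234');

// First branch requires a non-digit prefix, second a non-digit suffix.
$pattern = '/^(?:\D+\d{4}\D*|\D*\d{4}\D+)$/';

// array_values() re-indexes, because preg_grep() preserves the input keys.
$result = array_values(preg_grep($pattern, $arr));
print_r($result); // foo1234bar, foo1234, 1234bar
```

The bare '1234' is rejected because neither branch can match without a neighbouring non-digit.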
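The regexp would be \d{4}:
preg_match('/\d{4}/', $value, $match);
I expect this will help. Note that, used on its own, this pattern also matches longer digit runs such as foo12345bar.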
You might try the following: preg_match('/\D\d{4}\D/', $value, $match); it searches for:
a non-digit (\D)
4 digits (\d{4})
again a non-digit (\D)
Because it requires a non-digit on both sides, it does not match foo1234 or 1234bar.
This regular expression will work on all your examples:
'/^\D*(\d{4})\D*$/'
  ││   │     │  └── end string
  ││   │     └───── zero or more NOT digits
  ││   └─────────── four digits ( match 1 )
  │└─────────────── zero or more NOT digits
  └──────────────── start string
It doesn't work if there are multiple number groups in the string (e.g. '123abc1234o').
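To make the difference between the anchored pattern and the lookaround pattern concrete, here is a small comparison. The input list is made up for illustration; which behaviour is wanted depends on whether strings with several digit groups should count.

```php
<?php
$arr = array('foo1234bar', 'foo12345bar', '123abc1234o');

// Anchored pattern: the whole string may contain only one digit run.
$anchored = array_values(preg_grep('/^\D*\d{4}\D*$/', $arr));

// Lookaround pattern: any isolated run of exactly four digits qualifies.
$lookaround = array_values(preg_grep('/(?<!\d)\d{4}(?!\d)/', $arr));

print_r($anchored);   // foo1234bar
print_r($lookaround); // foo1234bar, 123abc1234o
```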
Related
Before I store user-supplied phone numbers in my database, I need to standardize/sanitize the string to consist of exactly 10 digits.
I want to end up with 1112223333 from all of these potential input values:
(111)222-3333
111-222-3333
111.222.3333
+11112223333
11112223333
In the last two strings, there's a 1 as the country code.
I was able to make some progress with:
preg_replace('/\D/', '', mysqli_real_escape_string($conn, $_POST["phone"]));
Can anyone help me to fix up the strings that have more than 10 digits?
Start with your preg_replace, which handles all but the last two strings. Then check the length of the result and remove the first digit if there are more than 10 digits.
$str = preg_replace('/\D/', '', mysqli_real_escape_string($conn, $_POST["phone"]));
if (strlen($str) > 10) {
    $str = substr($str, 1);
}
If you want to parse phone numbers, a very useful library is giggsey/libphonenumber-for-php. It is based on Google's libphonenumber and also has an online demo showing how it works.
Do it in two passes:
$phone = [
'(111)222-3333',
'111-222-3333',
'111.222.3333',
'+11112223333',
'11112223333',
'+331234567890',
];
# remove all non-digits
$res = preg_replace('/\D+/', '', $phone);
# keep only the last 10 digits
$res = preg_replace('/^\d+(\d{10})$/', '$1', $res);
print_r($res);
Output:
Array
(
[0] => 1112223333
[1] => 1112223333
[2] => 1112223333
[3] => 1112223333
[4] => 1112223333
[5] => 1234567890
)
This task can/should be accomplished by making just one pass over the string to replace unwanted characters.
.* #greedily match zero or more of any character
(\d{3}) #capture group 1
\D* #greedily match zero or more non-digits
(\d{3}) #capture group 2
\D* #greedily match zero or more non-digits
(\d{4}) #capture group 3
$ #match end of string
Matching the position of the end of the string ensures that the final 10 digits from the string are captured and any extra digits at the front of the string are ignored.
Code: (Demo)
$strings = [
'(111)222-3333',
'111-222-3333',
'111.222.3333',
'+11112223333',
'11112223333'
];
foreach ($strings as $string) {
echo preg_replace(
'/.*(\d{3})\D*(\d{3})\D*(\d{4})$/',
'$1$2$3',
$string
) . "\n---\n";
}
Output:
1112223333
---
1112223333
---
1112223333
---
1112223333
---
1112223333
---
The same result can be achieved by changing the third capture group to be a lookahead and only using two backreferences in the replacement string. (Demo)
echo preg_replace(
'/.*(\d{3})\D*(\d{3})\D*(?=\d{4}$)/',
'$1$2',
$string
);
Finally, a much simpler pattern can be used to purge all non-digits, but this alone will not trim the string down to 10 characters. Calling substr() with a starting offset of -10 will ensure that the last 10 digits are preserved. (Demo)
echo substr(preg_replace('/\D+/', '', $string), -10);
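Note that substr(..., -10) silently accepts inputs with fewer than 10 digits and truncates any longer number. If that matters, a stricter variant could look like the sketch below. The function name normalize_phone() and the rule "only a leading 1 is an acceptable country code" are assumptions for illustration, not something the question specifies.

```php
<?php
// Hypothetical helper: returns the 10-digit number, or null when the
// input cannot be reduced to 10 digits under the stated assumption.
function normalize_phone(string $raw): ?string
{
    $digits = preg_replace('/\D+/', '', $raw);

    // Drop a leading country code of 1 (assumed to be the only one expected).
    if (strlen($digits) === 11 && $digits[0] === '1') {
        $digits = substr($digits, 1);
    }

    return strlen($digits) === 10 ? $digits : null;
}

var_dump(normalize_phone('+1 (111) 222-3333')); // string(10) "1112223333"
var_dump(normalize_phone('222-3333'));          // NULL
```

Returning null lets the caller reject the input instead of storing a number that was guessed.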
As a side note, you should use a prepared statement to interact with your database instead of relying on escaping which may have vulnerabilities.
Use str_replace with an array of the characters you want to remove.
$str = "(111)222-3333 111-222-3333 111.222.3333 +11112223333";
echo str_replace(["(", ")", "-", "+", "."], "", $str);
https://3v4l.org/80AWc
I have the word AK747, and I use a regex to detect whether a string (at least 2 chars, e.g. AK) is followed by a number (at least two digits, e.g. 747).
EDIT : (sorry that I wasn't clear on this guys)
I need to do this above because :
In some cases I need to split the keyword so that a search can match AK-747. When I search for the string 'AK-747' with the keyword 'AK747', it won't find a match unless I use levenshtein in the database, so I prefer splitting AK747 into AK and 747.
My code:
$strNumMatch = preg_match('/^[a-zA-Z]{2,}[0-9]{2,}$/',
$value, $match);
if(isset($match[0]))
echo $match[0];
How do I split to array ['AK', '747'] for example with preg_split() or any other way?
$input = 'AK-747';
if (preg_match('/^([a-z]{2,})-?([0-9]{2,})$/i', $input, $result)) {
unset($result[0]);
}
print_r($result);
The output:
Array
(
[1] => AK
[2] => 747
)
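Building on the pattern above, the keyword and the stored value can be compared after both are split into the same two parts. The helper name split_code() and the upper-casing step are assumptions made for this sketch.

```php
<?php
// Hypothetical helper: splits a code such as "AK747" or "AK-747" into
// its letter and digit parts, or returns null if the shape is different.
function split_code(string $code): ?array
{
    if (preg_match('/^([a-z]{2,})-?([0-9]{2,})$/i', $code, $m)) {
        // Upper-casing is an assumption so that "ak747" equals "AK747".
        return [strtoupper($m[1]), $m[2]];
    }
    return null;
}

var_dump(split_code('AK747') === split_code('AK-747')); // bool(true)
```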
You may try this:
preg_match('/[0-9]{2,}/', $value, $matches, PREG_OFFSET_CAPTURE);
$position = $matches[0][1];
$letters = substr($value, 0, $position);
$numbers = substr($value, $position);
This way you get the position of the first number and split there.
EDIT:
Starting from your original approach this could look somewhat like this:
$strNumMatch = preg_match('/^([a-zA-Z]{2,})([0-9]{2,})$/', $value, $match, PREG_OFFSET_CAPTURE);
if($strNumMatch){
$position = $match[2][1];
$letters = substr($value, 0, $position);
$numbers = substr($value, $position);
$alternative = $letters.'-'.$numbers;
}
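For reference, this is what PREG_OFFSET_CAPTURE puts into the match array; the sample value 'AK747' is taken from the question.

```php
<?php
$value = 'AK747';
preg_match('/^([a-zA-Z]{2,})([0-9]{2,})$/', $value, $match, PREG_OFFSET_CAPTURE);

// Each entry is array(matched text, byte offset in $value).
print_r($match[2]); // [0] => 747, [1] => 2

$position = $match[2][1];
$letters  = substr($value, 0, $position); // "AK"
$numbers  = substr($value, $position);    // "747"
```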
preg_split() is a very sensible and direct call since you desire an indexed array containing the two substrings.
Code: (Demo)
$input = 'AK-747';
var_export(preg_split('/[a-z]{2,}\K-?/i',$input));
Output:
array (
0 => 'AK',
1 => '747',
)
The \K means "restart the fullstring match". Effectively, everything to the left of \K is retained as the first element in the result array and everything to the right (the optional hyphen) is omitted because it is considered the delimiter. Pattern Demo
Code: (Demo)
I process a small battery of inputs to show what can be done and explain after the snippet.
$inputs=['AK747','AK-747','AK-','AK']; // variations as I understand them
foreach($inputs as $input){
echo "$input returns: ";
var_export(preg_split('/[a-z]{2,}\K-?/i',$input,2,PREG_SPLIT_NO_EMPTY));
echo "\n";
}
Output:
AK747 returns: array (
0 => 'AK',
1 => '747',
)
AK-747 returns: array (
0 => 'AK',
1 => '747',
)
AK- returns: array (
0 => 'AK',
)
AK returns: array (
0 => 'AK',
)
preg_split() receives a pattern that matches a variable substring and uses each match as a delimiter. If - were present in every input string, then explode('-',$input) would be most appropriate. However, - is optional in this task, so the pattern must allow - to be optional (this is what the ? quantifier does in all of the patterns on this page).
Now, you couldn't just use a pattern like /-?/; that would split the string on every character. To overcome this, you need to tell the regex engine the exact expected location for the optional -. You do this by referencing [a-z]{2,} before the -? (single intended delimiter).
The pattern /[a-z]{2,}-?/i does a fair job of finding the correct location for the optional hyphen, but now the trouble is, the leading letters in the string are included as part of the delimiting substring.
Sometimes, "lookarounds" can be used in regex patterns to match but not consume substrings. A "positive lookbehind" is used to match a preceding substring, however "variable length lookbehinds" are not permitted in php (and most other regex flavors). This is what the invalid pattern would look like: /(?<=[a-z]{2,})-?/i.
The way around this technicality is to "restart the fullstring match" using the \K token (aka a lookbehind alternative) just before the optional hyphen. To correctly target only the intended delimiter, the leading letters must be "matched/consumed" then "discarded" -- that's what \K does.
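The effect of \K is easiest to see by running the same split with and without it:

```php
<?php
$input = 'AK-747';

// Without \K the letters belong to the delimiter and are thrown away.
$withoutK = preg_split('/[a-z]{2,}-?/i', $input);

// With \K only the optional hyphen is treated as the delimiter.
$withK = preg_split('/[a-z]{2,}\K-?/i', $input);

var_export($withoutK); // array(0 => '', 1 => '747')
var_export($withK);    // array(0 => 'AK', 1 => '747')
```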
As for the inclusion of the 3rd and 4th parameter of preg_split()...
I've set the 3rd parameter to 2. This is just like the limit parameter that explode() has. It instructs the function to not make more than 2 output elements. For this case, I could have used NULL or -1 to mean "unlimited", but I could NOT leave the parameter empty -- it must be assigned to allow for the declaration of the 4th parameter.
I've set the 4th parameter to PREG_SPLIT_NO_EMPTY which instructs the function to not generate empty output elements.
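A quick check of what the flag changes, using the 'AK-' input from the list above and -1 as the "no limit" placeholder for the 3rd parameter:

```php
<?php
$pattern = '/[a-z]{2,}\K-?/i';

// Default flags: the empty string after the trailing hyphen is kept.
$default = preg_split($pattern, 'AK-');

// PREG_SPLIT_NO_EMPTY drops it; -1 (no limit) fills the 3rd parameter.
$noEmpty = preg_split($pattern, 'AK-', -1, PREG_SPLIT_NO_EMPTY);

var_export($default); // array(0 => 'AK', 1 => '')
var_export($noEmpty); // array(0 => 'AK')
```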
Ta-Da!
p.s. a preg_match_all() solution is as easy as using a pipe and two anchors:
$inputs=['AK747','AK-747','AK-','AK']; // variations as I understand them
foreach($inputs as $input){
echo "$input returns: ";
var_export(preg_match_all('/^[a-z]{2,}|\d{2,}$/i',$input,$out)?$out[0]:[]);
echo "\n";
}
// same outputs as above
You can make the - optional with ?.
/([A-Za-z]{2,}-?[0-9]{2,})/
https://regex101.com/r/tIgM4F/1
I'm trying to get the string that matches the original name followed by a number at the end.
I got these strings:
mod_courts2
mod_courts_config
mod_courts_config2
From these strings I want only the one that matches "mod_courts" with a number at the end.
I'm doing this:
if (strpos($t, "mod_courts") !== FALSE) {
preg_match('/^\w+(\d+)$/U', $t, $match);
echo $match;
}
This returns "mod_courts2" and "mod_courts_config2", but I just want "mod_courts2".
Use the following regex:
/^[a-z]+_[a-z]+(\d+)$/
Explanation:
^ - assert position at the beginning of the string
[a-z]+ - match any letter one or more times
_ - match a literal underscore character
[a-z]+ - match any letter one or more times
(\d+) - match (and capture) any digit from 0 to 9 one or more times
$ - assert position at the end of the string
Test cases:
$array = array(
'mod_courts2',
'mod_courts_config',
'mod_courts_config2'
);
foreach ($array as $string) {
if(preg_match('/^[a-z]+_[a-z]+(\d+)$/i', $string, $matches)) {
print_r($matches);
}
}
Output:
Array
(
[0] => mod_courts2
[1] => 2
)
Very simply, you can do:
/^(mod_courts\d+)$/
However, if you want exactly the following format: sometext_sometext2, you can use the following regex:
/^([a-zA-Z]+_[a-zA-Z]+\d+)$/
or
/^([^_]+_[^_]+\d+)$/
Demos
http://regex101.com/r/jP8iC1
http://regex101.com/r/tI1uX8
http://regex101.com/r/fX8pO5
^mod_courts\d+$
this should do it
You can just use
^mod_courts[0-9]+$
Meaning mod_courts followed by a number (and only that, thanks to ^$ matching the beginning and end of the string). No need for the strpos check.
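Applied to the three strings from the question, that could look like the sketch below. The capture group around the number and the $found array are additions for illustration. Note that $match is an array, so it has to be indexed rather than echoed directly.

```php
<?php
$names = array('mod_courts2', 'mod_courts_config', 'mod_courts_config2');
$found = array();

foreach ($names as $t) {
    // $match is an array: [0] is the whole match, [1] the trailing number.
    if (preg_match('/^mod_courts(\d+)$/', $t, $match)) {
        $found[] = $match[0];
        echo $match[0], ' (number: ', $match[1], ")\n";
    }
}
// prints: mod_courts2 (number: 2)
```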
I'm trying to split a string by non-alphanumeric delimiting characters AND between alternations of digits and non-digits. The end result should be a flat array consisting of alphabetic strings and numeric strings.
I'm working in PHP, and would like to use REGEX.
Examples:
ES-3810/24MX should become ['ES', '3810', '24', 'MX']
CISCO1538M should become ['CISCO' , '1538', 'M']
The input file sequence can be indifferently DIGITS or ALPHA.
The separators can be non-ALPHA and non-DIGIT chars, as well as a change from a DIGIT sequence to an ALPHA sequence, and vice versa.
The command to match all occurrences of a regex is preg_match_all(), which outputs a multidimensional array of results. The regex is very simple: any digit ([0-9]) one or more times (+), or (|) any letter ([[:upper:][:lower:]]) one or more times (+). Note that the range [A-z] is not a safe shorthand for all upper- and lowercase letters, because it also matches the ASCII characters between Z and a ([, \, ], ^, _ and `).
The textarea and php tags are included for convenience, so you can drop this into your php file and see the results.
<textarea style="width:400px; height:400px;">
<?php
foreach( array(
"ES-3810/24MX",
"CISCO1538M",
"123ABC-ThatsHowEasy"
) as $string ){
// get all matches into an array
preg_match_all("/[0-9]+|[[:upper:][:lower:]]+/",$string,$matches);
// it is the 0th match that you are interested in...
print_r( $matches[0] );
}
?>
</textarea>
Which outputs in the textarea:
Array
(
[0] => ES
[1] => 3810
[2] => 24
[3] => MX
)
Array
(
[0] => CISCO
[1] => 1538
[2] => M
)
Array
(
[0] => 123
[1] => ABC
[2] => ThatsHowEasy
)
$str = "ES-3810/24MX35 123 TEST 34/TEST";
$str = preg_replace(array("#[^A-Z0-9]+#i","#\s+#","#([A-Z])([0-9])#i","#([0-9])([A-Z])#i"),array(" "," ","$1 $2","$1 $2"),$str);
echo $str;
$data = explode(" ",$str);
print_r($data);
I could not think of a cleaner way.
The most direct preg_ function to produce the desired flat output array is preg_split().
Because it doesn't matter what combination of alphanumeric characters are on either side of a sequence of non-alphanumeric characters, you can greedily split on non-alphanumeric substrings without "looking around".
After that preliminary obstacle is dealt with, then split on the zero-length positions between a digit and a non-digit OR between a non-digit and a digit.
/ #starting delimiter
[^a-z\d]+ #match one or more non-alphanumeric characters
| #OR
\d\K(?=\D) #match a number, then forget it, then lookahead for a non-number
| #OR
\D\K(?=\d) #match a non-number, then forget it, then lookahead for a number
/ #ending delimiter
i #case-insensitive flag
Code: (Demo)
var_export(
preg_split('/[^a-z\d]+|\d\K(?=\D)|\D\K(?=\d)/i', $string, 0, PREG_SPLIT_NO_EMPTY)
);
preg_match_all() isn't a silly technique, but it doesn't return the array, it returns the number of matches and generates a reference variable containing a two dimensional array of which the first element needs to be accessed. Admittedly, the pattern is shorter and easier to follow. (Demo)
var_export(
preg_match_all('/[a-z]+|\d+/i', $string, $m) ? $m[0] : []
);
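As a quick sanity check, both techniques can be run against the two sample strings from the question. The loop and the $results array are just scaffolding for the comparison.

```php
<?php
$strings = array('ES-3810/24MX', 'CISCO1538M');
$results = array();

foreach ($strings as $string) {
    // Split on separators and on digit/non-digit boundaries.
    $split = preg_split('/[^a-z\d]+|\d\K(?=\D)|\D\K(?=\d)/i', $string, 0, PREG_SPLIT_NO_EMPTY);

    // Collect runs of letters or runs of digits.
    $matched = preg_match_all('/[a-z]+|\d+/i', $string, $m) ? $m[0] : [];

    var_dump($split === $matched); // bool(true) for both inputs
    $results[$string] = $split;
}
```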
I have some string data with alphanumeric values, like us01name, phc01name and others, i.e. letters + number + letters.
I would like to get the first letters + number in the first string and the remainder in the second.
How can I do it in PHP?
You can use a regular expression:
// if statement checks there's at least one match
if (preg_match('/([A-Za-z]+[0-9]+)([A-Za-z]+)/', $string, $matches) > 0) {
    $firstbit = $matches[1];
    $nextbit = $matches[2];
}
Just to break the regular expression down into parts so you know what each bit does:
( Begin group 1
[A-Za-z]+ As many letters as there are (upper or lower case)
[0-9]+ As many digits as there are
) End group 1
( Begin group 2
[A-Za-z]+ As many letters as there are (upper or lower case)
) End group 2
Try this code:
preg_match('~([^\d]+\d+)(.*)~', "us01name", $m);
var_dump($m[1]); // 1st string + number
var_dump($m[2]); // 2nd string
OUTPUT
string(4) "us01"
string(4) "name"
Even this more restrictive regex will also work for you:
preg_match('~([A-Z]+\d+)([A-Z]+)~i', "us01name", $m);
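A short run over both sample values from the question is sketched below. The ^ and $ anchors and the $parts array are additions for illustration; the anchors reject input that has anything around the two parts.

```php
<?php
$parts = array();

foreach (array('us01name', 'phc01name') as $string) {
    // Group 1: letters followed by digits. Group 2: the trailing letters.
    if (preg_match('/^([a-z]+\d+)([a-z]+)$/i', $string, $m)) {
        $parts[$string] = array($m[1], $m[2]);
        echo $m[1], ' | ', $m[2], "\n";
    }
}
// us01 | name
// phc01 | name
```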
You could use preg_split on the digits with the pattern capture flag. It returns all pieces, so you'd have to put them back together. However, in my opinion it is more intuitive and flexible than a complete pattern regex. Plus, preg_split() is underused :)
Code:
$str = 'user01jason';
$pieces = preg_split('/(\d+)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($pieces);
Output:
Array
(
[0] => user
[1] => 01
[2] => jason
)
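Putting the pieces back together into the two requested strings could look like this. It is only a sketch: it assumes the input starts with letters, and it treats everything after the first digit run as the second string.

```php
<?php
$str = 'user01jason';
$pieces = preg_split('/(\d+)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);

// Letters plus the first digit run form the first string; whatever
// follows (possibly containing more digits) forms the second.
$first  = $pieces[0] . ($pieces[1] ?? '');
$second = implode('', array_slice($pieces, 2));

var_dump($first);  // string(6) "user01"
var_dump($second); // string(5) "jason"
```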