I recently asked a question about formatting a telephone number and got lots of responses. Most of them were great, but there is one I really want to understand, because it worked so well. If $phone is the following, how do the other lines work? What are they doing? I'd like to learn.
$phone = "(407)888-9999";
$phone = preg_replace("~[^0-9]~", "", $phone);
preg_match('~([0-9]{3})([0-9]{3})([0-9]{4})~', $phone, $matches);
Let's break the code into two lines.
preg_replace("~[^0-9]~", "", $phone);
First, we're going to replace matches of a regex with an empty string (in other words, delete matches from the string). The regex is [^0-9] (the ~ on each end is a delimiter). [...] in a regex defines a character class, which tells the regex engine to match one character within the class. Dashes are generally special characters inside a character class, and are used to specify a range (i.e. 0-9 means all characters between 0 and 9, inclusive).
You can think of a character class as shorthand for a big OR condition: i.e. [0-9] is shorthand for 0 or 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9. Note that classes don't have to contain ranges, either: [aeiou] is a character class that matches a or e or i or o or u (in other words, any lowercase vowel).
When the first character in the class is ^, the class is negated, which means that the regex engine should match any character that isn't in the class. So when you put all that together, the first line removes anything that isn't a digit (a character between 0 and 9) from $phone.
preg_match('~([0-9]{3})([0-9]{3})([0-9]{4})~', $phone, $matches);
The second line tries to match $phone against a second expression and, if a match is made, puts the results into an array called $matches. You will note there are three sets of parentheses; these define capturing groups. If the pattern as a whole matches, you end up with three submatches, which in this case contain the area code, prefix and suffix of the phone number. In general, anything contained in parentheses in a regular expression is captured (there are exceptions, but they are beyond the scope of this explanation). Groups are useful for other things too, and when you want the grouping without the overhead of capturing, you can make a group non-capturing by prefacing it with ?: (i.e. (?:...)).
Each group does a similar thing: [0-9]{3} or [0-9]{4}. As we saw above, [0-9] defines a character class containing the digits between 0 and 9 (as the classes here don't start with ^, they are not negated). The {3} or {4} is a repetition operator, which says "match exactly 3 (or 4) of the previous token (or group)". So [0-9]{3} will match exactly three digits in a row, and [0-9]{4} will match exactly four digits in a row. Note that the digits don't all have to be the same (e.g. 111), because the character class is evaluated for each repetition (so 123 will match because 1 matches [0-9], then 2 matches [0-9], and then 3 matches [0-9]).
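Here is a small PHP sketch that runs the two lines on the example number and prints what each step produces. The closing `vsprintf` call is only an illustration of how the groups might be reassembled; the `(%s) %s-%s` format is an assumption, not something taken from the original question.

```php
<?php
$phone = "(407)888-9999";

// Line 1: delete every character that is not a digit.
// [^0-9] is a negated character class; ~ is the pattern delimiter.
$digits = preg_replace("~[^0-9]~", "", $phone);
echo $digits, "\n"; // 4078889999

// Line 2: three capturing groups of exactly 3, 3 and 4 digits.
if (preg_match('~([0-9]{3})([0-9]{3})([0-9]{4})~', $digits, $matches)) {
    // $matches[0] is the whole match; $matches[1..3] hold the three groups.
    print_r($matches);

    // Illustrative only: reassemble the groups in an assumed display format.
    $formatted = vsprintf('(%s) %s-%s', array_slice($matches, 1));
    echo $formatted, "\n"; // (407) 888-9999
}
```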
In the preg_replace, it looks for anything that is not (the ^ inside the []) a character from 0-9, i.e. anything that isn't a digit, and removes it from the string, since the replacement is "".
For the first section, ([0-9]{3}) pulls out the first 3 digits: the {3} is the number of characters to match, the items inside the [] are what to match, and since it is inside parentheses () it is stored as a match in the $matches array. The second part pulls out the next 3 digits, and the last part pulls out the last 4 digits from $phone; all of them are stored in $matches.
The ~ characters are delimiters for the regular expressions.
You know it's a regular expression from the regex tag.
So, you are pattern matching.
The pattern you are matching is: [^0-9] followed by the phone number.
[^0-9] means NOT (the '^') any one digit.
So, the match after that is any 3 digits, followed by any 3 digits, followed by any 4 digits.
I don't think it will match, because the () around the area code and the dash are missing from the pattern.
I'd do this:
'~\(([0-9]{3})\)([0-9]{3})-([0-9]{4})~'
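A quick PHP check of this pattern, sketched for illustration: because `\(`, `\)` and `-` are matched literally, it matches the original, unstripped string, but it no longer matches once the `preg_replace` step has removed the punctuation.

```php
<?php
$pattern = '~\(([0-9]{3})\)([0-9]{3})-([0-9]{4})~';

// Raw input with the punctuation intact.
$raw = "(407)888-9999";
$rawHit = preg_match($pattern, $raw, $m);
echo "raw: $rawHit ", implode(',', array_slice($m, 1)), "\n"; // raw: 1 407,888,9999

// Input after the question's preg_replace step (digits only).
$stripped = preg_replace("~[^0-9]~", "", $raw);
$strippedHit = preg_match($pattern, $stripped);
echo "stripped: $strippedHit\n"; // stripped: 0
```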
"[^0-9]" means everything but numbers from 0 to 9. So basically, first line replace everything but numbers with "" (nothing)
[0-9]{3} means number from 0 to 9, 3 times in a row.
So it check if you have 3 numbers then 3 numbers than 4 numbers and try to match it with $matches.
Check this tuts
Using Regular Expressions with PHP
http://www.webcheatsheet.com/php/regular_expressions.php
$phone = "(407)888-9999";
$phone = preg_replace("~[^0-9]~", "", $phone);
In PHP you have to delimit a regex pattern with a non-alphanumeric character; "~" is used here.
[^0-9] is the regex pattern used to remove anything from $phone that is not in the 0-9 range. Remember that a ^ at the start of [...] negates the class.
preg_match('~([0-9]{3})([0-9]{3})([0-9]{4})~', $phone, $matches);
Again, in this line of code you have "~" as the delimiter, and
([0-9]{3}): this part of the pattern captures 3 digits from the string (note: {} specifies how many characters to match, or a range) as a separate element of the output array (check your $matches variable for the result). Using ( ) in a pattern creates groups/submatches.
Related
I have numbers wrapped in curly braces in my text, e.g. {123} or {456ABC}. I also have numbers not wrapped in braces, e.g. 789. I want to match these not-yet-wrapped numbers and use PHP's preg_replace to wrap them with pound signs, e.g. #789#. The numbers usually range from 1 to 3 digits.
print(preg_replace('/\d+/','#$0#',
'1) I can count to 2997510. You can only count to {456ABC}.'));
Desired output:
#1#) I can count to #2997510#. You can only count to {456ABC}.
What regex would match the numbers? I've tried negative lookahead (?![^\{])\d+ and [^\{](\d+)[^\{]
[^\{\dA-F]([A-F\d]+)[^\}\dA-F]
(I'm assuming that you're trying to match hex numbers with capital letters; if not, just alter the character class appropriately.)
The extra \d's are in the negated character classes because, if they aren't there, the engine will avoid the braces by cutting off the outermost digits. For instance, [^\{](\d+)[^\}] will match the 456 in {34567}.
The number itself is "group 1" of any match. If you need the entire match itself to be the number, use a lookahead and a lookbehind:
(?<=[^\{\dA-F])([A-F\d]+)(?=[^\}\dA-F])
Here is a Perl-style search-and-replace to insert the #'s, with no lookahead or lookbehind:
s/([^\{\dA-F])([A-F\d]+)([^\}\dA-F])/$1#$2#$3/g
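A PHP sketch of the lookaround version with `preg_replace`. One detail worth checking: the positive lookbehind `(?<=[^\{\dA-F])` needs an actual character before the number, so it cannot wrap the `1` at the very start of the question's string. The second pattern swaps in negative lookarounds (`(?<![{\dA-F])` and `(?![}\dA-F])`), a variant not in the answer, which also succeed at the string boundaries. Like the answer, both assume uppercase hex digits.

```php
<?php
$text = '1) I can count to 2997510. You can only count to {456ABC}.';

// The answer's lookaround pattern: the number itself is group 1.
$positive = '/(?<=[^\{\dA-F])([A-F\d]+)(?=[^\}\dA-F])/';
// Variant with negative lookarounds, which also succeed at the start/end of the string.
$negative = '/(?<![{\dA-F])([A-F\d]+)(?![}\dA-F])/';

$a = preg_replace($positive, '#$1#', $text);
$b = preg_replace($negative, '#$1#', $text);

echo $a, "\n"; // leading 1 is left unwrapped
echo $b, "\n"; // #1#) I can count to #2997510#. You can only count to {456ABC}.
```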
(\A|[^{\d])(\d[\d\w]*)(\z|[^\}\d]) should do it for you.
Used like:
print(preg_replace('/(\A|[^{\d])(\d[\d\w]*)(\z|[^\}\d])/','$1#$2#$3',
'1) I can count to 2997510. You can only count to {456ABC}.'));
Explanation:
The first part (\A|[^{\d]) matches either the start of the input (to catch numbers at the beginning of the string) or a character that is neither { nor a digit. This part ensures the numbers aren't already wrapped.
The second part (\d[\d\w]*) does the actual matching of the number. It matches anything that starts with a digit followed by any number of contiguous digits or letters.
The last part (\z|[^\}\d]) is analogous to the first part, except that it looks for the end of the input. \z is an anchor, not a character, so it stays outside the character class; PCRE2 (used by PHP 7.3 and later) rejects it inside [...].
Because this regular expression can capture a character before and after the target number, it is important to add those characters back in using the 1st and 3rd captured subgroups (as seen in the PHP example).
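To see why the 1st and 3rd groups must be put back, this short PHP sketch compares two replacements using the pattern `(\A|[^{\d])(\d[\d\w]*)(\z|[^\}\d])`, i.e. without `\z` inside the final character class, which PCRE2 (PHP 7.3+) rejects.

```php
<?php
$text = '1) I can count to 2997510. You can only count to {456ABC}.';
$pattern = '/(\A|[^{\d])(\d[\d\w]*)(\z|[^\}\d])/';

// Put the consumed neighbours back: $1 before the number, $3 after it.
$kept = preg_replace($pattern, '$1#$2#$3', $text);

// Replacing with the number alone drops the characters the match consumed.
$lost = preg_replace($pattern, '#$2#', $text);

echo $kept, "\n"; // #1#) I can count to #2997510#. You can only count to {456ABC}.
echo $lost, "\n"; // #1# I can count to#2997510# You can only count to {456ABC}.
```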
I want to write a PHP regular expression to find an uppercase string, which can also contain one number and spaces, in text.
For example, from the text "some text to contain EXAM PL E 7STRING uppercase word" I want to get the string EXAM PL E 7STRING.
The found string should start and end only with uppercase letters, but in the middle, besides uppercase letters, it can also contain (but not necessarily) one number and spaces. So the regex should match any of these patterns:
1) EXAMPLESTRING - just uppercase string
2) EXAMP4LESTRING - with number
3) EXAMPLES TRING - with space
4) EXAM PL E STRING - with more than one space
5) EXAMP LE4STRING - with number and space
6) EXAMP LE 4ST RI NG - with number and spaces
and the total length of the string should be 4 letters or more.
I wrote this regex '/[A-Z]{1,}([A-Z\s]{2,}|\d?)[A-Z]{1,}/', which can find the first 4 patterns, but I can't figure out how to also match the last 2.
Thanks
There is a neat trick called a lookahead. It just checks what follows the current position, without consuming anything. That can be used to check for multiple conditions:
'/(?<![A-Z])(?=(?:[A-Z][\s\d]*){3}[A-Z])(?!(?:[A-Z\s]*\d){2})[A-Z][A-Z\s\d]*[A-Z]/'
The first lookaround is actually a lookbehind and checks that there is no previous uppercase letter. This is just a little speedup for strings that would fail the match anyway. The second lookaround (a lookahead) checks that there are at least four letters. The third one checks that there are no two digits. The rest just matches then a string of the allowed characters, starting and ending with an uppercase letter.
Note that in the case of two digits this will not match at all (instead of matching everything up to the second digit). If you do want to match in such a case, you could incorporate the "1 digit" rule into the actual match instead:
'/(?<![A-Z])(?=(?:[A-Z][\s\d]*){3}[A-Z])[A-Z][A-Z\s]*\d?[A-Z\s]*[A-Z]/'
EDIT:
As Ωmega pointed out, this will cause problems if there are fewer than four letters before the second digit, but more after it. This is actually quite tough, because the assertion needs to be that there are at least 4 letters before the second digit. Since we do not know where the first digit occurs among those four letters, we have to check all possible positions. For this I would do away with the lookaheads altogether and simply provide the three different alternatives. (I will keep the lookbehind as an optimization for non-matching parts.)
'/(?<![A-Z])[A-Z]\s*(?:\d\s*[A-Z]\s*[A-Z]|[A-Z]\s*\d\s*[A-Z]|[A-Z]\s*[A-Z][A-Z\s]*\d?)[A-Z\s]*[A-Z]/'
Or here with added comments:
'/
(?<! # negative lookbehind
[A-Z] # current position is not preceded by a letter
) # end of lookbehind
[A-Z] # match has to start with uppercase letter
\s* # optional spaces after first letter
(?: # subpattern for possible digit positions
\d\s*[A-Z]\s*[A-Z]
# digit comes after first letter, we need two more letters before last one
| # OR
[A-Z]\s*\d\s*[A-Z]
# digit comes after second letter, we need one more letter before last one
| # OR
[A-Z]\s*[A-Z][A-Z\s]*\d?
# digit comes after third letter, or later, or not at all
) # end of subpattern for possible digit positions
[A-Z\s]* # arbitrary amount of further letters and whitespace
[A-Z] # match has to end with uppercase letter
/x'
That gives the same result on Ωmega's lengthy test input.
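A short PHP sketch that runs the compact pattern from the edit against the question's example text and the six required shapes. The input `AB1C2DEFG` is a made-up string of the kind Ωmega described (only three letters before the second digit) and compares the earlier one-digit pattern with the edited one.

```php
<?php
// Pattern from the EDIT (compact form of the commented /x version).
$edited = '/(?<![A-Z])[A-Z]\s*(?:\d\s*[A-Z]\s*[A-Z]|[A-Z]\s*\d\s*[A-Z]|[A-Z]\s*[A-Z][A-Z\s]*\d?)[A-Z\s]*[A-Z]/';
// Earlier pattern that folds the "one digit" rule into the match.
$earlier = '/(?<![A-Z])(?=(?:[A-Z][\s\d]*){3}[A-Z])[A-Z][A-Z\s]*\d?[A-Z\s]*[A-Z]/';

preg_match_all($edited, 'some text to contain EXAM PL E 7STRING uppercase word', $all);
print_r($all[0]); // single match: EXAM PL E 7STRING

// Each of the six required shapes should match in full.
$shapes = ['EXAMPLESTRING', 'EXAMP4LESTRING', 'EXAMPLES TRING',
           'EXAM PL E STRING', 'EXAMP LE4STRING', 'EXAMP LE 4ST RI NG'];
foreach ($shapes as $s) {
    preg_match($edited, $s, $m);
    printf("%-20s => %s\n", $s, ($m[0] ?? null) === $s ? 'full match' : 'partial/none');
}

// Fewer than four letters before the second digit.
preg_match($earlier, 'AB1C2DEFG', $e);
preg_match($edited, 'AB1C2DEFG', $d);
echo $e[0], "\n"; // AB1C   (only three letters)
echo $d[0], "\n"; // C2DEFG (five letters)
```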
I suggest using the regex pattern
[A-Z][ ]*(\d)?(?(1)(?:[ ]*[A-Z]){3,}|[A-Z][ ]*(\d)?(?(2)(?:[ ]*[A-Z]){2,}|[A-Z][ ]*(\d)?(?(3)(?:[ ]*[A-Z]){2,}|[A-Z][ ]*(?:\d|(?:[ ]*[A-Z])+[ ]*\d?))))(?:[ ]*[A-Z])*
(see this demo).
[A-Z][ ]*(?:\d(?:[ ]*[A-Z]){2}|[A-Z][ ]*\d[ ]*[A-Z]|(?:[A-Z][ ]*){2,}\d?)[A-Z ]*[A-Z]
(see this demo)
I'm trying to work out a regex pattern to search a string for a 12-digit number. The number could have any number of other characters (but not digits) in front of or behind it.
So far I have /([0-9]{12})/, which finds 12-digit numbers correctly; however, it will also match a 13-digit number in the string.
The pattern should match 123456789012 in the following strings:
"rgergiu123456789012ergewrg"
"123456789012"
"#123456789012"
"ergerg ergerwg erwgewrg \n rgergewrgrewg regewrge 123456789012 ergwerg"
It should match nothing in these strings:
"123456789012000"
"egjkrgkergr 123123456789012"
What you want are look-arounds. Something like:
/(?<![0-9])[0-9]{12}(?![0-9])/
A lookahead or lookbehind checks whether the current position is followed or preceded by another pattern, without consuming that pattern; the negative forms (?!...) and (?<!...) succeed only when it is absent. So this pattern will match 12 digits only if they are not preceded or followed by more digits, without consuming the characters before and after the number.
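A minimal PHP sketch that runs this pattern over the sample strings from the question; the two lists are just the question's own examples.

```php
<?php
$pattern = '/(?<![0-9])[0-9]{12}(?![0-9])/';

// Strings from the question that should yield 123456789012.
$shouldMatch = [
    "rgergiu123456789012ergewrg",
    "123456789012",
    "#123456789012",
    "ergerg ergerwg erwgewrg \n rgergewrgrewg regewrge 123456789012 ergwerg",
];
// Strings from the question that should yield nothing.
$shouldNotMatch = ["123456789012000", "egjkrgkergr 123123456789012"];

foreach ($shouldMatch as $s) {
    echo preg_match($pattern, $s, $m) ? "match: {$m[0]}\n" : "no match\n";
}
foreach ($shouldNotMatch as $s) {
    echo preg_match($pattern, $s) ? "unexpected match\n" : "no match (as intended)\n";
}
```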
/\D(\d{12})\D/ (in which case, the number will be in capture group 1)
Edit: Whoops, that one doesn't work if the number is the entire string. Use the one below instead.
Or, with negative lookbehind and lookahead: /(?<!\d)\d{12}(?!\d)/ (where the number will be at index 0, the whole match)
if( preg_match("/(?<!\d)\d{12}(?!\d)/", $string, $matches) ) {
$number = $matches[0];
# ....
}
where $string is the text you're testing
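A small PHP comparison, sketched for illustration, of why the `\D` version was withdrawn: `\D` must consume a real character, so it fails when the number touches the start or end of the string, while the lookarounds consume nothing.

```php
<?php
$withD      = '/\D(\d{12})\D/';
$lookaround = '/(?<!\d)\d{12}(?!\d)/';

foreach (["123456789012", "#123456789012", "rgergiu123456789012ergewrg"] as $s) {
    // 1 = matched, 0 = no match
    printf("%-28s \\D: %d  lookaround: %d\n", $s, preg_match($withD, $s), preg_match($lookaround, $s));
}
```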
I have a large string (multiple lines) in which I need to find numbers with a regex. The number I need is always preceded and followed by an exact sequence of characters, so I can use non-capturing groups to pinpoint it. I put together a regex to get this number, but it refuses to work and I can't figure out why!
Below is a small bit of PHP code that I can't get to work, showing the basic format of what I need:
$sTestData = 'lak sjdhfklsjaf<?kjnsdfh461uihrfkjsn+%5Bmlknsadlfjncas dlk';
$sNumberStripRE = '/.*?(?:sjdhfklsjaf<\\?kjnsdfh)(\\d+)(?:uihrfkjsn\\+%5Bmlknsadlfjncas).*?/gim';
if (preg_match_all($sNumberStripRE, $sTestData, $aMatches))
{
var_dump($aMatches);
}
The number I need is 461, and the characters on either side of it, up to the spaces, are always the same.
Any help getting the above regex working would be great!
This link, RegExr: My Reg Ex (an online regex tester with my regex), shows that it should work!
g is not a valid modifier in PHP; drop it. preg_match_all already finds every match.
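A PHP sketch of the question's own pattern with the `g` modifier dropped (and single backslashes, which are equivalent here). Everything else, including the `.*?` at both ends, is left as in the question.

```php
<?php
$sTestData = 'lak sjdhfklsjaf<?kjnsdfh461uihrfkjsn+%5Bmlknsadlfjncas dlk';

// The question's pattern without `g`: preg_match_all() already returns every match.
$sNumberStripRE = '/.*?(?:sjdhfklsjaf<\?kjnsdfh)(\d+)(?:uihrfkjsn\+%5Bmlknsadlfjncas).*?/im';

$count = preg_match_all($sNumberStripRE, $sTestData, $aMatches);
echo $count, "\n";          // number of matches found
echo $aMatches[1][0], "\n"; // first capture of group 1: 461
```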
With regard to that link: which regular expression engine is it using? It's built in Flex, so probably the ActionScript RegExp engine. Engines are not all the same; each one varies.
You have a number of double backslashes; they can be single. In a single-quoted PHP string, \\d and \d both produce \d, so single backslashes are simply easier to read.
$sTestData = 'lak sjdhfklsjaf<?kjnsdfh461uihrfkjsn+%5Bmlknsadlfjncas dlk';
$lDelim = ' sjdhfklsjaf<?kjnsdfh';
$rDelim = 'uihrfkjsn+%5Bmlknsadlfjncas ';
$start = strpos($sTestData, $lDelim) + strlen($lDelim);
$length = strpos($sTestData, $rDelim) - $start;
$number = substr($sTestData, $start, $length);
Using regex you can accomplish your goal with the following code:
$string='lak sjdhfklsjaf<?kjnsdfh461uihrfkjsn+%5Bmlknsadlfjncas dlk';
if (preg_match('/(sjdhfklsjaf<\?kjnsdfh)(\d+)(uihrfkjsn\+%5Bmlknsadlfjncas)/', $string, $num_array)) {
$aMatches = $num_array[2];
} else {
$aMatches = "";
}
echo $aMatches;
Explanation:
I declared a variable named $string and set it to the string you initially presented. You indicated that the characters on either side of the numeric value of interest were always the same. I assigned the numeric value of interest to $aMatches by setting $aMatches equal to backreference 2. Using the parentheses in the regex, you get 3 capturing groups: backreference 1, which contains the characters before the number; backreference 2, which contains the number you want; and backreference 3, which contains the characters after the number. $num_array is the array that receives those backreferences, and [2] selects the second one. So $num_array[1] would contain the match for backreference 1, and $num_array[3] the match for backreference 3.
Here is the explanation of my regular expression:
Match the regular expression below and capture its match into backreference number 1 «(sjdhfklsjaf<\?kjnsdfh)»
Match the characters “sjdhfklsjaf<” literally «sjdhfklsjaf<»
Match the character “?” literally «\?»
Match the characters “kjnsdfh” literally «kjnsdfh»
Match the regular expression below and capture its match into backreference number 2 «(\d+)»
Match a single digit 0..9 «\d+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the regular expression below and capture its match into backreference number 3 «(uihrfkjsn\+%5Bmlknsadlfjncas)»
Match the characters “uihrfkjsn” literally «uihrfkjsn»
Match the character “+” literally «\+»
Match the characters “%5Bmlknsadlfjncas” literally «%5Bmlknsadlfjncas»
Hope this helps and best of luck to you.
Steve
I need help with the following regular expression rules in JavaScript and PHP.
JS
var charFilter = new RegExp("^[A|B].+[^0123456789]$");
PHP
if (!preg_match('/^[A|B].+[^0123456789]$/', $data_array['sample_textfield'])) {
This regular expression enforces the following:
The first character must be A or B, and the last character must not be a digit from 0 to 9.
I have another validation: the string must be at least 3 and at most 6 characters long.
The new rule I want to add is that the second character cannot be C if the first letter is A.
That means:
ADA (is valid)
ACA (is not valid)
So I changed the regex code like this
JS
var charFilter = new RegExp("^(A[^C])|(B).+[^0123456789]$");
PHP
if (!preg_match('/^(A[^C])|(B).+[^0123456789]$/', $data_array['sample_textfield'])) {
It works for the first and second characters. If I type
ACA (it says invalid), but if I type
AD3 (it says valid), so it doesn't check the last character anymore. The last character must not be a digit from 0 to 9, but it's shown as valid.
Can anyone help me fix that regex code? Thank you so much.
Putting all of your requirements together, it seems that you want this pattern:
^(?=.{3,6}$)(?=A(?!C)|B).+\D$
That is:
From the beginning of the string ^
We can assert that there are between 3 and 6 "any" characters to the end of the string (?=.{3,6}$)
We can also assert that it starts with A not followed by C, or starts with B (?=A(?!C)|B)
And the whole thing doesn't end with a digit .+\D$
This will match (as seen on rubular.com):
Match: ADA, ABCD, ABCDE, ABCDEF, A123X, "A X"
No match: ACA, AD3, ABCDEFG
Note that spaces are allowed by .+ and \D. If you insist on no spaces, you can use e.g. (?=\S{3,6}$) in the first part of the pattern.
(?=…) is positive lookahead; it asserts that a given pattern can be matched. (?!…) is negative lookahead; it asserts that a given pattern can NOT be matched.
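A PHP sketch that checks this answer's pattern against the match/no-match examples; the comments restate the pieces of the pattern described above.

```php
<?php
// ^            start of string
// (?=.{3,6}$)  lookahead: 3 to 6 characters in total
// (?=A(?!C)|B) lookahead: starts with A not followed by C, or with B
// .+\D$        at least one character, then a final non-digit
$pattern = '/^(?=.{3,6}$)(?=A(?!C)|B).+\D$/';

$valid   = ['ADA', 'ABCD', 'ABCDE', 'ABCDEF', 'A123X', 'A X'];
$invalid = ['ACA', 'AD3', 'ABCDEFG'];

foreach (array_merge($valid, $invalid) as $s) {
    printf("%-8s %s\n", $s, preg_match($pattern, $s) ? 'match' : 'no match');
}
```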
References
regular-expressions.info
Lookarounds, Alternation, Anchors, Repetition, Dot, Character Class
Related questions
How does the regular expression (?<=#)[^#]+(?=#) work?
On alternation precedence
The problem with the original pattern is in misunderstanding the precedence of the alternation | specifier.
Consider the following pattern:
this|that-thing
This pattern consists of two alternates, one that matches "this", and another that matches "that-thing". Contrast this with the following pattern:
(this|that)-thing
Now this pattern matches "this-thing" or "that-thing", thanks to the grouping (…). Coincidentally, it also creates a capturing group (which will capture either "this" or "that"). If you don't need the capturing feature but do need the grouping aspect, use a non-capturing group (?:…).
Another example of where grouping is desired is with repetition: ha{3} matches "haaa", but (ha){3} matches "hahaha".
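The two precedence examples above can be checked with a short PHP sketch: without a group, `|` splits the whole pattern, and grouping also decides what a quantifier repeats.

```php
<?php
// Without a group, | splits the whole pattern into two alternatives.
preg_match('/this|that-thing/', 'this-thing', $m1);
// With a group, the alternation is confined to (...) and "-thing" applies to both.
preg_match('/(this|that)-thing/', 'this-thing', $m2);

echo $m1[0], "\n";                // this
echo $m2[0], ' / ', $m2[1], "\n"; // this-thing / this

// Grouping also decides what a quantifier repeats.
$a = preg_match('/^ha{3}$/', 'haaa');     // "a" repeated three times
$b = preg_match('/^ha{3}$/', 'hahaha');
$c = preg_match('/^(ha){3}$/', 'hahaha'); // "ha" repeated three times
echo "$a $b $c\n";                        // 1 0 1
```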
References
regular-expressions.info/Brackets for Grouping
Your OR is against the wrong grouping. Try:
^((A[^C])|(B)).+[^0123456789]$
In jasonbar's solution, the reason it doesn't match ABC is that it requires A followed by a non-C character, which is two characters, followed by one or more of any character, followed by a non-digit. Thus, if the string begins with an A, the minimum length is 4. You can solve this by using a lookahead assertion.
PHP
$pattern = '#^(A(?=[^C])|B).+\D$#';
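A PHP sketch comparing jasonbar's grouped pattern with this lookahead version on a few short inputs. `(?=[^C])` checks the next character without consuming it, so three-character strings starting with A can now match.

```php
<?php
// jasonbar's grouping fix: A[^C] consumes two characters, so an A-string needs 4+.
$twoChar   = '#^((A[^C])|(B)).+[^0123456789]$#';
// Lookahead version: (?=[^C]) inspects the next character without consuming it.
$lookahead = '#^(A(?=[^C])|B).+\D$#';

foreach (['ABC', 'ADA', 'ACA', 'AD3'] as $s) {
    printf("%-4s two-char: %d  lookahead: %d\n", $s, preg_match($twoChar, $s), preg_match($lookahead, $s));
}
```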
I think it should be like this:
/^(A[^C]|B.).*[^0-9]$/
Try this test code:
$test = "
A
B
AB
AC
AAA
ABA
ACA
AA9
add more
";
$pat = '/^(A[^C]|B.).*[^0-9]$/';
foreach (preg_split('~\s+~', trim($test)) as $p) // trim() avoids empty first/last entries
printf("%5s : %s\n<br>", $p, preg_match($pat, $p) ? "ok" : "not ok");