I have numbers wrapped in curly brackets in my text, e.g. {123} or {456ABC}. I also have numbers not wrapped in brackets, e.g. 789. I want to match these not-yet-wrapped numbers and use PHP's preg_replace to wrap them with pound signs, e.g. #789#. The numbers are usually 1-3 digits long.
print(preg_replace('/\d+/','#$0#',
'1) I can count to 2997510. You can only count to {456ABC}.'));
Desired output:
#1#) I can count to #2997510#. You can only count to {456ABC}.
What regex would match the numbers? I've tried negative lookahead (?![^\{])\d+ and [^\{](\d+)[^\{]
[^\{\dA-F]([A-F\d]+)[^\}\dA-F]
(I'm assuming that you're trying to match hex numbers with capital letters; if not, just alter the character class appropriately.)
The extra \d's are in the negative character classes because if they aren't there, then the engine will avoid brackets by cutting off the outermost digits. For instance, [^\{](\d+)[^\}] will match the 456 in {34567}.
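The truncation described above can be checked directly in PHP. This is only a small sketch: the sample string {34567} comes from the answer, and the variable names are chosen for illustration.

```php
<?php
// Without \d in the negated classes, the engine backtracks and "shaves off"
// the outer digits so that the neighbouring characters are not brackets.
preg_match('/[^\{](\d+)[^\}]/', '{34567}', $loose);
echo $loose[1], "\n"; // 456

// With \d (and A-F) added to the negated classes, no match is possible.
$found = preg_match('/[^\{\dA-F]([A-F\d]+)[^\}\dA-F]/', '{34567}', $strict);
var_dump($found); // int(0)
```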
The number itself is "group 1" of any match. If you need the entire match itself to be the number, use a lookahead and a lookbehind:
(?<=[^\{\dA-F])([A-F\d]+)(?=[^\}\dA-F])
Here is a Perl-style search-and-replace to insert the #'s, with no lookahead or lookbehind:
s/([^\{\dA-F])([A-F\d]+)([^\}\dA-F])/$1#$2#$3/g
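For PHP, the same replacement can be sketched with preg_replace. This version is an assumption rather than the answer's exact pattern: it uses negative lookarounds instead of consuming the neighbouring characters, so numbers at the very start or end of the string are also handled, and it matches plain digits only.

```php
<?php
$text = '1) I can count to 2997510. You can only count to {456ABC}.';

// (?<![{\dA-F])  not preceded by {, a digit, or a hex letter
// \d+            the unwrapped number
// (?![}\dA-F])   not followed by }, a digit, or a hex letter
$result = preg_replace('/(?<![{\dA-F])\d+(?![}\dA-F])/', '#$0#', $text);

echo $result, "\n";
// #1#) I can count to #2997510#. You can only count to {456ABC}.
```

Because nothing around the number is consumed, the replacement only needs $0 and no surrounding groups have to be put back.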
(\A|[^{\d])(\d[\d\w]*)(\z|[^\}\d]) should do it for you.
Used like:
print(preg_replace('/(\A|[^{\d])(\d[\d\w]*)(\z|[^\}\d])/','$1#$2#$3',
'1) I can count to 2997510. You can only count to {456ABC}.'));
Explanation:
The first part (\A|[^{\d]) matches either the start of the input (to catch numbers at the beginning of the string) or a character that is neither { nor a digit. This part ensures the numbers aren't already wrapped.
The second part (\d[\d\w]*) does the actual matching of the number. It matches anything that starts with a digit followed by any number of contiguous digits, letters, or underscores.
The last part (\z|[^\}\d]) is analogous to the first part, except that it looks for the end of the input or a character that is neither } nor a digit. (\z is an anchor, so it belongs in the alternation only, not inside the character class.)
Because this regular expression can capture a character before and after the target number, it is important to add those characters back in using the 1st and 3rd matched subgroups (as seen in the PHP example).
Related
I have a piece of data, retrieved from the database, that contains information I need. The text is entered in free form, so it's written in many different ways. The only thing I know for sure is that I'm looking for the first number after a given string, but between that string and the number there can be any other text as well.
I tried these (where mytoken is the string I know for sure is there), but they don't work.
/(mytoken|MYTOKEN)(.*)\d{1}/
/(mytoken|MYTOKEN)[a-zA-Z]+\d{1}/
/(mytoken|MYTOKEN)(.*)[0-9]/
/(mytoken|MYTOKEN)[a-zA-Z]+[0-9]/
Also, mytoken can be written in capitals, lowercase, or a mix of both. Can the expression be case-insensitive?
You do not need any lazy matching since you want to match any number of non-digit symbols up to the first digit. It is better done with a \D*:
/(mytoken)(\D*)(\d+)/i
The pattern details:
(mytoken) - Group 1 matching mytoken (case insensitively, as there is a /i modifier)
(\D*) - Group 2 matching zero or more characters other than a digit
(\d+) - Group 3 matching 1 or more digits.
Note that \D also matches newlines, whereas . needs the DOTALL modifier (/s) to match across newlines.
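A minimal sketch of how this pattern might be used with preg_match. The sample text is hypothetical, and "mytoken" stands in for the known marker string.

```php
<?php
// Hypothetical sample text with mixed case and a newline before the number.
$subject = "Order ref MyToken:\nbatch no. 4521, box 7";

if (preg_match('/(mytoken)(\D*)(\d+)/i', $subject, $m)) {
    // Group 3 holds the first run of digits after the token.
    echo $m[3], "\n"; // 4521
}
```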
You need to use a lazy quantifier. You can do that by putting a question mark after the star quantifier in the regex: .*?. Otherwise, the dot will greedily consume everything, digits included, up to the last digit, which is then the only one left for \d to match.
Regex: /(mytoken|MYTOKEN)(.*?)\d/
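The difference between the greedy and the lazy quantifier can be seen side by side. In this sketch the input string is made up, and the parentheses around \d are an addition (not part of the regex above) so that the matched digit can be read from a group.

```php
<?php
$subject = 'mytoken abc 12 def 34'; // hypothetical input

// Greedy: .* runs to the end and backtracks, so \d ends up on the LAST digit.
preg_match('/(mytoken|MYTOKEN)(.*)(\d)/', $subject, $greedy);
echo $greedy[3], "\n"; // 4

// Lazy: .*? grows one character at a time, so \d lands on the FIRST digit.
preg_match('/(mytoken|MYTOKEN)(.*?)(\d)/', $subject, $lazy);
echo $lazy[3], "\n"; // 1
```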
You can use the opposite:
/(mytoken|MYTOKEN)(\D+)(\d)/
This says: mytoken, followed by one or more non-digit characters, followed by a digit. The (lazy) dot-star-soup is not always your best bet. The desired digit will be in group 3 in this example; use (\d+) there if you want the whole number.
In my PHP project, I have some PCRE regexes like this one:
preg_match('/[\s^0-9]{0,1}([0-9]{2})[\s^0-9]{0,1}/',$chanson['nom'],$resultPreg1)
This regex catches two digits that may or may not be delimited by a single whitespace character, and must not be delimited by another digit. What I want is for there to be either a space (and no digit) at the beginning, or a space (and no digit) at the end. In other words, there must be at least one delimiter.
How can I do this?
You simply need to split it up and test each case:
/\s\d{2}\D|\D\d{2}\s/
This will match either a space, two digits, and any non-digit character, or a non-digit character, two digits, and a space.
Note: \d is a digit, equivalent to [0-9]. \D is a non-digit, equivalent to [^0-9].
The above regex requires there to be at least one non-digit on each side of the numbers, however. Also, if you had a pattern like .11 22., it would not match both numbers, because the space would be eaten up by the first match. If this is a problem, you can use look-arounds:
/\s\d{2}(?!\d)|(?<!\d)\d{2}\s/
This matches either a space followed by two digits that are not followed by another digit, or two digits that are not preceded by a digit and are followed by a space.
(?!...) is negative look-ahead. It means "the match cannot be followed by this."
(?<!...) is negative look-behind, meaning "the match cannot be preceded by this."
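One caveat, sketched below: as far as I can tell, the look-around pattern above still consumes the space itself, so on the sample string .11 22. it finds only the first number. Moving the whitespace into look-arounds as well is one way around that. The second pattern is an assumption, not part of the answer.

```php
<?php
$subject = '.11 22.'; // sample string from the answer

// Pattern from the answer: the whitespace is still part of the match.
preg_match_all('/\s\d{2}(?!\d)|(?<!\d)\d{2}\s/', $subject, $consuming);
echo json_encode($consuming[0]), "\n"; // ["11 "]

// Variant (an assumption): whitespace is only asserted, never consumed.
preg_match_all('/(?<=\s)\d{2}(?!\d)|(?<!\d)\d{2}(?=\s)/', $subject, $asserting);
echo json_encode($asserting[0]), "\n"; // ["11","22"]
```

(?<=...) and (?=...) are the positive counterparts of the negative look-arounds described above: they require the given text to be present without making it part of the match.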
You can't mix the negative and positive character classes in a single set of square brackets. A "space" OR "not a number" could be written \s|[^0-9]. But a space isn't a number, so no need to put it in specially, just [^0-9] will suffice for you. Your syntax for "zero or one" of {0,1} is technically correct, but there is a much more concise syntax for the same thing: ?.
preg_match('/[^0-9]?([0-9]{2})[^0-9]?/',$chanson['nom'],$resultPreg1)
You could almost use word boundaries (\b) around your number to get what you are looking for, except that it wouldn't find numbers embedded in letters, like "abc23def".
preg_match('/\b([0-9]{2})\b/',$chanson['nom'],$resultPreg1)
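A quick sketch of how the word-boundary version behaves. The sample strings are made up for illustration.

```php
<?php
$pattern = '/\b([0-9]{2})\b/';

// Two digits surrounded by non-word characters: matches.
var_dump(preg_match($pattern, 'Track 23 (live)', $m)); // int(1), $m[1] is "23"

// Letters and digits are all "word" characters, so there is no boundary here.
var_dump(preg_match($pattern, 'abc23def')); // int(0)

// Three digits in a row: no boundary after the second digit.
var_dump(preg_match($pattern, 'Track 123')); // int(0)
```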
I want to write a PHP regular expression to find an uppercase string, which can also contain one digit and spaces, in a text.
For example, from the text "some text to contain EXAM PL E 7STRING uppercase word" I want to get the string EXAM PL E 7STRING.
The found string should start and end with an uppercase letter, but in the middle it may also contain, besides uppercase letters, at most one digit and any number of spaces (neither is required). So the regex should match any of these patterns:
1) EXAMPLESTRING - just uppercase string
2) EXAMP4LESTRING - with number
3) EXAMPLES TRING - with space
4) EXAM PL E STRING - with more than one spaces
5) EXAMP LE4STRING - with number and space
6) EXAMP LE 4ST RI NG - with number and spaces
The total length of the string should be at least 4 letters.
I wrote this regex, '/[A-Z]{1,}([A-Z\s]{2,}|\d?)[A-Z]{1,}/', which can find the first 4 patterns, but I can't figure out how to also match the last 2 patterns.
Thanks
There is a neat trick called a lookahead. It just checks what is following after the current position. That can be used to check for multiple conditions:
'/(?<![A-Z])(?=(?:[A-Z][\s\d]*){3}[A-Z])(?!(?:[A-Z\s]*\d){2})[A-Z][A-Z\s\d]*[A-Z]/'
The first lookaround is actually a lookbehind and checks that there is no previous uppercase letter. This is just a little speedup for strings that would fail the match anyway. The second lookaround (a lookahead) checks that there are at least four letters. The third one checks that there are no two digits. The rest then just matches a string of the allowed characters, starting and ending with an uppercase letter.
Note that in the case of two digits this will not match at all (instead of matching everything up to the second digit). If you do want to match in such a case, you could incorporate the "1 digit" rule into the actual match instead:
'/(?<![A-Z])(?=(?:[A-Z][\s\d]*){3}[A-Z])[A-Z][A-Z\s]*\d?[A-Z\s]*[A-Z]/'
EDIT:
As Ωmega pointed out, this will cause problems if there are fewer than four letters before the second digit, but more after that. This is actually quite tough, because the assertion needs to be that there are at least four letters before the second digit. Since we do not know where the first digit occurs among those four letters, we have to check all possible positions. For this I would do away with the lookaheads altogether, and simply provide the three different alternatives. (I will keep the lookbehind as an optimization for non-matching parts.)
'/(?<![A-Z])[A-Z]\s*(?:\d\s*[A-Z]\s*[A-Z]|[A-Z]\s*\d\s*[A-Z]|[A-Z]\s*[A-Z][A-Z\s]*\d?)[A-Z\s]*[A-Z]/'
Or here with added comments:
'/
(?<!                  # negative lookbehind
    [A-Z]             # current position is not preceded by a letter
)                     # end of lookbehind
[A-Z]                 # match has to start with uppercase letter
\s*                   # optional spaces after first letter
(?:                   # subpattern for possible digit positions
    \d\s*[A-Z]\s*[A-Z]
                      # digit comes after first letter, we need two more letters before last one
  |                   # OR
    [A-Z]\s*\d\s*[A-Z]
                      # digit comes after second letter, we need one more letter before last one
  |                   # OR
    [A-Z]\s*[A-Z][A-Z\s]*\d?
                      # digit comes after third letter, or later, or not at all
)                     # end of subpattern for possible digit positions
[A-Z\s]*              # arbitrary amount of further letters and whitespace
[A-Z]                 # match has to end with uppercase letter
/x'
That gives the same result on Ωmega's lengthy test input.
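As a rough check, the single-line pattern can be run against the six sample strings from the question. This is only a sketch: the surrounding sentence is borrowed from the question's example, and the loop and variable names are illustrative.

```php
<?php
$pattern = '/(?<![A-Z])[A-Z]\s*(?:\d\s*[A-Z]\s*[A-Z]|[A-Z]\s*\d\s*[A-Z]|[A-Z]\s*[A-Z][A-Z\s]*\d?)[A-Z\s]*[A-Z]/';

$samples = [
    'EXAMPLESTRING',
    'EXAMP4LESTRING',
    'EXAMPLES TRING',
    'EXAM PL E STRING',
    'EXAMP LE4STRING',
    'EXAMP LE 4ST RI NG',
];

foreach ($samples as $sample) {
    // Embed each sample in lowercase text, as in the question.
    $text = "some text to contain $sample uppercase word";
    $ok = preg_match($pattern, $text, $m) === 1 && $m[0] === $sample;
    echo ($ok ? 'ok  ' : 'FAIL'), ' ', $sample, "\n";
}
```

Each line should read "ok" if the pattern extracts exactly the uppercase string, without the trailing space.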
I suggest using the regex pattern
[A-Z][ ]*(\d)?(?(1)(?:[ ]*[A-Z]){3,}|[A-Z][ ]*(\d)?(?(2)(?:[ ]*[A-Z]){2,}|[A-Z][ ]*(\d)?(?(3)(?:[ ]*[A-Z]){2,}|[A-Z][ ]*(?:\d|(?:[ ]*[A-Z])+[ ]*\d?))))(?:[ ]*[A-Z])*
[A-Z][ ]*(?:\d(?:[ ]*[A-Z]){2}|[A-Z][ ]*\d[ ]*[A-Z]|(?:[A-Z][ ]*){2,}\d?)[A-Z ]*[A-Z]
I'm trying to work out a regex pattern to search a string for a 12 digit number. The number could have any number of other characters (but not numbers) in front or behind the one I am looking for.
So far I have /([0-9]{12})/ which finds 12 digit numbers correctly, however it also will match on a 13 digit number in the string.
The pattern should match 123456789012 in the following strings:
"rgergiu123456789012ergewrg"
"123456789012"
"#123456789012"
"ergerg ergerwg erwgewrg \n rgergewrgrewg regewrge 123456789012 ergwerg"
It should match nothing in these strings:
"123456789012000"
"egjkrgkergr 123123456789012"
What you want are look-arounds. Something like:
/(?<![0-9])[0-9]{12}(?![0-9])/
A lookahead or lookbehind checks whether the current position is followed or preceded by another pattern, without consuming that pattern; the negative forms used here succeed only when that pattern is absent. So this pattern will match 12 digits only if they are not preceded or followed by more digits, without consuming the characters before and after the numbers.
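A short sketch that runs this pattern over some of the test strings from the question (the loop and variable names are illustrative):

```php
<?php
$pattern = '/(?<![0-9])[0-9]{12}(?![0-9])/';

$subjects = [
    'rgergiu123456789012ergewrg',   // expected: 123456789012
    '123456789012',                 // expected: 123456789012
    '#123456789012',                // expected: 123456789012
    '123456789012000',              // expected: no match (15 digits)
    'egjkrgkergr 123123456789012',  // expected: no match (15 digits)
];

foreach ($subjects as $subject) {
    $matched = preg_match($pattern, $subject, $m) === 1;
    echo $subject, ' => ', ($matched ? $m[0] : '(no match)'), "\n";
}
```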
/\D(\d{12})\D/ (in which case, the number will be capture index 1)
Edit: Whoops, that one doesn't work, if the number is the entire string. Use the one below instead
Or, with negative look-behind and look-ahead: /(?<!\d)\d{12}(?!\d)/ (where the number will be capture index 0)
if( preg_match("/(?<!\d)\d{12}(?!\d)/", $string, $matches) ) {
    $number = $matches[0];
    # ....
}
where $string is the text you're testing
I recently asked a question about formatting a telephone number and got lots of responses. Most of them were great, but there is one I really want to understand, because it worked great. Given the phone number below, how do the other lines work? What are they doing, so I can learn?
$phone = "(407)888-9999";
$phone = preg_replace("~[^0-9]~", "", $phone);
preg_match('~([0-9]{3})([0-9]{3})([0-9]{4})~', $phone, $matches);
Let's break the code into two lines.
preg_replace("~[^0-9]~", "", $phone);
First, we're going to replace matches to a regex with an empty string (in other words, delete matches from the string). The regex is [^0-9] (the ~ on each end is a delimiter). [...] in a regex defines a character class, which tells the regex engine to match one character within the class. Dashes are generally special characters inside a character class, and are used to specify a range (i.e. 0-9 means all characters between 0 and 9, inclusive).
You can think of a character class as shorthand for a big OR condition: i.e. [0-9] is shorthand for 0 or 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9. Note that classes don't have to contain ranges, either -- [aeiou] is a character class that matches a or e or i or o or u (or in other words, any vowel).
When the first character in the class is ^, the class is negated, which means that the regex engine should match any character that isn't in the class. So when you put all that together, the first line removes anything that isn't a digit (a character between 0 and 9) from $phone.
preg_match('~([0-9]{3})([0-9]{3})([0-9]{4})~', $phone, $matches);
The second line tries to match $phone against a second expression, and puts the results into an array called $matches, if a match is made. You will note there are three sets of brackets; these define capturing groups -- i.e. if there is a match of the pattern as a whole, you will end up with three submatches, which in this case will contain the area code, prefix and suffix of the phone number. In general, anything contained in brackets in a regular expression is capturing (while there are exceptions, they are beyond the scope of this explanation). Groups can be useful for other things too, without wanting the overhead of capturing, so a group can be made non-capturing by prefacing it with ?: (i.e. (?:...)).
Each group does a similar thing: [0-9]{3} or [0-9]{4}. As we saw above, [0-9] defines a character class containing the digits between 0 and 9 (as the classes here don't start with ^, these are not negated classes). The {3} or {4} is a repetition operator, which says "match exactly 3 (or 4) of the previous token (or group)". So [0-9]{3} will match exactly three digits in a row, and [0-9]{4} will match exactly four digits in a row. Note that the digits don't all have to be the same (i.e. 111), because the character class is evaluated for each repetition (so 123 will match because 1 matches [0-9], then 2 matches [0-9], and then 3 matches [0-9]).
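Putting the two lines together, a minimal sketch of what ends up in $matches. The final output format "(407) 888-9999" and the $digits variable are assumptions added for illustration; the question does not say how the number should be displayed.

```php
<?php
$phone = "(407)888-9999";

// Step 1: delete every character that is not a digit.
$digits = preg_replace('~[^0-9]~', '', $phone);
echo $digits, "\n"; // 4078889999

// Step 2: split the ten digits into three capturing groups.
if (preg_match('~([0-9]{3})([0-9]{3})([0-9]{4})~', $digits, $matches)) {
    // $matches[0] is the whole match; [1], [2] and [3] are the groups.
    echo "($matches[1]) $matches[2]-$matches[3]\n"; // (407) 888-9999
}
```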
The preg_replace looks for anything that is not 0-9 (the ^ inside the [] means "not"), i.e. anything that is not a digit, and removes it from the string, since the replacement is "".
For the first section, ([0-9]{3}) pulls out the first 3 digits: the {3} is the number of characters to match, the items inside the [] are what to match, and since this is inside parentheses (), it is stored as a match in the array $matches. The second part pulls out the next 3 digits, and the last part pulls out the last 4 digits from $phone; all of them are stored in $matches.
The ~ are delimiters for the regular expressions.
You know it's a regular expression from the regex tag.
So, you are pattern matching.
The pattern you are matching is: [^0-9] followed by the phone number.
[^0-9] means NOT (that is the ^) any one digit.
So, the match after that is any 3 digits, followed by any 3 digits, followed by any 4 digits.
I don't think it will match, because the () around the area code and the dash are missing from the pattern.
I'd do this:
'~\(([0-9]{3})\)([0-9]{3})-([0-9]{4})~'
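A small sketch of this pattern applied to the original, unstripped phone string (that is, without the preg_replace step; the variable names are illustrative):

```php
<?php
$phone = "(407)888-9999";

// The literal parentheses are escaped with backslashes; the dash needs no
// escaping because it is outside a character class.
if (preg_match('~\(([0-9]{3})\)([0-9]{3})-([0-9]{4})~', $phone, $m)) {
    echo $m[1], ' ', $m[2], ' ', $m[3], "\n"; // 407 888 9999
}
```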
"[^0-9]" means everything except the digits 0 to 9. So basically, the first line replaces everything that is not a digit with "" (nothing).
[0-9]{3} means a digit from 0 to 9, 3 times in a row.
So it checks whether you have 3 digits, then 3 digits, then 4 digits, and stores what it matched in $matches.
Check this tutorial:
Using Regular Expressions with PHP
http://www.webcheatsheet.com/php/regular_expressions.php
$phone = "(407)888-9999";
$phone = preg_replace("~[^0-9]~", "", $phone);
In PHP you have to delimit a regex pattern with some non-alphanumeric character; "~" is used here.
[^0-9] is the regex pattern used to remove anything from $phone that is not in the 0-9 range. Remember that a leading ^ inside [...] negates the character class.
preg_match('~([0-9]{3})([0-9]{3})([0-9]{4})~', $phone, $matches);
Again, in this line of code "~" is the delimiter, and
the ([0-9]{3}) part of the pattern returns 3 digits from the string (note: {} is used to specify the range/number of characters to match). Using ( ) in a pattern produces groups/submatches, each stored in its own element of the output array (check your $matches variable for the result).