I'm trying to get a six digit number that is not surrounded by any other number, and is not in a sequence of numbers. This number can exist at the beginning of the string, anywhere in it, and at the end. It can also have commas and text in front of it; most importantly, it must be a distinct block of exactly 6 digits. I've pulled my hair out doing lookaheads and conditions and can't find a complete solution that solves all issues.
Sample data:
00019123211231731ORDER NO 761616 BR ADDRESS 123 A ST
ORDER NO. 760641 JOHN DOE
REF: ORDER #761625
OP212312165 ORDER NUMBER 759699 /REC/YR 123 A ST
766911
761223,761224,761225
(^|\D)(\d{6})(\D|$). You will find your needed 6 digit match in capturing group 2. Notice that this solution is reliable only for one match. It won't find both numbers in 123456,567890 (Thank you Alan for pointing this out!). If multiple matches are needed a lookaround solution should be used.
With look-arounds:
(?<=^|\D)\d{6}(?=\D|$)
or with look-arounds and the condition to be a valid number (i.e. the first digit is not 0):
(?<=^|\D)[1-9]\d{5}(?=\D|$)
You can use a negative lookbehind and negative lookahead to make sure there are no digits adjacent to the match:
(?<!\d)\d{6}(?!\d)
This only matches the number, and not the adjacent characters.
Also, it works if the match is at the beginning or end of the string.
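For reference, here is a minimal Python sketch of the negative look-around pattern run against the sample data from the question. The helper name find_six_digit_numbers is only for illustration. The last lines show the single-match limitation of the capturing-group version on 123456,567890.

```python
import re

# Six digits with no digit immediately before or after them.
SIX_DIGITS = re.compile(r"(?<!\d)\d{6}(?!\d)")

def find_six_digit_numbers(text):
    """Return every standalone 6-digit number in text (hypothetical helper)."""
    return SIX_DIGITS.findall(text)

# Sample lines from the question.
samples = [
    "00019123211231731ORDER NO 761616 BR ADDRESS 123 A ST",
    "ORDER NO. 760641 JOHN DOE",
    "REF: ORDER #761625",
    "OP212312165 ORDER NUMBER 759699 /REC/YR 123 A ST",
    "766911",
    "761223,761224,761225",
]
for line in samples:
    print(find_six_digit_numbers(line))

# The capturing-group version consumes the separator, so it misses the
# second number when two numbers share a single delimiter.
grouped = [m[1] for m in re.findall(r"(^|\D)(\d{6})(\D|$)", "123456,567890")]
print(grouped)
```

Because the look-arounds consume nothing, the comma in the last sample stays available as the boundary for both of its neighbours.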
Couldn't you just as easily use this regex
[^0-9](\d{6})[^0-9]
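It matches any 6 digit number that has a non-digit character on each side, so the number is not part of a longer sequence; the number itself is in capturing group 1. Note that both [^0-9] classes must consume a character, so it will not match a number at the very beginning or end of the string (such as 766911 on its own).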
I'm trying to create regex, which will match 4 digits and 2 letters in any order. Letters can be in lower and upper cases.
Example:
a1234B
17AF45
aR1307
Any advice would be appreciated.
Thanks.
A brute force approach to this might be to just use two positive lookaheads:
^(?=.*[A-Za-z].*[A-Za-z])(?=.*\d.*\d.*\d.*\d).{6}$
This would match exactly two letters, lowercase or uppercase, and four digits, for a total of six characters.
For a deeper explanation, consider the first lookahead:
^(?=.*[A-Za-z].*[A-Za-z])
This says to assert (but not match) from the start of the string that two letters occur anywhere in the string. Assuming this is true, the regex engine will then evaluate the next lookahead, which checks for four digits. If that is also true, then all that is needed is to match any 6 characters. Those matching characters can only be letters and digits, due to the lookaheads: two letters plus four digits already account for all six positions.
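A quick way to check this is a small Python sketch like the one below. The function name is_valid_code is made up for the example, and fullmatch is used so that the whole input has to be consumed.

```python
import re

# Two lookaheads: at least two letters and at least four digits,
# then exactly six characters in total.
CODE = re.compile(r"^(?=.*[A-Za-z].*[A-Za-z])(?=.*\d.*\d.*\d.*\d).{6}$")

def is_valid_code(value):
    """True if value is 4 digits and 2 letters in any order (hypothetical helper)."""
    return CODE.fullmatch(value) is not None

for candidate in ["a1234B", "17AF45", "aR1307", "123456", "abc123", "a1234BC"]:
    print(candidate, is_valid_code(candidate))
```

The three examples from the question pass, while inputs with too few letters, too few digits, or the wrong length are rejected.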
I have some kind of simple and tricky problem.
Here I have a METAR (Weather in a very specific string format).
LIEA 051550Z 21005KT 9999 FEW020 19/14 Q1011
In this string, 051550Z indicates that the weather bulletin was issued on the 5th of the month at 15:50 UTC,... and 9999 indicates the visibility,...
Well, I tried to write a RegExp that would return the visibility, but I couldn't get it to work.
preg_match_all() returns the numbers
0515 (from the time group)
2100 (from the wind group)
9999 (wanted)
1011 (from the pressure group)
with the RegExp I've tried
([0-9]{4})
And then, I blindly added a
(?!Z)
trying not to get at least the time group...
But it doesn't work...
Looking at the problem itself, is it better to always take the third element of the array (without the (?!Z) addition) or to try to match the right value directly?
In my opinion the latter would be better...
So, how can I get the visibility?
You could use a word boundary \b and then match 4 digits to get the visibility:
\b\d{4}\b
If it has to be the 4 digits in the fourth position, you could instead match the first 3 fields: one or more non-whitespace characters (\S+) followed by one or more horizontal whitespace characters (\h+), repeated 3 times.
Then use \K to discard what has been matched so far and match 4 digits followed by a word boundary.
^(?:\S+\h+){3}\K\d{4}\b
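Both ideas can be tried outside PHP as well. The Python sketch below is only an approximation: \K and \h are PCRE features that Python's built-in re module does not support, so the positional variant uses a capturing group and [ \t] instead. The function names are invented for the example.

```python
import re

METAR = "LIEA 051550Z 21005KT 9999 FEW020 19/14 Q1011"

def visibility_by_boundary(metar):
    """First group of exactly 4 digits delimited by word boundaries."""
    match = re.search(r"\b\d{4}\b", metar)
    return match.group(0) if match else None

def visibility_by_position(metar):
    """4 digits in the fourth field; group 1 stands in for PCRE's \\K."""
    match = re.match(r"(?:\S+[ \t]+){3}(\d{4})\b", metar)
    return match.group(1) if match else None

print(visibility_by_boundary(METAR))
print(visibility_by_position(METAR))
```

The word-boundary version skips 051550Z, 21005KT and Q1011 because in each of them a letter or a further digit sits directly next to any 4-digit run.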
Assuming I have a set of numbers (from 1 to 22) divided by some trivial delimiters (comma, point, space, etc). I need to make sure that this set of numbers does not contain any repetition of the same number. Examples:
1,14,22,3 // good
1,12,12,3 // not good
Is it possible to do via regular expression?
I know it's easy to do using just PHP, but I really wonder how to make it work with regex.
Yes, you could achieve this with a regex using a negative lookahead.
^(?!.*\b(\d+)\b.*\b\1\b)\d+(?:,\d+)+$
(?!.*\b(\d+)\b.*\b\1\b) Negative lookahead at the start asserts that there is no repeated number anywhere in the string. \b(\d+)\b.*\b\1\b matches the repeated number.
\d+ matches one or more digits.
(?:,\d+)+ One or more occurrences of a comma followed by one or more digits.
$ Asserts that we are at the end of the string.
OR
Regex for the numbers separated by space, dot, comma as delimiters.
^(?!.*\b(\d+)\b.*\b\1\b)\d+(?:([.\s,])\d+)(?:\2\d+)*$
(?:([.\s,])\d+) The capturing group inside this non-capturing group lets us check that all following delimiters are of the same type, i.e. the above regex won't match strings like 2,3 5.6
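As a rough check, both patterns can be exercised from Python, whose re module handles the backreference inside the lookahead the same way. The constant names below are only for illustration.

```python
import re

# Comma-separated numbers with no number repeated anywhere.
NO_REPEATS = re.compile(r"^(?!.*\b(\d+)\b.*\b\1\b)\d+(?:,\d+)+$")

# Same idea, but the delimiter may be a dot, whitespace or comma,
# and group 2 forces every delimiter to be the same character.
SAME_DELIMITER = re.compile(
    r"^(?!.*\b(\d+)\b.*\b\1\b)\d+(?:([.\s,])\d+)(?:\2\d+)*$"
)

for text in ["1,14,22,3", "1,12,12,3", "1,12,3,12", "2,22"]:
    print(text, bool(NO_REPEATS.match(text)))

for text in ["1 14 22 3", "1.12.12", "2,3 5.6"]:
    print(text, bool(SAME_DELIMITER.match(text)))
```

Note that 2,22 is accepted: the word boundaries make sure 2 is compared as a whole number and not as a prefix of 22.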
You can use this regex:
^(?!.*?(\b\d+)\W+\1\b)\d+(\W+\d+)*$
Negative lookahead (?!.*?(\b\d+)\W+\1\b) rejects the match when 2 identical numbers appear one after another, separated by 1 or more non-word characters.
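One thing worth checking with this pattern is what "one after another" means in practice. The Python sketch below (the constant name is invented) suggests that a repeat is only caught when the two identical numbers are direct neighbours.

```python
import re

# Rejects a number immediately followed by the same number.
ADJACENT_ONLY = re.compile(r"^(?!.*?(\b\d+)\W+\1\b)\d+(\W+\d+)*$")

for text in ["1,14,22,3", "1,12,12,3", "1,12,3,12"]:
    print(text, bool(ADJACENT_ONLY.match(text)))
```

The last input still matches although 12 occurs twice, because another number sits between the two occurrences. If repeats anywhere in the list must be rejected, the lookahead needs a .* between the captured number and the backreference.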
Here is the solution that fit my current need:
^(?>(?!\2\b|\3\b)(1\d{1}|2[0-2]{1}|\d{1}+)[,.; ]+)(?>(?!\1\b|\3\b)(1\d{1}|2[0-2]{1}|\d{1}+)[,.; ]+)(?>(?!\1\b|\2\b)(1\d{1}|2[0-2]{1}|\d{1}+))$
It matches sequences of unique numbers divided by one or more separators, limits each number to the range 1 to 22, and allows only 3 numbers in the sequence.
It's not perfect yet, but it works fine! Thanks a lot to everyone who gave me a hand on this!
I am using some data which gives paths for Google Maps either as a path or as a set of two latitudes and longitudes. I have stored both values as a BLOB in a MySQL database, but I need to detect the values which are not paths when they come out in the result. In an attempt to do this, I have saved them in the BLOB in the following format:
array(lat,lng+lat,lng)
I am using preg_match to find these results, but I haven't managed to get any to work. Here are the regex codes I have tried:
^[a]{1}[r]{2}[a]{1}[y]{1}[\(]{1}[1-9\.\,\+]{1*}[\)]{1}^
^[a]{1}[r]{2}[a]{1}[y]{1}[\(]{1}(\-?\d+(\.\d+)?),(\-?\d+(\.\d+)?)\+(\-?\d+(\.\d+)?),(\-?\d+(\.\d+)?)[\)]{1}^
Regex confuses me sometimes (as it is doing now). Can anyone help me out?
Edit:
The lat can be 2 digits followed by a decimal point and 8 more digits, and the lng can be 3 digits followed by a decimal point and 8 more digits. Both can be positive or negative.
Here are some example lat lngs:
51.51160000,-0.12766000
-53.36442000,132.27519000
51.50628000,0.12699000
-51.50628000,-0.12699000
So a full match would look like:
array(51.51160000,-0.12766000+-53.36442000,132.27519000)
Further Edit
I am using the preg_match() php function to match the regex.
Here are some pointers for writing regex:
If you have a single possibility for a character, for example, the a in array, you can indeed write it as [a]; however, you can also write it as just a.
If you are looking to match exactly one of something, you can indeed write it as a{1}, however, you can also write it as just a.
Applying this throughout, your example of ^[a]{1}[r]{2}[a]{1}[y]{1}[\(]{1}[1-9\.\,\+]{1*}[\)]{1}^ reduces to ^array\([1-9\.\,\+]{1*}\)^ - that's certainly an improvement!
Next, numbers may also include 0's, as well as 1-9. In fact, \d - any digit - is usually used instead of 1-9.
You are using ^ as the delimiter - usually that is /; I didn't recognize it at first. I'm not sure what you can use for the delimiter, so, just in case, I'll change it to the usual /. This makes the above regex /array\([\d\.\,\+]{1*}\)/.
To match one or more of a character or character set, use +, rather than {1*}. This makes your query /array\([\d\.\,\+]+\)/
Then, to collect the resulting numbers (assuming you want only the part between the brackets), put it in (non-escaped) brackets, thus: /array\(([\d\.\,\+]+)\)/ - you would then need to split them, first by +, then by ,. Alternatively, if there are exactly two lat,lng pairs, you might want: /array\(([\d\.]+),([\d\.]+)\+([\d\.]+),([\d\.]+)\)/ - this will return 4 values, one for each number; the additional stuff (+, ,) will already be removed, because it is not in (unescaped) brackets ().
Edit: If you want negative lats and longs (and why wouldn't you?) you will need \-? (a "literal -", rather than part of a range) in the appropriate places; the ? makes it optional (i.e. 0 or 1 dashes). For example, /array\((\-?[\d\.]+),(\-?[\d\.]+)\+(\-?[\d\.]+),(\-?[\d\.]+)\)/
You might also want to check out http://regexpal.com - you can put in a regex and a set of strings, and it will highlight what matches/doesn't match. You will need to exclude the delimiter / or ^.
Note that this is a little fast and loose; it would also match array(5,0+0,1...........). You can nail it down a little more, for example, by using (\-?\d*\.\d+)\) instead of (\-?[\d\.]+)\) for the numbers; that will match (0 or 1 literal -) followed by (0 or more digits) followed by (exactly one literal dot) followed by (1 or more digits).
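To see the difference between the loose and the tightened version, here is a small Python sketch (the question uses PHP's preg_match, but these patterns behave the same way in Python's re module). LOOSE, NUMBER and TIGHT are names made up for the example.

```python
import re

# Loose: any run of digits and dots counts as a number.
LOOSE = re.compile(r"array\((-?[\d.]+),(-?[\d.]+)\+(-?[\d.]+),(-?[\d.]+)\)")

# Tight: optional minus, optional integer part, exactly one dot, 1+ digits.
NUMBER = r"(-?\d*\.\d+)"
TIGHT = re.compile(
    r"array\(" + NUMBER + "," + NUMBER + r"\+" + NUMBER + "," + NUMBER + r"\)"
)

good = "array(51.51160000,-0.12766000+-53.36442000,132.27519000)"
junk = "array(5,0+0,1...........)"

print(bool(LOOSE.search(junk)), bool(TIGHT.search(junk)))
print(TIGHT.search(good).groups())
```

The loose pattern accepts the junk string, while the tight one rejects it and still returns the four numbers of the valid example as separate groups.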
This is the regex I made:
array\((-*\d+\.\d+),(-*\d+\.\d+)\+(-*\d+\.\d+),(-*\d+\.\d+)\)
This also breaks the four numbers into groups so you can get the individual numbers.
You will note the repeated pattern of
(-*\d+\.\d+)
Explanation:
-* means 0 or more matches of the - sign (so the - sign is optional)
\d+ means 1 or more digits
\. means a literal period (decimal point)
\d+ means 1 or more digits
The whole thing is wrapped in brackets to make it a captured group.
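Since the original goal was to tell the two-point values apart from real paths, a sketch of how the captured groups might be used could look like this in Python. Two assumptions to flag: parse_pair is a hypothetical helper, and the sketch deliberately uses -? (at most one minus sign) instead of -*, so that input like --5.0 is rejected before it reaches float().

```python
import re

# One number: optional minus, digits, a dot, digits.
NUM = r"(-?\d+\.\d+)"
PAIR = re.compile(r"array\(" + NUM + "," + NUM + r"\+" + NUM + "," + NUM + r"\)")

def parse_pair(value):
    """Return ((lat1, lng1), (lat2, lng2)), or None if value is not a pair."""
    match = PAIR.fullmatch(value)
    if match is None:
        return None  # treat anything else as a regular path
    lat1, lng1, lat2, lng2 = (float(group) for group in match.groups())
    return (lat1, lng1), (lat2, lng2)

print(parse_pair("array(51.51160000,-0.12766000+-53.36442000,132.27519000)"))
print(parse_pair("some other path data"))
```

fullmatch is used so that a value only counts as a pair when the whole string has the array(lat,lng+lat,lng) shape.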
I am using the following regex to match an account number. When we originally put this regex together, the rule was that an account number would only ever begin with a single letter. That has since changed and I have an account number that has 3 letters at the beginning of the string.
I'd like to have a regex that will match a minimum of 1 letter and a maximum of 3 letters at the beginning of the string. The last issue is the length of the string. It can be as long as 9 characters and a minimum of 3.
Here is what I am currently using.
'/^([A-Za-z]{1})([0-9]{7})$/'
Is there a way to match all of this?
You want:
^[A-Za-z]([A-Za-z]{2}|[A-Za-z][0-9]|[0-9]{2})[0-9]{0,6}$
The initial [A-Za-z] ensures that it starts with a letter, the second bit ([A-Za-z]{2}|[A-Za-z][0-9]|[0-9]{2}) ensures that it's at least three characters long and consists of between one and three letters at the start, and the final bit [0-9]{0,6} allows you to go up to 9 characters in total.
Further explaining:
^ Start of string/line anchor.
[A-Za-z] First character must be alpha.
( [A-Za-z]{2} Second/third character are either alpha/alpha,
|[A-Za-z][0-9] alpha/digit,
|[0-9]{2} or digit/digit
) (also guarantees minimum length of three).
[0-9]{0,6} Then up to six digits (to give length of 3 thru 9).
$ End of string/line marker.
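A few test inputs make the boundaries of this pattern easier to see. The Python sketch below uses the regex unchanged; the sample account numbers are invented.

```python
import re

ACCOUNT = re.compile(
    r"^[A-Za-z]([A-Za-z]{2}|[A-Za-z][0-9]|[0-9]{2})[0-9]{0,6}$"
)

# Invented samples: the first four should match, the rest should not.
samples = ["A1234567", "ABC123456", "AB1", "ABC",
           "A1", "ABCD123", "ABC1234567", "1AB", "A1B"]
for value in samples:
    print(value, bool(ACCOUNT.match(value)))
```

The old format of one letter plus seven digits still matches, and so does the new format with three leading letters, while anything shorter than 3 or longer than 9 characters is rejected.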
Try this:
'/^([A-Za-z]{1,3})([0-9]{0,6})$/'
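That will give you from 1 to 3 letters followed by 0 to 6 digits, so at most 9 characters in total. Note that it does not enforce the minimum length of 3 (a single letter such as A matches), and the cap of 6 digits rejects a number in the original format of one letter plus seven digits.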
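One possible way to keep this simpler shape and still enforce the overall length is a lookahead on the total length. This is only a sketch of that idea, not part of the answer above; the names SIMPLE and WITH_LENGTH are invented.

```python
import re

# The pattern as posted: 1-3 letters, then 0-6 digits.
SIMPLE = re.compile(r"^([A-Za-z]{1,3})([0-9]{0,6})$")

# Variant: the lookahead limits the total length to 3-9 characters,
# so the digit count no longer needs its own upper bound.
WITH_LENGTH = re.compile(r"^(?=.{3,9}$)([A-Za-z]{1,3})([0-9]*)$")

for value in ["A", "A1234567", "ABC123456", "ABC1234567", "ABCD12"]:
    print(value, bool(SIMPLE.match(value)), bool(WITH_LENGTH.match(value)))
```

With the lookahead, the letters and digits are still available as groups 1 and 2, which the original '/^([A-Za-z]{1})([0-9]{7})$/' pattern also provided.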