I am new to regular expressions and am trying to match the following pattern:
Groups of numbers, each looks like either a single number like 12, or a number range like 19-39
Groups are separated by semicolon(;)
All numbers are within range 1-48 (but we don't need to verify this in regular expression)
So an example match would be 12;13;19-39;43
For a single group, I can think of using
\b[1-9]{1}|[1-9]{1}[0-9]{1}\b
for single number, and
\b[1-9]{1}|[1-9]{1}[0-9]{1}-[1-9]{1}|[1-9]{1}[0-9]{1}\b
for number range.
The question is how to take the semicolon (;) into account as well: any number of the groups above, joined by ;, should match.
This should exactly match your requirement:
^\d*[0-9](|-\d*[0-9]|;\d*[0-9])*$
Explanation:
Match one or more digits, starting at the beginning of the string (^).
Next, check for a - or ; followed by another series of digits.
Repeat this as often as it matches, up to the end of the string ($).
Try it out here:
http://gskinner.com/RegExr/
You can paste sample text in the big text area and see the exp in action. Cheers!
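As a minimal sketch in Python (an assumption, since the question names no language), the same idea can be written with a single group spelled out once and reused. The names GROUP, LIST_RE and is_valid are illustrative only. Note that the pattern above also accepts chained input such as 1-2-3; this variant allows at most one dash per group.

```python
import re

# One group: a number, optionally followed by "-" and a second number.
GROUP = r"\d+(?:-\d+)?"
# Whole string: one group, then any number of ";group" repetitions.
LIST_RE = re.compile(rf"{GROUP}(?:;{GROUP})*")

def is_valid(text):
    """Return True if text is a semicolon-separated list of numbers/ranges."""
    # fullmatch anchors the pattern at both ends of the string.
    return LIST_RE.fullmatch(text) is not None

print(is_valid("12;13;19-39;43"))
print(is_valid("1-2-3"))
```

Using fullmatch instead of explicit ^ and $ keeps the pattern itself free of anchors, which makes GROUP easier to reuse elsewhere.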
Try this:
/^\d*[0-9](|-\d*[0-9]|;\d*[0-9])*$/;
It matches your requirement.
One trick to learning these is to break the problem into parts and write brute-force versions to start:
A single number from 1 to 48 followed by ; can be as verbose as: ((\d)|([1-3]\d)|(4[0-8]));
For dashed groups, repeat the same components with a dash between them: ((\d)|([1-3]\d)|(4[0-8]))-((\d)|([1-3]\d)|(4[0-8]));
Now combine the two as an either/or and repeat the whole group: ((((\d)|([1-3]\d)|(4[0-8]));)|(((\d)|([1-3]\d)|(4[0-8]))-((\d)|([1-3]\d)|(4[0-8]));))*
Now we have this gross, brute-force regex with a ridiculous number of groupings, but it works. Next we can think about simplifying, and you have an even better place (sort of) to start asking for help from.
I was going to start simplifying, but you have other answers here already.
Simplifying a little, and noting that your final number does not end with a semicolon, you can start by merging with something like what @Sunny has:
^((\d)|([1-3]\d)|(4[0-8]))(|-((\d)|([1-3]\d)|(4[0-8]))|;((\d)|([1-3]\d)|(4[0-8])))*$
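The build-it-from-parts approach can be sketched in Python by naming each part and composing them. This is an illustration, not the answer's exact pattern: NUM, GROUP, RANGE_LIST and in_range_list are made-up names, and the single-digit case uses [1-9] rather than \d so that 0 is rejected (an assumption about the intended 1-48 range).

```python
import re

# One number in 1-48. The answer uses \d for the single-digit case, which
# also admits 0; [1-9] is used here instead (an assumption about intent).
NUM = r"(?:[1-3]\d|4[0-8]|[1-9])"
# One group: a number, or a dashed range of two numbers.
GROUP = rf"{NUM}(?:-{NUM})?"
# The full list: groups joined by semicolons, with no trailing semicolon.
RANGE_LIST = re.compile(rf"{GROUP}(?:;{GROUP})*")

def in_range_list(text):
    """Return True if every number in the list is between 1 and 48."""
    return RANGE_LIST.fullmatch(text) is not None

print(in_range_list("12;13;19-39;43"))
print(in_range_list("49"))
```

Non-capturing groups (?:...) keep the composed pattern from accumulating the large number of capture groups that the brute-force version has.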
Suppose I have a set of numbers (from 1 to 22) separated by some trivial delimiters (comma, period, space, etc.). I need to make sure that this set does not contain the same number twice. Examples:
1,14,22,3 // good
1,12,12,3 // not good
Is it possible to do via regular expression?
I know it's easy to do using just PHP, but I really wonder how to make it work with regex.
Yes, you could achieve this with a regex by using a negative lookahead.
^(?!.*\b(\d+)\b.*\b\1\b)\d+(?:,\d+)+$
(?!.*\b(\d+)\b.*\b\1\b) The negative lookahead at the start asserts that no number is repeated anywhere in the string. \b(\d+)\b.*\b\1\b matches a repeated number.
\d+ matches one or more digits.
(?:,\d+)+ One or more occurrences of a comma followed by one or more digits.
$ Asserts that we are at the end.
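As a quick check, the pattern can be exercised in Python, whose re module supports the same lookahead and backreference syntax. The names UNIQUE_LIST and has_no_repeats are illustrative. Because of the trailing (?:,\d+)+, the pattern requires at least two numbers.

```python
import re

# The comma-separated pattern from the answer, unchanged.
UNIQUE_LIST = re.compile(r"^(?!.*\b(\d+)\b.*\b\1\b)\d+(?:,\d+)+$")

def has_no_repeats(text):
    """Return True if text is a comma-separated list with no repeated number."""
    return UNIQUE_LIST.match(text) is not None

print(has_no_repeats("1,14,22,3"))   # the "good" example from the question
print(has_no_repeats("1,12,12,3"))   # the "not good" example from the question
```

The word boundaries matter: without \b around the backreference, the 1 in 1,12,2 would count as repeated inside 12.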
OR
Regex for numbers separated by a space, dot, or comma as the delimiter:
^(?!.*\b(\d+)\b.*\b\1\b)\d+(?:([.\s,])\d+)(?:\2\d+)*$
The capturing group inside (?:([.\s,])\d+) records the first delimiter, and \2 then requires every following delimiter to be the same character, i.e. the above regex won't match strings like 2,3 5.6
You can use this regex:
^(?!.*?(\b\d+)\W+\1\b)\d+(\W+\d+)*$
The negative lookahead (?!.*?(\b\d+)\W+\1\b) rejects the string when the same number appears twice in a row, separated by one or more non-word characters.
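A short Python check makes the scope of this pattern concrete: since the lookahead only compares neighbouring numbers, a repeat that is not adjacent still passes. The names ADJACENT_ONLY and passes are illustrative.

```python
import re

# The pattern from the answer, unchanged.
ADJACENT_ONLY = re.compile(r"^(?!.*?(\b\d+)\W+\1\b)\d+(\W+\d+)*$")

def passes(text):
    """Return True if the pattern accepts text."""
    return ADJACENT_ONLY.match(text) is not None

print(passes("1,14,22,3"))   # no repeats at all
print(passes("1,12,12,3"))   # 12 repeated back to back
print(passes("1,12,3,12"))   # 12 repeated, but not adjacent
```

If non-adjacent repeats must also be rejected, the lookahead needs a .* between the captured number and the backreference, as in the other answer.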
Here is the solution that fits my current need:
^(?>(?!\2\b|\3\b)(1\d{1}|2[0-2]{1}|\d{1}+)[,.; ]+)(?>(?!\1\b|\3\b)(1\d{1}|2[0-2]{1}|\d{1}+)[,.; ]+)(?>(?!\1\b|\2\b)(1\d{1}|2[0-2]{1}|\d{1}+))$
It matches sequences of unique numbers separated by one or more separators, limits each number to the range from 1 to 22, and allows exactly 3 numbers in the sequence.
It's not perfect yet, but it works fine! Thanks a lot to everyone who gave me a hand on this!
I am using some data which gives paths for Google Maps, either as a path or as a set of two latitude/longitude pairs. I have stored both kinds of value as a BLOB in a MySQL database, but I need to detect the values which are not paths when they come out in the result. To do this, I have saved them in the BLOB in the following format:
array(lat,lng+lat,lng)
I am using preg_match to find these results, but I haven't managed to get any pattern to work. Here are the regexes I have tried:
^[a]{1}[r]{2}[a]{1}[y]{1}[\(]{1}[1-9\.\,\+]{1*}[\)]{1}^
^[a]{1}[r]{2}[a]{1}[y]{1}[\(]{1}(\-?\d+(\.\d+)?),(\-?\d+(\.\d+)?)\+(\-?\d+(\.\d+)?),(\-?\d+(\.\d+)?)[\)]{1}^
Regex confuses me sometimes (as it is doing now). Can anyone help me out?
Edit:
The lat can be 2 digits followed by a decimal point and 8 more digits, and the lng can be 3 digits followed by a decimal point and 8 more digits. Both can be positive or negative.
Here are some example lat lngs:
51.51160000,-0.12766000
-53.36442000,132.27519000
51.50628000,0.12699000
-51.50628000,-0.12699000
So a full match would look like:
array(51.51160000,-0.12766000+-53.36442000,132.27519000)
Further Edit
I am using the PHP preg_match() function to match the regex.
Here are some pointers for writing regex:
If you have a single possibility for a character, for example, the a in array, you can indeed write it as [a]; however, you can also write it as just a.
If you are looking to match exactly one of something, you can indeed write it as a{1}, however, you can also write it as just a.
Applying this repeatedly, your example of ^[a]{1}[r]{2}[a]{1}[y]{1}[\(]{1}[1-9\.\,\+]{1*}[\)]{1}^ reduces to ^array\([1-9\.\,\+]{1*}\)^ - that's certainly an improvement!
Next, numbers may also include 0's, as well as 1-9. In fact, \d - any digit - is usually used instead of 1-9.
You are using ^ as the delimiter - usually that is /, so I didn't recognize it at first. I'm not sure what you can use for the delimiter, so, just in case, I'll change it to the usual /. This makes the above regex /array\([\d\.\,\+]{1*}\)/.
To match one or more of a character or character set, use +, rather than {1*}. This makes your query /array\([\d\.\,\+]+\)/
Then, to collect the resulting numbers (assuming you want only the part between the brackets), put it in (non-escaped) brackets, thus: /array\(([\d\.\,\+]+)\)/ - you would then need to split them, first by +, then by ,. Alternatively, if there are exactly two lat,lng pairs, you might want: /array\(([\d\.]+),([\d\.]+)\+([\d\.]+),([\d\.]+)\)/ - this will return 4 values, one for each number; the additional stuff (+, ,) will already be removed, because it is not in (unescaped) brackets ().
Edit: If you want negative lats and longs (and why wouldn't you?) you will need \-? (a "literal -", rather than part of a range) in the appropriate places; the ? makes it optional (i.e. 0 or 1 dashes). For example, /array\((\-?[\d\.]+),(\-?[\d\.]+)\+(\-?[\d\.]+),(\-?[\d\.]+)\)/
You might also want to check out http://regexpal.com - you can put in a regex and a set of strings, and it will highlight what matches/doesn't match. You will need to exclude the delimiter / or ^.
Note that this is a little fast and loose; it would also match array(5,0+0,1...........). You can nail it down a little more, for example, by using (\-?\d*\.\d+)\) instead of (\-?[\d\.]+)\) for the numbers; that will match (0 or 1 literal -) followed by (0 or more digits) followed by (exactly one literal dot) followed by (1 or more digits).
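The tightened number pattern can be tried out as follows. This is a sketch in Python rather than PHP (an assumption made for illustration; the regex itself carries over to preg_match), and NUM, PAIR_RE and parse_pair are hypothetical names.

```python
import re

# Optional minus, optional integer digits, one literal dot, one or more digits.
NUM = r"(-?\d*\.\d+)"
# Exactly two lat,lng pairs joined by a literal "+", wrapped in array(...).
PAIR_RE = re.compile(rf"array\({NUM},{NUM}\+{NUM},{NUM}\)")

def parse_pair(value):
    """Return ((lat1, lng1), (lat2, lng2)) as floats, or None if no match."""
    m = PAIR_RE.fullmatch(value)
    if m is None:
        return None
    lat1, lng1, lat2, lng2 = (float(g) for g in m.groups())
    return (lat1, lng1), (lat2, lng2)

print(parse_pair("array(51.51160000,-0.12766000+-53.36442000,132.27519000)"))
print(parse_pair("array(5,0+0,1...........)"))
```

A value that is a longer path, or that is malformed, yields None, which gives the "is this a two-point value?" test the question asks for.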
This is the regex I made:
array\((-*\d+\.\d+),(-*\d+\.\d+)\+(-*\d+\.\d+),(-*\d+\.\d+)\)
This also breaks the four numbers into groups so you can get the individual numbers.
You will note the repeated pattern of
(-*\d+\.\d+)
Explanation:
-* means 0 or more - signs (so the sign is optional; -? would allow at most one)
\d+ means 1 or more digits
\. means a literal period (decimal point)
\d+ means 1 or more digits
The whole thing is wrapped in brackets to make it a captured group.
I want to highlight a group of words that can appear singly or in a row. I'd like them to be highlighted together if they appear one after the other, and if they don't, they should still be highlighted individually, as usual. For instance, if I want to highlight the words:
results as
And the subject is:
real time results: shows results as you type
I'd like the result to be:
real time results: shows <span class="highlighted"> results as </span> you type
The whitespaces are also a headache, because I tried using an or expression:
( results )|( as )
with whitespaces to prevent highlighting words like bass, crash, and so on. But since the whitespace after results is the same as the whitespace before as, the regexp ignores it and only highlights results.
It may be used to highlight many words, so combinations like
( (one) (two) )|( (two) (one) )|( one )|( two )
are not an option :(
Then I thought that there may be an operator that works like | and could be used to match both if possible, else one or the other.
Using spaces to ensure you match full words is the wrong approach. That's what word boundaries are for: \b matches a position between a word and a non-word character (where word characters usually are letters, digits and underscores). To match combinations of your desired words, you can simply put them all in an alternation (like you already do), and repeat as often as possible. Like so:
(?:\bresults\b\s*|\bas\b\s*)+
This assumes that you want to highlight the first and separate results in your example as well (which would satisfy your description of the problem).
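A rough Python sketch of this approach, for illustration only: the helper name highlight is hypothetical, and the pattern is a small variation on the one above that keeps trailing whitespace outside the span instead of swallowing it with \s*.

```python
import re

def highlight(text, words):
    """Wrap each run of consecutive target words in a highlight span.

    Assumes words is a non-empty list of plain words.
    """
    alt = "|".join(re.escape(w) for w in words)
    # One target word, then any further target words separated by whitespace.
    pattern = re.compile(rf"\b(?:{alt})\b(?:\s+(?:{alt})\b)*")
    return pattern.sub(
        lambda m: f'<span class="highlighted">{m.group(0)}</span>', text
    )

print(highlight("real time results: shows results as you type", ["results", "as"]))
```

The \b anchors mean that words such as bass or crash are left alone, and re.escape guards against target words that contain regex metacharacters.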
Perhaps you do not need to match a string of words next to each other. Why not just apply your highlighting like so:
real time results: shows <span class="highlighted">results</span> <span class="highlighted">as</span> you type
The only real difference is that the space between the words is not highlighted, but it's a clean and easy compromise which will save you hours of work and doesn't seem to hurt the UX in the least (in my opinion).
In that case, you could just use alternation:
\b(results|as)\b
(\b being the word boundary anchor)
If you really don't like the space between words not being highlighted, you could write a jQuery function to find "highlighted" spans separated only by white space and then combine them (a "second stage" to achieve your UX design goals).
Update
(OK... so merging spans is actually kind of difficult via jQuery. See Find text between two tags/nodes)
I'm trying to query a database of Book titles based on the first letter of the title. However, I want to ignore common words such as "The" and "A".
So when searching for books that start with the letter "T"
"The Adventures of Huck Finn" - would NOT be matched
"Transformation of a Runner" - would be matched
I'm not very experienced with REGEX, but this is what I have so far (where $first_letter could equal 't')
... WHERE title = '^[(a )(the )]*[$first_letter]' ...
This successfully matches book titles that start with a particular letter even after the words "A" or "The", but doesn't ignore those words. So if $first_letter='t', it would match BOTH books mentioned above.
I've tried googling it, but haven't found any solutions. Any help would be greatly appreciated.
Thanks in advance.
Kevin
Read about MySQL full text search
The regular expression you've written isn't valid. []s are used to denote what is called a character class. Everything you enter between the brackets (with some characters potentially needing to be escaped, such as the literal characters [ and ]) is treated as standing-in for a single character.
Edit: after re-reading my answer, I realized lookaround wasn't a good way to approach this.
The functionality you're groping for is called negative lookahead, negative lookbehind, or some similar variant. I'm unsure whether MySQL's regex flavor supports it, but I don't think it would be a good fit for this problem.
Alternatively, you could do a regex that looks like this:
^((a|the|of|and) )?[letter of interest]
The breakdown:
There are two groups
The inner-most group looks for instances of words you want to ignore
The outer-most group just adds a space to the end of that
The ? asserts that there could be 0 or 1 instances of this group
You'll have to do the legwork of translating this into MySQL regex syntax yourself. My apologies.
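One thing to watch with the optional group: because it may match zero times, a pattern such as ^((a|the) )?t still matches "The Adventures of Huck Finn" through the T of "The". A sketch of one way around this, written in Python for illustration rather than MySQL, is to strip the leading word first and then compare the first remaining letter. The names LEADING_WORD, first_significant_letter and starts_with, and the choice of words to skip, are assumptions.

```python
import re

# Hypothetical list of words to skip; the question mentions "The" and "A".
LEADING_WORD = re.compile(r"^(?:a|the)\s+", re.IGNORECASE)

def first_significant_letter(title):
    """Return the first letter of title after dropping one leading a/the."""
    stripped = LEADING_WORD.sub("", title, count=1)
    return stripped[:1].lower()

def starts_with(title, letter):
    """Return True if title files under letter once leading words are ignored."""
    return first_significant_letter(title) == letter.lower()

print(starts_with("The Adventures of Huck Finn", "t"))
print(starts_with("Transformation of a Runner", "t"))
```

Storing the stripped title (or its first letter) in its own column would let the database do the lookup with a plain comparison instead of a regex.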
I'm trying to devise a regex pattern (in PHP) which will allow for any alternation of two subpatterns. So if pattern A matches a group of three letters, and B matches a group of 2 numerals, all of these would be OK:
aaa
aaa66bbb
66
67abc
12abc34def56ghi78jkl
I don't mind which subpattern starts or ends the sequence, just that after the first match, the subpatterns must alternate. I'm totally stumped by this - any advice will be gratefully received!
Here's a general solution:
^(?:[a-z]{3}(?![a-z]{3})|[0-9]{2}(?![0-9]{2}))+$
It's a simple alternation--three letters or two digits--but the negative lookaheads ensure that the same alternative is never matched twice in a row. Here's a slightly more elegant solution just for PHP:
/^(?:([a-z]{3})(?!(?1))|([0-9]{2})(?!(?2)))+$/
Instead of typing the same subpatterns multiple times, you can put them in capturing groups and use (?1), (?2), etc. to apply them again wherever else you want--in this case, in the lookaheads.
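The general (first) form can be checked against the examples from the question. The sketch below uses Python for illustration; Python's built-in re module has no (?1) subroutine calls, so only the version with the lookaheads written out applies there. The names ALTERNATING and alternates are illustrative.

```python
import re

# Three letters not followed by three more letters, or two digits not
# followed by two more digits, repeated over the whole string.
ALTERNATING = re.compile(r"^(?:[a-z]{3}(?![a-z]{3})|[0-9]{2}(?![0-9]{2}))+$")

def alternates(text):
    """Return True if text is made of strictly alternating subpatterns."""
    return ALTERNATING.match(text) is not None

for sample in ["aaa", "aaa66bbb", "66", "67abc", "12abc34def56ghi78jkl", "aaabbb"]:
    print(sample, alternates(sample))
```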
"/^(?:$A(?:$B$A)*$B?|$B(?:$A$B)*$A?)\$/"
will match either pattern A followed by however many alternating pattern B's and pattern A's, and maybe a final B...or a B followed by however many A-B pairs plus an A if it's there.
I've made this a string (and escaped the final $) because you're going to have some interpolation to do. Make sure $A and $B are in some kind of grouping (like parentheses) if you want the ?'s to match the right thing. In your examples, $A might be '([a-zA-Z]{3})' and $B might be '(\d\d)'.
Note, if you want to match some number of the same letter or digit, or instances of the same set of letters or digits, you'll need to do some magic with backreferences -- probably named ones, since any numbered backreference will depend on the number of capture groups before the one you want (or between the one you want and where you are), but that number gets complicated if the subpatterns have parentheses in them.
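The interpolated pattern translates directly to other languages. Here is a minimal Python sketch, assuming the same two example subpatterns; the function name alternating is hypothetical, and each subpattern is wrapped in a non-capturing group so the quantifiers apply to the whole subpattern.

```python
import re

def alternating(a, b):
    """Build a regex matching A and B strictly alternating, in either order."""
    # Wrap each subpattern so that ?, * apply to it as a unit.
    a, b = f"(?:{a})", f"(?:{b})"
    return re.compile(f"^(?:{a}(?:{b}{a})*{b}?|{b}(?:{a}{b})*{a}?)$")

# Example subpatterns from the answer: three letters and two digits.
ALT = alternating(r"[a-zA-Z]{3}", r"\d\d")

for sample in ["aaa", "aaa66bbb", "66", "67abc", "12abc34def56ghi78jkl", "aaabbb"]:
    print(sample, ALT.match(sample) is not None)
```

Unlike the lookahead version, this form works for any two subpatterns, including ones where a lookahead test on the next few characters would be ambiguous.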
Take a look at this (and check conditional subpatterns). I've personally never used them but seems to be what you're looking for.
/\b(?:(([a-z])\2\2)(?:(([0-9])\4)\1)*(?:([0-9])\5)?|(([0-9])\7)(?:(([a-z])\9\9)\6)*(?:([a-z])\10\10)?)\b/
or if you want to allow any non digit char in the group of three:
/\b(?:((\D)\2\2)(?:((\d)\4)\1)*(?:(\d)\5)?|((\d)\7)(?:((\D)\9\9)\6)*(?:(\D)\10\10)?)\b/
This will match any pattern that consists of two alternating groups: one group is the same character three times, the other is the same digit twice. Because of the backreferences \1 and \6, the first group must reappear unchanged each time it recurs.
This Regex will match
aaa
11
bbb22
33ccc
ddd44ddd
55eee55
fff66fff66
77ggg77ggg
But not
aaa11bbb
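The listed examples can be verified with a short Python script, shown here as an illustration (Python's re module reads \10 as a reference to group 10, so the first pattern carries over unchanged). The names SAME_CHAR_GROUPS and matches are illustrative.

```python
import re

# The first pattern from the answer, split over two lines for readability.
SAME_CHAR_GROUPS = re.compile(
    r"\b(?:(([a-z])\2\2)(?:(([0-9])\4)\1)*(?:([0-9])\5)?"
    r"|(([0-9])\7)(?:(([a-z])\9\9)\6)*(?:([a-z])\10\10)?)\b"
)

def matches(text):
    """Return True if the whole of text is matched by the pattern."""
    return SAME_CHAR_GROUPS.fullmatch(text) is not None

samples = ["aaa", "11", "bbb22", "33ccc", "ddd44ddd", "55eee55",
           "fff66fff66", "77ggg77ggg", "aaa11bbb"]
for sample in samples:
    print(sample, matches(sample))
```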