Problem
I have a working port validity checker, however I need to separate the ports by a comma (no spaces). For instance instead of just '80' being valid, now '80,443,8080' would be valid.
Regex
(^(?:6553[0-5]|655[0-2][0-9]|65[0-4][0-9]{2}|6[0-4][0-9]{3}|[1-5][0-9]{4}|[1-9][0-9]{1,3}|[0-9])$
Tried
I realise that I may need to break the query up, so tried many things including appending this (,\n|,?$) to the end of the query, however this did not work.
Since it is PHP, with PCRE regex flavor, you can easily recurse the subpattern with a subroutine:
^(6553[0-5]|655[0-2][0-9]|65[0-4][0-9]{2}|6[0-4][0-9]{3}|[1-5][0-9]{4}|[1-9][0-9]{1,3}|[0-9])(?:,(?1))*$
See the regex demo
Explanation:
^ - start of string
(6553[0-5]|655[0-2][0-9]|65[0-4][0-9]{2}|6[0-4][0-9]{3}|[1-5][0-9]{4}|[1-9][0-9]{1,3}|[0-9]) - Group 1: the single port validation subpattern
(?:,(?1))* - 0+ sequences of , followed with the subpattern above (with the (?1) subroutine that re-uses the pattern inside the Group 1)
$ - end of string
Other way, you can negate what you are looking for. This means:
5 digits numbers upper to 65535, or at least 6 digit numbers
contiguous commas, or commas at the end or beginning of the string
characters that are not a digit or a comma
To use this kind of pattern, you obviously need to negate the preg_match function:
return !preg_match('~6(?:5(?:5(?:3(?:[6-9]|\d{2})|[4-9]\d)|[6-9]\d{2})|[6-9]\d{3})|[7-9]\d{4}|[1-9]\d{5}|\B,|,\B|[^\d,]~S', $str);
The S modifier switches on the optimization for non-anchored patterns.
This approach can be interesting since the search stops as soon as the pattern succeeds.
Related
I want to validate MAC address and allow to use only one kind of delimiter.
I use pattern :
^([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2})$
It works fine, but there is some bug.
For example: 01-23:45:67:89-AB is valid according pattern. How to allow use only one kind of delimiter ?
Thanks.
You may use
^[0-9A-Fa-f]{2}(?=([:-]))(?:\1[0-9A-Fa-f]{2}){5}$
See the regex demo
Details
^ - start of a string
[0-9A-Fa-f]{2} - two hex chars
(?=([:-])) - the next char must be a : or -, this value is captured into Group 1
(?:\1[0-9A-Fa-f]{2}){5} - exactly five occurrences of
\1 - the same char that is stored in Group 1 buffer
[0-9A-Fa-f]{2} - two hex chars
$ - end of string.
Alternatively, to shorten the pattern a bit, you may also use
^([0-9A-Fa-f]{2})(?=([:-]))(?:\2(?1)){5}$
See this regex demo. You may also use a case insensitive modifier to "shrink" it even more: '~^([0-9A-F]{2})(?=([:-]))(?:\2(?1)){5}$~i'. The thing is that the first part of the pattern, ([0-9A-Fa-f]{2}), is captured, and (?1) recurses the pattern later (so that you do not need to write it again).
Today I was working with regular expressions at work and during some experimentation I noticed that a regex such as (\w|) compiled. This seems to be an optional group but looking online didn't yield any results.
Is there any practical use of having a group that matches something, but otherwise can match anything? What's the difference between that and (\w|.*)? Thanks.
(\w|) is a verbose way of writing \w?, which checks for \w first, then empty string.
I remove the capturing group, since it seems that () is used for grouping property only. If you actually need the capturing group, then (\w?).
On the same vein, (|\w) is a verbose way of writing \w??, which tries for empty string first, before trying for \w.
(\w|.*) is a different regex altogether. It tries to match (in that order) one word character \w, or 0 or more of any character (except line terminators) .*.
I can't imagine how this regex fragment would be useful, though.
I am tired of being frightened of regular expressions. The topic of this post is limited to PHP implementation of regular expressions, however, any generic regular expression advice would obviously be appreciated (i.e. don't confuse me with scope that is not applicable to PHP).
The following (I believe) will remove any whitespace between numbers. Maybe there is a better way to do so, but I still want to understand what is going on.
$pat="/\b(\d+)\s+(?=\d+\b)/";
$sub="123 345";
$string=preg_replace($pat, "$1", $sub);
Going through the pattern, my interpretation is:
\b A word boundary
\d+ A subpattern of 1 or more digits
\s+ One or more whitespaces
(?=\d+\b) Lookahead assertion of one or more digit followed by a word boundary?
Putting it all together, search for any word boundary followed by one or more digits and then some whitespace, and then do some sort of lookahead assertion on it, and save the results in $1 so it can replace the pattern?
Questions:
Is my above interpretation correct?
What is that lookahead assertion all about?
What is the purpose of the leading / and trailing /?
Is my above interpretation correct?
Yes, your interpretation is correct.
What is that lookahead assertion all about?
That lookahead assertion is a way for you to match characters that have a certain pattern in front of them, without actually having to match the pattern.
So basically, using the regex abcd(?=e) to match the string abcde will give you the match: abcd.
The reason that this matches is that the string abcde does in fact contain:
An a
Followed by a b
Followed by a c
Followed by a d that has an e after it (this is a single character!)
It is important to note that after the 4th item it also contains an actual "e" character, which we didn't match.
On the other hand, trying to match the string against the regex abcd(?=f) will fail, since the sequence:
"a", followed by "b", followed by "c", followed by "d that has an f in front of it"
is not found.
What is the purpose of the leading / and trailing /
Those are delimiters, and are used in PHP to distinguish the pattern part of your string from the modifier part of your string. A delimiter can be any character, although I prefer # signs myself. Remember that the character you are using as a delimiter needs to be escaped if it is used in your pattern.
It would be a good idea to watch this video, and the 4 that follow this:
http://blog.themeforest.net/screencasts/regular-expressions-for-dummies/
The rest of the series is found here:
http://blog.themeforest.net/?s=regex+for+dummies
A colleague sent me the series and after watching them all I was much more comfortable using Regular Expressions.
Another good idea would be installing RegexBuddy or Regexr. Especially RegexBuddy is very useful for understanding the workings of a regular expression.
I have to create regex to match ugly abbreviations and numbers. These can be one of following "formats":
1) [any alphabet char length of 1 char][0-9]
2) [double][whitespace][2-3 length of any alphabet char]
I tried to match double:
preg_match("/^-?(?:\d+|\d*\.\d+)$/", $source, $matches);
But I coldn't get it to select following example: 1.1 AA My test title. What is wrong with my regex and how can I add those others to my regex too?
In your regex you say "start of string, followed by maybe a - followed by at least one digit or followed by 0 or more digits, followed by a dot and followed by at least one digit and followed by the end of string.
So you regex could match for example.. 4.5, -.1 etc. This is exactly what you tell it to do.
You test input string does not match since there are other characters present after the number 1.1 and even if it somehow magically matched your "double" matching regex is wrong.
For a double without scientific notation you usually use this regex :
[-+]?\b[0-9]+(\.[0-9]+)?\b
Now that we have this out of our way we need a whitespace \s and
[2-3 length of alphabet]
Now I have no idea what [2-3 length of alphabet] means but by combining the above you get a regex like this :
[-+]?\b[0-9]+(\.[0-9]+)?\b\s[2-3 length of alphabet]
You can also place anchors ^$ if you want the string to match entirely :
^[-+]?\b[0-9]+(\.[0-9]+)?\b\s[2-3 length of alphabet]$
Feel free to ask if you are stuck! :)
I see multiple issues with your regex:
You try to match the whole string (as a number) by the anchors: ^ at the beginning and $ at the end. If you don't want that, remove those.
The number group is non-catching. It will be checked for matches, but those won't be added to $matches. That's because of the ?: internal options you set in (?:...). Remove ?: to make that group catching.
You place the shorter digit-pattern before the longer one. If you swap the order, the regex engine will look for it first and on success prefer it over the shorter one.
Maybe this already solves your issue:
preg_match("/-?(\d*\.\d+|\d+)/", $source, $matches);
Demo
I am trying to validate a string of 3 numbers followed by / then 5 more numbers
I thought this would work
(/^([0-9]+[0-9]+[0-9]+/[0-9]+[0-9]+[0-9]+[0-9]+[0-9])/i)
but it doesn't, any ideas what i'm doing wrong
Try this
preg_match('#^\d{3}/\d{5}#', $string)
The reason yours is not working is due to the + symbols which match "one or more" of the nominated character or character class.
Also, when using forward-slash delimiters (the characters at the start and end of your expression), you need to escape any forward-slashes in the pattern by prefixing them with a backslash, eg
/foo\/bar/
PHP allows you to use alternate delimiters (as in my answer) which is handy if your expression contains many forward-slashes.
First of all, you're using / as the regexp delimiter, so you can't use it in the pattern without escaping it with a backslash. Otherwise, PHP will think that you're pattern ends at the / in the middle (you can see that even StackOverflow's syntax highlighting thinks so).
Second, the + is "greedy", and will match as many characters as it can, so the first [0-9]+ would match the first 3 numbers in one go, leaving nothing for the next two to match.
Third, there's no need to use i, since you're dealing with numbers which aren't upper- or lowercase, so case-sensitivity is a moot point.
Try this instead
/^\d{3}\/\d{5}$/
The \d is shorthand for writing [0-9], and the {3} and {5} means repeat 3 or 5 times, respectively.
(This pattern is anchored to the start and the end of the string. Your pattern was only anchored to the beginning, and if that was on purpose, the remove the $ from my pattern)
I recently found this site useful for debugging regexes:
http://www.regextester.com/index2.html
It assumes use of /.../ (meaning you should not include those slashes in the regex you paste in).
So, after I put your regex ^([0-9]+[0-9]+[0-9]+/[0-9]+[0-9]+[0-9]+[0-9]+[0-9]) in the Regex box and 123/45678 in the Test box I see no match. When I put a backslash in front of the forward slash in the middle, then it recognizes the match. You can then try matching 1234/567890 and discover it still matches. Then you go through and remove all the plus signs and then it correctly stops matching.
What I particularly like about this particular site is the way it shows the partial matches in red, allowing you to see where your regex is working up to.