Conditional regex for URL formatting - php

Looking for an efficient regex to turn this:
/url-example-123.shtml & /url-example-25.shtml
into /url-example-3
Objectives:
remove the .shtml
replace the number with 3 if it is between 25 and 125

Since you didn't give a language, I can only give you a regex. You can use the following to match numbers from 25 to 125: (?<!\d)(?:2[5-9]|[3-9][0-9]|1[01][0-9]|12[0-5])(?!\d). The lookarounds (?<!\d) and (?!\d) stop it from matching part of a longer number without consuming the hyphen in front of it. You can then just replace that match with 3. If there will always be a .shtml at the end, you could add \.shtml to the end of the expression.

The question is vague: it's not clear what should happen if there is no number, or if the number is outside the given range. Now that you've specified PHP, I've edited my answer:
// Replace a number from 25 to 125, plus the trailing .shtml, with 3
echo preg_replace(
    '/(?<=\/url-example-)(2[5-9]|[3-9][0-9]|1[01][0-9]|12[0-5])\.shtml/',
    '3',
    $myUrl
);
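To see how the lookbehind version behaves on numbers outside the range, here is a minimal sketch that wraps it in a function. The name rewrite_url and the sample URLs are illustrative assumptions; the pattern uses [3-9][0-9] for 30-99 and escapes the dot before shtml.

```php
<?php
// Hypothetical helper: /url-example-<N>.shtml becomes /url-example-3
// when 25 <= N <= 125; any other URL is returned unchanged.
function rewrite_url(string $url): string
{
    return preg_replace(
        '/(?<=\/url-example-)(?:2[5-9]|[3-9][0-9]|1[01][0-9]|12[0-5])\.shtml/',
        '3',
        $url
    );
}

echo rewrite_url('/url-example-123.shtml'), "\n"; // /url-example-3
echo rewrite_url('/url-example-25.shtml'), "\n";  // /url-example-3
echo rewrite_url('/url-example-7.shtml'), "\n";   // /url-example-7.shtml (out of range)
```

Because the lookbehind only checks the /url-example- prefix without consuming it, the prefix survives the replacement untouched.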

Related

Combine two regular expressions for php

I have these two regular expressions:
^(((98)|(\+98)|(0098)|0)(9){1}[0-9]{9})+$
^(9){1}[0-9]{9}+$
How can I combine them into one?
Valid phone numbers start with 0098, +98, 98, 09 or 9.
Samples:
00989151855454
+989151855454
989151855454
09151855454
9151855454
You haven't provided what passes and what doesn't, but I think this will work if I understand correctly...
/^\+?0{0,2}98?/
^ Matches the start of the string
\+? Matches 0 or 1 plus symbols (the backslash escapes it)
0{0,2} Matches between 0 and 2 (0, 1, or 2) of the 0 character
9 Matches a literal 9
8? Matches 0 or 1 literal 8 character
Looking at your second regex, it looks like you want to make the first part ((98)|(\+98)|(0098)|0) of your first regex optional. Just put ? after it and it will also accept the numbers allowed by your second regex. Change this,
^(((98)|(\+98)|(0098)|0)(9){1}[0-9]{9})+$
to,
^(?:98|\+98|0098|0)?9[0-9]{9}$
The ? after the non-capturing group (?:...) makes that group optional; the group holds the various prefixes you want to allow.
I've made a few more corrections to the regex. {1} is redundant, since a token matches exactly once by default, and you don't need to group parts of the regex unless you actually need the groups. I've also removed the outermost parentheses and the + after them, as they aren't needed.
This regex
^(?:98|\+98|0098|0)?9[0-9]{9}$
matches
00989151855454
+989151855454
989151855454
09151855454
9151855454
Demo: https://regex101.com/r/VFc4pK/1/
However, note that this requires a 9 as the first digit after the country code or the leading 0.
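To check the combined pattern against the samples from the question, here is a small PHP sketch. The helper name is_valid_phone and the extra non-matching input 08151855454 are illustrative assumptions.

```php
<?php
// Optional prefix (98, +98, 0098 or 0), then a 9 followed by exactly nine digits
const PHONE_RE = '/^(?:98|\+98|0098|0)?9[0-9]{9}$/';

function is_valid_phone(string $number): bool
{
    return preg_match(PHONE_RE, $number) === 1;
}

$samples = ['00989151855454', '+989151855454', '989151855454', '09151855454', '9151855454', '08151855454'];
foreach ($samples as $n) {
    printf("%-15s %s\n", $n, is_valid_phone($n) ? 'valid' : 'invalid');
}
```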

Retrieve 0 or more matches from comma separated list inside parenthesis using regex

I am trying to retrieve matches from a comma-separated list inside parentheses using a regular expression. (I also retrieve the version number in the first capture group, though that's not important to this question.)
What's worth noting is that the expression should ideally handle all possible cases: the list could be empty or could have more than 3 entries, i.e. 0 or more matches in the second capture group.
The expression I have right now looks like this:
SomeText\/(.*)\s\(((,\s)?([\w\s\.]+))*\)
The string I am testing this on looks like this:
SomeText/1.0.4 (debug, OS X 10.11.2, Macbook Pro Retina)
Result of this is:
1. [6-11] `1.0.4`
2. [32-52] `, Macbook Pro Retina`
3. [32-34] `, `
4. [34-52] `Macbook Pro Retina`
The desired result would look like this:
1. [6-11] `1.0.4`
2. [32-52] `debug`
3. [32-34] `OS X 10.11.2`
4. [34-52] `Macbook Pro Retina`
As far as I can see, the expression should work on the test string. What is the cause of the weird results, and how could I improve the expression?
I know there are other ways of solving this problem, but I would like to use a single regular expression if possible. Please don't suggest other options.
When dealing with a varying number of groups, regex ain't the best. Solve it in two steps.
First, break down the statement using a simple regex:
SomeText\/([\d.]*) \(([^)]*)\)
1. [9-14] `1.0.4`
2. [16-55] `debug, OS X 10.11.2, Macbook Pro Retina`
Then just explode the second result by ',' to get your groups.
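A minimal sketch of the two-step approach in PHP: one preg_match for the version and the raw list, then explode() plus trim() for the items. The function name parse_some_text is hypothetical, and treating an empty "()" as an empty list is an assumption about what you want.

```php
<?php
// Hypothetical helper: returns [version, items] or null when the input doesn't match.
function parse_some_text(string $input): ?array
{
    // Step 1: version into $m[1], raw list inside the parentheses into $m[2]
    if (!preg_match('/SomeText\/([\d.]*) \(([^)]*)\)/', $input, $m)) {
        return null;
    }
    // Step 2: split on commas and trim the spaces; an empty list "()" gives []
    $items = $m[2] === '' ? [] : array_map('trim', explode(',', $m[2]));
    return [$m[1], $items];
}

print_r(parse_some_text('SomeText/1.0.4 (debug, OS X 10.11.2, Macbook Pro Retina)'));
```

This handles any number of items, which a fixed set of capture groups cannot.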
Probably the \G anchor works best here for binding the match to an entry point. This regex is designed for input that is always similar to the sample that is provided in your question.
(?<=SomeText\/|\G(?!^))[(,]? *\K[^,)(]+
(?<=SomeText\/|\G(?!^)) the lookbehind is the part the matches are glued to
\G matches where the previous match ended; (?!^) stops it from matching at the start of the string
[(,]? * matches an optional opening parenthesis or comma followed by any number of spaces
\K resets beginning of the reported match
[^,)(]+ matches the wanted characters, that are none of ( ) ,
Take the items from $0, the full match of each hit.
Another idea with use of capture groups.
SomeText\/([^(]*)\(|\G(?!^),? *([^,)]+)
This one, without lookbehind, is a bit more accurate (it also requires the opening parenthesis), performs better (needs fewer steps) and is probably easier to understand and maintain.
SomeText\/([^(]*)\( the entry anchor; the version is captured here to $1
|\G(?!^),? *([^,)]+) or glued to the previous match: an optional comma and spaces, then one or more characters other than , and ) captured to $2
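A sketch of how the capture-group variant with \G can be consumed from PHP with preg_match_all. The function name parse_with_g is hypothetical; the trim() on the version is needed because ([^(]*) also captures the space before the parenthesis.

```php
<?php
// Hypothetical helper built on the \G capture-group pattern.
function parse_with_g(string $input): ?array
{
    $re = '/SomeText\/([^(]*)\(|\G(?!^),? *([^,)]+)/';
    // Default PREG_PATTERN_ORDER: $m[1] lists group 1 for every hit, $m[2] group 2;
    // a group that did not take part in a given hit is reported as ""
    if (!preg_match_all($re, $input, $m)) {
        return null;
    }
    $version = trim($m[1][0]);                          // "1.0.4 " -> "1.0.4"
    $items = array_values(array_filter($m[2], 'strlen')); // drop the "" from the first hit
    return [$version, $items];
}

print_r(parse_with_g('SomeText/1.0.4 (debug, OS X 10.11.2, Macbook Pro Retina)'));
```

Each later hit has to start exactly where the previous one ended, so matching stops at the closing parenthesis.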
Actually, stribizhev was close:
(?:SomeText\/([^() ]*)\s*\(|(?!^)\G),?\s*([^(),]+)(?=[^()]*\))
Just had to make that one class expect at least one match
(?:SomeText\/([0-9.]+)\s*\(|(?!^)\G),?\s*([^(),]+)(?=[^()]*\)) is a little more clear as long as the version number is always numbers and periods.
I wanted to come up with something more elegant than this (though this does actually work):
SomeText\/(.*)\s\(([^\,]+)?\,?\s?([^\,]+)?\,?\s?([^\,]+)?\,?\s?([^\,]+)?\,?\s?([^\,]+)?\,?\s?([^\,]+)?\,?\s?\)
Obviously, the
([^\,]+)?\,?\s?
is repeated 6 times.
(It can be repeated any number of times and it will work for any number of comma-separated items equal to or below that number of times).
I tried to shorten the long, repetitive list of ([^\,]+)?\,?\s? above to
(?:([^\,]+)\,?\s?)*
but it doesn't work and my knowledge of regex is not currently good enough to say why not.
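The reason the shortened (...)* form, and the original expression in the question, lose items is that a capture group inside a repeated group only keeps the text from its last iteration. The earlier items are matched but then overwritten. A small PHP check with the question's own pattern shows this:

```php
<?php
$input = 'SomeText/1.0.4 (debug, OS X 10.11.2, Macbook Pro Retina)';

// The question's pattern: groups 2-4 sit inside the (...)* repetition
preg_match('/SomeText\/(.*)\s\(((,\s)?([\w\s\.]+))*\)/', $input, $m);

// Each repeated group reports only its LAST iteration, so "debug" and
// "OS X 10.11.2" were matched but are no longer available
print_r($m);
```

This reproduces the result from the question: group 2 holds ", Macbook Pro Retina", group 3 ", " and group 4 "Macbook Pro Retina".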
This should solve your problem. Use the code you already have and add something like this: it finds the first comma in your string and removes everything up to and including it.
Use trim() to delete whitespace at the start or the end.
$a = strpos($line, ",");             // position of the first comma
$line = trim(substr($line, $a + 1)); // keep what follows it, minus surrounding spaces
I hope this helps!

Quick PHP regex for digit format

I just spent hours figuring out how to write a regular expression in PHP that I need to only allow the following format of a string to pass:
(any digit)_(any digit)
which would look like:
219211_2
so far I tried a lot of combinations, I think this one was the closest to the solution:
/(\\d+)(_)(\\d+)/
Also, if there is a way to limit the last number (the one after the underscore) to a certain number of digits (e.g. at most 12 digits), that would be nice.
I am still learning regular expressions, so any help is greatly appreciated, thanks.
The following:
\d+_\d{1,12}(?!\d)
Will match "anywhere in the string". If you need to have it either "at the start", "at the end" or "this is the whole thing", then you will want to modify it with anchors
^\d+_\d{1,12}(?!\d) - must be at the start
\d+_\d{1,12}$ - must be at the end
^\d+_\d{1,12}$ - must be the entire string
demo: http://regex101.com/r/jG0eZ7
Explanation:
\d+ - at least one digit
_ - literal underscore
\d{1,12} - between 1 and 12 digits
(?!\d) - followed by "something that is not a digit" (negative lookahead)
The lookahead is important: otherwise it would match the first 12 digits and ignore the 13th. If your number happens to be at the end of the string and you used the form I originally had, [^\d], it would fail to match in that specific case, because [^\d] needs an actual character to match.
Thanks to @sln for pointing that out.
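The anchored and unanchored forms can be compared in PHP. A sketch with two hypothetical helpers: is_digit_pair() requires the whole string to match, while find_digit_pair() searches inside longer text and relies on (?!\d) so a 13th digit is not silently dropped. The D modifier is an extra assumption: without it, $ would also accept a single trailing newline.

```php
<?php
// Whole-string check: digits, underscore, then 1 to 12 digits.
// D makes $ match only at the very end (no trailing "\n" allowed).
function is_digit_pair(string $s): bool
{
    return preg_match('/^\d+_\d{1,12}$/D', $s) === 1;
}

// Search inside longer text: (?!\d) rejects a 13th digit and,
// unlike [^\d], still works at the very end of the string.
function find_digit_pair(string $s): ?string
{
    return preg_match('/\d+_\d{1,12}(?!\d)/', $s, $m) ? $m[0] : null;
}

var_dump(is_digit_pair('219211_2'));         // bool(true)
var_dump(find_digit_pair('id 219211_2.'));   // string(8) "219211_2"
```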
You don't need double escaping \\d in PHP.
Use this regex:
"/^(\d+)_(\d{1,12})$/"
\d{1,12} will match 1 to 12 digits.
It's better to use start/end anchors to avoid matching unexpected input.
Try this:
$regex = '~^(\d+)_(\d+)$~';
$input = '219211_2';
if (preg_match($regex, $input, $result)) {
    print_r($result);
}
Just try with following regex:
^(\d+)_(\d{1,12})$

PHP regular expressions (phonenumber)

I'm having some trouble with a regular expression for phone numbers. I am trying to create a regex that is as broad as possible for European phone numbers. The phone number can start with a + or with two leading 0s, followed by a number between 0 and 40. This prefix is optional, however, so the first part can also be left out. After that, it should all be digits, in groups of at least two, with a space or a - between the groups.
The regex I have put together can be found below.
/((\+|00)+[0-4]+[0-9]+)?([ -]?[0-9]{2,15}){1,5}/
This should match the following structures
0031 34-56-78
0032123456789
0033 123 456 789
0034-123-456-789
+35 34-56-78
+36123456789
+37 123 456 789
+38-123-456-789
...
What it also matches, according to my JavaScript test:
+32 a54b 67-0:
So I must have made a mistake somewhere, but I really can't see it. Any help would be appreciated.
The problem is that you don't use the anchors ^ and $ to tie the pattern to the start and end of the string, so it finds a match anywhere in the string.
/^((\+|00)+[0-4]+[0-9]+)?([ -]?[0-9]{2,15}){1,5}$/
Adding anchors will do the trick: ^ matches the start of the string and $ matches the end.
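A quick PHP check of the anchored pattern against the samples listed in the question and the unwanted input (the constant name EU_PHONE_RE is just for this sketch):

```php
<?php
// The question's pattern with ^ and $ added, so the whole string must match
const EU_PHONE_RE = '/^((\+|00)+[0-4]+[0-9]+)?([ -]?[0-9]{2,15}){1,5}$/';

$samples = [
    '0031 34-56-78', '0032123456789', '0033 123 456 789', '0034-123-456-789',
    '+35 34-56-78', '+36123456789', '+37 123 456 789', '+38-123-456-789',
    '+32 a54b 67-0:', // the unwanted input from the question
];
foreach ($samples as $s) {
    printf("%-18s %s\n", $s, preg_match(EU_PHONE_RE, $s) ? 'match' : 'no match');
}
```

Without the anchors, the last sample matches because the regex can find a run of digits somewhere inside it.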
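Try this; maybe it can help you. (ereg() was removed in PHP 7.0, so use preg_match(), which needs delimiters around the pattern.)
if (preg_match('/^((\([0-9]{3}\) ?)|([0-9]{3}-))?[0-9]{3}-[0-9]{4}$/', $var))
{
    $valid = true;
}
Note that this pattern accepts North American-style numbers such as (555) 123-4567, not the European formats from the question.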
Put ^ at the beginning of the RegExp and $ at the end.

Regular Expression to match dates in YYYY-MM-DD format

I have a regular expression in PHP that looks for the date in the format of YYYY-MM-DD
What I have is: [\d]{4}-[\d]{2}-[\d]{2}
I'm using preg_match to test the date. The problem is that 2009-11-10 works, but 2009-11-1033434 works as well. It's been a while since I've done regex; how do I ensure that it stops at the correct spot? I've tried /([\d]{4}-[\d]{2}-[\d]{2}){1}/, but it returns the same result.
Any help would be greatly appreciated.
What you need is anchors, specifically ^ and $. The former matches the beginning of the string, the latter matches the end.
The other point I would make is that the [] are unnecessary: \d keeps its meaning outside a character class.
So your regex should look like this: /^\d{4}-\d{2}-\d{2}$/.
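A minimal PHP sketch of the anchored version, using the two inputs from the question (the helper name is_ymd_shape is hypothetical):

```php
<?php
// Hypothetical helper: true only when the WHOLE string has the YYYY-MM-DD shape
function is_ymd_shape(string $s): bool
{
    return preg_match('/^\d{4}-\d{2}-\d{2}$/', $s) === 1;
}

var_dump(is_ymd_shape('2009-11-10'));      // bool(true)
var_dump(is_ymd_shape('2009-11-1033434')); // bool(false)
```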
^20[0-2][0-9]-((0[1-9])|(1[0-2]))-(0[1-9]|[12][0-9]|3[0-1])$
I added a little extra check to help with the issue of MM and DD getting mixed up by the user. This doesn't catch all date mix-ups, but it does keep the YYYY part between 2000 and 2029, the MM between 01 and 12 and the DD between 01 and 31.
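A range-checking regex still accepts impossible dates such as 2009-02-31, because it treats every month as having 31 days. One option, which this answer itself doesn't use, is to let the regex check the shape and ranges and then pass the parts to PHP's built-in checkdate(month, day, year). In this sketch the day part is written as (0[1-9]|[12][0-9]|3[01]) so that days like 10 and 20 are accepted; the helper name is_valid_ymd is hypothetical.

```php
<?php
// Hypothetical helper: regex for shape and ranges (years 2000-2029, months 01-12,
// days 01-31), then checkdate() for month lengths and leap years.
function is_valid_ymd(string $s): bool
{
    if (!preg_match('/^(20[0-2][0-9])-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$/', $s, $m)) {
        return false;
    }
    // checkdate() takes month, day, year in that order
    return checkdate((int) $m[2], (int) $m[3], (int) $m[1]);
}

var_dump(is_valid_ymd('2009-11-10')); // bool(true)
var_dump(is_valid_ymd('2009-02-31')); // bool(false)
```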
How do you expect your date to be terminated?
If an end-of-line, then a following $ should do the trick.
If by a non-digit character, then a following negative assertion (?!\d) will similarly work.
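For the non-digit case, a minimal PHP sketch that finds a date inside longer text (the helper name find_date is hypothetical). The negative assertion (?!\d) refuses a date that runs into more digits, and because it does not need a character to be present, it also works when the date is the last thing in the string.

```php
<?php
// Hypothetical helper: returns the first YYYY-MM-DD date not followed by a digit, or null
function find_date(string $text): ?string
{
    return preg_match('/\d{4}-\d{2}-\d{2}(?!\d)/', $text, $m) ? $m[0] : null;
}

var_dump(find_date('Released 2009-11-10, patched later')); // string(10) "2009-11-10"
var_dump(find_date('2009-11-1033434'));                    // NULL
```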
You probably want to put anchors on the expression,
i.e.
^[\d]{4}-[\d]{2}-[\d]{2}$
Note the caret and dollar sign.
You probably want lookahead assertions (assuming your engine supports them; PHP's preg/PCRE does).
A lookahead (here, a positive assertion) lets you say "and it should be followed by X", without X becoming part of the match. Try the following syntax:
\d{4}-\d{2}-\d{2}(?=[^0-9])
The assertion is this part:
(?=[^0-9])
It says "after my regex, the next character must be something other than a digit". Note that it needs an actual character there, so it fails when the date is at the very end of the string; the negative lookahead (?![0-9]) does not have that problem.
If that doesn't get you what you want/need, post an example of your input and your PHP code that's not working. Those two items can be hugely useful in debugging these kinds of problems.
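[\d]{4}-[\d]{2}-[\d]{2}?
where the question mark makes the last quantifier lazy ("non-greedy"). Be aware that a fixed count like {2} matches exactly two digits either way, so this alone does not stop 2009-11-1033434 from matching; you still need an anchor ($) or a lookahead such as (?!\d) after it.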
You could try putting both a '^' and a '$' symbol at the start and end of your expression:
/^[\d]{4}-[\d]{2}-[\d]{2}$/
which match the start and the end of the string respectively.
