I've been struggling to write a regex that uses the "or" (alternation) operator.
For example, given the following strings:
Allowed numbers: 1, 2, 5, 6, 20
"/path/item/1"
"/path/item/2"
"/path/item/5"
etc
The regex that I have been testing is:
"/\/path\/item\/(1|2|5|6|20)/"
What I want is for the regex to match only if the number is exactly 1, 2, 5, 6 or 20.
But for the number 20, the regex matches on the 2 and not on 20.
How can I validate each value independently, so that 2 matches only 2 and not 20, and 20 matches only 20 and not 2?
What should the regex look like to implement this validation?
You need to restrict the search such that the matched digits bring you to the end of the string:
"/\/path\/item\/(1|2|5|6|20)$/"
This means the digits must match exactly, and it does not require re-ordering the permitted values in your regex.
Demonstrated here
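A minimal sketch of the difference the end anchor makes, using the two patterns from this thread. The helper name itemNumber is hypothetical and only there for illustration; it returns the captured number, or null when the pattern does not match.

```php
<?php
// Hypothetical helper: return the captured item number, or null if no match.
function itemNumber(string $pattern, string $path): ?string
{
    return preg_match($pattern, $path, $m) === 1 ? $m[1] : null;
}

$unanchored = '/\/path\/item\/(1|2|5|6|20)/';   // pattern from the question
$anchored   = '/\/path\/item\/(1|2|5|6|20)$/';  // same pattern with $ added

// Without $, the alternative "2" succeeds and the trailing 0 is simply left over.
$a = itemNumber($unanchored, '/path/item/20');

// With $, "2" cannot reach the end of the string, so the engine backtracks to "20".
$b = itemNumber($anchored, '/path/item/20');
$c = itemNumber($anchored, '/path/item/2');

// 21 is not in the list, so the anchored pattern rejects it.
$d = itemNumber($anchored, '/path/item/21');

var_dump($a, $b, $c, $d);
```

With the anchor in place the order of the alternatives no longer matters, because a partial match such as 2 is rejected when more digits follow.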
The key is to list the longer numbers first in the capturing or non-capturing group, so that 20 is tried before 2, such as:
^\/path\/item\/(20|1|2|5|6)$
or
^\/path\/item\/(?:20|1|2|5|6)$
or
\/path\/item\/(?:20|1|2|5|6)
Test
$re = '/^\/path\/item\/(20|1|2|5|6)$/s';
$str = '/path/item/20';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
var_dump($matches);
The expression is explained on the top right panel of this demo, if you wish to explore further or modify it, and in this link, you can watch how it would match against some sample inputs step by step, if you like.
The problem with your pattern is that whenever you tried to match 20, the 2 alternative matched first and the following 0 was ignored. This can be resolved by listing 20 first, like this:
\/path\/item\/(20|1|2|5|6)\/
View Here: https://regex101.com/r/aJf1Q8/1
I'm trying to parse all numbers from a text:
text 2030 text 2,5 text 2.000.000 2,000,000 -200 +31600000000. 200. 2.5 200? 1:200
Based on this regex:
(?<!\S)(\-?|\+?)(\d*\.?\,?\d+|\d{1,3}(,?.?\d{3})*(\.\,\d+)?)(?!\S)
But numbers followed directly by ., ?, ! or , don't match. I only want full matches with preg_match_all.
I guess the problem is in the last part of my regex, (?!\S). I tried different things but I can't figure out how to solve this.
If we don't need to validate the numbers, we could start with a simple expression, something like:
(?:^|\s)([+-]?[\d:.,]*\d)\b
Test
$re = '/(?:^|\s)([+-]?[\d:.,]*\d)\b/s';
$str = 'text 2030 text 2,5 text 2.000.000 2,000,000 -200 +31600000000. 200. 2.5 200? 1:200
';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
var_dump($matches);
In the right panel of this demo, the expression is further explained, if you might be interested.
EDITS:
Another expression would be:
(?:^|\s)\K[+-]?(?:\d+:\d+|\d+(?:[.,]\d{1,3})+|\d+)\b
which would still not validate our numbers and would just collect those listed, including some invalid numbers.
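As a quick check, here is a small sketch that runs the second expression against the sample text from the question. The \K resets the start of the reported match, so the leading whitespace is not included in the results. The variable names are just for illustration.

```php
<?php
// Second expression from the answer above: collects numbers, does not validate them.
$re  = '/(?:^|\s)\K[+-]?(?:\d+:\d+|\d+(?:[.,]\d{1,3})+|\d+)\b/';
$str = 'text 2030 text 2,5 text 2.000.000 2,000,000 -200 +31600000000. 200. 2.5 200? 1:200';

// $matches[0] holds the full matches; trailing ".", "?" etc. are left out
// because \b ends the match right after the last digit.
preg_match_all($re, $str, $matches);
$numbers = $matches[0];

print_r($numbers);
```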
I'm trying to detect and remove phone numbers from messages between users of my marketplace website (I think eBay does something similar).
This is the code I'm using:
$string = preg_replace('/([0-9]+[\- ]?[0-9]+)/', '', $string);
But it's too aggressive: it strips any number with 2 or more digits. How can I set a minimum of, say, 7 digits instead?
To be more precise, the phone numbers can be in any format, like:
3747657654
374-7657654
374-765-7654
(374)765-7654
etc. (I cannot predict what users will write; it depends on their habits.)
Try this regular expression :
/([0-9]+[\- ]?[0-9]{6,})/
changed to match your samples:
Regex101
That would depend on the exact requirements as now you have 1 or more numbers followed by an optional - or space followed by 1 or more numbers again.
If you wanted for example at least 2 numbers before the space or - followed by at least 5 numbers, you could use something like:
$string = preg_replace('/([0-9]{2,}[\- ]?[0-9]{5,})/', '', $string);
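The first quantifier, {2,}, is where you can specify the minimum / maximum number of digits before the separator; the second, {5,}, is where you can specify the minimum / maximum number of digits after it.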
You can try something like this:
$string = preg_replace('/(?<![0-9]|[0-9]-)[0-9](?:[- ]?[0-9]){6}(?!-?[0-9])/', '', $string);
The lookarounds are here to avoid numbers with more than 7 digits, but if you want something more specific, you should provide an example string.
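A rough sketch of how that pattern behaves, assuming a few made-up messages (the sample strings are not from the question). Only runs of exactly 7 digits, optionally separated by single spaces or hyphens, are removed.

```php
<?php
// Pattern from the answer above: exactly 7 digits, separators optional.
$pattern = '/(?<![0-9]|[0-9]-)[0-9](?:[- ]?[0-9]){6}(?!-?[0-9])/';

// Hypothetical messages, for illustration only.
$messages = [
    'call 765-7654 now',  // 7 digits with a hyphen: removed
    'call 7657654 now',   // 7 digits, no separator: removed
    'I have 12 apples',   // short number: kept
    'ref 3747657654',     // 10 digits: kept, the lookahead rejects it
];

$cleaned = array_map(
    fn (string $message): string => preg_replace($pattern, '', $message),
    $messages
);

print_r($cleaned);
```

To catch 10-digit numbers as well, the {6} quantifier would have to become a range, which is a design decision that depends on the formats you expect.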
It is impossible to determine whether a number of X digits (where X is a valid phone number length) is a phone number or something else without some sort of context intelligence happening. A simple regex can't determine the difference between "call me at 3453456" and "call me when you've flown 3453456 miles".
Therefore trying to catch phone numbers without any formatting (just straight digits) with a regex is hopeless, pure and simple. Attempting to do so is only holding you back from finding a regex that can find formatted/semi-formatted numbers. What you should be going for here is "get the obvious and as many others as possible with minimal false positives...but recognize I can't get them all."
For that I'd recommend this:
/1?[ \-]?\(?([0-9]{3})?\)?[ \-]?([0-9]{3})[ \-]([0-9]{4})/g
It should not get the first three, but get all the rest in this list:
no-match: 3747657654
no-match: 444444444444444
no-match: 7657654
match: 374-765-7654
match: 1-374-765-7654
match: (374)765-7654
match: (374) 765 7654
match: 765-7654
match: 1 (374) 765 7654
match: 1(374)765 7654
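Here is a small sketch that checks the list above in PHP. Note that g is not a PCRE modifier in PHP, so the pattern is written without it here; use preg_match_all or preg_replace when you need every occurrence. The variable names are only for illustration.

```php
<?php
// Same pattern as above, without the JavaScript-style "g" flag.
$pattern = '/1?[ \-]?\(?([0-9]{3})?\)?[ \-]?([0-9]{3})[ \-]([0-9]{4})/';

$samples = [
    '3747657654',
    '444444444444444',
    '7657654',
    '374-765-7654',
    '1-374-765-7654',
    '(374)765-7654',
    '(374) 765 7654',
    '765-7654',
    '1 (374) 765 7654',
    '1(374)765 7654',
];

// Keep only the samples in which the pattern finds a formatted phone number.
$matched = array_values(array_filter(
    $samples,
    fn (string $sample): bool => preg_match($pattern, $sample) === 1
));

print_r($matched);
```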
I want to make a regex that finds an exact number within a string.
E.g., finding the number 2 in 3, 5, 25, 22,2, 15
What I have is /*,2,*/.
But this regex matches 22, 25, or anything with a 2 in it. I want it to match only where the number 2 itself is between commas or standing alone without commas.
Update:
Both the number (needle) I look for and the string (haystack) I search in can vary.
E.g., if the number I seek is always 2,
I want to find it in 2,3,44,23,22,1 or 3,4,22,5,2 or 2, and I should get exactly one match for each group of numbers.
You should probably use boundaries (\b) so a leading/trailing comma isn't required.
/\b2\b/
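Since the needle can vary, a possible sketch is to build the pattern at run time. The function name countExact is made up for this example; preg_quote escapes any characters in the needle that are special in a regex.

```php
<?php
// Hypothetical helper: count standalone occurrences of $needle in $haystack.
function countExact(string $needle, string $haystack): int
{
    // \b on both sides keeps "2" from matching inside "22", "23" or "25".
    $pattern = '/\b' . preg_quote($needle, '/') . '\b/';
    return (int) preg_match_all($pattern, $haystack);
}

$counts = [
    countExact('2', '2,3,44,23,22,1'),
    countExact('2', '3,4,22,5,2'),
    countExact('2', '2'),
    countExact('2', '22,25'),
    countExact('22', '3,4,22,5,2'),
];

print_r($counts);
```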
You should do this instead:
,(\d), #for any single digit
,(2), #for 2 in particular
Demo: http://regex101.com/r/vP6jI1
I'm using this regex to match some words without numbers and it works well:
(?:searchForThis|\G).+?(\b[^\d\s]+?\b)
The problem is that the regex searches the entire document and not only the line that contains searchForThis.
So if searchForThis appears twice, it collects the words twice.
I want it to stop after that first line so it does not search the lines after it.
Any help, please?
I'm using regex with PHP.
Example of the problem here: http://www.rubular.com/r/vPhk8VbqZR
In the example you will see :
Match 1
1. word
Match 2
1. worldtwo
Match 3
1. wordfive
Match 4
1. word
Match 5
1. worldtwo
Match 6
1. wordfive
But I need only :
Match 1
1. word
Match 2
1. worldtwo
Match 3
1. wordfive
You can see that it matches everything twice.
===========Edit for more details as asked ===========================
In my php I have :
define('CODE_REGEX', '/(?:searchForThis|\G(?<!^)).*?(\b[a-zA-Z]+\b)/iu');
Output :
if (preg_match_all(CODE_REGEX, $content, $result))
return trim($result[1][0].' '.$result[1][1].' '.$result[1][2].' '.$result[1][3].' '.$result[1][4].' '.$result[1][5]);
Thank you
You can use this pattern instead:
/(?:\A[\s\S]*?searchForThis|\G).*?(\b[a-z]+\b)/iu
or
/(?:\A(?s).*?searchForThis|\G)(?-s).*?(\b[a-z]+\b)/iu
To deal with multiple lines between the first "searchForThis" and the others or the end of the string, you can use this (with your example string you will obtain "After" and "this"):
/(?:\A.*?searchForThis|\G)(?>[^a-z]++|\b[a-z]++\S)*?(?!searchForThis)(\b[a-z]+\b)/ius
Note: in all three patterns you can replace \A with ^, since multiline mode is not used. Be careful with Rubular, which is designed for Ruby regexes: m in Ruby = s in PHP (that is, the dotall/singleline mode), while m in PHP is the multiline mode (^ can match at the start of each line).
You can do it in two stages:
// get the first line with 'searchForThis'
preg_match('/searchForThis(?<line>.*)\n/m', $text, $results);
$line = $results['line'];
// get every word from this line
preg_match_all('/\b[a-z]+\b/i', $line, $results);
$words = $results[0];
Another way, based on Casimir's great answer (just for readability):
preg_match_all('/(?s:^.*?searchForThis|\G).*?(?<words>\b[a-z]+\b)/iu', $str, $results);
$words = $results['words'];
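For reference, the two-stage approach above could be wrapped up roughly like this. The function name wordsAfterFirst and the sample text are assumptions made for the example. It also drops the trailing \n from the first pattern, relying on the fact that . does not match a newline by default, so it still works when the marker is on the last line.

```php
<?php
// Hypothetical helper: words on the first line containing $marker, after the marker.
function wordsAfterFirst(string $marker, string $text): array
{
    // Stage 1: preg_match stops at the first occurrence; .* ends at the line break.
    $first = '/' . preg_quote($marker, '/') . '(?<line>.*)/';
    if (preg_match($first, $text, $m) !== 1) {
        return [];
    }

    // Stage 2: collect the purely alphabetic words from that line only.
    preg_match_all('/\b[a-z]+\b/i', $m['line'], $w);
    return $w[0];
}

// Made-up sample text with the marker on two different lines.
$text = "intro line\nsearchForThis word 123 wordtwo 45 wordfive\nsearchForThis again more\n";
$words = wordsAfterFirst('searchForThis', $text);

print_r($words);
```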
It seems this only happens to me when I write a regex. I have a simple regex to validate a pagination page size (such as 5, 10, 25, 50, 100, 250) that will later be submitted to the database:
/all|5|10|25|50|100|250/
When I test it, the regex above cuts off the 0 from 50, 100 and 250, but not from 10!?
Online example:
http://viper-7.com/IbKFKw
What am I doing wrong here? What am I really missing this time?
This is because in the string 50, the regex first matches 5, which is valid. In the string 250, the regex first matches 25, which is valid and ends here.
You might try adding anchors:
/^(?:all|5|10|25|50|100|250)$/
This forces the regex to match the whole string, and hence, return the correct match you are looking for.
The alternatives are tried from left to right, so matching 5 takes precedence over 50. But there's no 1 to cut off the 0 from 10. You can simply reorder them:
/all|250|100|50|25|10|5/
Alternatively, add the 0 optionally to the relevant alternatives (and since ? is greedy, the 0 will be matched if present):
/all|50?|100?|250?/
or
/all|(?:5|10|25)0?/
If this is not for matching but for validation (i.e. checking against the entire string), then go with Jerry's suggestion and use anchors to make sure that there are no undesired characters around your number:
/^(?:all|5|10|25|50|100|250)$/
(Of course inside (?:...) you could also use any of my above patterns, but now precedence is irrelevant because incomplete matches are disallowed.)
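To see the three behaviours side by side, here is a short sketch. The helper name firstMatch is hypothetical; it returns whatever the pattern actually matched, or null.

```php
<?php
// Hypothetical helper: return the text the pattern matched, or null.
function firstMatch(string $pattern, string $input): ?string
{
    return preg_match($pattern, $input, $m) === 1 ? $m[0] : null;
}

$original  = '/all|5|10|25|50|100|250/';        // pattern from the question
$reordered = '/all|250|100|50|25|10|5/';        // longest alternatives first
$anchored  = '/^(?:all|5|10|25|50|100|250)$/';  // whole-string validation

$results = [
    firstMatch($original, '50'),    // "5": the earlier alternative wins
    firstMatch($original, '250'),   // "25"
    firstMatch($original, '100'),   // "10"
    firstMatch($reordered, '250'),  // "250"
    firstMatch($reordered, '500'),  // "50": reordering alone does not validate
    firstMatch($anchored, '50'),    // "50"
    firstMatch($anchored, '500'),   // null: rejected
];

var_dump($results);
```

The reordered pattern still finds 50 inside 500, which is why the anchored version is the one to use when the value goes to the database.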