Why does my regex group quantifier not work? - php

I am trying to write a regex for Dutch license plates (kentekens). The documentation is very clear, and for now I only want to check the format, not whether the actual letters are allowed.
My regex (regex101) looks as follows:
/(([0-9]{1,2}|[a-z]{1,3})-([0-9]{2,3}|[a-z]{2,3})-([0-9]{1,2}|[a-z]{1,2})){8}/gi
However, this returns no matches, while
/([0-9]{1,2}|[a-z]{1,3})-([0-9]{2,3}|[a-z]{2,3})-([0-9]{1,2}|[a-z]{1,2})/gi
does.
However, I would also like to check the total length.
JS Demo snippet
const regex = /([0-9]{1,2}|[a-z]{1,3})-([0-9]{2,3}|[a-z]{2,3})-([0-9]{1,2}|[a-z]{1,2})/gi;
const str = `XX-99-99
2 1965 99-99-XX
3 1973 99-XX-99
4 1978 XX-99-XX
5 1991 XX-XX-99
6 1999 99-XX-XX
7 2005 99-XXX-9
8 2009 9-XXX-99
9 2006 XX-999-X
10 2008 X-999-XX
11 2015 XXX-99-X`;
let m;
while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}

This is because the {8} quantifier at the end applies to the preceding expression, which in this case is the whole pattern, because it is enclosed in parentheses. The pattern therefore requires eight plates in a row rather than one plate of eight characters. See here what matches this regex.
To test for length, use this regex: (?=^.{1,8}$)(([0-9]{1,2}|[a-z]{1,3})-([0-9]{2,3}|[a-z]{2,3})-([0-9]{1,2}|[a-z]{1,2})). It uses a lookahead to make sure that the following characters match ^.{1,8}$, which means the whole string must contain between 1 and 8 characters; you can adjust the range to your needs.
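As a rough illustration of the length check, the sketch below applies the lookahead to single plate strings in JavaScript. The helper name looksLikePlate is made up for this example, and the pattern is anchored with ^ and $ on the assumption that each input holds exactly one plate.

```javascript
// Format check plus a total-length check done by the lookahead (?=.{1,8}$).
// Assumption: one plate per input string, so the pattern is fully anchored.
const platePattern =
  /^(?=.{1,8}$)(?:[0-9]{1,2}|[a-z]{1,3})-(?:[0-9]{2,3}|[a-z]{2,3})-(?:[0-9]{1,2}|[a-z]{1,2})$/i;

// Hypothetical helper: true when the text has a plate-like format
// and is at most 8 characters long.
function looksLikePlate(text) {
  return platePattern.test(text);
}

console.log(looksLikePlate('XX-99-99'));   // format ok, 8 characters
console.log(looksLikePlate('99-XXX-9'));   // format ok, 8 characters
console.log(looksLikePlate('XXX-999-XX')); // format ok, but 10 characters
```

The third input satisfies every group of the format pattern, so only the lookahead rejects it.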

Related

Regex to match four blocks ranges [duplicate]

What is the regular expression (in JavaScript if it matters) to only match if the text is an exact match? That is, there should be no extra characters at either end of the string.
For example, if I'm trying to match for abc, then 1abc1, 1abc, and abc1 would not match.
Use the start and end delimiters: ^abc$
It depends. You could
string.match(/^abc$/)
But that would not match the following string: 'the first 3 letters of the alphabet are abc. not abc123'
I think you would want to use \b (word boundaries):
var str = 'the first 3 letters of the alphabet are abc. not abc123';
var pat = /\b(abc)\b/g;
console.log(str.match(pat));
Live example: http://jsfiddle.net/uu5VJ/
If the former solution (^abc$) works for you, I would advise against using a regex at all.
It means you probably have something like the following:
var strs = ['abc', 'abc1', 'abc2'];
for (var i = 0; i < strs.length; i++) {
    if (strs[i] == 'abc') {
        // do something
    }
    else {
        // do something else
    }
}
While you could use
if (strs[i].match(/^abc$/g)) {
    // do something
}
It would be considerably more resource-intensive. For me, a general rule of thumb is for a simple string comparison use a conditional expression, for a more dynamic pattern use a regular expression.
More on JavaScript regexes: https://developer.mozilla.org/en/JavaScript/Guide/Regular_Expressions
Use "^" for the beginning of the line and "$" for the end of it. E.g.:
var re = /^abc$/;
Would match "abc" but not "1abc" or "abc1". You can learn more at https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions
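To make the difference between the two suggestions concrete, here is a small sketch that contrasts the anchored pattern with the word-boundary pattern. The function names isExactly and wholeWordMatches are invented for illustration.

```javascript
// ^abc$ accepts only the string "abc" itself.
const exact = /^abc$/;
// \babc\b finds "abc" as a whole word anywhere inside a longer text.
const wholeWord = /\babc\b/g;

// Hypothetical helper: exact-match test.
function isExactly(text) {
  return exact.test(text);
}

// Hypothetical helper: every whole-word occurrence, or an empty array.
function wholeWordMatches(text) {
  return text.match(wholeWord) || [];
}

console.log(isExactly('abc'), isExactly('1abc'), isExactly('abc1'));
console.log(wholeWordMatches('the first 3 letters of the alphabet are abc. not abc123'));
```

In the sample sentence only the standalone "abc" counts as a whole word, because "abc123" has no word boundary after the "c".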

Check input contains either pure numeric characters or pure alpha character using single regular expression

I have a function that returns true if the input is purely numeric or purely alphabetic, and false otherwise. This function works fine.
function checktype($a)
{
    if (preg_match('/^\d+$/', $a)) { //check numeric (can use other numeric regex also like /^[0-9]+$/ etc)
        $return = true;
    } else if (preg_match('/^[a-zA-Z]+$/', $a)) { //check alphabetic
        $return = true;
    } else { //others
        $return = false;
    }
    return $return;
}
var_dump(checktype('abcdfekjh')); //bool(true)
var_dump(checktype('1324654')); //bool(true)
var_dump(checktype('1324654hkjhkjh'));//bool(false)
Next I tried to optimize this function by removing the conditions, so I modified the code to:
function checktype($a)
{
$return = (preg_match('/^\d+$/', $a) || preg_match('/^[a-zA-Z]+$/', $a)) ? true:false;
return $return;
}
var_dump(checktype('abcdfekjh')); //bool(true)
var_dump(checktype('1324654')); //bool(true)
var_dump(checktype('1324654hkjhkjh'));//bool(false)
In the third step I tried to merge both patterns into a single regex, so that I can avoid the two preg_match calls, and got stuck here:
function checktype($a)
{
return (preg_match('regex to check either numeric or alphabates', $a)) ? true:false;
}
I have tried many combinations over the past two days, using the OR operator (|) and the NOT operator (?!), but with no success at all.
Below are some reference sites from which I picked expressions and made some combinations:
http://regexlib.com/UserPatterns.aspx?authorid=26c277f9-61b2-4bf5-bb70-106880138842
http://www.rexegg.com/regex-conditionals.html
OR condition in Regex
Regex not operator (come to know about NOT operator)
https://www.google.co.in/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=regular+expression+not+condition (come to know about NOT operator)
So the main question is: is there a single regex pattern to check whether a string contains only numeric characters or only alphabetic characters?
Note: An alternative solution would be to check whether the string is alphanumeric and then return true or false accordingly. PHP built-in functions like is_numeric and is_string could also be used, but I am more curious to know the single regex pattern that checks whether a string contains only digits or only letters.
A single regex to check whether a string is all ASCII digits or all ASCII letters is
'/^(?:\d+|[a-zA-Z]+)$/'
See regex demo
This regex has two things your regexps do not have:
a grouping construct (?:....)
an alternation operator |.
Explanation:
^ - start of string
(?:\d+ - one or more digits
| - or...
[a-zA-Z]+) - one or more ASCII letters
$ - end of string
If you need to make it Unicode-aware, use [\p{L}\p{M}] instead of [a-zA-Z] (and \p{N} instead of \d, but not necessary) and use the /u modifier:
'/^(?:\p{N}+|[\p{L}\p{M}]+)$/u'
And if you really want to match from the very beginning to the very end of the string, use
'/\A(?:\p{N}+|[\p{L}\p{M}]+)\z/u'
  ^^                        ^^
or
'/^(?:\p{N}+|[\p{L}\p{M}]+)$/Du'
Without the /D modifier, $ does not match only at the very end of the string; it also matches just before a newline that is the last character.
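For comparison, the same alternation can be sketched in JavaScript. The function name checkType and its second parameter are assumptions made for this example. One difference from PHP is worth keeping in mind: without the m flag, $ in JavaScript matches only at the very end of the input, so no equivalent of /D is needed here.

```javascript
// ASCII-only version: all digits OR all letters, nothing else.
const asciiOnly = /^(?:\d+|[a-zA-Z]+)$/;
// Unicode-aware version; \p{...} classes require the u flag in JavaScript.
const unicodeAware = /^(?:\p{N}+|[\p{L}\p{M}]+)$/u;

// Hypothetical helper: pattern defaults to the ASCII-only check.
function checkType(text, pattern = asciiOnly) {
  return pattern.test(text);
}

console.log(checkType('abcdfekjh'));       // letters only
console.log(checkType('1324654'));         // digits only
console.log(checkType('1324654hkjhkjh'));  // mixed
console.log(checkType('héllo'), checkType('héllo', unicodeAware));
```

The grouping (?:...) is what lets the two anchors apply to both alternatives; without it, ^ would bind only to the first branch and $ only to the second.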

How to return false if all numerical values of 0-9 are matching WITH other characters in string?

I am specifically targeting numeric input only. On the front end I am using a JavaScript phone mask that filters user input to (000)000-000, basically [2-9] and [0-9] as the mask (jquery.maskedinput-1.3.js), plus a mobile filter...
jQuery(function ($e) {
var isMobile = navigator.userAgent.match(/(iPhone|iPod|iPad|Android|BlackBerry)/);
$e('#refer').val(window.location.href);
if (!(isMobile)) {
$e('#phone').mask('(299)299-9999');
$e('#field_phone_number').mask('299-299-9999');
}
});
For server side I have a regular expression in PHP as (nothing special yet)
function phonenumber($value)
{
return preg_match("/\(?\b[(. ]?[0-9]{3}\)?[). ]?[0-9]{3}[-. ]?[0-9]{4}\b/i", $value);
}
How can I create a regex or PHP script that catches a number made of one repeated digit, without writing a very long regex for each digit? I just want someone who types in (222)222-2222 to get false in return.
function phonenumber($value)
{
    $prefix = '\d{3}'; // You might want to specify '2\d\d' (200 to 299)
    $regex = '#^(\('.$prefix.'\)|'.$prefix.')[\s\.-]?\d{3}[\.-]?\d{4}$#';
    if (preg_match($regex, $value))
    {
        // Number is in a suitable format
        // Now extract digits -- remove this section to not test repeated pattern
        $digits = preg_replace('#[^\d]+#', '', $value);
        // All numbers equal are rejected
        if (preg_match('#^(\d)\1{9}$#', $digits))
            return false;
        // end of pattern check
        // Otherwise it is accepted
        return true;
    }
    return false; // Not in a recognized format
}
This will accept (299)423-1234 and 277-111-2222, and also (400)1234567 or 4001234567. It will reject (400-1234567 and 400-12-34-56-7. It will also reject (222)222-2222 because of the repeated 2's.
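The two-stage check described above (validate the format, then reject a number made of one repeated digit) could be sketched in JavaScript as follows. This is a port for illustration only; the name isAcceptablePhone is hypothetical.

```javascript
// Hypothetical JavaScript port of the two-stage check.
function isAcceptablePhone(value) {
  // Stage 1: area code with or without parentheses, optional separators.
  const format = /^(\(\d{3}\)|\d{3})[\s.-]?\d{3}[.-]?\d{4}$/;
  if (!format.test(value)) {
    return false; // not in a recognized format
  }
  // Stage 2: strip everything except digits, then reject ten equal digits.
  const digits = value.replace(/\D+/g, '');
  return !/^(\d)\1{9}$/.test(digits);
}

console.log(isAcceptablePhone('(299)423-1234'));
console.log(isAcceptablePhone('(222)222-2222'));
```

Stripping the punctuation first keeps the repeated-digit pattern short, since it no longer has to skip over parentheses and dashes.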
You can use a backreference \1 to detect recurring patterns. In your case you can simply mix in a .* to ignore in-between fillers like ( and -
/(\d)(.*\1){7}/
This looks for a digit followed by at least 7 repetitions of the same digit, ignoring any other characters used as filler. It does not ensure that the repetitions are consecutive, however, so (222)222-8222 would match too.
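A quick way to see what this backreference pattern accepts is to run it against a few inputs. The sketch below uses the pattern exactly as given; the function name hasRepeatedDigit is made up for the example.

```javascript
// A digit, then seven more occurrences of the same digit,
// with any filler characters allowed in between.
const repeated = /(\d)(.*\1){7}/;

// Hypothetical helper: true when some digit occurs at least 8 times.
function hasRepeatedDigit(value) {
  return repeated.test(value);
}

console.log(hasRepeatedDigit('(222)222-2222')); // ten 2s
console.log(hasRepeatedDigit('(222)222-8222')); // nine 2s, one 8
console.log(hasRepeatedDigit('(299)423-1234')); // no digit occurs 8 times
```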

Regex to match specific string not enclosed by another, different specific string

I need a regex to match a string not enclosed by another different, specific string. For instance, in the following situation it would split the content into two groups: 1) The content before the second {Switch} and 2) The content after the second {Switch}. It wouldn't match the first {Switch} because it is enclosed by {my_string}'s. The string will always look like shown below (i.e. {my_string}any content here{/my_string})
Some more
{my_string}
Random content
{Switch} //This {Switch} may or may not be here, but should be ignored if it is present
More random content
{/my_string}
Content here too
{Switch}
More content
So far I've gotten what is below which I know isn't very close at all:
(.*?)\{Switch\}(.*?)
I'm just not sure how to use the [^] (not operator) with a specific string versus different characters.
It really seems you're trying to use a regular expression to parse a grammar - something that regular expressions are really bad at doing. You might be better off writing a parser to break down your string into the tokens that build it, and then processing that tree.
Perhaps something like http://drupal.org/project/grammar_parser might help.
Try this simple function:
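function find_content($doc) {
    $temp = $doc;
    // Collect every {my_string}...{/my_string} block
    preg_match_all('~{my_string}.*?{/my_string}~is', $temp, $x);
    // Mask each block with a placeholder so that its {Switch} is hidden
    $i = 0;
    while (isset($x[0][$i])) {
        $temp = str_replace($x[0][$i], "{REPL:$i}", $temp);
        $i++;
    }
    // Split on the remaining {Switch} markers, then restore the blocks
    $res = explode('{Switch}', $temp);
    foreach ($res as &$part)
        foreach ($x[0] as $id => $content)
            $part = str_replace("{REPL:$id}", $content, $part);
    unset($part); // break the reference left by the foreach
    return $res;
}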
Use it this way
$content_parts = find_content($doc); // $doc is your input document
print_r($content_parts);
Output (your example)
Array
(
[0] => Some more
{my_string}
Random content
{Switch} //This {Switch} may or may not be here, but should be ignored if it is present
More random content
{/my_string}
Content here too
[1] =>
More content
)
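The mask, split, and restore idea behind find_content can also be sketched in JavaScript. The function name splitOutsideBlocks is hypothetical, and the sketch assumes the document does not already contain text of the form {REPL:n}.

```javascript
// Hypothetical helper: split on {Switch}, ignoring any {Switch}
// that sits inside a {my_string}...{/my_string} block.
function splitOutsideBlocks(doc) {
  const blocks = [];
  // Step 1: replace each block with a numbered placeholder and remember it.
  const masked = doc.replace(/\{my_string\}[\s\S]*?\{\/my_string\}/gi, (block) => {
    blocks.push(block);
    return `{REPL:${blocks.length - 1}}`;
  });
  // Step 2: split on the {Switch} markers that are still visible.
  // Step 3: put the original blocks back into each part.
  return masked
    .split('{Switch}')
    .map((part) =>
      part.replace(/\{REPL:(\d+)\}/g, (whole, id) => blocks[Number(id)] ?? whole)
    );
}

const sample = 'Some more {my_string}Random {Switch} content{/my_string} Content here too {Switch} More content';
console.log(splitOutsideBlocks(sample));
```

Using a replacer function rather than a replacement string avoids surprises when a block contains characters such as $ that have a special meaning in replacement strings.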
You can try positive lookahead and lookbehind assertions (http://www.regular-expressions.info/lookaround.html)
It might look something like this:
$content = 'string of text before some random content switch text some more random content string of text after';
$before = preg_quote('string of text before', '/');
$switch = preg_quote('switch text', '/');
$after = preg_quote('string of text after', '/');
// The first group is lazy so that it stops at the switch text
if (preg_match('/(?<=' . $before . ')(.*?)(?:' . $switch . ')(.*)(?=' . $after . ')/', $content, $matches)) {
    // $matches[1] == ' some random content '
    // $matches[2] == ' some more random content '
}
$regex = '~(?:(?!\{my_string\})(.*?))(\{Switch\})(?:(.*?)(?!\{my_string\}))~';
/* if "my_string" and "Switch" aren't wrapped by "{" and "}" just remove "\{" and "\}" */
$yourNewString = preg_replace($regex,"$1",$yourOriginalString);
This might work. I can't test it now, but I'll update later!
I don't know if this is what you're looking for, but to negate more than one character, the regex syntax is:
(?!yourString)
and it is called "negative lookahead assertion".
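As a small illustration of negative lookahead, the sketch below shows two common uses in JavaScript: rejecting a match when a specific string follows, and checking that a string does not occur anywhere in the input. Both patterns and the function name containsNoSwitch are examples invented here, not a solution to the question as posed.

```javascript
// "abc" only when it is NOT immediately followed by "123".
const abcNotBefore123 = /abc(?!123)/;

// Repeats "any character that does not start {Switch}" across the whole input,
// so the pattern matches only strings that contain no {Switch} at all.
const lacksSwitch = /^(?:(?!\{Switch\})[\s\S])*$/;

// Hypothetical helper built on the second pattern.
function containsNoSwitch(text) {
  return lacksSwitch.test(text);
}

console.log(abcNotBefore123.test('abc.'), abcNotBefore123.test('abc123'));
console.log(containsNoSwitch('Random content'), containsNoSwitch('before {Switch} after'));
```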
/Edit:
This should work and return true:
$stringMatchesYourRulesBoolean = preg_match('~(.*?)('.$my_string.')(.*?)(?<!'.$my_string.') ?('.$switch.') ?(?!'.$my_string.')(.*?)('.$my_string.')(.*?)~',$yourString);
Have a look at PHP PEG. It is a little parser written in PHP. You can write your own grammar and parse it. It's going to be very simple in your case.
The grammar syntax and the way of parsing is all explained in the README.md
Extracts from the readme:
token* - Token is optionally repeated
token+ - Token is repeated at least one
token? - Token is optionally present
Tokens may be :
- bare-words, which are recursive matchers - references to token rules defined elsewhere in the grammar,
- literals, surrounded by `"` or `'` quote pairs. No escaping support is provided in literals.
- regexs, surrounded by `/` pairs.
- expressions - single words (match \w+)
Sample grammar: (file EqualRepeat.peg.inc)
class EqualRepeat extends Packrat {
/* Any number of a followed by the same number of b and the same number of c characters
* aabbcc - good
* aaabbbccc - good
* aabbc - bad
* aabbacc - bad
*/
/*Parser:Grammar1
A: "a" A? "b"
B: "b" B? "c"
T: !"b"
X: &(A !"b") "a"+ B !("a" | "b" | "c")
*/
}

PHP Regular Expression Failing

My current regular expression looks correct to me, but it doesn't work as I expect: it never prints "Got match!".
My current code is as follows:
$id = "http://steamcommunity.com/id/TestID";
if (preg_match("^http://steamcommunity\.com/id/.*?\n$", $id)) {
print "Got match!\n";
}
You're missing delimiters on your regex:
if (preg_match("#^http://steamcommunity\.com/id/.*?\n$#", $id)) {
                ^--here                               ^--here
Note that I've used # as the delimiter here, since that saves you from having to escape all of the internal / characters, which you would have to do with the traditional / delimiter.
You need a delimiter, like this:
if (preg_match("#^http://steamcommunity\.com/id/.*?$#", $id)) {
                ^                                   ^
And what's with the newline at the end? Surely you don't need that.
You're missing delimiters. For example:
"#^http://steamcommunity\.com/id/.*?\n$#"
Also, you're trying to match a newline (\n) that isn't in your string.
You need to add the pattern delimiter:
$id = "http://steamcommunity.com/id/TestID";
if (preg_match("#^http://steamcommunity\.com/id/.*?(\n|$)#", $id)) {
print "Got match!\n";
}
There are a couple of things that are wrong with it. First of all, you need to delimit the start and end of your regex with a character. I used #. You're also matching for a new line at the end of your regex, which you don't have and likely won't ever have in your string.
<?php
$id = "http://steamcommunity.com/id/TestID";
if (preg_match("#^http://steamcommunity\.com/id/.*?$#", $id)) {
print "Got match!\n";
}
?>
http://codepad.viper-7.com/L7XctT
First of all, your regex shouldn't even compile because it's missing delimiters.
if (preg_match("~^http://steamcommunity\.com/id/.*?\n$~", $id)) {
                ^---------- these guys here ----------^
Second of all, why do you have a \n if your string doesn't contain a new line?
And finally, why are you using regex at all? Effectively, you are just trying to match a constant string. This should be equivalent to what you are trying to match:
if (strpos($id, 'http://steamcommunity.com/id/') === 0) {
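For readers working in JavaScript, the same two options (a plain prefix check, or a regex without the stray \n) might look like the sketch below. The names PROFILE_PREFIX, isProfileUrl, and profilePattern are made up for this example. JavaScript regex literals carry their own / delimiters, so the missing-delimiter error is specific to PHP's preg functions.

```javascript
// Hypothetical constant: the fixed part of the URL.
const PROFILE_PREFIX = 'http://steamcommunity.com/id/';

// Option 1: no regex at all, just a prefix comparison.
function isProfileUrl(url) {
  return url.startsWith(PROFILE_PREFIX);
}

// Option 2: a regex with the newline requirement removed.
const profilePattern = /^http:\/\/steamcommunity\.com\/id\/.*$/;

const id = 'http://steamcommunity.com/id/TestID';
console.log(isProfileUrl(id), profilePattern.test(id));
```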
Your pattern needs a starting and an ending delimiter, like /pattern/ or #pattern#, or a bracket pair such as (pattern). Why is that? So that pattern modifiers can follow the ending delimiter, as in #pattern#i (ignore case).
preg_match('(^http://steamcommunity\.com/id/.*?\n$)', $id)
As the others say, your pattern is missing its start and end delimiters.
But this will be a better match for a 64-bit Steam ID (minimum 17 and maximum 25 digits):
if (preg_match("#^http://steamcommunity\.com/id/([0-9]{17,25})#i", $id, $matches))
{
    // $matches[1] holds the captured ID; $matches itself is an array
    echo "Got match! - " . $matches[1];
}
I believe that there is no need for you to require that the string must end with a line break.
Explanation.
http://steamcommunity\.com/id/([0-9]{17,25})
^---------- string ----------^^-- Regexp --^
[0-9] - Matches a digit from 0 to 9
{17,25} - Repeats the previous item 17 to 25 times
() - Captures the match
Or use pattern as those (It is the same):
/^http:\/\/steamcommunity\.com\/id\/([0-9]{17,25})/i
(^http://steamcommunity\.com/id/([0-9]{17,25}))i
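To show what the capture group returns, here is a JavaScript sketch of the same pattern. The function name extractNumericId is hypothetical, and the 17-digit value in the usage line is made-up test data, not a real ID.

```javascript
// Same pattern as above, written as a JavaScript regex literal.
const numericIdPattern = /^http:\/\/steamcommunity\.com\/id\/([0-9]{17,25})/i;

// Hypothetical helper: returns the captured digits, or null when there is no match.
function extractNumericId(url) {
  const match = numericIdPattern.exec(url);
  return match ? match[1] : null;
}

console.log(extractNumericId('http://steamcommunity.com/id/12345678901234567'));
console.log(extractNumericId('http://steamcommunity.com/id/TestID'));
```

Index 0 of the match holds the whole matched text and index 1 holds the first capture group, which is why the group is read from match[1].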
Regular Expressions PHP Tutorial
Online regular expression testing <- Don't use a delimiter.
<?php
# URL that generated this code:
# http://txt2re.com/index-php.php3?s=http://steamcommunity.com/id&-1
$txt='http://steamcommunity.com/id';
$re1='(http:\\/\\/steamcommunity\\.com\\/id)'; # HTTP URL 1
if ($c=preg_match_all ("/".$re1."/is", $txt, $matches))
{
$httpurl1=$matches[1][0];
print "($httpurl1) \n";
}
#-----
# Paste the code into a new php file. Then in Unix:
# $ php x.php
#-----
?>
Resources:
http://txt2re.com/index.php3?s=http://steamcommunity.com/id&-1
