Regular Expression for numbers unless all digits are identical - php

I need help writing a regular expression that matches numbers which may be broken up into sections by spaces or dashes, e.g.:
606-606-606
606606606
123 456-789
However, matches should be rejected if all the digits of the number are identical (or if there are any other characters besides [0-9 -]):
111 111 111
111111111
123456789a
If spaces/dashes weren't allowed, the Regex would be simple:
/^(\d)(?!\1*$)\d*$/
But how would I allow dashes and spaces in the number?
EDIT
How would I also allow letters in the same regex (dashes and spaces should still be allowed), e.g.:
aaaaa - it's not ok
aa-aaa-aaa-aaaaa - it's not OK
ababab - it's OK
ab-ab-ab - it's OK
This rule checks only numbers.
^(?!(?:(\d)\1+[ -]*)+$)\d[\d -]+$

Desired results can be achieved by this Regular Expression:
^(?!(?:(\d)\1+[ -]*)+$)\d[\d -]+$
Explanations:
^                 # Start of string
(?!               # Negative lookahead to detect repeated digits
    (?:           # Non-capturing group
        (\d)      # Capture a digit
        \1+       # One or more repeats of the captured digit
        [ -]*     # Any spaces and dashes in between
    )+            # One or more repetitions of the group
    $             # End of string
)                 # End of negative lookahead
\d                # The match starts with a digit
[\d -]+           # One or more digits/dashes/spaces
$                 # End of string
The idea behind this regex is to use a negative lookahead that checks whether the whole string consists of runs of a repeated digit, optionally separated by spaces and dashes. Only if that lookahead fails is the string matched.
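One caveat worth checking: because the group (\d)\1+ is itself repeated, the digit is captured afresh on every pass, so a number such as 1122 is rejected as well, while 1-1-1 is accepted. The following is a minimal sketch of a variant that captures the first digit once and compares every later digit against it. The pattern and the function name isValidNumber are assumptions made for this illustration, not part of the answer above.

```php
<?php
// Sketch of an alternative pattern (not the answer's original):
// capture the FIRST digit once, then reject the input when every remaining
// digit, ignoring spaces and dashes, is that same digit.
const NUMBER_PATTERN = '/^(?!(\d)(?:[ -]*\1)*[ -]*$)\d[\d -]*$/';

// Hypothetical helper name, used only for this example.
function isValidNumber(string $input): bool
{
    return preg_match(NUMBER_PATTERN, $input) === 1;
}

$samples = ['606-606-606', '123 456-789', '1122', '111 111 111', '1-1-1', '123456789a'];
foreach ($samples as $sample) {
    echo $sample, ' => ', isValidNumber($sample) ? 'accepted' : 'rejected', "\n";
}
```

The hyphen is placed last inside each character class so that it is read as a literal character rather than as a range operator.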

Even if you can, I wonder whether a regex is the right tool for this problem. Imagine your fellow developers scratching their heads trying to understand the code: how much time do you grant them? Worse, what if you need to alter the rules?
A small function with some comments could make them happy:
function checkNumberWithSpecialRequirements($number)
{
    // ignore given set of characters
    $cleanNumber = str_replace([' ', '-'], '', $number);
    // handle empty string
    if ($cleanNumber === '') {
        return false;
    }
    // check whether non-digit characters are inside
    if (!ctype_digit($cleanNumber)) {
        return false;
    }
    // check if a character differs from the first (not all equal)
    for ($index = 1; $index < strlen($cleanNumber); $index++) {
        if ($cleanNumber[$index] != $cleanNumber[0]) {
            return true;
        }
    }
    return false;
}
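The same non-regex approach can be stretched to the EDIT in the question, where letters are allowed too. This is only a sketch: the function name hasMixedCharacters is made up for the example, and it assumes that upper- and lower-case letters count as different characters.

```php
<?php
/**
 * Hypothetical helper: accepts letters and digits split by spaces or dashes,
 * but rejects input whose characters are all identical.
 */
function hasMixedCharacters(string $value): bool
{
    // Drop the separators that are allowed but carry no meaning.
    $clean = str_replace([' ', '-'], '', $value);

    // Reject empty input and anything that is not a letter or a digit.
    if ($clean === '' || !ctype_alnum($clean)) {
        return false;
    }

    // More than one distinct character means "not all identical".
    return count(array_unique(str_split($clean))) > 1;
}

foreach (['aaaaa', 'aa-aaa-aaa-aaaaa', 'ababab', 'ab-ab-ab'] as $sample) {
    echo $sample, ' => ', hasMixedCharacters($sample) ? 'OK' : 'not OK', "\n";
}
```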

Related

Validate string to contain only qualifying characters and a specific optional substring in the middle

I'm trying to write a regular expression in PHP. I can get it working in other languages but not in PHP.
I want to validate item names in an array:
They can contain upper and lower case letters, numbers, underscores, and hyphens.
They can contain => as an exact string, not separate characters.
They cannot start with =>.
They cannot finish with =>.
My current code:
$regex = '/^[a-zA-Z0-9-_]+$/'; // contains A-Z a-z 0-9 - _
//$regex = '([^=>]$)'; // doesn't end with =>
//$regex = '~.=>~'; // doesn't start with =>
if (preg_match($regex, 'Field_name_true2')) {
    echo 'true';
} else {
    echo 'false';
}
// Field=>Value-True
// =>False_name
// Bad_name_2=>
Use negative lookarounds. Negative lookahead (?!=>) at the beginning to prohibit beginning with =>, and negative lookbehind (?<!=>) at the end to prohibit ending with =>.
^(?!=>)(?:[a-zA-Z0-9-_]+(=>)?)+(?<!=>)$
There is absolutely no requirement for lookarounds here.
Anchors and an optional group will suffice.
/^[\w-]+(?:=>[\w-]+)?$/
        ^^^^^^^^^^^^^-- this whole non-capturing group is optional
This allows full strings consisting exclusively of [0-9a-zA-Z_-], optionally split ONCE by =>.
The non-capturing group may occur zero or one time.
In other words, => may occur after one or more [\w-] characters, but if it does occur, it MUST be immediately followed by one or more [\w-] characters until the end of the string.
To cover some of the ambiguity in the question requirements:
If foo=>bar=>bam is valid, then use /^[\w-]+(?:=>[\w-]+)*$/ which replaces ? (zero or one) with * (zero or more).
If foo=>=>bar is valid then use /^[\w-]+(?:(?:=>)+[\w-]+)*$/ which replaces => (must occur once) with (?:=>)+ (substring must occur one or more times).
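To see how the three variants above differ, they can be run against the sample names from the question plus the two ambiguous cases. This is a small sketch; the array keys and the function name matchesVariant are labels invented for the example.

```php
<?php
// The three patterns quoted above, keyed by illustrative labels.
const VARIANTS = [
    'once'     => '/^[\w-]+(?:=>[\w-]+)?$/',          // => at most once
    'repeated' => '/^[\w-]+(?:=>[\w-]+)*$/',          // => any number of times
    'stacked'  => '/^[\w-]+(?:(?:=>)+[\w-]+)*$/',     // consecutive => allowed
];

// Hypothetical helper: true when $name matches the chosen variant.
function matchesVariant(string $variant, string $name): bool
{
    return preg_match(VARIANTS[$variant], $name) === 1;
}

$names = ['Field_name_true2', 'Field=>Value-True', '=>False_name',
          'Bad_name_2=>', 'foo=>bar=>bam', 'foo=>=>bar'];
foreach ($names as $name) {
    foreach (array_keys(VARIANTS) as $variant) {
        printf("%-18s %-9s %s\n", $name, $variant,
            matchesVariant($variant, $name) ? 'match' : 'no match');
    }
}
```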
Well, your character ranges are equivalent to \w plus the hyphen, so you could use
^(?!=>)(?:(?!=>$)(?:[-\w]|=>))+$
This construct uses a "tempered greedy token", see a demo on regex101.com.
More shiny, complicated and surely over the top, you could use subroutines as in:
(?(DEFINE)
    (?<chars>[-\w])              # equals to A-Z, a-z, 0-9, _, -
    (?<af>=>)                    # "arrow function"
    (?<item>
        (?!(?&af))               # no af at the beginning
        (?:(?&af)?(?&chars)++)+
        (?!(?&af))               # no af at the end
    )
)
^(?&item)$
See another demo on regex101.com
For the example data, you can use
^[a-zA-Z0-9_-]+=>[a-zA-Z0-9_-]+$
The pattern matches:
^ Start of string
[a-zA-Z0-9_-]+ Match 1+ times any of the listed ranges or characters (can not start with =>)
=> Match literally
[a-zA-Z0-9_-]+ Match again 1+ times any of the listed ranges or characters
$ End of string
If you want to allow for optional spaces:
^\h*[a-zA-Z0-9_-]+\h*=>\h*[a-zA-Z0-9_-]+\h*$
Note that [a-zA-Z0-9_-] can be written as [\w-]

Validate specific string

I would like to validate if user input string is in correct form for further processing / database update.
Form:
elephant1:elephant2:elephant3;cat1:cat2:cat3;unicorn1:unicorn2:unicorn3
: as separator between siblings and ; as separator between groups of siblings
Rules:
There are ALWAYS 3 siblings. Since this is meant just for personal bulk import, I only want to avoid mistakes with very long strings. As for the groups, there could be one or more, so the group separator is not obligatory. Sibling names consist of letters only, with the exception of the underscore (_), which stands in for spaces when a name has two or more words.
I was thinking of a regex, but I am not very familiar with it. If there are any other, simpler ways to achieve this, please suggest them.
Valid examples
Any number of groups, separated by semicolons, each containing exactly three (3) members separated by colons. As mentioned before, names consist of letters only, with the underscore standing in for a space in multi-word names.
VALIDS:
john:mike:dave;jenny:helen:jessica
dog:cat:frog;car:boat:ship;house:flat:shack
meat:vegetable:fruit
UPDATE:
This is what I came up with while trying to understand your answers; it works fine so far:
"/^(([a-z]+:[a-z]+:[a-z]+;?)+)$/"
Upgraded to Roman's answer
"/([a-z_]+:[a-z_]+:[a-z_]+;?)+/i"
allowing function to ignore spaces, tabs and allow underscores where items have multiple words.
A solution using the preg_match() function with a specific regex pattern:
$str = 'og:cat:frog;car:boat:ship;house:flat:shack';
if (preg_match("/([a-z_]+:[a-z_]+:[a-z_]+;?)+/i", $str)) {
    echo 'valid';
} else {
    echo 'invalid';
}
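Note that the pattern above has no ^ or $ anchors, so preg_match() succeeds as soon as any part of the input contains one valid group. A short sketch of the difference follows; the anchored pattern and both function names are assumptions made for this illustration.

```php
<?php
// Unanchored pattern as quoted above: succeeds if ANY substring fits.
const UNANCHORED = '/([a-z_]+:[a-z_]+:[a-z_]+;?)+/i';

// Illustrative anchored variant: the WHOLE input must be triplets joined by ";".
// \z marks the very end of the subject (no trailing newline allowed).
const ANCHORED = '/^[a-z_]+:[a-z_]+:[a-z_]+(?:;[a-z_]+:[a-z_]+:[a-z_]+)*\z/i';

function containsValidGroup(string $input): bool
{
    return preg_match(UNANCHORED, $input) === 1;
}

function isValidGroupList(string $input): bool
{
    return preg_match(ANCHORED, $input) === 1;
}

$broken = 'dog:cat:frog;;;123 not valid';
var_dump(containsValidGroup($broken)); // unanchored check still passes
var_dump(isValidGroupList($broken));   // anchored check rejects it
```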
^(?:[a-zA-Z_]+:[a-zA-Z_]+:[a-zA-Z_]+(?:;(?!$)|$))+$ (demo, with multiline flag on)
^                   # Anchors to beginning of string
(?:                 # Opens non-capturing group
    [a-zA-Z_]+      # One or more letters or underscores
    :               # Literal :
    [a-zA-Z_]+      # One or more letters or underscores
    :               # Literal :
    [a-zA-Z_]+      # One or more letters or underscores
    (?:             # Opens non-capturing group
        ;           # Literal ;
        (?!$)       # Negative lookahead, ensuring that a semicolon is not at the end of the line
        |           # Or
        $           # End of string
    )               # Closes non-capturing group
)+                  # Repeats the overall non-capturing group one or more times
$                   # Anchors to end of string
You didn't specify whether sibling names may be empty; if they may, change each [a-zA-Z_]+ to [a-zA-Z_]*
// PHP Code generated by Regex101.
$re = '/^(?:[a-zA-Z_]+:[a-zA-Z_]+:[a-zA-Z_]+(?:;(?!$)|$))+$/m';
$str = 'a_b:bread:stack_overflow;test:this_thing:jane;Get_me:h:down
ab:bread:stack_overflow;test:this_thing:jane;Get_me:h:down
a_b:any other characters break it:stack_overflow;test:this_thing:jane;Get_me:h:down
a_b:bread:format_messed_up-test:this_thing:jane;Get_me:h:down
a_b:bread:stack_overflow;test:this_thing:jane;semi_colon_at_end;';
preg_match_all($re, $str, $matches);
// Print the entire match result
print_r($matches);
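Since the question also asks for simpler, non-regex ways, the rules can be checked by splitting the string instead. The following is a rough sketch under the stated rules (exactly three siblings per group, letters and underscores only); the function name isValidImport and the constant NAME_CHARS are invented for the example.

```php
<?php
// Characters allowed in a sibling name (assumption: ASCII letters and "_").
const NAME_CHARS = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_';

// Hypothetical helper: validates "a:b:c;d:e:f" style input without a regex.
function isValidImport(string $input): bool
{
    foreach (explode(';', $input) as $group) {
        $siblings = explode(':', $group);

        // Every group needs exactly three siblings.
        if (count($siblings) !== 3) {
            return false;
        }

        foreach ($siblings as $name) {
            // strspn() returns the length of the leading run of allowed
            // characters; the name is valid only if that run covers all of it.
            if ($name === '' || strspn($name, NAME_CHARS) !== strlen($name)) {
                return false;
            }
        }
    }
    return true;
}

var_dump(isValidImport('john:mike:dave;jenny:helen:jessica'));
var_dump(isValidImport('a_b:bread:stack_overflow;test:this_thing:jane;semi_colon_at_end;'));
```

An empty string and a trailing semicolon both fail here, because explode() then yields a group that does not contain three names.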

Replace content between special characters using preg_replace()

I have a paragraph like this:
== one ===
==== two ==
= three ====
etc.
The number of = signs varies from row to row.
I want to write a preg_replace() expression that will allow me to replace the text between the = signs.
Example:
== DONE ===
==== DONE ==
= DONE ====
I tried preg_replace("/\=+(.*)\=+/","DONE", $paragraph) but that doesn't work. Where am I going wrong?
You can use:
$str = preg_replace('/^=+\h*\K.+?(?=\h*=)/m', 'DONE', $str);
RegEx Breakup:
^          # Line start
=+         # Match 1 or more =
\h*        # Match 0 or more horizontal whitespace characters
\K         # Resets the starting point of the reported match
.+?        # Match 1 or more of any character (non-greedy)
(?=\h*=)   # Lookahead to make sure 0 or more spaces followed by 1 = is there
You have to place the =s back.
Also, instead of .* use [^=]* (matches characters, which are not =) so that the =s don't get eaten up for the replacement.
Additionally, you don't have to escape =:
preg_replace("/(=+)([^=]*)(=+)/","$1 DONE $3", $paragraph);
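As a quick check, both suggestions can be run against the paragraph from the question. This is a minimal sketch; the variable names are chosen for the example, and single-quoted strings are used so PHP never tries to interpolate $1 or $3.

```php
<?php
$paragraph = "== one ===\n==== two ==\n= three ====";

// Approach 1: \K discards the leading "=" run from the reported match,
// so only the text between the signs is replaced.
$viaKeepOut = preg_replace('/^=+\h*\K.+?(?=\h*=)/m', 'DONE', $paragraph);

// Approach 2: capture both "=" runs and put them back around the new text.
$viaGroups = preg_replace('/(=+)([^=]*)(=+)/', '$1 DONE $3', $paragraph);

echo $viaKeepOut, "\n\n", $viaGroups, "\n";
```

For this input the two results should be identical; they would differ if the text between the signs were not surrounded by exactly one space on each side, because the second approach always inserts its own spaces.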

check two slashes in string

I have the following string. I want to know whether a string has two slashes or not.
$sting = "largeimg/fee0b04800e22590/myimage1.jpg";
I am trying to use the following PHP method:
if(preg_match("#^/([A-Za-z]|[0-9])/([A-Za-z]|[0-9]+)$#", $sting))
But it is not working properly. Please help me.
Here is how to do it in regex (see demo):
^([^/]*/){2}
Your code:
if(preg_match("#^([^/]*/){2}#", $sting)) {
// two slashes!
}
Regex explanation:
^            # the beginning of the string
(            # group and capture to \1 (2 times):
    [^/]*    # any character except '/' (0 or more times, as many as possible)
    /        # '/'
){2}         # end of \1 (NOTE: because a quantifier is applied to this
             # capture group, only the LAST repetition of the captured
             # pattern is stored in \1)
You could use substr_count():
$sting = "largeimg/fee0b04800e22590/myimage1.jpg";
if(substr_count($sting, '/') == 2) { echo "has 2 slashes"; }
To check for at least 2 slashes you can use this regex:
preg_match('#/[^/]*/#', $sting)
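The answers above do not all test the same thing: substr_count() with == 2 means exactly two slashes, while the regex versions succeed for two or more. A small sketch of the difference, with function names made up for the example:

```php
<?php
// Exactly two slashes, as in the substr_count() suggestion.
function hasExactlyTwoSlashes(string $path): bool
{
    return substr_count($path, '/') === 2;
}

// Two or more slashes, as in the regex suggestions.
function hasAtLeastTwoSlashes(string $path): bool
{
    return preg_match('#/[^/]*/#', $path) === 1;
}

foreach (['largeimg/fee0b04800e22590/myimage1.jpg', 'a/b/c/d', 'a/b'] as $path) {
    printf("%s: exactly=%s, at least=%s\n", $path,
        var_export(hasExactlyTwoSlashes($path), true),
        var_export(hasAtLeastTwoSlashes($path), true));
}
```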
Several other answers provide regular expressions that work but they do not explain why the expression in the question does not work. The expression is:
#^/([A-Za-z]|[0-9])/([A-Za-z]|[0-9]+)$#
The section ([A-Za-z]|[0-9]) is equivalent to ([A-Za-z0-9]). The extra + in the second similar section makes that part quite different. The + is of higher precedence than the |. Hence the section ([A-Za-z]|[0-9]+) is equivalent to ([A-Za-z]|([0-9]+)) (ignoring the difference between capturing and non-capturing brackets). The expression is interpreted as:
^ Start of string
/ The character '/'
([A-Za-z]|[0-9]) One alphanumeric
/ The character '/'
(
[A-Za-z] One alpha character
| or
[0-9]+ One or more digits
)
$ End of the string
This will only match strings where the first three characters are /, one alphanumeric, then /. Then the remainder of the string must be either one alpha or several digits. Thus these strings would be matched:
/a/b
/c/123
/4/d
/5/6
/7/890123456789
These strings would not be matched:
/aa/b
c/c/123
/44/d
/5/6a
/5/a6
/7/ee
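The two lists above can be confirmed with a few lines of PHP. This is only a verification sketch; the function name matchesOriginal is invented for the example.

```php
<?php
// The expression from the question, unchanged.
const ORIGINAL_PATTERN = '#^/([A-Za-z]|[0-9])/([A-Za-z]|[0-9]+)$#';

function matchesOriginal(string $subject): bool
{
    return preg_match(ORIGINAL_PATTERN, $subject) === 1;
}

$shouldMatch    = ['/a/b', '/c/123', '/4/d', '/5/6', '/7/890123456789'];
$shouldNotMatch = ['/aa/b', 'c/c/123', '/44/d', '/5/6a', '/5/a6', '/7/ee',
                   'largeimg/fee0b04800e22590/myimage1.jpg'];

foreach (array_merge($shouldMatch, $shouldNotMatch) as $subject) {
    echo $subject, ' => ', matchesOriginal($subject) ? 'match' : 'no match', "\n";
}
```

The string from the question is included in the second list: it does not start with a slash, so the expression can never match it.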

PHP Look behind Regex with variable distance

I need to match a sequence of characters but only if it's not preceded by a "?" or "#" with 0 or more (any) number of wildcard characters in between.
$extension_regex =
'/
(?<!\?|\#) # Negative look behind not "?" or "#"
\/ # Match forward slash
[^\/\?#]+ # Has one or more of any character except forward slash, question mark and hash
\. # Dot
([^\/\?#]+) # Has one or more of any character except forward slash, question mark and hash
/iux';
Examples:
"?randomcharacters/index.php" should not get matched
"#randomcharacters/index.php" should not get matched
"randomcharacters/index.php" should get matched
I understand that the lookbehind is not working because it sees that "/index.php" is not preceded by ? or #. But I can't figure out how to add wildcard "distance" between the ? or # and the /index.php.
The Answer
Based on @Jerry's answer, here is the full regex:
$extension_regex =
'~
^
(?:
    (?!
        [?#]
        .*
        /
        [^/?#]+
        \.
        [^/?#]+
    )
    .
)*
/
[^/?#]+
\.
([^/?#]+)
~iux';
You cannot put a variable-width pattern inside a lookbehind in PCRE, but you can work around that with a negative lookahead, something like this:
^(?:(?![#?].*/index.php).)*(/index.php)
I added the capture group just to get the part you want to match, even though it might not be actually useful here.
^(?:(?![#?].*/index.php).)* will basically match any character, as long as there's no # or ? followed by the string you want to match (/index.php) immediately ahead.
In C#, you might otherwise be able to use:
(?<![#?].*)/index.php
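The lookahead workaround can be tried against the three examples from the question. The sketch below uses a one-line form of the generalised pattern (any file name and extension rather than index.php); the constant and function names are assumptions made for the example.

```php
<?php
// One-line form of the workaround, without the x flag:
// consume characters from the start as long as no "?" or "#" is followed
// later by a "/name.ext" segment, then match that segment and capture the
// extension.
const EXTENSION_PATTERN = '~^(?:(?![?#].*/[^/?#]+\.[^/?#]+).)*/[^/?#]+\.([^/?#]+)~iu';

// Hypothetical helper: returns the captured extension, or null when the
// only "/name.ext" segment sits behind a "?" or "#".
function findExtension(string $url): ?string
{
    return preg_match(EXTENSION_PATTERN, $url, $m) === 1 ? $m[1] : null;
}

$urls = ['?randomcharacters/index.php', '#randomcharacters/index.php',
         'randomcharacters/index.php', 'path/file.txt?x=/a.b'];
foreach ($urls as $url) {
    echo $url, ' => ', var_export(findExtension($url), true), "\n";
}
```

The ~ delimiter is used so that the forward slashes in the pattern need no escaping.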
This may help:
$extension_regex = 'string';
$arr = array('?', '#', '0'); // these are the forbidden characters
if (in_array(substr($extension_regex, 0, 1), $arr))
    echo "true";
else
    echo "false";
