<?PHP, REGEX and me. A tragedy in three acts - php

Long-time listener, first-time caller...
Not strictly a PHP question as it involves regular expressions but this one has got me tearing my hair out.
I have 3 regular expressions that I want to create, and only one is working correctly.
Now I am not sure whether this is due to the fact that:
I don't understand preg_match and ereg and their return codes, as I haven't worked in PHP for about 7 years.
My regular expressions are just plain wrong.
I am mentally disabled.
Either way here are the expressions and my feeble attempts in making them work.
1) Match any number starting with 2,3,4 or 5 then followed by 5 digits. (This one I think works)
code:
if (!ereg('/[2-5]\d{5}/', $_POST['packageNumber' )
{
echo "The package number is not the correct format.";
}
2) Match any number starting with 2,3,4 or 5 then followed by 5 digits then a period then a 1 or a 2.
if (!ereg("/[2-5]\d{5}\.[1-2]/", $_POST['packageModifier' )
{
echo "The package modifier is not the correct format.";
}
3) Match any combination of alphanumerics, spaces, periods and hyphens up to 50 characters.
if (!ereg("/[0-9a-zA-Z\s\-\.]{0,50}/", $_POST['customerNumber' )
{
echo "The customer number is not the correct format.";
}
If anyone can please tell me what I am doing wrong I'll give them my first born.

You are mixing up PCRE functions and POSIX regular expression functions. You are using a Perl-Compatible regular expression with a POSIX regular expression function.
So replace ereg by preg_match and it should work:
if (!preg_match('/^[2-5]\d{5}$/', $_POST['packageNumber'])) {
echo "The package number is not the correct format.";
}
if (!preg_match("/^[2-5]\d{5}\.[1-2]$/", $_POST['packageModifier'])) {
echo "The package modifier is not the correct format.";
}
if (!preg_match("/^[0-9a-zA-Z\s\-.]{0,50}$/", $_POST['customerNumber'])) {
echo "The customer number is not the correct format.";
}
Along with fixing the PHP syntax errors I added anchors for the start (^) and the end ($) of the string to be matched.
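Since the question also asks about return codes, here is a minimal sketch of how the three anchored checks could be wrapped. The helper name matchesFormat and the constant names are hypothetical, not from the thread, and the sample values stand in for $_POST input. preg_match returns 1 on a match, 0 on no match and false on an error, so the sketch compares against 1 explicitly.

```php
<?php
// Hypothetical helper: true only when $value matches $pattern.
// preg_match() returns 1 (match), 0 (no match) or false (error, e.g. a broken pattern),
// so compare against 1 instead of relying on truthiness.
function matchesFormat(string $pattern, string $value): bool
{
    return preg_match($pattern, $value) === 1;
}

// Patterns from the answer above, single-quoted so PHP leaves the backslashes alone.
const PACKAGE_NUMBER   = '/^[2-5]\d{5}$/';
const PACKAGE_MODIFIER = '/^[2-5]\d{5}\.[1-2]$/';
const CUSTOMER_NUMBER  = '/^[0-9a-zA-Z\s\-.]{0,50}$/';

// Sample values stand in for $_POST input.
var_dump(matchesFormat(PACKAGE_NUMBER, '212345'));        // bool(true)
var_dump(matchesFormat(PACKAGE_NUMBER, '612345'));        // bool(false)
var_dump(matchesFormat(PACKAGE_MODIFIER, '212345.2'));    // bool(true)
var_dump(matchesFormat(CUSTOMER_NUMBER, 'ACME-42 Ltd.')); // bool(true)
```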

I'm assuming that you just missed off the closing ] on the $_POST lookups. I've added anchors for the start and end of the string and used preg_match.
If you don't anchor it, the pattern can match anywhere in the string, so any input that merely contains a valid number will pass. For example:
"dfasfasfasfasf25555555as5f15sdsdasdsfghfsgihfughd54" would be matched if the first one was not anchored.
Number One
if (!preg_match('/^[2-5]\d{5}$/', $_POST['packageNumber'])) {
echo "The package number is not the correct format.";
}
Number Two
if (!preg_match('/^[2-5]\d{5}\.[1-2]$/', $_POST['packageModifier'])) {
echo "The package modifier is not the correct format.";
}
Number Three
if (!preg_match('/^[0-9a-zA-Z\s\-.]{0,50}$/', $_POST['customerNumber'])) {
echo "The customer number is not the correct format.";
}

Don't you need to anchor the regular expressions?
Otherwise '111111111111111211111111111' will match /[2-5]\d{5}/.
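To see the difference, here is a small sketch that runs the unanchored and the anchored version of the first pattern against that string:

```php
<?php
$input = '111111111111111211111111111';

// Unanchored: succeeds because "211111" occurs somewhere inside the string.
$unanchored = preg_match('/[2-5]\d{5}/', $input, $m);

// Anchored: the whole string must be exactly six characters in the right shape.
$anchored = preg_match('/^[2-5]\d{5}$/', $input);

var_dump($unanchored); // int(1)
var_dump($m[0]);       // string(6) "211111"
var_dump($anchored);   // int(0)
```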

When using POSIX regular expressions (deprecated as of PHP 5.3 and removed in PHP 7.0) you should write the tests like this:
if (ereg('^[2-5][0-9]{5}$', $_POST['packageNumber']) === false)
{
echo "The package number is not the correct format.";
}
if (ereg('^[2-5][0-9]{5}\\.[1-2]$', $_POST['packageModifier']) === false)
{
echo "The package modifier is not the correct format.";
}
if (ereg('^[[:alnum:][:space:].-]{0,50}$', $_POST['customerNumber']) === false)
{
echo "The customer number is not the correct format.";
}
Note that I anchored the regular expressions -- otherwise the customerNumber will always match (with a zero-length match).
See the POSIX regex man page for more information.
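On PHP versions where ereg is no longer available, the same checks have to go through preg_match. As a sketch of one way to port the third test: PCRE also accepts POSIX classes such as [:alnum:] inside a bracket expression, so the pattern body can stay as it is and only the delimiters are added. The function name isValidCustomerNumber is hypothetical.

```php
<?php
// Hypothetical port of the third ereg() test to preg_match().
// POSIX classes like [:alnum:] and [:space:] are valid inside a PCRE bracket expression.
function isValidCustomerNumber(string $value): bool
{
    return preg_match('/^[[:alnum:][:space:].-]{0,50}$/', $value) === 1;
}

var_dump(isValidCustomerNumber('ACME-42 Ltd.'));      // bool(true)
var_dump(isValidCustomerNumber('no_underscores'));    // bool(false)
var_dump(isValidCustomerNumber(str_repeat('a', 51))); // bool(false)
```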

preg_match('/^[2-5]\d{5}$/', $str); // 1-st
preg_match('/^[2-5]\d{5}\.[1-2]$/', $str); // 2-nd
preg_match('/^[0-9a-z\s\-\.]{0,50}$/i', $str); // 3-rd
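Your mistakes:
You didn't escape backslashes. This happens to be harmless for "\d", because PHP passes unrecognized escapes in double-quoted strings through unchanged, but recognized escapes such as "\\" or "\$" are processed by PHP before the regex engine ever sees them.
You have to anchor the regexp to the beginning and the end of the entire string with the ^ and $ symbols.
That's all ;)
PS: better to use single quotes for pattern literals. They do no variable interpolation and process almost no escapes, so the pattern you type is the pattern PCRE receives.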
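A quick way to check how PHP actually treats backslashes and variables in the two kinds of string literal, as a sketch (the variable names are only for illustration):

```php
<?php
// PHP leaves unrecognized escapes alone, so both literals hold a backslash followed by "d".
$sameDigitEscape = ("\d" === '\d');
var_dump($sameDigitEscape);   // bool(true)

// Recognized escapes differ: "\$" is a lone dollar sign, '\$' keeps the backslash.
$doubleQuoted = "\$";
$singleQuoted = '\$';
var_dump(strlen($doubleQuoted)); // int(1)
var_dump(strlen($singleQuoted)); // int(2)

// Double quotes also interpolate variables, which can silently change a pattern.
$d = 'X';
$interpolated = "/^$d+$/";
var_dump($interpolated);      // string(6) "/^X+$/"
```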

Related

Regular expressions in preg_match pattern not matching string

I've read the PHPManual RegEx Intro, but am confused on how to structure the pattern for preg_match. I am checking that the username on the login form is all lower case alphabet between 2 and 5 characters in length.
Pattern 1: Initially, I used a character class followed by a repetition quantifier:
if (preg_match("[a-z]{2,5}",$_POST['ULusername'])) {
$formmessage = 'Hello, ' . $_POST['ULusername'];
} else {
$formmessage = 'Enter username.';
}
The output was always "Enter username."
Pattern 2: I then thought perhaps I needed delimiters:
if (preg_match("/[a-z]{2,5}/",$_POST['ULusername'])) {
$formmessage = 'Hello, ' . $_POST['ULusername'];
} else {
$formmessage = 'Enter username.';
}
But the output was still always "Enter username."
Pattern 3: Finally, I tried delimiters with the begin/end anchors:
if (preg_match("#^([a-z]{2,5})$#",$_POST['ULusername'])) {
$formmessage = 'Hello, ' . $_POST['ULusername'];
} else {
$formmessage = 'Enter username.';
}
This gave me the desired output.
Why does the third pattern work, but not the first two?
The first one fails because it has no valid delimiters: PHP reads [ and ] as the delimiters and then rejects {2,5} as unknown modifiers.
The second one is a valid pattern, but there is a problem in its logic: /[a-z]{2,5}/ only checks that two to five consecutive lowercase letters occur somewhere in the input and says nothing about the length of the whole input. Try it with ABcdEF and you'll see what is going on.
In the third one, you grouped [a-z]{2,5} with () and anchored the group with ^ and $, so the whole string must consist of two to five lowercase letters. The grouping itself doesn't affect the logic: #^[a-z]{2,5}$# gives the same result without the parentheses.
For more information, you can refer to tutorials about regular expressions.
http://www.rexegg.com/regex-quickstart.html
https://www.regular-expressions.info/refcapture.html
The first pattern returns false, an indication that an error occurred (here, no delimiter in pattern).
The second and third patterns are valid regex patterns but they do not match the same set of strings. Using "/[a-z]{2,5}/" you'd have a match whenever $_POST['ULusername'] contains at least two consecutive lowercase characters. However, it does not care if the length of the whole string is greater than 5.
The last pattern both has delimiters and a start and end anchors, so only lowercase strings of length 2 to 5 will match.
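The three behaviours can be compared side by side. This is a sketch using the ABcdEF input suggested above in place of $_POST['ULusername']; the @ operator is only there to hide the warning that the first pattern triggers.

```php
<?php
$input = 'ABcdEF';

// Pattern 1: PHP treats "[" and "]" as the delimiters, so "{2,5}" is read as modifiers.
// preg_match() raises a warning and returns false.
$first = @preg_match('[a-z]{2,5}', $input);

// Pattern 2: valid, but it finds "cd" inside the longer string.
$second = preg_match('/[a-z]{2,5}/', $input);

// Pattern 3: anchored, so the whole input must be 2 to 5 lowercase letters.
$third = preg_match('#^[a-z]{2,5}$#', $input);

var_dump($first);   // bool(false)
var_dump($second);  // int(1)
var_dump($third);   // int(0)
var_dump(preg_match('#^[a-z]{2,5}$#', 'abc')); // int(1)
```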

PHP preg_match regular expression for find date in string

I am trying to build a system that can detect a date in a string. Here is the code:
$string = "02/04/16 10:08:42";
$pattern = "/\<(0?[1-9]|[12][0-9]|3[01])\/\.- \/\.- \d{2}\>/";
$found = preg_match($pattern, $string);
if ($found) {
echo ('The pattern matches the string');
} else {
echo ('No match');
}
The result I get is "No match". I don't think I used the correct regex for the pattern. Can somebody tell me what I must do to fix this code?
First of all, remove all gibberish from the pattern. This is the part you'll need to work on:
(/0?[1-9]|[12][0-9]|3[01]/)
(As you said, you need the date only, not the datetime).
The main problem with the pattern is that you are using the logical OR operator (|) where the date separators belong. If the separators are slashes, you need to replace those pipe characters with escaped slashes. They must be escaped because / is also the pattern delimiter, so an unescaped slash would end the pattern. Like this: \/.
Now, you need to solve some logical tasks here, to match the numbers correctly and you're good to go.
(I'm not gonna solve the homework for you :) )
These articles will help you to solve the problem though:
Character classes
Repetition operators
Special characters
Pipe character (alternation operator)
Good luck!
In your comment you say you are looking for yyyy, but the example says yy.
I wrote the code for yy because that is what you gave us; you can change the last {2} to {4} to match yyyy.
preg_match("/((0|1|2|3)[0-9])\/\d{2}\/\d{2}/", $string, $output_array);
echo $output_array[0]; // the full match, i.e. the date
Edit:
If you use this pattern it will match the time too, which makes a false match less likely.
((0|1|2|3)[0-9])/\d{2}/\d{2}\s+\d{2}:\d{2}:\d{2}
http://www.phpliveregex.com/p/fjP
Edit2:
Also, you can skip one line of code.
You first assign the preg_match result to $found and then test $found.
This works too:
if (preg_match($pattern, $string, $found)) {
echo $found[0];
} else {
echo "nothing found";
}
with pattern and string as referred to above.
Here $found is passed as the third argument and receives the matches, while preg_match itself returns 1 on a match, so the if is true whenever something was found.
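Putting the pieces together, a minimal sketch of a date-only pattern could look like this. It assumes the date is written day first, then month, then a two-digit year, separated by "/", "." or "-"; that ordering is an assumption, since the question does not state it. Using # as the delimiter avoids having to escape the slashes.

```php
<?php
// Assumption: day/month/two-digit-year, separated by "/", "." or "-".
// "#" is the delimiter, so slashes inside the pattern need no escaping.
$pattern = '#\b(0?[1-9]|[12][0-9]|3[01])[/.-](0?[1-9]|1[0-2])[/.-](\d{2})\b#';

$string = "02/04/16 10:08:42";

if (preg_match($pattern, $string, $found) === 1) {
    // $found[0] is the whole match; $found[1], $found[2], $found[3] are day, month, year.
    echo $found[0], "\n"; // 02/04/16
} else {
    echo "No match\n";
}
```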

Regex for Chinese / Japanese letters

Okay, so I already have this regular expression for names allowed on my website.
However, I also wish to add other possible letters that names use.
Does someone have a good regex or know how I can make this more complete? I have searched for quite a while now, and I can't find anything that suits my needs.
This is my current regex for checking names:
$regex = "/^([a-zA-ZàáâäãåąčćęèéêëėįìíîïłńòóôöõøùúûüųūÿýżźñçčšžÀÁÂÄÃÅĄĆČĖĘÈÉÊËÌÍÎÏĮŁŃÒÓÔÖÕØÙÚÛÜŲŪŸÝŻŹÑßÇŒÆČŠŽ∂ð ,.'-])+$/";
if(preg_match($regex, $fullname)){
// do something
}
As Lucas Trzesniewski has mentioned, \p{L} already includes [a-zA-Z], so I have removed that range from the pattern.
Combining it with the other characters from your example, the pattern looks like this: /^[\p{L}\s,.'-]+$/u
^[...]+$ matches the string from start to end; the + requires one or more characters from the class
\p{L} matches any Unicode letter
\s,.'- matches whitespace, comma, period, single quotation mark, and dash
u is the PCRE_UTF8 modifier; it turns on additional functionality of PCRE that is incompatible with Perl: pattern and subject strings are treated as UTF-8
if(preg_match("/^[\p{L}\s,.'-]+$/u", "お元気ですか你好吗how are you你好嗎,.'-") === 1) {
echo "match";
}
else {
echo "no match";
}
// match
if(preg_match("/^[\p{L}\s,.'-]+$/u", "お元気ですか你好吗how are you你好_嗎-,.'") === 1) {
echo "match";
}
else {
echo "no match";
}
// no match because of the underscore in 你好_嗎
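If names should be limited to particular writing systems rather than any letter, PCRE's script properties are one option. The sketch below is only an assumption about what the site might want (Latin plus Han, Hiragana and Katakana); the function name isAllowedName is hypothetical.

```php
<?php
// Hypothetical variant: allow only Latin, Han (kanji/hanzi), Hiragana and Katakana letters,
// plus the whitespace and punctuation from the pattern above.
// The /u modifier makes PCRE treat pattern and subject as UTF-8.
function isAllowedName(string $name): bool
{
    return preg_match('/^[\p{Latin}\p{Han}\p{Hiragana}\p{Katakana}\s,.\'-]+$/u', $name) === 1;
}

var_dump(isAllowedName('山田 太郎'));   // bool(true)
var_dump(isAllowedName("O'Brien"));     // bool(true)
var_dump(isAllowedName('Иван'));        // bool(false), Cyrillic is not in the list
var_dump(isAllowedName('你好_嗎'));     // bool(false), underscore is not allowed
```

Compared with \p{L}, this rejects letters from scripts that are not listed, so it is stricter; whether that is desirable depends on who is expected to sign up.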

How to get regex to fail if more than 4 pipe characters

I am stuck on a regular expression. I have a string which should begin with several values separated by 4 pipe | characters. All I want the regex to do is let me know if there are fewer or more pipes. The regex works if there are fewer than 4 pipes, but continues to give a positive when there are more, even though I think I have what I need to basically say, "no more pipes the rest of the way."
I'm expecting the following example to fail, but it still returns true:
$string = 'a|b|c|d|e|f';
if (preg_match('/^.*\|.*\|.*\|.*\|[^\|]+$/u', $string)) {
echo 'Four pipes at beginning';
}
else {
echo 'not enough or too many pipes';
}
Any assistance would be greatly appreciated.
Thank you.
Replace your .*s with [^|]*s:
$string = 'a|b|c|d|e|f';
if (preg_match('/^[^|]*\|[^|]*\|[^|]*\|[^|]*\|[^\|]*$/u', $string)) {
echo 'Four pipes at beginning';
}
else {
echo 'not enough or too many pipes';
}
Your last one is right, but the first four .*s will match strings that include a pipe symbol. Note that the + at the end of your original regex requires a character at the end. If you really only want to check the number of pipes, you should use * instead.
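If the only requirement is the number of pipes, counting them directly is another option. A sketch comparing a condensed form of the pattern above with a plain string function (both helper names are hypothetical):

```php
<?php
// Regex approach: a pipe-free field followed by exactly four "pipe + pipe-free field" groups.
function hasFourPipesRegex(string $s): bool
{
    return preg_match('/^[^|]*(?:\|[^|]*){4}$/', $s) === 1;
}

// Plain string approach: just count the pipe characters.
function hasFourPipesCount(string $s): bool
{
    return substr_count($s, '|') === 4;
}

var_dump(hasFourPipesRegex('a|b|c|d|e'));   // bool(true)
var_dump(hasFourPipesRegex('a|b|c|d|e|f')); // bool(false)
var_dump(hasFourPipesCount('a|b|c|d|e'));   // bool(true)
var_dump(hasFourPipesCount('a|b|c|d|e|f')); // bool(false)
```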
It doesn't fail because you use .*, which matches every character, including the pipe.
Try this instead (it assumes each of the five fields is a single lowercase letter):
/^([a-z])\|([a-z])\|([a-z])\|([a-z])\|([a-z])$/u

A solid nickname regexp

I want a regular expression to validate a nickname: 6 to 36 characters, it should contain at least one letter. Other allowed characters: 0-9 and underscores.
This is what I have now:
if(!preg_match('/^.*(?=\d{0,})(?=[a-zA-Z]{1,})(?=[a-zA-Z0-9_]{6,36}).*$/i', $value)){
echo 'bad';
}
else{
echo 'good';
}
This seems to work, but when I validate these strings, for example:
11111111111a > is reported as not valid, but it should be valid
aaaaaaa!aaaa > is reported as valid, but it shouldn't be
Any ideas to make this regexp better?
I would actually split your task into two regex:
to find out whether it's a valid word: /^\w{6,36}$/i
to find out whether it contains a letter /[a-z]/i
I think it's much simpler this way.
Try this:
'/^(?=.*[a-z])\w{6,36}$/i'
Here are some of the problems with your original regex:
/^.*(?=\d{0,})(?=[a-zA-Z]{1,})(?=[a-zA-Z0-9_]{6,36}).*$/i
(?=\d{0,}): What is this for??? This is always true and doesn't do anything!
(?=[a-zA-Z]{1,}): You don't need the {1,} part, you just need to find one letter, and the i flag also allows you to omit A-Z
/^.*: You're matching these outside of the lookaround; it should be inside
(?=[a-zA-Z0-9_]{6,36}).*$: this means that as long as there are between 6-36 \w characters, everything else in the rest of the string matches! The string can be 100 characters long mostly containing illegal characters and it will still match!
You can do it easily using two calls to preg_match as:
if( preg_match('/^[a-z0-9_]{6,36}$/i',$input) && preg_match('/[a-z]/i',$input)) {
// good
} else {
// bad
}
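For comparison, the single-pattern version with the lookahead can be wrapped in a function and checked against the two strings from the question. This is a sketch; the function name isValidNickname is hypothetical, and the D modifier is an addition that stops $ from also matching before a trailing newline.

```php
<?php
// Hypothetical helper: 6 to 36 characters from [A-Za-z0-9_], at least one letter.
// (?=.*[a-z]) looks ahead from the start for a letter; the D modifier makes "$"
// match only at the very end of the string.
function isValidNickname(string $value): bool
{
    return preg_match('/^(?=.*[a-z])[a-z0-9_]{6,36}$/iD', $value) === 1;
}

var_dump(isValidNickname('11111111111a')); // bool(true)
var_dump(isValidNickname('aaaaaaa!aaaa')); // bool(false)
var_dump(isValidNickname('123456'));       // bool(false), no letter
var_dump(isValidNickname("abcdef\n"));     // bool(false), trailing newline
```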
