PHP Regex IF THEN pattern - php

I'm new to writing Regex patterns and I'm struggling to understand why the following line doesn't work.
/^(£)?[0-9]+(?(?=\.[0-9]{2}){0,1}(p)?|$)/
Note: I'm writing this in PHP
I want the pattern to match £3.10p, but not £3p. Essentially, the letter 'p' isn't allowed unless it is preceded by a decimal point and 2 digits.
EDIT: To clarify, the letter p can be used at the end of the string, however if the string contains a £ and/or a decimal point, the p must be preceded by the point and 2 digits.
More examples of valid inputs:
£3.50
350
£350
234p
Invalid input:
£2p
Could someone please fix this and explain where I've gone wrong here?
Thanks

If 0.50p is allowed, then you can do it like this:
^((£?[0-9]+)(?!p)|([0-9]+p?))?(?<!p)(\.[0-9]{2})?p?$
Regex saved with all your examples here: https://regex101.com/r/rE1bT9/3

Try this:
/^(?(?=£)(£\d+\.\d{2}p?|£\d+)|\d+p?)$/
You can test it here:
https://regex101.com/r/mG8kR0/1
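A quick way to check a conditional pattern like this one against the samples from the question is a small PHP script. This is only a sketch: the function name isValidPrice is made up for illustration, and the /u modifier is my addition so that £ is treated as a single character (it assumes the script and the input are UTF-8).

```php
<?php
// Hypothetical helper wrapping the conditional pattern from the answer above.
// The /u modifier is an addition and assumes UTF-8 input.
function isValidPrice(string $input): bool
{
    $pattern = '/^(?(?=£)(£\d+\.\d{2}p?|£\d+)|\d+p?)$/u';
    return preg_match($pattern, $input) === 1;
}

// Samples taken from the question: the last two should be rejected.
$samples = ['£3.10p', '£3.50', '350', '£350', '234p', '£3p', '£2p'];
foreach ($samples as $sample) {
    echo $sample, ' => ', isValidPrice($sample) ? 'valid' : 'invalid', PHP_EOL;
}
```

The condition (?=£) only peeks at the next character, which is why the £ is repeated inside the "yes" branch.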

It is unclear how your valid sample "234p" fits your rule that p is only allowed after a point and two digits. Also, your question uses a conditional with a positive lookahead, which seems like unnecessary overhead here.
Your rule for p may be written as: (\.[0-9]{2}p?)
So overall, you just need: /^(£)?[0-9]+(\.[0-9]{2}p?)$/
And if you allow "234p" also, just make the period optional: /^(£)?[0-9]+(\.?[0-9]{2}p?)$/
Try it out here: http://www.regexr.com/
The latter regex accepts all your valid samples and rejects the invalid input. Note that it requires at least three digits in total, so an input such as 35 is rejected; it is unclear whether that is what you want. If you need to capture individual parts of the input, add more groups.

How about:
/^(?:£?[0-9]+(?:\.[0-9]{2})?|[0-9]+p?)$/

Related

PHP Regex Not Quite Working

I am using the following regex:
^[0-9.,]*(([.,][-])|([.,][0-9]{2}))?\$
I use this regex to check for valid prices -- so it catches/rejects things like xxx, or llddd or 34.23dsds
and allows things like 100 or 120.00
The problem with it seems to be if it is blank(empty) it passes as valid which it should not -- any ideas how to change this??
Thanks
One thing to watch for is the dot: outside a character class an unescaped dot stands for "any character", so a literal dot has to be escaped like this \. there. Inside a character class such as [.,] the dot is already literal, so escaping it is optional.
Also, you should require at least one number, so replace the asterisk * with a + for "one or more".
Even then, input like .,.,.,.,.,.,- is still accepted unless you remove the comma and dot from the first part:
^[0-9]+(([\.,][-])|([\.,][0-9]{2}))?$
Taking your regex and just solving the "don't match blanks" problem:
^[0-9.,]+(([.,][-])|([.,][0-9]{2}))?$
The * allows 0 or more, while the + allows 1 or more. Thus the * allowed blanks but the + will not; there must be at least one character from the class (a digit, dot or comma).
EDIT:
You should clean this regex up a bit to be
^[0-9]+(?:[.,-](?:[0-9]{2})?)?$
This solves the matching of ",,,"
http://www.regextester.com/?fam=95185
EDIT 2: #Fuzzzzel pointed out that this did not match the case "50,-" which we assume you would like to match and that removing capturing groups is presumptive. Here's the latest iteration of my suggested regex:
^[0-9]+([.,-](-|([0-9]{2}))?)?$
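To see how the last pattern behaves on the inputs mentioned in this thread, a short script along these lines can help. It is a sketch only; the function name isPrice and the list of test inputs are my own choices, not something from the question.

```php
<?php
// Hypothetical helper around the final pattern suggested above.
function isPrice(string $input): bool
{
    return preg_match('/^[0-9]+([.,-](-|([0-9]{2}))?)?$/', $input) === 1;
}

// The empty string, ",,,", "xxx" and "34.23dsds" should all be rejected.
$inputs = ['', '100', '120.00', '50,-', ',,,', 'xxx', '34.23dsds'];
foreach ($inputs as $input) {
    printf("%-12s %s\n", var_export($input, true), isPrice($input) ? 'valid' : 'invalid');
}
```

Because the pattern starts with [0-9]+ rather than [0-9.,]*, an empty or punctuation-only input can no longer match.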

Get 'XXX' value using a RegEx in PHP

I need some help building a regex to get the value of XXX in the following set of possible matches:
+58XXXYYYYYYY
+580XXXYYYYYYY
0XXXYYYYYYY
XXXYYYYYYY
These are phone numbers, so XXX is dynamic and will not always hold the same value. The regex is intended to be used in PHP, so I know I should use the preg_match() function, but I have no idea how to write the regex. Can anyone give me some advice on this?
This sounds like it matches your requirements:
(\d{3})\d{7}$
With a Live Demo
If you want to match the last 3 numbers, you could use /([0-9]{3,3})$/. Note that this captures the last three digits of the whole number, not the XXX block from the question.
The parentheses indicate a capturing group (what you are looking for). Inside that group you want to match the pattern [...] of any digit 0-9 exactly 3 times, {3,3} (which is equivalent to {3}). And finally you want to match the last occurrence of this pattern, so the $ anchors it to the end of the line.
A very handy tool to building simple regex queries is Debuggex. I use it all the time!
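As a rough sketch of how the (\d{3})\d{7}$ pattern could be used with preg_match(), the third argument receives the captured groups. The function name extractXxx and the sample numbers are invented for illustration.

```php
<?php
// Hypothetical helper: returns the three digits that precede the final seven,
// or null when the input does not end in ten digits.
function extractXxx(string $phone): ?string
{
    if (preg_match('/(\d{3})\d{7}$/', $phone, $matches) === 1) {
        return $matches[1]; // group 1 is the XXX part
    }
    return null;
}

// Made-up numbers in the four formats from the question.
$numbers = ['+584161234567', '+5804161234567', '04161234567', '4161234567'];
foreach ($numbers as $number) {
    echo $number, ' => ', extractXxx($number), PHP_EOL;
}
```

Anchoring at the end with $ is what makes the optional +58 and 0 prefixes irrelevant: the pattern only looks at the last ten digits.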

PHP - Find number between 2 Unicode characters

Simple problem, but I'm bad at regular expressions, so I need your help here.
What do I need to write to find the number between the first two • signs?
I found its code point, but it doesn't help me much: http://www.fileformat.info/info/unicode/char/2022/index.htm
Do you know what I should pass to, for example, the preg_match function to make it work?
Example:
• 12345 • TESTTESTTEST
Example Output:
12345
Thanks in advance!
To match a specific Unicode code point, use \x{FFFF} where FFFF is the hexadecimal number of the code point you want to match. You can omit leading zeros in the hexadecimal number between the curly braces. Since \x by itself is not a valid regex token, \x{1234} can never be confused to match \x 1234 times. It always matches the Unicode code point U+1234. \x{1234}{5678} will try to match code point U+1234 exactly 5678 times.
Anyway, what you're probably looking for is something like this:
\x{2022} (\d*) \x{2022}
As for the (\d*) part, it means "match zero or more digits" and capture that part of the match (parentheses create capture groups).
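In PHP this might look like the sketch below. Two things here are my additions rather than part of the answer: the /u modifier, which PHP's preg functions need before \x{2022} is accepted as a Unicode code point, and \d+ instead of \d* so that an empty pair of bullets does not count as a match. The function name is hypothetical.

```php
<?php
// Hypothetical helper: returns the digits between the first two bullets, or null.
// The /u modifier makes PCRE treat pattern and subject as UTF-8.
function numberBetweenBullets(string $subject): ?string
{
    if (preg_match('/\x{2022} (\d+) \x{2022}/u', $subject, $matches) === 1) {
        return $matches[1]; // contents of the capture group
    }
    return null;
}

echo numberBetweenBullets('• 12345 • TESTTESTTEST'), PHP_EOL; // prints 12345
```

The script file itself is assumed to be saved as UTF-8 so that the bullet in the sample string is the same code point the pattern looks for.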
Actually i found out a way to do it a bit easier.
I used preg_match() with $pattern = "/[0-9]{1,}/";
Huh xD

Using a regular expression to validate email addresses

I have just started learning to code both PHP as well as HTML and had a look at a few tutorials on regular expressions however have a hard time understanding what these mean. I appreciate any help.
For example, I would like to validate the email address peanuts#monkey.com. I start off with the code and I get the message invalid email address.
What am I doing wrong?
I know that the metacharacters such as ^ denote the start of a string and $ denote the end of a string however what does this mean? What is the start of a string and what is the end of a string?
When do I group regular expressions?
$emailaddress = 'peanuts#monkey.com';
if(preg_match('/^[a-zA-z0-9]+#[a-zA-z0-9]+\.[a-zA-z0-9]$/', $emailaddress)) {
echo 'Great, you have a valid email address';
} else {
echo 'boo hoo, you have an invalid email address';
}
What you have written works with some small modifications, if that is what you want to use; however, you are missing a '+' at the end.
1)
^[a-zA-Z0-9]+#[a-zA-Z0-9]+\.[a-zA-Z0-9]+$
The caret and dollar characters match positions rather than characters: ^ matches the beginning of the line and $ matches the end of the line (in PHP, without the m modifier, that means the start and end of the whole string). They are used to anchor your regex. If you write your regex without those two, you will match email addresses anywhere in your text, not only an address that sits alone on a line. If you had written only the ^ (caret), you would have found only email addresses at the start of the line, and if you had written only the $ (dollar), you would have found only email addresses at the end of the line.
Blah blah blah someEmail#email.com
blah blah
would not give you a match because you do NOT have an email address at the beginning of the line and the line does not end with it either, so in order to match it in this context you would have to drop ^ and $.
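The difference between the anchored and unanchored forms can be tried out with a sketch like this one. The function names and the sample sentence are made up, and the '#' separator is kept exactly as it is written in this thread.

```php
<?php
// Same character classes as pattern 1) above.
const ADDRESS = '[a-zA-Z0-9]+#[a-zA-Z0-9]+\.[a-zA-Z0-9]+';

// Anchored: the entire subject has to be an address.
function isAddress(string $subject): bool
{
    return preg_match('/^' . ADDRESS . '$/', $subject) === 1;
}

// Unanchored: returns the first address found anywhere in the subject, or null.
function findAddress(string $subject): ?string
{
    return preg_match('/' . ADDRESS . '/', $subject, $m) === 1 ? $m[0] : null;
}

$text = 'Blah blah blah someEmail#email.com blah blah';
var_dump(isAddress('someEmail#email.com')); // bool(true)
var_dump(isAddress($text));                 // bool(false)
var_dump(findAddress($text));               // string(19) "someEmail#email.com"
```

For validating a form field the anchored version is usually the one you want; the unanchored version is for searching inside longer text.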
Grouping is used for two reasons as far as I know: back referencing and... grouping. Grouping is used for the same reasons as in math, 1 + 3 * 4 is not the same as (1 + 3) * 4. You use parentheses to constrain quantifiers such as '+', '*' and '?' as well as alternation '|' etc.
You also use parentheses for back referencing, but since I can't explain it better, I would link you to: http://www.regular-expressions.info/brackets.html
I would encourage you to take a look at this book; even if you only read the first 2-3 chapters you will learn a lot, and it is a great book! http://oreilly.com/catalog/9781565922570
And as the commenters say, this regex is not perfect, but it works and shows you what you had forgotten. You were not far off!
UPDATED as requested:
The '+', '*' and '?' are quantifiers, and they are also a good example of where you group.
'+' means match the preceding character or group 1 to n times.
'*' means match the preceding character or group 0 to n times.
'?' means match the preceding character or group 0 or 1 time.
(n times meaning indefinitely.)
The reason you use [a-zA-Z0-9]+ is that without the '+' it will match only one character. With the + it will match many, but it must match at least one. With * it will match many but also 0, and with ? it will match 1 character at most but also 0.
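The three quantifiers can be compared side by side with a small experiment. This is just an illustrative sketch: the helper name matches and the choice of the letter a as the repeated character are mine.

```php
<?php
// Hypothetical helper: true when $pattern matches $subject.
function matches(string $pattern, string $subject): bool
{
    return preg_match($pattern, $subject) === 1;
}

// Each pattern is anchored so the quantifier alone decides the outcome.
$patterns = ['/^a+$/', '/^a*$/', '/^a?$/'];
$subjects = ['', 'a', 'aaa'];

foreach ($patterns as $pattern) {
    foreach ($subjects as $subject) {
        printf("%-8s %-6s %s\n", $pattern, var_export($subject, true),
            matches($pattern, $subject) ? 'match' : 'no match');
    }
}
```

Only a+ rejects the empty string, and only a? rejects 'aaa'.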
Your regex doesn't match email addresses. Try this one:
/\b[\w\.-]+#[\w\.-]+\.\w{2,4}\b/
I recommend you read through this tutorial to learn about Regular Expressions.
Also, RegExr is great for testing them out.
As for your second question; the ^ character means that the regular expression must start matching from the first character in the string you input. The $ means that the regular expression must end at the final character in the string you input. In essence, this means that your regular expression will match the following string:
peanuts#monkey.com
but NOT the following string:
My email address is peanuts#monkey.com, and I love it!
Grouping regular expressions has lots of use cases. Using matching groups will also make your expression cleaner and more readable. It's all explained quite well in the tutorial I linked earlier.
As CanSpice points out, matching all possible email addresses isn't all that easy. Using the RFC2822 Email Validation expression will do a better job:
/[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*#(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?/
There are many alternatives, but even the simplest ones will do a fair job as most email addresses end in .com (or other 2-4 character length top domains).
The only reason your original expression doesn't work is that you're limiting the number of characters behind the period (.) in your expressions to 1. Changing your expression to:
/^[a-zA-z0-9]+#[a-zA-z0-9]+\.[a-zA-z0-9]+$/
Will allow for an infinite amount of characters behind the last period.
/^[a-zA-z0-9]+#[a-zA-z0-9]+\.[a-zA-z0-9]{2,4}$/
Will allow 2 to 4 characters behind the last period. That would match:
name#email.com
name#email.info
but not:
fake#address.suckers
The top level domain (".com," ".net," ".museum") can be longer than 4 characters: ".museum" has 6, and some newer top level domains are longer still. So you should be saying at least 2,6 instead of 2,4.
I wrote an extremely good email address regular expression a few years ago:
^\w+([-+._]\w+)#(\w+((-+)|.))\w{1,63}.[a-zA-Z]{2,6}$
A lot of research went into that. But I have some basic tips:
DON'T JUST COPY-PASTE! If someone says "here's a great regex for that," don't just copy paste it! Understand what's going on! Regular expressions are not that hard. And once you learn them well, it'll pay dividends forever. I got good at them by taking a class in Perl back in college. Since then, I've barely gotten any better and am WAY better than the vast majority of programmers I know. It's sad. Anyways, learn it!
Start small. Instead of building a giant regex and testing it when you're done, test just a few characters. For example, when writing an email validator, why not try \w+#\w+\.\w+ and see how good that is? Add in a few more things and re-test. Like ^\w+#\w+\.[A-Za-z]{2,6}$
The start and end anchors of a regex mean that nothing can come before or after the characters you specify. Your regex needs to account for underscores, needs a capital Z in your capital ranges (A-Z, not A-z), and other adjustments.
/^[a-zA-Z_0-9]+#[a-zA-Z0-9]+\.[a-zA-Z0-9]{2,4}$/
{2,4} says the top level domain is between 2 and 4 characters.
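Why the capital Z matters can be shown with a short sketch. The range A-z runs through the ASCII codes between Z and a, so it also lets in [ \ ] ^ _ and the backtick. The variable names and sample strings below are made up for the demonstration.

```php
<?php
// The range A-z spans ASCII 65 to 122, which also covers [ \ ] ^ _ and `.
$typo  = '/^[a-zA-z0-9]+$/';
$fixed = '/^[a-zA-Z0-9]+$/';

foreach (['monkey', 'mon^key', 'mon_key', 'mon[key]'] as $subject) {
    // preg_match returns 1 on a match and 0 otherwise.
    printf("%-10s A-z: %d  A-Z: %d\n", $subject,
        preg_match($typo, $subject), preg_match($fixed, $subject));
}
```

Only 'monkey' passes the corrected class; the A-z version accepts all four.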
This should validate most email addresses (at least, it has worked for every one I've tried):
preg_match("/^[a-z0-9._-]{2,}+\#[a-z0-9_-]{2,}+\.([a-z0-9-]{2,4}|[a-z0-9-]{2,}+\.[a-z0-9-]{2,4})$/i", $emailaddress);
Hope it works!
Make sure you ALWAYS escape metacharacters (like dot):
if(preg_match('/^[a-zA-z0-9]+#[a-zA-z0-9]+\.[a-zA-z0-9]$/', $emailaddress)) {

Regular Expression to match dates in YYYY-MM-DD format

I have a regular expression in PHP that looks for the date in the format of YYYY-MM-DD
What I have is: [\d]{4}-[\d]{2}-[\d]{2}
I'm using preg_match to test the date, the problem is that 2009-11-10 works, but 2009-11-1033434 works as well. It's been awhile since I've done regex, how do I ensure that it stops at the correct spot? I've tried doing /([\d]{4}-[\d]{2}-[\d]{2}){1}/, but it returns the same result.
Any help would be greatly appreciated.
What you need is anchors, specifically ^ and $. The former matches the beginning of the string, the latter matches the end.
The other point I would make is that the [] are unnecessary. \d retains its meaning outside of character classes.
So your regex should look like this: /^\d{4}-\d{2}-\d{2}$/.
^20[0-2][0-9]-((0[1-9])|(1[0-2]))-(0[1-9]|[12][0-9]|3[0-1])$
I added a little extra check to help with the issue of MM and DD getting mixed up by the user. This doesn't catch all date mix-ups, but it does keep the YYYY part between 2000 and 2029, the MM between 01 and 12 and the DD between 01 and 31.
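A regex of this kind checks the shape of the date but still accepts impossible ones such as 2009-02-31. One possible approach, which is not taken from any of the answers here, is to let an anchored regex check the format and then hand the captured parts to PHP's built-in checkdate(). The function name isValidDate is hypothetical.

```php
<?php
// Hypothetical helper: format check with an anchored regex, then a calendar
// check with PHP's built-in checkdate(month, day, year).
function isValidDate(string $input): bool
{
    if (preg_match('/^(\d{4})-(\d{2})-(\d{2})$/', $input, $m) !== 1) {
        return false;
    }
    return checkdate((int) $m[2], (int) $m[3], (int) $m[1]);
}

foreach (['2009-11-10', '2009-11-1033434', '2009-02-31', '2009-13-01'] as $date) {
    echo $date, ' => ', isValidDate($date) ? 'valid' : 'invalid', PHP_EOL;
}
```

Only the first sample is reported as valid: the second fails the anchored pattern, and the other two fail the calendar check.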
How do you expect your date to be terminated?
If by an end-of-line, then a following $ should do the trick.
If by a non-digit character, then a following negative lookahead assertion (?!\d) will similarly work.
You probably want to put anchors on the expression,
i.e.
^[\d]{4}-[\d]{2}-[\d]{2}$
Note the caret and dollar sign.
You probably want lookahead assertions (assuming your engine supports them; PHP's preg/PCRE does).
Lookahead assertions (or positive assertions) allow you to say "and it should be followed by X, but X shouldn't be a part of the match". Try the following syntax:
\d{4}-\d{2}-\d{2}(?=[^0-9])
The assertion is this part
(?=[^0-9])
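It's saying "after my regex, the next character must be something other than a digit". Note that this requires some character to follow, so it will not match a date at the very end of the string; the negative form (?!\d) covers that case too.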
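The difference between a positive lookahead for a non-digit and a negative lookahead for a digit shows up at the end of the string. The sketch below compares the two; the variable names and sample subjects are mine.

```php
<?php
$positive = '/\d{4}-\d{2}-\d{2}(?=[^0-9])/'; // needs a non-digit character after the date
$negative = '/\d{4}-\d{2}-\d{2}(?!\d)/';     // only forbids a digit after the date

$subjects = ['2009-11-10', '2009-11-10 and more', '2009-11-1033434'];
foreach ($subjects as $subject) {
    // preg_match returns 1 on a match and 0 otherwise.
    printf("%-22s (?=[^0-9]): %d  (?!\\d): %d\n", $subject,
        preg_match($positive, $subject), preg_match($negative, $subject));
}
```

Both forms reject the date with trailing digits, but only the negative lookahead accepts a date that is the whole string. Neither pattern guards against extra digits before the year; for strict validation the anchored form with ^ and $ is simpler.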
If that doesn't get you what you want/need, post an example of your input and your PHP code that's not working. Those two items can be hugely useful in debugging these kinds of problems.
[\d]{4}-[\d]{2}-[\d]{2}?
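where the question mark means "non-greedy". Note that {2}? still matches exactly two digits, so on its own this does not reject trailing digits.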
You could try putting both a '^' and a '$' symbol at the start and end of your expression:
/^[\d]{4}-[\d]{2}-[\d]{2}$/
which match the start and the end of the string respectively.
