What does this PHP regular expression mean? [duplicate] - php

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 3 years ago.
I'm learning PHP regular expressions, and I came across something I'm having trouble making sense of.
The book gives this example in validating an e-mail address.
if (ereg("^[^@]+@([a-z0-9\-]+\.)+[a-z]{2,4}$", $email))
I'm not clear on a couple of elements of this expression.
What does [^@]+@ mean?
What is the purpose of the parentheses in ([a-z0-9\-]+\.)?

[^@]+@ means:
[ - Match this set of characters
^@ - Anything that is NOT an at sign
]
+ - One or more times
@ - Match an at sign
So, it's essentially matching every character before the first at sign, followed by the at sign itself.
The parentheses in ([a-z0-9\-]+\.) group the "label followed by a dot" unit, so that the + after the closing parenthesis repeats the whole unit (mail., example., and so on). They also create a capturing group, which you can reference later once the group has captured some text.
Also note that the ereg_* functions are deprecated (as of PHP 5.3, and removed in PHP 7), so your book must be a bit dated. Nowadays, we use the preg_* family of functions. A tutorial on converting them can be found in this SO question.
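To see the grouping behaviour described above, here is a minimal sketch in Python rather than PHP. The helper name check_email and the sample addresses are made up for illustration, and fullmatch() is used as a stand-in for the ^...$ anchors.

```python
import re

# Same pattern as the book's example; fullmatch() stands in for the ^...$ anchors.
EMAIL_PATTERN = re.compile(r"[^@]+@([a-z0-9\-]+\.)+[a-z]{2,4}")


def check_email(address):
    """Return the match object for an address the pattern accepts, or None."""
    return EMAIL_PATTERN.fullmatch(address)


match = check_email("user@mail.example.com")
# The group repeats once per "label." unit; group(1) keeps only the last repetition.
last_label = match.group(1) if match else None

no_dot = check_email("user@localhost")     # no "label." unit after the at sign
two_ats = check_email("a@b@example.com")   # a second at sign cannot be part of a label
```

The repeated group is what lets the pattern accept any number of dot-separated labels before the final 2 to 4 letter suffix.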

Related

Is it safe to mass replace legacy ASP tags with <?= ?> [duplicate]

This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Closed 4 years ago.
I have this RegEx:
('.+')
It has to match character literals like in C. For example, if I have 'a' b 'a' it should match the a's and the ''s around them.
However, it also matches the b also (it should not), probably because it is, strictly speaking, also between ''s.
Here is a screenshot of how it goes wrong (I use this for syntax highlighting):
I'm fairly new to regular expressions. How can I tell the regex not to match this?
It is being greedy and matching the first apostrophe and the last one and everything in between.
This should match anything that isn't an apostrophe.
('[^']+')
Another alternative is to try non-greedy matches.
('.+?')
Have you tried a non-greedy version, e.g. ('.+?')?
There are usually two modes of matching (or two sets of quantifiers): maximal (greedy) and minimal (non-greedy). The first results in the longest possible match, the second in the shortest. You can read about it (although in a Perl context) in the Perl Cookbook (Section 6.15).
Try:
('[^']+')
The ^ at the start of a character class means: match any character except the ones listed in the square brackets. This way, it won't match 'a' b 'a' as a whole because there's a ' in between; instead it gives both instances of 'a'.
You need to escape the quotes:
\'[^\']+\'
Edit: Hmm, well, I suppose this answer depends on what language/system you're using.
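The three patterns suggested in these answers can be compared side by side. This is only a sketch using Python's re module, which may differ in details from whatever engine the syntax highlighter uses; the variable names are illustrative.

```python
import re

text = "'a' b 'a'"

greedy = re.findall(r"('.+')", text)      # .+ runs on to the last apostrophe
lazy = re.findall(r"('.+?')", text)       # .+? stops at the next apostrophe
negated = re.findall(r"('[^']+')", text)  # [^']+ can never cross an apostrophe
```

Here the greedy version returns one long match covering the whole string, while the lazy and negated-class versions each return the two separate 'a' literals.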

Regular Expression - Find all matches after string but before character [duplicate]

This question already has answers here:
Regular expression to stop at first match
(9 answers)
Closed 2 years ago.
I have this gigantic ugly string:
J0000000: Transaction A0001401 started on 8/22/2008 9:49:29 AM
J0000010: Project name: E:\foo.pf
J0000011: Job name: MBiek Direct Mail Test
J0000020: Document 1 - Completed successfully
I'm trying to extract pieces from it using regex. In this case, I want to grab everything after Project Name up to the part where it says J0000011: (the 11 is going to be a different number every time).
Here's the regex I've been playing with:
Project name:\s+(.*)\s+J[0-9]{7}:
The problem is that it doesn't stop until it hits the J0000020: at the end.
How do I make the regex stop at the first occurrence of J[0-9]{7}?
Make .* non-greedy by adding '?' after it:
Project name:\s+(.*?)\s+J[0-9]{7}:
Using non-greedy quantifiers here is probably the best solution, also because it is more efficient than the greedy alternative: Greedy matches generally go as far as they can (here, until the end of the text!) and then trace back character after character to try and match the part coming afterwards.
However, consider using a negative character class instead:
Project name:\s+(\S*)\s+J[0-9]{7}:
\S means "everything except whitespace", and this is exactly what you want.
Well, ".*" is a greedy selector. You make it non-greedy by using ".*?". With the latter construct, each time the regex engine is about to consume another character with the ".", it first attempts to match whatever may come after the ".*?". This means that if, for instance, nothing comes after the ".*?", it matches nothing.
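The difference between the greedy, non-greedy and \S variants can be checked with a short sketch. It is written in Python and assumes the log is a single line with the records separated by spaces; the helper project_name is hypothetical.

```python
import re

# Assumption: the log is one line, with records separated by single spaces.
log = (
    "J0000000: Transaction A0001401 started on 8/22/2008 9:49:29 AM "
    r"J0000010: Project name: E:\foo.pf "
    "J0000011: Job name: MBiek Direct Mail Test "
    "J0000020: Document 1 - Completed successfully"
)


def project_name(pattern):
    """Return capture group 1 of the first match in the log, or None."""
    m = re.search(pattern, log)
    return m.group(1) if m else None


greedy = project_name(r"Project name:\s+(.*)\s+J[0-9]{7}:")     # runs on to J0000020:
lazy = project_name(r"Project name:\s+(.*?)\s+J[0-9]{7}:")      # stops at J0000011:
non_space = project_name(r"Project name:\s+(\S*)\s+J[0-9]{7}:")  # cannot cross a space
```

Note that the \S version only works because this particular path contains no spaces; the non-greedy version does not have that restriction.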
Here's what I used. s contains your original string. This code is .NET specific, but most flavors of regex will have something similar.
string m = Regex.Match(s, @"Project name: (?<name>.*?) J\d+").Groups["name"].Value;
I would also recommend you experiment with regular expressions using "Expresso" - it's a great (and free) utility for regex editing and testing.
One of its upsides is that its UI exposes a lot of regex functionality that people inexperienced with regex might not be familiar with, in a way that makes it easy for them to learn these new concepts.
For example, when building your regex using the UI, and choosing "*", you have the ability to check the checkbox "As few as possible" and see the resulting regex, as well as test its behavior, even if you were unfamiliar with non-greedy expressions before.
Available for download at their site:
http://www.ultrapico.com/Expresso.htm
Express download:
http://www.ultrapico.com/ExpressoDownload.htm
(Project name:\s+[A-Z]:(?:\\\w+)+\.[a-zA-Z]+\s+J[0-9]{7})(?=:)
This will work for you.
Using (?:\\\w+)+\.[a-zA-Z]+ instead of .* makes the match more restrictive.

Why does this regex pattern not match? [duplicate]

This question already has answers here:
Greedy vs. Reluctant vs. Possessive Qualifiers
(7 answers)
Can someone explain Possessive Quantifiers to me? (Regular Expressions)
(1 answer)
Closed 5 years ago.
Regex101 link: https://regex101.com/r/MsZy0A/2
I have the following regex pattern, .++b, with the following test data: aaaaaaaacaeb.
What I don't understand is the "Possessive quantifier". I've read that it doesn't backtrack, which it normally does. However, I don't think it has to backtrack anyways? It only has to match anything up to and including "b", "b" would be matched twice, as .+ matches everything (including "b"), and the "b" after would also match "b".
Could someone please explain the possessive quantifier's role in this?
This question is not a duplicate of the one noted, I'm asking about this particular case because I still didn't get it after reading the other answer.
++ matches between one and unlimited times, as many times as possible, without giving back. This means that if you write .++, it matches everything including the final b, so the additional b in your regex will never be matched.
You could get around this by not using possessive quantifiers, or by simply excluding the b from the matched class: [^b]++b. I would suggest the first; possessive quantifiers are almost always unnecessary.
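The behaviour described above can be reproduced in a small sketch. Python's re module only gained the ++ syntax in version 3.11, so this example instead emulates a possessive quantifier with the lookahead-plus-backreference idiom, (?=(X))\1. That substitution is an assumption made here for portability and is not part of the original answer.

```python
import re

text = "aaaaaaaacaeb"

# Greedy: .+ first grabs the whole string, then gives back the final "b".
greedy = re.search(r".+b", text)

# Emulated .++b: the lookahead captures as much as .+ can take, and the
# engine does not backtrack into a lookahead once it has succeeded.
possessive = re.search(r"(?=(.+))\1b", text)

# Emulated [^b]++b: the run stops before "b" on its own, so nothing
# has to be given back for the trailing b to match.
possessive_no_b = re.search(r"(?=([^b]+))\1b", text)
```

The greedy and [^b] variants match the whole test string, while the emulated .++b finds no match at all, because the final b has already been consumed and is never released.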

Regex for name with space in HTML5 [duplicate]

This question already has an answer here:
html5 pattern for first and last name
(1 answer)
Closed 5 years ago.
I tried the pattern below for a Name field in HTML5, but every time I get the error "Please match the requested format".
Apart from the above pattern, I also tried different patterns, such as:
a) pattern="/^[A-Za-z\s]+$/"
b) pattern="/^[A-Za-z]\s[A-Za-z]+$/"
None of the three patterns works. What I want is a simple "Firstname Lastname", like "Harry Potter".
Please advise.
Thanks in advance.
You need to remove the leading and trailing slashes from your pattern. In JavaScript they indicate that the text between them is a regular expression literal, but the value of the HTML pattern attribute is already known to be a regular expression.
The pattern itself will depend on what kind of names you want to accept. Your expression will not accept non-Latin names, and there are a lot of people with such names. Basically, you want to check for the existence of at least 2 words (since a name can contain more than 2 words). For example, you can use pattern="\D\S+(\s+\D\S+)+", which checks for at least 2 words separated by whitespace, where each word does not start with a digit.
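The effect of the stray slashes can be approximated outside the browser. The sketch below uses Python's re.fullmatch as a rough stand-in for the way the pattern attribute has to match the entire field value; the helper pattern_accepts is hypothetical, and Python's regex dialect is not identical to the one browsers use.

```python
import re


def pattern_accepts(pattern, value):
    """Rough stand-in for the browser check: the pattern must match the whole value."""
    return re.fullmatch(pattern, value) is not None


# With the slashes left in, the value itself would have to start and end with "/".
with_slashes = pattern_accepts(r"/^[A-Za-z\s]+$/", "Harry Potter")
without_slashes = pattern_accepts(r"[A-Za-z\s]+", "Harry Potter")

# The suggested "at least two words" pattern.
two_words = pattern_accepts(r"\D\S+(\s+\D\S+)+", "Harry Potter")
one_word = pattern_accepts(r"\D\S+(\s+\D\S+)+", "Harry")
```

Because the whole value has to match, the ^ and $ anchors are not needed inside the attribute either.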

Need to switch from ereg() to preg_match() [duplicate]

This question already has answers here:
How can I convert ereg expressions to preg in PHP?
(4 answers)
Closed 9 years ago.
I need to know what this line of code does. I tried to figure it out because I have to rebuild it with preg_match(), but I didn't understand it completely:
ereg("([0-9]{1,2}).([0-9]{1,2}).([0-9]{4})", $date)
I know it checks a date, but I don't know in which way.
Thanks for any help.
Let's break this down:
([0-9]{1,2})
This looks for numbers zero through nine (- indicates a range when used in brackets []) and there can be 1 or two of them.
.
This looks for any single character
([0-9]{1,2})
This looks for numbers zero through nine and there can be 1 or two of them (again)
.
This looks for any single character (again)
([0-9]{4})
This looks for numbers zero through nine and there must be four of them in a row
So it is looking for a date in any of the following formats:
04 18 1973
04-18-1973
04/18/1973
04.18.1973
More will fit that pattern, so it isn't a very good regex for what it is supposed to validate. There are lots of sample regex patterns for matching dates in this format, so if you google it you'll have a PCRE in no time.
It's a relatively simple regular expression (regex). If you're going to be working with regex, then I suggest taking a bit of time to learn the syntax. A good starting place to learn is http://regular-expressions.info.
"Regular expressions" or "regex" is a pattern matching language used for searching through strings. There are a number of dialects, which are mostly fairly similar but have some differences. PHP started out with the ereg() family of functions using one particular dialect and then switched to the preg_xx() functions to use a slightly different regex dialect.
There are some differences in syntax between the two, which it is helpful to learn, but they're fairly minor. And in fact the good news for you is that the pattern here is pretty much identical between the two.
Beyond the patterns themselves, the only other major difference you need to know about is that patterns in preg_match() must have a pair of delimiting characters at either end of the pattern string. The most commonly used characters for this are slashes (/).
So in this case, all you need to do is swap ereg for preg_match, and add the slashes to either end of the pattern:
$result = preg_match("/([0-9]{1,2}).([0-9]{1,2}).([0-9]{4})/", $date);
(Note the slash added at the start and at the end of the pattern.)
It would still help to get an understanding of what the pattern is doing, but for a quick win, that's probably all you need to do in this case. Other cases may be more complex, but most will be as simple as that.
Go read the regular-expressions.info site I linked earlier though; it will help you.
One thing I would add, however, is that the pattern given here is actually quite poorly written. It is intending to match a date string, but will match a lot of things that it probably didn't intend to.
You could fix it up by finding a better regex expression for matching dates, but it is quite possible that the code could be written without needing regex at all -- PHP has some perfectly good date handling functionality built into it. You'd need to consider the code around it and understand what it's doing, but it's perfectly possible that the whole thing could be replaced with something like this:
$dateObject = DateTime::createFromFormat('d.m.Y', $date);
It looks like it would be pretty much agnostic in its matching.
You could interpret it either as mm.dd.yyyy or dd.mm.yyyy. I would consider modifying it if you are in fact trying to match/verify a date, as 00.00.0000 would be a match but is an invalid date, outside of a possible historic context.
Edit: I forgot that '.' in this case matches any character, since it is not escaped.
This does nearly the same thing: I have only replaced [0-9] with \d, and the dot (which matches any character) with \D (a non-digit; you could also use \. or [. -]). Note that, unlike the original, it requires exactly two digits for the day and the month.
preg_match("~\d{2}\D\d{2}\D\d{4}~", $date)
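To illustrate why several answers call the original pattern too loose, and why a date parser can be a better fit, here is a hedged sketch in Python rather than PHP. It assumes the intended format is dd.mm.yyyy; the names LOOSE_DATE and parse_date are made up for this example.

```python
import re
from datetime import datetime

# The pattern from the question, with its unescaped dots.
LOOSE_DATE = re.compile(r"([0-9]{1,2}).([0-9]{1,2}).([0-9]{4})")


def parse_date(text):
    """Return a datetime for a real dd.mm.yyyy date, or None (assumed day-first order)."""
    try:
        return datetime.strptime(text, "%d.%m.%Y")
    except ValueError:
        return None


regex_accepts_zero = LOOSE_DATE.search("00.00.0000") is not None
regex_accepts_junk = LOOSE_DATE.search("04x18y1973") is not None  # "." matches x and y

parsed_ok = parse_date("18.04.1973")
parsed_zero = parse_date("00.00.0000")     # rejected: there is no day 0 or month 0
parsed_bad_day = parse_date("31.02.1973")  # rejected: February has no 31st
```

The regex accepts both the all-zero string and the string with letters as separators, whereas the parser accepts only the one string that is a real calendar date.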
