What do these certain symbols/parts mean in preg_match? - php

I know a little about preg_match, however there are some that look rather complex and some that contain symbols that I don't entirely understand. For example:
On the first one - I can only assume this has something to do with an e-mail address and url, but what do things like [^/] and the ? mean?
preg_match('#^(?:http://)?([^/]+)#i', $variable);
.....
In the second one - what do things like the ^, {5} and $ mean?
preg_match("/^[A-Z]{5}[0-9]{4}[A-Z]{1}$/", $variable);
It's just these small things I'm not entirely sure on and a brief explanation would be much appreciated.


Although I am not a fan of just posting links, I think a regex tutorial would be too much. So check out this Regular Expression cheat sheet; it will probably get you on your way if you already have a little understanding of what it does.
Also check out this for some explanations and more helpful links; http://coding.smashingmagazine.com/2009/06/01/essential-guide-to-regular-expressions-tools-tutorials-and-resources/

First one:
The # characters don't actually have anything to do with the content that is matched. Usually, you use / as the delimiter character in a regex. The downside is that you need to escape it every time you want to use it inside the pattern. So here, # is used as the delimiter.
[^/] is a character class. [/] would match only the / character; ^ inverts this, so [^/] matches any character except /.
Second one:
^ matches the beginning of the string, $ the end of the string. You can use this to enforce that the regex has to apply to the whole string you are matching on.
{5} is a quantifier. It is equivalent to {5,5} which is minimum 5, maximum 5, so it matches exactly 5 characters.
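As a rough illustration of the first pattern, the sketch below runs it against a few sample inputs. The helper name extractHost and the sample URLs are assumptions made for this example; they are not part of the question.

```php
<?php
// Hypothetical helper: returns the part before the first "/" (the host),
// ignoring an optional leading "http://". Returns null when nothing matches.
function extractHost(string $url): ?string
{
    // "#" is the delimiter, so the "/" characters need no escaping.
    // (?:http://)?  optional, non-capturing "http://" prefix
    // ([^/]+)       capture one or more characters that are not "/"
    // i             case-insensitive flag
    if (preg_match('#^(?:http://)?([^/]+)#i', $url, $matches)) {
        return $matches[1];
    }
    return null;
}

echo extractHost('http://www.example.com/index.html'), "\n"; // www.example.com
echo extractHost('www.example.com/path'), "\n";              // www.example.com
var_dump(extractHost('/no-host'));                           // NULL
```

The last call returns null because the subject starts with a slash, so [^/]+ has nothing to match at the anchored start position.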

first one:
[^/] = everything but no slash
second one:
^ look from beginning of $variable
{5} exactly 5 occurrences of [A-Z]
$ look until end of $variable reached
combination of ^ and $ means that everything between that has to apply to $variable
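To see the effect of ^, {5} and $ together, here is a small sketch using the second pattern from the question. The function name isValidCode and the sample values are made up for the illustration.

```php
<?php
// Hypothetical helper wrapping the second pattern from the question.
function isValidCode(string $value): bool
{
    // ^ and $ anchor the pattern to the start and end of the string,
    // so the whole value must be 5 letters, 4 digits, then 1 letter.
    return preg_match('/^[A-Z]{5}[0-9]{4}[A-Z]{1}$/', $value) === 1;
}

var_dump(isValidCode('ABCDE1234F'));  // bool(true)
var_dump(isValidCode('ABCDE1234FG')); // bool(false): extra trailing letter
var_dump(isValidCode('xABCDE1234F')); // bool(false): extra leading character
var_dump(isValidCode('ABCD1234F'));   // bool(false): only 4 leading letters
```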

Related

PHP regex pattern for matching username

I'm developing a laravel application where a user can refer to his profile by putting his username in the appropriate form.
Let's see an example:
A user named John can refer to his profile using the following text: #John
I spent several hours trying to understand how regex works, but this pattern is where i've got so far: #([A-Za-z0-9]+)
This pattern perfectly matches the example above, but it also matches other formats that it normally shouldn't.
I need some help creating the perfect pattern.
It should only match a string that starts with the # symbol.
For example: #John, #Sam, #Bill, etc.
It shouldn't match a string that doesn't start with the # symbol.
For example: a#John, something#Sam, 123#Bill, etc.
It should also match those formats that contain more than one # symbols.
For example: #John#, #Sam#something, #Bill##sometext, etc.
In this case the pattern should capture: John#, Sam#something, Bill##sometext
Thanks for your help and sorry for my bad english.
This should work:
(?<=\s|^)#([\w#]+)
There is a positive lookbehind assertion to make sure the tag is preceded by whitespace, or the start of the string. After that it's just a case of consuming the # character and putting the username inside a capturing group.
Regex demo
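A minimal sketch of how the lookbehind pattern above could be used with preg_match_all. The function name findTaggedUsernames and the sample text are assumptions for this example, and ~ is used as the delimiter only so that the # in the pattern stands out.

```php
<?php
// Hypothetical helper: collect usernames that are tagged with a leading "#".
function findTaggedUsernames(string $text): array
{
    // (?<=\s|^)  lookbehind: the "#" must follow whitespace or the string start
    // #          the literal tag character
    // ([\w#]+)   capture word characters and any further "#" characters
    preg_match_all('~(?<=\s|^)#([\w#]+)~', $text, $matches);
    return $matches[1];
}

print_r(findTaggedUsernames('#John a#Sam #Bill##sometext'));
// Captures "John" and "Bill##sometext"; "a#Sam" is skipped.
```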
Your regex is almost correct.
Firstly, you want your regex to match from the beginning of the string. You can achieve that with the caret symbol (^):
^#([A-Za-z0-9]+)
Secondly, you want to be able to put the # sign inside. Now it's easy - just add that symbol inside the brackets.
^#([A-Za-z0-9#]+)
Try /(?:\s#{1,3})([a-zA-Z#.]+)/i
Explain
# Character. Matches a "#" character (char code 35).
{1,3} Quantifier. Match between 1 and 3 of the preceding token.
[a-zA-Z#.] Character class. Matches any letter, "#" or ".".
+ Plus. Match 1 or more of the preceding token.
Here is regexr: http://regexr.com/3djhq

Regex - Match Word Aslong As Nothing Follows It

Having a little trouble with regex. I'm trying to test for a match but only if nothing follows it. So in the below example if I go to test/create/1/2 - it still matches. I only want to match if it's explicitly test/create/1 (but the one is dynamic).
if(preg_match('^test/create/(.*)^', 'test/create/1')):
// do something...
endif;
I've found some answers that suggest using $ before my delimiter but it doesn't appear to do anything. Or a combination of ^ and $ but I can't quite figure it out. Regex confuses the hell out of me!
EDIT:
I didn't really explain this well enough so just to clarify:
I need the if statement to return true if a URL is test/create/{id} - the {id} being dynamic (and of any length). If the {id} is followed by a forward slash the if statement should fail. So that if someone types in test/create/1/2 - it will fail because of the forward slash after the 1.
Solution
I went for thedarkwinter's answer in the end as it's what worked best for me, although other answers did work as well.
I also had to add a little extra to the regex to make sure that it would work with hyphens as well, so the final code looked like this:
if(preg_match('^test/create/[\w-]*$^', 'test/create/1')):
// do something...
endif;
\w matches word characters, and $ matches end of string
if(preg_match('^test/create/\w*$^', 'test/create/1'))
will match test/create/[word/num] and nothing following.
I think thats what you are after.
edit added * in \w*
Here you go:
"/^test\\/create\\/([^\\/]*)$/"
This says:
The string starts with "test" followed by a forward slash (remember the first backslash escapes the second, so PHP puts a literal backslash in the pattern, which in turn escapes the / for the regex engine), followed by "create", followed by another forward slash. It then captures everything that isn't a slash, up to the end of the string.
Comment if you need more detail
I prefer my expressions to always start with / because it has no meaning as a regex character. I've seen # used, and I believe some other answer uses ^, but that means "start of string", so I wouldn't use it as my regex delimiter.
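A rough sketch of the same idea wrapped in a function. The name matchCreateRoute is hypothetical, and the sketch uses # as the delimiter and + instead of * (so an empty id is rejected), which is an assumption about what the route should accept.

```php
<?php
// Hypothetical helper: returns the id from "test/create/{id}", or null.
function matchCreateRoute(string $path): ?string
{
    // "#" is the delimiter, so "/" needs no escaping.
    // ([^/]+) captures one or more non-slash characters; $ requires the
    // string to end there, so a further "/..." segment makes the match fail.
    if (preg_match('#^test/create/([^/]+)$#', $path, $matches)) {
        return $matches[1];
    }
    return null;
}

var_dump(matchCreateRoute('test/create/1'));     // string(1) "1"
var_dump(matchCreateRoute('test/create/my-id')); // string(5) "my-id"
var_dump(matchCreateRoute('test/create/1/2'));   // NULL
```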
Use following regular expression (use $ to denote end of the input):
'|test/create/[^/]+$|'
If you want to match only digits, use the following instead (\d matches a digit character):
'^test/create/\d+$^'
The ^ is an anchor for the beginning of the line, i.e. no characters occurring before the ^ . Use a $ to designate the end of the string, or end of the line.
EDIT: wanted to add a suggestion as well:
Your solution is fine and works, but in terms of style I'd advise against using the caret (^) as a delimiter -- especially because it has special meaning as either negation or as a start of line anchor so it's a bit confusing to read it that way. You can legally use most special characters as long as they don't occur (or are escaped) in the regex itself. Just talking about a matter of style/maintainability here.
Of course nearly every potential delimiter has some special meaning, but you also often tend to see the ^ at the beginning of a regex so I might choose another alternative. For example # is a good choice here:
if(preg_match('#test/create/[\w-]*$#', $mystring)) {
//etc
}
The regex abc$ will match abc only when it's at the end of the string.
abcd # no match
dabc # match
abc # match
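The three cases above can be checked with a short script. This is only a sketch; the variable names are arbitrary.

```php
<?php
// Each subject is checked against /abc$/, which only matches "abc"
// when it sits at the very end of the subject.
$results = [];
foreach (['abcd', 'dabc', 'abc'] as $subject) {
    $results[$subject] = preg_match('/abc$/', $subject) === 1;
}

var_dump($results['abcd']); // bool(false)
var_dump($results['dabc']); // bool(true)
var_dump($results['abc']);  // bool(true)
```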

What do the symbols mean in preg_match?

I have this expression in a code snippet i borrowed offline. It forces the new users to have a password that not only requires upper+lower+numbers but they must be in that order! If i enter lower+upper+numbers, it fails!
if (preg_match("/^.*(?=.{4,})(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z]).*$/", $pw_clean, $matches)) {
I've searched online but can't find a resource that tells me what some characters mean. I can see that the pattern is preg_match("/some expression/",yourstring,your match).
What do these mean:
1. ^ - ???
2. .* - ???
3. (?=.{4,}) - requires 4 characters minimum
4. (?=.*[0-9]) - requires it to have numbers
5. (?=.*[a-z])- requires it to have lowercase
6. (?=.*[A-Z])- requires it to have uppercase
7. .*$ - ???
Here are the direct answers. I kept them short because they won't make sense without an understanding of regex. That understanding is best gained at regular-expressions.info. I advise you to also try out the regex helper tools listed there, they allow you to experiment - see live capturing/matching as you edit the pattern, very helpful.
1: The caret ^ is an anchor, it means "the start of the haystack/string/line".
If a caret is the first symbol inside a character class [], it has a different meaning: It negates the class. (So in [^ab] the caret makes that class match any character which is neither a nor b)
2: The dot . and the asterisk * serve two separate purposes:
The dot matches any single character except newline \n.
The asterisk says "allow zero or many of the preceding type".
When these two are combined as .* it basically reads "zero or more of anything until a newline or another rule comes into effect".
7: The dollar $ is also an anchor like the caret, with the opposite function: "the end of the haystack".
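The points about ^, the dot, the asterisk and $ can be tried out with a few throwaway calls. This is a sketch with made-up subjects, not part of the original snippet.

```php
<?php
// ^ as an anchor: "foo" must be at the start of the subject.
var_dump(preg_match('/^foo/', 'foobar')); // int(1)
var_dump(preg_match('/^foo/', 'barfoo')); // int(0)

// ^ as the first character inside [] negates the class:
// [^ab] matches any single character that is neither "a" nor "b".
preg_match_all('/[^ab]/', 'abcabd', $negated);
print_r($negated[0]); // contains "c" and "d"

// The dot does not match a newline, so .* stops at the end of the first line.
preg_match('/.*/', "line one\nline two", $firstLine);
echo $firstLine[0], "\n"; // line one

// $ anchors to the end: "bar" must be the last thing in the subject.
var_dump(preg_match('/bar$/', 'foobar')); // int(1)
var_dump(preg_match('/bar$/', 'barfoo')); // int(0)
```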
Edit:
Simple parentheses ( ) around something make it a group. Here you have (?=) which is an assertion, specifically a positive lookahead assertion. All it does is check whether what's inside actually exists forward from the current cursor position in the haystack. Still with me? Example: foo(?=bar) matches foo only if followed by bar. bar is checked but not consumed, so only foo is returned.
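The foo/bar case can be verified directly. A minimal sketch; the subjects are made up:

```php
<?php
// The lookahead checks for "bar" but does not consume it,
// so the overall match is just "foo".
preg_match('/foo(?=bar)/', 'foobar', $m);
echo $m[0], "\n"; // foo

// Without "bar" directly after "foo", the assertion fails and nothing matches.
var_dump(preg_match('/foo(?=bar)/', 'foobaz')); // int(0)
```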
With this in mind, let's dissect your regex:
/^.*(?=.{4,})(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z]).*$/
Reads as:
^.* From Start, capture 0-many of any character
(?=.{4,}) if there are at least 4 of anything following this
(?=.*[0-9]) if there is: 0-many of any, ending with an integer following
(?=.*[a-z]) if there is: 0-many of any, ending with a lowercase letter following
(?=.*[A-Z]) if there is: 0-many of any, ending with an uppercase letter following
.*$ 0-many of anything preceding the End
You say the order of password characters matter - it doesn't in my tests. See test script below. Hope this cleared up a thing or two. If you are looking for another regex which is a bit more forgiving, see regex password validation
<pre>
<?php
// Only the last 3 fail, as they should. You claim the first does not work?
$subjects = array("aaB1", "Baa1", "1Baa", "1aaB", "aa1B", "aa11", "aaBB", "aB1");
foreach ($subjects as $s)
{
    $res = preg_match("/^.*(?=.{4,})(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z]).*$/", $s, $matches);
    echo "result: ";
    print_r($res);
    echo "<br>";
    print_r($matches);
    echo "<hr>";
}
Excellent online tool for checking and testing Regular Expressions:
https://regex101.com/
If you don't know this site, you should go there immediately.
This is like the bible of regular expressions.
Regular-expressions.info
To use regular expressions, you first need to learn the syntax. This syntax consists of a series of letters, numbers, dots, hyphens and special signs, which we can group together using different parentheses.
Look at this link Getting Started with PHP Regular Expressions. An easy way to learn regular expressions.

regex validation

I am trying to validate a string of 3 numbers followed by / then 5 more numbers
I thought this would work
(/^([0-9]+[0-9]+[0-9]+/[0-9]+[0-9]+[0-9]+[0-9]+[0-9])/i)
but it doesn't, any ideas what i'm doing wrong
Try this
preg_match('#^\d{3}/\d{5}#', $string)
Part of the reason yours is not working is the + symbols, which match "one or more" of the nominated character or character class, so each [0-9]+ can match more digits than you intend.
Also, when using forward-slash delimiters (the characters at the start and end of your expression), you need to escape any forward-slashes in the pattern by prefixing them with a backslash, eg
/foo\/bar/
PHP allows you to use alternate delimiters (as in my answer) which is handy if your expression contains many forward-slashes.
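A short sketch comparing the two delimiter styles on the same subject. The subject string and variable names are assumptions for the example.

```php
<?php
$subject = '123/45678';

// With "/" as the delimiter, the "/" inside the pattern must be escaped.
$withSlashes = preg_match('/^\d{3}\/\d{5}/', $subject);

// With "#" as the delimiter, the same "/" can be written as-is.
$withHashes = preg_match('#^\d{3}/\d{5}#', $subject);

var_dump($withSlashes); // int(1)
var_dump($withHashes);  // int(1)
```

Both calls describe the same pattern; only the delimiter, and therefore the need for escaping, differs.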
First of all, you're using / as the regexp delimiter, so you can't use it in the pattern without escaping it with a backslash. Otherwise, PHP will think that your pattern ends at the / in the middle (you can see that even StackOverflow's syntax highlighting thinks so).
Second, the + means "one or more" and is "greedy", so it will match as many characters as it can. The first [0-9]+ would initially grab all of the leading digits in one go and then backtrack to leave one digit each for the next two, which means the pattern accepts three or more digits before the slash rather than exactly three.
Third, there's no need to use i, since you're dealing with numbers which aren't upper- or lowercase, so case-sensitivity is a moot point.
Try this instead
/^\d{3}\/\d{5}$/
The \d is shorthand for writing [0-9], and the {3} and {5} means repeat 3 or 5 times, respectively.
(This pattern is anchored to the start and the end of the string. Your pattern was only anchored to the beginning, and if that was on purpose, then remove the $ from my pattern)
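To compare the two approaches, the sketch below runs the question's pattern (with only the slash escaped, which is an adjustment made here so that it compiles) next to the stricter one. The variable names $loose and $strict are made up.

```php
<?php
// The question's pattern with only the "/" escaped (the + quantifiers kept).
$loose = '/^([0-9]+[0-9]+[0-9]+\/[0-9]+[0-9]+[0-9]+[0-9]+[0-9])/';
// Exact counts, anchored at both ends.
$strict = '/^\d{3}\/\d{5}$/';

foreach (['123/45678', '1234/567890'] as $subject) {
    printf(
        "%-12s loose=%d strict=%d\n",
        $subject,
        preg_match($loose, $subject),
        preg_match($strict, $subject)
    );
}
// 123/45678    loose=1 strict=1
// 1234/567890  loose=1 strict=0
```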
I recently found this site useful for debugging regexes:
http://www.regextester.com/index2.html
It assumes use of /.../ (meaning you should not include those slashes in the regex you paste in).
So, after I put your regex ^([0-9]+[0-9]+[0-9]+/[0-9]+[0-9]+[0-9]+[0-9]+[0-9]) in the Regex box and 123/45678 in the Test box I see no match. When I put a backslash in front of the forward slash in the middle, then it recognizes the match. You can then try matching 1234/567890 and discover it still matches. Then you go through and remove all the plus signs and then it correctly stops matching.
What I particularly like about this particular site is the way it shows the partial matches in red, allowing you to see where your regex is working up to.

php regular expression help finding multiple filenames only not full URL

I am trying to fix a regular expression I have been using in PHP that finds all filenames within a sentence / paragraph. The file names always look like this: /this-a-valid-page.php
With help I received on SO, my old pattern was modified to the one below, which avoids full URLs (the issue I was having), but this pattern only finds one occurrence at the beginning of a string, nothing inside the string.
/^\/(.*?).php/
I have a live example here: http://vzio.com/upload/reg_pattern.php
Remove the ^ - the caret signifies the beginning of a string/line, which is why it's not matching elsewhere.
If you need to avoid full URLs, you might want to change the ^ to something like (?:^|\s) which will match either the beginning of the string or a whitespace character - just remember to strip whitespace from the beginning of your match later on.
The last dot in your expression could still cause problems, since it'll match "one anything". You could match, for example, /somefilename#php with that pattern. Backslash it to make it a literal period:
/\/(.*?)\.php/
Also note the ? to make .* non-greedy is necessary, and Arda Xi's pattern won't work. .* would race to the end of the string and then backup one character at a time until it can match the .php, which certainly isn't what you'd want.
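The difference between greedy and non-greedy matching shows up clearly on a subject with two filenames. A small sketch with an invented subject string:

```php
<?php
$text = 'see /a.php and /b.php here';

// Greedy: .* runs to the end, then backs up to the LAST ".php".
preg_match('/\/(.*)\.php/', $text, $greedy);
echo $greedy[1], "\n"; // a.php and /b

// Lazy: .*? stops at the FIRST ".php" it can reach.
preg_match('/\/(.*?)\.php/', $text, $lazy);
echo $lazy[1], "\n"; // a
```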
To find all the occurrences, you'll have to remove the start anchor and use the preg_match_all function instead of preg_match :
if (preg_match_all('/\/(.*?)\.php/', $input, $matches)) {
    var_dump($matches[1]); // will print all filenames (after / and before .php)
}
Also . is a meta char. You'll have to escape it as \. to match a literal period.
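Putting the suggestions from the answers together (preg_match_all, an escaped dot, a lazy quantifier, and (?:^|\s) in place of the bare ^), one possible sketch looks like this. The function name findLocalPhpFiles and the sample sentence are assumptions; because the whitespace sits outside the capturing group, the captured names need no trimming.

```php
<?php
// Hypothetical helper: find "/name.php" filenames that start the string or
// follow whitespace, which skips paths embedded in full URLs.
function findLocalPhpFiles(string $text): array
{
    // (?:^|\s)  start of string or a whitespace character (not captured)
    // /         the leading slash of the filename ("#" is the delimiter)
    // (.*?)     the name, matched lazily
    // \.php     a literal ".php"
    preg_match_all('#(?:^|\s)/(.*?)\.php#', $text, $matches);
    return $matches[1];
}

print_r(findLocalPhpFiles('Visit http://example.com/full.php or /local-page.php and /other.php'));
// Captures "local-page" and "other"; the URL's "/full.php" is skipped.
```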
