Match regular expression without slash in URL-like string - php

I have the following regular expression #^en/cities/(.*?)/$# and I want it to match anything, but stop at the first slash.
I.e. it should match 'paris' but not 'paris/france' if someone manages to type that URL.
I thought I already had it as non-greedy with the question mark, but apparently not.

Use a negated character class.
#^en/cities/([^/]+)/$#
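In your regex the non-greedy quantifier did not help because the last slash is anchored to the end of the string ($), so .*? is forced to expand across any inner slashes until it reaches that final slash. Removing the anchor makes the lazy match stop at the first slash, although the pattern then still matches longer input such as en/cities/paris/france/ as a prefix: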
#^en/cities/.*?/#
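A minimal sketch of the negated character class in use. The sample paths and variable names are assumptions chosen for illustration; only the pattern comes from the answer above.

```php
<?php
// Pattern from the answer: one or more non-slash characters between
// "en/cities/" and the final slash.
$pattern = '#^en/cities/([^/]+)/$#';

// Hypothetical sample paths (not from the original question).
$paths = ['en/cities/paris/', 'en/cities/paris/france/'];

$results = [];
foreach ($paths as $path) {
    // preg_match() returns 1 on a match and 0 when nothing matches.
    $results[$path] = preg_match($pattern, $path, $m) === 1 ? $m[1] : null;
}

// 'en/cities/paris/' yields 'paris'; 'en/cities/paris/france/' yields null.
var_export($results);
```

Because [^/] can never consume a slash, the extra path segment makes the whole match fail instead of being absorbed into the capture group.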

Related

Regex: Differentiating underscore(_) and dash(-)

I want to construct a pattern that identifies the valid domain name. A valid domain name has alphanumeric characters and dashes in it. The only rule is that the name should not begin or end with a dash.
I have a regular expression for the validation as ^\w((\w|-)*\w)?$
However the expression is validating the strings with underscores too (for ex.: cake_centre) which is wrong. Can anyone tell me why this is happening and how it can be corrected?
P.S.: I am using preg_match() function in PHP for checking the validation.
The metacharacter \w includes underscores, you can make a character class that will allow your listed requirements:
[a-zA-Z\d-]
or per your regex:
^[a-zA-Z\d]([a-zA-Z\d-]*[a-zA-Z\d])?$
(Also note that the position of the - in a character class matters: at the start or end it is a literal dash, while in the middle it can create a range. See "What special characters must be escaped in regular expressions?")
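As a rough check of the pattern above with preg_match(), the following sketch runs it against a few names. The test names other than cake_centre are made up for illustration.

```php
<?php
// Pattern from the answer above: alphanumerics and dashes only,
// with no dash allowed as the first or last character.
$pattern = '/^[a-zA-Z\d]([a-zA-Z\d-]*[a-zA-Z\d])?$/';

// Hypothetical test names; cake_centre is the example from the question.
$names = ['cake-centre', 'cake_centre', '-cake', 'cake-', 'a'];

$valid = [];
foreach ($names as $name) {
    $valid[$name] = preg_match($pattern, $name) === 1;
}

// cake-centre and a pass; cake_centre, -cake and cake- are rejected.
var_export($valid);
```

The optional group is what lets a single-character name such as a pass while still forbidding a leading or trailing dash.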
Underscores are being validated because they are part of the \w character class. If you want to exclude them, try the following, which requires a name of at least two characters:
/^[a-z0-9]+[a-z0-9\-]*[a-z0-9]+$/i
Here is a regexp that uses the lookaround approach:
^(?!-)[a-zA-Z0-9-]+(?<!-)$
The pattern has three parts:
The first part, ^(?!-), is a negative lookahead that ensures the string does not begin with a dash.
The second part, [a-zA-Z0-9-]+, matches the allowed characters (no underscore).
The third part, (?<!-)$, is a negative lookbehind that ensures the string does not end with a dash.

Improving this regex to include what it matches until it matches a certain character

Can someone please help me improve this regex so that it captures everything that starts with http://, https://, or www and then continues until it reaches a ' or ". It includes punctuation and is case-insensitive.
Here is the regular expression right now:
(www|https?://)
/(?:https?:\/\/|www)[^'"]*/i
I escaped the slashes since they could conflict if you use /.../ notation. [^'"] is an inverted character class that allows everything but quotes.
Edit: I removed the caret so the pattern matches any occurrence, and added ?: to make the group non-capturing.
#(www|https?://).*?(?=['"])#i
The ? in .*? makes the quantifier reluctant, so the match stops at the first quote rather than the last. The lookahead (?=['"]) checks for the quote without including it in the match.
The following regex will work:
(?:https?:\/\/|www)[^'"]*
You can walk through the details of the match at www.debuggex.com.
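A short sketch of how the pattern could be applied with preg_match_all(). The HTML snippet and the example.com / example.org addresses are assumptions made up for this illustration.

```php
<?php
// Hypothetical input: one double-quoted and one single-quoted URL.
$html = '<a href="https://example.com/a?b=1">x</a> <img src=\'www.example.org/pic.png\'>';

// Pattern from the answers above: start at http://, https:// or www,
// then take everything up to (but not including) the next quote.
preg_match_all('/(?:https?:\/\/|www)[^\'"]*/i', $html, $matches);

// $matches[0] holds the full matches:
// 'https://example.com/a?b=1' and 'www.example.org/pic.png'
var_export($matches[0]);
```

The \' inside the pattern is only there to escape the quote for the PHP single-quoted string; the regex engine sees the character class [^'"].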

Regex in PHP (preg_match) pattern

I created a regex pattern that works in Dreamweaver's regex Find, but when dropped into the pattern of preg_match, it fails. What am I breaking in the PHP (5.1.6) regex rules that otherwise works in Dreamweaver's interpretation? Here's the PHP:
preg_match("/(\{a\})([a-zA-Z0-9{} .])+(\{/a\})/i", "{a}{900678}{abcde}{0}{0}{0}{/a}");
It currently returns false. How can I modify the pattern so that it matches strings of the form {a}anything goes in the middle{/a}? I realize that the above regex will not match 'anything' in the middle, but I simplified the expression for debugging.
The slash in the /a part is being interpreted as the end delimiter of the expression. You should probably use another delimiter for the whole pattern, e.g.:
preg_match("~(\{a\})([a-zA-Z0-9{} .])+(\{/a\})~i",
"{a}{900678}{abcde}{0}{0}{0}{/a}");
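The delimiter fix can be sketched as follows, with the subject string taken from the question. The second variant, which keeps / as the delimiter and escapes the inner slash instead, is an alternative not mentioned in the answer; the variable names are illustrative.

```php
<?php
$subject = '{a}{900678}{abcde}{0}{0}{0}{/a}';

// Option 1: use ~ as the delimiter so the / in {/a} is an ordinary character.
$withTilde = preg_match('~(\{a\})([a-zA-Z0-9{} .])+(\{/a\})~i', $subject, $m1);

// Option 2 (alternative): keep / as the delimiter and escape the inner slash.
$withEscape = preg_match('/(\{a\})([a-zA-Z0-9{} .])+(\{\/a\})/i', $subject, $m2);

// Both calls return 1, and the full match is the whole subject string.
var_export([$withTilde, $withEscape, $m1[0] === $subject]);
```

Switching the delimiter is usually the more readable choice when the pattern itself contains slashes.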

How can I make this URL validation regular expression less greedy?

So I have the following regular expression:
https?://(www\.)?flickr\.com/photos/(.+)/?
To match against the following URL:
http://www.flickr.com/photos/username/
How can I stop the final forward slash (/) from being included in the username sub-pattern (.+)?
I have tried:
https?://(www\.)?flickr\.com/photos/(.+?)/?
But then it only matches the first letter of the username.
https?://(?:www\.)?flickr\.com/photos/([^/]+)/?
I added ?: to the first group so it's not capturing, then used [^/] instead of the dot in the last group. This ensures that everything between "photos/" and the very next "/" is captured.
If you need to capture the first www just use this:
https?://(www\.)?flickr\.com/photos/([^/]+)/?
You need to make sure it doesn't match the forward slash:
https?://(?:www\.)?flickr\.com/photos/([^/]+)/?
You could also make the regex lazy (which is what I guess you were doing with the (.+?) syntax), but the above will work just fine.
Change (.+) to ([^/]+). This will match until it encounters a /, so you might want to throw some other stuff in the class too.
There are generally two ways to do this:
Append a question mark, to make the matching non-greedy. .* will match as much as possible, .*? will match as little as possible.
Exclude the character you want to match next. If you want to stop on /, use [^/]*.
If you know there will be a trailing slash, take out the final ?.
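To compare the variants discussed in this thread side by side, here is a small sketch run against the URL from the question. The array keys and variable names are illustrative only.

```php
<?php
$url = 'http://www.flickr.com/photos/username/';

// Same pattern three times; only the username sub-pattern differs.
$patterns = [
    'greedy'  => '#https?://(?:www\.)?flickr\.com/photos/(.+)/?#',
    'lazy'    => '#https?://(?:www\.)?flickr\.com/photos/(.+?)/?#',
    'negated' => '#https?://(?:www\.)?flickr\.com/photos/([^/]+)/?#',
];

$captures = [];
foreach ($patterns as $label => $pattern) {
    preg_match($pattern, $url, $m);
    $captures[$label] = $m[1];
}

// greedy  => 'username/'  (the trailing slash is swallowed)
// lazy    => 'u'          (nothing after the group forces it to grow)
// negated => 'username'
var_export($captures);
```

The lazy version stops after one character because the trailing /? is optional, so the shortest possible match already satisfies the whole pattern.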

php regular expression help finding multiple filenames only not full URL

I am trying to fix a regular expression I have been using in PHP. It finds all filenames within a sentence or paragraph. The file names always look like this: /this-a-valid-page.php
With help I received on SOF, my old pattern was modified to the one below, which avoids full URLs (the issue I was having). However, this pattern only finds one occurrence at the beginning of a string and nothing inside the string.
/^\/(.*?).php/
I have a live example here: http://vzio.com/upload/reg_pattern.php
Remove the ^ - the caret signifies the beginning of a string/line, which is why it's not matching elsewhere.
If you need to avoid full URLs, you might want to change the ^ to something like (?:^|\s) which will match either the beginning of the string or a whitespace character - just remember to strip whitespace from the beginning of your match later on.
The last dot in your expression could still cause problems, since an unescaped dot matches any single character. You could match, for example, /somefilename#php with that pattern. Backslash it to make it a literal period:
/\/(.*?)\.php/
Also note that the ? that makes .* non-greedy is necessary, and Arda Xi's pattern won't work. .* would race to the end of the string and then back up one character at a time until it can match the .php, which certainly isn't what you'd want.
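Putting the suggestions from this answer together, a sketch might look like the following. The sample sentence and the example.com URL are invented for illustration, and preg_match_all() is used so that every occurrence is collected.

```php
<?php
// Hypothetical input containing two bare filenames and one full URL.
$text = 'See /first-page.php and http://example.com/other.php, then /second-page.php';

// (?:^|\s) requires the slash to sit at the start of the string or after
// whitespace, which skips the slash inside the full URL; \. is a literal dot.
preg_match_all('/(?:^|\s)\/(.*?)\.php/', $text, $matches);

// $matches[1] holds the captured names: 'first-page' and 'second-page'
var_export($matches[1]);
```

Reading the names from capture group 1 sidesteps the leading whitespace that ends up in the full match ($matches[0]).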
To find all the occurrences, you'll have to remove the start anchor and use the preg_match_all function instead of preg_match :
if (preg_match_all('/\/(.*?)\.php/', $input, $matches)) {
    var_dump($matches[1]); // will print all filenames (after / and before .php)
}
Also . is a meta char. You'll have to escape it as \. to match a literal period.
