Regex Question Again! - php

I really don't know what my problem is lately, but Regex seems to be giving me the most trouble.
Very simple thing I need to do, but can't seem to get it:
I have a URI that is either /xmlfeed or /xmlfeed/what/is/this
I want to match /xmlfeed on any occasion.
I've tried many variations of the following:
preg_match('/(\/.*?)\/?/', $_SERVER['REQUEST_URI'], $match);
I would read this as: match a forward slash, then match any character until you come to an optional forward slash.

Why not:
preg_match('#/[^/]+#', $_SERVER['REQUEST_URI'], $match);
?
$match[0] will give you what you need

Why use a regex if it confuses you?
$string = "/xmlfeed/what/is/this";
$s = explode("/",$string,3);
print "/".$s[1]."\n";
output
$ php test.php
/xmlfeed

Your problem is the reluctant quantifier. After the initial slash is matched, .*? consumes the minimum number of characters it's allowed to, which is zero. Then \/? takes over; it doesn't see a slash in the next position (which is immediately after the first slash), but that's okay because it's optional. The result: the regex always matches a single slash, and group #1 (which begins with that slash) always captures just "/".
Obviously, you can't just replace the reluctant quantifier with a greedy one. But if you replace the .* with something that can't match a slash, you don't have to worry about greediness. That's what K Prime's regex, '#/[^/]+#', does. Notice as well how it uses # as the regex delimiter, which avoids the need to escape slashes within the regex.
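To see the difference described above, this sketch runs the question's pattern and K Prime's '#/[^/]+#' against the two sample URIs. The function names lazy_match and first_segment are hypothetical, and literal strings stand in for $_SERVER['REQUEST_URI'] so the script can run from the command line.

```php
<?php
// Hypothetical helpers; literal strings stand in for $_SERVER['REQUEST_URI'].

function lazy_match(string $uri): array
{
    // The question's pattern: .*? matches zero characters and \/? is optional,
    // so the whole match (and group 1) is just the first "/".
    preg_match('/(\/.*?)\/?/', $uri, $m);
    return $m;
}

function first_segment(string $uri): ?string
{
    // [^/]+ cannot match a slash, so it stops at the end of the first segment.
    return preg_match('#/[^/]+#', $uri, $m) ? $m[0] : null;
}

foreach (['/xmlfeed', '/xmlfeed/what/is/this'] as $uri) {
    $lazy = lazy_match($uri);
    printf("%-24s lazy: [%s] group 1: [%s] negated class: %s\n",
        $uri, $lazy[0], $lazy[1], first_segment($uri));
}
```

For both URIs the negated-class version returns /xmlfeed, while the lazy version never gets past the first slash.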

In PHP: '/(\/.*?)\/?/' is a string containing a regular expression.
First you have to decode the string: /(/.*?)\/?/
So you have a forward slash that starts the regular expression. An opening parenthesis. A forward slash that ends the pattern … and I'm pretty sure that it will then error, since you haven't closed the parenthesis.
So, to get this working:
Remember to escape characters with special meanings in strings and regular expressions
Don't confuse the forward slash / with the backslash \
You want to match everything after and including the first slash, but before any (optional) second slash (so we don't want the ? that makes it non-greedy):
/(\/[^\/]*)/
Which, expressed as a PHP string is:
'/(\\/[^\\/]*)/'
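Because the escaping happens at two levels (the PHP string literal, then the regex), it can help to print the pattern string and see exactly what PCRE receives. This sketch relies on one PHP rule: in a single-quoted string only \\ and \' are escape sequences, and any other backslash is kept as-is. The variable names are illustrative.

```php
<?php
// In a single-quoted string, \\ becomes one backslash and \/ stays \/,
// so both spellings below produce the same pattern string.
$single  = '/(\/[^\/]*)/';
$doubled = '/(\\/[^\\/]*)/';

echo $single, "\n";              // prints /(\/[^\/]*)/ (what PCRE receives)
var_dump($single === $doubled);  // bool(true)

// Group 1: the first slash plus every following non-slash character.
preg_match($single, '/xmlfeed/what/is/this', $m);
echo $m[1], "\n";                // /xmlfeed
```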

I know this avoids the regex, and therefore the question, but how about splitting the URI (at slashes) into an array?
Then you can deal with the elements of the array, and ignore the bits of the uri you don't want.

Using the suggestions posted, I ended up trying this:
echo $_SERVER['REQUEST_URI'];
preg_match("/(\/.*)[^\/]/", $_SERVER['REQUEST_URI'], $match);
$url = "http://".$_SERVER['SERVER_NAME'].$match[0];
foreach($match as $k=>$v){
echo "<h1>$k - $v</h1>";
}
I also tried it without the .* and without the parentheses.
Without the .* AND () it returns the / with the next character ONLY.
As written, it just returns the entire URI every time.
So, when run with the code above, the output is:
/tea-time-blog/post/20
0 - /tea-time-blog/post/20
1 - /tea-time-blog/post/2
This code is being eval()'d, by the way. I don't think that should make any difference in the way PHP handles the regular expression.
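The output above follows from greediness: (\/.*) runs to the end of the URI and then gives back one character so that [^\/] can match the final 0. This sketch reproduces it, then tries an anchored negated-class pattern for comparison; the second pattern is a suggestion, not something from the original post.

```php
<?php
$uri = '/tea-time-blog/post/20';

// Greedy (\/.*) swallows everything, then backs off one character
// so that [^\/] can match the trailing "0".
preg_match('/(\/.*)[^\/]/', $uri, $greedy);
print_r($greedy);

// Start at the beginning and forbid slashes after the first one:
// the match ends with the first path segment.
preg_match('#^/[^/]*#', $uri, $segment);
echo $segment[0], "\n"; // /tea-time-blog
```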

Related

preg_replace pattern to remove pNUMBERxNUMBER

I'm trying to locate a pattern with preg_replace() and remove it.
I have a string that contains something like p130x130/, where the numbers vary (they can be higher or lower). What I need to do is locate that substring and remove the whole thing.
I've been trying to use this:
preg_replace('/p+[0-9]+x+[0-9]"/', '', $str);
but that doesn't work for some reason. Does anyone know the correct regex?
Kind regards
You need to first remove the + quantifier after p then switch the + quantifier from after x and place it after your character class (e.g. x[0-9]+), also remove the quote " inside of your expression, which to me looks like a typo here. You can also use a different delimiter to avoid escaping the ending slash.
$str = preg_replace('~p[0-9]+x[0-9]+/~', '', $str);
If the ending slash is by mistake a typo as well, then this is what you're looking for.
$str = preg_replace('/p[0-9]+x[0-9]+/', '', $str);
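A quick way to check the first pattern is to wrap it in a function and feed it sample paths. The helper name strip_size_token and the sample strings are hypothetical.

```php
<?php
// Hypothetical helper around the answer's first pattern.
function strip_size_token(string $str): string
{
    // "p", one or more digits, "x", one or more digits, then the slash.
    // "~" is the delimiter, so the "/" needs no escaping.
    return preg_replace('~p[0-9]+x[0-9]+/~', '', $str);
}

echo strip_size_token('images/p130x130/photo.jpg'), "\n";  // images/photo.jpg
echo strip_size_token('images/p1024x768/photo.jpg'), "\n"; // images/photo.jpg
```

The "p" in "photo" and "jpg" is left alone because it is not followed by digits.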
A regex to match p130x130/ is:
p[0-9]+x[0-9]+\/
Try this:
$str = preg_replace("/p[0-9]+?x[0-9]+?\//is","",$str);
As requested in a comment, here is an explanation of the code.
I've used "/" as the delimiter, but you can use a different character to avoid escaping slashes.
The part that says [0-9]+ matches any digit from 0 to 9 at least once, and more if possible. If I had put [0-9]*? it would also have matched an empty string (as * means 0 or more, not 1 or more like +), which is probably not what you want anyway.
I've put the ? after each + to make it non-greedy; that's just a habit of mine (I used ereg a lot previously), and I don't think it's needed here. The s modifier has no effect because the pattern contains no dot, and i makes the match case-insensitive, so P130X130/ would be removed too.
Anyway, it matches digits until it hits an x, and then matches more digits until it hits a single forward slash. I've escaped that slash with a backslash because my delimiter is also a slash and I didn't want the pattern to end there.

Regex - Match Word As Long As Nothing Follows It

Having a little trouble with regex. I'm trying to test for a match, but only if nothing follows it. So in the example below, if I go to test/create/1/2, it still matches. I only want to match if it's explicitly test/create/1 (but the 1 is dynamic).
if(preg_match('^test/create/(.*)^', 'test/create/1')):
// do something...
endif;
I've found some answers that suggest using $ before my delimiter but it doesn't appear to do anything. Or a combination of ^ and $ but I can't quite figure it out. Regex confuses the hell out of me!
EDIT:
I didn't really explain this well enough so just to clarify:
I need the if statement to return true if a URL is test/create/{id} - the {id} being dynamic (and of any length). If the {id} is followed by a forward slash the if statement should fail. So that if someone types in test/create/1/2 - it will fail because of the forward slash after the 1.
Solution
I went for thedarkwinter's answer in the end as it's what worked best for me, although other answers did work as well.
I also had to add a little extra to the regex to make sure that it would work with hyphens as well, so the final code looked like this:
if(preg_match('^test/create/[\w-]*$^', 'test/create/1')):
// do something...
endif;
\w matches word characters, and $ matches the end of the string:
if(preg_match('^test/create/\w*$^', 'test/create/1'))
will match test/create/[word/num] and nothing following.
I think that's what you are after.
Edit: added * in \w*
Here you go:
"/^test\\/create\\/([^\\/]*)$/"
This says:
The string that starts with "test", followed by a forward slash (remember the first backslash escapes the second, so PHP puts a literal backslash into the pattern, which in turn escapes the / for the regex), followed by "create", followed by a forward slash, followed by (and capturing) everything that isn't a slash, up to the end of the string.
Comment if you need more detail.
I prefer my expressions to always start with / because it has no meaning as a regex metacharacter. I've seen # used as well. I believe another answer uses ^, but that means "start of string", so I wouldn't use it as a regex delimiter.
Use following regular expression (use $ to denote end of the input):
'|test/create/[^/]+$|'
If you want to match only digits, use the following instead (\d matches a digit character):
'^test/create/\d+$^'
The ^ is an anchor for the beginning of the line, i.e. no characters occur before the ^. Use a $ to designate the end of the string, or the end of the line.
EDIT: wanted to add a suggestion as well:
Your solution is fine and works, but in terms of style I'd advise against using the caret (^) as a delimiter, especially because it has special meaning as either negation or a start-of-line anchor, so it's a bit confusing to read it that way. You can legally use most special characters as long as they don't occur (or are escaped) in the regex itself. I'm just talking about a matter of style/maintainability here.
Of course nearly every potential delimiter has some special meaning, but you also often see ^ at the beginning of a regex, so I might choose another alternative. For example, # is a good choice here:
if(preg_match('#test/create/[\w-]*$#', $mystring)) {
//etc
}
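A minimal sketch of the accepted solution rewritten with # as the delimiter, as suggested in this answer. Two details are assumptions, not part of the accepted answer, and are marked in the code: a leading ^ so that nothing may precede test/create/, and + instead of * so that an empty id is rejected. The function name is_create_route is hypothetical.

```php
<?php
// Hypothetical helper: true only for "test/create/{id}" with nothing after the id.
function is_create_route(string $path): bool
{
    // "#" is the delimiter, so the slashes in the path need no escaping.
    // Assumption: leading ^ so nothing may come before "test/create/".
    // Assumption: + (not *) so that an empty id is rejected.
    // [\w-] allows letters, digits, underscores and hyphens, but not "/";
    // $ anchors the id at the end (PCRE's $ also accepts one trailing newline).
    return preg_match('#^test/create/[\w-]+$#', $path) === 1;
}

foreach (['test/create/1', 'test/create/my-post', 'test/create/1/2', 'test/create/'] as $path) {
    printf("%-20s %s\n", $path, is_create_route($path) ? 'match' : 'no match');
}
```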
The regex abc$ will match abc only when it appears at the end of the string:
abcd # no match
dabc # match
abc # match

What do these certain symbols/parts mean in preg_match?

I know a little about preg_match, however there are some that look rather complex and some that contain symbols that I don't entirely understand. For example:
On the first one - I can only assume this has something to do with an e-mail address and url, but what do things like [^/] and the ? mean?
preg_match('#^(?:http://)?([^/]+)#i', $variable);
.....
In the second one - what do things like the ^, {5} and $ mean?
preg_match("/^[A-Z]{5}[0-9]{4}[A-Z]{1}$/", $variable);
It's just these small things I'm not entirely sure on and a brief explanation would be much appreciated.
Here are the direct answers. I kept them short because they won't make sense without an understanding of regex. That understanding is best gained at http://www.regular-expressions.info/tools.html. I advise you to also try out the regex helper tools listed there; they allow you to experiment by showing live capturing/matching as you edit the pattern, which is very helpful.
Simple parentheses ( ) around something make it a group. Here you have (?=), which is an assertion, specifically a positive lookahead assertion. All it does is check whether what's inside actually exists forward from the current cursor position in the haystack. Still with me?
Example: foo(?=bar) matches foo only if followed by bar. bar is never matched, only foo is returned.
With this in mind, let's dissect your regex:
/^.*(?=.{4,})(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z]).*$/
Reads as:
^.* From Start, capture 0-many of any character
(?=.{4,}) if there are at least 4 of anything following this
(?=.*[0-9]) if there is: 0-many of any, ending with a digit following
(?=.*[a-z]) if there is: 0-many of any, ending with a lowercase letter following
(?=.*[A-Z]) if there is: 0-many of any, ending with an uppercase letter following
.*$ 0-many of anything preceding the End
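To watch the lookaheads at work, this sketch runs the pattern against a few made-up inputs. Each (?=...) only checks what follows and consumes nothing, which is also why foo(?=bar) returns just foo.

```php
<?php
$pattern = '/^.*(?=.{4,})(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z]).*$/';

// Made-up inputs: only those with 4+ characters, a digit,
// a lowercase and an uppercase letter should match.
foreach (['abC1', 'abcd', 'aB1', 'Passw0rd'] as $candidate) {
    printf("%-10s %s\n", $candidate, preg_match($pattern, $candidate) ? 'match' : 'no match');
}

// foo(?=bar): "bar" must follow, but only "foo" is part of the match.
preg_match('/foo(?=bar)/', 'foobar', $m);
echo $m[0], "\n"; // foo
```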
Although I am not a fan of just posting links, I think a regex tutorial would be too much here. So check out this Regular Expression cheat sheet; it will probably get you on your way if you already have a little understanding of what it does.
Also check out this for some explanations and more helpful links; http://coding.smashingmagazine.com/2009/06/01/essential-guide-to-regular-expressions-tools-tutorials-and-resources/
First one:
The # characters don't actually have anything to do with the content that is matched. Usually, you use / as the delimiter character in a regex. The downside is that you need to escape it every time you want to use it inside the pattern. So here, # is used as the delimiter.
[^/] is a character class. [/] would match only the / character; the ^ inverts this, so [^/] matches every character except /.
Second one:
^ matches the beginning of the string, $ the end of the string. You can use this to enforce that the regex has to apply to the whole string you are matching on.
{5} is a quantifier. It is equivalent to {5,5} which is minimum 5, maximum 5, so it matches exactly 5 characters.
first one:
[^/] = any character except a slash
second one:
^ look from beginning of $variable
{5} exactly 5 occurrences of [A-Z]
$ look until end of $variable reached
the combination of ^ and $ means that the pattern between them has to match the whole of $variable
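Both patterns from the question can be tried directly; the sample inputs here are made up for illustration.

```php
<?php
// Pattern 1: an optional "http://" (the non-capturing group (?:...) followed by ?),
// then capture everything up to the first "/". The i modifier also accepts "HTTP://".
preg_match('#^(?:http://)?([^/]+)#i', 'http://www.example.com/path/page', $m);
echo $m[1], "\n"; // www.example.com

// Pattern 2: exactly 5 capitals, 4 digits and 1 capital, with nothing
// before (^) or after ($).
$code = '/^[A-Z]{5}[0-9]{4}[A-Z]{1}$/';
var_dump(preg_match($code, 'ABCDE1234F'));  // int(1)
var_dump(preg_match($code, 'ABCDE1234FG')); // int(0): extra character before the end
```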

A preg_replace puzzle: replacing zero or more of a char at the end of the subject

Say $d is a directory path and I want to ensure that it starts and ends with exactly one slash (/). It may initially have zero, one or more leading and/or trailing slashes.
I tried:
preg_replace('%^/*|/*$%', '/', $d);
which works for the leading slash but, to my surprise, yields two trailing slashes if $d has at least one trailing slash. If the subject is, e.g., 'foo///', then preg_replace() first matches and replaces the three trailing slashes with one slash, and then it matches zero slashes at the end and replaces that with a slash. (You can verify this by replacing the second argument with '[$0]'.) I find this rather counterintuitive.
While there are many other ways to solve the underlying problem (and I implemented one) this became a PCRE puzzle for me: what (scalar) pattern in a single preg_replace does this job?
ADDITIONAL QUESTION (edit)
Can anyone explain why this pattern matches the way it does at the end of the string but does not behave similarly at the start?
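The '[$0]' trick mentioned in the question makes every match visible, including the empty ones. A sketch with a few hypothetical inputs:

```php
<?php
$pattern = '%^/*|/*$%';

// Each match is wrapped in brackets; "[]" marks an empty match.
echo preg_replace($pattern, '[$0]', 'foo///'), "\n"; // []foo[///][]
echo preg_replace($pattern, '[$0]', 'foo'), "\n";    // []foo[]
echo preg_replace($pattern, '[$0]', '///foo'), "\n"; // [///]foo[]

// With "/" as the replacement, the extra empty match at the end
// becomes the second trailing slash.
echo preg_replace($pattern, '/', 'foo///'), "\n";    // /foo//
```

The end of 'foo///' is matched twice (once as "///", once as empty), while the start of '///foo' is matched only once.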
$path = '/' . trim($path, '/') . '/';
This first removes all slashes at beginning or end and then adds single ones again.
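A small sketch of that approach, wrapped in a hypothetical normalize_dir function. One thing worth knowing: an empty string or a path made only of slashes comes out as "//", since trim() leaves nothing between the two added slashes.

```php
<?php
// Hypothetical wrapper around the trim() approach.
function normalize_dir(string $path): string
{
    // Strip every leading and trailing "/", then add exactly one back on each side.
    return '/' . trim($path, '/') . '/';
}

foreach (['foo', '/foo', 'foo///', '//a/b//', '///'] as $path) {
    printf("%-8s => %s\n", $path, normalize_dir($path));
}
```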
Given a regex like /* that can legitimately match zero characters, the regex engine has to make sure that it never matches more than once in the same spot, or it would get stuck in an infinite loop. Thus, if it does consume zero characters, the engine jumps forward one position before attempting another match. As far as I know, that's the only situation in which the regex engine does anything on its own initiative.
What you're seeing is the opposite situation: the regex consumes one or more characters, then on the next go-round it tries to start matching at the spot where it left off. Never mind that this particular regex can't match anything but the one character, and it already matched as many of those as it could; it still has the option of matching nothing, so that's what it does.
So, why doesn't your regex match twice at the beginning, like it does at the end? Because of the start anchor (^). If the subject starts with one or more slashes, it consumes them and then tries to match zero slashes, but it fails because it's not at the beginning of the string any more. And if there are no slashes at the beginning, the manual bump-along has the same effect.
At the end of the subject it's a different story. If there are no slashes there, it matches nothing, tries to bump along and fails; end of story. But if it does match one or more slashes, it consumes them and tries to match again, and succeeds because the $ anchor still matches.
So in general, if you want to prevent this kind of double match, you can either add a condition to the beginning of the match to prevent it, like the ^ anchor does for the first alternative:
preg_replace('%^/*|(?<!/)/*$%', '/', $d);
...or make sure that part of the regex has to consume at least one character:
preg_replace('%^/*|([^/])/*$%', '$1/', $d);
But in this case you have a much simpler option, as demonstrated by John Kugelman: just capture the part you want to keep and chuck the rest.
preg_replace('%^/*(.*?)/*$%', '/\1/', $d);
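The three single-call variants from this answer can be compared side by side. This sketch wraps them in arrow functions (PHP 7.4+) purely for the comparison; the labels and inputs are made up.

```php
<?php
$fixes = [
    // The (?<!/) lookbehind stops a second, empty match right after trailing slashes.
    'lookbehind'    => fn(string $d) => preg_replace('%^/*|(?<!/)/*$%', '/', $d),
    // The second alternative must consume one non-slash character, kept via $1.
    'consume char'  => fn(string $d) => preg_replace('%^/*|([^/])/*$%', '$1/', $d),
    // Capture the middle lazily and rebuild the whole string.
    'capture inner' => fn(string $d) => preg_replace('%^/*(.*?)/*$%', '/\1/', $d),
];

foreach (['foo', '/foo', 'foo///', '//a/b//'] as $d) {
    foreach ($fixes as $name => $fix) {
        printf("%-8s %-14s %s\n", $d, $name, $fix($d));
    }
}
```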
it can be done in a single preg_replace
preg_replace('/^\/{2,}|\/{2,}$|^([^\/])|([^\/])$/', '\2/\1', $d);
A small change to your pattern would be to separate out the two key concerns at the end of the string:
Replace multiple slashes with one slash
Replace no slashes with one slash
A pattern for that (and the existing part for matching at the start of the string) would look like:
#^/*|/+$|$(?<!/)#
A slightly less concise, but more precise, option would be to be very explicit about only matching zero or two-or-more slashes; the notion being, why replace one slash with one slash?
#^(?!/)|^/{2,}|/{2,}$|$(?<!/)#
Aside: nikic's suggestion to use trim (to remove leading/trailing slashes, then add your own) is a good one.
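A sketch comparing the two patterns in this answer. The difference shows up on input that is already normalized: the concise pattern still matches there (replacing one slash with one slash), while the precise one does not match at all.

```php
<?php
$concise = '#^/*|/+$|$(?<!/)#';
$precise = '#^(?!/)|^/{2,}|/{2,}$|$(?<!/)#';

foreach (['foo', '/foo/', 'foo///', '//a/b//'] as $d) {
    printf("%-8s concise: %-6s precise: %s\n", $d,
        preg_replace($concise, '/', $d),
        preg_replace($precise, '/', $d));
}

// On an already-normalized path, only the concise pattern finds anything.
var_dump(preg_match($concise, '/foo/')); // int(1)
var_dump(preg_match($precise, '/foo/')); // int(0)
```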

php regular expression help finding multiple filenames only not full URL

I am trying to fix a regular expression I have been using in PHP. It finds all filenames within a sentence or paragraph. The file names always look like this: /this-a-valid-page.php
With help I received on Stack Overflow, my old pattern was modified to the one below, which avoids full URLs (the issue I was having). But this pattern only finds one occurrence, at the beginning of a string, and nothing inside the string.
/^\/(.*?).php/
I have a live example here: http://vzio.com/upload/reg_pattern.php
Remove the ^; the caret signifies the beginning of a string/line, which is why it's not matching elsewhere.
If you need to avoid full URLs, you might want to change the ^ to something like (?:^|\s) which will match either the beginning of the string or a whitespace character - just remember to strip whitespace from the beginning of your match later on.
The last dot in your expression could still cause problems, since an unescaped . matches any single character. With that pattern you could match, for example, /somefilename#php. Escape it with a backslash to make it a literal period:
/\/(.*?)\.php/
Also note that the ? making .* non-greedy is necessary, and Arda Xi's pattern won't work: .* would race to the end of the string and then back up one character at a time until it can match .php, which certainly isn't what you'd want.
To find all the occurrences, you'll have to remove the start anchor and use the preg_match_all function instead of preg_match :
if(preg_match_all('/\/(.*?)\.php/',$input,$matches)) {
var_dump($matches[1]); // will print all filenames (after / and before .php)
}
Also . is a meta char. You'll have to escape it as \. to match a literal period.
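Putting the answers' advice together (no start anchor, preg_match_all, an escaped dot, and the (?:^|\s) prefix to skip full URLs) gives something like the sketch below. The sample sentence is made up, and $matches[0] still includes the leading whitespace, as mentioned above.

```php
<?php
// Made-up text: two relative filenames and one full URL.
$input = 'See /this-a-valid-page.php and http://example.com/other.php or /second-page.php.';

// (?:^|\s) requires the "/" to sit at the start or right after whitespace,
// which skips the slashes inside the full URL in this sample.
if (preg_match_all('/(?:^|\s)\/(.*?)\.php/', $input, $matches)) {
    print_r($matches[1]); // this-a-valid-page, second-page
}
```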
