Replacing multiple slashes with exception in regex - php

There are quite a few questions on removing multiple slashes using regex in PHP. However, I have a special case I would like to exclude.
I have a full URL as my input: http://localhost/path/to/whatever
I have written to regex to convert backslashes to forward slashes, and then remove multiple consecutive slashes:
$cleaned = preg_replace('/(\\\+)|(\/+)/', "/", trim($input));
This works fine for the most part, however I need to be able to exclude the :// case, otherwise using that expression will result in which is not the intended result:
http:/localhost/path/to/whatever
I have tried using /(\\\+)|^[:](\/+)/, but this doesn't seem to work.
How can I exclude the :// case in my expression?

$cleaned = preg_replace('~(?<!https:|http:)[/\\\\]+~', "/", trim($input));
The subexpression inside the lookbehind can't use quantifiers, so the obvious approach - (?<!https?:) - won't work. But it can be made up of two or more fixed-length alternatives with different lengths. For example:
(?<!https:|http:) # OK
Be aware that the alternation has to be at the top level of the lookbehind, so this won't work:
(?<!(https:|http:)) # error

There is something called "negative look behind" (also available in positive or look ahead)
http://www.phpro.org/tutorials/Introduction-to-PHP-Regex.html
With this you could add an exception by something like
(?<=^https?:)
Then your expression will only match in places NOT preceded by "http:"

Simply a negative look-behind for a colon, preceding two or more forward or backward slashes:
$cleaned = preg_replace('/(?<!:)(?:\\/|\\\\){2,}/', "/", trim($input));

Related

Why are regular expressions wrapped in forward slashes

I'm diving into regex.
$subject = "abcdef";
$pattern = '/^def/';
preg_match($pattern, substr($subject,3), $matches, PREG_OFFSET_CAPTURE);
print_r($matches);
I see that forward slashes are used to mark the beginning and end of a regular expression. I'm trying to find out why that is the case. I can't find any documentation about these forward slashes.
The first and most important thing to know about the preg_...() functions is that they expect one character delimiter on each side of the pattern. For the delimiter, you can choose any character apart from backslashes and spaces. The forward slash is the most used delimiter.
And, everything is stated pretty clear in the documentation referencing delimiters.
For conventional/traditional reasons. Often we'd have three parts to the regex, the pattern, the substitution, and the modifiers. For example, the substitution command:
s/Hello/G'day/g
has three components: the Hello which is the pattern - we want to find "Hello", the G'day which is the substitution - we want to replace "Hello" with "G'day", and g which are the modifiers - in this case, g means to do it as many times as possible and not just once.
With some languages, they have the components as separate arguments rather than being delimited using slashes.
Often if you see it without any slashes, it means it's the pattern. If for some reason a language surrounds a pattern with slashes, you know it means it's the pattern.
Hope this helps. :)

preg_replace pattern to remove pNUMBERxNUMBER

Im trying to locate a pattern with preg_replace() and remove it...
I have a string, that contains this: p130x130/ and these numbers vary, they can be higher, or lower ... what I need to do is locate that string, and remove it, whole thing.
I've been trying to use this:
preg_replace('/p+[0-9]+x+[0-9]"/', '', $str);
but that doesnt work for some reason. Would any of you know the correct regexp?
Kind regards
You need to first remove the + quantifier after p then switch the + quantifier from after x and place it after your character class (e.g. x[0-9]+), also remove the quote " inside of your expression, which to me looks like a typo here. You can also use a different delimiter to avoid escaping the ending slash.
$str = preg_replace('~p[0-9]+x[0-9]+/~', '', $str);
If the ending slash is by mistake a typo as well, then this is what you're looking for.
$str = preg_replace('/p[0-9]+x[0-9]+/', '', $str);
Regex to match p130x130/ is,
p[0-9]+x[0-9]+\/
Try this:
$str = preg_replace("/p[0-9]+?x[0-9]+?\//is","",$str);
As mentioned by the comment I have to explain the code as I'm a teacher now.
I've used "/" as a delimiter, but you can use different characters to avoid slashing.
The part that says [0-9]+ is saying to match any character between 0 and 9 at least once, but more if possible. If I had put [0-9]*? then it would have matched an empty space too (as * means to match 0 or more, not 1 or more like +) which is probably not what you wanted anyway.
I've put the ? at the end to make it non-greedy, just a habit of mine but I don't think it's needed. (I used ereg a lot previously).
Anyway, it's going to find 0-9 until it hits an x, and then it does another match for more numbers until it hits a single forward slash. I've backslashed that slash because my delimiter is a slash also and I didn't want it to end there.

PHP Regex, get multiple value with preg_match

I have this text string:
$text="::tower_unit::7::/tower_unit::<br/>::tower_unit::8::/tower_unit::<br/>::tower_unit::9::/tower_unit::";
Now I want to get the value of 7,8, and 9
how to do that in preg_match_all ?
I've tried this:
$pattern="/::tower_unit::(.*)::\/tower_unit::/i";
preg_match($pattern,$text,$matches);
print_r($matches);
but it still all wrong...
You forgot to escape the slash in your pattern. Since your pattern includes slashes, it's easier to use a different regex delimiter, as suggested in the comments:
$pattern="#::tower_unit::(\d+)::/tower_unit::#";
preg_match_all($pattern,$text,$matches);
I also converted (.*) to (\d+), which is better if the token you're looking for will always be a number. Plus, you might want to lose the i modifier if the text is always lower cased.
Your regex is "greedy".
Use the following one
$pattern="#::tower_unit::(.*?)::/tower_unit::#i";
or
$pattern="#::tower_unit::(.*)::/tower_unit::#iU";
and, if you wish, \d+ instead of .*? or .*
the function should be preg_match_all

rexexp solution for php

I have tried to work this out myself (even bought a Kindle book!), but I am struggling with backreferences in php.
What I want is like the following example:
var $html = "hello %world|/worldlink/% again";
output:
hello world again
I tried stuff like:
preg_replace('/%([a-z]+)|([a-z]+)%/', '\1', $html);
but with no joy.
Any ideas please? I am sure someone will post the exact answer but I would like an explanation as well please - so that I don't have to keep asking these questions :)
The slashes "/" are not included in your allowed range [a-z]. Instead use
preg_replace('/%([a-z]+)\|([a-z\/]+)%/', '\1', $html);
Your expression:
'/%([a-z]+)|([a-z]+)%/'
Is only capturing one thing. The | in the middle means "OR". You're trying to capture both, so you don't need an OR in there. You want a literal | symbol so you need to escape it:
'/%([a-z]+)\|([a-z\/]+)%/'
The / character also needs to be included in your char set, and escaped as above.
Your regex (/%([a-z]+)|([a-z]+)%/) reads this way:
Match % followed by + (= one or
more) a-z characters (and store this
into backreference #1).
Or (the |):
Match + (= one or more) a-z
characters (and store this into
backreference #2) followed by a
%.
What you are looking for is:
preg_replace('~%([a-z]+)[|]([a-z/]+)%~', '$1', $html);
Basically I just escaped the | regex meta character (you can do this by either surrounding it with [] like I did or just prepending a backwards slash \, personally I find the former easier to read), and added a / to the second capture group.
I also changed your delimiters from / to ~ because tildes are much more unlikely to appear in strings, if you want to keep using / as your delimiter you also have to escape their occurrences in your regex.
It's also recommended that you use the $ syntax instead of \ in your replacement backreferences:
$replacement may contain references
of the form \\n or (since PHP 4.0.4)
$n, with the latter form being the
preferred one.
Here is a version that works according to the OPs data/information provided (using a non-slash delimiter to avoid escaping slashes):
preg_replace('#%([a-z]+)\|([a-z/]+)%#', '\1', $html);
Using a non slash delimiter, would alleviate the need to escape slashes.
Outputs:
hello world again
The Explanation
Why yours did not work. First up the | is an OR operator, and, in your example, should be escaped. Second up, since you are using /'s or expect slashes it is better to use a non-slash delimiter, such as #. Third up, the slash needed to be added to list of allowed matches. As stated before you may want to include a bit more options, as any type of word with numbers underscores periods hyphens will fail / break the script. Hopefully that is the explanation you were looking for.
Here's what works for me:
preg_replace('/%([a-z]+)\|([a-z\/]+)%/', '\1', $html);
Your regular expression doesn't escape the |, and doesn't include the proper characters for the URL.
Here's a basic live example supporting only a-z and slashes:
preg_replace('/%([a-z]+)\|([a-z\/]+)%/', '\1', $html);
In reality, you're going to want to change those [a-z]+ blocks to something more expressive. Do some searches for URL-matching regular expressions, and pick one that fits what you want.
$html = "hello %world|/worldlink/% again";
echo preg_replace('/([A-ZA-z_ ]*)%(.+)\|(.+)%([A-ZA-z_ ]*)/', '$1$2$4', $html);
output:
hello world again
here is a working code : http://www.ideone.com/0qhZ8

Regex Question Again!

I really don't know what my problem is lately, but Regex seems to be giving me the most trouble.
Very simple thing I need to do, but can't seem to get it:
I have a uri that returns either /xmlfeed or /xmlfeed/what/is/this
I want to match /xmlfeed on any occasion.
I've tried many variations of the following:
preg_match('/(\/.*?)\/?/', $_SERVER['REQUEST_URI'], $match);
I would read this as: Match forwardslash then match any character until you come to an optional forwardslash.
Why not:
preg_match ('#/[^/]+#', _SERVER['REQUEST_URI'], $match);
?
$match[0] will give you what you need
why do you need regex that make you confused??
$string = "/xmlfeed/what/is/this";
$s = explode("/",$string,3);
print "/".$s[1]."\n";
output
$ php test.php
/xmlfeed
Your problem is the reluctant quantifier. After the initial slash is matched, .*? consumes the minimum number of characters it's allowed to, which is zero. Then /? takes over; it doesn't see a slash in the next position (which is immediately after the first slash), but that's okay because it's optional. The result: the regex always matches a single slash, and group #1 always matches an empty string.
Obviously, you can't just replace the reluctant quantifier with a greedy one. But if you replace the .* with something that can't match a slash, you don't have to worry about greediness. That's what K Prime's regex, '#/[^/]+#' does. Notice as well how it uses # as the regex delimiter and avoids the necessity of escaping slashes within the regex.
In PHP: '/(\/.*?)\/?/' is a string containing a regular expression.
First you have to decode the string: /(/.*?)\/?/
So you have a forward slash that starts the result expression. An opening brace. A forward slash that ends the matching part of the expression … and I'm pretty sure that it will then error since you haven't closed the brace.
So, to get this working:
Remember to escape characters with special meanings in strings and regular expressions
Don't confuse the forward slash / with the backslash \
You want to match everything after and including the first slash, but before any (optional) second slash (so we don't want the ? that makes it non-greedy):
/(\/[^\/]*)/
Which, expressed as a PHP string is:
'/(\\/([^\\/]*)/'
I know this is avoiding the regex, and therefore avoids the question, but how about splitting the uri (at slashes) into an array.
Then you can deal with the elements of the array, and ignore the bits of the uri you don't want.
Using the suggestions posted, I ended up trying this:
echo $_SERVER['REQUEST_URI'];
preg_match("/(\/.*)[^\/]/", $_SERVER['REQUEST_URI'], $match);
$url = "http://".$_SERVER['SERVER_NAME'].$match[0];
foreach($match as $k=>$v){
echo "<h1>$k - $v</h1>";
}
I also tried it without the .* and without the parentheses.
Without the .* AND () it returns the / with the next character ONLY.
Like it is, it just returns the entire URI everytime
So, when ran with the code above, the output is
/tea-time-blog/post/20
0 - /tea-time-blog/post/20
1 - /tea-time-blog/post/2
This code is being eval()'d by the way. I don't think that should make any differnce in the way PHP handles the regular expression.

Categories