PHP Regex - Issue with forward slashes and alternation

I have a series of URLs like so:
http://www.somesite.com/de/page
http://www.somesite.com/de/another
http://www.somesite.com/de/page/something
http://www.somesite.com/de/page/bar
I need to search the block of text and pull out the language code, and I am using a regex like so:
/(de|en|jp)/
I'm trying to find and replace these via preg_replace, including the forward slashes:
/de/
/en/
/jp/
However, this doesn't work: the match does not include the slashes. I've tried escaping the slashes with \ and \\. I've tried wrapping the needle in preg_quote, but this breaks the alternation.
I feel like I am missing something very simple here!
edit:
Full function call:
preg_replace("/(de|en|jp)/", "/".$newLang."/", $url);
--
(Tagged Magento and WordPress as I am trying to solve an issue with unifying the navigation menu when both CMSes are multilingual.)

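In your call, the slashes in "/(de|en|jp)/" are the pattern delimiters, so they are not part of what gets matched. You don't have to use slashes as delimiters, but you have to have some delimiter. Here ( and ) act as the delimiters, so the slashes inside are matched literally: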
if (preg_match("(/(de|en|jp)/)", $url, $m)) {
    $language = $m[1];
}

You can use a different delimiter, such as %.
if (preg_match('%/(de|en|jp)/%', $url, $match)) {
    $lang = $match[1];
}
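Since the question is really about preg_replace, here is a minimal sketch of the same idea applied to the replacement. The helper name switchLanguage is hypothetical, and limiting the replacement to the first match is an assumption about what you want.

```php
<?php
// Hypothetical helper: swap the language segment of a URL, slashes included.
// '%' is the delimiter, so the '/' characters are part of the pattern itself.
function switchLanguage(string $url, string $newLang): string
{
    // Limit of 1: only the first /de/, /en/ or /jp/ segment is replaced.
    return preg_replace('%/(de|en|jp)/%', '/' . $newLang . '/', $url, 1);
}

echo switchLanguage('http://www.somesite.com/de/page/something', 'en'), "\n";
// http://www.somesite.com/en/page/something
```

Because the slashes are now part of the pattern, a segment such as /video/ is left alone, whereas the slash-less (de|en|jp) from the question would also match the "de" inside it.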
That should help you; just modify what you have :).

Related

Regex to match a section between two static url components

I have a URL like so: http://example.com/c/TEXTTOMATCH/. The problem is that the URL isn't always like that; sometimes it's http://example.com/c/TEXTTOMATCH/#/?test. I'm trying to use a regex to grab everything between /c/ and the next /. I've tried
$catpreg = preg_match('/c(.*)/', $reffer, $matches);
but it fails.
How about this:
<?php
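$url = 'http://example.com/wreqwreqrq/rfqewrqwe/c/TEXTTOMATCH/';
// keep only the path, so a #fragment or ?query is ignored
$path = parse_url($url, PHP_URL_PATH);
$segments = explode('/', $path);
// find the "c" segment; the wanted text is the one right after it
$find = array_search('c', $segments, true);
if ($find !== false && isset($segments[$find + 1])) {
    echo $segments[$find + 1];
}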
Try this:
preg_match('#/c/(.*?)/#', $reffer, $matches);
You were just matching everything after c, not the slashes. The slashes in your call were being used as the delimiters around the regexp; I used # as the delimiter so I could use / inside the regexp without having to escape it.
The non-greedy quantifier .*? ensures that it only matches TEXTTOMATCH in the second example, not TEXTTOMATCH/#.
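A small sketch that runs the #/c/(.*?)/# pattern on both URLs from the question and contrasts it with the greedy (.*) version; the wrapper name segmentAfterC is hypothetical.

```php
<?php
// Hypothetical wrapper: return the segment between "/c/" and the next "/", or null.
function segmentAfterC(string $url): ?string
{
    // '#' is the delimiter, so '/' needs no escaping inside the pattern.
    if (preg_match('#/c/(.*?)/#', $url, $matches)) {
        return $matches[1];
    }
    return null;
}

echo segmentAfterC('http://example.com/c/TEXTTOMATCH/'), "\n";        // TEXTTOMATCH
echo segmentAfterC('http://example.com/c/TEXTTOMATCH/#/?test'), "\n"; // TEXTTOMATCH

// Greedy (.*) backtracks only to the LAST "/", so it over-captures here.
preg_match('#/c/(.*)/#', 'http://example.com/c/TEXTTOMATCH/#/?test', $greedy);
echo $greedy[1], "\n"; // TEXTTOMATCH/#
```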

PHP preg_replace pattern only seems to work if it's wrong?

I have a string that looks like this:
../Clean_Smarty_Projekt/tpl/templates_c\.
../Clean_Smarty_Projekt/tpl/templates_c\..
I want to remove ../, \. and \.. (replace them with an empty string) using a regular expression.
Previously, I did it like this:
$result = str_replace(array("../","\..","\."),"",$str);
There, the search array has to be in this order, because changing it makes the output a little buggy. So I decided to use a regular expression instead.
Now I came up with this pattern:
$result = preg_replace('/(\.\.\/)|(\\[\.]{1,2})/',"",$str);
This actually returns only empty strings...
The culprit is (\\[\.]{1,2}).
In Regex101 it's all OK. (It took me a couple of minutes to realize that I don't need the /g flag in preg_replace.)
If I use this pattern in preg_replace, I have to write (\\\\[\.]{1,2}) to get it to work. But that's obviously wrong, because I'm not searching for two backslashes.
Of course I know the escaping rules (escaping slashes).
Why doesn't this match correctly?
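I suggest using a different delimiter, such as ~, so that the / in ../ needs no escaping. The backslash problem comes from PHP string escaping, not from the regex: in a single-quoted PHP string, \\ becomes a single \ before PCRE ever sees the pattern, so your (\\[\.]{1,2}) reaches the regex engine as (\[\.]{1,2}), where \[ is just a literal [. Regex101 receives the pattern exactly as typed, which is why it works there. That is why you need three (\\\) or four (\\\\) backslashes in the PHP string to match a single backslash; four is the unambiguous form, since it always arrives at PCRE as \\.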
$string = '../Clean_Smarty_Projekt/tpl/templates_c\.'."\n".'../Clean_Smarty_Projekt/tpl/templates_c\..';
// In the PHP string, \\\\ becomes \\ (one literal backslash in the regex) and \. stays \.
echo preg_replace('~\.\./|\\\\\.{1,2}~', '', $string);
Output:
Clean_Smarty_Projekt/tpl/templates_c
Clean_Smarty_Projekt/tpl/templates_c
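To see the escaping step directly, this sketch simply echoes the pattern strings, i.e. what PHP hands to PCRE, before using the corrected one. It only prints strings and runs one replacement; nothing here is specific to any library beyond preg_replace.

```php
<?php
// What the question wrote: PHP collapses '\\' to '\' inside single quotes.
$asWritten = '/(\.\.\/)|(\\[\.]{1,2})/';
echo $asWritten, "\n"; // /(\.\.\/)|(\[\.]{1,2})/   <- \[ is a literal "[" for PCRE

// Doubling the backslashes gives PCRE the pattern that was tested in Regex101.
$fixed = '/(\.\.\/)|(\\\\[\.]{1,2})/';
echo $fixed, "\n";     // /(\.\.\/)|(\\[\.]{1,2})/

$str = '../Clean_Smarty_Projekt/tpl/templates_c\..';
echo preg_replace($fixed, '', $str), "\n"; // Clean_Smarty_Projekt/tpl/templates_c
```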

How to use preg_replace to replace all occurrences of a given pattern?

I have a pattern (a slash followed by one or more dashes) that can occur many times inside strings, like:
/hi/--hello/-hi
I want to replace it with
/hi/hello/hi
I have tried:
$str = preg_replace('/\/-+/', '/', $subject);
but this does not seem to be working properly. Am I missing something? I use http://www.debuggex.com/ to test my regex, and \/-+ does not seem to match the string.
Your PHP code is fine: preg_replace replaces all occurrences by default. The reason it doesn't work in debuggex.com is that you must not put the delimiters on that site; it expects the bare pattern.
Remove the slashes at the beginning and at the end from the input box.
Write only \/-+ or /-+, since on that site you don't need to escape the slashes.
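A quick check, using the subject string from the question, that the original call already replaces every match, and that only an explicit $limit argument would stop it early. The '#' delimiter in the last two calls is just an alternative that avoids escaping '/'.

```php
<?php
$subject = '/hi/--hello/-hi';

// The question's exact call: replaces every "/" followed by one or more dashes.
echo preg_replace('/\/-+/', '/', $subject), "\n";    // /hi/hello/hi

// Same pattern with '#' delimiters, so the slash needs no escaping.
echo preg_replace('#/-+#', '/', $subject), "\n";     // /hi/hello/hi

// Passing a limit of 1 replaces only the first match.
echo preg_replace('#/-+#', '/', $subject, 1), "\n";  // /hi/hello/-hi
```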

Regex to clean up a URL

I am looking for a way to get a valid url out of a string like:
$string = 'http://somesite.com/directory//sites/9/my_forms/3-895a3e/somefilename.jpg|:||:||:||:|19845';
My original solution was:
preg_match('#^[^:|]*#', str_replace('//', '/', $string), $modifiedPath);
But obviously it's also going to remove a slash from the http://, not just the one in the middle of the string.
My expected output that I want from the original is:
http://somesite.com/directory/sites/9/my_forms/3-895a3e/somefilename.jpg
I could always break off the http part of the string first but would like a more elegant solution in the form of regex if possible. Thanks.
This will do exactly what you are asking:
<?php
$string = 'http://somesite.com/directory//sites/9/my_forms/3-895a3e/somefilename.jpg|:||:||:||:|19845';
preg_match('/^([^|]+)/', $string, $m); // get everything up to and NOT including the first pipe (|)
$string = $m[1];
$string = preg_replace('/(?<!:)\/\//', '/' ,$string); // replace all occurrences of // as long as they are not preceded by :
echo $string; // outputs: http://somesite.com/directory/sites/9/my_forms/3-895a3e/somefilename.jpg
exit;
?>
EDIT:
(?<!X) in regular expressions is the syntax for what is called a negative lookbehind. The X is replaced with the character(s) we are testing for.
The following expression would match every instance of double slashes (//):
\/\/
But we need to make sure that the match we are looking for is NOT preceded by the : character so we need to 'lookbehind' our match to see if the : character is there. If it is then we don't want it to be counted as a match:
(?<!:)\/\/
The ! is what says NOT to match in our lookbehind. If we changed it to the positive lookbehind (?<=:)\/\/, it would only match the double slashes that do have the : preceding them.
Here is a quick tutorial that can explain it all better than I can: lookahead and lookbehind tutorial
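A short sketch contrasting the two lookbehinds on the question's URL, assuming the pipe suffix has already been stripped; '#' is used as the delimiter so the slashes need no escaping.

```php
<?php
$url = 'http://somesite.com/directory//sites/9/my_forms/3-895a3e/somefilename.jpg';

// Negative lookbehind (?<!:): "//" NOT preceded by ":" -> only the path's double slash.
echo preg_replace('#(?<!:)//#', '/', $url), "\n";
// http://somesite.com/directory/sites/9/my_forms/3-895a3e/somefilename.jpg

// Positive lookbehind (?<=:): "//" preceded by ":" -> only the one after "http:".
echo preg_match_all('#(?<=:)//#', $url), "\n"; // 1
```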
Assuming all your strings are in the form given, you don't need any but the simplest of regexes to do this; if you want an elegant solution, then a regex is definitely not what you need. Also, double slashes are legal in a URL, just as in a Unix path, and many servers treat them the same as a single slash, so you may not need to get rid of them at all.
Why not just
$url = preg_split('/\|/', $string)[0]; // everything before the first pipe
?
If you really, really care about getting rid of the double slashes in the URL, then you can follow this with
$url = preg_replace('/([^:])\/\//', '$1/', $url);
or even combine them into
$url = preg_replace('/([^:])\/\//', '$1/', preg_split('/\|/', $string)[0]);
although that last form gets a little bit hairy.
Since this is quite a strictly defined situation, I'd consider a single preg_replace call to be the most elegant solution.
Off the top of my head:
$sanitizedURL = preg_replace('~((?<!:)/(?=/)|\\|.+)~', '', $rawURL);
Basically, this looks for any forward slash that IS NOT preceded by a colon (:) and IS followed by another forward slash. It also matches any pipe character and everything after it.
Anything found is removed from the result.
I can explain the RegEx in more detail if you like.
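Running that single pattern on the sample string from the question confirms it produces the expected output; the comments describe the two alternatives of the pattern as given in the answer above.

```php
<?php
// Branch 1: a "/" not preceded by ":" and followed by another "/" (drops one slash of "//").
// Branch 2: a literal "|" (written \\| inside single quotes) and everything after it.
$rawURL = 'http://somesite.com/directory//sites/9/my_forms/3-895a3e/somefilename.jpg|:||:||:||:|19845';
$sanitizedURL = preg_replace('~((?<!:)/(?=/)|\\|.+)~', '', $rawURL);
echo $sanitizedURL, "\n";
// http://somesite.com/directory/sites/9/my_forms/3-895a3e/somefilename.jpg
```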

How to write regex to find one directory in a URL?

Here is the subject:
http://www.mysite.com/files/get/937IPiztQG/the-blah-blah-text-i-dont-need.mov
What I need, using regex, is only the part before the last / (including that last / too).
The 937IPiztQG string may change; it will contain only a-z, A-Z, 0-9, - and _.
Here's what I tried:
$code = strstr($url, '/http:\/\/www\.mysite\.com\/files\/get\/([A-Za-z0-9]+)./');
EDIT: I need to use regex because I don't actually know the URL. I have a string like this...
a song
more text
oh and here goes some more blah blah
I need it to read that string and cut off the filename part of the URLs.
You really don't need a regexp here. Here is a simple solution:
echo basename(dirname('http://www.mysite.com/files/get/937IPiztQG/the-blah-blah-text-i-dont-need.mov'));
// echoes "937IPiztQG"
Also, I'd like to quote Jamie Zawinski:
"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems."
This seems far too simple to use regex. Use something similar to strrpos to look for the last occurrence of the '/' character, and then use substr to trim the string.
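A minimal sketch of the strrpos/substr suggestion, keeping everything up to and including the last '/' as the question asks; the variable names are illustrative.

```php
<?php
$url = 'http://www.mysite.com/files/get/937IPiztQG/the-blah-blah-text-i-dont-need.mov';
$lastSlash = strrpos($url, '/');                 // position of the last "/" (false if none)
if ($lastSlash !== false) {
    $prefix = substr($url, 0, $lastSlash + 1);   // +1 keeps the slash itself
    echo $prefix, "\n";
    // http://www.mysite.com/files/get/937IPiztQG/
}
```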
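/http:\/\/www\.mysite\.com\/files\/get\/([^\/]+)\//
How about something like this? It should capture one or more characters that are not a /, followed by a /. (With / as the delimiter, every / inside the pattern, including the one in the character class, must be escaped.)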
The greediness of the regex ensures that ^.*/ works fine: .* consumes as much as it can, so the match ends at the last / in the string.
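A small sketch showing the greedy ^.*/ against its lazy counterpart on the question's URL; '#' is used as the delimiter so the '/' needs no escaping.

```php
<?php
$url = 'http://www.mysite.com/files/get/937IPiztQG/the-blah-blah-text-i-dont-need.mov';

// Greedy: .* runs to the end, then backtracks to the LAST "/".
if (preg_match('#^.*/#', $url, $greedyMatch)) {
    echo $greedyMatch[0], "\n"; // http://www.mysite.com/files/get/937IPiztQG/
}

// Lazy: .*? stops at the FIRST "/".
if (preg_match('#^.*?/#', $url, $lazyMatch)) {
    echo $lazyMatch[0], "\n";   // http:/
}
```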
The strstr() function does not use a regular expression for any of its arguments; it's the wrong function for regex replacement.
Are you thinking of preg_replace()?
But a function like basename() would be more appropriate.
Try this
$ok=preg_match('#mysite\.com/files/get/([^/]*)#i',$url,$m);
if($ok) $code=$m[1];
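Because the EDIT says the URLs sit inside a larger block of text, the same pattern can be passed to preg_match_all() to collect the code from every URL. The $text below is a made-up sample, since the question's actual input is not shown in full.

```php
<?php
// Hypothetical sample text containing two URLs.
$text = 'a song: http://www.mysite.com/files/get/937IPiztQG/a-song.mov'
      . ' more text http://www.mysite.com/files/get/Ab-9_x/other.mov';

// Same pattern as above; $m[1] holds every captured code, in order of appearance.
if (preg_match_all('#mysite\.com/files/get/([^/]*)#i', $text, $m)) {
    print_r($m[1]);
}
```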
Then have a good read of these pages:
http://www.php.net/preg_match
preg_replace
Note:
the use of "#" as a delimiter to avoid getting trapped into escaping too many "/"
the "i" flag, which makes the match case-insensitive
(allowing more liberal spellings of the MySite.com domain name)
the $m array of captured results
