Remove part of a string with regex - php

I'm trying to strip part of a string (which happens to be a URL) with regex. I'm getting better at regex but can't figure out how to tell it that content before or after the string is optional. Here is what I have:
$string='http://www.example.com/username?refid=22';
$new_string= preg_replace('/[/?refid=0-9]+/', '', $string);
echo $new_string;
I'm trying to remove the ?refid=22 part to get http://www.example.com/username
Ideas?
EDIT
I think I need to use regex instead of explode because sometimes the URL looks like http://example.com/profile.php?id=9999&refid=22. In this case I also want to remove the refid but keep id=9999.

parse_url() is good for parsing URLs :)
$string = 'http://www.example.com/username?refid=22';
$url = parse_url($string);
// Ditch the query.
unset($url['query']);
echo array_shift($url) . '://' . implode($url);
Output
http://www.example.com/username
If you only wanted to remove that specific GET param, do this...
parse_str($url['query'], $get);
unset($get['refid']);
$url['query'] = http_build_query($get);
Output
http://example.com/profile.php?id=9999
If you have the extension, you can rebuild the URL with http_build_url().
Otherwise you can make assumptions about username/password/port and build it yourself.
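As a rough sketch of the "build it yourself" route, the helper below removes one GET parameter and reassembles the URL by hand, without http_build_url(). The function name remove_query_param is hypothetical (it is not from the answer), and the sketch assumes an absolute URL with a scheme and host; any username/password part is dropped.

```php
<?php
// Hypothetical helper: remove one query parameter and rebuild the URL by hand.
// Assumes an absolute URL (scheme and host present); user/password are ignored.
function remove_query_param($url, $param)
{
    $parts = parse_url($url);

    // Parse the query string into an array and drop the unwanted key.
    $query = array();
    if (isset($parts['query'])) {
        parse_str($parts['query'], $query);
        unset($query[$param]);
    }

    // Reassemble only the components that are actually present.
    $result = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $result .= ':' . $parts['port'];
    }
    if (isset($parts['path'])) {
        $result .= $parts['path'];
    }
    if ($query) {
        $result .= '?' . http_build_query($query);
    }
    if (isset($parts['fragment'])) {
        $result .= '#' . $parts['fragment'];
    }
    return $result;
}

echo remove_query_param('http://www.example.com/username?refid=22', 'refid'), "\n";
// http://www.example.com/username
echo remove_query_param('http://example.com/profile.php?id=9999&refid=22', 'refid'), "\n";
// http://example.com/profile.php?id=9999
```

Note that http_build_query() re-encodes the remaining parameters, so the rebuilt query string may not be byte-for-byte identical to the original.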
Update
Just for fun, here is the correction for your regular expression.
preg_replace('/\?refid=\d+\z/', '', $string);
[] is a character class. You were trying to put a specific order of characters in there.
\ is the escape character, not /.
\d is a short version of the character class [0-9].
I put the last character anchor (\z) there because it appears it will always be at the end of your string. If not, remove it.
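If you want to stay with regex but refid is not always the only parameter, one possible sketch is below. The function name strip_refid is made up for illustration, and it assumes the value is always numeric, as in the question.

```php
<?php
// Hypothetical helper: strip a numeric refid parameter wherever it sits in the query.
function strip_refid($url)
{
    // refid comes first and other parameters follow: keep the "?".
    $url = preg_replace('/\?refid=\d+&/', '?', $url);

    // refid is the only parameter, or it is not the first one:
    // remove it together with the "?" or "&" in front of it.
    return preg_replace('/[?&]refid=\d+(?=&|#|\z)/', '', $url);
}

echo strip_refid('http://www.example.com/username?refid=22'), "\n";
// http://www.example.com/username
echo strip_refid('http://example.com/profile.php?id=9999&refid=22'), "\n";
// http://example.com/profile.php?id=9999
echo strip_refid('http://example.com/profile.php?refid=22&id=9999'), "\n";
// http://example.com/profile.php?id=9999
```

The lookahead (?=&|#|\z) stops the match from swallowing the separator of the next parameter.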

Don't use regexes if you don't have to:
echo current( explode( '?', $string ) );

Related

Regex to turn url into a clean page title

I use $_SERVER['REQUEST_URI'] to get this: /res/about_us.php. I want the output to be About Us. I know I can use preg_replace for this; it is the regular expression I am struggling with.
I have this pitiful attempt thus far:
echo str_replace('.php', '', $uri);
I realised after this that I can't replace two cases using str_replace, so I thought I needed preg_replace with a regex, but I have no clue how it works and can't find a similar example through Google.
Regards
Without regular expressions you can do:
// Get the filename, e.g., 'about_us'.
$filename = pathinfo($_SERVER['REQUEST_URI'], PATHINFO_FILENAME);
// Replace underscore with spaces and capitalise words
$title = ucwords(str_replace('_', ' ', $filename));
See pathinfo, ucwords
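Actually, you can replace multiple cases with str_replace by passing an array of the things you'd like to replace as the first parameter of the function. The second array gives the replacement for each entry, so '.php' is removed rather than turned into a trailing space. Something like this:
echo ucwords(str_replace(array('_', '-', '.php'), array(' ', ' ', ''), basename($uri)));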
Inside the array you could put whatever you'd like to replace.
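One thing to watch for: $_SERVER['REQUEST_URI'] also contains the query string, if there is one. A small sketch that strips it first is shown below; the function name page_title is hypothetical.

```php
<?php
// Hypothetical helper: turn '/res/about_us.php' into 'About Us'.
function page_title($uri)
{
    // Keep only the path so a query string cannot leak into the title.
    $path = parse_url($uri, PHP_URL_PATH);

    // 'about_us' from '/res/about_us.php'.
    $filename = pathinfo($path, PATHINFO_FILENAME);

    // Replace separators with spaces and capitalise each word.
    return ucwords(str_replace(array('_', '-'), ' ', $filename));
}

echo page_title('/res/about_us.php'), "\n";         // About Us
echo page_title('/res/about_us.php?lang=en'), "\n"; // About Us
```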

Write regular expression in preg_replace

I still don't understand how regular expressions work with preg_replace. I have some URLs in text:
site.com/user/login.php?valid=tru
site.com/eng/page/some_page.php?valid=tru&anothervar=1
I want to change them so they become this:
site.com/user/login/
site.com/eng/page/some_page/
preg_replace(" 'no_mater_what_1'.php'no_mater_what_2' " , 'no_mater_what_1'/ , $some_var);
To avoid traps, like another .php substring in the path, you can use this replacement:
$url = preg_replace('~\.php(?:[?#]\N*|\z)~i', '', $url, -1, $c);
if (!$c) {
    // not a php file, do something else
}
or in this way:
if (preg_match('~[^?#]+\.php(?=[?#]|\z)~Ai', $url, $m)) {
    $url = $m[0];
} else {
    // not a php file, do something else
}
This way ensures that the .php matched is the extension of the file because the regex engine will find the leftmost result that is followed by either a ? for the query part, a # for the fragment part or the end of the string.
pattern elements:
\N: a character that isn't a newline.
\z: anchor for the end of the string.
A: modifier that anchors the pattern at the start of the string
(?=...): lookahead assertion
The advantage of this approach is that it is safe and reasonably efficient.
Another way, with parse_url:
You can use parse_url to separate a URL into its parts. This way is a little tedious because you need to rebuild the URL afterwards (and how you rebuild it depends on which elements are present in the URL), but it is far from impossible and is also a safe approach.
But why not simply do this:
$replace = explode('.php',$some_var);
$replace = $replace[0] . '/';
I don't find it necessary to use a regular expression here, as long as ".php" is not repeated in the string.
This should work
$subject = 'site.com/eng/page/some_page.php?valid=tru&anothervar=1';
if (preg_match('/(.*)\.php(?:\?.*)?/', $subject, $regs)) {
    $result = $regs[1] . '/';
    echo $subject . ' => ' . $result;
} else {
    echo 'NOT FOUND';
}
The regular expression doing the magic is this
/(.*)\.php(?:\?.*)?/
by parts:
(.*)\.php
Capture everything until (excluding) ".php"
(?:\?.*)
Search for the pattern "?..."
?
Make that last pattern optional
Because your two examples show up on the same line, this looks a bit confusing. However, it appears that you want to replace everything from .php to the end of the line with a /. So, use:
$new_link = preg_replace('/\.php.*$/', '/', $old_link);
You need the \ in front of the . because . is a special character that needs to be escaped to make it work like a period. Then, you look for php, in that order, followed by anything to the end of the line ($ means end of the line). You replace all of that with /.
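As a quick check, here is that pattern applied to the two URLs from the question. preg_replace also accepts an array of subjects, which is used here only to keep the sketch short; the variable names are illustrative.

```php
<?php
$links = array(
    'site.com/user/login.php?valid=tru',
    'site.com/eng/page/some_page.php?valid=tru&anothervar=1',
);

// Replace everything from ".php" to the end of each string with "/".
$clean = preg_replace('/\.php.*$/', '/', $links);

print_r($clean);
// [0] => site.com/user/login/
// [1] => site.com/eng/page/some_page/
```

If the URLs sit on separate lines inside one larger string, adding the m modifier (so that $ matches at the end of every line) is one way to handle them in a single call.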

regex to clean up url

I am looking for a way to get a valid url out of a string like:
$string = 'http://somesite.com/directory//sites/9/my_forms/3-895a3e/somefilename.jpg|:||:||:||:|19845';
My original solution was:
preg_match('#^[^:|]*#', str_replace('//', '/', $string), $modifiedPath);
But obviously it's going to remove a slash from the http:// too, not just the one in the middle of the string.
My expected output that I want from the original is:
http://somesite.com/directory/sites/9/my_forms/3-895a3e/somefilename.jpg
I could always break off the http part of the string first but would like a more elegant solution in the form of regex if possible. Thanks.
This will do exactly what you are asking:
<?php
$string = 'http://somesite.com/directory//sites/9/my_forms/3-895a3e/somefilename.jpg|:||:||:||:|19845';
preg_match('/^([^|]+)/', $string, $m); // get everything up to and NOT including the first pipe (|)
$string = $m[1];
$string = preg_replace('/(?<!:)\/\//', '/' ,$string); // replace all occurrences of // as long as they are not preceded by :
echo $string; // outputs: http://somesite.com/directory/sites/9/my_forms/3-895a3e/somefilename.jpg
exit;
?>
EDIT:
(?<!X) in regular expressions is the syntax for what is called a lookbehind. The X is replaced with the character(s) we are testing for.
The following expression would match every instance of double slashes (//):
\/\/
But we need to make sure that the match we are looking for is NOT preceded by the : character so we need to 'lookbehind' our match to see if the : character is there. If it is then we don't want it to be counted as a match:
(?<!:)\/\/
The ! is what says NOT to match in our lookbehind. If we changed it to (?<=:)\/\/ (a positive lookbehind) then it would only match the double slashes that did have the : preceding them.
Here is a quick tutorial that can explain it all better than I can: lookahead and lookbehind tutorial
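To see the difference between the two kinds of lookbehind, here is a small sketch on a shortened version of the URL (the sample string and variable names are made up for illustration):

```php
<?php
$url = 'http://somesite.com/directory//sites';

// Negative lookbehind: collapse "//" only when it is NOT preceded by ":".
$negative = preg_replace('/(?<!:)\/\//', '/', $url);

// Positive lookbehind: collapse "//" only when it IS preceded by ":".
$positive = preg_replace('/(?<=:)\/\//', '/', $url);

echo $negative, "\n"; // http://somesite.com/directory/sites
echo $positive, "\n"; // http:/somesite.com/directory//sites
```

The lookbehind itself consumes no characters, which is why the : is left in place in both results.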
Assuming all your strings are in the form given, you don't need any but the simplest of regexes to do this; if you want an elegant solution, then a regex is definitely not what you need. Also, double slashes are legal in a URL, just like in a Unix path, and mean the same thing a single slash does, so you don't really need to get rid of them at all.
Why not just
// current() rather than array_shift(), which expects a variable passed by reference
$url = current(preg_split('/\|/', $string));
?
If you really, really care about getting rid of the double slashes in the URL, then you can follow this with
$url = preg_replace('/([^:])\/\//', '$1/', $url);
or even combine them into
$url = preg_replace('/([^:])\/\//', '$1/', current(preg_split('/\|/', $string)));
although that last form gets a little bit hairy.
Since this is a quite strictly defined situation, I'd consider just one preg to be the most elegant solution.
From the top of my head:
$sanitizedURL = preg_replace('~((?<!:)/(?=/)|\\|.+)~', '', $rawURL);
Basically, what this does is look for any forward slash that IS NOT preceded by a colon (:) and IS followed by another forward slash. It also searches for any pipe character and every character following it.
Anything found is removed from the result.
I can explain the RegEx in more detail if you like.
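For reference, a sketch of that single pattern run against the string from the question, with the two alternatives commented:

```php
<?php
$rawURL = 'http://somesite.com/directory//sites/9/my_forms/3-895a3e/somefilename.jpg|:||:||:||:|19845';

// Alternative 1: (?<!:)/(?=/)  a slash not preceded by ":" and followed by another slash.
// Alternative 2: \|.+          the first pipe and everything after it.
$sanitizedURL = preg_replace('~((?<!:)/(?=/)|\\|.+)~', '', $rawURL);

echo $sanitizedURL, "\n";
// http://somesite.com/directory/sites/9/my_forms/3-895a3e/somefilename.jpg
```

Because \|.+ needs at least one character after the pipe, a string that ends in a single trailing pipe would keep it; that edge case does not arise with the input shown.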

PHP Regex moving selection to different location in string

I currently have this regex:
$text = preg_replace("#<sup>(?:(?!</?sup).)*$key(?:(?!</?sup).)*<\/sup>#is", '<sup>'.$val.'</sup>', $text);
The objective of the regex is to take <sup>[stuff here]$key[stuff here]</sup> and remove the stuff within the [stuff here] locations.
What I actually would like to do, is not remove $key[stuff here]</sup>, but simply move the stuff to $key</sup>[stuff here]
I've tried using $1-$4 and \\1-\\4 and I can't seem to get the text to be added after </sup>
Try this;
$text = preg_replace(
'#<sup>((?:(?!</?sup).)*)'.$key.'((?:(?!</?sup).)*)</sup>#is',
'<sup>'.$val.'</sup>\1\2',
$text
);
The (?:...)* bit is a non-capturing group, and is therefore not available through backreferences; it has to be wrapped in a capturing group (...) first. Also, if you use ' rather than " for string literals, you will only need to escape \ and '
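A small runnable sketch of the same idea follows. The sample values for $key, $val and $text are assumptions for illustration, and preg_quote() is an addition that is not in the answer; it guards against $key containing regex metacharacters.

```php
<?php
// Sample values, assumed for illustration only.
$key  = 'KEY';
$val  = 'new';
$text = 'before <sup>[a]KEY[b]</sup> after';

// Each "stuff" part gets its own capturing group so it can be reused.
$pattern = '#<sup>((?:(?!</?sup).)*)' . preg_quote($key, '#') . '((?:(?!</?sup).)*)</sup>#is';

// Put the captured parts back after the closing tag.
$result = preg_replace($pattern, '<sup>' . $val . '</sup>$1$2', $text);

echo $result, "\n"; // before <sup>new</sup>[a][b] after
```

If $val can contain a backslash or a dollar sign followed by a digit, it would be read as a backreference in the replacement string, so it would need escaping first.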
// Cheers, Morten
You have to combine preg_match() and preg_replace():
Match the desired stuff with preg_match() and store it in a variable.
Replace it with an empty string using the same regex.
Append the stored variable at the end.

Remove spaces from the beginning and end of a string

I am pretty new to regular expressions.
I need to clean up a search string from spaces at the beginning and the end.
Example: " search string "
Result: "search string"
I have a pattern that works as a JavaScript solution but I can't get it to work in PHP using preg_replace:
JavaScript pattern that works:
/^[\s]*(.*?)[\s]*$/ig
My example:
$string = preg_replace( '/^[\s]*(.*?)[\s]*$/si', '', " search string " );
print $string; //returns nothing
On parse it tells me that g is not recognized so I had to remove it and change the ig to si.
If it's only white-space, why not just use trim()?
Yep, you should use trim() I guess.. But if you really want that regex, here it is:
((?=^)(\s*))|((\s*)(?>$))
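For comparison, here is a short sketch of three ways to get the expected result. The question's preg_replace call returns nothing because the pattern matches the entire string and the replacement is an empty string; using the captured group as the replacement keeps the inner text. Variable names are illustrative.

```php
<?php
$input = " search string ";

// Built-in: strips whitespace from both ends.
$trimmed = trim($input);

// The question's pattern, with the captured group as the replacement.
$viaCapture = preg_replace('/^[\s]*(.*?)[\s]*$/s', '$1', $input);

// A simpler pattern that deletes only the leading and trailing whitespace.
$viaDelete = preg_replace('/^\s+|\s+\z/', '', $input);

var_dump($trimmed, $viaCapture, $viaDelete); // all three hold "search string"
```

PHP has no g modifier because preg_replace already replaces every match by default.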
