Strip out rest of query string after first ampersand - php

I am trying to remove the query string from a URL, but I need to leave the first key/value pair intact. So I know that the first occurrence of an ampersand is the point from which I want to discard the query string. What would be the best way to do this? Below is my code, which currently just keeps appending to the query string.
<a href="<?php echo $_SERVER["REQUEST_URI"] ?>&sortkey=year&sortval=asc">

You could simply match for everything that is not an ampersand until we hit the first ampersand. E.g.
$incomingURI = 'http://www.example.com/?id=12&left=right&up=down';
preg_match('/[^&]+/', $incomingURI, $match);
$outgoingURI = $match[0];
The above code will output the following in variable $outgoingURI:
http://www.example.com/?id=12
This avoids the replacement step, so it should be at least as quick as using preg_replace.

If I understood your question correctly, you want to strip out everything after the first occurrence of an ampersand. You can use something like this:
<?php
$uri = 'blah.blah.com?a=b&sortkey=year&sortval=asc';
$new_uri = preg_replace("/([^&]+)&(.*)/", "$1", $uri);
?>
The pattern:
([^&]+) : Match everything except '&'
& : First '&'
(.*) : Any thing after that
Is replaced by first group ($1), which is anything before first occurrence of &.

With the strpos function find the location of the ampersand. Then with the substr function get the part of the URL until that point.
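A minimal sketch of the strpos/substr approach. The helper name keepFirstParam and the sample URI are made up for illustration; the only assumption is that the first '&' marks the point where the query string should be cut.

```php
<?php
// Hypothetical helper: keep everything before the first '&'.
function keepFirstParam(string $uri): string
{
    $pos = strpos($uri, '&');
    // strpos() returns false when there is no '&', so compare strictly.
    return $pos === false ? $uri : substr($uri, 0, $pos);
}

echo keepFirstParam('/list.php?id=12&sortkey=year&sortval=asc'), "\n"; // /list.php?id=12
```

In the link from the question, the result of keepFirstParam($_SERVER['REQUEST_URI']) would then be echoed (ideally through htmlspecialchars) before the &sortkey=... part is appended.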

strpos will work, but unless you are using RewriteRule, $_SERVER['SCRIPT_NAME'] or $_SERVER['PHP_SELF'] should suffice (and is presumably more efficient).
If the URL IS being rewritten, then $_SERVER['REDIRECT_URL'] is more appropriate.
EDIT: I missed the bit about keeping first part of query string :s

Related

regex to clean up url

I am looking for a way to get a valid url out of a string like:
$string = 'http://somesite.com/directory//sites/9/my_forms/3-895a3e/somefilename.jpg|:||:||:||:|19845';
My original solution was:
preg_match('#^[^:|]*#', str_replace('//', '/', $string), $modifiedPath);
But obviously it's going to remove a slash from the http:// instead of the one in the middle of the string.
My expected output that I want from the original is:
http://somesite.com/directory/sites/9/my_forms/3-895a3e/somefilename.jpg
I could always break off the http part of the string first but would like a more elegant solution in the form of regex if possible. Thanks.
This will do exactly what you are asking:
<?php
$string = 'http://somesite.com/directory//sites/9/my_forms/3-895a3e/somefilename.jpg|:||:||:||:|19845';
preg_match('/^([^|]+)/', $string, $m); // get everything up to and NOT including the first pipe (|)
$string = $m[1];
$string = preg_replace('/(?<!:)\/\//', '/' ,$string); // replace all occurrences of // as long as they are not preceded by :
echo $string; // outputs: http://somesite.com/directory/sites/9/my_forms/3-895a3e/somefilename.jpg
exit;
?>
EDIT:
(?<!X) in regular expressions is the syntax for what is called a lookbehind. The X is replaced with the character(s) we are testing for.
The following expression would match every instance of double slashes (//):
\/\/
But we need to make sure that the match we are looking for is NOT preceded by the : character so we need to 'lookbehind' our match to see if the : character is there. If it is then we don't want it to be counted as a match:
(?<!:)\/\/
The ! is what says NOT to match in our lookbehind. If we changed it to (?<=:)\/\/ (a positive lookbehind) then it would only match the double slashes that did have the : preceding them.
Here is a quick tutorial that can explain it all better than I can: lookahead and lookbehind tutorial
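To see the difference between the two lookbehinds, here is a small sketch on a shortened sample URL (the URL is just an example, and ~ is used as the delimiter so the slashes need no escaping):

```php
<?php
$url = 'http://somesite.com/directory//sites';

// Negative lookbehind: collapse '//' only when it is NOT preceded by ':'.
$collapsed = preg_replace('~(?<!:)//~', '/', $url);
echo $collapsed, "\n"; // http://somesite.com/directory/sites

// Positive lookbehind: match '//' only when it IS preceded by ':'.
$count = preg_match_all('~(?<=:)//~', $url, $m);
echo $count, "\n"; // 1
```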
Assuming all your strings are in the form given, you don't need any but the simplest of regexes to do this; if you want an elegant solution, then a regex is definitely not what you need. Also, double slashes are legal in a URL, just like in a Unix path, and most servers treat them the same as a single slash, so you usually don't need to get rid of them at all.
Why not just
$url = preg_split('/\|/', $string)[0]; // index the result directly; array_shift() expects a variable, not a function result
?
If you really, really care about getting rid of the double slashes in the URL, then you can follow this with
$url = preg_replace('/([^:])\/\//', '$1/', $url);
or even combine them into
$url = preg_replace('/([^:])\/\//', '$1/', preg_split('/\|/', $string)[0]);
although that last form gets a little bit hairy.
Since this is a quite strictly defined situation, I'd consider just one preg to be the most elegant solution.
From the top of my head:
$sanitizedURL = preg_replace('~((?<!:)/(?=/)|\\|.+)~', '', $rawURL);
Basically, what this does is look for any forward slash that IS NOT preceded by a colon (:) and IS followed by another forward slash. It also matches the first pipe character and every character following it.
Anything found is removed from the result.
I can explain the RegEx in more detail if you like.
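As a quick check of the single-preg idea, here is a sketch run against the string from the question. It uses the same two alternatives as above, minus the outer capture group, which isn't needed when the match is simply removed:

```php
<?php
$rawURL = 'http://somesite.com/directory//sites/9/my_forms/3-895a3e/somefilename.jpg|:||:||:||:|19845';

// Alternative 1: a '/' not preceded by ':' and followed by another '/'.
// Alternative 2: the first '|' and everything after it.
$sanitizedURL = preg_replace('~(?<!:)/(?=/)|\|.+~', '', $rawURL);

echo $sanitizedURL, "\n"; // http://somesite.com/directory/sites/9/my_forms/3-895a3e/somefilename.jpg
```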

how do I match a url in php using regex?

I'm trying to match the value of query v in the following regex:
http:\/\/www\.domain\.com\/videos\/video.php\?.*v=([a-z0-9-_]+)
A sample url:
http://www.domain.com/videos/video.php?v=9Gu0sd2dmm91B9b1
The url is always www and I'm only trying to match the v value. Does anyone know what's wrong with my syntax?
Use the parse_url() function. It's way easier to use:
$url_components = parse_url("http://www.domain.com/videos/video.php?v=9Gu0sd2dmm91B9b1");
echo $url_components['query'];
From there I think you can do the rest and slice off the first couple of letters. Once you do that you're left with only the stuff after v=.
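Rather than slicing the string by hand, parse_str() can decode the query for you. A minimal sketch, assuming the sample URL from the question (the variable names are just for illustration):

```php
<?php
$url = 'http://www.domain.com/videos/video.php?v=9Gu0sd2dmm91B9b1';

// Extract just the query string ("v=9Gu0sd2dmm91B9b1").
$query = parse_url($url, PHP_URL_QUERY);

// parse_str() decodes the query string into an associative array.
parse_str((string) $query, $params);

$videoId = $params['v'] ?? null; // null when there is no v parameter
echo $videoId, "\n"; // 9Gu0sd2dmm91B9b1
```

This keeps working when v is not the first parameter in the query string.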
You forgot the capital letters:
http:\/\/www\.domain\.com\/videos\/video.php\?.*v=([a-zA-Z0-9-_]+)
You are not escaping the period '.' in video.php. I also use a different delimiter if I am escaping paths/URL's - like this:
preg_match( "#http://www\.domain\.com/videos/video\.php\?.*v=([^&]*)#", $url, $matches );
If the v= is in the middle of the query string,
v=([^&]*)
... will match everything up to the next & symbol, just in case characters other than letters, digits, _ and - end up in there for some reason.

PHP URL to Link with Regex

I know I've seen this done in a lot of places, but I need something a little different from the norm. Sadly, when I search for this anywhere it gets buried in posts about just turning the link into an HTML link tag. I want the PHP function to strip out the "http://" and "https://" from the link, as well as anything after the TLD (the .* part), so basically what I am looking for is to turn A into B.
A: http://www.youtube.com/watch?v=spsnQWtsUFM
B: www.youtube.com
If it helps, here is my current PHP regex replace function.
ereg_replace("[[:alpha:]]+://[^<>[:space:]]+[[:alnum:]/]", "\\0", htmlspecialchars($body, ENT_QUOTES)));
It would probably also be helpful to say that I have absolutely no understanding in regular expressions. Thanks!
EDIT: When I entered a comment like this: blahblah https://www.facebook.com/?sk=ff&ap=1 blah, I got HTML like this: <a class="bwl" href="blahblah https://www.facebook.com/?sk=ff&ap=1 blah">www.facebook.com</a>, which doesn't work at all because it takes the text around the link with it. It works great if someone only comments a link, however. This was after I changed the function to this:
preg_replace("#^(.*)//(.*)/(.*)$#",'<a class="bwl" href="\0">\2</a>', htmlspecialchars($body, ENT_QUOTES));
This is the simplest and cleanest way:
$str = 'http://www.youtube.com/watch?v=spsnQWtsUFM';
preg_match("#//(.+?)/#", $str, $matches);
$site_url = $matches[1];
EDIT: I assume that the $str had been checked to be a URL in the first place, so I left that out. Also, I assume that all the URLs will contain either 'http://' or 'https://'. In case the url is formatted like this www.youtube.com/watch?v=spsnQWtsUFM or even youtube.com/watch?v=spsnQWtsUFM, the above regexp won't work!
EDIT2: I'm sorry, I didn't realize that you were trying to replace all URLs in a whole text. In that case, this should work the way you want it:
$str = preg_replace('#(\A|[^=\]\'"a-zA-Z0-9])(http[s]?://(.+?)/[^()<>\s]+)#i', '\\1\\3', $str);
I am not a regex whizz either,
^(.*)//(.*)/(.*)$
\2
was what worked for me when I tried to use as find and replace in programmer's notepad.
^(.*)// should extract the protocol, referred to as \1 in the second line.
(.*)/ should extract everything up to the first /, referred to as \2 in the second line.
(.*)$ captures everything up to the end of the string, referred to as \3 in the second line.
Added later
^(.*)( )(.*)//(.*)/(.*)( )(.*)$
\1\2\4 \7
This should be a bit better, but will only replace just 1 URL
The \0 is replaced by the entire matched string, whereas \x (where x is a number other than 0 starting at 1) will be replaced by each subpart of your matched string based on what you wrap in parentheses and the order those groups appear. Your solution is as follows:
ereg_replace("[[:alpha:]]+://([^<>[:space:]]+[:alnum:]*)[[:alnum:]/]", "\\1
I haven't been able to test this though so let me know if it works.
I think this should do it (I haven't tested it):
preg_match('/^http[s]?:\/\/(.+?)\/.*/i', $main_url, $matches);
$final_url = ''.$matches[1].'';
I'm surprised no one remembers PHP's parse_url function:
$url = 'http://www.youtube.com/watch?v=spsnQWtsUFM';
echo parse_url($url, PHP_URL_HOST); // displays "www.youtube.com"
I think you know what to do from there.
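For the case in the question's edit, where the URL sits inside a longer comment, one possible sketch combines parse_url with preg_replace_callback. The function name linkifyHosts and the URL pattern are assumptions made for illustration; the bwl class is taken from the question.

```php
<?php
// Hypothetical helper: turn each http(s) URL in $text into a link labelled with its host.
function linkifyHosts(string $text): string
{
    $escaped = htmlspecialchars($text, ENT_QUOTES);

    return preg_replace_callback(
        '~\bhttps?://[^\s<>]+~i',
        function (array $m): string {
            // The text is already HTML-escaped, so decode before parsing the URL.
            $host = parse_url(html_entity_decode($m[0], ENT_QUOTES), PHP_URL_HOST);
            if (!is_string($host)) {
                return $m[0]; // leave anything parse_url() cannot handle untouched
            }
            return '<a class="bwl" href="' . $m[0] . '">'
                . htmlspecialchars($host, ENT_QUOTES) . '</a>';
        },
        $escaped
    );
}

echo linkifyHosts('blahblah https://www.facebook.com/?sk=ff&ap=1 blah'), "\n";
```

Only the URL itself ends up in the href, so the surrounding text is left alone.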
$result = preg_replace('%(http[s]?://)(\S+)%', '\2', $subject);
The code with regex does not work completely.
I made this code. It is much more comprehensive, but it works:
See the result here: http://cht.dk/data/php-scripts/inc_functions_links.php
See the source code here: http://cht.dk/data/php-scripts/inc_functions_links.txt

PHP if string contains URL isolate it

In PHP, I need to be able to figure out if a string contains a URL. If there is a URL, I need to isolate it as another separate string.
For example: "SESAC showin the Love! http://twitpic.com/1uk7fi"
I need to be able to isolate the URL in that string into a new string. At the same time the URL needs to be kept intact in the original string. Follow?
I know this is probably really simple but it's killing me.
Something like
preg_match('/[a-zA-Z]+:\/\/[0-9a-zA-Z;.\/?:#=_#&%~,+$]+/', $string, $matches);
$matches[0] will hold the result.
(Note: this regex is certainly not RFC compliant; it may fetch malformed (per the spec) URLs. See http://www.faqs.org/rfcs/rfc1738.html).
This doesn't account for dashes (-), so I needed to add \- to the character class:
preg_match('/[a-zA-Z]+:\/\/[0-9a-zA-Z;.\/\-?:#=_#&%~,+$]+/', $_POST['string'], $matches);
URLs can't contain spaces, so...
\b(?:https?|ftp)://\S+
Should match any URL-like thing in a string.
The above is the pure regex. PHP preg_* and string escaping rules apply before you can use it.
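In PHP that could look like the sketch below, assuming the sample string from the question. Using ~ as the delimiter means the slashes in :// need no escaping, and single quotes keep the backslashes as they are:

```php
<?php
$test = 'SESAC showin the Love! http://twitpic.com/1uk7fi';

// preg_match() only reads $test, so the original string stays intact.
if (preg_match('~\b(?:https?|ftp)://\S+~', $test, $matches)) {
    $url = $matches[0];
    echo $url, "\n"; // http://twitpic.com/1uk7fi
}
```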
$test = "SESAC showin the Love! http://twitpic.com/1uk7fi";
$myURL= strstr ($test, "http");
echo $myURL; // prints http://twitpic.com/1uk7fi

How can I match the domain part of a URL in PHP?

I'm so bad at regexp, but I'm trying to get some/path/image.jpg out of http://somepage.com/some/...etc and trying this method:
function removeDomain($string) {
return preg_replace("/http:\/\/.*\//", "", $string);
}
It isn't working -- so far as I can tell it's just returning a blank string. How do I write this regexp?
You should use parse_url.
You might want to use this rather than regex:
http://cz2.php.net/manual/en/function.parse-url.php
This will break up the URL for you, so you just read the part you need from the resulting array.
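A minimal sketch of removeDomain() built on parse_url instead of a regex, assuming the goal is the path without its leading slash, as in the question:

```php
<?php
// Hypothetical rewrite of removeDomain() on top of parse_url().
function removeDomain(string $url): string
{
    $path = parse_url($url, PHP_URL_PATH);
    // parse_url() returns null when there is no path and false on malformed URLs.
    return is_string($path) ? ltrim($path, '/') : '';
}

echo removeDomain('http://somepage.com/some/path/image.jpg'), "\n"; // some/path/image.jpg
```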
Use parse_url as other people have already said.
But to answer your question about why your regex isn't working: .* is greedy, so it matches as much as it can, up to the last '/' in the URL, and that whole match is replaced with an empty string; hence your results. Try the following instead, which only matches the scheme and hostname (anything up to the first '/' after the scheme):
function removeDomain($string) {
return preg_replace("#^https?://[^/]+/#", "", $string);
}
While SilentGhost is right, the reason your regex is failing is because .* is greedy, and will eat everything, as long as there is a / afterwards.
If you put a ? after your .*, it becomes lazy and will only match up to the first / after http://
function removeDomain($string) {
return preg_replace("/http:\/\/.*?\//", "", $string);
}
