Extract url from string (no regex) - php

I have a string which contains a URL. I am trying to extract the URL from the surrounding text in the most efficient way. So far I have been using explode, but I have to explode twice and then rebuild the URL. Regex is not something I have mastered yet, so I ruled it out (unless it is the best solution). Is there a way to extract the URL in one step?
$url = "/url?q=http://www.somesite.com/sites/pages/page?id=1545778&sa=U&ei=EhHLVL_yJcb-yQSZ7oDgAg&ved=0CBMQFjAA&usg";
$strip1 = explode( '&', $url );
$strip2 = explode('=', $strip1[0]);
$result = $strip2[1].'='.$strip2[2];
result:
http://www.somesite.com/sites/pages/page?id=1545778

Try it like this, using preg_split():
$date = "/url?q=http://www.somesite.com/sites/pages/page?id=1545778&sa=U&ei=EhHLVL_yJcb-yQSZ7oDgAg&ved=0CBMQFjAA&usg";
$t =preg_split("/[=&]/", $date);
echo $t[1]."=".$t[2]; //output: http://www.somesite.com/sites/pages/page?id=1545778

Strip the prefix first:
$strip1 = explode( '/url?q=', $url );
Then use this regex on $strip1[1]:
^((http):\/)?\/?([^:\/\s]+)((\/\w+)*\/)([\w\-\.]+[^#?\s]+)(.*)?(#[\w\-]+)?$
You will get an array of the sections of the URL.

Ugly one-step regex-free solution.
$url = "/url?q=http://www.somesite.com/sites/pages/page?id=1545778&sa=U&ei=EhHLVL_yJcb-yQSZ7oDgAg&ved=0CBMQFjAA&usg";
$result = substr( $url, strpos( $url, '=' ) + 1, strpos( $url, '&' ) - strpos( $url, '=' ) - 1 );
echo $result;
And cleaner two-step variation.
$url = "/url?q=http://www.somesite.com/sites/pages/page?id=1545778&sa=U&ei=EhHLVL_yJcb-yQSZ7oDgAg&ved=0CBMQFjAA&usg";
$start = strpos( $url, '=' ) + 1;
$result = substr( $url, $start, strpos( $url, '&' ) - $start );
echo $result;
Somewhat less-ugly regex solution.
$url = "/url?q=http://www.somesite.com/sites/pages/page?id=1545778&sa=U&ei=EhHLVL_yJcb-yQSZ7oDgAg&ved=0CBMQFjAA&usg";
$result = preg_replace( '/[^=]*=([^&]*).*/', '${1}', $url );
echo $result;
All three produce the following output.
http://www.somesite.com/sites/pages/page?id=1545778

Technically, that second ? in the URL should be URL encoded, but we can get around that. Use parse_url to get the query, then replace ? with a URL encoded version using str_replace. After this, you will have a valid query that you can parse using parse_str.
$query = parse_url($url, PHP_URL_QUERY);
$query = str_replace("?", urlencode("?"), $query);
parse_str($query, $params);
echo $params['q'];
// displays http://www.somesite.com/sites/pages/page?id=1545778
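If you need this in more than one place, the same steps can be wrapped in a small function. This is only a sketch: the name extract_query_param is made up for illustration, and it returns null when the parameter is missing or the URL has no query string.

```php
<?php
// Hypothetical helper (the name is not from the answer above):
// returns the named query parameter, or null if it is absent.
function extract_query_param(string $url, string $name): ?string
{
    $query = parse_url($url, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null; // no query string, or the URL could not be parsed
    }
    // Encode the stray "?" before handing the query to parse_str().
    $query = str_replace('?', urlencode('?'), $query);
    parse_str($query, $params);
    $value = $params[$name] ?? null;
    return is_string($value) ? $value : null;
}

$url = "/url?q=http://www.somesite.com/sites/pages/page?id=1545778&sa=U&ei=EhHLVL_yJcb-yQSZ7oDgAg&ved=0CBMQFjAA&usg";
echo extract_query_param($url, 'q'), "\n";
```

parse_str() decodes the value again, so the "?" comes back unescaped in the result.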

$url = "/url?q=http://www.somesite.com/sites/pages/page?id=1545778&sa=U&ei=EhHLVL_yJcb-yQSZ7oDgAg&ved=0CBMQFjAA&usg";
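// end() takes its argument by reference, so the first explode() result goes into a variable
$parts = explode('=', $url, 2);
$strip3 = current(explode('&', $parts[1]));
print_r ($strip3); //output http://www.somesite.com/sites/pages/page?id=1545778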

Related

How to strip text from string if contained

I have a string $current_url that can contain 2 different values:
http://url.com/index.php&lang=en
or
http://url.com/index.php&lang=jp
in both cases I need to strip the query part so I get: http://url.com/index.php
How can I do this in php?
Thank you.
Simplest Solution
$url = 'http://url.com/index.php&lang=en';
$array = explode('&', $url);
echo $new_url =$array[0];
To only remove the lang query do this
$url = 'http://url.com/index.php&lang=en&id=1';
$array = explode('&lang=en', $url);
echo $new_url = $array[0] .''.$array[1];
//output http://url.com/index.php&id=1
This way it only removes the lang parameter and keeps the other parameters.
If the value of your lang parameter is always of length 2, which should be the case for languages, you could use:
if(strpos($current_url, '&lang=') !== false){
$current_url = str_replace(substr($current_url, strpos($current_url, '&lang='), 8), '', $current_url);
}
If the substring "&lang=" is present in $current_url, it removes a substring of length 8, starting at the "&lang=" position. So it basically removes "&lang=" plus the 2 following chars.
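If you cannot rely on the value being exactly 2 characters long (a value like "pt-BR" is my own example, not from the question), a variation of the same strpos/substr idea is to cut up to the next "&" instead of a fixed 8 characters. A rough sketch, with a made-up function name:

```php
<?php
// Hypothetical helper: removes "&lang=<value>" whatever the length of <value>.
function remove_lang_param(string $url): string
{
    $start = strpos($url, '&lang=');
    if ($start === false) {
        return $url; // nothing to remove
    }
    // The value ends at the next "&", or at the end of the string.
    $end = strpos($url, '&', $start + 1);
    if ($end === false) {
        return substr($url, 0, $start);
    }
    return substr($url, 0, $start) . substr($url, $end);
}

echo remove_lang_param('http://url.com/index.php&lang=jp'), "\n";
```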
You can use strtok to remove the query string from the URL.
<?php
echo $url=strtok('http://url.com/index.php&lang=jp','&');
?>
Answer based on comment.
You can use preg_replace
https://www.codexworld.com/how-to/remove-specific-parameter-from-url-query-string-php/
<?php
$url = 'http://url.com/index.php?page=site&lang=jp';
function remove_query_string($url_name, $key) {
$url = preg_replace('/(?:&|(\?))' . $key . '=[^&]*(?(1)&|)?/i', "$1", $url_name);
$url = rtrim($url, '?');
$url = rtrim($url, '&');
return $url;
}
echo remove_query_string($url, 'lang');
?>

preg_replace for Get Parameter - /index.php?color=blue&size=xl

I have this url /index.php?color=blue&size=xl
to get rid of the get parameter, I use this code:
$done = preg_replace('/(.*)(\?|&)color=[^&]*(?(1)&|)?/i', "$1", $url);
echo $done;
"output: index.phpsize=xl"
Now I need to clean the "size" part too. I have tried two preg_replace calls, but it doesn't work.
$done = preg_replace('/(.*)(\?|&)color=[^&]*(?(1)&|)?/i', "$1", $url);
echo $done;
$done2 = preg_replace('/(.*)(\?|&)size=[^&]*(?(1)&|)?/i', "$1", $done);
Edit: I really need a solution where I can clean the exact parameter "color" or "size".
Sometimes I will only delete one of them.
Edit2:
Have this solution:
// Url is: index.php?color=black&size=xl&price=20
function removeqsvar($url, $varname) {
return preg_replace('/([?&])'.$varname.'=[^&]+(&|$)/','$1',$url);
}
$url = removeqsvar($url, color);
echo removeqsvar($url, price);
// will output: index.php?size=xl
Thank you all.
This will allow you to exactly specify which parameters to remove using the $remove array. It works by parsing the URL with parse_url(), then grabbing the query string and parsing it with parse_str().
From there, it's straightforward - Iterate over the parameters in the URL, if one of them is in the $remove array, then delete it from the $params array. By the end, if we have parameters to add to the URL, we add them back with http_build_query().
$url = '/index.php?color=blue&size=xl'; // Your input URL
$remove = array( 'color', 'size'); // Change this to remove what you want
$parts = parse_url( $url);
parse_str( $parts['query'], $params);
foreach( $params as $k => $v) {
if( in_array( $k, $remove)) {
unset( $params[$k]);
}
}
$url = $parts['path'] . ((count( $params) > 0) ? '?' . http_build_query( $params) : '');
echo $url;
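The same approach can be packed into a function if you need to call it with different parameter lists. This is just a sketch under the same assumption as above (a relative URL made of a path and a query string); remove_query_params is a name I made up, and array_diff_key() replaces the foreach loop.

```php
<?php
// Hypothetical wrapper around the parse_url()/parse_str() approach.
// Assumes a relative URL (path plus query), like the one in the question.
function remove_query_params(string $url, array $remove): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['query'])) {
        return $url; // nothing to strip
    }
    parse_str($parts['query'], $params);
    // Drop every parameter whose name is listed in $remove.
    $params = array_diff_key($params, array_flip($remove));
    $path = $parts['path'] ?? '';
    return $params ? $path . '?' . http_build_query($params) : $path;
}

echo remove_query_params('/index.php?color=blue&size=xl', array('color')), "\n";
```

Passing array('color', 'size') removes both parameters and leaves only the path.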
list($done) = explode("?", $url);
This syntax also works in PHP 5.3 and lower.
try this:
$result = explode('?', $url)[0];
for a php version lower than php 5.4:
$tmp = explode('?', $url);
$result = $tmp[0];

How can I do URL replacing in PHP

Can someone help me with replacing this url
http://www.izlesene.com/video/arabic-shkira-belly-dance/200012
to: 200012
$url = 'http://www.izlesene.com/video/arabic-shkira-belly-dance/200012';
$out = preg_replace("/[^0-9]/i","",$url);
or
preg_match("/\/([0-9]+)/i",$url,$m);
$out = $m[1];
use the basename
$video_id = basename($url);
var_dump($video_id);
or
try to explode and get the last item.
$url = "http://www.izlesene.com/video/arabic-shkira-belly-dance/200012";
$segments = explode("/",$url);
$video_id = end($segments);
var_dump($video_id);
If things work like in JS you can do it like so:
$url = "http://www.izlesene.com/video/arabic-shkira-belly-dance/200012";
$a = explode("/", $url);
$number = array_pop($a);
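And maybe like this (array_pop() takes its argument by reference, so PHP raises an "Only variables should be passed by reference" notice here):
$url = "http://www.izlesene.com/video/arabic-shkira-belly-dance/200012";
$number = array_pop( explode("/", $url) );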
Or you can do it faster without arrays and regex:
$url = "http://www.izlesene.com/video/arabic-shkira-belly-dance/200012";
$url = substr($url, strrpos($url, "/") + 1);
strrpos searches the position of the last occurrence of / in a string and substr returns the portion of a string specified by this position.
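One thing to watch out for: strrpos returns false when there is no / at all, and a trailing / would give you an empty result. A small sketch that covers both cases (the function name last_segment is just for illustration):

```php
<?php
// Hypothetical helper: returns whatever follows the last "/" of $url.
function last_segment(string $url): string
{
    $url = rtrim($url, '/'); // ignore trailing slashes
    $pos = strrpos($url, '/');
    // Without any "/" the whole string is the last segment.
    return $pos === false ? $url : substr($url, $pos + 1);
}

echo last_segment('http://www.izlesene.com/video/arabic-shkira-belly-dance/200012'), "\n";
```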
Try:
$number = basename($url);
Hint: Regular expressions are not always the best solution (they are quite powerful, but also complicated). See the basename docs.
If you really need a regex, start at the end:
$url = "http://www.izlesene.com/video/arabic-shkira-belly-dance/200012";
$number = preg_replace('~.*/([^/]+)$~', '$1', $url);
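Coming back to the basename() suggestion: if the URL might carry a query string (an assumption on my side, the URL in the question has none), basename() alone would return it as part of the result. A sketch that takes the path with parse_url() first; the function name video_id is made up:

```php
<?php
// Hypothetical helper: basename() applied to the path component only,
// so a query string or fragment does not end up in the result.
function video_id(string $url): string
{
    $path = parse_url($url, PHP_URL_PATH);
    return is_string($path) ? basename($path) : '';
}

echo video_id('http://www.izlesene.com/video/arabic-shkira-belly-dance/200012'), "\n";
```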

Get only a part of a string using PHP

I use the following code
foreach ($twitter_xml2->channel->item as $key) {
$author = $key->{"guid"};
echo"<li><h5>$author</h5></li>";
}
and it gets me http://twitter.com/USERNAME/statuses/167382363782206976
My question is how do I get only the username ?
Username may be anything
$url = "http://twitter.com/USERNAME/statuses/167382363782206976";
preg_match("#http://twitter.com/([^\/]+)/statuses/.*#", $url, $matches);
var_dump($matches[1]);
You can use either this regexp (for preg_match): ~twitter\.com/([^/]+)/~:
$match = array();
preg_match( '~twitter\.com/([^/]+)/~', $url, $match);
echo $match[1]; // list(,$userName) = $match;
Or the more efficient strpos and substr:
$start = strpos( $url, '/', 10) + 1; // offset 10 lies inside the host name, so this is the first character after "twitter.com/"
$end = strpos( $url, '/', $start); // the slash that ends the username
// Check both indexes (strpos returns false when nothing is found)
$username = substr( $url, $start, $end-$start);
foreach ($twitter_xml2->channel->item as $key) {
$author = $key->{"guid"};
list(,,,$username) = explode('/', $author);
echo"<li><h5>$username</h5></li>";
}
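If you would rather not count slashes or characters, another option is to take the path with parse_url() and read its first segment. This is a sketch, assuming the guid always looks like the URL in the question; twitter_username is a made-up name.

```php
<?php
// Hypothetical helper: first path segment of a status URL, or null if there is none.
function twitter_username(string $statusUrl): ?string
{
    $path = parse_url($statusUrl, PHP_URL_PATH);
    if (!is_string($path)) {
        return null; // no path component
    }
    $segments = explode('/', trim($path, '/'));
    return $segments[0] !== '' ? $segments[0] : null;
}

echo twitter_username('http://twitter.com/USERNAME/statuses/167382363782206976'), "\n";
```

Inside the foreach loop you would cast the guid to a string first, e.g. twitter_username((string) $key->{"guid"}).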

How to escape url for fopen

It looks like fopen can't open files with spaces.
For example:
$url = 'http://gatewaypeople.com/images/articles/cntrbutnssttmnts12_main 616x200.jpg';
fopen($url, 'r');
returns false (mind the space in the url), but file is accessible by browsers.
I've also tried to escape the url by urlencode and rawurlencode with no luck. How to properly escape the spaces?
You can use this code:
$arr = parse_url ( 'http://gatewaypeople.com/images/articles/cntrbutnssttmnts12_main 616x200.jpg' );
$parts = explode ( '/', $arr['path'] );
$fname = $parts[count($parts)-1];
unset($parts[count($parts)-1]);
$url = $arr['scheme'] . '://' . $arr['host'] . join('/', $parts) . '/' . urlencode ( $fname );
var_dump( $url );
Alternative & Shorter Answer (Thanks to #Dziamid)
$url = 'http://gatewaypeople.com/images/articles/cntrbutnssttmnts12_main 616x200.jpg';
$parts = pathinfo($url);
$url = $parts['dirname'] . '/' . urlencode($parts['basename']);
var_dump( $url );
OUTPUT:
string(76) "http://gatewaypeople.com/images/articles/cntrbutnssttmnts12_main+616x200.jpg"
rawurlencode is the way to go, but do not escape the full URL. Only escape the filename. So you will end up with http://gatewaypeople.com/images/articles/cntrbutnssttmnts12_main%20616x200.jpg
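A minimal sketch of that idea, assuming the URL has no query string or fragment and that only the part after the last / needs escaping (the function name is made up):

```php
<?php
// Hypothetical helper: rawurlencode() only the filename, leave the rest untouched.
function encode_filename_in_url(string $url): string
{
    $pos = strrpos($url, '/');
    if ($pos === false) {
        return $url; // no "/" found, nothing to split
    }
    return substr($url, 0, $pos + 1) . rawurlencode(substr($url, $pos + 1));
}

$url = 'http://gatewaypeople.com/images/articles/cntrbutnssttmnts12_main 616x200.jpg';
echo encode_filename_in_url($url), "\n";
```

The result can then be passed to fopen() in place of the original URL.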
All solutions proposed here are wrong because they don't escape the query string part and the base directory part. Additionally, they don't take the user, pass and fragment URL parts into consideration.
To correctly escape a valid URL you have to separately escape the path parts and the query parts.
So the solution is to extract the url parts, escape each part and rebuild the url.
Here is a simple code snippet:
function safeUrlEncode( $url ) {
$urlParts = parse_url($url);
// parse_url() only sets the keys that are present in the URL
if( isset( $urlParts['path'] ) ) {
$urlParts['path'] = safeUrlEncodePath( $urlParts['path'] );
}
if( isset( $urlParts['query'] ) ) {
$urlParts['query'] = safeUrlEncodeQuery( $urlParts['query'] );
}
return http_build_url($urlParts);
}
function safeUrlEncodePath( $path ) {
if( strlen( $path ) == 0 || strpos($path, "/") === false ){
return "";
}
// Encode each segment on its own so the "/" separators survive
$pathParts = array_map( 'rawurlencode', explode( "/" , $path ) );
return implode( "/", $pathParts );
}
function safeUrlEncodeQuery( $query ) {
$queryParts = array();
parse_str($query, $queryParts);
// http_build_query() already URL-encodes keys and values, including nested arrays,
// so encoding them beforehand would double-encode them
return http_build_query( $queryParts );
}
Usage would simply be:
$encodedUrl = safeUrlEncode( $originalUrl );
Side note
In my code snippet I'm making use of http://php.net/manual/it/function.http-build-url.php which is available as a PECL extension. If you don't have the PECL extension on your server you can simply include the pure PHP implementation: http://fuelforthefire.ca/free/php/http_build_url/
Cheers :)
$url = 'http://gatewaypeople.com/images/articles/cntrbutnssttmnts12_main 616x200.jpg';
fopen(urlencode($url), 'r');
