I'm extracting the domains from the URLs in my content with PHP's parse_url.
Then I have an array of domains, which I compare with the extracted domain to see whether they match.
$url = parse_url($tag->getAttribute('href'));
if (in_array($url['host'], $affi_urls) || $url['host'] == "www.example.com"){
$tag->setAttribute('href', '/redirect.php?url='.$href);
}
This works fine if $url['host'] contains the domain. If the href is a relative path, $url['host'] is not set and the result is a mess:
/redirect.php?url=/example/test
How can I avoid this case?
You need to save the hostname of the page that you're processing. If $url['host'] is empty, use that hostname in its place.
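A minimal sketch of that fallback, assuming the host of the page being processed is known. The helper name effective_host and the host mysite.test are hypothetical, chosen only for illustration:

```php
<?php
// Hypothetical helper: returns the host of $href, or $pageHost when
// $href is relative and therefore carries no host of its own.
function effective_host(string $href, string $pageHost): string
{
    // parse_url() returns null when the requested component is absent
    // and false when the URL is seriously malformed.
    $host = parse_url($href, PHP_URL_HOST);

    return (is_string($host) && $host !== '') ? $host : $pageHost;
}

$absolute = effective_host('http://www.example.com/page', 'mysite.test');
$relative = effective_host('/example/test', 'mysite.test');
```

The returned host can then be passed to in_array() in place of $url['host'].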
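You should encode the URL parameter:
$tag->setAttribute('href', '/redirect.php?url='.urlencode($href));
Then read the value back in redirect.php. PHP already decodes $_GET values, so an extra urldecode is only needed if you take the value from the raw query string yourself (for example, via parse_url).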
This question already has answers here:
How to remove the querystring and get only the URL?
(16 answers)
Closed 3 years ago.
Is there a simple way to get the requested file or directory without the GET arguments? For example, if the URL is http://example.com/directory/file.php?paramater=value I would like to return just http://example.com/directory/file.php. I was surprised that there is not a simple index in $_SERVER[]. Did I miss one?
Edit: @T.Todua provided a newer answer to this question using parse_url.
(please upvote that answer so it can be more visible).
Edit2: Someone has been spamming and editing about extracting scheme, so I've added that at the bottom.
parse_url solution
The simplest solution would be:
echo parse_url($_SERVER["REQUEST_URI"], PHP_URL_PATH);
parse_url is a built-in PHP function whose sole purpose is to extract specific components from a URL, including the path (everything before the first ?). As such, it is my new "best" solution to this problem.
strtok solution
Stackoverflow: How to remove the querystring and get only the url?
You can use strtok to get the string before the first occurrence of ?
$url=strtok($_SERVER["REQUEST_URI"],'?');
Performance Note:
This problem can also be solved using explode.
explode tends to perform better when splitting the string on a single delimiter only.
strtok tends to perform better when multiple delimiters are involved.
Using strtok to return everything in a string before the first instance of a character tends to perform better than the other approaches, though it will leave the query string in memory.
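As a rough illustration, the three approaches mentioned so far can be compared side by side. The sample request URI is an assumption standing in for $_SERVER["REQUEST_URI"]:

```php
<?php
// Sample value standing in for $_SERVER["REQUEST_URI"] (assumed).
$requestUri = '/directory/file.php?paramater=value';

// 1. parse_url: extract only the path component.
$viaParseUrl = parse_url($requestUri, PHP_URL_PATH);

// 2. strtok: take everything before the first '?'.
$viaStrtok = strtok($requestUri, '?');

// 3. explode: split once on '?' and keep the first piece.
$viaExplode = explode('?', $requestUri, 2)[0];
```

For a typical request URI like this one, all three produce the same path. They can differ on unusual input, so it is worth testing against the URLs you actually expect.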
An aside about Scheme (http/https) and $_SERVER vars
While OP did not ask about it, I suppose it is worth mentioning:
parse_url should be used to extract any specific component from the url, please see the documentation for that function:
parse_url($actual_link, PHP_URL_SCHEME);
Of note here, is that getting the full URL from a request is not a trivial task, and has many security implications. $_SERVER variables are your friend here, but they're a fickle friend, as apache/nginx configs, php environments, and even clients, can omit or alter these variables. All of this is well out of scope for this question, but it has been thoroughly discussed:
https://stackoverflow.com/a/6768831/1589379
It is important to note that these $_SERVER variables are populated at runtime, by whichever engine is doing the execution (/var/run/php/ or /etc/php/[version]/fpm/). These variables are passed from the OS, to the webserver (apache/nginx) to the php engine, and are modified and amended at each step. The only such variables that can be relied on are REQUEST_URI (because it's required by php), and those listed in RFC 3875 (see: PHP: $_SERVER ) because they are required of webservers.
Please note: spamming links to your answers across other questions is not in good taste.
You can use $_SERVER['REQUEST_URI'] to get requested path. Then, you'll need to remove the parameters...
$uri_parts = explode('?', $_SERVER['REQUEST_URI'], 2);
Then, add in the hostname and protocol.
echo 'http://' . $_SERVER['HTTP_HOST'] . $uri_parts[0];
You'll have to detect the protocol as well, if you mix http:// and https://. That I leave as an exercise for you. $_SERVER['REQUEST_SCHEME'] returns the protocol.
Putting it all together:
echo $_SERVER['REQUEST_SCHEME'] .'://'. $_SERVER['HTTP_HOST']
. explode('?', $_SERVER['REQUEST_URI'], 2)[0];
...returns, for example:
http://example.com/directory/file.php
php.net Documentation:
$_SERVER — Server and execution environment information
explode — Split a string by a string
parse_url — Parse a URL and return its components (possibly a better solution)
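The pieces above could be combined into one small function. This is only a sketch: the name url_without_query is hypothetical, it takes the server array as a parameter so it can be tried without a web server, and it assumes HTTPS is signalled through the HTTPS key, which depends on the server setup:

```php
<?php
// Hypothetical helper: rebuilds scheme://host/path from a $_SERVER-like
// array, dropping the query string.
function url_without_query(array $server): string
{
    // Assumption: the web server sets HTTPS to a non-empty value other
    // than "off" for secure requests.
    $https  = !empty($server['HTTPS']) && $server['HTTPS'] !== 'off';
    $scheme = $https ? 'https' : 'http';

    // HTTP_HOST comes from the client; fall back to SERVER_NAME.
    $host = $server['HTTP_HOST'] ?? $server['SERVER_NAME'] ?? 'localhost';

    // Keep everything before the first '?'.
    $path = explode('?', $server['REQUEST_URI'] ?? '/', 2)[0];

    return $scheme . '://' . $host . $path;
}

$example = url_without_query([
    'HTTP_HOST'   => 'example.com',
    'REQUEST_URI' => '/directory/file.php?paramater=value',
]);
```

In a real request you would call url_without_query($_SERVER). Since HTTP_HOST is supplied by the client, treat the result as untrusted input.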
Solution:
echo parse_url($_SERVER["REQUEST_URI"], PHP_URL_PATH);
Here is a solution that takes into account different ports and https:
$pageURL = (@$_SERVER['HTTPS'] == 'on') ? 'https://' : 'http://';
if ($_SERVER['SERVER_PORT'] != '80')
    $pageURL .= $_SERVER['SERVER_NAME'].':'.$_SERVER['SERVER_PORT'].$_SERVER['PHP_SELF'];
else
    $pageURL .= $_SERVER['SERVER_NAME'].$_SERVER['PHP_SELF'];
Or a more basic solution that does not take other ports into account:
$pageURL = (@$_SERVER['HTTPS'] == 'on') ? 'https://' : 'http://';
$pageURL .= $_SERVER['SERVER_NAME'].$_SERVER['PHP_SELF'];
I actually don't think that's a good way to parse it. It's not clean, or it's a bit off topic...
Explode is heavy
Session is heavy
PHP_SELF doesn't handle URL rewriting
I'd do something like...
if (($pos_get = strpos($app_uri, '?')) !== false) $app_uri = substr($app_uri, 0, $pos_get);
This detects whether there's an actual '?' (the standard GET format).
If there is, it cuts our variable just before the '?', which is reserved for passing data.
Here $app_uri is the URI/URL of my website.
$uri_parts = explode('?', $_SERVER['REQUEST_URI'], 2);
$request_uri = $uri_parts[0];
echo $request_uri;
You can use $_GET for URL params or $_POST for POST params, while $_REQUEST contains the parameters from $_GET, $_POST and $_COOKIE. If you want to hide the URI parameter from the user, you can convert it to a session variable like so:
<?php
session_start();
if (isset($_REQUEST['param']) && !isset($_SESSION['param'])) {
// Store all parameters received
$_SESSION['param'] = $_REQUEST['param'];
// Redirect without URI parameters
header('Location: /file.php');
exit;
}
?>
<html>
<body>
<?php
echo $_SESSION['param'];
?>
</body>
</html>
EDIT
Use $_SERVER['PHP_SELF'] to get the current file name or $_SERVER['REQUEST_URI'] to get the requested URI.
Not everyone will find it simple, but I believe this to be the best way to go around it:
preg_match('/^[^\?]+/', $_SERVER['REQUEST_URI'], $return);
$url = 'http' . ('on' === $_SERVER['HTTPS'] ? 's' : '') . '://' . $_SERVER['HTTP_HOST'] . $return[0];
What it does is simply go through the REQUEST_URI from the beginning of the string, then stop when it hits a "?" (which should really only happen when you get to the parameters).
Then you create the URL and save it to $url:
When creating $url, we write "http", check whether HTTPS is being used and, if it is, append "s". Then we concatenate "://", the HTTP_HOST (the server, e.g. "stackoverflow.com"), and $return[0], which we found before. ($return is an array, but we only want its first element. There can only ever be one, since the regex matches from the beginning of the string.)
I hope someone can use this...
PS. This has been confirmed to work while using SLIM to reroute the URL.
I know this is an old post but I am having the same problem and I solved it this way
$current_request = preg_replace("/\?.*$/","",$_SERVER["REQUEST_URI"]);
Or equivalently
$current_request = preg_replace("/\?.*/D","",$_SERVER["REQUEST_URI"]);
It's shocking how many of these upvoted/accepted answers are incomplete, so they don't answer the OP's question, after 7 years!
If you are on a page with URL like: http://example.com/directory/file.php?paramater=value
...and you would like to return just: http://example.com/directory/file.php
then use:
echo $_SERVER['REQUEST_SCHEME'].'://'.$_SERVER['SERVER_NAME'].$_SERVER['PHP_SELF'];
Why so complicated? =)
$baseurl = 'http://mysite.com';
$url_without_get = $baseurl.$_SERVER['PHP_SELF'];
this should really do it man ;)
I had the same problem when I wanted a link back to homepage. I tried this and it worked:
<a href="<?php echo $_SERVER['PHP_SELF']; ?>?">
Note the question mark at the end. I believe that tells the machine to stop thinking on behalf of the coder :)
I am using a script to check links on a given page. I am using Simple HTML DOM to parse the information into an array. I have to check the href of every a tag to find out whether it contains a file, or something like # or JavaScript.
I tried the following without success:
if(preg_match("|^(.*)|iU", $href)){
save_link();
}
I don't know if my pattern is wrong or if there is a better way to do this.
I want to be able to detect whether $href contains an extension such as .com or .php, or another file extension. That way it will filter out items like #, "function()" and other non-link values used in the href attribute.
EDIT:
parse_url will not work, so please stop posting it. The value # is returned as a valid URL. As I stated above, I am trying to look for any string followed by a dot, with no more than 4 characters following the dot.
I believe that the function you're looking for is parse_url().
This function will take a URL string, and return an array of components, which will allow you to work out what kind of URL it is.
However note that it has issues with incomplete URLs in PHP versions prior to 5.4.7, so you need to have the very latest PHP to get the best out of it.
Hope that helps.
See http://php.net/manual/en/function.parse-url.php
I'm assuming you don't want to match fragments (#) because you are not concerned with following internal anchors.
parse_url breaks up the different parts of the url into an array. You can see the path component of the URL in this array and run your check against that.
You can use parse_url(), like this:
$scheme = parse_url($href, PHP_URL_SCHEME);
if ($scheme === 'http' || $scheme === 'https') {
    // valid URL
    save_link();
}
UPDATE:
I've added code to filter only http and https urls, thanks to Baba for spotting this.
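For the check the question actually describes (a dot followed by at most four characters), a regex-based heuristic could look like the sketch below. The function name href_has_extension is hypothetical, and the pattern is an assumption about what should count as an extension, not a full URL validator:

```php
<?php
// Hypothetical helper: true when $href contains a dot followed by 1-4
// alphanumeric characters that end the string or precede '/', '?' or '#'.
function href_has_extension(string $href): bool
{
    $href = trim($href);

    // Pure fragment links such as "#" or "#top" are never file links.
    if ($href === '' || $href[0] === '#') {
        return false;
    }

    return preg_match('/\.[A-Za-z0-9]{1,4}(?:[\/?#]|$)/', $href) === 1;
}
```

This accepts values like http://example.com or page.php?x=1 and rejects #, function() and javascript:void(0). It would also accept something like mailto:a@b.com, so combining it with the scheme check above may be worthwhile.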
I'm sending a PHP script multiple URLs (about 15) at once, each containing about 5 URL variables. In my script, I split the chunk of URLs into individual ones on two backslashes (which I add before sending the chunk to the script), and then curl each individual URL. However, when I run my script, it only accepts the URL up to the "&" symbol. I'd like to receive the entire chunk, so that I can split it up later in my script. What might be the best way to approach this issue?
Thanks.
An example of what happens when i send my script a url chunk:
<?php
/*
$url variable being sent to script:
http://www.test1.com?q1=a&q2=b&q3=c&q4=d\\http://www.test2.com?r1=a&r2=b&r3=c&r4=d\\http://www.test3.com?q1=a&q2=b&q3=c&q4=d\\http://www.test4.com?e1=a&e2=b&e3=c&e4=d
*/
$url = $_GET['url'];
echo $url; // returns http://www.test1.com?q1=a
//later on in my script, I just need to curl each "\\" separated url
?>
You need to urlencode() the (data) URLs before appending them to your script's request.
Otherwise, PHP is going to see ?listOfUrls=http://someurl.com/?someVar=SomeVal& and stop right there, due to the literal "&".
If you're building the query string in PHP you could try something like:
<?php
// imagine $urls is an array of URLs
$qs = '?urls=';
foreach ($urls as $u) {
    // append to $qs; note that '\\' in single quotes is one literal backslash
    $qs .= urlencode($u) . '\\';
}
I also suspect you can play with [] notation in the url so that on the other side of the GET, you get a nice clean array of URLs back, instead of having to parse on some delimiter like "\"
Since you didn't url encode your url param, everything after the first & is treated as the param to the original url.
The $_GET array is formed by splitting on ampersands. URL-encode the URLs before passing them as parameters. PHP should decode them for you.
Example: pass url=http://www.test1.com?q1=a%26q2=b%26q3=c%26q4=d\\http://www.test2.com?r1=a%26r2=b%26r3=c%26r4=d\\http://www.test3.com?q1=a%26q2=b%26q3=c%26q4=d\\http://www.test4.com?e1=a%26e2=b%26e3=c%26e4=d
You can do away with the '\\' by turning the parameter into an array. Example: use url[]=http://www.test1.com?q1=a%26q2=b%26q3=c%26q4=d&url[]=http://www.test2.com?r1=a%26r2=b%26r3=c%26r4=d&url[]=http://www.test3.com?q1=a%26q2=b%26q3=c%26q4=d&url[]=http://www.test4.com?e1=a%26e2=b%26e3=c%26e4=d
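The array approach can be sketched with http_build_query, which URL-encodes each value and produces the url[] style keys for you. The two sample URLs are shortened from the question, and parse_str is used here only to simulate what PHP does when it fills $_GET on the receiving side:

```php
<?php
// Sample data URLs (shortened from the question).
$urls = [
    'http://www.test1.com?q1=a&q2=b',
    'http://www.test2.com?r1=a&r2=b',
];

// Sender side: builds url%5B0%5D=...&url%5B1%5D=... with every value
// URL-encoded, so the inner "&" characters become %26.
$query = http_build_query(['url' => $urls]);

// Receiver side (simulated): PHP decodes the query string into an
// array, which is what $_GET['url'] would contain.
parse_str($query, $received);
```

On the receiving script, $_GET['url'] would then already be an array of complete URLs, with no delimiter to split on.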
How would I go about extracting just the base URL from a long string that was entered into a form?
Example:
User inputs: http://stackoverflow.com/question/ask/asdfasneransea
Program needs just: http://stackoverflow.com and not /question/ask/asdfasneransea
What is the easiest way to do this using PHP?
You can simply use parse_url()
$url = parse_url('http://example.com/foo/bar');
$host = $url['host']; // example.com
Use the parse_url function to get the separate parts of the URL, then reassemble the parts you are looking for.
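Reassembling the parts might look like the following sketch. The function name base_url is hypothetical, and returning null for input without a scheme or host is a design assumption rather than anything the question specifies:

```php
<?php
// Hypothetical helper: returns scheme://host[:port] for an absolute URL,
// or null when the input has no scheme or host.
function base_url(string $url): ?string
{
    $parts = parse_url($url);

    // parse_url() returns false on seriously malformed input.
    if (!is_array($parts) || !isset($parts['scheme'], $parts['host'])) {
        return null;
    }

    $base = $parts['scheme'] . '://' . $parts['host'];

    // Keep an explicit port if the user supplied one.
    if (isset($parts['port'])) {
        $base .= ':' . $parts['port'];
    }

    return $base;
}

$base = base_url('http://stackoverflow.com/question/ask/asdfasneransea');
```

User input from a form may well lack a scheme (for example stackoverflow.com/questions), in which case this sketch returns null and the caller has to decide how to handle it.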