get domain name from link with a fast and reliable method - php

Currently I am using this code to get the domain name (without www. or domain ending like .com):
explode('.', $url)[1];
Because this code runs inside a loop, it takes very long. Furthermore, it cannot get "example" from http://example.com/asd/asd.asd.html. Is there another, faster way to solve this?
Thank you for any answer in advance!
best greetings

Use parse_url():
$host = parse_url($url, PHP_URL_HOST);
With PHP_URL_HOST, parse_url() returns only the host.
Then use a regex to get the desired part of the host:
preg_match('/^(?:www\.)?([^.]+)/', $host, $match);
$result = $match[1];
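As a rough sketch, the two steps above could be wrapped in one helper. The name extractDomainLabel() is made up for this example, and returning null when no host is found is an assumption about what the caller wants:

```php
<?php
// Sketch: return the first host label after an optional "www." prefix.
// extractDomainLabel() is a hypothetical helper name, not from the answer above.
function extractDomainLabel(string $url): ?string
{
    // parse_url() returns null when there is no host and false on malformed input.
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }
    // Skip an optional leading "www." and capture everything up to the next dot.
    if (preg_match('/^(?:www\.)?([^.]+)/i', $host, $match) !== 1) {
        return null;
    }
    return strtolower($match[1]);
}

echo extractDomainLabel('http://example.com/asd/asd.asd.html'), "\n"; // example
echo extractDomainLabel('https://www.example.org/'), "\n";             // example
```

Note that a URL without a scheme (for instance example.com/path) has no host as far as parse_url() is concerned, so this sketch returns null for it.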

Related

Redirect to previous page in PHP but remove all URL variables [duplicate]

This question already has answers here:
How to remove the querystring and get only the URL?
(16 answers)
Closed 3 years ago.
Is there a simple way to get the requested file or directory without the GET arguments? For example, if the URL is http://example.com/directory/file.php?paramater=value I would like to return just http://example.com/directory/file.php. I was surprised that there is not a simple index in $_SERVER[]. Did I miss one?
Edit: @T.Todua provided a newer answer to this question using parse_url.
(please upvote that answer so it can be more visible).
Edit2: Someone has been spamming and editing about extracting scheme, so I've added that at the bottom.
parse_url solution
The simplest solution would be:
echo parse_url($_SERVER["REQUEST_URI"], PHP_URL_PATH);
parse_url is a built-in PHP function whose sole purpose is to extract specific components from a URL, including the path (everything before the first ?). As such, it is my new "best" solution to this problem.
strtok solution
Stackoverflow: How to remove the querystring and get only the url?
You can use strtok to get the string before the first occurrence of ?
$url=strtok($_SERVER["REQUEST_URI"],'?');
Performance Note:
This problem can also be solved using explode.
explode tends to perform better when splitting the string on only a single delimiter.
strtok tends to perform better when multiple delimiters are involved.
Using strtok like this, to return everything in a string before the first instance of a character, should be among the fastest approaches in PHP, though it WILL leave the query string in memory.
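For comparison, here is a small sketch that runs the three approaches mentioned so far on the same input. The $requestUri value is a sample standing in for $_SERVER['REQUEST_URI'], and the variable names are only illustrative:

```php
<?php
// Three ways to drop the query string from a request URI.
// $requestUri is a sample value standing in for $_SERVER['REQUEST_URI'].
$requestUri = '/directory/file.php?parameter=value';

$viaParseUrl = parse_url($requestUri, PHP_URL_PATH);  // URL-aware parsing
$viaStrtok   = strtok($requestUri, '?');              // first token before "?"
$viaExplode  = explode('?', $requestUri, 2)[0];       // split once, keep the left part

var_dump($viaParseUrl === $viaStrtok && $viaStrtok === $viaExplode); // bool(true)
echo $viaParseUrl, "\n"; // /directory/file.php
```

All three agree on ordinary request URIs. Which one is fastest depends on the PHP version and the input, so it is worth measuring before relying on any performance claim.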
An aside about Scheme (http/https) and $_SERVER vars
While OP did not ask about it, I suppose it is worth mentioning:
parse_url should be used to extract any specific component from the url, please see the documentation for that function:
parse_url($actual_link, PHP_URL_SCHEME);
Of note here, is that getting the full URL from a request is not a trivial task, and has many security implications. $_SERVER variables are your friend here, but they're a fickle friend, as apache/nginx configs, php environments, and even clients, can omit or alter these variables. All of this is well out of scope for this question, but it has been thoroughly discussed:
https://stackoverflow.com/a/6768831/1589379
It is important to note that these $_SERVER variables are populated at runtime, by whichever engine is doing the execution (/var/run/php/ or /etc/php/[version]/fpm/). These variables are passed from the OS, to the webserver (apache/nginx) to the php engine, and are modified and amended at each step. The only such variables that can be relied on are REQUEST_URI (because it's required by php), and those listed in RFC 3875 (see: PHP: $_SERVER ) because they are required of webservers.
Please note: spamming links to your answers across other questions is not in good taste.
You can use $_SERVER['REQUEST_URI'] to get requested path. Then, you'll need to remove the parameters...
$uri_parts = explode('?', $_SERVER['REQUEST_URI'], 2);
Then, add in the hostname and protocol.
echo 'http://' . $_SERVER['HTTP_HOST'] . $uri_parts[0];
You'll have to detect the protocol as well if you mix http:// and https://. That I leave as an exercise for you. $_SERVER['REQUEST_SCHEME'] returns the protocol.
Putting it all together:
echo $_SERVER['REQUEST_SCHEME'] .'://'. $_SERVER['HTTP_HOST']
. explode('?', $_SERVER['REQUEST_URI'], 2)[0];
...returns, for example:
http://example.com/directory/file.php
php.net Documentation:
$_SERVER — Server and execution environment information
explode — Split a string by a string
parse_url — Parse a URL and return its components (possibly a better solution)
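A hedged sketch of the "putting it all together" idea: the helper below takes the $_SERVER-style array as a parameter so it can be tried with sample data. The name currentUrlWithoutQuery(), the fallback to HTTPS when REQUEST_SCHEME is missing, and the 'localhost' default are assumptions for this example, not something the answer prescribes:

```php
<?php
// Sketch: rebuild the current URL without its query string.
// currentUrlWithoutQuery() is a hypothetical helper name.
function currentUrlWithoutQuery(array $server): string
{
    // REQUEST_SCHEME is not set by every server, so fall back to the HTTPS flag.
    $https  = !empty($server['HTTPS']) && strtolower($server['HTTPS']) !== 'off';
    $scheme = $server['REQUEST_SCHEME'] ?? ($https ? 'https' : 'http');
    // HTTP_HOST comes from the client; validate it before trusting it.
    $host   = $server['HTTP_HOST'] ?? 'localhost';
    // Keep only the part of the request URI before the first "?".
    $path   = explode('?', $server['REQUEST_URI'] ?? '/', 2)[0];

    return $scheme . '://' . $host . $path;
}

echo currentUrlWithoutQuery([
    'REQUEST_SCHEME' => 'http',
    'HTTP_HOST'      => 'example.com',
    'REQUEST_URI'    => '/directory/file.php?parameter=value',
]), "\n"; // http://example.com/directory/file.php
```

In a real request you would call it as currentUrlWithoutQuery($_SERVER).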
Solution:
echo parse_url($_SERVER["REQUEST_URI"], PHP_URL_PATH);
Here is a solution that takes into account different ports and https:
$pageURL = (@$_SERVER['HTTPS'] == 'on') ? 'https://' : 'http://';
if ($_SERVER['SERVER_PORT'] != '80')
    $pageURL .= $_SERVER['SERVER_NAME'].':'.$_SERVER['SERVER_PORT'].$_SERVER['PHP_SELF'];
else
    $pageURL .= $_SERVER['SERVER_NAME'].$_SERVER['PHP_SELF'];
Or a more basic solution that does not take other ports into account:
$pageURL = (@$_SERVER['HTTPS'] == 'on') ? 'https://' : 'http://';
$pageURL .= $_SERVER['SERVER_NAME'].$_SERVER['PHP_SELF'];
I actually don't think that's a good way to parse it. It's not clean, or it's a bit off topic:
explode is heavy
Session is heavy
PHP_SELF doesn't handle URL rewriting
I'd do something like:
if (($pos_get = strpos($app_uri, '?')) !== false) $app_uri = substr($app_uri, 0, $pos_get);
This checks whether there is an actual '?' (standard GET format).
If there is, it cuts the variable off just before the '?', which is reserved for the query data.
Here $app_uri is the URI/URL of my website.
$uri_parts = explode('?', $_SERVER['REQUEST_URI'], 2);
$request_uri = $uri_parts[0];
echo $request_uri;
You can use $_GET for URL parameters or $_POST for POST parameters, while $_REQUEST contains the parameters from $_GET, $_POST and $_COOKIE. If you want to hide the URI parameter from the user, you can convert it to a session variable like so:
<?php
session_start();
if (isset($_REQUEST['param']) && !isset($_SESSION['param'])) {
// Store all parameters received
$_SESSION['param'] = $_REQUEST['param'];
// Redirect without URI parameters
header('Location: /file.php');
exit;
}
?>
<html>
<body>
<?php
echo $_SESSION['param'];
?>
</body>
</html>
EDIT
use $_SERVER['PHP_SELF'] to get the current file name or $_SERVER['REQUEST_URI'] to get the requested URI
Not everyone will find it simple, but I believe this to be the best way to go around it:
preg_match('/^[^\?]+/', $_SERVER['REQUEST_URI'], $return);
$url = 'http' . ('on' === $_SERVER['HTTPS'] ? 's' : '') . '://' . $_SERVER['HTTP_HOST'] . $return[0];
What it does is simply go through the REQUEST_URI from the beginning of the string and stop when it hits a "?" (which really should only happen when you get to the parameters).
Then you create the url and save it to $url:
When creating the $url, we simply write "http", check whether https is being used and, if it is, also write "s". Then we concatenate "://", the HTTP_HOST (the server, e.g. "stackoverflow.com") and the $return we found before (it's an array, but we only want the first index in it; there can only ever be one index, since the regex matches from the beginning of the string).
I hope someone can use this...
PS. This has been confirmed to work while using SLIM to reroute the URL.
I know this is an old post but I am having the same problem and I solved it this way
$current_request = preg_replace("/\?.*$/","",$_SERVER["REQUEST_URI"]);
Or equivalently
$current_request = preg_replace("/\?.*/D","",$_SERVER["REQUEST_URI"]);
It's shocking how many of these upvoted/accepted answers are incomplete, so they don't answer the OP's question, after 7 years!
If you are on a page with URL like: http://example.com/directory/file.php?paramater=value
...and you would like to return just: http://example.com/directory/file.php
then use:
echo $_SERVER['REQUEST_SCHEME'].'://'.$_SERVER['SERVER_NAME'].$_SERVER['PHP_SELF'];
Why so complicated? =)
$baseurl = 'http://mysite.com';
$url_without_get = $baseurl.$_SERVER['PHP_SELF'];
this should really do it man ;)
I had the same problem when I wanted a link back to the homepage. I tried this and it worked:
<a href="<?php echo $_SERVER['PHP_SELF']; ?>?">
Note the question mark at the end. I believe that tells the machine to stop thinking on behalf of the coder :)

Most efficient fix for an edgecase PHP bug, parse_url no scheme

I've recently run into a bug in PHP 7.1 which seems to have come back after being fixed in PHP 5.4.7
The problem is simply that if you pass a url to parse_url() and the url doesn't have a scheme it will return the whole url as if it's just a path. For example:
var_dump(parse_url('google.co.uk/test'));
Result:
array(1) { ["path"]=> string(17) "google.co.uk/test" }
While in reality here it should split into its domain and path.
I run parse_url a few ten million times a day as part of url decryption / encryption functionality. I'm looking for a fast way to fix this edgecase bug or have a reliable alternative to parse_url.
Edit:
Thanks for the helpful responses, here's the solution I used in the end, I hope it helps someone. I won't submit it as an answer because I already marked someone else as correct (which they are) which allowed me to write this.
$parsedUrl = parse_url($uri);
// if the uri has no scheme, it won't think there's a host and will give bad results
if ($parsedUrl !== false && !isset($parsedUrl['host'])) {
    // double slash prepended will parse $uri as if it has a scheme and no scheme will be in the result
    $parsedUrl = parse_url('//' . $uri);
}
if ($parsedUrl === false) {
    throw new MalformedUrlException('Malformed URL: ' . $uri);
}
// use parsed url as needed
parse_url needs to know whether the given string starts at the beginning of a URL.
This is why parse_url('//domain/path') works: it simply does not output a scheme.
Now to the problem you want solved: PHP would need to know every domain there is, and then decide whether that is what the user meant (basically impossible).
Take for example the following URL: 'http://whois.domaintools.com/test.at'. If I pass only the path, 'test.at', is that a path or a domain?
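The ambiguity can be seen directly in a short sketch. parseUrlLenient() is a hypothetical name; it retries with a leading "//" only when neither a scheme nor a host was found, and so it always guesses that the first segment of a scheme-less string is the host:

```php
<?php
// Sketch: parse a URL that may lack a scheme by retrying with a leading "//".
// parseUrlLenient() is a hypothetical helper; treating the first segment as the
// host is a guess, which is exactly the ambiguity described above.
function parseUrlLenient(string $uri)
{
    $parts = parse_url($uri);
    if ($parts !== false && !isset($parts['scheme']) && !isset($parts['host'])) {
        // "//host/path" is a scheme-relative URL, so parse_url() can see the host.
        $parts = parse_url('//' . $uri);
    }
    return $parts; // array of components, or false if the input is malformed
}

print_r(parse_url('test.at'));                 // only a "path" entry
print_r(parseUrlLenient('test.at'));           // only a "host" entry
print_r(parseUrlLenient('google.co.uk/test')); // "host" and "path" entries
```

Whether 'test.at' should count as a host or a path is a decision the caller has to make; the helper just makes one of the two choices explicit.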

Extract domain name from affiliate URL using PHP

Here is the format of affiliate URL I have http://tracking.vcommission.com/aff_c?offer_id=2119&&url=http%3A%2F%2Fwww.netmeds.com%2F%3Fsource_attribution%3DVC-CPS-Emails%26utm_source%3DVC-CPS-Emails%26utm_medium%3DCPS-Emails%26utm_campaign%3DEmails
As you can see, it contains two URLs:
First URL: vcommission.com
Second URL: netmeds.com
I have a CSV file with a lot of rows. Each row may have a different second URL, and I want to get the second URL for each row. The first URL is not static either; it differs between CSV files.
How can I get the second URL?
Some basic string parsing like this should give you an idea.
$url='http://tracking.vcommission.com/aff_c?offer_id=2119&&url=http%3A%2F%2Fwww.netmeds.com%2F%3Fsource_attribution%3DVC-CPS-Emails%26utm_source%3DVC-CPS-Emails%26utm_medium%3DCPS-Emails%26utm_campaign%3DEmails';
list($u,$q)=explode('url=',urldecode($url));
$o=(object)parse_url($q);
echo $o->host;
A good way to find the domain for a URL is with parse_url.
Unfortunately, because of the way your data is stored, that is not really an option on its own. However, you may be able to use some sort of regex to find web addresses contained in the query string:
<?php
$url = "http://tracking.vcommission.com/aff_c?offer_id=2119&&url=http%3A%2F%2Fwww.netmeds.com%2F%3Fsource_attribution%3DVC-CPS-Emails%26utm_source%3DVC-CPS-Emails%26utm_medium%3DCPS-Emails%26utm_campaign%3DEmails";
$p = parse_url($url);
$pattern = "/www[^%]*/";
preg_match($pattern, $p['query'], $result);
var_dump($result);
You may need to adjust the regex pattern based on how the other data presents itself.
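Another option, sketched here under the assumption that the second URL always sits in a query parameter named url, is to let parse_str() decode the query string and then run parse_url() on the decoded value. The helper name nestedUrlHost() and the shortened sample URL are made up for this example:

```php
<?php
// Sketch: read the nested URL from a query parameter, then take its host.
// nestedUrlHost() and the default parameter name "url" are illustrative assumptions.
function nestedUrlHost(string $affiliateUrl, string $param = 'url'): ?string
{
    $query = parse_url($affiliateUrl, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null; // no query string at all
    }
    parse_str($query, $params); // splits on "&" and percent-decodes the values
    if (!isset($params[$param]) || !is_string($params[$param])) {
        return null; // the expected parameter is missing
    }
    $host = parse_url($params[$param], PHP_URL_HOST);
    return is_string($host) ? $host : null;
}

// Shortened version of the URL from the question.
$url = 'http://tracking.vcommission.com/aff_c?offer_id=2119&&url=http%3A%2F%2Fwww.netmeds.com%2F%3Fsource_attribution%3DVC-CPS-Emails';
echo nestedUrlHost($url), "\n"; // www.netmeds.com
```

If other CSV files use a different parameter name, it can be passed as the second argument.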

Remove certain part of string in PHP [duplicate]

This question already has answers here:
Get domain name (not subdomain) in php
(18 answers)
Closed 10 years ago.
I've already seen a bunch of questions on this exact subject, but none seem to solve my problem. I want to create a function that will remove everything from a website address, except for the domain name.
For example if the user inputs: http://www.stackoverflow.com/blahblahblah I want to get stackoverflow, and the same way if the user inputs facebook.com/user/bacon I want to get facebook.
Does anyone know of a function or a way to remove certain parts of strings? Maybe it could search for http and, when found, remove everything up to and including the //. Then it could search for www and, if found, remove everything up to the first dot. Then it would keep everything until the next dot and remove everything after it. Looking at it now, this might cause problems with sites such as http://www.en.wikipedia.org, because I'll be left with only en.
Any ideas (preferably in PHP, but JavaScript is also welcome)?
EDIT 1:
Thanks to great feedback I think I've been able to work out a function that does what I want:
function getdomain($url) {
    $parts = parse_url($url);
    // Prepend a scheme only when none is present, so parse_url() can find the host
    if (empty($parts['scheme'])) {
        $url = 'http://'.$url;
    }
    $parts2 = parse_url($url);
    $host = $parts2['host'];
    $remove = explode('.', $host);
    $result = $remove[0];
    if($result == 'www') {
        $result = $remove[1];
    }
    return $result;
}
It's not perfect, at least considering subdomains, but I think it's possible to do something about it. Maybe add a second if statement at the end to check the length of the array: if it's bigger than two, choose item 1 instead of item 0. This obviously gives me trouble with any domain using .co.uk (because that'll be three items long, but I don't want to return co). I'll try to work on it a little more and see what I come up with. I'd be glad if some of you PHP gurus out there could take a look as well. I'm not as skilled or as experienced as any of you... :P
Use parse_url to split the URL into the different parts. What you need is the hostname. Then you will want to split it by the dot and get the first part:
$url = 'http://facebook.com/blahblah';
$parts = parse_url($url);
$host = $parts['host']; // facebook.com
$foo = explode('.', $host);
$result = $foo[0]; // facebook
You can use the parse_url function from PHP which returns exactly what you want - see
Use parse_url in PHP to get domain.com, and then replace .com with an empty string.
I am a little rusty on my regular expressions but this should work.
$url='http://www.en.wikipedia.org';
$domain = parse_url($url, PHP_URL_HOST); // Will return www.en.wikipedia.org
$domain = preg_replace('/\.(com|org)$/', '', $domain); // www.en.wikipedia
http://php.net/manual/en/function.parse-url.php
PHP REGEX: Get domain from URL
http://rubular.com/r/MvyPO9ijnQ //Check regular expressions
You're looking for info on Regular Expression. It's a bit complicated, so be prepared to read up. In your case, you'll best utilize preg_match and preg_replace. It searches for a match based on your pattern and replaces the matches with your replacement.
preg_match
preg_replace
I'd start with a pattern like this: find .com, .net or .org and delete it and everything after it. Then find the last . and delete it and everything in front of it. Finally, if // exists, delete it and everything in front of it.
if (preg_match("/^http:\/\//i",$url))
    $url = preg_replace("/^http:\/\//i","",$url);
if (preg_match("/www\./i",$url))
    $url = preg_replace("/www\./i","",$url);
if (preg_match("/\.com/i",$url))
    $url = preg_replace("/\.com/i","",$url);
if (preg_match("/\/*$/",$url))
    $url = preg_replace("/\/*$/","",$url);
^ = at the start of the string
i = case insensitive
\ = escape char
$ = the end of the string
This will have to be played around with and tweaked, but it should get you pointed in the right direction.
Javascript:
document.domain.replace(".com","")
PHP:
$url = 'http://google.com/something/something';
$parse = parse_url($url);
echo str_replace(".com","", $parse['host']); //returns google
This is quite a quick method but should do what you want in PHP:
function getDomain( $URL ) {
return explode('.',$URL)[1];
}
I will update it when I get a chance, but basically it splits the URL into pieces at each full stop and then returns the second item, which should be the domain. A bit more logic would be required for longer domains such as www.abc.xyz.com, but for normal URLs it would suffice.
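To illustrate the "bit more logic" that several answers mention, here is a minimal sketch that handles a missing scheme, subdomains and a few two-part suffixes such as .co.uk. The helper name domainLabel() and the tiny $twoPartSuffixes list are assumptions made for this example; the list is nowhere near complete, and a real implementation would need the full Public Suffix List:

```php
<?php
// Sketch: pick the label just left of the public suffix.
// domainLabel() is a hypothetical helper; $twoPartSuffixes is a hand-written,
// incomplete list used only for illustration.
function domainLabel(string $url): ?string
{
    // Add a scheme when none is present so parse_url() can find the host.
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return null;
    }
    $labels = explode('.', strtolower($host));
    if (count($labels) < 2) {
        return null; // e.g. "localhost" has no suffix to strip
    }
    $twoPartSuffixes = ['co.uk', 'org.uk', 'com.au'];
    $lastTwo = implode('.', array_slice($labels, -2));
    $suffixLength = in_array($lastTwo, $twoPartSuffixes, true) ? 2 : 1;
    // The wanted label sits immediately before the suffix.
    $index = count($labels) - $suffixLength - 1;
    return $index >= 0 ? $labels[$index] : null;
}

echo domainLabel('http://www.stackoverflow.com/blahblahblah'), "\n"; // stackoverflow
echo domainLabel('facebook.com/user/bacon'), "\n";                   // facebook
echo domainLabel('http://www.en.wikipedia.org'), "\n";               // wikipedia
echo domainLabel('http://www.bbc.co.uk/news'), "\n";                 // bbc
```

Counting from the right rather than from the left is what makes www.en.wikipedia.org come out as wikipedia instead of en.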
