Trim a url to just the domain name using PHP

I have a database table column that stores the URL of each person's website. The column is unique because I don't want people to use the same website twice.
However, a person could get around this by entering variations such as:
domain.com
domain.com/hello123
www.domain.com
So my plan is to make it so that when a person saves their record it will remove everything after the first slash to make sure only the domain is saved into the database.
How would I do this though? I'm presuming this has been done lots of times before, but I'm looking for something VERY VERY simple and not interested in using libraries or other long code snippets. Just something that strips out the rest and keeps just the domain name.

See PHP: parse_url
// Force the URL to begin with "http://" or "https://" so parse_url() can find the host:
// a missing scheme is added and any other scheme is replaced.
$url = preg_replace('/^(?!https?:\/\/)(?:[a-z][a-z0-9+.\-]*:\/\/)?/i', 'http://', $inputURL);
$parts = parse_url($url);
// var_dump($parts); // To see the parsed URL parts, uncomment this line
print $parts['host'];
Note that this code does not normalize subdomains: www.domain.com and domain.com will still be stored as separate entries.

Use parse_url:
$hostname = parse_url($userwebsite,PHP_URL_HOST);
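Keep in mind that parse_url() only reports a host when the input carries a scheme, so bare input such as domain.com comes back as a path instead. Below is a minimal sketch of a normalizer, assuming you also want www.domain.com and domain.com to count as the same site. The function name normalizeHost is hypothetical and not taken from any answer here.

```php
<?php
// Hypothetical helper: reduce user input to a lowercase host name.
function normalizeHost(string $input): ?string
{
    $input = trim($input);
    // parse_url() needs a scheme to recognise the host, so add one if missing.
    if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $input)) {
        $input = 'http://' . $input;
    }
    $host = parse_url($input, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // unparseable input
    }
    $host = strtolower($host);
    // Assumption: a leading "www." should not count as a different site.
    return preg_replace('/^www\./', '', $host);
}

$candidates = ['domain.com', 'domain.com/hello123', 'www.domain.com', 'https://WWW.Domain.com/a?b=c'];
foreach ($candidates as $candidate) {
    echo normalizeHost($candidate), PHP_EOL; // each line prints domain.com
}
```

Storing the normalized value in the unique column means all three variants from the question collide on insert.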

$sDomain = NULL;
foreach (explode('/', $sInput) as $sPart) {
    switch ($sPart) {
        case 'http:':
        case 'https:':
        case '':
            break;
        default:
            $sDomain = $sPart;
            break 2;
    }
}
if ($sDomain !== NULL) {
    echo $sDomain;
}
First, the input is split on slashes. Next, the known/supported schemes are skipped, as is the empty part produced by the "//" in "http://". Finally, the first remaining part is stored in $sDomain.
If you do not mind depending on PCRE, you can use a regular expression instead:
if (preg_match('/^https?:\/\/([^\/]+)/', $sInput, $aisMatch) === 1) {
    echo $aisMatch[1];
}

You could try
int strrpos ( string $haystack , string $needle [, int $offset = 0 ] )
and then put the result of that into
string substr ( string $string , int $start [, int $length ] )
using $needle = "/" and $needle = "."

Related

PHP Get URL without Query String - While Modifying existing query

Long-time lurker, first-time poster here.
I have searched high and low, and I am trying to keep my PHP script more or less as it is:
$url = "https://" . $_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI'];
However, when I echo $url I need to:
Remove the query string from www.example.com/?utm_source=etc
Remove everything after /? for multiple file names, such as
www.example.com/page.htm/?utm_source=etc
www.example.com/page1.htm/?utm_source=etc
and so on
Keep the Google custom search query string: www.example.com/search.htm?q=term
Keep a couple of other search strings similar to the GSE example
I've seen some examples, but none worked for me without a ton of changes. Surely there is an easier way.
Thanks in advance

This will do the job for you. I used a PHP regex with the built-in preg_replace() function.
Regex
/www(\.)[A-Za-z0-9]+\.\w+(\/)(\?)?((\w)+(\.)(\w)+)?((.)+)?/i
PHP example
<?php
$input_line = 'www.example.com/?utm_source=etc'; //Input String
$replace_String = ''; //specify your replace string here
$regex = preg_replace("/www(\.)[A-Za-z0-9]+\.\w+(\/)(\?)?((\w)+(\.)(\w)+)?((.)+)?/i", $replace_String, $input_line);
print_r($regex); //this will print the output with replaced string
?>

You need to split the URL into pieces and then put it back together; this is the straightforward way.
For example, if your URL is
$url = 'www.example.com/page1.htm/?utm_source=etc&some_key=some_key_value&someothekey=someotherid';
$url_array = explode("?", $url);
$url_path = $url_array[0];
$url_params = explode("&", $url_array[1]);
$some_key = "some_key"; // the parameter you want to keep
$some_key_value = ""; // initialise so it can be used after the loop
$i = 0;
foreach ($url_params as $url_param) { // get each URL parameter and its value
    $i++;
    $url_key = explode("=", $url_param)[0];
    $url_value = explode("=", $url_param)[1];
    /*
     * or use the $i counter to pick the relevant parameter by position, e.g.
     * if ($i === 1) { // 1 is the first parameter, etc.
     *     $some_key = $url_key;
     * }
     */
    // now that we have the key and value, compare
    if ($url_key === $some_key) {
        $some_key_value .= $url_value; // repeat this if statement for any other keys you need
    }
}
$new_url = $url_path . "?" . $some_key . "=" . $some_key_value;
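The same split-and-rebuild idea can lean on PHP's built-in parse_str() and http_build_query() instead of manual explode() calls on each parameter. This is only a sketch, assuming the one parameter worth keeping is q (the search term from the question). The function name keepQueryParams and the whitelist are illustrative.

```php
<?php
// Hypothetical helper: rebuild a URL keeping only whitelisted query parameters.
function keepQueryParams(string $url, array $allowed): string
{
    // Split off the query string (everything after the first "?").
    $parts = explode('?', $url, 2);
    $base = $parts[0];
    $params = [];
    if (isset($parts[1])) {
        parse_str($parts[1], $params); // decode "a=1&b=2" into an array
    }
    // Keep only the parameters whose names are whitelisted.
    $kept = array_intersect_key($params, array_flip($allowed));
    return $kept === [] ? $base : $base . '?' . http_build_query($kept);
}

echo keepQueryParams('www.example.com/page1.htm/?utm_source=etc', ['q']), PHP_EOL;
echo keepQueryParams('www.example.com/search.htm?q=term&utm_source=etc', ['q']), PHP_EOL;
```

The first call drops the tracking parameter entirely; the second keeps q=term and discards the rest.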

Fetching a particular value after applying string manipulation to a URL

I am working on a query string pattern for a redirection application. Can we use multiple OR conditions in strstr? For example, if I am not sure whether the separator will be & or $ or /, is it possible to check for the other characters as alternatives in $code= strstr($code, '/',+1);
Explanation:
The URL will come to our redirect application like this:
https://www.sample.com?key=1234~rety~1234~retu&c=12&k=12
OR
https://www.sample.com?key=1234~rety~1234~retu/c=12/k=12
OR
https://www.sample.com?key=1234~rety~1234~retu$c=12$k=12
The only variable that matters for our purpose is key (its name can change, but the data will arrive in the same pattern). All I want is to get the data in the ~ pattern.
I am doing the following:
parse_str(parse_url($url, PHP_URL_QUERY), $parts);
$keys = array_keys($parts);
$size = sizeof($keys);
$total = substr_count($url, '~');
$var = $keys[0];
$value = $parts[$var];
for ($i = 0; $i < $size; $i++) {
    $var = $keys[$i];
    $value = $parts[$var];
    $total = substr_count($value, '~');
    if ($total == 3) {
        $digit_code = preg_split('/\~+/', $value);
        $digit_code = array_filter($digit_code);
        $digit_code = array_values($digit_code);
        $project_id = $digit_code[0];
        $country_id = $digit_code[1];
        $vendor = $digit_code[2];
        $code = $digit_code[3];
        //$code = strstr($code, '/', +1);
    }
}
But it does not work when the URL comes in like this:
https://www.sample.com?key=1234~rety~1234~retu/c=12/k=12
or
https://www.sample.com?key=1234~rety~1234~retu$c=12$k=12

This is where using regex could be applicable:
preg_match('/((?:\/|&|\$).*)/', $code, $matches);
$code = $matches[1];
This will match a / or & or $ followed by the rest of the string.
http://au1.php.net/preg_match
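Note that the pattern above captures the separator and everything after it. If what you need is the part before the first separator (which is what strstr() returns when its third argument is true), strcspn() can check for several candidate characters at once. A small sketch, with a hypothetical helper name:

```php
<?php
// Hypothetical helper: keep everything before the first "/", "&" or "$".
function beforeSeparator(string $code): string
{
    // strcspn() returns the length of the initial run that contains
    // none of the listed characters.
    return substr($code, 0, strcspn($code, '/&$'));
}

echo beforeSeparator('retu/c=12/k=12'), PHP_EOL; // retu
echo beforeSeparator('retu$c=12$k=12'), PHP_EOL; // retu
echo beforeSeparator('retu'), PHP_EOL;           // retu
```

When none of the separators is present, the string is returned unchanged, so no strpos() guard is needed.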

Are you referring to a PHP redirect, or to a query string used to fetch data from a database?
You can read the strstr manual.

I found the solution.
I just put
if (strpos($code, '/') !== false) {
    $code = strstr($code, '/', true);
}
if (strpos($code, '$') !== false) {
    $code = strstr($code, '$', true);
}
right after the for loop.

PHP explode work only with last line

I have a PHP script that includes different pages for special referers:
$ref_found = false;
// get referer if exists
$referer = false;
if ( isset($_SERVER['HTTP_REFERER']) ) {
    $referer = $_SERVER['HTTP_REFERER'];
    // get content of list.txt
    $list = explode(chr(10), file_get_contents('list.txt'));
    foreach ( $list as $l ) {
        if ( strlen($l) > 0 ) {
            if ( strpos( $referer, $l ) ) {
                $ref_found = true;
            }
        }
    }
}
// include the correct file
if ( $ref_found ) {
    require_once('special_page.html');
} else {
    require_once('regular_page.html');
}
The referer list is kept in a simple text file (list.txt) that looks like this:
domain1.com
domain2.com
domain3.com
Unfortunately, this script works only for the last domain in the list (domain3.com).
What should I add? \n?
Or would it be better to store the domains in a different way?

The problem is that when you explode() your list of domain names, you end up with whitespace around each item. At the very least, you will have a carriage return (\r) left over, since the line breaks in your file are probably \r\n.
So you're checking against something like "domain1.com\r", or maybe " domain1.com". Since this extra whitespace doesn't exist in the referrer header, the comparison fails when you expect it to match. The last line of the file has no trailing line break, which is why only the last domain works.
By calling trim() on each value you find, you'll get a clean domain name that you can use to do a more useful comparison:
$list = explode("\n", file_get_contents('list.txt'));
foreach ($list as $l) {
$l = trim($l);
if ((strlen($l) > 0) && (strpos($referer, $l) !== false)) {
$ref_found = true;
break;
}
}
I made a couple other minor updates to your code as well:
I switched away from using chr() and just used a string literal ("\n"). As long as you use double quotes, it'll be an actual newline character instead of a literal \ and n, and the string literal is much easier to understand for somebody reading your code.
The split character itself is unchanged: chr(10) is "\n" (chr(13) is "\r"). There are several newline formats, but the most common are "\n" and "\r\n". Exploding on "\n" works with both, as long as you trim() the "\r" that the second format leaves at the end of each line; exploding on "\r" would only work with the second.
I combined your two if statements. This is a very minor update that doesn't have much effect except to (in my opinion) make the code easier to read.
I updated your strpos() call to do a strict comparison to false (!==). It's probably not an issue with this code because the referrer value will start with http://, but it's a good habit to get into. If the substring happens to occur at the beginning of the parent string, strpos() will return 0, which will be interpreted as false in your original code.
I added a break statement in your loop if you found a matching domain name. Once you find one and set the flag, there's no reason to continue checking the rest of the domains in the list, and break allows you to cancel the rest of the foreach loop.
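An alternative sketch splits on any line ending and compares the referrer's host rather than searching the whole URL, which avoids false hits when a listed domain merely appears in the path or query string. The function name refererInList and the rule "same host or a subdomain of it" are assumptions, not part of the original script.

```php
<?php
// Hypothetical helper: is the referrer's host one of the listed domains?
function refererInList(string $referer, string $listContents): bool
{
    // \R matches \n, \r\n and \r, so the file's line-ending style no longer matters.
    $lines = preg_split('/\R/', $listContents);
    $domains = array_filter(array_map('trim', $lines), 'strlen');

    $host = parse_url($referer, PHP_URL_HOST);
    if (!is_string($host)) {
        return false;
    }
    $host = strtolower($host);

    foreach ($domains as $domain) {
        $domain = strtolower($domain);
        // Match the domain itself or any of its subdomains.
        if ($host === $domain || substr($host, -strlen($domain) - 1) === '.' . $domain) {
            return true;
        }
    }
    return false;
}

$list = "domain1.com\r\ndomain2.com\r\ndomain3.com";
var_dump(refererInList('http://www.domain1.com/page', $list));      // bool(true)
var_dump(refererInList('http://other.com/?u=domain1.com', $list));  // bool(false)
```

In the original script the second argument would be file_get_contents('list.txt').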

chr(10) == "\n"
chr(13) == "\r"
"\n" is most likely what you want.

Trimming characters off a string

Given the following URL:
http://www.domain.com/reporting/category-breakdown.php?re=updated
I need to remove everything after the .php
It might be "?re=updated" or it could be something else. The number of characters won't always be the same, the string will always end with .php though.
How do I do this?

To find the first position of a substring in a string you can use strpos() in PHP.
$mystring = 'http://www.domain.com/reporting/category-breakdown.php?re=updated';
$findme = '.php';
$pos = strpos($mystring, $findme);
After this, $pos holds the position of the first character of '.php' in your URL. You want the URL up to the end of '.php', which means that position plus 4 (the length of the substring). To get it, use the substr(string, start, length) function.
substr($mystring, 0, $pos + 4);
Here you are!
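Wrapped up as a function, it is worth adding a guard for the case where '.php' is missing: strpos() returns false then, and false + 4 would silently cut the string at offset 4. A sketch, with an illustrative helper name:

```php
<?php
// Hypothetical helper: cut a URL right after the first occurrence of $marker.
function cutAfter(string $url, string $marker = '.php'): string
{
    $pos = strpos($url, $marker);
    if ($pos === false) {
        return $url; // marker not found: leave the URL untouched
    }
    return substr($url, 0, $pos + strlen($marker));
}

echo cutAfter('http://www.domain.com/reporting/category-breakdown.php?re=updated'), PHP_EOL;
// http://www.domain.com/reporting/category-breakdown.php
```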

Find the first index of ".php" (strpos() in PHP), then take the substring from character 0 to that index plus the length of ".php".

3 line solution:
$str = "http://www.domain.com/reporting/category-breakdown.php?re=updated";
$str = explode('?', $str)[0]; // array_shift() needs a variable, so index the result directly
echo $str;
Note: it's not foolproof and could fail in several cases, but for the kind of URLs you mentioned, this works.

Here is another way to get the non-query-string part of a URL with PHP:
$url = 'http://www.domain.com/reporting/category-breakdown.php?re=updated';
$parsed = parse_url($url);
$no_query_string = $parsed['scheme'] . '://' . $parsed['host'] . $parsed['path'];
// scheme: http, host: www.domain.com, path: /reporting/category-breakdown.php
That will handle .php, .phtml, .htm, .html, .aspx, etc.
See the parse_url manual page.

Url splitting in php

I have a URL like this:
http://www.w3schools.com/PHP/func_string_str_split.asp
I want to split that URL to get only the host part. For that I am using
parse_url($url,PHP_URL_HOST);
It returns www.w3schools.com.
I want to get only 'w3schools.com'.
Is there a function for that, or do I have to do it manually?

There are many ways you could do this. A simple replace is the fastest if you know you always want to strip off 'www.'
$stripped=str_replace('www.', '', $domain);
A regex replace lets you bind that match to the start of the string:
$stripped=preg_replace('/^www\./', '', $domain);
If you always want to strip the first part of the domain, regardless of whether it's www, you could use explode/implode. Though it's easy to read, it's the least efficient method:
$parts=explode('.', $domain);
array_shift($parts); //eat first element
$stripped=implode('.', $parts);
A regex achieves the same goal more efficiently:
$stripped=preg_replace('/^\w+\./', '', $domain);
Now you might imagine that the following would be more efficient than the above regex:
$period=strpos($domain, '.');
if ($period!==false)
{
    $stripped=substr($domain,$period+1);
}
else
{
    $stripped=$domain; //there was no period
}
But I benchmarked it and found that over a million iterations, the preg_replace version consistently beat it. Typical results, normalized to the fastest (so it has a unitless time of 1):
Simple str_replace: 1
preg_replace with /^\w+\./: 1.494
strpos/substr: 1.982
explode/implode: 2.472
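The benchmark script itself is not shown. A rough harness along the following lines could be used to repeat the comparison. The timeIt helper and the iteration count are assumptions, and absolute numbers will vary with PHP version and machine.

```php
<?php
// Hypothetical timing helper: run $fn $n times and return the elapsed seconds.
function timeIt(callable $fn, int $n = 100000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $n; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

$domain = 'www.example.com';
$candidates = [
    'str_replace' => function () use ($domain) {
        return str_replace('www.', '', $domain);
    },
    'preg_replace' => function () use ($domain) {
        return preg_replace('/^\w+\./', '', $domain);
    },
    'strpos/substr' => function () use ($domain) {
        $period = strpos($domain, '.');
        return $period !== false ? substr($domain, $period + 1) : $domain;
    },
    'explode/implode' => function () use ($domain) {
        $parts = explode('.', $domain);
        array_shift($parts); // drop the first component
        return implode('.', $parts);
    },
];

foreach ($candidates as $name => $fn) {
    printf("%-16s %.4f s\n", $name, timeIt($fn));
}
```

Dividing each time by the smallest one gives normalized figures comparable to the list above.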
The above code samples always strip the first domain component, so will work just fine on domains like "www.example.com" and "www.example.co.uk" but not "example.com" or "www.department.example.com". If you need to handle domains that may already be the main domain, or have multiple subdomains (such as "foo.bar.baz.example.com") and want to reduce them to just the main domain ("example.com"), try the following. The first sample in each approach returns only the last two domain components, so won't work with "co.uk"-like domains.
explode:
$parts = explode('.', $domain);
$parts = array_slice($parts, -2);
$stripped = implode('.', $parts);
Since explode is consistently the slowest approach, there's little point in writing a version that handles "co.uk".
regex:
$stripped=preg_replace('/^.*?([^.]+\.[^.]*)$/', '$1', $domain);
This captures the final two parts from the domain and replaces the full string value with the captured part. With multiple subdomains, all the leading parts get stripped.
To work with ".co.uk"-like domains as well as a variable number of subdomains, try:
$stripped=preg_replace('/^.*?([^.]+\.(?:[^.]*|[^.]{2}\.[^.]{2}))$/', '$1', $domain);
str:
$end = strrpos($domain, '.') - strlen($domain) - 1;
$period = strrpos($domain, '.', $end);
if ($period !== false) {
    $stripped = substr($domain,$period+1);
} else {
    $stripped = $domain;
}
Allowing for co.uk domains:
$len = strlen($domain);
if ($len < 7) {
    $stripped = $domain;
} else {
    if ($domain[$len-3] === '.' && $domain[$len-6] === '.') {
        $offset = -7;
    } else {
        $offset = -5;
    }
    $period = strrpos($domain, '.', $offset);
    if ($period !== FALSE) {
        $stripped = substr($domain,$period+1);
    } else {
        $stripped = $domain;
    }
}
The regex and str-based implementations can be made ever-so-slightly faster by sacrificing edge cases (where the primary domain component is a single letter, e.g. "a.com"):
regex:
$stripped=preg_replace('/^.*?([^.]{3,}\.(?:[^.]+|[^.]{2}\.[^.]{2}))$/', '$1', $domain);
str:
$period = strrpos($domain, '.', -7);
if ($period !== FALSE) {
    $stripped = substr($domain,$period+1);
} else {
    $stripped = $domain;
}
Though the behavior is changed, the rankings aren't (most of the time). Here they are, with times normalized to the quickest.
multiple subdomain regex: 1
.co.uk regex (fast): 1.01
.co.uk str (fast): 1.056
.co.uk regex (correct): 1.1
.co.uk str (correct): 1.127
multiple subdomain str: 1.282
multiple subdomain explode: 1.305
Here, the difference between times is so small that it wasn't unusual for the rankings to change between runs. The fast .co.uk regex, for example, often beat the basic multiple subdomain regex. Thus, the exact implementation shouldn't have a noticeable impact on speed. Instead, pick one based on simplicity and clarity. As long as you don't need to handle .co.uk domains, that would be the multiple subdomain regex approach.

You have to strip off the subdomain part yourself; there is no built-in function for this.
// $domain being www.w3schools.com
$domain = implode('.', array_slice(explode('.', $domain), -2));
The above example also works for subdomains of unlimited depth, as it always returns the last two domain parts (domain and top-level domain).
If you only want to strip off www., you can simply use str_replace(), which will be faster:
$domain = str_replace('www.', '', $domain);

You need to strip off any characters before the first occurrence of the [.] character (along with the [.] itself) if and only if there is more than one occurrence of [.] in the returned string.
For example, if the returned string is www-139.in.ibm.com, then the regular expression should return in.ibm.com, since that would be the domain.
If the returned string is music.domain.com, then the regular expression should return domain.com.
In rare cases you can access the site without the server prefix, that is, using http://domain.com/pageurl. In this case you would get the domain directly as domain.com, and the regex should not strip anything.
IMO this should be the pseudo-logic of the regex. If you want, I can write a regex for you that covers these cases.
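The rule described above can also be written without a regex. Here is a sketch with a hypothetical function name. Like the rule itself it is a heuristic, and it will mishandle hosts such as example.co.uk, where the first label is already the registered name.

```php
<?php
// Hypothetical helper: drop the first label only when the host has three or more labels.
function stripFirstLabel(string $host): string
{
    // Fewer than two dots means the host is already of the form domain.tld.
    if (substr_count($host, '.') < 2) {
        return $host;
    }
    // Remove everything up to and including the first dot.
    return substr($host, strpos($host, '.') + 1);
}

echo stripFirstLabel('www-139.in.ibm.com'), PHP_EOL; // in.ibm.com
echo stripFirstLabel('music.domain.com'), PHP_EOL;   // domain.com
echo stripFirstLabel('domain.com'), PHP_EOL;         // domain.com
```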
