subdomain extract without regex with php


Is there a good way to extract the subdomain in PHP without a regex?
Why without a regex?
There are a lot of topics about this; one of them is Find out subdomain using Regular Expression in PHP.
The internet says regexes consume a lot of memory. If there is anything else to consider, or you think a regex is better (maybe we would need a lot of functions to get this solution without one), please comment below too.
Examples:
static.x.com = 'static'
helloworld.x.com = 'helloworld'
b.static.ak.x.com = 'b.static.ak'
x.com = ''
www.x.com = ''
Thanks for looking in.
Adam Ramadhan

http://php.net/explode ?
Just split them on the dot? And do some functions?
Or, if the last part (x.com) is the same every time, take a substring of the hostname, stripping off the last part.
The only exception you'll have to handle is www.x.com (which technically is a subdomain).
$hostname = '....';
$baseHost = 'x.com';
// Strip the base host, then the dot that separated it from the subdomain
$subdomain = rtrim(substr($hostname, 0, -strlen($baseHost)), '.');
if ($subdomain === 'www') {
    $subdomain = '';
}
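The substring approach above, wrapped in a self-contained function and run against the question's examples. This is a sketch: the function name subdomainFor is hypothetical, it assumes every hostname ends with the same base host (x.com), and rtrim() drops the dot left between the subdomain and the base host.

```php
<?php
// Hypothetical helper: assumes $hostname always ends with $baseHost.
function subdomainFor(string $hostname, string $baseHost = 'x.com'): string
{
    // Cut the base host off the end, then drop the dot that separated the two parts.
    $subdomain = rtrim(substr($hostname, 0, -strlen($baseHost)), '.');
    // Treat "www" as "no subdomain", as the question asks.
    return $subdomain === 'www' ? '' : $subdomain;
}

foreach (['static.x.com', 'helloworld.x.com', 'b.static.ak.x.com', 'x.com', 'www.x.com'] as $host) {
    echo $host, ' => ', var_export(subdomainFor($host), true), "\n";
}
```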

Whoever told you that regexes "consume a lot" was wrong: simple regexes are not very CPU- or memory-intensive.
However, for your purpose a regex is clearly overkill. You can explode() the string and then take as many elements from the resulting array as you need. That said, your last example is questionable: www is a perfectly valid subdomain.

You can first use parse_url (http://www.php.net/manual/de/function.parse-url.php) and then explode the host with . as the delimiter (http://www.php.net/manual/de/function.explode.php).
I would not say it is quicker (just test it), but this solution may be cleaner.

function getSubdomain($host) {
    return implode('.', explode('.', $host, -2));
}
explode() with a negative limit of -2 splits the string on the dot and drops the last two elements. implode() then joins the remaining pieces back together using the dot as separator.
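Combining both steps (parse_url() for the host, then the negative explode() limit) might look like the sketch below. The wrapper name subdomainFromUrl is hypothetical. A limit of -2 assumes the base domain is exactly two labels, so something like co.uk would be mis-handled, and www comes back as an ordinary subdomain.

```php
<?php
function getSubdomain($host) {
    // Negative limit: return every piece except the last two (the "x.com" part).
    return implode('.', explode('.', $host, -2));
}

// Hypothetical wrapper: accept a full URL and extract its host first.
function subdomainFromUrl(string $url): string
{
    $host = parse_url($url, PHP_URL_HOST);
    // parse_url() gives null for the host when there is none (e.g. no scheme).
    return is_string($host) ? getSubdomain($host) : '';
}

echo subdomainFromUrl('https://b.static.ak.x.com/some/page?x=1'), "\n";
```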

Related

Getting base domain name php

function getHost($Address) {
    $parseUrl = parse_url(trim($Address));
    // Use the parsed host if there is one; otherwise (no scheme) take the path up to the first '/'
    return trim(isset($parseUrl['host'])
        ? $parseUrl['host']
        : explode('/', $parseUrl['path'], 2)[0]
    );
}
$httpreferer = getHost($_SERVER['HTTP_REFERER']);
$httpreferer = preg_replace('#^(http(s)?://)?w{3}\.#', '$1', $httpreferer);
echo $httpreferer;
I am using this to strip http://, www and subdomains so it returns just the host; however, it returns the following:
http://site.google.com ==> google.com
http://google.com ==> com
How do I get it to remove the subdomain only when one exists, instead of stripping down to the TLD when it doesn't?

Start with parse_url, specifically parse_url($url)['host']:
$arr = parse_url($url);
echo preg_replace('/^www\./', '', $arr['host'])."\n";
Output
site.google.com
google.com
The regex just matches www. at the start of the string; you could probably do this part in a few other ways.
No subdomain
If you don't want any subdomain at all:
$host = parse_url($url)['host'];
echo preg_replace('/^(?:[-a-z0-9_]+\.)?([-a-z0-9_]+\..+)$/', '$1', $host)."\n";
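A quick way to check that pattern against the two hosts from the question. The helper name stripSubdomain is hypothetical. Note that the optional group removes at most one leading label, so a.b.google.com would keep b.

```php
<?php
// Hypothetical helper wrapping the "No subdomain" pattern above.
function stripSubdomain(string $url): string
{
    $host = parse_url($url, PHP_URL_HOST);
    // The optional (?:...\.)? group eats one leading label when a "name.tld" part still remains after it.
    return preg_replace('/^(?:[-a-z0-9_]+\.)?([-a-z0-9_]+\..+)$/', '$1', $host);
}

echo stripSubdomain('http://site.google.com'), "\n"; // google.com
echo stripSubdomain('http://google.com'), "\n";      // google.com
```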
No subdomain, no Country Code
$host = parse_url($url)['host'];
echo preg_replace('/^(?:[-a-z0-9_]+\.)?([-a-z0-9_]+)(\.[^.]+).*?$/', '$1$2', $host)."\n";
How it works:
Same as the previous one, but the domain name is separated from the rest of the host: instead of capturing everything, the first group captures the name up to the next "." and the second captures the "." plus the following segment. Outside the new groups, .*? matches whatever is left (confusingly, "." means "any character" here): * means 0 or more times, and ? makes it non-greedy, so it doesn't take characters from the previous expressions.
Put another way: match anything 0 or more times, without stealing characters from the previous matches. So when there is nothing after .com, as in www.google.com, it matches 0 characters; for www.google.com.uk it matches the .uk.
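Running the country-code variant on a few hosts shows the behaviour described above. The helper name baseDomain is hypothetical. Because the pattern keeps only the two labels after an optional leading one, a host with no subdomain and a two-part suffix, such as google.co.uk, comes out as co.uk.

```php
<?php
// Hypothetical helper wrapping the "No subdomain, no Country Code" pattern.
function baseDomain(string $host): string
{
    // Optional leading label, then "name" ($1) and ".tld" ($2); anything after is dropped.
    return preg_replace('/^(?:[-a-z0-9_]+\.)?([-a-z0-9_]+)(\.[^.]+).*?$/', '$1$2', $host);
}

foreach (['www.google.com', 'www.google.com.uk', 'google.com', 'google.co.uk'] as $host) {
    echo $host, ' => ', baseDomain($host), "\n";
}
```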
Single Line Answer.
PHP 5.4 and newer let you dereference a function's return value directly, so you can do this:
$host = parse_url($url)['host'];
So taking the last example we can compress that into one line and remove the variable assignment.
echo preg_replace('/^(?:[-a-z0-9_]+\.)?([-a-z0-9_]+)(\.[^.]+).*?$/', '$1$2',parse_url($url)['host'])."\n";
That was just for fun!
Summary
Using parse_url is really the "correct" way to do it, or at least the proper way to start, as it removes a lot of the other "stuff" and gives you a good starting place. Anyway, this was fun for me... :) And I needed a break from coding my website, because it's tedious for me now (it was 8 years old, so I'm redoing it in WordPress, and I've done about a zillion WordPress sites)...
Cheers, hope it helps!

Found the Answer
$testAdd = "https://testing.google.co.uk";
$parse = parse_url($testAdd);
$httpreferer = preg_replace("/^([a-zA-Z0-9].*\.)?([a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]\.[a-zA-Z.]{2,})$/", '$2', $parse['host']);
echo $httpreferer;
This also handles domains with a country-code TLD.
Thanks for all your help.
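Wrapped in a function, the same pattern can be checked against hosts with and without a subdomain. The name registrableHost is hypothetical. The captured name must be at least three characters long, which is what stops co.uk from being returned for testing.google.co.uk; a three-letter second-level part such as com.au is not protected the same way.

```php
<?php
// Hypothetical helper wrapping the pattern from the answer above.
function registrableHost(string $url): string
{
    $host = parse_url($url, PHP_URL_HOST);
    // $1: optional prefix ending in a dot; $2: "name.tld" where name has 3+ characters.
    return preg_replace(
        '/^([a-zA-Z0-9].*\.)?([a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]\.[a-zA-Z.]{2,})$/',
        '$2',
        $host
    );
}

echo registrableHost('https://testing.google.co.uk'), "\n";
```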

Extraction of filename

Let's say I have a string like:
Cant_Hold_Us_-_Fingerstyle_Guitar_0.mp3
How can I get rid of _0.mp3 dynamically with PHP? The values are not hard-coded; this is just an example. Maybe if I use explode?

$old = "your file name here";
$new = substr($old, 0, strrpos($old, "_"));
The substr function takes only the part of the string that you want.
The strrpos function finds the last index of the underscore and passes that number to substr so that it knows how far to cut. This approach is ideal because, rather than hard-coding a specific value, the script adapts to each file, just as you suggested.
Just a word of warning: if the file name does not contain an underscore, strrpos returns false, which substr treats as a length of 0, so you get an empty string back instead of the file name. A good practice is to check if (strrpos($old, "_") !== false) first.
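The check suggested above can be folded into a small helper. This is a sketch: the name stripAfterLastUnderscore is hypothetical, and returning the name unchanged when there is no underscore is an assumed choice.

```php
<?php
// Hypothetical helper: cut everything from the last underscore onwards.
function stripAfterLastUnderscore(string $name): string
{
    $pos = strrpos($name, '_');
    // strrpos() returns false when there is no underscore; keep the name unchanged then.
    return $pos === false ? $name : substr($name, 0, $pos);
}

echo stripAfterLastUnderscore('Cant_Hold_Us_-_Fingerstyle_Guitar_0.mp3'), "\n"; // Cant_Hold_Us_-_Fingerstyle_Guitar
echo stripAfterLastUnderscore('NoUnderscore.mp3'), "\n";                        // NoUnderscore.mp3
```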

I like Confiqure's answer. To complement, you could also use a regular expression if you find that you need more power.
$old = "Cant_Hold_Us_-_Fingerstyle_Guitar_0.mp3";
$new = preg_replace('/_\d+\.mp3$/', '', $old);
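Unlike the strrpos() version, this pattern only fires when the name actually ends in an underscore, digits and .mp3, so other names pass through untouched. A small check, where the helper name stripNumberedMp3 and the file name Some_Song.mp3 are made up for illustration:

```php
<?php
// Hypothetical helper wrapping the pattern above.
function stripNumberedMp3(string $old): string
{
    // Remove a trailing "_<digits>.mp3"; names without it come back unchanged.
    return preg_replace('/_\d+\.mp3$/', '', $old);
}

foreach (['Cant_Hold_Us_-_Fingerstyle_Guitar_0.mp3', 'blablabla_21.mp3', 'Some_Song.mp3'] as $name) {
    echo stripNumberedMp3($name), "\n";
}
```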

Regex is your friend. Have a look at preg_match and / or preg_replace.
$title = preg_replace("/_[0-9]+\.mp3$/i", "", $x);
This works for blablabla_21.mp3 as well.

PHP Regex to remove everything after a character

I've seen a couple of articles that go a little too deep, so I'm not sure what to remove from the regexes they use.
I've basically got this
foo:bar all the way to anotherfoo:bar;seg98y34g.?sdebvw h segvu (anything goes really)
I need a PHP regex to remove EVERYTHING after the colon. The first part can be any length (but it never contains a colon), so in both cases above I'd end up with
foo and anotherfoo
after doing something like this horrendous example of pseudo-code:
$string = 'foo:bar';
$newstring = regex_to_remove_everything_after_":"($string);
EDIT
after posting this, would an explode() work reliably enough? Something like
$pieces = explode(':', 'foo:bar');
$newstring = $pieces[0];

explode would do what you're asking for, but you can make it one step by using current.
$beforeColon = current(explode(':', $string));
I would not use a regex here (that involves some work behind the scenes for a relatively simple action), nor would I use strpos with substr (as that would, effectively, be traversing the string twice). Most importantly, this provides the person who reads the code with an immediate, "Ah, yes, that is what the author is trying to do!" instead of, "Wait, what is happening again?"
The only exception is if you happen to know that the string is excessively long: I would not explode a 1 GB file. Instead:
$beforeColon = substr($string, 0, strpos($string,':'));
I also feel substr isn't quite as easy to read: in current(explode you can see the delimiter immediately with no extra function calls, and there is only one occurrence of the variable (which makes it less prone to human error). Basically I read current(explode as "I am taking everything before the first occurrence of this string", as opposed to substr, which is "I am getting a substring starting at position 0 and continuing until this string."
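The two approaches differ when the string has no colon at all: explode() returns the whole string as its only element, whereas strpos() returns false and substr() then treats it as a length of 0. strstr() with its third argument set to true is another built-in option; it returns false when the needle is missing. A small comparison (the function name beforeColonThreeWays is hypothetical):

```php
<?php
// Returns the part before the first ':' computed three ways, for comparison.
function beforeColonThreeWays(string $string): array
{
    return [
        'explode' => current(explode(':', $string)),           // whole string if there is no ':'
        'substr'  => substr($string, 0, strpos($string, ':')), // '' if no ':' (false is treated as length 0)
        'strstr'  => strstr($string, ':', true),               // false if no ':'
    ];
}

var_dump(beforeColonThreeWays('anotherfoo:bar;seg98y34g.?sdebvw h segvu'));
var_dump(beforeColonThreeWays('nocolon'));
```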

Your explode solution does the trick. If you really want to use regexes for some reason, you could simply do this:
$newstring = preg_replace("/(.*?):(.*)/", "$1", $string);

A bit more succinct than other examples:
current(explode(':', $string));

You can use the regex m.buettner wrote, but his example returns everything BEFORE ':'. If you want everything after ':', just use $2 instead of $1:
$newstring = preg_replace("/(.*?):(.*)/", "$2", $string);
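If you need both halves without a regex, explode() with a limit of 2 splits only at the first colon, so any later colons stay in the second part. A sketch using the question's string with an extra colon added to show the limit; the + [1 => ''] padding, which supplies an empty second part when there is no colon, is an assumed default.

```php
<?php
$s = 'anotherfoo:bar;seg98y34g.?sdebvw h:segvu';

// Limit 2: split at the first ':' only; later colons stay in the second element.
[$before, $after] = explode(':', $s, 2) + [1 => ''];

echo $before, "\n"; // anotherfoo
echo $after, "\n";  // bar;seg98y34g.?sdebvw h:segvu
```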

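You could use something like the following. Demo: http://codepad.org/bUXKN4el
<?php
$s = 'anotherfoo:bar;seg98y34g.?sdebvw h segvu';
// array_shift() takes its argument by reference, so store the exploded array in a variable first
$parts = explode(':', $s);
$result = array_shift($parts);
echo $result;
?>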

Why do you want to use a regex?
list($beforeColon) = explode(':', $string);

PHP Regex on URL - split into variables

I am trying to implement a php script which will run on every call to my site, look for a certain pattern of URL, then explode the URL and perform a redirect.
Basically I want to run this on a new CMS to catch all incoming links from the old CMS and redirect them based on a mapping, say from an article ID stripped from the URL to the same article ID imported into the new CMS's DB.
I can do the implementation, the redirect etc, but I am lost on the regex.
I need to catch any occurrences of:
domain.com/content/view/*/34/ or domain.com/content/view/*/30/ (where * is a wildcard) and capture * and the 30 or 34 in a variable which I will then use in a DB query.
If the following is encountered:
domain.com/content/view/*/34/1/*/
I need to capture the first * and the second *.
I'd be very grateful to anyone who can give me a hand with this.

I'm not sure regular expressions are the way to go. I think it would probably be easier to use explode ('/' , $url) and check by looping over that array.
Here are the steps I would follow:
$url = parse_url($url, PHP_URL_PATH);
$url = trim($url, '/');
$parts = explode ('/' , $url);
Then you can check if
($parts[0]=='content' && $parts[1]=='view' && $parts[3]=='34')
You can also easily get the information you want with $parts[2].
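Putting those steps together, a sketch that returns the captured segments, or null when the path does not look like an old-CMS link. The function name matchOldCmsPath and its return shape are hypothetical; the 30/34 values and the 1 segment come from the question.

```php
<?php
// Hypothetical helper: returns the wildcard segments for the old-CMS URL shapes, or null.
function matchOldCmsPath(string $url): ?array
{
    $path  = trim((string) parse_url($url, PHP_URL_PATH), '/');
    $parts = explode('/', $path);

    // Shape: content/view/{*}/{34|30}[/1/{*}]
    if (count($parts) < 4 || $parts[0] !== 'content' || $parts[1] !== 'view'
        || !in_array($parts[3], ['30', '34'], true)) {
        return null;
    }

    if (count($parts) === 4) {
        return [$parts[2], $parts[3]]; // first * and the 30/34 value
    }
    if (count($parts) === 6 && $parts[4] === '1') {
        return [$parts[2], $parts[5]]; // first * and second *
    }
    return null;
}

var_dump(matchOldCmsPath('http://domain.com/content/view/123/34/'));
```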

It's actually very simple: a more flexible and straightforward approach is to explode() the URL into an array called something like $segments and then test against that. If you have a very small number of expected URLs, this kind of approach is probably easier to maintain and to read.
I wouldn't recommend doing this in the htaccess file because of the performance overhead.

First, I would use the PHP function parse_url() to get the path, devoid of any protocol or hostname.
Once you have that the following code should get you the info you need.
<?php
$url = 'http://domain.com/content/view/*/34/'; // first example
$url = 'http://domain.com/content/view/*/34/1/*/'; // second example (overwrites the first; keep only one to test each)
$url_array = parse_url($url);
$path = $url_array['path'];
// Match the URL against regular expressions
if (preg_match('/content\/view\/([^\/]+)\/([0-9]+)\//i', $path, $matches)){
print_r($matches);
}
if (preg_match('/content\/view\/([^\/]+)\/([0-9]+)\/([0-9]+)\/([^\/]+)/i', $path, $matches)){
print_r($matches);
}
?>
([^/]+) matches any sequence of characters except a forward slash
([0-9]+) matches any sequence of digits
Though you can probably write a single regular expression to match most URL variants, consider using multiple regular expressions to check for different types of URLs. Depending on how much traffic you get, the speed hit won't be all that terrible.
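If you do want a single expression, making the trailing part optional is one way to fold both URL shapes together. A sketch operating on the path from parse_url(); the function name matchOldCmsPathRegex and the array keys are hypothetical.

```php
<?php
// Hypothetical single-pattern variant: the trailing "{digits}/{*}/" part is optional.
function matchOldCmsPathRegex(string $path): ?array
{
    if (!preg_match('#^/content/view/([^/]+)/(\d+)/(?:\d+/([^/]+)/?)?$#', $path, $m)) {
        return null;
    }
    // $m[3] is only set when the optional trailing group matched.
    return ['id' => $m[1], 'section' => $m[2], 'extra' => $m[3] ?? null];
}

var_dump(matchOldCmsPathRegex('/content/view/123/34/1/456/'));
```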
Also, I recommend reading Mastering Regular Expressions (Jeffrey Friedl, O'Reilly). A good knowledge of regular expressions will come in handy quite often.
http://www.regular-expressions.info/php.html

Regular expression to extract from URI

I need a regular expression to extract from two types of URIs
http://example.com/path/to/page/?filter
http://example.com/path/to/?filter
Basically, in both cases I need to somehow isolate and return
/path/to
and
?filter
That is, both /path/to and filter are arbitrary. So I suppose I need 2 regular expressions for this? I am doing this in PHP, but if someone could help me out with the regular expressions I can figure out the rest. Thanks for your time :)
EDIT: Just to clarify, for example with
http://example.com/help/faq/?sort=latest
I want to get /help/faq and ?sort=latest
Another example
http://example.com/site/users/all/page/?filter=none&status=2
I want to get /site/users/all and ?filter=none&status=2. Note that I do not want to get the page!

Using parse_url might be easier and have fewer side effects than a regex:
$querystring = parse_url($url, PHP_URL_QUERY);
$path = parse_url($url, PHP_URL_PATH);
You could then use explode on the path to get the first two segments:
$segments = explode("/", $path);
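A fuller sketch of the parse_url() approach that reproduces the question's expected outputs. The function name splitUri is hypothetical, and so is the rule for dropping the last segment: it assumes the segment to discard is literally named page, as in the question's examples.

```php
<?php
// Hypothetical helper: returns [path without a trailing "page" segment, "?query"].
function splitUri(string $url): array
{
    $path  = (string) parse_url($url, PHP_URL_PATH);
    $query = parse_url($url, PHP_URL_QUERY);

    $segments = explode('/', trim($path, '/'));
    // Assumption: the segment to drop is literally named "page".
    if (end($segments) === 'page') {
        array_pop($segments);
    }

    return ['/' . implode('/', $segments), $query === null ? '' : '?' . $query];
}

var_dump(splitUri('http://example.com/site/users/all/page/?filter=none&status=2'));
```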

Try this:
^http://[^/?#]+/([^/?#]+/[^/?#]+)[^?#]*\?([^#]*)
This will get you the first two URL path segments and query.
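Used from PHP, that pattern needs a delimiter that does not clash with the / and # inside it; ~ is used below. The captures exclude the leading slash and the ?, and only the first two segments are taken, so /site/users/all/page/ yields site/users. The function name firstTwoAndQuery is hypothetical.

```php
<?php
// Hypothetical helper: returns [first two path segments, query string] or null.
function firstTwoAndQuery(string $url): ?array
{
    // ~ is the delimiter because the pattern itself contains / and #.
    if (!preg_match('~^http://[^/?#]+/([^/?#]+/[^/?#]+)[^?#]*\?([^#]*)~', $url, $m)) {
        return null;
    }
    return [$m[1], $m[2]];
}

var_dump(firstTwoAndQuery('http://example.com/help/faq/?sort=latest'));
```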

Not tested, but:
^https?://[^ /]+[^ ?]+.*
which should match http and https URLs with or without a path; the second part should match up to the ? (of ?filter, for instance), and .* matches any character except \n.

Have you considered using explode() instead (http://nl2.php.net/manual/en/function.explode.php)? The task seems simple enough for it. You would need 2 calls (one for the / and one for the ?), but it should be quite simple once you've done that.
