parse string for subdomain in php - php

How can I tell whether a string contains a subdomain when it has no scheme (and possibly no host)?
e.g. $url="sub.main.com/images/sample.jpg";
I am trying to parse URLs for images, and I am using parse_url for most cases.
But the URL strings can come in different flavors,
e.g.:
/images/sample.jpg
//main.com/images/sample.jpg
images/sample.jpg
etc., so I am trying to address the different cases one by one. Right now, I am finding it hard to detect whether a string has a subdomain or not.
So for a string such as $url="sub.main.com/images/sample.jpg"; I would like to extract the subdomain, and for a string such as images/sample.jpg, I would like to find out that there is no subdomain.

Interesting problem. I've fiddled around with this for a while; this method inevitably isn't perfect, but it may start you down the right path.
My solution begins with the two source files in this repository: https://github.com/usrflo/registered-domain-libs/tree/master/PHP
First, you may need to modify regDomain.inc.php, changing $signingDomainParts = split('\.', $signingDomain); to $signingDomainParts = preg_split('/\./', $signingDomain); because split() was deprecated in PHP 5.3 and removed in PHP 7.
Once you've got those saved, try this testing code. I put all of the URLs mentioned in the thread here as test cases:
<?php
require_once("effectiveTLDs.inc.php");
require_once("regDomain.inc.php");
$tests = Array("/images/sample.jpg","//main.com/images/sample.jpg","images/sample.jpg", "sub.main.com/images/sample.jpg", "http://www.example.com/www.google.com/sample.jpg", "amazon.co.uk/images/sample.jpg", "amazon.com/images/sample.jpg", "http://sub2.sub.main.co.uk/images/sample.jpg", "sub2.sub.main.co.uk/images/sample.jpg");
foreach($tests as $test)
{
echo "Attempting $test.<BR/>";
$one = parse_url($test);
if(!array_key_exists("host", $one))
{
echo "Converting to: http://$test";
echo "<BR/>";
$one = parse_url("http://$test");
}
if(!$one){echo "<BR/>";continue;}
echo "parse_url parts: ";
print_r($one);
echo "<BR/>";
if($one && array_key_exists("host", $one))
{
$domain = getRegisteredDomain($one["host"], $tldTree);
if(!empty($domain)) // $domain is a string here, not an array, so test for emptiness instead of sizeof()
{
$two = explode(".", $domain);
echo "domain parts: ";
print_r($two);
echo "<BR/>";
if(sizeof($two))
{
$three = array_diff(explode(".", $one["host"]), $two);
if(sizeof($three))
{
echo "Hark! A subdomain!: ";
print_r($three);
echo "<BR/>";
}
}
}
}
echo "<BR/>";
}
?>
This code identifies the following of the test-cases as having subdomains:
Attempting sub.main.com/images/sample.jpg.
Hark! A subdomain!: Array ( [0] => sub )
Attempting http://www.example.com/www.google.com/sample.jpg.
Hark! A subdomain!: Array ( [0] => www )
Attempting http://sub2.sub.main.co.uk/images/sample.jpg.
Hark! A subdomain!: Array ( [0] => sub2 [1] => sub )
Attempting sub2.sub.main.co.uk/images/sample.jpg.
Hark! A subdomain!: Array ( [0] => sub2 [1] => sub )
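One caveat with the array_diff() step above: array_diff() compares by value, so a host such as main.main.com would lose its subdomain label because "main" also appears in the registered domain. A minimal sketch of a positional alternative follows; subdomainLabels() is a hypothetical helper name, and it assumes you already have the registered domain as a string (for example from getRegisteredDomain()).

```php
<?php
// Hypothetical helper: returns the labels of $host that precede $registeredDomain.
// Assumes $registeredDomain really is a suffix of $host.
function subdomainLabels(string $host, string $registeredDomain): array
{
    $hostLabels   = explode('.', $host);
    $domainLabels = explode('.', $registeredDomain);

    // Everything in front of the registered domain is subdomain.
    $extra = count($hostLabels) - count($domainLabels);

    return $extra > 0 ? array_slice($hostLabels, 0, $extra) : [];
}

print_r(subdomainLabels('sub2.sub.main.co.uk', 'main.co.uk'));
print_r(subdomainLabels('main.main.com', 'main.com'));
```

Slicing by position keeps repeated labels intact, which the value-based comparison cannot do.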

Try this code
<?php
$url = 'sub.main.com/images/sample.jpg';
$arr = explode('/',$url);
$domain = $arr[0];
$string = $arr[1];
$arr2 = explode('.',$domain);
if(count($arr2)>2) {
$subdomain = $arr2[0];
echo $subdomain;
}
?>

<?php
$url = 'http://sub.main.com/images/sample.jpg';
$arr = explode('/',$url);
$pieces = parse_url($url);
$domain = isset($pieces['host']) ? $pieces['host'] : '';
if (preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $domain, $regs))
{
$main_domain=$regs['domain'];
}
$host=$pieces['host'];
$path=$pieces['path'];
if($host != $main_domain)
{
$arr2 = explode('.',$host);
$subdomain = $arr2[0];
echo $subdomain;
}
$string=substr($path,1,strlen($path));
?>

Try the following:
<?php
$url="sub.main.com/images/sample.jpg";
preg_match('#^(?:https?://)?([^./]+)\.([^/]+)#i',$url, $hits);
print_r($hits);
?>
This should output something like:
Array ( [0] => sub.main.com [1] => sub [2] => main.com )
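Putting the ideas from the answers together, here is a rough sketch of how the different "flavors" from the question could be sorted out before looking for a subdomain. guessHost() and naiveSubdomain() are hypothetical names, and the rule "the first path segment is a host if it contains a dot" is an assumption that will misfire on inputs like sample.jpg on its own. The label counting is also naive: it gets co.uk style suffixes wrong, which is why the first answer uses a public suffix list.

```php
<?php
// Hypothetical helper: returns the host of a possibly scheme-less URL, or null.
function guessHost(string $url): ?string
{
    // Full and protocol-relative URLs ("//main.com/...") are handled by parse_url.
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $url) || strpos($url, '//') === 0) {
        $host = parse_url($url, PHP_URL_HOST);
        return is_string($host) ? $host : null;
    }
    // Absolute paths ("/images/...") and empty strings have no host.
    if ($url === '' || $url[0] === '/') {
        return null;
    }
    // Assumption: a first segment containing a dot is a host name.
    $first = explode('/', $url, 2)[0];
    return strpos($first, '.') === false ? null : $first;
}

// Naive: treats the last two labels as the registered domain (wrong for co.uk etc.).
function naiveSubdomain(?string $host): ?string
{
    if ($host === null) {
        return null;
    }
    $labels = explode('.', $host);
    return count($labels) > 2 ? implode('.', array_slice($labels, 0, -2)) : null;
}

foreach (['/images/sample.jpg', '//main.com/images/sample.jpg', 'images/sample.jpg', 'sub.main.com/images/sample.jpg'] as $u) {
    var_dump(naiveSubdomain(guessHost($u)));
}
```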

Related

Removing first part of string?

I am trying to edit the URLs in an array:
[0] => https://www.proud-web.jp/mansion/b115110/https://www.proud-web.jp/module/structure/outline/BukkenOutline.xphp?code_no=011244
[1] => https://www.proud-web.jp/mansion/p-ebisuminami88/https://www.proud-web.jp/module/structure/outline/BukkenOutline.xphp?code_no=011205
As you can see, each entry contains two URLs. I am trying to remove the first URL and keep the second.
The expected result is:
https://www.proud-web.jp/module/structure/outline/BukkenOutline.xphp?code_no=011244
https://www.proud-web.jp/module/structure/outline/BukkenOutline.xphp?code_no=011205
What I tried is right below. That way I can only remove the second URL. How can I fix this code so that it removes the first URL in the string instead of the second?
$result = [];
foreach($setLinks as $key) {
array_push($result, current(explode("/h", $key)));
}
You can use foreach together with explode to split each string on /https. Below is the code:
$array = ['https://www.proud-web.jp/mansion/b115110/https://www.proud-web.jp/module/structure/outline/BukkenOutline.xphp?code_no=011244','https://www.proud-web.jp/mansion/p-ebisuminami88/https://www.proud-web.jp/module/structure/outline/BukkenOutline.xphp?code_no=011205'];
$result = [];
foreach($array as $arr){
$getUrl = explode('/https', $arr);
array_push($result, 'https' . $getUrl[1]);
}
print_r($result);
I would separate the task into 3 subtasks.
The first is to capture the protocol using a regex (the protocol of that URL could be https, http, ftp ...).
Then, capture the URL itself by splitting the string using :// as the delimiter.
Finally, rebuild protocol . "://" . url
For example:
<?php
$array =
[
'https://www.proud-web.jp/mansion/b115110/https://www.proud-web.jp/module/structure/outline/BukkenOutline.xphp?code_no=011244',
'https://www.proud-web.jp/mansion/p-ebisuminami88/https://www.proud-web.jp/module/structure/outline/BukkenOutline.xphp?code_no=011205',
'http://www.example.com/home/http://something',
'http://www.example.com/https/ftp://something',
'https://nothing.to.capture'
];
$result = array();
/*
* matches a slash -> \/
* followed by letters -> ([a-z]*)
* followed by :// -> :\/\/
* and capture the letters -> (the parenthesis)
* it can match, for example: something/mycustomprotocol://somethingelse
*/
$pattern = "/\/([a-z]*):\/\//i";
foreach($array as $item) {
preg_match_all($pattern, $item, $matches);
if (!empty($matches[1])) // preg_match_all always fills $matches, so check the captured group instead
{
$urls = explode("://", $item, 3);
if (count($urls) > 2)
{
$protocol = $matches[1][0];
$result[] = $protocol . "://" . $urls[2];
}
}
}
var_dump($result);
Output
array(4) {
[0]=>
string(83) "https://www.proud-web.jp/module/structure/outline/BukkenOutline.xphp?code_no=011244"
[1]=>
string(83) "https://www.proud-web.jp/module/structure/outline/BukkenOutline.xphp?code_no=011205"
[2]=>
string(16) "http://something"
[3]=>
string(15) "ftp://something"
}
try this
unset($setLinks[0]);
foreach($setLinks as $key) {
echo $key;
}
This is what I would do:
$urls = [
'https://www.proud-web.jp/mansion/b115110/https://www.proud-web.jp/module/structure/outline/BukkenOutline.xphp?code_no=011244',
'https://www.proud-web.jp/mansion/p-ebisuminami88/https://www.proud-web.jp/module/structure/outline/BukkenOutline.xphp?code_no=011205'
];
foreach($urls as &$url){
$url = 'http'.preg_split('/^.+?\/http/', $url, 2, PREG_SPLIT_NO_EMPTY)[0];
}
print_r($urls);
Output
Array
(
[0] => https://www.proud-web.jp/module/structure/outline/BukkenOutline.xphp?code_no=011244
[1] => https://www.proud-web.jp/module/structure/outline/BukkenOutline.xphp?code_no=011205
)
I set it up so that it would handle both HTTP and HTTPS
You could use preg_replace to remove the leading text:
foreach ($setLinks as &$value) {
$value = preg_replace('#^.+(https?://.*)$#', '$1', $value);
}
print_r($setLinks);
Output:
Array (
[0] => https://www.proud-web.jp/module/structure/outline/BukkenOutline.xphp?code_no=011244
[1] => https://www.proud-web.jp/module/structure/outline/BukkenOutline.xphp?code_no=011205
)
I may have misunderstood your question, but kindly try this:
<?php $quest = array("https://www.proud-web.jp/mansion/b115110/https://www.proud-web.jp/module/structure/outline/BukkenOutline.xphp?code_no=011244",
"https://www.proud-web.jp/mansion/p-ebisuminami88/https://www.proud-web.jp/module/structure/outline/BukkenOutline.xphp?code_no=011205");
foreach($quest as $q )
{
$allquest = explode("/https",$q);
echo "https".$allquest[1];
}
?>
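Another way to look at it, sketched below, is to find where the last scheme starts and cut there. lastEmbeddedUrl() is a hypothetical helper name; the sketch assumes that anything shaped like letters followed by :// marks the start of a URL, and that a string with only one such marker should be returned unchanged.

```php
<?php
// Hypothetical helper: keeps only the last URL embedded in $s.
function lastEmbeddedUrl(string $s): string
{
    // Collect every "scheme://" occurrence together with its byte offset.
    $found = preg_match_all('#[a-z][a-z0-9+.-]*://#i', $s, $m, PREG_OFFSET_CAPTURE);

    if ($found && count($m[0]) > 1) {
        $last = end($m[0]);          // [matched text, offset]
        return substr($s, $last[1]); // cut at the start of the last scheme
    }
    return $s; // zero or one URL: nothing to strip
}

$setLinks = [
    'https://www.proud-web.jp/mansion/b115110/https://www.proud-web.jp/module/structure/outline/BukkenOutline.xphp?code_no=011244',
    'https://nothing.to.capture',
];
print_r(array_map('lastEmbeddedUrl', $setLinks));
```

Unlike splitting on /https, this does not depend on the second URL using a particular scheme.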

find regex in glob() search files in directory php

I'm searching files in a directory
PROBLEM
I need to retrieve files whose directory name matches the word entered
I need something like: glob (gifs/[like $varSearched]/*.gif)
Directory
gifs/hello/1.gif
gifs/hello/2.gif
gifs/hello/3.gif
gifs/claps/1.gif
gifs/claps/2.gif
gifs/wow/1.gif
gifs/wow/2.gif
PHP code (it works, but the full directory name, "hello", "wow" or "claps", has to be typed to retrieve the results). What I need is to type only one or two letters to retrieve the results
$dir="o"; // "o" is the searched term
$mdir = "../gifs/".$dir."/";
$files = glob($mdir.'*.gif');
foreach ($files as $gif){
$title = basename(dirname($gif));
$arr[] = $title." - ".$gif;
}
$arr= implode("",$arr);
echo $arr;
EXPECTED RESULTS
gifs/hello/1.gif
gifs/hello/2.gif
gifs/hello/3.gif
gifs/wow/1.gif
gifs/wow/2.gif
As with the file name, you can add * as a wildcard for any characters in the directory name:
$mdir = "../gifs/*".$dir."*/"; // see `*` here?
$files = glob($mdir.'*.gif');
// rest of the code here
You can also do this.
<?php
$search = "c";
$dir = "gifs/";
$folder = exec( "ls -d ".$dir.$search."*" );
$files = glob($folder.'/*.gif');
foreach ( $files as $gif ) {
$title = basename( dirname( $gif ) );
$arr[] = $title." - ".$gif;
}
print_r( $arr );
$arr = implode( "", $arr );
//echo $arr."\n";
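Replace $search = "c"; with whatever directory letter you want, and $dir = "gifs/"; with whatever your file path is. Note that exec() returns only the last line of the command's output, so this picks up a single matching directory.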
Print_r with search letter "c"
Array
(
[0] => claps - gifs/claps/1.gif
[1] => claps - gifs/claps/2.gif
)
Print_r with search letter "h"
Array
(
[0] => hello - gifs/hello/1.gif
[1] => hello - gifs/hello/2.gif
[2] => hello - gifs/hello/3.gif
)
You could also wrap this in a function and pass it the two arguments $search and $dir and add a return $arr at the end.
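As a sketch of that function idea without shelling out to ls, the wildcard approach can be wrapped like this. findGifs() is a hypothetical name, and stripping glob metacharacters from the search term is my own assumption about how user input should be treated, not something from the question.

```php
<?php
// Hypothetical helper: lists *.gif files in sub-directories of $base whose name contains $term.
function findGifs(string $base, string $term): array
{
    // Drop glob metacharacters so the term is matched literally.
    $safe = preg_replace('/[*?\[\]{}\\\\]/', '', $term);

    // "*term*" matches any directory name containing the term.
    $files = glob(rtrim($base, '/') . '/*' . $safe . '*/*.gif');

    return $files === false ? [] : $files;
}

// Usage with the layout from the question, e.g.:
// print_r(findGifs('../gifs', 'o'));
```

With the directory layout from the question, searching for "o" would match both hello and wow, since glob() sees every matching directory rather than just one.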

Break Path into multiple Paths

I have the following path for example:
/Test1/Test2/Test3
Sometimes this path can be for example:
/Test1/Test2/Test3/Test4/Test5 and so on...
What I would like to do is take this unknown path and translate it into sections which will ultimately result in a navigation URL such as:
/Test1
/Test1/Test2
/Test1/Test2/Test3
and so on...
It's difficult to supply you with any code examples because many of the things I have attempted have resulted in no good results.
I assume I need to explode() the path using / as the delimiter and then splice it together somehow. I'm really at a loss here.
Does anyone have any suggestions I can try?
<?php
$path = '/Test1/Test2/Test3/Test4/Test5';
$explode = explode('/', $path);
$count = count($explode);
$res = '';
for($i = 1; $i < $count; $i++) {
echo $res .= '/' . $explode[$i];
echo '<br/>';
}
Returns:
/Test1
/Test1/Test2
/Test1/Test2/Test3
/Test1/Test2/Test3/Test4
/Test1/Test2/Test3/Test4/Test5
Here is how you get your array segments:
$path = '/Test1/Test2/Test3/Test4/Test5'; // or whatever your path is
$segments = explode('/', ltrim($path, '/')); // ltrim takes the string first, then the characters to strip
If I understand you, then what you want to do is to build an array that is like
Array(
[0] => '/Test1'
[1] => '/Test1/Test2'
...
)
So you could just loop through your array and build up this new array
$paths_from_segments = array();
$segment_count = count($segments);
$path_string = '';
foreach($segments as $segment) {
$path_string .= '/' . $segment;
$paths_from_segments[] = $path_string;
}
var_dump($paths_from_segments);
I'm not exactly sure what you mean by "splice it together", but from the sounds of it you're looking for PHP's implode(), which is explode() in reverse.
explode("/", "test1/test2");
// result:
// Array
// (
// [0] => test1
// [1] => test2
// )
implode("/", Array("test1", "test2"));
// result:
// "test1/test2"
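For completeness, the explode-and-accumulate idea from the answers can be sketched as a small function that returns the navigation paths as an array. cumulativePaths() is a hypothetical name, and skipping empty segments (from leading, trailing or doubled slashes) is an assumption on my part.

```php
<?php
// Hypothetical helper: '/a/b/c' becomes ['/a', '/a/b', '/a/b/c'].
function cumulativePaths(string $path): array
{
    // Split on "/" and drop empty segments.
    $segments = array_filter(explode('/', $path), function ($s) {
        return $s !== '';
    });

    $paths   = [];
    $current = '';
    foreach ($segments as $segment) {
        $current .= '/' . $segment; // grow the path one segment at a time
        $paths[]  = $current;
    }
    return $paths;
}

print_r(cumulativePaths('/Test1/Test2/Test3'));
```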

Get part of array string

Hello, the output of my PHP code is:
Array ( [country] => BG - Bulgaria )
... and it comes from here:
<?php
$ip = $_SERVER['REMOTE_ADDR'];
print_r(geoCheckIP($ip));
//Array ( [domain] => dslb-094-219-040-096.pools.arcor-ip.net [country] => DE - Germany [state] => Hessen [town] => Erzhausen )
//Get an array with geoip-infodata
function geoCheckIP($ip)
{
//check, if the provided ip is valid
if(!filter_var($ip, FILTER_VALIDATE_IP))
{
throw new InvalidArgumentException("IP is not valid");
}
//contact ip-server
$response=@file_get_contents('http://www.netip.de/search?query='.$ip);
if (empty($response))
{
throw new InvalidArgumentException("Error contacting Geo-IP-Server");
}
//Array containing all regex-patterns necessary to extract ip-geoinfo from page
$patterns=array();
$patterns["country"] = '#Country: (.*?) #i';
//Array where results will be stored
$ipInfo=array();
//check response from ipserver for above patterns
foreach ($patterns as $key => $pattern)
{
//store the result in array
$ipInfo[$key] = preg_match($pattern,$response,$value) && !empty($value[1]) ? $value[1] : '';
}
return $ipInfo;
}
?>
How can I get ONLY the name of the country, in my case "Bulgaria"? I think it can be done with preg_replace or substr, but I don't know which is the better solution.
substr's probably easiest:
$bad_country = 'BG - Bulgaria';
$good_country = substr($bad_country, 5); // start at char 5, 'B'
If the country is always separated from the acronym by ' - ', do it like this:
list($acrn, $country) = explode(' - ', $var);
If you are guaranteed that the output will always be in the same format(ie BG - Bulgaria, US - United States, etc), you could use explode():
$array['country'] = "BG - Bulgaria";
$country = explode(" - ", $array['country']);
echo $country[1];
This will output "Bulgaria".
try:
foreach( $list as $v) {
$temp = explode(' - ', $v);
$countries[] = $temp[1];
}
$patterns["country"] = '#Country:.*-\s+(\w+?) #i';
try this one as your pattern
Change your pattern to this:
'#Country: [a-z]{2,} - (.*?) #i'
Assuming the pattern won't change
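A slightly more defensive variant of the explode() suggestions, as a sketch: countryName() is a hypothetical helper, and falling back to the whole string when there is no ' - ' separator is an assumption about what you would want in that case.

```php
<?php
// Hypothetical helper: 'BG - Bulgaria' becomes 'Bulgaria'.
function countryName(string $value): string
{
    // Limit of 2 keeps any later " - " inside the country name.
    $parts = explode(' - ', $value, 2);

    // No separator found: return the input as-is (trimmed).
    return trim($parts[1] ?? $parts[0]);
}

echo countryName('BG - Bulgaria'), "\n";
```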

How to remove the querystring and get only the URL?

I'm using PHP to build the URL of the current page. Sometimes, URLs in the form of
www.example.com/myurl.html?unwantedthngs
are requested. I want to remove the ? and everything that follows it (querystring), such that the resulting URL becomes:
www.example.com/myurl.html
My current code is this:
<?php
function curPageURL() {
$pageURL = 'http';
if ($_SERVER["HTTPS"] == "on") {
$pageURL .= "s";
}
$pageURL .= "://";
if ($_SERVER["SERVER_PORT"] != "80") {
$pageURL .= $_SERVER["SERVER_NAME"] . ":" .
$_SERVER["SERVER_PORT"] . $_SERVER["REQUEST_URI"];
} else {
$pageURL .= $_SERVER["SERVER_NAME"] . $_SERVER["REQUEST_URI"];
}
return $pageURL;
}
?>
You can use strtok to get the string before the first occurrence of ?
$url = strtok($_SERVER["REQUEST_URI"], '?');
strtok() is the most concise way to directly extract the substring before the ? in the URL. explode() is less direct because it must produce a (potentially two-element) array from which the first element is then accessed.
Some other techniques may break when the querystring is missing or potentially mutate other/unintended substrings in the url -- these techniques should be avoided.
A demonstration:
$urls = [
'www.example.com/myurl.html?unwantedthngs#hastag',
'www.example.com/myurl.html'
];
foreach ($urls as $url) {
var_export(['strtok: ', strtok($url, '?')]);
echo "\n";
var_export(['strstr/true: ', strstr($url, '?', true)]); // not reliable
echo "\n";
var_export(['explode/2: ', explode('?', $url, 2)[0]]); // limit allows func to stop searching after first encounter
echo "\n";
var_export(['substr/strrpos: ', substr($url, 0, strrpos( $url, "?"))]); // not reliable; still not with strpos()
echo "\n---\n";
}
Output:
array (
0 => 'strtok: ',
1 => 'www.example.com/myurl.html',
)
array (
0 => 'strstr/true: ',
1 => 'www.example.com/myurl.html',
)
array (
0 => 'explode/2: ',
1 => 'www.example.com/myurl.html',
)
array (
0 => 'substr/strrpos: ',
1 => 'www.example.com/myurl.html',
)
---
array (
0 => 'strtok: ',
1 => 'www.example.com/myurl.html',
)
array (
0 => 'strstr/true: ',
1 => false, // bad news
)
array (
0 => 'explode/2: ',
1 => 'www.example.com/myurl.html',
)
array (
0 => 'substr/strrpos: ',
1 => '', // bad news
)
---
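Two edge cases are worth knowing before relying on strtok() for arbitrary strings (they should not arise with $_SERVER["REQUEST_URI"], which starts with a slash): strtok() skips leading delimiters, and it returns false for an empty string. The inputs below are made up for illustration.

```php
<?php
// Hypothetical input that consists of a query string only.
$onlyQuery = '?page=2';

$viaStrtok  = strtok($onlyQuery, '?');        // leading "?" is skipped, so the query itself comes back
$viaExplode = explode('?', $onlyQuery, 2)[0]; // empty string: nothing precedes the "?"
$emptyInput = strtok('', '?');                // false, not an empty string

var_dump($viaStrtok, $viaExplode, $emptyInput);
```

If inputs like these are possible, the explode() form with a limit of 2 is the more predictable choice.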
Use parse_url() (see the PHP manual) to get the parts you need.
Edit (example usage for @Navi Gamage)
You can use it like this:
<?php
function reconstruct_url($url){
$url_parts = parse_url($url);
$constructed_url = $url_parts['scheme'] . '://' . $url_parts['host'] . $url_parts['path'];
return $constructed_url;
}
?>
Edit (second full example):
Updated function to make sure the scheme is attached and no notice messages appear:
function reconstruct_url($url){
$url_parts = parse_url($url);
$constructed_url = $url_parts['scheme'] . '://' . $url_parts['host'] . (isset($url_parts['path'])?$url_parts['path']:'');
return $constructed_url;
}
$test = array(
'http://www.example.com/myurl.html?unwan=abc',
'http://www.example.com/myurl.html',
'http://www.example.com',
'https://example.com/myurl.html?unwan=abc&ab=1'
);
foreach($test as $url){
print_r(parse_url($url));
}
Will return:
Array
(
[scheme] => http
[host] => www.example.com
[path] => /myurl.html
[query] => unwan=abc
)
Array
(
[scheme] => http
[host] => www.example.com
[path] => /myurl.html
)
Array
(
[scheme] => http
[host] => www.example.com
)
Array
(
[scheme] => https
[host] => example.com
[path] => /myurl.html
[query] => unwan=abc&ab=1
)
This is the output from passing example URLs through parse_url() with no second parameter (for explanation only).
And this is the final output after constructing URL using:
foreach($test as $url){
echo reconstruct_url($url) . '<br/>';
}
Output:
http://www.example.com/myurl.html
http://www.example.com/myurl.html
http://www.example.com
https://example.com/myurl.html
best solution:
echo parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
No need to include your http://example.com in your <form action=""> if you're submitting a form to the same domain.
$val = substr( $url, 0, strrpos( $url, "?"));
The easiest way:
$url = 'https://www.youtube.com/embed/ROipDjNYK4k?rel=0&autoplay=1';
$url_arr = parse_url($url);
$query = $url_arr['query'];
print $url = str_replace(array($query,'?'), '', $url);
//output
https://www.youtube.com/embed/ROipDjNYK4k
You'll need at least PHP Version 5.4 to implement this solution without exploding into a variable on one line and concatenating on the next, but an easy one liner would be:
$_SERVER["HTTP_HOST"].explode('?', $_SERVER["REQUEST_URI"], 2)[0];
Server Variables: http://php.net/manual/en/reserved.variables.server.php
Array Dereferencing: https://wiki.php.net/rfc/functionarraydereferencing
You can use the built-in parse_url function like this:
$baseUrl = $_SERVER['SERVER_NAME'] . parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
You can try:
<?php
$this_page = basename($_SERVER['REQUEST_URI']);
if (strpos($this_page, "?") !== false) $this_page = explode("?", $this_page, 2)[0]; // reset() needs a variable, not a function result
?>
If you want to get the request path:
echo parse_url($_SERVER["REQUEST_URI"])['path'];
If you want to remove the query (and maybe the fragment as well):
function strposa($haystack, $needles=array(), $offset=0) {
$chr = array();
foreach($needles as $needle) {
$res = strpos($haystack, $needle, $offset);
if ($res !== false) $chr[$needle] = $res;
}
if(empty($chr)) return false;
return min($chr);
}
$uri = $_SERVER["REQUEST_URI"];
$i = strposa($uri, ['#', '?']);
echo $i === false ? $uri : substr($uri, 0, $i); // substr, not strrpos, cuts the string at position $i
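The same "cut at the first ? or #" idea can also be sketched with strcspn(), which returns the length of the leading part of a string that contains none of the given characters. stripQueryAndFragment() is a hypothetical name.

```php
<?php
// Hypothetical helper: removes the query string and fragment, if any.
function stripQueryAndFragment(string $url): string
{
    // strcspn gives the full length when neither "?" nor "#" occurs,
    // so URLs without a query string are returned unchanged.
    return substr($url, 0, strcspn($url, '?#'));
}

echo stripQueryAndFragment('www.example.com/myurl.html?unwantedthngs#hastag'), "\n";
echo stripQueryAndFragment('www.example.com/myurl.html'), "\n";
```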
You could also use the following, as per a comment in the PHP manual:
$_SERVER['REDIRECT_URL']
Please note that this works only in certain PHP environments; see the comment below from that page for more information:
Purpose: The URL path name of the current PHP file, path-info is N/A
and excluding URL query string. Includes leading slash.
Caveat: This is before URL rewrites (i.e. it's as per the original
call URL).
Caveat: Not set on all PHP environments, and definitely only ones with
URL rewrites.
Works on web mode: Yes
Works on CLI mode: No
explode('?', $_SERVER['REQUEST_URI'])[0]
To remove the query string from the request URI, replace the query string with an empty string:
function request_uri_without_query() {
$result = $_SERVER['REQUEST_URI'];
$query = $_SERVER['QUERY_STRING'];
if(!empty($query)) {
$result = str_replace('?' . $query, '', $result);
}
return $result;
}
Because I deal with both relative and absolute URLs, I updated veritas's solution like the code below.
You can try yourself here: https://ideone.com/PvpZ4J
function removeQueryStringFromUrl($url) {
if (substr($url,0,4) == "http") {
$urlPartsArray = parse_url($url);
$outputUrl = $urlPartsArray['scheme'] . '://' . $urlPartsArray['host'] . ( isset($urlPartsArray['path']) ? $urlPartsArray['path'] : '' );
} else {
$URLexploded = explode("?", $url, 2);
$outputUrl = $URLexploded[0];
}
return $outputUrl;
}
Assuming you still want to get the URL without the query args (if they are not set), just use a shorthand if statement to check with strpos:
$request_uri = strpos( $_SERVER['REQUEST_URI'], '?' ) !== false ? strtok( $_SERVER["REQUEST_URI"], '?' ) : $_SERVER['REQUEST_URI'];
Try this
$url_with_querystring = 'www.example.com/myurl.html?unwantedthngs';
$url_data = parse_url($url_with_querystring);
$url_without_querystring = str_replace('?'.$url_data['query'], '', $url_with_querystring);
Try this:
$urrl=$_SERVER['HTTP_HOST'] . $_SERVER['SCRIPT_NAME'];
or
$urrl=$_SERVER['HTTP_HOST'] . $_SERVER['PHP_SELF'];
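Tying this back to the curPageURL() function in the question, here is a minimal sketch that builds the full URL without the query string. currentUrlWithoutQuery() is a hypothetical name; it takes the server array as a parameter so it can be tried with made-up values, and the 'localhost' and '/' fallbacks are assumptions. HTTP_HOST normally already includes a non-default port, so the port is not appended separately here.

```php
<?php
// Hypothetical helper: pass $_SERVER (or a test array with the same keys).
function currentUrlWithoutQuery(array $server): string
{
    // HTTPS may be missing, empty, or "off" on plain HTTP requests.
    $https  = !empty($server['HTTPS']) && strtolower($server['HTTPS']) !== 'off';
    $scheme = $https ? 'https' : 'http';

    $host = $server['HTTP_HOST'] ?? $server['SERVER_NAME'] ?? 'localhost';
    $uri  = $server['REQUEST_URI'] ?? '/';

    // Keep only the part before the first "?".
    $path = explode('?', $uri, 2)[0];

    return $scheme . '://' . $host . $path;
}

echo currentUrlWithoutQuery([
    'HTTP_HOST'   => 'www.example.com',
    'REQUEST_URI' => '/myurl.html?unwantedthngs',
]), "\n";
```

In a real request you would call it as currentUrlWithoutQuery($_SERVER).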
