PHP Finding a domain name and domain extension from an array - php

I have an array of data containing some domains with TLD extensions. I want to collect the domain name and TLD extension separately.
E.g. from "hello.com" I want to collect "hello" as one variable, and then collect ".com" as another variable.
Another, more important example: from "hello.co.uk" I want to collect "hello" as one variable, and then collect ".co.uk" as another variable.
My current code using pathinfo() will work correctly on "hello.com", but not "hello.co.uk". For "hello.co.uk" it will collect "hello.co" as one variable, and then collect ".uk" as another variable.
Here is the code I am using:
// Get a file into an array
$lines = file($_FILES['file']['tmp_name']);
// Loop through array
foreach ($lines as $line_num => $line) {
echo $line;
//Find TLD
$tld = ".".pathinfo($line, PATHINFO_EXTENSION);
echo $tld;
//Find Domain
$domain = pathinfo($line, PATHINFO_FILENAME);
echo $domain;
}
Hopefully I explained that well enough.
I use stackoverflow a lot but couldn't find a specific example of this.
Thanks

Instead of using functions intended for files, you could just use some simple string manipulation:
$domain = substr($line, 0, strpos($line, "."));
$tld = substr($line, strpos($line, "."), (strlen($line) - strlen($domain)));

First method:
$domains = array("hello.co.uk", "hello.com");
foreach ($domains as $d) {
$ext = strstr($d, '.'); // extension
$index = strpos($d, '.');
$arr = str_split($d, $index);
$domain = $arr[0]; // domain name
echo "domain: $domain, extension: $ext <br/>";
}
Second method: (Thanks to hakre)
$domains = array("hello.co.uk", "hello.com");
foreach ($domains as $d) {
list($domain, $ext) = explode('.', $d, 2);
echo "domain: $domain, extension: $ext <br/>";
}

Here's a function that's pretty flexible, and will work with everything from example.com to http://username:password@example.com/public_html/test.zip to ftp://username@example.com to http://www.reddit.com/r/aww/comments/165v9u/shes_allergic_couldnt_help_herself/
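function splitDomain($url) {
    $parts = parse_url($url);
    // A bare domain such as "example.com" has no host component; it ends up in 'path'.
    $host = isset($parts['host']) ? $parts['host'] : $parts['path'];
    // Strip a leading "www." only, not every occurrence inside the host.
    $host = preg_replace('/^www\./', '', $host);
    // Split at the first dot only, so "co.uk" stays in one piece.
    $tmp = explode('.', $host, 2);
    $name = $tmp[0];
    $tld = isset($tmp[1]) ? $tmp[1] : '';
    return array('name' => $name, 'tld' => $tld);
}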

There's no reliable way of doing this other than to use a large table of legal extensions.
A popular table is the one known as the Public Suffix List.

To work with two-level (co.uk) and three-level (act.edu.au) extensions you need a library that uses the Public Suffix List (a list of top-level domains and other public suffixes). I recommend TLDExtract.
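The table-lookup idea from the two answers above can be sketched roughly as follows. This is only an illustration: the $suffixes array is a tiny hand-written sample rather than the real Public Suffix List, and splitBySuffix() is a hypothetical helper name, not part of TLDExtract or any other library.

```php
<?php
// Hypothetical, heavily truncated sample; the real list has thousands of entries.
$suffixes = array('com', 'uk', 'co.uk', 'au', 'edu.au', 'act.edu.au');

// Split a bare host name into name and extension using the longest known suffix.
function splitBySuffix($host, array $suffixes) {
    // trim() also removes the trailing newline that file() leaves on each line.
    $host = strtolower(trim($host));
    $labels = explode('.', $host);
    // Drop one leading label at a time, so the longest candidate is tried first.
    for ($i = 1; $i < count($labels); $i++) {
        $candidate = implode('.', array_slice($labels, $i));
        if (in_array($candidate, $suffixes, true)) {
            $name = implode('.', array_slice($labels, 0, $i));
            return array('name' => $name, 'tld' => '.' . $candidate);
        }
    }
    return false; // no known suffix found
}

print_r(splitBySuffix('hello.co.uk', $suffixes));
print_r(splitBySuffix('hello.com', $suffixes));
```

With the sample list, "hello.co.uk" splits into "hello" and ".co.uk", and "hello.com" into "hello" and ".com". In practice the array would be loaded from a copy of the Public Suffix List instead of being typed by hand.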

Related

Php code that returns an array with filenames of files which contains a string

I'm trying to make a PHP file that takes no input and checks every file in the folder, searching for a string inside them. It echoes an array of the filenames of the files that contain the string. Is there any way to do it, possibly with low memory usage?
Thank you a lot.
To achieve something like this, I recommend you read about the DirectoryIterator class, file_get_contents, and about strings in PHP.
Here is an example of how you can read the contents of a given directory ($dir) and use strstr to search for a specific string occurrence in each file's contents ($contents):
<?php
$dir = '.';
if (substr($dir, -1) !== '/') {
$dir .= '/';
}
$matchedFiles = [];
$dirIterator = new \DirectoryIterator($dir);
foreach ($dirIterator as $item) {
if ($item->isDot() || $item->isDir()) {
continue;
}
$file = realpath($dir . $item->getFilename());
// Skip this PHP file.
if ($file === __FILE__) {
continue;
}
$contents = file_get_contents($file);
// Search $contents for what you're looking for.
if (strstr($contents, 'this is what I am looking for')) {
echo 'Found something in ' . $file . PHP_EOL;
$matchedFiles[] = $file;
}
}
var_dump($matchedFiles);
There is some extra code in this example (adding a trailing slash to $dir, skipping dot files and directories, skipping itself, etc.) that I encourage you to read and learn about.
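Since the question asks about low memory usage, one possible variation is to read each file line by line instead of loading it whole with file_get_contents. The sketch below assumes the search string never spans a line break; fileContainsString() is a hypothetical helper name.

```php
<?php
// Return true if $needle occurs on some line of the file at $path.
// Limitation: a match that spans a line break is not detected.
function fileContainsString($path, $needle) {
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return false; // file could not be opened
    }
    $found = false;
    // fgets() reads one line at a time, so only that line is held in memory.
    while (($line = fgets($handle)) !== false) {
        if (strpos($line, $needle) !== false) {
            $found = true;
            break;
        }
    }
    fclose($handle);
    return $found;
}
```

In the loop above, the file_get_contents and strstr pair could then be replaced by a single call such as fileContainsString($file, 'this is what I am looking for').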
<?php
$folderPath = '/htdocs/stock/tae';
$searchString = 'php';
// -r: search recursively, -l: print only the names of the matching files.
// escapeshellarg() stops the search string and path from being interpreted by the shell.
$cmd = 'grep -rl -- ' . escapeshellarg($searchString) . ' ' . escapeshellarg($folderPath);
$files = array();
exec($cmd, $files);
print_r($files);

URL - Get last part in PHP

I have my url:
http://domain/fotografo/admin/gallery_bg.php
and i want last part of the url:
gallery_bg.php
But I do not want to hard-code the link; that is, for each page I visit I want to get the last part of the URL.
Use the following:
<?php
$link = $_SERVER['PHP_SELF'];
$link_array = explode('/',$link);
echo $page = end($link_array);
?>
Use basename function
echo basename("http://domain/fotografo/admin/gallery_bg.php");
If it is same page:
echo $_SERVER["REQUEST_URI"];
or
echo $_SERVER["SCRIPT_NAME"];
or
echo $_SERVER["PHP_SELF"];
In each case a leading slash (e.g. /gallery_bg.php) will appear. You can trim it with:
echo trim($_SERVER["REQUEST_URI"],"/");
or split the url by / to make an array and get the last item from array
$array = explode("/",$url);
$last_item_index = count($array) - 1;
echo $array[$last_item_index];
or
echo basename($url);
$url = "http://domain/fotografo/admin/gallery_bg.php";
$keys = parse_url($url); // parse the url
$path = explode("/", $keys['path']); // splitting the path
$last = end($path); // get the value of the last element
You can use the basename($url) function as suggested above. It returns the file name from the URL. You can also pass the file extension as the second argument, e.g. basename($url, '.jpg'), and the file name is then returned without the extension.
E.g.:
$url = "https://i0.com/images/test.jpg";
then echo basename($url) will print test.jpg
and echo basename($url,".jpg") will print test
$url = $_SERVER["PHP_SELF"];
$path = explode("/", $url);
$last = end($path);
Try this:
Here you have 2 options.
1. Using explode function.
$parts = explode('/', 'http://domain/fotografo/admin/gallery_bg.php');
$filename = end($parts); // end() takes its argument by reference, so it needs a variable
2. Use basename function.
$filename = basename("http://domain/fotografo/admin/gallery_bg.php");
Thanks
$basepath = implode('/', array_slice(explode('/', $_SERVER['SCRIPT_NAME']), 0, -1)) . '/';
$uri = substr($_SERVER['REQUEST_URI'], strlen($basepath));
if (strstr($uri, '?')) $uri = substr($uri, 0, strpos($uri, '?'));
$url = trim($uri, '/');
In PHP 7 the accepted solution gives me the notice "Only variables should be passed by reference" (end() needs a variable, not the direct result of explode()), so this is what works for me.

Php parse string error

I am extracting files from a string which can be entered by a user or taken from reading a page source.
I want to extract all .jpg image URLs
So, I am using the following (example text shown) but a) it only returns the first one and b) it misses off '.jpg'
$word1='http://';
$word2='.jpg';
$contents = 'uuuuyyyyyhttp://image.jpgandagainhereitishttp://image2.jpgxxxxcccffff';
$between=substr($contents, strpos($contents, $word1), strpos($contents, $word2) - strpos($contents, $word1));
echo $between;
Is there maybe a better way to do this?
In the case of parsing a web page I cannot use a simple DOM e.g. $images = $dom->getElementsByTagName('img'); as sometimes the image references are not in standard tags
You can do something like this :
<?php
$contents = 'uuuuyyyyyhttp://image.jpgandagainhereitishttp://image2.jpgxxxxcccffff';
$matches = array();
preg_match_all('#(http://[^\s]*?\.jpg)#i', $contents, $matches);
print_r($matches);
You can either do this using preg_match_all (as previously answered) or alternatively use the following function.
It simply explodes the original string, checks all parts for a valid link and adds it to the array, that's getting returned.
function getJpgLinks($string) {
$return = array();
foreach (explode('.jpg', $string) as $value) {
$position = strrpos($value, 'http://');
if ($position !== false) {
$return[] = substr($value, $position) . '.jpg';
}
}
return $return;
}

PHP. How to remove filename parts?

I have:
$filename = basename(__FILE__);
$id = preg_replace("/\\.[^.\\s]{3,4}$/", "", $filename);
$id is filename without extension now. How can I remove not only extension but prefix and suffix from the file too?
prefix_ineedthis_suffix.php -> ineedthis
Update: Thanks for your answers! Unfortunately, I can mark only one answer as answer.
$prefix = 'prefix_';
$suffix = '_suffix';
$pattern = sprintf('/%s(.+)%s/i', $prefix, $suffix);
if (preg_match($pattern, $filename, $matches)) {
$id = $matches[1];
}
If "prefix" and "suffix" are parts separated by _ (underscore), then you might not need regex at all:
$parts = explode("_", $filename);
array_shift($parts);
array_pop($parts);
$ineedthis = implode("_", $parts);
OR, if ineedthis does not contain underscores for sure then:
$parts = explode("_", $filename);
$ineedthis = $parts[1];
If you still wanna use regex then:
if (preg_match("/^[^_]+_(.*)_[^_]+\.[a-z]{3,4}$/", $filename, $match)) {
    $ineedthis = $match[1];
} else {
    // oops, the file name does not follow the prefix_name_suffix.ext pattern
}
Use basename(string $path , string $suffix) instead. This can remove the directory part and also the extension part if you want.
$id = basename(__FILE__, "_suffix.php");
$prefix = "prefix_";
if (substr($id, 0, strlen($prefix) ) == $prefix) {
$id = substr($id, strlen($prefix), strlen($id) );
}
And according to this question this is faster than using RegEx.
You can use explode() twice to remove first the extension, then the prefix & suffix. This will store all the parts within arrays, which is handy if you later need those parts.
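A minimal sketch of the double explode() described above, assuming the file name contains exactly one dot and the middle part contains no underscores (the variable names are only illustrative):

```php
<?php
$filename = 'prefix_ineedthis_suffix.php';

// First explode: separate the base name from the extension.
$nameParts = explode('.', $filename);   // array('prefix_ineedthis_suffix', 'php')

// Second explode: separate prefix, middle part and suffix.
$pieces = explode('_', $nameParts[0]);  // array('prefix', 'ineedthis', 'suffix')

$id = $pieces[1];
echo $id; // ineedthis
```

Both arrays stay available afterwards, so the extension, prefix and suffix can still be read from $nameParts and $pieces if they are needed later.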

Using PHP to find part of a URL

Take this domain:
http://www.?.co.uk/elderly-care-advocacy/mental-capacity-act-advance-medical-directive.html
How could I use PHP to find everything between the first and second slash, regardless of whether it changes or not?
I.e. elderly-care-advocacy
Any help would be greatly appreciated.
//strip the "http://" part. Note: Doesn't work for HTTPS!
$url = substr("http://www.example.com/elderly-care-advocacy/mental-capacity-act-advance-medical-directive.html", 7);
// split the URL in parts
$parts = explode("/", $url);
// The second part (offset 1) is the part we look for
if (count($parts) > 1) {
$segment = $parts[1];
} else {
throw new Exception("Full URLs please!");
}
$url = "http://www.example.co.uk/elderly-care-advocacy/mental-capacity-act-advance-medical-directive.html";
$parts = parse_url($url);
$host = $parts['host'];
$path = $parts['path'];
$items = preg_split('/\//', $path, -1, PREG_SPLIT_NO_EMPTY);
$firstPart = $items[0];
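Off the top of my head:
$url = 'http://www.example.co.uk/elderly-care-advocacy/mental-capacity-act-advance-medical-directive.html';
$urlParts = parse_url($url); // An associative array with keys such as 'scheme', 'host' and 'path'
$segments = explode('/', trim($urlParts['path'], '/'));
$target_string = $segments[0]; // 'elderly-care-advocacy'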
Cheers
explode('/', $a);
All you need to do is parse the URL first, then explode the path string and take the first part. With some sanity checks that would look like the following:
$url = 'http://www.?.co.uk/elderly-care-advocacy/mental-capacity-act-advance-medical-directive.html';
$url_parts = parse_url($url);
if (isset($url_parts['path'])) {
$path_components = explode('/', trim($url_parts['path'], '/'));
if (count($path_components) > 1) {
// All is OK. Path's first component is in $path_components[0]
} else {
// Throw an error, since there is no directory specified in path
// Or you could assume, that $path_components[0] is the actual path
}
} else {
// Throw an error, since there is no path component was found
}
I was surprised too, but this works.
$url='http://www.?.co.uk/elderly-care-advocacy/...';
$result=explode('/',$url)[3];
I think a Regular Expression should be fine for that.
Try using e.g.: /[^/]+/ that should give you /elderly-care-advocacy/ as the second index of an array in your example.
(The first string is /www.?.com/)
parse_url is your best option. It breaks the URL string down into components, which you can selectively query.
This function could be used:
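function extract_domain($url, $prefix = 'www.', $suffix = '.co.uk') {
    $url_parts = parse_url($url);
    // parse_url() returns false for malformed URLs, and 'host' is missing for path-only input.
    if ($url_parts && isset($url_parts['host'])) {
        $host = $url_parts['host'];
        $host = str_replace($prefix, '', $host);
        $host = str_replace($suffix, '', $host);
        return $host;
    }
    return false;
}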
$host_component = extract_domain($_SERVER['REQUEST_URI']);
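For the original question (the first path segment rather than the host), the component-based approach recommended in the answers above can be sketched like this. The example.co.uk URL is a stand-in for the redacted one in the question, and the variable names are only illustrative.

```php
<?php
// Hypothetical example URL standing in for the redacted one in the question.
$url = 'http://www.example.co.uk/elderly-care-advocacy/mental-capacity-act-advance-medical-directive.html';

// Ask parse_url() for the path component only.
$path = parse_url($url, PHP_URL_PATH);

// Drop the leading slash, then split the path into its segments.
$segments = explode('/', ltrim((string) $path, '/'));
$first = $segments[0];

echo $first; // elderly-care-advocacy
```

Passing PHP_URL_PATH avoids indexing into the full result array, and it works the same way for http and https URLs.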
