Parsing complex URLs - php

I'm trying to parse a list of URL strings, but after two hours of work I haven't gotten anywhere. The list looks like this:
$url_list = array(
'http://google.com',
'http://localhost:8080/test/project/',
'http://mail.yahoo.com',
'http://www.bing.com',
'http://www.phpromania.net/forum/viewtopic.php?f=24&t=7549',
'https://prodgame10.alliances.commandandconquer.com/12/index.aspx',
'https://prodgame10.alliances.commandandconquer.ro/12/index.aspx',
);
Output should be:
Array
(
[0] => .google.com
[1] => .localhost
[2] => .yahoo.com
[3] => .bing.com
[4] => .phpromania.net
[5] => .commandandconquer.com
)
The first thing that trips me up is URLs containing more than two dots.
Is there an example algorithm for this?
This is what I tried:
$url_list = array(
'http://google.com',
'http://localhost:8080/test/project/',
'http://mail.yahoo.com',
'http://www.bing.com',
'http://www.phpromania.net/forum/viewtopic.php?f=24&t=27549',
'https://prodgame10.alliances.commandandconquer.com/12/index.aspx',
);
function size($list)
{
$i=0;
while($list[++$i]!=NULL);
return $i;
}
function url_Host($list)
{
$listSize = size($list)-1;
do
{
$strSize = size($list[$listSize]);
$points = 0;
$dpoints = 0;
$tmpString = '';
do
{
$currentChar = $list[$listSize][$strSize];
if(ord('.')==ord($currentChar))
{
$tmpString .= '.';
$points++;
}
else if(ord(':')==ord($currentChar))
{
$tmpString .= ':';
$dpoints++;
}
}while($list[$listSize][--$strSize]!=NULL);
print $tmpString;
$strSize = size($list[$listSize]);
$tmpString = '';
do
{
$slice = false;
$currentChar = $list[$listSize][$strSize];
if($dpoints > 2)
{
if(ord('\\')==ord($curentChar)) $slice = true;
$tmpString .= '';
}
}while($list[$listSize][--$strSize]!=NULL);
print $tmpString."<br />";
}while($list[--$listSize]);
}
url_Host($url_list);

You can use the built-in function parse_url() as follows:
function getDomain($url)
{
$domain = implode('.', array_slice(explode('.', parse_url($url, PHP_URL_HOST)), -2));
return $domain;
}
Test cases:
foreach ($url_list as $url) {
$result[] = getDomain($url);
}
Output:
Array
(
[0] => google.com
[1] => localhost
[2] => yahoo.com
[3] => bing.com
[4] => phpromania.net
[5] => commandandconquer.com
[6] => commandandconquer.ro
)
As for the dots, you can manually prepend them to the string, like so:
$result[] = "." . getDomain($url);
I'm not sure why you need to do this, but this should work.
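A minimal sketch of the same idea with the dot prefix and a guard for URLs that have no host. The helper name dottedDomain() is hypothetical, not part of PHP. It assumes the "domain" is simply the last two host labels, so it will give the wrong answer for multi-part suffixes such as co.uk; handling those properly requires a public suffix list.

```php
<?php
// Hypothetical helper: returns the last two host labels, prefixed with a dot.
function dottedDomain(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // no host could be extracted (e.g. a relative URL)
    }
    $labels = explode('.', $host);
    return '.' . implode('.', array_slice($labels, -2));
}

$urlList = [
    'http://google.com',
    'http://localhost:8080/test/project/',
    'http://mail.yahoo.com',
    'https://prodgame10.alliances.commandandconquer.com/12/index.aspx',
];

// array_filter() drops nulls, array_unique() drops repeats, array_values() reindexes.
$domains = array_values(array_unique(array_filter(array_map('dottedDomain', $urlList))));
print_r($domains);
```

Returning null instead of an empty string makes it easy to filter out inputs that were not absolute URLs.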

Look at parse_url. For example:
$url = 'http://www.phpromania.net/forum/viewtopic.php?f=24&t=7549';
$host = parse_url($url, PHP_URL_HOST);
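For reference, a short sketch of what parse_url() returns when no component flag is passed. The URL below is made up for illustration by combining pieces of the question's examples.

```php
<?php
// Illustrative URL, assembled from parts of the question's list.
$url = 'http://localhost:8080/test/project/?f=24&t=7549';

// Without a second argument, parse_url() returns an array that contains
// only the components present in the URL (scheme, host, port, path, query here).
$parts = parse_url($url);
print_r($parts);

// parse_str() splits the query string into an associative array.
parse_str($parts['query'] ?? '', $query);
print_r($query);
```

Note that the port comes back as an integer while the query values are strings.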

First, the expected result for localhost makes no sense, but try this:
$result =array();
foreach($url_list as $u){
$arr = explode('//',$u);
$arr2 = explode('.', $arr[1]);
if($arr2[0] == 'www')
array_push($result, $arr2[1]);
else
array_push($result, $arr2[0]);
}

We can also use array_map() with an arrow function (PHP 7.4+) to simplify the code.
I'm refactoring @Alessandro Minoccheri's code here.
$domains = array_map(fn($url) => implode('.', array_slice(explode('.', parse_url($url, PHP_URL_HOST)), -2)), $url_list);
var_dump($domains);

Related

Mixed Domains How to bring a certain order?

RANDOM DOMAIN LIST:
$DOMAIN_ARRAY = [
'mydomain.com',
'hair.mydomain.com',
'web.developer.yoursite.com',
'game.yoursite.com',
'yoursite.com',
'good.mydomain.com',
'great.yoursite.com',
'test.page.mydomain.com',
'check.yoursite.com',
'test.mydomain.com'
];
DESIRED RESULT:
mydomain.com
test.page.mydomain.com
hair.mydomain.com
good.mydomain.com
test.mydomain.com
yoursite.com
web.developer.yoursite.com
game.yoursite.com
great.yoursite.com
check.yoursite.com
<?php
$URL = "ide.geeksforgeeks.org";
$arr = preg_split('[\.]', $URL);
$subdomain = $arr[0];
echo $subdomain;
<?php
$domains = [
'mydomain.com',
'hair.mydomain.com',
'web.developer.yoursite.com',
'game.yoursite.com',
'yoursite.com',
'good.mydomain.com',
'great.yoursite.com',
'test.page.mydomain.com',
'check.yoursite.com',
'test.mydomain.com'
];
function domainsFirst($a, $b)
{
if ($a == $b) {
return 0;
}
return (substr_count($a, '.') < substr_count($b, '.')) ? -1 : 1;
}
// Sorts the array by main domains first
usort($domains, "domainsFirst");
$result = [];
foreach ($domains as $domain){
// Main domain
if (substr_count($domain, '.') == 1){
if (!isset($result[$domain])){
$result[$domain] = [];
}
} else { // Subdomain
$sub = explode('.', $domain);
// Discover the main domain by taking the two last indexes
$key = $sub[count($sub) - 2]. '.' .$sub[count($sub) - 1];
$result[$key][] = $domain;
}
}
print_r($result);
Outputs
Array
(
[yoursite.com] => Array
(
[0] => check.yoursite.com
[1] => great.yoursite.com
[2] => game.yoursite.com
[3] => web.developer.yoursite.com
)
[mydomain.com] => Array
(
[0] => test.mydomain.com
[1] => good.mydomain.com
[2] => hair.mydomain.com
[3] => test.page.mydomain.com
)
)
https://3v4l.org/tK9Kk
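If a flat list is wanted rather than nested arrays, the grouping can be flattened again. This is only a sketch: orderByMainDomain() is a hypothetical name, the "main domain" is assumed to be the last two labels, subdomains keep their input order (the question does not state a rule for ordering them), and the main domain is listed even if it was missing from the input.

```php
<?php
// Hypothetical helper: groups hosts under their last two labels,
// puts the main domain first in each group, then flattens the groups.
function orderByMainDomain(array $domains): array
{
    $groups = [];
    foreach ($domains as $domain) {
        $main = implode('.', array_slice(explode('.', $domain), -2));
        // Create the group with the main domain in first position.
        $groups[$main] ??= [$main];
        if ($domain !== $main) {
            $groups[$main][] = $domain;
        }
    }
    // array_values() removes the string keys before spreading into array_merge().
    return array_merge(...array_values($groups));
}

$domains = [
    'mydomain.com', 'hair.mydomain.com', 'web.developer.yoursite.com',
    'game.yoursite.com', 'yoursite.com', 'good.mydomain.com',
    'great.yoursite.com', 'test.page.mydomain.com', 'check.yoursite.com',
    'test.mydomain.com',
];
$ordered = orderByMainDomain($domains);
print_r($ordered);
```

The ??= operator needs PHP 7.4 or later; on older versions an isset() check does the same job.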

PHP group certain results from foreach on array into another array

I have an array that looks something like this:
$array = array(
'FILE-F01-E1-S01.pdf',
'FILE-F01-E1-S02.pdf',
'FILE-F01-E1-S03.pdf',
'FILE-F01-E1-S04.pdf',
'FILE-F01-E1-S05.pdf',
'FILE-F02-E1-S01.pdf',
'FILE-F02-E1-S02.pdf',
'FILE-F02-E1-S03.pdf'
);
Basically, I need to look at the first file and then get all the other files that have the same beginning ('FILE-F01-E1', for example) and put them into an array. I don't need to do anything with the other ones at this point.
I've been trying to use a foreach loop finding the previous value to do this, but am not having any luck.
Like this:
$previousFile = null;
foreach($array as $file)
{
if(substr_replace($previousFile, "", -8) == substr_replace($file, "", -8))
{
$secondArray[] = $file;
}
$previousFile = $file;
}
So then $secondArray would look like this:
Array ( [0] => FILE-F01-E1-S01.pdf [1] => FILE-F01-E1-S02.pdf
[2] => FILE-F01-E1-S03.pdf [3] => FILE-F01-E1-S04.pdf
[4] => FILE-F01-E1-S05.pdf)
As my result.
Thank you!
You can use array_filter combined with strpos:
$result = array_filter($array, function($filename) {
return strpos($filename, 'FILE-F01-E1') === 0;
});
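On PHP 8.0 or later the same filter can be written with str_starts_with(). A small sketch, assuming the prefix is everything before the last hyphen of the first element; the shortened list is only for illustration.

```php
<?php
$array = [
    'FILE-F01-E1-S01.pdf',
    'FILE-F01-E1-S02.pdf',
    'FILE-F02-E1-S01.pdf',
];

// Derive the prefix from the first element: everything before the last hyphen.
$prefix = substr($array[0], 0, strrpos($array[0], '-'));

// array_filter() preserves the original keys, so array_values() reindexes from 0.
$result = array_values(array_filter(
    $array,
    fn(string $name): bool => str_starts_with($name, $prefix)
));
print_r($result);
```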
Are you sure this will always be the naming format? That is crucial to know before building a regexp or a prefix check against the other strings.
If we can assume this, and that the "base" name is always at index 0, then you could do something like this:
<?php
$myArr = [
'FILE-F01-E1-S01.pdf',
'FILE-F01-E1-S02.pdf',
'FILE-F01-E1-S03.pdf',
'FILE-F01-E1-S04.pdf',
'FILE-F01-E1-S05.pdf',
'FILE-F02-E1-S01.pdf',
'FILE-F02-E1-S02.pdf',
'FILE-F02-E1-S03.pdf'
];
$baseName = '';
$allSimilarNames = [];
foreach($myArr as $index => &$name) {
if($index == 0) {
$baseName = substr($name, 0, strrpos($name, '-'));
$allSimilarNames[] = $name;
}
else {
if(strpos($name, $baseName) === 0) {
$allSimilarNames[] = $name;
}
}
}
var_dump($allSimilarNames);
This will:
Check index 0 to get the base name to compare against.
Loop over all items in the array and match every item that is similar according to your naming convention, no matter where in the array it is.
So if next time you have an array like this:
$myArr = [
'FILE-F02-E1-S01.pdf',
'FILE-F01-E1-S01.pdf',
'FILE-F01-E1-S02.pdf',
'FILE-F01-E1-S03.pdf',
'FILE-F01-E1-S04.pdf',
'FILE-F01-E1-S05.pdf',
'FILE-F02-E1-S02.pdf',
'FILE-F02-E1-S03.pdf'
];
this will return all the items that match FILE-F02-E1*.
You could also wrap this in a small function for easier use, so that you no longer rely on the element at index 0 being the "base" name.
<?php
function findMatches($baseName, &$names) {
$matches = [];
$baseName = substr($baseName, 0, strrpos($baseName, '-'));
foreach($names as &$name) {
if(strpos($name, $baseName) === 0) {
$matches[] = $name;
}
}
return $matches;
}
$myArr = [
'FILE-F01-E1-S01.pdf',
'FILE-F01-E1-S02.pdf',
'FILE-F01-E1-S03.pdf',
'FILE-F01-E1-S04.pdf',
'FILE-F01-E1-S05.pdf',
'FILE-F02-E1-S01.pdf',
'FILE-F02-E1-S02.pdf',
'FILE-F02-E1-S03.pdf'
];
$allSimilarNames = findMatches('FILE-F01-E1-S01.pdf', $myArr);
var_dump($allSimilarNames);
Run a simple foreach with strpos() which looks for an occurrence of a string within a string.
$results = array();
foreach($array as $item){
if (strpos($item, 'FILE-F01-E1') === 0) {
array_push($results, $item);
}
}
You could take the first item from the array and use explode and implode to get the filename up to, but not including, the last hyphen.
Then use array_filter with substr, taking 0 as the start position and the length of $fileBeginning as the length, to check whether each string starts with FILE-F01-E1:
$array = [
'FILE-F01-E1-S01.pdf',
'FILE-F01-E1-S02.pdf',
'FILE-F01-E1-S03.pdf',
'FILE-F01-E1-S04.pdf',
'FILE-F01-E1-S05.pdf',
'FILE-F02-E1-S01.pdf',
'FILE-F02-E1-S02.pdf',
'FILE-F02-E1-S03.pdf',
"TESTFILE-F01-E1-S03.pdf"
];
$parts = explode('-', $array[0]);
array_pop($parts);
$fileBeginning = implode('-', $parts);
$secondArray = array_filter($array, function ($x) use ($fileBeginning) {
return substr($x, 0, strlen($fileBeginning)) === $fileBeginning;
});
print_r($secondArray);
Result
Array
(
[0] => FILE-F01-E1-S01.pdf
[1] => FILE-F01-E1-S02.pdf
[2] => FILE-F01-E1-S03.pdf
[3] => FILE-F01-E1-S04.pdf
[4] => FILE-F01-E1-S05.pdf
)
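If the other groups are needed later as well, all files can be bucketed in one pass. A rough sketch under the same assumption about the naming format (the prefix is everything before the last hyphen); groupByPrefix() is a hypothetical name.

```php
<?php
// Hypothetical helper: buckets filenames by the part before the last hyphen.
function groupByPrefix(array $files): array
{
    $groups = [];
    foreach ($files as $file) {
        $pos = strrpos($file, '-');
        // Names without a hyphen form their own group.
        $prefix = $pos === false ? $file : substr($file, 0, $pos);
        $groups[$prefix][] = $file;
    }
    return $groups;
}

$groups = groupByPrefix([
    'FILE-F01-E1-S01.pdf',
    'FILE-F02-E1-S01.pdf',
    'FILE-F01-E1-S02.pdf',
]);
print_r($groups['FILE-F01-E1']);
```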

Reduce URL strings with no duplicates

I have an array that looks like the following...
$urls = array(
"http://www.google.com",
"http://www.google.com/maps",
"http://www.google.com/mail",
"https://drive.google.com",
"https://www.youtube.com",
"https://www.youtube.com/feed/subscriptions",
"https://www.facebook.com/me",
"https://www.facebook.com/me/friends"
);
I find this hard to explain but I want to break this array down to only show the reduced URLs with no duplicates, so it looks like this...
$urls = array(
"http://www.google.com",
"https://drive.google.com",
"https://www.youtube.com",
"https://www.facebook.com/me"
);
Notice that the last URL in the second array still has its path. This is because I still want to show the lowest-level path for each host.
Based on @Tim's answer:
foreach ($urls as &$url) {
$url_parts = parse_url($url);
$url = $url_parts["scheme"]."://".$url_parts["host"];
}
$urls = array_unique($urls);
Just sort the array in reverse order, and create an array indexed by host:
$urls = array(
"http://www.google.com",
"http://www.google.com/maps",
"http://www.google.com/mail",
"https://drive.google.com",
"https://www.youtube.com",
"https://www.youtube.com/feed/subscriptions",
"https://www.facebook.com/me",
"https://www.facebook.com/me/friends"
);
rsort($urls);
$return = [];
foreach($urls as $url) {
$host = parse_url($url, PHP_URL_HOST);
$return[$host] = $url;
}
$return = array_values($return); // To remove array keys, if desired.
The reverse-ordered urls array would be:
Array
(
[0] => https://www.youtube.com/feed/subscriptions
[1] => https://www.youtube.com
[2] => https://www.facebook.com/me/friends
[3] => https://www.facebook.com/me
[4] => https://drive.google.com
[5] => http://www.google.com/maps
[6] => http://www.google.com/mail
[7] => http://www.google.com
)
Since the last entry (per host name) in the sorted array is the one that you want, and it deliberately clobbers any existing array value, this would output:
Array
(
[www.youtube.com] => https://www.youtube.com
[www.facebook.com] => https://www.facebook.com/me
[drive.google.com] => https://drive.google.com
[www.google.com] => http://www.google.com
)
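The same result can be had without sorting by keeping the shortest URL seen for each host. This is a sketch that assumes "lowest level" means the shortest URL per host; shortestUrlPerHost() is a hypothetical name.

```php
<?php
// Hypothetical helper: keeps the shortest URL for each host, in first-seen order.
function shortestUrlPerHost(array $urls): array
{
    $byHost = [];
    foreach ($urls as $url) {
        $host = parse_url($url, PHP_URL_HOST);
        if (!is_string($host)) {
            continue; // skip strings that have no host component
        }
        if (!isset($byHost[$host]) || strlen($url) < strlen($byHost[$host])) {
            $byHost[$host] = $url;
        }
    }
    return array_values($byHost);
}

$urls = [
    "http://www.google.com",
    "http://www.google.com/maps",
    "http://www.google.com/mail",
    "https://drive.google.com",
    "https://www.youtube.com",
    "https://www.youtube.com/feed/subscriptions",
    "https://www.facebook.com/me",
    "https://www.facebook.com/me/friends",
];
$reduced = shortestUrlPerHost($urls);
print_r($reduced);
```

Unlike the rsort() version, this keeps the hosts in the order they first appear in the input.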
Try this:
$result = array();
array_push($result, $urls[0]);
for ($i = 1; $i < count($urls); $i++)
{
$repeat = false;
foreach ($result as $res)
{
// strpos() returns 0 when $res is a prefix of the URL, so compare strictly
if (strpos($urls[$i], $res) === 0)
{
$repeat = true;
break;
}
}
if (!$repeat)
array_push($result, $urls[$i]);
}
print_r($result);

Dynamically creating a multidimensional array based on paths

So I've got a list of paths, such as:
path/to/directory/file1
path/directory/file2
path2/dir/file3
path2/dir/file4
And I'd like to convert them into a multidimensional array like this:
array(
path => array(
to => array(
directory => array(
file1 => someValue
),
),
directory => array(
file2 => someValue
),
),
path2 => array(
dir => array(
file3 => someValue,
file4 => someValue
)
)
)
My first thought was to explode() the paths into segments and set up the array using a foreach loop, something like this:
$arr = array();
foreach ( $path as $p ) {
$segments = explode('/', $p);
$str = '';
foreach ( $segments as $s ) {
$str .= "[$s]";
}
$arr{$str} = $someValue;
}
But this doesn't work, and since the number of segments varies, I'm kind of stumped. Is there a way to do this?
If somevalue can be an empty array:
<?php
$result = array();
$input = [
'path/to/directory/file1',
'path/directory/file2',
'path2/dir/file3',
'path2/dir/file4',
];
foreach( $input as $e ) {
nest( $result, explode('/', $e));
}
var_export($result);
function nest(array &$target, array $parts) {
if ( empty($parts) ) {
return;
}
else {
$e = array_shift($parts);
if ( !isset($target[$e]) ) {
$target[$e] = [];
}
nest($target[$e], $parts);
}
}
Here is an easy way to do it.
Just reverse each exploded path and build the array from the inside out. Note that this produces one nested array per path in $g and does not merge them into a single tree:
$path[1] = "path/to/directory/file1";
$path[2] = "path/directory/file2";
$path[3] = "path2/dir/file3";
$path[4] = "path2/dir/file4";
$arr = array();
$b = array();
$k = 0;
foreach($path as $p) {
$c = 0;
$segments = explode('/', $p);
$reversed = array_reverse($segments);
foreach($reversed as $s) {
if ($c == 0) {
$g[$k] = array($s => "somevalue");
} else {
$g[$k] = array($s => $g[$k]);
}
$c++;
}
$k++;
}
var_dump($g);
Thanks so much VolkerK! Your answer didn't quite answer my question but it got me on the right track. Here's the version I ended up using to get it to work:
$result = array();
$input = [
'path/to/directory/file1' => 'someValue',
'path/directory/file2' => 'someValue',
'path2/dir/file3' => 'someValue',
'path2/dir/file4' => 'someValue',
];
foreach( $input as $e=>$val ) {
nest( $result, explode('/', $e), $val);
}
var_export($result);
function nest(array &$target, array $parts, $leafValue) {
$e = array_shift($parts);
if ( empty($parts) ) {
$target[$e] = $leafValue;
return;
}
if ( !isset($target[$e]) ) {
$target[$e] = [];
}
nest($target[$e], $parts, $leafValue);
}
I basically just added the somevalue as $leafValue and moved the base case around so that it would add the leafValue instead of a blank array at the end.
This results in:
Array
(
[path] => Array
(
[to] => Array
(
[directory] => Array
(
[file1] => someValue
)
)
[directory] => Array
(
[file2] => someValue
)
)
[path2] => Array
(
[dir] => Array
(
[file3] => someValue
[file4] => someValue
)
)
)
Thanks a lot!
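It can be done without recursion:
$path = array(
'path/to/directory/file1',
'path/directory/file2',
'path2/dir/file3',
'path2/dir/file4');
$arr = [];
$someValue = 'someValue';
foreach ( $path as $p ) {
$segments = explode('/', $p);
// Walk down from the root. $node must not reuse the loop variable $p,
// otherwise the next iteration would write the path string through the reference.
$node = &$arr;
foreach ( $segments as $s ) {
if (! isset($node[$s] ) ) $node[$s] = array();
$node = &$node[$s];
}
$node = $someValue;
unset($node); // break the reference before the next path
}
print_r($arr);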

Is there something like keypath in an associative array in PHP?

I want to dissect an array like this:
[
"ID",
"UUID",
"pushNotifications.sent",
"campaigns.boundDate",
"campaigns.endDate",
"campaigns.pushMessages.sentDate",
"pushNotifications.tapped"
]
To a format like this:
{
"ID" : 1,
"UUID" : 1,
"pushNotifications" :
{
"sent" : 1,
"tapped" : 1
},
"campaigns" :
{
"boundDate" : 1,
"endDate" : 1,
"pushMessages" :
{
"endDate" : 1
}
}
}
It would be great if I could just set a value on an associative array in a keypath-like manner:
//To achieve this:
$dissected['campaigns']['pushMessages']['sentDate'] = 1;
//By something like this:
$keypath = 'campaigns.pushMessages.sentDate';
$dissected{$keypath} = 1;
How to do this in PHP?
You can use:
$array = [
"ID",
"UUID",
"pushNotifications.sent",
"campaigns.boundDate",
"campaigns.endDate",
"campaigns.pushMessages.sentDate",
"pushNotifications.tapped"
];
// Build Data
$data = array();
foreach($array as $v) {
setValue($data, $v, 1);
}
// Get Value
echo getValue($data, "campaigns.pushMessages.sentDate"); // output 1
Functions used:
function setValue(array &$data, $path, $value) {
$temp = &$data;
foreach(explode(".", $path) as $key) {
$temp = &$temp[$key];
}
$temp = $value;
}
function getValue($data, $path) {
$temp = $data;
foreach(explode(".", $path) as $ndx) {
$temp = isset($temp[$ndx]) ? $temp[$ndx] : null;
}
return $temp;
}
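To check that the reference walk really builds the nested shape from the question, the result can be dumped as JSON. This sketch repeats setValue() from above (with type hints added) so that it runs on its own, and uses a shortened list of paths for illustration.

```php
<?php
// Same reference-walking technique as setValue() above.
function setValue(array &$data, string $path, $value): void
{
    $temp = &$data;
    foreach (explode('.', $path) as $key) {
        // Taking a reference to a missing key creates it on the fly.
        $temp = &$temp[$key];
    }
    $temp = $value;
}

$data = [];
$paths = ['ID', 'pushNotifications.sent', 'campaigns.pushMessages.sentDate', 'pushNotifications.tapped'];
foreach ($paths as $path) {
    setValue($data, $path, 1);
}
echo json_encode($data, JSON_PRETTY_PRINT), PHP_EOL;
```

One thing to watch for: a path that runs through a key already holding a scalar (for example setting "ID.sub" after "ID") cannot be nested this way and will fail.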
function keyset(&$arr, $keypath, $value = NULL)
{
$keys = explode('.', $keypath);
$current = &$arr;
while(count($keys))
{
$key = array_shift($keys);
if(!isset($current[$key]) && count($keys))
{
$current[$key] = array();
}
if(count($keys))
{
$current = &$current[$key];
}
}
$current[$key] = $value;
}
function keyget($arr, $keypath)
{
$keys = explode('.', $keypath);
$current = $arr;
foreach($keys as $key)
{
if(!isset($current[$key]))
{
return NULL;
}
$current = $current[$key];
}
return $current;
}
//Testing code:
$r = array();
header('Content-Type: text/plain; charset=utf-8');
keyset($r, 'this.is.path', 39);
echo keyget($r, 'this.is.path');
var_dump($r);
It's a little rough, I can't guarantee it functions 100%.
Edit: At first you'd be tempted to try to use variable variables, but I've tried that in the past and it doesn't work, so you have to use functions to do it. This works with some limited tests. (And I just added a minor edit to remove an unnecessary array assignment.)
In the meantime, I came up with (another) solution:
private function setValueForKeyPath(&$array, $value, $keyPath)
{
$keys = explode(".", $keyPath, 2);
$firstKey = $keys[0];
$remainingKeys = (count($keys) == 2) ? $keys[1] : null;
$isLeaf = ($remainingKeys == null);
if ($isLeaf)
$array[$firstKey] = $value;
else
$this->setValueForKeyPath($array[$firstKey], $value, $remainingKeys);
}
Sorry for the "long" namings, I came from the Objective-C world. :)
Calling this on each keyPath gives me the following output:
fields
Array
(
[0] => ID
[1] => UUID
[2] => pushNotifications.sent
[3] => campaigns.boundDate
[4] => campaigns.endDate
[5] => campaigns.pushMessages.endDate
[6] => pushNotifications.tapped
)
dissectedFields
Array
(
[ID] => 1
[UUID] => 1
[pushNotifications] => Array
(
[sent] => 1
[tapped] => 1
)
[campaigns] => Array
(
[boundDate] => 1
[endDate] => 1
[pushMessages] => Array
(
[endDate] => 1
)
)
)
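For anyone who wants the recursive version outside a class, here is a standalone sketch. It is a variation on the method above, not the original code: it uses a strict null check for the leaf test and explicitly creates the intermediate array, and the shortened key path list is only for illustration.

```php
<?php
// Standalone variant of the recursive approach: split off the first key,
// then recurse into that branch with the rest of the key path.
function setValueForKeyPath(array &$array, $value, string $keyPath): void
{
    [$firstKey, $remainingKeys] = array_pad(explode('.', $keyPath, 2), 2, null);
    if ($remainingKeys === null) {
        $array[$firstKey] = $value; // leaf: no more keys to descend into
        return;
    }
    if (!isset($array[$firstKey]) || !is_array($array[$firstKey])) {
        $array[$firstKey] = []; // create the branch (replaces a scalar if one was there)
    }
    setValueForKeyPath($array[$firstKey], $value, $remainingKeys);
}

$dissected = [];
foreach (['ID', 'campaigns.boundDate', 'campaigns.pushMessages.sentDate'] as $keyPath) {
    setValueForKeyPath($dissected, 1, $keyPath);
}
print_r($dissected);
```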
