Reduce URL strings with no duplicates - php

I have an array that looks like the following...
$urls = array(
"http://www.google.com",
"http://www.google.com/maps",
"http://www.google.com/mail",
"https://drive.google.com",
"https://www.youtube.com",
"https://www.youtube.com/feed/subscriptions",
"https://www.facebook.com/me",
"https://www.facebook.com/me/friends"
);
I find this hard to explain but I want to break this array down to only show the reduced URLs with no duplicates, so it looks like this...
$urls = array(
"http://www.google.com",
"https://drive.google.com",
"https://www.youtube.com",
"https://www.facebook.com/me"
);
Notice the last URL in the second array still has its path. This is because I still want to show the lowest-level paths.

Based on @Tim's answer:
foreach ($urls as &$url) {
$url_parts = parse_url($url);
$url = $url_parts["scheme"]."://".$url_parts["host"];
}
$urls = array_unique($urls);

Just sort the array in reverse order, and create an array indexed by host:
$urls = array(
"http://www.google.com",
"http://www.google.com/maps",
"http://www.google.com/mail",
"https://drive.google.com",
"https://www.youtube.com",
"https://www.youtube.com/feed/subscriptions",
"https://www.facebook.com/me",
"https://www.facebook.com/me/friends"
);
rsort($urls);
$return = [];
foreach($urls as $url) {
$host = parse_url($url, PHP_URL_HOST);
$return[$host] = $url;
}
$return = array_values($return); // To remove array keys, if desired.
The reverse-ordered urls array would be:
Array
(
[0] => https://www.youtube.com/feed/subscriptions
[1] => https://www.youtube.com
[2] => https://www.facebook.com/me/friends
[3] => https://www.facebook.com/me
[4] => https://drive.google.com
[5] => http://www.google.com/maps
[6] => http://www.google.com/mail
[7] => http://www.google.com
)
Since the last entry per host name in the sorted array is the one that you want, and each assignment deliberately overwrites any value already stored for that host, $return (before the array_values() call) would contain:
Array
(
[www.youtube.com] => https://www.youtube.com
[www.facebook.com] => https://www.facebook.com/me
[drive.google.com] => https://drive.google.com
[www.google.com] => http://www.google.com
)
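The same idea can be wrapped in a function. This is only a sketch: the name shortestUrlPerHost is made up for illustration, and it assumes every entry is an absolute URL that parse_url() can split and that all URLs for a host share the same scheme.

```php
<?php
// Hypothetical helper (not from the answer above): keep one URL per host.
// After a reverse sort, the last URL seen for a host is the one that sorts
// lowest, which is the shortest when the URLs share a common prefix.
function shortestUrlPerHost(array $urls): array
{
    rsort($urls); // longer URLs now come before their own prefixes
    $byHost = [];
    foreach ($urls as $url) {
        $host = parse_url($url, PHP_URL_HOST);
        if (!is_string($host)) {
            continue; // skip entries without a usable host
        }
        $byHost[$host] = $url; // later entries overwrite earlier ones
    }
    return array_values($byHost);
}

$urls = [
    "http://www.google.com",
    "http://www.google.com/maps",
    "http://www.google.com/mail",
    "https://drive.google.com",
    "https://www.youtube.com",
    "https://www.youtube.com/feed/subscriptions",
    "https://www.facebook.com/me",
    "https://www.facebook.com/me/friends",
];
print_r(shortestUrlPerHost($urls));
```

The result keeps "https://www.facebook.com/me" because no shorter URL exists for that host.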

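Try this:
$result = array();
array_push($result, $urls[0]);
for($i=1; $i<count($urls); $i++)
{
$repeat = false;
foreach($result as $res)
{
// strpos() returns 0 for a match at the start of the string, which is falsy,
// so compare with === 0. This relies on each short URL preceding its longer variants.
if(strpos($urls[$i], $res) === 0)
{
$repeat = true;
break;
}
}
if(!$repeat)
array_push($result, $urls[$i]);
}
return $result;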

Related

PHP group certain results from foreach on array into another array

I have an array that looks something like this:
$array = array(
'FILE-F01-E1-S01.pdf',
'FILE-F01-E1-S02.pdf',
'FILE-F01-E1-S03.pdf',
'FILE-F01-E1-S04.pdf',
'FILE-F01-E1-S05.pdf',
'FILE-F02-E1-S01.pdf',
'FILE-F02-E1-S02.pdf',
'FILE-F02-E1-S03.pdf'
);
Basically, I need to look at the first file and then get all the other files that have the same beginning ('FILE-F01-E1', for example) and put them into an array. I don't need to do anything with the other ones at this point.
I've been trying to use a foreach loop finding the previous value to do this, but am not having any luck.
Like this:
$previousFile = null;
foreach($array as $file)
{
if(substr_replace($previousFile, "", -8) == substr_replace($file, "", -8))
{
$secondArray[] = $file;
}
$previousFile = $file;
}
So then $secondArray would look like this:
Array ( [0] => FILE-F01-E1-S01.pdf [1] => FILE-F01-E1-S02.pdf
[2] => FILE-F01-E1-S03.pdf [3] => FILE-F01-E1-S04.pdf
[4] => FILE-F01-E1-S05.pdf)
As my result.
Thank you!
You can use array_filter combined with strpos:
$result = array_filter($array, function($filename) {
return strpos($filename, 'FILE-F01-E1') === 0;
});
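A self-contained version of this filter might look as follows. It is a sketch that assumes the prefix 'FILE-F01-E1' from the question; the variable names $prefix and $matches are chosen here for illustration. It uses strncmp() so that it also runs on PHP versions older than 8.0, where str_starts_with() is not available.

```php
<?php
$array = [
    'FILE-F01-E1-S01.pdf',
    'FILE-F01-E1-S02.pdf',
    'FILE-F01-E1-S03.pdf',
    'FILE-F01-E1-S04.pdf',
    'FILE-F01-E1-S05.pdf',
    'FILE-F02-E1-S01.pdf',
    'FILE-F02-E1-S02.pdf',
    'FILE-F02-E1-S03.pdf',
];
$prefix = 'FILE-F01-E1'; // assumed prefix, taken from the question's example

// strncmp() compares only the first strlen($prefix) bytes of both strings.
$matches = array_filter($array, function ($filename) use ($prefix) {
    return strncmp($filename, $prefix, strlen($prefix)) === 0;
});

// array_filter() preserves the original keys; reindex if a plain list is wanted.
$matches = array_values($matches);
print_r($matches);
```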
Are you sure this will always be the naming format? That is crucial to know before constructing a regexp or a prefix check against the other strings.
If we can assume that, and that the "base" name is always at index 0, then you could do something like this:
<?php
$myArr = [
'FILE-F01-E1-S01.pdf',
'FILE-F01-E1-S02.pdf',
'FILE-F01-E1-S03.pdf',
'FILE-F01-E1-S04.pdf',
'FILE-F01-E1-S05.pdf',
'FILE-F02-E1-S01.pdf',
'FILE-F02-E1-S02.pdf',
'FILE-F02-E1-S03.pdf'
];
$baseName = '';
$allSimilarNames = [];
foreach($myArr as $index => &$name) {
if($index == 0) {
$baseName = substr($name, 0, strrpos($name, '-'));
$allSimilarNames[] = $name;
}
else {
if(strpos($name, $baseName) === 0) {
$allSimilarNames[] = $name;
}
}
}
var_dump($allSimilarNames);
This will:
Check index 0 to get the base name to compare against
Loop over all items in the array and match every item, wherever it sits in the array, that is similar according to your naming convention
So if next time you have an array like this:
$myArr = [
'FILE-F02-E1-S01.pdf',
'FILE-F01-E1-S01.pdf',
'FILE-F01-E1-S02.pdf',
'FILE-F01-E1-S03.pdf',
'FILE-F01-E1-S04.pdf',
'FILE-F01-E1-S05.pdf',
'FILE-F02-E1-S02.pdf',
'FILE-F02-E1-S03.pdf'
];
this will return all the items that match FILE-F02-E1*.
You could also turn it into a small function for easier use, so that you no longer rely on the element at index 0 being the "base" name.
<?php
function findMatches($baseName, &$names) {
$matches = [];
$baseName = substr($baseName, 0, strrpos($baseName, '-'));
foreach($names as &$name) {
if(strpos($name, $baseName) === 0) {
$matches[] = $name;
}
}
return $matches;
}
$myArr = [
'FILE-F01-E1-S01.pdf',
'FILE-F01-E1-S02.pdf',
'FILE-F01-E1-S03.pdf',
'FILE-F01-E1-S04.pdf',
'FILE-F01-E1-S05.pdf',
'FILE-F02-E1-S01.pdf',
'FILE-F02-E1-S02.pdf',
'FILE-F02-E1-S03.pdf'
];
$allSimilarNames = findMatches('FILE-F01-E1-S01.pdf', $myArr);
var_dump($allSimilarNames);
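If every group is needed later, the same "everything before the last hyphen" rule can bucket all names in one pass. This is a sketch under that naming assumption; groupByBaseName is a hypothetical helper, not part of the answer above.

```php
<?php
// Hypothetical helper: bucket file names by the part before the last hyphen.
function groupByBaseName(array $names): array
{
    $groups = [];
    foreach ($names as $name) {
        $pos = strrpos($name, '-');
        // Names without a hyphen form their own single-item group.
        $base = ($pos === false) ? $name : substr($name, 0, $pos);
        $groups[$base][] = $name;
    }
    return $groups;
}

$groups = groupByBaseName([
    'FILE-F01-E1-S01.pdf',
    'FILE-F01-E1-S02.pdf',
    'FILE-F01-E1-S03.pdf',
    'FILE-F01-E1-S04.pdf',
    'FILE-F01-E1-S05.pdf',
    'FILE-F02-E1-S01.pdf',
    'FILE-F02-E1-S02.pdf',
    'FILE-F02-E1-S03.pdf',
]);
print_r($groups['FILE-F01-E1']);
```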
Run a simple foreach with strpos() which looks for an occurrence of a string within a string.
$results = array();
foreach($array as $item){
if (strpos($item, 'FILE-F01-E1') === 0) {
array_push($results, $item);
}
}
You could take the first item from the array and use explode and implode to get the part of the filename before the last hyphen.
Then use array_filter with substr, taking 0 as the start position and the length of $fileBeginning as the length, to check whether each string starts with FILE-F01-E1:
$array = [
'FILE-F01-E1-S01.pdf',
'FILE-F01-E1-S02.pdf',
'FILE-F01-E1-S03.pdf',
'FILE-F01-E1-S04.pdf',
'FILE-F01-E1-S05.pdf',
'FILE-F02-E1-S01.pdf',
'FILE-F02-E1-S02.pdf',
'FILE-F02-E1-S03.pdf',
"TESTFILE-F01-E1-S03.pdf"
];
$parts = explode('-', $array[0]);
array_pop($parts);
$fileBeginning = implode('-', $parts);
$secondArray = array_filter($array, function ($x) use ($fileBeginning) {
return substr($x, 0, strlen($fileBeginning)) === $fileBeginning;
});
print_r($secondArray);
Result
Array
(
[0] => FILE-F01-E1-S01.pdf
[1] => FILE-F01-E1-S02.pdf
[2] => FILE-F01-E1-S03.pdf
[3] => FILE-F01-E1-S04.pdf
[4] => FILE-F01-E1-S05.pdf
)

Get URLs lowest path by domain

I have an array that looks like the following...
$urls = array(
"http://www.google.com",
"http://www.google.com/maps",
"http://www.google.com/mail",
"https://drive.google.com/help",
"https://www.youtube.com",
"https://www.youtube.com/feed/subscriptions",
"https://www.facebook.com/me",
"https://www.facebook.com/me/friends"
);
I find this hard to explain but I want to break this array down to only show the lowest path for each domain with no duplicates, so it looks like this...
$urls = array(
"http://www.google.com",
"https://drive.google.com/help",
"https://www.youtube.com",
"https://www.facebook.com/me"
);
This can be achieved by walking through the array and inspecting the host key by using parse_url(). The following logic will give your desired result.
$output = array();
//Sort the array by character length
usort($urls, function($a, $b) {
return strlen($a)-strlen($b);
});
array_walk($urls, function($url) use (&$output) {
//Parse the URL to get its components
$parsed_url = parse_url($url);
//See if we've already added the host to our final array
if( array_key_exists($parsed_url['host'], $output) === FALSE ) {
//We haven't, so we can now add the url to our final array
$output[$parsed_url['host']] = $url;
}
});
https://eval.in/415655
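The sort can be avoided by tracking the shortest URL per host in a single pass. The following is a minimal sketch; lowestPathPerHost is a name invented here, and "lowest path" is assumed to mean "shortest URL string for that host".

```php
<?php
// Hypothetical helper: one pass, no sorting. For each host, remember the
// shortest URL seen so far.
function lowestPathPerHost(array $urls): array
{
    $best = [];
    foreach ($urls as $url) {
        $host = parse_url($url, PHP_URL_HOST);
        if (!is_string($host)) {
            continue; // no host component, e.g. a relative path
        }
        if (!isset($best[$host]) || strlen($url) < strlen($best[$host])) {
            $best[$host] = $url;
        }
    }
    return $best; // keyed by host; use array_values() for a plain list
}

$urls = [
    "http://www.google.com",
    "http://www.google.com/maps",
    "http://www.google.com/mail",
    "https://drive.google.com/help",
    "https://www.youtube.com",
    "https://www.youtube.com/feed/subscriptions",
    "https://www.facebook.com/me",
    "https://www.facebook.com/me/friends",
];
print_r(array_values(lowestPathPerHost($urls)));
```

Unlike the sorted version, this keeps the hosts in the order in which they first appear in the input.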
Try this:
$urls = array(
"http://www.google.com",
"http://www.google.com/maps",
"http://www.google.com/mail",
"https://drive.google.com/help",
"https://www.youtube.com",
"https://www.youtube.com/feed/subscriptions",
"https://www.facebook.com/me",
"https://www.facebook.com/me/friends"
);
$temp = array();
$res = array();
usort($urls, function($a, $b) {
return strlen($a)-strlen($b);
});//sort the array based string length
foreach($urls as $url){
$str = preg_replace('#^https?://#', '', $url);
$strarray = explode("/", $str);
if(!in_array($strarray[0], $temp)){
$temp[] = $strarray[0];
$res[] = $url;
}
}
echo"<pre>";
print_r($res);
echo"</pre>";
Output:
Array
(
[0] => http://www.google.com
[1] => https://www.youtube.com
[2] => https://www.facebook.com/me
[3] => https://drive.google.com/help
)

Dynamically creating a multidimensional array based on paths

So I've got a list of paths, such as:
path/to/directory/file1
path/directory/file2
path2/dir/file3
path2/dir/file4
And I'd like to convert them into a multidimensional array like this:
array(
path => array(
to => array(
directory => array(
file1 => someValue
),
),
directory => array(
file2 => someValue
),
),
path2 => array(
dir => array(
file3 => someValue,
file4 => someValue
)
)
)
My first thought was to explode() the paths into segments and set up the array using a foreach loop, something like this:
$arr = array();
foreach ( $path as $p ) {
$segments = explode('/', $p);
$str = '';
foreach ( $segments as $s ) {
$str .= "[$s]";
}
$arr{$str} = $someValue;
}
But this doesn't work, and since the number of segments varies, I'm stumped. Is there a way to do this?
If somevalue can be an empty array:
<?php
$result = array();
$input = [
'path/to/directory/file1',
'path/directory/file2',
'path2/dir/file3',
'path2/dir/file4',
];
foreach( $input as $e ) {
nest( $result, explode('/', $e));
}
var_export($result);
function nest(array &$target, array $parts) {
if ( empty($parts) ) {
return;
}
else {
$e = array_shift($parts);
if ( !isset($target[$e]) ) {
$target[$e] = [];
}
nest($target[$e], $parts);
}
}
Here is an easy way to do it.
Just reverse each exploded path and build the arrays from the innermost element outwards. Note that this produces one nested array per path in $g rather than a single merged tree.
$path[1] = "path/to/directory/file1";
$path[2] = "path/directory/file2";
$path[3] = "path2/dir/file3";
$path[4] = "path2/dir/file4";
$arr = array();
$b = array();
$k = 0;
foreach($path as $p) {
$c = 0;
$segments = explode('/', $p);
$reversed = array_reverse($segments);
foreach($reversed as $s) {
if ($c == 0) {
$g[$k] = array($s => "somevalue");
} else {
$g[$k] = array($s => $g[$k]);
}
$c++;
}
$k++;
}
var_dump($g);
Thanks so much VolkerK! Your answer didn't quite answer my question but it got me on the right track. Here's the version I ended up using to get it to work:
$result = array();
$input = [
'path/to/directory/file1' => 'someValue',
'path/directory/file2' => 'someValue',
'path2/dir/file3' => 'someValue',
'path2/dir/file4' => 'someValue',
];
foreach( $input as $e=>$val ) {
nest( $result, explode('/', $e), $val);
}
var_export($result);
function nest(array &$target, array $parts, $leafValue) {
$e = array_shift($parts);
if ( empty($parts) ) {
$target[$e] = $leafValue;
return;
}
if ( !isset($target[$e]) ) {
$target[$e] = [];
}
nest($target[$e], $parts, $leafValue);
}
I basically just added the somevalue as $leafValue and moved the base case around so that it would add the leafValue instead of a blank array at the end.
This results in:
Array
(
[path] => Array
(
[to] => Array
(
[directory] => Array
(
[file1] => someValue
)
)
[directory] => Array
(
[file2] => someValue
)
)
[path2] => Array
(
[dir] => Array
(
[file3] => someValue
[file4] => someValue
)
)
)
Thanks a lot!
It can be done without recursion:
$path = array(
'path/to/directory/file1',
'path/directory/file2',
'path2/dir/file3',
'path2/dir/file4');
$arr = [];
$someValue = 'someValue';
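foreach ( $path as $p ) {
$segments = explode('/', $p);
// Walk with a separate reference variable. Reusing $p here would leave it bound
// to the last leaf, and the next foreach assignment would overwrite that leaf.
$node = &$arr;
foreach ( $segments as $s ) {
if (! isset($node[$s] ) ) $node[$s] = array();
$node = &$node[$s];
}
$node = $someValue;
unset($node);
}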
print_r($arr);
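The reference-walking loop can also be packaged as a small function, which keeps the reference variable local. This is a sketch only; setByPath and its parameters are names invented for illustration.

```php
<?php
// Hypothetical helper: store $value at the position described by $path,
// creating intermediate arrays as needed.
function setByPath(array &$tree, string $path, $value, string $sep = '/'): void
{
    $node = &$tree;
    foreach (explode($sep, $path) as $segment) {
        // Create the level if it is missing (or replace a non-array leaf).
        if (!isset($node[$segment]) || !is_array($node[$segment])) {
            $node[$segment] = [];
        }
        $node = &$node[$segment];
    }
    $node = $value; // $node now refers to the leaf position
}

$tree = [];
$paths = [
    'path/to/directory/file1',
    'path/directory/file2',
    'path2/dir/file3',
    'path2/dir/file4',
];
foreach ($paths as $p) {
    setByPath($tree, $p, 'someValue');
}
print_r($tree);
```

Because $node lives only inside the function, no reference survives between calls.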

Parsing complex URLs

I am trying to parse a list of URL strings, but after two hours of work I have not reached any result. The list of URL strings looks like this:
$url_list = array(
'http://google.com',
'http://localhost:8080/test/project/',
'http://mail.yahoo.com',
'http://www.bing.com',
'http://www.phpromania.net/forum/viewtopic.php?f=24&t=7549',
'https://prodgame10.alliances.commandandconquer.com/12/index.aspx',
'https://prodgame10.alliances.commandandconquer.ro/12/index.aspx',
);
Output should be:
Array
(
[0] => .google.com
[1] => .localhost
[2] => .yahoo.com
[3] => .bing.com
[4] => .phpromania.net
[5] => .commandandconquer.com
)
The first thing that trips me up is URLs with more than two dots.
Is there an example algorithm for this?
This is what I tried:
$url_list = array(
'http://google.com',
'http://localhost:8080/test/project/',
'http://mail.yahoo.com',
'http://www.bing.com',
'http://www.phpromania.net/forum/viewtopic.php?f=24&t=27549',
'https://prodgame10.alliances.commandandconquer.com/12/index.aspx',
);
function size($list)
{
$i=0;
while($list[++$i]!=NULL);
return $i;
}
function url_Host($list)
{
$listSize = size($list)-1;
do
{
$strSize = size($list[$listSize]);
$points = 0;
$dpoints = 0;
$tmpString = '';
do
{
$currentChar = $list[$listSize][$strSize];
if(ord('.')==ord($currentChar))
{
$tmpString .= '.';
$points++;
}
else if(ord(':')==ord($currentChar))
{
$tmpString .= ':';
$dpoints++;
}
}while($list[$listSize][--$strSize]!=NULL);
print $tmpString;
$strSize = size($list[$listSize]);
$tmpString = '';
do
{
$slice = false;
$currentChar = $list[$listSize][$strSize];
if($dpoints > 2)
{
if(ord('\\')==ord($curentChar)) $slice = true;
$tmpString .= '';
}
}while($list[$listSize][--$strSize]!=NULL);
print $tmpString."<br />";
}while($list[--$listSize]);
}
url_Host($url_list);
You can use the built-in function parse_url() as follows:
function getDomain($url)
{
$domain = implode('.', array_slice(explode('.', parse_url($url, PHP_URL_HOST)), -2));
return $domain;
}
Test cases:
foreach ($url_list as $url) {
$result[] = getDomain($url);
}
Output:
Array
(
[0] => google.com
[1] => localhost
[2] => yahoo.com
[3] => bing.com
[4] => phpromania.net
[5] => commandandconquer.com
[6] => commandandconquer.ro
)
As for the dots, you can manually prepend them to the string, like so:
$result[] = "." . getDomain($url);
I'm not sure why you need to do this, but this should work.
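Two edge cases are worth checking before relying on the "last two labels" rule: URLs without a host, and hosts under a multi-part suffix such as co.uk. The sketch below is a guarded variant of getDomain(); the name getDomainOrNull and the example.co.uk URL are made up for illustration.

```php
<?php
// Hypothetical variant of getDomain(): returns null when the URL has no host.
function getDomainOrNull(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }
    // Keep the last two dot-separated labels (or the single label, e.g. "localhost").
    return implode('.', array_slice(explode('.', $host), -2));
}

var_dump(getDomainOrNull('http://mail.yahoo.com'));       // "yahoo.com"
var_dump(getDomainOrNull('http://localhost:8080/test/')); // "localhost"
var_dump(getDomainOrNull('http://www.example.co.uk/'));   // "co.uk", probably not what is wanted
var_dump(getDomainOrNull('/just/a/path'));                // NULL
```

Handling suffixes like co.uk correctly would need a list of public suffixes, which this sketch does not attempt.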
Look at parse_url. For example:
$url = 'http://www.phpromania.net/forum/viewtopic.php?f=24&t=7549';
$host = parse_url($url, PHP_URL_HOST);
First, the expected result for localhost makes no sense, but try this:
$result =array();
foreach($url_list as $u){
$arr = explode('//',$u);
$arr2 = explode('.', $arr[1]);
if($arr2[0] == 'www')
array_push($result, $arr2[1]);
else
array_push($result, $arr2[0]);
}
We can also use array_map() with an arrow function (PHP 7.4+) to simplify the code.
I'm refactoring @Alessandro Minoccheri's code here.
$domains = array_map(fn($url) => implode('.', array_slice(explode('.', parse_url($url, PHP_URL_HOST)), -2)), $url_list);
var_dump($domains);

PHP - Create Hierarchical Array

I'm not even sure how to begin wording this question, but basically, I have an array, that looks like this:
Array
(
[0] => /
[1] => /404/
[2] => /abstracts/
[3] => /abstracts/edit/
[4] => /abstracts/review/
[5] => /abstracts/view/
[6] => /admin/
[7] => /admin/ads/
[8] => /admin/ads/clickcounter/
[9] => /admin/ads/delete/
[10] => /admin/ads/edit/
[11] => /admin/ads/list/
[12] => /admin/ads/new/
[13] => /admin/ads/sponsordelete/
[14] => /admin/ads/sponsoredit/
[15] => /admin/ads/sponsornew/
[16] => /admin/ads/stats/
[17] => /admin/boilerplates/
[18] => /admin/boilerplates/deleteboiler/
[19] => /admin/boilerplates/editboiler/
[20] => /admin/boilerplates/newboilerplate/
[21] => /admin/calendar/event/add/
[22] => /admin/calendar/event/copy/
)
And I need to 'reduce' / 'process' it into an array that looks like this:
Array
(
[''] => Array()
['404'] => Array()
['abstracts'] => Array
(
[''] => Array()
['edit'] => Array()
['review'] => Array()
['view'] => Array()
)
['admin'] => Array
(
['ads'] => Array
(
[''] => Array()
['clickcounter'] => Array()
['delete'] =>Array()
['edit'] => Array()
)
)
.....
.....
)
If initialized manually, that would look something like this:
$urlTree = array( '' => array(),
'404' => array(),
'abstracts'=> array( '' => array(),
'edit' => array(),
'review'=> array(),
'view' => array() ),
'admin' => array( 'ads'=> array( '' => array(),
'clickcounter'=> array(),
'delete' => array(),
'edit' => array() ) )
);
I usually stray away from asking straight up for a chunk of code on SO, but does anyone perhaps have any advice / code that can traverse my array and convert it to a hierarchy?
EDIT: Here is the bit I have right now, which I know is pitifully small. It seems I'm just blanking out today.
function loadUrlData()
{
// hold the raw data, /blah/blah/
$urlData = array();
$res = sql::query( "SELECT DISTINCT(`url`) FROM `pages` ORDER BY `url` ASC" );
while( $row = sql::getarray( $res ) )
{
$urlData[] = explode( '/', substr( $row['url'], 1, -1 ) );
}
// populated, eventually, with the parent > child data
$treeData = array();
// a url
foreach( $urlData as $k=> $v )
{
// the url pieces
foreach( $v as $k2=> $v2 )
{
}
}
// $treeData eventually
return $urlData;
}
Looks rather easy. You want to loop through all lines (foreach), split them into parts (explode), loop through them (foreach) and categorize them.
Since you don't like asking for a chunk of code, I won't provide any.
Update
A very nice way to solve this is to reference the $urlTree (use &), loop through every part of the URL and keep updating a variable like $currentPosition to the current part in the URL tree. Because you use &, you can simply edit the array directly while still using a simple variable.
Update 2
This might work:
// a url
foreach( $urlData as $k=> $v )
{
$currentSection = &$treeData;
// the url pieces
foreach( $v as $k2=> $v2 )
{
if (!isset($currentSection[$v2])) {
$currentSection[$v2] = array();
}
$currentSection = &$currentSection[$v2];
}
}
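A self-contained version of this reference walk, with a shortened sample of the question's URLs, could look like the sketch below. It uses trim() to drop the surrounding slashes (an assumption made here, the question uses substr()), and it does not add the '' child entries shown in the question's desired output.

```php
<?php
$urls = ['/', '/404/', '/abstracts/', '/abstracts/edit/', '/admin/ads/'];

$treeData = [];
foreach ($urls as $url) {
    // '/' becomes '', so the root URL ends up under the '' key.
    $parts = explode('/', trim($url, '/'));
    $currentSection = &$treeData;
    foreach ($parts as $part) {
        if (!isset($currentSection[$part])) {
            $currentSection[$part] = [];
        }
        // Move the reference one level down the tree.
        $currentSection = &$currentSection[$part];
    }
    unset($currentSection); // break the reference before the next URL
}
print_r($treeData);
```

Note that PHP stores a numeric string key such as '404' as the integer 404, which only matters if the keys are compared strictly later on.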
I know you didn't ask for a chunk of code, but I'd just call this a petit serving:
$map = array();
foreach($urls as $url) {
$folders = explode('/', trim($url, '/'));
applyChain($map, $folders, array());
}
function applyChain(&$arr, $indexes, $value) { //Here's your recursion
if(!is_array($indexes)) {
return;
}
if(count($indexes) == 0) {
$arr = $value;
} else {
applyChain($arr[array_shift($indexes)], $indexes, $value);
}
}
It's fairly simple. We separate each url into its folders (removing trailing and leading slashes) and then work our way down the array chain until we reach the folder mentioned in the URL. Then we place a new empty array there and continue to the next URL.
My version:
$paths = array(
0 => '/',
1 => '/404/',
2 => '/abstracts/',
3 => '/abstracts/edit/',
4 => '/abstracts/review/',
5 => '/abstracts/view/',
6 => '/admin/',
7 => '/admin/ads/',
// ....
);
$tree = array();
foreach($paths as $path){
$tmp = &$tree;
$pathParts = explode('/', rtrim($path, '/'));
foreach($pathParts as $pathPart){
if(!array_key_exists($pathPart, $tmp)){
$tmp[$pathPart] = array();
}
$tmp = &$tmp[$pathPart];
}
}
echo json_encode($tree, JSON_PRETTY_PRINT);
https://ideone.com/So1HLm
http://ideone.com/S9pWw
$arr = array(
'/',
'/404/',
'/abstracts/',
'/abstracts/edit/',
'/abstracts/review/',
'/abstracts/view/',
'/admin/',
'/admin/ads/',
'/admin/ads/clickcounter/',
'/admin/ads/delete/',
'/admin/ads/edit/',
'/admin/ads/list/',
'/admin/ads/new/',
'/admin/ads/sponsordelete/',
'/admin/ads/sponsoredit/',
'/admin/ads/sponsornew/',
'/admin/ads/stats/',
'/admin/boilerplates/',
'/admin/boilerplates/deleteboiler/',
'/admin/boilerplates/editboiler/',
'/admin/boilerplates/newboilerplate/',
'/admin/calendar/event/add/',
'/admin/calendar/event/copy/');
$result = array();
foreach ($arr as $node) {
$result = magic($node, $result);
}
var_dump($result);
function magic($node, $tree)
{
$path = explode('/', rtrim($node, '/'));
$original =& $tree;
foreach ($path as $node) {
if (!array_key_exists($node, $tree)) {
$tree[$node] = array();
}
if ($node) {
$tree =& $tree[$node];
}
}
return $original;
}
<?php
$old_array = array("/", "/404/", "/abstracts/", "/abstracts/edit/", "/abstracts/review/", "/rrl/");
$new_array = array();
foreach($old_array as $woot) {
$segments = explode('/', $woot);
$current = &$new_array;
for($i=1; $i<sizeof($segments); $i++) {
if(!isset($current[$segments[$i]])){
$current[$segments[$i]] = array();
}
$current = &$current[$segments[$i]];
}
}
print_r($new_array);
?>
You might consider converting your text to a JSON string, then using json_decode() to generate the structure.
