I have built a web scraper that recursively collects all URLs from a specific website and stores them in an array.
Example below:
$array = [
'http://site.test',
'http://site.test/blog',
'http://site.test/blog/blog1',
'http://site.test/blog/blog2',
'http://site.test/services',
'http://site.test/services/service1',
'http://site.test/services/service2',
'http://site.test/services/service2/sub-service',
'http://site.test/product',
'http://site.test/product/product1',
'http://site.test/product/product2',
];
I am looking for a way to organise this array into a multidimensional array so that I can see which pages are child pages and which section they belong to, with something like the structure below:
Home
----blog
--------article1
--------article2
----services
--------service1
--------service2
------------sub-service1
----product
--------product1
--------product2
I have tried looping through and extracting certain segments of each string but cannot seem to get the desired result.
Ideally I would like to have the result in an array, or even displayed in a multi-level list for display purposes.
Any guidance would be much appreciated!
Let's try. We have an array of links:
$array = [
'http://site.test',
'http://site.test/blog',
'http://site.test/blog/blog1',
'http://site.test/blog/blog2',
'http://site.test/services',
'http://site.test/services/service1',
'http://site.test/services/service2',
'http://site.test/services/service2/sub-service',
'http://site.test/product',
'http://site.test/product/product1',
'http://site.test/product/product2',
];
To build a tree, we first create a Node class:
class Node
{
private array $childNodes;
private string $name;
public function __construct(string $name)
{
$this->name = $name;
$this->childNodes = [];
}
public function getName(): string
{
return $this->name;
}
public function addChildNode(Node $node): void
{
$this->childNodes[$node->getName()] = $node;
}
public function hasChildNode(string $name): bool
{
return array_key_exists($name, $this->childNodes);
}
public function getChildNode(string $name): Node
{
return $this->childNodes[$name];
}
public function getChildNodes(): array
{
return $this->childNodes;
}
}
Then a Tree class that uses the Node class.
The appendUrl method parses a URL and builds the chain of nodes.
class Tree
{
private Node $head;
public function __construct()
{
$this->head = new Node('Head');
}
public function getHead(): Node
{
return $this->head;
}
public function appendUrl(string $url): void
{
$parsedUrl = parse_url($url);
$uri = sprintf('%s://%s', $parsedUrl['scheme'], $parsedUrl['host']);
$keys = array_filter(explode('/', $parsedUrl['path'] ?? ''));
$keys = [$uri, ...$keys];
$node = $this->head;
foreach ($keys as $key) {
if (!$node->hasChildNode($key)) {
$prevNode = $node;
$node = new Node($key);
$prevNode->addChildNode($node);
} else {
$node = $node->getChildNode($key);
}
}
}
}
Now we create a ConsoleTreeDrawer class that draws the tree to the console:
class ConsoleTreeDrawer
{
public function draw(Tree $tree): void
{
$node = $tree->getHead();
$this->drawNode($node);
}
private function drawNode(Node $node, int $level = 1): void
{
$prefix = str_repeat('-', 2 * $level);
print("{$prefix}{$node->getName()}\n");
foreach ($node->getChildNodes() as $childNode) {
$this->drawNode($childNode, $level + 1);
}
}
}
And let's use our classes:
$tree = new Tree();
foreach ($array as $url) {
$tree->appendUrl($url);
}
$drawer = new ConsoleTreeDrawer();
$drawer->draw($tree);
This prints the tree:
--Head
----http://site.test
------blog
--------blog1
--------blog2
------services
--------service1
--------service2
----------sub-service
------product
--------product1
--------product2
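If you'd rather end up with a plain nested array (the question mentions wanting an array), here is a minimal sketch that skips the classes and walks the path segments with references instead. It assumes every URL parses with a scheme and host; buildUrlTree() is a hypothetical helper name, and it needs PHP 7.4+ for `??=` and array unpacking.

```php
<?php
// Sketch: nested array keyed by path segment, rooted at "scheme://host".
// buildUrlTree() is a hypothetical helper, not part of the answer above.
function buildUrlTree(array $urls): array
{
    $tree = [];
    foreach ($urls as $url) {
        $parts = parse_url($url);
        $root = $parts['scheme'] . '://' . $parts['host'];
        // Drop empty segments (e.g. from the leading '/'), but keep "0"
        $segments = array_filter(explode('/', $parts['path'] ?? ''), 'strlen');
        $node = &$tree;
        foreach ([$root, ...$segments] as $segment) {
            $node[$segment] ??= [];   // create the child the first time we see it
            $node = &$node[$segment]; // descend into it
        }
        unset($node); // break the reference before handling the next URL
    }
    return $tree;
}

print_r(buildUrlTree([
    'http://site.test',
    'http://site.test/blog',
    'http://site.test/blog/blog1',
    'http://site.test/services',
]));
```

Because missing parents are created on the fly, the input doesn't need to be sorted first, and a page whose parent was never scraped still ends up in the right place.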
Algorithm:
Remove the http:// prefix for now, as it isn't needed for this requirement. You can add it back later.
Next, sort all the elements using usort, based on the number of segments obtained by exploding each link on /.
Now we can be sure that every parent comes before its children in the array.
Next, assign a version number (rank) to each link, as follows:
'http://site.test' => 1
'http://site.test/blog' => 1.1
'http://site.test/services' => 1.2
'http://site.test/blog/blog1' => 1.1.1
That is the scheme by which version numbers are assigned.
Now we just need to sort the array by these version numbers using uasort,
and we are done.
Snippet:
<?php
$array = [
'http://site.test',
'http://site.test/blog',
'http://site.test/blog/blog1',
'http://site.test/blog/blog2',
'http://site.test/services',
'http://site.test/services/service1',
'http://site.test/services/service2',
'http://site.test/services/service2/sub-service',
'http://site.test/product',
'http://site.test/product/product1',
];
// remove http://
foreach($array as &$val){
$val = substr($val,7);
}
unset($val); // break the reference left over from the by-reference loop
// sort based on length on explode done on delimiter '/'
usort($array, function($a,$b){
return count(explode("/",$a)) <=> count(explode("/",$b));
});
$ranks = [];
$child_count = [];
// assign ranks/version numbers
foreach($array as $link){
$parent = getParent($link);
if(!isset($ranks[$parent])){
$ranks[$link] = 1;
}else{
$child_count[$parent]++;
$ranks[$link] = $ranks[$parent] . "." . $child_count[$parent];
}
$child_count[$link] = 0;
}
function getParent($link){
$link = explode("/",$link);
array_pop($link);
return implode("/",$link);
}
// sort based on version numbers
uasort($ranks,function($a,$b){
$version1 = explode(".", $a);
$version2 = explode(".", $b);
foreach($version1 as $index => $v_num){
if(!isset($version2[$index])) return 1;
$aa = intval($v_num);
$bb = intval($version2[$index]);
if($aa < $bb) return -1;
if($bb < $aa) return 1;
}
return count($version1) <=> count($version2);
});
// get the actual product links that were made as keys
$array = array_keys($ranks);
print_r($array);// now you can attach back http:// prefix if you like
Note: the current algorithm also removes duplicates, since there is no point in keeping them.
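As a possible simplification, PHP's built-in version_compare() already compares dot-separated numeric strings part by part (so 1.2 sorts before 1.10), so it could likely stand in for the hand-written comparator. A sketch with a few hand-written ranks in the order the ranking loop would produce them:

```php
<?php
// Hand-written ranks, in the order the ranking loop assigns them (by depth)
$ranks = [
    'site.test'                   => '1',
    'site.test/blog'              => '1.1',
    'site.test/services'          => '1.2',
    'site.test/blog/blog1'        => '1.1.1',
    'site.test/blog/blog2'        => '1.1.2',
    'site.test/services/service1' => '1.2.1',
];

// version_compare($a, $b) returns -1, 0 or 1, which is what uasort expects
uasort($ranks, 'version_compare');

print_r(array_keys($ranks));
```

A shorter rank that is a prefix of a longer one (1.1 vs 1.1.1) sorts first, so each parent lands right before its children.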
Update:
Since you need a multidimensional hierarchical array, we can keep track of parent and child array link references and insert children into their respective parents.
<?php
$array = [
'http://site.test',
'http://site.test/blog',
'http://site.test/blog/blog1',
'http://site.test/blog/blog2',
'http://site.test/services',
'http://site.test/services/service1',
'http://site.test/services/service2',
'http://site.test/services/service2/sub-service',
'http://site.test/product',
'http://site.test/product/product1',
];
foreach($array as &$val){
$val = substr($val,7);
}
unset($val); // break the reference left over from the by-reference loop
usort($array, function($a,$b){
return count(explode("/",$a)) <=> count(explode("/",$b));
});
$hier = [];
$set = [];
foreach($array as $link){
$parent = getParent($link);
if(!isset($set[$parent])){
$hier[$link] = [];
$set[$link] = &$hier[$link];
}else{
$parent_array = &$set[$parent];
$parent_array[$link] = [];
$set[$link] = &$parent_array[$link];
}
}
function getParent($link){
$link = explode("/",$link);
array_pop($link);
return implode("/",$link);
}
print_r($hier);
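Since the question also mentions showing the result as a multi-level list, here is a sketch of a small recursive helper that turns a nested array shaped like $hier into nested `<ul>` elements. renderList() is a hypothetical name, and the input is a hand-written excerpt in the same shape, not the script's actual output.

```php
<?php
// Render a nested array (label => children) as nested HTML lists.
// renderList() is a hypothetical helper; leaves produce no inner <ul>.
function renderList(array $tree): string
{
    if ($tree === []) {
        return '';
    }
    $html = '<ul>';
    foreach ($tree as $label => $children) {
        $html .= '<li>' . htmlspecialchars((string) $label)
            . renderList($children) . '</li>';
    }
    return $html . '</ul>';
}

// Hand-written input in the same shape as $hier above
$hier = [
    'site.test' => [
        'site.test/blog' => [
            'site.test/blog/blog1' => [],
        ],
        'site.test/services' => [],
    ],
];
echo renderList($hier), "\n";
```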
Related
I created a cache from XML, and using a constructor I generate objects whose results finally become arrays. Everything would be OK if the keys of these arrays weren't "0". I don't know how this works. I searched for how to change the class, or how to replace the keys, but I am stuck. Could you help me with this?
$xml = simplexml_load_file($cache);
class Property {
public $xmlClass;
public $elemClass = '';
public $result_array = [];
public $data = '';
public function __construct($xml,$elem) {
$this->xmlClass=$xml;
$this->elemClass=$elem;
foreach($xml->list->movie as $value) {
$data = $value->$elem;
$this->result_array[] = $data;
}
}
public function getResult() {
return $this->result_array;
}
}
$result_zn = new Property($xml,'zn');
$result_au = new Property($xml,'au');
$result_ti = new Property($xml, 'ti');
$zn = $result_zn->getResult();
$au = $result_au->getResult();
$ti = $result_ti->getResult();
I think you can use the function array_values() to re-index the array so that the keys start from 0, like this:
$arr = array(
'1' => 'cat',
'2' => 'dog'
);
$newarr = array_values($arr);
print_r($newarr);
and the result is:
Array ( [0] => cat [1] => dog )
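If the "0" keys are what print_r() shows inside each stored value, that is most likely because $value->$elem is a SimpleXMLElement object rather than a string (an assumption, since the question doesn't show the output). Casting it to a string stores just the element's text. A sketch with a small inline XML document standing in for the cache file:

```php
<?php
// Hypothetical XML with the same list/movie structure as the question
$xml = simplexml_load_string(
    '<root><list><movie><ti>First</ti></movie><movie><ti>Second</ti></movie></list></root>'
);

$titles = [];
foreach ($xml->list->movie as $movie) {
    // (string) stores the element's text instead of the SimpleXMLElement object
    $titles[] = (string) $movie->ti;
}
print_r($titles);
```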
I need to return family data (parents, siblings and partners) for 'x' number of generations (passed as $generations parameter) starting from a single person (passed as $id parameter). I can't assume two parents, this particular genealogy model has to allow for a dynamic number of parents (to allow for biological and adoptive relationships). I think my recursion is backwards, but I can't figure out how.
The code below triggers my base clause 5 times, once for each generation, because $generations is decremented not once for every SET of parents but for every parent. What I want is for the base clause ($generations == 0) to only be triggered once, when 'x' number of generations for all parents of the initial person have been fetched.
public function fetchRelationships($id = 1, $generations = 5, $relationships = array())
{
$perId = $id;
if ($generations == 0) {
return $relationships;
} else {
$parents = $this->fetchParents($perId);
$relationships[$perId]['parents'] = $parents;
$relationships[$perId]['partners'] = $this->fetchPartners($perId);
if (!empty($parents)) {
--$generations;
foreach ($parents as $parentRel) {
$parent = $parentRel->getPer2();
$pid = $parent->getId();
$relationships[$perId]['siblings'][$pid] = $this->fetchSiblings($perId, $pid);
$perId = $pid;
$relationships[$perId] = $this->fetchRelationships($perId, $generations, $relationships);
}
}
return $relationships;
}
}
The methods fetchPartners, fetchParents and fetchSiblings just fetch the matching entities, so I am not pasting them here. Assuming 5 generations in which every person has 2 parents, the returned array should contain 62 elements, and the base clause should only be triggered once those 62 elements are filled.
Thanks, in advance, for any help.
-----------Edit--------
I have rewritten it with the fetchSiblings and fetchPartners code removed, to make it easier to read:
public function fetchRelationships($id = 1, $generations = 5, $relationships = array())
{
$perId = $id;
if ($generations == 0) {
return $relationships;
} else {
$parents = $this->fetchParents($perId);
$relationships[$perId]['parents'] = $parents;
if (!empty($parents)) {
--$generations;
foreach ($parents as $parentRel) {
$perId = $parentRel->getPer2()->getId();
$relationships[$perId] = $this->fetchRelationships($perId, $generations, $relationships);
}
}
return $relationships;
}
}
Garr Godfrey got it right. $generations will equal zero when it reaches the end of each branch, so you'll hit the "base clause" as many times as there are branches. In the foreach ($parents as $parentRel) loop, you call fetchRelationships for each parent. That's two branches, so you'll have two calls to the "base clause". Then for each of their parents, you'll have another two calls to the "base clause", and so on...
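To put numbers on this, assuming every person has exactly two parents (as in the question's 62-element example) and $generations starts at 5: each level of recursion doubles the number of calls, so the calls that land on the base clause and the total number of ancestors found are:

```latex
\text{base-clause calls} = 2^{5} = 32,
\qquad
\text{ancestors} = \sum_{k=1}^{5} 2^{k} = 2^{6} - 2 = 62
```

So with this structure, reaching the base clause 32 times is expected and does not by itself mean the recursion is backwards.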
Also, you're passing $relationships back and forth and assigning the returned array into one of its own elements. I realize you're just trying to retain information as you go, but since PHP arrays are copied by value, you're actually creating lots of needless nested copies of the same data.
Try this:
public function fetchRelationships($id = 1, $generations = 5)
{
$perId = $id;
$relationships = array();
if ($generations == 0) {
return $relationships;
} else {
$parents = $this->fetchParents($perId);
$relationships[$perId]['parents'] = $parents;
if (!empty($parents)) {
--$generations;
foreach ($parents as $parentRel) {
$perId = $parentRel->getPer2()->getId();
$relationships[$perId] = $this->fetchRelationships($perId, $generations);
}
}
return $relationships;
}
}
You'll still hit the base clause multiple times, but that shouldn't matter.
You might be thinking "but then I will lose some of the data in $relationships", but you won't. It's all there from the recursive returns.
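Here is a self-contained sketch of the same idea with the database calls replaced by a callback, so the recursion can be tried in isolation. fetchAncestors() and the numbering scheme in the usage (person n has parents 2n and 2n + 1) are hypothetical. Instead of nesting each parent's result under its key, it merges the results with the array union operator, which gives one flat array keyed by person id:

```php
<?php
// Sketch: collect each person's parents for a given number of generations.
// $fetchParents stands in for the database lookup and returns parent ids.
function fetchAncestors(callable $fetchParents, int $id, int $generations): array
{
    if ($generations === 0) {
        return []; // base clause: reached once per branch, which is fine
    }
    $parents = $fetchParents($id);
    $relationships = [$id => ['parents' => $parents]];
    foreach ($parents as $parentId) {
        // '+' keeps keys already present, so a shared ancestor is stored once
        $relationships += fetchAncestors($fetchParents, $parentId, $generations - 1);
    }
    return $relationships;
}

// Hypothetical data: every person $n has the two parents 2n and 2n + 1
$fetchParents = function (int $id): array {
    return [2 * $id, 2 * $id + 1];
};
$tree = fetchAncestors($fetchParents, 1, 5);
echo count($tree), "\n"; // 31 people whose parents were fetched
```

Each of those 31 entries lists two parents, so there are 62 parent links in total, which matches the count the question expects.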
If you're pulling this out of a database, have you considered having the query do all of the leg work for you?
Not sure how you need the data stacked or excluded, but here's one way to do it:
<?php
class TreeMember {
public $id;
// All three should return something like:
// array( $id1 => $obj1, $id2 => $obj2 )
// where each $obj is a TreeMember, based on $this->id
public function fetchParents(){ return array(); }
public function fetchPartners(){ return array(); }
public function fetchSiblings(){ return array(); }
public function fetchRelationships($generations = 5)
{
// If no more to go
if ($generations == 0) { return array(); }
$branch = array();
$branch['parents'] = $this->fetchParents();
$branch['partners'] = $this->fetchPartners();
$branch['siblings'] = $this->fetchSiblings();
// Logic: recurse into every related member with one generation fewer
$generations--;
foreach($branch as $tmType => $tmArr)
{
foreach($tmArr as $tmId => $tmObj)
{
$branch[$tmType][$tmId] = $tmObj->fetchRelationships($generations);
}
}
return array($this->id => $branch);
}
}
How would something like this be possible:
I have an object called Player:
class Player
{
public $name;
public $lvl;
}
and I have an array of these players in: $array.
For example $array[4]->name = 'Bob';
I want to search $array for a player named "Bob".
Without knowing the array key, how would I search $array for a Player named "Bob" so that it returns the key #? For example it should return 4.
Would array_search() work in this case? How would it be formatted?
array_filter returns a new array containing only the matching elements, with their original keys preserved.
$playerName = 'Bob';
$bobs = array_filter($players, function($player) use ($playerName) {
return $player->name === $playerName;
});
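Since the question asks for the key rather than the matching objects, array_key_first() (PHP 7.3+) can pull the first key out of the filtered array. A sketch; findPlayerKey() is a hypothetical helper name:

```php
<?php
// Player class from the question
class Player
{
    public $name;
    public $lvl;
}

// Return the key of the first player with the given name, or null if none.
// findPlayerKey() is a hypothetical helper built on array_filter().
function findPlayerKey(array $players, string $name)
{
    $matches = array_filter($players, function ($player) use ($name) {
        return $player->name === $name;
    });
    return array_key_first($matches); // null when nothing matched
}

// Hypothetical data with non-sequential keys, as in $array[4]->name = 'Bob'
$players = [];
foreach ([4 => 'Bob', 7 => 'Alice'] as $key => $name) {
    $players[$key] = new Player();
    $players[$key]->name = $name;
}
var_dump(findPlayerKey($players, 'Bob')); // int(4)
```

array_filter() still visits every element, so for a large array a plain foreach that returns on the first match avoids that extra work.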
According to the PHP docs, array_search would indeed work for an array of plain strings:
$players = array(
'Mike',
'Chris',
'Steve',
'Bob'
);
var_dump(array_search('Bob', $players)); // Outputs 3 (0-index array)
-- Edit --
Sorry, I read the post too quickly and didn't see you had an array of objects. You could do something like:
$playersScalar = array(
'Mike',
'Chris',
'Steve',
'Bob'
);
class Player
{
public $name;
public $lvl;
}
$playerObjects = [];
foreach ($playersScalar as $playerScalar) {
$playerObject = new Player;
$playerObject->name = $playerScalar;
$playerObjects[] = $playerObject;
}
function getPlayerKey(array $players, $playerName)
{
foreach ($players as $key => $player) {
if ($player->name === $playerName) {
return $key;
}
}
return false; // not found, mirroring array_search()
}
var_dump(getPlayerKey($playerObjects, 'Steve'));
I need to do fast lookups to find whether an array exists in an array. If I knew the depth of the array, it would be easy, and fast!
$heystack['lev1']['lev2']['lev3'] = 10; // $heystack stores 10,000s of arrays like this
if(isset($heystack[$var1][$var2][$var3])) do something...
How would you do this dynamically if you don't know the depth? looping and searching at each level will be too slow for my application.
Your question already contains the answer:
if (isset($heystack[$var1][$var2][$var3]))
{
# do something...
}
If you don't know how many $var1 ... $varN you have, you can only do it dynamically, which involves either looping or eval and depends on whether you need to deal with string or numeric keys. This has already been asked and answered:
Loop and Eval: use strings to access (potentially large) multidimensional arrays (and that's only one of the many)
If you are concerned about speed, e.g. if the array is always the same but you need to query it often, create an index first that has compound keys so you can query it more easily. That can be done either by storing all keys while traversing the array recursively:
class CompoundKeys extends RecursiveIteratorIterator
{
private $keys;
private $separator;
public function __construct($separator, RecursiveIterator $iterator, $mode = RecursiveIteratorIterator::SELF_FIRST, $flags = 0)
{
$this->separator = $separator;
parent::__construct($iterator, $mode, $flags);
}
public function current()
{
$current = parent::current();
if (is_array($current))
{
$current = array_keys($current);
}
return $current;
}
public function key()
{
$depth = $this->getDepth();
$this->keys[$depth] = parent::key();
return implode($this->separator, array_slice($this->keys, 0, $depth+1));
}
}
Usage:
$it = new CompoundKeys('.', new RecursiveArrayIterator($array));
$compound = iterator_to_array($it, true);
isset($compound["$var1.$var2.$var3"]);
Alternatively this can be done by traversing recursively and referencing the original arrays values:
/**
* create an array of compound array keys aliasing the non-array values
* of the original array.
*
* @param array $array
* @param string $separator
* @return array
*/
function array_compound_key_alias(array &$array, $separator = '.')
{
$index = array();
foreach($array as $key => &$value)
{
if (is_string($key) && FALSE !== strpos($key, $separator))
{
throw new InvalidArgumentException(sprintf('Array contains key ("%s") with separator ("%s").', $key, $separator));
}
if (is_array($value))
{
$subindex = array_compound_key_alias($value, $separator);
foreach($subindex as $subkey => &$subvalue)
{
$index[$key.$separator.$subkey] = &$subvalue;
}
}
else
{
$index[$key] = &$value;
}
}
return $index;
}
Usage:
$index = array_compound_key_alias($array);
isset($index["$var1.$var2.$var3"]);
You'll need some sort of looping but you won't need to traverse the entire depth. You can simply use a function that does the equivalent of $heystack[$var1][$var2][$var3], but dynamically:
$heystack['lev1']['lev2']['lev3'] = 10;
echo getElement($heystack, array('lev1', 'lev2', 'lev3')); // you could build second parameter dynamically
function getElement($array, $indexes = array())
{
foreach ($indexes as $index) {
$array = $array[$index];
}
return $array;
}
// output: 10
You'll need to put in some defense mechanisms to make the function more robust (for elements/indexes that don't exist) but this is the basic approach.
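A sketch of what those defences could look like: the loop stops as soon as a level is missing or is not an array, and returns a default instead of raising an "Undefined index" warning. getPath() and hasPath() are hypothetical names. Note that array_key_exists() treats a stored null as present, unlike isset():

```php
<?php
// Walk $keys one level at a time; return $default as soon as a level is missing.
function getPath(array $array, array $keys, $default = null)
{
    $current = $array;
    foreach ($keys as $key) {
        if (!is_array($current) || !array_key_exists($key, $current)) {
            return $default;
        }
        $current = $current[$key];
    }
    return $current;
}

// True when the whole key path exists (even if the final value is null).
function hasPath(array $array, array $keys): bool
{
    $missing = new stdClass(); // fresh object, so it can't match a stored value
    return getPath($array, $keys, $missing) !== $missing;
}

$heystack['lev1']['lev2']['lev3'] = 10;
var_dump(getPath($heystack, ['lev1', 'lev2', 'lev3'])); // int(10)
var_dump(hasPath($heystack, ['lev1', 'nope']));         // bool(false)
```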
Is there a way to instantiate a new PHP object in a similar manner to those in jQuery? I'm talking about assigning a variable number of arguments when creating the object. For example, I know I could do something like:
...
//in my Class
function __construct($name, $height, $eye_colour, $car, $password) {
...
}
$p1 = new person("bob", "5'9", "Blue", "toyota", "password");
But I'd like to set only some of them maybe. So something like:
$p1 = new person({
name: "bob",
eyes: "blue"});
Which is more along the lines of how it is done in jQuery and other frameworks. Is this built in to PHP? Is there a way to do it? Or a reason I should avoid it?
The best way to do this is to pass an array:
class Sample
{
private $first = "default";
private $second = "default";
private $third = "default";
function __construct($params = array())
{
foreach($params as $key => $value)
{
if(isset($this->$key))
{
$this->$key = $value; //Update
}
}
}
}
And then construct with an array
$data = array(
'first' => "hello"
//Etc
);
$Object = new Sample($data);
class foo {
function __construct($args) {
foreach($args as $k => $v) $this->$k = $v;
echo $this->name;
}
}
new foo(array(
'name' => 'John'
));
The closest I could think of.
If you want to be fancier and only allow certain keys, you can use __set() (PHP 5 and later), which is called when writing to an inaccessible or undefined property:
private $allowedKeys = array('name', 'age', 'hobby');
public function __set($k, $v) {
if(in_array($k, $this->allowedKeys)) {
$this->$k = $v;
}
}
func_get_args() won't help here, as PHP will see only one argument being passed.
public function __construct($options) {
$options = json_decode( $options );
....
// list of properties with ternary operator to set default values if not in $options
....
}
Have a look at json_decode().
The closest I can think of is to use array() and extract().
...
//in your Class
function __construct($options = array()) {
// default values
$password = 'password';
$name = 'Untitled 1';
$eyes = '#353433';
// extract the options
extract ($options);
// stuff
...
}
And when creating it.
$p1 = new person(array(
'name' => "bob",
'eyes' => "blue"
));
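On PHP 8.0 and later, named arguments get very close to the syntax asked for in the question, without an options array. A sketch using constructor property promotion; the Person class and its defaults are hypothetical:

```php
<?php
// PHP 8.0+: promoted constructor properties, each with a default value.
// The Person class and its property names are hypothetical.
class Person
{
    public function __construct(
        public string $name = 'Untitled 1',
        public string $height = '',
        public string $eyes = '#353433',
        public string $car = '',
        public string $password = 'password',
    ) {
    }
}

// Pass only the values you care about, in any order
$p1 = new Person(name: 'bob', eyes: 'blue');
echo $p1->name, ' ', $p1->eyes, ' ', $p1->password, "\n"; // bob blue password
```

Unlike the array-based approaches, a misspelled argument name throws an Error ("Unknown named parameter") instead of being silently ignored.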