I have a tree's node connection info in the form of a linked list, plus the ID of the root node. I need to number these nodes so that every node on a lower level gets a higher number than any node on a higher level. Numbering starts from the root node's ID and increments by 1 for each subsequent node. The order of nodes on the same level is not important. What algorithms and data structures can I use to solve this problem, so that I don't reinvent the wheel and can avoid the pitfalls? The language will be plain PHP and the data comes from a MySQL DB, but any solution, such as pseudo-code or just a plain explanation, is welcome.
Edit:
So far I've come up with this (thanks to Beta for helping me out):
<?php
$data = array(
array(551285, 551286),
array(551286, 551290),
array(551286, 551297),
array(551288, 551432),
array(551289, 552149),
array(551290, 551292),
array(551292, 551294),
array(551296, 551355),
array(551296, 552245),
array(551297, 551299),
array(551299, 551301),
array(551299, 551304),
array(551304, 551306),
array(551306, 551307),
array(551307, 551308),
array(551308, 551309),
array(551309, 551312),
array(551311, 551328),
array(551312, 551313),
array(551313, 551315),
array(551315, 551316),
array(551316, 551317),
array(551286, 551288),
array(551286, 551289),
array(551286, 551320),
array(551290, 551322),
array(551292, 551324),
array(551294, 551296),
array(551294, 551326),
array(551297, 551342),
array(551299, 551344),
array(551301, 551303),
array(551304, 551346),
array(551307, 551349),
array(551309, 551311),
array(551309, 551353),
array(551313, 551357),
array(551317, 552094),
array(551286, 551287),
array(551290, 551291),
array(551292, 551293),
array(551294, 551295),
array(551297, 551298),
array(551299, 551300),
array(551301, 551302),
array(551304, 551305),
array(551309, 551310),
array(551313, 551314)
);
var_dump(numerateTreeNodes($data, 551285));
function numerateTreeNodes($linked_nodes, $root_node)
{
$numbered_nodes = array();
$queue = new SplQueue();
$queue->enqueue($root_node);
$counter = $root_node;
$children_grouped = groupDirectChildren($linked_nodes);
while ($queue->count()) {
$t = $queue->dequeue();
$numbered_nodes[$counter++] = $t;
if (isset($children_grouped[$t])) {
foreach ($children_grouped[$t] as $t_child) {
$queue->enqueue($t_child);
}
}
}
return $numbered_nodes;
}
function groupDirectChildren($nodes)
{
$grouped = array();
foreach ($nodes as $n) {
$grouped[$n[0]][] = $n[1];
}
return $grouped;
}
Any suggestions/corrections?
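One way to sanity-check a numbering like the one above is to record each node's depth during the same breadth-first walk and confirm that depth never decreases as the assigned numbers grow. The following is a minimal sketch, assuming the same `array(parent, child)` pair format; the function name `numberByLevel` and the tiny edge list are hypothetical and only for illustration.

```php
<?php
// Breadth-first numbering that also records the depth of every node.
// Returns array(newNumber => array('id' => originalId, 'depth' => level)).
function numberByLevel(array $edges, $rootId)
{
    // Group children under their parent ID.
    $children = array();
    foreach ($edges as $edge) {
        $children[$edge[0]][] = $edge[1];
    }

    $result  = array();
    $counter = $rootId;
    $queue   = new SplQueue();
    $queue->enqueue(array($rootId, 0));

    while (!$queue->isEmpty()) {
        list($id, $depth) = $queue->dequeue();
        $result[$counter++] = array('id' => $id, 'depth' => $depth);
        if (isset($children[$id])) {
            foreach ($children[$id] as $childId) {
                $queue->enqueue(array($childId, $depth + 1));
            }
        }
    }
    return $result;
}

// Hypothetical sample: 10 is the root, 30 and 20 are its children, 25 is a grandchild.
$edges    = array(array(10, 30), array(30, 25), array(10, 20));
$numbered = numberByLevel($edges, 10);

// Depths must be non-decreasing as the new numbers grow.
$previousDepth = 0;
$ordered       = true;
foreach ($numbered as $number => $node) {
    if ($node['depth'] < $previousDepth) {
        $ordered = false;
    }
    $previousDepth = $node['depth'];
}
var_dump($ordered);
```

This assumes the input really is a tree: nodes that are not reachable from the root are simply never numbered, and a cycle in the pairs would keep the loop running forever unless a visited set is added.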
Related
I have an array of symbols (not only characters, but also syllables, such as 'p', 'pa', etc.) and I'm trying to come up with a good algorithm to identify words that can be created by concatenating those symbols.
e.g. given the array of symbols ('p', 'pa', 'aw'), the string 'paw' would be a positive match.
This is my current implementation (too slow):
function isValidWord($word,&$symbols){
$nodes = array($word);
while (count($nodes)>0){
$node = array_shift($nodes);
$nodeExpansions = array();
$nodeLength = strlen($node);
if (in_array($node,$symbols)) { return true; }
for ($len=$nodeLength-1;$len>0;$len--){
if (in_array(substr($node, 0, $len), $symbols)){
$nodeExpansions[] = substr($node, $len-$nodeLength);
}
}
$nodes = array_merge($nodeExpansions,$nodes);
}
return false;
}
It doesn't seem like a difficult problem, it's just a depth-first search implementation on an acyclic? tree, but I'm struggling to come up with an implementation which is both memory and CPU efficient. Where can I find resources to learn about this kind of problem?
Also, here is a link to a script for testing it and comparing it to the solutions proposed in the comments below: http://ideone.com/zQ9Cie
And here is an album showing captures of some really odd results: how can my current iterative method be 12x faster than the recursive one (proposed by @Waleed Khan) when I run them on my dev server, but 2x slower when I run them on my production server, considering both servers have almost identical configurations? (One is an EC2 micro instance and the other a VirtualBox container, but they both have the same OS, config, updates, PHP version and config, number of cores and available RAM.)
I'm not sure whether it's very efficient, but I guess I would create a loop with an inner loop that goes through the given array containing the symbols.
<?php
$aSymbols = array('p', 'pa', 'aw');
$aDatabase = array('paw');
$aMatches = array();
for ($iCounter = 0; $iCounter < count($aSymbols); $iCounter++)
{
for ($yCounter = 0; $yCounter < count($aSymbols); $yCounter++)
{
$sString = $aSymbols[$iCounter].$aSymbols[$yCounter];
if (in_array($sString, $aDatabase))
{
$aMatches[] = $sString;
}
}
}
?>
The if query can be replaced by a regex query, too.
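For comparison, the same check can also be phrased as the classic dynamic-programming "word break" problem, which handles any number of concatenated symbols rather than just pairs. The sketch below is not taken from any of the posts here; `canSegment` is a hypothetical name, and it assumes symbols are plain byte strings (it uses `strlen`/`substr`, like the code in the question).

```php
<?php
// Returns true if $word can be written as a concatenation of entries of $symbols.
// $reachable[$i] is true when the first $i characters can be built from symbols.
function canSegment($word, array $symbols)
{
    // Flip the list so membership tests are hash lookups instead of in_array() scans.
    $lookup = array_flip($symbols);
    $maxLen = 0;
    foreach ($symbols as $symbol) {
        $maxLen = max($maxLen, strlen($symbol));
    }

    $length       = strlen($word);
    $reachable    = array_fill(0, $length + 1, false);
    $reachable[0] = true; // the empty prefix is trivially reachable

    for ($end = 1; $end <= $length; $end++) {
        // Only look back as far as the longest symbol.
        $firstStart = max(0, $end - $maxLen);
        for ($start = $firstStart; $start < $end; $start++) {
            if ($reachable[$start] && isset($lookup[substr($word, $start, $end - $start)])) {
                $reachable[$end] = true;
                break;
            }
        }
    }
    return $reachable[$length];
}

var_dump(canSegment('paw', array('p', 'pa', 'aw')));  // bool(true)
var_dump(canSegment('pawn', array('p', 'pa', 'aw'))); // bool(false)
```

Each prefix length is decided once, so the work is bounded by the word length times the longest symbol length, regardless of how many ways the word could be split. Note that this sketch treats the empty string as a valid word.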
As @Waleed Khan suggested, I've tried improving my algorithm by using a Trie structure for the dictionary instead of a plain array, to speed up the search for matches.
function generateTrie(&$dictionary){
if (is_string($dictionary)){
$dictionary = array($dictionary);
}
if (!is_array($dictionary)){
throw new Exception(
"Invalid input argument for \$dictionary (must be array)",
500
);
}
$trie = array();
$dictionaryCount = count($dictionary);
$f = false;
for ($i=0;$i<$dictionaryCount;$i++){
$word = $dictionary[$i];
if ($f&&!inTrie('in',$trie)){
var_export($trie);
exit;
}
if (!is_string($word)){
throw new Exception(
"Invalid input argument for \$word (must be string)",
500
);
}
$wordLength = strlen($word);
$subTrie = &$trie;
for ($j=1;$j<$wordLength;$j++){
if (array_key_exists($subWord = substr($word,0,$j),$subTrie)){
$subTrie = &$subTrie[$subWord];
}
}
if (array_key_exists($word,$subTrie)){
continue;
}
$keys = array_keys($subTrie);
if (!array_key_exists($word,$subTrie)) {
$subTrie[$word] = array();
}
foreach ($keys as $testWordForPrefix){
if (substr($testWordForPrefix,0,$wordLength) === $word){
$subTrie[$word][$testWordForPrefix] = &$subTrie[$testWordForPrefix];
unset($subTrie[$testWordForPrefix]);
}
}
}
return $trie;
}
/**
* Checks if word is on dictionary trie
*/
function inTrie($word, &$trie){
$wordLen = strlen($word);
$node = &$trie;
$found = false;
for ($i=1;$i<=$wordLen;$i++){
$index = substr($word,0,$i);
if (isset($node[$index])){
$node = &$node[$index];
$found = true;
} else {
$found = false;
}
}
return $found;
}
/**
* Checks if a $word is a concatenation of valid $symbols using inTrie()
*
* E.g. `$word = 'paw'`, `$symbols = array('p', 'pa', 'aw')` would return
* true, because `$word = 'p'.'aw'`
*
*/
function isValidTrieWord($word,&$trie){
$nodes = array($word);
while (count($nodes)>0){
$node = array_shift($nodes);
if (inTrie($node,$trie)) { return true; }
$nodeExpansions = array();
$nodeLength = strlen($node);
for ($len=$nodeLength-1;$len>0;$len--){
if (inTrie(substr($node, 0, $len), $trie)){
$nodeExpansions[] = substr($node, $len-$nodeLength);
}
}
$nodes = array_merge($nodeExpansions,$nodes);
}
return false;
}
It doesn't make much of a difference for small dictionary sizes (where preg_match is still the fastest implementation by several orders of magnitude), but for medium sized dictionaries (~10000 symbols) where longer symbols are usually a combination of shorter ones (which is where preg breaks and the other two implementations can take close to 25 seconds per 2-6 symbols word), the Trie search takes only about 1 second. That's close enough for my needs (check if a given password is a combination of symbols from a given dictionary or not).
(See the whole script on http://ideone.com/zQ9Cie)
Results on my local dev VM:
Results on my AWS EC2 test server:
I want to generate all possible combinations of array elements to fill a placeholder; the placeholder size can vary.
Let's say I have the array $a = array(3, 2, 9, 7) and the placeholder size is 6. I want to generate something like the following:
3,3,3,3,3,3
2,3,3,3,3,3
2,2,3,3,3,3
...........
...........
7,7,7,7,7,9
7,7,7,7,7,7
However, (2,3,3,3,3,3) would be considered the same as (3,2,3,3,3,3), so the latter one doesn't count.
Could anyone point me in the right direction? I know there is the Math_Combinatorics PEAR package, but that one is only applicable to placeholder size <= count($a).
Edit
I think this is similar to generating bit-string combinations, but in a different number base.
I have no PHP source code for you but some sources that might help.
Some C code. Look at 2.1:
http://www.aconnect.de/friends/editions/computer/combinatoricode_g.html
Delphi code: combination without repetition of N elements without use for..to..do
Wiki article here
Well, it took quite some time to figure this one out.
So I split the question into multiple parts.
1.
I first made an array with all the possible value options.
function create_all_array($placeholder, array $values)
{
if ($placeholder <= 0) {
return [];
}
$stack = [];
$values = array_unique($values);
foreach ($values as $value) {
$stack[] = [
'first' => $value,
'childs' => create_all_array($placeholder - 1, $values)
];
}
return $stack;
}
2.
Then I made a function to transform this massive amount of data into strings (no check for uniqueness).
function string($values, $prefix = '')
{
$stack = [];
foreach($values as $value) {
$sub_prefix = $prefix . $value['first'];
if (empty($value['childs'])) {
$stack[$sub_prefix] = (int)$sub_prefix;
} else {
$stack = array_merge($stack, string($value['childs'], $sub_prefix));
}
}
return $stack;
}
3.
Then the hard part came: checking for duplicates. This was harder than expected, but I found a good answer to it and refactored it for my use.
function has_duplicate($string, $items)
{
$explode = str_split ($string);
foreach($items as $item) {
$item_explode = str_split($item);
sort($explode);
$string = implode('',$explode);
sort($item_explode);
$item = implode($item_explode);
if ($string == $item) {
return true;
}
}
return false;
}
4.
The last step was to combine all of this into a new function :P
function unique_string($placeholder, array $values)
{
$stack = string(create_all_array($placeholder, $values));
$check_stack = [];
foreach($stack as $key => $item) {
if (has_duplicate($item, $check_stack)) {
unset($stack[$key]);
}
$check_stack[] = $item;
}
return $stack;
}
Now you can use it simply as follows:
unique_string(3 /* amount of depth */, [1,2,3] /* keys */);
P.S. The code targets PHP 5.4+; to run it on older versions you need to change [] to array(), but I love the new syntax, so sorry :P
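As an alternative to generating every ordered tuple and filtering duplicates afterwards, the combinations can be built directly by only ever choosing values at the same or a later position than the previous choice. This is a hedged sketch rather than a drop-in replacement for the functions above; `combinationsWithRepetition` is a hypothetical name, and the output order differs from the listing in the question (which said order does not matter).

```php
<?php
// Generates every multiset of $size elements drawn from $values.
// Each result is built from non-decreasing positions in $values, so
// reorderings such as (2,3,3) and (3,2,3) are never produced twice.
function combinationsWithRepetition(array $values, $size)
{
    $values  = array_values(array_unique($values));
    $results = array();
    if ($size <= 0 || count($values) === 0) {
        return $results;
    }

    $build = function ($startIndex, array $current) use (&$build, &$results, $values, $size) {
        if (count($current) === $size) {
            $results[] = $current;
            return;
        }
        // Never go back to an earlier position: this is what removes duplicates.
        for ($i = $startIndex; $i < count($values); $i++) {
            $next   = $current;
            $next[] = $values[$i];
            $build($i, $next);
        }
    };
    $build(0, array());

    return $results;
}

$combos = combinationsWithRepetition(array(3, 2, 9, 7), 6);
echo count($combos), "\n";           // C(4 + 6 - 1, 6) = 84
echo implode(',', $combos[0]), "\n"; // 3,3,3,3,3,3
```

Because nothing is generated and then thrown away, the amount of work is proportional to the number of results (84 here) instead of the 4^6 = 4096 ordered tuples.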
I'm using cURL to pull a webpage from a server. I pass it to Tidy and throw the output into a DOMDocument. Then the trouble starts.
The webpage contains about three thousand (yikes) table tags, and I'm scraping data from them. There are two kinds of tables, where one or more type B follow a type A.
I've profiled my script using microtime(true) calls. I've placed calls before and after each stage of my script and subtracted the times from each other. So, if you'll follow me through my code, I'll explain it, share the profile results, and point out where the problem is. Maybe you can even help me solve the problem. Here we go:
First, I include two files. One handles some parsing, and the other defines two "data structure" classes.
// Imports
include('./course.php');
include('./utils.php');
Includes are inconsequential as far as I know, and so let's proceed to the cURL import.
// Execute cURL
$response = curl_exec($curl_handle);
I've configured cURL to not time out, and to post some header data, which is required to get a meaningful response. Next, I clean up the data to prepare it for DOMDocument.
// Run about 25 str_replace calls here, to clean up
// then run tidy.
$html = $response;
//
// Prepare some config for tidy
//
$config = array(
'indent' => true,
'output-xhtml' => true,
'wrap' => 200);
//
// Tidy up the HTML
//
$tidy = new tidy;
$tidy->parseString($html, $config, 'utf8');
$tidy->cleanRepair();
$html = $tidy;
Up until now, the code has taken about nine seconds. Considering this to be a cron job, running infrequently, I'm fine with that. However, the next part of the code really barfs. Here's where I take what I want from the HTML and shove it into my custom classes. (I plan to stuff this into a MySQL database too, but this is a first step.)
// Get all of the tables in the page
$tables = $dom->getElementsByTagName('table');
// Create a buffer for the courses
$courses = array();
// Iterate
$numberOfTables = $tables->length;
for ($i=1; $i <$numberOfTables ; $i++) {
$sectionTable = $tables->item($i);
$courseTable = $tables->item($i-1);
// We've found a course table, parse it.
if (elementIsACourseSectionTable($sectionTable)) {
$course = courseFromTable($courseTable);
$course = addSectionsToCourseUsingTable($course, $sectionTable);
$courses[] = $course;
}
}
For reference, here's the utility functions that I call:
//
// Tell us if a given element is
// a course section table.
//
function elementIsACourseSectionTable(DOMElement $element){
$tableHasClass = $element->hasAttribute('class');
$tableIsCourseTable = $element->getAttribute("class") == "coursetable";
return $tableHasClass && $tableIsCourseTable;
}
//
// Takes a table and parses it into an
// instance of the Course class.
//
function courseFromTable(DOMElement $table){
$secondRow = $table->getElementsByTagName('tr')->item(1);
$cells = $secondRow->getElementsByTagName('td');
$course = new Course;
$course->startDate = valueForElementInList(0, $cells);
$course->endDate = valueForElementInList(1, $cells);
$course->name = valueForElementInList(2, $cells);
$course->description = valueForElementInList(3, $cells);
$course->credits = valueForElementInList(4, $cells);
$course->hours = valueForElementInList(5, $cells);
$course->division = valueForElementInList(6, $cells);
$course->subject = valueForElementInList(7, $cells);
return $course;
}
//
// Takes a table row and parses it into an
// instance of the Section class.
//
function sectionFromRow(DOMElement $row){
$cells = $row->getElementsByTagName('td');
//
// Skip any row with a single cell
//
if ($cells->length == 1) {
# code...
return NULL;
}
//
// Skip header rows
//
if (valueForElementInList(0, $cells) == "Section" || valueForElementInList(0, $cells) == "") {
return NULL;
}
$section = new Section;
$section->section = valueForElementInList(0, $cells);
$section->code = valueForElementInList(1, $cells);
$section->openSeats = valueForElementInList(2, $cells);
$section->dayAndTime = valueForElementInList(3, $cells);
$section->instructor = valueForElementInList(4, $cells);
$section->buildingAndRoom = valueForElementInList(5, $cells);
$section->isOnline = valueForElementInList(6, $cells);
return $section;
}
//
// Take a table containing course sections,
// parse it, and put the results into a
// given course object.
//
function addSectionsToCourseUsingTable(Course $course, DOMElement $table){
$rows = $table->getElementsByTagName('tr');
$numRows = $rows->length;
for ($i=0; $i < $numRows; $i++) {
$section = sectionFromRow($rows->item($i));
// Make sure we have an array to put sections into
if (is_null($course->sections)) {
$course->sections = array();
}
// Skip "meta" rows, since they're not really sections
if (is_null($section)) {
continue;
}
$course->addSection($section);
}
return $course;
}
//
// Returns the trimmed text of the cell
// at the given index in a node list
//
function valueForElementInList($index, $list){
$value = $list->item($index)->nodeValue;
$value = trim($value);
return $value;
}
This code takes 63 seconds. That's over a minute for a PHP script to pull data from a webpage. Sheesh!
I've been advised to split up the workload of my main work loop, but considering the homogenous nature of my data, I'm not entirely sure how. Any suggestions on improving this code are greatly appreciated.
What can I do to improve my code execution time?
It turns out that my loop is terribly inefficient.
Using a foreach cut time in half to about 31 seconds. But that wasn't fast enough. So I reticulated some splines and did some brainstorming with about half of the programmers that I know how to poke online. Here's what we found:
DOMNodeList's item() accessor takes linear time per call, so indexing through the list in a loop costs quadratic time overall. Removing the first element after each iteration makes the loop faster, because we then only ever access the first element of the list. This brought me down to 8 seconds.
After playing some more, I realized that the ->length property of DOMNodeList is just as bad as item(), since it also incurs linear cost. So I changed my for loop to this:
$table = $tables->item(0);
while ($table != NULL) {
$table = $tables->item(0);
if ($table === NULL) {
break;
}
//
// We've found a section table, parse it.
//
if (elementIsACourseSectionTable($table)) {
$course = addSectionsToCourseUsingTable($course, $table);
}
//
// Skip the last table if it's not a course section
//
else if(elementIsCourseHeaderTable($table)){
$course = courseFromTable($table);
$courses[] = $course;
}
//
// Remove the first item from the list
//
$first = $tables->item(0);
$first->parentNode->removeChild($first);
//
// Get the next table to parse
//
$table = $tables->item(0);
}
Note that I've done some other optimizations in terms of targeting the data I want, but the relevant part is how I handle progressing from one item to the next.
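A different way to sidestep repeated item() and ->length calls, without removing nodes from the document, is to copy the live node list into a plain PHP array once and loop over that. This is only a sketch under assumptions: the miniature HTML string and the `courseheader` class name are hypothetical stand-ins for the scraped page, and whether it beats the remove-first-node loop on the real data would need measuring.

```php
<?php
// Hypothetical miniature document standing in for the scraped page.
$html = '<html><body>'
      . '<table class="courseheader"><tr><td>Algebra</td></tr></table>'
      . '<table class="coursetable"><tr><td>001</td></tr></table>'
      . '<table class="coursetable"><tr><td>002</td></tr></table>'
      . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// Copy the live DOMNodeList into a plain array once; array access is O(1).
$tables = iterator_to_array($dom->getElementsByTagName('table'));

$sectionTables = 0;
foreach ($tables as $table) {
    if ($table->getAttribute('class') === 'coursetable') {
        $sectionTables++;
    }
}
echo $sectionTables, "\n";
```

The array is a snapshot, so nodes added to or removed from the document afterwards are not reflected in it.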
TL;DR
I have this data: var_export and print_r.
And I need to narrow it down to: http://pastebin.com/EqwgpgAP ($data['Stock Information:'][0][0]);
How would one achieve it? (dynamically)
I'm working with vTiger 5.4.0 CRM and am looking to implement a function that would return a particular field's information based on search criteria.
Well, vTiger is a pretty weakly written system that looks and feels old; everything comes from hundreds of tables with multiple joins (that's actually not that bad), etc., but a job is a job.
The need arose from getting the usageunit picklist from the Products module's Stock Information block.
Since there is no such function as getField();, I am looking to filter it out of the Blocks, which also gather the information about the fields.
getBlocks(); then calls something close to getFields();, which in turn calls something close to getValues();, and so on.
So...
$focus = new $currentModule(); // Products
$displayView = getView($focus->mode);
$productsBlocks = getBlocks($currentModule, $displayView, $focus->mode, $focus->column_fields); // in theory, $focus->column_fields should/could be narrowed down to my specific field, but vTiger doesn't work that way
echo "<pre>"; print_r($productsBlocks); echo "</pre>"; // = http://pastebin.com/3iTDUUgw (huge dump)
As you can see, the array under the key [Stock Information:], that actually comes out from translations (yada, yada...), under [0][0] contains information for usageunit.
Now, I was trying to array_filter(); the data out from there, but the only thing I've managed to get is $productsBlocks stripped down to only contain [Stock Information:] with all the data:
$getUsageUnit = function($value) use (&$getUsageUnit) {
if(is_array($value)) return array_filter($value, $getUsageUnit);
if($value == 'usageunit') return true;
};
$productsUsageUnit = array_filter($productsBlocks, $getUsageUnit);
echo "<pre>"; print_r($productsUsageUnit); echo "</pre>"; // = http://pastebin.com/LU6VRC4h (not that huge of a dump)
And the result I'm looking for is http://pastebin.com/EqwgpgAP, which I got manually with print_r($productsUsageUnit['Stock Information:'][0][0]);.
How do I achieve this? (dynamically...)
function helper($data, $query) {
$result = array();
$search = function ($data, &$stack) use(&$search, $query) {
foreach ($data as $entry) {
if (is_array($entry) && $search($entry, $stack) || $entry === $query) {
$stack[] = $entry;
return true;
}
}
return false;
};
foreach ($data as $sub) {
$parentStack = array();
if ($search($sub, $parentStack)) {
$result[] = $parentStack[sizeof($parentStack) - 2];
}
}
return $result;
}
$node = helper($data, 'usageunit');
print_r($node);
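The helper above returns the enclosing arrays themselves. If it is more useful to know where the value sits, a related sketch can return the list of keys leading to it instead. `findKeyPath` is a hypothetical name, and the `$blocks` array is a heavily reduced, made-up stand-in for the getBlocks() output, not the real structure from the dump.

```php
<?php
// Returns the list of keys leading to the first occurrence of $needle,
// or null when the value is not found anywhere in the nested array.
function findKeyPath(array $data, $needle)
{
    foreach ($data as $key => $value) {
        if ($value === $needle) {
            return array($key);
        }
        if (is_array($value)) {
            $subPath = findKeyPath($value, $needle);
            if ($subPath !== null) {
                // Prepend the current key to the path found further down.
                array_unshift($subPath, $key);
                return $subPath;
            }
        }
    }
    return null;
}

// Hypothetical, heavily reduced stand-in for the getBlocks() output.
$blocks = array(
    'Product Information:' => array(array(array(array('Product Name'), array('productname')))),
    'Stock Information:'   => array(array(array(array('Usage Unit'), array('usageunit')))),
);

$path = findKeyPath($blocks, 'usageunit');
echo implode(' > ', $path), "\n";
```

With the path in hand, any ancestor of the match can be reached by walking only the first few keys, for example the first three to get the `[0][0]` level of the block.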
Suppose there are 2 directories on my server:
/xyz/public_html/a/
/xyz/public_html/b/
Both of them contain many files. How do I detect the files that are common to both folders in terms of their name and file extension? This program is to be implemented in PHP. Any suggestions?
Using FilesystemIterator, you might do something like this...
<?php
$it = new FilesystemIterator('/xyz/public_html/a/');
$commonFiles = array();
foreach ($it as $file) {
// FilesystemIterator skips "." and ".." by default and yields SplFileInfo objects, which have no isDot() method
if ($file->isDir()) continue;
if (file_exists('/xyz/public_html/b/' . $file->getFilename())) {
$commonFiles[] = $file->getFilename();
}
}
Basically, you have to loop through all the files in one directory, and see if any identically-named files exist in the other directory. Remember that the file name includes the extension.
If it’s just two directories, you could use an algorithm similar to the merge algorithm of merge sort where you have two lists of already sorted items and walk them simultaneously while comparing the current items:
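// Note: this walk is only correct if both directories are iterated in the
// same sorted order, which FilesystemIterator itself does not guarantee.
$iter1 = new FilesystemIterator('/xyz/public_html/a/');
$iter2 = new FilesystemIterator('/xyz/public_html/b/');
while ($iter1->valid() && $iter2->valid()) {
    $diff = strcmp($iter1->current()->getFilename(), $iter2->current()->getFilename());
    if ($diff === 0) {
        // duplicate found; advance both iterators so the loop can make progress
        $iter1->next();
        $iter2->next();
    } else if ($diff < 0) {
        $iter1->next();
    } else {
        $iter2->next();
    }
}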
Another solution would be to use the uniqueness of array keys so that you put each directory item into an array as key and then check for each item of the other directory if such a key exists:
$arr = array();
$iter1 = new FilesystemIterator('/xyz/public_html/a/');
foreach ($iter1 as $item) {
$arr[$item->getFilename()] = true;
}
$iter2 = new FilesystemIterator('/xyz/public_html/b/');
foreach ($iter2 as $item) {
if (array_key_exists($item->getFilename(), $arr)) {
// duplicate found
}
}
If you just want to find out which are in common, you can easily use scandir twice and find what's in common, for example:
// Drop the "." and ".." entries that scandir() returns.
// (array_shift() returns the removed element, not the remaining array, so it cannot be chained here.)
$filesInA = array_diff(scandir('/xyz/public_html/a/'), array('.', '..'));
$filesInB = array_diff(scandir('/xyz/public_html/b/'), array('.', '..'));
$filesInCommon = array_intersect($filesInA, $filesInB);
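The scandir approach can be wrapped into a small reusable function that also ignores subdirectories. This is a minimal sketch: `commonFileNames` is a hypothetical name, and the demo builds throwaway directories under the system temp directory instead of touching /xyz/public_html/.

```php
<?php
// Returns the names of regular files present in both directories.
function commonFileNames($dirA, $dirB)
{
    $listFiles = function ($dir) {
        $names = array();
        foreach (scandir($dir) as $name) {
            // is_file() also filters out ".", ".." and subdirectories.
            if (is_file($dir . DIRECTORY_SEPARATOR . $name)) {
                $names[] = $name;
            }
        }
        return $names;
    };
    return array_values(array_intersect($listFiles($dirA), $listFiles($dirB)));
}

// Hypothetical demo using throwaway directories under the system temp dir.
$base = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'common_demo_' . uniqid();
mkdir($base . '/a', 0777, true);
mkdir($base . '/b', 0777, true);
touch($base . '/a/one.txt');
touch($base . '/a/two.txt');
touch($base . '/b/two.txt');
touch($base . '/b/three.txt');

print_r(commonFileNames($base . '/a', $base . '/b'));
```

The comparison is by name only (including the extension); it does not check whether the two files have the same contents.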