I am working on an application that has the following requirement:
Each month, an API returns a series of values keyed by date, something like this (the values are updated every day, so the results are cached):
$data = array(
    array("2016-02-03" => 3, "2016-02-04" => 4, "2016-02-05" => 1),
    array("2016-02-03" => 1, "2016-02-04" => 2, "2016-02-05" => 3),
    array("2016-02-03" => 60, "2016-02-04" => 18, "2016-02-05" => 3),
);
What I am trying to achieve is this: the algorithm takes the first key ("2016-02-03"), iterates through all of the sub-arrays to find the values for this key, sums them up, calculates the average, and finally adds the result to another array. This continues until there are no more keys left.
I could do this with one huge foreach loop, but there are over 40 sub-arrays, each containing around 30 days' worth of data, so I fear this would be inefficient.
Is there an alternative to solving this problem? One that won't be intensive and slow?
I can only imagine the solution is to let the server run for as long as it takes. I also suggest that after each date match you add the value to your new array and unset the index, to reduce memory use and the time needed to loop through everything.
In your new array you can have the syntax:
[ "Year-month-day" => [...] ]
Where the dots will be all the values.
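As a minimal sketch of the per-date averaging described in the question: the function name averageByDate is hypothetical, and the sample data is the array from the question. It makes a single pass over every value, so for roughly 40 sub-arrays of 30 days each that is about 1,200 additions, which may well be cheap enough without any further optimisation.

```php
<?php
// Sample data from the question (values as integers).
$data = [
    ["2016-02-03" => 3,  "2016-02-04" => 4,  "2016-02-05" => 1],
    ["2016-02-03" => 1,  "2016-02-04" => 2,  "2016-02-05" => 3],
    ["2016-02-03" => 60, "2016-02-04" => 18, "2016-02-05" => 3],
];

// Hypothetical helper: returns [date => average of all values for that date].
function averageByDate(array $series): array
{
    $sums = [];
    $counts = [];
    foreach ($series as $row) {
        foreach ($row as $date => $value) {
            // Accumulate the running sum and the number of samples per date.
            $sums[$date] = ($sums[$date] ?? 0) + $value;
            $counts[$date] = ($counts[$date] ?? 0) + 1;
        }
    }

    $averages = [];
    foreach ($sums as $date => $sum) {
        $averages[$date] = $sum / $counts[$date];
    }
    return $averages;
}

$averages = averageByDate($data);
print_r($averages);
```

Counting per date (rather than dividing by the number of sub-arrays) keeps the average correct even if a date is missing from some sub-arrays.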
I have a very complex array that I need to loop through.
Array(
[1] => Array(
[1] => ""
[2] => Array(
[1] => ""
[2] => Array(
[1] => ""
)
)
)
)
I can't use nested loops because this array could contain hundreds of nested arrays. Also, the nested ones could contain nested arrays too.
This array presents comments and replies, Where replies could contain more replies.
Any thoughts?
You could use a \RecursiveArrayIterator, which is part of the PHP SPL, shipped non-optional, with the PHP core.
<?php
$arr = [
'lvl1-A' => [
'lvl2' => [
'lvl3' => 'done'
],
],
'lvl1-B' => 'done',
];
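// \RecursiveIterator declares valid(), hasChildren() and getChildren(); \Traversable does not.
function traverse( \RecursiveIterator $it ): void {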
while ( $it->valid() ) {
$it->hasChildren()
? print "{$it->key()} => \n" and traverse( $it->getChildren() )
: print "{$it->key()} => {$it->current()}\n";
$it->next();
}
}
$it = new \RecursiveArrayIterator( $arr );
$it->rewind();
traverse( $it );
print 'Done.';
Run and play this example in the REPL here: https://3v4l.org/cGtoi
The code is just meant to verbosely explain what you can expect to see. The iterator walks each level. How you actually code it is up to you. Keep in mind that filtering or flattening the array (read: transforming it up front) might be another option. You could also use a generator that emits each level, and maybe go with cooperative multitasking/coroutines, as PHP core maintainer nikic explained in his blog post.
ProTip: Monitor your RAM consumption with different variants in case your nested Array really is large and maybe requested often or should deliver results fast.
In case you really need to be fast, consider streaming the result, so you can process the output while you are still working on processing the input array.
A last option might be to split the actual array in chunks (like when you are streaming them), therefore processing smaller parts.
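The generator idea mentioned above could look roughly like this. It is only a sketch: the function name walk and the [depth, key, value] tuple shape are assumptions made for illustration, not something from the question. The generator yields one entry at a time, so the caller can process or stream each comment without building a flattened copy of the whole tree first.

```php
<?php
// Hypothetical helper: lazily yields [depth, key, value] for every entry.
// For entries that hold a nested array, value is null and the children follow.
function walk(array $nodes, int $depth = 0): \Generator
{
    foreach ($nodes as $key => $value) {
        if (is_array($value)) {
            yield [$depth, $key, null];
            // Delegate to the nested level; entries are still produced lazily.
            yield from walk($value, $depth + 1);
        } else {
            yield [$depth, $key, $value];
        }
    }
}

$arr = [
    'lvl1-A' => [
        'lvl2' => [
            'lvl3' => 'done',
        ],
    ],
    'lvl1-B' => 'done',
];

$lines = [];
foreach (walk($arr) as [$depth, $key, $value]) {
    // Indent by depth so the nesting of replies stays visible.
    $lines[] = str_repeat('  ', $depth) . $key . ($value === null ? '' : " => $value");
}
echo implode("\n", $lines), "\n";
```

Recursion depth here equals the nesting depth of the replies, not the total number of comments, so hundreds of sibling arrays are not a problem in themselves.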
The case is quite complex, as you have to loop, but for some reason you can't or don't want to:
... that I need to loop through
and
I can't use nested loops because this array could contain hundreds of nested arrays
It means you have to handle your data differently, for example by packing that large amount of data so it can be processed later.
If for some reason that's not an option, you can consider:
splitting this big array into smaller arrays
checking how it works with json_encode and parsing the resulting string with str_* functions and regex
Your question leaves too many things unclear, e.g. what exactly these subarrays contain, whether you can ignore some parts of them, whether you can change the code that creates the huge array in the first place, etc.
Assuming, on the other hand, that you could loop: what would bother you? The memory usage, how long it will take, etc.?
You can always use cron to run it daily, but the most important thing is to find out why you ended up with a huge array in the first place.
For my problem you're selecting up to 24 items from a pool of maybe 5-10,000 items. In other words we're generating configurations.
The number 24 comes from the item categories, each item is associated with a particular installation location, an item from location 1 cannot be installed in location 10, so I have arranged my associative array to organize the data in groups. Each item looks like:
$items[9][] = array("id" => "0", "2" => 2, "13" => 20);
Where the first index ( $items[9] ) tells you the location the item is allowed in. If it helps, think of it this way: you cannot install a tire in the spot for an exhaust pipe.
The items are stored in a MySQL database. The user can specify restrictions on the solution, for example, attribute 2 must have a final value of 25 or more. They can have multiple competing restrictions. The queries retrieve items that have any value for the attributes under consideration (unspecified attributes are stored but we don't do any calculations with them). The PHP script then prunes out any redundant choices (for example: if item 1 has an attribute value of 3 and item 2 has an attribute value of 5, in the absence of another restriction you would never choose item 1).
After all the processing has occurred I get an associative array that looks like:
$items[10][] = array("id" => "3", "2" => 2, "13" => 100);
$items[10][] = array("id" => "4", "2" => 3, "13" => 50);
$items[9][] = array("id" => "0", "2" => 2, "13" => 20);
$items[9][] = array("id" => "1", "2" => -1, "13" => 50);
I have posted a full example data set at this pastebin link. There is reason to believe I can be more restrictive on what I accept into the data set but even at a restriction of 2 elements per option there's a problem.
In the array() value, the id is the reference to the index of the item in the array, and the other values are attribute id and value pairs. So $items[10][] = array("id" => "3", "2" => 2, "13" => 100); means that in location 10 there is an item with id 3 which has a value of 2 in attribute 2 and a value of 100 in attribute 13. If it helps, think of an item as being identified by a pair, e.g. (10,0) is item 0 in location 10.
I know I'm not being specific, there are 61 attributes and I don't think it changes the structure of the problem with what they represent. If we want, we can think of attribute 2 as weight and attribute 13 as cost. The problem the user wants solved might be to generate a configuration where the weight is 25 exactly and the cost is minimized.
Back of the envelope math says a rough estimate, if there were only 2 choices per location, is 2^24 choices x size of the record. Assuming a 32 bit integer could be encoded to represent a single record somehow, we're looking at 16,777,216 * 4 = 67,108,864 bytes of memory (utterly ignoring data structure overhead) and there is no reason to believe that either of these assumptions is going to be valid, though an algorithm with an upper memory bound in the realm of 67 megs would be an acceptable memory size.
There's no particular reason to stick to this representation, I used associative arrays since I have a variable number of attributes to use and figured that would allow me to avoid a large, sparse array. Above "2"=>2 actually means that filtered attribute with id #2 has a value of 2 and similarly attribute 13's value is 100. I'm happy to change my data structure to something more compact.
One thought I had was that I do have an evaluation criterion I can use to discard most of the intermediate configurations. As an example, I can compute 75 * (value of attribute 2) + 10 * (value of attribute 13) to provide a relative weighting of the solutions. In other words, if there were no other restrictions on a problem, each improvement by 1 of attribute 2 costs 75 and each improvement by 1 of attribute 13 costs 10. Continuing the idea of a car part, think of it like buying a stock part and having a machinist modify it to our specifications.
One problem I see with discarding configurations too early is that the weighting function does not take into account restrictions such as "the final result must have a value of "2" that is at exactly 25". So it's fine if I have a full 24 element configuration, I can run through a loop of the restrictions, discard the solutions that don't match and then finally rank the remaining solutions by the function, but I'm not sure there's a valid line of thought that allows me to throw away solutions earlier.
Does anyone have any suggestions on how to move forward? Although a language agnostic solution is fine, I am implementing in PHP if there's some relevant language feature that might be useful.
I solved my issue with memory by performing a depth first cartesian product. I can weigh the solutions one at a time and retain some if I choose or simply output them as I am doing here in this code snippet.
The main inspiration for this solution came from the very concise answer on this question. Here is my code, as finding a PHP depth-first cartesian product algorithm seems to be less than trivial.
function dfcartesian ( $input, $current, $index ) {
// sample use: $emptyArray = array();
// dfcartesian( $items, $emptyArray, 0 )
if ( $index == count( $input ) ) {
// If we have iterated over the entire space and are at the bottom
// do whatever is relevant to your problem and return.
//
// If I were to improve the solution I suppose I'd pass in an
// optional function name that we could pass data to if desired.
var_dump( $current );
echo '<br><br>';
return;
}
// I'm using non-sequential numerical indices in an associative array
// so I want to skip any empty numerical index without aborting.
//
// If you're using something different I think the only change that
// needs attention is to change $index + 1 to a different type of
// key incrementer. That sort of issue is tackled at
// https://stackoverflow.com/q/2414141/759749
if ( isset ( $input[$index] ) ) {
foreach ( $input[$index] as $element ) {
$current[] = $element;
// despite my concern about recursive function overhead,
// this handled 24 levels quite smoothly.
dfcartesian( $input, $current, ( $index + 1 ) );
array_pop( $current );
}
} else {
// move to the next index if there is a gap
dfcartesian( $input, $current, ( $index + 1 ) );
}
}
I hope this is of use to someone else tackling the same problem.
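The comment in the code above mentions passing in a function to receive each finished configuration. A possible sketch of that variation follows. The name eachConfiguration, the callback, and the example restriction (attribute 2 must total at least 2, attribute 13 is minimized) are assumptions chosen for illustration; the four items are the sample rows from the question. Only the best configuration found so far is retained, so memory use stays small however many combinations are visited.

```php
<?php
// Hypothetical variation: depth-first cartesian product that hands each
// complete configuration to a callback instead of printing it.
function eachConfiguration(array $groups, callable $onComplete): void
{
    // Reindex so gaps in the location ids need no special handling.
    $groups = array_values($groups);
    $walk = function (int $index, array $current) use (&$walk, $groups, $onComplete): void {
        if ($index === count($groups)) {
            $onComplete($current);
            return;
        }
        foreach ($groups[$index] as $element) {
            $current[] = $element;
            $walk($index + 1, $current);
            array_pop($current);
        }
    };
    $walk(0, []);
}

$items = [];
$items[10][] = ["id" => "3", "2" => 2,  "13" => 100];
$items[10][] = ["id" => "4", "2" => 3,  "13" => 50];
$items[9][]  = ["id" => "0", "2" => 2,  "13" => 20];
$items[9][]  = ["id" => "1", "2" => -1, "13" => 50];

$best = null;
$bestCost = INF;
eachConfiguration($items, function (array $config) use (&$best, &$bestCost): void {
    $weight = array_sum(array_column($config, 2));  // total of attribute 2
    $cost   = array_sum(array_column($config, 13)); // total of attribute 13
    // Assumed restriction: attribute 2 must total at least 2; keep the cheapest.
    if ($weight >= 2 && $cost < $bestCost) {
        $best = $config;
        $bestCost = $cost;
    }
});

echo implode(',', array_column($best, 'id')), " costs $bestCost\n";
```

Restrictions that can only be checked on a full configuration, such as an exact target total, fit naturally in the callback; whether branches can be abandoned earlier depends on the restrictions involved and is not attempted here.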
I have a 1D array (XYData), e.g.
$TE = array(
"1"=>"20",
"2"=>"30",
"5"=>"50",
"10"=>"90"
);
I would like to create a memory-efficient PHP function that performs linear interpolation for a passed X value and returns the corresponding Y value, e.g.
calling interpolate($TE, 9.5)
should return 86.
Is there any way to avoid searching the array? The XYData set may be very long, say more than 100 points.
Thank you in advance!
No, you cannot avoid looking at your array. To make it more efficient you have to restructure your data. Do this by recursively looking for the middle, and then split it at that point into two parts. For your short example you would get this:
$TER = array("2 and lower" => array("1" => "20",
                                    "2" => "30"),
             "5 and higher" => array("5" => "50",
                                     "10" => "90"));
No recursion is shown, and it really doesn't make any sense for such a small set of data, but when the dataset becomes large there's a clear advantage. It's basically a simple binary search tree.
But I doubt that implementing it would be useful in this case. I'm not going to work it all out; you really should have 100,000 items or more to make this worthwhile. If not, then just work through the array.
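For reference, a sketch of interpolate() that uses a binary search over the sorted X keys instead of restructuring the data into a tree. It assumes the keys are numeric and already in ascending order, and it returns null outside the data range; both choices are assumptions, not requirements stated in the question. Note that array_keys() itself copies the keys once per call, so for many lookups against the same data set it would make sense to extract the keys a single time.

```php
<?php
// Sketch: linear interpolation with a binary search for the bracketing points.
// Assumes $points is keyed by numeric X values in ascending order.
function interpolate(array $points, float $x): ?float
{
    $xs = array_keys($points);
    $n = count($xs);
    if ($n === 0 || $x < $xs[0] || $x > $xs[$n - 1]) {
        return null; // outside the known range: no extrapolation in this sketch
    }

    // Narrow [lo, hi] until the two indexes are adjacent and bracket $x.
    $lo = 0;
    $hi = $n - 1;
    while ($hi - $lo > 1) {
        $mid = intdiv($lo + $hi, 2);
        if ($xs[$mid] <= $x) {
            $lo = $mid;
        } else {
            $hi = $mid;
        }
    }

    $x0 = $xs[$lo];
    $x1 = $xs[$hi];
    $y0 = (float) $points[$x0];
    $y1 = (float) $points[$x1];
    if ($x == $x0) {
        return $y0; // exact hit, also covers a single-point data set
    }
    if ($x == $x1) {
        return $y1;
    }
    return $y0 + ($x - $x0) * ($y1 - $y0) / ($x1 - $x0);
}

$TE = array("1" => "20", "2" => "30", "5" => "50", "10" => "90");
var_dump(interpolate($TE, 9.5)); // float(86)
```

For 9.5 the search lands between the points (5, 50) and (10, 90), giving 50 + 4.5 * 40 / 5 = 86.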
Background: Using CodeIgniter with this MongoDB library.
This is my first go-round with mongodb and I'm enjoying it thus far. It's taken a while for me to separate myself from the sql way of thinking, but it is a perfect fit for my current project.
I'm trying to push an array into a document using...
$this->mongo_db->where(array('_id'=>$estimate_id))->push("measurements", $newData)->update('estimates');
If I encode $newData using json_encode($newData) I get {"levels":[{"level_qty":12,"level_uom":"ft"}]}
The problem is, when my function creates the measurements line in the mongodb document, it automatically starts an array with my insertion at [0]. Like this...
"measurements" : [ { "levels" : [ { "level_qty" : 12, "level_uom" : "ft" } ] }]
...leaving me with...
-measurements
--0
----levels
-----0
------level_qty => 12,
------level_uom => ft
What I really want is...
-measurements
--levels
---0
----level_qty => 12,
----level_uom => ft
I'm certain I'm missing something fairly elementary (i.e. php related & not mongodb related), but I'm an admitted amateur who has waded too deep.
$push is used to append a value to an array. In your example, measurements is an array and Mongo is appending $newData as its first element. This explains the 0 index between measurements and levels. In your desired result, measurements is an object equivalent to $newData (i.e. it has a levels property, which in turn has an array of objects within).
Either of the following examples should accomplish what you want:
// if $newData is {"levels": [{"level_qty":12,"level_uom":"ft"}]}
->set("measurements", $newData)
// if $newData is [{"level_qty":12,"level_uom":"ft"}]
->set("measurements.levels", $newData)
// if $newData is {"level_qty":12,"level_uom":"ft"}
->push("measurements.levels", $newData)
Note: $push is going to be more flexible if you want to append data with future updates, whereas $set will naturally overwrite the given field.
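To make the difference in shape concrete, here is a small sketch that mimics the two operations with plain PHP arrays. No MongoDB or CodeIgniter library is involved; the variable names $pushed and $set are made up for the illustration, and the array operations merely imitate what $push and $set do to the document.

```php
<?php
$newData = ['levels' => [['level_qty' => 12, 'level_uom' => 'ft']]];

// Imitates $push: "measurements" is an array and $newData becomes element 0.
$pushed = ['measurements' => []];
$pushed['measurements'][] = $newData;

// Imitates $set: "measurements" is replaced by $newData itself.
$set = ['measurements' => $newData];

echo json_encode($pushed), "\n";
// {"measurements":[{"levels":[{"level_qty":12,"level_uom":"ft"}]}]}
echo json_encode($set), "\n";
// {"measurements":{"levels":[{"level_qty":12,"level_uom":"ft"}]}}
```

The extra pair of square brackets in the first output is the 0 index that appears between measurements and levels.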
Sorry if this question has already been answered somewhere else, but I couldn't find it (possibly because I had a tough time phrasing my question properly).
I'm working with a two-dimensional array which is the result set from a db query. I've got the array set up so that the first index is the pk of the row, so the array would look like...
$array[345] = array('id' => 345,
                    'info1' => 'lorem',
                    'info2' => 'ipsum');
$array[448] = array('id' => 448,
                    'info1' => 'lorem',
                    'info2' => 'ipsum');
My question... I know the indexes are being passed as integers. So, I'm thinking (perhaps incorrectly) that they are being treated as numerical offsets by the array (as opposed to associatively). So, if the first index is 345, does the system automatically reserve space in memory for indexes 0 through 344? The code all works perfectly, but I am wondering if this method is going to eat up a boatload of memory, especially if I get to the point where there are only two arrays being stored, at 322,343 and 554,324. Sorry if it's a dumb question, thanks for any answers.
No, arrays are hashmaps and keys don't equal offsets, e.g.
$foo = array(0 => 'x', 1000 => 'y');
is two elements only. There is nothing reserved in between.
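A quick way to check this yourself, using the two large keys from the question (the row contents are placeholders):

```php
<?php
$rows = [];
$rows[322343] = ['id' => 322343, 'info1' => 'lorem'];
$rows[554324] = ['id' => 554324, 'info1' => 'ipsum'];

var_dump(count($rows));      // int(2): only the two assigned entries exist
var_dump(isset($rows[0]));   // bool(false): nothing was created below the first key
print_r(array_keys($rows));  // the keys are exactly 322343 and 554324
```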