I've got a class: Game which has values.
I've got two arrays with Game instances. Now I need to compare those two arrays for identical values in the game instance.
The game class has the attributes:
homeId
visitingId
Now I need to check for identical values in both arrays (they are large, 100+ game instances)
What I do is:
foreach ($games1 as $game1) {
foreach ($games2 as $game2) {
if ( ($game1->getHomeId() == $game2->getHomeId()) && ($game1->getVisitingId() == $game2->getVisitingId())) {
//Games are the same
}
}
}
This takes ages, is there a way to do it quicker?
Your current solution has complexity of O(n*n). it is possible to get it down to O(nlogn). For this you'll have to sort both arrays and then compare them. I would do something like:
$t1=array();
foreach ($games1 as $key=>$game1) {
$t1[$key]=$game1->getHomeId;
}
asort($t1);
$t2=array();
foreach ($games2 as $key=>$game2) {
$t2[$key]=$game2->getHomeId();
}
asort($t2);
$el1=each($t1);
$el2=each($t2);
do{
if ($el1['value']<$el2['value'])
$el1=each($t1);
elseif ($el1['value']>$el2['value'])
$el2=each($t2);
elseif($games1[$el1['key']]->getVisitingId == $games2[$el2['key']]->getVisitingId())
//game are the same
}while($el1 !== false && $el2 !== false)
this produces considerable overhead, so on small amount of data it will work slower. However the more data are in the arrays, the more efficient this algorithm will be.
How about using the array_diff function in some way which compares two arrays.
http://php.net/manual/en/function.array-diff.php
You're doing a lot of redundant calculations. Use a for loop instead of a foreach loop and start at where you left off, instead of at the beginning:
$games1_count = count($games1);
$games2_count = count($games2);
for($i=0; $i < $games1_count; $i++) {
$game1 = $games1[$i];
for($j=$i; $j < $games2_count; $j++) {
$game2 = $games2[$j];
if (($game1->getHomeId == $game2->getHomeId()) && $game1->getVisitingId == $game2->getVisitingId()) {
//Games are the same
}
}
}
This should provide a pretty significant speed boost. It won't decrease the order of the problem, but it will cut your calculations in half.
EDIT
You should also look into some sort of indexing. When you populate $game1, for instance, create an array that stores the games by value:
$game_index = array(
"home_id"=array(
"id1"=>$reference_to_game_with_id1,
"id2"=>$reference_to_game_with_id2
),
"visiting_id"=array(
"id1"=>$reference_to_game_with_visiting_id1,
"id2"=>$reference_to_game_with_visiting_id2
)
);
I've got it running but it's dirty I think.
At first I store the instances in a hashtable, the hash is made out of the visitorId and the homeId.
Then I create an hash of the visitorId and homeId of the other games array.
Then I retrieve the instance by using $table[$hash].
The arrays I used to have aren't the same in lenght, so this works. I don't know if it's too dirty to post here but it works :P
foreach($pGames as $pGame) {
$hash = $pGame->getHomeId() . '-' . $pGame->getVisitingId();
$table[$hash] = $pGame;
}
foreach($games as $game) {
$hash = $game->getHomeId() . '-' . $game->getVisitingId();
$pGame = $table[$hash];
if($pGame instanceof Game) {
//use the instance
}
}
Related
I have the following code, which checks is an element exists, and if it exists, it checks for the same name, with an incremented number at the end.
For example, it checks is the key "test" exists in the array $this->elements, and if it exists, it checks for "test2", and so on, until the key doesn't exist.
My original code is:
if (isset($this->elements[$desired])) {
$inc = 0;
do {
$inc++;
$new_desired = $desired . $inc;
} while (isset($this->elements[$new_desired]));
$desired = $new_desired;
}
I tried with:
if (isset($this->elements[$desired])) {
return $this->generateUniqueElement($desired, $postfix);
}
private function generateUniqueElement($desired, $postfix) {
$new_desired = $desired . $postfix;
return isset($this->elements[$new_desired]) ? $this->generateUniqueElement($desired, ++$postfix) : $new_desired;
}
But in my tests there's no speed improvement.
Any idea how can I improve the code? On all the pages, this code is called over 10 000 times. And sometimes even over 100k times.
Anticipated thanks!
Without further knowledge on how you generate this list, here's an idea:
$highestElementIds = [];
foreach($this->elements as $element) {
preg_match('/(.*?)(\d+)/', $element, $matches);
$text = $matches[1];
$id = (int)$matches[2];
if(!isset($highestElementIds[$text])) {
$highestElementIds[$text] = $id;
} else {
if($id > $highestElementIds[$text]) {
$highestElementIds[$text] = $id;
}
}
}
// find some element by a simple array access
$highestElementIds['test']; // will return 2 in your example
If your code is really being called 100k times, it should be a lot faster to iterate your list only once and then get the highest id directly from an array which contains the highest number (since you don't need to iterate through it again).
That being said, I still wonder what's the actual reason for having such a huge array in the first place...
Typical unique IDs are either random (UUID or random chars) or sequential numbers. The latter is as simple as it gets and it can be generated with a simple counter:
function generateNewElement($postfix) {
static $i = 0;
return sprintf('%d%s', $i++, $postfix);
}
echo generateNewElement('foo'), PHP_EOL;
echo generateNewElement('foo'), PHP_EOL;
echo generateNewElement('foo'), PHP_EOL;
echo generateNewElement('foo'), PHP_EOL;
0foo
1foo
2foo
3foo
Of course this is just a generic solution so it may not fit your specific use case.
I am looking for a way to create new arrays in a loop. Not the values, but the array variables. So far, it looks like it's impossible or complicated, or maybe I just haven't found the right way to do it.
For example, I have a dynamic amount of values I need to append to arrays. Let's say it will be 200 000 values. I cannot assign all of these values to one array, for memory reasons on server, just skip this part.
I can assign a maximum amount of 50 000 values per one array. This means, I will need to create 4 arrays to fit all the values in different arrays. But next time, I will not know how many values I need to process.
Is there a way to generate a required amount of arrays based on fixed capacity of each array and an amount of values? Or an array must be declared manually and there is no workaround?
What I am trying to achieve is this:
$required_number_of_arrays = ceil(count($data)/50000);
for ($i = 1;$i <= $required_number_of_arrays;$i++) {
$new_array$i = array();
foreach ($data as $val) {
$new_array$i[] = $val;
}
}
// Created arrays: $new_array1, $new_array2, $new_array3
A possible way to do is to extend ArrayObject. You can build in limitation of how many values may be assigned, this means you need to build a class instead of $new_array$i = array();
However it might be better to look into generators, but Scuzzy beat me to that punchline.
The concept of generators is that with each yield, the previous reference is inaccessible unless you loop over it again. It will be in a way, overwritten unlike in arrays, where you can always traverse over previous indexes using $data[4].
This means you need to process the data directly. Storing the yielded data into a new array will negate its effects.
Fetching huge amounts of data is no issue with generators but one should know the concept of them before using them.
Based on your comments, it sounds like you don't need separate array variables. You can reuse the same one. When it gets to the max size, do your processing and reinitialize it:
$max_array_size = 50000;
$n = 1;
$new_array = [];
foreach ($data as $val) {
$new_array[] = $val;
if ($max_array_size == $n++) {
// process $new_array however you need to, then empty it
$new_array = [];
$n = 1;
}
}
if ($new_array) {
// process the remainder if the last bit is less than max size
}
You could create an array and use extract() to get variables from this array:
$required_number_of_arrays = ceil($data/50000);
$new_arrays = array();
for ($i = 1;$i <= $required_number_of_arrays;$i++) {
$new_arrays["new_array$i"] = $data;
}
extract($new_arrays);
print_r($new_array1);
print_r($new_array2);
//...
I think in your case you have to create an array that holds all your generated arrays insight.
so first declare a variable before the loop.
$global_array = [];
insight the loop you can generate the name and fill that array.
$global_array["new_array$i"] = $val;
After the loop you can work with that array. But i think in the end that won't fix your memory limit problem. If fill 5 array with 200k entries it should be the same as filling one array of 200k the amount of data is the same. So it's possible that you run in both ways over the memory limit. If you can't define the limit it could be a problem.
ini_set('memory_limit', '-1');
So you can only prevent that problem in processing your values directly without saving something in an array. For example if you run a db query and process the values directly and save only the result.
You can try something like this:
foreach ($data as $key => $val) {
$new_array$i[] = $val;
unset($data[$key]);
}
Then your value is stored in a new array and you delete the value of the original data array. After 50k you have to create a new one.
Easier way use array_chunk to split your array into parts.
https://secure.php.net/manual/en/function.array-chunk.php
There's non need for multiple variables. If you want to process your data in chunks, so that you don't fill up memory, reuse the same variable. The previous contents of the variable will be garbage collected when you reassign it.
$chunk_size = 50000;
$number_of_chunks = ceil($data_size/$chunk_size);
for ($i = 0; $i < $data_size; $i += $chunk_size) {
$new_array = array();
foreach ($j = $i * $chunk_size; $j < min($j + chunk_size, $data_size); $j++) {
$new_array[] = get_data_item($j);
}
}
$new_array[$i] serves the same purpose as your proposed $new_array$i.
You could do something like this:
$required_number_of_arrays = ceil(count($data)/50000);
for ($i = 1;$i <= $required_number_of_arrays;$i++) {
$array_name = "new_array_$i";
$$array_name = [];
foreach ($data as $val) {
${$array_name}[] = $val;
}
}
I'm becoming a little frustrated with my array results. Ideally, I am creating a form maker module within my application and I am working with two different arrays to establish my database columns and excel columns. Essentially, I am using the results provided by the arrays to write directly to a php file (Excel reader file). In order to establish a difference in Excel Workbooks, I am putting forth an identifier "page2","page3" and so on within the "excel_rows" array.
//my arrays
$table_columns = array('field1','field2','field3','field4','field5'); //fields
$excel_rows = array('c1','c2','page2','c3','c4','page3','c5'); //excel columns
From here.. I go on to try to filter the array keys..
foreach(array_keys($excel_rows) as $key){
$page = array_search(strpos(trim($excel_rows[$key]),'page'),$excel_rows);
if(strpos(trim($excel_rows[$key]),'page') !== false){
$excel_row .= '$objTpl->setActiveSheetIndex('.(str_replace('page','',trim($excel_rows[$key])) -1).');<br/>'.PHP_EOL;
$table_columns[$key] = 0;
}
else {
$excel_row .= '$objTpl->getActiveSheet()->setCellValue(\''.trim($excel_rows[$key]).'\',$row[\''.trim($table_columns[$key]).'\']);<br/>'.PHP_EOL;
}
}
print $excel_row;
The result should echo out the following:
$objTpl->getActiveSheet()->setCellValue('c1', $row['field1']);
$objTpl->getActiveSheet()->setCellValue('c2', $row['field2']);
$objTpl->setActiveSheetIndex(1);<br/>
$objTpl->getActiveSheet()->setCellValue('c3', $row['field4']);
$objTpl->getActiveSheet()->setCellValue('c4', $row['field5']);
$objTpl->setActiveSheetIndex(2);
$objTpl->getActiveSheet()->setCellValue('c5', $row['']);
As one can see, I am missing 'field3' from my result and 'cs' produces and empty row rather than "field5".
I'm assuming something like array_compare or array_combine is the solution - I'm just not able to put it together.
Everything works lovely with module pardoning the array code above. Any help with this would be sincerely appreciated!
-Regards.
How it currently is I'd say you need to set an integer +1 whenever you create a new page and then subtract that integer from the key so you can get the right field.
$subkey = 0;
foreach(array_keys($excel_rows) as $key){
$fieldkey = $key - $subkey;
$page = array_search(strpos(trim($excel_rows[$key]),'page'),$excel_rows);
if(strpos(trim($excel_rows[$key]),'page') !== false){
$excel_row .= '$objTpl->setActiveSheetIndex('.(str_replace('page','',trim($excel_rows[$key])) -1).');<br/>'.PHP_EOL;
//$table_columns[$key] = 0; I'm not sure what this is supposed to do
$subkey++;
}
else {
$excel_row .= '$objTpl->getActiveSheet()->setCellValue(\''.trim($excel_rows[$key]).'\',$row[\''.trim($table_columns[$fieldkey]).'\']);<br/>'.PHP_EOL;
}
}
print $excel_row;
I have a large array.
In this array I have got (among many other things) a list of products:
$data['product_name_0'] = '';
$data['product_desc_0'] = '';
$data['product_name_1'] = '';
$data['product_desc_1'] = '';
This array is provided by a third party (so I have no control over this).
It is not known how many products there will be in the array.
What would be a clean way to loop though all the products?
I don't want to use a foreach loop since it will also go through all the other items in the (large) array.
I cannot use a for loop cause I don't know (yet) how many products the array contains.
I can do a while loop:
$i = 0;
while(true) { // doing this feels wrong, although it WILL end at some time (if there are no other products)
if (!array_key_exists('product_name_'.$i, $data)) {
break;
}
// do stuff with the current product
$i++;
}
Is there a cleaner way of doing the above?
Doing a while(true) looks stupid to me or is there nothing wrong with this approach.
Or perhaps there is another approach?
Your method works, as long as the numeric portions are guaranteed to be sequential. If there's gaps, it'll miss anything that comes after the first gap.
You could use something like:
$names = preg_grep('/^product_name_\d+$/', array_keys($data));
which'll return all of the 'name' keys from your array. You'd extract the digit portion from the key name, and then can use that to refer to the 'desc' section as well.
foreach($names as $name_field) {
$id = substr($names, 12);
$name_val = $data["product_name_{$id}"];
$desc_val = $data["product_desc_{$id}"];
}
How about this
$i = 0;
while(array_key_exists('product_name_'.$i, $data)) {
// loop body
$i++;
}
I think you're close. Just put the test in the while condition.
$i = 0;
while(array_key_exists('product_name_'.$i, $data)) {
// do stuff with the current product
$i++;
}
You might also consider:
$i = 0;
while(isset($data['product_name_'.$i])) {
// do stuff with the current product
$i++;
}
isset is slightly faster than array_key_exists but does behave a little different, so may or may not work for you:
What's quicker and better to determine if an array key exists in PHP?
Difference between isset and array_key_exists
I have a PHP script which reads a large CSV and performs certain actions, but only if the "username" field is unique. The CSV is used in more than one script, so changing the input from the CSV to only contain unique usernames is not an option.
The very basic program flow (which I'm wondering about) goes like this:
$allUsernames = array();
while($row = fgetcsv($fp)) {
$username = $row[0];
if (in_array($username, $allUsernames)) continue;
$allUsernames[] = $username;
// process this row
}
Since this CSV could actually be quite large, it's that in_array bit which has got me thinking. The most ideal situation when searching through an array for a member is if it is already sorted, so how would you build up an array from scratch, keeping it in order? Once it is in order, would there be a more efficient way to search it than using in_array(), considering that it probably doesn't know the array is sorted?
Not keeping the array in order, but how about this kind of optimization? I'm guessing isset() for an array key should be faster than in_array() search.
$allUsernames = array();
while($row = fgetcsv($fp)) {
$username = $row[0];
if (isset($allUsernames[$username])) {
continue;
} else {
$allUsernames[$username] = true;
// do stuff
}
}
The way to build up an array from scratch in sorted order is an insertion sort. In PHP-ish pseudocode:
$list = []
for ($element in $elems_to_insert) {
$index = binary_search($element, $list);
insert_into_list($element, $list, $index);
}
Although, it might actually turn out to be faster to just create the array in unsorted order and then use quicksort (PHP's builtin sort functions use quicksort)
And to find an element in a sorted list:
function binary_search($list, $element) {
$start = 0;
$end = count($list);
while ($end - $start > 1) {
$mid = ($start + $end) / 2;
if ($list[$mid] < $element){
$start = $mid;
}
else{
$end = $mid;
}
}
return $end;
}
With this implementation you'd have to test $list[$end] to see if it is the element you want, since if the element isn't in the array, this will find the point where it should be inserted. I did it that way so it'd be consistent with the previous code sample. If you want, you could check $list[$end] === $element in the function itself.
The array type in php is an ordered map (php array type). If you pass in either ints or strings as keys, you will have an ordered map...
Please review item #6 in the above link.
in_array() does not benefit from having a sorted array. PHP just walks along the whole array as if it were a linked list.