PHP array copy certain keys, built-in functions? Nested loop performance? - php

I have a PHP array that I'd like to duplicate but only copy elements from the array whose keys appear in another array.
Here are my arrays:
$data[123] = 'aaa';
$data[423] = 'bbb';
$data[543] = 'ccc';
$data[231] = 'ddd';
$data[642] = 'eee';
$data[643] = 'fff';
$data[712] = 'ggg';
$data[777] = 'hhh';
$keys_to_copy[] = '123';
$keys_to_copy[] = '231';
$keys_to_copy[] = '643';
$keys_to_copy[] = '712';
$keys_to_copy[] = '777';
$copied_data[123] = 'aaa';
$copied_data[231] = 'ddd';
$copied_data[643] = 'fff';
$copied_data[712] = 'ggg';
$copied_data[777] = 'hhh';
I could just loop through the data array like this:
foreach ($data as $key => $value) {
    if (in_array($key, $keys_to_copy)) {
        $copied_data[$key] = $value;
    }
}
But this will be happening inside a loop which is retrieving data from a MySQL result set. So it would be a loop nested within a MySQL data loop.
I normally try and avoid nested loops unless there's no way of using PHP's built-in array functions to get the result I'm looking for.
But I'm also wary of having a nested loop within a MySQL data loop, I don't want to keep MySQL hanging around.
I'm probably worrying about nested loop performance unnecessarily as I'll never be doing this for more than a couple of hundred rows of data and maybe 10 keys.
But I'd like to know if there's a way of doing this with built-in PHP functions.
I had a look at array_intersect_key() but that doesn't quite do it, because my $keys_to_copy array has my desired keys as array values rather than keys.
Anyone got any ideas?
Cheers, B

I worked it out - I almost had it above. I thought I'd post the answer anyway for completeness. Hope this helps someone out!
array_intersect_key($data, array_flip($keys_to_copy))
Use array_flip() to switch $keys_to_copy so it can be used within array_intersect_key().
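To illustrate with the arrays from the question: array_flip() turns the list of wanted keys (stored as values) into a lookup array keyed by those values, which is exactly the shape array_intersect_key() needs. PHP also casts the numeric-string keys to integers, so they line up with $data's integer keys.
$lookup = array_flip($keys_to_copy);
// $lookup is now array(123 => 0, 231 => 1, 643 => 2, 712 => 3, 777 => 4)
$copied_data = array_intersect_key($data, $lookup);
// array_intersect_key() only compares keys, so the flipped values (0, 1, 2, ...) are ignored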
I'll run some tests to compare performance between the manual loop above, to this answer. I would expect the built-in functions to be faster but they might be pretty equal. I know arrays are heavily optimised so I'm sure it will be close.
EDIT:
I have run some benchmarks using PHP CLI to compare the foreach() code in my question with the code in my answer above. The results are quite astounding.
Here's the code I used to benchmark, which I think is valid:
<?php
ini_set('max_execution_time', 0); // not needed for CLI

// BUILD RANDOM DATA ARRAY
$data = array();
while (count($data) <= 200000) {
    $data[rand(0, 500000)] = rand(0, 500000);
}
$keys_to_copy = array_rand($data, 100000);

// FOREACH
$timer_start = microtime(TRUE);
foreach ($data as $key => $value) {
    if (in_array($key, $keys_to_copy)) {
        $copied_data[$key] = $value;
    }
}
echo 'foreach: '.(microtime(TRUE) - $timer_start)."s\r\n";

// BUILT-IN ARRAY FUNCTIONS
$timer_start = microtime(TRUE);
$copied_data = array_intersect_key($data, array_flip($keys_to_copy));
echo 'built-in: '.(microtime(TRUE) - $timer_start)."s\r\n";
?>
And the results...
foreach: 662.217s
array_intersect_key: 0.099s
So it's much faster over loads of array elements to use the PHP array functions rather than foreach. I thought it would be faster but not by that much!
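Most of that gap comes from in_array() doing a linear scan of $keys_to_copy for every element of $data, so the foreach version is effectively O(n * m). As a rough sketch (not part of the benchmark above), flipping the keys once before the loop turns each check into a hash lookup with isset(), which should bring the foreach much closer to array_intersect_key():
// sketch only: flip once, then check with isset() instead of in_array()
$lookup = array_flip($keys_to_copy);
$copied_data = array();
foreach ($data as $key => $value) {
    if (isset($lookup[$key])) {
        $copied_data[$key] = $value;
    }
}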

Why not load the entire result set into an array, then begin processing with nested loops?
$query_result = mysql_query($my_query) or die(mysql_error());
$query_rows = mysql_num_rows($query_result);
for ($i = 0; $i < $query_rows; $i++) {
    $row = mysql_fetch_assoc($query_result);
    // 'key' is the name of the column containing the data key (123)
    // 'value' is the name of the column containing the value (aaa)
    $data[$row['key']] = $row['value'];
}
foreach ($data as $key => $value) {
    if (in_array($key, $keys_to_copy)) {
        $copied_data[$key] = $value;
    }
}

Related

Is there a way to generate new array variables in a loop in PHP?

I am looking for a way to create new arrays in a loop. Not the values, but the array variables. So far, it looks like it's impossible or complicated, or maybe I just haven't found the right way to do it.
For example, I have a dynamic amount of values I need to append to arrays. Let's say it will be 200 000 values. I cannot assign all of these values to one array, for memory reasons on the server, so just skip that part.
I can assign a maximum amount of 50 000 values per one array. This means, I will need to create 4 arrays to fit all the values in different arrays. But next time, I will not know how many values I need to process.
Is there a way to generate a required amount of arrays based on fixed capacity of each array and an amount of values? Or an array must be declared manually and there is no workaround?
What I am trying to achieve is this:
$required_number_of_arrays = ceil(count($data)/50000);
for ($i = 1; $i <= $required_number_of_arrays; $i++) {
    $new_array$i = array();
    foreach ($data as $val) {
        $new_array$i[] = $val;
    }
}
// Created arrays: $new_array1, $new_array2, $new_array3
A possible way to do this is to extend ArrayObject. You can build in a limitation on how many values may be assigned, but this means you need to build a class instead of $new_array$i = array();
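A minimal sketch of that idea (the class name and exception choice are made up for illustration): an ArrayObject subclass that refuses to grow past a fixed capacity.
// Hypothetical example: an ArrayObject with a hard capacity limit
class CappedArray extends ArrayObject
{
    private $capacity;

    public function __construct($capacity)
    {
        parent::__construct();
        $this->capacity = $capacity;
    }

    public function offsetSet($key, $value): void
    {
        if ($this->count() >= $this->capacity) {
            throw new OverflowException('This array is limited to ' . $this->capacity . ' values');
        }
        parent::offsetSet($key, $value);
    }
}

$chunk = new CappedArray(50000);
$chunk[] = 'some value'; // appending goes through offsetSet(), so the limit is enforced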
However it might be better to look into generators, but Scuzzy beat me to that punchline.
The concept of generators is that with each yield, the previous value is no longer accessible unless you iterate over the generator again. In a way it gets overwritten, unlike an array, where you can always go back to a previous index with $data[4].
This means you need to process the data directly; storing the yielded data into a new array would negate the benefit.
Fetching huge amounts of data is no issue with generators but one should know the concept of them before using them.
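A minimal generator sketch along those lines (the data source here is made up; in practice the values would come from a query or a file):
// yields values one at a time instead of building a 200 000-element array
function values()
{
    for ($i = 0; $i < 200000; $i++) {
        yield $i; // only the current value is held in memory
    }
}

foreach (values() as $val) {
    // process $val directly; copying it into an array would negate the benefit, as noted above
}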
Based on your comments, it sounds like you don't need separate array variables. You can reuse the same one. When it gets to the max size, do your processing and reinitialize it:
$max_array_size = 50000;
$n = 1;
$new_array = [];
foreach ($data as $val) {
    $new_array[] = $val;
    if ($max_array_size == $n++) {
        // process $new_array however you need to, then empty it
        $new_array = [];
        $n = 1;
    }
}
if ($new_array) {
    // process the remainder if the last bit is less than max size
}
You could create an array and use extract() to get variables from this array:
$required_number_of_arrays = ceil(count($data)/50000);
$new_arrays = array();
for ($i = 1; $i <= $required_number_of_arrays; $i++) {
    $new_arrays["new_array$i"] = $data;
}
extract($new_arrays);
print_r($new_array1);
print_r($new_array2);
//...
I think in your case you have to create an array that holds all your generated arrays inside it.
So first declare a variable before the loop:
$global_array = [];
Inside the loop you can generate the name and fill that array:
$global_array["new_array$i"] = $val;
After the loop you can work with that array. But I think in the end that won't fix your memory limit problem. If you fill 5 arrays that together hold 200k entries, the amount of data is the same as filling one array with 200k entries, so you could run over the memory limit either way. The only escape would be raising the limit, if you're allowed to:
ini_set('memory_limit', '-1');
Otherwise you can only avoid the problem by processing your values directly, without storing them in an array at all, for example by running a DB query, processing each value as it comes in and saving only the result.
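Purely as a sketch of that last point (the $mysqli connection, table and column names are placeholders): fetch rows unbuffered, process each one immediately and keep only the aggregate.
// process each row as it is fetched and keep only the running result
$total = 0;
$result = $mysqli->query('SELECT amount FROM orders', MYSQLI_USE_RESULT); // unbuffered
while ($row = $result->fetch_assoc()) {
    $total += $row['amount']; // no array of rows is ever built
}
$result->free();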
You can try something like this:
foreach ($data as $key => $val) {
    ${"new_array$i"}[] = $val; // variable variable, i.e. $new_array1, $new_array2, ...
    unset($data[$key]);
}
Then your value is stored in a new array and you delete the value from the original data array. After 50k values you would have to create a new one.
An easier way is to use array_chunk() to split your array into parts:
https://secure.php.net/manual/en/function.array-chunk.php
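For example, with the 50 000 limit from the question:
$chunks = array_chunk($data, 50000);          // re-indexes the keys in each chunk
// $chunks = array_chunk($data, 50000, true); // pass true to preserve the original keys
foreach ($chunks as $i => $chunk) {
    // process $chunk; this plays the role of $new_array1, $new_array2, ...
}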
There's no need for multiple variables. If you want to process your data in chunks, so that you don't fill up memory, reuse the same variable. The previous contents of the variable will be garbage collected when you reassign it.
$chunk_size = 50000;
$number_of_chunks = ceil($data_size / $chunk_size);
for ($i = 0; $i < $data_size; $i += $chunk_size) {
    $new_array = array();
    for ($j = $i; $j < min($i + $chunk_size, $data_size); $j++) {
        $new_array[] = get_data_item($j);
    }
    // process $new_array here before the next chunk replaces it
}
The $new_array built on iteration $i serves the same purpose as your proposed $new_array$i.
You could do something like this:
$required_number_of_arrays = ceil(count($data)/50000);
for ($i = 1; $i <= $required_number_of_arrays; $i++) {
    $array_name = "new_array_$i";
    $$array_name = [];
    foreach ($data as $val) {
        ${$array_name}[] = $val;
    }
}

array_key_exists() with array as key

I'm trying to check whether any value of one array is present in another array. The function array_key_exists() looks like what I'm searching for, but I don't understand how to pass the key to the function as an array. Here's the code:
$risultato_query_controllo_numero = mysql_query($query_controllo_numero);
$voucher_esistenti = array();
while (($row = mysql_fetch_assoc($risultato_query_controllo_numero))) {
    $voucher_esistenti[] = $row['numero'];
}
That populates the first array with numbers. Then:
$voucher = range($numero, $numero + $quantita);
That populates the second array with numbers.
What I need to do now is to check if any of the values in $voucher are present in $voucher_esistenti.
You can use the array_intersect function:
$overlap = array_intersect($voucher, $voucher_esistenti);
You can find more examples in the documentation.
You could use the in_array() function to get the result you are looking for.
$arrayOne = range(1, 10);
$arrayTwo = range(5, 15);
foreach ($arrayOne as $value) {
    if (in_array($value, $arrayTwo)) {
        echo 'value '.$value.' is in the first and second array.<br />';
    }
}
Resources
in_array() - Manual
in_array() could be a good solution for your need; for example, you can append to $voucher_esistenti only when the value from the SQL row is new:
$risultato_query_controllo_numero = mysql_query($query_controllo_numero);
$voucher_esistenti = array();
while (($row = mysql_fetch_assoc($risultato_query_controllo_numero))) {
    if (!in_array($row['numero'], $voucher_esistenti)) {
        $voucher_esistenti[] = $row['numero'];
    }
} // this solution isn't optimal, because it re-scans the whole array for each new value
There's a better way to achieve that: use the array as a hashmap, which gives O(1) lookups (the best complexity :) ).
$risultato_query_controllo_numero = mysql_query($query_controllo_numero);
$voucher_esistenti = array();
while (($row = mysql_fetch_assoc($risultato_query_controllo_numero))) {
    // here is what we changed: instead of appending values, we use the number as the key
    if (!isset($voucher_esistenti[$row['numero']])) {
        $voucher_esistenti[$row['numero']] = true;
    }
}
The second implementation is a lot faster thanks to the hash lookups, but you will have to change how you read the $voucher_esistenti array afterwards, since the voucher numbers are now keys rather than values.
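For example, with the keyed version the later checks become key lookups instead of value searches (using the variables from the question):
foreach ($voucher as $numero) {
    if (isset($voucher_esistenti[$numero])) {
        // this voucher number already exists
    }
}
// or, to get the existing numbers back as plain values:
$numeri_esistenti = array_keys($voucher_esistenti);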

Input CSV to Array assign keys

This works. It's sort of a generic CSV file importer and key assigner. Looking for feedback on how this approach could be made more elegant. I started learning PHP last week. This forum is fantastic.
<?php
$csvfilename = "sdb.csv";
$filekeys = array('SSID','EquipName','EquipTypeSignalName','elecpropID');
$records = InputCsvFile($csvfilename, $filekeys);

function InputCsvFile($filename, $filekeys) {
    $array1 = array_map('str_getcsv', file($filename));
    foreach ($array1 as $element) {
        $int1 = 0;
        unset($array3);
        foreach ($filekeys as $key) {
            $array3[$key] = $element[$int1];
            $int1++;
        }
        $array2[] = $array3;
    }
    return $array2;
}
?>
Using array_map() is clever, but since you have to further process each row, is somewhat unnecessary. I would rewrite InputCsvFile like this:
function InputCsvFile($filename, array $columns) {
    $expectedCols = count($columns);
    $arr = [];
    // NOTE: Confirm the file actually exists before getting its contents
    $rows = file($filename);
    foreach ($rows as $row) {
        $row = str_getcsv($row);
        if (count($row) == $expectedCols) {
            $arr[] = array_combine($columns, $row);
        } else {
            // Handle the column count mismatch. The test is required because
            // otherwise, array_combine will complain loudly.
        }
    }
    return $arr;
}
Alternatively, since you're dealing with files, you could loop on fgetcsv(), rather than using file() + str_getcsv(). Using fgetcsv() will use less memory (since the entire file doesn't have to be read in entirely and be kept in memory through the duration of the iteration), which may or may not be a concern depending on your file sizes.
array_combine() (which incidentally, is one of my favorite functions in PHP) creates a new array given arrays of keys (your list of columns in your $filekeys array) and values (the processed rows from the csv), and is practically tailor-made for turning csv files into more usable arrays.
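A rough sketch of that fgetcsv() variant (the name InputCsvFileStreaming is just for illustration, and error handling is kept minimal):
function InputCsvFileStreaming($filename, array $columns) {
    $expectedCols = count($columns);
    $arr = [];
    $handle = fopen($filename, 'r');
    if ($handle === false) {
        return $arr; // could not open the file
    }
    while (($row = fgetcsv($handle)) !== false) {
        if (count($row) == $expectedCols) {
            $arr[] = array_combine($columns, $row);
        }
        // else: handle the column count mismatch, as above
    }
    fclose($handle);
    return $arr;
}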

Patterns for building multidimensional array with unique key

Problem:
Extract data from an object/array and represent this data using a multidimensional array with a unique key generated from the inner loop.
I always find myself building multidimensional arrays like this:
$final_array = array();
foreach ($table as $row) {
    $key = null;
    $data = array();
    foreach ($row as $col => $val) {
        /* Usually some logic goes here that does
           some data transformation / concatenation stuff */
        if ($col == 'my_unique_key_name') {
            $key = $val;
        }
        $data[$col] = $val;
    }
    if (!is_null($key)) {
        if (!isset($final_array[$key])) {
            $final_array[$key] = array();
        }
        $final_array[$key][] = $data;
    }
}
I can't help but wonder if I'm constantly doing this out of habit, but it feels kind of verbose with all the key-checking and whatnot. Is there a native function I am not utilizing? Can this be refactored into something more simple or am I overthinking this?
Why are you always doing that? It doesn't seem like the common kind of stuff one works with on a day-to-day basis... Anyway, that's kinda cryptic (an example would be nice), but have you thought of using an MD5 hash of the serialized dump of the array to uniquely define a key?
$key = md5(serialize($value));
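For example (a sketch using the loop from the question, skipping the per-column transformation), the hash of the whole row can serve as the unique key:
foreach ($table as $row) {
    $key = md5(serialize($row));   // identical rows hash to the same key
    $final_array[$key][] = $row;
}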

Keeping an array sorted in PHP

I have a PHP script which reads a large CSV and performs certain actions, but only if the "username" field is unique. The CSV is used in more than one script, so changing the input from the CSV to only contain unique usernames is not an option.
The very basic program flow (which I'm wondering about) goes like this:
$allUsernames = array();
while ($row = fgetcsv($fp)) {
    $username = $row[0];
    if (in_array($username, $allUsernames)) continue;
    $allUsernames[] = $username;
    // process this row
}
Since this CSV could actually be quite large, it's that in_array bit which has got me thinking. The most ideal situation when searching through an array for a member is if it is already sorted, so how would you build up an array from scratch, keeping it in order? Once it is in order, would there be a more efficient way to search it than using in_array(), considering that it probably doesn't know the array is sorted?
Not keeping the array in order, but how about this kind of optimization? I'm guessing isset() for an array key should be faster than in_array() search.
$allUsernames = array();
while ($row = fgetcsv($fp)) {
    $username = $row[0];
    if (isset($allUsernames[$username])) {
        continue;
    } else {
        $allUsernames[$username] = true;
        // do stuff
    }
}
The way to build up an array from scratch in sorted order is an insertion sort. In PHP-ish pseudocode:
$list = []
for ($element in $elems_to_insert) {
    $index = binary_search($list, $element);
    insert_into_list($element, $list, $index);
}
Although, it might actually turn out to be faster to just create the array in unsorted order and then use quicksort (PHP's builtin sort functions use quicksort)
And to find an element in a sorted list:
function binary_search($list, $element) {
    $start = 0;
    $end = count($list);
    while ($start < $end) {
        $mid = intdiv($start + $end, 2); // integer midpoint
        if ($list[$mid] < $element) {
            $start = $mid + 1;
        } else {
            $end = $mid;
        }
    }
    return $end; // $start and $end are equal here: the position where $element is or should be
}
With this implementation you'd have to test $list[$end] to see if it is the element you want, since if the element isn't in the array, this will find the point where it should be inserted. I did it that way so it'd be consistent with the previous code sample. If you want, you could check $list[$end] === $element in the function itself.
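Putting it together with the CSV loop from the question, a rough sketch (untested) of keeping the list sorted as rows are read:
$allUsernames = array();
while ($row = fgetcsv($fp)) {
    $username = $row[0];
    $index = binary_search($allUsernames, $username);
    if (isset($allUsernames[$index]) && $allUsernames[$index] === $username) {
        continue; // already seen this username
    }
    array_splice($allUsernames, $index, 0, array($username)); // insert at $index, keeping order
    // process this row
}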
The array type in PHP is an ordered map (see the manual's page on the array type). If you pass in either ints or strings as keys, you will have an ordered map...
Please review item #6 on that manual page.
in_array() does not benefit from having a sorted array. PHP just walks along the whole array as if it were a linked list.
