I have a PHP script which reads a large CSV and performs certain actions, but only if the "username" field is unique. The CSV is used in more than one script, so changing the input from the CSV to only contain unique usernames is not an option.
The very basic program flow (which I'm wondering about) goes like this:
$allUsernames = array();
while($row = fgetcsv($fp)) {
$username = $row[0];
if (in_array($username, $allUsernames)) continue;
$allUsernames[] = $username;
// process this row
}
Since this CSV could actually be quite large, it's that in_array bit which has got me thinking. The most ideal situation when searching through an array for a member is if it is already sorted, so how would you build up an array from scratch, keeping it in order? Once it is in order, would there be a more efficient way to search it than using in_array(), considering that it probably doesn't know the array is sorted?
Not keeping the array in order, but how about this kind of optimization? I'm guessing isset() for an array key should be faster than in_array() search.
$allUsernames = array();
while($row = fgetcsv($fp)) {
$username = $row[0];
if (isset($allUsernames[$username])) {
continue;
} else {
$allUsernames[$username] = true;
// do stuff
}
}
The way to build up an array from scratch in sorted order is an insertion sort. In PHP-ish pseudocode:
$list = []
for ($element in $elems_to_insert) {
$index = binary_search($element, $list);
insert_into_list($element, $list, $index);
}
Although, it might actually turn out to be faster to just create the array in unsorted order and then use quicksort (PHP's builtin sort functions use quicksort)
And to find an element in a sorted list:
function binary_search($list, $element) {
$start = 0;
$end = count($list);
while ($end - $start > 1) {
$mid = ($start + $end) / 2;
if ($list[$mid] < $element){
$start = $mid;
}
else{
$end = $mid;
}
}
return $end;
}
With this implementation you'd have to test $list[$end] to see if it is the element you want, since if the element isn't in the array, this will find the point where it should be inserted. I did it that way so it'd be consistent with the previous code sample. If you want, you could check $list[$end] === $element in the function itself.
The array type in php is an ordered map (php array type). If you pass in either ints or strings as keys, you will have an ordered map...
Please review item #6 in the above link.
in_array() does not benefit from having a sorted array. PHP just walks along the whole array as if it were a linked list.
Related
I'm trying to check if in an array there is any value of another array. The function array_key_exist() looks like what I'm searching for, but I don't understand how to give the key value at the function as an array. Here's the code:
$risultato_query_controllo_numero = mysql_query($query_controllo_numero);
$voucher_esistenti = array();
while(($row = mysql_fetch_assoc($risultato_query_controllo_numero))) {
$voucher_esistenti[] = $row['numero'];
}
Which populates the first array with numbers:
$voucher = range($numero, $numero + $quantita);
Which populates the second array with numbers.
What I need to do now is to check if any of the value in $voucher is present in $voucher_presenti.
You can use the array_intersect function:
$overlap = array_intersect($voucher, $voucher_presenti);
You can find more examples in the documentation.
You could use the in_array() function to get the result you are looking for.
$arrayOne = range(1, 10);
$arrayTwo = range(5, 15);
foreach ($arrayOne as $value) {
if (in_array($value, $arrayTwo)) {
echo 'value '.$value.' is in the first and second array.<br />';
}
}
Resources
in_array() - Manual
in_array could be a good solution for your need, for example you can assign $voucher_esistenti only when you have a new value in the sql row.
$risultato_query_controllo_numero=mysql_query($query_controllo_numero);
$voucher_esistenti=array();
while(($row = mysql_fetch_assoc($risultato_query_controllo_numero))){
if(!in_array($row['numero'], $voucher_esistenti) {
$voucher_esistenti[] = $row['numero'];
}
} // this solution isn't optimal, because you will check subarrays with each new value
There's a better way to achieve that, by using a hashmap which has a complexity of O(1) ( best complexity :) )
$risultato_query_controllo_numero=mysql_query($query_controllo_numero);
$voucher_esistenti=array();
while(($row = mysql_fetch_assoc($risultato_query_controllo_numero))){
// here is what we changed, instead of key = value, we actually append keys
if(!isset($voucher_esistenti[$row['numero']]) {
$voucher_esistenti[$row['numero']] = true;
}
}
/*
The second implementation is a lot faster due to the algorithm, but you will have to change the reading of $voucher_esistenti array. */
This works. It's sort-of a generic csv file importer and key assigner. Looking for feedback how this approach could be made more elegant. I started learning php last week. This forum is fantastic.
<?php
$csvfilename = "sdb.csv";
$filekeys = array('SSID','EquipName','EquipTypeSignalName','elecpropID');
$records = InputCsvFile($csvfilename,$filekeys);
function InputCsvFile($filename,$filekeys){
$array1 = array_map('str_getcsv', file($filename));
foreach($array1 as $element){
$int1 = 0;
unset($array3);
foreach($filekeys as $key){
$array3[$key] = $element[$int1];
$int1++;}
$array2[] = $array3;}
return $array2;}
?>
Using array_map() is clever, but since you have to further process each row, is somewhat unnecessary. I would rewrite InputCsvFile like this:
function InputCsvFile($filename, array $columns) {
$expectedCols = count($columns);
$arr = [];
// NOTE: Confirm the file actually exists before getting its contents
$rows = file($filename);
foreach($rows as $row) {
$row = str_getcsv($row);
if (count($row) == $expectedCols)) {
$arr[] = array_combine($filekeys, $element);
} else {
// Handle the column count mismatch. The test is required because
// otherwise, array_combine will complain loudly.
}
}
return $arr;
}
Alternatively, since you're dealing with files, you could loop on fgetcsv(), rather than using file() + str_getcsv(). Using fgetcsv() will use less memory (since the entire file doesn't have to be read in entirely and be kept in memory through the duration of the iteration), which may or may not be a concern depending on your file sizes.
array_combine() (which incidentally, is one of my favorite functions in PHP) creates a new array given arrays of keys (your list of columns in your $filekeys array) and values (the processed rows from the csv), and is practically tailor-made for turning csv files into more usable arrays.
If you have any array $p that you populated in a loop like so:
$p[] = array( "id"=>$id, "Name"=>$name);
What's the fastest way to search for John in the Name key, and if found, return the $p index? Is there a way other than looping through $p?
I have up to 5000 names to find in $p, and $p can also potentially contain 5000 rows. Currently I loop through $p looking for each name, and if found, parse it (and add it to another array), splice the row out of $p, and break 1, ready to start searching for the next of the 5000 names.
I was wondering if there if a faster way to get the index rather than looping through $p eg an isset type way?
Thanks for taking a look guys.
Okay so as I see this problem, you have unique ids, but the names may not be unique.
You could initialize the array as:
array($id=>$name);
And your searches can be like:
array_search($name,$arr);
This will work very well as native method of finding a needle in a haystack will have a better implementation than your own implementation.
e.g.
$id = 2;
$name= 'Sunny';
$arr = array($id=>$name);
echo array_search($name,$arr);
Echoes 2
The major advantage in this method would be code readability.
If you know that you are going to need to perform many of these types of search within the same request then you can create an index array from them. This will loop through the array once per index you need to create.
$piName = array();
foreach ($p as $k=>$v)
{
$piName[$v['Name']] = $k;
}
If you only need to perform one or two searches per page then consider moving the array into an external database, and creating the index there.
$index = 0;
$search_for = 'John';
$result = array_reduce($p, function($r, $v) use (&$index, $search_for) {
if($v['Name'] == $search_for) {
$r[] = $index;
}
++$index;
return $r;
});
$result will contain all the indices of elements in $p where the element with key Name had the value John. (This of course only works for an array that is indexed numerically beginning with 0 and has no “holes” in the index.)
Edit: Possibly even easier to just use array_filter, but that will not return the indices only, but all array element where Name equals John – but indices will be preserved:
$result2 = array_filter($p, function($elem) {
return $elem["Name"] == "John" ? true : false;
});
var_dump($result2);
What suits your needs better, resp. which one is maybe faster, is for you to figure out.
I have a PHP array that I'd like to duplicate but only copy elements from the array whose keys appear in another array.
Here are my arrays:
$data[123] = 'aaa';
$data[423] = 'bbb';
$data[543] = 'ccc';
$data[231] = 'ddd';
$data[642] = 'eee';
$data[643] = 'fff';
$data[712] = 'ggg';
$data[777] = 'hhh';
$keys_to_copy[] = '123';
$keys_to_copy[] = '231';
$keys_to_copy[] = '643';
$keys_to_copy[] = '712';
$keys_to_copy[] = '777';
$copied_data[123] = 'aaa';
$copied_data[231] = 'ddd';
$copied_data[643] = 'fff';
$copied_data[712] = 'ggg';
$copied_data[777] = 'hhh';
I could just loop through the data array like this:
foreach ($data as $key => $value) {
if ( in_array($key, $keys_to_copy)) {
$copied_data[$key] = $value;
}
}
But this will be happening inside a loop which is retrieving data from a MySQL result set. So it would be a loop nested within a MySQL data loop.
I normally try and avoid nested loops unless there's no way of using PHP's built-in array functions to get the result I'm looking for.
But I'm also weary of having a nested loop within a MySQL data loop, I don't want to keep MySQL hanging around.
I'm probably worrying about nested loop performance unnecessarily as I'll never be doing this for more than a couple of hundred rows of data and maybe 10 keys.
But I'd like to know if there's a way of doing this with built-in PHP functions.
I had a look at array_intesect_key() but that doesn't quite do it, because my $keys_to_copy array has my desired keys as array values rather than keys.
Anyone got any ideas?
Cheers, B
I worked it out - I almost had it above.I thought I'd post the answer anyway for completeness. Hope this helps someone out!
array_intersect_key($data, array_flip($keys_to_copy))
Use array_flip() to switch $keys_to_copy so it can be used within array_intersect_keys()
I'll run some tests to compare performance between the manual loop above, to this answer. I would expect the built-in functions to be faster but they might be pretty equal. I know arrays are heavily optimised so I'm sure it will be close.
EDIT:
I have run some benchmarks using PHP CLI to compare the foreach() code in my question with the code in my answer above. The results are quite astounding.
Here's the code I used to benchmark, which I think is valid:
<?php
ini_set('max_execution_time', 0);//NOT NEEDED FOR CLI
// BUILD RANDOM DATA ARRAY
$data = array();
while ( count($data) <= 200000) {
$data[rand(0, 500000)] = rand(0, 500000);
}
$keys_to_copy = array_rand($data, 100000);
// FOREACH
$timer_start = microtime(TRUE);
foreach ($data as $key => $value) {
if ( in_array($key, $keys_to_copy)) {
$copied_data[$key] = $value;
}
}
echo 'foreach: '.(microtime(TRUE) - $timer_start)."s\r\n";
// BUILT-IN ARRAY FUNCTIONS
$timer_start = microtime(TRUE);
$copied_data = array_intersect_key($data, array_flip($keys_to_copy));
echo 'built-in: '.(microtime(TRUE) - $timer_start)."s\r\n";
?>
And the results...
foreach: 662.217s
array_intersect_key: 0.099s
So it's much faster over loads of array elements to use the PHP array functions rather than foreach. I thought it would be faster but not by that much!
Why not load the entire result set into an array, then begin processing with nested loops?
$query_result = mysql_query($my_query) or die(mysql_error());
$query_rows = mysql_num_rows($query_result);
for ($i = 0; $i < $query_rows; $i++)
{
$row = mysql_fetch_assoc($query_result);
// 'key' is the name of the column containing the data key (123)
// 'value' is the name of the column containing the value (aaa)
$data[$row['key']] = $row['value'];
}
foreach ($data as $key => $value)
{
if ( in_array($key, $keys_to_copy))
{
$copied_data[$key] = $value;
}
}
I have a large array.
In this array I have got (among many other things) a list of products:
$data['product_name_0'] = '';
$data['product_desc_0'] = '';
$data['product_name_1'] = '';
$data['product_desc_1'] = '';
This array is provided by a third party (so I have no control over this).
It is not known how many products there will be in the array.
What would be a clean way to loop though all the products?
I don't want to use a foreach loop since it will also go through all the other items in the (large) array.
I cannot use a for loop cause I don't know (yet) how many products the array contains.
I can do a while loop:
$i = 0;
while(true) { // doing this feels wrong, although it WILL end at some time (if there are no other products)
if (!array_key_exists('product_name_'.$i, $data)) {
break;
}
// do stuff with the current product
$i++;
}
Is there a cleaner way of doing the above?
Doing a while(true) looks stupid to me or is there nothing wrong with this approach.
Or perhaps there is another approach?
Your method works, as long as the numeric portions are guaranteed to be sequential. If there's gaps, it'll miss anything that comes after the first gap.
You could use something like:
$names = preg_grep('/^product_name_\d+$/', array_keys($data));
which'll return all of the 'name' keys from your array. You'd extract the digit portion from the key name, and then can use that to refer to the 'desc' section as well.
foreach($names as $name_field) {
$id = substr($names, 12);
$name_val = $data["product_name_{$id}"];
$desc_val = $data["product_desc_{$id}"];
}
How about this
$i = 0;
while(array_key_exists('product_name_'.$i, $data)) {
// loop body
$i++;
}
I think you're close. Just put the test in the while condition.
$i = 0;
while(array_key_exists('product_name_'.$i, $data)) {
// do stuff with the current product
$i++;
}
You might also consider:
$i = 0;
while(isset($data['product_name_'.$i])) {
// do stuff with the current product
$i++;
}
isset is slightly faster than array_key_exists but does behave a little different, so may or may not work for you:
What's quicker and better to determine if an array key exists in PHP?
Difference between isset and array_key_exists