Manipulating data to matrix-like format in PHP

Given input that shows tag assignments to images, as follows (read from php://stdin line by line, as the input can get rather large):
image_a tag_lorem
image_a tag_ipsum
image_a tag_amit
image_b tag_sit
image_b tag_dolor
image_b tag_ipsum
... (there are more lines, may get up to a million)
The expected output is shown below. It is essentially the same format, with an extra field showing whether the image-tag combination exists in the input. Note that for every image, all available tags are listed, and the trailing 1/0 on each line indicates whether that tag is assigned to the image.
image_a tag_sit 0
image_a tag_lorem 1
image_a tag_dolor 0
image_a tag_ipsum 1
image_a tag_amit 1
image_b tag_sit 1
image_b tag_lorem 0
image_b tag_dolor 1
image_b tag_ipsum 1
image_b tag_amit 0
... (more)
I have posted my not-so-efficient solution below. To give a better picture of the input and output, I fed 745 rows (covering the tag assignments of 10 images) into the script via stdin, and I got 555025 lines after execution, using about 0.4MB of memory. However, it may wear out the hard disk faster because of the heavy disk I/O (it keeps writing to and reading from a temporary column cache file).
Is there any other way of doing this? I have another script that can turn the stdin into something like this (not sure if this is useful):
image_foo tag_lorem tag_ipsum tag_amit
image_bar tag_sit tag_dolor tag_ipsum
p/s: the order of tag_* is not important, but it has to be the same for all rows, i.e. this is not what I want (notice the order of tag_* is inconsistent between image_foo and image_bar):
image_foo tag_lorem 1
image_foo tag_ipsum 1
image_foo tag_dolor 0
image_foo tag_sit 0
image_foo tag_amit 1
image_bar tag_sit 1
image_bar tag_lorem 0
image_bar tag_dolor 1
image_bar tag_ipsum 1
image_bar tag_amit 0
p/s2: I don't know the range of tag_* until I finish reading stdin
p/s3: I don't understand why I got down-voted; if clarification is needed I am more than happy to provide it. I am not trying to make fun of anything or post nonsense here. I have rewritten the question to make it sound more like a real problem (?). However, the script really doesn't have to care about what the input actually is or whether a database is used (the data is retrieved from an RDF data store, if you must know), because I want the script to be usable for other types of data as long as the input is in the right format (hence the original version of this question was very general).
p/s4: I am trying to avoid using arrays because I want to avoid out-of-memory errors as much as possible (if 745 lines explaining just 10 images expand into 550k lines, just imagine I have 100, 1000, or even 10000+ images).
p/s5: if you have an answer in another language, feel free to post it here. I have thought of solving this with Clojure but still couldn't find a way to do it properly.

Sorry, maybe I misunderstood you - this looks too easy:
$stdin = fopen('php://stdin', 'r');
$columns_arr = array();
$rows_arr = array();
function set_empty_vals(&$value, $key, $columns_arr) {
    $value = array_merge($columns_arr, $value);
    ksort($value);
    foreach ($value AS $val_name => $flag) {
        echo $key.' '.$val_name.' '.$flag.PHP_EOL;
    }
    $value = NULL;
}
while ($line = fgets($stdin)) {
    $line = trim($line);
    list($row, $column) = explode(' ', $line);
    $row = trim($row);
    $column = trim($column);
    if (!isset($rows_arr[$row]))
        $rows_arr[$row] = array();
    $rows_arr[$row][$column] = 1;
    $columns_arr[$column] = 0;
}
array_walk($rows_arr, 'set_empty_vals', $columns_arr);
UPD:
1 million lines is easy for PHP:
$columns_arr = array();
$rows_arr = array();
function set_null_arr(&$value, $key, $columns_arr) {
    $value = array_merge($columns_arr, $value);
    ksort($value);
    foreach ($value AS $val_name => $flag) {
        //echo $key.' '.$val_name.' '.$flag.PHP_EOL;
    }
    $value = NULL;
}
for ($i = 0; $i < 100000; $i++) {
    for ($j = 0; $j < 10; $j++) {
        $row = 'row_foo'.$i;
        $column = 'column_ipsum'.$j;
        if (!isset($rows_arr[$row]))
            $rows_arr[$row] = array();
        $rows_arr[$row][$column] = 1;
        $columns_arr[$column] = 0;
    }
}
array_walk($rows_arr, 'set_null_arr', $columns_arr);
echo memory_get_peak_usage();
147 MB for me.
Last UPD: this is how I see a low-memory (but still fast) script:
//Approximate stdin buffer size, 1Mb should be good
define('MY_STDIN_READ_BUFF_LEN', 1048576);
//Approximate tmpfile buffer size, 1Mb should be good
define('MY_TMPFILE_READ_BUFF_LEN', 1048576);
//Custom stdin line delimiter (\r\n, \n, \r etc.)
define('MY_STDIN_LINE_DELIM', PHP_EOL);
//Custom tmpfile line delimiter - choose the smallest possible
define('MY_TMPFILE_LINE_DELIM', "\n");
//Custom output line delimiter - choose the smallest possible
define('MY_OUTPUT_LINE_DELIM', "\n");

function my_output_arr($field_name, $columns_data) {
    ksort($columns_data);
    foreach ($columns_data AS $column_name => $column_flag) {
        echo $field_name.' '.$column_name.' '.$column_flag.MY_OUTPUT_LINE_DELIM;
    }
}

$tmpfile = tmpfile() OR die('Can\'t create/open temporary file!');
$buffer_len = 0;
$buffer = '';
//I don't think there is a point in saving the columns array in a file -
//it should be small enough to hold in memory.
$columns_array = array();
//Open stdin for reading
$stdin = fopen('php://stdin', 'r') OR die('Failed to open stdin!');
//Main stdin reading and tmp file writing loop
//Using fread + explode + a big buffer showed a great performance boost
//in comparison with fgets();
while ($read_buffer = fread($stdin, MY_STDIN_READ_BUFF_LEN)) {
    $lines_arr = explode(MY_STDIN_LINE_DELIM, $buffer.$read_buffer);
    $read_buffer = '';
    $lines_arr_size = count($lines_arr) - 1;
    $buffer = $lines_arr[$lines_arr_size];
    for ($i = 0; $i < $lines_arr_size; $i++) {
        $line = trim($lines_arr[$i]);
        //There must be a space in each line - we split on it
        if (!strpos($line, ' '))
            continue;
        list($row, $column) = explode(' ', $line, 2);
        $columns_array[$column] = 0;
        //Save line in temporary file
        fwrite($tmpfile, $row.' '.$column.MY_TMPFILE_LINE_DELIM);
    }
}
//Flush a possible trailing line that did not end with the delimiter
$line = trim($buffer);
if (strpos($line, ' ')) {
    list($row, $column) = explode(' ', $line, 2);
    $columns_array[$column] = 0;
    fwrite($tmpfile, $row.' '.$column.MY_TMPFILE_LINE_DELIM);
}
$buffer = '';

fseek($tmpfile, 0);
$cur_row = NULL;
$row_data = array();
while ($read_buffer = fread($tmpfile, MY_TMPFILE_READ_BUFF_LEN)) {
    $lines_arr = explode(MY_TMPFILE_LINE_DELIM, $buffer.$read_buffer);
    $read_buffer = '';
    $lines_arr_size = count($lines_arr) - 1;
    $buffer = $lines_arr[$lines_arr_size];
    for ($i = 0; $i < $lines_arr_size; $i++) {
        list($row, $column) = explode(' ', $lines_arr[$i], 2);
        if ($row !== $cur_row) {
            //Output array
            if ($cur_row !== NULL)
                my_output_arr($cur_row, array_merge($columns_array, $row_data));
            $cur_row = $row;
            $row_data = array();
        }
        $row_data[$column] = 1;
    }
}
if (count($row_data) && $cur_row !== NULL) {
    my_output_arr($cur_row, array_merge($columns_array, $row_data));
}

Here's a MySQL example that works with your supplied test data:
CREATE TABLE `url` (
  `url1` varchar(255) DEFAULT NULL,
  `url2` varchar(255) DEFAULT NULL,
  KEY `url1` (`url1`),
  KEY `url2` (`url2`)
);

INSERT INTO url (url1, url2) VALUES
('image_a', 'tag_lorem'),
('image_a', 'tag_ipsum'),
('image_a', 'tag_amit'),
('image_b', 'tag_sit'),
('image_b', 'tag_dolor'),
('image_b', 'tag_ipsum');

SELECT url1, url2, assigned FROM (
  SELECT t1.url1, t1.url2, 1 AS assigned
  FROM url t1
  UNION
  SELECT t1.url1, t2.url2, 0 AS assigned
  FROM url t1
  JOIN url t2 ON t1.url1 != t2.url1
  JOIN url t3 ON t1.url1 != t3.url1
             AND t1.url2 = t3.url2
             AND t2.url2 != t3.url2
) tmp
ORDER BY url1, url2;
Result:
+---------+-----------+----------+
| url1    | url2      | assigned |
+---------+-----------+----------+
| image_a | tag_amit  |        1 |
| image_a | tag_dolor |        0 |
| image_a | tag_ipsum |        1 |
| image_a | tag_lorem |        1 |
| image_a | tag_sit   |        0 |
| image_b | tag_amit  |        0 |
| image_b | tag_dolor |        1 |
| image_b | tag_ipsum |        1 |
| image_b | tag_lorem |        0 |
| image_b | tag_sit   |        1 |
+---------+-----------+----------+
This should be simple enough to convert to SQLite, so if required you could use PHP to read the data into a temporary SQLite database, and then extract the results.
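A minimal sketch of that SQLite route using PDO; the pair table and its column names are placeholders, not part of the answer above:
<?php
// Hypothetical sketch: stream stdin into a temporary SQLite file via PDO,
// then emit the full image/tag matrix. Names are illustrative only.
$dbFile = tempnam(sys_get_temp_dir(), 'tagmatrix');
$db = new PDO('sqlite:'.$dbFile);
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE pair (image TEXT, tag TEXT)');

// Stream stdin into the table inside one transaction.
$insert = $db->prepare('INSERT INTO pair (image, tag) VALUES (?, ?)');
$stdin = fopen('php://stdin', 'r');
$db->beginTransaction();
while (($line = fgets($stdin)) !== false) {
    $line = trim($line);
    if ($line === '' || strpos($line, ' ') === false) {
        continue;
    }
    list($image, $tag) = explode(' ', $line, 2);
    $insert->execute(array($image, $tag));
}
$db->commit();

// Cross join every distinct image with every distinct tag and flag
// the combinations that actually exist in the input.
$sql = 'SELECT i.image, t.tag,
               EXISTS (SELECT 1 FROM pair p
                       WHERE p.image = i.image AND p.tag = t.tag) AS assigned
        FROM (SELECT DISTINCT image FROM pair) i
        CROSS JOIN (SELECT DISTINCT tag FROM pair) t
        ORDER BY i.image, t.tag';
foreach ($db->query($sql) as $row) {
    echo $row['image'].' '.$row['tag'].' '.$row['assigned'].PHP_EOL;
}
unlink($dbFile);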

Put your input data into an array and then sort it using usort, defining a comparison function that compares array elements by row value first, and by column value when the row values are equal.
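A minimal sketch of that idea, assuming the stdin pairs have already been collected into an array $pairs (note it holds the whole expanded matrix in memory, which the question is trying to avoid):
// Hypothetical sketch of the usort approach (assumes $pairs holds the
// input as array(array('image_a', 'tag_lorem'), ...)).
$images = array();
$tags = array();
$assigned = array();
foreach ($pairs as $p) {
    $images[$p[0]] = true;
    $tags[$p[1]] = true;
    $assigned[$p[0].' '.$p[1]] = true;
}

// Expand to the full image x tag matrix with a 1/0 flag.
$matrix = array();
foreach (array_keys($images) as $image) {
    foreach (array_keys($tags) as $tag) {
        $matrix[] = array($image, $tag, isset($assigned[$image.' '.$tag]) ? 1 : 0);
    }
}

// Comparison function: row value first, column value when rows are equal.
usort($matrix, function ($a, $b) {
    return $a[0] === $b[0] ? strcmp($a[1], $b[1]) : strcmp($a[0], $b[0]);
});

foreach ($matrix as $entry) {
    echo implode(' ', $entry).PHP_EOL;
}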

This is my current implementation. I don't like it, but it does the job for now.
#!/usr/bin/env php
<?php
define('CACHE_MATCH', 0);
define('CACHE_COLUMN', 1);
define('INPUT_ROW', 0);
define('INPUT_COLUMN', 1);
define('INPUT_COUNT', 2);

output_expanded_entries(
    cache_input(array(tmpfile(), tmpfile()), STDIN, fgets(STDIN))
);
echo memory_get_peak_usage();

function cache_input(Array $cache_files, $input_pointer, $input) {
    if(count($cache_files) != 2) {
        throw new Exception('$cache_files requires 2 file pointers');
    }
    if(feof($input_pointer) == FALSE) {
        cache_match($cache_files[CACHE_MATCH], trim($input));
        cache_column($cache_files[CACHE_COLUMN], process_line($input));
        cache_input(
            $cache_files,
            $input_pointer,
            fgets($input_pointer)
        );
    }
    return $cache_files;
}

function cache_column($cache_column, $input) {
    if(empty($input) === FALSE) {
        rewind($cache_column);
        $column = get_field($input, INPUT_COLUMN);
        if(column_cached_in_memory($column) === FALSE && column_cached_in_file($cache_column, fgets($cache_column), $column) === FALSE) {
            fputs($cache_column, $column . PHP_EOL);
        }
    }
}

function cache_match($cache_match, $input) {
    if(empty($input) === FALSE) {
        fputs($cache_match, $input . PHP_EOL);
    }
}

function column_cached_in_file($cache_column, $current, $column, $result = FALSE) {
    return $result === FALSE && feof($cache_column) === FALSE ?
        column_cached_in_file($cache_column, fgets($cache_column), $column, $column == $current)
        : $result;
}

function column_cached_in_memory($column) {
    static $local_cache = array(), $index = 0, $count = 500;
    $result = TRUE;
    if(in_array($column, $local_cache) === FALSE) {
        $result = FALSE;
        $local_cache[$index++ % $count] = $column;
    }
    return $result;
}

function output_expanded_entries(Array $cache_files) {
    array_map('rewind', $cache_files);
    for($current_row = NULL, $cache = array(); feof($cache_files[CACHE_MATCH]) === FALSE;) {
        $input = process_line(fgets($cache_files[CACHE_MATCH]));
        if(empty($input) === FALSE) {
            if($current_row !== get_field($input, INPUT_ROW)) {
                output_cache($current_row, $cache);
                $cache = read_columns($cache_files[CACHE_COLUMN]);
                $current_row = get_field($input, INPUT_ROW);
            }
            $cache = array_merge(
                $cache,
                array(get_field($input, INPUT_COLUMN) => get_field($input, INPUT_COUNT))
            );
        }
    }
    output_cache($current_row, $cache);
}

function output_cache($row, $column_count_list) {
    if(count($column_count_list) != 0) {
        printf(
            '%s %s %s%s',
            $row,
            key(array_slice($column_count_list, 0, 1)),
            current(array_slice($column_count_list, 0, 1)),
            PHP_EOL
        );
        output_cache($row, array_slice($column_count_list, 1));
    }
}

function get_field(Array $input, $field) {
    $result = NULL;
    if(in_array($field, array_keys($input))) {
        $result = $input[$field];
    } elseif($field == INPUT_COUNT) {
        $result = 1;
    }
    return $result;
}

function process_line($input) {
    $result = trim($input);
    return empty($result) === FALSE && strpos($result, ' ') !== FALSE ?
        explode(' ', $result)
        : NULL;
}

function push_column($input, Array $result) {
    return empty($input) === FALSE && is_array($input) ?
        array_merge(
            $result,
            array(get_field($input, INPUT_COLUMN))
        )
        : $result;
}

function read_columns($cache_columns) {
    rewind($cache_columns);
    $result = array();
    while(feof($cache_columns) === FALSE) {
        $column = trim(fgets($cache_columns));
        if(empty($column) === FALSE) {
            $result[$column] = 0;
        }
    }
    return $result;
}
EDIT: yesterday's version was bugged :/

Related

Value of previous row in CSV import via PHP

I'm importing data from a CSV via PHP.
On some lines of the CSV, column 1 has a value, and on other lines it does not have a value. Column 2 always has a value.
If column 1 is empty and column 2 has a value, I would like column 1 to use the previous value in column 1. For example
|-------|-------|
| Col 1 | Col 2 |
|-------|-------|
| A     | 2     |
| B     | 5     |
| C     | 3     |
|       | 1     |
| D     | 7     |
Would return
A2
B5
C3
C1
D7
I can use prev() on an array, but as $data[0] is a string I can't use prev()
while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
    if ($i > 0) {
        if ($data[0] == '' && $data[1] !== '') {
            echo prev($data[0]).$data[1].'<br/>';
        } else {
            echo $data[0].$data[1].'<br/>';
        }
    }
    $i++;
}
Banging my head against a wall for hours now, so I'd love a pointer!
This should work as long as the first row is not empty...
$x = null;
while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
    if ($i > 0) {
        $x = $data[0] == '' ? $x : $data[0];
        // This is like: if $data[0] is empty then keep $x, else set $x to $data[0].
        // That way you keep track of the previous value:
        // if there is a new one, overwrite it, otherwise keep the old value.
        echo $x.$data[1].'<br/>';
    }
    $i++;
}
You can store the previous value in a variable and use that:
$previousColumn1 = '';
while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
    if ($i > 0) {
        if ($data[0] == '' && $data[1] !== '') {
            echo $previousColumn1.$data[1].'<br/>';
        } else {
            $previousColumn1 = $data[0];
            echo $data[0].$data[1].'<br/>';
        }
    }
    $i++;
}
As has been mentioned, just storing the previous value and overwriting it any time a new value is given will do the job. It assumes that if the first row is blank, the $prev value you provide outside the loop will be used...
$prev = '';
while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
    $prev = $data[0] ?: $prev;
    echo $prev.$data[1].PHP_EOL;
}
This uses the short form of ?: to say: if there is a value for $data[0], use it, otherwise use $prev.

How to sort CSV table data in PHP by a specific column?

Here's the CSV table:
-------------------------
| Name | Age | Favorite |
-------------------------
| John | 30  | Apple    |
-------------------------
| Bill | 25  | Grape    |
-------------------------
| Ann  | 40  | Orange   |
-------------------------
Now, using strictly PHP, is there any way to sort only the "Favorite" by ascending order of "Age"? An expected output would be something like this:
25 Grape
30 Apple
40 Orange
I've been using fgetcsv to echo them onto the document, but they are not sorted by ascending age, of course. Is there any way to throw these into an array or something, sort by age, and then echo?
To open up your CSV file:
function readCSV($file)
{
    $row = 0;
    $csvArray = array();
    if( ( $handle = fopen($file, "r") ) !== FALSE ) {
        while( ( $data = fgetcsv($handle, 0, ";") ) !== FALSE ) {
            $num = count($data);
            for( $c = 0; $c < $num; $c++ ) {
                $csvArray[$row][] = $data[$c];
            }
            $row++;
        }
    }
    if( !empty( $csvArray ) ) {
        return array_splice($csvArray, 1); //cut off the first row (names of the fields)
    } else {
        return false;
    }
}
$csvData = readCSV($csvPath); //This is your array with the data
Then you could use array_multisort() to sort it on a value.
<?php
// Obtain a list of columns (age is column 1, favorite is column 2)
foreach ($csvData as $key => $row) {
    $age[$key] = $row[1];
    $favorite[$key] = $row[2];
}
// Sort the data with age first, then favorite
// Add $csvData as the last parameter, to sort by the common key
array_multisort($age, SORT_ASC, $favorite, SORT_ASC, $csvData);
?>
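Alternatively, a usort on the rows themselves avoids building the separate column arrays. This is only a sketch, assuming readCSV() returned numerically indexed rows with age in column 1 and favorite in column 2:
$csvData = readCSV($csvPath);
// Sort rows by the age column (index 1), comparing numerically.
usort($csvData, function ($a, $b) {
    return (int)$a[1] - (int)$b[1];
});
foreach ($csvData as $row) {
    echo $row[1].' '.$row[2].PHP_EOL; // e.g. "25 Grape"
}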

PHP: How can I combine two CSVs based on common fields in the rows of each CSV?

I have two CSVs:
1st csv is something like this:
number | color | size | animal
**1234** | black | big | cat
2nd csv is like this:
name | country | os | number | flavour | yesorno
john | world | windows | **1234** | good | yes
What I'm trying to do is merge both CSVs (header titles and values of each row) based on matching number values:
number | color | size | animal | name | country | os | flavour | yesorno
**1234** | black | big | cat | john | world | windows | good | yes
I have been trying to use fgetcsv with keys, but I am really a newbie to PHP and I do not know how to do that. I need to understand the logic. Could anyone please help?
Many thanks.
-- edit for better understanding --
Based on another question on Stack Overflow, I tried the following code, which is not working well. The headers from the two CSV files are not merged, and data from one CSV is missing; only one row of data gets merged.
The code was found in another question, Merging two csv files together using php, and seemed like a perfect base for what I am trying to achieve. Unfortunately the output CSV is malformed...
<?php
// 1st section
$fh = fopen('csv1.csv', 'r');
$fhg = fopen('csv2.csv', 'r');
while (($data = fgetcsv($fh, 0, ";")) !== FALSE) {
$csv1[] = $data;
}
while (($data = fgetcsv($fhg, 0, ",")) !== FALSE) {
$csv2[] = $data;
}
// 2nd section
for ($x = 0; $x < count($csv2data); $x++) {
if ($x == 0) {
unset($csv1data[0][17]);
$line[$x] = array_merge($csv2data[0], $csv1data[17]); //header
} else {
$deadlook = 0;
for ($y = 0; $y <= count($csv1data); $y++) {
if($csv1data[$y][17] == $csv2data[$x][0]){
unset($csv1data[$y][17]);
$line[$x]=array_merge($csv2data[$x],$csv1data[$y]);
$deadlook=1;
}
}
if ($deadlook == 0)
$line[$x] = $csv2data[$x];
}
}
// 3 section
$fp = fopen('final.csv', 'w'); //output file set here
foreach ($line as $fields) {
fputcsv($fp, $fields);
}
fclose($fp);
?>
$fh = fopen('csv1', 'r');
$fhg = fopen('csv2', 'r');
while (($data = fgetcsv($fh, 0, ",")) !== FALSE) {
    $csv1[] = $data;
}
while (($data = fgetcsv($fhg, 0, ",")) !== FALSE) {
    $csv2[] = $data;
}
// 2nd section
for ($x = 0; $x < count($csv2); $x++) {
    if ($x == 0) {
        unset($csv1[0][0]);
        $line[$x] = array_merge($csv2[0], $csv1[0]); //header
    } else {
        $deadlook = 0;
        for ($y = 0; $y < count($csv1); $y++) {
            if ($csv1[$y][0] == $csv2[$x][0]) {
                unset($csv1[$y][0]);
                $line[$x] = array_merge($csv2[$x], $csv1[$y]);
                $deadlook = 1;
            }
        }
        if ($deadlook == 0)
            $line[$x] = $csv2[$x];
    }
}
// 3 section
$fp = fopen('final.csv', 'w'); //output file set here
foreach ($line as $fields) {
    fputcsv($fp, $fields);
}
fclose($fp);
In section 1,
the script opens those two files and puts their contents into the arrays $csv1[] and $csv2[], so $csv1[0] is the first line of csv1.csv, $csv1[1] is the second line, etc., and the same for $csv2[].
In section 2,
I compare the first word of every row, $csv1[x][0] and $csv2[x][0].
The first if, if($x==0), is there to build the header: first I delete the first word, contractname, using the unset function, then join $csv1[0] and $csv2[0] using array_merge.
Then, using those for loops, I take the first word of every line from the $csv2 array and compare it with the first word of every line from the $csv1 array. So if($csv1[$y][0] == $csv2[$x][0]) checks whether those first words are the same; if they are, I delete the key from the $csv1 row and merge the two lines.
If the words aren't the same, I save the $csv2[x] line in the $line array and continue.
In section 3,
save the contents of the $line array to the file.
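An alternative sketch (not from the answer above): index the first file by its join column so the inner loop disappears. Like the code above, it assumes the join key is the first column of both files and that $csv1/$csv2 were filled by the reading loops.
// Hypothetical sketch: build a lookup of csv1 rows keyed by the join
// column, then walk csv2 once and append the matching fields.
$byKey = array();
foreach ($csv1 as $i => $row) {
    if ($i === 0) { continue; }               // skip csv1's header row
    $byKey[$row[0]] = array_slice($row, 1);   // drop the duplicated key column
}

$out = array();
$out[] = array_merge($csv2[0], array_slice($csv1[0], 1)); // merged header
for ($x = 1; $x < count($csv2); $x++) {
    $key = $csv2[$x][0];
    $out[] = isset($byKey[$key])
        ? array_merge($csv2[$x], $byKey[$key])
        : $csv2[$x];
}

$fp = fopen('final.csv', 'w');
foreach ($out as $fields) {
    fputcsv($fp, $fields);
}
fclose($fp);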
$fromFile  = fopen($REcsv, 'r');
$fromFile2 = fopen($pathToFile, 'r');
$toFile    = fopen($reportFile, 'w');
$delimiter = ';';
$line  = fgetcsv($fromFile, 65536, $delimiter);
$line2 = fgetcsv($fromFile2, 65536, $delimiter);
// Read both files in lock-step and write each merged row
while ($line !== false && $line2 !== false) {
    fputcsv($toFile, array_merge($line2, $line), $delimiter);
    $line  = fgetcsv($fromFile, 65536, $delimiter);
    $line2 = fgetcsv($fromFile2, 65536, $delimiter);
}
fclose($fromFile);
fclose($fromFile2);
fclose($toFile);

Grouping similar records with php

I need help writing the logic of a PHP script which sorts data into a certain format...
Firstly the script needs to loop through each s1 value and ping an endpoint to get the ml ("more like") values, which actually reference other s1 records. This is the easy part! The data is returned like so:
Table 1
s1 | ml
----------
1  | -
2  | 3,4
3  | 2,8,9
4  | -
5  | 2
6  | 1
7  | 10
8  | -
9  | -
10 | -
Condition 1: As you can see, the endpoint returns data for each s1 value telling it which other s1 records are similar, but the ml relationship is not always bidirectional. Sometimes, as when s1=6, the ml value is 1, yet for s1=1 there isn't an ml value.
Condition 2: Again, just to explain the ml records: look at where s1=5 (above) and where s1=2 + rec=5 (below). The script needs to realise there is already an s1 record for its value and that it should be added there.
Condition 3: Note how s1=2, ml=3 is stored, but s1=3, ml=2 is ignored because we already have the reverse record.
I basically want to match all the data into one sorted 'profile' so it ends up in the format below, which I will store in another db table of 'sorted' records.
Table 2
s1 | rec
----------
2  | 3
2  | 4
2  | 8
2  | 9
2  | 9
2  | 5
6  | 1
7  | 10
This has been racking my brains for days now. I need something that's efficient because in the end it will deal with millions of records, and I'm sure there is an easy solution, but I just can't figure out how to start.
I tried the following, but I'm stuck and don't know how to go further:
public function getrelated($id = '', $t = '') {
    if ($id != "") {
        $get = Easytest::where('s1', '=', $id)->get();
        if (count($get) > 0) {
            $ret = array();
            foreach ($get as $go) {
                $v = explode(",", $go->s2);
                foreach ($v as $e) {
                    if ($e != $t) {
                        $ret[$e] = $this->getrelated($e, $id);
                    }
                }
            }
            if (count($ret) > 0) {
                return $ret;
            } else {
                return "";
            }
        } else {
            return $id;
        }
    } else {
        return "";
    }
}

public function easytest() {
    ob_start();
    $a = array(
        array("s1" => 1,  "s2" => implode(",", array()).""),
        array("s1" => 2,  "s2" => implode(",", array(3, 4)).","),
        array("s1" => 3,  "s2" => implode(",", array(2, 8, 9)).","),
        array("s1" => 4,  "s2" => implode(",", array()).""),
        array("s1" => 5,  "s2" => implode(",", array(2)).","),
        array("s1" => 6,  "s2" => implode(",", array(1)).","),
        array("s1" => 7,  "s2" => implode(",", array(10)).","),
        array("s1" => 8,  "s2" => implode(",", array()).""),
        array("s1" => 9,  "s2" => implode(",", array()).""),
        array("s1" => 10, "s2" => implode(",", array()).""),
        array("s1" => 11, "s2" => implode(",", array(12)).","),
        array("s1" => 12, "s2" => implode(",", array(2)).",")
    );
    //return Easytest::insert($a);
    $records = Easytest::all();
    foreach ($records as $record) {
        $id = $record->s1;
        echo "ROW: ".$id." > ";
        $record->s2 = ltrim($record->s2, ",");
        $ml = explode(",", $record->s2);
        if (count($ml) >= 1) {
            foreach ($ml as $t) {
                echo "RESULT: ".$t." -".print_r($this->getrelated($t, $id), true);
                echo ",\n";
            }
        }
        echo " <br><br>\n\n";
    }
    return ob_get_clean();
}
OK, so I eventually solved this... essentially with the code below;
improvements welcome :)
You need to call the function like so:
related(array('searched'=>array(),'tosearch'=>array(13)));
The function:
public function related($input) {
    $searched = $input['searched'];
    $ar = array();
    $bits = array();
    if (count($input['tosearch']) != 0) {
        $get = Easytest::orWhere(function($query) use ($input) {
                foreach ($input['tosearch'] as $k => $v) {
                    $query->orWhere('s2', 'LIKE', '%,'.$v.',%')->orWhere('s1', '=', $v);
                }
            })
            ->orderBy('s1', 'ASC')->get();
        foreach ($input['tosearch'] as $k => $v) {
            unset($input['tosearch'][$k]);
            $input['searched'][$v] = $v;
        }
        foreach ($get as $result) {
            $thesebits = explode(",", trim($result->s2, ","));
            foreach ($thesebits as $smallbit) {
                if ($smallbit != "") {
                    $bits[] = $smallbit;
                }
            }
            $bits[] = $result->s1;
            $bits = array_unique($bits);
            foreach ($bits as $k => $v) {
                if (($key = array_search($v, $input['searched'])) == false) {
                    $input['tosearch'][$v] = $v;
                } else {
                    unset($input['tosearch'][$v]);
                }
            }
            $input['tosearch'] = array_unique($input['tosearch']);
        }
        return $this->related($input);
    } else {
        return $input;
    }
}

Slow php array processing

My question might be a bit vague, because I cannot quite figure it out.
I have a piece of PHP that tries to convert a mysql query result into an array "tree".
I.e. arrays of arrays depending on the defined groups.
The code assumes that a column name starting with a double underscore __ indicates grouping, and that the results are already ordered by the grouping columns.
The code works, but in certain cases it slows down to unusable speeds.
Cases in which I would expect it to be fast (only one grouping with a few unique values and many items in each branch) sometimes take up to 30 seconds.
Other cases, with many layers of branches and many different values, take only 1 second. (The result set is usually around 20 000 rows.)
So my question, I guess, is simply: what is wrong with my code? Where am I messing up so badly that it impacts performance this significantly?
P.S. I'm a relative PHP novice, so be gentle :)
Sorry, no code comments O_o
$encodable = array();
$rownum = 0;
$branch = null;
$row = null;
$first = true;
$NULL = null;
$result = mysql_query($value, $mysql);
error_log(date("F j, Y, g:i a")."\r\n", 3, "debug.log");
if (gettype($result) == "resource")
{
    while ($obj = mysql_fetch_object($result))
    {
        $newrow = true;
        $branch = &$encodable;
        $row = &$NULL;
        if (count($branch) > 0)
        {
            $row = &$branch[count($branch)-1];
        }
        foreach ($obj as $column => $value)
        {
            if ($column[0] == '_' && $column[1] == '_')
            {
                $gname = substr($column, 2);
                if (isset($row[$gname]) && $row[$gname] == $value)
                {
                    $branch = &$row["b"];
                    $row = &$NULL;
                    if (count($branch) > 0)
                    {
                        $row = &$branch[count($branch)-1];
                    }
                }
                else
                {
                    $branch[] = array();
                    $row = &$branch[count($branch)-1];
                    $row[$gname] = $value;
                    $row["b"] = array();
                    $branch = &$row["b"];
                    $row = &$NULL;
                    if (count($branch) > 0)
                    {
                        $row = &$branch[count($branch)-1];
                    }
                }
            }
            else
            {
                if ($newrow)
                {
                    $branch[] = array();
                    $row = &$branch[count($branch)-1];
                    $newrow = false;
                }
                $row[$column] = $value;
            }
        }
        $rownum++;
    }
}
$encoded = json_encode($encodable);
EDIT:
A sample output: the resulting array is converted to JSON.
This small set is grouped by "av"; b is created by the code for each branch and contains the list of [hid, utd] records per av.
[{"av":"eset nod","b":[{"hid":"3","utd":"1"}]},{"av":"None","b":[{"hid":"2","utd":"0"},{"hid":"4","utd":"0"},{"hid":"5","utd":"0"},{"hid":"1","utd":"0"}]}]
The actual SQL result that produced this output is:
+----------+-----+-----+
| __av     | hid | utd |
+----------+-----+-----+
| eset nod |   3 |   1 |
| None     |   2 |   0 |
| None     |   4 |   0 |
| None     |   5 |   0 |
| None     |   1 |   0 |
+----------+-----+-----+
Turns out it's all the calls to count($branch).
Apparently, calling a function that does not expect its argument by reference (such as count) with a variable that is a reference causes the engine to make a copy of the variable to operate on.
In my case, arrays with thousands of elements, which also explains why the results with few (but large) branches are the ones that suffer the most.
See this thread:
Why is calling a function (such as strlen, count etc) on a referenced value so slow?
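A rough, hypothetical illustration of that effect on PHP 5.x, plus the workaround described above:
// Illustration only: on PHP 5.x, count() on a variable that is part of a
// reference set forces a copy of the whole array before the call.
$plain  = range(1, 200000);   // ordinary variable, copy-on-write applies
$shared = range(1, 200000);
$alias  = &$shared;           // $shared and $alias now form a reference set

$start = microtime(true);
for ($i = 0; $i < 1000; $i++) { count($shared); }
printf("count() on referenced array: %.3fs\n", microtime(true) - $start);

$start = microtime(true);
for ($i = 0; $i < 1000; $i++) { count($plain); }
printf("count() on plain array:      %.3fs\n", microtime(true) - $start);

// The fix in the tree-building loop above is to remember the last index
// yourself right after each append (e.g. $last = count($branch) - 1)
// instead of calling count($branch) on the reference for every row.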
