I use the PHP script below to remove some columns from a CSV, reorder them, and save the result as a new file.
It works for the file I wrote it for.
Now I need to do the same with another CSV, but I don't know what's wrong. I always get a comma before the data in the first column.
This is what I have, but it doesn't really work.
<?php
$input = 'http://***/original.csv';
$output = 'new.csv';

if (false !== ($ih = fopen($input, 'r'))) {
    $oh = fopen($output, 'w');
    while (false !== ($data = fgetcsv($ih))) {
        // this is where you build your new row
        $outputData = array($data[4], $data[0]);
        fputcsv($oh, $outputData);
    }
    fclose($ih);
    fclose($oh);
}
The original.csv looks like this:
subproduct_number barcode stock week_number qty_estimate productid variantid
05096470000 4024144513543 J 3 6 35016
ae214 848518017215 N 23 0 7 35015
05097280000 4024144513727 J 1 32 34990
The separator is ';'. The same separator is used in the file that is working.
But here it goes wrong, because my saved new.csv looks like this:
subproduct_number barcode stock week_number qty_estimate productid variantid
,05096470000 4024144513543 J 3 6 35016
,ae214 848518017215 N 23 0 7 35015
,05097280000 4024144513727 J 1 32 34990
But what i need is a new csv that looks like this:
qty_estimate subproduct_number
3 05096470000
0 ae214
1 05097280000
As you can see, I only need the 5th column ($data[4]) as the first one and the first column ($data[0]) as the second one.
I hope someone can point me in the right direction.
Thanks
You can do it like this. Your file is separated by ';', but fgetcsv() splits on ',' by default, so the whole line ends up in $data[0] (and $data[4] is empty, which gives you the leading comma). Split the line yourself and write it back out with ';' as the delimiter:
while (false !== ($data = fgetcsv($ih))) {
    $data = explode(';', $data[0]);
    $outputData = array($data[4], $data[0]);
    fputcsv($oh, $outputData, ';');
}
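Alternatively (just a sketch, not part of the original answer), you can pass the delimiter straight to fgetcsv() so the line is split correctly in the first place and no explode() is needed:
while (false !== ($data = fgetcsv($ih, 0, ';'))) {
    // $data[4] = qty_estimate, $data[0] = subproduct_number
    $outputData = array($data[4], $data[0]);
    fputcsv($oh, $outputData, ';');
}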
Here's the CSV table:
-------------------------
| Name | Age | Favorite |
-------------------------
| John | 30 | Apple |
-------------------------
| Bill | 25 | Grape |
-------------------------
| Ann | 40 | Orange |
-------------------------
Now, using strictly PHP, is there any way to sort only the "Favorite" by ascending order of "Age"? An expected output would be something like this:
25 Grape
30 Apple
40 Orange
I've been using fgetcsv to echo them onto the document, but they are not sorted by ascending age, of course. Is there any way to throw these into an array or something, sort by age, and then echo?
To open up your CSV file:
function readCSV($file)
{
    $row = 0;
    $csvArray = array();
    if (($handle = fopen($file, "r")) !== FALSE) {
        while (($data = fgetcsv($handle, 0, ";")) !== FALSE) {
            $num = count($data);
            for ($c = 0; $c < $num; $c++) {
                $csvArray[$row][] = $data[$c];
            }
            $row++;
        }
        fclose($handle);
    }
    if (!empty($csvArray)) {
        return array_splice($csvArray, 1); // cut off the first row (names of the fields)
    } else {
        return false;
    }
}
$csvData = readCSV($csvPath); //This is your array with the data
Then you could use array_multisort() to sort it on a value.
<?php
// Obtain a list of columns (0 = Name, 1 = Age, 2 = Favorite)
foreach ($csvData as $key => $row) {
    $age[$key] = $row[1];
    $favorite[$key] = $row[2];
}

// Sort the data by age first, then favorite
// Add $csvData as the last parameter, to sort by the common key
array_multisort($age, SORT_ASC, SORT_NUMERIC, $favorite, SORT_ASC, $csvData);
?>
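To print the sorted result afterwards (a minimal sketch, not part of the original answer, assuming the columns are Name;Age;Favorite, so index 1 is Age and index 2 is Favorite):
foreach ($csvData as $row) {
    echo $row[1] . ' ' . $row[2] . PHP_EOL; // e.g. "25 Grape"
}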
I have two CSVs:
1st csv is something like this:
number | color | size | animal
**1234** | black | big | cat
2nd csv is like this:
name | country | os | number | flavour | yesorno
john | world | windows | **1234** | good | yes
What I'm trying to do is merge both CSVs (header titles and the values of each row) based on matching number values:
number | color | size | animal | name | country | os | flavour | yesorno
**1234** | black | big | cat | john | world | windows | good | yes
I have been trying to use fgetcsv and keys, but I am really a newbie to PHP and I do not know how to do that. I need to understand the logic. Could anyone please help?
Many thanks.
-- edit for better understanding --
Based on another question on Stack Overflow, I have tried the following code, which is not working well. The two headers from both CSV files are not merged. It is also missing all the data from one CSV; it only has one row of data merged.
The code was found in another question, Merging two csv files together using php, and seemed like a perfect base for what I am trying to achieve. Unfortunately the output CSV is malformed...
<?php
// 1st section
$fh = fopen('csv1.csv', 'r');
$fhg = fopen('csv2.csv', 'r');

while (($data = fgetcsv($fh, 0, ";")) !== FALSE) {
    $csv1[] = $data;
}
while (($data = fgetcsv($fhg, 0, ",")) !== FALSE) {
    $csv2[] = $data;
}

// 2nd section
for ($x = 0; $x < count($csv2data); $x++) {
    if ($x == 0) {
        unset($csv1data[0][17]);
        $line[$x] = array_merge($csv2data[0], $csv1data[17]); // header
    } else {
        $deadlook = 0;
        for ($y = 0; $y <= count($csv1data); $y++) {
            if ($csv1data[$y][17] == $csv2data[$x][0]) {
                unset($csv1data[$y][17]);
                $line[$x] = array_merge($csv2data[$x], $csv1data[$y]);
                $deadlook = 1;
            }
        }
        if ($deadlook == 0)
            $line[$x] = $csv2data[$x];
    }
}

// 3rd section
$fp = fopen('final.csv', 'w'); // output file set here
foreach ($line as $fields) {
    fputcsv($fp, $fields);
}
fclose($fp);
?>
$fh = fopen('csv1', 'r');
$fhg = fopen('csv2', 'r');

while (($data = fgetcsv($fh, 0, ",")) !== FALSE) {
    $csv1[] = $data;
}
while (($data = fgetcsv($fhg, 0, ",")) !== FALSE) {
    $csv2[] = $data;
}
fclose($fh);
fclose($fhg);

// 2nd section
for ($x = 0; $x < count($csv2); $x++) {
    if ($x == 0) {
        unset($csv1[0][0]);
        $line[$x] = array_merge($csv2[0], $csv1[0]); // header
    } else {
        $deadlook = 0;
        for ($y = 0; $y < count($csv1); $y++) {
            if ($csv1[$y][0] == $csv2[$x][0]) {
                unset($csv1[$y][0]);
                $line[$x] = array_merge($csv2[$x], $csv1[$y]);
                $deadlook = 1;
            }
        }
        if ($deadlook == 0)
            $line[$x] = $csv2[$x];
    }
}

// 3rd section
$fp = fopen('final.csv', 'w'); // output file set here
foreach ($line as $fields) {
    fputcsv($fp, $fields);
}
fclose($fp);
In section 1,
I open those two files and put their contents into the arrays $csv1[] and $csv2[], so $csv1[0] is the first line of csv1.csv, $csv1[1] is the second line of csv1.csv, and so on; the same applies to $csv2[].
In section 2,
I compare the first field of every line, $csv1[$x][0] and $csv2[$x][0].
The first if, if($x==0), builds the header: first, I delete the duplicated key column (the first field) using unset() and join $csv1[0] and $csv2[0] using array_merge().
Then, using those for loops, I take the first field of every line from the $csv2 array and compare it with the first field of every line from the $csv1 array. So if($csv1[$y][0] == $csv2[$x][0]) checks whether those first fields are the same; if they are the same string, I delete the duplicate and merge those lines.
If they aren't the same, I save the $csv2[$x] line in the $line array and continue.
In section 3,
I save the contents of the $line array to the file.
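Not the poster's code, but a keyed-lookup sketch of the same merge that avoids the nested loop; it assumes, like the code above, that the matching key is the first column of both files (the output section stays the same):
$index = array();
for ($y = 1; $y < count($csv1); $y++) { // skip the csv1 header row
    $index[$csv1[$y][0]] = array_slice($csv1[$y], 1);
}
$line = array(array_merge($csv2[0], array_slice($csv1[0], 1))); // merged header
for ($x = 1; $x < count($csv2); $x++) {
    $key = $csv2[$x][0];
    $line[] = isset($index[$key])
        ? array_merge($csv2[$x], $index[$key])
        : $csv2[$x];
}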
$fromFile = fopen($REcsv, 'r');
$fromFile2 = fopen($pathToFile, 'r');
$toFile = fopen($reportFile, 'w');
$delimiter = ';';

$line = fgetcsv($fromFile, 65536, $delimiter);
$line2 = fgetcsv($fromFile2, 65536, $delimiter);

// Merge the two files row by row until either one runs out of lines
while ($line !== false && $line2 !== false) {
    fputcsv($toFile, array_merge($line2, $line), $delimiter);
    $line = fgetcsv($fromFile, 65536, $delimiter);
    $line2 = fgetcsv($fromFile2, 65536, $delimiter);
}

fclose($fromFile);
fclose($fromFile2);
fclose($toFile);
I have a CSV file, built in this format:
| id | keyword | description | more ... |
| 1 | myKeyword | MyDescription | some stuff |
| 2 | | | some stuff |
| 3 | myKeyword | MyDescription | some stuff |
Right now, I want to create a small PHP script which reads only the first three columns and saves them in an array afterwards. For that, I looked up fgetcsv, but that didn't do the trick. Then I tried something like this:
$csv = array_map('str_getcsv', file('myFile.csv'));
but that wasn't it, either.
The file I got is pretty messy, which is why I have to check whether the keywords and descriptions even exist (more precisely, that they are not '').
My output array should be something like:
$array[0][0] = 1
$array[0][1] = MyKeyword
$array[0][2] = MyDescription
$array[1][0] = 3
$array[1][1] = MyKeyword
$array[1][2] = MyDescription
Thank you already in advance
Okay, never mind, I got it.
I found a pretty good solution in the PHP documentation for fgetcsv:
<?php
function get2DArrayFromCsv($file, $delimiter) {
    $data2DArray = array();
    if (($handle = fopen($file, "r")) !== FALSE) {
        $i = 0;
        while (($lineArray = fgetcsv($handle, 4000, $delimiter)) !== FALSE) {
            for ($j = 0; $j < count($lineArray); $j++) {
                $data2DArray[$i][$j] = $lineArray[$j];
            }
            $i++;
        }
        fclose($handle);
    }
    return $data2DArray;
}
?>
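For completeness, a minimal sketch (not part of the original answer) of how the result could be reduced to the first three columns while rows with an empty keyword or description are skipped; the ';' delimiter is an assumption:
$rows = get2DArrayFromCsv('myFile.csv', ';');
$array = array();
foreach (array_slice($rows, 1) as $row) { // skip the header row
    if (isset($row[1], $row[2]) && $row[1] !== '' && $row[2] !== '') {
        $array[] = array($row[0], $row[1], $row[2]); // id, keyword, description
    }
}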
Read the CSV line by line, split each line on your delimiter, and append the resulting arrays to a result array.
$array1 = array();
$handle = fopen("file.csv", "r");
if ($handle) {
    while (($line = fgets($handle)) !== false) {
        $array1[] = explode('|', trim($line));
    }
    fclose($handle);
}
You could first parse everything with:
the function str_getcsv, using its options (delimiter and enclosure). The "basic" function may not be able to handle the syntax of your CSV file, as there is no real standard for it;
or, if you think it is worth it, this parser class.
Then you may either drop the undesired elements of your table or just ignore them. I hope it helps you a bit.
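A minimal sketch of the str_getcsv route, assuming a '|' delimiter and double-quote enclosure (adjust both to whatever the file really uses):
$csv = array_map(function ($line) {
    return str_getcsv($line, '|', '"');
}, file('myFile.csv', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES));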
Given input, which shows tag assignments to images, as follows (reading this from php://stdin line by line, as the input can get rather large)
image_a tag_lorem
image_a tag_ipsum
image_a tag_amit
image_b tag_sit
image_b tag_dolor
image_b tag_ipsum
... (there are more lines, may get up to a million)
The expected output is shown as follows. Basically it is the same format, with another field showing whether the image-tag combination exists in the input. Note that for every image it will list all the available tags and show whether the tag is assigned to the image, using 1/0 at the end of each line.
image_a tag_sit 0
image_a tag_lorem 1
image_a tag_dolor 0
image_a tag_ipsum 1
image_a tag_amit 1
image_b tag_sit 1
image_b tag_lorem 0
image_b tag_dolor 1
image_b tag_ipsum 1
image_b tag_amit 0
... (more)
I have posted my not-so-efficient solution down below. To give a better picture of the input and output, I fed 745 rows (which describe the tag assignments of 10 images) into the script via stdin, and I get 555025 lines after the execution of the script, using about 0.4 MB of memory. However, it may kill the hard disk faster because of the heavy disk I/O activity (while writing/reading the temporary column cache file).
Is there any other way of doing this? I have another script that can turn the stdin into something like this (not sure if this is useful):
image_foo tag_lorem tag_ipsum tag_amit
image_bar tag_sit tag_dolor tag_ipsum
p/s: the order of tag_* is not important, but it has to be the same for all rows, i.e. this is not what I want (notice the order of tag_* is inconsistent between image_foo and image_bar):
image_foo tag_lorem 1
image_foo tag_ipsum 1
image_foo tag_dolor 0
image_foo tag_sit 0
image_foo tag_amit 1
image_bar tag_sit 1
image_bar tag_lorem 0
image_bar tag_dolor 1
image_bar tag_ipsum 1
image_bar tag_amit 0
p/s2: I don't know the range of tag_* until I finish reading stdin.
p/s3: I don't understand why I got down-voted. If clarification is needed, I am more than happy to provide it; I am not trying to make fun of something or post nonsense here. I have re-written the question to make it sound more like a real problem (?). However, the script really doesn't have to care about what the input actually is or whether a database is used (well, the data is retrieved from an RDF data store if you MUST know), because I want the script to be usable for other types of data as long as the input is in the right format (hence the original version of this question was very general).
p/s4: I am trying to avoid using arrays because I want to avoid out-of-memory errors as much as possible (if 745 lines explaining just 10 images expand into 550k lines, just imagine what happens with 100, 1000, or even 10000+ images).
p/s5: if you have an answer in another language, feel free to post it here. I have thought of solving this with Clojure but still couldn't find a way to do it properly.
Sorry, maybe I misunderstood you - this looks too easy:
$stdin = fopen('php://stdin', 'r');
$columns_arr = array();
$rows_arr = array();

function set_empty_vals(&$value, $key, $columns_arr) {
    $value = array_merge($columns_arr, $value);
    ksort($value);
    foreach ($value AS $val_name => $flag) {
        echo $key.' '.$val_name.' '.$flag.PHP_EOL;
    }
    $value = NULL;
}

while ($line = fgets($stdin)) {
    $line = trim($line);
    list($row, $column) = explode(' ', $line);
    $row = trim($row);
    $column = trim($column);
    if (!isset($rows_arr[$row]))
        $rows_arr[$row] = array();
    $rows_arr[$row][$column] = 1;
    $columns_arr[$column] = 0;
}
array_walk($rows_arr, 'set_empty_vals', $columns_arr);
UPD:
1 million lines is easy for PHP:
$columns_arr = array();
$rows_arr = array();

function set_null_arr(&$value, $key, $columns_arr) {
    $value = array_merge($columns_arr, $value);
    ksort($value);
    foreach ($value AS $val_name => $flag) {
        //echo $key.' '.$val_name.' '.$flag.PHP_EOL;
    }
    $value = NULL;
}

for ($i = 0; $i < 100000; $i++) {
    for ($j = 0; $j < 10; $j++) {
        $row = 'row_foo'.$i;
        $column = 'column_ipsum'.$j;
        if (!isset($rows_arr[$row]))
            $rows_arr[$row] = array();
        $rows_arr[$row][$column] = 1;
        $columns_arr[$column] = 0;
    }
}
array_walk($rows_arr, 'set_null_arr', $columns_arr);
echo memory_get_peak_usage();
147 MB for me.
Last UPD: this is how I would write a low-memory (but still reasonably fast) script:
//Approximate stdin buffer size, 1Mb should be good
define('MY_STDIN_READ_BUFF_LEN', 1048576);
//Approximate tmpfile buffer size, 1Mb should be good
define('MY_TMPFILE_READ_BUFF_LEN', 1048576);
//Custom stdin line delimiter (\r\n, \n, \r, etc.)
define('MY_STDIN_LINE_DELIM', PHP_EOL);
//Custom tmpfile line delimiter - choose the smallest possible
define('MY_TMPFILE_LINE_DELIM', "\n");
//Custom output line delimiter - choose the smallest possible
define('MY_OUTPUT_LINE_DELIM', "\n");
function my_output_arr($field_name, $columns_data) {
    ksort($columns_data);
    foreach ($columns_data AS $column_name => $column_flag) {
        echo $field_name.' '.$column_name.' '.$column_flag.MY_OUTPUT_LINE_DELIM;
    }
}
$tmpfile=tmpfile() OR die('Can\'t create/open temporary file!');
$buffer_len = 0;
$buffer='';
//I don't think there is a point to save columns array in file -
//it should be small enough to hold in memory.
$columns_array=array();
//Open stdin for reading
$stdin = fopen('php://stdin', 'r') OR die('Failed to open stdin!');
//Main stdin reading and tmp file writing loop
//Using fread + explode + big buffer showed great performance boost
//in comparison with fgets();
while ($read_buffer = fread($stdin, MY_STDIN_READ_BUFF_LEN)) {
    $lines_arr = explode(MY_STDIN_LINE_DELIM, $buffer.$read_buffer);
    $read_buffer = '';
    $lines_arr_size = count($lines_arr) - 1;
    $buffer = $lines_arr[$lines_arr_size];
    for ($i = 0; $i < $lines_arr_size; $i++) {
        $line = trim($lines_arr[$i]);
        //There must be a space in each line - we break on it
        if (!strpos($line, ' '))
            continue;
        list($row, $column) = explode(' ', $line, 2);
        $columns_array[$column] = 0;
        //Save line in temporary file
        fwrite($tmpfile, $row.' '.$column.MY_TMPFILE_LINE_DELIM);
    }
}
//Flush any trailing partial line left in the buffer (in case stdin does not
//end with a delimiter), then reset the buffer before re-reading the tmp file
$line = trim($buffer);
if ($line !== '' && strpos($line, ' ')) {
    list($row, $column) = explode(' ', $line, 2);
    $columns_array[$column] = 0;
    fwrite($tmpfile, $row.' '.$column.MY_TMPFILE_LINE_DELIM);
}
$buffer = '';
fseek($tmpfile, 0);
$cur_row = NULL;
$row_data = array();
while ($read_buffer = fread($tmpfile, MY_TMPFILE_READ_BUFF_LEN)) {
    $lines_arr = explode(MY_TMPFILE_LINE_DELIM, $buffer.$read_buffer);
    $read_buffer = '';
    $lines_arr_size = count($lines_arr) - 1;
    $buffer = $lines_arr[$lines_arr_size];
    for ($i = 0; $i < $lines_arr_size; $i++) {
        list($row, $column) = explode(' ', $lines_arr[$i], 2);
        if ($row !== $cur_row) {
            //Output array
            if ($cur_row !== NULL)
                my_output_arr($cur_row, array_merge($columns_array, $row_data));
            $cur_row = $row;
            $row_data = array();
        }
        $row_data[$column] = 1;
    }
}
if (count($row_data) && $cur_row !== NULL) {
    my_output_arr($cur_row, array_merge($columns_array, $row_data));
}
Here's a MySQL example that works with your supplied test data:
CREATE TABLE `url` (
`url1` varchar(255) DEFAULT NULL,
`url2` varchar(255) DEFAULT NULL,
KEY `url1` (`url1`),
KEY `url2` (`url2`)
);
INSERT INTO url (url1, url2) VALUES
('image_a', 'tag_lorem'),
('image_a', 'tag_ipsum'),
('image_a', 'tag_amit'),
('image_b', 'tag_sit'),
('image_b', 'tag_dolor'),
('image_b', 'tag_ipsum');
SELECT url1, url2, assigned FROM (
SELECT t1.url1, t1.url2, 1 AS assigned
FROM url t1
UNION
SELECT t1.url1, t2.url2, 0 AS assigned
FROM url t1
JOIN url t2
ON t1.url1 != t2.url1
JOIN url t3
ON t1.url1 != t3.url1
AND t1.url2 = t3.url2
AND t2.url2 != t3.url2 ) tmp
ORDER BY url1, url2;
Result:
+---------+-----------+----------+
| url1 | url2 | assigned |
+---------+-----------+----------+
| image_a | tag_amit | 1 |
| image_a | tag_dolor | 0 |
| image_a | tag_ipsum | 1 |
| image_a | tag_lorem | 1 |
| image_a | tag_sit | 0 |
| image_b | tag_amit | 0 |
| image_b | tag_dolor | 1 |
| image_b | tag_ipsum | 1 |
| image_b | tag_lorem | 0 |
| image_b | tag_sit | 1 |
+---------+-----------+----------+
This should be simple enough to convert to SQLite, so if required you could use PHP to read the data into a temporary SQLite database, and then extract the results.
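A rough sketch of that SQLite route (my own illustration, not the answer's code, assuming the pdo_sqlite extension is available): read the "image tag" pairs from stdin into an in-memory table and let SQL produce the full matrix with the 0/1 flag.
<?php
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE assignment (image TEXT, tag TEXT)');

$insert = $db->prepare('INSERT INTO assignment (image, tag) VALUES (?, ?)');
$db->beginTransaction();
while (($line = fgets(STDIN)) !== false) {
    $line = trim($line);
    if ($line === '' || strpos($line, ' ') === false) {
        continue;
    }
    list($image, $tag) = explode(' ', $line, 2);
    $insert->execute(array($image, $tag));
}
$db->commit();

// Cross join every image with every tag and flag the pairs that really exist
$sql = 'SELECT i.image, t.tag,
               CASE WHEN a.image IS NULL THEN 0 ELSE 1 END AS assigned
        FROM (SELECT DISTINCT image FROM assignment) i
        CROSS JOIN (SELECT DISTINCT tag FROM assignment) t
        LEFT JOIN assignment a ON a.image = i.image AND a.tag = t.tag
        ORDER BY i.image, t.tag';
foreach ($db->query($sql) as $row) {
    echo $row['image'] . ' ' . $row['tag'] . ' ' . $row['assigned'] . PHP_EOL;
}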
Put your input data in an array and then sort it using usort; define a comparison function that compares array elements by their row value first and then by their column value when the row values are equal.
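A minimal sketch of that usort approach, assuming $lines already holds array('image_a', 'tag_lorem', 1)-style rows built while reading stdin:
usort($lines, function ($a, $b) {
    // compare by image (row) first, then by tag (column) when the images match
    $byRow = strcmp($a[0], $b[0]);
    return $byRow !== 0 ? $byRow : strcmp($a[1], $b[1]);
});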
This is my current implementation. I don't like it, but it does the job for now.
#!/usr/bin/env php
<?php
define('CACHE_MATCH', 0);
define('CACHE_COLUMN', 1);
define('INPUT_ROW', 0);
define('INPUT_COLUMN', 1);
define('INPUT_COUNT', 2);
output_expanded_entries(
    cache_input(array(tmpfile(), tmpfile()), STDIN, fgets(STDIN))
);
echo memory_get_peak_usage();
// Recursively read stdin, writing every raw line to the "match" cache file
// and every previously unseen column name to the "column" cache file.
function cache_input(Array $cache_files, $input_pointer, $input) {
    if(count($cache_files) != 2) {
        throw new Exception('$cache_files requires 2 file pointers');
    }
    if(feof($input_pointer) == FALSE) {
        cache_match($cache_files[CACHE_MATCH], trim($input));
        cache_column($cache_files[CACHE_COLUMN], process_line($input));
        cache_input(
            $cache_files,
            $input_pointer,
            fgets($input_pointer)
        );
    }
    return $cache_files;
}

// Append the column name to the column cache file if it has not been seen yet.
function cache_column($cache_column, $input) {
    if(empty($input) === FALSE) {
        rewind($cache_column);
        $column = get_field($input, INPUT_COLUMN);
        if(column_cached_in_memory($column) === FALSE && column_cached_in_file($cache_column, fgets($cache_column), $column) === FALSE) {
            fputs($cache_column, $column . PHP_EOL);
        }
    }
}

// Append the raw input line to the match cache file.
function cache_match($cache_match, $input) {
    if(empty($input) === FALSE) {
        fputs($cache_match, $input . PHP_EOL);
    }
}

// Recursively scan the column cache file for the given column name.
function column_cached_in_file($cache_column, $current, $column, $result = FALSE) {
    return $result === FALSE && feof($cache_column) === FALSE ?
        column_cached_in_file($cache_column, fgets($cache_column), $column, $column == $current)
        : $result;
}

// Small fixed-size in-memory cache of recently seen column names.
function column_cached_in_memory($column) {
    static $local_cache = array(), $index = 0, $count = 500;
    $result = TRUE;
    if(in_array($column, $local_cache) === FALSE) {
        $result = FALSE;
        $local_cache[$index++ % $count] = $column;
    }
    return $result;
}

// Replay the match cache, grouping lines by row, and print every row/column
// combination together with its 0/1 flag.
function output_expanded_entries(Array $cache_files) {
    array_map('rewind', $cache_files);
    for($current_row = NULL, $cache = array(); feof($cache_files[CACHE_MATCH]) === FALSE;) {
        $input = process_line(fgets($cache_files[CACHE_MATCH]));
        if(empty($input) === FALSE) {
            if($current_row !== get_field($input, INPUT_ROW)) {
                output_cache($current_row, $cache);
                $cache = read_columns($cache_files[CACHE_COLUMN]);
                $current_row = get_field($input, INPUT_ROW);
            }
            $cache = array_merge(
                $cache,
                array(get_field($input, INPUT_COLUMN) => get_field($input, INPUT_COUNT))
            );
        }
    }
    output_cache($current_row, $cache);
}

// Recursively print one "row column flag" line per cached column.
function output_cache($row, $column_count_list) {
    if(count($column_count_list) != 0) {
        printf(
            '%s %s %s%s',
            $row,
            key(array_slice($column_count_list, 0, 1)),
            current(array_slice($column_count_list, 0, 1)),
            PHP_EOL
        );
        output_cache($row, array_slice($column_count_list, 1));
    }
}

// Return the requested field of a parsed line, defaulting the count field to 1.
function get_field(Array $input, $field) {
    $result = NULL;
    if(in_array($field, array_keys($input))) {
        $result = $input[$field];
    } elseif($field == INPUT_COUNT) {
        $result = 1;
    }
    return $result;
}

// Trim an input line and split it on spaces; returns NULL for blank lines or
// lines without a space.
function process_line($input) {
    $result = trim($input);
    return empty($result) === FALSE && strpos($result, ' ') !== FALSE ?
        explode(' ', $result)
        : NULL;
}

// (Currently unused) append the column of a parsed line to $result.
function push_column($input, Array $result) {
    return empty($input) === FALSE && is_array($input) ?
        array_merge(
            $result,
            array(get_field($input, INPUT_COLUMN))
        )
        : $result;
}

// Rebuild the "all columns = 0" template from the column cache file.
function read_columns($cache_columns) {
    rewind($cache_columns);
    $result = array();
    while(feof($cache_columns) === FALSE) {
        $column = trim(fgets($cache_columns));
        if(empty($column) === FALSE) {
            $result[$column] = 0;
        }
    }
    return $result;
}
EDIT: yesterday's version was bugged :/