I have two CSV files, and both have the same data structure:
ID - Join_date - Last_Login
I want to compare them and keep only the exactly matching records, based on this example:
The first file has 100 records, of which 20 are not included in the 2nd file.
The 2nd file has 120 records.
I want a PHP script that compares these two files and builds two separate CSV files:
one where all extra records not included in the first file have been removed from the 2nd file,
and one where all records not included in the 2nd file have been removed from the first file.
Thanks
There is a GNU utility, comm, that will do this really easily. You could exec it through PHP or just run it directly. If you don't have access to comm, the easiest thing would be to store both files in arrays (probably via file()) and use array_intersect().
You can try this for a reasonably small CSV file; if you have a very large CSV, I would advise importing it directly into MySQL.
function csvToArray($csvFile, $full = false) {
    $handle = fopen($csvFile, "r");
    $array = array();
    while (($data = fgetcsv($handle)) !== FALSE) {
        $array[] = ($full === true) ? $data : $data[0]; // Full row or only the ID
    }
    fclose($handle);
    return $array;
}
$file1 = "file1.csv";
$file2 = "file2.csv";
$fileData1 = csvToArray($file1);
$fileData2 = csvToArray($file2);
var_dump(array_diff($fileData1, $fileData2));      // IDs in file1 but not in file2
var_dump(array_intersect($fileData1, $fileData2)); // IDs present in both files
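To actually build the two cleaned output files the question asks for, here is a minimal sketch along the same lines, keyed on the ID column. The input/output file names and the sample data are assumptions for the demo:

```php
<?php
// Demo input files (ID, Join_date, Last_Login); in practice these already exist.
file_put_contents("file1.csv", "1,2020-01-01,2020-02-01\n2,2020-01-02,2020-02-02\n3,2020-01-03,2020-02-03\n");
file_put_contents("file2.csv", "2,2020-01-02,2020-02-02\n3,2020-01-03,2020-02-03\n4,2020-01-04,2020-02-04\n");

// Read the full rows of a CSV file, keyed by the ID column (field 0).
function csvRowsById($csvFile) {
    $rows = array();
    $handle = fopen($csvFile, "r");
    while (($data = fgetcsv($handle)) !== FALSE) {
        $rows[$data[0]] = $data;
    }
    fclose($handle);
    return $rows;
}

$rows1 = csvRowsById("file1.csv");
$rows2 = csvRowsById("file2.csv");

// Keep only the IDs present in both files.
$commonIds = array_intersect(array_keys($rows1), array_keys($rows2));

// Write each file back out with the extra records removed.
foreach (array("file1_clean.csv" => $rows1, "file2_clean.csv" => $rows2) as $out => $rows) {
    $fp = fopen($out, "w");
    foreach ($commonIds as $id) {
        fputcsv($fp, $rows[$id]);
    }
    fclose($fp);
}
```

Both output files end up with the same set of matching records (80 in the question's example), since each side drops the IDs the other file lacks.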
Is there an efficient way to update/delete a specific row in a CSV file? Every method I've found involves reading the contents of the entire file, creating a temporary file, and then replacing the old file with it, etc.
But let's say I have a big CSV with 10,000 records, so that kind of solution would be rather resource-heavy.
Let's say I am unable to use a database, so writing to a file is the only way of storing data.
So, the question is: what would be the most efficient way to do it?
Thank you in advance!
You're going to have to read the entire file. Sorry, there's no way around that: a CSV is a single, flat text file with variably sized fields and rows.
You definitely shouldn't be working directly with a CSV for database operations. You ought to pull the data into a database to work with it, then output it back to CSV when you're done.
You don't mention why you can't use a database, so I'm going to guess it's a resource issue; and you also don't say why you don't want to rewrite the file, so I'm going to guess it's performance. You could cache a number of operations and perform them all at once, but you're not going to get away from rewriting all, or at least some portion, of the file.
Consider reading the CSV line by line into a multidimensional array, making your changes at the target row, then exporting the array data back out to CSV. The example below modifies the 100th row, assuming a 6-column, comma-delimited CSV file (fields 0-5).
If you want to delete the row instead, exclude it from the $newdata array by conditionally skipping to the next loop iteration with continue. Alternatively, to update it, simply set the current inner array $newdata[$i] to the new values:
$i = 0;
$newdata = [];
$handle = fopen("OldFile.csv", "r");
// READ CSV
while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
    // UPDATE 100TH ROW (TO DELETE IT, KEEP ONLY $i++ AND continue)
    if ($i == 99) {
        $newdata[$i][] = "somenewvalue"; // replacement values for fields 0-5
        $newdata[$i][] = "somenewvalue";
        $newdata[$i][] = "somenewvalue";
        $newdata[$i][] = "somenewvalue";
        $newdata[$i][] = "somenewvalue";
        $newdata[$i][] = "somenewvalue";
        $i++;
        continue;
    }
    $newdata[$i][] = $data[0];
    $newdata[$i][] = $data[1];
    $newdata[$i][] = $data[2];
    $newdata[$i][] = $data[3];
    $newdata[$i][] = $data[4];
    $newdata[$i][] = $data[5];
    $i++;
}
fclose($handle);
// EXPORT CSV
$fp = fopen('NewFile.csv', 'w');
foreach ($newdata as $rows) {
    fputcsv($fp, $rows);
}
fclose($fp);
Break the CSV into multiple files all in one directory.
That way you still have to rewrite files, but you don't have to rewrite nearly as much.
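A rough sketch of that splitting idea (the chunk size and file names are arbitrary; the demo input is tiny so the chunking is visible):

```php
<?php
// Demo input: 5 rows (in practice this is your big CSV).
file_put_contents("OldFile.csv", "a,1\nb,2\nc,3\nd,4\ne,5\n");

$chunkSize = 2; // small for the demo; tune this for real data
$chunkIndex = 0;
$rowInChunk = 0;
$out = null;

$in = fopen("OldFile.csv", "r");
while (($data = fgetcsv($in)) !== FALSE) {
    if ($rowInChunk === 0) {
        // Start a new chunk file when the previous one is full.
        $out = fopen("chunk_" . $chunkIndex . ".csv", "w");
    }
    fputcsv($out, $data);
    if (++$rowInChunk === $chunkSize) {
        fclose($out);
        $chunkIndex++;
        $rowInChunk = 0;
    }
}
if ($rowInChunk > 0) {
    fclose($out); // close a partially filled final chunk
}
fclose($in);
```

A later update or delete then only has to rewrite the one chunk file containing the affected row.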
A bit late, but for people who may be searching for the same thing: you could put your CSV into an SQLite database, which additionally gives you the ability to search the data set. There is some sample code here: Import CSV File into a SQLite Database via PHP
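A minimal sketch of that SQLite route using PDO (requires the pdo_sqlite extension; the table layout, column names, and in-memory database are assumptions for the demo):

```php
<?php
// Demo CSV (ID, Join_date, Last_Login); in practice this is your real file.
file_put_contents("data.csv", "1,2020-01-01,2020-02-01\n2,2020-01-02,2020-02-02\n");

$db = new PDO("sqlite::memory:"); // use "sqlite:data.db" for a file-backed database
$db->exec("CREATE TABLE records (id TEXT, join_date TEXT, last_login TEXT)");
$stmt = $db->prepare("INSERT INTO records (id, join_date, last_login) VALUES (?, ?, ?)");

$fh = fopen("data.csv", "r");
$db->beginTransaction(); // a single transaction makes bulk inserts much faster
while (($row = fgetcsv($fh)) !== FALSE) {
    $stmt->execute($row);
}
$db->commit();
fclose($fh);

// The data set is now searchable with SQL.
$count = $db->query("SELECT COUNT(*) FROM records")->fetchColumn();
```

Updates and deletes then become single SQL statements instead of full-file rewrites; export back to CSV with a SELECT loop and fputcsv() when done.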
I have a large number of one-dimensional arrays stored in a memory cache, and I want to write them to CSV files; the arrays arrive one by one through a queue. But I want to limit each CSV file to about 100 rows and then write the remaining arrays into new files, and so on.
I would highly appreciate any help with this.
I used the code below to write arrays into a CSV, but I don't know how to limit the number of rows to 100 and then open new files and write to them.
The messages passed in the queue are keys named SO1, SO2, SO3 and so on, with the last message being "LAST". Based on the keys, the arrays associated with them are read in from memcache and have to be written into CSV files. The messages arrive one after another via a RabbitMQ queue from a preceding module.
// Assuming $SO1 is an array fetched from memcache based on a key, say SO1, received via the queue.
$SO1 = array('Name' => 'Ubaid', 'Age' => '24', 'Gender' => 'Male', 'Lunch' => 'Yes', 'Total' => '1000');
$row_count = 0;
$csv_file_count = 1;
while ($msg != "LAST") { // As long as the message received is not LAST
    $csv = fopen("file_" . $csv_file_count . ".csv", "w");
    fputcsv($csv, array_keys($SO1));
    while ($msg != "LAST" && $row_count < 100) {
        fputcsv($csv, $SO1); // Write to CSV
        $row_count++;
    }
    $row_count = 0;
    $csv_file_count++;
    fclose($csv);
}
You could make a counter like this.
$row_count = 0;
$csv_file_count = 1;
while (!$q->isEmpty()){ // as long as the queue is not empty
$csv = fopen("file_".$csv_file_count.".csv", "w"); // open "file_n.csv" n file number
fputcsv($csv,explode(',',"col1,col2,col3,col4")); // Your custom headers
while (!$q->isEmpty() && $row_count < 100){ // so while queue is not empty and the counter didnt reach 100
fputcsv($csv, explode(',',$q->pop())); // write to file. Explode by , or space or whatever your data looks like
$row_count++; // increment row counter
}
$row_count = 0; // when that is not true anymore, reset row counter
$csv_file_count++; // increment file counter
fclose($csv); // close file
} // repeats untill queue is empty
Updated to use fputcsv()
If you want another separator in your CSV file, you can do it like this:
fputcsv($csv, explode(',', $q->pop()), ";"); // (;) for example. Default is comma (,)
You can also specify a field enclosure:
fputcsv($csv, explode(',', $q->pop()), ",", "'"); // (') for example. Default is double quote (")
fputcsv() takes 2 required parameters and 3 optional ones.
From php.net fputcsv:
int fputcsv ( resource $handle , array $fields [, string $delimiter = "," [, string $enclosure = '"' [, string $escape_char = "\\" ]]] )
Fields must be an array, hence explode(',', $q->pop()) as the 2nd parameter.
I have seen few similar examples but it is still not working.
csv data file "data1.csv" is as below:
symbol,num1,num2
QCOM,10,100
QCOM,20,200
QCOM,30,300
QCOM,40,400
CTSH,10,111
CTSH,20,222
CTSH,30,333
CTSH,40,444
AAPL,10,11
AAPL,20,22
AAPL,30,33
AAPL,40,44
--end of file ----
$inputsymbol = 'QCOM'; // $inputsymbol will come from HTML. Works fine.
I want to read the CSV file, fetch the lines where symbol = QCOM, and convert them into an array $data1 to plot a line chart for num1 and num2, as below.
$data1 = array (
array(10,100),
array(20,200),
array(30,300),
array(40,400)
);
Note: 1. There is no comma at the end of each line in the CSV data file.
2. There are multiple symbols in the same file, so only the lines that match the symbol should be included in $data1.
==============
Mark's solution solves the problem. Now, to make data access faster (for a very large CSV file), I have (externally) reformatted the same data as below. The question is how to automatically extract the headers and then build the $data1 array.
symbol,1/1/2015,1/2/2015,1/3/2015,1/4/2015
QCOM,100,200,300,400
CTSH,11,22,33,44
AAPL,10,11,12,13
Note that the number of fields in the header is not fixed (it will increase every month), but the data will also increase accordingly.
Not complicated:
$inputsymbol = 'QCOM';
$data1 = [];
$fh = fopen("data1.csv", "r");
while (($data = fgetcsv($fh, 1024)) !== FALSE) {
if ($data[0] == $inputsymbol) {
unset($data[0]);
$data1[] = $data;
}
}
fclose($fh);
So where exactly are you having the problem?
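As for the follow-up (the reformatted wide file): the header row can simply be consumed by the first fgetcsv() call and then paired with each data field. A sketch, with the follow-up's sample data inlined for the demo:

```php
<?php
// Demo data in the follow-up's wide format.
file_put_contents("data2.csv",
    "symbol,1/1/2015,1/2/2015,1/3/2015,1/4/2015\n" .
    "QCOM,100,200,300,400\n" .
    "CTSH,11,22,33,44\n");

$inputsymbol = 'QCOM';
$fh = fopen("data2.csv", "r");

$headers = fgetcsv($fh); // first row: "symbol" plus a variable number of dates
array_shift($headers);   // drop the "symbol" column; $headers is now the date list

$data1 = [];
while (($data = fgetcsv($fh)) !== FALSE) {
    if ($data[0] == $inputsymbol) {
        array_shift($data); // drop the symbol field, leaving only values
        // Pair each date header with its value: one [date, value] point each.
        foreach ($headers as $i => $date) {
            $data1[] = array($date, $data[$i]);
        }
        break; // one row per symbol in this format
    }
}
fclose($fh);
```

Because the header length is read at runtime, this keeps working as new month columns are appended.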
I need to pull a ton of info together from three pipe-delimited files, i.e.
file1:
10948|Book|Type1
file2:
SHA512||0||10948
file3:
0|10948|SHA512|c3884fbd7fc122b5273262b7a0398e63
I'd like to get it into something like
c3884fbd7fc122b5273262b7a0398e63|SHA512|Type1|Book
I do not have access to an actual database; is there any way to do this? Basically I'm looking for something like $id = $file1[0]; if ($file3[1] == $id), unless there's something more efficient.
Each CSV file is anywhere from 100k-300k lines. I don't care if it takes a while; I can just let it run on EC2 for a bit.
$data = array();
$fh = fopen('file1') or die("Unable to open file1");
while(list($id, $val1, $val2) = fgetcsv($fh, 0, '|')) {
$data[$id]['val1'] = $val1;
$data[$id]['val2'] = $val2;
}
fclose($fh);
$fh = fopen('file2') or die ("Unable to open file2");
while (list($method, , , , $id) = fgetcsv($fh, 0, '|')) { // empty slots skip fields; list() cannot take null here
$data[$id]['method'] = $method;
}
fclose($fh);
$fh = fopen('file3') or die("Unable to open file3");
while (list(, $id, , $hash) = fgetcsv($fh, 0, '|')) {
$data[$id]['hash'] = $hash;
}
fclose($fh);
Tedious, but you should end up with an array containing the data you want. Outputting it as another CSV is left as an exercise for the reader (hint: see fputcsv()).
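For completeness, that output step could look like the sketch below; the column order follows the desired line in the question, and the single hand-built $data entry stands in for the array the loops above produce:

```php
<?php
// Stand-in for the $data array built from the three files; one entry shown.
$data = array(
    10948 => array(
        'val1'   => 'Book',
        'val2'   => 'Type1',
        'method' => 'SHA512',
        'hash'   => 'c3884fbd7fc122b5273262b7a0398e63',
    ),
);

// Emit hash|method|val2|val1, matching the desired output line.
$fh = fopen('combined.csv', 'w');
foreach ($data as $id => $row) {
    fputcsv($fh, array($row['hash'], $row['method'], $row['val2'], $row['val1']), '|');
}
fclose($fh);
```

The '|' passed as fputcsv()'s third parameter keeps the output pipe-delimited like the inputs.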
All three files appear to have a common field (i.e. in your example, "10948" was common to all three lines). If you're not worried about using a lot of memory, you could load all three files into different arrays, setting the common field as the array key, and then use a foreach loop to reassemble all three.
For example:
$result = array();
// File 1 (the common field is the first field, index 0)
$fh = fopen('file1', 'r');
while ( ($data = fgetcsv($fh, 0, '|')) !== FALSE )
    $result[$data[0]] = $data;
fclose($fh);
// File 2 (the common field is the fifth field, index 4)
$fh = fopen('file2', 'r');
while ( ($data = fgetcsv($fh, 0, '|')) !== FALSE )
    $result[$data[4]] = array_merge($result[$data[4]], $data);
fclose($fh);
// File 3 (the common field is the second field, index 1)
$fh = fopen('file3', 'r');
while ( ($data = fgetcsv($fh, 0, '|')) !== FALSE )
    $result[$data[1]] = array_merge($result[$data[1]], $data);
fclose($fh);
I would suggest performing a merge-sort using basic Unix tools:
a) Sort your .CSV files by the column(s) common between the files: sort -t'|' -k?
b) Use the Unix join command to output records common between pairs of .CSV files.
The join command only works with 2 files at a time, so you'll have to chain the results for multiple data sources:
# where 'x' is the join field number in file A, and 'y' in file B
sort -t'|' -k x fileA > fileA.sorted
sort -t'|' -k y fileB > fileB.sorted
join -t'|' -1 x -2 y fileA.sorted fileB.sorted > file1
sort -t'|' -k x fileC > fileC.sorted
join -t'|' -1 x -2 y file1 fileC.sorted > file2
sort -t'|' -k x fileD > fileD.sorted
join -t'|' -1 x -2 y file2 fileD.sorted > file3
etc...
This is very fast, and allows you to filter your .CSV files as if an impromptu database join occurred.
If you have to write your own merge-sort in PHP (read here: Merge Sort):
The easiest merge-sort implementation for .CSV files has 2 stages: a) Unix-sort your files, then b) 'merge' all the sources in parallel, reading in a record from each and looking for the case where the value in your common field matches all the other sources (a JOIN in database terminology):
rule 1) Skip the record that is less than (<) ALL the other sources.
rule 2) Only when a record's common value is equal to (==) ALL the other sources do you have a match.
rule 3) When a record's common value is equal to (==) only SOME of the other sources, you can use LEFT-JOIN logic if desired; otherwise skip that record in all sources.
Pseudo code for a join of multiple files
read 1st record from every data source;
while "record exists from all data sources"; do
for A in each Data-Source ; do
set cntMissMatch=0
for B in each Data-Source; do
if A.field < B.field then
cntMissMatch+=1
end if
end for
if cntMissMatch == count(Data-Sources) - 1 then
# found the record with the lowest value (less than every other source), skip it
read next record in current Data-source;
break; # start over again looking for lowest
else
if cntMissMatch == 0 then
we have a match, process this record;
read in next record from ALL data-sources ;
break; # start over again looking for lowest
else
# we have a partial match, you can choose to have
# 'LEFT-JOIN' logic at this point if you choose,
# where records are spit out even if they do NOT
# match to ALL data-sources.
end if
end if
end for
done
Hope that helps.
I am attempting to insert the data from an uploaded file into a single dimension array.
The file is such that there is one student number to a line, like so:
392232,392231,etc
this is the most common way I've found online:
while (($line = fgetcsv($file, 25, ',')) !== FALSE) {
//$line is an array of the csv elements
print_r($line);
}
However, from what I understand, this will create an array ($line) for each row, which is not what I want.
That aside, I tried the following to see if it is working, and my code is not printing out the array after using fgetcsv(). The file is successfully uploading.
here is my code:
if(isset($_FILES['csv_file']) && is_uploaded_file($_FILES['csv_file']['tmp_name'])){
//create file name
$file_path = "csv_files/" . $_FILES['csv_file']['name'];
//move uploaded file to upload dir
if (!move_uploaded_file($_FILES['csv_file']['tmp_name'], $file_path)) {
//error moving upload file
echo "Error moving uploaded file";
}
print_r($_FILES['csv_file']);
$file = fopen('$file_path', 'r');
while (($line = fgetcsv($file, 25, ',')) !== FALSE) {
//$line is an array of the csv elements
print_r($line);
}
//delete csv file
unlink($file_path);
}
First off, can anyone see why it wouldn't work to at least print them as separate arrays of data (each row)?
Second, is it possible to set it up so that it creates a 1D array of all rows in the file?
Many thanks,
Question 1 is because of this line:
$file = fopen('$file_path', 'r');
It should be:
$file = fopen($file_path, 'r');
And for Question 2, check out array_push().
1st Question:
This line will actually try to open a file called '$file_path', because you're using single quotes (so the string doesn't expand to the value of the variable). You can just remove the quotes:
$file = fopen('$file_path', 'r');
$file is false after this.
2nd Question:
If all you want to do is convert a file into an array by lines, you can use one of these instead:
file() - gets the whole file into a 1D array, one line per element (closest to what you want)
fgets() - gets one string per line per call; keep calling it until it returns false to get each line in turn
file_get_contents() - gets the whole file into a single string to process as you like
According to PHP.net, fgetcsv() "returns an array containing the fields read", so $line will always be an array.
But if you are sure each line contains only one student number, you can use $line[0] to get the first field's value (ignoring the comma).
Here are some general comments on your code:
You are passing the file path into the fopen() function incorrectly. The variable should not be surrounded with single quotes.
Since you are deleting the CSV file after processing it, moving it is unnecessary. Simply use $_FILES['csv_file']['tmp_name'] as the path to the file.
Since there is only one entry per row in your CSV file, simply access the first element of the array that is returned from fgetcsv(): $numbers[] = $line[0];
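Putting those comments together, a sketch that builds the 1D array straight from the file (a local demo file stands in for $_FILES['csv_file']['tmp_name']):

```php
<?php
// Demo upload contents: one student number per line, as in the question.
file_put_contents("numbers.csv", "392232\n392231\n392230\n");

$numbers = array();
$file = fopen("numbers.csv", "r"); // in the real code: $_FILES['csv_file']['tmp_name']
while (($line = fgetcsv($file, 25, ',')) !== FALSE) {
    $numbers[] = $line[0]; // first (and only) field on each row
}
fclose($file);

print_r($numbers); // a flat 1D array of student numbers
```

No move_uploaded_file() or unlink() is needed when reading directly from the temporary upload path, since PHP cleans that file up after the request.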