I need to pull a ton of info from several pipe-delimited files, e.g.
file1:
10948|Book|Type1
file2:
SHA512||0||10948
file3:
0|10948|SHA512|c3884fbd7fc122b5273262b7a0398e63
I'd like to get it into something like
c3884fbd7fc122b5273262b7a0398e63|SHA512|Type1|Book
I do not have access to an actual database; is there any way to do this? Basically I'm looking for something like $id = $file1[0]; if ($file3[1] == $id), unless there's something more efficient.
Each CSV file is anywhere from 100k-300k lines. I don't care if it takes a while, I can just let it run on EC2 for a while.
$data = array();

// file1: id|val1|val2
$fh = fopen('file1', 'r') or die("Unable to open file1");
while (($row = fgetcsv($fh, 0, '|')) !== false) {
    list($id, $val1, $val2) = $row;
    $data[$id]['val1'] = $val1;
    $data[$id]['val2'] = $val2;
}
fclose($fh);

// file2: method||0||id -- the id is the fifth field (index 4)
$fh = fopen('file2', 'r') or die("Unable to open file2");
while (($row = fgetcsv($fh, 0, '|')) !== false) {
    $data[$row[4]]['method'] = $row[0];
}
fclose($fh);

// file3: 0|id|method|hash
$fh = fopen('file3', 'r') or die("Unable to open file3");
while (($row = fgetcsv($fh, 0, '|')) !== false) {
    $data[$row[1]]['hash'] = $row[3];
}
fclose($fh);
Tedious, but you should get an array with the data you want. Outputting it as another CSV is left as an exercise for the reader (hint: see fputcsv()).
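For completeness, a minimal sketch of that output step, assuming every id appeared in all three files and using the hash|method|val2|val1 order from the sample line in the question (merged.csv is an arbitrary name):

$out = fopen('merged.csv', 'w') or die("Unable to open merged.csv");
foreach ($data as $id => $row) {
    // e.g. c3884fbd7fc122b5273262b7a0398e63|SHA512|Type1|Book
    fputcsv($out, array($row['hash'], $row['method'], $row['val2'], $row['val1']), '|');
}
fclose($out);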
All three files appear to have a common field (in your example, "10948" was common to all three lines). If you're not worried about using a lot of memory, you could load all three files into a single array, using the common field as the array key and merging the rows as you go.
For example:
$result = array();

// File 1: the id is the first field (index 0)
$fh = fopen('file1', 'r');
while ( ($data = fgetcsv($fh, 0, '|')) !== FALSE )
    $result[$data[0]] = $data;
fclose($fh);

// File 2: the id is the fifth field (index 4)
$fh = fopen('file2', 'r');
while ( ($data = fgetcsv($fh, 0, '|')) !== FALSE )
    $result[$data[4]] = array_merge($result[$data[4]], $data);
fclose($fh);

// File 3: the id is the second field (index 1)
$fh = fopen('file3', 'r');
while ( ($data = fgetcsv($fh, 0, '|')) !== FALSE )
    $result[$data[1]] = array_merge($result[$data[1]], $data);
fclose($fh);
I would suggest performing a sort-merge join using basic Unix tools:
a) sort your .CSV files by the column common between the files: sort -t'|' -kx,x file
b) Use the Unix 'join' command to output the records common between pairs of .CSV files.
The 'join' command only works with two files at a time, so you'll have to 'chain' the results across multiple data sources:
# where 'x' is the join field number in file A, and 'y' is the join field number in file B
sort -t'|' -kx,x fileA > fileA.sorted
sort -t'|' -ky,y fileB > fileB.sorted
join -t'|' -1 x -2 y fileA.sorted fileB.sorted > join1

# after a join, the shared key becomes field 1 of the output,
# so the chained joins use -1 1 on the intermediate files
sort -t'|' -ky,y fileC > fileC.sorted
join -t'|' -1 1 -2 y join1 fileC.sorted > join2

sort -t'|' -ky,y fileD > fileD.sorted
join -t'|' -1 1 -2 y join2 fileD.sorted > join3
etc...
This is very fast, and allows you to filter your .CSV files as if an impromptu database join occurred.
If you have to write your own merge-sort in PHP (read here: Merge Sort):
The easiest implementation of a merge-join over .CSV files is two-stage: a) Unix-sort your files, then b) 'merge' all the sources in parallel, reading one record from each and looking for the case where the value in your common field matches all the other sources (a JOIN in database terminology):
rule 1) Skip the record whose common value is less than (<) ALL the other sources'.
rule 2) Only when a record's common value is equal to (==) ALL the other sources' do you have a match.
rule 3) When a record's common value is equal to (==) only SOME of the other sources', you can use 'LEFT-JOIN' logic if desired; otherwise skip that record in all sources.
Pseudo code for a join of multiple files
read 1st record from every data source;
while "record exists from all data sources"; do
for A in each Data-Source ; do
set cntMissMatch=0
for B in each Data-Source; do
if A.field < B.field then
cntMissMatch+=1
end if
end for
if cntMissMatch == count(Data-Sources) - 1 then
# found record with lowest values, skip it
read next record in current Data-source;
break; # start over again looking for lowest
else
if cntMissMatch == 0 then
we have a match, process this record;
read in next record from ALL data-sources ;
break; # start over again looking for lowest
else
# we have a partial match, you can choose to have
# 'LEFT-JOIN' logic at this point if you choose,
# where records are spit out even if they do NOT
# match to ALL data-sources.
end if
end if
end for
done
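To make that concrete, here is a minimal PHP sketch of the two-source case, assuming both pipe-delimited files are already sorted on their join fields and keys are unique within each file (an inner join only; the LEFT-JOIN case from rule 3 is left out, and all names here are illustrative):

// Merge-join two files that are pre-sorted on their join columns.
// $keyA and $keyB are the 0-based join-field indexes in each file.
function mergeJoin($fileA, $keyA, $fileB, $keyB, $outFile) {
    $fa = fopen($fileA, 'r');
    $fb = fopen($fileB, 'r');
    $out = fopen($outFile, 'w');
    $a = fgetcsv($fa, 0, '|');
    $b = fgetcsv($fb, 0, '|');
    while ($a !== false && $b !== false) {
        $cmp = strcmp($a[$keyA], $b[$keyB]);
        if ($cmp < 0) {
            $a = fgetcsv($fa, 0, '|'); // rule 1: A has the lowest key, skip it
        } elseif ($cmp > 0) {
            $b = fgetcsv($fb, 0, '|'); // rule 1: B has the lowest key, skip it
        } else {
            fputcsv($out, array_merge($a, $b), '|'); // rule 2: keys match
            $a = fgetcsv($fa, 0, '|');
            $b = fgetcsv($fb, 0, '|');
        }
    }
    fclose($fa);
    fclose($fb);
    fclose($out);
}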
Hope that helps.
I have seen a few similar examples, but it is still not working.
The CSV data file "data1.csv" is as below:
symbol,num1,num2
QCOM,10,100
QCOM,20,200
QCOM,30,300
QCOM,40,400
CTSH,10,111
CTSH,20,222
CTSH,30,333
CTSH,40,444
AAPL,10,11
AAPL,20,22
AAPL,30,33
AAPL,40,44
--end of file ----
$inputsymbol = 'QCOM'; // $inputsymbol will come from HTML; that part works fine.
I want to read the CSV file and fetch the lines where symbol == QCOM, then convert them into the array $data1 below, to plot a line chart of num1 against num2:
$data1 = array (
array(10,100),
array(20,200),
array(30,300),
array(40,400)
);
Notes:
1. There is no comma at the end of each line in the CSV data file.
2. There are multiple symbols in the same file, so only the lines that match the symbol should be included in $data1.
==============
Mark's solution solves the problem. Now, to make the data access faster (for a very large CSV file), I have (externally) reformatted the same data as below. The question is how to automatically extract the headers and then build the $data1 array:
symbol,1/1/2015,1/2/2015,1/3/2015,1/4/2015
QCOM,100,200,300,400
CTSH,11,22,33,44
AAPL,10,11,12,13
Note that the number of fields in the header is not fixed (it will increase every month), but the data will also increase accordingly.
Not complicated:
$inputsymbol = 'QCOM';
$data1 = [];
$fh = fopen("data1.csv", "r");
while (($data = fgetcsv($fh, 1024)) !== FALSE) {
if ($data[0] == $inputsymbol) {
unset($data[0]);
$data1[] = array_values($data); // reindex so each row is array(num1, num2)
}
}
fclose($fh);
So where exactly are you having the problem?
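As for the reshaped file with date headers, a minimal sketch, assuming the first row is always the header and each symbol occupies exactly one row, might be:

$inputsymbol = 'QCOM';
$data1 = [];
$fh = fopen("data1.csv", "r");
$headers = fgetcsv($fh); // first row: symbol,1/1/2015,1/2/2015,...
while (($row = fgetcsv($fh)) !== FALSE) {
    if ($row[0] == $inputsymbol) {
        // pair each date header with its value, skipping the symbol column
        for ($i = 1, $n = count($headers); $i < $n; $i++) {
            $data1[] = array($headers[$i], $row[$i]);
        }
        break;
    }
}
fclose($fh);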
I have a CSV that is downloaded from the wholesaler every night with updated prices.
What I need to do is edit the price column (the 2nd column), multiplying the current value by 1.3 (a 30% markup).
My code to read the provided CSV and take just the columns I need is below; however, I can't figure out how to edit the price column.
<?php
// open the csv file in write mode
$fp = fopen('var/import/tb_prices.csv', 'w');
// read csv file
if (($handle = fopen("var/import/Cbl_4036_2408.csv", "r")) !== FALSE) {
$targetColumns = array(1, 2, 3); // get data from the 2nd, 3rd and 4th columns (0-based indexes 1, 2, 3)
while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
$targetData = array(); // array that hold target data
foreach($targetColumns as $column){ // loop throught the targeted columns array
if($column[2]){
$data[$column] = $data[0] * 1.3;
}
$targetData[] = $data[$column]; // get the data from the column
}
# Populate the multidimensional array.
$csvarray[$nn] = $targetData; // add target data to csvarray
// write csv file
fputcsv($fp, $targetData);
}
fclose($handle);
fclose($fp);
echo "CSV File Written Successfully!";
}
?>
Could somebody point me in the right direction please, explaining how you've worked out the function too so I can learn at the same time.
You are always multiplying your price as $data[0] * 1.3, i.e. using the first column.
That index is likely the problem.
Other views:
If this is a one-off job on this CSV data, try to solve it using MySQL alone. Create a table matching the CSV's structure, import the .csv data into that table, and then run whatever SQL you want against it.
No loops, no coding, no file read/write, and precise control over what you want to do with UPDATE. You just need to be aware of the delimiters (line separators, e.g. \r\n; column separators, e.g. comma, tab or semicolon; and whether the data is enclosed in double/single quotes or not).
Once you have modified your data, you can export it back to CSV again.
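A sketch of that round trip in MySQL; the table name prices_tmp, the column name price, and the delimiters are assumptions to adapt to your file:

-- import the nightly CSV into a working table
LOAD DATA INFILE 'var/import/Cbl_4036_2408.csv'
INTO TABLE prices_tmp
FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
IGNORE 1 LINES;

-- apply the 30% markup to the price column
UPDATE prices_tmp SET price = price * 1.3;

-- export the result back out as CSV
SELECT * FROM prices_tmp
INTO OUTFILE 'var/import/tb_prices.csv'
FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';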
If you want to handle the .csv file itself, open it in one connection (read-only mode) and write to another file, preserving the original data.
You say that the column that contains the price is the second, but then you use index zero. Anyway, the whole thing can be easier:
$handle = fopen("test.csv", "r");
if ( $handle !== FALSE) {
$out = "";
while (($data = fgetcsv($handle, 1000, ";")) !== FALSE) {
$data[1] = ((float)$data[1] * 1.3);
$out .= implode(";",$data) . "\n";
}
fclose($handle);
file_put_contents("test2.csv", $out);
}
This code opens a CSV file that uses a semicolon as separator.
It then reads every line and, for each line, multiplies the second column (index 1) by 1.3.
This line:
$out .= implode(";",$data) . "\n";
generates a line for the new CSV file; see implode() in the official documentation.
Afterwards I close the file handle. There is no point keeping two files open at once when you can write the second file in one go; that only holds for small files, though.
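For larger files, a streaming variant is safer: write each row as soon as it is read instead of accumulating $out in memory. A minimal sketch using fputcsv with the same semicolon delimiter:

$in = fopen("test.csv", "r");
$out = fopen("test2.csv", "w");
while (($data = fgetcsv($in, 1000, ";")) !== FALSE) {
    $data[1] = (float)$data[1] * 1.3; // apply the 30% markup to the price column
    fputcsv($out, $data, ";");
}
fclose($in);
fclose($out);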
I have a file with a size of around 10 GB or more. The file contains only numbers ranging from 1 to 10, one per line, and nothing else. The task is to read the numbers from the file, sort them in ascending or descending order, and write a new file with the sorted numbers.
Can any of you please help me with this?
I'm assuming this is some kind of homework and the goal is to sort more data than you can hold in RAM?
Since you only have the numbers 1 to 10, this is not that complicated a task: it is effectively a counting sort. Just open your input file and count how many occurrences of each specific number you have. After that you can construct a simple loop and write the values into another file. The following example is pretty self-explanatory.
$inFile = '/path/to/input/file';
$outFile = '/path/to/output/file';
$input = fopen($inFile, 'r');
if ($input === false) {
throw new Exception('Unable to open: ' . $inFile);
}
//$map will be an array of size 10, filled with 0-s
$map = array_fill(1, 10, 0);
//Read the file line by line and count how many of each specific number you have
while (($line = fgets($input)) !== false) {
$int = (int) $line;
if ($int >= 1 && $int <= 10) { //guard against blank lines and the final EOF read
$map[$int]++;
}
}
fclose($input);
$output = fopen($outFile, 'w');
if ($output === false) {
throw new Exception('Unable to open: ' . $outFile);
}
/*
* Reverse the array if you need to change direction between
* ascending and descending order; pass true to preserve the
* numeric keys, otherwise the wrong numbers would be written
*/
//$map = array_reverse($map, true);
//Write values into your output file
foreach ($map AS $number => $count) {
$string = ((string) $number) . PHP_EOL;
for ($i = 0; $i < $count; $i++) {
fwrite($output, $string);
}
}
fclose($output);
Taking into account the fact that you are dealing with huge files, you should also check the script execution time limit for your PHP environment; the above will take VERY long for 10GB+ files. But since I didn't see any limitations concerning execution time and performance in your question, I'm assuming it is OK.
I had a similar issue before. Trying to manipulate such a large file ended up being a huge drain on resources and it couldn't cope. The easiest solution I ended up with was to import it into a MySQL database using the fast bulk-load statement LOAD DATA INFILE:
http://dev.mysql.com/doc/refman/5.1/en/load-data.html
Once it's in you should be able to manipulate the data.
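For this particular task the follow-up SQL is short. A sketch, where the table name numbers and its column n are assumptions:

LOAD DATA INFILE '/path/to/input/file' INTO TABLE numbers (n);

SELECT n FROM numbers ORDER BY n ASC
INTO OUTFILE '/path/to/output/file';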
Alternatively, you could just read the file line by line while outputting the result into another file line by line with the sorted numbers. Not too sure how well this would work though.
Have you had any previous attempts at it or are you just after a possible method of doing it?
If that's all, you don't need PHP (if you have a Linux machine at hand):
sort -n file > file_sorted-asc
sort -nr file > file_sorted-desc
Edit: OK, here's your solution in PHP (if you have a Linux machine at hand):
<?php
// Sort ascending
`sort -n file > file_sorted-asc`;
// Sort descending
`sort -nr file > file_sorted-desc`;
?>
:)
Is it possible to validate a text file before I dump its data into a MYSQL database?
I want to check if it contains, say, 5 columns (of data). If so, then i go ahead with the following query:
LOAD DATA CONCURRENT INFILE 'c:/test/test.txt'
INTO TABLE DUMP_TABLE FIELDS TERMINATED BY '\t' ENCLOSED BY '' LINES TERMINATED BY '\n' IGNORE 1 LINES;
If not, I remove the entire row. I repeat this process for all rows in the txt file.
The text file contains data of the format:
id col2 col3 2012-07-27-19:27:06 col5
id col2 col3 2012-07-25-09:58:50 col5
id col2 col3 2012-07-23-10:14:13 col5
EDIT: After reading your comments, here's the code for doing the same on tab separated data:
$handler = fopen("myfile.txt","r");
$error = false;
while (!feof($handler)){
$linetocheck = fgets($handler);
$cols = explode(chr(9), $linetocheck); //chr(9) is the tab character; using http://es.php.net/manual/en/function.fgetcsv.php you can get the same result as with fgets+explode
if (count($cols)>$max_cols){
$error=true;
break;
}
}
fclose($handler);
if (!$error){
//...do stuff
}
This code reads a file, let's say "myfile.txt", line by line, and sets the variable $error to true if any line is longer than $max_cols characters. (My apologies if that's not what you're asking; your question is not entirely clear to me.)
$handler = fopen("myfile.txt","r");
$error = false;
while (!feof($handler)){
$linetocheck = fgets($handler);
if (strlen($linetocheck)>$max_cols){
$error=true;
break;
}
}
fclose($handler);
if (!$error){
//...do stuff
}
I know it's an old thread, but I was looking for something similar myself and came across this topic; none of the answers provided here helped me.
Thus, I went ahead and came up with my own solution, which is tested and works perfectly (and can be improved).
Assume we have a CSV file named example.csv that contains the following dummy data (on purpose, the last line, the 6th, contains one extra value compared to the other rows):
Name,Country,Age
John,Ireland,18
Ted,USA,22
Lisa,UK,23
Michael,USA,20
Louise,Ireland,22,11
Now, when we check the CSV file to make sure all the rows have the same number of values, the following block of code will do the trick and pinpoint the line on which the error occurred:
function validateCsvColumnLength($pathToCsvFile)
{
if(!file_exists($pathToCsvFile) || !is_readable($pathToCsvFile)){
throw new \Exception('Filename doesn\'t exist or is not readable.');
}
if (!$handle = fopen($pathToCsvFile, "r")) {
throw new \Exception("Stream error");
}
$rowLength = [];
$rowNumber = 0;
while (($data = fgetcsv($handle)) !== FALSE) {
$rowLength[] = count($data);
$rowNumber++;
}
fclose($handle);
$rowKeyWithError = array_search(max($rowLength), $rowLength);
$differentRowCount = count(array_unique($rowLength));
// if there's a row that has more or less data, throw an error with the line that triggered it
if ($differentRowCount !== 1) {
throw new \Exception("Error, data count from row {$rowKeyWithError} does not match header size");
}
return true;
}
To actually test it, just do a var_dump() to see the result:
var_dump(validateCsvColumnLength('example.csv'));
What columns do you mean? If you just mean the number of characters per row, just split (explode) the file into rows and check whether each row's length is equal to 5.
If you mean columns separated by a delimiter, then you should count the occurrences of that delimiter in each row and again check whether the count equals 5. Use fgetcsv for that, as sketched below.
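A minimal sketch of that fgetcsv check, assuming tab-separated data and an expected width of 5 columns:

$handle = fopen("myfile.txt", "r");
$valid = true;
while (($cols = fgetcsv($handle, 0, "\t")) !== FALSE) {
    if (count($cols) != 5) { // expected column count
        $valid = false;
        break;
    }
}
fclose($handle);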
I'm assuming you're talking about the length of each line in the file. If so, here's a possible solution:
$file_handle = fopen("myfile", "r");
while (($line = fgets($file_handle)) !== false) {
    // rtrim: fgets keeps the trailing newline, which would skew the length check
    if (strlen(rtrim($line, "\r\n")) != 5) {
        throw new Exception("Could not save file to database.");
    }
}
fclose($file_handle);
Yes, it is possible. I've done that exact thing. Use PHP's csv processing functions.
You will need these functions:
fopen()
fgetcsv()
And possibly some others.
fgetcsv returns an array.
I'll give you a short example of how you can validate.
here's the csv:
col1,col2,col3,col4
1,2,3,4
1,2,3,4,
1,2,3,4,5
1,2,3,4
I'll skip the fopen part and go straight to the validation step.
Note that "\t" is the tab character.
$row_length = 0;
$i = 0;
while (($row = fgetcsv($handle, 0, "\t")) !== FALSE) {
    if ($i == 0) {
        $row_length = sizeof($row);
    } else {
        if (sizeof($row) != $row_length) {
            echo "Error, line $i of the data does not match header size";
            break;
        }
    }
    $i++; // without this, every row would be treated as the header
}
That would test each row to make sure it is the same as the 1st row's ($i == 0) length.
EDIT:
And, in case you don't know how to search the internet, here is the page for fgetcsv:
http://php.net/manual/en/function.fgetcsv.php
Here is the function prototype:
array fgetcsv ( resource $handle [, int $length = 0 [, string $delimiter = ',' [, string $enclosure = '"' [, string $escape = '\\' ]]]] )
As you can see, it has everything you would need for doing a quick scan in PHP before you send your data to LOAD DATA INFILE.
I have solved your exact problem in my own program. My program also automatically eliminates duplicate rows and other cool stuff.
You can try to see if fgetcsv will suffice. If it doesn't, please be a bit more descriptive on what you mean by columns.
I have two CSV files, and both have the same data structure.
ID - Join_date - Last_Login
I want to compare them and get the number of exactly matching records. Based on this example:
the first file has 100 records, of which 20 are not included in the 2nd file;
the 2nd file has 120 records.
I want a script in PHP to compare these two files and build two separate CSV files:
one that removes from the 2nd file all the extra records which are not included in the first file,
and one that removes from the first file all the records which are not included in the 2nd file.
Thanks
There is a GNU utility, comm, that will do this really easily. You could exec it from PHP or just run it directly. If you don't have access to comm, the easiest thing would be to load both files into arrays (probably via file()) and use array_intersect().
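A minimal sketch of the array_intersect() route; the file names are assumptions, and since it compares whole lines it assumes both files format their records identically:

$lines1 = file('file1.csv', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$lines2 = file('file2.csv', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
// keep only the records present in both files, then rewrite both
$common = array_intersect($lines1, $lines2);
file_put_contents('file1_cleaned.csv', implode(PHP_EOL, $common) . PHP_EOL);
file_put_contents('file2_cleaned.csv', implode(PHP_EOL, $common) . PHP_EOL);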
You can try this for a modest number of CSV rows; if you have a very large CSV, I would advise importing it directly into MySQL.
function csvToArray($csvFile, $full = false) {
$handle = fopen ( $csvFile, "r" );
$array = array ();
while ( ($data = fgetcsv ( $handle )) !== FALSE ) {
$array [] = ($full === true) ? $data : $data[0]; // Full array or only ID
}
fclose($handle);
return $array;
}
$file1 = "file1.csv" ;
$file2 = "file2.csv" ;
$fileData1 = csvToArray($file1);
$fileData2 = csvToArray($file2);
var_dump(array_diff($fileData1,$fileData2));
var_dump(array_intersect($fileData1,$fileData2));
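To produce the two cleaned files the question asks for, a follow-up sketch along these lines could work; the output names are assumptions, and in_array() makes this quadratic, which is fine at a few hundred records:

// keep only the IDs present in both files
$commonIds = array_intersect($fileData1, $fileData2);

foreach (array($file1, $file2) as $file) {
    $out = fopen("cleaned_" . $file, "w");
    foreach (csvToArray($file, true) as $row) {
        if (in_array($row[0], $commonIds)) {
            fputcsv($out, $row); // write only the matching records
        }
    }
    fclose($out);
}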