I have been experimenting with CSV import methods. At the moment I use fgetcsv with per-row validation and inserts, which works but is slow. So I thought about using LOAD DATA INFILE instead.
The CSV is exported by Excel with ; as the standard delimiter. It comes from different users, so I cannot change this.
The problem is one column, which can contain HTML and therefore also ; inside the markup.
When I use fgetcsv with inserts I have no problems identifying headers. But with LOAD DATA INFILE this delimiter is guaranteed to cause problems.
This is how I do it now:
while (($aCell = fgetcsv($handle, 1000, ";")) !== FALSE) {
    $num = count($aCell);
    // Run through the columns and build an array
    for ($a = 0; $a < $num; $a++) {
        // Identify headers ($field holds the header name for each column index)
        switch ($field[$a]) {
            case "column1":
                // Do some validation etc.
                $array['column1'] = $aCell[$a];
                break;
            case "column2":
                // Do some validation etc.
                $array['column2'] = $aCell[$a];
                break;
            // and so on
        }
    }
    // INSERT $array into the db for each row
}
An example of the CSV structure:
column1;column2
1;<p style="margin-top: 10;">...
When I use LOAD DATA INFILE I get problems with the description column because of the ";".
LOAD DATA INFILE '/filepath/import.csv' INTO TABLE import_csv
FIELDS TERMINATED BY ';'
LINES TERMINATED BY '\n'
What solutions do I have?
One idea is to read the lines with fgetcsv first (as I do above), build a temporary array with no validation and no inserts, and then implode each row with another delimiter so that I can use the result with LOAD DATA INFILE.
I'm not sure whether that is faster or makes more sense than doing the whole thing with fgetcsv, bearing in mind that there will never be more than 1000 rows in the CSV.
EDIT
My solution now is, as written above: build an array and implode it with a new, unique delimiter:
$fp = fopen("file.csv", "w");
foreach ((array)$oldarray as $val) {
    // Re-join each row with a delimiter that cannot occur in the data
    fwrite($fp, implode("*|*", $val) . "\r\n");
}
fclose($fp);
With this unique delimiter I can use LOAD DATA INFILE without problems.
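For completeness, a sketch of the matching import statement (assuming the re-delimited file and the import_csv table from above; the connection details are placeholders, and depending on the MySQL setup you may need LOAD DATA LOCAL INFILE, the FILE privilege, or a secure_file_priv-approved path):

// A sketch: run the import against the re-delimited file
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$pdo->exec("LOAD DATA INFILE '/filepath/file.csv' INTO TABLE import_csv
    FIELDS TERMINATED BY '*|*'
    LINES TERMINATED BY '\\r\\n'");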
Related
I am developing an application which has to read a large CSV file and process the data. It will definitely not be possible to do it in one request, because processing the data also takes time; it is not just about reading.
What I have tried so far, and what has been working well, is the following:
// Open file
$handle = fopen($file, 'r');
// Move pointer to a place where it stopped last time
fseek($handle, $offset);
// Read limited line and process
for ($i = 0; $i < $limit; $i++) {
// Get length of line for offset purposes
$newlength = strlen(fgets($handle));
// Move pointer back. fgets moves pointer so we move it back for fgetcsv to get that line again
fseek($handle, $offset);
$line = fgetcsv($handle, 0, $csv_delimiter);
// Process data here
// Save offset
$offset += $newlength;
}
So the problem is here on this line:
$newlength = strlen(fgets($handle));
It fails when a CSV column has line breaks.
I also tried $newlength = strlen(implode(';', fgetcsv($handle, 0, $csv_delimiter))); but this does not always work either. It is usually off by a few characters; quoting and line endings are probably not handled properly there.
All I need is the length of a CSV record: not just a single physical line, but a record which might have line breaks within quotes.
Does anybody have a better solution?
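For what it's worth, one way to sidestep measuring the record yourself (a sketch, not from the original post) is to let fgetcsv() consume the record, embedded newlines and all, and then ask ftell() where the file pointer ended up:

// Same variables as above: $file, $offset, $limit, $csv_delimiter
$handle = fopen($file, 'r');
fseek($handle, $offset);
for ($i = 0; $i < $limit; $i++) {
    $line = fgetcsv($handle, 0, $csv_delimiter);
    if ($line === FALSE) {
        break; // end of file
    }
    // Process data here
    // fgetcsv already moved the pointer past the whole record,
    // including line breaks inside quoted fields
    $offset = ftell($handle);
}
fclose($handle);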
Do one thing: create a MySQL staging table named "my_csv_data", and give it a field for every field in the CSV file, plus one extra field "is_processed" of type enum(0,1) with default value '0'.
Now import all your CSV data into that table; single inserts there will never take much time.
Now create a function/file which fetches 10 or 100 records from my_csv_data where is_processed='0', processes them, and on success updates the "is_processed" field to '1'.
Now create a cronjob which hits that file/function periodically.
This way the data will quietly make its way into your table without disturbing any admin or front-end user.
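A rough sketch of that worker in PHP (the table and column names come from this answer; the connection details, the id primary key, and process_row() are assumptions):

// Called by the cronjob: fetch a small batch of unprocessed rows,
// process them, then mark them as done.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$rows = $pdo->query(
    "SELECT * FROM my_csv_data WHERE is_processed = '0' LIMIT 100"
)->fetchAll(PDO::FETCH_ASSOC);

$update = $pdo->prepare(
    "UPDATE my_csv_data SET is_processed = '1' WHERE id = ?"
);
foreach ($rows as $row) {
    process_row($row); // placeholder for the real processing
    $update->execute(array($row['id']));
}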
I have CodeIgniter code where I upload the CSV file data and insert it into a MySQL database. Hope this will help you:
if($_FILES["file"]["size"] > 0)
{
    // Assumption: the uploaded file is read from PHP's temp location
    $filename = $_FILES["file"]["tmp_name"];
    $file = fopen($filename, "r");
    // Load the model once, not on every row
    $this->load->model('currency_model');
    while (($emapData = fgetcsv($file, 10000, ",")) !== FALSE)
    {
        $data = array(
            'reedumption_code' => $emapData[0],
            'jb_note_id' => $jbmoney_id,
            'jbmoney' => $jbamount,
            'add_date' => time(),
            'modify_date' => time(),
            'user_id' => 0,
            'status' => 1,
            'assign_date' => 0,
            'del_status' => 1,
            'store_status' => 1
        );
        $insertId = $this->currency_model->insertCSV($data);
    }
    fclose($file);
    redirect('currency/add_currency?msg=Data Imported Successfully');
}
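As a side note: if the file grows, one model call per row gets expensive. CodeIgniter's Query Builder offers insert_batch(), which sends many rows in a single query. A sketch of the reworked loop (the 'currency' table name is an assumption; adapt it to your model):

$batch = array();
while (($emapData = fgetcsv($file, 10000, ",")) !== FALSE)
{
    // Build the same $data array as above for each row
    $batch[] = array(
        'reedumption_code' => $emapData[0],
        'jb_note_id' => $jbmoney_id,
        'jbmoney' => $jbamount,
        'add_date' => time(),
        'modify_date' => time(),
        'user_id' => 0,
        'status' => 1,
        'assign_date' => 0,
        'del_status' => 1,
        'store_status' => 1
    );
}
// One multi-row INSERT instead of one query per CSV line
$this->db->insert_batch('currency', $batch);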
I have a CSV that is downloaded from the wholesaler every night with updated prices.
What I need to do is edit the price column (2nd column) and multiply the current value by 1.3 (a 30% increase).
My code to read the provided CSV and take just the columns I need is below; however, I can't figure out how to edit the price column.
<?php
// open the csv file in write mode
$fp = fopen('var/import/tb_prices.csv', 'w');
// read csv file
if (($handle = fopen("var/import/Cbl_4036_2408.csv", "r")) !== FALSE) {
$targetColumns = array(1, 2, 3); // take data from the 2nd, 3rd and 4th columns (0-indexed)
while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
$targetData = array(); // array that hold target data
foreach($targetColumns as $column){ // loop through the targeted columns array
if($column[2]){
$data[$column] = $data[0] * 1.3;
}
$targetData[] = $data[$column]; // get the data from the column
}
# Populate the multidimensional array.
$csvarray[$nn] = $targetData; // add target data to csvarray
// write csv file
fputcsv($fp, $targetData);
}
fclose($handle);
fclose($fp);
echo "CSV File Written Successfully!";
}
?>
Could somebody point me in the right direction please, explaining how you worked it out too, so I can learn at the same time?
You are always multiplying your price column as $data[0] * 1.3.
The index is likely wrong there.
Other views:
If this is a one-off job for this CSV data, try to solve it with MySQL alone. Create a table matching the CSV, import the .csv data into that MySQL table, and then run whatever SQL you want on it.
No loops, no coding, no file reads/writes, and precise control over what you want to do with UPDATE. You just need to be aware of the delimiters: line separators (e.g. \r\n), column separators (e.g. comma, tab or semicolon), and whether the data is enclosed in double/single quotes or not.
Once you have modified your data, you can export it back to CSV again.
If you want to handle the .csv file itself, open it with one handle (read-only mode) and write to another file, preserving the original data.
You say that the column containing the price is the second, but then you use index zero. Anyway, the whole thing can be easier:
$handle = fopen("test.csv", "r");
if ( $handle !== FALSE) {
$out = "";
while (($data = fgetcsv($handle, 1000, ";")) !== FALSE) {
$data[1] = ((float)$data[1] * 1.3);
$out .= implode(";",$data) . "\n";
}
fclose($handle);
file_put_contents("test2.csv", $out);
}
This code opens a CSV file with a semicolon as separator.
Then it reads every line and, for each line, multiplies the second column (index 1) by 1.3.
This line:
$out .= implode(";",$data) . "\n";
generates a line for the new CSV file; see implode in the official documentation.
Afterwards I close the file handle. It's useless to keep two files open at once when you can write the second file in one go; at least that holds for small files.
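One caveat with implode(): if a field can itself contain the separator or quotes, the output breaks. fputcsv() takes care of the quoting for you (a sketch using the same file names as above):

$in = fopen("test.csv", "r");
$out = fopen("test2.csv", "w");
while (($data = fgetcsv($in, 1000, ";")) !== FALSE) {
    $data[1] = (float)$data[1] * 1.3;
    fputcsv($out, $data, ";"); // quotes any field containing ";" or '"'
}
fclose($in);
fclose($out);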
I am simply trying to use the URL query string as a lookup code in a CSV table. This works 9 times out of 10, but occasionally it returns the wrong line (usually a few lines below what should be the correct line).
The CSV looks something like this:
taskcode1, info1, info1, info1
taskcode2, info2, info2, info2
taskcode3, info3, info3, info3
The problem is that sometimes (around 1 in 10 times so far), a given URL query for taskcode1 will actually return the info3 line.
This CSV file is read concurrently by more than one user. Could the problem stem from simultaneous reading? I know there can be issues with writing, and a flock on the file may be necessary. Here's the actual code from my PHP script. Thank you for any advice.
Notice that as soon as the task code is found ($this_taskcode == $taskcode), I break out of the while loop.
//get query from request and look up task configuration
$csv_file_path = "tasks.csv";
$taskcode = $_SERVER['QUERY_STRING'];
//open csv file and find taskcode
$fid = fopen($csv_file_path, 'r');
//loop through each line of csv until taskcode is found, then save the whole line as $hit
while (($line = fgetcsv($fid)) !== FALSE){
$this_taskcode = $line[0];
if ($this_taskcode == $taskcode){
$hit = $line;
break;
}
}
fclose($fid);
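If you want to rule locking in or out as the cause, a shared lock around the read is cheap to try (a sketch, not a confirmed fix; note that flock() is advisory, so whatever writes the file would have to take LOCK_EX for this to help):

$fid = fopen($csv_file_path, 'r');
if (flock($fid, LOCK_SH)) { // shared lock: many readers, but no writer
    while (($line = fgetcsv($fid)) !== FALSE) {
        if ($line[0] == $taskcode) {
            $hit = $line;
            break;
        }
    }
    flock($fid, LOCK_UN); // release the lock
}
fclose($fid);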
I am writing a PHP script that will parse through a file (synonyms.dat) and match a list of synonyms to their parent word, for about 150k words.
Example from file:
1|2
(adj)|one|i|ane|cardinal
(noun)|one|I|ace|single|unity|digit|figure
1-dodecanol|1
(noun)|lauryl alcohol|alcohol
1-hitter|1
(noun)|one-hitter|baseball|baseball game|ball
10|2
(adj)|ten|x|cardinal
(noun)|ten|X|tenner|decade|large integer
100|2
(adj)|hundred|a hundred|one hundred|c|cardinal
(noun)|hundred|C|century|one C|centred|large integer
1000|2
(adj)|thousand|a thousand|one thousand|m|k|cardinal
(noun)|thousand|one thousand|M|K|chiliad|G|grand|thou|yard|large integer
10000|1
(noun)|ten thousand|myriad|large
In the example above I want to link ten thousand, myriad, large to the word 10000.
I have tried various methods of reading the .dat file into memory using file_get_contents and then exploding it on \n, then using various array-search techniques to find the 'parent' word and its synonyms. However, this is extremely slow and more often than not crashes my web server.
I believe what I need to do is use preg_match_all to explode the string, and then just iterate over the string, inserting into my database where appropriate.
$contents = file_get_contents($page);
preg_match_all("/([^\s]+)\|[0-9].*/",$contents,$out, PREG_SET_ORDER);
This matches each of the header lines:
1|2
1-dodecanol|1
1-hitter|1
But I don't know how to link the fields between the matches, i.e. the synonyms themselves.
This script is intended to be run once, to get all the information into my database appropriately. For those interested, I have a table 'synonym_index' which holds a unique id for each word, as well as the word itself. Then another table 'synonym_listing' contains a 'word_id' column and a 'synonym_id' column, where each column is a foreign key to synonym_index. There can be multiple synonym_ids for each word_id.
Your help is greatly appreciated!
You can use explode() to split each line into fields. (Or, depending on the precise format of the input, fgetcsv() might be a better choice.)
Illustrative example, which will almost certainly need adjustment for your specific use case and data format:
$infile = fopen('synonyms.dat', 'r');
while (!feof($infile)) {
$line = rtrim(fgets($infile), "\r\n");
if ( $line === '' ) {
continue;
}
// Line follows the format HEAD_WORD|NUMBER_OF_SYNONYM_LINES
list($headWord, $n) = explode('|', $line);
$synonyms = array();
// For each synonym line...
while ( $n-- ) {
$line = rtrim(fgets($infile), "\r\n");
$fields = explode('|', $line);
$partOfSpeech = substr(array_shift($fields), 1, -1);
$synonyms[$partOfSpeech] = $fields;
}
// Now here, when $headWord is '10000', $synonyms should be array(
//     'noun' => array('ten thousand', 'myriad', 'large')
// )
}
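To push the parsed data into the two tables the question describes, something along these lines could run at the end of each outer iteration (a sketch assuming PDO; getWordId() is a hypothetical helper, and the table/column names come from the question):

// Hypothetical helper: look a word up in synonym_index,
// insert it if missing, and return its id.
function getWordId(PDO $pdo, $word) {
    $sel = $pdo->prepare("SELECT id FROM synonym_index WHERE word = ?");
    $sel->execute(array($word));
    $id = $sel->fetchColumn();
    if ($id === false) {
        $ins = $pdo->prepare("INSERT INTO synonym_index (word) VALUES (?)");
        $ins->execute(array($word));
        $id = $pdo->lastInsertId();
    }
    return $id;
}

// At the end of the outer while loop, once $headWord and $synonyms are filled:
$wordId = getWordId($pdo, $headWord);
$link = $pdo->prepare(
    "INSERT INTO synonym_listing (word_id, synonym_id) VALUES (?, ?)"
);
foreach ($synonyms as $partOfSpeech => $words) {
    foreach ($words as $synonym) {
        $link->execute(array($wordId, getWordId($pdo, $synonym)));
    }
}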
Wow, for this type of functionality you have databases, with tables and indexes.
PHP is meant to serve a request/response, not to read a big file into memory. I advise you to put the data into a database; that will be much faster, and databases are made for exactly this.
Is it possible to validate a text file before I dump its data into a MySQL database?
I want to check if it contains, say, 5 columns (of data). If so, I go ahead with the following query:
LOAD DATA CONCURRENT INFILE 'c:/test/test.txt'
INTO TABLE DUMP_TABLE FIELDS TERMINATED BY '\t' ENCLOSED BY '' LINES TERMINATED BY '\n' ignore 1 lines.
If not, I remove the entire row. I repeat this process for all rows in the txt file.
The text file contains data of the format:
id col2 col3 2012-07-27-19:27:06 col5
id col2 col3 2012-07-25-09:58:50 col5
id col2 col3 2012-07-23-10:14:13 col5
EDIT: After reading your comments, here's the code for doing the same on tab-separated data:
$handler = fopen("myfile.txt","r");
$error = false;
$max_cols = 5; // expected number of columns
while (!feof($handler)){
    $linetocheck = fgets($handler);
    // chr(9) is the tab character; fgetcsv (http://es.php.net/manual/en/function.fgetcsv.php) gives the same result as fgets+explode
    $cols = explode(chr(9), $linetocheck);
    if (count($cols)>$max_cols){
        $error=true;
        break;
    }
}
fclose($handler);
if (!$error){
//...do stuff
}
This code reads a file, let's say "myfile.txt", line by line and sets the variable $error to true if any line is longer than $max_cols characters. (My apologies if that's not what you're asking; your question is not entirely clear to me.)
$handler = fopen("myfile.txt","r");
$error = false;
$max_cols = 5; // maximum allowed line length
while (!feof($handler)){
    $linetocheck = fgets($handler);
    if (strlen($linetocheck)>$max_cols){
        $error=true;
        break;
    }
}
fclose($handler);
if (!$error){
//...do stuff
}
I know it's an old thread, but I was looking for something similar myself and came across this topic; none of the answers provided here helped me.
So I went ahead and came up with my own solution, which is tested and works perfectly (and can be improved).
Assume we have a CSV file named example.csv that contains the following dummy data (on purpose, the last line, the 6th, contains one more value than the other rows):
Name,Country,Age
John,Ireland,18
Ted,USA,22
Lisa,UK,23
Michael,USA,20
Louise,Ireland,22,11
Now, when we're checking the CSV file to ensure all the rows have the same number of values, the following block of code will do the trick and pinpoint the line on which the error occurred:
function validateCsvColumnLength($pathToCsvFile)
{
if(!file_exists($pathToCsvFile) || !is_readable($pathToCsvFile)){
throw new \Exception("Filename doesn't exist or is not readable.");
}
if (!$handle = fopen($pathToCsvFile, "r")) {
throw new \Exception("Stream error");
}
$rowLength = [];
while (($data = fgetcsv($handle)) !== FALSE) {
    $rowLength[] = count($data);
}
fclose($handle);
// +1 turns the 0-based array key into a 1-based line number
$rowKeyWithError = array_search(max($rowLength), $rowLength) + 1;
$differentRowCount = count(array_unique($rowLength));
// if there's a row that has more or less data, throw an error with the line that triggered it
if ($differentRowCount !== 1) {
    throw new \Exception("Error, data count from row {$rowKeyWithError} does not match header size");
}
}
return true;
}
To actually test it, just do a var_dump() to see the result:
var_dump(validateCsvColumnLength('example.csv'));
What columns do you mean? If you just mean the number of characters in each row, split (explode) the file into rows and check whether their lengths equal 5.
If you mean delimiter-separated columns, then count the occurrences of that separator in each row and again check whether the count equals 5; use fgetcsv for that.
I'm assuming you're talking about the length of each line in the file. If so, here's a possible solution.
$file_handle = fopen("myfile", "r");
while (!feof($file_handle)) {
$line = fgets($file_handle);
if(strlen($line)!=5) {
    throw new Exception("Could not save file to database.");
}
}
fclose($file_handle);
Yes, it is possible. I've done that exact thing. Use PHP's csv processing functions.
You will need these functions:
fopen()
fgetcsv()
And possibly some others.
fgetcsv returns an array.
I'll give you a short example of how you can validate.
here's the csv:
col1,col2,col3,col4
1,2,3,4
1,2,3,4,
1,2,3,4,5
1,2,3,4
I'll skip the fopen part and go straight to the validation step.
Note that "\t" is the tab character.
$row_length = 0;
$i = 0;
while (($row = fgetcsv($handle, 0, "\t")) !== FALSE) {
    if ($i == 0) {
        $row_length = sizeof($row);
    } else {
        if (sizeof($row) != $row_length) {
            echo "Error, line $i of the data does not match header size";
            break;
        }
    }
    $i++; // advance the row counter
}
That tests each row to make sure it matches the length of the 1st row ($i == 0).
EDIT:
And, in case you don't know how to search the internet, here is the page for fgetcsv:
http://php.net/manual/en/function.fgetcsv.php
Here is the function prototype:
array fgetcsv ( resource $handle [, int $length = 0 [, string $delimiter = ',' [, string $enclosure = '"' [, string $escape = '\\' ]]]] )
As you can see, it has everything you would need for doing a quick scan in PHP before you send your data to LOAD DATA INFILE.
I have solved your exact problem in my own program. My program also automatically eliminates duplicate rows and other cool stuff.
You can try to see if fgetcsv will suffice. If it doesn't, please be a bit more descriptive about what you mean by columns.