Is it possible to validate a text file before I dump its data into a MySQL database?
I want to check whether it contains, say, 5 columns (of data). If so, I go ahead with the following query:
LOAD DATA CONCURRENT INFILE 'c:/test/test.txt'
INTO TABLE DUMP_TABLE FIELDS TERMINATED BY '\t' ENCLOSED BY '' LINES TERMINATED BY '\n' IGNORE 1 LINES;
If a row doesn't, I remove that entire row. I repeat this for every row in the .txt file.
The text file contains data of the format:
id col2 col3 2012-07-27-19:27:06 col5
id col2 col3 2012-07-25-09:58:50 col5
id col2 col3 2012-07-23-10:14:13 col5
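A minimal sketch of that filtering step in PHP. It assumes tab-separated data with exactly 5 columns per row and no quoted fields, which matches ENCLOSED BY '' in the query. The function name filterRowsByColumnCount and the output path are hypothetical. It copies only the well-formed rows to a new file, which you would then load instead of the original:

```php
<?php
// Hypothetical helper: copy only rows with exactly $expectedColumns tab-separated
// fields from $inputPath to $outputPath. Returns the number of rows dropped.
function filterRowsByColumnCount(string $inputPath, string $outputPath, int $expectedColumns = 5): int
{
    $in = fopen($inputPath, 'r');
    $out = fopen($outputPath, 'w');
    if ($in === false || $out === false) {
        throw new RuntimeException('Could not open input or output file');
    }
    $dropped = 0;
    while (($line = fgets($in)) !== false) {
        // split on tabs after removing the line ending, the same way LOAD DATA will split it
        $fields = explode("\t", rtrim($line, "\r\n"));
        if (count($fields) === $expectedColumns) {
            fwrite($out, implode("\t", $fields) . "\n");
        } else {
            $dropped++;
        }
    }
    fclose($in);
    fclose($out);
    return $dropped;
}
```

For example, call filterRowsByColumnCount('c:/test/test.txt', 'c:/test/test_clean.txt') and point the same LOAD DATA statement at test_clean.txt. The header line is checked too: if it does not have 5 columns it is dropped, and IGNORE 1 LINES would then skip a real data row.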
EDIT: After reading your comments, here's the code for doing the same on tab-separated data:
$max_cols = 5; // expected maximum number of columns per line
$handler = fopen("myfile.txt", "r");
$error = false;
// fgets() returns the line (false at end of file); its second argument is a length, not an output variable
while (($linetocheck = fgets($handler)) !== false) {
    // split on tabs (chr(9)); using http://es.php.net/manual/en/function.fgetcsv.php you can get the same result as with fgets+explode
    $cols = explode(chr(9), rtrim($linetocheck, "\r\n"));
    if (count($cols) > $max_cols) {
        $error = true;
        break;
    }
}
fclose($handler);
if (!$error) {
    //...do stuff
}
This code reads a file, let's say "myfile.txt", line by line, and sets $error to true if any line is longer than $max_cols characters. (Apologies if that's not what you're asking; your question isn't entirely clear to me.)
$handler = fopen("myfile.txt", "r");
$error = false;
while (($linetocheck = fgets($handler)) !== false) {
    // strip the line ending so it doesn't count towards the length
    if (strlen(rtrim($linetocheck, "\r\n")) > $max_cols) {
        $error = true;
        break;
    }
}
fclose($handler);
if (!$error) {
    //...do stuff
}
I know it's an old thread, but I was looking for something similar and came across this topic, and none of the answers here helped me.
So I went ahead and came up with my own solution, which is tested and works (though it can be improved).
Assume we have a CSV file named example.csv that contains the following dummy data (the last line, the 6th, deliberately contains one more field than the other rows):
Name,Country,Age
John,Ireland,18
Ted,USA,22
Lisa,UK,23
Michael,USA,20
Louise,Ireland,22,11
To check that all rows have the same number of fields, the following function does the trick and pinpoints the line on which the error occurred:
function validateCsvColumnLength($pathToCsvFile)
{
    if (!file_exists($pathToCsvFile) || !is_readable($pathToCsvFile)) {
        throw new \Exception("Filename doesn't exist or is not readable.");
    }
    if (!$handle = fopen($pathToCsvFile, "r")) {
        throw new \Exception("Stream error");
    }
    $headerLength = null;
    $rowNumber = 0;
    while (($data = fgetcsv($handle)) !== FALSE) {
        $rowNumber++;
        if ($headerLength === null) {
            // the first row (header) defines the expected number of fields
            $headerLength = count($data);
        } elseif (count($data) !== $headerLength) {
            // a row has more or fewer fields than the header: report its 1-based line number
            fclose($handle);
            throw new \Exception("Error, data count from row {$rowNumber} does not match header size");
        }
    }
    fclose($handle);
    return true;
}
To actually test it, just do a var_dump() to see the result:
var_dump(validateCsvColumnLength('example.csv'));
What columns do you mean? If you just mean the number of characters per row, split (explode) the file into rows and check whether each row's length equals 5.
If you mean delimited columns, count the occurrences of the delimiter in each row and check that they add up to 5 columns (that is, 4 delimiters). Use fgetcsv for that.
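A small sketch contrasting the two ways of counting columns in one line, assuming a tab delimiter as in the question. Both function names are hypothetical. Counting delimiters with substr_count is simple but also counts a delimiter that sits inside a quoted field, while str_getcsv (the single-string counterpart of fgetcsv) respects the quotes:

```php
<?php
// Hypothetical helper: n delimiters separate n + 1 columns.
function columnsByDelimiterCount(string $line, string $delimiter = "\t"): int
{
    return substr_count(rtrim($line, "\r\n"), $delimiter) + 1;
}

// Hypothetical helper: let the CSV parser decide, so a delimiter inside "..." is not a separator.
function columnsByCsvParse(string $line, string $delimiter = "\t"): int
{
    return count(str_getcsv(rtrim($line, "\r\n"), $delimiter, '"', '\\'));
}
```

For a line like 1, "a<TAB>b", 3 (tab-separated), the first function reports 4 columns and the second reports 3.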
I'm assuming you're talking about the length of each line in the file. If so, here's a possible solution.
$file_handle = fopen("myfile", "r");
while (($line = fgets($file_handle)) !== false) {
    // strip the line ending so it isn't counted in the length
    $line = rtrim($line, "\r\n");
    if (strlen($line) != 5) {
        fclose($file_handle);
        throw new Exception("Could not save file to database.");
    }
}
fclose($file_handle);
Yes, it is possible. I've done that exact thing. Use PHP's csv processing functions.
You will need these functions:
fopen()
fgetcsv()
And possibly some others.
fgetcsv returns an array.
I'll give you a short example of how you can validate.
Here's the CSV:
col1,col2,col3,col4
1,2,3,4
1,2,3,4,
1,2,3,4,5
1,2,3,4
I'll skip the fopen part and go straight to the validation step.
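The example above uses commas for readability; since your data is tab-separated, the code passes "\t" (the tab character) as the delimiter.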
$row_length = 0;
$i = 0;
while (($row = fgetcsv($handle, 0, "\t")) !== false) {
    if ($i == 0) {
        // the header row defines the expected number of fields
        $row_length = count($row);
    } elseif (count($row) != $row_length) {
        echo "Error, line $i of the data does not match header size";
        break;
    }
    $i++;
}
That tests each row to make sure its length matches that of the first row ($i = 0).
EDIT:
And, in case you don't know how to search the internet, here is the page for fgetcsv:
http://php.net/manual/en/function.fgetcsv.php
Here is the function prototype:
array fgetcsv ( resource $handle [, int $length = 0 [, string $delimiter = ',' [, string $enclosure = '"' [, string $escape = '\\' ]]]] )
As you can see, it has everything you need for a quick scan in PHP before you send your data to LOAD DATA INFILE.
I have solved your exact problem in my own program. My program also automatically eliminates duplicate rows and other cool stuff.
You can try to see if fgetcsv will suffice. If it doesn't, please be a bit more descriptive on what you mean by columns.
Related
I am developing an application that has to read a large CSV file and process the data. It will definitely not be possible to do it in one request, because processing the data also takes time; it is not just about reading.
So what I tried so far and what has been working well so far is the following:
// Open file
$handle = fopen($file, 'r');
// Move pointer to a place where it stopped last time
fseek($handle, $offset);
// Read limited line and process
for ($i = 0; $i < $limit; $i++) {
// Get length of line for offset purposes
$newlength = strlen(fgets($handle));
// Move pointer back. fgets moves pointer so we move it back for fgetcsv to get that line again
fseek($handle, $offset);
$line = fgetcsv($handle, 0, $csv_delimiter);
// Process data here
// Save offset
$offset += $newlength;
}
So the problem is here on this line:
$newlength = strlen(fgets($handle));
It fails when csv column has line breaks.
I also tried $newlength = strlen(implode(';', fgetcsv($handle, 0, $csv_delimiter))); but this does not always work. It is usually off by a few characters, probably because quotes and line endings are not accounted for.
All I need is the length of the CSV record: not just a single physical line, but a CSV line that might contain line breaks within quotes.
Does anybody have a better solution?
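One possible approach, offered as a sketch rather than a tested fix: instead of measuring the line with fgets, let fgetcsv consume the whole record (including line breaks inside quotes) and then ask ftell where the file pointer ended up. That position is exactly the offset to resume from on the next request. The function name readCsvChunk and its return shape are hypothetical:

```php
<?php
// Hypothetical helper: read up to $limit CSV records starting at byte $offset.
// Returns [records, newOffset]; newOffset is where the next call should resume.
function readCsvChunk(string $file, int $offset, int $limit, string $delimiter = ';'): array
{
    $handle = fopen($file, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $file");
    }
    fseek($handle, $offset);
    $records = [];
    for ($i = 0; $i < $limit; $i++) {
        $line = fgetcsv($handle, 0, $delimiter, '"', '\\');
        if ($line === false) {
            break; // end of file
        }
        $records[] = $line; // process data here
    }
    // ftell reports the position after the last record read,
    // including any line breaks that were inside quoted fields
    $newOffset = ftell($handle);
    fclose($handle);
    return [$records, $newOffset];
}
```

Store the returned offset (in the session, a database row, etc.) and pass it back in on the next request; no per-line length arithmetic is needed.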
Do one thing: create a MySQL table named "my_csv_data" (a regular table, not a TEMPORARY one, since TEMPORARY tables disappear when the session ends) with one column for each field in the CSV file, plus an extra "is_processed" column of type ENUM('0','1') with default value '0'.
Now import all your CSV data into that table; a single bulk import will not take long.
Then create a function/script that fetches 10 or 100 records from my_csv_data where is_processed='0', processes them, and, if processing succeeds, sets is_processed to '1'.
Finally, create a cron job that calls that script periodically.
This way the data is inserted into your table silently in the background, without disturbing any admin or front-end user.
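A minimal sketch of the worker the cron job would call, using PDO. It assumes my_csv_data also has an integer primary key column named id (hypothetical, not part of the answer), and the function name processBatch and the callback are hypothetical too. The SQL is plain enough to run on MySQL as described:

```php
<?php
// Hypothetical worker: process up to $limit unprocessed rows from my_csv_data.
// Assumes an integer primary key `id` alongside the CSV columns and `is_processed`.
function processBatch(PDO $pdo, callable $process, int $limit = 100): int
{
    // fetch the next batch of unprocessed rows ($limit is an int, so concatenation is safe)
    $rows = $pdo->query(
        "SELECT * FROM my_csv_data WHERE is_processed = '0' ORDER BY id LIMIT " . $limit
    )->fetchAll(PDO::FETCH_ASSOC);

    $mark = $pdo->prepare("UPDATE my_csv_data SET is_processed = '1' WHERE id = ?");
    $done = 0;
    foreach ($rows as $row) {
        // only mark the row as processed if the callback reports success
        if ($process($row)) {
            $mark->execute([$row['id']]);
            $done++;
        }
    }
    return $done;
}
```

Rows whose callback returns false stay at '0' and are picked up again on the next run, so a failed row is retried rather than lost.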
I have CodeIgniter code where I upload CSV file data and insert it into a MySQL database. Hope this helps.
if($_FILES["file"]["size"] > 0)
{
$file = fopen($filename, "r");
while (($emapData = fgetcsv($file, 10000, ",")) !== FALSE)
{
$data = array(
'reedumption_code' => $emapData[0],
'jb_note_id' =>$jbmoney_id,
'jbmoney' =>$jbamount,
'add_date'=>time(),
'modify_date'=>time(),
'user_id'=>0,
'status'=>1,
'assign_date'=>0,
'del_status'=>1,
'store_status'=>1
);
$this->load->model('currency_model');
$insertId = $this->currency_model->insertCSV($data);
}
fclose($file);
redirect('currency/add_currency?msg=Data Imported Successfully');
}
I have a CSV that is downloaded from the wholesaler every night with updated prices.
What I need to do is edit the price column (2nd column) and multiply the current value by 1.3 (30%).
My code to read the provided CSV and take just the columns I need is below, however I can't seem to figure out how to edit the price column.
<?php
// open the csv file in write mode
$fp = fopen('var/import/tb_prices.csv', 'w');
// read csv file
if (($handle = fopen("var/import/Cbl_4036_2408.csv", "r")) !== FALSE) {
$targetColumns = array(1, 2, 3); // get data from the 1st, 4th and 15th column
while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
$targetData = array(); // array that hold target data
foreach($targetColumns as $column){ // loop throught the targeted columns array
if($column[2]){
$data[$column] = $data[0] * 1.3;
}
$targetData[] = $data[$column]; // get the data from the column
}
# Populate the multidimensional array.
$csvarray[$nn] = $targetData; // add target data to csvarray
// write csv file
fputcsv($fp, $targetData);
}
fclose($handle);
fclose($fp);
echo "CSV File Written Successfully!";
}
?>
Could somebody point me in the right direction please, explaining how you've worked out the function too so I can learn at the same time.
You are always multiplying the price column as $data[0] * 1.3.
That is probably what's wrong here: $data[0] is the first column, not the second.
Other views:
If you only need to do this once in the lifetime of this data (CSV) handling, try to solve it with MySQL alone. Create a table matching the CSV's columns, import the .csv data into it, and then operate on it in SQL as you want.
No loops, no coding, no file read/write, and precise control over what you do with UPDATE. You just need to be aware of the delimiters: line separators (e.g. \r\n), column separators (e.g. comma, tab or semicolon), and whether the data is enclosed in double or single quotes.
Once you modify your data, you can export it back to CSV again.
If you want to handle the .csv file itself, open it with one handle (read-only mode) and write to another file, preserving the original data.
You say that the column containing the price is the second one, but then you use index zero. Anyway, the whole thing can be done more simply:
$handle = fopen("test.csv", "r");
if ( $handle !== FALSE) {
$out = "";
while (($data = fgetcsv($handle, 1000, ";")) !== FALSE) {
$data[1] = ((float)$data[1] * 1.3);
$out .= implode(";",$data) . "\n";
}
fclose($handle);
file_put_contents("test2.csv", $out);
}
This code opens a CSV file that uses a semicolon as the separator (use "," in both fgetcsv and implode for a comma-separated file).
Then it reads every line and multiplies the second column (index 1) by 1.3.
This line
$out .= implode(";",$data) . "\n";
generates a line for the new CSV file; see implode in the official documentation.
After that I close the file handle. There's no point keeping two files open when you can write the second file in one go with file_put_contents; this holds for small files.
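A variation on the same idea, sketched under a few assumptions: the price is the 2nd column (index 1), the header row's price cell is not numeric, and rounding to 2 decimals is acceptable. markUpPrices is a hypothetical name. It writes each row with fputcsv instead of implode, so fields that contain the delimiter (for example a product name with a comma) are quoted correctly in the output:

```php
<?php
// Hypothetical helper: copy $in to $out, multiplying column $priceIndex by $factor.
// Column indexes are zero-based, so the 2nd column is index 1.
function markUpPrices(string $in, string $out, float $factor = 1.3, int $priceIndex = 1, string $delimiter = ','): void
{
    $src = fopen($in, 'r');
    $dst = fopen($out, 'w');
    if ($src === false || $dst === false) {
        throw new RuntimeException('Could not open input or output file');
    }
    while (($row = fgetcsv($src, 0, $delimiter, '"', '\\')) !== false) {
        // skips the header (non-numeric) and rows that are too short
        if (isset($row[$priceIndex]) && is_numeric($row[$priceIndex])) {
            // round to 2 decimal places to avoid long floating-point tails
            $row[$priceIndex] = round((float)$row[$priceIndex] * $factor, 2);
        }
        // fputcsv quotes fields that contain the delimiter, unlike implode()
        fputcsv($dst, $row, $delimiter, '"', '\\');
    }
    fclose($src);
    fclose($dst);
}
```

Reading from one file and writing to another also keeps the wholesaler's original download intact.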
I'm simply trying to use the URL query string as a lookup code in a CSV table. This works 9 times out of 10, but occasionally it returns the wrong line (usually a few lines below the correct one).
The CSV looks something like this:
taskcode1, info1, info1, info1
taskcode2, info2, info2, info2
taskcode3, info3, info3, info3
The problem is that sometimes (around 1/10 times so far), a given url query of taskcode1 will actually return line info3.
This CSV file is being read concurrently by more than one user. Could the problem stem from simultaneous reads? I know there can be issues with writing, and a flock on the file may be necessary. Here's the actual code in my PHP script. Thank you for any advice.
Notice that as soon as the task code is found, $this_taskcode == $taskcode, I break the while loop.
//get query from request and look up task configuration
$csv_file_path = "tasks.csv";
$taskcode = $_SERVER['QUERY_STRING'];
//open csv file and find taskcode
$fid = fopen($csv_file_path, 'r');
//loop through each line of csv until taskcode is found, then save the whole line as $hit
while (($line = fgetcsv($fid)) !== FALSE){
$this_taskcode = $line[0];
if ($this_taskcode == $taskcode){
$hit = $line;
break;
};
}
fclose($fid);
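No answer to this question is included here, so the following is only a defensive sketch, not a diagnosis. It differs from the code above in a few ways that are easy to rule out: it takes a shared lock with flock(LOCK_SH), which only helps if whatever writes the file takes LOCK_EX; it trims the spaces after the commas; it compares with === because PHP's == compares numeric-looking strings numerically; and it returns null instead of leaving $hit undefined or stale when the code is not found. findTaskByCode is a hypothetical name:

```php
<?php
// Hypothetical helper: return the CSV row whose first column equals $taskcode, or null.
function findTaskByCode(string $csvFilePath, string $taskcode): ?array
{
    $fid = fopen($csvFilePath, 'r');
    if ($fid === false) {
        throw new RuntimeException("Cannot open $csvFilePath");
    }
    // shared (advisory) lock: readers don't block each other; a writer using LOCK_EX must wait
    flock($fid, LOCK_SH);
    $hit = null;
    while (($line = fgetcsv($fid, 0, ',', '"', '\\')) !== false) {
        if ($line === [null]) {
            continue; // blank line
        }
        // trim() removes the space after each comma in "taskcode1, info1, ..."
        $fields = array_map('trim', $line);
        if ($fields[0] === trim($taskcode)) {
            $hit = $fields;
            break;
        }
    }
    flock($fid, LOCK_UN);
    fclose($fid);
    return $hit;
}
```

It would be called as findTaskByCode("tasks.csv", $_SERVER['QUERY_STRING']). If the query string can carry extra parameters, it is worth logging the exact value that produced a wrong line.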
I'm experimenting with CSV import methods. At the moment I use fgetcsv with per-row validation and inserts, which works but is slow, so I thought I'd use LOAD DATA INFILE instead.
The CSV is exported by Excel with ; as the standard delimiter. It comes from different users, so I cannot change this.
The problem is one column, which can contain HTML that itself uses ;.
When I use fgetcsv with inserts I have no problems identifying headers, but with LOAD DATA INFILE I definitely get problems with this delimiter.
This is how I do it now:
while (($aCell = fgetcsv($handle, 1000, ";")) !== FALSE) {
$num = count($aCell);
//Run through columns, build array
for ($a = 0; $a < $num; $a++) {
// IDENTIFY HEADERS
switch ($field[$a]) {
case ($field_name = "comlumn1"):
// DO SOME VALIDATION etc.
$array['column1'] = $aCell[$a];
break;
case ($field_name = "comlumn2"):
// DO SOME VALIDATION etc.
$array['column2'] = $aCell[$a];
break;
// AND SO ON
}
}
// INSERT array in db for each row
}
An example of the CSV structure:
column1 column2
1 <p style="margin-top: 10;">...
When I use LOAD DATA INFILE I get problems with the column description because of the ";":
LOAD DATA INFILE '/filepath/import.csv' INTO TABLE import_csv
FIELDS TERMINATED BY ';'
LINES TERMINATED BY '\n'
What solutions do I have?
One idea is to read the lines with fgetcsv first (as I do above), build a temporary array without validation or inserts, and then implode it again with another delimiter so that I can use it with LOAD DATA INFILE.
I'm not sure whether that is faster, or even makes sense, compared with doing the whole thing with fgetcsv, given that there will never be more than 1000 rows in the CSV.
EDIT
My solution now is, as described above, to build an array and implode it with a new unique delimiter:
$fp = fopen("file.csv","w");
foreach((array)$oldarray as $val) {
fwrite($fp,implode("*|*",$val)."\r\n");
}
fclose($fp);
With this unique delimiter I can use LOAD DATA INFILE with no problem.
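A possible alternative to a hand-picked delimiter, sketched under assumptions: PHP 7.4+ (for an empty $escape argument) and that your import_csv columns match the rows. writeForLoadData is a hypothetical name. A custom string like *|* breaks if a value ever contains it, and a line break inside the HTML would still split a row. fputcsv instead encloses any field containing ;, a quote, a space or a line break in double quotes and doubles embedded quotes, which LOAD DATA can read back with OPTIONALLY ENCLOSED BY:

```php
<?php
// Hypothetical helper: write rows (arrays of strings) so LOAD DATA can read them back.
// Fields containing ';', '"', spaces or line breaks are enclosed in double quotes,
// and embedded quotes are doubled, so HTML like style="margin-top: 10;" survives.
function writeForLoadData(array $rows, string $path): void
{
    $fp = fopen($path, 'w');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $path");
    }
    foreach ($rows as $row) {
        // empty $escape (PHP 7.4+) disables PHP's non-standard backslash escaping
        fputcsv($fp, $row, ';', '"', '');
    }
    fclose($fp);
}

// Matching statement (an assumption; check it against your table):
// LOAD DATA INFILE '/filepath/import.csv' INTO TABLE import_csv
//   FIELDS TERMINATED BY ';' OPTIONALLY ENCLOSED BY '"' ESCAPED BY ''
//   LINES TERMINATED BY '\n'
```

Excel itself already encloses fields containing the separator in double quotes, so the same OPTIONALLY ENCLOSED BY clause may also work directly on the original export.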
I would like to check whether a CSV file contains a header and, if so, ignore the header.
I have to check whether the first column is not a character.
The CSV file has the format: avgTemperature, minTemperature, maxTemperature
$f = fopen("./uploads/" .$filename, "r");
$string = "avgTemperature";
if (fgetcsv($f)==$string){
// read the first line and ignore it
fgets($f);
}
I assume your complete code uses a loop (while or for).
As such, you have a few options.
Simply skip the first row always.
Use logic to test for the header row then skip.
Either way, continue is the key piece.
PHP pseudo code:
while (…) {
if ($row == $header_row) {
continue;
}
// data rows
}
UPDATE
The logic for determining if the first row is a header row seems like a better solution in your case. You could use the following to test for that.
if ($row[0] == 'avgTemperature') {
// header row
}
Note: This assumes that the first column of data is avgTemperature and its header is avgTemperature. Adjust as necessary.
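The pseudo code and the header test above, combined into a small runnable sketch. It assumes the header's first cell is avgTemperature, as in the question, and readTemperatureRows is a hypothetical name. The trim() matters because the header may be written with spaces after the commas:

```php
<?php
// Hypothetical helper: return all data rows, skipping a header row if present.
function readTemperatureRows(string $path): array
{
    $f = fopen($path, 'r');
    if ($f === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $rows = [];
    while (($row = fgetcsv($f, 0, ',', '"', '\\')) !== false) {
        if ($row === [null]) {
            continue; // blank line
        }
        if (trim($row[0]) === 'avgTemperature') {
            continue; // header row: skip it
        }
        $rows[] = array_map('trim', $row);
    }
    fclose($f);
    return $rows;
}
```

Because the check runs on every row, the same code works whether or not the uploaded file has a header.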
Going from your comment, and from the idea that the actual data is temperatures (i.e. numeric data), if you do have headers, then they will be text strings and not numbers. Therefore you can do something like this:
$f = fopen("./uploads/" .$filename, "r");
if(!($data = fgetcsv($f))) {
return; //most likely empty file
}
if(!is_numeric($data[0])) {
//this is your header line - skip it - and read the next line
$data = fgetcsv($f);
}
while ($data) {
    //process a line of data
    // ...
    //and read the next line
    $data = fgetcsv($f);
}
EDIT: An alternative version of the last loop would look like this:
do {
    //process a line of data (assumes $data holds a row on entry, i.e. the file isn't just a header)
    // ...
}
while ($data = fgetcsv($f));