I am reading a CSV file to import data. One column holds auto-generated identifiers made of letters and digits. The problem is that in some rows my script reads the value as a number in exponential notation.
Example: 58597E68 is treated as 5.86E+72.
I need it to be read as a string, not as a number. The issue occurs only when the character E appears in the middle of the auto-generated identifier.
$feed = 'path-to-csv/import.csv';
if (!file_exists($feed)) {
    //$feed = 'import.csv';
    exit('Cannot find the CSV file: ' . $feed);
}
$row = 0;
if (($handle = fopen($feed, 'r')) !== FALSE) {
    while (($data_csv_rows = fgetcsv($handle, 1000000, ',')) !== FALSE) {
        $row++;
        if ($row == 1) {
            continue; // skipping header row
        }
        echo "Row " . ($row-1) . "<br>";
        print_r($data_csv_rows);
        echo "<br><br>";
    }
}
The problem is not your CSV handling but the original software (probably Excel) that produced the CSV.
CSV is a simple text format: when you find something like 5.86E+72 in a field, that is literally what the CSV data contains, and by then it is too late to fix it.
To avoid this, make sure the data is exported to CSV correctly.
Some PHP code to find this kind of bad data in a field:
if (strpos($value, 'E+') !== FALSE) {
    // look for a trailing exponent such as E+72
    preg_match('~E\+[0-9]+$~', $value, $preg_result);
    if (isset($preg_result[0])) {
        die('Probably wrong data found within "'.$value.'".');
    }
}
In your case it seems that 58597E68 is converted to float(5.8597E+72).
At least with str_getcsv() I can not recreate the problem, see https://3v4l.org/RZ1eA.
If the conversion did happen, it would be correct by definition: since there are no " around this data, PHP would try to determine the type of the data and whether it is potentially a numeric value... So be sure to add " around strings. See the PHP String to Numeric Conversion documentation.
Update: I can not reproduce your use-case! 58597E68 becomes "58597E68" with str_getcsv() and with fgetcsv() It is not autoconverted to float! See https://3v4l.org/oXkBu for details! I suspect there is something wrong with the data you provide us or your validation.
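For what it's worth, the behaviour described in the update can be checked with a few lines. This is only a sketch: the sample line and the variable names are made up for illustration, and the delimiter, enclosure and escape arguments are just the usual defaults spelled out. It suggests that the CSV functions hand back strings, and that a value like 58597E68 only becomes a float once later code treats it as a number.

```php
<?php
// Hypothetical sample line; the second field is the kind of ID from the question.
$line = 'abc,58597E68,42';

// str_getcsv() returns every field as a string, quoted or not.
$fields = str_getcsv($line, ',', '"', '\\');
var_dump($fields[1]);             // string(8) "58597E68"

// The string does look numeric to PHP, so casts or arithmetic will convert it.
var_dump(is_numeric($fields[1])); // bool(true)
$asFloat = (float) $fields[1];    // this is where 5.8597E+72 comes from

// Keeping it a string means never doing maths on it and comparing strictly.
$isSameId = ($fields[1] === '58597E68');
```

If the value still shows up as 5.86E+72 in the output, it may be worth looking at what happens to the field after it is read, or at the file itself.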
I have a CSV that is downloaded from the wholesaler every night with updated prices.
What I need to do is edit the price column (2nd column) and multiply the current value by 1.3 (a 30% markup).
My code to read the provided CSV and take just the columns I need is below, however I can't seem to figure out how to edit the price column.
<?php
// open the csv file in write mode
$fp = fopen('var/import/tb_prices.csv', 'w');
// read csv file
if (($handle = fopen("var/import/Cbl_4036_2408.csv", "r")) !== FALSE) {
$targetColumns = array(1, 2, 3); // get data from the 1st, 4th and 15th column
while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
$targetData = array(); // array that hold target data
foreach($targetColumns as $column){ // loop throught the targeted columns array
if($column[2]){
$data[$column] = $data[0] * 1.3;
}
$targetData[] = $data[$column]; // get the data from the column
}
# Populate the multidimensional array.
$csvarray[$nn] = $targetData; // add target data to csvarray
// write csv file
fputcsv($fp, $targetData);
}
fclose($handle);
fclose($fp);
echo "CSV File Written Successfully!";
}
?>
Could somebody point me in the right direction please, explaining how you've worked out the function too so I can learn at the same time.
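You always calculate the price as $data[0] * 1.3, whichever column you are processing.
That may be what is wrong here: index 0 is the first column, while you say the price is in the second one.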
Other views:
If this is a one-off operation on this data (CSV), try to solve it in MySQL itself. Create a table that matches the CSV layout, import the .csv data into that MySQL table, and then operate on it with SQL as you want.
No loops, no coding, no file reads or writes, and precise control over what you change with UPDATE. You just need to be aware of the delimiters: line separators (e.g. \r\n), column separators (e.g. comma, tab or semicolon), and whether the data is enclosed in double or single quotes.
Once you modify your data, you can export it back to CSV again.
If you want to handle the .csv file itself, open it in read-only mode and write to another file, so that the original data is preserved.
You say that the column containing the price is the second one, but then you use index zero. Anyway, the whole thing can be simpler:
$handle = fopen("test.csv", "r");
if ($handle !== FALSE) {
    $out = "";
    while (($data = fgetcsv($handle, 1000, ";")) !== FALSE) {
        $data[1] = ((float)$data[1] * 1.3);
        $out .= implode(";",$data) . "\n";
    }
    fclose($handle);
    file_put_contents("test2.csv", $out);
}
This code opens a CSV file that uses a semicolon as the separator (change ";" to "," if your file is comma-separated).
It then reads every line and, for each one, multiplies the second column (index 1) by 1.3.
This line
$out .= implode(";",$data) . "\n";
generates a line for the new CSV file. See implode in the official documentation.
After the loop I close the file handle. It is unnecessary to keep two files open at once when you can write the second file in one go. This only holds for small files, though.
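For larger files, or when fields may contain the separator themselves, a streaming variant with fputcsv() might be a better fit. The following is only a sketch under a few assumptions: applyMarkup() and its parameter names are made up for this example, the file is comma-separated, and prices are written with two decimals.

```php
<?php
// Hypothetical helper: copies a CSV from $source to $target and multiplies
// one column by $factor. Rows whose value in that column is not numeric
// (for example a header row) are copied unchanged.
function applyMarkup(string $source, string $target, int $priceColumn, float $factor): int
{
    $in = fopen($source, 'r');
    $out = fopen($target, 'w');
    if ($in === false || $out === false) {
        throw new RuntimeException('Could not open the input or output file.');
    }

    $changed = 0;
    // A length of 0 means "no limit on the line length".
    while (($row = fgetcsv($in, 0, ',', '"', '\\')) !== false) {
        if (isset($row[$priceColumn]) && is_numeric($row[$priceColumn])) {
            $price = (float) $row[$priceColumn] * $factor;
            // Two decimals, dot as decimal separator, no thousands separator.
            $row[$priceColumn] = number_format($price, 2, '.', '');
            $changed++;
        }
        // fputcsv() adds quotes only where a field needs them.
        fputcsv($out, $row, ',', '"', '\\');
    }

    fclose($in);
    fclose($out);
    return $changed; // number of rows whose price was updated
}
```

With the file names from the question, the call would look like applyMarkup('var/import/Cbl_4036_2408.csv', 'var/import/tb_prices.csv', 1, 1.3), where 1 is the zero-based index of the second column. Only one row is held in memory at a time, which is the main difference from building the whole output in a string.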
I created a CSV parser that works fine for some CSV files I've found online, but one that I converted from XLS to CSV via Microsoft Excel 2011 does not work.
The ones that work are formatted as such:
"Sort Order","Common Name","Formal Name","Type","Sub Type","Sovereignty","Capital","ISO 4217 Currency Code","ISO 4217 Currency Name","ITU-T Telephone Code","ISO 3166-1 2 Letter Code","ISO 3166-1 3 Letter Code","ISO 3166-1 Number","IANA Country Code TLD"
"1","Afghanistan","Islamic State of Afghanistan","Independent State",,,"Kabul","AFN","Afghani","+93","AF","AFG","004",".af".........................etc...
The one that doesn't work is formatted like this:
Order Id,Date Ordered,Date Returned,Product Id,Description,Order Reason Code,Return Qty,Order Return Comment,Ship To Name,Ship To Address1,Ship To Address2,Ship To Address3,Ship To City,Ship To State,Ship To Zipcode,Ship To Country,Disposition,Ship To Email,ShipVia
5555555,2013-07-05 13:58:36.000,2013-08-16 00:00:00.000,5555-55,0555 - Some Test Thing,Refund,2,,jeric beatty,123 fake st,,,burke,NJ,55055,US,Discard,test#test.com,Super Fast Shipping
Is there any way to get Excel to export in the same format as the first one? I would like to avoid doing this manually, as the file is huge and I would have to hand-edit many parts of it where a "replace all" would not work. Another issue could be that there are double and sometimes triple commas in some places, though this appears in both files.
Here is the parser:
function ingest_csv() {
$file_url = 'http://www.path.to/csv/file.csv';
$record_num = 0;
$records = array();
$header = array();
if (($handle = fopen($file_url, "r")) !== FALSE) {
$records['id'] = '';
while (($data = fgetcsv($handle)) !== FALSE) {
$records['id'][$record_num] = '';
$cell_num = 0;
foreach ($data as $cell) {
if($record_num == 0) {
$header = $data;
} else {
$current_key = $header[$cell_num];
$records['id'][$record_num][$current_key] = $cell;
}
$cell_num++;
}
$record_num++;
}
fclose($handle);
}
else {
echo 'could not open file.';
}
return array($record_num, $records);
}
function batch_csv() {
list($num_rows, $rows) = ingest_csv();
print_r($num_rows);
print_r($rows);
}
As mentioned in the comments, you may be trying to reinvent the wheel here. That said, I have asked questions myself where I did not want to give a long, rambling explanation of why I was forced to use an unconventional approach, so in case this is one of those situations, here is an answer.
In OpenOffice Calc (for example), when you save as CSV you get a number of further options, including the option to double-quote all fields.
Unfortunately Excel doesn't give you the choice, but Microsoft does offer a workaround using a macro: http://support.microsoft.com/kb/291296/en-us
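Before changing the export, it may be worth checking whether the missing quotes are really what breaks the parser. As far as I know, PHP's CSV functions do not require fields to be quoted, and consecutive commas simply produce empty fields. A minimal sketch with shortened versions of the two sample rows from the question (the variable names are made up):

```php
<?php
// Shortened versions of the two sample rows from the question.
$quoted   = '"1","Afghanistan",,"Kabul","+93"';
$unquoted = '5555555,Refund,2,,jeric beatty';

// Same call for both; delimiter, enclosure and escape are the usual defaults.
$a = str_getcsv($quoted, ',', '"', '\\');
$b = str_getcsv($unquoted, ',', '"', '\\');

print_r($a); // fields: 1, Afghanistan, (empty), Kabul, +93
print_r($b); // fields: 5555555, Refund, 2, (empty), jeric beatty
```

If both rows parse as expected on your side too, the cause is probably something other than the quoting.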
From a csv file I need to extract the header and the values. Both are later accessed in frontend.
$header = array();
$contacts = array();
if ($request->isMethod('POST')) {
if (($handle = fopen($_FILES['file']['tmp_name'], "r")) !== FALSE) {
$header = fgetcsv($handle, 1000, ",");
while (($values = fgetcsv($handle, 1000, ",")) !== FALSE) {
// array_combine
// Creates an array by using one array for keys
// and another for its values
$contacts[] = array_combine($header, $values);
}
fclose($handle);
}
}
It works with csv files that look like this
Name,Firstname,Organisation,
Bar,Foo,SO,
I just exported my gmail contacts and tried to read them using the above code but I get following error
Warning: array_combine() [function.array-combine]: Both
parameters should have an equal number of elements
The gmail csv looks like this
Name,Firstname,Organisation
Bar,Foo,SO
Is the last missing , the reason for the error? What is wrong and how to fix it?
I found this on SO
function array_combine2($arr1, $arr2) {
$count = min(count($arr1), count($arr2));
return array_combine(array_slice($arr1, 0, $count),
array_slice($arr2, 0, $count));
}
This works, but it skips the Name field and not all fields are combined. Is this because the Gmail CSV is not really valid? Any suggestions?
I managed this by padding or slicing the row, depending on the size of the header, before combining it.
if (count($header) > count($values)) {
    // row is too short: fill the missing fields with null
    $values = array_pad($values, count($header), null);
} else if (count($header) < count($values)) {
    // row is too long: drop the surplus fields
    $values = array_slice($values, 0, count($header));
}
$contacts[] = array_combine($header, $values);
Although this isn't the answer to the question you asked, it might be the answer to the source of the problem. I recently had this problem and realized I was making a silly error because I didn't understand the parameters of fgetcsv(), in particular the second one in fgetcsv($handle, 1000, ","):
That 1000 denotes the maximum length of a single line in the CSV you're reading. A longer line is not read in one piece: according to the documentation it is split into chunks of that length, so a row comes back with a different number of fields than the header. I don't know why the version given in the examples is so stingy, but the limit is not required; setting it to 0 allows fgetcsv() to read lines of any length. (The documentation warns this is slightly slower. For most use cases of fgetcsv() I can hardly imagine it's slow enough to notice.)
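Putting the two ideas together (no length limit, and rows adjusted to the width of the header), a reader could look roughly like this. It is a sketch rather than a drop-in replacement: readCsvAssoc() is a made-up name, the file is assumed to be comma-separated, and whether padding with null is acceptable depends on what the frontend expects.

```php
<?php
// Hypothetical helper: reads an open CSV stream into a list of associative
// arrays keyed by the header row.
function readCsvAssoc($handle): array
{
    // Length 0 = no limit on the line length.
    $header = fgetcsv($handle, 0, ',', '"', '\\');
    if ($header === false || $header === [null]) {
        return []; // empty file or blank first line
    }
    $width = count($header);

    $rows = [];
    while (($values = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if ($values === [null]) {
            continue; // fgetcsv() reports a blank line as [null]
        }
        // Pad short rows with null and cut long rows, so that
        // array_combine() always gets two arrays of equal size.
        $values = array_slice(array_pad($values, $width, null), 0, $width);
        $rows[] = array_combine($header, $values);
    }
    return $rows;
}
```

In the upload code from the question this would be called as readCsvAssoc($handle) right after the fopen() check.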
Is it possible to validate a text file before I dump its data into a MYSQL database?
I want to check if each row contains, say, 5 columns (of data). If so, then I go ahead with the following query:
LOAD DATA CONCURRENT INFILE 'c:/test/test.txt'
INTO TABLE DUMP_TABLE FIELDS TERMINATED BY '\t' ENCLOSED BY '' LINES TERMINATED BY '\n' ignore 1 lines.
If not, I remove the entire row. I repeat this process for all rows in the txt file.
The text file contains data of the format:
id col2 col3 2012-07-27-19:27:06 col5
id col2 col3 2012-07-25-09:58:50 col5
id col2 col3 2012-07-23-10:14:13 col5
EDIT: After reading your comments, here's the code for doing the same on tab separated data:
$max_cols = 5; // the question expects 5 columns
$handler = fopen("myfile.txt", "r");
$error = false;
while (($linetocheck = fgets($handler)) !== false) {
    // chr(9) is the tab character; fgetcsv() gives the same result as fgets() + explode(),
    // see http://es.php.net/manual/en/function.fgetcsv.php
    $cols = explode(chr(9), rtrim($linetocheck, "\r\n"));
    if (count($cols) > $max_cols) {
        $error = true;
        break;
    }
}
fclose($handler);
if (!$error) {
    //...do stuff
}
This code reads a file, let's say "myfile.txt", line by line, and sets the variable $error to true if any line is longer than $max_cols characters. (My apologies if that's not what you're asking; your question is not entirely clear to me.)
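// $max_cols: maximum allowed line length, assumed to be defined earlier
$handler = fopen("myfile.txt", "r");
$error = false;
while (($linetocheck = fgets($handler)) !== false) {
    // measure the line without its trailing line break
    if (strlen(rtrim($linetocheck, "\r\n")) > $max_cols) {
        $error = true;
        break;
    }
}
fclose($handler);
if (!$error) {
    //...do stuff
}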
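Since the question asks to drop the bad rows rather than reject the whole file, the same loop can also write the good lines to a second file that is then passed to LOAD DATA. A minimal sketch, assuming tab-separated input; filterByColumnCount() and its parameters are names invented for this example:

```php
<?php
// Hypothetical helper: copies only the lines that have exactly $expected
// columns from $source to $target and returns how many lines were dropped.
function filterByColumnCount(string $source, string $target, int $expected, string $delimiter = "\t"): int
{
    $in = fopen($source, 'r');
    $out = fopen($target, 'w');
    if ($in === false || $out === false) {
        throw new RuntimeException('Could not open the input or output file.');
    }

    $dropped = 0;
    while (($line = fgets($in)) !== false) {
        // Strip the line break before splitting, otherwise it would
        // stick to the last column.
        $line = rtrim($line, "\r\n");
        if (count(explode($delimiter, $line)) === $expected) {
            fwrite($out, $line . "\n");
        } else {
            $dropped++;
        }
    }

    fclose($in);
    fclose($out);
    return $dropped;
}
```

For the file from the question this would be something like filterByColumnCount('c:/test/test.txt', 'c:/test/test_clean.txt', 5), where the second path is a made-up name for the cleaned copy. Note that the header line is checked like any other line, so it is kept only if it has 5 columns as well.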
I know it's an old thread, but I was looking for something similar myself and came across this topic, and none of the answers provided here helped me.
So I went ahead and came up with my own solution, which is tested and works well (it can be improved).
Assume we have a CSV file named example.csv that contains the following dummy data (on purpose, the last line, the 6th, contains one more value than the other rows):
Name,Country,Age
John,Ireland,18
Ted,USA,22
Lisa,UK,23
Michael,USA,20
Louise,Ireland,22,11
Now, to check that all rows of the CSV file have the same number of values, the following block of code will do the trick and pinpoint the line on which the error occurred:
function validateCsvColumnLength($pathToCsvFile)
{
    if (!file_exists($pathToCsvFile) || !is_readable($pathToCsvFile)) {
        throw new \Exception("File doesn't exist or is not readable.");
    }
    if (!$handle = fopen($pathToCsvFile, "r")) {
        throw new \Exception("Stream error");
    }
    // collect the number of fields of every row
    $rowLength = [];
    while (($data = fgetcsv($handle)) !== FALSE) {
        $rowLength[] = count($data);
    }
    fclose($handle);
    // compare every row with the header (first row); if a row has more or less data,
    // throw an error with the 1-based line that triggered it
    foreach ($rowLength as $index => $length) {
        if ($length !== $rowLength[0]) {
            $line = $index + 1;
            throw new \Exception("Error, data count from row {$line} does not match header size");
        }
    }
    return true;
}
To actually test it, just do a var_dump() to see the result:
var_dump(validateCsvColumnLength('example.csv'));
What columns do you mean? If you just mean the number of characters in each row, split (explode) the file into rows and check whether their lengths are equal to 5.
If you mean columns separated by delimiters, then split each row on that delimiter and check that the resulting number of columns is equal to 5. Use fgetcsv for that.
I'm assuming you're talking about the length of each line in the file. If so, here's a possible solution.
$file_handle = fopen("myfile", "r");
while (($line = fgets($file_handle)) !== false) {
    // compare the length without the trailing line break
    if (strlen(rtrim($line, "\r\n")) != 5) {
        fclose($file_handle);
        throw new Exception("Could not save file to database.");
    }
}
fclose($file_handle);
Yes, it is possible. I've done that exact thing. Use PHP's csv processing functions.
You will need these functions:
fopen()
fgetcsv()
And possibly some others.
fgetcsv returns an array.
I'll give you a short example of how you can validate.
here's the csv:
col1,col2,col3,col4
1,2,3,4
1,2,3,4,
1,2,3,4,5
1,2,3,4
I'll skip the fopen part and go straight to the validation step.
Note that "\t" is the tab character.
$row_length = 0;
$i = 0;
while (($row = fgetcsv($handle, 0, "\t")) !== false) {
    if ($i == 0) {
        // remember the size of the header row
        $row_length = sizeof($row);
    } else {
        if (sizeof($row) != $row_length) {
            echo "Error, line $i of the data does not match header size";
            break;
        }
    }
    $i++;
}
That would test each row to make sure it is the same as the 1st row's ($i = 0) length.
EDIT:
And, in case you don't know how to search the internet, here is the page for fgetcsv:
http://php.net/manual/en/function.fgetcsv.php
Here is the function prototype:
array fgetcsv ( resource $handle [, int $length = 0 [, string $delimiter = ',' [, string $enclosure = '"' [, string $escape = '\' ]]]] )
As you can see, it has everything you would need for doing a quick scan in PHP before you send your data to LOAD DATA INFILE.
I have solved your exact problem in my own program. My program also automatically eliminates duplicate rows and other cool stuff.
You can try to see if fgetcsv will suffice. If it doesn't, please be a bit more descriptive on what you mean by columns.
I have a huge CSV file (10M records) in the following format.
147804,AC,34,15AUG09,09:00,15AUG09,21:00,YYZ,YVR,PLS
147816,AC,34,26AUG09,09:00,01SEP09,21:00,YYZ,YVR,PLS
I need to import them into a MySQL database. How can I change all the months to numeric months, preferably in yyyy/mm/dd format?
Thanks
This is difficult to accomplish with regex and would be error prone. PHP has CSV support built-in and it’s a lot safer.
<?php
if (($if = fopen("src_file.csv", "r")) !== FALSE) {
    if (($of = fopen("dst_file.csv", "w")) !== FALSE) {
        while (($cols = fgetcsv($if)) !== FALSE) {
            $cols[3] = date('Y/m/d',strtotime($cols[3]));
            $cols[5] = date('Y/m/d',strtotime($cols[5]));
            fputcsv($of, $cols);
        }
        fclose($of);
    }
    fclose($if);
}
?>
I don’t know if it would be more efficient to just store $cols in the database or create a new file and import it. I don’t have any benchmarks.
Amazingly, this seems to do the trick :
echo date('Y/m/d', strtotime('15AUG09'));
returns : 2009/08/15.
If you can manage to parse your CSV, you'll get your date in the format you want.
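If relying on strtotime() guessing the format feels too loose for 10M records, the format can also be spelled out with DateTime::createFromFormat(). This is just a sketch: convertDate() is a made-up helper, and it assumes every date looks like 15AUG09 (two-digit day, three-letter month, two-digit year).

```php
<?php
// Hypothetical helper: converts dates such as 15AUG09 and returns null
// when the input does not match the expected pattern.
function convertDate(string $raw, string $outputFormat = 'Y/m/d'): ?string
{
    // d = two-digit day, M = three-letter month, y = two-digit year.
    // The leading "!" resets everything the format does not set (the time).
    $date = DateTime::createFromFormat('!dMy', $raw);

    // getLastErrors() returns false (PHP 8.2+) or an array with counters.
    $errors = DateTime::getLastErrors();
    $hasProblems = $errors && ($errors['warning_count'] > 0 || $errors['error_count'] > 0);

    if ($date === false || $hasProblems) {
        return null;
    }
    return $date->format($outputFormat);
}

echo convertDate('15AUG09'), "\n";          // yyyy/mm/dd as asked in the question
echo convertDate('26AUG09', 'Y-m-d'), "\n"; // yyyy-mm-dd, the form MySQL DATE columns use
```

Rows for which the helper returns null could be logged and skipped instead of being imported with a wrong date.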