I have a huge CSV file (10M records) in the following format.
147804,AC,34,15AUG09,09:00,15AUG09,21:00,YYZ,YVR,PLS
147816,AC,34,26AUG09,09:00,01SEP09,21:00,YYZ,YVR,PLS
I need to import them into a MySQL database. How can I change all the months to numeric months, preferably in yyyy/mm/dd format?
Thanks
This is difficult to accomplish with regex and would be error prone. PHP has CSV support built-in and it’s a lot safer.
<?php
if (($if = fopen("src_file.csv", "r")) !== FALSE) {
    if (($of = fopen("dst_file.csv", "w")) !== FALSE) {
        while (($cols = fgetcsv($if)) !== FALSE) {
            // Columns 3 and 5 hold the dates (e.g. 15AUG09)
            $cols[3] = date('Y/m/d', strtotime($cols[3]));
            $cols[5] = date('Y/m/d', strtotime($cols[5]));
            fputcsv($of, $cols);
        }
        fclose($of);
    }
    fclose($if);
}
?>
I don’t know if it would be more efficient to just store $cols in the database or create a new file and import it. I don’t have any benchmarks.
Amazingly, this seems to do the trick :
echo date('Y/m/d', strtotime('15AUG09'));
returns : 2009/08/15.
If you can manage to parse your CSV, you'll get your date in the format you want.
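If you would rather not rely on strtotime() guessing the format, the same conversion can be sketched with an explicit month table. This is only a sketch: the function name convertDdMonYy is made up for illustration, and it assumes every two-digit year belongs to the 2000s, which matches the sample rows but may not hold for the whole file.

```php
<?php
// Hypothetical helper: converts "15AUG09" to "2009/08/15".
// Returns null when the value does not look like a DDMONYY date.
function convertDdMonYy(string $raw): ?string
{
    static $months = [
        'JAN' => 1, 'FEB' => 2, 'MAR' => 3, 'APR' => 4,
        'MAY' => 5, 'JUN' => 6, 'JUL' => 7, 'AUG' => 8,
        'SEP' => 9, 'OCT' => 10, 'NOV' => 11, 'DEC' => 12,
    ];

    if (!preg_match('/^(\d{2})([A-Za-z]{3})(\d{2})$/', trim($raw), $m)) {
        return null;
    }

    $monthName = strtoupper($m[2]);
    if (!isset($months[$monthName])) {
        return null;
    }

    $day   = (int) $m[1];
    $month = $months[$monthName];
    $year  = 2000 + (int) $m[3]; // Assumption: all years are 20xx.

    // Reject impossible dates such as 31FEB09.
    if (!checkdate($month, $day, $year)) {
        return null;
    }

    return sprintf('%04d/%02d/%02d', $year, $month, $day);
}

echo convertDdMonYy('15AUG09'), "\n"; // prints 2009/08/15
```

Returning null for anything unexpected makes it easy to log bad rows instead of silently writing 1970/01/01, which is what date() produces when strtotime() fails.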
Related
I am reading a CSV file to import data. One column contains auto-generated identifiers (letters and digits). The problem is that in some rows my script reads the value as a number in exponential notation.
Example: 58597E68 is treated as 5.86E+72
I need it to be read as a string, not as a number. The issue occurs only when the character E appears in the middle of the auto-generated identifier.
$feed = 'path-to-csv/import.csv';
if (!file_exists($feed)) {
//$feed = 'import.csv';
exit('Cannot find the CSV file: ' . $feed);
}
$row=0;
if (($handle = fopen($feed, 'r')) !== FALSE) {
while (($data_csv_rows = fgetcsv($handle, 1000000, ',')) !== FALSE) {
$row++;
if ($row == 1) {
continue;
} // skipping header row
echo "Row " . ($row-1) . "<br>";print_r($data_csv_rows);echo "<br><br>";
}
}
The problem is not your CSV but the original software (probably Excel) that produced the CSV.
CSV is a simple data format. When you find something like 5.86E+72, the value is already stored that way in the CSV data and it's too late to fix it.
To avoid this, make sure the data is exported to CSV correctly.
Some PHP code to find this kind of bad data in a field:
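// Assumes $data_csv_rows holds one parsed row, as in the question.
foreach ($data_csv_rows as $value) {
    if (strpos($value, 'E+') !== FALSE) {
        // A trailing "E+<digits>" suggests the exporter already converted the value.
        preg_match('~E\+[0-9]+$~', $value, $preg_result);
        if (isset($preg_result[0])) {
            die('Probably wrong data found within "'.$value.'".');
        }
    }
}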
In your case it seems that 58597E68 is converted to float(5.8597E+72).
At least with str_getcsv() I can not recreate the problem, see https://3v4l.org/RZ1eA.
By definition it would be correct: there are no " around this data, so PHP tries to determine the type of the data and whether it is potentially a numeric value. So be sure to add " around strings. See the PHP String to Numeric Conversion documentation.
Update: I cannot reproduce your use case! 58597E68 becomes "58597E68" with str_getcsv(), and with fgetcsv() it is not auto-converted to float either. See https://3v4l.org/oXkBu for details. I suspect there is something wrong with the data you provided or with your validation.
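A minimal sketch of that check, in case it helps to verify it locally. The sample line and the helper name looksLikeScientific are made up for illustration; str_getcsv() itself is the standard PHP function mentioned above.

```php
<?php
// Parse one CSV line; every field comes back as a PHP string.
$fields = str_getcsv('58597E68,ABC,5.86E+72', ',', '"', '\\');

// Hypothetical helper: true when a field looks like scientific notation
// that an exporter wrote into the file, e.g. 5.86E+72.
function looksLikeScientific(string $value): bool
{
    return preg_match('/^\d+(\.\d+)?E\+\d+$/i', $value) === 1;
}

foreach ($fields as $field) {
    printf(
        "%s => %s%s\n",
        $field,
        gettype($field),
        looksLikeScientific($field) ? ' (suspicious)' : ''
    );
}
```

Note that 58597E68 is still a numeric string as far as PHP is concerned, so a later arithmetic operation or loose comparison could treat it as a float even though the CSV functions returned it unchanged.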
I am building a website where I want to give users the option to upload an Excel file that contains all their data.
The website is built on PHP and the database is MySQL.
When a user uploads the Excel sheet, all the data has to be imported into my database. I want to do this programmatically using PHP. Can anyone help me out with this? The code should also be able to extract data from multiple tabs in the Excel file.
Thank you.
You can try either of the libraries below if the Excel file itself needs to be imported.
http://phpexcel.codeplex.com/
http://sourceforge.net/projects/phpexcelreader/
Note :
Importing from Excel files is harder than importing from CSV files, so I suggest providing an option to import into MySQL from CSV. (Users can convert XLS to CSV using Excel.)
Look at PHP function fgetcsv at:
http://ca.php.net/manual/en/function.fgetcsv.php
Eg.
<?php
$row = 1;
if (($handle = fopen("test.csv", "r")) !== FALSE) {
while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
$num = count($data);
echo "<p> $num fields in line $row: <br /></p>\n";
$row++;
for ($c=0; $c < $num; $c++) {
echo $data[$c] . "<br />\n";
}
}
fclose($handle);
}
?>
First, try to avoid Excel format in favor of CSV. It is much faster and simpler.
Also, you can use PHPExcel library.
You should use PHPExcel:
http://phpexcel.codeplex.com/
You can use the following examples:
http://phpexcel.codeplex.com/wikipage?title=Examples
You can also have a look at this link:
https://code.google.com/p/php-excel-reader/wiki/Documentation
I want to read really big CSV files and insert them into a database. That already works:
if(($handleF = fopen($path."\\".$file, 'r')) !== false){
$i = 1;
// loop through the file line-by-line
while(($dataRow = fgetcsv($handleF,0,";")) !== false) {
// Only start at the startRow, otherwise skip the row.
if($i >= $startRow){
// Check if to use headers
if($lookAtHeaders == 1 && $i == $startRow){
$this->createUberschriften( array_map(array($this, "convert"), $dataRow ) );
} else {
$dataRow = array_map(array($this, "convert"), $dataRow );
$data = $this->changeMapping($dataRow, $startCol);
$this->executeInsert($data, $tableFields);
}
unset($dataRow);
}
$i++;
}
fclose($handleF);
}
My problem with this solution is that it's very slow. But the files are too big to load entirely into memory... So I want to ask whether there is a possibility to read, for example, 10 lines at a time into the $dataRow array, rather than only one or all of them.
I want to get a better balance between memory and performance.
Do you understand what I mean? Thanks for your help.
Greetz
V
EDIT:
OK, I still have to find a solution for the MSSQL database. My solution was to stack the data and then do a multi-row MSSQL insert:
$dataStack = array(); // initialise the stack before the loop
while(($dataRow = fgetcsv($handleF,0,";")) !== false) {
    // Only start at the startRow, otherwise skip the row.
    if($i >= $startRow){
        // Check if to use headers
        if($lookAtHeaders == 1 && $i == $startRow){
            $this->createUberschriften( array_map(array($this, "convert"), $dataRow ) );
        } else {
            $dataRow = array_map(array($this, "convert"), $dataRow );
            $data = $this->changeMapping($dataRow, $startCol);
            $this->setCurrentRow($i);
            if(count($dataStack) > 210){
                array_push($dataStack, $data);
                #echo '<pre>', print_r($dataStack), '</pre>';
                $this->executeInsert($dataStack, $tableFields, true);
                // reset the stack
                unset($dataStack);
                $dataStack = array();
            } else {
                array_push($dataStack, $data);
            }
            unset($data);
        }
    }
    // Count every row, including skipped ones, so $i can reach $startRow.
    $i++;
    unset($dataRow);
}
Finally, I have to loop over the stack and build a multi-row insert in the method "executeInsert", to create a query like this:
INSERT INTO [myTable] (field1, field2) VALUES ('data1', 'data2'),('data3', 'data4')...
That works much better. I still have to find the best balance, but for that I only need to change the value '210' in the code above. I hope that helps everybody with a similar problem.
Attention: Don't forget to execute the method "executeInsert" again after reading the complete file, because there may still be some data left in the stack, and the method is otherwise only executed when the stack reaches the size of 210.
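For anyone who wants to see the shape of this, here is a rough sketch of the stacking idea with the final flush included. It is not the actual executeInsert method: buildMultiInsert and insertInBatches are made-up names, the callback stands in for whatever really runs the query, and placeholders are used instead of inlined values. Table and field names are inserted as-is, so they must come from trusted code, not from the file.

```php
<?php
// Hypothetical helper: builds one parameterized multi-row INSERT.
// Returns the SQL string and the flat list of values to bind.
function buildMultiInsert(string $table, array $fields, array $rows): array
{
    $rowPlaceholder = '(' . implode(', ', array_fill(0, count($fields), '?')) . ')';
    $sql = sprintf(
        'INSERT INTO [%s] (%s) VALUES %s',
        $table,
        implode(', ', $fields),
        implode(', ', array_fill(0, count($rows), $rowPlaceholder))
    );
    // Flatten the rows into one parameter list, in row order.
    $params = array_merge(...array_map('array_values', $rows));

    return [$sql, $params];
}

// Hypothetical batching loop: $flush stands in for executeInsert().
function insertInBatches(iterable $rows, int $batchSize, callable $flush): void
{
    $stack = [];
    foreach ($rows as $row) {
        $stack[] = $row;
        if (count($stack) >= $batchSize) {
            $flush($stack);
            $stack = [];
        }
    }
    // Flush whatever is left once the file has been read completely.
    if ($stack !== []) {
        $flush($stack);
    }
}

// Usage with made-up rows: print the statement for each batch.
insertInBatches(
    [['data1', 'data2'], ['data3', 'data4'], ['data5', 'data6']],
    2,
    function (array $batch) {
        [$sql, $params] = buildMultiInsert('myTable', ['field1', 'field2'], $batch);
        echo $sql, ' with ', count($params), " values\n";
    }
);
```

When tuning the batch size, keep in mind that SQL Server accepts at most 1000 rows in a single VALUES list and at most 2100 parameters per statement.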
Greetz
V
I think your bottleneck is not reading the file, which is just a text file. Your bottleneck is the INSERT into the SQL table.
To check, just comment out the line that actually does the insert and you will see the difference.
I had this same issue in the past, where I did exactly what you are doing: reading a CSV of 5+ million lines and inserting it into a MySQL table. The execution time was 60 hours, which is unrealistic.
My solution was to switch to another DB technology. I selected MongoDB and the execution time was reduced to 5 minutes. MongoDB performs really fast in these scenarios and also has a tool called mongoimport that will allow you to import a CSV file directly from the command line.
Give it a try if the DB technology is not a limitation on your side.
Another solution would be splitting the huge CSV file into chunks and then running the same PHP script multiple times in parallel, with each instance taking care of the chunks that have a specific prefix or suffix in the filename.
I don't know which OS you are using, but on Unix/Linux there is a command-line tool called split that will do that for you and will also add any prefix or suffix you want to the filenames of the chunks.
I created a CSV parser that works fine for some CSV files I've found online, but one that I converted from XLS to CSV via Microsoft Excel 2011 does not work.
The ones that work are formatted as such:
"Sort Order","Common Name","Formal Name","Type","Sub Type","Sovereignty","Capital","ISO 4217 Currency Code","ISO 4217 Currency Name","ITU-T Telephone Code","ISO 3166-1 2 Letter Code","ISO 3166-1 3 Letter Code","ISO 3166-1 Number","IANA Country Code TLD"
"1","Afghanistan","Islamic State of Afghanistan","Independent State",,,"Kabul","AFN","Afghani","+93","AF","AFG","004",".af".........................etc...
The one that doesn't work is formatted like this:
Order Id,Date Ordered,Date Returned,Product Id,Description,Order Reason Code,Return Qty,Order Return Comment,Ship To Name,Ship To Address1,Ship To Address2,Ship To Address3,Ship To City,Ship To State,Ship To Zipcode,Ship To Country,Disposition,Ship To Email,ShipVia
5555555,2013-07-05 13:58:36.000,2013-08-16 00:00:00.000,5555-55,0555 - Some Test Thing,Refund,2,,jeric beatty,123 fake st,,,burke,NJ,55055,US,Discard,test#test.com,Super Fast Shipping
Is there any way to get Excel to export in the same format as the first one? I would like to avoid doing this manually, as the file is huge and I would have to edit lots of parts of it by hand where I couldn't do a "replace all". Another issue could be that there are double and sometimes triple commas in some places, though this appears in both files.
Here is the parser:
function ingest_csv() {
$file_url = 'http://www.path.to/csv/file.csv';
$record_num = 0;
$records = array();
$header = array();
if (($handle = fopen($file_url, "r")) !== FALSE) {
$records['id'] = '';
while (($data = fgetcsv($handle)) !== FALSE) {
$records['id'][$record_num] = '';
$cell_num = 0;
foreach ($data as $cell) {
if($record_num == 0) {
$header = $data;
} else {
$current_key = $header[$cell_num];
$records['id'][$record_num][$current_key] = $cell;
}
$cell_num++;
}
$record_num++;
}
fclose($handle);
}
else {
echo 'could not open file.';
}
return array($record_num, $records);
}
function batch_csv() {
list($num_rows, $rows) = ingest_csv();
print_r($num_rows);
print_r($rows);
}
As mentioned in the comments, you may be trying to reinvent the wheel here. Personally, though, I've asked questions where I didn't want to give long rambling explanations of why I was forced to use an unconventional approach, so in case this is one of those situations, here's an answer.
In OpenOffice Calc (for example), when you save as CSV you get a number of further options, including the option to double-quote all fields.
Unfortunately Excel doesn't give you the choice, but Microsoft do offer up a workaround using a macro - http://support.microsoft.com/kb/291296/en-us
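If changing the export is not an option, another route is to re-quote the file yourself after parsing it. The following is only a sketch under that assumption: quoteAllFields is a made-up helper, and it writes every field in double quotes (including empty ones), doubling any embedded quotes.

```php
<?php
// Hypothetical helper: formats one row with every field double-quoted,
// doubling any embedded double quotes as the CSV convention requires.
function quoteAllFields(array $fields): string
{
    $quoted = array_map(
        function ($field) {
            return '"' . str_replace('"', '""', (string) $field) . '"';
        },
        $fields
    );

    return implode(',', $quoted);
}

echo quoteAllFields(['5555555', 'Some "Test" Thing', '']), "\n";
// prints "5555555","Some ""Test"" Thing",""
```

Each row read with fgetcsv() could be passed through this helper and written out with fwrite() to produce a fully quoted copy of the file.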
I am trying to use a function much like this.....
$file = fopen("/tmp/$importedFile.csv","r");
while ($line = fgetcsv($file))
{
$csv_data[] = $line;
}
fclose($file);
...to load CSV values. This is gravy but now I wish to select individual columns by their array number. I believe I want to select it with something like this, but cannot find any clarity.
$csv_data[2] = $line;
This, however, just gives me the third row of data (index 2) rather than a column.
Regards
Do you need the whole file in memory or will you be processing the lines individually?
Processing individually:
$line is already an array. If you want the 3rd column, use $line[2]
Processing after reading the whole file:
$csv_data[$lineNo][$columnNo]
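If the whole file is already in $csv_data and you want one column across all rows, array_column() can pull it out in a single call. A small sketch with made-up sample rows:

```php
<?php
// Sample rows standing in for what the fgetcsv() loop collected.
$csv_data = [
    ['147804', 'AC', '34'],
    ['147816', 'AC', '35'],
];

// Collect the third column (index 2) from every row.
$thirdColumn = array_column($csv_data, 2);

print_r($thirdColumn);
```

Rows that do not have an index 2 are simply skipped by array_column(), so the result can be shorter than $csv_data if some lines have fewer fields.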
$inputfiledelimiter = ",";
if (($handle = fopen($PathOfCsvFile, "r")) !== FALSE)
{
while (($data = fgetcsv($handle, 0, $inputfiledelimiter)) !== FALSE)
{
//get data from $data
}
}
Well, your CSV file is now split up in lines, that is all.
No concept of columns yet in that structure.
So you need to split the lines into columns.
Or, much better, let PHP do that for you: Have a look at fgetcsv() and the associated functions:
http://nl.php.net/manual/en/function.fgetcsv.php