Convert large Excel sheet to MySQL? - php

I have an Excel sheet that contains almost 67,000 rows, and I tried to convert it to MySQL using excel_reader, but it does not support that many rows. Please help me solve this issue.

Also try the EasyXLS Excel library. You can import large amounts of data from Excel with it. It includes a COM component that can be used from PHP. COM objects are a little slower, but you can still achieve a reasonable import time.
Use this link as starting point:
https://www.easyxls.com/manual/FAQ/import-excel-to-mysql.html

A viable option (though certainly not the easiest) would be to write the import script yourself in PHP. Note that the snippet below is only the read loop; you would still need your database connection and so on.
<?php
$file = fopen("import.csv", "r");
while (($row = fgetcsv($file)) !== false) {
    // $row is an array of the current line's columns;
    // run your MySQL INSERT statement here
}
fclose($file);
?>
That gives you an array for every line, and you can use the array positions in your insert statement, which will be repeated roughly 67,000 times. It wouldn't take excessively long, and it may be a better approach than, say, phpMyAdmin if that is timing out on you.
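As a minimal sketch of that approach (the connection details, table and column names here are placeholders, not anything from the question), a prepared statement keeps those ~67,000 inserts fast:
<?php
// Placeholder credentials and table/column names; adjust to your schema.
$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8mb4', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Prepare once, execute per row, instead of building 67,000 query strings.
$stmt = $pdo->prepare('INSERT INTO items (col_a, col_b, col_c) VALUES (?, ?, ?)');

$file = fopen('import.csv', 'r');
$pdo->beginTransaction(); // a single transaction speeds up bulk inserts considerably
while (($row = fgetcsv($file)) !== false) {
    $stmt->execute([$row[0], $row[1], $row[2]]); // array positions from the CSV line
}
$pdo->commit();
fclose($file);
?>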

Related

Export Excel or CSV with many rows and dynamic columns

I have Excel data with more than 5k rows and 17 columns. I use a nested-loop technique in PHP, but it takes a long time: processing the data in xls format takes 45 minutes, while the csv format takes 30 minutes. Is there a technique to speed up uploading files from Excel to the database (I use PostgreSQL)?
I use a nested loop because the number of columns depends on the parameters, and whether the database operation is an INSERT or an UPDATE also depends on the parameters.
Here is my code for the import process:
<?php
$row = 5000; // estimated rows
$col = 17;   // estimated columns
for ($i = 1; $i <= $row; $i += 1) {
    for ($j = 1; $j <= $col; $j += 1) {
        $custno = $sheetData[$i][0];
        $getId = "SELECT id FROM data WHERE custno = $custno";
        if ($getId) {
            $update = "UPDATE data SET address = 'address 1' WHERE custno = $custno";
        } else {
            $insert = "INSERT INTO data (address) VALUES ('address jon')";
        }
    }
}
I use the PhpSpreadsheet library
First, try to find the root of the issue: is it that reading the file is slow, or that too many SQL queries are being executed along the way?
Bear in mind that running queries inside the loop is always asking for performance trouble. Maybe you can avoid that by fetching the data you need before processing the file? You may not be able to tell exactly which rows are needed at that point, but fetching more than you need can still be faster than making separate queries one by one. Also, try to limit INSERT and UPDATE queries; they are usually slower than SELECTs. Collect the data for the write operations and run them once, after the loop.
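As a rough sketch of that idea (assuming a PDO connection to PostgreSQL and the data/custno/address names from the question; the column positions in $sheetData are illustrative):
<?php
$pdo = new PDO('pgsql:host=localhost;dbname=mydb', 'user', 'pass'); // placeholder credentials

// 1. One SELECT up front instead of one per row.
$existing = $pdo->query('SELECT custno, id FROM data')->fetchAll(PDO::FETCH_KEY_PAIR);

$inserts = [];
$updates = [];
foreach ($sheetData as $rowData) {   // $sheetData as returned by PhpSpreadsheet
    $custno  = $rowData[0];
    $address = $rowData[1];          // illustrative column position
    if (isset($existing[$custno])) {
        $updates[] = [$address, $custno];
    } else {
        $inserts[] = [$custno, $address];
    }
}

// 2. Write once, after the loop, inside a single transaction.
$pdo->beginTransaction();
$ins = $pdo->prepare('INSERT INTO data (custno, address) VALUES (?, ?)');
foreach ($inserts as $params) { $ins->execute($params); }
$upd = $pdo->prepare('UPDATE data SET address = ? WHERE custno = ?');
foreach ($updates as $params) { $upd->execute($params); }
$pdo->commit();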
For CSV operations I would prefer basic PHP functions like fgetcsv() and str_getcsv() over a separate library, as long as the file is not overcomplicated. If you want to check out some alternatives to PhpSpreadsheet, take a look at Spout by box.com; it looks promising, but I have never used it.
I'm sure you can improve your performance by using PHP generators; they are perfect whenever you have to read a file's contents. Here are some more links (and a small sketch after them):
https://www.sitepoint.com/memory-performance-boosts-with-generators-and-nikiciter/
https://www.sitepoint.com/generators-in-php/
https://riptutorial.com/php/example/5441/reading-a-large-file-with-a-generator/
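Here is the promised sketch; the file path and what you do with each row are placeholders:
<?php
// Yields one CSV row at a time, so only the current row sits in memory.
function csvRows(string $path): Generator
{
    $handle = fopen($path, 'r');
    try {
        while (($row = fgetcsv($handle)) !== false) {
            yield $row;
        }
    } finally {
        fclose($handle);
    }
}

foreach (csvRows('big-file.csv') as $row) {
    // collect $row into the insert/update batches described above
}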
If not using PHP for this operation is an option for you, try exporting the spreadsheet as CSV and importing the file using COPY. It won't take more than a few seconds.
If your database is installed locally you just need to execute COPY in a client of your choice, e.g. pgAdmin. Check this answer for more information.
COPY your_table FROM '/home/user/file.csv' DELIMITER ',' CSV HEADER;
Keep in mind that the user postgres in your system must have the necessary permissions to access the CSV file. Check how to do that in your operating system, e.g. chown in Linux.
If your database is installed on a remote server, you have to use the STDIN facility of COPY via psql:
$ cat file.csv | psql your_db -c "COPY your_table FROM STDIN;"

I am getting issues when importing data from a .csv into PostgreSQL (greater than 50K lines in the csv)

The error I'm getting:
This page isn’t working didn’t send any data.
ERR_EMPTY_RESPONSE
I am using PHP to read the csv file.
My PHP approach to processing the csv data looks like this:
$csvAsArray = array_map('str_getcsv', file($tmpName));
I am sure that the above code is creating the problem; the code after it is not getting executed.
How can I import more than 1 million rows at a time?
Can anyone help me out with which approach I should choose?
It looks like you're trying to grab the entire contents of the file in one gulp. Don't do that :) PHP's array_map() isn't scalable to thousands ... or millions of lines.
SUGGESTION:
Read your data into a temp file (as you appear to be doing now).
Do a PostgreSQL COPY
EXAMPLE:
COPY my_table(my_columns, ...)
FROM 'my_file.csv' DELIMITER ',' CSV HEADER;
I would suggest using the league/csv package for CSV parsing and importing. #paulsm4 is correct that it's never necessary to pull the whole file into memory and then work on it; one should rather read line by line. This package is well maintained, does all of that under the hood, and does it quite effectively. It is also, to my mind, much more flexible than the Postgres COPY command: you can filter the contents, map callbacks onto fields/rows, and do all of this on the PHP side.
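For illustration, a minimal reading loop with league/csv (version 9.x; the file path and what happens per record are placeholders) might look like this:
<?php
require 'vendor/autoload.php';

use League\Csv\Reader;

$csv = Reader::createFromPath('/path/to/file.csv', 'r');
$csv->setHeaderOffset(0); // treat the first row as the header

foreach ($csv->getRecords() as $record) {
    // $record is an associative array keyed by the header names;
    // insert it into PostgreSQL here, ideally batched or via COPY FROM STDIN
}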

How to process an 80MB+ xlsx into a MySQL database with PHPExcel?

I need to insert all the data from an Excel file (.xlsx) into my database. I have tried all the available methods, such as caching and reading it chunk by chunk, but nothing seems to work at all. Has anyone tried to do this with a big file before? My spreadsheet has about 32 columns and about 700,000 rows of records.
The file is already uploaded to the server, and I want to write a cron job to read the Excel file and insert the data into the database. I chunked the reads to 5000, 3000, or even just 10 records at a time, but none of that worked. What happens is it returns this error:
simplexml_load_string(): Memory allocation failed: growing buffer.
I did try the CSV file type and managed to get it running at 4000k records each time, but each pass takes about five minutes to process, and anything higher fails too, with the same error. But the requirement is for .xlsx files, so I need to stick with that.
Consider converting it to CSV format using an external tool, such as ssconvert from the Gnumeric package, and then reading the CSV line by line with the fgetcsv() function.
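A rough sketch of that approach (the file names are placeholders, and ssconvert has to be installed on the server):
<?php
// Convert the uploaded workbook to CSV with Gnumeric's ssconvert.
system('ssconvert ' . escapeshellarg('huge.xlsx') . ' ' . escapeshellarg('huge.csv'));

// Stream the CSV row by row instead of loading it all into memory.
$handle = fopen('huge.csv', 'r');
while (($row = fgetcsv($handle)) !== false) {
    // insert $row into the database here, ideally via a prepared statement
}
fclose($handle);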
Your issue occurs because you are trying to read the contents of a whole XML file. Caching and reading chunk by chunk does not help because the library you are using needs to read the entire XML file at one point to determine the structure of the spreadsheet.
So for very large files, the XML file is so big that reading it consumes all the available memory. The only working option is to use streamers and optimize the reading.
This is still a pretty complex problem. For instance, to resolve the data in your sheet, you need to read the shared strings from one XML file and the structure of your sheet from another one. Because of the way shared strings are stored, you need to have those strings in memory when reading the sheet structure. If you have thousands of shared strings, that becomes a problem.
If you are interested, Spout solves this problem. It is open-source so you can take a look at the code!
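For reference, reading an XLSX row by row with Spout looks roughly like this (class names as of Spout 3.x, so they may differ in other versions; the file name is a placeholder):
<?php
require 'vendor/autoload.php';

use Box\Spout\Reader\Common\Creator\ReaderEntityFactory;

$reader = ReaderEntityFactory::createXLSXReader();
$reader->open('huge.xlsx');

foreach ($reader->getSheetIterator() as $sheet) {
    foreach ($sheet->getRowIterator() as $row) {
        $cells = $row->toArray(); // one row at a time keeps memory usage low
        // insert $cells into the database here
    }
}

$reader->close();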

Reading a large Excel file with PHP

I'm trying to read a 17MB Excel file (2003) with PHPExcel 1.7.3c, but it crashes while loading the file, after exceeding the 120-second limit I have.
Is there another library that can do it more efficiently? I have no need for styling; I only need it to support UTF-8.
Thanks for your help
File size isn't a good measure when using PHPExcel; it's more important to get some idea of the number of cells (rows x columns) in each worksheet.
If you have no need for styling, are you calling:
$objReader->setReadDataOnly(true);
before loading the file?
If you don't need to access all worksheets, or only certain cells within a worksheet, look at using
$objReader->setLoadSheetsOnly(array(1,2))
or
$objReader->setLoadSheetsOnly(1)
or defining a readFilter
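For example, a read filter for chunked reading could look like this (this follows the pattern from the PHPExcel documentation; the reader type and chunk size are assumptions):
<?php
// Reads only the heading row plus the rows of the current chunk.
class ChunkReadFilter implements PHPExcel_Reader_IReadFilter
{
    private $startRow = 0;
    private $endRow = 0;

    public function setRows($startRow, $chunkSize)
    {
        $this->startRow = $startRow;
        $this->endRow = $startRow + $chunkSize;
    }

    public function readCell($column, $row, $worksheetName = '')
    {
        return $row == 1 || ($row >= $this->startRow && $row < $this->endRow);
    }
}

$objReader = PHPExcel_IOFactory::createReader('Excel5'); // .xls / Excel 2003
$objReader->setReadDataOnly(true);

$filter = new ChunkReadFilter();
$filter->setRows(2, 1000); // rows 2..1001 on this pass
$objReader->setReadFilter($filter);

$objPHPExcel = $objReader->load('big-file.xls');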
Are you using cell caching? If so, what method? That slows down the load time.
If the only thing you need from the Excel file is the data, here is my way to read huge Excel files:
I install Gnumeric on my server, i.e. on Debian/Ubuntu:
apt-get install gnumeric
Then the PHP calls to read my Excel file and store it in a two-dimensional data array are incredibly simple (the dimensions are rows and columns):
system("ssconvert \"$excel_file_name\" \"temp.csv\"");
$array = array_map("str_getcsv", file("temp.csv"));
Then I can do what I want with my array. This takes less than 10 seconds for a 10MB xls file, about the same time I would need to open the file in my favorite spreadsheet software!
For truly huge files, you should use the fopen() and fgetcsv() functions and do your work without storing the data in a huge array, to avoid keeping the whole csv file in memory via the file() function. This will be slower, but it will not eat all your server's memory!
17MB is a hefty file.
Time how long a 1MB file takes to parse so you can work out how long a 17MB file would take. Then one option might be just to increase your 120 second limit.
Alternatively, you could export to CSV, which will be way more efficient, and import via PHP's fgetcsv.
I've heard that Excel Explorer is better at reading large files.
Maybe you could convert/export to csv and use the built-in fgetcsv(). It depends on what kind of functionality you need.
I'm currently using spreadsheet-reader (https://github.com/nuovo/spreadsheet-reader), which is quite fast at reading XLSX, ODS and CSV, and only has the problems mentioned when reading the XLS format.

Advantage to parsing Excel Spreadsheet data vs. CSV?

I have tabulated data in an Excel spreadsheet (the file size will likely never be larger than 1 MB). I want to use PHP to parse the data and insert it into a MySQL database.
Is there any advantage to keeping the file as an .xls/.xlsx and parsing it using a PHP Excel Parsing Library? If so, what are some good libraries to use?
Obviously, I can save the .xls/.xlsx as a CSV and handle the file that way.
Thanks!
If you are just after the values, I would save it as a CSV. This is much easier to parse programmatically, especially if you are trying to do this on a non-Windows box.
That being said, there will be information lost in the export to CSV. It will only save the values of the cells - not their formatting information, formulas, etc. If you need to use that information, you're better off doing this straight from Excel.
Here is a PHP Excel Reading library. If you decide to read Excel files directly, this may help get you started.
If your Excel files contain strictly data and no formulas, scripts, macros, etc., I would say parsing Excel directly will only add development overhead and will potentially slow down processing. It would probably be best to convert the files to CSV in this case.
Also consider that MySQL's LOAD DATA INFILE command can be used to import an entire CSV file into a table; this can further simplify matters for you.
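As a rough sketch via PDO (the credentials, file path, and table name are placeholders; LOAD DATA LOCAL INFILE must be allowed by both the server and the client):
<?php
$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8mb4', 'user', 'pass', [
    PDO::MYSQL_ATTR_LOCAL_INFILE => true, // required for LOCAL INFILE from the client
]);

$sql = <<<'SQL'
LOAD DATA LOCAL INFILE '/path/to/data.csv'
INTO TABLE my_table
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
SQL;

$pdo->exec($sql);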
When you provide a way for customers to upload Excel/CSV files, you should consider that:
CSV files will only export one sheet
Having multiline cells will make the CSV parsing complicated
You cannot easily detect corrupted/incomplete CSV files
CSV files do not include formatting
Aside from that, importing CSV is a lot easier than importing XLS.
Remember that if you're importing the csv file directly into MySQL, you may have problems with the date format (MySQL uses a different date format than Excel). I find it easier to change the date fields in Excel first (to the format yyyy-mm-dd) before saving as a csv file.
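If reformatting in Excel isn't an option, here is a small sketch of converting the dates on the PHP side during import (it assumes the source dates are in d/m/Y format; adjust to whatever your sheet uses):
<?php
// Convert a d/m/Y date string from the CSV to MySQL's Y-m-d format.
function toMysqlDate(string $value): ?string
{
    $date = DateTime::createFromFormat('d/m/Y', $value);
    return $date ? $date->format('Y-m-d') : null; // null if the value didn't parse
}

echo toMysqlDate('31/01/2023'); // 2023-01-31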
Edit: Although I've not used it myself, others have recommended Navicat as a very good tool for converting Excel spreadsheets or Access data into MySQL databases. It may be worth a look.
With Office 2003 there's an XML format called SpreadsheetML, which sits somewhere between plain XML and Excel. I've considered using this format to import/export data to a web site, but the format turns out to be a bit complex. Internally, this format turns all references into relative references (relative to the current location). Worse, some cells have an index, so you might see a row with only two cells, but the second cell might be 6 columns away from the first cell (in which case Index=5). Basically, if you want to use the Excel format, you need a good way to calculate the position of each cell and know how to translate the references in the cells properly.
If you're only interested in the data, CSV would be much, much easier to implement. As an in-between solution, you could define an XML schema and add an XML mapping to your spreadsheet to export the data to an XML file. It's more complex than CSV import/export, but also a bit more robust. But the Excel or Excel XML formats themselves are horrible to implement. (Or just a nice challenge, if you're a real XML expert.)
