I’m working with the Royal Mail PAF database in CSV format (approx. 29 million lines) and need to import the data into SQL Server using PHP.
Can anyone recommend the best method for this to prevent timeouts?
Here is a sample of the data: https://gist.github.com/anonymous/8278066
To disable the script execution time limit, start your script off with this:
set_time_limit(0);
Another problem you will likely run into is a memory limit. Make sure you are reading your file line-by-line or in chunks, rather than the whole file at once. You can do this with fgets().
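For example, a minimal line-by-line read might look like this (the file name and the insert step are placeholders):

<?php
set_time_limit(0); // remove the execution time limit for a long-running import

// Hypothetical path to one of the PAF CSV files
$handle = fopen('paf_extract.csv', 'r');
if ($handle === false) {
    die('Could not open file');
}

while (($line = fgets($handle)) !== false) {
    // Only the current line is held in memory
    $fields = str_getcsv($line);
    // ... insert $fields into SQL Server here, ideally in batches ...
}

fclose($handle);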
Start your script with
ini_set('max_execution_time', 0);
The quickest way I found was to use SQL Server's BULK INSERT to load the data, directly and unchanged, from the CSV files into matching import tables in the database, then do my own manipulation and population of application-specific tables from those import tables.
I found BULK INSERT will import the main CSVPAF file, containing nearly 31 million address records, in just a few minutes.
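As a rough illustration of that approach (the connection details, table name and file path are placeholders; it assumes the sqlsrv extension and a CSV file that the SQL Server machine itself can read):

<?php
// Connect using the sqlsrv extension
$conn = sqlsrv_connect('localhost', [
    'Database' => 'paf',
    'UID'      => 'user',
    'PWD'      => 'password',
]);

// Load the raw CSV straight into a matching import/staging table
$sql = "BULK INSERT dbo.paf_import
        FROM 'C:\\csvpaf\\paf.csv'
        WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\\n')";

$stmt = sqlsrv_query($conn, $sql);
if ($stmt === false) {
    die(print_r(sqlsrv_errors(), true));
}

// Application-specific tables can then be populated from dbo.paf_import
// with ordinary INSERT ... SELECT statements.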
So I have a situation where I need to offer the user a multi-sheet Excel document where each sheet has thousands of rows and ~10 columns. The data comes from multiple MySQL queries.
I'm currently using the "Laravel Excel" library to do this, but it uses far too much memory and is giving me huge scalability problems.
So I have an idea to use MySQL's SELECT ... INTO OUTFILE to write a set of CSV files to disk, one for each sheet, and then to create an xls document and write the previously written CSV data into it as sheets.
Is there a method to write the CSV contents to a sheet "in bulk", so to speak, without iterating line by line through the CSV or using up a large amount of memory (like writing to disk directly, perhaps)?
Thanks for any help!
I had a very similar problem recently. My solution was to use the very lightweight PHP library PHP_XLSXWriter.
You can find it here: https://github.com/mk-j/PHP_XLSXWriter
It streams the output so it doesn't have to retain as much in memory.
In my use case, I broke the "writeStream" method apart into three methods: one each for the header and footer, and one for the sheet content (i.e. the actual rows). This way I could write the header and then use Laravel's "chunking" feature to make the writes even more gradual.
The time increased slightly, but running the script went from ~200MB of RAM usage to under 15MB!
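For reference, the basic streaming pattern with PHP_XLSXWriter looks roughly like this (the sheet name, column types and data source are placeholders; rows would come from Laravel's chunked queries in the setup described above):

<?php
require 'xlsxwriter.class.php'; // from mk-j/PHP_XLSXWriter

$writer = new XLSXWriter();

// Write the header once; the column types here are illustrative
$writer->writeSheetHeader('Sheet1', ['id' => 'integer', 'name' => 'string', 'email' => 'string']);

// Feed rows a chunk at a time so only a small slice of data is in memory
foreach ($chunkOfRows as $row) {
    $writer->writeSheetRow('Sheet1', $row);
}

$writer->writeToFile('report.xlsx');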
Excel file formats (both BIFF and OfficeOpenXML) are not conducive to writing line-by-line like a CSV, because data isn't stored linearly. This means that all PHP libraries for writing native format Excel files have to work in PHP memory to manage the order of writing data to that file format, which means they will all consume large amounts of memory for larger volumes of data.
Laravel Excel is a wrapper around PHPExcel, which provides some options for reducing memory usage (e.g. caching cell data to disk or an SQLite database rather than holding it all in PHP memory), albeit at a cost in execution speed. What I don't know is whether Laravel Excel provides calls to enable these caching methods, though I believe some options are available that allow you to configure this.
Your alternative on a Linux platform is to use non-PHP solutions such as libXL or PUNO with OpenOffice/LibreOffice.
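If you do have access to the underlying PHPExcel settings, the cell caching mentioned above is switched on along these lines (this is raw PHPExcel rather than the Laravel Excel wrapper, and the cache method shown is just one of several):

<?php
// Keep cell data in php://temp instead of holding every cell object in memory.
// cache_to_sqlite3 and cache_to_discISAM are alternatives; all trade speed for memory.
$cacheMethod   = PHPExcel_CachedObjectStorageFactory::cache_to_phpTemp;
$cacheSettings = ['memoryCacheSize' => '16MB'];

PHPExcel_Settings::setCacheStorageMethod($cacheMethod, $cacheSettings);

// This must be set before the workbook is created or loaded
$objPHPExcel = new PHPExcel();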
I need to insert all the data from an Excel file (.xlsx) into my database. I have tried all the available methods, such as caching and reading chunk by chunk, but nothing seems to work. Has anyone tried to do this with a large file before? My spreadsheet has about 32 columns and about 700,000 rows of records.
The file is already uploaded to the server, and I want to write a cron job to read the Excel file and insert the data into the database. I chunked it to read only 5000, 3000 or even 10 records at a time, but none of that worked. Each time it returns this error:
simplexml_load_string(): Memory allocation failed: growing buffer.
I did try the CSV file type and managed to get it to run at 4000k records each time, but each pass takes about five minutes to process, and anything higher fails with the same error. But the requirement is for .xlsx files, so I need to stick with that.
Consider converting it to CSV format using an external tool such as ssconvert from the Gnumeric package, then reading the CSV line by line with the fgetcsv() function.
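A rough sketch of that two-step approach (paths are placeholders; it assumes ssconvert is installed and exec() is allowed):

<?php
$xlsx = '/path/to/upload.xlsx';
$csv  = '/path/to/upload.csv';

// Convert outside PHP; ssconvert picks the output format from the file extension
exec('ssconvert ' . escapeshellarg($xlsx) . ' ' . escapeshellarg($csv), $output, $status);
if ($status !== 0) {
    die('Conversion failed');
}

// Read the result one row at a time with a constant memory footprint
$handle = fopen($csv, 'r');
while (($row = fgetcsv($handle)) !== false) {
    // insert $row into the database here, ideally in batches
}
fclose($handle);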
Your issue occurs because you are trying to read the contents of a whole XML file. Caching and reading chunk by chunk does not help because the library you are using needs to read the entire XML file at one point to determine the structure of the spreadsheet.
So for very large files, the XML file is so big that reading it consumes all the available memory. The only working option is to use streamers and optimize the reading.
This is still a pretty complex problem. For instance, to resolve the data in your sheet, you need to read the shared strings from one XML file and the structure of your sheet from another one. Because of the way shared strings are stored, you need to have those strings in memory when reading the sheet structure. If you have thousands of shared strings, that becomes a problem.
If you are interested, Spout solves this problem. It is open-source so you can take a look at the code!
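For illustration, reading a large .xlsx with Spout looks roughly like this (based on the box/spout 3.x API; the file path is a placeholder):

<?php
require 'vendor/autoload.php';

use Box\Spout\Reader\Common\Creator\ReaderEntityFactory;

$reader = ReaderEntityFactory::createXLSXReader();
$reader->open('/path/to/big-file.xlsx');

foreach ($reader->getSheetIterator() as $sheet) {
    foreach ($sheet->getRowIterator() as $row) {
        $cells = $row->toArray();
        // insert $cells into the database here; rows are streamed rather than loaded all at once
    }
}

$reader->close();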
I'm using a foreach loop to import a 30MB CSV file into MySQL. This script needs to run for about 2-5 minutes, and I'm already using ob_flush().
Now my question is:
Is there any other option to give the user an indication of the loading progress? At the moment you never know when the script will finish.
The best advice is not to use a foreach loop to import CSV into MySQL at all.... MySQL provides a built-in feature for importing CSV files which is much quicker.
Look up the manual page for the MySQL LOAD DATA INFILE feature.
Seriously, when I say "much quicker", I mean it -- switch your code to use LOAD DATA INFILE, and you won't need a progress bar.
I agree with #SuperJer: you should use AJAX.
Every day I import 50MB of data because we run an ecommerce website. We use AJAX and show a loader on the left that says "Files uploading...".
foreach is fine for small amounts of data, but for huge data sets I don't think it's a good choice. Always use the built-in methods or functions.
To load a CSV file you can use syntax like the following:
LOAD DATA INFILE 'c:/your_csv_file.csv' INTO TABLE tablename;
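If you are running the import from PHP, the same statement can be issued through PDO, roughly like this (the table, columns and path are placeholders, and LOCAL INFILE has to be enabled on both the client and the server):

<?php
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'password', [
    PDO::MYSQL_ATTR_LOCAL_INFILE => true, // needed for LOAD DATA LOCAL INFILE
]);

$sql = "LOAD DATA LOCAL INFILE '/path/to/your_csv_file.csv'
        INTO TABLE tablename
        FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
        LINES TERMINATED BY '\\n'
        IGNORE 1 LINES"; // skip the header row if there is one

$rows = $pdo->exec($sql);
echo "Imported $rows rows";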
One of the easier ways to do this would be to call the function that does the work at the end of the page, AFTER all the page HTML has been sent to the browser.
Assuming jQuery, make note of the element in which you want the progress indicator to appear, such as id='progressBar'.
Then, as you iterate through your loop, every 100 iterations or so, echo a javascript line to update that element with the new percentage:
echo "
<script type='text/javascript'>
$('#progressBar').html('".(int)$percentage."');
</script>";
You'll want to make sure, depending on the number of iterations, to not overwhelm the browser with javascript code blocks. Perhaps check if the percentage is divisible by 5, prior to echoing the block.
Also, if you are looping through a CSV and INSERTing it line-by-line, it would be better (faster) to insert a block of them, say 500 lines per INSERT. This is also easily displayed using the same method.
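A rough sketch combining both ideas (the table, columns and selector are placeholders; it assumes an existing PDO connection in $pdo and that output buffering is flushed so the browser sees each script block as it arrives):

<?php
$batch = [];
$done  = 0;
$total = 100000; // hypothetical known row count, used only for the percentage

$handle = fopen('import.csv', 'r');
while (($row = fgetcsv($handle)) !== false) {
    $batch[] = '(' . $pdo->quote($row[0]) . ',' . $pdo->quote($row[1]) . ')';
    $done++;

    if (count($batch) === 500) {
        // one multi-row INSERT per 500 CSV lines
        $pdo->exec('INSERT INTO tablename (col1, col2) VALUES ' . implode(',', $batch));
        $batch = [];

        $percentage = (int)(($done / $total) * 100);
        echo "<script>$('#progressBar').html('" . $percentage . "%');</script>";
        ob_flush();
        flush();
    }
}
// insert whatever is left over
if ($batch) {
    $pdo->exec('INSERT INTO tablename (col1, col2) VALUES ' . implode(',', $batch));
}
fclose($handle);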
I run into trouble when I try to upload and parse, through the PHPExcel library, an .xlsx file with 12,500 rows and 20 columns. The whole process takes about ten minutes to finish; it runs queries and validations. The upload works fine, but when the library starts to parse the Excel file with all those rows, it fails.
I had to split the big file in two to get it to work.
Is it possible to parse a file of more than 12,000 lines and 20 columns in PHPExcel without it failing?
I'd also like your suggestions on the following:
Is it faster and lighter to parse a CSV file (I think so) than an Excel file, when we are talking about this many lines?
The browser dies at about minute three of execution, but the process continues on the server.
Is there any way to keep the browser from resetting the connection? And why does the process continue on the server while the browser dies?
I'm thinking of moving the process to AJAX to avoid this. What do you think of that?
What's the best way to parse this kind of file with that number of lines?
Thank you very much!
I need to convert an XML file (about 200 MB in size) to SQL and insert the data into a MySQL table (one table; it looks like there are about 10 million rows, with only a few columns).
Unfortunately, I don't have access to shell/command-line tools.
It looks like I would need to use the phpMyAdmin import tool, where the import size is limited to 50MB per upload.
Alternatively, PHP is enabled via web browsers only, so I could write a PHP script to execute from the browser.
So the steps are (please let me know if there are better ways to go about this):
unpack the file into the server
write a PHP script to convert and insert
or
do it locally and use phpMyAdmin to upload the pieces separately
What would be a good way to get this done? Any ideas/feedback/details are appreciated.
DOMDocument is very good at dealing with XML data. You can parse the data with it and convert it to the format you need.
You might have an issue with the size of the file if you can't change the configuration though. I believe the default allowed memory size is ~8MB.
If you are really dealing with 10 million rows (200MB of data at 10 million rows is roughly 21 bytes per row?), then unpacking the file on the server and writing a script to handle the inserts would probably be the best bet.
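As a very rough sketch of that script (the element and column names are made up for illustration; note that DOMDocument loads the whole tree into memory, so a 200MB file will need a generous memory_limit):

<?php
set_time_limit(0);
ini_set('memory_limit', '1024M'); // DOMDocument keeps the entire document in memory

$pdo  = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'password');
$stmt = $pdo->prepare('INSERT INTO mytable (col1, col2) VALUES (?, ?)');

$doc = new DOMDocument();
$doc->load('/path/to/data.xml');

// <row><col1>...</col1><col2>...</col2></row> is a hypothetical structure
foreach ($doc->getElementsByTagName('row') as $row) {
    $stmt->execute([
        $row->getElementsByTagName('col1')->item(0)->nodeValue,
        $row->getElementsByTagName('col2')->item(0)->nodeValue,
    ]);
}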