Convert XML to SQL - PHP

I need to convert an XML file (about 200 MB in size) to SQL and insert the data into a MySQL table (one table; it looks like there are about 10 million rows, with only a few columns).
Unfortunately, I don't have access to shell / command-line tools.
It looks like I would need to use the phpMyAdmin import tool, where the import size is limited to 50 MB per upload.
Or, since PHP is enabled via the web browser only, write a PHP script and execute it from the browser.
So, the steps are (please let me know if there is a better way to go about this):
unpack the file on the server
write a PHP script to convert and insert
or
do it locally and use phpMyAdmin to upload the chunks separately
What would be a good way to get this done? Any ideas / feedback / details are appreciated.

DOMDocument is very good at dealing with XML data. You can parse the data with it and convert it to the format you need.
You might have an issue with the size of the file if you can't change the configuration, though. I believe the default allowed memory size is around 8 MB.
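As a rough illustration of that approach, something like the sketch below walks the document and builds the inserts. The element names, table, and credentials are placeholders, since the actual XML structure isn't shown, and note that DOMDocument loads the whole file into memory:
<?php
// Sketch only: <row>, <name> and <value> are placeholder element names;
// substitute the real structure of your XML file.
$doc = new DOMDocument();
$doc->load('data.xml'); // loads the entire file into memory

$pdo  = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8mb4', 'user', 'pass');
$stmt = $pdo->prepare('INSERT INTO my_table (name, value) VALUES (?, ?)');

foreach ($doc->getElementsByTagName('row') as $row) {
    $name  = $row->getElementsByTagName('name')->item(0)->nodeValue;
    $value = $row->getElementsByTagName('value')->item(0)->nodeValue;
    $stmt->execute([$name, $value]);
}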

If you are really dealing with 10 million rows (200 MB of data at 10 million rows is roughly 21 bytes per row?), then unpacking the file on the server and writing a script to handle the inserts would probably be the best bet.
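For a file that size, a streaming parser such as XMLReader keeps memory use flat while the script inserts row by row. A minimal sketch, assuming the records sit in <row> elements with two child columns (element names, table, and credentials are placeholders):
<?php
// Sketch: stream the XML instead of loading it all, and insert in batches.
set_time_limit(0);

$pdo  = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8mb4', 'user', 'pass');
$stmt = $pdo->prepare('INSERT INTO my_table (col_a, col_b) VALUES (?, ?)');

$reader = new XMLReader();
$reader->open('data.xml');

$count = 0;
$pdo->beginTransaction();
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'row') {
        // Materialise just this one record as SimpleXML for easy access.
        $node = simplexml_load_string($reader->readOuterXml());
        $stmt->execute([(string) $node->col_a, (string) $node->col_b]);

        // Commit every 10,000 rows so one huge transaction doesn't build up.
        if (++$count % 10000 === 0) {
            $pdo->commit();
            $pdo->beginTransaction();
        }
    }
}
$pdo->commit();
$reader->close();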

Related

PHP array uses a lot more memory than it should

I tried to load a 16 MB file into a PHP array.
It ends up with about 63 MB of memory usage.
Loading it into a string consumes just the 16 MB, but the issue is that I need it inside an array to access it quickly afterwards.
The file consists of about 750k lines (a routing table dump).
I probably should load it into a MySQL database, but there isn't enough memory to run that, so I chose rqlite (https://github.com/rqlite/rqlite), since I also need the replication features.
I am not sure if an SQLite database is fast enough for that.
Does anyone have an idea for this issue?
You can get the actual file here: http://data.caida.org/datasets/routing/routeviews-prefix2as/2018/07/routeviews-rv2-20180715-1400.pfx2as.gz
The code I used:
$data = file('routeviews-rv2-20180715-1400.pfx2as');
var_dump(memory_get_usage());
Thanks.
You may use the PHP fread function. It reads data of a fixed size and can be used inside a loop to read the file in blocks. It does not consume much memory and is suitable for reading large files.
If you want to sort the data, then you may want to use a database. You can read the data from the large file in blocks (or a line at a time with fgets) and then insert it into the database.
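A minimal sketch of that idea, reading the dump line by line and inserting through PDO. SQLite is used here as a stand-in for rqlite, and the three tab-separated columns are an assumption about the pfx2as layout:
<?php
// Sketch: only one line is held in memory at a time, unlike file().
$pdo = new PDO('sqlite:prefixes.db');
$pdo->exec('CREATE TABLE IF NOT EXISTS prefixes (prefix TEXT, length INTEGER, asn TEXT)');
$stmt = $pdo->prepare('INSERT INTO prefixes (prefix, length, asn) VALUES (?, ?, ?)');

$fh = fopen('routeviews-rv2-20180715-1400.pfx2as', 'rb');
$pdo->beginTransaction();
while (($line = fgets($fh)) !== false) {
    $parts = explode("\t", trim($line));
    if (count($parts) === 3) {
        $stmt->execute($parts);
    }
}
$pdo->commit();
fclose($fh);
var_dump(memory_get_usage()); // stays small compared to file()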

How to process an 80 MB+ xlsx into a MySQL database with PHPExcel?

I need to insert all the data from an Excel file (.xlsx) into my database. I have tried all the available methods, such as caching and reading chunk by chunk, but nothing seems to work at all. Has anyone tried to do this with a big file before? My spreadsheet has about 32 columns and about 700,000 rows of records.
The file is already uploaded to the server, and I want to write a cron job to read the Excel file and insert the data into the database. I chunked it to read 5000, 3000, or even just 10 records at a time, but none of that worked. It always returns this error:
simplexml_load_string(): Memory allocation failed: growing buffer.
I did try with the CSV file type and managed to get it to run at 4000k records each time, but it takes about five minutes to process and anything higher fails too, with the same error. But the requirement is for .xlsx files, so I need to stick with that.
Consider converting it to CSV format using an external tool, like ssconvert from the Gnumeric package, and then reading the CSV line by line with the fgetcsv function.
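A rough sketch of that pipeline (file paths, table, columns and credentials are placeholders, and it assumes ssconvert is installed and exec() is allowed):
<?php
// Sketch: convert the workbook to CSV with Gnumeric's ssconvert, then stream the CSV.
exec('ssconvert data.xlsx data.csv');

$pdo  = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8mb4', 'user', 'pass');
$stmt = $pdo->prepare('INSERT INTO records (col1, col2, col3) VALUES (?, ?, ?)');

$fh = fopen('data.csv', 'rb');
fgetcsv($fh); // skip the header row, if there is one
while (($row = fgetcsv($fh)) !== false) {
    $stmt->execute([$row[0], $row[1], $row[2]]);
}
fclose($fh);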
Your issue occurs because you are trying to read the contents of a whole XML file. Caching and reading chunk by chunk does not help because the library you are using needs to read the entire XML file at one point to determine the structure of the spreadsheet.
So for very large files, the XML is so big that reading it consumes all the available memory. The only workable option is to use streaming readers and optimize the reading.
This is still a pretty complex problem. For instance, to resolve the data in your sheet, you need to read the shared strings from one XML file and the structure of your sheet from another one. Because of the way shared strings are stored, you need to have those strings in memory when reading the sheet structure. If you have thousands of shared strings, that becomes a problem.
If you are interested, Spout solves this problem. It is open-source so you can take a look at the code!
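For reference, reading a large XLSX with Spout looks roughly like the sketch below. This assumes the Spout 3.x ReaderEntityFactory API; check the library's documentation for the exact names in your version:
<?php
require 'vendor/autoload.php';

use Box\Spout\Reader\Common\Creator\ReaderEntityFactory;

// Spout streams the sheet, so memory use stays roughly constant
// no matter how many rows the file has.
$reader = ReaderEntityFactory::createXLSXReader();
$reader->open('big-file.xlsx');

foreach ($reader->getSheetIterator() as $sheet) {
    foreach ($sheet->getRowIterator() as $row) {
        $cells = $row->toArray(); // plain array of cell values
        // insert $cells into the database here, ideally in batches
    }
}
$reader->close();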

Import PAF using PHP

I'm working with the Royal Mail PAF database in CSV format (approx. 29 million lines) and need to load the data into SQL Server using PHP.
Can anyone recommend the best method for doing this without hitting a timeout?
Here is a sample of the data: https://gist.github.com/anonymous/8278066
To disable the script execution time limit, start your script off with this:
set_time_limit(0);
Another problem you will likely run into is a memory limit. Make sure you are reading your file line-by-line or in chunks, rather than the whole file at once. You can do this with fgets().
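A minimal sketch of that approach, streaming the CSV one line at a time (fgetcsv is the CSV-aware equivalent of the fgets call mentioned above) and committing in batches; the column names, table, and pdo_sqlsrv connection details are assumptions:
<?php
// Sketch: no time limit, stream the CSV line by line, insert in batches.
set_time_limit(0);

$pdo  = new PDO('sqlsrv:Server=localhost;Database=paf', 'user', 'pass');
$stmt = $pdo->prepare('INSERT INTO addresses (postcode, thoroughfare, building) VALUES (?, ?, ?)');

$fh = fopen('paf.csv', 'rb');
$count = 0;
$pdo->beginTransaction();
while (($row = fgetcsv($fh)) !== false) {
    $stmt->execute([$row[0], $row[1], $row[2]]);
    if (++$count % 10000 === 0) { // commit every 10k rows to keep transactions small
        $pdo->commit();
        $pdo->beginTransaction();
    }
}
$pdo->commit();
fclose($fh);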
Start your script with
ini_set('max_execution_time', 0);
The quickest way I found was to use SQL Server's BULK INSERT to load the data, directly and unchanged, from the CSV files into matching import tables in the database, and then do my own manipulation and population of the application-specific tables from those import tables.
I found BULK INSERT will import the main CSV PAF file, containing nearly 31 million address records, in just a few minutes.
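Issued from PHP, that could look roughly like the sketch below; the file path, table, and CSV options are placeholders, and the CSV must be readable by the SQL Server instance itself:
<?php
// Sketch: let SQL Server do the heavy lifting with BULK INSERT.
$pdo = new PDO('sqlsrv:Server=localhost;Database=paf', 'user', 'pass');
$pdo->exec("
    BULK INSERT dbo.paf_import
    FROM 'C:\\data\\paf.csv'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\\n', FIRSTROW = 2)
");
// Then populate the application-specific tables from dbo.paf_import
// with ordinary INSERT ... SELECT statements.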

Parsing an Excel file with more than 12000 lines in PHP with PHPExcel

I have trouble when I try to upload and parse, through the PHPExcel library, an xlsx file with 12,500 rows and 20 columns. The whole process takes about ten minutes to finish; it runs queries and validations. The upload works fine, but when it starts to parse the Excel file with all those rows, the library fails.
I had to divide the big file into two files; that way it works fine.
Is it possible to parse a file of more than 12,000 lines and 20 columns in PHPExcel without it failing?
I'd also like your suggestions on the following.
Is parsing a CSV file faster and lighter (I think so) than parsing an Excel file when we are talking about this many lines?
The browser dies at about minute three of the execution, but the process on the server continues.
Is there any way to keep the browser from resetting the connection? And why does the process continue on the server while the browser dies?
I'm thinking of moving the process to AJAX to avoid this. What do you think of that?
What's the best way to parse this type of file with that number of lines?
Thank you very much!

PHPExcel large data sets with multiple tabs - memory exhausted

Using PHPExcel I can run each tab separately and get the results I want, but if I add them all into one Excel file it just stops; no error or anything.
Each tab consists of about 60 to 80 thousand records, and I have about 15 to 20 tabs, so about 1,600,000 records split into multiple tabs (this number will probably grow as well).
I have also gotten past the 65,000-row limitation of .xls by using the .xlsx extension, with no problems as long as I run each tab in its own Excel file.
Pseudo code:
read data from db
start the PHPExcel process
parse out data for each page (some styling/formatting but not much)
(each numeric field value does get summed up in a totals column at the bottom of the excel using the formula SUM)
save excel (xlsx format)
I have 3GB of RAM so this is not an issue and the script is set to execute with no timeout.
I have used PHPExcel in a number of projects and have had great results but having such a large data set seems to be an issue.
Has anyone ever had this problem? Workarounds? Tips? Etc...
UPDATE:
on error log --- memory exhausted
Besides adding more RAM to the box, are there any other things I could try?
Has anyone ever saved the current state and then edited the Excel file with new data?
I had the exact same problem, and Googling around did not turn up a workable solution.
As PHPExcel generates objects and stores all the data in memory before finally generating the document file, which itself is also held in memory, setting higher memory limits in PHP will never entirely solve this problem; that solution does not scale very well.
To really solve the problem, you need to generate the XLS file on the fly. That's what I did, and now I can be sure that the "download SQL result set as XLS" feature works no matter how many (millions of) rows are returned by the database.
The pity is, I could not find any library that offers this kind of on-the-fly XLS(X) generation.
I found this article on IBM developerWorks which gives an example of how to generate the XLS XML on the fly:
http://www.ibm.com/developerworks/opensource/library/os-phpexcel/#N101FC
It works pretty well for me: I have multiple sheets with lots of data and did not even touch the PHP memory limit. It scales very well.
Note that this example uses the plain Excel XML format (file extension .xml), so you can send your uncompressed data directly to the browser.
http://en.wikipedia.org/wiki/Microsoft_Office_XML_formats#Excel_XML_Spreadsheet_example
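The core of that on-the-fly approach looks something like the sketch below (the query, column names and credentials are placeholders). Each row is echoed and flushed as soon as it is read, so nothing accumulates in memory:
<?php
// Sketch: stream an Excel 2003 XML spreadsheet ("SpreadsheetML") straight to the browser.
header('Content-Type: application/vnd.ms-excel');
header('Content-Disposition: attachment; filename="export.xml"');

echo '<?xml version="1.0"?>' . "\n";
echo '<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"' . "\n";
echo '          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">' . "\n";
echo '<Worksheet ss:Name="Sheet1"><Table>' . "\n";

$pdo    = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8mb4', 'user', 'pass');
$result = $pdo->query('SELECT col_a, col_b FROM big_table', PDO::FETCH_NUM);

foreach ($result as $row) {
    echo '<Row>';
    foreach ($row as $value) {
        echo '<Cell><Data ss:Type="String">' . htmlspecialchars($value) . '</Data></Cell>';
    }
    echo "</Row>\n";
    flush(); // push the row out instead of buffering the whole sheet
}

echo '</Table></Worksheet></Workbook>';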
If you really need to generate an XLSX, things get even more complicated. XLSX is a compressed archive containing multiple XML files, so you must write all your data to disk (or keep it in memory, which has the same problem as PHPExcel) and then create the archive from that data.
http://en.wikipedia.org/wiki/Office_Open_XML
It may also be possible to generate compressed archives on the fly, but that approach seems really complicated.
