CSV export script running out of memory - php

I have a PHP script that exports a MySQL database to CSV format. The strange CSV format is a requirement for a 3rd party software application. It worked initially, but I am now running out of memory at line 743. My hosting service has a limit of 66MB.
My complete code is listed here. http://pastebin.com/fi049z4n
Are there alternatives to using array_splice? Can anyone please assist me in reducing memory usage? What functions can be changed? How do I free up memory?

You have to change your CSV creation strategy. Instead of reading the whole result from the database into an array in memory, you must work sequentially:
Read a row from the database
Write that row into the CSV file (or maybe buffer)
Free the row from memory
Do this in a loop...
This way the memory consumption of your script does not rise proportionally with the number of rows in the result, but stays (more or less) constant.
This might serve as an untested example to sketch the idea:
while (FALSE !== ($row = mysql_fetch_row($res))) {
    fwrite($csv_file, implode(',', $row) . "\n");
}

Increase the memory limit and maximum execution time
For example :
// Change the values to suit you
ini_set("memory_limit", "512M");
ini_set("max_execution_time", "5000");

Related

PHP array uses a lot more memory than it should

I tried to load a 16MB file into a PHP array.
It ends up using about 63MB of memory.
Loading it into a string consumes just the 16MB, but I need it inside an array so I can access it faster afterwards.
The file consists of about 750k lines (a routing table dump).
I probably should load it into a MySQL database; the issue there is that I don't have enough memory to run it, so I chose rqlite: https://github.com/rqlite/rqlite, since I also need the replication features.
I am not sure whether an SQLite database is fast enough for that.
Does anyone have an idea for this issue?
You can get the actual file here: http://data.caida.org/datasets/routing/routeviews-prefix2as/2018/07/routeviews-rv2-20180715-1400.pfx2as.gz
The code I used:
$data = file('routeviews-rv2-20180715-1400.pfx2as');
var_dump(memory_get_usage());
Thanks.
You may use PHP's fread function. It allows reading data of a fixed size and can be used inside a loop to read sized blocks of data. It does not consume much memory and is suitable for reading large files.
If you want to sort the data, then you may want to use a database. You can read the data from the large file one line at a time (e.g. with fgets) and then insert it into the database.
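A rough, untested sketch of that approach: stream the pfx2as dump one line at a time instead of calling file(), inserting each row into SQLite via PDO. The file name matches the question; the table name and column layout are assumptions for illustration.

```php
<?php
// Sketch: line-by-line ingest, so memory stays flat regardless of file size.
$db = new PDO('sqlite:prefixes.db');
$db->exec('CREATE TABLE IF NOT EXISTS prefixes (prefix TEXT, len INTEGER, asn TEXT)');
$db->beginTransaction();
$stmt = $db->prepare('INSERT INTO prefixes (prefix, len, asn) VALUES (?, ?, ?)');

$fh = fopen('routeviews-rv2-20180715-1400.pfx2as', 'r');
while (($line = fgets($fh)) !== false) {
    // Each line is "<prefix>\t<length>\t<asn>" in the pfx2as format
    [$prefix, $len, $asn] = explode("\t", rtrim($line, "\n"));
    $stmt->execute([$prefix, (int)$len, $asn]);
}
fclose($fh);
$db->commit();
```

Wrapping the inserts in a single transaction matters here: without it, SQLite syncs to disk per insert and 750k rows would take far longer.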

How to process 80MB+ xlsx to database MySQL with PHPExcel?

I need to insert all the data from an Excel file (.xlsx) into my database. I have tried all the available methods, such as caching and reading it chunk by chunk, but nothing seems to work at all. Has anyone tried to do this with a big file before? My spreadsheet has about 32 columns and about 700,000 rows of records.
The file is already uploaded to the server, and I want to write a cron job to read the Excel file and insert the data into the database. I chunked it to read 5000, 3000 or even just 10 records at a time, but none of that worked. It returns this error:
simplexml_load_string(): Memory allocation failed: growing buffer.
I did try the CSV file type and managed to get it running at 4000k records each time, but it takes about five minutes per batch, and anything higher fails with the same error. But the requirement is .xlsx, so I need to stick with that.
Consider converting it to CSV format using an external tool, like ssconvert from the Gnumeric package, and then reading the CSV line by line with the fgetcsv function.
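An untested sketch of that suggestion: shell out to ssconvert to produce a CSV, then stream it row by row. The file names are placeholders.

```php
<?php
// Sketch: convert .xlsx to .csv with Gnumeric's ssconvert, then stream it.
$xlsx = 'import.xlsx';
$csv  = 'import.csv';
exec('ssconvert ' . escapeshellarg($xlsx) . ' ' . escapeshellarg($csv));

$fh = fopen($csv, 'r');
while (($row = fgetcsv($fh)) !== false) {
    // $row is a plain array of cell values for one spreadsheet row.
    // Insert it into MySQL here, e.g. via a prepared PDO statement,
    // committing in batches to keep transaction sizes reasonable.
}
fclose($fh);
```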
Your issue occurs because you are trying to read the contents of a whole XML file. Caching and reading chunk by chunk does not help because the library you are using needs to read the entire XML file at one point to determine the structure of the spreadsheet.
So for very large files, the XML file is so big that reading it consumes all the available memory. The only working option is to use streamers and optimize the reading.
This is still a pretty complex problem. For instance, to resolve the data in your sheet, you need to read the shared strings from one XML file and the structure of your sheet from another one. Because of the way shared strings are stored, you need to have those strings in memory when reading the sheet structure. If you have thousands of shared strings, that becomes a problem.
If you are interested, Spout solves this problem. It is open-source so you can take a look at the code!
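For a sense of what the streaming approach looks like, here is a rough sketch using Spout's reader (v2-style API; the entry points moved in v3, so check the version you install). The file name is a placeholder.

```php
<?php
// Sketch: Spout streams the sheet instead of loading the whole XML,
// so memory use stays roughly constant regardless of row count.
require 'vendor/autoload.php';

use Box\Spout\Reader\ReaderFactory;
use Box\Spout\Common\Type;

$reader = ReaderFactory::create(Type::XLSX);
$reader->open('import.xlsx');

foreach ($reader->getSheetIterator() as $sheet) {
    foreach ($sheet->getRowIterator() as $row) {
        // $row holds one spreadsheet row (an array of cell values in v2);
        // insert it into the database here instead of accumulating rows.
    }
}
$reader->close();
```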

PHP Reading large tab delimited file looking for one line

We get a product list from our suppliers delivered to our site by ftp. I need to create a script that searches through that file (tab delimited) for the products relevant to us and use the information to update stock levels, prices etc.
The file itself is something like 38,000 lines long and I'm wondering on the best way of handling this.
The only way I can think initially is using fopen and fgetcsv then cycling through each line. Putting the line into an array and looking for the relevant product code.
I'm hoping there is a much more efficient way (though I haven't tested the efficiency of this yet)
The file I'll be reading is 8.8 Mb.
All of this will need to be done automatically, e.g. by CRON on a daily basis.
Edit - more information.
I have run my first trial, and based on the 2 answers, I have the following code:
The items I need to pick out of the text file are loaded from the database into an array with $items[$row['item_id']] = $row['prod_code'];
$catalogue = file('catalogue.txt');
foreach ($catalogue as $line)
{
    $prod = explode("\t", trim($line));
    if (in_array($prod[0], $items))
    {
        echo $prod[0]."<br>"; // will be updating the stock level in the db eventually
    }
}
Though this is not giving the correct output currently
I used to do a similar thing with Dominos Pizza clocking in daily data (all UK).
Either load it all into a database in one go.
OR
Use fopen and load a line at a time into a database, keeping memory overheads low. (I had to use this method as the data wasn't formatted very well)
You can then query the database at your leisure.
What do you mean by »I hope there is a more efficient way«? Efficient in respect to what? Writing the code? CPU consumption while executing the code? Disk I/O? Memory consumption?
Holding ~9MB of text in memory is not a problem (unless you've got a very low memory limit). A file() call reads the entire file and returns an array (split by lines). This, or file_get_contents(), is the most efficient approach in terms of disk I/O, but consumes a lot more memory than necessary.
Putting the line into an array and looking for the relevant product code.
I'm not sure why you would need to cache the contents of that file in an array. But if you do, remember that the array will use slightly more memory than the ~9MB of text. So you'd probably want to read the file sequentially, to avoid having the same data in memory twice.
Depending on what you want to do with the data, loading it into a database might be a viable solution as well, as @user1487944 already pointed out.
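A minimal, untested sketch of the sequential scan described above: stream the tab-delimited catalogue with fgetcsv instead of holding the whole file in an array. The file name and the $items array follow the question's code; flipping the array makes the product-code lookup O(1) instead of an in_array() scan per line.

```php
<?php
// Sketch: assumes $items maps item_id => prod_code, as in the question.
$items  = array(1 => 'ABC123', 2 => 'DEF456');  // stand-in for the DB query
$wanted = array_flip($items);                   // prod_code => item_id lookup

$fh = fopen('catalogue.txt', 'r');
while (($prod = fgetcsv($fh, 0, "\t")) !== false) {
    if (isset($wanted[$prod[0]])) {
        // update the stock level / price for this product code here
    }
}
fclose($fh);
```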

PHP Reducing memory usage when using fputcsv to php://output

We've run into a bit of a weird issue. It goes like this. We have large reams of data that we need to output to the client. These data files cannot be pre-built, they must be served from live data.
My preferred solution has been to write into the CSV line by line from a fetch like this:
while($datum = $data->fetch(PDO::FETCH_ASSOC)) {
$size += fputcsv($outstream, $datum, chr(9), chr(0));
}
This got around a lot of ridiculous memory usage (reading 100000 records into memory at once is bad mojo) but we still have lingering issues for large tables that are only going to get worse as the data increases in size. And please note, there is no data partitioning; they don't download in year segments, but they download all of their data and then segment it themselves. This is per the requirements; I'm not in a position to change this as much as it would remove the problem entirely.
In either case, on the largest table it runs out of memory. One solution is to increase the memory available, which solves one problem but suggests the creation of server load problems later on or even now if more than one client is downloading.
In this case, $outstream is:
$outstream = fopen("php://output",'w');
Which seems pretty obviously not really a physical disk location. I don't know much about php://output in terms of where the data resides before it is sent to the client, but it seems obvious that there are memory issues with simply writing a manifold database table to csv via this method.
To be exact, the staging box allows about 128mb for PHP, and this call in particular was short about 40mb (it tried to allocate 40mb more.) This seems a bit odd for behavior, as you would expect it to ask for memory in smaller parts.
Anyone know what can be done to get a handle on this?
So it looks like the memory consumption is caused by Zend Framework's output buffering. The best solution I came up with was calling ob_end_clean() right before we start to stream the file to the client. This particular instance of ZF is not going to produce any normal output or do anything more after this point, so no complications arise. The one odd thing that does happen (perhaps from the standpoint of the user) is that they really do get the file streamed to them.
Here's the code:
ob_end_clean();
while($datum = $data->fetch(PDO::FETCH_ASSOC)) {
$size += fputcsv($outstream, $datum, chr(9), chr(0));
}
Memory usage (according to the function memory_get_peak_usage(true) suggested in a ZF forum post somewhere) went from 90 megabytes down to 9 megabytes, which is what it was using here on my development box prior to any file reading.
Thanks for the help, guys!

PHPExcel - Write/append in chunks

I am using the PHPExcel framework to try to write out a very large excel document from a mysql query.
Everything works fine until I hit the 5000 row mark (or thereabouts), where the page gets flagged with the error:
Fatal error: Allowed memory size of xxx bytes exhausted (tried to allocate yyy bytes) in zzz on line aaa
I know this is documented, and I have adjusted the memory allocation on the server but even so I am still hitting the ceiling. I have also tried to turn off formatting, but in reality I will need it on.
So is there a way to write in small chunks, or append to the excel document so I don't exhaust the memory allocation? I am thinking along the lines of the page writing say 1000 lines, then redirect to itself and process the next 1000 using a GET to keep track. For example:
index.php?p=0
then redirect to
index.php?p=1000,
But I can't find a way to append to an existing document without opening up the whole thing.
There is no way of writing in chunks, although a common mistake is for people to load their MySQL data into an array, then loop through the array setting the Excel cell data. It's more memory-efficient to set the cell data as you loop through the MySQL query resultset.
If you need to keep memory usage to a minimum, what cell caching method are you using? Cell caching is slower, but can save significant amounts of memory.
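An untested sketch combining both suggestions: enable PHPExcel's cell caching before creating the workbook, and set cell values directly while looping over the MySQL result set. The $result variable (a mysqli result) and cache size are assumptions.

```php
<?php
// Sketch: cache cells to php://temp instead of keeping them all in memory.
require 'PHPExcel.php';

$cacheMethod = PHPExcel_CachedObjectStorageFactory::cache_to_phpTemp;
PHPExcel_Settings::setCacheStorageMethod($cacheMethod, array('memoryCacheSize' => '8MB'));

$excel = new PHPExcel();
$sheet = $excel->getActiveSheet();

$rowNum = 1;
while ($row = $result->fetch_assoc()) {   // $result: your mysqli query result
    $col = 0;
    foreach ($row as $value) {
        // Write straight from the resultset; no intermediate array.
        $sheet->setCellValueByColumnAndRow($col++, $rowNum, $value);
    }
    $rowNum++;
}
```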