I tried to load a 16MB file into a PHP array.
It ends up using about 63MB of memory.
Loading it into a string only consumes the 16MB, but I need it in an array so that I can access it faster afterwards.
The file consists of about 750k lines (routing table dump).
I probably should load it into a MySQL database, but there is not enough memory to run that, so I chose rqlite: https://github.com/rqlite/rqlite, since I also need the replication features.
I am not sure if a SQLite database is fast enough for that.
Does anyone have an idea for this issue?
You can get the actual file here: http://data.caida.org/datasets/routing/routeviews-prefix2as/2018/07/routeviews-rv2-20180715-1400.pfx2as.gz
The code I used:
$data = file('routeviews-rv2-20180715-1400.pfx2as');
var_dump(memory_get_usage());
Thanks.
You may use the PHP fread function. It reads a fixed number of bytes at a time, so it can be used inside a loop to process a large file in blocks without consuming much memory; that makes it suitable for reading large files.
If you want to sort the data, then you may want to use a database. You can read the large file one line at a time (fgets is convenient for that) and insert each line into the database, as in the sketch below.
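A minimal sketch of that approach for the routing-table dump above, assuming the pfx2as file is tab-separated and using a local SQLite database via PDO purely for illustration (rqlite itself is accessed over HTTP, so only the streaming pattern carries over):

<?php
// Sketch: read the dump line by line and insert each row into SQLite,
// so only one line is held in memory at a time.
$pdo = new PDO('sqlite:prefixes.db');   // assumes the pdo_sqlite extension
$pdo->exec('CREATE TABLE IF NOT EXISTS prefixes (prefix TEXT, length INTEGER, asn TEXT)');
$stmt = $pdo->prepare('INSERT INTO prefixes (prefix, length, asn) VALUES (?, ?, ?)');

$handle = fopen('routeviews-rv2-20180715-1400.pfx2as', 'r');
if ($handle === false) {
    die('Cannot open file');
}
$pdo->beginTransaction();   // one transaction keeps ~750k inserts fast
while (($line = fgets($handle)) !== false) {
    $parts = explode("\t", trim($line));   // assumed format: prefix, length, AS
    if (count($parts) === 3) {
        $stmt->execute($parts);
    }
}
$pdo->commit();
fclose($handle);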
Related
I am using file_get_contents to get 1 million records from a URL and output the results, which are in JSON format. I can't use pagination, and I am currently working around the problem by increasing my memory limit. Is there any other solution for this?
If you're processing large amounts of data, fscanf will probably prove valuable and more efficient than, say, using file followed by a split and sprintf command. In contrast, if you're simply echoing a large amount of text with little modification, file, file_get_contents, or readfile might make more sense. This would likely be the case if you're using PHP for caching or even to create a makeshift proxy server.
More: The right way to read files with PHP
I need to insert all the data in an Excel file (.xlsx) into my database. I have tried all the available methods, such as caching and reading it chunk by chunk, but nothing seems to work at all. Has anyone tried to do this with a big file before? My spreadsheet has about 32 columns and about 700,000 rows of records.
The file is already uploaded to the server, and I want to write a cron job to read the Excel file and insert the data into the database. I chunked it to read only 5000, 3000 or even 10 records at a time, but none of that worked. What happens is it returns this error:
simplexml_load_string(): Memory allocation failed: growing buffer.
I did try with the CSV file type and managed to get it running at 4000k records each time, but it takes about five minutes to process each batch, and anything higher fails too, with the same error. However, the requirement is for .xlsx files, so I need to stick with that.
Consider converting it to CSV format using an external tool, such as ssconvert from the Gnumeric package, and then reading the CSV line by line with fgetcsv, as in the sketch below.
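A minimal sketch of that approach, assuming ssconvert is installed on the server; the database credentials, table and columns are placeholders:

<?php
// Sketch: convert the spreadsheet to CSV with an external tool, then stream
// the CSV one row at a time so memory use stays flat.
$xlsx = 'data.xlsx';
$csv  = 'data.csv';
exec('ssconvert ' . escapeshellarg($xlsx) . ' ' . escapeshellarg($csv));

$pdo  = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');          // placeholder credentials
$stmt = $pdo->prepare('INSERT INTO records (col1, col2, col3) VALUES (?, ?, ?)'); // placeholder columns

$handle = fopen($csv, 'r');
while (($row = fgetcsv($handle)) !== false) {
    $stmt->execute(array_slice($row, 0, 3));   // insert one row at a time
}
fclose($handle);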
Your issue occurs because you are trying to read the contents of a whole XML file. Caching and reading chunk by chunk do not help, because the library you are using needs to read the entire XML file at some point to determine the structure of the spreadsheet.
So for very large files, the XML is so big that reading it consumes all the available memory. The only workable option is to use streaming readers and to optimize how the file is read.
This is still a pretty complex problem. For instance, to resolve the data in your sheet, you need to read the shared strings from one XML file and the structure of your sheet from another one. Because of the way shared strings are stored, you need to have those strings in memory when reading the sheet structure. If you have thousands of shared strings, that becomes a problem.
If you are interested, Spout solves this problem. It is open-source so you can take a look at the code!
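For reference, a rough sketch of reading an XLSX file row by row with Spout. The exact class names depend on the Spout version installed (this follows the 3.x style), and the file path is a placeholder:

<?php
require 'vendor/autoload.php';

use Box\Spout\Reader\Common\Creator\ReaderEntityFactory;

// Stream the spreadsheet row by row instead of loading it all into memory.
$reader = ReaderEntityFactory::createXLSXReader();
$reader->open('data.xlsx');   // placeholder path

foreach ($reader->getSheetIterator() as $sheet) {
    foreach ($sheet->getRowIterator() as $row) {
        $cells = $row->toArray();   // one row at a time; insert into the DB here
    }
}

$reader->close();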
I'm working on a cron script that hits an API, receives a JSON file (a large array of objects), and stores it locally. Once that is complete, another script needs to parse the downloaded JSON file and insert each object into a MySQL database.
I'm currently using file_get_contents() along with json_decode(). This attempts to read the whole file into memory before processing it. That would be fine except that my JSON files will usually range from 250MB to 1GB+. I know I can increase my PHP memory limit, but that doesn't seem like the best answer to me. I'm aware that I can use fopen() and fgets() to read the file in line by line, but I need to read the file in by each JSON object.
Is there a way to read in the file per object, or is there another similar approach?
Try this library: https://github.com/shevron/ext-jsonreader
The existing ext/json which is shipped with PHP is very convenient and simple to use - but it is inefficient when working with large amounts of JSON data, as it requires reading the entire JSON data into memory (e.g. using file_get_contents()) and then converting it into a PHP variable at once - for large data sets, this takes up a lot of memory.
JSONReader is designed for memory efficiency - it works on streams and can read JSON data from any PHP stream without loading the entire data into memory. It also allows the developer to extract specific values from a JSON stream without decoding and loading all data into memory.
This really depends on what the JSON files contain.
If opening the file into memory in one shot is not an option, your only other option, as you alluded to, is fopen/fgets.
Reading line by line is possible, and if these JSON objects have a consistent structure, you can easily detect where a JSON object in the file starts and ends.
Once you collect a whole object, you insert it into a DB, then go on to the next one.
There isn't much more to it. The algorithm to detect the beginning and end of a JSON object may get complicated depending on your data source, but I have done something like this before with a far more complex structure (XML) and it worked fine.
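A minimal sketch of that idea, under the simplifying assumption that the file is newline-delimited (one JSON object per line); real boundary detection may need the small state machine described above. The database credentials, table and columns are placeholders:

<?php
// Sketch: stream the file line by line, decode one JSON object at a time,
// and insert it into MySQL. Assumes one object per line (NDJSON-style input).
$pdo  = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');    // placeholder credentials
$stmt = $pdo->prepare('INSERT INTO items (item_id, payload) VALUES (?, ?)'); // placeholder table

$handle = fopen('download.json', 'r');
while (($line = fgets($handle)) !== false) {
    $line = trim($line, " \t\n\r,[]");   // strip surrounding array brackets/commas if present
    if ($line === '') {
        continue;
    }
    $obj = json_decode($line, true);
    if ($obj !== null) {
        $stmt->execute([$obj['id'] ?? null, $line]);
    }
}
fclose($handle);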
Best possible solution:
Use some sort of delimiter (pagination, timestamp, object ID, etc.) that allows you to read the data in smaller chunks over multiple requests; a rough sketch of a paginated fetch is shown below. This solution assumes that you have some control over how these JSON files are generated. I'm basing my assumption on:
This would be fine except for the fact that my JSON files will usually range from 250MB-1GB+.
Reading in and processing 1GB of JSON data is simply ridiculous. A better approach is most definitely needed.
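If the API can be changed, a paginated fetch might look like this sketch; the endpoint and its page/limit query parameters are hypothetical:

<?php
// Sketch: pull the data in pages instead of one giant JSON file.
$page  = 1;
$limit = 1000;   // hypothetical page size

do {
    // hypothetical endpoint and query parameters
    $url   = "https://api.example.com/items?page={$page}&limit={$limit}";
    $json  = file_get_contents($url);
    $batch = json_decode($json, true);

    if (!is_array($batch)) {
        break;   // stop on a bad response
    }
    foreach ($batch as $item) {
        // insert $item into the database here
    }

    $page++;
} while (count($batch) === $limit);   // a short page means we reached the end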
I'm trying to create an array from a file using PHP's unpack function. The problem is that PHP runs out of memory when working with bigger files. The script should handle files between 3 and 4 MB when done, but still stay reasonably fast.
Here's the basic idea:
<?php
$file = 'uploads/file.pcg';
$array = unpack('C*', file_get_contents($file));
?>
Is there a way of producing the array from the entire file at once without overloading PHP, or is my only option to work with a reasonable amount of data per script instance?
- About 1 MB seems to be reasonably fast.
- Could it be that even the array alone would need more memory than the allowed limit?
Also... Sorry if something similar has already been posted here - I don't think it was, though. :D
Thank you for the help.
It looks like you must increase memory_limit in your ini file, as file_get_contents will load the whole file into memory. Basically, if you want to build one large array, you have to do that anyway. Alternatively, you can look for another way to read the file and unpack it step by step, without reading the whole file into memory.
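A minimal sketch of the step-by-step alternative: read the file in fixed-size chunks and unpack each chunk instead of the whole file. The chunk size is an arbitrary choice:

<?php
$file   = 'uploads/file.pcg';
$handle = fopen($file, 'rb');
if ($handle === false) {
    die('Cannot open file');
}
while (!feof($handle)) {
    $chunk = fread($handle, 8192);   // arbitrary chunk size
    if ($chunk === false || $chunk === '') {
        break;
    }
    $bytes = unpack('C*', $chunk);
    foreach ($bytes as $byte) {
        // process each byte here instead of keeping the full array in memory
    }
}
fclose($handle);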
We get a product list from our suppliers delivered to our site by FTP. I need to create a script that searches through that file (tab delimited) for the products relevant to us and uses the information to update stock levels, prices, etc.
The file itself is something like 38,000 lines long and I'm wondering about the best way of handling this.
The only way I can think of initially is using fopen and fgetcsv, then cycling through each line, putting the line into an array and looking for the relevant product code.
I'm hoping there is a much more efficient way (though I haven't tested the efficiency of this yet).
The file I'll be reading is 8.8 Mb.
All of this will need to be done automatically, e.g. by CRON on a daily basis.
Edit - more information.
I have run my first trial, and based on the 2 answers, I have the following code:
The items I need to pick out of the text file come from the database and are stored in an array via $items[$row['item_id']] = $row['prod_code'];
$catalogue = file('catalogue.txt');
foreach ($catalogue as $line)
{
    $prod = explode("\t", trim($line)); // the file is tab delimited
    if (in_array($prod[0], $items))
    {
        echo $prod[0]."<br>"; // will be updating the stock level in the db eventually
    }
}
Though this is not giving the correct output currently
I used to do a similar thing with Dominos Pizza clocking-in data (daily, for all of the UK).
Either load it all into a database in one go.
OR
Use fopen and load a line at a time into a database, keeping memory overheads low; a sketch of this approach follows below. (I had to use this method as the data wasn't formatted very well.)
You can then query the database at your leisure.
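A minimal sketch of the line-at-a-time option for this tab-delimited catalogue; the database credentials, table, columns and column order are placeholders:

<?php
// Sketch: stream the tab-delimited catalogue one line at a time and update
// stock levels with a prepared statement, so memory use stays low.
$pdo  = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');          // placeholder credentials
$stmt = $pdo->prepare('UPDATE products SET stock = ?, price = ? WHERE prod_code = ?'); // placeholder table

$handle = fopen('catalogue.txt', 'r');
while (($row = fgetcsv($handle, 0, "\t")) !== false) {
    // assumed column order: product code, stock level, price
    $stmt->execute([$row[1], $row[2], $row[0]]);
}
fclose($handle);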
What do you mean by »I hope there is a more efficient way«? Efficient in respect to what? Writing the code? CPU consumption while executing the code? Disk I/O? Memory consumption?
Holding ~9MB of text in memory is not a problem (unless you've got a very low memory limit). A file() call would read the entire file and return an array (split by lines). This or file_get_contents() will be the most efficient approach in respect to Disk I/O, but consume a lot more memory than necessary.
Putting the line into an array and looking for the relevant product code.
I'm not sure why you would need to cache the contents of that file in an array. But if you do, remember that the array will use quite a bit more memory than the ~9MB of raw text. So you'd probably want to read the file sequentially, to avoid having the same data in memory twice.
Depending on what you want to do with the data, loading it into a database might be a viable solution as well, as #user1487944 already pointed out.