I'm trying to create an array from a file using PHP's unpack function. The problem is that PHP runs out of memory when working with bigger files. The finished script should handle files between 3 and 4 MB while still staying reasonably fast.
Here's the basic idea:
<?php
$file = 'uploads/file.pcg';
$array = unpack('C*', file_get_contents($file));
?>
Is there a way of producing the array from the entire file at once without overloading PHP, or is my only option to work with a reasonable amount of data per script instance?
- About 1 MB seems to be reasonably fast.
- Could it be that even the array alone would need more memory than the allowed limit?
Also... Sorry if something similar has already been posted here - I don't think it was, though. :D
Thank you for the help.
It looks like you must increase memory_limit in your ini file, since file_get_contents will load the whole file into memory. Basically, if you want the whole array at once, you must do that anyway. Otherwise, look for another way to read the file and unpack it step by step, without reading the whole file into memory.
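For what it's worth, a minimal sketch of the step-by-step approach, assuming you can process the bytes one block at a time rather than needing the whole array at once (process_block() is just a placeholder for whatever you do with the values):
$handle = fopen('uploads/file.pcg', 'rb');
while (!feof($handle)) {
    $chunk = fread($handle, 1048576);          // read 1 MB at a time
    if ($chunk === false || $chunk === '') {
        break;
    }
    $bytes = unpack('C*', $chunk);             // 1-indexed array for this chunk only
    process_block($bytes);                     // placeholder for your own processing
}
fclose($handle);
Each iteration only keeps one chunk's worth of bytes in memory, so the peak stays around the chunk size rather than the size of the whole file.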
I tried to load a 16 MB file into a PHP array.
It ends up with about 63 MB of memory usage.
Loading it into a string only consumes the 16 MB, but the issue is, I need it inside an array to access it faster afterwards.
The file consists of about 750k lines (routing table dump).
I probably should load it into a MySQL database; the issue there is not enough memory to run that, so I chose rqlite: https://github.com/rqlite/rqlite, since I also need the replication features.
I am not sure if a SQLite database is fast enough for that.
Does anyone have an idea for this issue?
You can get the actual file here: http://data.caida.org/datasets/routing/routeviews-prefix2as/2018/07/routeviews-rv2-20180715-1400.pfx2as.gz
The code I used:
$data = file('routeviews-rv2-20180715-1400.pfx2as');
var_dump(memory_get_usage());
Thanks.
You may use the PHP fread function. It reads data in fixed-size blocks and can be used inside a loop to read one block at a time. It does not consume much memory and is suitable for reading large files.
If you want to sort the data, then you may want to use a database. You can read the data from the large file one line at a time (with fgets, or in fixed-size blocks with fread) and then insert it into the database.
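As a rough sketch (not tested against your data): stream the dump line by line and insert into a local SQLite file via PDO. rqlite itself is accessed over its HTTP API, so this only illustrates the line-at-a-time pattern; the table and column names, and the tab-separated prefix/length/ASN layout, are assumptions on my part.
$pdo = new PDO('sqlite:routes.db');
$pdo->exec('CREATE TABLE IF NOT EXISTS routes (prefix TEXT, length INTEGER, asn TEXT)');
$stmt = $pdo->prepare('INSERT INTO routes (prefix, length, asn) VALUES (?, ?, ?)');

$handle = fopen('routeviews-rv2-20180715-1400.pfx2as', 'r');
$pdo->beginTransaction();                         // one transaction keeps the inserts fast
while (($line = fgets($handle)) !== false) {
    $parts = explode("\t", rtrim($line, "\r\n")); // assumed: prefix, length, ASN per line
    if (count($parts) === 3) {
        $stmt->execute($parts);
    }
}
$pdo->commit();
fclose($handle);
var_dump(memory_get_usage());                     // stays low: only one line is in memory at a time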
I am using file_get_contents to get 1 million records from a URL and output the results, which are in JSON format. I can't use pagination, and currently I get by only by increasing my memory limit. Is there any other solution for this?
If you're processing large amounts of data, fscanf will probably prove valuable and more efficient than, say, using file followed by a split and sprintf command. In contrast, if you're simply echoing a large amount of text with little modification, file, file_get_contents, or readfile might make more sense. This would likely be the case if you're using PHP for caching or even to create a makeshift proxy server.
More: The right way to read files with PHP
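If the JSON can be passed through to the client as-is, one option is to stream it instead of buffering it, something along these lines (the URL is a placeholder, this assumes allow_url_fopen is enabled, and if you need to json_decode() the whole payload at once it still has to fit in memory):
$src = fopen('https://api.example.com/records.json', 'r');  // placeholder URL
header('Content-Type: application/json');
while (!feof($src)) {
    echo fread($src, 8192);   // pass the response through 8 KB at a time
    flush();                  // (plus ob_flush() if output buffering is active)
}
fclose($src);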
I am building a website where the basic premise is there are two files. index.php and file.txt.
file.txt currently has 10 MB of data; this can potentially grow to 500 MB. The idea of the site is that people go to index.php and can seek to any position in the file. Another feature is that they can read up to 10 KB of data from the seek position. So:
index.php?pos=432 will get the byte at position 432 in the file.
index.php?pos=555&len=5000 will get 5 KB of data from the file starting at position 555.
Now, imagine the site getting thousands of hits a day.
I currently use fseek and fread to serve the data. Is there any faster way of doing this? Or is my usage too low to consider advanced optimizations such as caching the results of each request or loading the file into memory and reading it from there?
Thousands of hits per day, that's like one every few seconds? That's definitely too low to need optimizing at this point, so just use fseek and fread if that's what's easiest for you.
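For reference, a minimal sketch of that index.php, using the file name and the 10 KB cap from the question (the parameter validation is my own addition):
$pos = isset($_GET['pos']) ? (int) $_GET['pos'] : 0;
$len = isset($_GET['len']) ? (int) $_GET['len'] : 1;
$len = max(1, min($len, 10240));        // cap reads at 10 KB as described

$size = filesize('file.txt');
if ($pos < 0 || $pos >= $size) {
    http_response_code(416);            // requested position is outside the file
    exit;
}

$fp = fopen('file.txt', 'rb');
fseek($fp, $pos);                        // jump straight to the byte offset
header('Content-Type: text/plain');
echo fread($fp, $len);
fclose($fp);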
If it is crucial for you to keep all the data in files, I would suggest splitting your file into a set of smaller files.
For example, you could decide that a file should be no larger than 1 MB. That means splitting your file.txt into 10 separate files: file-1.txt, file-2.txt, file-3.txt and so on...
When you process a request, you determine which file to pick up by dividing the pos argument by the chunk size, then show the appropriate amount of data (see the sketch below). In this case fseek will perhaps work faster...
But either way you have to stick with the fopen and fseek functions.
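A rough sketch of that lookup, assuming 1 MB chunks named file-1.txt, file-2.txt, and so on (a read that crosses a chunk boundary would need a second read from the next file, which is glossed over here):
$chunkSize = 1048576;                                   // 1 MB per chunk file
$pos = (int) $_GET['pos'];
$len = isset($_GET['len']) ? min((int) $_GET['len'], 10240) : 1;

$fileIndex = intdiv($pos, $chunkSize) + 1;              // which chunk holds this offset (PHP 7+)
$offset    = $pos % $chunkSize;                         // offset within that chunk

$fp = fopen("file-{$fileIndex}.txt", 'rb');
fseek($fp, $offset);
echo fread($fp, $len);
fclose($fp);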
edit: now that I consider it, so long as you're using fseek() to go to a byte offset and then using fread() to get a certain number of bytes, it shouldn't be a problem. For some reason I read your question as serving X number of lines from a file, which would be truly terrible.
The problem is you are absolutely hammering the disk with I/O operations, and you're not just causing performance issues with this one file/script, you're causing performance issues with anything that needs that disk: other users, the OS, etc. If you're on shared hosting, I guarantee that one of the sysadmins is trying to figure out who you are so they can turn you off. [I would be]
You need to find a way to either:
- Offload this to memory: set up a daemon on the server that loads the file into memory and serves chunks on request.
- Offload this to something more efficient, like MySQL.
You're already serving the data in sequential chunks, e.g. lines 466 to 476, so it will be much faster to retrieve the data from a table like:
CREATE TABLE mydata (
    line INTEGER NOT NULL AUTO_INCREMENT,
    data VARCHAR(2048),
    PRIMARY KEY (line)
);
by:
SELECT data FROM mydata WHERE line BETWEEN 466 AND 476;
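From PHP, that lookup might look something like this (connection details are placeholders):
$pdo  = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');  // placeholder credentials
$stmt = $pdo->prepare('SELECT data FROM mydata WHERE line BETWEEN :start AND :end');
$stmt->execute([':start' => 466, ':end' => 476]);
foreach ($stmt as $row) {
    echo $row['data'], "\n";
}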
If the file never changes, and is truly limited in maximum size, I would simply mount a ramdisk, and have a boot script which copies the file from permanent storage to RAM storage.
This probably requires hosting the site on Linux if you aren't already.
This would allow you to guarantee that the file segments are served from memory, without relying on the OS filesystem cache.
We get a product list from our suppliers delivered to our site by FTP. I need to create a script that searches through that file (tab delimited) for the products relevant to us and uses the information to update stock levels, prices etc.
The file itself is something like 38,000 lines long and I'm wondering about the best way of handling this.
The only way I can think of initially is using fopen and fgetcsv, then cycling through each line, putting the line into an array and looking for the relevant product code.
I'm hoping there is a much more efficient way (though I haven't tested the efficiency of this yet).
The file I'll be reading is 8.8 MB.
All of this will need to be done automatically, e.g. by CRON on a daily basis.
Edit - more information.
I have run my first trial, and based on the 2 answers, I have the following code:
I have the items I need to pick out of the text file loaded from the database into an array, with $items[$row['item_id']] = $row['prod_code'];
$catalogue = file('catalogue.txt');
foreach ($catalogue as $line)
{
    $prod = explode(" ", $line);
    if (in_array($prod[0], $items))
    {
        echo $prod[0] . "<br>"; // will be updating the stock level in the db eventually
    }
}
Though this is not giving the correct output currently
I used to do a similar thing with Domino's Pizza daily clocking-in data (all UK).
Either load it all into a database in one go.
OR
Use fopen and load a line at a time into a database, keeping memory overheads low. (I had to use this method as the data wasn't formatted very well)
You can then query the database at your leisure.
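A sketch of that line-at-a-time load, with made-up table and column names and the tab delimiter from the question:
$pdo  = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');   // placeholder connection
$stmt = $pdo->prepare('INSERT INTO catalogue (prod_code, price, stock) VALUES (?, ?, ?)');

$fp = fopen('catalogue.txt', 'r');
while (($row = fgetcsv($fp, 0, "\t")) !== false) {
    $stmt->execute([$row[0], $row[1], $row[2]]);   // adjust the indexes to the real column order
}
fclose($fp);
Only one line is held in memory at a time, and once the rows are in the database you can match them against your product codes with a simple JOIN or WHERE IN query.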
What do you mean by »I hope there is a more efficient way«? Efficient with respect to what? Writing the code? CPU consumption while executing the code? Disk I/O? Memory consumption?
Holding ~9 MB of text in memory is not a problem (unless you've got a very low memory limit). A file() call would read the entire file and return an array (split by lines). This or file_get_contents() will be the most efficient approach with respect to disk I/O, but will consume a lot more memory than necessary.
Putting the line into an array and looking for the relevant product code.
I'm not sure why you would need to cache the contents of that file in an array. But if you do, remember that the array will use slightly more memory than the ~9MB of text. So you'd probably want to read the file sequentially, to avoid having the same data in memory twice.
Depending on what you want to do with the data, loading it into a database might be a viable solution as well, as #user1487944 already pointed out.
We've run into a bit of a weird issue. It goes like this. We have large reams of data that we need to output to the client. These data files cannot be pre-built, they must be served from live data.
My preferred solution has been to write into the CSV line by line from a fetch like this:
while ($datum = $data->fetch(PDO::FETCH_ASSOC)) {
    $size += fputcsv($outstream, $datum, chr(9), chr(0));
}
This got around a lot of ridiculous memory usage (reading 100,000 records into memory at once is bad mojo), but we still have lingering issues for large tables that are only going to get worse as the data increases in size. And please note, there is no data partitioning; clients don't download in year segments, they download all of their data and then segment it themselves. This is per the requirements; I'm not in a position to change this, much as it would remove the problem entirely.
In either case, on the largest table it runs out of memory. One solution is to increase the memory available, which solves one problem but risks creating server load problems later on, or even now if more than one client is downloading.
In this case, $outstream is:
$outstream = fopen("php://output",'w');
That is pretty obviously not a physical disk location. I don't know much about php://output in terms of where the data resides before it is sent to the client, but it seems obvious that there are memory issues with simply writing a large database table to CSV via this method.
To be exact, the staging box allows about 128 MB for PHP, and this call in particular was short about 40 MB (it tried to allocate 40 MB more). This seems like odd behavior, as you would expect it to ask for memory in smaller chunks.
Anyone know what can be done to get a handle on this?
So it looks like the memory consumption is caused by Zend Framework's output buffering. The best solution that I came up with was this:
calling ob_end_clean() right before we start to stream the file to the client. This particular instance of ZF is not going to produce any normal output or do anything more after this point, so no complications arise. The one odd thing that does happen (perhaps from the standpoint of the user) is that they really do get the file streamed to them.
Here's the code:
ob_end_clean();
while ($datum = $data->fetch(PDO::FETCH_ASSOC)) {
    $size += fputcsv($outstream, $datum, chr(9), chr(0));
}
Memory usage (according to the function memory_get_peak_usage(true) suggested in a ZF forum post somewhere) went from 90 megabytes down to 9 megabytes, which is what it was using here on my development box prior to any file reading.
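For completeness, a sketch of how the surrounding pieces might fit together; the headers, the filename, and the placement of fopen() are my assumptions rather than part of the original code, and $data is assumed to be the already-executed PDO statement from the snippet above:
header('Content-Type: text/csv');                                  // assumed download headers
header('Content-Disposition: attachment; filename="export.csv"');

$outstream = fopen('php://output', 'w');

ob_end_clean();                                                    // drop ZF's buffer before streaming
$size = 0;
while ($datum = $data->fetch(PDO::FETCH_ASSOC)) {
    $size += fputcsv($outstream, $datum, chr(9), chr(0));
}
fclose($outstream);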
Thanks for the help, guys!