Laravel chunking not reducing PHP memory usage

So I've been trying my hand at Laravel's chunking in Eloquent, but I've run into a problem. Consider the following code (a much more simplified version of my problem):
$data = DB::connection('mydb')->table('bigdata')
    ->chunk(200, function($data) {
        echo memory_get_usage();
        foreach ($data as $d) {
            Model::create(array(
                'foo' => $d->bar,
                // ...etc
            ));
        }
    });
So when I run this code, my memory output looks like this:
19039816
21490096
23898816
26267640
28670432
31038840
So, without jumping into php.ini and changing the memory_limit value, any clue why it isn't working? According to the documentation: "If you need to process a lot (thousands) of Eloquent records, using the chunk command will allow you to do without eating all of your RAM".
I tried unset($data) after the foreach loop, but it did not help. Any clue as to how I can make use of chunk, or did I misinterpret what it does?

Chunking the data doesn't reduce memory usage here; you need to paginate directly at the database level.
For example, first fetch 200 rows ordered by id or something, and after processing those 200, fire the query again with a where clause asking for the next 200 results.
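For illustration, a rough sketch of that approach against the question's table, keyed on an auto-incrementing id column (assumed to exist):

$lastId = 0;

do {
    $rows = DB::connection('mydb')->table('bigdata')
        ->where('id', '>', $lastId)
        ->orderBy('id')
        ->limit(200)
        ->get();

    foreach ($rows as $d) {
        Model::create(array('foo' => $d->bar /* , ...etc */));
        $lastId = $d->id; // remember where we stopped
    }
} while (count($rows) === 200); // a short final page means we're done

Later Laravel versions also ship a chunkById() helper that wraps essentially this pattern.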

You can use lazy collections to improve memory usage for a big collection of data. They use PHP generators under the hood. Take a look at the cursor example here: https://laravel.com/docs/5.4/eloquent#chunking-results
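A minimal sketch of what that looks like, assuming a Laravel version where the query builder exposes cursor():

// Each row is hydrated one at a time from a generator instead of
// loading the whole result set into memory.
foreach (DB::connection('mydb')->table('bigdata')->cursor() as $d) {
    Model::create(array('foo' => $d->bar /* , ...etc */));
}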

Related

Fastest way to get objects from s3 to ec2 PHP

On S3 I've got files around 100M (2.5M each) in this hierarchy:
id_folder / date_folder / hour_file.raw
I've tried 3 different ways to fetch them ASAP:
I started with the Laravel Storage facade (I'm using Laravel):
Storage::disk('s3')->get($filePath); -> this one is the slowest
Then I googled a little and found this class:
Amazon S3 PHP Class
http://undesigned.org.za/2007/10/22/amazon-s3-php-class/
I also tried following Amazon's instructions for creating an S3Client and using the getObject function, and it was still slow...
So, I need to get a lot of files from S3 to EC2 - what is the fastest way to do it?
Thanks!
If I'm understanding everything you're saying, there's not going to be a way around the fact that downloading that many objects is slow. 100,000,000 * 2.5MB = 250TB. That's a lot of data. There are things you can do to make it more efficient, though.
If you try to get many (i.e. thousands of) objects "at once" by synchronously downloading them with S3Client::getObject, it will take forever. You get a little faster by using S3Client::getObjectAsync, which returns a Guzzle promise (GuzzleHttp\Promise\Promise). But this isn't really asynchronous on its own: the requests to S3 do not execute concurrently by themselves, and at some point you have to call wait(), which blocks until the transfer completes. So simply iterating through a loop and calling Promise::wait on each promise one at a time will still take forever.
However, if you break up your requests and execute them in batches of promises simultaneously, you can shave significant time from your requests. Guzzle provides a few options for waiting on an array of promises, but I prefer the GuzzleHttp\Promise\unwrap function. It returns an array of the results of the array of promises given to it.
Below is a generator I've written that does just that:
// Assumes at the top of the file:
//   use Aws\Result;
//   use Aws\S3\Exception\S3Exception;
//   use function GuzzleHttp\Promise\unwrap;
public function getObjectsBatch($bucket, $keys, $chunkSize = 350)
{
    foreach (array_chunk($keys, $chunkSize) as $chunk) {
        $promises = [];
        foreach ($chunk as $key) {
            $promises[] = $this->getClient()->getObjectAsync([
                'Bucket' => $bucket,
                'Key'    => $key,
            ])->then($success = function (Result $res) use ($key) {
                // Remember which key this result belongs to
                $res->offsetSet('Key', $key);
                return $res;
            }, $fail = function (S3Exception $res) {
                return $res;
            });
        }
        // Wait for the whole batch to settle before yielding it
        yield unwrap($promises);
    }
}
I'm using this to download thousands of objects, and stream them to the user as they are downloaded.
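For context, here is a rough sketch of how the generator above might be consumed; $downloader (an instance of the class holding getObjectsBatch()) and saveToDisk() are hypothetical placeholders:

foreach ($downloader->getObjectsBatch($bucket, $keys) as $batch) {
    foreach ($batch as $result) {
        if ($result instanceof S3Exception) {
            // A failed request comes back as the exception itself; skip or retry it.
            continue;
        }
        // 'Key' was attached in the then() callback; 'Body' is a stream.
        saveToDisk($result['Key'], (string) $result['Body']);
    }
}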
The size of your batch is important. In the example I'm executing 350 requests at a time; in my testing (downloading 4,500 objects from S3 with various batch sizes, 10 runs per size), 350 came out as the most efficient.
But your specific use case--downloading 250TB of data at one time--will take a long time no matter how you do it. And you'll quickly run out of memory if you don't save the files to disk, at which point you also have to worry about disk space. I'm not sure why you need to download that many files, but it doesn't seem like a good idea.

Optimization db queries - scaling big data - Laravel 5

I've been working on a web app that uses Laravel 5. It's running on localhost (XAMPP) on a Windows 8.1 PC: 4GB RAM, 2.67GHz processor, pretty simple.
The table I'm querying most of the time contains a lot of rows (10,000, give or take) - so many that writing a route that does:
return User::all();
Running this just returns a white screen. Sometimes the Chrome console lists a 500 (Internal Server Error).
Echoes or prints made before the query are shown but nothing after that is executed. Querying another model (whose table only has 2 rows) returns the data correctly.
Which leads me to conclude that my server isn't scaling well for this amount of data. I'm trying to fix this by doing:
User::all()->chunk(200, function($chunkOfTickets){ /*some code*/});
which I expected would split the data into chunks to make it easier on the server. This doesn't work, however, because Eloquent is first fetching all the data (and breaking because it can't handle it) and only then dividing it into chunks.
Thanks for reading.
EDIT: I just tested over and over, requesting increasingly greater amounts of data. The limit is approximately 26,000 rows (at 27,000 an out-of-memory error is returned).
As stated in the comments, the PHP log shows this. Apparently I was requesting so much memory that it crashed before Laravel could show the error message:
[01-Jul-2015 17:27:51 UTC] PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 8376445 bytes) in C:\xampp\htdocs\gamescaffold\vendor\laravel\framework\src\Illuminate\Support\Collection.php on line 791
Extra Edit
Is there a way I can divide in chunks the reply from the DB? like:
User::chunk(200)->all(); /*which obviously does not work*/
If I run a seemingly complex query against the database directly through phpMyAdmin, it returns 37,035 rows in 0.0045 seconds. (I suspect there are some under-the-hood optimizations by XAMPP or something, though.)
Note: I'm posting this as an answer because it involves some code. I guess I totally missed your point in the original question, because I thought you were trying to return the whole result set to the client. If I'm still missing something, please leave a comment and I'll delete this answer.
So I want to take a set of objects, do stuff with them and save them back to the DB
That's an easy one!
$chunkSize = 100; // or whatever your memory allows
$totalUsers = User::count();
$chunks = floor($totalUsers / $chunkSize);

for ($chunk = 0; $chunk <= $chunks; $chunk++) {
    $offset = $chunk * $chunkSize;
    $users = User::skip($offset)->take($chunkSize)->get();

    foreach ($users as $user) {
        // do something
        $user->save();
    }
}
If it takes too long, you'll probably hit a timeout when triggering this loop over HTTP, so you should run it from the console.
DB::table("table")
->select('column1', 'column2')
->orderBy('column2', 'asc')
->chunk(70000, function($users) {
foreach ($users as $row) {
// To do with data
}
}

execute three function simultaneously

I have a PHP script with three functions, like this:
public function a($html, $text)
{
    // blaa
    return array();
}

public function b($html, $text)
{
    // blaa
    return array();
}

public function c($html, $text)
{
    // blaa
    return array();
}

require_once 'simple_html_dom.php';

$a = array();
$html = new simple_html_dom();

$a = $this->a($html, $text);
$b = $this->b($html, $text);
$c = $this->c($html, $text);

$html->clear();
unset($html);

$a = array_merge($a, $c);
$a = array_merge($a, $b);
a($html,$text) takes 5 seconds before giving a result
b($html,$text) takes 10 seconds before giving a result
c($html,$text) takes 12 seconds before giving a result
Thus the system takes 27 seconds before giving me a result, but I want to get my result in 12 seconds. I can't use threads because my hosting does not support them. How can I solve this problem?
PHP does not support this out of the box. If you really want to do this, you have two basic options (yep, it's going to be dirty). If you want a serious solution depending on your actual use-case, there is another option to consider.
Option 1: Use some AJAX-trickery
Create a page with a button that triggers three AJAX-calls to the different functions that you want to call.
Option 2: Run a command
If you're on UNIX, you can trigger a command from the PHP script to run another PHP script (php xyz.php), which actually runs it in a separate process.
Serious option: use queues
Seriously: use a queue system like RabbitMQ or Beanstalkd to do these kinds of things. Laravel supports them out of the box, as sketched below.
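A rough sketch of that idea in Laravel, assuming a version with the global dispatch() helper; the job class and its wiring are hypothetical, and the raw HTML string (rather than the parsed simple_html_dom object) is passed so the job can be serialized:

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;

// Would normally live in its own file, e.g. app/Jobs/ParseHtmlJob.php
class ParseHtmlJob implements ShouldQueue
{
    use Queueable;

    protected $method;
    protected $rawHtml;
    protected $text;

    public function __construct($method, $rawHtml, $text)
    {
        $this->method  = $method;   // 'a', 'b' or 'c'
        $this->rawHtml = $rawHtml;
        $this->text    = $text;
    }

    public function handle()
    {
        // Parse $this->rawHtml here and run the slow function named in $this->method.
    }
}

// Push one job per slow function; queue workers run them in parallel.
dispatch(new ParseHtmlJob('a', $rawHtml, $text));
dispatch(new ParseHtmlJob('b', $rawHtml, $text));
dispatch(new ParseHtmlJob('c', $rawHtml, $text));

Since the original code merges the three result arrays, the jobs would also need to store their results somewhere (database, cache) for a later step to combine.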
If the wait time is caused by blocking IO (waiting for server response) then curl_multi might help.
From the code you posted, though, it doesn't look like that is your problem.
It looks more like simple_html_dom is taking a long time to parse your HTML. That's not too surprising, because it's not a very good library. If this is the case, you should consider switching to DOMXPath.
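If you do switch, here is a rough sketch using PHP's built-in DOM extension; the $rawHtml variable and the XPath expression are only examples:

$doc = new DOMDocument();
libxml_use_internal_errors(true); // tolerate real-world, non-well-formed HTML
$doc->loadHTML($rawHtml);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
foreach ($xpath->query('//a[@class="item"]') as $node) {
    echo $node->getAttribute('href'), "\n";
}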
You might want to look into jQuery deferred objects... $.when should handle this kind of situation.

Memory usage increasing inside loop: are Magento functions the cause?

My platform is PHP 5.2, Apache, Magento EE 1.9 and CentOS.
I have a pretty basic script which is fetching about 60,000 rows of data from an MS-SQL database using PHP's mssql_* functions. The data is then processed a bit using data from Magento and finally written to a text file.
Really simple stuff...
$result = mssql_query($query);

while ($row = mssql_fetch_assoc($result)) {
    $member = $row; // Copied so I can modify it

    // Do some stuff with each row... e.g.:
    $customer = Mage::getModel("customer/customer");
    $customer->loadByEmail($member["email"]);
    $customerId = $customer->getId();

    // Some more stuff like that...
    $ordersCollection = Mage::getResourceModel('sales/order_collection');
    // ...........

    // Some more stuff like that...
    $wishList = Mage::getModel('wishlist/wishlist')->loadByCustomer($customer);
    // ...........

    // Write straight to a file
    fwrite($fp, implode("\t", $member) . "\r\n");

    // Probably not even necessary
    unset($member);
}
The problem is, the memory usage of my script increases with each iteration of the loop (about 10MB for every 300 rows), with a theoretical peak of about 2GB (though it hasn't got there yet).
I've taken great pains to ensure that I'm not leaving any data in memory. No huge arrays are building up, no variables are being added to, everything is either unset() or directly overwritten with each iteration of the loop.
So my question is: could the Magento functions be causing memory leaks?
And if so, how do I stop them from doing so?
Ideally this script should be totally "passive": just grab the query results, modify them a bit (very temporary memory needed for this) then dump them straight to a file and destroy the memory. But this is not happening!
Thanks
Exclude all Mage:: calls from your code and just dump the data to the file without processing, and see what happens to the memory while doing this. Then start adding the Mage:: functions back one by one and see when it breaks.
This way you'll find the culprit. Then you need to start digging into its implementation and see what could go wrong. You could also consider doing the processing without relying on the Mage:: calls at all: just write plain code to deal with the data in self-contained functions/classes and compare how things turn out if you exclude Mage:: entirely from the process.
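One rough way to do that narrowing with the loop from the question (same variable names) is to log the memory delta around each Mage:: call:

$before = memory_get_usage();

$customer = Mage::getModel("customer/customer");
$customer->loadByEmail($member["email"]);

// Log how much this call added; repeat the same pattern around the other Mage:: calls.
error_log('loadByEmail delta: ' . (memory_get_usage() - $before) . ' bytes');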
Yes: PHP has a long history of non-ideal behavior when it comes to memory management and code that pushes the edges of its object-oriented model.
You can try an alternate method of querying for your data that wastes less memory, or you can read up on how the Magento core team deals with this same issue.

Diagnosing Memory Leaks - Allowed memory size of # bytes exhausted

I've encountered the dreaded error message: possibly through painstaking effort, PHP has run out of memory:
Allowed memory size of #### bytes exhausted (tried to allocate #### bytes) in file.php on line 123
Increasing the limit
If you know what you're doing and want to increase the limit see memory_limit:
ini_set('memory_limit', '16M');
ini_set('memory_limit', -1); // no limit
Beware! You may only be solving the symptom and not the problem!
Diagnosing the leak:
The error message points to a line within a loop that I believe to be leaking, or needlessly accumulating, memory. I've printed memory_get_usage() statements at the end of each iteration and can see the number slowly grow until it reaches the limit:
foreach ($users as $user) {
    $task = new Task;
    $task->run($user);
    unset($task); // Free the variable in an attempt to recover memory
    print memory_get_usage(true); // increases over time
}
For the purposes of this question let's assume the worst spaghetti code imaginable is hiding in global-scope somewhere in $user or Task.
What tools, PHP tricks, or debugging voodoo can help me find and fix the problem?
PHP (before 5.3) doesn't have a cycle-collecting garbage collector; it uses reference counting to manage memory. Thus, the most common sources of memory leaks are cyclic references and global variables. If you use a framework, you'll have a lot of code to trawl through to find it, I'm afraid. The simplest instrument is to selectively place calls to memory_get_usage and narrow down where the code leaks. You can also use Xdebug to create a trace of the code: run it with execution traces and show_mem_delta enabled.
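As a rough sketch of that narrowing, using the loop from the question: log the per-iteration delta first, then move the probes inside the loop body until the growth is pinned down.

$last = memory_get_usage(true);

foreach ($users as $user) {
    $task = new Task;
    $task->run($user);
    unset($task);

    $now = memory_get_usage(true);
    error_log(sprintf('iteration delta: %d bytes', $now - $last));
    $last = $now;
}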
Here's a trick we've used to identify which scripts are using the most memory on our server.
Save the following snippet in a file at, e.g., /usr/local/lib/php/strangecode_log_memory_usage.inc.php:
<?php
function strangecode_log_memory_usage()
{
    $site = '' == getenv('SERVER_NAME') ? getenv('SCRIPT_FILENAME') : getenv('SERVER_NAME');
    $url = $_SERVER['PHP_SELF'];
    $current = memory_get_usage();
    $peak = memory_get_peak_usage();
    error_log("$site current: $current peak: $peak $url\n", 3, '/var/log/httpd/php_memory_log');
}
register_shutdown_function('strangecode_log_memory_usage');
Employ it by adding the following to httpd.conf:
php_admin_value auto_prepend_file /usr/local/lib/php/strangecode_log_memory_usage.inc.php
Then analyze the log file at /var/log/httpd/php_memory_log
You might need to touch /var/log/httpd/php_memory_log && chmod 666 /var/log/httpd/php_memory_log before your web user can write to the log file.
I noticed one time in an old script that PHP would keep the "as" variable in scope even after my foreach loop. For example,
foreach ($users as $user) {
    $user->doSomething();
}
var_dump($user); // would output the data from the last $user
I'm not sure whether later PHP versions have fixed this since I saw it. If this is the case, you could unset($user) after the doSomething() line to clear it from memory. YMMV.
There are several possible points of memory leaking in PHP:
PHP itself
a PHP extension
a PHP library you use
your own PHP code
It is quite hard to find and fix the first three without deep reverse engineering or knowledge of the PHP source code. For the last one, you can use a binary search for the leaking code with memory_get_usage.
I recently ran into this problem on an application, under what I gather to be similar circumstances: a script that runs in PHP's CLI and loops over many iterations. My script depends on several underlying libraries. I suspect a particular library is the cause, and I spent several hours in vain trying to add appropriate destruct methods to its classes, to no avail. Faced with a lengthy conversion process to a different library (which could turn out to have the same problems), I came up with a crude workaround for the problem in my case.
In my situation, on a Linux CLI, I was looping over a bunch of user records and for each one of them creating new instances of several classes I had written. I decided to try creating the new instances of the classes using PHP's exec function so that those processes would run in a "new thread". Here is a really basic sample of what I am referring to:
foreach ($ids as $id) {
    $lines = array();
    exec("php ./path/to/my/classes.php $id", $lines);
    foreach ($lines as $line) {
        echo $line . "\n"; // display some output
    }
}
Obviously this approach has limitations, and one needs to be aware of its dangers, as it would be easy to end up with runaway jobs. However, in some rare cases it might help get over a tough spot until a better fix can be found, as it did in my case.
I came across the same problem, and my solution was to replace foreach with a regular for. I'm not sure about the specifics, but it seems like foreach creates a copy (or somehow a new reference) to the object. Using a regular for loop, you access the item directly.
I would suggest you check the PHP manual or add the gc_enable() function to collect the garbage... that is, so the memory leaks don't affect how your code runs.
PS: PHP has a garbage collector; gc_enable() takes no arguments.
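A minimal sketch of that suggestion, reusing the loop from the question (both functions exist since PHP 5.3; gc_collect_cycles() forces a collection pass for objects stuck in reference cycles):

gc_enable(); // turn on the cyclic-reference collector

foreach ($users as $user) {
    $task = new Task;
    $task->run($user);
    unset($task);
    gc_collect_cycles(); // reclaim cycles left behind by this iteration
}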
I recently noticed that PHP 5.3 lambda functions leave extra memory in use even when they are removed.
for ($i = 0; $i < 1000; $i++) {
    //$log = new Log;
    $log = function() { return new Log; };
    //unset($log);
}
I'm not sure why, but it seems to take an extra 250 bytes per lambda, even after the function is removed.
I didn't see it explicitly mentioned, but xdebug does a great job profiling time and memory (as of 2.6). You can take the information it generates and pass it off to a GUI front end of your choice: webgrind (time only), kcachegrind, qcachegrind, or others, and it generates very useful call trees and graphs to let you find the sources of your various woes.
Example (of qcachegrind): [screenshot omitted]
If what you say about PHP only doing GC after a function is true, you could wrap the loop's contents inside a function as a workaround/experiment.
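A rough sketch of that experiment, reusing the loop from the question:

function processUser($user)
{
    $task = new Task;
    $task->run($user);
    // $task falls out of scope when this function returns
}

foreach ($users as $user) {
    processUser($user);
    print memory_get_usage(true);
}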
One huge problem I had was with create_function. Like lambda functions, it leaves the generated temporary function name in memory.
Another cause of memory leaks (in case of Zend Framework) is the Zend_Db_Profiler.
Make sure that is disabled if you run scripts under Zend Framework.
For example, I had the following in my application.ini:
resources.db.profiler.enabled = true
resources.db.profiler.class = Zend_Db_Profiler_Firebug
Running approximately 25,000 queries plus loads of processing before that brought the memory to a nice 128MB (my max memory limit).
By just setting:
resources.db.profiler.enabled = false
it was enough to keep it under 20MB.
And this script was running in the CLI, but it was instantiating the Zend_Application and running the Bootstrap, so it used the "development" config.
Running the script with Xdebug profiling also really helped.
I'm a little late to this conversation but I'll share something pertinent to Zend Framework.
I had a memory leak problem after installing php 5.3.8 (using phpfarm) to work with a ZF app that was developed with php 5.2.9. I discovered that the memory leak was being triggered in Apache's httpd.conf file, in my virtual host definition, where it says SetEnv APPLICATION_ENV "development". After commenting this line out, the memory leaks stopped. I'm trying to come up with an inline workaround in my php script (mainly by defining it manually in the main index.php file).
I didn't see it mentioned here but one thing that might be helpful is using xdebug and xdebug_debug_zval('variableName') to see the refcount.
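For example, a tiny sketch (requires the xdebug extension to be loaded; the variables are just placeholders):

$value = new stdClass();
$copy = $value;             // bumps the refcount
xdebug_debug_zval('value'); // prints refcount and is_ref info for $value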
I can also provide an example of a PHP extension getting in the way: Zend Server's Z-Ray. If data collection is enabled, memory use will balloon on each iteration, just as if garbage collection were off.
