Optimizing DB queries - scaling big data - Laravel 5 - PHP

I've been working on a web app that uses Laravel 5. It's running on localhost (XAMPP) on a Windows 8.1 PC with 4GB RAM and a 2.67GHz processor - nothing fancy.
The table I query most of the time contains a lot of rows (10,000 give or take) - so many that a route that simply does:
return User::all();
Running this just returns a white screen. Sometimes the Chrome console lists a 500 (Internal Server Error).
Echoes or prints made before the query are shown, but nothing after it is executed. Querying another model (whose table only has 2 rows) returns the data correctly.
This leads me to conclude that my server isn't scaling well for this amount of data. I'm trying to fix it by doing:
User::all()->chunk(200, function($chunkOfTickets){ /*some code*/});
which I expected would split the data into chunks to make it easier on the server. This doesn't work, however, because Eloquent first fetches all the data (and breaks because it can't handle it) and only then divides it into chunks.
Thanks for reading.
EDIT: I just tested over and over, requesting increasingly greater amounts of data. The limit is approximately 26,000 rows (at 27,000 an out-of-memory error is returned).
As stated in the comments, the PHP log shows the error below. Apparently I was requesting so much memory that it crashed before Laravel could show the error message:
[01-Jul-2015 17:27:51 UTC] PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 8376445 bytes) in C:\xampp\htdocs\gamescaffold\vendor\laravel\framework\src\Illuminate\Support\Collection.php on line 791
Extra Edit
Is there a way I can split the reply from the DB into chunks? Something like:
User::chunk(200)->all(); /*which obviously does not work*/
If I write a seemingly complex query directly against the database through phpMyAdmin, it returns 37,035 rows in 0.0045 seconds. (I suspect there's some under-the-hood optimization or caching by XAMPP going on there, though.)

Note: I'm posting this as an answer because it involves some code. I guess I totally missed your point in the original question, because I thought you were trying to return the whole result set to the client. If I'm still missing something, please leave a comment and I'll delete this answer.
So I want to take a set of objects, do stuff with them and save them back to the DB
That's an easy one!
$chunkSize = 100; // or whatever your memory allows
$totalUsers = User::count();
$chunks = floor($totalUsers / $chunkSize);

for ($chunk = 0; $chunk <= $chunks; $chunk++) {
    $offset = $chunk * $chunkSize;
    $users = User::skip($offset)->take($chunkSize)->get();

    foreach ($users as $user) {
        // do something
        $user->save();
    }
}
If it takes too long, you'll probably get a timeout when you trigger this loop over HTTP, so you should probably run it from the console (a sketch of that follows).
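For completeness, here is one way that loop could be wrapped in an Artisan console command. This is only a minimal sketch, assuming Laravel 5.1 or later (where $signature is supported) and the default App\User model; the ProcessUsers class name and users:process signature are made up for illustration:

<?php

namespace App\Console\Commands;

use App\User;
use Illuminate\Console\Command;

class ProcessUsers extends Command
{
    // Hypothetical command name; remember to register this class in app/Console/Kernel.php ($commands array)
    protected $signature = 'users:process';
    protected $description = 'Process all users in small chunks without hitting the HTTP timeout';

    public function handle()
    {
        $chunkSize = 100;
        $totalUsers = User::count();
        $chunks = floor($totalUsers / $chunkSize);

        for ($chunk = 0; $chunk <= $chunks; $chunk++) {
            // Fetch only one page of users at a time
            $users = User::skip($chunk * $chunkSize)->take($chunkSize)->get();

            foreach ($users as $user) {
                // do something
                $user->save();
            }

            $this->info("Processed chunk {$chunk} of {$chunks}");
        }
    }
}

Running it with php artisan users:process avoids the web server's request timeout, and each chunk becomes eligible for garbage collection as soon as the next one is fetched.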

DB::table("table")
    ->select('column1', 'column2')
    ->orderBy('column2', 'asc')
    ->chunk(70000, function($users) {
        foreach ($users as $row) {
            // do something with each row
        }
    });
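The same chunk() method is also available directly on Eloquent models, which is what the User::chunk(200)->all() attempt in the question was reaching for; a minimal sketch:

User::chunk(200, function ($users) {
    foreach ($users as $user) {
        // only 200 models are hydrated in memory at any one time
    }
});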

Related

Simple HTML DOM with pagination

I want to parse values from a website category with paginated posts. The information I need is inside the posts. I tried to use Simple HTML DOM to do that. I got it working, but I don't think it is correct: the script works slowly, and with a large amount of data I get the error
Maximum execution timeout of 300 seconds
<?php
include('simple_html_dom.php');

$total_pages = 600;
$from = 1;              // current page number, appended to the category URL
$a_mail_array = array();

while ($from <= $total_pages):
    // $url holds the category base URL
    $html = file_get_html($url.'/'.$from);

    foreach ($html->find('.itemReview h3 a') as $a) {
        // fetch each post linked from the listing page
        $post = file_get_html('http://www.website.com/'.$a->href);
        $author_mail = $post->find('.sellerAreaSecond', 0);
        $a_mail_array[] = $author_mail->plaintext;
    }

    $from++;
endwhile;

// write the collected addresses once, after the loop
$fp = fopen('file.csv', 'w');
foreach ($a_mail_array as $ddd) {
    fputcsv($fp, array($ddd));
}
fclose($fp);
?>
As you are requesting your pages and the posts inside them over the network, of course this is slow, and you run into the script timeout with large amounts of data. Try increasing the max execution time in your php.ini file.
One solution would be to increase the time limit in your server settings (php.ini).
A better one would be not to have your server download 100 pages from itself and parse them. Parsing HTML takes tons of time: it has to go through all the markup to find your .read_more a and .authoremail. I suspect you're working with plain files for data storage; if that's the case, you should switch to a database like MySQL or even SQLite, so you can just query the database, which takes considerably less time. This not only keeps your website from crashing when there's more content, but also speeds it up.
With SQL, you could just store the author's email in a table, then use SELECT authoremail FROM posts and loop over the result with foreach(). This also lets you do things like sorting by date, name, etc. on the fly. Just letting your website run slow and inefficiently by increasing the time limit is probably not a good idea.
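As a rough illustration of that database approach, here is a minimal sketch using PDO. The posts table and authoremail column are taken from the answer above; the DSN, credentials, and database name are placeholders:

<?php
// Hypothetical connection details - adjust to your own setup
$pdo = new PDO('mysql:host=localhost;dbname=scraper;charset=utf8', 'user', 'password');

// Inside the scraping loop: store each address as soon as it is parsed
$insert = $pdo->prepare('INSERT INTO posts (authoremail) VALUES (?)');
$insert->execute(array($author_mail));

// Later, reading the addresses back is a single fast query instead of re-parsing HTML
$fp = fopen('file.csv', 'w');
foreach ($pdo->query('SELECT authoremail FROM posts') as $row) {
    fputcsv($fp, array($row['authoremail']));
}
fclose($fp);
?>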

Laravel chunking not reducing PHP memory usage

So I've been trying my hand at Laravel's chunking in Eloquent, but I've run into a problem. Consider the following code (a much simplified version of my problem):
$data = DB::connection('mydb')->table('bigdata')
    ->chunk(200, function($data) {
        echo memory_get_usage();
        foreach ($data as $d) {
            Model::create(array(
                'foo' => $d->bar,
                ...
                //etc
            ));
        }
    });
So when I run this code, my memory_get_usage() output looks like this:
19039816
21490096
23898816
26267640
28670432
31038840
So, without jumping into php.ini and changing the memory_limit value, any clue why it isn't working? According to the documentation: "If you need to process a lot (thousands) of Eloquent records, using the chunk command will allow you to do without eating all of your RAM".
I tried unset($data) after the foreach function, but it did not help. Any clue as to how I can make use of chunk, or did I misinterpret what it does?
Chunking the data doesn't reduce memory usage here; you need to paginate directly at the database level, the same way pagination works.
For example, first fetch the first 200 rows ordered by id (or something similar), and after processing those 200, fire the query again with a where clause asking for the next 200 results, as sketched below.
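A minimal sketch of that keyset-style pagination, sticking to the query builder call from the question and assuming the bigdata table has an auto-incrementing id column; the $lastId and $batchSize bookkeeping is added for illustration:

$lastId = 0;
$batchSize = 200;

do {
    // Ask only for the next slice, resuming after the last processed id
    $rows = DB::connection('mydb')->table('bigdata')
        ->where('id', '>', $lastId)
        ->orderBy('id', 'asc')
        ->limit($batchSize)
        ->get();

    foreach ($rows as $d) {
        Model::create(array('foo' => $d->bar));
        $lastId = $d->id;
    }
} while (count($rows) === $batchSize);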
You can use lazy collections to improve memory usage for a big collection of data; they use PHP generators under the hood. Take a look at the cursor example here (a short sketch follows): https://laravel.com/docs/5.4/eloquent#chunking-results
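A minimal sketch of the cursor approach for the query in the question; cursor() is available on the query builder from Laravel 5.2 onward and yields one row at a time via a generator:

// Only a single row is hydrated in PHP at any one time
foreach (DB::connection('mydb')->table('bigdata')->cursor() as $d) {
    Model::create(array('foo' => $d->bar));
}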

Memory usage increasing inside loop: are Magento functions the cause?

My platform is PHP 5.2, Apache, Magento EE 1.9 and CentOS.
I have a pretty basic script which fetches about 60,000 rows of data from an MS SQL database using PHP's mssql_* functions. The data is then processed a bit using data from Magento and finally written to a text file.
Really simple stuff...
$result = mssql_query($query);

while ($row = mssql_fetch_assoc($result)) {
    $member = $row; // Copied so I can modify it

    // Do some stuff with each row... e.g.:
    $customer = Mage::getModel("customer/customer");
    $customer->loadByEmail($member["email"]);
    $customerId = $customer->getId();

    // Some more stuff like that...
    $ordersCollection = Mage::getResourceModel('sales/order_collection');
    // ...........

    // Some more stuff like that...
    $wishList = Mage::getModel('wishlist/wishlist')->loadByCustomer($customer);
    // ...........

    // Write straight to a file
    fwrite($fp, implode("\t", $member) . "\r\n");

    // Probably not even necessary
    unset($member);
}
The problem is, the memory usage of my script increases with each iteration of the loop (about 10MB for every 300 rows), with a theoretical peak of about 2GB (though it hasn't got there yet).
I've taken great pains to ensure that I'm not leaving any data in memory. No huge arrays are building up, no variables are being added to, everything is either unset() or directly overwritten with each iteration of the loop.
So my question is: could the Magento functions be causing memory leaks?
And if so, how do I stop them from doing so?
Ideally this script should be totally "passive": just grab the query results, modify them a bit (very temporary memory needed for this) then dump them straight to a file and destroy the memory. But this is not happening!
Thanks
Exclude all Mage:: calls from your code and just dump the data to the file without processing, and see what happens to the memory while doing that. Then start adding the Mage:: calls back one by one and see when it breaks.
This way you'll find the culprit. Then you need to start digging into its implementation and see what could go wrong. You could also consider doing the processing without relying on your Mage:: calls at all: just write plain code to deal with the data in self-contained functions/classes and compare how things turn out if you exclude Mage:: entirely from the process. A sketch of that narrowing-down process follows.
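As a rough illustration, the loop from the question could be instrumented with memory_get_usage() checkpoints and per-call toggles; the flag names and the 300-row logging interval are made up for illustration:

$checkCustomer = true;   // flip these one at a time to isolate the leaking call
$checkOrders   = false;
$checkWishlist = false;  // note: this one also needs $checkCustomer, since it uses $customer

$result = mssql_query($query);
$i = 0;

while ($row = mssql_fetch_assoc($result)) {
    $member = $row;

    if ($checkCustomer) {
        $customer = Mage::getModel("customer/customer");
        $customer->loadByEmail($member["email"]);
    }
    if ($checkOrders) {
        $ordersCollection = Mage::getResourceModel('sales/order_collection');
    }
    if ($checkWishlist) {
        $wishList = Mage::getModel('wishlist/wishlist')->loadByCustomer($customer);
    }

    fwrite($fp, implode("\t", $member) . "\r\n");

    // Log memory every 300 rows; the block whose flag makes this number climb is the culprit
    if (++$i % 300 === 0) {
        echo $i . ': ' . memory_get_usage(true) . "\n";
    }
}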
Yes. PHP has a long history of non-ideal behavior when it comes to memory management and code that pushes the edges of its object-oriented model.
You can try an alternate method of querying for your data that wastes less memory, or you can read up on how the Magento core team deals with this same issue.

Breaking Up Massive MySQL Update

Right now I have something like this in my CodeIgniter model:
<?php
$array = array(...over 29k IDs...);
$update = array();

foreach ($array as $line) {
    $update[] = array('id' => $line, 'spintax' => $this->SpinTax($string));
    ### $this->SpinTax parses the spintax from a string I have. It has to be generated for each row.
}

$this->db->update_batch('table', $update, 'id');
?>
The first 20k records get updated just fine, but I get a 504 Gateway Time-out before it completes.
I have tried increasing the nginx server timeout to something ridiculous (like 10 minutes), and I still get the error.
What can I do to make this not time out? I've read many answers and HOW-TOs about segmenting the update, but I continue to get the server timeout. A PHP or CodeIgniter solution would be excellent, and I need to deploy this code to multiple servers that might not be using nginx (I get a similar error on Apache).
Thanks in advance.
You'll likely need to run this through the command line and set_time_limit(0). If you're in CodeIgniter, the user guide covers running controllers from the command line: http://codeigniter.com/user_guide/general/cli.html
Now, before you do that: you mentioned you are using array_chunk. If you're getting all the values from the database, there's no need for array_chunk. Just set a GET variable, for instance
/your/url?offset=1000, and when that finishes, redirect to the same URL but with offset 2000, and so on until it finishes.
Not the nicest or cleanest approach, but it will likely get it done; a batched sketch follows.
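For the command-line route, a minimal sketch of how the update from the question could be batched so each update_batch call stays small. The batch size of 500 and the CLI controller/method in the comment are assumptions; $string and the ID list are left elided as in the question:

<?php
// Run from the CLI (e.g. php index.php tools rebuild_spintax) so there is no gateway timeout
set_time_limit(0);

$array = array(...over 29k IDs...);
$batchSize = 500; // small enough that each query finishes quickly

foreach (array_chunk($array, $batchSize) as $batch) {
    $update = array();

    foreach ($batch as $line) {
        $update[] = array('id' => $line, 'spintax' => $this->SpinTax($string));
    }

    // One moderately sized UPDATE per batch instead of a single 29k-row statement
    $this->db->update_batch('table', $update, 'id');
}
?>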

Grabbing data using the YouTube API over 1000 results and without using the Max Memory

I am trying to grab an entire channel's video feed (all of the video data) and store it in my MySQL database for use in an application I am currently working on. I am not the most experienced with the YouTube API. The code I am working with is the following:
public function printVideoFeed($count)
{
    $this->startIndex($count);
    $data = $this->yt->getVideoFeed($this->query);

    foreach ($data as $video) {
        echo $count . ' - ' . $video->getVideoTitle() . '<br/>';
        $count++;
    }

    // check if there are more videos
    $nextFeed = null;
    try {
        $nextFeed = $data->getNextFeed();
    } catch (Zend_Gdata_App_Exception $e) {
        echo $e->getMessage() . '<br/>';
    }

    if ($nextFeed) {
        $this->printVideoFeed($count);
    }
}
The error I am getting is:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 36 bytes) in C:\Program Files\ZendFrameworkCli\library\Zend\Gdata\App\Base.php on line 431
This is one of a few errors I am getting while trying to grab upwards of 3,000 videos. My question is: how can I keep memory usage from growing every time the printVideoFeed method runs again? If there is a way to make it break out of the loop but restart if there are still videos left, that would be awesome. I've been looking, but this is a hard question to Google (to get the results I'm looking for).
Have you tried using iteration instead of recursion? I can imagine that PHP might keep the variables declared in the function, especially $data, alive until the function is left. Alternatively, you could call unset($data); before starting the recursion. A rough iterative version is sketched below.
Also: are you sure you have no infinite loop? Maybe you need to call startIndex() again before calling getNextFeed()?
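A minimal sketch of the iterative version, reusing only the calls that appear in the question (getVideoFeed, getNextFeed, getVideoTitle); whether startIndex() needs to be called again for each page is left open, as in the original:

public function printVideoFeed($count)
{
    $this->startIndex($count);
    $feed = $this->yt->getVideoFeed($this->query);

    while ($feed !== null) {
        foreach ($feed as $video) {
            echo $count . ' - ' . $video->getVideoTitle() . '<br/>';
            $count++;
        }

        try {
            $next = $feed->getNextFeed();
        } catch (Zend_Gdata_App_Exception $e) {
            echo $e->getMessage() . '<br/>';
            $next = null;
        }

        // Drop the old feed before loading the next page so it can be garbage-collected
        unset($feed);
        $feed = $next;
    }
}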
