Laravel importer routine slows down over time - php

I have a routine that fetches some data from a web service and stores it in my database. This data has 20k+ items. To save them into the database, I have to retrieve some information first and then store them. So I have a foreach loop that runs 20k+ times, performing a read and a write to the database on each iteration.
But this approach slows down over time. It takes more than an hour to finish!
I've disabled the query log (DB::disableQueryLog()), but I didn't notice any gain in performance.
Here's my code:
$data = API::getItems();
foreach ($data as $item) {
    $otherItem = OtherItem::where('something', $item['something'])->first();
    if (!is_null($otherItem)) {
        Item::create([
            ...
        ]);
    }
}
As a solution, I decided to pre-fetch all the OtherItem records into a collection, and it solved the problem:
$data = API::getItems();
$otherItems = OtherItem::all();
foreach ($data as $item) {
    $otherItem = $otherItems->where('something', $item['something'])->first();
    if (!is_null($otherItem)) {
        Item::create([
            ...
        ]);
    }
}
But I want to understand why the first approach slows down drastically over time, and what the best way to do this sort of thing is.
EDIT:
To clarify:
I know that doing 20k queries is not performant and, in this case, performance is not important (unless it takes hours instead of minutes). I will only run this routine now and then while in development. My final approach was a mix of both answers (I hadn't thought of buffering the items and inserting them in batches).
Here's the code for anyone interested:
$data = collect(API::getPrices());
$chunks = $data->chunk(500);
$otherItems = OtherItem::all();
foreach ($chunks as $items) {
    $buffer = [];
    foreach ($items as $item) {
        $otherItem = $otherItems->where('something', $item['something'])->first();
        if (!is_null($otherItem)) {
            $buffer[] = [
                ...
            ];
        }
    }
    Item::insert($buffer);
}
So, what is bothering me is why it is so painfully slow (even with all the queries). I've decided to do some benchmarking to analyse the question further.
With the two-queries approach I get the following results for a 6000-iteration loop:
Max read: 11.5232 s
Min read: 0.0044 s
Mean read: 0.3196 s
Max write: 0.9133 s
Min write: 0.0007 s
Mean write: 0.0085 s
Every 10-20 iterations the read time goes up to over a second for 2-3 iterations, which is weird, and I have no idea why.
Just out of curiosity, I've also benchmarked the difference between inserting the items one by one and chunking/buffering them before inserting into the DB:
Without buffering: 1115.4 s (18 min 35 s)
Chunking and buffering: 1064.7 s (17 min 45 s)
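For reference, a minimal sketch of how such per-iteration read/write timings could be collected with microtime(), assuming the same models and column mapping as in the snippets above:
$readTimes = [];
$writeTimes = [];
foreach ($data as $item) {
    $start = microtime(true);
    $otherItem = OtherItem::where('something', $item['something'])->first();
    $readTimes[] = microtime(true) - $start;

    if (!is_null($otherItem)) {
        $start = microtime(true);
        Item::create([
            // ... column mapping as in the question ...
        ]);
        $writeTimes[] = microtime(true) - $start;
    }
}
// max(), min() and array_sum()/count() then give the max, min and mean figures.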

In the first code snippet, you're creating 40,000 queries for 20,000 items: two queries per item, the first to get the data and the second to store something.
The second code snippet will create 20,001 queries, and it's a very slow solution too.
You can build an array and use insert() instead of calling the create() method each time you want to store some data. This code will create just 2 queries instead of 40,000 or 20,001:
$otherItems = OtherItem::all();
$items = [];
foreach ($data as $item) {
    $otherItem = $otherItems->where('something', $item['something'])->first();
    if (!is_null($otherItem)) {
        $items[] = [.....];
    }
}
Item::insert($items);

It slows down because there are simply so many queries; each one is a round trip to the database.
Another thing you can do is chunk the inserts inside database transactions. Play around with the exact numbers, but try inserting in batches of a few hundred or so (see the sketch after the steps below).
i.e.
start transaction
loop over chunk, performing inserts
commit
repeat for next chunk until no chunks remain
Laravel's ORM provides a chunk method for this kind of use case.
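A minimal sketch of those steps in Laravel, assuming the Item model and the column mapping from the question; DB::transaction() wraps each batch so a single commit covers the whole chunk, and the batched insert() removes most of the round trips:
use Illuminate\Support\Facades\DB;

$chunks = collect($data)->chunk(500); // tune the batch size
foreach ($chunks as $chunk) {
    DB::transaction(function () use ($chunk) {
        $buffer = [];
        foreach ($chunk as $item) {
            $buffer[] = [
                // ... column mapping as in the question ...
            ];
        }
        Item::insert($buffer); // one INSERT per chunk
    });
}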

Related

How do I efficiently run a PHP script that doesn't take forever to execute in a WAMP environment?

I've made a script that pretty much loads a huge array of objects from a MySQL database, and then loads a huge (but smaller) list of objects from the same MySQL database.
I want to iterate over each list to check for irregular behaviour, using PHP. But every time I run the script it takes forever to execute (so far I haven't seen it complete). Are there any optimizations I can make so it doesn't take this long to execute? There are roughly 64,150 entries in the first list, and about 1,748 entries in the second list.
This is roughly what the code looks like, in pseudocode:
// an array of size 64000 containing objects in the form of {"id": 1, "unique_id": "kqiweyu21a)_"}
$items_list = [];
// an array of size 5000 containing objects in the form of {"inventory": "a long string that might have the unique_id", "name": "SomeName", "id": 1};
$user_list = [];
Up until this point the results are instant, but when I do this it takes forever to execute; it seems like it never ends:
foreach ($items_list as $item)
{
    foreach ($user_list as $user)
    {
        if (strpos($user["inventory"], $item["unique_id"]) !== false)
        {
            echo("Found a version of the item");
        }
    }
}
Note that the echo should rarely happen. The issue isn't with MySQL, as the $items_list and $user_list arrays populate almost instantly; it only starts to take forever when I try to iterate over the lists.
With over 100 million iterations (64,150 × 1,748), adding a break will help somewhat, even though the match rarely happens:
foreach ($items_list as $item)
{
    foreach ($user_list as $user)
    {
        if (strpos($user["inventory"], $item["unique_id"]) !== false) {
            echo("Found a version of the item");
            break;
        }
    }
}
Alternate solution 1 (PHP 5.6): you could also use pthreads and split your big array into chunks to pool them into threads; combined with break, this will certainly improve it.
Alternate solution 2: use PHP 7; the performance improvements regarding array manipulation and loops are big.
Also try sorting your arrays before the loop. It depends on what you are looking for, but very often sorting the arrays first will reduce the loop time as much as possible when the condition is found.
Your example is almost impossible to reproduce. You need to provide an example that can be replicated: the two loops as given, if only accessing an array, will complete extremely quickly (1-2 seconds). This means that either the string you're searching is kilobytes or larger (not provided in the question) or something else is happening, e.g. a database access, while the loops are running.
You can let SQL do the searching for you. Since you don't share the columns you need, I'll only pull the ones I see:
SELECT i.unique_id, u.inventory
FROM items i, users u
WHERE LOCATE(i.unique_id, u.inventory)
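For completeness, a sketch of running that query from PHP; the PDO connection details and the items/users table and column names are assumptions carried over from the query above:
$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8mb4', 'user', 'password'); // hypothetical connection
$stmt = $pdo->query(
    'SELECT i.unique_id, u.inventory
     FROM items i, users u
     WHERE LOCATE(i.unique_id, u.inventory)'
);
foreach ($stmt as $row) {
    echo "Found a version of item {$row['unique_id']}\n";
}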

PHP array optimization for 80k rows

I need help finding a workaround for getting over the memory_limit. My limit is 128MB; from the database I'm getting about 80k rows, and the script stops at 66k. Thanks for the help.
Code:
$possibilities = [];
foreach ($result as $item) {
    $domainWord = str_replace("." . $item->tld, "", $item->address);
    for ($i = 0; $i + 2 < strlen($domainWord); $i++) {
        $tri = $domainWord[$i] . $domainWord[$i + 1] . $domainWord[$i + 2];
        if (array_key_exists($tri, $possibilities)) {
            $possibilities[$tri] += 1;
        } else {
            $possibilities[$tri] = 1;
        }
    }
}
Your bottleneck, given your algorithm, is most likely not the database query but the $possibilities array you're building.
If I read your code correctly, you get a list of domain names from the database. From each of the domain names you strip off the top-level-domain at the end first.
Then you walk character-by-character from left to right of the resulting string and collect triplets of the characters from that string, like this:
example.com => ['exa', 'xam', 'amp', 'mpl', 'ple']
You store those triplets in the keys of the array, which is a nice idea, and you also count them, which doesn't have any effect on the memory consumption. However, my guess is that the sheer number of possible triplets (for 26 letters and 10 digits that's 36^3 = 46,656 possibilities, each taking at least 3 bytes just for the key inside the array, plus however much bookkeeping PHP adds around it) takes quite a lot from your memory limit.
Probably someone will tell you how PHP uses memory with its database cursors; I don't know, but there is a trick you can use to profile your memory consumption.
Put calls to memory_get_usage():
- before and after each iteration, so you'll know how much memory was used on each cursor advancement,
- before and after each addition to $possibilities.
And just print them right away, so you'll be able to run your code and see in real time what is using your memory and how seriously (see the sketch below).
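A sketch of that profiling idea applied to the loop from the question, at per-iteration granularity; the same pair of calls can be added around each write to $possibilities for finer detail:
$possibilities = [];
foreach ($result as $item) {
    $before = memory_get_usage();

    $domainWord = str_replace("." . $item->tld, "", $item->address);
    for ($i = 0; $i + 2 < strlen($domainWord); $i++) {
        $tri = $domainWord[$i] . $domainWord[$i + 1] . $domainWord[$i + 2];
        if (array_key_exists($tri, $possibilities)) {
            $possibilities[$tri] += 1;
        } else {
            $possibilities[$tri] = 1;
        }
    }

    // Print the delta and the running total right away.
    echo (memory_get_usage() - $before) . " bytes this iteration, "
        . memory_get_usage() . " bytes total\n";
}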
Also, try to unset the $item after each iteration. It may actually help.
Knowing which specific database access library you are using to obtain the $result iterator would help immensely.
Given the tiny (pretty useless) code snippet you've provided, I want to give you a MySQL answer, but I'm not certain you're using MySQL.
But:
- Optimise your table.
- Use EXPLAIN to optimise your query. Rewrite your query to put as much of the logic as possible in the query rather than in the PHP code.
Edit: if you're using MySQL, then prepend EXPLAIN before your SELECT keyword and the result will show you an explanation of how the query you give MySQL actually turns into results.
Do not use the PHP strlen function in the loop condition, as this is inefficient; instead you can test by treating the string as an array of characters, thus:
for ($i = 0; isset($domainWord[$i + 2]); $i++) {
In your MySQL (if that's what you're using), add a LIMIT clause that will break the query into 3 or 4 chunks, say 25k rows per chunk, which will fit comfortably into your maximum operating capacity of 66k rows. Burki had this good idea.
At the end of each chunk, clean all the strings and restart, set inside a loop:
$z = 0;
while ($z < 4) {
    // do the grab of data from the database; preserve only your output
    $z++;
}
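A sketch of the LIMIT-based chunking, assuming PDO and a hypothetical domains table with the tld and address columns used in the question:
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'password'); // hypothetical connection
$chunkSize = 25000;
$possibilities = [];
for ($z = 0; $z < 4; $z++) {
    $offset = $z * $chunkSize;
    $stmt = $pdo->query("SELECT tld, address FROM domains LIMIT $offset, $chunkSize");
    while ($item = $stmt->fetch(PDO::FETCH_OBJ)) {
        // ... count triplets into $possibilities exactly as in the question ...
    }
    // only $possibilities (your output) is preserved between chunks
}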
But probably more important than any of these: provide enough details in your question!
- What is the data you want to get?
- What are you storing your data in?
- What are the criteria for finding the data?
These answers will help people far more knowledgeable than me show you how to properly optimise your database.

Optimizing array merge operation

I would appreciate any help given.
I have 7 separate arrays with approx. 90,000 numbers in each array (let's call them arrays1-arrays7). There are no duplicate numbers within each array itself, but there can be duplicates between the arrays. For example, array2 has no duplicates, but it is possible for it to have numbers in common with arrays3 and arrays4.
The Problem:
I am trying to identify all of the numbers that are duplicated 3 times once all 7 arrays are merged.
I must do this calculation 1000 times and it takes 15 minutes, but that is not OK because I have to run it 40 times. If you know of another language that is better suited for this type of calculation, please let me know. Any extension suggestions, such as Redis or Gearman, are helpful. The code:
for ($kj = 1; $kj <= 1000; $kj++)
{
    $result = array_merge($files_array1, $files_array2, $files_array3, $files_array4, $files_array5, $files_array6, $files_array7);
    $result = array_count_values($result);
    $fp_lines = fopen("equalTo3.txt", "w");
    foreach ($result as $key => $val)
    {
        if ($result[$key] == 3)
        {
            fwrite($fp_lines, $key . "\r\n");
        }
    }
    fclose($fp_lines);
}
I have also tried the code below with strings, but the array_map call and the array_count_values call take 17 minutes:
for ($kj = 1; $kj <= 1000; $kj++)
{
    $result = '';
    for ($ii = 0; $ii < 7; $ii++) {
        $result .= $files_array[$hello_won[$ii]] . "\r\n";
    }
    $result2 = explode("\n", $result);        // 5 mins
    $result2 = array_map("trim", $result2);   // 11 mins
    $result2 = array_count_values($result2);  // 4-6 mins
    $fp_lines = fopen("equalTo3.txt", "w");
    foreach ($result2 as $key => $val)
    {
        if ($result2[$key] == 3)
        {
            fwrite($fp_lines, $key . "\r\n");
        }
    }
    fclose($fp_lines);
    unset($result2);
}
array_merge() is significantly slower with more elements in the array because (from php.net):
If the input arrays have the same string keys, then the later value
for that key will overwrite the previous one. If, however, the arrays
contain numeric keys, the later value will not overwrite the original
value, but will be appended.
Values in the input array with numeric keys will be renumbered with
incrementing keys starting from zero in the result array.
So this function is actually doing some conditional checks. You can replace array_merge with plain appending, consisting of a loop (foreach or any other) and the [] operator. You can write a function imitating array_merge, like this (using references to avoid copying the arrays):
function imitateMerge(&$array1, &$array2) {
    foreach ($array2 as $i) {
        $array1[] = $i;
    }
}
And you will see a real increase in speed.
I suggest replacing
foreach ($result as $key => $val)
{
    if ($result[$key] == 3)
    {
        fwrite($fp_lines, $key . "\r\n");
    }
}
With something like
$res = array_keys(array_filter($result, function($val){return $val == 3;}));
fwrite($fp_lines, implode("\r\n", $res));
This is probably all wrong; see the last edit.
I also think array_merge is the problem, but my suggestion would be to implement a function counting the values in the several arrays directly instead of merging first. This depends a little bit on how much overlap you have in your arrays. If the overlap is very small then this might not be much faster than merging, but with significant overlap (rand(0, 200000) to fill the arrays when I tried) this will be much faster.
function arrValues($arrs) {
    $values = array();
    foreach ($arrs as $arr) {
        foreach ($arr as $key => $val) {
            if (array_key_exists($val, $values)) {
                $values[$val]++;
            } else {
                $values[$val] = 1;
            }
        }
    }
    return $values;
}
var_dump(arrValues(array(
    $files_array1,
    $files_array2,
    $files_array3,
    $files_array4,
    $files_array5,
    $files_array6,
    $files_array7
)));
The computation takes about 0.5s on my machine, then another 2s for printing the stuff.
-edit-
It's also not clear to me why you do the same thing 1000 times. Are the arrays different each time or something? Saying a bit about the reason might give people additional ideas...
-edit again-
After some more exploration I don't believe array_merge is at fault any more. You don't have enough overlap to benefit that much from counting everything directly. Have you investigated the available memory on your machine? For me, merging 7 arrays with 90k elements each takes ~250M. If you have allowed PHP to use this much memory, which I assume you have since you don't get any allocation errors, then maybe the problem is that the memory is simply not available and you get a lot of page faults? If this is not the problem, then on what kind of machine and what PHP version are you running? I've tested your original code on 5.5 and 5.4 and, after fixing the memory issue, it also runs in about 0.5s. That is per iteration, mind you. Now if you do this 1000 times in the same PHP script then it will take a while, even more so considering that you allocate all this memory each time.
I believe you really should consider putting the numbers in a database. Given your figures it seems you have ~500M rows in total. That is an awful lot to handle in PHP; a database makes it easy (see the sketch below).
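A sketch of what that could look like, assuming the numbers are loaded into a single hypothetical numbers table with one row per number per source array; the duplicated-three-times check then becomes a GROUP BY:
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'password'); // hypothetical connection
$stmt = $pdo->query('SELECT num FROM numbers GROUP BY num HAVING COUNT(*) = 3');

$fp_lines = fopen("equalTo3.txt", "w");
foreach ($stmt as $row) {
    fwrite($fp_lines, $row['num'] . "\r\n");
}
fclose($fp_lines);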

Initiating the same loop with either a while or foreach statement

I have code in PHP such as the following:
while ($r = mysql_fetch_array($q))
{
    // Do some stuff
}
where $q is a query retrieving a set of group members. However, certain groups have their members saved in memcached, and that memcached value is stored in an array as $mem_entry. To run through that, I'd normally do the following:
foreach ($mem_entry as $k => $r)
{
    // Do some stuff
}
Here's the problem. I don't want to have two blocks of identical code (the "do some stuff" section) nested in two different loops just because in one case I have to use MySQL for the loop and in the other memcached. Is there some way to toggle starting off the loop with the while or foreach? In other words, if $mem_entry has a non-blank value, the first line of the loop will be foreach($mem_entry as $k => $r), or if it's empty, the first line of the loop will be while($r = mysql_fetch_array($q)).
Edit
Well, pretty much a few seconds after I wrote this I ended up coming up with the solution. Figure I'd leave this up for anyone else that might come upon this problem. I first set the value of $members to the memcached value. If that's blank, I run the MySQL query and use a while loop to transfer all the records to an array called $members. I then initiate the loop using foreach($members as $k => $r). Basically, I'm using a foreach loop every time, but the value of $members is set differently based on whether or not a value for it exists in memcached.
Why not just refactor out doSomeStuff() as a function which gets called from within each loop (see the sketch below)? Yes, you'll need to see if this results in a performance hit, but unless that's significant, this is a simple approach to avoiding code repetition.
If there's a way to toggle as you suggest, I don't know of it.
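A sketch of that refactor; doSomeStuff() is a hypothetical name for the shared block from the question:
function doSomeStuff($r)
{
    // ... the "Do some stuff" section goes here ...
}

if (!empty($mem_entry)) {
    foreach ($mem_entry as $k => $r) {
        doSomeStuff($r);
    }
} else {
    while ($r = mysql_fetch_array($q)) {
        doSomeStuff($r);
    }
}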
Not the ideal solution, but I will give you my 2 cents. The ideal would have been to call a function, but if you don't want to do that then you can try something like this:
if (!isset($mem_entry)) {
    $mem_entry = array();
    while ($r = mysql_fetch_array($q))
    {
        $mem_entry[] = $r;
    }
}
The idea is to use just the foreach loop to do the actual work: if there is nothing in memcached, then fill your $mem_entry array with rows from MySQL and then feed it to your foreach loop.

PHP - Foreach loops and resources

I'm using a foreach loop to process a large set of items; unfortunately it's using a lot of memory (probably because it's making a copy of the array).
Apparently there is a way to save some memory with the following code: $items = &$array;
Isn't it better to use for loops instead?
And is there a way to destroy each item as soon as it has been processed in a foreach loop?
E.g.:
$items = &$array;
foreach ($items as $item)
{
    dosomethingwithmy($item);
    destroy($item);
}
I'm just looking for the best way to process a lot of items without running out of resources.
Try a for loop:
$keys = array_keys($array);
for ($i = 0, $n = count($keys); $i < $n; ++$i) {
    $item = &$array[$keys[$i]];
    dosomethingwithmy($item);
    destroy($item);
}
Resource-wise, your code will be more efficient if you use a for loop, instead of a foreach loop. Each iteration of your foreach loop will copy the current element in memory, which will take time and memory. Using for and accessing the current item with an index is a bit better and faster.
Use this:
reset($array);
while (list($key_d, $val_d) = each($array)) {
}
because foreach creates a copy.
If you are getting that large data set from a database, it can often help to try and consume the data set as soon as it comes from the database. For example, from the PHP mysql_fetch_array documentation:
$resource = mysql_query("query");
while ($row = mysql_fetch_array($resource, MYSQL_NUM)) {
process($row);
}
This loop will not create an in-memory copy of the entire dataset (at least not redundantly). A friend of mine sped up some of her query processing by 10x using this technique (her datasets are biological, so they can get quite large).
