LAMP / Laravel - Report generation maxing out single CPU - php

So I have developed a report generation system in Laravel. We are using PHP 7 (opcache enabled) / Apache / MySQL on a CentOS 7 box. For one report, grabbing all the information takes about 15 seconds, but then I have to loop through and do a bunch of filtering on Collections, etc. I have optimized this from top to bottom for about a week and have got the entire report generation down to about 45 seconds (dealing with multiple tables with more than 1 million entries). This maxes out my CPU until it's done, of course.
My issue is that when we pushed it live, the client's CPU is not up to the task. They have 4 CPUs @ 8 cores each @ 2.2 GHz. However, since PHP is a single process it only runs on one CPU and maxes it out, and since it's so slow the report takes closer to 10 minutes to run.
Is there any way to get Apache / PHP / Linux ... whatever ... to use all 4 CPUs for a single PHP process? The only other option is to tell the client they need a better server, which is not an option. Please help.

So I stopped trying to find a way to make the server handle my code better and found a few ways to optimize my code instead.
First off, I used the collection's groupBy() method to group my collection so that I had a bunch of sub-arrays keyed by id. When looping, I just grabbed the relevant sub-array instead of using the collection's filter() method, which is REALLY slow when dealing with this many items. That saved me a LOT of processing power.
Secondly, every time I used a sub-array I removed it from the main collection, so it got smaller and smaller with every pass through the foreach.
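A minimal sketch of the two tricks combined, assuming the rows carry a customer_id column (the column name, $customerIds and the report-building comments are placeholders, not my original code):

// Slow: filter() walks the whole collection for every id.
foreach ($customerIds as $id) {
    $subset = $rows->filter(function ($row) use ($id) {
        return $row->customer_id === $id;
    });
    // ... build this section of the report from $subset ...
}

// Faster: group once, then look up by key and discard each group after use.
$grouped = $rows->groupBy('customer_id');
foreach ($customerIds as $id) {
    $subset = $grouped->pull($id, collect()); // pull() also removes the group,
                                              // so $grouped shrinks on every pass
    // ... build this section of the report from $subset ...
}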
These optimizations ended up saving me a LOT of processing power and now my reports run fine. After days of searching for a way to let PHP handle parallel processing, I have come to the conclusion that it's simply not possible.
Hope this helps.

Related

Running 600+ threads with PHP pthreads - what about the overhead

I have a server with 2 physical CPUs which together have 24 cores and 10 GB RAM.
The PHP program is calculating a statistic, and I can run each section totally independently of the others. Once all calculations are finished I only have to "merge" them.
Therefore I had the idea to perform each calculation phase in a separate thread created/controlled by pthreads.
Each calculation takes around 0.10 seconds, but the sheer number of calculations makes the job take that long when they run serially.
My questions:
Is there a limitation when creating a new thread with pthreads?
What is the overhead of creating a new thread? I have to take this into account to avoid introducing a new delay.
I can imagine that the load would be very high for several seconds but would then drop suddenly once each calculation finishes. That is not a problem: it is "my" server and I do not have to worry about other users [as I would on a shared server].
While "waiting" for an answer :-) I started to rewrite the class.
I can summarize it like this:
There is no way to start 600 threads at once. I expected that, but I wanted to know where the limit is. My configuration "allowed" around 160 threads to be started.
When starting more threads than that, the PHP script stopped working without any further notice.
As Franz Gleichmann pointed out, the whole process took longer when starting a lot of threads. I found that starting 20 threads gave the best performance (see the sketch below).
The achieved performance gain is between 20% and 50% - I am satisfied.
I don't know if it is a bug in the pthreads extension, but I could not access any class members from within the threads. I had to move the class members inside the function. Since the calculation lives in one function anyway, this did not bother me and I did not investigate further.
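A rough sketch of the pool-of-20 approach, assuming PHP 7's pthreads v3 on the CLI; the CalcTask class, the $chunks input and the array_sum() placeholder are illustrations, not the original code:

<?php
// Each task carries one pre-split chunk of the input and computes its partial result.
class CalcTask extends Threaded
{
    private $serializedChunk;
    public $result = 0;

    public function __construct(array $chunk)
    {
        // pthreads coerces arrays on Threaded members; serializing keeps things simple
        $this->serializedChunk = serialize($chunk);
    }

    public function run()
    {
        $chunk = unserialize($this->serializedChunk);
        // placeholder for the ~0.1 s statistic calculation on one chunk
        $this->result = array_sum($chunk);
    }
}

$pool  = new Pool(20, Worker::class);  // 20 concurrent workers instead of 600 threads
$tasks = [];

foreach ($chunks as $chunk) {          // $chunks: your own pre-split input data
    $task    = new CalcTask($chunk);
    $tasks[] = $task;
    $pool->submit($task);
}

$pool->shutdown();                     // wait for all queued tasks to finish

// "merge" step: combine the per-chunk results
$total = 0;
foreach ($tasks as $task) {
    $total += $task->result;
}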

Suggestion for using php array?

Let's say I have 100k records in a table. After fetching those records I push them into an array, do some calculations, and then send them to a server for further processing.
I have tested the scenario with 1k records and it works perfectly, but I am worried about performance, because the page that does the calculations and fetches the records from the DB runs every 2 minutes.
My question is: can I use an array for more than 2 million records?
There's no limit on how much data an array can hold; the limit is server memory / the PHP memory limit.
Why would you push 100k records into an array? Databases have sorting and limiting for exactly that reason!
My question is: can I use an array for more than 2 million records?
Yes you can; 2 million entries is not a limit for PHP arrays. The array limit depends on the memory available to PHP.
ini_set('memory_limit', '320M');
$moreThan2Million = 2000001;
$array = range(1, $moreThan2Million);
echo count($array); # $moreThan2Million (2000001)
You wrote:
The page is scheduled and runs every 2 minutes, so I am worried about performance.
And:
But I need to fetch all of them, not 100 at a time, and send them to the server for further processing.
Performance of array operations depends on processing power. With a fast enough computer you should not run into any problems. Keep in mind, however, that PHP is an interpreted language and therefore considerably slower than compiled binaries.
If you need to run the same script every 2 minutes but its runtime is longer than two minutes, you can distribute execution over multiple computers, so that one process is not eating the CPU and memory of another and can finish its work while the other runs on an additional box.
Edit
Good answer, but can you give your estimate of how much time the script will need to complete, assuming there is no issue with the server's processor and RAM?
That depends on the size of the array, the amount of processing each entry needs (relative to the overall size of the array) and naturally on processor power and the amount of RAM. All of these are unspecified in your question, so the only specific answer I can give is that it is unspecified. You'll need to test this yourself and build metrics for your application by profiling it.
I have 10 GB RAM and more than 8 quad processors.
For example, you could take rough measurements for 1, 10, 100, 1000, 10,000, 100,000 and 1 million entries to see how your (unspecified) script scales on that computer.
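A rough way to build such a metric; processRecords() stands in for your own calculation code and is not from the question:

<?php
// Hypothetical micro-benchmark: time the same work at increasing input sizes.
foreach ([1, 10, 100, 1000, 10000, 100000, 1000000] as $n) {
    $records = range(1, $n);          // stand-in for $n fetched rows
    $start   = microtime(true);
    processRecords($records);         // your own calculation / filtering code
    printf("%8d records: %.3f s\n", $n, microtime(true) - $start);
}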
I am sending this array to another page for further processing.
Also measure the amount of data you send between computers and how much bandwidth you have available for inter-process communication over the wire.
Let's say I have 100k records in a table; after fetching those records I push them into an array with some filters.
Filters? Can't you just write a query that implements those filters instead? A database (depending on the vendor) isn't just a data store; it can do calculations, and most of the time that is much quicker than transferring the data to PHP and doing the calculations there. If your database is, say, PostgreSQL, you can do pretty much everything you've ever wanted with PL/pgSQL.
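For illustration only (the $pdo connection and the orders table/columns are made up), the difference looks like this:

<?php
// Instead of fetching everything and filtering/summing in PHP ...
$rows  = $pdo->query('SELECT * FROM orders')->fetchAll();
$total = 0;
foreach ($rows as $row) {
    if ($row['status'] === 'paid') {
        $total += $row['amount'];
    }
}

// ... let the database apply the filter and the calculation:
$stmt  = $pdo->query("SELECT SUM(amount) FROM orders WHERE status = 'paid'");
$total = $stmt->fetchColumn();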

Zend Lucene exhausts memory when indexing

An oldish site I'm maintaining uses Zend Lucene (ZF 1.7.2) as its search engine. I recently added two new tables to be indexed, together containing about 2000 rows of text data ranging between 31 bytes and 63 kB.
The indexing worked fine a few times, but after the third run or so it started terminating with a fatal error due to exhausting its allocated memory. The PHP memory limit was originally set to 16M, which was enough to index all the other content (200 rows of text at a few kilobytes each). I gradually increased the memory limit to 160M, but it still isn't enough and I can't increase it any higher.
When indexing, I first need to clear the previously indexed results, because the path scheme contains numbers which Lucene seems to treat as stopwords, returning every entry when I run this search:
$this->index->find('url:/tablename/12345');
After clearing all of the results I reinsert them one by one:
foreach ($urls as $v) {
    $doc = new Zend_Search_Lucene_Document();
    $doc->addField(Zend_Search_Lucene_Field::UnStored('content', $v['data']));
    $doc->addField(Zend_Search_Lucene_Field::Text('title', $v['title']));
    $doc->addField(Zend_Search_Lucene_Field::Text('description', $v['description']));
    $doc->addField(Zend_Search_Lucene_Field::Text('url', $v['path']));
    $this->index->addDocument($doc);
}
After about a thousand iterations the indexer runs out of memory and crashes. Strangely, doubling the memory limit only helps for a few dozen more rows.
I've already tried adjusting the MergeFactor and MaxMergeDocs parameters (to 5 and 100 respectively) and calling $this->index->optimize() every 100 rows, but neither provides consistent help.
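For reference, a sketch of how those knobs are set on a Zend_Search_Lucene index; the path is a placeholder, and setMaxBufferedDocs() is an extra memory-related setting worth trying, not something from the question:

$index = Zend_Search_Lucene::open('/path/to/index');
$index->setMergeFactor(5);       // merge segments more aggressively
$index->setMaxMergeDocs(100);    // cap documents per merged segment
$index->setMaxBufferedDocs(10);  // flush buffered documents to disk sooner
// ... addDocument() calls ...
$index->optimize();              // periodically merge everything into one segment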
Clearing the whole search index and rebuilding it seems to result in successful indexing most of the time, but I'd prefer a more elegant and less CPU-intensive solution. Is there something I'm doing wrong? Is it normal for the indexing to hog so much memory?
I had a similar problem on a site I maintained that had at least three different languages and had to re-index the same 10,000+ (and growing) localized documents separately for each locale (each using its own localized search engine). Suffice to say that it usually failed during the second pass.
We ended up implementing an Ajax-based re-indexing process: the script is called a first time to initialize and start re-indexing, aborts after a predefined number of processed documents, and returns a JSON value indicating whether it has completed, along with other progress information. We then call the same script again with the progress variables until it returns a completed state.
This also allowed us to show a progress bar for the process in the admin area.
For the cron job, we simply wrote a bash script doing the same task, but using exit codes.
This was about 3 years ago and nothing has failed since then.
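A rough sketch of that batched approach; the batch size, the 'offset' parameter, fetchDocuments() and the index path are hypothetical placeholders, not the original code:

<?php
$batchSize = 100;
$offset    = isset($_GET['offset']) ? (int) $_GET['offset'] : 0;

$index = Zend_Search_Lucene::open('/path/to/index');
$docs  = fetchDocuments($offset, $batchSize);   // your own data-access code

foreach ($docs as $v) {
    $doc = new Zend_Search_Lucene_Document();
    $doc->addField(Zend_Search_Lucene_Field::UnStored('content', $v['data']));
    $doc->addField(Zend_Search_Lucene_Field::Text('url', $v['path']));
    $index->addDocument($doc);
}
$index->commit();   // flush this batch, then let the process exit and free its memory

header('Content-Type: application/json');
echo json_encode([
    'done'   => count($docs) < $batchSize,      // no more documents left
    'offset' => $offset + count($docs),         // where the next call resumes
]);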

Common technique in cleaning cache with php infinite loop

Hi. The question is: when you are executing an infinite loop with PHP, how do you control memory cleanup?
A rough example is getting results from / writing results to MySQL in an infinite while loop.
Any common methods would help.
Thank you.
PS - all of these PHP nemeses and bugs were eliminated by moving to Python completely ...
As far as I know, in PHP memory is freed when a variable goes out of scope. But there are some other problems:
- Circular references: PHP 5.3 should solve this; it also lets you run the GC whenever you want.
- If PHP takes, for example, 5 MB of memory in the first iteration, its process will keep occupying that memory even if later iterations only need, say, 1 MB.
- You have to free some things manually (for example, the database results mentioned before).
Using a scripting language for a long-running, process-like job is a very bad idea.
Try doing it another way:
- Write a script that processes an amount of data that takes approximately 55-60 seconds to run.
- Add a cron job to run it every minute.
- Add some kind of mutual exclusion to the script so cron does not run concurrent copies - you can synchronise it on a database table (using SELECT ... FOR UPDATE); a sketch follows below.
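A rough sketch of that mutual exclusion, assuming a one-row job_lock table on InnoDB; the table name, connection details and do_work() are placeholders:

<?php
// Cron runs this every minute; the row lock keeps two runs from overlapping.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$pdo->beginTransaction();

// Blocks (or could be made to fail fast) while another run still holds the lock.
$pdo->query('SELECT id FROM job_lock WHERE id = 1 FOR UPDATE');

do_work();               // process ~55-60 seconds worth of data

$pdo->commit();          // releases the row lock for the next cron run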
As of PHP 5.3, you can explicitly trigger a GC cycle with gc_collect_cycles() as documented here.
Before that, it was out of your control, and you'd have to wait for PHP to decide it was time to take out the trash on its own - either by trying to exceed the memory limit with a significant amount of used-but-unattached memory objects or sacrificing a goat under the full moon and hoping for the best.
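For the PHP 5.3+ case, a minimal sketch of freeing memory explicitly inside such a loop; process_batch() is a hypothetical placeholder for your own work:

<?php
gc_enable();

while (true) {
    $result = process_batch();      // fetch/update rows, build temporary structures
    unset($result);                 // drop references so the data becomes garbage
    gc_collect_cycles();            // reclaim memory held by circular references
    sleep(1);
}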

PHP and CPU - Process of chat + notifications

My site has a PHP process running for each open window/tab; it runs for a maximum of 1 minute and returns notifications / chat messages / people coming online or going offline. When JavaScript gets the output, it calls the same PHP process again, and so on.
This is like Facebook chat.
But it seems to take too much CPU while it is running. Do you have any idea how Facebook handles this problem? What do they do so their processes don't take too much CPU and bring their servers down?
My process has a while(true) with a sleep(1) at the end. Inside the loop, it checks for notifications, checks whether any of the currently online people went offline or changed status, reads unread messages, etc.
Let me know if you need more info about how my process works.
Would calling other PHP scripts via system() (and waiting for their output) alleviate this?
I ask because it would make other processes check the notifications and flush when finished, while the main PHP script just collects the results.
Thank you.
I think your main problem here is the parallelism. Apache and PHP do not excel at tasks like this, where 100+ users each have an open HTTP request.
If in your while(true) you spend 0.1 seconds on CPU-bound work (checking status changes or other useful things) and 1 second on the sleep, this results in roughly 100% CPU load as soon as you have 10 users online in the chat. So in order to serve more users with THIS model of a chat, you would have to optimize the work inside your while(true) cycle and/or raise the sleep interval from 1 second to 3 or higher.
I had the same problem in an HTTP-based chat system I wrote many years ago, where at some point too many parallel MySQL SELECTs were slowing down the chat, creating heavy load on the system.
What I did was implement a fast "ring buffer" for messages and status information in shared memory (SysV back in the day - today I would probably use APC or memcached). All operations read and write the buffer, and the buffer itself gets periodically "flushed" into the database to persist it (but a lot less often than once per second per user). If no persistence is needed you can omit the backend, of course.
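A rough sketch of the idea using APCu (apcu_store/apcu_fetch) rather than the original SysV code; the key scheme and the 200-entry cap are made-up values for illustration:

<?php
function buffer_push($conversationId, array $message, $max = 200)
{
    $key = 'chat_buffer_' . $conversationId;
    $buf = apcu_fetch($key) ?: [];
    $buf[] = $message;
    if (count($buf) > $max) {              // behave like a ring buffer
        $buf = array_slice($buf, -$max);
    }
    apcu_store($key, $buf);
}

function buffer_read($conversationId, $sinceIndex = 0)
{
    $buf = apcu_fetch('chat_buffer_' . $conversationId) ?: [];
    return array_slice($buf, $sinceIndex); // only what the client hasn't seen yet
}

// A separate cron/worker would periodically persist the buffers to MySQL.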
I was able to increase the number of users I could serve by roughly 500% that way.
BUT as soon as you have solved this issue you will be faced with another: available system memory (100+ Apache processes at ~5 MB each - fun) and process context-switching overhead. The more active processes you have, the more time your operating system will spend on the overhead of assigning "fair enough" CPU slots, AFAIK.
You'll see it is very hard to scale efficiently with Apache and PHP alone for your use case. There are open-source tools, client- and server-based, to help though. One I remember sits in front of Apache and queues messages internally while having very efficient multi-socket communication with the JavaScript clients, making real "push" events possible. Unfortunately I do not remember any names, so you'll have to research or hope the Stack Overflow community brings in what my brain has already discarded ;)
Edit:
Hi Nuno,
the comment field has too few characters so I reply here.
Let's get to the 10 users in parallel again:
10 * 0.1 seconds of CPU time per cycle (assumed) is roughly 1 second of combined CPU time over a period of 1.1 seconds (1 second sleep + 0.1 seconds execute). That is 1 / 1.1, which I would boldly round to 100% CPU utilization even though it is "only" 90.9%.
If those 10 * 0.1 s of CPU time are "stretched" over a period of not 1.1 seconds but 3.1 (3 seconds sleep + 0.1 seconds execute), the calculation is 1 / 3.1 = 32%.
And it is logical: if your checking cycle queries your backend a third as often, you have only a third of the load on your system.
Regarding the shared memory: the name might imply otherwise, but if you use good IDs for your cache areas, like one ID per conversation or user, you will have private areas within the shared memory. Database tables also rely on you providing good IDs to separate private data from public information, so those should already be around :)
I would also not "split" any further. The fewer PHP processes you have to "juggle" in parallel, the easier it is for your system and for you. Unless you see that it makes absolute sense because one type of notification takes a lot more querying resources than another and you want different refresh times, or something like that. But even that can be decided inside the while cycle: a user's "away" status could be checked every 30 seconds while the messages he might have written get checked every 3. No reason to create more cycles - just different counter variables, or the right divisor in a modulo operation (see the sketch below).
The inventor of PHP said that he believes man is too limited to controll parallel processes :)
Edit 2
OK, let's build a formula. We have these variables:
duration of execution (e)
duration of sleep (s)
duration of one cycle (c)
number of concurrent users (u)
CPU load (l)
c = e + s
l = u*e / c   # the CPU time generated by u concurrent users, divided by the length of one cycle
l = u*e / (e + s)
For 30 users, ASSUMING 0.1 s execution time and 1 second sleep:
l = 30*0.1 / (0.1 + 1)
l = 2.73
l = 273% CPU utilization (aka you need 3 cores :P)
Exceeding the capabilities of your CPU means that cycles will run longer than you intend; the overall response time will increase (and the CPU runs hot).
PHP blocks on sleep() and system() calls. What you really need to research is pcntl_fork(). Fortunately, I had these problems over a decade ago and you can look at most of my code.
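A minimal pcntl_fork() sketch (requires the pcntl extension, CLI only), illustrating the suggestion rather than the phpegg code itself; check_notifications() is a hypothetical placeholder:

<?php
$pid = pcntl_fork();

if ($pid === -1) {
    die('could not fork');
} elseif ($pid === 0) {
    // child process: do the blocking/slow work
    check_notifications();
    exit(0);
} else {
    // parent process: keep serving, then collect the child
    pcntl_waitpid($pid, $status);
}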
I needed a PHP application that could connect to multiple IRC servers, sit in an unlimited number of IRC chatrooms, moderate, interact with, and receive commands from people. All this and more was done in a process-efficient way.
You can check out the entire project at http://sourceforge.net/projects/phpegg/ The code you want is in source/connect.inc.
