I have about 8 cron tasks running every minute, and each of them takes time as it downloads data from other websites with cURL (a single script makes multiple cURL requests). Is there any way to lower the CPU or memory usage? Does unsetting variables help?
Yes, unsetting variables will lower the memory usage.
If you want to lower the CPU usage, you have to give the server fewer tasks per second. You can stagger your scripts so that each one starts after a small time interval; since each script makes multiple requests, that would be the best way.
The bottleneck here should be I/O, not the CPU; basically, if the CPU is not at 100%, you don't have to worry about it.
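If you do stagger them, one way is to offset each job so they don't all fire at the top of the minute (illustrative crontab entries; the paths are hypothetical):

* * * * * php /path/to/job1.php
* * * * * sleep 15 && php /path/to/job2.php
* * * * * sleep 30 && php /path/to/job3.php
* * * * * sleep 45 && php /path/to/job4.php

This keeps every job on a one-minute schedule but spreads the cURL traffic and CPU spikes across the minute.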
What is the best way to limit a PHP script's CPU usage from within that script?
I am not looking to re-nice the whole PHP system process, but rather to keep a PHP script running for longer while adjusting the CPU usage of that script only.
Basically, it would need to "renice" itself dynamically, applying only to that script, or it would need to slow down the computations/activities it is doing.
I tried proc_nice(), but I could not get PHP to give other scripts their full CPU share back after my script finished. The change in my script affected subsequent scripts/requests: once the nice level was increased in my script, the nice value stayed with the PHP process in the system.
You could usleep() at regular intervals in your code. This will delay script execution for a given number of microseconds, leaving more time for other processes.
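A minimal sketch of that idea (the sha1() call is just a stand-in for real work):

<?php
// Throttle a CPU-bound loop by yielding the CPU at regular intervals.
for ($i = 0; $i < 1000000; $i++) {
    $x = sha1((string) $i);   // placeholder for the real work
    if ($i % 10000 === 0) {
        usleep(100000);       // pause 100 ms every 10,000 iterations
    }
}

Tuning the interval and the sleep length trades total run time against how much CPU is left over for other processes.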
I have a script which runs 1000 cURL requests using curl_multi_* functions in PHP.
What is the bottleneck behind them timing out?
Would it be the CPU usage? Is there some more efficient way, in terms of how that number of outbound connections is handled by the server, to do this?
I cannot change the functionality and the requests themselves are simple calls to a remote API. I am just wondering what the limit is - would I need to increase memory on the server, or Apache connections, or CPU? (Or something else I have missed)
Your requests are made in a single thread of execution. The bottleneck is almost certainly CPU; have you ever actually watched curl_multi code run? It is incredibly CPU hungry, because you don't really have enough control over dealing with the requests. curl_multi makes it possible for you to orchestrate 1000 requests at once, but this doesn't make it a good idea. You have almost no chance of using curl_multi efficiently, because you cannot control the flow of execution finely enough; just servicing the sockets and select()'ing on them will account for a lot of the high CPU usage you would see watching your code run on the command line.
The reason the CPU usage is high during such tasks is this: PHP is designed to run for a fraction of a second and do everything as fast as it can. It usually does not matter how the CPU is utilized, because it's for such a short space of time. When you prolong a task like this, the problem becomes more apparent: the overhead incurred with every opcode becomes visible to the programmer.
I'm aware you have said you cannot change the implementation, but still, for a complete answer: such a task is far more suitable for threading than curl_multi, and you should start reading http://php.net/pthreads, beginning with http://php.net/Thread.
Left to their own devices on an idle CPU, even 1000 threads would consume as much CPU as curl_multi. The point is that you can precisely control the code responsible for downloading every byte of the response and uploading every byte of the request, and if CPU usage is a concern you can implement a "nice" process by explicitly calling usleep() or by limiting connection usage in a meaningful way. Additionally, your requests can be serviced in separate threads.
I do not suggest that 1000 threads is the thing to do; it is more than likely not. The thing to do would be to design a Stackable (see the documentation) whose job is to make and service a request in a "nice", efficient way, and to design pools (see the examples in the github/pecl extension sources) of workers to execute your newly designed requests ...
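For illustration only, a rough sketch of that shape using the older pthreads API that provides Stackable and Worker (the URL and the pause are arbitrary, and pthreads requires a thread-safe CLI build of PHP):

<?php
// One unit of work: fetch a URL, then deliberately pause to stay "nice".
class NiceRequest extends Stackable {
    public function __construct($url) { $this->url = $url; }
    public function run() {
        $ch = curl_init($this->url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        $this->response = curl_exec($ch);
        curl_close($ch);
        usleep(250000);                    // throttle between requests
    }
}

class RequestWorker extends Worker {
    public function run() {}               // per-worker setup could go here
}

$worker = new RequestWorker();
$worker->start();
$job = new NiceRequest('http://example.org/api?item=1');   // illustrative URL
$worker->stack($job);
$worker->shutdown();                        // waits for the stacked work to finish
// $job->response now holds the body (or false on failure)

A real version would use a pool of a handful of workers, each servicing many NiceRequest objects, rather than anything close to one thread per request.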
I created a crawler that will operate as a cron job. The objective of the crawler is to go through posts on my site and pull keywords from them.
Currently, I am optimizing the script for both speed and server load, but I am curious what types of benchmarks for each are considered "good".
For example, here are some configurations I have tested, running through 5,000 posts each time (you'll notice the trade-off between speed and memory):
Test 1 - script optimized for memory conservation:
Run time: 52 seconds
Avg. memory load: ~6mb
Peak memory load: ~7mb
Test 2 - script optimized for speed
Run time: 30 seconds
Avg. memory load: ~40mb
Peak memory load: ~48mb
Clearly the decision here is speed vs. server load. I am curious what your reactions are to these numbers. Is 40 MB an expensive number if it increases speed so drastically (and also minimizes MySQL connections)?
Or is it better to run the script more slowly with more MySQL connections, and keep the memory overhead low?
This is a really subjective question given that what is "tolerable" depends on many factors such as how many concurrent processes will be running, the specs of the hardware it'll be running on, and how long you expect it to take.
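For context, numbers like the ones above can be collected from inside the script itself; a minimal sketch (the loop body is a placeholder for the real crawling work):

<?php
// Measure run time, current memory and peak memory for a batch of posts.
$start = microtime(true);

for ($i = 0; $i < 5000; $i++) {
    // ... process one post here (placeholder) ...
}

printf(
    "Run time: %.1fs, current memory: %.1f MB, peak memory: %.1f MB\n",
    microtime(true) - $start,
    memory_get_usage(true) / 1048576,
    memory_get_peak_usage(true) / 1048576
);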
I'm running a few PHP jobs which fetch hundreds of thousands of records from a web service and insert them into a database. These jobs take up the CPU usage of the server.
My question is: how much is considered high?
When I run the "top" command on the Linux server,
it seems like 77%. It goes up to more than 100% if I run more jobs simultaneously, which seems high to me. (Does more than 100% mean it is running on a 2nd CPU?)
28908 mysql 15 0 152m 43m 5556 S 77.6 4.3 2099:25 mysqld
7227 apache 15 0 104m 79m 5964 S 2.3 7.8 4:54.81 httpd
This server also has web pages/projects hosted on it. The hourly job seems to be affecting the server, as well as the loading times of the other web projects.
If it is high, is there any way of making it more efficient on the CPU?
Can anyone enlighten me?
A better indicator is the load average; to simplify, it is the number of tasks that are waiting because of insufficient resources.
You can see it in the output of the uptime command, for example: 13:05:31 up 6 days, 22:54, 5 users, load average: 0.01, 0.04, 0.06. The 3 numbers at the end are the load averages for the last minute, the last 5 minutes and the last 15 minutes. If it reaches 1.00 (regardless of the number of cores), it means that something is waiting.
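The same numbers are also available from within PHP via sys_getloadavg(), which can be handy if a job should back off when the machine is busy (the 0.75 threshold below is just an illustrative choice):

<?php
// sys_getloadavg() returns the 1, 5 and 15 minute load averages.
list($load1, $load5, $load15) = sys_getloadavg();

if ($load1 > 0.75) {      // illustrative threshold
    sleep(30);            // back off before starting the heavy work
}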
I'd say 77% is definitely high.
There are probably many ways to make the job more efficient (e.g. a recursive import), but not much info is given.
A quick fix would be to invoke the script with the nice command
and to add a few sleeps to stretch the load out over time.
I guess you also saturate the network during the import, so if you can split up the job it would prevent your site from stalling.
You can always nice your tasks
http://unixhelp.ed.ac.uk/CGI/man-cgi?nice
With the nice command you can give processes more or less priority.
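For example, a cron job could be started at a lower priority like this (the path is hypothetical):

nice -n 10 php /path/to/import_job.php

A higher niceness value means the process is "nicer" to others, i.e. it gets less CPU when there is contention.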
These jobs take up the CPU usage of the server.
My question is: how much is considered high?
That is entirely subjective. On computing nodes, the CPU usage is pretty much 100% per core all the time. Is that high? No, not at all; it is proper use of hardware that has been paid for.
Nice won't help much, since it's MySQL that's occupying your CPU;
putting nice on the PHP client, as in
nice -10 php /home/me/myjob.php
won't make any significant difference.
It's better to split the job up into smaller parts: call your PHP script
from cron and build it like this:
<?php
ini_set("max_execution_time", "600");

// 1. Get the file from the remote server, in chunks to avoid net saturation
$fp  = fopen('http://example.org/list.txt', 'r');
$fp2 = fopen('local.txt', 'w');
while (!feof($fp)) {
    fwrite($fp2, fread($fp, 10000));
    sleep(5);                      // pause between chunks to spread the load
}
fclose($fp);
fclose($fp2);

// 2. Import the local copy in batches
$fp = fopen('local.txt', 'r');
while (!feof($fp)) {
    // read 1000 lines
    // do insert..
    sleep(10);                     // pause between batches so the server gets a breather
}
fclose($fp);

// finished, now rename to .bak, log success or whatever...
I have a loop that runs for approx. 25 minutes, i.e. 1500 seconds (100 iterations with sleep(15)).
The execution time of the statements inside the loop is very short.
My scripts are hosted on GoDaddy. I am sure that they are having some kind of limit on execution time.
My question is: are they concerned with "the total CPU execution time" or the total running time?
They will be concerned with the CPU execution time, not the total running time, unless connections are an issue and you're using a lot of them (which it doesn't sound like you are).
Running time, as in a stopwatch, doesn't matter much to a shared host: if your loop runs for 3 years but only uses 0.01% CPU doing it, it doesn't impact their ability to host. However, if you ran for 3 years at 100% CPU, that directly impacts how many other applications/VMs/whatever can be run on the same hardware. That would mean more servers to host the same number of people, which means money... and that they do care about.
For the question in the title: they are very different. With sleep() and the same amount of total time, the actual work the CPU does is much less, because it can do the work, sleep/idle, and still finish in the same amount of time. When you call sleep() you are not taxing the CPU; it's a very low-power operation to keep the timer going until your code is called again.
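A minimal sketch showing the difference (getrusage() is not available on every platform; the loop is deliberately trivial):

<?php
// Compare wall-clock time with the CPU time actually consumed.
$wallStart  = microtime(true);
$usageStart = getrusage();

for ($i = 0; $i < 5; $i++) {
    sleep(2);   // idles: adds wall-clock time, almost no CPU time
}

$wallElapsed = microtime(true) - $wallStart;
$usageEnd    = getrusage();

// User-mode CPU seconds consumed by this process during the loop
$cpuElapsed = ($usageEnd['ru_utime.tv_sec'] - $usageStart['ru_utime.tv_sec'])
            + ($usageEnd['ru_utime.tv_usec'] - $usageStart['ru_utime.tv_usec']) / 1e6;

printf("Wall clock: %.2fs, CPU time: %.4fs\n", $wallElapsed, $cpuElapsed);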
This is the typical time limit:
http://es2.php.net/manual/en/info.configuration.php#ini.max-execution-time
It can normally be altered on a per-script basis with ini_set(), e.g.:
ini_set('max_execution_time', 20*60); // 20 minutes (in seconds)
In any case, the exact time limits probably depend on how PHP is running (Apache module, FastCGI, CLI...).
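A quick way to see which limit actually applies in a given environment is to ask PHP directly; note that the CLI typically has no limit at all:

<?php
// Prints the effective limit in seconds; "0" means no limit (the CLI default).
echo ini_get('max_execution_time'), "\n";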