What is a tolerable memory usage for a cron job? - php

I created a crawler that will operate as a cron job. The object of the crawler is to go through posts on my site and pull keywords from them.
Currently, I am optimizing the script for both speed and server load, but I am curious about what kinds of benchmarks for each are considered "good".
For example, here are some configurations I have tested, running through 5,000 posts each time (you'll notice the trade-off between speed and memory):
Test 1 - script optimized for memory conservation:
Run time: 52 seconds
Avg. memory load: ~6 MB
Peak memory load: ~7 MB
Test 2 - script optimized for speed:
Run time: 30 seconds
Avg. memory load: ~40 MB
Peak memory load: ~48 MB
Clearly the decision here is speed vs. server load. I am curious what your reactions are to these numbers. Is 40 MB an expensive figure if it increases speed so drastically (and also minimizes MySQL connections)?
Or is it better to run the script more slowly with more MySQL connections, and keep the memory overhead low?

This is a really subjective question given that what is "tolerable" depends on many factors such as how many concurrent processes will be running, the specs of the hardware it'll be running on, and how long you expect it to take.
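If it helps to make the trade-off concrete, here is a minimal sketch of the batched approach, assuming a mysqli connection with placeholder credentials, table name, and keyword logic; the batch size is the knob that trades memory for fewer MySQL round trips, and memory_get_peak_usage() gives you numbers comparable to the ones quoted above:

<?php
// Sketch only: a bigger batch size means fewer MySQL round trips but more
// rows held in memory at once. Credentials, table, and extract_keywords()
// are placeholders, not the poster's actual code.
$mysqli    = new mysqli('localhost', 'user', 'pass', 'mydb');
$batchSize = 500;
$offset    = 0;

while (true) {
    $result = $mysqli->query("SELECT id, body FROM posts LIMIT $offset, $batchSize");
    if ($result->num_rows === 0) {
        break;
    }
    while ($row = $result->fetch_assoc()) {
        // extract_keywords($row['body']);  // the crawler's own logic goes here
    }
    $result->free();            // release the result set before the next batch
    $offset += $batchSize;

    echo 'Peak memory: ', round(memory_get_peak_usage() / 1048576, 1), " MB\n";
}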

Related

MySQL connections reduction impact?

I get around 4 million hits on my web server daily. On each page, I open 2 MySQL connections, which get closed after script execution.
After some optimisation, for 10% of my requests, i.e. for 400k hits, I now open a single MySQL connection instead of 2.
After reducing the total MySQL connections per day, I checked the total number of MySQL processes, sleeping connections, etc., and I don't see any significant gain. It's almost the same pre- and post-optimisation.
In which area can I expect to see a benefit in terms of performance? Will it be CPU utilisation? Memory? IO?
I use the LAMP stack.
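For reference, a typical way to end up with a single connection per request is to share one handle; this is only a sketch with placeholder credentials and queries, not the poster's code:

<?php
// Sketch only: one shared mysqli handle per request instead of two
// separate connections.
function db(): mysqli
{
    static $conn = null;
    if ($conn === null) {
        $conn = new mysqli('localhost', 'user', 'pass', 'mydb');
    }
    return $conn;               // every caller reuses the same connection
}

// Both queries now share one TCP connection and one handshake.
$users = db()->query('SELECT id FROM users LIMIT 10');
$posts = db()->query('SELECT id FROM posts LIMIT 10');

The saving from dropping one connection per request is mostly the per-connection setup cost (handshake plus a thread and its buffers on the MySQL side), so on a healthy server it tends to show up as slightly lower connection churn and memory rather than a dramatic CPU or IO drop.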

PHP multiple curl/file_get_contents() with cron, high CPU usage

I have about 8 cron tasks running every minute; each of them takes time, as they download data from other websites with curl (a single script makes multiple curl requests). Is there any way to lower the CPU or memory usage? Does unsetting variables help?
Yes, unsetting variables will lower the memory usage.
If you want to lower the CPU usage, you have to give it fewer tasks per second. You can start each of your scripts after a short time interval; since each script makes multiple requests, that would be the best way.
The bottleneck here should be the I/O usage, not the CPU. Basically, if the CPU is not running at 100%, you don't have to worry about it.
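A minimal sketch of both suggestions, assuming the scripts run from the CLI; the task-index argument, stagger offset, and URL are placeholders:

<?php
// Sketch only: stagger each cron task's start and free large buffers early.
$taskIndex = (int) ($argv[1] ?? 0);   // e.g. 0..7, one value per cron entry
sleep($taskIndex * 7);                // spread 8 tasks across the minute

$ch = curl_init('http://example.org/feed.json');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$data = curl_exec($ch);
curl_close($ch);

// ... parse $data and store the results ...

unset($data);                         // release the downloaded payload early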

How to run 50k jobs per second with Gearman

According to Gearman website
"A 16 core Intel machine is able to process upwards of 50k jobs per second."
I have a load balancer that moves traffic to 4 different machines. Each machine has 8 cores. I want the ability to run 13K jobs per machine, per second (which is definitely more than 50K jobs in total).
Each job takes between 0.02 and 0.8 ms.
How many workers do I need to open for this type of performance?
What steps do I need to take to open that many workers?
Depending on what kind of processing you're doing, this will require a little experimentation and load testing. Before you start, make sure you have a way to reboot the server without SSH, as you can easily peg the CPU. Follow these steps to find the optimum number of workers:
Begin by adding a number of workers equal to the number of cores minus one. If you have 8 cores, start with 7 workers (hopefully leaving a core free for doing things like SSH).
Run top and observe the load average. The load average should not be higher than the number of cores. For 8 cores, a load average of 7 or above would indicate you have too many workers. A lower load average means you can try adding another worker.
If you added another worker in step 2, observe the load average again. Also observe the increase in RAM usage.
If you repeat the above steps, eventually you will either run out of CPU or RAM.
When doing parallel processing, keep in mind that you could run into a point of diminishing returns. Read about Amdahl's law for more information.
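For illustration, a single worker process built with the pecl/gearman extension looks roughly like this; the function name and server address are placeholders, and you would launch one copy of this script per worker found with the steps above (e.g. under supervisord):

<?php
// Sketch only: one Gearman worker process.
$worker = new GearmanWorker();
$worker->addServer('127.0.0.1', 4730);

$worker->addFunction('process_job', function (GearmanJob $job) {
    $payload = $job->workload();      // the data submitted by the client
    // ... do the 0.02-0.8 ms of real work here ...
    return 'ok';
});

while ($worker->work()) {
    // handles one job per iteration, forever
}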

How much CPU usage is considered high on a Linux server?

I'm running a few PHP jobs which fetch hundreds of thousands of records from a web service and insert them into a database. These jobs take up the CPU of the server.
My question is, how much is considered high?
When I run the "top" command on the Linux server,
it shows around 77%. It goes up to more than 100% if I run more jobs simultaneously. That seems high to me (does more than 100% mean it is running on the 2nd CPU?).
28908 mysql 15 0 152m 43m 5556 S 77.6 4.3 2099:25 mysqld
7227 apache 15 0 104m 79m 5964 S 2.3 7.8 4:54.81 httpd
This server also has web pages/projects hosted on it. The hourly job seems to be affecting the server as well as the other web projects' loading times.
If this is high, is there any way of making the jobs more efficient on the CPU?
Can anyone enlighten me?
A better indicator is the load average; to simplify, it is the number of tasks waiting because of insufficient resources.
You can see it in the uptime command, for example: 13:05:31 up 6 days, 22:54, 5 users, load average: 0.01, 0.04, 0.06. The 3 numbers at the end are the load averages for the last minute, the last 5 minutes and the last 15 minutes. If it stays above 1.00 per core, something is waiting.
I'd say 77% is definitely high.
There are probably many ways to make the job more efficient (recursive import, for example), but not much info is given.
A quick fix would be to invoke the script with the nice command,
and to add a few sleep() calls to stretch the load over time.
I guess you also saturate the network during the import, so if you split up the job it would prevent your site from stalling.
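As a sketch of the "stretch the load over time" idea, the job could check the load average between chunks and back off when the box is already busy; the threshold and chunk count below are made-up placeholders:

<?php
// Sketch only: pace the import and back off when the 1-minute load average
// is already high.
$maxLoad = 2.0;                       // tune roughly to the number of cores

for ($chunk = 0; $chunk < 100; $chunk++) {
    // ... import one chunk of rows here ...

    list($load1min) = sys_getloadavg();
    if ($load1min > $maxLoad) {
        sleep(30);                    // the box is busy: wait longer
    } else {
        sleep(2);                     // still pace the job a little
    }
}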
You can always nice your tasks
http://unixhelp.ed.ac.uk/CGI/man-cgi?nice
With the nice command you can give processes more or less priority.
These jobs take up the CPU usage of the server.
My question is, how much is it considered high?
That is entirely subjective. On computing nodes, the CPU usage is pretty much 100% per core all the time. Is that high? No, not at all, it is proper use of hardware that has been bought for money.
Nice won't help much, since it's MySQL that's occupying your CPU;
putting nice on the PHP client, as in
nice -10 php /home/me/myjob.php
won't make any significant difference.
It is better to split the job up into smaller parts, call your PHP script
from cron, and build it something like this:
<?php
ini_set('max_execution_time', '600');

// 1. Get the file from the remote server in chunks, to avoid net saturation
$fp  = fopen('http://example.org/list.txt', 'r');
$fp2 = fopen('local.txt', 'w');
while (!feof($fp)) {
    fwrite($fp2, fread($fp, 10000));
    sleep(5);
}
fclose($fp);
fclose($fp2);

// 2. Import the local copy in small batches
$fp = fopen('local.txt', 'r');
while (!feof($fp)) {
    // read 1000 lines
    // do insert..
    sleep(10);
}
fclose($fp);

// finished, now rename to .bak, log success or whatever...

Is CPU execution time different in Loop with Sleep() and Long Loops without sleep(), with both having the same total running time?

I have a loop that runs for approx. 25 minutes, i.e. 1500 seconds [100 iterations with sleep(15)].
The execution time of the statements inside the loop is very short.
My scripts are hosted on GoDaddy. I am sure that they impose some kind of limit on execution time.
My question is, are they concerned with the total CPU execution time or the total running time?
They will be concerned with the CPU execution time, not the total running time, unless connections are an issue and you're using a lot of them (which it doesn't sound like you are).
Running time, as in a stopwatch, doesn't matter much to a shared host: if your loop runs for 3 years but only uses 0.01% CPU doing it, it doesn't impact their ability to host. However, if you ran for 3 years at 100% CPU, that directly impacts how many other applications/VMs/whatever can be run on that same hardware. That would mean more servers to host the same number of people, which means money... and that is what they care about.
For the question in the title: they are very different. With sleep() and the same amount of total time, the actual work the CPU does is much less, because it can do the work, sleep/idle, and still finish in the same amount of time. When you call sleep() you're not taxing the CPU; it's a very low-power operation for it to keep the timer going until it calls your code again.
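If you want to see the difference yourself, here is a small sketch comparing wall-clock time with user CPU time for a sleep-heavy loop (note that getrusage() is not available on Windows):

<?php
// Sketch only: compare wall-clock time with user CPU time for a loop that
// mostly sleeps.
$wallStart = microtime(true);

for ($i = 0; $i < 5; $i++) {
    sleep(1);                         // uses wall time, almost no CPU time
}

$usage   = getrusage();
$cpuTime = $usage['ru_utime.tv_sec'] + $usage['ru_utime.tv_usec'] / 1e6;

printf("Wall time: %.2f s, user CPU time: %.4f s\n",
       microtime(true) - $wallStart, $cpuTime);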
This is the typical time limit:
http://es2.php.net/manual/en/info.configuration.php#ini.max-execution-time
It can normally be altered on a per-script basis with ini_set(), e.g.:
ini_set('max_execution_time', 20*60); // 20 minutes (in seconds)
In any case, the exact time limits probably depend on how PHP is running (Apache module, FastCGI, CLI...).
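As a quick sanity check, a small sketch that reports the current SAPI and limit, and detects hosts that refuse the override; under the CLI the default max_execution_time is 0 (no limit):

<?php
// Sketch only: show the current SAPI and execution-time limit.
echo 'SAPI: ', php_sapi_name(), "\n";
echo 'max_execution_time: ', ini_get('max_execution_time'), "\n";

if (ini_set('max_execution_time', (string) (20 * 60)) === false) {
    echo "This host does not allow overriding the limit from the script.\n";
}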
