I have a little more than a hundred PHP scripts running on my server right now. Each one of them runs a loop and inserts data into my DB. I did that in order to learn how to kill processes in MySQL. So to kill them, I coded a PHP file that loops through the process list and kills them one by one. The problem is that this script is not executed: it keeps loading in my browser (no errors...). Also note that I can't manually run a SHOW PROCESSLIST in MySQL, as MySQL is totally overloaded at the moment and nothing is responding. So what I guess is that my 'killing process' script is the last one in the queue and will only be executed at the end. So my question is whether there is a way to prioritize a process in MySQL and put it at priority number one. Thank you in advance for your replies. Cheers. Marc
This is how I am killing the processes:
// Fetch the full process list and kill each connection one by one
$qry = mysql_query("SHOW FULL PROCESSLIST");
while ($row = mysql_fetch_array($qry)) {
    $process_id = $row["Id"];
    $sql = "KILL $process_id";
    mysql_query($sql);
}
I'm not sure if this will actually affect MySQL, but on Unix/Linux, you could try calling proc_nice() near the top of your script with a negative increment (like -20). It basically does the same thing as the nice command.
From the Wikipedia page on nice:
"... nice is used to invoke a utility or shell script with a particular priority, thus giving the process more or less CPU time than other processes. A niceness of −20 is the highest priority and 19 or 20 is the lowest priority. The default niceness for processes is inherited from its parent process, usually 0."
Related
I have a daily cron job which takes about 5 minutes to run (it does some data gathering and then various database updates). It works fine, but the problem is that, during those 5 minutes, the site is completely unresponsive to any requests, HTTP or otherwise.
It would appear that the cron job script takes up all the resources while it runs. I couldn't find anything in the PHP docs to help me out here - how can I make the script know to only use up, say, 50% of available resources? I'd much rather have it run for 10 minutes and have the site available to users during that time, than have it run for 5 minutes and have user complaints about downtime every single day.
I'm sure I could come up with a way to configure the server itself to make this happen, but I would much prefer if there was a built-in approach in PHP to resolving this issue. Is there?
Alternatively, as plan B, we could redirect all user requests to a static downtime page while the script is running (as opposed to what's happening now, which is the page loading indefinitely or eventually timing out).
A normal script can't hog 100% of the resources; resources get split over the processes. It can slow everything down intensely, but it cannot lock up all resources (without doing some funky stuff). You could get a hint by running top on your command line and seeing which process takes up a lot.
That leads to the conclusion that something is locking all further processes. As Arkascha comments, there is a fair chance that your database gets locked. This answer explains which table type you should use; if you do not have it set to InnoDB, you probably want that, at least for the tables that get locked.
It could also be disk I/O if you write huge files; try to split the work into smaller reads/writes, or move some of the information (e.g. if the files contain lists) into your database (assuming it has room to spare).
It could also be CPU. To fix that, you need to make your code more efficient. Recheck your code, see where you do heavy operations, and try to make those smaller. Normally you want code to be as fast as possible; now you want it to be as lightweight as possible, which changes the way you write it.
If it still locks up, it's time to debug. Turn off a large part of your code and check whether the locking still happens. Keep turning code back on until you notice the locking, then fix that part. Try to figure out what is costing you so much; only a few scripts require intense resources, so now is the time to optimize. One option might be splitting the work into two (or more) steps: run one cron job that prepares/sanitizes the data, and another that processes it. They don't have to run synchronously; there can be a few minutes between them.
If that is not an option, benchmark your code and improve as much as you can. If you have a heavy query, it might improve by selecting only IDs in the heavy query and using a second query just to fetch the data. If you can, use your database to filter, sort and manage data; don't do that in PHP.
What I have also implemented once is a sleep every N actions, as sketched below.
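A minimal sketch of that idea; $rows and process_row() are placeholders, and the batch size and sleep length are arbitrary values you would tune:

<?php
// Hypothetical example: pause briefly every 500 processed rows
// so other processes get a chance at the CPU and the database.
$counter = 0;
foreach ($rows as $row) {
    process_row($row);          // placeholder for the real work
    if (++$counter % 500 === 0) {
        sleep(1);               // back off for a second every 500 actions
    }
}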
If your script really is that extreme, another solution could be moving it to a time when there are few or no visitors on your site. Even if you remove the bottleneck, nobody likes a slow website.
And there is always the option of increasing your hardware.
You don't mention which resource is your bottleneck: CPU, memory or disk I/O.
However, if it is CPU or memory, you can do something like this in your script:
http://php.net/manual/en/function.sys-getloadavg.php
http://php.net/manual/en/function.memory-get-usage.php
$yourlimit = 100000000; // roughly 100 MB memory ceiling for this script
$load = sys_getloadavg();
if ($load[0] > 0.80 || memory_get_usage() > $yourlimit) {
    sleep(5); // back off when the 1-minute load average or memory usage gets too high
}
Another thing to try would be to set your process priority in your script.
Note that only raising the priority (a negative increment) requires superuser rights; lowering it, as above, does not. Either way, that should be fine for a cron job.
http://php.net/manual/en/function.proc-nice.php
proc_nice(50); // a positive value lowers the priority; the OS clamps it to its maximum (usually 19)
I did a quick test of both and they work like a charm; thanks for asking, I have a cron job like that as well and will implement it. It looks like proc_nice alone will do fine.
My test code:
proc_nice(50); // lower this script's priority; the OS clamps the value to its maximum (usually 19)
$yourlimit = 100000000;
$x = 0;
while (1) {
    $x = $x + 1;
    $load = sys_getloadavg();
    if ($load[0] > 0.80 || memory_get_usage() > $yourlimit) {
        sleep(5); // back off while the system is busy
    }
    echo $x . "\n";
}
It really depends on your environment.
If you are on a Unix base, there are built-in tools to limit the CPU/priority of a given process.
You can limit the server or PHP as a whole, which is probably not what you are looking for.
What you can do first is separate your task into its own process.
There is popen() for that, but I found it much easier to make the task a standalone executable PHP script. Let's name it hugetask for the example.
#!/usr/bin/php
<?php
// Huge task here
Then to call from the command line (or cron):
nice -n 15 ./hugetask
This will limit the scheduling: it lowers the priority of the task relative to others, and the system does the rest.
You can also call it from your PHP script directly:
exec("nice -n 15 ./hugetask &");
Usage: nice [OPTION] [COMMAND [ARG]...] Run COMMAND with an adjusted
niceness, which affects process scheduling. With no COMMAND, print the
current niceness. Niceness values range from
-20 (most favorable to the process) to 19 (least favorable to the process).
To create a cpu limit, see the tool cpulimit which has more options.
This said, usually I just put some usleep() calls in my scripts to slow them down and avoid creating a funnel of data. This works well if your script uses loops. If you slow down your task so that it runs in, say, 30 minutes, there won't be many issues.
See also proc_nice http://php.net/manual/en/function.proc-nice.php
proc_nice() changes the priority of the current process by the amount
specified in increment. A positive increment will lower the priority
of the current process, whereas a negative increment will raise the
priority.
And sys_getloadavg() can also help. It returns an array with the system load averages over the last 1, 5, and 15 minutes.
It can be used as a test condition before launching the huge task.
Or log the average to find the best time of day to launch the huge task. The result can be surprising!
print_r(sys_getloadavg());
http://php.net/manual/en/function.sys-getloadavg.php
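For example, a small sketch of using it as a gate before kicking off the heavy work; the 1.0 threshold is an arbitrary assumption to tune for your machine, and hugetask is the script from above:

<?php
// Only start the huge task when the 1-minute load average is reasonable.
list($load1min) = sys_getloadavg();
if ($load1min > 1.0) {
    error_log("Load too high ($load1min), deferring hugetask until the next run.");
    exit(0);
}
exec("nice -n 15 ./hugetask &"); // launch the heavy task at low priority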
You could try to delay execution using sleep(). Just have your script pause between the several updates of your database.
sleep(60); // stop execution for 60 seconds
Although this depends a lot on the kind of processing your script does, it may or may not help in your case. It is worth a try, so you could:
split your queries into smaller chunks
do the updates in steps, with a sleep in between (see the sketch after the quoted example below)
References
Using sleep for cron process
I could not describe it better than the quote in the above answer:
Maybe you're walking the database of 9,000,000 book titles and updating about 10% of them. That process has to run in the middle of the day, but there are so many updates to be done that running your batch program drags the database server down to a crawl for other users.
So modify the batch process to submit, say, 1000 updates, then sleep for 5 seconds to give the database server a chance to finish processing any requests from other users that have backed up.
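A rough sketch of that batch-and-sleep pattern; the connection details, table and column names are made up for illustration:

<?php
// Process the big update in chunks of 1000 rows, sleeping between chunks
// so other users' queries get a chance to run.
$pdo = new PDO('mysql:host=localhost;dbname=library', 'user', 'pass');
do {
    $affected = $pdo->exec(
        "UPDATE books SET needs_reindex = 0 WHERE needs_reindex = 1 LIMIT 1000"
    );
    sleep(5); // give the database server some breathing room
} while ($affected > 0);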
Sleep and server resources
sleep resources depend on OS
adding sleep to alleviate server resources
Probably, to minimize your memory usage, you should process heavy and lengthy operations in batches. If you query the database using an ORM like Doctrine, you can easily use its existing batch-processing functions:
http://docs.doctrine-project.org/projects/doctrine-orm/en/latest/reference/batch-processing.html
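The linked docs boil down to flushing and clearing the EntityManager every N operations so it does not keep every managed entity in memory. A rough sketch of that pattern, where $em is assumed to be a configured Doctrine EntityManager and Song is a hypothetical entity from your project:

<?php
$batchSize = 20;
for ($i = 1; $i <= 10000; ++$i) {
    $song = new Song();
    $song->setTitle('title ' . $i);
    $em->persist($song);
    if (($i % $batchSize) === 0) {
        $em->flush();  // execute the queued inserts
        $em->clear();  // detach entities so memory stays flat
    }
}
$em->flush();
$em->clear();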
It's hard to tell what exactly the issue may be without having a look at your code (the cron script). But to confirm that the issue is caused by the cron job, you can run the script manually and check the website's responsiveness. If you notice the site being down while the script runs, we would have to look at the script itself to come up with a solution.
Many loops in your cron script might consume a lot of CPU resources.
To prevent that and reduce CPU usage simply put some delays in your script, for example:
while ($long_time_condition) {
    // Do something here
    usleep(100000); // pause 0.1 seconds each iteration
}
Basically, you are giving the processor some time to do something else.
Also you can use the proc_nice() function to change the process priority. For example proc_nice(20);//very low priority. Look at this question.
If you want to find the bottlenecks in your code, you can try using the Xdebug profiler.
Just set it up in your dev environment, start the cron manually and then profile any page. You can also profile the cron script itself with php -d xdebug.profiler_enable=On script.php; look at this question.
If you suspect that the database is your bottleneck, then import a reasonably large dataset (or the entire database) into your local database and repeat the steps, logging and inspecting all the queries.
Alternatively, if possible, set up Xdebug on a staging server that is as close as possible to production and profile the page during cron execution.
I want to accomplish the following behavior in php:
1 - Script gets called with parameters
2 - I initiate a thread for a long-running operation
3 - Script should return control to the caller
4 - Thread executes until it is finished
Is this behavior possible? What I am seeing now is that the script won't return until the thread has finished executing, which makes sense, as the thread's execution would probably die if the script stopped executing. But is there no way to stop blocking the client so they can go about their business? Am I stuck using some exec() call to get this behavior? Is there a way to get this done with threading only? I'd like to avoid using exec if possible.
So if someone calls my script from a browser, it should just return immediately, and the long-running process should keep executing until it's done.
Thanks
Daniel
Yes, it's possible. Call your PHP script via AJAX and create multiple instances of the AJAX function dynamically. See the attached screenshots. When I compared the results of running a single function versus 24 instances, my data was processed about 15x faster. I am trying to populate a MySQL table with about 30 million records, and each record involves calculating the distance in miles from the city center based on lat/lng. So yes, it's no walk in the park. See this:
Screenshot 1: http://gaysugardaddyfinder.com/screen2.PNG
Screenshot 2: http://gaysugardaddyfinder.com/screen.png
This may be a glorious hack or whatnot, but it sure worked great for me.
My server is a 72-core Xeon setup with 64 GB RAM.
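To make the idea concrete, here is a minimal sketch of what the PHP endpoint behind those AJAX calls could look like: each call processes its own slice of the table, identified by a chunk parameter, so many calls can run side by side. The table, columns, chunk parameter and distance_from_center() helper are all invented for illustration.

<?php
// process_chunk.php?chunk=0, ?chunk=1, ... - each AJAX call handles one slice.
$chunkSize = 10000;
$chunk     = isset($_GET['chunk']) ? (int) $_GET['chunk'] : 0;
$offset    = $chunk * $chunkSize;

$pdo  = new PDO('mysql:host=localhost;dbname=geo', 'user', 'pass');
$rows = $pdo->query("SELECT id, lat, lng FROM places ORDER BY id LIMIT $chunkSize OFFSET $offset");

$update = $pdo->prepare("UPDATE places SET miles_from_center = ? WHERE id = ?");
foreach ($rows as $row) {
    $miles = distance_from_center($row['lat'], $row['lng']); // placeholder calculation
    $update->execute([$miles, $row['id']]);
}
echo "chunk $chunk done";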
I am building a turn-based multiplayer game with Flash and PHP. Sometimes two users may call on the same PHP script at the same time. This script is designed to write some information to the database. But that script should not run if that information has already been written by another user, or else the game will break. If PHP processes these scripts sequentially (similar to how MySQL queues up multiple queries), then only one script should run in total and everything should be fine.
However, I find that around 10% of the time, BOTH users' scripts are executed. My theory is that the server sometimes receives both user requests to run the script at exactly the same time, and they both run because neither detected that anything had been written to the database yet. Is it possible that both scripts were executed at the same time? If so, what are the possible solutions to this problem?
This is indeed possible. You can try locking and unlocking tables at the beginning and end of your scripts.
Though this will slow down some requests, as they will first have to wait for the locked tables to be unlocked.
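A minimal sketch of that idea; the game_state table name is made up:

<?php
$db = new mysqli('host', 'username', 'password', 'dbname');

// Take an exclusive write lock; a second script blocks here until we release it.
$db->query("LOCK TABLES game_state WRITE");

// ... check whether the move was already written, and write it if not ...

// Release the lock so the other player's request can proceed.
$db->query("UNLOCK TABLES");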
It doesn't matter whether it is PHP, C, Java or whatever: at any given moment, at most as many processes can be running as you have CPUs (and cores). There can be, let's say, 100 processes at the same time, but if you only have 2 cores, only 2 are running and the rest are waiting.
Now it depends on what you count as "running": only the active processes, or the waiting ones as well. Secondly, how many processes can wait depends on your system configuration and your system specs.
Sounds, at first glance, like whatever is supposed to keep a second instance of the script from running just does not happen fast enough, 10% of the time. I understand that you already have some kind of 'lock' like someone told you to add, which is great; as someone mentioned above, always put this lock FIRST THING in your script, if not even before calling the script (i.e. in the parent script). The same goes for competing functions/objects etc.
Just a note though: I was directed here by Google, and what I wanted to find out is whether script B will run IN AN IFRAME (so in a 'different window', if you wish) while script A is not finished running; basically your title is a bit blurry. Thank you very much.
Fortunately enough, we're in the same pants: I'm programming a Hearthstone-like card game using PHP (which, I know, isn't suited for this at all, but I just like challenging tasks, and okay, it's the only language I'm familiar with). Basically, I have to keep multiple 'instants' (or actions, if you prefer) from triggering while another set of global events/instants/sub-instants is rolling. This includes NEVER calling a function that has an event in it from the same rolling snippet, EXCEPT if I run a while loop on a $_SESSION variable that only does sleep(1) (that happens in script A): it keeps rolling while $_SESSION["phase"] == "EndOfTurnEffects", until $_SESSION["phase"] == "StandBy" (other player's turn), and I want script B to modify $_SESSION["phase"]. Basically, if script B cannot run before script A is done executing, I'm caught in an endless loop in that while statement...
It's very plausible that they do. Look into database transactions.
Briefly, database transactions are used to control concurrency between programs that access the database at the same time. You start a transaction, then execute multiple queries and finally commit the transaction. If two scripts overlap each other, one of them will fail.
Note that isolation levels give further fine-grained control over how much the two (or more) competing scripts may share. Typically all are allowed to read from the database, but only one is allowed to write, so the error will happen at the final commit. This is fine as long as all side effects happen in the database, but it is not sufficient if you have external side effects (such as deleting a file or sending an email). In those cases you may want to lock a table or row for the duration of the transaction, or set the isolation level.
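A short sketch of the transaction approach using PDO; the table and column names are illustrative, and $gameId, $turn and $moveData are assumed to come from the request:

<?php
$pdo = new PDO('mysql:host=localhost;dbname=game', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

try {
    $pdo->beginTransaction();

    // Re-check inside the transaction that the move has not been written yet.
    $stmt = $pdo->prepare("SELECT COUNT(*) FROM moves WHERE game_id = ? AND turn = ? FOR UPDATE");
    $stmt->execute([$gameId, $turn]);

    if ($stmt->fetchColumn() == 0) {
        $insert = $pdo->prepare("INSERT INTO moves (game_id, turn, data) VALUES (?, ?, ?)");
        $insert->execute([$gameId, $turn, $moveData]);
    }

    $pdo->commit();
} catch (Exception $e) {
    $pdo->rollBack(); // the other script won, or something else failed
}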
Here is an example of a MySQL named (advisory) lock that you can use so that the first PHP thread to reach the database acquires the lock (using the lock name "awesome_lock_row") until it finally releases it. A second thread that attempts to take the same lock name, "awesome_lock_row", keeps waiting until the first PHP thread has released the lock.
For this example, you can try running the same script perhaps 100 times concurrently as a cron job, and you should see the "update_this_data" number field increment to 100. If the lock hadn't been taken, all 100 concurrent threads would probably have seen "update_this_data" as 0 at the same time, and the end result would have been just 1 instead of 100.
<?php
$db = new mysqli('host', 'username', 'password', 'dbname');

// Acquire the named lock (wait up to 30 seconds). Everything between
// GET_LOCK and RELEASE_LOCK runs in only one script at a time.
$db->query("DO GET_LOCK('awesome_lock_row', 30)");

$result = $db->query("SELECT * FROM table_name");
if ($result) {
    if ($row = $result->fetch_object()) {
        $output = $row;
    }
    $result->close();
}

$update_id = $output->some_id;
$db->query("UPDATE table_name SET update_this_data=update_this_data+1 WHERE id={$update_id}");

// Release the named lock so the next waiting script can proceed.
$db->query("DO RELEASE_LOCK('awesome_lock_row')");
?>
Hope this helps.
I have a daemon script written in Perl that checks a database table for rows, pulls them in one by one, sends the contents via HTTP POST to another service, then logs the result and repeats (only a single child). When there are rows present, the first one is posted and logged immediately, but every subsequent one is delayed by around 20 seconds. There are no sleep() calls running, and I can't find any other obvious delays. Any ideas?
Without code nobody can help you. You should reduce your code to a minimal test case that reproduces the problem and post it here. Quite possibly, if you do this, you will find the error yourself.
And even without sleep, your process can stall if you are not doing asynchronous programming and call something that simply takes a long time to execute.
You can find such slow spots by running your program under a profiler like Devel::NYTProf.
I have a personal web site that crawls and collects MP3s from my favorite music blogs for later listening...
The way it works is that a cron job runs a .php script once every minute, which crawls the next blog in the DB. The results are put into the DB, and then a second .php script crawls the collected links.
The scripts only crawl two levels down into the page, so: the main page www.url.com and the links on that page, www.url.com/post1 and www.url.com/post2.
My problem is that as I start to get a larger collection of blogs, they are only scanned once every 20 to 30 minutes, and when I add a new blog to the script there is a backlog in scanning the links, as only one is processed every minute.
Due to how PHP works, it seems I cannot just allow the scripts to process more than one or a limited number of links, due to script execution times, memory limits, timeouts, etc.
Also, I cannot run multiple instances of the same script, as they will overwrite each other in the DB.
What is the best way I could speed this process up?
Is there a way I can have multiple scripts affect the DB, but write them so they do not overwrite each other and instead queue their results?
Is there some way to create threading in PHP so that a script can process links at its own pace?
Any ideas?
Thanks.
USE CURL MULTI!
curl_multi will let you process the pages in parallel.
http://us3.php.net/curl
Most of the time you are just waiting on the websites; the DB insertions and HTML parsing are orders of magnitude faster.
Create a list of the blogs you want to scrape and send them out to curl_multi. Wait, and then serially process the results of all the calls. You can then do a second pass on the next level down.
http://www.developertutorials.com/blog/php/parallel-web-scraping-in-php-curl-multi-functions-375/
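A rough sketch of that flow with the curl_multi_* functions (error handling omitted; the blog URLs would come from your DB):

<?php
$blogUrls = ['http://blog-one.example/', 'http://blog-two.example/']; // from your DB in practice

// Set up one easy handle per blog and register them all with a multi handle.
$mh = curl_multi_init();
$handles = [];
foreach ($blogUrls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[$url] = $ch;
}

// Run all requests in parallel until every transfer has finished.
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh); // wait for activity instead of busy-looping
} while ($running > 0);

// Now process the results serially: parse the HTML, insert into the DB, queue level-two links.
foreach ($handles as $url => $ch) {
    $html = curl_multi_getcontent($ch); // $url tells you which blog this content came from
    // ... parse $html and store the found links ...
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);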
pseudo code for running parallel scanners:
start_a_scan(){
//Start mysql transaction (needs InnoDB afaik)
BEGIN
//Get first entry that has timed out and is not being scanned by someone
//(And acquire an exclusive lock on affected rows)
$row = SELECT * FROM scan_targets WHERE being_scanned = false AND \
(scanned_at + 60) < (NOW()+0) ORDER BY scanned_at ASC \
LIMIT 1 FOR UPDATE
//let everyone know we're scanning this one, so they'll keep out
UPDATE scan_targets SET being_scanned = true WHERE id = $row['id']
//Commit transaction
COMMIT
//scan
scan_target($row['url'])
//update entry state to allow it to be scanned in the future again
UPDATE scan_targets SET being_scanned = false, \
scanned_at = NOW() WHERE id = $row['id']
}
You'd probably also need a 'cleaner' that periodically checks whether there are any aborted scans hanging around, and resets their state so they can be scanned again.
And then you can have several scan processes running in parallel! Yey!
cheers!
EDIT: I forgot that you need to make the first SELECT with FOR UPDATE. Read more here
This surely isn't the answer to your question, but if you're willing to learn Python, I recommend you look at Scrapy, an open-source web crawler/scraper framework which should fill your needs. Again, it's not PHP but Python. It is, however, very distributable, etc. I use it myself.
Due to how PHP works, it seems I cannot just allow the scripts to process more than one or a limited number of links, due to script execution times, memory limits, timeouts, etc.
The memory limit is only a problem if your code leaks memory. You should fix that rather than raising the memory limit. The script execution time limit is a safety measure, which you can simply disable for your CLI scripts.
Also, I cannot run multiple instances of the same script, as they will overwrite each other in the DB.
You can construct your application in such a way that instances don't override each other. A typical way to do it would be to partition per site, e.g. start a separate script for each site you want to crawl (see the sketch below).
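For instance, a minimal sketch of that per-site partitioning, where a launcher spawns one background worker per site and each worker only touches its own site's rows (the script, table and column names are invented):

<?php
// launcher.php - run from cron; spawns one background worker per site.
$pdo   = new PDO('mysql:host=localhost;dbname=crawler', 'user', 'pass');
$sites = $pdo->query("SELECT id FROM blogs")->fetchAll(PDO::FETCH_COLUMN);

foreach ($sites as $siteId) {
    // Each worker crawls only its own site, so the instances never collide in the DB.
    exec("php worker.php " . (int) $siteId . " > /dev/null 2>&1 &");
}

// worker.php would then read its site ID from $argv[1] and crawl only that blog.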
CLI scripts are not limited by max execution time. Memory limits are not normally a problem unless you have large sets of data in memory at any one time. Timeouts should be handled gracefully by your application.
It should be possible to change your code so that you can run several instances at once; you would have to post the script for anyone to advise further, though. As Peter says, you probably need to look at the design. Providing the code in a pastebin will help us to help you :)