Say I have a website hosted on a remote server. I navigate to a page on this website and attempt to perform a specific action (I can go into more detail if necessary, but for the time being let me just say that the action involves running a program on the server with data obtained from a database).
The page just loads continually. So I attempt to navigate to the main website; that now loads continually without resolving as well. After about a day the website comes back, so perhaps there is some automated process that kills tasks after a certain amount of time has passed. My question is this:
Am I able to kill this task or perform any action to allow me to navigate to the website without waiting a full day? I can go into more detail if necessary.
Thanks.
By request, some more in-depth information.
The PHP script retrieves text-based information from the database. Based on this information, the PHP script calls an executable program. The output from the executable is then displayed on the page.
I've checked the MySQL process list and found a process that had been running for a particularly long time. I killed it, so it may be that the executable itself is still running. If so, how would I go about determining this, and if not, is there anything else it could potentially be? Thanks.
Solved:
Alright, so basically I followed Mike Purcell's advice, which was to show the list of MySQL processes and kill any that had been running for a substantially long time. Once that was done, it was just a matter of restarting mysql and httpd. Thanks again to everyone who commented.
Sounds like that mysterious action may have caused an un-optimized query to be executed, which may cause other queries to hang until the bad query has finished executing. If you have access to the mysql server via terminal you could issue the following commands to kill the long running query:
mysql> show processlist;
This command will output any currently running queries. Pay attention to the time column, which displays how long (in seconds) a query has been executing. In theory you should never have queries running past a few seconds, but some may take upwards of 10 minutes depending on the query and the dataset involved. The other column to note is the id column; with this value you can kill a query manually (much like killing a process on a Linux machine).
mysql> kill 387; # 387 is just an example
Now when you run the show processlist command again, that query should disappear from the process list.
Looks like you DoS'ed your own website!
Maybe this webpage shouldn't be a webpage, but a task performed manually or via cron? Otherwise, anyone who finds this page will be able to kill your website whenever they want...
If this program needs so many resources, you should limit it somehow: try to optimize it, or try to limit the resources allocated to it (I don't know exactly how to do that, I just know it's possible :s).
If the problem is already happening, you can try to kill the process (ps aux | grep <processname>, then kill -9 <pid>), or the query if the problem comes from MySQL (inside your mysql client: show processlist; then kill <query id>;). Try the SQL side first.
If the site is locked because of an intensive MySQL process, or a MySQL process gone rogue, the best thing you could do is isolate the process to its own database thread so normal tables don't get locked.
Another option is to switch some of your tables from (presumably) MyISAM to the InnoDB engine, if that works for your setup. Batch insertions will be slower and the performance signature will vary, but you won't be subjected to such severe table-level locking.
Related
I'm just getting started with Queues, and they work fine for messaging and sending emails and SMSs to Twilio etc.
But now I want to do something more complex and time-consuming. I'm looking to upload a file of about 10,000 rows to Amazon S3, parse it, check for duplicates, and then only insert the records that aren't duplicates.
When I run this process it takes over 6 minutes to complete, which is way too long. I want to have it run in the background, with a visual progress bar that gets updated periodically based on the queue status.
Also, while this is running, I want the users to have full access to the site and database tables. This process will lock my main table.
So I basically want to have it run in the background, only touch the main table once to check for duplicates, and from there just process/parse the file into a temporary table of 10,000+ rows, while leaving the main table free.
Once completed, it will then write back to the main table only once.
How can I achieve this without slowing the site/main server down? I apologize for the extremely broad question.
Laravel Queues can do what you want, but there are a couple of points to address in your question.
How can I achieve this without slowing the site/main server down?
Well, the queue is run as a separate process on the server, so you probably won't see a major impact, provided your background process doesn't do anything too stressful. If you're concerned about an impact on performance and you're running a Linux server, there are options for limiting the resources used by processes - check out the renice command, which allows you to adjust the priority of processes. If you're not on Linux, there are probably similar options for your OS.
With respect to the database, that's harder to answer without knowing what your tables look like. It might be possible to do the check for duplicates with a single query and JOIN on the two tables, perhaps writing the results of the check to a different table. This might work, but it could also take a long time depending on how the tables are set up. Another solution would be to use a mirror of the main database table - copy it temporarily, do your work, then delete it. And finally, for a really involved solution, set up database replication and work off a slave.
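To make the JOIN idea concrete, here is a minimal sketch, assuming a hypothetical staging table holding the freshly parsed rows and a main table with a unique email column (all table and column names are invented for the example):

<?php
// Sketch only: copy non-duplicate rows from a staging table into the main
// table in a single query. Table/column names (staging, main, email, name)
// are hypothetical.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'secret');

// Insert only the staging rows that have no match in the main table.
$pdo->exec("
    INSERT INTO main (email, name)
    SELECT s.email, s.name
    FROM staging AS s
    LEFT JOIN main AS m ON m.email = s.email
    WHERE m.email IS NULL
");

// The staging table can then be cleared for the next import.
$pdo->exec('TRUNCATE TABLE staging');

Because the heavy work happens against the staging table, the main table is only touched by that one INSERT ... SELECT.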
As for running the queue worker, I have found that using supervisord to run my background worker is VERY helpful - it allows me to start/stop the process easily and will automatically restart the process when it fails. The documentation on queue listeners has some discussion of this.
And the worker will fail - I have found that my worker process fails on a pretty regular basis. I think it has something to do with the PHP CLI settings, but it hasn't caused me any issues so I haven't really investigated it further. However, for a long-running job, you might run into difficulties. One way to mitigate this would be to break your job up into multiple smaller jobs and "daisy-chain" them together: when part1 finishes, it queues up part2; when part2 finishes, it queues up part3, etc.
As for the progress bar, that's pretty easy. Have the jobs update a value (probably in your database, or possibly in the filesystem) with the current status, and have a JavaScript function on the client periodically perform an AJAX request to get that value and update the progress bar.
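As a rough sketch of the server side (the job_progress table and its columns are assumptions for illustration): the queued job updates a row as it works, and a small endpoint returns the value as JSON for the AJAX poll to consume.

<?php
// progress.php - hypothetical endpoint polled by the client-side JavaScript.
// Assumes a job_progress table (job_id, done_rows, total_rows) that the
// queued job updates every few hundred rows.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'secret');

$jobId = isset($_GET['job_id']) ? (int) $_GET['job_id'] : 0;
$stmt  = $pdo->prepare('SELECT done_rows, total_rows FROM job_progress WHERE job_id = ?');
$stmt->execute(array($jobId));
$row = $stmt->fetch(PDO::FETCH_ASSOC);

header('Content-Type: application/json');
echo json_encode(array(
    'percent' => $row ? round(100 * $row['done_rows'] / max(1, $row['total_rows'])) : 0,
));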
I have a PHP script that has been running for a week, as it's doing a lot of number crunching -- it's not hung, just doing a lot of work. The problem is that I don't know where it has got to, and so I don't know if it will finish in 1 hour or 1 month. (Yes, I should have put something in there so that it tells me, but it's too late for that now.)
Is there any way I can find out what the script is doing? Or even better, to extract variables from its current state?
On Windows you can check what php.exe is doing with Microsoft's procmon.exe.
It won't give you full feedback on variables etc., but you can check for any filesystem operations (which PHP does very often). PHP has some internal functions stored as extra .exe files. You can check with procmon whether PHP calls them...
I hope that the script has finished its tasks by now. As a general guide, on Linux you can check what's happening with the processes running on your system using the top command. It gives you an overview of all running processes.
To concentrate on a single process, you can get its PID (process ID) with the ps command. So you can run:
$ ps -e
This command gives a 4-column output. The left column is the PID, while the right one is the process name. Say you find your script in the right column and see that its PID is 3468. Then, running
$ top -p 3468
you get the periodically updated top window for just that process. You can select different kinds of info in there. Usually you check %CPU and VIRT first. The VIRT column is useful because, if it's constantly increasing, it's an almost certain sign of a memory leak in the script.
Alternatively, you can use htop and highlight the process you're interested in, if you have it installed on your system.
The already-suggested answers simply show that a script is running. This, as you already stated, is not what you need.
As for a script that is already running and does not log or store any information: the only real option I can think of is checking what it is processing. So if it is processing files or a database, you should be able to see the changed/modified results.
If it's purely all in memory, you're more or less out of luck, unless you happen to use something like a session.
But simply checking whether the script is running is of no help at all. So if it's processing a database, check the database; if it's processing files, check the files. Otherwise you're out of luck.
When it comes to heavy number-crunching scripts, for example when I do geocoding, I store results/processed data in local files. So if the script fails or stops, I can resume. The other benefit is that I can run another script that reads those files and shows the progress or the last step performed.
Can this be resumed, or would it start over again? If it can be resumed, stop it and have it log something so you can monitor it. Otherwise you're just going to have to wait for it to finish.
I have written some PHP scripts that take a very long time to run. Usually I have the script occasionally update a database table with some statistics (e.g. if in a loop, every 10,000 iterations, depending on the weight of the loop; I also take samples of microtime() to see how long a particular step takes). I can then check the table for the "statistics" to see how the script is performing, or what particular step it is on.
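A minimal sketch of that pattern, assuming a hypothetical script_progress table keyed on script_name (the crunch() function and the dataset are stand-ins for the real work):

<?php
// Sketch of periodic progress logging from a long-running loop.
// The script_progress table and crunch() are placeholders for the example.
function crunch($row) { usleep(100); }   // stand-in for the real number crunching

$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'secret');
$log = $pdo->prepare(
    'REPLACE INTO script_progress (script_name, current_row, step_seconds, updated_at)
     VALUES (?, ?, ?, NOW())'
);

$rows      = range(1, 1000000);          // stand-in for the real dataset
$stepStart = microtime(true);

foreach ($rows as $i => $row) {
    crunch($row);

    if ($i % 10000 === 0) {
        // Record where we are and how long the last 10,000 iterations took.
        $log->execute(array('cruncher', $i, microtime(true) - $stepStart));
        $stepStart = microtime(true);
    }
}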
I'd agree with what others have said about PHP not being the best language for this, but in a pinch I've found this method works well (I've had PHP scripts run stable for months).
I don't have a server, so I don't have crontab access. I am thinking of using a PHP script to check the current time and how much time has passed. For example, I run the script, it stores the current date in MySQL and checks if 30 days have passed. If so, do my stuff.
Is it possible to do all this without MySQL? And of course this is only my idea; I haven't tried it yet.
Keeping script running:
The issue is that you've either got to keep that script running for a long, long time (which PHP doesn't like) or you'll have to manually run that script every day or whatever.
One thing you could do is write a script to run on your local machine that accesses that PHP script (e.g. using the commandline tool 'wget') every minute or hour or whatever.
If you want to have a long-running script, you'll need to use set_time_limit(): http://php.net/manual/en/function.set-time-limit.php. That'll let you execute a script for much longer.
As noted in another answer, there are also services like this: https://stackoverflow.com/questions/163476/free-alternative-to-webcron
Need for MySQL?
As for whether you need MySQL - definitely not, though it isn't a bad option if you have it available. You can use a file if required (http://php.net/manual/en/function.fopen.php) or even SQLite (http://php.net/manual/en/book.sqlite.php) which is a file-based SQL database.
As I understand it, you can only run PHP scripts that are invoked by a user request.
I tried this once, but it is dangerous. If the scheduled process takes too long, the user may interrupt it and hang up the script, leaving half-processed data. I wouldn't suggest it for a production environment.
Take a look at Drupal's Poormanscron module: http://drupal.org/project/poormanscron. From the introduction text:
The module inserts a small amount of JavaScript on each page of your site that, when a certain amount of time has passed since the last cron run, calls an AJAX request to run the cron tasks.
You can implement something like this yourself, possibly using their code as a starting point. However, this implementation depends on regular visits to the pages of your website. If nobody visits your website, the cronjobs do not get executed.
You'd need some kind of persistent storage, but a simple file should do the trick. Give it a go and see how you get on. :) Come back for help if you get stuck. Here are some pointers to get you started:
http://php.net/manual/en/function.file-get-contents.php
http://nz.php.net/manual/en/function.file-put-contents.php
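For instance, a minimal sketch of the file-based approach (the filename, the 30-day interval, and do_my_stuff() are just examples):

<?php
// Poor man's scheduler: run the heavy task at most once every 30 days,
// using a plain file instead of MySQL to remember the last run time.
function do_my_stuff() { /* the actual monthly task goes here */ }

$stampFile = __DIR__ . '/lastrun.txt';   // example filename
$interval  = 30 * 24 * 60 * 60;          // 30 days in seconds

$lastRun = is_file($stampFile) ? (int) file_get_contents($stampFile) : 0;

if (time() - $lastRun >= $interval) {
    file_put_contents($stampFile, time()); // record the new run first
    do_my_stuff();
}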
You could use a webcron (basically a cronjob on another server that calls your script at a given time).
https://stackoverflow.com/questions/163476/free-alternative-to-webcron
I'm currently running a Linux-based VPS with 768MB of RAM.
I have an application which collects details of domains and then connects to a service via cURL to retrieve the PageRank of these domains.
When I run a check on about 50 domains, it takes the remote page about 3 minutes to load with all the results before my script can parse the details and return them. This causes a problem, as nothing else seems to function until the script has finished executing, so users on the site just get a timer / 'ball of death' while waiting for pages to load.
(The remote page retrieves the domain details and updates itself via AJAX, but the cURL request (rightfully) doesn't return the page until loading is complete.)
Can anyone tell me if I'm doing anything obviously wrong, or if there is a better way of doing it? (There can be anything between 10 and 10,000 domains queued, so I need a process that can run in the background without affecting the rest of the site.)
Thanks
A more sensible approach would be to "batch process" the domain data via a cron-triggered PHP CLI script.
As such, once you'd inserted the relevant domains into a database table with a "processed" flag set as false, the background script would then:
Scan the database for domains that aren't marked as processed.
Carry out the CURL lookup, etc.
Update the database record accordingly and mark it as processed.
...
To ensure no overlap with an already-running batch-processing script, you should only invoke the PHP script every five minutes from cron and (within the PHP script itself) check how long the script has been running at the start of the "scan" stage, exiting if it's been running for four minutes or longer. (You might want to adjust these figures, but hopefully you can see where I'm going with this.)
By using this approach, you'll be able to leave the background script running indefinitely (as it's invoked via cron, it'll automatically start after reboots, etc.) and simply add domains to the database/review the results of processing, etc. via a separate web front end.
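A rough sketch of such a CLI script (the domains table layout, the lookup URL, and the four-minute cut-off follow the description above but are otherwise assumptions):

<?php
// batch_domains.php - intended to be run from cron every five minutes.
// Assumes a domains table with (id, name, pagerank, processed) columns;
// the lookup URL is a placeholder. Error handling omitted for brevity.
$started = time();
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'secret');

$pending = $pdo->query('SELECT id, name FROM domains WHERE processed = 0');
$update  = $pdo->prepare('UPDATE domains SET pagerank = ?, processed = 1 WHERE id = ?');

foreach ($pending as $domain) {
    // Stop before the next cron invocation starts, to avoid overlap.
    if (time() - $started > 240) {
        break;
    }

    $ch = curl_init('http://example.com/pagerank?domain=' . urlencode($domain['name']));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);   // per-lookup timeout (example value)
    $pagerank = curl_exec($ch);
    curl_close($ch);

    $update->execute(array($pagerank, $domain['id']));
}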
This isn't the ideal solution, but if you need to trigger this process based on a user request, you can add the following at the end of your script.
set_time_limit(0);        // remove PHP's execution time limit
ignore_user_abort(true);  // keep the script running even if the browser disconnects
flush();                  // push any buffered output to the browser
This will allow the PHP script to continue running while still returning output to the user. But seriously, you should use batch processing; it will give you much more control over what's going on.
Firstly, I'm sorry, but I'm an idiot! :)
I've loaded the site in another browser (FF) and it loads fine.
It seems Chrome puts some sort of lock on a domain when it's waiting for a server response, and I was testing the script manually through a browser.
Thanks for all your help and sorry for wasting your time.
CJ
While I agree with others that you should consider processing these tasks outside of your webserver, in a more controlled manner, I'll offer an explanation for the "server standstill".
If you're using native PHP sessions, PHP uses an exclusive locking scheme so that only a single PHP process can handle a given session ID at a time. Having a long-running PHP script that uses sessions can certainly cause this.
You can search for combinations of terms like:
php session concurrency lock session_write_close()
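For reference, a minimal sketch of the common workaround: read what you need from the session, then release the lock before the slow part begins.

<?php
// Release the session lock before long-running work so other requests
// from the same user aren't blocked.
session_start();                  // acquires the exclusive lock on this session ID
$userId = $_SESSION['user_id'];   // copy anything you still need (example key)

session_write_close();            // release the lock; other requests can now proceed

// ...long-running work goes here (cURL calls, number crunching, etc.)...
// Note: after session_write_close(), writes to $_SESSION are no longer saved
// unless you call session_start() again.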
I'm sure it's been discussed many times here. I'm too lazy to search for you. Maybe someone else will come along and write an answer with bulleted lists and pretty hyperlinks in exchange for Stack Overflow reputation :) But not me :)
Good luck.
I'm not sure how your code is structured but you could try using sleep(). That's what I use when batch processing.
Is there a way to force MySQL, from PHP, to kill a query if it doesn't return within a certain time frame?
I sometimes see expensive queries running for hours (obviously by this time the HTTP connection has timed out or the user has left). Once enough such queries accumulate, they start to affect the overall performance badly.
Long-running queries are a sign of poor design. It is best to inspect the queries in question and see how they could be optimised; killing them would just be ignoring the problem.
If you still want this, you can use the SHOW PROCESSLIST command to get all running processes and then use KILL <id> to kill a client connection. But for this to work you have to do the check from another PHP script, since multithreading is not yet possible. Also, it is not advisable to give the MySQL user that PHP uses the privileges to mess with the server settings.
Warning: this kind of behaviour should nowadays be implemented with something like upstart.
You want to create daemons for this sort of thing, but you could also use cron.
Just have a script that looks, at set intervals, for queries running above xyz time and kills them.
-- this query generates a KILL statement for every query executing for more than 10 seconds
SELECT CONCAT('KILL ',id,';')
FROM INFORMATION_SCHEMA.PROCESSLIST
WHERE state = "executing" AND `time` >= 10
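A rough PHP sketch of such a watchdog, meant to run from cron (connection details are placeholders; it filters on COMMAND = 'Query' rather than the state column so it catches statements in any execution state, and the account needs sufficient privileges to see and kill other users' threads):

<?php
// Kill any statement that has been executing for more than 10 seconds.
$pdo = new PDO('mysql:host=localhost', 'watchdog_user', 'secret');

$long = $pdo->query(
    "SELECT id FROM INFORMATION_SCHEMA.PROCESSLIST
     WHERE command = 'Query' AND time >= 10"
);

foreach ($long->fetchAll(PDO::FETCH_COLUMN) as $id) {
    // KILL QUERY aborts the statement but keeps the client connection;
    // use plain KILL to drop the connection entirely.
    $pdo->exec('KILL QUERY ' . (int) $id);
}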
However if queries are running for such a long time... they must be optimized.
On the other hand, you may be administering a shared server where there can be some rogue users. In that scenario, you should specify in the terms of service that scripts will be monitored and disabled, and that is exactly what should be done with such offending ones.