I have created a PHP script which scrapes 1 million domains and analyzes the content. I tested it locally and it takes 20 minutes per 1,000 domains scraped.
Can I just set up a server with it and let it run for 2 weeks, or is there a reason why a PHP script would crash after a certain execution time?
If you run PHP from the console, it has no max execution time. That being said, you should probably rearchitect your idea if it takes 2 weeks to execute. Maybe have a js frontend that calls a PHP script that scrapes 5 or 10 domains at a time...
Sure, you could, if you run the code via the command line or set max_execution_time high enough (0 means no limit).
With that said, I would highly recommend that you re-architect your code. If you're running this code on a Linux box, look into pthreads. The task you're trying to do seems like it would be easier with C# if you're running on a Windows machine.
NOTE: I can't stress enough that if you use threading (or some other form of parallelism) for this task, it will go much faster; a curl_multi sketch follows below.
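Stock PHP has no built-in threads (pthreads is a separate PECL extension), so here is a hedged sketch of the same parallelism idea using curl_multi, which fetches several domains concurrently; the domain list, timeouts, and the analysis step are placeholders:

<?php
// Sketch only: fetch a batch of domains in parallel with curl_multi.
$domains = ['example.com', 'example.org', 'example.net']; // placeholder batch

$mh = curl_multi_init();
$handles = [];
foreach ($domains as $domain) {
    $ch = curl_init('http://' . $domain);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    curl_multi_add_handle($mh, $ch);
    $handles[$domain] = $ch;
}

// Drive all transfers concurrently.
do {
    curl_multi_exec($mh, $running);
    if (curl_multi_select($mh) === -1) {
        usleep(100000); // avoid busy-waiting if select is unavailable
    }
} while ($running > 0);

foreach ($handles as $domain => $ch) {
    $html = curl_multi_getcontent($ch);
    // ... analyze $html for $domain here ...
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);

With batches of a few dozen handles, a run that is network-bound should finish far faster than fetching the same domains one at a time.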
I would suggest the following:
Limit your script to X domains per execution.
Create a cron job that runs your script every minute.
This way you won't have to worry too much about memory leaks. You might also want to create a .lock file at the beginning of your process to make sure the cron job doesn't start the script again before the previous run has finished; sometimes requesting information from other websites can take very long. A sketch follows below.
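A minimal sketch of that batch-plus-lock-file idea, assuming a crontab entry like * * * * * php /path/to/scraper.php; the paths, the batch size, and the getNextDomains()/scrapeAndAnalyze() helpers are hypothetical:

<?php
// scraper.php - sketch only: process a small batch per cron run, guarded by a .lock file.
$lockFile = '/tmp/scraper.lock';

if (file_exists($lockFile)) {
    exit("Previous run still in progress\n"); // cron fired before the last run finished
}
touch($lockFile);

try {
    foreach (getNextDomains(50) as $domain) { // hypothetical helper: next 50 unprocessed domains
        scrapeAndAnalyze($domain);            // hypothetical helper: fetch and analyze one domain
    }
} finally {
    unlink($lockFile);                        // always remove the lock, even on errors (PHP 5.5+)
}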
The problem with cron jobs is that they can end up over-running, and so have more than one copy running at the same time. If you are running multiple copies from cron at once, there will be a huge load spike, but there might not be anything running for the last 30 seconds of every minute. (Trust me, I've seen it happen; it was not pretty.)
A simple shell script can be set running easily with normal Linux startup mechanisms and will then loop forever. Here, I've added the ability to check the exit code of a PHP script (or whatever) to exit the loop. Add other checks to deliberately slow down execution. Here's my blog post on the subject.
I would arrange for the script to run somewhere between 10 and 50 domain scrapes per invocation, and then exit, ready to run again, until you run out of data to look for or some other issue happens that requires attention.
#!/bin/bash
# A shell script that keeps looping until a specific exit code is given.
# Start it from /etc/init.d, or SupervisorD, for example.
# It will restart itself until the script it calls returns a given exit code.
nice php -q -f ./cli-worker.php -- "$@"
ERR=$?
# if php does an `exit(99);` ...
if [ $ERR -eq 99 ]
then
    # planned complete exit
    echo "99: PLANNED_SHUTDOWN"
    exit 0
fi
sleep 1
# Call ourself, replacing the script without a sub-call
exec "$0" "$@"
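For completeness, a hedged sketch of what ./cli-worker.php might look like so that the exit(99) convention above makes sense; the batch size and the helper functions are hypothetical:

<?php
// cli-worker.php - sketch only: do one small batch of work, then exit.
// exit(99) tells the wrapper script above to stop looping.
$batch = getNextDomains(25);         // hypothetical helper: next batch of domains

if (empty($batch)) {
    exit(99);                        // nothing left to do: planned shutdown
}

foreach ($batch as $domain) {
    scrapeAndAnalyze($domain);       // hypothetical helper: fetch and analyze one domain
}

exit(0);                             // normal exit: the wrapper sleeps 1s and restarts us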
This is more a question about nohup than my PHP script, although I will include code for you guys to see. I am using a script that is designed to never end, meaning script termination should never take place. In an ideal world the script would run forever. This is achieved with <?php while (true) {} ?>, which I am led to believe is the correct way of doing this?
However, I am finding my script is terminating for unknown reasons every few days. The longest the script has run for is 4 days. I am left baffled and unable to reproduce test case scenarios without having the output from the process at the time of termination. Does nohup allow you to see what happens when the process terminates?
I can see the process running when I do ps aux, and once the script has finished execution it disappears from the ps aux list, suggesting that the problem is with the shell environment the script is run in rather than any portion of my code?
Can anybody help? Any debugging tools for this would be appreciated.
EDIT: I am looking for tools to debug this scenario; any help appreciated.
The problem here was with MySQL. MySQL needed to be configured not to drop the connection after a long idle period (or the script needs to re-establish the connection periodically).
Use SHOW VARIABLES LIKE 'wait_timeout'; to see what your setup is configured for.
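A minimal sketch of the "re-establish periodically" option, assuming mysqli and a long-running loop; the credentials and the per-iteration work are placeholders:

<?php
// Sketch only: ping the server before each batch and reconnect if the
// connection was dropped after wait_timeout seconds of inactivity.
$db = new mysqli('localhost', 'user', 'pass', 'mydb');

while (true) {
    if (!$db->ping()) {                                        // link was dropped
        $db = new mysqli('localhost', 'user', 'pass', 'mydb'); // reconnect
    }
    // ... do one batch of work with $db here ...
    sleep(60);
}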
Here is my top output:
6747 example.com 15 0 241m 19m 8188 R 29.9[CPU] 0.2[RAM] 3:24.89 php
31798 mysql 15 0 1952m 422m 5384 S 12.0[CPU] 5.3[RAM] 9:00.27 mysqld
Look at PHP's CPU usage. I don't know which file is causing this load. Is there any module to show which file is using the CPU, broken down in a tree? I mean like this:
6747 example.com 15 0 241m 19m 8188 R 29.9[CPU] 0.2[RAM] 3:24.89 php
5.49% index.php
15.39% videos.php
x% y.php
31798 mysql 15 0 1952m 422m 5384 S 12.0[CPU] 5.3[RAM] 9:00.27 mysqld
and I want the same thing for MySQL too. I want to know which query is executing right now and how long it takes.
The simplest options (not saying the most accurate, but the simplest) will be:
For PHP, if you want to compare one PHP file against another, assume that the request that takes the longest is using the most CPU. Use a network profiler (built into Chrome and IE, or Firebug) to find out which request takes longest. Failing that, use Fiddler (on Windows) or Charles (Mac). Remove the data-transfer times and average out the rest: time to generate the response is roughly your CPU usage. Note that this will include external calls, memcached calls, MySQL calls, etc.
For MySQL, the slow query log is invaluable. (It will need to be configured in my.cnf and the server restarted, but it should be considered compulsory.)
For more diagnostics, Xdebug (as suggested by Xesued) will help you profile individual parts of scripts (as well as debug). Not recommended for production, though, as it will slow you down further.
Another crude way is to "echo microtime(true);" at various places in your script, or pump that info to a log file. (Open the log file at the start of the script, and at various points record the microtime; look for the large gaps.) A sketch follows below.
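A sketch of that crude timing approach; the log path and the checkpoint labels are placeholders:

<?php
// Sketch only: write elapsed-time checkpoints to a log and look for big gaps.
$log = fopen('/tmp/timing.log', 'a');
$t0  = microtime(true);

function checkpoint($label) {
    global $log, $t0;
    fprintf($log, "%-20s %.4f s since start\n", $label, microtime(true) - $t0);
}

checkpoint('start');
// ... load config, connect to the database ...
checkpoint('after db connect');
// ... run the main query ...
checkpoint('after main query');
// ... render the page ...
checkpoint('end');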
I don't know how to determine PHP script load at runtime. (Not sure if it is even possible.) However, using Xdebug's profiler can help you determine where your scripts are slow.
Xdebug Profiler
It's a great tool that will help you catch 90+% of your slowdowns.
As far as MySQL: Here is a related question: how-to-analyze-query-cpu-time
I don't think there's anything quite that direct available, but assuming each of those scripts runs as a separate process (e.g. called individually on the CLI), you could use getmypid() and then correlate that with your CPU load log.
Currently I'm trying to build a good scheduler system as an interface for setting and editing cron jobs on my system. My system is built using Zend Framework 1.11.11 on a Linux server.
I have 2 main problems that I want your suggestions on:
Problem 1: The setup of the application itself
I have 2 ways to run the cron job:
The first way is to create a scripts folder and create a common bootstrap file in it where I'll load only the resources that I need. Then for each task I'll create a separate script, and in each script I'll include the bootstrap file. Finally, I'll add a cron task in the crontab file for each one of these scripts, and the task will be something like * * * * * php /path/to/scripts/folder/cronScript_1.php.
The second way is to treat the cron job like a normal request (no special bootstrap): add a cron task in the crontab file for each one of these scripts, and the task will be something like * * * * * curl http://www.mydomain.com/module/controller/action.
Problem 2: The interface to the application
Adding a cron job also can be done in 2 ways:
For each task there will be an entry in the crontab file. When I want to add a new task I must do it via cPanel or any other means of editing the crontab (which might not be available).
Store the tasks in the database and provide a UI for interacting with the database (a grid to add tasks and configuration). After that, write only one cron job in the crontab file that runs every minute. This job will select all jobs from the database and check if there is a job that should run now (the time for the tasks will be stored and compared with the current time of the server).
In your opinion, which way is better to implement for each part? Is there a ready-made solution for this that is better in general?
Note
I came across Quartz while searching for a ready-made solution. Is this what I'm looking for or is it something totally different?
Thanks.
Just my opinion, but I personally like both 1 & 2, depending on what your script is intending to accomplish. For instance, we mostly do 1 with all of our cron entries, as it becomes really easy to look at /etc/crontab and see at a glance when things are supposed to run. However, there are times when a script needs to be called every minute because logic within the script will then figure out what to run in that exact minute. (e.g. millions of users that need to be processed continually, so you have a formula for which users to process in each minute of the hour)
Also take a look at Gearman (http://gearman.org/). It enables you to have cron scripts running on one machine that then slice up the jobs into smaller bits and farm those bits out to other servers for processing. You have full control over how far you want to take the map/reduce aspect of it. It has helped us immensely and allows us to process thousands of algorithm scripts per minute. If we need more power we just spin up more "workhorse" nodes and Gearman automatically detects and utilizes them.
We currently do everything on the command line and don't use cPanel, Plesk, etc. so I can't attest to what it's like editing the crontab from one of those backends. You may want to consider having one person be the crontab "gatekeeper" on your team. Throw the expected crontab entries into a non web accessible folder in your project code. Then whenever a change to the file is pushed to version control that person is expected to SSH into the appropriate machine and make the changes. I am not sure of your internal structure so this may or may not be feasible, but it's a good idea for developers to be able to see the way(s) that crontab will be executing scripts.
For Problem 2 (the interface to the application), I've used both methods 1 & 2. I strongly recommend the second one. It will take quite a bit more upfront work creating the database tables and building the UI. In the long run, though, it will make it much easier to add new jobs to be run. I built the UI for my current company and it's so easy to use that non-technical people (accountants, warehouse supervisors) are able to go in and create jobs.
Much easier than logging onto the server as root, editing crontab, remembering the patterns and saving. Plus you won't be known as "The crontab guy" who everyone comes to whenever they want to add something to crontab.
As for setting up the application itself, I would have cron call one script and have that script run the rest. That way you only need one cron entry. Just be aware that if running the jobs takes a long time, you need to make sure that the script only starts running if there are no other instances running; otherwise you may end up with the same job running twice. A sketch of such a dispatcher follows below.
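A hedged sketch of such a dispatcher, run from a single * * * * * crontab entry; the jobs table, its columns, and the credentials are all hypothetical:

<?php
// dispatcher.php - sketch only: run every minute, execute whatever is due.
$lock = fopen('/tmp/dispatcher.lock', 'c');
if (!flock($lock, LOCK_EX | LOCK_NB)) {
    exit; // a previous dispatcher run is still busy
}

$db  = new PDO('mysql:host=localhost;dbname=scheduler', 'user', 'pass');
$due = $db->query("SELECT id, command FROM jobs WHERE next_run <= NOW()");

foreach ($due as $job) {
    passthru($job['command'], $exitCode);   // run the job, capture its exit code
    $db->prepare("UPDATE jobs
                     SET last_exit = ?,
                         next_run  = DATE_ADD(NOW(), INTERVAL run_interval MINUTE)
                   WHERE id = ?")
       ->execute([$exitCode, $job['id']]);
}

flock($lock, LOCK_UN);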
I have a PHP script launched from the command line (CLI mode, no webserver involved).
In this script I launch a command that will run for some time, usually exiting after a couple of minutes. But at times, this command will run for hours because of various issues, and the best I can do is kill it and wait for a while before launching it again.
Two things I want to emphasize:
I have no control of the code inside that command, and can't improve failure detection.
It's not an important task, and it's perfectly ok to have it working that way.
That being said, I would like to improve things in my code, so that I can kill the child process if the command has been running for more than N seconds. But I still want to get the return code from the command, when it runs fine.
Pseudo-code should be something like this:
Launch command
While command is running
{
    If the command is done running
    {
        echo return code
    }
    else
    {
        If the command has been running for more than N seconds
        {
            Kill the child process
        }
    }
}
How would you implement this in PHP ?
Thank you!
Solution: I ended up using the SIGALRM signal. More info on signal handling and the pcntl lib in the pages provided by Gordon in his post.
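The accepted route above is SIGALRM via pcntl; as an alternative that mirrors the pseudocode more literally, here is a hedged sketch using a plain proc_open() polling loop. The command, the output files, and the 120-second limit are placeholders:

<?php
// Sketch only: run a command, poll it, and kill it after $timeout seconds.
$timeout = 120;
$spec = [
    1 => ['file', '/tmp/cmd.out', 'a'],   // child stdout goes to a file
    2 => ['file', '/tmp/cmd.err', 'a'],   // child stderr goes to a file
];
$proc  = proc_open('./long-command', $spec, $pipes);
$start = time();

while (true) {
    $status = proc_get_status($proc);
    if (!$status['running']) {
        echo "exit code: " . $status['exitcode'] . "\n";   // command finished on its own
        break;
    }
    if (time() - $start > $timeout) {
        proc_terminate($proc, 9);                          // SIGKILL the child process
        echo "killed after {$timeout}s\n";
        break;
    }
    usleep(200000); // poll five times per second
}
proc_close($proc);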
Read through this Chapter of Practical PHP Programming.
It covers interoperating with processes extensively. Also have a look at the PHP Manual's pages on Process Control.
You might also be interested in Gearman
Sorry for not providing any code. I never had the need for the Process Control stuff, so I just kept in mind where to look in case I'd ever need it.
I have a cron job, a PHP script which is called once every 5 minutes. I need to be sure that the previously called PHP script has finished execution; I do not want to mix data that's being processed.
There are three approaches I have used:
Creating an auxiliary text file which contains a running-state flag. The executed script analyzes the contents of the file and exits if the flag is set to true. It's the simplest solution, but every time I create such a script, I feel that I'm reinventing the wheel. Are there any well-known patterns or best practices which would satisfy most needs?
Adding a UNIX service. This approach works well for cron jobs, but it's more time-consuming to develop and test a UNIX service: good bash scripting knowledge is required.
Tracking processes using a database. A good solution, but sometimes database usage is not encouraged, and again, I do not want to reinvent the wheel; I hope there is already a good, flexible solution.
Maybe you have other suggestions on how to ensure single execution of PHP scripts? I would be glad to hear your thoughts.
I'd recommend using the file locking mechanism. You create a text file, and you make your process lock it exclusively (see PHP's flock() function: http://us3.php.net/flock). If it fails to acquire the lock, then you exit because there is another instance running.
The advantage of using file locking is that if your PHP script dies unexpectedly or gets killed, it will automatically release the lock. This will not happen if you use a plain text file for the status (if the script is set to update this file at the end of execution and it terminates unexpectedly, you will be left with stale data).
http://php.net/flock with LOCK_EX should be enough in your case.
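A minimal flock() sketch of that, with the lock path as a placeholder:

<?php
// Sketch only: exclusive, non-blocking lock; a crashed script releases it automatically.
$fp = fopen('/tmp/my_script.lock', 'c');

if (!flock($fp, LOCK_EX | LOCK_NB)) {
    exit("Already running\n");   // another instance holds the lock
}

// ... do the actual 5-minute job here ...

flock($fp, LOCK_UN);             // optional: the lock is also released when the script ends
fclose($fp);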
You could check whether or not your script is currently running using the ps command, helped by the grep command. "man ps" and "man grep" will tell you all about these Unix/Linux commands if you need information about them.
Let's assume your script is called 'my_script.php'. This Unix command:
ps aux | grep my_script.php
...will tell you if your script is running. You can run this command with shell_exec() at the start of your script, and exit() if it's already running; a sketch follows below.
The main advantage of this method is that it can't be fooled the way a flag file can: if the script crashes, a flag file may be left behind in a state that makes you think it's still running.
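A sketch of that check from inside the script itself; the [m] in the grep pattern keeps grep from matching its own command line, and the script name is a placeholder:

<?php
// Sketch only: count how many my_script.php processes are visible to ps.
$count = (int) shell_exec("ps aux | grep '[m]y_script.php' | wc -l");

// One match is this very process; more than one means another copy is running.
if ($count > 1) {
    exit("my_script.php is already running\n");
}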
I'd stick with option number 1. It's simple and it works. As long as you only want to check whether the script has finished or not, it should be sufficient. If more complex data is to be remembered, I'd go for option 3 in order to be able to 'memorize' the relevant data...
hth
K