I've got a sever which has an action triggered by a frequent cron job.
This is a php application build on Silverstripe (4.0).
The issue I'm facing is the php processes stay alive and also keep database connections open. This means after a few days the site stops working entirely once SQL stops accepting new connections.
The system has two tasks on cron jobs;
One takes a massive CSV file and spits it into smaller sub files which are then imported into the database. This one uses a lock file to prevent it running into a previously running instance. I'm not too sure if this is working.
The second task processes all the records which have been updated in large batches.
Either of these could be the source of the overloading but I'm not sure how to narrow it down.
What's the best way to diagnose the source of the issue?
In terms of debugging, this would be like any other task profiling the application with something like xdebug and kcachegrind. To ensure that processes do not run for too long you can limit the max_execution_time for the PHP.ini for the CLI.
To then let the CLI process run for a long time, but only just enough time add something to set the max execution time on a per row basis:
$allowed_seconds_per_row = 3;
foreach($rows_to_process as $row){
set_time_limit($allowed_seconds_per_row);
$this->process($row);
}
You can also register a shutdown function to record the state as the script ends.
It is likely that the memory is a key cause for failure and debugging focused on the memory usage and that can be controlled by unsetting variable data as needed.
Related
I developed a site using Zend Framework 2. It is basically a price comparison site that integrates with many of the top affiliate networks out there. I wrote a script that checks prices from each affiliate network, and then updates my local DB with that price. Depending on which affiliate network I am contacting, I may be making an API call (Amazon or CJ.com), or I may be looking at an XML product feed (Pepperjam or LinkShare). The XML product feed would be hosted locally.
At present, there are around 3,500 sku's that I am checking with this script. The vast majority of them (95%+) are targeting an XML product feed. I would estimate that this script should probably take in the neighborhood of 10 minutes to complete. Some of the XML files I am looking at are around 8 MB in size.
I have tested this script thoroughly in my local environment and taken great lengths to make sure that there is no memory leak or something of that nature which would cause performance issues. As an example, I made sure to use data streams where possible to avoid putting the XML file in memory over and over, etc. Suffice to say, the script runs locally without issue.
This script is intended to be run as a cron job, however I do have a way to trigger it via the secure admin interface ad-hoc. Locally, this is how I initiate the script to run, and everything goes rather smoothly.
When I deploy my code to the shared hosting account, I am having all sorts of problems. In order to troubleshoot, I attached logging to various stages of this script to track when it starts, how it progresses, and when each step completes, etc. All of this is being logged to a MySQL database.
Problem #1: If I run the script ad-hoc via an HTTP request, I find that it will run for a couple minutes, and then the script starts again (so there are now two instance apparently running). Wait another couple minutes, and a third one will start, etc..... Here is an example when I triggered the script to run at 10:09pm via an HTTP request.
Screenshot of process manager
Needless to say, I DO NOT run it via an HTTP request because it only serves to get me in trouble with my web hosting provider :)
Problem #2: When the script runs on the server, triggered via a cron job, it is failing to complete. I have taken the production copy of the database and taken it locally along with the XML files, it runs fine. So it should not be a problem with bad data exposing bad code. My observation is - the script nearly runs for the exact same amount of time - before aborts, or is terminated, or whatever. The last record updated is generally timestamped around 4 minutes and 30 seconds or so (if memory serves) after the script is triggered. The SKU list is constantly changing so the record that it ends on differs, but the the time of the last update is nearly the same each time. Nothing is being logged in the error logs. I monitored server resources via SSH top command and there is nothing out of the ordinary. CPU usage is in check and memory used does not go up.
I have a shared hosting account through Bluehost. My thoughts were that perhaps it was a script max execution time issue. I extended the max execution time in the script itself and via php.ini. Made no difference.
So I guess what I am looking for is some fresh ideas of where to go next. What questions should I be asking my hosting company so they can help me get to the bottom of this. They are only somewhat helpful to say the least. Could it be some limitation on my hosting account? Triggering some sort of automatic monitor that is killing the script? What types of Apache settings could be problematic for a script of this nature? PHP.ini settings? Absolutely any input you can provide would be helpful.
And why, when triggered via HTTP, would it keep spinning up new instances? I guess I could live w/o running it manually, and only run it via a cron job, but that isn't working either. So .... interested in hearing the communities thoughts on this. Thanks!
I haven't seen your script, neither did I work with your hoster, so everything below is just a guess - and a suggestion.
Given your description, I would say you're right that your script might have been killed by timeout when run from cron. I'm not sure why it keeps spawning new instances of your script when you execute it manually via an HTTP request, but it may also be related to a timeout (e.g. if they have a logic that restarts a script if it has not produced an output within a certain time, or something like that).
You can follow up with your hosting provider about running long-running (or memory-consuming) script in their environment, and they might have some FAQ or document already written that covers this topic.
Let me suggest an option for you in case if your provider is unable to help.
From what you said, I expect your script runs an SQL query to get a list of SKUs, and then slowly iterates over this list, performing some job on every item (and eventually dies for whatever reason, as we learned).
How about if you create a temporary table (or file - just any kind of persistent storage on the server) that would save the last processed record ID of the script, or NULL if the script successfully completed. That way you'll be able to make your script start with the last processed record (if the last processed record had id = 1000, add ... WHERE id > 1000 to the main query that fetches SKUs), and you won't really care if the script completed its first attempt or not (if not, it will keep processing from that very point when it was killed, on its second try).
Alternatively, to extend this approach, you can limit one invocation to the certain amount of records to process (e.g. 100 or 1000), again, saving the last processed record ID in the database or somewhere else.
The main idea is: if the script fails to process all SKUs at once, just make it restartable so that it does not lose its progress.
I have about 35 cron jobs right now. Most of them are PHP scripts that either scrape or do some calculations. The scripts also loop over 10-20 different servers to do those scrapes. (They are different countries so they have to be separate calls).
So we have 30 scripts, each has a loop over 20 servers and therefore take about 5-15 minutes to run per script. I have each script spaced out right now.
But is it better to have 80 individual scripts run instead of 35 scripts that loop and take a while? Each script would take maybe 1-2 minutes instead of 10-15min.
That would of course spawn a ton more PHP processes. Is there any issue or limit with 10-15 or more PHP processes running at once?
I'm running a cloud server performance on Rackspace.
Personally if the jobs need to complete in a certain order I would make it as linear as possible.....it might take longer but I always err . The side of data accuracy.
It depends.
If you are creating more processes that will be running at the same time you are going to increase your overall memory footprint. Each process will carry it's own overhead of memory for the process to run, and to load any libraries needed for it's process. (aside from whatever it needs to do whatever it does). You will also more than have twice as many script to monitor that they are successfully running all the time.
However in creating more processes you will be able to speed things us since you are essentially creating a multi-thread. Allowing one process to continue while another is blocking waiting for i/o.
If each script doesn't have a dependency on another, breaking them into smaller scripts should be fine. If you can handle monitoring more scripts, and the server can handle it, then I would do it.
If scripts do have dependencies, or if you would have to run so many at the same time you server usage maxes out, keep them together.
That being said, I would also try to optimize the script, make sure there isn't something you can do to make them faster without create more processes.
Depending on how you have the servers setup, I would run them at once. In addition, I would also run them at night, off hours when the web servers aren't in use and not during business operations unless your web app depends on it. If you're on a Cloud server on Rackspace I wouldn't worry about bandwidth although increasing your ram could be an issue further down the road.
Spawning a ton more PHP process shouldn't be a worry if you have sufficient amount of ram; there is no limitation on the linux side.
a) Figure out which cron needs to run in which order
b) Order the cron to be run at night, around mid-night
c) Run and fireoff the 80 scripts at once
it would also be a good idea to send you an email with cron results or report that it all went through successfully, based on the batch but not individual cron.
I have a script that updates my database with listings from eBay. The amount of sellers it grabs items from is always different and there are some sellers who have over 30,000 listings. I need to be able to grab all of these listings in one go.
I already have all the data pulling/storing working since I've created the client side app for this. Now I need an automated way to go through each seller in the DB and pull their listings.
My idea was to use CRON to execute the PHP script which will then populate the database.
I keep getting Internal Server Error pages when I'm trying to execute a script that takes a very long time to execute.
I've already set
ini_set('memory_limit', '2G');
set_time_limit(0);
error_reporting(E_ALL);
ini_set('display_errors', true);
in the script but it still keeps failing at about the 45 second mark. I've checked ini_get_all() and the settings are sticking.
Are there any other settings I need to adjust so that the script can run for as long as it needs to?
Note the warnings from the set_time_limit function:
This function has no effect when PHP is running in safe mode. There is no workaround other than turning off safe mode or changing the time limit in the php.ini.
Are you running in safe mode? Try turning it off.
This is the bigger one:
The set_time_limit() function and the configuration directive max_execution_time only affect the execution time of the script itself. Any time spent on activity that happens outside the execution of the script such as system calls using system(), stream operations, database queries, etc. is not included when determining the maximum time that the script has been running. This is not true on Windows where the measured time is real.
Are you using external system calls to make the requests to eBay? or long calls to the database?
Look for particularly long operations by profiling your php script, and looking for long operations (> 45 seconds). Try to break those operations into smaller chunks.
Well, as it turns out, I overlooked the fact that I was testing the script through the browser. Which means Apache was handling the PHP process, which was executed with mod_fcgid, which had a timeout of exactly 45 seconds.
Executing the script directly from shell and CRON works just fine.
I am developing a website that requires a lot background processes for the site to run. For example, a queue, a video encoder and a few other types of background processes. Currently I have these running as a PHP cli script that contains:
while (true) {
// some code
sleep($someAmountOfSeconds);
}
Ok these work fine and everything but I was thinking of setting these up as a deamon which will give them an actual process id that I can monitor, also I can run them int he background and not have a terminal open all the time.
I would like to know if there is a better way of handling these? I was also thinking about cron jobs but some of these processes need to loop every few seconds.
Any suggestions?
Creating a daemon which you can make calls to and ask questions would seem the sensible option. Depends on wether your hoster permits such things, especially if you're requiring it to do work every few seconds, then definately an OS based service/daemon would seem far more sensible than anything else.
You could create a daemon in PHP, but in my experience this is a lot of hard work and the result is unreliable due to PHP's memory management and error handling.
I had the same problem, I wanted to write my logic in PHP but have it daemonised by a stable program that could restart the PHP script if it failed and so I wrote The Fat Controller.
It's written in C, runs as a daemon and can run PHP scripts, or indeed anything. If the PHP script ends for whatever reason, The Fat Controller will restart it. This means you don't have to take care of daemonising or error recovery - it's all handled for you.
The Fat Controller can also do lots of other things such as parallel processing which is ideal for queue processing, you can read about some potential use cases here:
http://fat-controller.sourceforge.net/use-cases.html
I've done this for 5 years using PHP to run background tasks and its no different to doing in any other language. Just use CRON and lock files. The lock file will prevent multiple instances of your script running.
Also its important to monitor your code and one check I always do to prevent stale lock files from preventing scripts to run is to have second CRON job to check if if the lock file is older than a few minutes and if an instance of the PHP script is running, if not it then removes the lock file.
Using this technique allows you to set your CRON to run the script every minute without issues.
Use the System::Daemon module from PEAR.
One solution (that I really need to try myself, as I may need it) is to use cron, but get the process to loop for five mins or so. Then, get cron to kick it off every five minutes. As one dies, the next one should be finishing (or close to finishing).
Bear in mind that the two may overlap a bit, and so you need to ensure that this doesn't cause a clash (e.g. writing to the same video file). Some simple inter-process communication may be useful, even if it is just writing to a PID file in the temp directory.
This approach is a bit low-tech but helps avoid PHP hanging onto memory over the longer term - sort of in-built task restarts!
I have a PHP-script running on my server via a cronjob. The job runs every minute. In the php script i have a loop that executes, then waits one sevond and loops again. Essentially creating a script to run once every second.
Now I'm wondering, if i make the cronjob run only once per hour and have the script still loop for an entire hour or possible an entire day.. Would this have any impact on the servers cpu and or memory and if so, will it be positive or negative?
I spot a design flaw.
You can always have a PHP script permanently running in a loop performing whatever functionality you require, without dependency upon a webserver or clients.
You are obviously checking something with this script, any incites into what? There may be better solutions for you. For example if it is a database consider SQL triggers.
In my opinion it would have a negative impact. since the scripts keeps using resources.
cron is called on a time based scale that is already running on the server.
But cronjob can only run once a minute at most.
Another thing is if the script times out, fails, crashes for whatever reason you end up with not running the script for at max one hour. Would have a positive impact on server load but not what you're looking for i guess? :)
maybe run it every 2 or even 5 minutes to spare server load?
OR maybe change the script so it does not wait but just executes once and calling it from cron job. should have a positive impact on server load.
I think you should change script logic if it is possible.
If tasks your script executes are not periodic but are triggered by some events, the you can use some Message Queue (like Gearman).
Otherwise your solution is OK. Memory leaks can occurs, but in new PHP versions (5.3.x) Garbage Collector is pretty good. Some extensions can lead to memory leaks. Or your application design can lead to hungry memory usage (like Doctrine ORM loaded objects cache).
But you can control script memory usage by tools like monit and restart your script when mempry limit reaches some point or start script again when your script unexpectedly shuts down.