I'm currently running XAMPP on Mac OS X and I have the following scenario and problem.
Let's say I have two VirtualHosts: a.dev and b.dev.
a.dev has a cron job that takes some time. While it runs I want to keep developing on other sites, but the cron job on a.dev blocks everything and I can't access any page on that host anymore. However, b.dev works without problems, so the issue seems limited to that one host.
Or could the database be causing the problem, by locking tables somehow? But then I wouldn't be able to access the phpMyAdmin page for that database either, right?
Your cron job is probably eating all the resources allocated to the PHP script on a.dev. You need to raise the limits in your php.ini or improve the script:
you delete (unset) the variables you no longer need in order to free memory
if your script contains a loop, you can call sleep() after each iteration to limit CPU usage
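For illustration, a minimal sketch of both ideas (the processing and saving functions are hypothetical placeholders):

// Hypothetical batch loop: free memory and yield CPU between iterations.
foreach ($items as $item) {
    $result = processItem($item);   // placeholder for the real work
    saveResult($result);            // placeholder for the DB write
    unset($result);                 // free memory we no longer need
    sleep(1);                       // yield the CPU so other requests get served
}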
Related
I developed a site using Zend Framework 2. It is basically a price comparison site that integrates with many of the top affiliate networks out there. I wrote a script that checks prices from each affiliate network, and then updates my local DB with that price. Depending on which affiliate network I am contacting, I may be making an API call (Amazon or CJ.com), or I may be looking at an XML product feed (Pepperjam or LinkShare). The XML product feed would be hosted locally.
At present, there are around 3,500 SKUs that I am checking with this script. The vast majority of them (95%+) are targeting an XML product feed. I would estimate that this script should take in the neighborhood of 10 minutes to complete. Some of the XML files I am looking at are around 8 MB in size.
I have tested this script thoroughly in my local environment and gone to great lengths to make sure that there is no memory leak or anything of that nature which would cause performance issues. As an example, I made sure to use data streams where possible to avoid loading the XML files into memory over and over, etc. Suffice it to say, the script runs locally without issue.
This script is intended to be run as a cron job; however, I do have a way to trigger it ad hoc via the secure admin interface. Locally, this is how I initiate the script, and everything goes rather smoothly.
When I deploy my code to the shared hosting account, I am having all sorts of problems. In order to troubleshoot, I attached logging to various stages of this script to track when it starts, how it progresses, and when each step completes, etc. All of this is being logged to a MySQL database.
Problem #1: If I run the script ad hoc via an HTTP request, I find that it will run for a couple of minutes, and then the script starts again (so there are now apparently two instances running). Wait another couple of minutes and a third one will start, and so on. Here is an example from when I triggered the script to run at 10:09pm via an HTTP request.
[Screenshot of the process manager]
Needless to say, I DO NOT run it via an HTTP request because it only serves to get me in trouble with my web hosting provider :)
Problem #2: When the script runs on the server, triggered via a cron job, it fails to complete. I have taken the production copy of the database, along with the XML files, and run the script locally, and it runs fine, so it should not be a case of bad data exposing bad code. My observation is that the script runs for nearly the exact same amount of time before it aborts, is terminated, or whatever. The last record updated is generally timestamped around 4 minutes and 30 seconds or so (if memory serves) after the script is triggered. The SKU list is constantly changing, so the record it ends on differs, but the time of the last update is nearly the same each time. Nothing is being logged in the error logs. I monitored server resources via the SSH top command and there is nothing out of the ordinary: CPU usage is in check and memory usage does not go up.
I have a shared hosting account through Bluehost. My thought was that perhaps it was a max execution time issue, so I extended the max execution time both in the script itself and via php.ini. It made no difference.
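For reference, the in-script change was roughly this (a sketch of that kind of change, not the exact code):

// Attempt to lift PHP's execution time limit for this run.
set_time_limit(0);
ini_set('max_execution_time', '0');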
So I guess what I am looking for is some fresh ideas on where to go next. What questions should I be asking my hosting company so they can help me get to the bottom of this? They are only somewhat helpful, to say the least. Could it be some limitation on my hosting account? Am I triggering some sort of automatic monitor that is killing the script? What types of Apache settings could be problematic for a script of this nature? php.ini settings? Absolutely any input you can provide would be helpful.
And why, when triggered via HTTP, would it keep spinning up new instances? I guess I could live without running it manually and only run it via a cron job, but that isn't working either. So... I'm interested in hearing the community's thoughts on this. Thanks!
I haven't seen your script, nor have I worked with your host, so everything below is just a guess and a suggestion.
Given your description, I would say you're right that your script might have been killed by a timeout when run from cron. I'm not sure why it keeps spawning new instances of your script when you execute it manually via an HTTP request, but that may also be related to a timeout (e.g. if they have logic that restarts a script when it has not produced any output within a certain time, or something like that).
You can follow up with your hosting provider about running long-running (or memory-consuming) scripts in their environment; they might already have an FAQ or document that covers this topic.
Let me suggest an option in case your provider is unable to help.
From what you said, I expect your script runs an SQL query to get a list of SKUs, and then slowly iterates over this list, performing some job on every item (and eventually dies for whatever reason, as we learned).
How about creating a table (or a file, or any other kind of persistent storage on the server) that stores the ID of the last record the script processed, or NULL once the script has completed successfully? That way you can make the script resume from the last processed record (if the last processed record had id = 1000, add ... WHERE id > 1000 to the main query that fetches SKUs), and you won't really care whether a single run completes or not: if it doesn't, the next run will pick up from the exact point where the previous one was killed.
Alternatively, to extend this approach, you can limit each invocation to a certain number of records (e.g. 100 or 1,000), again saving the last processed record ID in the database or elsewhere.
The main idea is: if the script fails to process all SKUs at once, just make it restartable so that it does not lose its progress.
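A minimal sketch of that checkpoint idea, assuming a PDO connection in $pdo and made-up table and column names:

// Resume from the last processed SKU id stored in a one-row checkpoint table.
$last = (int) $pdo->query('SELECT last_id FROM import_checkpoint')->fetchColumn();

// Fetch only the SKUs that have not been processed yet, in manageable batches.
$stmt = $pdo->prepare('SELECT id, sku FROM skus WHERE id > ? ORDER BY id LIMIT 1000');
$stmt->execute([$last]);

foreach ($stmt as $row) {
    updatePrice($row['sku']);   // placeholder for the real price check / API call

    // Persist progress so a killed run can resume where it stopped.
    $pdo->prepare('UPDATE import_checkpoint SET last_id = ?')->execute([$row['id']]);
}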
I've got a server which has an action triggered by a frequent cron job.
This is a PHP application built on SilverStripe (4.0).
The issue I'm facing is that the PHP processes stay alive and also keep their database connections open. This means that after a few days the site stops working entirely once the SQL server stops accepting new connections.
The system has two tasks on cron jobs:
The first takes a massive CSV file and splits it into smaller sub-files, which are then imported into the database. It uses a lock file to prevent it from colliding with a previously started instance, though I'm not too sure whether this is working (a rough sketch of the pattern is below).
The second task processes, in large batches, all the records that have been updated.
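A rough sketch of the lock-file pattern mentioned above (the path and task call are placeholders, not the actual code):

// Take an exclusive, non-blocking lock; bail out if another run already holds it.
$fp = fopen('/tmp/csv-import.lock', 'c');
if (!flock($fp, LOCK_EX | LOCK_NB)) {
    echo "Previous import still running, exiting.\n";
    exit(0);
}

runCsvImport();   // placeholder for the actual import task

// Release the lock once the task has finished (it is also released if the process dies).
flock($fp, LOCK_UN);
fclose($fp);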
Either of these could be the source of the overloading but I'm not sure how to narrow it down.
What's the best way to diagnose the source of the issue?
In terms of debugging, this would be like any other task: profile the application with something like Xdebug and KCachegrind. To ensure that processes do not run for too long, you can limit max_execution_time in the php.ini used for the CLI.
To then let the CLI process run for a long time, but only just long enough, set the time limit on a per-row basis:
$allowed_seconds_per_row = 3;
foreach ($rows_to_process as $row) {
    // Reset the timer on every iteration, so each row gets at most
    // $allowed_seconds_per_row seconds instead of sharing one global limit.
    set_time_limit($allowed_seconds_per_row);
    $this->process($row);
}
You can also register a shutdown function to record the state as the script ends.
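A minimal sketch of such a shutdown hook (where you log to is up to you):

// Record how far the task got when the process exits, normally or otherwise.
register_shutdown_function(function () {
    $error = error_get_last();   // non-null if PHP died with a fatal error
    error_log(sprintf(
        'Task ended at %s, peak memory %.1f MB, last error: %s',
        date('c'),
        memory_get_peak_usage(true) / 1048576,
        $error ? $error['message'] : 'none'
    ));
});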
It is likely that memory is a key cause of the failures, so focus your debugging on memory usage; it can be kept under control by unsetting variable data as you go.
I'm creating a plugin for a CMS and need one or more periodic tasks to run in the background. As it is a plugin for an open-source CMS, a cron job is not a perfect solution because users may not have access to cron on their server.
I'm going to start an infinite loop via an AJAX request and then abort the XHR request, so the HTTP connection is closed but the script continues running.
Is this a good solution in general? What about server resources? Are there any shutdown or limitation policies in servers (such as Apache) for long-running threads?
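For reference, the pattern I have in mind is roughly this (a sketch; the task function is a placeholder):

// Keep the script alive after the client aborts the XHR request.
ignore_user_abort(true);
// Remove PHP's execution time limit for this request.
set_time_limit(0);

while (true) {
    runPeriodicTask();   // placeholder for the plugin's background work
    sleep(60);           // pause between iterations
}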
Long-running PHP scripts are not a good idea. If your script uses session variables, your user won't be able to load any pages until the other, session-based script has closed the session.
If you really need long-running scripts, make sure they don't use the session and keep them under the maximum execution time. Don't let them run outside your control; that can cause various problems. I remember building something like that, and my server crashed several times.
Know what you want to do and make sure it's well tested on different servers.
Also search for similar modules and check what methods they use for problems like this. Learn from the pros. :)
I have a large PHP web-scraping script that logs its results to a MySQL database as it goes. The script generally runs for 5 to 10 minutes at a time.
The problem is that while this script is running, other pages of the application will not load.
The script is on a dedicated server with plenty of RAM, so I have tried increasing the allowed memory usage for MySQL and PHP, and also increased the maximum allowed connections. None of this has helped.
Does anyone have any ideas about what else I can try?
The problem is probably in your session. Try calling session_write_close() before you start the "big script".
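A minimal sketch of what that looks like (the scraper call is a placeholder):

session_start();
// ...read whatever you need from $_SESSION here...

// Release the session lock so other requests from the same user are not blocked.
session_write_close();

runScraper();   // placeholder for the long-running scraping work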
Well, there's a big difference between "slowing down" and "not loading"!
Try the following:
1. build a static HTML page and check whether it loads fine while the big script is running
2. build a PHP page that doesn't connect to the DB (just echo something) and check it while the big script is running
3. build a small PHP page that connects to the DB (just select something from a table) and check it as well
If 1 or 2 doesn't work well, your problem has something to do with the web server or server resources. If 3 doesn't work well, there could be resource issues with the MySQL server.
If everything works well, check the scraping script: does it lock any table that is needed by the main application?
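One way to check for locks while the script is running, assuming a PDO connection in $pdo (the same statements also work from the mysql command line):

// List tables that are currently in use (locked or being written to).
foreach ($pdo->query('SHOW OPEN TABLES WHERE In_use > 0') as $row) {
    printf("%s.%s in use by %d handler(s)\n", $row['Database'], $row['Table'], $row['In_use']);
}

// Show what every connection is doing, including long-running or blocked queries.
foreach ($pdo->query('SHOW FULL PROCESSLIST') as $row) {
    printf("%s | %s | %s\n", $row['Id'], $row['State'] ?? '', $row['Info'] ?? '');
}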
I built an app in PHP where a feature analyzes about 10,000 text files, extracts data from them, and puts it into a MySQL database. The code itself is just a for loop in which every file is loaded through file_get_contents() and, at the end of that iteration, unset() from memory. The file analysis is a cron job, and a single PHP file does all this processing.
However, the app was (initially) built entirely on a shared server, and there everything worked seamlessly. I didn't notice any delays or major lag, and neither did users. But in order for it to handle more load, I moved everything to an EC2 server (a micro instance).
The problem I am having now is that every time the cron job runs (processing the files on an hourly basis), it slows the entire server down so much that a normal page takes about 5-8 seconds to load, which sort of defeats the purpose of moving to EC2.
The cron itself is a very long process. Here are some test results from the hourly script run:
SQL Insertion Time: 23.138303995132 seconds
Memory Used: 10.05 MB
Execution: 411.00507092476 seconds
But at the top of every hour the server slows down badly for about 7 minutes, despite having more dedicated hardware than a shared server (I think, at least). The graphs on the EC2 dashboard show that CPU usage is close to 100%, but I don't understand how it gets to that level.
Can anyone help me determine why this could be happening? I haven't noticed even the slightest lag when the cron runs on the shared server, but the case is completely different on EC2.
Please feel free to ask me anything I missed mentioning.
Micro instances are pretty slow. If you use a larger instance, it'll run a lot faster.
We use EC2 for all of our production boxes. I can't say enough good things about that platform. I'll never go back to another host.
Also, if you want to write your code in C++, it'll run a lot faster. I wrote a simple MySQL insert with this code here. It's multi-threaded, so you can asynchronously run MySQL updates or inserts.
Please let me know if you need any help with it, but I'm sure you'll be able to just use a micro instance still and get great speeds.
Hope that helps...
PS. I'd be willing to help you write a C++ version for your uses... just because it's fun! :-)
Well, EC2 is designed to be scalable.
Since your code runs in one loop, opening each file one after another, it does not make for a scalable design.
Try changing your code to break the work up so that the files are handled concurrently by different instances of the PHP script. That way, each copy of the script can run on its own. If you have multiple servers (or server instances in EC2), you can run them on different machines to speed things up even more.
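A rough sketch of that idea, with a placeholder path and a hypothetical worker.php that takes an offset and a count:

// Dispatcher: split the file list into chunks and launch one background worker per chunk.
$files   = glob('/path/to/feeds/*.txt');   // placeholder path
$workers = 4;
$chunk   = (int) ceil(count($files) / $workers);

for ($i = 0; $i < $workers; $i++) {
    $offset = $i * $chunk;
    // Each worker processes files [$offset, $offset + $chunk) concurrently with the others.
    exec(sprintf('php worker.php %d %d > /dev/null 2>&1 &', $offset, $chunk));
}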