I have a small website / web application (HTML/jQuery/PHP/MySQL) that loads an HTML document and then simultaneously calls PHP files in the backend to fetch data sets.
For example, a contacts.php page is loaded, then AJAX calls are made to 4 PHP scripts at the same time to load different data sets:
contacts-names.php
contacts-groups.php
contacts-tags.php
contacts-locations.php
These datasets are rather small and are less than 100 rows in the DB.
Individually these scripts run fine. But when they are called from the main page (initial page load), I get memory limit errors.
If it were just one file causing the issue, I could dig in and optimize it, but every time I load the page, one or two of the above calls fail (hitting the memory limit), and they seem completely random.
I went as far as putting an exit(); at the top of two of the files (to stop them from running any code), and I would still randomly get the max resource error, sometimes on a file with the exit()! That doesn't make sense - the file isn't running any code anymore.
The only thing that seems to fix this is removing some of the calls (2 scripts), which renders my app useless.
Or setting a delayed call for 2 of the files (JS timeout).
So it seems that calling all the PHP scripts at the same time causes the memory limit issue. Is this normal? I could simply settle for the delayed-call strategy, but I wanted to know if you have had to do this as well (on limited shared hosts).
Other notes:
- I'm on a fairly good cloud Unix-type shared server.
- Even if I terminate 2 of the scripts that are called simultaneously with the other 2, I still get issues. So it must not be my code, and optimization won't do me any good.
- I profiled my scripts individually via Xdebug and cachegrind and they all seem fine.
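For what it's worth, I'm thinking of dropping something like this at the top of each of the four scripts to see which request actually dies and how much memory it really used (the log path is just an example):

<?php
// Log the peak memory of this request so the random failures can be attributed.
register_shutdown_function(function () {
    $err = error_get_last(); // also catches the fatal "memory exhausted" case
    error_log(sprintf(
        "%s peak=%.1fMB limit=%s err=%s\n",
        basename($_SERVER['SCRIPT_NAME']),
        memory_get_peak_usage(true) / 1048576,
        ini_get('memory_limit'),
        $err ? $err['message'] : 'none'
    ), 3, '/tmp/ajax-memory.log'); // example log file, adjust the path
});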
Best
I developed a site using Zend Framework 2. It is basically a price comparison site that integrates with many of the top affiliate networks out there. I wrote a script that checks prices from each affiliate network, and then updates my local DB with that price. Depending on which affiliate network I am contacting, I may be making an API call (Amazon or CJ.com), or I may be looking at an XML product feed (Pepperjam or LinkShare). The XML product feed would be hosted locally.
At present, there are around 3,500 SKUs that I am checking with this script. The vast majority of them (95%+) target an XML product feed. I would estimate that this script should take in the neighborhood of 10 minutes to complete. Some of the XML files I am looking at are around 8 MB in size.
I have tested this script thoroughly in my local environment and gone to great lengths to make sure that there is no memory leak or something of that nature which would cause performance issues. As an example, I made sure to use data streams where possible to avoid loading the XML file into memory over and over, etc. Suffice to say, the script runs locally without issue.
This script is intended to be run as a cron job, however I do have a way to trigger it via the secure admin interface ad-hoc. Locally, this is how I initiate the script to run, and everything goes rather smoothly.
When I deploy my code to the shared hosting account, I am having all sorts of problems. In order to troubleshoot, I attached logging to various stages of this script to track when it starts, how it progresses, and when each step completes, etc. All of this is being logged to a MySQL database.
Problem #1: If I run the script ad-hoc via an HTTP request, I find that it will run for a couple of minutes, and then the script starts again (so there are now two instances apparently running). Wait another couple of minutes, and a third one will start, and so on. Here is an example from when I triggered the script to run at 10:09pm via an HTTP request.
[Screenshot of process manager]
Needless to say, I DO NOT run it via an HTTP request because it only serves to get me in trouble with my web hosting provider :)
Problem #2: When the script runs on the server, triggered via a cron job, it fails to complete. I have taken the production copy of the database and the XML files to my local environment, and it runs fine there. So it should not be a problem with bad data exposing bad code. My observation is that the script runs for nearly the exact same amount of time before it aborts, is terminated, or whatever. The last record updated is generally timestamped around 4 minutes and 30 seconds or so (if memory serves) after the script is triggered. The SKU list is constantly changing, so the record it ends on differs, but the time of the last update is nearly the same each time. Nothing is being logged in the error logs. I monitored server resources via the SSH top command and there is nothing out of the ordinary. CPU usage is in check and memory usage does not go up.
I have a shared hosting account through Bluehost. My thoughts were that perhaps it was a script max execution time issue. I extended the max execution time in the script itself and via php.ini. Made no difference.
So I guess what I am looking for is some fresh ideas about where to go next. What questions should I be asking my hosting company so they can help me get to the bottom of this? They are only somewhat helpful, to say the least. Could it be some limitation on my hosting account? Is the script triggering some sort of automatic monitor that kills it? What types of Apache settings could be problematic for a script of this nature? php.ini settings? Absolutely any input you can provide would be helpful.
And why, when triggered via HTTP, would it keep spinning up new instances? I guess I could live without running it manually and only run it via a cron job, but that isn't working either. So... interested in hearing the community's thoughts on this. Thanks!
I haven't seen your script, nor have I worked with your host, so everything below is just a guess - and a suggestion.
Given your description, I would say you're right that your script might have been killed by a timeout when run from cron. I'm not sure why it keeps spawning new instances of your script when you execute it manually via an HTTP request, but that may also be related to a timeout (e.g. if they have logic that restarts a script if it has not produced any output within a certain time, or something like that).
You can follow up with your hosting provider about running long-running (or memory-consuming) scripts in their environment; they might already have an FAQ or document that covers this topic.
Let me suggest an option in case your provider is unable to help.
From what you said, I expect your script runs an SQL query to get a list of SKUs, and then slowly iterates over this list, performing some job on every item (and eventually dies for whatever reason, as we learned).
How about creating a temporary table (or a file - any kind of persistent storage on the server) that stores the ID of the last record the script processed, or NULL once the script has completed successfully. That way you can make the script start from the last processed record (if the last processed record had id = 1000, add ... WHERE id > 1000 to the main query that fetches SKUs), and you won't really care whether the script completed on its first attempt (if not, it will continue from the point where it was killed on its second try).
Alternatively, to extend this approach, you can limit one invocation to a certain number of records (e.g. 100 or 1000), again saving the ID of the last processed record in the database or somewhere else.
The main idea is: if the script fails to process all SKUs at once, just make it restartable so that it does not lose its progress.
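Something like this, roughly (just a sketch - the checkpoint table, column names and process_sku() function are made up, and you would adapt it to your ZF2 table gateways):

<?php
// Resume-from-checkpoint sketch for the SKU update job (hypothetical names).
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');

// ID of the last processed SKU; 0 means "start from the beginning".
$lastId = (int) $pdo->query(
    'SELECT last_sku_id FROM job_checkpoint WHERE job = "price_update"'
)->fetchColumn();

$batchSize = 500; // optional cap per invocation
$stmt = $pdo->prepare(
    'SELECT id, sku FROM skus WHERE id > :last ORDER BY id LIMIT ' . (int) $batchSize
);
$stmt->execute(array(':last' => $lastId));

$save = $pdo->prepare(
    'UPDATE job_checkpoint SET last_sku_id = :id WHERE job = "price_update"'
);

foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
    process_sku($row['sku']);                     // your existing price-check logic goes here
    $save->execute(array(':id' => $row['id']));   // checkpoint after every record
}
// When a run comes back with fewer than $batchSize rows, reset last_sku_id to 0.

Run it from cron every few minutes and it will chew through the full SKU list in slices, regardless of how long any single invocation is allowed to live.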
I've got the following scenario: multiple users on a local network access a web application coded in PHP, hosted on an IIS server. On every page load, 3 AJAX calls to 3 separate PHP scripts are made, and these calls repeat every X minutes (timed jQuery). For every AJAX call from every connected user a php-cgi process is spawned on the server, quickly adding up to 20 or so processes. The problem is that after the AJAX call these processes remain open, using a large amount of memory on the server, with consequent performance problems (up to a total block at times).
All the PHP scripts are called via the jQuery $.post function, perform one or more queries on an MSSQL DB, and end by echoing a JSON-encoded object or array. Is there a way to make these processes close after the PHP script has finished executing? I would like to avoid the option of making serial calls instead of parallel ones.
Any help is strongly appreciated.
Thanks
I don't know if you have already found a solution, but anyway:
Try adding a
die();
?>
at the end of the PHP scripts that are called. That would kill the scripts after they have completed execution. As each call creates its own process, there would be no issues even if things are done in parallel.
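For illustration, the end of one of the called scripts might look something like this (just a sketch - the variable names are made up, and whether the php-cgi process is actually released afterwards still depends on your FastCGI configuration in IIS):

<?php
// ... run the MSSQL queries and build the result array ...
$result = array('rows' => $rows); // hypothetical data built from the query

header('Content-Type: application/json');
echo json_encode($result);        // this is what the jQuery $.post callback receives
die();                            // stop the script explicitly once the response is sent
?>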
I run a PHP script that uses the Wikipedia API to locate Wikipedia pages about certain movies, based on a long list of titles and years of release. This takes 1-2 seconds per query on average, and I do about 5 queries per minute. This has been working well for years, but since February 11 it suddenly became very slow: 30 seconds per query now seems to be the norm.
This is an example for a random movie in my list, and the link my script loads with file_get_contents():
http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=yaml&rvsection=0&titles=Deceiver_(film)
I can put this link in my browser directly and it takes no more than a few seconds to load. So I don't think the Wikipedia API servers have suddenly become slow. When I load my PHP script from my webserver in my browser, it takes between 20 and 40 seconds before the page is loaded and the result of one query is shown. When I run the PHP script from the command line of my webserver, I get the same slow loading times. My script still manages to save some results to the database now and then, so I'm probably not being blocked either.
Other parts of my PHP scripts have not slowed down. There is a whole bunch of calculation done with the results of the Wikipedia API, and all of that still runs at regular speed. So my webserver is still healthy. The load is always pretty low, and I'm not even using this server for anything else. I have restarted Apache, but found no difference in loading times.
My questions:
Has something changed in the Wikipedia API recently? Perhaps my way of using it is outdated and I need to use something new?
Where could I look for the cause of this slow loading? I've dug through error logs and tested what I could, but I don't even know where things go wrong. If I knew what to look for, I might be able to fix it easily.
git log --until=2013-02-11 --merges in operations/puppet.git shows no changes on that day. Sounds like a DNS issue or similar on your end.
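One way to check the DNS theory from the affected server (a quick sketch using cURL against the URL from the question) is to break the request time down and see where the seconds go:

<?php
// Time a single API request and separate the DNS lookup from the rest.
$url = 'http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content'
     . '&format=yaml&rvsection=0&titles=Deceiver_(film)';

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'movie-lookup-script/1.0'); // any descriptive UA
curl_exec($ch);

printf("DNS lookup: %.3fs\n", curl_getinfo($ch, CURLINFO_NAMELOOKUP_TIME));
printf("Connect:    %.3fs\n", curl_getinfo($ch, CURLINFO_CONNECT_TIME));
printf("Total:      %.3fs\n", curl_getinfo($ch, CURLINFO_TOTAL_TIME));
curl_close($ch);

If the name lookup accounts for most of the 20-40 seconds, the resolver configured on your server is the culprit rather than the API itself.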
I have a script that loads a CSV via cURL; once it has the CSV, it adds each of the records to the database and, when finished, displays the total number of records added.
With fewer than 500 records it executes just fine. The problem is that whenever the number of records is too big, execution is interrupted at some point and the browser displays a download dialog for a file named after the last part of my URL, with no extension and containing nothing - no warning, error or message of any kind. The database shows that it added some of the records, and if I run the script several times it adds a few more each time.
I have tried to find someone with a similar situation but haven't found one yet.
I would appreciate any insight into the matter; I'm not sure if this is a Symfony2 problem, a server configuration problem, or something else.
Thanks in advance
Probably your script is reaching the maximum PHP execution time, which defaults to 30 seconds. You can change it in the controller doing the lengthy operation with the PHP set_time_limit() function. For example:
set_time_limit(300); // 300 seconds = 5 minutes
That's more a limitation of your webserver / the environment PHP is running in.
Increase max_execution_time to allow your webserver to run the request for longer. An alternative is to write a console command; the CLI environment isn't restricted in many cases.
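For reference, a bare-bones Symfony2 console command could look roughly like this (a sketch only - the bundle, command name and importer service are hypothetical):

<?php
// src/Acme/ImportBundle/Command/ImportCsvCommand.php (hypothetical names)
namespace Acme\ImportBundle\Command;

use Symfony\Bundle\FrameworkBundle\Command\ContainerAwareCommand;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;

class ImportCsvCommand extends ContainerAwareCommand
{
    protected function configure()
    {
        $this->setName('acme:import-csv')
             ->setDescription('Imports the CSV without the web server time limit');
    }

    protected function execute(InputInterface $input, OutputInterface $output)
    {
        set_time_limit(0); // the CLI usually has no limit, but make it explicit
        // Reuse the same import logic the controller calls (hypothetical service id).
        $count = $this->getContainer()->get('acme.csv_importer')->import();
        $output->writeln(sprintf('%d records added', $count));
    }
}

You would then run it with php app/console acme:import-csv, or schedule it via cron.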
I have written myself a web crawler using simplehtmldom, and I have got the crawl process working quite nicely. It crawls the start page, adds all links into a database table, sets a session pointer, and meta-refreshes the page to carry on to the next page. That keeps going until it runs out of links.
That works fine, but obviously the crawl time for larger websites is pretty tedious. I wanted to speed things up a bit, and possibly make it a cron job.
Any ideas on making it as quick and efficient as possible, other than setting the memory limit / execution time higher?
It looks like you're running your script in a web browser. You may consider running it from the command line. You can execute multiple scripts to crawl different pages at the same time. That should speed things up.
Memory must not be a problem for a crawler.
Once you are done with one page and have written all relevant data to the database, you should get rid of all the variables you created for this job.
The memory usage after 100 pages must be the same as after 1 page. If this is not the case, find out why.
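With simplehtmldom that usually means clearing each document explicitly before moving on; a rough sketch of the per-page loop (the queue and storage helpers are placeholders for your own code):

<?php
// Per-page loop: parse, store, then release the DOM before the next page.
include 'simple_html_dom.php';

foreach ($urls as $url) {                 // $urls: links pulled from your queue table
    $html = file_get_html($url);          // simplehtmldom loader
    if ($html === false) {
        continue;                         // fetch failed, skip
    }

    save_links_to_db($html->find('a'));   // placeholder for your own storage code

    $html->clear();                       // break simplehtmldom's internal references
    unset($html);

    error_log('mem after ' . $url . ': ' . memory_get_usage(true));
}

If the number printed by memory_get_usage() keeps climbing anyway, something else is holding references and that is what you need to hunt down.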
You can split up the work between different processes: usually parsing a page does not take as long as loading it, so you can write all the links that you find to a database and have multiple other processes that just download the documents to a temp directory (see the worker sketch after the list below).
If you do this, you must ensure that:
no link is downloaded by two workers.
your processes wait for new links if there are none.
temp files are removed after each scan.
the download processes stop when you run out of links. You can achieve this by setting a "kill flag": this can be a file with a special name or an entry in the database.
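Putting this together, a download worker might look roughly like this (a sketch only - the links table, its columns and the /tmp/crawler.kill flag file are assumed names, and error handling is omitted):

<?php
// Download-worker sketch: claims pending links, fetches them to a temp dir.
$pdo = new PDO('mysql:host=localhost;dbname=crawler', 'user', 'pass');
$workerId = getmypid();

while (!file_exists('/tmp/crawler.kill')) {          // the "kill flag" as a file
    // Claim one pending link atomically so no two workers grab the same one.
    $claim = $pdo->prepare(
        'UPDATE links SET status = "claimed", worker = :w
         WHERE status = "pending" ORDER BY id LIMIT 1'
    );
    $claim->execute(array(':w' => $workerId));

    if ($claim->rowCount() === 0) {                   // no new links yet: wait
        sleep(5);
        continue;
    }

    $row = $pdo->query(
        'SELECT id, url FROM links
         WHERE worker = ' . (int) $workerId . ' AND status = "claimed" LIMIT 1'
    )->fetch(PDO::FETCH_ASSOC);

    // Download to a temp file; the parser process picks it up from there.
    $tmp = tempnam(sys_get_temp_dir(), 'crawl_');
    file_put_contents($tmp, file_get_contents($row['url']));

    $done = $pdo->prepare('UPDATE links SET status = "downloaded", file = :f WHERE id = :id');
    $done->execute(array(':f' => $tmp, ':id' => $row['id']));
}

The parsing process then works only on the files in the temp directory, marks the rows it has finished, and deletes the temp files after each scan.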