I developed a site using Zend Framework 2. It is basically a price comparison site that integrates with many of the top affiliate networks out there. I wrote a script that checks prices from each affiliate network, and then updates my local DB with that price. Depending on which affiliate network I am contacting, I may be making an API call (Amazon or CJ.com), or I may be looking at an XML product feed (Pepperjam or LinkShare). The XML product feed would be hosted locally.
At present, there are around 3,500 SKUs that I am checking with this script. The vast majority of them (95%+) target an XML product feed. I would estimate that this script should take in the neighborhood of 10 minutes to complete. Some of the XML files I am looking at are around 8 MB in size.
I have tested this script thoroughly in my local environment and gone to great lengths to make sure there is no memory leak or anything of that nature which would cause performance issues. As an example, I made sure to use data streams where possible to avoid loading the XML file into memory over and over, etc. Suffice it to say, the script runs locally without issue.
This script is intended to be run as a cron job, though I also have a way to trigger it ad hoc via the secure admin interface. Locally, this is how I initiate the script, and everything goes rather smoothly.
When I deploy my code to the shared hosting account, I am having all sorts of problems. In order to troubleshoot, I attached logging to various stages of this script to track when it starts, how it progresses, and when each step completes, etc. All of this is being logged to a MySQL database.
Problem #1: If I run the script ad hoc via an HTTP request, I find that it will run for a couple minutes, and then the script starts again (so there are now two instances apparently running). Wait another couple minutes, and a third one will start, and so on. Here is an example from when I triggered the script to run at 10:09pm via an HTTP request.
[Screenshot of process manager]
Needless to say, I DO NOT run it via an HTTP request because it only serves to get me in trouble with my web hosting provider :)
Problem #2: When the script runs on the server, triggered via a cron job, it is failing to complete. I have taken the production copy of the database, along with the XML files, and run it locally; it runs fine. So it should not be a problem with bad data exposing bad code. My observation is that the script runs for nearly the exact same amount of time before it aborts, is terminated, or whatever. The last record updated is generally timestamped around 4 minutes and 30 seconds or so (if memory serves) after the script is triggered. The SKU list is constantly changing, so the record it ends on differs, but the time of the last update is nearly the same each time. Nothing is being logged in the error logs. I monitored server resources via the SSH top command and there is nothing out of the ordinary: CPU usage is in check and memory usage does not go up.
I have a shared hosting account through Bluehost. My thoughts were that perhaps it was a script max execution time issue. I extended the max execution time in the script itself and via php.ini. Made no difference.
So I guess what I am looking for is some fresh ideas on where to go next. What questions should I be asking my hosting company so they can help me get to the bottom of this? They are only somewhat helpful, to say the least. Could it be some limitation on my hosting account? Am I triggering some sort of automatic monitor that is killing the script? What types of Apache settings could be problematic for a script of this nature? php.ini settings? Absolutely any input you can provide would be helpful.
And why, when triggered via HTTP, would it keep spinning up new instances? I guess I could live without running it manually and only run it via a cron job, but that isn't working either. So, I'm interested in hearing the community's thoughts on this. Thanks!
I haven't seen your script, nor have I worked with your host, so everything below is just a guess, and a suggestion.
Given your description, I would say you're right that your script might be getting killed by a timeout when run from cron. I'm not sure why it keeps spawning new instances of your script when you execute it manually via an HTTP request, but that may also be related to a timeout (e.g. if they have logic that restarts a script that has not produced output within a certain time, or something like that).
You can follow up with your hosting provider about running long-running (or memory-consuming) scripts in their environment; they might already have an FAQ or document written that covers this topic.
Let me suggest an option for you in case your provider is unable to help.
From what you said, I expect your script runs an SQL query to get a list of SKUs, and then slowly iterates over this list, performing some job on every item (and eventually dies for whatever reason, as we learned).
What if you create a temporary table (or file - just any kind of persistent storage on the server) that saves the last processed record ID, or NULL if the script successfully completed? That way you'll be able to make your script start from the last processed record (if the last processed record had id = 1000, add ... WHERE id > 1000 to the main query that fetches SKUs), and you won't really care whether the script completed its first attempt or not (if not, it will pick up from the very point where it was killed on its second try).
Alternatively, to extend this approach, you can limit each invocation to a certain number of records to process (e.g. 100 or 1000), again saving the last processed record ID in the database or somewhere else.
The main idea is: if the script fails to process all SKUs at once, just make it restartable so that it does not lose its progress.
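Just to illustrate the idea (table names, connection details, and update_price() are all invented here; a one-row script_state table holds the checkpoint), a restartable loop might look roughly like this:
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
// find out where the previous (possibly killed) run stopped
$last = $pdo->query('SELECT last_id FROM script_state')->fetchColumn();
$last = $last ? (int)$last : 0;
$stmt = $pdo->prepare('SELECT id, sku FROM skus WHERE id > ? ORDER BY id');
$stmt->execute(array($last));
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    update_price($row['sku']); // your existing per-SKU price check
    // checkpoint after every record, so a kill loses no progress
    $pdo->prepare('UPDATE script_state SET last_id = ?')->execute(array($row['id']));
}
// completed: clear the checkpoint so the next run starts from the top
$pdo->exec('UPDATE script_state SET last_id = NULL');
Whether you checkpoint after every record or every N records is a trade-off between write overhead and how much work a killed run can lose.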
Related
I have this scenario:
A user submits a link to my PHP website and closes the browser. Once the server has the link, it will analyse the submitted page for broken links and, after the analysis is complete, send an email to the user. I have a complete understanding of the second part, i.e. how to analyse the page for broken links and send the mail to the user. The only problem I have is the first part: how do I make the server keep running these actions on its own, even if no request is made by the client?
I have learned that "crontab" or a "fork" may work for me. What do you say about these? Is it possible to achieve what I want using them? What are the alternatives?
crontab would be the way to go for something like this.
Essentially you have two applications:
A web site where users submit data to a database.
An offline script, scheduled to run via cron, which checks for records in the database and performs the analysis, sending notifications of the results when complete.
Both of these applications share the same database, but are otherwise oblivious to each other.
A website itself isn't well suited to this sort of offline work; it's mainly a request/response system. But a scheduled task works for this. Unless the user is expecting an immediate response, a small delay of waiting for the next scheduled run of the offline task is fine.
The server should run the script independently of the browser. Once the request is submitted, the PHP server runs the script and returns the result to the browser (if it has a result to return).
An alternative would be to add the request to a database and then use crontab to run the PHP script at a given interval. The script would then check the database to see if there's anything that needs to be processed. You could limit the script to processing one database entry every minute (or whatever works). This will help prevent performance problems if you have a lot of requests at once, but will be slower to send the email.
A typical approach would be to enter the link into a database when the user submits it. You would then use a cron job to execute a script periodically, which will process any pending links.
Exactly how to set up a cron job (or equivalent scheduled task) depends on your server. If you have a host which provides a web-based admin tool (such as cPanel), there will often be a way to do it in there.
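For reference, on a typical Linux host a crontab entry looks like this (the PHP binary location and script path are just examples):
*/5 * * * * /usr/bin/php /home/youruser/scripts/process_links.php
That runs the script every five minutes; a web tool like cPanel's Cron Jobs screen asks for the same schedule fields and command.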
A PHP script will keep running after the client closes the browser (terminating the connection).
Just keep in mind that a PHP script's maximum execution time is limited by the "max_execution_time" directive.
Of course, I am assuming here that the link submission happens by calling your script page... I'm not sure whether this is your use case...
For the sake of simplicity, a cronjob could do wonders. The user submits a link, and the web handler simply saves the link into a DB (let me pretend here that the table is named "queued_links"). Then a cronjob, scheduled to run each minute (for example), selects every link from queued_links, does the application logic (finds broken page links) and sends the email. It then also deletes the link from queued_links (or updates a flag to mark the link as already processed).
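A minimal sketch of that cronjob, assuming PDO and the queued_links table described above (the column names and find_broken_links() are invented for illustration):
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
// grab everything that has not been handled yet
$links = $pdo->query('SELECT id, url, email FROM queued_links WHERE processed = 0');
foreach ($links as $link) {
    $broken = find_broken_links($link['url']); // your analysis logic
    mail($link['email'], 'Broken link report', implode("\n", $broken));
    // flag it so the next run skips it
    $pdo->prepare('UPDATE queued_links SET processed = 1 WHERE id = ?')
        ->execute(array($link['id']));
}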
For the sake of scale and speed, a cronjob wouldn't fit as well as a message queue (see RabbitMQ, ActiveMQ, Gearman, and beanstalkd (Gearman and beanstalkd are my favorite two: simple, and they fit well with PHP)). In lieu of spawning a cronjob every minute, a queue processor listens for 'events' and processes them asynchronously (think 'onLinkSubmission($link)'), handling messages as soon as they arrive. An MQ gives better / more predictable results, but at the cost of new services to maintain, etc.; the cronjob solution is just a simplified implementation of one of these MQ solutions.
Well, there are a couple of ways; the simplest of them would be:
When a user submits a request, save the request somewhere (let's call it a jobs table) and inform the customer that their request has been received and that they'll be notified once the site finishes processing it, or whatever suits you.
Now, create one or more scripts (depending on your requirements) and run them from cron; each script picks requests from the jobs table, processes them, and does whatever is required.
Alternatively, you can evaluate the possibility of a message queue, or maybe use a job server for this.
So, it all depends on your requirements.
I started to learn programming about a month ago. I already knew HTML and CSS, so I thought I should learn PHP. I learned a lot of it from tutorials and books, and now I am making MySQL-based websites for practice.
I always used to play browser-based strategy games like Travian when I was a kid, and I was thinking about how those sites worked. I didn't have any problem until I realized that the game actually kept working after you closed the browser. For example: you log in to your account, start a construction, and log off. But even after you close the browser, the game knows that in "x" amount of time it needs to update your data for that specific building.
Can someone tell me how that works? Is it something with PHP or MySQL or some other programming language? Even if you just tell me what to search for online, that would be enough.
Despite being someone who loves tackling steep learning curves, I would advise against trying to jump into something that requires background processes until you have a bit more programming experience.
But either way, here's what you need to know:
Normal PHP Process
The way that PHP normally works is the following:
User types a url into the browser and hits enter (or just clicks on a link)
The request is sent to a bunch of servers and magically finds its way to the right web server (beyond the scope of this answer)
A server program like Apache or IIS, listening on port 80, grabs the request
Apache sees that there's a .php extension on the requested page
Apache looks up if any processors have been assigned to .php and finds php.exe
The requested page is fed into php.exe
php.exe starts up a new process for the specific request, runs everything in the script, and returns its result
The result is then sent back to the user
When the user closes the browser and ends the "session", the process started by php exits
So the problem you encounter when you want something running in the background is that PHP is generally accessed through the web server, and hence usually requires a browser (and a user making requests through that browser). And since closing the browser ends the process, you need a way to run PHP scripts without a browser.
Luckily, PHP can be run outside of the webserver as a normal process on the server. But then the problem is that you have to access the server. You probably don't want your users to SSH into your server in order to manually run scripts (and I'm assuming you don't want to do it manually on behalf of your users every single time either). Hence you have two options: create cronjobs that automatically execute a command at a specific frequency, as if you had typed it in yourself on your server's command line; or manually start a script once that doesn't shut down unless your server shuts down.
Triggering a Script based on Time:
Cron is a task scheduler on *nix systems; Windows has the Windows Task Scheduler. What you can do is set up a cronjob to run a specific PHP file at a specific frequency, and execute all the "background" tasks you need from within there.
One way of doing this would be to have a mysql table containing things that need to be executed along with when they need to be executed. The script then queries the table based on time to retrieve which tasks need to be executed, executes them, and then marks them executed (or just deletes them) in the mysql table.
This is a basic form of process queuing.
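As a rough sketch of that table-driven approach (the tasks table, its columns, and execute_task() are all invented here):
$pdo = new PDO('mysql:host=localhost;dbname=game', 'user', 'pass');
// fetch every task whose scheduled time has arrived
$due = $pdo->query("SELECT id, action FROM tasks WHERE run_at <= NOW() AND done = 0");
foreach ($due as $task) {
    execute_task($task['action']); // e.g. finish constructing a building
    $pdo->prepare('UPDATE tasks SET done = 1 WHERE id = ?')->execute(array($task['id']));
}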
Building a Queue Server
This is a lot more advanced, but here's a tutorial for creating a script that will queue processes in the background without the need for any external databases: Building a Queue Server in PHP.
Let me know if this makes sense or if you have any questions :)
PHP is a server-side language. Any time anybody accesses a PHP program on the server, it runs, irrespective of who the client is.
So, imagine a program that holds a counter. It stores this in a database. Every time updatecounter.php is called, the counter gets updated by one.
You browse to updatecounter.php, and it tells you that the counter is now at 34.
Next time you browse to updatecounter.php it tells you that the counter is at 53.
It's gone up by 18 more counts than you were expecting.
This is because updatecounter.php was being run without your intervention. It was being run by other people.
Now, if you looked at updatecounter.php, you might see code like this:
require_once("my_code.php);
$counterValue = increment_counter_value();
echo "New Counter Value = ".$counterValue;
Notice that the main core of the program is stored in a separate file from the program that you are calling.
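For illustration, my_code.php might look something like this (the counters table and the connection details are made up):
// my_code.php
function increment_counter_value() {
    $pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
    $pdo->exec('UPDATE counters SET value = value + 1'); // bump the shared counter
    return (int)$pdo->query('SELECT value FROM counters')->fetchColumn();
}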
Also, notice that instead of calling increment_counter_value, you could call anything. So every time somebody browsed to updatecounter.php, or whatever your game would be called, the internal game mechanics could be run. You could, for instance, have an hourly stat-management routine which would check each time it was called whether it had been run in the last hour, and if it hadn't, it would perform all the stat updates.
Now, what if nobody else is playing your game? If that happens, the hourly stat management wouldn't get called, and your game world would die. So what you would need to do is create another program whose sole function is to run your stats. You would then schedule that program on the server to run at an hourly interval. You do this using something called a CRON job. You will probably find that your host already has this facility built in, if you are on Apache. I won't go into any more detail about task scheduling, as without knowing your environment it's impossible to give the correct answer. But basically, you would need to schedule a PHP program to run on the server to perform the hourly maintenance.
Here's a tutorial on CRON jobs:
http://net.tutsplus.com/tutorials/other/scheduling-tasks-with-cron-jobs/
I haven't used it myself but I've had no problems with other stuff on tutsplus so you should be ok.
This is not only PHP. Browser-based games are a combination of PHP/MySQL/JavaScript/HTML; there are a lot of technologies used for this kind of work. When you are doing something in the browser, let's say adding a building, an AJAX request is sent to the server, and the server updates the database (it can't wait until logout, because then other players wouldn't know your status (in the case of multiplayer)).
My server does not support cron jobs, so I want a file on my server that triggers its action at a particular time every day.
Please let me know whether it is possible to run a script at a particular time from the server side itself, without any external act.
I agree with Kel's answer.
You could try out one of the free cronjob services available, if your server doesn't support it.
Online Cronjobs
Set Cronjob
Just the first two found on Google, there's likely to be more if you search a little.
You cannot start a script without ANY external act.
If your file server has an SSH or HTTP server (or something like that), you can configure a cron job on another server to start your script via SSH / HTTP / whatever.
Also, you can create a PHP script which sleeps in a loop all the time, and wakes up to do some job only when the current time is near some specific value. You will have to raise the maximum execution time for the PHP script (see here for details), and you will have to start the script on server startup. BTW, this does not look like a good solution.
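Still, for completeness, a rough sketch of that sleep-loop (the 03:00 target and run_daily_job() are placeholders):
set_time_limit(0); // lift PHP's execution time limit
while (true) {
    if (date('H:i') === '03:00') { // the particular time you want
        run_daily_job();           // whatever the file should trigger
        sleep(61);                 // skip past the minute so it only fires once
    }
    sleep(20);                     // wake up periodically to check the clock
}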
As mentioned before, this is not possible literally "without external act".
A nice solution I found in the ThinkUp software (I don't know where else it is used) is to use an RSS feed reader. From the standpoint of simplicity, this is probably the best option.
The idea is that you use your feed reader to automatically call a script on your site every XX hours (or whatever interval you want). When called, this script executes the maintenance tasks or whatever it is that you want to do.
To make sure that not everybody can run that script and cause your server to break down (I suppose this is a somewhat heavy task), you can append a unique long identifier string as a URL parameter, so that the script only gets called by you.
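For example, the guard at the top of the script could be as simple as this (the parameter name and the string itself are obviously your own):
if (!isset($_GET['key']) || $_GET['key'] !== 'your-long-random-string') {
    header('HTTP/1.0 403 Forbidden'); // anyone without the secret gets rejected
    exit;
}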
Other than that, you can use one of the "poor man's" web cron job services that have been suggested in other answers.
// "poor man's cron": piggyback the check on normal page requests
if (rand(0, 100) == 0) {
    if (!file_exists($tf = 'tmp/job.crontime') || (time() - filemtime($tf)) > (60 * 60 * 24)) {
        ... # your tasks
        touch($tf); // record the time of this run
    }
}
This simple & stupid script uses a file to store the time of last job-ecexution. If >60*60*24 has passed — it launches the job code. rand(0,100) should lower the overhead of checking for jobs on each request: 1/100 is the chance of running your jobs.
Put it in the end of your 'index.php'. Don't use in projects with modelate to high load :))
The Great Disadvantage: it won't run if you don't have any visitors.
UPD: Write a script that runs indefinitely and every 30s does touch('tmp/job.crontime') to report it's still alive. It should also check the current time and perform the scheduled actions.
In index.php, if more than 30s have passed since the file was touched, re-launch the daemon with an HTTP request. Ugly, but fully functional. You'll also have to deal with time limits, so be careful!
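A sketch of the index.php side of this idea (the daemon path, host, and port are assumptions; the daemon itself would need ignore_user_abort(true) and set_time_limit(0)):
$tf = 'tmp/job.crontime';
if (!file_exists($tf) || (time() - filemtime($tf)) > 30) {
    // the daemon looks dead: fire a request to restart it, without waiting
    $fp = @fsockopen('localhost', 80, $errno, $errstr, 1);
    if ($fp) {
        fwrite($fp, "GET /daemon.php HTTP/1.0\r\nHost: localhost\r\n\r\n");
        fclose($fp); // close immediately; the daemon keeps running server-side
    }
}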
Well, if this is on a public web server and you have enough visits, you could always use those to run code that checks for a given value, say the hour of the day or the number of times a file has been accessed (or store your counter in a file). Just put your PHP code at the top of a web page.
I'm currently running a Linux-based VPS with 768 MB of RAM.
I have an application which collects details of domains and then connects to a service via cURL to retrieve details of the PageRank of these domains.
When I run a check on about 50 domains, it takes the remote page about 3 minutes to load with all the results before the script can parse the details and return them to my script. This causes a problem, as nothing else seems to function until the script has finished executing, so users on the site just get a timer / 'ball of death' while waiting for pages to load.
(The remote page retrieves the domain details and updates itself via AJAX, but the cURL request (rightly) doesn't return the page until loading is complete.)
Can anyone tell me if I'm doing anything obviously wrong, or if there is a better way of doing it? (There can be anything between 10 and 10,000 domains queued, so I need a process that can run in the background without affecting the rest of the site.)
Thanks
A more sensible approach would be to "batch process" the domain data via a cron-triggered PHP CLI script.
As such, once you'd inserted the relevant domains into a database table with a "processed" flag set to false, the background script would then:
Scan the database for domains that aren't marked as processed.
Carry out the CURL lookup, etc.
Update the database record accordingly and mark it as processed.
...
To ensure no overlap with an already-executing batch-processing script, you should only invoke the PHP script every five minutes from cron and (within the PHP script itself) check how long the script has been running at the start of each "scan" stage, exiting if it has been running for four minutes or longer. (You might want to adjust these figures, but hopefully you can see where I'm going with this.)
By using this approach, you'll be able to leave the background script running indefinitely (as it's invoked via cron, it'll automatically start after reboots, etc.) and simply add domains to the database/review the results of processing, etc. via a separate web front end.
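A sketch of that four-minute guard, with get_unprocessed_domains(), lookup_pagerank(), and mark_processed() standing in for the database and cURL pieces (all hypothetical names):
$started = time();
foreach (get_unprocessed_domains() as $domain) {
    // the script is invoked every 5 minutes from cron;
    // bail out before the next invocation starts
    if (time() - $started >= 4 * 60) {
        exit;
    }
    $rank = lookup_pagerank($domain); // the cURL call
    mark_processed($domain, $rank);   // update the record, set processed = true
}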
This isn't the ideal solution, but if you need to trigger this process based on a user request, you can add the following at the end of your script.
set_time_limit(0); // lift the execution time limit so the work can finish
flush();           // push the output gathered so far to the browser
This will allow the PHP script to continue running while returning output to the user. But seriously, you should use batch processing; it will give you much more control over what's going on.
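In practice a few more calls are usually needed before the browser is actually released; a commonly used variant of this trick looks like the following (behaviour can vary by server/SAPI, so treat it as a sketch):
ignore_user_abort(true); // keep going even if the client disconnects
set_time_limit(0);       // no execution time limit
ob_start();
echo 'Your domains are being processed.';
header('Connection: close');
header('Content-Length: ' . ob_get_length());
ob_end_flush();
flush();                 // the browser now has the complete response
// ... the long-running cURL work continues here ...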
Firstly, I'm sorry, but I'm an idiot! :)
I've loaded the site in another browser (FF) and it loads fine.
It seems Chrome puts some sort of lock on a domain when it's waiting for a server response, and I was testing the script manually through a browser.
Thanks for all your help and sorry for wasting your time.
CJ
While I agree with others that you should consider processing these tasks outside of your webserver, in a more controlled manner, I'll offer an explanation for the "server standstill".
If you're using native PHP sessions, PHP uses an exclusive locking scheme, so only a single PHP process can deal with a given session ID at a time. Having a long-running PHP script which uses sessions can certainly cause this.
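If the long-running script genuinely needs the session, the usual fix is to read what it needs and then release the lock with session_write_close() before the slow part begins; roughly:
session_start();
$userId = $_SESSION['user_id']; // read whatever you need first
session_write_close();          // releases the lock; other requests can proceed
// ... long-running work here; note $_SESSION changes are no longer saved ...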
You can search for combinations of terms like:
php session concurrency lock session_write_close()
I'm sure it's been discussed many times here. I'm too lazy to search for you. Maybe someone else will come along and write an answer with bulleted lists and pretty hyperlinks in exchange for Stack Overflow reputation :) But not me :)
good luck.
I'm not sure how your code is structured, but you could try using sleep(). That's what I use when batch processing.
This is more of a fundamental question about how Apache/threading works.
In this hypothetical (read: sometimes I suck and write terrible code), I write some code that enters the infinite-recursion phase of its life. Then what's expected happens: the server stalls.
Even if I close the tab, open up a new one, and hit the site again (locally, of course), it does nothing. Even if I hit a different domain I'm hosting through a vhost declaration, nothing. I normally have to wait a number of seconds before Apache can begin handling traffic again. Most of the time I just get tired and restart the server manually.
Can someone explain this process to me? I have the PHP runtime setting 'ignore_user_abort' set to true, to allow AJAX calls that have been initiated to keep running even if the user closes their browser, but would setting this to false affect it?
Any help would be appreciated. I didn't know what to search for.
Thanks.
ignore_user_abort() allows your script (and Apache) to ignore a user disconnecting (closing the browser/tab, moving away from the page, hitting ESC, etc.) and continue processing. This is useful in some cases - for instance, in a shopping cart once the user hits "yes, place the order". You really don't want an order to die halfway through the process, e.g. the order is in the database but the charge hasn't been sent to the payment facility yet. Or vice versa.
However, while this script is busily running away in "the background", it will lock up resources on the server, especially the session file - PHP locks the session file to make sure that multiple parallel requests won't stomp all over it, so while your infinite loop is running, you won't be able to use any other session-enabled part of the site. And if the loop is intensive enough, it could tie up the CPU so much that Apache is unable to handle any other requests on other hosted sites, where the session lock might not apply.
If it is an infinite loop, you'll have to wait until PHP's own maximum allowed run time (set via set_time_limit() or the max_execution_time directive) kicks in and kills the script. There are also some server-side limiters, like Apache's RLimitCPU and TimeOut, that can handle situations like this.
Note that, except on Windows, PHP doesn't count "external" time toward the time limit. So if your runaway process is doing database work or calling external programs via system() and the like, the time spent running those external calls is NOT counted against the script's time limit.
If you write code that causes an (effectively) never-ending loop, then Apache will execute it and be unable to respond to any additional requests for that page, because it's trying to determine the page content (for the page which caused the never-ending loop) by executing the non-terminating PHP code.
Solution: don't write code that doesn't terminate (in a reasonable amount of time). Understand loop invariants.