I have a big script written in PHP which imports a lot of information into a PrestaShop installation using web services. The script is written in "sections": there is a function that imports the categories, another one that imports the products, then the manufacturers, and so on; about 7-10 functions are called in the main script. I expect this script to run for about an hour, passing from one function to the next until it reaches the last one, then return some values and stop until the next night.
I would like to understand which would be better:
1) impose a time limit of 30 minutes every time I enter a new function (this will prevent the timeout)
2) make a chain of pages, each one with a single function call (and of course the time limit)
or any other idea... I would like to:
know if a function has been called (maybe using a global variable?)
be sure that the server will execute the functions in order (hence the page chain)...
I hope I have been clear; otherwise I'll update the question.
Edit:
The script is executed by another server that calls a page. That server is unknown to me, so I only know that this page gets called (they could also trigger the functions just by visiting the page), but either way I have no control over it.
For any long-running script, I would run it through the command line, probably with a cron job to kick it off. If it's triggered from the outside, I would create a job queue (for example in the database) where you insert a new row to signal that it should run, along with any variable input parameters. Then the background job runs - say - every 5 minutes and checks whether there's a new job in the queue. If there isn't, it just exits. If there is, it marks the job as started, does the processing, and marks it as done when finished.
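A minimal sketch of that pattern, assuming a hypothetical jobs table (id, status, params) and a PDO connection; all names here are illustrative, not taken from the original setup:

<?php
// worker.php - started by cron every 5 minutes.
// The page the external server calls would just do:
//   INSERT INTO jobs (status, params) VALUES ('pending', '...');
$pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');

// Pick up the oldest pending job, if any.
$job = $pdo->query("SELECT * FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1")
           ->fetch(PDO::FETCH_ASSOC);

if ($job === false) {
    exit; // nothing to do, try again in 5 minutes
}

// Mark it as started so the next cron run doesn't pick it up again.
$pdo->prepare("UPDATE jobs SET status = 'running' WHERE id = ?")->execute([$job['id']]);

$params = json_decode($job['params'], true);

// ... run the long import here: categories, products, manufacturers, and so on ...

// Mark it as finished.
$pdo->prepare("UPDATE jobs SET status = 'done' WHERE id = ?")->execute([$job['id']]);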
1 hour of work is a looooooooong time though. Nothing you can do to optimise that?
You can increase the time limit for execution of a script as much as you want using:
set_time_limit($seconds); // or set_time_limit(0) for no limit at all
Long-running scripts also tend to need more memory; you can raise the memory limit using:
ini_set('memory_limit', '256M'); // example value, pick what the import really needs
Another thing to make sure of is that you are running your script on a dedicated server, because on a shared server the host will usually kill long-running scripts automatically.
I'm building a PHP application which has a database containing approximately 140 URLs.
The goal is to download a copy of the contents of these web pages.
I've already written code which reads the URLs from my database and then uses cURL to grab a copy of each page. It then takes everything between <body> and </body> and writes it to a file. It also takes redirects into account, e.g. if I go to a URL and the response code is 302, it follows the appropriate link. So far so good.
This all works fine for a number of URLs (maybe 20 or so), but then my script times out because max_execution_time is set to 30 seconds. I don't want to override or increase this, as I feel that's a poor solution.
I've thought of two workarounds but would like to know if these are a good or bad approach, or whether there are better ways.
The first approach is to use a LIMIT on the database query so that it splits the task up into 20 rows at a time (i.e. run the script 7 separate times if there were 140 rows). With this approach the script, download.php, still needs to be called 7 separate times, so I would need to pass in the LIMIT figures.
The second is to have a script where I pass in the ID of each individual database record I want the URL for (e.g. download.php?id=2) and then make multiple Ajax requests to it (download.php?id=2, download.php?id=3, download.php?id=4, etc.). Based on $_GET['id'] it could do a query to find the URL in the database, and so on. In theory I'd be making 140 separate requests, as it's a one-request-per-URL setup.
I've read some other posts which have pointed to queueing systems, but these are beyond my knowledge. If this is the best way then is there a particular system which is worth taking a look at?
Any help would be appreciated.
Edit: There are 140 URLs at the moment, and this is likely to increase over time. So I'm looking for a solution that will scale without hitting any timeout limits.
I don't agree with your logic: if the script is running OK and simply needs more time to finish, giving it more time is not a poor solution. What you are suggesting makes things more complicated and will not scale well as your URLs increase.
I would suggest moving your script to the command line, where there is no time limit, instead of using the browser to execute it.
When you have a list of unknown size which will take an unknown amount of time, asynchronous calls are the way to go.
Split your script into a single page download (like you proposed, download.php?id=X).
From the "main" script get the list from the database, iterate over it and send an ajax call to the script for each one. As all the calls will be fired all at once, check for your bandwidth and CPU time. You could break it into "X active task" using the success callback.
You can either set the download.php file to return success data or to save it to a database with the id of the website and the result of the call. I recommend the later because you can then just leave the main script and grab the results at a later time.
You can't increase the time limit indefinitively and can't wait indefinitively time to complete the request, so you need a "fire and forget" and that's what asynchronous call does best.
As #apokryfos pointed out, depending on the timing of this sort of "backups" you could fit this into a task scheduler (like chron). If you call it "on demand", put it in a gui, if you call it "every x time" put a chron task pointing the main script, it will do the same.
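A minimal sketch of the per-URL worker described above (download.php?id=X), assuming a hypothetical websites table with id, url and content columns:

<?php
// download.php - fetch one URL by database id and store the result.
// Table and column names are assumptions for illustration only.
$pdo = new PDO('mysql:host=localhost;dbname=crawler', 'user', 'pass');

$id = (int) $_GET['id'];

$stmt = $pdo->prepare('SELECT url FROM websites WHERE id = ?');
$stmt->execute([$id]);
$url = $stmt->fetchColumn();

if ($url === false) {
    http_response_code(404);
    exit('unknown id');
}

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // handle 302 redirects
curl_setopt($ch, CURLOPT_TIMEOUT, 20);           // don't let one slow site hang the worker
$html = curl_exec($ch);
curl_close($ch);

// Keep only what sits between <body> and </body>, as in the original script.
if ($html !== false && preg_match('#<body[^>]*>(.*?)</body>#is', $html, $m)) {
    $pdo->prepare('UPDATE websites SET content = ? WHERE id = ?')->execute([$m[1], $id]);
    echo json_encode(['id' => $id, 'ok' => true]);
} else {
    echo json_encode(['id' => $id, 'ok' => false]);
}

Each Ajax call then gets back a small JSON status, while the page content itself stays in the database for later use.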
What you are describing sounds like a job for the console. The browser is for the users to see, but your task is something that the programmer will run, so use the console. Or schedule the file to run with a cron-job or anything similar that is handled by the developer.
Execute all the requests simultaneously using stream_socket_client(). Save all the socket handles in an array.
Then loop over that array with stream_select() to read the responses as they come in.
It's almost like multi-tasking within PHP.
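A rough sketch of that approach, assuming plain-HTTP hosts (the host list is made up; the connects below are done one by one for brevity, and it's the response reads that get multiplexed with stream_select()):

<?php
// Open one socket per host, send a request on each, then read from
// whichever socket has data available until they have all closed.
$hosts = ['example.com', 'example.org', 'example.net'];

$sockets   = [];
$responses = [];

foreach ($hosts as $host) {
    $socket = stream_socket_client("tcp://$host:80", $errno, $errstr, 5);
    if ($socket === false) {
        continue; // skip hosts that refuse the connection
    }
    // HTTP/1.0 + Connection: close makes the server close the socket when it's done.
    fwrite($socket, "GET / HTTP/1.0\r\nHost: $host\r\nConnection: close\r\n\r\n");
    stream_set_blocking($socket, false);
    $sockets[$host]   = $socket;
    $responses[$host] = '';
}

while ($sockets) {
    $read   = array_values($sockets);
    $write  = null;
    $except = null;
    $ready  = stream_select($read, $write, $except, 30);
    if ($ready === false || $ready === 0) {
        break; // error or overall timeout
    }
    foreach ($read as $socket) {
        $host = array_search($socket, $sockets, true);
        $responses[$host] .= fread($socket, 8192);
        if (feof($socket)) { // remote side closed: this response is complete
            fclose($socket);
            unset($sockets[$host]);
        }
    }
}

print_r(array_map('strlen', $responses));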
Here's what I'm trying to accomplish in high-level pseudocode:
query db for a list of names (~100)
for each name (using php) {
query a 3rd party site for xml based on the name
parse/trim the data received
update my db with this data
Wait 15 seconds (the 3rd party site has restrictions and I can only make 4 queries / minute)
}
So this was running fine. The whole script took ~25 minutes (99% of the time was spent waiting 15 seconds after every iteration). My web host then made a change so that scripts time out after 70 seconds (understandable). This completely breaks my script.
I assume I need to use cron jobs or the command line to accomplish this. I only understand the basic use of cron jobs. Any high-level advice on how to split up this work in a cron job? I am not sure how a cron job could work through a dynamic list.
cron itself has no idea of your list and what is done already, but you can use two kinds of cron-jobs.
The first cron-job - that runs for example once a day - could add your 100 items to a job queue.
The second cron-job - that runs for example once every minute in a certain period - can check if there are items in the queue, execute one (or a few) and remove it from the queue.
Note that both cron jobs are just triggers to start a PHP script; you have two different scripts, one to fill the queue and one to process part of the queue, so almost everything is still done in PHP.
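A bare-bones sketch of those two scripts, assuming a hypothetical name_queue table (name, processed) and a PDO connection:

<?php
// fill_queue.php - run once a day by cron: copy the ~100 names into the queue.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$pdo->exec("DELETE FROM name_queue");
$pdo->exec("INSERT INTO name_queue (name, processed) SELECT name, 0 FROM names");

<?php
// process_queue.php - run every minute by cron: handle up to 4 names per run.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$names = $pdo->query("SELECT name FROM name_queue WHERE processed = 0 LIMIT 4")
             ->fetchAll(PDO::FETCH_COLUMN);

foreach ($names as $i => $name) {
    if ($i > 0) {
        sleep(15); // stay inside the 4-queries-per-minute restriction
    }
    // ... query the 3rd-party site for XML, parse/trim it, update the main table ...
    $pdo->prepare("UPDATE name_queue SET processed = 1 WHERE name = ?")->execute([$name]);
}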
In short, there is not much that is different. Instead of executing the script via mod_php or FastCGI, you execute it from the command line: php /path/to/script.php.
Because this is a different environment from HTTP, some things obviously don't work: sessions, cookies, GET and POST variables. Output gets sent to stdout instead of the browser.
You can pass arguments to your script by using $argv.
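For example, a hypothetical invocation like php /path/to/script.php 20 140 could be read inside the script as:

<?php
// $argv[0] is the script name itself; real arguments start at index 1.
$offset = isset($argv[1]) ? (int) $argv[1] : 0;
$limit  = isset($argv[2]) ? (int) $argv[2] : 20;
echo "Processing rows $offset to " . ($offset + $limit) . PHP_EOL;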
I'm looking for a way to run a php script multiple times from a browser. Here's the scenario:
I'm building a mySQL table from a series of large files ranging anywhere from 100 megs to 2 gigs. On average, there will be around 150,000 records in the table.
I'm doing so right now by having a JavaScript function that makes an Ajax call to the PHP script, which handles a hundred records at a time. On success, the function sets a timeout to run itself again and trigger the Ajax call for the next hundred.
My thinking behind this was to give the function a second to close out before it runs itself again.
This isn't working so well. The whole function itself works, but performance-wise it is quite slow.
When I wasn't doing 100 records at a time and wasn't using JavaScript, just PHP, I could get about 15,000 records into the table before it timed out. Right now it takes about 10 minutes to do the same number of records.
I know that the continuously running JavaScript is bleeding memory and performance like crazy, and was just wondering if anyone had any ideas on how to accomplish running a PHP script over and over from a browser. Crons are not an option at this point.
It's called (async) work/job queues; it seems you need to explore Gearman.
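A minimal Gearman sketch, assuming the PECL gearman extension and a gearmand server on localhost (function and variable names are made up for illustration):

<?php
// client side (e.g. the page hit from the browser): queue the work and return immediately.
$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);
$client->doBackground('import_chunk', json_encode($chunkOfRows));

<?php
// worker.php - a separate long-running CLI process that actually does the inserts.
$worker = new GearmanWorker();
$worker->addServer('127.0.0.1', 4730);
$worker->addFunction('import_chunk', function (GearmanJob $job) {
    $rows = json_decode($job->workload(), true);
    // ... insert $rows into the MySQL table here ...
});
while ($worker->work());

The browser only waits for the tiny doBackground() call, while the worker chews through the data in the background.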
Couldn't you just have the PHP script itself repeat the function multiple times? If the problem is that the function sometimes fails or times out, could you catch the exception within your script? Or do you have an unavoidable and totally fatal error that really necessitates using an external minder?
I ran into a similar situation... my solution was to use an ajax queue. Essentially you feed a series of ajax calls into a queue which runs them sequentially, starting the next after the previous has returned from the server as successful.
Setting a timeout can run into a situation where the next Ajax call is made before the server has completed the last one. This is the likely cause of your performance issue. I don't really like JavaScript timeouts myself, for the resource overuse alone.
Google "Ajax Queue" for code that you find useful, or I can post mine, which is jQuery.
Configure a cron job to run your script every minute.
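For example, a crontab entry like this (the paths are placeholders):

* * * * * /usr/bin/php /path/to/script.php >> /var/log/import.log 2>&1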
I have to call this function
$rep_id=$this->getit($domain);
but some domains take 2-3 minutes and I want to move on to the next one if it takes too long. I have set set_time_limit(3000); at the beginning of the PHP page.
set_time_limit() won't work, as that sets the time limit for the script as a whole. I'm not sure if this is really possible with PHP, but you might be able to pull it off with forking. I'm thinking something along the lines of starting a timer (using time() for a timestamp) and looping until it reaches X time, meanwhile forking your $this->getit() as a child process. Then, when the timer runs out, kill the child process. It might work, but I'm not sure.
I guess an alternative is to make it a separate script with its own timeout set via set_time_limit(), and then call it from the main script using exec().
Assuming the call to getit() is in the main thread (i.e. not being spawned in a new thread), its execution will likely block the rest of the script. However, if getit() is self-aware with its own timer, you could design it to bail out when that execution limit is reached, returning an error code indicating the problem. If you decide to embark on this change, consider adding an optional parameter to specify the time limit per call.
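If getit() ultimately fetches the domain over HTTP, the cleanest per-call limit is usually put on the request itself rather than on the PHP script. A hedged sketch, assuming cURL is available (getit()'s real internals are unknown here, so this is just an illustration of the idea):

<?php
// Hypothetical helper for the slow part of getit(): give each domain at most 10 seconds.
function fetch_with_limit($domain, $timeoutSeconds = 10)
{
    $ch = curl_init("http://$domain/");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);        // give up quickly on dead hosts
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds); // hard cap on the whole transfer
    $body  = curl_exec($ch);
    $error = curl_errno($ch);
    curl_close($ch);

    if ($error !== 0) {
        return false; // timed out or failed: the caller can skip this domain and move on
    }
    return $body;
}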
Hello, I have some problems with my PHP Ajax script.
I'm using PHP/MySQL.
I have a field in my accounts table that stores the time of the last request from a user; I will use that to kick idle users out of the chat. I will write a PHP function that deletes all the rows whose time field is older than the time limit, but where should I call this function? Is it okay to fire it every time a new request is sent to my index.php? I think that would put a huge load on the server, wouldn't it? Do you have a better solution?
Thanks
There are two viable solutions:
either create a small PHP script that makes this deletion in an infinite loop (and of course sleeps for a specified amount of time before doing it again), and then start it via PHP CLI,
or create one that makes the deletion only once, then exits, and call it from cron (if you're using a UNIXish server) or Task Scheduler (on Windows).
The second one is simpler, but its drawback is that you can't make the interval between the deletions shorter than 60 seconds.
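A minimal sketch of the first option, meant to be started once with php cleanup.php from the command line (the table, column and 2-minute idle limit are placeholders):

<?php
// cleanup.php - loop forever, removing idle users every 30 seconds.
set_time_limit(0); // no execution limit for this long-running CLI script

$pdo = new PDO('mysql:host=localhost;dbname=chat', 'user', 'pass');

while (true) {
    // Kick out everyone whose last request is older than 2 minutes.
    $pdo->prepare('DELETE FROM chat_sessions WHERE last_request < ?')
        ->execute([date('Y-m-d H:i:s', time() - 120)]);
    sleep(30);
}

For the second option, the same DELETE would simply run once and the script would exit, with cron or Task Scheduler firing it every minute.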
A solution could be to fire the deletion function just once every few requests.
Using rand() you could give it a 1 in 100 (for example) chance of running the function, so that roughly one page request in a hundred cleans up the expired data.
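For instance, at the top of index.php (the 1-in-100 figure is just the example above, and delete_idle_users() stands for the deletion function described in the question):

<?php
// Roughly one request in a hundred pays the cost of the cleanup query.
if (rand(1, 100) === 1) {
    delete_idle_users();
}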