I've modified a basic web-crawler to gather up a list of links to a site, which is likely to run into the thousands.The problem I'm having is that the script is timing out once I try and run it through a browser on top of this its been mentioned in a previous question I asked that there also may be an issue with the script running to many processes at the same time killing the server I run it on.
How would I got about fixing these issues or should I go with a open source crawler and if so which crawler should I go with as I can't find anything specific enough,as phpDig site is down :/
previous question
Processes like this are best run as PHP CLI cron jobs.
If you need to be able to run it on demand from a web interface then consider adding it to a queue to be run in the background using Gearman or even the unix at command.
It so happens that I have written a PHP wrapping class for the linux at job queue, which is available from my github account should you choose to go down that route.
Related
I have created a script on PHP that creates cache files from API and it takes around 30 minutes to load the page completely means when it creates all cache files.
I have a concern that my hostinger's customer support is telling me that it won't run for 30 minutes but in some answers, I found that it can run in the background and nothing to worry about until it's loaded.
So is that possible that the cronjob will run up to 30 minutes?
If not what is the best solution to run that cache making script at a specific time in the background like the cronjob does? Please Explain in brief so I can get a way.
Thanks for the great answer.
Ideally, for long running tasks, the task should be hosted in a platform that allows extended operations and defined in a way that it can be externally triggered, this might be in the form of an endpoint in a web API.
Then you can use the cronjob to trigger that process.
Without creating a whole API, you could make this a single endpoint on your website, a hidden page that only the cronjob knows how to call, then run your script from there.
There are lots of ways around this but the methodology is similar just use the cronjob as the trigger to a different process. Move the core logic of your script to a platform that allows the long execution time.
This is a similar post: Run a “long” php-script via Cronjob with an answer that suggests you can try to execute the script without waiting for the response, that is the same expectation with calling an external web process or API, the cronjob should not wait for a response.
It's good practice to limit resources on web server, especially in the shared hosting account. Because, in most cases, it may cause the web server to slow down and Denial of Services situation.
It's recommended to run the script using php-cli and cron.
php-cli offer much more relaxation, such as time and resource limitation. Please also read
Events in MariaDB VS Cron in php - which is better
i started to learn programming like a month ago. I already knew html and css, i thought i should learn PHP. I learned alot of it from from tutorials and books, now I am making mysql based websites for practice.
I always used to play browser based strategy games like travian when i was a kid. I was thinking about how those sites worked. I didnt have any problem till i realized that the game actually worked after you closed the browser. For example; you log in to your account and start a construction and log off. But even after you close the browser, game knows that in "x" amount of time it needs to update your data of that specific building.
can someone tell me how that works? is it something with php or MySQL or some other programming language? even if you can tell me what to search online, it would be enough.
Despite being someone who loves tackling steep learning curves, I would advise against trying jump into something that requires background processes until you have a bit more programming experience.
But either way, here's what you need to know:
Normal PHP Process
The way that PHP normally works is the following:
User types a url into the browser and hits enter (or just clicks on a link)
Request is sent to a bunch of servers and magically finds its way to the right web server (beyond scope of this answer)
Server program like Apache or IIS listening on port 80 grabs the request
Apache sees that there's a .php extension on the requested page
Apache looks up if any processors have been assigned to .php and finds php.exe
The requested page is fed into php.exe
php.exe starts up a new process for the specific user, runs everything on the script, returns its result
The result is then sent back to the user
When the user closes the browser and ends the "session", the process started by php exits
So the problem you encounter when you want something running in the background is that PHP in most cases is generally accessed through the web server, and hence usually requires a browser (and user making requests through the browser). And since closing the browser ends the process, so you need a way to run php scripts without a browser.
Luckily PHP can be accessed outside of just the webserver as a normal process on the server. But then the problem is that you have to access the server. You probably don't want your users to ssh into your server in order to manually run scripts (and I'm assuming you don't want to do it manually on behalf of your users every single time either). Hence you have the options either creating cronjobs that will automatically execute a command at a specific frequency as if you had typed it in yourself on your server's commandline. Another option is to manually start a script once that doesn't shutdown unless your server shuts down.
Triggering a Script based on Time:
Cron that is a task scheduler on *nix systems and Windows Task Scheduler on Windows. What you can do is set up a cronjob to run a specific php file at a specific frequency, and execute all the "background" tasks you need to run from within there.
One way of doing this would be to have a mysql table containing things that need to be executed along with when they need to be executed. The script then queries the table based on time to retrieve which tasks need to be executed, executes them, and then marks them executed (or just deletes them) in the mysql table.
This is a basic form of process queuing.
Building a Queue Server
This is a lot more advanced, but here's a tutorial for creating a script that will queue processes in the background without the need for any external databases: Building a Queue Server in PHP .
Let me know if this makes sense or if you have any questions :)
PHP is a server side language. Any time anybody accesses a PHP program on the server, it runs, irrespective of who is a client.
So, imagine a program that holds a counter. It stores this in a database. Every time updatecounter.php is called, the counter gets updated by one.
You browse to updatecounter.php, and it tells you that the counter is now at 34.
Next time you browse to updatecounter.php it tells you that the counter is at 53.
Its gone up by 18 more counts than you were expecting.
This is because updatecounter.php was being run without your intervention. It was being run by other people.
Now, if you looked at updatecounter.php, you might see code like this:
require_once("my_code.php);
$counterValue = increment_counter_value();
echo "New Counter Value = ".$counterValue;
Notice that the main core of the program is stored in a separate program than the program that you are calling.
Also, notice that instead of calling increment_counter_value, you could call anything. So every time somebody browsed to updatecounter.php, or whatever your game would be called, the internal game mechanics could be run. You could for instance, have an hourly stat management routine which would check each time it was called if it had been run in the last hour, and if it hadn't it would perform all the stats.
Now, what if nobody else is playing your game? If that happens, then the hourly stat management wouldn't get called, and your game world would die. So what you would need to do is create another program who's sole function is to run your stats. You would then schedule that program on the server to run at an hourly interval. You do this using something called a CRON job. You will probably find that your host already has this facility built in, if you are on Apache. I won't go into any more detail about task scheduling as without knowing your environment its impossible to give the correct answer. But basically, you would need to schedule a PHP program to run on the server to perform the hourly maintenance.
Here's a tutorial on CRON jobs:
http://net.tutsplus.com/tutorials/other/scheduling-tasks-with-cron-jobs/
I haven't used it myself but I've had no problems with other stuff on tutsplus so you should be ok.
This is not only php . Browser based game are combination of php/mysql/javascript/html . There are lot of technologies being used for this kind of work. When you are doing something on the browser, lets say adding a building ,an ajax request is being sent to the server so the server updates the database (can't wait until logout because then other users won't know your status to play (in case of multiparty) .
I'm developing a PHP-service which does numerous operations per customer, and I want this to run continuously. I've already taken a look at cron, but as far as I understood cron made it possible to run the code on set times. This can be a bit dangerous since we are dependant that the code has finished running before it starts over, and the time for each run may vary as the customer base increases. So refresh, cron or other timed intervals cant be done, as far as I'm aware.
So I'm wondering if you know any solutions where I can restart my service when it is finished, and under no circumstances make the re-run before all the code have been executed?
I'm sorry if this is answered before or is easily found on Google, I have tried to find something, but to no avail.
Edit: I could set timed intervals to be 1 hour, to be absolutely sure, but I want as little time as possible between each run.
Look at this:
http://www.godlikemouse.com/2011/03/31/php-daemons-tutorial/
What you need is a daemon that keeps running. There are more solutions than this while loop.
The following I once used in a project: http://kvz.io/blog/2009/01/09/create-daemons-in-php/ , it's also a package for PEAR: http://pear.php.net/package/System_Daemon
For more information, see the following SO links:
What is a daemon: What is daemon? Their practical use? Usage with php?
How to use: PHP script that works forever :)
Have you tried runnning the PHP script as a process. This here has more details http://nsaunders.wordpress.com/2007/01/12/running-a-background-process-in-php/
If you do not want to learn how to code a daemon, I recommand using a software that manages processes in userland: Supervisor (http://supervisord.org/)
You just need to write a configuration file to specify which processes you want to run, and how.
It is extremely simple to configure and it is very adaptable (you can force having only one instance of your process, or instead have a fixed number of instances... etc).
It will also handle automatic restart in case your script crashes, and logging.
On the PHP side, just create a script that never quits, using a while(true) { ... } loop, and add an entry like this in supervisord's conf:
[program:your-script]
command=/usr/bin/php /path/to/your_script.php
I'm using that software in production for a few projects (to run ruby and php gearman asynchronous workers for websites).
Try to have a custom logic , where you can set the flag ON and OFF and in your CRON , you can check before running the code inside it. I wanted to suggested something like Queue based solution , once you get the entry , then run the logic of your processing . Which can be either daemon or cron. It will give more control if your task is OK to execute now . Edited it
i wonder how can i schedule and automate tasks in PHP? can i? or is web server features like cron jobs needed.
i am wondering if there is a way i can say delete files after say 3 days when the file are likely outdated or not needed
PHP natively doesn't support automating tasks, you have to build a solution yourself or search google for available solutions. If you have a frequently visited site/page, you could add a timestamp to the database linking to the file, when visiting your site in a chosen time (e.g. 8 in the morning) the script (e.g. deleteOlderDocuments.php) runs and deletes the files that are older.
Just an idea. Hope it helps.
PHP operates under the request-response model, so it won't be the responsibility of PHP to initiate and perform the scheduled job. Use cron, or make your PHP site to register the cron jobs.
(Note: the script that the job executes can be written in PHP of course)
In most shared hosting environments, a PHP interpreter is started for each page request. This means that for each PHP script in said environment, all that script will know about is the fact that it's handling a request, and the information that request gave it. Technically you could check the current time in PHP and see if a task needs to be performed, but that is relying on a user requesting that script near a given time.
It is better to use cron for such tasks. especially if the tasks you need performed can be slow -- then, every once in a while, around a certain time, a user would have a particularly slow response, because them accessing a script caused the server to do a whole bunch of scheduled stuff.
I am working on a site that require a php script running on a server without any request,
it is a bot script that keeps (not full time but at least once a day) checking client accounts and send alert messages to clients when something happens.
any ideas are appreciated.
Assuming you need to do this on linux, you may run any php script from the browser and from the CLI as well.
You may run a simple php script:
<? echo "Ana are mere"; ?>
like this:
php -f ./index.php
Be careful about file-permissions, and any bug that may creep inside your code, memory leaks or unallocated variables will become VERY visible now, as the process will run continuously.
If you dont want it running in the background all the time, take a look at crontab (http://unixgeeks.org/security/newbie/unix/cron-1.html) to be able to start jobs regularly.
-- edit--
take a look at php execute a background process and PHP: How to return information to a waiting script and continue processing
Basically you want to start a background process, and you may do this by either using exec() or fsockopen() or a file_get_contents() on your own script probably in this order, if don't have access to exec, or socket functions.
Also take a look at http://us2.php.net/manual/en/function.session-write-close.php so the "background script" won't "block" the request and http://us2.php.net/manual/en/function.ignore-user-abort.php
Use a cron job to do it http://www.cronjobs.org/
You can automatically call a script at any interval you like indefinitely. Your hosting provider should support them if they are good.
You should also consider putting a unique key on the end of the page
ie. www.yoursite.com/cronjob.php?key=randomstring
and then only run the script if the key is correct, to prevent bots and other users from running the script when you don't want it run.
If you can't create a cron job, then create a page that does what you want and create a scheduled task on another machine (maybe your PC?) that just goes out and hits that page at a certain time every day.
It's really a hack, but if you absolutely can't set up a cron job, it would be an option.
As Evernoob and Quamis said, you want to have a cron job (UNIX/Linux/Mac OS) or a scheduled task (MS Windows). Furthermore, you can either have the PHP script run using the PHP command line interface (CLI), in which case you can invoke the PHP executable and then your script name. As an alternate, you can use a tool like wget (availble on all platforms) to invoke the PHP script as if someone had typed the URL in the location bar of a web browser.
A php script could not be used like you imagine here. Because it's executed through apache after a request from somewhere.
Even if you do while(1) in your script, apache/php will automaticly stop your script.
Responding to your comment, yes you'll need ssh access to do this, except if your web interface allow you to add cronjob.
Maybe you can write a service which can be executed with a program on another server and do the job.
If you have no access to the server the easiest way would probably be to hit it through the browser, but that would require you or an external script hitting the URL at the same interval each day when you wanted it to one. You may also be able to setup a Selenium test suite that runs locally on a schedule and hits the page. I'm not 100% if that's possible with Selenium though, you may need some 3rd-party apps to make it happen.
Something else you could try would be to see about using PHP's Process Control Functions (link). These will let you create a script that is a deamon and runs in the background. You may be able to do this to keep the script running on the server and firing off commands at programmed intervals. You will still need some way to get it running the first time (browser request or via command line) though.