Executing scripts in background permanently on webserver - php

To work around the API's request limits I want to fetch data from an API endpoint on the server and serve it to my users from a third-party hosting platform. These platforms usually support PHP, so I was thinking of using it. The data should update roughly once every minute or two. The fetching process itself could be as simple as possible, e.g. like this:
$json = file_get_contents('https://abc.com/xyz'); // fetch the remote endpoint (a URL scheme is required here)
file_put_contents('example.json', $json);         // cache the response in a local file
This way an endpoint is fetched and its response written into a local file. But to repeat this step continuously and keep the data fresh, the script would have to run permanently or be executed frequently. The only way I found was to use cron jobs, but is that a recommendable way to keep files updated? Or are there much better methods for this?
I know there are better setups to solve this, like handling it with Node.js, but I'm considering a platform like this so I only have to manage the communication between the API and the server, not between the server and the clients. I didn't find another way to do that, but I'm open to other suggestions!

While it can be done differently (with Node.js as you mentioned, or other methods), I believe that a system cron job run every X minutes (depending on how long the API takes to respond) will suffice and keep things simple.
Provided, of course, that you are able to set up system cron jobs on your webserver.
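If you can, a minimal sketch of the whole setup might look like this (the URL and file names are placeholders); it adds a basic error check and writes the cache atomically so readers never see a half-written file:
<?php
// fetch.php - run by cron, e.g. once a minute:
// * * * * * php /path/to/fetch.php
$json = file_get_contents('https://abc.com/xyz');   // fetch the remote endpoint
if ($json === false) {
    exit(1);                                        // keep the old cache if the fetch fails
}
file_put_contents('example.json.tmp', $json);       // write to a temp file first
rename('example.json.tmp', 'example.json');         // atomic swap into place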

Related

Can a cronjob continue to run 30 minutes continuously?

I have created a PHP script that builds cache files from an API, and it takes around 30 minutes to finish, i.e. until all cache files are created.
My concern is that Hostinger's customer support is telling me it won't run for 30 minutes, but in some answers I found that it can run in the background and there is nothing to worry about until it finishes.
So is it possible for the cron job to run for up to 30 minutes?
If not, what is the best solution to run that cache-building script at a specific time in the background, like a cron job does? Please explain briefly so I can find a way forward.
Ideally, for long running tasks, the task should be hosted in a platform that allows extended operations and defined in a way that it can be externally triggered, this might be in the form of an endpoint in a web API.
Then you can use the cronjob to trigger that process.
Without creating a whole API, you could make this a single endpoint on your website, a hidden page that only the cronjob knows how to call, then run your script from there.
There are lots of ways around this, but the methodology is similar: use the cron job as the trigger for a different process, and move the core logic of your script to a platform that allows the long execution time.
This is a similar post: Run a "long" php-script via Cronjob. Its answer suggests executing the script without waiting for the response, which is the same expectation when calling an external web process or API: the cron job should not wait for a response.
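As a rough illustration of that idea (the URL, token and file name are made up for this example), the cron job can simply hit a hidden endpoint and the endpoint can detach from the request before doing the slow work:
<?php
// trigger.php - called by cron, e.g.:
// */5 * * * * curl -s "https://example.com/trigger.php?token=SECRET" > /dev/null
if (($_GET['token'] ?? '') !== 'SECRET') {
    http_response_code(403);                // only the cron job knows the token
    exit;
}
ignore_user_abort(true);                    // keep running even if the caller disconnects
set_time_limit(0);                          // lift the time limit, if the host allows it
if (function_exists('fastcgi_finish_request')) {
    fastcgi_finish_request();               // send the response now, keep working in the background
}
// ... the long-running cache building goes here ...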
It's good practice to limit resources on a web server, especially on a shared hosting account, because in most cases long-running requests can slow the web server down and lead to a denial-of-service situation.
It's recommended to run the script using php-cli and cron.
php-cli is much more relaxed about things such as time and resource limits. Please also read
Events in MariaDB VS Cron in php - which is better
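In practice the php-cli + cron setup is just a script guarded against being run through the web server, plus a crontab entry pointing at the CLI binary; a minimal sketch (the paths are placeholders):
<?php
// build-cache.php - cron entry (placeholder path):
// */30 * * * * /usr/bin/php /path/to/build-cache.php
if (PHP_SAPI !== 'cli') {
    exit("Run this from the command line only.\n");   // refuse to run via the web server
}
set_time_limit(0);   // belt and braces; the CLI has no execution time limit by default
// ... fetch the API and write the cache files here ...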

Best way to handle long-running tasks on a web server

For the past few days I've been googling how I should handle long-running tasks on a web server. I have found a lot of good answers on how to run them in a specific language, but not on which language to choose for this kind of job.
So, I have a web server which is running a custom e-commerce platform. On another server, a products distributor provides access to its data through an API. I have to sync the products list across these two servers. The product base is pretty big (about 100,000 products).
My idea was to write a PHP script which collects data from various API endpoints and updates the database accordingly. But that is going to take a long time, so without heavy modifications and deep-diving into PHP itself it will time out.
Now I'm thinking maybe I should write a Python script which goes through the API endpoints and collects data about each product. After the data about a product is collected, the Python script could initiate a PHP script which updates the database for that particular product.
What are your thoughts on this? What would be the best way to handle it?
This sounds like something you would be better off doing from the server side with cron, for example, and not from a browser. Unless, of course, it needs to be triggered manually at random times by people who have no terminal access.
If it has to be PHP, you can run it with cron or from the terminal, disable the timeout (see http://php.net/manual/en/function.set-time-limit.php) and even leave it as a background process if needed with &. This way you wouldn't be limited by Apache's (or whatever server you are using) time limits.
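For example (a sketch only; the database credentials, table, API URL and field names are invented), the sync script can lift its own time limit and be launched once from the terminal as a background job:
<?php
// sync-products.php - launch in the background with:
// nohup php sync-products.php > sync.log 2>&1 &
set_time_limit(0);                            // no timeout while looping over ~100,000 products
$pdo  = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');
$stmt = $pdo->prepare('REPLACE INTO products (sku, name, price) VALUES (?, ?, ?)');
for ($page = 1; ; $page++) {
    $json  = file_get_contents("https://api.distributor.example/products?page=$page");
    $items = json_decode($json, true);
    if (empty($items)) {
        break;                                // no more pages
    }
    foreach ($items as $p) {
        $stmt->execute([$p['sku'], $p['name'], $p['price']]);
    }
    sleep(1);                                 // be polite to the API between pages
}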

PHP infinite loop process; Is it good solution?

I'm creating a plugin for a CMS and need one or more periodic tasks running in the background. As it is a plugin for an open-source CMS, a cron job is not a perfect solution because users may not have access to cron on their servers.
I'm going to start an infinite loop via an AJAX request and then abort the XHR request, so the HTTP connection is closed but the script keeps running.
Is this a good solution in general? What about server resources? Are there any shutdown or limitation policies in servers (such as Apache) for long-running threads?
Long-running PHP scripts are not a great idea. If your script uses session variables, the user won't be able to load any other pages until the long-running, session-based script releases the session.
If you really need long-running scripts, make sure they don't use the session and keep them under the maximum execution time. Do not let them run without your control; that can cause various problems. I remember when I built something like that and my server crashed several times.
Know what you want to do and make sure it's well tested on different servers.
Also search for similar modules and check what methods they use for problems like this. Learn from the pros. :)
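If the long-running work has to start from a normal page request, one thing that helps with the session-locking problem mentioned above is releasing the session before the slow part starts. A minimal sketch:
<?php
session_start();
$userId = $_SESSION['user_id'] ?? null;   // read whatever you need from the session first
session_write_close();                    // release the session lock so other pages keep loading
set_time_limit(0);                        // only if you really need to exceed the normal limit
// ... long-running work here; do not write to $_SESSION after this point ...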

Automating tasks with PHP

I wonder how I can schedule and automate tasks in PHP. Can I? Or are web server features like cron jobs needed?
I am wondering if there is a way I can, say, delete files after three days, when the files are likely outdated or no longer needed.
PHP doesn't natively support automating tasks; you have to build a solution yourself or search Google for existing ones. If you have a frequently visited site/page, you could store a timestamp in the database linked to each file; when someone visits your site after a chosen time (e.g. 8 in the morning), a script (e.g. deleteOlderDocuments.php) runs and deletes the files that are older.
Just an idea. Hope it helps.
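A rough sketch of that idea (the directory and marker-file names are invented here), using a marker file instead of a database timestamp, could be included on any busy page:
<?php
// cleanup.php - include this on a frequently visited page; runs the cleanup at most once a day
$marker = __DIR__ . '/last-cleanup.txt';
if (!file_exists($marker) || filemtime($marker) < time() - 86400) {
    foreach (glob(__DIR__ . '/uploads/*') as $file) {
        if (is_file($file) && filemtime($file) < time() - 3 * 86400) {
            unlink($file);                    // older than 3 days, delete it
        }
    }
    touch($marker);                           // remember when the cleanup last ran
}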
PHP operates under the request-response model, so it won't be PHP's responsibility to initiate and perform the scheduled job. Use cron, or have your PHP site register the cron jobs.
(Note: the script that the job executes can of course be written in PHP.)
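For completeness, "registering" a cron job from PHP usually just means shelling out to crontab. A hedged sketch (the script path is a placeholder, and this assumes the PHP user is allowed to edit its own crontab):
<?php
// Append an entry to the current user's crontab if it is not there yet.
$entry = '0 8 * * * php /path/to/deleteOlderDocuments.php';
exec('crontab -l 2>/dev/null', $lines);       // existing entries, one per array element
if (!in_array($entry, $lines, true)) {
    $lines[] = $entry;
    file_put_contents('/tmp/crontab.txt', implode("\n", $lines) . "\n");
    exec('crontab /tmp/crontab.txt');         // install the updated crontab
}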
In most shared hosting environments, a PHP interpreter is started for each page request. This means that each PHP script in such an environment only knows that it is handling a request and whatever information that request gave it. Technically you could check the current time in PHP and see if a task needs to be performed, but that relies on a user requesting that script near the given time.
It is better to use cron for such tasks, especially if the tasks can be slow; otherwise, every once in a while, around a certain time, some user would get a particularly slow response, because their request caused the server to do a whole batch of scheduled work.

PHP: Multithreaded PHP / Web Services?

Greetings All!
I am having some trouble figuring out how to execute thousands upon thousands of requests to a web service (eBay). I have a limit of 5 million calls per day, so there are no problems on that end.
However, I'm trying to figure out how to process 1,000 - 10,000 requests every minute to every 5 minutes.
Basically the flow is:
1) Get list of items from database (1,000 to 10,000 items)
2) Make an API POST request for each item
3) Accept return data, process data, update database
Obviously a single PHP instance running this in a loop would be impossible.
I am aware that PHP is not a multithreaded language.
I tried the CURL solution, basically:
1) Get list of items from database
2) Initialize multi curl session
3) For each item add a curl session for the request
4) execute the multi curl session
So you can imagine 1,000-10,000 GET requests occurring...
This was OK; around 100-200 requests were occurring in about a minute or two. However, only 100-200 of the 1,000 items actually got processed, so I'm thinking I'm hitting some sort of Apache or MySQL limit?
But this does add latency; it's almost like performing a DoS attack on myself.
I'm wondering how you would handle this problem? What if you had to make 10,000 web service requests and 10,000 MySQL updates from the return data from the web service... And this needs to be done in at least 5 minutes.
I am using PHP and MySQL with the Zend Framework.
Thanks!
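(For reference, the multi-curl attempt described above boils down to roughly the following; the URL and the item list are placeholders, not my real code.)
<?php
$items = range(1, 1000);                          // placeholder: item IDs really come from the database
$mh = curl_multi_init();
$handles = [];
foreach ($items as $id) {
    $ch = curl_init('https://api.example.com/item/' . $id);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[$id] = $ch;
}
do {
    curl_multi_exec($mh, $running);               // drive all transfers at once
    curl_multi_select($mh);                       // wait for activity instead of busy-looping
} while ($running > 0);
foreach ($handles as $id => $ch) {
    $response = curl_multi_getcontent($ch);       // process the response and update the database here
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);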
I've had to do something similar, but with Facebook, updating 300,000+ profiles every hour. As suggested by grossvogel, you need to use many processes to speed things up, because the script spends most of its time waiting for a response.
You can do this with forking, if your PHP install has support for forking, or you can just execute another PHP script via the command line.
exec('nohup php /path/to/script.php >> /tmp/logfile 2>&1 & echo $!', $processId); // $processId[0] holds the PID of the background job
You can pass parameters (via getopt) to the PHP script on the command line to tell it which "batch" to process. You can have the master script do a sleep/check cycle to see if the scripts are still running by checking for the process IDs. I've tested up to 100 scripts running at once in this manner, at which point the CPU load can get quite high.
Combine multiple processes with multi-curl, and you should easily be able to do what you need.
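A stripped-down sketch of that master/worker pattern (the worker script name, batch size and item list are made up for the example, and checking the PIDs this way assumes the posix extension is available):
<?php
// master.php - split the items into batches and start one background worker per batch
$itemIds = range(1, 10000);                   // placeholder: really loaded from the database
$pids    = [];
foreach (array_chunk($itemIds, 500) as $i => $batch) {
    // worker.php would read --batch with getopt() and process only that slice
    exec(sprintf('nohup php worker.php --batch=%d >> /tmp/worker.log 2>&1 & echo $!', $i), $out);
    $pids[] = (int) $out[count($out) - 1];    // exec() appends output lines, the last one is the new PID
}
// crude sleep/check cycle: poll until every worker has exited
while ($pids) {
    foreach ($pids as $k => $pid) {
        if (!posix_kill($pid, 0)) {           // signal 0 only checks whether the process still exists
            unset($pids[$k]);
        }
    }
    sleep(5);
}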
My two suggestions are (a) do some benchmarking to find out where your real bottlenecks are and (b) use batching and caching wherever possible.
mysqli allows multiple-statement queries, so you could definitely batch those database updates.
The http requests to the web service are more likely the culprit, though. Check the API you're using to see if you can get more info from a single call, maybe? To break up the work, maybe you want a single master script to shell out to a bunch of individual processes, each of which makes an api call and stores the results in a file or memcached. The master can periodically read the results and update the db. (Careful to rotate the data store for safe reading and writing by multiple processes.)
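On the database side, batching the updates could look roughly like this (the table, columns and sample data are invented; it leans on the multi-statement support mentioned above):
<?php
$db = new mysqli('localhost', 'user', 'pass', 'shop');
$results = [101 => 9.99, 102 => 19.50];            // placeholder: really the data returned by the web service
$sql = '';
foreach ($results as $itemId => $price) {
    $sql .= sprintf('UPDATE items SET price = %.2f WHERE id = %d;', $price, (int) $itemId);
}
if ($db->multi_query($sql)) {
    // drain every result set, otherwise the connection stays blocked for later queries
    while ($db->more_results() && $db->next_result()) {
        // nothing to fetch for UPDATE statements
    }
}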
To understand your requirements better: do you have to implement your solution only in PHP, or can you interface a PHP part with another part written in another language?
If you cannot go with another language, try to perform this update as a PHP script that runs in the background, not through Apache.
You can follow Brent Baisley's advice for a simple use case.
If you want to build a robust solution, then you need to:
set up a representation of the actions in a table in the database that will be your process queue (sketched below);
set up a script that pops this queue and processes your actions;
set up a cron daemon that runs this script every X minutes.
This way you can have 1,000 PHP scripts running, using your OS's parallelism capabilities and not hanging when eBay is taking too long to respond.
The real advantage of this system is that you can fully control the firepower you throw at your task by adjusting:
the number of requests one PHP script makes;
the order / number / type / priority of the actions in the queue;
the number of scripts the cron daemon runs.
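A minimal sketch of that queue-based setup (the table and column names are invented here); the cron daemon would start one or more copies of this worker every minute:
<?php
// worker.php - started by cron, e.g.: * * * * * php /path/to/worker.php
$pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');

// atomically claim up to 50 pending actions so parallel workers don't grab the same rows
$pdo->beginTransaction();
$rows = $pdo->query(
    "SELECT id, item_id FROM action_queue
     WHERE status = 'pending'
     ORDER BY priority DESC, id
     LIMIT 50 FOR UPDATE"
)->fetchAll(PDO::FETCH_ASSOC);
$claim = $pdo->prepare("UPDATE action_queue SET status = 'running' WHERE id = ?");
foreach ($rows as $row) {
    $claim->execute([$row['id']]);
}
$pdo->commit();

$done = $pdo->prepare("UPDATE action_queue SET status = 'done' WHERE id = ?");
foreach ($rows as $row) {
    // call eBay for $row['item_id'], update the items table, then mark the action as finished
    $done->execute([$row['id']]);
}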
Thanks everyone for the awesome and quick answers!
The advice from Brent Baisley and e-satis works nicely. Rather than executing the sub-processes using cURL like I did before, forking takes a massive load off, and it also nicely gets around the issue of maxing out my Apache connection limit.
Thanks again!
It is true that PHP is not multithreaded, but it can certainly be set up with multiple processes.
I have created a system that resembles the one you are describing. It runs in a loop and is basically a background process. It uses up to 8 processes for batch processing and a single control process.
It is somewhat simplified because I do not need any communication between the processes. Everything resides in a database, so each process is spawned with the full context taken from the database.
Here is a basic description of the system.
1. Start control process
2. Check database for new jobs
3. Spawn child process with the job data as a parameter
4. Keep a table of the child processes to be able to control the number of simultaneous processes.
Unfortunately it does not appear to be a widespread idea to use PHP for this type of application, and I really had to write wrappers for the low-level functions.
The manual has a whole section on these functions, and it appears that there are methods for allowing IPC as well.
PCNTL has the functions to control forking/child processes, and Semaphore covers IPC.
The interesting part of this is that I'm able to fork off actual PHP code, not execute other programs.
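A bare-bones sketch of that forking pattern with PCNTL (the job list is a placeholder; a real control process would read it from the database), limited to 8 concurrent children as described above:
<?php
// control.php - must run under the CLI with the pcntl extension loaded
$maxChildren = 8;
$children    = [];
$jobs        = range(1, 20);                 // placeholder: really read from the jobs table
while ($jobs || $children) {
    while ($jobs && count($children) < $maxChildren) {
        $job = array_shift($jobs);
        $pid = pcntl_fork();
        if ($pid === -1) {
            exit("fork failed\n");
        }
        if ($pid === 0) {
            // child process: do the actual work for this job, then exit
            echo "processing job $job\n";
            exit(0);
        }
        $children[$pid] = $job;              // parent: remember which child handles which job
    }
    $pid = pcntl_wait($status);              // reap one finished child before spawning more
    unset($children[$pid]);
}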
