Multiple PHP requests waiting for one - php

I'm currently developing a website. I want to cache results from an API, but the API is slow, so I have to handle concurrent PHP requests for the same result.
When one PHP request is collecting API results for a certain ID, I want subsequent requests for the same ID to wait for the first one to finish and then just read the cached value.
My current solution is to add an empty value to the cache (and if the value is empty, new PHP requests just sleep and recheck), but it sometimes doesn't work.
Is there another, better way?

Consider using the synchronization provided by the PECL Sync extension and its SyncReaderWriter class, or simply use file locking.
If you want a working solution, I've created a PHP cache library that synchronizes the read/write process for exactly the kind of task you describe.
I think you may find it useful; check it here: https://github.com/tztztztz/php-no-slam-cache
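For the plain file-locking route, a minimal sketch could look like this (fetchFromApi(), the cache directory, and the 5-minute lifetime are assumptions, not part of the library above):
// The first request for an ID takes an exclusive lock and fills the cache;
// concurrent requests for the same ID block on flock() and then read the cached value.
// Assumes the /tmp/api-cache directory exists.
function getCached(string $id): string
{
    $cacheFile = '/tmp/api-cache/' . md5($id) . '.json';
    $lock = fopen($cacheFile . '.lock', 'c');
    flock($lock, LOCK_EX);                                   // waits for the first request to finish
    if (!is_file($cacheFile) || filemtime($cacheFile) < time() - 300) {
        file_put_contents($cacheFile, fetchFromApi($id));    // slow API call happens only once
    }
    $result = file_get_contents($cacheFile);
    flock($lock, LOCK_UN);
    fclose($lock);
    return $result;
}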

Related

Executing scripts in background permanently on webserver

To work around request limits, I want to fetch data from an API endpoint and serve it to my users from a third-party hosting platform. These platforms usually support PHP, so I was thinking of using it. The data should update about once every minute or two. The fetching process itself could be as simple as possible, e.g. like this:
$json = file_get_contents('https://abc.com/xyz'); // the URL scheme is required for an HTTP fetch
file_put_contents('example.json', $json);
This way an endpoint would be fetched and written into a local file. But to repeat this step continuously and keep the data up to date, the script would need to run permanently or be executed frequently. The only way I found was to use cron jobs, but is that recommendable for keeping files updated? Or are there better methods to do this?
I know that there are better setups to solve this, like handling it with Node.js, but I'm considering a platform like this so that I only have to manage the communication between the API and the server, not between the server and the clients. I didn't find another way to do so, but I'm open to other suggestions!
While it can be done differently (with Node.js, as you mentioned, or other methods), I believe a system cron job run every X minutes (depending on how long the API takes to respond) will suffice and keep things simple.
Provided, of course, that you are able to set up system cron jobs on your webserver.
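If cron is available, the whole setup can stay as small as this sketch (the endpoint URL, paths, and one-minute schedule are placeholders):
// fetch.php - run by a crontab entry such as:
//   * * * * * /usr/bin/php /var/www/fetch.php
$json = file_get_contents('https://abc.com/xyz');
if ($json !== false) {
    // Write to a temporary file first and rename, so readers never see a half-written file.
    file_put_contents('/var/www/example.json.tmp', $json);
    rename('/var/www/example.json.tmp', '/var/www/example.json');
}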

Symfony/GuzzleHttp: Limiting API calls across multiple consumers/instances

I've been working on a project for a while that fetches data from an API and processes it locally for various uses. Currently, a consumer picks up JSON objects from the message queue and uses them to trigger a matching Symfony command. The rate limiting is built into this one consumer, is fairly simple, and adjusts itself automatically to status responses from the API. The problem is that, the way it is set up, it cannot run in parallel, and if there is a major update to the versioned static data on the API, all processing halts while it caches the new static data.
I looked at using the rabbitmq-bundle Symfony bundle and converting the commands into separate consumers with their own channels so that they can run in parallel and no longer block each other; however, this comes with a couple of issues I'm stuck on.
The first is that I still need to limit the API calls across all the consumers. I have a wrapper for Guzzle that could, in theory, use a simple file to manage the number of calls across all instances of it. I looked at an existing token bucket library, but setting it up to work in Symfony looks problematic, as each consumer could potentially reset the number of tokens when it is restarted, so... not sure where to go with that.
The second is that some consumers may hit data from the main API for which we do not yet have the matching version of the static data. If this happens, it needs to trigger the related consumers, but only if there isn't already a trigger in each queue... A possible solution I can see is to record the latest requested version in a file at the time a message is published to update it, and have the consumer wait for the data to become available locally. Again, I'm somewhat lost about how best to handle this.
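A minimal sketch of the file-based call counter mentioned in the first issue, shared by all consumer instances; the path and the limit of 100 calls per minute are assumptions:
// The counter lives in a file guarded by an exclusive lock, so restarting a
// consumer cannot reset it the way an in-process token bucket would.
function tryAcquireCallSlot(string $path = '/tmp/api-calls.json', int $limit = 100): bool
{
    $fp = fopen($path, 'c+');
    flock($fp, LOCK_EX);
    $state  = json_decode(stream_get_contents($fp), true) ?: ['minute' => 0, 'count' => 0];
    $minute = (int) floor(time() / 60);
    if ($state['minute'] !== $minute) {
        $state = ['minute' => $minute, 'count' => 0];        // new one-minute window
    }
    $allowed = $state['count'] < $limit;
    if ($allowed) {
        $state['count']++;
    }
    ftruncate($fp, 0);
    rewind($fp);
    fwrite($fp, json_encode($state));
    flock($fp, LOCK_UN);
    fclose($fp);
    return $allowed;                                         // caller waits and retries when false
}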

How to process multiple parallel requests from one client to one PHP script

I have a webpage that, when users go to it, instantly makes multiple (10-20) Ajax requests to a single PHP script, which, depending on the parameters in the request, returns a different report with highly aggregated data.
The problem is that a lot of the reports require heavy SQL calls to get the necessary data, and in some cases, a report can take several seconds to load.
As a result, because one client sends multiple requests to the same PHP script, you see the reports load slowly on the page, one at a time. In other words, the reports are not generated in parallel, which causes the page to take a while to fully load.
Is there any way to get around this in PHP and make it possible for all the requests from a single client to a single PHP script to be processed in parallel, so that the page and all its reports load faster?
Thank you.
As far as I know, it is possible to do multi-threading in PHP.
Have a look at the pthreads extension.
What you could do is run the report-generation part/function of the script in parallel. This will make sure that each function is executed in a thread of its own and will retrieve your results much sooner. Also, set the maximum number of concurrent threads to 10 or fewer so that it doesn't become a resource hog.
Here is a basic tutorial to get you started with pthreads.
And a few more examples which could be of help (notably the SQLWorker example in your case).
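A minimal sketch with the pthreads API; it needs a thread-safe (ZTS) PHP build with the extension installed, and buildReport() plus the report names are hypothetical placeholders:
// Requires the pthreads extension on a ZTS build; CLI only in recent versions.
class ReportThread extends Thread
{
    public $result;
    private $reportType;
    public function __construct(string $reportType)
    {
        $this->reportType = $reportType;
    }
    public function run()
    {
        // Each thread should open its own database connection; connections cannot be shared.
        $this->result = buildReport($this->reportType);
    }
}
$threads = [];
foreach (['sales', 'traffic', 'inventory'] as $type) {       // hypothetical report names
    $threads[$type] = new ReportThread($type);
    $threads[$type]->start();
}
foreach ($threads as $type => $thread) {
    $thread->join();                                         // wait for the thread, then read its result
    echo $type, ': ', $thread->result, PHP_EOL;
}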
Server setup
This is more of a server configuration issue and depends on how PHP is installed on your system: if you use php-fpm, you have to increase the pm.max_children option; if you use PHP via (F)CGI, you have to configure the webserver itself to use more children.
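For php-fpm that is a pool setting, for example (path and values are only illustrative):
; /etc/php/fpm/pool.d/www.conf (location varies by distribution)
pm = dynamic
pm.max_children = 50        ; upper bound on simultaneous PHP worker processes
pm.start_servers = 10
pm.min_spare_servers = 5
pm.max_spare_servers = 20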
Database
You also have to make sure that your database server allows that many concurrent connections. It won't do any good to have enough PHP processes running if half of them have to wait for the database to let them in.
In MySQL, for example, the setting for that is max_connections.
Browser limitations
Another problem you're facing is that browsers won't make 10-20 parallel requests to the same host. It depends on the browser, but to my knowledge modern browsers only open 2-6 connections to the same host (domain) simultaneously, so any further requests just get queued, regardless of server configuration.
Alternatives
If you use MySQL, you could try to merge all your calls into one request and run the SQL queries in parallel using mysqli::poll().
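A rough sketch of that approach with mysqli's asynchronous queries (requires the mysqlnd driver; credentials and the report queries are placeholders):
// Fire the report queries asynchronously, then poll until all of them are done.
$queries = [
    'SELECT COUNT(*) FROM orders',       // hypothetical report queries
    'SELECT COUNT(*) FROM visits',
];
$pending = [];
foreach ($queries as $sql) {
    $link = new mysqli('localhost', 'user', 'pass', 'mydb');
    $link->query($sql, MYSQLI_ASYNC);    // returns immediately, query runs server-side
    $pending[] = $link;
}
while ($pending) {
    $read = $error = $reject = $pending;
    if (!mysqli_poll($read, $error, $reject, 1)) {
        continue;                        // nothing ready within 1 second, poll again
    }
    foreach ($read as $link) {
        if ($result = $link->reap_async_query()) {
            print_r($result->fetch_row());
            $result->free();
        }
        unset($pending[array_search($link, $pending, true)]);
    }
}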
If that’s not possible you could try calling child processes or forking within your PHP script.
Of course PHP can execute multiple requests in parallel if it runs behind a web server like Apache or Nginx. The PHP dev server is single-threaded, but it should only be used for development anyway. If you are using PHP's file-based sessions, however, access to the session is serialized, i.e. only one script can have the session file open at any time. Solution: fetch the information you need from the session at the start of the script, then close the session.
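For the session part, a minimal sketch (the session key is a placeholder):
// Read what the report needs, then release the session file lock so the
// remaining Ajax requests from the same client are no longer serialized.
session_start();
$userId = $_SESSION['user_id'] ?? null;      // hypothetical value the report needs
session_write_close();                       // releases the lock; the values read above stay usable
// ... long-running report generation can now run in parallel ...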

Can I create a server php variable?

I want to have my own variable (most likely an array) storing what my PHP application is up to right now.
The application can trigger a few background processes (like downloading files), and I want to have a list of what is currently being processed.
For example:
if PHP calls exec() to start a download that will run for 15 minutes,
and then another download starts,
and another download starts,
then when I access my application I want to be able to see that 3 downloads are in progress, if none of them has finished yet.
Can I do that? Only in memory, without storing anything on disk?
I thought the solution would be some kind of server variable.
PHP doesn't have knowledge of previous processes. As soon as a PHP process is finished, everything it knows about itself goes with it.
I can think of two options. Write knowledge about the spawned processes to a file or database and use it to sync all your PHP requests (store the PID of each spawned process), as sketched at the end of this answer.
Or
Create a daemon. The people behind PHP have worked hard to clean up PHP's memory handling and such to make this more feasible. Take a look at the PEAR System_Daemon package - http://pear.php.net/package/System_Daemon
Off the top of my head, a quick architecture would be composed of 3 pieces:
Part A) The web app that takes in requests for downloads and reports back the progress of all requests
Part B) Your daemon, which accepts requests for downloads, spawns processes, and reports back the status of all spawned requests
Part C) The spawned process that performs the download you need.
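A minimal sketch of the PID-tracking idea from the first option (the wget command, the /tmp path, and the availability of the posix extension are assumptions):
// Store the PID of each spawned download in a shared file, then check which PIDs are still alive.
function startDownload(string $url): int
{
    exec('nohup wget -q ' . escapeshellarg($url) . ' > /dev/null 2>&1 & echo $!', $out);
    $pid = (int) $out[0];                                     // PID printed by "echo $!"
    file_put_contents('/tmp/downloads.pids', $pid . PHP_EOL, FILE_APPEND | LOCK_EX);
    return $pid;
}
function runningDownloads(): array
{
    $lines = file('/tmp/downloads.pids', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) ?: [];
    // posix_kill() with signal 0 only tests whether the process still exists.
    return array_values(array_filter(array_map('intval', $lines),
        fn (int $pid): bool => posix_kill($pid, 0)));
}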
Anyone for shared memory?
Obviously you would have to have some sort of daemon, but you could use the built-in semaphore functions to easily communicate between the scripts. You need to be careful, though, because if you don't close the memory block properly, you risk ending up with no blocks left.
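A rough sketch of that idea with the sysvsem and sysvshm extensions (the key file and the use of variable key 1 are arbitrary choices for the example):
// A semaphore guards a System V shared memory block that holds the list of
// running downloads; every script attaches to the same block via the same key.
$key = ftok(__FILE__, 'd');                  // any stable file path works as the key source
$sem = sem_get($key, 1);
$shm = shm_attach($key, 65536);
sem_acquire($sem);                           // serialize access between scripts
$downloads = shm_has_var($shm, 1) ? shm_get_var($shm, 1) : [];
$downloads[] = ['file' => 'hypothetical.zip', 'started' => time()];
shm_put_var($shm, 1, $downloads);
sem_release($sem);
shm_detach($shm);                            // detach without removing the block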
You can't store your own variables in $_SERVER. The best method would be to store your data in a database and query/update it as required.

PHP: Multithreaded PHP / Web Services?

Greetings all!
I am having some trouble working out how to execute thousands upon thousands of requests to a web service (eBay). I have a limit of 5 million calls per day, so there are no problems on that end.
However, I'm trying to figure out how to process 1,000-10,000 requests every one to five minutes.
Basically the flow is:
1) Get list of items from database (1,000 to 10,000 items)
2) Make an API POST request for each item
3) Accept return data, process data, update database
Obviously a single PHP instance running this in a loop would be impossible.
I am aware that PHP is not a multithreaded language.
I tried the CURL solution, basically:
1) Get list of items from database
2) Initialize multi curl session
3) For each item add a curl session for the request
4) execute the multi curl session
So you can imagine 1,000-10,000 GET requests occurring...
This was OK; around 100-200 requests were occurring in about a minute or two. However, only 100-200 of the 1,000 items actually processed, so I am thinking that I'm hitting some sort of Apache or MySQL limit?
But this does add latency; it's almost like performing a DoS attack on myself.
I'm wondering how you would handle this problem. What if you had to make 10,000 web service requests and 10,000 MySQL updates from the data returned by the web service, and this needed to be done within 5 minutes?
I am using PHP and MySQL with the Zend Framework.
Thanks!
I've had to do something similar, but with Facebook, updating 300,000+ profiles every hour. As grossvogel suggested, you need to use many processes to speed things up, because the script spends most of its time waiting for a response.
You can do this with forking, if your PHP install has support for forking, or you can just execute another PHP script via the command line.
exec('nohup /path/to/script.php >> /tmp/logfile 2>&1 & echo $!', $processId); // $processId[0] holds the PID printed by 'echo $!'
You can pass parameters (via getopt) to the PHP script on the command line to tell it which "batch" to process. You can have the master script do a sleep/check cycle to see if the scripts are still running by checking their process IDs. I've tested up to 100 scripts running at once in this manner, at which point the CPU load can get quite high.
Combine multiple processes with multi-curl, and you should easily be able to do what you need.
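A rough sketch of that combination, running the multi-curl session in batches so Apache and MySQL are not hit with thousands of sockets at once (the endpoint, batch size, and item list are assumptions):
// Process the items in chunks of 50 concurrent requests per multi-curl session.
$items = range(1, 1000);                              // placeholder for the item IDs from the database
foreach (array_chunk($items, 50) as $batch) {
    $mh = curl_multi_init();
    $handles = [];
    foreach ($batch as $id) {
        $ch = curl_init('https://api.example.com/item/' . $id);   // hypothetical endpoint
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 30);
        curl_multi_add_handle($mh, $ch);
        $handles[$id] = $ch;
    }
    do {
        curl_multi_exec($mh, $running);
        curl_multi_select($mh);                       // avoid busy-waiting while transfers run
    } while ($running > 0);
    foreach ($handles as $id => $ch) {
        $response = curl_multi_getcontent($ch);
        // ... process $response and update the database row for $id ...
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
}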
My two suggestions are (a) do some benchmarking to find out where your real bottlenecks are, and (b) use batching and caching wherever possible.
MySQLi allows multiple-statement queries, so you could definitely batch those database updates.
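For example, a sketch of batching the updates through mysqli::multi_query() (table, columns, credentials, and $results are hypothetical):
// Send all row updates in a single round trip; $results came back from the API.
$mysqli = new mysqli('localhost', 'user', 'pass', 'mydb');
$sql = '';
foreach ($results as $id => $price) {
    $sql .= sprintf('UPDATE items SET price = %.2f WHERE id = %d;', $price, $id);
}
if ($mysqli->multi_query($sql)) {
    // Every statement's result has to be consumed before the connection can be reused.
    while ($mysqli->more_results() && $mysqli->next_result()) {
        // nothing to fetch for UPDATE statements
    }
}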
The HTTP requests to the web service are more likely the culprit, though. Check the API you're using to see if you can get more info from a single call, maybe? To break up the work, you might want a single master script to shell out to a bunch of individual processes, each of which makes an API call and stores the results in a file or memcached. The master can periodically read the results and update the DB. (Be careful to rotate the data store for safe reading and writing by multiple processes.)
To understand your requirements better: must you implement your solution only in PHP, or can you interface a PHP part with a part written in another language?
If you cannot go for another language, try to perform this update as a PHP script that runs in the background rather than through Apache.
You can follow Brent Baisley's advice for a simple use case.
If you want to build a robust solution, then you need to (see the sketch at the end of this answer):
set up a representation of the actions in a database table that will be your process queue;
set up a script that pops this queue and processes your actions;
set up a cron daemon that runs this script every X minutes.
This way you can have 1000 PHP scripts running, using your OS's parallelism capabilities and not hanging when eBay is slow to respond.
The real advantage of this system is that you can fully control the firepower you throw at your task by adjusting:
the number of requests one PHP script makes;
the order / number / type / priority of the actions in the queue;
the number of scripts the cron daemon runs.
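A minimal sketch of the queue-popping script from the list above (the job_queue table, its columns, and processEbayRequest() are assumptions):
// Claim one pending job atomically so parallel cron runs never grab the same row.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$pdo->beginTransaction();
$job = $pdo->query("SELECT id, payload FROM job_queue WHERE status = 'pending' ORDER BY priority DESC, id ASC LIMIT 1 FOR UPDATE")->fetch(PDO::FETCH_ASSOC);
if ($job) {
    $pdo->prepare("UPDATE job_queue SET status = 'running' WHERE id = ?")->execute([$job['id']]);
}
$pdo->commit();
if ($job) {
    processEbayRequest(json_decode($job['payload'], true));   // hypothetical: one API call + DB update
    $pdo->prepare("UPDATE job_queue SET status = 'done' WHERE id = ?")->execute([$job['id']]);
}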
Thanks everyone for the awesome and quick answers!
The advice from Brent Baisley and e-satis works nicely. Rather than executing the sub-processes with cURL as I did before, forking takes a massive load off, and it also nicely gets around the issue of maxing out my Apache connection limit.
Thanks again!
It is true that PHP is not multithreaded, but it can certainly be set up with multiple processes.
I have created a system that resembles the one you are describing. It runs in a loop and is basically a background process. It uses up to 8 processes for batch processing and a single control process.
It is somewhat simplified because I do not need any communication between the processes. Everything resides in a database, so each process is spawned with the full context taken from the database.
Here is a basic description of the system.
1. Start control process
2. Check database for new jobs
3. Spawn child process with the job data as a parameter
4. Keep a table of the child processes to be able to control the number of simultaneous processes.
Unfortunately it does not appear to be a widespread idea to use PHP for this type of application, and I really had to write wrappers for the low-level functions.
The manual has a whole section on these functions, and it appears that there are methods for allowing IPC as well.
PCNTL has the functions to control forking/child processes, and Semaphore covers IPC.
The interesting part of this is that I'm able to fork off actual PHP code, not execute other programs.
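A minimal sketch of that control-process pattern with PCNTL (fetchPendingJobs() and processJob() are hypothetical placeholders):
// CLI only: the control process forks up to 8 children and keeps a table of their PIDs.
$maxChildren = 8;
$children = [];
foreach (fetchPendingJobs() as $job) {               // hypothetical: reads new jobs from the database
    while (count($children) >= $maxChildren) {       // wait for a free slot
        $pid = pcntl_wait($status);
        unset($children[$pid]);
    }
    $pid = pcntl_fork();
    if ($pid === 0) {
        processJob($job);                            // hypothetical: the child does the actual work
        exit(0);                                     // the child must exit, or it keeps looping
    }
    $children[$pid] = true;                          // the parent tracks the child PIDs
}
while ($children) {                                  // wait for the remaining children to finish
    $pid = pcntl_wait($status);
    unset($children[$pid]);
}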
