How to make multiple parallel PHP requests wait for a cache update?

I have a standard scenario where you have multiple parallel requests trying to access the same key in Redis based cache.
When this key is expired the requesting process notifies some external worker that it needs to be recomputed (the worker might possibly be on another server). The worker recomputes it and updates the cache.
When the cache is hot, everything is fine because I can keep serving the stale data from the cache until the new value is recomputed.
The problem is when the cache is cold and there is no data in Redis to serve yet. The requesting process then needs to wait until the value is generated by the external worker. I can't use a cache warm-up in this case because, due to the limited cache size, only the keys that are actually requested should be cached.
So the question is: how can I make PHP requests wait until the computed value is available in Redis? Or what would be the common solution in this case?
The possible solutions I have already thought about:
The Redis blpop command probably would not work because the value being recomputed is not in a list, and it feels a bit like a workaround.
Maybe it is possible to implement some kind of file-based lock? However, the web app and the worker are on separate servers, and NFS, for example, does not support file locks.
The only possibly working solution I could think of is an infinite while loop that pings Redis every X milliseconds with some Y max wait time (a sketch of this idea follows below). However, is this really a good and practical solution? I am not a fan of having infinite loops in supposedly short-living web requests. Besides, hundreds of requests would potentially be running such loops and waiting while the value is being recomputed.
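For reference, a minimal sketch of that bounded polling idea, assuming the phpredis extension; the key name and the timings are only illustrative:

    <?php
    // Poll Redis for a key with a back-off step and an upper bound on the wait.
    $redis = new Redis();
    $redis->connect('127.0.0.1', 6379);

    function waitForKey(Redis $redis, $key, $maxWaitMs = 2000, $stepMs = 50)
    {
        $waitedMs = 0;
        while ($waitedMs < $maxWaitMs) {
            $value = $redis->get($key);
            if ($value !== false) {
                return $value;          // the worker has filled the cache
            }
            usleep($stepMs * 1000);     // back off before asking again
            $waitedMs += $stepMs;
        }
        return null;                    // give up and serve an error/placeholder
    }

    $value = waitForKey($redis, 'report:stats');

An alternative that avoids polling is to have the worker PUBLISH to a channel once the key is written and have waiting requests SUBSCRIBE to it, but a blocking subscribe ties up a PHP-FPM worker in much the same way as the loop does.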

Related

Temporary storage for collecting data prior to sending

I'm working on a Composer package for PHP apps. The goal is to send some data after requests, queued jobs, and other actions that are taken. My initial (and working) idea is to use register_shutdown_function to do it. There are a couple of issues with this approach. Firstly, it increases the page response time: there's the overhead of computing the request plus sending the data via my API. Another issue is that long-running processes, such as queue workers, do not execute this method for a long time, so there might be massive gaps between when the data was created and when it's sent and processed.
My thought is that I could use some sort of temporary storage to store the data and have a cron job send it every minute. The only issue I can see with this approach is managing concurrency under high I/O. Because many processes will be writing to the file every (n) ms, there's an issue with reading the file and removing lines that have already been sent.
Another option, which I'm desperately trying to avoid, is using the client's database. This could potentially cause performance issues.
What would be the preferred way to do this?
Edit: the package is essentially a monitoring agent.
There are a couple of issues with this approach. Firstly, it increases the page response time: there's the overhead of computing the request plus sending the data via my API
I'm not sure you can get around this; there will be additional overhead to doing more work within the context of a web request. I feel like a job-queue-based/asynchronous system minimizes this for the client. Whether you choose a local filesystem write or a socket write, you'll have that extra overhead, but you'll be able to return to the client immediately and not block on the processing of that request.
Another issue is that long-running processes, such as queue workers, do not execute this method for a long time, so there might be massive gaps between when the data was created and when it's sent and processed.
Isn't this the whole point?? :p To return to your client immediately, and then asynchronously complete the job at some point in the future? Using a job queue allows you to decouple and scale your worker pool and web servers separately. Your web servers can stay pretty lean because the heavy lifting is deferred to the workers.
My thought is that I could use some sort of temporary storage to store the data and have a cron job send it every minute.
I would definitely recommend looking at a job queue as opposed to rolling your own. This is a pretty much solved problem, and there are many extremely popular open-source projects to handle it (any of the MQs). Will the minute-by-minute cron job be doing the computation for the client? How do you scale that? If a file has 1,000 entries, or you scale 10x and it has 10,000, will you be able to do all those computations in less than a minute? What happens if a server dies? How do you recover? What about inter-process concurrency? Will you need to manage locks for each process? Will you use a separate file for each process and each minute, to bucket events? What happens if you want runs shorter than a minute?
Durability Guarantees
What sort of guarantees are you offering your clients? If a request returns, can the client be sure that the job is persisted and will be completed at some point in the future?
I would definitely recommend choosing a worker queue and having your webserver processes write to it. It's an extremely common problem, with plenty of resources on how to scale it and clear durability and performance guarantees.
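To make the "webserver writes to a queue" idea concrete, here is a minimal sketch that assumes a Redis list as the queue via the phpredis extension; the list name agent:events and the event payload are hypothetical:

    <?php
    // Request side: buffer events during the request, then push them onto a
    // Redis list at shutdown so the response is not blocked by the API call.
    $redis = new Redis();
    $redis->connect('127.0.0.1', 6379);

    $events   = array();
    $events[] = array(
        'type' => 'request',
        'uri'  => isset($_SERVER['REQUEST_URI']) ? $_SERVER['REQUEST_URI'] : '',
        'ts'   => microtime(true),
    );

    register_shutdown_function(function () use ($redis, $events) {
        foreach ($events as $event) {
            $redis->rPush('agent:events', json_encode($event));
        }
    });

    // Worker side (a separate long-running process) would drain the list, e.g.:
    // while (true) {
    //     $item = $redis->blPop(array('agent:events'), 5);
    //     if (!empty($item)) { /* POST $item[1] to the monitoring API */ }
    // }

Swapping the Redis list for a proper broker (RabbitMQ, SQS, Beanstalkd, etc.) changes only the transport; the request/worker split stays the same.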

Zend_Session::Start intolerably slow (but only sometimes)

Yes, I've read session_start seems to be very slow (but only sometimes), but my problem is slightly different.
We have a PHP application that stores very simple sessions in memcached (ElastiCache, to be specific), and we have been monitoring our slowest-performing page loads. Almost all of the slow ones spend the majority of their time in Zend_Session::Start, and we can't figure out why. It's a very AJAX-y front end, moving more and more toward a single-page app, making a number of simultaneous requests to the backend per page load, and some of the requests take three to four times as long as they should based solely on this.
It's not every request, obviously, but enough of them that we're concerned. Has anyone else seen this behavior? We were under the impression that memcache is not blocking (how could it be?), so the very worst case would be a user with a bum session, but multiple-second wait times in session_start seem inexplicable.
Take a look at your session garbage collection mechanism (play with session.gc_probability and session.gc_divisor).
If gc is slowing down the app, consider cleaning the old sessions manually (e.g. setting gc probability to 1 and running gc script via cron).
Other possibilities:
the session mechanism is locked (so the app waits for the lock to be released before it can write)
too much data is being saved in the session
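As an illustration of the GC advice above (the values are only examples, and note that a memcached session backend expires entries on its own via TTL, so this mainly applies to file- or DB-backed handlers): keep web requests from ever paying the GC cost and let a cron script do the cleanup instead.

    <?php
    // In the web application: never trigger session GC from a request.
    ini_set('session.gc_probability', 0);

    session_start();

    // In a cron script run every few minutes: force a GC pass.
    // ini_set('session.gc_probability', 1);
    // ini_set('session.gc_divisor', 1);
    // session_start();
    // session_gc();            // PHP 7.1+; older versions rely on the 1/1 probability above
    // session_write_close();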

PHP Script dies due to expired time while getting from memcached

I have a situation where I need rapid and very frequent updates from a website's API. (I've asked them about how fast I can hammer them and they've said as fast as you like.)
So my design architecture is to create several small fast running PHP scripts that do a very specific action, save the result to memcache, and repeat. So the first script grabs a single piece of data via their API and stores it in memcache and then asks again. A second script processes the data the first script stored in memcache and requests another piece of data from the API based on the results of that processing. The third uses the result from the second, does something with that data, asks for more data via the API, on up the chain until a decision is made to execute via their API.
I am running these scripts in parallel on a machine with 24 GB RAM and 8 cores. I am also using supervisor in order to manage them.
When I run each PHP script manually via CLI or browser they work fine. They don't die except where I've told them to for the browser so I can get some feedback. The logic is fine, the scripts run fine, etc, etc.
However, when I leave them running indefinitely via Supervisor, the logs fill up with Maximum execution time reached errors, and the line they point to is a line in one of my classes that gets the data from memcache. Sometimes it bombs on a check to see if the data is JSON (which it should always be), sometimes it bombs elsewhere in the same function/method. The timeout set for the Supervisor-managed scripts is 5 sec because the data is stale by then.
I have considered upping the execution time but
the data will be stale by then,
memcache typically returns in less than 1 msec so 5 sec is an eternity,
none of the scripts has ever failed due to a timeout when run manually (CLI or browser)
Environment:
Ubuntu 12.04 Server
PHP 5.3.10-1ubuntu3.9 with Suhosin-Patch
Memcached 1.4.13
Supervisor ??
Memcache stats (from phpMemcachedAdmin):
Size: 1 GB
Uptime: 17 hrs, 38 min
Hit Rate: 76.5%
Used: 18.9 MB
Wasted: 18.9 MB
Bytes Written: 307.8 GB
Bytes Read: 7.2 GB
--------------- Additional Thoughts/Questions ----------------
I don't think it was clear in my original post that in order to get rapid updates I am running multiple copies in parallel of the scripts that grab API data. So if one script is grabbing basic account data looking for a change to trigger another event, then I actually have at least 2 instances running concurrently. This is because my biggest risk factor is stale data causing a delayed decision combined with a 1+ sec response time from the API.
So it occurred to me that the issue may stem from write conflicts where two instances of the same script attempt to write to the same cache key. My initial Googling didn't turn up any good material on possible write conflicts/collisions in memcache. However, a slightly deeper dive turned up a page where a user running two Elgg-powered bookmarking sites off one memcache instance ran into what he described as collisions.
My initial assumption when deciding to kick multiple instances off in parallel was that Supervisor would start them sequentially and therefore in a slightly staggered manner (maybe a bad assumption; I'm new to using Supervisor). Additionally, the API would respond at different rates to each call. Thus, with a write time in the sub-millisecond range and an update from each instance only once every 1-2 seconds, the chances of write conflicts/collisions seemed pretty low.
I'm considering using some form of prefix/postfix with the keys. Each instance already has its own instance ID created from an md5 hash, so I could prefix or postfix the key and have each instance write to its own key. But then I need another key that holds all of those prefixed/postfixed keys, so now I'm doing multiple cache fetches, looping through all the stored data, and discarding all but one of those results. I bet there's a better/faster architecture out there...
I am adding the code to do the timing Aziz asked for now. It will take some time to add the code and gather the data.
Recommendations welcome
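If write collisions on the shared key really are the issue, one option instead of per-instance keys would be memcached's check-and-set (CAS). This is only a sketch under that assumption; the key name and payload are hypothetical, and it uses the ext-memcached 2.x API that matches the PHP 5.3 environment above.

    <?php
    // Write a shared key safely from several parallel instances using CAS.
    $mc = new Memcached();
    $mc->addServer('127.0.0.1', 11211);

    $key     = 'account_data';                                        // hypothetical shared key
    $payload = array('balance' => 100, 'fetched' => microtime(true)); // placeholder API result

    $retries = 0;
    do {
        $current = $mc->get($key, null, $casToken);  // ext-memcached 2.x: third arg receives the CAS token
        if ($mc->getResultCode() === Memcached::RES_NOTFOUND) {
            $ok = $mc->add($key, $payload, 5);            // only succeeds if the key still doesn't exist
        } else {
            $ok = $mc->cas($casToken, $key, $payload, 5); // only succeeds if nobody wrote in between
        }
    } while (!$ok && ++$retries < 3);                     // another instance won the race; retry briefly

Note that a plain set of a single key is already atomic in memcached, so "collisions" here would really be lost updates; CAS only helps if an instance needs to read-modify-write the value.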

Cross server MySQL connection and requests

I'm going to use Node.js to process some CPU-intensive loop operations for sending emails to registered users, as PHP was using too much CPU while this ran and it froze the site.
One thing to note is that Node.js will be on a different server and will connect to MySQL over an external connection.
I've heard that external db connection is bad for performance.
Is this true? And are there any pros and cons of doing this?
Keep in mind that when you run a CPU-intensive operation in Node, the whole application blocks, since it runs in a single thread. If you're going to run a CPU-intensive operation in Node, make sure you spawn it off into a child process whose only job is to run the calculation and then return the result to the primary application. This will ensure your Node app is able to keep responding to incoming requests while the data is being processed.
Now, onto your question. Having the database on a different server is extremely common and typically is a good practice to have. Where you can run into performance problems is if your database is in a different data center entirely. The further (physically) your database server is from your application server, the more latency there will be per request.
If these requests are seriously CPU intensive, you should consider looking into a queueing mechanism for a couple of reasons. One, it ensures that even in the event of an application crash, you don't lose a request that is being processed. Two, you can monitor the queue and scale the number of workers processing it if the operations are piling up to the point that a single application can't finish processing one before another comes in.

Redis request latency

I'm using Redis (2.6.8) with php-fpm and the phpredis driver and am having some trouble with Redis latency. Under certain load, the first request to Redis from our application takes about 1-1.5 s, and redis-cli --latency shows the same latency.
I've already checked the latency guide.
We use redis on the same host with Unix sockets
slowlog has no entries longer than 5 ms
we don't use AOF
Redis takes about 3.5 GB of the 16 GB of memory available (I suppose that's not too much)
our system is not swapping
there is no other process doing disk I/O
I'm using persistent connections, and the number of connected clients varies from 5 to 25 (sometimes spiking to 60-80).
Judging by the latency graph, it looks like problems start when there are 20 or more simultaneously connected clients.
Can you help me figure out where the problem is?
Update
I investigated the problem, and it seemed like Redis, for some reason, did not have enough processor time to operate properly.
I thoroughly checked the communication between php-fpm and Redis with the help of a network sniffer. Redis received the request over TCP but sent the answer back only after one and a half seconds. That strongly suggested that the problem was inside Redis, and that it could not process so many requests under the given conditions (possibly processor starvation, as the processor was only 50% loaded for the whole system).
The problem was resolved by moving Redis to another server that was nearly idle. I suppose we could have played with the Linux scheduler to make it work on the same server, but we have not done that yet.
Bear in mind that Redis is single-threaded. If the operations that you're doing err on the processor-intensive side, your requests could be blocking on each other. For instance, if you're doing HVALS against hashes with very large values, you're going to make all of your clients wait while you pull out all that data and copy it to the output buffer.
Part of what you need to do here (regardless if this is the issue) is to look at all of the commands that you're using and determine the complexity of each command. If you're doing a bunch of O(N) commands against very large amounts of data, it's not impossible that you're simply doing too much stuff at a time.
TL;DR Nobody on here can debug this issue with real certainty without knowing which commands you're using and what your data looks like. But you can look up the time complexity of each method you're using and make sure it's reasonable.
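One low-effort way to follow that advice is to time individual calls from the application side and log the outliers. A sketch, assuming phpredis over the Unix socket mentioned in the question; the socket path, the 100 ms threshold, and the example key are made up:

    <?php
    // Time a Redis call so that O(N) commands against large keys stand out.
    $redis = new Redis();
    $redis->connect('/var/run/redis/redis.sock');   // Unix socket, as in the question

    $start  = microtime(true);
    $values = $redis->hVals('some:large:hash');     // example of a potentially O(N) command
    $ms     = (microtime(true) - $start) * 1000;

    if ($ms > 100) {                                // illustrative threshold
        error_log(sprintf('slow redis call: HVALS took %.1f ms', $ms));
    }

    // On the server side, lowering slowlog-log-slower-than would also surface
    // commands shorter than the 5 ms already checked above.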
I ran across this in researching an issue I'm working on but thought it might help here:
https://groups.google.com/forum/#!topic/redis-db/uZaXHZUl0NA
If you read through the thread there is some interesting info.
