Redis - on request caching flow - php

I've been searching around the web without success for an answer: how exactly does the "flow" of the application work if you cache on user request?
For example, the most common implementation of "on request" caching is the following:
pseudocode:
if (redis->hasKey('content')) {
    return content;
} else {
    content = get_content_from_database();
    cache_content_in_redis(content);
    content_expire(10);
    return content;
}
Let's say there are suddenly 1000 requests to a certain page that uses the logic above.
Logically, the first request to hit the if statement will see that there is no content under the key, trigger the "else" part, retrieve the content, and cache it.
My questions:
What happens with other requests?
Does the second request in line already see that there is content under the key and retrieve it?
What happens if the write of the content to redis is still in progress from the request which triggered it?
If the second request also executes the "else" part of the statement, will that trigger a second write?
Or will the write be skipped and the content from the database returned until the first write is complete?
Who gets the cached content?

To answer your question (without going into a discussion of whether this problem should be solved differently): you could use locks (either a distributed one if your app is on multiple nodes, or PHP's semaphores if your app runs on a single node).
Please note that this is pseudo-code, not an actual PHP implementation:
$contentProvider->getData('my_key');

class ContentProvider
{
    private $lockHandler;

    public function getData($key)
    {
        $this->lockHandler->acquire($key);
        // Do the if/else part; the result ends up in $data
        $this->lockHandler->release($key);

        return $data;
    }
}
You basically need to block for the whole if/else part. Let's say you have 1000 requests. Then 1 request will acquire the lock and the rest (999) will wait for that one to generate the content and store it in the cache. After that, the remaining 999 requests will follow.
The problem is that now you have 1000 requests accessing the cached content sequentially. There is no parallelism anymore, because every call to getData will block.
It might not be a big deal if you are running on a single node using semaphores, but the moment you need to switch to distributed locks this can affect your performance dramatically (because all calls to getData run sequentially, even though they are on different nodes).
Having said that, I want to point out one thing. The above solution guarantees that the cache is only generated once, and it may seem to be a viable solution. But in reality it is not. The whole point of having a cache is to be fast and able to handle requests concurrently. With locks that is simply impossible.
So if running into such a situation is very probable, then caching on request is simply not an option and you need to find another solution. However, if it is very unlikely to happen, just stay with the original code (without locks). Even if you generate the same data multiple times every now and then, it is probably not going to be an issue.
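For the single-node case, here is a minimal runnable sketch of the lock-handler idea using PHP's System V semaphores (the sysvsem extension). The phpredis calls and the fetch_content_from_database() helper are assumptions for illustration, not the poster's actual code:

<?php
// Minimal sketch: serialize cache regeneration per key with a semaphore.
function getData(Redis $redis, $key)
{
    // Derive a semaphore id from the cache key so each key gets its own lock.
    $sem = sem_get(crc32($key), 1);

    sem_acquire($sem);                          // blocks until we own the lock
    $data = $redis->get($key);
    if ($data === false) {                      // cache miss: we alone regenerate
        $data = fetch_content_from_database();  // hypothetical helper
        $redis->setex($key, 10, $data);         // cache for 10 seconds
    }
    sem_release($sem);

    return $data;
}

Note that this serializes readers as well as the single writer, which is exactly the performance problem described above.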

Related

How to parallelize requests without memcache in PHP?

The page really needs to load fast, but the DB is slow, so we split it into two DB calls, one faster and one slower; the faster one runs first and lets us serve a part of the page that is quite usable by itself.
But then we want the second request to go off, and we know it will ALWAYS be necessary whenever the first request goes off. So now the first part of the page contains a script which fires off an HTTP request, then we make the second DB call, and finally it loads.
But this is a serial operation: the first part of the page load needs to finish its DB call, return over HTTP, render in the browser, and run the script; the script then makes its HTTP request, waits for the DB, and finally returns us the whole page.
How do you go about solving this in PHP? We don't have memcache, and I looked into FIFOs but we don't have the posix_mkfifo function either.
I want to make two DB calls on the first request, serve the first part of the page, and let the second DB call continue running. When it's finished, I want to keep the result in /tmp/ or a buffer or wherever is fast (in memory), so that when the script asks for it, perhaps the script's HTTP request will need to wait for it some more, or perhaps it's lucky and will get it served from memory already.
But where in memory do you keep it, across requests and PHP instances? Not in a global, not in the session, not in memcached. Where? Sockets? Should I fork and pipe?
EDIT: Thanks, everybody. I went with the two-async-http-requests route.
I think you could use AJAX.
The first time, send the HTML page with two JavaScript AJAX calls, one for each SQL query, triggered by page load.
Then fill in the page asynchronously with those results.
The problem is that your problem is too complex to solve without extra tools like memcache. Directly in PHP you can save short-lived data in SHM (shared memory), but that's not the best solution.
The best solution is to build a better database structure, so you get a better result and a faster response from your database.
For better database performance you can look at MySQL MEMORY tables. But be careful: those tables are cleared after a restart, so only fill them with data for caching.
And you can send more than one request at a time with AJAX.
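For reference, here is a rough sketch of the SHM idea using PHP's shmop extension; the segment key (0xBADC0DE), the 64 KB size, and $slowQueryResult are made-up values for illustration:

<?php
// Writer (the request that ran the slow query): store the result in SHM.
$shmId   = shmop_open(0xBADC0DE, 'c', 0644, 65536);   // create/attach a 64 KB segment
$payload = str_pad(json_encode($slowQueryResult), 65536, "\0");
shmop_write($shmId, $payload, 0);

// Reader (the later AJAX request): pull the result back out.
$shmId = shmop_open(0xBADC0DE, 'a', 0, 0);            // attach to the existing segment
$raw   = rtrim(shmop_read($shmId, 0, shmop_size($shmId)), "\0");
$data  = json_decode($raw, true);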

Communication between web page and php script triggered from this web page

I have a myAction function in some controller here, and it uses one class instance:
public function myAction() {
    ...
    $myAnalyzer = new Analysis();
    $myAnalyzer->analyze();
    ...
}
Let's say this function analyze() takes 10 minutes. That means it blocks my.phtml for 10 minutes, which is unacceptable. What I want is to render my.phtml first and then show intermediate results from analyze() on my.phtml.
function analyze() {
    ...
    foreach ($items as $rv) {
        ...
        ...
        // new result should be stored in the DB here
    }
}
As far as I know, that's impossible, for there is just one thread in PHP. So I decided to make an AJAX call from my.phtml to run the myAnalyzer instance.
First question: is that right? Can I do it in myAction() without blocking?
OK, now I run myAnalyzer using some script, say worker.php, from my.phtml with the help of JavaScript or jQuery.
Second question: how can I know when each foreach iteration ends? In other words, how can I let worker.php send some signal (or event) to my.phtml or Zend Framework? I do NOT want to update my.phtml on a time basis using a JavaScript timer. That's all I need to know, since the intermediate data is supposed to be stored in the DB.
Third question: myAnalyzer must stop when the user leaves the page. For that I have this code:
window.onbeforeunload = function(e) {
    // killer.php kills myAnalyzer
};
But how can JavaScript communicate with myAnalyzer? Is there something like a process id? I mean, when worker.php runs myAnalyzer, it registers its process id in Zend Framework. And when the user leaves the page, killer.php stops myAnalyzer using this process id.
I appreciate the help in advance.
First Q.: Yeah, I'm afraid that is correct.
Second Q.: I do not understand what you mean here. See the code example below:
foreach ($data as $item) {
    ...
}
// code here will be executed only after the foreach loop is done.
Third Q.: Take a look at this page. You can set this to false (but I suppose it is already like that) and send something to the client from time to time. Or you can set it to true and check whether the user is still connected with the connection_aborted function. What I mean here is that you can run your worker.php with AJAX and configure the request so the browser will not disconnect it because of a timeout (so the connection will be kept while the user stays on the page). But it will be closed if the user leaves the page.
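A small sketch of that second variant (ignore_user_abort set to true plus connection_aborted checks); the analysis and storage steps are placeholders:

<?php
// worker.php
ignore_user_abort(true);   // keep running after a disconnect so we decide when to stop
set_time_limit(0);         // the analysis may run for ~10 minutes

foreach ($items as $item) {
    // ... analyze $item and store the intermediate result in the DB ...

    // PHP only notices a closed connection when it tries to send output.
    echo ' ';
    flush();

    if (connection_aborted()) {
        break;             // the user left the page: stop the analysis
    }
}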
EDIT:
About the second question. There are a few options:
1) You may use some shared storage (memcached, for instance) and call the server with another AJAX request from time to time. After each loop iteration ends, you put some value into memcached, and during the polling request you check that value and build a response / update your page based on it.
2) There is such a thing as a partial response. It is possible to process pieces of a response with XMLHttpRequest, but as I remember that is not really useful at the moment, as it is not supported by many browsers. I do not have any details about this; I have never tried to use it, but I know for sure that some browsers allow processing portions of a response with XMLHttpRequest.
3) You can use an invisible iframe to call your worker.php instead of XMLHttpRequest. In this case you can send pieces of HTML in which you put JavaScript that calls a function in the parent window, and that function updates your page. That is one of the long-polling COMET implementations, if you want to find more information. There are some pitfalls (for instance, you may need to ensure that you send some specific number of characters in the response in order to get it executed in some browsers), but it is still usable (some web-browser chats are based on this).
2) and 3) are also good because they will solve your third-question problem automatically. At the same time, 1) may be simpler, but it will not solve the problem in the third question.
One more thing: as you will have a long-running script, you must remember that the session may block execution of any other requests (if the default file-based PHP session is used, this will happen for sure).
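To make option 1) concrete, here is a rough sketch of what worker.php could do. It assumes the Memcached extension, and the 'progress_' key and $jobId are invented names. Note the early session_write_close() because of the session-locking issue just mentioned:

<?php
// worker.php — publish progress after every loop iteration.
session_start();
session_write_close();     // release the session lock so other requests are not blocked

$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

foreach ($items as $i => $item) {
    // ... analyze $item and store the intermediate data in the DB ...
    $mc->set('progress_' . $jobId, $i + 1, 3600);   // one counter per job
}

The page then polls a tiny script that just echoes $mc->get('progress_' . $_GET['job']) and updates itself from the DB.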

php maintaining state

This will be a newbie question, but I'm learning PHP for one sole purpose (atm): to implement a solution. Everything I've learned about PHP was learned in the last 18 hours.
The goal is adding indirection to my JavaScript GET requests to allow cross-domain access to another website. I also don't wish to overload said website, and want to put safeguards in place. I can't rely on safeguards in JavaScript, because they can't account for other peers sending their own requests.
So right now I have the following makeshift code, without any throttling measures:
<?php
$expires = 15;

if (!$_GET["target"])
    exit();

$fn   = md5($_GET["target"]);
$file = "cache/" . $fn;

if (!$_GET["cache"]) {
    // Serve the cached copy only if it exists and hasn't expired yet;
    // otherwise return nothing so the client fetches the target itself.
    if (in_array($fn, scandir("cache/")) &&
        time() - filemtime($file) <= $expires)
        echo file_get_contents($file);
}
else if ($_GET["data"]) {
    // Store the response the client fetched on our behalf.
    file_put_contents($file, $_GET["data"]);
}
?>
It works, as far as I can tell (it doesn't account for the improbable checksum clash). Now what I want to know, and what my Google searches refuse to procure for me, is how PHP actually launches and when it ends.
Obviously if I was running my own web server I'd have a bit more insight into this: I'm not, I have no shell access either.
Basically I'm trying to figure out whether I can control when the script ends in the code, and whether every GET request to the PHP file launches a new instance of the script or whether it can 'wake up' the same script. The reason is that I wish to track whether, say, it already sent a request to 'target' within the last n milliseconds, and it seems a bit wasteful to dump the value to a save file and then recover it, over and over, for something that doesn't need to be kept in memory for very long.
Every HTTP request starts a new instance of the interpreter; it's basically an implementation detail whether this is a whole new process, or a reuse of an existing one.
This generally pushes you towards good, simple, and scalable designs: you can run multiple server processes and threads, and you won't get varying behaviour depending on whether a request goes back to the same instance or not.
Loading a recently-touched file will be very fast on Linux, since it will come right from the cache. Don't worry about it.
Do worry about the fact that by directly appending request parameters to the path you have a serious security hole: people can get data=../../../etc/passwd and so on. Read http://www.php.net/manual/en/security.variables.php and so on. (In this particular example you're hashing the inputs before putting them in the path so it's not a practical problem but it is something to watch for.)
More generally, if you want to hold a cache across multiple requests the typical thing these days is to use memcached.
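As a hedged sketch of how memcached could cover the throttling requirement here (the key name and the 15-second cooldown are invented; assumes the Memcached extension): Memcached::add() only succeeds when the key does not exist yet, which gives an atomic per-target cooldown across all PHP instances:

<?php
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

$key = 'cooldown_' . md5($_GET['target']);

// add() fails if the key already exists, so it doubles as an atomic
// "have we hit this target in the last 15 seconds?" check.
if ($mc->add($key, time(), 15)) {
    // allowed: fetch the target and refresh the cache file
} else {
    // throttled: serve the cached copy instead
}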
PHP works on a per-connection basis, i.e. each request for a PHP file is seen as a new instance. Each instance ends, generally, when the connection is closed. You can, however, use sessions to save data between connections for a specific user.
For basic use of sessions look into:
session_start()
$_SESSION
session_destroy()
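A minimal usage sketch ('last_hit' is an invented key; note that sessions are per-user, so they cannot throttle requests across different visitors):

<?php
session_start();                       // resume (or create) this user's session

// $last now holds the timestamp saved by this user's previous request.
$last = isset($_SESSION['last_hit']) ? $_SESSION['last_hit'] : 0;
$_SESSION['last_hit'] = time();        // written back automatically when the script ends

// session_destroy();                  // call only when this user's data should be wiped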

PHP - Display status of loop

I have a PHP script something like:
$i = 0;
for (; $i < 500; ++$i) {
    // Do some operation with files numbered 0 to 500;
}
The thing is, the script works and displays the end results, but the operation takes a while, and watching a blank screen can be frustrating. I was wondering if there is some way I can continuously update the page on the client's end, detailing which file is currently being worked on. That is, can I display and continuously update the current value of $i?
The Solution
Thanks everyone! The output buffering is working as suggested. However, David has offered valuable insight and I am considering that approach as well.
You can buffer and control the output from the PHP script.
However, you may want to consider the scalability of this design. In general, heavy processing shouldn't be done online. Your particular case may be an edge case in which the wait is acceptable, but consider something like this as an alternative for an improved user experience:
The user kicks off a process. This can be as simple as setting a flag on a record in the database or inserting some "to be processed" records into the database.
The user is immediately directed to a page indicating that the process has been queued.
An offline process (either kicked off by the PHP script on the server or scheduled to run regularly) checks the data and does the heavy processing.
In the meantime, the user can refresh the page (manually, by navigating elsewhere and coming back to check, or even via an AJAX polling mechanism) to check the status of the processing. In this case, it sounds like you'd have several hundred records in a database table queued for processing. As each one finishes, it can be flagged as done. The page can just check how many are left, which one is current, etc., from the data.
When the processing is completed, the page shows the result.
In general this is a better user experience because it doesn't force the user to wait. The user can navigate around the site and check back on progress as desired. Additionally, this approach scales better. If your heavy processing is done directly on the page, what happens when you have many users or the data processing load increases? Will the page start to time out? Will users have to wait longer? By making the process happen outside of the scope of the website you can offload it to better hardware if needed, ensure that records are processed in serial/parallel as business rules demand (avoid race conditions), save processing for off-peak hours, etc.
Check out PHP's Output Buffering.
Try to use:
flush();
http://php.net/manual/ru/function.flush.php
Try the flush() function. Calling this function forces PHP to send whatever output it has so far to the client, instead of waiting for the script to end.
However, some web servers will only send the output once the entire page is done being built, so calling flush() would have no effect in this case.
Also, browsers themselves buffer input, so you may run into problems there. For example, certain versions of IE won't start displaying the page until 256 bytes have been received.
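Putting those caveats together, a minimal sketch of the loop with flushing might look like this (the padding is there for the IE buffering issue just mentioned):

<?php
while (ob_get_level() > 0) {
    ob_end_flush();                 // disable any active output buffers
}
echo str_pad('', 256);              // some IE versions wait for ~256 bytes

for ($i = 0; $i < 500; ++$i) {
    // ... do some operation with file number $i ...
    echo "Processing file $i<br>\n";
    flush();                        // push the status line to the browser now
}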

PHP display progress messages on the fly

I am working on a tool in PHP that processes a lot of data and takes a while to finish. I would like to keep the user updated with what is going on and the current task being processed.
What is in your opinion the best way to do it? I've got some ideas but can't decide for the most effective one:
The old way: execute a small part of the script and display a page to the user with a Meta Redirect or a JavaScript timer to send a request to continue the script (like /script.php?step=2).
Sending AJAX requests constantly to read a server file that PHP keeps updating through fwrite().
Same as above but PHP updates a field in the database instead of saving a file.
Does any of those sound good? Any ideas?
Thanks!
Rather than writing to a static file you fetch with AJAX or to an extra database field, why not have another PHP script that simply returns a completion percentage for the specified task. Your page can then update the progress via a very lightweight AJAX request to said PHP script.
As for implementing this "progress" script, I could offer more advice if I had more insight as to what you mean by "processes a lot of data". If you are writing to a file, your "progress" script could simply check the file size and return the percentage complete. For more complex tasks, you might assign benchmarks to particular processes and return an estimated percentage complete based on which process has completed last or is currently running.
UPDATE
This is one suggested method to "check the progress" of an active script which is simply waiting for a response from a request. I have a data mining application that I use a similar method for.
In your script that makes the request you're waiting for (the script whose progress you want to check), you can store a progress variable for the process, either in a file or a database. (I use a database, as I have hundreds of processes running at any time which all need to track their progress, and another script that lets me monitor them.) When the process begins, set this variable to 1.
You can select an arbitrary number of 'checkpoints' the script will pass and calculate the percentage from the current checkpoint. For a large request, however, you might be more interested in the approximate percentage completed. One possible solution is to know the size of the returned content and set your status variable according to the percentage received at any moment: if you receive the request data in a loop, you can update the status on each iteration, or if you are downloading to a flat file you can poll the size of the file.
This could be done less accurately with time (rather than file size) if you know the approximate time the request should take to complete, by simply comparing against the script's current execution time. Obviously neither of these is a perfect solution, but I hope they give you some insight into your options.
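For the file-size variant, a hedged sketch of such a 'progress' script (the path and expected size are invented for illustration):

<?php
// progress.php — report completion based on how much has been written so far.
$expectedBytes = 10485760;          // known or estimated final size
$path = '/tmp/download.dat';        // hypothetical output file

clearstatcache();                   // filesize() results are cached per request
$written = file_exists($path) ? filesize($path) : 0;

header('Content-Type: application/json');
echo json_encode(array('percent' => min(100, (int) round(100 * $written / $expectedBytes))));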
I suggest using the AJAX method, but not using a file or a database. You could probably use session values or something like that; that way you don't have to create a connection or open a file to do anything.
In the past, I've just written messages out to the page and used flush() to flush the output buffer. Very simple, but it may not work correctly on every web server or with every web browser (as they may do their own internal buffering).
Personally, I like your second option the best. Should be reliable and fairly simple to implement.
I like option 2 - using AJAX to read a status file that PHP writes to periodically. This opens up a lot of different presentation options. If you write a JSON object to the file, you can easily parse it and display things like a progress bar, status messages, etc...
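On the writing side, a short sketch of what the long-running PHP task might drop into that file on each iteration (field names invented):

<?php
// Inside the processing loop: overwrite a small JSON status file.
file_put_contents('/tmp/status.json', json_encode(array(
    'current' => $i,                        // item being worked on
    'total'   => $total,                    // total number of items
    'message' => "processing item $i",
)));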
A 'dirty' but quick-and-easy approach is to just echo out the status as the script runs along. As long as you don't have output buffering on, the browser will render the HTML as it receives it from the server (I know WordPress uses this technique for its auto-upgrade).
But yes, a 'better' approach would be AJAX, though I wouldn't say there's anything wrong with 'breaking it up' using redirects.
Why not incorporate 1 & 2, where AJAX sends a request to script.php?step=1, checks the response, writes to the browser, then goes back for more at script.php?step=2, and so on?
If you can do away with IE, then use Server-Sent Events. It's the ideal solution.
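A minimal sketch of the server side of that idea (the browser consumes it with new EventSource('progress.php'); the loop body is a placeholder):

<?php
// progress.php — stream progress messages as Server-Sent Events.
header('Content-Type: text/event-stream');
header('Cache-Control: no-cache');

for ($i = 1; $i <= 500; ++$i) {
    // ... process item $i ...
    echo "data: finished $i of 500\n\n";    // each SSE message ends with a blank line
    flush();
}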
