The page really needs to load fast, but the DB is slow, so we split the work into two DB calls: one fast and one slow. The fast one runs first, and we can serve a part of the page that is quite usable by itself.
But then we want the second request to go off, and we know it will ALWAYS be needed whenever the first request runs. So now the first part of the page contains a script that fires off an HTTP request, which triggers the second DB call, and only then does the rest of the page load.
But this is a serial operation: the first part of the page load has to finish its DB call, return over HTTP, render in the browser, run the script, issue another HTTP request, wait for the second DB call, and only then return the whole page.
How do you go about solving this in PHP? We don't have memcache, and I looked into FIFOs, but we don't have the posix_mkfifo function either.
I want to make both DB calls on the first request, serve that request and the first part of the page, and let the second DB call keep running. When it finishes, I want to keep the result in /tmp/ or a buffer or wherever is fast, ideally in memory, so that when the script asks for it, its HTTP request either waits for it a bit longer or gets lucky and is served from memory right away.
But where in memory do you keep it, across requests and PHP instances? Not in a global, not in the session, not in memcached. Where? Sockets? Should I fork and pipe?
EDIT: Thanks, everybody. I went with the two-async-http-requests route.
I think you could use AJAX.
First, send the HTML page with two JavaScript AJAX calls, one for each SQL query, triggered on page load.
Then fill in the page asynchronously with those results.
The problem is that your problem is too complex to solve without extra tools like memcache. Directly in PHP you can keep small amounts of data in shared memory (SHM), but that's not the best solution.
The best solution is to improve your database structure so you get better results and a faster response from your database.
For better database performance you can look at MySQL MEMORY tables. But be careful: those tables are cleared after a restart, so use them only to hold data for caching.
And you can send more than one request at a time with AJAX.
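If you do want to try the shared-memory route mentioned above, a minimal sketch with PHP's shmop extension could look like the following (the key, the 64 KB size, and $slowQueryResult are made-up placeholders, and a real version would need locking and size checks):

// Writer side (e.g. the request that finishes the slow DB call):
$shmKey  = 0xC0FFEE;                                   // arbitrary key, made up for this sketch
$payload = serialize($slowQueryResult);                // $slowQueryResult stands in for the second result set
$shm     = shmop_open($shmKey, 'c', 0644, 65536);      // create/attach a 64 KB segment
shmop_write($shm, str_pad($payload, 65536, "\0"), 0);  // assumes the payload fits and contains no NUL bytes
shmop_close($shm);

// Reader side (e.g. the request fired by the page's script):
$shm  = shmop_open($shmKey, 'a', 0, 0);                // attach read-only; false if nothing is stored yet
$data = $shm ? unserialize(rtrim(shmop_read($shm, 0, shmop_size($shm)), "\0")) : null;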
I've been searching around the web without success for an answer: how exactly does the "flow" of the application work if you cache on user request?
For example, the most common implementation of "on request" caching is the following:
pseudocode:
if (redis->hasKey('content')) {
    return content;
} else {
    get_content_from_database();
    cache_content_in_redis();
    content_expire(10);
    return content;
}
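For concreteness, a runnable version of that pseudocode with the phpredis extension would look roughly like this (loadContentFromDatabase() is a placeholder for the actual query code):

$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

function getContent(Redis $redis)
{
    $cached = $redis->get('content');
    if ($cached !== false) {
        return $cached;                       // cache hit
    }

    $content = loadContentFromDatabase();     // placeholder for the real DB query
    $redis->setEx('content', 10, $content);   // cache it and expire after 10 seconds
    return $content;
}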
Let's say, there are suddenly 1000 requests on a certain page which uses the logic above.
Logically, the first request to hit the if statement will see there is no content under that key, trigger the "else" part, retrieve the content, and cache it.
My questions:
What happens with other requests?
Does the second request in line already see that there is content under the key and retrieve it?
What happens if the write of the content to redis is still in progress from the request which triggered it?
If the second request also executes the "else" part of the statement, will there be a second write in progress?
Or will the write get skipped and the content from the database be returned until the write is complete?
Who gets the cached content?
To answer your question (without going into discussion if this problem shouldn't be solved differently): you could use locks (either a distributed one if your app is on multiple nodes or PHP's semaphores if your app runs on a single node).
Please note that this is pseudo-code, not an actual PHP implementation:
$contentProvider->getData('my_key');

class ContentProvider
{
    private $lockHandler;

    public function getData($key)
    {
        $this->lockHandler->acquire($key);
        try {
            // Do the if/else part here (placeholder method): read from the cache,
            // or load from the DB and store it, and put the result in $data
            $data = $this->readFromCacheOrGenerate($key);
        } finally {
            $this->lockHandler->release($key);   // always release, even if generation fails
        }

        return $data;
    }
}
You basically need to block around the whole if/else part. Let's say you have 1000 requests. Then 1 request will acquire the lock and the rest (999) will wait for that one to generate the content and store it in the cache. After that, the remaining 999 requests will follow.
The problem is that now you have 1000 requests accessing the cached content sequentially. There is no parallelism anymore, because every call to getData will block.
It might not be a big deal if you are running on a single node using semaphores, but the moment you need to switch to distributed locks this can affect your performance dramatically (because all calls to getData run sequentially anyway, even though they are on different nodes).
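For the single-node case, a minimal sketch of such a lock with PHP's SysV semaphore functions (sysvsem extension) might look like this; readFromCache(), loadFromDatabase(), and writeToCache() are placeholders for your own cache/DB code:

function getData($key)
{
    $sem = sem_get(crc32($key), 1);      // one lock per cache key
    sem_acquire($sem);                   // every other request blocks here
    try {
        $data = readFromCache($key);             // placeholder
        if ($data === null) {
            $data = loadFromDatabase($key);      // placeholder
            writeToCache($key, $data, 10);       // placeholder, 10 s TTL
        }
    } finally {
        sem_release($sem);               // never keep holding the lock on failure
    }
    return $data;
}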
Having said that, I want to point out one thing. The above solution guarantees that the cache is only generated once, and it may seem viable. But in reality it is not: the whole point of having a cache is to be fast and able to handle requests concurrently, and with locks that is simply impossible.
So if running into such a situation is very probable, then caching on request is simply not an option and you need to find another solution. However, if it is very unlikely to happen, just stay with the original code (without locks). Even if you generate the same data multiple times every now and then, it is probably not going to be an issue.
I have a program in which an AJAX request goes out to a PHP script that does all the behind-the-scenes work of grabbing the data from the database and returning it as responseText, which is pasted into my HTML to update the page without refreshing. I make this request once every 250 milliseconds (so the PHP script gets 4 hits per second for every user on the page firing the AJAX requests). I am already seeing PHP crash with a few computers on at the same time, so I'm guessing the problem has something to do with PHP getting a lot of requests. Is there a way to do these requests so that a lot of users can be on without this scalability issue coming into play?
First of all, firing that many AJAX requests isn't a good idea; as the number of users increases, the server load from serving those requests grows quickly. Second, you need to consider how to scale the application and the database. I guess you might already be returning json_encode()d data from the server; if not, make it so.
4 requests per second is nothing for PHP. So, in short: to make PHP scale, make your code scalable.
I am working on a tool in PHP that processes a lot of data and takes a while to finish. I would like to keep the user updated on what is going on and which task is currently being processed.
What is, in your opinion, the best way to do it? I've got some ideas but can't decide on the most effective one:
The old way: execute a small part of the script and display a page to the user with a Meta Redirect or a JavaScript timer to send a request to continue the script (like /script.php?step=2).
Sending AJAX requests constantly to read a server file that PHP keeps updating through fwrite().
Same as above but PHP updates a field in the database instead of saving a file.
Does any of those sound good? Any ideas?
Thanks!
Rather than writing to a static file you fetch with AJAX or to an extra database field, why not have another PHP script that simply returns a completion percentage for the specified task. Your page can then update the progress via a very lightweight AJAX request to said PHP script.
As for implementing this "progress" script, I could offer more advice if I had more insight as to what you mean by "processes a lot of data". If you are writing to a file, your "progress" script could simply check the file size and return the percentage complete. For more complex tasks, you might assign benchmarks to particular processes and return an estimated percentage complete based on which process has completed last or is currently running.
UPDATE
This is one suggested method to "check the progress" of an active script which is simply waiting for a response from a request. I have a data mining application that I use a similar method for.
In the script that makes the request you're waiting for (the script whose progress you want to check), you can store a progress variable for the process, either in a file or in a database. (I use a database, as I have hundreds of processes running at any time that all need to track their progress, plus another script that lets me monitor them.) When the process begins, set this variable to 1. You can pick an arbitrary number of 'checkpoints' the script will pass and calculate the percentage from the current checkpoint.

For a large request, however, you might be more interested in roughly what percentage of the request has completed. One option is to know the size of the returned content and set your status variable according to the percentage received at any moment: if you receive the request data in a loop, you can update the status on each iteration, and if you are downloading to a flat file you can poll the size of the file. This could be done less accurately with time (rather than file size) if you know approximately how long the request should take and simply compare against the script's current execution time. Obviously neither of these is a perfect solution, but I hope they give you some insight into your options.
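As a rough, file-based illustration of the checkpoint idea (the task id, the step functions, and the /tmp path are made up for the example):

// worker.php - the long-running task records its own progress
function setProgress($taskId, $percent)
{
    file_put_contents('/tmp/progress_' . $taskId, (string) $percent, LOCK_EX);
}

$taskId = 'report42';        // made-up id; in practice pass it in with the request
setProgress($taskId, 0);
fetchData();                 // placeholder steps for the real work
setProgress($taskId, 33);
processData();
setProgress($taskId, 66);
writeResults();
setProgress($taskId, 100);

// progress.php - a separate, lightweight script the page polls via AJAX
$taskId  = preg_replace('/\W/', '', isset($_GET['task']) ? $_GET['task'] : '');
$file    = '/tmp/progress_' . $taskId;
$percent = is_file($file) ? (int) file_get_contents($file) : 0;
header('Content-Type: application/json');
echo json_encode(array('progress' => $percent));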
I suggest using the AJAX method, but not a file or a database. You could probably use session values or something like that; that way you don't have to open a connection or a file to do anything.
In the past, I've just written messages out to the page and used flush() to flush the output buffer. Very simple, but it may not work correctly on every web server or with every web browser (as they may do their own internal buffering).
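A tiny sketch of that technique, where $steps and runStep() stand in for the real work:

foreach ($steps as $i => $step) {
    runStep($step);                                                    // placeholder for the actual work
    echo 'Finished step ' . ($i + 1) . ' of ' . count($steps) . "<br>\n";
    if (ob_get_level() > 0) {
        ob_flush();                                                    // empty PHP's own output buffer first
    }
    flush();                                                           // then push the web server buffer to the client
}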
Personally, I like your second option the best. Should be reliable and fairly simple to implement.
I like option 2 - using AJAX to read a status file that PHP writes to periodically. This opens up a lot of different presentation options. If you write a JSON object to the file, you can easily parse it and display things like a progress bar, status messages, etc...
A 'dirty' but quick-and-easy approach is to just echo out the status as the script runs along. As long as you don't have output buffering on, the browser will render the HTML as it receives it from the server (I know WordPress uses this technique for its auto-upgrade).
But yes, a 'better' approach would be AJAX, though I wouldn't say there's anything wrong with 'breaking it up' using redirects.
Why not incorporate 1 & 2, where AJAX sends a request to script.php?step=1, checks response, writes to the browser, then goes back for more at script.php?step=2 and so on?
If you can do away with IE, then use server-sent events; it's the ideal solution.
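A bare-bones PHP endpoint for server-sent events might look roughly like this (getCurrentProgress() is a placeholder; on the client you would read the stream with new EventSource('progress-sse.php')):

// progress-sse.php - stream progress updates to the browser
header('Content-Type: text/event-stream');
header('Cache-Control: no-cache');

while (true) {
    $percent = getCurrentProgress();                           // placeholder: read from a file or the DB
    echo 'data: ' . json_encode(array('progress' => $percent)) . "\n\n";
    if (ob_get_level() > 0) {
        ob_flush();
    }
    flush();
    if ($percent >= 100) {
        break;                                                 // task finished, close the stream
    }
    sleep(1);
}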
I've been searching for how to do this, but my searches aren't turning up anything about what I'm trying to do, so maybe I'm not using the right terms, or maybe it isn't possible, but I figured I would ask here for help. This is what I'm trying to do.
I have PHP scripts that are called asynchronously: a script is called and just runs, and the calling PHP doesn't wait for a response, so it can go on to do other stuff and free things up so another async PHP process can run.
I would still like to get a result back from these "zombie" scripts, or whatever you want to call them. However, the only way I can think of that I know for sure will work is to have the zombie script save its final output to a database and then have my AJAX UI make periodic requests to that database to check whether the needed value exists where it is supposed to, which would let it pick up the output from the zombie PHP script.
I'm thinking it would be better if the zombie script could somehow push a sort of page refresh to the AJAX UI, which would intercept it and just take the received data from PHP and use it as needed (such as displaying it in a DIV for the user to see). Basically, I'm wondering whether you can make PHP force this kind of thing rather than involving a database and making AJAX do repeated requests to check for a specific value.
Thanks for any advice
No, a background script has no way to influence the client's front-end because it has no connection to it.
Starting a background script, having the script write status data into a shared space - be it a database or a memcache or a similar solution - and polling the status through Ajax is usually indeed the best way to go.
One alternative may be Comet. It's a technique where a connection is kept open over a long time, and updated actively from the server side (instead of frequent client-side Ajax polling). I have no practical experience with this but I imagine it most probably needs server side tweaking to be doable in PHP - it's not the best platform for long-running stuff. See this question for some approaches.
My AJAX search program keeps asking PHP for the results of a particular search term. The start of the PHP script reads through a MySQL DB and initializes, so if the PHP script keeps restarting, it will have to read the DB millions of times. If I could keep it alive and let multiple silly AJAX requests be served by a single running PHP script instance, I'm sure performance would improve.
How do you do this typically? Use a service? Can this be done with services?
PHP has no concept of long-lived objects or shared state between threads or requests, every request always starts at zero (except for the session state, of course). You can emulate long-lived objects by caching to disk or memory (see memcached).
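For instance, a minimal sketch with the Memcached extension (the key name, the TTL, and buildIndexFromDatabase() are made up):

$memcached = new Memcached();
$memcached->addServer('127.0.0.1', 11211);

$index = $memcached->get('search_index');
if ($index === false) {
    $index = buildIndexFromDatabase();              // placeholder for the expensive initialization
    $memcached->set('search_index', $index, 300);   // keep it for 5 minutes
}
// every request can now reuse $index instead of re-reading the whole DB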
Do you have a particular reason to read the entire database when your script initializes?
How about storing the DB results in a session variable? You'd first check whether the keyword is already in the session (sessions let you carry variable values between page refreshes), and if not, do a DB query.
To store it:
$_SESSION['storedQueries']['keyword']= 'its definition, from the database';
To look for it:
$result = isset($_SESSION['storedQueries']['keyword'])
    ? $_SESSION['storedQueries']['keyword']
    : 'nothing found, you should do a db query';
The AJAX part is pretty easy if you use a JavaScript library such as jQuery:
$('#resultZone').load('find.php',{keyword: $('input.search').val() });
If you know the results are the same every time, just move those results to a session variable.
PHP sessions are explained pretty well in their documentation:
http://us3.php.net/manual/en/book.session.php
If the search result is something that would be the same for multiple users, I usually create a cache file and serialize the result set into it. As a filename I might use the md5 hash of a string containing the search query and perhaps the user group. Then, when the AJAX call needs the data, I just check whether the file is too old; if not, I send it to the client, or maybe even just redirect the AJAX HTTP request to the file (assuming it is formatted properly). If the file is too old, I refresh it with new content.
For very high-volume sites memcached is usually a better option. Some kind of PHP cache also helps, and SQL connection pooling lowers the overhead of opening SQL connections.
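A sketch of that file-cache approach (the path, the TTL, and runSearchQuery() are illustrative):

function getCachedResults($query, $ttlSeconds = 300)
{
    $cacheFile = '/tmp/search_' . md5($query) . '.cache';

    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttlSeconds) {
        return unserialize(file_get_contents($cacheFile));    // still fresh, serve from the file
    }

    $results = runSearchQuery($query);                        // placeholder for the real DB call
    file_put_contents($cacheFile, serialize($results), LOCK_EX);
    return $results;
}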
Connecting to the DB is a very expensive operation, and you can get around that by caching the results. Take a look at Zend_Cache and see how it can save you a lot of headaches.