I'm using the Amazon Product Advertising API to handle my full-text search. The problem is that the response takes up to 3-4 seconds (which is about half of my total page load time of 6-8 seconds). Are there any general techniques I could use to improve the response time? I'm already receiving the response in compressed format.
Ultimately, I want to be able to display the search engine results page to the user as quickly as possible.
I think you're asking about the concept of Web 2.0. This is where, in your case, you serve the page immediately and then use an AJAX request to populate it with the content several seconds later - all the while the user sees a spinning animated GIF while waiting for your data payload.
You may want to read further about SOA (service-oriented architecture) - this is just one of dozens of programming paradigms that fit with the whole Web 2.0 theme.
Communicating with external web services is nearly always slow, usually unacceptably so. In this case, the only piece you'll really be able to optimize is the connection overhead. If you were to keep a daemon running locally that maintained a keepalive connection to the Amazon web service, then fired requests through that, you could avoid the connection overhead and improve response times.
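To illustrate the connection-overhead point, here's a minimal PHP sketch (the Amazon URLs are placeholders): firing several requests through one cURL handle lets cURL reuse the underlying TCP/TLS connection, so only the first request pays the handshake cost.

    <?php
    // Reusing a single cURL handle keeps the TCP/TLS connection alive,
    // so repeated requests to the same host skip the handshake cost.
    $ch = curl_init();
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_ENCODING       => '',  // keep accepting compressed responses
        CURLOPT_TIMEOUT        => 5,
    ));

    function fetch($ch, $url) {
        curl_setopt($ch, CURLOPT_URL, $url);
        return curl_exec($ch);         // the connection is reused across calls
    }

    // Placeholder URLs: both requests travel over the same connection.
    $a = fetch($ch, 'https://webservices.amazon.com/...?Operation=ItemSearch');
    $b = fetch($ch, 'https://webservices.amazon.com/...?Operation=ItemLookup');
    curl_close($ch);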
From a UX perspective, you're probably better off executing the search via an AJAX request to the server. You can display a spinner to the user, and then populate the page when the request returns. This would probably make it feel a bit more responsive, since they wouldn't be waiting on the whole page to build.
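A minimal sketch of that pattern, assuming a hypothetical search.php endpoint that performs the slow Amazon call and returns an HTML fragment (spinner.gif is also made up):

    <?php // page.php - serve the shell immediately; results arrive via AJAX ?>
    <div id="results"><img src="spinner.gif" alt="Loading..."></div>
    <script>
    // Fetch the slow results after the page has painted.
    var xhr = new XMLHttpRequest();
    xhr.onload = function () {
        document.getElementById('results').innerHTML = xhr.responseText;
    };
    xhr.open('GET', 'search.php?q=<?php echo urlencode($_GET['q'] ?? ''); ?>');
    xhr.send();
    </script>

The user sees a complete page in well under a second, and the 3-4 second wait only affects the results area.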
I've been reading a lot on the subject of SSE and PHP, most of which seems to advocate it as a viable solution for all sorts of things, including chat apps. I have seen similar questions on this site but have not found a concise, definitive answer.
Is there something inherent in SSE which makes it way more server-friendly than AJAX short polling? The headers appear to be of very similar size, so I am wondering if there is some kind of behind-the-scenes stuff beyond the headers that a noob like myself can't see, e.g. some sort of connection recognition with each request/response? I know there are other factors involved where SSE prevails, such as handling disconnections.
In terms of using it in a chat app scenario, AJAX and SSE appear to be doing the same thing. Neither of them seems to be able to perform long polling effectively with PHP. If I have User A and User B waiting on a PHP script that checks for new messages from the other user in the DB, then sleeps for 3 seconds, for say 10 loops, User A's new message cannot be inserted until User B has looped through the entire checking script, thereby rendering it absolutely useless (at least based on everything I've tried in the last 2 weeks!). I can get it working smoothly if I chat with myself and no one else is waiting on the checking script, but I've run out of things to talk about with myself and would really enjoy someone else being able to use it too.
So, in a nutshell: given an Apache and PHP environment where WebSockets are not an option (due to shared hosting), is short polling with one's choice of either AJAX or SSE the only effective way to write a chat app, based on server burden alone, or is SSE definitely the superior option?
I would pursue WebSockets if the eventual traffic called for it and justified the web hosting upgrade.
(Also, as an aside: is my premise off base regarding the long-polling scenario I described above where User A must wait for User B's loop to finish before he/she/it can perform the insert? It's got me confused as to why that should be the case.)
Kind of a long-winded, meandering question but hoping someone in the same situation can find this question and save themselves a lot of time.
Many Thanks!
Yes, SSE is a better option than AJAX, because AJAX polling hits your main servers, where most of the normal user traffic already goes, whereas SSE polling can be handled by a separate instance built for it, so there is no extra traffic on the main server. Please check out Mercure (https://mercure.rocks/).
EDIT:
To clarify: using SSE with a platform like Mercure would be a better option than AJAX, because AJAX polling sends every request to the main server, increasing its request count. With a tool like Mercure, you can distribute that network load and still achieve the required functionality.
SSE can be thought of as a thin API wrapper around the AJAX long-poll approach. It brings a standard API to something that was a hacky solution before.
something inherent in SSE which makes it way more server-friendly than AJAX short polling?
It holds the socket open. The pro of this is less latency (as soon as the server has the new information it sends it to the client, rather than waiting for the next client poll); the con is the extra resource usage (the socket, and the PHP process).
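For reference, a minimal PHP SSE endpoint looks something like this (fetch_new_messages() is a hypothetical DB helper); the essentials are the text/event-stream content type and the loop that holds the connection open:

    <?php
    // events.php - a minimal SSE endpoint: the socket (and this PHP
    // process) stay alive, and new data is pushed as soon as it exists.
    header('Content-Type: text/event-stream');
    header('Cache-Control: no-cache');
    set_time_limit(0);                 // stop PHP killing the long script

    $lastId = 0;
    while (!connection_aborted()) {
        $messages = fetch_new_messages($lastId);   // hypothetical DB helper
        foreach ($messages as $msg) {
            $lastId = $msg['id'];
            echo "id: {$msg['id']}\n";
            echo 'data: ' . json_encode($msg) . "\n\n";
        }
        @ob_flush();
        flush();                       // push events out immediately
        sleep(2);                      // the DB is still polled, but the
    }                                  // client connection is not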
but I've run out of things to talk about with myself
Surely not. Have you tried starting a chat about whether time is an illusion, and what came before?
where WebSockets are not an option (due to shared hosting)
SSE and WebSockets both hold a socket open. Shared hosting ISPs often go around closing sockets that have been open a long time (e.g. over 60s), unless they explicitly say they support SSE. They may also kill long-running PHP processes.
is my premise off base regarding the long-polling scenario I described above where User A must wait for User B's loop to finish before he/she/it can perform the insert?
I think it is off. The "A" in AJAX stands for asynchronous, meaning you can have multiple AJAX/SSE requests running at the same time, and on the server side you will have a distinct PHP process running for each request. (One caveat worth checking, though: if you were testing User A and User B in two tabs of the same browser, PHP's session locking could produce exactly the blocking you describe: session_start() takes an exclusive lock on the session file, so the second request stalls until the first script finishes or calls session_write_close().)
I've got a rather large PHP web app which gets its products from numerous other suppliers through their APIs, usually responding with a large XML document to parse. Currently there are 20 suppliers, but this is due to rise even further.
Our current setup uses the curl_multi functions to make the requests, and this takes about 30-40 seconds to complete, which is too long. The script runs in the background whilst the front end polls the database looking for results and then displays them as they come in.
To improve this process we were thinking of using a job server to run in the background, each supplier request being a separate job. We've seen beanstalkd and Gearman being mentioned.
So are we looking in the right direction? That is, is a job server the right way to go? We're looking at doing some promotion soon, so we may get 200+ users searching 30 suppliers at the same time, and the right choice needs to scale well if we have to load balance.
Any advice is gratefully received.
You can use Beanstalkd, as you can customize the priority of jobs and the TTR (time-to-run; the default is 60 seconds, but for your scenario you should increase it). There is also a nice admin console panel for Beanstalkd.
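As a sketch, using the Pheanstalk client library (v4-style API; the tube name, payload, and helper function are made up), with the TTR raised on put():

    <?php
    // producer.php - queue one job per supplier, with a TTR above the
    // worst-case API response time.
    use Pheanstalk\Pheanstalk;

    $queue = Pheanstalk::create('127.0.0.1');
    foreach ($supplierIds as $id) {          // hypothetical supplier list
        $queue->useTube('supplier-search')
              ->put(json_encode(array('supplier' => $id, 'query' => $query)),
                    Pheanstalk::DEFAULT_PRIORITY,
                    0,                       // no delay
                    120);                    // TTR raised from the 60s default
    }

    // worker.php - run several of these in parallel, one per connection.
    $queue = Pheanstalk::create('127.0.0.1');
    $queue->watch('supplier-search');
    while ($job = $queue->reserve()) {
        $task = json_decode($job->getData(), true);
        fetch_and_store_results($task);      // hypothetical: call the API,
        $queue->delete($job);                // write results to the DB
    }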
You should also keep the parallel cURL calls: use curl_multi so the supplier requests run concurrently. To make use of keep-alive, you also need to maintain a pool of cURL handles and keep them warm; see the usual high-performance cURL tips. You may also need to tune the Linux network stack.
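A sketch of the warm-pool idea (the supplier URL list is a placeholder): build one easy handle per supplier once, then drive the same handles through a multi handle on every search, so the connections stay alive between batches.

    <?php
    // Build the pool once; each handle holds a live connection to one supplier.
    $pool = array();
    foreach ($supplierUrls as $key => $url) {   // hypothetical URL list
        $ch = curl_init();
        curl_setopt_array($ch, array(
            CURLOPT_URL            => $url,
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_TIMEOUT        => 20,
        ));
        $pool[$key] = $ch;
    }

    // On each search, drive all the warm handles in parallel.
    $mh = curl_multi_init();
    foreach ($pool as $ch) {
        curl_multi_add_handle($mh, $ch);
    }
    do {
        curl_multi_exec($mh, $running);
        curl_multi_select($mh);                 // wait for network activity
    } while ($running > 0);

    $xml = array();
    foreach ($pool as $key => $ch) {
        $xml[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);     // the easy handle, and its
    }                                           // connection, stay warm for next time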
If you run this in the cloud, make sure you use multiple micro machines rather than one heavy machine, as the throughput is better when you have multiple resources available.
I've got a small PHP web app I put together to automate some manual processes that were tedious and time consuming. The app is pretty much a GUI that SSHes out and "installs" software to target machines based off of atomic change #'s from source control (Perforce, if it matters). The app currently kicks off each installation in a new popup window. So, say I'm installing software to 10 different machines, I get 10 different popups. This is getting to be too much. What are my options for kicking these processes off and displaying the results back on one page?
I was thinking I could have one popup that dynamically creates divs for every installation I kick off, make an AJAX call for each one, then display the output for each install in the corresponding div. The only problem is, I don't know how I can kick these processes off in parallel. It'll take way too long if I have to wait for each one to go out, do its thing, and spit the results back. I'm using jQuery if it helps, but I'm looking mainly for high-level architecture ideas atm. Code examples are welcome, but pseudo code is just fine.
I don't know how advanced you are, or even if you have root access to your server (which would be required), but this is one possible way. It uses several different technologies and would probably be suited to a large-scale application rather than a small one, but I'll advise you on it anyway.
The following technologies/stacks are used (in addition to PHP, as you mentioned):
WebSockets (on top of node.js)
JSON-RPC Server (within node.js)
Gearman
What you would do is, from your client (so via JavaScript), when the page loads, make a connection to node.js via WebSockets (you can use something like socket.io for this).
Then, when you decide that you want to do a task (which might take a long time...), you send a request to your server. This might be some JSON-encoded raw body, or it might just be a simple GET /do/something. What is important is what happens next.
On your server, when the request is received, you kick off a new job in Gearman by adding a task to the Gearman server. Submitting the task is non-blocking, so you can respond immediately to the client who made the request, saying "hey, we are processing your job".
Then your server with all of your Gearman workers receives the job and starts processing it. This might take 5 minutes, let's say, for argument's sake. Once it has finished, the worker makes a JSON-encoded message which it sends to your node.js server, which receives it via JSON-RPC.
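In PHP with the pecl gearman extension, the two sides look roughly like this (the function name, payload, and helper functions are made up):

    <?php
    // Client side (inside the web request) - doBackground() returns
    // immediately, so the HTTP response isn't held up by the job.
    $client = new GearmanClient();
    $client->addServer();                 // defaults to 127.0.0.1:4730
    $client->doBackground('long_task', json_encode(array('id' => 42)));

    // Worker side - a long-running CLI process that chews through jobs.
    $worker = new GearmanWorker();
    $worker->addServer();
    $worker->addFunction('long_task', function (GearmanJob $job) {
        $payload = json_decode($job->workload(), true);
        do_the_slow_thing($payload);      // hypothetical 5-minute task
        notify_node_server($payload);     // hypothetical JSON-RPC call to node.js
    });
    while ($worker->work());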
After it grabs the message, it can then emit the event to any connections which need to know about it via websockets.
I needed something like this for a project once and managed to learn the basics of node.js in a day (already having a strong JS background). By the second day I had a complete push/pull messaging job notification platform.
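That said, if the full stack above is overkill for 10 machines, a bare-bones version of the div-per-install idea from your question could just detach the installs as background processes and let the page poll their logs (script names and paths here are made up):

    <?php
    // start.php - launch one detached install per machine.
    foreach ($_POST['hosts'] as $host) {
        $log = '/tmp/install-' . md5($host) . '.log';
        // nohup + output redirection + & detaches the process, so this
        // loop returns immediately for all machines.
        exec(sprintf('nohup ./install.sh %s > %s 2>&1 &',
                     escapeshellarg($host), escapeshellarg($log)));
    }

    // status.php - the page polls this via AJAX and writes each log
    // into the matching div.
    $out = array();
    foreach (glob('/tmp/install-*.log') as $file) {
        $out[basename($file)] = file_get_contents($file);
    }
    header('Content-Type: application/json');
    echo json_encode($out);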
I have a game running on N EC2 servers, each with its own players inside (let's assume it's a self-contained game on each server).
What is the best way to develop a frontend for this game that gives me near real-time information on all the players on all servers?
My initial approach was:
Have a common-purpose shared-hosting PHP website polling data from each server (1 socket for each server). Because most shared solutions don't really offer permanent sockets, this would require me to create and process a connection every 5 seconds or so. Because there isn't a cron job with that granularity, I would end up using the requests of one unfortunate client to make this update. There's so much wrong here; let's consider this the worst-case scenario.
The best scenario (I guess) would be to create a small EC2 instance with some Python/Ruby/PHP web-based frontend, plus a server application designed just for polling the game servers and saving the data in the website's database. Although this should work fine, I was looking for a solution where I don't need to spend that much money (even a micro instance is expensive for such a pet project).
What's the best and cheapest solution for this?
Is there a reason you can't have one server poll the others, stash the results in a json file, then push that file to the web server in question? The clients could then use ajax to update the listings in near real time pretty easily.
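A sketch of that poller, run from cron every minute or so (the server IPs and the players.json endpoint on each game server are hypothetical):

    <?php
    // poll.php - run from cron; asks each game server for its player
    // list and writes one combined JSON file for the frontend.
    $servers = array('10.0.0.1', '10.0.0.2');    // hypothetical server IPs
    $all = array();
    foreach ($servers as $host) {
        $raw = @file_get_contents("http://$host/players.json");  // hypothetical endpoint
        if ($raw !== false) {
            $all[$host] = json_decode($raw, true);
        }
    }
    $all['updated'] = time();                    // timestamp for the clients
    file_put_contents('/var/www/html/players.json', json_encode($all), LOCK_EX);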
If you don't control the game servers, I'd pass the work of updating the JSON off to one of the random client requests. It's not as bad as you think, though.
Consider the following:
Deliver the (now expired) data to the client, including a timestamp
Call flush() (test to make sure the page is fully rendered; you may need to send whitespace or something to fill the buffer, depending on how the web server is configured. Appending flush(); sleep(4); echo "hi"; to a PHP script should be an easy way to test)
Call ignore_user_abort() (http://php.net/manual/en/function.ignore-user-abort.php) so your script will continue executing regardless of what the user does
Poll all the servers and update your file
The client waits a suitable amount of time before fetching the updated stats via AJAX
Yes, that client's request ends up taking a long time, but it doesn't affect their page load, so they might not even notice.
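Putting those steps together in PHP (update_stats_file() is a hypothetical function that does the actual polling):

    <?php
    // stats.php - serve the stale data fast, then keep running after
    // the response has gone out to refresh the cache.
    ignore_user_abort(true);               // keep executing if the user leaves
    set_time_limit(0);

    echo file_get_contents('stats.json');  // expired data plus its timestamp
    while (ob_get_level() > 0) {
        ob_end_flush();                    // drain every output buffer
    }
    flush();
    if (function_exists('fastcgi_finish_request')) {
        fastcgi_finish_request();          // on PHP-FPM this cleanly ends the
    }                                      // response; otherwise the padding /
                                           // Content-Length tricks noted above
                                           // may be needed

    update_stats_file();                   // hypothetical: poll the game
                                           // servers and rewrite stats.json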
You don't provide the information needed to make a decision on this. It depends on the number of players, number of servers, number of games, communication between players, amount of memory and cpu needed per game/player, delay and transfer rate of the communications channels, geographical distribution of your players, update rate needed, allowed movement of the players, mutual visibility. A database should not initially be part of the solution, as it only adds extra delay and complexity. Make it work real-time first.
A really cheap option would be to use netnews (NNTP) for this.
I'm keeping myself busy working on an app that gets a feed from the Twitter search API. I then need to extract all the URLs from each status in the feed, and finally, since lots of the URLs are shortened, I check the response headers of each URL to get the real URL it leads to.
For a feed of 100 entries this process can take more than a minute!! (I'm still working locally on my PC.)
I'm initializing the cURL resource once per feed and keeping it open until I've finished all the URL expansions. Though this has helped a bit, I'm still worried that I'll be in trouble when going live.
Any ideas on how to speed things up?
The issue is, as Asaph points out, that you're doing this in a single-threaded process, so all of the network latency is being serialized.
Does this all have to happen inside an http request, or can you queue URLs somewhere, and have some background process chew through them?
If you can do the latter, that's the way to go.
If you must do the former, you can do the same sort of thing.
Either way, you want to look at ways to chew through the requests in parallel. You could write a command-line PHP script that forks to accomplish this, though you might be better off looking into writing such a beast in a language that supports threading, such as Ruby or Python.
You may be able to get significantly increased performance by making your application multithreaded. Multi-threading is not supported directly by PHP per se, but you may be able to launch several PHP processes, each working on a concurrent processing job.
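Within a single PHP process you can at least overlap the network waits using the curl_multi functions; a sketch for the URL-expansion step (assuming you already have the list of shortened URLs):

    <?php
    // Expand many short URLs in parallel: HEAD requests that follow
    // redirects, then read the final URL back from each handle.
    function expand_urls(array $urls) {
        $mh = curl_multi_init();
        $handles = array();
        foreach ($urls as $url) {
            $ch = curl_init($url);
            curl_setopt_array($ch, array(
                CURLOPT_NOBODY         => true,  // headers are enough
                CURLOPT_FOLLOWLOCATION => true,  // chase the redirect chain
                CURLOPT_RETURNTRANSFER => true,
                CURLOPT_TIMEOUT        => 10,
            ));
            curl_multi_add_handle($mh, $ch);
            $handles[$url] = $ch;
        }
        do {
            curl_multi_exec($mh, $running);
            curl_multi_select($mh);              // block until something happens
        } while ($running > 0);

        $expanded = array();
        foreach ($handles as $url => $ch) {
            $expanded[$url] = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
            curl_multi_remove_handle($mh, $ch);
            curl_close($ch);
        }
        curl_multi_close($mh);
        return $expanded;
    }

Note that some shorteners respond badly to HEAD requests, so you may need to drop CURLOPT_NOBODY and cap the transfer size instead; either way, batching 100 URLs like this turns 100 sequential round trips into roughly one.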