I have a PHP site in which I make an AJAX call. Inside that AJAX call I call an API that returns XML, which I then parse. The problem is that the XML is sometimes so huge that this takes a very long time. The load balancer in EC2 has a timeout value of 20 minutes, so if my call takes longer than that I get a 504 error. How can I solve this? I know it's a server issue, but how can I work around it? I don't think php.ini is helpful here.
HTTP is a stateless protocol. It works best when responses to requests are made within a few seconds of the request. When you don't respond quickly, timeouts start coming into play. This might be a timeout you can control (fcgi process timeout) or one you can't control (third party proxy, client browser).
So what do you do when you have work that will take longer than a few seconds? Use a message queue of course.
The cheap way to do this is to store the job in a database table and have cron read from the table and process the work. This can work on a small scale, but it runs into problems as you grow.
The proper way to do this is to use a real message queue system. Amazon has SQS, but you could just as well use Gearman, ZeroMQ, RabbitMQ, or others to handle this.
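As a rough illustration, here is a minimal sketch of handing the slow XML work to SQS with the AWS SDK for PHP. The queue URL, region, and job payload are assumptions, not something from the original question; a separate worker process would consume the queue with receiveMessage()/deleteMessage() and do the actual XML parsing.

<?php
// Minimal sketch: the AJAX endpoint only records the job and returns at once.
// Assumes the AWS SDK for PHP (composer require aws/aws-sdk-php) is installed.
require 'vendor/autoload.php';

use Aws\Sqs\SqsClient;

$sqs = new SqsClient([
    'region'  => 'us-east-1',   // assumption: use your own region
    'version' => '2012-11-05',
]);

// Hypothetical queue URL and payload.
$queueUrl = 'https://sqs.us-east-1.amazonaws.com/123456789012/xml-jobs';
$job      = ['apiUrl' => 'https://api.example.com/huge.xml', 'userId' => 42];

$sqs->sendMessage([
    'QueueUrl'    => $queueUrl,
    'MessageBody' => json_encode($job),
]);

header('Content-Type: application/json');
echo json_encode(['status' => 'queued']);

The AJAX response comes back in milliseconds, and the browser can poll a lightweight "is it done yet?" endpoint instead of sitting on one long request.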
I have a cURL request where, at times and depending on the request load, it takes minutes to receive a response from the API because of processing and calculations.
For the sake of good user experience, this behavior is undesirable.
While waiting for the response, sometimes a long time, the user is unable to perform any functions on the website.
So I am looking for a way to let the user keep using the application while it waits for results from the API.
Solutions I have already considered:
Recording the request and using a cron job to process it.
Unfortunately there are a couple of pitfalls to that:
- The need to run a cron job every minute, or constantly.
- When the user's request is small or the API happens to be fast, the whole thing may only take 2-3 seconds. But if it was recorded, say, 30 seconds before the next cron run, you end up with a 32-second turnaround.
So this solution may improve some cases and worsen others; I am not sure I really like that.
Aborting the cURL request after a few seconds.
Although the API sends a separate response to a URL endpoint of my choice, and it seems safe to terminate the transmission after posting the request, there may be situations where those few seconds are not even enough to establish the connection with the API. What I am trying to say is that by terminating cURL I have no way of knowing whether the actual request made it through.
Is there another approach that I could consider?
Thank you.
Sounds like what you're asking for is an asynchronous cURL call.
Here's the curl_multi_init documentation
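For reference, a minimal curl_multi sketch is below; the URLs are placeholders. It runs several transfers concurrently and drives them with curl_multi_select() instead of blocking on each one in turn.

<?php
// Minimal curl_multi sketch: run several requests concurrently.
$urls = ['https://api.example.com/a', 'https://api.example.com/b']; // placeholders

$mh      = curl_multi_init();
$handles = [];

foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[] = $ch;
}

// Drive all transfers; curl_multi_select() waits for activity without busy-looping.
do {
    $status = curl_multi_exec($mh, $running);
    if ($running) {
        curl_multi_select($mh);
    }
} while ($running && $status === CURLM_OK);

$responses = [];
foreach ($handles as $ch) {
    $responses[] = curl_multi_getcontent($ch);
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);

Note that this parallelizes the transfers inside one PHP request; if the goal is to free the browser entirely, it is usually combined with the queue-and-poll pattern described in the other answers.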
I am running into this problem:
I am sending a request to the server using AJAX, which takes some parameters in; on the server side a PDF is generated.
The generation of the PDF can take a lot of time, depending on the data used.
The Elastic Load Balancer of AWS drops the socket after 60 seconds of an "idle" connection, so in that case my request fails.
I know it's possible to increase the timeout in the ELB settings, but not only is my sysadmin against it, it's also a false solution and bad practice.
I understand the best way to solve the problem would be to send data through the socket to sort of "tell ELB" that I am still active. Sending a dummy request to the server every 30 seconds doesn't work because of our architecture and the fact that the session is locked (i.e. we cannot have concurrent AJAX requests from the same session, otherwise one is pending until the other one finishes).
I tried just doing a GET request to files on the server, but it doesn't make a difference; I assume the socket being timed out is the one used by the original AJAX call.
The function on the server is fairly linear and almost impossible to split into multiple calls, and the idea of letting it run in the background and checking every 5 seconds until it's finished makes me uncomfortable in terms of resource control.
TL;DR: is there any elegant and efficient solution to keep a socket active while an AJAX request is pending?
Many thanks if anyone can help with this. I have found a couple of similar questions on SO, but both are answered with "call the Amazon team and ask them to increase the timeout in your settings", which sounds very bad to me.
Another approach is to divide the whole operation into two services:
The first service accepts an HTTP request to generate a PDF document. It finishes immediately after the request is accepted and returns a UUID or URL for checking the result.
The second service accepts the UUID and returns the PDF document if it's ready. If the PDF is not ready, this service can return an error code such as HTTP 404.
Since you are using AJAX to call the server side, it will be easy to change your JavaScript to call the second service once the first service has finished successfully. Would this work for your scenario?
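A rough sketch of what the two services could look like in PHP follows. The temp paths, the background launch via exec(), and the generate-pdf.php worker script are illustrative assumptions; a proper job queue would be cleaner.

<?php
// start-pdf.php -- service 1: accept the request, start the work, return a UUID.
$uuid = bin2hex(random_bytes(16));
file_put_contents("/tmp/pdf-{$uuid}.json", json_encode($_POST)); // assumption: shared temp dir

// Launch the (hypothetical) worker in the background so this request returns at once.
exec("php /var/www/worker/generate-pdf.php {$uuid} > /dev/null 2>&1 &");

header('Content-Type: application/json');
echo json_encode(['id' => $uuid]);

<?php
// get-pdf.php -- service 2: return the PDF if it is ready, otherwise a 404 so
// the client-side JavaScript knows to poll again later.
$uuid    = preg_replace('/[^a-f0-9]/', '', $_GET['id'] ?? '');
$pdfFile = "/tmp/pdf-{$uuid}.pdf";

if ($uuid === '' || !is_file($pdfFile)) {
    http_response_code(404);
    exit;
}

header('Content-Type: application/pdf');
readfile($pdfFile);

On the browser side, the AJAX code calls the first service, then polls the second every few seconds with the returned UUID until it stops getting a 404.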
Have you tried following the ELB troubleshooting guide? The relevant part is quoted below:
HTTP 504: Gateway Timeout
Description: Indicates that the load balancer closed a connection because a request did not complete within the idle timeout period.
Cause 1: The application takes longer to respond than the configured idle timeout.
Solution 1: Monitor the HTTPCode_ELB_5XX and Latency metrics. If there is an increase in these metrics, it could be due to the application not responding within the idle timeout period. For details about the requests that are timing out, enable access logs on the load balancer and review the 504 response codes in the logs that are generated by Elastic Load Balancing. If necessary, you can increase your capacity or increase the configured idle timeout so that lengthy operations (such as uploading a large file) can complete.
Cause 2: Registered instances closing the connection to Elastic Load Balancing.
Solution 2: Enable keep-alive settings on your EC2 instances and set the keep-alive timeout to greater than or equal to the idle timeout settings of your load balancer.
The problem is I have to use cURL, and sometimes the cURL requests take a long time because of timeouts. I have set the timeouts to 1 second, so no request should take more than 1 second, but the server is still unable to process other PHP requests.
My question is: how many concurrent scripts (running at the same time) can nginx/php-fpm handle? What I see is that a few requests lasting 1 second make the whole server unresponsive. What settings can I change so that more requests can be processed at the same time?
Multi-cURL is indeed not the solution to your problem, but asynchronicity probably is. I am not sure that the solution is tweaking nginx. It would scale better if you considered one of the following options:
You can abstract cURL with Guzzle (http://docs.guzzlephp.org/en/latest/) and use its approach to async calls and promises (see the sketch below).
You can use Gearman (http://gearman.org/getting-started/), which lets you send an async message to a remote server that processes the instruction based on a script you register for that message. (I use this mechanism for non-blocking logging.)
Either way, your call will be made in milliseconds and won't block your nginx, but your code will have to change a little bit.
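As a rough sketch of the Guzzle option (the URLs are placeholders, and Utils::settle() is the helper available in recent guzzlehttp/promises releases):

<?php
// Minimal sketch of Guzzle's async interface (composer require guzzlehttp/guzzle).
// With the default cURL handler the promises are resolved cooperatively when
// you wait on them, so the PHP worker is tied up for at most the slowest call.
require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Promise\Utils;

$client = new Client(['timeout' => 1.0]); // mirrors the 1-second budget in the question

$promises = [
    'a' => $client->getAsync('https://api.example.com/a'),
    'b' => $client->getAsync('https://api.example.com/b'),
];

// settle() collects successes and failures without throwing on the first error.
$results = Utils::settle($promises)->wait();

foreach ($results as $key => $result) {
    if ($result['state'] === 'fulfilled') {
        echo $key, ': ', $result['value']->getStatusCode(), PHP_EOL;
    } else {
        echo $key, ': failed', PHP_EOL;
    }
}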
PHP cURL did not respond in a timely manner because of DNS.
The problem was that I had to access files from a CDN, but the IP behind the domain changed frequently, and unfortunately cURL keeps a DNS cache.
So from time to time it would try to access files at IPs that were no longer valid but were still in php-curl's DNS cache.
I had to drop php-curl completely and use a plain file_get_contents(...) request instead. This completely solved the problem.
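For completeness, a plain file_get_contents() call can still be given a timeout through a stream context; a minimal sketch (the CDN URL is a placeholder):

<?php
// Sketch: file_get_contents() with an explicit timeout via a stream context.
// Unlike a long-lived cURL handle, there is no per-handle DNS cache to go stale.
$context = stream_context_create([
    'http' => [
        'timeout' => 5, // seconds to wait before giving up
    ],
]);

$body = file_get_contents('https://cdn.example.com/asset.js', false, $context);

if ($body === false) {
    error_log('CDN fetch failed'); // handle the failure instead of blocking further
}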
So a friend and I are building a web-based AJAX chat application with a jQuery and PHP core. Up to now we've been using the standard procedure of polling the server every two seconds or so for updates. However, I've come to dislike this method: it isn't fast, nor is it "cost effective", since there are tons of requests going back and forth to the server even when no data is returned.
One of our project supporters recommended we look into a technique known as COMET, or more specifically, long polling. However, after reading about it in various articles and blog posts, I've found that it isn't all that practical when used with Apache servers. Most people just say "it isn't a good idea" but don't give specifics about how many requests Apache can actually handle at one time.
The whole purpose of PureChat is to provide people with a chat that looks great, goes fast, and works on most servers. As such, I'm assuming that about 96% of our users will be using Apache rather than Lighttpd or nginx, which are supposedly better suited for long polling.
Getting to the Point:
In your opinion, is it better to continue using setInterval and repeatedly request new data? Or is it better to go with long polling, despite the fact that most users will be using Apache? Also, is it possible to get a more specific rundown of approximately how many people can be using the chat before an Apache server rolls over and dies?
As Andrew stated, a socket connection is the ultimate solution for asynchronous communication with a server, although only the most cutting-edge browsers support WebSockets at this point. socket.io is an open-source API that will initiate a WebSocket connection if the browser supports it, and fall back to a Flash alternative if it does not. This is transparent to the coder using the API, however.
Socket connections basically keep open communication between the browser and the server so that each can send messages to each other at any time. The socket server daemon would keep a list of connected subscribers, and when it receives a message from one of the subscribers, it can immediately send this message back out to all of the subscribers.
For socket connections, however, you need a socket server daemon running full time on your server. While this can be done with command-line PHP (no Apache needed), it is better suited to something like node.js, a non-blocking server-side JavaScript API.
node.js would also be better for what you are talking about, long polling. Basically, node.js is event-driven and single-threaded. This means you can keep many connections open without having to open as many threads, which would eat up tons of memory (Apache's problem). This allows for high availability. What you have to keep in mind, however, is that even if you were using a non-blocking web server like nginx, PHP has many blocking network calls. Since it is running on a single thread, each MySQL call, for instance, would basically halt the server until a response for that MySQL call is returned. Nothing else would get done while this is happening, making your non-blocking server useless. If, however, you used a non-blocking language like JavaScript (node.js) for your network calls, this would not be an issue. Instead of waiting for a response from MySQL, it would set a handler function to handle the response whenever it becomes available, allowing the server to handle other requests while it is waiting.
For long polling, you would basically send a request and the server would wait up to 50 seconds before responding. It will respond sooner than 50 seconds if it has anything to report; otherwise it waits. If there is nothing to report after 50 seconds, it sends a response anyway so that the browser does not time out. The response triggers the browser to send another request, and the process starts over. This allows for fewer requests and snappier responses, but again, it is not as good as a socket connection.
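To make that concrete, a minimal PHP long-polling endpoint might look like the sketch below. fetchNewMessages() is a hypothetical helper standing in for whatever storage the chat uses, and the 50-second window matches the description above.

<?php
// Minimal long-polling sketch: hold the request open for up to ~50 seconds and
// respond as soon as there is something to report.
session_write_close();            // release the session lock so other requests can proceed
set_time_limit(70);               // let the script outlive PHP's default 30-second limit

$since   = isset($_GET['since']) ? (int) $_GET['since'] : 0;
$started = time();

header('Content-Type: application/json');

while (time() - $started < 50) {
    $messages = fetchNewMessages($since);   // hypothetical helper
    if (!empty($messages)) {
        echo json_encode(['messages' => $messages]);
        exit;
    }
    usleep(500000);               // check twice a second instead of hammering the store
}

// Nothing new: respond anyway so the browser does not time out, then it re-polls.
echo json_encode(['messages' => []]);

Keep in mind that each held-open request still occupies an Apache worker for the full 50 seconds, which is exactly the scaling concern raised in the question.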
I'm using the rolling-curl [https://github.com/LionsAd/rolling-curl] library to asynchronously retrieve content from a large number of web resources as part of a scheduled task. The library allows you to set the maximum number of concurrent cURL connections; I started out at 20 but later moved up to 50 to increase speed.
It seems that every time I run it, arbitrary URLs out of the several thousand being processed simply fail and return a blank string. The more concurrent connections I have, the more failed requests I get. The same URL that failed one time may work the next time I run the function. What could be causing this, and how can I avoid it?
Everything Luc Franken wrote is accurate, and his answer led me to the solution to my version of the questioner's problem, which is:
Remote servers respond according to their own, highly variable, schedules. To give them enough time to respond, it's important to set two cURL parameters to allow a liberal amount of time:
CURLOPT_CONNECTTIMEOUT => 30
CURLOPT_TIMEOUT => 30
You can try longer and shorter amounts of time until you find something that minimizes errors. But if you're getting intermittent non-responses with curl/multi-curl/rolling-curl, you can likely solve most of the issue this way.
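Applied to a single handle, that looks like the sketch below (the URL is a placeholder; the same constants go into whatever options array your cURL wrapper accepts):

<?php
// Sketch: give each handle a generous connect timeout and total timeout.
$ch = curl_init('https://example.com/resource'); // placeholder URL

curl_setopt_array($ch, [
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_CONNECTTIMEOUT => 30,   // time allowed to establish the connection
    CURLOPT_TIMEOUT        => 30,   // time allowed for the whole transfer
]);

$body = curl_exec($ch);
if ($body === false) {
    error_log('curl error: ' . curl_error($ch));
}
curl_close($ch);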
In general you assume that this should not happen.
In the case of accessing external servers, that is just not the case. Your code should be fully aware that servers might not respond, might not respond in time, or might respond incorrectly. The HTTP process allows for things to go wrong: if you reach the server you should be notified by an HTTP error code (although that does not always happen), but network issues can also produce no response or a useless one.
Don't trust external input. That's the root of the issue.
In your concrete case, you keep increasing the number of requests. That means more requests, more open sockets, and more resource use. To find the solution to your exact issue you need deeper access to the server, so you can read the log files and monitor open connections and other resources. Preferably, test this on a test server with no other software creating connections, so you can isolate the issue.
But however well you test it, you are still left with uncertainties. For example, you might get blocked by the external servers because you make too many requests, or you might get caught in security filters such as DDoS protection. Monitoring and tuning the number of requests (automated or by hand) will give you the most stable solution. You could also just accept the lost requests and handle a stable retry queue which makes sure you get the content at a certain moment in time.
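One way to "accept the lost requests" and keep a stable retry queue is sketched below. fetchUrl() is a hypothetical wrapper around the cURL call shown earlier, and $urls stands for the list of several thousand URLs being processed.

<?php
// Sketch: retry URLs that returned an empty body, for a fixed number of passes.
$pending   = $urls;      // assumption: the URLs to process
$results   = [];
$maxPasses = 3;

for ($pass = 1; $pass <= $maxPasses && $pending; $pass++) {
    $failed = [];
    foreach ($pending as $url) {
        $body = fetchUrl($url);        // hypothetical wrapper around cURL
        if ($body === false || $body === '') {
            $failed[] = $url;          // try again on the next pass
        } else {
            $results[$url] = $body;
        }
    }
    $pending = $failed;
    sleep(5 * $pass);                  // back off a little between passes
}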