I have a cURL request that, at times and depending on the request load, takes minutes to receive a response from the API due to processing and calculations.
For the sake of good user experience, this behavior is undesirable.
While waiting for the response, sometimes for a long time, the user is unable to perform any functions on the website.
So I am looking for a way to let the user keep using the application while it waits for results from the API.
Solutions I have already considered:
Recording the request and using a cron job to process it.
Unfortunately, there are a couple of pitfalls to that:
- The need to run a cron job every minute, or constantly.
- When the user request is minimal or the API happens to be fast, the entire request may only take 2-3 seconds. But with cron, if the request arrives, say, 30 seconds before the next scheduled run, you end up with a 32-second turnaround.
So this solution may improve some cases and worsen others; I am not sure I like that.
Aborting the cURL request after a few seconds.
Although the API sends a separate response to a URL endpoint of my choice, and it seems safe to terminate the transmission after posting the request, there might be situations where those few seconds are not enough to even establish the connection with the API. What I am trying to say is that by terminating cURL I have no way of knowing whether the actual request made it through.
Is there another approach that I could consider?
Thank you.
Sounds like what you're asking for is an asynchronous cURL call.
Here's the curl_multi_init documentation
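For reference, a minimal sketch of the pattern (the endpoint URL is made up; curl_multi_exec() does a slice of work and returns immediately, so the script can do other things between calls):

$ch = curl_init('https://api.example.com/process'); // hypothetical endpoint
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

$mh = curl_multi_init();
curl_multi_add_handle($mh, $ch);

$running = 0;
do {
    curl_multi_exec($mh, $running); // non-blocking: performs a bit of work
    // ... do other work here instead of blocking on the API ...
    curl_multi_select($mh, 0.1);    // wait up to 100 ms for socket activity
} while ($running > 0);

$response = curl_multi_getcontent($ch);
curl_multi_remove_handle($mh, $ch);
curl_multi_close($mh);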
I have a web application written in PHP using a Postgres database.
The next phase of development is for background batch processes to be built that will need to be executed once a day (or ad hoc, as requested) for each user of the app. The process will query third-party services, await their responses, and process them to feed information into the user's account within the web application.
Are there good ways to do this?
- How would batches be triggered every day at 3am for each user?
- Given there could be a delay in the response, is this a good scenario to use something like node.js?
- Is it best to have the output of the batch process directly update the web application's database with the appropriate data? Or is there some other way to handle the output?
Update: The process doesn't have to run at 3am. The key is that a few batch processes may need to run for each user. The execution of batches could be spread throughout the day. I want this to be a "background" process separate from the app.
You could write a PHP script that runs through any users that need to be processed, and set up a cron job to run your script at 3am. Running as a cron job means you don't need to worry so much about how slow the third-party call is. Obviously you'd need to store any necessary data in the database.
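A rough sketch of that setup; the script path, schema, and service URL are all invented:

# crontab entry: run the batch script every day at 3am
0 3 * * * /usr/bin/php /var/www/app/scripts/process_users.php

<?php
// process_users.php - run by cron, so a slow third-party call delays
// only this background run, never a web request.
$db = new PDO('pgsql:host=localhost;dbname=app', 'appuser', 'secret');

foreach ($db->query("SELECT id FROM users WHERE needs_batch = true") as $user) {
    // Call the third-party service (requires allow_url_fopen).
    $response = file_get_contents('https://thirdparty.example.com/data?user=' . $user['id']);

    // Store the result where the web app can read it.
    $stmt = $db->prepare("UPDATE accounts SET data = ? WHERE user_id = ?");
    $stmt->execute(array($response, $user['id']));
}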
Alternatively, if the process is triggered by the user doing something on the site, you could use exec() to trigger the PHP script to process just that user, right away, without the user having to wait. The risk with this is that you can't control how rapidly the process is triggered.
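For example (the script name is hypothetical; redirecting the output and appending & lets the web request return immediately):

// $userId comes from whatever the user just did on the site.
exec('php /var/www/app/scripts/process_user.php ' . escapeshellarg($userId)
    . ' > /dev/null 2>&1 &');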
A third option is to just do the request live and make the user wait. But it sounds like this is not an option for you.
It really depends on what third party you're calling and why. How long does the third party take to respond, how reliable they are, what kind of rate limits they might enforce, etc...
I'm not an expert on HTTP requests, so this question might be trivial for some. I'm sending a request to a PHP script that takes a long time to process a file and return a response. Is there a way to send a response before this script finishes its task, to let the user know about the process status? Since this task can take up to several minutes, I'd like to notify the user when key parts of the process are finished.
Note: I cannot break this request into several others.
I might not have the correct approach here; if so, do you have other ideas for how this could be handled?
Technically yes, but it would require fine-grained control of the HTTP stack, which you may or may not have in a typical PHP setup. I would suggest you look into other solutions (e.g., make a request to start the task, then poll to get an update on the progress).
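A rough sketch of that start-then-poll flow, with invented file names and a status file as the shared state:

// start.php - returns a task id immediately, work happens in the background.
$taskId = uniqid('task_', true);
file_put_contents("/tmp/$taskId.status", 'queued');
exec('php worker.php ' . escapeshellarg($taskId) . ' > /dev/null 2>&1 &');
echo json_encode(array('task' => $taskId));

// progress.php - the browser polls this every few seconds; worker.php
// overwrites the status file at each key step of the process.
$taskId = basename($_GET['task']); // basename() blocks path traversal
echo file_get_contents("/tmp/$taskId.status");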
http://www.redips.net/javascript/ajax-progress-bar/
Here's a great article that goes over creating an AJAX progress bar to use with PHP.
Let me know if it doesn't make sense!
I think the best way to handle long-running requests is cron jobs. You can send a request that creates a 'task', and have a cron job pick that task up. The cron job can update the task's status while it works, and you can check the status with periodic requests. I can't imagine another way to inform users about request progress: as soon as you respond, your headers are sent and PHP stops.
EDIT: It should be noted that cron jobs are only available on Linux/Unix servers. Windows servers would require access to the Task Scheduler, which most web hosts will not allow.
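A sketch of that task-table pattern, with an invented schema:

// cron_worker.php - run by cron every minute; picks up one pending task.
$db = new PDO('pgsql:host=localhost;dbname=app', 'appuser', 'secret');

$task = $db->query("SELECT id FROM tasks WHERE status = 'pending' LIMIT 1")->fetch();
if ($task) {
    $db->exec("UPDATE tasks SET status = 'running' WHERE id = " . (int)$task['id']);
    // ... long processing here, updating the status column at key milestones ...
    $db->exec("UPDATE tasks SET status = 'done' WHERE id = " . (int)$task['id']);
}
// The page polls an endpoint that simply SELECTs the status column.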
I'm currently pinging URLs using cURL + PHP. But in my script a request is sent, then it waits until the response comes, then another request is sent, and so on. If each response takes ~3 seconds, pinging 10k links takes more than 8 hours!
Is there a way to send multiple requests at once, like some kind of multi-threading?
Thank you.
Use the curl_multi_* functions available in cURL. See http://www.php.net/manual/en/ref.curl.php
You must group the URLs in smaller sets: adding all 10k links at once is not likely to work. So create a loop around the following code and use a subset of URLs (like 100) in the $urls variable.
$all = array();
$handle = curl_multi_init();
foreach ($urls as $url) {
    $all[$url] = curl_init($url);
    // Return the body instead of printing it, so curl_multi_getcontent() works.
    curl_setopt($all[$url], CURLOPT_RETURNTRANSFER, true);
    // Set any further curl options for $all[$url] here.
    curl_multi_add_handle($handle, $all[$url]);
}

$running = 0;
do {
    curl_multi_exec($handle, $running);
    // Sleep until there is activity on some handle instead of busy-looping.
    curl_multi_select($handle);
} while ($running > 0);

foreach ($all as $url => $curl) {
    $content = curl_multi_getcontent($curl);
    // do something with $content
    curl_multi_remove_handle($handle, $curl);
    curl_close($curl);
}
curl_multi_close($handle);
First off, I would like to point out that this is not a basic task you can do on any kind of shared hosting provider; I assume you would get banned for sure.
So I assume you are able to compile software (a VPS?) and start long-running processes in the background (using the PHP CLI). I would use Redis (I liked Predis very much as a PHP client library) to push messages onto a list. (P.S. I would prefer to write this in node.js/Python, though the explanation below works for PHP, because I think this task can be coded in those languages pretty quickly. I am going to try to write it and post the code on GitHub later.)
Redis:
Redis is an advanced key-value store. It is similar to memcached, but the dataset is not volatile, and values can be strings, exactly like in memcached, but also lists, sets, and ordered sets. All these data types can be manipulated with atomic operations to push/pop elements, add/remove elements, perform server-side union, intersection, and difference between sets, and so forth. Redis supports different kinds of sorting abilities.
Then start a couple of worker processes that will take messages from the list (blocking if none are available); a sketch follows the quote below.
BLPOP:
This is where Redis gets really interesting. BLPOP and BRPOP are the blocking equivalents of the LPOP and RPOP commands. If the queue for any of the keys they specify has an item in it, that item will be popped and returned. If it doesn't, the Redis client will block until a key becomes available (or the timeout expires; specify 0 for an unlimited timeout).
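With Predis, the producer and worker sides might look like this (the queue name and $urls are illustrative):

// composer require predis/predis
require 'vendor/autoload.php';

// Producer: push every URL onto the list.
$redis = new Predis\Client();
foreach ($urls as $url) {
    $redis->lpush('ping_queue', $url);
}

// Worker (start several of these processes from the CLI):
$redis = new Predis\Client();
while (true) {
    // BRPOP blocks until an item arrives; timeout 0 means wait forever.
    list($queue, $url) = $redis->brpop('ping_queue', 0);
    // ... check $url here ...
}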
cURL is not exactly pinging (ICMP echo), and I guess some servers could block such requests for security reasons. I would first try to ping the host (using an nmap snippet) and fall back to cURL if the ping fails, because pinging is faster than using cURL.
Libcurl:
A free client-side URL transfer library, supporting FTP, FTPS, Gopher, HTTP, HTTPS, SCP, SFTP, TFTP, TELNET, DICT, FILE, LDAP, LDAPS, IMAP, POP3, SMTP and RTSP (the last four only in versions newer than 7.20.0, released 9 February 2010).
Ping:
Ping is a computer network administration utility used to test the reachability of a host on an Internet Protocol (IP) network and to measure the round-trip time for messages sent from the originating host to a destination computer. The name comes from active sonar terminology. Ping operates by sending Internet Control Message Protocol (ICMP) echo request packets to the target host and waiting for an ICMP response.
But then you should do a HEAD request and retrieve only the headers to check whether the host is up; otherwise you would also download the content of the URL, which takes time and costs bandwidth. A sketch follows the quote below.
HEAD:
The HEAD method is identical to GET except that the server MUST NOT return a message-body in the response. The metainformation contained in the HTTP headers in response to a HEAD request SHOULD be identical to the information sent in response to a GET request. This method can be used for obtaining metainformation about the entity implied by the request without transferring the entity-body itself. This method is often used for testing hypertext links for validity, accessibility, and recent modification.
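In cURL, a HEAD request is just a matter of setting CURLOPT_NOBODY:

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_NOBODY, true);         // HEAD instead of GET: headers only
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 5);           // don't hang on a dead host
curl_exec($ch);
// Any HTTP status code means the host answered; 0 means no connection at all.
$up = curl_getinfo($ch, CURLINFO_HTTP_CODE) > 0;
curl_close($ch);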
Then each worker process should use curl_multi to get some concurrency within each process; the curl_multi example earlier on this page is a good starting point, though it does not do a HEAD request.
You can either fork your PHP process using pcntl_fork or look into cURL's built-in parallel transfers (the curl_multi functions). https://web.archive.org/web/20091014034235/http://www.ibuildings.co.uk/blog/archives/811-Multithreading-in-PHP-with-CURL.html
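A minimal pcntl_fork sketch (CLI only, with the pcntl extension enabled; the chunk size is arbitrary):

$chunks = array_chunk($urls, 1000);
foreach ($chunks as $chunk) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        die("fork failed\n");
    } elseif ($pid === 0) {
        // Child process: handle this chunk, then exit.
        foreach ($chunk as $url) {
            // ... curl request for $url here ...
        }
        exit(0);
    }
    // Parent falls through and forks the next child.
}
// Parent: wait for all children to finish.
while (pcntl_waitpid(-1, $status) > 0) {
}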
PHP doesn't have true multi-thread capabilities.
However, you could always make your CURL requests asynchronously.
This would allow you to fire off batches of pings instead of one at a time.
Reference: How do I make an asynchronous GET request in PHP?
Edit: Just keep in mind you're going to have to make your PHP script wait until all responses come back before terminating.
Christian
cURL has the "multi request" facility, which is essentially a way of running several requests in parallel. Study the example on this page: http://www.php.net/manual/en/function.curl-multi-exec.php
You can use the PHP exec() function to execute unix commands like wget to accomplish this.
exec('wget -O - http://example.com/url/to_ping > /dev/null 2>&1 &');
It's by no means an ideal solution, but it does get the job done, and by sending the output to /dev/null and running it in the background you can move on to the next "ping" without having to wait for the response.
Note: Some servers have exec() disabled for security purposes.
I would use system() and execute the ping script as a new process. Or multiple processes.
You can make a centralized queue with all the addresses to ping, then kick off several ping scripts to work through the queue.
Just note:
If a program is started with this function, in order for it to continue running in the background, the output of the program must be redirected to a file or another output stream. Failing to do so will cause PHP to hang until the execution of the program ends.
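So a background launch would look something like this (the worker script name is invented):

// Redirect output and background the process, or PHP blocks until it exits.
system('php ping_worker.php > /dev/null 2>&1 &');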
To handle this kind of task, try out I/O multiplexing strategies. In a nutshell, the idea is that you create a bunch of sockets, feed them to your OS (say, using epoll on Linux / kqueue on FreeBSD), and sleep until an event occurs on some of the sockets. Your OS's kernel can handle hundreds or even thousands of sockets in parallel in a single process.
You can not only handle TCP sockets but also deal with timers / file descriptors in a similar fashion in parallel.
Back to PHP, check out something like https://github.com/reactphp/event-loop which exposes a good API and hides lots of low-level details.
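A small sketch of the idea using react/event-loop (installed via Composer; the hosts are placeholders), multiplexing non-blocking sockets through one loop:

// composer require react/event-loop
require 'vendor/autoload.php';

$loop = React\EventLoop\Factory::create();

foreach (array('example.com', 'example.org') as $host) {
    // Non-blocking connect; the loop tells us when the socket is writable.
    $socket = stream_socket_client("tcp://$host:80", $errno, $errstr, 0,
        STREAM_CLIENT_CONNECT | STREAM_CLIENT_ASYNC_CONNECT);
    stream_set_blocking($socket, false);

    $loop->addWriteStream($socket, function ($socket) use ($loop, $host) {
        // Connected: send a HEAD request, then wait for the reply.
        fwrite($socket, "HEAD / HTTP/1.1\r\nHost: $host\r\nConnection: close\r\n\r\n");
        $loop->removeWriteStream($socket);
        $loop->addReadStream($socket, function ($socket) use ($loop, $host) {
            echo $host, ': ', trim(fgets($socket)), "\n"; // the status line
            $loop->removeReadStream($socket);
            fclose($socket);
        });
    });
}

$loop->run(); // all connections proceed concurrently in one process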
Run multiple PHP processes.
Process 1: pings sites 1-1000
Process 2: pings sites 1001-2000
...
Is there any sane way to make an HTTP request asynchronously in PHP without throwing out the response? I.e., something similar to AJAX: the PHP script initiates the request, does its own thing, and later, when the response is received, a callback function/method or another script handles the response.
One approach has crossed my mind - spawning a new php process with another script for each request - the second script does the request, waits for the response and then parses the data and does whatever it should, while the original script goes on spawning new processes. I have doubts, though, about performance in this case - there must be some performance penalty from having to create a new process every time.
Yes, depending on the traffic of your site, spawning a separate PHP process for running a script could be devastating. It would be more efficient to use shell_exec() to start a background process that saves the output to a filename you already know, but even this could be resource intensive.
You could also have a request queue stored in a database. A single, separate background process would pull the job, execute it, and save the output, possibly setting a flag in the DB that your web process could check.
If you're going to use the DB queue approach, use the curl_multi_* family of functions to send all queued requests at once. This will limit the execution time of each iteration of your background process to the longest request time.
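Sketching that combination (the request_queue table and credentials are invented; the curl_multi loop is the same pattern shown in the earlier answer):

// Background worker: drain the DB queue with a single curl_multi batch.
$db = new PDO('pgsql:host=localhost;dbname=app', 'appuser', 'secret');
$jobs = $db->query("SELECT id, url FROM request_queue WHERE done = 0")->fetchAll();

$mh = curl_multi_init();
$handles = array();
foreach ($jobs as $job) {
    $ch = curl_init($job['url']);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[$job['id']] = $ch;
}

$running = 0;
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh); // the batch takes only as long as the slowest request
} while ($running > 0);

foreach ($handles as $id => $ch) {
    // Save the output and flag the job done so the web process can pick it up.
    $stmt = $db->prepare("UPDATE request_queue SET done = 1, output = ? WHERE id = ?");
    $stmt->execute(array(curl_multi_getcontent($ch), $id));
    curl_multi_remove_handle($mh, $ch);
}
curl_multi_close($mh);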
PHP V5 may not be threaded, but you can create applications that exploit in-process multitasking.
Check out the following article: "Develop multitasking applications with PHP V5" from
IBM DeveloperWorks. You can find it here: http://www.ibm.com/developerworks/web/library/os-php-multitask/