cURL multi hanging/ignoring timeout - php

I'm using a 'rolling' cURL multi implementation (like this SO post, based on this cURL code). It works fine for processing thousands of URLs using up to 100 requests at the same time, with 5 instances of the script running as daemons (yeah, I know, this should be written in C or something).
Here's the problem: after processing ~200,000 URLs (across the 5 instances), curl_multi_exec() seems to break for all instances of the script. I've tried shutting the scripts down and restarting, and the same thing happens (not after another 200,000 URLs, but right on restart): the script hangs calling curl_multi_exec().
I put the script into 'single' mode, processing one regular cURL handle at a time, and that works fine (but it's not quite the speed I need). My logging leads me to suspect that it may have hit a patch of slow/problematic connections (since every so often it seems to process one URL and then hang again), but that would mean my CURLOPT_TIMEOUT is being ignored for the individual handles. Or maybe it's just something with running that many requests through cURL.
Anyone heard of anything like this?
Sample code (again based on this):
// Some logging shows it hangs right here, only looping a time or two,
// so the hang seems to be inside the cURL call itself.
while (($execrun = curl_multi_exec($master, $running)) == CURLM_CALL_MULTI_PERFORM);

// ...code to check for errors or process whatever was returned...
I have CURLOPT_TIMEOUT set to 120, but in the cases where curl_multi_exec() finally returns some data, it's after 10 minutes of waiting.
I have a bunch of testing/checking yet to do, but thought maybe this might ring a bell with someone.

After much testing, I believe I've found what is causing this particular problem. I'm not saying the other answer is incorrect, just that in this case it is not the issue I am having.
From what I can tell, curl_multi_exec() does not return until all DNS lookups have resolved (to success or failure). If there are a bunch of URLs with bad domains, curl_multi_exec() doesn't return for at least:
(time it takes to get a resolve error) * (number of URLs with bad domains)
Here's someone else who has discovered this:
Just a note on the asynchronous nature of cURL’s multi functions: the DNS lookups are not (as far as I know today) asynchronous. So if one DNS lookup of your group fails, everything in the list of URLs after that fails also. We actually update our hosts.conf (I think?) file on our server daily in order to get around this. It gets the IP addresses there instead of looking them up. I believe it’s being worked on, but not sure if it’s changed in cURL yet.
Also, testing shows that cURL (at least my version) does follow the CURLOPT_CONNECTTIMEOUT setting. Of course, the first step of a multi cycle may still take a long time, since cURL waits for every URL to resolve or time out.
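For reference, a minimal sketch of setting a connect timeout on each easy handle before adding it to the multi handle (variable names are illustrative, not from the original script):

$master = curl_multi_init();

foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10); // bound the resolve/connect phase
    curl_setopt($ch, CURLOPT_TIMEOUT, 120);       // bound the whole transfer
    curl_multi_add_handle($master, $ch);
}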

I think your problem is related to:
(62) CURLOPT_TIMEOUT does not work properly with the regular multi and multi_socket interfaces. The work-around for apps is to simply remove the easy handle once the time is up.
See also: http://curl.haxx.se/bug/view.cgi?id=2501457
If that is the case, you should watch your cURL handles for timeouts yourself and remove them from the multi pool.
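A rough sketch of that workaround, assuming you record a start time for each handle yourself in a hypothetical $handles array (each entry holding 'handle', 'url' and 'started'):

foreach ($handles as $i => $info) {
    if (time() - $info['started'] > 120) { // same limit as CURLOPT_TIMEOUT
        curl_multi_remove_handle($master, $info['handle']);
        curl_close($info['handle']);
        unset($handles[$i]);
        // optionally log $info['url'] as timed out here
    }
}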

Related

How to process multiple POST requests asynchronously in WordPress

My use case is, for example, to send three FB and two GA/Firebase events from server-side code to the corresponding tracking endpoints. But that significantly increases the loading time of each page view.
I tried to implement it as a loop as described here, and tried the wp_remote_post 'blocking' attribute in combination with 'timeout' => 0.01, but all of these still increase the page load significantly. Adding the redundant 'method' attribute to wp_remote_post as mentioned here helped a bit, which seems very confusing to me.
Without the POST requests the page loads in 70 ms; with them it goes up to 600 ms. I couldn't see much difference in load time from the 'blocking' attribute.
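For context, the non-blocking attempt described above looks roughly like this (a sketch only; the endpoint URL and payload are placeholders, not the actual tracking calls):

// Fire-and-forget: don't wait for the tracking endpoint to respond.
wp_remote_post( 'https://example.com/tracking-endpoint', array(
    'blocking' => false,
    'timeout'  => 0.01,
    'body'     => array( /* event payload */ ),
) );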
I also tried wp_schedule_single_event( time(), 'trigger_async_schedule_hook' );, but this didn't work in my local Docker setup, I guess due to missing cron capabilities.
Since PHP is tied to the HTTP request lifecycle, this doesn't seem really possible at first glance. I'm used to Node, where you can send requests and handle them in another context, i.e. after the page has loaded.
Which is the way to go here? I thought about an external queue outside of PHP execution, but that seems like overkill. Other WordPress async ideas would also be very helpful.
PS: I've searched all over but couldn't find an exact solution.
Cheers

Very bad TTFB time [duplicate]

I have a query that gets a list of users from a table, sorted by the time each row was created. I got the following timing diagram from the Chrome developer tools.
You can see that the TTFB (time to first byte) is too high.
I am not sure whether it is because of the SQL sort. If that is the reason, how can I reduce this time?
I saw blogs which say that TTFB should be low (< 1 sec), but for me it shows > 1 sec. Is it because of my query or something else?
I am not sure how I can reduce this time.
I am using Angular. Should I sort the table in Angular instead of in SQL? (Many posts say that shouldn't be the issue.)
What I want to know is how I can reduce the TTFB. I am actually new to this; it is a task given to me by my team members. I saw many posts but was not able to understand them properly. What is TTFB? Is it the time taken by the server?
The TTFB is not the time to the first byte of the body of the response (i.e., the useful data, such as JSON, XML, etc.), but rather the time to the first byte of the response received from the server. That byte is the start of the response headers.
For example, if the server sends the headers before doing the hard work (like heavy SQL), you will get a very low TTFB, but it isn't a "true" measurement.
In your case, TTFB represents the time you spend processing data on the server.
To reduce the TTFB, you need to do the server-side work faster.
I have met the same problem. My project runs on a local server, and I checked my PHP code:
$db = mysqli_connect('localhost', 'root', 'root', 'smart');
I use localhost to connect to my local database. That may be the cause of the problem you're describing. You can modify your HOSTS file and add the line:
127.0.0.1 localhost
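An alternative to editing the HOSTS file (my assumption, not part of the suggestion above) is to connect to 127.0.0.1 directly, which skips the hostname lookup entirely; note that on many setups 'localhost' means a Unix socket while '127.0.0.1' forces a TCP connection:

// Hedged sketch: same credentials as above, but no name resolution for 'localhost'.
$db = mysqli_connect('127.0.0.1', 'root', 'root', 'smart');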
TTFB reflects something that happens behind the scenes; your browser knows nothing about what the server is doing.
You need to look into what queries are being run and how the website connects to the server.
This article might help understand TTFB, but otherwise you need to dig deeper into your application.
If you are using PHP, try using <?php flush(); ?> after </head> and before </body>, or after whatever section you want to output quickly (like the header or content). It will send the markup generated so far without waiting for PHP to finish. Don't use this function all the time, or the speed increase won't be noticeable.
More info
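A minimal sketch of that early-flush idea (a hypothetical page; whether anything reaches the browser early also depends on output buffering and web-server settings):

<!DOCTYPE html>
<html>
<head>
    <title>Example</title>
    <!-- static assets the browser can start fetching early -->
</head>
<?php
// Push everything generated so far to the browser before the slow work below.
// If output buffering is enabled, ob_flush() may be needed as well.
flush();
?>
<body>
<?php
// ...heavy server-side work (queries, API calls) happens here...
?>
</body>
</html>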
I would suggest you read this article and focus more on how to optimize the overall response to the user request (either a page, a search result etc.)
A good argument for this is the example they give about using gzip to compress the page. Even though the TTFB is lower when you do not compress, the overall experience of the user is worse because it takes longer to download content that is not compressed.

Domain Lookup Script execution time

I need to look up domain names from an XML file and then loop through each domain to see whether it exists or not.
I'm using the approaches below:
1. fsockopen()
2. checkdnsrr()
The number of records in the XML file is around 120. I'm using AJAX to get the results.
Results:
Approach 1: 13-14 s on average on localhost; 25-30 s on average on the live server.
Approach 2: 6-8 s on average on localhost; 19-22 s on average on the live server.
Why the difference between localhost and the live server?
In both cases I have a 2 Mbps machine to test from.
Also, I would like to show the availability of each domain entry as soon as it is scanned, rather than dumping the whole result set when the AJAX call returns. How am I supposed to achieve this?
Any help is appreciated
First of all, queries on localhost might be faster because the DNS results are cached already.
You should do these tests on a machine with a clean cache, but it's always tricky to clear DNS cache entries. Or maybe your browser caches some results too. (See DNS Flusher.)
About the AJAX requests, what you are looking for is asynchronous requests.
AJAX works in both modes:
With synchronous calls, the script waits/hangs for each response before going on, so it takes longer, but it's sequential.
With asynchronous calls, the script makes the call and moves on. Whether the response arrives or not, the script continues; the responses are handled when they arrive, maybe not in the same order you made the calls.
Checkout http://javascript.about.com/od/ajax/a/ajaxasyn.htm
In jQuery, you have a parameter async: true to achieve this.
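On the PHP side, each asynchronous call could hit a small endpoint that checks a single domain and returns immediately, for example (a sketch; the file name, parameter name and response format are my assumptions):

<?php
// check_domain.php - called once per domain via an asynchronous AJAX request.
$domain = isset($_GET['domain']) ? trim($_GET['domain']) : '';

// Look for an A record; empty input is treated as not registered.
$registered = $domain !== '' && checkdnsrr($domain, 'A');

header('Content-Type: application/json');
echo json_encode(array('domain' => $domain, 'registered' => $registered));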
Good luck with your project.

Faster alternative to file_get_contents()

Currently I'm using file_get_contents() to submit GET data to an array of sites, but upon execution of the page I get this error:
Fatal error: Maximum execution time of 30 seconds exceeded
All I really want the script to do is start loading each web page and then move on. Each page may take up to 5 minutes to load fully, and I don't need it to.
Here is what I currently have:
foreach ($sites as $s) { // one entry to read from a wide array
    file_get_contents($s['url']); // send to the remote shells
}
EDIT: To clear any confusion, this script is being used to start scripts on other servers, that return no data.
EDIT: I'm now attempting to use cURL to do the trick, by setting a timeout of one second to make it send the data and then stop. Here is my code:
$ch = curl_init($s['url']); //load the urls
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 1); //Only send the data, don't wait.
curl_exec($ch); //Execute
curl_close($ch); //Close it off.
Perhaps I've set the option wrong. I'm looking through some manuals as we speak. Just giving you an update. Thank you to all of you who are helping me thus far.
EDIT: Ah, found the problem. I was using CURLOPT_CONNECTTIMEOUT instead of CURLOPT_TIMEOUT. Whoops.
However, now the scripts aren't triggering. They each use ignore_user_abort(TRUE);, so I can't understand the problem.
Hah, scratch that. Works now. Thanks a lot everyone
There are many ways to solve this.
You could use cURL with its curl_multi_* functions to execute the requests asynchronously. Or use cURL the common way, but with 1 as the timeout limit, so it will make the request and return with a timeout, but the request will still be executed.
If you don't have cURL installed, you could continue using file_get_contents but fork processes (not so elegant, but it works) using something like ZendX_Console_Process_Unix, so you avoid waiting between requests.
As Franco mentioned (and I'm not sure it was picked up on), you specifically want to use the curl_multi functions, not the regular curl ones. This packs multiple curl handles into a curl_multi handle and executes them simultaneously, returning (or not, in your case) the responses as they arrive.
Example at http://php.net/curl_multi_init
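A minimal sketch of that approach, adapted to the $sites array from the question (error handling omitted; the 1-second timeout matches the "fire and forget" goal):

$mh = curl_multi_init();
$handles = array();

foreach ($sites as $s) {
    $ch = curl_init($s['url']);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // don't echo the responses
    curl_setopt($ch, CURLOPT_TIMEOUT, 1);           // give up on each transfer quickly
    curl_multi_add_handle($mh, $ch);
    $handles[] = $ch;
}

// Drive all transfers until none are still running.
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);
} while ($running > 0);

foreach ($handles as $ch) {
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);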
Re your update that you only need to trigger the operation:
You could try using file_get_contents with a timeout. This would lead to the remote script being called, but the connection being terminated after n seconds (e.g. 1).
If the remote script is configured so it continues to run even if the connection is aborted (in PHP that would be ignore_user_abort), it should work.
Try it out. If it doesn't work, you won't get around increasing your time_limit or using an external executable. But from what you're saying - you just need to make the request - this should work. You could even try to set the timeout to 0 but I wouldn't trust that.
From here:
<?php
$ctx = stream_context_create(array(
    'http' => array(
        'timeout' => 1,
    ),
));
file_get_contents("http://example.com/", false, $ctx);
?>
To be fair, Chris's answer already includes this possibility: curl also has a timeout switch.
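On the remote side, the ignore_user_abort idea mentioned above could look like this (a sketch of a hypothetical job script, not the asker's actual code):

<?php
// Keep running even after the caller disconnects at its 1-second timeout.
ignore_user_abort(true);
set_time_limit(0); // no execution time limit for the long-running work

// ...the actual long-running work goes here...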
It is not file_get_contents() itself that consumes that much time, but the network connection.
Consider not submitting GET data to an array of sites, but instead creating an RSS feed and letting them fetch the RSS data.
I don't fully understand the purpose of your script.
But here is what you can do:
To avoid the fatal error quickly, you can just add set_time_limit(120) at the beginning of the file. This will allow the script to run for 2 minutes. Of course, you can use any number you want, and 0 for no limit.
If you just need to call the URL and you don't "care" about the result, you should use cURL in asynchronous mode. In that case a call to a URL will not wait until it has finished, and you can fire them all off very quickly.
BR.
If the remote pages take up to 5 minutes to load, your file_get_contents will sit and wait for that 5 minutes. Is there any way you could modify the remote scripts to fork into a background process and do the heavy processing there? That way your initial hit will return almost immediately, and not have to wait for the startup period.
Another possibility is to investigate if a HEAD request would do the trick. HEAD does not return any data, just headers, so it may be enough to trigger the remote jobs and not wait for the full output.
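If you test the HEAD idea with cURL, a minimal sketch would be (assuming the remote script still does its work when it receives a HEAD request):

$ch = curl_init($s['url']);
curl_setopt($ch, CURLOPT_NOBODY, true);         // send a HEAD request; no response body
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // don't echo the (empty) response
curl_setopt($ch, CURLOPT_TIMEOUT, 5);
curl_exec($ch);
curl_close($ch);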

Alternative to header(location: ) php

I have a while loop that constructs a URL for an SMS API.
This loop will eventually be sending hundreds of messages, and thus hundreds of URLs.
How would I go about doing this?
I know you can use header('Location: ...') to change the location of the browser, but this isn't going to work, as the PHP page needs to keep running.
Hope this is clear
Thank you.
You have a few options:
file_get_contents as Trevor noted
curl_* - Use the cURL library of functions to make the requests.
fsock* - Handle the connection at a slightly lower level, making and managing the socket connection yourself.
All will probably work just fine and you should pick one depending on your overall needs.
After you construct each $url, use file_get_contents($url).
If it is just the case that during the construction of all these URLs you get the error "Maximum execution time exceeded", then just add set_time_limit(10); after each URL is generated to give your script an extra 10 seconds to generate the next one.
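Putting those two suggestions together, a rough sketch (the helper functions are hypothetical stand-ins for however you build the URLs):

while ($message = get_next_sms_message()) {   // hypothetical: fetch the next message to send
    $url = build_sms_api_url($message);       // hypothetical: construct the SMS API URL
    set_time_limit(10);                       // reset the execution limit before each request
    file_get_contents($url);                  // fire the request to the SMS API
}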
I'm not quite sure what you are actually asking in this question: do you want the user to visit the URLs (if so, does the end user's web browser support JavaScript?), just to be shown the URLs, the URLs to be generated and stored, or the PHP script to fetch each URL (and do you care about the user seeing the result)? If you clarify the question, the community may be able to provide you with a perfect answer!
Applying a huge amount of guesswork, I infer from your post that you need to dynamically create a URL, and that invoking that URL causes an SMS message to be sent.
If this is the case, then you should not be trying to invoke the URL from the client, but from the server side using the URL wrappers or cURL.
You should also consider running the loop in a separate process and reporting back to the browser using (e.g.) AJAX.
Have a Google for spawning long-running processes in PHP, but be warned there is a lot of bad advice published on the topic.
C.
