I am attempting to reuse the cURL handles I spawned at the start of the process, but after the first run it simply stops working. If I remove the handles and add them again (or recreate the entire multi handle), it works fine. What could be the culprit?
My code currently looks like this:
<?php
echo 'Handler amount: ';
$threads = (int) trim(fgets(STDIN));
if($threads < 1) {
    $threads = 1;
}
$s = microtime(true);
$url = 'http://mywebsite.com/some-script.php';
$mh = curl_multi_init();
$ch = array();
for($i = 0; $i < $threads; $i++) {
    $ch[$i] = curl_init($url);
    curl_setopt_array($ch[$i], array(
        CURLOPT_USERAGENT => 'Mozilla/5.0 (X11; Linux i686; rv:21.0) Gecko/20130213 Firefox/21.0',
        CURLOPT_REFERER => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_NOBODY => true
    ));
    curl_multi_add_handle($mh, $ch[$i]);
}
while($mh) {
    $running = null;
    do {
        curl_multi_exec($mh, $running);
    } while($running > 0);
    $e = microtime(true);
    $totalTime = number_format($e - $s, 2);
    if($totalTime >= 1) {
        echo floor($threads / $totalTime) . ' requests per second (total time '.$totalTime.'s)' . "\r";
        $s = microtime(true);
    }
}
foreach($ch as $handler) {
    curl_multi_remove_handle($mh, $handler);
    curl_close($handler);
}
curl_multi_close($mh);
?>
When I have CURLOPT_VERBOSE set to true, I see many "additional stuff not fine transfer.c:1037: 0 0" messages. I read about them in a different question, and it seems they can be caused by a few obvious things:
Too fast
Firewall
ISP restricting
AFAIK, none of these apply here, because if I recreate the handles every time, they successfully complete at about 79 requests per second (about 529 bytes each).
My process for reusing the handles:
Create the multi handler, and add the specified number of handles to the multi handler
While the multi handler is working, execute all the handles
After the while loop has stopped (it seems very unlikely that it will), close all the handles and the multi curl handler
It executes all handles once and then stops.
This is really stumping me. Any ideas?
I ran into the same problem (using C++ though) and found out that I need to remove the curl easy handle(s) and add them back in again. My solution was to remove all handles at the end of the curl_multi_perform loop and add them back in at the beginning of the outer loop in which I reuse existing keep-alive connections:
for(;;) // loop using keep-alive connections
{
    curl_multi_add_handle(...)
    while ( stillRunning ) // curl_multi_perform loop
    {
        ...
        curl_multi_perform(...)
        ...
    }
    curl_multi_remove_handle(...)
}
Perhaps this also applies to your PHP scenario. Remember: don't curl_easy_cleanup or curl_easy_init the curl handle in between.
If you turn on CURLOPT_VERBOSE you can follow along in the console and verify that your connections are indeed reused. That has solved this problem for me.
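For reference, here is a minimal PHP sketch of that pattern (my own untested adaptation, with a placeholder handle count, not the asker's exact script): keep the easy handles open, but detach them from the multi handle after each run and re-attach them before the next one.
<?php
// Untested sketch of the remove/re-add pattern; handle count is a placeholder.
$url = 'http://mywebsite.com/some-script.php';
$threads = 4;

$mh = curl_multi_init();
$ch = array();
for ($i = 0; $i < $threads; $i++) {
    $ch[$i] = curl_init($url);
    curl_setopt_array($ch[$i], array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_NOBODY => true
    ));
}

for (;;) { // outer loop that reuses the keep-alive connections
    foreach ($ch as $handle) {
        curl_multi_add_handle($mh, $handle); // re-attach for this run
    }

    $running = null;
    do {
        curl_multi_exec($mh, $running);
        curl_multi_select($mh); // avoid busy-waiting
    } while ($running > 0);

    foreach ($ch as $handle) {
        // detach, but do NOT curl_close() - the connection cache survives
        curl_multi_remove_handle($mh, $handle);
    }
}
?>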
Related
When I run a check on 10 URLs, if I am able to get a connection with the host server, the handle will return a success message (CURLE_OK).
When processing each handle, if a server refuses the connection, the handle will include an error message.
The problem
I assumed that when we get a bad handle, cURL would mark this handle but continue to process the remaining unprocessed handles; however, this is not what seems to happen.
When we come across a bad handle, cURL will mark this handle as bad, but will not process the remaining unprocessed handles.
This can be hard to detect: if I do get a connection with all handles, which is what happens most of the time, then the problem is not visible (cURL only stops on the first bad connection).
For the test, I had to find a suitable site which loads slowly/refuses a certain number of simultaneous connections.
set_time_limit(0);
$l = array(
    'http://smotri.com/video/list/',
    'http://smotri.com/video/list/sports/',
    'http://smotri.com/video/list/animals/',
    'http://smotri.com/video/list/travel/',
    'http://smotri.com/video/list/hobby/',
    'http://smotri.com/video/list/gaming/',
    'http://smotri.com/video/list/mult/',
    'http://smotri.com/video/list/erotic/',
    'http://smotri.com/video/list/auto/',
    'http://smotri.com/video/list/humour/',
    'http://smotri.com/video/list/film/'
);
$mh = curl_multi_init();
$s = 0;
$f = 10;
while($s <= $f)
{
    $ch = curl_init();
    $curlsettings = array(
        CURLOPT_URL => $l[$s],
        CURLOPT_TIMEOUT => 0,
        CURLOPT_CONNECTTIMEOUT => 0,
        CURLOPT_RETURNTRANSFER => 1
    );
    curl_setopt_array($ch, $curlsettings);
    curl_multi_add_handle($mh,$ch);
    $s++;
}
$active = null;
do
{
    curl_multi_exec($mh,$active);
    curl_multi_select($mh);
    $info = curl_multi_info_read($mh);
    echo '<pre>';
    var_dump($info);
    if($info['result'] === CURLE_OK)
        echo curl_getinfo($info['handle'],CURLINFO_EFFECTIVE_URL) . ' success<br>';
    if($info['result'] != 0)
        echo curl_getinfo($info['handle'],CURLINFO_EFFECTIVE_URL) . ' failed<br>';
} while ($active > 0);
curl_multi_close($mh);
I have dumped $info in the script, which asks the multi handle whether there is any new information on any handles while running.
When the script has ended we will see some bool(false) dumps - for when no new information was available (handles were still processing) - along with all handles if everything was successful, or only some of the handles if one handle failed.
I have failed at fixing this; it's probably something I have overlooked, and I have gone too far down the road of attempting to fix things which are not relevant.
Some of my attempts at fixing this were:
Assigning each $ch handle to an array - $ch[1], $ch[2] etc. (instead of adding the current $ch handle to the multi handle and then overwriting it, as in the test above)
Removing handles after success/failure with curl_multi_remove_handle
Setting CURLOPT_CONNECTTIMEOUT and CURLOPT_TIMEOUT to infinity
Many more (I will update this post as I have forgotten all of them).
Testing this with PHP version 5.4.14.
Hopefully I have illustrated the points well enough.
Thanks for reading.
I've been mucking around with your script for a while now trying to get it to work. It was only when I read "Repeated calls to this function will return a new result each time, until a FALSE is returned as a signal that there is no more to get at this point." on http://se2.php.net/manual/en/function.curl-multi-info-read.php that I realized a while loop might work.
The extra while loop makes it behave exactly how you'd expect. Here is the output I get:
http://smotri.com/video/list/sports/ failed
http://smotri.com/video/list/travel/ failed
http://smotri.com/video/list/gaming/ failed
http://smotri.com/video/list/erotic/ failed
http://smotri.com/video/list/humour/ failed
http://smotri.com/video/list/animals/ success
http://smotri.com/video/list/film/ success
http://smotri.com/video/list/auto/ success
http://smotri.com/video/list/ failed
http://smotri.com/video/list/hobby/ failed
http://smotri.com/video/list/mult/ failed
Here's the code I used for testing:
<?php
set_time_limit(0);
$l = array(
    'http://smotri.com/video/list/',
    'http://smotri.com/video/list/sports/',
    'http://smotri.com/video/list/animals/',
    'http://smotri.com/video/list/travel/',
    'http://smotri.com/video/list/hobby/',
    'http://smotri.com/video/list/gaming/',
    'http://smotri.com/video/list/mult/',
    'http://smotri.com/video/list/erotic/',
    'http://smotri.com/video/list/auto/',
    'http://smotri.com/video/list/humour/',
    'http://smotri.com/video/list/film/'
);
$mh = curl_multi_init();
$s = 0;
$f = 10;
while($s <= $f)
{
    $ch = curl_init();
    if($s%2)
    {
        $curlsettings = array(
            CURLOPT_URL => $l[$s],
            CURLOPT_TIMEOUT_MS => 3000,
            CURLOPT_RETURNTRANSFER => 1,
        );
    }
    else
    {
        $curlsettings = array(
            CURLOPT_URL => $l[$s],
            CURLOPT_TIMEOUT_MS => 4000,
            CURLOPT_RETURNTRANSFER => 1,
        );
    }
    curl_setopt_array($ch, $curlsettings);
    curl_multi_add_handle($mh,$ch);
    $s++;
}
$active = null;
do
{
    $mrc = curl_multi_exec($mh,$active);
    curl_multi_select($mh);
    while($info = curl_multi_info_read($mh))
    {
        echo '<pre>';
        //var_dump($info);
        if($info['result'] === 0)
        {
            echo curl_getinfo($info['handle'],CURLINFO_EFFECTIVE_URL) . ' success<br>';
        }
        else
        {
            echo curl_getinfo($info['handle'],CURLINFO_EFFECTIVE_URL) . ' failed<br>';
        }
    }
} while ($active > 0);
curl_multi_close($mh);
Hope that helps. For testing, just adjust CURLOPT_TIMEOUT_MS to suit your internet connection. I made it alternate between 3000 and 4000 milliseconds, as 3000 will fail and 4000 usually succeeds.
Update
After going through the PHP and libcurl docs I have found out how curl_multi_exec works (in libcurl it's curl_multi_perform). Upon first being called it starts handling transfers for all the added handles (added beforehand via curl_multi_add_handle).
The number it assigns to $active is the number of transfers still running. So if it's less than the total number of handles you have, then you know one or more transfers are complete. curl_multi_exec therefore acts as a kind of progress indicator as well.
As all transfers are handled in a non-blocking fashion (transfers can finish simultaneously), the while loop that curl_multi_exec sits in cannot treat each iteration as one completed URL request.
All data is stored in a queue so as soon as one or more transfers are complete you can call curl_multi_info_read to fetch this data.
In my original answer I had curl_multi_info_read in a while loop. This loop would keep iterating until curl_multi_info_read found no remaining data in the queue. After which the outer while loop would move onto the next iteration if $active != 0 (meaning curl_multi_exec reported transfers still not complete).
To summarize, the outer loop keeps iterating when there are still transfers not completed and the inner loop iterates only when there's data from a completed transfer.
The PHP documentation is pretty bad for curl multi functions so I hope this cleared a few things up. Below is an alternative way to do the same thing.
do
{
    curl_multi_exec($mh,$active);
} while ($active > 0);

// while($info = curl_multi_info_read($mh)) would work also here
for($i = 0; $i <= $f; $i++){
    $info = curl_multi_info_read($mh);
    if($info['result'] === 0)
    {
        echo curl_getinfo($info['handle'],CURLINFO_EFFECTIVE_URL) . ' success<br>';
    }
    else
    {
        echo curl_getinfo($info['handle'],CURLINFO_EFFECTIVE_URL) . ' failed<br>';
    }
}
From this information you can also see curl_multi_select is not needed as you don't want something that blocks until there is activity.
With the code you provided in your question, it only seemed like cURL wasn't proceeding after a few failed transfers, but there was actually still data queued in the buffer. Your code just wasn't calling curl_multi_info_read enough times. The reason all the successful transfers were picked up by your code is that PHP runs on a single thread, so the script hung waiting for the requests. The timeouts for the failed requests didn't make PHP hang/wait long enough, so the number of iterations the while loop performed was smaller than the number of queued results.
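For completeness, here is a hedged sketch (my own variation, not code from the question or answers above) of draining the queue inside the exec loop, so each result is handled as soon as its transfer finishes; it assumes the handles were added with CURLOPT_RETURNTRANSFER set:
$results = array();
do {
    curl_multi_exec($mh, $active);
    // read everything that has finished since the last pass
    while ($info = curl_multi_info_read($mh)) {
        $url = curl_getinfo($info['handle'], CURLINFO_EFFECTIVE_URL);
        $results[$url] = ($info['result'] === CURLE_OK)
            ? curl_multi_getcontent($info['handle']) // body of a successful transfer
            : false;                                 // failed (e.g. timed out)
        curl_multi_remove_handle($mh, $info['handle']);
        curl_close($info['handle']);
    }
} while ($active > 0);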
I have a web portal that needs to download many separate JSON files and display their contents in a sort of form view. By many I mean 32 separate files minimum.
I've tried cURL with brute-force iteration and it's taking ~12.5 seconds.
I've tried curl_multi_exec as demonstrated here http://www.php.net/manual/en/function.curl-multi-init.php with the function below and it's taking ~9 seconds. A little better, but still terribly slow.
function multiple_threads_request($nodes){
    $mh = curl_multi_init();
    $curl_array = array();
    foreach($nodes as $i => $url)
    {
        $curl_array[$i] = curl_init($url);
        curl_setopt($curl_array[$i], CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $curl_array[$i]);
    }
    $running = NULL;
    do {
        curl_multi_exec($mh,$running);
    } while($running > 0);
    $res = array();
    foreach($nodes as $i => $url)
    {
        $res[$url] = curl_multi_getcontent($curl_array[$i]);
    }
    foreach($nodes as $i => $url){
        curl_multi_remove_handle($mh, $curl_array[$i]);
    }
    curl_multi_close($mh);
    return $res;
}
I realize this is an inherently expensive operation, but does anyone know any other alternatives that might be faster?
EDIT: In the end, my system was limiting curl_multi_exec, and moving the code to a production machine saw dramatic improvements.
You should definitely look into benchmarking your cURL requests to see which one has the slowdown. This was too lengthy for a comment, so let me know if it helps or not:
// revert to "cURLing with brute force iteration" as you described it :)
$curl_timer = array();
foreach($curlsite as $row)
{
    $start = microtime(true);
    /**
     * curl code
     */
    $curl_timer[] = (microtime(true)-$start);
}
echo '<pre>'.print_r($curl_timer, true).'</pre>';
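If you also want per-request numbers from the curl_multi version, one option (a sketch only, reusing the $nodes/$curl_array variables from multiple_threads_request() above and placed right after its curl_multi_exec loop, before the handles go out of scope) is to ask cURL itself for the timings:
// Sketch: per-handle timing via curl_getinfo, inside multiple_threads_request()
$timing = array();
foreach ($nodes as $i => $url) {
    $timing[$url] = array(
        'total_time'   => curl_getinfo($curl_array[$i], CURLINFO_TOTAL_TIME),
        'connect_time' => curl_getinfo($curl_array[$i], CURLINFO_CONNECT_TIME),
    );
}
echo '<pre>'.print_r($timing, true).'</pre>';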
I have 1000 feed URLs sitting in a MySQL database table. I need to do an HTTP request to all these URLs every 2 minutes. I wrote a PHP script to do that, but the script takes 5 min 30 sec to run.
I want to be able to finish all the 1000 requests in under a minute. Is there a way to run multiple async processes to get the job done faster? Any help is appreciated. Thanks in advance.
Since your question is about sending HTTP requests, not really ping, you can use grequests (Requests + gevent) to do it easily and fast (in my experience, seconds for a couple hundred URL requests):
import grequests
urls = [
'http://www.python.org',
'http://python-requests.org',
'http://www.google.com',
]
rs = (grequests.get(u) for u in urls)
grequests.map(rs) # [<Response [200]>, <Response [200]>, <Response [200]>]
Your PHP script takes 5 minutes to run because it is synchronous code, which means that for every request you send, you have to wait for the response to arrive before moving on to sending the next request.
The trick here is not to wait (or block, as many would call it) for responses but to go straight on to making the next request, and you can achieve this easily with gevent (coroutine-based) or Node.js. You can read more on it here.
Have a look at the AnyEvent::Ping or AnyEvent::FastPing modules on CPAN.
Below is a straightforward example of using AnyEvent::Ping to ping 10,000 URLs:
use strict;
use warnings;
use AnyEvent;
use AnyEvent::Ping;

my $cv = AnyEvent->condvar;
my $ping = AnyEvent::Ping->new;
my @results;

for my $url (get_ten_thousand_urls()) {
    $cv->begin;

    # ping each URL just once
    $ping->ping( $url, 1, sub {
        # [ url address, ping status ]
        push @results, [ $url, $_[0]->[0][0] ];
        $cv->end;
    });
}

$cv->recv;

# now do something with @results
Some quick tests of the above using 10,000 random URLs all took just over 7 seconds to run on my MacBook Air. With tweaking and/or using a faster event loop, this time will drop further (the above used the default pure Perl event loop).
NB. AnyEvent is an abstraction library which will allow you to use the async event system provided by (or installed on) your system. If you want to use a specific event loop then remember to install the relevant Perl module from CPAN, e.g. EV if using libev. AnyEvent will default to a pure Perl event loop if nothing else is found (installed).
BTW, if you just need to check an HTTP request (i.e. not ping) then simply replace the AnyEvent::Ping part with AnyEvent::HTTP.
You tagged this with "python", so I'll assume that using Python is an option here. Look at the multiprocessing module. For example:
#!/usr/bin/env python
import multiprocessing
import os
import requests
import subprocess

addresses = ['1.2.3.4', '1.2.3.5', '4.2.2.1', '8.8.8.8']
null = open(os.devnull, 'w')

def fbstatus(count):
    """Returns the count, and the HTTP status code from fetching Facebook's
    status page"""
    return (count,
            requests.get('http://www.facebook.com/status.php').status_code)

def ping(address):
    """Returns the address, and True if the ping returned in under 5 seconds or
    else False"""
    return address, not subprocess.call(['ping', '-c1', '-W5', address],
                                        stdout=null)

pool = multiprocessing.Pool(15)
if False:
    print pool.map(ping, addresses)
else:
    pool.map(fbstatus, range(1000))
New - Fetching pages
The fbstatus() function fetches a page from Facebook. This scaled almost linearly with the size of the pool up through 30 concurrent processes. It averaged a total runtime of about 80 seconds on my laptop. At 30 workers, it took a total of about 3.75 wall clock seconds to finish.
Old - Pinging
This uses the subprocess module to call the ping command with a 5 second timeout and a count of 1. It uses the return value of ping (0 for success, 1 for failure) and negates it to get False for failure and True for success. The ping() function returns the address it was called with plus that boolean result.
The last bit creates a multiprocessing pool with 5 child processes, then calls ping() on each of the values in addresses. Since ping() returns its address, it's really easy to see the result of pinging each of those addresses.
Running it, I get this output:
[('1.2.3.4', False), ('1.2.3.5', False), ('4.2.2.1', True), ('8.8.8.8', True)]
That run took 5.039 seconds of wallclock time and 0% CPU. In other words, it spent almost 100% of its time waiting for ping to return. In your script, you'd want to use something like Requests to fetch your feed URLs (and not the literal ping command that I was using as an example), but the basic structure could be nearly identical.
You could try multithreading ping in Python.
Here is a good example.
#!/usr/bin/env python2.5
from threading import Thread
import subprocess
from Queue import Queue

num_threads = 4
queue = Queue()
ips = ["10.0.1.1", "10.0.1.3", "10.0.1.11", "10.0.1.51"]

#wraps system ping command
def pinger(i, q):
    """Pings subnet"""
    while True:
        ip = q.get()
        print "Thread %s: Pinging %s" % (i, ip)
        ret = subprocess.call("ping -c 1 %s" % ip,
                              shell=True,
                              stdout=open('/dev/null', 'w'),
                              stderr=subprocess.STDOUT)
        if ret == 0:
            print "%s: is alive" % ip
        else:
            print "%s: did not respond" % ip
        q.task_done()

#Spawn thread pool
for i in range(num_threads):
    worker = Thread(target=pinger, args=(i, queue))
    worker.setDaemon(True)
    worker.start()

#Place work in queue
for ip in ips:
    queue.put(ip)

#Wait until worker threads are done to exit
queue.join()
I used Perl's POE Ping Component module for this task quite extensively.
[Update: Re-tested this with maxSockets = 100 and while connected to a very good network connection. The script finished in < 1 second, meaning the biggest factor is probably network thruput / latency, as previously noted. Your results will almost certainly vary. ;) ]
You can use node.js for this, as its API for doing HTTP is powerful, clean, and simple. E.g. the following script fetches ~1000 requests in less than one second on my MacBook Pro:
test.js
var http = require('http');

// # of simultaneous requests allowed
http.globalAgent.maxSockets = 100;

var n = 0;
var start = Date.now();

function getOne(url) {
    var id = n++;
    var req = http.get(url, function(res) {
        res.on('data', function(chunk){
            // do whatever with response data here
        });
        res.on('end', function(){
            console.log('Response #' + id + ' complete');
            n--;
            if (n == 0) {
                console.log('DONE in ' + (Date.now() - start)/1000 + ' secs');
            }
        });
    });
}

// Fire off 1000 requests
for (var i = 0; i < 1000; i++) {
    getOne('http://www.facebook.com/status.php');
}
Outputs ...
$ node test.js
Response #3 complete
Response #0 complete
Response #2 complete
...
Response #999 complete
DONE in 0.658 secs
Thanks, Alex Lunix, for the suggestion. I looked up curl_multi_* and found a solution to do it in cURL, so I don't have to change my code much. But thank you to all the others for your answers. Here is what I did:
<?php
require("class.php");
$obj = new module();
$det = $obj->get_url();
$batch_size = 40;

function curlTest2($urls) {
    clearstatcache();
    $batch_size = count($urls);
    $return = '';
    echo "<br/><br/>Batch:";
    foreach ($urls as &$url)
    {
        echo "<br/>".$url;
        if(substr($url,0,4)!="http") $url = "http://".$url;
        $url = "https://ajax.googleapis.com/ajax/services/feed/load?v=1.0&num=-1&q=".$url;
    }
    $userAgent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)';
    $chs = array();
    for ($i = 0; $i < $batch_size; $i++)
    {
        $ch = curl_init();
        array_push($chs, $ch);
    }
    for ($i = 0; $i < $batch_size; $i++)
    {
        curl_setopt($chs[$i], CURLOPT_HEADER, 1);
        curl_setopt($chs[$i], CURLOPT_NOBODY, 1);
        curl_setopt($chs[$i], CURLOPT_USERAGENT, $userAgent);
        curl_setopt($chs[$i], CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($chs[$i], CURLOPT_CONNECTTIMEOUT, 15);
        curl_setopt($chs[$i], CURLOPT_FAILONERROR, 1);
        curl_setopt($chs[$i], CURLOPT_FRESH_CONNECT, 1);
        curl_setopt($chs[$i], CURLOPT_URL, $urls[$i]);
    }
    $mh = curl_multi_init();
    for ($i = 0; $i < $batch_size; $i++)
    {
        curl_multi_add_handle($mh, $chs[$i]);
    }
    $active = null;
    //execute the handles
    do {
        $mrc = curl_multi_exec($mh, $active);
    } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    while ($active && $mrc == CURLM_OK) {
        if (curl_multi_select($mh) != -1) {
            do {
                $mrc = curl_multi_exec($mh, $active);
            } while ($mrc == CURLM_CALL_MULTI_PERFORM);
        }
    }
    //close the handles
    for ($i = 0; $i < $batch_size; $i++)
    {
        curl_multi_remove_handle($mh, $chs[$i]);
    }
    curl_multi_close($mh);
}

$startTime = time();
$urls = array();
foreach($det as $key=>$value){
    array_push($urls, $value['url']);
    if (count($urls) == $batch_size)
    {
        curlTest2($urls);
        $urls = array();
    }
}
echo "<br/><br/>Time: ".(time() - $startTime)."sec";
?>
This brought my processing time down from 332 sec to 18 sec. The code can probably be optimized a little, but you get the gist of it.
I'm trying to take a list of 20,000+ domain names and check if they are "alive". All I really need is a simple HTTP code check, but I can't figure out how to get that working with curl_multi. In a separate script I'm using, I have the following function which simultaneously checks a batch of 1000 domains and returns the JSON response code. Maybe this can be modified to just get the HTTP response code instead of the page content?
$dotNetRequests = array( /* ... list of domains ... */ );
//loop through arrays
foreach(array_chunk($dotNetRequests, 1000) as $Netrequests) {
    $results = checkDomains($Netrequests);
    $NetcurlRequest = array_merge($NetcurlRequest, $results);
}

function checkDomains($data) {
    // array of curl handles
    $curly = array();
    // data to be returned
    $result = array();
    // multi handle
    $mh = curl_multi_init();
    // loop through $data and create curl handles
    // then add them to the multi-handle
    foreach ($data as $id => $d) {
        $curly[$id] = curl_init();
        $url = (is_array($d) && !empty($d['url'])) ? $d['url'] : $d;
        curl_setopt($curly[$id], CURLOPT_URL, $url);
        curl_setopt($curly[$id], CURLOPT_HEADER, 0);
        curl_setopt($curly[$id], CURLOPT_RETURNTRANSFER, 1);
        // post?
        if (is_array($d)) {
            if (!empty($d['post'])) {
                curl_setopt($curly[$id], CURLOPT_POST, 1);
                curl_setopt($curly[$id], CURLOPT_POSTFIELDS, $d['post']);
            }
        }
        curl_multi_add_handle($mh, $curly[$id]);
    }
    // execute the handles
    $running = null;
    do {
        curl_multi_exec($mh, $running);
    } while($running > 0);
    // get content and remove handles
    foreach($curly as $id => $c) {
        // $result[$id] = curl_multi_getcontent($c);
        // if($result[$id]) {
        if (curl_multi_getcontent($c)){
            //echo "yes";
            $netName = $data[$id];
            $dName = str_replace(".net", ".com", $netName);
            $query = "Update table1 SET dotnet = '1' WHERE Domain = '$dName'";
            mysql_query($query);
        }
        curl_multi_remove_handle($mh, $c);
    }
    // all done
    curl_multi_close($mh);
    return $result;
}
In any other language you would thread this kind of operation ...
https://github.com/krakjoe/pthreads
And you can in PHP too :)
I would suggest a few workers rather than 20,000 individual threads ... not that 20,000 threads is out of the realms of possibility - it isn't ... but that wouldn't be a good use of resources. I would do as you are now and have 20 workers getting the results of 1000 domains each ... I assume you don't need me to give an example of getting a response code; I'm sure cURL would give it to you, but it's probably overkill to use cURL given that you do not require its threading capabilities: I would fsockopen port 80, fprintf a "GET / HTTP/1.0\n\n" request, fgets the first line and close the connection ... if you're going to be doing this all the time then I would also send Connection: close so that the receiving machines are not holding connections unnecessarily ...
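A rough PHP sketch of that fsockopen idea (untested; the function name and host are placeholders, and proper HTTP line endings are \r\n):
// Sketch only: grab just the status line of a site on port 80.
function getStatusLine($host, $timeout = 5) {
    $fp = @fsockopen($host, 80, $errno, $errstr, $timeout);
    if (!$fp) {
        return false; // connection refused, timed out, or host down
    }
    fwrite($fp, "GET / HTTP/1.0\r\nHost: $host\r\nConnection: close\r\n\r\n");
    $status = fgets($fp, 128); // e.g. "HTTP/1.0 200 OK"
    fclose($fp);
    return trim($status);
}

var_dump(getStatusLine('example.com')); // placeholder host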
This script works great for handling bulk simultaneous cURL requests using PHP.
I'm able to parse through 50k domains in just a few minutes using it!
https://github.com/petewarden/ParallelCurl/
I have a widget that runs on my homepage which loads XML data from an external source. I want to time out the XML load after x seconds (lately the other site has been having load issues). Here is the function I have so far; I can't figure out how to make the timer interact with simplexml_load_file().
Am I on the right track? Is there a way to make this work? Or is there a better way to do this? If this does time out, I still need the rest of the page to continue loading, so I can't use set_time_limit(), because that will end all script execution, right?
function timer($end) {
    $count = 0;
    while($end > $count) {
        sleep(1);
        $count++;
    }
    return true;
}
$we = simplexml_load_file('http://forecast.weather.gov/MapClick.php?lat=44.08920&lon=-70.17250&FcstType=xml');
if(timer(3)) return;
So you want to set a timeout for simplexml_load_file(). You can't set it specifically, but you can just set it globally (for all socket based streams) before using the function:
ini_set('default_socket_timeout', 3);
$we = simplexml_load_file($url);
// you can restore the default value after use, if you want
ini_restore('default_socket_timeout');
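In the widget's context that might look something like this (a sketch using the feed URL from the question; simplexml_load_file() returns false on failure, so the rest of the page can still render):
ini_set('default_socket_timeout', 3); // give the remote feed 3 seconds
$we = @simplexml_load_file('http://forecast.weather.gov/MapClick.php?lat=44.08920&lon=-70.17250&FcstType=xml');
ini_restore('default_socket_timeout');

if ($we === false) {
    // feed timed out or was invalid; skip the widget and carry on
}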
I would use CURL instead of loading the URL directly...
function getXml($url, $timeout = 0){
    $ch = curl_init($url);
    curl_setopt_array($ch,array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT => (int) $timeout
    ));
    if($xml = curl_exec($ch)){
        return new SimpleXmlElement($xml);
    }
    else {
        return null;
    }
}
//Example
$xmlData = getXml('http://yoururl.com', 2); // 2 second timeout
You could first read the content of the file with some blocking or more reliable function (like fopen, fsockopen or cURL, whichever you can best use) and then pass the content to simplexml_load_string instead of simplexml_load_file.
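For instance, a minimal sketch of that approach with cURL (assuming the forecast URL from the question and a 3-second timeout):
$ch = curl_init('http://forecast.weather.gov/MapClick.php?lat=44.08920&lon=-70.17250&FcstType=xml');
curl_setopt_array($ch, array(
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_TIMEOUT => 3 // give up after 3 seconds
));
$body = curl_exec($ch);
curl_close($ch);

$we = ($body !== false) ? simplexml_load_string($body) : false;
if ($we === false) {
    // timed out or invalid XML; render the rest of the page without the widget
}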