I would like to stop a simplexml_load_file() call if it takes too long or the URL isn't reachable (occasionally the site hosting the XML goes down), since I don't want my site to lag completely whenever theirs is down.
I tried to experiment a bit myself, but haven't managed to make anything work.
Thank you so much in advance for any help!
You can't have an arbitrary function quit after a specified time. What you can do instead is to try to load the contents of the URL first - and if it succeeds, continue processing the rest of the script.
There are several ways to achieve this. The easiest is to use file_get_contents() with a stream context set:
$context = stream_context_create(array('http' => array('timeout' => 5)));
$xmlStr = file_get_contents($url, FALSE, $context);
$xmlObj = simplexml_load_string($xmlStr);
Or you could use a stream context with simplexml_load_file() via the libxml_set_streams_context() function:
$context = stream_context_create(array('http' => array('timeout' => 5)));
libxml_set_streams_context($context);
$xmlObj = simplexml_load_file($url);
You could wrap it as a nice little function:
function simplexml_load_file_from_url($url, $timeout = 5)
{
    $context = stream_context_create(
        array('http' => array('timeout' => (int) $timeout))
    );
    $data = file_get_contents($url, FALSE, $context);
    if (!$data) {
        trigger_error("Couldn't get data from: '$url'", E_USER_NOTICE);
        return FALSE;
    }
    return simplexml_load_string($data);
}
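Example usage of the wrapper (the feed URL below is just a placeholder):

$xml = simplexml_load_file_from_url('http://example.com/feed.xml', 5);  // placeholder URL
if ($xml !== FALSE) {
    echo $xml->getName();  // work with the SimpleXMLElement as usual
}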
Alternatively, you can consider using cURL (commonly available, though not in every PHP installation). The benefit of using cURL is that you get really fine-grained control over the request and how to handle the response.
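For example, a minimal cURL sketch with a connect timeout and an overall timeout (the feed URL is a placeholder):

$ch = curl_init('http://example.com/feed.xml');   // placeholder URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the body instead of printing it
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);      // give up if connecting takes longer than 5 seconds
curl_setopt($ch, CURLOPT_TIMEOUT, 10);            // give up if the whole request takes longer than 10 seconds
$xmlStr = curl_exec($ch);
curl_close($ch);
$xmlObj = ($xmlStr !== false) ? simplexml_load_string($xmlStr) : false;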
You should be using a stream context with a timeout option, coupled with file_get_contents():
$context = stream_context_create(array('http' => array('timeout' => 5))); //<---- Setting timeout to 5 seconds...
and now pass that context to your file_get_contents() call:
$xml_load = file_get_contents('http://yoururl', FALSE, $context);
$xml = simplexml_load_string($xml_load);
Related
I am trying to get the contents of a URL, but to avoid getting blocked I want to use a proxy for every request.
But neither way seems to work...
EDIT:
Now I tried this, but my server log keeps logging my real IP.
$page = file_get_contents("https://free-proxy-list.net/");
preg_match_all("/[0-9]{1,3}\.[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}<\/td><td>[0-9]{1,5}/", $page, $matches);
$randomproxy = $matches[0][array_rand($matches[0])];
$randomproxy = "tcp://".str_replace("</td><td>", ":", $randomproxy);
echo $randomproxy;
// configure default context to use proxy
$opts = array(
    'tcp' => array(
        'proxy' => $randomproxy
    )
);
$context = stream_context_create($opts);
$sFile = file_get_contents("https://www.website.tld/inner.html", False, $context);
var_dump($sFile);
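For reference, with file_get_contents() the proxy for HTTP(S) requests normally goes under the 'http' context key rather than 'tcp', together with 'request_fulluri'. A minimal sketch under that assumption ($randomproxy would come from the code above):

$opts = array(
    'http' => array(
        'proxy' => $randomproxy,      // e.g. 'tcp://1.2.3.4:8080'
        'request_fulluri' => true,    // many proxies expect the full URI in the request line
    )
);
$context = stream_context_create($opts);
// Note: HTTPS through a plain HTTP proxy may still fail with the stream wrapper,
// depending on the PHP version and the proxy's CONNECT support.
$sFile = file_get_contents("https://www.website.tld/inner.html", false, $context);
var_dump($sFile);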
I want to build code that, given a username, dumps the number of followers from the page source of any Instagram user.
I know a bit about cURL and the DOM concept.
function callInstagram($url)
{
    $ch = curl_init();
    curl_setopt_array($ch, array(
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_SSL_VERIFYPEER => false,
        CURLOPT_SSL_VERIFYHOST => 2
    ));
    $result = curl_exec($ch);
    curl_close($ch);
    return $result;
}

$url = "https://instagram.com/xyz/";
$dom = new domDocument();
$dom->loadHTML(callInstagram($url));
$dom->preserveWhiteSpace = false;
$tables = $dom->getElementsByTagName('script');
print_r($tables);

Still building...
It looks like you are trying to get Instagram's data.
It's better to use Instagram's API to achieve your goal.
Link: https://www.instagram.com/developer/
Edit:
Another way, assuming you can get the full HTML as a string:
Next, use a regex to extract the JSON string.
You can use this regex: _sharedData = (.*);
Finally, use json_decode() to convert the string into a PHP array/object.
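A rough sketch of that approach, assuming $html already holds the profile page's HTML; the regex is the one from above, and the exact JSON layout is Instagram's and may change at any time:

// Extract the JSON assigned to window._sharedData and decode it.
if (preg_match('/_sharedData = (.*);/', $html, $m)) {
    $data = json_decode($m[1], true);
    if ($data !== null) {
        // Inspect the decoded structure to locate the follower count;
        // the path to it depends on Instagram's current page layout.
        print_r(array_keys($data));
    }
}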
I have the following method to send file_get_contents() requests:
protected function query($page, $post, $timeout = false, $debug = false, $enableWarnings = true) {
    $this->correctHostIfNeeded(self::HTTP_CONNECTION_TYPE);
    $page = $this->host . (substr($page, 0, 1) == '?' ? '' : '?') . $page;
    $opts = array('http' => array(
        'method' => 'POST',
        'header' => 'Content-type: application/x-www-form-urlencoded',
    ));
    if (!empty($post)) {
        $postdata = http_build_query($post);
        $opts['http']['content'] = $postdata;
    }
    if ($timeout > 0) {
        $opts['http']['timeout'] = $timeout;
    }
    $context = stream_context_create($opts);
    $result = $enableWarnings ? file_get_contents($page, false, $context) : @file_get_contents($page, false, $context);
    return $result;
}
It usually works fine, better than the cURL version (which occasionally doesn't execute properly, regardless of the POST data). Unfortunately, if I send a really big POST using file_get_contents() (for example, an array with 100k elements), it fails. Sometimes the target server saves part of the data, but it never receives all of it.
I know the internet connection between the servers is not the problem (both servers are in my own datacenters, and the speed between them is a stable ~100 Mb/s). The code itself on both servers also seems fine, because with smaller data it works, and if I switch to cURL, big packages are received properly (unfortunately it sometimes fails, and I've read that this is not too strange a behavior for cURL).
Increase the execution time of the page; write this at the top:
ini_set('max_execution_time', 300);
Try to read the file in parts and merge the results afterwards. In file_get_contents() you can specify the offset and length arguments.
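A minimal sketch of that idea, assuming a local (seekable) file named data.txt; note that the offset parameter generally does not work on remote HTTP streams:

$chunkSize = 1024 * 1024;   // read 1 MB at a time (assumed chunk size)
$offset = 0;
$result = '';
while (($part = file_get_contents('data.txt', false, null, $offset, $chunkSize)) !== false && $part !== '') {
    $result .= $part;
    $offset += strlen($part);
}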
I have a performance problem with my script (below).
The fread operation takes a lot of time; I get times like:
$time_split2 == 0.00135s
$time_split3 == 15.01747s
I have tested it even with a remote script that does nothing except echo an OK message, and there is still the approx. 15-second execution time.
What could be the problem, or how could I solve it another way?
I would prefer not to use cURL (would that speed things up?), since cURL is not always installed with PHP, and I would like my code to be portable.
$opts = array('http' =>
    array(
        'method'  => 'POST',
        'header'  => array('Content-type: application/x-www-form-urlencoded', 'Custom-header: test'),
        'content' => $postdata,
        'timeout' => 60
    )
);
$context = stream_context_create($opts);

$time_split = microtime(true);
$fp = fopen('http://someremotedomain/script.php', 'r', false, $context);
$time_split2 = microtime(true);

$result = '';
while (!feof($fp)) {
    $result .= fread($fp, 4096);
}
fclose($fp);
$time_split3 = microtime(true);

$time_split2 = round($time_split2 - $time_split, 5);
$time_split3 = round($time_split3 - $time_split, 5);
UPDATE
I have used your suggestions (file_get_contents() + Connection: close), but it doesn't work yet: file_get_contents() still runs with a delay and returns an empty string. However, I have isolated the problem; here is the $postdata:
$postdata = http_build_query(
    array(
        'body' => $mail_html_content,
        'from' => 'test <test@test.com>',
        'to'   => 'test2 <test2@test.com>',
    )
);
When I remove 'body' from the array, file_get_contents() works fine and without any delays. How could this create a problem? $mail_html_content contains just a simple HTML string, and it is not a big string.
UPDATE 2
I have isolated the problem even more: when the length of the $postdata string exceeds 1024 chars, file_get_contents() starts to return empty values; below that value everything works fine. Since the POST method isn't limited by the length of the data (at least for such low numbers), what could be the problem now?
You should try file_get_contents() instead of using while(!feof($fp)).
E.g.
/* EDIT: header should be something like that */
$opts = array(
'http' => array(
'method'=>"POST",
'header'=>"Content-Type: text/html; charset=utf-8",
),
);
$context = stream_context_create($opts);
$result = file_get_contents('http://someremotedomain/script.php', false, $context);
For other header information look here
Reason
According to the fread() documentation:
"Note: If you just want to get the contents of a file into a string, use file_get_contents() as it has much better performance than the code above."
file_get_contents() doesn't read data for short URLs.
Example:
http://wp.me/pbZy8-1WM,
http://bit.ly/d00E2C
Please help me handle this. Or is there any cURL option to handle the above links?
This in general works fine. If you find it doesn't do the right thing you can explicitly use a stream context:
$url = "http://bit.ly/d00E2C";
$context = stream_context_create(array('http' => array('max_redirects' => 5)));
$val = file_get_contents($url, false, $context);
should do it. No need to touch cURL for that.
On my machine, I cannot replicate your problem; I receive the page as intended. However, should the issue be with the redirect, this may solve your problem.
<?php
$opts = array(
'http' => array(
'follow_location' => 1,
'max_redirects' => 20
)
);
$context = stream_context_create($opts);
echo file_get_contents('http://wp.me/pbZy8-1WM', false, $context);
I imagine there may be a directive that toggles redirect following, but I have not yet found it. I will edit my answer should I.
What you can do is use cURL with CURLOPT_FOLLOWLOCATION set to true:
$ch = curl_init("http://bit.ly/d00E2C");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$result = curl_exec($ch);
curl_close($ch);
echo $result;