I have following method to send file_get_contents requests:
protected function query($page, $post, $timeout = false, $debug = false, $enableWarnings = true) {
    $this->correctHostIfNeeded(self::HTTP_CONNECTION_TYPE);
    $page = $this->host . (substr($page, 0, 1) == '?' ? '' : '?') . $page;
    $opts = array('http' => array(
        'method' => 'POST',
        'header' => 'Content-type: application/x-www-form-urlencoded',
    ));
    if (!empty($post)) {
        $postdata = http_build_query($post);
        $opts['http']['content'] = $postdata;
    }
    if ($timeout > 0)
        $opts['http']['timeout'] = $timeout;
    $context = stream_context_create($opts);
    // when warnings are disabled, suppress them with the @ operator
    $result = $enableWarnings ? file_get_contents($page, false, $context) : @file_get_contents($page, false, $context);
    return $result;
}
It usually works fine, and better than my cURL version (which occasionally fails to execute properly, regardless of the POST data). Unfortunately, if I send a really big POST using file_get_contents (for example an array with 100k elements) it fails. Sometimes the target server saves part of the data, but it never receives all of it.
I know the network connection between the servers is not the problem (both servers are in my own datacenters and the link is stable at about 100 Mb). The code itself on both servers also seems fine, because with smaller data it works, and if I switch to cURL the big payloads are received properly (though cURL sometimes fails, and I have read that this is not unusual behaviour for it).
Increase the execution time of the page by writing this at the top:
ini_set('max_execution_time', 300);
Try to read the file in parts and merge the results afterwards. file_get_contents() lets you specify an offset and a maximum length to read.
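For illustration, a minimal sketch of that chunked approach, assuming a seekable source such as a local file in $file (the fourth and fifth arguments of file_get_contents() are the offset and the maximum length; note that a non-zero offset will not work on a plain http:// stream, which is not seekable):
// Read a large, seekable file in chunks and merge the result.
$chunkSize = 1024 * 1024; // 1 MB per read (arbitrary)
$offset    = 0;
$result    = '';
while (($chunk = file_get_contents($file, false, null, $offset, $chunkSize)) !== false
        && $chunk !== '') {
    $result .= $chunk;
    $offset += strlen($chunk);
}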
Context
I have the following POST pipeline:
index.php -> submit.php -> list/item/new/index.php
index.php has a normal form with an action="submit.php" property.
submit.php decides where to send the following POST request, based on some logic involving the content of the POST variables.
The problem is that I haven't found a successful way to debug this pipeline. Somewhere, something is failing and I would appreciate a fresh pair of eyes.
What I have tried
I have tried running list/item/new/index.php with dummy parameters through a regular GET request. DB updates successfully.
I have tried running submit.php (below) with dummy parameters through a regular GET request. The value of $result is not FALSE, indicating the file_get_contents request was successful, but its value is the literal content of list/new/index.php instead of the generated content, which I expect to be the result of
echo $db->new($hash,$content) && $db->update_content_key($hash);
Here is submit.php
$url = 'list/new/index.php';
if ($test) {
    $content = $_GET["i"];
    $hash = $_GET["h"];
} else {
    $content = $_POST["item"]["content"];
    $hash = $_POST["list"]["hash"];
}
$data = array(
    'item' => array('content' => $content),
    'list' => array('hash' => $hash)
);
$post_content = http_build_query($data);
$options = array(
    'http' => array(
        'header' => "Content-type: application/x-www-form-urlencoded\r\n".
                    "Content-Length: " . strlen($post_content) . "\r\n",
        'method' => 'POST',
        'content' => $post_content
    )
);
$context = stream_context_create($options);
$result = file_get_contents($url, false, $context);
if ($result === FALSE) {
    echo "error";
    //commenting out for testing. should go back to index.php when it's done
    //header('Location: '.$root_folder_path.'list/?h='.$hash.'&f='.$result);
} else {
    var_dump($result);
    //commenting out for testing. should go back to index.php when it's done
    //header('Location: '.$root_folder_path.'list/?h='.$hash.'&f='.str_replace($root_folder_path,"\n",""));
}
And here is list/item/new/index.php
$db = new sql($root_folder_path."connection_details.php");
if ($test) {
    $content = $_GET["i"];
    $hash = $_GET["h"];
} else {
    $content = $_POST["item"]["content"];
    $hash = $_POST["list"]["hash"];
}
// insert into DB, use preformatted queries to prevent SQL injection
echo $db->new($hash,$content) && $db->update_content_key($hash);
The worst thing about this is that I don't know enough PHP to debug it effectively (I actually had it working at some point today, but I did not commit right then...).
All comments and suggestions are welcome. I appreciate your time.
Got it.
I'm not sure what to call the mistake I was making (or what is actually going on behind the scenes), but it was the following:
for the POST request I was using a relative path,
$url = 'list/item/new/index.php';
when I should have used the full URL scheme:
$url = 'https://example.com/list/item/new/index.php';
With only the relative path, file_get_contents() reads the local file and returns its raw source, whereas with the full URL the http:// wrapper is used and the target script is actually executed.
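In case it helps, here is one hypothetical way to build that absolute URL instead of hard-coding it (a sketch only; adjust the scheme, host, and folder layout to your own setup):
// Derive the absolute URL so the http:// wrapper is used and the target
// script gets executed, rather than its source being read from disk.
$scheme = (!empty($_SERVER['HTTPS']) && $_SERVER['HTTPS'] !== 'off') ? 'https' : 'http';
$base   = $scheme . '://' . $_SERVER['HTTP_HOST'] . rtrim(dirname($_SERVER['PHP_SELF']), '/');
$url    = $base . '/list/item/new/index.php';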
I have a website where I need to determine the user's location, so I use a webservice which gives me detailed information about the user (based on their IP address).
My function looks like this:
$user_ip = $_SERVER['REMOTE_ADDR'];
$json_url = 'http://example.com/'.$user_ip;
$json = file_get_contents($json_url);
$obj = json_decode($json);
This morning the webservice had problems (500 errors, too many connections, bad gateway...) and my website took a very long time to load.
So my question is: is it possible to set a timeout for the file_get_contents function? Or is there a way to find out quickly that the server is not responding?
You can set the timeout option of the http context:
$opts = array('http' =>
    array(
        'timeout' => 5
    )
);
$result = file_get_contents($url, false, stream_context_create($opts));
Check the docs of:
stream_context_create()
HTTP context options
An alternative would be to set the default socket timeout via ini_set():
$st = ini_get("default_socket_timeout");  // backup current value
ini_set("default_socket_timeout", 5);     // timeout in seconds
$content = file_get_contents($url);
if ($content === false) {
    // error handling
}
ini_set("default_socket_timeout", $st);   // restore previous value
I have a performance problem with my script (below).
The fread operation takes a lot of time; I get times like:
$time_split2 == 0.00135s
$time_split3 == 15.01747s
I have tested it even with a remote script that does nothing except echo an OK message - there is still the approx. 15 seconds of execution time.
What could be the problem, or how could I solve it another way?
I would prefer not to use cURL (would that speed things up?) since cURL is not always installed with PHP and I would like my code to be portable.
$opts = array('http' =>
    array(
        'method' => 'POST',
        'header' => array('Content-type: application/x-www-form-urlencoded', 'Custom-header: test'),
        'content' => $postdata,
        'timeout' => 60
    )
);
$context = stream_context_create($opts);
$time_split = microtime(true);
$fp = fopen('http://someremotedomain/script.php', 'r', false, $context);
$time_split2 = microtime(true);
$result = '';
while (!feof($fp)) {
    $result .= fread($fp, 4096);
}
fclose($fp);
$time_split3 = microtime(true);
$time_split2 = round($time_split2 - $time_split, 5);
$time_split3 = round($time_split3 - $time_split, 5);
UPDATE
I have used your suggestions - file_get_contents() + Connection: close - but it doesn't work yet: file_get_contents() still responds with a delay and returns an empty string. However, I have isolated the problem; here is the $postdata:
$postdata = http_build_query(
    array(
        'body' => $mail_html_content,
        'from' => 'test <test@test.com>',
        'to'   => 'test2 <test2@test.com>',
    )
);
When I remove 'body' from the array, file_get_contents() works fine and without any delays. How could this create a problem? $mail_html_content contains just a simple HTML string, and it is not a big one.
UPDATE 2
I have isolated the problem even more: when the length of the $postdata string exceeds 1024 characters, file_get_contents() starts to return empty values; below that everything works fine. Since the POST method isn't limited by the length of the data (at least not at such low numbers), what could be the problem now?
You should try file_get_contents() instead of using while(!feof($fp)).
E.g.
/* EDIT: header should be something like that */
$opts = array(
    'http' => array(
        'method' => "POST",
        'header' => "Content-Type: text/html; charset=utf-8",
    ),
);
$context = stream_context_create($opts);
$result = file_get_contents('http://someremotedomain/script.php', false, $context);
For other header information look here
Reason
According to the fread() documentation:
Note:
If you just want to get the contents of a file into a string, use file_get_contents() as
it has much better performance than the code above.
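Putting it together for your case, a rough sketch of the whole POST via file_get_contents() could look like this. It is untested: the Content-Length and "Connection: close" headers only reflect the suggestions discussed above, and whether they resolve the 1024-character issue is not verified; the form-urlencoded content type matches the http_build_query() body.
// Sketch: send the asker's POST data with file_get_contents().
$postdata = http_build_query(array(
    'body' => $mail_html_content,   // assumed to be defined as in the question
    'from' => 'test <test@test.com>',
    'to'   => 'test2 <test2@test.com>',
));
$opts = array(
    'http' => array(
        'method'  => 'POST',
        'header'  => "Content-Type: application/x-www-form-urlencoded\r\n"
                   . "Content-Length: " . strlen($postdata) . "\r\n"
                   . "Connection: close\r\n",
        'content' => $postdata,
        'timeout' => 60,
    ),
);
$context = stream_context_create($opts);
$result  = file_get_contents('http://someremotedomain/script.php', false, $context);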
I would like to stop simplexml_load_file if it takes too long to load and/or the source isn't reachable (occasionally the site hosting the XML goes down), since I don't want my site to lag completely whenever theirs isn't up.
I tried to experiment a bit myself, but haven't managed to make anything work.
Thank you so much in advance for any help!
You can't have an arbitrary function quit after a specified time. What you can do instead is to try to load the contents of the URL first - and if it succeeds, continue processing the rest of the script.
There are several ways to achieve this. The easiest is to use file_get_contents() with a stream context set:
$context = stream_context_create(array('http' => array('timeout' => 5)));
$xmlStr = file_get_contents($url, FALSE, $context);
$xmlObj = simplexml_load_string($xmlStr);
Or you could use a stream context with simplexml_load_file() via the libxml_set_streams_context() function:
$context = stream_context_create(array('http' => array('timeout' => 5)));
libxml_set_streams_context($context);
$xmlObj = simplexml_load_file($url);
You could wrap it as a nice little function:
function simplexml_load_file_from_url($url, $timeout = 5)
{
    $context = stream_context_create(
        array('http' => array('timeout' => (int) $timeout))
    );
    $data = file_get_contents($url, FALSE, $context);
    if (!$data) {
        trigger_error("Couldn't get data from: '$url'", E_USER_NOTICE);
        return FALSE;
    }
    return simplexml_load_string($data);
}
Alternatively, you can consider using the cURL extension (usually available by default). The benefit of using cURL is that you get really fine-grained control over the request and how to handle the response.
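For completeness, roughly what such a cURL variant could look like (a sketch only; CURLOPT_CONNECTTIMEOUT limits the connection phase and CURLOPT_TIMEOUT the whole transfer):
// Fetch the XML with explicit timeouts, then parse it as before.
$ch = curl_init($url);
curl_setopt_array($ch, array(
    CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
    CURLOPT_CONNECTTIMEOUT => 5,      // give up if connecting takes more than 5 s
    CURLOPT_TIMEOUT        => 5,      // give up if the whole transfer takes more than 5 s
));
$xmlStr = curl_exec($ch);
curl_close($ch);
$xmlObj = ($xmlStr !== false) ? simplexml_load_string($xmlStr) : false;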
You should be using a stream context with a timeout option coupled with file_get_contents
$context = stream_context_create(array('http' => array('timeout' => 5))); //<---- Setting timeout to 5 seconds...
and now map that to your file_get_contents
$xml_load = file_get_contents('http://yoururl', FALSE, $context);
$xml = simplexml_load_string($xml_load);
I have an array of URLs (~1000 of them), and I want to check whether each of them exists. Here is my current code:
$south_east_png_endings = array();
for ($x = 1; $x <= 25; $x++) {
    for ($y = 1; $y <= 48; $y++) {
        $south_east_png_endings[] = "{$x}s{$y}e.png";
    }
}
foreach ($south_east_png_endings as $se) {
    $url = 'http://imgs.xkcd.com/clickdrag/'.$se;
    $file_headers = @get_headers($url);
    if ($file_headers[0] == 'HTTP/1.1 404 Not Found') {
        // echo 'Does not exist';
    } else {
        echo $url;
    }
}
This script works - it echoes all the working URLs - but the process takes too long (several minutes to complete). Is there a way to do this faster, or is this as fast as it gets? Maybe I can use cURL timeout functions to shorten the time?
1) get_headers() actually uses GET requests, which are not needed if you just want to know whether a file exists. Use HEAD instead; example from the manual:
<?php
// By default get_headers uses a GET request to fetch the headers. If you
// want to send a HEAD request instead, you can do so using a stream context:
stream_context_set_default(
    array(
        'http' => array(
            'method' => 'HEAD'
        )
    )
);
$headers = get_headers('http://example.com');
?>
2) Since those checks can easily be run in parallel, you should use separate threads/processes to do the checking. However, if you're doing this from home, your router might choke on 1000 requests at once, so you might want to limit yourself to something like 5-20 concurrent requests.
For the parallel check you can use curl_multi, which should be pretty fast. Here is an example, though it is more complex than the example by @eis.
P.S. With cURL you can also use the HEAD-request trick.
function _isUrlexist($url) {
    $flag = false;
    if ($url) {
        $ch = curl_init();
        curl_setopt_array($ch, array(
            CURLOPT_URL => $url,
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_NOBODY => true,
            CURLOPT_HEADER => true
        ));
        curl_exec($ch);
        $info = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch);
        $flag = ($info == 200);
    }
    return $flag;
}
}
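For what it's worth, here is a rough curl_multi sketch of the parallel HEAD checks described above. The batch size is arbitrary, error handling is kept minimal, and it is meant only as a starting point, not a drop-in solution:
// Check URLs in batches of concurrent HEAD requests with curl_multi.
function check_urls_exist(array $urls, $batchSize = 20) {
    $alive = array();
    foreach (array_chunk($urls, $batchSize) as $batch) {
        $mh = curl_multi_init();
        $handles = array();
        foreach ($batch as $url) {
            $ch = curl_init($url);
            curl_setopt_array($ch, array(
                CURLOPT_NOBODY         => true,  // HEAD request, skip the body
                CURLOPT_RETURNTRANSFER => true,
                CURLOPT_TIMEOUT        => 5,
            ));
            curl_multi_add_handle($mh, $ch);
            $handles[$url] = $ch;
        }
        // Run all handles in this batch until they are done.
        do {
            $status = curl_multi_exec($mh, $running);
            if ($running) {
                curl_multi_select($mh);
            }
        } while ($running && $status == CURLM_OK);
        foreach ($handles as $url => $ch) {
            if (curl_getinfo($ch, CURLINFO_HTTP_CODE) == 200) {
                $alive[] = $url;
            }
            curl_multi_remove_handle($mh, $ch);
            curl_close($ch);
        }
        curl_multi_close($mh);
    }
    return $alive;
}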