Performance issue with PHP remote URL fread operation

I have a performance problem with my script (below).
The fread operation takes a lot of time, I get times like:
$time_split2 == 0.00135s
$time_split3 == 15.01747s
I have even tested it with a remote script that does nothing except echo an OK message, and the execution time is still approximately 15 seconds.
What could be the problem, or how could I solve it another way?
I would prefer not to use curl (would that speed things up?), since curl is not always installed with PHP and I would like my code to be portable.
$opts = array('http' =>
array(
'method' => 'POST',
'header' => array('Content-type: application/x-www-form-urlencoded', 'Custom-header: test'),
'content' => $postdata,
'timeout' => 60
)
);
$context = stream_context_create($opts);
$time_split = microtime(true);
$fp = fopen('http://someremotedomain/script.php', 'r', false, $context);
$time_split2 = microtime(true);
$result = ''; // initialize before appending to avoid an undefined-variable notice
while (!feof($fp)) {
    $result .= fread($fp, 4096);
}
fclose($fp);
$time_split3 = microtime(true);
$time_split2 = round($time_split2 - $time_split, 5);
$time_split3 = round($time_split3 - $time_split, 5);
UPDATE
I have tried your suggestions (file_get_contents() plus a Connection: close header), but it still doesn't work: file_get_contents() returns an empty string after a delay. However, I have isolated the problem. Here is the $postdata:
$postdata = http_build_query(
array('body' => $mail_html_content,
'from' => 'test <test@test.com>',
'to' => 'test2 <test2@test.com>',
)
);
When I remove 'body' from the array, file_get_contents() works fine and without any delay. How could this create a problem? $mail_html_content contains just a simple HTML string, and it is not a big string.
UPDATE 2
I have isolated the problem even more: when the length of the $postdata string exceeds 1024 characters, file_get_contents() starts to return empty values; below that length everything works fine. Since the POST method isn't limited by the length of the data (at least not for such low numbers), what could be the problem now?

You should try file_get_contents() instead of the while(!feof($fp)) loop.
E.g.
/* EDIT: header should be something like that */
$opts = array(
'http' => array(
'method'=>"POST",
'header'=>"Content-Type: text/html; charset=utf-8",
),
);
$context = stream_context_create($opts);
$result = file_get_contents('http://someremotedomain/script.php', false, $context);
For other header information look here
Reason
According to the fread() documentation:
Note:
If you just want to get the contents of a file into a string, use file_get_contents() as
it has much better performance than the code above.
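The update in the question mentions combining file_get_contents() with a Connection: close header. A commonly cited cause of a delay like this is that the server keeps the connection open (keep-alive), so the read only finishes when the server-side timeout expires; that is an assumption here, not something confirmed in the thread. A minimal sketch of the context options follows; build_post_options() is a hypothetical helper name, and the network call is left commented out:

```php
<?php
// Hypothetical helper: stream-context options for a form-encoded POST
// that explicitly asks the server to close the connection afterwards.
function build_post_options($postdata, $timeout = 60)
{
    return array('http' => array(
        'method'  => 'POST',
        // Several headers can be passed as one string separated by \r\n.
        'header'  => "Content-Type: application/x-www-form-urlencoded\r\n"
                   . "Connection: close\r\n",
        'content' => $postdata,
        'timeout' => $timeout,
    ));
}

$postdata = http_build_query(array('to' => 'test2 <test2@test.com>'));
$opts     = build_post_options($postdata);
$context  = stream_context_create($opts);

// Placeholder URL taken from the question; uncomment to perform the request.
// $result = file_get_contents('http://someremotedomain/script.php', false, $context);
```

With the connection closed by the server after the response, file_get_contents() should be able to return as soon as the body has been received instead of waiting for a keep-alive timeout.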

Related

Unable to get Healthline search results with PHP

I am trying to run a script that will search Healthline with a query string and determine if there are any search results, but I can't get the contents with the query string posting to the page. To search for something on their site, you go to https://www.healthline.com/search?q1=search+string.
Here is what I tried:
$healthline_url = 'https://www.healthline.com/search';
$search_string = 'ashwaganda';
$postdata = http_build_query(
array(
'q1' => $search_string
)
);
$opts = array('http' =>
array(
'method' => 'POST',
'header' => 'Content-type: application/x-www-form-urlencoded',
'content' => $postdata
)
);
$stream = stream_context_create($opts);
$theHtmlToParse = file_get_contents($healthline_url, false, $stream);
print_r($theHtmlToParse);
I also tried to just add the query string to the url and skip the stream, amongst other variations, but I'm running out of ideas. This also didn't work:
$healthline_url = 'https://www.healthline.com/search';
$search_string = 'ashwaganda';
$opts = array(
'http'=>array(
'method'=>"GET",
'header'=>"Content-Type: text/xml; charset=utf-8"
)
);
$stream = stream_context_create($opts);
$theHtmlToParse = file_get_contents($healthline_url.'?q1='.urlencode($search_string), false, $stream);
print_r($theHtmlToParse);
Any suggestions?
EDIT: I changed the url in case someone wants to look at the search page. Also fixed the query string. Still doesn't work.
In response to Ken Lee, I did try the following cURL script that also just returns the page without search results:
$healthline_url = 'https://www.healthline.com/search?q1=ashwaganda';
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $healthline_url);
$data = curl_exec($ch);
curl_close($ch);
print_r($data);
Healthline does not load the search results directly. Its search index is stored in Algolia, and the page makes extra JavaScript calls to retrieve the results. Therefore you cannot see the search results with file_get_contents().
To see the search results, you need a browser simulator that behaves like a JavaScript-capable browser and actually runs the page.
For PHP developers, you may try using php-webdriver to control browsers through WebDriver (e.g. Selenium, Chrome + chromedriver, Firefox + geckodriver).
Update: Didn't know that the target site is Healthline. Updated the answer once I found out.

using a context-stream resource with file_get_contents returns a NULL string

I'm using PHP 4.3.9 and am trying to POST to a url without a form using stream_context_create like below:
function do_post_request($url, $postdata) {
$content = "";
foreach($postdata as $key => $value)
$content .= "$key=$value&";
$content = urlencode($content);
$params = array('http' => array(
'method' => 'POST',
'header' => 'Content-Type: application/x-www-form-urlencoded',
'content' => $content
));
$ctx = stream_context_create($params);
$result = file_get_contents($url, false, $ctx);
var_dump($result);
}
This code is taken almost word for word from the php manual and I've seen it in several places here on stackoverflow as well.
If I call file_get_contents without $ctx, var_dump($result) displays the page at $url properly (but without the changes the $_POST data would cause, of course). With $ctx, var_dump($result) is NULL. So something is wrong with $ctx, but I have no idea what. Am I setting up my $params incorrectly or something?
Any insight would be appreciated. If there is another way to pass POST data to a url I wouldn't mind hearing that either. But I cannot use cURL (or anything that needs installation) and I'm using an older version of php so my choices are limited.
Thanks
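One thing that stands out in the function above is that urlencode() is applied to the whole "key=value&" string, which also encodes the = and & separators, so the server would not see separate fields. A minimal sketch of encoding each key and value on its own follows; build_post_body() is a hypothetical helper name, and this only addresses the encoding, not necessarily the NULL result:

```php
<?php
// Hypothetical helper: build an application/x-www-form-urlencoded body.
// Each key and value is encoded separately so "=" and "&" stay as separators.
function build_post_body($postdata)
{
    $pairs = array();
    foreach ($postdata as $key => $value) {
        $pairs[] = urlencode($key) . '=' . urlencode($value);
    }
    return implode('&', $pairs);
}

$content = build_post_body(array('name' => 'John Smith', 'q' => 'a&b=c'));
// $content is "name=John+Smith&q=a%26b%3Dc"
```

On PHP 5 and later, http_build_query() does the same job. It may also be worth checking the changelog for file_get_contents() in the manual: as far as I know, the context parameter is listed as added in PHP 5.0.0, which could matter on PHP 4.3.9.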

PHP file_get_contents sending big POST

I have the following method to send file_get_contents requests:
protected function query($page, $post, $timeout = false, $debug = false, $enableWarnings = true) {
$this->correctHostIfNeeded(self::HTTP_CONNECTION_TYPE);
$page = $this->host . (substr($page, 0, 1) == '?' ? '' : '?') . $page;
$opts = array('http' => array(
'method' => 'POST',
'header' => 'Content-type: application/x-www-form-urlencoded',
));
if (!empty($post)) {
$postdata = http_build_query($post);
$opts['http']['content'] = $postdata;
}
if ($timeout > 0)
$opts['http']['timeout'] = $timeout;
$context = stream_context_create($opts);
$result = $enableWarnings ? file_get_contents($page, false, $context) : @file_get_contents($page, false, $context);
return $result;
}
It usually works fine, better than the curl version (which occasionally fails to execute properly, regardless of the data in the POST). Unfortunately, if I send a really big POST using file_get_contents (for example an array with 100k elements), it fails. Sometimes the target server saves part of the data, but it never gets all of it.
I know the internet connection between the servers is not the problem (both servers are in my own datacenters and the speed between them is stable at about 100Mb). The code itself on both servers also seems to be fine, because it works with smaller data, and if I switch to curl, big packages are received properly (unfortunately curl sometimes fails, and I have read that this is not such strange behavior for curl).
Increase the execution time of the page, write this at the top-
ini_set('max_execution_time', 300);
Try to read the file in parts and merge the results afterwards. file_get_contents() accepts offset and maximum-length arguments (note that the PHP manual says seeking with offset is not supported for remote files).

How can I interrupt a PHP function that's taking too long?

I would like to stop a simplexml_load_file if it takes too long to load and/or isn't reachable (occasionally the site with the xml goes down) seeing as I don't want my site to completely lag if theirs aren't up.
I tried to experiment a bit myself, but haven't managed to make anything work.
Thank you so much in advance for any help!
You can't have an arbitrary function quit after a specified time. What you can do instead is to try to load the contents of the URL first - and if it succeeds, continue processing the rest of the script.
There are several ways to achieve this. The easiest is to use file_get_contents() with a stream context set:
$context = stream_context_create(array('http' => array('timeout' => 5)));
$xmlStr = file_get_contents($url, FALSE, $context);
$xmlObj = simplexml_load_string($xmlStr);
Or you could use a stream context with simplexml_load_file() via the libxml_set_streams_context() function:
$context = stream_context_create(array('http' => array('timeout' => 5)));
libxml_set_streams_context($context);
$xmlObj = simplexml_load_file($url);
You could wrap it as a nice little function:
function simplexml_load_file_from_url($url, $timeout = 5)
{
$context = stream_context_create(
array('http' => array('timeout' => (int) $timeout))
);
$data = file_get_contents($url, FALSE, $context);
if(!$data) {
trigger_error("Couldn't get data from: '$url'", E_USER_NOTICE);
return FALSE;
}
return simplexml_load_string($data);
}
Alternatively, you can consider using the cURL extension (commonly available, though not guaranteed to be installed). The benefit of using cURL is that you get really fine-grained control over the request and how to handle the response.
You should be using a stream context with a timeout option coupled with file_get_contents
$context = stream_context_create(array('http' => array('timeout' => 5))); //<---- Setting timeout to 5 seconds...
and now map that to your file_get_contents
$xml_load = file_get_contents('http://yoururl', FALSE, $context);
$xml = simplexml_load_string($xml_load);

PHP json_encode adds some number (hex?) to the beginning of the json-string

I am echoing json_encoded data from one php script to another (the request is made by fsockopen/GET).
When I encode an array with 40 elements, there is no problem. When I do exactly the same thing with 41, some numbers and \r\n are added to the beginning of the JSON string.
This is the beginning of the string just before I echo it:
{"transactions":[{"transaction_id":"03U191739F337671L",
This is how I send the data:
header('Content-Type: text/plain; charset=utf-8');
error_log(json_encode($transaction_list));
echo json_encode($transaction_list);
As soon as I have received the data in the requesting script I print it again to error_log:
27fc\r\n{"transactions":[{"transaction_id":"03U191739F337671L",
The "27fc\r\n" is not there if I retrieve less data.
This is how I handle the response:
$response="";
while (!feof($fp)) {
$response .= fgets($fp, 128);
}
// Separate header and content
$separator_position = strpos($response,"\r\n\r\n");
$header_text = substr($response,0,$separator_position);
$body = substr($response,$separator_position+4);
error_log($body);
fclose($fp);
I have tried playing around with the time out of the fsockopen request, that doesn't matter. The same thing with max_execution_time and max_input_time in php.ini, doesn't matter. I was thinking that the content in some way may have been cut due to time out...
The 41st element has the same format as the preceding ones.
What is happening and how can I fix it?
I am using Linux, Apache (httpd) and PHP.
UPDATE
The data seems to be chunked. In the response, following header is included: "Transfer-Encoding: chunked".
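The "27fc\r\n" prefix is consistent with that header: in chunked transfer encoding, each chunk is preceded by its size in hexadecimal followed by \r\n (27fc is 10236 in decimal), and a raw fsockopen() reader has to strip those markers itself. A minimal sketch of such a decoder follows; decode_chunked_body() is a hypothetical helper name, not part of the original code, and it ignores trailers and does only minimal error handling:

```php
<?php
// Hypothetical helper: decode an HTTP/1.1 "Transfer-Encoding: chunked" body.
// Expects only the body, i.e. the part after the "\r\n\r\n" header separator.
function decode_chunked_body($body)
{
    $decoded = '';
    $offset  = 0;
    $length  = strlen($body);

    while ($offset < $length) {
        // Each chunk starts with "<hex size>[;extensions]\r\n".
        $lineEnd = strpos($body, "\r\n", $offset);
        if ($lineEnd === false) {
            break; // truncated or malformed input
        }
        $sizeLine = substr($body, $offset, $lineEnd - $offset);
        $parts    = explode(';', $sizeLine, 2);
        $size     = (int) hexdec(trim($parts[0]));

        if ($size === 0) {
            break; // a zero-size chunk marks the end of the body
        }

        // Copy the chunk data, then skip the "\r\n" that follows it.
        $decoded .= substr($body, $lineEnd + 2, $size);
        $offset   = $lineEnd + 2 + $size + 2;
    }

    return $decoded;
}

$raw  = "5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n";
$body = decode_chunked_body($raw);
// $body is "hello world"
```

In the response-handling code above, this would be applied to $body only when the header text contains "Transfer-Encoding: chunked". Another option, as far as I know, is to send the request as HTTP/1.0, since chunked encoding is an HTTP/1.1 feature.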
Based on @Salman's idea of using file_get_contents, this is the working solution. It uses POST to send the data (GET didn't seem to work; I think one has to append the query string to the URL oneself):
$postdata = http_build_query(
array('customer_id' => $customer_id)
);
$opts = array('http' =>
array(
'method' => 'POST',
'header' => 'Content-type: application/x-www-form-urlencoded',
'content' => $postdata
)
);
$context = stream_context_create($opts);
$content = file_get_contents($my_url, false, $context);
return $content;
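On the GET remark: the 'content' option of the HTTP context is sent as the request body, so for a GET request the parameters do have to go into the URL. A minimal sketch of doing that with http_build_query() follows; build_get_url() and the example.com URL are hypothetical, used only for illustration:

```php
<?php
// Hypothetical helper: append query parameters to a base URL for a GET request.
function build_get_url($baseUrl, array $params)
{
    // Use "?" for the first parameter, "&" if the URL already has a query string.
    $separator = (strpos($baseUrl, '?') === false) ? '?' : '&';
    return $baseUrl . $separator . http_build_query($params);
}

$url = build_get_url('http://example.com/transactions.php', array('customer_id' => 42));
// $url is "http://example.com/transactions.php?customer_id=42"

// The request itself would then be a plain call (left commented out here):
// $content = file_get_contents($url);
```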
