The best way to decode a gzip-compressed response - PHP

Example Code:
<?php
set_time_limit(0);

$authorization = ""; // OAuth authorization credentials
$fp = fsockopen("ssl://userstream.twitter.com", 443);
$headers = array(
    "GET /1.1/user HTTP/1.1",
    "Host: userstream.twitter.com",
    "Accept-Encoding: deflate, gzip",
    "Authorization: OAuth {$authorization}",
    "",
    "",
);
fwrite($fp, implode("\r\n", $headers));

while (!feof($fp)) {
    if (!$size = hexdec(fgets($fp))) {
        break;
    }
    echo DECODE_FUNCTION(fread($fp, $size));
    fgets($fp); // skip the trailing CRLF
}
This example works if I implement DECODE_FUNCTION as:
function DECODE_FUNCTION($str) {
    $filename = stream_get_meta_data($fp = tmpfile())['uri'];
    fwrite($fp, $str);
    ob_start();
    readgzfile($filename);
    return ob_get_clean();
}
However, these cases fail:
function DECODE_FUNCTION($str) {
    return gzuncompress($str);
}
or
function DECODE_FUNCTION($str) {
    return gzinflate($str);
}
or
function DECODE_FUNCTION($str) {
    return gzdecode($str);
}
Creating temporary files seems to add a lot of overhead.
What is the best way?
Thank you.
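One way to avoid the temporary file is PHP's incremental zlib API, inflate_init() and inflate_add(), available since PHP 7. A minimal sketch, assuming the chunks read from the socket form one continuous gzip stream:
// Minimal sketch (PHP 7+): feed each chunk to an incremental gzip decoder
$inflate = inflate_init(ZLIB_ENCODING_GZIP);
while (!feof($fp)) {
    if (!$size = hexdec(fgets($fp))) {
        break;
    }
    echo inflate_add($inflate, fread($fp, $size), ZLIB_SYNC_FLUSH);
    fgets($fp); // skip the trailing CRLF
}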

Related

Writing multiple POST requests using a single connection - PHP

I am writing to a server using the following snippet.
$fp = connect();
$sent_requests = 0;

function connect() {
    $addr = gethostbyname("example.com");
    $fp = fsockopen($addr, 80, $errno, $errstr);
    if (!$fp) {
        echo "$errstr ($errno)<br />\n";
        exit(1);
    } else {
        socket_set_blocking($fp, false);
        echo "Connected\n";
        return $fp;
    }
}

function sendTestCalls($load) {
    global $fp, $sent_requests;
    if (!$fp) {
        echo "reconnecting";
        $sent_requests = 0;
        //echo stream_get_contents($fp) . "\n";
        fclose($fp);
        $fp = connect();
    }
    $data  = "POST /test HTTP/2.0\r\n";
    $data .= "Host: example.com\r\n";
    $data .= "Content-Type: application/json\r\n";
    $data .= "Content-Length: " . strlen($load) . "\r\n";
    $data .= "Connection: Keep-Alive\r\n";
    $data .= "xYtU87BVFluc6: 1\r\n";
    $data .= "\r\n" . $load;

    $bytesToWrite = strlen($data);
    $totalBytesWritten = 0;
    while ($totalBytesWritten < $bytesToWrite) {
        $bytes = fwrite($fp, substr($data, $totalBytesWritten));
        $totalBytesWritten += $bytes;
    }
    $sent_requests++;
}

$time = time();
for ($i = 0; $i < 1000; $i++) {
    sendTestCalls('{"justtesting": "somevalue"}');
}
fclose($fp);
$time_taken = time() - $time; // might be a bit inaccurate
echo "Time Taken: " . $time_taken . "\n";
When I check the access logs on my server, fewer than 1000 POST requests arrive (anywhere from 0 to 900). What am I doing wrong here?
EDIT1
I suspect my socket is timing out. How can I check whether it has disconnected and, in such a scenario, reconnect? I tried using stream_get_meta_data($fp) but it had no effect.
Try inserting this before each request:
$info = stream_get_meta_data($fp);
if ($info['timed_out']) {
    fclose($fp);
    $fp = connect();
}
I eventually found a solution to this using php_curl and keep-alive.
Here is my updated version:
function sendCall(&$curl_handle, $data) {
    curl_setopt($curl_handle, CURLOPT_CUSTOMREQUEST, "POST");
    curl_setopt($curl_handle, CURLOPT_POSTFIELDS, $data);
    curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, true);
    curl_setopt(
        $curl_handle,
        CURLOPT_HTTPHEADER,
        array(
            'Content-Type: application/json',
            'Connection: Keep-Alive',
            'Content-Length: ' . strlen($data)
        )
    );
    $response = curl_exec($curl_handle); // Check response?
    if ($err = curl_error($curl_handle)) {
        error_log("Error - $err Status - Reconnecting");
        $curl_handle = curl_init(curl_getinfo($curl_handle, CURLINFO_EFFECTIVE_URL));
        sendCall($curl_handle, $data);
    }
}
This function gives me an almost always-alive connection (I never hit the error log in more than a week of running). Hope it helps anyone looking for the same.
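For reference, a hedged usage sketch (the endpoint URL is a placeholder); reusing a single handle is what lets cURL keep the underlying TCP connection open between requests:
// Hypothetical usage: initialize one handle and reuse it for every request
$curl_handle = curl_init("http://example.com/test"); // placeholder endpoint
for ($i = 0; $i < 1000; $i++) {
    sendCall($curl_handle, '{"justtesting": "somevalue"}');
}
curl_close($curl_handle);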

If the connection is keep-alive, how to read until the end of the stream in PHP

$f = fsockopen("www....", 80, $x, $y);
fwrite($f, "GET request HTTP/1.1\r\nConnection: keep-alive\r\n\r\n");
while ($s = fread($f, 1024)) {
    ...
}
The above stalls because of the Connection: keep-alive, and works with Connection: close.
How do you do it without stalling?
It depends on the response, if the transfer-encoding of the response is chunked, then you read until you encounter the "last chunk" (\r\n0\r\n).
If the content-encoding is gzip, then you look at the content-length response header, read that much data, and then decompress it. If the transfer-encoding is also set to chunked, then you must dechunk the data before decompressing it.
The easiest thing is to build a simple state machine to read the response from the socket while there is still data left for the response.
When reading chunked data, you should read the first chunk length (and any chunked extension) and then read as much data as the chunk size, and do so until the last chunk.
Put another way:
1. Read the HTTP response headers (read small amounts of data until you encounter \r\n\r\n).
2. Parse the response headers into an array.
3. If the transfer-encoding is chunked, read and dechunk the data piece by piece.
4. Otherwise, if the content-length header is set, read that many bytes from the socket.
5. If the content-encoding is gzip, decompress the data you read.
Once you have performed the above steps, you will have read the entire response, and you can then send another HTTP request on the same socket and repeat the process.
On the other hand, unless you have an absolute need for a keep-alive connection, just set Connection: close in the request and you can safely read while (!feof($f)).
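For reference, a minimal sketch of that simpler approach (hostname and path are placeholders):
// With "Connection: close" the server closes the socket when the response
// is complete, so reading until EOF cannot stall.
$f = fsockopen("www.example.com", 80, $errno, $errstr, 30);
fwrite($f, "GET / HTTP/1.1\r\nHost: www.example.com\r\nConnection: close\r\n\r\n");
$response = '';
while (!feof($f)) {
    $response .= fread($f, 8192);
}
fclose($f);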
I don't have any PHP code for reading and parsing HTTP responses at the moment (I just use cURL) but if you'd like to see actual code, let me know and I can work something up. I could also refer you to some C# code I've made that does all of the above.
EDIT: Here is working code that uses fsockopen to issue an HTTP request and demonstrate reading keep-alive connections with the possibility of chunked encoding and gzip compression. Tested, but not tortured - use at your own risk!!!
<?php
/**
 * PHP HTTP request demo
 * Makes HTTP requests using PHP and fsockopen
 * Supports chunked transfer encoding, gzip compression, and keep-alive
 *
 * @author drew010 <http://stackoverflow.com/questions/11125463/if-connection-is-keep-alive-how-to-read-until-end-of-stream-php/11812536#11812536>
 * @date 2012-08-05
 * Public domain
 *
 */

error_reporting(E_ALL);
ini_set('display_errors', 1);

$host = 'www.kernel.org';
$sock = fsockopen($host, 80, $errno, $errstr, 30);

if (!$sock) {
    die("Connection failed. $errno: $errstr\n");
}
request($sock, $host, 'GET', '/');
$headers = readResponseHeaders($sock, $resp, $msg);
$body = readResponseBody($sock, $headers);
echo "Response status: $resp - $msg\n\n";
echo '<pre>' . var_export($headers, true) . '</pre>';
echo "\n\n";
echo $body;
// if the connection is keep-alive, you can make another request here
// as demonstrated below
request($sock, $host, 'GET', '/kernel.css');
$headers = readResponseHeaders($sock, $resp, $msg);
$body = readResponseBody($sock, $headers);
echo "Response status: $resp - $msg\n\n";
echo '<pre>' . var_export($headers, true) . '</pre>';
echo "\n\n";
echo $body;
exit;
function request($sock, $host, $method = 'GET', $uri = '/', $params = null)
{
    $method = strtoupper($method);
    if ($method != 'GET' && $method != 'POST') $method = 'GET';

    $request = "$method $uri HTTP/1.1\r\n"
             . "Host: $host\r\n"
             . "Connection: keep-alive\r\n"
             . "Accept-encoding: gzip, deflate\r\n"
             . "\r\n";

    fwrite($sock, $request);
}
function readResponseHeaders($sock, &$response_code, &$response_status)
{
    $headers = '';
    $read = 0;

    while (true) {
        $headers .= fread($sock, 1);
        $read += 1;

        if ($read >= 4 && $headers[$read - 1] == "\n" && substr($headers, -4) == "\r\n\r\n") {
            break;
        }
    }

    $headers = parseHeaders($headers, $resp, $msg);

    $response_code = $resp;
    $response_status = $msg;

    return $headers;
}
function readResponseBody($sock, array $headers)
{
    $responseIsChunked = (isset($headers['transfer-encoding']) && stripos($headers['transfer-encoding'], 'chunked') !== false);
    $contentLength = (isset($headers['content-length'])) ? $headers['content-length'] : -1;
    $isGzip = (isset($headers['content-encoding']) && $headers['content-encoding'] == 'gzip') ? true : false;
    $close = (isset($headers['connection']) && stripos($headers['connection'], 'close') !== false) ? true : false;

    $body = '';

    if ($contentLength >= 0) {
        $read = 0;
        do {
            $buf = fread($sock, $contentLength - $read);
            $read += strlen($buf);
            $body .= $buf;
        } while ($read < $contentLength);
    } else if ($responseIsChunked) {
        $body = readChunked($sock);
    } else if ($close) {
        while (!feof($sock)) {
            $body .= fgets($sock, 1024);
        }
    }

    if ($isGzip) {
        $body = gzinflate(substr($body, 10));
    }

    return $body;
}
function readChunked($sock)
{
    $body = '';

    while (true) {
        $data = '';
        do {
            $data .= fread($sock, 1);
        } while (strpos($data, "\r\n") === false);

        if (strpos($data, ' ') !== false) {
            list($chunksize, $chunkext) = explode(' ', $data, 2);
        } else {
            $chunksize = $data;
            $chunkext  = '';
        }

        $chunksize = (int)base_convert($chunksize, 16, 10);

        if ($chunksize === 0) {
            fread($sock, 2); // read trailing "\r\n"
            return $body;
        } else {
            $data = '';
            $datalen = 0;
            while ($datalen < $chunksize + 2) {
                $data .= fread($sock, $chunksize - $datalen + 2);
                $datalen = strlen($data);
            }
            $body .= substr($data, 0, -2); // -2 to remove the "\r\n" before the next chunk
        }
    } // while (true)
}
function parseHeaders($headers, &$response_code = null, &$response_message = null)
{
    $lines = explode("\r\n", $headers);
    $return = array();

    $response = array_shift($lines);

    if (func_num_args() > 1) {
        list($proto, $code, $message) = explode(' ', $response, 3);
        $response_code = $code;
        if (func_num_args() > 2) {
            $response_message = $message;
        }
    }

    foreach ($lines as $header) {
        if (trim($header) == '') continue;
        list($name, $value) = explode(':', $header, 2);
        $return[strtolower(trim($name))] = trim($value);
    }

    return $return;
}
The following code works without any problem for me:
<?php
$f = fsockopen("www.google.de", 80);
fwrite($f, "GET / HTTP/1.1\r\nConnection: keep-alive\r\n\r\n");
while ($s = fread($f, 1024)) {
    echo "got: $s";
}
echo "finished;";
?>
The funny thing is that without keep-alive this example stalls for me.
Can you add an example that can be just copy&pasted and shows your error?

Why can I not download files from some sites like this?

This is my php source code:
<?php
$path = '/images/one.jpg';
$imm = 'http://www.allamoda.eu/wp-content/uploads/2012/05/calzedonia_290x435.jpg';
if( $content = file_get_contents($imm) ){
file_put_contents($path, $content);
echo "Yes";
}else{
echo "No";
}
?>
and I get this error:
Warning: file_get_contents(http://www.allamoda.eu/wp-content/uploads/2012/05/calzedonia_290x435.jpg) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden in /opt/lampp/htdocs/test/down.php on line 4
No
Why ?
There are some headers expected by the server (especially Accept and User-Agent). Use the stream context argument of file_get_contents() to provide them:
<?php
$path = '/images/one.jpg';
$opts = array(
    'http' => array(
        'method' => "GET",
        'header' => "Accept-language: en\r\n" .
                    "Accept: image/png,image/*;q=0.8,*/*;q=0.5\r\n" .
                    "Host: www.allamoda.eu\r\n" .
                    "User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:12.0) Gecko/20100101 Firefox/12.0\r\n"
    )
);
$context = stream_context_create($opts);
$imm = 'http://www.allamoda.eu/wp-content/uploads/2012/05/calzedonia_290x435.jpg';
if ($content = file_get_contents($imm, false, $context)) {
    file_put_contents($path, $content);
    echo "Yes";
} else {
    echo "No";
}
?>
You are not allowed to download this file, the server allamoda.eu says (HTTP 403).
Nothing is wrong with the code. The server simply is not letting you in (either you have made too many requests to it, or it just blocks all scripts scraping it).
You're not allowed to open the file directly, but you can try to fetch its content by using sockets:
function getRemoteFile($url)
{
    // get the host name and url path
    $parsedUrl = parse_url($url);
    $host = $parsedUrl['host'];
    if (isset($parsedUrl['path'])) {
        $path = $parsedUrl['path'];
    } else {
        // the url is pointing to the host like http://www.mysite.com
        $path = '/';
    }
    if (isset($parsedUrl['query'])) {
        $path .= '?' . $parsedUrl['query'];
    }
    if (isset($parsedUrl['port'])) {
        $port = $parsedUrl['port'];
    } else {
        // most sites use port 80
        $port = '80';
    }

    $timeout = 10;
    $response = '';

    // connect to the remote server
    $fp = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if (!$fp) {
        echo "Cannot retrieve $url";
    } else {
        // send the necessary headers to get the file
        fputs($fp, "GET $path HTTP/1.0\r\n" .
                   "Host: $host\r\n" .
                   "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.3) Gecko/20060426 Firefox/1.5.0.3\r\n" .
                   "Accept: */*\r\n" .
                   "Accept-Language: en-us,en;q=0.5\r\n" .
                   "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n" .
                   "Keep-Alive: 300\r\n" .
                   "Connection: keep-alive\r\n" .
                   "Referer: http://$host\r\n\r\n");

        // retrieve the response from the remote server
        while ($line = fread($fp, 4096)) {
            $response .= $line;
        }
        fclose($fp);

        // strip the headers
        $pos = strpos($response, "\r\n\r\n");
        $response = substr($response, $pos + 4);
    }

    // return the file content
    return $response;
}
Example:
$content = getRemoteFile('http://www.allamoda.eu/wp-content/uploads/2012/05/calzedonia_290x435.jpg');
Source

Retrieving page with fsockopen adds numbers to returned string

This is very strange: on some pages it returns the HTML fine; on others it adds numbers to the beginning and end of the returned string ($out).
function lookupPage($page, $return = true) {
    $fp = fsockopen("127.0.0.1", 48580, $errno, $errstr, 5);
    if (!$fp) {
        return false;
    } else {
        $out = "";
        $headers  = "GET /" . $page . " HTTP/1.1\r\n";
        $headers .= "Host: www.site.com\r\n";
        $headers .= "Connection: Close\r\n\r\n";
        fwrite($fp, $headers);
        stream_set_timeout($fp, 300);
        $info = stream_get_meta_data($fp);
        while (!feof($fp) && !$info['timed_out'] && ($line = stream_get_line($fp, 1024)) !== false) {
            $info = stream_get_meta_data($fp);
            if ($return) $out .= $line;
        }
        fclose($fp);
        if (!$info['timed_out']) {
            if ($return) {
                $out = substr($out, strpos($out, "\r\n\r\n") + 4);
                return $out;
            } else {
                return true;
            }
        } else {
            return false;
        }
    }
}
e.g...
3565
<html>
<head>
...
</html>
0
It is called Chunked Transfer Encoding.
It is part of the HTTP/1.1 protocol, and you're decoding it in an HTTP/1.0 way. You can check for those values and trim them if you want; they indicate the length of each chunk so the client knows when it has received the complete response.
Also, maybe look at file_get_contents.
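If you would rather trim the chunk markers yourself after reading the whole response, here is a minimal sketch (decodeChunked is a hypothetical helper; it assumes $out already has the headers stripped, as in your code):
function decodeChunked($raw) {
    $decoded = '';
    while (($pos = strpos($raw, "\r\n")) !== false) {
        $size = hexdec(substr($raw, 0, $pos)); // the chunk-size line is hexadecimal
        if ($size === 0) {
            break; // "0" marks the last chunk
        }
        $decoded .= substr($raw, $pos + 2, $size);
        $raw = substr($raw, $pos + 2 + $size + 2); // skip chunk data plus trailing CRLF
    }
    return $decoded;
}

$out = decodeChunked($out);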
My guess would be that the server responds with chunked data.
Have a look at RFC2616 Transfer codings and its introduction.

Improve HTTP GET PHP scripts

This code gets the headers and content from $url and prints them to the browser. It is really slow, and it's not because of the server. How can I improve it?
$headers = get_headers($url);
foreach ($headers as $value)
    header($value);
$fh = fopen($url, "r");
fpassthru($fh);
Thanks
Why make two requests when one will do?
$fh = fopen($url, 'r');
foreach ($http_response_header as $value) {
    header($value);
}
fpassthru($fh);
Or:
$content = file_get_contents($url);
foreach ($http_response_header as $value) {
    header($value);
}
echo $content;
I'm not sure why you're opening a connection there on line 6 if you have already fetched and printed out the headers. Is this doing more than printing out headers?
If you are really looking to just proxy a page, the cURL functions are much more efficient:
<?php
$curl = curl_init("http://www.google.com");
curl_setopt($curl, CURLOPT_HEADER, true);
curl_exec($curl);
curl_close($curl);
?>
Of course, cURL has to be enabled on your server, but it's not uncommon.
Are you trying to make a proxy? If so, here is a recipe, in proxy.php:
<?php
$host = 'example.com';
$port = 80;
$page = $_SERVER['REQUEST_URI'];

$conn = fsockopen($host, $port, $errno, $errstr, 180);
if (!$conn) throw new Exception("$errstr ($errno)");

$hbase = array();
$hbase[] = 'User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)';
if (!empty($_SERVER['HTTP_REFERER'])) $hbase[] = 'Referer: ' . str_ireplace($_SERVER['HTTP_HOST'], $host, $_SERVER['HTTP_REFERER']);
if (!empty($_SERVER['HTTP_COOKIE'])) $hbase[] = 'Cookie: ' . $_SERVER['HTTP_COOKIE'];
$hbase = implode("\n", $hbase);

if ($_SERVER['REQUEST_METHOD'] === 'POST') {
    $post = file_get_contents("php://input");
    $length = strlen($post);
    $request = "POST $page HTTP/1.0\nHost: $host\n$hbase\nContent-Type: application/x-www-form-urlencoded\nContent-Length: $length\n\n$post";
} else {
    $request = "GET $page HTTP/1.0\nHost: $host\n$hbase\n\n";
}

do {
    $conn = fsockopen($host, 80, $errno, $errstr, 180);
    if (!$conn) throw new Exception("$errstr ($errno)");

    fputs($conn, $request);

    $header = false;
    $body = false;
    stream_set_blocking($conn, false);
    $info = stream_get_meta_data($conn);

    while (!feof($conn) && !$info['timed_out']) {
        $str = fgets($conn);
        if (!$str) {
            usleep(50000);
            continue;
        }
        if ($body !== false) $body .= $str;
        else $header .= $str;
        if ($body === false && $str == "\r\n") $body = '';
        $info = stream_get_meta_data($conn);
    }

    fclose($conn);
} while ($info['timed_out']);

$header = str_ireplace($host, $_SERVER['HTTP_HOST'], $header);
if (stripos($body, $host) !== false) $body = str_ireplace($host, $_SERVER['HTTP_HOST'], $body);

$header = str_replace('domain=.example.com; ', '', $header);
$header_array = explode("\r\n", $header);
foreach ($header_array as $line) header($line);

if (strpos($header, 'Content-Type: text') !== false) {
    $body = str_replace('something', '', $body);
}

echo $body;
In .htaccess:
Options +FollowSymlinks
RewriteEngine on
RewriteBase /
RewriteRule ^(.*)$ proxy.php [QSA,L]
You may be able to pinpoint the slowness by changing $url to a known fast site, or even a local web server. The only likely cause seems to be a slow response from the server.
Of course, as suggested by GZipp, if you're going to output the file contents as well, just do it with a single request. That would make the server you're requesting from happier.
