Sometimes when downloading the source of a webpage and trying to decode it, I get an error: gzdecode(): insufficient memory. (My memory limit is 500M, and usage is far below that.)
I include the headers with my curl output; they are correctly separated from the content before decoding. The Content-Encoding header of the pages is clearly gzip. I read on php.net that passing a length argument could cause such a crash, but I do not use a length argument with gzdecode().
So while seemingly everything should be fine, I still get the error. The last time I hit it was with this page: https://ahmia.fi/address/.
Is there perhaps something about HTTPS that I am not aware of? My curl setting is \CURLOPT_SSL_VERIFYPEER => false.
Any help appreciated!
CURLOPT_ENCODING (from the PHP manual): The contents of the "Accept-Encoding: " header. This enables decoding of the response. Supported encodings are "identity", "deflate", and "gzip". If an empty string, "", is set, a header containing all supported encoding types is sent.
Try this:
CURLOPT_ENCODING => ""
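For example, a minimal sketch of the curl call (the URL and the rest of the option set are placeholders, not taken from your actual code):
$ch = curl_init('https://ahmia.fi/address/');
curl_setopt_array($ch, array(
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_SSL_VERIFYPEER => false,
    // Empty string: curl sends an Accept-Encoding header listing every
    // encoding it supports and transparently decompresses the response,
    // so no manual gzdecode() call is needed afterwards.
    CURLOPT_ENCODING       => '',
));
$body = curl_exec($ch);
curl_close($ch);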
How do I correctly compress a string so that PHP will be able to decompress it?
I tried this:
public static byte[] compress(String string) throws IOException {
    ByteArrayOutputStream os = new ByteArrayOutputStream(string.length());
    DeflaterOutputStream gos = new DeflaterOutputStream(os);
    // ALSO TRIED GZIPOutputStream, same results!
    gos.write(string.getBytes());
    gos.close();
    byte[] compressed = os.toByteArray();
    os.close();
    return compressed;
}
But PHP does not recognize the output as a valid gzip-compressed string...
The problem seems to be some header/footer bytes that Android adds...
For example, when I compress the word "something" via PHP with gzcompress() I get results similar to Android's, but not similar enough for PHP to be able to read it:
something (HEX DUMP):
Android: 1f8b08000000000000002bcecf4d2dc9c8cc4b0700fb31da0909000000
PHP: 789c2bcecf4d2dc9c8cc4b0700134703cf
The weirdest thing is that changing GZIPOutputStream to DeflaterOutputStream fixed the problem for the word "something", but the problem still appears with longer strings...
PS. Removing the leading 10 bytes from the Android-generated data does not help at all.
EDIT: I tried to decompress it in PHP with:
gzdecode() - this function does not exist in the standard Debian PHP5 build.
gzuncompress() - does not work.
And some functions meant to emulate gzdecode() from the php.net comments, which don't really do much.
All of the above, both with the first 10 bytes removed and with them left in place.
PS2. I have tried every single solution from Stack Overflow and other sources, and still nothing. It is not a duplicate.
EDIT2 (BINARY DUMP): Sample data generated with Android that can't be decompressed by gzuncompress() or the pseudo-gzdecode() functions from php.net: data.compressed.
It is supposed to be some JSON after decompression.
The Android data that starts with 1f8b is a gzip stream. In PHP you use gzdecode() for that; gzencode() in PHP makes gzip streams.
The PHP data that starts with 789c is a zlib stream. You used gzcompress() to make that, and you would use gzuncompress() to decode it.
The compressed data contained within both of those streams, starting with 2bce, is raw deflate data. You can use gzinflate() to decode that if you happen to produce it somewhere, and gzdeflate() to generate raw deflate data.
Just to rant: gzencode(), gzcompress(), and gzdeflate() are some of the most misleading function names ever concocted, since only one of them is related to gzip yet all start with gz, and nothing in the name gzcompress() indicates zlib.
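As a quick reference, a rough sketch pairing each format with its PHP encode/decode functions (note that gzdecode() only exists from PHP 5.4 onward):
// gzip stream  (starts with 1f 8b): gzencode()   / gzdecode()
// zlib stream  (starts with 78 ..): gzcompress() / gzuncompress()
// raw deflate  (no header at all) : gzdeflate()  / gzinflate()
$original = 'something';
var_dump(gzdecode(gzencode($original))       === $original); // gzip round trip
var_dump(gzuncompress(gzcompress($original)) === $original); // zlib round trip
var_dump(gzinflate(gzdeflate($original))     === $original); // raw deflate round trip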
Update:
The "EDIT2" data is, for some reason, doubly compressed. It was compressed first to the zlib format, and then that zlib stream was compressed to the gzip format. (Though gzip couldn't compress the already compressed data, so it's a little bigger.)
You should repair the problem that made it doubly compressed. Or if you have no control over that, you can doubly decompress it, first stripping the gzip header using the RFC 1952 specification and then gzinflate() on the raw deflate data, and then using gzdecompress() on the result.
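In PHP, that double decompression might look roughly like this (a sketch; $data stands for the downloaded "EDIT2" payload, and gzdecode() requires PHP >= 5.4, so on older builds the gzip header has to be stripped by hand before calling gzinflate()):
// $data is the doubly compressed payload: a zlib stream wrapped in a gzip stream.
$zlib = gzdecode($data);             // step 1: undo the outer gzip layer
$json = gzuncompress($zlib);         // step 2: undo the inner zlib layer
$decoded = json_decode($json, true); // the result is supposed to be JSON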
Whenever I try to read a Google alert via PHP using something like:
$feed = file_get_contents("http://www.google.com/alerts/feeds/01445174399729103044/950192755411504138");
Regardless of whether I save $feed to a file or echo the result to the output, all UTF-8 Unicode characters (i.e. those with diacritics) are represented by whitespace. I have tried, without success, various combinations of:
utf8_encode
utf8_decode
iconv
mb_convert_encoding
I think the wrong characters have come from the stream, but I'm lost because if I try this URI in a browser then everything is fine. Can anyone shed some light on the issue?
Sorry, you are absolutely correct: there is something untoward happening! Though it is not what you would first suspect... For reference, given that:
echo mb_detect_encoding($feed); // prints: ASCII
The Unicode data is lost before it is even sent by the remote server. It appears that Google looks at the User-Agent string in the request header, which is non-existent when using file_get_contents() with its defaults, i.e. without a stream context.
Because it cannot identify the client making the request, it defaults to and forces ASCII encoding. This is presumably a necessary fallback in the event of some kind of cataclysmic cock-up. [citation needed...]
It's not enough simply to name your application, however; you need to include a known vendor. I'm unsure of the full extent of this, but I believe most folks include "Mozilla [version]" to work around the issue, for example:
$url = 'http://www.google.com/...';
$feed = file_get_contents($url, false, stream_context_create([
    'http' => [
        'method' => 'GET',
        'header' => 'Accept-Charset: UTF-8' . "\r\n"
                  . 'User-Agent: (Mozilla/5.0 compatible) MyFeedReader/1.0'
    ]
]));
file_put_contents('test.txt', $feed); // should now work as expected
While evaluating the performance of PHP frameworks I came across a strange problem:
Sending JSON as application/json seems to be much slower than sending it with no extra header (which appears to fall back to text/html).
Example #1 (application/json)
header('Content-Type: application/json');
echo json_encode($data);
Example #2 (text/html)
echo json_encode($data);
Testing with Apache Bench (ab -c10 -n1000) gives me:
Example #1: 350 #/sec
Example #2: 440 #/sec
which shows that setting the extra header makes it a little slower.
But:
Getting the same JSON via "ajax" (jQuery.getJSON('url', function(j){console.log(j)});) makes the difference very large (timing as seen in the Chrome Web Inspector):
Example #1: 340 ms / request
Example #2: 980 ms / request
What is the reason for this difference?
Is there a reason to use application/json despite the performance difference?
I'll take up the last part of the question:
Is there a reason to use application/json despite the performance difference?
Answer: Yes
Why:
1) With text/html, malformed JSON goes uncaught until you try to parse it; with application/json a malformed response fails right away, so it is easy to debug.
2) If you are viewing the JSON in a browser, having the header type set gives it user-friendly formatting, whereas text/html shows it more as a blob.
3) If you are consuming this JSON on your web page, application/json is immediately converted into a JS object, and you can access its members as obj.firstnode.childnode etc.
4) The callback feature works with application/json, but not with text/html.
Note:
Using gzip will go a long way toward alleviating the performance problem. text/html will still be a bit faster, but it is not the recommended way to fetch JSON objects (see the sketch just below).
I would like to see more insight on the performance side, though. The header length is definitely not what causes the performance issue; it more likely has to do with how your web server analyzes the header format.
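A rough sketch of the gzip suggestion above, using PHP's built-in ob_gzhandler output handler (which compresses the response only when the client advertises gzip support; $data is assumed to already exist):
// Compress the JSON response when the client sends Accept-Encoding: gzip.
ob_start('ob_gzhandler');
header('Content-Type: application/json');
echo json_encode($data);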
Does your server handle gzipping/deflate differently depending on the content type? Mine does. I believe ab does not accept gzip by default (you can set this in ab with a custom header via the -H flag), but Chrome will always say it accepts gzipping.
You can run a curl test to see whether the files come back at different sizes:
curl http://www.example.com/whatever --silent -H "Accept-Encoding: gzip,deflate" --write-out "size_download=%{size_download}\n" --output /dev/null
You can also look at the headers to see if gzipping is applied:
curl http://www.example.com/whatever -I -H "Accept-Encoding: gzip,deflate"
I'm trying to mimic an application that sends octet streams to and from a server. The data contained in the body looks like raw bytes, and I'm fairly certain the data being sent for each command is static, so I'm hoping to map the bytes to something more readable in my application. For example, I'll have an array along the lines of "test" => "&^D^^&#*#dgkel", so I can look up "test" and get the real bytes that need to be sent.
The trouble is, PHP seems to convert these bytes. I'm not sure if it is an encoding problem or what, but what has been happening is that I'll give it some bytes (for example, �ھ����#�qs��������������������X����������������������������), which I believe have a length of 67, but PHP will say (when I do a var_dump of the HTTP request) that the headers sent contained "Content-Length: 174" or something close to that, and the bytes will look like �ھ����#�qs��������������������X����������������������������
So I'm not really sure how to fix this. Anyone have any ideas? Cheers!
Edit, a little PHP:
$request = new HttpRequest($this->GetMessageURL(), HTTP_METH_POST);
$request->addHeaders($headers);
$request->addRawPostData($buttonMapping[$button]);
$request->send();
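For what it's worth, one way the $buttonMapping lookup described above could be written is with pack(), so that each payload is declared as a hex string rather than pasted as literal characters (a sketch; the hex value below is a placeholder, not real data):
// Hypothetical command map: readable name => exact raw bytes.
// pack('H*', ...) converts a hex string into the corresponding byte
// sequence, so the result does not depend on the source file's encoding.
$buttonMapping = array(
    'test' => pack('H*', 'deadbeef00ff'), // placeholder hex
);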
I am having a problem with an echo/print that returns a large amount of data. The response is broken and comes back in this order:
end of data
http response header printed in body
start of data
I am running the following script in my browser to replicate the problem:
<?php
// Make a large array of strings
$arr = array();
for ($i = 0; $i < 10000; $i++) {
    $arr[] = "testing this string because it is much longer than all the rest to see if we can replicate the problem. testing this string because it is much longer than all the rest to see if we can replicate the problem. testing this string because it is much longer than all the rest to see if we can replicate the problem.";
}

// Create one large string from the array
$var = implode("-", $arr);

// Set HTTP headers to ensure we are not 'chunking' the response
header('Content-Length: ' . strlen($var));
header('Content-Type: text/html');

// Print response
echo $var;
?>
What is happening here?
Can someone else try this?
You might have automatic output buffering activated on your server. If the buffer overflows, it just starts spitting out the rest of the data unbuffered instead.
Note that something like gzip compression also implicitly buffers the output. If that is the case, an ob_end_flush() call after the headers should solve it.
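A minimal sketch of that fix, applied to the test script above (assuming an implicit output buffer is active):
// After sending the headers, flush and close any implicit output buffer
// so the large body is streamed out in order rather than being buffered.
header('Content-Length: ' . strlen($var));
header('Content-Type: text/html');
if (ob_get_level() > 0) {
    ob_end_flush();
}
echo $var;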
Browsers often limit the number of characters you're allowed to pass through a GET variable.
To work around this, you could Base64-encode the string, and then decode it once you have received the response.
I think there are JavaScript Base64 encoding libraries available.
Like this one:
http://www.webtoolkit.info/javascript-base64.html
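On the PHP side, the Base64 round trip could look roughly like this (a sketch; $longString is an assumed variable):
$encoded = base64_encode($longString);    // safe to place in a URL or GET variable
$decoded = base64_decode($encoded, true); // strict mode: returns false on invalid input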