I need to decompress, in PHP, some data that I have in a string and that uses the gzip format. I need to do this in PHP itself, not by calling an external program (through system(), for example).
I went to the documentation and found gzdecode. Too bad it doesn't exist in my version. Digging further through Google, it appears this function was implemented in PHP 6, which I cannot use. (Interestingly enough, gzencode exists and works.)
I believe (but I'm not sure) that the gzip format simply has some extra header data. Is there a way to decompress it by manipulating this extra data and then using gzuncompress, or some other way?
Thanks
gzdecode() is not in PHP yet (it only arrived later, in PHP 5.4.0), but you can use the implementation from upgradephp. The difference really is just a few extra header bytes.
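To see what those header bytes are, here is a minimal sketch that unpacks the fixed 10-byte header defined in RFC 1952 (magic bytes, compression method, flags, modification time, extra flags, OS). The helper name parseGzipHeader() is hypothetical, and it deliberately ignores the optional FEXTRA/FNAME/FCOMMENT/FHCRC fields that may follow the fixed part.

```php
<?php
// Hypothetical helper: decode the fixed 10-byte gzip member header (RFC 1952).
// Returns null if $data does not start with the gzip magic bytes 1f 8b.
function parseGzipHeader(string $data): ?array
{
    if (strlen($data) < 10 || substr($data, 0, 2) !== "\x1f\x8b") {
        return null;
    }
    // C = unsigned char, V = unsigned 32-bit little-endian integer
    $h = unpack('Cid1/Cid2/Cmethod/Cflags/Vmtime/Cxfl/Cos', $data);
    return [
        'method' => $h['method'], // 8 = deflate
        'flags'  => $h['flags'],  // bit 0 FTEXT, 1 FHCRC, 2 FEXTRA, 3 FNAME, 4 FCOMMENT
        'mtime'  => $h['mtime'],  // Unix timestamp, 0 if unknown
        'xfl'    => $h['xfl'],    // extra flags (compression level hint)
        'os'     => $h['os'],     // operating system that created the file
    ];
}

$header = parseGzipHeader(gzencode('hello'));
var_dump($header['method'], $header['flags']);
```

If none of the optional flag bits are set, the deflate data starts right at offset 10 and the last 8 bytes are the CRC32 and the uncompressed size.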
Another option would be to use gzopen(), perhaps even with a data: URI, like gzopen("data:app/bin,....").
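gzopen() takes a file name, so one straightforward way to use it on a string held in memory is to write the string to a temporary file first. A minimal sketch, assuming the script may write to sys_get_temp_dir(); the helper name gzdecodeViaFile() is hypothetical, and it uses a temporary file rather than the data: URI variant.

```php
<?php
// Hypothetical helper: decompress an in-memory gzip string by writing it to a
// temporary file and reading it back through gzopen()/gzread().
function gzdecodeViaFile(string $data)
{
    $tmp = tempnam(sys_get_temp_dir(), 'gz');
    if ($tmp === false || file_put_contents($tmp, $data) === false) {
        return false;
    }
    $gz = gzopen($tmp, 'rb');
    if ($gz === false) {
        unlink($tmp);
        return false;
    }
    $out = '';
    while (!gzeof($gz)) {
        $chunk = gzread($gz, 8192);
        if ($chunk === false) { // read error
            $out = false;
            break;
        }
        if ($chunk === '') {    // nothing left to read
            break;
        }
        $out .= $chunk;
    }
    gzclose($gz);
    unlink($tmp);
    return $out;
}

echo gzdecodeViaFile(gzencode('hello world')), "\n";
```

Note that gzopen() in read mode also reads files that are not gzip-compressed, passing their content through unchanged, so this does not double as a format check.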
Well, I found my answer in the comments on the gzdecode manual page I linked in my original post. One of the users, Aaron G, provided an implementation, and it works:
<?php
// Fallback only: gzdecode() is built into PHP 5.4.0 and later, and
// redeclaring it there would be a fatal error.
if (!function_exists('gzdecode')) {
    function gzdecode($data) {
        $len = strlen($data);
        if ($len < 18 || strcmp(substr($data, 0, 2), "\x1f\x8b")) {
            return null; // Not GZIP format (see RFC 1952)
        }
        $method = ord(substr($data, 2, 1)); // Compression method
        $flags  = ord(substr($data, 3, 1)); // Flags
        if (($flags & 31) != $flags) {
            // Reserved bits are set -- NOT ALLOWED by RFC 1952
            return null;
        }
        // NOTE: $mtime may be negative (PHP integer limitations)
        $mtime = unpack("V", substr($data, 4, 4));
        $mtime = $mtime[1];
        $xfl = substr($data, 8, 1);
        $os  = substr($data, 9, 1);
        $headerlen = 10;
        $extralen = 0;
        $extra = "";
        if ($flags & 4) {
            // FEXTRA: 2-byte length-prefixed EXTRA data follows the fixed header
            if ($len - $headerlen - 2 < 8) {
                return false; // Invalid format
            }
            $extralen = unpack("v", substr($data, 10, 2));
            $extralen = $extralen[1];
            if ($len - $headerlen - 2 - $extralen < 8) {
                return false; // Invalid format
            }
            $extra = substr($data, 12, $extralen);
            $headerlen += 2 + $extralen;
        }
        $filenamelen = 0;
        $filename = "";
        if ($flags & 8) {
            // FNAME: zero-terminated file NAME in header
            if ($len - $headerlen - 1 < 8) {
                return false; // Invalid format
            }
            $filenamelen = strpos(substr($data, $headerlen), chr(0));
            if ($filenamelen === false || $len - $headerlen - $filenamelen - 1 < 8) {
                return false; // Invalid format
            }
            $filename = substr($data, $headerlen, $filenamelen);
            $headerlen += $filenamelen + 1;
        }
        $commentlen = 0;
        $comment = "";
        if ($flags & 16) {
            // FCOMMENT: zero-terminated COMMENT in header
            if ($len - $headerlen - 1 < 8) {
                return false; // Invalid format
            }
            $commentlen = strpos(substr($data, $headerlen), chr(0));
            if ($commentlen === false || $len - $headerlen - $commentlen - 1 < 8) {
                return false; // Invalid header format
            }
            $comment = substr($data, $headerlen, $commentlen);
            $headerlen += $commentlen + 1;
        }
        $headercrc = "";
        if ($flags & 2) {
            // FHCRC: the 2 lowest-order bytes of the CRC32 of the header
            if ($len - $headerlen - 2 < 8) {
                return false; // Invalid format
            }
            $calccrc = crc32(substr($data, 0, $headerlen)) & 0xffff;
            $headercrc = unpack("v", substr($data, $headerlen, 2));
            $headercrc = $headercrc[1];
            if ($headercrc != $calccrc) {
                return false; // Bad header CRC
            }
            $headerlen += 2;
        }
        // GZIP FOOTER - these may be negative due to PHP's integer limitations
        $datacrc = unpack("V", substr($data, -8, 4));
        $datacrc = $datacrc[1];
        $isize = unpack("V", substr($data, -4));
        $isize = $isize[1];
        // Perform the decompression:
        $bodylen = $len - $headerlen - 8;
        if ($bodylen < 1) {
            // This should never happen - IMPLEMENTATION BUG!
            return null;
        }
        $body = substr($data, $headerlen, $bodylen);
        $data = "";
        if ($bodylen > 0) {
            switch ($method) {
                case 8:
                    // Currently the only supported compression method:
                    $data = gzinflate($body);
                    break;
                default:
                    // Unknown compression method
                    return false;
            }
        } else {
            // I'm not sure if zero-byte body content is allowed.
            // Allow it for now... Do nothing...
        }
        // Verify decompressed size and CRC32:
        // NOTE: This may fail with large data sizes depending on how
        // PHP's integer limitations affect strlen() since $isize
        // may be negative for large sizes.
        if ($isize != strlen($data) || crc32($data) != $datacrc) {
            // Bad format! Length or CRC doesn't match!
            return false;
        }
        return $data;
    }
}
?>
Try gzinflate.
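A minimal sketch of the gzinflate() route, assuming the gzip string has no optional header fields (true for gzencode() output, whose FLG byte is 0). The helper name gzdecodeSimple() is hypothetical: it skips the 10-byte header and the 8-byte footer and inflates the raw deflate body in between.

```php
<?php
// Sketch: gzip = 10-byte header + raw deflate data + 8-byte footer (CRC32, ISIZE).
// When the FLG byte (offset 3) has none of the optional-field bits set, the
// deflate body can be handed straight to gzinflate().
function gzdecodeSimple(string $data)
{
    if (strlen($data) < 18 || substr($data, 0, 3) !== "\x1f\x8b\x08") {
        return false; // not gzip, or not the deflate method (8)
    }
    // FTEXT (bit 0) is only a hint and adds no bytes; any higher bit means
    // FHCRC/FEXTRA/FNAME/FCOMMENT (or reserved) data this sketch does not parse.
    if (ord($data[3]) > 1) {
        return false;
    }
    return gzinflate(substr($data, 10, -8));
}

echo gzdecodeSimple(gzencode('hello world')), "\n";
```

Unlike the full parser above, this does not verify the CRC32 and ISIZE values in the footer.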
Have you tried gzuncompress()?
http://www.php.net/manual/en/function.gzuncompress.php
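Keep in mind that gzuncompress() expects the zlib format (RFC 1950, as produced by gzcompress()), not the gzip format (RFC 1952, as produced by gzencode()), so it fails on a gzip string. A small demonstration; it also uses zlib_decode(), which detects gzip, zlib and raw deflate data automatically but requires PHP 5.4 or later.

```php
<?php
$text = 'hello gzip';

$zlibData = gzcompress($text); // zlib format (RFC 1950)
$gzipData = gzencode($text);   // gzip format (RFC 1952)

// gzuncompress() handles the zlib format...
var_dump(gzuncompress($zlibData) === $text);
// ...but not gzip: it returns false (the @ hides the "data error" warning).
var_dump(@gzuncompress($gzipData));
// zlib_decode() (PHP >= 5.4) accepts both formats.
var_dump(zlib_decode($gzipData) === $text, zlib_decode($zlibData) === $text);
```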
Related
I'm writing an attribute to an HDF5 file using UTF-8 encoding. As an example, I've written "äöüß" to the attribute "notes" in the file.
I'm now trying to parse the output of h5ls (or h5dump) to extract this data back. Either tool gives me an output like this:
ATTRIBUTE "notes" {
DATATYPE H5T_STRING {
STRSIZE 8;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_UTF8;
CTYPE H5T_C_S1;
}
DATASPACE SIMPLE { ( 1 ) / ( 1 ) }
DATA {
(0): "\37777777703\37777777644\37777777703\37777777666\37777777703\37777777674\37777777703\37777777637"
}
}
I'm aware that, e.g., \37777777703\37777777644 somehow encodes ä as 0xC3 0xA4; however, I'm having a really hard time working out how this encoding works.
What's the magic formula behind this and how can I properly decode it back into äöüß?
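The bytes are written as octal (base 8) escapes. The values are consistent with each byte being treated as a signed char and sign-extended to 32 bits before printing: 0xC3 becomes 0xFFFFFFC3, which is 37777777703 in octal, so only the lowest 8 bits (the last three octal digits, masked with 0xFF) carry the actual byte. I've decoded them in the PHP backend using: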
// Single quotes are needed for this test string: in a double-quoted PHP string,
// \377 would be read as an octal escape instead of a literal backslash.
$line = 'This is the text including some UTF-8 bytes \37777777703\37777777644\37777777703\37777777666\37777777703\37777777674\37777777703\37777777637';
// extract UTF-8 bytes (the last three octal digits of each escape)
preg_match_all('/\\\\37777777([0-7]{3})/', $line, $octbytes);
// parse extracted bytes
for ($m = 0; $m < count($octbytes[1]); ) {
    $B = octdec($octbytes[1][$m]) & 255;
    // a UTF-8 character may span 1 to 4 bytes
    if (($B & 0xF8) == 0xF0) { $numBytes = 4; }
    else if (($B & 0xF0) == 0xE0) { $numBytes = 3; }
    else if (($B & 0xE0) == 0xC0) { $numBytes = 2; }
    else { $numBytes = 1; }
    $hxstr = "";
    $replaceStr = "";
    for ($j = 0; $j < $numBytes; $j++) {
        $match = $octbytes[1][$m+$j];
        $dec = octdec($match) & 255;
        // always two hex digits per byte, so pack("H*") stays aligned
        $hxstr .= sprintf("%02X", $dec);
        $replaceStr .= "\\37777777" . $match;
    }
    // pack extracted bytes into one binary string (the UTF-8 character)
    $utfChar = pack("H*", $hxstr);
    // replace the escapes in the input with the decoded character
    $line = str_replace($replaceStr, $utfChar, $line);
    // go to next character
    $m += $numBytes;
}
echo "The parsed line: $line";
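Since the goal is only to restore the raw bytes (which then form valid UTF-8 on their own), grouping the escapes per character is not strictly necessary. A shorter sketch, assuming every escape has the form \37777777 followed by three octal digits, as in the h5dump output above; the helper name decodeH5Escapes() is hypothetical. Each escape becomes chr(octdec(digits) & 0xFF): for example, octal 703 is 451 in decimal, and 451 & 0xFF = 195 = 0xC3, the first byte of ä.

```php
<?php
// Hypothetical helper: turn h5dump-style escapes such as \37777777703 back into
// raw bytes. Each escape is a sign-extended byte printed in octal, so only the
// lowest 8 bits matter: 037777777703 & 0xFF === 0xC3.
function decodeH5Escapes(string $line): string
{
    return preg_replace_callback(
        '/\\\\37777777([0-7]{3})/',
        function (array $m): string {
            return chr(octdec($m[1]) & 0xFF);
        },
        $line
    );
}

// Single quotes keep the backslashes literal, as they appear in h5dump output.
$raw = '\37777777703\37777777644\37777777703\37777777666\37777777703\37777777674\37777777703\37777777637';
echo decodeH5Escapes($raw), "\n"; // äöüß (when the output is displayed as UTF-8)
```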
I have written a wrapper class around a byte stream in order to read bit by bit from that stream (bit arrays) using this method:
public function readBits($len) {
if($len === 0) {
return 0;
}
if($this->nextbyte === null) {
//no byte has been started yet
if($len % 8 == 0) {
//don't start a byte with the cache, even number of bytes
$ret = 0;
//just return byte count not bit count
$len /= 8;
while ($len--) {
if($this->bytestream->eof()) {
//no more bytes
return false;
}
$byte = $this->bytestream->readByte();
$ret = ($ret << 8) | ord($byte);
}
return $ret;
} else {
$this->nextbyte = ord($this->bytestream->readByte());
$this->byteshift = 0;
}
}
if($len <= 8 && $this->byteshift + $len <= 8) {
//get the bitmask e.g. 00000111 for 3
$bitmask = self::$includeBitmask[$len - 1];
//can be satisfied with the remaining bits
$ret = $this->nextbyte & $bitmask;
//shift by len
$this->nextbyte >>= $len;
$this->byteshift += $len;
} else {
//read the remaining bits first
$bitsremaining = 8 - $this->byteshift;
$ret = $this->readBits($bitsremaining);
//decrease len by the amount bits remaining
$len -= $bitsremaining;
//set the internal byte cache to null
$this->nextbyte = null;
if($len > 8) {
//read entire bytes as far as possible
for ($i = intval($len / 8); $i > 0; $i--) {
if($this->bytestream->eof()) {
//no more bytes
return false;
}
$byte = $this->bytestream->readByte();
$ret = ($ret << 8) | ord($byte);
}
//reduce len to the rest of the requested number
$len = $len % 8;
}
//read a new byte to get the rest required
$newbyte = $this->readBits($len);
$ret = ($ret << $len) | $newbyte;
}
if($this->byteshift === 8) {
//delete the cached byte
$this->nextbyte = null;
}
return $ret;
}
This allows me to read bit arrays of arbitrary length off my byte stream; they are returned as integers, and PHP integers are always signed (PHP has no unsigned integers).
The problem appears once I try to read a bit array of 64 bits or more, and I assume that if I used the class on a 32-bit system, the problem would already appear with 32-bit arrays.
The problem is that the return value is obviously too big to be held in an integer, so it overflows: the high bits are lost and the value can turn negative.
My question now is what would be the best way to deal with this. I can think of:
Forcing the number to be saved as a string (I am unsure if that's even possible)
Use the GMP extension (which I kinda don't want to because I think the gmp bitwise methods are probably quite a performance hit compared to the normal bitwise operators)
Is there something I've missed, or is one of the options I mentioned actually the best way to deal with this problem?
Thanks for your help in advance
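A minimal sketch of the first option, assuming the caller can work with a string of '0'/'1' characters instead of an integer, so the result is no longer capped at PHP_INT_SIZE * 8 bits. The helper bitsAsString() is hypothetical; it reads most-significant-bit first from a plain binary string and does not reproduce the LSB-first handling of partial bytes in readBits() above.

```php
<?php
// Hypothetical helper: return $len bits of the binary string $bytes, starting
// at bit $bitOffset, as a string of '0'/'1' characters (most significant bit
// of each byte first). A string has no 32/64-bit limit, unlike a PHP integer.
function bitsAsString(string $bytes, int $bitOffset, int $len): ?string
{
    $bits = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        // each byte becomes exactly 8 characters, e.g. "\x0F" -> "00001111"
        $bits .= str_pad(decbin(ord($bytes[$i])), 8, '0', STR_PAD_LEFT);
    }
    if ($bitOffset < 0 || $len < 0 || $bitOffset + $len > strlen($bits)) {
        return null; // not enough bits in the input
    }
    return (string) substr($bits, $bitOffset, $len);
}

// 72 bits: more than fits in a 64-bit PHP integer.
$bits = bitsAsString(str_repeat("\xFF", 9), 0, 72);
echo strlen($bits), "\n"; // 72
```

Converting the whole input on every call is wasteful; inside the stream class the bits would rather be appended as each byte is read. If arithmetic on the value is needed later, gmp_init($bits, 2) turns such a string into a GMP number (this requires the GMP extension).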
I am using this function:
private function random($len) {
if (@is_readable('/dev/urandom')) {
$f=fopen('/dev/urandom', 'r');
$urandom=fread($f, $len);
fclose($f);
}
$return='';
for ($i=0;$i<$len;++$i) {
if (!isset($urandom)) {
if ($i%2==0) mt_srand(time()%2147 * 1000000 + (double)microtime() * 1000000);
$rand=48+mt_rand()%64;
} else $rand=48+ord($urandom[$i])%64;
if ($rand>57)
$rand+=7;
if ($rand>90)
$rand+=6;
if ($rand==123) $rand=52;
if ($rand==124) $rand=53;
$return.=chr($rand);
}
return $return;
}
I have some forms which trigger this function and I get the error:
int(2) string(200) "is_readable(): open_basedir restriction in effect.
File(/dev/urandom) is not within the allowed path(s):
Is there a way to rewrite this function so that it doesn't use /dev/urandom?
Thank you very much.
From the (previously accepted) answer:
Instead of urandom you can use "rand":
Nooooooooo!
Dealing with open_basedir is one of the things we handle gracefully in random_compat. Seriously consider importing that library and then just using random_bytes() instead of reading from /dev/urandom.
Whatever you do, DON'T USE rand(). Even if you believe there's a use case for it, the security trade-offs are a lie.
Also, if you need a function to generate a random string (depends on PHP 7 or random_compat):
/**
* Note: See https://paragonie.com/b/JvICXzh_jhLyt4y3 for an alternative implementation
*/
function random_string($length = 26, $alphabet = 'abcdefghijklmnopqrstuvwxyz234567')
{
if ($length < 1) {
throw new InvalidArgumentException('Length must be a positive integer');
}
$str = '';
$alphamax = strlen($alphabet) - 1;
if ($alphamax < 1) {
throw new InvalidArgumentException('Invalid alphabet');
}
for ($i = 0; $i < $length; ++$i) {
$str .= $alphabet[random_int(0, $alphamax)];
}
return $str;
}
Demo code: https://3v4l.org/DOjNE
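Applied to the original question, a replacement for random($len) can follow the same pattern. A sketch, assuming PHP 7 or random_compat so that random_int() exists; the name random_alnum() is hypothetical. It draws from the 62 characters 0-9, A-Z and a-z, which are the characters the original function produced (its 123/124 remapping made '4' and '5' twice as likely as the others), and it no longer calls is_readable('/dev/urandom'), which is what triggered the open_basedir error.

```php
<?php
// Hypothetical replacement for random($len), built on random_int()
// (PHP 7+, or the random_compat polyfill) instead of reading /dev/urandom.
function random_alnum(int $len): string
{
    $alphabet = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
    $max = strlen($alphabet) - 1;
    $out = '';
    for ($i = 0; $i < $len; ++$i) {
        // random_int() is uniform over [0, $max], so every character is equally likely
        $out .= $alphabet[random_int(0, $max)];
    }
    return $out;
}

echo random_alnum(20), "\n";
```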
If your host doesn't support random_int(), you can use a function I wrote for myself:
function generateRandomString($length, $secureRand = false, $chars="0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ") {
if (!function_exists("random_int") && $secureRand) {
function random_int($min, $max) {
$range = $max - $min;
if ($range <= 0) return $min;
$log = ceil(log($range, 2));
$bytes = (int)($log / 8) + 1;
$filter = (int)(1 << ((int)($log + 1))) - 1;
do {
$rnd = hexdec(bin2hex(openssl_random_pseudo_bytes($bytes, $s)));
if (!$s) { $rnd = $range + 1; continue; } // not cryptographically strong: discard and retry
$rnd = $rnd & $filter;
} while ($rnd > $range);
return $min + $rnd;
}
}
$charsCount = strlen($chars) - 1;
$output = "";
for ($i=1; $i <= $length; $i++) {
if ($secureRand)
$output .= $chars[random_int(0, $charsCount)];
else
$output .= $chars[mt_rand(0, $charsCount)];
}
return $output;
}
If you need a secure random string (e.g. random passwords):
generateRandomString(8, true);
This will give you an 8-character string.
Is there a way to check whether a CSV file is encoded as UTF-8 without BOM? I want to check the whole file, not a single string.
I could put a special character in the first line, then read the string back and check whether it matches the same string hard-coded in my script. But I don't know if this is a good idea.
Google only turned up this, but the link in the last post is no longer available.
if (mb_check_encoding(file_get_contents($file), 'UTF-8')) {
// yup, all UTF-8
}
You can also go through it line by line with fgets, if the file is large and you don't want to store it all in memory at once. Not sure what you mean by the second part of your question.
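A minimal sketch that combines both checks, assuming the mbstring extension is available: it rejects a leading UTF-8 BOM (EF BB BF) and then validates the file line by line with fgets(), so a large file is never held in memory at once. The helper name isUtf8WithoutBom() is hypothetical.

```php
<?php
// Hypothetical helper: true if the file is valid UTF-8 and does NOT start with
// a UTF-8 byte order mark (EF BB BF). Requires the mbstring extension.
function isUtf8WithoutBom(string $path): bool
{
    $fh = fopen($path, 'rb');
    if ($fh === false) {
        return false;
    }
    $ok = fread($fh, 3) !== "\xEF\xBB\xBF"; // reject a leading BOM
    rewind($fh);
    while ($ok && ($line = fgets($fh)) !== false) {
        // Lines end at "\n", which never occurs inside a multi-byte UTF-8
        // sequence, so validating each line separately is safe.
        $ok = mb_check_encoding($line, 'UTF-8');
    }
    fclose($fh);
    return $ok;
}
```

An empty file counts as valid here; add a size check if that case should be rejected.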
I recommend this function (from the symfony toolkit):
<?php
/**
* Checks if a string is an utf8.
*
* Yi Stone Li <yili@yahoo-inc.com>
* Copyright (c) 2007 Yahoo! Inc. All rights reserved.
* Licensed under the BSD open source license
*
* @param string
*
* @return bool true if $string is valid UTF-8 and false otherwise.
*/
public static function isUTF8($string)
{
for ($idx = 0, $strlen = strlen($string); $idx < $strlen; $idx++)
{
$byte = ord($string[$idx]);
if ($byte & 0x80)
{
if (($byte & 0xE0) == 0xC0)
{
// 2 byte char
$bytes_remaining = 1;
}
else if (($byte & 0xF0) == 0xE0)
{
// 3 byte char
$bytes_remaining = 2;
}
else if (($byte & 0xF8) == 0xF0)
{
// 4 byte char
$bytes_remaining = 3;
}
else
{
return false;
}
if ($idx + $bytes_remaining >= $strlen)
{
return false;
}
while ($bytes_remaining--)
{
if ((ord($string[++$idx]) & 0xC0) != 0x80)
{
return false;
}
}
}
}
return true;
}
But since it checks every character of the string, I don't recommend using it on a large file. Just check the first 10 lines, e.g.:
<?php
$handle = fopen("mycsv.csv", "r");
$check_string = "";
$line = 1;
if ($handle) {
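// check the line count first so an 11th line is never read; no length limit,
// so fgets() cannot split a multi-byte character across two reads
while ($line < 11 && ($buffer = fgets($handle)) !== false) {
$check_string .= $buffer;
$line++;
}
// fewer than 10 lines read without reaching EOF means fgets() really failed
if ($line < 11 && !feof($handle)) {
echo "Error: unexpected fgets() fail\n";
}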
fclose($handle);
var_dump( self::isUTF8($check_string) );
}
Calling md5() in PHP gives me the hex string directly. Before saving it in the database, I want to drop the leading zero of every byte whose value is < 0x10 (so 0x0c is written as "c" instead of "0c"), and then save the string.
How can I do this in PHP?
MD5 - PHP - Raw Value - catch12 - 214423105677f2375487b4c6880c12ae - This is what I get now. Below is the value that I want the PHP to save in the database.
MD5 - Raw Value - catch12 - 214423105677f2375487b4c688c12ae
Why? In the MD5 code of my Android app's Login and Signup, I did not append the zero for the condition if ((b & 0xFF) < 0x10) hex.append("0");, and that works fine. But the site's Forgot Password functionality is written in PHP, so the mismatch happens when a user resets their password. The Java code is below.
byte raw[] = md.digest();
StringBuffer hexString = new StringBuffer();
for (int i=0; i<raw.length; i++)
hexString.append(Integer.toHexString(0xFF & raw[i]));
v_password = hexString.toString();
Any help on the PHP side so that the mismatch does not happen would be very very helpful. I can't change the App code because that would create problems for existing users.
Thank you.
Pass the "normal" MD5 hash to this function. It will parse it into the individual byte pairs and strip leading zeros.
EDIT: Fixed a typo
function convertMD5($md5)
{
$bytearr = str_split($md5, 2);
$ret = '';
foreach ($bytearr as $byte)
$ret .= ($byte[0] == '0') ? str_replace('0', '', $byte) : $byte;
return $ret;
}
Alternatively, if you don't want zero bytes stripped completely (if you want 0x00 to become '0', which is also what Java's Integer.toHexString(0) returns), use this version:
function convertMD5($md5)
{
$bytearr = str_split($md5, 2);
$ret = '';
foreach ($bytearr as $byte)
$ret .= ($byte[0] == '0') ? $byte[1] : $byte;
return $ret;
}
$md5 = md5('catch12');
$new_md5 = '';
for ($i = 0; $i < 32; $i += 2)
{
if ($md5[$i] != '0') $new_md5 .= $md5[$i];
$new_md5 .= $md5[$i+1];
}
echo $new_md5;
To strip leading zeros (00->0, 0a->a, 10->10)
function stripZeros($md5hex) {
$res =''; $t = str_split($md5hex, 2);
foreach($t as $pair) $res .= dechex(hexdec($pair));
return $res;
}
To strip leading zeros & zero bytes (00->nothing, 0a->a, 10->10)
function stripZeros($md5hex) {
$res =''; $t = str_split($md5hex, 2);
foreach($t as $pair) {
$b = dechex(hexdec($pair));
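if ($b !== '0') $res .= $b; // strict string comparison: under PHP 7, $b != 0 is false for hex strings like 'a' or 'ff', which would drop them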
}
return $res;
}
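Another option is to mirror the Java loop directly: take the raw 16-byte digest with md5($s, true) and convert each byte with dechex(), which, like Java's Integer.toHexString(), produces lowercase hex without a leading zero and '0' for a zero byte. A sketch; the function names are hypothetical, and it assumes the app hashes the same bytes as PHP does (e.g. the same character encoding of the password).

```php
<?php
// Hypothetical helpers reproducing the Android app's hex format: each digest
// byte is written with dechex(), i.e. without a leading zero, matching
// Java's Integer.toHexString(0xFF & b).
function javaStyleHex(string $raw): string
{
    $out = '';
    foreach (unpack('C*', $raw) as $byte) { // unsigned bytes, 0..255
        $out .= dechex($byte);
    }
    return $out;
}

function javaStyleMd5(string $password): string
{
    return javaStyleHex(md5($password, true)); // true = raw 16-byte digest
}

echo javaStyleHex(hex2bin('214423105677f2375487b4c6880c12ae')), "\n";
// 214423105677f2375487b4c688c12ae
```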