MD5 hash discrepancy between Python and PHP? - php

I'm trying to create a checksum of a binary file (flv/f4v, etc.) to verify the contents of the file between the server and client computers. The application running on the client computer is Python-based, while the server uses PHP.
PHP code is as follows:
$fh = fopen($filepath, 'rb');
$contents = fread($fh, filesize($filepath));
$checksum = md5(base64_encode($contents));
fclose($fh);
Python code is as follows:
import hashlib

def _get_md5(filepath):
    f = open(filepath, 'rb')
    md5 = hashlib.md5()
    md5.update(f.read().encode('base64'))  # Python 2 'base64' codec
    checksum = md5.hexdigest()
    f.close()
    return checksum
On the particular file I'm testing, the PHP and Python MD5 hash strings are, respectively:
cfad0d835eb88e5342e843402cc42764
0a96e9cc3bb0354d783dfcb729248ce0
The server is running CentOS, while the client is a Mac OS X environment. I would greatly appreciate any help in understanding why the two generate different hash results, or whether it is something I overlooked (I am relatively new to Python...). Thank you!
[post mortem: the problem was ultimately the difference between Python and PHP's base64 encoding varieties. MD5 works the same between the two scripting platforms (at least using .hexdigest() in Python).]

I would rather assume that the base64 implementations differ.
EDIT
PHP:
php -r 'var_dump(base64_encode(str_repeat("x", 10)));'
string(16) "eHh4eHh4eHh4eA=="
Python (Note the trailing newline):
>>> ("x" * 10).encode('base64')
'eHh4eHh4eHh4eA==\n'

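PHP and Python use different base64 flavors here:
Python 2's 'base64' codec (str.encode('base64'), i.e. base64.encodestring) produces MIME output (RFC 2045, see page 24), with a newline after every 76 output characters and at the end.
PHP's base64_encode, like Python's base64.b64encode (RFC 3548), returns a single line with no newlines.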
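A small Python 3 sketch of the difference, assuming base64.encodebytes as the stand-in for Python 2's str.encode('base64') (encodebytes is the Python 3 name of base64.encodestring). The MIME-style encoder breaks its output into 76-character lines and always ends with a newline, while b64encode returns one unbroken line, matching the PHP output shown earlier in the thread.

```python
import base64

data = bytes(range(100))  # 100 arbitrary bytes standing in for file contents

mime_style = base64.encodebytes(data)   # MIME-style: 76-char lines, trailing "\n"
single_line = base64.b64encode(data)    # one line, no newlines at all

print(mime_style.count(b"\n"))           # → 2
print(len(mime_style), len(single_line)) # → 138 136

# The only difference between the two is the inserted newlines.
assert mime_style.replace(b"\n", b"") == single_line
```

For a multi-megabyte flv file the MIME-style output contains tens of thousands of newlines, so the MD5 of the two encodings cannot match.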

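The problem seems to be that you're base64-encoding the file data before hashing it, which changes the bytes that get hashed. PHP's md5_file(), by contrast, hashes the raw file contents with no base64 step, so hashing the raw bytes on both sides avoids the mismatch.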
Give this a go:
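import hashlib

def md5_file(filename):
    # MD5 object
    md5_hash = hashlib.md5()
    # Open in binary mode; "with" closes the file automatically
    with open(filename, 'rb') as fp:
        # Iterating a binary file yields pieces split at b'\n';
        # together they are exactly the file's raw bytes
        for piece in fp:
            md5_hash.update(piece)
    # Return the hex digest (comparable to PHP's md5_file())
    return md5_hash.hexdigest()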
Then, in PHP, use md5_file() and check whether the two digests match.
Python code taken from: http://www.php2python.com/wiki/function.md5-file/

Python's .encode('base64') appends a newline '\n' to its output (and, for longer input, inserts one after every 76 output characters), so the inputs to the two md5 functions are different. An issue in the Python bug tracker explains this in detail; here is the gist of it:
>>> import base64
>>> s='I am a string'
>>> s.encode('base64')
'SSBhbSBhIHN0cmluZw==\n'
>>> base64.b64encode(s)
'SSBhbSBhIHN0cmluZw=='
>>> s.encode('base64')== base64.b64encode(s)+'\n'
True
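A Python 3 sketch of two ways to get matching checksums. The function names php_compatible_md5 and raw_md5 are hypothetical. The first keeps the base64 step but uses b64encode, which emits no newlines and so should agree with PHP's md5(base64_encode($contents)) when both sides read the same bytes. The second drops base64 entirely and should agree with PHP's md5_file().

```python
import base64
import hashlib

def php_compatible_md5(filepath):
    """Hypothetical helper: MD5 of the base64-encoded file contents,
    intended to mirror PHP's md5(base64_encode($contents))."""
    with open(filepath, "rb") as fh:
        contents = fh.read()
    # b64encode produces a single line with no "\n", like PHP's base64_encode.
    return hashlib.md5(base64.b64encode(contents)).hexdigest()

def raw_md5(filepath, chunk_size=65536):
    """Hypothetical helper: MD5 of the raw bytes, comparable to md5_file()."""
    md5 = hashlib.md5()
    with open(filepath, "rb") as fh:
        # Read fixed-size chunks so large files never sit fully in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()
```

Hashing the raw bytes is the simpler design: it skips an encoding step that only adds a place for the two platforms to disagree.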

Related

Unexpected output with zlib_encode()

I'm trying to encode a chunk of binary data with PHP in the same way zlib's compress2() function does it. However, using zlib_encode(), I get the wrong encoded output. I know this because I have a C program that does it (correctly). When I compare the output (using a hex editor) of the C program against that of the PHP script below, I notice it doesn't match at all.
My question, I guess, is: does this really compress the same way zlib's compress2() function does?
<?php
$filename = 'C:\data.bin';
$in = fopen($filename, 'rb');
$data = fread($in, filesize($filename));
fclose($in);
$data_dec = zlib_decode($data);
$data_enc = zlib_encode($data_dec, ZLIB_ENCODING_DEFLATE, 9);
?>
The compression level is correct, so it should match the C program's encoded output. Is there perhaps a bug somewhere?
Yes: zlib_encode() with ZLIB_ENCODING_DEFLATE produces the zlib format that uncompress() reads, and zlib_decode() can read what compress2() produces.
The way to check is not to compare compressed output. Check by decompressing with uncompress() and zlib_decode(). There is no reason to expect that the compressed output will be the same, and it does not need to be. All that matters is that it can be losslessly decompressed on the other end.
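The check described above can be sketched with Python's zlib module in place of PHP (an illustration only; zlib.compress produces the same zlib (RFC 1950) format as compress2()). Two compressors at the same level but with different internal settings may emit different bytes, and that is fine as long as each output decompresses back to the original.

```python
import zlib

original = b"example payload " * 200  # stand-in for the contents of data.bin

# Level 9 with default settings, comparable to compress2(..., 9).
a = zlib.compress(original, 9)

# Same level, different memLevel and strategy: still valid zlib data,
# but the compressed bytes are not guaranteed to equal `a`.
compressor = zlib.compressobj(9, zlib.DEFLATED, 15, 1, zlib.Z_FILTERED)
b = compressor.compress(original) + compressor.flush()

# The meaningful test: both round-trip losslessly.
assert zlib.decompress(a) == original
assert zlib.decompress(b) == original
```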

SHA-1 base32 of a file report back to a string

I'm having trouble figuring out how to write a program that generates a Base32-encoded SHA-1 value of a file. I know it can't be too difficult, since generating a standard SHA-1 of a file is fairly easy:
$file1 = sha1_file('main.jpg');
Any help would be appreciated.
If you want to encode the SHA-1 value as Base32, you're going to have to either write that yourself or find a library. PHP does not have it built in, as it does with base64_encode.
A while ago, I needed a base32_encode function, so I wrote one. I don't know how efficient it is, and I'm sure better ones exist out there, but it does work. It's located here: https://github.com/NTICompass/PHP-Base32
Using that you can do:
<?php
include 'Base32.php';
$base32 = new Base32;
$file1 = sha1_file('main.jpg');
echo $base32->base32_encode($file1);
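If you would rather write the encoder yourself, here is a minimal Python sketch of the idea behind Base32 and its base-2^n relatives: buffer the input bits and emit one alphabet symbol per n bits, padding per RFC 4648. The name base2n_encode is hypothetical, and the sketch favors clarity over speed.

```python
import math

BASE32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"  # RFC 4648 Base32

def base2n_encode(data, bits, alphabet, pad="="):
    """Encode bytes with 2**bits symbols, most significant bits first."""
    if len(alphabet) != 2 ** bits:
        raise ValueError("alphabet must contain exactly 2**bits symbols")
    mask = (1 << bits) - 1
    out = []
    buffer = 0  # input bits read but not yet emitted
    nbits = 0   # how many such bits there are
    for byte in data:
        buffer = (buffer << 8) | byte
        nbits += 8
        while nbits >= bits:
            nbits -= bits
            out.append(alphabet[(buffer >> nbits) & mask])
        buffer &= (1 << nbits) - 1  # drop bits that were already emitted
    if nbits:
        # Left-align the leftover bits and fill the rest with zero bits.
        out.append(alphabet[(buffer << (bits - nbits)) & mask])
    if pad:
        # Output length must be a multiple of lcm(8, bits) / bits symbols:
        # 8 for Base32, 4 for Base64, 2 for Base16 (which never pads).
        group = 8 // math.gcd(8, bits)
        while len(out) % group:
            out.append(pad)
    return "".join(out)

print(base2n_encode(b"foobar", 5, BASE32_ALPHABET))  # → MZXW6YTBOI======
```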
I have a PHP class, Base2n, which can encode Base32 per RFC 4648. It's actually more flexible than that, allowing you to parametrically define many standard and non-standard encoding schemes with a base of 2^n.
https://github.com/ademarre/binary-to-text-php
Base32 is demonstrated in the first example of the README file.
Here's what your "main.jpg" example would look like:
// include, require, or autoload Base2n
$file1 = sha1_file('main.jpg', TRUE);
$base32 = new Base2n(5, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567', FALSE, TRUE, TRUE);
$base32FileHash = $base32->encode($file1);
Notice that I also set $raw_output=TRUE in the call to sha1_file(). I assume this is what you want because otherwise the Base32 output would be done on the hexadecimal representation of the SHA-1 hash digest, not the raw 160-bit digest itself.
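The raw-versus-hex distinction is easy to see in a Python sketch using the standard hashlib and base64 modules (a stand-in for the PHP code, not a translation of Base2n). The 20-byte raw SHA-1 digest Base32-encodes to exactly 32 characters with no padding, while the 40-character hex string encodes to 64. The name sha1_base32 is hypothetical.

```python
import base64
import hashlib

def sha1_base32(filepath, raw=True):
    """Hypothetical helper: RFC 4648 Base32 of a file's SHA-1.
    raw=True encodes the 20-byte digest, raw=False the 40-char hex text."""
    sha1 = hashlib.sha1()
    with open(filepath, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            sha1.update(chunk)
    # raw digest, like sha1_file(..., TRUE), vs hex text, like sha1_file(...)
    value = sha1.digest() if raw else sha1.hexdigest().encode("ascii")
    return base64.b32encode(value).decode("ascii")
```

Usage: sha1_base32("main.jpg") returns the 32-character form.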

PHP write binary response

In PHP, is there a way to write binary data to the response stream,
like the equivalent of this (C#, ASP.NET):
System.IO.BinaryWriter Binary = new System.IO.BinaryWriter(Response.OutputStream);
Binary.Write((System.Int32)1);//01000000
Binary.Write((System.Int32)1020);//FC030000
Binary.Close();
I would then like to be able read the response in a c# application, like
System.Net.HttpWebRequest Request = (System.Net.HttpWebRequest)System.Net.WebRequest.Create("URI");
System.IO.BinaryReader Binary = new System.IO.BinaryReader(Request.GetResponse().GetResponseStream());
System.Int32 i = Binary.ReadInt32();//1
i = Binary.ReadInt32();//1020
Binary.Close();
In PHP, strings and byte arrays are one and the same. Use pack to create a byte array (string) that you can then write. Once I realized that, life got easier.
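// "V" = unsigned 32-bit, little-endian: bytes 01000000 FC030000
$my_byte_array = pack("VV", 1, 1020);
$fp = fopen("somefile.txt", "wb"); // "b" = binary mode
fwrite($fp, $my_byte_array);
fclose($fp);
// or just echo it to the response / stdout
echo $my_byte_array;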
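For comparison, a Python sketch using the standard struct module that builds the same eight bytes the C# BinaryWriter writes (two little-endian 32-bit integers) and reads them back the way BinaryReader.ReadInt32 would. In PHP's pack, the matching format code for an unsigned little-endian 32-bit value is "V".

```python
import struct

# "<" = little-endian, "i" = 32-bit signed int (like System.Int32)
payload = struct.pack("<ii", 1, 1020)
print(payload.hex())  # → 01000000fc030000

first, second = struct.unpack("<ii", payload)
print(first, second)  # → 1 1020
```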
Usually, I use chr():
echo chr(255); // Returns one byte, value 0xFF
http://php.net/manual/en/function.chr.php
This is the same answer I posted to a similar question.
Assuming that $binary is a previously constructed array of bytes (monochrome bitmap pixels, in my case) that you want written to disk in this exact order, the code below worked for me on an AMD 1055T running Ubuntu Server 10.04 LTS.
I iterated over every kind of answer I could find on the Net, checking the output (I used either shed or vi, like in this answer) to confirm the results.
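<?php
// Assumes $base (file name without extension), $binary (array of byte
// values) and $stop (number of entries to write) are already defined.
$fp = fopen($base.".bin", "wb");
for($idx=0; $idx < $stop; $idx=$idx+2 ){
    if( array_key_exists($idx,$binary) && array_key_exists($idx+1,$binary) ) {
        // one big-endian 16-bit word: $binary[$idx] first, then $binary[$idx+1]
        fwrite($fp,pack( "n", $binary[$idx]<<8 | $binary[$idx+1]));
    } else {
        echo "index $idx not found in array \$binary[], wtf?\n";
    }
}
fclose($fp);
echo "Filename $base.bin had ".filesize($base.".bin")." bytes written\n";
?>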
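What the loop writes is simply each pair of bytes in array order: pack("n", $hi << 8 | $lo) emits a big-endian 16-bit value, so $hi comes first and $lo second. A Python sketch of the same idea with struct, where ">H" is big-endian unsigned 16-bit like PHP's "n"; it assumes a list of byte values of even length, and the name pack_byte_pairs is hypothetical.

```python
import struct

def pack_byte_pairs(values):
    """Hypothetical helper mirroring the PHP loop: write values[i] and
    values[i + 1] as one big-endian 16-bit word for each even i."""
    if len(values) % 2:
        raise ValueError("expected an even number of byte values")
    out = bytearray()
    for i in range(0, len(values), 2):
        out += struct.pack(">H", values[i] << 8 | values[i + 1])
    return bytes(out)

pixels = [0xFF, 0x00, 0x81, 0x7E]
assert pack_byte_pairs(pixels) == bytes(pixels)  # same bytes, same order
```

Because the result equals bytes(values), packing pairs is only needed when the data really is 16-bit; for plain bytes, writing them directly gives the same file.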
You probably want the pack function. It gives you a decent amount of control over how your values are structured, e.g., 16 or 32 bits at a time, little-endian versus big-endian, etc.

Python's cPickle deserialization from PHP?

I have to deserialize a dictionary in PHP that was serialized using cPickle in Python.
In this specific case I could probably just use a regexp to extract the information I need, but is there a better way? Are there any PHP extensions that would let me deserialize the whole dictionary more natively?
Apparently it is serialized in Python like this:
import cPickle as pickle
data = { 'user_id' : 5 }
pickled = pickle.dumps(data)
print pickled
The serialized contents can't easily be pasted here, because they contain binary data.
If you want to share data objects between programs written in different languages, it might be easier to serialize/deserialize using something like JSON instead. Most major programming languages have a JSON library.
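A minimal sketch of the Python side using the standard json module. On the PHP side, json_decode($encoded, true) would turn the string into an associative array.

```python
import json

data = {"user_id": 5}
encoded = json.dumps(data)
print(encoded)  # → {"user_id": 5}

# Round trip: the text decodes back to an equal dictionary.
assert json.loads(encoded) == data
```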
Can you do a system call? You could use a python script like this to convert the pickle data into json:
# pickle2json.py (Python 2)
import sys, optparse, cPickle
try:
    import json
except ImportError:
    import simplejson as json
# Set up the arguments this script accepts on the command line
parser = optparse.OptionParser()
parser.add_option('-p', '--pickled_data_path', dest="pickled_data_path", type="string", help="Path to the file containing pickled data.")
parser.add_option('-j', '--json_data_path', dest="json_data_path", type="string", help="Path to where the json data should be saved.")
opts, args = parser.parse_args()
# Load the pickled data from a file (binary mode) or from standard input
if opts.pickled_data_path:
    with open(opts.pickled_data_path, 'rb') as f:
        unpickled_data = cPickle.loads(f.read())
else:
    unpickled_data = cPickle.loads(sys.stdin.read())
# Write the JSON version of the data to another file or to standard output
if opts.json_data_path:
    with open(opts.json_data_path, 'w') as f:
        f.write(json.dumps(unpickled_data))
else:
    print json.dumps(unpickled_data)
This way, if you're getting the data from a file, you could do something like this:
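<?php
exec("python pickle2json.py -p pickled_data.txt", $output);
// exec() fills $output with one array entry per line of stdout
$json_data = json_decode(implode("\n", $output), true);
?>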
or, if you want to save it to a file:
<?php
system("python pickle2json.py -p pickled_data.txt -j p_to_j.json");
?>
All the code above probably isn't perfect (I'm not a PHP developer), but would something like this work for you?
I know this is ancient, but I've just needed to do this for a Django 1.3 app (circa 2012) and found this:
https://github.com/terryf/Phpickle
So just in case, one day, someone else needs the same solution.
If the pickle is being created by the code that you showed, then it won't contain binary data, unless you are calling newlines "binary data". See the Python docs. The following code was run with Python 2.6.
>>> import cPickle
>>> data = {'user_id': 5}
>>> for protocol in (0, 1, 2): # protocol 0 is the default
... print protocol, repr(cPickle.dumps(data, protocol))
...
0 "(dp1\nS'user_id'\np2\nI5\ns."
1 '}q\x01U\x07user_idq\x02K\x05s.'
2 '\x80\x02}q\x01U\x07user_idq\x02K\x05s.'
>>>
Which of the above looks most like what you are seeing? Can you post the pickled file contents as displayed by a hex editor/dumper or whatever is the PHP equivalent of Python's repr()? How many items in a typical dictionary? What data types other than "integer" and "string of 8-bit bytes" (what encoding?)?
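On Python 3, where cPickle is folded into the pickle module, a rough way to classify a pickle is sketched below: protocol 2 and later begin with the PROTO opcode byte 0x80 followed by the protocol number, while protocol 0 output for simple data like this is printable text with newlines. The exact bytes differ from the Python 2.6 output above, so the sketch checks properties rather than literal strings; classify_pickle is a hypothetical name.

```python
import pickle

def classify_pickle(blob):
    """Hypothetical helper: rough guess at what kind of pickle `blob` is."""
    if len(blob) >= 2 and blob[0] == 0x80:
        # PROTO opcode: protocol 2+ pickles start with 0x80, then the version.
        return "binary, protocol %d" % blob[1]
    if all(32 <= b < 127 or b == 10 for b in blob):
        return "text (looks like protocol 0)"
    return "binary (probably protocol 1)"

data = {"user_id": 5}
for protocol in (0, 1, 2):
    blob = pickle.dumps(data, protocol)
    print(protocol, classify_pickle(blob), repr(blob))
```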

In PHP, how to find already-encrypted files in a specific folder

I am using PGP (GNU Privacy Guard, i.e. GnuPG) to encrypt files.
While encrypting, I removed the '.pgp' extension from the encrypted files.
Now I need some way to find out which files in a specific folder are already encrypted.
Note: my goal is to never encrypt any file twice, so before encrypting a file I want to check whether it is already encrypted.
In PHP, can we find out which files are already encrypted?
ASCII-armored PGP messages all start with "-----BEGIN PGP MESSAGE-----".
So you can do something like this:
$content = file_get_contents($filename);
$encrypted = strpos($content, '-----BEGIN PGP MESSAGE-----') === 0;
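A Python sketch that reads only the start of the file instead of the whole thing, checks for the armor header, and adds a rough heuristic for binary OpenPGP messages. The binary check is an assumption based on the RFC 4880 packet header layout: an encrypted message normally begins with a Public-Key (tag 1) or Symmetric-Key (tag 3) Encrypted Session Key packet. It is a heuristic, not a parser, and can give false positives on arbitrary binary files; the function names are hypothetical.

```python
ARMOR_HEADER = b"-----BEGIN PGP MESSAGE-----"

def first_packet_tag(first_byte):
    """Return the OpenPGP packet tag in a packet header byte, or None."""
    if not first_byte & 0x80:          # bit 7 is always set in a packet header
        return None
    if first_byte & 0x40:              # new-format header: tag in bits 5-0
        return first_byte & 0x3F
    return (first_byte >> 2) & 0x0F    # old-format header: tag in bits 5-2

def looks_pgp_encrypted(filepath):
    """Hypothetical helper: True if the file looks like an encrypted OpenPGP message."""
    with open(filepath, "rb") as fh:
        head = fh.read(len(ARMOR_HEADER))
    if head.startswith(ARMOR_HEADER):
        return True
    # 1 = public-key encrypted session key, 3 = symmetric-key encrypted session key
    return bool(head) and first_packet_tag(head[0]) in (1, 3)
```

For anything the heuristic flags, the gpg --list-packets check from another answer in this thread is a more reliable confirmation.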
I really don't know much about how it works, or how you could look at a file's contents to tell whether it is properly encrypted, but could you try decrypting the files? If you know you're only working with plain text files, you could examine the first 500 bytes of the decrypted data: if there are strange characters (outside a-z, A-Z, 0-9, punctuation, etc.), that could be a clue that the file wasn't encrypted.
This really is a half-arsed answer, I know, but it was a bit long to fit into a comment.
You can't unless you understand the algorithm used in the encryption. Once you understand it, you can apply that to check whether a file is already encrypted.
Also check whether PGP already provides a function for testing whether something is encrypted; encryption tools often include one.
Thanks
There are two possible formats for OpenPGP data, binary and ascii armored.
ASCII-armored files are easy to recognize by looking for "-----BEGIN PGP MESSAGE-----", which can also be done with the Unix file command:
$ file encrypted
encrypted: PGP message
ZZ_Coder's answer is totally fine if you're only dealing with ASCII-armored encrypted files.
If file reports something else, the data is either not an OpenPGP message or is in binary format. Binary data isn't as easy to recognize (at least I don't know which magic packets you could look for), but you can easily use the gpg command to test the file:
$ gpg --list-only --list-packets encrypted
:pubkey enc packet: version 3, algo 1, keyid DEAFBEEFDEADBEEF
data: [2048 bits]
:encrypted data packet:
length: 73
mdc_method: 2
If it isn't encrypted, response will look like this:
$ gpg --list-only --list-packets something_else
gpg: no valid OpenPGP data found.
In PHP, you could use this code to check if a file is OpenPGP-encrypted:
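$file = 'my_file.txt';
$output = shell_exec('gpg --list-only --list-packets ' . escapeshellarg($file) . ' 2>&1');
// strpos() can return 0, so compare against false explicitly
if (is_string($output) && strpos($output, 'encrypted data packet') !== false)
    echo "encrypted file";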
