I am trying to build an application that needs to compare the MD5 hash of any file.
Due to specific issues, before the upload, the MD5 must be generated client side, and after the upload the application needs to check it server side.
My first approach was to use the JavaScript File API and the FileReader readAs* methods on the client side, then hash the content with the MD5 implementation found here: http://pajhome.org.uk/crypt/md5/
Server side, I would use PHP's fopen() and md5() functions.
This approach works fine with simple text files. But when a binary file is used (a jpg or pdf, for example), the MD5 generated on the client side differs from the one generated on the server. Using the md5sum command-line tool I figured out that the server MD5 is correct and the problem occurs on the client side.
I've tried other MD5 libraries with the same results. I suspect that the FileReader readAs* methods are loading the file content slightly differently (I have tried all of the readAs* variants: text, binary and so on), but I can't figure out what the difference is.
I'm missing something but don't know what. Maybe I need to decode the content somehow before generating the MD5.
Any tips?
Edit 1:
I followed the idea given by optima1: I took each character and printed its Unicode number in both JavaScript and PHP. In every case I could see only one difference, at the end (I used vimdiff).
PHP: 54 51 10 37 37 69 79 70 0
Javascript: 54 51 10 37 37 69 79 70
Maybe this extra zero in PHP is some kind of "string end" marker. In both cases the binary strings have the same length. Adding a String.fromCharCode(0) to the end of the JS content does not solve the problem. I will keep investigating.
If I can't find a solution I will try to build a giant string by concatenating those char codes and use it to compute the MD5. It is a crap solution but will serve for now, and I will just need to add a zero to the end of the JS string...
Edit 2:
Thank God! This implementation works like a charm: http://www.myersdaily.org/joseph/javascript/md5.js
If you need to generate an MD5 hash from binary files, go for it.
Thanks in advance!
http://membres-liglab.imag.fr/donsez/cours/exemplescourstechnoweb/js_securehash/
The JavaScript MD5 and the PHP MD5 give the same result, but on the JavaScript side you need some helper functions. You can get those functions from the URL above.
I would suggest doing a quick sanity check: have your client-side code report the first and last bytes of the binary data. Repeat in your PHP code. Compare first and last bytes from both methods to ensure that they are in fact reading the same data (which should result in the same MD5 hash.)
Then I would suggest posting code here so that we can review.
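The server-side half of that sanity check could look like the sketch below. It is only an illustration: fileFingerprint() is a hypothetical helper name, and it assumes the file is small enough to load into memory. It reports the size, the first and last bytes as decimal values (the same form as the char codes printed in the question), and the MD5, so the numbers can be compared with what the client reports.

```php
<?php
// Hypothetical helper (not part of any library): summarises a file so the
// same values can be compared with the client-side output.
function fileFingerprint(string $path, int $n = 8): array
{
    $bytes = file_get_contents($path); // binary-safe read of the whole file
    if ($bytes === false) {
        throw new RuntimeException("Cannot read $path");
    }

    return [
        'size'  => strlen($bytes),                                    // length in bytes
        'first' => array_values(unpack('C*', substr($bytes, 0, $n))), // first $n byte values
        'last'  => array_values(unpack('C*', substr($bytes, -$n))),   // last $n byte values
        'md5'   => md5($bytes),                                       // same result as md5_file($path)
    ];
}

// Demo on a temporary file that contains bytes above 127, which is where
// text-oriented reads on the client usually go wrong.
$demoPath = tempnam(sys_get_temp_dir(), 'fp');
file_put_contents($demoPath, "\x89PNG\x0D\x0A\x1A\x0A\x00\xFF%%EOF");
print_r(fileFingerprint($demoPath, 4));
unlink($demoPath);
```

If the sizes or the byte values already differ between client and server, the problem is in how the file is read, not in the MD5 code.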
Related
Is there anything out there for PHP that can hash/encrypt a long string into a 128 bit string that can also be reversed?
I am trying to import hundreds of millions of strings into a MySQL DB, and the average string is over 100 characters. MD5 gets this down to 32 characters, which significantly reduces storage; however, I cannot reverse it again in my application.
Does PHP have anything available that can handle this?
If I understand your question correctly, it seems to me you mix up hashing and compression quite a lot.
Most hash functions are not easily reversible, because that is not their purpose. There are infinitely many "Strings/ByteStreams/Numbers/..." that correspond to any given result of a hash function. As you may know, even images that are a few gigabytes in size give you an md5sum of 32 characters.
You cannot just magically map any String onto a shorter String of fixed length and then magically turn it back into its original String.
It may well be that some hash functions could be reversed efficiently if you know that the original input has to have this or that property (in your case, maybe a character length of 100-120), but I doubt it.
Or do I totally misunderstand and you just mean ASCII-Strings with the expression "128 bit string"?
No, you can't do this: Pigeonhole principle
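A small sketch of the difference between the two ideas, assuming PHP's zlib extension is available for gzcompress() and gzuncompress(). The sample strings are made up for illustration. The hash has the same width for every input, so it cannot carry enough information to rebuild the input; compression is reversible, but its output size depends on the data.

```php
<?php
// Two inputs of very different size (sample data, invented for this demo).
$short = 'abc';
$long  = str_repeat('The quick brown fox jumps over the lazy dog. ', 1000);

// MD5 always yields 128 bits: 32 hex characters, or 16 raw bytes.
echo strlen(md5($short)), "\n";       // 32
echo strlen(md5($long)), "\n";        // 32
echo strlen(md5($long, true)), "\n";  // 16 (raw binary form)

// Compression is lossless, so the original can be restored exactly.
$packed   = gzcompress($long, 9);
$restored = gzuncompress($packed);
var_dump($restored === $long);        // bool(true)
echo strlen($long), ' -> ', strlen($packed), " bytes\n";
```

The repetitive sample compresses very well. Short strings of around 100 characters with little repetition usually shrink far less, and can even grow slightly, so this is not a drop-in replacement for a 32-character column.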
I'm currently working with data I'd like to temporarily store in my database in encrypted form. I'm not worried about the database getting hacked into; I just want to assure the people who entered the data that it is not reachable by anyone other than themselves (and me, of course).
The data is not meant to be stored permanently in the database since I'm exporting it to a third party application using their API, but since they have a rate limit I need to store the data in our database until the limit is over and I can upload it. (Assuming the rate limit occurs)
The process:
The request I receive from the form is in an array, so to begin with I serialize() the array to get a long string which I will unserialize() later.
Then I want to use a method that lets me convert the string into numbers and back again without losing information.
The reason I want to turn the data into numbers is because I use the HashIds library, which only encodes numbers. To my knowledge it's an extra layer of security I'm happy to add.
Read more on HashIds here: http://hashids.org/
What I have tried:
I tried converting the string into hex numbers, and then the hex numbers into decimals. Unfortunately the number was too large, and I haven't had any luck using biginteger with it.
base64_encode() is not going to turn the data into numbers by itself, but base-converting its output would. I couldn't figure out the base conversion in PHP though, since apparently it's rather odd.
Conclusion:
How can I convert the data I'm receiving from a form request into a short encoded string which can be converted back into the data without too much hassle? I don't quite know all the options PHP offers yet.
UPDATE:
To conclude this thread, I ended up using OpenSSL to encrypt my serialized array. The only problem I ran into was that if the request contained a file, I wouldn't be able to serialize it and save the object to the database. I do still need a way around this: since the third party application expects the file to be a multipart/form-data object, I can't just save the file path to the database and upload that. But I guess I will have to figure that one out later.
That link http://hashids.org/ provides a pretty clear example. Let's assume that your integer is 15.
$hashids = new Hashids\Hashids('some random string for a salt. Make sure you use the same salt if you want to be able to decode');
$encoded = $hashids->encode(15);
print_r(['hashedId' => $encoded]);
$decoded = $hashids->decode($encoded); // decode() returns an array of numbers
print_r(['decoded' => $decoded]);
So $decoded should be an array containing 15, i.e. $decoded[0] should equal 15.
Update
Sorry - the hashids bit of your question threw me and as such, I misunderstood what you were asking. I will update my answer:
You should really be using https://secure.php.net/openssl_encrypt and https://secure.php.net/manual/en/function.openssl-decrypt.php
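As a rough sketch of how those two functions could wrap the serialized array: the names encryptArray() and decryptArray() are hypothetical, the cipher choice (AES-256-GCM, which needs PHP 7.1 or later) is an assumption and not something the question specified, and key storage is left out entirely. The IV and authentication tag are stored next to the ciphertext so the blob can go into a single text column.

```php
<?php
// Hypothetical helpers, not part of any library. $key must be 32 random bytes.
function encryptArray(array $data, string $key): string
{
    $iv  = random_bytes(12); // fresh 96-bit IV for every message
    $tag = '';               // filled in by openssl_encrypt() for GCM
    $ct  = openssl_encrypt(serialize($data), 'aes-256-gcm', $key, OPENSSL_RAW_DATA, $iv, $tag);
    if ($ct === false) {
        throw new RuntimeException('Encryption failed');
    }
    // Layout: 12-byte IV, 16-byte tag, ciphertext; base64 for text storage.
    return base64_encode($iv . $tag . $ct);
}

function decryptArray(string $blob, string $key): array
{
    $raw = base64_decode($blob, true);
    if ($raw === false || strlen($raw) < 28) {
        throw new RuntimeException('Malformed blob');
    }
    $iv  = substr($raw, 0, 12);
    $tag = substr($raw, 12, 16);
    $ct  = substr($raw, 28);

    // Returns false if the key is wrong or the data was modified.
    $plain = openssl_decrypt($ct, 'aes-256-gcm', $key, OPENSSL_RAW_DATA, $iv, $tag);
    if ($plain === false) {
        throw new RuntimeException('Decryption failed');
    }
    // Refuse to instantiate objects while unserializing.
    return unserialize($plain, ['allowed_classes' => false]);
}

// Round trip with made-up form data.
$key  = random_bytes(32); // in practice, load this from configuration
$blob = encryptArray(['name' => 'Alice', 'age' => 30], $key);
var_dump(decryptArray($blob, $key) === ['name' => 'Alice', 'age' => 30]);
```

Unlike Hashids, this does not need the data to be numeric, so the string-to-number conversion step from the question is not required.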
I'm writing a command line application in PHP that accepts a path to a local input file as an argument. The input file will contain one of the following things:
JSON encoded associative array
A serialize()d version of the associative array
A base 64 encoded version of the serialize()d associative array
Base 64 encoded JSON encoded associative array
A plain old PHP associative array
Rubbish
In short, there are several dissimilar programs that I have no control over that will be writing to this file, in a uniform way that I can understand, once I actually figure out the format. Once I figure out how to ingest the data, I can just run with it.
What I'm considering is:
If the first byte of the file is { , try json_decode(), see if it fails.
If the first byte of the file is < or $, try include(), see if it fails.
if the first three bytes of the file match a:[0-9], try unserialize().
If not the first three, try base64_decode(), see if it fails. If not:
Check the first bytes of the decoded data, again.
If all of that fails, it's rubbish.
That just seems quite expensive for quite a simple task. Could I be doing it in a better way? If so, how?
There isn't much to optimize here. The magic bytes approach is already the way to go. But of course the actual deserialization functions can be avoided. It's feasible to use a verification regex for each instead (which despite the meme are often faster than having PHP actually unpack a nested array).
base64 is easy enough to probe for.
json can be checked with a regex. The one in "Fastest way to check if a string is JSON in PHP?" is the RFC version for securing it in JS. But it would be feasible to write a complete json (?R) match rule.
serialize is a bit more difficult without a proper unpack function. But with some heuristics you can already assert that it's a serialize blob.
php array scripts can be probed a bit faster with token_get_all. Or if the format and data is constrained enough, again with a regex.
The more important question here is, do you need reliability - or simplicity and speed?
For speed, you could use the file(1) utility and add "magic numbers" in /usr/share/file/magic. It should be faster than a pure PHP alternative.
You can try json_decode() and unserialize(), which return NULL and false respectively if they fail, then base64_decode() and run those two again on the result. It's not fast, but it's infinitely less error prone than hand parsing them...
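A minimal sketch of that try-everything approach. decodePayload() is a hypothetical helper name; it only accepts results that are arrays, as the question expects associative arrays, and it deliberately leaves out the plain PHP array case, because include() would execute whatever code is in the file.

```php
<?php
// Hypothetical helper: returns [format, data], or null for rubbish.
function decodePayload(string $raw, bool $allowBase64 = true): ?array
{
    $raw = trim($raw);

    // json_decode() returns null when the input is not valid JSON.
    $json = json_decode($raw, true);
    if (is_array($json)) {
        return ['json', $json];
    }

    // unserialize() returns false (and emits a notice or warning) on failure.
    $data = @unserialize($raw, ['allowed_classes' => false]);
    if (is_array($data)) {
        return ['serialized', $data];
    }

    // Strict mode rejects characters outside the base64 alphabet.
    if ($allowBase64) {
        $decoded = base64_decode($raw, true);
        if ($decoded !== false && $decoded !== '') {
            // Check the decoded bytes once more, without a second base64 pass.
            $inner = decodePayload($decoded, false);
            if ($inner !== null) {
                return ['base64+' . $inner[0], $inner[1]];
            }
        }
    }

    return null; // none of the known formats matched
}

// Usage with made-up inputs.
print_r(decodePayload('{"a":1}'));
print_r(decodePayload(base64_encode(serialize(['a' => 1]))));
var_dump(decodePayload('rubbish!!'));
```

Checking the first byte before calling each decoder, as proposed in the question, would only skip work that is cheap for small files anyway.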
The issue here is that if you have no idea which format it can be, you will need to develop a detection algorithm. Conventions should be set with a file extension (check the extension and, if that fails, tell whoever put the file there to use the correct one); otherwise you will need to check the content yourself. Most algorithms that detect what type a file actually is (exe, jpg etc.) use heuristics to determine its contents, because such files generally have some sort of signature that identifies them. So if you have no definite idea what the content will be, it's best to look for features that are specific to each format. This does sometimes mean reading more than a couple of bytes.
Can anyone please let me know how I can encrypt/decrypt a file instead of a string? I mean I need to encrypt the entire file; it may be an Excel sheet, a document or even a text file.
instead of string.
That rather implies that you already know how to encrypt the string and, since you're being specific about the algorithm, that you can create an appropriate representation for the other tools being used to operate on the data. But you haven't said what mode of operation you need to use; implementing this using CBC is trivial.
It's also not stated, but implied in your question, that the data is too large to load into a string (otherwise it's simply a case of encrypting the result of file_get_contents()).
There doesn't seem to be much in the way of documentation, but I would expect the modified key required for ECB is updated in the resource created by mcrypt_module_open() and modified by mcrypt_generic_init(). Then it's just a matter of feeding in parts of the file sized as a multiple of the block size (see mcrypt_get_block_size).
See http://www.php.net/manual/en/function.mcrypt-module-open.php
C.
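The mcrypt extension was removed from PHP core in 7.2, so the sketch below swaps in the OpenSSL extension to show the same idea of feeding the file through in parts sized as a multiple of the block size. Everything here is an assumption made for illustration: the function name cbcFile(), the choice of AES-256-CBC, and the file layout (16-byte IV followed by ciphertext). CBC state is carried across chunks by using the last ciphertext block as the next IV, and padding is applied only to the final chunk. It provides no integrity check.

```php
<?php
// Chunk size: must be a multiple of the 16-byte AES block size.
const CBC_CHUNK = 16 * 4096;

// Hypothetical helper: $encrypt = true encrypts $src into $dst, false decrypts.
// $key must be 32 bytes.
function cbcFile(string $src, string $dst, string $key, bool $encrypt): void
{
    $in  = fopen($src, 'rb');
    $out = fopen($dst, 'wb');
    if ($in === false || $out === false) {
        throw new RuntimeException('Cannot open input or output file');
    }

    if ($encrypt) {
        $iv = random_bytes(16);
        fwrite($out, $iv);             // store the IV at the start of the output
    } else {
        $iv = (string) fread($in, 16); // read the stored IV back
    }

    $chunk = (string) fread($in, CBC_CHUNK);
    do {
        // Read one chunk ahead so we know whether $chunk is the final one.
        $next   = (string) fread($in, CBC_CHUNK);
        $isLast = ($next === '');
        // Only the final chunk is padded (or has its padding removed).
        $flags  = OPENSSL_RAW_DATA | ($isLast ? 0 : OPENSSL_ZERO_PADDING);

        if ($encrypt) {
            $result = openssl_encrypt($chunk, 'aes-256-cbc', $key, $flags, $iv);
            $nextIv = substr((string) $result, -16); // last ciphertext block
        } else {
            $result = openssl_decrypt($chunk, 'aes-256-cbc', $key, $flags, $iv);
            $nextIv = substr($chunk, -16);           // last ciphertext block
        }
        if ($result === false) {
            throw new RuntimeException('OpenSSL operation failed');
        }

        fwrite($out, $result);
        $iv    = $nextIv;
        $chunk = $next;
    } while (!$isLast);

    fclose($in);
    fclose($out);
}
```

Calling cbcFile($plainPath, $encPath, $key, true) and then cbcFile($encPath, $outPath, $key, false) with the same key should reproduce the original file, with memory use bounded by the chunk size.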
I'm a little confused, can't you just read/write the string to a file using functions like file_get_contents and file_put_contents?
If you need an encryption-class there are some over at PHP classes. There is also a paid solution here: phpAES.
I guess it is better to create your own library for it and expose an API that just accepts a file path instead of the file's content. It can open and read the file and do the encryption / decryption.
You can use your own or a pre-existing algorithm to encrypt/decrypt. You can also have an argument in that API that accepts the file path where the decrypted data should be stored, or replace the same file, or whatever.
Is there a maximum length for the URI in the file_get_contents() function in PHP?
I suppose there is a maximum length, but you'll be hard pressed to find it. If you do hit the maximum, you're doing something wrong. :)
I haven't been able to find a number for PHP specifically, but MS IIS, Apache and the Perl HTTP::Daemon seem to have limits between 4,000 and 16,384 bytes, PHP will probably be somewhere around there as well.
What you need to consider is not really how much your side can handle, but also how much the other server you're querying can handle (which is presumably what you're doing). As such, any URL longer than ~1000 characters is usually already way too long and never really encountered in the real world.
As others have stated, it is most likely limited by the HTTP protocol.
You can view this answer for more info on that : What is the maximum length of an url?
In HTTP there's no length limit for a URI, and the manual has no note about this for file_get_contents(). So I think you don't need to worry about this problem.
BTW, the length of a URI is limited by some browsers and web servers. For example, in IE the length should be less than 2083, and in FF it's 65,536. I tried to test this and found that only lengths of no more than 8182 worked when I visited my Apache on Ubuntu, because of my Apache's limit.