The task is to process an image: hash it, do some heavy image work, and store the hash and the results in a database.
On the next request with the same image, I want to compare its hash with the hashes already in the database and load the cached results to avoid repeating the heavy work.
So the questions are: what should I hash, and with what?
I see good PHP implementations of pHash, but it seems to be designed for similarity checks, and we need exact matching.
Is pHash also fine for exact matching?
Thank you!
PHP provides a built-in function for this, which is probably the easiest solution:
$hash = hash_file("sha1", '/path/to/image');
You can use this check for exact matches. There is a small chance of collisions, but you can help mitigate that by also using the file path or database ID in your comparison.
The answers in this similar question provide more options.
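A minimal sketch of the exact-match lookup described above. The function name `cachedImageWork`, the in-memory `$cache` array (a stand-in for the database table) and the `$heavyWork` callback are hypothetical, and `sha256` is used here instead of the `sha1` from the answer; `hash_file` accepts either.

```php
<?php
// Hypothetical helper: $cache maps file hash => stored result and stands in for the database.
function cachedImageWork(string $path, array &$cache, callable $heavyWork)
{
    // Hash the raw file bytes; byte-identical files always produce the same digest.
    $hash = hash_file('sha256', $path);
    if ($hash === false) {
        throw new RuntimeException("Cannot read $path");
    }
    if (!array_key_exists($hash, $cache)) {
        $cache[$hash] = $heavyWork($path); // only runs on a cache miss
    }
    return $cache[$hash];
}
```

Note that this only matches byte-identical files; the same picture re-encoded or resized will hash differently.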
Related
Ok, now while I understand the chances of reproducing a verification code made up of some 50-100 random characters are slim to none, do any of you guys do anything to hedge against the off chance that two users are provided with the same random verification code? I.e. would you store these codes (tokens, whatever you want to call them) in a DB? Just wondering, logically, not even necessarily programmatically, how you guys go about this or, even in the most secure systems, if it is even necessary. Thanks.
You have several options, depending on what php version you're using.
For PHP >= 7.0 you have random_bytes, which returns a string of random bytes; use bin2hex to turn it into readable characters.
For versions older than 7.0 you can use openssl_random_pseudo_bytes. Notice the "pseudo" part. It's not truly random, but for your purposes it should be considered "random enough".
You can also read directly from /dev/random or /dev/urandom on Linux.
Read here about the differences between the two.
Storing them in the database is perfectly fine.
Do note that functions like rand aren't truly random. See here.
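For the random_bytes route, something along these lines should do. The function name `makeVerificationToken` and the 32-byte default are assumptions made for illustration, not anything the question requires.

```php
<?php
// Hypothetical helper: returns a hex token that is 2 * $bytes characters long.
function makeVerificationToken(int $bytes = 32): string
{
    // random_bytes() draws from the operating system's CSPRNG (PHP 7.0+).
    return bin2hex(random_bytes($bytes));
}
```

If you store the token in a column with a unique index, the database will reject the (extremely unlikely) duplicate for you.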
As for the question itself:
You don't really need truly random tokens for email verification. Normally email verifications are associated with, well, an email and usually have an expiration period (1, 2, 3 hours, whatever you want it to be), so you don't need them to be perfectly random, just random enough.
For your purposes even str_shuffle would be good enough.
Don't overcomplicate things when you can avoid it.
I'm trying to create a dynamic avatar for my website's users. Something like stackoverflow. I have a PHP script which generates an image based on a string:
path/to/avatar.php?hash=string
I want to use the MD5 of each user's email as the name of their avatar (and as the string the PHP script generates the image from):
$email = $_GET['email'];
$hash = md5($email);
copy("path/to/avatar.php?hash=$hash","path/img/$hash.jpg");
Now I want to be sure: can I use the MD5 of their emails as the avatar's name? I mean, aren't there two different strings that have an identical MD5 output? In other words, will the output for two different strings always be unique?
I don't know whether my question is clear. All I want to know is: is there any possibility of two different emails having the same MD5?
As the goal here is to use a hash for its uniqueness rather than its cryptographic strength, MD5 is acceptable, although I still wouldn't recommend it.
If you do settle on using MD5, use a globally unique ID that you control rather than a user-supplied email address, along with a salt.
i.e.
$salt = 'random string';
$hash = md5($salt . $userId);
However:
There is still a small chance of a collision (starting at 1 in 2^128 and approaching 1 in 2^64 relatively quickly due to the Birthday Paradox). Remember this is a chance: hash_n and hash_(n+1) could collide.
There is not a reasonable way to determine the userId from the hash (I don't consider indexing 128-bit hashes so you can query them to be reasonable).
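To get a feel for the birthday effect mentioned above, here is a rough sketch using the standard approximation p ~= 1 - exp(-n^2 / (2 * 2^bits)). The function name `collisionProbability` is made up for this example, and the formula assumes hash outputs are uniformly distributed.

```php
<?php
// Birthday-bound approximation for n random values in a space of 2^bits.
function collisionProbability(float $n, int $bits): float
{
    $space = 2 ** $bits; // number of possible hash values (becomes a float for large $bits)
    // expm1() keeps precision when the probability is tiny.
    return -expm1(-($n * $n) / (2 * $space));
}
```

With a 128-bit hash such as MD5, a billion IDs give a probability of roughly 1.5e-21, while 2^64 IDs push it to about 39%.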
You use StackOverflow as an example.
User profiles on this site look like: http://stackoverflow.com/users/2805376/shafizadeh
So what is wrong with having avatar urls like http://your_site/users/2805376/avatar.png ? The back end storage could simply be /path/to/images/002/805/376.png
This guarantees a unique name, and provides you with a very simple and easy to work with way of storing, locating, and reversing the id assigned to images back to the user.
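A possible sketch of that ID-to-path mapping. The helper names `avatarPath` and `userIdFromAvatarPath`, the 9-digit padding and the `.png` extension are assumptions taken from the example path above; IDs above 999,999,999 would need a wider layout.

```php
<?php
// Hypothetical helper: 2805376 -> /path/to/images/002/805/376.png
function avatarPath(int $userId, string $baseDir = '/path/to/images'): string
{
    // Zero-pad to 9 digits, then split into directory-sized groups of 3.
    $padded = str_pad((string) $userId, 9, '0', STR_PAD_LEFT);
    return $baseDir . '/' . implode('/', str_split($padded, 3)) . '.png';
}

// Hypothetical inverse: recover the user ID from a path built by avatarPath().
function userIdFromAvatarPath(string $path, string $baseDir = '/path/to/images'): int
{
    // Strip the base directory and the ".png" extension, then drop the separators.
    $relative = substr($path, strlen($baseDir) + 1, -4);
    return (int) str_replace('/', '', $relative);
}
```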
This is actually what Gravatar is doing (this was the standard way to get an avatar in Stackoverflow). Have a look at Gravatar's implementation.
The chance of a collision is negligible in practice: it is difficult enough to intentionally forge two (binary) strings that result in the same MD5, and emails are restricted in size and characters.
One problem of this approach is what Fred-ii- mentioned, because brute-forcing of MD5 is so fast (100 Giga MD5 per second), somebody could try to find the original email address, whose MD5 is now visible. For short emails this would work in reasonable time.
Using a UUID could be a good alternative to deriving the name from an email address. You can create such an ID without database access and be practically sure that you won't get a duplicate.
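PHP has no built-in UUID function, so here is a minimal sketch of a random (version 4) UUID built on random_bytes. The function name `uuidV4` is an assumption; in a real project an established UUID library may be the safer choice.

```php
<?php
// Hypothetical helper: random version-4 UUID, e.g. xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx
function uuidV4(): string
{
    $bytes = random_bytes(16);
    // Set the version (4) and variant (RFC 4122) bits.
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);
    // 32 hex characters split into 8 groups of 4, joined as 8-4-4-4-12.
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}
```

Duplicates are not strictly impossible with random UUIDs, only astronomically unlikely (122 random bits).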
I need to know whether there is any way to get a unique hash from GIF images. I tried the SHA1 file function
sha1_file
but I don't know whether two different GIF images can ever produce the same hash.
Can that happen with SHA1? If so, is SHA2 or MD5 better? Or any other algorithm already implemented in PHP?
I know it also depends on file size, but the GIF images never exceed 10 MB.
I need recommendations for this problem. Best regards.
There is no hash function that creates different values for each and every set of images you provide. This should be obvious as your hash values are much shorter than the files themselves and therefore they are bound to drop some information on the way. Given a fixed set of images it is rather simple to produce a perfect hash function (e.g. by numbering them), but this is probably not the answer you are looking for.
On the other hand you can use "perfect hashing", a two-step hashing algorithm that guarantees amortized O(1) access, but as you are asking for a unique 'hash' that may also not be what you are looking for. Could you be a bit more specific about why you insist on the hash value being unique and under what circumstances?
sha1_file is fine.
In theory you can run into two files that hash to the same value, but in practice it is so stupendously unlikely that you should not worry about it.
Hash functions don't provide any guarantees about uniqueness. Patru explains why, very well - this is the pigeonhole principle, if you'd like to read up.
I'd like to talk about another aspect, though. While you won't get any theoretical guarantees, you get a practical guarantee. Consider this: SHA-256 generates hashes that are 256 bits long. That means there are 2^256 possible hashes it can generate. Assume further that the hashes it generates are distributed almost purely randomly (true for SHA-256). That means that if you generate a billion hashes a second, 24 hours a day, you'll have generated 31,536,000,000,000,000 hashes a year. A lot, right?
Divide 2^256 by that. That's ~10^60. If you walked linearly through all possible hashes, that's how many years it would take you to generate all of them (pack a lunch). Divide that by two, that's... still ~10^60. That's how many years you'd have to work to have a greater than 50% chance of hitting one particular hash.
To put it another way, if you generate a billion hashes a second for a century, you'd have a 1/10^58 chance of hitting one particular hash. Until the sun burns out, 1/10^50. A collision between any two of the hashes you generate is likelier because of the birthday paradox, but after that same century it is still only around 1 in 10^40.
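The year count above can be reproduced with a few lines of PHP. This is only a back-of-the-envelope check using floats; the variable names are made up for the example.

```php
<?php
// One billion hashes per second, around the clock, for a 365-day year.
$hashesPerYear = 1e9 * 60 * 60 * 24 * 365;   // 3.1536e16
// Size of the SHA-256 output space (a float, about 1.16e77).
$space = 2 ** 256;
// Years needed to walk through every possible hash once.
$yearsToEnumerate = $space / $hashesPerYear;
printf("years to enumerate: about 10^%d\n", floor(log10($yearsToEnumerate)));
```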
Those are damn fine chances.
I wasn't able to find an answer to my question.
I need a hashing method that generates hashes that can be compared with one another to find out how similar the inputs are.
Let's say I have 2 strings, "mother" and "father"; when I compare the 2 hashes, the result should say that they are similar because of the shared "ther".
Is there any hashing method that is able to do that?
Thank you
PHP provides a function called similar_text which calculates the similarity between two strings. You could also use the levenshtein function to calculate the distance between the two strings. Whilst these aren't hashing functions, I think they provide the functionality you're after.
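Applied to the two words from the question, a quick sketch looks like this (the variable names are just for illustration):

```php
<?php
$a = 'mother';
$b = 'father';
// similar_text() returns the number of matching characters and, via the
// third argument, the similarity as a percentage.
$common = similar_text($a, $b, $percent);
// levenshtein() returns the number of single-character edits needed to turn $a into $b.
$distance = levenshtein($a, $b);
printf("%d common chars, %.1f%% similar, edit distance %d\n", $common, $percent, $distance);
```

Here the shared "ther" gives 4 common characters (about 66.7%), and the edit distance is 2.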
I'm not sure if you were looking for an answer specific to your case of 2 words, but there are definitely hash-style functions that are useful for comparing parts of a whole. A Hash Tree is a perfect example of one such structure. Hash trees are used to compare parts of a chunk of data, and the part hashes aggregate so the entire chunk can be compared as well.
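As a rough illustration of the idea, here is a deliberately simplified two-level hash tree: one hash per fixed-size chunk plus a root hash over the chunk hashes. The function names `chunkHashes` and `differingChunks` and the chunk size are assumptions for this sketch; a real hash tree usually has more levels.

```php
<?php
// Hypothetical helper: hash each fixed-size chunk, then hash the chunk hashes into a root.
function chunkHashes(string $data, int $chunkSize = 4): array
{
    $leaves = [];
    foreach (str_split($data, $chunkSize) as $chunk) {
        $leaves[] = hash('sha256', $chunk);
    }
    return ['root' => hash('sha256', implode('', $leaves)), 'leaves' => $leaves];
}

// Hypothetical helper: indexes of the chunks whose hashes differ.
function differingChunks(array $a, array $b): array
{
    if ($a['root'] === $b['root']) {
        return []; // roots match, so every chunk hash matches
    }
    $diff = [];
    $count = max(count($a['leaves']), count($b['leaves']));
    for ($i = 0; $i < $count; $i++) {
        if (($a['leaves'][$i] ?? null) !== ($b['leaves'][$i] ?? null)) {
            $diff[] = $i;
        }
    }
    return $diff;
}
```

With a chunk size of 2, "mother" and "father" differ only in chunk 0 ("mo" vs "fa"); the chunks "th" and "er" hash identically.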
I'll also note that while others point out that most real world hash functions will not allow any information about the input to be derived from the output, they are talking about a Cryptographic Hash Function. The set of guarantees for a regular Hash Function is much less strict than those of a Cryptographic Hash Function. For instance, in Java you can override .hashCode() and return 4 for every object. This is perfectly valid, but not extremely useful. It is valid because collisions are ok in general hash functions, but they are considered failure in a cryptographic hash function.
I believe rot13, along with taking out all the vowels would qualify. Any real-world hash would not. That's kind of the point.
In short: there can't be, in a universal sense of the word.
This is why:
One of the main functions of a hash is compression - apart from trivial usage (such as "mother" and "father") a hash will always be shorter than the hashed information. E.g. a SHA1 (or even MD5) used as a quick check of whether a download of a 600MB ISO went without corruption will be much shorter than the file itself.
Another main function of a hash is (very high grade) obfuscation. Were this not so, hashing a salted password would do nothing (or at least much less) to protect against a dictionary attack, as similar passwords would result in similar hashes.
What is the absolute fastest way to hash a string in PHP?
I have read that md5 can be relatively slow but am unsure of the alternatives.
Basically, I have a function that I need to squeeze every last bit of performance out of, and within that function I have a string, say "yada yada yada", that I need hashed in some way so it becomes one string.
I should note that security is no issue here - I simply need a single unique string representation, as it's for a cache key.
The whole point of a hash is that it's -not- fast. The faster the hash is the faster it can be cracked.
By that logic, the less secure the hash is - the faster it'll be. If you're going to favour such logic I suggest you either stop what you're doing or use encryption instead.
In response to your update
It sounds like you may want a CRC. Again it's worth mentioning that typically the faster the check is, the fewer combinations exist for the particular algorithm, and thus the less likely it is to be a "unique representation".
The associated PHP documentation can be found here: hash function with crc32/crc32b
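For the cache-key case, a minimal sketch using the hash function with crc32b could look like this. The function name `cacheKey` is hypothetical, and whether CRC32 is actually the fastest option on your setup is something only a benchmark will tell.

```php
<?php
// Hypothetical helper: 8 hex characters derived from the input string.
function cacheKey(string $value): string
{
    // 'crc32b' is fast but only 32 bits wide, so collisions between
    // different strings are quite plausible for large key sets.
    return hash('crc32b', $value);
}

$key = cacheKey('yada yada yada');
```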
Benchmarks. I seem to recall reading somewhere that this depends a lot on your version of Apache and PHP, can't remember where though. I'll post if I remember :)