What is the absolute fastest way to hash a string in PHP?
I have read that md5 can be relatively slow but am unsure of the alternatives.
Basically, I have a function that I need to squeeze every last bit of performance out of, and within that function I have a string, say "yada yada yada", that I need hashed in some way so it becomes a single string.
I should note that security is no issue here - I simply need a single unique string representation, as it's for a cache key.
The whole point of a hash is that it's -not- fast. The faster the hash is the faster it can be cracked.
By that logic, the less secure the hash is - the faster it'll be. If you're going to favour such logic I suggest you either stop what you're doing or use encryption instead.
In response to your update
It sounds like you may want a CRC. Again, it's worth mentioning that typically the faster the check is, the fewer possible values the algorithm can produce, and thus the less likely the result is to be a "unique representation".
The associated PHP documentation can be found here: hash function with crc32/crc32b
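As a rough illustration of the CRC suggestion, here is a minimal sketch of a cache-key helper. The function name cache_key is made up for this example; hash() and the 'crc32b' algorithm name are part of PHP. Keep in mind that a 32-bit checksum collides far more readily than md5 or sha1, so this only suits caches where an occasional clash is tolerable.

```php
<?php
// Hypothetical helper: turn an arbitrary string into a short cache key.
// CRC32 is a checksum, not a secure hash, and has only 2^32 possible values.
function cache_key($input)
{
    // 'crc32b' is the hash() variant that matches PHP's crc32() function.
    return hash('crc32b', $input);
}

$key = cache_key('yada yada yada');
echo $key, "\n"; // always 8 hexadecimal characters
```

Whether this actually beats md5() on a given server is something to measure rather than assume.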
Benchmarks. I seem to recall reading somewhere that this depends a lot on your version of Apache and PHP, but I can't remember where. I'll post if I remember :)
Related
The initial task is to process an image: hash it, do some heavy image work, and store the hash and the results of that work in a database.
On the next request with the same image, I want to compare its hash with the hashes in the database and load the cached results to avoid repeating the heavy work.
So the questions are: what should I hash, and with what?
I see good PHP implementations of pHash, but it seems to be meant for similarity checks, and we need exact matching.
Is pHash fine for exact matching as well?
Thank you!
PHP provides a built-in function for this, which is probably the easiest solution:
$hash = hash_file("sha1", '/path/to/image');
You can use this check for exact matches. There is a small chance of collisions, but you can help mitigate that by also using the file path or database ID in your comparison.
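A minimal sketch of that lookup idea, assuming the hash of the file's bytes is used as the cache key. The helper name image_hash and the temporary files are made up for the demonstration; hash_file() is the built-in mentioned above.

```php
<?php
// Hypothetical helper: hash the raw bytes of a file for exact-match lookups.
function image_hash($path, $algo = 'sha1')
{
    $hash = hash_file($algo, $path);
    if ($hash === false) {
        throw new RuntimeException("Cannot read file: $path");
    }
    return $hash;
}

// Temporary files stand in for uploaded images in this demo.
$a = tempnam(sys_get_temp_dir(), 'img');
$b = tempnam(sys_get_temp_dir(), 'img');
$c = tempnam(sys_get_temp_dir(), 'img');
file_put_contents($a, 'same bytes');
file_put_contents($b, 'same bytes');
file_put_contents($c, 'other bytes');

var_dump(image_hash($a) === image_hash($b)); // bool(true): identical content
var_dump(image_hash($a) === image_hash($c)); // bool(false): different content

unlink($a);
unlink($b);
unlink($c);
```

Note that this matches byte-identical files only; the same picture re-encoded or resized produces a different hash.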
The answers in this similar question provide more options.
Suppose you have two datasets that you need to make sure that they have not changed. For example, you have an array of objects in one hand, and another array in the other hand. Now, you need to verify that both arrays are exactly the same.
Each array can contain any type of data: booleans, strings, objects, arrays, NULL, etc.
When comparing, the contents of both arrays should be exactly the same: same data types and same order.
Instead of iterating over the array contents, with code that can compare different types of data and possibly recurse, I came up with a solution, and I would be grateful if you could shed some light on any downsides. PHP is the language, but I'm more interested in a language-neutral answer.
I serialized both datasets separately and calculated their md5 hashes. I chose md5 because it is available without external extensions or libraries and works quite fast. I am aware of the chance of a collision, and that md5 hashes are nowhere near cryptographically secure.
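The approach described above might look roughly like this. The function name fingerprint is invented for the example; serialize() and md5() are the built-ins the text refers to.

```php
<?php
// Hypothetical helper: reduce any serializable value to an md5 fingerprint.
function fingerprint($data)
{
    return md5(serialize($data));
}

$first  = [1, 'two', null, ['nested' => true]];
$second = [1, 'two', null, ['nested' => true]];
$third  = ['1', 'two', null, ['nested' => true]]; // string '1' instead of int 1

var_dump(fingerprint($first) === fingerprint($second)); // bool(true)
var_dump(fingerprint($first) === fingerprint($third));  // bool(false): type differs
```

Because serialize() records types and element order, a change in either shows up in the fingerprint. Values that cannot be serialized, such as closures, will make serialize() fail.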
My question is that:
Is this a widely used method for validating arbitrary types of data? Checking file checksums makes sense, but I have not personally used it to compare variables like this.
I'm mainly doing this to keep my code simple. A direct comparison is probably faster because it can stop as soon as it finds the first mismatch. In my case, the data is fairly small: about 5 KB as a serialized string.
Are there any other downsides that I should know of?
Thanks in advance.
If you're looking for changes in an array, I would actually recommend using CRC32(). Like MD5(), this function has been available in PHP since version 4 and requires no extra libraries. However, CRC32() is actually meant for error checking and is quicker than MD5(), which is meant as a hashing function and as such is slower by design.
Especially in terms of a language-agnostic answer, I would always choose CRC32() over MD5(), as it's much simpler to find libraries for and much less computationally expensive, making it suitable for pretty much every application, even embedded devices.
I need to know whether there is any way to get a unique hash from GIF images. I tried the SHA-1 file function
sha1_file
but I don't know whether two different GIF images can ever result in the same hash.
Can this happen with SHA-1? If so, is SHA-2 or MD5 better? Or any other algorithm already implemented in PHP?
I know it also depends on file size, but the GIF images never exceed 10 MB.
I need recommendations for this problem. Best regards.
There is no hash function that creates different values for each and every set of images you provide. This should be obvious as your hash values are much shorter than the files themselves and therefore they are bound to drop some information on the way. Given a fixed set of images it is rather simple to produce a perfect hash function (e.g. by numbering them), but this is probably not the answer you are looking for.
On the other hand, you can use "perfect hashing", a two-step hashing algorithm that guarantees amortized O(1) access, but as you are asking for a unique 'hash', that may also not be what you are looking for. Could you be a bit more specific about why you insist on the hash value being unique, and under what circumstances?
sha1_file is fine.
In theory you can run into two files that hash to the same value, but in practice it is so stupendously unlikely that you should not worry about it.
Hash functions don't provide any guarantees about uniqueness. Patru explains why, very well - this is the pigeonhole principle, if you'd like to read up.
I'd like to talk about another aspect, though. While you won't get any theoretical guarantees, you get a practical guarantee. Consider this: SHA-256 generates hashes that are 256 bits long. That means there are 2^256 possible hashes it can generate. Assume further that the hashes it generates are distributed almost purely randomly (true for SHA-256). If you generate a billion hashes a second, 24 hours a day, you'll have generated 31,536,000,000,000,000 hashes a year. A lot, right?
Divide 2^256 by that. That's ~10^60. If you walked linearly through all possible hashes, that's how many years it would take you to generate all of them (pack a lunch). Divide that by two, that's... still ~10^60. That's how many years you'd have to work to have a greater than 50% chance of hitting one particular hash.
To put it another way, if you generate a billion hashes a second for a century, you'd have a 1/10^58 chance of hitting a given hash. Until the sun burns out, 1/10^50. A match between any two of the hashes you generated (the birthday problem) is more likely than that, but at these rates it is still far too improbable to matter.
Those are damn fine chances.
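For anyone who wants to check the arithmetic, here is a small sketch using floating-point numbers. The function names are made up for the example, and the figures are order-of-magnitude estimates, not exact values.

```php
<?php
// Hypothetical helper: years needed to enumerate every possible hash
// of the given bit length at a fixed rate of hashes per second.
function years_to_enumerate($bits, $hashesPerSecond)
{
    $possibleHashes = pow(2.0, $bits);                       // 2^256 is about 1.16e77
    $hashesPerYear  = $hashesPerSecond * 60 * 60 * 24 * 365; // 3.1536e16 at 1e9/s
    return $possibleHashes / $hashesPerYear;
}

// Hypothetical helper: chance of hitting one particular hash within $years.
function odds_of_hitting_one_hash($bits, $hashesPerSecond, $years)
{
    return $years / years_to_enumerate($bits, $hashesPerSecond);
}

printf("Years to enumerate SHA-256: %.2e\n", years_to_enumerate(256, 1e9));
printf("Odds after a century:       %.2e\n", odds_of_hitting_one_hash(256, 1e9, 100));
```

The first figure comes out on the order of 10^60 years, and the second is smaller than 1 in 10^58, in line with the estimates above.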
I wasn't able to answer my own question.
I need a hashing method that generates a hash that can be compared with others to find out how similar the inputs are (their fidelity).
Let's say I have 2 strings, "mother" and "father"; when I compare the 2 hashes, it should tell me that there is a fidelity between them because of the "ther".
Is there any hashing method that it's able to do that?
thank you
PHP provides a function called similar_text, which calculates the similarity between two strings. You could also use the levenshtein function to calculate the edit distance between the two strings. Whilst these aren't hashing functions, I think they should provide the functionality you're after.
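Applied to the two words from the question, those functions could be used roughly like this (a minimal sketch; the variable names are arbitrary):

```php
<?php
$a = 'mother';
$b = 'father';

// similar_text() returns the number of matching characters and can also
// report the similarity as a percentage through its third argument.
$common = similar_text($a, $b, $percent);

// levenshtein() returns the number of single-character edits needed
// to turn one string into the other.
$distance = levenshtein($a, $b);

printf("%d matching characters (%.1f%% similar)\n", $common, $percent); // 4 ("ther")
printf("edit distance: %d\n", $distance);                               // 2 (m->f, o->a)
```

Both functions work on the original strings, so they have to be stored or available at comparison time; nothing here can be recovered from a hash.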
I'm not sure if you were looking for an answer specific to your specific case of 2 words, but there are definitely hash-style functions that are useful for comparing parts of a whole. A Hash Tree is a perfect example of one such structure. Hash trees are used to compare parts of a chunk of data and they aggregate for comparison of the entire chunk of data.
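To make the hash tree idea a little more concrete, here is a simplified sketch of a binary hash tree over chunks of data. The function name merkle_root, the choice of sha256, and the rule of pairing an odd node with itself are assumptions for this example, not a standard taken from the answer above.

```php
<?php
// Hypothetical helper: compute the root hash of a simple binary hash tree.
function merkle_root(array $chunks, $algo = 'sha256')
{
    if (count($chunks) === 0) {
        return hash($algo, '');
    }

    // Leaf level: hash every chunk on its own.
    $level = array_map(function ($chunk) use ($algo) {
        return hash($algo, $chunk);
    }, array_values($chunks));

    // Combine neighbouring hashes pairwise until a single root remains.
    while (count($level) > 1) {
        $next = [];
        for ($i = 0; $i < count($level); $i += 2) {
            $left  = $level[$i];
            // Assumption: an unpaired last node is combined with itself.
            $right = isset($level[$i + 1]) ? $level[$i + 1] : $left;
            $next[] = hash($algo, $left . $right);
        }
        $level = $next;
    }

    return $level[0];
}

$original = str_split('aaaabbbbccccdddd', 4);
$modified = str_split('aaaabbXbccccdddd', 4);

var_dump(merkle_root($original) === merkle_root($modified)); // bool(false)
```

Comparing the intermediate hashes level by level is what lets two parties narrow down which chunk differs without exchanging the whole data set. It detects differing parts, not similarity between them.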
I'll also note that while others point out that most real world hash functions will not allow any information about the input to be derived from the output, they are talking about a Cryptographic Hash Function. The set of guarantees for a regular Hash Function is much less strict than those of a Cryptographic Hash Function. For instance, in Java you can override .hashCode() and return 4 for every object. This is perfectly valid, but not extremely useful. It is valid because collisions are ok in general hash functions, but they are considered failure in a cryptographic hash function.
I believe rot13, along with taking out all the vowels would qualify. Any real-world hash would not. That's kind of the point.
In short: There can't be in a universal sense of the word
This is why:
One of the main functions of a hash is compression - apart from trivial usage (such as "mother" and "father"), a hash will always be shorter than the hashed information. E.g. a SHA1 (or even MD5) used as a quick check of whether a download of a 600MB ISO went through without corruption will be much shorter than the file itself.
Another main function of a hash is (very high grade) obfuscation. Were this not so, hashing a salted password would do nothing (or at least much less) to protect against a dictionary attack, as similar passwords would result in similar hashes.
So, I've been looking into encryption lately, and I've heard of people using timestamps as keys for encryption. I think this is a great idea, but if I want to decrypt the data, how would I retrieve that specific timestamp? Timestamps are unique, and I'm not really sure how this would work.
EDIT:
I am using PHP and MYSQL
Er, usually timestamps are used as the basis for generating keys - not as the key itself. The key is something you have to store for later if you want to be able to decrypt the data.
Two problems here.
You would need to store the timestamp somewhere. So why not just use "rand()" and store that.
It's possible to get hundreds of duplicate timestamps on a modern multi-threaded, multi-core processor. So you may as well just use date().
The only way would be to save the timestamp (assuming there is no backdoor to your encryption, which would nullify its purpose).
What I would do is save the timestamp along with the encrypted string in a different format, e.g. 09/21/2011 03:09:53, and use a combination of strtotime() and salting to store both bits of information in a secure manner.
09/21/2011 03:09:53 becomes 13165--Salt--74593
Timestamps make awful keys. A program could blaze through all possible keys in the blink of an eye.
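A rough sketch of why, assuming second-resolution timestamps and an attacker who knows the year the data was encrypted. The function name is made up for the example; random_bytes() is PHP's built-in source of cryptographically secure random bytes (PHP 7 and later).

```php
<?php
// Hypothetical helper: how many bits of key space a timestamp within
// a window of $years actually provides at one-second resolution.
function timestamp_keyspace_bits($years)
{
    $candidates = $years * 60 * 60 * 24 * 365; // 31,536,000 per year
    return log($candidates, 2);
}

printf("One year of timestamps: about %.1f bits\n", timestamp_keyspace_bits(1));

// For comparison: a 256-bit key drawn from a secure random source.
// It has to be stored somewhere safe to decrypt later.
$key = random_bytes(32);
printf("Random key: %d bits\n", strlen($key) * 8);
```

One year of timestamps is under 25 bits, which is only tens of millions of candidates to try.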
The time is often used as one component in seeding a random number generator. It can't be the only component, though, for the same reason it can't be used as a key.
The time could be used semi-successfully as the salt for a hashing algorithm. It's still not as good as something random since it allows the attacker to generate rainbow tables in advance.
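For password hashing specifically, PHP's password_hash() generates a random salt on its own, so there is no need to derive one from the time. A minimal sketch (the password strings are placeholders):

```php
<?php
// password_hash() picks a random salt and embeds it in the returned string.
$first  = password_hash('correct horse', PASSWORD_DEFAULT);
$second = password_hash('correct horse', PASSWORD_DEFAULT);

var_dump($first === $second);                         // bool(false): different salts
var_dump(password_verify('correct horse', $first));   // bool(true)
var_dump(password_verify('wrong password', $first));  // bool(false)
```

Because each hash carries its own random salt, an attacker cannot prepare a lookup table ahead of time the way a predictable, time-based salt would allow.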
I don't want to hurt your feelings, but...
It is very obvious that you have never studied crypto.
So, please do not design your own cryptographic protocols, and do not assemble cryptographic primitives yourself either (as the amateur cryptologists who designed "WiFi" security did with WEP).
Protocols designed to meet specific security goals (I am not saying "secure protocols" on purpose) have been invented and implemented by specialists.
You first need to define your security goals, then choose an adequate protocol.