Encoding with base64_encode turns the binary data into characters like
9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAUDBAQEAwUEBAQFBQUGBwwIBwcHBw8LCwkMEQ8
Can I take some subset of those characters to check for duplicates? Can I do the same for videos?
Like the others have said, don't use Base64 as a means of comparing files. It would be much less expensive to use something like SHA1, particularly if you are using this for videos. See the sha1_file function.
For example if you already have a SHA1 sum, it is easy to compare:
// Strict comparison avoids PHP's loose type juggling on hex strings.
if ($storedSHA1 === sha1_file($newImage)) {
    // ...some rejection code
}
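As a rough sketch of the same idea for two files on disk: the helper name isDuplicateFile is made up for illustration, and it assumes both paths are readable. It checks the size first because that is cheap, then falls back to the SHA1 digest.

```php
<?php
// Hypothetical helper (not from the original answer): true when two files
// have the same size and the same SHA-1 digest.
function isDuplicateFile(string $pathA, string $pathB): bool
{
    // filesize() results are cached per path, so drop stale entries first.
    clearstatcache();

    // Files of different sizes cannot be duplicates, so skip hashing.
    if (filesize($pathA) !== filesize($pathB)) {
        return false;
    }

    // sha1_file() reads the file in chunks, so a large video is not
    // loaded into memory in one go.
    return hash_equals(sha1_file($pathA), sha1_file($pathB));
}

// Usage with two throwaway temp files.
$first  = tempnam(sys_get_temp_dir(), 'dup');
$second = tempnam(sys_get_temp_dir(), 'dup');
file_put_contents($first, 'same bytes');
file_put_contents($second, 'same bytes');
var_dump(isDuplicateFile($first, $second));
unlink($first);
unlink($second);
```

In a real upload handler you would compare against the hash stored for earlier uploads rather than re-hashing the old file each time.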
I'd recommend creating a database table that stores the name, size and SHA1 of each file you upload. Then you can run a simple query to check if any of the records match. If you have a match in your database you know you have a duplicate.
See the below MySQL query.
SELECT SHA1_hash FROM Uploads
WHERE SHA1_hash = '<hashOfIncomingImage>';
No, you shouldn't. Use a digest for duplicate checking. SHA1 is a good enough choice. It has a small, constant footprint compared to base64. Base64 is good for transmitting or exchanging binary data, but that's all. In addition, base64 output is about 1/3 larger than the binary data it encodes.
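To make the size difference concrete, here is a small sketch. The 3000 random bytes are only a stand-in for real image data.

```php
<?php
// Stand-in for image data; any binary string behaves the same way.
$binary  = random_bytes(3000);
$encoded = base64_encode($binary);
$digest  = sha1($binary);

echo strlen($binary), "\n";   // 3000
echo strlen($encoded), "\n";  // 4000: base64 emits 4 characters per 3 input bytes
echo strlen($digest), "\n";   // 40 hex characters, whatever the input size
```

The base64 string keeps growing with the file, while the digest stays at 40 characters, which is what makes it practical to store and index.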
Verifying that two files are identical using pure PHP?
You want to use a hash function for that, for example SHA1. It always returns a 40-character string, which you can use for the comparison.
Related
I'm trying to use mcrypt_create_iv to generate random salts. When I test whether the salt is generated by echoing it out, it checks out, but it isn't the length I pass as a parameter (32); instead it's shorter than that.
When I store it in my database table however, it shows up as something like this K??5P?M???4?o???"?0??
I'm sure it's something to do with the database, but I tried to change the collation of it to correspond with the config settings of CI, which is utf8_general_ci, but it doesn't solve the problem, instead it generates a much smaller salt.
Does anyone know of what may be wrong? Thanks for any feedback/help
The function mcrypt_create_iv() returns a binary string, which can contain \0 and other unprintable characters. Depending on how you want to use the salts, you first have to encode those byte strings into an accepted alphabet. It is also possible to store binary strings in the database, but of course you will then have trouble displaying them.
Since salts are normally used for password storage, I would recommend having a look at PHP's password_hash() function. It generates a salt automatically and includes it in the resulting hash value, so you don't need a separate database field for the salt.
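A minimal sketch of both suggestions. It uses random_bytes() (PHP 7+) in place of mcrypt_create_iv(), since the mcrypt extension was removed from PHP core in 7.2; the sample password is of course made up.

```php
<?php
// 32 random bytes: binary, so it may contain \0 and unprintable characters.
$rawSalt = random_bytes(32);

// Hex-encode it to get a printable string that is safe in a text column.
// Each byte becomes two characters, so the result is 64 characters long.
$hexSalt = bin2hex($rawSalt);
echo strlen($rawSalt), ' raw bytes -> ', strlen($hexSalt), " hex chars\n";

// For passwords, password_hash() generates and embeds its own salt.
$hash = password_hash('correct horse', PASSWORD_DEFAULT);

var_dump(password_verify('correct horse', $hash)); // bool(true)
var_dump(password_verify('wrong guess', $hash));   // bool(false)
```

The value returned by password_hash() is plain printable text, so it fits in an ordinary VARCHAR column (the PHP manual suggests 255 characters to leave room for future algorithms).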
Most of the text stored in my DB is between 1MB and 1.5MB in size, but never bigger than 1.5MB, because that's the limit I set.
Here are my needs:
I need it for lowering my mysql database size
I need it to be as fast as possible
no security needed
it must just work correctly, so that string_1 and string_2 can never have the same hash
I use PHP and MYSQL.
A hash is not reversible. You can make a 1.5MB text into a small string with the help of hashing, but you cannot convert the same hash back into the original text.
What you are looking for is a compression algorithm. You can make the files a lot smaller with compression, but it's unlikely to be as small as a hash.
I would suggest SHA1, as it is also in use by git and similar applications to identify strings.
See: https://en.wikipedia.org/wiki/Sha1
and: http://php.net/manual/en/function.hash.php
$hash = hash( 'sha1', $inputData );
Saving space
MySQL has built-in COMPRESS() and UNCOMPRESS() functions, which will save space in your DB as well as saving you from having to write extra PHP code.
Checking unique-ness
Instead of indexing TEXT columns [regardless of whether they're compressed or not] you can store and index 2 relatively small things that, for all practical purposes, establish that the text is unique:
A hash of the data, MD5, SHA, whatever you want.
The length of the uncompressed data.
For most hashing functions you're more likely to get hit by a meteor than to have 2 identical hashes for different text strings, and having 2 different strings with identical length and identical hash is less likely than getting hit by a meteor and lightning while winning three simultaneous lotteries.
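The two values above could be computed like this. It is only a sketch: the function name textFingerprint and the choice of SHA1 are assumptions, and any other hash would work the same way.

```php
<?php
// Hypothetical helper: the two small values to store and index
// instead of the full TEXT column.
function textFingerprint(string $text): array
{
    return [
        'hash'   => sha1($text),   // 40 hex characters
        'length' => strlen($text), // length in bytes of the uncompressed data
    ];
}

print_r(textFingerprint('abc'));
```

A UNIQUE index over the hash and length columns then lets the database reject duplicates for you.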
I'm going to assume you want a compression algorithm to reduce the text size.
See http://php.net/manual/en/function.gzcompress.php.
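A quick round trip with gzcompress(), assuming the zlib extension is available. The sample text is deliberately repetitive, so it compresses far better than real prose would; treat the ratio as illustrative only.

```php
<?php
// About 1.4 MB of sample text (28 bytes repeated 50000 times).
$text = str_repeat('Lorem ipsum dolor sit amet. ', 50000);

// Level 6 is a middle ground between speed and compression ratio.
$packed   = gzcompress($text, 6);
$restored = gzuncompress($packed);

printf("original: %d bytes, compressed: %d bytes\n", strlen($text), strlen($packed));
var_dump($restored === $text); // the round trip is lossless
```

Unlike a hash, the compressed value can be turned back into the original text, but it is binary, so it belongs in a BLOB or VARBINARY column.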
How can I decode the md5, crc32, and sha1 values? Below is the XML file, followed by the code I'm using to get the data so far.
<files>
<file name="AtTheInn-Germany-Morrow78Collection.mp3" source="original">
<format>VBR MP3</format>
<title>At the Inn - Germany - Morrow 78 collection</title>
<md5>056bbd63961450d9684ca54b35caed45</md5>
<creator>Germany</creator>
<album>Morrow 78 collection</album>
<mtime>1256879264</mtime>
<size>2165481</size>
<crc32>22bab6a</crc32>
<sha1>796fccc9b9dd9732612ee626c615050fd5d7483c</sha1>
<length>179.59</length>
</file>
And this is the code I'm using to get the title and album name. How can I make sense of the sha1 and md5 values? Help in any direction would be appreciated, thanks.
<?php
$search = $_GET['sku'];
$catalogfile = $_GET['file'];
$directory = "feeds/";
$xmlfile = $directory . $catalogfile;
$xml = simplexml_load_file($xmlfile);
list($product) = $xml->xpath("//file[crc32 = '$search']");
echo "<head>";
echo "<title>$product->title</title>";
MD5, SHA-1, and CRC32 are hash functions. That means that they cannot be reversed. [1] You'd have more luck looking into that name attribute of the file tag.
[1] You can [2] brute-force them, but since they can represent variable-length data as a fixed-length piece of data, due to the pigeonhole principle and just plain probability, you're more likely to get something that's not the original input than the original input.
[2] It'll take forever for SHA-1, though.
Hash functions generate numbers that represent some arbitrary data. They can be used to verify whether the data has changed (a good hash function should produce a totally different hash even if only a single bit has changed).
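You can see that behaviour for yourself with two inputs that differ in a single character. The sample strings are arbitrary.

```php
<?php
$a = sha1('address line 1');
$b = sha1('address line 2');

echo $a, "\n";
echo $b, "\n";

// Count how many of the 40 hex digits differ between the two hashes.
$differing = 0;
for ($i = 0; $i < 40; $i++) {
    if ($a[$i] !== $b[$i]) {
        $differing++;
    }
}
echo $differing, " of 40 hex digits differ\n";
```

With a well-behaved hash, most of the digits change even though the inputs are nearly identical.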
Since you are turning an arbitrary amount of data into a fixed-size number, you lose information, which is why it's hard to reverse them. Technically there is an infinite number of possible inputs for any given hash, as the data can be any length. Even for limited data sizes it's still possible for multiple data values to produce the same hash; this is called a collision.
For some data sets (for example passwords) you can generate all possible combinations of data and check to see if they match a hash. If you do the generation at the same time as the checking, it's known as 'brute forcing'. You can also store all possible combinations (for a limited range, for example all dictionary words or all combinations of characters under a specific size), then look the hash up. This is the idea behind a rainbow table (a space-efficient form of such a lookup table), and it is useful for reversing multiple hashes.
It's good practice to store passwords as a hash rather than in plain text, but to make sure the passwords are hard to reverse you add a bit of random data to each one before hashing and store it along with the hash; this is known as salting. Because every password has its own salt, a precomputed table is no help and each password has to be brute-forced separately, which takes much longer.
In this case they are probably hashes of the mp3 file named in the entry, there to verify file integrity and reveal any corruption that occurs during transfer (or storage). It won't be possible to reverse them, since you would have to generate all possible combinations of megabytes of data. But if you have the file itself there wouldn't be any reason to. You can confirm they are hashes of the file by running a checksum-generating program on it.
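If you have the mp3 itself, PHP can play the part of the checksum program. This is a sketch under assumptions: the helper names are invented, the feed's crc32 value is assumed to be the common CRC-32 variant (the 'crc32b' algorithm in PHP's hash extension, the same one crc32() uses), and the 7-character value in the XML is assumed to have lost a leading zero.

```php
<?php
// Hypothetical helper: compute the three checksums listed in the XML.
function fileChecksums(string $path): array
{
    return [
        'md5'   => hash_file('md5', $path),
        'sha1'  => hash_file('sha1', $path),
        'crc32' => hash_file('crc32b', $path), // assumption: standard CRC-32
    ];
}

// Hypothetical helper: compare against whichever values the feed provides.
function checksumsMatch(string $path, array $expected): bool
{
    foreach (fileChecksums($path) as $name => $actual) {
        if (!isset($expected[$name])) {
            continue; // the feed did not supply this checksum
        }
        // Normalise case and leading zeros before comparing.
        $wanted = ltrim(strtolower(trim((string) $expected[$name])), '0');
        if (ltrim($actual, '0') !== $wanted) {
            return false;
        }
    }
    return true;
}
```

You would call checksumsMatch() with the path of the downloaded mp3 and the md5, sha1 and crc32 values read from the matching file element. If the crc32 never matches while md5 and sha1 do, the feed is probably using a different CRC variant.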
I have a MySQL database I am working on in PHP, where it will perform address verification on a daily data feed. We do the address correction on our end, because we don't have control over the source of the feed.
I am trying to come up with a method to see if the address has been changed at the source. If it changes then an address verification would be performed in PHP on our MySQL database.
Rather than storing a copy of the old feed, I was thinking it might be better to compute a checksum of the fields from the feed and store it with each record. Then for each later feed I would check whether the checksum has changed. Is this the best way to do it? Might there be a PHP function that does all this already? What about something in MySQL? Thanks!
crc32 is probably what you want.
In php: crc32()
In Mysql CRC32()
crc32 is probably a better fit than SHA1 or MD5 for simple comparisons and data-integrity checks.
PHP and MySQL both support the crc32 function, which is inexpensive to run; at least it is cheaper than a hash algorithm like MD5 or SHA.
There are various hash methods you can use; either the md5 or the sha ones will be OK. You will need to store the hash string in your database to compare against.
Ideally you'd want to do something like
if (sha1(strtoupper($list_of_values)) === $stored_hashstring) {
//skip
}else{
//update
}
Depending on the data, you might need to add additional parsing of the strings, e.g. removing spaces.
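Putting the normalising and hashing together might look like the sketch below. The function name addressChecksum and the exact normalisation rules (trim, collapse whitespace, upper-case) are assumptions; adjust them to whatever counts as "the same address" in your feed.

```php
<?php
// Hypothetical helper: one checksum for all address fields of a record.
function addressChecksum(array $fields): string
{
    $normalized = array_map(function ($value) {
        // Trim, collapse runs of whitespace, and ignore letter case.
        $value = preg_replace('/\s+/', ' ', trim((string) $value));
        return strtoupper($value);
    }, $fields);

    // Join with a separator that should not occur in the data, so that
    // ['ab', 'c'] and ['a', 'bc'] do not produce the same checksum.
    return sha1(implode("\x1F", $normalized));
}

$stored   = addressChecksum(['12 Main St', 'Springfield']);
$incoming = addressChecksum(['12  main st ', 'springfield']);

if ($incoming === $stored) {
    echo "unchanged, skip verification\n";
} else {
    echo "changed, re-run address verification\n";
}
```

Swapping sha1() for crc32() makes the stored value smaller, at the cost of a higher chance that a real change goes unnoticed.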
How do I store some encrypted strings (just about one to a couple of words)?
Let's suppose I encrypted this string:
$key='fappings'; // Encryption Key
$str='Mama Luigi'; // String that I Encrypted
$encrypted = mcrypt_encrypt(MCRYPT_RIJNDAEL_256, md5($key), $str, MCRYPT_MODE_CBC, md5(md5($key)));
Now, assuming I don't want to index that data, or perform any searches on it, just wanna ask two questions:
What datatype would I better use? I would guess varbinary, but I'm not sure...
How would I process my query? Assume I would use a simple mysql_query() function.
I saw some people would actually make base64 encode, and then simply insert it, just the way a normal string would go, eg:
mysql_query("insert into faps set data='".base64_encode($encrypted)."'");
But something tells me that's not the way to do it. It even seems inefficient space-wise.
What would be a better approach?
Use a varbinary (as you suggest) or a blob, depending on length required. For safety, multiply the length of the string by 4.
No need to base64 encode it - that just adds yet more length. All you need are the regular escaping functions (mysql_real_escape_string at a basic level, or bind using PDO and it'll make it safe for you).
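Here is a sketch of the bound-parameter route. It uses an in-memory SQLite database through PDO only so that it runs without a server (this assumes the pdo_sqlite extension); the table name is invented, and the random bytes stand in for your encrypted string. With MySQL you would change the DSN and use a VARBINARY or BLOB column.

```php
<?php
// Assumption: in-memory SQLite stands in for the real MySQL database.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE secrets (id INTEGER PRIMARY KEY, data BLOB)');

// Stand-in for ciphertext: arbitrary bytes, possibly including \0.
$encrypted = random_bytes(32);

// Binding the value means no escaping and no base64 step is needed.
$insert = $pdo->prepare('INSERT INTO secrets (data) VALUES (:data)');
$insert->bindValue(':data', $encrypted, PDO::PARAM_LOB);
$insert->execute();

$stored = $pdo->query('SELECT data FROM secrets')->fetchColumn();
if (is_resource($stored)) {
    // Some PDO drivers hand LOB columns back as streams.
    $stored = stream_get_contents($stored);
}

var_dump($stored === $encrypted); // the bytes come back unchanged
```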
An alternative may be to use MySQL's own encryption functions (http://dev.mysql.com/doc//refman/5.5/en/encryption-functions.html), as this may save you some programming hassle. It moves the load from the PHP server to the MySQL server, so consider which of the two is less stressed, and whether sending the unencrypted data inside SQL statements over an open network matters.