I am uploading files to my server and naming them with the record ID. Because record IDs are sequential, the files are not safe: they can be downloaded in a loop (http://www.blabla.com/1.jpg, 2.jpg, etc.).
I want to encrypt the record ID into a 7-character string, and decrypt it back when reading these files.
So file names would look like
http://www.blabla.com/72ayhg6.jpg
where 72ayhg6, when decrypted, is ID 1.
How can I do this using PHP?
PHP's encryption and decryption functions generate quite a long string. Can I add some sort of salt and limit the result to 7 or 11 characters?
Thanks in advance.
Check this http://kevin.vanzonneveld.net/techblog/article/create_short_ids_with_php_like_youtube_or_tinyurl/
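For a reversible, salted short name, here is a minimal sketch that swaps in a different technique from the linked article: a 4-round Feistel network keyed with a secret (playing the role of the "salt") scrambles a 32-bit record ID into another 32-bit number, which is then written in base 62 and padded to 6 characters. The secret, the round count, HMAC-SHA256 as the round function, the 32-bit ID range and the function names are all assumptions. Treat it as obfuscation rather than vetted encryption: anyone who learns the secret can decode every name.

```javascript
const crypto = require("crypto");

// Assumed alphabet and sizes: 32-bit IDs fit in 6 base-62 characters (62^6 > 2^32).
const ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
const WIDTH = 6;
const ROUNDS = 4;

// Round function: HMAC-SHA256 of the round number and one 16-bit half, truncated to 16 bits.
function roundFn(secret, round, half) {
  return crypto.createHmac("sha256", secret).update(`${round}:${half}`).digest().readUInt16BE(0);
}

// Feistel network over 32-bit values: a keyed bijection, so it can be inverted.
function permute(value, secret) {
  let left = value >>> 16;
  let right = value & 0xffff;
  for (let r = 0; r < ROUNDS; r++) {
    const next = left ^ roundFn(secret, r, right);
    left = right;
    right = next;
  }
  return ((left << 16) | right) >>> 0; // back to an unsigned 32-bit number
}

// Runs the rounds backwards to undo permute().
function unpermute(value, secret) {
  let left = value >>> 16;
  let right = value & 0xffff;
  for (let r = ROUNDS - 1; r >= 0; r--) {
    const prev = right ^ roundFn(secret, r, left);
    right = left;
    left = prev;
  }
  return ((left << 16) | right) >>> 0;
}

// Record ID -> fixed-width base-62 name.
function encodeId(id, secret) {
  if (!Number.isInteger(id) || id < 0 || id > 0xffffffff) throw new RangeError("id must fit in 32 bits");
  let n = permute(id, secret);
  let out = "";
  do {
    out = ALPHABET[n % 62] + out;
    n = Math.floor(n / 62);
  } while (n > 0);
  return out.padStart(WIDTH, ALPHABET[0]);
}

// Name -> record ID (throws on characters outside the alphabet).
function decodeId(code, secret) {
  let n = 0;
  for (const ch of code) {
    const digit = ALPHABET.indexOf(ch);
    if (digit === -1) throw new Error(`invalid character: ${ch}`);
    n = n * 62 + digit;
  }
  if (n > 0xffffffff) throw new RangeError("not a valid code");
  return unpermute(n, secret);
}
```

Usage: `encodeId(1, "my secret salt")` returns a 6-character name, and `decodeId(name, "my secret salt")` gives back 1. Raising `WIDTH` to 7 pads the name to 7 characters without changing the decoding.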
Why do you need to decrypt it?
Did you bother to read up on anti-leeching strategies?
A simple approach (though far from ideal, since it's just security by obscurity) would be to rename the files based on a hash of the record ID and a nonce.
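A minimal sketch of that approach in Node.js, assuming SHA-1 and a 16-byte random nonce (both arbitrary choices; the function name is illustrative). Because the nonce is random, the name cannot be recomputed from the record ID alone, so the generated file name (or the nonce) has to be stored with the record.

```javascript
const crypto = require("crypto");

// Returns a hard-to-guess file name for a record plus the nonce used to make it.
// Store fileName (or the nonce) alongside the record in your database.
function obscuredFileName(recordId, ext) {
  const nonce = crypto.randomBytes(16).toString("hex");
  const digest = crypto.createHash("sha1").update(`${recordId}:${nonce}`).digest("hex");
  return { fileName: digest + ext, nonce };
}
```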
Quoting the question: "Can I add some sort of salt in it and limit it to 7 or 11 chars (Linux file name limit)?"
If you don't know the difference between Linux and MS-DOS, perhaps you need to cover some of the basics before attempting to write code?
Related
I am trying to build an application that needs to compare the MD5 hash of any file.
Due to specific issues, before the upload, the MD5 must be generated client side, and after the upload the application needs to check it server side.
My first approach was to use the JavaScript File API and the FileReader.readAs* functions on the client side. Then I use the MD5 algorithm found here: http://pajhome.org.uk/crypt/md5/
Server side, I would use PHP's fopen command and the md5 function.
This approach works fine with plain text files. But when a binary file is used (such as a JPG or PDF), the MD5 generated on the client side differs from the one on the server. Using the md5sum command-line tool, I figured out that the server MD5 is correct and the problem occurs on the client side.
I've tried other MD5 APIs I found, with the same results. I suspect that the FileReader.readAs* functions load the file content slightly differently (I have tried all the readAs* variants: text, binary and so on), but I can't figure out what the difference is.
I'm missing something but don't know what, maybe I need to decode the content somehow before generating the MD5.
Any tips?
Edit 1:
I followed optima1's idea: I took each character and printed its Unicode number in both JavaScript and PHP. Comparing the output with vimdiff, I could see only one difference, at the end, in every case.
PHP: 54 51 10 37 37 69 79 70 0
Javascript: 54 51 10 37 37 69 79 70
Maybe this extra zero in PHP is some kind of "string end". In both cases the binary strings have the same length. Adding String.fromCharCode(0) to the end of the JS content does not solve the problem. I will keep investigating.
If I can't find a solution, I will try to build a giant string by concatenating those char codes and using it to build the MD5. It is a crap solution, but it will serve for now, and I will just need to add a zero to the end of the JS string...
Edit 2:
Thank God! This implementation works like a charm: http://www.myersdaily.org/joseph/javascript/md5.js
If you need to generate a MD5 hash from binary files, go for it.
Thanks in advance!
http://membres-liglab.imag.fr/donsez/cours/exemplescourstechnoweb/js_securehash/
JavaScript MD5 and PHP MD5 produce the same result, but we need to use some helper functions, which can be found at the URL above.
I would suggest doing a quick sanity check: have your client-side code report the first and last bytes of the binary data. Repeat in your PHP code. Compare first and last bytes from both methods to ensure that they are in fact reading the same data (which should result in the same MD5 hash.)
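A small Node.js sketch of that sanity check: report the length plus the first and last bytes, and hash the raw bytes. The sample bytes are made up, and the "decode as text, then re-encode" step is an assumption standing in for what a text-based read can do to binary data. In a browser, FileReader.readAsArrayBuffer gives you the raw bytes to hash instead.

```javascript
const crypto = require("crypto");

function md5Hex(bytes) {
  return crypto.createHash("md5").update(bytes).digest("hex");
}

// The check suggested above: length plus first and last byte.
function describeBytes(bytes) {
  return { length: bytes.length, first: bytes[0], last: bytes[bytes.length - 1] };
}

// Hypothetical "binary" content: bytes that are not valid UTF-8, plus a trailing NUL.
const raw = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0xff, 0x00]);

// Decoding as UTF-8 text replaces the invalid bytes with U+FFFD, so re-encoding
// produces different bytes, and therefore a different hash.
const viaText = Buffer.from(raw.toString("utf8"), "utf8");

console.log(describeBytes(raw), md5Hex(raw));
console.log(describeBytes(viaText), md5Hex(viaText));
```

If the two sides disagree on the length or the boundary bytes, they are not hashing the same data, and no MD5 implementation will make the results match.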
Then I would suggest posting your code here so that we can review it.
What would be the best way to compare files and/or directories? Let's say I want to store files on a server or a collection of servers, like a cloud-based system. My users collaborate with one another in many cases, and in some cases not. Either way, I can have upwards of a hundred people or more with the exact same file. The only key difference is that they likely renamed the file; essentially it's the same data all around. The other thing is that there is no specific file type: there's PDF, DOC, DOCX, TXT, video, audio files, etc., but the issue boils down to the same files over and over. What I want to do is cut that down: remove the hundreds of dupes and, with the help of a database, store things like the file name the user provided, so I can store the single remaining file how and where I want while still providing the info they used.
Now, I know I can use MD5, SHA-1, SHA-2 or something equivalent to get an essentially unique value I can use for such comparisons. But I am not exactly sure how or where to begin. For example, how can I get the SHA or MD5 of a file with PHP? When I look these up, I get methods for strings but not files.
Overall, I am looking to bounce ideas around rather than for a direct solution. Any help would be great.
$filePath = '/var/www/site/public/uploads/foo.txt';
$data = file_get_contents($filePath);
$key = sha1($data); //or $key = sha1_file($filePath);
Save this $key in a table column and mark that column as UNIQUE, so the same file cannot be stored twice.
Use sha1 instead of md5, since version control systems such as Git use SHA-1 hashes to identify files uniquely.
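Both md5(file_get_contents(...)) and sha1($data) load the whole file into memory first, while sha1_file works from the path. A Node.js sketch of hashing a file incrementally from its path; the function name and the 64 KiB chunk size are arbitrary assumptions.

```javascript
const fs = require("fs");
const crypto = require("crypto");

// Hash a file in fixed-size chunks so large files never have to fit in memory.
function sha1File(filePath, chunkSize = 64 * 1024) {
  const hash = crypto.createHash("sha1");
  const buffer = Buffer.alloc(chunkSize);
  const fd = fs.openSync(filePath, "r");
  try {
    let bytesRead;
    // position null = read from the current file position, advancing each time
    while ((bytesRead = fs.readSync(fd, buffer, 0, chunkSize, null)) > 0) {
      hash.update(buffer.subarray(0, bytesRead));
    }
  } finally {
    fs.closeSync(fd);
  }
  return hash.digest("hex");
}
```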
When a file is uploaded:
Compute the hash (SHA1, etc.)
Rename the file to that hash and store it (unless a file with that hash already exists [you already have it])
Store the hash in your database.
When a file is requested:
Get the hash from your database
Return the file based on the hash
Use HTTP headers (such as Content-Disposition) so the user's browser provides it to them with the filename they used
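The upload and request flow above, sketched in Node.js with in-memory maps standing in for the storage directory and the database table (both assumptions, as are the function names). The Content-Disposition header is what makes the browser save the download under the user's own file name; the character filtering on that name is a deliberately simple assumption, not a complete escaping scheme.

```javascript
const crypto = require("crypto");

// In-memory stand-ins for the file store and the database table.
const blobs = new Map(); // sha1 hex -> file bytes (the single physical copy)
const uploads = [];      // { userId, hash, fileName } rows

function sha1Hex(bytes) {
  return crypto.createHash("sha1").update(bytes).digest("hex");
}

// Upload: hash the bytes, store them under the hash only if not already present,
// and record the user's own file name.
function handleUpload(userId, fileName, bytes) {
  const hash = sha1Hex(bytes);
  if (!blobs.has(hash)) {
    blobs.set(hash, Buffer.from(bytes));
  }
  uploads.push({ userId, hash, fileName });
  return hash;
}

// Request: look up the row, return the bytes plus headers that make the
// browser save the file under the name this user gave it.
function handleDownload(userId, hash) {
  const row = uploads.find((u) => u.userId === userId && u.hash === hash);
  if (!row) return null;
  const safeName = row.fileName.replace(/["\\\r\n]/g, "_");
  return {
    body: blobs.get(hash),
    headers: {
      "Content-Type": "application/octet-stream",
      "Content-Disposition": `attachment; filename="${safeName}"`,
    },
  };
}
```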
To get the md5 hash of a file at $path...
$hash = md5(file_get_contents($path));
Hope this partially answers your question.
There are many ways to build such a system. But if I had to write one from scratch, this is most likely how I would do it:
have three database tables (in pseudocode):
table users {
id integer ## PK
username string
password string ## sha1
...
}
table user_files {
user_id integer ## Composite INDEX
file_id integer ##
filename string
}
table files {
id integer ## PK
uniq_id string ## basically 'yyyyMMddhhmmssRRRR' INDEX
sha_hash string ## sha1
md5_hash string ## md5
}
1. Compute files.sha_hash as the SHA-1 of the file and files.md5_hash as the MD5 of the same file, as a double security check; user_files.filename holds the file name the user gave it. On the server, the file would be stored and renamed to files.uniq_id to make sure there is no name collision, where the last four characters (RRRR) are a random string (regenerate RRRR until uniq_id is unique in the database).
Note: PHP provides sha1_file and md5_file. Use these when hashing files!
2. When a user stores a file, process the file (described in step 1) and save it appropriately. To avoid having too many files in the same folder on the server, you may decompose files.uniq_id and separate the files into yyyy/MM subfolders.
Next, set user_files.file_id = files.id and user_files.user_id = users.id, and set user_files.filename to the uploaded file name (see next step).
3. If a user uploads another file, process it as in step 2, but check whether the result matches an existing files.sha_hash and files.md5_hash. If it does, it doesn't matter what name the file has: it's very likely the exact same file, so set user_files.file_id to the matching files.id and user_files.user_id = users.id, and set user_files.filename to the uploaded file name.
Note: this results in 1 physical file and 2 virtual files on your server.
4. If a user renames a file without modifying it, simply update the user_files.filename matching the file he/she wants to rename.
5. If a user deletes a file, count how many user_files rows have that file_id. Only if exactly 1 is found, delete the physical file and the files entry; otherwise, simply remove the user_files association.
6. If a user modifies the file, with or without renaming it, perform a delete (step 5) and another save (step 3).
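A condensed Node.js sketch of the save, rename and delete rules above, with in-memory stand-ins for the files and user_files tables. The names files, userFiles, saveFile, renameFile, deleteFile and modifyFile are illustrative, not from the answer, and the linear scan replaces the indexed hash lookup a real database would do.

```javascript
const crypto = require("crypto");

// Hypothetical in-memory versions of the `files` and `user_files` tables.
const files = new Map(); // id -> { sha, md5, bytes }
const userFiles = [];    // { userId, fileId, filename }
let nextFileId = 1;

const digest = (alg, bytes) => crypto.createHash(alg).update(bytes).digest("hex");

// Save: reuse an existing physical file when both hashes match, otherwise store a new one.
function saveFile(userId, filename, bytes) {
  const sha = digest("sha1", bytes);
  const md5 = digest("md5", bytes);
  let fileId = null;
  for (const [id, f] of files) {
    if (f.sha === sha && f.md5 === md5) { fileId = id; break; }
  }
  if (fileId === null) {
    fileId = nextFileId++;
    files.set(fileId, { sha, md5, bytes: Buffer.from(bytes) });
  }
  userFiles.push({ userId, fileId, filename });
  return fileId;
}

// Rename: only the user's association row changes.
function renameFile(userId, fileId, newName) {
  const row = userFiles.find((r) => r.userId === userId && r.fileId === fileId);
  if (row) row.filename = newName;
}

// Delete: drop the association; remove the physical file only when nobody else references it.
function deleteFile(userId, fileId) {
  const idx = userFiles.findIndex((r) => r.userId === userId && r.fileId === fileId);
  if (idx === -1) return;
  userFiles.splice(idx, 1);
  if (!userFiles.some((r) => r.fileId === fileId)) files.delete(fileId);
}

// Modify: a delete followed by a save.
function modifyFile(userId, fileId, newName, newBytes) {
  deleteFile(userId, fileId);
  return saveFile(userId, newName, newBytes);
}
```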
You can use:
md5(file_get_contents($filename));
to generate a hash for a file.
Keep in mind, though, that two entirely different files can produce the exact same MD5 hash (the same problem exists with other hashes, although a better hash method than MD5 makes collisions far less likely). To be sure two files are identical you need to compare them byte by byte, but you don't want to compare every byte of every file on the hard disk to find a match.
What you need to do is store the hash of every file in a column of your database, and that column should also be indexed.
Then you can select all files with the same hash as the new file from your database.
That will give you a small list of files. Say you have 100,000 files on the disk: you might get a list of a few files that match the hash, and most of the time the lists will be short. Then you can loop through those files byte by byte to see if there's a match. Searching through a list of ~10 files with the same hash saves you from searching through all 100,000 files, but you still need the byte-by-byte comparison, because those 10 files could all be very different.
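A Node.js sketch of "hash index first, bytes second". The in-memory byHash map stands in for the indexed database column (an assumption, as are the function names); Buffer.equals performs the byte-by-byte check on the few candidates that share a hash.

```javascript
const crypto = require("crypto");

// Hypothetical index: hash -> list of stored files with that hash.
const byHash = new Map();

function md5Hex(bytes) {
  return crypto.createHash("md5").update(bytes).digest("hex");
}

function addFile(name, bytes) {
  const key = md5Hex(bytes);
  if (!byHash.has(key)) byHash.set(key, []);
  byHash.get(key).push({ name, bytes });
}

// Only files whose hash matches are candidates; each candidate is then
// confirmed with a full byte-by-byte comparison.
function findDuplicate(bytes) {
  const candidates = byHash.get(md5Hex(bytes)) || [];
  const match = candidates.find((c) => c.bytes.equals(bytes));
  return match ? match.name : null;
}
```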
Is it necessary? Hard disks are very cheap these days, so who cares about the duplicates? I would imagine the files are not that big.
MD5 et al. are not unique. They are just a quick way of telling that two files are not the same: it is possible for two files to have the same MD5 value but contain different data.
I was planning on hosting images on a server and wanted to use the same sort of file naming encryption mechanism. Is it just a hash?
Yes.
$filename = md5($_SERVER['REQUEST_URI'].$_SERVER['REMOTE_ADDR'].rand(50000000, 900000000000)).$ext;
It's just a hash. If you use the characters a-zA-Z0-9 (62 of them) and choose a name only 6 characters long, you get 62^6 = 56,800,235,584 possible filenames. I doubt you'll run out =) Use the mt_rand function for best results.
We use UUIDs for the primary keys in our DB (generated by PHP, stored in MySQL). The problem is that when someone wants to edit something or view their profile, they get this huge, scary, ugly UUID string at the end of the URL (edit?id=.....).
Would it be safe (read: still unique) if we only used the first 8 characters, everything before the first hyphen?
If it is NOT safe, is there some way to translate it into something else shorter for use in the url that could be translated back into the hex to use as a lookup? I know that I can base64 encode it to bring it down to 22 characters, but is there something even shorter?
EDIT
I have read this question and it said to use base64. Again, anything shorter?
Shortening the UUID increases the probability of a collision. You can do it, but it's a bad idea. Using only the first 8 hex characters means just 4 bytes (32 bits) of data, so you'd expect a collision once you have about 2^16 IDs, which is far from ideal.
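The 2^16 figure is the usual birthday-bound estimate. With N possible values (8 hex characters give N = 2^32) and n IDs, the probability of at least one collision is approximately:

```latex
P(\text{collision}) \approx 1 - \exp\!\left(-\frac{n^2}{2N}\right),
\qquad N = 2^{32},\; n = 2^{16}
\;\Rightarrow\; \frac{n^2}{2N} = \frac{2^{32}}{2^{33}} = \tfrac{1}{2},
\quad P \approx 1 - e^{-1/2} \approx 0.39
```

So with about 65,000 IDs there is already roughly a 39% chance that two of them share the same first 8 characters.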
Your best option is to take the raw bytes of the UUID (not the hex representation) and encode it using base64. Or, just don't worry much, because I seriously doubt your users care what's in the URL.
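A Node.js sketch of the raw-bytes approach, assuming canonical hyphenated UUID strings and Node's "base64url" Buffer encoding (URL-safe alphabet without '+' and '/', no '=' padding). The 16 raw bytes become 22 characters and convert back losslessly; the function names are illustrative.

```javascript
// UUID string -> 22 URL-safe characters.
function uuidToShort(uuid) {
  const hex = uuid.replace(/-/g, "");
  if (!/^[0-9a-fA-F]{32}$/.test(hex)) throw new Error("not a UUID: " + uuid);
  return Buffer.from(hex, "hex").toString("base64url");
}

// 22 characters -> canonical lowercase UUID string.
function shortToUuid(short) {
  const hex = Buffer.from(short, "base64url").toString("hex");
  if (hex.length !== 32) throw new Error("not a shortened UUID: " + short);
  return [hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16), hex.slice(16, 20), hex.slice(20)].join("-");
}
```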
Don't cut a single bit out of that UUID: you have no control over the algorithm that produced it, there are multiple possible implementations, and the implementation is subject to change (for example, it can change with the version of PHP you're using).
If you ask me, a UUID in the address bar doesn't look scary or difficult at all; even a simple Google search for "UUID" produces worse-looking URLs, and everybody's used to looking at Google URLs!
If you want nicer-looking URLs, take a look at the address bar of this stackoverflow.com article. They're using the article ID followed by the title of the question. Only the ID part is relevant; everything else is there to make it easy on the eyes of readers (go ahead and try it: you can delete anything after the ID or replace it with junk, and it doesn't matter).
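A sketch of the ID-plus-title idea. The /questions/&lt;id&gt;/&lt;slug&gt; route shape, the slug rules and the function names are assumptions for illustration; only the number is used for the lookup, which is why the trailing text can be edited or replaced with junk.

```javascript
// Extracts the numeric ID; any trailing slug is ignored.
function idFromPath(pathname) {
  const m = /^\/questions\/(\d+)(?:\/[^/]*)?\/?$/.exec(pathname);
  return m ? Number(m[1]) : null;
}

// Builds the canonical path with a readable slug derived from the title.
function canonicalPath(id, title) {
  const slug = title.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
  return `/questions/${id}/${slug}`;
}
```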
It is not safe to truncate UUIDs. Also, they are designed to be globally unique, so you aren't going to have luck shortening them. Your best bet is to either assign each user a unique number, or let users pick a custom (unique) string (like a username or nickname) that can be mapped back. So you could have edit?id=.... or edit?name=blah, and then look up the UUID for that name in your script.
It depends on how you're generating the UUID - if you're using PHP's uniqid then it's the right-most digits that are more "unique". However, if you're going to truncate the data, then there's no real guarantee that it'll be unique anyway.
Irrespective, I'd say that this is a somewhat sub-optimal approach - is there no way you can use a unique (and ideally meaningful) textual reference string instead of an ID in the query string? (Hard to know without more knowledge of the problem domain, but it's always a better approach in my opinion, even if SEO, etc. isn't a factor.)
If you were using this approach, you could also let MySQL generate the unique IDs, which is probably a considerably more sane approach than attempting to handle this in PHP.
If you're worried about scaring users with the UUID in the URL, why not write it out to a hidden form field instead?
I've always wondered how and why they do this. An example: http://youtube.com/watch?v=DnAMjq0haic
How are these IDs generated such that there are no duplicates, and what advantage does this have over a simple auto-incrementing numeric ID?
How does one keep it short but still keep its uniqueness? The strings uniqid creates are pretty long.
Kevin van Zonneveld has written an excellent article including a PHP function to do exactly this. His approach is the best I've found while researching this topic.
His function is quite clever. It uses a fixed $index variable so problematic characters can be removed (vowels for instance, or to avoid O and 0 confusion). It also has an option to obfuscate ids so that they are not easily guessable.
Try this: http://php.net/manual/en/function.uniqid.php
uniqid — Generate a unique ID...
Gets a prefixed unique identifier based on the current time in microseconds.
Caution
This function does not generate cryptographically secure values, and should not be used for cryptographic purposes. If you need a cryptographically secure value, consider using random_int(), random_bytes(), or openssl_random_pseudo_bytes() instead.
Warning
This function does not guarantee uniqueness of return value. Since most systems adjust system clock by NTP or like, system time is changed constantly. Therefore, it is possible that this function does not return unique ID for the process/thread. Use more_entropy to increase likelihood of uniqueness...
Base62- or base64-encode your primary key's value, then store it in another field.
Example: base62 for primary key 12443 = 3eH
It saves some space, which is why I'm sure YouTube is using it.
Doing a base62 (A-Za-z0-9) encode on your PK or unique identifier avoids the overhead of having to check whether the key already exists :)
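A sketch of base-62 encoding and decoding with the alphabet order 0-9a-zA-Z, which is the order that turns 12443 into 3eH. The alphabet parameter and function names are assumptions added for illustration: passing, for example, a 58-character alphabet without 0, O, I and l gives Base58-style IDs, and a shuffled alphabet gives the kind of obfuscation a fixed custom index provides. Either way, the mapping stays predictable to anyone who knows the alphabet.

```javascript
const BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

// Non-negative integer -> string in the given alphabet (most significant digit first).
function encodeId(n, alphabet = BASE62) {
  if (!Number.isSafeInteger(n) || n < 0) throw new RangeError("expected a non-negative safe integer");
  const base = alphabet.length;
  let out = "";
  do {
    const digit = n % base;
    out = alphabet[digit] + out;
    n = (n - digit) / base; // exact division, no floating-point rounding
  } while (n > 0);
  return out;
}

// String -> integer; rejects characters outside the alphabet.
function decodeId(s, alphabet = BASE62) {
  const base = alphabet.length;
  let n = 0;
  for (const ch of s) {
    const digit = alphabet.indexOf(ch);
    if (digit === -1) throw new Error(`invalid character: ${ch}`);
    n = n * base + digit;
  }
  if (!Number.isSafeInteger(n)) throw new RangeError("value too large");
  return n;
}
```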
I had a similar issue: I had primary IDs in the database, but I did not want to expose them to the user; it would've been much better to show some sort of a hash instead. So, I wrote hashids.
Documentation: http://www.hashids.org/php/
Source: https://github.com/ivanakimov/hashids.php
Hashes created with this class are unique and decryptable. You can provide a custom salt value, so others cannot decrypt your hashes (not that it's a big problem, but still a "good-to-have").
To encrypt a number, you would do this:
require('lib/Hashids/Hashids.php');
$hashids = new Hashids\Hashids('this is my salt');
$hash = $hashids->encrypt(123);
Your $hash would now be: YDx
You can also set minimum hash length as the second parameter to the constructor so your hashes can be longer. Or if you have a complex clustered system you could even encrypt several numbers into one hash:
$hash = $hashids->encrypt(2, 456); /* aXupK */
(for example, if you have a user in cluster 2 and an object with primary id 456) Decryption works the same way:
$numbers = $hashids->decrypt('aXupK');
$numbers would then be: [2, 456].
The good thing about this is that you don't even have to store these hashes in the database. You could get the hash from the URL once a request comes in, decrypt it on the fly, and then pull by primary IDs from the database (which is obviously an advantage in speed).
Same with output: you could encrypt the IDs on the way out and display the hash to the user.
EDIT:
Changed urls to include both doc website and code source
Changed example code to adjust to the main lib updates (current PHP lib version is 0.3.0 - thanks to all the open-source community for improving the lib)
Auto-incrementing can easily be crawled. These cannot be predicted, and therefore cannot be sequentially crawled.
I suggest going with a double-URL format (similar to SO URLs):
yoursite.com/video_idkey/url_friendly_video_title
If you require both the ID and the title in the URL, you could then use simple numbers like 0001, 0002, 0003, etc.
Generating these keys can be really simple. You could use the uniqid() function in PHP to generate 13 chars, or 23 with more entropy.
If you want short URLs and predictability is not a concern, you can convert the auto-incrementing ID to a higher base.
Here is a small function that generates a random key each time. It has a very small chance of repeating the same ID, but uniqueness is not guaranteed.
function uniqueKey($limit = 10) {
$characters = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
$randstring = '';
for ($i = 0; $i < $limit; $i++) {
// valid indexes run from 0 to strlen - 1; random_int() is cryptographically secure (PHP 7+)
$randstring .= $characters[random_int(0, strlen($characters) - 1)];
}
return $randstring;
}
source: generate random unique IDs like YouTube or TinyURL in PHP
Consider using something like:
$id = base64_encode(md5(uniqid(),true));
uniqid will get you a unique identifier. MD5 will diffuse it, giving you a 128-bit result. Base64-encoding that gives you 6 bits per character, for an identifier of 24 characters (22 without the trailing == padding); replace + and / with URL-safe characters before putting it in a URL. It is hard to guess casually, but since uniqid is based on the current time, the input has limited entropy. If you want to be even more paranoid, upgrade from md5 to sha1 or higher.
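One way to do it is with a hash function fed a unique input every time.
Example (since you've tagged the question with PHP):
$uniqueID = null;
do {
// '.' concatenates strings in PHP; microtime() changes on every iteration
$uniqueID = sha1($fileName . microtime());
} while (!isUnique($uniqueID)); // isUnique() is your own lookup, e.g. a database query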
There should be a library for PHP to generate these IDs. If not, it's not difficult to implement it.
The advantage is that later you won't have name conflicts when you try to reorganize or merge resources from different servers. With numeric IDs you would have to change some of them to resolve conflicts, and that would change URLs, leading to an SEO hit.
So much of this depends on what you need to do. How 'unique' is unique? Are you serving up the unique IDs, and do they mean something in your DB? If so, a sequential number might be OK.
On the other hand, if you use sequential numbers, someone could systematically steal your content by iterating through the numbers.
There are filesystem commands that will generate unique file names; you could use those.
Or GUIDs.
Results of hash functions like SHA-1 or MD5 and GUIDs tend to become very long, which is probably something you don't want. (You've specifically mentioned YouTube as an example: Their identifiers stay relatively short even with the bazillion videos they are hosting.)
This is why you might want to look into converting the numeric IDs you use behind the scenes into another base when putting them into URLs. Flickr, for example, uses Base58 for their canonical short URLs. Details about this are available here: http://www.flickr.com/groups/api/discuss/72157616713786392/. If you are looking for a generic solution, have a look at the PEAR package Math_Basex.
Please note that even in another base, the IDs can still be predicted from outside of your application.
I don't have a formula, but we do this on a project I'm on (I can't share it). We basically generate one character at a time and append it to the string.
Once we have a completed string, we check it against the database. If there is no other, we go with it. If it is a duplicate, we start the process over. Not very complicated.
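A Node.js sketch of that generate, check and retry loop. crypto.randomInt supplies the random characters; the `taken` set is a hypothetical stand-in for the database lookup, and the function names and retry limit are arbitrary assumptions. In production a UNIQUE index should still back the check, since two requests could pick the same key at the same moment.

```javascript
const crypto = require("crypto");

const CHARS = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

// One character at a time from a cryptographically secure source;
// crypto.randomInt(max) returns an integer in [0, max).
function randomKey(length = 10) {
  let key = "";
  for (let i = 0; i < length; i++) key += CHARS[crypto.randomInt(CHARS.length)];
  return key;
}

// Generate, check against what is already taken, and start over on a clash.
function uniqueKey(taken, length = 10, maxTries = 10) {
  for (let i = 0; i < maxTries; i++) {
    const key = randomKey(length);
    if (!taken.has(key)) {
      taken.add(key);
      return key;
    }
  }
  throw new Error("could not find a free key; consider a longer length");
}
```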
The advantage is, I guess that of a GUID.
This is not PHP, but it can be converted to PHP. Or, since it's JavaScript and runs client-side without slowing down the server, it can be used when you post whatever needs a unique ID to your PHP.
Here is a way to create unique IDs, limited to 9,007,199,254,740,992 (2^53) unique IDs.
It always returns 9 characters;
for example, 9,007,199,254,740,992 encodes to iE2XnNGpF.
You can encode a large number, then decode the generated 9-character string,
and it returns the number.
Basically, this function uses the base-62 index together with Math.log() and Math.pow() to get the right index for each digit of the number. I would explain more about the function, but I found the original some time ago, can't find the site anymore, and it took me a very long time to understand how it works. Anyway, I rewrote the function from scratch, and this one is 2-3 times faster than the one I found.
I looped through 10 million numbers, checking that each one survives the encode/decode round trip: it took 33 seconds with this version and 90 seconds with the other one.
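var UID = {
  // 62-character index: 'a' is digit 0, 'Z' is digit 61
  ix: 'abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ',
  // Encode a non-negative integer (at most 2^53) as 9 chars, least significant digit first
  enc: function (N) {
    if (N > 9007199254740992) throw new RangeError('OMG no more uid\'s');
    var M = Math, F = M.floor, L = M.log, P = M.pow, r = '', I = UID.ix, l = I.length, i;
    // Index of the highest digit. Math.log can land just below an exact power
    // (Math.log(1000) / Math.log(10) gives 2.9999999999999996), so correct it upwards.
    var top = N > 0 ? F(L(N) / L(l)) : 0;
    if (P(l, top + 1) <= N) top++;
    for (i = top; i >= 0; i--) {
      r += I.charAt(F(N / P(l, i)) % l);
    }
    // Left-pad with 'a' (zero) to 9 chars, then reverse
    return UID.rev(new Array(10 - r.length).join('a') + r);
  },
  // Decode a string produced by enc back into the number
  dec: function (S) {
    var s = UID.rev(S), r = 0, i, l = s.length, I = UID.ix, j = I.length, P = Math.pow;
    for (i = 0; i <= l - 1; i++) {
      r += I.indexOf(s.charAt(i)) * P(j, l - 1 - i);
    }
    return r;
  },
  rev: function (a) { return a.split('').reverse().join(''); }
};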
As I wanted a 9-character string, I also padded the generated string with a's, which represent 0's.
To encode a number you need to pass a Number and not a string.
var uniqueId=UID.enc(9007199254740992);
To decode the Number again you need to pass the 9char generated String
var id=UID.dec(uniqueId);
Here are some examples:
console.log(UID.enc(9007199254740992)) //iE2XnNGpF (9 quadrillion, i.e. nine million billion)
console.log(UID.enc(1)) //baaaaaaaa
console.log(UID.enc(10)) //kaaaaaaaa
console.log(UID.enc(100)) //Cbaaaaaaa
console.log(UID.enc(1000)) //iqaaaaaaa
console.log(UID.enc(10000)) //sBcaaaaaa
console.log(UID.enc(100000)) //Ua0aaaaaa
console.log(UID.enc(1000000)) //cjmeaaaaa
console.log(UID.enc(10000000)) //u2XFaaaaa
console.log(UID.enc(100000000)) //o9ALgaaaa
console.log(UID.enc(1000000000)) //qGTFfbaaa
console.log(UID.enc(10000000000)) //AOYKUkaaa
console.log(UID.enc(100000000000)) //OjO9jLbaa
console.log(UID.enc(1000000000000)) //eAfM7Braa
console.log(UID.enc(10000000000000)) //EOTK1dQca
console.log(UID.enc(100000000000000)) //2ka938y2a
As you can see, there are a lot of a's, and you don't want that, so just start with a high number.
Let's say your DB ID is 1: just add 100000000000000 so that you have 100000000000001,
and your unique ID looks like a YouTube ID: 3ka938y2a.
I don't think you'll easily use up the other 8,907,199,254,740,992 unique IDs.