Amazon S3 creating unique keys for every object - php

My app users upload their files to one bucket. How can I ensure that each object in my S3 bucket has a unique key to prevent objects from being overwritten?
At the moment I'm encrypting filenames with a random string in my php script before sending the file to S3.
For the sake of the discussion let's suppose that the uploader finds a way to manipulate the filename on upload. He wants to replace all the images on my site with a picture of a banana. What is a good way to prevent overwriting files in S3 if encryption fails?
Edit: I don't think versioning will work because I can't specify a version id in an image URL when displaying images from my bucket.

Are you encrypting, or hashing? If you are using md5 or sha1 hashes, an attacker could potentially construct a hash collision and make you slip on a banana skin. If you are encrypting without a random initialization vector, an attacker might be able to deduce your key after uploading a few hundred files. Either way, encryption is probably not the best approach: it is computationally expensive, difficult to implement correctly, and you can get a safer mechanism for this job with less effort.
If you prepend a random string to each filename, using a reasonably reliable source of entropy, you shouldn't have any issues, but you should check whether the file already exists anyway. Coding a loop that checks with S3::GetObject and generates a new random string on a collision might seem like a lot of effort for something that will almost never need to run, but "almost never" still means it will happen eventually.

Checking for a file with that name before uploading it would work.
If the file already exists, re-randomize the file name, and try again.
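For illustration, a rough sketch of that check-and-retry loop, assuming the AWS SDK for PHP v3 (an Aws\S3\S3Client instance in $s3) and a placeholder bucket name; doesObjectExist() issues a HEAD request under the hood:

// Generate a random key and retry until it does not collide with an existing object.
do {
    $key = bin2hex(random_bytes(16)) . '-' . basename($originalName); // 32 hex chars of entropy
} while ($s3->doesObjectExist('my-bucket', $key));

$s3->putObject([
    'Bucket'     => 'my-bucket',
    'Key'        => $key,
    'SourceFile' => $localPath,
]);

With 16 random bytes per key the loop will almost never repeat, and the check costs one extra request per upload.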

Related

Best practice for storing private images on a webserver

For example, if you had an online community that allowed the sending of private images between members, like a digital penpal or dating website.
What would be the best practice for securing these images on a webserver and the best practice for displaying them to the authenticated user?
Here is what I have done so far:
Store Images outside of public root.
Retrieve images via one time code instead of the actual image location.
Randomised hashed image names and folder names that are not easy to guess.
PHP script to authenticate user before displaying the image.
Outside of root seems to be one of the best ways to store the images to make them hard to access, but what if the server itself is directly hacked into?
Is there a way to hash and salt the image files so it can only be displayed once the hash and salt matches, even if a hacker had the file?
Would this be possible to return via PHP or SQL?
I was thinking of encoding the images to base64 and salting the base64 with a salt generated from a randomly generated password per user (Is this possible?)
Or is there a better method?
For basic protection, the things you have described could be enough, maybe even too much, in the sense that if the folders are outside of the www root, randomizing folder names won't add much security but will increase complexity.
Based on a risk assessment that you should conduct for your scenario, you can choose to do more. Of course if you find that you can lower the risk of a $100 breach with the cost of $10000, you probably don't want to do that. So do the maths first. :)
I can see two major threats to your solution: one is a bug in the access control logic that allows a user to download images that he was not supposed to be able to access. The other is an attacker gaining access to your web server and downloading images (as your web server needs to have access to image files, this is not necessarily root/admin access, which increases the risk).
An idea one could think of would be to encrypt images on the server. However, with encryption, key management is usually the problem, and that is exactly the case now. There is not much point in encryption with a key that your application can access anyway, as an attacker could also access that key in case of a successful application level attack (and also in case of a server/OS level attack, because the user running your web server and/or application must have access to the key).
In theory, you could generate a public/private keypair for all of your users. When somebody uploads an image, you would generate a symmetric key for the image, encrypt the image with that key, and then encrypt the symmetric key with each intended recipient's public key and store encrypted keys (and metadata) with the image. The private keys for users should also be encrypted, preferably with a key derived from the user's password with a proper key derivation function like PBKDF2. One implication is that you can only get the user's private key when the user logs in, because you don't store his password, so that's the only time you have it. This means you would have to store your user's decrypted private key in server memory at least, where it is not really safe (and any other store is much worse). This would still provide protection against offline attackers though (somebody having access to backups for instance), and it would also limit attack scope to victim users that log on while the server is compromised (meaning after it is compromised, but before you realize this). Another drawback is the complexity of this solution - crypto is hard, it would be really easy to mess this up without experience. This would also mitigate the threat posed by an access control flaw, because unintended images could not be decrypted with the logged on user's private key.
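As a very rough sketch of that per-user keypair idea using PHP's libsodium extension (PHP 7.2+); the function and constant names are real, but the storage layout and pwhash cost parameters are assumptions to adapt, not a vetted design:

// Per-user keypair, with the private key wrapped by a password-derived key.
$keypair   = sodium_crypto_box_keypair();
$publicKey = sodium_crypto_box_publickey($keypair);
$secretKey = sodium_crypto_box_secretkey($keypair);

$salt    = random_bytes(SODIUM_CRYPTO_PWHASH_SALTBYTES);
$wrapKey = sodium_crypto_pwhash(
    SODIUM_CRYPTO_SECRETBOX_KEYBYTES, $password, $salt,
    SODIUM_CRYPTO_PWHASH_OPSLIMIT_MODERATE, SODIUM_CRYPTO_PWHASH_MEMLIMIT_MODERATE
);
$skNonce  = random_bytes(SODIUM_CRYPTO_SECRETBOX_NONCEBYTES);
$sealedSk = sodium_crypto_secretbox($secretKey, $skNonce, $wrapKey);
// store $publicKey, $salt, $skNonce and $sealedSk with the user record

// Encrypting an uploaded image for one recipient:
$imageKey   = random_bytes(SODIUM_CRYPTO_SECRETBOX_KEYBYTES);
$imageNonce = random_bytes(SODIUM_CRYPTO_SECRETBOX_NONCEBYTES);
$cipher     = sodium_crypto_secretbox(file_get_contents($tmpPath), $imageNonce, $imageKey);
$wrappedKey = sodium_crypto_box_seal($imageKey, $recipientPublicKey); // repeat per recipient
// store $cipher, $imageNonce and each $wrappedKey; decryption reverses these steps after login

Even with a library doing the primitives, the hard parts remain exactly the ones described above: when to unwrap the private key, where it lives in memory, and what happens on password reset.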
A completely different approach would be to separate your application into several components: a logon service (similar to SSO), your web server, and a backend image service. When your user logs on to the authentication provider (AP), he would in this case receive a token with claims, signed by the AP. When talking to the web application, he would use this token for authentication. What differentiates this solution from the previous is that when a user requests images, the web application would pass his token to the image service, and the image service could on the one hand store images securely on a box not directly accessible from the internet, and on the other hand it could authorize whether for the token received it wants to return images (it could verify the token with the AP or by itself, depending on the implementation you choose). In this case, even if an attacker compromises the web application, he would still not be able to produce (sign) a valid token from the AP to get access to images on the image service, and it could potentially be much harder to compromise the image service. Of course in case of a breach on the web server, the attacker would still be able to observe any image flowing through, meaning any user that logs on while the server is compromised would still lose his images to the attacker. The added complexity of this solution is even worse than the previous one, which means it is easy to get this wrong too, and it's also costly both to develop and maintain.
Note that none of these solutions protect images from server admins, which may or may not be a requirement for your app.
I hope this answer sheds some light on the difficulties involved in making it significantly more secure than your current solution. Having said all this, implementation is key, and details (the actual code level vulnerabilities) probably matter the most.
You have these listed as some of your security protocols:
"1. Store Images outside of public root.
2. Retrieve images via one time code instead of the actual image location.
...
4. PHP script to authenticate user before displaying the image."
This should be enough, but then you mentioned...
"3. Randomised hashed image names and folder names that are not easy to guess."
If you are actually doing the first two correctly, 1 and 2, then it's not really possible for 3 to have any effect. If the images are outside of the webserver directory, then it doesn't matter if the folder names and image names are easy to guess.
Your code will look something like this (for doing 1 and 2), assuming the script runs from the root directory of your webserver (i.e., example.com/index.php)...
$file_location = '../../images/' . $some_id_that_is_authenticated_and_cleansed_for_slashes . '.jpg';
header('Content-Type: image/jpeg'); // tell the browser it is receiving an image
readfile($file_location);           // grabs the file and sends it to the user
If you are doing the above, then 3 is redundant. Hashing the names, etc., won't help if your site is hacked (the Apache security is bypassed) and it won't help if your site isn't doing the above (since users can then just directly access the URLs). Except for that redundancy, the rest seems perfect.

ZF2: How to serve a secure image as part of a web page

I have built a ZF2 application which includes user profiles and I now want to allow users to upload and display their photo as part of their profile. Something like what you see in LinkedIn.
Uploading the photo seems easy enough (using Zend\InputFilter\FileInput()). I have that working fine.
It seems to me that storing them outside of the web root makes a lot of sense. (For example, I don't need to worry about users running wget on the directory.) But how do I then embed these images as part of a web page?
If they were within the web root I would simply do <img width="140" src="/img/filename.jpg"> but obviously that's not possible if they are in a secure location. What's the solution?
You're right. Web developers traditionally obfuscate the paths used to store images to prevent malicious individuals from retrieving them in bulk (as you allude to with your wget comment).
So while storing a user's avatar in /uploads/users/{id}.jpg would be straightforward (and not necessarily inappropriate, depending on your use case), you can use methods to obfuscate the URL. Keep in mind: There are two ways of approaching the problem.
More simply, you want to ensure one cannot determine an asset URL based on "public" information (e.g., the user's primary key). So if a user has a user ID of 37, accessing their avatar won't be as simple as downloading /uploads/users/37.jpg.
A more vigorous approach would be to ensure one cannot relate a URL back to its public information. A URL like /uploads/users/37/this-is-some-gibberish.jpg puts its ownership "on display"; the user responsible for this content must be the user with an ID of 37.
A simple solution
If you'd like to go with the simpler approach, generate a fast hash based on a set property (e.g., the user's ID) and an application-wide salt. For PHP, take a look at "Fastest hash for non-cryptographic uses?".
$salt = 'abc123';                       // change this, keep it secret, store it as an env variable
$userId = $user->id;                    // 37
$hash = crc32($salt . strval($userId)); // 1202873758
Now we have a unique hash and can store the file at this endpoint: /uploads/users/37/1202873758.jpg. Anytime we need to reference a user's avatar, we can repeat this logic to generate the hash needed to create the filename.
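To make that concrete, a hypothetical endpoint that rebuilds the path and streams the avatar from wherever it is stored might look like this; the directory layout and the avatar.php name are invented for the example, and the crc32 scheme is the one shown above:

// avatar.php?id=37
$userId = (int) $_GET['id'];        // cast to int so no path trickery is possible
// ... authenticate/authorize the viewer here before serving anything ...
$hash = crc32($salt . strval($userId));
$path = '/var/app/uploads/users/' . $userId . '/' . $hash . '.jpg';
if (!is_file($path)) {
    http_response_code(404);
    exit;
}
header('Content-Type: image/jpeg');
readfile($path);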
The collision issue
You might be wondering, why can't I store it at /uploads/users/1202873758.jpg? Won't this keep my user's identity safe? (And if you're not wondering, that's OK, I'll explain for other readers.) We could, but the hash generated is not unique; with a sufficiently large number of users, we will overwrite the file with some other user's avatar, rendering our storage solution impotent.
To be more secretive
To be fair, /uploads/users/1202873758.jpg is a more secretive filename. Perhaps even /uploads/1202873758.jpg would be better. To store files with paths like these, we need to ensure uniqueness, which requires not only generating a hash but also checking it for uniqueness, accommodating the inevitable collisions, and storing the (potentially modified) hash, as well as being able to retrieve it from storage as needed.
Depending on your application stack, you could implement this an infinite number of ways, some more suitable than others depending on your needs, so I won't dive into it here.
If you use ZfcUser, you can use this module: HtProfileImage.
It contains a view helper to display images very easily!

Best way to detect same files in php

I have a web server where users upload their files. I want to implement logic that warns the user if they try to upload the same file twice.
My first idea is to save the md5_file() value to the database and then check whether any files have the same MD5 value. File sizes range from 2 megabytes up to 300.
I have heard that MD5 has collisions. Is it OK to use it?
Is it efficient to use this logic with 300 megabyte files?
Yes, this is exactly what hashing is for. Consider using sha1; it's an all-around superior hashing algorithm.
No, you probably shouldn't worry about collisions. The odds of people accidentally causing collisions are extremely low, close enough to impossible that you shouldn't waste any time thinking about it up-front. If you are seriously worried about it, use the hash as a first check, then compare the file sizes, then compare the files bit-by-bit.
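A sketch of that check in PHP; the PDO connection and the files table with hash, size and path columns are invented for the example:

$hash = sha1_file($uploadedPath);   // first-pass check
$size = filesize($uploadedPath);    // second, much cheaper check

$stmt = $pdo->prepare('SELECT path FROM files WHERE hash = ? AND size = ?');
$stmt->execute([$hash, $size]);

$isDuplicate = false;
foreach ($stmt->fetchAll(PDO::FETCH_COLUMN) as $existingPath) {
    // Optional paranoia: confirm byte-for-byte before declaring a duplicate.
    if (file_get_contents($existingPath) === file_get_contents($uploadedPath)) {
        $isDuplicate = true;
        break;
    }
}

For 300 MB files the byte-for-byte step should really compare in chunks rather than via file_get_contents(), but in practice the hash-plus-size check alone is almost always enough.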
MD5 collisions are rare enough that in this case it shouldn't be an issue.
If you are dealing with large files, however, you'll have to remember that you are essentially uploading the file anyway before you even check whether it is a duplicate.
Upload -> MD5 -> Compare -> Keep or Disregard.
If checking for duplicates, you can usually get away with using sha1.
Or to bulletproof it:
$hash = hash_file("sha512", $filename); // 128 char hex output
(And yes, MD5 does have known collisions, but they are deliberately constructed ones; the chance of an accidental collision is vanishingly small even for very large files.)

Best hash algorithm for a data index (ie, crc)

Basically, I'm keeping track of file modifications, in something like:
array(
    'crc-of-file' => 'latest-file-contents'
)
This is because I'm working on the file contents of different files at runtime at the same time.
So, the question is, what hashing algorithm should I use over the file contents (as a string, since the file is being loaded anyway)?
Collision prevention is crucial, as well as performance. I don't see any security implications in this so far.
Edit: Another thing I could have used instead of hashing contents is the file modification timestamp, but I wasn't sure how reliable it is. On the other hand, I think it's faster to monitor that timestamp than to hash the file each time.
CRC is not a hashing algorithm but a checksum algorithm, so your chances of collision will be quite high.
md5 is quite fast and the collision risk is rather minimal for your kind of application / volume. If you are buffering the file, you may also want to look at incremental hashes using the hash extension.
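The incremental API looks roughly like this; the 8 KB read size is just an example:

// Incremental hashing with the hash extension: no need to hold the whole file in memory.
$ctx = hash_init('md5');
$fp  = fopen($path, 'rb');
while (!feof($fp)) {
    hash_update($ctx, fread($fp, 8192)); // feed the file in 8 KB slices
}
fclose($fp);
$checksum = hash_final($ctx);            // same result as md5_file($path)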
A bit more complex, but also worth looking at (if you have it) is the Inotify extension.

Webserver on the fly decrypting?

I am dealing with a concept for a project that involves absolutely critical data.
The most important part is that it needs to be stored encrypted.
An encrypted file system that is mounted from which the webserver serves the files is not enough.
The key to decrypt the data should be passed in the request URI on a secured connection along with a hash and a timestamp.
The hash, based on timestamp, key and filename validates the URI and stores it on a list, so it can only be accessed once.
The important part now is that the webserver should take the file from disk and serve it decrypted using the key it got from the request URI.
It should also be efficient and fast. This also requires an encryption method that does not require the whole file to be scanned, so that the file can be decrypted progressively. I think AES can do this with specified block sizes that are encrypted independently.
So one option would be reading the source file into a PHP script in chunks of a few megabytes, decrypting each chunk with AES and printing the decrypted content. The script then forgets the previous data and continues with the next chunk until EOF.
If AES doesn't support that, I can just encrypt chunks of the file of a defined size separately, concatenate them, and do the same when serving the files. However, I would like to stick to a standard that I don't have to reinvent, so that I can also use standard libraries to encrypt the files.
However, this will be very inefficient.
Do you know of any apache/lighttpd/nginx module or some better method?
You should open the file with mmap() and then decrypt the data on-the-fly as needed.
I don't see anything more appropriate for this than G-Wan (200 KB), which offers native C scripts and AES encryption (no external libraries needed even if C scripts can link with any existing library).
If you need to achieve the best possible performances, then this is the way to go.
You may want to look into PHP's Stream Filters ( http://php.net/stream.filters ); with a bit of glue code, you could make it read an encrypted file with the regular PHP file access functions, and it would be mostly transparent to existing code.
If you can't find a PHP module that lets you decrypt the files chunk/block-wise, you can always pre-split the file into appropriately sized blocks and encrypt each separately.
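As an illustration of that pre-split idea (not something the question's design calls for verbatim), a sketch assuming each chunk was encrypted independently with AES-256-CBC and stored as a 16-byte IV followed by its ciphertext; the chunk size, layout and key handling are assumptions:

$plainChunk  = 1048576;               // 1 MiB of plaintext per chunk
$cipherChunk = 16 + $plainChunk + 16; // IV + ciphertext incl. one block of PKCS#7 padding

$in = fopen($encryptedPath, 'rb');
while (!feof($in)) {
    $block = fread($in, $cipherChunk);
    if ($block === false || $block === '') {
        break;
    }
    $iv    = substr($block, 0, 16);
    $plain = openssl_decrypt(substr($block, 16), 'aes-256-cbc', $key, OPENSSL_RAW_DATA, $iv);
    if ($plain === false) {
        throw new RuntimeException('chunk failed to decrypt');
    }
    echo $plain;                       // stream this chunk straight to the client
    flush();
}
fclose($in);

A counter-based mode such as AES-CTR or per-chunk AES-GCM would handle random access and integrity better, but the point is the same: the layout has to be designed so a chunk can be located and decrypted without reading the whole file.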
Of course, remember that even if you're only sending out small pieces of the plaintext at a time, there's still plenty of other places that this vulnerable data can be held - particularly in the web server's output buffers. Consider the extreme case of a large-ish file being downloaded by someone stuck on a 2400 baud modem. You may very well decrypt and queue the entire file before even the first chunk's been downloaded, leaving the entire file in the clear in a buffer somewhere.
There's no off-the-shelf solution to provide what you require. And while you've provided a bit of information about how the data will be retrieved, you've not given many clues as to how the data will get onto the webserver in the first place.
You're jumping through lots of hoops to try to ensure that the data is not compromised - but if you're decrypting it on the server, then there is a risk not only of the data being compromised but also of the key being compromised. i.e. there's more theatre than substance in the architecture.
You seem to be flexible in the algorithm used for the encryption - which implies that you have some control over the architecture - so there is some scope to resolve these problems.
The hash, based on timestamp, key and filename validates the URI and stores it on a list, so it can only be accessed once.
How does that ensure it is only accessed once? Certainly it could be used to reduce the window of opportunity for CSRF - but it does not eliminate it.
The script then forgets the previous data and continues with the next chunk until EOF.
This fundamentally undermines the objective of encryption - patterns within the data will still be apparent - and this provides a mechanism for leveraging brute force attacks against the data - even if the block size is relatively large. Have a look at the images here for a simple demonstration.
A far more secure approach would be to use CBC, and do the encryption/decryption on the client.
There are JavaScript implementations of several encryption algorithms (including AES); this page has a good toolkit. And with HTML5 / localStorage you can build a complete client-side app in HTML/JavaScript.
As you're starting to discover - just using a clever encryption algorithm does not make your application secure - it sounds like you need to go back and think about how you store and retrieve data before you worry about the method you use for encrypting it.
