I am developing a site framework in php (codeigniter) and want to introduce image versioning on image uploads so that I can take advantage of image caching. The easiest approach would just be to md5 the image and use that as the file name but I don't like this approach for the following reasons:
1) Not SEO friendly on the image names
2) md5 hashes seem unnecessarily long, and therefore a larger database field is required.
So I am considering using an approach such as the following:
Start the filename with the entered name of the image with underscores instead of spaces then add a randomly generated integer, say 8 digits long. This will mean I have to check for an existing image by that name and then regenerate the integer if one exists (however unlikely that is).
Presumably I will also have to use a unique filename for every image size, so I guess the solution here would be to add a prefix representing the image size.
Now I want to get this right first time since it will be a pain to change once the framework is deployed so I am really just looking for input on
a) Whether my concerns are justified (particularly, does the filename do anything for SEO, and does the length of a random string of numbers affect it)
b) Whether there is anything else I should be concerned about or check for with my proposed approach.
c) Is there an easier approach, perhaps a hashing algorithm which produces much shorter results?
d) Is there already a CI lib out there that does this?
Thank you for your input and advice
This answers a few of your questions:
Replacing spaces with underscores is not enough to have a clean filename as you'd need to check for more weird characters, but you can use sanitize_filename() method in CI's security library: http://ellislab.com/codeigniter/user-guide/libraries/security.html
If you do want to preserve the original filename, your approach sounds good to me. Though, the 8-digit integer at the end of the filename can be replaced by '-1', '-2', '-3' using a simple incremental loop that checks whether a file with that ending already exists.
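The incremental-suffix idea above could be sketched roughly like this. It is only a minimal sketch in plain PHP, not CodeIgniter's own implementation: the helper name unique_filename() and the whitelist of allowed characters are assumptions made for illustration.

```php
<?php
// Hypothetical helper (not part of CodeIgniter): returns a cleaned-up file
// name that does not yet exist in $dir, appending -1, -2, ... if needed.
function unique_filename(string $dir, string $name): string
{
    $info = pathinfo($name);
    // Conservative whitelist: anything outside A-Z, a-z, 0-9, "_" and "-"
    // becomes a single underscore.
    $base = preg_replace('/[^A-Za-z0-9_-]+/', '_', $info['filename']);
    if ($base === '') {
        $base = 'file'; // e.g. the name was just ".htaccess"
    }
    $ext = isset($info['extension'])
        ? '.' . strtolower(preg_replace('/[^A-Za-z0-9]+/', '', $info['extension']))
        : '';

    $candidate = $base . $ext;
    // Append -1, -2, ... until no file with that name exists.
    for ($i = 1; file_exists($dir . '/' . $candidate); $i++) {
        $candidate = $base . '-' . $i . $ext;
    }
    return $candidate;
}
```

Note that checking and then writing is not atomic, so two simultaneous uploads could still pick the same name; whether that matters depends on your traffic.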
File Upload library is something you can check out - http://ellislab.com/codeigniter/user-guide/libraries/file_uploading.html. It is flexible and can be configured to keep the original filenames. Getting sanitize_filename() from Security lib to work along should do exactly what you need.
In all my CI applications I always use encrypted filenames (this optional feature is provided by the CI file upload class). At the same time I can configure the library not to overwrite an already existing file, by adding a number to the name (if no encryption is used) or by just giving it another encrypted name (when the encryption option is on). I like it this way as it keeps the filenames consistent and clean (although long and not SEO-friendly; however, the alt attribute gives the image more exposure to search engines).
I can't seem to find a reference. I am assuming the PHP function file_exists uses system calls on linux and that these are safe for any string that does not contain a \0 character, but I would like to be sure.
Does anyone have (preferably non-anecdotal) information regarding this? Is it vulnerable to injection if I don't check the strings first?
I guess you need to, because the user may enter something like:
../../../somewhere_else/some_file
and access a file that they are not allowed to access.
I suggest that you build the absolute path of the file independently in your PHP code and take only the file name from the user with basename(),
or strip ../ from the input, like:
$escaped_input = str_replace("../","",$input);
Note that a single pass like this can be bypassed: ....// becomes ../ after the replacement, so basename() is the safer option.
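A rough sketch of the basename() suggestion, with an extra realpath() check on top. The helper name resolve_inside() is made up for this example, and the sketch assumes the files you want to expose all live directly in one base directory.

```php
<?php
// Hypothetical helper: resolve a user-supplied name inside $baseDir.
// Returns the real path, or null if the file does not exist or the
// resolved path ends up outside $baseDir.
function resolve_inside(string $baseDir, string $userInput): ?string
{
    $base = realpath($baseDir);
    if ($base === false) {
        return null; // base directory itself is missing
    }
    // basename() drops any directory components, including "../".
    $path = realpath($base . DIRECTORY_SEPARATOR . basename($userInput));
    if ($path === false) {
        return null; // file does not exist
    }
    // Defence in depth: the resolved path must still be under $base
    // (this also rejects ".." and symlinks pointing elsewhere).
    $prefix = $base . DIRECTORY_SEPARATOR;
    if (strncmp($path, $prefix, strlen($prefix)) !== 0) {
        return null;
    }
    return $path;
}
```

With this, a request for ../../a.txt is reduced to a.txt inside the base directory, and a bare .. is rejected.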
It depends on what you're trying to protect against.
file_exists doesn't do any writing to disk, which means that the worst that can happen is that someone gains some information about your file system or the existence of files that you have.
In practice, however, if you're doing something later on with the same file that was previously checked with file_exists, such as including it, you may wish to perform more stringent checks.
I'm assuming that you may be passing arbitrary values, possibly sourced from user input, into this function.
If that is the case, it somewhat depends on why you actually need to use file_exists in the first place. In general, for any filesystem function that the user can pass values directly into, I'd try to filter out the string as much as possible. This is really just being pedantic and on the safe side, and may be unnecessary in practice.
So, for example, if you only ever need to check the existence of a file in a single directory, you should probably strip out directory delimiters of all sorts.
From personal experience, I've only ever passed user input into a file_exists call for mapping to a controller file, in which case, I'd just strip out any non-alphanumeric + underscore character.
UPDATE: reading your recently added comments: no, there aren't any special characters, as this isn't executed in a shell. Even \0 should be fine, at least on newer PHP versions (I believe older ones would cut the string before the \0 when sent to underlying filesystem calls).
I'm using crypt() which in the particular case uses an md5 hash with 12 character salt.
Here is an example of the string crypt() returns modified from php.net, crypt documentation.
$1$rasmusle$rISCgZzpwk3UhDidwX/in0
Here is the salt which also includes the encoding type.
$1$rasmusle$
Here is the encoding type (MD5 in this case).
$1$
and finally the hash value.
rISCgZzpwk3UhDidwX/in0
You cannot have forward slashes in file names, as they will be interpreted as folder separators.
Should I simply remove all the forward slashes, and are there other issues with the character set that crypt() uses?
It looks like you want to prevent / allow access to the image for specific users. If that is the case I would do the following:
Store the images outside of the document root. This makes sure the images cannot simply be directly requested.
Store the image's original name in the database and also store the sha1_file() hash in the same record. This adds the benefit of not having duplicate images on your server. Although images are small, it prevents cluttering of the system.
When somebody requests a "private" image they will request it through a PHP file which will check whether the user has the privileges to access the file and if so serves the file (from the database).
With the above method you will have the most control over who can request the images and your users will thank you for that.
Note that you should not simply store all images in the same folder, because filesystems have limits on how many files can be stored in a single directory, and many slow down well before that limit is reached.
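One common way around the single-directory problem is to spread files over subdirectories named after the first characters of the hash. This is just a sketch under that assumption; the helper name sharded_path() and the choice of two levels of two characters each are arbitrary.

```php
<?php
// Hypothetical helper: map a hex content hash (e.g. from sha1_file())
// to a nested relative path such as "ab/cd/abcd...".
function sharded_path(string $hash, int $levels = 2): string
{
    if (!ctype_xdigit($hash) || strlen($hash) < $levels * 2) {
        throw new InvalidArgumentException('Expected a hex hash');
    }
    $parts = [];
    for ($i = 0; $i < $levels; $i++) {
        // Two hex characters per level gives at most 256 subdirectories.
        $parts[] = substr($hash, $i * 2, 2);
    }
    $parts[] = $hash;
    return implode('/', $parts);
}
```

For example, sharded_path('abcdef0123') gives ab/cd/abcdef0123, which you would append to your storage root outside the document root.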
A simple example of a PHP script that serves an image would look something like the following:
<?php
// always set the header and change it according to the type of the image
header("Content-type: image/jpeg");
echo file_get_contents('/path/to/the/image.jpg');
$1$ is the identifier of the algorithm that was used to create the hash.
You can just use the md5()/md5_file() or sha1()/sha1_file() functions, which create a hash without that additional information, unless you want to use different algorithms at the same time.
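To illustrate: the plain hash functions return hexadecimal strings, so there is no / to worry about. A small sketch, where hashed_name() is a made-up helper and the extension is assumed to be validated elsewhere:

```php
<?php
// Hypothetical helper: derive a file-system-safe name from a file's contents.
function hashed_name(string $path, string $ext): string
{
    // sha1_file() returns 40 hexadecimal characters (0-9, a-f),
    // so the result never contains "/" or "$".
    $hash = sha1_file($path);
    if ($hash === false) {
        throw new RuntimeException("Cannot read $path");
    }
    return $hash . '.' . $ext;
}
```

Because the name depends only on the contents, uploading the same image twice yields the same name.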
Run urlencode() over your hash, and it should replace all of the '/' with %2F. I know this isn't a perfect fix, because I think servers such as Apache still block web requests with '%2F' in the URL by default. Just my 2 cents on the matter.
ALWAYS normalize user-provided data, including file names, unless you want to be hacked by someone uploading a file whose name contains a NUL byte to fool PHP. Specify the allowed characters (e.g. A-Za-z0-9) and convert all others to, e.g., an underscore. Or use sha1/md5 to create a hash from the filename and store the file under that name.
EDIT
This will replace all characters except for A-Z, a-z, 0-9 with underscore _:
$normalizedName = preg_replace('/[^A-Za-z0-9]/', '_', $userProvidedName);
I'm stuck on a crazy project that has me looking for a strange solution. I've got an XFA PDF document generated by an outside party. There are several checkmark characters '✓' in the PDFs that I need to simply change to 'X'. The reason for this is beyond my control. I'm just looking for a way to change the ✓'s into X's. Can anyone point me in the right direction? Is it possible?
Currently we use PHP and TCPDF for creating "our" server PDFs, but this particular PDF is generated outside of my control by a third party that doesn't want to alter their way of doing things. To make things worse, I don't know how many checkmarks there are or where they may appear. It's just one very specific character that needs changing. Does anyone know a way of hacking the document to change the character?
Character 2713
http://www.fileformat.info/info/unicode/char/2713/index.htm
Yes, I think you can. To my (rather limited) knowledge of the PDF format, you can only reliably search and replace strings of one character in length, since they are created by placing strings of variable length at specific co-ordinates, in an arbitrary order. The string 'hello' could therefore be one string of five letters, or five strings of one letter each or some combination thereof, all placed in the correct position (and in whatever order the print driver decided upon).
I'm afraid I don't know of any libraries that will do this, but I'd be surprised if they don't exist. You'll need to read PDF objects in, do the replacement, and write them out to a new file. I'd start off researching around the answers to this question.
Edit: this looks like it might be useful.
We use UUIDs for our primary keys in our db (generated by php, stored in mysql). The problem is that when someone wants to edit something or view their profile, they have this huge, scary, ugly uuid string at the end of the url. (edit?id=.....)
Would it be safe (read: still unique) if we only used the first 8 characters, everything before the first hyphen?
If it is NOT safe, is there some way to translate it into something else shorter for use in the url that could be translated back into the hex to use as a lookup? I know that I can base64 encode it to bring it down to 22 characters, but is there something even shorter?
EDIT
I have read this question and it said to use base64. again, anything shorter?
Shortening the UUID increases the probability of a collision. You can do it, but it's a bad idea. Using only 8 characters means just 4 bytes of data, so you'd expect a collision once you have about 2^16 IDs - far from ideal.
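The "about 2^16" figure comes from the birthday bound: with 2^32 possible values, collisions become likely at around the square root of that. A small sketch to play with the numbers, assuming the 8 characters are uniformly random (which is not the case for every UUID version); collision_probability() is a made-up helper using the usual approximation.

```php
<?php
// Approximate probability of at least one collision among $n IDs drawn
// uniformly at random from 2^$bits possible values:
//   p ~= 1 - exp(-n(n-1) / (2 * 2^bits))
function collision_probability(int $n, int $bits): float
{
    $space = pow(2.0, $bits);
    $x = ((float) $n * ((float) $n - 1.0)) / (2.0 * $space);
    return -expm1(-$x); // same as 1 - exp(-x), but accurate for tiny x
}
```

With 65536 IDs and 32 bits this comes out at roughly 0.39, i.e. about a 39% chance that at least two IDs already share the same first 8 characters.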
Your best option is to take the raw bytes of the UUID (not the hex representation) and encode it using base64. Or, just don't worry much, because I seriously doubt your users care what's in the URL.
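A minimal sketch of the raw-bytes-to-base64 idea, using the URL-safe variant of base64 so the result needs no escaping. The function names are made up for the example, and it assumes the UUID is in the usual 8-4-4-4-12 hex form.

```php
<?php
// Hypothetical helper: 36-character UUID -> 22-character URL-safe string.
function uuid_to_short(string $uuid): string
{
    $hex = str_replace('-', '', $uuid);
    if (strlen($hex) !== 32 || !ctype_xdigit($hex)) {
        throw new InvalidArgumentException('Not a UUID');
    }
    $bin = hex2bin($hex); // the 16 raw bytes
    // URL-safe base64: "+" becomes "-", "/" becomes "_", padding is dropped.
    return rtrim(strtr(base64_encode($bin), '+/', '-_'), '=');
}

// Hypothetical helper: reverse of uuid_to_short(); returns lowercase hex.
function short_to_uuid(string $short): string
{
    $b64 = strtr($short, '-_', '+/');
    $b64 .= str_repeat('=', (4 - strlen($b64) % 4) % 4); // restore padding
    $bin = base64_decode($b64, true);
    if ($bin === false || strlen($bin) !== 16) {
        throw new InvalidArgumentException('Not an encoded UUID');
    }
    $hex = bin2hex($bin);
    return sprintf(
        '%s-%s-%s-%s-%s',
        substr($hex, 0, 8),
        substr($hex, 8, 4),
        substr($hex, 12, 4),
        substr($hex, 16, 4),
        substr($hex, 20, 12)
    );
}
```

Nothing is lost in the round trip, so the short form can be decoded back to the full UUID for the database lookup.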
Don't cut a single bit out of that UUID: you have no control over the algorithm that produced it, there are multiple possible implementations, and the implementation is subject to change (for example, it may change with the version of PHP you're using).
If you ask me, a UUID in the address bar doesn't look scary or difficult at all. Even a simple Google search for "UUID" produces worse-looking URLs, and everybody's used to looking at Google URLs!
If you want nicer looking URL's, take a look at the address bar of this stackoverflow.com article. They're using the article ID followed by the title of the question. Only the ID part is relevant, everything else is there to make it easy on the eyes of readers (go ahead and try it, you can delete anything after the ID, you can replace it with junk - doesn't matter).
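The ID-plus-title pattern could look something like this. It is a sketch only: the /questions/ prefix and both function names are assumptions for illustration, and it presumes you have (or add) a numeric ID next to the UUID.

```php
<?php
// Hypothetical helper: build a readable path from an ID and a title.
function build_path(int $id, string $title): string
{
    // Lowercase, collapse every run of other characters into one hyphen.
    $slug = trim(preg_replace('/[^a-z0-9]+/', '-', strtolower($title)), '-');
    return '/questions/' . $id . '/' . $slug;
}

// Hypothetical helper: only the numeric ID is used for the lookup;
// whatever follows it is ignored.
function extract_id(string $path): ?int
{
    return preg_match('#^/questions/(\d+)(?:/|$)#', $path, $m) === 1
        ? (int) $m[1]
        : null;
}
```

So /questions/42/how-do-i-shorten-a-uuid and /questions/42/junk both resolve to ID 42.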
It is not safe to truncate UUIDs. Also, they are designed to be globally unique, so you aren't going to have luck shortening them. Your best bet is to either assign each user a unique number, or let users pick a custom (unique) string (like a username or nickname) that can be mapped back to the UUID. So you could have edit?id=.... or edit?name=blah, and you then translate the name into the UUID in your script.
It depends on how you're generating the UUID - if you're using PHP's uniqid then it's the right-most digits that are more "unique". However, if you're going to truncate the data, then there's no real guarantee that it'll be unique anyway.
Irrespective, I'd say that this is a somewhat sub-optimal approach - is there no way you can use a unique (and ideally meaningful) textual reference string instead of an ID in the query string? (Hard to know without more knowledge of the problem domain, but it's always a better approach in my opinion, even if SEO, etc. isn't a factor.)
If you were using this approach, you could also let MySQL generate the unique IDs, which is probably a considerably more sane approach than attempting to handle this in PHP.
If you're worried about scaring users with the UUID in the URL, why not write it out to a hidden form field instead?
Can anyone please let me know how I can encrypt/decrypt a file instead of a string? I mean I need to encrypt the entire file; it may be an Excel sheet, a document, or even a text file.
instead of string.
That rather implies that you already know how to encrypt the string - and since you're being specific about the algorithm, that you can create an appropriate representation for the other tools being used to operate on the data. But you haven't said what mode of operation you need to use - implementing this using CBC is trivial.
It's also not stated, but implied in your question, that the data is too large to load into a string (otherwise it's simply a case of encrypting file_get_contents()).
There doesn't seem to be much in the way of documentation, but I would expect the modified key required for ECB is updated in the resource created by mcrypt_module_open() and modified by mcrypt_generic_init(). Then it's just a matter of feeding in parts of the file sized as a multiple of the block size (see mcrypt_get_block_size).
See http://www.php.net/manual/en/function.mcrypt-module-open.php
C.
I'm a little confused, can't you just read/write the string to a file using functions like file_get_contents and file_put_contents?
If you need an encryption-class there are some over at PHP classes. There is also a paid solution here: phpAES.
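For files that do fit in memory, the file_get_contents/file_put_contents route could be sketched as below. Note this uses the OpenSSL extension rather than mcrypt (mcrypt was removed from core PHP in 7.2); the function names, the choice of AES-256-CBC, and storing the IV in front of the ciphertext are all assumptions of this sketch, and the key is expected to be 32 random bytes.

```php
<?php
// Hypothetical helper: encrypt $src into $dst with AES-256-CBC.
// Assumes the whole file fits in memory and $key is 32 random bytes.
function encrypt_file(string $src, string $dst, string $key): void
{
    $plain = file_get_contents($src);
    if ($plain === false) {
        throw new RuntimeException("Cannot read $src");
    }
    $iv = random_bytes(16); // one AES block; must be fresh for every file
    $cipher = openssl_encrypt($plain, 'aes-256-cbc', $key, OPENSSL_RAW_DATA, $iv);
    if ($cipher === false) {
        throw new RuntimeException('Encryption failed');
    }
    // The IV is not secret, so it is stored in front of the ciphertext.
    file_put_contents($dst, $iv . $cipher);
}

// Hypothetical helper: reverse of encrypt_file().
function decrypt_file(string $src, string $dst, string $key): void
{
    $data = file_get_contents($src);
    if ($data === false || strlen($data) < 32) {
        throw new RuntimeException("Cannot read $src or file too short");
    }
    $iv = substr($data, 0, 16);
    $plain = openssl_decrypt(substr($data, 16), 'aes-256-cbc', $key, OPENSSL_RAW_DATA, $iv);
    if ($plain === false) {
        throw new RuntimeException('Decryption failed (wrong key or corrupt file)');
    }
    file_put_contents($dst, $plain);
}
```

This works the same for spreadsheets, documents or text since it treats the file as raw bytes. CBC on its own does not detect tampering, so you may want to add a MAC or use an authenticated mode if that matters for you.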
I guess it is better to create your own library for it and expose an API that just accepts a file path instead of the file's content. It can open and read the file and do the encryption/decryption.
You can use your own or a pre-existing algorithm for the encryption/decryption. You can also give that API an argument that accepts the file path where the decrypted data should be stored, or have it replace the same file, or whatever.