File naming convention and allowed characters between different OS - php

I'm writing a piece of code in PHP for saving email attachments. Can I assume that this will never fail because of differences in allowed filename characters between operating systems?
foreach($message->attachments as $a)
{
// Make dir if not exists
$dir = __DIR__ . "/saved/$uid"; // Message id
if (!file_exists($dir)) mkdir($dir) or die("Cannot create: $dir");
// Save the attachment using original!!! filename as found in email
$fp = fopen($dir . '/' . $a->filename, 'w+');
fwrite($fp, $a->data);
fclose($fp);
}

You should never use a name that you have no control over, it can contain all sorts of characters, like ../../...
You can use a function like basename to clean it up and a constant like DIRECTORY_SEPARATOR to separate directories.
Personally I would rename the file but you can also filter the variables before using them.

It is good practice to replace certain characters that may occur in filenames on Windows.
Unix can handle almost any character in a file name (but not "/" and 0x00 [the null character]), but to prevent encoding problems and difficulties when downloading a file, I would suggest replacing anything that does not match
/[A-Za-z0-9._-]/, which satisfies the POSIX fully portable filename format.
So preg_replace('/[^A-Za-z0-9._-]/', '_', $filename); will do a good job. (PHP's preg functions have no g flag; preg_replace already replaces every match.)
A more generous approach would be to replace only |\?*<":>+[]\x00/ which leaves special language characters like öäü untouched and is compatible with FAT32, NTFS, any Unix and Mac OS X.
In that case use preg_replace('/[|\\\\?*<":>+\[\]\/\x00]/', '_', $filename);
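The two suggestions above (keep only the last path component, then restrict the character set) can be combined in one small function. This is a minimal sketch, not code from the answers: the name portable_filename() and the 'unnamed' fallback are assumptions made for illustration.

```php
<?php
// Hypothetical helper: reduce an untrusted filename to the POSIX portable
// character set A-Z a-z 0-9 . _ -
function portable_filename(string $untrusted): string
{
    // Treat "\" like "/" so Windows-style paths are split too, then keep
    // only the last path component.
    $name = basename(str_replace('\\', '/', $untrusted));
    // Replace every byte outside the portable set with "_".
    $name = preg_replace('/[^A-Za-z0-9._-]/', '_', $name);
    // Drop leading dots so ".", ".." and hidden-file names cannot result.
    $name = ltrim($name, '.');
    return $name === '' ? 'unnamed' : $name;
}

echo portable_filename('../../etc/passwd'), "\n";      // passwd
echo portable_filename('my report: final?.pdf'), "\n"; // my_report__final_.pdf
```

Because the pattern has no u modifier, a multi-byte UTF-8 character is replaced by one "_" per byte; that is harmless here but makes names with many non-ASCII characters hard to read.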

No, you should assume that this has a high probability of failing, for two reasons:
What if two emails have attachments with the same name (selfie.jpg, for example)?
What if the filename contains unacceptable characters?
You should use an internal naming convention (user+datetime+sequential, for example) and save the names in a MySQL table with at least three fields (the last two below are optional):
Id - Autonumbered
filename - as saved by your php
original name - as in the original email
optional username - or usercode or email address of whomever sent email
optional datetime stamp
save the original filename as a VARCHAR and you will be able to keep track of original name and even show it, search for it, etc.
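As a rough sketch of that idea, the row for one attachment could be built like this before it is inserted into the table. The function name, the array keys and the exact naming pattern are assumptions for illustration; a random suffix stands in for the "sequential" part so that no counter has to be kept.

```php
<?php
// Hypothetical helper: build the record to store for one attachment.
function attachment_record(string $originalName, string $username): array
{
    // The stored name is built only from values we control; the original
    // name is kept as data and never touches the filesystem.
    $stored = sprintf(
        '%s_%s_%s',
        preg_replace('/[^A-Za-z0-9]/', '', $username),
        date('Ymd-His'),
        bin2hex(random_bytes(8)) // 16 hex characters of randomness
    );
    return [
        'filename'      => $stored,
        'original_name' => $originalName,
        'username'      => $username,
        'created_at'    => date('Y-m-d H:i:s'),
    ];
}

print_r(attachment_record('../../selfie.jpg', 'alice@example.org'));
```

The original name should go into the database through a prepared statement, and be escaped again whenever it is shown in HTML.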

Related

unicode characters in image URL - 404

I am trying to open an image that has Latin characters in its name (113_Atlético Madrid).
I saved it by encoding its name with the PHP function rawurlencode(), so now its new name is 113_Atl%C3%A9tico%20Madrid. But when I try to open it with a URL such as mysite.com/images/113_Atl%C3%A9tico%20Madrid.png I get a 404 error.
How can I fix this issue?
PHP code:
if(isset($_FILES['Team'])){
$avatar = $_FILES['Team'];
$model->avatar = "{$id}_".rawurlencode($model->name).".png";
if(!is_file(getcwd()."/images/avatars/competitions/{$model->avatar}")){
move_uploaded_file($avatar['tmp_name']['avatar'], getcwd()."/images/avatars/teams/{$model->avatar}");
}
}
%-encoding is for URLs. Filenames are not URLs. You use the form:
http://example.org/images/113_Atl%C3%A9tico%20Madrid.png
in the URL, and the web server will decode that to a filename something like:
/var/www/example-site/data/images/113_Atlético Madrid.png
You should use rawurlencode() when you're preparing the filename to go in a URL, but you shouldn't use it to prepare the filename for disc storage.
There is an additional problem here in that storing non-ASCII filenames on disc is something that is unreliable across platforms. Especially if you run on a Windows server, the PHP file APIs like move_uploaded_file() can very likely use an encoding that you didn't want, and you might end up with a filename like 113_AtlÃ©tico Madrid.png.
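How that kind of garbling comes about can be reproduced in a few lines: the UTF-8 bytes of "é" are read as if they were Windows-1252 and then converted to UTF-8 again. This is a demonstration only, assuming the mbstring extension is loaded; it is not what move_uploaded_file() does internally.

```php
<?php
// "é" is the two bytes C3 A9 in UTF-8.
$utf8Name = "113_Atl\u{00E9}tico Madrid.png";

// Misread those bytes as Windows-1252 and convert them "back" to UTF-8:
// C3 becomes "Ã" and A9 becomes "©".
$garbled = mb_convert_encoding($utf8Name, 'UTF-8', 'Windows-1252');

echo $garbled, "\n"; // 113_AtlÃ©tico Madrid.png
```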
There isn't necessarily an easy fix to this, but you could use any form of encoding, even %-encoding. So if you stuck with your current rawurlencode() for making filenames:
/var/www/example-site/data/images/113_Atl%C3%A9tico%20Madrid.png
that would be OK but you would then have to use double-rawurlencode to generate the matching URL:
http://example.org/images/113_Atl%25C3%25A9tico%2520Madrid.png
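The relationship between the on-disk name and the URL in that scheme can be checked with a short sketch (variable names are illustrative):

```php
<?php
$original = "113_Atl\u{00E9}tico Madrid.png";

// Name as stored on disk: percent-encoded once, so it is plain ASCII.
$onDisk = rawurlencode($original);

// Path segment for the URL: encode the on-disk name again, so that the web
// server's single decoding step yields exactly the on-disk name.
$inUrl = rawurlencode($onDisk);

echo $onDisk, "\n"; // 113_Atl%C3%A9tico%20Madrid.png
echo $inUrl, "\n";  // 113_Atl%25C3%25A9tico%2520Madrid.png
```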
But in any case, it's very risky to include potentially-user-supplied arbitrary strings as part of a filename. You may be open to directory traversal attacks, where the name contains a string like /../../ to access the filesystem outside of the target directory. (And these attacks commonly escalate to execute-arbitrary-code attacks for PHP apps, which are typically deployed with weak permissions.) You would be much better off using an entirely synthetic name, as suggested (+1) by @MatthewBrown.
(Note this still isn't the end of security problems with allowing user file uploads, which it turns out is a very difficult feature to get right. There are still issues with content-sniffing and plugins that can allow image files to be re-interpreted as other types of file, resulting in cross-site scripting issues. To prevent all possibility of this it is best to only serve user-supplied files from a separate hostname, so that XSS against that host doesn't get you XSS against the main site.)
If you do not need to preserve the name of the file (and often there are good reasons not to) then it might be best to simply rename the file entirely. The current timestamp is a reasonable choice.
if(isset($_FILES['Team'])){
$avatar = $_FILES['Team'];
$date = new DateTime();
$model->avatar = "{$id}_".$date->format('Y-m-d-H-i-s').".png"; // no 'P' (UTC offset): it would add ':' which Windows forbids in filenames
if(!is_file(getcwd()."/images/avatars/competitions/{$model->avatar}")){
move_uploaded_file($avatar['tmp_name']['avatar'], getcwd()."/images/avatars/teams/{$model->avatar}");
}
}
After all, what the file was called before it was uploaded shouldn't be that important and much more importantly if two users have a picture called "me.png" there is much less chance of a conflict.
If you are married to the idea of encoding the file name then I can only point you to other answers:
How do I use filesystem functions in PHP, using UTF-8 strings?
PHP - FTP filename encoding issue
PHP - Upload utf-8 filename

PHP - Windows - filename incorrect after upload (ü saved as Ã¼ etc.) [duplicate]

I have this home made app that allows multiple file uploads, I pass the files to php with AJAX, create new dir with php, move there uploaded files and save the dir location to database. Then to see the files I run listing of the directory location saved in the db.
The problem is that files come from all around the world, so very often they have some non-Latin characters, for example ü. When I echo the filename in PHP, names appear correctly even when they are written in Arabic, yet they are saved on the server with garbled names, for example Ã¼ in place of ü. When I list the files from the directory I can see the name ü.txt instead of Ã¼.txt, but when I click on it the server returns an "object not found" error (since on the server it is saved as Ã¼.txt and it reads the link as ü.txt).
I tried some of the suggested solutions as for example using iconv, but the filenames are still being saved the same way.
I could swear the problem wasn't present when the web app was hosted on linux, but at the moment I am not so sure about it anymore. Right now I temporarily run it on xampp (on Windows) and it seems like filenames are saved using windows-1252 encoding (default Windows' encoding on the server). Is it default Windows encoding related problem?
To be honest I do not know how to approach that problem and I would appreciate any help. Should I keep on trying to save the files in different character encoding or would it be better to approach it different way and change the manner of listing the already saved and encoded files?
EDIT. According to the (finally) closed bug report it was fixed in php 7.1.
In the end I solved it with the following approach:
When uploading the files I urlencode the names with rawurlencode()
When fetching the files from server they are obviously URL encoded so I use urldecode($filename) to print correct names
Links in a href are automatically translated, so for example "%20" becomes a " " and URL ends up being incorrect since it links to incorrect filename. I decided to encode them back and print them ending up with something like this: print $dirReceived.rawurlencode($file); ($dirReceived is the directory where received files are stored, defined earlier in the code)
I also added download attribute with urldecode($filename) to save the file with UTF-8 name when needed.
Thanks to this I have files saved on the server with url encoded names. Can open them in browser (very important as most of them are *.pdf) and can download them with correct name which lets me upload and download even files with names written in Arabic, Cyrillic, etc.
So far I tested it and looks good. I am thinking of implementing it in production code. Any concerns/thoughts on it?
EDIT.
Since there are no objections I select my answer as the one that solved my problem. After doing some testing everything looks good on client and server side. When saving the files on server they are URL encoded, when downloading them they are decoded and saved with correct names.
At the beginning I was using the code:
for($i=0;$i<count($_FILES['file']['name']);$i++)
{
move_uploaded_file($_FILES['file']['tmp_name'][$i],
"../filepath/" . $_FILES['file']['name'][$i]);
}
This method caused the problem upon saving the file and replaced every UTF-8 special character with its cp1252 misreading (ü saved as Ã¼ etc.), so I added one line and replaced that code with the following:
for($i=0;$i<count($_FILES['file']['name']);$i++)
{
$fname= rawurlencode($_FILES['file']['name'][$i]);
move_uploaded_file($_FILES['file']['tmp_name'][$i],
"../filepath/" . $fname);
}
This allows me to save any filename on server using URL encoding (% and two hexadecimals) which is compatible with both cp1252 and UTF-8.
To list the saved files I use filepaths I have saved in DB and list them for files. I was using the following code:
if (is_dir($dir)){
if ($dh = opendir($dir)){
while (($file = readdir($dh)) !== false){
if(is_file($dir . $file)){
echo "<li><a href='".$dir.$file."' download='".$file ."'>".$file."</a></li><br />";
}
}
closedir($dh);
}
}
Since URL encoded filenames were decoded automatically I changed it to:
if (is_dir($dir)){
if ($dh = opendir($dir)){
while (($file = readdir($dh)) !== false){
if(is_file($dir . $file)){
echo "<li><a href='";
print $dir.rawurlencode($file);
echo "' download='" . urldecode($file) ."'>".urldecode($file)."</a></li><br />";
}
}
closedir($dh);
}
}
I don't know if this is the best way to solve it but works perfectly, also I am aware that it is generally a good practice not to use php to generate html tags but at the moment I have some critical bugs that need addressing so first that and then I'll have to work on the appearance of the code itself.
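One thing the listing code above does not do is escape the names for HTML, so a filename containing a quote or "<" would break the markup. A hedged sketch of a link builder that keeps the same scheme (names stored percent-encoded, encoded once more for the href) could look like this. The function name download_link() and the $dirUrl parameter are assumptions, and rawurldecode() is used in place of urldecode() so that a literal "+" in a name is not turned into a space.

```php
<?php
// Hypothetical helper: build one list item for a file whose on-disk name
// was produced by rawurlencode(). $dirUrl is assumed to end in "/".
function download_link(string $dirUrl, string $storedName): string
{
    $displayName = rawurldecode($storedName);    // name shown to the user
    $href = $dirUrl . rawurlencode($storedName); // encoded again for the URL
    // Escape everything for the HTML context to avoid broken markup or XSS.
    return sprintf(
        '<li><a href="%s" download="%s">%s</a></li>',
        htmlspecialchars($href, ENT_QUOTES, 'UTF-8'),
        htmlspecialchars($displayName, ENT_QUOTES, 'UTF-8'),
        htmlspecialchars($displayName, ENT_QUOTES, 'UTF-8')
    );
}

echo download_link('/received/', rawurlencode('a "b" <c>.pdf')), "\n";
```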
EDIT2
Also the great thing is I do not have to change names of the already uploaded files which in my case is a big advantage.
Are you using $_FILES['upfile']['name'] to name the file? That could create your problem.
How about using GNU Recode?
$fileName = recode_string('latin1',$_FILES['upfile']['name']);
Syntax:
recode_string(string recode type,string $string)
Valid Character sets: http://www.faqs.org/rfcs/rfc1345.html
Somehow you must validate the characters in the uploaded file name.
You could also try sprintf. The formatted string characters can be unpredictable, but will probably work.
$fileName = pathinfo($_FILES['upfile']['name'], PATHINFO_FILENAME);
$fileName = sprintf('./uploads/%s',$fileName);
When you save the file name to the database, escape it with
$fileName = mysqli_real_escape_string($link, $fileName); // $link is your mysqli connection

algorithm to name files with no probability of repetition

Can someone suggest an algorithm in PHP to name uploaded files so that a name never repeats? I wonder how YouTube, which has millions of videos, does it.
Right now I use a random number, get its 16 character sha1 hash and name the file with that, but I'm pretty sure it will eventually repeat and generate an error, as the file will not be able to be saved in the file system.
something like:
$name = sha1(substr(sha1(md5($randomnumber)),0,10));
somebody once told me that its impossible to break the hash generated by this code or at least it'll take 100 years to break it.
you could do:
$uniq = md5(uniqid(rand(), true));
You could also add the user id of the user uploading the file, like:
$uniq = $user_id_of_uploader."_".md5(uniqid(rand(), true));
Generate a GUID (sometimes called UUID) using a pre-existing implementation. Depending on the version, GUIDs are built from values such as the computer's identity, the timestamp and a counter, or from random bits, so in practice they will not repeat.
If making a GUID isn't available, using sha1 on the entire input and using the entire output of it is second best.
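Where no UUID library is at hand, a random (version 4) UUID can be assembled from random_bytes(), available since PHP 7. Note that this is the random variant, not the computer-and-timestamp variant described above, and the function name uuid_v4() is made up for this sketch.

```php
<?php
// Sketch of a version 4 (random) UUID in the RFC 4122 layout.
function uuid_v4(): string
{
    $bytes = random_bytes(16);
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40); // set the version to 4
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80); // set the RFC 4122 variant
    // 32 hex characters, split into 8 groups of 4 and joined as 8-4-4-4-12.
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

$name = uuid_v4() . '.jpg';
echo $name, "\n";
```

With 122 random bits a collision is possible in principle but far too unlikely to matter for naming uploads.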
$name = 'filename' . $user_id . md5((string) microtime(true)) . '.extension'; // include $user_id if available
Try to remove special characters and white spaces from the file name.
If you are saving name in database then a recursive function can be helpful.
Do below with proper methods.
First slice its extension and filename
Now Trim the filename
Change multiple Space into single space
Replace special characters and whitespace with _
Prefix with the current timestamp using strtotime and a salt using md5(uniqid(rand(), true)), separated by _ (thanks to @Sudhir)
Suffix with a special signature using str_pad and limit the text length of a file
Now again add extension and formatted file name
hope it make sense.
Thanks
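The steps above could be sketched roughly as follows. The function name, the 50-character limit and the use of time() for the timestamp are assumptions, and the "special signature" suffix step is left out because the answer does not say what it should contain.

```php
<?php
// Hypothetical helper following the steps listed above.
function unique_upload_name(string $original): string
{
    // 1. Split off the extension and keep only letters and digits in it.
    $ext  = preg_replace('/[^a-z0-9]/', '', strtolower(pathinfo($original, PATHINFO_EXTENSION)));
    // 2. Trim the base name.
    $base = trim(pathinfo($original, PATHINFO_FILENAME));
    // 3. Collapse runs of whitespace into a single space.
    $base = preg_replace('/\s+/', ' ', $base);
    // 4. Replace special characters and spaces with "_".
    $base = preg_replace('/[^A-Za-z0-9]+/', '_', $base);
    // Limit the length of the descriptive part.
    $base = substr($base, 0, 50);
    // 5. Prefix with the current timestamp and a salt, separated by "_".
    $salt = md5(uniqid((string) mt_rand(), true));
    $name = time() . '_' . $salt . '_' . $base;
    // 6. Re-attach the extension, if there was one.
    return $ext === '' ? $name : $name . '.' . $ext;
}

echo unique_upload_name('  My   Holiday (1).JPG'), "\n";
```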
I usually just generate a string for the filename (implementation is not incredibly important), then check if a file already exists with that name. If so, append a counter to it. If you somehow have a lot of files with the same base filename, this could be inefficient, but assuming your string is unique enough, it shouldn't happen very often. There's also the overhead of checking that the file exists.
$base_name = generate_some_random_string(); // use whatever method you like
$extension = '.jpg'; // Change as necessary
$file_name = $base_name . $extension;
$i = 0;
while (file_exists($file_name)) {
$file_name = $base_name . $i++ . $extension;
}
/* insert code to save the file as $file_name */

How to make a name of file in such way 31f3207.jpeg [duplicate]

How to automatically create a new file name when you upload a file to the server?
I upload a picture file with <input type="file" />. The file has the name picture_1.jpg. But I'd like to store it in the filesystem with a name like ec0b4c5173809b6aa534631f3207.jpg. How are such new names created? Is some special script/generator used to make such names?
For example picture in FB: http:// a3.sphotos.ak.fbcdn.net/hphotos-ak-snc6/190074_10150167009837952_8062627951_8311631_4439729_n.jpg.
The name of it is 190074_10150167009837952_8062627951_8311631_4439729_n.jpg. But original name was different for sure. So I'd like to change the name of uploaded file the same way. How is it possible?
In PHP I use uniqid() for naming files that I store from uploads
If you want an extra long or seemingly more random id, you can sha1() or md5() the uniqid to create a hash.
You could of course also use those two methods to create a hash of the filename.
For example, the following code can be used to generate a new name for a file
$file = 'somefile.jpg';
$new_name = uniqid();
$new_name .= '.'.pathinfo($file,PATHINFO_EXTENSION);
You didn't mention what programming language you are using.
in PHP, that cryptic name and original filename can be found on array $_FILES.
let's assume your form's element name is userfile, you can get that cryptic name from basename($_FILES['userfile']['tmp_name']) and the original name from basename($_FILES['userfile']['name'])
Visit here for more information :
http://www.php.net/manual/en/features.file-upload.post-method.php
You could MD5 the name to create a string without spaces or dots: md5($filename) (http://php.net/md5) although I'm not sure that was what you were asking?
Edit: you could also use uniqid() (http://php.net/manual/en/function.uniqid.php)
You just rewrite the name of the file upon moving it from the temporary files on the server, to the location you want to move it to.
move_uploaded_file($_FILES['userfile']['tmp_name'], $filePN)
Where $filePN is the path and name of where you want to move the file to.
the special script that you're talking about, however, can be a multitude of things from an MD5 hash of the input name, to an incremented number to prevent overwrites.
Looks like a GUID.
If you are an ASP.NET programmer, use something like this:
string filename = Guid.NewGuid().ToString() + ".jpg"; // new Guid() would yield the all-zero GUID

Create Unique Image Names

What's a good way to create a unique name for an image that my user is uploading?
I don't want to have any duplicates so something like MD5($filename) isn't suitable.
Any ideas?
as it was mentioned, i think that best way to create unique file name is to simply add time(). that would be like
$image_name = time()."_".$image_name;
Grab the file extension from uploaded file:
$ext = pathinfo($uploaded_filename, PATHINFO_EXTENSION);
Grab the time to the second: time()
Grab some randomness: md5(microtime())
Convert time to base 36: base_convert (time(), 10, 36) - base 36 compresses a 10 byte string down to about 6 bytes to allow for more of the random string to be used
Send the whole lot out as a 16 char string:
$unique_id = substr( base_convert( time(), 10, 36 ) . md5( microtime() ), 0, 16 ) . '.' . $ext; // pathinfo() returns the extension without the dot
I doubt that will ever collide - you could even not truncate it if you don't mind very long file names.
If you actually need a filename (it's not entirely clear from your question) I would use tempnam(), which:
Creates a file with a unique filename, with access permission set to 0600, in the specified directory.
...and let PHP do the heavy lifting of working out uniqueness. Note that as well as returning the filename, tempnam() actually creates the file; you can just overwrite it when you drop the image file there.
You could take a hash (e.g., md5, sha) of the image data itself. That would help identify duplicate images too (if it was byte-for-byte, the same). But any sufficiently long string of random characters would work.
You can always rig it up in a way that the file name looks like:
/image/0/1/012345678/original-name.jpg
That way the file name looks normal, but it's still unique.
I'd recommend sha1_file() over md5_file(). It's less prone to collisions.
You could also use hash_file('sha256', $filePath) to get even better results.
http://php.net/manual/en/function.uniqid.php maybe?
You can prefix it with the user id to avoid collisions between two users (within the same microsecond).
For short names:
$i = 0;
while(file_exists($name . '_' . $i)){
$i++;
}
WARNING: this might fail on a multi-threaded server if two users upload an image with the same name at the same time.
In that case you should include the md5 of the username.
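The race described in that warning comes from checking for the file and creating it in two separate steps. One way around it, sketched here under the assumption that an empty placeholder file is acceptable, is fopen() with mode 'x', which creates the file and fails if it already exists. The function name claim_filename() and the limit of 1000 attempts are made up for this example.

```php
<?php
// Hypothetical helper: claim a free filename atomically.
// Returns the claimed path, or false if no name could be claimed.
function claim_filename(string $dir, string $base, string $ext)
{
    for ($i = 0; $i < 1000; $i++) {
        $suffix = $i === 0 ? '' : '_' . $i;
        $path = $dir . DIRECTORY_SEPARATOR . $base . $suffix . $ext;
        // Mode 'x' creates the file and fails if it already exists, so the
        // check and the creation happen in one step. '@' hides the warning.
        $fp = @fopen($path, 'x');
        if ($fp !== false) {
            fclose($fp);
            return $path; // an empty placeholder now exists under this name
        }
    }
    return false; // directory not writable, or too many name clashes
}
```

The upload can then be written over the placeholder, for example with move_uploaded_file($tmpName, $path).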
MD5 can produce 2^128 (about 3.4 × 10^38) different values.
Plus, just to be on the safe side, you could use:
$newfilename = md5(time().'image');
if(file_exists('./images/'.$newfilename)){
$newfilename = md5(time().$newfilename);
}
//uploadimage
How big is the probability of two users uploading an image with the same name in the same microsecond?
try
$currTime = microtime(true);
$finalFileName = cleanTheInput($fileName)."_".$currTime;
// you can also append a _.rand(0,1000) in the end to have more foolproof name collision
function cleanTheInput($input)
{
// do some formatting here ...
}
This would also help you in tracking the upload time of the file for analysis. or may be sort the files,manage the files.
For good performance and uniqueness you can use approach like this:
files will be stored on a server with names like md5_file($file).jpg
the directory to store file in define from md5 file name, by stripping first two chars (first level), and second two (second level) like that:
uploaded_files\30\c5\30c567139b64ee14c80cc5f5006d8081.pdf
create record in database with file_id, original file name, uploaded user id, and path to file on server
on server side create script that'll get role of download providing - it'll get file by id from db, and output its content with original filename provided by user (see php example of codeigniter download_helper ). So url to file will look like that:
http://site.com/download.php?file=id
Pros:
minimal collision risk
good performance at file lookup (not much files in 1 directory, not much directories at the same level)
original file names are saved
you can adjust access to files by server side script (check session or cookies)
Cons:
Suitable only for small files, because the server has to read the file into memory before the user can download it
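The two-level directory layout from this answer can be derived from the hash with a few substr() calls. A minimal sketch, with sharded_path() and its parameters being illustrative names:

```php
<?php
// Hypothetical helper: derive a two-level storage path from a file's MD5 hash.
function sharded_path(string $root, string $hash, string $ext): string
{
    return implode(DIRECTORY_SEPARATOR, [
        $root,
        substr($hash, 0, 2), // first level: characters 1-2 of the hash
        substr($hash, 2, 2), // second level: characters 3-4 of the hash
        $hash . '.' . $ext,  // the file keeps the full hash as its name
    ]);
}

$hash = md5('example file contents'); // for a real upload: md5_file($tmpPath)
echo sharded_path('uploaded_files', $hash, 'pdf'), "\n";
```

With two hex characters per level there are at most 256 directories at each level, which keeps every directory listing small.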
try this file format:
$filename = microtime(true) . $username . '.jpg';
I think it would be good for you.
<?php
$name=uniqid(mt_rand()).$image_name;
?>
You should try to meet two goals: Uniqueness, and usefulness.
Using a GUID guarantees uniqueness, but one day the files may become detached from their original source, and then you will be in trouble.
My typical solution is to embed crucial information into the filename, such as the userID (if it belongs to a user) or the date and time uploaded (if this is significant), or the filename used when uploading it.
This may really save your skin one day, when the information embedded in the filename allows you to, for example, recover from a bug, or the accidental deletion of records. If all you have is GUIDs, and you lose the catalogue, you will have a heck of a job cleaning that up.
For example, if a file "My Holiday: Florida 23.jpg" is uploaded, by userID 98765, on 2013/04/04 at 12:51:23 I would name it something like this, adding a random string ad8a7dsf9:
20130404125123-ad8a7dsf9-98765-my-holiday-florida-23.jpg
Uniqueness is ensured by the date and time, and the random string (provided it is properly random, from /dev/urandom or CryptGenRandom).
If the file is ever detached, you can identify the user, the date and time, and the title.
Everything is folded to lower case and anything non-alphanumeric is removed and replaced by dashes, which makes the filename easy to handle using simple tools (e.g. no spaces which can confuse badly written scripts, no colons or other characters which are forbidden on some filesystems, and so on).
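A sketch of that naming scheme follows, using random_bytes() (PHP 7+), which draws from the operating system's random source. The function name and the length of the random part are assumptions.

```php
<?php
// Hypothetical helper reproducing the naming scheme described above:
// datetime, random string, user id, then the cleaned-up original name.
function descriptive_name(string $original, int $userId, DateTimeInterface $when): string
{
    $ext  = strtolower(pathinfo($original, PATHINFO_EXTENSION));
    $slug = strtolower(pathinfo($original, PATHINFO_FILENAME));
    // Fold every run of non-alphanumeric characters into a single dash.
    $slug = trim(preg_replace('/[^a-z0-9]+/', '-', $slug), '-');
    $rand = bin2hex(random_bytes(5)); // 10 hex characters
    return sprintf('%s-%s-%d-%s.%s', $when->format('YmdHis'), $rand, $userId, $slug, $ext);
}

echo descriptive_name(
    'My Holiday: Florida 23.jpg',
    98765,
    new DateTimeImmutable('2013-04-04 12:51:23')
), "\n";
```

For the example upload this yields a name of the form 20130404125123-<random>-98765-my-holiday-florida-23.jpg.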
Something like this could work for you:
while (file_exists('/uploads/' . $filename . '.jpeg')) {
$filename .= rand(10, 99);
}
Ready-to-use code:
$file_ext = substr($file['name'], -4); // e.g.'.jpg', '.gif', '.png', 'jpeg' (note the missing leading point in 'jpeg')
$new_name = sha1($file['name'] . uniqid('',true)); // this will generate a 40-character-long random name
$new_name .= ((substr($file_ext, 0, 1) != '.') ? ".{$file_ext}" : $file_ext); //the original extension is appended (accounting for the point, see comment above)
