I was wondering how I should name images uploaded using PHP & MySQL. Should I use the auto-increment number as the name of the image, for example 1.gif, or should I use some random numbers or something? I was thinking auto-increment was better. But what would be best?
Since no one's officially offered this yet: I'd advise simply naming the file after the database's unique id, nothing more, and storing the extension in the database (unless you are forcing all images to be .jpg or something, in which case you don't need to).
It is always going to be a safe file name (an integer)
It will always be unique
No need to store the file name in the db or worry about scrubbing it.
It will be as small as possible.
Why I would not use the user's username/id, as suggested by others:
There's no benefit, and no reason to expose a user's id in the file name if you don't need to.
No need to scrub it for allowed characters, which may even end up with multiple users with the same "file safe" user name.
User names may change, so it doesn't always make sense, and you don't want to have to rename files if you want them to match.
Why I would not use the original file name in any form:
There's no benefit.
You have to scrub it for allowed characters.
There will be duplicates.
Unless you are interested in vanity file names, I can't think of any reason not to just use the auto-increment id. If your DB ids are unique, your file names will be too.
If later on you do want "pretty" file names, you can use .htaccess to rewrite the requests, and/or output your images through a php script, which also has the benefit of checking for permissions and whatnot if you need it.
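A minimal sketch of the id-based naming described in this answer, assuming the row has already been inserted and its auto-increment id is known. The function name storedImageName() and the list of allowed extensions are assumptions made for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper: build the on-disk name from the row id and a
// whitelisted extension. The allowed list is an assumption for illustration.
function storedImageName(int $id, string $extension): string
{
    $allowed = ['gif', 'jpg', 'jpeg', 'png'];
    $ext = strtolower(ltrim($extension, '.'));
    if ($id < 1 || !in_array($ext, $allowed, true)) {
        throw new InvalidArgumentException('Invalid id or extension');
    }
    return $id . '.' . $ext;
}

echo storedImageName(1, 'gif'), "\n"; // prints 1.gif
```

Because the name is built only from an integer and a known extension, nothing supplied by the user ever reaches the file system.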
What about md5(microtime())? It is pretty unique.
I like to use a combination of an auto incrementing id and filename.
So if I upload the image my_photo.jpg and it gets stored with an id of 5, I would save it as 5_my_photo.jpg
This way, the original filename and extension are preserved and I can deliver it back to the user without the id prefix if I want to.
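A small sketch of this id-plus-filename scheme, under the assumption that the id comes from the database insert. Both function names are hypothetical, and the original name would still need scrubbing before being used on disk.

```php
<?php
// Hypothetical helpers for the "id_originalname" scheme.
function prefixedName(int $id, string $originalName): string
{
    // basename() drops any directory part the client may have sent.
    return $id . '_' . basename($originalName);
}

function stripIdPrefix(string $storedName): string
{
    // Split on the first underscore only, so underscores in the
    // original name survive.
    $parts = explode('_', $storedName, 2);
    return $parts[1] ?? $storedName;
}

echo prefixedName(5, 'my_photo.jpg'), "\n";    // prints 5_my_photo.jpg
echo stripIdPrefix('5_my_photo.jpg'), "\n";    // prints my_photo.jpg
```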
One good way to name the images is to append a user name to an autoincrement value padded on the left with zero, such as "00000027MyPic.jpg".
If you are worried about the image being unique, store it as time().$extension.
I also prefer to put the user's username as a prefix, but that's just me; there is no reason to do that.
Related
I'm new to PHP, and I'm trying to upload a file to a file server and the file's information to a MySQL database. I have done the file server and database upload parts, but I need to retrieve the info of a specific file from my file server folder when I click that file, and I'm trying to work out that logic. Please help me if there is a solid solution for this. (Correct me if I'm wrong: my idea was to upload the file path to the database along with the info. Will this give me a solution? The filename can be a duplicate, though.)
I figured I would write a short(for me this is short) "answer" just so I could summarize my points.
Some "Best Practices" when creating a file storage system. File storage is a broad category, so your mileage may vary for some of these. Take them just as suggestions of what I have found works well.
Filenames
Don't store the file with the name given to it by an end user. They can and will use all kinds of crappy characters that will make your life miserable. Some can be as bad as ' single quotes, which on Linux make the file awkward to read or even delete from the shell unless you escape the name. Some things can seem simple, like a space, but depending on where you use it and the OS on your server you could wind up with one%20two.txt or one+two.txt or one two.txt, which may or may not create all kinds of issues in your links.
The best thing to do is create a hash, something like sha1. The input can be as simple as {user_id}{original_name}; including the user id makes collisions with other users' filenames less likely.
I prefer doing hash_file('sha1', $path) on the uploaded file, so that if someone uploads the same file more than once you can catch it (if the contents are the same, the hash is the same). But if you expect to have large files you may want to do some benchmarking to see what kind of performance it has. I mostly handle small files, so it works fine for that.
Regardless of what you do, I would prefix it with a timestamp: time().'-'.$filename. This is useful information to have, because it's the absolute time the file was created.
-note- with the timestamp, a duplicate file can still be saved because the full name is different, but the shared hash makes the duplicate quite easy to see, and it can be verified in the database.
As for the name a user gives the file, just store that in the database record. This way you can show them the name they expect, but use a name you know is always safe for links.
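// The name as supplied by the user
$filename = 'some crapy^ fileane.jpg';
// Everything from the last dot onwards, e.g. ".jpg"
$ext = strrchr($filename, '.');
echo "\nExt: {$ext}\n";
// Hash the name so the stored name only contains safe characters
$hash = sha1($filename);
echo "Hash: {$hash}\n";
$time = time();
echo "Timestamp: {$time}\n";
$hashname = $time.'-'.$hash.$ext;
echo "Hashname: $hashname\n";
Outputs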
Ext: .jpg
Hash: bb9d2c2c7c73bb8248537a701870e35742b41c02
Timestamp: 1511853063
Hashname: 1511853063-bb9d2c2c7c73bb8248537a701870e35742b41c02.jpg
Paths
Never store the full path to the file. All you need in the database is the hash from creating the hashed name. The "root" path to the folder the file is stored in should be set in PHP. This has several benefits.
Prevents directory traversal. Because you're not passing any part of the path around, you don't have to worry as much about someone slipping a ../.. in there and going places they shouldn't. A poor example of this would be someone overwriting a .htpasswd file by uploading a file with that name and a directory traversal sequence in it.
https://en.wikipedia.org/wiki/Directory_traversal_attack
Gives more uniform-looking links: uniform size, uniform set of characters.
Maintenance. Paths change, servers change, demands on your system change. If you need to relocate those files but you stored the absolute full path to them in the DB, you're stuck gluing everything together with symlinks or updating all your records.
There are some exceptions to this. If you want to store them in a monthly folder or by username, you could save that part of the path in a separate field. But even in that case, you could build it dynamically based on data saved in the record. I have found it's best to save as little path info as possible, and then make a config value or a constant you can use in all the places you need to put the path to the file.
Also, the path and the link are very different, so by saving only the name you can link it from whatever PHP page you want without having to subtract data from the path. I've always found it easier to add to the filename than to subtract from a path.
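As a rough sketch of keeping the root path in PHP and only the name in the database: the constant UPLOAD_ROOT, its value, and the function uploadPath() are all assumptions for illustration.

```php
<?php
// Assumed location of the upload folder; in a real project this would live
// in a config file and point outside the web root.
define('UPLOAD_ROOT', '/var/www/storage/uploads');

// Build the absolute path from the stored hashname only.
function uploadPath(string $hashname): string
{
    // basename() discards directory components such as "../".
    $name = basename($hashname);
    // Reject names that are empty or that are themselves directory references.
    if ($name === '' || $name === '.' || $name === '..') {
        throw new InvalidArgumentException('Invalid file name');
    }
    return UPLOAD_ROOT . '/' . $name;
}

echo uploadPath('1511853063-bb9d2c2c7c73bb8248537a701870e35742b41c02.jpg'), "\n";
```

If the files ever move, only UPLOAD_ROOT changes; no database records need updating.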
Database (just some suggestions, use may vary)
As always with data, ask yourself: who, what, where, when
id - int, primary key, auto increment
user_id - int, foreign key - who: the user that uploaded it
hash - char(40), sha1, unique - what: the hash
hashname - varchar, {timestamp}-{hash}.{ext} - where: the file's name on the hard drive
filename - varchar - the original name given by the user, so that we can show them the name they expect (if that is important)
status - enum[public, private, deleted, pending, etc.] - status of the file. Depending on your use case, you may have to review the files, or maybe some are private so that only the user can see them, maybe some are public, etc.
status_date - timestamp|datetime - the time the status was changed
create_date - timestamp|datetime - when: the time the file was created. A timestamp is preferred as it makes some things easier, but in that case it should be the same timestamp used in the hashname.
type - varchar - MIME type, which can be useful for setting the MIME type when downloading, etc.
If you expect different users to upload the same file and you use hash_file, you can make the unique index a combined one on user_id and hash. This way it would only conflict if the same user uploaded the same file. You could also do it based on the timestamp and hash, depending on your needs.
That's the basic stuff I could think of. This isn't an absolute, just some fields I thought would be useful.
It's useful to have the hash by itself. If you store it by itself, you can put it in a CHAR(40) for sha1 (a fixed-width column, which avoids the length prefix a VARCHAR carries) and set the collation to utf8_bin, which is binary. This makes searches on it case sensitive. PHP's sha1() and hash_file() return lowercase hex, so stored hashes always compare consistently, and a binary comparison is the simplest exact match.
You can always build the hashname on the fly if you store the extension and the timestamp separately. If you find yourself building it time and time again, you may just want to store it in the DB to simplify the work in PHP.
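One way the field list could be written down as MySQL DDL, kept in a PHP string so it can be run through PDO. This is only a sketch: the table name attachments follows the query used later in this answer, while the column sizes, the status values, the index name, and the ascii_bin collation (swapped in here as a compact binary collation for hex strings) are assumptions.

```php
<?php
// MySQL DDL for the suggested fields. Column sizes, the enum values and the
// ascii_bin collation are assumptions made for illustration.
$createAttachments = <<<'SQL'
CREATE TABLE attachments (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    user_id     INT UNSIGNED NOT NULL,
    hash        CHAR(40) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
    hashname    VARCHAR(80) NOT NULL,
    filename    VARCHAR(255) NOT NULL,
    status      ENUM('public', 'private', 'deleted', 'pending') NOT NULL DEFAULT 'pending',
    status_date DATETIME NOT NULL,
    create_date DATETIME NOT NULL,
    type        VARCHAR(100) NOT NULL,
    UNIQUE KEY uniq_user_hash (user_id, hash)
) ENGINE=InnoDB
SQL;

// Usage (assumes $pdo is an open PDO connection to MySQL):
// $pdo->exec($createAttachments);
```

The combined unique key only rejects a duplicate when the same user uploads the same contents twice.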
I like just putting the hash in the link, no extension no anything so my links look like this.
http://www.example.com/download/ad87109bfff0765f4dd8cf4943b04d16a4070fea
Real simple, real generic, safe in urls always the same size etc..
The hashname for this "file" would be like this
1511848005-ad87109bfff0765f4dd8cf4943b04d16a4070fea.jpg
If you do have conflicts with the same file and different user(which I mentioned above). You can always add the timestamp part into the link, the user_id or both. If you use the user_id, it might be useful to left pad it with zeros. For example some users may have ID:1 and some may be ID:234 so you could left pad it to 4 places and make them 0001 and 0234. Then add that to the hash, which is almost unnoticeable:
1511848005-ad87109bfff0765f4dd8cf4943b04d16a4070fea0234.jpg
The important thing here is that because a sha1 hash is always 40 characters and the padded id is always 4 (as long as ids stay below 10000), we can separate the two accurately and easily. This way, you can still look it up uniquely. There are a lot of different options, but so much depends on your needs.
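The fixed widths make the split mechanical. A sketch under the assumptions that user ids fit in 4 digits and that the hash is lowercase hex; the function names are hypothetical.

```php
<?php
// Hypothetical helpers: append a 4-digit, zero-padded user id to a sha1 hex
// hash, and split the two apart again using the fixed widths.
function hashWithUser(string $hash, int $userId): string
{
    return $hash . str_pad((string) $userId, 4, '0', STR_PAD_LEFT);
}

function splitHashAndUser(string $token): ?array
{
    // 40 hex characters for sha1, then exactly 4 digits for the user id.
    if (!preg_match('/^([0-9a-f]{40})(\d{4})$/D', $token, $m)) {
        return null;
    }
    return ['hash' => $m[1], 'user_id' => (int) $m[2]];
}

$token = hashWithUser('ad87109bfff0765f4dd8cf4943b04d16a4070fea', 234);
echo $token, "\n"; // prints ad87109bfff0765f4dd8cf4943b04d16a4070fea0234
```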
Access
This covers things such as downloading. You should always output the file with PHP; don't give users direct access to the file. The best way is to store the files outside of the webroot (above the public_html or www folder). Then in PHP you can set the headers to the correct type and basically read out the file. This works for pretty much everything except video. I don't handle videos, so that's a topic outside of my experience. But I find it best to think of it this way: all file data is text, and it's the headers that make that text into an image, an Excel file, or a PDF.
The big advantage of not giving them direct access to the file is that if you have a membership site, or don't want your content accessible without a login, you can easily check in PHP whether they are logged in before giving them the content. And, as the file is outside the webroot, they can't access it any other way.
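A sketch of the header side of such a download script. The function downloadHeaders() is hypothetical, the row keys follow the field list suggested earlier, and the login check is left as a comment because it depends on the application.

```php
<?php
// Hypothetical helper: build the response headers for a stored file from its
// database row (keys 'filename' and 'type' as in the suggested field list).
function downloadHeaders(array $row, int $size): array
{
    // Strip characters that would break the quoted filename parameter.
    $name = str_replace(['"', "\r", "\n"], '', $row['filename']);
    return [
        'Content-Type: ' . $row['type'],
        'Content-Length: ' . $size,
        'Content-Disposition: attachment; filename="' . $name . '"',
    ];
}

// Usage sketch (assumes the login check has passed and $path points to the
// file outside the web root):
// foreach (downloadHeaders($row, filesize($path)) as $h) { header($h); }
// readfile($path);
```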
The most important thing is to pick something consistent, that is still flexible enough to handle all your needs.
I'm sure I can come up with more, but if you have any suggestions feel free to comment.
BASIC PROCESS FLOW
User submits form (enctype="multipart/form-data")
https://www.w3schools.com/tags/att_form_enctype.asp
Server receives the post from the form, Super Globals $_POST and the $_FILES
http://php.net/manual/en/reserved.variables.files.php
$_FILES = [
    'fieldname' => [
        'name' => "MyFile.txt", // (comes from the browser, so treat as tainted)
        'type' => "text/plain", // (the MIME type as reported by the browser, so treat as tainted)
        'tmp_name' => "/tmp/php/php1h4j1o", // (could be anywhere on your system, depending on your config settings, but the user has no control, so this isn't tainted)
        'error' => 0, // UPLOAD_ERR_OK (= 0)
        'size' => 123, // (the size in bytes)
    ],
];
Check for errors: if ($_FILES['fieldname']['error'] === UPLOAD_ERR_OK)
Sanitize display name: $filename = htmlentities($_FILES['fieldname']['name'], ENT_NOQUOTES, "UTF-8");
Save file, create DB record (PSEUDO-CODE)
Like this:
$path = __DIR__.'/uploads/'; // for example
$file = $_FILES['fieldname'];
$time = time();
$hash = hash_file('sha1', $file['tmp_name']);
$type = $file['type']; // supplied by the browser, so treat as tainted
// The extension also comes from the user; whitelist it before trusting it.
$ext = strtolower((string) strrchr($file['name'], '.'));
$hashname = $time.'-'.$hash.$ext;
$status = 'pending';
if (!move_uploaded_file($file['tmp_name'], $path.$hashname)) {
    // failed
    // do something for errors.
    die();
}
// store record in db
http://php.net/manual/en/function.move-uploaded-file.php
Create link (varies based on routing). The simple way is to do your link like this: http://www.example.com/download?file={$hash} but it's uglier than http://www.example.com/download/{$hash}
user clicks link goes to download page.
get INPUT and look up record
$hash = $_GET['file'];
$stmt = $PDO->prepare("SELECT * FROM attachments WHERE hash = :hash LIMIT 1");
$stmt->execute([":hash" => $hash]);
$row = $stmt->fetch(PDO::FETCH_ASSOC);
print_r($row);
http://php.net/manual/en/intro.pdo.php
Etc....
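Since the hash arrives from the URL, it can be checked for the expected shape before it is used in the lookup. A minimal sketch; the function name isSha1Hex() is an assumption.

```php
<?php
// Returns true only for a 40-character lowercase hex string, which is the
// format PHP's sha1() and hash_file('sha1', ...) produce.
function isSha1Hex(string $value): bool
{
    return preg_match('/^[0-9a-f]{40}$/D', $value) === 1;
}

// Usage in the download page: reject anything else before querying.
// $hash = $_GET['file'] ?? '';
// if (!is_string($hash) || !isSha1Hex($hash)) { http_response_code(404); exit; }
```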
Cheers!
I'm learning about the function urlencode. Is it possible to use this on a file name? So - when you upload a file to your server and then use that file name later, you would be able to use it in a url?
$promotionpicture=$_FILES["promotionpicture"]["name"];
$promotionpicture=rawurlencode($promotionpicture);
Then later...
$imagesource="http://mysite.com/".$userID."/".$promotionpicture;
I'm trying to do this, but every time I navigate to the picture, I get a "Bad request" from my server. Is there a specific PHP encode function I should use? Or is this wrong altogether? Thanks in advance for your help.
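urlencode and similar functions are for making an HTTP-friendly URL. You would want to keep the normal filename on disk and only encode it when printing the img src. For a path segment, rawurlencode is the better fit, because urlencode turns spaces into +, which is a literal plus sign inside a path.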
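A short sketch of encoding at output time only. The function imageUrl() and the example file name are hypothetical; the base URL and the user id segment mirror the question.

```php
<?php
// Hypothetical helper: encode the file name only when it becomes one path
// segment of a URL. The name stored on disk stays unencoded.
function imageUrl(string $baseUrl, int $userId, string $fileName): string
{
    return rtrim($baseUrl, '/') . '/' . $userId . '/' . rawurlencode($fileName);
}

echo imageUrl('http://mysite.com/', 12, 'summer sale #1.jpg'), "\n";
// prints http://mysite.com/12/summer%20sale%20%231.jpg
```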
Note that this is not really the preferred way to do it as you can run into duplicate filenames and misc security issues. It's better to generate a filename for it using a uuid or timestamp or something, that way you can bypass those types of issues.
Pictures are really just raw data, like any other file. It is possible to do something like what you're doing, but not necessarily advisable.
If you want to do something like that, I recommend instead doing something to strip special characters.
$newfilename=preg_replace('/[^a-zA-Z0-9.]/','',$filename);
(from Regex to match all characters except letters and numbers)
That said, keep in mind what others have said. How will you handle file name collisions? Where will the images be stored and how?
One easy way to do this much more robustly is to store in a database the original file name and the MD5 hash. Save the file by its hash instead of by name, and write a script that retrieves the file by matching the original name to the MD5 using the database. If you store the file type, you can issue correct headers and when the user downloads the file or uses it to embed in a web page, it will retain its original name, or display as expected respectively.
I want to have a very small web application (using PHP) that allows the user to add a text file, edit it, and then save it.
My question is: if I have 100 users and they all upload the file "myFile.txt", how do I make sure that each file is saved in a different place so they won't overwrite one another?
Do I need to attach to it a random number like:
myFile_randomness010101010.txt - so I will know which file belongs to who?
and then what? I just take this number out when they want to download what they have changed? and how do I know which files goes to who?
I think the answer has something to do with Cookies - but I don't know how exactly..
How does it work? where do I start?
thanks,
Alon
It's pretty straightforward really. Append a time and possibly some unique id (possibly simply uniqid()). If you don't want the time to be readable, consider hashing it.
Now the main thing you are worried about is getting the file back, right? It would be best to just store both the original name and the altered one in a database. That means you can show the original name on the frontend, but work with the unique one.
Other solutions are not as fail-proof. You can append the user's SID, for example, but that means the user would not get the file in another session (and possibly other users might edit theirs to get it themselves).
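A sketch of the time-plus-uniqid naming described in this answer, returning both names so they can be stored together. The function name makeStoredName() and the array keys are assumptions.

```php
<?php
// Hypothetical helper: build a stored name from the time and uniqid(), and
// keep the original name separately for display.
function makeStoredName(string $originalName): array
{
    // Keep only letters and digits from the user-supplied extension.
    $ext = preg_replace('/[^a-z0-9]/', '', strtolower(pathinfo($originalName, PATHINFO_EXTENSION)));
    $stored = time() . '_' . uniqid() . ($ext !== '' ? '.' . $ext : '');
    return ['original' => basename($originalName), 'stored' => $stored];
}

$names = makeStoredName('myFile.txt');
// $names['original'] is shown to the user; $names['stored'] is used on disk.
// Both would be saved in the same database row, along with the user's id.
```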
Yes, you are right about cookies. You can implement this using a session. PHP sessions use cookies.
A simple approach:
//uploadbegin.php or login.php etc
session_start();
//validate user logon and
$_SESSION['USERNAME'] = $username; // the validated user name
The following handles the file upload part.
//fileupload.php
session_start(); //start the session..
$user = $_SESSION['USERNAME'];
//get the contents of the file received by HTTP POST ('file' is an assumed form field name)
$content = file_get_contents($_FILES['file']['tmp_name']);
$rand = bin2hex(random_bytes(8)); //some random value
//this file may be inside a directory named after the user or something like that.
$fileName = "$user/file$rand";
file_put_contents($fileName, $content);
//add this filename and path to the database along with the $user.
Now for obtaining the file contents, check the $_SESSION variable(after the user logs on or something like that), get the file path, spit out the content onto some HTML editable control.
HTH
What I have always done for this is to store the file in a folder that is named the id of the file in a database. That way the original file name is preserved, and the folder names will be relatively short, and you don't technically need to store anything but the id in the database as the only file in the folder is the one you'll be looking for.
Cookies would work; create a random hash, then use PHP's setcookie() to keep that hash on their computer for as long as you would like, then read the cookie if it exists when the visitor comes to the page.
$fileName = isset($_COOKIE['your_cookie_name']) ? $_COOKIE['your_cookie_name'].'.txt' : 'myFile.txt';
Simply create a new directory for each user, and name it based on something unique in their database record, such as the user id.
When you go to move the file to its new location you move it to /files/$userid/ for example.
Then to access the files you would do the same thing img src="files/$userid/filename" for example.
No crossover, no chance of overwriting from someone else, other users cannot access files (unless they are given access or know the userid)
I'm using ext3 and according to Wikipedia, the maximum sub directories allowed is around 32000. Currently, each user is given their own directory to upload images on the filesystem. This makes it simple to retrieve images and ease of access. the folder structure is like this:
../images/<user id>/<image>
../images/<another user id>/<image>
I don't want to commit to a design that is doomed to fail with scalability, specifically when 32k users have uploaded images. While this may never be reached, I still think it is bad practice.
Does anyone have an idea to avoid this problem? I would prefer not to use the database if possible for reasons of unnecessary queries and speed.
You could have a multi-level hierarchy, where each level is guaranteed to never exceed the maximum.
For example, if your user ids are defined with the regular expression [A-Za-z0-9_]+, you have 64 possible choices for any given character (I'm adding a space to account for spaces at the end when ids are shorter). Taking two characters together you have 64*64 = 4096 total possibilities. You cannot do three characters as that takes you over your limit. Then with this info you can create the directories by splitting the ids in groups of two letters. Example: user ids "miguel" and "miguel12345" would go to:
/images/mi/gu/el/<image>
/images/mi/gu/el/12/34/5/<image>
Note how the last component can be one char long if the length of the id is odd. This is fine, since the space is accounted as a possible char, you will still be within the max sub-directory limit.
Good luck!
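The two-character split described above can be sketched as follows. The function name shardedDir() is hypothetical, and the character check mirrors the regular expression assumed in the answer.

```php
<?php
// Split a user id into two-character directory levels.
function shardedDir(string $userId): string
{
    if (!preg_match('/^[A-Za-z0-9_]+$/D', $userId)) {
        throw new InvalidArgumentException('Unexpected characters in user id');
    }
    // str_split() leaves a shorter final chunk when the length is odd.
    return '/images/' . implode('/', str_split($userId, 2));
}

echo shardedDir('miguel'), "\n";      // prints /images/mi/gu/el
echo shardedDir('miguel12345'), "\n"; // prints /images/mi/gu/el/12/34/5
```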
Create a subdirectory for when the previous one gets full
/images/<a>/<user id 1>/<image>
/images/<a>/<user id 2>/<image>
...
/images/<a>/<user id 32000>/<image>
/images/<b>/<user id 32001>/<image>
...
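A sketch of this bucketing, assuming numeric, sequential user ids. The function name is hypothetical, numeric bucket names stand in for the letters shown above, and the default bucket size of 30000 is an assumption chosen to stay under the limit of around 32000 mentioned in the question.

```php
<?php
// Hypothetical helper: with the default size, users 1..30000 go to bucket 0,
// 30001..60000 to bucket 1, and so on.
function bucketedDir(int $userId, int $perBucket = 30000): string
{
    $bucket = intdiv($userId - 1, $perBucket);
    return '/images/' . $bucket . '/' . $userId;
}

echo bucketedDir(1), "\n";     // prints /images/0/1
echo bucketedDir(30001), "\n"; // prints /images/1/30001
```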
If I'm getting this right and this is some sort of web app, you could use an abstraction layer to imitate that folder structure and save the files in one directory. Save the file's real name in the database, and save the uploaded file under some unique name. Then list the user's files from the database.
OK, I'm trying to get data pulled from a database according to the URL.
I have a database that is holding data for some announcements for individual customer websites. The websites are in individual directories on my website (i.e., www.domain.com/website1/index.html, www.domain.com/website2/index.html, www.domain.com/website3/index.html, etc.). In the database I have a column that has each customer's "filing name" (aka directory name: website1, website2, website3, etc.). I want to display only the rows in the database where filingName = website1 for the page "www.domain.com/website1/index.html". Hope this makes sense. I'm basically trying to figure out how to connect the dots between a single page and pulling only a specific customer's records. Any thoughts?
Thanks!
Depending on how big your data set is, it might be "cheaper" to preprocess the URLs and store the individual components you need to match on. e.g. create extra fields for host/dir/querystring and store those individually, instead of a monolithic absolute URL. This would be safer than trying to do substring matches, especially if the substring you're matching could be part of multiple different urls ('abc' being part of 'abc.com' and 'abctv.com').
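A sketch of splitting a URL into components before storing or matching on them, using parse_url(). The function name urlParts() and the array keys are assumptions.

```php
<?php
// Hypothetical helper: split a URL into the parts worth storing separately.
function urlParts(string $url): array
{
    $host  = parse_url($url, PHP_URL_HOST) ?: '';
    $path  = parse_url($url, PHP_URL_PATH) ?: '';
    $query = parse_url($url, PHP_URL_QUERY) ?: '';
    // First path segment, e.g. "website1" in /website1/index.html.
    $segments = explode('/', trim($path, '/'));
    return ['host' => $host, 'dir' => $segments[0], 'query' => $query];
}

print_r(urlParts('http://www.domain.com/website1/index.html'));
```

Matching on the exact dir value avoids the substring problem, since 'website1' no longer matches 'website10'.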
Use a like statement in your query:
SELECT * FROM `sites` WHERE `url` LIKE "%/website1/%";
Your best bet is going to be to store these identifiers on their own in the database (e.g. 1 = website1, 2 = website2) and name the directories accordingly. Then, in your .htaccess file, rewrite the URL so that the directory name arrives as a $_GET variable, and look up the matching name in your database. Do this with a RewriteRule. This is the most reliable way to achieve this while keeping the same URL structure.
edit: whoops, didn't realize how old this thread was.