I'm building a torrent site where users can upload torrents.
What would be a good way to save the .torrent files?
I can think of several options:
Saving the torrent file itself in a single folder on the server (not the best option, since filesystems degrade when storing lots of files in one folder)
Saving the torrent file itself in different folders per month
Saving the contents of the torrent file in the database (any limitations / performance issues / any other caveats?)
Any other options?
If you're concerned about having too many files within a directory, you need to distribute the files across multiple directories. Storing them by month, day, or week is one way to do so. How fine-grained you need to be depends on how many files you really have, I would say.
You can distribute the files more or less evenly across subdirectories by hashing their filename and using all or part of the hash to generate one or more subdirectory names:
$hash = md5($fileName);
// the first two hex characters of the hash become the subdirectory
$storePath = sprintf('%s/%s', substr($hash, 0, 2), $fileName);
This picks the first two characters of the md5 hash (00-ff, 256 possible subdirectories) to generate the subdirectory name.
The benefit compared with a date is that you can always work out which directory a file is stored in when you have its name.
It also means you cannot have duplicate files with the same name (which would have worked with the date-based subfolders).
I use this:
Saving the torrent file itself in different folders per month
Using a database is not good at all here. Just save them as static files, and maybe even gzip them.
Just make sure to rename them uniquely with some kind of hashing.
If you don't have any problem using an external provider, you can use TorCache.
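If you go the static-file route, here is a minimal sketch of hash-renaming and gzipping an uploaded torrent; the form field name and target path are assumptions:
$tmp  = $_FILES['torrent']['tmp_name'];
$name = sha1_file($tmp) . '.torrent.gz';            // unique, content-based name
$gz   = gzopen('/var/torrents/' . $name, 'wb9');    // 9 = maximum compression
gzwrite($gz, file_get_contents($tmp));
gzclose($gz);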
I would say saving the .torrent file in a weekly/monthly folder is the best option.
That way you can use the OS's filesystem cache, even if you store the .torrents outside the document root to limit user access (in the end you will have to open() the file anyway).
Leaving torrents in the database would eventually lead to slow performance as the DB increases in size.
Maybe try Amazon S3? It's cheap, easy, and fast.
A standard PHP upload handler saves the .torrent files for you. http://www.tizag.com/phpT/fileupload.php has a good example. Give it a try.
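The core of such a handler is just a few lines; the field name and destination directory here are assumptions:
if ($_FILES['torrent']['error'] === UPLOAD_ERR_OK) {
    // move the uploaded temp file into permanent storage
    move_uploaded_file($_FILES['torrent']['tmp_name'],
                       '/var/torrents/' . basename($_FILES['torrent']['name']));
}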
I have a question concerning upload filename standards, if any, that you are using. Imagine an application that allows many types of documents to be uploaded to your server and placed into a directory. Perhaps the same document could even be uploaded twice. Usually you have to make some kind of unique filename adjustment when saving the document. Assume it is saved in a directory, not directly into a database; the metadata would still need to be saved in the database. The typical PHP upload methods could be the mechanism used; simple enough to do.
Possible Filenaming Standard:
1.) Append a unique id to the document filename: image.png becomes image_20110924_ahd74vdjd3.png
2.) Perhaps use a UUID/GUID and store the actual file type (meta) in a database: 2dea72e0-a341-11e0-bdc3-721d3cd780fb
3.) Perhaps a combination: image_2dea72e0-a341-11e0-bdc3-721d3cd780fb.png
Can you recommend a good standard approach?
Thanks, Jeff
I always just hash the file using md5() or sha1() and use that as a filename.
E.g.
3059e384f1edbacc3a66e35d8a4b88e5.ext
And I would save the original filename in the database in case I ever need it.
This will make the filename unique AND it makes sure you don't have the same file multiple times on your server (since they would have the same hash).
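A minimal sketch of that flow; the uploads table, its columns, the storage path, and the $pdo/$userId variables are assumptions:
$tmp  = $_FILES['doc']['tmp_name'];
$hash = sha1_file($tmp);                                   // hash of the file contents
$ext  = pathinfo($_FILES['doc']['name'], PATHINFO_EXTENSION);
$dest = '/var/uploads/' . $hash . '.' . $ext;
if (!is_file($dest)) {                                     // same content => same name, stored once
    move_uploaded_file($tmp, $dest);
}
// remember who uploaded it and what it was originally called
$stmt = $pdo->prepare('INSERT INTO uploads (user_id, original_name, hash) VALUES (?, ?, ?)');
$stmt->execute([$userId, $_FILES['doc']['name'], $hash]);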
EDIT
As you can see I had some discussion with zerkms about my solution and he raised some valid points.
I would always serve the file through PHP instead of letting users download it directly.
This has some advantages:
I would add a record to the database whenever a user uploads a file, containing the user who uploaded the file, the original filename, and the hash of the file.
If a user wants to delete a file, you just delete that user's record for the file.
If no users have the file left after the delete, you can delete the file itself (or keep it anyway).
You should not keep the files in the document root, but somewhere that isn't publicly accessible, and serve the file to the user through PHP (a sketch follows at the end of this answer).
A disadvantage, as zerkms has pointed out, is that serving files through PHP is more resource-consuming, although I find the advantages to be worth the extra resources.
Another thing zerkms has pointed out is that the extension isn't really needed when saving the file under its hash (since it is already in the database), but I always like to be able to tell what kinds of files are in a directory simply by doing an ls -la, for example. Again, though, it isn't really necessary.
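Here is the promised sketch of serving a stored file through PHP; the uploads table, column names, and storage path are the same assumptions as above:
$hash = $_GET['h'] ?? '';
if (!preg_match('/^[a-f0-9]{40}$/', $hash)) {    // validate to block path traversal
    http_response_code(400);
    exit;
}
$stmt = $pdo->prepare('SELECT original_name FROM uploads WHERE hash = ?');
$stmt->execute([$hash]);
$row = $stmt->fetch();
if (!$row) {
    http_response_code(404);
    exit;
}
$path = '/var/uploads/' . $hash;                 // stored outside the document root
header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="' . $row['original_name'] . '"');
header('Content-Length: ' . filesize($path));
readfile($path);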
I was wondering what would be a good way to organize documents in a filesystem for my PHP/MySQL application. All the info about the docs is stored in a DB; I am curious about the filesystem organization. Is this a good way to do it? Is there a better way?
Main folder
/documents
One folder per client
/documents/client1
Documents are uploaded here per client
/documents/client1/queue
After users fill in the form, docs are saved to the database and moved into this folder
/documents/client1/docs
The original and filesystem names of the document are stored in the database; the filesystem name is something like md5($time.$filename.$client_id), and the document path looks like this:
/documents/client1/docs/6f99caa11e78697612d8f1b4481cd76a.pdf
I need (at minimum) the first page of the PDF for the thumbnail and automatic barcode reading from the first page:
/documents/client1/docs/6f99caa11e78697612d8f1b4481cd76a/6f99caa11e78697612d8f1b4481cd76a-0.gif
I also have
/documents/client1/scan
where files from the scanner go, so users can import them into the database; after that they are renamed and moved to:
/documents/client1/docs
I wonder if I should put files for a specific date in a date folder like this:
/documents/client1/docs/20110915/6f99caa11e78697612d8f1b4481cd76a.pdf
Or should I use a completely different folder structure?
Why not use a single common folder for temporary (uploaded) files? It would make maintenance routines easier (for example, deleting all old files).
I wonder if I should put files for a specific date in a date folder like this:
/documents/client1/docs/20110915/6f99caa11e78697612d8f1b4481cd76a.pdf
If the date is handled by your database, it's needless. Nevertheless, if for some reason you need to access the file manually (not through your interface), it could be helpful.
Last but not least, be aware of your filesystem's limitations. On some filesystem/OS combinations there is a limit on the number of files per folder (huge, but still).
I would prefer a structure that derives the directory layout from a hash of the file name:
/documents/6/f/9/6f99caa11e78697612d8f1b4481cd76a.pdf
This keeps file access fast, and it's easy to make the structure deeper if needed.
The problem with using the date as a directory is that the file location depends on information that may change (we never know!).
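A sketch of deriving that nested path from the hashed name; the depth of three levels matches the example above:
function hashedPath(string $baseDir, string $hashedName): string {
    // the first three characters of the name become nested subdirectories
    $dir = $baseDir . '/' . $hashedName[0] . '/' . $hashedName[1] . '/' . $hashedName[2];
    if (!is_dir($dir)) {
        mkdir($dir, 0755, true);   // create intermediate directories as needed
    }
    return $dir . '/' . $hashedName;
}

// hashedPath('/documents', '6f99caa11e78697612d8f1b4481cd76a.pdf')
// => /documents/6/f/9/6f99caa11e78697612d8f1b4481cd76a.pdf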
Basically I have a simple form that users use for uploading files. Files should be stored under the /files/ directory, with some subdirectories splitting the files almost equally, e.g. /files/sub1/sub2/file1.txt
I also need to avoid storing duplicate files (by filename).
I have my own solution: calculate the sha1 of the filename, take the first 5 characters, abcde for example, and put the file in /files/a/b/c/d/e/. This works well, but produces situations where one folder contains 4k files and another 6k. Is there any way to make the file counts closer to each other? The maximum file count can be 10k or 10 million.
Thanks for any help.
P.S. Maybe I explained something poorly, so once again :) The task is simple: you have only HTML and PHP (without any DB) and a files directory where you should store only the uploaded files, without any data of your own. You should develop a script that stores uploads in the files directory without storing duplicates (by filename) and splits the uploaded files into subdirectories by the file count in each directory (keeping the counts close to each other).
I have no idea why you want it that way. But if you REALLY have to do it this way, I would suggest you set a limit on how many bytes are stored in each folder. Every time you have to save a file, you open a log containing:
- the current subdirectory
- the total number of bytes written to that directory
If necessary, you create a new subdirectory (you could use the current timestamp, because it won't repeat) and reset the byte count.
Then you save the file and increment the byte count by the number of bytes written.
I highly doubt it is worth the work, but I do not really know why you want to distribute the files that way.
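A rough sketch of that byte-count bookkeeping; the paths, form field name, and the 100 MB limit are purely illustrative:
$limit = 100 * 1024 * 1024;                      // assumed per-directory byte limit
$raw   = @file_get_contents('/files/state.json');   // log of current sub + byte count
$state = $raw ? json_decode($raw, true) : ['sub' => (string) time(), 'bytes' => 0];

$size = filesize($_FILES['f']['tmp_name']);
if ($state['bytes'] + $size > $limit) {          // rotate to a fresh subdirectory
    $state = ['sub' => (string) time(), 'bytes' => 0];
}
$dir = '/files/' . $state['sub'];
if (!is_dir($dir)) {
    mkdir($dir, 0755, true);
}
move_uploaded_file($_FILES['f']['tmp_name'], $dir . '/' . basename($_FILES['f']['name']));

$state['bytes'] += $size;                        // increment by the bytes just written
file_put_contents('/files/state.json', json_encode($state));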
Which is a better place to upload images to: a database or the web directory? And why?
You should only store images in your database if you have a specific need to, like security, or like an absolute to-die-for need to keep all custom data in a database.
Other than that, getting large files into databases usually isn't worth the trouble. Storing and retrieving the file get that much more complicated to implement, and database updates/upgrades/conversions have that many more things that can go wrong.
I don't see any advantage to storing images in a database. There is certainly no inherent security in it. Files are for the filesystem, so store your images there.
I don't think you can "upload" an image to a database as such. You can store the image's binary data in the database and stream it later via a header("Content-Type: ...") call. That saves space on your web server, but obviously takes space in your database.
If I were you, I'd upload to a web directory; that way you have the image available via a regular URL request later on. If you keep it in the database instead, you'll have to connect to the database every time the image is requested and stream it from there.
Well, it depends on your requirements.
If security is a major issue for you, then storing images in the DB may be justified; otherwise nothing compels you to put them there.
Also, retrieving images from a database is fairly complicated, since images are stored there as binary data. So store images in the database only if you have a specific need; otherwise storing them in a directory is fine.
As you can see, there are many reasons for and against using the database for image storage. Personally, I prefer not to use the database for storing files (images, documents, etc.), except when I'm ordered to.
- Sometimes you're tired and screw up a query, something like "SELECT * FROM images"; this will kill the server if there are many huge images (2 MB and more) in the database.
- The security issue: you can save the files on disk and still be secure. How? Save the files outside the web directory; whenever a file is requested, read it and hand it to the user.
- If by any chance you are using MySQL: if your database has gotten too big (say 2-3 GB) and you are on shared hosting, well, good luck making a backup or restoring that image database.
It's just my point of view.
Suppose there is a site where users upload images, and we have to display the album of a particular logged-in user. What we can do is:
We save the path of that image in the database and retrieve the image
Save only the (unique) name of the image and use fopen(), because we save all the uploaded images in a single folder
Now my question is:
What are the various options to retrieve that file instead of fopen()? Meaning, is there anything else that is faster than this?
You shouldn't need to use fopen to display a gallery. Why can't you just show the images like this:
<img src="/folder/with/your/images/<?php echo $unique_name; ?>" />
?
I don't know if it's faster, but file_get_contents() is more succinct. With a binary file like an image you will probably want to use the FILE_BINARY flag.
I think readfile() would be best?
For ultimate speed, keep the file stored in a RAM disk.
Look up tmpfs for more details.
If your web server is lighttpd or Apache with mod_xsendfile, then you can send the file by specifying a special X-Sendfile header. This allows the web server to make a sendfile call, which can be very, very fast.
The reason is that sendfile is usually an optimized kernel call that can take bytes from the filesystem and put them directly onto a TCP socket.
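In PHP that looks roughly like this; the path and content type are assumptions, and the exact header name depends on your server and module version:
header('Content-Type: image/png');
header('X-Sendfile: /var/uploads/images/3059e384f1edbacc3a66e35d8a4b88e5.png');
exit;   // the web server (Apache's mod_xsendfile, or lighttpd) now streams the file itself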
How many images are in your uploaded folder?
If you have thousands of them, consider making your folders smaller.
For example, the terminfo system uses the /usr/share/lib/terminfo/a/ subdirectory to hold entries that begin with 'a'.
The CPAN system for Perl uses authors/id/A/AB/ABRAHAM to divide things up.
The point is that the system probably does a linear search through the directory to find the files, and not all of them can be at the beginning. By splitting them up into smaller sub-directories, you greatly improve the lookup time - and hence the speed of any and all file open functions.
There was a paper or discussion about this - I think it was in Eric Raymond's 'The Art of Unix Programming', but it referenced some papers where the measurements were made - and it can give valuable speedups at minimal programming cost.
If done properly, you can write a simple function to generate the file name from the image storage base directory and the image file name.
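For example, a simple terminfo-style bucketing function; the base directory and the fallback bucket are assumptions:
function imagePath(string $baseDir, string $fileName): string {
    $bucket = strtolower($fileName[0]);   // first character, like terminfo's a/ directory
    if (!ctype_alnum($bucket)) {
        $bucket = '_';                    // fallback bucket for unusual names
    }
    return $baseDir . '/' . $bucket . '/' . $fileName;
}

// imagePath('/var/uploads/images', 'apple.png') => /var/uploads/images/a/apple.png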