Suppose there is a site where users upload images, and we have to display the album of a particular logged-in user. What we can do is:
Save the path of the image in the database and retrieve the image from there, or
Save only the (unique) name of the image and use fopen(), since all the uploaded images are stored in a single folder.
Now my question is:
What are the various options for retrieving that file other than fopen()? In other words, is there anything faster?
You shouldn't need to use fopen to display a gallery. Why can't you just show the images like this:
<img src="/folder/with/your/images/<?php echo $unique_name; ?>" />
?
I don't know if it's faster, but file_get_contents() is more succinct. With a binary file like an image you will probably want to use the FILE_BINARY flag.
I think readfile() would be best.
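For instance, a minimal sketch of serving a stored image with readfile(); the storage folder and the $unique_name variable from the question are assumptions:
// Hypothetical upload folder; $unique_name comes from the database
$path = '/var/www/uploads/' . basename($unique_name);
if (is_file($path)) {
    header('Content-Type: image/jpeg');           // adjust to the real image type
    header('Content-Length: ' . filesize($path));
    readfile($path);                              // streams the file without loading it all into memory
    exit;
}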
For ultimate speed, keep the file stored in a RAM disk.
Look up tmpfs for more details.
If your web server is lighttpd or Apache with mod_xsendfile, then you can send the file by specifying a special X-Sendfile header. This allows the web server to make a sendfile call, which can be very, very fast.
The reason is that sendfile() is usually an optimized kernel call that can take bytes from the file system and put them directly onto a TCP socket.
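A minimal sketch of the hand-off, assuming Apache with mod_xsendfile enabled and allowed for the upload directory (the path is an assumption):
// Hypothetical absolute path to the image
$path = '/var/www/uploads/' . basename($unique_name);
header('Content-Type: image/jpeg');
header('X-Sendfile: ' . $path); // the web server performs the actual (sendfile-based) transfer
exit;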
How many images are in your uploaded folder?
If you have thousands of them, consider making your folders smaller.
For example, the terminfo system uses the /usr/share/lib/terminfo/a/ subdirectory to hold entries that begin with 'a'.
The CPAN system for Perl uses authors/id/A/AB/ABRAHAM to divide things up.
The point is that the system probably does a linear search through the directory to find the files, and not all of them can be at the beginning. By splitting them up into smaller sub-directories, you greatly improve the lookup time - and hence the speed of any and all file open functions.
There was a paper or discussion about this - I think it was in Eric Raymond's 'The Art of Unix Programming', but it referenced some papers where the measurements were made - and it can give valuable speedups at minimal programming cost.
If done properly, you can write a simple function to generate the file name from the image storage base directory and the image file name.
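For instance, a minimal sketch of such a function using the first letter of the file name as the subdirectory (the base directory is an assumption):
// Hypothetical: maps "apple.jpg" to "/var/www/uploads/a/apple.jpg"
function image_path($baseDir, $fileName) {
    $subDir = strtolower(substr($fileName, 0, 1));
    return $baseDir . '/' . $subDir . '/' . $fileName;
}
$path = image_path('/var/www/uploads', $imageName);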
Recently I tried storing images in a MySQL database (using the BLOB data type), with PHP and a web form to upload and store them. It works fine, except that large images load very slowly. Is there any way to load these images faster?
Note: my friend suggests forcing the browser to cache the images (he says something about changing the image's content headers), but I don't know how to do that, and I doubt it will bring significantly better performance.
Thanks in advance.
The images should definitely be cached. Mostly this comes down to making sure the image URL you generate is always the same for the same picture. What I think your real problem is, though, is max_allowed_packet: if it is too small, MySQL can't send much data over the network at one time. Also, if the pictures really are that big, I'd consider lowering the quality to maybe 70%; all the image resize functions let you set it, e.g. http://php.net/manual/en/function.imagejpeg.php. Hope that helps. I'd also look into YSlow. It will help point out what exactly is making your images load slowly, whether it is quality, caching, compression or whatever it may be.
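As an illustration, a minimal sketch of re-saving an uploaded JPEG at roughly 70% quality with GD (the paths are assumptions):
// Hypothetical paths; recompress the upload before storing or inserting it
$img = imagecreatefromjpeg('/tmp/uploaded.jpg');
imagejpeg($img, '/var/www/uploads/photo.jpg', 70); // third argument is the JPEG quality (0-100)
imagedestroy($img);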
Caching the images is easy when they are stored on the filesystem. If they are pulled dynamically from the database, they will be fetched each time the PHP code asks for them.
It may be that an image is fetched from the database in a few dozen milliseconds, but downloading 3 MB of image data to the client's browser can take anywhere from 5 seconds to a minute, depending on the connection speed. There is not much you can do about that (even less on common shared hosting).
I would suggest storing the images on the filesystem so they can be cached by the browser, or you could even set up memcache on the Apache server so that, until they expire, they are served from the cache.
Caching may or may not help. Instead, I suggest uploading images or other files to a folder and saving only the information about the file in the database: name, type, size, folder, etc.
If there is no requirement that forces you to store an image in the database, images are better kept in a folder, and what you store in the database is the path or the name of each of them.
That way they load normally, depending only on the size of the image, of course.
That's what you will find in almost all web applications.
I'm building a torrent site where users can upload torrents.
What would be a good way to save the .torrent files?
I can think of several options:
Saving the torrent file itself in a single folder on the server (not the best option, since filesystems perform poorly with lots of files in one folder)
Saving the torrent file itself in different folders per month
Saving the contents of the torrent file in the database (any limitations / performance issues / any other caveats?)
Any other options?
If you're concerned about having too many files within a directory, you need to distribute the files across multiple directories. Storing them by month, day or week is one way to do so. It depends a bit on how many files you really have, I would say.
You can try to distribute the files more or less evenly across subdirectories by hashing their filenames and using the whole hash, or part of it, to generate one or more subdirectory names:
$hash = md5($fileName);
$storePath = sprintf('%s/%s', substr($hash, 0, 2), $fileName);
This picks the first two characters of the md5 hash (00-ff, i.e. 256 subdirectories) to generate the subdirectory name.
The benefit compared with a date is that you can always work out which directory a file is stored in when you have its name.
It also means that you cannot have duplicate files with the same name (which would have worked with the date-based subfolders).
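For completeness, a short sketch of how storing an upload could look with this scheme (the base directory and the form field name are assumptions):
// Hypothetical base directory and upload field
$baseDir  = '/var/www/torrents';
$fileName = basename($_FILES['torrent']['name']);
$hash     = md5($fileName);
$storeDir = $baseDir . '/' . substr($hash, 0, 2);
if (!is_dir($storeDir)) {
    mkdir($storeDir, 0755, true); // create the 00-ff subdirectory on first use
}
move_uploaded_file($_FILES['torrent']['tmp_name'], $storeDir . '/' . $fileName);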
I use this:
Saving the torrent file itself in different folders per month
Using the database is not good at all. Just save them as static files, and maybe even gzip them.
Just make sure to rename them uniquely with some kind of hashing.
If you don't have any problem with using an external provider, you can use TorCache.
I would say saving the .torrent file in a weekly/monthly folder is the best option.
That way you can use the OS's filesystem cache, even if you store the .torrents outside the document root to limit user access (in the end you will have to open the file anyway).
Leaving torrents in the database would eventually lead to slow performance as the DB increases in size.
Maybe try Amazon S3? It's cheap, easy and fast.
Uploading them automatically saves .torrent files. http://www.tizag.com/phpT/fileupload.php has a good example. Give it a try.
I'm making a really simple "backend" (PHP5) for two Flash/AIR applications. One of them uploads a photo, the backend saves it to a folder, and the second app polls the backend for new photos and shows them.
I don't have any access to a database, so the backend has to be pure PHP5 and nothing more. That's why I chose to save the images to a folder (with a timestamp in their names) and use readdir() to get them back.
This all works like a charm. Nevertheless, I would really like to make sure the backend only returns photos that are completely uploaded, preventing the second app from trying to load an unfinished image. Are there any methods/tricks I can use to validate a file?
You could check the filesize a couple hundred milliseconds apart and see if it changes:
$first = filesize($file);
// wait 100ms
usleep(100000);
$second = filesize($file);
if($first == $second) {
// file is no longer being actively uploaded
}
The usual trick for atomic filesystem operations is to write into a temporary file that is not matched by the reader (e.g. XXX.jpg.tmp) and, once it's completely uploaded, rename it to its target name. Renames on the same volume are atomic, so there is no point in time where the file is either incomplete or unavailable.
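A minimal sketch of this trick for the upload backend (directory and variable names are assumptions; the reader simply never matches *.tmp files):
// Hypothetical paths: write to a .tmp name the polling app never picks up
$dir     = '/var/www/photos';
$tmpPath = $dir . '/' . $finalName . '.tmp';
$dstPath = $dir . '/' . $finalName;
move_uploaded_file($_FILES['photo']['tmp_name'], $tmpPath);
rename($tmpPath, $dstPath); // atomic on the same volume: readers only ever see complete files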
A really easy and common way to do so would be to create a trigger file based on the file's name, so that you get something like
123.jpg
123.rdy
or
123.jpg
123.jpg.rdy
You create that file (just an empty stub) as soon as the upload is complete. The application that grabs files to load only cares about files that have a trigger file, and processes those. Alternatively, you could save the uploaded file as e.g. 123.bsy or 123.jpg.bsy while it is still being uploaded and then rename it to the final name 123.jpg after the upload is done. Since renames within the same directory are usually really cheap operations in terms of processing time, the chances of running into a race condition should be pretty low. (This might or might not depend on the OS used, though...)
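A short sketch of the trigger-file variant (directory and names are assumptions): the writer drops an empty .rdy stub once the upload is done, and the reader only lists photos that have one:
// Writer side: mark the upload as complete
touch($dir . '/' . $finalName . '.rdy');
// Reader side: only return photos whose .rdy stub exists
$ready = array();
foreach (glob($dir . '/*.jpg') as $photo) {
    if (file_exists($photo . '.rdy')) {
        $ready[] = basename($photo);
    }
}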
If you need to keep the files in place, you could, of course, use a database and add a record for each file once its upload is complete. The other app would then only serve files that have a matching database record.
After writing this all down I figured it out myself. What I did was include the exact number of bytes in the filename as well and validate that while outputting the list of images. The .tmp/.bsy solution is nice too, but I read it a bit too late :)
The upside of my solution is that no renaming is required after the upload is done. Thanks everybody for your fast answers!
I can't figure out a good solution for limiting the amount of storage a user may use for his files.
In the application, users are allowed to upload a limited amount of files. The limit is based on total file size, i.e. a user might be allowed to store 50 MB on the server.
This number is stored in a database so that it can be easily increased/decreased.
The language used is PHP, but I guess the solution doesn't depend on the scripting language.
Very sorry if the question is unclear. I don't really know what to ask for beyond a strategy to implement.
Thanks!
Keeping track of how much space has been used should be straightforward - with each upload you could store the space used in another table. The PHP filesize() function will tell you the size of a file on disk. Use a SUM() SQL query to get the total size of all the files uploaded by each user, and compare it against their quota limit.
The tricky bit is when you're approaching the limit: you can't tell how big the file is going to be before it's uploaded. So you'll have to let the user upload the file, then check its size and see if it takes them over quota. If the file is too big, delete it and let the user know they're out of space.
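A minimal sketch of that check, assuming a hypothetical uploads(user_id, filename, size) table, a per-user byte quota in $quota, and a PDO connection:
// Sum what the user has stored so far
$stmt = $pdo->prepare('SELECT COALESCE(SUM(size), 0) FROM uploads WHERE user_id = ?');
$stmt->execute(array($userId));
$used = (int) $stmt->fetchColumn();
$newSize = filesize($_FILES['file']['tmp_name']);
if ($used + $newSize > $quota) {
    // over quota: reject the upload (or delete the file) and tell the user
} else {
    // within quota: move the file into place and record its size in the uploads table
}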
A simple approach would be to store the filename, date and size of a user's uploads in the database too. Then you can easily reject an upload when it exceeds their total storage.
This also makes it easy to show a list of files sorted in a variety of ways, allowing a user close to their limit to select some files for removal.
You could even use the average size of the files the user uploads to warn them when they are getting close to using up all their space.
You could use a script that iterates through a directory's contents, calculates file sizes, and then deletes files that don't fit or rejects new uploads. But I think this is better done with some sort of directory quota on the server. Unfortunately, I'm not a Linux guy, so I don't know exactly how to do that, but this post might be helpful.
drewm's solution is good; I just want to add a few words about the tricky part he mentioned.
Yes, it is impossible to predict the file size before the file is uploaded, as you cannot check it with JavaScript on the user's upload page. However, you can do it with a Flash-based file uploader (swfupload.org, for example). With it you can check the file size before the upload starts and compare it against the upload limit you have. This way you save the user time (no need to upload the file just to get a "limit exceeded" error message).
As a bonus, you can show the user an upload progress bar as well.
Don't forget about OS-level solutions. If the files are stored in a user-specific directory, you can use the OS to find the disk space used in that directory. A Linux solution would be something like this:
$dirSize = explode("\t", `du -ks $userDir`); // Will return an array of size, dirName
if ($dirSize[0] > MAX_DIR_LIMIT) print "USER IS OVER QUOTA";
To quote some famous words:
“Programmers… often take refuge in an understandable, but disastrous, inclination towards complexity and ingenuity in their work. Forbidden to design anything larger than a program, they respond by making that program intricate enough to challenge their professional skill.”
While solving some mundane problem at work I came up with this idea, which I'm not quite sure how to solve. I know I won't be implementing this, but I'm very curious as to what the best solution is. :)
Suppose you have this big collection of JPG files and a few odd SWF files. By "big" I mean a couple of thousand. Every JPG file is around 200 KB, and the SWFs can be up to a few MB in size. Every day there are a few new JPG files. The total size of everything is thus around 1 GB, slowly but steadily increasing. Files are VERY rarely changed or deleted.
The users can view each of the files individually on the webpage. However there is also the wish to allow them to download a whole bunch of them at once. The files have some metadata attached to them (date, category, etc.) that the user can filter the collection by.
The ultimate implementation would then be to allow the user to specify some filter criteria and then download the corresponding files as a single ZIP file.
Since the number of possible criteria combinations is large, I cannot pre-generate all the possible ZIP files and must do it on the fly. Another problem is that the download can be quite large, and for users with slow connections it's quite likely to take an hour or more. Support for "resume" is therefore a must-have.
On the bright side however the ZIP doesn't need to compress anything - the files are mostly JPEGs anyway. Thus the whole process shouldn't be more CPU-intensive than a simple file download.
The problems I have identified are thus:
PHP has an execution timeout for scripts. While it can be changed by the script itself, will there be any problems if I remove it completely?
With the resume option, there is the possibility of the filter results changing for different HTTP requests. This might be mitigated by sorting the results chronologically, as the collection is only getting bigger. The request URL would then also include a date when it was originally created and the script would not consider files younger than that. Will this be enough?
Will passing large amounts of file data through PHP not be a performance hit in itself?
How would you implement this? Is PHP up to the task at all?
Added:
By now two people have suggested storing the requested ZIP files in a temporary folder and serving them from there as regular files. While this is indeed an obvious solution, there are several practical considerations which make it infeasible.
The ZIP files will usually be pretty large, ranging from a few tens of megabytes to hundreds of megabytes. It's also completely normal for a user to request "everything", meaning the ZIP file will be over a gigabyte in size. Also, there are many possible filter combinations, and many of them are likely to be selected by users.
As a result, the ZIP files will be pretty slow to generate (due to the sheer volume of data and disk speed) and will contain the whole collection many times over. I don't see how this solution would work without some mega-expensive SCSI RAID array.
This may be what you need:
http://pablotron.org/software/zipstream-php/
This lib allows you to build a dynamic streaming zip file without swapping to disk.
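A rough sketch of what using it might look like; the method names are from memory of the pablotron version (newer forks such as maennchen/ZipStream-PHP use a different API), so check the library's documentation:
// Hypothetical usage: stream a zip straight to the client, no temp file
require 'zipstream.php';
$zip = new ZipStream('photos.zip');                  // also sends the download headers
foreach ($selectedFiles as $name => $path) {         // files matching the user's filter
    $zip->add_file($name, file_get_contents($path)); // a path-based method may exist; see the docs
}
$zip->finish();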
Use e.g. the PhpConcept Library Zip library.
Resuming must be supported by your webserver, except in the case where you don't make the zip files directly accessible. If you have a PHP script acting as a mediator, pay attention to sending the right headers to support resuming.
The script creating the files shouldn't ever time out; just make sure users can't select thousands of files at once. Also keep something in place to remove old zip files, and watch out that a malicious user doesn't use up your disk space by requesting many different file collections.
You're going to have to store the generated zip file if you want users to be able to resume downloads.
Basically, you generate the zip file and chuck it in a /tmp directory with a repeatable filename (a hash of the search filters, maybe). Then you send the correct headers to the user and echo file_get_contents() to the user.
To support resuming you need to check the $_SERVER['HTTP_RANGE'] value; its format is detailed here, and once you've parsed that you'll need to run something like this:
$size = filesize($zip_file);
if (isset($_SERVER['HTTP_RANGE'])) {
    // parse a header of the form "Range: bytes=start-end"
    list(, $seek_range) = explode('=', $_SERVER['HTTP_RANGE'], 2);
    $range = explode('-', $seek_range);
    $start = (int) $range[0];
    $end   = ($range[1] !== '') ? (int) $range[1] : $size - 1;
    $new_length = $end - $start + 1;
    header("HTTP/1.1 206 Partial Content");
    header("Content-Length: $new_length");
    header("Content-Range: bytes $start-$end/$size");
    echo file_get_contents($zip_file, false, null, $start, $new_length);
} else {
    header("Content-Length: " . $size);
    echo file_get_contents($zip_file);
}
This is very sketchy code; you'll probably need to play around with the headers and the contents of the HTTP_RANGE variable a bit. You can use fopen() and fread() rather than file_get_contents() if you wish, and just fseek() to the right place.
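For example, a minimal sketch of the fopen()/fseek() variant, streaming the requested range in chunks rather than loading it all into memory ($zip_file, $start and $new_length come from the range parsing above):
// Stream $new_length bytes starting at offset $start, 8 KB at a time
$fp = fopen($zip_file, 'rb');
fseek($fp, $start);
$remaining = $new_length;
while ($remaining > 0 && !feof($fp)) {
    $chunk = fread($fp, min(8192, $remaining));
    echo $chunk;
    flush();                       // push the data out to the client as we go
    $remaining -= strlen($chunk);
}
fclose($fp);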
Now, to your questions:
PHP has an execution timeout for scripts. While it can be changed by the script itself, will there be any problems if I remove it completely?
You can remove it if you want to. However, if something goes pear-shaped and your code gets stuck in an infinite loop, that can lead to interesting problems, say if that infinite loop is logging an error somewhere and you don't notice until a rather grumpy sysadmin wonders why the server ran out of hard disk space ;)
With the resume option, there is the possibility of the filter results changing for different HTTP requests. This might be mitigated by sorting the results chronologically, as the collection is only getting bigger. The request URL would then also include a date when it was originally created and the script would not consider files younger than that. Will this be enough?
Caching the file to the hard disk means you won't have this problem.
Will passing large amounts of file data through PHP not be a performance hit in itself?
Yes, it won't be as fast as a regular download served by the webserver, but it shouldn't be too slow.
I have a download page and made a zip class that is very similar to your ideas.
My downloads are very big files that can't be zipped properly with the zip classes out there.
And I had ideas similar to yours.
The approach of giving up compression is very good: not only do you need fewer CPU resources, you also save memory because you don't have to touch the input files and can simply pass them through. You can also calculate everything, such as the zip headers and the final file size, very easily, and you can jump to any position and generate from that point to implement resuming.
I go even further: I generate one checksum from all the input files' CRCs and use it as an ETag for the generated file to support caching, and also as part of the filename.
If you have already downloaded the generated zip file, the browser gets it from the local cache instead of from the server.
You can also adjust the download rate (for example 300 KB/s).
You can add zip comments.
You can choose which files are added and which are not (for example Thumbs.db).
But there's one problem that you can't completely overcome with the zip format: the generation of the CRC values.
Even if you use hash_file() to overcome the memory problem, or hash_update() to generate the CRC incrementally, it will use too much CPU.
Not much for one person, but it's not recommended for professional use.
I solved this with an extra table of CRC values that I generate with a separate script.
I pass these CRC values to the zip class as a parameter.
With this, the class is ultra fast, like a regular download script, as you mentioned.
My zip class is a work in progress; you can have a look at it here: http://www.ranma.tv/zip-class.txt
I hope it helps someone :)
But I will discontinue this approach; I'm going to rewrite my class as a tar class.
With tar I don't need to generate CRC values for the files; tar only needs some checksums for the headers, that's all.
And I don't need an extra MySQL table any more.
I think it makes the class easier to use if you don't have to create an extra CRC table for it.
It's not that hard, because tar's file structure is simpler than the zip structure.
PHP has an execution timeout for scripts. While it can be changed by the script itself, will there be any problems if I remove it completely?
If your script is safe and it stops on user abort, then you can remove it completely.
But it would be safer to just renew the timeout for every file that you pass through :)
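A tiny sketch of renewing the timeout inside the per-file loop (the loop and variable names are assumptions):
// Hypothetical loop over the files going into the archive
foreach ($selectedFiles as $path) {
    set_time_limit(30);       // reset the execution timer for each file
    ignore_user_abort(false); // let PHP stop the script if the client disconnects
    // ... append $path to the archive stream here ...
}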
With the resume option, there is the possibility of the filter results changing for different HTTP requests. This might be mitigated by sorting the results chronologically, as the collection is only getting bigger. The request URL would then also include a date when it was originally created and the script would not consider files younger than that. Will this be enough?
Yes, that would work.
I generate a checksum from the input files' CRCs and use it as an ETag and as part of the zip filename.
If something changes, the user can't resume the generated zip, because the ETag and filename change together with the content.
Will passing large amounts of file data through PHP not be a performance hit in itself?
No, if you only pass the data through it will not use much more than a regular download.
Maybe 0.01% more, I don't know; it's not much :)
I assume that's because PHP doesn't do much with the data :)
You can use ZipStream or PHPZip, which will send zipped files on the fly to the browser, divided into chunks, instead of loading the entire content in PHP and then sending the zip file.
Both libraries are nice and useful pieces of code. A few details:
ZipStream "works" only with memory, but cannot be easily ported to PHP 4 if necessary (uses hash_file())
PHPZip writes temporary files on disk (consumes as much disk space as the biggest file to add in the zip), but can be easily adapted for PHP 4 if necessary.