More efficient way of accessing/rewriting uploaded images - php

We currently store image uploads in the following directory and format:
/uploads/products/{product_id}.jpg
However, we don't want this uploads directory to be shown publicly, nor do we want to expose a product's unique ID, so we rewrite the requested image URLs as follows (using .htaccess and PHP):
/images/products/{product_url}.jpg
For reference, this corresponds to the product in question, such as:
/products/{product_url}
This has the advantages of hiding both the original upload folder and the unique ID of each product, by rewriting the ID to the corresponding URL of that product (which is public knowledge anyway).
The rewriting part of this works great; we process each request via .htaccess and then use PHP to query the database for the product's URL based on the given ID. However, since this process can be called numerous times per page, the number of database connections becomes ridiculous and slow. On certain pages we end up reconnecting to the database 20+ times, once for each product image requested, which feels completely wrong and probably isn't efficient at all.
Is there a better way to manage this process? We'd still like to keep rewriting the images to show the URL of the product rather than expose the product's ID or indeed uploads folder, if possible.
I've thought of generating a JSON file with a list of ID => URL pairs and parsing it on each image request instead of reconnecting to and querying the database, but I'm not sure whether this would be a valid, faster alternative.
(I've also contemplated persistent database connections but I'd rather not go down that route for now, if any other viable solutions exist instead.)
EDIT:
Some more information might help. We currently use .htaccess to rewrite the above image requests to a single file, image.php:
RewriteRule ^images/products/(.*)$ image.php?url=$1 [L,QSA]
Every time an image is requested, this file is called; it checks that the URL is valid and, if so, outputs the real file located under /uploads/products/{product_id}.jpg.
So every time a browser encounters an <img> tag pointing to /images/products/..., it triggers the rewrite/database process anew for each image loaded, which is what I'm really questioning the efficiency of (hence a new database connection each time too).
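A minimal sketch of the JSON-map idea proposed in the question (the map file's name and location, and the slug-to-ID layout, are assumptions): image.php parses a pregenerated slug => ID map once per request instead of opening a database connection for every image.

```php
<?php
// image.php - sketch of the proposed JSON-map lookup. The map file is
// regenerated whenever a product is added or renamed, e.g.:
//   {"blue-widget": 42, "red-widget": 43}

function resolveProductId(string $slug, string $mapFile): ?int
{
    static $map = null;               // parse the JSON once per request
    if ($map === null) {
        $map = json_decode(file_get_contents($mapFile), true) ?: [];
    }
    return $map[$slug] ?? null;
}

// Request handling (skipped when run from the command line):
if (PHP_SAPI !== 'cli') {
    $slug = basename($_GET['url'] ?? '', '.jpg');
    $id   = resolveProductId($slug, __DIR__ . '/cache/products.json');
    if ($id === null) {
        http_response_code(404);
        exit;
    }
    header('Content-Type: image/jpeg');
    readfile(__DIR__ . "/uploads/products/{$id}.jpg");
}
```

Regenerating the file on every product change keeps it in sync; a PHP array returned from an included file would work similarly and benefits from the opcode cache.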

Related

Generate unique URL for each customer downloading the same file

I need to deliver a file
example.com/realpathofthe/file.zip
to customers, but I don't want to communicate the same URL to all customers (they could easily share the URL with non-customers, and it would be difficult to track whether the product was delivered or not). Instead, I'm generating a random string in PHP and want to share a URL like:
example.com/download/djbf6xu83/file.zip
which will be different for each customer.
Question: should I generate symlinks for each customer to link the random string path to the path of the actual file on server?
Or should I use a RewriteRule in .htaccess for this? But if done that way (i.e. rewriting /download/*/file.zip to the actual file), all random strings would link to the same file. This is no good, because a non-customer could generate a download link himself.
How to handle this correctly?
Note: if possible, I'd like to avoid having PHP process gigabytes of file data (through file_get_contents()) before delivering it. I thought (please correct me if I'm wrong) that it would be lighter for the server to let Apache distribute the file.
There can be many ways to approach this problem. Here's what I suggest.
Make a file, say /download.php, and pass in a download code as an HTTP GET variable, so it'd look something like /download.php?code=abcdef. Meanwhile, generate and store a code for each customer in a database, and check that the code exists when someone opens download.php. Easy to track, and no complex directory structure is created.
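A sketch of that approach (the table name, column, and credentials are assumptions). To also address the concern about PHP reading gigabytes into memory, the response hands the file back to Apache via mod_xsendfile rather than streaming it through PHP:

```php
<?php
// download.php - sketch of the code-check approach. The table name
// (downloads) and column (code) are assumptions.

function isValidCode(PDO $db, string $code): bool
{
    $stmt = $db->prepare('SELECT 1 FROM downloads WHERE code = ?');
    $stmt->execute([$code]);
    return (bool) $stmt->fetchColumn();
}

if (PHP_SAPI !== 'cli') {
    $db = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');
    if (!isValidCode($db, $_GET['code'] ?? '')) {
        http_response_code(403);
        exit;
    }
    // With mod_xsendfile installed, Apache streams the file itself, so
    // PHP never holds the file contents in memory:
    header('X-Sendfile: /realpathofthe/file.zip');
    header('Content-Type: application/zip');
    header('Content-Disposition: attachment; filename="file.zip"');
}
```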

file_exists on Amazon S3

I have a web page that lists thousands of links to image files. Currently this is handled with a very large HTML file that is manually edited with the name of each image and links to the image file. The images aren't managed very well, so many of the links are broken or the names are wrong.
Here is an example of one line of the thousands of lines in the HTML file:
<h4>XL Green Shirt</h4>
<h5>SKU 158f15 </h5>
[TIFF]
[JPEG]
[PNG]
<br />
I have the product information about the images in a database, so my solution was to write a PHP page that iterates through each of the product numbers in the database, checks whether a file exists with the same ID, and then displays the appropriate link and information.
I did this with the PHP function file_exists(), since the product ID is the same as the file name, and it worked fine on my local machine. The problem is that all the images are hosted on Amazon S3, so running this function thousands of times against S3 always causes the request to time out. I've tried similar PHP functions, as well as pinging each URL and testing for a 200 or 404 response; all time out.
Is there a solution that can check the existence of a file on a remote URL and consume few resources? Or is there a more novel way I can attack this problem?
I think you would be better served by enforcing the existence of the file when placing the record in the database, rather than checking for the existence of thousands of files on each and every page load.
That being said, an alternate solution would be to use s3fs with a local storage cache directory within which to check for the file's existence. This would be much faster than checking your S3 storage directly. s3fs also provides a convenient way to write new files into S3 storage.
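A rough sketch of the s3fs route (the mount point and cache path below are assumptions): once the bucket is mounted, plain file_exists() calls work against the mount, and s3fs's use_cache option keeps repeat lookups local.

```php
<?php
// Assumes the bucket has been mounted beforehand, e.g.:
//   s3fs mybucket /mnt/s3-images -o use_cache=/tmp/s3cache
// After that, ordinary filesystem checks work against the mount.

function productImageExists(string $sku, string $mountDir, string $ext = 'jpg'): bool
{
    return file_exists("$mountDir/$sku.$ext");
}
```

For example, productImageExists('158f15', '/mnt/s3-images', 'png') would test for the PNG variant of the sample SKU in the question.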

Store uploaded user images

This is theoretical question.
Twitter keeps user profile images as follows:
https://twimg0-a.akamaihd.net/profile_images/2044921128/finals_normal.png
It's impossible to imagine that they have a server which contains 2044921128 directories (for example). Maybe this URL is created using mod_rewrite?
So how do you store an extremely large number of user images?
How would you complete this scheme:
The user chooses an image that's supposed to be their profile picture, and a PHP script uploads it.
The PHP script renames it, sets the PATH where the image will be stored, moves it, and finally adds this path to the database for further use.
So what should the PATH look like?
Nothing says that Akamai (which stores the pictures for Twitter based on your URL) actually stores the files in a directory structure. It's entirely possible that they are stored in memory (backed by say a directory structure), in a database (SQL / NoSQL) or any other storage mechanism that Akamai finds efficient.
You can route all requests for a URL that starts with
https://yourService.com/profile_images/
to a PHP script of your choice, which then parses the rest of the URL to determine which image is being requested, and stores/retrieves it from whatever storage mechanism you want (perhaps a database) based on the parsed URL.
Here's a short blog post that shows one method of doing that using mod_rewrite
http://www.phpaddiction.com/tags/axial/url-routing-with-php-part-one/
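A minimal sketch of the parsing step, assuming all /profile_images/ requests have already been rewritten to this script (the regex and field names are assumptions; the storage lookup itself is deliberately left abstract):

```php
<?php
// Assuming a rewrite sends /profile_images/... requests here, parse the
// remainder of the URL to find out which image is wanted.

function parseProfileImagePath(string $uri): ?array
{
    // e.g. /profile_images/2044921128/finals_normal.png
    if (preg_match('#^/profile_images/(\d+)/([\w.-]+)$#', $uri, $m)) {
        return ['user_id' => (int) $m[1], 'file' => $m[2]];
    }
    return null;
}

if (PHP_SAPI !== 'cli') {
    $req = parseProfileImagePath($_SERVER['REQUEST_URI']);
    if ($req === null) {
        http_response_code(404);
        exit;
    }
    // fetch the image for $req['user_id'] / $req['file'] from your storage
}
```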
Most OSes discourage having more than about 1024 directories/files within a single directory, as anything above that slowly makes scanning and locating specific resources slower, so I think it is safe to assume that Akamai would not have 2044921128 directories within profile_images!
Either it is a special unique identifier generated within profile_images, or one of the numerous ways in which URL routing can be used to locate a resource. In any case, I do not think it corresponds to a number of directories.

Best way to store and deliver files on a CMS

First of all, this isn't another question about storing images in a DB vs. the file system. I'm already storing the images on the file system; I'm just struggling to find a better way to show them to all my users.
I'm currently using this system: I generate a random filename using md5(uniqid()). The problem I'm facing is that the image is then presented with a link like this:
<img src="/_Media/0027a0af636c57b75472b224d432c09c.jpg" />
As you can see, this isn't the prettiest way to show an image (or a file), and the name says nothing about the image.
I've been working with a CMS at work that stores all uploaded files in a single table; to access an image it uses something like this:
<img src="../getattachment/94173104-e03c-499b-b41c-b25ae05a8ee1/Menu-1/Escritorios.aspx?width=175&height=175" />
As you can see, the path to the image now has meaning, compared to the other one. The problem is that this puts a big strain on the DB. For example, on the last site I made, I have an area with around 60 images; to show those 60 images I would have to make at least 60 individual queries to the database, besides the other queries needed to retrieve the various content on the page.
I think you understand my dilemma. Has anyone gone through this problem who can give me some pointers on how to solve it?
Thanks.
You could always use .htaccess to rewrite the URL and strip the friendly name. Taking your example, you could display the image source as something like:
/Media/0027a0af636c57b75472b224d432c09c/MyPictures/Venice.jpg
You could then use .htaccess to actually request:
/Media/0027a0af636c57b75472b224d432c09c.jpg
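A sketch of that rewrite (the 32-hex-character pattern matching the md5-style names is an assumption): the friendly segment is simply ignored and the hash alone selects the real file.

```apache
RewriteEngine On
# /Media/0027a0af636c57b75472b224d432c09c/MyPictures/Venice.jpg
#   -> internally served from /Media/0027a0af636c57b75472b224d432c09c.jpg
RewriteRule ^Media/([a-f0-9]{32})/.+\.jpg$ /Media/$1.jpg [L]
```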
The other option is to have a totally friendly name:
/Media/MyPictures/Venice.jpg
And redirect it to a PHP file which examines the URL and generates the hash, so that it then knows the actual image file name on the server. The PHP script should then set the content type, read the image, and output it. The major downside of this method is that you may end up with collisions, as two images may produce the same hash. Given that the same thing can also occur with your current method, I assume it isn't an issue.
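A sketch of that second option (the md5-of-the-friendly-path naming scheme is an assumption about how names map to stored files):

```php
<?php
// Maps a friendly path such as MyPictures/Venice.jpg to the hashed file
// name actually stored on disk, then serves it.

function mediaPathToFile(string $friendlyPath, string $mediaDir): string
{
    return $mediaDir . '/' . md5($friendlyPath) . '.jpg';
}

if (PHP_SAPI !== 'cli') {
    $file = mediaPathToFile($_GET['path'] ?? '', __DIR__ . '/_Media');
    if (!is_file($file)) {
        http_response_code(404);
        exit;
    }
    header('Content-Type: image/jpeg');
    readfile($file);
}
```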

Linking an image to a PHP file

Here's a bit of history first: I recently finished an application that allows me to upload images and store them in a directory; it also stores the file's information in a database. The database stores the location and name, and gives each file an ID (auto_increment).
Okay, so what I'm doing now is allowing people to insert images into posts. I'm throwing a few ideas around about the best way to do this: the application I designed allows people to move files around, and I don't want images in posts to break if an image is moved to a different directory (hence the storing of IDs).
What I'm thinking of doing is when linking to images, instead of linking to the file directly, I link it like so:
<img src="/path/to/functions.php?method=media&id=<IMG_ID_HERE>" alt="" />
So it takes the ID, searches the database, then from there determines the mime type and what not, then spits out the image.
So really, my question is: Is this the most efficient way?
Note that on a single page there could be from 3 to 30 images, all making a call to this function.
Doing that should be fine as long as you are aware of the memory limits configured in both PHP and the web server. (Though you'll run into those problems merely by receiving the file in the first place.)
Otherwise, if you're strict about this being just for images, it could prove more efficient to go with Mike B's approach: designate a static area, just drop the images off in there, and record those locations in the records for their associated posts. It's less work and less to worry about... and I'm willing to bet your web server is better at serving files than most developers' custom application code will be.
Normally I would recommend keeping the src of an image static (instead of pointing at a PHP script), but if you're allowing users to move images around the filesystem, you need a way to track them.
Some form of caching would help reduce the number of database calls required to fetch the filesystem location of each image. It should be pretty easy to put an indefinite TTL on the cache entries and invalidate one when its image is moved.
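A sketch of that caching layer; this version only memoizes per request, and a shared cache such as APCu or memcached (not shown) would extend it across requests. $lookup stands in for the database query.

```php
<?php
// Caches each image's filesystem location so the database is queried at
// most once per image id.

function getImagePath(int $id, callable $lookup): string
{
    static $cache = [];               // survives for the rest of the request
    if (!isset($cache[$id])) {
        $cache[$id] = $lookup($id);   // the expensive database lookup
    }
    return $cache[$id];
}
```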
I don't think you should worry about that, what you have planned sounds fine.
But if you want to go out of your way to minimise requests or whatever, you could instead do the following: when someone embeds an image in a post, replace the anchor tag with some special character sequence, like [MYIMAGE=1234] or something. Then when a page with one or more posts is viewed, search through all the posts to find all the [MYIMAGE=] sequences, query the database to get all of the images' locations, and then output the posts with the [MYIMAGE=] sequences replaced with the appropriate anchor tags. You might or might not want to make sure users cannot directly add [MYIMAGE=] tags to their submitted content.
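A sketch of that substitution (the tag syntax comes from the answer above; $lookupMany, which resolves a list of IDs to paths in a single query, is a placeholder):

```php
<?php
// Expand [MYIMAGE=1234] tags: collect every id in one pass, resolve them
// all with one query, then substitute the <img> tags.

function expandImageTags(string $post, callable $lookupMany): string
{
    preg_match_all('/\[MYIMAGE=(\d+)\]/', $post, $m);
    $paths = $lookupMany(array_unique($m[1]));   // id => path, one query
    return preg_replace_callback('/\[MYIMAGE=(\d+)\]/', function ($t) use ($paths) {
        $path = $paths[$t[1]] ?? null;
        return $path ? '<img src="' . htmlspecialchars($path) . '" alt="" />' : '';
    }, $post);
}
```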
The way you have suggested will work, and it's arguably the nicest solution, but I should warn you that I've tried something similar before and it completely fell apart under load. The database seemed to be keeping up, but the script would start to time out and the image wouldn't arrive. That was probably down to some particular server configuration, but it's worth bearing in mind.
Depending on how much access you have to the server it's running on, you could just create a symlink whenever the user moves a file. It's a little messy but it'll be fast and reliable, and will also handle collisions if a user moves a file to where another one used to be.
Use the format proposed by Hammerite, and use [MYIMAGE=1234] tags (or something similar).
You can then fetch the ID-to-path mappings before display and replace the [MYIMAGE] tags with proper tags that link to the images directly. This will yield much better performance than outputting images through PHP.
You could even bypass the database completely and simply use image paths like (for example) /images/hash(IMAGEID).jpg.
(If there are different file formats, use [MYIMAGE=1234.png], so you can append png/jpg/whatever without a database call)
If the need arises to change the image locations, output method, or anything else, you only need to change the method where [MYIMAGE] tags are converted to full file paths.