This is a theoretical question.
Twitter stores user profile images at URLs like the following:
https://twimg0-a.akamaihd.net/profile_images/2044921128/finals_normal.png
It's hard to imagine that they have a server containing 2044921128 directories (for example). Maybe this URL is created using mod_rewrite?
So how do you store an extremely large number of user images?
How to complete this scheme:
The user chooses an image, and a PHP script uploads it as their profile picture.
The PHP script renames it, sets the PATH where the image will be stored, moves it there, and finally adds this path to the database for further use.
So what should the PATH look like?
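For concreteness, the part I have so far looks roughly like this (everything is illustrative; the TODO is exactly what I'm asking about):
<?php
// Upload handler sketch; $userId is assumed to come from the session.
$tmp  = $_FILES['avatar']['tmp_name'];
$ext  = strtolower(pathinfo($_FILES['avatar']['name'], PATHINFO_EXTENSION));
$name = bin2hex(random_bytes(8)) . '.' . $ext;  // rename to something unique

$path = '/uploads/' . $name;                    // TODO: what should this PATH scheme be?
move_uploaded_file($tmp, $_SERVER['DOCUMENT_ROOT'] . $path);

// Add the path to the database for further use.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$pdo->prepare('UPDATE users SET avatar_path = ? WHERE id = ?')
    ->execute([$path, $userId]);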
Nothing says that Akamai (which stores the pictures for Twitter, based on your URL) actually stores the files in a directory structure. It's entirely possible that they are stored in memory (backed by, say, a directory structure), in a database (SQL/NoSQL), or in any other storage mechanism that Akamai finds efficient.
You can route all requests for a URL that starts with
https://yourService.com/profile_images/
to a PHP script of your choice, which then parses the rest of the URL to determine which image is being requested, and stores/retrieves it from whatever storage mechanism you want (perhaps a database) based on the parsed URL.
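For illustration, the script those requests are routed to could be as simple as this (fetchImage() is a hypothetical lookup against whatever backend you pick):
<?php
// image.php - every /profile_images/... request is routed here.
$uri   = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
$parts = explode('/', trim($uri, '/'));  // e.g. ['profile_images', '2044921128', 'finals_normal.png']

if (count($parts) !== 3 || $parts[0] !== 'profile_images') {
    http_response_code(404);
    exit;
}

// Hypothetical: fetch the bytes from a database, object store, disk, etc.
$image = fetchImage($parts[1], $parts[2]);

header('Content-Type: image/png');
echo $image;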
Here's a short blog post that shows one method of doing that using mod_rewrite:
http://www.phpaddiction.com/tags/axial/url-routing-with-php-part-one/
Most operating systems handle large directories poorly: once a single directory holds more than a thousand or so files or subdirectories, scanning and locating a specific resource within it gets slower. So I think it is safe to assume Akamai would not have 2044921128 directories within profile_images!
It is either a unique identifier generated within profile_images, or one of the numerous ways in which URL routing can be used to locate a resource. In any case, I do not think it corresponds to an actual number of directories.
At the moment we store image uploads in the following directory and format:
/uploads/products/{product_id}.jpg
However, we don't want this uploads directory to be shown publicly, and nor do we want to expose a product's unique ID, so we rewrite the requested image URLs as follows (using .htaccess and PHP):
/images/products/{product_url}.jpg
For reference, this corresponds to the product in question, such as:
/products/{product_url}
This has the advantage of hiding the original upload folder as well as the unique ID of each product, by rewriting the ID to the corresponding URL of that product (which is obviously public knowledge).
The rewriting part of this works great; we process each request via .htaccess and then use PHP to query the database, mapping the requested URL back to the product's ID. However, as this process can be called numerous times per page, the number of connections to the database can get ridiculous and slow. On certain pages we end up reconnecting to the database 20+ times, once for each product image requested, which feels completely wrong and probably isn't very efficient at all.
Is there a better way to manage this process? We'd still like to keep rewriting the images to show the URL of the product rather than expose the product's ID or indeed the uploads folder, if possible.
I've thought of generating a JSON file with a list of ID => URL pairs and parsing that on each image request instead of reconnecting to and querying the database, but I'm not sure whether this would be a valid, faster alternative.
(I've also contemplated persistent database connections but I'd rather not go down that route for now, if any other viable solutions exist instead.)
EDIT:
Some more information might help. We currently use .htaccess to rewrite the above image requests to a single file, image.php:
RewriteRule ^images/products/(.*)$ image.php?url=$1 [L,QSA]
Every time an image is requested this file is called; it checks that the URL is valid and, if so, outputs the real file located under /uploads/products/{product_id}.jpg.
So every time a browser encounters an <img> tag pointing to /images/products/..., the rewrite/database process runs again for that one image, which is what makes me question the efficiency of all this (a new database connection each time included).
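To make the JSON idea concrete, image.php could skip the database entirely and read a pre-generated map instead, regenerating the file only when products change (a sketch; paths and names are illustrative):
<?php
// image.php - look up the product ID from a cached url => id map.
$map = json_decode(file_get_contents(__DIR__ . '/cache/product_map.json'), true);

$url = basename($_GET['url'] ?? '', '.jpg');    // strip any path and the extension

if (!isset($map[$url])) {
    http_response_code(404);
    exit;
}

header('Content-Type: image/jpeg');
readfile($_SERVER['DOCUMENT_ROOT'] . '/uploads/products/' . $map[$url] . '.jpg');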
This is a best practice question regarding how to handle user uploads and distribute static content to a large number of concurrent users.
I have an upload form for images (png, jpg, gif) and other forms of multimedia (mp4, webm). The files are created, hashed, and stored in storage/app/attachments/ as their hash with no extension.
The request URL /file/md5/filename (such as /file/9d42b752ecd0e3b4542aeca21c7c50a9/dancing_cat.gif) will distribute the file with that name. The route is completely flexible, so replacing dancing_cat.gif with boring_cat_dancing_poorly.gif will still fetch the same file, but will distribute it with the new filename specified.
The point of this system is to stop duplicates from being uploaded while preserving the original name of the document as the uploader had it. Other uploads of the same file will likewise keep their own names.
The code I have for this works; however, people take issue with distributing static content through PHP. I am told that on my large, target platform this system will perform poorly and will quickly become a bottleneck. I am told I should use routes in Apache/nginx/Lighttpd/whatever webserver to serve the static file directly by capturing the request URL before it hits PHP, but that may cause issues with MIME types (e.g., an image won't render correctly).
My question is: what is the best practice for achieving what I am doing? How would a big website handle distributing static, user-uploaded content while avoiding a "PHP bottleneck"? I am early enough into my project to consider major rewrites, so please be as informative as possible.
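For context, a minimal sketch of the kind of PHP pass-through being discussed (parameter names and paths are illustrative):
<?php
// /file/{hash}/{name} is assumed to be rewritten to this script
// with hash and name as GET parameters.
$hash = $_GET['hash'] ?? '';
$name = $_GET['name'] ?? 'download';

if (!preg_match('/^[a-f0-9]{32}$/', $hash)) {   // md5 hex only; blocks path tricks
    http_response_code(404);
    exit;
}

$path = __DIR__ . '/storage/app/attachments/' . $hash;
if (!is_file($path)) {
    http_response_code(404);
    exit;
}

// Detect the MIME type from the file contents, not the user-supplied name,
// so images render correctly whatever the requested filename is.
$mime = (new finfo(FINFO_MIME_TYPE))->file($path);
header('Content-Type: ' . $mime);
header('Content-Disposition: inline; filename="' . rawurlencode($name) . '"');
readfile($path);

// On nginx, swapping readfile() for an X-Accel-Redirect header (or X-Sendfile
// on Apache with mod_xsendfile) hands the transfer back to the webserver and
// removes most of the PHP cost while keeping the PHP-side checks.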
I hope I'm clear on what the problem is, but you could try hashing the current user name plus the file name and extension with sha1 (or any shorter encoder that generates a hash); it is quite hard to generate the same hash from that combination. Then prepend the generated hash to the file name saved in your directory. For example:
/file/9d42b752ecd0e3b4542aeca21c7c50a9/gifhse3peo40ed-user_photo.jpg
You could then organize hashes per user, for example creating a specific folder for each user to hold his uploads, so when a user re-uploads any file the code knows where to save it, and where to find it again.
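A quick sketch of that scheme (the session variable, folder layout, and field names are illustrative):
<?php
// Sketch: hash of username + original filename, prefixed to the saved name.
$user = $_SESSION['username'];                  // assumed
$orig = basename($_FILES['upload']['name']);
$hash = sha1($user . $orig);                    // hard to collide for this combination

$dir = __DIR__ . '/uploads/' . $user;           // one folder per user, as suggested
if (!is_dir($dir)) {
    mkdir($dir, 0755, true);
}

move_uploaded_file($_FILES['upload']['tmp_name'], $dir . '/' . $hash . '-' . $orig);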
Hope it helps!
I have a website where users each have their own profile page. Here they can upload a single image that acts as their avatar. This is the only image users can upload to the server across the whole site. There is no archive so it can be overwritten if a user wishes to update their avatar.
I have never had to do anything like this before so I would like to open it up and ask for a suitable, scalable option for this website.
My initial thought is to give each user's image a random name, a string 6-12 characters long. That way nobody could build a script that simply pulls every user's profile picture from a directory (e.g. 001.png, 002.png, etc.). My second thought is that there should only be a certain number of images per directory, to make sure the server can retrieve them quickly.
There may well be other things I'm missing here; I'm not sure of the exact details, hence why I'm asking.
I would recommend storing the images on something like Amazon S3. Depending on how many pictures you're storing, serving images can really take a toll on your web server. S3 is scalable, and with multi-zone deployments through CloudFront (Amazon's CDN) you can really speed up this part of your service.
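If you go that route, a minimal upload sketch using the official AWS SDK for PHP (composer require aws/aws-sdk-php); the bucket, region, and variables are illustrative:
<?php
require 'vendor/autoload.php';

use Aws\S3\S3Client;

$s3 = new S3Client([
    'version' => 'latest',
    'region'  => 'us-east-1',
]);

$s3->putObject([
    'Bucket'      => 'my-avatar-bucket',
    'Key'         => 'avatars/' . $userId . '.png',
    'SourceFile'  => $localPath,          // the file you just validated/processed
    'ContentType' => 'image/png',
]);

// Serve the image afterwards from the S3/CloudFront URL, not from your own server.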
It's a good idea not to overload a single directory. Very often you will see images stored in a hierarchy of folders according to the first few characters of their filenames. An example of this is:
b5dcv5.jpg -> /b/5/b5dcv5.jpg
bsgb0g.jpg -> /b/s/bsgb0g.jpg
a5dcbt.jpg -> /a/5/a5dcbt.jpg
and so on. I think you get the principle. The advantage is access to an image in roughly O(log N) when filenames are uniformly distributed (each directory level fans the files out, so no folder has to be scanned in full), instead of the O(N) you would get with a single-folder solution.
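In PHP the mapping is a couple of lines (a sketch; remember to mkdir the levels before moving files in):
<?php
// The first two characters of the filename become two directory levels.
function shardedPath(string $filename): string {
    return '/' . $filename[0] . '/' . $filename[1] . '/' . $filename;
}

echo shardedPath('b5dcv5.jpg');  // /b/5/b5dcv5.jpg
echo shardedPath('a5dcbt.jpg');  // /a/5/a5dcbt.jpg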
I've been using base64 to store them within an SQL database. There is no need to manage files, and it works well for relatively low-resolution images.
How about not storing them as images at all?
You could use an external placeholder for each user, for example by caching a random image from lorempixel.com: http://lorempixel.com/100/100. Use an MD5 hash of the user's name or ID as the cached filename. Alternatively, just save the image under the user's ID, for example 442.jpg.
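For instance (a sketch; the size and the $username variable are illustrative):
<?php
// Cache one placeholder per user; the hash keeps the filename unguessable.
$file = __DIR__ . '/avatars/' . md5($username) . '.jpg';

if (!is_file($file)) {
    file_put_contents($file, file_get_contents('http://lorempixel.com/100/100/'));
}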
What are some ideas out there for storing images on web servers? I'm interacting with PHP and MySQL for the application.
Question 1
Do we change the name of the physical file to a000000001.jpg and store it in a base directory, or keep the user's unmanaged file name, e.g. 'Justin Beiber Found dead.jpg'? For example:
wwwroot/imgdir/a0000001.jpg
and all metadata in a database, such as FileName, ReadableName, Size, Location, etc.
I need to build a custom file manager and am just weighing up some pros and cons of the underlying structure for storing the images.
Question 2
How would I secure an image from being downloaded if my app/database has not set it as published/public?
In my app I can publish images or secure them from download. If I stored an image in a DB table I could store it as a BLOB and use PHP to prevent the user from downloading it. I want to be able to do the same with an image stored in the file system, but I'm not sure whether this is possible with PHP and plain files.
Keeping relevant file names can be good for SEO, but you must also make sure you don't create duplicates.
In all cases I would rename files to lowercase and replace spaces with underscores (or hyphens):
Justin Beiber Found dead.jpg => justin_beiber_found_dead.jpg
If the photo belongs to an article or something specific, you can add the article ID to the image name, e.g. 123_justin_beiber_found_dead.jpg. Alternatively you can store the images in an article-specific folder, e.g. /images/123/justin_beiber_found_dead.jpg.
Naming the files like a0000001 removes all relevance from the files and adds no value whatsoever.
Store (full) filepaths only in the database.
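A small helper for that renaming (a sketch):
<?php
// Lowercase, replace runs of anything non-alphanumeric with underscores.
function slugifyFilename(string $name): string {
    $ext  = strtolower(pathinfo($name, PATHINFO_EXTENSION));
    $base = strtolower(pathinfo($name, PATHINFO_FILENAME));
    $base = preg_replace('/[^a-z0-9]+/', '_', $base);
    return trim($base, '_') . '.' . $ext;
}

echo slugifyFilename('Justin Beiber Found dead.jpg'); // justin_beiber_found_dead.jpg
// With the article ID prefix suggested above: 123_justin_beiber_found_dead.jpg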
For part 2:
I'm not sure what the best solution here is, but using the file system, I think you will have to configure Apache to serve all files in a particular directory through PHP. In PHP you can then check whether the file may be published and, if so, output it; if not, you can serve a dummy image. This, however, is not very efficient and will be much heavier on Apache.
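A sketch of that gatekeeper (the table and column names are illustrative):
<?php
// image.php - Apache rewrites /images/{id} here; serve only if published.
$id = (int) ($_GET['id'] ?? 0);

$pdo  = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$stmt = $pdo->prepare('SELECT published FROM images WHERE id = ?');
$stmt->execute([$id]);

$file = $stmt->fetchColumn()
    ? '/var/uploads/' . $id . '.jpg'        // real file, outside the web root
    : __DIR__ . '/img/placeholder.jpg';     // dummy image for unpublished files

header('Content-Type: image/jpeg');
readfile($file);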
I have an image that I send to affiliates for advertising.
How can I find out from my server the number of times that image has been downloaded?
Does the server log keep track of an image's download count?
---- Addition ----
Thanks for the reply. A few more questions:
Because I want to do ad rotation, track IP addresses, etc., I think I should do it by making a dynamic page (PHP) that returns the proper image, right?
In that case, is there any way I can send that information to Google Analytics from the server? I know I can do it in JavaScript, but here the PHP script should just return the image file, so what should I do? :)
Well, this can be done irrespective of your web server or language/platform.
Assuming the file is physically stored in a certain directory, write a program that works out which file has to be downloaded (through GET/POST parameters; there are other ways too), then point it at that particular file on disk and:
fopen that file
read through it byte by byte
print the bytes
fclose it
store/increment/update the download counter in a database/flat file
In the database you may keep the record as md5checksum -> downloadCounter.
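Put together, those steps look roughly like this in PHP (readfile replaces the byte-by-byte loop; the table and column names are illustrative):
<?php
// download.php?file=banner.jpg - serve the image and count the hit.
$name = basename($_GET['file'] ?? '');          // basename() blocks path traversal
$path = __DIR__ . '/ads/' . $name;

if ($name === '' || !is_file($path)) {
    http_response_code(404);
    exit;
}

header('Content-Type: image/jpeg');
readfile($path);                                // fopen/read/print/fclose in one call

// md5checksum -> downloadCounter, as suggested; needs a unique key on md5sum.
$pdo = new PDO('mysql:host=localhost;dbname=stats', 'user', 'pass');
$pdo->prepare('INSERT INTO downloads (md5sum, counter) VALUES (?, 1)
               ON DUPLICATE KEY UPDATE counter = counter + 1')
    ->execute([md5_file($path)]);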
It depends on the server and on how the image is served.
1) Static image (i.e. the URL points to the actual image file): most servers (e.g. Apache) record each URL served, including the GET request for the image, in an access log. There are a host of solutions for slicing and dicing web server access logs (especially Apache's) and obtaining all sorts of statistics, including access counts.
2) Another approach, for fancier stuff, is to serve the image by linking to a dynamic page that does some sort of computation (from a simple counter increment to some fancy statistics collection) and responds with an HTTP redirect to the real image.
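A sketch of that redirect variant (paths are illustrative):
<?php
// count.php?img=banner.jpg - record the hit, then redirect to the real image.
$img = basename($_GET['img'] ?? '');

// ...increment your counter here, as described above...

header('Location: /static/images/' . $img, true, 302);
exit;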
Use Galvanize, a PHP class for GA, which lets you make trackPageView calls (for a virtual page representing your download, like the file's URL) from PHP.
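If you'd rather not add a library, Google Analytics' Measurement Protocol can also be called straight from PHP; a minimal sketch (the tracking ID is a placeholder):
<?php
// Record a pageview for a virtual page representing the download.
$hit = http_build_query([
    'v'   => 1,                             // Measurement Protocol version
    'tid' => 'UA-XXXXXXX-1',                // your GA tracking ID (placeholder)
    'cid' => md5($_SERVER['REMOTE_ADDR']),  // anonymous client id
    't'   => 'pageview',
    'dp'  => '/virtual/downloads/banner.jpg',
]);
file_get_contents('https://www.google-analytics.com/collect?' . $hit);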
The HTTP log should contain a GET entry for every time that image was accessed.
You should be able to configure your server to log each download. Then, you can just count the number of times the image appears in the log file.