file_exists on Amazon S3 - php

I have a web page that lists thousands of links to image files. Currently this is handled with a very large HTML file that is manually edited with the name of each image and links to its files. The images aren't managed very well, so many of the links are broken or the names are wrong.
Here is an example of one entry out of the thousands in the HTML file:
<h4>XL Green Shirt</h4>
<h5>SKU 158f15 </h5>
[TIFF]
[JPEG]
[PNG]
<br />
I have the product information about the images in a database, so my solution was to write a page in PHP to iterate through each of the product numbers in the database and see if a file existed with the same id and then display the appropriate link and information.
I did this with PHP's file_exists() function, since the product ID is the same as the file name, and it worked fine on my local machine. The problem is that all the images are hosted on Amazon S3, so running this function thousands of times against S3 always causes the request to time out. I've tried similar PHP functions, as well as pinging each URL and testing for a 200 or 404 response; all of them time out.
Is there a solution that can check the existence of a file on a remote URL and consume few resources? Or is there a more novel way I can attack this problem?

I think you would be better served by enforcing the existence of a file when you insert the record into the database than by checking for the existence of thousands of files on every single page load.
That being said, an alternative solution would be to use s3fs with a local cache directory, and check for the file's existence there. This would be much faster than checking your S3 storage directly. s3fs would also provide a convenient way to write new files into S3 storage.
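A minimal sketch of checking a local copy instead of S3: build an index of file names once per page and look each SKU up in memory, so rendering thousands of rows needs no per-file remote request. It assumes `$dir` is a local directory whose file names mirror the bucket (for example a local copy or cache), and that the files are named `<sku>.tiff`, `<sku>.jpg` and `<sku>.png`; the directory, base URL and extensions are all assumptions.

```php
<?php
// Sketch: check existence against an in-memory index built once,
// instead of calling file_exists() against S3 for every product.
// Assumption: $dir is a local directory mirroring the bucket's file names.

/** Build a set of file names present in $dir (keys are names, values true). */
function buildFileIndex(string $dir): array
{
    $names = scandir($dir);
    if ($names === false) {
        return [];
    }
    return array_fill_keys(array_diff($names, ['.', '..']), true);
}

/** Render one product entry, linking only the formats whose files exist. */
function renderProduct(string $name, string $sku, array $index, string $baseUrl): string
{
    $html = '<h4>' . htmlspecialchars($name) . "</h4>\n"
          . '<h5>SKU ' . htmlspecialchars($sku) . "</h5>\n";
    // Hypothetical extension-to-label mapping matching the TIFF/JPEG/PNG links.
    foreach (['tiff' => 'TIFF', 'jpg' => 'JPEG', 'png' => 'PNG'] as $ext => $label) {
        $file = $sku . '.' . $ext;
        if (isset($index[$file])) { // O(1) array lookup, no network request
            $html .= '<a href="' . htmlspecialchars($baseUrl . rawurlencode($file)) . '">'
                   . $label . "</a>\n";
        }
    }
    return $html . "<br />\n";
}
```

Usage would be one `buildFileIndex()` call before the loop over the products from the database, then `echo renderProduct($name, $sku, $index, $baseUrl);` per row. Keeping the index in memory is what turns thousands of remote checks into a single listing.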

Related

For image upload, should I add a field to the MYSQL database to check against, or simply use PHP to check if the image exists?

I have a simple image upload form. When someone uploads an image, it is for a football pool, so there is always a $poolid that goes with the image they upload.
Right now, I am naming the uploaded image using the poolid. So for example, if someone uploads an image, it might get named P0714TYER7EN.png.
All the app will ever do is, when it outputs the football pool's page, it will check to see if an image exists for that pool and if so, it will show it. It checks like this:
if (file_exists("uploads/" . $poolid . ".png")) {
    // code to show it
}
My first thought when planning this was to add a field called "image" to the table holding all the pool information in my MySQL database (called pools), and store either the image name (P0714TYER7EN.png) or an empty value if no image was uploaded. Then I would check that field to determine whether an image exists.
But I realized I don't really need to store anything in the database because I can simply use the PHP file_exists check above to know if there is an image or not.
In other words, it would seem redundant to have a field in the database.
Everything works doing it this way (i.e. NOT having a field in the database) but I'm wondering if this is bad practice for any reason?
If anyone feels that I should absolutely still have a field in the database, please share your thoughts. I just want to do it the proper way.
Thank you.
The approach could depend a lot on what exactly you're trying to do. It seems like your options are:
File System Only
Benefits: fast access to static image files and the ability to use them directly in your HTML, which makes for a simpler solution. Also, if you're comfortable with these functions, it will be faster to finish.
Drawbacks: you're limited to file_exists and similar functions. Any code that manages files this way has to be very specific and static. You also cannot search or perform operations on the files efficiently. In general, relying on the file system alone is not a best practice in my experience.
Database Only
Benefits: you can use a BLOB column with metadata such as owner, uploader, timestamp, etc. in the same row. This makes checking for existing files faster, and makes searching and other operations fast and efficient.
Drawbacks: you can't serve the files statically through a CDN, a cookie-less subdomain, or other page-performance strategies. You also have to use PHP and MySQL to fetch and then serve every image in code, rather than just referring to the image file directly.
Hybrid
Benefits: basically the same as both of the above. You can keep your metadata in MySQL along with an MD5 hash and the location of the file. Your PHP then renders the page with a direct link to the file rather than turning a BLOB into an image. You could also combine this with a CDN by prefixing or storing the CDN location.
Drawbacks: if you manually renamed files on the server, you'd have to rely on matching hashes to detect this, though a File System Only approach that needs to detect duplicate files would face the same problem.
TL;DR: the Hybrid approach is what most software, such as WordPress, uses, and I believe it would be considered a best practice, while File System Only is a bit of a hack.
Note: Database Only could be the best approach in specific situations where you want database clustering and replication of the images themselves in your database rather than in a file system (especially if the file system has restricted access or can't be modified for some reason; then you have full flexibility in the DB).
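A minimal sketch of the Hybrid approach: the image stays a static file, and only metadata (including an MD5 hash, as suggested above) goes into MySQL. The table `pool_images`, its columns and the URL prefix are hypothetical.

```php
<?php
// Hybrid sketch: keep the image as a static file, store only metadata in MySQL.
// Table/column names (pool_images: pool_id, path, md5) and the URL prefix are hypothetical.
// Usage after a successful upload, assuming an existing PDO connection $pdo:
//   $stmt = $pdo->prepare('INSERT INTO pool_images (pool_id, path, md5) VALUES (:pool_id, :path, :md5)');
//   $stmt->execute(imageRecord($poolid, 'uploads/P0714TYER7EN.png'));

/** Build the metadata row for a stored file. */
function imageRecord(string $poolId, string $storedPath): array
{
    return [
        'pool_id' => $poolId,
        'path'    => $storedPath,
        'md5'     => md5_file($storedPath), // lets you detect renamed or duplicate files later
    ];
}

/** Render a direct link to the static file (or CDN copy); empty if no row exists. */
function imageTag(?array $row, string $urlPrefix): string
{
    if ($row === null) {
        return '';
    }
    return '<img src="' . htmlspecialchars($urlPrefix . basename($row['path'])) . '" alt="">';
}
```

Because the page only contains a URL, the web server or CDN serves the bytes, and the database answers "is there an image?" without touching the disk.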
You can also use MySQL's BLOB data types. That way you can save the image as binary data next to the data about the football pool.
So when you want to load a football pool, you simply run an SQL statement and check whether it returns a result; if so, load the image from the database and display the data, otherwise throw an error.
If the images are accessed very frequently, you can put them in a separate table and load each image independently of the football pool data. Additionally, serve the image from a separate script that sets some cache headers; that way you only need to store the image's primary key in the football pool table. When you display the web page, it loads the image from that other script, passing it the primary key; the script loads the image, or, if the browser already has it cached, the browser uses its cache without the database being queried.
This way you also have a better consistency of data and images.
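The separate image script described above might look roughly like this. It is a sketch under assumptions: the table `pool_images` with columns `id`, `mime` and `data` is hypothetical, and the one-year `max-age` is only safe if the image stored under a given primary key never changes (store a replacement under a new key).

```php
<?php
// Sketch of the separate image script (e.g. image.php?id=42).
// Table/column names (pool_images: id, mime, data) are hypothetical.

/** Headers for an image whose bytes never change for a given primary key. */
function imageHeaders(string $mime, int $length, int $maxAge = 31536000): array
{
    return [
        'Content-Type: ' . $mime,
        'Content-Length: ' . $length,
        // Lets the browser reuse its cached copy without requesting it again.
        'Cache-Control: public, max-age=' . $maxAge,
    ];
}

/** Fetch one image BLOB by primary key and send it; 404 if the row does not exist. */
function serveImage(PDO $pdo, int $id): void
{
    $stmt = $pdo->prepare('SELECT mime, data FROM pool_images WHERE id = ?');
    $stmt->execute([$id]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    if ($row === false) {
        http_response_code(404);
        return;
    }
    foreach (imageHeaders($row['mime'], strlen($row['data'])) as $h) {
        header($h);
    }
    echo $row['data'];
}
```

The page itself then only needs `<img src="image.php?id=...">` with the key stored in the pools table.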
You're uploading the image to a specific folder and naming it with the poolid, which is unique. It should work just fine.
Problem :
The code you have written works well. The problem is that if one image is uploaded as a .png and another as a .jpeg or .jpg, your file_exists check won't catch that, so it may fail.
Caution :
If you already make sure that the uploaded image must be a PNG, then file_exists will work great.
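If the upload form does not restrict the type, one way to cover the .png vs .jpg problem without a DB column is to try a short list of extensions. A minimal sketch; the extension list is an assumption and should match whatever the upload form accepts.

```php
<?php
// Sketch: look for the pool image under any of several allowed extensions,
// so a .jpg upload is found as well as a .png one.
// The default extension list is an assumption.

function findPoolImage(string $dir, string $poolId, array $extensions = ['png', 'jpg', 'jpeg', 'gif']): ?string
{
    foreach ($extensions as $ext) {
        $path = $dir . '/' . $poolId . '.' . $ext;
        if (file_exists($path)) {
            return $path; // first match in list order wins
        }
    }
    return null;
}
```

Usage: `$img = findPoolImage('uploads', $poolid);` and show it if the result is not `null`. If a pool can be re-uploaded with a different extension, delete the old file at upload time so only one candidate remains.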
Alternate Solution :
If you're not checking that the image type is .png, I strongly advise adding a boolean column to your table, such as is_image_uploaded, that is set every time a file is uploaded.
That way, the next time you want to upload an image, you can go straight to your database table and check whether the is_image_uploaded column is set. If it isn't, upload; otherwise ignore it or do whatever you want.

Create thumbnails on the fly with cache or on upload?

I'm currently rewriting a website that needs many different sizes of each image. In the past I did this by creating thumbnails in all sizes at upload time. But now I have doubts about its performance, because I have to change my design and half of my images are not the right size. So I'm considering two solutions:
Keep doing this and add a button in the backend to regenerate all the images. The problem is that I always need to know every size needed by every part of the site.
Only upload the full-size image, and when displaying it, put something like src="thumbs.php?img=my-image-path/image.jpg&width=120&height=120" in the img tag. The script then creates the thumbnail and displays it. It would also check whether the thumbnail already exists; if it does, it doesn't need to recreate it and just displays it. Every five days, a cron job would run a script to delete all the thumbnails (to make sure only the useful ones are kept).
I think the second solution is better, but I'm a little concerned that PHP has to be called every time an image is shown: even if the thumbnail already exists, PHP still has to serve it.
Thanks for your advice.
Based on the original question and subsequent comments, it sounds like on-demand generation would suit you, since you don't seem to have a demanding environment in terms of absolutely minimizing download time for the end client.
It seems you already have a grasp around the option to give your <img> tags a src value that is a PHP script, with that script either serving up a cached thumbnail if it exists, or generating it on the fly, caching it, and then serving it up, so let me give you another option.
Generally speaking, using PHP to serve up static resources is not a great idea as you begin to scale your site, because:
1) It adds the overhead of invoking PHP to serve these requests, something a plain web server such as Apache or Nginx is much better optimized for. This means your site can handle less traffic per server, because it uses extra memory, CPU, etc. to serve static content.
2) It makes it hard to move those static resources into a single repository outside of the server for serving content (such as a CDN). This means you have to duplicate your files on each and every web server powering the site.
As such, my suggestion would be to still serve up the images as static image files via the webserver, but generate thumbnails on the fly if they are missing. To achieve this you can simply create a custom redirect rule or 404 handler on the web server, such that requests in your thumbnail directory which do not match an existing thumbnail image could be redirected to a PHP script to automatically generate the thumbnail and serve up the image (without the browser even knowing it). Future requests against this thumbnail would be served up as a static image.
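A rough sketch of that missing-thumbnail handler, assuming Apache with mod_rewrite (the rule is shown in a comment) and GD installed. The URL scheme `/thumbs/{W}x{H}/{path}`, the `images/` and `thumbs/` directories and the size whitelist are all hypothetical; the whitelist keeps clients from requesting arbitrary sizes.

```php
<?php
// thumb.php: hypothetical handler that runs only when a thumbnail file is missing.
// Assumed Apache mod_rewrite rules (docroot .htaccess):
//   RewriteEngine On
//   RewriteCond %{REQUEST_FILENAME} !-f
//   RewriteRule ^thumbs/(.*)$ /thumb.php?path=$1 [L]
// Once written to disk, /thumbs/... is served as a static file and PHP is skipped.

const ALLOWED_SIZES = ['120x120', '300x200']; // hypothetical whitelist

/** Parse "120x120/dir/image.jpg" into [width, height, sourcePath], or null if invalid. */
function parseThumbPath(string $path): ?array
{
    if (!preg_match('#^(\d+)x(\d+)/([\w/.-]+\.(?:jpe?g|png))$#i', $path, $m)) {
        return null;
    }
    if (!in_array($m[1] . 'x' . $m[2], ALLOWED_SIZES, true) || strpos($m[3], '..') !== false) {
        return null; // unknown size or directory traversal attempt
    }
    return [(int) $m[1], (int) $m[2], $m[3]];
}

/** Scale (srcW, srcH) to fit inside (maxW, maxH), keeping the aspect ratio; never upscale. */
function fitWithin(int $srcW, int $srcH, int $maxW, int $maxH): array
{
    $scale = min($maxW / $srcW, $maxH / $srcH, 1.0);
    return [max(1, (int) round($srcW * $scale)), max(1, (int) round($srcH * $scale))];
}

/** Resize with GD and write the thumbnail (PNG transparency handling omitted). */
function generateThumb(string $srcFile, string $destFile, int $maxW, int $maxH): bool
{
    $info = getimagesize($srcFile);
    if ($info === false) {
        return false;
    }
    [$w, $h, $type] = $info;
    $src = $type === IMAGETYPE_PNG ? imagecreatefrompng($srcFile) : imagecreatefromjpeg($srcFile);
    if ($src === false) {
        return false;
    }
    [$tw, $th] = fitWithin($w, $h, $maxW, $maxH);
    $dst = imagecreatetruecolor($tw, $th);
    imagecopyresampled($dst, $src, 0, 0, 0, 0, $tw, $th, $w, $h);
    if (!is_dir(dirname($destFile))) {
        mkdir(dirname($destFile), 0775, true);
    }
    return $type === IMAGETYPE_PNG ? imagepng($dst, $destFile) : imagejpeg($dst, $destFile, 85);
}

// Entry point when invoked by the web server (skipped on the command line).
if (PHP_SAPI !== 'cli' && isset($_GET['path'])) {
    $parsed = parseThumbPath((string) $_GET['path']);
    $source = $parsed ? __DIR__ . '/images/' . $parsed[2] : null; // hypothetical originals dir
    if ($parsed === null || !is_file($source)) {
        http_response_code(404);
        exit;
    }
    $dest = __DIR__ . '/thumbs/' . $_GET['path']; // safe: validated by parseThumbPath()
    if (!generateThumb($source, $dest, $parsed[0], $parsed[1])) {
        http_response_code(500);
        exit;
    }
    header('Content-Type: ' . (preg_match('/\.png$/i', $dest) ? 'image/png' : 'image/jpeg'));
    readfile($dest);
}
```

Only the first request for a given size goes through PHP; every later request hits the file written to `thumbs/`.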
This scales quite nicely as, if in the future you have the need to move your static images to a single server (or CDN), you can just use an origin-pull mechanism to try to get the content from your main servers, which will auto-generate them via the same mechanism I just mentioned.
Use the second option if you don't have much storage, and the first if you don't have much CPU.
Or you can combine them: generate and store the image the first time the PHP thumbnail generator is called, and next time just return the cached image.
With this solution you'll have only the images you actually need, and you can periodically delete the older ones if you want.
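A minimal sketch of that periodic clean-up, meant to be run from cron. It deletes cached thumbnails whose modification time (i.e. when they were generated) is older than a cut-off; thumbnails still in use are simply regenerated on the next request. Modification time is used because access time is often not updated on `noatime`/`relatime` mounts. The directory and schedule are assumptions.

```php
<?php
// Sketch: delete cached thumbnails older than $maxAgeSeconds.
// Hypothetical cron entry: 0 3 * * * php /path/to/prune_thumbs.php

function pruneThumbs(string $dir, int $maxAgeSeconds, ?int $now = null): int
{
    $now = $now ?? time();
    $stale = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        // mtime = when the thumbnail was generated.
        if ($file->isFile() && $now - $file->getMTime() > $maxAgeSeconds) {
            $stale[] = $file->getPathname();
        }
    }
    // Delete after iterating so the directory isn't modified mid-scan.
    $deleted = 0;
    foreach ($stale as $path) {
        if (unlink($path)) {
            $deleted++;
        }
    }
    return $deleted;
}
```

For the five-day window from the question, the script would call something like `pruneThumbs('/var/www/thumbs', 5 * 86400);` (path hypothetical).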

How to display images that are uploaded to my ftp folder

I am building a website where I upload images to my FTP folder through a PHP script. Now I want to display those images on my HTML pages. I was thinking about using PHP to get an array of all the images in my FTP folder and then display them using image view.
Please tell me if I am doing this the wrong way and whether there are better alternatives. I was reading the PHP manual entries for ftp_nlist and ftp_rawlist but did not understand them.
Well it may depend on how many images you have in there. Probably the most "correct" way to do it would be to store the filenames in a DB. You could scan the entire folder, but for every single request that's potentially a lot of overhead rather than just grabbing them out of a DB.
Are you manually uploading the images? Give us more details on how that works and we can help you better. If you're using a script to upload images (I've had lots of projects where that's the case), you can just have the script insert those file paths into the DB for you. If not (you're manually uploading them), or if there aren't many files, scanning the folder wouldn't necessarily be a bad thing. I've used that method on smaller projects myself.
Read up on PHP's readdir function in the docs (which, ironically, works a lot like mysql_fetch_assoc). It gives you an excellent way to do this without setting up a DB. For an approach where an upload script handles it, I recommend a DB. Without more info, it's hard to say.
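A minimal sketch of the readdir() approach, assuming the FTP upload folder is a local directory on the web server that is also reachable by URL (the directory and base URL are hypothetical). Like mysql_fetch_assoc, readdir() returns one entry per call and `false` when there are no more.

```php
<?php
// Sketch: list image files in the upload directory and print an <img> tag for each.
// Assumption: the FTP folder is local and served under a known base URL.

function listImages(string $dir): array
{
    $images = [];
    $handle = opendir($dir);
    if ($handle === false) {
        return $images;
    }
    // readdir() returns one entry per call and false when the directory is exhausted.
    while (($entry = readdir($handle)) !== false) {
        if (preg_match('/\.(jpe?g|png|gif)$/i', $entry)) {
            $images[] = $entry;
        }
    }
    closedir($handle);
    sort($images); // readdir() order is not guaranteed
    return $images;
}

function imageTags(array $files, string $baseUrl): string
{
    $html = '';
    foreach ($files as $f) {
        $html .= '<img src="' . htmlspecialchars($baseUrl . rawurlencode($f)) . '" alt="">' . "\n";
    }
    return $html;
}
```

Usage: `echo imageTags(listImages('/var/www/uploads'), '/uploads/');`. This rescans the folder on every request, which is why a DB table filled by the upload script scales better once there are many files.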
Good luck!

How to find out how many times a file been downloaded?

I have an image that I send to affiliates for advertising.
How can I find out from my server how many times that image has been downloaded?
Does the server log keep track of the image download count?
---- Addition ----
Thanks for the replies. A few more questions:
Because I want to do ad rotation, IP address tracking, etc.,
I think I should do it with a dynamic page (PHP) that returns the proper image, right?
In that case, is there any way I can send that information to Google Analytics from the server? I know I can do it in JavaScript, but here the PHP script should just return the image file, so what should I do? :)
Well, this can be done regardless of your web server or language/platform.
Assuming the file is physically stored in a certain directory:
1) Write a program that determines which file is to be downloaded, e.g. through GET/POST parameters (there are other ways too).
2) Locate that particular file on disk.
3) fopen that file.
4) Read through it byte by byte.
5) Print the bytes.
6) fclose it.
7) Store/increment/update the download counter in a database or flat file.
In the database you may keep the record as md5checksum -> downloadCounter.
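A minimal sketch of those steps, using a JSON flat file for the `md5checksum -> downloadCounter` map. It swaps the byte-by-byte loop for fpassthru(), which streams the rest of the open file in one call. The `ads/` directory, counter file and `?file=` parameter are hypothetical.

```php
<?php
// Sketch: resolve the requested file inside one directory, stream it,
// and increment a counter keyed by the file's MD5 checksum.
// Directory, counter file and ?file= parameter are hypothetical.

/** Map a user-supplied name to a real file inside $baseDir, or null. */
function resolveDownload(string $baseDir, string $name): ?string
{
    $path = $baseDir . '/' . basename($name); // basename() blocks "../" traversal
    return is_file($path) ? $path : null;
}

/** Increment the counter for $key in a JSON flat file; returns the new count. */
function incrementCounter(string $counterFile, string $key): int
{
    $fp = fopen($counterFile, 'c+'); // create if missing, don't truncate
    if ($fp === false) {
        throw new RuntimeException("Cannot open $counterFile");
    }
    flock($fp, LOCK_EX); // serialize concurrent downloads
    $counts = json_decode(stream_get_contents($fp), true) ?: [];
    $counts[$key] = ($counts[$key] ?? 0) + 1;
    ftruncate($fp, 0);
    rewind($fp);
    fwrite($fp, json_encode($counts));
    fflush($fp);
    flock($fp, LOCK_UN);
    fclose($fp);
    return $counts[$key];
}

if (PHP_SAPI !== 'cli' && isset($_GET['file'])) {
    $path = resolveDownload(__DIR__ . '/ads', (string) $_GET['file']);
    if ($path === null) {
        http_response_code(404);
        exit;
    }
    incrementCounter(__DIR__ . '/download_counts.json', md5_file($path));
    $info = getimagesize($path);
    header('Content-Type: ' . ($info !== false ? $info['mime'] : 'application/octet-stream'));
    header('Content-Length: ' . filesize($path));
    $fh = fopen($path, 'rb');
    fpassthru($fh); // streams the file instead of a manual byte-by-byte loop
    fclose($fh);
}
```

The lock matters because two simultaneous downloads would otherwise read the same old count and one increment would be lost.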
It depends on the server and on how the image is downloaded.
1) Static image (e.g. the URL points to the actual image file): most servers (e.g. Apache) record every URL served, including the GET request for the image, in the access log. There are many tools for slicing and dicing web server access logs (especially Apache's) to obtain all sorts of statistics, including access counts.
2) For fancier needs, link to a dynamic page that does some computation (from a simple counter increment to more elaborate statistics collection) and then responds with an HTTP redirect to the real image.
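A minimal sketch of option 2, which also covers the IP tracking mentioned in the question: a tracking script records each hit (time, IP, ad name) in a CSV file and then redirects to the statically served image. The ad names, image URLs and log location are hypothetical.

```php
<?php
// Sketch: track.php?ad=banner records the hit, then redirects to the real image.
// Ad names, image URLs and the CSV log location are hypothetical.

const ADS = ['banner' => '/images/banner.png', 'sidebar' => '/images/sidebar.png'];

/** Append one hit (time, IP, ad) to a CSV log and return the redirect target, or null. */
function recordHit(string $logFile, string $ad, string $ip, int $time): ?string
{
    if (!array_key_exists($ad, ADS)) {
        return null;
    }
    $fh = fopen($logFile, 'a');
    flock($fh, LOCK_EX);
    fputcsv($fh, [date('c', $time), $ip, $ad], ',', '"', '\\');
    flock($fh, LOCK_UN);
    fclose($fh);
    return ADS[$ad];
}

if (PHP_SAPI !== 'cli') {
    $target = recordHit(__DIR__ . '/hits.csv', (string) ($_GET['ad'] ?? ''), $_SERVER['REMOTE_ADDR'] ?? '', time());
    if ($target === null) {
        http_response_code(404);
        exit;
    }
    header('Cache-Control: no-store'); // don't let browsers cache the redirect, so each view is counted
    header('Location: ' . $target, true, 302);
}
```

Ad rotation would fit in the same place: pick the ad name in the script instead of reading it from the query string.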
Use Galvanize, a PHP class for GA that lets you call trackPageView from PHP (for a virtual page representing your download, such as the file's URL).
HTTP log should have a GET for every time that image was accessed.
You should be able to configure your server to log each download. Then, you can just count the number of times the image appears in the log file.
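A minimal sketch of counting those log entries, assuming Apache's Common or Combined Log Format. It counts only `200` responses for the image path (query strings allowed), so `304 Not Modified` revalidations and errors are not counted. The log path and image URL are assumptions.

```php
<?php
// Sketch: count successful GET requests for one image in an Apache access log
// (Common/Combined Log Format). Log path and image URL are assumptions.

function countImageHits(string $logFile, string $imagePath): int
{
    $fh = fopen($logFile, 'r');
    if ($fh === false) {
        return 0;
    }
    // Request part of a line looks like: "GET /images/banner.png HTTP/1.1" 200 5120
    $pattern = '#"GET ' . preg_quote($imagePath, '#') . '(?:\?[^ "]*)? HTTP/[0-9.]+" 200 #';
    $hits = 0;
    while (($line = fgets($fh)) !== false) {
        if (preg_match($pattern, $line)) {
            $hits++;
        }
    }
    fclose($fh);
    return $hits;
}
```

Usage: `countImageHits('/var/log/apache2/access.log', '/images/banner.png')`. Rotated logs would have to be read as well for a total count.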

How do I handle image management (upload, removal, etc.) in CakePHP?

I'm building a site where users can upload images and then "use" them. I'd like some thoughts and ideas on how to manage temporary uploads.
For example, a user uploads an image but decides not to do anything with it and just leaves the site. By then I have either uploaded the file to the server or loaded it into server memory, but how do I know when the image can be removed? My first thought was a temporary upload folder that is emptied periodically, but it feels like there must be something better.
BTW, I'm using CakePHP and MySQL. The images are stored on the server; only their location is stored in the DB.
Save the file's information to MySQL along with the last time the image was used (a script can update this every time the image is used), then check the database for images that haven't been used in 30 days and delete them.
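A minimal sketch of that 30-day clean-up in plain PHP. It assumes the image rows have been fetched with hypothetical `id`, `path` and `last_used_at` columns; in CakePHP you might run something like this from a console command on a cron schedule, then remove the returned ids from the table (e.g. `DELETE FROM images WHERE id IN (...)`).

```php
<?php
// Sketch: pick image rows not used for $days days, delete their files,
// and return the ids whose DB rows should be removed.
// Column names (id, path, last_used_at) are hypothetical.

function staleImages(array $rows, DateTimeImmutable $now, int $days = 30): array
{
    $cutoff = $now->modify("-{$days} days");
    return array_values(array_filter(
        $rows,
        fn(array $r) => new DateTimeImmutable($r['last_used_at']) < $cutoff
    ));
}

/** Delete the files (tolerating already-missing ones) and return the ids to remove. */
function deleteStale(array $staleRows): array
{
    $ids = [];
    foreach ($staleRows as $r) {
        if (!file_exists($r['path']) || unlink($r['path'])) {
            $ids[] = $r['id'];
        }
    }
    return $ids;
}
```

Deleting the file first and only then the row means a failed unlink leaves the row in place, so the file is retried on the next run instead of becoming an orphan.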
You could try to define a "session" in some way and give the user some information about it. For example, on SO there is a popup when you have started an answer and try to leave the site (your answer would be lost). You could do the same and delete the uploaded image if the user proceeds. Of course, you can still use a timeout or other rules (maximum image folder size, etc.).
I'm not sure what does "temporary upload" mean in your app. The file is either uploaded or not, and under the ownership of a user. If a user doesn't want to do anything at the moment, you have no other choice but to leave the file where it is.
What you can do is put a warning somewhere on your image management page about unused images, but removing them yourself seems like a bad practice (at least from the user perspective).
As a user, when I upload an image to a server (assuming I want to use it later) and leave the site, I don't expect it to be deleted if I am a registered user.
I would prefer it to stay in my account until I come back. I would suggest thinking along those lines and implementing a solution that keeps users' images if possible.
Check the last accessed/modified time of the file to see if it has been used.
