
Amazon AWSSDKforPHP too slow
Hi there,
I'm using Amazon AWSSDKforPHP to connect my web application with S3, but there's an issue with the process of making requests to the service that makes it too slow.
For example, I have this code:
// Iterate an array of user images
foreach($images as $image){
// Return the Bucket URL for this image
$urls[] = $s3->get_object_url($bucket, 'users/'.trim($image).'.jpg', '5 minutes');
}
Supposing that $images is an array of user pictures, this returns an array called $urls that holds (as its name says) the URLs of the pictures with credentials valid for 5 minutes. This request takes at least 6 seconds with 35 images, and that's OK. But when a picture does not exist in the bucket, I want to assign a default image for the user, something like 'images/noimage.png'.
Here's the code:
// Iterate an array of user images
foreach($images as $image){
// Check if the object exists in the Bucket
if($s3->if_object_exists($bucket, 'users/'.trim($image).'.jpg')){
// Return the Bucket URL for this image
$urls[] = $s3->get_object_url($bucket, 'users/'.trim($image).'.jpg', '5 minutes');
} else {
// Return the default image
$urls[] = 'http://www.example.com/images/noimage.png';
}
}
And the condition works, but SLOOOOOW. With the $s3->if_object_exists() condition, the script takes at least 40 seconds for 35 images!
I have modified my script to make the request using cURL instead:
// Iterate an array of user images
foreach($images as $image){
// Setup cURL
$ch = curl_init($s3->get_object_url($bucket, 'users/'.trim($image).'.jpg', '1 minutes') );
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
// Get Just the HTTP response code
$res = curl_getinfo($ch,CURLINFO_HTTP_CODE);
if($res == 200){ //the image exists
$urls[] = $s3->get_object_url($bucket, 'users/'.trim($image).'.jpg', '5 minutes');
}else{ // The response is 403
$urls[] = 'http://www.example.com/images/noimage.png';
}
}
This modified script takes between 16 and 18 seconds. That's a big difference, but it's still a lot of time :(.
Please, any help is so much appreciated.
Thank you.

Why not change how you are doing your checks? Store the locations/buckets of the images locally in a database; that way you do not have to worry about this check at request time.
This way you minimize the number of API calls you are making, which is 35 in your case now but could grow considerably over time. And you are not making just one call per image, but two calls per image for the most part. This is highly inefficient and relies on your network connection being fairly fast.
Storing the location data, and whether the image exists, locally is a much better choice in terms of performance here. This check should only have to be done a single time anyway if you store the result ahead of time.

If you want to be able to read directory-type information from S3, you might be best served by something like s3fs to mount your bucket as a system drive. s3fs can also be configured with a local cache to speed things up (cache on fast ephemeral storage if you are using EC2).
This would allow you to do regular PHP directory handling (DirectoryIterator, etc.) with ease.
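For instance, a minimal sketch, assuming the bucket is mounted with s3fs at /mnt/s3bucket (the mount point and the users/ folder are illustrative assumptions):
// Build a list of the user image files visible through the s3fs mount,
// then check against it instead of hitting the S3 API once per image.
$existing = array();
foreach (new DirectoryIterator('/mnt/s3bucket/users') as $fileInfo) {
    if ($fileInfo->isFile()) {
        $existing[] = $fileInfo->getFilename(); // e.g. "1234.jpg"
    }
}
// Later: if (in_array(trim($image).'.jpg', $existing)) { ... }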
If this is more than you want to mess with, at least store the filename data in a database and just expect the files to be in the proper S3 locations, or cache the results of individual API checks locally in some manner so as to not need an API call for each similar request.

It's slow because you're calling if_object_exists() in every iteration through the loop, kicking off a network request to AWS.
The user "thatidiotguy" said:
I do not know about the S3 API, but could you ask for a list of files in the bucket and do the string matching/searching yourself in the script? There is no way 34 string match tests should take anywhere near that long in a PHP script.
He's right.
Instead of calling if_object_exists(), you can call get_object_list() once, at the beginning of the script, then compare your user photo keys to the list using PHP's in_array() function.
You should see a speed-up of approximately a zillion percent. Don't quote me on that, though. ;)
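For what it's worth, here's a minimal sketch of that approach. It assumes the SDK 1.x get_object_list() call, which returns a flat array of object keys (listings are paginated, so with many thousands of objects you may need more than one call); the 'prefix' option limits the listing to the users/ folder:
// Fetch the key list once, then do every existence check locally.
$existing = $s3->get_object_list($bucket, array('prefix' => 'users/'));
$urls = array();
foreach ($images as $image) {
    $key = 'users/' . trim($image) . '.jpg';
    if (in_array($key, $existing)) {
        // The presigned URL is generated locally, so this stays fast
        $urls[] = $s3->get_object_url($bucket, $key, '5 minutes');
    } else {
        $urls[] = 'http://www.example.com/images/noimage.png';
    }
}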

Related

How to extract contents from URLs?

I am having a problem. This is what I have to do and the code is taking extremely long to run:
There is 1 website I need to collect data from, and to do so I need my algorithm to visit over 15,000 subsections of this website (i.e. www.website.com/item.php?rid=$_id), where $_id will be the current iteration of a for loop.
Here are the problems:
The method I am currently using to get the source code of each page is file_get_contents, and, as you can imagine, it takes a very long time to fetch 15,000+ pages this way.
Each page contains over 900 lines of code, but all I need to extract is about 5 lines worth, so it seems as though the algorithm is wasting a lot of time by retrieving all 900 lines of it.
Some of the pages do not exist (i.e. maybe www.website.com/item.php?rid=2 exists but www.website.com/item.php?rid=3 does not), so I need a method of quickly skipping over these pages before the algorithm tries to fetch their contents and wastes a bunch of time.
In short, I need a method of extracting a small portion of the page from 15,000 webpages in as quick and efficient a manner as possible.
Here is my current code.
for ($_id = 0; $_id < 15392; $_id++){
//****************************************************** Locating page
$_location = "http://www.website.com/item.php?rid=".$_id;
$_headers = @get_headers($_location);
if(strpos($_headers[0],"200") === FALSE){
continue;
} // end if
$_source = file_get_contents($_location);
//****************************************************** Extracting price
$_needle_initial = "<td align=\"center\" colspan=\"4\" style=\"font-weight: bold\">Current Price:";
$_needle_terminal = "</td>";
$_position_initial = stripos($_source,$_needle_initial)+strlen($_needle_initial);
$_position_terminal = stripos($_source,$_needle_terminal,$_position_initial); // look for the closing tag after the needle
$_length = $_position_terminal-$_position_initial;
$_current_price = strip_tags(trim(substr($_source,$_position_initial,$_length)));
} // end for
Any help at all is greatly appreciated since I really need a solution to this!
Thank you in advance for your help!
The short of it: don't.
The longer version: if you want to do this much work, you shouldn't do it on demand. Do it in the background! You can use the code you have here, or any other method you're comfortable with, but instead of showing the result to a user directly, save it in a database or a local file. Call this script with a cron job every x minutes (depending on the interval you need), and just show the latest content from your local cache (be it a database or a file).
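As a rough illustration of that split, here is a sketch of the background half. It assumes a cron entry such as 0 * * * * php /path/to/fetch_prices.php and a local JSON cache file; the file name is an illustrative choice, and the fetching/extraction is the same as in the question:
<?php
// fetch_prices.php - run from cron; the user-facing page only reads the cache file.
$results = array();
for ($_id = 0; $_id < 15392; $_id++) {
    $_location = "http://www.website.com/item.php?rid=" . $_id;
    $_headers = @get_headers($_location);
    if ($_headers === false || strpos($_headers[0], "200") === false) {
        continue; // page does not exist, skip it
    }
    $_source = file_get_contents($_location);
    $_needle_initial = "<td align=\"center\" colspan=\"4\" style=\"font-weight: bold\">Current Price:";
    $_needle_terminal = "</td>";
    $_position_initial = stripos($_source, $_needle_initial) + strlen($_needle_initial);
    $_position_terminal = stripos($_source, $_needle_terminal, $_position_initial);
    $results[$_id] = strip_tags(trim(substr($_source, $_position_initial, $_position_terminal - $_position_initial)));
}
file_put_contents(__DIR__ . '/prices_cache.json', json_encode($results), LOCK_EX);
// The page a visitor actually hits then only needs:
// $prices = json_decode(file_get_contents(__DIR__ . '/prices_cache.json'), true);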

PHP: Displaying an image from a web service

I'm using an external web service that returns an image URL which I will display on my website. For example:
$url = get_from_web_service();
echo '<img src="'.$url.'" />';
Everything is working fine, except that if I have 100 images to show, calling the web service becomes time- and resource-consuming.
//the problem
foreach($items as $item) {
$url = get_from_web_service($item);
echo '<img src="'.$url.'" />';
}
So now I'm considering two options:
//Option 1: Using PHP file_get_contents():
foreach($items as $item)
{
echo '<img src="url_to_my_website/get_image.php?id='.$item->id.'" />';
}
get_image.php :
$url = get_from_web_service($id);
header("Content-Type: image/png");
echo file_get_contents($url);
//Option 2: Using AJAX:
echo '<img src="dummy_image_or_website_logo" data-id="123" />';
//AJAX call to the web service with id=123 to get the URL, then set the src attribute of that image.
THOUGHTS
The first option seems more straightforward, but my server would be involved in, and might be overloaded by, every single image request.
With the second option, it's all done by the browser and web service, so my server is not involved at all. But for each image I'm making 2 calls: 1 AJAX call to get the image URL and another one to get the image itself. So loading time might vary, and AJAX calls might fail when there are a large number of them.
Information
Around 50 images will be displayed on that page.
This service will be used by around 100 users at a given time.
I have no control over the web service, so I can't change its functionality, and it doesn't accept more than 1 image ID per call.
My Questions
Is there a better option I should consider?
If not, which option should I follow? And most importantly, why should I follow that one?
Thanks
Method 1: Rendering in PHP
Pros:
Allows for custom headers that are independent of any server software. If you're using something that's not generally cached (like a PHP file with a query string) or are adding this to a package that needs header functionality regardless of server software, this is a very good idea.
If you know how to use GD or Imagick, you can easily resize, crop, compress, index, etc. your images to reduce the image file size (sometimes drastically) and make the page load significantly faster.
If width and height are passed as variables to the PHP file, the dimensions can be set dynamically (a rough get-image.php sketch follows after this method's cons):
<div id="gallery-images">
<noscript>
<!-- So that the thumbnail is small for old mobile devices //-->
<img src="get-image.php?id=123&h=200&w=200" />
</noscript>
</div>
<script type="text/javascript">
/* Something to create an image element inside of the div.
* In theory, the browser height and width can be pulled dynamically
* on page load, which is useful for ensuring that images are no larger
* than they need to be. Having a function to load the full image
* if the browser becomes bigger isn't a bad idea though.
*/
</script>
This would be incredibly considerate of mobile users on a page that has an image gallery. This is also very considerate of users with limited bandwidth (like almost everyone in Alaska. I say this from personal experience).
Allows you to easily clear the EXIF data of images if they're uploaded by users on the website. This is important for user privacy as well as making sure there aren't any malicious scripts living in your JPGs.
Gives potential to dynamically create a large image sprite and drastically reduce your HTTP requests if they're causing latency. It'd be a lot of work so this isn't a very strong pro, but it's still something you can do using this method that you can't do using the second method.
Cons:
Depending on the number and size of images, this could put a lot of strain on your server. When used with browser caching, the dynamic images are pulled from cache instead of being re-generated; however, it's still very easy for a bot to be served the dynamic image a number of times.
It requires knowledge of HTTP headers, basic image manipulation skills, and an understanding of how to use image manipulation libraries in PHP to be effective.
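To make the resizing pros above concrete, here is a rough get-image.php sketch, not a definitive implementation: it assumes the source images live in a local images/ folder named by id and that the GD extension is available; only the id/w/h parameter names come from the HTML example above.
<?php
// get-image.php - resize on the fly and let the browser cache the result.
$id = isset($_GET['id']) ? preg_replace('/\D/', '', $_GET['id']) : '';
$w  = isset($_GET['w']) ? max(1, min((int)$_GET['w'], 1600)) : 800;
$h  = isset($_GET['h']) ? max(1, min((int)$_GET['h'], 1600)) : 800;
$path = __DIR__ . '/images/' . $id . '.jpg';
if ($id === '' || !is_file($path)) {
    header("HTTP/1.0 404 Not Found");
    exit;
}
$src   = imagecreatefromjpeg($path);
$ratio = min($w / imagesx($src), $h / imagesy($src), 1); // never upscale
$dstW  = max(1, (int)(imagesx($src) * $ratio));
$dstH  = max(1, (int)(imagesy($src) * $ratio));
$dst = imagecreatetruecolor($dstW, $dstH);
imagecopyresampled($dst, $src, 0, 0, 0, 0, $dstW, $dstH, imagesx($src), imagesy($src));
header("Content-Type: image/jpeg");
header("Cache-Control: public, max-age=86400"); // cuts down repeat generation
imagejpeg($dst, null, 80);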
Method 2: AJAX
Pros:
The page would finish loading before any of the images. This is important if your content absolutely needs to load as fast as possible, and the images aren't very important.
It is far simpler, easier, and significantly faster to implement than any kind of dynamic PHP solution.
It spaces out the HTTP requests, so the initial content loads faster (since the HTTP requests can be sent based on browser action instead of just page load).
Cons:
It doesn't decrease the number of HTTP requests, it simply spaces them out. Also note that there will be at least one additional external JS file in addition to all of these images.
Displays nothing if the end device (such as older mobile devices) does not support JavaScript. The only way you could fix this is to have all of the images load normally between some <noscript> tags, which would require PHP to generate twice as much HTML.
Would require you to add a loading.gif (and another HTTP request) or "Please wait while these images load" text to your page. I personally find this annoying as a website user because I want to see everything when the page is "done loading".
Conclusion:
If you have the background knowledge or time to learn how to effectively use Method 1, it gives far more potential because it allows for manipulation of the images and HTTP requests sent by your page after it loads.
Conversely, if you're looking for a simple method to space out your HTTP Requests or want to make your content load faster by making your extra images load later, Method 2 is your answer.
Looking back at methods 1 and 2, it looks like using both methods together could be the best answer. Having two of your cached and compressed images load with the page (one is visible, the other is a buffer so that the user doesn't have to wait every time they click "next"), and having the rest load one-by-one as the user sees fit.
In your specific situation, I think that Method 2 would be the most effective if your images can be displayed in a "slideshow" fashion. If all of the images need to be loaded at once, try compressing them and applying browser-caching with method 1. If too many image requests on page load is destroying your speed, try image spriting.
As of now, you are contacting the webservice 100 times. You should change it so it contacts the webservice only once and retrieves an array of all the 100 images, instead of each image separately.
You can then loop over this array, which will be very fast as no further web transactions are needed.
If the images you are fetching from the web service are not dynamic in nature, i.e. do not get changed/modified frequently, I would suggest setting up a scheduled process/cron job on your server which gets the images from the web service and stores them locally (on your server itself), so you can serve images on the webpage from your own server and avoid a third-party round trip every time the page is served to end users.
Neither of the two options will resolve your problem; they may even make it worse.
For option 1:
The step that costs the most time is get_from_web_service($item), and this change only moves its execution into another script (if the file get_image.php runs on the same server).
For option 2:
It only makes the image resource request be triggered by the browser, but your server still has to process get_from_web_service($item).
One thing must be clear: the problem is the performance of get_from_web_service, and the most direct proposal is to improve its performance. Failing that, we can try to reduce the number of requests. I haven't thought this through fully; I only have a few suggestions:
Asynchronous loading: the user doesn't browse your whole page at once; they only see the part at the top. If the images are not all displayed at the top, you can use the jquery.lazyload extension, which keeps image resources in the invisible region from requesting the server until they become visible.
CSS sprites: an image sprite is a collection of images put into a single image. If the images on your page do not change frequently, you can write some code to merge them daily.
Cache images: you can cache your images on your server, or on another server (better), and do some key->value work: the key identifies the $item, the value is the resource location (URL).
I am not a native English speaker; I hope this is clear and helpful.
I'm not an expert, but I'm thinking every time you echo, it takes time. Getting 100 images shouldn't be a problem on its own.
Also, maybe get_from_web_service() should be able to take an array?
$counter = 1;
$urls = array();
foreach($items as $item)
{
$urls[$counter] = get_from_web_service($item);
$counter++;
}
// and then you can echo the information?
foreach($urls as $url)
{
//echo each or use a function to better do it
//echo '<img src="'.$url.'" />';
}
get_image.php :
$url = get_from_web_service($id);
header("Content-Type: image/png");
echo file_get_contents($url);
At the end, it would be mighty nice if you could just call
get_from_web_service($itemArray); //intake the array and return images
Option 3:
cache the requests to the web service
Option one is the best option. I would also want to make sure that the images are cached on the server, so that multiple round trips to the original web server are not required for the same image.
If you're interested, this is the core of the code that I use for caching images, etc. (note that a few things, like serving the correct content headers back to the client, are missing):
<?php
function error404() {
header("HTTP/1.0 404 Not Found");
echo "Page not found.";
exit;
}
function hexString($md5, $hashLevels=3) {
$hexString = substr($md5, 0, $hashLevels );
$folder = "";
while (strlen($hexString) > 0) {
$folder = "$hexString/$folder";
$hexString = substr($hexString, 0, -1);
}
if (!file_exists('cache/' . $folder))
mkdir('cache/' . $folder, 0777, true);
return 'cache/' . $folder . $md5;
}
if (!isset($_GET['img']))
error404();
function getFile($url) {
// true to enable caching, false to delete cache if already cached
$cache = true;
$defaults = array(
CURLOPT_HEADER => FALSE,
CURLOPT_RETURNTRANSFER => 1,
CURLOPT_FOLLOWLOCATION => 1,
CURLOPT_MAXCONNECTS => 15,
CURLOPT_CONNECTTIMEOUT => 30,
CURLOPT_TIMEOUT => 360,
CURLOPT_USERAGENT => 'Image Download'
);
$ch = curl_init();
curl_setopt_array($ch, $defaults);
curl_setopt($ch, CURLOPT_URL, $url);
$key = hexString(sha1($url));
if ($cache && file_exists($key)) {
return file_get_contents($key);
} elseif (!$cache && file_exists($key)) {
unlink($key);
}
$data = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);
if ($cache === true && $info['http_code'] == 200 && strlen($data) > 20)
file_put_contents($key, $data);
elseif ($info['http_code'] != 200)
error404();
return $data;
}
$content = getFile($_GET['img']);
if ($content !== null && $content !== false) {
// Success!
header("Content-Type: image/png"); // ideally echo back the content type reported by cURL
echo $content;
}
Neither of the two options will resolve the server resource usage issue. Out of the two, though, I would recommend option 1. The second one will delay page loading, slowing the website down and reducing your SEO ratings.
Best option for you would be something like:
foreach($items as $item) {
echo '<img src="url_to_my_website/get_image.php?id='.$item->id.'" />';
}
Then where the magic happens is get_image.php:
$local_path = '/path_to_local_storage/image_'.$id.'.png';
if (file_exists($local_path)) {
    // Serve the copy we already stored locally
    $img = file_get_contents($local_path);
} else {
    // First request for this image: fetch it from the web service and store it
    $url = get_from_web_service($id);
    $img = file_get_contents($url);
    file_put_contents($local_path, $img);
}
header("Content-Type: image/png");
echo $img;
This way you will only run the request to the web service once per image, and then store the result in your local space. The next time the image is requested, you will serve it from your local space, skipping the request to the web service.
Of course, this assumes image IDs are unique and persistent.
Probably not the best solution, but should work pretty well for you.
Since, as we see above, you're putting the URL of the web-service-provided image right into the <img> tag's src attribute, one can safely assume that these URLs are not secret or confidential.
Knowing that, the following snippet for get_image.php will work with the least overhead possible:
$url = get_from_web_service($id);
header("Location: $url");
If you're getting a lot of subsequent requests to the same id from a given client, you can somewhat lessen the number of requests by exploiting the browser's internal cache.
header("Cache-Control: private, max-age=$seconds");
header("Expires: ".gmdate('r', time()+$seconds));
Otherwise, resort to server-side caching by means of Memcached, a database, or plain files, like so:
is_dir('cache') or mkdir('cache');
$cachedDataFile = "cache/$id";
$cacheExpiryDelay = 3600; // an hour
if (is_file($cachedDataFile) && filesize($cachedDataFile)
&& filemtime($cachedDataFile) + $cacheExpiryDelay > time()) {
$url = file_get_contents($cachedDataFile);
} else {
$url = get_from_web_service($id);
file_put_contents($cachedDataFile, $url, LOCK_EX);
}
header("Cache-Control: private, max-age=$cacheExpiryDelay");
header("Expires: ".gmdate('r', time() + $cacheExpiryDelay));
header("Location: $url");

When parsing XML with PHP, it only shows the first record from the file, not all of them

Below is my code:
foreach (simplexml_load_file('http://www.bbc.co.uk/radio1/playlist.xml')->item as $link) {
    $linked = $link->artist;
    $xml_data = file_get_contents('http://ws.audioscrobbler.com/2.0/?method=artist.getimages&artist=' . $linked . '&api_key=b25b959554ed76058ac220b7b2e0a026');
    $xml = new SimpleXMLElement($xml_data);
    foreach ($xml->images as $test) {
        $new = $test->image->sizes->size[4];
        echo "<img src='$new'>";
        ?><br /><?php
    }
}
?>
This does work, but it only displays one record out of many: it shows the first record from the XML file. I want it to display all of the records.
What I am trying to achieve from this code is:
I have an XML file I am getting the artist names from. I am then listing all of the artist names and inserting each into a link, which is dynamically created from the generated artist name. I then want to take that dynamically created link, which points to another XML file, and parse that file to get the size node, which is an image link (the image is of the artist). I then want to echo that image link out into an image tag which displays the image.
It partially works, but as I said earlier, it only displays one record instead of all the records in the XML file.
The returned XML is structured like this:
<lfm>
  <images>
    <image>
    <image>
    …
    <image>
Which means you have to iterate
$xml->images->image
Example:
$lfm = simplexml_load_file('http://…');
foreach ($lfm->images->image as $image) {
echo $image->sizes->size[4];
}
On a side note, there is no reason to use file_get_contents there. Either use simplexml_load_file or use new SimpleXmlElement('http://…', false, true). And really, no offense, but given that I have already given you an almost identical solution in the comments to "When extracting artist name from XML file only 1 record shows", I strongly suggest you try to understand what is happening there instead of just copying and pasting.
Problems:
Rate limiting. My comment from the question:
Please note how much network traffic you are generating on each execution of this script, and cache accordingly. It's quite possible you could be rate-limited if you execute too often or too many times in a day (and API rate-limits are often a lot lower than one might think).
Even if you are just "testing", or you and a "few other people" use this, every single request makes 40 automated requests to ws.audioscrobbler.com! They are not going to be happy about this, and since it appears they are smart, they have banned this kind of traffic.
When I run this script, ws.audioscrobbler.com serves up the first result (Artist: Adele), but gives request-failed warnings on many subsequent requests until some time period has passed, obviously due to a rate limit.
Remedies:
Check if the API for ws.audioscrobbler.com has a multiple-artist query. This would allow you to get multiple artists with one request.
Create a manager interface that can get AND CACHE results for one artist at a time. Then, perform this process when you need updates and use the cached results all other times.
Regardless of which method you use, cache, cache, cache!
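As a minimal sketch of that caching, assuming a cache/ directory next to the script, a one-day TTL, and a placeholder API key (all of these are illustrative choices, not part of the last.fm API itself):
function get_artist_images_xml($artist) {
    $cacheDir = __DIR__ . '/cache';
    if (!is_dir($cacheDir)) {
        mkdir($cacheDir, 0777, true);
    }
    $cacheFile = $cacheDir . '/' . md5($artist) . '.xml';
    // Cache hit: no request to ws.audioscrobbler.com at all
    if (is_file($cacheFile) && filemtime($cacheFile) + 86400 > time()) {
        return simplexml_load_file($cacheFile);
    }
    $url = 'http://ws.audioscrobbler.com/2.0/?method=artist.getimages&artist='
         . urlencode($artist) . '&api_key=YOUR_API_KEY';
    $data = file_get_contents($url);
    if ($data === false) {
        return false; // request failed (possibly rate-limited); let the caller decide
    }
    file_put_contents($cacheFile, $data, LOCK_EX);
    return simplexml_load_string($data);
}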
Wrong argument supplied to the inner foreach. file_get_contents returns a string. Even though the contents are XML, you haven't loaded it into an XML parser. You need to do that before you can iterate on it.

Parse Google Images API Json PHP

Hey, well, I'm trying to use the Google Images API with PHP, and I'm really not sure what to do.
This is basically what I have right now:
$jsonurl = "https://ajax.googleapis.com/ajax/services/search/images?v=1.0&q=test";
$json = file_get_contents($jsonurl,0,null,null);
$json_output = json_decode($json);
Where would I go from there to retrieve the first image url?
With a minor change to the last line of your code sample, the following will output the url of the first image in the result set.
<?php
$jsrc = "https://ajax.googleapis.com/ajax/services/search/images?v=1.0&q=test";
$json = file_get_contents($jsrc);
$jset = json_decode($json, true);
echo $jset["responseData"]["results"][0]["url"];
?>
For security reasons, many server configurations disable allow_url_fopen, which stops you from using file_get_contents on a remote file (a different domain name). The related ability to include remote files could potentially allow a hacker to load code from anywhere on the Internet into your site and then execute it.
Even if your server configuration does allow for it, then I wouldn't recommend using it for this purpose. The standard tool for retrieving remote HTTP data is cURL, and there are plenty of good tutorials out there doing exactly what you should do in this case.
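For example, a bare-bones cURL fetch of that endpoint might look like this (a sketch only; error handling is omitted):
$jsonurl = "https://ajax.googleapis.com/ajax/services/search/images?v=1.0&q=test";
$ch = curl_init($jsonurl);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$json = curl_exec($ch);
curl_close($ch);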
So, let's say you've successfully used cURL to retrieve the JSON array.
$json_output = json_decode($json, true); // Now the JSON data is an associative array
foreach ($json_output['responseData']['results'] as $result)
{
echo $result['url'] . '<br />';
}
Of course, you don't have to echo the URL there; you can do whatever you need to with the value.
I have to say, this is 10 shades of awesome.. But I come with bad news (don't shoot the messenger..)
Important: The Google Image Search API has been officially deprecated as of May 26, 2011. It will continue to work as per our deprecation policy, but the number of requests you may make per day may be limited.
That is, as they say, lame.
I feel as if Google might have hired one too many laid-off-from-IBM types... as they seem to be killing off all their "cool" APIs.
They launch services haphazardly, promising this and that and the other thing... but then some middle-manager gets screamed at after realizing (ta-da!) that XYZ project doesn't generate income (like image results without ads, lol) and then... they axe it..
Lesson: Don't get married (aka build your software or service) around any google API you can't replace on-the-fly at a moment's notice... Now, I'm no LTS-junkie - I'm just bitter because I'd much rather get my Google results via XML or JSON than the icky HTML-soup they throw at you...
One question, @Marcel... How can I get an array, or at least multiple JSON result responses, using that same "formula"? I thought maybe the 1 meant "result 1", but alas, no... Is there a "trick" to generate a content stream a la a Picasa XML feed?

How do I display protected Amazon S3 images on my secure site using PHP?

I am trying to move images for my site from my host to Amazon S3 cloud hosting. These images are of client work sites and cannot be publicly available. I would like them to be displayed on my site preferably by using the PHP SDK available from Amazon.
So far I have been able to script the conversion so that I look up records in my database, grab the file path, name the file appropriately, and send it to Amazon.
//upload to s3
$s3->create_object($bucket, $folder.$file_name_new, array(
'fileUpload' => $file_temp,
'acl' => AmazonS3::ACL_PRIVATE, //access denied, grantee only own
//'acl' => AmazonS3::ACL_PUBLIC, //image displayed
//'acl' => AmazonS3::ACL_OPEN, //image displayed, grantee everyone has open permission
//'acl' => AmazonS3::ACL_AUTH_READ, //image not displayed, grantee auth users has open permissions
//'acl' => AmazonS3::ACL_OWNER_READ, //image not displayed, grantee only ryan
//'acl' => AmazonS3::ACL_OWNER_FULL_CONTROL, //image not displayed, grantee only ryan
'storage' => AmazonS3::STORAGE_REDUCED
)
);
Before I copy everything over, I have created a simple form to test uploading and displaying an image. If I upload an image using ACL_PRIVATE, I can either grab the public URL, in which case I will not have access, or I can grab the public URL with a temporary key and display the image.
<?php
//display the image link
$temp_link = $s3->get_object_url($bucket, $folder.$file_name_new, '1 minute');
?>
<a href='<?php echo $temp_link; ?>'><?php echo $temp_link; ?></a><br />
<img src='<?php echo $temp_link; ?>' alt='finding image' /><br />
Using this method, how will my caching work? I'm guessing every time I refresh the page, or modify one of my records, I will be pulling that image again, increasing my get requests.
I have also considered using bucket policies to only allow image retrieval from certain referrers. Do I understand correctly that Amazon is supposed to serve requests only from pages or domains I specify?
I referenced:
https://forums.aws.amazon.com/thread.jspa?messageID=188183&#188183 to set that up, but then I am confused as to which security I need on my objects. It seemed like if I made them private they still would not display, unless I used the temp link as mentioned previously. If I made them public, I could navigate to them directly, regardless of referrer.
Am I way off base with what I'm trying to do here? Is this not really supported by S3, or am I missing something simple? I have gone through the SDK documentation and done lots of searching, and I feel like this should be a little more clearly documented, so hopefully any input here can help others in this situation. I've read of others who name the file with a unique ID, creating security through obscurity, but that won't cut it in my situation, and it's probably not best practice for anyone trying to be secure.
The best way to serve your images is to generate a url using the PHP SDK. That way the downloads go directly from S3 to your users.
You don't need to download via your servers as @mfonda suggested - you can set any caching headers you like on S3 objects - and if you did, you would be losing some major benefits of using S3.
However, as you pointed out in your question, the URL will always be changing (actually the query string), so browsers won't cache the file. The easy workaround is simply to always use the same expiry date so that the same query string is always generated. Or, better still, 'cache' the URL yourself (e.g. in the database) and reuse it every time.
You'll obviously have to set the expiry time somewhere far into the future, but you can regenerate these URLs every so often if you prefer. E.g. in your database you would store the generated URL and the expiry date (you could parse that from the URL too). Then either you use the existing URL or, if the expiry date has passed, generate a new one, etc.
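As a rough sketch of that idea, assuming a PDO connection in $pdo and a MySQL table image_urls(object_key, url, expires_at) - both of which are illustrative assumptions, not part of the SDK:
function get_cached_object_url($s3, $pdo, $bucket, $key) {
    $stmt = $pdo->prepare('SELECT url, expires_at FROM image_urls WHERE object_key = ?');
    $stmt->execute(array($key));
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    // Reuse the stored URL while it is still comfortably within its expiry
    if ($row && strtotime($row['expires_at']) > time() + 60) {
        return $row['url'];
    }
    // Generate a fresh presigned URL far enough in the future to be worth caching
    $expires = time() + 7 * 24 * 3600;
    $url = $s3->get_object_url($bucket, $key, $expires);
    $stmt = $pdo->prepare('REPLACE INTO image_urls (object_key, url, expires_at) VALUES (?, ?, ?)');
    $stmt->execute(array($key, $url, date('Y-m-d H:i:s', $expires)));
    return $url;
}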
You can use bucket policies in your Amazon bucket to allow your application's domain to access the file. In fact, you can even add your local dev domain (ex: mylocaldomain.local) to the access list and you will be able to get your images. Amazon provides sample bucket policies here: http://docs.aws.amazon.com/AmazonS3/latest/dev/AccessPolicyLanguage_UseCases_s3_a.html. This was very helpful to help me serve my images.
The policy below solved the problem that brought me to this SO topic:
{
    "Version": "2008-10-17",
    "Id": "http referer policy example",
    "Statement": [
        {
            "Sid": "Allow get requests originated from www.example.com and example.com",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::examplebucket/*",
            "Condition": {
                "StringLike": {
                    "aws:Referer": [
                        "http://www.example.com/*",
                        "http://example.com/*"
                    ]
                }
            }
        }
    ]
}
When you talk about security and protecting data from unauthorized users, one thing is clear: you have to check, every time that resource is accessed, that the requester is entitled to it.
That rules out generating a URL that anyone can access (it might be difficult to obtain, but still...). The only solution is an image proxy, and you can do that with a PHP script.
There is a fine article on Amazon's blog that suggests using readfile: http://blogs.aws.amazon.com/php/post/Tx2C4WJBMSMW68A/Streaming-Amazon-S3-Objects-From-a-Web-Server
readfile('s3://my-bucket/my-images/php.gif');
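The readfile() call above relies on the SDK's S3 stream wrapper being registered first. A minimal sketch, assuming the AWS SDK for PHP 2.x that the linked article uses (the credentials are placeholders):
require 'vendor/autoload.php';
use Aws\S3\S3Client;
$client = S3Client::factory(array(
    'key'    => 'YOUR_ACCESS_KEY',
    'secret' => 'YOUR_SECRET_KEY',
));
$client->registerStreamWrapper(); // makes the s3:// protocol available to PHP
header('Content-Type: image/gif');
readfile('s3://my-bucket/my-images/php.gif');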
You can download the contents from S3 (in a PHP script), then serve them using the correct headers.
As a rough example, say you had the following in image.php:
$s3 = new AmazonS3();
$response = $s3->get_object($bucket, $image_name);
if (!$response->isOK()) {
throw new Exception('Error downloading file from S3');
}
header("Content-Type: image/jpeg");
header("Content-Length: " . strlen($response->body));
die($response->body);
Then in your HTML code, you can do
<img src="image.php">
