Amazon S3: What are considered PUT/COPY/POST/LIST requests? - php

Kindly confirm if this is correct:
PUT is probably uploading files to S3?
COPY is probably copying files within S3?
How about POST and LIST?
Additional question: are get_bucket_filesize() and get_object_filesize() (from the PHP SDK) considered LIST requests?

From my experience using S3 (and also from the basics of HTTP protocol and REST), POST is the creation of a new object (in S3, it would be the upload of a new file), and PUT is a creation of a new object or update of an existing object (i.e., creation or update of a file). Additionally, from S3 docs:
POST is an alternate form of PUT that enables browser-based uploads as
a way of putting objects in buckets
Every time you, for example, get the contents of a given S3 bucket, you're running into a LIST operation. You have not asked, but a GET is the download of a file from S3 and DELETE would obviously be the deletion of a file. Of course these assumptions depend on which SDK you are using (it seems you're using the PHP one) and its underlying implementation. My argument is that it is possible to implement a download using a GET, an upload using a PUT or a POST, and so forth.
Taking a look at the S3 REST API, though, I assume get_bucket_filesize() is implemented as a LIST (a GET operation on a bucket brings back, along with some more data, the size of each object in the response) and get_object_filesize() is implemented as a GET (using a HEAD operation on a single file, which also returns its size in the metadata).
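For what it's worth, a quick way to see why a single object's size doesn't need a LIST is that S3 returns it in the Content-Length header of a HEAD response. A minimal sketch with cURL, assuming a publicly readable object at a made-up URL:
$url = 'https://my-bucket.s3.amazonaws.com/path/to/file.jpg'; // hypothetical object

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_NOBODY, true);         // send a HEAD request, no body transferred
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // don't echo the (empty) response
curl_exec($ch);
$size = curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD);
curl_close($ch);

echo "Object size: {$size} bytes\n";
Getting the size of every object in a bucket, by contrast, is a GET on the bucket itself, i.e. a LIST.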

Yes, you are right. PUT is uploading (specifically, one file is one PUT). I was watching for whether a PUT was per file or per some packet size, which would make it harder to price. It is putting a file, without reference to size.
ALSO, COPY is indeed copying files within S3, but there's more. See below.
I also found references to POST and LIST; see below.
So here is what I learned about PUT/COPY/POST/LIST and GET requests while digging in to assess our costs. I'm also including WHERE I discovered it (I wanted to get it all from Amazon). All corrections are welcome.
Amazon's FAQ is here: https://aws.amazon.com/s3/faqs/ and I'll reference this below.
COPY can be several things, one of which is copying between regions, which does cost. For example, if you store in West VA and COPY to the Northern CA region, that incurs a cost. Copying from EC2 to S3 (within the same region, I presume) incurs no transfer cost. See Amazon's FAQ in the section Q: How much does Amazon S3 cost?
NOTE: Writing a file, then re-writing that same file, stores both versions (unless you delete something). I'm guessing you are not charged more if the files are exactly the same, but don't send me the bill if I'm wrong. :-) It seems that the average size stored (for a month) is what is billed. See the FAQ (link above).
For PUT, GET and DELETE, it appears one file is one transaction. That answers a big question for me (I didn't want their 128k minimum size to mean a PUT for each 128k packet… yeah, I'm paranoid). See the FAQ question:
Q: How will I be charged and billed for my use of Amazon S3?
Request Example:
Assume you transfer 10,000 files into Amazon S3 and transfer 20,000 files out of Amazon S3 each day during the month of March. Then, you delete 5,000 files on March 31st.
Total PUT requests = 10,000 requests x 31 days = 310,000 requests
Total GET requests = 20,000 requests x 31 days = 620,000 requests
Total DELETE requests = 5,000 requests x 1 day = 5,000 requests
LIST is mentioned under the question:
Q: Can I use the Amazon S3 APIs or Management Console to list objects that I’ve archived to Amazon Glacier?
It is essentially getting a list of files… a directory, if you will.
POST is mentioned under RESTObjectPost.html here: http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPOST.html
I hope that helps. It sure made me more comfortable with what we would be charged.

There is not much of a difference between PUT and POST. The following was copied from the AWS S3 documentation.
POST is an alternate form of PUT that enables browser-based uploads as a way of putting objects in buckets. Parameters that are passed to PUT via HTTP Headers are instead passed as form fields to POST in the multipart/form-data encoded message body.
http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPOST.html
As others have specified, LIST is for listing objects. You can find all the operations at the following link.
http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketOps.html

Related

REST API with images

I am in the process of putting together a REST API of an image application to be consumed by an Angular Frontend. The API is being put together using PHP. All of the images are securely stored outside of the webroot.
The problem is that I am converting all my images to base64, which increases the payload; in some cases I have 40 images displayed on a page, and it is not uncommon to wait 30-40 seconds due to the huge payload.
What is the best practice for presenting images through a REST API? I have searched around, but there is nothing that exactly addresses the problem. Code below. The base64 images bloat the payload by an incredible amount. Any pointers, please.
//create presentation array
$presentation_arr = array();
$presentation_arr["records"] = array();

$LargeImageName = $slideName;
$LargefileDir   = $largefolder . $fileid . '/';
$Largefile      = $LargefileDir . $LargeImageName;

if (file_exists($Largefile)) {
    $b64largeImage = base64_encode(file_get_contents($Largefile));
    // double quotes (or concatenation) are needed so the variable is actually interpolated
    $datafullpath  = "data:image/jpeg;base64," . $b64largeImage;
}

$presentation_item = array(
    "id"         => $id,
    "smallimage" => $b64image,      // the small image is encoded the same way elsewhere
    "largeimage" => $b64largeImage
);

// push onto the same key that was initialised above ("records", not "imagerecords")
array_push($presentation_arr["records"], $presentation_item);
Two approaches:
Create a "wrapper" endpoint that is just a proxy to the final image itself (e.g. does a readfile() internally, see this: https://stackoverflow.com/a/1353867/1364793)
Host the images at a static, web accessible folder (or even consider S3 as a storage for static assets). Then, your main endpoint just returns publicly accessile URLs to those.
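A minimal sketch of the first approach, a wrapper script that streams an image from outside the webroot; the script name, the id parameter, and the on-disk layout built from $largefolder are assumptions, not taken from your code:
// image.php - hypothetical wrapper endpoint
$id   = basename($_GET['id'] ?? '');              // crude sanitisation of the requested id
$file = $largefolder . $id . '/' . $id . '.jpg';  // assumed on-disk layout

if (!is_file($file)) {
    http_response_code(404);
    exit;
}

header('Content-Type: image/jpeg');
header('Content-Length: ' . filesize($file));
readfile($file);                                  // stream the raw bytes, no base64 step
The API response then only carries URLs such as image.php?id=158f15, and the browser fetches (and caches) each image separately instead of waiting for one huge JSON payload.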
You write that you are serving images as base64 encoded blobs due to security concerns, including scraping.
To meet this security requirement, you are incurring a significant performance penalty: in server-side encoding effort, and in file transfer and rendering time on the client.
To improve server-side performance, you can cache the encoded version; you could write $b64largeImage to disk in the same directory, check whether it exists and send it to the client.
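A rough sketch of that caching idea, reusing $Largefile from the question's code; the .b64 suffix is just an assumption:
$cacheFile = $Largefile . '.b64';                       // cached encoded copy next to the original

if (is_file($cacheFile)) {
    $b64largeImage = file_get_contents($cacheFile);     // reuse the stored encoding
} else {
    $b64largeImage = base64_encode(file_get_contents($Largefile));
    file_put_contents($cacheFile, $b64largeImage);      // pay the encoding cost only once
}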
To improve transfer time, make sure you've got GZIP enabled on the server; this should compress your data.
However, client side performance will remain a problem - your images will most likely not be cached on the client, and decoding the images (especially if there are 40 on each page) can consume a decent amount of CPU (especially on mobile devices).
You then get the problem that if a browser can decode the image, an attacker/scraper can too, and they can store a copy of that image. So all that effort doesn't really buy you very much secrecy.
Of course, you may want to avoid having a 3rd party embedding your images in their pages, or you may want to avoid having them scrape your images easily.
In that case, you may want to focus on having URLs that are hard/impossible to guess, or expire. This will hurt your SEO, so it's a trade-off. S3 has expiring URLs, or you could create a service which checks the referrer for each request and only honours image requests from white-listed domains, or create your own expiring image URL service - but in each case, you'd serve JPEG/GIF/PNG images so you get small file sizes and limited decoding time.
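For the expiring-URL route on S3 specifically, a hedged sketch with the AWS SDK for PHP v3; the region, bucket, and key are placeholders:
require 'vendor/autoload.php';

use Aws\S3\S3Client;

$s3 = new S3Client([
    'region'  => 'us-east-1',   // placeholder region
    'version' => 'latest',
]);

$cmd = $s3->getCommand('GetObject', [
    'Bucket' => 'my-image-bucket',          // placeholder bucket
    'Key'    => 'slides/158f15/large.jpg',  // placeholder key
]);

$request = $s3->createPresignedRequest($cmd, '+10 minutes'); // link stops working after 10 minutes
$url     = (string) $request->getUri();                      // return this URL from your API
The client still downloads a plain JPEG, so file sizes and decoding time stay small; only the window in which the URL works is limited.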

How to implement video-streaming/ chunked encoding with Azure Blob Storage in PHP?

I am trying to set up a PHP API that is able to stream an ".mp4" video coming from Azure Blob Storage.
What I want to achieve is that the whole video does not have to be downloaded before playback starts.
I'm using Slim Framework 3 for my REST API.
I've already implemented "normal" video playback, but it takes very long until the video actually starts playing.
I am using the Azure Storage SDK for PHP to access Blob storage; I get the blob and use "fpassthru" to write the video into the HTTP response.
Additionally, I set the "Content-Type" and "Content-Length" headers.
$blob = $this->blobClient->getBlob($this->ContainerName, $filename);
// set the headers before writing the body; the length is assumed to come from the blob's properties
$response = $response->withHeader('Content-Type', 'video/mp4')
                     ->withHeader('Content-Length', $blob->getProperties()->getContentLength());
fpassthru($blob->getContentStream()); // write the blob stream straight to the output
Right now it takes a very long time until the (roughly 30 MB) video starts playing, because all the data must be downloaded before playback begins.
I would like to know if it's possible to enable a sort of "chunked" playback that starts once part of the video data has arrived.
I think the best way would be to stream your video using an HLS (RFC 8216) implementation.
It won't be simple, as you'll need to:
Provide an endpoint for getting index files (see the sketch after this list)
Break down your mp4 file into smaller chunks that can be downloaded separately by HLS-supporting players (I think most do these days)
You might need to manage the chunks internally, as breaking up the mp4 each time a user requests the video (or part of it) is very inefficient.
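As a very rough illustration of the first point, a Slim 3 route serving a hand-written HLS playlist might look like the sketch below. The segment names and durations are placeholders, and producing the .ts segments from the mp4 (for example offline with ffmpeg) is a separate step that is not shown:
$app->get('/videos/{name}/index.m3u8', function ($request, $response, $args) {
    // a static playlist pointing at pre-generated segment files
    $playlist = "#EXTM3U\n"
              . "#EXT-X-VERSION:3\n"
              . "#EXT-X-TARGETDURATION:10\n"
              . "#EXT-X-MEDIA-SEQUENCE:0\n"
              . "#EXTINF:10.0,\n"
              . "/videos/{$args['name']}/segment0.ts\n"
              . "#EXTINF:10.0,\n"
              . "/videos/{$args['name']}/segment1.ts\n"
              . "#EXT-X-ENDLIST\n";

    $response->getBody()->write($playlist);
    return $response->withHeader('Content-Type', 'application/vnd.apple.mpegurl');
});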
Also, you might want to take a look at something like hls-video-generater, or any other alternative you might find.
Update
If you're already using Azure, I would suggest you take a look at Media Services; it should give you what you want with very little effort.
Hope it helps!

file_exists on Amazon S3

I have a web page that lists thousands of links to image files. Currently the way this is handled is with a very large HTML file that is manually edited with the name of the image and links to the image file. The images aren't managed very well so often many of the links are broken or the name is wrong.
Here is an example of one line of the thousands of lines in the HTML file:
<h4>XL Green Shirt</h4>
<h5>SKU 158f15 </h5>
[TIFF]
[JPEG]
[PNG]
<br />
I have the product information about the images in a database, so my solution was to write a page in PHP to iterate through each of the product numbers in the database and see if a file existed with the same id and then display the appropriate link and information.
I did this with the PHP function file_exists() since the product id is the same as the file name, and it worked fine on my local machine. The problem is all the images are hosted on AmazonS3, so running this function thousands of times to S3 always causes the request to time out. I've tried similar PHP functions as well as pinging the URL and testing for a 200 or 404 response, all time out.
Is there a solution that can check the existence of a file on a remote URL and consume few resources? Or is there a more novel way I can attack this problem?
I think you would be better served by making sure the file exists when you place the record in the database than by trying to check for the existence of thousands of files on each and every page load.
That being said, an alternate solution would possibly be to use s3fs with a local storage cache directory within which to check for the existence of the file. This would be much faster than checking your S3 storage directly. s3fs would also provide a convenient way to write new files into the S3 storage.
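A different angle, sketched here only for comparison with s3fs: since the images all live in S3, one LIST pass over the bucket pulls every key (1,000 per page) in a handful of requests, and the existence check then becomes a local array lookup. This uses the AWS SDK for PHP v3; the region, bucket name, and .jpg key pattern are assumptions:
require 'vendor/autoload.php';

use Aws\S3\S3Client;

$s3 = new S3Client(['region' => 'us-east-1', 'version' => 'latest']); // placeholder region

// one paginated LIST instead of one request per product
$keys = [];
foreach ($s3->getPaginator('ListObjectsV2', ['Bucket' => 'my-image-bucket']) as $page) {
    foreach ($page['Contents'] ?? [] as $object) {
        $keys[$object['Key']] = true;
    }
}

// later, while building the page, for each product id from the database:
$hasJpeg = isset($keys[$productId . '.jpg']);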

Dealing with large amounts of data via XML API

So, I searched some here, but couldn't find anything good, apologies if my search-fu is insufficient...
So, what I have today is that my users upload a CSV text file using a form to my PHP script, and then I import that file into a database, after validating every line in it. The text file can be up to about 70,000 lines long, and each line contains 24 fields of values. Handling that amount of data is not really a problem in itself. Every line needs to be validated, plus I check the DB for duplicates (according to a dynamic key generated from the data) to determine whether the data should be inserted or updated.
Right, but my clients are now requesting an automatic API for this, so they don't have to manually create and upload a text file. Sure, but how would I do it?
If I were to use a REST server, memory would run out pretty quickly if one request contained XML for 70k posts to be inserted, so that's pretty much out of the question.
So, how should I do it? I have thought about three options; please help me decide or add more options to the list:
One post per request. Not all clients have 70k posts, but an update to the DB could result in the API handling 70k requests in a short period, and it would probably be daily either way.
X amount of posts per request. A limit is set to the number of posts that the API deals with per request, say, 100 at a time. This means 700 requests.
The API requires the client script to upload a CSV file ready to import using the current routine. This seems "fragile" and not very modern.
Any other ideas?
If you read up on SAX processing (http://en.wikipedia.org/wiki/Simple_API_for_XML) and HTTP chunked encoding (http://en.wikipedia.org/wiki/Chunked_transfer_encoding), you will see that it should be feasible to parse the XML document while it is being sent.
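A minimal sketch of that idea in PHP, using XMLReader (a streaming pull parser in the same spirit as SAX) to walk the request body as it arrives; the <post> element name is an assumption about the payload format:
$reader = new XMLReader();
$reader->open('php://input');   // parse the request body as a stream

while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'post') {
        $post = simplexml_load_string($reader->readOuterXML());
        // validate $post and insert/update its row here, one post at a time,
        // so the full 70k-post document never has to sit in memory
    }
}
$reader->close();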
I have now solved this by imposing a limit of 100 posts per request, and I am using REST through PHP to handle the data. Uploading 36,000 posts takes about two minutes with all the validation.
First of all, don't use XML for this! Use JSON; it is faster than XML.
On my project I use an import from XLS. The files are very large, but the script works fine; the client just has to create the files with the same structure for the import.

How to find out how many times a file has been downloaded?

I have an image that I send to affiliates for advertising.
So, how can I find out from my server the number of times that image has been downloaded?
Does the server log keep track of the image download count?
---- Addition ----
Thanks for the reply... a few more questions.
Because I want to do ad rotation, IP address tracking, etc.,
I think I should do it by making a dynamic page (PHP) that returns the proper image, right?
In this case, is there any way I can send that information to Google Analytics from the server? I know I can do it in JavaScript, but now the PHP should just return the image file, so what should I do? :)
Well, this can be done irrespective of your web server or language/platform.
Assuming the file is physically stored in a certain directory:
Write a program that gets to know which file has to be downloaded, e.g. through GET/POST parameters (there are other ways too).
Then point to that particular file physically:
fopen that file
read through it byte by byte
print them
fclose
store/increment/update the download counter in a database/flat file
And in the database you may keep the record as md5checksum -> downloadCounter (a rough sketch of the whole flow follows below).
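Pulling those steps together, a rough sketch; the directory paths, the file GET parameter, and the flat-file counter are all assumptions:
$name = basename($_GET['file'] ?? '');        // which file was asked for (GET parameter)
$path = '/var/www/ads/' . $name;              // assumed physical directory

if (!is_file($path)) {
    http_response_code(404);
    exit;
}

// store/increment the download counter, keyed by the file's md5 checksum
$counter = '/var/www/counters/' . md5_file($path) . '.txt';
$count   = is_file($counter) ? (int) file_get_contents($counter) : 0;
file_put_contents($counter, $count + 1, LOCK_EX);

header('Content-Type: image/jpeg');
header('Content-Length: ' . filesize($path));

// fopen the file, read through it (in chunks), print, fclose
$fp = fopen($path, 'rb');
while (!feof($fp)) {
    echo fread($fp, 8192);
}
fclose($fp);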
It depends on the server and how the image is downloaded.
1) Static image (i.e. the URL points to the actual image file): most servers (e.g. Apache) record each URL served (including the GET request for the image's URL) in the access log. There are a host of solutions for slicing and dicing access logs from web servers (especially Apache) and obtaining all sorts of statistics, including the count of accesses.
2) Another approach for fancier stuff is to serve the image by linking to a dynamic page which does some sort of computation (from a simple counter increment to some fancier statistics collection) and responds with an HTTP redirect to the real image.
Use Galvanize, a PHP class for GA, which will allow you to make a trackPageView call (for a virtual page representing your download, like the file's URL) from PHP.
HTTP log should have a GET for every time that image was accessed.
You should be able to configure your server to log each download. Then, you can just count the number of times the image appears in the log file.
