AWS S3 PHP Progress Bar (Server to Cloud) - php

I have an image upload system in my application written in PHP. The file browser opens, the user picks an image, I upload it to my server, I crop, I resize, I apply a watermark. Bottom line: the images end up on my server. At some point, the user clicks a button and I then move those files to my S3 bucket. Naturally, I need a progress bar because, ze client wants a progress bar.
Now uploading the files is quite easy:
$result = $this->awsS3Client->putObject(array(
    'Bucket' => 'bad-dum-tss-bucket',
    'Key' => $destinationFilePath,
    'SourceFile' => $sourceFilePath,
    'ContentType' => $mimeType,
    'ACL' => 'public-read',
));
I can even go multi-part:
$uploader = UploadBuilder::newInstance()
    ->setClient($this->awsS3Client)
    ->setSource($sourceFilePath)
    ->setBucket('bad-dum-tss-bucket')
    ->setKey($destinationFilePath)
    ->build();

try {
    $uploader->upload();
} catch (MultipartUploadException $e) {
    $uploader->abort();
}
No problem there, until I realize my client needs a freaking progress bar. Now I've searched a lot and all I can find are links to uploaders such as http://fineuploader.com/ that assume the upload happens directly from the browser (i.e. not from my server). So PHP-progress bar-S3, anybody?

If you're still interested, I found a way to track progress in PHP with AWS SDK v3.
$client = new S3Client(/* config */);
$result = $client->putObject([
    'Bucket' => 'bucket-name',
    'Key' => 'bucket-name/file.ext',
    'SourceFile' => 'local-file.ext',
    'ContentType' => 'application/pdf',
    '@http' => [
        'progress' => function ($downloadTotalSize, $downloadSizeSoFar, $uploadTotalSize, $uploadSizeSoFar) {
            printf(
                "%s of %s downloaded, %s of %s uploaded.\n",
                $downloadSizeSoFar,
                $downloadTotalSize,
                $uploadSizeSoFar,
                $uploadTotalSize
            );
        },
    ],
]);
This is explained in the AWS SDK for PHP docs, in the S3 client configuration section. It works by exposing Guzzle's progress request option (a callable), as explained in this SO answer.
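If you go the multipart route instead, the SDK v3 MultipartUploader also accepts a before_upload callback that fires before each part is sent, which you could use to record progress server-side. A rough sketch, assuming $this->awsS3Client is a configured S3Client and $progressFile is a hypothetical path that your browser-facing code polls:

use Aws\S3\MultipartUploader;
use Aws\Exception\MultipartUploadException;

$fileSize = filesize($sourceFilePath);
$partSize = 5 * 1024 * 1024; // keep in sync with 'part_size' below

$uploader = new MultipartUploader($this->awsS3Client, $sourceFilePath, [
    'bucket' => 'bad-dum-tss-bucket',
    'key' => $destinationFilePath,
    'part_size' => $partSize,
    'concurrency' => 1, // parts are sent sequentially, so the part number tracks progress
    'before_upload' => function (\Aws\CommandInterface $cmd) use ($partSize, $fileSize, $progressFile) {
        // Parts before the current one have already been sent.
        $bytesSent = min($fileSize, ($cmd['PartNumber'] - 1) * $partSize);
        // Persist progress somewhere the browser can poll (a file here; a cache or session would also work).
        file_put_contents($progressFile, (string) round($bytesSent / $fileSize * 100));
    },
]);

try {
    $result = $uploader->upload();
} catch (MultipartUploadException $e) {
    // clean up / retry as needed
}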

I got this to work by saving the upload progress to a session variable and firing concurrent XHRs at the server to poll it. See "Why are my XHR calls waiting for each other to return a response", where I asked a related question about XHR polling and session blocking in order to accomplish this.
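For anyone trying the same approach, the polling endpoint looked roughly like the sketch below (file name and session key are placeholders). The key detail from the linked question is calling session_write_close() as early as possible so the polling XHRs don't queue behind the session lock held by the upload request.

// progress.php, polled by the browser while the server-to-S3 transfer runs
session_start();
$progress = isset($_SESSION['s3_upload_progress']) ? $_SESSION['s3_upload_progress'] : 0;
session_write_close(); // release the session lock immediately so other requests aren't blocked

header('Content-Type: application/json');
echo json_encode(['progress' => (int) $progress]);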
In the end, though, I decided to drop all of this altogether. My production server was an EC2 instance, so uploads to S3 incurred very little network overhead (I should have realized this sooner). I could transfer a couple of MB of images (all I will ever need) in less than 3 seconds, so I decided not to display a progress bar at all; it didn't justify the cost of adding nasty session calls in various parts of my code.

Related

Dynamic S3 Link for downloading or viewing a PDF

I am storing some customer PDFs in S3 for multiple parties to either view in the browser or download. The trouble is I can only get a single file in S3 to either always download or always view in the browser.
I could just upload the same file twice with each having its own ContentDisposition, but that seems wasteful when ideally it could be as simple as adding something like ?ContentDisposition=inline to the public bucket URL.
My Question: How can I dynamically set a ContentDisposition for a single S3 file?
For context, my current code looks something like this:
$s3_object = array(
    'ContentDisposition' => sprintf('attachment; filename="%s"', addslashes($basename)),
    'ACL' => 'public-read',
    'ContentType' => 'application/pdf',
    'StorageClass' => 'REDUCED_REDUNDANCY',
    'Bucket' => 'sample',
    'Key' => static::build_file_path($path, $filename, $extension),
    'Body' => $binary_content,
);
$result = $s3_client->putObject($s3_object);
Also, I did try to search for this elsewhere in SO, but most people seem to just be looking for one or the other, so I didn't find any SO answers that showed how to do this.
I ended up stumbling across the definitive answer for this today (over a month later) while looking at other S3 documentation. In the GetObject docs for the S3 API, under the section labeled "Overriding Response Header Values", we find the following:
Note: You must sign the request, either using an Authorization header or a presigned URL, when using these parameters. They cannot be used with an unsigned (anonymous) request.
response-content-language
response-expires
response-cache-control
response-content-disposition
response-content-encoding
This answers how to dynamically change any S3 object's content-disposition via the URL. However, at least for me, this is an imperfect solution because my intended use case was to store the URL for years as part of an invoicing archive, but signed URLs are only valid for a maximum of 1 week.
I could technically also try to find a way to make the Authorization header work for me or just query the S3 API to get a new signed URL every time I want to link to it, but that has other security, performance, and ROI implications for me that make it not worth it.
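For reference, with the PHP SDK v3 this boils down to generating a presigned GetObject request with ResponseContentDisposition set. A minimal sketch (region, bucket, key, and expiry are placeholders):

use Aws\S3\S3Client;

$s3_client = new S3Client([
    'region' => 'us-east-1',
    'version' => 'latest',
]);

// Ask S3 to override the stored Content-Disposition for this request only.
$command = $s3_client->getCommand('GetObject', [
    'Bucket' => 'sample',
    'Key' => 'path/to/invoice.pdf',
    'ResponseContentDisposition' => 'inline; filename="invoice.pdf"',
]);

// SigV4 presigned URLs are valid for at most 7 days.
$request = $s3_client->createPresignedRequest($command, '+7 days');
$url = (string) $request->getUri();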

Google Cloud resumable upload in PHP - How does it differ programmatically from regular upload?

I have a Google Cloud upload function that works fine with the relatively small files I've been uploading (see code below). But I will soon need to upload files up to 500 MB in size, so I was looking into the "resumable" upload option. The Google documentation basically says that for files over 5 MB the upload is simply converted to a resumable upload. But what does that mean? Does it mean I don't have to make any coding changes? Does it mean that if my page times out and I re-invoke the page and start the upload again, the Google API will automatically detect that the previous upload failed, simply resume the upload where it left off, and only return a valid (non-NULL) storageObject to me once the upload completes?
Here is my current "non-resumable" code:
function uploadFile($bucketName, &$fileContent, $cloudPath) {
    $privateKeyFileContent = $GLOBALS['privateKeyFileContent'];
    // connect to Google Cloud Storage using private key as authentication
    try {
        $storage = new StorageClient([
            'keyFile' => json_decode($privateKeyFileContent, true)
        ]);
    } catch (Exception $e) {
        // maybe invalid private key?
        print $e;
        return false;
    }
    // set which bucket to work in
    $bucket = $storage->bucket($bucketName);
    $sFileHash = base64_encode(md5($fileContent, true));
    $storageObject = $bucket->upload(
        $fileContent,
        [
            'name' => $cloudPath,
            'metadata' => ['md5Hash' => $sFileHash]
        ]
    );
    return $storageObject; // will be null on failure
}
Comparing upload() and getResumableUploader(), the documentation is a little confusing. The Cloud Client Library documentation for the upload() function states:
Upload your data in a simple fashion. Uploads will default to being resumable if the file size is greater than 5mb.
And the Cloud Storage documentation states:
Resumable uploads are automatically managed on your behalf, but can be directly controlled using the resumable option.
This means that with your current code you could enable the resumable option for upload() by adding 'resumable' => true. Though I'm not too certain; there's probably something happening behind the scenes we're not seeing, and the documentation doesn't explain it clearly. The example would look something like:
$storageObject = $bucket->upload(
    $fileContent,
    [
        'name' => $cloudPath,
        'metadata' => ['md5Hash' => $sFileHash],
        'resumable' => true
    ]
);
I had a look at the source code on GitHub for both of these methods and they have practically the same configuration options, but getResumableUploader() provides getResumeUri(), which is necessary for resume(); I couldn't find an equivalent for the normal upload(), though I wouldn't rule it out.
$uploader = $bucket->getResumableUploader(
    fopen($fileContent, 'r'),
    [
        'name' => $cloudPath,
        'metadata' => ['md5Hash' => $sFileHash]
    ]
);

try {
    $object = $uploader->upload();
} catch (GoogleException $ex) {
    // if there is an error it can automatically restart;
    // $uploader holds the 'resumeUri', which is what is used to resume the upload
    $resumeUri = $uploader->getResumeUri();
    $object = $uploader->resume($resumeUri);
}
resume() handles all of the necessary headers and bytes sent for you to resume an upload.
The case you are describing should be resumable.
All of the following HTTP status responses are considered retryable:
408 Request Timeout
500 Internal Server Error
502 Bad Gateway
503 Service Unavailable
504 Gateway Timeout
The following HTTP status responses are non-retryable:
404 Not Found
410 Gone
More info on HTTP statuses here
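Tying the two together, a rough sketch of resuming only on the retryable statuses above (ServiceException and the status check are my assumptions here, and $sourceFilePath / $cloudPath are placeholders):

use Google\Cloud\Core\Exception\ServiceException;

// Retry/resume only when the failure is one of the retryable statuses listed above.
$retryableStatuses = [408, 500, 502, 503, 504];

$uploader = $bucket->getResumableUploader(
    fopen($sourceFilePath, 'r'),
    ['name' => $cloudPath]
);

try {
    $object = $uploader->upload();
} catch (ServiceException $ex) {
    if (in_array($ex->getCode(), $retryableStatuses, true)) {
        // Pick up where the previous attempt left off.
        $object = $uploader->resume($uploader->getResumeUri());
    } else {
        throw $ex; // 404, 410, etc. are not worth retrying
    }
}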

Setting public access on Google Cloud Storage objects through resumable upload signed urls

I'm trying to update a project to use resumable uploads, and I have managed to upload to my bucket with the client side handling all the PUT requests.
One issue I have though is setting the ACL on the object itself.
Client Side.
I have tried setting the header on the PUTs with both,
'x-goog-acl':'public-read'
and
'acl':'public-read'
The latter works fine with my non-resumable uploads, but I'm not 100% sure which one I'm expected to use with resumable uploads, or whether it even matters.
Server Side
I'm using the 'beginSignedUploadSession' method with the Google Cloud Storage for PHP Library
I've seen examples like:
$bucket->upload(
    fopen('/data/file.txt', 'r'),
    [
        'predefinedAcl' => 'publicRead'
    ]
);
So I've tried...
$url = $object->beginSignedUploadSession([
    'predefinedAcl' => 'publicRead'
]);
However looking at the docs, the predefinedAcl parameter does not seem to be supported for this method.
beginSignedUploadSession Parameters
The only thing I can think to try is using the headers directly, like:
$url = $object->beginSignedUploadSession([
    'headers' => array('x-goog-acl' => 'public-read'),
    'contentType' => $filetype
]);
Although this seems to also fail with both 'x-goog-acl' and 'acl' headers.
So, my question is: does anyone know the correct way to set the ACL on an object using the beginSignedUploadSession method, or whether there is a workaround if it is not possible directly?
Thanks.
Update
So far the only way I've been able to do this is to edit the source of the library.
I've hard-coded the header in google-cloud-php/Core/src/Upload/SignedUrlUploader.php with:
'x-goog-acl' => 'public-read'
This is obviously horrible, but it works for me for now.
I'm still very much interested in the correct way, or in why the header isn't getting passed through. If I find out at a later date, I'll update this post.
Thanks again.
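One other workaround that might avoid patching the library (an untested sketch, not from the original posts): keep the signed upload as-is and flip the ACL server-side once the client reports that the upload finished.

// Assumes $bucket is a Google\Cloud\Storage\Bucket and $objectName is the name
// used when creating the signed upload session.
$object = $bucket->object($objectName);
$object->acl()->add('allUsers', 'READER'); // equivalent of public-read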
You can use the getResumableUploader method of the Google Cloud PHP library; it has an option for the ACL.
getResumableUploader
$uploader = $bucket->getResumableUploader(
    fopen('/data/file.txt', 'r'),
    [
        'predefinedAcl' => 'publicRead'
    ]
);
Then to upload:
try {
    $object = $uploader->upload();
} catch (GoogleException $ex) {
    $resumeUri = $uploader->getResumeUri();
    $object = $uploader->resume($resumeUri);
}

Logging to CloudWatch from EC2 instances

My EC2 servers are currently hosting a website that logs each registered user's activity under their own separate log file on the local EC2 instance, say username.log. I'm trying to figure out a way to push log events for these to CloudWatch using the PHP SDK without slowing the application down, AND while still being able to maintain a separate log file for each registered member of my website.
I can't for the life of me figure this out:
OPTION 1: How can I log to CloudWatch asynchronously using the CloudWatch SDK? My PHP application is behaving VERY sluggishly, since each log line takes roughly 100ms to push directly to CloudWatch. Code sample is below.
OPTION 2: Alternatively, how could I configure an installed CloudWatch Agent on EC2 to simply OBSERVE all of my log files, which would basically upload them asynchronously to CloudWatch for me in a separate process? The CloudWatch EC2 Logging Agent requires a static "configuration file" (AWS documentation) on your server which, to my knowledge, needs to list out all of your log files ("log streams") in advance, which I won't be able to predict at the time of server startup. Is there any way around this (i.e., simply observe ALL log files in a directory)? Config file sample is below.
All ideas are welcome here, but I don't want my solution to simply be "throw all your logs into a single file, so that your log names are always predictable".
Thanks in advance!!!
OPTION 1: Logging via SDK (takes ~100ms / logEvent):
// Configuration to use for the CloudWatch client
$sharedConfig = [
    'region' => 'us-east-1',
    'version' => 'latest',
    'http' => [
        'verify' => false
    ]
];

// Create a CloudWatch Logs client
$cwClient = new Aws\CloudWatchLogs\CloudWatchLogsClient($sharedConfig);

// DESCRIBE ANY EXISTING LOG STREAMS / FILES
$create_new_stream = true;
$next_sequence_id = "0";
$result = $cwClient->describeLogStreams([
    'logGroupName' => 'user_logs',
    'logStreamNamePrefix' => $stream,
    'descending' => true,
]);

// Iterate through the results, looking for a stream that already exists with the intended name.
// This is so that we can get the next sequence id ('uploadSequenceToken'), so we can add a line to an existing log file.
foreach ($result->get("logStreams") as $stream_temp) {
    if ($stream_temp['logStreamName'] == $stream) {
        $create_new_stream = false;
        if (array_key_exists('uploadSequenceToken', $stream_temp)) {
            $next_sequence_id = $stream_temp['uploadSequenceToken'];
        }
        break;
    }
}

// CREATE A NEW LOG STREAM / FILE IF NECESSARY
if ($create_new_stream) {
    $result = $cwClient->createLogStream([
        'logGroupName' => 'user_logs',
        'logStreamName' => $stream,
    ]);
}

// PUSH A LINE TO THE LOG *** This step ALONE takes 70-100ms!!! ***
$result = $cwClient->putLogEvents([
    'logGroupName' => 'user_logs',
    'logStreamName' => $stream,
    'logEvents' => [
        [
            'timestamp' => round(microtime(true) * 1000),
            'message' => $msg,
        ],
    ],
    'sequenceToken' => $next_sequence_id
]);
OPTION 2: Logging via the installed CloudWatch agent (note that the config file below only allows hardcoded, predetermined log names as far as I know):
[general]
state_file = /var/awslogs/state/agent-state
[applog]
file = /var/www/html/logs/applog.log
log_group_name = PP
log_stream_name = applog.log
datetime_format = %Y-%m-%d %H:%M:%S
Looks like we have some good news now... not sure if it's too late!
CloudWatch Log Configuration
So, to answer the question:
Is there any way around this (i.e., simply observe ALL log files in a directory)?
Yes, you can specify log files and file paths using wildcards, which gives you some flexibility in configuring where the logs are fetched from and which log streams they are pushed to.
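For example, with the legacy awslogs agent the file setting accepts wildcards, so a section along these lines (paths and names are placeholders) would watch everything matching the pattern. Worth noting: the agent reference says wildcards are mainly intended for rotated copies of the same log and that only the most recently modified match is pushed, so test this against a one-file-per-user layout before relying on it.

[user_logs]
file = /var/www/html/logs/*.log
log_group_name = user_logs
log_stream_name = {instance_id}
datetime_format = %Y-%m-%d %H:%M:%S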

Sdk Glacier php timeout

After playing around a bit and uploading some small test files, I wanted to upload a bigger file of around 200 MB, but I always get a timeout exception; then I tried to upload a 30 MB file and the same thing happens.
I think the timeout is 30 seconds. Is it possible to tell the Glacier client to wait until the upload is done?
This is the code I use:
$glacier->uploadArchive(array(
    'vaultName' => $vaultName,
    'archiveDescription' => $desc,
    'body' => $body
));
I have tested with other files and the same thing happens; then I tried with a small 4 MB file and the operation was successful. So I thought of dividing the files and uploading them one by one, but then again, around the third one a timeout exception comes up.
I also tried the multipart upload with the following code:
$glacier = GlacierClient::factory(array(
    'key' => 'key',
    'secret' => 'secret',
    'region' => Region::US_WEST_2
));

$multiupload = $glacier->initiateMultipartUpload(array(
    'vaultName' => 'vaultName',
    'partSize' => '4194304'
));

// An array for the suffixes of the tar file
foreach ($suffixes as $suffix) {
    $contents = file_get_contents('file.tar.gz' . $suffix);
    $glacier->uploadMultipartPart(array(
        'vaultName' => 'vaultName',
        'uploadId' => $multiupload->get('uploadId'),
        'body' => $contents
    ));
}

$result = $glacier->completeMultipartUpload(array(
    'vaultName' => 'vaultName',
    'uploadId' => $multiupload->get('uploadId'),
));

echo $result->get('archiveId');
It's missing the Range parameter. I don't think I fully understand how this multipart upload works, but I suspect I will hit the same timeout exception. So my question is, as I said before:
Is it possible to tell the Glacier client to wait until the upload is done?
The timeout sounds like a script timeout, like Jimzie said.
As for using the Glacier client, you should check out this blog post from the official AWS PHP Developer Blog, which shows how to do multipart uploads to Glacier using the UploadPartGenerator object. If you are doing the part uploads in different requests/processes, keep in mind that the UploadPartGenerator class can be serialized.
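For what it's worth, a rough sketch of how the missing 'range' parameter from the question's loop could be filled in (SDK v2 style; offsets are assumed to be contiguous and the part size to match initiateMultipartUpload). completeMultipartUpload also expects the overall tree-hash checksum and archive size, which is what the UploadPartGenerator mentioned above helps compute.

$partSize = 4194304; // must match the partSize passed to initiateMultipartUpload
$offset = 0;

foreach ($suffixes as $suffix) {
    $contents = file_get_contents('file.tar.gz' . $suffix);

    $glacier->uploadMultipartPart(array(
        'vaultName' => 'vaultName',
        'uploadId' => $multiupload->get('uploadId'),
        'body' => $contents,
        // Each part declares which byte range of the archive it covers, e.g. "bytes 0-4194303/*".
        'range' => sprintf('bytes %d-%d/*', $offset, $offset + strlen($contents) - 1),
    ));

    $offset += strlen($contents);
}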
This sounds suspiciously like a script timeout. Try
set_time_limit(120);
just inside of the foreach loop. This will give you a two minute PHP sanity timer for each of your multi-part files.
