I have another issue with Amazon, and it's related to file uploads. I am using jQuery File Upload and the Amazon API to upload files to Amazon S3. I have succeeded in uploading them, but it involves a trick.
I had to store the image on my server and then move it to S3 from there, using the putObjectFile method of S3. The plugin comes with great functions to crop/resize images and I have been using them for a long time. Now that I have integrated the plugin with AWS, I am facing performance issues with uploads. The time taken for uploads is longer than normal, and this raises the question of whether we should use AWS S3 over the traditional way.
I had to make changes to my UploadHandler.php file to make it work. These are the changes I made: I added a part of the AWS upload code to the file, from line 735 to 750:
// Target S3 bucket and client (the access keys are defined as constants elsewhere)
$bucket = "elasticbeanstalk-2-66938761981";
$s3 = new S3(awsAccessKey, awsSecretKey);

// Upload the original file and its thumbnail with a public-read ACL
$response = $s3->putObjectFile($file_path, $bucket, $file->name, S3::ACL_PUBLIC_READ);
$thumbResponse = $s3->putObjectFile('files/thumbnail/' . $file->name, $bucket, 'images/' . $file->name, S3::ACL_PUBLIC_READ);

if ($response == 1) {
    // Upload succeeded
} else {
    $file->error = "<strong>Something went wrong while uploading your file... sorry.</strong>";
}
return $file;
Here is a link to the S3 class on GitHub.
With a normal upload to my current server (not Amazon), the same image uploads in 15 seconds, but with Amazon S3 it takes around 23 seconds, and I am not able to figure out a better solution. I have to store the image on my server before uploading to S3, as I am not sure whether I can process the images on the fly and upload them directly to S3. Can anyone suggest the right way to approach the problem? Is it possible to resize the images to different sizes in memory and upload them directly to S3, avoiding the overhead of saving them to our server? If yes, can anyone guide me in the right direction?
Thank you for your attention.
I believe the approximately 8 seconds of extra time is the overhead of creating versions of the image in different sizes.
You may take different approaches to get rid of the resizing overhead at upload time. The basic idea is to let the upload script finish execution and return its response, and to do the resizing as a separate process.
I'd like to suggest the following approaches:
Approach 1. Don't resize during the upload! Create resized versions on the fly only when they are requested for the first time, and cache the generated images to serve directly for later requests. I saw a few mentions of Amazon CloudFront as a solution in some other threads on Stack Overflow.
Approach 2. Invoke the code for creating resized versions as a separate asynchronous request after the upload of the original image. There will be a delay before the scaled versions become available, so write the necessary code to show placeholder images on the website until then. You will have to figure out some way to identify whether a scaled version is available yet (for example, check whether the file exists, or set a flag in the database). Some ways of making asynchronous cURL requests are suggested here if you would like to try it out; a rough sketch follows below.
I think both approaches involve a similar level of complexity.
Some other approaches are suggested as answers to this other question.
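For Approach 2, here is one of the simpler fire-and-forget variations, using a raw socket instead of cURL (the resize-worker.php script, the host name and the "file" parameter are all hypothetical, and error handling is minimal):

// Fire an HTTP request to a separate resize script and return without waiting for it.
// resize-worker.php should call ignore_user_abort(true) so it keeps running after we disconnect.
function triggerResize($fileName)
{
    $host = 'www.example.com';                       // hypothetical host
    $fp = fsockopen($host, 80, $errno, $errstr, 5);
    if (!$fp) {
        return false;
    }
    $query = http_build_query(array('file' => $fileName));
    $out  = "GET /resize-worker.php?" . $query . " HTTP/1.1\r\n";
    $out .= "Host: " . $host . "\r\n";
    $out .= "Connection: Close\r\n\r\n";
    fwrite($fp, $out);
    fclose($fp);                                     // close immediately; the worker keeps running
    return true;
}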
Related
What is a good way to upload multiple large files in PHP?
Note: in my case I don't really need very large files; I need something like 40-50 MB or maybe 100 MB uploads. My purpose and goal are mainly document uploads on websites; these documents (pdf, doc, etc.) can sometimes be 20-30 MB and rarely 100+ MB, where normally the PHP MAX_UPLOAD_FILESIZE is 10-20 MB. I think that having a large MAX_UPLOAD_FILESIZE is very bad, and allowing large file uploads in PHP (like 1 GB+) is a really bad idea anyway.
I have been reading, even here on SO, about different solutions like plupload and some others (HTTP Upload, bigUpload, etc.) and I am not sure which one to choose.
As a principle, I'd like to find something with maintained code (not an abandoned library), possibly following coding standards (PSR).
I think that writing everything from scratch would be a huge amount of work, but maybe I am wrong; if someone has done that, I would like to hear about your experience. Of course, if I can't find something that gives me what I need, I'll have to modify existing libraries or write my own.
I will give plupload a try, but my biggest concern is that it might not be well maintained. There is a v3 release, but the stable version is still v2. If I understand correctly, the last updates to this library on GitHub were made in 2017, and this doesn't really reassure me. PHP, and everything in general, changes really quickly.
Problem 1: Server knows the size of the file only when the upload is finished
I think this is the first problem. I can't be sure about the file size until the upload is completed. If I have to return an error in PHP because the file is too big, I'd need to upload the file first. I could guess the size in JavaScript, but I think that would be too easy to manipulate. The same goes for HTML5. All client-side checks can be "hacked" or manipulated; even if it requires work, it is still doable.
Problem 2: File chunking, is it the best solution?
I have read about file chunking, which is very interesting, but that is only client-side, right? Because to chunk a file on the server side, you go back to problem 1: you need to upload the whole file before chunking it. Is client-side chunking safe? Are there known vulnerabilities about it? (For example, could I exploit something while the file is being uploaded?) I will read more about this later.
Also, what about the time required to upload? Let's say you are using your phone on 4G to upload two PDFs or JPGs in a form; you have to upload 20-30 MB with a bad signal at that moment, and it takes 30 seconds or a minute to upload them. Will the server stop listening after a while?
In any case, with file chunking, do I still need a large MAX_FILESIZE_UPLOAD? Is the request usually sent (as a POST) with the entire file, which could return an error because the size exceeds the PHP limit? I would like to keep a normal PHP file size limit.
Problem 3: stopping file upload and deleting interrupted upload
What happens if I stop, or want to stop, my upload? I guess the server will find itself with a temporary file that is only partially uploaded.
Then, in general, is the best approach to have a cron job checking the /tmp (or whatever) folder and deleting incomplete files?
Problem 4: overwriting interrupted uploads
What if the user uploads a new file that should overwrite the old one, but the old one was not completed? Let's say the user uploads a 10 MB document but realizes it's the wrong one. He will probably reload the page, or maybe click "Browse" again and upload the new one. So all the temporary files should have unique names. If I remember correctly, PHP already gives them a random name in /tmp/. Is this enough? Would it be better to manually give them a random name, maybe based on a timestamp?
In code, something like:
$fileName = time() . '-' . uniqid() . '.tmp';
Problem 5: what about multiple files upload?
Let's say the user uploads 3 documents, 10 MB each. Should the server receive them all at the same time or one by one? Or does it not matter? At first glance, I would think that multiple simultaneous uploads could put more load on the server. Maybe this is not that important, though.
Problem 6: accessibility
Would all this be easily accessible? Do you think that a user using a screen reader would be able to upload multiple (and possibly large) files without problems?
Conclusion
In conclusion, I will take a look at existing libraries and test a bit to see if I can find a good solution for my needs. In general, I would like to read comments or experiences that could help me understand the difficulties and problems I could encounter.
I will try to help you with your questions:
Problem 1: Server knows the size of the file only when the upload is finished
You need to configure upload_max_filesize and post_max_size to allow the maximum file size you want to be able to receive in a single request.
And yes, you will only know the size once the file has been uploaded, as your script is executed only after the upload is complete.
You can add some kind of check in your JavaScript to improve the UI for the user.
Also, if the file exceeds the maximum size, it will not be available to your server-side script.
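For reference, the relevant php.ini directives might look like this (the values are examples, not recommendations; post_max_size must be at least as large as upload_max_filesize):

; php.ini
upload_max_filesize = 100M
post_max_size = 110M
; optionally give long uploads more time to be processed
max_execution_time = 300
max_input_time = 600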
Problem 2: File chunking, is it the best solution?
I don't think this would be any better than uploading the whole file.
The speed will not increase: although you can parallelize the upload, the upload speed is determined by the client's bandwidth, so you will end up uploading 1 file at 10 Mb/s or 10 chunks at 1 Mb/s, and in the end it will be the same.
Also, you will have very complex code on the client and on the server to handle it, and you will have to handle new error scenarios.
It is not worth it.
Problem 3: stopping file upload and deleting interrupted upload
If the client stops the upload, your server code will not be executed, as the request has not been completed.
Also, you won't have any trouble with leftover tmp files: the file is removed from the tmp folder when you move it using move_uploaded_file, and if you don't move it, it will be deleted when your script has finished.
https://www.php.net/manual/en/features.file-upload.post-method.php
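For illustration, a minimal sketch of that server-side handling (the form field name "document" and the uploads directory are hypothetical):

<?php
// upload.php - assumes a form field named "document" (hypothetical name)
if ($_FILES['document']['error'] !== UPLOAD_ERR_OK) {
    http_response_code(400);
    exit('Upload failed with error code ' . $_FILES['document']['error']);
}

$target = __DIR__ . '/uploads/' . basename($_FILES['document']['name']);

// move_uploaded_file() removes the temporary file; if it is never called,
// PHP deletes the tmp file automatically when the script finishes.
if (move_uploaded_file($_FILES['document']['tmp_name'], $target)) {
    echo 'Upload OK';
} else {
    http_response_code(500);
    echo 'Could not move the uploaded file';
}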
Problem 5: what about multiple files upload?
With multiple file uploads, I prefer to use separate requests via JavaScript for each file; then you can parallelize, or start uploading the files one by one and show the user the upload progress.
With JavaScript you can see the upload percentage of each request, which is very useful when uploading big files to give the user feedback that the application is working properly.
Problem 6: accessibility
You will have to handle the same accessibility issues as when uploading a small file.
Conclusion
The server-side code is independent of the file size you are trying to upload, as it just performs some checks and moves the file to the proper location.
You will have to write some JavaScript to improve the UI and show the progress to the user, and if you want to add the ability to cancel the upload, that is very easy too, as it is just an HTTP request with a listener that updates the progress (https://stackoverflow.com/a/47638378/1445024).
I have a bunch of images in Amazon S3 that I need to physically rotate. I currently do this by downloading the image to my server, rotating it using GD and overwriting it back to S3.
This process takes ~5 seconds per image. I was wondering if there is an AWS API or something similar that can do this rotation directly in S3, preferably in a batch mode?
I would appreciate it if anyone with experience in this kind of thing could give me some pointers!
There is no way to rotate an image 'on' S3. Any method you employ is going to have to read the file from S3, do the rotation, and write it back to S3.
If the server you are doing it on now is not an EC2 instance, then it's worth a try to do it there: the latency will be reduced quite a bit. Lambda is another option, in that it runs within the AWS infrastructure, so network overhead will be reduced.
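Whichever host runs it, the loop itself stays the same; here is a minimal sketch using the AWS SDK for PHP and GD (the bucket, key and region are placeholders, and error handling is omitted):

<?php
require 'vendor/autoload.php';

use Aws\S3\S3Client;

$s3     = new S3Client(['version' => 'latest', 'region' => 'us-east-1']); // placeholder region
$bucket = 'my-image-bucket';    // placeholder
$key    = 'photos/example.jpg'; // placeholder

// Read the original image from S3 into memory
$object = $s3->getObject(['Bucket' => $bucket, 'Key' => $key]);
$image  = imagecreatefromstring((string) $object['Body']);

// Rotate 90 degrees clockwise (GD rotates counter-clockwise, hence -90)
$rotated = imagerotate($image, -90, 0);

// Capture the rotated image as a JPEG string and write it back to the same key
ob_start();
imagejpeg($rotated, null, 90);
$body = ob_get_clean();

$s3->putObject([
    'Bucket'      => $bucket,
    'Key'         => $key,
    'Body'        => $body,
    'ContentType' => 'image/jpeg',
]);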
Not quite sure what your constraints might be, but if you're preparing the images for a web page, you could rotate them client-side using CSS. That would prevent the additional calls to S3 and eliminate processing load on your application server.
img {
    transform: rotate(90deg);
}
Is there any way to upload large files (more than 80 GB) through a web browser? Previously I have been uploading files (img, png, jpg) using plupload, but it does not seem to work for larger files. I would also like to know how to implement a web page where users can upload files, like on Mega.co.nz or Drive.google.com.
If it is impossible to do with web development tools, can anyone guide me on how to split a file into segments and upload them?
Thanks.
You can use the JavaScript Blob object to slice large files into smaller chunks and transfer these to the server to be merged together. This has the added benefit of being able to pause/resume uploads and indicate progress.
If you don't fancy doing it yourself there are existing solutions that use this approach. One example is HTML5 Uploader by Filkor.
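On the server side, the receiving script can simply append each chunk to the target file. A minimal sketch, assuming the client sends one chunk per request in order, with the chunk in a "chunk" field and the file name in a "name" field (both field names are hypothetical):

<?php
// receive-chunk.php
if ($_FILES['chunk']['error'] !== UPLOAD_ERR_OK) {
    http_response_code(400);
    exit('Chunk upload failed');
}

$name   = basename($_POST['name']);           // target file name sent by the client
$target = __DIR__ . '/uploads/' . $name;

// Append this chunk to the file assembled so far
$in  = fopen($_FILES['chunk']['tmp_name'], 'rb');
$out = fopen($target, 'ab');
stream_copy_to_stream($in, $out);
fclose($in);
fclose($out);

echo 'OK';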
If I were you, I would use something like FTP to accomplish this. If you can use ASP.NET, there are already good libraries for file transfer.
Here is a post that shows an example of uploading a file: Upload file to ftp using c#
The catch is that you will need a server. I suggest FileZilla. https://filezilla-project.org/
I've been reading a lot about dynamic image manipulation, storage and content delivery, and the company I'm working for already uses AWS for some of its services.
The application I'm working on stores document images in an S3 bucket (but is not limited to that), and I need to display them on demand.
The first version of this application stored the images locally and performed the image manipulation on demand on the same server.
Now the document storage has grown and a lot of images are being stored, all via the web application; this means that one user may upload, say, 100+ images and the server needs to process them as fast as it can.
That's why the images are uploaded to an EC2 instance and streamed to an S3 bucket internally. That's how we save the original image in the first place, with no thumbnails, to speed up the uploading process.
Then a different user may want to preview these images or see them at their original size, which is why I need to dynamically resize them. I will implement CloudFront for image caching after they are resized, and here comes the issue.
The workflow is like this:
1. The user requests an image from the CDN.
2.a. CloudFront serves the cached image, or
2.b. CloudFront requests the image from a custom origin if it is not cached.
3. The origin server queries S3 for the image.
4.a. If the requested image size exists on S3:
5. Return the image to CloudFront, cache it and return it to the user.
4.b. If the requested image size does not exist on S3:
5. Generate the new size from the original S3 image.
6. Save the new size to S3.
7. Return the new size to CloudFront, cache it and return it to the user.
The custom origin is responsible for creating the missing image size and saving it to S3; CloudFront can then use the cached image, or request this new image size from S3 as it now exists.
I think this is possible, as I have already read a lot of documentation about it, but I still haven't found documentation from someone who has done this before.
Does this look like a good way of handling the image manipulation? Has anyone seen any documentation about how to do this?
I'm a PHP developer, but I might be able to implement a non-PHP solution in favor of performance on the image server.
If you are open to non-PHP-based solutions, https://github.com/bcoe/thumbd is a good option, since it already integrates with S3, etc. However, you will need to know the sizes you need ahead of time. I would recommend such an approach rather than generating sizes on the fly, since it means faster response times for your users: they will not have to wait while a new size is generated. Storage on S3 is incredibly cheap, so you will not be wasting any money by creating multiple sizes either.
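If you stay in PHP, the same ahead-of-time idea could look roughly like this with GD and the AWS SDK for PHP (the bucket, the key layout and the list of widths are placeholders):

<?php
require 'vendor/autoload.php';

use Aws\S3\S3Client;

// Generate and upload a fixed set of widths right after the original upload.
function createSizes(S3Client $s3, $bucket, $key, $originalPath)
{
    $widths = array(200, 800);               // placeholder sizes
    $src    = imagecreatefromjpeg($originalPath);
    $w      = imagesx($src);
    $h      = imagesy($src);

    foreach ($widths as $width) {
        $height = (int) round($h * $width / $w);
        $thumb  = imagescale($src, $width, $height);

        ob_start();
        imagejpeg($thumb, null, 85);
        $body = ob_get_clean();

        $s3->putObject([
            'Bucket'      => $bucket,
            'Key'         => "resized/{$width}/" . $key, // placeholder key layout
            'Body'        => $body,
            'ContentType' => 'image/jpeg',
        ]);
    }
}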
1) I have an upload form.
2) It uploads the file to my local storage with move_uploaded_file.
3) It uses the Zend putObject function to move the file to an S3 object.
Everything works OK until the file size is around 30-40 MB. The problem is that when I try uploading larger files, like 80 MB or 100 MB, moving the file to S3 takes ages to complete. My code is something like this:
// Move the uploaded file to local storage first
$orginalPath = APPLICATION_PATH . "/../storage/" . $fileName;
move_uploaded_file($data['files']['tmp_name'], $orginalPath);

// Then push it to S3 with a public-read ACL
$s3 = new Zend_Service_Amazon_S3($accessKey, $secretKey);
$s3->putObject($path, file_get_contents($orginalPath),
    array(Zend_Service_Amazon_S3::S3_ACL_HEADER => Zend_Service_Amazon_S3::S3_ACL_PUBLIC_READ));
Can you help me move large files to S3 quickly? I tried using the stream wrapper like this:
$s3->registerStreamWrapper("s3");
file_put_contents("s3://my-bucket-name/orginal/$fileName", file_get_contents($orginalPath));
But no luck; it takes just as long to move the file.
So, is there an efficient way to move a file quickly to an S3 bucket?
The answer is a worker process. You can start a PHP worker script via the PHP CLI on server boot, perhaps with the GearmanClient PHP extension and a Gearman server running on your box. Then you queue a background job to upload the file to S3: your main site's PHP code returns success after issuing the job, and the file happily uploads in the background while your foreground site continues on its merry way. Another way of doing this is to make another server do all of this work, so your main site remains free of this load. I am doing this now. It works well.
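A rough sketch of that pattern with the Gearman PHP extension (the job name "upload_to_s3" and the uploadToS3() helper are hypothetical):

<?php
// Web request side: queue the job and return to the user immediately
$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);
$client->doBackground('upload_to_s3', json_encode([
    'path'   => $orginalPath,
    'bucket' => 'my-bucket-name',
    'key'    => "orginal/$fileName",
]));

// Worker side (started once via the PHP CLI, e.g. "php worker.php"):
$worker = new GearmanWorker();
$worker->addServer('127.0.0.1', 4730);
$worker->addFunction('upload_to_s3', function (GearmanJob $job) {
    $data = json_decode($job->workload(), true);
    // uploadToS3() is a hypothetical helper wrapping your existing putObject call
    uploadToS3($data['path'], $data['bucket'], $data['key']);
});
while ($worker->work());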
You could consider using the more direct POST to S3 feature. The AWS SDK for PHP has a class to help generate the data for the form.
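If you go that route, a minimal sketch using the SDK's PostObjectV4 class might look like this (the bucket, region, key prefix and expiry are placeholders):

<?php
require 'vendor/autoload.php';

use Aws\S3\S3Client;
use Aws\S3\PostObjectV4;

$client = new S3Client(['version' => 'latest', 'region' => 'us-east-1']); // placeholder region
$bucket = 'my-bucket-name';                                               // placeholder

$formInputs = ['acl' => 'public-read'];
$options    = [
    ['acl' => 'public-read'],
    ['bucket' => $bucket],
    ['starts-with', '$key', 'uploads/'],   // placeholder key prefix
];

$postObject = new PostObjectV4($client, $bucket, $formInputs, $options, '+1 hours');

// Use these to build an HTML form that posts the file straight to S3,
// so it never has to pass through your own server first.
$attributes = $postObject->getFormAttributes(); // action, method, enctype
$inputs     = $postObject->getFormInputs();     // hidden fields including policy and signature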