I have the following script, which works with small files but fails when I try a huge file (4 GB):
<?php
require 'vendor/autoload.php';

use Google\Cloud\Storage\StorageClient;

$storage = new StorageClient([
    'keyFilePath' => 'keyfile.json',
    'projectId' => 'storage-123456'
]);

$bucket = $storage->bucket('my-bucket');

$options = [
    'resumable' => true,
    'chunkSize' => 200000,
    'predefinedAcl' => 'publicRead'
];

// Upload a file to the bucket.
$bucket->upload(
    fopen('data/file.imgc', 'r'),
    $options
);
?>
The error I receive is:
Fatal error: Uncaught exception 'Google\Cloud\Core\Exception\GoogleException' with message 'Upload failed. Please use this URI to resume your upload:
Any ideas how to upload a large file?
http://googlecloudplatform.github.io/google-cloud-php/#/docs/google-cloud/v0.61.0/storage/bucket?method=upload
I've also tried the getResumableUploader():
$uploader = $bucket->getResumableUploader(fopen('data/file.imgc', 'r'), [
    'name' => 'file.imgc'
]);

try {
    $object = $uploader->upload();
} catch (GoogleException $ex) {
    $resumeUri = $uploader->getResumeUri();
    $object = $uploader->resume($resumeUri);
}
When navigating to the resume URI, it returns "Method Not Allowed".
I haven't used this API, but I don't think you're supposed to read a massive file into memory and stuff it into a single request. You're requesting a resumable operation, so read small portions of the file, a couple of MB at a time, and loop until you have uploaded all of its parts.
One thing that stands out as a potential issue is the chosen chunkSize of 200000. Per the documentation, chunkSize must be provided in multiples of 262144.
Also, when dealing with large files, I would highly recommend using Bucket::getResumableUploader(). It gives you better control over the upload process, and you should find it more reliable :). There is a code snippet in the link I shared that should help get you started.
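Putting those two points together, a minimal sketch might look like the following (reusing the key file, project ID, bucket, and file path from your question, and picking a chunkSize that is a multiple of 262144; adjust as needed):
<?php
require 'vendor/autoload.php';

use Google\Cloud\Core\Exception\GoogleException;
use Google\Cloud\Storage\StorageClient;

$storage = new StorageClient([
    'keyFilePath' => 'keyfile.json',
    'projectId' => 'storage-123456'
]);
$bucket = $storage->bucket('my-bucket');

// Stream the file and send it in 256 KiB-aligned chunks instead of one request.
$uploader = $bucket->getResumableUploader(
    fopen('data/file.imgc', 'r'),
    [
        'name' => 'file.imgc',
        'predefinedAcl' => 'publicRead',
        'chunkSize' => 262144 * 4 // must be a multiple of 262144
    ]
);

try {
    $object = $uploader->upload();
} catch (GoogleException $ex) {
    // If the upload is interrupted, resume it from where it left off.
    $resumeUri = $uploader->getResumeUri();
    $object = $uploader->resume($resumeUri);
}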
I'm using the https://github.com/vimeo/laravel package to upload my video to Vimeo. But there's a little problem with the file size, so I edited the PHP and nginx configuration to allow a request size of up to 500 MB, which isn't good (keep in mind this is my test server and not production). I'm wondering if the package itself streams the file while uploading, or whether it uses as much memory as the file size and uploads it all at once.
Here's my code:
public function UploadToVimeo(Request $request)
{
    $this->validate($request, [
        'class_id'   => 'required|exists:teacher_classes,class_id',
        'video_name' => 'required|mimes:mp4,mov,ogg,qt',
    ]);

    $file = $request->video_name;
    $result = Vimeo::upload($file);

    if ($result) {
        $str = str_replace('/videos/', '', $result);
        TeacherClass::where('class_id', $request->class_id)->update(['url' => 'https://vimeo.com'.$str]);
    }

    return back()->with('result', $str);
}
Can someone explain to me how the package works? Or a way to stream the file?
Thank you
I use Cloud Vision to annotate documents with DOCUMENT_TEXT_DETECTION, and I only use the word data.
The annotate command returns a lot of information for each letter/symbol (languages, vertices, breaks, text, confidence, ...), which adds up to a lot of memory usage. Running annotate on a 4-page document¹ returns over 100 MB of data, which is past my PHP memory limit, causing the script to crash. Getting only the word data would probably be about 5 times smaller.
To be clear, I load the VisionClient, set up the image, run the annotate() command, and it returns a 100 MB variable directly, crashing at that point, before I get the chance to do any cleaning.
$vision = new VisionClient([/* key & id here */]);
$image = $vision->image(file_get_contents($imagepath), ['DOCUMENT_TEXT_DETECTION']);
$annotation = $vision->annotate($image); // Crash at that point trying to allocate too much memory.
Is there a way to not request the entirety of the data? The documentation on annotate seems to indicate that it's possible to annotate only part of the picture, but not to drop the symbol data.
At a more fundamental level, am I doing something wrong here regarding memory management in general?
Thanks
Edit: Just realized I also need to store the data in a file, which I do using serialize()... which doubles the memory usage when run, even if I do $annotation = serialize($annotation) to avoid having two variables. So I'd actually need 200 MB per user.
¹ Though this is related to the amount of text rather than the number of pages.
Dino,
When dealing with large images, I would highly recommend uploading your image to Cloud Storage and then running the annotation request against the image in a bucket. This way you'll be able to take advantage of the resumable or streaming protocols available in the Storage library to upload your object with more reliability and with less memory consumption. Here's a quick snippet of what this could look like using the resumable uploader:
use Google\Cloud\Core\Exception\GoogleException;
use Google\Cloud\Storage\StorageClient;
use Google\Cloud\Vision\VisionClient;

$storage = new StorageClient();
$bucket = $storage->bucket('my-bucket');
$imageName = 'my-image.png';

$uploader = $bucket->getResumableUploader(
    fopen('/path/to/local/image.png', 'r'),
    [
        'name' => $imageName,
        'chunkSize' => 262144 // This will read data in smaller chunks, freeing up memory
    ]
);

try {
    $uploader->upload();
} catch (GoogleException $ex) {
    $resumeUri = $uploader->getResumeUri();
    $uploader->resume($resumeUri);
}

$vision = new VisionClient();
$image = $vision->image($bucket->object($imageName), [
    'FACE_DETECTION'
]);

$vision->annotate($image);
https://googlecloudplatform.github.io/google-cloud-php/#/docs/google-cloud/v0.63.0/storage/bucket?method=getResumableUploader
I want to upload a big video file to my AWS S3 bucket. After a good many hours, I finally managed to configure my php.ini and nginx.conf files so they allow bigger files.
But then I got a "Fatal Error: Allowed Memory Size of XXXXXXXXXX Bytes Exhausted". After some time I found out that larger files should be uploaded with streams, using fopen(), fwrite(), and fclose().
Since I'm using Laravel 5, the filesystem takes care of much of this. Except that I can't get it to work.
My current ResourceController#store looks like this:
public function store(ResourceRequest $request)
{
    /* Prepare data */
    $resource = new Resource();
    $key = 'resource-'.$resource->id;
    $bucket = env('AWS_BUCKET');
    $filePath = $request->file('resource')->getRealPath();

    /* Open & write stream */
    $stream = fopen($filePath, 'w');
    Storage::writeStream($key, $stream, ['public']);

    /* Store entry in DB */
    $resource->title = $request->title;
    $resource->save();

    /* Success message */
    session()->flash('message', $request->title . ' uploadet!');

    return redirect()->route('resource-index');
}
But now I get this long error:
CouldNotCreateChecksumException in SignatureV4.php line 148:
A sha256 checksum could not be calculated for the provided upload body, because it was not seekable. To prevent this error you can either 1) include the ContentMD5 or ContentSHA256 parameters with your request, 2) use a seekable stream for the body, or 3) wrap the non-seekable stream in a GuzzleHttp\Stream\CachingStream object. You should be careful though and remember that the CachingStream utilizes PHP temp streams. This means that the stream will be temporarily stored on the local disk.
So I am currently completely lost. I can't figure out if I'm even on the right track. Here are the resources I'm trying to make sense of:
AWS SDK guide for PHP: Stream Wrappers
AWS SDK introduction on stream wrappers
Flysystem original API on stream wrappers
And just to confuse me even more, there seems to be another way to upload large files other than streams: the so-called "multipart" upload. I actually thought that was what the streams were all about...
What is the difference?
I had the same problem and came up with this solution.
Instead of using
Storage::put('file.jpg', $contents);
which of course ran into an "out of memory" error, I used this method:
use Aws\S3\MultipartUploader;
use Aws\Exception\MultipartUploadException;

// ...

public function uploadToS3($fromPath, $toPath)
{
    $disk = Storage::disk('s3');

    $uploader = new MultipartUploader($disk->getDriver()->getAdapter()->getClient(), $fromPath, [
        'bucket' => Config::get('filesystems.disks.s3.bucket'),
        'key'    => $toPath,
    ]);

    try {
        $result = $uploader->upload();
        echo "Upload complete";
    } catch (MultipartUploadException $e) {
        echo $e->getMessage();
    }
}
Tested with Laravel 5.1
Here are the official AWS PHP SDK docs:
http://docs.aws.amazon.com/aws-sdk-php/v3/guide/service/s3-multipart-upload.html
The streaming part applies to downloads.
For uploads, you need to know the content size. For large files, multipart uploads are the way to go.
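To illustrate, here is a minimal sketch of a multipart upload fed from an open stream rather than a path, so the whole file never has to sit in memory. The region, bucket, key, and local path are placeholders, and part_size must be at least 5 MB:
use Aws\S3\S3Client;
use Aws\S3\MultipartUploader;
use Aws\Exception\MultipartUploadException;

$s3 = new S3Client([
    'version' => 'latest',
    'region'  => 'us-east-1', // placeholder region
]);

// The source may be a file path, an open resource, or a PSR-7 stream;
// the uploader reads and sends it part by part instead of all at once.
$source = fopen('/path/to/large/video.mp4', 'rb');

$uploader = new MultipartUploader($s3, $source, [
    'bucket'    => 'my-bucket',          // placeholder bucket
    'key'       => 'videos/video.mp4',   // placeholder key
    'part_size' => 10 * 1024 * 1024,     // 10 MB parts (minimum is 5 MB)
]);

try {
    $uploader->upload();
    echo "Upload complete";
} catch (MultipartUploadException $e) {
    echo $e->getMessage();
}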
I have created a simple PHP script to play around with Google's Drive SDK. The plan is that eventually we will use Google Drive as a form of CDN for some of our web content (our company has already upgraded to 1 TB).
The code works to a degree, in that it successfully authenticates and uploads a file. The problem is, the file is always broken and cannot be viewed either with Drive itself, or by downloading.
The code is relatively simple, and just fetches an image from Wikipedia and attempts an upload:
<?php
require_once 'Google/Client.php';
require_once 'Google/Service/Drive.php';
require_once 'Google/Service/Oauth2.php';

$client = new Google_Client();
//$client->setUseObjects(true);
//$client->setAuthClass('apiOAuth2');
$client->setScopes(array('https://www.googleapis.com/auth/drive.file'));
$client->setClientId('***');
$client->setClientSecret('***');
$client->setRedirectUri('***');
$client->setAccessToken(authenticate($client));

// initialise the Google Drive service
$service = new Google_Service_Drive($client);

$data = file_get_contents('http://upload.wikimedia.org/wikipedia/commons/3/38/JPEG_example_JPG_RIP_010.jpg');

// create and upload a new Google Drive file, including the data
try
{
    //Insert a file
    $file = new Google_Service_Drive_DriveFile($client);
    $file->setTitle(uniqid().'.jpg');
    $file->setMimeType('image/jpeg');

    $createdFile = $service->files->insert($file, array(
        'data' => $data,
        'mimeType' => 'image/jpeg',
    ));
}
catch (Exception $e)
{
    print $e->getMessage();
}

print_r($createdFile);
?>
The print_r statement executes and we get information about the file. However, as I mentioned, the file is not viewable, and appears to be corrupt. Can anyone shed any light on what the issue may be?
After doing some more digging around in the docs (the current public docs are seriously out of date), I found that it's necessary to send another parameter as part of the insert() function's body parameter (the second argument in the function call).
Using the following:
$createdFile = $service->files->insert($doc, array(
    'data' => $content,
    'mimeType' => 'image/jpeg',
    'uploadType' => 'media', // this is the new info
));
I was able to get everything working. I'll leave the question here, as I think it will be very useful until such a time that Google actually updates the documentation for the PHP API.
Source info here
Your code seems to be ok.
Have you tried to download this file from Google Drive afterwards and look at it?
Also, I know it's a bit stupid, but have you tried to write the file to disk right after using file_get_contents()? You know, just to establish for 100% the point where it goes bad.
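For example, a quick sanity check along those lines could look like this (the local path is just a placeholder):
$data = file_get_contents('http://upload.wikimedia.org/wikipedia/commons/3/38/JPEG_example_JPG_RIP_010.jpg');

// If this local copy opens fine, the download step is not where the corruption happens.
file_put_contents('/tmp/drive-upload-check.jpg', $data);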
Will waitUntilObjectExists pause the PHP script until it finds the object on the S3 servers?
I have it inside a foreach loop, uploading images one by one. After the object is found, I call a method to delete the image locally and then delete the local folder if it's empty. Is this a proper way of going about it? Thanks
foreach ($fileNames as $fileName)
{
    $imgSize = getimagesize($folderPath . $fileName);
    $width  = (string)$imgSize[0];
    $height = (string)$imgSize[1];

    // upload the images
    $result = $S3->putObject(array(
        'ACL'        => 'public-read',
        'Bucket'     => $bucket,
        'Key'        => $keyPrefix . $fileName,
        'SourceFile' => $folderPath . $fileName,
        'Metadata'   => array(
            'w' => $width,
            'h' => $height
        )
    ));

    $S3->waitUntilObjectExists(array(
        'Bucket' => $bucket,
        'Key'    => $keyPrefix . $fileName));

    $this->deleteStoreDirectory($folderPath, $fileName);
}
waitUntilObjectExists is basically a waiter that periodically checks (polls) S3 at specific time intervals to see if the resource is available. The script's execution is blocked until the resource is located or the maximum number of retries is reached.
As the AWS docs defines them:
Waiters help make it easier to work with eventually consistent systems by providing an easy way to wait until a resource enters into a particular state by polling the resource.
By default, the waitUntilObjectExists waiter is configured to try to locate the resource 20 times, with a 5-second delay between each try. You can override these defaults by passing additional parameters to the waitUntilObjectExists method.
If the waiter is unable to locate the resource after the maximum number of tries, it will throw an exception.
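For instance, with version 2 of the SDK (which the guide linked below covers), the polling behavior can be tuned roughly like this; treat the exact option names as an assumption to verify against your SDK version:
// Poll every 3 seconds and give up after 10 attempts,
// instead of the default 20 attempts with a 5-second delay.
$S3->waitUntilObjectExists(array(
    'Bucket' => $bucket,
    'Key'    => $keyPrefix . $fileName,
    'waiter.interval'     => 3,  // assumed SDK v2 option name
    'waiter.max_attempts' => 10  // assumed SDK v2 option name
));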
You can learn more about waiters at:
http://docs.aws.amazon.com/aws-sdk-php-2/guide/latest/feature-waiters.html
For your use case, I don't think it makes sense to call waitUntilObjectExists after you uploaded the object, unless the same PHP script tries to retrieve the same object from S3 later in the code.
If the putObject API call has returned a successful response, then the object will eventually show up in S3 and you don't necessarily need to wait for this to happen before you remove the local files.
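In other words, a leaner version of your loop could look something like this sketch (assuming, as in your code, that deleteStoreDirectory handles the local cleanup):
foreach ($fileNames as $fileName) {
    // putObject throws an exception if the upload fails,
    // so reaching the next line means S3 accepted the object.
    $S3->putObject(array(
        'ACL'        => 'public-read',
        'Bucket'     => $bucket,
        'Key'        => $keyPrefix . $fileName,
        'SourceFile' => $folderPath . $fileName
    ));

    // Safe to clean up locally without waiting for the object to become listable.
    $this->deleteStoreDirectory($folderPath, $fileName);
}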