Download large files with PHP on App Engine onto Cloud Storage

My App Engine project (running on the App Engine standard environment) uses a media-converter service outside of App Engine. I can easily start a new conversion job and get notified whenever the job is done. The media converter delivers a temporary URL from which the conversion result (an mp4 file) can be retrieved.
Now I want to start a background job that downloads this converted file from the media converter into my Google Cloud Storage bucket.
Whatever I have tried so far, I cannot download files larger than 32 MB.
These are my approaches so far:
The first just copies with file_put_contents / file_get_contents, as suggested at https://cloud.google.com/appengine/docs/standard/php/googlestorage/:
$options = [
    'Content-Type' => 'video/mp4',
    'Content-Disposition' => 'inline',
];
$context = stream_context_create(['gs' => $options]);
file_put_contents($targetPath, file_get_contents($url), 0, $context);
Then I tried to work directly with streams:
$CHUNK_SIZE = 1024 * 1024;
$targetStream = fopen($targetPath, 'w');
$sourceStream = fopen($url, 'rb');
if ($sourceStream === false || $targetStream === false) {
    return false;
}
while (!feof($sourceStream)) {
    $buffer = fread($sourceStream, $CHUNK_SIZE);
    Logger::log("Chunk");
    fwrite($targetStream, $buffer);
}
fclose($sourceStream);
fclose($targetStream);
Then I was surprised that this actually worked (up to 32 MB):
copy($url, $targetPath);
I'm running out of ideas now. Any suggestions? I basically need the equivalent of gsutil's cp command in PHP.
https://cloud.google.com/storage/docs/quickstart-gsutil
I think this Stack Overflow question deals with a similar issue:
Large file upload to google cloud storage using PHP

There is a strict limit of 32 MB of data for incoming requests.
Refer to Incoming Bandwidth for more details - https://cloud.google.com/appengine/quotas. This must be the reason why you are not able to get beyond the 32 MB limit.
Possible solution - can you modify the media-converter service?
If yes - create an API in the media-converter service and do the Cloud Storage upload at the media converter itself, invoking that endpoint from your App Engine application. Use a service account for Cloud Storage authentication (https://cloud.google.com/storage/docs/authentication#storage-authentication-php).
If no - you can do the same thing using Compute Engine. Create an API on a Compute Engine instance to which your App Engine background job passes the URL of the converted file; that instance then downloads the file and uploads it to Cloud Storage using service-account authentication. A sketch of such an upload is shown below.
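For reference, here is a minimal sketch of what that upload could look like with the google/cloud-storage Composer package and a service-account key file. The bucket name, key path, object name and $temporaryConverterUrl are placeholders, not part of the original setup:
<?php
// Sketch only: stream the converter's temporary URL straight into a bucket
// using a service account. Assumes `composer require google/cloud-storage`;
// bucket name, key path and object name are placeholders.
require __DIR__ . '/vendor/autoload.php';

use Google\Cloud\Storage\StorageClient;

$storage = new StorageClient([
    'keyFilePath' => '/path/to/service-account.json',
]);
$bucket = $storage->bucket('my-target-bucket');

// Passing a read stream avoids loading the whole mp4 into memory,
// and 'resumable' => true makes the client use a resumable upload.
$source = fopen($temporaryConverterUrl, 'rb');
$bucket->upload($source, [
    'name'      => 'converted/video.mp4',
    'resumable' => true,
    'metadata'  => ['contentType' => 'video/mp4'],
]);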

Related

How to download a large private file from Google Storage API with PHP?

I'm trying to allow downloads of large files to clients from the Google Storage API, behind an authenticated PHP endpoint.
I was able to read/download small files using the following code:
$object = $storage->objects->get($bucket, 'filename');
$request = new GuzzleHttp\Psr7\Request('GET', $object['mediaLink']);
//authorize the request before sending
$http = $client->authorize();
$response = $http->send($request);
$body = $response->getBody()->read($object->getSize());
$body will contain the entire content of the file, but some of these files might be 1 GB in size.
I tried using:
$stream = Psr7\stream_for($response->getBody());
But it doesn't work.
How would I be able to stream the download of the file to the client without loading it in memory?
Thanks.
Consider sending the client a signed URL so that the content is served directly from Google Cloud Storage, rather than trying to proxy the entire file download yourself.
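For example, a minimal sketch using the google/cloud-storage package and a service-account key (bucket and object names are placeholders); the client is redirected and then downloads directly from GCS:
<?php
// Sketch only: hand the client a time-limited signed URL instead of proxying
// the bytes through PHP. Assumes `composer require google/cloud-storage`;
// bucket name, object name and key path are placeholders.
require __DIR__ . '/vendor/autoload.php';

use Google\Cloud\Storage\StorageClient;

$storage = new StorageClient(['keyFilePath' => '/path/to/service-account.json']);
$object  = $storage->bucket('my-private-bucket')->object('big-file.bin');

// The URL is valid for one hour; the download then streams from GCS, not from PHP.
$url = $object->signedUrl(new \DateTime('+1 hour'));

header('Location: ' . $url, true, 302);
exit;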

How to include Google App Engine for PHP in my scripts / autoloading?

I have a website on an Ubuntu web server (not an app, and not hosted on App Engine) and I want to use Google Cloud Storage for the upload/download of large files. I am trying to upload a file directly to Google Cloud Storage, which isn't working (maybe because I made some basic errors).
I have installed the Google Cloud SDK and downloaded and unzipped the Google App Engine SDK. If I now include CloudStorageTools.php, I get the error:
Class 'google\appengine\CreateUploadURLRequest' not found
My script looks like this:
require_once 'google/appengine/api/cloud_storage/CloudStorageTools.php';
use google\appengine\api\cloud_storage\CloudStorageTools;
$options = [ 'gs_bucket_name' => 'test' ];
$upload_url = CloudStorageTools::createUploadUrl( '/test.php' , $options );
If you want to use the functionality of Google App Engine (GAE), you will need to host on GAE, which will likely have a larger impact on your app architecture (it uses a custom Google-compiled PHP build with limited libraries and no local file handling, so all of that functionality needs to go through the Blobstore or GCS, Google Cloud Storage).
With a PHP app running on Ubuntu, your best bet is to use the google-api-php-client to connect to the Storage JSON API.
Unfortunately the documentation is not very good for PHP. You can check my answer in How to rename or move a file in Google Cloud Storage (PHP API) to see how to GET / COPY / DELETE an object.
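For reference, a minimal sketch of a GET against the Storage JSON API with a current google-api-php-client release and a service-account key; the bucket, object and key path are placeholders:
<?php
// Sketch only: fetch object metadata through the Storage JSON API.
require __DIR__ . '/vendor/autoload.php';

$client = new Google_Client();
$client->setAuthConfig('/path/to/service-account.json');
$client->addScope(Google_Service_Storage::DEVSTORAGE_READ_ONLY);

$storage = new Google_Service_Storage($client);

// GET https://www.googleapis.com/storage/v1/b/{bucket}/o/{object}
$object = $storage->objects->get('my-bucket', 'path/to/file.mp4');
echo $object->getName() . ' is ' . $object->getSize() . " bytes\n";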
To upload, I would suggest retrieving a pre-signed upload URL like so:
//get google client and auth token for request
$gc = \Google::getClient();
if ($gc->isAccessTokenExpired()) {
    $gc->getAuth()->refreshTokenWithAssertion();
}
$googleAccessToken = json_decode($gc->getAccessToken(), true)['access_token'];

//compose url for the upload url request
$initUploadURL = "https://www.googleapis.com/upload/storage/v1/b/"
    . $bucket . "/o?uploadType=resumable&name="
    . urlencode($file_dest);

//compose headers
$initUploadHeaders = [
    "Authorization"           => "Bearer " . $googleAccessToken,
    "X-Upload-Content-Type"   => $mimetype,
    "X-Upload-Content-Length" => $filesize,
    "Content-Length"          => 0,
    "Origin"                  => env('APP_ADDRESS'),
];

//send request to retrieve the upload url
$req = $gc->getIo()->makeRequest(new \Google_Http_Request($initUploadURL, 'POST', $initUploadHeaders));

//pre-signed upload url that allows client-side upload
$presigned_upload_URL = $req->getResponseHeader('location');
With that URL sent to your client side, you can use it to PUT the file directly into your bucket with an upload script that issues the appropriate PUT request. Here is an example in AngularJS with ng-file-upload:
file.upload = Upload.http({
    url: uploadurl.url,
    skipAuthorization: true,
    method: 'PUT',
    filename: file.name,
    headers: {
        "Content-Type": file.type !== '' ? file.type : 'application/octet-stream'
    },
    data: file
});
Good luck - GCS is a tough one if you don't want to go Google all the way with App Engine!
The Google API PHP Client allows you to connect to any Google API, including the Cloud Storage API. Here's an example, and here's a getting-started guide.

Writing to GC Bucket using AppEngine

The Google docs state:
The GCS stream wrapper is built in to the run time, and is used when you supply a file name starting with gs://.
When I look into app.yaml, I see where the runtime is selected, and I have selected the php runtime. However, when I try to write to my bucket I get an error saying the wrapper is not found for gs://. But when I try to write to my bucket using the helloworld.php script that Google provides here, https://cloud.google.com/appengine/docs/php/gettingstarted/helloworld, modified so that it says
<?php
file_put_contents('gs://<app_id>.appspot.com/hello.txt', 'Hello');
I have to deploy the app in order for the write to be successful. I do not understand why I have to deploy the app every time to get the wrapper I need to write to my bucket. Why can I not write to my bucket from an arbitrary PHP script?
Google says
"In the Development Server, when a Google Cloud Storage URI is specified we emulate this functionality by reading and writing to temporary files on the user's local filesystem"
So, "gs://" is simulated locally - to actually write to GCS buckets using the stream wrapper, it has to run from App Engine itself.
Try something like this:
use google\appengine\api\cloud_storage\CloudStorageTools;

$object_url = "gs://bucket/file.png";
$options = stream_context_create(['gs' => ['acl' => 'private', 'Content-Type' => 'image/png']]);
$my_file = fopen($object_url, 'w', false, $options);
fwrite($my_file, $file_data);
fclose($my_file);

Upload an image to Google Cloud Storage with PHP?

I just can't get this code to work. I'm getting an image from a URL and storing it in a temporary folder so I can upload it to a bucket on Google Cloud Storage. This is a CodeIgniter project. This function is within a controller and is able to get the image and store it in the project root's 'tmp/entries' folder.
Am I missing something? The file just doesn't upload to Google Cloud Storage. I went to the Blobstore Viewer in my local App Engine dev server and noticed that there is a file, but it's empty. I want to be able to upload this file to Google's servers from my dev server as well. The dev server seems to override this option and save all files locally. Please help.
public function get()
{
    $filenameIn  = 'http://upload.wikimedia.org/wikipedia/commons/1/16/HDRI_Sample_Scene_Balls_(JPEG-HDR).jpg';
    $filenameOut = FCPATH . '/tmp/entries/1.jpg';
    $contentOrFalseOnFailure   = file_get_contents($filenameIn);
    $byteCountOrFalseOnFailure = file_put_contents($filenameOut, $contentOrFalseOnFailure);

    $options = ["gs" => ["Content-Type" => "image/jpeg"]];
    $ctx = stream_context_create($options);
    file_put_contents("gs://my-storage/entries/2.jpg", $contentOrFalseOnFailure, 0, $ctx);
    echo 'Saved the Image';
}
As you noticed, the dev app server emulates Cloud Storage locally, so this is the intended behaviour; it lets you test without modifying your production storage.
If you run the deployed app you should see the writes actually going to your GCS bucket.
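If it helps to see which mode you are in, a small heuristic (not an official API) is to check SERVER_SOFTWARE, which the dev appserver reports differently from production:
// Heuristic only: the PHP dev appserver sets SERVER_SOFTWARE to a value
// starting with "Development"; production App Engine reports "Google App Engine/...".
$onDevServer = isset($_SERVER['SERVER_SOFTWARE'])
    && strpos($_SERVER['SERVER_SOFTWARE'], 'Development') === 0;

if ($onDevServer) {
    // gs:// writes are emulated with local temporary files.
} else {
    // gs:// writes go to the real Cloud Storage bucket.
}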

RequestTimeout uploading to S3 using PHP

I am having trouble uploading files to S3 from one of our servers. We use S3 to store our backups, and all of our servers are running Ubuntu 8.04 with PHP 5.2.4 and libcurl 7.18.0. Whenever I try to upload a file, Amazon returns a RequestTimeout error. I know there is a bug in our current version of libcurl preventing uploads of over 200 MB. For that reason we split our backups into smaller files.
We have servers hosted on Amazon's EC2 and servers hosted on customer's "private clouds" (a VMWare ESX box behind their company firewall). The specific server that I am having trouble with is hosted on a customer's private cloud.
We use the Amazon S3 PHP Class from http://undesigned.org.za/2007/10/22/amazon-s3-php-class. I have tried 200MB, 100MB and 50MB files, all with the same results. We use the following to upload the files:
$s3 = new S3($access_key, $secret_key, false);
$success = $s3->putObjectFile($local_path, $bucket_name, $remote_name, S3::ACL_PRIVATE);
I have tried setting curl_setopt($curl, CURLOPT_NOPROGRESS, false); to view the progress bar while it uploads the file. The first time I ran it with this option set, it worked. However, every subsequent time it has failed. It seems to upload the file at around 3 Mb/s for 5-10 seconds and then drops to 0. After 20 seconds sitting at 0, Amazon returns the "RequestTimeout - Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed." error.
I have tried updating the S3 class to the latest version from GitHub but it made no difference. I also found the Amazon S3 Stream Wrapper class and gave that a try using the following code:
include 'gs3.php';
define('S3_KEY', 'ACCESSKEYGOESHERE');
define('S3_PRIVATE', 'SECRETKEYGOESHERE');

$local  = fopen('/path/to/backup_id.tar.gz.0000', 'r');
$remote = fopen('s3://bucket-name/customer/backup_id.tar.gz.0000', 'w+r');

$count = 0;
while (!feof($local)) {
    $result = fwrite($remote, fread($local, (1024 * 1024)));
    if ($result === false) {
        fwrite(STDOUT, $count++ . ': Unable to write!' . "\n");
    } else {
        fwrite(STDOUT, $count++ . ': Wrote ' . $result . ' bytes' . "\n");
    }
}

fclose($local);
fclose($remote);
This code reads the file one MB at a time in order to stream it to S3. For a 50MB file, I get "1: Wrote 1048576 bytes" 49 times (the first number changes each time of course) but on the last iteration of the loop I get an error that says "Notice: fputs(): send of 8192 bytes failed with errno=11 Resource temporarily unavailable in /path/to/http.php on line 230".
My first thought was that this is a networking issue. We called up the customer and explained the issue and asked them to take a look at their firewall to see if they were dropping anything. According to their network administrator the traffic is flowing just fine.
I am at a loss as to what I can do next. I have been running the backups manually and using SCP to transfer them to another machine and upload them. This is obviously not ideal and any help would be greatly appreciated.
Update - 06/23/2011
I have tried many of the options below, but they all produced the same result. I have found that even trying to scp a file from the server in question to another server stalls immediately and eventually times out. However, I can use scp to download that same file from another machine. This makes me even more convinced that this is a networking issue on the client's end; any further suggestions would be greatly appreciated.
This problem exists because you are trying to upload the same file again. Example:
$s3 = new S3('XXX','YYYY', false);
$s3->putObjectFile('file.jpg','bucket-name','file.jpg');
$s3->putObjectFile('file.jpg','bucket-name','newname-file.jpg');
To fix it, just copy the file and give it a new name, then upload it normally.
Example:
$s3 = new S3('XXX','YYYY', false);
$s3->putObjectFile('file.jpg','bucket-name','file.jpg');
// now rename file.jpg to newname-file.jpg
$s3->putObjectFile('newname-file.jpg','bucket-name','newname-file.jpg');
I solved this problem in another way. My bug was that the filesize() function returned an invalid cached size value, so just call clearstatcache() before reading the size.
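For example, a minimal sketch ($local_path here is just a placeholder for the file being uploaded):
// filesize() results are cached per path; clear the cache right before
// reading the size that will be sent to S3.
clearstatcache(true, $local_path);
$size = filesize($local_path);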
I have experienced this exact same issue several times.
I have many scripts right now which are uploading files to S3 constantly.
The best solution that I can offer is to use the Zend libraries (either the stream wrapper or direct S3 API).
http://framework.zend.com/manual/en/zend.service.amazon.s3.html
Since the latest release of Zend framework, I haven't seen any issues with timeouts. But, if you find that you are still having problems, a simple tweak will do the trick.
Simply open the file Zend/Http/Client.php and modify the 'timeout' value in the $config array. At the time of writing, it is on line 114. Before the latest release I was running with a 120-second timeout, but now things are running smoothly with a 10-second timeout.
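As an alternative to patching the library file, the timeout can usually be passed per client through the configuration array (Zend Framework 1 style; a sketch only, the URL is a placeholder):
// Sketch: pass the timeout through Zend_Http_Client's $config parameter
// instead of editing Zend/Http/Client.php.
require_once 'Zend/Http/Client.php';

$client = new Zend_Http_Client('https://s3.amazonaws.com/my-bucket/my-object', array(
    'timeout' => 120, // seconds; raise this for large uploads on slow links
));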
Hope this helps!
There are quite a few solutions available. I had this exact problem, but I didn't want to write code and figure out the problem.
Initially I was searching for a way to mount an S3 bucket on the Linux machine, and found something interesting:
s3fs - http://code.google.com/p/s3fs/wiki/InstallationNotes
- this did work for me. It uses a FUSE file system + rsync to sync the files with S3. It keeps a copy of all filenames in the local system and makes them look like files/folders.
This saved us a bunch of time, with no headache of writing code for transferring the files.
Then, while looking for other options, I found a command-line script that can help you manage your S3 account:
s3cmd - http://s3tools.org/s3cmd - this looks pretty clear.
[UPDATE]
Found one more CLI tool - s3sync:
s3sync - https://forums.aws.amazon.com/thread.jspa?threadID=11975&start=0&tstart=0 - found in the Amazon AWS community.
I don't see a big difference between the two; if you are not worried about disk space, I would choose s3fs over s3cmd. Having the files on disk makes you feel more comfortable, and you can see the files on the disk.
Hope it helps.
You should take a look at the AWS SDK for PHP. This is the AWS PHP library formerly known as Tarzan and CloudFusion.
http://aws.amazon.com/sdkforphp/
The S3 class included with this is rock solid. We use it to upload multi GB files all of the time.
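For what it's worth, here is a hedged sketch with the current AWS SDK for PHP (v3), which superseded the Tarzan/CloudFusion-era library; its multipart uploader retries individual parts, which helps on flaky connections. Bucket, key, region and file path are placeholders:
<?php
// Sketch only: multipart upload with the AWS SDK for PHP v3.
require __DIR__ . '/vendor/autoload.php';

use Aws\S3\S3Client;
use Aws\S3\MultipartUploader;
use Aws\Exception\MultipartUploadException;

$s3 = new S3Client([
    'version' => 'latest',
    'region'  => 'us-east-1',
]);

$uploader = new MultipartUploader($s3, '/path/to/backup.tar.gz', [
    'bucket' => 'my-backup-bucket',
    'key'    => 'backups/backup.tar.gz',
]);

try {
    $result = $uploader->upload();
    echo 'Uploaded to ' . $result['ObjectURL'] . "\n";
} catch (MultipartUploadException $e) {
    echo 'Upload failed: ' . $e->getMessage() . "\n";
}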
