I am using Flysystem with the Iron.io queue and I am attempting to run a DB query that will return ~1.8 million records, processing 5,000 at a time. Here is the error message I am receiving once the file reaches 50+ MB:
PHP Fatal error: Allowed memory size of ########## bytes exhausted
Here are the steps I would like to take:
1) Get the data
2) Turn it into a CSV appropriate string (i.e. implode(',', $dataArray) . "\r\n")
3) Get the file from the server (in this case S3)
4) Read that file's contents, append this new string to it, and re-write that content to the S3 file
Here is a brief run down of the code I have:
public function fire($job, $data)
{
// First set the headers and write the initial file to server
$this->filesystem->write($this->filename, implode(',', $this->setHeaders($parameters)) . "\r\n", [
'visibility' => 'public',
'mimetype' => 'text/csv',
]);
// Loop to get new sets of data
$offset = 0;
while ($this->exportResult) {
$this->exportResult = $this->getData($parameters, $offset);
if ($this->exportResult) {
$this->writeToFile($this->exportResult);
$offset += 5000;
}
}
}
private function writeToFile($contentToBeAdded = '')
{
$content = $this->filesystem->read($this->filename);
// Append new data
$content .= $contentToBeAdded;
$this->filesystem->update($this->filename, $content, [
'visibility' => 'public'
]);
}
I'm assuming this is NOT the most efficient approach. I am going off of these docs:
PHPLeague Flysystem
If anyone can point me in a more appropriate direction, that would be awesome!
Flysystem supports reading, writing, and updating via streams.
Please check the latest API: https://flysystem.thephpleague.com/api/
$stream = fopen('/path/to/database.backup', 'r+');
$filesystem->writeStream('backups/'.strftime('%G-%m-%d').'.backup', $stream);
// Using write you can also directly set the visibility
$filesystem->writeStream('backups/'.strftime('%G-%m-%d').'.backup', $stream, [
'visibility' => AdapterInterface::VISIBILITY_PRIVATE
]);
if (is_resource($stream)) {
fclose($stream);
}
// Or update a file with stream contents
$filesystem->updateStream('backups/'.strftime('%G-%m-%d').'.backup', $stream);
// Retrieve a read-stream
$stream = $filesystem->readStream('something/is/here.ext');
$contents = stream_get_contents($stream);
fclose($stream);
// Create or overwrite using a stream.
$putStream = tmpfile();
fwrite($putStream, $contents);
rewind($putStream);
$filesystem->putStream('somewhere/here.txt', $putStream);
if (is_resource($putStream)) {
fclose($putStream);
}
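Applied to the CSV job from the question, one way to use these stream methods is to build the export into a temporary stream and hand that stream to Flysystem once, instead of reading the whole object back on every batch. A minimal sketch, assuming getData() returns an array of row arrays and that the question's filesystem/filename properties are available:
// php://temp keeps small data in memory and spills to a temp file as it grows
$stream = fopen('php://temp', 'r+');
fwrite($stream, implode(',', $this->setHeaders($parameters)) . "\r\n");

$offset = 0;
while ($rows = $this->getData($parameters, $offset)) {
    foreach ($rows as $row) {
        fwrite($stream, implode(',', $row) . "\r\n");
    }
    $offset += 5000;
}

rewind($stream);

// One streamed write to S3 instead of read + append + rewrite per batch
$this->filesystem->writeStream($this->filename, $stream, [
    'visibility' => 'public',
    'mimetype' => 'text/csv',
]);

if (is_resource($stream)) {
    fclose($stream);
}
This keeps only one batch of rows in PHP memory at a time; the CSV itself accumulates in the temporary stream rather than in an ever-growing string.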
If you are working with S3, I would use the AWS SDK for PHP directly to solve this particular problem. Appending to a file is actually very easy using the SDK's S3 stream wrapper, and doesn't force you to read the entire file into memory.
$s3 = \Aws\S3\S3Client::factory($clientConfig);
$s3->registerStreamWrapper();
$appendHandle = fopen("s3://{$bucket}/{$key}", 'a');
fwrite($appendHandle, $data);
fclose($appendHandle);
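Tied back to the question's batching loop, the same append handle can receive each 5,000-row batch as it is generated, so nothing larger than one batch is ever held in memory. A sketch, assuming the header row was already written to the object and that getData() returns an array of row arrays:
$s3 = \Aws\S3\S3Client::factory($clientConfig);
$s3->registerStreamWrapper();

// Open the existing S3 object in append mode via the stream wrapper
$appendHandle = fopen("s3://{$bucket}/{$key}", 'a');

$offset = 0;
while ($rows = $this->getData($parameters, $offset)) {
    foreach ($rows as $row) {
        fwrite($appendHandle, implode(',', $row) . "\r\n");
    }
    $offset += 5000;
}

fclose($appendHandle);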
I want to read CSV file content using PHP and the Google Drive API v3.
I got the file ID and file name, but I am not sure how I can read the file content.
$service = new Drive($client);
$results = $service->files->listFiles();
$fileId="1I****************";
$file = $service->files->get($fileId);
The Google Drive API is a file storage API. It allows you to upload, download and manage the storage of files.
It does not give you access to the contents of the file.
To do this you would need to download the file and open it locally.
Alternatively, since it's a CSV file, you may want to consider converting it to a Google Sheet; then you could use the Google Sheets API to access the data within the file programmatically.
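If the CSV is small enough to hold in memory, its content can also be fetched in a single call by asking for alt=media and then parsed with PHP's CSV helpers. A minimal sketch, assuming the google/apiclient v2 Drive service ($service) and $fileId from the question:
// alt=media asks the API for the file's content instead of its metadata
$response = $service->files->get($fileId, array('alt' => 'media'));
$content = (string) $response->getBody();

// Split into lines and parse each line as CSV
$rows = array_map('str_getcsv', explode("\n", trim($content)));

foreach ($rows as $row) {
    print_r($row); // $row is an array of column values
}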
Code for downloading a file from the Google Drive API would look something like this.
The full sample can be found here: large-file-download.php
// If this is a POST, download the file
if ($_SERVER['REQUEST_METHOD'] == 'POST') {
// Determine the file's size and ID
$fileId = $files[0]->id;
$fileSize = intval($files[0]->size);
// Get the authorized Guzzle HTTP client
$http = $client->authorize();
// Open a file for writing
$fp = fopen('Big File (downloaded)', 'w');
// Download in 1 MB chunks
$chunkSizeBytes = 1 * 1024 * 1024;
$chunkStart = 0;
// Iterate over each chunk and write it to our file
while ($chunkStart < $fileSize) {
$chunkEnd = $chunkStart + $chunkSizeBytes;
$response = $http->request(
'GET',
sprintf('/drive/v3/files/%s', $fileId),
[
'query' => ['alt' => 'media'],
'headers' => [
'Range' => sprintf('bytes=%s-%s', $chunkStart, $chunkEnd)
]
]
);
$chunkStart = $chunkEnd + 1;
fwrite($fp, $response->getBody()->getContents());
}
// close the file pointer
fclose($fp);
I want to transfer an archive of around 10 GB to my Amazon S3 bucket, using a PHP script (it's a backup script).
I currently use the following code:
$uploader = new \Aws\S3\MultipartCopy($s3Client, $tmpFilesBackupDirectory, [
'Bucket' => 'MyBucketName',
'Key' => 'backup'.date('Y-m-d').'.tar.gz',
'StorageClass' => $storageClass,
'Tagging' => 'expiration='.$tagging,
'ServerSideEncryption' => 'AES256',
]);
try
{
$result = $uploader->copy();
echo "Upload complete: {$result['ObjectURL']}\n";
}
catch (Aws\Exception\MultipartUploadException $e)
{
echo $e->getMessage() . "\n";
}
My issue is that after a few minutes (say 10 minutes), I receive an error message from the Apache server: 504 Gateway Timeout.
I understand that this error is related to the configuration of my Apache server, but I don't want to increase my server's timeout.
My idea is to use the PHP SDK Low-Level API to do the following steps:
Use Aws\S3\S3Client::uploadPart() method in order to manually upload 5 parts, and store the response obtained in $_SESSION (I need the ETag values to complete the upload);
Reload the page using header('Location: xxx');
Repeat the first two steps for the next 5 parts, until all parts are uploaded;
Finalise the upload using Aws\S3\S3Client::completeMultipartUpload().
I suppose that this should work, but before using this method I'd like to know if there is an easier way to achieve my goal, for example by using the high-level API...
Any suggestions?
NOTE : I'm not searching for some existing script : my main goal is to learn how to fix this issue :)
Best regards,
Lionel
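For reference, the SDK v3 high-level Aws\S3\MultipartUploader, which takes a local file or stream as its source, can recover a failed upload from its UploadState instead of restarting it. The sketch below follows the retry pattern from the SDK documentation, reusing the bucket/key names from the question; it still runs within a single request, so on its own it does not avoid the Apache timeout:
use Aws\Exception\MultipartUploadException;
use Aws\S3\MultipartUploader;

$uploader = new MultipartUploader($s3Client, $tmpFilesBackupDirectory, [
    'bucket' => 'MyBucketName',
    'key' => 'backup'.date('Y-m-d').'.tar.gz',
]);

do {
    try {
        $result = $uploader->upload();
    } catch (MultipartUploadException $e) {
        // Re-create the uploader from the saved state; already-uploaded parts are kept
        $uploader = new MultipartUploader($s3Client, $tmpFilesBackupDirectory, [
            'state' => $e->getState(),
        ]);
    }
} while (!isset($result));

echo "Upload complete: {$result['ObjectURL']}\n";
Spreading the parts across several requests, as the accepted approach below does by persisting its progress in the session, is what actually works around the 504.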
Why not just use the AWS CLI to copy the file? You can create a script in the CLI and that way everything is AWS native. (Amazon has a tutorial on that.) You can use the scp command:
scp -i Amazonkey.pem /local/path/backupfile.tar.gz ec2-user@Elastic-IP-of-ec2-2:/path/backupfile.tar.gz
From my perspective, it would be easier to do the work within AWS, which has features to move files and data. If you'd like to use a shell script, this article on automating EC2 backups has a good one, plus more detail on backup options.
To answer my own question (I hope it might help someone one day!), here is how I fixed my issue, step by step:
1/ When I load my page, I check if the archive already exists. If not, I create my .tar.gz file and reload the page using header().
I noticed that this step was quite slow, since there is a lot of data to archive. That's why I reload the page, to avoid any timeout during the next steps!
2/ If the backup file exists, I use AWS MultipartUpload to send 10 chunks of 100 MB each. Every time a chunk is sent successfully, I update a session variable ($_SESSION['backup']['partNumber']) to know which chunk needs to be uploaded next.
Once my 10 chunks are sent, I reload the page again to avoid any timeout.
3/ I repeat the second step until all parts are uploaded, using my session variable to know which part needs to be sent next.
4/ Finally, I complete the multipart upload and delete the archive stored locally.
You can of course send more than 10 x 100 MB before reloading your page. I chose this value to be sure that I won't hit a timeout even if the transfer is slow, but I guess I could easily send around 5 GB each time without issue.
Note: you cannot redirect your script to itself too many times. There is a limit (I think it's around 20 redirects for Chrome and Firefox before you get an error, and more for IE). In my case (the archive is around 10 GB), transferring 1 GB per reload is fine (the page will be reloaded around 10 times). But if the archive size increases, I'll have to send more chunks each time.
Here is my full script. It could surely be improved, but it's working quite well for now and it may help someone with a similar issue!
public function backup()
{
ini_set('max_execution_time', '1800');
ini_set('memory_limit', '1024M');
require ROOT.'/Public/scripts/aws/aws-autoloader.php';
$s3Client = new \Aws\S3\S3Client([
'version' => 'latest',
'region' => 'eu-west-1',
'credentials' => [
'key' => '',
'secret' => '',
],
]);
$tmpDBBackupDirectory = ROOT.'Var/Backups/backup'.date('Y-m-d').'.sql.gz';
if(!file_exists($tmpDBBackupDirectory))
{
$this->cleanInterruptedMultipartUploads($s3Client);
$this->createSQLBackupFile();
$this->uploadSQLBackup($s3Client, $tmpDBBackupDirectory);
}
$tmpFilesBackupDirectory = ROOT.'Var/Backups/backup'.date('Y-m-d').'.tar.gz';
if(!isset($_SESSION['backup']['archiveReady']))
{
$this->createFTPBackupFile();
header('Location: '.CURRENT_URL);
}
$this->uploadFTPBackup($s3Client, $tmpFilesBackupDirectory);
unlink($tmpDBBackupDirectory);
unlink($tmpFilesBackupDirectory);
}
public function createSQLBackupFile()
{
// Backup DB
$tmpDBBackupDirectory = ROOT.'Var/Backups/backup'.date('Y-m-d').'.sql.gz';
if(!file_exists($tmpDBBackupDirectory))
{
$return_var = NULL;
$output = NULL;
$dbLogin = '';
$dbPassword = '';
$dbName = '';
$command = 'mysqldump -u '.$dbLogin.' -p'.$dbPassword.' '.$dbName.' --single-transaction --quick | gzip > '.$tmpDBBackupDirectory;
exec($command, $output, $return_var);
}
return $tmpDBBackupDirectory;
}
public function createFTPBackupFile()
{
// Compacting all files
$tmpFilesBackupDirectory = ROOT.'Var/Backups/backup'.date('Y-m-d').'.tar.gz';
$command = 'tar -czf '.$tmpFilesBackupDirectory.' '.ROOT;
exec($command);
$_SESSION['backup']['archiveReady'] = true;
return $tmpFilesBackupDirectory;
}
public function uploadSQLBackup($s3Client, $tmpDBBackupDirectory)
{
$result = $s3Client->putObject([
'Bucket' => '',
'Key' => 'backup'.date('Y-m-d').'.sql.gz',
'SourceFile' => $tmpDBBackupDirectory,
'StorageClass' => '',
'Tagging' => '',
'ServerSideEncryption' => 'AES256',
]);
}
public function uploadFTPBackup($s3Client, $tmpFilesBackupDirectory)
{
$storageClass = 'STANDARD_IA';
$bucket = '';
$key = 'backup'.date('Y-m-d').'.tar.gz';
$chunkSize = 100 * 1024 * 1024; // 100MB
$reloadFrequency = 10;
if(!isset($_SESSION['backup']['uploadId']))
{
$response = $s3Client->createMultipartUpload([
'Bucket' => $bucket,
'Key' => $key,
'StorageClass' => $storageClass,
'Tagging' => '',
'ServerSideEncryption' => 'AES256',
]);
$_SESSION['backup']['uploadId'] = $response['UploadId'];
$_SESSION['backup']['partNumber'] = 1;
}
$file = fopen($tmpFilesBackupDirectory, 'r');
$parts = array();
//Reading parts already uploaded
for($i = 1; $i < $_SESSION['backup']['partNumber']; $i++)
{
if(!feof($file))
{
fread($file, $chunkSize);
}
}
// Uploading next parts
while(!feof($file))
{
// Read the next chunk once so a retry re-sends the same bytes
$chunkData = fread($file, $chunkSize);
unset($result);
do
{
try
{
$result = $s3Client->uploadPart(array(
'Bucket' => $bucket,
'Key' => $key,
'UploadId' => $_SESSION['backup']['uploadId'],
'PartNumber' => $_SESSION['backup']['partNumber'],
'Body' => $chunkData,
));
}
catch (\Exception $e)
{
// Swallow the error and retry this part until uploadPart() succeeds
}
}
while (!isset($result));
$_SESSION['backup']['parts'][] = array(
'PartNumber' => $_SESSION['backup']['partNumber'],
'ETag' => $result['ETag'],
);
$_SESSION['backup']['partNumber']++;
if($_SESSION['backup']['partNumber'] % $reloadFrequency == 1)
{
header('Location: '.CURRENT_URL);
die;
}
}
fclose($file);
$result = $s3Client->completeMultipartUpload(array(
'Bucket' => $bucket,
'Key' => $key,
'UploadId' => $_SESSION['backup']['uploadId'],
'MultipartUpload' => Array(
'Parts' => $_SESSION['backup']['parts'],
),
));
$url = $result['Location'];
}
public function cleanInterruptedMultipartUploads($s3Client)
{
$tResults = $s3Client->listMultipartUploads(array('Bucket' => ''));
$tResults = $tResults->toArray();
if(isset($tResults['Uploads']))
{
foreach($tResults['Uploads'] AS $result)
{
$s3Client->abortMultipartUpload(array(
'Bucket' => '',
'Key' => $result['Key'],
'UploadId' => $result['UploadId']));
}
}
if(isset($_SESSION['backup']))
{
unset($_SESSION['backup']);
}
}
If someone has questions don't hesitate to contact me :)
My S3 bucket contains .gz objects with JSON inside. I simply want to access this JSON without actually downloading the objects to a file.
$iterator = $client->getIterator('ListObjects', array(
'Bucket' => $bucket
));
foreach ($iterator as $object) {
$object = $object['Key'];
$result = $client->getObject(array(
'Bucket' => $bucket,
'Key' => $object
));
echo $result['Body'] . "\n";
}
When I run the above in the shell it outputs gibberish on the echo line. What's the correct way to simply retrieve the contents of the .gz object and save to a variable?
Thank you
You can use the stream wrapper like this (the gzip payload is decoded once the whole object has been read, since zlib_decode cannot handle arbitrary 1024-byte slices on their own):
$client->registerStreamWrapper();
if ($stream = fopen('s3://bucket/key.gz', 'r')) {
    $compressed = '';
    // While the stream is still open
    while (!feof($stream)) {
        // Read 1024 bytes from the stream
        $compressed .= fread($stream, 1024);
    }
    // Be sure to close the stream resource when you're done with it
    fclose($stream);
    // Decode the complete gzip payload once all bytes have been read
    echo zlib_decode($compressed);
}
If you are sending it to a browser you don't need to zlib_decode it; just set a header and pass the compressed bytes through as-is:
header('Content-Encoding: gzip');
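For example, a minimal sketch of streaming the compressed object straight to the browser (the Content-Type is an assumption about the JSON payload):
$client->registerStreamWrapper();

if ($stream = fopen('s3://bucket/key.gz', 'r')) {
    header('Content-Type: application/json');
    header('Content-Encoding: gzip');

    // Pass the still-compressed bytes through untouched; the browser inflates them
    fpassthru($stream);
    fclose($stream);
}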
I'm using the API Client Library for PHP (Beta) to work with the Google Drive API. So far I can authorize and upload a file in chunks.
According to the documentation, these three steps should be taken to upload a file:
Start a resumable session.
Save the resumable session URI.
Upload the file.
Which I think the Client Library handles.
Again, according to the documentation, if I want to show the progress or resume an interrupted upload, or to handle errors I need to capture the response and also be able to send requests like this:
PUT {session_uri} HTTP/1.1
Content-Length: 0
Content-Range: bytes */2000000
But I have no idea how I should make such a request or where I can get the response from; the PHP code I'm using to upload, like any other PHP code, only returns values when it is done executing, which is when the upload is done.
Here is the function I'm using to upload files (resumable):
function uploadFile($service,$client,$filetoUpload,$parentId){
$file = new Google_Service_Drive_DriveFile();
$file->title = $filetoUpload['name'];
$chunkSizeBytes = 1 * 1024 * 1024;
// Set the parent folder.
if ($parentId != null) {
$parent = new Google_Service_Drive_ParentReference();
$parent->setId($parentId);
$file->setParents(array($parent));
}
// Call the API with the media upload, defer so it doesn't immediately return.
$client->setDefer(true);
$request = $service->files->insert($file);
// Create a media file upload to represent our upload process.
$media = new Google_Http_MediaFileUpload(
$client,
$request,
$filetoUpload['type'],
null,
true,
$chunkSizeBytes
);
$media->setFileSize(filesize($filetoUpload['tmp_name']));
// Upload the various chunks. $status will be false until the process is
// complete.
$status = false;
$handle = fopen($filetoUpload['tmp_name'], "rb");
while (!$status && !feof($handle)) {
set_time_limit(120);
$chunk = fread($handle, $chunkSizeBytes);
$status = $media->nextChunk($chunk);
}
// The final value of $status will be the data from the API for the object
// that has been uploaded.
$result = false;
if($status != false) {
$result = $status;
set_time_limit(30);
echo "<pre>";
print_r($result);
}
fclose($handle);
// Reset to the client to execute requests immediately in the future.
$client->setDefer(false);
}
Should I make a separate PHP file to handle these requests?
If so, how should I tell it which file's status I'm requesting?
Thanks.
Apparently, the beta client library simply doesn't support resuming uploads.
Please see this issue on GitHub, which asks for this to be fixed. Of course, it should be easy to modify the class (see below) and create a pull request to enable support for resuming existing uploads when the session URL is supplied.
However, there's an easy way to get the progress after a chunk has been uploaded.
The Google_Http_MediaFileUpload object ($media in your example) has a public method called getProgress which can be called at any time.
(Please have a look at the source code of the API client library.)
To get the upload status, I'd add a parameter that controls the progress precision by adjusting the chunk size. Since more chunks mean more protocol overhead, avoid setting the precision finer than you actually need.
Therefore, you could modify your source code as below to output the progress after each chunk:
function uploadFile($service,$client,$filetoUpload,$parentId,$progressPrecision = 1){
$file = new Google_Service_Drive_DriveFile();
$file->title = $filetoUpload['name'];
$filesize = filesize($filetoUpload['tmp_name']);
//minimum chunk size needs to be 256K
$chunkSizeBytes = max( $filesize / 100 * $progressPrecision, 262144);
// Set the parent folder.
if ($parentId != null) {
$parent = new Google_Service_Drive_ParentReference();
$parent->setId($parentId);
$file->setParents(array($parent));
}
// Call the API with the media upload, defer so it doesn't immediately return.
$client->setDefer(true);
$request = $service->files->insert($file);
// Create a media file upload to represent our upload process.
$media = new Google_Http_MediaFileUpload(
$client,
$request,
$filetoUpload['type'],
null,
true,
$chunkSizeBytes
);
$media->setFileSize($filesize);
…
while (!$status && !feof($handle)) {
set_time_limit(120);
$chunk = fread($handle, $chunkSizeBytes);
$status = $media->nextChunk($chunk);
if(!$status){ //nextChunk() returns 'false' whenever the upload is still in progress
echo 'successfully uploaded file up to byte ' . $media->getProgress() .
' which is ' . round( $media->getProgress() / $filesize * 100 ) . '% of the whole file';
}
}
Hope this helps. I'll see if I can find some time to add resume support to the client library.
EDIT: According to this doc, the chunks need to be at least 256 KB big. Changed in the code.
EDIT 2: I just added a pull request to add the resume feature. If it gets rejected, you could still decide whether it would be OK for you to modify/extend the client. If it gets accepted, just store the return value of $media->getResumeUri() in a database, and later call $media->resume($previously_stored_return_value) after instantiation to resume the process.
On the first instance, make an upload call and get the resumeUri:
$chunk = fread($handle, 1*1024*1024);
$result = $media->nextChunk($chunk);
$resumeUri = $media->getResumeUri();
$offset = ftell($handle);
On the second instance, use the same resumeUri to resume the upload from where we left off by calling the resume() function before the nextChunk() function:
if ($resumeUri) {
$media->resume($resumeUri);
}
fseek($handle, $offset);
$chunk = fread($handle, 1*1024*1024);
$result = $media->nextChunk($chunk);
Repeat the process until the $result variable has a truthy value.
I have a problem with the Amazon S3 service and a web service written in PHP.
This web service receives a base64-encoded file via $_POST. I need to take this string and save it to an Amazon S3 bucket.
I haven't found the right solution to do that, and after a week of work I'm looking for help here.
//$file = 'kdK9IWUAAAAdaVRYdENvbW1lbnQAAAAAAENyZWF0ZWQgd'
$file = $_POST['some_file'];
$opt = array(
'fileUpload' => base64_decode($file),
'acl' => AmazonS3::ACL_PUBLIC
);
$s3 = new AmazonS3(AWS_KEY, AWS_SECRET_KEY);
$response = $s3->create_object($bucket, $filename, $opt);
Thanks
Per the docs, fileUpload expects a URL or path, or an fopen resource.
fileUpload - string|resource - Required; Conditional - The URL/path for the file to upload, or an open resource. Either this parameter or body is required.
You should pass the decoded file data via the body parameter instead:
body - string - Required; Conditional - The data to be stored in the object. Either this parameter or fileUpload must be specified.
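Applied to the snippet above, that would look something like this (a sketch using the same SDK 1.x AmazonS3 client, $bucket and $filename as in the question):
$file = $_POST['some_file'];

$s3 = new AmazonS3(AWS_KEY, AWS_SECRET_KEY);
$response = $s3->create_object($bucket, $filename, array(
    // Pass the decoded bytes directly instead of a file path
    'body' => base64_decode($file),
    'acl' => AmazonS3::ACL_PUBLIC,
));

if (!$response->isOK()) {
    // Handle the failed upload here
}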
The imagestring is the base64-encoded string that was passed in the POST data. You can decode this image and write the file anywhere (in my case /photos). I used this on a school server, but if the Amazon server is similar enough, it could also work there.
$values = array();
foreach($_POST as $k => $v){
$values[$k] = $v;
}
$imagestring = $values['stringEncodedImage'];
$img = imagecreatefromstring(base64_decode($imagestring));
if($img != false)
{
echo 'making the file!';
imagejpeg($img, 'photos/'.$file_name);
}
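If the decoded file ultimately needs to land in S3 rather than stay on the web server, the path written above could then be handed to the fileUpload option described in the previous answer. A sketch reusing the question's SDK 1.x client ($bucket and $file_name are placeholders):
$localPath = 'photos/'.$file_name;
imagejpeg($img, $localPath);

$s3 = new AmazonS3(AWS_KEY, AWS_SECRET_KEY);
$response = $s3->create_object($bucket, $file_name, array(
    'fileUpload' => $localPath, // a path on disk, as the docs quoted above expect
    'acl' => AmazonS3::ACL_PUBLIC,
));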