How to upload large files (around 10GB) - php

I want to transfer an archive of around 10GB to my Amazon S3 bucket, using a PHP script (it's a backup script).
I currently use the following code:
$uploader = new \Aws\S3\MultipartCopy($s3Client, $tmpFilesBackupDirectory, [
    'Bucket' => 'MyBucketName',
    'Key' => 'backup'.date('Y-m-d').'.tar.gz',
    'StorageClass' => $storageClass,
    'Tagging' => 'expiration='.$tagging,
    'ServerSideEncryption' => 'AES256',
]);
try
{
    $result = $uploader->copy();
    echo "Upload complete: {$result['ObjectURL']}\n";
}
catch (Aws\Exception\MultipartUploadException $e)
{
    echo $e->getMessage() . "\n";
}
My issue is that after a few minutes (say 10), I receive an error from the Apache server: 504 Gateway Timeout.
I understand that this error is related to my Apache server's configuration, but I don't want to increase my server's timeout.
My idea is to use the PHP SDK Low-Level API to do the following steps:
Use the Aws\S3\S3Client::uploadPart() method to manually upload 5 parts, and store the responses in $_SESSION (I need the ETag values to complete the upload);
Reload the page using header('Location: xxx');
Perform the first two steps again for the next 5 parts, until all parts are uploaded;
Finalise the upload using Aws\S3\S3Client::completeMultipartUpload().
I suppose this should work, but before using this method I'd like to know if there is an easier way to achieve my goal, for example by using the high-level API...
Any suggestions?
NOTE: I'm not looking for an existing script; my main goal is to learn how to fix this issue :)
Best regards,
Lionel

Why not just use the AWS CLI to copy the file? You can script it with the CLI so that everything stays AWS native (Amazon has a tutorial on that). If your destination is an EC2 instance rather than S3, you can use the scp command:
scp -i Amazonkey.pem /local/path/backupfile.tar.gz ec2-user@Elastic-IP-of-ec2-2:/path/backupfile.tar.gz
From my perspective, it would be easier to do the work within AWS, which has features for moving files and data. If you'd like to use a shell script, this article on automating EC2 backups has a good one, plus more detail on backup options.
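For reference, if the target really is an S3 bucket, the CLI equivalent would be something along these lines (bucket name and paths are placeholders); the CLI performs multipart uploads for large files automatically:
aws s3 cp /local/path/backupfile.tar.gz s3://MyBucketName/backupfile.tar.gz --storage-class STANDARD_IA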

To answer my own question (I hope it might help someone one day!), here is how I fixed my issue, step by step:
1/ When I load my page, I check whether the archive already exists. If not, I create my .tar.gz file and reload the page using header().
I noticed that this step was quite slow since there is a lot of data to archive. That's why I reload the page: to avoid any timeout during the next steps!
2/ If the backup file exists, I use AWS MultipartUpload to send 10 chunks of 100MB each. Every time a chunk is sent successfully, I update a session variable ($_SESSION['backup']['partNumber']) to know which chunk needs to be uploaded next.
Once my 10 chunks are sent, I reload the page again to avoid any timeout.
3/ I repeat the second step until the upload of all parts is done, using my session variable to know which part of the upload needs to be sent next.
4/ Finally, I complete the multipart upload and I delete the archive stored locally.
You can of course send more than 10 x 100MB before reloading the page. I chose this value to be sure I won't hit a timeout even if the upload is slow, but I guess I could easily send around 5GB each time without issue.
Note: you cannot redirect your script to itself too many times. There is a limit (I think it's around 20 consecutive redirects for Chrome and Firefox before you get an error, and more for IE). In my case (the archive is around 10GB), transferring 1GB per reload is fine (the page will be reloaded around 10 times). But if the archive size increases, I'll have to send more chunks each time.
Here is my full script. It could surely be improved, but it's working quite well for now and it may help someone with a similar issue!
public function backup()
{
    ini_set('max_execution_time', '1800');
    ini_set('memory_limit', '1024M');
    require ROOT.'/Public/scripts/aws/aws-autoloader.php';

    $s3Client = new \Aws\S3\S3Client([
        'version' => 'latest',
        'region' => 'eu-west-1',
        'credentials' => [
            'key' => '',
            'secret' => '',
        ],
    ]);

    $tmpDBBackupDirectory = ROOT.'Var/Backups/backup'.date('Y-m-d').'.sql.gz';
    if(!file_exists($tmpDBBackupDirectory))
    {
        $this->cleanInterruptedMultipartUploads($s3Client);
        $this->createSQLBackupFile();
        $this->uploadSQLBackup($s3Client, $tmpDBBackupDirectory);
    }

    $tmpFilesBackupDirectory = ROOT.'Var/Backups/backup'.date('Y-m-d').'.tar.gz';
    if(!isset($_SESSION['backup']['archiveReady']))
    {
        $this->createFTPBackupFile();
        header('Location: '.CURRENT_URL);
        exit; // stop here so the upload starts in the next (fresh) request
    }
    $this->uploadFTPBackup($s3Client, $tmpFilesBackupDirectory);

    unlink($tmpDBBackupDirectory);
    unlink($tmpFilesBackupDirectory);
}
public function createSQLBackupFile()
{
    // Backup DB
    $tmpDBBackupDirectory = ROOT.'Var/Backups/backup'.date('Y-m-d').'.sql.gz';
    if(!file_exists($tmpDBBackupDirectory))
    {
        $return_var = NULL;
        $output = NULL;
        $dbLogin = '';
        $dbPassword = '';
        $dbName = '';
        $command = 'mysqldump -u '.$dbLogin.' -p'.$dbPassword.' '.$dbName.' --single-transaction --quick | gzip > '.$tmpDBBackupDirectory;
        exec($command, $output, $return_var);
    }
    return $tmpDBBackupDirectory;
}

public function createFTPBackupFile()
{
    // Compacting all files (-z added so the archive is really gzipped, matching the .tar.gz name)
    $tmpFilesBackupDirectory = ROOT.'Var/Backups/backup'.date('Y-m-d').'.tar.gz';
    $command = 'tar -czf '.$tmpFilesBackupDirectory.' '.ROOT;
    exec($command);
    $_SESSION['backup']['archiveReady'] = true;
    return $tmpFilesBackupDirectory;
}
public function uploadSQLBackup($s3Client, $tmpDBBackupDirectory)
{
    $result = $s3Client->putObject([
        'Bucket' => '',
        'Key' => 'backup'.date('Y-m-d').'.sql.gz',
        'SourceFile' => $tmpDBBackupDirectory,
        'StorageClass' => '',
        'Tagging' => '',
        'ServerSideEncryption' => 'AES256',
    ]);
}
public function uploadFTPBackup($s3Client, $tmpFilesBackupDirectory)
{
    $storageClass = 'STANDARD_IA';
    $bucket = '';
    $key = 'backup'.date('Y-m-d').'.tar.gz';
    $chunkSize = 100 * 1024 * 1024; // 100MB
    $reloadFrequency = 10;

    if(!isset($_SESSION['backup']['uploadId']))
    {
        $response = $s3Client->createMultipartUpload([
            'Bucket' => $bucket,
            'Key' => $key,
            'StorageClass' => $storageClass,
            'Tagging' => '',
            'ServerSideEncryption' => 'AES256',
        ]);
        $_SESSION['backup']['uploadId'] = $response['UploadId'];
        $_SESSION['backup']['partNumber'] = 1;
    }

    $file = fopen($tmpFilesBackupDirectory, 'r');
    $parts = array();

    // Skipping over the parts already uploaded in previous requests
    for($i = 1; $i < $_SESSION['backup']['partNumber']; $i++)
    {
        if(!feof($file))
        {
            fread($file, $chunkSize);
        }
    }

    // Uploading next parts
    while(!feof($file))
    {
        // Read the chunk once, outside the retry loop, so a retry re-sends the same data
        $body = fread($file, $chunkSize);
        unset($result);
        do
        {
            try
            {
                $result = $s3Client->uploadPart(array(
                    'Bucket' => $bucket,
                    'Key' => $key,
                    'UploadId' => $_SESSION['backup']['uploadId'],
                    'PartNumber' => $_SESSION['backup']['partNumber'],
                    'Body' => $body,
                ));
            }
            catch (\Aws\S3\Exception\S3Exception $e)
            {
                // Part upload failed: retry the same chunk until it succeeds
            }
        }
        while (!isset($result));

        $_SESSION['backup']['parts'][] = array(
            'PartNumber' => $_SESSION['backup']['partNumber'],
            'ETag' => $result['ETag'],
        );
        $_SESSION['backup']['partNumber']++;

        if($_SESSION['backup']['partNumber'] % $reloadFrequency == 1)
        {
            header('Location: '.CURRENT_URL);
            die;
        }
    }
    fclose($file);

    $result = $s3Client->completeMultipartUpload(array(
        'Bucket' => $bucket,
        'Key' => $key,
        'UploadId' => $_SESSION['backup']['uploadId'],
        'MultipartUpload' => array(
            'Parts' => $_SESSION['backup']['parts'],
        ),
    ));
    $url = $result['Location'];
}
public function cleanInterruptedMultipartUploads($s3Client)
{
    $tResults = $s3Client->listMultipartUploads(array('Bucket' => ''));
    $tResults = $tResults->toArray();
    if(isset($tResults['Uploads']))
    {
        foreach($tResults['Uploads'] AS $result)
        {
            $s3Client->abortMultipartUpload(array(
                'Bucket' => '',
                'Key' => $result['Key'],
                'UploadId' => $result['UploadId']
            ));
        }
    }
    if(isset($_SESSION['backup']))
    {
        unset($_SESSION['backup']);
    }
}
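As a side note on the original question about the high-level API: the SDK's MultipartUploader can also resume an interrupted upload from a saved state, which removes most of the manual part bookkeeping. A minimal sketch based on the AWS SDK for PHP v3 documentation (bucket and key are placeholders; persisting the state across page reloads, for example in the session, is something I haven't tested):
$source = $tmpFilesBackupDirectory;
$uploader = new \Aws\S3\MultipartUploader($s3Client, $source, [
    'bucket' => 'MyBucketName',
    'key'    => 'backup'.date('Y-m-d').'.tar.gz',
]);
do
{
    try
    {
        $result = $uploader->upload();
    }
    catch (\Aws\Exception\MultipartUploadException $e)
    {
        // Recreate the uploader from the state of the failed upload and retry
        $uploader = new \Aws\S3\MultipartUploader($s3Client, $source, [
            'state' => $e->getState(),
        ]);
    }
}
while (!isset($result));
echo "Upload complete: {$result['ObjectURL']}\n";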
If anyone has questions, don't hesitate to contact me :)

Related

update_post_meta and update_field not working without any error received

I am working with a WordPress site that imports all data from an API into the site automatically via a cron job. However, I'm now at the part where I save the field data from the API. The problem is that update_post_meta and update_field are both not working.
I already tried interchanging the two methods of saving, but neither works. There are no error prompts and no results either (which is pretty weird to me). I checked the site's built-in plugin and it uses update_post_meta.
add_action('wp_ajax_nopriv_get_properties_from_api', 'get_properties_from_api');
add_action('wp_ajax_get_properties_from_api', 'get_properties_from_api');

function get_properties_from_api(){
    $file = get_stylesheet_directory() . '/report.txt';
    $current_page = (!empty($_POST['current_page'])) ? $_POST['current_page'] : 1;
    $properties = [];
    $results = wp_remote_retrieve_body(wp_remote_get('https://www.realestateview.com.au/listing_api?rm=search&company=castlemain&code=29GKRRSgkVdQVM&CID=5813&json=1&ptr=r&con=S&portalview=residential&rn=1&pg='. $current_page));
    file_put_contents($file, "Current page: ". $current_page. "\n\n", FILE_APPEND);
    $results = json_decode($results, true);
    if(!is_array($results['Listings']) || empty($results['Listings'])){
        return false;
    }
    $properties[] = $results;
    foreach($properties[0] as $property){
        $property_slug = sanitize_title($property->TitleNoHTML, '-', $property->OrderID);
        $inserted_property = wp_insert_post([
            'post_name' => $property_slug,
            'post_title' => $property->TitleNoHTML,
            'post_type' => 'property',
            'post_status' => 'publish',
        ]);
        if(is_wp_error($inserted_property)){
            continue;
        }
        $fillable = [
            //Basic information
            get_the_title($inserted_property) => 'TitleNoHTML',
            'REAL_HOMES_property_price' => 'PriceText',
            'REAL_HOMES_property_size' => 'LandSizeText',
            'REAL_HOMES_property_bedrooms' => 'BedroomsCount',
            'REAL_HOMES_property_bathrooms' => 'BathroomsCount',
            'REAL_HOMES_property_garage' => 'LockUpGaragesCount',
            'REAL_HOMES_featured' => 'FeaturedProperty',
            //$this->REAL_HOMES_property_id =
            //$this->REAL_HOMES_property_year_built =
            //Location on Map
            'REAL_HOMES_property_address' => 'AddressText',
            'REAL_HOMES_property_location' => 'Suburb',
            'REAL_HOMES_property_map' => 'DisplayTrueAddress',
            //Gallery
            'REAL_HOMES_property_images' => 'PhotoOriginalURL',
            //Floor Plans
            //$this->inspiry_floor_plan_name =
            'inspiry_floor_plan_price' => 'PriceText',
            //$this->inspiry_floor_plan_price_postfix =
            //$this->inspiry_floor_plan_size =
            //$this->inspiry_floor_plan_size_postfix =
            'inspiry_floor_plan_bedrooms' => 'BedroomsCount',
            'inspiry_floor_plan_bathrooms' => 'BathroomsCount',
            //$this->inspiry_floor_plan_descr =
            'inspiry_floor_plan_image' => 'FloorplanThumbURL',
            //Property Video
            'inspiry_video_group_image' => 'PhotoThumbURL',
            //$this->inspiry_video_group_title =
            'inspiry_video_group_url' => 'VideoURL',
            //DEPRECATED FIELDS
            //$this->REAL_HOMES_360_virtual_tour =
            //$this->REAL_HOMES_tour_video_url_divider =
            //$this->REAL_HOMES_tour_video_url =
            //$this->REAL_HOMES_tour_video_image =
            //Agent
            //$this->REAL_HOMES_agent_display_option =
            'REAL_HOMES_agents' => 'ContactAgentName',
            //Energy Performance
            //$this->REAL_HOMES_energy_class =
            //$this->REAL_HOMES_energy_performance =
            //$this->REAL_HOMES_epc_current_rating =
            //$this->REAL_HOMES_epc_potential_rating =
            //Misc
            //$this->REAL_HOMES_sticky =
            //$this->inspiry_property_label =
            //$this->inspiry_property_label_color =
            //$this->REAL_HOMES_attachments =
            'inspiry_property_owner_name' => 'ClientName',
            //$this->inspiry_property_owner_contact =
            'inspiry_property_owner_address' => 'ClientAddress',
            //$this->REAL_HOMES_property_private_note =
            //$this->inspiry_message_to_reviewer =
            //Homepage slider
            //$this->REAL_HOMES_add_in_slider =
            //$this->REAL_HOMES_slider_image =
            //$this->REAL_HOMES_page_banner_image =
            //Additional fields
            'inspiry_InspectionDateandStartTime' => 'ISOInspectionStart',
            'inspiry_InspectionDateandFinishTime' => 'ISOInspectionFinish',
        ];
        foreach($fillable as $key => $TitleNoHTML){
            update_post_meta($inserted_property, $key, $property->$TitleNoHTML);
        }
    }
    $current_page = $current_page + 1;
    wp_remote_post(admin_url('admin-ajax.php?action=get_properties_from_api'), [
        'blocking' => false,
        'sslverify' => false,
        'body' => [
            'current_page' => $current_page
        ]
    ]);
}
What I'm expecting is that it should already be saving some data; if not, it should produce an error, but for some weird reason there isn't one. I tried to var_dump some variables and I think it should be working. Would anyone be able to help me find out where I went wrong?
In order to replace the WordPress cron with a real cron job, you will need to set up a cron job that fetches data from a web page using wget. First create a WordPress page that contains your PHP code, then fetch that page's content from cron using wget.
The real cron job command will look like this:
wget -q -O - http://yourdomain.com/your_cron_page >/dev/null 2>&1
-q tells wget to operate quietly (i.e. to not output the usual status information)
-O - tells it to write the downloaded document to standard output, which the >/dev/null redirect then discards
Keep in mind that anyone can access this page, so you might want to set some restrictions.
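For example, a crontab entry that hits the page every hour could look like this (the URL and token are placeholders; checking a secret token in the page's PHP code is one possible way to implement the restriction mentioned above):
0 * * * * wget -q -O - "http://yourdomain.com/your_cron_page?token=SOME_LONG_SECRET" >/dev/null 2>&1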

phpleague flysystem read and write to large file on server

I am using flysystem with the IRON IO queue and I am attempting to run a DB query that returns ~1.8 million records, processing 5000 at a time. Here is the error message I receive once the file grows to 50+ MB:
PHP Fatal error: Allowed memory size of ########## bytes exhausted
Here are the steps I would like to take:
1) Get the data
2) Turn it into a CSV appropriate string (i.e. implode(',', $dataArray) . "\r\n")
3) Get the file from the server (in this case S3)
4) Read that file's contents, append the new string to it, and re-write that content to the S3 file
Here is a brief rundown of the code I have:
public function fire($job, $data)
{
    // First set the headers and write the initial file to server
    $this->filesystem->write($this->filename, implode(',', $this->setHeaders($parameters)) . "\r\n", [
        'visibility' => 'public',
        'mimetype' => 'text/csv',
    ]);

    // Loop to get new sets of data
    $offset = 0;
    while ($this->exportResult) {
        $this->exportResult = $this->getData($parameters, $offset);
        if ($this->exportResult) {
            $this->writeToFile($this->exportResult);
            $offset += 5000;
        }
    }
}

private function writeToFile($contentToBeAdded = '')
{
    $content = $this->filesystem->read($this->filename);
    // Append new data
    $content .= $contentToBeAdded;
    $this->filesystem->update($this->filename, $content, [
        'visibility' => 'public'
    ]);
}
I'm assuming this is NOT the most efficient? I am going off of these docs:
PHPLeague Flysystem
If anyone can point me in a more appropriate direction, that would be awesome!
Flysystem supports reading, writing and updating files via streams.
Please check the latest API: https://flysystem.thephpleague.com/api/
$stream = fopen('/path/to/database.backup', 'r+');
$filesystem->writeStream('backups/'.strftime('%G-%m-%d').'.backup', $stream);

// Using writeStream you can also directly set the visibility
$filesystem->writeStream('backups/'.strftime('%G-%m-%d').'.backup', $stream, [
    'visibility' => AdapterInterface::VISIBILITY_PRIVATE
]);

if (is_resource($stream)) {
    fclose($stream);
}

// Or update a file with stream contents
$filesystem->updateStream('backups/'.strftime('%G-%m-%d').'.backup', $stream);

// Retrieve a read-stream
$stream = $filesystem->readStream('something/is/here.ext');
$contents = stream_get_contents($stream);
fclose($stream);

// Create or overwrite using a stream.
$putStream = tmpfile();
fwrite($putStream, $contents);
rewind($putStream);
$filesystem->putStream('somewhere/here.txt', $putStream);

if (is_resource($putStream)) {
    fclose($putStream);
}
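Applied to the question's fire()/writeToFile() pattern, here is a rough, untested sketch of the same idea (it reuses the question's names and assumes getData() returns each CSV chunk as a string, as in the original loop): build the CSV in a local temporary stream and push it to S3 once at the end, instead of re-reading and re-writing the whole remote file on every batch.
public function fire($job, $data)
{
    // Build the CSV in a local temp stream instead of on S3
    $handle = tmpfile();
    fwrite($handle, implode(',', $this->setHeaders($parameters)) . "\r\n");

    $offset = 0;
    $this->exportResult = true;
    while ($this->exportResult) {
        $this->exportResult = $this->getData($parameters, $offset);
        if ($this->exportResult) {
            fwrite($handle, $this->exportResult);
            $offset += 5000;
        }
    }

    // Upload the finished file to S3 in one go, as a stream
    rewind($handle);
    $this->filesystem->putStream($this->filename, $handle, [
        'visibility' => 'public',
        'mimetype' => 'text/csv',
    ]);

    if (is_resource($handle)) {
        fclose($handle);
    }
}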
If you are working with S3, I would use the AWS SDK for PHP directly to solve this particular problem. Appending to a file is actually very easy using the SDK's S3 stream wrapper, and it doesn't force you to read the entire file into memory.
$s3 = \Aws\S3\S3Client::factory($clientConfig);
$s3->registerStreamWrapper();
$appendHandle = fopen("s3://{$bucket}/{$key}", 'a');
fwrite($appendHandle, $data);
fclose($appendHandle);

S3 - need to download particular images from aws and download them in zip

I need to download images from an AWS S3 bucket into a local directory and offer them as a zip download.
I have tried the code below, but can't figure out how to copy the images into my local directory.
Here is my function:
public function commomUpload($contentId, $prop_id)
{
    $client = S3Client::factory(array(
        'key' => 'my key',
        'secret' => '----secret-----',
    ));
    $documentFolderName = 'cms_photos';
    $docbucket = "propertiesphotos/$prop_id/$documentFolderName";
    $data = $this->Photo->find('all', array('conditions' => array('Photo.content_id' => $contentId)));
    //pr($data);die;
    $split_point = '/';
    foreach($data as $row){
        $string = $row['Photo']['aws_link'];
        $result = array_map('strrev', explode($split_point, strrev($string)));
        $imageName = $result[0];
        $result = $client->getObject(array(
            'Bucket' => $docbucket,
            'Key' => $imageName
        ));
        $uploads_dir = '/img/uploads/';
        if (!copy($result, $uploads_dir)) {
            echo "failed to copy $result...\n";
        }
        //move_uploaded_file($imageName, "$uploads_dir/$imageName");
    }
}
Don't complicate things. There is a very simple tool called s3cmd; install it on any platform (click here to learn more). Once you have downloaded the images from S3, you can either gzip or zip them using a simple bash script. Don't forget to configure s3cmd: you need your AWS access key and secret key.
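A rough sketch of that approach (bucket, prefix and local paths are placeholders):
s3cmd sync s3://propertiesphotos/PROP_ID/cms_photos/ ./img/uploads/
zip -r photos.zip ./img/uploads/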
You can also use the CakeS3 plugin for CakePHP: https://github.com/fullybaked/CakeS3

Set cache time in symfony

I am using the file cache in symfony to store my data for a limited time. Below is the code I have written.
$c = new sfFileCache(array('cache_dir' => sfConfig::get('sf_cache_dir').'/function'));
if ($c->has('myarray')) {
    $cached = $c->get('myarray');
    if (!empty($cached)) {
        $data = unserialize($cached);
    }
} else {
    foreach($queries as $key => $query) {
        foreach ($query->fetchArray() as $result) {
            $data[] = $result;
        }
    }
    $c->set('myarray', serialize($data));
}
Can anybody tell me how to set a time limit for the file cache in symfony, so that the cache is automatically destroyed after an hour?
Simply:
$c = new sfFileCache(array(
    'cache_dir' => sfConfig::get('sf_cache_dir').'/function',
    'lifetime' => 3600
));
See the code to learn about the other options from sfCache.
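If I'm not mistaken, sfCache::set() also accepts a lifetime as its third argument, so you could override the default per entry (untested sketch):
$c->set('myarray', serialize($data), 3600); // keep this entry for one hour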
Just sharing my code for those using APC; it should be much the same. I just passed the prefix "query" since I was caching a query.
$cache = new sfAPCCache(array('lifetime' => 600, 'prefix' => 'query'));

Python script can retrieve value from memcache, but PHP script gets empty result

I run a Python script which caches some data:
self.cache.set('test', 'my sample data', 300)
data = self.cache.get('test')
self.p(data)
This program prints 'my sample data'... everything is good. But when I try to access this key from PHP:
$data = $this->cache->get('test');
print_r($test);
I only get an empty result.
So I checked the server stats:
$list = array();
$allSlabs = $this->cache->getExtendedStats('slabs');
$items = $this->cache->getExtendedStats('items');
foreach($allSlabs as $server => $slabs) {
    foreach($slabs as $slabId => $slabMeta) {
        $cdump = $this->cache->getExtendedStats('cachedump', (int)$slabId);
        foreach($cdump as $server => $entries) {
            if($entries) {
                foreach($entries as $eName => $eData) {
                    $list[$eName] = array(
                        'key' => $eName,
                        'server' => $server,
                        'slabId' => $slabId,
                        'detail' => $eData,
                        'age' => $items[$server]['items'][$slabId]['age'],
                    );
                }
            }
        }
    }
}
ksort($list);
print_r($list);
The key 'test' is there... but I cannot access it.
If I cache something in PHP I get the result every time, but somehow this Python + PHP combination won't work.
If someone has an idea of where the problem could be, please advise... I've tried everything.
Could it be that the hashes don't match between PHP and Python? A solution is here: http://www.ruturaj.net/python-php-memcache-hash
Add the following to your Python script to change how hashes are calculated...
import memcache
import binascii

m = memcache.Client(['192.168.28.7:11211', '192.168.28.8:11211', '192.168.28.9:11211'])

def php_hash(key):
    return (binascii.crc32(key) >> 16) & 0x7fff

for i in range(30):
    key = 'key' + str(i)
    a = m.get((php_hash(key), key))
    print i, a
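For completeness, if the PHP side uses the pecl memcache extension, its key-to-server mapping is controlled by php.ini settings; the php_hash() above mimics the crc32-based "standard" strategy, so it's worth checking that your PHP configuration matches, e.g.:
memcache.hash_strategy = standard
memcache.hash_function = crc32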
