My EC2 servers are currently hosting a website that logs each registered user's activity under their own separate log file on the local EC2 instance, say username.log. I'm trying to figure out a way to push log events for these to CloudWatch using the PHP SDK without slowing the application down, AND while still being able to maintain a separate log file for each registered member of my website.
I can't for the life of me figure this out:
OPTION 1: How can I log to CloudWatch asynchronously using the CloudWatch SDK? My PHP application is behaving VERY sluggishly, since each log line takes roughly 100ms to push directly to CloudWatch. Code sample is below.
OPTION 2: Alternatively, how could I configure an installed CloudWatch agent on EC2 to simply OBSERVE all of my log files, which would essentially upload them asynchronously to CloudWatch for me in a separate process? The CloudWatch EC2 logging agent requires a static "configuration file" (AWS documentation) on your server which, to my knowledge, needs to list out all of your log files ("log streams") in advance, which I won't be able to predict at the time of server startup. Is there any way around this (i.e., simply observe ALL log files in a directory)? A config file sample is below.
All ideas are welcome here, but I don't want my solution to simply be "throw all your logs into a single file, so that your log names are always predictable".
Thanks in advance!!!
OPTION 1: Logging via SDK (takes ~100ms / logEvent):
// Configuration to use for the CloudWatch client
$sharedConfig = [
    'region' => 'us-east-1',
    'version' => 'latest',
    'http' => [
        'verify' => false
    ]
];

// Create a CloudWatch Logs client
$cwClient = new Aws\CloudWatchLogs\CloudWatchLogsClient($sharedConfig);

// DESCRIBE ANY EXISTING LOG STREAMS / FILES
$create_new_stream = true;
$next_sequence_id = "0";
$result = $cwClient->describeLogStreams([
    'descending' => true,
    'logGroupName' => 'user_logs',
    'logStreamNamePrefix' => $stream,
]);

// Iterate through the results, looking for a stream that already exists with the intended name
// This is so that we can get the next sequence id ('uploadSequenceToken'), so we can add a line to an existing log file
foreach ($result->get("logStreams") as $stream_temp) {
    if ($stream_temp['logStreamName'] == $stream) {
        $create_new_stream = false;
        if (array_key_exists('uploadSequenceToken', $stream_temp)) {
            $next_sequence_id = $stream_temp['uploadSequenceToken'];
        }
        break;
    }
}

// CREATE A NEW LOG STREAM / FILE IF NECESSARY
if ($create_new_stream) {
    $result = $cwClient->createLogStream([
        'logGroupName' => 'user_logs',
        'logStreamName' => $stream,
    ]);
}

// PUSH A LINE TO THE LOG *** This step ALONE takes 70-100ms!!! ***
$result = $cwClient->putLogEvents([
    'logGroupName' => 'user_logs',
    'logStreamName' => $stream,
    'logEvents' => [
        [
            'timestamp' => round(microtime(true) * 1000),
            'message' => $msg,
        ],
    ],
    'sequenceToken' => $next_sequence_id
]);
OPTION 2: Logging via the installed CloudWatch agent (note that, as far as I know, the config file below only allows hardcoded, predetermined log names):
[general]
state_file = /var/awslogs/state/agent-state
[applog]
file = /var/www/html/logs/applog.log
log_group_name = PP
log_stream_name = applog.log
datetime_format = %Y-%m-%d %H:%M:%S
Looks like we have some good news now... not sure if it's too late!
CloudWatch Log Configuration
So to answer the question,
Is there any way around this (ie, simply observe ALL log files in a directory)?
Yes, you can specify log files and file paths using wildcards, which gives you some flexibility in configuring where the logs are fetched from and which log streams they are pushed to.
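For illustration, here is a hedged sketch of what a wildcard entry could look like in the same awslogs config format as above. The [userlogs] section name, the paths, the group name, and the {hostname} substitution are placeholders of mine rather than from the original answer, and you should check the agent documentation for how multiple matching files are mapped onto log streams:
[userlogs]
file = /var/www/html/logs/*.log
log_group_name = user_logs
log_stream_name = {hostname}-userlogs
datetime_format = %Y-%m-%d %H:%M:%S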
I have a Google Cloud upload function that works fine with the relatively small files I've been uploading (see code below). But I will soon need to upload files up to 500 MB in size, so I was looking into the "resumable" upload option. The Google documentation basically says that for files over 5 MB, Google just converts the upload to a resumable upload type. But what does that mean? Does it mean I don't have to make any coding changes? Does it mean that if my page times out and I re-invoke the page and start the upload again, the Google API will automatically detect that the previous upload failed, simply resume the upload where it left off, and only return a valid (non-NULL) storageObject to me once the upload completes?
Here is my current "non-resumable" code:
use Google\Cloud\Storage\StorageClient;

function uploadFile($bucketName, &$fileContent, $cloudPath) {
    $privateKeyFileContent = $GLOBALS['privateKeyFileContent'];

    // connect to Google Cloud Storage using private key as authentication
    try {
        $storage = new StorageClient([
            'keyFile' => json_decode($privateKeyFileContent, true)
        ]);
    } catch (Exception $e) {
        // maybe invalid private key ?
        print $e;
        return false;
    }

    // set which bucket to work in
    $bucket = $storage->bucket($bucketName);

    $sFileHash = base64_encode(md5($fileContent, true));
    $storageObject = $bucket->upload(
        $fileContent,
        [
            'name' => $cloudPath,
            'metadata' => ['md5Hash' => $sFileHash]
        ]
    );

    return $storageObject; // will be null on failure
}
Looking at the differences between upload() and getResumableUploader(), the documentation is a little confusing. The Cloud Client Library documentation for the upload() function states:
Upload your data in a simple fashion. Uploads will default to being
resumable if the file size is greater than 5mb.
And the Cloud Storage documentation states:
Resumable uploads are automatically managed on your behalf, but can be
directly controlled using the resumable option.
Which means that with your current code you could enable the resumable option for upload() by adding 'resumable' => true. Though I'm not too certain; there's probably something behind the scenes we're not seeing, and the documentation doesn't explain it clearly. This example would look something like:
$storageObject = $bucket->upload(
    $fileContent,
    [
        'name' => $cloudPath,
        'metadata' => ['md5Hash' => $sFileHash],
        'resumable' => true
    ]
);
I had a look at the source code on GitHub for both of these methods, and they have practically the same configuration options, but getResumableUploader() returns an uploader that exposes getResumeUri(), which is necessary for resume(). I couldn't find equivalent support for the normal upload(), though I wouldn't rule it out.
use Google\Cloud\Core\Exception\GoogleException; // adjust the namespace to match your google/cloud version

$uploader = $bucket->getResumableUploader(
    fopen($fileContent, 'r'),
    [
        'name' => $cloudPath,
        'metadata' => ['md5Hash' => $sFileHash]
    ]
);

try {
    $object = $uploader->upload();
} catch (GoogleException $ex) {
    // if there is an error it can automatically restart
    // $uploader contains 'resumeUri', which is what is used to resume the upload
    $resumeUri = $uploader->getResumeUri();
    $object = $uploader->resume($resumeUri);
}
resume() handles all of the necessary headers and bytes sent for you to resume an upload.
The case you are describing should be resumable.
All of the following HTTP status responses are considered retryable:
408 Request Timeout
500 Internal Server Error
502 Bad Gateway
503 Service Unavailable
504 Gateway Timeout
The following HTTP status responses are non-retryable:
404 Not Found
410 Gone
More info on HTTP statuses here
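To tie that list back to the resumable-uploader code above, here is a rough sketch of mine (not from the answer) that only resumes on the retryable statuses; it assumes $uploader is the ResumableUploader created earlier and that the exception's getCode() carries the HTTP status of the failed call:
use Google\Cloud\Core\Exception\GoogleException;

$retryableStatuses = [408, 500, 502, 503, 504];

try {
    $object = $uploader->upload();
} catch (GoogleException $ex) {
    if (in_array($ex->getCode(), $retryableStatuses, true)) {
        // Retryable failure: pick the upload up where it left off
        $object = $uploader->resume($uploader->getResumeUri());
    } else {
        // Non-retryable (404, 410, ...): resuming will not help
        throw $ex;
    }
}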
I'm working on a template animation system, so I have different folders in S3 with different files inside (HTML, images, etc.).
What I do is:
I change the folder policy like this:
function changeFolderPolicy($folderPath, $public, $client = null) {
    // note: the optional $client parameter is last so required parameters come first
    if (!$client) {
        $client = getClientS3();
    }

    $effect = 'Allow';
    if (!$public) {
        $effect = 'Deny';
    }

    $policy = json_encode(array(
        'Statement' => array(
            array(
                'Sid' => 'AllowPublicRead',
                'Action' => array(
                    's3:GetObject'
                ),
                'Effect' => $effect,
                'Resource' => array(
                    "arn:aws:s3:::".__bucketS3__."/".$folderPath."*"
                ),
                'Principal' => array(
                    'AWS' => array(
                        "*"
                    )
                )
            )
        )
    ));

    $client->putBucketPolicy(array(
        'Bucket' => __bucketS3__,
        'Policy' => $policy
    ));
}
After changing the policy, the frontend gets all the necessary files.
However, sometimes some files aren't loaded because of a 403 Forbidden. It's not always the same files; sometimes all of them load, sometimes none... I don't have a clue why, since putBucketPolicy is a synchronous call.
Thank you very much.
First, putBucketPolicy is not exactly synchronous. The validation of the policy is synchronous, but applying the policy takes a nonspecific amount of time to replicate through the infrastructure.
There is no mechanism exposed for determining whether the policy has propagated.
Second, you're using bucket policies in a way that fundamentally makes no sense.
Of course, this setup makes the implicit assumption that only one copy of this code would ever run at the same time, which is usually an unsafe assumption, even if it seems true right now.
But worse... toggling a prefix to be publicly readable so you can copy those files, then (presumably) putting it back when you're done, instead of using the service correctly by using your credentials to sign requests for the individual objects you need: frankly, if I am correctly understanding what you're doing here, I am at a loss for words to describe just how wrong this solution is.
This seems comparable to a bank manager securing the bank vault with a bicycle lock instead of using the vault's hardened, high-security, built-in access-control mechanisms because a bicycle lock "is easier to open."
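For what it's worth, here is a minimal sketch of the signed-request approach described above, assuming AWS SDK for PHP v3 and reusing the question's getClientS3() helper and __bucketS3__ constant; instead of toggling the bucket policy, the frontend is handed short-lived pre-signed URLs for exactly the objects it needs:
function getPresignedUrl($objectKey, $expiry = '+10 minutes', $client = null)
{
    if (!$client) {
        $client = getClientS3();
    }

    // Build a GetObject command for the single object the frontend needs
    $command = $client->getCommand('GetObject', array(
        'Bucket' => __bucketS3__,
        'Key'    => $objectKey,
    ));

    // Sign it; the URL works only until the expiry time, no policy change required
    $request = $client->createPresignedRequest($command, $expiry);

    return (string) $request->getUri();
}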
I need to download a large number of large files, stored across multiple identical servers. A file, like '5.doc', that is stored on server 3, is also stored on server 55.
To speed this up, instead of using just one server to download all the files one after another, I'm using all servers at the same time. The problem is that one of the servers may be much slower than the others, or may even be down. When using Guzzle to batch download files, all of the files in that batch must be downloaded before another batch starts.
Is there a way to immediately start downloading another file alongside others so that all of the servers are constantly downloading a file?
In case a server is down, I've set a timeout of 300 seconds, and when this is reached I catch Guzzle's ConnectException.
How do I identify which of the promises (downloads) have failed so I can cancel them? Can I get information about which file/server failed?
Below is a simplified example of the code I'm using to illustrate the point. Thanks for the help!
$filesToDownload = [['5.doc', '8.doc', '10.doc'], ['1.doc', '9.doc']]; // The file names that we need to download
$availableServers = [3, 55, 88]; // Server ids that are available

foreach ($filesToDownload as $index => $fileBatchToDownload) {
    $promises = [];
    foreach ($availableServers as $key => $availableServer) {
        array_push(
            $promises, $client->requestAsync('GET', 'http://domain.com/' . $fileBatchToDownload[$key], [
                'timeout' => 300,
                'sink' => '/assets/' . $fileBatchToDownload[$key]
            ])
        );
        $database->updateRecord($fileBatchToDownload[$key], ['is_cached' => 1]);
    }
    try {
        // unwrap() throws on the first rejection; settle() waits for every promise to finish
        $results = Promise\unwrap($promises);
        $results = Promise\settle($promises)->wait();
    } catch (\GuzzleHttp\Exception\ConnectException $e) {
        // When we can't connect to the server or the file didn't download within the timeout
        foreach ($e->failed() as $failedPromise) {
            // Re-set record in database to is_cached = 0
            // Delete file from server
            // Remove this server from the $availableServers list as it may be down or too slow
            // Re-add this file to the next batch to download $filesToDownload
        }
    }
}
I'm not sure how you are doing an asynchronous download of one file from multiple servers using Guzzle, but getting the array index of failed requests can be done with the promise's then() method:
array_push(
    $promises,
    $client->requestAsync('GET', "http://localhost/file/{$id}", [
        'timeout' => 10,
        'sink' => "/assets/{$id}"
    ])->then(
        function () {
            echo 'Success';
        },
        function () use ($id) {
            echo "Failed: $id";
        }
    )
);
then() accepts two callbacks. The first one is triggered on success and the second one on failure. The source code calls them $onFulfilled and $onRejected. Other usages are documented in the Guzzle documentation. This way you can start downloading another file immediately after one fails.
Can I get information about which file/server failed?
When a promise fails, it means the request remained unfulfilled. In this case you can get the host and the requested path from the RequestException instance that is passed to the second then() callback:
use GuzzleHttp\Exception\RequestException;
.
.
.
array_push(
    $promises,
    $client->requestAsync('GET', "http://localhost/file/{$id}", [
        'timeout' => 10,
        'sink' => "/assets/{$id}"
    ])->then(
        function () {
            echo 'Success';
        },
        function (RequestException $e) {
            echo "Host: " . $e->getRequest()->getUri()->getHost(), "\n";
            echo "Path: " . $e->getRequest()->getRequestTarget(), "\n";
        }
    )
);
So you will have full information about the failing host and file name. If you need access to more information, note that $e->getRequest() returns an instance of the GuzzleHttp\Psr7\Request class, and all methods of this class are available here (Guzzle and PSR-7).
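For example, here is a short sketch of mine (not part of the original answer) showing the kind of information available from the PSR-7 request, and from the response when the server actually answered:
use GuzzleHttp\Exception\RequestException;

$onRejected = function (RequestException $e) {
    $request = $e->getRequest();                       // GuzzleHttp\Psr7\Request (PSR-7)
    echo "Method: " . $request->getMethod() . "\n";    // e.g. "GET"
    echo "URL: " . (string) $request->getUri() . "\n"; // full URL: host, path, query

    // If the server responded (e.g. with a 404 or 500), the response is available too
    if ($e->hasResponse()) {
        echo "Status: " . $e->getResponse()->getStatusCode() . "\n";
    }
};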
When an item is successfully downloaded, can we then immediately
start a new file download on this free server, whilst the other files
are still downloading?
I think you should decide which new files to download only when creating the promises at the very beginning, and repeat/renew failed requests within the second callback. Trying to create new promises following each successful promise may result in an endless process of downloading duplicate files, and that's not simple to handle.
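To make that concrete, here is a minimal sketch of mine (using the same hypothetical localhost URL and /assets/ sink as the examples above) that renews a failed request from inside the rejection callback; returning a new promise from that callback chains the retry into the original promise:
use GuzzleHttp\Client;
use GuzzleHttp\Promise;

$client = new Client();

$downloadWithRetry = function ($id) use ($client) {
    $doRequest = function () use ($client, $id) {
        return $client->requestAsync('GET', "http://localhost/file/{$id}", [
            'timeout' => 10,
            'sink' => "/assets/{$id}",
        ]);
    };

    // On rejection, return a fresh promise: the outer promise then resolves
    // to the outcome of this single retry instead of the original failure.
    return $doRequest()->then(null, function ($reason) use ($doRequest, $id) {
        echo "Retrying: $id\n";
        return $doRequest();
    });
};

$promises = [];
foreach (['5.doc', '8.doc', '10.doc'] as $id) {
    $promises[] = $downloadWithRetry($id);
}
Promise\settle($promises)->wait();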
I'm trying to use the AWS SDK for PHP to programmatically upload a file to a bucket that's set to be a static website in the S3 Console.
The bucket is named foo.ourdomain.com and is hosted in eu-west. I'm using the following code to try and test if I can upload a file:
$client = \Aws\S3\S3Client::factory(array('key' => bla, 'secret' => bla));
$client->upload('foo.ourdomain.com', 'test.txt', 'hello world', 'public-read');
This is pretty much like the examples; however, I received the following exception:
PHP Fatal error: Uncaught Aws\S3\Exception\PermanentRedirectException: AWS Error Code: PermanentRedirect, Status Code: 301, AWS Request ID: -, AWS Error Type: client, AWS Error Message: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint: "foo.ourdomain.com.s3.amazonaws.com"., User-Agent: aws-sdk-php2/2.4.8 Guzzle/3.7.4 curl/7.22.0 PHP/5.3.10-1ubuntu3.8
At this point I was surprised as there's no mention of this in the manual for the S3 SDK. But okay, I found a method setEndpoint and adjusted the code to:
$client = \Aws\S3\S3Client::factory(array('key' => bla, 'secret' => bla));
$client->setEndpoint('foo.ourdomain.com.s3.amazonaws.com');
$client->upload('foo.ourdomain.com', 'test.txt', 'hello world', 'public-read');
I assumed that'd work, but I'm getting the exact same error. I've double-checked, and the endpoint mentioned in the exception matches, byte for byte, the one I'm setting in the second line.
I've also tried using foo.ourdomain.com.s3-website-eu-west-1.amazonaws.com as the endpoint (this is the host our CNAME points to as per the S3 console's instructions). That didn't work either.
I must be missing something, but I can't find it anywhere. Perhaps buckets set to 'static website' behave differently in a way which is not currently supported by the SDK? If so, I can't find mention of it in the docs nor in the management console.
Got it. The solution was to change the initialisation of the client to:
$client = \Aws\S3\S3Client::factory(array(
    'key' => bla,
    'secret' => bla,
    'region' => 'eu-west-1'
));
I.e., rather than specifying an endpoint, I needed to explicitly set the region in the options array. I guess the example code happens to use whatever the default region is.
If you don't want to initialize the client to a specific region and/or you need to work with different regions, I have been successful using the getBucketLocation/setRegion pair of calls as follows:
// Bucket location is fetched
$m_bucketLocation = $I_s3->getBucketLocation(array(
    'Bucket' => $s_backupBucket,
));

// Bucket location is specified before operation is made
$I_s3->setRegion($m_bucketLocation['Location']);
I have one extra call, but it solved my issue without the need to intervene on the factory.
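As a usage sketch (my own wrapper, not from the answer, and assuming SDK v2 like the code above): resolve the bucket's region first, then perform the operation. Note that getBucketLocation returns an empty Location for buckets in us-east-1, which the sketch maps explicitly:
function uploadToBucketRegion(\Aws\S3\S3Client $client, $bucket, $key, $body)
{
    // Fetch where the bucket actually lives
    $location = $client->getBucketLocation(array('Bucket' => $bucket));

    // An empty Location means the bucket is in the default us-east-1 region
    $region = $location['Location'] ?: 'us-east-1';
    $client->setRegion($region);

    return $client->upload($bucket, $key, $body, 'public-read');
}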
Since I got tired of repetitively clicking/waiting/clicking in the Amazon Web Services GUI, I needed an EC2 script to:
Stop the instance specified at bash command line
Detach a specified volume
Create a new volume from a specified snapshot
Start the instance up again
It can of course be done with the GUI, but it's such a pain. This way I can just let the script run for 5 minutes while I get coffee instead of having to attend to it.
Syntax:
php reprovision.php i-xxxx vol-xxxx snap-xxxx
reprovision.php:
<?php
require 'aws.php';

// Build the configuration array and create the EC2 client
$config = aws_setup();
$ec2Client = \Aws\Ec2\Ec2Client::factory($config);

// Command-line arguments: instance to stop, volume to detach, snapshot to restore from
$stop = $argv[1];
$detach = $argv[2];
$snapshot = $argv[3];

// Stop the instance and give it time to shut down
$ec2Client->stopInstances(array('InstanceIds' => array($stop)));
sleep(60);

// Detach the old volume
$ec2Client->detachVolume(array('VolumeId' => $detach));
sleep(10);

// Create a new volume from the snapshot
$vol = $ec2Client->createVolume(array('SnapshotId' => $snapshot, 'AvailabilityZone' => 'us-east-1a'));
sleep(10);

// Attach the new volume as the root device and start the instance again
$ec2Client->attachVolume(array('VolumeId' => $vol['VolumeId'], 'InstanceId' => $stop, 'Device' => '/dev/sda1'));
sleep(10);
$ec2Client->startInstances(array('InstanceIds' => array($stop)));
'aws_setup()' returns the configuration array used to create the EC2 client on the next line.
The command-line arguments are then assigned to variables.
The next version of the script would ideally use the EC2 wait functions instead of PHP's 'sleep'.
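For reference, here is a sketch of what that could look like (my own illustration, assuming SDK v2's waitUntil() and its standard EC2 waiter names; polling intervals and timeouts should be checked against the SDK docs):
$ec2Client->stopInstances(array('InstanceIds' => array($stop)));
$ec2Client->waitUntil('InstanceStopped', array('InstanceIds' => array($stop)));

$ec2Client->detachVolume(array('VolumeId' => $detach));
$ec2Client->waitUntil('VolumeAvailable', array('VolumeIds' => array($detach)));

$vol = $ec2Client->createVolume(array('SnapshotId' => $snapshot, 'AvailabilityZone' => 'us-east-1a'));
$ec2Client->waitUntil('VolumeAvailable', array('VolumeIds' => array($vol['VolumeId'])));

$ec2Client->attachVolume(array('VolumeId' => $vol['VolumeId'], 'InstanceId' => $stop, 'Device' => '/dev/sda1'));
$ec2Client->startInstances(array('InstanceIds' => array($stop)));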
AWS PHP SDK2 EC2 Client API