PHP AWS Athena: Need to execute queries against athena - php

I need to run queries against AWS Athena from one of my PHP applications. I have used the documentation from AWS as well as another forum to try and compile the code I need to achieve this. Can you please go through the code and validate/comment/correct where necessary? Most of the code makes sense to me except for the waitForSucceeded() function; I have never seen a function defined this way.
<?php
require "/var/www/app/vendor/autoload.php";
use Aws\Athena\AthenaClient;
$options = [
'version' => 'latest',
'region' => 'eu-north-1',
'credentials' => [
'key' => '12345',
'secret' => '12345'
] // closes 'credentials' (this bracket was missing)
];
$athenaClient = new AthenaClient($options);
$databaseName = 'database';
$catalog = 'AwsDataCTLG';
$sql = 'select * from database limit 3';
$outputS3Location = 's3://BUCKET_NAME/';
$startQueryResponse = $athenaClient->startQueryExecution([
'QueryExecutionContext' => [
'Catalog' => $catalog,
'Database' => $databaseName
],
'QueryString' => $sql,
'ResultConfiguration' => [
'OutputLocation' => $outputS3Location
]
]);
$queryExecutionId = $startQueryResponse->get('QueryExecutionId');
var_dump($queryExecutionId);
$waitForSucceeded = function () use ($athenaClient, $queryExecutionId, &$waitForSucceeded) {
$getQueryExecutionResponse = $athenaClient->getQueryExecution([
'QueryExecutionId' => $queryExecutionId
]);
$status = $getQueryExecutionResponse->get('QueryExecution')['Status']['State'];
print("[waitForSucceeded] State=$status\n");
return $status === 'SUCCEEDED' || $waitForSucceeded();
};
$waitForSucceeded();
$getQueryResultsResponse = $athenaClient->getQueryResults([
'QueryExecutionId' => $queryExecutionId
]);
var_dump($getQueryResultsResponse->get('ResultSet'));

From what I can see, it should work. What output do you get when you run it?
waitForSucceeded() is a closure, also known as an anonymous function.
You can find more detail in the PHP manual:
https://www.php.net/manual/fr/functions.anonymous.php
https://www.php.net/manual/fr/class.closure.php
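The unusual part is the &$waitForSucceeded in the use clause. A closure captures its variables at the moment it is created, and at that moment $waitForSucceeded has not been assigned yet; capturing it by reference lets the closure see the assignment that happens afterwards, so it can call itself. A small standalone example of the same pattern ($countdown is a made-up name):

```php
<?php
// Capture $countdown by reference: when the closure is created the variable is still
// unassigned, so a by-value capture would hold null and the recursive call would fail.
$countdown = function (int $n) use (&$countdown): array {
    if ($n === 0) {
        return [0];
    }
    // Recursive call through the captured reference.
    return array_merge([$n], $countdown($n - 1));
};

echo implode(', ', $countdown(3)), PHP_EOL; // prints "3, 2, 1, 0"
```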
Here is what the closure does:
// Declare the closure and import the variables it needs; $waitForSucceeded is imported by reference so the closure can call itself
$waitForSucceeded = function () use ($athenaClient, $queryExecutionId, &$waitForSucceeded) {
$getQueryExecutionResponse = $athenaClient->getQueryExecution([
'QueryExecutionId' => $queryExecutionId
]);
$status = $getQueryExecutionResponse->get('QueryExecution')['Status']['State'];
print("[waitForSucceeded] State=$status\n");
// If the state is SUCCEEDED, return true; otherwise call the closure again (immediately, with no delay)
return $status === 'SUCCEEDED' || $waitForSucceeded();
};
// Run the closure; it returns true once $status === 'SUCCEEDED'
$waitForSucceeded();
$getQueryResultsResponse = $athenaClient->getQueryResults([
'QueryExecutionId' => $queryExecutionId
]);
var_dump($getQueryResultsResponse->get('ResultSet'));
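One caveat: the closure recurses with no pause, so it calls GetQueryExecution in a tight loop, adds a stack frame on every check, and never stops if the query ends in FAILED or CANCELLED. Below is a minimal sketch of a bounded loop instead, assuming Athena's query states QUEUED, RUNNING, SUCCEEDED, FAILED and CANCELLED. waitForQuery() and its parameters are hypothetical names, and the state lookup is passed in as a callable so the loop can be tried without AWS.

```php
<?php
/**
 * Poll until the query reaches a terminal state or the attempt budget is used up.
 *
 * @param callable $fetchState returns the current state string (e.g. QUEUED, RUNNING)
 * @return string the terminal state: SUCCEEDED, FAILED or CANCELLED
 */
function waitForQuery(callable $fetchState, int $maxAttempts = 60, int $sleepMs = 500): string
{
    $terminal = ['SUCCEEDED', 'FAILED', 'CANCELLED'];
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $state = $fetchState();
        if (in_array($state, $terminal, true)) {
            return $state;
        }
        // Pause between checks instead of calling the API in a tight loop.
        usleep($sleepMs * 1000);
    }
    throw new RuntimeException("Query not finished after $maxAttempts attempts");
}

// Usage with the client and execution ID from the question:
// $state = waitForQuery(function () use ($athenaClient, $queryExecutionId) {
//     return $athenaClient->getQueryExecution(['QueryExecutionId' => $queryExecutionId])
//         ->get('QueryExecution')['Status']['State'];
// });
// if ($state !== 'SUCCEEDED') {
//     throw new RuntimeException("Athena query ended in state $state");
// }
```

Returning the terminal state, rather than a boolean, lets the caller decide what to do with FAILED or CANCELLED before calling getQueryResults.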

Related

How to get collection class object from mongodb/driver/manager in PHP

I am using the (current? not sure, the PHP documentation is very opaque to me) method to connect to MongoDB from PHP:
$manager = new MongoDB\Driver\Manager("mongodb://{$user}:{$pwd}@{$url}", array("ssl" => true), array("context" => $ctx));
From there, if I want to write something I do the following:
$bson = MongoDB\BSON\fromJSON($newData);
$value = MongoDB\BSON\toPHP($bson);
$bulk = new MongoDB\Driver\BulkWrite;
$bulk->update(
$filter,
['$set' => $value],
['multi' => false, 'upsert' => $upsert]
);
$results = $manager->executeBulkWrite("$DB.$collection", $bulk);
var_dump($results);
All the MongoDB PHP tutorials start with a $collection object, and the methods from there on seem much more user-friendly (getInsertedId, insertOne, find, findOne, etc.).
For example:
<?php
$collection = (new MongoDB\Client)->test->users;
$insertManyResult = $collection->insertMany([
[
'username' => 'admin',
'email' => 'admin@example.com',
'name' => 'Admin User',
],
[
'username' => 'test',
'email' => 'test@example.com',
'name' => 'Test User',
],
]);
printf("Inserted %d document(s)\n", $insertManyResult->getInsertedCount());
var_dump($insertManyResult->getInsertedIds());
It is not clear to me how they actually connect to the DB. How would I go from the $manager connection to a $collection?
On the MongoDB PHP documentation page, it says 'You can construct collections directly using the driver’s MongoDB\Driver\Manager class'. Unfortunately, searching the linked page finds the word 'collection' only in a side comment in a user-contributed note.
Elsewhere on the MongoDB PHP reference pages, I see no description of the MongoDB\Driver\Manager class.
So, how do I get access to the many features in the MongoDB\Collection class?
I was not able to get a collection out of the Manager class; however, I was able to use the BulkWrite class to execute an insert securely (I believe). I expect the same pattern will work for reads and updates as well.
Code snippet for those that come here after me:
//echo "Specify the cert...";
$SSL_DIR = ".";
$SSL_FILE = "XXXXXX.pem";
$ctx = stream_context_create(array(
"ssl" => array(
"cafile" => $SSL_DIR . "/" . $SSL_FILE,
))
);
//echo "Done\n";
// echo "Creating manager...";
$manager = new MongoDB\Driver\Manager("mongodb://{$user}:{$pwd}@{$url}", array("ssl" => true), array("context" => $ctx));
// echo "Done!\n";
// echo "Making BSON...";
$bson = MongoDB\BSON\fromJSON($newData);
// echo "Done!\nMaking Value...";
$value = MongoDB\BSON\toPHP($bson);
$value->_id = (string) new MongoDB\BSON\ObjectID;
// echo "Done!\nMaking Bulk...";
$bulk = new MongoDB\Driver\BulkWrite;
$bulk->insert($value);
// echo "Done!\nExecuting Bulk Write";
$results = $manager->executeBulkWrite("$db.$collection", $bulk);
if($results->getInsertedCount()==1) {
echo $value->_id;
} else {
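// getWriteErrors() returns an array of MongoDB\Driver\WriteError objects, so echo would only print "Array"
print_r($results->getWriteErrors());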
}
// echo "Done!\n";
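The Manager cannot hand out a collection itself, but the separate mongodb/mongodb library (the one the tutorials use) has a MongoDB\Collection class whose constructor takes a MongoDB\Driver\Manager, a database name and a collection name. Its MongoDB\Client also accepts the same URI, URI options and driver options as Manager, and $client->selectCollection($db, $collection) returns the same kind of object. A minimal sketch, assuming that package is installed and autoloaded via Composer; collectionFromManager() is a hypothetical helper name:

```php
<?php
// Assumes the mongodb/mongodb Composer package is installed and autoloaded;
// MongoDB\Collection comes from that library, not from the ext-mongodb driver itself.
use MongoDB\Collection;
use MongoDB\Driver\Manager;

/**
 * Wrap an existing driver-level Manager in the library's Collection class,
 * which provides insertOne(), find(), findOne() and friends.
 */
function collectionFromManager(Manager $manager, string $dbName, string $collectionName): Collection
{
    return new Collection($manager, $dbName, $collectionName);
}

// Usage with the $manager, $db, $collection and $value variables from the snippet above:
// $coll = collectionFromManager($manager, $db, $collection);
// $insertResult = $coll->insertOne($value);
// echo $insertResult->getInsertedId();
```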

HTTP Guzzle not returning all data

I have created a function that contacts a remote API using Guzzle but I cannot get it to return all of the data available.
I call the function here:
$arr = array(
'skip' => 0,
'take' => 1000,
);
$sims = api_request('sims', $arr);
And here is the function. For the value of $response I have tried both of the following:
json_decode($x->getBody(), true)
json_decode($x->getBody()->getContents(), true)
But neither has shown any more records. It returns 10 records, and I know there are over 51 available that it should be returning.
use GuzzleHttp\Client;
function api_request($url, $vars = array(), $type = 'GET') {
$username = '***';
$password = '***';
// The 'auth' option makes Guzzle send a correctly base64-encoded Basic Authorization header,
// so a hand-built 'Basic user:pass' header is not needed (and was not encoded anyway)
$client = new Client([
'auth' => [$username, $password],
]);
$headers = ['Content-Type' => 'application/json'];
$json_data = json_encode($vars);
$end_point = 'https://simportal-api.azurewebsites.net/api/v1/';
try {
$x = $client->request($type, $end_point.$url, ['headers' => $headers, 'body' => $json_data]);
$response = array(
'success' => true,
// one of the two variants tried above
'response' => json_decode($x->getBody(), true),
);
} catch (GuzzleHttp\Exception\ClientException $e) {
$response = array(
'success' => false,
'errors' => json_decode((string) $e->getResponse()->getBody()),
);
}
return $response;
}
Judging by the documentation at https://simportal-api.azurewebsites.net/Help/Api/GET-api-v1-sims_search_skip_take, I assume the server ignores parameters sent in the body of a GET request and falls back to its default of 10 records. That is normal in many applications: GET requests usually take their parameters only from the query string.
I'd change that function to send a body for POST/PUT/PATCH requests, and a "query" option (without json_encode) for GET/DELETE requests. Example from the Guzzle documentation:
$client->request('GET', 'http://httpbin.org', [
'query' => ['foo' => 'bar']
]);
Source: https://docs.guzzlephp.org/en/stable/quickstart.html#query-string-parameters
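Applied to api_request(), that suggestion could look like the helper below. It is a minimal sketch: buildRequestOptions() is a hypothetical name, and it assumes the API reads skip and take from the query string for GET requests. It relies on Guzzle's query option, which URL-encodes the array, and its json option, which encodes the body and sets the Content-Type: application/json header.

```php
<?php
/**
 * Build Guzzle request options: GET/DELETE/HEAD parameters go in the query string,
 * every other method sends them as a JSON body.
 */
function buildRequestOptions(string $method, array $vars): array
{
    if (in_array(strtoupper($method), ['GET', 'DELETE', 'HEAD'], true)) {
        // e.g. ['skip' => 0, 'take' => 1000] is sent as ?skip=0&take=1000
        return ['query' => $vars];
    }
    // Guzzle json_encodes this array and sets the JSON Content-Type header itself.
    return ['json' => $vars];
}

// Usage inside api_request(), replacing the 'headers'/'body' options:
// $x = $client->request($type, $end_point . $url, buildRequestOptions($type, $vars));
```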

kinesis "getShardIterator" gets stuck

I have a weird problem when fetching items from kinesis.
When I have events in the stream and I'm querying for them with the correct timestamp, I get the result.
But if I don't have events in the stream, or if I'm querying for a time where there are no events, the getShardIterator call gets stuck for several minutes.
That's why I added the "timeout" of 2 seconds.
Is there a better way to just get an empty response from kinesis if no events found?
Thanks
<?php
use Aws\Sdk;
$getKinessisClient = new GetKinessisClient();
$kinesisClient = $getKinessisClient->get([
]);
$params = [
'credentials' => array(
'key' => 'xxxx',
'secret' => 'xxxx',
),
'region' => 'xxxx',
'version' => 'latest',
'http' => [
'timeout' => 2
]
];
$kinesisClient = (new Sdk())->createKinesis($params);
// get all shard ids
$res = $kinesisClient->describeStream([ 'StreamName' => $streamName ]);
$shardIds = $res->search('StreamDescription.Shards[].ShardId');
$foundItems = [];
foreach ($shardIds as $shardId) {
try {
$getShardItParams = [
'ShardId' => $shardId,
'StreamName' => $streamName,
'ShardIteratorType' => 'AT_TIMESTAMP',
'Timestamp' => $from_timestamp, //PROBLEM HERE
];
// this gets stuck (without timeout)
$res = $kinesisClient->getShardIterator($getShardItParams);
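The snippet stops right after getShardIterator. On the "empty response" part of the question: with GetRecords, an empty Records page does not by itself mean there is no data, while a MillisBehindLatest value of 0 in the response signals that the reader has caught up with the tip of the shard. A minimal sketch of reading one shard on that basis, assuming $getRecords is a hypothetical callable wrapping $kinesisClient->getRecords(['ShardIterator' => $iterator]); readShardUntilCaughtUp() is also a made-up name. The pause between calls is there because GetRecords is limited to five calls per second per shard. This covers reading the records, not the hang inside getShardIterator itself.

```php
<?php
/**
 * Read one shard from a starting iterator until the consumer is caught up.
 *
 * $getRecords: hypothetical wrapper returning an array-like GetRecords result
 * with Records, NextShardIterator and MillisBehindLatest.
 */
function readShardUntilCaughtUp(callable $getRecords, string $iterator, int $maxCalls = 100, int $sleepMs = 200): array
{
    $records = [];
    $next = $iterator;
    for ($call = 0; $call < $maxCalls && $next !== null; $call++) {
        $result = $getRecords($next);
        foreach ($result['Records'] as $record) {
            $records[] = $record;
        }
        // An empty page is not proof of "no data"; MillisBehindLatest === 0 means caught up.
        if (($result['MillisBehindLatest'] ?? 0) === 0) {
            break;
        }
        // NextShardIterator is null once a closed shard has been read to the end.
        $next = $result['NextShardIterator'] ?? null;
        // Stay under the per-shard GetRecords rate limit.
        usleep($sleepMs * 1000);
    }
    return $records;
}

// Usage, continuing from the getShardIterator call above:
// $items = readShardUntilCaughtUp(function (string $it) use ($kinesisClient) {
//     return $kinesisClient->getRecords(['ShardIterator' => $it]);
// }, $res['ShardIterator']);
```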

unmarshalItem DynamoDB & PHP not working

I want to unmarshal the DynamoDB scan response. Here is my code:
$client = $this->getClient();
$result = $client->scan([
'ExpressionAttributeValues' => [
':v1' => [
'S' => "200",
],
],
'FilterExpression' => 'id = :v1',
'ProjectionExpression' => "entryStamp",
'TableName' => $this->table,
]);
return $this->unmarshalItem($result['Items']);
It returns error "Unexpected type: entryStamp."
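I was searching for this myself, and unmarshalling the whole Items list in one call doesn't seem possible at the moment. unmarshalItem() expects a single item, so when given the list it treats each item's attribute name (entryStamp) as a DynamoDB type tag, hence "Unexpected type: entryStamp."
I didn't find anything specific to PHP, but this thread describes the exact same problem in Go.
So the best approach is what Saurabh advised in his comment: unmarshal each item in a loop.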
$marshaler = new \Aws\DynamoDb\Marshaler();
$result = $this->client->query($params);
$data = [];
foreach( $result['Items'] as $item)
{
$data[] = $marshaler->unmarshalItem($item);
}
return $data;

How do I configure default query parameters with Guzzle 6?

Migrating from 5 to 6, and I've run into a snag and can't find the relevant docs.
The Guzzle docs at http://guzzle.readthedocs.io/en/latest/quickstart.html#creating-a-client say that we can add "any number of default request options".
I want to send "foo=bar" with every request. E.g.:
$client = new Client([
'base_uri' => 'http://google.com',
]);
$client->get('this/that.json', [
'query' => [ 'a' => 'b' ],
]);
This will generate GET on http://google.com/this/that.json?a=b
How do I modify the client construction so that it yields:
http://google.com/this/that.json?foo=bar&a=b
Thanks for your help!
Alright, so far this works:
$extraParams = [
'a' => $config['a'],
'b' => $config['b'],
];
$handler = HandlerStack::create();
$handler->push(Middleware::mapRequest(function (RequestInterface $request) use ($extraParams) {
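$uri = (string) $request->getUri();
// Append with '&' if the URI already has a query string, otherwise start one with '?'
$uri .= ( $request->getUri()->getQuery() !== '' ? '&' : '?' );
$uri .= http_build_query( $extraParams );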
return new Request(
$request->getMethod(),
$uri,
$request->getHeaders(),
$request->getBody(),
$request->getProtocolVersion()
);
}));
$this->client = new Client([
'base_uri' => $url,
'handler' => $handler,
'exceptions' => false,
]);
If anyone knows how to make it less sinister-looking, I would say thank you!
I found a nice solution here.
Basically, anything defined in the array passed to the constructor becomes part of the client's config.
This means you can do this when initialising:
$client = new Client([
'base_uri' => 'http://google.com',
// the key can be called anything, but 'defaults' works well
'defaults' => [
'query' => [
'foo' => 'bar',
]
]
]);
Then, when using the client:
$options = [
'query' => [
'nonDefault' => 'baz',
]
];
// merge non default options with default ones
$options = array_merge_recursive($options, $client->getConfig('defaults'));
$guzzleResponse = $client->get('this/that.json', $options);
It's worth noting that array_merge_recursive appends to nested arrays rather than overwriting them. If you plan on overriding a default value, you'll need a different utility function, such as array_replace_recursive. It works nicely when the default values are never overridden, though.
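The difference is easy to see with plain arrays. A small sketch with made-up option values: array_merge_recursive turns a clashing key into a list of both values, while array_replace_recursive lets the later array win, so passing the defaults first gives "options override defaults" semantics.

```php
<?php
$defaults = ['query' => ['foo' => 'bar', 'lang' => 'en']];
$options  = ['query' => ['foo' => 'override', 'nonDefault' => 'baz']];

// array_merge_recursive: the clashing 'foo' key becomes a list of both values.
$merged = array_merge_recursive($options, $defaults);
echo json_encode($merged['query']), PHP_EOL;
// {"foo":["override","bar"],"nonDefault":"baz","lang":"en"}

// array_replace_recursive: later arrays win, so pass the defaults first.
$replaced = array_replace_recursive($defaults, $options);
echo json_encode($replaced['query']), PHP_EOL;
// {"foo":"override","lang":"en","nonDefault":"baz"}

// With the client from the answer:
// $options = array_replace_recursive($client->getConfig('defaults'), $options);
```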
A "less sinister-looking" example based on the answer by @Saeven and the comment from @VladimirPak.
$query_defaults = [
'a' => $config['a'],
'b' => $config['b'],
];
$handler = \GuzzleHttp\HandlerStack::create();
$handler->push(\GuzzleHttp\Middleware::mapRequest(function (\Psr\Http\Message\RequestInterface $request) use ($query_defaults) {
$query = \GuzzleHttp\Psr7\Query::parse($request->getUri()->getQuery());
$query = array_merge($query_defaults, $query);
return $request->withUri($request->getUri()->withQuery(\GuzzleHttp\Psr7\Query::build($query)));
}));
$this->client = new \GuzzleHttp\Client([
'base_uri' => $url,
'handler' => $handler,
'exceptions' => false,
]);
I'm not sure how much less sinister-looking it is, though. lol
The solution proposed on GitHub looks pretty ugly. This does not look much better, but at least it is more readable and it works. I'd like feedback if anyone knows why it should not be used:
$query = $uri . '/person/id?personid=' . $personid . '&name=' . $name;
return $result = $this->client->get(
$query
)
->getBody()->getContents();
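One concrete reason: plain concatenation does not URL-encode the values. A small sketch with a made-up name containing a space and an ampersand: the & ends up starting a new parameter, whereas http_build_query() encodes each value. Guzzle's query request option encodes values in a similar way, so passing ['query' => [...]] avoids the problem too.

```php
<?php
$personid = 42;
$name = 'Smith & Sons';

// Concatenation: the "&" inside the value starts a new parameter and the spaces stay raw.
$naive = '/person/id?personid=' . $personid . '&name=' . $name;

// http_build_query URL-encodes each value (spaces become "+", "&" becomes "%26").
$safe = '/person/id?' . http_build_query(['personid' => $personid, 'name' => $name]);

echo $naive, PHP_EOL; // /person/id?personid=42&name=Smith & Sons
echo $safe, PHP_EOL;  // /person/id?personid=42&name=Smith+%26+Sons
```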
