Currently I am using Google BigQuery to match two tables and export the query result as a CSV file to GCS.
I can't find a way to export the query result directly as a CSV file in GCS through the API, although this is readily available in the Google Cloud console.
So I am creating a table dynamically with the same column names and types as the query output and inserting the records into it. However, as soon as I do this, a streaming buffer is created before the actual table is populated with data, and it takes around 90 minutes for the table to mature.
If I export that table (with an active streaming buffer) as CSV to GCS, sometimes the exported file contains only the column header and no row data.
Is there a way to overcome this situation?
I am using the Google Cloud PHP API for my code.
Below is the sample working code.
1. We create two tables with their schemas dynamically.
$bigQuery = new BigQueryClient();
try {
    $res = $bigQuery->dataset(DATASET)->createTable($owner_table, [
        'schema' => [
            'fields' => $ownerFields
        ]
    ]);
} catch (Exception $e) {
    //echo 'Message: ' . $e->getMessage();
    //die();
    // $ownerFields is an array, so encode it before concatenating it into the message
    $custom_error_message = "Error while creating owner table schema. Trying to create with " . json_encode($ownerFields);
    sendErrorEmail($e->getMessage(), $custom_error_message);
    return false;
}
2. We import two CSV files from GCS into the created tables.
$bigQuery = new BigQueryClient([
    'projectId' => $projectId,
]);
$dataset = $bigQuery->dataset($datasetId);
$table = $dataset->table($tableId);
// load the storage object
$storage = new StorageClient([
    'projectId' => $projectId,
]);
$object = $storage->bucket($bucketName)->object($objectName);
// create the import job
$job = $table->loadFromStorage($object, $options);
// poll the job until it is complete
$backoff = new ExponentialBackoff(10);
$backoff->execute(function () use ($job) {
    //print('Waiting for job to complete' . PHP_EOL);
    $job->reload();
    if (!$job->isComplete()) {
        // the backoff only retries when the closure throws, so this must not stay commented out
        throw new Exception('Job has not yet completed', 500);
        // sendErrorEmail("Job has not yet completed", "Error while import from storage.");
    }
});
3. We run a BigQuery query to find matches between the two tables.
{
//Code omitted for brevity
}
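The actual matching query is omitted above for brevity. Purely as an illustration (the table and column names below are placeholders, not the real query), running it as a job looks roughly like this:
$query = 'SELECT o.*, r.some_field '
       . 'FROM [my_dataset.owner_table] o '
       . 'JOIN [my_dataset.record_table] r ON o.match_key = r.match_key';
$job = $bigQuery->runQueryAsJob($query, ['jobConfig' => ['useLegacySql' => true]]);
$backoff = new ExponentialBackoff(10);
$backoff->execute(function () use ($job) {
    $job->reload();
    if (!$job->isComplete()) {
        throw new Exception('Job has not yet completed', 500);
    }
});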
4. BigQuery returns the result as an array in PHP. We then create a third table and insert the query result set into it.
$queryResults = $job->queryResults();
if ($queryResults->isComplete()) {
    $i = 0;
    $rows = $queryResults->rows();
    $dataRows = [];
    foreach ($rows as $row) {
        //pr($row); die();
        // printf('--- Row %s ---' . PHP_EOL, ++$i);
        ++$i;
        $inlineRow = [];
        $inlineRow['insertId'] = $i;
        $arr = []; // reset per row so values from the previous row cannot leak in
        foreach ($row as $column => $value) {
            // printf('%s: %s' . PHP_EOL, $column, $value);
            $arr[$column] = $value;
        }
        $inlineRow['data'] = $arr;
        $dataRows[] = $inlineRow;
    }
    /* Create a new result table to store the query result */
    if (createResultTable($schema, $type)) {
        if (count($dataRows) > 0) {
            sleep(120); //force sleep to mature the result table
            $ownerTable = $bigQuery->dataset(DATASET)->table($tableId);
            $insertResponse = $ownerTable->insertRows($dataRows);
            if (!$insertResponse->isSuccessful()) {
                //print_r($insertResponse->failedRows());
                sendErrorEmail($insertResponse->failedRows(), "Failed to create result table for " . $type);
                return 0;
            } else {
                return 1;
            }
        } else {
            return 2;
        }
    } else {
        return 0;
    }
    //printf('Found %s row(s)' . PHP_EOL, $i);
}
5. We try to export the third table, which contains the result set, to GCS.
function export_table($projectId, $datasetId, $tableId, $bucketName, $objectName, $format = 'CSV')
{
    $bigQuery = new BigQueryClient([
        'projectId' => $projectId,
    ]);
    $dataset = $bigQuery->dataset($datasetId);
    $table = $dataset->table($tableId);
    // load the storage object
    $storage = new StorageClient([
        'projectId' => $projectId,
    ]);
    $destinationObject = $storage->bucket($bucketName)->object($objectName);
    // create the export (extract) job
    $options = ['jobConfig' => ['destinationFormat' => $format]];
    $job = $table->export($destinationObject, $options);
    // poll the job until it is complete
    $backoff = new ExponentialBackoff(10);
    $backoff->execute(function () use ($job) {
        //print('Waiting for job to complete' . PHP_EOL);
        $job->reload();
        if (!$job->isComplete()) {
            // throwing here is what makes ExponentialBackoff retry; a plain
            // return would let execution continue before the job has finished
            throw new Exception('Job has not yet completed', 500);
        }
    });
    // check if the job has errors
    if (isset($job->info()['status']['errorResult'])) {
        $error = $job->info()['status']['errorResult']['message'];
        //printf('Error running job: %s' . PHP_EOL, $error);
        sendErrorEmail($job->info()['status'], "Error while exporting resulted table. File Name: " . $objectName);
        return false;
    } else {
        return true;
    }
}
The problem is in step 5. Because the third table from step 4 is still in the streaming buffer, the export job does not work reliably: sometimes the table is exported correctly and sometimes it is not. We have tried sleeping for about 120 seconds between steps 4 and 5, but the problem remains.
I have seen that a table populated dynamically via the API stays in the streaming buffer for about 90 minutes, which is far too long.
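For reference, this is how I can see that the table is still in the streaming buffer before exporting (a minimal sketch, assuming the tables.get metadata returned by Table::info() exposes a streamingBuffer section while rows are still buffered):
$resultTable = $bigQuery->dataset(DATASET)->table($tableId);
$resultTable->reload();
$info = $resultTable->info();
if (isset($info['streamingBuffer'])) {
    // rows are still in the streaming buffer; an extract job may not see them yet
    $estimatedRows = $info['streamingBuffer']['estimatedRows'];
}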
Related
I'm trying to build a script that reads a txt file and runs some processing on each of its lines. For example, I need to check whether an ID exists and whether its information has changed: if it has, I update the current table; if the ID does not exist, I insert a new row into a temporary table to be checked manually later.
These files may contain more than 20-30 thousand lines.
When I just read the file and print some dummy content from the lines, it takes 40-50 ms. However, when I connect to the database to do all those verifications, the script stops before the end because of a timeout.
This is what I'm doing so far:
$handle = fopen($path, "r") or die("Couldn't get handle");
if ($handle) {
    while (!feof($handle)) {
        $buffer = fgets($handle, 4096);
        $segment = explode('|', $buffer);
        if (strlen($segment[0]) > 6) {
            $param = [':code' => intval($segment[0])];
            $codeObj = Sql::exec("SELECT value FROM product WHERE code = :code", $param);
            if (!$codeObj) {
                $param = [
                    ':code' => $segment[0],
                    ':name' => $segment[1],
                    ':value' => $segment[2],
                ];
                Sql::exec("INSERT INTO product_tmp (code, name, value) VALUES (:code, :name, :value)", $param);
            } else {
                if ($codeObj->value !== $segment[2]) {
                    $param = [
                        ':code' => $segment[0],
                        ':value' => $segment[2],
                    ];
                    Sql::exec("UPDATE product SET value = :value WHERE code = :code", $param);
                }
            }
        }
    }
    fclose($handle);
}
And this is my Sql Class to connect with PDO and execute the query:
public static function exec($sql, $param = null) {
try {
$conn = new PDO('mysql:charset=utf8mb4;host= '....'); // I've just deleted the information to connect to the database (password, user, etc.)
$q = $conn->prepare($sql);
if ( isset($param) ) {
foreach ($param as $key => $value) {
$$key = $value;
$q->bindParam($key, $$key);
}
}
$q->execute();
$response = $q->fetchAll();
if ( count($response) ) return $response;
return false;
} catch(PDOException $e) {
return 'ERROR: ' . $e->getMessage();
}
}
As you can see, each query I run through Sql::exec() opens a new connection. I don't know whether this is the cause of such a delay in the process, because when I don't run any SQL query the script finishes within milliseconds.
Or could another part of the code be causing this problem?
First of all, make your function like this, to avoid multiple connections and also to get rid of useless code:
// assumes the Sql class also declares a static property: private static $conn;
public static function getPDO() {
    if (!static::$conn) {
        static::$conn = new PDO('mysql:charset=utf8mb4;host= ....');
    }
    return static::$conn;
}
public static function exec($sql, $param = null) {
    $q = static::getPDO()->prepare($sql);
    $q->execute($param);
    return $q;
}
Then create a unique index on the code field.
Then use a single INSERT ... ON DUPLICATE KEY UPDATE query instead of your three queries.
You may also want to wrap your inserts in a transaction; it may speed up the inserts by up to 70 times. A rough sketch of the combined approach follows.
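A rough sketch of the combined idea, reusing the $handle from the question (not a drop-in replacement: it upserts straight into product and leaves out the product_tmp routing for unknown codes, and it assumes product has a code column with a unique index plus the value column):
$pdo = Sql::getPDO();
// one-time schema change so the upsert has something to key on:
// ALTER TABLE product ADD UNIQUE INDEX uniq_product_code (code);
$stmt = $pdo->prepare(
    "INSERT INTO product (code, value) VALUES (:code, :value)
     ON DUPLICATE KEY UPDATE value = VALUES(value)"
);
$pdo->beginTransaction();
while (!feof($handle)) {
    $segment = explode('|', fgets($handle, 4096));
    if (strlen($segment[0]) > 6) {
        $stmt->execute([
            ':code'  => intval($segment[0]),
            ':value' => $segment[2],
        ]);
    }
}
$pdo->commit();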
I am trying to paginate the results since there are more than 1M rows. I am using the PHP client for BigQuery, but even if I set maxResults to 10 it still gives me all the results. Or am I doing it wrong?
function run_query_as_job($query, $maxResults = 10, $startIndex = 0)
{
    $options = [
        'maxResults' => $maxResults,
        'startIndex' => $startIndex
    ];
    $bigQuery = new BigQueryClient([
        'projectId' => 'xxx',
        'keyFilePath' => 'xxxx-21c721cefe2c.json'
    ]);
    $job = $bigQuery->runQueryAsJob(
        $query,
        ['jobConfig' => ['useLegacySql' => true]]
    );
    $backoff = new ExponentialBackoff(10);
    $backoff->execute(function () use ($job) {
        print('Waiting for job to complete' . PHP_EOL);
        $job->reload();
        if (!$job->isComplete()) {
            throw new Exception('Job has not yet completed', 500);
        }
    });
    $queryResults = $job->queryResults($options);
    print_r($options);
    if ($queryResults->isComplete()) {
        $i = 0;
        $rows = $queryResults->rows($options);
        foreach ($rows as $row) {
            printf('--- Row %s ---' . "<br>", ++$i);
        }
        printf('Found %s row(s)' . "<br>", $i);
    } else {
        throw new Exception('The query failed to complete');
    }
}
The QueryResults.rows() function returns a Google\Cloud\Core\Iterator\ItemIterator. This automatically gets the next page when you loop through it. Setting maxResults = 10 means that it only fetches 10 at a time, but it will still fetch all pages when you loop through.
You can manually access just the first page with
$rows = $queryResults->rows($options);
$firstPage = $rows->pageIterator->current();
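For example, to consume only that first page (assuming the pageIterator access above works in your installed client version):
$i = 0;
$rows = $queryResults->rows($options);
$firstPage = $rows->pageIterator->current();
foreach ($firstPage as $row) {
    // only the rows of the first page (at most $maxResults of them) are printed
    printf('--- Row %s ---' . "<br>", ++$i);
}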
This question has been asked many times, and I have tried a couple of approaches, but this time I am stuck since my requirement is a bit specific; none of the generic methods worked for me.
Details
File size = 75 MB
Total rows = 300,000
PHP code
protected $chunkSize = 500;

public function handle()
{
    try {
        set_time_limit(0);
        $file = Flag::where('imported', '=', '0')
            ->orderBy('created_at', 'DESC')
            ->first();
        $file_path = Config::get('filesystems.disks.local.root') . '/exceluploads/' . $file->file_name;
        // let's first count the total number of rows
        Excel::load($file_path, function ($reader) use ($file) {
            $objWorksheet = $reader->getActiveSheet();
            $file->total_rows = $objWorksheet->getHighestRow() - 1; // exclude the heading
            $file->save();
        });
        $chunkid = 0;
        // now let's import the rows, one by one, while keeping track of the progress
        Excel::filter('chunk')
            ->selectSheetsByIndex(0)
            ->load($file_path)
            ->chunk($this->chunkSize, function ($results) use ($file, $chunkid) {
                // let's do more processing (change values in cells) here as needed
                $counter = 0;
                $chunkid++;
                $output = new ConsoleOutput();
                $data = array();
                foreach ($results->toArray() as $row) {
                    $data[] = array(
                        'data'       => json_encode($row),
                        'created_at' => date('Y-m-d H:i:s'),
                        'updated_at' => date('Y-m-d H:i:s')
                    );
                    //$x->save();
                    $counter++;
                }
                DB::table('price_results')->insert($data);
                $file = $file->fresh(); // reload from the database
                $file->rows_imported = $file->rows_imported + $counter;
                $file->save();
                $countx = $file->rows_imported + $counter;
                echo "Rows Executed" . $countx . PHP_EOL;
            },
            false
        );
        $file->imported = 1;
        $file->save();
        echo "end of execution";
    } catch (\Exception $e) {
        dd($e->getMessage());
    }
}
The above code runs really fast for a 10,000-row CSV file, but when I upload a larger CSV it does not work.
My only restriction is that I have to use the following logic to transform each row of the CSV into key/value JSON data:
foreach ($results->toArray() as $row) {
    $data[] = array(
        'data'       => json_encode($row),
        'created_at' => date('Y-m-d H:i:s'),
        'updated_at' => date('Y-m-d H:i:s')
    );
    //$x->save();
    $counter++;
}
Any suggestions would be appreciated. It has been more than an hour now and still only 100,000 rows have been inserted, which I find really slow.
Database: PostgreSQL
I use the Google API PHP client library.
I want to delete rows.
I found this, but how can I use it? https://github.com/google/google-api-php-client-services/blob/master/src/Google/Service/Sheets/DeleteDimensionRequest.php
For example, this is how I add rows:
// Build the CellData array
$values = array();
foreach ($ary_values as $d) {
    $cellData = new Google_Service_Sheets_CellData();
    $value = new Google_Service_Sheets_ExtendedValue();
    $value->setStringValue($d);
    $cellData->setUserEnteredValue($value);
    $values[] = $cellData;
}
// Build the RowData
$rowData = new Google_Service_Sheets_RowData();
$rowData->setValues($values);
// Prepare the request
$append_request = new Google_Service_Sheets_AppendCellsRequest();
$append_request->setSheetId(0);
$append_request->setRows($rowData);
$append_request->setFields('userEnteredValue');
// Set the request
$request = new Google_Service_Sheets_Request();
$request->setAppendCells($append_request);
// Add the request to the requests array
$requests = array();
$requests[] = $request;
// Prepare the update
$batchUpdateRequest = new Google_Service_Sheets_BatchUpdateSpreadsheetRequest(array(
    'requests' => $requests
));
try {
    // Execute the request
    $response = $sheet_service->spreadsheets->batchUpdate($fileId, $batchUpdateRequest);
    if ($response->valid()) {
        // Success, the row has been added
        return true;
    }
} catch (Exception $e) {
    // Something went wrong
    error_log($e->getMessage());
    print_r($e->getMessage());
}
return false;
Please try this; it works for me.
$spid = '1lIUtn8WmN1Yhgxv68dt4rylm6rO77o8UX8uZu9PIu2o'; // <=== remember to update this to the ID of the workbook to be updated
$deleteOperation = array(
    'range' => array(
        'sheetId'    => 0,       // 0 means the very first sheet in the workbook
        'dimension'  => 'ROWS',
        'startIndex' => 2,       // zero-based, inclusive: the first row to delete
        'endIndex'   => 3        // exclusive: deleting stops before this index
    )
);
$deletable_row[] = new Google_Service_Sheets_Request(
    array('deleteDimension' => $deleteOperation)
);
Then send the request to Google to update your worksheet:
$delete_body = new Google_Service_Sheets_BatchUpdateSpreadsheetRequest(array(
    'requests' => $deletable_row
));
//var_dump($delete_body);
$result = $service->spreadsheets->batchUpdate($spid, $delete_body);
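Note that DimensionRange indexes are zero-based, startIndex is inclusive and endIndex is exclusive. So, for example, deleting only the 10th row of the first sheet would look like this:
$deletable_row[] = new Google_Service_Sheets_Request(array(
    'deleteDimension' => array(
        'range' => array(
            'sheetId'    => 0,
            'dimension'  => 'ROWS',
            'startIndex' => 9,   // zero-based, inclusive: the 10th row
            'endIndex'   => 10   // exclusive: only that single row is removed
        )
    )
));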
You can check here; this assisted me when I was working out the Google Sheets delete operation.
I hope this helps you. Thanks and keep in touch.
It might be a silly question, but I think it will be useful for new MongoDB users and not just myself.
I am currently working on a live chat script as a way to learn MongoDB. I am having trouble loading only the new messages into the chat, rather than loading all messages and replacing the old ones. This is what I have.
My PHP:
try {
    $m = new Mongo(); // connect
    $db = $m->selectDB("local");
} catch (MongoConnectionException $e) {
    echo '<p>Couldn\'t connect to mongodb, is the "mongo" process running?</p>';
    exit();
}
$collection = $db->selectCollection("message");
echo "" . $collection . " selected";
$cursor = $collection->find();
// iterate the cursor to display the documents
foreach ($cursor as $document) {
    echo $document["sender"] . ": " . $document["message"] . "<br>";
}
My JS:
function get_new(){
    var messages = $.ajax({
        type: "POST",
        url: "ajax/receive.php",
        async: false
    }).success(function(){
        setTimeout(function(){ get_new(); }, 10000);
    }).responseText;
    $('div.messages').html(messages);
}
My messages collection:
"createdAt" => $now,
"sender" => $sender,
"message" => $message
I know that my JS will eventually need to use .append(messages), but I really don't know what to do with my PHP. So how do I change my PHP to find only the new messages (those newer than the last set of messages)?
Found the answer: use sessions to store the time of the last update. The PHP code will look like this:
session_start(); // $_SESSION is only available once the session has been started
$now = date("Y-m-d H:i:s");
// fall back to "now" on the first request, when no timestamp is stored yet
$last = isset($_SESSION["last_message_ts"]) ? $_SESSION["last_message_ts"] : $now;
$session_last = new MongoDate(strtotime($last));
$session_now = new MongoDate(strtotime($now));
$_SESSION["last_message_ts"] = $now;
try {
    $m = new Mongo(); // connect
    $db = $m->selectDB("local");
} catch (MongoConnectionException $e) {
    echo '<p>Couldn\'t connect to mongodb, is the "mongo" process running?</p>';
    exit();
}
$collection = $db->selectCollection("message");
$cursor = $collection->find(array("createdAt" => array('$gt' => $session_last, '$lte' => $session_now)));
// iterate the cursor and display only the messages created since the last request
foreach ($cursor as $document) {
    echo $document["sender"] . ": " . $document["message"];
}