This question has been asked many times, and I have tried a couple of approaches, but this time I am stuck since my requirement is a bit specific. None of the generic methods worked for me.
Details
File Size = 75MB
Total Rows = 300000
PHP Code
protected $chunkSize = 500;
public function handle()
{
try {
set_time_limit(0);
$file = Flag::where('imported','=','0')
->orderBy('created_at', 'DESC')
->first();
$file_path = Config::get('filesystems.disks.local.root') . '/exceluploads/' .$file->file_name;
// let's first count the total number of rows
Excel::load($file_path, function($reader) use($file) {
$objWorksheet = $reader->getActiveSheet();
$file->total_rows = $objWorksheet->getHighestRow() - 1; //exclude the heading
$file->save();
});
$chunkid=0;
//now let's import the rows, one by one while keeping track of the progress
Excel::filter('chunk')
->selectSheetsByIndex(0)
->load($file_path)
->chunk($this->chunkSize, function($results) use ($file, &$chunkid) { // capture $chunkid by reference so the increment persists across chunks
//let's do more processing (change values in cells) here as needed
$counter = 0;
$chunkid++;
$output = new ConsoleOutput();
$data =array();
foreach ($results->toArray() as $row)
{
$data[] = array(
'data'=> json_encode($row),
'created_at'=>date('Y-m-d H:i:s'),
'updated_at'=> date('Y-m-d H:i:s')
);
//$x->save();
$counter++;
}
DB::table('price_results')->insert($data);
$file = $file->fresh(); //reload from the database
$file->rows_imported = $file->rows_imported + $counter;
$file->save();
$countx = $file->rows_imported; // rows_imported already includes this chunk's $counter
echo "Rows Executed: " . $countx . PHP_EOL;
},
false
);
$file->imported =1;
$file->save();
echo "end of execution";
}
catch(\Exception $e)
{
dd($e->getMessage());
}
}
The above code runs really fast for a 10,000-row CSV file, but when I upload a larger CSV it does not work.
My only restriction here is that I have to use the following logic to transform each row of the CSV into key/value JSON data:
foreach ($results->toArray() as $row)
{
$data[] = array(
'data'=> json_encode($row),
'created_at'=>date('Y-m-d H:i:s'),
'updated_at'=> date('Y-m-d H:i:s')
);
//$x->save();
$counter++;
}
Any suggestions would be appreciated. It has been more than an hour now and still only 100,000 rows have been inserted, which I find really slow.
Database: PostgreSQL
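For reference, a minimal sketch of one adjustment that sometimes helps on the insert side (an assumption about this workload, not a tested fix, and it keeps the required json_encode transform): read larger chunks from the file, then insert inside a transaction in slices of roughly 1,000 rows so each INSERT stays a reasonable size for PostgreSQL:

use Illuminate\Support\Facades\DB;

protected $chunkSize = 5000; // read more rows per chunk than 500

// Inside the chunk() callback, after $data has been built:
DB::transaction(function () use ($data) {
    foreach (array_chunk($data, 1000) as $batch) {
        // one multi-row INSERT per ~1,000 rows, all inside a single transaction
        DB::table('price_results')->insert($batch);
    }
});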
Currently I am using Google BigQuery to match two tables and export the query result as a CSV file in GCS.
I can't find a way to directly export the query result as a CSV in GCS through the API, although this is readily available in the Google Cloud console.
So I am creating a table dynamically, with the same column names and types as the query output, and inserting the records into that table. However, as soon as I do this, a streaming buffer is created before the actual table is populated with data, and it takes around 90 minutes for the table to mature.
If I export that table (while the streaming buffer is still active) as a CSV to GCS, sometimes the exported file contains only the column header and no row data.
Is there a way to overcome this situation?
I am using the Google Cloud PHP API for my code.
Below is the sample working code.
1. We create two tables with schemas dynamically:
$bigQuery = new BigQueryClient();
try{
$res = $bigQuery->dataset(DATASET)->createTable($owner_table, [
'schema' => [
'fields' => $ownerFields
]
]);
}
catch (Exception $e){
//echo 'Message: ' .$e->getMessage();
//die();
$custom_error_message = "Error while creating owner table schema. Trying to create with ".$ownerFields;
sendErrorEmail($e->getMessage(), $custom_error_message);
return false;
}
2. Import the two CSVs from GCS into the created tables:
$bigQuery = new BigQueryClient([
'projectId' => $projectId,
]);
$dataset = $bigQuery->dataset($datasetId);
$table = $dataset->table($tableId);
// load the storage object
$storage = new StorageClient([
'projectId' => $projectId,
]);
$object = $storage->bucket($bucketName)->object($objectName);
// create the import job
$job = $table->loadFromStorage($object, $options);
// poll the job until it is complete
$backoff = new ExponentialBackoff(10);
$backoff->execute(function () use ($job) {
//print('Waiting for job to complete' . PHP_EOL);
$job->reload();
if (!$job->isComplete()) {
//throw new Exception('Job has not yet completed', 500);
// sendErrorEmail("Job has not yet completed", "Error while import from storage.");
}
});
3. Run a BigQuery query to find matches between the two tables:
{
//Code omitted for brevity
}
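For context, a rough sketch of what this omitted step might look like with the google-cloud-php query API (the SQL is only a placeholder, not our real matching query), so that the $job used in step 4 below has a concrete origin:

// Sketch only: build and start the matching query as a job.
$sql = 'SELECT o.*, t.matched_column
        FROM `my_dataset.owner_table` o
        JOIN `my_dataset.other_table` t ON o.some_id = t.some_id'; // placeholder SQL
$queryJobConfig = $bigQuery->query($sql)->useLegacySql(false);
$job = $bigQuery->startQuery($queryJobConfig);
// Wait for the query job, reusing the same backoff pattern as above.
$backoff = new ExponentialBackoff(10);
$backoff->execute(function () use ($job) {
    $job->reload();
    if (!$job->isComplete()) {
        throw new Exception('Job has not yet completed', 500);
    }
});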
4. BigQuery returns the result as an array in PHP. We then create a third table and insert the query result set into it:
$queryResults = $job->queryResults();
if ($queryResults->isComplete()) {
$i = 0;
$rows = $queryResults->rows();
$dataRows = [];
foreach ($rows as $row) {
//pr($row); die();
// printf('--- Row %s ---' . PHP_EOL, ++$i);
++$i;
$inlineRow = [];
$arr = []; // reset for each row so values from the previous row don't carry over
$inlineRow['insertId'] = $i;
foreach ($row as $column => $value) {
// printf('%s: %s' . PHP_EOL, $column, $value);
$arr[$column] = $value;
}
$inlineRow['data']= $arr;
$dataRows[] = $inlineRow;
}
/* Create a new result table to store the query result*/
if(createResultTable($schema,$type))
{
if(count($dataRows) > 0)
{
sleep(120); //force sleep to mature the result table
$ownerTable = $bigQuery->dataset(DATASET)->table($tableId);
$insertResponse = $ownerTable->insertRows($dataRows);
if (!$insertResponse->isSuccessful()) {
//print_r($insertResponse->failedRows());
sendErrorEmail($insertResponse->failedRows(),"Failed to create resulted table for".$type);
return 0;
}
else
{
return 1;
}
}
else
{
return 2;
}
}
else
{
return 0;
}
//printf('Found %s row(s)' . PHP_EOL, $i);
}
5. We try to export the third table, which contains the result set, to GCS:
function export_table($projectId, $datasetId, $tableId, $bucketName, $objectName, $format = 'CSV')
{
$bigQuery = new BigQueryClient([
'projectId' => $projectId,
]);
$dataset = $bigQuery->dataset($datasetId);
$table = $dataset->table($tableId);
// load the storage object
$storage = new StorageClient([
'projectId' => $projectId,
]);
$destinationObject = $storage->bucket($bucketName)->object($objectName);
// create the export job
$options = ['jobConfig' => ['destinationFormat' => $format]];
$job = $table->export($destinationObject, $options);
// poll the job until it is complete
$backoff = new ExponentialBackoff(10);
$backoff->execute(function () use ($job) {
//print('Waiting for job to complete' . PHP_EOL);
$job->reload();
if (!$job->isComplete()) {
//throw new Exception('Job has not yet completed', 500);
return false;
}
});
// check if the job has errors
if (isset($job->info()['status']['errorResult'])) {
$error = $job->info()['status']['errorResult']['message'];
//printf('Error running job: %s' . PHP_EOL, $error);
sendErrorEmail($job->info()['status'], "Error while exporting resulted table. File Name: ".$objectName);
return false;
} else {
return true;
}
}
The problem is in #5. Because the third table created in #4 is still in the streaming buffer, the export job does not work reliably: sometimes it exports the table correctly and sometimes it does not. We have tried sleeping for about 120 seconds between #4 and #5, but the problem remains.
I have seen that a table populated dynamically via the API stays in the streaming buffer for about 90 minutes, but that is far too long for us.
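One way around the streaming buffer (only a sketch, assuming the QueryJobConfiguration API of newer google/cloud-bigquery releases; table names are placeholders) is to let the query job in step #3 write its result directly into the third table via a destination table configuration, instead of re-inserting the rows with insertRows(). Rows written by a query job are not streamed, so the table should be exportable to GCS as soon as the job completes, without the sleep(120):

$bigQuery = new BigQueryClient(['projectId' => $projectId]);
$dataset  = $bigQuery->dataset(DATASET);
// Placeholder SQL: the real matching query from step #3 goes here.
$sql = 'SELECT o.*, t.matched_column FROM `my_dataset.owner_table` o JOIN `my_dataset.other_table` t ON o.some_id = t.some_id';
$queryJobConfig = $bigQuery->query($sql)
    ->destinationTable($dataset->table($resultTableId)) // the job creates/overwrites the result table itself
    ->writeDisposition('WRITE_TRUNCATE')
    ->useLegacySql(false);
$job = $bigQuery->startQuery($queryJobConfig);
// Same polling pattern as in export_table().
$backoff = new ExponentialBackoff(10);
$backoff->execute(function () use ($job) {
    $job->reload();
    if (!$job->isComplete()) {
        throw new Exception('Job has not yet completed', 500);
    }
});
// The result table now contains ordinary (non-streamed) rows, so export_table() can run immediately.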
I have a CSV file that contains numerous contact rows, and these rows are saved to my database table. Now I want to detect empty rows in the CSV uploaded by the customer.
Here is an example CSV with some empty rows.
In the CSV above, the 3rd and 6th rows are empty, so I want to find the numbers of these empty rows and reject the CSV with an error.
Here is my CSV code:
$filename = $_FILES["csv_file"]["tmp_name"];
if ($_FILES["csv_file"]["size"] > 0) {
$file = fopen($filename, "r");
$importdata = fgetcsv($file, 10000, ",");
$counter = 1;
while (!feof($file)) {
if ($counter > 1) {
$row = fgetcsv($file);
if ($row !== false) { // fgetcsv() can return false on the final iteration
$alldata[] = $row;
}
}
$counter++;
}
fclose($file);
$csvfieldcounter = 1;
foreach ($alldata as $importdata) {
$userdata = $this->session->userdata();
$userId = $userdata['id'];
$status = 'Y';
if ($importdata[4] == 'Disable' || $importdata[4] == 'disable')
$status = 'N';
else if ($importdata[4] == 'Enable' || $importdata[4] == 'enable')
$status = 'Y';
$data = array(
'customer_name' => $importdata[0],
'customer_email' => $importdata[1],
'customer_mobile' => $importdata[2],
'birth_date' => $importdata[3],
'status' => $status,
'user_id' => $userId,
'cat_type' => $file_cat
);
if ($importdata[2]) {
$run = $this->db->insert('customer', $data);
$csvfieldcounter++;
$id = $this->db->insert_id();
}
}
$this->session->set_flashdata('csv_imported','Your CSV has been successfully imported.');
redirect('/customer', $csvfieldcounter);
}
I just need a little help to get this done. Your kind efforts would be appreciated. Thanks :)
I suggest you use the library mentioned below. It will help you address the above issue and also reduce the lines of code in your controller.
https://github.com/parsecsv/parsecsv-for-php
If you are not sure how to use it, check the code below:
$this->load->library('Parsecsv');
$csv = new Parsecsv($file);
$users = $csv->data;
$noOfUsers = count($users);
foreach($users as $user):
if(array_filter($user)): // skip rows where every field is empty
// Write your code
endif;
endforeach;
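If you prefer to stay with your existing fgetcsv() loop instead, a small sketch along these lines should work for collecting the empty row numbers and rejecting the file (the 'csv_error' flash key and the exact messages are just placeholders):

$file = fopen($filename, "r");
$lineNumber = 0;
$emptyRows = array();
while (($row = fgetcsv($file)) !== false) {
    $lineNumber++;
    // fgetcsv() returns array(null) for a completely blank line;
    // also treat rows whose every field is an empty string as empty.
    if ($row === array(null) || count(array_filter($row, 'strlen')) === 0) {
        $emptyRows[] = $lineNumber;
    }
}
fclose($file);

if (!empty($emptyRows)) {
    $this->session->set_flashdata('csv_error', 'Empty rows found at line(s): ' . implode(', ', $emptyRows));
    redirect('/customer');
    return; // stop before inserting anything
}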
I usually work with web hosting companies, but I decided to start learning how to work with servers to expand my knowledge.
I'll give a real example to explain my question best:
I have a web application that gathers data from a slow API that returns JSON data about products.
I have a function running at 1 AM every day that runs a lot of queries on the "id"s in my database.
Crontab:
0 1 * * * cd /var/www/html/tools; php index.php aso Cli_kas kas_alert
So this creates a process for the app (please correct me here if I'm wrong), and each process creates threads; to be more accurate, they are multi-threaded since they do more than one thing: pulling data from the DB to get the right variables and putting them into the API queries, getting the data from the API, organizing it, searching for the relevant data, and then inserting the new data into the database.
The main PHP functions:
// MAIN: Cron Job Function
public function kas_alert() {
// 0. Deletes all the saved data from the `data` table 1 month+ ago.
// $this->kas_model->clean_old_rows();
// 1. Get 'prod' table
$data['table'] = $this->kas_model->prod_table();
// 2. Go through each row -
foreach ( $data['table'] as $row ) {
// 2.2. Gets all vars from the first query.
$last_row_query = $this->kas_model->get_last_row_of_tag($row->tag_id);
$last_row = $last_row_query[0];
$l_aaa_id = $last_row->prod_aaa_id;
$l_and_id = $last_row->prod_bbb_id;
$l_r_aaa = $last_row->dat_data1_aaa;
$l_r_and = $last_row->dat_data1_bbb;
$l_t_aaa = $last_row->dat_data2_aaa;
$l_t_and = $last_row->dat_data2_bbb;
$tagword = $last_row->tag_word;
$tag_id = $last_row->tag_id;
$country = $last_row->kay_country;
$email = $last_row->u_email;
$prod_name = $last_row->prod_name;
// For the Weekly report:
$prod_id = $last_row->prod_id;
$today = date('Y-m-d');
// 2.3. Run the tagword query again for today on each one of the tags and insert to DB.
if ( ($l_aaa_id != 0) || ( !empty($l_aaa_id) ) ) {
$aaa_data_today = $this->get_data1_aaa_by_id_and_kw($l_aaa_id, $tagword, $country);
} else{
$aaa_data_today['data1'] = 0;
$aaa_data_today['data2'] = 0;
$aaa_data_today['data3'] = 0;
}
if ( ($l_and_id != 0) || ( !empty($l_and_id) ) ) {
$bbb_data_today = $this->get_data1_bbb_by_id_and_kw($l_and_id, $tagword, $country);
} else {
$bbb_data_today['data1'] = 0;
$bbb_data_today['data2'] = 0;
$bbb_data_today['data3'] = 0;
}
// 2.4. Insert the new variables to the "data" table.
if ($this->kas_model->insert_new_tag_to_db( $tag_id, $aaa_data_today['data1'], $bbb_data_today['data1'], $aaa_data_today['data2'], $bbb_data_today['data2'], $aaa_data_today['data3'], $bbb_data_today['data3']) ){
}
// Kas Alert Outputs ($SEND is echoed in it's original function)
echo "<h1>prod Name: $prod_id</h1>";
echo "<h2>tag id: $tag_id</h2>";
var_dump($aaa_data_today);
echo "aaa old: ";
echo $l_r_aaa;
echo "<br> aaa new: ";
echo $aaa_data_today['data1'];
var_dump($bbb_data_today);
echo "<br> bbb old: ";
echo $l_r_and;
echo "<br> bbb new: ";
echo $bbb_data_today['data1'];
// 2.5. Check if there is a need to send something
$send = $this->check_if_send($l_aaa_id, $l_and_id, $l_r_aaa, $aaa_data_today['data1'], $l_r_and, $bbb_data_today['data1']);
// 2.6. If there is a trigger, send the email!
if ($send) {
$this->send_mail($l_aaa_id, $l_and_id, $aaa_data_today['data1'], $bbb_data_today['data1'], $l_r_aaa, $l_r_and, $tagword, $email, $prod_name);
}
}
}
For #Raptor, this is the function that gets the API data:
// aaa tag Query
// Gets aaa prod dataing by ID.
public function get_data_aaa_by_id_and_tg($id, $tag, $query_country){
$tag_for_url = rawurlencode($tag);
$found = FALSE;
$i = 0;
$data = array();
// Create a stream for Json. That's how the code knows what to expect to get.
$context_opts = array(
'http' => array(
'method' => "GET",
'header' => "Accepts: application/json\r\n"
));
$context = stream_context_create($context_opts);
while ($found == FALSE) {
// aaa Query
$json_query_aaa = "https://api.example.com:443/aaa/ajax/research_tag?app_id=$id&term=$tag_for_url&page_index=$i&country=$query_country&auth_token=666";
// Get the Json
$json_query_aaa = file_get_contents($json_query_aaa, false, $context);
// Turn Json to a PHP array
$json_query_aaa = json_decode($json_query_aaa, true);
// Get the data2
$data2 = $json_query_aaa['tag']['data2'];
if (is_null($data2)){ $data2 = 0; }
// Get data3
$data3 = $json_query_aaa['tag']['phone_prod']['data3'];
if (is_null($data3)){ $data3 = 0; }
// Finally, the main prod array.
$json_query_aaa = $json_query_aaa['tag']['phone_prod']['app_list'];
if ( count($json_query_aaa) > 2 ) {
for ( $j=0; $j<count($json_query_aaa); $j++ ) {
if ( $json_query_aaa[$j]['id'] == $id ) {
$found = TRUE;
$data1 = $json_query_aaa[$j]['data'] + 1; // keep the scalar in its own variable so the $data result array isn't overwritten
break;
}
if ($found == TRUE){
break;
}
}
$i++;
} else {
$data1 = 0;
break;
}
}
$data['data1'] = $data1;
$data['data2'] = $data2;
$data['data3'] = $data3;
return $data;
}
All threads are stacked one after another, and only when one thread is done can the next one proceed, etc.
From a technical point of view, all threads wait in RAM until the one before them is done working "inside" the CPU (correct me if I'm wrong again :]).
This doesn't even "tickle" the server's RAM or CPU when I look at it in the process manager (I use htop): RAM is at 400M/4.25G and CPU at only 0.7%-1.3%.
This makes me feel that this isn't the best I can get from my current server, and I am getting slow results from my web app.
How do I get things done in a way that all the work runs in parallel, but not to the point that my app crashes due to a lack of CPU or RAM?
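For what it's worth, one common way to get this kind of parallelism from a cron-driven CLI script (a rough sketch only; the offset/limit arguments, the worker count, and the kas_alert_range method are assumptions, not part of the existing code) is to split the prod table rows into ranges and launch one PHP process per range in the background:

// Launcher sketch: run this from cron instead of the single kas_alert call.
// Each worker processes a slice of the prod table, so the slow API calls
// happen in several processes at once instead of one long serial loop.
$totalRows  = 1000;  // e.g. SELECT COUNT(*) FROM prod
$workers    = 4;     // tune to what the API and the server can handle
$perWorker  = (int) ceil($totalRows / $workers);

for ($w = 0; $w < $workers; $w++) {
    $offset = $w * $perWorker;
    // "kas_alert_range" is a hypothetical variant of kas_alert() that
    // accepts an offset and a limit; the trailing "&" backgrounds the process.
    $cmd = sprintf(
        'cd /var/www/html/tools && php index.php aso Cli_kas kas_alert_range %d %d > /dev/null 2>&1 &',
        $offset,
        $perWorker
    );
    exec($cmd);
}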
I am trying to create a PHP (PHRETS) script that downloads all real estate listing information from a specific area and saves all of the listing data (a CSV file and photos) on my web server.
Note: A single listing may have up to 20 photos.
I am using PHRETS to retrieve MLS listing data, and it works great for creating a CSV of the data. However, I would like to modify this code to loop through each listing's photos and download them onto my web server with the following naming convention: MLSID-PHOTOID.
The error below is coming from the GetObject loop at the end of the code:
ERROR Message: HTTP Error 500 (Internal Server Error): An unexpected condition was encountered while the server was attempting to fulfill the request.
Thanks in advance!
<?php
$rets_login_url = "http://maxebrdi.rets.fnismls.com/Rets/FNISRETS.aspx/MAXEBRDI/login?&rets-version=rets/1.5";
$rets_username = "MyUser";
$rets_password = "MyPass";
// use http://retsmd.com to help determine the SystemName of the DateTime field which
// designates when a record was last modified
$rets_status_field = "L_StatusCatID";
$rets_city_field = "L_City";
// use http://retsmd.com to help determine the names of the classes you want to pull.
// these might be something like RE_1, RES, RESI, 1, etc.
//"RE_1"
$property_classes = array("LR_5");
// DateTime which is used to determine how far back to retrieve records.
// using a really old date so we can get everything
$listing_status = 1;
$listing_city = "OAKLAND";
//////////////////////////////
require_once("phrets.php");
// start rets connection
$rets = new phRETS;
echo "+ Connecting to {$rets_login_url} as {$rets_username}<br>\n";
$connect = $rets->Connect($rets_login_url, $rets_username, $rets_password);
if ($connect) {
echo " + Connected<br>\n";
}
else {
echo " + Not connected:<br>\n";
print_r($rets->Error());
exit;
}
foreach ($property_classes as $class) {
echo "+ Property:{$class}<br>\n";
$file_name = strtolower("property_{$class}.csv");
$fh = fopen($file_name, "w+");
$maxrows = true;
$offset = 1;
$limit = 1000;
$fields_order = array();
while ($maxrows) {
$query = "({$rets_status_field}={$listing_status}),({$rets_city_field}={$listing_city})";
// run RETS search
echo " + Query: {$query} Limit: {$limit} Offset: {$offset}<br>\n";
$search = $rets->SearchQuery("Property", $class, $query, array("Limit" => $limit, "Offset" => $offset, "Format" => "COMPACT", "Select" => "L_ListingID,L_Class,L_Type_,L_Status,L_AskingPrice,L_Keyword2,L_Keyword3,L_Keyword4,L_SquareFeet,L_Remarks,L_Address,L_City,L_State,LO1_OrganizationName,LA1_AgentLicenseID,LA1_UserFirstName,LA1_UserLastName,L_PictureCount", "Count" => 1));
if ($rets->NumRows() > 0) {
if ($offset == 1) {
// print filename headers as first line
$fields_order = $rets->SearchGetFields($search);
fputcsv($fh, $fields_order);
}
// process results
while ($record = $rets->FetchRow($search)) {
$this_record = array();
foreach ($fields_order as $fo) {
$this_record[] = $record[$fo];
}
fputcsv($fh, $this_record);
}
$offset = ($offset + $rets->NumRows());
}
$maxrows = $rets->IsMaxrowsReached();
echo " + Total found: {$rets->TotalRecordsFound()}<br>\n";
$rets->FreeResult($search);
}
fclose($fh);
echo " - done<br>\n";
}
/*
//This code needs to be fixed. Not sure how to include the listing array correctly.
$photos = $rets->GetObject("Property", "Photo", $record[ListingID], "*", 1);
foreach ($photos as $photo) {
$listing = $photo['Content-ID'];
$number = $photo['Object-ID'];
if ($photo['Success'] == true) {
echo "{$listing}'s #{$number} photo is at {$photo['Location']}\n";
file_put_contents("photos/{$photo['Content-ID']}-{$photo['Object-ID']}.jpg",
$photo['Data']);
}
else {
echo "Error ({$photo['Content-ID']}-{$photo['Object-ID']}
}
}
//ERROR Message: HTTP Error 500 (Internal Server Error): An unexpected condition was //encountered while the server was attempting to fulfill the request.
*/
echo "+ Disconnecting<br>\n";
$rets->Disconnect();
I modified your while loop that processes each record to also get the photos for that record. I also tested this on a RETS server.
while ($record = $rets->FetchRow($search)) {
$this_record = array();
foreach ($fields_order as $fo) {
if ($fo == 'L_ListingID') {
$photos = $rets->GetObject("Property", "Photo", $record[$fo], "*", 1);
foreach ($photos as $photo) {
if ($photo['Success'] == true) {
file_put_contents("photos/{$photo['Content-ID']}-{$photo['Object-ID']}.jpg", $photo['Data']);
}
}
}
$this_record[] = $record[$fo];
}
fputcsv($fh, $this_record);
}
One thing I found is you have ListingID on this line:
$photos = $rets->GetObject("Property", "Photo", $record[ListingID], "*", 1);
But in your search query you refer to it as L_ListingID, so maybe the line above should use L_ListingID too.
$search = $rets->SearchQuery("Property", $class, $query, array("Limit" => $limit, "Offset" => $offset, "Format" => "COMPACT", "Select" => "L_ListingID,L_Class,L_Type_,L_Status,L_AskingPrice,L_Keyword2,L_Keyword3,L_Keyword4,L_SquareFeet,L_Remarks,L_Address,L_City,L_State,LO1_OrganizationName,LA1_AgentLicenseID,LA1_UserFirstName,LA1_UserLastName,L_PictureCount", "Count" => 1));
What if the last parameter of GetObject is "0":
$photos = $rets->GetObject("Property", "Photo", $record[$fo], "*", 0);
In this case "1" means retrieving the Location (a URL); on some servers this is switched off and the actual image data is returned instead, I believe, which would require a different handling method, as sketched below.
I have been trying to export all of our invoices in a specific format for importing into Sage accounting. I have been unable to export via Dataflow, as I need to export the customer ID (which, strangely, is unavailable) and also a couple of static fields to denote tax codes, etc.
This has left me with the option of using the API to export the data and write it to a CSV. I have taken an example script I found (sorry, I can't remember where, in order to credit it...), made some amendments, and come up with the following:
<?php
$website = 'www.example.com';
$api_login = 'user';
$api_key ='password';
function magento_soap_array($website,$api_login,$api_key,$list_type,$extra_info){
$proxy = new SoapClient('http://'.$website.'/api/soap/?wsdl');
$sessionId = $proxy->login($api_login, $api_key);
$results = $proxy->call($sessionId,$list_type,1);
if($list_type == 'order_invoice.list'){
/*** INVOICES CSV EXPORT START ***/
$filename = "invoices.csv";
$data = "Type,Account Reference,Nominal A/C Ref,Date,Invoice No,Net Amount,Tax Code,Tax Amount\n";
foreach($results as $invoice){
foreach($invoice as $entry => $value){
if ($entry == "order_id"){
$orders = $proxy->call($sessionId,'sales_order.list',$value);
}
}
$type = "SI";
$nominal = "4600";
$format = 'Y-m-d H:i:s';
$date = DateTime::createFromFormat($format, $invoice['created_at']);
$invoicedOn = $date->format('d/m/Y');
$invoiceNo = $invoice['increment_id'];
$subtotal = $invoice['base_subtotal'];
$shipping = $invoice['base_shipping_amount'];
$net = $subtotal+$shipping;
$taxCode = "T1";
$taxAmount = $invoice['tax_amount'];
$orderNumber = $invoice['order_id'];
foreach($orders as $order){
if ($order['order_id'] == $orderNumber){
$accRef = $order['customer_id'];
}
}
$data .= "$type,$accRef,$nominal,$invoicedOn,$invoiceNo,$net,$taxCode,$taxAmount\n";
}
file_put_contents($_SERVER['DOCUMENT_ROOT']."/var/export/" . $filename, $data); // $data already begins with the header row
/*** INVOICES CSV EXPORT END ***/
}else{
echo "nothing to see here";
}/*** GENERIC PAGES END ***/
}/*** END function magento_soap_array ***/
if($_GET['p']=="1")
{
magento_soap_array($website,$api_login,$api_key,'customer.list','Customer List');
}
else if($_GET['p']=="2")
{
magento_soap_array($website,$api_login,$api_key,'order_creditmemo.list','Credit Note List');
}
else if($_GET['p']=="3")
{
magento_soap_array($website,$api_login,$api_key,'sales_order.list','Orders List');
}
else if($_GET['p']=="4")
{
magento_soap_array($website,$api_login,$api_key,'order_invoice.list','Invoice List');
}
?>
This seems to be working fine; however, it is VERY slow, and I can't help but think there must be a better, more efficient way of doing it…
Has anybody got any ideas?
Thanks
Marc
I think adding a break; would be okay, because there is only one order_id key, so there is no need to keep looping after it has been found:
if ($entry == "order_id"){
$orders = $proxy->call($sessionId,'sales_order.list',$value);
break;
}
And you can gather all the calls and execute them with multiCall, for example:
$client = new SoapClient('http://magentohost/soap/api/?wsdl');
// If somestuff requires api authentification,
// then get a session token
$session = $client->login('apiUser', 'apiKey');
$result = $client->call($session, 'somestuff.method');
$result = $client->call($session, 'somestuff.method', 'arg1');
$result = $client->call($session, 'somestuff.method', array('arg1', 'arg2', 'arg3'));
$result = $client->multiCall($session, array(
array('somestuff.method'),
array('somestuff.method', 'arg1'),
array('somestuff.method', array('arg1', 'arg2'))
));
// If you don't need the session anymore
$client->endSession($session);
source
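Applied to the invoice export above, a rough sketch of that idea (just a sketch; it assumes the order_id values from order_invoice.list can be passed as sales_order.list arguments inside a multiCall batch, exactly as they are passed in the per-invoice calls above):

// Collect one sales_order.list lookup per invoice instead of calling the API inside the loop.
$invoices = $proxy->call($sessionId, 'order_invoice.list', 1);
$calls = array();
foreach ($invoices as $invoice) {
    $calls[] = array('sales_order.list', $invoice['order_id']);
}
// Single round trip to the API instead of one HTTP request per invoice.
$orderResults = $proxy->multiCall($sessionId, $calls);
// $orderResults[$i] corresponds to $calls[$i]; index customer IDs by order_id
// so the CSV-building loop can look them up without further API calls.
$customerByOrder = array();
foreach ($orderResults as $orders) {
    foreach ($orders as $order) {
        $customerByOrder[$order['order_id']] = $order['customer_id'];
    }
}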