I am dealing with 700 rows of data in my Excel file.
In one column I add this entry:
foreach ($data as $k => $v) {
    $users->getCell('A'.$k)->setValue($v['User']);
    $users->setCellValueExplicit(
        'B'.$k,
        '=INDEX(\'Feed\'!H2:H'.$lastRow.',MATCH(A'.$k.',\'Feed\'!G2:G'.$lastRow.',0))',
        PHPExcel_Cell_DataType::TYPE_FORMULA
    );
}
$users refers to a spreadsheet object.
Writing 700 cells with setCellValueExplicit() as above takes more than 2 minutes to process; if I omit that line, the same machine processes the sheet in 4 seconds.
Two minutes might be acceptable, but what if I have 2000 cells? Is there any way this can be optimized for speed?
PS: =VLOOKUP is just as slow as the formula above.
Update
The whole idea of the script:
read a CSV file (13 columns and at least 100 rows), write it into a spreadsheet, create a new spreadsheet ($users), read two columns, sort them based on one column, and write them to the $users spreadsheet.
Read the columns:
$data = array();
for ($i = 1; $i <= $lastRow; $i++) {
    $user = $Feed->getCell('G'.$i)->getValue();
    $number = $Feed->getCell('H'.$i)->getValue();
    $row = array('User' => $user, 'Number' => $number);
    array_push($data, $row);
}
Sort the data
function cmpb($a, $b) {
    // get which value is less, or 0 if both are the same
    if ($a['Number'] > $b['Number']) {
        $cmpb = -1;
    } elseif ($a['Number'] < $b['Number']) {
        $cmpb = 1;
    } else {
        $cmpb = 0;
    }
    // if the numbers are the same, check the name
    if ($cmpb == 0) {
        // compare the name
        $cmpb = strcasecmp($a['User'], $b['User']);
    }
    return $cmpb;
}
usort($data, 'cmpb');
Write data
foreach ($data as $k => $v) {
    $users->getCell('A'.$k)->setValue($v['User']);
    $users->getCell("B{$k}")->setValueExplicit(
        "=INDEX('Feed'!H2:H{$lastRow},MATCH(A{$k},'Feed'!G2:G{$lastRow},0))",
        PHPExcel_Cell_DataType::TYPE_FORMULA
    );
}
and finally unset the data to free memory:
unset($data);
So if I comment out the line with setValueExplicit, everything runs much more smoothly.
Looking at PHPExcel's source code, this is the PHPExcel_Worksheet::setCellValueExplicitByColumnAndRow() function (setCellValueExplicit() itself delegates to the cell in the same way):
public function setCellValueExplicitByColumnAndRow($pColumn = 0, $pRow = 1, $pValue = null, $pDataType = PHPExcel_Cell_DataType::TYPE_STRING)
{
    return $this->getCell(PHPExcel_Cell::stringFromColumnIndex($pColumn) . $pRow)->setValueExplicit($pValue, $pDataType);
}
For the data type you're using, PHPExcel_Cell_DataType::TYPE_FORMULA, the PHPExcel_Cell::setValueExplicit function just executes:
case PHPExcel_Cell_DataType::TYPE_FORMULA:
    $this->_value = (string) $pValue;
    break;
I can't find a logical explanation for the hold-up in the execution of that particular instruction. Try replacing it with the following and let me know if there is any improvement:
$users ->getCell("B{$k}")->setValueExplicit("=INDEX('Feed'!H2:H{$lastRow},MATCH(A{$k},'Feed'!G2:G{$lastRow},0))", PHPExcel_Cell_DataType::TYPE_FORMULA);
As a last resort, my advice would be to time the execution of that instruction to find the bottleneck.
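For instance, a minimal timing sketch with microtime(), reusing the loop and variable names from the question, to see how much of the total time the formula writes actually account for:

// Sketch: accumulate the time spent in setValueExplicit() across the write loop
$formulaTime = 0.0;
foreach ($data as $k => $v) {
    $users->getCell('A'.$k)->setValue($v['User']);

    $t = microtime(true);
    $users->getCell("B{$k}")->setValueExplicit(
        "=INDEX('Feed'!H2:H{$lastRow},MATCH(A{$k},'Feed'!G2:G{$lastRow},0))",
        PHPExcel_Cell_DataType::TYPE_FORMULA
    );
    $formulaTime += microtime(true) - $t;
}
printf("Total time in setValueExplicit(): %.2f s\n", $formulaTime);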
I'm trying to get a large amount of data by grabbing it in chunks from the database and writing it to CSV. For some reason the code below is only writing the first chunk (2000 rows) to the CSV. I have my $chunk and $offset variables writing to a text file, and those go through the loop and write out the correct values. So why isn't $result = $this->db->get('tblProgram', $chunk, $offset)->result_array(); grabbing the next chunks?
Can you not run $this->db->get('tblProgram', $chunk, $offset)->result_array(); multiple times with different offsets? How else would I loop through the results?
I can confirm that I have more than 200k rows returned from the query, and that if I set $chunk to something different, I'm still only getting the first chunk returned.
//Get rows from tblTrees based on criteria set in options array in downloadable format
public function get_download_tree_data($options = array(), $rand = ""){
    $this->db->reset_query();
    $this->db->join('tblPlots', 'tblPlots.programID=tblProgram.pkProgramID');
    $this->db->join('tblTrees', 'tblTrees.treePlotID=tblPlots.id');
    $this->db->order_by('tblTrees.id', 'ASC');
    // $allResults=$this->db->count_all_results('tblProgram', false);
    $allResults = 200000;
    $offset = 0;
    $chunk = 2000;

    $treePath = $this->config->item('temp_path') . "$rand/trees.csv";
    $tree_handle = fopen($treePath, 'a');
    $tempPath = $this->config->item('temp_path') . "$rand/trees.txt";
    $temp_handle = fopen($tempPath, 'a');

    while ($offset < $allResults) {
        $temptxt = $chunk . " " . $offset . "\n";
        fwrite($temp_handle, $temptxt);

        $result = $this->db->get('tblProgram', $chunk, $offset)->result_array();
        foreach ($result as $row) {
            fputcsv($tree_handle, $row);
        }
        $offset = $offset + $chunk;
    }

    fclose($tree_handle);
    fclose($temp_handle);

    return array('resultCount' => $allResults);
}
https://github.com/bcit-ci/CodeIgniter/blob/develop/system/database/DB_query_builder.php
Looks like calling the get() method resets the query builder:
public function get($table = '', $limit = NULL, $offset = NULL)
{
    if ($table !== '')
    {
        $this->_track_aliases($table);
        $this->from($table);
    }

    if ( ! empty($limit))
    {
        $this->limit($limit, $offset);
    }

    $result = $this->query($this->_compile_select());
    $this->_reset_select();

    return $result;
}
I'd imagine this is the case for any version of CI: _reset_select() runs after every get(), so the join() and order_by() calls you set up before the loop only apply to the first chunk, and every later iteration queries tblProgram on its own.
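A straightforward fix is to re-apply the builder calls on every iteration; here is a sketch based only on the methods already used in the question:

// Sketch: rebuild the query each time, since get() calls _reset_select() afterwards
while ($offset < $allResults) {
    fwrite($temp_handle, $chunk . " " . $offset . "\n");

    // re-apply the joins and ordering that were wiped out by the previous get()
    $this->db->join('tblPlots', 'tblPlots.programID=tblProgram.pkProgramID');
    $this->db->join('tblTrees', 'tblTrees.treePlotID=tblPlots.id');
    $this->db->order_by('tblTrees.id', 'ASC');

    $result = $this->db->get('tblProgram', $chunk, $offset)->result_array();
    foreach ($result as $row) {
        fputcsv($tree_handle, $row);
    }
    $offset += $chunk;
}

Query builder caching (start_cache()/stop_cache()) can achieve the same effect if your CI version supports it.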
I'm reporting on appointment activity and have included a function to export the raw data behind the KPIs. This raw data is stored as a CSV and I need to check for potentially duplicate consultations that have been entered.
Each row of data is assigned a unique visit ID based on the patient's ID and the appointment ID. The raw data contains 30 columns, but the duplicate check only needs to be performed on 7 of them. I have imported the CSV and created an array as below for the first record, then appended the rest onto it.
$mds = array(
    $unique_visit_id => array(
        $appt_date,
        $dob,
        $site,
        $CCG,
        $GP,
        $appt_type,
        $treatment_scheme
    )
);
What I need is to scan the $mds array and return an array containing just the $unique_visit_id for any duplicate arrays.
E.g. if keys 1111, 2222 and 5555 all reference arrays that contain the same seven values, then I would need 2222 and 5555 returned.
I've tried searching but haven't come up with anything that works.
Thanks
This is what I've gone with. I'm still validating it (the data set is very big), but it seems to be functioning as expected so far:
$handle = fopen("../reports/mds_full_export.csv", "r");
$visits = array();
while($data = fgetcsv($handle,0,',','"') !== FALSE){
$key = $data['unique_visit_id'];
$value = $data['$appt_date'].$data['$dob'].$data['$site'].$data['$CCG'].$data['$GP'].$data['$appt_type'].$data['$treatment_scheme'];
$visits[$key] = $value;
}
$visits = asort($visits);
$previous = "";
$dupes = array();
foreach($visits as $id => $visit){
if(strcmp($previous, $visit) == 0){
$dupes[] = $id;
}
$previous = $visit;
}
return $dupes;
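An alternative that skips the sort entirely is a single pass that remembers the first id seen for each value signature and flags every later id as a duplicate (a sketch working on the $visits array built above):

// One-pass duplicate detection: keep the first id per signature, report the rest
$seen = array();   // signature => first unique_visit_id encountered
$dupes = array();
foreach ($visits as $id => $visit) {
    if (isset($seen[$visit])) {
        $dupes[] = $id;
    } else {
        $seen[$visit] = $id;
    }
}
// e.g. if 1111, 2222 and 5555 share the same seven values, $dupes holds 2222 and 5555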
My goal is to iterate over all rows in a specific ColumnFamily in a node.
Here is the PHP code (using my wrapper over phpcassa):
$ring = $cass_db->describe_ring();
foreach ($ring as $ring_details)
{
    $start_token = $ring_details->start_token;
    $end_token = $ring_details->end_token;

    if ($start_token != null && $end_token != null)
    {
        $i = 0;
        $batch_size = 10;
        $params = array(
            'token_start'  => $start_token,
            'token_finish' => $end_token,
            'row_count'    => $batch_size,
            'buffer_size'  => 1000
        );

        while ($batch = $cass_db->get_range_by_token('myColumnFamily', $params))
        {
            var_dump('Batch# '.$i);
            foreach ($batch as $row)
            {
                $row_key = $row[0];
                $row_values = $row[1];
                var_dump($row_key);
            }
            $i++;

            // Just to stop an infinite loop
            if ($i > 14)
            {
                die();
            }
        }
    }
}
get_range_by_token() uses default parameters that are overridden by $params.
In each batch I get the same 10 row keys.
How can I iterate over all existing rows in a large Cassandra DB?
I am not a PHP developer, so I may be misunderstanding something in your code. Also, you did not specify which Cassandra version you are using.
Iterating over all rows is generally done by starting and ending with an empty token and redefining the start token on each iteration. In your code I can't see where you redefine token_start between iterations. If you don't redefine it, you query Cassandra for the same token range every time and always get the same result set.
Your code should do something like this ...
start_token = '';
end_token = '';
page_size = 100;

while (rows = get_range_by_token('cf', start_token, end_token, page_size)) {
    // here I should get page_size rows (unless this is the last iteration
    // or the table holds fewer than page_size rows)
    start_token = rows[rows.size() - 1].getKey();   // restart from the last key seen
}
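Translated to the wrapper used in the question, that might look roughly like the sketch below; it assumes get_range_by_token() honours an updated 'token_start' in $params and that each row comes back as array($row_key, $row_values), as shown in the question:

// Sketch: move the start of the range forward to the last key seen in each batch
$params = array(
    'token_start'  => $start_token,
    'token_finish' => $end_token,
    'row_count'    => $batch_size,
    'buffer_size'  => 1000
);

while ($batch = $cass_db->get_range_by_token('myColumnFamily', $params)) {
    $last_key = null;
    foreach ($batch as $row) {
        $last_key = $row[0];
        // ... process $row[1] ...
    }
    if ($last_key === null) {
        break; // empty batch, nothing left to read
    }
    // restart the next batch from the last key we saw
    // (the next batch will repeat that row first, so skip duplicates if needed)
    $params['token_start'] = $last_key;
}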
HTH,
Carlo
I am facing a problem retrieving records in descending order, with a pagination limit, from Amazon DynamoDB, the way I would in MySQL.
Right now I am using the following script, but it gives an unordered list of records. I need the last inserted id to be on top.
$limit = 10;
$total = 0;
$start_key = null;
$params = array(
    'TableName' => 'event',
    'AttributesToGet' => array(
        'id', 'interactiondate', 'repname', 'totalamount', 'fooding', 'nonfooding',
        'pdfdocument', 'isMultiple', 'payment_mode', 'interaction_type', 'products',
        'programTitle', 'venue', 'workstepId', 'foodingOther', 'interaction_type_other'
    ),
    'ScanFilter' => array(
        'manufacturername' => array(
            "ComparisonOperator" => "EQ",
            "AttributeValueList" => array(array("S" => "$manufacturername"))
        )
    ),
    'Limit' => $limit
);
$itemsArray = array();
$finalItemsArray = array();
$finalCRMRecords = array();

do {
    if (!empty($start_key)) {
        $params['ExclusiveStartKey'] = $start_key->getArrayCopy();
    }

    $response = $this->Amazon->Dynamodb->scan($params);

    if ($response->status == 200) {
        $counter = (string) $response->body->Count;
        $total += $counter;

        foreach ($response->body->Items as $itemsArray) {
            $finalItemsArray[] = $itemsArray;
        }

        if ($total > $limit) {
            $i = 1;
            foreach ($response->body->Items as $items) {
                $finalItemsArray[] = $items;
                if ($i == $limit) {
                    $start_key = $items->id->{AmazonDynamoDB::TYPE_NUMBER}->to_array();
                    $finalCRMRecords['data'] = $finalItemsArray;
                    $finalCRMRecords['start_key'] = $start_key;
                    break;
                }
                $i++;
            }
        } elseif ($total < $limit) {
            $start_key = $response->body->LastEvaluatedKey->to_array();
        } else {
            $finalCRMRecords['data'] = $finalItemsArray;
            if ($response->body->LastEvaluatedKey) {
                $start_key = $response->body->LastEvaluatedKey->to_array();
                break;
            } else {
                $start_key = null;
            }
            $finalCRMRecords['start_key'] = $start_key;
        }
    }
} while ($start_key);
Regards
Sandeep Kumar Sinha
A Scan operation in DynamoDB cannot change the sorting of the returned items. Scan is also a pretty expensive operation, as it always requires reading the whole table.
If you want to take advantage of DynamoDB, here's one piece of advice:
Instead of searching for information, try to just find it.
That is, use lookups instead of scan/query operations to get the information you need.
As an example, suppose you have a table that stores Events. Store all events in that table, with their EventId as the HashKey. Then you can have a second table, EventLookups, to store lookups to EventIds. In the EventLookups table you could put an item like LookupId: LATEST-EVENT referencing some EventId: .... Every time you insert a new event, you can update the LATEST-EVENT entry to point to the newer event. Or use a SET to store the latest 50 EventIds in one item.
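A rough sketch of that pattern using the SDK 1.x-style client from the question; the table name, key names and request shapes here are illustrative assumptions rather than verbatim API usage:

// Illustrative sketch only: the EventLookups table, LookupId hash key and the exact
// request arrays are assumptions based on the SDK style used in the question.

// Whenever a new event is written, point the LATEST-EVENT lookup at it
$this->Amazon->Dynamodb->put_item(array(
    'TableName' => 'EventLookups',
    'Item' => array(
        'LookupId' => array(AmazonDynamoDB::TYPE_STRING => 'LATEST-EVENT'),
        'EventId'  => array(AmazonDynamoDB::TYPE_NUMBER => $newEventId),
    ),
));

// Later, fetch the latest event with a single lookup instead of a Scan
$lookup = $this->Amazon->Dynamodb->get_item(array(
    'TableName' => 'EventLookups',
    'Key' => array(
        'HashKeyElement' => array(AmazonDynamoDB::TYPE_STRING => 'LATEST-EVENT'),
    ),
));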
-mathias
I'm fairly new to Cassandra, but I have been making good progress so far.
$conn = new ConnectionPool('Cluster');
$User = new ColumnFamily($conn, 'User');

$index_exp = CassandraUtil::create_index_expression('email', 'John@dsaads.com');
$index_clause = CassandraUtil::create_index_clause(array($index_exp));
$rows = $User->get_indexed_slices($index_clause);

foreach ($rows as $key => $columns) {
    echo $columns['name']."<br />";
}
I'm using this type of query to get specific data from somebody's email address.
However, I now want to do two things:
Count every user in the database and display the number
List every user in the database with $columns['name']." ".$columns['email']
In MySQL I would just remove the WHERE clause from the SELECT query; however, I think it's a little bit more complicated here?
In Cassandra there's no easy way to count all of the rows. You basically have to scan everything. If this is something that you want to do often, you're doing it wrong. Example code:
$rows = $User->get_range("", "", 1000000);
$count = 0;
foreach ($rows as $row) {
    $count += 1;
}
The second answer is similar:
$rows = $User->get_range("", "", 1000000, null, array("name", "email"));
foreach ($rows as $key => $columns) {
    echo $columns["name"]." ".$columns["email"];
}
Tyler Hobbs gives a very nice example.
However, if you have many users, you do not want to iterate over all of them every time.
It is better to run this iteration once or twice per day and store the result in Cassandra or memcached/Redis.
I would also create a CF with a single row and put all usernames (or user keys) on that single row. Some consider this an odd practice and will not recommend it. Then you do:
$count = $cf->get_count($rowkey = 0);
Note that get_count() is a slow operation too, so you still need to cache its result.
If get_count() returns exactly 100, you will need to upgrade your phpcassa to the latest version.
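For example, the cached count could look something like this (a sketch assuming a Memcached instance is available alongside the $cf ColumnFamily from above):

// Sketch: serve the user count from cache, only falling back to the slow get_count()
$memcached = new Memcached();
$memcached->addServer('127.0.0.1', 11211);

$count = $memcached->get('user_count');
if ($count === false) {
    $count = $cf->get_count($rowkey = 0);         // expensive, so cache the result
    $memcached->set('user_count', $count, 3600);  // refresh at most once per hour
}
echo $count;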
About the second part: if you have fewer than 4000-5000 users, I would once again do something odd and put them on a single row as supercolumns. Then the read takes just one operation:
$users = $scf->get($rowkey = 0, new ColumnSlice("", "", 5000));
foreach ($users as $user) {
    echo $user["name"]." ".$user["email"];
}