My goal is to iterate over all rows in a specific ColumnFamily on a node.
Here is the PHP code (using my wrapper over phpcassa):
$ring = $cass_db->describe_ring();
foreach ($ring as $ring_details)
{
$start_token = $ring_details->start_token;
$end_token = $ring_details->end_token;
if ($start_token != null && $end_token != null)
{
$i = 0;
$batch_size = 10;
$params = array(
'token_start' => $start_token,
'token_finish' => $end_token,
'row_count' => $batch_size,
'buffer_size' => 1000
);
while ($batch = $cass_db->get_range_by_token('myColumnFamily', $params))
{
var_dump('Batch# '.$i);
foreach ($batch as $row)
{
$row_key = $row[0];
$row_values = $row[1];
var_dump($row_key);
}
$i++;
//Just to stop infinite loop
if ($i > 14)
{
die();
}
}
}
}
get_range_by_token() uses default parameters, overridden by $params.
In each batch I get the same 10 row keys.
How do I iterate over all existing rows in a large Cassandra DB?
I am not a PHP developer, so I may be misunderstanding something in your code. Also, you did not specify which Cassandra version you are using.
Iteration over all rows is generally done by starting and ending with an empty token and redefining the start token on each iteration. In your code I can't see where you redefine token_start in each iteration. If you don't redefine it, you're querying Cassandra every time for the same range of tokens, so you will always get the same result set.
Your code should do something like this ...
start_token = '';
end_token = '';
page_size = 100;
while (rows = get_range_by_token('cf', start_token, end_token, page_size)) {
    // here I should get page_size rows (unless this is the last iteration
    // or the table has fewer than page_size rows)
    start_token = rows[rows.size() - 1].getKey();
}
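In terms of your own wrapper, the loop could look roughly like this. This is an untested sketch: it assumes get_range_by_token() returns an array of [key, columns] pairs (as in your var_dump) and that the last key seen can be fed back as the next token_start; adjust it to whatever your wrapper actually accepts.
$params = array(
    'token_start'  => '',
    'token_finish' => '',
    'row_count'    => 100,
);
$last_key = null;
do {
    $batch = $cass_db->get_range_by_token('myColumnFamily', $params);
    if (empty($batch)) {
        break; // nothing left in the range
    }
    foreach ($batch as $row) {
        $row_key = $row[0];
        $row_values = $row[1];
        if ($row_key === $last_key) {
            continue; // each new batch starts with the previous batch's last row
        }
        // ... process $row_key / $row_values ...
        $last_key = $row_key;
    }
    $params['token_start'] = $last_key; // restart the next batch from the last key seen
} while (count($batch) >= $params['row_count']);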
HTH,
Carlo
So basically I'm trying to create a complex timetable, and I have these two methods that each perform a different check function for me:
Checks that I get a unique (non-clashing) allotment:
function tutorAllot($array,$check,$period){
//check for clashes and return non colliding allotment
shuffle($array);
$rKey = array_rand($array);
if(array_key_exists($array[$rKey]['teacher_id'], $check[$period])) {
return $this->tutorAllot($array,$check,$period);
}
return array($array[$rKey]['teacher_id'] => $array[$rKey]['subject_code']);
}
Checks that each subject does not appear more than twice in a day:
function checkDayLimit($data,$check){
//check double day limit
$max = 2;
$value = array_values($check);
$tempCount = array_count_values($data);
return (array_key_exists($value[0], $tempCount) && $tempCount[$value[0]] <= $max) ? true : false;
}
I'm calling the functions from a loop and populating the timetable array only if all conditions are satisfied:
$outerClass = array();
foreach ($value as $ky => $val) {
$innerClass = array(); $dayCount = array();
foreach ($periods[0] as $period => $periodData) {
$innerClass[$period] = array();
if(!($periodData == 'break')){
$return = $this->Schedule->tutorAllot($val,$clashCheck,$period);
if($return){
//check that the returned allocation hasn't reached the day limit
if($this->Schedule->checkDayLimit($dayCount,$return)){
$innerClass[$period] += $return;
$clashCheck[$period] += $return;
}else{
}
}
}else{
$innerClass[$period] = '';
}
}
//debug($innerClass);
$outerClass[$ky] = $innerClass;
}
My requirements
If checkDayLimit() returns false, I want to go back and call tutorAllot() again to pick a new value.
I need to do this without breaking the loop.
I was thinking maybe I could use a goto statement, but only when I'm out of options.
Is there a way I can achieve this without using a goto statement?
PHP v5.5.3, Ubuntu
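Roughly the flow I'm after, as a sketch only (capped so an impossible period can't spin forever), using the two methods above inside the period loop:
// sketch: re-pick until the allotment passes the day-limit check
$attempts = 0;
do {
    $return = $this->Schedule->tutorAllot($val, $clashCheck, $period);
    $attempts++;
} while (!$this->Schedule->checkDayLimit($dayCount, $return) && $attempts < 50);

if ($this->Schedule->checkDayLimit($dayCount, $return)) {
    // only commit an allotment that passed the check
    $innerClass[$period] += $return;
    $clashCheck[$period] += $return;
}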
Your architecture seems overly complex. Instead of
pick at random >> check limit >> if at limit, go to re-pick...
Why not incorporate both checks into a single function? It would:
Filter out data that is not eligible to be picked, and return an array of legitimate choices
Pick at random from the safe choices and return the pick
Addendum 1
I don't think there is any need for recursion. I would use array_filter to pass the data through a callback that returns true for eligible members and false for the rest, and then make a random selection from the filtered result.
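A rough sketch of that idea (the helper name and exact array shapes are my guesses based on the question's tutorAllot() and checkDayLimit(); it is not a drop-in replacement):
// filter first, then pick at random from what is left
function allotTutor(array $candidates, array $check, $period, array $dayCount) {
    $max = 2;
    $timesToday = array_count_values($dayCount); // how often each subject already appears today
    // keep only candidates that neither clash in this period nor hit the day limit
    $eligible = array_filter($candidates, function ($c) use ($check, $period, $timesToday, $max) {
        $clashes = array_key_exists($c['teacher_id'], $check[$period]);
        $count = isset($timesToday[$c['subject_code']]) ? $timesToday[$c['subject_code']] : 0;
        return !$clashes && $count < $max;
    });
    if (empty($eligible)) {
        return false; // nothing legal left for this period
    }
    $pick = $eligible[array_rand($eligible)]; // array_rand works on the preserved keys
    return array($pick['teacher_id'] => $pick['subject_code']);
}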
I have a 2-dimensional array. Each subarray consists of a number of options. I am trying to write a script which picks one of these options for each row. The chosen options have to be unique. An example:
$array = array(
1 => array(3,1),
2 => array(3),
3 => array(1,5,3),
);
With a solution:
$array = array(
1 => 1,
2 => 3,
3 => 5,
);
I have finished the script, but I am not sure if it is correct. This is my script; the description of what I am doing is in the comments.
function pickUnique($array){
//Count how many times each option appears
$counts = array();
foreach($array AS $options){
if(is_array($options)){
foreach($options AS $value){
//Add count
$counts[$value] = (isset($counts[$value]) ? $counts[$value]+1 : 1);
}
}
}
asort($counts);
$didChange = false;
foreach($counts AS $value => $count){
//Check one possible value, starting with the ones that appear the least amount of times
$key = null;
$scoreMin = null;
//Search each row with the value in it. Pick the row which has the lowest amount of other options
foreach($array AS $array_key => $array_options){
if(is_array($array_options)){
if(in_array($value,$array_options)){
//Get score
$score = count($array_options)-1;
if($scoreMin === null OR ($score < $scoreMin)){
//Store row with lowest amount of other options
$scoreMin = $score;
$key = $array_key;
}
}
}
}
if($key !== null){
//Store that we changed something while running this function
$didChange = true;
//Change to count array. This holds how many times each value appears.
foreach($array[$key] AS $delValue){
$counts[$delValue]--;
}
//Remove chosen value from other arrays
foreach($array AS $rowKey => $options){
if(is_array($options)){
if(in_array($value,$options)){
unset($array[$rowKey][array_search($value,$options)]);
}
}
}
//Set value
$array[$key] = $value;
}
}
//validate, check if every row is an integer
$success = true;
foreach($array AS $row){
if(is_array($row)){
$success = false;
break;
}
}
if(!$success AND $didChange){
//Not done, but we made changes this run so lets try again
$array = pickUnique($array);
}elseif(!$success){
//Not done and nothing happened this function run, give up.
return null;
}else{
//Done
return $array;
}
}
My main problem is that I have no way to verify whether this is correct. I am also quite sure this problem has been solved many times before, but I cannot seem to find it. The only way I can verify it (as far as I know) is by running the code many times on random arrays and stopping when it encounters an unsolvable array, then checking that case by hand. So far the results are good, but this way of verifying is of course not rigorous.
I hope somebody can help me, either with the solution, the name of the problem, or the verification method.
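One idea for checking small test cases exhaustively is a brute-force solver like the sketch below (only practical for small arrays): if it finds an assignment but pickUnique() returns null, the function above must be wrong.
// Brute-force checker (sketch): tries every combination of options and returns
// true if at least one assignment uses each chosen value only once.
function hasUniqueAssignment(array $array, array $used = array()) {
    if (empty($array)) {
        return true; // every row received a distinct value
    }
    $options = array_shift($array); // take the first remaining row
    foreach ($options as $value) {
        if (!in_array($value, $used)) {
            $used[] = $value;
            if (hasUniqueAssignment($array, $used)) {
                return true;
            }
            array_pop($used); // backtrack and try the next option
        }
    }
    return false; // no option of this row leads to a complete assignment
}
// usage with the example above:
// var_dump(hasUniqueAssignment(array(1 => array(3,1), 2 => array(3), 3 => array(1,5,3)))); // true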
I am facing a problem retrieving records in descending order, with a pagination limit, from Amazon DynamoDB, as I would in MySQL.
Right now I am using the following script, but it gives an unordered list of records. I need the last inserted id to be at the top.
$limit = 10;
$total = 0;
$start_key = null;
$params = array(
    'TableName' => 'event',
    'AttributesToGet' => array('id','interactiondate','repname','totalamount','fooding','nonfooding','pdfdocument','isMultiple','payment_mode','interaction_type','products','programTitle','venue','workstepId','foodingOther','interaction_type_other'),
    'ScanFilter' => array(
        'manufacturername' => array(
            "ComparisonOperator" => "EQ",
            "AttributeValueList" => array(array("S" => "$manufacturername"))
        )
    ),
    'Limit' => $limit
);
$itemsArray = array();
$finalItemsArray = array();
$finalCRMRecords = array();
do{
if(!empty($start_key)){
$params['ExclusiveStartKey'] = $start_key->getArrayCopy();
}
$response = $this->Amazon->Dynamodb->scan($params);
if ($response->status == 200) {
$counter = (string) $response->body->Count;
$total += $counter;
foreach($response->body->Items as $itemsArray){
$finalItemsArray[] = $itemsArray;
}
if($total>$limit){
$i =1;
foreach($response->body->Items as $items){
$finalItemsArray[] = $items;
if($i == $limit){
$start_key = $items->id->{AmazonDynamoDB::TYPE_NUMBER}->to_array();
$finalCRMRecords['data'] = $finalItemsArray;
$finalCRMRecords['start_key'] = $start_key;
break;
}
$i++;
}
}elseif($total<$limit){
$start_key = $response->body->LastEvaluatedKey->to_array();
}else{
$finalCRMRecords['data'] = $finalItemsArray;
if ($response->body->LastEvaluatedKey) {
$start_key =$response->body->LastEvaluatedKey->to_array();
break;
} else {
$start_key = null;
}
$finalCRMRecords['start_key'] = $start_key;
}
}
}while($start_key);
Regards
Sandeep Kumar Sinha
A Scan operation in DynamoDB cannot change the sorting of the returned items. Also, Scan is a pretty expensive operation, as it always has to read the whole table.
If you want to take advantage of DynamoDB, here's one advice:
Instead of looking for information, try to just find it.
In other words: use lookups instead of scan/query to get the information you need.
As an example, if you have a table that stores Events, just store all events in that table, with their EventId as hash key. Then you can have a second table, EventLookups, to store lookups to EventIds. In the EventLookups table you could put an Item like LookupId: LATEST-EVENT referencing some EventId: .... Every time you insert a new event, you can update the LATEST-EVENT entry to point to the newer Event. Or use a set to store the latest 50 EventIds in one Item.
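A rough sketch of that pattern with the AWS SDK for PHP DynamoDbClient (table, attribute names and values here are illustrative, not taken from the question):
require 'vendor/autoload.php';
use Aws\DynamoDb\DynamoDbClient;

$client = new DynamoDbClient(array('region' => 'us-east-1', 'version' => 'latest'));

// 1. write the event itself
$client->putItem(array(
    'TableName' => 'Events',
    'Item' => array(
        'EventId' => array('N' => '1001'),
        'Payload' => array('S' => 'event data'),
    ),
));

// 2. point the LATEST-EVENT lookup at it (overwrites the previous pointer)
$client->putItem(array(
    'TableName' => 'EventLookups',
    'Item' => array(
        'LookupId' => array('S' => 'LATEST-EVENT'),
        'EventId'  => array('N' => '1001'),
    ),
));

// later: fetch the latest event with two cheap GetItem calls instead of a Scan
$lookup = $client->getItem(array(
    'TableName' => 'EventLookups',
    'Key' => array('LookupId' => array('S' => 'LATEST-EVENT')),
));
$latestId = $lookup['Item']['EventId']['N'];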
-mathias
I am dealing with 700 rows of data in my Excel file.
And I add this entry in a column:
foreach($data as $k => $v){
$users ->getCell('A'.$k)->setValue($v['Username']);
$users->setCellValueExplicit('B'.$k,
'=INDEX(\'Feed\'!H2:H'.$lastRow.',MATCH(A'.$k.',\'Feed\'!G2:G'.$lastRow.',0))',
PHPExcel_Cell_DataType::TYPE_FORMULA);
}
$users stands for a spreadsheet.
I see that writing 700 cells with the above setCellValueExplicit() takes more than 2 minutes to process. If I omit that line, the same machine takes 4 seconds.
2 minutes can be OK, but what if I have 2000 cells? Is there any way this can be speed-optimized?
PS: =VLOOKUP is just as slow as the above function.
Update
The whole idea of the script:
Read a CSV file (13 columns and at least 100 rows), write it into a spreadsheet, create a new spreadsheet ($users), read two columns, sort them based on one column, and write them to the $users spreadsheet.
Read the columns:
$data = array();
for ($i = 1; $i <= $lastRow; $i++) {
$user = $Feed ->getCell('G'.$i)->getValue();
$number = $Feed ->getCell('H'.$i)->getValue();
$row = array('User' => $user, 'Number' => $number);
array_push($data, $row);
}
Sort the data
function cmpb($a,$b){
//get which string is less or 0 if both are the same
if($a['Number']>$b['Number']){
$cmpb = -1;
}elseif($a['Number']<$b['Number']){
$cmpb = 1;
}else{
$cmpb = 0;
}
//if the strings are the same, check name
if($cmpb == 0){
//compare the name
$cmpb = strcasecmp($a['User'], $b['User']);
}
return $cmpb;
}
usort($data, 'cmpb');
Write data
foreach($data as $k => $v){
$users->getCell('A'.$k)->setValue($v['User']); // key matches the 'User' field built above
$users ->getCell("B{$k}")->setValueExplicit("=INDEX('Feed'!H2:H{$lastRow},MATCH(A{$k},'Feed'!G2:G{$lastRow},0))",
PHPExcel_Cell_DataType::TYPE_FORMULA);
}
and also unset the data for memory:
unset($data);
So if I comment out the line with setValueExplicit, everything becomes smoother.
Looking at PHPExcel's source code, this is the PHPExcel_Worksheet::setCellValueExplicitByColumnAndRow() function:
public function setCellValueExplicitByColumnAndRow($pColumn = 0, $pRow = 1, $pValue = null, $pDataType = PHPExcel_Cell_DataType::TYPE_STRING)
{
return $this->getCell(PHPExcel_Cell::stringFromColumnIndex($pColumn) . $pRow)->setValueExplicit($pValue, $pDataType);
}
For the data type you're using, PHPExcel_Cell_DataType::TYPE_FORMULA, the PHPExcel_Cell::setValueExplicit function just executes:
case PHPExcel_Cell_DataType::TYPE_FORMULA:
$this->_value = (string)$pValue;
break;
I can't find a logical explanation for the hold-up in the execution of that particular instruction. Try replacing it with the following and let me know if there is any improvement:
$users ->getCell("B{$k}")->setValueExplicit("=INDEX('Feed'!H2:H{$lastRow},MATCH(A{$k},'Feed'!G2:G{$lastRow},0))", PHPExcel_Cell_DataType::TYPE_FORMULA);
As a last resort, my advice would be to track the execution time of the instruction to find the bottleneck.
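For instance, a quick way to time just that loop (simple sketch using microtime(), reusing the formula from your update):
$start = microtime(true);
foreach ($data as $k => $v) {
    $users->getCell("B{$k}")->setValueExplicit(
        "=INDEX('Feed'!H2:H{$lastRow},MATCH(A{$k},'Feed'!G2:G{$lastRow},0))",
        PHPExcel_Cell_DataType::TYPE_FORMULA
    );
}
echo 'setValueExplicit loop: ' . round(microtime(true) - $start, 2) . " seconds\n";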
I have a game script set up, and when it creates a new character I want it to find an empty address for that player's house.
The two relevant table fields it inserts are 'city' and 'number'. The 'city' is a random number out of 10, and the 'number' can be 1-250.
What it needs to do, though, is make sure there's not already an entry with the two random numbers it picks in the HOUSES table, and if there is, change the numbers. Repeat until it finds an 'address' not in use, then insert it.
I have a method set up to do this, but I know it's shoddy - there's probably a more logical and easier way. Any ideas?
UPDATE
Here's my current code:
$found = 0;
while ($found == 0) {
$num = (rand()%250)+1; $city = (rand()%10)+1;
$sql_result2 = mysql_query("SELECT * FROM houses WHERE city='$city' AND number='$num'", $db);
if (mysql_num_rows($sql_result2) == 0) { $found = 1; }
}
You can either do this in PHP, as you do now, or by using a MySQL trigger.
If you stick to the PHP way, then instead of generating new random numbers every time, do something like this:
$found = 0;
$cityarr = array();
$numberarr = array();
//create the cityarr
for($i=1; $i<=10; $i++)
    $cityarr[] = $i;
//create the numberarr
for($i=1; $i<=250; $i++)
    $numberarr[] = $i;
//shuffle the arrays
shuffle($cityarr);
shuffle($numberarr);
//iterate until you find n unused one
foreach($cityarr as $city) {
foreach($numberarr as $num) {
$sql_result2 = mysql_query("SELECT * FROM houses
WHERE city='$city' AND number='$num'", $db);
if (mysql_num_rows($sql_result2) == 0) {
$found = 1;
break;
}
}
if($found) break;
}
This way you don't check the same value more than once, and you still check randomly.
But you should really consider fetching all your records before the loops, so you only run one query. That would also improve performance a lot.
Like this:
$found = 0;
$taken = array();
for($i=1; $i<=10; $i++)
    $taken[$i] = array();
$records = mysql_query("SELECT * FROM houses", $db);
while($rec = mysql_fetch_assoc($records)) {
$taken[$rec['city']][] = $rec['number'];
}
for($i=1; $i<=10; $i++)
    $cityarr[] = $i;
for($i=1; $i<=250; $i++)
    $numberarr[] = $i;
foreach($cityarr as $city) {
foreach($numberarr as $num) {
if(!in_array($num, $taken[$city])) {
    $cityNotTaken = $city;
    $numberNotTaken = $num;
$found = 1;
break;
}
}
if($found) break;
}
echo 'City ' . $cityNotTaken . ' number ' . $numberNotTaken . ' is not taken!';
I would go with this method :-)
Doing it the way you describe can cause problems when there are only a couple of addresses left (or even one). It could take ages for the script to find an empty house.
What I recommend is inserting all 2500 records into the database (every combination of 1-10 and 1-250) and marking whether each one is empty or not (or creating a combo table linking users to houses) and matching on that.
With MySQL you can select a random empty entry from the database in no time!
Because it's only 2500 records, you can do ORDER BY RAND() LIMIT 1 to get a random row. I don't recommend this when you have many more records.
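For example, assuming an is_empty flag column as suggested above (the column name is illustrative), and keeping the mysql_* style of the question:
// pick a random free address in one query
$sql_result = mysql_query("SELECT city, number FROM houses WHERE is_empty = 1 ORDER BY RAND() LIMIT 1", $db);
$row = mysql_fetch_assoc($sql_result);
// $row['city'] and $row['number'] now hold a free address; mark it as taken
mysql_query("UPDATE houses SET is_empty = 0 WHERE city = '{$row['city']}' AND number = '{$row['number']}'", $db);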