Distribute options uniquely algorithm - php

I have a two-dimensional array. Each subarray consists of a number of options. I am trying to write a script that picks one of these options for each row. The chosen options have to be unique. An example:
$array = array(
1 => array(3,1),
2 => array(3),
3 => array(1,5,3),
);
With a solution:
$array = array(
1 => 1,
2 => 3,
3 => 5,
);
I have finished the script, but I am not sure whether it is correct. Here it is; the comments describe what I am doing.
function pickUnique($array){
//Count how many times each option appears
$counts = array();
foreach($array AS $options){
if(is_array($options)){
foreach($options AS $value){
//Add count
$counts[$value] = (isset($counts[$value]) ? $counts[$value]+1 : 1);
}
}
}
asort($counts);
$didChange = false;
foreach($counts AS $value => $count){
//Check one possible value, starting with the ones that appear the least amount of times
$key = null;
$scoreMin = null;
//Search each row with the value in it. Pick the row which has the lowest amount of other options
foreach($array AS $array_key => $array_options){
if(is_array($array_options)){
if(in_array($value,$array_options)){
//Score = number of other options in this row
$score = count($array_options)-1;
if($scoreMin === null OR ($score < $scoreMin)){
//Store row with lowest amount of other options
$scoreMin = $score;
$key = $array_key;
}
}
}
}
if($key !== null){
//Store that we changed something while running this function
$didChange = true;
//Change to count array. This holds how many times each value appears.
foreach($array[$key] AS $delValue){
$counts[$delValue]--;
}
//Remove chosen value from other arrays
foreach($array AS $rowKey => $options){
if(is_array($options)){
if(in_array($value,$options)){
unset($array[$rowKey][array_search($value,$options)]);
}
}
}
//Set value
$array[$key] = $value;
}
}
//validate, check if every row is an integer
$success = true;
foreach($array AS $row){
if(is_array($row)){
$success = false;
break;
}
}
if(!$success AND $didChange){
//Not done, but we made changes this run so let's try again.
//The recursive result must be returned, otherwise the caller receives null.
return pickUnique($array);
}elseif(!$success){
//Not done and nothing happened this function run, give up.
return null;
}else{
//Done
return $array;
}
}
My main problem is that I have no way to verify that this is correct. I am also quite sure this problem has been solved many times before, but I cannot seem to find it. The only way I can verify it (as far as I know) is by running the code many times on random arrays and stopping when it reports an unsolvable array, which I then check manually. So far the results are good, but this is of course not a proper way to verify it.
I hope somebody can help me, either with the solution, the name of the problem or the verification method.
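The task described here is usually called finding a system of distinct representatives, which is an instance of bipartite matching: rows on one side, option values on the other. A minimal sketch of the standard augmenting-path approach follows. It is not the script above but a separate reference implementation that the results of random test runs could be compared against. The function names pickUniqueMatching and tryAssign are made up for this illustration.

```php
<?php
// Hypothetical helper: tries to give row $row a value, moving earlier rows
// to other values along an augmenting path when that frees one up.
function tryAssign($row, array $options, array &$owner, array &$seen)
{
    foreach ($options[$row] as $value) {
        if (isset($seen[$value])) {
            continue; // this value was already explored in the current search
        }
        $seen[$value] = true;
        // Take the value if it is free, or if its current owner can switch.
        if (!isset($owner[$value]) || tryAssign($owner[$value], $options, $owner, $seen)) {
            $owner[$value] = $row;
            return true;
        }
    }
    return false;
}

// Returns array(row => chosen value), or null when no unique assignment exists.
function pickUniqueMatching(array $options)
{
    $owner = array(); // value => row currently holding it
    foreach (array_keys($options) as $row) {
        $seen = array();
        if (!tryAssign($row, $options, $owner, $seen)) {
            return null;
        }
    }
    $result = array();
    foreach ($owner as $value => $row) {
        $result[$row] = $value;
    }
    ksort($result);
    return $result;
}

$solution = pickUniqueMatching(array(
    1 => array(3, 1),
    2 => array(3),
    3 => array(1, 5, 3),
));
print_r($solution);

$unsolvable = pickUniqueMatching(array(1 => array(3), 2 => array(3)));
var_dump($unsolvable);
```

For the example array this yields 1 => 1, 2 => 3, 3 => 5, and it returns null when no unique assignment exists, for instance when two rows both offer only the value 3.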

Related

How to find the keys of duplicate entries in multidimensional array

I'm reporting on appointment activity and have included a function to export the raw data behind the KPIs. This raw data is stored as a CSV and I need to check for potentially duplicate consultations that have been entered.
Each row of data is assigned a unique visit ID based on the patient's ID and the appointment ID. The raw data contains 30 columns, and the duplicate check only needs to be performed on 7 of them. I have imported the CSV and built an array shaped as below: the first record is shown, and the rest are appended in the same way.
$mds = array(
$unique_visit_id => array(
$appt_date,
$dob,
$site,
$CCG,
$GP,
$appt_type,
$treatment_scheme
)
);
What I need is to scan the $mds array and return an array containing just the $unique_visit_id for any duplicate arrays.
For example, if keys 1111, 2222 and 5555 all reference arrays that hold the same seven values, I would need 2222 and 5555 returned.
I've tried searching but haven't come up with anything that works.
Thanks
This is what I've gone with. I'm still validating it (the data set is very big), but it seems to be functioning as expected so far:
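$handle = fopen("../reports/mds_full_export.csv", "r");
//Assumption: the first CSV row is a header row that contains these column names
$header = fgetcsv($handle, 0, ',', '"');
$columns = array('appt_date', 'dob', 'site', 'CCG', 'GP', 'appt_type', 'treatment_scheme');
$visits = array();
//The extra parentheses matter: !== binds tighter than =, so without them $data would be a boolean
while(($data = fgetcsv($handle, 0, ',', '"')) !== FALSE){
    //fgetcsv() returns a numerically indexed row, so map it onto the header names
    $row = array_combine($header, $data);
    $key = $row['unique_visit_id'];
    //Join the seven compared fields with a separator so "ab"."c" and "a"."bc" stay distinct
    $parts = array();
    foreach($columns as $column){
        $parts[] = $row[$column];
    }
    $visits[$key] = implode('|', $parts);
}
fclose($handle);
//asort() sorts in place and returns a boolean, so its result must not be assigned
asort($visits);
$previous = null;
$dupes = array();
foreach($visits as $id => $visit){
    //Equal neighbours in the sorted list are duplicates
    if($visit === $previous){
        $dupes[] = $id;
    }
    $previous = $visit;
}
return $dupes;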
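A sort-free variant of the same check can work directly on an array shaped like $mds from the question. The sketch below is an illustration only: the function name findDuplicateKeys and the sample values are invented, and it assumes the seven fields are stored in the same order in every row. It keeps the first key of each group and reports the later ones, which matches the 1111 / 2222 / 5555 example.

```php
<?php
// Hypothetical helper: returns the keys of every row that repeats an earlier row.
function findDuplicateKeys(array $mds)
{
    $firstSeen = array(); // fingerprint of the fields => key of the first row with it
    $dupes = array();
    foreach ($mds as $visitId => $fields) {
        // serialize() gives a string that is identical only for identical field lists
        $fingerprint = serialize(array_values($fields));
        if (isset($firstSeen[$fingerprint])) {
            $dupes[] = $visitId;
        } else {
            $firstSeen[$fingerprint] = $visitId;
        }
    }
    return $dupes;
}

// Invented sample data in the shape described in the question.
$same = array('2014-01-02', '1980-05-06', 'SiteA', 'CCG1', 'GP1', 'New', 'Scheme1');
$mds = array(
    1111 => $same,
    2222 => $same,
    3333 => array('2014-01-03', '1975-11-30', 'SiteB', 'CCG2', 'GP7', 'Follow-up', 'Scheme2'),
    5555 => $same,
);
$dupes = findDuplicateKeys($mds);
print_r($dupes);
```

Because each row is looked up in a hash map instead of being compared with its sorted neighbour, the order of the input decides which key counts as the original.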

php jumping to previous statement inside a loop

So basically I'm trying to create a complex timetable, and I have these two methods that each perform a different check for me.
The first checks that I have a unique array:
function tutorAllot($array,$check,$period){
//check for clashes and return non colliding allotment
shuffle($array);
$rKey = array_rand($array);
if(array_key_exists($array[$rKey]['teacher_id'], $check[$period])) {
return $this->tutorAllot($array,$check,$period);
}
return $tutor = array($array[$rKey]['teacher_id'] => $array[$rKey]['subject_code']);
}
The second checks that each subject does not appear more than twice in a day:
function checkDayLimit($data,$check){
//check double day limit
$max = 2;
$value = array_values($check);
$tempCount = array_count_values($data);
return (array_key_exists($value[0], $tempCount) && $tempCount[$value[0]] <= $max) ? true : false;
}
I'm calling the functions from a loop and populating the timetable array only if all conditions are satisfied:
$outerClass = array();
foreach ($value as $ky => $val) {
$innerClass = array(); $dayCount = array();
foreach ($periods[0] as $period => $periodData) {
$innerClass[$period] = array();
if(!($periodData == 'break')){
$return = $this->Schedule->tutorAllot($val,$clashCheck,$period);
if($return){
//check that the returned allocation hasnt reached day limit
if($this->Schedule->checkDayLimit($dayCount,$return)){
$innerClass[$period] += $return;
$clashCheck[$period] += $return;
}else{
}
}
}else{
$innerClass[$period] = '';
}
}
//debug($innerClass);
$outerClass[$ky] = $innerClass;
}
My requirements
If checkDayLimit returns false, I want to go back and call tutorAllot again to pick a new value.
I need to do this without breaking out of the loop.
I was thinking I could use a goto statement, but only once I'm out of other options.
Is there a way to achieve this without using goto?
PHP v5.5.3 Ubuntu
Your architecture seems overly complex. Instead of
pick at random >> check limit >> if at limit, go to re-pick...
Why not incorporate both checks into a single function? It would
Filter out data that is not eligible to be picked, and return an array of legitimate choices
Pick at random from the safe choices and return the pick
addendum 1
I don't think there is any need for recursion. I would use array_filter to pass the data through a function that returns true for eligible members and false for the rest. I would then take the result of array_filter and make a random selection from it.
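The filter-then-pick idea could look roughly like the sketch below. The function pickTutor is hypothetical; it assumes each candidate has teacher_id and subject_code keys as in the question, that $busyThisPeriod is keyed by teacher id, and that $dayCount is a flat list of subject codes already placed that day.

```php
<?php
// Hypothetical combined picker: filters out ineligible candidates first,
// then picks at random from what is left. Returns null when nothing is eligible.
function pickTutor(array $candidates, array $busyThisPeriod, array $dayCount, $max = 2)
{
    $subjectCounts = array_count_values($dayCount);
    $eligible = array_filter(
        $candidates,
        function ($candidate) use ($busyThisPeriod, $subjectCounts, $max) {
            $teacherFree = !array_key_exists($candidate['teacher_id'], $busyThisPeriod);
            $code = $candidate['subject_code'];
            $used = isset($subjectCounts[$code]) ? $subjectCounts[$code] : 0;
            // Eligible only if the teacher is free and the subject is below the day limit
            return $teacherFree && $used < $max;
        }
    );
    if (empty($eligible)) {
        return null; // the caller decides what an empty slot means
    }
    $pick = $eligible[array_rand($eligible)];
    return array($pick['teacher_id'] => $pick['subject_code']);
}

// Invented sample data: teacher 1 is busy and ENG already appears twice today.
$candidates = array(
    array('teacher_id' => 1, 'subject_code' => 'MATH'),
    array('teacher_id' => 2, 'subject_code' => 'ENG'),
    array('teacher_id' => 3, 'subject_code' => 'SCI'),
);
$allotment = pickTutor($candidates, array(1 => 'MATH'), array('ENG', 'ENG'));
print_r($allotment);
```

Since ineligible candidates never reach the random pick, there is nothing to retry, and a null return replaces the unbounded recursion that occurs when every candidate clashes.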

Row Iteration not working

My goal is to iterate over all rows in a specific ColumnFamily in a node.
Here is the php code (using my wrapper over phpcassa):
$ring = $cass_db->describe_ring();
foreach ($ring as $ring_details)
{
$start_token = $ring_details->start_token;
$end_token = $ring_details->end_token;
if ($start_token != null && $end_token != null)
{
$i = 0;
$batch_size = 10;
$params = array(
'token_start' => $start_token,
'token_finish' => $end_token,
'row_count' => $batch_size,
'buffer_size' => 1000
);
while ($batch = $cass_db->get_range_by_token('myColumnFamily', $params))
{
var_dump('Batch# '.$i);
foreach ($batch as $row)
{
$row_key = $row[0];
$row_values = $row[1];
var_dump($row_key);
}
$i++;
//Just to stop infinite loop
if ($i > 14)
{
die();
}
}
}
}
get_range_by_token() uses default parameters overwritten by $params.
In each batch I get the same 10 row keys.
How to iterate over all existing rows in a large Cassandra DB?
I am not a PHP developer, so I may misunderstand something in your code. Also, you did not specify which Cassandra version you are using.
Iterating over all rows is generally done by starting and ending with an empty token and redefining the start token in each iteration. In your code I can't see where you redefine token_start in each iteration. If you don't redefine it, you query Cassandra for the same range of tokens every time and will always get the same result set.
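Your code should do something like this (pseudocode):
start_token = '';
end_token = '';
page_size = 100;
while ( rows = get_range_by_token('cf', start_token, end_token, page_size) ) {
// here I should get page_size rows (unless I'm in the last iteration or the table holds fewer than page_size rows)
// redefine the start of the range from the last row of this page
start_token = rows[rows.size() - 1].getKey();
// a short page means there is nothing left to fetch
if (rows.size() < page_size) break;
}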
HTH,
Carlo
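The paging pattern from the answer can be tried out without a cluster. In the sketch below fetchPage is only a stand-in for the real range query: it reads from an in-memory array and is assumed to treat the start key as inclusive, which is why the first row of each later page is skipped. Neither function is part of phpcassa, and the page size is assumed to be at least 2.

```php
<?php
// Stand-in for the real range query (an assumption, not a phpcassa call):
// returns up to $limit rows whose key is >= $startKey, in key order.
function fetchPage(array $store, $startKey, $limit)
{
    $page = array();
    foreach ($store as $key => $value) {
        if ($startKey === '' || strcmp($key, $startKey) >= 0) {
            $page[$key] = $value;
            if (count($page) === $limit) {
                break;
            }
        }
    }
    return $page;
}

// Walks the whole store page by page and returns every key exactly once.
function iterateAll(array $store, $pageSize)
{
    ksort($store, SORT_STRING);
    $visited = array();
    $start = '';
    while (true) {
        $page = fetchPage($store, $start, $pageSize);
        foreach ($page as $key => $value) {
            if ($key === $start) {
                continue; // inclusive start: this row was handled on the previous page
            }
            $visited[] = $key;
        }
        if (count($page) < $pageSize) {
            break; // short page: nothing left to fetch
        }
        // Redefine the start of the range from the last row of this page
        end($page);
        $start = key($page);
    }
    return $visited;
}

$store = array('k03' => 'c', 'k01' => 'a', 'k05' => 'e', 'k02' => 'b', 'k04' => 'd');
$keys = iterateAll($store, 2);
print_r($keys);
```

The loop in the question never updates token_start, so it corresponds to calling fetchPage with the same $start on every pass.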

Optimizing setCellValueExplicit() in PHPExcel

I am dealing with 700 rows of data in my Excel file.
In one column I add this entry:
foreach($data as $k => $v){
$users ->getCell('A'.$k)->setValue($v['Username']);
$users->setCellValueExplicit('B'.$k,
'=INDEX(\'Feed\'!H2:H'.$lastRow.',MATCH(A'.$k.',\'Feed\'!G2:G'.$lastRow.',0))',
PHPExcel_Cell_DataType::TYPE_FORMULA);
}
$users stands for a spreadsheet.
Writing 700 cells with the above setCellValueExplicit() takes more than 2 minutes. If I omit that line, the same machine processes the sheet in 4 seconds.
2 minutes may be acceptable, but what if I have 2000 cells? Is there any way to speed this up?
PS: =VLOOKUP is just as slow as the above function.
Update
The whole idea of the script:
Read a CSV file (13 columns and at least 100 rows), write it into a spreadsheet, create a new spreadsheet ($users), read two columns, sort them by one column and write the result to the $users spreadsheet.
Read the columns:
$data = array();
for ($i = 1; $i <= $lastRow; $i++) {
$user = $Feed ->getCell('G'.$i)->getValue();
$number = $Feed ->getCell('H'.$i)->getValue();
$row = array('User' => $user, 'Number' => $number);
array_push($data, $row);
}
Sort the data
function cmpb($a,$b){
//get which string is less or 0 if both are the same
if($a['Number']>$b['Number']){
$cmpb = -1;
}elseif($a['Number']<$b['Number']){
$cmpb = 1;
}else{
$cmpb = 0;
}
//if the strings are the same, check name
if($cmpb == 0){
//compare the name
$cmpb = strcasecmp($a['User'], $b['User']);
}
return $cmpb;
}
usort($data, 'cmpb');
Write data
foreach($data as $k => $v){
$users ->getCell('A'.$k)->setValue($v['Username']);
$users ->getCell("B{$k}")->setValueExplicit("=INDEX('Feed'!H2:H{$lastRow},MATCH(A{$k},'Feed'!G2:G{$lastRow},0))",
PHPExcel_Cell_DataType::TYPE_FORMULA);
}
and also unset the data for memory:
unset($data);
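So if I comment out the line with setValueExplicit, everything becomes smoother.
Looking at PHPExcel's source code, this is the PHPExcel_Worksheet::setCellValueExplicitByColumnAndRow function; like setCellValueExplicit, it simply delegates to the cell's setValueExplicit: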
public function setCellValueExplicitByColumnAndRow($pColumn = 0, $pRow = 1, $pValue = null, $pDataType = PHPExcel_Cell_DataType::TYPE_STRING)
{
return $this->getCell(PHPExcel_Cell::stringFromColumnIndex($pColumn) . $pRow)->setValueExplicit($pValue, $pDataType);
}
For the data type you're using, PHPExcel_Cell_DataType::TYPE_FORMULA, the PHPExcel_Cell::setValueExplicit function just executes:
case PHPExcel_Cell_DataType::TYPE_FORMULA:
$this->_value = (string)$pValue;
break;
I can't find a logical explanation for the hold-up in the execution of that particular instruction. Try replacing it with the following and let me know whether there is any improvement:
$users ->getCell("B{$k}")->setValueExplicit("=INDEX('Feed'!H2:H{$lastRow},MATCH(A{$k},'Feed'!G2:G{$lastRow},0))", PHPExcel_Cell_DataType::TYPE_FORMULA);
As a last resort, my advice would be to time the execution of each instruction to find the bottleneck.
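Timing a step needs nothing beyond microtime(). The sketch below uses a hypothetical helper, timeCall, and a stand-in workload; in the real script the closure would wrap the loop that writes the formulas, and a second call would wrap the save step, so the two durations can be compared.

```php
<?php
// Hypothetical helper: runs $callback once and returns array(result, seconds elapsed).
function timeCall($callback)
{
    $start = microtime(true);
    $result = call_user_func($callback);
    return array($result, microtime(true) - $start);
}

// Stand-in workload of 700 iterations; replace the body with the code being measured.
list($sum, $seconds) = timeCall(function () {
    $total = 0;
    for ($i = 1; $i <= 700; $i++) {
        $total += $i;
    }
    return $total;
});
printf("sum=%d took %.6f s\n", $sum, $seconds);
```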

Parent Child Relationships PHP/MYSQL

I have a table like this:
id
name
parent_id
I then want to select certain rows based on their id, so something like this:
SELECT *
FROM TABLE
WHERE id IN ('1', '5', '8', '9', '35')
I want to, from this query, also show the parent/child relationship, like:
id parent
-----------
1 0
5 1
8 0
9 8
35 9
So the final output would look something like this:
1
--5
8
--9
----35
Do I do this outside of MySQL? I have tried using arrays but can't figure it out.
Or do I do it inside MySQL? I don't know how to do that either.
Here is what I was able to come up with, which seems to be working great.
PS-Sorry about the formatting, can't figure it out :( (fixed?)
I grab parent_id and id from MySQL and put them into an array where the keys are the ids and the values are the parents, so within the while loop over the MySQL result there is something like this: $testarray[$id] = $parent_id;
Then I run it through the functions below, and it orders it just how I need it.
function retrieveSubTree($parent, $myarray) {
$tempArray = $myarray;
$array = array();
//now we have our top level parent, lets put its children into an array, yea!
while ($child = array_search($parent, $tempArray)) {
unset($tempArray[$child]);
//now lets get all this guys children
if (in_array($child, $tempArray)) {
$array[$child] = retrieveSubTree($child, $tempArray);
} else {
$array[$child] = true;
}
}//end while
return (!empty($array)) ? $array : false;
}
function retrieveTree($myarray) {
$array = array();
$counter = 0;
foreach ($myarray as $key => $value) {
$child = $key;
$parent = $value;
//if this child is a parent of somebody else
if (in_array($child, $myarray) && $parent != '0') {
while ($myarray[$parent] != '' && $myarray[$parent] != '0') {
$newparent = $myarray[$parent];
$parent = $newparent;
}
if (!array_key_exists($parent, $array)) {
$array[$parent] = retrieveSubTree($parent, $myarray);
}
} else {
//now make sure they don't appear as some child
if (!array_key_exists($parent, $myarray)) {
//see if it is a parent of anybody
if (in_array($child, $myarray)) {
$array[$child] = retrieveSubTree($child, $myarray);
} else {
$array[$child] = true;
}
}//end if array key
}//end initial in array
}//end foreach
return (!empty($array) ? $array : false);
}
$test = array(
'1'=>'15',
'2'=>'1',
'3'=>'1',
'4'=>'0',
'5'=>'0',
'6'=>'4',
'7'=>'6',
'8'=>'7',
'9'=>'2',
'10'=>'9'
);
print_r(retrieveTree($test));
Without changing your table structure, this requires recursion, which MySQL versions before 8.0 do not support in queries (8.0 added recursive common table expressions). On an older server you'll have to do it elsewhere. You can write a recursive function in PHP that builds your array level by level, in the manner of a breadth-first search. It looks like you are using a parent_id of 0 to denote a top-level object. You can search over your results and add to your array every object whose parent is zero, which will give you an array with 1 and 8. Then you can recurse: find all the results with a parent of 1 and add them as a subarray of 1; then find all the results with a parent of 8 and add those as a subarray of 8. Continue doing this for each level until you've run out of results.
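A minimal sketch of that recursion, assuming the query result has already been loaded into an id => parent_id array as in the self-answer above. The function names groupChildren and renderTree are made up for this example, and a row counts as top-level when its parent is not among the selected ids.

```php
<?php
// Hypothetical helper: groups ids by their parent, array(parent_id => array(child ids)).
function groupChildren(array $parents)
{
    $childrenOf = array();
    foreach ($parents as $id => $parentId) {
        $childrenOf[$parentId][] = $id;
    }
    return $childrenOf;
}

// Recursively renders the given ids and their descendants, two dashes per level.
function renderTree(array $childrenOf, array $ids, $depth = 0)
{
    $lines = array();
    foreach ($ids as $id) {
        $lines[] = str_repeat('--', $depth) . $id;
        if (isset($childrenOf[$id])) {
            $lines = array_merge($lines, renderTree($childrenOf, $childrenOf[$id], $depth + 1));
        }
    }
    return $lines;
}

// id => parent_id, using the rows from the question
$parents = array(1 => 0, 5 => 1, 8 => 0, 9 => 8, 35 => 9);
$childrenOf = groupChildren($parents);

// Top-level rows: their parent is not one of the selected ids
$roots = array();
foreach ($parents as $id => $parentId) {
    if (!array_key_exists($parentId, $parents)) {
        $roots[] = $id;
    }
}
$lines = renderTree($childrenOf, $roots);
echo implode("\n", $lines), "\n";
```

Grouping the children once up front avoids rescanning the whole result set for every node, which is what repeated array_search calls end up doing.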
As other posters pointed out, you can do this natively in MySQL if you can change the table structure.
