Grouping N people randomly but evenly? (Name & Gender) - php

I am trying to create an algorithm to assign people to classrooms. The requirements for each class are:
Have at least 30 people and a maximum of 45
The names must not be homogeneous (e.g. classes 1-3 made up entirely of names starting with the letter "A", classes 4-5 of names starting with "B", etc.)
The genders are evenly distributed
If the classes are full, the remaining people are moved to a waiting list
My data has the columns Unique ID, Name, and Gender. I'm still new to algorithms, so I don't know where to start. Is it even possible? I am using PHP and my data is in a MySQL database.

Step 1
You need to get the data (all people) from the database:
$host = '***';
$user = '***';
$password = '***';
$database = '***';
// mysqli_connect_error() reports why the connection itself failed
$link = mysqli_connect($host, $user, $password, $database) or die("Error" . mysqli_connect_error());
$query = "SELECT * FROM people";
// Keep the result set in $result; Step 2 reads from it
$result = mysqli_query($link, $query) or die("Error" . mysqli_error($link));
mysqli_close($link);
Step 2
Convert the mysqli_result to an array and shuffle it.
$people = [];
foreach ($result as $person) {
$people[] = $person;
}
shuffle($people);
Step 3
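Here is the algorithm:
$count = count($people);
// Classes
$classes = [];
const MIN_SIZE = 30;
const MAX_SIZE = 45;
// Upper bound on the number of classes (every class at minimum size)
$maxSizeClass = $count / MIN_SIZE;
// Lower bound on the number of classes (every class at maximum size)
$minSizeClass = $count / MAX_SIZE;
// At least 1, so the division below is safe for an empty list
$countClasses = max(ceil($minSizeClass), floor($maxSizeClass), 1);
// Target number of people per class
$currentCountClass = $count / $countClasses;
$tmpClass = [];
foreach ($people as $person) {
    // Add the person first so nobody is dropped when a class fills up
    $tmpClass[] = $person;
    if (count($tmpClass) >= $currentCountClass) {
        $classes[] = $tmpClass;
        $tmpClass = [];
    }
}
// A leftover group that is large enough becomes its own class
if (count($tmpClass) >= MIN_SIZE) {
    $classes[] = $tmpClass;
    $tmpClass = [];
}
// Otherwise spread the leftover people over classes that still have room
foreach ($tmpClass as $index => $person) {
    foreach ($classes as &$class) {
        if (count($class) < MAX_SIZE) {
            $class[] = $person;
            // be careful, PHP7 is OK
            unset($tmpClass[$index]);
            continue 2;
        }
    }
}
unset($class); // break the reference left over from the loop
// persons awaiting distribution
$waitingQueue = $tmpClass;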
Step 4
Result is:
$waitingQueue - persons awaiting distribution
$classes - classes with persons
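The steps above shuffle the names but do not balance gender. One possible way to cover that requirement as well, as a rough sketch rather than a definitive implementation: shuffle, split the people by gender, then deal each group out round-robin. The function name assignClasses and the 'gender' array key are assumptions for illustration; adapt them to your own column names.

```php
<?php
// Hypothetical helper: $people is a list of rows that each have a 'gender' key.
function assignClasses(array $people, int $minSize = 30, int $maxSize = 45): array
{
    // As many classes as possible while each can still reach $minSize.
    $classCount = intdiv(count($people), $minSize);
    if ($classCount === 0) {
        // Not enough people for even one class.
        return ['classes' => [], 'waiting' => $people];
    }

    // Shuffle so names are mixed, then group by gender.
    shuffle($people);
    $byGender = [];
    foreach ($people as $person) {
        $byGender[$person['gender']][] = $person;
    }

    $classes = array_fill(0, $classCount, []);
    $waiting = [];
    $next = 0; // index of the class that receives the next person
    foreach ($byGender as $group) {
        foreach ($group as $person) {
            if (count($classes[$next]) >= $maxSize) {
                // Round-robin keeps sizes level, so one full class means all are full.
                $waiting[] = $person;
                continue;
            }
            $classes[$next][] = $person;
            $next = ($next + 1) % $classCount;
        }
    }

    return ['classes' => $classes, 'waiting' => $waiting];
}
```

Because consecutive people of the same gender go to consecutive classes, the number of people of each gender differs by at most one from class to class.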

$letters = array('a','b',....,'y','z');
foreach($letters as $letter){
$sql['male'] = "SELECT * FROM people_table WHERE person_name LIKE '".$letter."%' AND person_gender = 'male' ORDER BY person_name";
$sql['female'] = "SELECT * FROM people_table WHERE person_name LIKE '".$letter."%' AND person_gender = 'female' ORDER BY person_name";
foreach($sql as $key => $query){
$results[$key] = $connection->query($query);
// Collect every row for this letter and gender
while ($row = $results[$key]->fetch_array(MYSQLI_ASSOC)) {
$people[$letter][$key][] = $row;
}
}
}
Here we have the lists of people grouped by letter and by gender. Now we can loop over them and insert a man and a woman in pairs. If the count() of the list of pairs is less than 30, those people keep waiting. If it is bigger than 44 (pairs can't add up to 45 people, if I haven't misunderstood the question), save those 44 in a class array $class[$letter], in which you can see all classes by letter. To find out how many classes you have in total, you can use count($class); or, for the classes of a specific letter, count($class[$letter]);.
You can loop over the $letters array again, or just put the loop inside the foreach above to create the array of classes.
Inside the foreach($letters as $letter){}, at the end:
$she = count($people[$letter]['female'] ?? []);
$he = count($people[$letter]['male'] ?? []);
// Need at least 15 of each gender for a class of 30
if (!($she < 15 OR $he < 15)) {
    // Pair people up; the smaller group limits the number of pairs
    $pairs = min($she, $he);
    for ($i = 0; $i < $pairs; $i++) {
        // Even slots hold women, odd slots hold men, so no value is overwritten
        $class[$letter][2 * $i] = $people[$letter]['female'][$i];
        $class[$letter][2 * $i + 1] = $people[$letter]['male'][$i];
    }
}
If the condition of the outer if is false, a class with an even gender distribution cannot be created for this letter. You can loop again to make each class an entry of an array.

Related

MySQL Select Random from All but not double time

I want to show a random item selected with WHERE code = $Code.
If the query finds, say, 10 items, I want to show them one at a time: all 10 should appear in random order, each for 5 seconds, without showing the same item twice. Once all 10 have been shown, it should query the database again and start over.
I already tried putting everything in an array, but my skills are low :(
That's my attempt:
$result = $db->query( 'SELECT * FROM ads WHERE zipcode = "'.$zipcode.'";');
while($row = $result->fetch_assoc()) {
$data[] = $row;
}
$count = mysqli_num_rows($result);
for($i = 0; $i < $count; $i++) {
$random = rand(0, $count);
$count -= 1;
$result[] = $data[$random];
echo json_encode($data);
unset($data[$random]);
}
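One way the behaviour described above could be approached, as a hedged sketch only: shuffle the rows once per round and hand them out one at a time, reloading when the round is exhausted. The names nextAd, $queue and $loadAds are hypothetical; in a real page the queue would have to live somewhere that survives between requests (a session, for example), and the 5-second timer would be handled on the client.

```php
<?php
// Hypothetical helper: $queue holds the ads not yet shown in the current round;
// $loadAds is any callable returning all matching rows (e.g. a database query).
function nextAd(array &$queue, callable $loadAds): ?array
{
    if (count($queue) === 0) {
        // Round finished (or first call): reload and shuffle once.
        $queue = $loadAds();
        shuffle($queue);
    }
    // array_pop removes the returned item, so it cannot repeat within a round.
    return array_pop($queue);
}
```

Each call returns a different row until every row has been shown once; the next call starts a fresh, newly shuffled round.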

Is there a faster way than array_diff in PHP

I have a set of numbers from MySQL within the range 10 000 000 (8 digits) to 9 999 999 999 (10 digits). They are supposed to be consecutive, but some numbers are missing. I need to know which ones.
The range is huge. At first I was going to use PHP to do this:
//MySqli Select Query
$results = $mysqli->query("SELECT `OCLC Number` FROM `MARC Records by Number`");
$n_array = array();
while($row = $results->fetch_assoc()) {
$n_array[] = $row["OCLC Number"];
}
d($n_array);
foreach($n_array as $k => $val) {
print $val . " ";
}
/* 8 digits */
$counter = 10000000;
$master_array = array();
/* 10 digits */
while ($counter <= 9999999999 ) {
$master_array[] = $counter;
$counter++;
d($master_array);
}
d($master_array);
$missing_numbers_ar = array_diff ($master_array, $n_array);
d($missing_numbers_ar);
d() is a custom function akin to var_dump().
However, I just realized this would take a very long time. At the 15-minute mark, $master_array had been populated with only 4000 numbers.
How can I do this faster? MySQL-only and MySQL-and-PHP solutions are both welcome. If the optimal solution depends on how many numbers are missing, please let me know how. Thanks.
Your d() calls are probably the cause of the slowness; please remove them and make these small changes to your code:
while($row = $results->fetch_assoc()) {
$n_array[$row["OCLC Number"]] = 1;
}
and
$missing_numbers_ar = [];
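// Check every number from the first 8-digit value up to the last 10-digit value
for ($counter = 10000000; $counter <= 9999999999; $counter++) {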
if (empty($n_array[$counter])) {
$missing_numbers_ar[] = $counter;
}
}
I would be surprised if the following were still slow. I also just noticed that it is similar to @Hieu Vo's answer.
// Make sure the data is returned in order by adding
// an `ORDER BY ...` clause.
$results = $mysqli->query("SELECT `OCLC Number`
FROM `MARC Records by Number`
ORDER BY `OCLC Number`");
$n_array = array();
while($row = $results->fetch_assoc()) {
// Add the "OCLC Number" as a key to the array.
$n_array[$row["OCLC Number"]] = $row["OCLC Number"];
}
// assume the first array key is in fact correct
$i = key($n_array);
// get the last key, also assume it is not missing.
end($n_array);
$max = key($n_array);
// reset the array (should not be needed)
reset($n_array);
do {
if (!isset($n_array[$i])) {
echo 'Missing key:['.$i.']<br />';
// flush the data to the page as you go.
flush();
}
} while(++$i <= $max);
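If many numbers are missing, it may be cheaper to report gaps as ranges instead of testing every possible value. A minimal sketch of that idea, assuming the numbers arrive in ascending order without duplicates (for example via ORDER BY) and that PHP runs with 64-bit integers; the function name missingRanges is hypothetical:

```php
<?php
// Hypothetical helper: $sorted is an ascending list of distinct numbers
// (numeric strings from the database are cast to int).
function missingRanges(array $sorted): array
{
    $gaps = [];
    $previous = null;
    foreach ($sorted as $number) {
        $number = (int) $number;
        if ($previous !== null && $number > $previous + 1) {
            // Everything strictly between two neighbours is missing.
            $gaps[] = [$previous + 1, $number - 1];
        }
        $previous = $number;
    }
    return $gaps;
}
```

This makes a single pass over the existing rows, so the work grows with the number of stored records rather than with the size of the whole range.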

How can I write a query to select similar titles?

I would like to select the movies that have similar titles.
I found this, but it doesn't work; it returns nothing. I would like to get Toy Story 2, Toy Story 3 and others with a similar title, like Toy Soldiers, etc.
$title = "Toy Story";
$query = mysql_query("SELECT title, year, poster, LEVENSHTEIN_RATIO( ".$title.", title ) as textDiff FROM movies HAVING textDiff > 60");
I can compare strings in PHP with this function:
static public function string_compare($str_a, $str_b)
{
$length = strlen($str_a);
$length_b = strlen($str_b);
$i = 0;
$segmentcount = 0;
$segmentsinfo = array();
$segment = '';
while ($i < $length)
{
$char = substr($str_a, $i, 1);
if (strpos($str_b, $char) !== FALSE)
{
$segment = $segment.$char;
if (strpos($str_b, $segment) !== FALSE)
{
$segmentpos_a = $i - strlen($segment) + 1;
$segmentpos_b = strpos($str_b, $segment);
$positiondiff = abs($segmentpos_a - $segmentpos_b);
$posfactor = ($length - $positiondiff) / $length_b;
$lengthfactor = strlen($segment)/$length;
$segmentsinfo[$segmentcount] = array( 'segment' => $segment, 'score' => ($posfactor * $lengthfactor));
}
else
{
$segment = '';
$i--;
$segmentcount++;
}
}
else
{
$segment = '';
$segmentcount++;
}
$i++;
}
// PHP 5.3 lambda in array_map
$totalscore = array_sum(array_map(function($v) { return $v['score']; }, $segmentsinfo));
return $totalscore;
}
But how can I compare in a SELECT query or any other way?
You can use LIKE queries for that.
The following example returns all records from the customer table whose name ends with kh:
select * from customer where name like '%kh'
The following example returns all records from the customer table whose name starts with kh:
select * from customer where name like 'kh%'
The following example returns all records from the customer table whose name contains kh anywhere (for example as a middle word):
select * from customer where name like '%kh%'
If you want more specific records, add AND/OR conditions to your query.
I recommend reading this:
http://dev.mysql.com/doc/refman/5.7/en/string-comparison-functions.html#operator_like
I think you need to define how similar titles must be to count as a match.
But if you just want to search for titles containing any of the words, you could split your search string on whitespace and use the pieces in a REGEXP in your query:
$search_array = explode(" ", "Toy story");
$query = "SELECT title, year, poster FROM movies WHERE title REGEXP '".implode("|", $search_array)."'";
This would probably match a lot of rows, but you could make the regular expression more restrictive.
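If the broad query returns a manageable number of rows, the ranking by similarity could also be done in PHP afterwards. A rough sketch using PHP's built-in similar_text(); the function name similarTitles and the 60 percent threshold are assumptions chosen to mirror the question, not a recommendation:

```php
<?php
// Hypothetical helper: keeps the titles whose similarity to $needle
// (as computed by similar_text) is at least $minPercent.
function similarTitles(string $needle, array $titles, float $minPercent = 60.0): array
{
    $matches = [];
    foreach ($titles as $title) {
        // The third argument receives the similarity as a percentage.
        similar_text(strtolower($needle), strtolower($title), $percent);
        if ($percent >= $minPercent) {
            $matches[$title] = $percent;
        }
    }
    arsort($matches); // most similar first, keys (titles) preserved
    return $matches;
}
```

The result maps each matching title to its score, so the caller can display the best matches first or tighten the threshold.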

PHP Loop - dealing with non-sequential iterations

I have the following code - it produces a series of queries that are sent to a database:
$a = 'q';
$aa = 1;
$r = "$a$aa";
$q = 54;
while($aa <= $q){
$query .= "SELECT COUNT(". $r .") as Responses FROM tresults;";
$aa = $aa + 1;
$r = "$a$aa";
}
The issue I have is simple, within the database, the number is not sequential.
I have fields that go from q1 to q13 but then goes q14a, q14b, q14c, q14d and q14e and then from q15 to q54.
I've looked at continue but that's more for skipping iterations and hasn't helped me.
I'm struggling to adapt the above code to handle this non-sequential situation. Any ideas and suggestions welcomed.
I have fields that go from q1 to q13 but then goes q14a, q14b, q14c, q14d and q14e and then from q15 to q54.
for($i=1; $i<=54; ++$i) {
if($i != 14) {
echo 'q' . $i . "<br>";
}
else {
for($j='a'; $j<='e'; ++$j) {
echo 'q14' . $j . "<br>";
}
}
}
If you don’t need to execute the statements in order of numbering, then you could also just skip one in the first loop if the counter is 14, and then have a second loop (not nested into the first one), that does the q14s afterwards.
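The two-loop variant just described might look roughly like this. It only builds the list of column names; the $columns variable name is an assumption for illustration:

```php
<?php
$columns = [];
// First loop: q1 .. q54, skipping the plain q14 that does not exist.
for ($i = 1; $i <= 54; $i++) {
    if ($i === 14) {
        continue;
    }
    $columns[] = 'q' . $i;
}
// Second loop (not nested): q14a .. q14e are appended afterwards.
foreach (range('a', 'e') as $suffix) {
    $columns[] = 'q14' . $suffix;
}
```

The resulting array can then be fed to a foreach that builds one query per column.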
You could get the columns from the table and test to see if they start with q (or use a preg_match):
$result = query("DESCRIBE tresults");
while($row = fetch($result)) {
if(strpos($row['Field'], 'q') === 0) {
// Use the column name that was just read, not $r
$query .= "SELECT COUNT(". $row['Field'] .") as Responses FROM tresults;";
}
}
Or build the columns array and use it:
$columns = array('q1', 'q2', 'q54'); //etc...
foreach($columns as $r) {
$query .= "SELECT COUNT(". $r .") as Responses FROM tresults;";
}

K-means clustering: What's wrong? (PHP)

I was looking for a way to calculate dynamic market values in a soccer manager game. I asked this question here and got a very good answer from Alceu Costa.
I tried to code this algorithm (90 elements, 5 clusters), but it doesn't work correctly:
In the first iteration, a high percentage of the elements changes its cluster.
From the second iteration, all elements change their cluster.
Since the algorithm normally works until convergence (no element changes its cluster), it doesn't finish in my case.
So I set the end to the 15th iteration manually. You can see that it runs infinitely.
You can see the output of my algorithm here. What's wrong with it? Can you tell me why it doesn't work correctly?
I hope you can help me. Thank you very much in advance!
Here's the code:
<?php
include 'zzserver.php';
function distance($player1, $player2) {
global $strengthMax, $maxStrengthMax, $motivationMax, $ageMax;
// $playerX = array(strength, maxStrength, motivation, age, id);
$distance = 0;
$distance += abs($player1['strength']-$player2['strength'])/$strengthMax;
$distance += abs($player1['maxStrength']-$player2['maxStrength'])/$maxStrengthMax;
$distance += abs($player1['motivation']-$player2['motivation'])/$motivationMax;
$distance += abs($player1['age']-$player2['age'])/$ageMax;
return $distance;
}
function calculateCentroids() {
global $cluster;
$clusterCentroids = array();
foreach ($cluster as $key=>$value) {
$strenthValues = array();
$maxStrenthValues = array();
$motivationValues = array();
$ageValues = array();
foreach ($value as $clusterEntries) {
$strenthValues[] = $clusterEntries['strength'];
$maxStrenthValues[] = $clusterEntries['maxStrength'];
$motivationValues[] = $clusterEntries['motivation'];
$ageValues[] = $clusterEntries['age'];
}
if (count($strenthValues) == 0) { $strenthValues[] = 0; }
if (count($maxStrenthValues) == 0) { $maxStrenthValues[] = 0; }
if (count($motivationValues) == 0) { $motivationValues[] = 0; }
if (count($ageValues) == 0) { $ageValues[] = 0; }
$clusterCentroids[$key] = array('strength'=>array_sum($strenthValues)/count($strenthValues), 'maxStrength'=>array_sum($maxStrenthValues)/count($maxStrenthValues), 'motivation'=>array_sum($motivationValues)/count($motivationValues), 'age'=>array_sum($ageValues)/count($ageValues));
}
return $clusterCentroids;
}
function assignPlayersToNearestCluster() {
global $cluster, $clusterCentroids;
$playersWhoChangedClusters = 0;
// BUILD NEW CLUSTER ARRAY WHICH ALL PLAYERS GO IN THEN START
$alte_cluster = array_keys($cluster);
$neuesClusterArray = array();
foreach ($alte_cluster as $alte_cluster_entry) {
$neuesClusterArray[$alte_cluster_entry] = array();
}
// BUILD NEW CLUSTER ARRAY WHICH ALL PLAYERS GO IN THEN END
foreach ($cluster as $oldCluster=>$clusterValues) {
// FOR EVERY SINGLE PLAYER START
foreach ($clusterValues as $player) {
// MEASURE DISTANCE TO ALL CENTROIDS START
$abstaende = array();
foreach ($clusterCentroids as $CentroidId=>$centroidValues) {
$distancePlayerCluster = distance($player, $centroidValues);
$abstaende[$CentroidId] = $distancePlayerCluster;
}
arsort($abstaende);
if ($neuesCluster = each($abstaende)) {
$neuesClusterArray[$neuesCluster['key']][] = $player; // add to new array
// player $player['id'] goes to cluster $neuesCluster['key'] since it is the nearest one
if ($neuesCluster['key'] != $oldCluster) {
$playersWhoChangedClusters++;
}
}
// MEASURE DISTANCE TO ALL CENTROIDS END
}
// FOR EVERY SINGLE PLAYER END
}
$cluster = $neuesClusterArray;
return $playersWhoChangedClusters;
}
// CREATE k CLUSTERS START
$k = 5; // Anzahl Cluster
$cluster = array();
for ($i = 0; $i < $k; $i++) {
$cluster[$i] = array();
}
// CREATE k CLUSTERS END
// PUT PLAYERS IN RANDOM CLUSTERS START
$sql1 = "SELECT ids, staerke, talent, trainingseifer, wiealt FROM ".$prefix."spieler LIMIT 0, 90";
$sql2 = mysql_abfrage($sql1);
$anzahlSpieler = mysql_num_rows($sql2);
$anzahlSpielerProCluster = $anzahlSpieler/$k;
$strengthMax = 0;
$maxStrengthMax = 0;
$motivationMax = 0;
$ageMax = 0;
$counter = 0; // for $anzahlSpielerProCluster so that all clusters get the same number of players
while ($sql3 = mysql_fetch_assoc($sql2)) {
$assignedCluster = floor($counter/$anzahlSpielerProCluster);
$cluster[$assignedCluster][] = array('strength'=>$sql3['staerke'], 'maxStrength'=>$sql3['talent'], 'motivation'=>$sql3['trainingseifer'], 'age'=>$sql3['wiealt'], 'id'=>$sql3['ids']);
if ($sql3['staerke'] > $strengthMax) { $strengthMax = $sql3['staerke']; }
if ($sql3['talent'] > $maxStrengthMax) { $maxStrengthMax = $sql3['talent']; }
if ($sql3['trainingseifer'] > $motivationMax) { $motivationMax = $sql3['trainingseifer']; }
if ($sql3['wiealt'] > $ageMax) { $ageMax = $sql3['wiealt']; }
$counter++;
}
// PUT PLAYERS IN RANDOM CLUSTERS END
$m = 1;
while ($m < 16) {
$clusterCentroids = calculateCentroids(); // calculate new centroids of the clusters
$playersWhoChangedClusters = assignPlayersToNearestCluster(); // assign each player to the nearest cluster
if ($playersWhoChangedClusters == 0) { $m = 1001; }
echo '<li>Iteration '.$m.': '.$playersWhoChangedClusters.' players have changed place</li>';
$m++;
}
print_r($cluster);
?>
It's so embarrassing :D I think the whole problem is caused by a single letter:
In assignPlayersToNearestCluster() you can find arsort($abstaende);. After that, each() takes the first value. But arsort sorts in descending order, so the first value is the highest one, and the code picks the cluster with the greatest distance.
So it should be asort, of course. :) To prove that, I tested it with asort, and I get convergence after 7 iterations. :)
Do you think that was the mistake? If so, my problem is solved. In that case: sorry for bothering you with such a stupid question. ;)
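For reference, a minimal sketch of picking the nearest centroid with an ascending sort, as described above. The helper name nearestCluster is hypothetical; array_key_first() needs PHP 7.3 or later and stands in for each(), which was removed in PHP 8.

```php
<?php
// Hypothetical helper: $distances maps cluster id => distance to that centroid.
function nearestCluster(array $distances)
{
    // asort sorts ascending and keeps the keys, so the smallest distance comes first.
    asort($distances);
    return array_key_first($distances);
}
```

With arsort in place of asort the same call would return the farthest cluster, which is the behaviour that kept the original loop from converging.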
EDIT: disregard, I still get the same result as you: everyone winds up in cluster 4. I shall reconsider my code and try again.
I think I've realised what the problem is. K-means clustering is designed to separate groups that differ within a set; however, because of the way you calculate the averages, we end up in a situation where there are no large gaps in the ranges.
Might I suggest a change: concentrate on a single value (strength makes the most sense to me) to determine the clusters, or abandon this sorting method altogether and adopt something different (not what you want to hear, I know)?
I found a rather nice site with an example k-means sort using integers. I'm going to try to edit that and will get back with the results some time tomorrow.
http://code.blip.pt/2009/04/06/k-means-clustering-in-php/ <-- link I mentioned and forgot about.
