Data relationship - looking for solutions - php

I have a problem I need to solve, and I'm sure there is a way of doing this; I'm just not sure what to search for or how to find it.
I was thinking of doing this either in Excel, or I could try to write a PHP script to do it.
Basically, I have a set of substances, and each pair of substances is either compatible or incompatible. So what I have is a table with rows and columns where each cell is either 0 or 1, i.e. compatible/incompatible.
Now I want to find groups of substances in which all substances are compatible with each other. The goal is to find as large a group as possible, or ideally to find the largest, second largest, etc., and sort them from largest to smallest (possibly with a limit on the minimum number of elements in a group).
I hope this makes sense. The problem is that I'm not sure how to solve it, but I think this is something that is done relatively commonly, so I doubt the only way is to write a script/macro from scratch that uses brute force. That would probably not be very efficient either, as my table has over 30 elements.
To make it clearer, here is a simplified table of what my data looks like:
Substance A B C D
A 0 1 1 1
B 1 0 0 1
C 1 0 0 0
D 1 0 0 0

If you use only PHP without a database, you can use uasort to sort the substances by the sum of all elements in each row (i.e. by how many 1s each substance has):
<?php
$substances = [
    'A' => [
        'A' => 0,
        'B' => 1,
        'C' => 1,
        'D' => 0,
    ],
    'B' => [
        'A' => 1,
        'B' => 0,
        'C' => 1,
        'D' => 1,
    ],
    'C' => [
        'A' => 0,
        'B' => 1,
        'C' => 0,
        'D' => 0,
    ]
];
// Sort rows by their sum, largest first; uasort keeps the substance keys.
uasort($substances, function ($a, $b) {
    return array_sum($b) <=> array_sum($a);
});
var_export($substances);
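The answer above orders individual substances by their row sums, but it does not build the groups themselves. A group in which every pair is compatible is what graph theory calls a clique, so the question is an instance of the maximal clique problem, and the Bron-Kerbosch algorithm is a standard way to list all maximal cliques. The sketch below is not from the original answers: it assumes that 1 means compatible, counts a pair as compatible only when both cells are 1, and uses hypothetical function names. Enumeration is exponential in the worst case, but a 30-vertex graph has at most 3^10 = 59,049 maximal cliques, so it should stay tractable at that size.

```php
<?php
// Sketch: list maximal groups of mutually compatible substances (maximal
// cliques) with the Bron-Kerbosch algorithm with pivoting.
// Assumption: $matrix[$x][$y] === 1 means "compatible".

function bronKerbosch(array $r, array $p, array $x, array $adj, array &$cliques): void
{
    if (!$p && !$x) {
        $cliques[] = array_keys($r); // $r cannot be extended any further
        return;
    }
    // Pivot: the vertex in P or X with the most neighbours in P.
    $pivot = null;
    $best = -1;
    foreach ($p + $x as $u => $_) {
        $n = count(array_intersect_key($p, $adj[$u]));
        if ($n > $best) {
            $best = $n;
            $pivot = $u;
        }
    }
    // Only branch on vertices that are not neighbours of the pivot.
    foreach (array_diff_key($p, $adj[$pivot]) as $v => $_) {
        bronKerbosch(
            $r + [$v => true],
            array_intersect_key($p, $adj[$v]),
            array_intersect_key($x, $adj[$v]),
            $adj,
            $cliques
        );
        unset($p[$v]); // move $v from P ...
        $x[$v] = true; // ... to X
    }
}

// Hypothetical helper: returns cliques of at least $minSize, largest first.
function findMaximalCliques(array $matrix, int $minSize = 1): array
{
    $adj = [];
    foreach ($matrix as $a => $row) {
        $adj[$a] = [];
        foreach ($row as $b => $flag) {
            // Count a pair only if both cells say "compatible".
            if ($a !== $b && $flag === 1 && ($matrix[$b][$a] ?? 0) === 1) {
                $adj[$a][$b] = true;
            }
        }
    }
    $cliques = [];
    bronKerbosch([], array_fill_keys(array_keys($adj), true), [], $adj, $cliques);
    $cliques = array_filter($cliques, fn($c) => count($c) >= $minSize);
    $cliques = array_map(function ($c) { sort($c); return $c; }, $cliques);
    usort($cliques, fn($a, $b) => count($b) <=> count($a));
    return $cliques;
}

$matrix = [
    'A' => ['A' => 0, 'B' => 1, 'C' => 1, 'D' => 0],
    'B' => ['A' => 1, 'B' => 0, 'C' => 1, 'D' => 0],
    'C' => ['A' => 1, 'B' => 1, 'C' => 0, 'D' => 1],
    'D' => ['A' => 0, 'B' => 0, 'C' => 1, 'D' => 0],
];
print_r(findMaximalCliques($matrix));
```

Passing a second argument such as 3 keeps only groups with at least three members, which matches the "minimum number of elements" mentioned in the question.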


Get dense rank and gapped rank for all items in array

I want to calculate and store the dense rank and gapped rank for all entries in an array using PHP.
I want to do this in PHP rather than MySQL, because I am dealing with dynamic combinations (100,000 to 900 combinations per week), which is why I cannot use MySQL to create that many tables.
My code to find the dense ranks is working, but the gapped ranks are not correct.
PHP code
$members = [
    ['num' => 2, 'rank' => 0, 'dense_rank' => 0],
    ['num' => 2, 'rank' => 0, 'dense_rank' => 0],
    ['num' => 3, 'rank' => 0, 'dense_rank' => 0],
    ['num' => 3, 'rank' => 0, 'dense_rank' => 0],
    ['num' => 3, 'rank' => 0, 'dense_rank' => 0],
    ['num' => 3, 'rank' => 0, 'dense_rank' => 0],
    ['num' => 3, 'rank' => 0, 'dense_rank' => 0],
    ['num' => 5, 'rank' => 0, 'dense_rank' => 0],
    ['num' => 9, 'rank' => 0, 'dense_rank' => 0],
    ['num' => 9, 'rank' => 0, 'dense_rank' => 0],
    ['num' => 9, 'rank' => 0, 'dense_rank' => 0]
];
$rank = 0;
$previous_rank = 0;
$dense_rank = 0;
$previous_dense_rank = 0;
foreach ($members as &$var) {
    // start of rank
    if ($var['num'] == $previous_rank) {
        $var['rank'] = $rank;
    } else {
        $var['rank'] = ++$rank;
        $previous_rank = $var['num'];
    } // end of rank
    // start of rank_dense
    if ($var['num'] === $previous_dense_rank) {
        $var['dense_rank'] = $dense_rank;
        ++$dense_rank;
    } else {
        $var['dense_rank'] = ++$dense_rank;
        $previous_dense_rank = $var['num'];
    }
    // end of rank_dense
    echo $var['num'] . ' - ' . $var['rank'] . ' - ' . $var['dense_rank'] . '<br>';
}
?>
My flawed output is:
num  rank  dynamic rank
2    1     1
2    1     1
3    2     3
3    2     3
3    2     4
3    2     5
3    2     6
5    3     8
9    4     9
9    4     9
9    4     10
Notice that once the error starts, it is only corrected in a row where the value in the num column increases, for example when num goes from 3 to 5.
Given that your results are already sorted in an ascending fashion...
For dense ranking, you need to only increment your counter when a new score is encountered.
For gapped ranking, you need to unconditionally increment your counter and use the counter value for all members with the same score.
??= is the "null coalescing assignment" operator (a kind of combined assignment operator, available since PHP 7.4). It only evaluates and assigns the right-hand operand if the left-hand operand is undeclared or null. This is a way of performing conditional assignments without writing a classic if condition.
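A minimal illustration of that short-circuiting behaviour (PHP 7.4 or later; the variable names are illustrative only): the increment on the right is skipped whenever the key already holds a value, which is why the dense counter advances only once per distinct score.

```php
<?php
// ??= evaluates its right-hand side only when the left side is unset or null.
$counter = 0;
$ranks = [];
foreach ([2, 2, 3] as $num) {
    // For the second 2 the key is already set, so ++$counter is not run.
    $ranks[$num] ??= ++$counter;
}
var_export($ranks);
echo "\n", $counter, "\n";
```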
Code: (Demo)
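$denseRank = 0;
$gappedRank = 0;
$denseRanks = [];  // num => dense rank
$gappedRanks = []; // num => gapped rank
foreach ($members as &$row) {
    // dense: the counter only advances the first time a num is seen
    $denseRanks[$row['num']] ??= ++$denseRank;
    $row['dense_rank'] = $denseRanks[$row['num']];
    // gapped: the counter advances on every row; ties reuse the first value
    ++$gappedRank;
    $gappedRanks[$row['num']] ??= $gappedRank;
    $row['rank'] = $gappedRanks[$row['num']];
    // for better presentation:
    echo json_encode($row) . "\n";
}
unset($row); // drop the reference left over from foreach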
Output:
{"num":2,"rank":1,"dense_rank":1}
{"num":2,"rank":1,"dense_rank":1}
{"num":3,"rank":3,"dense_rank":2}
{"num":3,"rank":3,"dense_rank":2}
{"num":3,"rank":3,"dense_rank":2}
{"num":3,"rank":3,"dense_rank":2}
{"num":3,"rank":3,"dense_rank":2}
{"num":5,"rank":8,"dense_rank":3}
{"num":9,"rank":9,"dense_rank":4}
{"num":9,"rank":9,"dense_rank":4}
{"num":9,"rank":9,"dense_rank":4}
For the record, if you are dealing with huge volumes of data, I would be using SQL instead of PHP for this task.
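If you are on a PHP version older than 7.4 (no ??=), or prefer not to keep lookup arrays, the two rules described in the answer (advance the dense counter only on a new score, advance the gapped counter on every row and reuse it for ties) can also be applied by remembering the previous num. This is a sketch with a hypothetical function name, and like the answer it assumes $members is already sorted ascending by num.

```php
<?php
// Sketch: dense and gapped ranks in one pass, without ??=.
// Assumes the rows are sorted by 'num' in ascending order.
function addRanks(array $members): array
{
    $denseRank = 0;
    $gappedRank = 0;
    $position = 0;
    $previous = null;
    foreach ($members as &$row) {
        ++$position;                  // counts every row unconditionally
        if ($row['num'] !== $previous) {
            ++$denseRank;             // new score: next dense rank
            $gappedRank = $position;  // new score: rank = row position
            $previous = $row['num'];
        }
        $row['dense_rank'] = $denseRank;
        $row['rank'] = $gappedRank;
    }
    unset($row); // break the reference left by foreach
    return $members;
}

$members = [];
foreach ([2, 2, 3, 3, 3, 3, 3, 5, 9, 9, 9] as $num) {
    $members[] = ['num' => $num, 'rank' => 0, 'dense_rank' => 0];
}
foreach (addRanks($members) as $row) {
    echo json_encode($row), "\n";
}
```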
It seems like you want the dynamic rank to be sequential?
Your sample data appears to be sorted. If this remains true for your real data, you can remove the conditional and just increment the variable as you assign it:
//start of rank_dense
$var['dense_rank']=++$dense_rank;
//end of rank_dense
It sounds like you're saying you won't be implementing a database.
Databases like MySQL can easily handle the workload numbers you outlined and they can sort your data as well. You may want to reconsider.

Minimise PHP PDO network transfer data in transit to MySQL

Assuming a target table like this:
CREATE TABLE mysql_mytable (
    myint INT,
    mytinyint TINYINT,
    mydecimal DECIMAL(10,2),
    mydatetime DATETIME,
    mydate DATE
)
And the following code with multiple test cases:
$mysql_pdo = new PDO("mysql:...", ..., [
    PDO::ATTR_PERSISTENT => true,
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION
]);
foreach ([
    'myint' => [
        'a' => [PDO::PARAM_STR, "1048575"],
        'b' => [PDO::PARAM_STR, 1048575],//same as previous?
        'c' => [PDO::PARAM_STR, dechex(1048575), 'UNHEX(?)'],//"FFFFF"
        'e' => [PDO::PARAM_INT, 1048575],//fewest as 4 byte int?
        'f' => [PDO::PARAM_INT, "1048575"],//same as previous?
    ],
    'mytinyint' => [
        'a' => [PDO::PARAM_STR, "255"],
        'b' => [PDO::PARAM_STR, 255],//same as previous?
        'c' => [PDO::PARAM_STR, dechex(255), 'UNHEX(?)'],//"FF" fewest bytes as VARCHAR?
        'e' => [PDO::PARAM_INT, 255],
        'f' => [PDO::PARAM_INT, "255"],//same as previous?
    ],
    'mydecimal' => [//PDO::PARAM_INT cannot be used correctly for decimals?
        'a' => [PDO::PARAM_STR, "32000000.00"],
        'b' => [PDO::PARAM_STR, "3.2e7"],//fewest bytes as VARCHAR?
    ],
    'mydatetime' => [
        'a' => [PDO::PARAM_STR, "2021-05-10 09:09:39"],
        'c' => [PDO::PARAM_STR, strtotime("2021-05-10 09:09:39")],
        'd' => [PDO::PARAM_INT, strtotime("2021-05-10 09:09:39")],//fewest as 4 byte int?
    ],
    'mydate' => [
        'a' => [PDO::PARAM_STR, "2021-05-10"],
        'c' => [PDO::PARAM_STR, strtotime("2021-05-10")],
        'd' => [PDO::PARAM_INT, strtotime("2021-05-10")],//fewest as 4 byte int?
    ]
] as $col => $tests) {
    foreach ($tests as $case_label => $test) {
        [$type, $value] = $test;
        $mark = $test[2] ?? '?';
        // double quotes, so that {$col} and {$mark} are actually interpolated
        $stmt = $mysql_pdo->prepare("INSERT INTO mysql_mytable ({$col}) VALUES ({$mark})");
        $stmt->bindValue(1, $value, $type);
        $stmt->execute();
    }
}
I have spent a lot of time optimizing the disk space used by the table, but there is an ungodly amount of data going to a non-local MySQL instance. There are many columns and many rows; the above is just to show certain groups and options within them. This really is important, and yes, I'm already making the inserts efficient in terms of disabling key checks, indexes, etc. To repeat: this is example code, and I'm asking for help minimising the amount of data sent to the remote MySQL instance. Yes, it is a huge amount of data, and yes, the connection is slow enough and the process time-critical enough that it matters.
Which option, per column/test group, would result in the smallest number of bytes transferred over the network to the MySQL database?
Is there any way to limit the data sent, either by using PDO differently or by not using PDO at all?
Integers are more compact than the equivalent value in a string, regardless of whether it's in decimal or hex.
The decimal value 1048575 requires 7 characters, and the hex value FFFFF takes 5 characters, whereas the integer uses only 4 bytes.
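To make those counts concrete, here is a small local sketch (a hypothetical helper, not part of PDO) that measures only the raw value bytes with strlen(), dechex() and pack(). It does not model MySQL wire-protocol framing or per-parameter type information, so actual traffic will be larger than these numbers.

```php
<?php
// Compare the raw payload size of different encodings of the same integer.
// Only the value bytes are counted, not MySQL protocol overhead.
function payloadSizes(int $value): array
{
    return [
        'decimal string' => strlen((string) $value),  // e.g. "1048575"
        'hex string'     => strlen(dechex($value)),   // e.g. "fffff"
        'int32 binary'   => strlen(pack('V', $value)), // 4-byte little-endian
    ];
}

print_r(payloadSizes(1048575));
```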
Also consider whether you have PDO::ATTR_EMULATE_PREPARES enabled (pdo_mysql enables it by default). That will defeat the use of integer parameters, because the value will be interpolated into the query string as text instead of being sent separately as a real parameter.
Speaking for myself, I am not concerned about network transfer for simple data types. Networks are fast enough that the transfer time is negligible compared to the query execution time. It might matter if you're transferring large BLOB/TEXT content or bulk-loading millions of rows, but usually the difference between 4 bytes and 5-8 bytes for an individual integer is not going to solve any performance bottlenecks.

Avoid multiple if statements in PHP (use array or some other option)

I am creating a conversion table using PHP, and I have to check the user's input against A LOT of scenarios. I started by using if statements, but this doesn't seem efficient whatsoever, and I was hoping for an easier way to go through all the scenarios.
I looked into ternary and switch options, but those don't seem to do what I need. I also considered arrays (the option I think I need to use).
What I'm trying to do:
The user enters a grade level and scores for a category. Based on the sum of those scores and the grade level, I need to look up two other scores.
Example code:
if ($grade == 1 && $sumScore <= 5)
{
    $textScore = 'Beginning';
}
if ($grade == 1 && ($sumScore > 5 && $sumScore <= 8))
{
    $textScore = 'Intermediate';
}
etc....
There are 13 grades (K-12) and 4 categories, each with its own "raw scores" that I need to consider to get these other scores. How can I avoid using a ton of if/else if statements?
Thanks!!
You could use a two-dimensional array that's 13x4. Then you can use nested loops to go through each possibility, with just one statement that gets run many times because of the loops.
For example, the array might look like this:
$textscores = array(
    1 => array(5 => 'Beginning', 8 => 'Intermediate', ...),
    ...
    3 => array(5 => 'Intermediate', ...),
    ...
);
The nested for loop might look like this:
foreach ($textscores as $grade => $scores) {
    foreach ($scores as $sumScore => $textScore) {
        if ($userGrade == $grade && $userSumScore <= $sumScore) {
            $userTextScore = $textScore;
            break 2;
        }
    }
}
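Because the user's grade is already known, the outer loop over every grade in this answer can be replaced by a direct array lookup, and the 4 categories can become one more array level. The sketch below is an illustration under assumptions: the category name 'reading', the grade 2 thresholds and the helper name are hypothetical (only grade 1's 5/8 bounds come from the question), and PHP_INT_MAX serves as a catch-all upper bound.

```php
<?php
// Sketch: category => grade => [inclusive upper bound => label], bounds ascending.
// Category names and thresholds are hypothetical placeholders.
$textScores = [
    'reading' => [
        1 => [5 => 'Beginning', 8 => 'Intermediate', PHP_INT_MAX => 'Master'],
        2 => [7 => 'Beginning', 8 => 'Intermediate', PHP_INT_MAX => 'Master'],
    ],
];

function lookupTextScore(array $table, string $category, int $grade, int $sumScore): ?string
{
    // Unknown category or grade: nothing to match.
    if (!isset($table[$category][$grade])) {
        return null;
    }
    foreach ($table[$category][$grade] as $upperBound => $label) {
        if ($sumScore <= $upperBound) {
            return $label; // first bound that is not exceeded wins
        }
    }
    return null;
}

echo lookupTextScore($textScores, 'reading', 1, 6), "\n";
```

Adding a grade or category then means adding array entries rather than new if statements.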
I haven't tested this (sorry), but I think something like this would work:
function getTextScore($grade, $sum) {
    $rules = array(
        array("grade" => 1, "minSum" => null, "maxSum" => 5, "textScore" => "Beginning"),
        array("grade" => 1, "minSum" => 6, "maxSum" => 8, "textScore" => "Intermediate"),
        /* ... */
    );
    for ($ruleIdx = 0; $ruleIdx < count($rules); $ruleIdx++) {
        $currentRule = $rules[$ruleIdx];
        if (($currentRule['grade'] == $grade) &&
            (is_null($currentRule['minSum']) || ($currentRule['minSum'] <= $sum)) &&
            (is_null($currentRule['maxSum']) || ($currentRule['maxSum'] >= $sum))) {
            return $currentRule['textScore'];
        }
    }
    // got to the end without finding a match - need to decide what to do
}
The rules have optional min and max values. The function stops as soon as it finds a match, so the order of the rules is important. You will need to decide what to do if no rule matches. You should be able to just drop extra rules in or change the existing ones without changing the logic.
From your example, I would suggest the following:
a multidimensional array, but constructed a bit differently from the way you build yours.
// Grade => [Text => [Min, Max]]
$textScores = [
    1 => [
        'Beginning' => [0, 5],
        'Intermediate' => [5, 8],
        'Master' => [8, 10]
    ],
    2 => [
        'Beginning' => [0, 7],
        'Intermediate' => [7, 8],
        'Master' => [8, 10]
    ],
    3 => [
        'Beginning' => [0, 3],
        'Intermediate' => [3, 6],
        'Master' => [6, 10]
    ]
];
// Random input to test
$grade = rand(1, 3);
$sumScore = rand(0, 10);
foreach ($textScores[$grade] as $Text => $MinMax) {
    // adjacent ranges share a boundary, so the first matching range wins
    if ($MinMax[0] <= $sumScore && $MinMax[1] >= $sumScore) {
        $textScore = $Text;
        break;
    }
}

Naturally sort a flat array of ranged values including data units (GB and TB)

I have the following output values from an associative array (id => value):
1GB - 4GB
1TB - 2TB
3TB - 5TB
5GB - 16GB
6TB+
17GB - 32GB
33GB - 63GB
64GB - 120GB
121GB - 200GB
201GB - 300GB
301GB - 500GB
501GB - 1TB
How can I sort it so that it goes from smallest to largest, like so:
1GB - 4GB
5GB - 16GB
17GB - 32GB
33GB - 63GB
64GB - 120GB
121GB - 200GB
201GB - 300GB
301GB - 500GB
501GB - 1TB
1TB - 2TB
3TB - 5TB
6TB+
Posting my comment as an answer for posterity...
When you have strings that cannot be easily sorted and you are fetching the data from a database table, you can:
1. Add a column called weight, of data type integer, to the table.
2. In the weight column, use higher or lower numbers depending on how you want the data sorted.
3. When querying the data, add ORDER BY weight DESC to your query to fetch the results in the order you want.
If the data does not come from a table, you may try this:
$arr = [
    '8TB' => '8TB',
    '1GB' => '4GB',
    '1TB' => '2TB',
    '3TB' => '5TB',
    '5GB' => '16GB',
    '17GB' => '32GB',
    '33GB' => '63GB',
    '64GB' => '120GB',
];
$io = [];
foreach ($arr as $key => $val) {
    // group by the unit at the end of the key, e.g. 'GB' or 'TB'
    $unit = strtoupper(trim(substr($key, -2)));
    $io[$unit][$key] = $val;
}
function cmpValue($a, $b) {
    // compare the numeric part of the keys, e.g. 17 vs 5
    return (int) substr($a, 0, -2) <=> (int) substr($b, 0, -2);
}
$sd = array_map(function ($ele) {
    uksort($ele, 'cmpValue');
    return $ele;
}, $io);
function cmpUnit($a, $b) {
    $units = array('B' => 0, 'KB' => 1, 'MB' => 2, 'GB' => 3, 'TB' => 4, 'PB' => 5, 'EB' => 6, 'ZB' => 7, 'YB' => 8);
    return $units[$a] <=> $units[$b];
}
uksort($sd, 'cmpUnit');
// array_values() stops the unit keys from being treated as named arguments in PHP 8
$sortedArray = array_merge(...array_values($sd));
print_r($sortedArray);
While a larger set of units may need a lookup map of values for the different unit abbreviations, this question only deals with GB and TB, which happen to sort correctly alphabetically.
For best efficiency, perform a single loop over the input values and parse the leading number and its trailing unit expression. The remainder of the strings appear to be irrelevant to accurate sorting.
Now that you have flat arrays to sort with, pass the units and then the numbers into array_multisort(), and pass the original array as the third argument so that it is the "affected" array.
See also this related answer to a very similar question: PHP: Sort an array of bytes
Code: (Demo)
$nums = [];
$units = [];
foreach ($array as $v) {
[$nums[], $units[]] = sscanf($v, '%d%[A-Z]');
}
array_multisort($units, $nums, $array);
var_export($array);
If the input is:
$array = [
'a' => '1GB - 4GB',
'b' => '1TB - 2TB',
'c' => '3TB - 5TB',
'd' => '5GB - 16GB',
'e' => '6TB+',
'f' => '17GB - 32GB',
'g' => '33GB - 63GB',
'h' => '64GB - 120GB',
'i' => '121GB - 200GB',
'j' => '201GB - 300GB',
'k' => '301GB - 500GB',
'l' => '501GB - 1TB',
];
The output is:
array (
'a' => '1GB - 4GB',
'd' => '5GB - 16GB',
'f' => '17GB - 32GB',
'g' => '33GB - 63GB',
'h' => '64GB - 120GB',
'i' => '121GB - 200GB',
'j' => '201GB - 300GB',
'k' => '301GB - 500GB',
'l' => '501GB - 1TB',
'b' => '1TB - 2TB',
'c' => '3TB - 5TB',
'e' => '6TB+',
)
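For the larger set of units mentioned above, one way to build that lookup map is to convert each label's leading size into a comparable number before sorting. This is a sketch under stated assumptions: the helper name is hypothetical, the unit list and the 1024 multiplier are choices made here rather than anything from the question, and labels with an unknown unit are not handled.

```php
<?php
// Sketch: sort range labels by their leading size across several units.
// Assumed units and a 1024 step between them; keys are preserved.
function sortByLeadingSize(array $labels): array
{
    $exponent = ['B' => 0, 'KB' => 1, 'MB' => 2, 'GB' => 3, 'TB' => 4, 'PB' => 5];
    $size = [];
    foreach ($labels as $key => $label) {
        // Parse e.g. "501GB - 1TB" into 501 and "GB"; the rest is ignored.
        [$num, $unit] = sscanf($label, '%d%[A-Z]');
        $size[$key] = $num * 1024 ** $exponent[$unit];
    }
    uksort($labels, fn($a, $b) => $size[$a] <=> $size[$b]);
    return $labels;
}

print_r(sortByLeadingSize(['y' => '2GB+', 'x' => '512MB - 1GB']));
```

Compared with array_multisort() on the unit strings, this trades a little extra arithmetic for not relying on the unit abbreviations happening to sort alphabetically.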

Trying to understand array_diff_uassoc optimization

It seems that the arrays are sorted before they are compared to each other inside array_diff_uassoc.
What is the benefit of this approach?
Test script
function compare($a, $b)
{
echo("$a : $b\n");
return strcmp($a, $b);
}
$a = array('a' => 1, 'b' => 2, 'c' => 3, 'd' => 4, 'e' => 5);
$b = array('v' => 1, 'w' => 2, 'x' => 3, 'y' => 4, 'z' => 5);
var_dump(array_diff_uassoc($a, $b, 'compare'));
$a = array('a' => 1, 'b' => 2, 'c' => 3, 'd' => 4, 'e' => 5);
$b = array('d' => 1, 'e' => 2, 'f' => 3, 'g' => 4, 'h' => 5);
var_dump(array_diff_uassoc($a, $b, 'compare'));
$a = array('a' => 1, 'b' => 2, 'c' => 3, 'd' => 4, 'e' => 5);
$b = array('a' => 1, 'b' => 2, 'c' => 3, 'd' => 4, 'e' => 5);
var_dump(array_diff_uassoc($a, $b, 'compare'));
$a = array('a' => 1, 'b' => 2, 'c' => 3, 'd' => 4, 'e' => 5);
$b = array('e' => 5, 'd' => 4, 'c' => 3, 'b' => 2, 'a' => 1);
var_dump(array_diff_uassoc($a, $b, 'compare'));
http://3v4l.org/DKgms#v526
P.S. It seems that the sorting algorithm changed in PHP 7.
The sorting algorithm didn't change in PHP 7. Elements are just passed to the sorting algorithm in a different order for some performance improvements.
Well, the benefit could be faster execution in some cases. You really hit the worst case when the two arrays have completely different keys.
The worst-case complexity is sorting both arrays and then comparing each key of one array with each key of the other: O(n*m + n * log(n) + m * log(m))
The best case is sorting both arrays and then performing only as many comparisons as there are elements in the smaller array: O(min(m, n) + n * log(n) + m * log(m))
In case of a match, you wouldn't have to compare against the full array again, but only from the key after the match on.
But in the current implementation, the sorting is just redundant. The implementation in php-src needs some improvement, I think. There's no outright bug, but the implementation is just bad. If you understand some C: http://lxr.php.net/xref/PHP_TRUNK/ext/standard/array.c#php_array_diff
(Note that that function is called via php_array_diff(INTERNAL_FUNCTION_PARAM_PASSTHRU, DIFF_ASSOC, DIFF_COMP_DATA_INTERNAL, DIFF_COMP_KEY_USER); from array_diff_uassoc)
Theory
Sorting allows for a few shortcuts to be made; for instance:
A | B
-------+------
1,2,3 | 4,5,6
Each element of A will only be compared against B[0], because the other elements are known to be at least as big.
Another example:
A | B
-------+-------
4,5,6 | 1,2,6
In this case, A[0] is compared against all elements of B, but A[1] and A[2] are compared against B[2] only.
If any element of A is bigger than all elements in B you will get the worst performance.
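The shortcut described in this theory can be sketched in userland PHP as a sort followed by a single merge pass over the keys. This is only an illustration, not php-src's implementation: the function name is hypothetical, it compares keys only (the semantics of array_diff_ukey(), not array_diff_uassoc(), which also compares values), and it returns entries in sorted key order rather than in the original order of $a.

```php
<?php
// Sketch: keys of $a that are absent from $b, via sort + one merge pass.
function sortedKeyDiff(array $a, array $b, callable $cmp): array
{
    $ka = array_keys($a);
    $kb = array_keys($b);
    usort($ka, $cmp);
    usort($kb, $cmp);
    $result = [];
    $j = 0;
    $m = count($kb);
    foreach ($ka as $key) {
        // Skip keys of $b smaller than the current key: they cannot match
        // this key or any later (larger) key of $a.
        while ($j < $m && $cmp($kb[$j], $key) < 0) {
            ++$j;
        }
        if ($j < $m && $cmp($kb[$j], $key) == 0) {
            continue; // key present in both arrays
        }
        $result[$key] = $a[$key];
    }
    return $result;
}

$a = ['a' => 1, 'b' => 2, 'c' => 3, 'd' => 4, 'e' => 5];
$b = ['d' => 1, 'e' => 2, 'f' => 3, 'g' => 4, 'h' => 5];
var_dump(sortedKeyDiff($a, $b, 'strcmp'));
```

Because the index into $b only moves forward, the comparison phase is O(n + m) after sorting, which is the kind of saving sorting makes possible.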
Practice
While the above works well for the standard array_diff() or array_udiff(), once a key comparison function is used it will resort to O(n * m) performance because of this change while trying to fix this bug.
The aforementioned bug describes how custom key comparison functions can cause unexpected results when used with arrays that have mixed keys (i.e. numeric and string key values). I personally feel that this should've been addressed via the documentation, because you would get equally strange results with ksort().
