Minimise PHP PDO network transfer data in transit to MySQL

Assuming a target table like this:
CREATE TABLE mysql_mytable (
    myint INT,
    mytinyint TINYINT,
    mydecimal DECIMAL(10,2),
    mydatetime DATETIME,
    mydate DATE
)
And the following code with multiple test cases:
$mysql_pdo = new PDO("mysql:...", ..., [
    PDO::ATTR_PERSISTENT => true,
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION
]);
foreach ([
    'myint' => [
        'a' => [PDO::PARAM_STR, "1048575"],
        'b' => [PDO::PARAM_STR, 1048575], // same as previous?
        'c' => [PDO::PARAM_STR, dechex(1048575), 'UNHEX(?)'], // "FFFFF"
        'e' => [PDO::PARAM_INT, 1048575], // fewest as 4 byte int?
        'f' => [PDO::PARAM_INT, "1048575"], // same as previous?
    ],
    'mytinyint' => [
        'a' => [PDO::PARAM_STR, "255"],
        'b' => [PDO::PARAM_STR, 255], // same as previous?
        'c' => [PDO::PARAM_STR, dechex(255), 'UNHEX(?)'], // "FF" fewest bytes as VARCHAR?
        'e' => [PDO::PARAM_INT, 255],
        'f' => [PDO::PARAM_INT, "255"], // same as previous?
    ],
    'mydecimal' => [ // PDO::PARAM_INT cannot be used correctly for decimals?
        'a' => [PDO::PARAM_STR, "32000000.00"],
        'b' => [PDO::PARAM_STR, "3.2e7"], // fewest bytes as VARCHAR?
    ],
    'mydatetime' => [
        'a' => [PDO::PARAM_STR, "2021-05-10 09:09:39"],
        'c' => [PDO::PARAM_STR, strtotime("2021-05-10 09:09:39")],
        'd' => [PDO::PARAM_INT, strtotime("2021-05-10 09:09:39")], // fewest as 4 byte int?
    ],
    'mydate' => [
        'a' => [PDO::PARAM_STR, "2021-05-10"],
        'c' => [PDO::PARAM_STR, strtotime("2021-05-10")],
        'd' => [PDO::PARAM_INT, strtotime("2021-05-10")], // fewest as 4 byte int?
    ]
] as $col => $tests) {
    foreach ($tests as $case_label => $test) {
        list($type, $value) = $test;
        $mark = isset($test[2]) ? $test[2] : '?';
        // Double quotes so {$col} and {$mark} actually interpolate
        // (single quotes would send the literal text "{$col}" to MySQL).
        $stmt = $mysql_pdo->prepare("INSERT INTO mysql_mytable ({$col}) VALUES ({$mark})");
        $stmt->bindValue(1, $value, $type);
        $stmt->execute();
    }
}
I have spent a lot of time optimizing the disk space used by the table, but there is still an ungodly amount of data going to a non-local MySQL instance. There are many columns and many rows; the above just shows certain groups of columns and the options for each. This really is important, and yes, I am already making the insert efficient in terms of disabling key checks, indexes, etc. To repeat: these are examples with example code, and I am asking for help to minimise the amount of data sent to the remote MySQL instance. Yes, it is a huge amount of data, and yes, the connection is slow enough and the process time-critical enough that it matters.
Which option, by column/test group, would result in the smallest number of bytes transferred over the network to the MySQL database?
Is there any way to limit the data sent, either by using PDO differently or by not using PDO at all?

Integers are more compact than the equivalent value in a string, regardless of whether the string is decimal or hex.
The decimal value 1048575 requires 7 characters and the hex value FFFFF takes 5 characters, whereas the integer uses only 4 bytes.
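A quick way to see those sizes from PHP itself (illustration only, not part of the original test code):
var_dump(strlen("1048575"));           // int(7): decimal string
var_dump(strlen(dechex(1048575)));     // int(5): hex string "fffff"
var_dump(strlen(pack('V', 1048575)));  // int(4): little-endian 32-bit integer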
Also consider whether you have PDO::ATTR_EMULATE_PREPARES enabled. That will defeat the use of integer parameters, because the value will be interpolated into the query string as text instead of being sent separately as a real parameter.
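For example, a minimal connection sketch with emulation turned off (the DSN details are placeholders):
// With emulated prepares off, PDO uses real server-side prepared statements,
// so a PDO::PARAM_INT value travels in MySQL's binary protocol as a compact
// binary integer rather than as its decimal string.
$mysql_pdo = new PDO("mysql:host=...;dbname=...", $user, $pass, [
    PDO::ATTR_PERSISTENT => true,
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    PDO::ATTR_EMULATE_PREPARES => false, // send parameters separately, not interpolated
]);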
Speaking for myself, I am not concerned about network transfer for simple data types. Networks are fast enough that the transfer time is negligible compared to the query execution time. Perhaps if you're transferring big BLOB/TEXT content around, or if you are bulk-loading millions of rows, but usually the difference between 4 bytes and 5-8 bytes for an individual integer is not going to solve any performance bottlenecks.

Related

Algorithm to find best combination of numbers - Bin Packing Problem

Say I have the following measures:
80
180
200
240
410
50
110
I can store each combination of numbers to a maximum of 480 per unit. How can I calculate the least number of units required so all measures are spread in the most efficient way?
I've tagged PHP but it can be in JS too, or even pseudo language.
I know I'm supposed to show what I've tried already, but I'm quite stuck on how to approach this. The first thing that comes to mind is recursion, but I'm no math expert to see how this can be done efficiently...
Any help is greatly appreciated.
To further elaborate: I'm trying to calculate the number of skirtings I have to order, based on the different lengths I need for the walls. Each skirting has a length of 480cm and I want to know the best way to spread them so I have to buy the least number of skirtings. It's not so much about ordering an extra skirting, but the puzzle to figure it out is an interesting one (at least to me).
Update with solution
Despite people trying to close the question, I started fiddling with the Bin Packing Problem description. Following the idea of sorting all items from largest to smallest and then fitting them in the best possible way, I created this small class that might help others in the future:
<?php
class BinPacker {
    private $binSize;

    public function __construct($binSize) {
        $this->binSize = $binSize;
    }

    public function pack($elements) {
        // Sort from largest to smallest, preserving the labels.
        arsort($elements);
        $bins = [];
        $handled = [];
        while (count($handled) < count($elements)) {
            $bin = [];
            foreach ($elements as $label => $size) {
                if (!in_array($label, $handled)) {
                    // <= rather than <, so an element that exactly fills the
                    // bin is still placed (with < it would never fit anywhere
                    // and the while loop would never terminate).
                    if (array_sum($bin) + $size <= $this->binSize) {
                        $bin[$label] = $size;
                        $handled[] = $label;
                    }
                }
            }
            $bins[] = $bin;
        }
        return $bins;
    }

    public function getMeta($bins) {
        $meta = [
            'totalValue' => 0,
            'totalWaste' => 0,
            'totalBins' => count($bins),
            'efficiency' => 0,
            'valuePerBin' => [],
            'wastePerBin' => []
        ];
        foreach ($bins as $bin) {
            $value = array_sum($bin);
            $binWaste = $this->binSize - $value;
            $meta['totalValue'] += $value;
            $meta['totalWaste'] += $binWaste;
            $meta['wastePerBin'][] = $binWaste;
            $meta['valuePerBin'][] = $value;
        }
        // Efficiency here is waste relative to the value actually packed.
        $meta['efficiency'] = round((1 - $meta['totalWaste'] / $meta['totalValue']) * 100, 3);
        return $meta;
    }
}
$test = [
    'Wall A' => 420,
    'Wall B' => 120,
    'Wall C' => 80,
    'Wall D' => 114,
    'Wall E' => 375,
    'Wall F' => 90
];
$binPacker = new BinPacker(488);
$bins = $binPacker->pack($test);
echo '<h2>Meta:</h2>';
var_dump($binPacker->getMeta($bins));
echo '<h2>Bin Configuration</h2>';
var_dump($bins);
Which gives an output:
Meta:
array (size=6)
'totalValue' => int 1199
'totalWaste' => int 265
'totalBins' => int 3
'efficiency' => float 77.898
'valuePerBin' =>
array (size=3)
0 => int 420
1 => int 465
2 => int 314
'wastePerBin' =>
array (size=3)
0 => int 68
1 => int 23
2 => int 174
Bin Configuration
array (size=3)
0 =>
array (size=1)
'Wall A' => int 420
1 =>
array (size=2)
'Wall E' => int 375
'Wall F' => int 90
2 =>
array (size=3)
'Wall B' => int 120
'Wall D' => int 114
'Wall C' => int 80
While this data set is relatively small, the waste rate is rather high. But in my own configuration, where I entered all wall and ceiling measures, I reached an efficiency of 94.212% (n=129 measures).
(Note: the class does not check for ambiguous labels, so if for example you define Wall A twice, the result will be incorrect.)
Conclusion: for both the ceiling and the wall skirtings I can order one skirting fewer than in my manual attempt to spread them efficiently.
Looks to me like a variation on the Bin Packing Problem, where you're trying to pick the combination of elements that adds up to 480 (or just under). This is a fairly computationally hard problem, and depending on how efficient/accurate it needs to be, it might be overkill to try to get it exact.
A rough heuristic could be to sort the measures, keep adding the smallest ones into a unit until the next one makes you go over, then start a new unit and repeat; a sketch of this follows.
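A minimal sketch of that heuristic (the function name is made up for illustration; it assumes no single measure exceeds the unit size):
// Rough first-fit pass over the sorted measures: open a new unit whenever
// the next measure would overflow the current one. Fast, but not optimal.
function packGreedy(array $measures, int $unitSize): array
{
    sort($measures); // smallest first
    $units = [];
    $current = [];
    foreach ($measures as $m) {
        if (array_sum($current) + $m > $unitSize) {
            $units[] = $current; // close the current unit
            $current = [];
        }
        $current[] = $m;
    }
    if ($current !== []) {
        $units[] = $current;
    }
    return $units;
}

print_r(packGreedy([80, 180, 200, 240, 410, 50, 110], 480));
// 3 units: [50, 80, 110, 180], [200, 240], [410]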

Naturally sort a flat array of ranged values including data units (GB and TB)

I have the following output values from an associative array (id => value):
1GB - 4GB
1TB - 2TB
3TB - 5TB
5GB - 16GB
6TB+
17GB - 32GB
33GB - 63GB
64GB - 120GB
121GB - 200GB
201GB - 300GB
301GB - 500GB
501GB - 1TB
How can I group and sort it so that it goes from smallest to largest:
1GB - 4GB
5GB - 16GB
17GB - 32GB
33GB - 63GB
64GB - 120GB
121GB - 200GB
201GB - 300GB
301GB - 500GB
501GB - 1TB
1TB - 2TB
3TB - 5TB
6TB+
Posting my comment as an answer for posterity...
When you have strings that cannot easily be sorted and you are fetching the data from a database table, you can:
Add an integer column called weight to the table.
In the weight column, use higher or lower numbers depending on how you want the data sorted.
When querying the data, add ORDER BY weight DESC to your query to fetch the results in the order you want.
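A quick illustration of the last step with PDO (the table and column names here are hypothetical):
// Hypothetical "capacities" table carrying the suggested integer weight column.
$labels = $pdo->query('SELECT label FROM capacities ORDER BY weight DESC')
              ->fetchAll(PDO::FETCH_COLUMN);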
If the data does not come from a database table, you may try this:
$arr = [
    '8TB' => '8TB',
    '1GB' => '4GB',
    '1TB' => '2TB',
    '3TB' => '5TB',
    '5GB' => '16GB',
    '17GB' => '32GB',
    '33GB' => '63GB',
    '64GB' => '120GB',
];

// Group the entries by their unit (GB, TB, ...).
$io = [];
foreach ($arr as $key => $val) {
    $unit = strtoupper(trim(substr($key, -2)));
    $io[$unit][$key] = $val;
}

// Compare the numeric part of the keys; comparators must return an int,
// not a bool (the spaceship operator handles that).
function cmpValue($a, $b) {
    return substr($a, 0, -2) <=> substr($b, 0, -2);
}

// Sort the keys inside each unit group numerically.
$sd = array_map(function ($ele) {
    uksort($ele, 'cmpValue');
    return $ele;
}, $io);

// Rank the unit groups themselves from smallest to largest unit.
function cmpUnit($a, $b) {
    $units = ['B' => 0, 'KB' => 1, 'MB' => 2, 'GB' => 3, 'TB' => 4, 'PB' => 5, 'EB' => 6, 'ZB' => 7, 'YB' => 8];
    return $units[$a] <=> $units[$b];
}
uksort($sd, 'cmpUnit');

$sortedArray = call_user_func_array('array_merge', $sd);
print_r($sortedArray);
While a larger set of units might need a lookup map relating the different unit abbreviations to comparable values, this question only deals with GB and TB, which happen to sort alphabetically.
For best efficiency, perform a single loop over the input values and parse out the leading number and its trailing unit expression. The remainder of each string appears to be irrelevant to accurate sorting.
Now that you have flat arrays to sort with, pass the units then the numbers into array_multisort(), then write the original array as the third parameter so that it is the "affected" array.
See also this related answer to a very similar question: PHP: Sort an array of bytes
Code: (Demo)
$nums = [];
$units = [];
foreach ($array as $v) {
    [$nums[], $units[]] = sscanf($v, '%d%[A-Z]');
}
array_multisort($units, $nums, $array);
var_export($array);
If the input is:
$array = [
'a' => '1GB - 4GB',
'b' => '1TB - 2TB',
'c' => '3TB - 5TB',
'd' => '5GB - 16GB',
'e' => '6TB+',
'f' => '17GB - 32GB',
'g' => '33GB - 63GB',
'h' => '64GB - 120GB',
'i' => '121GB - 200GB',
'j' => '201GB - 300GB',
'k' => '301GB - 500GB',
'l' => '501GB - 1TB',
];
The output is:
array (
'a' => '1GB - 4GB',
'd' => '5GB - 16GB',
'f' => '17GB - 32GB',
'g' => '33GB - 63GB',
'h' => '64GB - 120GB',
'i' => '121GB - 200GB',
'j' => '201GB - 300GB',
'k' => '301GB - 500GB',
'l' => '501GB - 1TB',
'b' => '1TB - 2TB',
'c' => '3TB - 5TB',
'e' => '6TB+',
)

Data relationship - looking for solutions

I have a problem I need to solve and I'm sure there is a way of doing this, I'm just not exactly sure "what to search for" and how to find it.
I was thinking of doing this either in Excel or I could maybe try to make a PHP script to do it.
So basically, I have a set of substances. Each pair of substances is either compatible or incompatible with each other. What I have is a table of rows and columns containing either 0 or 1, i.e. compatible/incompatible.
Now what I want to do is find groups of substances where all substances in the group are compatible with each other. The goal is to find as large a group as possible, or ideally to find the largest, second largest, etc. and sort them from largest to smallest (given there could be some limitation on the minimum number of elements in a group).
I hope it makes sense. The problem is that I'm not sure how to solve it, but I think this is something that is done relatively commonly, so I doubt the only way is writing a script/macro from scratch that uses brute force. That would also probably not be very efficient, as I have a table with over 30 elements.
So just to make it more clear, for example here is a simplified table of what my data looks like:
Substance A B C D
A 0 1 1 1
B 1 0 0 1
C 1 0 0 0
D 1 0 0 0
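For scale, the brute force mentioned above could look like the following rough sketch (the helper name is made up, and it assumes 1 marks a compatible pair; flip the check if 1 means incompatible). With ~30 substances it must test about 2^30 subsets, which is why it scales poorly:
// Rough brute-force sketch: test every subset of substances and keep the
// largest one whose members are all pairwise compatible.
function largestCompatibleGroup(array $matrix): array
{
    $names = array_keys($matrix);
    $n = count($names);
    $best = [];
    for ($mask = 1; $mask < (1 << $n); $mask++) {
        $group = [];
        for ($i = 0; $i < $n; $i++) {
            if ($mask & (1 << $i)) {
                $group[] = $names[$i];
            }
        }
        foreach ($group as $a) {
            foreach ($group as $b) {
                if ($a !== $b && $matrix[$a][$b] !== 1) {
                    continue 3; // incompatible pair: skip this subset
                }
            }
        }
        if (count($group) > count($best)) {
            $best = $group;
        }
    }
    return $best;
}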
If you use only PHP without a database, you can use uasort to sort the substances by the sum of all elements in each related array.
<?php
$substances = [
    'A' => [
        'A' => 0,
        'B' => 1,
        'C' => 1,
        'D' => 0,
    ],
    'B' => [
        'A' => 1,
        'B' => 0,
        'C' => 1,
        'D' => 1,
    ],
    'C' => [
        'A' => 0,
        'B' => 1,
        'C' => 0,
        'D' => 0,
    ]
];

// Sort so the substance with the most relations comes first.
uasort($substances, function ($a, $b) {
    $a = array_sum($a);
    $b = array_sum($b);
    if ($a == $b) {
        return 0;
    }
    return ($a > $b) ? -1 : 1;
});
var_export($substances);

Performance MySQL: Long string in one column in database table or add extra columns?

If I have a multi-dimensional array like this that I take from a form submit:
$participants = array(
    'participant1' => array(
        'name' => 'jim',
        'age' => '15',
        'grade' => '8th'),
    'participant2' => array(
        'name' => 'tom',
        'age' => '17',
        'grade' => '9th'),
    ....
);
Is it better to store the whole array in one db column named "Participants" or to create a separate column in the row for each participant, PERFORMANCE wise, if I have a maximum number of participants?
Using separate columns would be better from a normalization point of view. It is also better if you only need, say, the name or the age, because then you don't need to fetch everything.
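As a rough sketch of the separate-rows idea (the participants table and its columns are assumptions, not from the question):
// One row per participant instead of one serialized array in a single column.
$stmt = $pdo->prepare('INSERT INTO participants (name, age, grade) VALUES (?, ?, ?)');
foreach ($participants as $p) {
    $stmt->execute([$p['name'], $p['age'], $p['grade']]);
}
// Fetching a single field then stays cheap, e.g.:
// SELECT name FROM participants WHERE age = 15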

A lookup table, Store in MySQL or PHP

I have a question regarding the performance of a lookup table stored in MySQL (a standalone table) versus PHP (an array). Here is my data (in array form):
$users = array(
    array('name' => 'a', 'address' => 'abc', 'age' => '14'),
    array('name' => 'b', 'address' => 'def', 'age' => '12'),
    array('name' => 'c', 'address' => 'ghi', 'age' => '13'),
    array('name' => 'd', 'address' => 'jkl', 'age' => '14'),
    array('name' => 'd', 'address' => 'mno', 'age' => '11'),
);
It is a game on the Facebook platform, so many people may access it at the same time.
The table should have only ~100 rows, and all the data is static. If the data were stored in MySQL I could do any SELECT easily, but considering how many queries would hit MySQL, I am considering storing it in a PHP array instead. However, I don't know how to select rows matching a specific condition (I assume there is some method other than a for loop), e.g. selecting all rows with age = 14 into another array.
So, which one has better performance? (MySQL or PHP lookup table?)
If you know:
the data will always be static
that you'll only have 100 rows
that you'll only ever be using simple single-field matches in your queries
that you'll never need advanced features like joins
that you're going to get high traffic
... then I'd definitely put the logic in pure PHP. "Remove unnecessary database queries" is always going to be your first step in improving performance, and dropping these dead-simple rows into MySQL for no other reason than you don't want to write a simple foreach loop is a bad idea, I think. It's overkill. If you're using an opcode cache like APC (which you are, right?) then I don't think the performance comparison will even be close. (Though I'll always recommend actually benchmarking both yourself to be sure.)
class Users
{
    protected $_users = array(
        array('name' => 'a', 'address' => 'abc', 'age' => '14'),
        array('name' => 'b', 'address' => 'def', 'age' => '12'),
        array('name' => 'c', 'address' => 'ghi', 'age' => '13'),
        array('name' => 'd', 'address' => 'jkl', 'age' => '14'),
        array('name' => 'e', 'address' => 'mno', 'age' => '11')
    );

    public function select($field, $value)
    {
        $list = array();
        foreach ($this->_users as $user) {
            if ($user[$field] == $value) {
                $list[] = $user;
            }
        }
        return $list;
    }
}
$users = new Users();
$list = $users->select('age', 14);
PHP will definitely be faster in terms of performance, but the querying code will be more difficult to maintain. Keeping the data in MySQL adds overhead because of the network, but you can mitigate that by using MyISAM tables and the server-side query cache.
My vote is for a MyISAM table in MySQL with the query cache enabled.