Identifying the number of rows within a specific value range using PHP


I am trying to figure out how to determine the total number of rows within a specific number range using PHP.
I have a simple MySQL table with a single column. The column contains thousands of rows, each row containing a number between 0 and 100.
I figured out how to find the number of rows for a specific number, using array_count_values, but I can't figure out how to find the number of rows for a range.
For example, how many numbers are there between 60 and 80?
Here is the code that I put together to find a single value. What code should I add to find range values?
$query = "SELECT numbers FROM table";
$result = mysqli_query($conn, $query) or die("Error in $query");
$types = array();
while(($row = mysqli_fetch_assoc($result))) {
$types[] = $row['numbers'];
}
$counts = array_count_values($types);
echo $counts['12'];

If you need to count within multiple ranges, you can use UNION ALL so you don't have to send five separate queries. BETWEEN includes both endpoints, so adjacent ranges must not share a boundary value.
$query = "SELECT COUNT(numbers) FROM `table` WHERE numbers BETWEEN 0 AND 20
UNION ALL
SELECT COUNT(numbers) FROM `table` WHERE numbers BETWEEN 21 AND 40
UNION ALL
SELECT COUNT(numbers) FROM `table` WHERE numbers BETWEEN 41 AND 60
UNION ALL
SELECT COUNT(numbers) FROM `table` WHERE numbers BETWEEN 61 AND 80
UNION ALL
SELECT COUNT(numbers) FROM `table` WHERE numbers BETWEEN 81 AND 100";

You can do this several ways.
Simple way (one full table scan)
SELECT SUM(IF(x BETWEEN 20 AND 30, 1, 0)) AS b_20_30,
SUM(...) AS b_31_40,
...
FROM tableName...
will return only one row with all your results in the time of a table scan.
Fancy way (not really recommended)
If you can come up with a rule to map your intervals to a single number, for example:
0...9 => interval i = 0
10...19 => interval i = 1 => the rule is "i = FLOOR(X/10)"
20...29 => interval i = 2
...and you don't need to scan too many rows, you might do something not very maintainable like this:
SELECT SUM(FLOOR(POW(100, FLOOR(x / 10)))) AS result FROM tableName;
Here, a value of 25 (between 20 and 29) maps to i = 2, and the total sum is increased by 100^2 = 10000. As long as you never have 100 or more rows in any group, the final result is an unambiguous sum of powers: if you have, say, 17 rows between 0 and 9, 31 rows between 10 and 19, and 74 between 20 and 29, you'll get a "magical parlor trick" answer of
743117
from whence you can recover the number of rows as 74,31,17 in that order.
Using 1000 instead of 100 would yield 74031017 (and the possibility of coping with up to 999 numbers in each group).
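To make the recovery step concrete, here is a minimal PHP sketch that splits such a packed sum back into per-interval counts. The function name unpackCounts is made up for illustration, and the base passed in is assumed to match the base used in the query.

```php
<?php
// Hypothetical helper: split a packed sum back into per-interval counts.
// $base must match the base used in POW($base, FLOOR(x / 10)) in the query,
// and every interval must hold fewer than $base rows.
function unpackCounts(int $packed, int $base, int $groups): array
{
    $counts = [];
    for ($i = 0; $i < $groups; $i++) {
        $counts[$i] = $packed % $base;     // count for interval i
        $packed = intdiv($packed, $base);  // move on to the next interval
    }
    return $counts;
}

print_r(unpackCounts(743117, 100, 3));     // 17, 31, 74 for intervals 0, 1, 2
print_r(unpackCounts(74031017, 1000, 3));  // same counts with base 1000
```

Index 0 of the returned array is the 0...9 interval, so the counts come out in the opposite order from how they read in the packed number.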
Note that the use of functions inside the SELECT pretty much guarantees you'll need a full, slow table scan.
Using indexes for speed
But we can get rid of the table scan, and simplify generation, by judiciously using indexed WHEREs - this is identical, performance-wise, to a UNION, but the result is simpler since it is only one row:
SELECT (SELECT COUNT(*) FROM tableName WHERE x BETWEEN ...) AS b_20_30,
(...)
; -- no table name on the outer query
This will need several subqueries (one per interval), but those subqueries will all use an index on x where available, which can make the overall query very fast. You just need
CREATE INDEX x_ndx ON tableName(x);
The same index will greatly improve the performance of the "simple" query above, which will no longer need a table scan but only a much faster index scan.
Build the query using PHP
Supposing we have the intervals specified as convenient arrays, we can use PHP to generate the query in the first place. No need of manually entering all those SELECTs.
$intervals = [ [ 20, 30 ], [ 17, 25 ], ... ];
// $tableName is passed in explicitly so the closure below can capture it.
function queryFromIntervals(array $intervals, $tableName) {
    $index = 0;
    return 'SELECT ' .
        implode(',', array_map(
            function($interval) use ($tableName, &$index) {
                return "(SELECT COUNT(1) FROM {$tableName} WHERE x BETWEEN {$interval[0]} AND {$interval[1]}) AS b_" . ($index++);
            },
            $intervals
        ))
        // . ", (SELECT COUNT(*) FROM {$tableName}) AS total"
        . ';';
}
This will again yield a single row, with fields named b_0, b_1, and so on. They will contain the number of rows where the value of x is between the bounds of the corresponding interval in $intervals (the intervals may overlap).
So by executing the query and retrieving a tuple called $result, you might get
[ 'b_0' => 133, 'b_1' => 29 ]
meaning that there are 133 rows with x between 20 and 30, and 29 with x between 17 and 25. You can add to the query a last total field (commented in the code above):
, ... ( SELECT COUNT(*) FROM tableName ) AS total;
to also get the total number of rows.
The same function, changing the inner return value, can be adapted to generate the "simple" query which uses IFs instead of SELECTs.
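As a rough sketch of that adaptation, the function below builds the single-scan variant from the same kind of interval list. The name simpleQueryFromIntervals is an assumption, as is the idea that the table name and the column x are trusted identifiers; the bounds are cast to integers before they reach the SQL text.

```php
<?php
// Sketch: build the "simple" SUM(IF(...)) query from a list of [low, high] pairs.
// $tableName and the column name x are assumed to be trusted identifiers.
function simpleQueryFromIntervals(array $intervals, string $tableName): string
{
    $columns = [];
    foreach (array_values($intervals) as $index => $interval) {
        // Cast the bounds so that only numbers end up in the query text.
        $low  = (int) $interval[0];
        $high = (int) $interval[1];
        $columns[] = "SUM(IF(x BETWEEN {$low} AND {$high}, 1, 0)) AS b_{$index}";
    }
    return 'SELECT ' . implode(', ', $columns) . " FROM {$tableName};";
}

echo simpleQueryFromIntervals([[20, 30], [17, 25]], 'tableName'), "\n";
```

The result is again one row with fields b_0, b_1, and so on, at the cost of one scan of the table or index.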

Why not let the database do the heavy lifting for you?
$query = "SELECT COUNT(*) FROM `table` WHERE numbers BETWEEN 60 AND 80"; // table is a reserved word, so it needs backticks
$result = mysqli_query($conn, $query) or die("Error in $query");
$row = mysqli_fetch_array($result, MYSQLI_NUM);
echo $row[0];
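If the values have already been fetched into a PHP array, as in the question, the same count can also be done client-side. This is only a sketch with made-up sample data and a hypothetical function name; for thousands of rows the COUNT(*) query above is normally the better tool.

```php
<?php
// Count how many values fall within [$low, $high], both bounds included.
function countInRange(array $values, int $low, int $high): int
{
    $count = 0;
    foreach ($values as $value) {
        if ($value >= $low && $value <= $high) {
            $count++;
        }
    }
    return $count;
}

// Sample data standing in for the $types array built in the question.
$types = [12, 60, 75, 80, 81, 59, 60];
echo countInRange($types, 60, 80), "\n"; // prints 4
```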

If I understood correctly, you are trying to add a condition to your SQL query.
Have a look at https://www.w3schools.com/sql/sql_where.asp
$query = "SELECT numbers FROM `table` WHERE numbers >= 60 AND numbers <= 80";

Related

Processing millions of data records with PHP MySQL issue

I have run into slow processing times in a PHP program.
I have a MySQL database with over 1000 tables.
A table is created each time a new device is added, e.g. assets_data_imeixx up to the 1000th such table.
Each table contains about 45,000 rows, with a new record inserted every 10 seconds.
Below is my PHP code to query the database and fetch all these records based on datetime.
Issue: The program executes without error, but it takes about 1.3 to 4 minutes for very large record sets.
PHP Code:
// $ms is the mysqli connection set up in config.php (it works)
$user_id = '5';
$q = "SELECT * FROM `user_assets` WHERE `user`='".$user_id ."' ORDER BY `imei` ASC";
$r = mysqli_query($ms,$q);
$result = array(); //$result array to contain all data
while($row =mysqli_fetch_array($r)){
//fetch 7 days record
for ($i=1; $i < 7; $i++) {
$date = "-" . $i . " days";
$days_ago = date('Y-m-d', strtotime($date, strtotime('today')));
$sql1 = "SELECT * FROM assets_data_" . $row["imei"] . " WHERE dt_time LIKE '" . $days_ago . "%' LIMIT 1"; // its correct
//$result1 = $conn->query($sql1);
$result1 = mysqli_query($ms,$sql1);
$row2 = mysqli_fetch_array($result1);
echo $row['imei']." ".$row2['dt_server']."<br/>";
}
}
The code above fetches over 1000 devices from the user_assets table. Each IMEI has its own table containing over 45,000 rows of location data.
The for loop runs one query per day against each IMEI table.
The code runs without error but takes far too long to complete. I want to optimize it so that it executes in at most 5 seconds.
I need help and suggestions for optimizing queries and iteration over data at this scale.
(from Comment)
CREATE TABLE gs_object_data_863844052008346 (
dt_server datetime NOT NULL,
dt_tracker datetime NOT NULL,
lat double DEFAULT NULL,
lng double DEFAULT NULL,
altitude double DEFAULT NULL,
angle double DEFAULT NULL,
speed double...
(From Comment)
gs_object_data_072101424612
gs_object_data_072101425049
gs_object_data_072101425486
gs_object_data_072101445153
gs_object_data_111111111111111
gs_object_data_1234567894
gs_object_data_222222222222222
gs_object_data_2716325849
gs_object_data_2716345818
gs_object_data_30090515907
gs_object_data_3009072323
gs_object_data_3009073758
gs_object_data_352093088838221
gs_object_data_352093088839310
gs_object_data_352093088840045
gs_object_data_352121088128697
gs_object_data_352121088132681
gs_object_data_352621109438959
gs_object_data_352621109440203
gs_object_data_352625694095355
gs_object_data_352672102822186
gs_object_data_352672103490900
gs_object_data_352672103490975
gs_object_data_352672103490991
gs_object_data_352887074794052
gs_object_data_352887074794102
gs_object_data_352887074794193
gs_object_data_352887074794417
gs_object_data_352887074794425
gs_object_data_352887074794433
gs_object_data_352887074794441
gs_object_data_352887074794458
gs_object_data_352887074794474
gs_object_data_352887074813696
gs_object_data_352887074813712
gs_object_data_352887074813720
gs_object_data_352887074813753
gs_object_data_352887074813761
gs_object_data_352887074813803
900+ tables each having different location data.
Requirement: Loop through each table, fetch data for selected date range say:
"SELECT dt_server FROM gs_object_data_" . $row["imei"] . " WHERE dt_server BETWEEN '2022-02-05 00:00:00' AND '2022-02-12 00:00:00'";
Expected Result: Return result set containing data from each table containing information for the selected date range. That means having 1000 tables will have to be looped through each table and also fetch data in each table.

I agree with KIKO -- 1 table not 1000. But, if I understand the rest, there are really 2 or 3 main tables.
Looking at your PHP -- It is often inefficient to look up one list, then go into a loop to find more. The better way (perhaps 10 times as fast) is to have a single SELECT with a JOIN to do both selects at once.
Consider some variation of this MySQL syntax; it may avoid most of the PHP code relating to $days_ago:
CURDATE() - INTERVAL 3 DAY
After also merging the Selects, this gives you the rows for the last 7 days:
WHERE date >= CURDATE() - INTERVAL 7 DAY
(I did not understand the need for LIMIT 1; please explain.)
Yes, you can use DATETIME values as strings, but try not to. Usually DateTime functions are more efficient.
Consider "composite" indexes:
INDEX(imei, dt)
which will be very efficient for
WHERE imei = $imei
AND dt >= CURDATE() - INTERVAL 7 DAY
I would ponder ways to have less redundancy in the output; but that should mostly be done after fetching the raw data from the table(s).
Turn on the SlowLog with a low value of long_query_time; it will help you locate the worst query; then we can focus on it.
An IMEI is up to 17 characters, always digits? If you are not already using this, I suggest BIGINT since it will occupy only 8 bytes.
For further discussion, please provide SHOW CREATE TABLE for each of the main tables.
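One way to avoid treating the datetime as a string, sketched here under assumptions: compute the start and end of the day in PHP and compare the column against both bounds, instead of using LIKE 'date%'. The function name dayBounds is hypothetical, and dt_server is taken from the table definition quoted in the question.

```php
<?php
// Sketch: half-open [start, end) bounds for the day $daysAgo days before $today,
// so the query can compare dt_server directly and an index on it stays usable.
function dayBounds(DateTimeImmutable $today, int $daysAgo): array
{
    $start = $today->setTime(0, 0, 0)->modify("-{$daysAgo} days");
    $end   = $start->modify('+1 day');
    return [$start->format('Y-m-d H:i:s'), $end->format('Y-m-d H:i:s')];
}

[$from, $to] = dayBounds(new DateTimeImmutable('2022-02-12'), 7);
echo "WHERE dt_server >= '{$from}' AND dt_server < '{$to}'\n";
```

In real code the two bounds would be bound as prepared-statement parameters rather than interpolated into the query text.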

Since all those 1000 tables are the same it would make sense to put all that data into 1 table. Then partition that table on date, use proper indexes, and optimize the query.
See: Normalization of Database
Since you limit results to one user, and one row per device, it should be possible to execute a query in well below one second.

Generate a million unique random 12 digit numbers

I need to generate close to a million(100 batches of 10000 numbers) unique and random 12 digit codes for a scratch card application. This process will be repeated and will need an equal number of codes to be generated everytime.
Also the generated codes need to be entered in a db so that they can be verified later when a consumer enters this on my website. I am using PHP and Mysql to do this. These are the steps I am following
a. Get admin input on the number of batches and the codes per batch
b. Using a for loop, generate the code using mt_rand(100000000000,999999999999)
c. Check every time a number is generated to see if a duplicate exists in the db and if not add to results variable, else regenerate.
d. Save generated number in db if unique
e. Repeat b, c, and d over the required number of codes
f. Output codes to admin in a csv
Code used(removed most of the comments to make it less verbose and because I have already explained the steps earlier):
$totalLabels = $numBatch*$numLabelsPerBatch;
// file name for download
$fileName = $customerName."_scratchcodes_" . date('Ymdhs') . ".csv";
$flag = false;
$generatedCodeInfo = array();
// headers for download
header("Content-Disposition: attachment; filename=\"$fileName\"");
header("Content-Type: application/vnd.ms-excel");
$codeObject = new Codes();
//get new batch number
$batchNumber = $codeObject->getLastBatchNumber() + 1;
$random = array();
for ($i = 0; $i < $totalLabels; $i++) {
do{
$random[$i] = mt_rand(100000000000,999999999999); //need to optimize this to reduce collisions given the databse will be grow
}while(isCodeNotUnique($random[$i],$db));
$codeObject = new Codes();
$codeObject->UID = $random[$i];
$codeObject->customerName = $customerName;
$codeObject->batchNumber = $batchNumber;
$generatedCodeInfo[$i] = $codeObject->addCode();
//change batch number for next batch
if($i == ($numLabelsPerBatch-1)){$batchNumber++;}
//$generatedCodeInfo[i] = array("UID" => 10001,"OID"=>$random[$i]);
if(!$flag) {
// display column names as first row
echo implode("\t", array_keys($generatedCodeInfo[$i])) . "\n";
$flag = true;
}
// filter data
array_walk($generatedCodeInfo[$i], 'filterData');
echo implode("\t", array_values($generatedCodeInfo[$i])) . "\n";
}
function filterData(&$str)
{
$str = preg_replace("/\t/", "\\t", $str);
$str = preg_replace("/\r?\n/", "\\n", $str);
if(strstr($str, '"')) $str = '"' . str_replace('"', '""', $str) . '"';
}
function isCodeNotUnique($random){
$codeObject = new Codes();
$codeObject->UID = $random;
if(!empty($codeObject->getCodeByUID())){
return true;
}
return false;
}
Now this is taking really long to execute and I believe is not optimal.
How can I optimize so that the unique random numbers are generated quickly?
Will it be faster if the numbers were instead generated in mysql or other way rather than php and if so how do I do that?
When the db starts growing the duplicate check in step b will be really time consuming so how do I avoid that?
Is there a limit on the number of rows in mysql?
Note: The numbers need to be unique across all batches across lifetime of the application.

1) Divide your range of numbers into smaller ranges based on the number of batches. E.g. if your range is 0 - 999 and you have 10 batches, then have one batch from 0 - 99, the next from 100 - 199, etc. When you generate the numbers for a batch, only generate random numbers from that batch's range. This way you know that duplicates can only occur within a batch.
Do not insert each number into the database individually, but store them in an array. When you generate a new random number, then check against the array, not the database using in_array() function. When the batch is complete, then use a single insert statement to insert the contents of the batch:
insert into yourtable (bignumber) values (1), (2), ..., (n)
Check MySQL's max_allowed_packet setting to see if it is able to receive the complete sql statement in one go.
Implement a fallback plan, just in case a duplicate value is still found during the insert (error handling and number regeneration).
2) MySQL is not that great on procedural stuff, so I would stick with an external language, such as php.
3) Add a unique index on the field containing the random numbers. If you try to insert a duplicate record, MySQL will prevent it and throw an error. It is really quick.
4) Depending on the actual table engine used (innodb, myisam, etc), its configuration, and the OS, certain limits may apply on the size of the table. See Maximum number of records in a MySQL database table question here on SO for a more detailed answer (check the most upvoted answer, not the accepted one).
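A minimal sketch of points 1 and 3 combined, with two deliberate substitutions: duplicates are tracked with array keys and isset() rather than in_array(), since a key lookup does not scan the array, and random_int() is used instead of mt_rand(). The function names are invented for illustration, yourtable and bignumber are the placeholder names from the INSERT above, and 64-bit PHP is assumed.

```php
<?php
// Sketch: collect unique 12-digit codes, using array keys for the duplicate check.
// $existing holds codes already issued (assumed to be loaded by the caller).
function generateCodes(int $count, array $existing = []): array
{
    $seen  = array_fill_keys($existing, true);
    $codes = [];
    while (count($codes) < $count) {
        $code = random_int(100000000000, 999999999999);
        if (!isset($seen[$code])) {
            $seen[$code] = true;
            $codes[] = $code;
        }
    }
    return $codes;
}

// Build one multi-row INSERT for a chunk of codes (values forced to integers).
function buildInsert(array $codes): string
{
    return 'INSERT INTO yourtable (bignumber) VALUES ('
        . implode('), (', array_map('intval', $codes)) . ')';
}

// Usage: ten codes, sent as two statements of five rows each.
foreach (array_chunk(generateCodes(10), 5) as $chunk) {
    echo buildInsert($chunk), "\n";
}
```

The unique index remains the final safeguard; the in-memory check only reduces how often the insert has to be retried.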

You can do the following:
$random = getExistingCodes(); // Get what you already have (from the DB).
$random = array_flip($random); //Make them into keys
$existingCount = count($random); //The codes you already have
do {
$random[mt_rand(100000000000,999999999999)] = 1;
} while ((count($random)-$existingCount) < $totalLabels);
$random = array_keys($random);
When you generate a duplicate number it will just overwrite that key and not increase the count.
To insert you can start a transaction and do as many inserts as needed. MySQL will try to optimize all operations within a single transaction.

Here is a query that generates 1 million pseudo-random numbers without repetitions:
select cast( (@n := (13*@n + 97) % 899999999981)+1e11 as char(12)) as num
from (select @n := floor(rand() * 9e11) ) init,
(select 1 union select 2) m01,
(select 1 union select 2) m02,
(select 1 union select 2) m03,
(select 1 union select 2) m04,
(select 1 union select 2) m05,
(select 1 union select 2) m06,
(select 1 union select 2) m07,
(select 1 union select 2) m08,
(select 1 union select 2) m09,
(select 1 union select 2) m10,
(select 1 union select 2) m11,
(select 1 union select 2) m12,
(select 1 union select 2) m13,
(select 1 union select 2) m14,
(select 1 union select 2) m15,
(select 1 union select 2) m16,
(select 1 union select 2) m17,
(select 1 union select 2) m18,
(select 1 union select 2) m19,
(select 1 union select 2) m20
limit 1000000;
How it works
It starts by generating a random integer value n with 0 <= n < 900000000000. This number serves as the seed for the generated sequence:
@n := floor(rand() * 9e11)
Through multiple (20) joins with inline pairs of records, this single record is multiplied to 2^20 copies (1,048,576), which is just a bit over 1 million.
Then the selection starts, and as record after record is fetched, the value of the @n variable is modified according to this incremental formula:
@n := (13*@n + 97) % 899999999981
This formula is a linear congruential generator. The three constant numbers need to obey some rules to maximise the period (of non-repetition), but it is the easiest when 899999999981 is prime, which it is. In that case we have a period of 899999999981, meaning that the first 899999999981 generated numbers will be unique (and we need much less). This number is in fact the largest prime below 900000000000.
As a final step, 100000000000 is added to the number to ensure the number always has 12 digits, so excluding numbers that are smaller than 100000000000. Because of the choice of 899999999981 there will be 19 numbers that will never be generated, namely those between 999999999981 and 999999999999 inclusive.
As this generates 2^20 records, the limit clause will make sure this is chopped off to exactly one million records.
The cast to char(12) is optional, but may be necessary to visualise the 12-digit numbers without them being rendered on the screen in scientific notation. If you will use this to insert records, and the target data type is numeric, then you would leave out this conversion of course.
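For comparison, the same recurrence can be sketched in PHP. This is an illustration rather than a replacement for the query: the function name lcgCodes is made up, 64-bit integers are assumed, and a fixed seed is used so the output is reproducible. A sequence like this is predictable once a few values are known, so it may not suit codes that carry monetary value.

```php
<?php
// Sketch of the recurrence n := (13*n + 97) % 899999999981 in PHP.
function lcgCodes(int $seed, int $count): array
{
    $modulus = 899999999981;
    $n = $seed % $modulus;
    $codes = [];
    for ($i = 0; $i < $count; $i++) {
        $n = (13 * $n + 97) % $modulus;  // next state
        $codes[] = $n + 100000000000;    // shift into the 12-digit range
    }
    return $codes;
}

print_r(lcgCodes(0, 3)); // 100000000097, 100000001358, 100000017751
```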

CREATE TABLE x (v BIGINT(12) ZEROFILL NOT NULL PRIMARY KEY);
INSERT IGNORE INTO x (v) VALUES
(FLOOR(1e12*RAND())), (FLOOR(1e12*RAND())), (FLOOR(1e12*RAND())),
(FLOOR(1e12*RAND())), (FLOOR(1e12*RAND())), (FLOOR(1e12*RAND())),
(FLOOR(1e12*RAND())), (FLOOR(1e12*RAND())), (FLOOR(1e12*RAND())),
(FLOOR(1e12*RAND())), (FLOOR(1e12*RAND())), (FLOOR(1e12*RAND())),
(FLOOR(1e12*RAND())), (FLOOR(1e12*RAND())), (FLOOR(1e12*RAND()));
Do that INSERT 1e6/15 times.
Check COUNT(*) to see if you have a million. Do this until the table has a million rows:
INSERT IGNORE INTO x (v) VALUES
(FLOOR(1e12*RAND()));
Notes:
ZEROFILL is assuming that you want the display to have leading zeros.
IGNORE is because there will be some number of duplicates. This avoids the costly check after each insert.
"Batch insert" is faster than one row at a time. (Doing 100 at a time is about optimal, but I am lazy.)
Potential problem: While I think the pattern of values for RAND() does not repeat at, say 2^16 or 2^32 values, I do not know for a fact. If you can't get to a million, then the random number generator is bad; you should switch to PHP's rand, or something else.
Beware of linear congruential random number generators. They are probably easily hacked. (I assume there is some "money" behind the scratch cards.)

Do not plan on mt_rand() being unique for small ranges
<?php
// Does mt_rand() repeat?
TryMT(100);
TryMT(100);
TryMT(1000);
TryMT(10000);
TryMT(1e6);
TryMT(1e8);
TryMT(1e10);
TryMT(1e12);
TryMT(1e14);
function TryMT($max) {
$h = [];
for ($j = 0; $j<$max; $j++) {
$v = mt_rand(1, $max);
if (isset($h[$v])) {
echo "Dup after $j iterations (limit=$max)<br>\n";
return;
}
$h[$v] = 1;
}
}
Sample output:
Dup after 7 iterations (limit=100)<br>
Dup after 13 iterations (limit=100)<br>
Dup after 29 iterations (limit=1000)<br>
Dup after 253 iterations (limit=10000)<br>
Dup after 245 iterations (limit=1000000)<br>
Dup after 3407 iterations (limit=100000000)<br>
Dup after 29667 iterations (limit=10000000000)<br>
Dup after 82046 iterations (limit=1000000000000)<br>
Dup after 42603 iterations (limit=1.0E+14)<br>
mt_rand() is a "good" random number generator because it does have dups.
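The early duplicates are what the birthday problem predicts: drawing uniformly from N values, the first repeat is expected after roughly sqrt(pi * N / 2) draws. The sketch below computes that estimate; the function name is hypothetical. For the larger limits the sample figures fall well below the estimate, which may mean the generator in that run supplied fewer distinct values than the requested range, though that reading is an assumption.

```php
<?php
// Approximate number of uniform draws from $n values before the first repeat
// (birthday-problem estimate).
function expectedDrawsUntilDuplicate(float $n): float
{
    return sqrt(M_PI * $n / 2);
}

foreach ([100, 1e4, 1e6, 1e12] as $n) {
    printf("limit=%.0f: about %.0f draws\n", $n, expectedDrawsUntilDuplicate($n));
}
```

For a million codes drawn from a 12-digit range, some collisions should therefore be expected, which is why a unique index or an explicit duplicate check is still needed.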

Insert random number AND selected rows (php, mysql)

As the headline states, I'd like to know how to insert both a random number generated with php, and selected lines from another table. Example:
<?php
$randomid = (rand(1,1000000));
$sql = "INSERT INTO example2 (randomid, userid, name)
VALUES ('$randomid')
SELECT userid, name
FROM example1
WHERE name='Donald' "
$mysqli->query($sql);
?>
I'm not sure how to go about this. Must I divide this into an insert and an update query?

SELECT in MySQL can be used to output message/static values like in this example
SELECT CASE
WHEN userid<100 THEN 'less than 100'
WHEN userid<200 THEN 'less than 200'
ELSE 'greater than 200'
END AS message, userid
FROM mytable
so in your example you can just do the same
$randomid = (rand(1,1000000)); // <--- imagine 25 was returned
$sql = "INSERT INTO example2 (randomid, userid, name)
SELECT ".$randomid.", userid, name
FROM example1
WHERE name='Donald' "; // <--- your SELECT now looks like 'SELECT 25, userid, name'
However, there is a downside: every row with the name Donald gets the same value, so if you have multiple Donalds it defeats the purpose of a random value, unless you limit the insert to one row at a time so that the PHP rand function is called again for each row.
A better way to do this is with MySQL's own RAND function:
Returns a random floating-point value v in the range 0 <= v < 1.0.
Of course, since this function returns a decimal/float value, which isn't ideal for an integer key, we want to turn it into an integer by multiplying it and applying FLOOR, like this:
FLOOR(RAND()*1000000) AS randomid
RAND() gives us a value between 0 and 1; we multiply it by 1000000 and then round it down to a whole number using FLOOR. Unlike the PHP code, a new number is created for every row, so 15 Donalds will get 15 separately generated random ids. There is still a possibility of getting identical numbers, but such is the nature of random numbers.

Mysql Select random rows to n times

I want to get 3 records from my table in random order, repeated so that there are 90 rows in total.
Scenario
user_id number_of_bids
12 20
8 40
6 30
What I want is to get the above 3 rows in random order, repeated up to a specific total, which is in fact sum(number_of_bids).
No row should be repeated more times than its number_of_bids.
I have already written a query that gets the sum of number_of_bids. Now I need a second query that returns these 3 records in random order, sum(number_of_bids) times in total, with no record repeated more often than its number_of_bids.
I am not sure whether this can be achieved in one query, but I am sure the experts here can help. It would save me execution time.
Thanks..

I would just build an array out of the rows and shuffle it:
$stmt = $db->query('SELECT user_id, number_of_bids FROM table_name', PDO::FETCH_KEY_PAIR);
$results = array(); $startIndex = 0;
foreach ($stmt as $userId => $numberOfBids) {
$results += array_fill($startIndex, $numberOfBids, $userId);
$startIndex += $numberOfBids;
}
shuffle($results);
Then, you can iterate $results however you'd like.
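To see the effect without a database, here is a self-contained sketch that hard-codes the three rows from the question. The function name expandAndShuffle is made up, and array_merge is used here in place of the += with a running start index.

```php
<?php
// Rows hard-coded from the question's table: user_id => number_of_bids.
$bids = [12 => 20, 8 => 40, 6 => 30];

// Repeat each user id once per bid, then shuffle the combined list.
function expandAndShuffle(array $bids): array
{
    $results = [];
    foreach ($bids as $userId => $numberOfBids) {
        $results = array_merge($results, array_fill(0, $numberOfBids, $userId));
    }
    shuffle($results);
    return $results;
}

$results = expandAndShuffle($bids);
echo count($results), "\n";             // 90 entries in total
print_r(array_count_values($results));  // 12 => 20, 8 => 40, 6 => 30 (key order varies)
```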

PHP function - custom string length function

In php what is a function to only display strings that have a length greater than 50 characters, truncate it to not display more than 130 characters and limit it to one result?
so for example say i have 30 rows in a result set but I only want to show the newest row that have these parameters. If the newest row has 25 characters it should not display. It should only display the newest one that has a string length of 50 or more characters.

Use an SQL query. To find the newest row, sort on either an auto_increment primary key (I'll call it id) or a date/time column recording when the row was created (say, time_created), and keep only the first row.
So I am assuming a table with: id (int), stringVal (string, char(), varchar(), whatever)
SELECT id, SUBSTRING(stringVal, 1, 130) AS stringVal
FROM yourTable
WHERE CHAR_LENGTH(stringVal) >= 50
ORDER BY id DESC
LIMIT 1
Replace id with a time field if you have to. You're going to have a hard time finding the newest without one of them, but you can always arbitrarily pick one row.
--Edit-- a sample of using the mysqli functions in PHP to run the above query and fetch the desired output
$sql = "SELECT id, SUBSTRING(stringVal, 1, 130) AS stringVal FROM yourTable WHERE CHAR_LENGTH(stringVal) >= 50 ORDER BY id DESC LIMIT 1";
$r = mysqli_query($conn, $sql); // I'm hoping $conn or something like it is already set up
$row = mysqli_fetch_assoc($r);
$desiredString = $row ? $row['stringVal'] : null; // null when no row is long enough

Something like this should do just make sure that you grab your data sorting by the newest items first. The break statement will ensure that the loop is terminated after the first result matching your criteria is found...
foreach($array_returned_from_query as $row)
{
if(strlen($row) > 50)
{
echo substr($row, 0, 130);
break;
}
}
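The same loop can be wrapped in a small function so the thresholds are explicit. This is a sketch: the function name is hypothetical, the rows are assumed to be plain strings ordered newest first, and strlen/substr count bytes, so multibyte text would need the mb_* equivalents.

```php
<?php
// Return the first string of at least $minLength characters, cut to $maxLength,
// or null when no string qualifies. $rows is assumed to be ordered newest first.
function firstLongEnough(array $rows, int $minLength = 50, int $maxLength = 130): ?string
{
    foreach ($rows as $row) {
        if (strlen($row) >= $minLength) {
            return substr($row, 0, $maxLength);
        }
    }
    return null;
}

// Usage with made-up data: the 25-character row is skipped, the next one is cut to 130.
$rows = [str_repeat('a', 25), str_repeat('b', 200), str_repeat('c', 60)];
echo strlen(firstLongEnough($rows)), "\n"; // prints 130
```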
