Calculate age: PHP vs MySQL, which method is better? - php

I have around 500,000 personal profile records in a MySQL database, each with a birthdate column (dob). Since I need the age for each profile, I have to calculate it dynamically, which I can do either in PHP (date_diff(date_create($dob), date_create('today'))->y) or in SQL ('SELECT TIMESTAMPDIFF(YEAR, dob, CURDATE()) AS age').
Which of the two is faster or preferred, especially with hundreds of thousands of rows?

In general, the best approach is to do such calculations on the database server.
The ideal approach would be to use a generated column. This has been available since MySQL 5.7.5, and would be expressed as:
alter table t add age int unsigned as
(TIMESTAMPDIFF(YEAR, dob, CURDATE()));
Alas, you can only use deterministic functions for generated columns. curdate() and now() are not deterministic, because their values can change with each call.
The next best thing is to use a view:
create view v_t as
select t.*,
TIMESTAMPDIFF(YEAR, dob, CURDATE()) as age
from t;
Then, when you query the view, you'll have the age. This is true no matter where you query it. And it is the same logic everywhere.
The only caveat to doing the calculation on the server is that it uses server time, rather than local application time. If that is an issue, then that is a strong argument for doing the calculation locally.
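One possible middle ground, if server time is the concern, is to keep the calculation in SQL but pass the application's date in as a parameter instead of calling CURDATE(). A minimal sketch, assuming a PDO connection in $db; the helper name app_today and the time zone used are hypothetical:

```php
<?php
// Hypothetical helper: today's date in the application's time zone, as 'Y-m-d'.
function app_today(string $timezone): string
{
    return (new DateTimeImmutable('now', new DateTimeZone($timezone)))->format('Y-m-d');
}

// Same expression as in the view, with CURDATE() swapped for a bound parameter.
$sql = 'SELECT t.*, TIMESTAMPDIFF(YEAR, dob, ?) AS age FROM t';
$params = [app_today('America/New_York')]; // time zone chosen only for illustration

// With a PDO connection in $db (assumed), this would run as:
// $stmt = $db->prepare($sql);
// $stmt->execute($params);
```

This cannot be baked into a view, since a view takes no parameters, so the query has to be issued by the application.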

Here is a test:
Create a table with 100K random dates
drop table if exists birthdays;
create table birthdays (
id int auto_increment primary key,
dob date
);
insert into birthdays (dob)
select '1950-01-01' + interval floor(rand(1)*68*365) day as dob
from information_schema.COLUMNS c1
, information_schema.COLUMNS c2
, information_schema.COLUMNS c3
limit 100000
;
Run this PHP script
<?php
header('Content-type: text/plain');
$db = new PDO("mysql:host=localhost;dbname=test", "test","");
### SQL
$starttime = microtime(true);
$stmt = $db->query("SELECT id, dob, TIMESTAMPDIFF(YEAR, dob, CURDATE()) AS age FROM birthdays");
$data = $stmt->fetchAll(PDO::FETCH_OBJ);
$runtime = microtime(true) - $starttime;
echo "SQL: $runtime \n";
### PHP
$starttime = microtime(true);
$stmt = $db->query("SELECT id, dob FROM birthdays");
$data = $stmt->fetchAll(PDO::FETCH_OBJ);
foreach ($data as $row) {
$row->age = date_diff(date_create($row->dob), date_create('today'))->y;
}
$runtime = microtime(true) - $starttime;
echo "PHP: $runtime \n";
Result:
SQL: 0.19094109535217
PHP: 1.203684091568
It looks like the SQL solution is 6 times faster. But that is not quite true. If we remove the code which calculates the age from both solutions, we will get something like 0.1653790473938. That means the overhead for SQL is 0.025 sec, while for PHP it is 1.038 sec. So SQL is 40 times faster in this test.
Note: There are faster ways to calculate the age in PHP. For example
$d = date('Y-m-d');
$row->age = substr($d, 0, 4) - substr($row->dob, 0, 4) - (substr($row->dob, 5) > substr($d, 5) ? 1 : 0);
is about four times faster, while date('Y-m-d') consumes more than 80% of the time. If you find a way to avoid any date function, you might get close to the performance of MySQL.
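Building on that note, here is a minimal sketch of an age calculation that keeps date functions out of the loop by computing today's date once. The helper name age_on and the sample rows are hypothetical; both arguments are assumed to be zero-padded 'Y-m-d' strings, which is how MySQL returns a DATE column.

```php
<?php
// Hypothetical helper: age in whole years on $today for a person born on $dob.
// Both arguments are 'Y-m-d' strings.
function age_on(string $dob, string $today): int
{
    // Difference between the calendar years.
    $years = (int) substr($today, 0, 4) - (int) substr($dob, 0, 4);
    // 'm-d' parts are zero-padded, so string comparison orders them correctly.
    // If the birthday has not occurred yet this year, subtract one.
    if (substr($dob, 5) > substr($today, 5)) {
        $years--;
    }
    return $years;
}

$today = date('Y-m-d'); // evaluated once, not once per row

// Sample rows standing in for the fetched result set.
$rows = [
    ['id' => 1, 'dob' => '1990-05-20'],
    ['id' => 2, 'dob' => '2001-12-31'],
];
foreach ($rows as &$row) {
    $row['age'] = age_on($row['dob'], $today);
}
unset($row); // break the reference left by the loop
```

Whether this gets close to the SQL timing would need to be measured with the benchmark script above; the numbers there do not cover this variant.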

If you want to get all 500,000 records, do this in MySQL, because it performs better than PHP.
But if you only want some of the data (for example, 10 records), do it in PHP: it is easier to handle, and the performance difference is negligible.

Related

Processing millions of data records with PHP MySQL issue

I have run into slow processing times in a PHP program.
I have a MySQL database with over 1000 tables.
A table is created each time a new device is added, e.g. assets_data_imeixx, up to the 1000th such table.
Each table contains about 45,000 rows, with records inserted every 10 seconds.
Below is my PHP code to query the database and fetch these records based on datetime.
Issue: The program executes without error, but it takes about 1.3 to 4 minutes for very large record sets.
PHP Code:
// $ms is the mysqli connection set up in config.php; $ms is OKAY
$user_id = '5';
$q = "SELECT * FROM `user_assets` WHERE `user`='".$user_id ."' ORDER BY `imei` ASC";
$r = mysqli_query($ms,$q);
$result = array(); //$result array to contain all data
while($row =mysqli_fetch_array($r)){
//fetch 7 days record
for ($i=1; $i < 7; $i++) {
$date = "-" . $i . " days";
$days_ago = date('Y-m-d', strtotime($date, strtotime('today')));
$sql1 = "SELECT * FROM assets_data_" . $row["imei"] . " WHERE dt_time LIKE '" . $days_ago . "%' LIMIT 1"; // its correct
//$result1 = $conn->query($sql1);
$result1 = mysqli_query($ms,$sql1);
$row2 = mysqli_fetch_array($result1);
echo $row['imei']." ".$row2['dt_server']."<br/>";
}
}
The code above fetches over 1000 devices from the user_assets table. Each of these IMEIs has its own table, which contains over 45,000 records of location data.
The for loop iterates over each IMEI table and its records.
The code runs without error but takes far too long to complete. I want to optimize it so that it executes in a very short time, 5 seconds at most.
I need help and suggestions on optimizing this large-scale data processing and iteration.
(from Comment)
CREATE TABLE gs_object_data_863844052008346 (
dt_server datetime NOT NULL,
dt_tracker datetime NOT NULL,
lat double DEFAULT NULL,
lng double DEFAULT NULL,
altitude double DEFAULT NULL,
angle double DEFAULT NULL,
speed double...
(From Comment)
gs_object_data_072101424612
gs_object_data_072101425049
gs_object_data_072101425486
gs_object_data_072101445153
gs_object_data_111111111111111
gs_object_data_1234567894
gs_object_data_222222222222222
gs_object_data_2716325849
gs_object_data_2716345818
gs_object_data_30090515907
gs_object_data_3009072323
gs_object_data_3009073758
gs_object_data_352093088838221
gs_object_data_352093088839310
gs_object_data_352093088840045
gs_object_data_352121088128697
gs_object_data_352121088132681
gs_object_data_352621109438959
gs_object_data_352621109440203
gs_object_data_352625694095355
gs_object_data_352672102822186
gs_object_data_352672103490900
gs_object_data_352672103490975
gs_object_data_352672103490991
gs_object_data_352887074794052
gs_object_data_352887074794102
gs_object_data_352887074794193
gs_object_data_352887074794417
gs_object_data_352887074794425
gs_object_data_352887074794433
gs_object_data_352887074794441
gs_object_data_352887074794458
gs_object_data_352887074794474
gs_object_data_352887074813696
gs_object_data_352887074813712
gs_object_data_352887074813720
gs_object_data_352887074813753
gs_object_data_352887074813761
gs_object_data_352887074813803
900+ tables each having different location data.
Requirement: Loop through each table, fetch data for selected date range say:
"SELECT dt_server FROM gs_object_data_" . $row["imei"] . " WHERE dt_server BETWEEN '2022-02-05 00:00:00' AND '2022-02-12 00:00:00'";
Expected Result: Return result set containing data from each table containing information for the selected date range. That means having 1000 tables will have to be looped through each table and also fetch data in each table.
I agree with KIKO -- 1 table not 1000. But, if I understand the rest, there are really 2 or 3 main tables.
Looking at your PHP -- It is often inefficient to look up one list, then go into a loop to find more. The better way (perhaps 10 times as fast) is to have a single SELECT with a JOIN to do both selects at once.
Consider some variation of this MySQL syntax; it may avoid most of the PHP code relating to $days_ago:
CURDATE() - INTERVAL 3 DAY
After also merging the Selects, this gives you the rows for the last 7 days:
WHERE date >= CURDATE() - INTERVAL 7 DAY
(I did not understand the need for LIMIT 1; please explain.)
Yes, you can use DATETIME values as strings, but try not to. Usually DateTime functions are more efficient.
Consider "composite" indexes:
INDEX(imei, dt)
which will be very efficient for
WHERE imei = $imei
AND dt >= CURDATE() - INTERVAL 7 DAY
I would ponder ways to have less redundancy in the output; but that should mostly be done after fetching the raw data from the table(s).
Turn on the SlowLog with a low value of long_query_time; it will help you locate the worst query; then we can focus on it.
An IMEI is up to 17 characters, always digits? If you are not already using this, I suggest BIGINT since it will occupy only 8 bytes.
For further discussion, please provide SHOW CREATE TABLE for each of the main tables.
Since all those 1000 tables are the same it would make sense to put all that data into 1 table. Then partition that table on date, use proper indexes, and optimize the query.
See: Normalization of Database
Since you limit results to one user, and one row per device, it should be possible to execute a query in well below one second.
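A rough sketch of the single-query approach these answers describe. It assumes the per-device tables have been merged into one table, here called assets_data, with an imei column and an index on (imei, dt_server). That table name, its layout, and the helper name recent_positions_query are all hypothetical, not the asker's real schema.

```php
<?php
// Hypothetical consolidated table: assets_data(imei, dt_server, ...),
// with INDEX(imei, dt_server). user_assets is the table from the question.
function recent_positions_query(int $userId, int $days): array
{
    if ($days < 1) {
        throw new InvalidArgumentException('days must be at least 1');
    }
    // $days is a validated integer, so embedding it in the SQL text is safe.
    $sql = "SELECT ua.imei, d.dt_server
            FROM user_assets ua
            JOIN assets_data d ON d.imei = ua.imei
            WHERE ua.`user` = ?
              AND d.dt_server >= CURDATE() - INTERVAL {$days} DAY
            ORDER BY ua.imei, d.dt_server";
    return [$sql, [$userId]];
}

[$sql, $params] = recent_positions_query(5, 7);

// With the mysqli connection $ms from the question (assumed):
// $stmt = mysqli_prepare($ms, $sql);
// mysqli_stmt_bind_param($stmt, 'i', $params[0]);
// mysqli_stmt_execute($stmt);
```

One round trip replaces the nested loop of per-table queries, and the range condition on dt_server can use the composite index instead of a LIKE on the date string.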

PHP / SQL - A lot of SQL Queries

On my website, I want to use a lot of different data from my database. Currently, I'm using four queries to gather different data. But is there a way to make it more efficient and put them into one big query? And how would I do that?
Edit: So the answer was to simply put all queries together into one and use as much data manipulation as possible in the database queries, and not in php.
$qry = "SELECT COUNT(*) cnt,
AVG(level) avg_lvl,
SUM(IF(onlinestatus=1, 1, 0)) online_cnt,
(SELECT Max(time) FROM refreshes) refresh_time
FROM players";
foreach ($db->query($qry) as $row){
$amount_total = $row['cnt'];
$average_level = floor($row['avg_lvl']);
$online_amount = $row['online_cnt'];
$milliseconds = $row['refresh_time'] + 1800000;
$update_time = DateTime::createFromFormat('U', intval($milliseconds / 1000));
}
You could combine all queries into one, like this:
$qry = "SELECT COUNT(*) cnt,
AVG(level) avg_lvl,
SUM(IF(onlinestatus=1, 1, 0)) online_cnt,
(SELECT Max(time) FROM refreshes) refresh_time
FROM rookstayers";
foreach ($db->query($qry) as $row){
$amount_total = $row['cnt'];
$level = $row['avg_lvl'];
$online_amount = $row['online_cnt'];
$milliseconds = $row['refresh_time'] + 1800000;
$update_time = DateTime::createFromFormat('U', intval($milliseconds / 1000));
}
The last query you have seems to assume there is only one record in the result, as the loop would overwrite the previous result in each iteration. And as there is no order by in that query, it would be a bit of a gamble what the outcome would be. So I have taken the most recent time from the table in case there are multiple records there.
Note that the above loop only executes once, as the query is guaranteed to return exactly one row.
The first and third queries can be combined into one:
select count(*) as num, sum(onlinestatus = 1) as numOnline
from rookstayers;
The second should be an aggregation:
select level, count(*) as cnt
from rookstayers
group by level;
The fourth is also an aggregation; I'm not sure exactly what the data looks like, but it seems to be something like:
select sum(time + 1800000)
from refreshes;
In general, you should do as much data manipulation in the database as you can. That is what databases are designed for.
EDIT:
The first, second, and third can be combined into:
select count(*) as num, sum(onlinestatus = 1) as numOnline,
avg(level) as avgLevel
from rookstayers;

PHP insert into MySQL after 30min

I have MySQL table:
and I want to add next row (from script in page) with values:
ip: 178.40.12.36
time: 2014-01-22 14:08:04
browser: Google Chrome
browser_version: 32.0.1700.76
platform: windows
country: Slovakia
Question: How can I write a MySQL query that inserts only if the last insert with the same identifier (IP + browser + platform) was at least 30 minutes ago?
My current insert (pseudo code):
$exist = SELECT *
FROM table
WHERE ip = $ip
AND browser = $browser
AND platform = $platform
if(!$exist) {
INSERT INTO table ...
}
My Idea:
$exist = SELECT ...
...
AND time < $time - 30MIN
Note: How to write this in MySQL?
You may use this as indicator:
SELECT
COUNT(1)
FROM `t`
WHERE `ip` = '$ip'
AND `browser` = '$browser'
AND `platform` = '$platform'
AND `time`>NOW() - INTERVAL 30 MINUTE
I've replaced time with NOW() for the current time, but you may wish to count from your last time value.
It selects records that are newer than 30 minutes; if the count is positive, you don't need to insert a new row.
Yes, it's easy.
AND time > NOW() - INTERVAL 30 MINUTE
There are many choices like this for date arithmetic.
You could just filter the SELECT for the INSERT:
INSERT INTO `Table` ( ... )
SELECT $ip, $time, $browser, $browser_version, $platform, $country
FROM `Other_Table`
WHERE ip = $ip AND browser = $browser AND platform = $platform AND
time < $time - 30MIN
Now, clearly that syntax won't work exactly, but you get the idea. If the time isn't 30MIN or more ago then it will return 0 records to INSERT.
This will avoid the need of performing a COUNT or EXISTS first; it can be done in one statement.
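The one-statement idea can be sketched with a prepared statement as follows. This is an illustration under assumptions, not the answerer's exact query: it uses the table `t` and NOW() from the first answer, the column names from the question, and a hypothetical helper record_visit that takes a PDO connection.

```php
<?php
// Insert a row only if no row with the same ip/browser/platform
// was written in the last 30 minutes. Table `t` and NOW() follow the
// first answer; the column names come from the question.
const INSERT_IF_STALE_SQL = <<<'SQL'
INSERT INTO `t` (`ip`, `time`, `browser`, `browser_version`, `platform`, `country`)
SELECT ?, NOW(), ?, ?, ?, ?
FROM DUAL
WHERE NOT EXISTS (
    SELECT 1 FROM `t`
    WHERE `ip` = ? AND `browser` = ? AND `platform` = ?
      AND `time` > NOW() - INTERVAL 30 MINUTE
)
SQL;

// Hypothetical helper: returns true if a row was inserted.
function record_visit(PDO $db, array $v): bool
{
    $stmt = $db->prepare(INSERT_IF_STALE_SQL);
    $stmt->execute([
        // values for the new row
        $v['ip'], $v['browser'], $v['browser_version'], $v['platform'], $v['country'],
        // values for the NOT EXISTS check
        $v['ip'], $v['browser'], $v['platform'],
    ]);
    return $stmt->rowCount() === 1;
}
```

Two simultaneous requests could still both pass the NOT EXISTS check, so this sketch does not by itself guarantee a single row under concurrency.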

MySQL & PHP: summing up data from a table

Okay guys, this probably has an easy answer but has been stumping me for a few hours now.
I am using PHP/HTML to generate a table from a MySQL Table. In the MySQL table (TimeRecords) I have a StartTime and EndTime column. In my SELECT statement I am subtracting the EndTime from the StartTime and aliasing that as TotalHours. Here is my query thus far:
$query = "SELECT *,((EndTime - StartTime)/3600) AS TotalPeriodHours
FROM TimeRecords
WHERE Date
BETWEEN '{$CurrentYear}-{$CurrentMonth}-1'
AND '{$CurrentYear}-{$CurrentMonth}-31'
ORDER BY Date
";
I then loop that through an HTML table. So far so good. What I would like to do is to add up all of the TotalHours and put that into a separate DIV. Any ideas on 1) how to write the select statement and 2) where to call that code from the PHP/HTML?
Thanks in advance!
Try this
$query= "
SELECT ((EndTime - StartTime)/3600) AS Hours, otherFields, ...
FROM TimeRecords
WHERE
Date BETWEEN '{$CurrentYear}-{$CurrentMonth}-1'
AND '{$CurrentYear}-{$CurrentMonth}-31' ";
$records =mysql_query($query);
$sum= 0;
while($row=mysql_fetch_array($records))
{
echo $row['otherFields'];
echo $row['Hours'];
$sum+=$row['Hours'];
}
echo" Total Hours : $sum ";
Just use a single query with a Sum(). You could also manually calculate it if you're already displaying all rows. (If paginating or using LIMIT, you'll need a separate query like below.)
$query = "
SELECT Sum(((EndTime - StartTime)/3600)) AS SumTotalPeriodHours
FROM TimeRecords
WHERE
Date BETWEEN '{$CurrentYear}-{$CurrentMonth}-1'
AND '{$CurrentYear}-{$CurrentMonth}-31'
";
You can do this in the same query if you have a unique id using GROUP BY WITH ROLLUP
$query = "
SELECT unique_id,SUM((EndTime - StartTime)/3600) AS TotalPeriodHours
FROM TimeRecords
WHERE Date BETWEEN '{$CurrentYear}-{$CurrentMonth}-1'
AND '{$CurrentYear}-{$CurrentMonth}-31'
GROUP BY unique_id WITH ROLLUP
";
In this instance the last row returned by your query will contain NULL and the overall total. If you don't have a unique ID you will need to do it in PHP as per Naveen's answer.
A few comments on your code:
Using SELECT * is not considered good practice. SELECT the columns you need.
Not all months have a day 31 so this may produce unexpected results. If you're using PHP5.3+, you can use
$date = new DateTime();
$endDate = $date->format( 'Y-m-t' );
The "t" flag here gets the last day of that month. See PHP docs for more on DateTime.
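As a small sketch of that suggestion, the helper below (month_range, a hypothetical name) returns both bounds for the BETWEEN clause, so the query no longer hard-codes day 31:

```php
<?php
// Hypothetical helper: first and last day of a month as 'Y-m-d' strings.
function month_range(int $year, int $month): array
{
    $first = new DateTimeImmutable(sprintf('%04d-%02d-01', $year, $month));
    // 't' is the number of days in the month, which is also its last day.
    return [$first->format('Y-m-d'), $first->format('Y-m-t')];
}

[$from, $to] = month_range(2024, 2);
// $from is '2024-02-01' and $to is '2024-02-29' (2024 is a leap year)
```

The two values can then be used in place of the '...-1' and '...-31' strings in the WHERE clause, ideally as bound parameters.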

Find if a business is open: MySQL hours calculation

I have a list of business stored in a locations table, and stored in that table are hours the business opens and closes:
location
`mon_1_open`
`mon_1_closed`
`tue_1_open`
`tue_1_closed`
`wed_1_open`
`wed_1_closed`
etc...
I store the times in full hours and minutes, so say a business is open from 9:00AM to 5:30PM on Monday: mon_1_open = '900' AND mon_1_closed = '1730'.
I can't seem to figure out a way to find the day of the week and output whether the business is open or closed based on the time of day.
Any suggestions?
This does not necessarily answer your question, but it may in the long run.
Your database scheme seems flawed. It definitely is not normalized. I would address that before it becomes a big issue, as you have noticed that it makes it hard to locate certain businesses hours. Here is a draft scheme that might be better suiting.
TABLE: locations
id INT AUTO_INCREMENT PRIMARY KEY
name VARCHAR(50)
TABLE: location_hours
id INT AUTO_INCREMENT PRIMARY KEY
location_id INT - Foreign Key references locations table
day CHAR(3) - (examples: mon, tue, wed, thu, fri, sat, sun)
hours VARCHAR(4) - (could also be int)
Then, to match today's day of the week, you can use DATE_FORMAT with %a in MySQL. An example query:
SELECT locations.name, location_hours.hours
FROM locations
JOIN location_hours ON locations.id = location_hours.location_id
WHERE location_hours.day = DATE_FORMAT(NOW(), '%a')
AND locations.name = 'Someway Business'
ORDER BY location_hours.hours
You should not need separate open / close columns, given that the ORDER BY knows that 0900 < 1430 since it is a VARCHAR (although INT should sort correctly as well), but your code for adding businesses will either need to update this record or you will need another field, active, to signify whether that row should be used in the query. Just remember to use 24-hour time. Again, this is a mock-up that I created on the spot, so it could probably use some improvements, but it would be better than the hack you would need with your current schema.
UPDATE
Addressing the comment about finding if it is open or close:
Just use the PHP date function and call date('Hi'); this returns the current time in 24-hour format. Then a simple if statement checks whether it falls between the opening and closing times; if it does, the business is open.
IE:
$sql = "SELECT locations.name, location_hours.hours
FROM locations
JOIN location_hours ON locations.id = location_hours.location_id
WHERE location_hours.day = DATE_FORMAT(NOW(), '%a')
AND locations.name = 'Someway Business'
ORDER BY location_hours.hours";
$result = mysql_query($sql) or trigger_error("SQL Failed with Error: " . mysql_error());
$times = array();
while ($row = mysql_fetch_assoc($result)) {
if (empty($times['open'])) {
$times['open'] = $row['hours'];
}else {
$times['closed'] = $row['hours'];
}
}
$currentTime = date('Hi');
if ($times['open'] <= $currentTime
&& $times['closed'] > $currentTime) {
echo "Open";
}else {
echo "Closed";
}
That is, provided my logic is correct. Again, this is just pseudocode showing an example of usage, written up on the spot. The above assumes you are only querying one business at a time.
$dayOfWeek = strtolower(date('D'));
$query = '
SELECT
location,
'.$dayOfWeek.'_1_open <= '.date('Gi').' AND
'.$dayOfWeek.'_1_closed >= '.date('Gi').' as is_open
FROM locations';
That should work.
However, you really should use a proper time datatype for the open/closed columns.
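For comparison, the open/closed check itself can be sketched in plain PHP once the two values for today have been fetched. The helper name is_open is hypothetical, and the sketch assumes the business closes on the same day it opens (no hours running past midnight).

```php
<?php
// Times are integers in 24-hour HHMM form, as stored in the question (900, 1730).
// Assumption: opening and closing fall on the same calendar day.
function is_open(int $open, int $closed, int $now): bool
{
    // Open from the opening time up to, but not including, the closing time.
    return $open <= $now && $now < $closed;
}

$now = (int) date('Gi'); // e.g. 905 for 9:05 AM, 1730 for 5:30 PM
echo is_open(900, 1730, $now) ? 'Open' : 'Closed';
```

Whether the closing minute itself counts as open is a design choice; the answers above differ on it, and this sketch treats it as closed.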
