Processing millions of data records with PHP MySQL issue - php

My PHP program is taking far too long to run.
I have a MySQL database with over 1000 tables.
A new table is created each time a device is added, e.g. assets_data_imeixx, up to the 1000th assets_data_imeixx table.
Each table contains about 45,000 rows, with new records inserted every 10 seconds.
Below is my PHP code, which queries the database and fetches these records based on datetime.
Issue: The program executes without error, but it takes between 1.3 and 4 minutes for very large record sets.
PHP Code:
// $ms is the mysqli connection created in config.php (known to work)
$user_id = '5';
$q = "SELECT * FROM `user_assets` WHERE `user`='".$user_id ."' ORDER BY `imei` ASC";
$r = mysqli_query($ms,$q);
$result = array(); //$result array to contain all data
while($row =mysqli_fetch_array($r)){
//fetch 7 days record
for ($i=1; $i < 7; $i++) {
$date = "-" . $i . " days";
$days_ago = date('Y-m-d', strtotime($date, strtotime('today')));
$sql1 = "SELECT * FROM assets_data_" . $row["imei"] . " WHERE dt_time LIKE '" . $days_ago . "%' LIMIT 1"; // its correct
//$result1 = $conn->query($sql1);
$result1 = mysqli_query($ms,$sql1);
$row2 = mysqli_fetch_array($result1);
echo $row['imei']." ".$row2['dt_server']."<br/>";
}
}
The code above fetches over 1000 devices from the user_assets table. Each IMEI has its own table containing over 45,000 rows of location data.
The for loop then queries each IMEI's table once per day.
The code runs without error but takes far too long to complete. I want to optimize it so that it executes in at most 5 seconds.
I need help and suggestions on optimizing queries and iteration over data at this scale.
(from Comment)
CREATE TABLE gs_object_data_863844052008346 (
dt_server datetime NOT NULL,
dt_tracker datetime NOT NULL,
lat double DEFAULT NULL,
lng double DEFAULT NULL,
altitude double DEFAULT NULL,
angle double DEFAULT NULL,
speed double...
(From Comment)
gs_object_data_072101424612
gs_object_data_072101425049
gs_object_data_072101425486
gs_object_data_072101445153
gs_object_data_111111111111111
gs_object_data_1234567894
gs_object_data_222222222222222
gs_object_data_2716325849
gs_object_data_2716345818
gs_object_data_30090515907
gs_object_data_3009072323
gs_object_data_3009073758
gs_object_data_352093088838221
gs_object_data_352093088839310
gs_object_data_352093088840045
gs_object_data_352121088128697
gs_object_data_352121088132681
gs_object_data_352621109438959
gs_object_data_352621109440203
gs_object_data_352625694095355
gs_object_data_352672102822186
gs_object_data_352672103490900
gs_object_data_352672103490975
gs_object_data_352672103490991
gs_object_data_352887074794052
gs_object_data_352887074794102
gs_object_data_352887074794193
gs_object_data_352887074794417
gs_object_data_352887074794425
gs_object_data_352887074794433
gs_object_data_352887074794441
gs_object_data_352887074794458
gs_object_data_352887074794474
gs_object_data_352887074813696
gs_object_data_352887074813712
gs_object_data_352887074813720
gs_object_data_352887074813753
gs_object_data_352887074813761
gs_object_data_352887074813803
900+ tables each having different location data.
Requirement: Loop through each table, fetch data for selected date range say:
"SELECT dt_server FROM gs_object_data_" . $row["imei"] . " WHERE dt_server BETWEEN '2022-02-05 00:00:00' AND '2022-02-12 00:00:00'";
Expected result: a result set containing the rows from each table that fall within the selected date range. With 1000 tables, that means looping through every table and fetching data from each one.

I agree with KIKO -- 1 table not 1000. But, if I understand the rest, there are really 2 or 3 main tables.
Looking at your PHP -- It is often inefficient to look up one list, then go into a loop to find more. The better way (perhaps 10 times as fast) is to have a single SELECT with a JOIN to do both selects at once.
Consider some variation of this MySQL syntax; it may avoid most of the PHP code relating to $days_ago:
CURDATE() - INTERVAL 3 DAY
After also merging the Selects, this gives you the rows for the last 7 days:
WHERE date >= CURDATE() - INTERVAL 7 DAY
(I did not understand the need for LIMIT 1; please explain.)
Yes, you can use DATETIME values as strings, but try not to. Usually DateTime functions are more efficient.
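One way to avoid the string match is to compute a half-open range in PHP and compare the column against it, which lets an index on the datetime column be used. A minimal sketch, assuming the dt_server column from the posted schema; the helper name dayRange is made up for illustration:

```php
<?php
// Hypothetical helper: turn a 'Y-m-d' day into a half-open [start, end) range,
// so the query can use `dt_server >= ? AND dt_server < ?` instead of LIKE.
function dayRange(string $day): array
{
    $start = new DateTimeImmutable($day . ' 00:00:00');
    $end   = $start->modify('+1 day');
    return [$start->format('Y-m-d H:i:s'), $end->format('Y-m-d H:i:s')];
}

[$from, $to] = dayRange('2022-02-05');
// These two values would be bound to the placeholders of a prepared statement.
echo $from, ' .. ', $to, PHP_EOL;
```

The half-open form also avoids counting midnight of the next day twice, which BETWEEN would do.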
Consider "composite" indexes:
INDEX(imei, dt)
which will be very efficient for
WHERE imei = $imei
AND dt >= CURDATE() - INTERVAL 7 DAY
I would ponder ways to have less redundancy in the output; but that should mostly be done after fetching the raw data from the table(s).
Turn on the SlowLog with a low value of long_query_time; it will help you locate the worst query; then we can focus on it.
An IMEI is up to 17 characters, always digits? If you are not already using this, I suggest BIGINT since it will occupy only 8 bytes.
For further discussion, please provide SHOW CREATE TABLE for each of the main tables.

Since all those 1000 tables are the same it would make sense to put all that data into 1 table. Then partition that table on date, use proper indexes, and optimize the query.
See: Normalization of Database
Since you limit results to one user, and one row per device, it should be possible to execute a query in well below one second.
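As a rough illustration of what the merged design could look like from PHP, here is a sketch of a single JOIN query that returns one row per device per day. It assumes the per-device tables have been merged into one table called assets_data with an imei column and INDEX(imei, dt_server); that table name, the constant and the function name are all hypothetical, and get_result() additionally assumes the mysqlnd driver:

```php
<?php
// Assumption: one merged table `assets_data` (hypothetical name) replaces the
// per-device tables and has INDEX(imei, dt_server).
const WEEK_SQL = <<<'SQL'
SELECT ua.imei,
       DATE(d.dt_server) AS day_date,
       MIN(d.dt_server)  AS first_dt_server
FROM user_assets AS ua
JOIN assets_data AS d ON d.imei = ua.imei
WHERE ua.`user` = ?
  AND d.dt_server >= CURDATE() - INTERVAL 7 DAY
GROUP BY ua.imei, DATE(d.dt_server)
ORDER BY ua.imei, day_date
SQL;

// One round trip instead of one query per device per day.
function fetchWeekForUser(mysqli $ms, string $userId): array
{
    $stmt = $ms->prepare(WEEK_SQL);
    $stmt->bind_param('s', $userId);
    $stmt->execute();
    return $stmt->get_result()->fetch_all(MYSQLI_ASSOC);
}
```

Calling fetchWeekForUser($ms, '5') would replace both the outer while loop and the inner for loop from the question.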

Related

Calculate age: PHP vs MySQL, which method is better?

I have around 500,000 records of personal profile in MySQL database containing a birthdate column (dob). Since I had to get the ages for each profile, I needed to calculate it dynamically which I can either do via PHP (date_diff(date_create($dob), date_create('today'))->y) or through SQL ('SELECT TIMESTAMPDIFF(YEAR, dob, CURDATE()) AS age').
Which of the two is faster or more preferred especially if I have hundreds of thousands of rows?
In general, the best approach is to do such calculations on the server.
The ideal approach would be to use a generated column. This has been available since MySQL 5.7.5, and would be expressed as:
alter table t add age int unsigned as
(TIMESTAMPDIFF(YEAR, dob, CURDATE()));
Alas, you can only use deterministic functions for generated columns. curdate() and now() are not deterministic, because their values can change with each call.
The next best thing is to use a view:
create view v_t as
select t.*,
TIMESTAMPDIFF(YEAR, dob, CURDATE()) as age
from t;
Then, when you query the view, you'll have the age. This is true no matter where you query it. And it is the same logic everywhere.
The only caveat to doing the calculation on the server is that it uses server time, rather than local application time. If that is an issue, then that is a strong argument for doing the calculation locally.
Here is a test:
Create a table with 100K random dates
drop table if exists birthdays;
create table birthdays (
id int auto_increment primary key,
dob date
);
insert into birthdays (dob)
select '1950-01-01' + interval floor(rand(1)*68*365) day as dob
from information_schema.COLUMNS c1
, information_schema.COLUMNS c2
, information_schema.COLUMNS c3
limit 100000
;
Run this PHP script
<?php
header('Content-type: text/plain');
$db = new PDO("mysql:host=localhost;dbname=test", "test","");
### SQL
$starttime = microtime(true);
$stmt = $db->query("SELECT id, dob, TIMESTAMPDIFF(YEAR, dob, CURDATE()) AS age FROM birthdays");
$data = $stmt->fetchAll(PDO::FETCH_OBJ);
$runtime = microtime(true) - $starttime;
echo "SQL: $runtime \n";
### PHP
$starttime = microtime(true);
$stmt = $db->query("SELECT id, dob FROM birthdays");
$data = $stmt->fetchAll(PDO::FETCH_OBJ);
foreach ($data as $row) {
$row->age = date_diff(date_create($row->dob), date_create('today'))->y;
}
$runtime = microtime(true) - $starttime;
echo "PHP: $runtime \n";
Result:
SQL: 0.19094109535217
PHP: 1.203684091568
It looks like the SQL solution is 6 times faster. But that is not quite true. If we remove the code which calculates the age from both solutions, we will get something like 0.1653790473938. That means the overhead for SQL is 0.025 sec, while for PHP it is 1.038 sec. So SQL is 40 times faster in this test.
Note: There are faster ways to calculate the age in PHP. For example
$d = date('Y-m-d');
$row->age = substr($d, 0, 4) - substr($row->dob, 0, 4) - (substr($row->dob, 5) > substr($d, 5) ? 1 : 0);
is about four times faster, even though date('Y-m-d') consumes more than 80% of the time. If you find a way to avoid any date function, you might get close to the performance of MySQL.
If you want to get all 500,000 records, you should do this in MySQL, because its performance is better than PHP's.
But if you only want some of the data (for example 10 records), do it in PHP: it is easier to handle, and the performance difference is negligible.
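Since most of the remaining cost is the date() call, one option is to compute today's date once and pass it in. A minimal sketch of the same string-based calculation; the function name ageOn is made up, and in real use $today would come from a single date('Y-m-d') call before the loop:

```php
<?php
// String-based age calculation with "today" passed in, so that date() is
// called once outside the loop rather than once per row.
function ageOn(string $dob, string $today): int
{
    $years = (int) substr($today, 0, 4) - (int) substr($dob, 0, 4);
    // 'MM-DD' strings sort chronologically when compared as plain strings.
    $birthdayPending = strcmp(substr($dob, 5), substr($today, 5)) > 0;
    return $years - ($birthdayPending ? 1 : 0);
}

$today = '2024-06-15'; // fixed here for a reproducible example
echo ageOn('1990-06-15', $today), PHP_EOL; // 34
echo ageOn('1990-06-16', $today), PHP_EOL; // 33
```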

PHP & MySQL: How to select rows in 10 minute increments

I'm trying to build a MySQL SELECT that will only pull in rows in 10 minute increments (or sufficiently close to 10 min increments). My data is recorded about once a minute per row, but the timing of the INSERT isn't always exact. Each row has a datetime column.
My first idea was to pull in a whole bunch of rows then weed them out with a PHP function. That approach is messy and I'd much rather do this with a smarter MySQL query. Is it possible?
$result = mysqli_query($con,"SELECT * FROM pairs ORDER BY pair_id DESC LIMIT 2000;");
while($row = mysqli_fetch_array($result)) {
echo $row['SomeTimestamp'] . " " . $row['SomeValue'];
echo "<br>";
}
This will fetch all the records with a timestamp within the last 10 minutes:
SELECT * FROM pairs WHERE SomeTimestamp > DATE_SUB(NOW(), INTERVAL 10 MINUTE)
You could use WHERE and a date function:
SELECT ... FROM ... WHERE /* select the rows that fall within the 10-minute range */
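If the goal is one row per 10-minute increment rather than only the last 10 minutes, the thinning can be done by bucketing timestamps. A minimal PHP sketch, assuming the rows are already sorted by time and carry the SomeTimestamp column from the question; the function name thinToBuckets is hypothetical:

```php
<?php
date_default_timezone_set('UTC'); // fixed so the bucket edges are reproducible

// Keep the first row seen in each 10-minute bucket.
function thinToBuckets(array $rows, int $bucketSeconds = 600): array
{
    $kept = [];
    $lastBucket = null;
    foreach ($rows as $row) {
        // Integer bucket number: all rows in the same 10-minute window share it.
        $bucket = intdiv(strtotime($row['SomeTimestamp']), $bucketSeconds);
        if ($bucket !== $lastBucket) {
            $kept[] = $row;
            $lastBucket = $bucket;
        }
    }
    return $kept;
}

$rows = [
    ['SomeTimestamp' => '2024-01-01 10:00:30', 'SomeValue' => 1],
    ['SomeTimestamp' => '2024-01-01 10:01:30', 'SomeValue' => 2],
    ['SomeTimestamp' => '2024-01-01 10:10:05', 'SomeValue' => 3],
    ['SomeTimestamp' => '2024-01-01 10:19:00', 'SomeValue' => 4],
    ['SomeTimestamp' => '2024-01-01 10:20:01', 'SomeValue' => 5],
];
echo count(thinToBuckets($rows)), PHP_EOL; // 3
```

The same bucketing idea can be moved into the query by grouping on FLOOR(UNIX_TIMESTAMP(SomeTimestamp) / 600).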

Insert multiple rows in MySQL and check for random string without long delay (~80 rows each minute)

For a research project I am obtaining data from a local bus company's GPS system (through their API). I created a php cron job that runs every minute to obtain data like the vehicle, route ID, location, destination, etc. The data did not contain a unique "run number" for each bus route (a unique number so that I can track the progression of a single bus along its route), so I created my own that checks if the vehicle ID, destination, and relative time are similar, and assigns the unique "run ID" to it so that I can track the bus along its route. If no run ID exists, a random one is generated. (Any vehicle with the same "vid" and "pid" within 2 minutes of the last inserted row "timeadded" is on the same run, and this is important for my research)
Each time the cron runs (1 minute), approximately 80 rows are added into the database.
Initially the job would run quickly. However, with over 500,000 rows now, I've noticed the job can take upwards of 40 seconds. I believe it's because for each of the ~80 rows, it has to check the entire table ("vehicles") to see if the same run ID exists, essentially querying a large table and inserting a row 80 times. I want to get at least a week's worth of data (on day 4 now), at which point I can export the data, erase all rows, and start over. My question is: Is there any way I can refactor my PHP/SQL code to make the process run faster? It's been years since I've worked with SQL, so I'm sure there's a more ingenious way to insert all this data.
<?php
// Obtain data from XML
$xml = simplexml_load_file("url.xml");
foreach ($xml->vehicle as $vehicle) {
$vid = $vehicle->vid;
$tm = $vehicle->tmstmp;
$dat = substr($vehicle->tmstmp, 0, 8);
$tme = substr($vehicle->tmstmp, 9);
$lat = $vehicle->lat;
$lon = $vehicle->lon;
$hdg = $vehicle->hdg;
$pid = $vehicle->pid;
$rt = $vehicle->rt;
$des = $vehicle->des;
$pdist = $vehicle->pdist;
// Database connection and insert
mysql_connect("redacted", "redacted", "redacted") or die(mysql_error()); mysql_select_db("redacted") or die(mysql_error());
$sql_findsim = "SELECT vid, pid, timeadded, run, rt FROM vehicles WHERE vid=" . mysql_real_escape_string($vid). " AND pid=" . mysql_real_escape_string($pid). " AND rt=" . mysql_real_escape_string($rt). " AND timeadded > DATE_SUB(CURRENT_TIMESTAMP, INTERVAL 2 MINUTE);";
$handle = mysql_query($sql_findsim);
$row = mysql_fetch_row($handle);
$runid = $row[3];
if($runid !== null) {
$run = $runid;
} else {
$run = substr(md5(rand()), 0, 30);
}
$sql = "INSERT INTO vehicles (vid, tmstmp, dat, tme, lat, lon, hdg, pid, rt, des, pdist, run) VALUES ($vid,'$tm','$dat','$tme','$lat','$lon',$hdg,$pid,'$rt','$des',$pdist,'$run')";
$result = mysql_query($sql);
mysql_close();
}
?>
Thanks for any help with refactoring this code to get it to run more quickly and efficiently.
Do you have any indexes on the table? A compound index on (vid,pid,rt,timeadded) will make the query faster, avoiding a full table scan.
create index fastmagic on vehicles (vid,pid,rt,timeadded)
Alternatively, you could skip the select altogether and just do the insert without assigning the random "run" value. This will keep your cron job at "constant time", since all you're doing is appending new data.
After you've got your week of data, go back and write "second pass" code to step through each row (select * from vehicles order by timeadded). For each row, do your "select" similar to how you've already done it, then "update" the row you are currently processing.
If you go with the alternative, you'll probably want an auto-increment "id" integer column to make row identification clearer (if you don't already have one).
I would suggest the following:
Create a table named vehicle_ids (or some other meaningful name) with these fields:
vid, pid, run, rt
Instead of checking the vehicles table for the vid, check this smaller table, and insert a row if none is found (make vid auto increment).
Normalize your tables, and also index your vehicles table.
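The second pass does not have to query the table once per row; it could also be done in memory after one ordered select. A sketch under stated assumptions: rows are sorted by time, each row carries vid, pid, rt and a Unix timestamp in a field named ts (a hypothetical name), and run IDs are sequential here instead of the md5(rand()) strings from the question so the output is reproducible:

```php
<?php
// Assign run IDs in one pass over time-ordered rows.
function assignRuns(array $rows, int $gapSeconds = 120): array
{
    $last = [];  // "vid|pid|rt" => ['ts' => int, 'run' => string]
    $next = 1;
    foreach ($rows as $i => $row) {
        $key = $row['vid'] . '|' . $row['pid'] . '|' . $row['rt'];
        if (isset($last[$key]) && $row['ts'] - $last[$key]['ts'] < $gapSeconds) {
            $run = $last[$key]['run'];   // seen less than 2 minutes ago: same run
        } else {
            $run = 'run' . $next++;      // otherwise start a new run
        }
        $rows[$i]['run'] = $run;
        $last[$key] = ['ts' => $row['ts'], 'run' => $run];
    }
    return $rows;
}

$rows = assignRuns([
    ['vid' => 1, 'pid' => 7, 'rt' => 'A', 'ts' => 0],
    ['vid' => 1, 'pid' => 7, 'rt' => 'A', 'ts' => 60],
    ['vid' => 2, 'pid' => 7, 'rt' => 'A', 'ts' => 60],
    ['vid' => 1, 'pid' => 7, 'rt' => 'A', 'ts' => 300],
]);
echo implode(',', array_column($rows, 'run')), PHP_EOL; // run1,run1,run2,run3
```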

How to get size per day of a table

I have a database with ~20 tables. Each table has a column "dtLogTime" that records the time that row was inserted. I want to figure out the size (probably kb or mb) each table is recording per day. More specifically, I'm only interested in the last 3 days. Also, these tables keep track of data up to a certain time interval (i.e. 2 weeks, 1 month, etc), meaning I lose a day's worth of data for every new day's data stored.
I came across this code that can show me the size of each table.
<?php
$link = mysql_connect('host', 'username', 'password');
$db_name = "your database name here";
$tables = array();
mysql_select_db($db_name, $link);
$result = mysql_query("SHOW TABLE STATUS");
while($row = mysql_fetch_array($result)) {
/* We return the size in Kilobytes */
$total_size = ($row[ "Data_length" ] +
$row[ "Index_length" ]) / 1024;
$tables[$row['Name']] = sprintf("%.2f", $total_size);
}
print_r($tables);
?>
When I tried doing
"SHOW TABLE STATUS WHERE dtLogTime < '2011-08-28 00:00:00'
AND dtLogTime >= '2011-08-27 00:00:00'"
it gave me an error. Is there a way to do this?
Thanks
SHOW TABLE STATUS accepts either a LIKE clause to pick tables by name or a WHERE clause, but not both, and neither can reference dtLogTime. Source: http://dev.mysql.com/doc/refman/5.6/en/show-table-status.html
SHOW TABLE STATUS
LIKE 'YourTable'
The Where clause applies to the resulting table generated by SHOW TABLE STATUS, and cannot be actual columns of your various tables. For instance:
SHOW TABLE STATUS where Index_length = 0
Run SHOW TABLE STATUS by itself to see a list of all the legal columns you can use in the WHERE clause. Unfortunately, for your situation, you'll have to run SHOW TABLE STATUS each day and store the result somewhere.
UPDATE
For clarification, SHOW TABLE STATUS is a convenience method that interrogates the system table, INFORMATION_SCHEMA.TABLES. It will pare down the results from that system table to just the persistent tables in your current database. It doesn't perform any calculations of its own.
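Once a daily snapshot is being stored, the per-day figure is just the difference between consecutive days. A minimal sketch, assuming the snapshots are kept as a date => size-in-KB array computed the same way as in the question's script; the function name dailyGrowth is hypothetical:

```php
<?php
// Assumption: $snapshots maps 'Y-m-d' => table size in KB for that day,
// i.e. (Data_length + Index_length) / 1024 from SHOW TABLE STATUS.
function dailyGrowth(array $snapshots): array
{
    ksort($snapshots);   // 'Y-m-d' keys sort chronologically
    $growth = [];
    $prev = null;
    foreach ($snapshots as $date => $sizeKb) {
        if ($prev !== null) {
            $growth[$date] = round($sizeKb - $prev, 2);
        }
        $prev = $sizeKb;
    }
    return $growth;
}

print_r(dailyGrowth([
    '2011-08-27' => 1000.0,
    '2011-08-26' => 900.5,
    '2011-08-28' => 1150.25,
]));
```

Because old rows are purged on a schedule, the difference is the net change per day, not the volume of newly inserted data.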

Find if a business is open: MySQL hours calculation

I have a list of business stored in a locations table, and stored in that table are hours the business opens and closes:
location
`mon_1_open`
`mon_1_closed`
`tue_1_open`
`tue_1_closed`
`wed_1_open`
`wed_1_closed`
etc...
I store the times in full hours and minutes, so say a business is open from 9:00AM to 5:30PM on Monday: mon_1_open = '900' AND mon_1_closed = '1730'.
I can't seem to figure out a way to find the day of the week and output whether the business is open or closed based on the time of day.
Any suggestions?
This does not necessarily answer your question, but it may in the long run.
Your database schema seems flawed. It definitely is not normalized. I would address that before it becomes a big issue; as you have noticed, it makes it hard to look up a given business's hours. Here is a draft schema that might suit better.
TABLE: locations
id INT AUTO_INCREMENT PRIMARY KEY
name VARCHAR(50)
TABLE: location_hours
id INT AUTO_INCREMENT PRIMARY KEY
location_id INT - Foreign Key references locations table
day CHAR(3) - (examples: mon, tue, wed, thu, fri, sat, sun)
hours VARCHAR(4) - (could also be int)
Then, to get today's day of the week, you can use DATE_FORMAT with %a in MySQL. An example query:
SELECT locations.name, location_hours.hours
FROM locations
JOIN location_hours ON locations.id = location_hours.location_id
WHERE location_hours.day = DATE_FORMAT(NOW(), '%a')
AND locations.name = 'Someway Business'
ORDER BY location_hours.hours
You should not need separate open / close columns, given that the ORDER BY knows that 0900 < 1430 since it is a VARCHAR (an INT would sort correctly as well), but your code for adding businesses will either need to update this record or you will need another field, active, to signify whether that row should be used in the query. Just remember to use 24-hour time. Again, this is a mock-up that I created on the spot, so it could probably use some improvements, but it would be better than the hack you would need with your current schema.
UPDATE
Addressing the comment about finding if it is open or close:
Just use the PHP date function and call date('Hi'). This returns the current time in 24-hour format; then a simple if statement checks whether it falls between the opening and closing times. If it does, the business is open.
For example:
$sql = "SELECT locations.name, location_hours.hours
FROM locations
JOIN location_hours ON locations.id = location_hours.location_id
WHERE location_hours.day = DATE_FORMAT(NOW(), '%a')
AND locations.name = 'Someway Business'
ORDER BY location_hours.hours";
$result = mysql_query($sql) or trigger_error("SQL Failed with Error: " . mysql_error());
$times = array();
while ($row = mysql_fetch_assoc($result)) {
if (empty($times['open'])) {
$times['open'] = $row['hours'];
}else {
$times['closed'] = $row['hours'];
}
}
$currentTime = date('Hi');
if ($times['open'] <= $currentTime
&& $times['closed'] > $currentTime) {
echo "Open";
}else {
echo "Closed";
}
That is, assuming my logic is correct. Again, this is just pseudocode, an example of usage that I wrote up on the spot. The above assumes you are only querying one business at a time.
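The comparison itself can be pulled out into a small function so it can be checked without a database. A minimal sketch, assuming 24-hour times such as '0900' and '1730' and opening hours that do not run past midnight; the function name isOpen is hypothetical:

```php
<?php
// Compare 24-hour 'Hi' strings numerically: open <= now < closed.
function isOpen(string $open, string $closed, string $now): bool
{
    return (int) $open <= (int) $now && (int) $now < (int) $closed;
}

var_dump(isOpen('0900', '1730', '1200')); // bool(true)
var_dump(isOpen('0900', '1730', '1730')); // bool(false)
```

In real use the third argument would be date('Hi'). Casting to int also makes unpadded values like '900' compare correctly.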
$dayOfWeek = strtolower(date('D'));
$query = '
SELECT
location,
'.$dayOfWeek.'_1_open <= '.date('Gi').' AND
'.$dayOfWeek.'_1_closed >= '.date('Gi').' as is_open
FROM locations';
That should work.
However, you really should use a proper time datatype for the open/closed columns.
