How to get size per day of a table - php

I have a database with ~20 tables. Each table has a column "dtLogTime" that records the time the row was inserted. I want to figure out how much data (probably in KB or MB) each table records per day. More specifically, I'm only interested in the last 3 days. Also, these tables only keep data for a fixed retention window (e.g. 2 weeks, 1 month, etc.), meaning I lose a day's worth of data for every new day's data stored.
I came across this code that can show me the size of each table.
<?php
$link = mysql_connect('host', 'username', 'password');
$db_name = "your database name here";
$tables = array();
mysql_select_db($db_name, $link);
$result = mysql_query("SHOW TABLE STATUS");
while($row = mysql_fetch_array($result)) {
/* We return the size in Kilobytes */
$total_size = ($row[ "Data_length" ] +
$row[ "Index_length" ]) / 1024;
$tables[$row['Name']] = sprintf("%.2f", $total_size);
}
print_r($tables);
?>
When I tried doing
"SHOW TABLE STATUS WHERE dtLogTime < '2011-08-28 00:00:00'
AND dtLogTime >= '2011-08-27 00:00:00'"
it gave me an error. Is there a way to do this?
Thanks

You need to include a LIKE clause to specify the table. Source: http://dev.mysql.com/doc/refman/5.6/en/show-table-status.html
SHOW TABLE STATUS
LIKE YourTable
WHERE dtLogTime < '2011-08-28 00:00:00'
AND dtLogTime >= '2011-08-27 00:00:00'

The WHERE clause applies to the result set generated by SHOW TABLE STATUS, so it cannot reference actual columns of your various tables. For instance:
SHOW TABLE STATUS where Index_length = 0
Run SHOW TABLE STATUS by itself to see a list of all the legal columns you can use in the WHERE clause. Unfortunately, for your situation, you'll have to run SHOW TABLE STATUS each day and store the result somewhere.
UPDATE
For clarification, SHOW TABLE STATUS is a convenience method that interrogates the system table, INFORMATION_SCHEMA.TABLES. It will pare down the results from that system table to just the persistent tables in your current database. It doesn't perform any calculations of its own.
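Since each table carries a dtLogTime column, one rough alternative to daily snapshots is to count rows per day and multiply by the Avg_row_length value that SHOW TABLE STATUS reports. The sketch below assumes that approach. estimateDailyKb() is a hypothetical helper, the query in the comment has not been run against the asker's schema, and Avg_row_length is itself only an estimate that excludes index space.

```php
<?php
// Hypothetical helper: estimate KB written per day from per-day row counts
// and the Avg_row_length (bytes) reported by SHOW TABLE STATUS.
function estimateDailyKb(array $rowsPerDay, int $avgRowLength): array
{
    $estimate = [];
    foreach ($rowsPerDay as $day => $rows) {
        $estimate[$day] = round($rows * $avgRowLength / 1024, 2);
    }
    return $estimate;
}

// The per-day counts would come from a query along these lines (assumed):
//   SELECT DATE(dtLogTime) AS log_day, COUNT(*) AS row_count
//   FROM your_table
//   WHERE dtLogTime >= CURDATE() - INTERVAL 3 DAY
//   GROUP BY DATE(dtLogTime);
$rowsPerDay = ['2011-08-26' => 2048, '2011-08-27' => 1024]; // sample data
print_r(estimateDailyKb($rowsPerDay, 100));
```

This gives an approximation only; storing the SHOW TABLE STATUS output once a day remains the way to track the sizes the server itself reports.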

Related

Processing millions of data records with PHP MySQL issue

I have run into slow processing times in a PHP program.
I have a MySQL database with over 1000 tables.
A table is created each time a new device is added, e.g. from assets_data_imeixx up to the 1000th such table.
Each table contains about 45,000 rows, with a new record inserted every 10 seconds.
Below is my PHP code, which queries the database and fetches these records based on datetime.
Issue: The program executes without error, but it takes about 1.3 to 4 minutes for very large record sets.
PHP Code:
// $ms is the mysqli connection created in config.php (confirmed working)
$user_id = '5';
$q = "SELECT * FROM `user_assets` WHERE `user`='".$user_id ."' ORDER BY `imei` ASC";
$r = mysqli_query($ms,$q);
$result = array(); //$result array to contain all data
while($row =mysqli_fetch_array($r)){
//fetch 7 days record
for ($i=1; $i < 7; $i++) {
$date = "-" . $i . " days";
$days_ago = date('Y-m-d', strtotime($date, strtotime('today')));
$sql1 = "SELECT * FROM assets_data_" . $row["imei"] . " WHERE dt_time LIKE '" . $days_ago . "%' LIMIT 1"; // its correct
//$result1 = $conn->query($sql1);
$result1 = mysqli_query($ms,$sql1);
$row2 = mysqli_fetch_array($result1);
echo $row['imei']." ".$row2['dt_server']."<br/>";
}
}
The code above fetches over 1000 devices from the user_assets table. Each IMEI has its own table containing over 45,000 rows of location data.
The for loop iterates over each IMEI table and its records.
The code runs without error but takes far too long to complete. I want to optimize it so that it executes in at most 5 seconds.
I need help and suggestions for optimizing queries and iteration over data at this scale.
(from Comment)
CREATE TABLE gs_object_data_863844052008346 (
dt_server datetime NOT NULL,
dt_tracker datetime NOT NULL,
lat double DEFAULT NULL,
lng double DEFAULT NULL,
altitude double DEFAULT NULL,
angle double DEFAULT NULL,
speed double...
(From Comment)
gs_object_data_072101424612
gs_object_data_072101425049
gs_object_data_072101425486
gs_object_data_072101445153
gs_object_data_111111111111111
gs_object_data_1234567894
gs_object_data_222222222222222
gs_object_data_2716325849
gs_object_data_2716345818
gs_object_data_30090515907
gs_object_data_3009072323
gs_object_data_3009073758
gs_object_data_352093088838221
gs_object_data_352093088839310
gs_object_data_352093088840045
gs_object_data_352121088128697
gs_object_data_352121088132681
gs_object_data_352621109438959
gs_object_data_352621109440203
gs_object_data_352625694095355
gs_object_data_352672102822186
gs_object_data_352672103490900
gs_object_data_352672103490975
gs_object_data_352672103490991
gs_object_data_352887074794052
gs_object_data_352887074794102
gs_object_data_352887074794193
gs_object_data_352887074794417
gs_object_data_352887074794425
gs_object_data_352887074794433
gs_object_data_352887074794441
gs_object_data_352887074794458
gs_object_data_352887074794474
gs_object_data_352887074813696
gs_object_data_352887074813712
gs_object_data_352887074813720
gs_object_data_352887074813753
gs_object_data_352887074813761
gs_object_data_352887074813803
900+ tables each having different location data.
Requirement: Loop through each table, fetch data for selected date range say:
"SELECT dt_server FROM gs_object_data_" . $row["imei"] . " WHERE dt_server BETWEEN '2022-02-05 00:00:00' AND '2022-02-12 00:00:00'";
Expected Result: A result set containing the rows from each table that fall within the selected date range. With 1000 tables, that means looping through every table and fetching data from each one.
I agree with KIKO -- 1 table not 1000. But, if I understand the rest, there are really 2 or 3 main tables.
Looking at your PHP -- It is often inefficient to look up one list, then go into a loop to find more. The better way (perhaps 10 times as fast) is to have a single SELECT with a JOIN to do both selects at once.
Consider some variation of this MySQL syntax; it may avoid most of the PHP code relating to $days_ago:
CURDATE() - INTERVAL 3 DAY
After also merging the Selects, this gives you the rows for the last 7 days:
WHERE date >= CURDATE() - INTERVAL 7 DAY
(I did not understand the need for LIMIT 1; please explain.)
Yes, you can use DATETIME values as strings, but try not to. Usually DateTime functions are more efficient.
Consider "composite" indexes:
INDEX(imei, dt)
which will be very efficient for
WHERE imei = $imei
AND dt >= CURDATE() - INTERVAL 7 DAY
I would ponder ways to have less redundancy in the output; but that should mostly be done after fetching the raw data from the table(s).
Turn on the SlowLog with a low value of long_query_time; it will help you locate the worst query; then we can focus on it.
An IMEI is up to 17 characters, always digits? If you are not already using this, I suggest BIGINT since it will occupy only 8 bytes.
For further discussion, please provide SHOW CREATE TABLE for each of the main tables.
Since all those 1000 tables are the same it would make sense to put all that data into 1 table. Then partition that table on date, use proper indexes, and optimize the query.
See: Normalization of Database
Since you limit results to one user, and one row per device, it should be possible to execute a query in well below one second.
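A minimal sketch of the single-table idea follows. It assumes one combined table (here called assets_data, a made-up name) with a composite index on (imei, dt_server), and it uses an in-memory SQLite database in place of MySQL purely so the example runs on its own. The original LIMIT 1 has no ORDER BY, so "earliest row per device per day" is an assumption about what was intended.

```php
<?php
// Sketch only: SQLite stands in for MySQL; table and index names are assumptions.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->exec('CREATE TABLE user_assets (user INTEGER, imei INTEGER)');
$pdo->exec('CREATE TABLE assets_data (imei INTEGER, dt_server TEXT)');
// Composite index: device first, then time, matching the WHERE clause below.
$pdo->exec('CREATE INDEX idx_imei_dt ON assets_data (imei, dt_server)');

$pdo->exec('INSERT INTO user_assets VALUES (5, 111), (5, 222), (6, 333)');
$pdo->exec("INSERT INTO assets_data VALUES
    (111, '2022-02-05 08:00:00'),
    (111, '2022-02-05 09:00:00'),
    (111, '2022-02-06 07:30:00'),
    (222, '2022-02-05 10:00:00'),
    (333, '2022-02-05 11:00:00')");

// One query replaces the nested PHP loops: earliest row per device per day.
$stmt = $pdo->prepare(
    'SELECT a.imei, DATE(d.dt_server) AS log_day, MIN(d.dt_server) AS first_seen
       FROM user_assets a
       JOIN assets_data d ON d.imei = a.imei
      WHERE a.user = :user
        AND d.dt_server >= :start AND d.dt_server < :end
      GROUP BY a.imei, DATE(d.dt_server)
      ORDER BY a.imei, log_day'
);
$stmt->execute([
    ':user'  => 5,
    ':start' => '2022-02-05 00:00:00',
    ':end'   => '2022-02-12 00:00:00',
]);
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

foreach ($rows as $row) {
    echo $row['imei'] . ' ' . $row['first_seen'] . "\n";
}
```

On MySQL the same statement should work with the date bounds passed as parameters or written as CURDATE() - INTERVAL 7 DAY; the point is that the server is asked once instead of several thousand times.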

How to handle/optimize thousands of different SELECT queries to be executed?

I need to synchronize specific information between two databases (one MySQL, the other a remotely hosted SQL Server database) for thousands of rows. When I execute this PHP file it gets stuck or times out after several minutes, so I wonder how I can fix this issue and maybe also optimize the way I am "synchronizing" the data.
What the code needs to do:
Basically, for every row (= one account) in my database that gets updated, I want to get two specific pieces of information (= 2 SELECT queries) from the SQL Server database. I therefore use a foreach loop that runs 2 SQL queries for each row, and afterwards I write the results into 2 columns of that row. We are talking about ~10k rows that need to run through this foreach loop.
My idea which may help?
I have heard about things like PDO transactions, which supposedly collect all those queries and then send them as one package of SELECT queries, but I have no idea whether I am using them correctly or whether they even help in such cases.
This is my current code, which times out after a few minutes:
// DBH => MSSQL DB | DB => MySQL DB
$dbh->beginTransaction();
// Get all referral IDs which needs to be updated:
$listAccounts = "SELECT * FROM Gifting WHERE refsCompleted <= 100 ORDER BY idGifting ASC";
$ps_listAccounts = $db->prepare($listAccounts);
$ps_listAccounts->execute();
foreach($ps_listAccounts as $row) {
$refid=$row['refId'];
// Refsinserted
$refsInserted = "SELECT count(username) as done FROM accounts WHERE referral='$refid'";
$ps_refsInserted = $dbh->prepare($refsInserted);
$ps_refsInserted->execute();
$row = $ps_refsInserted->fetch();
$refsInserted = $row['done'];
// Refscompleted
$refsCompleted = "SELECT count(username) as done FROM accounts WHERE referral='$refid' AND finished=1";
$ps_refsCompleted = $dbh->prepare($refsCompleted);
$ps_refsCompleted->execute();
$row2 = $ps_refsCompleted->fetch();
$refsCompleted = $row2['done'];
// Update fields for local order db
$updateGifting = "UPDATE Gifting SET refsInserted = :refsInserted, refsCompleted = :refsCompleted WHERE refId = :refId";
$ps_updateGifting = $db->prepare($updateGifting);
$ps_updateGifting->bindParam(':refsInserted', $refsInserted);
$ps_updateGifting->bindParam(':refsCompleted', $refsCompleted);
$ps_updateGifting->bindParam(':refId', $refid);
$ps_updateGifting->execute();
echo "$refid: $refsInserted Refs inserted / $refsCompleted Refs completed<br>";
}
$dbh->commit();
You can do all of that in one query with a correlated sub-query:
UPDATE Gifting
SET
refsInserted=(SELECT COUNT(USERNAME)
FROM accounts
WHERE referral=Gifting.refId),
refsCompleted=(SELECT COUNT(USERNAME)
FROM accounts
WHERE referral=Gifting.refId
AND finished=1)
A correlated sub-query is essentially using a sub-query (query within a query) that references the parent query. So notice that in each of the sub-queries I am referencing the Gifting.refId column in the where clause of each sub-query. While this isn't the best for performance because each of those sub-queries still has to run independent of the other queries, it would perform much better (and likely as good as you are going to get) than what you have there.
Edit:
And just for reference. I don't know if a transaction will help here at all. Typically they are used when you have several queries that depend on each other and to give you a way to rollback if one fails. For example, banking transactions. You don't want the balance to deduct some amount until a purchase has been inserted. And if the purchase fails inserting for some reason, you want to rollback the change to the balance. So when inserting a purchase, you start a transaction, run the update balance query and the insert purchase query and only if both go in correctly and have been validated do you commit to save.
Edit2:
If I were doing this, without doing an export/import this is what I would do. This makes a few assumptions though. First is that you are using a mssql 2008 or newer and second is that the referral id is always a number. I'm also using a temp table that I insert numbers into because you can insert multiple rows easily with a single query and then run a single update query to update the gifting table. This temp table follows the structure CREATE TABLE tempTable (refId int, done int, total int).
//get list of referral accounts
//if you are using one column, only query for one column
$listAccounts = "SELECT DISTINCT refId FROM Gifting WHERE refsCompleted <= 100";
$ps_listAccounts = $db->prepare($listAccounts);
$ps_listAccounts->execute();
//loop over and get list of refIds from above.
$refIds = array();
foreach($ps_listAccounts as $row){
$refIds[] = $row['refId'];
}
if(count($refIds) > 0){
//implode into string for use in query below
$refIds = implode(',',$refIds);
//select out total count
$totalCount = "SELECT referral, COUNT(username) AS cnt FROM accounts WHERE referral IN ($refIds) GROUP BY referral";
$ps_totalCounts = $dbh->prepare($totalCount);
$ps_totalCounts->execute();
//add to array of counts
$counts = array();
//loop over total counts
foreach($ps_totalCounts as $row){
//if referral id not found, add it
if(!isset($counts[$row['referral']])){
$counts[$row['referral']] = array('total'=>0,'done'=>0);
}
//add to count
$counts[$row['referral']]['total'] += $row['cnt'];
}
$doneCount = "SELECT referral, COUNT(username) AS cnt FROM accounts WHERE finished=1 AND referral IN ($refIds) GROUP BY referral";
$ps_doneCounts = $dbh->prepare($doneCount);
$ps_doneCounts->execute();
//loop over done counts
foreach($ps_doneCounts as $row){
//if referral id not found, add it
if(!isset($counts[$row['referral']])){
$counts[$row['referral']] = array('total'=>0,'done'=>0);
}
//add to count
$counts[$row['referral']]['done'] += $row['cnt'];
}
//now loop over counts and generate insert queries to a temp table.
//I suggest using a temp table because you can insert multiple rows
//in one query and then the update is one query.
$sqlInsertList = array();
foreach($counts as $refId=>$count){
$sqlInsertList[] = "({$refId}, {$count['done']}, {$count['total']})";
}
//clear out the temp table first so we are only inserting new rows
$truncSql = "TRUNCATE TABLE tempTable";
$ps_trunc = $db->prepare($truncSql);
$ps_trunc->execute();
//make insert sql with multiple insert rows
$insertSql = "INSERT INTO tempTable (refId, done, total) VALUES ".implode(',',$sqlInsertList);
//prepare sql for insert into the local MySQL db ($db), where Gifting and tempTable live
$ps_insert = $db->prepare($insertSql);
$ps_insert->execute();
//sql to update existing rows
$updateSql = "UPDATE Gifting
SET refsInserted=(SELECT total FROM tempTable WHERE refId=Gifting.refId),
refsCompleted=(SELECT done FROM tempTable WHERE refId=Gifting.refId)
WHERE refId IN (SELECT refId FROM tempTable)
AND refsCompleted <= 100";
$ps_update = $db->prepare($updateSql);
$ps_update->execute();
} else {
echo "There were no reference ids found from \$dbh";
}
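The two grouped SELECTs in the answer above could probably be folded into one by using conditional aggregation, which halves the round trips to the remote server. The sketch below assumes the accounts columns shown in the question (username, referral, finished) and uses an in-memory SQLite database with made-up sample rows so that it is self-contained; it binds the IDs as parameters instead of imploding them into the SQL string.

```php
<?php
// Sketch only: SQLite stands in for the remote SQL Server database.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE accounts (username TEXT, referral INTEGER, finished INTEGER)');
$pdo->exec("INSERT INTO accounts VALUES ('a', 1, 1), ('b', 1, 0), ('c', 2, 1), ('d', 3, 0)");

$refIds = [1, 2]; // would come from the Gifting table
$placeholders = implode(',', array_fill(0, count($refIds), '?'));

// One pass over accounts yields both the total and the finished count.
$stmt = $pdo->prepare(
    "SELECT referral,
            COUNT(username) AS total,
            SUM(CASE WHEN finished = 1 THEN 1 ELSE 0 END) AS done
       FROM accounts
      WHERE referral IN ($placeholders)
      GROUP BY referral"
);
$stmt->execute($refIds);

$counts = [];
foreach ($stmt as $row) {
    $counts[(int) $row['referral']] = [
        'total' => (int) $row['total'],
        'done'  => (int) $row['done'],
    ];
}
print_r($counts);
```

SQL Server limits the number of parameters per statement, so for ~10k IDs the list would have to be sent in chunks or via a temporary table.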

Insert multiple rows in MySQL and check for random string without long delay (~80 rows each minute)

For a research project I am obtaining data from a local bus company's GPS system (through their API). I created a php cron job that runs every minute to obtain data like the vehicle, route ID, location, destination, etc. The data did not contain a unique "run number" for each bus route (a unique number so that I can track the progression of a single bus along its route), so I created my own that checks if the vehicle ID, destination, and relative time are similar, and assigns the unique "run ID" to it so that I can track the bus along its route. If no run ID exists, a random one is generated. (Any vehicle with the same "vid" and "pid" within 2 minutes of the last inserted row "timeadded" is on the same run, and this is important for my research)
Each time the cron runs (1 minute), approximately 80 rows are added into the database.
Initially the job would run quickly. However, with over 500,000 rows now, I've noticed the job can take upwards of 40 seconds. I believe it's because for each of the ~80 rows, it has to check the entire table ("vehicles") to see if the same run ID exists, essentially querying a large table and inserting a row 80 times. I want to get at least a week's worth of data (on day 4 now), at which point I can export the data, erase all rows, and start over. My question is: Is there any way I can refactor my PHP/SQL code to make the process run faster? It's been years since I've worked with SQL, so I'm sure there's a more ingenious way to insert all this data.
<?php
// Obtain data from XML
$xml = simplexml_load_file("url.xml");
foreach ($xml->vehicle as $vehicle) {
$vid = $vehicle->vid;
$tm = $vehicle->tmstmp;
$dat = substr($vehicle->tmstmp, 0, 8);
$tme = substr($vehicle->tmstmp, 9);
$lat = $vehicle->lat;
$lon = $vehicle->lon;
$hdg = $vehicle->hdg;
$pid = $vehicle->pid;
$rt = $vehicle->rt;
$des = $vehicle->des;
$pdist = $vehicle->pdist;
// Database connection and insert
mysql_connect("redacted", "redacted", "redacted") or die(mysql_error()); mysql_select_db("redacted") or die(mysql_error());
$sql_findsim = "SELECT vid, pid, timeadded, run, rt FROM vehicles WHERE vid=" . mysql_real_escape_string($vid). " AND pid=" . mysql_real_escape_string($pid). " AND rt=" . mysql_real_escape_string($rt). " AND timeadded > DATE_SUB(CURRENT_TIMESTAMP, INTERVAL 2 MINUTE);";
$handle = mysql_query($sql_findsim);
$row = mysql_fetch_row($handle);
$runid = $row[3];
if($runid !== null) {
$run = $runid;
} else {
$run = substr(md5(rand()), 0, 30);
}
$sql = "INSERT INTO vehicles (vid, tmstmp, dat, tme, lat, lon, hdg, pid, rt, des, pdist, run) VALUES ($vid,'$tm','$dat','$tme','$lat','$lon',$hdg,$pid,'$rt','$des',$pdist,'$run')";
$result = mysql_query($sql);
mysql_close();
}
?>
Thanks for any help with refactoring this code to get it to run more quickly and efficiently.
Do you have any indexes on the table? A compound index on (vid,pid,rt,timeadded) will make the query faster, avoiding a full table scan.
create index fastmagic on vehicles (vid,pid,rt,timeadded)
Alternatively, you could skip the select altogether and just do the insert without assigning the random "run" value. This will keep your cron job at "constant time" since all you're doing is appending new data.
After you've got your week of data, go back and write "second pass" code to step through each row (select * from vehicles order by timeadded). For each row, do your "select" similar to how you've already done it, then "update" the row you are currently processing.
If you go with the alternative, you'll probably want an autoincrement "id" integer column to make row identification clearer (if you don't already have one).
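The "second pass" can also be done in memory once the rows have been read in timeadded order. The sketch below is one possible shape for it: assignRuns() is a hypothetical helper, it hands out sequential integers instead of the random md5 strings in the question, and it treats rows with the same vid, pid and rt that are less than 2 minutes apart as the same run.

```php
<?php
// Hypothetical second-pass helper; $rows must be sorted by 'timeadded' ascending.
function assignRuns(array $rows, int $gapSeconds = 120): array
{
    $last = [];   // "vid|pid|rt" => ['time' => int, 'run' => int]
    $nextRun = 1;
    foreach ($rows as $i => $row) {
        $key  = $row['vid'] . '|' . $row['pid'] . '|' . $row['rt'];
        $time = strtotime($row['timeadded']);
        if (isset($last[$key]) && $time - $last[$key]['time'] < $gapSeconds) {
            $run = $last[$key]['run'];   // same bus, seen recently: same run
        } else {
            $run = $nextRun++;           // first sighting or gap too long: new run
        }
        $rows[$i]['run'] = $run;
        $last[$key] = ['time' => $time, 'run' => $run];
    }
    return $rows;
}

// Sample data (made up) in timeadded order.
$rows = [
    ['vid' => 1, 'pid' => 7, 'rt' => '22', 'timeadded' => '2012-01-01 10:00:00'],
    ['vid' => 2, 'pid' => 7, 'rt' => '22', 'timeadded' => '2012-01-01 10:00:30'],
    ['vid' => 1, 'pid' => 7, 'rt' => '22', 'timeadded' => '2012-01-01 10:01:00'],
    ['vid' => 1, 'pid' => 7, 'rt' => '22', 'timeadded' => '2012-01-01 10:05:00'],
];
$withRuns = assignRuns($rows);
print_r(array_column($withRuns, 'run'));
```

Each row is visited once and looked up in a small array, so the cost grows linearly with the number of rows rather than with rows times table size.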
I would suggest the following:
Create a table such as vehicle_ids (or some other meaningful name) with these fields:
vid, pid, run, rt
Instead of checking the vehicles table for the vid, you can check the table above for the id, and insert a row if it is not there (make vid auto increment).
Normalize your tables and also index your vehicles table.

Delete old rows in table if maximum exceeded

I insert a row into a table every time a user opens any post on my site; this way I get a real-time view of what's happening on the site.
mysql_query("INSERT INTO `just_watched` (`content_id`) VALUES ('{$id}')");
But now I have a problem: with over 100K hits every day, that is 100K new rows in this table every day. Is there any way to limit the table to a maximum of 100 rows, and if the maximum is exceeded, delete the oldest 90 and start inserting again, or something like that? I have no idea what the right way to do this is.
My table just_watched:
ID - content_id
ID INT(11) - AUTO_INCREMENT
content_id INT(11)
The easiest way that popped into my head would be to use PHP logic to delete and insert your information. Every time a user opens a new post, you add a row to the database (this you are already doing).
The new part is this:
Add a check before the insertion: before anything is inserted, first count all the rows. If the count does not exceed 100 rows, add the new row.
If it does exceed 100 rows, first run a DELETE statement and THEN insert the new row.
Example (pseudocode):
$sql = "SELECT COUNT(*) FROM yourtable";
$count = $db -> prepare($sql);
$count -> execute();
if ($count -> fetchColumn() >= 100) { // If the count is over a 100
............... //Delete the first 90 leave 10 then insert a new row which will leave you at 11 after the delete.
} else {
.................. // Keep inserting until you have 100 then repeat the process
}
More information on counting here. Then some more information on PDO here.
Hopefully this helps :)
Good luck.
Also information on how to set up PDO if you haven't already.
What would I do?
At 12:00 AM every night, run a cron job that deletes all rows from the past day. But that's just some advice. Have a good one.
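Use this query to delete old rows, keeping only the last 100:
DELETE FROM just_watched WHERE
ID NOT IN (SELECT ID FROM (SELECT ID FROM just_watched ORDER BY ID DESC LIMIT 100) AS newest)
The extra derived table is needed because MySQL does not allow LIMIT directly inside an IN subquery, nor a subquery that reads the table being deleted from. You can run it from cron every n hours or minutes, whichever period suits you.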
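As a rough illustration of the "keep only the newest rows" approach, the sketch below wraps the delete in a small function. trimToNewest() and its $keep parameter are hypothetical, and an in-memory SQLite database stands in for MySQL so the example runs on its own. The inner SELECT sits in a derived table because MySQL rejects LIMIT directly inside an IN (...) subquery.

```php
<?php
// Hypothetical helper: delete everything except the $keep newest rows.
function trimToNewest(PDO $pdo, int $keep): int
{
    $sql = "DELETE FROM just_watched
             WHERE ID NOT IN (
                   SELECT ID FROM (
                       SELECT ID FROM just_watched ORDER BY ID DESC LIMIT $keep
                   ) AS newest
             )";
    return $pdo->exec($sql); // number of rows deleted
}

// Sketch only: SQLite stands in for MySQL.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE just_watched (ID INTEGER PRIMARY KEY AUTOINCREMENT, content_id INTEGER)');

$insert = $pdo->prepare('INSERT INTO just_watched (content_id) VALUES (?)');
foreach ([10, 11, 12, 13, 14] as $contentId) {
    $insert->execute([$contentId]);
}

$deleted = trimToNewest($pdo, 3);
$left = array_map(
    'intval',
    $pdo->query('SELECT content_id FROM just_watched ORDER BY ID')->fetchAll(PDO::FETCH_COLUMN)
);
print_r($left);
```

Called from cron or after every few inserts, this keeps the table bounded without counting rows on each page view.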
$numRows = mysql_num_rows(mysql_query("SELECT ID FROM just_watched"));
if ($numRows > 100){
mysql_query("DELETE FROM just_watched ORDER BY ID ASC LIMIT 90");
}
mysql_query("INSERT INTO `just_watched` (`content_id`) VALUES ('{$id}')");
I guess this should work fine.
You can get the number of rows in your table with:
$size = mysql_num_rows($result);
With the size of the table, you can check whether it's getting too big, and then remove 90 rows:
// Get the 90 oldest rows
$query = "SELECT ID FROM just_watched ORDER BY ID ASC LIMIT 90";
$result = mysql_query($query);
// Go through them
while ($row = mysql_fetch_object($result)) {
// Delete the row with this ID
$id = (int) $row->ID;
mysql_query("DELETE FROM just_watched WHERE ID = $id");
}
Another way would be to just delete an old row whenever you add a new row to the table. The only problem is that if something gets jammed, the table might get too big.
You may use
DELETE FROM just_watched WHERE id < (SELECT id FROM (SELECT id FROM just_watched ORDER BY id DESC LIMIT 99, 1) AS cutoff);
MySQL's DELETE accepts only a row count in LIMIT, not an offset, so the inner query finds the id of the 100th newest row and everything older than it is deleted. If you always run this query before you insert a new row, it'll do the job for you.

Comparing strings returned from a mySql query

I am querying a large number of codes from my database, and need to have some validation before a user can input another code in to the database.
An example code would be this:
TD-BR-010212-xxxxxxxx
Where TD represents a promotion, BR represents a place, the numbers represent a date, and the rest are random.
My problem is that before the code is entered into the DB, I want to check whether the date and place for that code already exist, as users should not be allowed to enter a code from the same place and date.
I assume it would be something within a loop as I already have:
$location_part_of_td = $code[2].$code[3];
$date_part_of_td = $code[4].$code[5].$code[6].$code[7].$code[8].$code[9];
$trade_day_result = mysql_query("SELECT * FROM wp_scloyalty WHERE promotion_type = 'trade-day'") or die(mysql_error()); // Pulls all trade day codes from the database and checks the date part of the code.
// the date part exists with the same area part, user cant redeem.
while($info = mysql_fetch_array( $trade_day_result ))
{
$code = $info["product"];
}
But I'm just not sure about the best way to check the strings.
You can use a MySQL LIKE clause to get entries in your DB that resemble your code.
Example:
$code_exists = mysql_query(
"SELECT 'a' FROM table_name WHERE column_name LIKE 'TD-BR-010212-%'"
);
if(mysql_num_rows($code_exists) > 0) {
// The specified place/date is taken
} else {
// No promotion at place BR on the specified date.
}
The '%' is used as a wildcard in SQL LIKE clauses.
You have two approaches to solving this issue, assuming you have access to alter the table.
Add a unique constraint to the table based on the two columns.
Or take your approach of selecting all rows for the location and date, and see whether it returns any results.
SQL: SELECT COUNT(*) AS counter FROM table WHERE column LIKE 'TD-BR-010212-%'
Then check whether counter is > 0.
I would use the LIKE statement in your SELECT and pull entries that start with the same promotion, place, and date. Unfortunately I don't know how your table looks so bear with me:
$promo_query = "SELECT * FROM wp_scloyalty WHERE column_name LIKE 'TD-BR-010212-%'";
$promo_result = mysql_query($promo_query);
if(mysql_num_rows($promo_result) == 0) {
// the promo code has NOT been used
} else {
// the promo code HAS been used
}
Try this query:
$part_code = mysql_real_escape_string(substr($code, 0, 12)); // e.g. "TD-BR-010212"
$records = mysql_query("SELECT id FROM tableName WHERE SUBSTRING(code, 1, 12) = '$part_code'");
if(mysql_num_rows($records) > 0)
{
// Duplicate exists
}
else
{
// insert code in DB
}
If you can, you'll get better performance and easier coding if you break apart the code into different fields when you save the data in each row. That way you can write queries that specifically check values for the components pieces of the code - you can even set rules in the database (like UNIQUE) to ensure that some parts are kept unique.
Specifically, I'd suggest:
create table your_table (
[... your other columns ...]
promotion char(2),
place char(2),
pr_date date,
pr_ident varchar(50)
)
Your first row would be ([...], 'TD','BR','2012-01-02', 'xxxxxxxx'). And queries would not require unpacking the formatted string - you could say things like "where promotion = 'TD' and place in ('BR','XX') ...". Simple, eh?
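To illustrate how the database itself can enforce "one code per promotion, place and date", here is a minimal sketch using a UNIQUE constraint over those columns. The table name codes and the redeem() helper are made up for the example, and an in-memory SQLite database stands in for MySQL so it runs on its own; the same constraint syntax should carry over to MySQL.

```php
<?php
// Sketch only: SQLite stands in for MySQL; names are assumptions.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE codes (
    promotion TEXT NOT NULL,
    place     TEXT NOT NULL,
    pr_date   TEXT NOT NULL,
    pr_ident  TEXT NOT NULL,
    UNIQUE (promotion, place, pr_date)
)');

// Returns true if the code was stored, false if that place/date is taken.
function redeem(PDO $pdo, string $promotion, string $place, string $date, string $ident): bool
{
    $stmt = $pdo->prepare(
        'INSERT INTO codes (promotion, place, pr_date, pr_ident) VALUES (?, ?, ?, ?)'
    );
    try {
        $stmt->execute([$promotion, $place, $date, $ident]);
        return true;
    } catch (PDOException $e) {
        // SQLSTATE class 23 signals an integrity constraint violation.
        if (strpos((string) $e->getCode(), '23') === 0) {
            return false;
        }
        throw $e;
    }
}

$first  = redeem($pdo, 'TD', 'BR', '2012-01-02', 'aaaaaaaa');
$second = redeem($pdo, 'TD', 'BR', '2012-01-02', 'bbbbbbbb'); // same place and date
$third  = redeem($pdo, 'TD', 'XX', '2012-01-02', 'cccccccc'); // different place
var_dump($first, $second, $third);
```

Letting the insert fail on the constraint also avoids the race between a separate "does it exist?" query and the insert that follows it.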
