mysql database with revolving data - php

I'm not sure the best way to phrase this question!
I have a MySQL database that needs to retrieve and store data values, and it always needs to hold exactly the last 24 of them.
I have this fully working, but I am sure there must be a better way to do it!
At the moment, my MySQL table has columns for id, timestamp, etc., and then 24 data columns:
data_01
data_02
data_03
data_04
data_05
etc
There are multiple rows for different ids.
I run a cron job every hour, which deletes the column data_24 and then renames all the remaining columns:
data_01 -> data_02
data_02 -> data_03
data_03 -> data_04
data_04 -> data_05
data_05 -> data_06
etc
And then adds a new, blank column:
data_01
The new data is then added into this new, blank column.
Does this sound like a sensible way to do this, or is there a better way?
My concern with this method is that the deleting, renaming and adding of columns has to happen first, before the new data is retrieved, so that the new column exists when the data is added.
If the data retrieval fails for any reason, my table is then left with a column of NULL values.

Renaming columns for something like this is not a good idea.
I'm curious how you insert and update this data, but there must be a better way to do this.
Two things that seem feasible:
Not renaming the columns, but shifting each value to the next column. Note that MySQL applies SET assignments left to right, so you must shift starting from the oldest column and assign the new value last (otherwise data1 would be overwritten before it is copied):
update YourTable
set data24 = data23,
data23 = data22,
-- ... and so on, down to ...
data2 = data1,
data1 = :newvalue;
Or by spreading the data over 24 rows instead of 24 columns. Each data value becomes a row in your table (or in a new table in which your id is a foreign key). Every time you insert a new value, you also delete the oldest value for that same id. You can do this in one atomic transaction, so there will never be more or fewer than 24 rows per id.
start transaction;
insert into YourTable(id, data)
values (:id, :newvalue);
delete from YourTable
where id = :id
order by timestamp asc -- the oldest row for this id, not the newest
limit 1;
commit;
This will multiply the number of rows (but not the amount of data) by 24, so for 1000 rows (like you mentioned), you're talking about 24000 rows, which is still peanuts if you have the proper indexes.
We have tables in MySQL with over 100 million rows. Manipulating 24,000 rows is far easier than rewriting a complete table of 1000 rows, which is essentially what you're doing by renaming the columns.
So the second option certainly has my preference. It will give you a simple structure, and should you ever decide not to clean up old data, to move the cleanup to a separate job, or to keep 100 items instead of 24, you can do that by changing 3 lines of code instead of completely overhauling your table structure and the application with it.

It doesn't look like a sensible way of doing things, to be honest.
IMHO, having multiple rows instead of a wide table is much more flexible.
You can define columns (id, entity_id, value, created). Then you'll be able to write your records in a "log" manner.
When you need to select the data in the same shape as it used to be, you can use a MySQL view for that. Something like
CREATE VIEW my_view AS
SELECT data_01, ..., data_24 -- here you should put the aggregated values aliased as data_01 ... data_24
FROM my_table
WHERE my_table.created >= DATE_SUB(NOW(), INTERVAL 1 DAY)
GROUP BY ... -- here you should aggregate the fields by hours
ORDER BY created;
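To make that concrete, here is a minimal sketch of such a pivot, assuming the log table is my_table(entity_id, value, created) with one value per hour (all names here are illustrative):
CREATE VIEW my_view AS
SELECT entity_id,
       -- data_01 = the most recent hour, data_24 = the oldest of the last 24 hours
       MAX(CASE WHEN TIMESTAMPDIFF(HOUR, created, NOW()) = 0 THEN value END) AS data_01,
       MAX(CASE WHEN TIMESTAMPDIFF(HOUR, created, NOW()) = 1 THEN value END) AS data_02,
       -- ... one line per hour slot, up to ...
       MAX(CASE WHEN TIMESTAMPDIFF(HOUR, created, NOW()) = 23 THEN value END) AS data_24
FROM my_table
WHERE created >= DATE_SUB(NOW(), INTERVAL 1 DAY)
GROUP BY entity_id;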

Related

Updating Large Amount Of Rows With Different Sets Of Columns More Efficiently?

I have 10,000 rows I need to update in a MySQL table, and I need to update a different set of columns for each row (for example, some rows need the username changed, some rows need the phone number changed, and some rows need both changed). I need to be able to update these 10,000 rows in under 10 minutes, which presents the problem:
I am currently performing a separate update query for each row (using PDO), and it takes way too long to update 10,000 rows via 10,000 separate queries. I have used a "batch insert" before to speed up inserting 10,000 rows, but what can I do to speed things up performance-wise in the update department?
There is basically no difference in cost between updating one column and updating all columns in a record. The overhead is in logging, locking, and managing the dirty data page -- and a data page generally contains multiple records.
If I assume that none of the values are being updated to NULL, then you can create a table of updates that has:
The primary key of the table being updated.
non-NULL new values in columns whose values are changing.
NULL values for columns that are not changing.
Then, the update looks like:
update original o join
updates u
on o.pk = u.pk
set o.col1 = coalesce(u.col1, o.col1),
o.col2 = coalesce(u.col2, o.col2),
...;
No where clause is needed because presumably every row in the updates table will have at least one non-NULL value.
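A sketch of the full flow under those assumptions (table and column names are illustrative, and the multi-row insert would of course carry all 10,000 change rows):
create temporary table updates (
    pk   int primary key,
    col1 varchar(64) null,  -- new username, or NULL if unchanged
    col2 varchar(32) null   -- new phone number, or NULL if unchanged
);

-- load every change row in one multi-row insert (or LOAD DATA INFILE)
insert into updates (pk, col1, col2) values
    (1, 'new_name', null),
    (2, null, '555-0100'),
    (3, 'other_name', '555-0101');

-- one joined update then applies everything at once
update original o join
       updates u
       on o.pk = u.pk
    set o.col1 = coalesce(u.col1, o.col1),
        o.col2 = coalesce(u.col2, o.col2);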
10,000 rows in MySQL is nothing, especially if you use the PK (integer) in your WHERE clause. The secret to a fast update is proper indexing and use of the WHERE clause. If you need to update all rows, then you might as well just scan the entire table.

I want to set a max number of rows in MySQL table and Delete

What I actually want is to insert data into a SQL table but limit the table to 50 rows, no more, and have old data deleted as new data is inserted.
For example:
If there are already 50 rows in the table and I add 10 new ones, the new rows are inserted and the 10 oldest rows are deleted.
If there are 10 new rows and 45 rows already in the table, it deletes the first 5 rows and adds the new 10, keeping the total at 50.
So I need help and suggestions on how to put this restriction on the table, so that when new data comes in, rows are deleted from the start if the total would exceed 50.
Thanks in advance.
Why? You are just making the inserts take longer.
Instead, you can insert new rows and use an auto-incrementing primary key. Then you can do something like:
select t.*
from t
order by t.id desc
limit 50;
This will get you the most recent 50 rows. And the query should perform quite well.
What advantages does this have?
You get to keep all the data, which is quite useful to see what happened in the past.
Performance is not affected.
You can change "50" to another number on-the-fly.
Your inserts are not slowed down by deletes.
There is no need to deal with triggers and other complexity.
Of course, if your table is going to grow to tens of millions of rows, this might not be the optimal solution (the table itself will start to eat up memory for other purposes). But for smallish tables, this is a very viable solution.
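If you do eventually need to cap the physical table, a periodic cleanup job is still simpler than deleting on every insert. A minimal sketch under the auto-increment scheme above (names illustrative):
-- keep only the newest 50 rows; run this from a cron job, not per insert
delete t from t
join (
    select id from t order by id desc limit 1 offset 49
) cutoff on t.id < cutoff.id;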

MySQL time series database, track quantity/price/data history — insert a new row only if a new value is different from the previous one?

I am trying to build a time series product database that tracks product stock quantities (100k+ products). It will be updated from a CSV file every 30 min, and I only want to insert a new record if AvailQuant or AvailNextQuant has changed. Every new source CSV file has a new date and time on every row. Some stock quantities might change only once per month, so there's no point inserting a duplicate row every 30 min when only the time is different. There must be some easy, obvious way to do this, as I would think this is quite a common thing to do (price history tracking sites etc. update only when the price changes).
Columns are as follows: ProductID, AvailQuant, AvailDate, AvailTime, AvailNextQuant, AvailNextDate.
I first thought to use 3 separate tables: tmp1, tmp2 and a final time series table. First LOAD DATA INFILE REPLACE into the tmp1 table, then INSERT only new products and UPDATE the existing products whose stock value changed into tmp2, and after that, from the tmp2 table, INSERT IGNORE into the final time series table where the unique index is ProductID + Date + Time. Not sure how to achieve this, or whether it is even anywhere near a correct approach? Now I also think that with LOAD DATA INFILE I should only need one tmp table?
PS. I'm a total newbie with MySQL, so if someone knows how to do this, a little explanation with example code would be highly appreciated.
Set ProductID, AvailQuant and AvailNextQuant as the primary key, then use an insert with ON DUPLICATE KEY handling.
On duplicate key ignore?
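A minimal sketch of that suggestion (column types are assumptions; note that the key also suppresses a row whenever a quantity returns to a previously seen value):
-- duplicate quantities for a product collide with the key and are skipped
CREATE TABLE time_series (
    ProductID      INT NOT NULL,
    AvailQuant     INT NOT NULL,
    AvailDate      DATE NOT NULL,
    AvailTime      TIME NOT NULL,
    AvailNextQuant INT NOT NULL,
    AvailNextDate  DATE NULL,
    PRIMARY KEY (ProductID, AvailQuant, AvailNextQuant)
);

INSERT IGNORE INTO time_series
    (ProductID, AvailQuant, AvailDate, AvailTime, AvailNextQuant, AvailNextDate)
VALUES
    (12345, 7, '2020-01-01', '12:30:00', 9, '2020-01-15');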
So this is what I came up with so far. I'm not 100% sure it works correctly and doesn't skip any rows, but it looked like it worked OK when I tested it. If someone knows a better and simpler way, please let us know (there must be an easier or simpler way)? This method is not perfect, as discontinued products will not be deleted from the temp tables. I'm also not sure how to test the integrity of the data and the code, as there are 100k+ rows in each file that gets loaded every 30 min.
I have set up 3 duplicate tables: tmp1, tmp2 and time_series
Step 1, tmp1: Primary key = ProductID (CSV gets imported here)
Step 2, tmp2: Primary key = ProductID (Cleans the unwanted rows)
Final, time_series: Primary key = ProductID, AvailDate, AvailTime (Holds the time series data history)
Columns are as follows: ProductID, AvailQuant, AvailDate, AvailTime, AvailNextQuant, AvailNextDate.
Step 1, First we need to get the data from the CSV file (tab delimited) into the database, using LOAD DATA INFILE into tmp1. The REPLACE option, with ProductID as the primary key, will replace already existing products and INSERT new ones that don't exist in the database. Discontinued products will not be deleted from tmp1. We only want the latest data; that's why we replace.
sql1 = "LOAD DATA LOCAL INFILE 'csv_file.txt'
REPLACE
INTO TABLE tmp1
FIELDS TERMINATED BY '\t'
ENCLOSED BY ''
LINES TERMINATED BY '\n'
IGNORE 1 ROWS";
Step 2, Then we need to compare tmp1's ProductID, AvailQuant and AvailNextQuant against the tmp2 table, and select and replace only the changed rows from tmp1 into tmp2. Again, the REPLACE command with ProductID as the primary key will replace the old rows with new ones, and new products that didn't exist before will be inserted into tmp2 as well. Discontinued products will not be deleted from tmp2. Without step 2, tmp1 would contain rows that differ only in date and time, which would have given the time series duplicate rows with only a different date. This data is ready for the time series table because it only contains newly changed rows plus the existing rows that didn't change; the unchanged rows will be ignored on the final insert.
sql2 = "REPLACE INTO tmp2
SELECT tmp1.*
FROM tmp1 LEFT OUTER JOIN tmp2
ON tmp1.ProductID=tmp2.ProductID
AND tmp1.AvailQuant=tmp2.AvailQuant
AND tmp1.AvailNextQuant=tmp2.AvailNextQuant
WHERE tmp2.ProductID IS NULL";
Finally, we can INSERT IGNORE from tmp2 into the time_series table. Because the primary key is (ProductID, AvailDate, AvailTime), IGNORE will skip duplicate rows that are already in the time series table and have not changed in tmp2.
sql3 = "INSERT IGNORE INTO time_series
SELECT * FROM tmp2";

Optimal query - any way to avoid hundreds of queries in loop?

I want to save top 100 results for a game daily.
I have two tables: Users, Stats.
The Users table has two columns: userID (mediumint(8) unsigned AUTO_INCREMENT) and userName (varchar(500)).
The Stats table has three columns: time (date), userID (mediumint(8) unsigned AUTO_INCREMENT), and result (tinyint(3) unsigned).
Now, every time I execute the query (daily), I have an array of 100 results with user names. So here's what I need to do:
For every result in the array:
get the user id from the Users table, or if the user doesn't exist in the Users table, create an entry and get the id;
insert the current date, user id and result into the Stats table.
What would be the most optimal way to do this in PHP and MySQL? Is there a way to avoid having 200 queries in a 'for' loop?
Thanks for your time guys.
200 queries per day is nothing. You can leave everything as is and there will not be a single problem.
Why do you have an array with user names where you ought to have user ids instead?
MySQL's INSERT query supports a multiple-VALUES list. So you could assemble a string like
VALUES (time,userid,result),(time,userid,result),(time,userid,result)
and run it at once.
Also note that userID should be int, not mediumint, and in the Stats table it shouldn't be auto-incremented.
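To sketch that idea (this assumes a UNIQUE key on userName; the names and result values are illustrative, and both statements would be assembled with all 100 tuples):
-- one statement creates any missing users (no-op for existing ones)
INSERT INTO Users (userName) VALUES ('alice'), ('bob'), ('carol')
ON DUPLICATE KEY UPDATE userName = userName;

-- one statement inserts all results, resolving each id with a scalar subquery
INSERT INTO Stats (time, userID, result)
VALUES
    (CURDATE(), (SELECT userID FROM Users WHERE userName = 'alice'), 87),
    (CURDATE(), (SELECT userID FROM Users WHERE userName = 'bob'), 65),
    (CURDATE(), (SELECT userID FROM Users WHERE userName = 'carol'), 12);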
Use a pair of prepared statements. You'll still be running each one 100 times, but the query itself will already be parsed (and even cached by the DB server), it'll just have 100 different sets of parameters to be run with, which is quite efficient.
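For illustration, here is the server-side equivalent of such a pair (PDO would bind the same placeholders from PHP; the LAST_INSERT_ID(expr) trick assumes a UNIQUE key on userName):
PREPARE get_or_create_user FROM
    'INSERT INTO Users (userName) VALUES (?)
     ON DUPLICATE KEY UPDATE userID = LAST_INSERT_ID(userID)';
PREPARE insert_stat FROM
    'INSERT INTO Stats (time, userID, result) VALUES (CURDATE(), ?, ?)';

-- one round of this per result in the array
SET @name = 'alice';
EXECUTE get_or_create_user USING @name;
SET @uid = LAST_INSERT_ID();
SET @result = 87;
EXECUTE insert_stat USING @uid, @result;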

What's the fastest way to poll a MySQL table for new rows?

My application needs to poll a MySQL database for new rows. Every time new rows are added, they should be retrieved. I was thinking of creating a trigger to place references to new rows on a separate table. The original table has over 300,000 rows.
The application is built in PHP.
Some good answers; I think the question deserves a bounty.
For external applications, I find that using a TIMESTAMP column is a more robust method that is independent of auto ids and other primary key issues.
Add columns to the tables such as:
insertedOn TIMESTAMP DEFAULT CURRENT_TIMESTAMP
or to track inserts and updates
updatedOn TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
In the external application, all you need to do is track the last timestamp at which you polled. Then select from that timestamp forward on all the relevant tables. On large tables you may need to index the timestamp column.
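A sketch of one polling round under that scheme (watched_table and the saved timestamp are illustrative):
-- @last_poll is whatever timestamp the application saved after its previous poll
SET @last_poll = '2020-01-01 12:00:00';
SELECT *
FROM watched_table
WHERE insertedOn > @last_poll
ORDER BY insertedOn;
-- afterwards, store the newest insertedOn from this result set as the next @last_poll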
You can use the following statement to find out if a new record was inserted in the table:
select max(id) from table_name
replacing the name of the primary key and the table name in the above statement. Keep the max(id) value in a temporary variable, and retrieve all new records between it and the last saved max(id) value. After fetching the new records, set the saved max(id) value to the one you got from the query.
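A sketch of that round trip using MySQL user variables (the application could equally keep the last value on its own side):
-- find the current high-water mark
SELECT MAX(id) INTO @current_max FROM table_name;

-- fetch only the rows added since the previous poll
SELECT *
FROM table_name
WHERE id > @last_max AND id <= @current_max;

-- remember the new high-water mark for the next poll
SET @last_max = @current_max;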
Create a PHP daemon to monitor the MySQL table file size; if the size changes, query for new records, and if new records are found, run the next process.
I think there is an active PEAR daemon you can easily configure to monitor the MySQL table file size and kick off your script.
Assuming you have an identity column or some other value that always grows, you should keep track in your PHP application of the last id retrieved.
That would work for most scenarios. Unless you are in the real-time camp, I don't think you'd need any more than that.
I would do something like the following. Of course, this is assuming that ID is an incrementing numerical ID.
How you store your "current location" in the database is up to you.
<?php
$idFile = 'lastID.dat';

// Read the last processed id from disk, defaulting to 0 on the first run
if (is_file($idFile)) {
    $lastSelectedId = (int)file_get_contents($idFile);
} else {
    $lastSelectedId = 0;
}

$res = mysql_query("select * from table_name where id > {$lastSelectedId}");
while ($row = mysql_fetch_assoc($res)) {
    // Do something with the new rows
    if ($row['id'] > $lastSelectedId) {
        $lastSelectedId = $row['id'];
    }
}

// Remember the highest id seen, for the next poll
file_put_contents($idFile, $lastSelectedId);
?>
I would concur with TFD's answer about keeping track of a timestamp in a separate file/table and then fetching all rows newer than that. That's how I do it for a similar application.
Your application querying a single-row table (or file) to see if a timestamp has changed from the local storage should not be much of a performance hit. Then, fetching new rows from the 300k-row table based on the timestamp should again be fine, assuming the timestamp is properly indexed.
However, reading your question I was curious whether MySQL triggers can make system calls, say to a PHP script that would do some heavy lifting. It turns out they can, by using the sys_exec() user-defined function. You could use this to do all sorts of processing by passing the inserted row data into it, essentially getting instant notification of inserts.
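A hedged sketch of such a trigger (sys_exec() comes from the third-party lib_mysqludf_sys library, which must be installed separately; the table, script path and trigger name are illustrative):
DELIMITER //
CREATE TRIGGER notify_after_insert
AFTER INSERT ON watched_table
FOR EACH ROW
BEGIN
    DECLARE result INT;
    -- hand the new row's id to an external script for processing
    SET result = sys_exec(CONCAT('/usr/bin/php /path/to/notify.php ', NEW.id));
END//
DELIMITER ;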
Finally, a word of caution about using triggers to call external applications.
One option might be to use an INSERT INTO ... SELECT statement. Building on the suggestions to use timestamps to pull the latest rows, you could do something like...
INSERT INTO t2 (
SELECT *
FROM t1
WHERE createdts > DATE_SUB(NOW(), INTERVAL 1 HOUR)
);
This would take all of the rows inserted in the previous hour and insert them in to table 2. You could have a script run this query and have it run every hour (or whatever interval you need).
This would drastically simplify your PHP script for pulling rows as you wouldn't need to iterate over any rows. It also gets rid of having to keep track of the last insert id.
The solution Fanis proposed also sounds interesting.
As a note, the SELECT query in the above INSERT can be adjusted to only insert certain fields. If you only need certain fields, you would need to specify them in the INSERT like so...
INSERT INTO t2 (field1, field2) (
SELECT field1, field2
FROM t1
WHERE createdts > DATE_SUB(NOW(), INTERVAL 1 HOUR)
);