I have a table which holds notices for a university website. I want it to hold a maximum of 1,000 notices. I am using the id field (which is auto_increment) to fetch the 10 most recent notices (by counting the total entries, which gives the current most recent id, and then traversing backwards by 10), then the next 10, and so on.
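Roughly, the current approach looks something like this (just an illustration of what was described above; the id values are only examples):
SELECT MAX(id) FROM notices;                                              -- most recent id, say 1000
SELECT * FROM notices WHERE id <= 1000 AND id > 990 ORDER BY id DESC;     -- the 10 most recent
SELECT * FROM notices WHERE id <= 990  AND id > 980 ORDER BY id DESC;     -- the next 10, and so on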
Now, when the table reaches the 1,000-notice limit, new notices should start getting written from id 1 again, overwriting the existing data.
The problem is: how can I modify the SQL query to identify the most recent notice? Suppose I uploaded 17 notices after the table was full; then notices 1 to 17 are the recent ones, 17 being the most recent, while the one next to it, i.e. 18, is the oldest notice in the table.
Or is there a tutorial, or something specific to this kind of case, or an optimal method for it?
Here is my table structure-
+-----------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| title | varchar(160) | NO | | NULL | |
| body | mediumtext | NO | | NULL | |
| posted_by | varchar(30) | NO | | NULL | |
| semester | int(2) | NO | | NULL | |
| branch | varchar(30) | NO | | NULL | |
| posted_on | date | NO | | NULL | |
+-----------+--------------+------+-----+---------+----------------+
You would use a delete query after your insertion query. That way you wouldn't need to pull another query to see which record was last updated.
INSERT INTO notices ...;
DELETE FROM notices
WHERE id NOT IN (
    SELECT id FROM (
        SELECT id FROM notices ORDER BY id DESC LIMIT 1000
    ) AS newest
);
(SELECT TOP is SQL Server syntax, not MySQL, and MySQL does not allow LIMIT directly inside an IN subquery, hence the extra derived table.)
If you wanted to keep the same 1,000 ids for some reason, you could switch auto-increment off on the table; then you would have to run two different queries:
SELECT id FROM notices ORDER BY posted_on ASC LIMIT 1;   -- the oldest notice, i.e. the one to overwrite
UPDATE notices SET .... WHERE id=$id;
Really, because MySQL is designed to handle millions of rows, you don't actually have to do this at all. You can just pull a query with a limit of 1,000:
SELECT * FROM notices ORDER BY posted_on DESC LIMIT 1000;
Related
I have a table called ratings with the following fields:
+-----------+------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+------------+------+-----+---------+----------------+
| rating_id | bigint(20) | NO | PRI | NULL | auto_increment |
| user_id | int(11) | NO | MUL | NULL | |
| movie_id | int(11) | NO | | NULL | |
| rating | float | NO | | NULL | |
+-----------+------------+------+-----+---------+----------------+
Indexes on this table:
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| ratings | 0 | PRIMARY | 1 | rating_id | A | 100076 | NULL | NULL | | BTREE | | |
| ratings | 0 | user_id | 1 | user_id | A | 564 | NULL | NULL | | BTREE | | |
| ratings | 0 | user_id | 2 | movie_id | A | 100092 | NULL | NULL | | BTREE | | |
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
I have another table called movie_average_ratings which has the following fields:
+----------------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------+---------+------+-----+---------+-------+
| movie_id | int(11) | NO | PRI | NULL | |
| average_rating | float | NO | | NULL | |
+----------------+---------+------+-----+---------+-------+
As is obvious by this point, I want to calculate the average rating of movies from the ratings table and update the movie_average_ratings table. I tried the following SQL query:
UPDATE movie_average_ratings
SET average_rating = (SELECT AVG(rating)
FROM ratings
WHERE ratings.movie_id = movie_average_ratings.movie_id);
Currently there are around 10,000 movie records and 100,000 rating records, and I get a Lock wait timeout exceeded; try restarting transaction error. The number of records can grow significantly, so I don't think increasing the timeout is a good solution.
So, how can I write a 'scalable' query to achieve this? Is iterating over the movie_average_ratings records and calculating the averages individually the most efficient solution?
Without an EXPLAIN, it's hard to be clear on what's holding you up. It's also not clear that you will get a performance improvement by storing this aggregated data in a denormalized table: if the query to calculate the ratings executes in 0.04 seconds, querying your denormalized table is unlikely to be much faster.
In general, I recommend only denormalizing if you know you have a performance problem.
But that's not the question.
I would do the following:
DELETE FROM movie_average_ratings;

INSERT INTO movie_average_ratings
SELECT movie_id, AVG(rating)
FROM ratings
GROUP BY movie_id;
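If these are InnoDB tables (which the lock-wait error suggests), one possible refinement is to wrap both statements in a transaction so readers never see an empty table. Roughly:
START TRANSACTION;
DELETE FROM movie_average_ratings;
INSERT INTO movie_average_ratings
SELECT movie_id, AVG(rating)
FROM ratings
GROUP BY movie_id;
COMMIT;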
I just found something in another post:
What is happening is, some other thread is holding a record lock on some record (you're updating every record in the table!) for too long, and your thread is being timed out.
This means that some of your records are locked. You can force-unlock them from the console:
1) Enter MySQL: mysql -u your_user -p
2) Let's see the list of locked tables: mysql> show open tables where in_use>0;
3) Let's see the list of the current processes; one of them is locking your table(s): mysql> show processlist;
4) Kill one of these processes: mysql> kill put_process_id_here;
You could redesign the movie_average_ratings table to
movie_id (int)
sum_of_ratings (int)
num_of_ratings (int)
Then, if a new rating is added, you can add it to movie_average_ratings and calculate the average only when it is needed.
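A minimal sketch of that incremental update, using the columns listed above (the :movie_id and :rating placeholders are only illustrative):
INSERT INTO movie_average_ratings (movie_id, sum_of_ratings, num_of_ratings)
VALUES (:movie_id, :rating, 1)
ON DUPLICATE KEY UPDATE
    sum_of_ratings = sum_of_ratings + VALUES(sum_of_ratings),
    num_of_ratings = num_of_ratings + 1;
The average, whenever you need it, is then simply sum_of_ratings / num_of_ratings.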
I am trying to figure out how to calculate the rankings for a game of assassins that I am running. I want to rank people by kills first, then by time of kills (those who got their kills before the others rank higher), and finally rank the people who have already been assassinated below those who are still alive.
My table for logging assassinations looks like this:
mysql> describe assassinations;
+-----------+-----------------------------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+-----------------------------------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| assassin | int(11) | NO | | NULL | |
| target | int(11) | NO | | NULL | |
| timestamp | int(11) | NO | | NULL | |
| ver | enum('assassin','target','both','none') | NO | | none | |
| confirmed | bit(1) | NO | | b'0' | |
+-----------+-----------------------------------------+------+-----+---------+----------------+
I am thinking that there must be a way to order the MySQL results exactly the way I want them ranked, but I don't know how. I got as far as trying to get the most common assassin value. I am using PHP with MySQL, so a PHP solution would also work. (Please ignore the "confirmed" field, but "ver" must be 'both' for a kill to be valid.)
Any help would be much appreciated. :)
Use COUNT and MIN to get the number of kills and the time of the first kill, and an EXISTS subquery to tell whether the assassin has already been killed. Then you can use all these values in the ORDER BY clause to rank the players.
SELECT a1.assassin, COUNT(*) AS kills, MIN(a1.timestamp) AS killtime,
       EXISTS (SELECT * FROM assassinations AS a2
               WHERE a2.target = a1.assassin) AS killed
FROM assassinations AS a1
WHERE a1.ver = 'both'
GROUP BY a1.assassin
ORDER BY kills DESC, killtime ASC, killed ASC;
I have a web app in which I show a series of posts based on this table schema (there are thousands of rows like this, and other columns too, removed as they are not required for this question):
+---------+----------+----------+
| ID | COL1 | COL2 |
+---------+----------+----------+
| 1 | NULL | ---- |
| 2 | --- | NULL |
| 3 | NULL | ---- |
| 4 | --- | NULL |
| 5 | NULL | NULL |
| 6 | --- | NULL |
| 7 | NULL | ---- |
| 8 | --- | NULL |
+---------+----------+----------+
And I use this query:
SELECT * from `TABLE` WHERE `COL1` IS NOT NULL AND `COL2` IS NULL ORDER BY `COL1`;
And the result set I get looks like this:
+---------+----------+----------+
| ID | COL1 | COL2 |
+---------+----------+----------+
| 12 | --- | NULL |
| 1 | --- | NULL |
| 6 | --- | NULL |
| 8 | --- | NULL |
| 11 | --- | NULL |
| 13 | --- | NULL |
| 5 | --- | NULL |
| 9 | --- | NULL |
| 17 | --- | NULL |
| 21 | --- | NULL |
| 23 | --- | NULL |
| 4 | --- | NULL |
| 32 | --- | NULL |
| 58 | --- | NULL |
| 61 | --- | NULL |
| 43 | --- | NULL |
+---------+----------+----------+
Notice that the ID column is jumbled, thanks to the ORDER BY clause.
I have proper indexes to optimize these queries.
Now, let me explain the real problem. I have lazy-load functionality in my web app, so I display around 10 posts per page by adding LIMIT 10 to the query for the first page.
We are good till here. But the real problem comes when I have to load the second page: what do I query now? I do not want posts to be repeated, and new posts come in almost every 15 seconds, which puts them on top (by top I literally mean the first row) of the result set. I do not want to display these latest posts on the second or third pages, but they change the size of the result set, so I cannot simply use LIMIT 10,10 for the 2nd page and so on, as posts would be repeated.
Now, all I know is the last ID of the post that I displayed, say 21 here. So I want to display the posts with IDs 23, 4, 32, 58, 61, 43 (refer to the result set above). Do I load all the rows without a LIMIT clause and display the 10 IDs occurring after ID 21? For that I would have to iterate over thousands of useless rows. But I cannot use a LIMIT clause for the 2nd, 3rd... pages, that is for sure. Also, the IDs are jumbled, so I definitely cannot use WHERE ID>.... So, where do we go from here?
I'm not sure if I've understood your question correctly, but here's how I think I would do it:
Add a timestamp column to your table, let's call it date_added
When displaying the first page, use your query as-is (with LIMIT 10) and hang on to the timestamp of the most recent record; let's call it last_date_added.
For the 2nd, 3rd and subsequent pages, modify your query to filter out all records with date_added > last_date_added, and use LIMIT 10, 10, LIMIT 20, 10, LIMIT 30, 10 and so on.
This will have the effect of freezing your resultset in time, and resetting it every time the first page is accessed.
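For example, a minimal sketch of the 2nd-page query under those assumptions (date_added and the remembered last_date_added value are the hypothetical additions described above):
SELECT *
FROM `TABLE`
WHERE `COL1` IS NOT NULL
  AND `COL2` IS NULL
  AND date_added <= :last_date_added   -- freeze the result set in time
ORDER BY `COL1`
LIMIT 10, 10;                          -- page 2; LIMIT 20,10 for page 3, and so on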
Notes:
Depending on the ordering of your resultset, you might need a separate query to obtain the last_date_added. Alternatively, you could just cut off at the current time, i.e. the time when the first page was accessed.
If your IDs are sequential, you could use the same trick with the ID.
Hmm..
I thought for a while and came up with two solutions:
Store the IDs of the posts already displayed and query WHERE ID NOT IN (id1, id2, ...). But that would cost you extra memory, and if the user loads 100 pages and the IDs are in the hundreds of thousands, a single GET request would not be able to handle it, at least not in all browsers. A POST request could be used instead.
Alter the way you display posts from COL1. I don't know if this would be a good way for you, but it can save you bandwidth and make your code cleaner; it may also simply be a better way. I would suggest this: SELECT * FROM TABLE WHERE COL1 IS NOT NULL AND COL2 IS NULL AND Id>.. ORDER BY ID DESC LIMIT 10,10. This can change the way you display your posts by leaps and bounds. But, as you said in your comments that you check whether a post meets a criterion and change COL1 from NULL to the current timestamp, I guess that the newer the post, the higher up you want to display it. It's just an idea.
I assume new posts will be added with a higher ID than the current max ID, right? So couldn't you just run your query and grab the current max ID? Then, when you query for page 2, do the same query but with ID < max_id. This should give you the same result set as your page-1 query, because any new rows will have ID > max_id. Hope that helps.
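As a rough sketch (the :max_id placeholder is whatever the application remembered from the first page; use <= instead of < if the newest existing row should stay eligible for later pages):
SELECT *
FROM `TABLE`
WHERE `COL1` IS NOT NULL
  AND `COL2` IS NULL
  AND `ID` < :max_id   -- exclude anything added after page 1 was built
ORDER BY `COL1`
LIMIT 10, 10;          -- page 2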
How about?
ORDER BY `COL1`,`ID`;
This would always put IDs in order. This will let you use:
LIMIT 10,10
for your second page.
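Combined with the original query, that suggestion would look roughly like this (a sketch; it still relies on the data not shifting between page loads):
SELECT *
FROM `TABLE`
WHERE `COL1` IS NOT NULL AND `COL2` IS NULL
ORDER BY `COL1`, `ID`
LIMIT 10, 10;   -- second page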
I have a MySQL table
content_votes_tmp
+------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+------------------+------+-----+---------+----------------+
| up | int(11) | NO | MUL | 0 | |
| down | int(11) | NO | | 0 | |
| ip | int(10) unsigned | NO | | NULL | |
| content | int(11) | NO | | NULL | |
| datetime | datetime | NO | | NULL | |
| is_updated | tinyint(2) | NO | | 0 | |
| record_num | int(11) | NO | PRI | NULL | auto_increment |
+------------+------------------+------+-----+---------+----------------+
Surfers can vote a post (i.e. content) up or down. A record gets inserted every time a vote is given, just like a rating, into the table along with other data like the IP and the content ID.
Now I am trying to create a cron-job script in PHP which will SUM(up) and SUM(down) the votes,
like this,
mysqli_query($con, "SELECT SUM(up) as up_count, SUM(down) as down_count, content FROM `content_votes_tmp` WHERE is_updated = 0 GROUP by content")
and then, by using a while loop in PHP, I can update the main table for the specific content ID,
but I would like to mark the records which were part of the SUM as updated, i.e. SET is_updated = 1, so the same values won't get summed again and again.
How can I achieve this with a MySQL query, working on the same data set while records keep getting inserted into the table every second/millisecond?
Another way I can think of achieving this is to fetch all the non-updated records, do the SUM in PHP, and then update every record.
The simplest way would probably be a temporary table. Create one with the record_num values you want to select from;
CREATE TEMPORARY TABLE temp_table AS
SELECT record_num FROM `content_votes_tmp` WHERE is_updated = 0;
Then do your calculation using the temp table;
SELECT SUM(up) as up_count, SUM(down) as down_count, content
FROM `content_votes_tmp`
WHERE record_num IN (SELECT record_num FROM temp_table)
GROUP by content
Once you've received your result, you can set is_updated on the values you just calculated over;
UPDATE `content_votes_tmp`
SET is_updated = 1
WHERE record_num IN (SELECT record_num FROM temp_table)
If you want to reuse the connection to do the same thing again, you'll need to drop the temporary table before creating it again; but if you just want to do this a single time in a page, it will disappear automatically when the database connection is closed at the end of the page.
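That drop, if you need it, is simply:
DROP TEMPORARY TABLE IF EXISTS temp_table;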
We are implementing a system that analyses books. The system is written in PHP, and for each book loops through the words and analyses each of them, setting certain flags (that translate to database fields) from various regular expressions and other tests.
This results in a matches table, similar to the example below:
+------------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------------------+--------------+------+-----+---------+----------------+
| id | bigint(20) | NO | PRI | NULL | auto_increment |
| regex | varchar(250) | YES | | NULL | |
| description | varchar(250) | NO | | NULL | |
| phonic_description | varchar(255) | NO | | NULL | |
| is_high_frequency | tinyint(1) | NO | | NULL | |
| is_readable | tinyint(1) | NO | | NULL | |
| book_id | bigint(20) | YES | | NULL | |
| matched_regex | varchar(255) | YES | | NULL | |
| [...] | | | | | |
+------------------------+--------------+------+-----+---------+----------------+
Most of the omitted fields are tinyint, either 0 or 1. There are currently 25 fields in the matches table.
There are ~2,000,000 rows in the matches table, the output of analyzing ~500 books.
Currently, there is a "reports" area of the site which queries the matches table like this:
SELECT COUNT(*)
FROM matches
WHERE is_readable = 1
AND other_flag = 0
AND another_flag = 1
However, at present it takes over a minute to fetch the main index report as each query takes about 0.7 seconds. I am caching this at a query level, but it still takes too long for the initial page load.
As I am not very experienced in how to manage datasets such as this, can anyone advise me of a better way to store or query this data? Are there any optimisations I can use with MySQL to improve the performance of these COUNTs, or am I better off using another database or data structure?
We are currently using MySQL with MyISAM tables and a VPS for this, so switching to a new database system altogether isn't out of the question.
You need to use indexes; create them on the columns you use in a WHERE clause most frequently.
ALTER TABLE `matches` ADD INDEX ( `is_readable` )
etc..
You can also create indexes based on multiple columns; if you're doing the same type of query over and over, it's useful. phpMyAdmin has the index option on the structure page of the table, at the bottom.
Add a multi-column index to this table, as you are selecting by more than one field. The index below should help a lot. This type of index is very good for boolean/int columns. For indexes on varchar values, read more here: http://dev.mysql.com/doc/refman/5.0/en/create-index.html
ALTER TABLE `matches` ADD INDEX ( `is_readable`, `other_flag`, `another_flag` )
One more thing: check your queries by using EXPLAIN {YOUR WHOLE SQL STATEMENT} to see which index is used by the DB. EXPLAIN works on the SELECT itself (not on the ALTER TABLE), so in this example you should run:
EXPLAIN SELECT COUNT(*)
FROM matches
WHERE is_readable = 1
  AND other_flag = 0
  AND another_flag = 1;
More info on EXPLAIN: http://dev.mysql.com/doc/refman/5.0/en/explain.html