I have a table where I log member activity. There are 1,486,044 records in it.
SELECT * FROM `user_log` WHERE user = '1554143' order by id desc
However, this query takes 5 seconds. What do you recommend?
Table definition below:
CREATE TABLE IF NOT EXISTS `user_log` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user` int(11) NOT NULL,
`operation_detail` varchar(100) NOT NULL,
`ip_adress` varchar(50) NOT NULL,
`l_date` datetime NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
COMMIT;
For this query:
SELECT * FROM `user_log` WHERE user = 1554143 order by id desc
You want an index on (user, id desc).
Note that I removed the single quotes around the filtering value for user, since this column is numeric. This does not necessarily speed things up, but it is cleaner.
Also: select * is not good practice, and not good for performance. You should enumerate the columns you want in the result set (if you don't need them all, do not select them all). If you do want all the columns, then since your table has only a few, you might try a covering index on all 5 of them: (user, id desc, operation_detail, ip_adress, l_date).
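A hedged sketch of those indexes (index names are illustrative; note that before MySQL 8.0 a DESC clause in an index definition is parsed but ignored, and an ascending index can be scanned in reverse for the ORDER BY id DESC anyway):
ALTER TABLE user_log ADD INDEX idx_user_id (user, id);
-- or, to cover all five columns so the query is satisfied from the index alone:
ALTER TABLE user_log ADD INDEX idx_user_covering (user, id, operation_detail, ip_adress, l_date);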
In addition to the option of creating an index on (user, id), which has already been mentioned, a likely better option is to convert the table to InnoDB and create an index only on (user).
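A sketch of that option (the index name is illustrative). Since InnoDB secondary indexes carry the clustered primary key, an index on (user) alone effectively behaves like (user, id), so the ORDER BY id DESC can still be satisfied from the index:
ALTER TABLE user_log ENGINE=InnoDB;
ALTER TABLE user_log ADD INDEX idx_user (user);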
Related
I have a MySQL (5.6.26) database with a large amount of data, and I have a problem with a COUNT select on a table join.
This query takes about 23 seconds to execute:
SELECT COUNT(0) FROM user
LEFT JOIN blog_user ON blog_user.id_user = user.id
WHERE email IS NOT NULL
AND blog_user.id_blog = 1
Table user is MyISAM and contains user data like id, email, name, etc...
CREATE TABLE `user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`username` varchar(50) DEFAULT NULL,
`email` varchar(100) DEFAULT '',
`hash` varchar(100) DEFAULT NULL,
`last_login` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`created` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`id`),
UNIQUE KEY `id` (`id`) USING BTREE,
UNIQUE KEY `email` (`email`) USING BTREE,
UNIQUE KEY `hash` (`hash`) USING BTREE,
FULLTEXT KEY `email_full_text` (`email`)
) ENGINE=MyISAM AUTO_INCREMENT=5728203 DEFAULT CHARSET=utf8
Table blog_user is InnoDB and contains only id, id_user and id_blog (user can have access to more than one blog). id is PRIMARY KEY and there are indexes on id_blog, id_user and id_blog-id_user.
CREATE TABLE `blog_user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`id_blog` int(11) NOT NULL DEFAULT '0',
`id_user` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `id_blog_user` (`id_blog`,`id_user`) USING BTREE,
KEY `id_user` (`id_user`) USING BTREE,
KEY `id_blog` (`id_blog`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=5250695 DEFAULT CHARSET=utf8
I deleted all other tables and there is no other connection to MySQL server (testing environment).
What I've found so far:
When I delete some columns from user table, duration of query is shorter (like 2 seconds per deleted column)
When I delete all columns from user table (except id and email), duration of query is 0.6 seconds.
When I change blog_user table also to MyISAM, duration of query is 46 seconds.
When I change user table to InnoDB, duration of query is 0.1 seconds.
The question is why is MyISAM so slow executing the command?
First, some comments on your query (after fixing it up a bit):
SELECT COUNT(*)
FROM user u LEFT JOIN
blog_user bu
ON bu.id_user = u.id
WHERE u.email IS NOT NULL AND bu.id_blog = 1;
Table aliases make a query easier both to write and to read. More importantly, you have a LEFT JOIN, but your WHERE clause is turning it into an INNER JOIN. So, write it that way:
SELECT COUNT(*)
FROM user u INNER JOIN
blog_user bu
ON bu.id_user = u.id
WHERE u.email IS NOT NULL AND bu.id_blog = 1;
The difference is important because it affects choices that the optimizer can make.
Next, indexes will help this query. I am guessing that blog_user(id_blog, id_user) and user(id, email) are the best indexes.
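As a sketch (the index name is illustrative; blog_user already has the UNIQUE KEY id_blog_user on (id_blog, id_user), which serves the first suggestion):
ALTER TABLE user ADD INDEX id_email (id, email);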
The reason the number of columns affects your original query is that it is doing a lot of I/O. The fewer the columns, the fewer pages are needed to store the records -- and the faster the query runs. Proper indexes should work better and more consistently.
To answer the real question (why is MyISAM slower than InnoDB), I can't give an authoritative answer.
But it is certainly related to one of the more important differences between the two storage engines: InnoDB supports foreign keys and MyISAM doesn't. Foreign keys are important for joining tables.
I don't know if defining a foreign key constraint will improve speed further, but it will certainly guarantee data consistency.
Another note: you observe that the time decreases as you delete columns. This indicates that the query requires a full table scan, which can be avoided by creating an index on the email column. user.id and blog_user.id_user hopefully already have an index; if they don't, that is an error. Columns that participate in a foreign key, explicit or not, must always have an index.
This is long after the event, so it may be of little use to the OP, and all the foregoing suggestions for speeding up the query are entirely appropriate, but I wonder why no one has remarked on the output of EXPLAIN: specifically, why the index on email was chosen, and how that relates to the definition of the email column in the user table.
The optimizer selected the index on the email column, presumably because it appears in the WHERE clause. key_len for this index is comparatively long, and this is a reasonably large table given the AUTO_INCREMENT value, so the memory requirements for this index would be considerably greater than if the id column had been chosen (303 bytes for email against 4 bytes for id). The email column is nullable but has a default of the empty string, so unless the application explicitly sets a NULL you are not going to find any NULLs in this column anyway; neither will you find more than one record with the default, given the UNIQUE constraint. The column's DEFAULT and its UNIQUE constraint appear to be completely at odds with each other.
Given the above, and the fact that we only want the count, I wonder whether the email part of the WHERE clause serves any purpose other than slowing the query down as each value is compared to NULL. Without it, the optimizer would probably pick the primary key and do a much better job. Better yet would be a query that ignored the user table entirely and took the count from the covering index on blog_user that Gordon Linoff highlighted.
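If the email filter really is redundant, a sketch of that last idea, answered entirely from blog_user's (id_blog, id_user) unique index:
SELECT COUNT(*) FROM blog_user WHERE id_blog = 1;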
There's another indexing issue here worth mentioning:
On the user table
UNIQUE KEY `id` (`id`) USING BTREE,
is redundant since id is the PRIMARY KEY and therefore UNIQUE by definition.
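It can simply be dropped, e.g.:
ALTER TABLE user DROP INDEX id;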
To answer your last question,
The question is why is MyISAM so slow executing the command?
MyISAM caches only its indexes (in the key buffer) and relies on the operating system's cache for data pages, so it is largely dependent on the speed of your hard drive. InnoDB caches both data and indexes in its buffer pool, so once the data has been read it works at the speed of RAM. The first time a query runs it may be loading data from disk; subsequent runs avoid the hard drive until the pages age out of RAM.
I am working on a CMS (largely as a learning exercise) for a private website. At the moment I have three tables: one for articles, one for tags, and a joining table so that each article can have multiple tags.
The table I am having issues with consists of three columns -
article_tags: id (auto_increment), article_id, tag_id
My problem stems from the fact that an article can appear any number of times, and a tag can also appear any number of times, but a given combination of the two should only appear once - that is, each article should only have one reference to any single tag. Currently it is possible to INSERT "duplicate" rows where the id is different but the combination of article_id and tag_id is the same:
id   article_id   tag_id
1    1            1
2    1            2
3    2            1
4    1            1   <- this is wrong
I could check in PHP code for a record that contains this combination, but I'd prefer to do it in SQL if possible (if it is not possible, or it is undesirable, I will do it in PHP). Because the id differs on each row and there is no unique constraint across the two columns, INSERT IGNORE and ON DUPLICATE KEY UPDATE do not work.
I'm quite new to MySQL, so if I'm doing something silly please point me in the right direction.
Thanks
You should review your table definition.
You can (from best to worst):
Add a composite primary key on (article_id, tag_id) and remove the auto_increment column (the previous primary key)
Add a UNIQUE index on (article_id, tag_id) and keep your auto_increment primary key
Select distinct pairs in PHP - SELECT DISTINCT article_id, tag_id FROM article_tags - without changing anything in your table
Right now, your table is defined as something like this:
CREATE TABLE IF NOT EXISTS `article_tags` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`article_id` int(11) NOT NULL,
`tag_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
The best solution (option 1) would be to remove your current (auto_increment) primary key and add a primary key (composite) on columns article_id and tag_id:
CREATE TABLE IF NOT EXISTS `article_tags` (
`article_id` int(11) NOT NULL,
`tag_id` int(11) NOT NULL,
PRIMARY KEY (`article_id`,`tag_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
But (option 2) if you absolutely want to keep your auto_increment primary key, add an index (unique) on your columns:
CREATE TABLE IF NOT EXISTS `article_tags` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`article_id` int(11) NOT NULL,
`tag_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `article_id` (`article_id`,`tag_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
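With either of those keys in place, the duplicate-handling statements mentioned in the question start working; for example (values illustrative):
INSERT INTO article_tags (article_id, tag_id) VALUES (1, 1);
INSERT IGNORE INTO article_tags (article_id, tag_id) VALUES (1, 1); -- silently skipped as a duplicate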
Anyway, if you don't want to change your table definition, you could always use DISTINCT in your php query:
SELECT DISTINCT article_id, tag_id FROM article_tags
Such many-to-many relationship tables, sometimes called join tables, often have just two columns, and have a primary key that's a composite of the two.
article_id
tag_id
pk = (article_id, tag_id)
If you change the table definition in that way, you will solve the problem for good.
How should you order the columns in composite keys? It depends on how your application will look up items in the join table. If you'll always start with the article_id and look up the tag_id, then you put the article_id first in the key. The DBMS can random-access values for the first column in the key, but has to scan the index to find values in second (or subsequent) columns in the key.
You may want to create a second index on the table, (tag_id, article_id). This will allow fast lookups based on the tag_id. You may ask, "why bother to put both columns in the index?" That's to make the index into a covering index. In a covering index, the desired value can be retrieved directly from the index. For example, with a covering index,
SELECT article_id FROM article_tag WHERE tag_id = 12345
(or a JOIN that uses similar lookup logic) only needs to access the index on the disk drive to get the result. If you don't have a covering index, the query needs to jump from the index to the data table, which is an extra step.
Join tables typically have very short rows (a couple of integers) so the duplicated data for a couple of covering indexes (the primary key and the extra one) isn't a big disk-space hog.
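A sketch of that second, covering index on the join table (the index name is illustrative):
ALTER TABLE article_tags ADD INDEX tag_article (tag_id, article_id);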
I need to determine which table data comes from for a news feed. The feed must say something like "Person has uploaded a video" or "Person has updated their bio", so I need to determine where each piece of data came from, as different types of data are in different tables, obviously. I am hoping this can be done in SQL, but probably not, so PHP is the fallback. I have no idea how to do this, so I just need pointing in the right direction.
I'll briefly describe the database as I don't have time to make a diagram.
1. There is a table titled members with all basic info such as email, password and ID. The ID is the primary key.
2. All other tables have foreign keys for the ID linking to the ID in the members table.
Other tables include: tracks, status, pics, videos. All pretty self-explanatory from there.
I need to determine somehow which table the updated data comes from, so I can then tell the user what so-and-so has done. Preferably I would want only one SQL statement for the whole feed, so that all the tables are joined and ordered by timestamp, making everything much simpler for me. Hopefully I can do both, but as I said I'm really not sure.
A basic outline of the statement (the real one will be longer, but I have simplified):
SELECT N.article, N.ID, A.ID, A.name, A.url, N.timestamp
FROM news N
LEFT JOIN artists A ON N.ID = A.ID
WHERE N.ID = A.ID
ORDER BY N.timestamp DESC
LIMIT 10
Members table:
CREATE TABLE `members` (
`ID` int(111) NOT NULL AUTO_INCREMENT,
`email` varchar(100) COLLATE latin1_general_ci NOT NULL,
`password` varchar(100) COLLATE latin1_general_ci NOT NULL,
`FNAME` varchar(100) COLLATE latin1_general_ci NOT NULL,
`SURNAME` varchar(100) COLLATE latin1_general_ci NOT NULL,
`timestamp` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00' ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`ID`),
UNIQUE KEY `email` (`email`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci
Tracks table (all other tables are pretty much the same):
CREATE TABLE `tracks` (
`ID` int(11) NOT NULL,
`url` varchar(200) COLLATE latin1_general_ci NOT NULL,
`name` varchar(100) COLLATE latin1_general_ci NOT NULL,
`timestamp` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00' ON UPDATE CURRENT_TIMESTAMP,
`track_ID` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`track_ID`),
UNIQUE KEY `url` (`url`),
UNIQUE KEY `track_ID` (`track_ID`),
KEY `ID` (`ID`),
CONSTRAINT `tracks_ibfk_1` FOREIGN KEY (`ID`) REFERENCES `members` (`ID`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci
Previously I tried using a MySQL query for each table, putting everything into an array, and echoing it out. That seemed long and tiresome and I had no luck with it; I have since deleted all that code, as it was a week or so ago.
Please do not feel you have to go into depth with this; just point me in the right direction.
ADDITION:
Here is the SQL query I have made for a trigger, as was suggested. I'm not sure what is wrong, as I have never used a trigger before. When inserting something into tracks, this error comes up:
#1054 - Unknown column 'test' in 'field list'
The values in the query are just for testing at the moment
delimiter $$
CREATE
TRIGGER tracks_event AFTER INSERT
ON tracks FOR EACH ROW
BEGIN
INSERT into events(ID, action)
VALUES (3, test);
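-- error cause: test is unquoted, so MySQL reads it as a column name; a string literal needs quotes: VALUES (3, 'test');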
END$$
delimiter ;
UPDATE!
I have now created a table called events as suggested and used triggers to update it AFTER an insert in one of several tables.
Here is the query I have tried, but it is wrong. The query needs to get the info referenced in the events table from all the other tables and order it by timestamp.
SELECT T.url, E.ID, T.ID, E.action, T.name, T.timestamp
FROM tracks T
LEFT JOIN events E ON T.ID = E.ID
WHERE T.ID = E.ID
ORDER BY T.timestamp DESC
In that query I have only included the events and tracks tables for simplicity, as the problem is still there; with many more tables to come, the problem will only worsen.
It's hard to describe the problem, but basically, because there is an ID in every table and one ID can perform several actions, an action can be shown with the wrong outcome - in this case, the wrong url.
I will explain what's in the events table and the tracks table and give the outcome to further explain.
In the events table (ID, action):
4  has uploaded a track.
3  has some news.
4  has become an NBS artist.
In the tracks table (ID, url, name, timestamp, track_ID):
2  uploads/abc.wav  Cannonballs & Stones  2012-08-20 23:59:59  1
3  uploads/19c9aa51c821952c81be46ca9b2e9056.mp3  test  2012-08-31 23:59:59  2
4  uploads/2b412dd197d464fedcecb1e244e18faf.mp3  testing  2012-08-31 00:32:56  3
4  111  111111  0000-00-00 00:00:00  111111
Outcome of query (url, E.ID, T.ID, action, name, timestamp):
uploads/19c9aa51c821952c81be46ca9b2e9056.mp3  3  3  has some news.  test  2012-08-31 23:59:59
uploads/2b412dd197d464fedcecb1e244e18faf.mp3  4  4  has uploaded a track.  testing  2012-08-31 00:32:56
uploads/2b412dd197d464fedcecb1e244e18faf.mp3  4  4  has become an NBS artist.  testing  2012-08-31 00:32:56
111  4  4  has become an NBS artist.  111111  0000-00-00 00:00:00
111  4  4  has uploaded a track.  111111  0000-00-00 00:00:00
As you can see, the query gives unwanted results: each action for an ID is joined to every url for that ID, so a url can be shown more than once and with the wrong action. Since only the tracks table is in that query, the only action I would want showing is 'has uploaded a track.'
It's hard to provide the statement you want without the full details of your schema. For example, the question refers to a news table and an artists table, but doesn't provide their schemas, or indicate how the statement that contains those references relates to any of the other tables mentioned in the question.
Still, I think what you want can be done entirely in MySQL, without any fun PHP tricks, especially if there are common fields in each of the various tables.
But first: this might not be the answer you're really wanting, but using triggers on your various tables to update an "events feed" table is likely the best solution. i.e., when an insert or update happens on the "status" table, have a trigger on the status table that inserts into the "events feed" table the ID of the person, and their type of action. You could have a separate insert and update trigger to indicate different events for the same data type.
Then it'd be super-easy to have an events feed, because you're just selecting straight from that events feed table.
Check out the create trigger syntax.
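For reference, a minimal sketch of such a trigger, assuming the events(ID, action) table from the question (note the quoted string literal, which is what the #1054 error above was about):
delimiter $$
CREATE TRIGGER tracks_event AFTER INSERT
ON tracks FOR EACH ROW
BEGIN
    -- NEW.ID is the member ID on the row just inserted into tracks
    INSERT INTO events (ID, action)
    VALUES (NEW.ID, 'has uploaded a track.');
END$$
delimiter ;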
That said, I think you might have a look at the CASE and UNION keywords.
You can then construct a query that grabs data from all tables and outputs strings indicating something. You could then turn that query into a view, and use that as an "events feed" table to select directly from.
Say you have a list of members (which you do), and the various tables that contain actions from those members (i.e., tracks, status, pics, videos), which all have a key pointing back to your members table. You don't need to select from members to generate a list of activity, then; you can just UNION together the tables that have certain events.
(Note: each SELECT in a UNION needs its own FROM clause, a per-branch ORDER BY/LIMIT must be wrapped in parentheses, and table is a reserved word, so the alias below is source_table.)
SELECT
    events.member_id
    , events.table_id
    , events.source_table
    , events.action
    , events.when_it_happened
    , CASE
        WHEN events.source_table = 'tracks' THEN 'Did something with tracks'
        WHEN events.source_table = 'status' THEN 'Did something with status'
      END
      AS feed_description
FROM (
    (SELECT
        tracks.ID AS member_id
        , tracks.track_ID AS table_id
        , 'tracks' AS source_table
        , CONCAT(tracks.url, ' ', tracks.name) AS action
        , tracks.timestamp AS when_it_happened
    FROM tracks
    ORDER BY tracks.timestamp DESC
    LIMIT 10)
    UNION
    (SELECT
        status.ID AS member_id
        , status.status_id AS table_id
        , 'status' AS source_table
        , status.value AS action
        , status.timestamp AS when_it_happened
    FROM status
    ORDER BY status.timestamp DESC
    LIMIT 10)
    UNION
    ...
) events
ORDER BY events.when_it_happened DESC
I still think you'd be better off creating a feed table built by triggers, because it'll perform a lot better if you're querying for the feed more often than generating events.
I have a table whose structure is as follows:
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`ttype` int(1) DEFAULT '19',
`title` mediumtext,
`tcode` char(2) DEFAULT NULL,
`tdate` int(11) DEFAULT NULL,
`visit` int(11) DEFAULT '0',
PRIMARY KEY (`id`),
KEY `tcode` (`tcode`),
KEY `ttype` (`ttype`),
KEY `tdate` (`tdate`)
) ENGINE=MyISAM
I have two queries on x.php, as follows:
SELECT * FROM table_name WHERE id='10' LIMIT 1
UPDATE table_name SET visit=visit+1 WHERE id='10' LIMIT 1
My first question is whether updating 'visit' in the table causes reindexing and degrades performance. Note that 'visit' is not a key.
A second method might be to create a new table containing 'visit', like this:
'newid' int(10) unsigned NOT NULL ,
`visit` int(11) DEFAULT '0',
PRIMARY KEY (`newid`)
) ENGINE=MyISAM
Then selecting with:
SELECT w.*,q.visit FROM table_name w LEFT JOIN table_name2 q
ON (w.id=q.newid) WHERE w.id='10' LIMIT 1
UPDATE table_name2 SET visit=visit+1 WHERE newid='10' LIMIT 1
Is the second method preferable to the first? Which one would have better performance and be quicker?
Note: all SQL queries are run from PHP (the mysql_query command). Also, I need the first table's indexes for other queries on other pages.
I'd say your first method is the best, and simplest. Updating visit will be very fast and no updating of indexes needs to be performed.
I'd prefer the first, and have used it for similar things in the past with no problems. You can remove the LIMIT clause: since id is your primary key, you will never get more than one result (although the query optimizer probably does this for you anyway).
There was a question someone asked earlier to which I responded with a solution you may want to consider as well. When you keep 'count' columns you lose the ability to mine the data later. With a transaction table, not only can you get view counts, you can also query over date ranges, etc. Sure, you will carry the weight of storing potentially hundreds of thousands of rows, but the table is narrow and the indexes are numeric.
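A rough sketch of such a transaction table (all names here are hypothetical):
CREATE TABLE `visit_log` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `item_id` int(10) unsigned NOT NULL,
  `visited_at` datetime NOT NULL,
  PRIMARY KEY (`id`),
  KEY `item_date` (`item_id`,`visited_at`)
) ENGINE=InnoDB;

-- total views for one item:
SELECT COUNT(*) FROM visit_log WHERE item_id = 10;
-- views in a date range:
SELECT COUNT(*) FROM visit_log WHERE item_id = 10
  AND visited_at >= '2012-01-01' AND visited_at < '2012-02-01';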
I cannot see a solution on the database side... Perhaps you can do it in PHP: if the user has a PHP session, you could, for example, only update the visit count every 10th time, like:
<?php
session_start();
// initialize the per-session counter on first use
if (!isset($_SESSION['count'])) {
    $_SESSION['count'] = 0;
}
$_SESSION['count'] += 1;
if ($_SESSION['count'] >= 10) {
    // write 10 visits to the database in a single update
    do_the_function_that_updates_the_count_plus_10();
    $_SESSION['count'] = 0;
}
Of course you lose some counts this way, but perhaps that is not so important?
This post is a follow-up of this answered question: Best method for storing a list of user IDs.
I took cletus and Mehrdad Afshari's epic advice of using a normalized database approach. Are the following tables properly set up for proper optimization? I'm kind of new to MySQL efficiency, so I want to make sure this is effective.
Also, when it comes to finding the average rating for a game and the total number of votes, should I use the following two queries, respectively?
SELECT avg(vote) FROM votes WHERE uid = $uid AND gid = $gid;
SELECT count(uid) FROM votes WHERE uid = $uid AND gid = $gid;
CREATE TABLE IF NOT EXISTS `games` (
`id` int(8) NOT NULL auto_increment,
`title` varchar(50) NOT NULL,
PRIMARY KEY (`id`)
) AUTO_INCREMENT=1 ;
CREATE TABLE IF NOT EXISTS `users` (
`id` int(8) NOT NULL auto_increment,
`username` varchar(20) NOT NULL,
PRIMARY KEY (`id`)
) AUTO_INCREMENT=1 ;
CREATE TABLE IF NOT EXISTS `votes` (
`uid` int(8) NOT NULL,
`gid` int(8) NOT NULL,
`vote` int(1) NOT NULL,
KEY `uid` (`uid`,`gid`)
) ;
average votes for a game: SELECT avg(vote) FROM votes WHERE gid = $gid;
number of votes for a game: SELECT count(uid) FROM votes WHERE gid = $gid;
As you will not have any user or game ids smaller than 0, you could make them unsigned integers (int(8) unsigned NOT NULL).
If you want to enforce that a user can only make a single vote for a game, then create a primary key over uid and gid in the votes table instead of just a normal index.
CREATE TABLE IF NOT EXISTS `votes` (
`uid` int(8) unsigned NOT NULL,
`gid` int(8) unsigned NOT NULL,
`vote` int(1) NOT NULL,
PRIMARY KEY (`gid`, `uid`)
) ;
The order of the primary key's fields (first gid, then uid) is important so the index is sorted by gid first. That makes the index especially useful for selects with a given gid. If you want to select all the votes a given user has made then add another index with just uid.
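A sketch of that extra index (the name is illustrative):
ALTER TABLE votes ADD INDEX idx_votes_uid (uid);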
I would recommend InnoDB for storage engine because especially in high load settings the table locks will kill your performance. For read performance you can implement a caching system using APC, Memcached or others.
Looks good.
I would have used users_id & games_id instead of gid and uid, which sound like 'global id' and 'unique id'.
Whatever you end up doing, make sure you test it with a large data-set (even if you don't plan on having a huge number of users)
Write a script that generates 100,000 games, 50,000 users and a million votes. That may be slightly excessive, but if your queries don't take hours with that number of rows, it'll never be an issue.
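A minimal seeding sketch in MySQL (the procedure name and counts are arbitrary, and it assumes games and users have been filled the same way first):
DELIMITER $$
CREATE PROCEDURE seed_votes(IN n INT)
BEGIN
  DECLARE i INT DEFAULT 0;
  WHILE i < n DO
    -- INSERT IGNORE skips collisions on the (gid, uid) primary key
    INSERT IGNORE INTO votes (uid, gid, vote)
    VALUES (FLOOR(1 + RAND() * 50000), FLOOR(1 + RAND() * 100000), FLOOR(1 + RAND() * 5));
    SET i = i + 1;
  END WHILE;
END$$
DELIMITER ;

CALL seed_votes(1000000);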
Looks good so far. Don't forget indices and foreign keys. In my experience most issues don't arise from not-so-well-thought-out designs but from the lack of indices and foreign keys.
Also, regarding the storage engine selection, I have yet to see a reason (in a reasonably complex/sized app) for not using InnoDB, and not just because of transactional semantics.
You might want to add a voted_on (DATETIME) column too. That way you could, say, see a game's trend over a certain timespan, or, if vote spam ever happened, delete the unwanted votes accurately.