MySQL database row COUNT optimization - php

I have a MySQL (5.6.26) database with a large amount of data, and I have a problem with a COUNT select on a table join.
This query takes about 23 seconds to execute:
SELECT COUNT(0) FROM user
LEFT JOIN blog_user ON blog_user.id_user = user.id
WHERE email IS NOT NULL
AND blog_user.id_blog = 1
Table user is MyISAM and contains user data like id, email, name, etc...
CREATE TABLE `user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`username` varchar(50) DEFAULT NULL,
`email` varchar(100) DEFAULT '',
`hash` varchar(100) DEFAULT NULL,
`last_login` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`created` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`id`),
UNIQUE KEY `id` (`id`) USING BTREE,
UNIQUE KEY `email` (`email`) USING BTREE,
UNIQUE KEY `hash` (`hash`) USING BTREE,
FULLTEXT KEY `email_full_text` (`email`)
) ENGINE=MyISAM AUTO_INCREMENT=5728203 DEFAULT CHARSET=utf8
Table blog_user is InnoDB and contains only id, id_user and id_blog (user can have access to more than one blog). id is PRIMARY KEY and there are indexes on id_blog, id_user and id_blog-id_user.
CREATE TABLE `blog_user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`id_blog` int(11) NOT NULL DEFAULT '0',
`id_user` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `id_blog_user` (`id_blog`,`id_user`) USING BTREE,
KEY `id_user` (`id_user`) USING BTREE,
KEY `id_blog` (`id_blog`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=5250695 DEFAULT CHARSET=utf8
I deleted all other tables and there is no other connection to MySQL server (testing environment).
What I've found so far:
When I delete some columns from the user table, the query duration gets shorter (roughly 2 seconds per deleted column).
When I delete all columns from the user table (except id and email), the query duration is 0.6 seconds.
When I also change the blog_user table to MyISAM, the query duration is 46 seconds.
When I change the user table to InnoDB, the query duration is 0.1 seconds.
The question is why is MyISAM so slow executing the command?

First, some comments on your query (after fixing it up a bit):
SELECT COUNT(*)
FROM user u LEFT JOIN
blog_user bu
ON bu.id_user = u.id
WHERE u.email IS NOT NULL AND bu.id_blog = 1;
Table aliases make a query easier to both write and read. More importantly, you have a LEFT JOIN, but your WHERE clause turns it into an INNER JOIN. So write it that way:
SELECT COUNT(*)
FROM user u INNER JOIN
blog_user bu
ON bu.id_user = u.id
WHERE u.email IS NOT NULL AND bu.id_blog = 1;
The difference is important because it affects choices that the optimizer can make.
Next, indexes will help this query. I am guessing that blog_user(id_blog, id_user) and user(id, email) are the best indexes.
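For what it's worth, the blog_user table above already carries a UNIQUE KEY on (id_blog, id_user), so only the user side is missing. A minimal sketch, with an arbitrary index name:
-- Lets the join probe by id and check email IS NOT NULL from the index
-- alone, without touching the full data rows.
ALTER TABLE user ADD INDEX id_email (id, email);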
The reason the number of columns affects your original query is that it is doing a lot of I/O. The fewer columns, the fewer pages needed to store the records -- and the faster the query runs. Proper indexes should work better and more consistently.

To answer the real question (why MyISAM is slower than InnoDB here), I can't give an authoritative answer.
But it is certainly related to one of the more important differences between the two storage engines: InnoDB supports foreign keys, and MyISAM doesn't. Foreign keys are important for joining tables.
I don't know whether defining a foreign key constraint will improve speed further, but it will certainly guarantee data consistency.
Another note: you observe that the time decreases as you delete columns. This indicates that the query requires a full table scan, which can be avoided by creating an index on the email column. user.id and blog_user.id_user hopefully already have indexes; if they don't, that is an error. Columns that participate in a foreign key, explicit or not, must always have an index.

This comes a long time after the event to be much use to the OP, and all the foregoing suggestions for speeding up the query are entirely appropriate, but I wonder why no one has remarked on the output of EXPLAIN. Specifically, why the index on email was chosen, and how that relates to the definition of the email column in the user table.
The optimizer has selected the index on the email column, presumably because it is included in the WHERE clause. key_len for this index is comparatively long, and it's a reasonably large table given the AUTO_INCREMENT value, so the memory requirements for this index would be considerably greater than if it had chosen the id column (4 bytes against 303). The email column is nullable but has a default of the empty string, so unless the application explicitly sets a NULL you are not going to find any NULLs in this column anyway. Neither will you find more than one record with the default value, given the UNIQUE constraint. The column DEFAULT and the UNIQUE constraint appear to be completely at odds with each other.
Given the above, and the fact that we only want a count from the query, I'd then wonder whether the email part of the WHERE clause serves any purpose other than slowing the query down as each value is compared to NULL. Without it the optimizer would probably pick the primary key and do a much better job. Better yet would be a query that ignored the user table entirely and took the count from the covering index on blog_user that Gordon Linoff highlighted.
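A sketch of that blog_user-only count; it assumes every blog_user row for the blog references an existing user with a non-NULL email (otherwise the totals can differ from the original query):
-- Can be answered entirely from the UNIQUE KEY (id_blog, id_user)
SELECT COUNT(*) FROM blog_user WHERE id_blog = 1;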
There's another indexing issue here worth mentioning:
On the user table
UNIQUE KEY `id` (`id`) USING BTREE,
is redundant since id is the PRIMARY KEY and therefore UNIQUE by definition.
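Dropping the redundant index is a one-liner:
ALTER TABLE user DROP INDEX id;   -- the PRIMARY KEY already enforces uniqueness on id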

To answer your last question,
The question is why is MyISAM so slow executing the command?
MyISAM depends on the speed of your hard drive;
InnoDB, once the data has been read, works at the speed of RAM. The first time the query runs it could be loading data from disk; the second and later runs avoid the hard drive until the data is aged out of RAM.

Related

How to properly index attributes in wordpress database

I am extending a product sales plugin and am trying to understand how wordpress handles database relations. I am building tables on activation using dbDelta. An example of a table schema would be:
$table_schema = [
"CREATE TABLE IF NOT EXISTS `{$wpdb->prefix}plugin_orders` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`people_id` bigint(20) DEFAULT NULL,
`order_id` bigint(20) DEFAULT NULL,
`order_status` varchar(11) DEFAULT NULL,
`order_date` datetime DEFAULT NULL,
`order_total` decimal(13,2) DEFAULT NULL,
`accounting` tinyint(4) DEFAULT '0',
PRIMARY KEY (`id`),
KEY `people_id` (`people_id`),
KEY `order_id` (`order_id`)
) $collate;",
"CREATE TABLE IF NOT EXISTS `{$wpdb->prefix}plugin_order_product` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`order_id` bigint(20) DEFAULT NULL,
`product_id` bigint(20) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `order_id` (`order_id`),
KEY `product_id` (`product_id`)
) $collate;"
];
I see that id in each table is the PRIMARY KEY, but what does declaring the other KEYs actually do? I have read that WordPress uses MyISAM, which doesn't actually build foreign key connections. While these tables may point to other tables that already exist, in this example does declaring KEY order_id (order_id) create a variable of sorts called order_id that any other table can reference? Is this code specifically connecting one table's attributes to another table's attributes (it doesn't appear to be)? After these tables are built, I can inspect them in phpMyAdmin and see that there are indexes assigned but no foreign key constraints. How does this code create tables that point one table at another to build relations?
KEY `foo_bar` (`order_id`)
"KEY" is the same as "INDEX". It specifies that a separate data structure is maintained for the efficient access of the table via the column order_id.
foo_bar is the name of the index. It has no special meaning, and has very few uses. For example, DROP KEY foo_bar; is the way to get rid of the index.
In MyISAM, a "FOREIGN KEY" allowed, but ignored. In InnoDB, it does two things:
Create an index if one is not already provided
Provide a constraint. The default effectively "complain if the other table does not already have the value referenced".
Having an index is important for performance. The index above make this
SELECT ... WHERE order_id = 1234 ...
run in milliseconds, even if there are billions of rows in the table. Without the index, the query would take minutes or hours.
A PRIMARY KEY is a UNIQUE key, which is an INDEX.
UNIQUE(widget) says that only one row in the table can have a particular value of widget.
PRIMARY KEY(id) says that each row is uniquely identified by the column id. InnoDB really wants each table to have a PK.
"id" is a convention (not a requirement) for the name of the PK. It is also INT AUTO_INCREMENT by convention. You may or may not actually ever touch id.
Tables can be related to each other in 3 main ways:
1:1 -- They share the same unique key. This is rarely useful; you may as well have a single table.
1:many -- An "order" has several "items" in it (one-order : many-items). This is usually handled by order_id being a column in the items table.
many:many -- students_classes -- each student is in many classes; each class has many students. This is implemented via a mapping table that has (usually) only two columns: student_id and class_id (no id is needed) and PRIMARY KEY(student_id, class_id) and INDEX(class_id, student_id). Those two indexes make it efficient to go from a known student to their classes, and vice versa.
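A sketch of such a mapping table, with hypothetical names that follow the pattern just described (they are not part of your plugin):
CREATE TABLE student_class (
  student_id INT UNSIGNED NOT NULL,
  class_id   INT UNSIGNED NOT NULL,
  PRIMARY KEY (student_id, class_id),   -- known student -> their classes
  INDEX (class_id, student_id)          -- known class -> its students
) ENGINE=InnoDB;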
Another convention for the PK of a table is to include the table name. (It is clutter to do that for other columns, such as order_status.) I was assuming this convention for student_id and class_id.
But now I am confused by your plugin_orders -- it has both id and order_id. If that table describes "orders", then I would expect order_id to be the PK instead of id.
And, if order_product is a list of all the "products" in each "order", then I would expect you to have the 1:many pattern.
What indexes to have?
PRIMARY KEY to uniquely identify each row -- either id or some column (or combination of columns) that are unique.
Other columns, as needed, for the SELECTs, UPDATEs, and DELETEs that you have. Do not blindly add indexes before having some clues of the queries that might need them.
Indexes sometimes help in sorting:
SELECT ... ORDER BY last_name, first_name;
together with
INDEX(last_name, first_name)
Indexes provide performance; FKs provide integrity checks. Neither is "required"; both are "desirable".
MyISAM is ancient; you should change to InnoDB.
Then do something like
SELECT ...
FROM plugin_orders AS o
JOIN plugin_order_product AS op
ON o.order_id = op.order_id
WHERE ...
In this example, the Optimizer will perform the query something like this:
Look at the WHERE to see which table is best filtered by the conditions there. Declare that to be the first table to work with.
Scan through the first table, using an index if practical.
For each row in the first table, reach into the second table.
Reaching into the second table would probably be done via INDEX(order_id) on the second table. This would make the JOIN fast and efficient.
Both tables have INDEX(order_id), but that is not relevant.
Next example:
SELECT ...
FROM plugin_orders AS o
JOIN plugin_order_product AS op
ON o.order_id = op.order_id
WHERE o.people_id = 123 -- note
Pick o as the first table due to filtering on people_id
use o's INDEX(people_id) to rapidly find the o rows that are relevant.
etc (op is the second table)
Next example:
SELECT ...
FROM plugin_orders AS o
JOIN plugin_order_product AS op
ON o.order_id = op.order_id
WHERE op.product_id = 9887 -- changed again
Pick op as the first table due to filtering on product_id
use op's INDEX(product_id) to rapidly find the op rows that are relevant.
etc (o is the second table this time)

Doctrine / MySQL Slow query even when using indexes

I cleaned the question a little bit because it was getting very big and unreadable.
Running on my localhost.
As you can see in the image below, the query takes 755.15 ms when selecting from the table Job that contains 15000 rows (with the where conditions returning 6650)
The table Company contains 1000 rows.
The table geo__name contains approximately 84,300 rows and is not giving me any problems, so I believe the problem is the database structure or something similar.
The structure of these 2 tables is the following:
Table Job is:
CREATE TABLE `job` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`created_at` datetime NOT NULL,
`updated_at` datetime NOT NULL,
`company_id` int(11) NOT NULL,
`activity_sector_id` int(11) DEFAULT NULL,
`status` int(11) NOT NULL,
`active` datetime NOT NULL,
`contract_type_id` int(11) NOT NULL,
`salary_type_id` int(11) NOT NULL,
`workday_id` int(11) NOT NULL,
`geoname_id` int(11) NOT NULL,
`title` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`minimum_experience` int(11) DEFAULT NULL,
`min_salary` decimal(7,2) DEFAULT NULL,
`max_salary` decimal(7,2) DEFAULT NULL,
`zip_code` int(11) DEFAULT NULL,
`vacancies` int(11) DEFAULT NULL,
`show_salary` tinyint(1) NOT NULL,
PRIMARY KEY (`id`),
KEY `created_at` (`created_at`,`active`,`status`) USING BTREE,
CONSTRAINT `FK_FBD8E0F823F5422B` FOREIGN KEY (`geoname_id`) REFERENCES `geo__name` (`id`),
CONSTRAINT `FK_FBD8E0F8398DEFD0` FOREIGN KEY (`activity_sector_id`) REFERENCES `activity_sector` (`id`),
CONSTRAINT `FK_FBD8E0F85248165F` FOREIGN KEY (`salary_type_id`) REFERENCES `job_salary_type` (`id`),
CONSTRAINT `FK_FBD8E0F8979B1AD6` FOREIGN KEY (`company_id`) REFERENCES `company` (`id`),
CONSTRAINT `FK_FBD8E0F8AB01D695` FOREIGN KEY (`workday_id`) REFERENCES `workday` (`id`),
CONSTRAINT `FK_FBD8E0F8CD1DF15B` FOREIGN KEY (`contract_type_id`) REFERENCES `job_contract_type` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=15001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
The table company is:
CREATE TABLE `company` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`logo` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`created_at` datetime NOT NULL,
`updated_at` datetime NOT NULL,
`website` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`user_id` int(11) NOT NULL,
`phone` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`cifnif` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`type` int(11) NOT NULL,
`subscription_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `UNIQ_4FBF094FA76ED395` (`user_id`),
KEY `IDX_4FBF094F9A1887DC` (`subscription_id`),
KEY `name` (`name`(191)),
CONSTRAINT `FK_4FBF094F9A1887DC` FOREIGN KEY (`subscription_id`) REFERENCES `subscription` (`id`),
CONSTRAINT `FK_4FBF094FA76ED395` FOREIGN KEY (`user_id`) REFERENCES `user` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
The query is the following:
SELECT
j0_.id AS id_0,
j0_.status AS status_1,
j0_.title AS title_2,
j0_.min_salary AS min_salary_3,
j0_.max_salary AS max_salary_4,
c1_.id AS id_5,
c1_.name AS name_6,
c1_.logo AS logo_7,
a2_.id AS id_8,
a2_.name AS name_9,
g3_.id AS id_10,
g3_.name AS name_11,
j4_.id AS id_12,
j4_.name AS name_13,
j5_.id AS id_14,
j5_.name AS name_15,
w6_.id AS id_16,
w6_.name AS name_17
FROM
job j0_
INNER JOIN company c1_ ON j0_.company_id = c1_.id
INNER JOIN activity_sector a2_ ON j0_.activity_sector_id = a2_.id
INNER JOIN geo__name g3_ ON j0_.geoname_id = g3_.id
INNER JOIN job_salary_type j4_ ON j0_.salary_type_id = j4_.id
INNER JOIN job_contract_type j5_ ON j0_.contract_type_id = j5_.id
INNER JOIN workday w6_ ON j0_.workday_id = w6_.id
WHERE
j0_.active >= CURRENT_TIMESTAMP
AND j0_.status = 1
ORDER BY
j0_.created_at DESC
When executing the above query I have these results:
In MYSQL Workbench: 0.578 sec / 0.016 sec
In Symfony profiler: 755.15 ms
The question is: is this query duration normal? If not, how can I improve the speed of the query? It seems too long.
The Symfony debug toolbar if it helps:
As you can see in the below image, I'm only getting the data I really need:
The explain query:
The timeline:
The MySQL server can't handle the load being placed on it. This could be due to resource contention, or because it has not been appropriately tuned, and it could also be a problem with your hard drive.
First, I would start by adding the MySQL keyword STRAIGHT_JOIN, which tells MySQL to query the data in the order I have provided rather than working out the relationships for me. However, with your dataset being so small and already at about half a second, I don't know if that will help as much; on larger datasets I have known it to improve performance SIGNIFICANTLY.
Next, you appear to be getting lookup descriptions based on the PK/FK relationship results. Not seeing the indexes on those tables, I would suggest covering indexes that contain both the key and the description, so the join can get the data straight from the index pages it uses for the JOIN, instead of using the index page, then finding the actual data pages to get the description, and continuing.
Last, your job table with its index on (created_at, active, status) might perform better if the index were ordered as (status, active, created_at).
With your existing index, think of it this way: each day of data is put into a single box. Within each day's box, the data is sorted by the active timestamp (even if simplified to the active date), THEN by status.
So, for each CREATED day, you open a box. Inside, you look at secondary boxes, one per "active" timestamp (e.g. per day). Only within each active timestamp (day) can you finally see the "status = 1" records. So you open each active timestamp day, check for status = 1, then close that created-day box, move on to the next created-day box, and repeat. Consider how labor-intensive it is to open every created-day box, and every active box within it.
Now, under the suggested index starting with status, you have a very finite number of boxes, one per status. Open only the one box for status = 1; those are the only records you want to consider, and you don't care about the others. Inside that box, the records are sub-sorted by the ACTIVE timestamp, so you can jump directly to those at or after the current timestamp. From the first qualifying record to the end of the box, you have all the records that qualify. Done. Since this index ALSO includes created_at, it can help with the descending sort order as well.
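A sketch of that index change against the job table shown earlier (the index being dropped is the one named created_at in the schema; the new name is arbitrary):
ALTER TABLE job
  DROP INDEX created_at,
  ADD INDEX status_active_created (status, active, created_at);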
For ensuring "covering indexes" for the other lookup tables if they do not yet exist, I suggest the following.
table index
company ( id, name, logo )
activity_sector (id, name )
geo__name ( id, name )
job_salary_type ( id, name )
job_contract_type ( id, name )
workday ( id, name )
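As an example for one of them (repeat the pattern for the rest); note that with utf8mb4 the full name and logo columns only fit in an index if your InnoDB row format allows long (3072-byte) keys, and a prefix index would defeat the covering purpose:
ALTER TABLE company ADD INDEX cover_company (id, name, logo);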
And the MySQL Keyword...
SELECT STRAIGHT_JOIN (rest of query...)
There are several reasons as to why Symfony is slow.
1. Server fault
First, it could be the server's fault. Server performance may hinder your query time.
2. Data size and deferred rendering
Then comes the data size. As you can see in the image below, the query on one of my projects has a 50 MB data size (currently about 20k rows).
Parsing 50 MB into HTML can take some time, mostly because of loops.
Still, there are solutions for this, like deferred rendering.
Deferred rendering is quite simple: instead of parsing the data in your Twig template,
you send all the data to a JavaScript variable and use JavaScript to parse/render the data once the DOM is loaded.
3. Query optimisation
As I wrote in a comment, you can check the following question, in which I explained why custom queries are important.
Are Doctrine relations affecting application performance?
In that question, you will read that order matters... It is in fact the most important thing.
While static data in your database is often inserted in the right order,
that's rarely the case for dynamic data (data provided by users during the site's life).
This is why using ORDER BY in your query will often speed up the page rendering,
as Doctrine won't be doing extra queries on its own.
As an example, one of my sites has about 700 entries displayed on the index page.
First, here is the query count while using findAll():
It shows 254 queries (253 duplicates) in 144 ms, plus 39 ms of render time.
Next, using the second parameter of findBy() to add an ORDER BY, I get this result:
You can see the full query here (the screenshot is big).
Much better: only 1 query in 8 ms, and about the same render time.
But here I don't use any fields from associations.
The moment I do, Doctrine will run some extra queries, and the query count and time will skyrocket.
In the end, it turns back into something like findAll().
And last, this is the custom query :
In this custom query, the query time went from 8 ms to 38 ms.
But, unlike the previous query, I get far more data in the result,
which prevents Doctrine from doing extra queries.
Again, ORDER BY matters in this query. Without it, I skyrocket back to 84 queries.
4. Partials
When you write a custom query, you can load partial objects instead of full data.
As you said in your question, the description field seems to slow down your loading speed;
with partials, you can avoid loading some fields from the table, which will speed up the query.
First, instead of your regular syntax, this is how you create the query builder:
$em=$this->getEntityManager();
$qb=$em->createQueryBuilder();
Just in case, I prefer to keep $em as a separate variable (if I want to fetch some class repository for example).
Then you can start your partial select. Careful: the first select can't include any association fields:
$qb->select("partial job.{id, status, title, minimum_experience, min_salary, max_salary, zip_code, vacancies")
->from(Job::class, "job");
Then you can add your associations :
$qb->addSelect("company")
->join("job.company", "company");
Or even add a partial association in case you don't need all of the association's data:
$qb->addSelect("partial activitySector.{id}")
->join("job.activitySector", "activitySector");
$qb->addSelect("partial job.{id, company_id, activity_sector_id, status, active, contract_type_id, salary_type_id, workday_id, geoname_id, title, minimum_experience, min_salary, max_salary, zip_code, vacancies, show_salary");
5. Caches
You could also use various caches, like Zend OPcache for PHP; you will find some advice in this question: Why Symfony3 so slow?
There is also Varnish, which caches at the HTTP level in front of the application.
That rounds up about everything I can share to lower your loading time.
Hope it will prove useful and you will be able to solve your problem.
So many keys; try to minimize the number of keys.

index chat mysql table

I have this chat table:
CREATE TABLE IF NOT EXISTS `support_chat` (
`id` int(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`from` varchar(255) NOT NULL DEFAULT '',
`to` varchar(255) NOT NULL DEFAULT '',
`message` text NOT NULL,
`sent` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`seen` varchar(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `from` (`from`),
KEY `to` (`to`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 AUTO_INCREMENT=1 ;
Basically I need to run a select all the time (every 3 s per user) to check for new messages:
select id, `from`, message, sent from support_chat where `to` = ? and seen = 0
I have 5 million rows and usually 100 users online at the same time. Can I change something to make this table faster? Are the keys on from and to a good option?
There isn't much you can do by way of indexes to speed up that particular query. You could have a composite index on the to and seen fields, but the improvement will be minimal, if any. Why? Because the seen field has very poor cardinality. You only seem to be storing 0 or 1 in it, and indexes on such columns are not very useful. Often it would be faster for the query optimizer to read the data directly.
But here's what you can do: partition. Partitioning:
... enables you to distribute portions of individual tables across a
file system according to rules which you can set largely as needed. In
effect, different portions of a table are stored as separate tables in
different locations. The user-selected rule by which the division of
data is accomplished is known as a partitioning function,
You can partition your data in such a way that very old data is separated from the new. This will probably give you a big boost. However, be aware that a query that fetches old data as well as new data will be a lot slower.
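A sketch of range partitioning by year of sent, just to illustrate the idea; note that MySQL requires the partitioning column to be part of every unique key, so the primary key would first have to be widened to (id, sent):
ALTER TABLE support_chat
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (id, sent);

ALTER TABLE support_chat
  PARTITION BY RANGE (YEAR(sent)) (
    PARTITION p2015 VALUES LESS THAN (2016),
    PARTITION p2016 VALUES LESS THAN (2017),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
  );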
Here is another thing you can do: Add a limit clause.
You are probably only showing a limited number of messages at any given time. Putting a LIMIT clause on the query will help: then MySQL knows that it doesn't need to keep looking after it has found N rows.
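For example, if the client only ever renders the newest messages, something like this (the ORDER BY column and the value 50 are assumptions, not from the question):
SELECT id, `from`, message, sent
FROM support_chat
WHERE `to` = ? AND seen = 0
ORDER BY sent DESC
LIMIT 50;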
Add a multiple-column index on the to and seen columns, in this particular order (the to column should be the first column in the index). Then run EXPLAIN SELECT ... on your query to see whether the new index is used.
Assuming that the seen column stores only 2 values ('0' and '1') and that the to column stores the recipient of the chat message (email, username), and so can have many more distinct values, I'd use a composite index with seen first and to second:
ALTER TABLE support_chat
ADD INDEX seen_to_ix
(seen, `to`) ;
A composite index with reversed order (`to`, seen) would be a good choice, too. It might even be better depending on server load and how often the table is updated. An advantage (if you decide to use the second index), is that you can remove the (`to`) index.
Pick and add one of the two indexes and check the performance of your queries again.
Additional notes:
Using a varchar(1) for what is essentially a boolean value is not optimal. It is even worse that the column uses the utf8mb4 charset: it can take up to 5 bytes (1 for the length and up to 4 for the single character).
I'd change the type of that column to tinyint (and store 0 and 1) or bit.
Please avoid using reserved words (eg, from, to) for table and column names.
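A sketch of the type change suggested above, assuming the application only ever writes '0' or '1' into seen:
ALTER TABLE support_chat
  MODIFY `seen` TINYINT(1) NOT NULL DEFAULT 0;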

Same MySql Query Long execution time but short on archive table with 6million more records

I am a bit stumped by this weirdness.
I have a gps tracking app that logs gps points into a track_log table.
When I do a basic query on the running log table it takes about 50 seconds to complete:
SELECT * FROM track_log WHERE node_id = '26' ORDER BY time_stamp DESC LIMIT 1
Then I run the exact same query on the archive table, to which I copied most of the logs in order to reduce the running table's logs to about 1.2 million records.
The archive table is 7.5 million records big.
The exact same query on the archive table runs for 0.1 seconds on the same server even though it's six times bigger!
What's going on?
Here's the full Create Table schema:
CREATE TABLE `track_log` (
`id_track_log` INT(11) NOT NULL AUTO_INCREMENT,
`node_id` INT(11) DEFAULT NULL,
`client_id` INT(11) DEFAULT NULL,
`time_stamp` DATETIME NOT NULL,
`latitude` DOUBLE DEFAULT NULL,
`longitude` DOUBLE DEFAULT NULL,
`altitude` DOUBLE DEFAULT NULL,
`direction` DOUBLE DEFAULT NULL,
`speed` DOUBLE DEFAULT NULL,
`event_code` INT(11) DEFAULT NULL,
`event_description` VARCHAR(255) DEFAULT NULL,
`street_address` VARCHAR(255) DEFAULT NULL,
`mileage` INT(11) DEFAULT NULL,
`run_time` INT(11) DEFAULT NULL,
`satellites` INT(11) DEFAULT NULL,
`gsm_signal_status` DOUBLE DEFAULT NULL,
`hor_pos_accuracy` double DEFAULT NULL,
`positioning_status` char(1) DEFAULT NULL,
`io_port_status` char(16) DEFAULT NULL,
`AD1` decimal(10,2) DEFAULT NULL,
`AD2` decimal(10,2) DEFAULT NULL,
`AD3` decimal(10,2) DEFAULT NULL,
`battery_voltage` decimal(10,2) DEFAULT NULL,
`ext_power_voltage` decimal(10,2) DEFAULT NULL,
`rfid` char(8) DEFAULT NULL,
`pic_name` varchar(255) DEFAULT NULL,
`temp_sensor_no` char(2) DEFAULT NULL,
PRIMARY KEY (`id_track_log`),
UNIQUE KEY `id_track_log_UNIQUE` (`id_track_log`),
KEY `client_id_fk_idx` (`client_id`),
KEY `track_log_node_id_fk_idx` (`node_id`),
KEY `track_log_event_code_fk_idx` (`event_code`),
KEY `track_log_time_stamp_index` (`time_stamp`),
CONSTRAINT `track_log_client_id` FOREIGN KEY (`client_id`) REFERENCES `clients` (`client_id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `track_log_event_code_fk` FOREIGN KEY (`event_code`) REFERENCES `event_codes` (`event_code`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `track_log_node_id_fk` FOREIGN KEY (`node_id`) REFERENCES `nodes` (`id_nodes`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=8632967 DEFAULT CHARSET=utf8
TL;DR
Make sure the right indexes are defined on both tables; for this query, node_id and time_stamp are good index columns.
Defragment your table: https://dev.mysql.com/doc/refman/5.5/en/innodb-file-defragmenting.html (This could help, but should not make this much of a difference).
Make sure your query is not being blocked by other queries. If data is being inserted into the track_log table continuously, those queries might block your query. You can prevent this by changing the transaction isolation level; see https://dev.mysql.com/doc/refman/5.5/en/set-transaction.html for more information. Caution: be careful with this!
Indexes
I'm guessing this has something to do with the indexes you defined on the tables. Could you post the SHOW CREATE TABLE track_log output and the output for your archive table as well? The query you are executing would require an index on node_id and time_stamp for optimal performance.
Defragmentation
Besides the indexes you defined on the table, this might have something to do with data fragmentation. I'm assuming you are using InnoDB as your table engine. Depending on your settings, every table in a database is stored in a separate file, or all tables in the database are stored in a single file (the innodb_file_per_table variable). Those files never shrink in size. If your track_log table has grown to 8.7 million records, it still takes up space on disk for all those 8.7 million records.
If you have moved records from your track_log table to your archive table, the data might still be at the beginning and the end of the physical file for track_log. If no index is defined on time_stamp, a full table scan is still required to order by the timestamp. This means reading the complete file from disk. Because the records you deleted still take up space in the file, this could make a difference.
Edit:
Transactions
Other transactions might be blocking your SELECT query. This can happen with the InnoDB engine. If you continuously insert a lot of data into your track_log table, those inserts might block your query. It will have to wait until no other transactions are being performed on this table.
There is a way around this, but you should be careful with it. You are able to change the transaction isolation level of your query. By setting the transaction isolation level to READ UNCOMMITTED you will be able to read data while the other inserts are running. But it might not always give you the latest data. Whether you want to make that sacrifice depends on your situation. If you are going to alter the data and update it later, you generally do not want to change the transaction isolation level. But, for example, when showing statistics that do not always have to be accurate and up to date, this could be something that really speeds up your query.
I use this myself sometimes when I need to show statistics from large tables which are updated regularly.
This is almost certainly because your archive table has superior indexing to your track_log table.
To satisfy this query efficiently you need a compound index on (node_id, time_stamp) Why does this work? Because InnoDB and MyISAM indexes are so-called BTREE indexes, which means our intuition about searching them in order will work. Your query looks for a specific value of node_id, which means it can jump to that value in the index efficiently. The query then calls for the highest possible value of time_stamp related to that node_id value. Now that's in the same index, and in the right order to access it quickly too. So the row you need can be random-accessed, and MySQL doesn't have to hunt for it by scanning the table row by row. That scanning is almost certainly what's taking the time in your query.
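A sketch of adding that compound index to the running track_log table (the index name is arbitrary):
-- With this, MySQL can locate node_id = 26 and read the newest time_stamp
-- directly from the index, instead of filtering and then sorting.
ALTER TABLE track_log ADD INDEX node_ts (node_id, time_stamp);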
Three things to keep in mind:
One: lots of indexes on single columns can't help a query as much as well-chosen compound indexes. Read this http://use-the-index-luke.com/
Two: SELECT * is usually harmful on a table with as many columns as the one you have shown. Instead, you should enumerate the columns you actually need in your SELECT query. That way MySQL doesn't have to sling as much data.
Three: The DOUBLE datatype is overkill for commercial-grade GPS data. FLOAT is plenty of precision.
Let us analyze your query:
SELECT * FROM track_log WHERE node_id = '26' ORDER BY time_stamp DESC LIMIT 1
The above-mentioned query first sorts all the data present in the table based on time_stamp and then returns the top row.
But when this query is executed on the archived table, the ORDER BY clause might be ignored (depending on compression and system settings), and hence it returns the first row it encounters in the table.
You may verify the output on the archived table by comparing the result with the actual latest row.

Efficient MySQL table structure for rating system

This post is a follow-up of this answered question: Best method for storing a list of user IDs.
I took cletus and Mehrdad Afshari's epic advice of using a normalized database approach. Are the following tables properly set up for proper optimization? I'm kind of new to MySQL efficiency, so I want to make sure this is effective.
Also, when it comes to finding the average rating for a game and the total number of votes, should I use the following two queries, respectively?
SELECT avg(vote) FROM votes WHERE uid = $uid AND gid = $gid;
SELECT count(uid) FROM votes WHERE uid = $uid AND gid = $gid;
CREATE TABLE IF NOT EXISTS `games` (
`id` int(8) NOT NULL auto_increment,
`title` varchar(50) NOT NULL,
PRIMARY KEY (`id`)
) AUTO_INCREMENT=1 ;
CREATE TABLE IF NOT EXISTS `users` (
`id` int(8) NOT NULL auto_increment,
`username` varchar(20) NOT NULL,
PRIMARY KEY (`id`)
) AUTO_INCREMENT=1 ;
CREATE TABLE IF NOT EXISTS `votes` (
`uid` int(8) NOT NULL,
`gid` int(8) NOT NULL,
`vote` int(1) NOT NULL,
KEY `uid` (`uid`,`gid`)
) ;
average votes for a game: SELECT avg(vote) FROM votes WHERE gid = $gid;
number of votes for a game: SELECT count(uid) FROM votes WHERE gid = $gid;
As you will not have any user or game ids smaller than 0, you could make them unsigned integers (int(8) unsigned NOT NULL).
If you want to enforce that a user can only make a single vote for a game, then create a primary key over uid and gid in the votes table instead of just a normal index.
CREATE TABLE IF NOT EXISTS `votes` (
`uid` int(8) unsigned NOT NULL,
`gid` int(8) unsigned NOT NULL,
`vote` int(1) NOT NULL,
PRIMARY KEY (`gid`, `uid`)
) ;
The order of the primary key's fields (first gid, then uid) is important so the index is sorted by gid first. That makes the index especially useful for selects with a given gid. If you want to select all the votes a given user has made then add another index with just uid.
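A sketch of that extra index, only needed if you also look up all the votes a given user has made:
ALTER TABLE votes ADD INDEX uid (uid);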
I would recommend InnoDB for storage engine because especially in high load settings the table locks will kill your performance. For read performance you can implement a caching system using APC, Memcached or others.
Looks good.
I would have used users_id and games_id instead of gid and uid, which sound like "global id" and "unique id".
Whatever you end up doing, make sure you test it with a large data-set (even if you don't plan on having a huge number of users)
Write a script that generates 100,000 games, 50,000 users and a million votes. That may be slightly excessive, but if your queries don't take hours with that number of items, it'll never be an issue.
Looks good so far. Don't forget indices and foreign keys. In my experience most issues don't arise from not-so-well-thought-out designs but from the lack of indices and foreign keys.
Also, regarding the storage engine selection: I have yet to see a reason (in a reasonably complex/sized app) for not using InnoDB, and not just because of transactional semantics.
You might want to add a voted_on (DATETIME) column too. That way you could, say, see a game's trend over a certain timespan, or, in case vote spam ever happens, delete the unwanted votes accurately.
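A sketch of adding such a column (voted_on is the name suggested above; DEFAULT CURRENT_TIMESTAMP on a DATETIME needs MySQL 5.6.5 or later):
ALTER TABLE votes
  ADD COLUMN voted_on DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP;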
