I have some queries that are taking over 30 minutes to execute. I am not a database expert, so I really don't know what to do here; I need someone to suggest a better query for:
select count(*),substring(tdate,1,7)
from bills
where amt='30'
group by substring(tdate,1,7)
order by substring(tdate,1,7) desc
SELECT count(*)
FROM `bills`
where amt='30'
and date(tdate)=date('$date')
and stat='RENEW'
and x1 in (select `id` from sub);
Here I pass the value of $date in the following format: 'Y-m-d 00:00:00'.
select count(*),substring(tdate,1,7)
from bills
where amt='30'
group by substring(tdate,1,7)
order by substring(tdate,1,7) desc
Table structures:
MariaDB [talksport]> desc bills;
+-------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+----------------+
| bid | int(11) | NO | PRI | NULL | auto_increment |
| num | varchar(500) | NO | | NULL | |
| stat | varchar(500) | NO | | NULL | |
| tdate | varchar(500) | NO | | NULL | |
| x1 | varchar(500) | NO | | NULL | |
| amt | varchar(500) | NO | | 30 | |
+-------+--------------+------+-----+---------+----------------+
Any and all help is welcome.
Michael
Your three queries are really two (the first and third are the same). Here they are, reformatted so they are readable:
select count(*), left(tdate, 7)
from bills
where amt = '30'
group by left(tdate, 7)
order by left(tdate, 7) desc;
select count(*)
from `bills`
where amt = '30' and date(tdate) = date('$date') and stat = 'RENEW' and
x1 in (select `id` from sub);
First, you want an index on bills(amt, tdate) for the first query. The second is more problematic. In some versions of MySQL, IN with a subquery can be an issue. Date arithmetic is also problematic. So, if you are storing tdate as YYYY-MM-DD, then pass in $date in the same format (better yet, use parameters; better still, use the right column types). I would write this as:
select count(*)
from `bills` b
where amt = '30' and tdate = '$date' and stat = 'RENEW' and
exists (select 1 from sub s where b.x1 = s.id);
Then you want an index on bills(amt, stat, tdate, x1).
The right indexes should speed your queries.
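For concreteness, here is a sketch of those indexes in MariaDB/MySQL syntax (the index names are made up; skip the last one if sub.id is already that table's primary key):
-- covering index for the monthly count query (amt filter + tdate grouping)
CREATE INDEX idx_bills_amt_tdate ON bills (amt, tdate);
-- index for the renewal count query (equality columns first)
CREATE INDEX idx_bills_amt_stat_tdate_x1 ON bills (amt, stat, tdate, x1);
-- helps the EXISTS probe; unnecessary if sub.id is already the primary key
CREATE INDEX idx_sub_id ON sub (id);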
In addition to the answer above, one other optimisation that can possibly be done is replacing COUNT(*) with COUNT(id).
When you're counting all the rows and each row has a unique identifier (the id being the PRIMARY KEY, which is already indexed), you get the same COUNT by counting only the ids. The query then has to look at fewer columns, and the one column it does sift through is already indexed, which makes the searches and aggregations faster.
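In this schema the primary key is bid, so the first query would become something like the sketch below; whether it is measurably faster than COUNT(*) is worth checking with EXPLAIN on your own data:
SELECT COUNT(bid), LEFT(tdate, 7)
FROM bills
WHERE amt = '30'
GROUP BY LEFT(tdate, 7)
ORDER BY LEFT(tdate, 7) DESC;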
It's always good to try and use specific column names instead of * in SELECT queries. Likewise, always review the columns used in SELECT as well as the columns participating in WHERE and GROUP BY clauses to identify the potential candidates for indexing.
Please note:
Creating several indexes shouldn't be assumed to be the only way to optimise: while it can speed up SELECTs significantly, it can also slow down bulk INSERTs and UPDATEs, and you could end up creating indexes that turn out to be superfluous or redundant. Take a holistic view of the application's purpose to strike a balance, depending on whether user operations are concentrated more on INSERT/UPDATE or on SELECT.
Related
I am working on a project which has a large Question Bank, and for Tests added to the System, 20 questions are fetched on Run-Time dynamically based on the following query:
SELECT Question.* from Question JOIN Test
ON Question.Subject_ID = Test.Subject_ID
AND Question.Question_Level = Test.Test_Level
ORDER BY RAND()
LIMIT 20;
However, since it is well known that MySQL's RAND() function can kill your server, I have been looking for better solutions.
Result of EXPLAIN [above query]:
+----+-------------+----------+------+---------------+------+---------+------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+------+---------------+------+---------+------+------+----------------------------------------------------+
| 1 | SIMPLE | Test | ALL | NULL | NULL | NULL | NULL | 5 | Using temporary; Using filesort |
| 1 | SIMPLE | Question | ALL | NULL | NULL | NULL | NULL | 7 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+----------+------+---------------+------+---------+------+------+----------------------------------------------------+
Result of EXPLAIN Question:
+-------------------+------------------------------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+------------------------------------------+------+-----+---------+----------------+
| Question_ID | int(11) | NO | PRI | NULL | auto_increment |
| Questions | varchar(100) | NO | | NULL | |
| Available_Options | varchar(200) | NO | | NULL | |
| Correct_Answer | varchar(50) | NO | | NULL | |
| Subject_ID | int(11) | NO | | NULL | |
| Question_Level | enum('Beginner','Intermediate','Expert') | NO | | NULL | |
| Created_By | int(11) | NO | | NULL | |
+-------------------+------------------------------------------+------+-----+---------+----------------+
Result of EXPLAIN Test:
+----------------+------------------------------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+------------------------------------------+------+-----+---------+----------------+
| Test_ID | int(11) | NO | PRI | NULL | auto_increment |
| Test_Name | varchar(50) | NO | | NULL | |
| Test_Level | enum('Beginner','Intermediate','Expert') | NO | | NULL | |
| Subject_ID | int(11) | NO | | NULL | |
| Question_Count | int(11) | NO | | NULL | |
| Created_By | int(11) | NO | | NULL | |
+----------------+------------------------------------------+------+-----+---------+----------------+
Any help would be appreciated to optimize the query to reduce server load and execution time.
P.S. The system has the capability of Deletion too so the AUTO_INCREMENT PRIMARY KEY of the QUESTION and TEST table can have large gaps.
I like this question. It's a very good optimization puzzle, and let's assume for the moment that performance is very important for this query, and that you cannot use any dynamically inserted values (e.g. from PHP).
One high-performance solution would be to add a column with random values (say, called "Rand"), order the table by this value, and periodically regenerate and re-order the table. You could then use a query like this one:
SELECT Question.* from Question
JOIN Test
ON Question.Subject_ID = Test.Subject_ID
AND Question.Question_Level = Test.Test_Level
WHERE Question.Rand > RAND()
LIMIT 20
This would perform at O(n), requiring only one scan of the table, but it would come with the risk of returning fewer than 20 results if a value very close to 1 was generated. If this was an acceptable risk (e.g. you could programmatically check for an inadequate result and re-query), you would end up with nice runtime performance.
The periodic re-generating and re-ordering of the numbers is necessary because rows early in the table with high Rand values would be favored and show up disproportionately frequently in the results. (Imagine if the first row was lucky enough to receive a Rand value of .95)
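A minimal sketch of the setup and the periodic maintenance this approach needs (the column and index names are my own; the UPDATE is the step you would re-run periodically):
ALTER TABLE Question ADD COLUMN Rand DOUBLE NOT NULL DEFAULT 0;
CREATE INDEX idx_question_rand ON Question (Rand);
-- re-run periodically to regenerate the random values
UPDATE Question SET Rand = RAND();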
Even better would be to create a column with contiguous integers, index on this column, and then randomly choose an insertion point to grab 20 results. Such a query might look like this:
SELECT Question.* from Question
JOIN Test
ON Question.Subject_ID = Test.Subject_ID
AND Question.Question_Level = Test.Test_Level
CROSS JOIN (SELECT MAX(Rand_Id) AS max_id FROM Question) m
WHERE Question.Rand_Id > ROUND(RAND() * max_id)
LIMIT 20
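The contiguous Rand_Id column has to be created up front and renumbered whenever rows are deleted; a sketch of that maintenance step (names are again my own):
ALTER TABLE Question ADD COLUMN Rand_Id INT NOT NULL DEFAULT 0;
CREATE INDEX idx_question_rand_id ON Question (Rand_Id);
-- renumber 1..N in primary-key order; re-run after deletions to close the gaps
SET @n := 0;
UPDATE Question SET Rand_Id = (@n := @n + 1) ORDER BY Question_ID;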
But what if you can't alter your table in any way? If it doesn't matter how messy your SQL is, and there is a relatively low proportion of missing ids (say roughly 1/10th), you could get your 20 random questions with a good degree of probability using the following SQL:
SELECT Question.* from Question JOIN Test
ON Question.Subject_ID = Test.Subject_ID
AND Question.Question_Level = Test.Test_Level
WHERE Question.Question_ID IN (
SELECT DISTINCT(ROUND(rand * max_id)) AS rand_id
FROM ( -- generate 30 random numbers to make sure we get 20 results
SELECT RAND() AS rand UNION ALL
SELECT RAND() AS rand UNION ALL
SELECT RAND() AS rand UNION ALL
SELECT RAND() AS rand UNION ALL
...
SELECT RAND() AS rand UNION ALL
SELECT RAND() AS rand UNION ALL
SELECT RAND() AS rand
) a
CROSS JOIN ( -- get the max possible id from the Question table
SELECT MAX(Question_ID) AS max_id FROM Question
) b
)
LIMIT 20 -- finally pare our results down to 20 in case we got too many
However, this will cause problems in your use case, because you effectively can't know how many results (and their IDs) will be in the result set after the join. After joining on subject and difficulty, the proportion of missing IDs might be very high and you might end up with far fewer than 20 results, even with several hundred random guesses of what IDs might be in a table.
If you're able to use logic from PHP (sounds like you are), a lot of high performance solutions open up. You could, for example, create in PHP an object whose job it was to store arrays of all the IDs of Questions with a particular subject and difficulty level. You could then pick 20 random array indexes and get back 20 valid IDs, allowing you to run a very simple query.
SELECT Question.* from Question WHERE Question_ID IN ($dynamically_inserted_ids)
Anyway, I hope this gets your imagination going with some possibilities.
Why don't you generate the random numbers in PHP and then select the questions by id?
Here's the logic of my point:
$MIN = 1;
$MAX = 50000; // You may want to get the MAX from your database
$questions = '';
for($i = 0; $i < 20; $i++)
$questions .= mt_rand($MIN, $MAX) . ',';
// Removes last comma
$questions = rtrim($questions, ',');
$query = "SELECT * FROM Question WHERE Question.id IN ($questions)";
Edit 1:
I was thinking about the problem, and it occurred to me that you can select all the IDs from your db and then pick 20 items using the array_rand() function.
$values = array(1, 5, 10000, 102021, 1000000); // Your database IDs
$keys = array_rand($values, 20); // array_rand() returns random keys, not the values themselves
$questions = array();
foreach ($keys as $key) {
    $questions[] = $values[$key]; // map the keys back to the actual IDs
}
// $questions[0], $questions[1], ... now hold the randomly picked IDs
Create the following indexes:
CREATE INDEX Question_Subject_ID_idx ON Question (Subject_ID);
CREATE INDEX Test_Subject_ID_idx ON Test (Subject_ID);
CREATE INDEX Question_Question_Level_idx ON Question (Question_Level);
CREATE INDEX Test_Test_Level_idx ON Test (Test_Level);
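Since the join matches on both columns at once, composite indexes covering the pair may serve it better than four single-column ones; a possible variant (index names are my own):
CREATE INDEX Question_Subject_Level_idx ON Question (Subject_ID, Question_Level);
CREATE INDEX Test_Subject_Level_idx ON Test (Subject_ID, Test_Level);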
I investigated on the same issue a while ago and my first approach was to load all IDs first, pick random ones in PHP (see: Efficiently pick n random elements from PHP array (without shuffle)) then query for these IDs directly in MySQL.
This was an improvement but memory-consuming for large data sets. On further investigation I found a better way: Pick random IDs in one query without any other fields or JOINs, then do your real query by these IDs:
SELECT Question.* from Question JOIN Test
ON Question.Subject_ID = Test.Subject_ID
AND Question.Question_Level = Test.Test_Level
WHERE Question_ID IN (
    -- MySQL does not allow LIMIT directly inside an IN subquery,
    -- so the random pick is wrapped in a derived table
    SELECT Question_ID FROM (
        SELECT Question_ID FROM Question
        ORDER BY RAND()
        LIMIT 20
    ) random_ids
);
Here's a blog post with benchmarks for my concrete case: Show random products in Magento.
Relevant parts:
Besides the memory issues, could it be that ORDER BY RAND() by itself
is not the problem, but using it together with all the table joins of
Magento? What if I preselect the random IDs with ORDER BY RAND()?
[...]
It was slightly slower than the PHP preselect approach, but still clearly preferable to the pure ORDER BY RAND(), and without the increased memory usage in PHP.
[...]
The problem of the pure MySQL approach with ORDER BY RAND() became even more evident. While monitoring MySQL with mytop I noticed that besides for sorting, lots of time is spent for copying. The problem here seems to be, that sorting without an index, as with ORDER BY RAND() copies the data to a temporary table and orders that. With the flat index, all product attributes are fetched from a single table, which increases the amount of data copied to and from the temporary table for sorting. I might be missing something else here, but the performance dropped from bad to horrible, and it even caused my Vagrantbox to crash at first try because its disk got full (40 GB). So while PHP uses less memory with this approach, MySQL is all the more resource hungry.
I don't know how big your questions table is, but at some point this approach is still flawed:
Second, as stated above, for big catalogs you should look for something different. The problem with ORDER BY RAND() is that even though we minimized the data to be copied, it still copies all rows to a temporary table and generates a random number for each. The sorting itself is optimized to not sort all rows (See LIMIT Optimization), but copying takes its time.
There is another famous blog post on selecting random rows in MySQL written by Jan Kneschke. He suggests using an index table with all ids, that has its own primary key without gaps. This index table would be updated automatically with triggers, and random rows can be selected by the index table, using random keys between min(key) and max(key).
If you don't use any additional conditions and query random entries from all questions this should work for you.
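A minimal sketch of that index-table idea, with table and column names assumed by me (the INSERT/DELETE triggers that keep it in sync are omitted):
CREATE TABLE question_rand_map (
    map_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, -- gap-free key maintained by triggers
    question_id INT NOT NULL
);

-- pick one random row via the gap-free key; repeat (or adapt) for 20 rows
SELECT q.*
FROM (SELECT FLOOR(MIN(map_id) + RAND() * (MAX(map_id) - MIN(map_id) + 1)) AS rid
      FROM question_rand_map) r
JOIN question_rand_map m ON m.map_id = r.rid
JOIN Question q ON q.Question_ID = m.question_id;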
I'm really not sure how to title this question, so sorry about that..
I have a site where users can "watch" products. When they start watching a product, a row is inserted into product_tracking, which looks like this:
+-----------------------+------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------------+------------+------+-----+---------+----------------+
| id | bigint(11) | NO | PRI | NULL | auto_increment |
| u_id | int(11) | NO | MUL | NULL | |
| p_id | bigint(11) | NO | MUL | NULL | |
| date_started_tracking | datetime | NO | | NULL | |
| date_stopped_tracking | datetime | YES | | NULL | |
+-----------------------+------------+------+-----+---------+----------------+
The datetime the user starts watching the product is inserted into date_started_tracking when the row is inserted, and date_stopped_tracking is null by default. When the user stops watching a product, date_stopped_tracking is updated with the current datetime.
I want to have a chart on my site showing the number of products being tracked over time. Something like this:
# tracks
| __/
| ___ /
| / \__/
| /
| /
| ____/
| /
| ___/
| /
|/___________________________ date (grouped by day, week, or month)
I don't know how to write a query to retrieve the information required to make the chart.
What makes it difficult is that I want to group rows by day, week, or month, but only include rows in a group if date_started_tracking is <= the grouping date and date_stopped_tracking is null or > the grouping date (i.e. select rows showing products being tracked at that date).
In other words, for each day, week, or month (depending on the desired granularity), the number of products being watched by users at that time should be returned.
My attempt at a solution
The only idea I've come up with is creating a table of dates and then writing a query such as:
SELECT WEEK(d.date) as week,
(SELECT COUNT(pt.id)
FROM product_tracking pt
WHERE pt.date_started_tracking <= d.date AND (pt.date_stopped_tracking IS NULL OR pt.date_stopped_tracking > d.date)
) as numWatches
FROM dates d
GROUP BY week
ORDER BY week ASC
I think this will work, but it requires creating a table of dates, which seems like a hack.
Is there a better way?
Your method works, but it won't scale very well. An alternative approach is to keep a day-by-day running total of the tracking, using variables for the cumulative sums. This looks like:
select dte, sum(inc) as net,
       (@sum := @sum + sum(inc)) as total_for_day
from ((select date(date_started_tracking) as dte, count(*) as inc
       from product_tracking
       group by date(date_started_tracking)
      ) union all
      (select date(date_stopped_tracking), - count(*)
       from product_tracking
       where date_stopped_tracking is not null
       group by date(date_stopped_tracking)
      )
     ) pt cross join
     (select @sum := 0) vars
group by dte
order by dte;
This is an outline of a solution. For your data, you might want to use one day past the stop date, depending on whether or not the product is counted on that day.
You can then use this query as a subquery to aggregate for whatever period you like.
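For example, aggregating by week could look something like the sketch below. This variant uses a window function instead of user variables, so it assumes MySQL 8.0+ or MariaDB 10.2+; on older versions you would wrap the day-by-day query above in a subquery and accumulate with @variables in the same way.
SELECT YEARWEEK(dte) AS yw,
       SUM(SUM(inc)) OVER (ORDER BY YEARWEEK(dte)) AS tracked_at_week_end
FROM ((SELECT date(date_started_tracking) AS dte, 1 AS inc
       FROM product_tracking)
      UNION ALL
      (SELECT date(date_stopped_tracking), -1
       FROM product_tracking
       WHERE date_stopped_tracking IS NOT NULL)
     ) events
GROUP BY YEARWEEK(dte)
ORDER BY yw;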
This is a general question, one that I've been scratching my head on for a while now. My company's database handles about 2k rows a day. 99.9% of the time, we have no problem with the values that are returned in the different SELECT statements that are set up. However, on a very rare occasion, our database will "glitch" and return the value for a completely different row than what was requested.
This is a very basic example:
+---------+-------------------------+
| row_id | columnvalue |
+---------+-------------------------+
| 1 | 10 |
| 2 | 20 |
| 3 | 30 |
| 4 | 40 |
+---------+-------------------------+
SELECT columnvalue FROM table_name WHERE row_id = 1 LIMIT 1
Returns: 10
But on the very rare occasion, it may return: 20, or 30, etc.
I am completely baffled as to why it does this sometimes and would appreciate some insight into what appears to be a programming phenomenon.
More specific information:
SELECT
USERID, CONCAT( LAST, ', ', FIRST ) AS NAME, COMPANYID
FROM users, companies
WHERE users.COMPANYCODE = companies.COMPANYCODE
AND USERID = 9739 LIMIT 1
mysql> DESCRIBE users;
+------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+----------------+
| USERID | int(10) | NO | PRI | NULL | auto_increment |
| COMPANYCODE| varchar(255)| NO | MUL | | |
| FIRST | varchar(255)| NO | MUL | | |
| LAST | varchar(255)| NO | MUL | | |
+------------+-------------+------+-----+---------+----------------+
mysql> DESCRIBE companies;
+------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+----------------+
| COMPANYID | int(10) | NO | PRI | NULL | auto_increment |
| COMPANYCODE| varchar(255)| NO | MUL | | |
| COMPANYNAME| varchar(255)| NO | | | |
+------------+-------------+------+-----+---------+----------------+
What the results were suppose to be: 9739, "L----, E----", 2197
What the results were instead: 9739, "L----, E----", 3288
Basically, it returned the wrong company id based off the join with companycode. Given the nature of our company, I can't share any more information than that.
I have run this query 5k times and have made every modification to the code imaginable in order to reproduce the second set of results, and I have not been able to duplicate it. I'm not quick to blame MySQL; this has been happening (though rarely) for over 8 years, and I have exhausted all other possible causes. I suspected the results were manually changed after the query was run, but the timestamps state otherwise.
I'm just scratching my head as to why this can run perfectly 499k out of 500k times.
Now that we have a more realistic query, I notice right away that you are joining the tables, not on the primary key, but on the company code. Are we certain that the company code is being enforced as a unique index on companies? The Limit 1 would hide a second row if such a row was found.
From a design perspective, I would make the join on the primary key to avoid even the possibility of duplicate keys and put company code in as a unique indexed field for display and lookup only.
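A quick way to check that assumption (a sketch; it lists any company codes that map to more than one company id):
SELECT COMPANYCODE, COUNT(*) AS num_companies, GROUP_CONCAT(COMPANYID) AS company_ids
FROM companies
GROUP BY COMPANYCODE
HAVING COUNT(*) > 1;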
This behavior is either due to an incredibly unlikely SERIOUS bug in MySQL, or MySQL is returning a result that is valid at the time the statement is run and some other software is garbling the displayed result.
One possibility to consider is that the row had been modified (by some other statement) at the time your SQL statement executed, and then the row was changed again later. (That's the most likely explanation we'd have for MySQL returning an unexpected result.)
The use of the LIMIT 1 clause is curious, because if the predicate uniquely identifies a row, there should be no need for the LIMIT 1, since the query is guaranteed to return no more than one row.
This leads me to suspect that row_id is not unique, and that the query actually returns more than one row. With the LIMIT clause, there is no guarantee as to which of the rows will get returned (absent an ORDER BY clause.)
Otherwise, the most likely culprit is outdated cache contents, or other problems in the code.
UPDATE
The previous answer was based on the example query given; I purposely omitted the possibility that the table was actually a view doing a JOIN, since the question originally said it was a table, and the example query showed just the one table.
Based on the new information in the question, I suggest that you OMIT the LIMIT 1 clause from the query. That will identify that the query is returning more than one row.
From the table definitions, we see that the database isn't enforcing a UNIQUE constraint on the COMPANYCODE column in the COMPANY table.
We also know there isn't a foreign key defined referencing the companies primary key; the datatypes wouldn't even match, since COMPANYCODE is a varchar while COMPANYID is an int.
Normally, the foreign key would be defined referencing the PRIMARY KEY of the target table: we'd expect the users table to have a company_id column which references the COMPANYID (primary key) column in the companies table.
Instead, the join condition matches on the COMPANYCODE column, a varchar in both tables, rather than on the primary key, which is odd.
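If company codes are in fact meant to be unique, a sketch of enforcing that (the statement will fail if duplicates already exist, which would itself be informative):
ALTER TABLE companies ADD UNIQUE INDEX uq_companies_companycode (COMPANYCODE);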
There are several reasons this could happen. I suggest you look at the assumptions you're making. For example:
If you're using GROUP BY and one of the columns isn't an aggregate or the grouping expression, you're going to get an unpredictable value in that column. Make sure you use an appropriate aggregation (such as MAX or MIN) to get a predictable result on each column.
If you're assuming a row order without making it explicit, and using LIMIT to get only the first row, the actual order of returned rows depends on the query's execution plan, which can differ on large result sets based on the statistics available to the optimiser. Make sure you use ORDER BY in such situations.
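To illustrate the second point with the query from the question: without an ORDER BY, the LIMIT 1 is free to keep whichever matching row it sees first. Making the order explicit (the sort column here is only an example) at least makes the result deterministic, although fixing the duplicate company codes is the real cure:
SELECT USERID, CONCAT(LAST, ', ', FIRST) AS NAME, COMPANYID
FROM users
JOIN companies ON users.COMPANYCODE = companies.COMPANYCODE
WHERE USERID = 9739
ORDER BY COMPANYID
LIMIT 1;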
I know that this title is overused, but it seems that my kind of question is not answered yet.
So, the problem is like this:
I have a table structure made of four tables (tables, rows, cols, values) that I use to recreate the behavior of the information_schema (in a way).
In PHP I am generating queries to retrieve the data, and the result would still look like a normal table:
SELECT
(SELECT value FROM `values` WHERE `col` = "3" and row = rows.id) as "col1",
(SELECT value FROM `values` WHERE `col` = "4" and row = rows.id) as "col2"
FROM rows WHERE `table` = (SELECT id FROM tables WHERE name = 'table1')
HAVING (col2 LIKE "%4%")
OR
SELECT * FROM
(SELECT
(SELECT value FROM `values` WHERE `col` = "3" and row = rows.id) as "col1",
(SELECT value FROM `values` WHERE `col` = "4" and row = rows.id) as "col2"
FROM rows WHERE `table` = (SELECT id FROM tables WHERE name = 'table1')) d
WHERE col2 LIKE "%4%"
Note that the part where I define the columns of the result is generated by a PHP script. Why I am doing this is less important, but I want to extend this query-generating algorithm for broader use.
And now we get to the core problem: I have to decide whether to generate a WHERE or a HAVING part for the query. I know when to use each, but my algorithm doesn't, and I have to make a few extra checks for this. The two queries above are equivalent; I can always put any query in a sub-query, give it an alias, and use WHERE on the new derived table. But I wonder whether I will have performance problems, or whether this will come back to bite me in an unexpected way.
I know how they both work, and that WHERE is supposed to be faster, but this is why I came here to ask. Hopefully I have made myself understood; please excuse my English and the long, useless turns of phrase.
EDIT 1
I already know the difference between the two and all that it implies. My only dilemma is this: because I build the result from custom columns stored in other tables (variable in number and size) while trying to behave like a normally created table, I must use HAVING to filter on the derived columns; at the same time I have the option to wrap it all in a subquery and use WHERE normally, which will probably create a temporary table that is filtered afterwards. Will this affect performance on a large database? Unfortunately I cannot test this right now, as I cannot afford to fill the database with over a million entries (something like this: 1 million rows in the rows table, 5 million in the values table since every row has 5 columns, 5 rows in the cols table and 1 row in the tables table, about 6,000,006 entries in total).
Right now my database looks like this. The `tables` table:
+----+--------+-----------+------+
| id | name | title | dets |
+----+--------+-----------+------+
| 1 | table1 | Table One | |
+----+--------+-----------+------+
The `cols` table:
+----+-------+------+
| id | table | name |
+----+-------+------+
| 3 | 1 | col1 |
| 4 | 1 | col2 |
+----+-------+------+
where `table` is a foreign key from table `tables`
The `rows` table:
+----+-------+-------+
| id | table | extra |
+----+-------+-------+
| 1 | 1 | |
| 2 | 1 | |
+----+-------+-------+
where `table` is a foreign key from table `tables`
The `values` table:
+----+-----+-----+----------+
| id | row | col | value |
+----+-----+-----+----------+
| 1 | 1 | 3 | 13 |
| 2 | 1 | 4 | 14 |
| 6 | 2 | 4 | 24 |
| 9 | 2 | 3 | asdfghjk |
+----+-----+-----+----------+
where `row` is a foreign key from table `rows`
where `col` is a foreign key from table `cols`
EDIT 2
The conditions are there just for demonstration purposes!
EDIT 3
For only two rows, it seems there is a difference between the two: the one using HAVING takes 0.0008 s and the one using WHERE takes 0.0014-0.0019 s. I wonder whether this will affect performance for large numbers of rows and columns.
EDIT 4
The result of the two queries is identical, and that is:
+----------+------+
| col1 | col2 |
+----------+------+
| 13 | 14 |
| asdfghjk | 24 |
+----------+------+
HAVING is specifically for GROUP BY; WHERE provides row-level conditions. See also WHERE vs HAVING.
I believe the having clause would be faster in this case, as you're defining specific values, as opposed to reading through the values and looking for a match.
See: http://database-programmer.blogspot.com/2008/04/group-by-having-sum-avg-and-count.html
Basically, WHERE filters out rows before they are passed to an aggregate function, but HAVING filters on the aggregate function's results.
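A small illustration of that difference, using a hypothetical orders table:
SELECT customer_id, SUM(amount) AS total_paid
FROM orders
WHERE status = 'paid'        -- filters individual rows before aggregation
GROUP BY customer_id
HAVING SUM(amount) > 100;    -- filters whole groups after aggregation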
You could do it like this:
WHERE col2 IN (14, 24)
Your code WHERE col2 LIKE "%4%" is a bad idea: what about col2 = 34? It would also be selected.
How much faster (in %) will SQL be if I avoid using the built-in MySQL date and time functions?
What do I mean? For example: SELECT id FROM table WHERE WEEKOFYEAR(inserted)=WEEKOFYEAR(CURDATE())
MySQL has a lot of built-in functions to work with date and time, and they are convenient. But what about performance?
The above SQL can be rewritten without built-in functions, like: SELECT id FROM table WHERE inserted BETWEEN 'first day of particular week 00:00:00' AND 'last day of particular week 23:59:59'. The server-side code becomes worse :( but on the db side we could use indexes.
I see two problems with using built-in functions:
1. Indexes
I did a small test:
mysql> explain extended select id from table where inserted between '2013-07-01 00:00:00' and '2013-07-01 23:59:59';
+----+-------------+-------+-------+---------------+------+---------+------+------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+------+----------+--------------------------+
| 1 | SIMPLE | table | range | ins | ins | 4 | NULL | 7 | 100.00 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+------+----------+--------------------------+
mysql> explain extended select id from table where date(inserted)=curdate();
+----+-------------+-------+-------+---------------+------+---------+------+--------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+--------+----------+--------------------------+
| 1 | SIMPLE | table | index | NULL | ins | 4 | NULL | 284108 | 100.00 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+--------+----------+--------------------------+
The first one took 0.00 sec; the second one, run right after the first, took 0.15 sec. Everything was done with a small amount of data.
And the second problem is:
2. Time to call those functions
If the table has 1 billion records, it means that WEEKOFYEAR, DATE, or whatever would be called as many times as there are records, right?
So the question is: will it bring a real benefit if I stop using the MySQL built-in date and time functions?
Using a function of a column in a WHERE clause or in a JOIN condition will prevent the use of indexes on the column(s), if such indexes exist. This is because the raw value of the column is indexed, as opposed to the computed value.
Notice the above does not apply for a query like this:
SELECT id FROM atable WHERE inserted = CURDATE(); -- the raw value of "inserted" is used in the comparison
And yes, on top of that, the function will be executed for each and every row scanned.
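For the WEEKOFYEAR example from the question, the usual trick is to apply the functions only to CURDATE() (a constant) and compare the raw column against the resulting week boundaries, so an index on inserted can be used. A sketch, ignoring the year-boundary subtleties of ISO weeks (the table name is the one from the question):
SELECT id
FROM `table`
WHERE inserted >= DATE_SUB(CURDATE(), INTERVAL WEEKDAY(CURDATE()) DAY)                            -- Monday of the current week
  AND inserted <  DATE_ADD(DATE_SUB(CURDATE(), INTERVAL WEEKDAY(CURDATE()) DAY), INTERVAL 7 DAY); -- Monday of next week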
The second query is running the date function on every row in the table, while the first query can just use the index to find the rows it needs. That's where the biggest slowdown would be. Look at the rows column in the EXPLAIN output.