MySQL GROUP BY optimisation

Can anyone tell me how I can speed up a MySQL GROUP BY clause? I've read the documentation but it doesn't give any good examples.
UPDATE: here is the SQL:
SELECT
post.topic_id,
topic.topic_posts,
topic.topic_title,
topic.topic_poster_name,
topic.topic_last_post_id,
forum.forum_name AS group_name,
`group`.slug AS child_slug,
`parent`.slug AS parent_slug
FROM bb_posts post
LEFT JOIN bb_topics topic
ON topic.topic_id = post.topic_id
LEFT JOIN bb_forums forum
ON forum.forum_id = topic.forum_id
LEFT JOIN wp_bp_groups `group`
ON topic.forum_id = `group`.id
LEFT JOIN wp_bp_groups `parent`
ON `group`.parent_id = `parent`.id
WHERE (topic_title LIKE '%$search_terms%' || MATCH(post.post_text) AGAINST('$search_terms'))
&& topic_status = 0
GROUP BY topic_id
ORDER BY topic.topic_start_time DESC
LIMIT $offset,$num

http://dev.mysql.com/doc/refman/5.0/en/group-by-optimization.html
Group by is fastest when you have an index on the column being grouped on, and:
The query is over a single table.
The GROUP BY names only columns that form a leftmost prefix of the index and no other columns. (If, instead of GROUP BY, the query has a DISTINCT clause, all distinct attributes refer to columns that form a leftmost prefix of the index.) For example, if a table t1 has an index on (c1,c2,c3), loose index scan is applicable if the query has GROUP BY c1, c2. It is not applicable if the query has GROUP BY c2, c3 (the columns are not a leftmost prefix) or GROUP BY c1, c2, c4 (c4 is not in the index).
The only aggregate functions used in the select list (if any) are MIN() and MAX(), and all of them refer to the same column. The column must be in the index and must follow the columns in the GROUP BY.
Any other parts of the index than those from the GROUP BY referenced in the query must be constants (that is, they must be referenced in equalities with constants), except for the argument of MIN() or MAX() functions.
For columns in the index, full column values must be indexed, not just a prefix. For example, with c1 VARCHAR(20), INDEX (c1(10)), the index cannot be used for loose index scan.

The general best practice would be to make sure the field you are grouping on has an index.
From the Reference Manual: Group by Optimization
The most general way to satisfy a GROUP BY clause is to scan the whole table and create a new temporary table where all rows from each group are consecutive, and then use this temporary table to discover groups and apply aggregate functions (if any). In some cases, MySQL is able to do much better than that and to avoid creation of temporary tables by using index access.
Make sure that every foreign key has a corresponding index.
Create covering indexes on the fields you retrieve.
Creating an index on the field you are sorting by wouldn't hurt either.
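The effect of indexing the grouped column can be checked from a query plan. The sketch below uses Python's built-in sqlite3 module as a stand-in, because it runs without a server; MySQL's optimizer and EXPLAIN output are different, and the posts table and idx_posts_topic index are made-up names for illustration only.

```python
import sqlite3

# SQLite stands in for MySQL here; table and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (post_id INTEGER PRIMARY KEY, topic_id INTEGER, body TEXT)"
)
conn.executemany(
    "INSERT INTO posts (topic_id, body) VALUES (?, ?)",
    [(i % 5, "text") for i in range(100)],
)


def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


query = "SELECT topic_id, COUNT(*) FROM posts GROUP BY topic_id"

# Without an index the engine has to build a temporary structure to group rows.
plan_without_index = plan(query)

# With an index on the grouped column, rows arrive already ordered by topic_id.
conn.execute("CREATE INDEX idx_posts_topic ON posts (topic_id)")
plan_with_index = plan(query)

print(plan_without_index)
print(plan_with_index)
```

On MySQL the equivalent check would be to run EXPLAIN on the query and look for "Using temporary" in the Extra column before and after adding the index.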

Make your where clause part of the join conditions.

Related

Optimizing and indexing query with random component

I have the following query in the code I've inherited:
SELECT a.row2, a.row3
FROM table1 a
JOIN table2 b ON a.row1 = b.row1
WHERE b.row2 IN (
SELECT id
FROM table3
WHERE id IN ($table3_ids)
)
ORDER BY RAND();
[a.row1 is the primary key for table1]
Several questions:
Is there a more efficient way to structure this query?
I already have an index in table1 on (row1, row2, row4); is it redundant to make a separate index for (row1, row2, row3), or should I just replace the former with an index on (row1, row2, row3, row4)?
From the opposite end, I already have an index in table2 on (row1, row2, row3); since it would seem I need an index in table2 for (row1, row2) to optimize this query, would it be redundant to include an index that simply excludes a single element from a different index in the same table?
This is where I'm unclear on how the query engine can know which index is appropriate; when it parses the query, does it first check for matching indices in the table?
Lastly (and probably most simply answered), I'm adding indices with this syntax:
ALTER TABLE table_name ADD KEY (row1, row2, row3);
After creating the index, I'm then manually renaming each index descriptively. Is it possible to include the name of the index in the command?
Many thanks!

This is your query:
SELECT a.row2, a.row3
FROM table1 a JOIN
table2 b
ON a.row1 = b.row1
WHERE b.row2 IN (SELECT id FROM table3 WHERE id IN ($table3_ids))
ORDER BY RAND();
I think the best indexes are: table2(row2, row1) and table1(row1, row2, row3), and table3(id). You can add row4 to the table1 index, but it doesn't make a difference. Also, it is really odd that you named your columns "row" -- for me it results in cognitive dissonance.
Actually, unless you have a typo in your query, you can leave out table3 and just do:
WHERE b.row2 IN ($table3_ids)
Note that IN ($table3_ids) requires a string substitution. A list of values cannot be bound as a single parameter, so unless you generate one placeholder per value, this introduces a danger of SQL injection.
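A list cannot be bound as one parameter, but a query with one placeholder per value can be built and then bound normally. A minimal sketch of that idea, using Python's sqlite3 module as a stand-in for the MySQL driver; fetch_by_ids is a hypothetical helper, and the table2 columns follow the question's naming:

```python
import sqlite3


def fetch_by_ids(conn, ids):
    """Select table2 rows whose row2 is in ids, binding every value."""
    if not ids:
        # "IN ()" is not valid SQL, so an empty list is handled up front.
        return []
    # Only "?" markers are interpolated into the SQL text, never the values.
    placeholders = ", ".join("?" for _ in ids)
    sql = (
        "SELECT row1, row2 FROM table2 "
        f"WHERE row2 IN ({placeholders}) ORDER BY row1"
    )
    return conn.execute(sql, list(ids)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table2 (row1 INTEGER, row2 INTEGER)")
conn.executemany("INSERT INTO table2 VALUES (?, ?)", [(1, 10), (2, 20), (3, 30)])

rows = fetch_by_ids(conn, [10, 30])
print(rows)
```

The same pattern should carry over to PHP with PDO or mysqli prepared statements: build the list of question marks from the array length, then bind the array values.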
If your result set is more than a few hundred, maybe a few thousand rows, then the order by will be significant. If this is the case, you might want to try a different approach to getting the results you want.

Some additions to Gordon's answer:
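The ALTER TABLE reference shows an optional index_name in the syntax, for example: ALTER TABLE table_name ADD KEY idx_name (row1, row2, row3); (idx_name being whatever name you choose).
IN ( SELECT ... ) can be grossly inefficient, particularly in older MySQL versions; turn it into a JOIN: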
SELECT a.row2, a.row3
FROM table1 a
JOIN table2 b ON a.row1 = b.row1
JOIN table3 c ON b.row2 = c.id
WHERE c.id IN ($table3_ids)
ORDER BY RAND();
or...
SELECT a.row2, a.row3
FROM table1 a
JOIN table2 b ON a.row1 = b.row1
WHERE b.row2 IN ($table3_ids)
ORDER BY RAND();
(A possible reason for needing c: are you filtering out ids that are missing from c?)
ORDER BY RAND() is costly. It essentially cannot be optimized unless you also have a LIMIT.

Joining two tables together with two foreign keys

I have a Description table which contains certain descriptions along with a unique ID. I have another table that contains two foreign keys to this table. So far I have the following query:
SELECT
Description.description AS Description,
Object.objID AS ID,
Description.description AS Location
FROM
Object
INNER JOIN
Description
ON
Object.objDescID=Description.descID
AND
Object.objLocID=Description.descID;
However, this is not working. Can someone please point me in the right direction?

If I understand right you want to join to the Description table twice for the same object. Give this a shot and see if it gets you what you're after:
SELECT
Object.objID AS ID,
od.description AS Description,
ld.description AS Location
FROM Object
INNER JOIN Description AS od
ON Object.objDescID=od.descID
INNER JOIN Description AS ld
ON Object.objLocID=ld.descID;
Edit: A word of advice, if you allow for null foreign keys you should use a LEFT JOIN instead of an INNER JOIN, that way if one of them is null it doesn't keep the entire record from showing.
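The difference between the two join types can be checked on a small data set. The sketch below uses Python's sqlite3 module as a stand-in for MySQL; the sample rows are invented, and object 101 deliberately has a NULL location key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Description (descID INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE Object (objID INTEGER PRIMARY KEY, objDescID INTEGER, objLocID INTEGER);
INSERT INTO Description VALUES (1, 'Lamp'), (2, 'Kitchen');
-- Object 101 has no location (NULL foreign key).
INSERT INTO Object VALUES (100, 1, 2), (101, 1, NULL);
""")

# The same query is run twice, once per join type.
template = """
SELECT Object.objID, od.description, ld.description
FROM Object
{join} JOIN Description AS od ON Object.objDescID = od.descID
{join} JOIN Description AS ld ON Object.objLocID = ld.descID
ORDER BY Object.objID
"""

inner_rows = conn.execute(template.format(join="INNER")).fetchall()
left_rows = conn.execute(template.format(join="LEFT")).fetchall()

print(inner_rows)  # object 101 is dropped
print(left_rows)   # object 101 is kept, with None for the location
```

With INNER JOIN the row for object 101 disappears entirely; with LEFT JOIN it is returned with a NULL location.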

Try running this (might need minor adjustments):
SELECT
d.description AS Description,
Object.objID AS ID,
l.description AS Location
FROM
Object
INNER JOIN
Description AS d ON Object.objDescID=d.descID
INNER JOIN
Description AS l ON Object.objLocID=l.descID;

Looks like you need two references to the Description table. Each reference will be joined using one of the foreign key columns.
For example:
SELECT o.objID AS `ID`
, d.description AS `Description`
, l.description AS `Location`
FROM Object o
JOIN Description d
ON d.descID = o.objDescID
JOIN Description l
ON l.descID = o.objLocID
We assign the short alias d to the source we get the Description value from.
We assign the short alias l to the source we get the Location value from.
We reference columns from each table using the short alias, rather than the table name.
Essentially, think of the references to the Description table like it's two different tables, even though it's really the same table.
Note that we have to assign an alias to at least one of the references to Description, so that we can distinguish between them. (Otherwise, MySQL won't know which one we're talking about if we just said Description.description.)
Note that if the foreign key column objDescID or objLocID has a NULL value, or a matching value doesn't exist in the referenced table, the query won't return the row from Object.
To ensure you get a row from Object even when the matching values aren't found, you can use an OUTER join operation by including the LEFT keyword.
For example:
SELECT o.objID AS `ID`
, d.description AS `Description`
, l.description AS `Location`
FROM Object o
LEFT
JOIN Description d
ON d.descID = o.objDescID
LEFT
JOIN Description l
ON l.descID = o.objLocID
Note that only one alias is actually required, but I tend to assign short aliases to all row sources in a query. This makes the statement more decipherable, and really helps if I later need to add another reference to a table that is already used, or if I need to replace one of the table names with a different table name or an inline view (or subquery), I can leave the alias the same, and change just the rowsource. The other aliases don't make any difference in the actual execution of the statement, they are just there because I follow the same pattern for simple queries that I follow for more complex queries.

SQL SELECT statement - same column names

Imagine I have the following SELECT statement which has been oversimplified.
SELECT a.Name, b.Name FROM table a LEFT JOIN table b ON a.ID=b.TID
Using PHP, I run the following:
while ($result = mysql_fetch_array($results)) {
echo $result["Name"];
}
This will give me the result of b.Name. I am aware I can use a.Name AS aName, b.Name AS bName; however, this might sometimes complicate things where you have a long query and you use a.*. I tried using $result["a.Name"] but it does not work. I am aware that $result[0] works, but again this is not always possible without complicating things.
Is there any other way I can show a.Name please?

Simple answer: no.
Long answer: array keys in PHP must be unique, so when several columns share a name, the last one takes precedence.
If two or more columns of the result have the same field names, the last column will take precedence. To access the other column(s) of the same name, you must use the numeric index of the column or make an alias for the column. For aliased columns, you cannot access the contents with the original column name.
source
However, you can solve this by using aliases.
SELECT a.Name as aName, b.Name as bName FROM table a LEFT JOIN table b ON a.ID=b.TID
then you can access the names from both tables by using $result["aName"] and $result["bName"]
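The same clash appears in any driver that builds an associative array keyed by column name. The sketch below reproduces it in Python with the sqlite3 module, as an analogue of the PHP behaviour rather than PHP itself; fetch_assoc is a hypothetical helper that mimics an associative fetch, and the tables a and b are made up to match the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (ID INTEGER, Name TEXT);
CREATE TABLE b (TID INTEGER, Name TEXT);
INSERT INTO a VALUES (1, 'from a');
INSERT INTO b VALUES (1, 'from b');
""")


def fetch_assoc(sql):
    """Fetch one row as a dict keyed by column name (later duplicates win)."""
    cur = conn.execute(sql)
    names = [col[0] for col in cur.description]
    return dict(zip(names, cur.fetchone()))


# Both columns are reported as "Name", so only the last one survives.
clash = fetch_assoc("SELECT a.Name, b.Name FROM a LEFT JOIN b ON a.ID = b.TID")

# Aliases give each column its own key.
aliased = fetch_assoc(
    "SELECT a.Name AS aName, b.Name AS bName FROM a LEFT JOIN b ON a.ID = b.TID"
)

print(clash)
print(aliased)
```

When a.* is needed, one option is to keep a.* and add aliases only for the columns of b that collide.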

Based on your requirements, you could consider dividing your query into two fetch statements. This would allow you to have the duplicate column names.
SELECT a.* FROM table a LEFT JOIN table b ON a.ID=b.TID
SELECT b.* FROM table b LEFT JOIN table a ON a.ID=b.TID

MySQL joining tables with join

I've been scratching my head at this problem all day and I simply can't work it out. This is the first time I've attempted to use SQL joins; while we do kinda get taught the basics, I'm more into pushing a little further into the advanced stuff.
Basically I'm making my own forum, and I have two tables. f_topics (The threads) and f_groups (The forums, or categories). There is a relationship between topicBase in f_topics and groupID in f_groups, this shows which group each topic belongs to. Each topic has a unique ID called topicID and same for the groups, called groupID.
Basically, I'm trying to get all these columns into a single SELECT statement - The title of the topic, the date the topic was posted, the ID of the group the topic belongs in, and the name of that group. This is what I was trying to use, but the group always comes back as 1, even if the topic is in groupID 2:
$query=mysqli_query($link, "
SELECT `topicName`, `topicDate`, `groupName`, `groupID`
FROM `f_topics`
NATURAL JOIN `f_groups`
WHERE `f_topics`.`topicID`='$tid';
") or die("Failed to get topic detail E: ".mysqli_error($link));
var_dump(mysqli_fetch_assoc($query));
Sorry if this doesn't make much sense, and if my entire logic is completely wrong, if so could you suggest an alternate method?
Thanks for reading!

To join tables, you need to map the foreign keys. Assuming your groups table has a groupID field, this is how you'd join them:
SELECT `topicName`, `topicDate`, `groupName`, `groupID`
FROM `f_topics`
LEFT JOIN `f_groups`
ON `f_topics`.`groupID` = `f_groups`.`groupID`
WHERE `f_topics`.`topicID`='$tid';

So from what I gather there is a column in f_topics named "topicBase" which references the groupID column from the f_groups table.
Based on that assumption, you can perform either an INNER JOIN or a LEFT JOIN. INNER requires there be an entry in both tables while LEFT requires there only be data in f_topics.
SELECT
f_topics.topicName,
f_topics.topicDate,
f_groups.groupName,
f_groups.groupID
FROM
f_topics
INNER JOIN
f_groups
ON
f_topics.topicBase = f_groups.groupID
WHERE
f_topics.topicID = '$tid'

I recommend you avoid NATURAL JOIN.
Primarily because a working query can be broken by the addition of a new column in a referenced table, which matches a column name in the other referenced table.
Secondly, for any reader (reviewer) of the SQL, which columns are being matched to which columns is not clear without a careful review of both tables. (And, if someone has added a column that has broken the query, it makes it even more difficult to figure out what the JOIN criteria used to be before the column was added.)
Instead, I recommend you specify the column names in a predicate in the ON clause.
It's also good practice to qualify all column references by table name, or preferably, a shorter table alias.
For simpler statements, I agree that this may look like unnecessary overhead. But once statements become more complicated, this pattern VASTLY improves the readability of the statement.
Absent the definitions of the two tables, I'm going to have to make assumptions, and I "guess" that there is a groupID column in both of those tables, and that is the only column that is named the same. But you specify that it's the topicBase column in f_topics that matches groupID in f_groups. (And the NATURAL JOIN won't get you that.)
I think the resultset you want will be returned by this query:
SELECT t.`topicName`
, t.`topicDate`
, g.`groupName`
, g.`groupID`
FROM `f_topics` t
JOIN `f_groups` g
ON g.`groupID` = t.`topicBase`
WHERE t.`topicID`='$tid';
If it's possible for the topicBase column to be NULL or to contain a value that does not match an f_groups.groupID value, and you want that topic returned, with the columns from f_groups returned as NULL (when there is no match), you can get that with an outer join.
To get that behavior, in the query above, add the LEFT keyword immediately before the JOIN keyword.
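The NATURAL JOIN pitfall can be reproduced on a toy schema. The sketch below uses Python's sqlite3 module as a stand-in for MySQL and assumes, since the real definitions were not posted, that the two tables share no column name (topicBase in f_topics, groupID in f_groups); the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed schema: no column name is common to both tables.
CREATE TABLE f_groups (groupID INTEGER PRIMARY KEY, groupName TEXT);
CREATE TABLE f_topics (topicID INTEGER PRIMARY KEY, topicName TEXT,
                       topicDate TEXT, topicBase INTEGER);
INSERT INTO f_groups VALUES (1, 'General'), (2, 'Help');
INSERT INTO f_topics VALUES (10, 'How do joins work?', '2024-01-01', 2);
""")

# With no shared column name, NATURAL JOIN pairs the topic with every group.
natural_rows = conn.execute("""
    SELECT topicName, groupName, groupID
    FROM f_topics NATURAL JOIN f_groups
    WHERE f_topics.topicID = ?
    ORDER BY groupID
""", (10,)).fetchall()

# An explicit ON clause matches topicBase to groupID and returns one row.
explicit_rows = conn.execute("""
    SELECT t.topicName, g.groupName, g.groupID
    FROM f_topics t
    JOIN f_groups g ON g.groupID = t.topicBase
    WHERE t.topicID = ?
""", (10,)).fetchall()

print(natural_rows)
print(explicit_rows)
```

Under that assumption, fetching only the first row of the NATURAL JOIN result would always show group 1, which matches the symptom described in the question.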

Mysql Query to check 3 tables for an existing row

What I want to do is to query three separate tables into one row which is identified by a unique reference. I don't really have full understanding of the Join clause as it seems to require some sort of related data from each table.
I know I can go about this the long way round, but can not afford to lose even a little efficiency. Any help would be greatly appreciated.
Table Structure
package_id int(8),
client_id int(8),
unique reference varchar (40)
Each of the tables have essentially the same structure. I just need to know how to query all three, for 1 row.

If you have a few tables that share the same or a similar definition, you can use UNION or UNION ALL to treat them as one. This query will return the rows from each table that have the requested reference. I've included an OriginTable column in case your code needs to refer back to the original table for an update or something else.
select 'TableA' OriginTable,
package_id,
client_id
from TableA
where reference = ?
union all
select 'TableB' OriginTable,
package_id,
client_id
from TableB
where reference = ?
union all
select 'TableC' OriginTable,
package_id,
client_id
from TableC
where reference = ?
You might extend select list with other columns, provided that they have the same data type, or are implicitly convertible to data type from first select.
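A minimal sketch of running such a UNION ALL lookup with bound parameters, using Python's sqlite3 module as a stand-in for MySQL. The table names follow the query above, while find_reference and the sample rows are made up for illustration.

```python
import sqlite3

# Fixed list of table names; these are never taken from user input.
TABLES = ("TableA", "TableB", "TableC")

conn = sqlite3.connect(":memory:")
for name in TABLES:
    conn.execute(
        f"CREATE TABLE {name} "
        "(package_id INTEGER, client_id INTEGER, reference TEXT UNIQUE)"
    )
conn.execute("INSERT INTO TableA VALUES (1, 2, 'other')")
conn.execute("INSERT INTO TableB VALUES (7, 42, 'ref-123')")

# One SELECT per table, glued together with UNION ALL.
UNION_SQL = " UNION ALL ".join(
    f"SELECT '{name}' AS OriginTable, package_id, client_id "
    f"FROM {name} WHERE reference = ?"
    for name in TABLES
)


def find_reference(ref):
    """Return (OriginTable, package_id, client_id) rows matching ref."""
    # The same value is bound once for each SELECT in the union.
    return conn.execute(UNION_SQL, (ref,) * len(TABLES)).fetchall()


print(find_reference("ref-123"))
print(find_reference("missing"))
```

Because each branch filters on the unique reference column, every SELECT can use that column's index, so the lookup stays a single round trip to the database.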

Let's say you have 3 tables :
table1, table2 and table3 with structure
package_id int(8),
client_id int(8),
unique reference varchar (40)
Let's assume that column reference is unique key.
Then you can use this:
SELECT t1.exists_row AS in_table1, t2.exists_row AS in_table2, t3.exists_row AS in_table3 FROM
(SELECT COUNT(1) AS exists_row FROM table1 WHERE
reference = #reference ) t1,
(SELECT COUNT(1) AS exists_row FROM table2 WHERE
reference = #reference ) t2,
(SELECT COUNT(1) AS exists_row FROM table3 WHERE
reference = #reference ) t3
;
Replace #reference with the actual value of the unique key,
or provide the output of
SHOW CREATE TABLE
and I can rewrite the SQL for your actual tables.

It is entirely possible to create a join between tables using a where clause. In fact this is often what I do as I find it leads to clearer information of what you are actually doing, and if you don't get the results you expect you can debug it bit by bit.
That said however a join is certainly a lot quicker to write!
Please bear in mind I'm a bit rusty on SQL so I may have misremembered, and I'm not going to include any code as you haven't said what DBMS you are using, and they all have slightly different syntax.
The thing to remember is that the join functions on a column with the same data (and type) within it.
It is much easier if each table has the 'joining' field named the same, then it should be a matter of
join on <nameOfField>
However, if you wish to use fields that have different names in the different tables you will need to list the fully qualified names, i.e. tableName.FieldName.
If you are having trouble with natural, inner and outer, left and right, think of a Venn diagram in which each table is a circle. An inner (or natural) join returns the point of commonality between the tables, while a left or right outer join also keeps the unmatched rows of the left or right table. Left and right refer to the order of the tables in your list in the main part of your select (the first being the left and the second being the right).
When you add a third table this is where you can select any of the cross over section using these keywords.
Again however I have always found it easier to do a primary select and create a temp table, then perform my next join using this temp table (so effectively only need to use natural or left and right again). Again I find this easier to debug.
The best thing is to experiment and see what you get in return. Without a diagram of your tables this is the best I can offer.
in brief...
nested selects where field = (select from table where field = )
and temp tables
are (I think) easier to debug... but do take more writing!
David.

$array_of_tables = array('table1', 'table2', 'table3'); // name of each table (placeholder names)
$result_row = array();
foreach ($array_of_tables as $val)
{
$query = "select * from `$val` where $condition"; // $condition: your WHERE clause
$result = mysqli_query($connection, $query);
$result_row[] = mysqli_fetch_assoc($result); // if only one row is going to be returned from each table
// check the resulting array for your row
}

SELECT * FROM table1 t1 JOIN table2 t2 ON (t2.reference = t1.reference) JOIN table3 t3 ON (t3.reference = t1.reference) WHERE t1.reference = ?;
You could use a JOIN like this, assuming all three tables have the same unique column.
