For simplicity, assume all relevant fields are NOT NULL.
You can do:
SELECT
table1.this, table2.that, table2.somethingelse
FROM
table1, table2
WHERE
table1.foreignkey = table2.primarykey
AND (some other conditions)
Or else:
SELECT
table1.this, table2.that, table2.somethingelse
FROM
table1 INNER JOIN table2
ON table1.foreignkey = table2.primarykey
WHERE
(some other conditions)
Do these two work the same way in MySQL?
INNER JOIN is ANSI syntax that you should use.
It is generally considered more readable, especially when you join lots of tables.
It can also be easily replaced with an OUTER JOIN whenever a need arises.
The WHERE syntax is more relational model oriented.
A result of two tables JOINed is a cartesian product of the tables to which a filter is applied which selects only those rows with joining columns matching.
It's easier to see this with the WHERE syntax.
As for your example, in MySQL (and in SQL generally) these two queries are synonyms.
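The equivalence is easy to check on a toy dataset. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained); the table and column names come from the question, while the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table2 (
        primarykey INTEGER PRIMARY KEY,
        that TEXT NOT NULL,
        somethingelse TEXT NOT NULL
    );
    CREATE TABLE table1 (
        this TEXT NOT NULL,
        foreignkey INTEGER NOT NULL REFERENCES table2 (primarykey)
    );
    -- Invented sample rows, for illustration only.
    INSERT INTO table2 VALUES (1, 'x', 'p'), (2, 'y', 'q');
    INSERT INTO table1 VALUES ('a', 1), ('b', 2), ('c', 2);
""")

# Implicit ("comma") join: the join condition lives in WHERE.
implicit_sql = """
    SELECT table1.this, table2.that, table2.somethingelse
    FROM table1, table2
    WHERE table1.foreignkey = table2.primarykey
    ORDER BY table1.this
"""

# Explicit join: the join condition lives in ON.
explicit_sql = """
    SELECT table1.this, table2.that, table2.somethingelse
    FROM table1 INNER JOIN table2
        ON table1.foreignkey = table2.primarykey
    ORDER BY table1.this
"""

implicit_rows = conn.execute(implicit_sql).fetchall()
explicit_rows = conn.execute(explicit_sql).fetchall()
print(implicit_rows == explicit_rows)
```

The ORDER BY is there only to make the two result lists directly comparable; without it, neither query guarantees a row order.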
Also, note that MySQL also has a STRAIGHT_JOIN clause.
Using this clause, you can control the JOIN order: which table is scanned in the outer loop and which one is in the inner loop.
You cannot control this in MySQL using WHERE syntax.
Others have pointed out that INNER JOIN helps human readability, and that's a top priority, I agree.
Let me try to explain why the join syntax is more readable.
A basic SELECT query is this:
SELECT stuff
FROM tables
WHERE conditions
The SELECT clause tells us what we're getting back; the FROM clause tells us where we're getting it from, and the WHERE clause tells us which ones we're getting.
JOIN is a statement about the tables, how they are bound together (conceptually, actually, into a single table).
Any query elements that control the tables - where we're getting stuff from - semantically belong to the FROM clause (and of course, that's where JOIN elements go). Putting joining-elements into the WHERE clause conflates the which and the where-from; that's why the JOIN syntax is preferred.
Applying conditional statements in ON / WHERE
Here I have explained the logical query processing steps.
Reference: Inside Microsoft® SQL Server™ 2005 T-SQL Querying
Publisher: Microsoft Press
Pub Date: March 07, 2006
Print ISBN-10: 0-7356-2313-9
Print ISBN-13: 978-0-7356-2313-2
Pages: 640
(8) SELECT (9) DISTINCT (11) TOP <top_specification> <select_list>
(1) FROM <left_table>
(3) <join_type> JOIN <right_table>
(2) ON <join_condition>
(4) WHERE <where_condition>
(5) GROUP BY <group_by_list>
(6) WITH {CUBE | ROLLUP}
(7) HAVING <having_condition>
(10) ORDER BY <order_by_list>
The first noticeable aspect of SQL that is different than other programming languages is the order in which the code is processed. In most programming languages, the code is processed in the order in which it is written. In SQL, the first clause that is processed is the FROM clause, while the SELECT clause, which appears first, is processed almost last.
Each step generates a virtual table that is used as the input to the following step. These virtual tables are not available to the caller (client application or outer query). Only the table generated by the final step is returned to the caller. If a certain clause is not specified in a query, the corresponding step is simply skipped.
Brief Description of Logical Query Processing Phases
Don't worry too much if the description of the steps doesn't seem to make much sense for now. These are provided as a reference. Sections that come after the scenario example will cover the steps in much more detail.
(1) FROM: A Cartesian product (cross join) is performed between the first two tables in the FROM clause, and as a result, virtual table VT1 is generated.
(2) ON: The ON filter is applied to VT1. Only rows for which the <join_condition> is TRUE are inserted to VT2.
(3) OUTER (join): If an OUTER JOIN is specified (as opposed to a CROSS JOIN or an INNER JOIN), rows from the preserved table or tables for which a match was not found are added to the rows from VT2 as outer rows, generating VT3. If more than two tables appear in the FROM clause, steps 1 through 3 are applied repeatedly between the result of the last join and the next table in the FROM clause until all tables are processed.
(4) WHERE: The WHERE filter is applied to VT3. Only rows for which the <where_condition> is TRUE are inserted to VT4.
(5) GROUP BY: The rows from VT4 are arranged in groups based on the column list specified in the GROUP BY clause. VT5 is generated.
(6) CUBE | ROLLUP: Supergroups (groups of groups) are added to the rows from VT5, generating VT6.
(7) HAVING: The HAVING filter is applied to VT6. Only groups for which the <having_condition> is TRUE are inserted to VT7.
(8) SELECT: The SELECT list is processed, generating VT8.
(9) DISTINCT: Duplicate rows are removed from VT8. VT9 is generated.
(10) ORDER BY: The rows from VT9 are sorted according to the column list specified in the ORDER BY clause. A cursor is generated (VC10).
(11) TOP: The specified number or percentage of rows is selected from the beginning of VC10. Table VT11 is generated and returned to the caller.
Therefore, in the logical processing order, the (INNER JOIN) ON condition filters the data before the WHERE clause is applied, so the row count of the virtual table is already reduced at that step and subsequent joins operate on the filtered data. Only after that does the WHERE clause apply its filter conditions. Whether this improves performance in practice depends on the optimizer, which may evaluate inner-join predicates in a different physical order.
(In some cases it makes little difference whether a condition goes in ON or WHERE. This depends on how many tables you have joined and the number of rows in each joined table.)
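Where the ON / WHERE placement visibly matters is with outer joins, because of the ON, OUTER and WHERE steps described above: the outer rows are added after the ON filter but before the WHERE filter. The following is a minimal sketch using Python's sqlite3 module in place of SQL Server or MySQL; the customers and orders tables and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables, for illustration only.
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 'paid'), (11, 2, 'open');
""")

# Condition in ON: Bob's order is filtered out by ON, then Bob is
# added back as an outer row with NULLs for the order columns.
on_rows = conn.execute("""
    SELECT c.name, o.id
    FROM customers c
    LEFT JOIN orders o
        ON o.customer_id = c.id AND o.status = 'paid'
    ORDER BY c.id
""").fetchall()

# Condition in WHERE: Bob's row survives the join, but the WHERE
# filter runs afterwards and removes it.
where_rows = conn.execute("""
    SELECT c.name, o.id
    FROM customers c
    LEFT JOIN orders o
        ON o.customer_id = c.id
    WHERE o.status = 'paid'
    ORDER BY c.id
""").fetchall()

print(on_rows)
print(where_rows)
```

With an INNER JOIN there is no outer-row step in between, which is why the placement does not change the result there.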
The implicit join ANSI syntax is older, less obvious, and not recommended.
In addition, the relational algebra allows interchangeability of the predicates in the WHERE clause and the INNER JOIN, so even INNER JOIN queries with WHERE clauses can have the predicates rearranged by the optimizer.
I recommend you write the queries in the most readable way possible.
Sometimes this includes making the INNER JOIN relatively "incomplete" and putting some of the criteria in the WHERE simply to make the lists of filtering criteria more easily maintainable.
For example, instead of:
SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
ON ca.CustomerID = c.CustomerID
AND c.State = 'NY'
INNER JOIN Accounts a
ON ca.AccountID = a.AccountID
AND a.Status = 1
Write:
SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
ON ca.CustomerID = c.CustomerID
INNER JOIN Accounts a
ON ca.AccountID = a.AccountID
WHERE c.State = 'NY'
AND a.Status = 1
But it depends, of course.
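As a quick sanity check that the two forms above are interchangeable for inner joins, here is a sketch that runs both against an in-memory SQLite database via Python's sqlite3 module. The table and column names follow the example; the column types and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, State TEXT);
    CREATE TABLE Accounts (AccountID INTEGER PRIMARY KEY, Status INTEGER);
    CREATE TABLE CustomerAccounts (CustomerID INTEGER, AccountID INTEGER);
    -- Invented sample rows.
    INSERT INTO Customers VALUES (1, 'NY'), (2, 'CA');
    INSERT INTO Accounts VALUES (100, 1), (101, 0), (102, 1);
    INSERT INTO CustomerAccounts VALUES (1, 100), (1, 101), (2, 102);
""")

# Filters folded into the ON clauses.
filters_in_on = conn.execute("""
    SELECT c.CustomerID, a.AccountID
    FROM Customers c
    INNER JOIN CustomerAccounts ca
        ON ca.CustomerID = c.CustomerID
        AND c.State = 'NY'
    INNER JOIN Accounts a
        ON ca.AccountID = a.AccountID
        AND a.Status = 1
    ORDER BY c.CustomerID, a.AccountID
""").fetchall()

# Join keys in ON, filters collected in WHERE.
filters_in_where = conn.execute("""
    SELECT c.CustomerID, a.AccountID
    FROM Customers c
    INNER JOIN CustomerAccounts ca
        ON ca.CustomerID = c.CustomerID
    INNER JOIN Accounts a
        ON ca.AccountID = a.AccountID
    WHERE c.State = 'NY'
        AND a.Status = 1
    ORDER BY c.CustomerID, a.AccountID
""").fetchall()

print(filters_in_on == filters_in_where)
```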
Implicit joins (which is what your first query is known as) become much much more confusing, hard to read, and hard to maintain once you need to start adding more tables to your query. Imagine doing that same query and type of join on four or five different tables ... it's a nightmare.
Using an explicit join (your second example) is much more readable and easy to maintain.
I'll also point out that using the older syntax is more subject to error. If you use inner joins without an ON clause, you will get a syntax error. If you use the older syntax and forget one of the join conditions in the where clause, you will get a cross join. The developers often fix this by adding the distinct keyword (rather than fixing the join because they still don't realize the join itself is broken) which may appear to cure the problem but will slow down the query considerably.
Additionally, for maintenance: if you have a cross join in the old syntax, how will the maintainer know if you meant to have one (there are situations where cross joins are needed) or if it was an accident that should be fixed?
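The failure mode is easy to reproduce. Below is a small sketch (Python's sqlite3 module, hypothetical customers and orders tables) of a comma join with the join condition forgotten, and of DISTINCT hiding the symptom without fixing the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables; Cid has no orders.
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO orders VALUES (1, 1), (2, 1), (3, 2);
""")

# Join condition forgotten: every customer is paired with every order.
broken = conn.execute(
    "SELECT c.name FROM customers c, orders o"
).fetchall()

# Intended query: one row per order, with its customer's name.
correct = conn.execute(
    "SELECT c.name FROM customers c, orders o WHERE o.customer_id = c.id"
).fetchall()

# The DISTINCT "fix": duplicates disappear, but the cross join is
# still there, so a customer without any orders shows up too.
patched = conn.execute(
    "SELECT DISTINCT c.name FROM customers c, orders o ORDER BY c.name"
).fetchall()

print(len(broken), len(correct), patched)
```

On this data the patched query looks tidy but is still wrong, and the engine has to build the full cross product before removing the duplicates.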
Let me point you to this question to see why the implicit syntax is bad if you use left joins.
Sybase *= to Ansi Standard with 2 different outer tables for same inner table
Plus (personal rant here), the standard using the explicit joins is over 20 years old, which means implicit join syntax has been outdated for those 20 years. Would you write application code using a syntax that has been outdated for 20 years? Why do you want to write database code that is?
The SQL:2003 standard changed some precedence rules so a JOIN statement takes precedence over a "comma" join. This can actually change the results of your query depending on how it is set up. This caused problems for some people when MySQL 5.0.12 switched to adhering to the standard.
So in your example, your queries would work the same. But if you added a third table:
SELECT ... FROM table1, table2 JOIN table3 ON ... WHERE ...
Prior to MySQL 5.0.12, table1 and table2 would be joined first, then table3. Now (5.0.12 and on), table2 and table3 are joined first, then table1. It doesn't always change the results, but it can and you may not even realize it.
I never use the "comma" syntax anymore, opting for your second example. It's a lot more readable anyway: the JOIN conditions are with the JOINs, not separated into a separate query section.
They have a different human-readable meaning.
However, depending on the query optimizer, they may have the same meaning to the machine.
You should always code to be readable.
That is to say, if this is a built-in relationship, use the explicit join. If you are matching on weakly related data, use the WHERE clause.
I know you're talking about MySQL, but anyway:
In Oracle 9 explicit joins and implicit joins would generate different execution plans. AFAIK that has been solved in Oracle 10+: there's no such difference anymore.
If you are often programming dynamic stored procedures, you will fall in love with your first example (using WHERE). If you have various input parameters and lots of morph mess, then that is the only way. Otherwise, they both will run the same query plan, so there is definitely no obvious difference in classic queries.
ANSI join syntax is definitely more portable.
I'm going through an upgrade of Microsoft SQL Server, and I would also mention that the =* and *= syntax for outer joins in SQL Server is not supported (without compatibility mode) for 2005 SQL server and later.
I have two points for the implicit join (the first example):
Tell the database what you want, not what it should do.
You can write all tables in a clear list that is not cluttered by join conditions. Then you can much easier read what tables are all mentioned. The conditions come all in the WHERE part, where they are also all lined up one below the other. Using the JOIN keyword mixes up tables and conditions.
Related
I have a SQL query of this format:
SELECT * FROM table1 t1 LEFT JOIN table2 t2 ON t1.id = t2.id;
I would like, for any given SQL SELECT query unknown in advance to know which tables were used to run it. So I thought I would use the EXPLAIN SELECT statement for that.
My issue is that the EXPLAIN SELECT * FROM table1 t1 LEFT JOIN table2 t2 ON t1.id = t2.id; query returns "t1" and "t2" as table names. I need it to give me the original table names, so table1 and table2 respectively. Now, I understand that it is not possible according to this old report.
However, I need to make this work somehow. I don't really want to run some REGEX on the query (unless you have one in mind that will undoubtedly include all scenarios of how tables can be used in a query, no matter how unstandardized it is).
I'm ready to hear all the possibilities that you might have in mind, it does not have to use the EXPLAIN SELECT as long as I can get all my original table names that were used in an unknown SELECT query. I don't care about the rest of the information provided by EXPLAIN SELECT, I just need the table names.
In case you want to propose a solution that is outside MySQL's scope, I am using PHP as the main platform to execute these requests with PDO (however, the queries are executed directly, they are not prepared statements).
What you're asking for can get really complex, because a table reference in SQL can be a view, or a common table expression, or a derived table subquery.
Any of those may be a JOIN of multiple tables, or a UNION/INTERSECT/EXCEPT of multiple queries.
It can even have no base table at all if it's a subquery that selects a tuple, or a VALUES statement.
I don't think there's a way to do what you want with regular expressions unless you're satisfied with a very reduced subset of SQL queries. You'd need a full-blown SQL parser to track the base table(s) per table reference.
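To illustrate what "letting a real parser do the work" looks like, here is a sketch that uses SQLite rather than MySQL, so it is not a drop-in solution for the question. Python's sqlite3 module exposes SQLite's authorizer hook, which is called while a statement is compiled and reports the underlying table of each column read, regardless of aliases. The table definitions are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed definitions; the question does not show the schema.
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, a TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, b TEXT);
""")

tables_read = set()

def record_reads(action, arg1, arg2, db_name, source):
    # For SQLITE_READ, arg1 is the table name and arg2 the column name.
    if action == sqlite3.SQLITE_READ:
        tables_read.add(arg1)
    return sqlite3.SQLITE_OK  # allow everything; we only observe

conn.set_authorizer(record_reads)
conn.execute(
    "SELECT * FROM table1 t1 LEFT JOIN table2 t2 ON t1.id = t2.id"
).fetchall()

print(sorted(tables_read))
```

The point is only that the engine's own parser already resolves aliases to base tables; for MySQL you would need an equivalent hook or an external SQL parser.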
SELECT a.ts, b.barcodenumber, a.remarks, c.department
FROM documentlog a
INNER JOIN (select docid, max(logid) as logid from documentlog GROUP BY docid) d ON d.docid=a.docid AND d.logid=a.logid
INNER JOIN user c ON c.uid=a.user
INNER JOIN document b ON b.id=a.docid
WHERE c.department = 'PTO' AND b.end = 0
My problem is that this query is slow (2+ seconds to execute) even though it returns only 9 rows. How can I speed up the execution of my query?
[Screenshot: original EXPLAIN result]
[Screenshot: updated EXPLAIN result, after adding an index on logid, docid]
Check out your EXPLAIN result. Notice that MySQL does not use any kind of key when querying the documentlog table i.e., the documentlog table does not have a key defined on it. More than 2 million records are processed at this point in your query. This could be the most likely source of the slowness of your query.
Add an index on the docid and logid fields in your documentlog table and check if it improves the query's execution time.
Update!!
The output of the updated EXPLAIN says that a full table scan (type=ALL) is still being used to produce the output of the main outer query. Why? Because there are no indices defined on the attributes used in the WHERE clause (department and end).
In general, if you want to speed up queries, then one has to make sure that appropriate indices are defined for the attributes used in the queries' WHERE condition.
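The effect of an index on the plan can be seen in miniature with SQLite's EXPLAIN QUERY PLAN, used here through Python's sqlite3 module as a rough analogue of MySQL's EXPLAIN (the output format differs). The documentlog table is reduced to three columns with assumed types, and the rows and index name are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documentlog (logid INTEGER, docid INTEGER, remarks TEXT)"
)
conn.executemany(
    "INSERT INTO documentlog VALUES (?, ?, ?)",
    [(i, i % 50, "note") for i in range(1000)],
)

query = "SELECT logid, remarks FROM documentlog WHERE docid = 7"

def plan(sql):
    # The human-readable plan step is the last column of each row.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

plan_before = plan(query)  # no usable index: full table scan

# Hypothetical index name; the columns follow the answer's suggestion.
conn.execute(
    "CREATE INDEX idx_documentlog_docid_logid ON documentlog (docid, logid)"
)
plan_after = plan(query)   # the filter on docid can now use the index

print(plan_before)
print(plan_after)
```

The exact wording of the plan lines varies between SQLite versions, so read them rather than matching on them too strictly.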
By the way, you can learn more about the meaning of MySQL's EXPLAIN result by reading its documentation.
I've been scratching my head at this problem all day and I simply can't work it out. This is the first time I've attempted to use SQL joins; while we do kinda get taught the basics, I'm more into pushing a little more into the advanced stuff.
Basically I'm making my own forum, and I have two tables. f_topics (The threads) and f_groups (The forums, or categories). There is a relationship between topicBase in f_topics and groupID in f_groups, this shows which group each topic belongs to. Each topic has a unique ID called topicID and same for the groups, called groupID.
Basically, I'm trying to get all these columns into a single SELECT statement - The title of the topic, the date the topic was posted, the ID of the group the topic belongs in, and the name of that group. This is what I was trying to use, but the group always comes back as 1, even if the topic is in groupID 2:
$query=mysqli_query($link, "
SELECT `topicName`, `topicDate`, `groupName`, `groupID`
FROM `f_topics`
NATURAL JOIN `f_groups`
WHERE `f_topics`.`topicID`='$tid';
") or die("Failed to get topic detail E: ".mysqli_error($link));
var_dump(mysqli_fetch_assoc($query));
Sorry if this doesn't make much sense, and if my entire logic is completely wrong, if so could you suggest an alternate method?
Thanks for reading!
To join tables, you need to map the foreign keys. Assuming your groups table has a groupID field, this is how you'd join them:
SELECT `topicName`, `topicDate`, `groupName`, `groupID`
FROM `f_topics`
LEFT JOIN `f_groups`
ON `f_topics`.`groupID` = `f_groups`.`groupID`
WHERE `f_topics`.`topicID`='$tid';
So from what I gather there is a column in f_topics named "topicBase" which references the groupID column from the f_groups table.
Based on that assumption, you can perform either an INNER JOIN or a LEFT JOIN. INNER requires there be an entry in both tables while LEFT requires there only be data in f_topics.
SELECT
f_topics.topicName,
f_topics.topicDate,
f_groups.groupName,
f_groups.groupID
FROM
f_topics
INNER JOIN
f_groups
ON
f_topics.topicBase = f_groups.groupID
WHERE
f_topics.topicID = '$tid'
I recommend you avoid NATURAL JOIN.
Primarily because a working query can be broken by the addition of a new column in a referenced table, which matches a column name in the other referenced table.
Secondly, for any reader (reviewer) of the SQL, which columns are being matched to which columns is not clear without a careful review of both tables. (And, if someone has added a column that has broken the query, it makes it even more difficult to figure out what the JOIN criteria used to be before the column was added.)
Instead, I recommend you specify the column names in a predicate in the ON clause.
It's also good practice to qualify all column references by table name, or preferably, a shorter table alias.
For simpler statements, I agree that this may look like unnecessary overhead. But once statements become more complicated, this pattern VASTLY improves the readability of the statement.
Absent the definitions of the two tables, I'm going to have to make assumptions, and I "guess" that there is a groupID column in both of those tables, and that is the only column that is named the same. But you specify that it's the topicBase column in f_topics that matches groupID in f_groups. (And the NATURAL JOIN won't get you that.)
I think the resultset you want will be returned by this query:
SELECT t.`topicName`
, t.`topicDate`
, g.`groupName`
, g.`groupID`
FROM `f_topics` t
JOIN `f_groups` g
ON g.`groupID` = t.`topicBase`
WHERE t.`topicID`='$tid';
If it's possible for the topicBase column to be NULL or to contain a value that does not match a f_groups.groupID value, and you want that topic returned, with the columns from f_groups returned as NULL (when there is no match), you can get that with an outer join.
To get that behavior, in the query above, add the LEFT keyword immediately before the JOIN keyword.
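To see how NATURAL JOIN can go wrong here, consider a sketch under one possible set of assumptions: the two tables share no column names at all (topicBase in f_topics, groupID in f_groups). In that case NATURAL JOIN has nothing to match on and degenerates into a cross join, so the topic is paired with every group. The example uses Python's sqlite3 module instead of MySQL, and the column types and sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed schema: no column name is shared between the tables.
    CREATE TABLE f_groups (groupID INTEGER PRIMARY KEY, groupName TEXT);
    CREATE TABLE f_topics (
        topicID INTEGER PRIMARY KEY,
        topicName TEXT,
        topicDate TEXT,
        topicBase INTEGER  -- references f_groups.groupID
    );
    INSERT INTO f_groups VALUES (1, 'General'), (2, 'Help');
    INSERT INTO f_topics VALUES (10, 'How do joins work?', '2012-01-01', 2);
""")

tid = 10

# NATURAL JOIN with no common columns: one row per group.
natural_rows = conn.execute("""
    SELECT topicName, groupName, groupID
    FROM f_topics NATURAL JOIN f_groups
    WHERE f_topics.topicID = ?
    ORDER BY groupID
""", (tid,)).fetchall()

# Explicit ON clause naming the real relationship.
explicit_rows = conn.execute("""
    SELECT t.topicName, g.groupName, g.groupID
    FROM f_topics t
    JOIN f_groups g ON g.groupID = t.topicBase
    WHERE t.topicID = ?
""", (tid,)).fetchall()

print(natural_rows)
print(explicit_rows)
```

Fetching only the first row of the NATURAL JOIN result would show group 1 even though the topic belongs to group 2. The topic ID is also passed as a bound parameter here rather than interpolated into the SQL string.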
I am trying to create a search functionality where users would type a word or key phrase and then information is displayed.
I was thinking of using LEFT JOIN to add all the tables I need to be searchable. Someone has told me about UNION, and I have a hunch that it may be slower than JOIN
so
$query = '
SELECT *
FROM t1
LEFT JOIN t2
ON t2.content = "blabla"
LEFT JOIN t3
ON t3.content = "blabla"
[...]
WHERE t1.content = "blabla"
';
Is the above a good practice or is there a better approach i should be looking into ?
Send me on the right path for this :) Also argue why it's wrong and why you think your approach is better, so it will help me and others understand this:
In general, it's a bad idea to play hunches to "guess" what the performance of an SQL engine will be like. There is very sophisticated optimization happening in there which takes into account the size of the tables, the availability of indexes, the cardinality of indexes, and so on.
In this example, LEFT JOIN is wrong because you're producing a semi-cartesian JOIN. Basically, there will be a lot more rows in your result set than you think. That's because each matching row in t1 will be joined with each matching row in t2. If ten rows match in t1 and three in t2, you will not get ten results but thirty.
Even if only one row is guaranteed to match from each table (eliminating the cartesian join problem) it's clear that the LEFT JOIN solution will give you a dataset that's very hard to work with. That's because the content columns from each of the tables you JOIN will be separate columns in the result set. You'll have to examine each of the columns to figure out which table matched.
In this case, UNION is a better solution.
Also, please note:
Use of "*" in SELECT is generally not a good idea. It reduces performance (because all columns must be assembled in the result set) and in a case like this you lose the opportunity to ALIAS each of the content columns, making the result set harder to work with.
This is a very novel use of LEFT JOIN. Normally, it's used to associate rows from two different tables. In this case you're using it to produce three separate result sets "side-by-side". Most SQL programmers will have to look at this statement cross-eyed for a while to figure out what your intent was.
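The row multiplication described above, and the UNION alternative, can be sketched as follows. This uses Python's sqlite3 module rather than MySQL; the t1 and t2 definitions and their rows are assumptions, and UNION ALL with a literal source column is one possible way to keep track of which table matched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed minimal schema for the two searchable tables.
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, content TEXT);
    CREATE TABLE t2 (id INTEGER PRIMARY KEY, content TEXT);
    INSERT INTO t1 (content) VALUES ('blabla'), ('blabla'), ('other');
    INSERT INTO t2 (content) VALUES ('blabla'), ('blabla'), ('blabla');
""")

term = "blabla"

# LEFT JOIN on a constant condition: each of the 2 matching t1 rows
# is paired with each of the 3 matching t2 rows.
joined = conn.execute("""
    SELECT *
    FROM t1
    LEFT JOIN t2 ON t2.content = ?
    WHERE t1.content = ?
""", (term, term)).fetchall()

# UNION ALL: one row per match, labelled with the table it came from.
unioned = conn.execute("""
    SELECT 't1' AS source, id, content FROM t1 WHERE content = ?
    UNION ALL
    SELECT 't2' AS source, id, content FROM t2 WHERE content = ?
""", (term, term)).fetchall()

print(len(joined), len(unioned))
```

UNION ALL skips the duplicate-elimination pass that plain UNION performs; since every row carries its source table and id, there are no duplicates to remove in this sketch anyway.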
I'm making a query with Doctrine that contains lots of joins. Some are hasOne relationships, and some are hasMany.
I think CakePHP makes separate queries for each hasMany relationship, but Doctrine seems to make one huge query. Both can hydrate the data and return a nice array into your php script, but they seem to make different sets of queries to do so. With Doctrine, as soon as your query contains several hasMany joins, performance can become pretty terrible.
With CakePHP, the default is to split the query, but I can force it to join (http://book.cakephp.org/view/872/Joining-tables). Is there a way to do the reverse in Doctrine: to force it to split the hasMany joins into different queries? I've tried the docs and API but not found anything yet.
Wrong solution: querying with many joins at once should always be faster than using fragmented queries in multiple SQL statements, provided that your query is correct.
If your query is terribly slow as soon as you add n:n or 1:n joins, in Doctrine, there are several causes for it.
One of the most frequent mistakes in queries with multiple joins, is the use of the LEFT JOIN + WHERE construction where you could use INNER JOIN with ON. Consider this DQL example:
SELECT a.*, c.*
FROM Article a
LEFT JOIN a.Buyer b
LEFT JOIN b.Creditcard c
LEFT JOIN c.digitalinfo d
WHERE b.id = 2 AND d.info LIKE 'TEST%'
This is a very slow query if all tables have 10000 records. It will first join the entire table b with table a, resulting in 10000 ^ 2 rows, whereas in your WHERE clause you throw away pretty much 99.9% of them all.
SELECT a.*, c.*
FROM Article a
INNER JOIN a.Buyer b WITH b.id=2
LEFT JOIN b.Creditcard c
INNER JOIN c.Digitalinfo d WITH d.info LIKE 'TEST%'
Here, the INNER JOIN a.Buyer b does not end up with 100 000 000 rows; since it uses an extended ON clause (Doctrine calls this WITH), it will only leave a small set. Therefore the other two joins will go lightning fast compared to how they performed in the other statement.
Also, make sure you ALWAYS have indices
On columns that you search on. (If you search on full name, as in FirstName+' '+LastName, create an index on that sequence!)
On columns that you do specific joins on
On foreign keys and the fields they have the reference set to.
If you want to know what your database is doing behind the scenes, you could for example in MySQL type EXPLAIN followed by your query, and it tells you exactly why it is taking so long.
Doctrine lets you join more than one table, traversing more than one oneToMany relationship. That results in a query whose number of records grows exponentially with the number of tables joined.
The hydration process then refines the result set and builds a tree from your two-dimensional result set (table).
That's reasonable for a small result set and a few joined tables.
If you have a significant number of tables and records, you should run separate queries.
I faced this problem too. I had a query that took 0.0060 seconds to execute in its raw form but for which the array hydration process took 8 seconds. The query was returning 2180 rows due to the multiple left joins (this was a single entity with all its relations).
Following #Elvis's solution I dropped the hydration time to 0.3 seconds. What I did was simply split the query into two separate queries, ending up with the first one having 60 records and the other 30 records.
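The arithmetic behind this can be sketched in plain SQL, without Doctrine. The example below uses Python's sqlite3 module and a hypothetical schema: one article with two independent one-to-many relations (comments and tags). The sizes 60 and 30 are borrowed from the post above purely as illustrative numbers; this is not a reconstruction of that poster's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: two one-to-many relations on one entity.
    CREATE TABLE article (id INTEGER PRIMARY KEY);
    CREATE TABLE comment (id INTEGER PRIMARY KEY, article_id INTEGER);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, article_id INTEGER);
    INSERT INTO article VALUES (1);
""")
conn.executemany("INSERT INTO comment (article_id) VALUES (?)", [(1,)] * 60)
conn.executemany("INSERT INTO tag (article_id) VALUES (?)", [(1,)] * 30)

# One big query: every comment is repeated for every tag.
joined_rows = conn.execute("""
    SELECT a.id, c.id, t.id
    FROM article a
    LEFT JOIN comment c ON c.article_id = a.id
    LEFT JOIN tag t ON t.article_id = a.id
    WHERE a.id = 1
""").fetchall()

# Two separate queries: each relation is fetched once.
comment_rows = conn.execute(
    "SELECT id FROM comment WHERE article_id = 1"
).fetchall()
tag_rows = conn.execute(
    "SELECT id FROM tag WHERE article_id = 1"
).fetchall()

print(len(joined_rows), len(comment_rows) + len(tag_rows))
```

The joined result grows with the product of the relation sizes, while the split queries grow with their sum, and it is the former that the hydrator has to de-duplicate into a tree.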