INNER JOIN with subquery with max and where clause in mysql - php

SELECT a.ts, b.barcodenumber, a.remarks, c.department
FROM documentlog a
INNER JOIN (select docid, max(logid) as logid from documentlog GROUP BY docid) d ON d.docid=a.docid AND d.logid=a.logid
INNER JOIN user c ON c.uid=a.user
INNER JOIN document b ON b.id=a.docid
WHERE c.department = 'PTO' AND b.end = 0
My problem is that this query is slow (more than 2 seconds to execute) even though it returns only 9 rows. How can I speed up its execution?
Old screenshot of the EXPLAIN result
Updated screenshot of the EXPLAIN result (after adding an index on logid, docid)

Check your EXPLAIN result. Notice that MySQL does not use any key when querying the documentlog table, i.e., no usable key is defined on it. More than 2 million records are processed at this point in your query. This is the most likely source of the slowness.
Add an index on the docid and logid fields of your documentlog table and check whether it improves the query's execution time.
Update:
The output of the updated EXPLAIN shows that MySQL still uses a full table scan (type=ALL) to produce the result of the main outer query. Why? Because no indexes are defined on the columns used in the WHERE clause (department and end).
In general, if you want to speed up a query, make sure that appropriate indexes are defined for the columns used in its WHERE condition.
By the way, you can learn more about the meaning of MySQL's EXPLAIN result by reading its documentation.
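The effect of indexing a WHERE column can be tried out on a small scale. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL (the plan output format differs from MySQL's EXPLAIN), and the index name idx_user_department is made up for the example.

```python
import sqlite3

# SQLite stands in for MySQL here; the table and column follow the question,
# the index name idx_user_department is a hypothetical choice.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (uid INTEGER PRIMARY KEY, department TEXT)")
conn.executemany(
    "INSERT INTO user (department) VALUES (?)",
    [("PTO" if i % 50 == 0 else "OTHER",) for i in range(1000)],
)

QUERY = "SELECT uid FROM user WHERE department = 'PTO'"


def plan(connection, sql):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in connection.execute("EXPLAIN QUERY PLAN " + sql)]


plan_before = plan(conn, QUERY)  # no index yet: the whole table is scanned
conn.execute("CREATE INDEX idx_user_department ON user (department)")
plan_after = plan(conn, QUERY)   # the new index is used for the lookup

print(plan_before)
print(plan_after)
```

In MySQL the equivalent check is to run EXPLAIN on the original query before and after adding the index and to compare the type and key columns.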

Related

Why my PHP MYSQL query not working/running after i enter this query?

I have two tables: tableOne with 90K rows and tableTwo with 100K rows. I am looking for duplicate numbers across both tables under the given conditions, and the matching must be 1:1: if a row has multiple matches in the other table, only one of them should be tagged as a match (given that both tables contain matching data).
I have the SELECT statement below, but when I run it on my local XAMPP, or even from CMD, the screen freezes after I press Enter, and it takes hours before it returns an out-of-memory error. I hope you can help me with this.
SELECT rNum,
cDate,
cTime,
aNumber,
bNumber,
duration,
tag,
aNumber2,
bNumber2,
'hasMatch',
concatDate,
timeMinutes
FROM tableOne a
LEFT JOIN
tableTwo b ON a.aNumber2 = b.aNumber2
AND a.bNumber2 = b.bNumber2
WHERE a.hasMatch = 'valid'
AND (a.duration - b.duration) <= 3
AND (a.duration - b.duration) >= -3
AND TIMEDIFF(a.concatDate,b.concatDate) <= 3
AND TIMEDIFF(a.concatDate,b.concatDate) >= -3
Thank you in advance.
If you have a 1:1 relationship between the two tables, then you should probably use INNER JOIN rather than LEFT JOIN.
Secondly, your query does not seem to be properly indexed. Run EXPLAIN SELECT ... to see the execution plan of the query, and create indexes for the columns used in the join and the filters.
In your SELECT you have aNumber2, and according to your join rule both table a and table b have an aNumber2 column. That is a problem: if two tables have a column with the same name, you must specify the table in the SELECT list.
For example:
SELECT a.aNumber2 as a_number2,....
The same problem exists in your query for other columns, such as duration and concatDate.
Another thing: in your case you should use INNER JOIN instead of LEFT JOIN.
If your final result has many rows (thousands), fetch them step by step: add LIMIT to your query and take 100 rows at a time.
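The point about qualifying column names can be illustrated with a small sketch. It runs against SQLite through Python's sqlite3 module rather than MySQL, the sample rows are invented, and ABS() is used as a shorthand for the pair of <= 3 and >= -3 comparisons in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableOne (rNum INTEGER, aNumber2 TEXT, bNumber2 TEXT, duration INTEGER);
    CREATE TABLE tableTwo (aNumber2 TEXT, bNumber2 TEXT, duration INTEGER);
    -- Invented sample data: only the first pair is within 3 seconds.
    INSERT INTO tableOne VALUES (1, '100', '200', 60), (2, '101', '201', 30);
    INSERT INTO tableTwo VALUES ('100', '200', 62), ('101', '201', 90);
""")

JOIN = """
    FROM tableOne a
    INNER JOIN tableTwo b
            ON a.aNumber2 = b.aNumber2 AND a.bNumber2 = b.bNumber2
    WHERE ABS(a.duration - b.duration) <= 3
"""

# Unqualified names that exist in both tables are rejected as ambiguous.
try:
    conn.execute("SELECT aNumber2, duration" + JOIN)
    ambiguous_error = None
except sqlite3.OperationalError as exc:
    ambiguous_error = str(exc)

# Qualifying each column with its table alias resolves the ambiguity.
rows = conn.execute(
    "SELECT a.rNum, a.aNumber2 AS a_number2, "
    "a.duration AS a_duration, b.duration AS b_duration" + JOIN
).fetchall()

print(ambiguous_error)
print(rows)
```

MySQL reports a similar "column ... is ambiguous" error for the unqualified form, so the fix is the same there.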

How to retrieve all data from two table in mysql? [duplicate]

For simplicity, assume all relevant fields are NOT NULL.
You can do:
SELECT
table1.this, table2.that, table2.somethingelse
FROM
table1, table2
WHERE
table1.foreignkey = table2.primarykey
AND (some other conditions)
Or else:
SELECT
table1.this, table2.that, table2.somethingelse
FROM
table1 INNER JOIN table2
ON table1.foreignkey = table2.primarykey
WHERE
(some other conditions)
Do these two work in the same way in MySQL?
INNER JOIN is ANSI syntax that you should use.
It is generally considered more readable, especially when you join lots of tables.
It can also be easily replaced with an OUTER JOIN whenever a need arises.
The WHERE syntax is more relational model oriented.
A result of two tables JOINed is a cartesian product of the tables to which a filter is applied which selects only those rows with joining columns matching.
It's easier to see this with the WHERE syntax.
As for your example, in MySQL (and in SQL generally) these two queries are synonyms.
Also, note that MySQL also has a STRAIGHT_JOIN clause.
Using this clause, you can control the JOIN order: which table is scanned in the outer loop and which one is in the inner loop.
You cannot control this in MySQL using WHERE syntax.
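That the two forms from the question return the same rows can be checked with a minimal sketch. It uses SQLite via Python's sqlite3 module instead of MySQL, and the sample rows are invented; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table2 (primarykey INTEGER PRIMARY KEY, that TEXT NOT NULL);
    CREATE TABLE table1 (this TEXT NOT NULL, foreignkey INTEGER NOT NULL);
    INSERT INTO table2 VALUES (1, 'x'), (2, 'y'), (3, 'z');
    -- Row 'd' points at a key that does not exist, so it never matches.
    INSERT INTO table1 VALUES ('a', 1), ('b', 1), ('c', 3), ('d', 9);
""")

# Implicit ("comma") join: the join condition lives in WHERE.
implicit = conn.execute("""
    SELECT table1.this, table2.that
    FROM table1, table2
    WHERE table1.foreignkey = table2.primarykey
    ORDER BY table1.this
""").fetchall()

# Explicit join: the join condition lives in ON.
explicit = conn.execute("""
    SELECT table1.this, table2.that
    FROM table1 INNER JOIN table2
         ON table1.foreignkey = table2.primarykey
    ORDER BY table1.this
""").fetchall()

print(implicit)
print(explicit)
```

Both queries produce the three matching rows; the unmatched row 'd' is dropped in either form.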
Others have pointed out that INNER JOIN helps human readability, and that's a top priority, I agree.
Let me try to explain why the join syntax is more readable.
A basic SELECT query is this:
SELECT stuff
FROM tables
WHERE conditions
The SELECT clause tells us what we're getting back; the FROM clause tells us where we're getting it from, and the WHERE clause tells us which ones we're getting.
JOIN is a statement about the tables, how they are bound together (conceptually, actually, into a single table).
Any query elements that control the tables - where we're getting stuff from - semantically belong to the FROM clause (and of course, that's where JOIN elements go). Putting joining-elements into the WHERE clause conflates the which and the where-from, that's why the JOIN syntax is preferred.
Applying conditional statements in ON / WHERE
Here I have explained the logical query processing steps.
Reference: Inside Microsoft® SQL Server™ 2005 T-SQL Querying
Publisher: Microsoft Press
Pub Date: March 07, 2006
Print ISBN-10: 0-7356-2313-9
Print ISBN-13: 978-0-7356-2313-2
Pages: 640
(8) SELECT (9) DISTINCT (11) TOP <top_specification> <select_list>
(1) FROM <left_table>
(3) <join_type> JOIN <right_table>
(2) ON <join_condition>
(4) WHERE <where_condition>
(5) GROUP BY <group_by_list>
(6) WITH {CUBE | ROLLUP}
(7) HAVING <having_condition>
(10) ORDER BY <order_by_list>
The first noticeable aspect of SQL that is different than other programming languages is the order in which the code is processed. In most programming languages, the code is processed in the order in which it is written. In SQL, the first clause that is processed is the FROM clause, while the SELECT clause, which appears first, is processed almost last.
Each step generates a virtual table that is used as the input to the following step. These virtual tables are not available to the caller (client application or outer query). Only the table generated by the final step is returned to the caller. If a certain clause is not specified in a query, the corresponding step is simply skipped.
Brief Description of Logical Query Processing Phases
Don't worry too much if the description of the steps doesn't seem to make much sense for now. These are provided as a reference. Sections that come after the scenario example will cover the steps in much more detail.
(1) FROM: A Cartesian product (cross join) is performed between the first two tables in the FROM clause, and as a result, virtual table VT1 is generated.
(2) ON: The ON filter is applied to VT1. Only rows for which the <join_condition> is TRUE are inserted to VT2.
(3) OUTER (join): If an OUTER JOIN is specified (as opposed to a CROSS JOIN or an INNER JOIN), rows from the preserved table or tables for which a match was not found are added to the rows from VT2 as outer rows, generating VT3. If more than two tables appear in the FROM clause, steps 1 through 3 are applied repeatedly between the result of the last join and the next table in the FROM clause until all tables are processed.
(4) WHERE: The WHERE filter is applied to VT3. Only rows for which the <where_condition> is TRUE are inserted to VT4.
(5) GROUP BY: The rows from VT4 are arranged in groups based on the column list specified in the GROUP BY clause. VT5 is generated.
(6) CUBE | ROLLUP: Supergroups (groups of groups) are added to the rows from VT5, generating VT6.
(7) HAVING: The HAVING filter is applied to VT6. Only groups for which the <having_condition> is TRUE are inserted to VT7.
(8) SELECT: The SELECT list is processed, generating VT8.
(9) DISTINCT: Duplicate rows are removed from VT8. VT9 is generated.
(10) ORDER BY: The rows from VT9 are sorted according to the column list specified in the ORDER BY clause. A cursor is generated (VC10).
(11) TOP: The specified number or percentage of rows is selected from the beginning of VC10. Table VT11 is generated and returned to the caller.
Therefore, the ON filter of an INNER JOIN logically reduces the row count of the virtual table before the WHERE clause is applied. Subsequent joins then operate on the already filtered data, which can improve performance. Only after that does the WHERE clause apply its filter conditions.
(In some cases it makes little difference whether a condition is placed in ON or in WHERE. This depends on how many tables you have joined and on the number of rows in each joined table.)
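The ON, OUTER and WHERE phases described above become visible when the same condition is moved between ON and WHERE. The sketch below is an illustration only: it uses SQLite through Python's sqlite3 module, and the customers and orders tables with their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, state TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT);
    INSERT INTO customers VALUES (1, 'NY'), (2, 'CA');
    INSERT INTO orders VALUES (10, 1, 'open'), (11, 1, 'closed'), (12, 2, 'closed');
""")


def run(sql):
    return conn.execute(sql).fetchall()


# INNER JOIN: the condition gives the same rows in ON and in WHERE.
inner_on = run("""
    SELECT c.id, o.id FROM customers c
    INNER JOIN orders o ON o.customer_id = c.id AND o.status = 'open'
    ORDER BY c.id, o.id
""")
inner_where = run("""
    SELECT c.id, o.id FROM customers c
    INNER JOIN orders o ON o.customer_id = c.id
    WHERE o.status = 'open'
    ORDER BY c.id, o.id
""")

# LEFT JOIN: ON filters before the outer rows are added back,
# WHERE filters after, so customer 2 survives only in the first query.
left_on = run("""
    SELECT c.id, o.id FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id AND o.status = 'open'
    ORDER BY c.id, o.id
""")
left_where = run("""
    SELECT c.id, o.id FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    WHERE o.status = 'open'
    ORDER BY c.id, o.id
""")

print(inner_on, inner_where)
print(left_on, left_where)
```

For inner joins the placement is a matter of readability; for outer joins it changes the result, which is why the logical order of the phases matters.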
The implicit join ANSI syntax is older, less obvious, and not recommended.
In addition, the relational algebra allows interchangeability of the predicates in the WHERE clause and the INNER JOIN, so even INNER JOIN queries with WHERE clauses can have the predicates rearranged by the optimizer.
I recommend you write the queries in the most readable way possible.
Sometimes this includes making the INNER JOIN relatively "incomplete" and putting some of the criteria in the WHERE simply to make the lists of filtering criteria more easily maintainable.
For example, instead of:
SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
ON ca.CustomerID = c.CustomerID
AND c.State = 'NY'
INNER JOIN Accounts a
ON ca.AccountID = a.AccountID
AND a.Status = 1
Write:
SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
ON ca.CustomerID = c.CustomerID
INNER JOIN Accounts a
ON ca.AccountID = a.AccountID
WHERE c.State = 'NY'
AND a.Status = 1
But it depends, of course.
Implicit joins (which is what your first query is known as) become much much more confusing, hard to read, and hard to maintain once you need to start adding more tables to your query. Imagine doing that same query and type of join on four or five different tables ... it's a nightmare.
Using an explicit join (your second example) is much more readable and easy to maintain.
I'll also point out that the older syntax is more prone to error. If you use inner joins without an ON clause, most databases give you a syntax error (MySQL is an exception: it accepts the query and treats the join as a cross join). If you use the older syntax and forget one of the join conditions in the WHERE clause, you will get a cross join. Developers often "fix" this by adding the DISTINCT keyword (rather than fixing the join, because they still don't realize the join itself is broken), which may appear to cure the problem but will slow down the query considerably.
Additionally for maintenance if you have a cross join in the old syntax, how will the maintainer know if you meant to have one (there are situations where cross joins are needed) or if it was an accident that should be fixed?
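The accidental cross join, and the way DISTINCT hides it, can be reproduced in a few lines. This is a sketch with invented authors and books tables, run against SQLite through Python's sqlite3 module rather than MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);
    INSERT INTO authors VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO books VALUES (1, 'B1', 1), (2, 'B2', 1), (3, 'B3', 2);
""")

# Correct implicit join: one row per book.
joined = conn.execute("""
    SELECT COUNT(*) FROM authors a, books b
    WHERE b.author_id = a.id
""").fetchone()[0]

# Join condition forgotten: every author is paired with every book.
crossed = conn.execute("""
    SELECT COUNT(*) FROM authors a, books b
""").fetchone()[0]

# DISTINCT makes the output look plausible again, but the database
# still has to build the full cross product first.
masked = conn.execute("""
    SELECT DISTINCT a.name FROM authors a, books b
    ORDER BY a.name
""").fetchall()

print(joined, crossed, masked)
```

With 2 authors and 3 books the difference is only 3 rows against 6, but the cross product grows multiplicatively with table size.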
Let me point you to this question to see why the implicit syntax is bad if you use left joins.
Sybase *= to Ansi Standard with 2 different outer tables for same inner table
Plus (personal rant here), the standard using the explicit joins is over 20 years old, which means implicit join syntax has been outdated for those 20 years. Would you write application code using a syntax that has been outdated for 20 years? Why do you want to write database code that is?
The SQL:2003 standard changed some precedence rules so that a JOIN statement takes precedence over a "comma" join. This can actually change the results of your query, depending on how it is set up. This caused problems for some people when MySQL 5.0.12 switched to adhering to the standard.
So in your example, your queries would work the same. But if you added a third table:
SELECT ... FROM table1, table2 JOIN table3 ON ... WHERE ...
Prior to MySQL 5.0.12, table1 and table2 would be joined first, then table3. Now (5.0.12 and on), table2 and table3 are joined first, then table1. It doesn't always change the results, but it can and you may not even realize it.
I never use the "comma" syntax anymore, opting for your second example. It's a lot more readable anyway, the JOIN conditions are with the JOINs, not separated into a separate query section.
They have a different human-readable meaning.
However, depending on the query optimizer, they may have the same meaning to the machine.
You should always code to be readable.
That is to say, if this is a built-in relationship, use the explicit join. If you are matching on weakly related data, use the WHERE clause.
I know you're talking about MySQL, but anyway:
In Oracle 9 explicit joins and implicit joins would generate different execution plans. AFAIK that has been solved in Oracle 10+: there's no such difference anymore.
If you are often programming dynamic stored procedures, you will fall in love with your first example (using WHERE). If you have various input parameters and lots of morph mess, then that is the only way. Otherwise, they both will run the same query plan, so there is definitely no obvious difference in classic queries.
ANSI join syntax is definitely more portable.
I'm going through an upgrade of Microsoft SQL Server, and I would also mention that the =* and *= syntax for outer joins in SQL Server is not supported (without compatibility mode) for 2005 SQL server and later.
I have two points in favor of the implicit join (the first example):
Tell the database what you want, not what it should do.
You can write all tables in a clear list that is not cluttered by join conditions. Then you can much easier read what tables are all mentioned. The conditions come all in the WHERE part, where they are also all lined up one below the other. Using the JOIN keyword mixes up tables and conditions.

MySQL update join performance

I need to update several columns in one table, based on columns in another. To start with, I am just updating one of them. I have tried two ways of doing this, which both work, but they take about 4 minutes when run as MySQL commands, and over 20 minutes when run from PHP. Both tables are about 20,000 rows long.
My question is, is there a better or more efficient way of doing this?
Method 1:
UPDATE table_a,table_b
SET table_a.price = table_b.price
WHERE table_a.product_code=table_b.product_code
Method 2:
UPDATE table_a INNER JOIN table_b
ON table_a.product_code = table_b.product_code
SET table_a.price=table_b.price
I guess that these basically work in the same way, but I thought that the join would be more efficient. The product_code column is random text, albeit unique and every row matches one in the other table.
Anything else I can try?
Thanks
UPDATE: This was resolved by creating an index e.g.
CREATE UNIQUE INDEX index_code on table_a (product_code)
CREATE UNIQUE INDEX index_code on table_b (product_code)
If your queries are running slowly you'll have to examine the data that query is using.
Your query looks like this:
UPDATE table_a INNER JOIN table_b
ON table_a.product_code = table_b.product_code
SET table_a.price=table_b.price
In order to see where the delay is you can do
EXPLAIN SELECT a.price, b.price FROM table_b b
INNER JOIN table_a a ON (a.product_code = b.product_code)
This will tell you if indexes are being used, see the info on EXPLAIN and more info here.
In your case you don't have any indexes (possible_keys = NULL), which forces MySQL to do a full table scan.
You should always do an explain select on your queries when slowness is an issue. You'll have to convert non-select queries to a select, but that's not difficult, just list all the changed fields in the select clause and copy join and where clauses over as is.
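As a rough illustration of rewriting the UPDATE as a SELECT and inspecting its plan, the sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module as a stand-in for MySQL's EXPLAIN. The sample rows are invented, and the index names get _a and _b suffixes because SQLite, unlike MySQL, needs index names to be unique across tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (product_code TEXT, price REAL);
    CREATE TABLE table_b (product_code TEXT, price REAL);
""")
rows = [(f"code-{i}", float(i)) for i in range(500)]
conn.executemany("INSERT INTO table_a VALUES (?, 0)", [(code,) for code, _ in rows])
conn.executemany("INSERT INTO table_b VALUES (?, ?)", rows)

# SELECT form of the UPDATE: same join, changed columns in the select list.
SELECT_FORM = """
    SELECT a.price, b.price
    FROM table_b b
    INNER JOIN table_a a ON a.product_code = b.product_code
"""


def plan(connection, sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN for the statement."""
    return [row[3] for row in connection.execute("EXPLAIN QUERY PLAN " + sql)]


plan_before = plan(conn, SELECT_FORM)  # no permanent index is available
conn.execute("CREATE UNIQUE INDEX index_code_a ON table_a (product_code)")
conn.execute("CREATE UNIQUE INDEX index_code_b ON table_b (product_code)")
plan_after = plan(conn, SELECT_FORM)   # one of the new indexes drives the lookup

print(plan_before)
print(plan_after)
```

In MySQL the same comparison is made by reading the possible_keys and key columns of the EXPLAIN output before and after creating the indexes.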

MySQL query execution time is too slow

SELECT * FROM articles t LEFT OUTER JOIN category_type category ON (t.category_id=category.id)
WHERE (t.status = 6 AND t.publish_on <= '2014-02-14' AND t.id NOT IN (13112,9490,9386,6045,1581,1034,991,933,879,758) AND t.category_id IN (14)) ORDER BY t.id DESC LIMIT 7;
It takes more than 1.5 seconds to execute this query.
Can you give me some ideas? How can I improve this query and minimize its execution time?
First, do not expect a speedup from rewriting the join as a WHERE condition: the optimizer treats the two forms alike, and this query uses a LEFT OUTER JOIN anyway.
Second, add indexes on the frequently searched columns. In your example you filter on status, publish_on and category_id, in addition to id, which is the primary key.
If you are using MySQL, you can try the "Propose table structure" option in phpMyAdmin, which can help you choose suitable data types for your columns. This could help you optimize query processing.
Query processing time depends on many things, such as the database server load, the amount of data in the table, and the data types used for the columns.
why join it with category table?, the category table is not in the where clause nor in the select column clause, so why add it in the query?
oops, * was used, so it "is" in the category table
apologies

reading data via PDO - the same row is brought twice

I read rows from an MSSQL table via PHP's PDO.
Some rows are returned twice: exactly the same rows, with exactly the same id values.
This happens to specific rows. Each time I run my import script, the issue occurs on the very same rows. For example, after fetching some 16,000 rows correctly, one row, the same one each time, is returned twice.
The duplicates are consecutive: the row is fetched, and the next fetch() call returns the very same row.
When I run:
select * from MY_TABLE where id='the problematic id'
only one row is returned, not two
Any ideas what (the hell) can go on here?
Thank you very much guys
edit:
The query that is being run:
select o.accountid, c.contactid, o.opportunityid, o.createdate, o.modifydate, o.createuser, o.modifyuser, o.description, o.projclosedate, o.notes, o.accountmanagerid
from sysdba.opportunity o
left join sysdba.opportunity_contact oc on o.opportunityid = oc.opportunityid and oc.salesrole = 'speaker'
left join sysdba.contact c on c.contactid = oc.contactid
where o.status <> 'Inactive'
order by o.opportunityid asc;
I think you need to join your contact table to your opportunity table. It seems that you might not have a 1 to 1 mapping between those tables the way you have it set up. See below:
--This should reference the "o" table but it doesn't.
left join sysdba.contact c on c.contactid = oc.contactid
If that's not the case then you should really be joining around the opportunity_contact table instead (put it as your 'from' table).
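One way such consecutive duplicates can arise is a one-to-many match in the joined table. The sketch below assumes, purely for illustration, that an opportunity is linked to the same speaker contact twice in opportunity_contact; the data is invented, the schema is reduced to the join columns, and SQLite via Python's sqlite3 module stands in for the MSSQL database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE opportunity (opportunityid TEXT, status TEXT);
    CREATE TABLE opportunity_contact (opportunityid TEXT, contactid TEXT, salesrole TEXT);
    CREATE TABLE contact (contactid TEXT);
    INSERT INTO opportunity VALUES ('O1', 'Open'), ('O2', 'Open');
    INSERT INTO contact VALUES ('C1');
    -- Assumed data problem: O1 is linked to the same speaker twice.
    INSERT INTO opportunity_contact VALUES ('O1', 'C1', 'speaker'), ('O1', 'C1', 'speaker');
""")

# Same join shape as the query in the question, with fewer columns.
rows = conn.execute("""
    SELECT o.opportunityid, c.contactid
    FROM opportunity o
    LEFT JOIN opportunity_contact oc
           ON o.opportunityid = oc.opportunityid AND oc.salesrole = 'speaker'
    LEFT JOIN contact c ON c.contactid = oc.contactid
    WHERE o.status <> 'Inactive'
    ORDER BY o.opportunityid
""").fetchall()

# Diagnostic: which opportunities have more than one speaker link?
suspects = conn.execute("""
    SELECT opportunityid, COUNT(*)
    FROM opportunity_contact
    WHERE salesrole = 'speaker'
    GROUP BY opportunityid
    HAVING COUNT(*) > 1
""").fetchall()

print(rows)
print(suspects)
```

Running the diagnostic query against the real opportunity_contact table for the problematic id would show whether this is what happens there; selecting from the base table alone returns one row because the duplication only appears after the join.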
