I am building a web application for a customer and I have a problem with a particular SQL query.
The query is:
select order_header.order_no,
order_header.purchase_order_no,
order_header.entry_date,
order_header.delivery_date,
order_totals.total_quantity
from order_header,
order_totals
where order_header.order_no = order_totals.order_no
I have done some troubleshooting and this:
where order_header.order_no = order_totals.order_no
is the problem. The SQL query with this line takes 35 seconds (causes DataTables to even time out at times) and without it it is instant.
So, I know the problem but I'm not a DBA so don't know the solution.
It's not my database, so the solution I need to send to the DBA to sort so I can continue with my job. Something like
"Hey, would you mind doing A on B table so that C speeds up?"
I just don't know what actually needs to be done!
First add an index on both order_header.order_no and order_totals.order_no and check that both the columns are of the same type.
For other optimizations we should talk about the data.
Don't forget to update the statistics
Ask your DB to
add an index on the order_no column in both order_header and order_total
That should help with your current problem. Also, look up JOIN and change your query to use the JOIN syntax.
select order_header.order_no,
order_header.purchase_order_no,
order_header.entry_date,
order_header.delivery_date,
order_totals.total_quantity
from order_header
join order_totals ON order_header.order_no = order_totals.order_no
Related
I've been working on this for some hours now and it's getting tiring. I want to get users posts from people I follow and myself, just like twitter does.
So there's a table users_posts that has all users' posts with column user_id to determine who made the post.
Another table users_followers that contains all followers ie. user_id follows following_id
So here's my query:
User ID = 2271
SELECT users_followers.following_id AS following_id, users_posts.id
FROM users_followers, users_posts
WHERE users_posts.user_id = users_followers.following_id
AND (users_followers.user_id =2271)
This query works but the problem is, it's kinda slow. Is there a faster way to do this?
Also, as you can see, I can only get posts from those I follow, and not myself.
Any help?
If I'm understanding your tables properly, I would do this with an explicit JOIN. I'm not sure how much that would speed things up versus the implicit JOIN you're using though.
SELECT following_id, p.id
FROM users_followers f
LEFT JOIN users_posts p
ON (p.user_id = f.following_id)
WHERE f.user_id = 2271
Chances are, adding an index or two to your tables would help.
Using the MySQL EXPLAIN command (just put it in front of your SELECT query) will show the indexes, temporary tables, and other resources that are used for a query. It should give you an idea when one query is faster or more efficient than another.
Your query is fine as written, provided you have properly indexed your tables. At a minimum you need an index on users_posts.user_id and users_followers.following_id.
A query can also be slowed by large numbers of records, even when it is fully indexed. In that case, I find phpmyadmin to be an invaluable tool. From the server page (maybe localhost) select the Status tab to see a wealth of information about how your mysql server is performing and suggestions for how to improve it.
I am working on an timesheet application, and writing a PHP code to fetch all the timesheets till date. This is the query that I have written to fetch the timesheets -
SELECT a.accnt_name, u.username, DATE_FORMAT(t.in_time, '%H:%i') inTime, DATE_FORMAT(t.out_time, '%H:%i') outTime, DATE_FORMAT(t.work_time, '%H:%i') workTime, w.wrktyp_name, t.remarks, DATE_FORMAT(t.tmsht_date, '%d-%b-%Y') tmshtDate, wl.loctn_name, s.serv_name, t.status_code, t.conv_kms convkms, t.conv_amount convamount FROM timesheets t, accounts a, services s, worktypes w, work_location wl, users WHERE a.accnt_code=t.accnt_code and w.wrktyp_code=t.wrktyp_code and wl.loctn_code=t.loctn_code and s.serv_code=t.serv_code and t.usr_code = u. ORDER BY tmsht_date desc
The where clause contains the clauses to get the actual values of respective codes from respective tables.
The issue is that this query is taking a lot of time to execute and the application crashes at the end of few minutes.
I ran this query in the phpmyadmin, there it works without any issues.
Need help in understanding what might be the cause behind the slowness in the execution.
Use EXPLAIN to see the execution plan for the query. Make sure MySQL has suitable indexes available, and is using those indexes.
The query text seems to be missing the name of a column here...
t.usr_code = u. ORDER
^^^
We can "guess" that's supposed to be u.usr_code, but that's just a guess.
How many rows are supposed to be returned? How large is the resultset?
Is your client attempting to "store" all of the rows in memory, and crashing because it runs out of memory?
If so, I recommend you avoid doing that, and fetch the rows as you need them.
Or, consider adding some additional predicates in the WHERE clause to return just the rows you need, rather than all the rows in the table.
It's 2015. Time to ditch the old-school comma syntax for join operation, and use JOIN keyword instead, and move join predicates from the WHERE clause to the ON clause. And format it. The database doesn't care, but it will make it easier on the poor soul that needs to decipher your SQL statement.
SELECT a.accnt_name
, u.username
, DATE_FORMAT(t.in_time ,'%H:%i') AS inTime
, DATE_FORMAT(t.out_time ,'%H:%i') AS outTime
, DATE_FORMAT(t.work_time,'%H:%i') AS workTime
, w.wrktyp_name
, t.remarks
, DATE_FORMAT(t.tmsht_date, '%d-%b-%Y') AS tmshtDate
, wl.loctn_name
, s.serv_name
, t.status_code
, t.conv_kms AS convkms
, t.conv_amount AS convamount
FROM timesheets t
JOIN accounts a
ON a.accnt_code = t.accnt_code
JOIN services s
ON s.serv_code = t.serv_code
JOIN worktypes w
ON w.wrktyp_code = t.wrktyp_code
JOIN work_location wl
ON wl.loctn_code = t.loctn_code
JOIN users
ON u.usr_code = t.usr_code
ORDER BY t.tmsht_date DESC
Ordering on the formatted date column is very odd. Much more likely you want results returned in "date" order, not in the string order with month and day before the year. (Do you really want to sort on the day value first, before the year?)
FOLLOWUP
If this same exact query complete quickly, with the entire resultset (of approx 720 rows) from a different client (same database, same user), then the issue is likely something other than this SQL statement.
We would not expect the execution of the SQL statement to cause PHP to "crash".
If you are storing the entire resultset (for example, using mysqli store_result), you need to have sufficient memory for that. But the thirteen expressions in the select list all look relatively short (formatted dates, names and codes), and we wouldn't expect "remarks" would be over a couple of KB.
For debugging this, as others have suggested, try adding a LIMIT clause on the query, e.g. LIMIT 1 and observe the behavior.
Alternatively, use a dummy query for testing; use a query that is guaranteed to return specific values and a specific number of rows.
SELECT 'morpheus' AS accnt_name
, 'trinity' AS username
, '01:23' AS inTime
, '04:56' AS outTime
, '00:45' AS workTime
, 'neo' AS wrktyp_name
, 'yada yada yada' AS remarks
, '27-May-2015' AS tmshtDate
, 'zion' AS loctn_name
, 'nebuchadnezzar' AS serv_name
, '' AS status_code
, '123' AS convkms
, '5678' AS convamount
I suspect that the query is not the root cause of the behavior you are observing. I suspect The problem is somewhere else in the code.
How to debug small programs http://ericlippert.com/2014/03/05/how-to-debug-small-programs/
phpadmin automatically adds LIMIT to the query, that's why you got fast results.
Check how many rows are in table
Run your query with limit
First of all: modify you query so that it looks like the one given by Spencer
Do you get an error message when your application 'crashes' or does it just stop?
You could try:
ini_set('max_execution_time', 0);
in your php code. This sets the maximum execution time to unlimited. So if there are no errors, your script should execute to the end. So you can see if your query gets the desired results.
Also just as a test end your query with
LIMIT 10
This should greatly speed up your query as it will only take the first ten results.
You can later change this value to one better suited for your needs. Unless you absolutely need the complete result set, I suggest you always use LIMIT in your queries.
I am running a select * from table order by date desc query using php on a mysql db server, where the table has a lot of records, which slows down the response time.
So, is there any way to speed up the response. If indexing is the answer, what all columns should I make indexes.
An index speeds up searching when you have a WHERE clause or do a JOIN with fields you have indexed. In your case you don't do that: You select all entries in the table. So using an index won't help you.
Are you sure you need all of the data in that table? When you later filter, search or aggregate this data in PHP, you should look into ways to do that in SQL so that the database sends less data to PHP.
you need to use caching system.
the best i know Memcache It's really great to speed up your application and it's not using database at all.
Simple answer: you can't speed anything up using software.
Reason: you're selecting entire contents of a table and you said it's a large table.
What you could do is cache the data, but not using Memcache because it's got a limit on how much data it can cache (1 MB per key), so if your data exceeds that - good luck using Memcache to cache a huge result set without coming up with an efficient scheme of maintaining keys and values.
Indexing won't help because you haven't got a WHERE clause, what could happen is that you can speed up the order by clause slightly. Use EXPLAIN EXTENDED before your query to see how much time is being spent in transmitting the data over the network and how much time is being spent in retrieving and sorting the data from the query.
If your application requires a lot of data in order for it to work, then you have these options:
Get a better server that can push the data faster
Redesign your application because if it requires so much data in order to run, it might not be designed with efficiency in mind
Optimizing Query is a big topic and beyond the scope this question
here are some highlight that will boost you select statement
Use proper Index
Limit the number records
use the column name that you require (instead writing select * from table use select col1, col2 from table)
to limit query for large offset is little tricky in mysql
this select statement for large offset will be slow because it have to process large set of data
SELECT * FROM table order by whatever LIMIT m, n;
to optimize this query here is simple solution
select A.* from table A
inner join (select id from table order by whatever limit m, n) B
on A.id = B.id
order by A.whatever
I have a query that works, but it's taking at least 3 seconds to run so I think it can probably be faster. It's used to populate a list of new threads and show how many unread posts there are in each thread. I generate the query string before throwing it into $db->query_read(). In order to only grab results from valid forums, $ids is string with up to 50 values separated by commas.
The userthreadviews table has existed for 1 week and there are roughly 9,500 rows in it. I'm not sure if I need to set up a cron job to regularly clear out thread views more than a week old, or if I will be fine letting it grow.
Here's the query as it currently stands:
SELECT
`thread`.`title` AS 'r_title',
`thread`.`threadid` AS 'r_threadid',
`thread`.`forumid` AS 'r_forumid',
`thread`.`lastposter` AS 'r_lastposter',
`thread`.`lastposterid` AS 'r_lastposterid',
`forum`.`title` AS 'f_title',
`thread`.`replycount` AS 'r_replycount',
`thread`.`lastpost` AS 'r_lastpost',
`userthreadviews`.`replycount` AS 'u_replycount',
`userthreadviews`.`id` AS 'u_id',
`thread`.`postusername` AS 'r_postusername',
`thread`.`postuserid` AS 'r_postuserid'
FROM
`thread`
INNER JOIN
`forum`
ON (`thread`.`forumid` = `forum`.`forumid`)
LEFT JOIN
(`userthreadviews`)
ON (`thread`.`threadid` = `userthreadviews`.`threadid`
AND `userthreadviews`.`userid`=$userid)
WHERE
`thread`.`forumid` IN($ids)
AND `thread`.`visible`=1
AND `thread`.`lastpost`> time() - 604800
ORDER BY `thread`.`lastpost` DESC LIMIT 0, 30
An alternate query that joins the post table (to only show threads where user has posted) is actually twice as fast, so I think there's got to be something in here that could be changed to speed it up. Could someone provide some advice?
Edit: Sorry, I had put the EXPLAIN in front of the alternate query. Here is the correct output:
As Requested, here is the output generated by EXPLAIN SELECT:
Have a look at the mysql explain statement. It gives you a execution plan of your query.
Once you know the plan, you can check if you have got a index on the fields involved in the plan. If not, create them.
Perhaps the plan reveals details about how the query can be written in another way, such that the query will be more optimized.
To have no indexes on joins / where (used key = NULL on explain), this is the reason why your queries are slow. You should index them in such a way :
CREATE INDEX thread_forumid_index ON thread(forumid);
CREATE INDEX userthreadviews_forumid_index ON userthreadviews(forumid);
Documentation here
Try to index the table forumid if it is not indexed
Suggestions:
move the conditions from the WHERE clause to the JOIN clause
put the JOIN with the conditions before the other JOIN
make sure you have proper indexes and that they are being used in the query (create the ones you'll need... too much indexes can be as bad as too few)
Here is my suggestion for the query:
SELECT
`thread`.`title` AS 'r_title',
`thread`.`threadid` AS 'r_threadid',
`thread`.`forumid` AS 'r_forumid',
`thread`.`lastposter` AS 'r_lastposter',
`thread`.`lastposterid` AS 'r_lastposterid',
`forum`.`title` AS 'f_title',
`thread`.`replycount` AS 'r_replycount',
`thread`.`lastpost` AS 'r_lastpost',
`userthreadviews`.`replycount` AS 'u_replycount',
`userthreadviews`.`id` AS 'u_id',
`thread`.`postusername` AS 'r_postusername',
`thread`.`postuserid` AS 'r_postuserid'
FROM
`thread`
INNER JOIN (`forum`)
ON ((`thread`.`visible` = 1)
AND (`thread`.`lastpost` > $time)
AND (`thread`.`forumid` IN ($ids))
AND (`thread`.`forumid` = `forum`.`forumid`))
LEFT JOIN (`userthreadviews`)
ON ((`thread`.`threadid` = `userthreadviews`.`threadid`)
AND (`userthreadviews`.`userid` = $userid))
ORDER BY
`thread`.`lastpost` DESC
LIMIT
0, 30
These are good candidates to be indexed:
- `forum`.`forumid`
- `userthreadviews`.`threadid`
- `userthreadviews`.`userid`
- `thread`.`forumid`
- `thread`.`threadid`
- `thread`.`visible`
- `thread`.`lastpost`
It seems you already have lots of indexes... so, make sure you keep the ones you really need and remove the useless ones.
I have a page that is taking 37 seconds to load. While it is loading it pegs MySQL's CPU usage through the roof. I did not write the code for this page and it is rather convoluted so the reason for the bottleneck is not readily apparent to me.
I profiled it (using kcachegrind) and find that the bulk of the time on the page is spent doing MySQL queries (90% of the time is spent in 25 different mysql_query calls).
The queries take the form of the following with the tag_id changing on each of the 25 different calls:
SELECT * FROM tbl_news WHERE news_id
IN (select news_id from
tbl_tag_relations WHERE tag_id = 20)
Each query is taking around 0.8 seconds to complete with a few longer delays thrown in for good measure... thus the 37 seconds to completely load the page.
My question is, is it the way the query is formatted with that nested select that is causing the problem? Or could it be any one of a million other things? Any advice on how to approach tackling this slowness is appreciated.
Running EXPLAIN on the query gives me this (but I'm not clear on the impact of these results... the NULL on primary key looks like it would be bad, yes? The number of results returned seems high to me as well as only a handful of results are returned in the end):
1 PRIMARY tbl_news ALL NULL NULL NULL NULL 1318 Using where
2 DEPENDENT SUBQUERY tbl_tag_relations ref FK_tbl_tag_tags_1 FK_tbl_tag_tags_1 4 const 179 Using where
I'e addressed this point in Database Development Mistakes Made by AppDevelopers. Basically, favour joins to aggregation. IN isn't aggregation as such but the same principle applies. A good optimize will make these two queries equivalent in performance:
SELECT * FROM tbl_news WHERE news_id
IN (select news_id from
tbl_tag_relations WHERE tag_id = 20)
and
SELECT tn.*
FROM tbl_news tn
JOIN tbl_tag_relations ttr ON ttr.news_id = tn.news_id
WHERE ttr.tag_id = 20
as I believe Oracle and SQL Server both do but MySQL doesn't. The second version is basically instantaneous. With hundreds of thousands of rows I did a test on my machine and got the first version to sub-second performance by adding appropriate indexes. The join version with indexes is basically instantaneous but even without indexes performs OK.
By the way, the above syntax I use is the one you should prefer for doing joins. It's clearer than putting them in the WHERE clause (as others have suggested) and the above can do certain things in an ANSI SQL way with left outer joins that WHERE conditions can't.
So I would add indexes on the following:
tbl_news (news_id)
tbl_tag_relations (news_id)
tbl_tag_relations (tag_id)
and the query will execute almost instantaneously.
Lastly, don't use * to select all the columns you want. Name them explicitly. You'll get into less trouble as you add columns later.
The SQL Query itself is definitely your bottleneck. The query has a sub-query in it, which is the IN(...) portion of the code. This is essentially running two queries at once. You can likely halve (or more!) your SQL times with a JOIN (similar to what d03boy mentions above) or a more targeted SQL query. An example might be:
SELECT *
FROM tbl_news, tbl_tag_relations
WHERE tbl_tag_relations.tag_id = 20 AND
tbl_news.news_id = tbl_tag_relations.news_id
To help SQL run faster you also want to try to avoid using SELECT *, and only select the information you need; also put a limiting statement at the end. eg:
SELECT news_title, news_body
...
LIMIT 5;
You also will want to look into the database schema itself. Make sure you are indexing all of the commonly referred to columns so that the queries will run faster. In this case, you probably want to check your news_id and tag_id fields.
Finally, you will want to take a look at the PHP code and see if you can make one single all-encompassing SQL query instead of iterating through several seperate queries. If you post more code we can help with that, and it will probably be the single greatest time savings for your posted problem. :)
If I understand correctly, this is just listing the news stories for a specific set of tags.
First of all, you really shouldn't
ever SELECT *
Second, this can probably be
accomplished within a single query,
thus reducing the overhead cost of
multiple queries. It seems like it
is getting fairly trivial data so
it could be retrieved within a
single call instead of 20.
A better approach to using IN might be to use a JOIN with a WHERE condition instead. When using an IN it will basically be a lot of OR statements.
Your tbl_tag_relations should definitely have an index on tag_id
select *
from tbl_news, tbl_tag_relations
where
tbl_tag_relations.tag_id = 20 and
tbl_news.news_id = tbl_tag_relations.news_id
limit 20
I think this gives the same results, but I'm not 100% sure. Sometimes simply limiting the results helps.
Unfortunately MySQL doesn't do very well with uncorrelated subqueries like your case shows. The plan is basically saying that for every row on the outer query, the inner query will be performed. This will get out of hand quickly. Rewriting as a plain old join as others have mentioned will work around the problem but may then cause the undesired affect of duplicate rows.
For instance the original query would return 1 row for each qualifying row in the tbl_news table but this query:
SELECT news_id, name, blah
FROM tbl_news n
JOIN tbl_tag_relations r ON r.news_id = n.news_id
WHERE r.tag_id IN (20,21,22)
would return 1 row for each matching tag. You could stick DISTINCT on there which should only have a minimal performance impact depending on the size of the dataset.
Not to troll too badly, but most other databases (PostgreSQL, Firebird, Microsoft, Oracle, DB2, etc) would handle the original query as an efficient semi-join. Personally I find the subquery syntax to be much more readable and easier to write, especially for larger queries.