I have a query that works, but it's taking at least 3 seconds to run, so I think it can probably be faster. It's used to populate a list of new threads and show how many unread posts there are in each thread. I generate the query string before passing it to $db->query_read(). In order to only grab results from valid forums, $ids is a string with up to 50 values separated by commas.
The userthreadviews table has existed for 1 week and there are roughly 9,500 rows in it. I'm not sure if I need to set up a cron job to regularly clear out thread views more than a week old, or if I will be fine letting it grow.
Here's the query as it currently stands:
SELECT
`thread`.`title` AS 'r_title',
`thread`.`threadid` AS 'r_threadid',
`thread`.`forumid` AS 'r_forumid',
`thread`.`lastposter` AS 'r_lastposter',
`thread`.`lastposterid` AS 'r_lastposterid',
`forum`.`title` AS 'f_title',
`thread`.`replycount` AS 'r_replycount',
`thread`.`lastpost` AS 'r_lastpost',
`userthreadviews`.`replycount` AS 'u_replycount',
`userthreadviews`.`id` AS 'u_id',
`thread`.`postusername` AS 'r_postusername',
`thread`.`postuserid` AS 'r_postuserid'
FROM
`thread`
INNER JOIN
`forum`
ON (`thread`.`forumid` = `forum`.`forumid`)
LEFT JOIN
(`userthreadviews`)
ON (`thread`.`threadid` = `userthreadviews`.`threadid`
AND `userthreadviews`.`userid`=$userid)
WHERE
`thread`.`forumid` IN($ids)
AND `thread`.`visible`=1
AND `thread`.`lastpost` > time() - 604800
ORDER BY `thread`.`lastpost` DESC LIMIT 0, 30
An alternate query that joins the post table (to only show threads where the user has posted) is actually twice as fast, so I think there's got to be something in here that could be changed to speed it up. Could someone provide some advice?
Edit: Sorry, I had put the EXPLAIN in front of the alternate query. Here is the correct output:
As requested, here is the output generated by EXPLAIN SELECT:
Have a look at the MySQL EXPLAIN statement. It gives you the execution plan of your query.
Once you know the plan, you can check whether you have an index on the fields involved in it. If not, create them.
The plan may also reveal ways the query could be rewritten so that it is better optimized.
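For example, you just prefix the query with the EXPLAIN keyword; the key and rows columns in the output show which index (if any) is used and roughly how many rows are examined. A minimal sketch against the tables from the question:

EXPLAIN SELECT `thread`.`threadid`, `thread`.`title`
FROM `thread`
INNER JOIN `forum` ON `thread`.`forumid` = `forum`.`forumid`
WHERE `thread`.`forumid` IN (1, 2, 3)
AND `thread`.`visible` = 1;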
Having no indexes on the join / WHERE columns (key = NULL in the EXPLAIN output) is the reason why your queries are slow. You should index them along these lines (the second index covers the columns this query actually joins on, threadid and userid):
CREATE INDEX thread_forumid_index ON thread(forumid);
CREATE INDEX userthreadviews_threadid_userid_index ON userthreadviews(threadid, userid);
See the MySQL documentation on CREATE INDEX.
Try adding an index on the forumid column if it is not indexed.
Suggestions:
move the conditions from the WHERE clause to the JOIN clause
put the JOIN with the conditions before the other JOIN
make sure you have proper indexes and that they are being used in the query (create the ones you'll need... too many indexes can be as bad as too few)
Here is my suggestion for the query:
SELECT
`thread`.`title` AS 'r_title',
`thread`.`threadid` AS 'r_threadid',
`thread`.`forumid` AS 'r_forumid',
`thread`.`lastposter` AS 'r_lastposter',
`thread`.`lastposterid` AS 'r_lastposterid',
`forum`.`title` AS 'f_title',
`thread`.`replycount` AS 'r_replycount',
`thread`.`lastpost` AS 'r_lastpost',
`userthreadviews`.`replycount` AS 'u_replycount',
`userthreadviews`.`id` AS 'u_id',
`thread`.`postusername` AS 'r_postusername',
`thread`.`postuserid` AS 'r_postuserid'
FROM
`thread`
INNER JOIN (`forum`)
ON ((`thread`.`visible` = 1)
AND (`thread`.`lastpost` > $time)
AND (`thread`.`forumid` IN ($ids))
AND (`thread`.`forumid` = `forum`.`forumid`))
LEFT JOIN (`userthreadviews`)
ON ((`thread`.`threadid` = `userthreadviews`.`threadid`)
AND (`userthreadviews`.`userid` = $userid))
ORDER BY
`thread`.`lastpost` DESC
LIMIT
0, 30
These are good candidates to be indexed:
- `forum`.`forumid`
- `userthreadviews`.`threadid`
- `userthreadviews`.`userid`
- `thread`.`forumid`
- `thread`.`threadid`
- `thread`.`visible`
- `thread`.`lastpost`
It seems you already have lots of indexes... so, make sure you keep the ones you really need and remove the useless ones.
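If it helps, composite indexes tend to serve this query better than several single-column ones, since MySQL will generally pick only one index per table. A sketch (the index names are made up; compare against what you already have before creating anything):

CREATE INDEX idx_thread_forum_visible_lastpost ON thread (forumid, visible, lastpost);
CREATE INDEX idx_utv_thread_user ON userthreadviews (threadid, userid);

The first one lets MySQL resolve the forumid and visible equality checks and the lastpost range from the index alone; the second one covers both columns of the LEFT JOIN condition.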
I'm not very experienced with more advanced MySQL query stuff (mostly basic queries, then returning and parsing the response, etc.).
However, I am not clear on the correct approach when I need multiple things (responses) from the database. Is there a way to get all of them from a single query, or do I need to run a new query each time?
Background:
I use PDO to do a SELECT statement
ie:
$getAllVideos_sql = "SELECT * FROM $tableName WHERE active IS NOT NULL OR active != 'no' ORDER BY topic, speaker_last, title;";
$getAllVideos_stmt = $conn->prepare($getAllVideos_sql);
$getAllVideos_stmt->execute();
$getAllVideos_stmt->setFetchMode(PDO::FETCH_ASSOC);
$results = $getAllVideos_stmt->fetchAll(PDO::FETCH_ASSOC);
//parse as I see fit
This gives me my 'chunk of data' that I can pick apart and display as I want.
However, I also want to be able to give some stats (totals):
the total number of (distinct) 'topics', as well as the total count of 'titles' (which should all be unique by default).
Do I need to do another query, prepare, execute, setFetchMode, fetch all over again?
Is this the proper way to do this? Or is there a way to crib off the initial commands that are already in play?
To be clear, I'm not really looking for a query... I'm looking to understand the proper way one does this when they need several pieces of data like I do. Multiple queries and executions, etc.?
Or maybe it can, and should, be done in one snippet? With an adjustment to the query itself to return the sub-select/sub-query info?
This isn't the correct syntax, because it only returns 1 record (but the total topic count seems to be correct, even though I only get 1 record returned):
SELECT *, count(DISTINCT topic) AS totalTopics, count(DISTINCT title) AS totalTitles FROM $tableName;
Maybe this is the more proper approach? Try to include these totals/details in the main query and pick them out?
Hope this makes sense.
Thanks
I don't think you're going to get anything very clean that'll do this, however something like this might work:
SELECT * from $Table t
INNER JOIN (
SELECT COUNT(DISTINCT Topic) as TotalTopics FROM $Table
) s ON 1 = 1
INNER JOIN (
SELECT COUNT(DISTINCT Title) as TotalTitles FROM $Table
) f ON 1 = 1
WHERE ( Active IS NOT NULL ) AND Active != 'no'
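The ON 1 = 1 conditions are just a way of expressing a cross join against the two one-row subqueries; if you prefer, the same idea reads more directly like this (same hypothetical $Table placeholder as above):

SELECT t.*, s.TotalTopics, f.TotalTitles
FROM $Table t
CROSS JOIN (SELECT COUNT(DISTINCT Topic) AS TotalTopics FROM $Table) s
CROSS JOIN (SELECT COUNT(DISTINCT Title) AS TotalTitles FROM $Table) f
WHERE t.Active IS NOT NULL AND t.Active != 'no';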
Especially with web applications, many people are regularly doing counts or other aggregations somewhere along the way. Sometimes if it is a global context such as all topics for all users, having some stored aggregates helps rather than requerying all record counts every time.
Example: if you have a table with a list of "topics", add a column to it for uniqueTitleCount. Then, via a trigger, when a new title is added to a topic, the count is automatically updated by adding 1. You can pre-populate this column by doing a correlated update to said "topics" table; once the trigger is in place, the column maintains itself, as sketched below.
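A minimal sketch of such a trigger, assuming hypothetical topics(topic_id, uniqueTitleCount) and titles(title_id, topic_id, title) tables:

-- keep topics.uniqueTitleCount in sync as titles are inserted
CREATE TRIGGER titles_after_insert
AFTER INSERT ON titles
FOR EACH ROW
UPDATE topics
SET uniqueTitleCount = uniqueTitleCount + 1
WHERE topic_id = NEW.topic_id;

-- one-off pre-population before the trigger takes over
UPDATE topics t
SET uniqueTitleCount = (SELECT COUNT(DISTINCT title) FROM titles WHERE topic_id = t.topic_id);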
This also works for the common case where people want "the most recent". If your system has auto-increment IDs in its tables, you can similarly store the most recent ID created for a given topic, or even the most recent for a given title/document/thread, so you don't have to keep doing something like:
select documentID, other_stuff
from sometable
where documentID in ( select max( documentID )
from sometable
where title = 'something' )
Use these where they make sense, and your optimization pull-downs get easier to handle. You could even keep a counter per document "title", and even a most recent posting date, so entries can quickly be sorted by interest, frequency of activity, whatever.
I have 17,257 rows in MySQL (size: 6.6 MiB); whenever I run my PHP code it's too slow and takes more than 30 minutes to open the webpage. I read somewhere to change mysqli_fetch_array to fetch_assoc, but still I can't see any change. Any suggestions?
Initially I had more complex code, so I changed it to the one below, but still I can't observe any change.
$md=$db->query("SELECT MDid,MD_FullName FROM MDList");
while($row=$md->fetch_assoc())
{
$mdid=$row['MDid'];
$mdname=$row['MD_FullName'];
$distinct_filenames=$db->query("SELECT DISTINCT(FileName) AS Files FROM InitialLog WHERE MDid='$mdid' AND FileName NOT LIKE '%Patient Names%'");
while($row2=$distinct_filenames->fetch_assoc())
{
$filename=$row2['Files'];
$finalquery=$db->query("SELECT LinesCount,CharCount,WordCount,PageCount FROM InitialLog WHERE FileName='$filename' AND (DateLastSaved>='$firstdate' AND DateLastSaved<='$presentdate') AND MONTH(DateLastSaved) = (SELECT MIN(MONTH(DateLastSaved)) FROM InitialLog WHERE FileName='$filename') ORDER BY DAY(DateLastSaved) DESC LIMIT 1");
while($row3=$finalquery->fetch_assoc())
{
$linecount=$linecount+$row3['LinesCount'];
$charcount=$charcount+$row3['CharCount'];
$wordcount=$wordcount+$row3['WordCount'];
$pagecount=$pagecount+$row3['PageCount'];
}
}
}
What I want to achieve through these queries is:
Tables:
MDList (consists of the MD ids of all the MDs)
InitialLog (consists of the FileNames for each MDid and the counts)
My first query chooses each MDid, one by one, from the table MDList.
The second query takes the distinct file names from the InitialLog table for the specific MD chosen in the first query (file names can repeat).
The third query returns the various counts for each distinct file name of that MD. If only one file of that name exists, its counts are returned directly; if several rows exist, it returns the counts from the row on the last day of the first month in which the file appears (e.g. for rows dated 01-01-2016, 22-01-2016 and 23-02-2016, it returns the 22-01-2016 row, the last day of the first month).
In the end I sum all the counts returned for each MD.
You are making a zillion SQL queries.
Well, somewhere in the region of <Number of MD Results> * <Number of distinct filenames> SQL queries.
Since you are just adding up some stats, it will likely be more efficient to create a single query that sums up the correct values to start with.
Check out SUM() and JOINs.
As said, you should avoid executing queries in loops at (pretty much) any cost. Your DBMS engine is designed to handle data aggregation, joins, exclusions and such.
It should be better this way, but please read the notes below about the remaining problems. It's a fairly direct transcription of your query logic, and it could be rewritten further for better performance and safety.
SELECT
    SUM(log.LinesCount), SUM(log.CharCount),
    SUM(log.WordCount), SUM(log.PageCount)
FROM InitialLog log
INNER JOIN MDList md
    ON log.MDid = md.MDid
INNER JOIN (
    SELECT FileName, MIN(MONTH(DateLastSaved)) AS minmonth
    FROM InitialLog
    GROUP BY FileName
) firstMonth
    ON firstMonth.FileName = log.FileName
    AND MONTH(log.DateLastSaved) = firstMonth.minmonth
WHERE
    log.FileName NOT LIKE '%Patient Names%'
    AND log.DateLastSaved >= '$firstdate'
    AND log.DateLastSaved <= '$presentdate';
First, NOT LIKE '%whatever%' is usually a bad idea, as it forces a full scan; it would be much more efficient with a JOIN and a nullity test, or a view, or some other way to avoid that scan altogether (adding a column, etc.). At the very least, try to avoid wildcards (%) at the start of the pattern.
Next, you're using string concatenation to inject parameters into your query; that's bad. You should use prepared queries with real parameters to avoid SQL injection.
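A minimal sketch of what that looks like, assuming $db is a mysqli connection as your $db->query(...) calls suggest (table and column names copied from your question):

$stmt = $db->prepare(
    "SELECT LinesCount, CharCount, WordCount, PageCount
     FROM InitialLog
     WHERE FileName = ? AND DateLastSaved >= ? AND DateLastSaved <= ?"
);
// values travel separately from the SQL text, so no injection is possible
$stmt->bind_param('sss', $filename, $firstdate, $presentdate);
$stmt->execute();
$result = $stmt->get_result(); // get_result() needs the mysqlnd driver; bind_result() works everywhere
while ($row = $result->fetch_assoc()) {
    // ... work with $row ...
}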
Finally, you should consider reworking your dates (or adding a column updated by a trigger, setting up a view, whatever) to avoid inconsistent comparisons.
I'm building a query to show items with their user, and then show the highest bid on each item.
Example:
Xbox 360 by james. - the highest bid was $55.
art table by mario. - the highest bid was $25.
Query
SELECT i, u
FROM AppBundle:Item i
LEFT JOIN i.user u
I have another table, bids (a one-to-many relationship). I'm not sure how I can include the single highest bid of each item in the same query with a join.
I know I can just run another query after this one, with a function (relationship), but I'm avoiding that for optimisation reasons.
Solution
SQL
https://stackoverflow.com/a/16538294/75799 - But how is this possible in Doctrine DQL?
You can use IN with a sub query in such cases.
I am not sure if I understood your model correctly, but I attempted to make your query with a QueryBuilder and I am sure you will manage to make it work with this example:
$qb = $this->_em->createQueryBuilder();

// sub query: ids of items that have a bid equal to their own maximum bid
// (assumes the Bid entity is mapped as AppBundle:Bid)
$sub = $this->_em->createQueryBuilder();
$sub->select('mbi.id') // max bid item
    ->from('AppBundle:Item', 'mbi')
    ->leftJoin('mbi.bids', 'b')
    ->where('b.value = (SELECT MAX(b2.value) FROM AppBundle:Bid b2 WHERE b2.item = mbi)');

$qb->select('i', 'u')
    ->from('AppBundle:Item', 'i')
    ->leftJoin('i.user', 'u')
    ->where($qb->expr()->in('i.id', $sub->getDQL()));

$query = $qb->getQuery();
return $query->getResult();
Your SQL query may look something like:
select i.*, u.*
from items i
inner join bids u on i.id = u.item_id
WHERE
u.value = (select max(value) from bids where item_id = i.id)
DQL's subquery support is limited (there are no subqueries in the FROM clause, for instance), so you could also try a HAVING clause, or see if Doctrine\ORM\Query\Expr offers anything.
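For instance, a grouped DQL query can pull each item, its user, and its highest bid value in one round trip. A sketch, assuming the Item entity has the user and bids associations the question implies:

$rows = $this->_em->createQuery(
    'SELECT i, u, MAX(b.value) AS highestBid
     FROM AppBundle:Item i
     LEFT JOIN i.user u
     LEFT JOIN i.bids b
     GROUP BY i, u'
)->getResult();

// each $row is a mixed result: $row[0] is the hydrated Item (with its user),
// and $row['highestBid'] is the scalar max bid value (null if there are no bids)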
To solve this in my own case, I added a method to the owning entity (Item) to find the max entity in a list of entities (bids), using Doctrine Collections' Criteria (I've written about it here).
Your Item entity would contain
public function getMaxBid()
{
    $criteria = Criteria::create();
    // highest value first, then keep only that one
    $criteria->orderBy(['value' => Criteria::DESC]);
    $criteria->setMaxResults(1);
    return $this->bids->matching($criteria)->first();
}
Unfortunately, there's no way that I know of to find the maximum bid and the bidder with one grouping query, but there are several techniques for making the logic work across several queries.
You could do a sub select, and that might work fine depending on the size of the table. If you're planning on growing to the point where that's not going to work, you're probably already looking at sharding your relational database, moving some data to a less transactional, higher-performance db technology, or denormalizing.
If you want to keep this in pure MySQL, you could use a procedure to express, in multiple commands, how to check for a bid and optionally add it to the list, also updating the current high bidder in a denormalized high-bids table. This keeps the complex logic of how to verify a bid in one, the most rigorously managed, place: the database. Just make sure you use transactions properly to stop 2 bids from being recorded concurrently (e.g. SELECT ... FOR UPDATE).
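A sketch of that locking pattern, using a hypothetical high_bids(item_id, user_id, amount) table (the ids and amounts are made up):

START TRANSACTION;

-- lock this item's current high bid row so concurrent bidders queue up
SELECT amount FROM high_bids WHERE item_id = 42 FOR UPDATE;

-- application/procedure logic: proceed only if the new bid is higher
UPDATE high_bids SET user_id = 7, amount = 55 WHERE item_id = 42;
INSERT INTO bids (item_id, user_id, amount) VALUES (42, 7, 55);

COMMIT;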
I used to ask prospective programmers to write this query to see how experienced with MySQL they were. Many thought just a MAX grouping was sufficient, and a few left the interview still convinced that it would work fine and that I was wrong. So good question!
I am working on a timesheet application, and writing PHP code to fetch all the timesheets to date. This is the query that I have written to fetch the timesheets -
SELECT a.accnt_name, u.username, DATE_FORMAT(t.in_time, '%H:%i') inTime, DATE_FORMAT(t.out_time, '%H:%i') outTime, DATE_FORMAT(t.work_time, '%H:%i') workTime, w.wrktyp_name, t.remarks, DATE_FORMAT(t.tmsht_date, '%d-%b-%Y') tmshtDate, wl.loctn_name, s.serv_name, t.status_code, t.conv_kms convkms, t.conv_amount convamount FROM timesheets t, accounts a, services s, worktypes w, work_location wl, users WHERE a.accnt_code=t.accnt_code and w.wrktyp_code=t.wrktyp_code and wl.loctn_code=t.loctn_code and s.serv_code=t.serv_code and t.usr_code = u. ORDER BY tmsht_date desc
The WHERE clause contains the conditions that resolve the various codes to their actual values from the respective tables.
The issue is that this query is taking a lot of time to execute, and the application crashes after a few minutes.
I ran this query in phpMyAdmin, where it works without any issues.
Need help in understanding what might be the cause behind the slowness in the execution.
Use EXPLAIN to see the execution plan for the query. Make sure MySQL has suitable indexes available, and is using those indexes.
The query text seems to be missing the name of a column here...
t.usr_code = u. ORDER
^^^
We can "guess" that's supposed to be u.usr_code, but that's just a guess.
How many rows are supposed to be returned? How large is the resultset?
Is your client attempting to "store" all of the rows in memory, and crashing because it runs out of memory?
If so, I recommend you avoid doing that, and fetch the rows as you need them.
Or, consider adding some additional predicates in the WHERE clause to return just the rows you need, rather than all the rows in the table.
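If buffering turns out to be the issue, here is a sketch of row-at-a-time processing with mysqli (assumed here; PDO has an equivalent unbuffered mode):

// MYSQLI_USE_RESULT streams rows from the server instead of
// buffering the entire resultset in PHP memory first
$result = $mysqli->query($sql, MYSQLI_USE_RESULT);
while ($row = $result->fetch_assoc()) {
    // render or aggregate this row, then let it go out of scope
}
$result->free();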
It's 2015. Time to ditch the old-school comma syntax for the join operation: use the JOIN keyword instead, and move the join predicates from the WHERE clause to the ON clause. And format the statement. The database doesn't care, but it will make things easier on the poor soul that needs to decipher your SQL statement.
SELECT a.accnt_name
, u.username
, DATE_FORMAT(t.in_time ,'%H:%i') AS inTime
, DATE_FORMAT(t.out_time ,'%H:%i') AS outTime
, DATE_FORMAT(t.work_time,'%H:%i') AS workTime
, w.wrktyp_name
, t.remarks
, DATE_FORMAT(t.tmsht_date, '%d-%b-%Y') AS tmshtDate
, wl.loctn_name
, s.serv_name
, t.status_code
, t.conv_kms AS convkms
, t.conv_amount AS convamount
FROM timesheets t
JOIN accounts a
ON a.accnt_code = t.accnt_code
JOIN services s
ON s.serv_code = t.serv_code
JOIN worktypes w
ON w.wrktyp_code = t.wrktyp_code
JOIN work_location wl
ON wl.loctn_code = t.loctn_code
JOIN users u
ON u.usr_code = t.usr_code
ORDER BY t.tmsht_date DESC
Ordering on the formatted date column would be very odd. Much more likely you want the results returned in date order, not in string order with month and day before the year. (Do you really want to sort on the day value first, before the year?)
FOLLOWUP
If this same exact query completes quickly, returning the entire resultset (of approx 720 rows) from a different client (same database, same user), then the issue is likely something other than this SQL statement.
We would not expect the execution of the SQL statement to cause PHP to "crash".
If you are storing the entire resultset (for example, using mysqli store_result), you need to have sufficient memory for that. But the thirteen expressions in the select list all look relatively short (formatted dates, names and codes), and we wouldn't expect "remarks" would be over a couple of KB.
For debugging this, as others have suggested, try adding a LIMIT clause on the query, e.g. LIMIT 1 and observe the behavior.
Alternatively, use a dummy query for testing; use a query that is guaranteed to return specific values and a specific number of rows.
SELECT 'morpheus' AS accnt_name
, 'trinity' AS username
, '01:23' AS inTime
, '04:56' AS outTime
, '00:45' AS workTime
, 'neo' AS wrktyp_name
, 'yada yada yada' AS remarks
, '27-May-2015' AS tmshtDate
, 'zion' AS loctn_name
, 'nebuchadnezzar' AS serv_name
, '' AS status_code
, '123' AS convkms
, '5678' AS convamount
I suspect that the query is not the root cause of the behavior you are observing. I suspect the problem is somewhere else in the code.
How to debug small programs http://ericlippert.com/2014/03/05/how-to-debug-small-programs/
phpMyAdmin automatically adds a LIMIT clause to the query; that's why you got fast results there.
Check how many rows are in the table
Run your query with a LIMIT (the sketch below covers both checks)
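Something like this, say (table and column names taken from your query):

-- 1. how many rows are we really dealing with?
SELECT COUNT(*) FROM timesheets;

-- 2. rerun the query, capped
SELECT t.*, a.accnt_name
FROM timesheets t
JOIN accounts a ON a.accnt_code = t.accnt_code
ORDER BY t.tmsht_date DESC
LIMIT 30;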
First of all: modify your query so that it looks like the one given by Spencer.
Do you get an error message when your application 'crashes' or does it just stop?
You could try:
ini_set('max_execution_time', 0);
in your PHP code. This sets the maximum execution time to unlimited, so if there are no errors your script should run to the end, and you can see whether your query returns the desired results.
Also, just as a test, end your query with
LIMIT 10
This should greatly speed up your query as it will only take the first ten results.
You can later change this value to one better suited for your needs. Unless you absolutely need the complete result set, I suggest you always use LIMIT in your queries.
I have a page that is taking 37 seconds to load. While it is loading it pegs MySQL's CPU usage through the roof. I did not write the code for this page and it is rather convoluted so the reason for the bottleneck is not readily apparent to me.
I profiled it (using KCachegrind) and found that the bulk of the time on the page is spent doing MySQL queries (90% of the time is spent in 25 different mysql_query calls).
The queries take the form of the following with the tag_id changing on each of the 25 different calls:
SELECT * FROM tbl_news WHERE news_id
IN (select news_id from
tbl_tag_relations WHERE tag_id = 20)
Each query is taking around 0.8 seconds to complete with a few longer delays thrown in for good measure... thus the 37 seconds to completely load the page.
My question is, is it the way the query is formatted with that nested select that is causing the problem? Or could it be any one of a million other things? Any advice on how to approach tackling this slowness is appreciated.
Running EXPLAIN on the query gives me this (but I'm not clear on the impact of these results... the NULL key on the primary row looks like it would be bad, yes? The number of rows examined seems high to me as well, since only a handful of results are returned in the end):
id  select_type         table              type  possible_keys      key                key_len  ref    rows  Extra
1   PRIMARY             tbl_news           ALL   NULL               NULL               NULL     NULL   1318  Using where
2   DEPENDENT SUBQUERY  tbl_tag_relations  ref   FK_tbl_tag_tags_1  FK_tbl_tag_tags_1  4        const  179   Using where
I've addressed this point in Database Development Mistakes Made by Application Developers. Basically, favour joins over aggregation. IN isn't aggregation as such, but the same principle applies. A good optimizer will make these two queries equivalent in performance:
SELECT * FROM tbl_news WHERE news_id
IN (select news_id from
tbl_tag_relations WHERE tag_id = 20)
and
SELECT tn.*
FROM tbl_news tn
JOIN tbl_tag_relations ttr ON ttr.news_id = tn.news_id
WHERE ttr.tag_id = 20
as I believe Oracle and SQL Server both do, but MySQL doesn't. The second version is basically instantaneous. With hundreds of thousands of rows I did a test on my machine and got the first version to sub-second performance by adding appropriate indexes. The join version with indexes is basically instantaneous, but even without indexes it performs OK.
By the way, the above syntax is the one you should prefer for joins. It's clearer than putting the join predicates in the WHERE clause (as others have suggested), and with left outer joins it can do certain things in an ANSI SQL way that WHERE conditions can't.
So I would add indexes on the following:
tbl_news (news_id)
tbl_tag_relations (news_id)
tbl_tag_relations (tag_id)
and the query will execute almost instantaneously.
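In DDL terms, that's something like this (the index names are made up, and news_id on tbl_news is probably already the primary key, in which case it needs nothing extra):

CREATE INDEX idx_ttr_news_id ON tbl_tag_relations (news_id);
CREATE INDEX idx_ttr_tag_id ON tbl_tag_relations (tag_id);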
Lastly, don't use * to select all the columns you want. Name them explicitly. You'll get into less trouble as you add columns later.
The SQL Query itself is definitely your bottleneck. The query has a sub-query in it, which is the IN(...) portion of the code. This is essentially running two queries at once. You can likely halve (or more!) your SQL times with a JOIN (similar to what d03boy mentions above) or a more targeted SQL query. An example might be:
SELECT *
FROM tbl_news, tbl_tag_relations
WHERE tbl_tag_relations.tag_id = 20 AND
tbl_news.news_id = tbl_tag_relations.news_id
To help SQL run faster you also want to try to avoid using SELECT *, and only select the information you need; also put a limiting statement at the end. eg:
SELECT news_title, news_body
...
LIMIT 5;
You also will want to look into the database schema itself. Make sure you are indexing all of the commonly referred to columns so that the queries will run faster. In this case, you probably want to check your news_id and tag_id fields.
Finally, you will want to take a look at the PHP code and see if you can make one single all-encompassing SQL query instead of iterating through several separate queries. If you post more code we can help with that, and it will probably be the single greatest time savings for your posted problem. :)
If I understand correctly, this is just listing the news stories for a specific set of tags.
First of all, you really shouldn't ever SELECT *.
Second, this can probably be accomplished within a single query, thus reducing the overhead cost of multiple queries. It seems like it is getting fairly trivial data, so it could be retrieved within a single call instead of 20.
A better approach than using IN might be a JOIN with a WHERE condition instead. An IN basically expands into a long list of OR conditions.
Your tbl_tag_relations should definitely have an index on tag_id
select *
from tbl_news, tbl_tag_relations
where
tbl_tag_relations.tag_id = 20 and
tbl_news.news_id = tbl_tag_relations.news_id
limit 20
I think this gives the same results, but I'm not 100% sure. Sometimes simply limiting the results helps.
Unfortunately MySQL doesn't do very well with uncorrelated subqueries like the one your case shows: the plan is basically saying that for every row in the outer query, the inner query will be performed. This gets out of hand quickly. Rewriting it as a plain old join, as others have mentioned, will work around the problem, but may then cause the undesired effect of duplicate rows.
For instance, the original query would return 1 row for each qualifying row in the tbl_news table, but this query:
SELECT news_id, name, blah
FROM tbl_news n
JOIN tbl_tag_relations r ON r.news_id = n.news_id
WHERE r.tag_id IN (20,21,22)
would return 1 row for each matching tag. You could stick DISTINCT on there, which should have only a minimal performance impact depending on the size of the dataset.
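That is (reusing the column names from the sketch above):

SELECT DISTINCT n.news_id, n.name, n.blah
FROM tbl_news n
JOIN tbl_tag_relations r ON r.news_id = n.news_id
WHERE r.tag_id IN (20,21,22)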
Not to troll too badly, but most other databases (PostgreSQL, Firebird, Microsoft, Oracle, DB2, etc) would handle the original query as an efficient semi-join. Personally I find the subquery syntax to be much more readable and easier to write, especially for larger queries.