I'm currently running this query. However, when run outside phpMyAdmin it causes a 504 timeout error. I suspect it has to do with how many rows the query returns or has to access.
I'm not very experienced with MySQL, so this was the best I could do:
SELECT
s.surveyId,
q.cat,
SUM((sac.answer_id*q.weight))/SUM(q.weight) AS score,
user.division_id,
user.unit_id,
user.department_id,
user.team_id,
division.division_name,
unit.unit_name,
dpt.department_name,
team.team_name
FROM survey_answers_cache sac
JOIN surveys s ON s.surveyId = sac.surveyid
JOIN subcluster sc ON s.subcluster_id = sc.subcluster_id
JOIN cluster c ON sc.cluster_id = c.cluster_id
JOIN user ON user.user_id = sac.user_id
JOIN questions q ON q.question_id = sac.question_id
JOIN division ON division.division_id = user.division_id
LEFT JOIN unit ON unit.unit_id = user.unit_id
LEFT JOIN department dpt ON dpt.department_id = user.department_id
LEFT JOIN team ON team.team_id = user.team_id
WHERE c.cluster_id=? AND sc.subcluster_id=? AND s.active=0 AND s.prepare=0
GROUP BY user.team_id, s.surveyId, q.cat
ORDER BY s.surveyId, user.team_id, q.cat ASC
The problem with this query is that when it returns a correct result it runs quickly (around 500 ms), but when the result has twice as many rows it takes more than 5 minutes and then causes a 504 timeout.
The other problem is that I didn't create this database myself, so I didn't set the indices myself. I'm thinking of improving these and therefore I used the explain command:
I see a lot of primary keys and a couple double indices, but I'm not sure if this would affect the performance this greatly.
EDIT: This piece of code takes up all the execution time:
$start_time = microtime(true);
$stmt = $conn->query($query); //query is simply the query above.
while ($row = $stmt->fetch_assoc()){
$resultSurveys["scores"][] = $row;
}
$stmt->close();
$end_time = microtime(true);
$duration = $end_time - $start_time; //value typically the execution time #reallyHigh...
So my question: Is it possible to (greatly?) improve the performance of the query by altering the database keys or should I divide my query into multiple smaller queries?
You can try something like this (although it's not practical for me to test this):
SELECT
sac.surveyId,
q.cat,
SUM((sac.answer_id*q.weight))/SUM(q.weight) AS score,
user.division_id,
user.unit_id,
user.department_id,
user.team_id,
division.division_name,
unit.unit_name,
dpt.department_name,
team.team_name
FROM survey_answers_cache sac
JOIN
(
SELECT
s.surveyId,
sc.subcluster_id
FROM
surveys s
JOIN subcluster sc ON s.subcluster_id = sc.subcluster_id
JOIN cluster c ON sc.cluster_id = c.cluster_id
WHERE
c.cluster_id=? AND sc.subcluster_id=? AND s.active=0 AND s.prepare=0
) AS v ON v.surveyid = sac.surveyid
JOIN user ON user.user_id = sac.user_id
JOIN questions q ON q.question_id = sac.question_id
JOIN division ON division.division_id = user.division_id
LEFT JOIN unit ON unit.unit_id = user.unit_id
LEFT JOIN department dpt ON dpt.department_id = user.department_id
LEFT JOIN team ON team.team_id = user.team_id
GROUP BY user.team_id, v.surveyId, q.cat
ORDER BY v.surveyId, user.team_id, q.cat ASC
So I hope I didn't mess anything up.
Anyway, the idea is that in the inner query you select only the rows you need based on your WHERE condition. This creates a smaller temporary table, as it only pulls two fields, both ints.
Then in the outer query you join to the tables that you actually pull the rest of the data from, then order and group. This way you are sorting and grouping on a smaller dataset, and your WHERE clause can run in the most optimal way.
You may even be able to omit some of these tables, as you're only pulling data from a few of them, but without seeing the full schema and how it's related that's hard to say.
But just generally speaking this part (The sub-query)
SELECT
s.surveyId,
sc.subcluster_id
FROM
surveys s
JOIN subcluster sc ON s.subcluster_id = sc.subcluster_id
JOIN cluster c ON sc.cluster_id = c.cluster_id
WHERE
c.cluster_id=? AND sc.subcluster_id=? AND s.active=0 AND s.prepare=0
is what is directly affected by your WHERE clause. So we can optimize this part first, then use it to join the rest of the data you need.
An example of removing tables can be easily deduced from the above, consider this
SELECT
s.surveyId,
sc.subcluster_id
FROM
surveys s
JOIN subcluster sc ON s.subcluster_id = sc.subcluster_id
WHERE
sc.cluster_id=? AND sc.subcluster_id=? AND s.active=0 AND s.prepare=0
The cluster table (alias c) is never used to pull data from, only in the WHERE clause. So isn't
JOIN cluster c ON sc.cluster_id = c.cluster_id
WHERE
c.cluster_id=?
The same as or equivalent to
WHERE
sc.cluster_id=?
And therefore we can eliminate that join completely.
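The equivalence is easy to check on toy data. Below is a minimal sketch using Python's built-in sqlite3 as a stand-in for MySQL; the rows are made up, and it assumes every subcluster.cluster_id refers to an existing cluster row (otherwise dropping the join could change the result).

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; all rows are made-up samples.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cluster    (cluster_id INTEGER PRIMARY KEY);
    CREATE TABLE subcluster (subcluster_id INTEGER PRIMARY KEY,
                             cluster_id INTEGER NOT NULL REFERENCES cluster(cluster_id));
    CREATE TABLE surveys    (surveyId INTEGER PRIMARY KEY, subcluster_id INTEGER,
                             active INTEGER, prepare INTEGER);
    INSERT INTO cluster    VALUES (1), (2);
    INSERT INTO subcluster VALUES (10, 1), (11, 1), (20, 2);
    INSERT INTO surveys    VALUES (100, 10, 0, 0), (101, 10, 1, 0),
                                  (102, 11, 0, 0), (103, 20, 0, 0);
""")

# Version that joins cluster only to filter on its key.
with_cluster = """
    SELECT s.surveyId, sc.subcluster_id
    FROM surveys s
    JOIN subcluster sc ON s.subcluster_id = sc.subcluster_id
    JOIN cluster c ON sc.cluster_id = c.cluster_id
    WHERE c.cluster_id = ? AND sc.subcluster_id = ? AND s.active = 0 AND s.prepare = 0
    ORDER BY s.surveyId
"""

# Version that filters on the foreign key column directly.
without_cluster = """
    SELECT s.surveyId, sc.subcluster_id
    FROM surveys s
    JOIN subcluster sc ON s.subcluster_id = sc.subcluster_id
    WHERE sc.cluster_id = ? AND sc.subcluster_id = ? AND s.active = 0 AND s.prepare = 0
    ORDER BY s.surveyId
"""

params = (1, 10)
rows_with = conn.execute(with_cluster, params).fetchall()
rows_without = conn.execute(without_cluster, params).fetchall()
print(rows_with, rows_without)
```

Both queries return the same single survey here, which is what you would expect whenever the foreign key is intact.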
The EXPLAIN result shows signs of a problem:
Using temporary;using filesort: the ORDER BY needs to create temporary tables to do the sorting.
On 3rd row for user table type is ALL, key and ref are NULL: means that it needs to scan the whole table each time to retrieve results.
Suggestions:
Add indexes on user.cluster_id and on all fields involved in the ORDER BY and GROUP BY clauses. Keep in mind that the user table seems to live in the changein database (a cross-database query).
Add indexes on the user columns involved in JOINs.
Add an index on s.surveyId.
If possible, keep the same column sequence in the GROUP BY and ORDER BY clauses.
According to the accepted answer in this question move the JOIN on user table to the first position in the join queue.
Carefully read this official documentation. You may need to optimize the server configuration.
PS: query optimization is an art that requires patience and hard work. No silver bullet for that.
Welcome to the fine art of optimizing MySQL!
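To get a feel for what "type is ALL" versus an indexed lookup means, here is a small sketch with Python's built-in sqlite3. It is only an analogy: SQLite's EXPLAIN QUERY PLAN output is not the same as MySQL's EXPLAIN, and the table layout and the index name idx_user_id are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified user table with no key on user_id.
conn.execute("CREATE TABLE user (user_id INTEGER, team_id INTEGER, division_id INTEGER)")

def plan(sql):
    """Return the planner's description text (last column) for each plan row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT team_id FROM user WHERE user_id = 7"

plan_before = plan(query)   # no usable index yet: the whole table is scanned
conn.execute("CREATE INDEX idx_user_id ON user (user_id)")
plan_after = plan(query)    # the planner can now search via idx_user_id

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan differs between SQLite versions, but the first plan should mention a scan of the table and the second should name the new index. In MySQL the corresponding check is to rerun EXPLAIN after adding an index and see whether the type column for that table is still ALL.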
I think the problem happens when you add this:
JOIN user ON user.cluster_id = sc.subcluster_id
JOIN survey_answers_cache sac ON (sac.surveyId = s.surveyId AND sac.user_id = user.user_id)
The extra condition sac.user_id = user.user_id can easily be inconsistent.
Can you try doing a second join with the user table?
PS: Can you add the output of SHOW CREATE TABLE?
So I have a query that works exactly as intended but needs heavy optimization. I am using software to track load times across my site and 97.8% of all load time on my site is a result of this one function. So to explain my database a little bit before getting to the query.
First, I have a films table, a competitions table and a votes table. Films can be in many competitions and competitions have many films; since this is a many-to-many relationship, we have a pivot table to show their relationship (filmCompetition). When loading the competition page, however, these films need to be in order of their votes (most voted at the top, least at the bottom).
Now in the query below you can see what I am doing: grabbing the films from the filmCompetition table that match the current competition id $competition->id, and then ordering by the total number of votes for each film. Like I said, this works but is super inefficient, and I cannot think of another way to do it.
$films = DB::select( DB::raw("SELECT f.*, COUNT(v.value) AS totalVotes
FROM filmCompetition AS fc
JOIN films AS f ON f.id = fc.filmId AND fc.competitionId = '$competition->id'
LEFT JOIN votes AS v ON f.id = v.filmId AND v.competitionId = '$competition->id'
GROUP BY f.id
ORDER BY totalVotes DESC
") );
For this query, you want indexes on filmCompetition(competitionId, filmId), films(id), and votes(filmId, competitionId).
However, it is probably more efficient to write the query like this:
SELECT f.*,
(SELECT COUNT(v.value)
FROM votes v
WHERE v.filmId = f.id and v.competitionId = '$competition->id'
) AS totalVotes
FROM films f
WHERE EXISTS (SELECT 1
FROM FilmCompetition fc
WHERE fc.FilmId = f.id AND
fc.competitionId = '$competition->id'
)
ORDER BY TotalVotes DESC
This saves the outer aggregation, which should be a performance win. For this version, the indexes are FilmCompetition(FilmId, CompetitionId) and Votes(FilmId, CompetitionId).
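A quick way to convince yourself that the two forms return the same rows is to run both on a toy dataset. The sketch below uses Python's built-in sqlite3 rather than MySQL, with made-up rows, and binds the competition id as a parameter instead of interpolating it into the SQL string. It says nothing about speed, only about equivalence of results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE films (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE filmCompetition (filmId INTEGER, competitionId INTEGER);
    CREATE TABLE votes (filmId INTEGER, competitionId INTEGER, value INTEGER);
    INSERT INTO films VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO filmCompetition VALUES (1, 7), (2, 7), (1, 8), (3, 8);
    -- film 1 has two votes in competition 7, film 2 has none
    INSERT INTO votes VALUES (1, 7, 1), (1, 7, 1), (1, 8, 1), (3, 8, 1);
""")

# Original shape: join, then aggregate with GROUP BY.
join_sql = """
    SELECT f.id, COUNT(v.value) AS totalVotes
    FROM filmCompetition AS fc
    JOIN films AS f ON f.id = fc.filmId AND fc.competitionId = ?
    LEFT JOIN votes AS v ON f.id = v.filmId AND v.competitionId = ?
    GROUP BY f.id
    ORDER BY totalVotes DESC, f.id
"""

# Rewritten shape: correlated subquery for the count, EXISTS for membership.
subquery_sql = """
    SELECT f.id,
           (SELECT COUNT(v.value) FROM votes v
            WHERE v.filmId = f.id AND v.competitionId = ?) AS totalVotes
    FROM films f
    WHERE EXISTS (SELECT 1 FROM filmCompetition fc
                  WHERE fc.filmId = f.id AND fc.competitionId = ?)
    ORDER BY totalVotes DESC, f.id
"""

competition_id = 7
join_rows = conn.execute(join_sql, (competition_id, competition_id)).fetchall()
subquery_rows = conn.execute(subquery_sql, (competition_id, competition_id)).fetchall()
print(join_rows, subquery_rows)
```

The extra f.id in ORDER BY is only there to make the ordering of ties deterministic for the comparison.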
The solution I actually ended up using was a little different but since Gordon answered the question I marked him as the correct answer.
My Solution
To actually fix this I took a slightly different approach: rather than trying to do all of this in SQL, I ran my three queries separately and then joined them together in PHP. While this can be slower, in my case doing it my original way took about 15 seconds, doing it Gordon's way took about 7, and doing it my new way took about 600 ms.
I've got a large mysql query with 5 joins which may not seem efficient but I'm struggling to find a different solution which would work.
The views table is the main table here, because both the clicks and conversions tables rely on it via the token column (which is indexed and set as a foreign key in all tables).
The query:
SELECT
var.id,
var.disabled,
var.name,
var.updated,
var.cid,
var.outdated,
IF(var.type <> 0,'DL','LP') AS `type`,
COUNT(DISTINCT v.id) AS `views`,
COUNT(DISTINCT c.id) AS `clicks`,
COUNT(DISTINCT co.id) AS `conversions`,
SUM(tc.cost) AS `cost`,
SUM(cp.value) AS `revenue`
FROM variants AS var
LEFT JOIN views AS v ON v.vid = var.id
LEFT JOIN traffic_cost AS tc ON tc.id = v.source
LEFT JOIN clicks AS c ON c.token = v.token
LEFT JOIN conversions AS co ON co.token = v.token
LEFT JOIN c_profiles AS cp ON cp.id = co.profile
WHERE var.cid = 28
GROUP BY var.id
The results I'm getting are:
The problem is that the revenue and cost results are too high, because for views, clicks and conversions only the distinct rows are counted, but for revenue and cost, for some reason (I would really appreciate an explanation here), all rows in all tables are taken into the result set.
I know this is a large query, but both the clicks and conversions tables rely on the views table, which is used for filtering the results, e.g. views.country = 'uk'. I've tried doing 3 queries and merging them, but that didn't work (it gave me wrong results).
One more thing that I find weird is that if I remove the joins with clicks, conversions, c_profiles the costs column shows correct results.
Any help would be appreciated.
In the end I had to use 3 different queries and do a merge on them. Seemed like an overhead, but worked for me.
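As for why the sums come out too high: joining several one-to-many tables to the same view multiplies the rows (each click is paired with each conversion of that view), so COUNT(DISTINCT ...) still gives the right counts while SUM adds the same cost and value several times. The sketch below reproduces this with Python's built-in sqlite3 and made-up rows, then computes each figure in its own subquery, which is one possible way to avoid the multiplication; the column aliases are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE variants (id INTEGER PRIMARY KEY);
    CREATE TABLE traffic_cost (id INTEGER PRIMARY KEY, cost INTEGER);
    CREATE TABLE views (id INTEGER PRIMARY KEY, vid INTEGER, token TEXT, source INTEGER);
    CREATE TABLE clicks (id INTEGER PRIMARY KEY, token TEXT);
    CREATE TABLE c_profiles (id INTEGER PRIMARY KEY, value INTEGER);
    CREATE TABLE conversions (id INTEGER PRIMARY KEY, token TEXT, profile INTEGER);
    INSERT INTO variants VALUES (1);
    INSERT INTO traffic_cost VALUES (1, 5);
    INSERT INTO views VALUES (1, 1, 'a', 1), (2, 1, 'b', 1);   -- 2 views, cost 5 each
    INSERT INTO clicks VALUES (1, 'a'), (2, 'a'), (3, 'a');    -- 3 clicks on view 'a'
    INSERT INTO c_profiles VALUES (1, 10);
    INSERT INTO conversions VALUES (1, 'a', 1), (2, 'a', 1);   -- 2 conversions, value 10 each
""")

# Same shape as the query in the question: view 'a' expands to 3 x 2 = 6 joined rows.
naive = conn.execute("""
    SELECT COUNT(DISTINCT v.id), COUNT(DISTINCT c.id), COUNT(DISTINCT co.id),
           SUM(tc.cost), SUM(cp.value)
    FROM variants AS var
    LEFT JOIN views AS v ON v.vid = var.id
    LEFT JOIN traffic_cost AS tc ON tc.id = v.source
    LEFT JOIN clicks AS c ON c.token = v.token
    LEFT JOIN conversions AS co ON co.token = v.token
    LEFT JOIN c_profiles AS cp ON cp.id = co.profile
    WHERE var.id = 1
    GROUP BY var.id
""").fetchone()

# Each figure computed separately, so no one-to-many join can inflate another.
separate = conn.execute("""
    SELECT
      (SELECT COUNT(*) FROM views v WHERE v.vid = var.id) AS n_views,
      (SELECT COUNT(*) FROM clicks c JOIN views v ON v.token = c.token
        WHERE v.vid = var.id) AS n_clicks,
      (SELECT COUNT(*) FROM conversions co JOIN views v ON v.token = co.token
        WHERE v.vid = var.id) AS n_conversions,
      (SELECT SUM(tc.cost) FROM views v JOIN traffic_cost tc ON tc.id = v.source
        WHERE v.vid = var.id) AS cost,
      (SELECT SUM(cp.value) FROM conversions co
         JOIN views v ON v.token = co.token
         JOIN c_profiles cp ON cp.id = co.profile
        WHERE v.vid = var.id) AS revenue
    FROM variants AS var
    WHERE var.id = 1
""").fetchone()

print("naive:   ", naive)
print("separate:", separate)
```

This also fits the observation that the cost column is correct once the clicks and conversions joins are removed: without them there is exactly one joined row per view.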
Can you let me know if my interpretation is correct (the last AND part)?
$q = "SELECT title,name,company,address1,address2
FROM registrations
WHERE title != 0 AND id IN (
SELECT registrar_id
FROM registrations_industry
WHERE industry_id = '$industryid'
)";
Below was really where I am not sure:
... AND id IN (select registrar_id from registrations_industry where industry_id='$industryid')
Interpretation: Get any match on id(registrations id field) equals registrar_id(field) from the join table registrations_industry where industry_id equals the set $industryid
Is this select statement considered a sub routine since it's a query within the main query?
So an example, looking up id 23 in the registrations table, would look like:
registrations(table)
id=23,title=owner,name=mike,company=nono,address1=1234 s walker lane,address2
registrations_industry(table)
id=256, registrar_id=23, industry_id=400
id=159, registrar_id=23, industry_id=284
id=227, registrar_id=23, industry_id=357
I assume this would return 3 records with the same registration table data and, of course, varying registrations_industry rows.
For a given test data set your query will return one record. This one:
id=23,title=owner,name=mike,company=nono,address1=1234 s walker lane,address2
To get three records with the same registration table data and varying registrations_industry you need to use JOIN.
Something like this:
SELECT r.title, r.name, r.company, r.address1, r.address2
FROM registrations AS r
LEFT OUTER JOIN registrations_industry AS ri
ON ri.registrar_id=r.id
WHERE r.title!=0 AND ri.industry_id={$industry_id}
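The difference in row counts can be reproduced with the sample data from the question. This is a minimal sketch using Python's built-in sqlite3 instead of MySQL; the title filter is left out to keep it short, and the JOIN is run without the industry filter so that all three matches show up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE registrations (id INTEGER PRIMARY KEY, title TEXT, name TEXT);
    CREATE TABLE registrations_industry (id INTEGER PRIMARY KEY,
                                         registrar_id INTEGER, industry_id INTEGER);
    INSERT INTO registrations VALUES (23, 'owner', 'mike');
    INSERT INTO registrations_industry VALUES (256, 23, 400), (159, 23, 284), (227, 23, 357);
""")

# IN only tests membership: one output row per registrations row.
in_rows = conn.execute("""
    SELECT id, name
    FROM registrations
    WHERE id IN (SELECT registrar_id FROM registrations_industry WHERE industry_id = ?)
""", (400,)).fetchall()

# JOIN pairs rows: one output row per matching registrations_industry row.
join_rows = conn.execute("""
    SELECT r.id, r.name, ri.industry_id
    FROM registrations AS r
    JOIN registrations_industry AS ri ON ri.registrar_id = r.id
    ORDER BY ri.industry_id
""").fetchall()

print(in_rows)
print(join_rows)
```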
Sorry for the essay, I didn't realize it was as long as it is until looking at it now. And although you've checked an answer, I hope you read this and gain some insight into why this solution is preferred and how it evolved out of your original query.
First things first
Your query
$q = "SELECT title,name,company,address1,address2
FROM registrations
WHERE title != 0 AND id IN (
SELECT registrar_id
FROM registrations_industry
WHERE industry_id = '$industryid'
)";
seems fine. The IN syntax is equivalent to a number of OR matches. For example
WHERE field_id IN (101,102,103,105)
is functionally equivalent to
WHERE (field_id = 101
OR field_id = 102
OR field_id = 103
OR field_id = 105)
You complicate it a bit by introducing a subquery, no problem. As long as your subquery returns one column (and yours does), passing it to IN will be fine.
In your case, you're comparing registrations.id to registrations_industry.registrar_id. (Note: This is just <table>.<field> syntax, nothing special, but helpful to disambiguate what tables your fields are in.)
This seems fine.
What happens
SQL would first run the subquery, generating a result set of registrar_ids where the industry_id was set as specified.
SQL would then run the outer query, replacing the subquery with its results and you would get rows from registrations where registrations.id matched one of the registrar_ids returned from the subquery.
Subqueries are helpful to debug your code, because you can pull out the subquery and run it separately, ensuring its output is as you expect.
Optimization
While subqueries are good for debugging, they can be slow, often slower than optimized JOIN statements.
And in this case, you can convert your query to a single-level query (without subqueries) by using a JOIN.
First, you'd start with basically the exact same outer query:
SELECT title,name,company,address1,address2
FROM registrations
WHERE title != 0 AND ...
But you're also interested in data from the registrations_industry table, so you need to include that. Giving us
SELECT title,name,company,address1,address2
FROM registrations, registrations_industry
WHERE title != 0 AND ...
We need to fix the ... and now that we have the registrations_industry table we can:
SELECT title,name,company,address1,address2
FROM registrations, registrations_industry
WHERE title != 0
AND id = registrar_id
AND industry_id = '$industryid'
Now a problem might arise if both tables have an id column -- since just saying id is ambiguous. We can disambiguate this by using the <table>.<field> syntax. As in
SELECT registrations.title, registrations.name,
registrations.company, registrations.address1, registrations.address2
FROM registrations, registrations_industry
WHERE registrations.title != 0
AND registrations_industry.industry_id = '$industryid'
We didn't have to use this syntax for all the field references, but we chose to for clarity. The query now is unnecessarily complex because of all the table names. We can shorten them while still providing disambiguation and clarity. We do this by creating table aliases.
SELECT r.title, r.name, r.company, r.address1, r.address2
FROM registrations r, registrations_industry ri
WHERE r.title != 0
AND ri.industry_id = '$industryid'
By placing r and ri after the two tables in the FROM clause, we're able to refer to them using these shortcuts. This cleans up the query but still gives us the ability to clearly specify which tables the fields are coming from.
Sidenote: We could be more explicit about the table aliases by including the optional AS, e.g. FROM registrations AS r rather than just FROM registrations r, but I typically reserve AS for field aliases.
If you run the query now you will get what is called a "Cartesian product" or in SQL lingo, a CROSS JOIN. This is because we didn't define any relationship between the two tables when, in fact, there is one. To fix this we need to reintroduce part of the original query that was lost: the relationship between the two tables
r.id = ri.registrar_id
so that our query now looks like
SELECT r.title, r.name, r.company, r.address1, r.address2
FROM registrations r, registrations_industry ri
WHERE r.title != 0
AND r.id = ri.registrar_id
AND ri.industry_id = '$industryid'
And this should work perfectly.
Nitpicking -- implicit vs. explicit joins
But the nitpicker in me needs to point out that this is called an "implicit join". Basically you're joining tables but not using the JOIN syntax.
A simpler example of an implicit join is
SELECT *
FROM foo f, bar b
WHERE f.id = b.foo_id
The corresponding explicit syntax is
SELECT *
FROM foo f
JOIN bar b ON f.id = b.foo_id
The result will be identical but it is using proper (and clearer) syntax. (It's clearer because it explicitly states that there is a relationship between the foo and bar tables and it is defined by f.id = b.foo_id.)
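Both the Cartesian product and the implicit/explicit equivalence can be seen on a toy dataset. The sketch below uses Python's built-in sqlite3 in place of MySQL; the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE registrations (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE registrations_industry (id INTEGER PRIMARY KEY,
                                         registrar_id INTEGER, industry_id INTEGER);
    INSERT INTO registrations VALUES (23, 'mike'), (24, 'anna');
    INSERT INTO registrations_industry VALUES (256, 23, 400), (159, 23, 284), (300, 24, 400);
""")

# No join condition at all: every row is paired with every row (2 x 3 = 6).
cross_count = conn.execute(
    "SELECT COUNT(*) FROM registrations r, registrations_industry ri"
).fetchone()[0]

# Implicit join: the relationship lives in the WHERE clause.
implicit_rows = conn.execute("""
    SELECT r.id, ri.industry_id
    FROM registrations r, registrations_industry ri
    WHERE r.id = ri.registrar_id
    ORDER BY r.id, ri.industry_id
""").fetchall()

# Explicit join: the relationship lives in the ON clause.
explicit_rows = conn.execute("""
    SELECT r.id, ri.industry_id
    FROM registrations r
    JOIN registrations_industry ri ON r.id = ri.registrar_id
    ORDER BY r.id, ri.industry_id
""").fetchall()

print(cross_count, implicit_rows, explicit_rows)
```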
We could similarly express your implicit query
SELECT r.title, r.name, r.company, r.address1, r.address2
FROM registrations r, registrations_industry ri
WHERE r.title != 0
AND r.id = ri.registrar_id
AND ri.industry_id = '$industryid'
explicitly as follows
SELECT r.title, r.name, r.company, r.address1, r.address2
FROM registrations r
JOIN registrations_industry ri ON r.id = ri.registrar_id
WHERE r.title != 0
AND ri.industry_id = '$industryid'
As you can see, the relationship between the tables is now in the JOIN clause, so that the WHERE and subsequent AND and OR clauses are free to express any restrictions. Another way to look at this is if you took out the WHERE + AND/OR clauses, the relationship between tables would still hold and the results would still "make sense" whereas if you used the implicit method and removed the WHERE + AND/OR clauses, your result set would contain rows that were misleading.
Lastly, the JOIN syntax by itself will cause rows that are in registrations, but do not have any corresponding rows in registrations_industry to not be returned.
Depending on your use case, you may want rows from registrations to appear in the results even if there are no corresponding entries in registrations_industry. To do this you would use what's called an OUTER JOIN. In this case, we want what is called a LEFT OUTER JOIN because we want all of the rows of the table on the left (registrations). We could have alternatively used RIGHT OUTER JOIN for the right table, or FULL OUTER JOIN for the outer join of both tables (note that MySQL does not support FULL OUTER JOIN directly).
Therefore our query becomes
SELECT r.title, r.name, r.company, r.address1, r.address2
FROM registrations r
LEFT OUTER JOIN registrations_industry ri ON r.id = ri.registrar_id
WHERE r.title != 0
AND ri.industry_id = '$industryid'
And we're done.
The end result is we have a query that is
faster in terms of runtime
more compact / concise
more explicit about what tables the fields are coming from
more explicit about the relationship between the tables
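One caveat worth checking with the LEFT OUTER JOIN version above: a WHERE condition on a column of the right-hand table throws away the NULL-filled rows again, so the result is the same as with a plain JOIN. If the unmatched registrations are really wanted, the industry filter would have to move into the ON clause. A small sketch with Python's built-in sqlite3 and made-up rows (registration 24 has no industry rows):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE registrations (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE registrations_industry (id INTEGER PRIMARY KEY,
                                         registrar_id INTEGER, industry_id INTEGER);
    INSERT INTO registrations VALUES (23, 'mike'), (24, 'anna');
    INSERT INTO registrations_industry VALUES (256, 23, 400), (159, 23, 284);
""")

# Filter in WHERE: rows where ri.industry_id is NULL fail the test and are dropped.
filter_in_where = conn.execute("""
    SELECT r.id, ri.industry_id
    FROM registrations r
    LEFT OUTER JOIN registrations_industry ri ON r.id = ri.registrar_id
    WHERE ri.industry_id = ?
    ORDER BY r.id
""", (400,)).fetchall()

# Filter in ON: unmatched registrations survive with NULL for the industry.
filter_in_on = conn.execute("""
    SELECT r.id, ri.industry_id
    FROM registrations r
    LEFT OUTER JOIN registrations_industry ri
           ON r.id = ri.registrar_id AND ri.industry_id = ?
    ORDER BY r.id
""", (400,)).fetchall()

print(filter_in_where)
print(filter_in_on)
```

Which of the two is right depends on whether registrations outside the requested industry should appear at all; for the original question (only registrations in that industry) the plain JOIN is enough.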
A simpler version of this query would be:
SELECT title, name, company, address1, address2
FROM registrations, registrations_industry
WHERE title != 0
AND id = registrar_id
AND industry_id = '$industryid'
Your version was a subquery, this version is a simple join. Your assumptions about your query are generally correct, but it is harder for SQL to optimize and a little harder to unravel to anyone trying to read the code. Also, you won't be able to extract the data from the registrations_industry table in that parent SELECT statement because it's not technically joining and the subtable is not a part of the parent query.
I've been editing a script for a company that had this application specifically developed for their business; now they have come to me wanting some upgrades. The entire application is in PHP and MySQL, minus a couple of Python scripts that import 200k records daily into the database. My problem is that I need to allow categorizing and editing notes on each record depending on its event type. The only way to do this is by the URL embedded within each record, for it's the only truly unique value. I successfully figured this out, but now the page script takes forever (24 secs) to load.
Could someone please assist me in optimizing this bit of code?
$notesq = mysql_query("SELECT * FROM `campaign_event_detail_v2` WHERE `call_recording_url`<>'' AND `event_type_name`='Call'") or die(mysql_error());
while($cnD = mysql_fetch_array($notesq)) {
$callid=$cnD[0];
$getD = mysql_query("SELECT campaign_notes.note, campaign_categories.category FROM campaign_notes LEFT JOIN campaign_categories ON campaign_notes.cid = campaign_categories.cid WHERE campaign_notes.cid='".$cnD['call_recording_url']."' OR campaign_categories.cid='".$cnD['call_recording_url']."'");
$getData = mysql_fetch_row($getD);
mysql_query("UPDATE `campaign_event_detail_v2` SET `note`='".$getData[0]."',`category_id`='".$getData[1]."' WHERE `id`='".$callid."'");
}
Your help is greatly appreciated!
Thanks,
J
I think you can probably manage this in a single query:
UPDATE campaign_event_detail_v2 d
LEFT JOIN campaign_notes n ON n.cid = d.call_recording_url
LEFT JOIN campaign_categories c ON c.cid = n.cid
SET d.note = n.note, d.category_id = c.category
WHERE d.call_recording_url != '' AND d.event_type_name = 'Call'
I'm not 100% sure if this is the proper logic, from what I understood it is. I must apologize if it's not. But, my point being: you can probably make it all in a single query.
You probably should add indices on columns like event_type_name, category_id and cid, if they're not already there. It will not affect your scripts, but will take some time to execute depending on how many records you have in your table(s).
Also, it probably would be better to use triggers instead of executing this at each request.
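The single-statement idea can be tried out on a toy dataset. The sketch below uses Python's built-in sqlite3, which does not accept MySQL's UPDATE ... JOIN syntax, so it swaps in correlated subqueries that have the same effect here (assuming at most one note and one category per URL); the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE campaign_event_detail_v2 (id INTEGER PRIMARY KEY, call_recording_url TEXT,
                                           event_type_name TEXT, note TEXT, category_id INTEGER);
    CREATE TABLE campaign_notes (cid TEXT, note TEXT);
    CREATE TABLE campaign_categories (cid TEXT, category INTEGER);
    INSERT INTO campaign_event_detail_v2 (id, call_recording_url, event_type_name) VALUES
        (1, 'http://x/1', 'Call'),
        (2, '',           'Call'),    -- skipped: empty URL
        (3, 'http://x/3', 'Email'),   -- skipped: not a call
        (4, 'http://x/4', 'Call');    -- no note/category: stays NULL
    INSERT INTO campaign_notes VALUES ('http://x/1', 'called back');
    INSERT INTO campaign_categories VALUES ('http://x/1', 5);
""")

# One statement replaces the PHP loop that ran a SELECT and an UPDATE per record.
conn.execute("""
    UPDATE campaign_event_detail_v2
    SET note = (SELECT n.note FROM campaign_notes n
                WHERE n.cid = campaign_event_detail_v2.call_recording_url),
        category_id = (SELECT c.category FROM campaign_categories c
                       WHERE c.cid = campaign_event_detail_v2.call_recording_url)
    WHERE call_recording_url != '' AND event_type_name = 'Call'
""")

rows = conn.execute(
    "SELECT id, note, category_id FROM campaign_event_detail_v2 ORDER BY id"
).fetchall()
print(rows)
```

With 200k records, the main saving is that the database does the matching in one pass instead of the script issuing two queries per row.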
It seems like you could improve this query:
SELECT campaign_notes.note, campaign_categories.category
FROM campaign_notes
LEFT JOIN campaign_categories
ON campaign_notes.cid = campaign_categories.cid
WHERE campaign_notes.cid = '{$cnD['call_recording_url']}'
OR campaign_categories.cid = '{$cnD['call_recording_url']}'
To this:
SELECT campaign_notes.note, campaign_categories.category
FROM campaign_notes
LEFT JOIN campaign_categories
ON campaign_categories.cid = campaign_notes.cid
WHERE campaign_notes.cid = '{$cnD['call_recording_url']}'
Something like this:
SELECT campaign_notes.note,
campaign_categories.category
FROM campaign_event_detail_v2 AS detail
LEFT JOIN campaign_notes ON campaign_notes.cid = detail.call_recording_url
LEFT JOIN campaign_categories ON campaign_notes.cid = campaign_categories.cid
AND campaign_notes.cid = detail.call_recording_url
WHERE detail.call_recording_url<>''
AND detail.event_type_name='Call'
AND campaign_notes.cid IS NOT NULL -- filters any from campaign notes that are empty
AND campaign_categories.cid IS NOT NULL -- filters any from campaign categories that are empty;
This joins notes and categories to the detail rows whose call_recording_url matches. It then filters out any rows that didn't match in the left joins (the IS NOT NULL checks). If that is still too slow, there are other ways of optimizing. Then you should be able to loop through and do your update statement.
I would also try to index the ids you are joining on, especially the call_recording_url if that is a string, which takes longer to search on.
I would create a stored procedure for this, and I would create some new indexes depending on the queries (for example, one index on call_recording_url, event_type_name).
And -- of course -- you can create a nicer query from the two first by mixing them to only one.
I have a page that pulls the user's post, username, xbc/xlk tags etc., which is perfect... BUT since I am pulling information from a MyBB bulletin board system, it's quite different. When replying, people are allowed to change the "Thread Subject" by simply replying and changing it.
I don't want it to SHOW the changed subject title, just the original title of all posts in that thread.
By default it replies with "RE:thread title". They can easily edit this, and it will show up in the "Subject" cell, so people won't know which thread it was posted in because the subject was changed when replying to the post.
So I just want to keep the original thread title when they are replying.
Make sense~??
Tables:mybb_users
Fields:uid,username
Tables:mybb_userfields
Fields:ufid
Tables:mybb_posts
Fields:pid,tid,replyto,subject,ufid,username,uid,message
Tables:mybb_threads
Fields:tid,fid,subject,uid,username,lastpost,lastposter,lastposteruid
I have tried multiple queries with no success:
$result = mysql_query("
SELECT * FROM mybb_users
LEFT JOIN (mybb_posts, mybb_userfields, mybb_threads)
ON (
mybb_userfields.ufid=mybb_posts.uid
AND mybb_threads.tid=mybb_posts.tid
AND mybb_users.uid=mybb_userfields.ufid
)
WHERE mybb_posts.fid=42");
$result = mysql_query("
SELECT * FROM mybb_users
LEFT JOIN (mybb_posts, mybb_userfields, mybb_threads)
ON (
mybb_userfields.ufid=mybb_posts.uid
AND mybb_threads.tid=mybb_posts.tid
AND mybb_users.uid=mybb_posts.uid
)
WHERE mybb_threads.fid=42");
$result = mysql_query("
SELECT * FROM mybb_posts
LEFT JOIN (mybb_userfields, mybb_threads)
ON (
mybb_userfields.ufid=mybb_posts.uid
AND mybb_threads.tid=mybb_posts.tid
)
WHERE mybb_posts.fid=42");
Your syntax isn't appropriate for carrying out multiple LEFT JOINs. Each join needs its own ON clause.
SELECT
*
FROM
mybb_users
LEFT JOIN mybb_userfields ON mybb_users.uid = mybb_userfields.ufid
LEFT JOIN mybb_posts ON mybb_userfields.ufid = mybb_posts.uid
LEFT JOIN mybb_threads ON mybb_posts.tid = mybb_threads.tid
WHERE
mybb_posts.fid = 42
This query should give the results you want. But it may not be the most efficient query for getting those results. Check the output of EXPLAIN as part of testing, to make sure it is not using table scans or anything like that.
Do all of these joins need to be LEFT JOINs? LEFT JOIN forces MySQL to join the tables in the indicated order, rather than allowing the query optimiser to determine the best order in which to join them. That's why you might need to be careful about the query execution plan. The main difference between JOIN and LEFT JOIN as far as query output is concerned is that LEFT JOIN resultsets will contain at least one row for each row of the table on the left-hand side of the join, whereas a regular JOIN will not contain a row if there aren't matches on the right-hand side of the join.
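The difference in output between JOIN and LEFT JOIN is easy to see on two tiny tables. This is a sketch with Python's built-in sqlite3 and made-up rows, where one user has never posted:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mybb_users (uid INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE mybb_posts (pid INTEGER PRIMARY KEY, tid INTEGER, uid INTEGER);
    INSERT INTO mybb_users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO mybb_posts VALUES (1, 1, 1);   -- only alice has posted
""")

# Regular JOIN: users without a matching post are left out.
inner_rows = conn.execute("""
    SELECT u.username, p.pid
    FROM mybb_users u
    JOIN mybb_posts p ON p.uid = u.uid
    ORDER BY u.uid
""").fetchall()

# LEFT JOIN: every user appears at least once, with NULL where nothing matched.
left_rows = conn.execute("""
    SELECT u.username, p.pid
    FROM mybb_users u
    LEFT JOIN mybb_posts p ON p.uid = u.uid
    ORDER BY u.uid
""").fetchall()

print(inner_rows)
print(left_rows)
```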
Edit: Also, you say that "I don't want it to SHOW the changed subject title, just the original title of all posts in that thread." This suggests that you only want a subset of the columns from these tables, in which case SELECT * is inappropriate.
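For the "original title" part, one possible approach is to take the subject from mybb_threads instead of mybb_posts and to list only the columns you need. The sketch below uses Python's built-in sqlite3 with made-up rows and only the fields listed in the question; the alias thread_subject is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mybb_threads (tid INTEGER PRIMARY KEY, fid INTEGER, subject TEXT);
    CREATE TABLE mybb_posts (pid INTEGER PRIMARY KEY, tid INTEGER, subject TEXT,
                             username TEXT, message TEXT);
    INSERT INTO mybb_threads VALUES (1, 42, 'Original title');
    INSERT INTO mybb_posts VALUES
        (1, 1, 'Original title',    'alice', 'first'),
        (2, 1, 'Totally different', 'bob',   'reply');   -- subject edited in the reply
""")

# The thread's subject is shown for every post, whatever the post's own subject says.
rows = conn.execute("""
    SELECT p.pid, t.subject AS thread_subject, p.username, p.message
    FROM mybb_posts p
    JOIN mybb_threads t ON t.tid = p.tid
    WHERE t.fid = ?
    ORDER BY p.pid
""", (42,)).fetchall()

print(rows)
```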