Strange Doctrine behaviour with double innerJoin() - php

I have a database schema like this:
My database schema: http://i.stack.imgur.com/vFKRk.png
To explain the context: One user writes one message. He can send it to one or more users.
I managed to get the title of the message and its author for one user. However Doctrine, which I use for this project, does it with 2 queries. That seems strange to me and I'd like to understand why, since normally this can be done with a single SQL query.
My DQL query:
$q = Doctrine_Query::create()
    ->select('id_me, users_id_us, state_me, type_me, mc.title_mc, us.login_us')
    ->from('messages m')
    ->innerJoin('m.messages_content mc')
    ->innerJoin('mc.Users us')
    ->where('users_id_us = ?', $user)
    ->limit($opt['limit'])
    ->offset($opt['offset'])
    ->orderBy($opt['order']);
return $q->fetchArray();
SQL queries returned by Doctrine:
SELECT DISTINCT m3.id_me FROM messages m3 INNER JOIN messages_content m4 ON m3.messages_content_id_mc = m4.id_mc INNER JOIN users u2 ON m4.users_id_us = u2.id_us WHERE m3.users_id_us = '6' ORDER BY m3.id_me DESC LIMIT 2
SELECT m.id_me AS m__id_me, m.users_id_us AS m__users_id_us, m.state_me AS m__state_me, m.type_me AS m__type_me, m2.id_mc AS m2__id_mc, m2.title_mc AS m2__title_mc, u.id_us AS u__id_us, u.login_us AS u__login_us FROM messages m INNER JOIN messages_content m2 ON m.messages_content_id_mc = m2.id_mc INNER JOIN users u ON m2.users_id_us = u.id_us WHERE m.id_me IN ('11') AND (m.users_id_us = '6') ORDER BY m.id_me DESC
Why doesn't my Doctrine query generate a single SQL query like this:
SELECT m.id_me, m.users_id_us, m.state_me, m.type_me, mc.title_mc, u.login_us FROM messages m JOIN messages_content mc ON mc.id_mc = m.messages_content_id_mc JOIN users u ON u.id_us = mc.users_id_us WHERE m.users_id_us = 6;
Any idea how to transform my DQL query so that it executes as a single query?

The ORM Limit issue
This has to do with the LIMIT part. :) Doctrine's LIMIT works a bit differently than MySQL's LIMIT does.
MySQL's LIMIT just runs the query and stops searching as soon as n rows matching your SQL query are found. In an ORM this is really unexpected behaviour: in a flat (scalar) result set, SELECT * FROM myModel LEFT JOIN someOtherModel ON someCondition LIMIT 3 might well return only one myModel instance rather than three, since a single myModel joined to several someOtherModel rows can already produce 3 rows.
What does Doctrine do?
If your DQL query is FROM school s INNER JOIN s.students LIMIT 15, it means: give me 15 instances of school that have at least one student associated (hence the INNER JOIN), with ALL of their associated students. To do this, Doctrine first asks for DISTINCT school IDs with the exact same query parameters and the LIMIT part, to figure out which 15 school IDs should be returned. After this is done, those IDs are queried next, without the LIMIT part.
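As a hedged sketch of that two-pass behaviour (the school/students relation and the generated SQL below are only indicative, not the exact statements Doctrine will produce):
$q = Doctrine_Query::create()
    ->from('school s')
    ->innerJoin('s.students st')
    ->limit(15);
$schools = $q->fetchArray();

// Doctrine typically issues two statements along these lines:
//   1) SELECT DISTINCT s.id FROM school s
//        INNER JOIN students st ON st.school_id = s.id LIMIT 15
//   2) SELECT s.*, st.* FROM school s
//        INNER JOIN students st ON st.school_id = s.id
//        WHERE s.id IN (...the 15 ids found by query 1...)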
How to solve your issue
If you are not having an actual problem, huzzah, the explanation of this behaviour is above. If your query output is other than you expected, make sure you take these steps into consideration. If for instance your DQL is FROM school s INNER JOIN s.students LIMIT 15 and you are wondering why you get more than 15 students, try: FROM students s INNER JOIN s.school LIMIT 15. In MySQL this means basically the same thing (disregarding the order of the result), though in Doctrine it means you will get 15 students instead of 15 schools.
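In query-builder form the flipped query would look something like this (again, model and relation names are just the placeholders from the example above):
$q = Doctrine_Query::create()
    ->from('students s')
    ->innerJoin('s.school sc')
    ->limit(15);
$students = $q->fetchArray(); // 15 student records, each with its school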

This bothered me too with some of the more complex queries. The only solution that I found was to bypass the silliness altogether:
$q = Doctrine_Manager::getInstance()->getCurrentConnection();
$my_result = $q->fetchAssoc(" ... PUT SQL HERE ... ");
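Applied to this question, that bypass would look roughly like the following (a sketch only, reusing the table and column names from the schema above; as far as I recall, Doctrine 1.x's fetchAssoc() also accepts an array of bound parameters):
$conn = Doctrine_Manager::getInstance()->getCurrentConnection();
$rows = $conn->fetchAssoc(
    'SELECT m.id_me, m.users_id_us, m.state_me, m.type_me, mc.title_mc, u.login_us
     FROM messages m
     INNER JOIN messages_content mc ON mc.id_mc = m.messages_content_id_mc
     INNER JOIN users u ON u.id_us = mc.users_id_us
     WHERE m.users_id_us = ?
     ORDER BY m.id_me DESC',
    array($user)  // bound to the ? placeholder
);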

The solution which works:
I changed the relation alias and specified the columns used in the ON clause of the join between messages_content and users.
The right Doctrine query is:
$q = Doctrine_Query::create()
->select('id_me, users_id_us, state_me, type_me, mc.title_mc, us.login_us')
->from('messages m')
->innerJoin('m.messages_content mc')
->innerJoin('m.Users us ON mc.users_id_us=us.id_us')
->where('users_id_us = ?', $user)
->limit($opt['limit'])
->offset($opt['offset'])
->orderBy($opt['order']);
It gives a SQL query like this:
SELECT m.id_me AS m__id_me, m.users_id_us AS m__users_id_us, m.state_me AS m__state_me, m.type_me AS m__type_me, m2.id_mc AS m2__id_mc, m2.title_mc AS m2__title_mc, u.id_us AS u__id_us, u.login_us AS u__login_us FROM messages m INNER JOIN messages_content m2 ON m.messages_content_id_mc = m2.id_mc INNER JOIN users u ON (m2.users_id_us = u.id_us) WHERE (m.users_id_us = '7') ORDER BY m.id_me DESC LIMIT 2
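As before, the result can then be fetched in one go, e.g.:
$messages = $q->fetchArray();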
Tom and Pelle ten Cate, thanks for your participation.

Related

SQL Inner Join query doesn't return all records, but instead the first one ever made

SELECT user.name, comments.cdata, comments.likes FROM comments
WHERE pid = $postNum
INNER JOIN user ON comments.uid = user.uid
ORDER BY cdate
Quick Notes:
I am a beginner, please don't be rude to me, I am trying to learn more
Yes, I have tried LEFT JOIN, but that just returns an SQL syntax error
My database is like this:
2 tables: one is comments, which has comments.cdata, comments.likes and comments.uid; the user table has the name of the user.
What I have been trying to accomplish is getting the name of the user with the comment data, instead of UID and comment data.
I also cannot use 2 queries, because I fetch all the records and then display them on the page via a PHP foreach.
Your query is syntactically incorrect. JOIN is an operator in the FROM clause. WHERE is a clause that follows the FROM clause.
In addition, I suspect cdata and cdate are meant to be the same column, although I don't know which name is correct.
I also recommend using table aliases. So:
SELECT u.name, c.cdata, c.likes
FROM comments c JOIN
user u
ON c.uid = u.uid
WHERE c.pid = $postNum
ORDER BY c.cdata
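For completeness, a hedged sketch of running that single query from PHP with a prepared statement (it assumes a mysqli connection in $conn and the mysqlnd driver for get_result(); it orders by cdate, per the note about the column name above):
$stmt = $conn->prepare(
    'SELECT u.name, c.cdata, c.likes
     FROM comments c
     JOIN user u ON c.uid = u.uid
     WHERE c.pid = ?
     ORDER BY c.cdate'
);
$stmt->bind_param('i', $postNum); // bind the post number as an integer
$stmt->execute();
$result = $stmt->get_result();
while ($row = $result->fetch_assoc()) {
    // one pass over the rows; no second query needed
    echo htmlspecialchars($row['name']) . ': ' . htmlspecialchars($row['cdata']) . "\n";
}
$stmt->close();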

How to improve query performance (using explain command results f.e.)

I'm currently running this query. However, when run outside phpMyAdmin it causes a 504 timeout error. I suspect it has to do with how efficiently the rows are returned or accessed by the query.
I'm not extremely experienced with MySQL and so this was the best I could do:
SELECT
s.surveyId,
q.cat,
SUM((sac.answer_id*q.weight))/SUM(q.weight) AS score,
user.division_id,
user.unit_id,
user.department_id,
user.team_id,
division.division_name,
unit.unit_name,
dpt.department_name,
team.team_name
FROM survey_answers_cache sac
JOIN surveys s ON s.surveyId = sac.surveyid
JOIN subcluster sc ON s.subcluster_id = sc.subcluster_id
JOIN cluster c ON sc.cluster_id = c.cluster_id
JOIN user ON user.user_id = sac.user_id
JOIN questions q ON q.question_id = sac.question_id
JOIN division ON division.division_id = user.division_id
LEFT JOIN unit ON unit.unit_id = user.unit_id
LEFT JOIN department dpt ON dpt.department_id = user.department_id
LEFT JOIN team ON team.team_id = user.team_id
WHERE c.cluster_id=? AND sc.subcluster_id=? AND s.active=0 AND s.prepare=0
GROUP BY user.team_id, s.surveyId, q.cat
ORDER BY s.surveyId, user.team_id, q.cat ASC
The problem I have with this query is that when it returns a correct result it runs quickly (say +-500 ms), but when the result has twice as many rows it takes more than 5 minutes and then causes a 504 timeout.
The other problem is that I didn't create this database myself, so I didn't set the indices myself. I'm thinking of improving these and therefore I used the explain command:
I see a lot of primary keys and a couple of composite indexes, but I'm not sure whether this would affect the performance this much.
EDIT: This piece of code takes up all the execution time:
$start_time = microtime(true);
$stmt = $conn->query($query); //query is simply the query above.
while ($row = $stmt->fetch_assoc()){
$resultSurveys["scores"][] = $row;
}
$stmt->close();
$end_time = microtime(true);
$duration = $end_time - $start_time; // measured execution time; typically very high
So my question: Is it possible to (greatly?) improve the performance of the query by altering the database keys or should I divide my query into multiple smaller queries?
You can try something like this (although it's not practical for me to test it):
SELECT
sac.surveyId,
q.cat,
SUM((sac.answer_id*q.weight))/SUM(q.weight) AS score,
user.division_id,
user.unit_id,
user.department_id,
user.team_id,
division.division_name,
unit.unit_name,
dpt.department_name,
team.team_name
FROM survey_answers_cache sac
JOIN
(
SELECT
s.surveyId,
sc.subcluster_id
FROM
surveys s
JOIN subcluster sc ON s.subcluster_id = sc.subcluster_id
JOIN cluster c ON sc.cluster_id = c.cluster_id
WHERE
c.cluster_id=? AND sc.subcluster_id=? AND s.active=0 AND s.prepare=0
) AS v ON v.surveyid = sac.surveyid
JOIN user ON user.user_id = sac.user_id
JOIN questions q ON q.question_id = sac.question_id
JOIN division ON division.division_id = user.division_id
LEFT JOIN unit ON unit.unit_id = user.unit_id
LEFT JOIN department dpt ON dpt.department_id = user.department_id
LEFT JOIN team ON team.team_id = user.team_id
GROUP BY user.team_id, v.surveyId, q.cat
ORDER BY v.surveyId, user.team_id, q.cat ASC
So I hope I didn't mess anything up.
Anyway, the idea is that in the inner query you select only the rows you need, based on your WHERE condition. This creates a smaller temporary table, as it only pulls 2 fields, both ints.
Then, in the outer query, you join to the tables that you actually pull the rest of the data from, then order and group. This way you are sorting and grouping on a smaller dataset, and your WHERE clause can run in the most optimal way.
You may even be able to omit some of these tables, as you're only pulling data from a few of them, but without seeing the full schema and how it's related that's hard to say.
But generally speaking, this part (the sub-query):
SELECT
s.surveyId,
sc.subcluster_id
FROM
surveys s
JOIN subcluster sc ON s.subcluster_id = sc.subcluster_id
JOIN cluster c ON sc.cluster_id = c.cluster_id
WHERE
c.cluster_id=? AND sc.subcluster_id=? AND s.active=0 AND s.prepare=0
is the part directly affected by your WHERE clause, so we can optimize it first and then use it to join in the rest of the data you need.
An example of removing a table can easily be deduced from the above; consider this:
SELECT
s.surveyId,
sc.subcluster_id
FROM
surveys s
JOIN subcluster sc ON s.subcluster_id = sc.subcluster_id
WHERE
sc.cluster_id=? AND sc.subcluster_id=? AND s.active=0 AND s.prepare=0
The cluster table c is never used to pull data from, only in the WHERE clause. So isn't
JOIN cluster c ON sc.cluster_id = c.cluster_id
WHERE
c.cluster_id=?
the same as, or equivalent to,
WHERE
sc.cluster_id=?
It is, because the join condition already forces sc.cluster_id = c.cluster_id, and therefore we can eliminate that join completely.
The EXPLAIN result is showing signs of problems:
Using temporary; Using filesort: the ORDER BY needs to create temporary tables to do the sorting.
On the 3rd row, for the user table, type is ALL and key and ref are NULL: this means it needs to scan the whole table each time to retrieve results.
Suggestions:
Add indexes on user.cluster_id and on all fields involved in the ORDER BY and GROUP BY clauses. Keep in mind that the user table seems to live in a different database (cross-database query).
Add indexes on the user columns involved in JOINs (a sketch of these index additions follows this list).
Add an index on s.surveyId.
If possible, keep the same sequence for the GROUP BY and ORDER BY clauses.
According to the accepted answer in this question, move the JOIN on the user table to the first position in the join order.
Carefully read this official documentation. You may need to optimize the server configuration.
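A hedged sketch of some of those index additions, reusing the mysqli connection $conn from the timing snippet above (column names are taken from the query; verify them against SHOW CREATE TABLE and skip any index that already exists):
$conn->query('ALTER TABLE user
                  ADD INDEX idx_user_division (division_id),
                  ADD INDEX idx_user_unit (unit_id),
                  ADD INDEX idx_user_department (department_id),
                  ADD INDEX idx_user_team (team_id)');
$conn->query('ALTER TABLE surveys ADD INDEX idx_surveys_surveyid (surveyId)');
$conn->query('ALTER TABLE survey_answers_cache
                  ADD INDEX idx_sac_survey_user (surveyid, user_id)');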
PS: query optimization is an art that requires patience and hard work. No silver bullet for that.
Welcome to the fine art of optimizing MySQL!
I think the problem happens when you add this:
JOIN user ON user.cluster_id = sc.subcluster_id
JOIN survey_answers_cache sac ON (sac.surveyId = s.surveyId AND sac.user_id = user.user_id)
The extra condition sac.user_id = user.user_id can easily end up being inconsistent.
Can you try doing a second join with the user table?
P.S.: can you add a SHOW CREATE TABLE?

Cross Join to DQL

I'm trying to convert this (I think fairly simple) MySQL query into Doctrine DQL; however, I'm struggling quite a bit with it right now...
SELECT (c.prix-aggregates.AVG) AS test
FROM immobilier_ad_blank c
CROSS JOIN (
SELECT AVG(prix) AS AVG
FROM immobilier_ad_blank)
AS aggregates
Purpose of this: creating z-score.
Original implementation coming from this question Calculating Z-Score for each row in MySQL? (simple)
I thought about creating an association within the entity, but it's not really necessary; this is only for stats.
Edit: By the way, I don't want to use raw SQL; I will extract the "subquery" from another query builder expression using getDQL(). Otherwise, I would have to rewrite my dynamic query builder to take raw SQL into account.
Edit 2:
Tried this
$subQb = $this->_em->createQueryBuilder();
$subQb->addSelect("AVG(subC.prix) as AMEAN")
->from("MomoaIntegrationBundle:sources\Common", "subC");
$subDql = $subQb->getDQL();
$dql = "SELECT c.prix FROM MomoaIntegrationBundle:sources\Common c INNER JOIN ($subDql) AS aggregates";
Raw dql is:
SELECT c.prix FROM MomoaIntegrationBundle:sources\Common c INNER JOIN (SELECT AVG(subC.prix) as AMEAN FROM MomoaIntegrationBundle:sources\Common subC) AS aggregates
I get this strange error: line 0, col 70 near '(SELECT AVG(subC.prix)': Error: Class '(' is not defined.
Edit 3:
I found kind of a hackish way to make it work, but Doctrine is stubborn with its implementation of entities and forgets that STATISTICS do NOT need ENTITIES!
$subQb = $this->_em->createQueryBuilder();
$subQb->addSelect("AVG(subC.prix) as AMEAN")
->from("MomoaIntegrationBundle:sources\Common", "subC");
$sql = "SELECT (c.prix-aggregates.sclr_0) AS test FROM immobilier_ad_blank c CROSS JOIN "
. "({$subQb->getQuery()->getSQL()}) AS aggregates";
$stm = $this->_em->getConnection()->prepare($sql);
$stm->execute();
$data = $stm->fetchAll();
If you have a better solution, I'm all ears! I actually dislike this one.
Starting with Doctrine 2.4 it is possible to JOIN without using a defined association, for example:
SELECT u FROM User u JOIN Items i WITH u.age = i.price
This one doesn't make any sense but you get the point. The WITH keyword is absolutely required in this case, otherwise it is a syntax error, but you can just provide a dummy condition, like so:
SELECT u FROM User u JOIN Items i WITH 0 = 0
This essentially results in a cross join. Whether this is a good idea in a given situation is a different question, but I have encountered situations where this was indeed very useful.
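In query-builder form the same arbitrary join looks roughly like this (User and Items are just the placeholder entities from the example above):
$qb = $em->createQueryBuilder()
    ->select('u')
    ->from('User', 'u')
    // WITH 0 = 0 is the dummy condition that turns this into a cross join
    ->join('Items', 'i', \Doctrine\ORM\Query\Expr\Join::WITH, '0 = 0');
$users = $qb->getQuery()->getResult();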
For complex queries you might want to consider bypassing DQL and using a native query - especially since you don't need the result in an entity.
$connection = $em->getConnection();
$statement = $connection->prepare("
select c.prix - t1.avg as test
from immobilier_ad_blank c
cross join (
select avg(prix) as avg
from immobilier_ad_blank
) t1
");
$statement->execute();
$results = $statement->fetchAll();

MySQL GROUP BY taking so much time to fetch records

I want to query the database to fetch the last visit time of every user. Here is the query:
SELECT
u.user_id,
u.firstname,
u.lastname,
u.email,
pv.visit_time
FROM
users u
LEFT OUTER JOIN pageviews pv
ON u.user_id = pv.user_id
GROUP BY pv.user_id
LIMIT 0, 12
This query takes 30 to 40 seconds to execute on the live server; however, if I remove the GROUP BY clause it takes 3 to 6 seconds, but returns duplicate records. Any idea what's wrong with this query?
I have also tried DISTINCT, but ran into the same issue.
Thanks, any help would be appreciated.
What are your indexes?
Do you really want a left join? It seems unnecessary here. With a LEFT OUTER JOIN you will also get a row where pv.user_id is NULL (for users with no page views), with NULLs in the other pageviews columns as well.
Further, you are using GROUP BY to return a single row for each user. However, which row is returned is not defined, so any of a user's page view visit_times could be brought back.
Also, you have only a single column in the GROUP BY clause but other non-aggregated columns in the SELECT. With default options in MySQL this will work, but it will not work in most flavours of SQL, nor when MySQL performs the GROUP BY in strict mode (see this manual page).
Add an index on u.user_id and a compound index on pv.user_id and pv.visit_time (a sketch of these follows below the query). Then, assuming you want the latest visit time for each user, try the query as:
SELECT u.user_id,
u.firstname,
u.lastname,
u.email,
MAX(pv.visit_time)
FROM users u
INNER JOIN pageviews pv
ON u.user_id = pv.user_id
GROUP BY u.user_id, u.firstname, u.lastname, u.email
ORDER BY u.user_id
LIMIT 0, 12
(strictly speaking the ORDER BY clause is not required as it is implicitly done by the GROUP BY clause, but it does make it more explicit what is expected to anyone reading the code in future).
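A hedged sketch of creating those indexes from PHP (it assumes a mysqli connection in $conn; users.user_id is most likely the primary key already, so usually only the compound index on pageviews is needed):
$conn->query('ALTER TABLE pageviews ADD INDEX idx_pv_user_time (user_id, visit_time)');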
The GROUP BY clause and DISTINCT require a full scan of the table.
Maybe the query without the GROUP BY clause is only faster at returning the first rows; have you checked how long it takes to retrieve the whole result set?
If it takes only 3-6 seconds, I would refresh the statistics; maybe the optimiser is not making the best choices for the join (I imagine the pageviews table is a large one).
SELECT t1.x, t1.y, t1.z FROM table1 t1 GROUP BY t1.x, t1.y, t1.z ...
This will give better performance.
The GROUP BY fields (x, y, z) should match the columns in the SELECT statement to get better performance.
Try it (the GROUP BY operation will then happen within the result set of the above query).

MySql - Joining another table with multiple rows, inserting a query into another query?

I've been racking my brain for hours trying to work out how to join these two queries...
My goal is to return multiple venue rows (from venues) based on certain criteria... which is what my current query does....
SELECT venues.id AS ven_id,
venues.venue_name,
venues.sub_category_id,
venues.score,
venues.lat,
venues.lng,
venues.short_description,
sub_categories.id,
sub_categories.sub_cat_name,
sub_categories.category_id,
categories.id,
categories.category_name,
((ACOS( SIN(51.44*PI()/180)*SIN(lat*PI()/180) + COS(51.44*PI()/180)*COS(lat*PI()/180)*COS((-2.60796 - lng)*PI()/180)) * 180/PI())*60 * 1.1515) AS dist
FROM venues,
sub_categories,
categories
WHERE
venues.sub_category_id = sub_categories.id
AND sub_categories.category_id = categories.id
HAVING
dist < 5
ORDER BY score DESC
LIMIT 0, 100
However, I need to include another field in this query (thumbnail), which comes from another table (venue_images). The idea is to extract one image row based on which venue it's related to and its order. Only one image needs to be extracted, however, hence LIMIT 1.
I basically need to insert this query:
SELECT
venue_images.thumb_image_filename,
venue_images.image_venue_id,
venue_images.image_order
FROM venue_images
WHERE venue_images.image_venue_id = ven_id //id from above query
ORDER BY venue_images.image_order
LIMIT 1
Into my first query, and label this new field as "thumbnail".
Any help would really be appreciated. Thanks!
First of all, you could write the first query using INNER JOIN:
SELECT
...
FROM
venues INNER JOIN sub_categories ON venues.sub_category_id = sub_categories.id
INNER JOIN categories ON sub_categories.category_id = categories.id
HAVING
...
The result should be identical, but I like this one more.
What I'd like to do next is to JOIN a subquery, something like this:
...
INNER JOIN (SELECT ... FROM venue_images
WHERE venue_images.image_venue_id = ven_id //id from above query
ORDER BY venue_images.image_order
LIMIT 1) first_image
But unfortunately this subquery can't see ven_id, because it is evaluated first, before the outer query (I think it's a limitation of MySQL), so we can't use that and have to find another solution. And since you are using LIMIT 1, it's not easy to rewrite the condition you need using just JOINs.
It would be easier if MySQL provided a FIRST() aggregate function, but since it doesn't, we have to simulate it; see for example this question: How to fetch the first and last record of a grouped record in a MySQL query with aggregate functions?
So using this trick, you can write a query that extracts first image_id for every image_venue_id:
SELECT
image_venue_id,
SUBSTRING_INDEX(
GROUP_CONCAT(image_id order by venue_images.image_order),',',1) as first_image_id
FROM venue_images
GROUP BY image_venue_id
and this query could be integrated in your query above:
SELECT
...
FROM
venues INNER JOIN sub_categories ON venues.sub_category_id = sub_categories.id
INNER JOIN categories ON sub_categories.category_id = categories.id
INNER JOIN (the query above) first_image on first_image.image_venue_id = venues.id
INNER JOIN venue_images on first_image.first_image_id = venue_images.image_id
HAVING
...
I also added one more JOIN, to join the first image id with the actual image. I couldn't check your query, but the idea is to proceed like this.
Since the query is now becoming more complicated and difficult to maintain, I think it would be better to create a view that extracts the first image for every venue, and then just join the view in your query. This is just an idea. Let me know if it works or if you need any help!
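A hedged sketch of that view idea (it assumes a mysqli or PDO connection in $conn and that image_id is the primary key of venue_images, as in the trick query above):
$conn->query(
    "CREATE VIEW venue_first_image AS
     SELECT image_venue_id,
            SUBSTRING_INDEX(GROUP_CONCAT(image_id ORDER BY image_order), ',', 1) AS first_image_id
     FROM venue_images
     GROUP BY image_venue_id"
);
// The main query can then simply add:
//   INNER JOIN venue_first_image fi ON fi.image_venue_id = venues.id
//   INNER JOIN venue_images vi ON vi.image_id = fi.first_image_id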
I'm not too sure about your data but a JOIN with the thumbnails table and a group by on your large query would probably work.
GROUP BY venues.id
