Complex query join with doctrine


I'm building a query to show items with their user, followed by the highest bid on each item.
Example:
Xbox 360 by james. - the highest bid was $55.
art table by mario. - the highest bid was $25.
Query
SELECT i, u
FROM AppBundle:Item i
LEFT JOIN i.user u
I have another table, bids (one-to-many relationship). I'm not sure how I can include the single highest bid for each item in the same query with a join.
I know I could just run another query after this one, through a function on the relationship, but I'd rather avoid that for optimisation reasons.
Solution
SQL
https://stackoverflow.com/a/16538294/75799 - But how is this possible in doctrine DQL?

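You can use IN with a subquery in such cases.
I am not sure I understood your model correctly, but I attempted to build your query with a QueryBuilder, and I am sure you will manage to make it work from this example:
// Assumed names: a Bid entity (AppBundle:Bid) with "item" and "value" fields.
// The subquery needs its own builder; "$sub = $qb" would only alias the same object.
$sub = $this->_em->createQueryBuilder()
    ->select('MAX(b2.value)')
    ->from('AppBundle:Bid', 'b2')
    ->where('b2.item = i');
$qb = $this->_em->createQueryBuilder();
$qb->select('i', 'u', 'b')
    ->from('AppBundle:Item', 'i')
    ->leftJoin('i.user', 'u')
    ->leftJoin('i.bids', 'b')
    // Keep only the bid whose value is the item's maximum.
    // Note: this WHERE drops items that have no bids at all.
    ->where($qb->expr()->in('b.value', $sub->getDQL()));
$query = $qb->getQuery();
return $query->getResult();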

Your SQL query may look something like
SELECT i.*, b.*
FROM item i
INNER JOIN bids b ON i.id = b.item_id
WHERE
b.value = (SELECT MAX(value) FROM bids WHERE item_id = i.id)
DQL supports subqueries only inside WHERE and HAVING conditions (not in FROM or JOIN), so you could try a HAVING clause or see whether Doctrine\ORM\Query\Expr offers anything.
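For reference, here is a minimal sketch of the correlated-subquery idea in plain SQL, run through PDO against an in-memory SQLite database so that it is self-contained. The table and column names (users, items, bids, item_id, value) are assumptions made up for illustration, not your actual schema. The same query shape should carry over to MySQL, but treat it as a starting point rather than a drop-in answer.

```php
<?php
// Hypothetical schema: users, items and bids are illustrative names only.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');
$pdo->exec('CREATE TABLE items (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT)');
$pdo->exec('CREATE TABLE bids (id INTEGER PRIMARY KEY, item_id INTEGER, value INTEGER)');
$pdo->exec("INSERT INTO users VALUES (1, 'james'), (2, 'mario')");
$pdo->exec("INSERT INTO items VALUES (1, 1, 'Xbox 360'), (2, 2, 'art table'), (3, 2, 'lamp')");
$pdo->exec('INSERT INTO bids VALUES (1, 1, 40), (2, 1, 55), (3, 2, 25), (4, 2, 10)');

// LEFT JOIN keeps items without bids; the correlated subquery in the ON
// clause keeps only the bid row holding the maximum value for each item.
$sql = 'SELECT i.title, u.name, b.value AS highest_bid
        FROM items i
        LEFT JOIN users u ON u.id = i.user_id
        LEFT JOIN bids b ON b.item_id = i.id
            AND b.value = (SELECT MAX(b2.value) FROM bids b2 WHERE b2.item_id = i.id)
        ORDER BY i.id';
$rows = $pdo->query($sql)->fetchAll(PDO::FETCH_ASSOC);

foreach ($rows as $row) {
    $bid = $row['highest_bid'] === null ? 'no bids yet' : '$' . $row['highest_bid'];
    echo "{$row['title']} by {$row['name']}: {$bid}\n";
}
```

One thing to watch: if two bids tie for the maximum on the same item, this returns one row per tied bid, so you may want a tie-breaker (for example the lowest bid id) depending on your rules.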
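To solve this for my own case, I added a method to the origin entity (Item) that finds the max entity in a list of entities (bids), using the Criteria class from Doctrine's Collections. I've written about it here.
Your Item entity would contain
// At the top of the entity file:
use Doctrine\Common\Collections\Criteria;

public function getMaxBid()
{
    // Sort descending so the highest bid comes first, then keep a single row.
    // "value" is the field name on the Bid entity.
    $criteria = Criteria::create()
        ->orderBy(['value' => Criteria::DESC])
        ->setMaxResults(1);
    // matching() returns a Collection; first() gives false when it is empty.
    return $this->bids->matching($criteria)->first();
}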

Unfortunately, there's no way that I know of to find the maximum bid and the bidder with one grouping query, but there are several techniques for making the logic work with several queries. You could do a sub-select, and that might work fine depending on the size of the table. If you're planning on growing to the point where that's not going to work, you're probably already looking at sharding your relational databases, moving some data to a less transactional, higher-performance db technology, or denormalizing. But if you want to keep this implemented in pure MySQL, you could use a procedure to express in multiple commands how to check a bid and optionally add it to the list, also updating the current high bidder in a denormalized high-bids table. This keeps the complex logic of how to verify the bid in one place, the most rigorously managed one: the database. Just make sure you use transactions properly to stop two bids from being recorded concurrently (e.g. SELECT ... FOR UPDATE).
I used to ask prospective programmers to write this query to see how experienced with MySQL they were. Many thought a plain max grouping was sufficient, and a few left the interview still convinced that it would work fine and that I was wrong. So good question!

Related

Correct way to handle loading doctrine entities with multiple associations

I'm currently building an eCommerce site using Symfony 3 that supports multiple languages, and have realised that the way I've designed the Product entity will require joining multiple other entities using DQL or the query builder to load things like the translations, product reviews and discounts/special offers. This means I am going to have a block of joins that is the same in multiple repositories, which seems wrong: we would have to hunt out all these blocks if we ever need to add or change a join to load extra product data.
For example in my CartRepository's loadCart() function I have a DQL query like this:
SELECT c,i,p,pd,pt,ps FROM
AppBundle:Cart c
join c.items i
join i.product p
left join p.productDiscount pd
join p.productTranslation pt
left join p.productSpecial ps
where c.id = :id
I will end up with something similar in the SectionRepository when I'm showing the list of products on that page, what is the correct way to deal with this? Is there some place I can centrally define the list of entities needed to be loaded for the joined entity (Product in this case) to be complete. I realise I could just use lazy loading, but that would lead to a large amount of queries being run on pages like the section page (a section showing 40 products would need to run 121 queries with the above example instead of 1 if I use a properly joined query).

One approach (this is just off the top of my head, someone may have a better one): you could reasonably easily have a centralised QueryBuilder function/service that does that. The QueryBuilder is very nice for programmatically building queries. The key difference between callers would be the root entity and the filtering entity.
E.g. something like this. Note of course these would not all be in the same place (they might be across a few services, repositories etc), it's just an example of an approach to consider.
public function getCartBaseQuery($cartId, $joinAlias = 'o') {
$qb = $this->getEntityManager()->createQueryBuilder();
$qb->select($joinAlias)
->from('AppBundle:Cart', 'c')
->join('c.items', $joinAlias)
->where($qb->expr()->eq('c.id', ':cartId'))
->setParameter('cartId', $cartId);
return $qb;
}
public function addProductQueryToItem($qb, $alias) {
    /** @var QueryBuilder $qb */
    $qb
        ->addSelect('p, pd, pt, ps')
        // The alias and the association name must be separated by a dot
        ->join($alias . '.product', 'p')
        ->leftJoin('p.productDiscount', 'pd')
        ->join('p.productTranslation', 'pt')
        // Left join, as in the original DQL, so products without a special are kept
        ->leftJoin('p.productSpecial', 'ps')
    ;
    return $qb;
}
public function loadCart($cartId) {
    // 'o' is the alias getCartBaseQuery() gives the cart items
    $qbcart = $someServiceOrRepository->getCartBaseQuery($cartId, 'o');
    $qbcart = $someServiceOrRepository->addProductQueryToItem($qbcart, 'o');
    return $qbcart->getQuery()->getResult();
}
Like I said, just one possible approach, but hopefully it gives you some ideas and a start at solving the issue.
Note: If you religiously use the same join alias for the entity you attach your product data to, you would not even have to specify it in the calls (but I would make it configurable myself).

There is no single correct answer to your question.
But if I have to make a suggestion, I'd say to take a look at CQRS (http://martinfowler.com/bliki/CQRS.html) which basically means you have a separated read model.
To make this as simple as possible, let's say that you build a separate "extended_product" table where all the data is already joined and de-normalized. This table may be populated at regular intervals by a background task, or by a command that gets triggered each time you update a product or a related entity.
When you need to read product data, you query this table instead of the original one. Of course, nothing prevents you from having several different extended tables, each with the data arranged in a different way.
In some way it's a concept very similar to database "views", except that:
it is faster, because you query an actual table
since you create that table via code, you are not limited to a single SQL query to process data (think filters, aggregations, and so on)
I am aware this is not exactly an "answer", but hopefully it may give you some good ideas on how to fix your problem.

How to write/optimize sql query that need to select data from 6 related tables until it get what it need

I am writing a PHP application on top of a MySQL database that is shared by 5-6 applications. Because of that, I cannot alter the database structure. I know many of you will say to do that first, but unfortunately I can't.
Here is my SQL fiddle of database schema, query that I am using, and desired output:
http://sqlfiddle.com/#!2/de7493/1
My solution works on this example database, but on the real production one, where some of these tables have more than 1m rows, my DB crashes when I try to run it. Even if I cut this SQL down to select from only 3-4 tables, it still crashes. Maybe this is not possible to do, maybe I am doing it wrong. Here is what I have to do:
I get cpv_id dynamically from the URL. In my example, cpv_id is 66113000. Based on that value, I have to discover which club offers are related to that cpv_id. Then, based on those offers, I have to discover which club members have some of those offers (club members are companies). Then, based on the club member id, I have to look up some information about the company that is a member of the club; among that data is the company's special_id. And based on that special_id I have to read the company reports.
So basically: based on cpv_id, I have to discover the company reports for the companies having club offers related to that id (simple, right?). As you can see from the way my tables are related in the SQLFiddle, I need to go through 6 tables to get what I really need. Once again, I cannot alter the database structure.
This is a very complex thing, and I am afraid that you will not understand what I need. I hope the SQLFiddle will help. If you have any more questions, please ask me.
So, considering that my query fails because the database crashes when I run it: is there any way to get the desired result? Can I optimize this query somehow, or do I need to write a different one, or do something else entirely? I am pretty lost, since I have never had to go this deep and read data from so many tables just to get the desired result.
Thanks,
Anita

This seems to do the same thing:
SELECT DISTINCT company_report.*
FROM company_report,
company,
users,
club,
club_offer,
club_offer_cpv
WHERE company_report.company_special_id = company.special_id AND
company.id = users.company_id AND
users.id = club.users_id AND
club.id = club_offer.club_id AND
club_offer.id = club_offer_cpv.club_offer_id AND
club_offer_cpv.cpv_id = 66113000
Other people will prefer joins, but I find this easier to read, and they are equivalent. It would look something like this:
SELECT DISTINCT company_report.*
FROM company_report
JOIN company ON company_report.company_special_id = company.special_id
JOIN users ON company.id = users.company_id
JOIN club ON users.id = club.users_id
JOIN club_offer ON club.id = club_offer.club_id
JOIN club_offer_cpv ON club_offer.id = club_offer_cpv.club_offer_id AND
club_offer_cpv.cpv_id = 66113000
Actually, that's not bad, I mean I might even prefer this last one.

Add indexes to the id columns your tables are related on, then try adding the tables one at a time using left outer joins.

Yii relation generates GROUP BY clause in the query

I have User, Play and UserPlay model. Here is the relation defined in User model to calculate total time, the user has played game.
'playedhours'=>array(self::STAT, 'Play', 'UserPlay(user_id,play_id)',
'select'=>'SUM(duration)'),
Now I am trying to find the duration sum for a given user id.
$playedHours = User::model()->findByPk($model->user_id)->playedhours / 3600;
This relation takes a long time to execute on a large amount of data, so I looked into the query generated by the relation.
SELECT SUM(duration) AS `s`, `UserPlay`.`user_id` AS `c0` FROM `Play` `t` INNER JOIN
`UserPlay` ON (`t`.`id`=`UserPlay`.`play_id`) GROUP BY `UserPlay`.`user_id` HAVING
(`UserPlay`.`user_id`=9);
The GROUP BY on UserPlay.user_id is what takes so long, and I don't need a GROUP BY clause here.
My question is: how can I avoid the GROUP BY clause in the above relation?

STAT relations are by definition aggregation queries; see Statistical Query.
You cannot remove GROUP BY here and still make a meaningful query for aggregate data. SUM(), AVG(), etc. are all aggregate functions; see GROUP BY Functions for a list of all aggregate functions supported by MySQL.
Your problem is that the user filter is applied in a HAVING clause. That is not required here: HAVING checks conditions after the aggregation takes place, so it is meant for conditions on the aggregates themselves, for example SUM(duration) > 500.
Basically, you are grouping every user separately first and only then filtering for the user id you want. If you instead use a WHERE clause, which filters before the aggregation rather than after it, only the rows for the user you want are grouped and your query will be much faster.
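To make the difference concrete, here is a small sketch that runs both variants against an in-memory SQLite database through PDO. The two tables are stand-ins I made up to mirror the Play and UserPlay models from the question, and the sample rows are invented. Both statements return the same total; the point is only where the user filter is applied.

```php
<?php
// Hypothetical tables mirroring the question's Play / UserPlay models.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE Play (id INTEGER PRIMARY KEY, duration INTEGER)');
$pdo->exec('CREATE TABLE UserPlay (user_id INTEGER, play_id INTEGER)');
$pdo->exec('INSERT INTO Play VALUES (1, 3600), (2, 1800), (3, 7200)');
$pdo->exec('INSERT INTO UserPlay VALUES (9, 1), (9, 2), (4, 3)');

// HAVING: every user is grouped and summed first, then all but one are discarded.
$having = $pdo->prepare(
    'SELECT SUM(t.duration) FROM Play t
     INNER JOIN UserPlay up ON t.id = up.play_id
     GROUP BY up.user_id HAVING up.user_id = :uid'
);
// WHERE: rows for other users are dropped before any aggregation happens.
$where = $pdo->prepare(
    'SELECT SUM(t.duration) FROM Play t
     INNER JOIN UserPlay up ON t.id = up.play_id
     WHERE up.user_id = :uid'
);

$having->execute([':uid' => 9]);
$where->execute([':uid' => 9]);
$viaHaving = (int) $having->fetchColumn();
$viaWhere = (int) $where->fetchColumn();

echo "HAVING: {$viaHaving} s, WHERE: {$viaWhere} s\n";
```

On a table with many users, the WHERE form lets the database use an index on user_id to touch only that user's rows, which is where the speed-up would come from.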
Although Active Record is good at modelling data in an OOP fashion, it
actually degrades performance due to the fact that it needs to create
one or several objects to represent each row of query result. For data
intensive applications, using DAO or database APIs at lower level
could be a better choice
Therefore it is best to change the relation into a model function that queries the DB directly using the CommandBuilder or the DAO API. Something like this:
class User extends CActiveRecord {
    ....
    public function getPlayedhours() {
        // Avoid running the query on a newly created object with no row loaded into it
        if (!isset($this->id))
            return 0;
        $played = Yii::app()->db->createCommand()
            ->select('SUM(duration)')
            ->from('play')
            ->join('user_play up', 'up.play_id = play.id')
            // Bind the id as a parameter instead of concatenating it into the SQL
            ->where('up.user_id = :uid', array(':uid' => $this->id))
            ->group('up.user_id')
            ->queryScalar();
        // queryScalar() returns false when no row matches
        if ($played == null)
            return 0;
        else
            return $played / 3600;
    }
    ....
}
If your query is still slow, try optimizing the indexes, implement a caching mechanism, and use the EXPLAIN command to figure out what is actually taking the time and, more importantly, why. If nothing is good enough, upgrade your hardware.

see many status or create a table to store all status

I have multiple tables with relationships.
Sometimes I need to do a join just to check whether status = true, and the query gets large and a little confusing.
I wanted to know how to approach this type of situation in large projects.
I was thinking of creating a table with parent and status columns to group all the conditions. In that case I would only need a simple query to check whether the status of the relationship is true or false.
like this:
select *
from table
where table.parent in (select id from tableB where status = 1)
or table.parent in (select id from tableC where status = 1)
or table.parent in (select id from tableD where status = 1)
Is this a good approach?
I have never tested it and do not know to what extent it is the best solution.
Thank you

I am a little confused. Do you want to redesign your data structure, or do you want to optimize your query?
Without a clear specification I can't offer an optimized data structure, but here are some optimization suggestions based on a few assumptions.
If the parent ids do not overlap between the tables (i.e. tableB, tableC and tableD have no ids in common), you can move the status field into your 'table' table.
If they do share some ids, the previous suggestion will not work. In that case you can use denormalization: add a status field to the 'table' table and keep it up to date whenever any of the statuses changes.
If you would rather keep your data structure, you can optimize your query by removing the sub-queries and using joins instead.
In most cases JOINs are faster than sub-queries, and it is very rare for a sub-query to be faster.
With JOINs, the RDBMS can create an execution plan that is better for your query: it can predict what data needs to be loaded for processing and so save time. With sub-queries, it may instead run all the queries and load all their data before doing the processing.
The good thing about sub-queries is that they are more readable than JOINs, which is why most people new to SQL prefer them; it is the easy way. But when it comes to performance, JOINs are better in most cases, and they are not hard to read either.
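As a sketch of what that rewrite could look like, the snippet below runs the IN-subquery form and a LEFT JOIN form side by side on an in-memory SQLite database through PDO and collects the matching ids from each. The table name child stands in for the question's 'table' (which is a reserved word), only two parent tables are shown, and all the rows are invented. It also assumes the parent ids are primary keys, so the joins cannot duplicate rows.

```php
<?php
// Hypothetical schema: "child" stands in for the question's main table.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE tableB (id INTEGER PRIMARY KEY, status INTEGER)');
$pdo->exec('CREATE TABLE tableC (id INTEGER PRIMARY KEY, status INTEGER)');
$pdo->exec('CREATE TABLE child (id INTEGER PRIMARY KEY, parent INTEGER)');
$pdo->exec('INSERT INTO tableB VALUES (1, 1), (2, 0)');
$pdo->exec('INSERT INTO tableC VALUES (3, 1), (4, 0)');
$pdo->exec('INSERT INTO child VALUES (10, 1), (11, 2), (12, 3), (13, 4)');

// Original shape: one IN sub-query per parent table.
$withSubqueries = 'SELECT c.id FROM child c
    WHERE c.parent IN (SELECT id FROM tableB WHERE status = 1)
       OR c.parent IN (SELECT id FROM tableC WHERE status = 1)
    ORDER BY c.id';

// Join shape: the status check moves into each ON clause, and a row
// qualifies when at least one of the joins found an active parent.
$withJoins = 'SELECT c.id FROM child c
    LEFT JOIN tableB b ON b.id = c.parent AND b.status = 1
    LEFT JOIN tableC tc ON tc.id = c.parent AND tc.status = 1
    WHERE b.id IS NOT NULL OR tc.id IS NOT NULL
    ORDER BY c.id';

$fromSubqueries = array_map('intval', $pdo->query($withSubqueries)->fetchAll(PDO::FETCH_COLUMN));
$fromJoins = array_map('intval', $pdo->query($withJoins)->fetchAll(PDO::FETCH_COLUMN));

echo 'sub-queries: ' . implode(', ', $fromSubqueries) . "\n";
echo 'joins:       ' . implode(', ', $fromJoins) . "\n";
```

Whether the join form is actually faster depends on your database and its version, so it is worth comparing the two with EXPLAIN on real data before committing to either.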

You really haven't given much information, or even asked a clear question :(
SUGGESTIONS:
1) Focus on your data design first.
2) Make sure your design allows you to query whatever you need from the data. For example, if you need to check "status" by date, then make sure you have datetime columns.
3) "Optimizing" queries comes later in the game. Make sure your queries are correct first, and worry about "optimization" later.
4) Tuning your database (for example, identifying and implementing indexes) is crucial, and should always be done in conjunction with 3).
Hope that helps!
PS:
If you have a specific question, please be sure to show some sample code.

Making an SQL query more efficient

I have a query that works, but it's taking at least 3 seconds to run, so I think it can probably be faster. It's used to populate a list of new threads and show how many unread posts there are in each thread. I generate the query string before throwing it into $db->query_read(). In order to only grab results from valid forums, $ids is a string with up to 50 values separated by commas.
The userthreadviews table has existed for 1 week and there are roughly 9,500 rows in it. I'm not sure if I need to set up a cron job to regularly clear out thread views more than a week old, or if I will be fine letting it grow.
Here's the query as it currently stands:
SELECT
`thread`.`title` AS 'r_title',
`thread`.`threadid` AS 'r_threadid',
`thread`.`forumid` AS 'r_forumid',
`thread`.`lastposter` AS 'r_lastposter',
`thread`.`lastposterid` AS 'r_lastposterid',
`forum`.`title` AS 'f_title',
`thread`.`replycount` AS 'r_replycount',
`thread`.`lastpost` AS 'r_lastpost',
`userthreadviews`.`replycount` AS 'u_replycount',
`userthreadviews`.`id` AS 'u_id',
`thread`.`postusername` AS 'r_postusername',
`thread`.`postuserid` AS 'r_postuserid'
FROM
`thread`
INNER JOIN
`forum`
ON (`thread`.`forumid` = `forum`.`forumid`)
LEFT JOIN
(`userthreadviews`)
ON (`thread`.`threadid` = `userthreadviews`.`threadid`
AND `userthreadviews`.`userid`=$userid)
WHERE
`thread`.`forumid` IN($ids)
AND `thread`.`visible`=1
AND `thread`.`lastpost`> time() - 604800
ORDER BY `thread`.`lastpost` DESC LIMIT 0, 30
An alternate query that joins the post table (to only show threads where user has posted) is actually twice as fast, so I think there's got to be something in here that could be changed to speed it up. Could someone provide some advice?
Edit: Sorry, I had put the EXPLAIN in front of the alternate query. Here is the correct output:
As Requested, here is the output generated by EXPLAIN SELECT:

Have a look at the MySQL EXPLAIN statement. It gives you the execution plan of your query.
Once you know the plan, you can check whether you have an index on each of the fields involved in it. If not, create them.
The plan may also reveal how the query could be written in another way so that it is better optimized.
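Here is a rough sketch of that check-the-plan, add-an-index, check-again loop. To keep it self-contained it uses an in-memory SQLite database through PDO, so the statement is SQLite's EXPLAIN QUERY PLAN rather than MySQL's EXPLAIN and the output format differs. The thread table is a cut-down, assumed version of the one in the question, and planFor() is a helper invented for this example.

```php
<?php
// SQLite stand-in for illustration; MySQL's EXPLAIN output looks different.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE thread (threadid INTEGER PRIMARY KEY, forumid INTEGER, title TEXT)');

// Hypothetical helper: returns the plan for a query as one readable string.
function planFor(PDO $pdo, string $sql): string
{
    $rows = $pdo->query('EXPLAIN QUERY PLAN ' . $sql)->fetchAll(PDO::FETCH_ASSOC);
    return implode('; ', array_column($rows, 'detail'));
}

$sql = 'SELECT title FROM thread WHERE forumid = 5';

// Without an index the only option is to scan the whole table.
$before = planFor($pdo, $sql);

// After adding an index on the filtered column, the plan should name it.
$pdo->exec('CREATE INDEX thread_forumid_index ON thread(forumid)');
$after = planFor($pdo, $sql);

echo "before: {$before}\nafter:  {$after}\n";
```

On MySQL the equivalent check is to run EXPLAIN on the query and look at the key column: NULL there means no index is being used for that table.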

Having no indexes on the columns used in your joins and WHERE conditions (key = NULL in the EXPLAIN output) is the reason your queries are slow. You should index them like this:
CREATE INDEX thread_forumid_index ON thread(forumid);
CREATE INDEX userthreadviews_threadid_userid_index ON userthreadviews(threadid, userid);
Documentation here

Try indexing the forumid column if it is not already indexed.

Suggestions:
move the conditions from the WHERE clause to the JOIN clause
put the JOIN with the conditions before the other JOIN
make sure you have proper indexes and that they are being used in the query (create the ones you need, but remember that too many indexes can be as bad as too few)
Here is my suggestion for the query:
SELECT
`thread`.`title` AS 'r_title',
`thread`.`threadid` AS 'r_threadid',
`thread`.`forumid` AS 'r_forumid',
`thread`.`lastposter` AS 'r_lastposter',
`thread`.`lastposterid` AS 'r_lastposterid',
`forum`.`title` AS 'f_title',
`thread`.`replycount` AS 'r_replycount',
`thread`.`lastpost` AS 'r_lastpost',
`userthreadviews`.`replycount` AS 'u_replycount',
`userthreadviews`.`id` AS 'u_id',
`thread`.`postusername` AS 'r_postusername',
`thread`.`postuserid` AS 'r_postuserid'
FROM
`thread`
INNER JOIN (`forum`)
ON ((`thread`.`visible` = 1)
AND (`thread`.`lastpost` > $time)
AND (`thread`.`forumid` IN ($ids))
AND (`thread`.`forumid` = `forum`.`forumid`))
LEFT JOIN (`userthreadviews`)
ON ((`thread`.`threadid` = `userthreadviews`.`threadid`)
AND (`userthreadviews`.`userid` = $userid))
ORDER BY
`thread`.`lastpost` DESC
LIMIT
0, 30
These are good candidates to be indexed:
- `forum`.`forumid`
- `userthreadviews`.`threadid`
- `userthreadviews`.`userid`
- `thread`.`forumid`
- `thread`.`threadid`
- `thread`.`visible`
- `thread`.`lastpost`
It seems you already have lots of indexes... so, make sure you keep the ones you really need and remove the useless ones.
