Doctrine2 Lazy vs Eager, or why it makes multiple queries - php

So I am using QueryBuilder in some of my projects, but in others I need to write raw SQL queries to get acceptable performance, as I have more than a million rows along with their relations...
What I find awful about QueryBuilder is that it issues several queries when you have relations. For example, I have a OneToMany relation from Product to Image and a ManyToOne on the inverse side.
My query is paginated, so it is limited to LIMIT 10 OFFSET 0 and so on. My Image entity has about 2.7 million rows, which is why I paginate. Doing this simple query, which fetches Image i plus Product p (because I need p.title), I end up with 1 query for my 10 images plus 10 more queries, one for each image's Product.
That's unneeded; it can be done with just 2 queries, one for Image and one for Product, and that's what I get using fetch="EAGER". But then I need to put fetch="EXTRA_LAZY" on the Product mapping, otherwise I'm back to 11 queries...
With just 10 images that isn't so bad, but when the user filters 500 images the response time gets higher and higher... That's why I've ended up writing raw queries: best performance, no extra queries (just 1 query that contains everything), BUT I can no longer work with objects the way QueryBuilder allows. I can't access image.product.title inside Twig to get the title; instead I need to do SELECT p.title AS product_title and call image.product_title, etc.
So I need to know why QueryBuilder is so bad at reading but so marvelous at persisting objects (easy, fast, clean...), and how I can work with huge DBs using QueryBuilder without losing performance and without getting tons of extra unneeded queries.
An example query is this one:
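(For reference, raw SQL and hydrated entities are not mutually exclusive: Doctrine's NativeQuery with a ResultSetMappingBuilder can hydrate Image objects, with their Product, from a hand-written query. This is a sketch only; the entity class names, table names and join column below are assumptions:)

```php
use Doctrine\ORM\Query\ResultSetMappingBuilder;

// Map raw SQL columns back onto entities (class/table names assumed).
$rsm = new ResultSetMappingBuilder($entityManager);
$rsm->addRootEntityFromClassMetadata('AcmeBundle\Entity\Image', 'i');
$rsm->addJoinedEntityFromClassMetadata('AcmeBundle\Entity\Product', 'p', 'i', 'product');

$sql = 'SELECT ' . $rsm->generateSelectClause()
     . ' FROM image i INNER JOIN product p ON p.id = i.product_id'
     . ' LIMIT 10 OFFSET 0';

// Hydrates Image entities, so image.product.title works in Twig again.
$images = $entityManager->createNativeQuery($sql, $rsm)->getResult();
```

This keeps the single-query plan of raw SQL while still returning objects, though hydration itself still costs more than fetching plain arrays.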
$qb = $this->createQueryBuilder('i');
$qb->innerJoin('i.product', 'p');
$qb->where('i.X = Y');
return $qb->getQuery()->getResult();
Using $qb->select('i, p') seems to use only one query, whose runnable raw SQL has an INNER JOIN (which is actually how it is supposed to work even WITHOUT the $qb->select()), but performance is still lower than a raw SQL query: raw SQL takes 500 ms for a 10,000-row query, while QB takes 1,100 ms. I know I won't use 10,000 rows, but there's a chance...
The question is still the same: what are the advantages and disadvantages, besides the object manipulation, which is gone with raw SQL? And when should I use LAZY or EAGER, and why, or why/when don't I need them at all?
All of this may end a discussion in my dev team once and for all, as I'm a QB lover.

Did you do something like this:
SELECT i FROM AcmeBundle:Image i JOIN i.product p WHERE ...
?
That would explain the numerous queries, because written that way Doctrine does not keep the fetched Product data hydrated.
Doing something like this instead tells Doctrine to actually keep the fetched data of both Image and Product:
SELECT i, p FROM AcmeBundle:Image i JOIN i.product p WHERE ...
Then you would need neither EAGER nor EXTRA_LAZY.
I might have missed the point of your question. If I have, please correct me and I might be able to suggest something else.
Edit:
$qb = $this->createQueryBuilder('i');
$qb->innerJoin('i.product', 'p');
$qb->addSelect('p'); // Very important: hints Doctrine to keep the fetched Product
$qb->where('i.X = Y');
return $qb->getQuery()->getResult();
Or using PARTIAL:
$qb = $this->createQueryBuilder('i');
$qb->innerJoin('i.product', 'p');
$qb->select('PARTIAL i.{image_field1, image_field2}', 'PARTIAL p.{product_field1, product_field2}'); // Very important: hints Doctrine to keep the fetched Product
$qb->where('i.X = Y');
return $qb->getQuery()->getResult();

Related

Doctrine second level result caching not working with deep fetch join query

I have the following query in a doctrine repository:
/**
 * @return AbstractSurveyPage[]
 */
public function getResultCacheTestPages(): array
{
    $select = $this->createQueryBuilder('page')
        ->select(['page', 'pageGroup', 'groupLabelText'])
        ->leftJoin('page.group', 'pageGroup')
        ->leftJoin('pageGroup.labelText', 'groupLabelText');

    // Both useResultCache(true) and setCacheable(true) are needed
    return $select->getQuery()->useResultCache(true)
        ->setCacheable(true)->getResult();
}
The query fetches page objects, their group objects and the group label eagerly.
However, I get the following error:
File /home/vagrant/web-projects/blueprint/vendor/doctrine/orm/lib/Doctrine/ORM/Cache/DefaultQueryCache.php:263
Message Undefined index: labelText
It seems that Doctrine cannot find the labelText association of the group object. When I remove 'groupLabelText' from the select clause, the query and the second level result caching work fine (but the group label objects are no longer fetched eagerly). When I disable caching with setCacheable(false), the query works fine.
It seems that Doctrine is unable to cache the results of fetch join queries whose joins go more than one level deep.
Is there a way to get this working? Or should I avoid deep fetch join queries and use lazy fetching for the joined objects? Entity and association second level caching works fine for me, so maybe the performance hit is not that bad. The code would also become simpler, since the fetch join queries are quite complicated (I use up to 25 joins and select-clause elements in real code).
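For what it's worth, a workaround consistent with the observation above is to keep the fetch join one level deep and let the second level entity/association cache serve labelText lazily. A sketch of the same repository method (not a confirmed fix, just the variant the question itself describes as working):

```php
public function getResultCacheTestPages(): array
{
    // Fetch pages and their groups eagerly; groupLabelText is left to
    // lazy loading backed by the second level association cache.
    $select = $this->createQueryBuilder('page')
        ->select(['page', 'pageGroup'])
        ->leftJoin('page.group', 'pageGroup');

    return $select->getQuery()
        ->useResultCache(true)
        ->setCacheable(true)
        ->getResult();
}
```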

Correct way to handle loading doctrine entities with multiple associations

I'm currently building an eCommerce site using Symfony 3 that supports multiple languages, and I've realised the way I've designed the Product entity will require joining multiple other entities via DQL/the query builder to load things like translations, product reviews and discounts/special offers. This means I'm going to have the same block of joins repeated across multiple repositories, which seems wrong: if we ever need to add or change a join to load extra product data, we'd have to hunt out all these blocks.
For example in my CartRepository's loadCart() function I have a DQL query like this:
SELECT c,i,p,pd,pt,ps FROM
AppBundle:Cart c
join c.items i
join i.product p
left join p.productDiscount pd
join p.productTranslation pt
left join p.productSpecial ps
where c.id = :id
I will end up with something similar in the SectionRepository when showing the list of products on that page. What is the correct way to deal with this? Is there some place I can centrally define the list of entities that need to be loaded for the joined entity (Product in this case) to be complete? I realise I could just use lazy loading, but that would lead to a large number of queries on pages like the section page (a section showing 40 products would run 121 queries with the above example, instead of 1 with a properly joined query).
One approach (this is just off the top of my head; someone may have a better one): you could fairly easily have a centralised query-builder function/service that does this. The query builder is very nice for programmatically building queries. The key difference would be the root entity and the filtering entity.
E.g. something like this. Note that of course these would not all be in the same place (they might be spread across a few services, repositories etc); it's just an example of an approach to consider.
public function getCartBaseQuery($cartId, $joinAlias = 'o')
{
    $qb = $this->getEntityManager()->createQueryBuilder();
    $qb->select($joinAlias)
        ->from('AppBundle:Cart', 'c')
        ->join('c.items', $joinAlias)
        ->where($qb->expr()->eq('c.id', ':cartId'))
        ->setParameter('cartId', $cartId);

    return $qb;
}

public function addProductQueryToItem($qb, $alias = 'o')
{
    /** @var QueryBuilder $qb */
    $qb
        ->addSelect('p, pd, pt, ps')
        ->join($alias.'.product', 'p') // note the "." before product
        ->leftJoin('p.productDiscount', 'pd')
        ->join('p.productTranslation', 'pt')
        ->join('p.productSpecial', 'ps')
    ;

    return $qb;
}

public function loadCart($cartId)
{
    $qbcart = $someServiceOrRepository->getCartBaseQuery($cartId);
    $qbcart = $someServiceOrRepository->addProductQueryToItem($qbcart);

    return $qbcart->getQuery()->getResult();
}
Like I said, just one possible approach, but hopefully it gives you some ideas and a start at solving the issue.
Note: if you religiously use the same join alias for the entity you attach your product data to, you would not even have to specify it in the calls (but I would make it configurable myself).
There is no single correct answer to your question.
But if I have to make a suggestion, I'd say to take a look at CQRS (http://martinfowler.com/bliki/CQRS.html), which basically means you have a separate read model.
To make this as simple as possible, let's say you build a separate "extended_product" table where all the data is already joined and de-normalized. This table may be populated at regular intervals by a background task, or by a command triggered each time you update a product or a related entity.
When you need to read product data, you query this table instead of the original ones. Of course, nothing prevents you from having many different extended tables with your data arranged in different ways.
In some ways it's a concept very similar to database "views", except that:
- it is faster, because you query an actual table
- since you create that table via code, you are not limited to a single SQL query to produce the data (think filters, aggregations, and so on)
I am aware this is not exactly an "answer", but hopefully it gives you some good ideas on how to fix your problem.
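As a sketch of the "populated by a command" idea, the rebuild step could be a single INSERT ... SELECT run by the background task. All table and column names here are invented for illustration:

```sql
-- Rebuild the de-normalized read table in one pass (hypothetical schema).
TRUNCATE extended_product;

INSERT INTO extended_product (product_id, title, translation, discount, special)
SELECT p.id, p.title, pt.text, pd.amount, ps.price
FROM product p
JOIN product_translation pt ON pt.product_id = p.id
LEFT JOIN product_discount pd ON pd.product_id = p.id
LEFT JOIN product_special ps ON ps.product_id = p.id;
```

Reads then hit extended_product directly, with no joins at query time.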

Yii relation generates GROUP BY clause in the query

I have User, Play and UserPlay models. Here is the relation defined in the User model to calculate the total time the user has played:
'playedhours'=>array(self::STAT, 'Play', 'UserPlay(user_id,play_id)',
'select'=>'SUM(duration)'),
Now I am trying to find the duration sum by user id:
$playedHours = User::model()->findByPk($model->user_id)->playedhours / 3600;
This relation takes a long time to execute on a large amount of data, so I looked into the query it generates:
SELECT SUM(duration) AS `s`, `UserPlay`.`user_id` AS `c0` FROM `Play` `t` INNER JOIN
`UserPlay` ON (`t`.`id`=`UserPlay`.`play_id`) GROUP BY `UserPlay`.`user_id` HAVING
(`UserPlay`.`user_id`=9);
The GROUP BY on UserPlay.user_id is taking a long time, and I don't need the GROUP BY here.
My question is: how do I avoid the GROUP BY clause in the above relation?
STAT relations are by definition aggregation queries, See Statistical Query.
You cannot remove the GROUP BY here and still have a meaningful aggregate query. SUM(), AVG(), etc. are all aggregate functions; see GROUP BY Functions for a list of all aggregate functions supported by MySQL.
Your real problem is that the filtering is done in a HAVING clause. That is not needed here: HAVING checks conditions after the aggregation takes place, which is what you would use for conditions such as SUM(duration) > 500.
Basically what is happening is that all users are grouped and summed first, and the result is then filtered down to the user id you want. If you instead use a WHERE clause, the filtering happens before the aggregation, so only the rows of the one user you want are aggregated, and your query will be much faster.
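Rewritten with the filter pushed into a WHERE clause, the same aggregate (using the tables from the generated query above) only ever touches one user's rows:

```sql
SELECT SUM(duration) AS `s`
FROM `Play` `t`
INNER JOIN `UserPlay` ON (`t`.`id` = `UserPlay`.`play_id`)
WHERE `UserPlay`.`user_id` = 9;
```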
Although Active Record is good at modelling data in an OOP fashion, it actually degrades performance due to the fact that it needs to create one or several objects to represent each row of query result. For data intensive applications, using DAO or database APIs at lower level could be a better choice
Therefore it is best to change the relation into a model function that queries the DB directly using the CommandBuilder or DAO API, something like this:
class User extends CActiveRecord
{
    ....

    public function getPlayedhours()
    {
        if (!isset($this->id)) // prevent the query from running on a newly created object with no row loaded into it
            return 0;

        $played = Yii::app()->db->createCommand()
            ->select('SUM(duration)')
            ->from('play')
            ->join('user_play up', 'up.play_id = play.id')
            ->where('up.user_id = :uid', array(':uid' => $this->id)) // bind the id instead of concatenating it
            ->group('up.user_id')
            ->queryScalar();

        return $played === null ? 0 : $played / 3600;
    }

    ....
}
If your query is still slow, try optimizing the indexes, implement a cache mechanism, and use the EXPLAIN command to figure out what is actually taking more time and, more importantly, why. If nothing is good enough, upgrade your hardware.

One SELECT (to fetch/ rule them all!) then handle the entire collection using Ruby array functions?

I'm wondering about looping through a set of rows and, for each row, fetching another table's set of rows. For example, looping through a series of categories and, for each category, fetching all of its news articles, perhaps to display on a single page. It seems like a lot of SELECT queries: 1 to get all categories, and one per category (to get its articles). So, my question is: is it quicker to simply do two fetches at the start:
categories = Category.all
articles = Articles.all
...and then just use select() or where() on articles by category id to take only those from the articles array? Replacing multiple SELECT queries with multiple array functions - which is quicker? I imagine the answer may vary per app, depending on the number of rows. I would be interested to hear what people think, or any links that clarify this, as I didn't find much on the matter myself.
My code example above is Ruby on Rails, but this question might actually apply to any language. I also use PHP from time to time.
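As a concrete sketch of the two-fetch approach, plain Ruby's group_by builds a category-id index over the articles in one pass, so each category lookup becomes a hash access rather than a query (Article here is a stand-in Struct, not an ActiveRecord model):

```ruby
Article = Struct.new(:id, :category_id, :title)

articles = [
  Article.new(1, 10, "A"),
  Article.new(2, 10, "B"),
  Article.new(3, 20, "C")
]

# One pass over the array; no further per-category queries needed.
articles_by_category = articles.group_by(&:category_id)

articles_by_category[10].map(&:title) # => ["A", "B"]
```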
It depends on what you want to do with your data. You could try eager loading.
categories = Category.includes(:articles)
Here's the documentation. http://guides.rubyonrails.org/active_record_querying.html#eager-loading-associations
I think you're describing what's called the N+1 problem (I'm new to this too). Here's another stack overflow question that addresses this issue generally: What is SELECT N+1?
N+1 is the worst, especially when you think about 10k or 10M articles, as timpone pointed out. For 10M articles you'll be hitting the DB 10,000,001 times for a single request (hence the name "N+1 problem"). Avoid this. Always. Anything is better than this.
If Category has a has_many relation to Article (and Article has a belongs_to relation to Category) you could use #includes to "pre-fetch" the association like so:
categories = Category.includes(:articles)
This will issue two queries, one for the categories and one for the articles. You could write it out as two explicit select/where statements, but I think doing it this way is semantically clearer. If you want to retrieve all the categories and then, for each category, all of its articles, you can write code like this (in Ruby):
categories.each do |category|
  category.articles.each do |article|
    # do stuff...
  end
end
and it's immediately clear that you mean "all the articles for this category instance".

Doctrine 2.1: Getting and assigning COUNT(t.id) from a subquery?

I have two entities in Doctrine 2.1: Category and Site. Each category has many sites, and each site has a parent category.
I would like to make a single update query (in DQL) which will update a field called count on the Category entity with the number of related sites.
So in SQL I would do something like this:
UPDATE categories c SET c.count = (SELECT COUNT(s.id) FROM sites s WHERE s.category_id = c.id);
This would work beautifully; in DQL it might look something like this:
UPDATE PackageNameBundle:Category c SET c.count = (SELECT COUNT(s.id) FROM PackageNameBundle:Site s WHERE s.category = c)
That attempt raises [Syntax Error] line 0, col 61: Error: Expected Literal, got 'SELECT'.
Subqueries DO work in DQL, but the problem here (as far as I can see) is that Doctrine cannot assign the value returned by the subquery to c.count. This is understandable, since I might fetch more than one field in the subquery, or even more than one row. It magically works in MySQL because it sees one row with one field and, for convenience, returns a single integer value. Doctrine, on the other hand, has to stay object oriented and support different engines where such conversions might not be available.
Finally, my question is:
What is the best way to do this in Doctrine, should I go with Native SQL or it can be done with DQL and how?
Thanks in advance!
EDIT: I just found this quote in the DQL Docs:
References to related entities are only possible in the WHERE clause and using sub-selects.
So, I guess assigning anything but a scalar value is impossible?
The main question remains though..
You can use native SQL queries in Doctrine as well, for that kind of specific query. DQL is powerful in its own way, but it's also limited due to performance constraints. Using a native SQL query and mapping the results will achieve the same thing, and there is no disadvantage in doing that.
The documentation explains it in detail.
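Concretely, the native route for the original UPDATE could look like the following sketch; executeUpdate is the DBAL call of the Doctrine 2.1 era, and a repository context is assumed:

```php
// Run the correlated-subquery UPDATE through the DBAL connection;
// no result-set mapping is needed since nothing is hydrated.
$sql = 'UPDATE categories c
        SET c.count = (SELECT COUNT(s.id) FROM sites s WHERE s.category_id = c.id)';

$affected = $this->getEntityManager()->getConnection()->executeUpdate($sql);
```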
