The first sentence of the Eager Loading section from the Laravel docs is:
When accessing Eloquent relationships as properties, the relationship
data is "lazy loaded". This means the relationship data is not
actually loaded until you first access the property.
In the last paragraph of this section it is stated:
To load a relationship only when it has not already been loaded, use
the loadMissing method:
public function format(Book $book)
{
$book->loadMissing('author');
return [
'name' => $book->name,
'author' => $book->author->name
];
}
But I don't see the purpose of $book->loadMissing('author'). Is it doing anything here?
What would be the difference if I just remove this line? According to the first sentence, the author in $book->author->name would be lazy-loaded anyway, right?
Very good question; there are subtle differences that aren't immediately obvious from reading the documentation.
You are comparing "Lazy Eager Loading" using loadMissing() to "Lazy Loading" using magic properties on the model.
The only difference, as the name suggests, is that:
"Lazy loading" only happens when the relation is used.
"Lazy eager loading" can happen before the relation is used.
So, practically, there's no difference unless you want to explicitly load the relation before its usage.
It's also worth noting that both the load and loadMissing methods let you customize the relation loading logic by passing a closure, which is not an option when using magic properties.
$book->loadMissing(['author' => function (Builder $query) {
$query->where('approved', true);
}]);
Which translates to "Load missing approved author if not already loaded" which is not achievable using $book->author unless you define an approvedAuthor relation on the model (which is a better practice, though).
To answer your question directly: yes, there won't be any difference if you remove:
$book->loadMissing('author');
in that particular example, as the relation is used right after it is loaded. However, there might be a few use cases where one wants to load the relation before it's used.
So, to overview how relation loading methods work:
Eager loading
Through the usage of with() you can "eager load" relationships at the time you query the parent model:
$book = Book::with('author')->find($id);
Lazy eager loading
To eager load a relationship after the parent model has already been retrieved:
$book->load('author');
There is also a variant that eager loads only the relations that are missing:
$book->loadMissing('author');
Unlike load(), the loadMissing() method filters the given relations and lazily "eager" loads only those that are not already loaded.
By accepting closures, both methods support custom relation loading logic.
Lazy loading
Lazy loading, which happens through magic properties, is there for the developer's convenience. It loads the relation when it is first used, so you don't need to load it beforehand.
@rzb has mentioned a very good point in his answer as well. Have a look.
I believe the accepted answer is missing one important fact that may mislead some: you cannot run loadMissing($relation) on a collection.
This is important because most use cases of lazy eager loading relationships are when you already have a collection and you don't want to commit the n+1 sin - i.e. unnecessarily hit the DB multiple times in a loop.
So while you can use load($relation) on a collection, if you only want to do it if the relationships haven't already been loaded before, you're out of luck.
It means: do not repeat the query.
To be clear about it:
If you call load() twice, the query is repeated even if the relationship is already loaded,
while loadMissing() checks whether the relationship has been loaded. It will not repeat the query if the relationship was already loaded by load() or with() (eager loading).
DB::enableQueryLog();
$user = User::find(1);
// see the query
$user->load('posts');
$user->load('posts');
$user->loadMissing('posts'); // put it on top to see the difference
dd(DB::getQueryLog());
That's what I think its purpose is.
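To make the difference easier to see without a Laravel install, here is a rough plain-PHP model of the behaviour described above. FakeModel, its queryCount property, and runQuery() are all made up for illustration; this is not Laravel's actual implementation, only a sketch of the "check before querying" idea.

```php
<?php
// Hypothetical stand-in for an Eloquent model (an assumption, not Laravel code).
final class FakeModel
{
    /** @var array<string, mixed> relations that have already been loaded */
    private array $relations = [];

    /** Number of simulated database queries issued so far. */
    public int $queryCount = 0;

    /** Always runs a query, even when the relation is already loaded. */
    public function load(string $relation): self
    {
        $this->relations[$relation] = $this->runQuery($relation);
        return $this;
    }

    /** Runs a query only when the relation has not been loaded yet. */
    public function loadMissing(string $relation): self
    {
        if (!array_key_exists($relation, $this->relations)) {
            $this->load($relation);
        }
        return $this;
    }

    /** Simulates one round trip to the database. */
    private function runQuery(string $relation): array
    {
        $this->queryCount++;
        return ["{$relation} row"];
    }
}

$user = new FakeModel();
$user->load('posts');        // query 1
$user->load('posts');        // query 2: load() does not check first
$user->loadMissing('posts'); // no query: 'posts' is already loaded

echo $user->queryCount, PHP_EOL; // prints 2
```

Moving the loadMissing() call to the top gives the same count in this sketch, because the two load() calls that follow still each run a query.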
Very useful for APIs
The choice between with, loadMissing, and load matters more in an API environment, where the results are serialized to JSON. In that case, lazy loading has no effect.
Let's say you have multiple relationships.
A book belongs to an author, and a book belongs to a publisher.
So first you might load it with one relationship:
$book->load('author');
and later, under a certain condition, you want to load another relationship into it:
$book->loadMissing('publisher');
But I don't see the purpose of $book->loadMissing('author');. Is it
doing anything here? What would be the difference if I just remove
this line? According to the first sentence, the author in
$book->author->name would be lazy-loaded anyway, right?
Suppose you have:
public function format(Book $book)
{
//book will not have the author relationship yet
return [
'name' => $book->name, //book will not have the author relationship loaded yet
'author' => $book->author->name //book will now have the author relationship
];
}
The difference between the code above and the code below is when the relationship is loaded and how much control you have over the property.
public function format(Book $book)
{
$book->loadMissing('author'); // book will now have the author relationship
return [
'name' => $book->name, // book has the author relationship loaded
'author' => $book->author->name // book has the author relationship loaded
];
}
Both answers here have covered pretty well what the technical difference is, so I'd refer you to them first. But the "why" isn't very evident.
Something I find myself preaching a lot lately is that Eloquent is really good at giving you enough rope to hang yourself with. By abstracting the developer so far away from the actual SQL queries being produced, especially with dynamic properties, it's easy to forget when your database hits are hurting your performance more than they need to.
Here's the thing. One query using an IN() statement on 1000 values takes about the same execution time as one query running on one value. SQL is really good at what it does; the performance hit usually comes with opening and closing the DB connection. It's a bit like going grocery shopping by making one trip to the market for each item, as opposed to getting it all done at once. Eager loads use the IN statements.
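As a rough illustration of the "one trip to the market" idea, here is a plain-PHP sketch that counts queries both ways. It assumes the pdo_sqlite extension is available and uses a made-up authors table; it only imitates what lazy and eager loading do, it is not what Eloquent runs internally.

```php
<?php
// Assumes pdo_sqlite; table, columns, and data are hypothetical.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)');
$pdo->exec("INSERT INTO authors (id, name) VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy')");

// Foreign keys as they might appear on five already-fetched books.
$authorIds = [1, 2, 2, 3, 1];

// Lazy-loading style: one query per book.
$perBook = $pdo->prepare('SELECT name FROM authors WHERE id = ?');
$lazyQueries = 0;
foreach ($authorIds as $id) {
    $perBook->execute([$id]);
    $perBook->fetchColumn();
    $perBook->closeCursor();
    $lazyQueries++;
}

// Eager-loading style: one query for all distinct keys via IN (...).
$uniqueIds = array_values(array_unique($authorIds));
$placeholders = implode(', ', array_fill(0, count($uniqueIds), '?'));
$batch = $pdo->prepare("SELECT id, name FROM authors WHERE id IN ($placeholders)");
$batch->execute($uniqueIds);
$authorsById = $batch->fetchAll(PDO::FETCH_KEY_PAIR); // [id => name]
$eagerQueries = 1;

// Match each book to its author in memory, with no further queries.
$names = array_map(fn (int $id): string => $authorsById[$id], $authorIds);

echo "lazy: $lazyQueries queries, eager: $eagerQueries query", PHP_EOL;
echo implode(', ', $names), PHP_EOL;
```

The lazy count grows with the number of books, while the batched version stays at one query however long the list gets.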
Lazy-loading is good for instances where you're handling too much data for your server's RAM to cope with, and in my opinion, not good for much else. It handles only one entry at any given moment. But it's reconnecting each time. I can't tell you the number of times I've seen Transformer classes, which should be responsible only for reformatting data as opposed to retrieving it, leveraging those dynamic properties and not realizing that the data wasn't already there. I've seen improvements as dramatic as reducing execution time from 30 minutes to 30 seconds just by adding a single line of eager-loading prior to the Transformer being called.
(By the way, batching might be considered the happy medium, and Eloquent's chunk() method offers that too.)
To answer your question a little more directly: if you're dealing with an instance where it's a one-to-one relationship, and it's going to be used in only one place, then functionally there is no difference between load, loadMissing, or a lazy-loading dynamic property. But if you have a many-to-many, it may be worthwhile to gather up all that data at once. One book can have many co-authors. One author can write many books. And if you're about to loop through large sets of either, go ahead and make the most of your trip to the market before you start cooking.
Related
I have two major questions, both related to eloquent's eager loading.
First question:
Since we already eager loaded the relationship, we have the loaded objects right there. For example,
Tag::with('posts');
will eager load all the posts for the tags. Then if I want to count the number of posts for the first tag, would it be better to use Tag::first()->posts()->count() or Tag::first()->posts->count()?
I know normally the answer would be the first one, because we don't want to load all the collections and instead we want to just do it using one query. However, in this case I assume we already loaded the collections, so it's like we already paid the cost, and therefore I feel like the second one would be better in this case?
Side Note:
I understand we can use withCount('posts') in this case, but my point here is to understand when to use relationship count vs collection count.
Second question:
I am also wondering: if we already used Tag::with('posts') and I want to count the posts for the first tag, would it be faster to use Tag::first()->posts()->count() / Tag::first()->posts->count(), or to use Tag::withCount('posts')->with('posts')?
My logic behind this is that withCount() is fast if we only need the count, but what if we already loaded the relationship and at the same time also want to get the count?
Counting is costly depending on how large your database is. One common approach is to use caching. With large databases, faster almost always means using cache, but caching will not be 100% accurate since cache should be updated at set times.
And, if you are going to run count on a large collection, you are bound to face memory issues.
For the first one, you could learn a lot by using a debugger to analyse what happens with each method. Also take note of Laravel's loadCount() method.
The structure that I've got is as follows
User table
id PK
name
email
username
password
UserHierarchy table
user_parent_id FK of User
user_child_id FK of User
(Composite primary key)
I've written these 2 relationships, in order to retrieve who is the father of a user and who is a child of a user
public function parent()
{
return $this->hasManyThrough(\App\Models\User::class, \App\Models\UserHierarchy::class, 'user_child_id', 'id', 'id', 'user_parent_id');
}
public function children()
{
return $this->hasManyThrough(\App\Models\User::class, \App\Models\UserHierarchy::class, 'user_parent_id', 'id', 'id', 'user_child_id');
}
In order to get all the children, grandchildren and so on, I've developed this additional relationship, which makes use of eager loading
public function childrenRecursive()
{
return $this->children()->with('childrenRecursive.children');
}
So far so good, when I find a User with an id, I can get all the downwards tree by using childrenRecursive. What I'm trying to achieve now is to re-use these relationships to filter a certain set of results, and by that I mean: When it's a certain User (for example id 1), I want a collection of Users that belong in his downward tree (children recursive) and his first direct parents as well.
$model->where(function ($advancedWhere) use ($id) {
    $advancedWhere->whereHas('parent', function ($advancedWhereHas) use ($id) {
        $advancedWhereHas->orWhere('user_child_id', $id);
        // I want all users that are recorded as his parents
    })->whereHas('childrenRecursive', function ($advancedWhereHas) use ($id) {
        // Missing code, I want all users that are recorded as his children and downwards
    });
})->get();
This is the complete tree I'm testing, and the result produced above (if I add a similar orWhere on the childrenRecursive) is that it returns every User that has a Parent-Child relationship. E.g., User 2 should return every number except 11 and 12, and it's returning every number except 11 (because 11 is not a child of anyone)
I'm going to answer your question first, but in the second half of the answer I have proposed an alternative which I strongly suggest adopting.
MySQL (unlike, incidentally, Microsoft SQL) doesn't have an option to write recursive queries. Accordingly, there is no good Laravel relationship to model this.
As such, there is no way for Laravel to do it other than naively, which, if you have a complex tree, is going to lead to many queries.
Essentially when you load your parent, you will only have access to its children (as a relationship collection). Then you would foreach through its children (and then their children, etc, recursively) to generate the whole tree. Each time you do this, it performs new queries for the child and its children. This is essentially what you're currently doing, and you will find that as your data set grows it is going to start becoming very slow. In the end, this provides you with a data structure on which you can apply your filters and conditions in code. You will not be able to achieve this in a single query.
If you are writing to the db a lot, i.e. adding lots of new children but rarely reading the results, then this may be your best solution.
(Edit: abr's comment below linked me to the release notes for MySQL 8 which does have this functionality. My initial response was based on MySQL 5.7. However, I'm not aware of Laravel/Eloquent having a canonical relationship solution employing this yet. Furthermore I have previously used this functionality in MSSQL and nested sets are a better solution IMO.
Furthermore, Laravel isn't necessarily coupled to MySQL - it just often is the db of choice. It will therefore probably never use such a specific solution to avoid such tight coupling.)
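For anyone curious what such a recursive query looks like, here is a hedged sketch. It runs against an in-memory SQLite database (assuming the pdo_sqlite extension) purely as a stand-in, since SQLite also supports WITH RECURSIVE; the user_hierarchy table name and the sample rows are assumptions based on the question, and the exact syntax should be checked against your own database's documentation before relying on it.

```php
<?php
// SQLite is used here only as a stand-in engine; data is made up.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE user_hierarchy (user_parent_id INTEGER, user_child_id INTEGER)');
// Tree: 1 -> 2 -> 3 -> 4, and 1 -> 5.
$pdo->exec('INSERT INTO user_hierarchy VALUES (1, 2), (2, 3), (3, 4), (1, 5)');

// Start from the direct children of :root, then keep joining to find their children.
$sql = <<<'SQL'
WITH RECURSIVE descendants (id) AS (
    SELECT user_child_id FROM user_hierarchy WHERE user_parent_id = :root
    UNION
    SELECT h.user_child_id
    FROM user_hierarchy h
    JOIN descendants d ON h.user_parent_id = d.id
)
SELECT id FROM descendants ORDER BY id
SQL;

$stmt = $pdo->prepare($sql);
$stmt->execute(['root' => 2]);
$ids = array_map('intval', $stmt->fetchAll(PDO::FETCH_COLUMN));

print_r($ids); // the descendants of user 2: ids 3 and 4
```

This fetches the whole downward tree in one round trip, but it is raw SQL rather than an Eloquent relationship, so the point above about there being no canonical relationship for it still stands.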
However, most hierarchical structures are read more often than they are written, in which case this is going to start stressing your server out considerably.
If this is the case, I would advise looking into:
https://en.wikipedia.org/wiki/Nested_set_model
We use https://github.com/lazychaser/laravel-nestedset which is an implementation of the above, and it works very well for us.
It is worth mentioning that it can be slow and memory intensive when we redefine the whole tree (we have around 20,000 parent-child relationships), but this only has to happen when we've made an error in the hierarchy that can't be unpicked manually and this is rare (we haven't done it in 6 months). Again, if you think you may have to do that regularly, this may not be the best option for you.
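To give a feel for why nested sets read so cheaply, here is a minimal plain-PHP sketch of the idea from the Wikipedia article. The lft and rgt key names and the hand-numbered tree are assumptions for illustration; this is not the laravel-nestedset API, just the underlying comparison.

```php
<?php
// Hand-numbered nested set for the tree 1 -> (2 -> (4, 5), 3).
// Each node stores a left and right bound; descendants sit strictly inside.
$nodes = [
    ['id' => 1, 'lft' => 1, 'rgt' => 10],
    ['id' => 2, 'lft' => 2, 'rgt' => 7],
    ['id' => 4, 'lft' => 3, 'rgt' => 4],
    ['id' => 5, 'lft' => 5, 'rgt' => 6],
    ['id' => 3, 'lft' => 8, 'rgt' => 9],
];

/**
 * Returns the ids of every descendant of $id.
 * In SQL this is a single range condition: lft > :lft AND rgt < :rgt.
 */
function descendantIds(array $nodes, int $id): array
{
    $parent = null;
    foreach ($nodes as $node) {
        if ($node['id'] === $id) {
            $parent = $node;
            break;
        }
    }
    if ($parent === null) {
        return [];
    }

    $ids = [];
    foreach ($nodes as $node) {
        if ($node['lft'] > $parent['lft'] && $node['rgt'] < $parent['rgt']) {
            $ids[] = $node['id'];
        }
    }
    return $ids;
}

print_r(descendantIds($nodes, 2)); // ids 4 and 5
```

The trade-off mentioned above shows up here too: reading a subtree is one range check, but inserting or moving a node means renumbering the bounds of other nodes.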
Is it possible to have an Eloquent query builder return StdClass rather than Model?
For example User::where('age', '>', 34)->get() returns a Collection of User models.
Whereas DB::table('users')->where('age', '>', 34)->get() returns a Collection of StdClass objects. Much faster.
Therefore:
Is it possible to prevent hydrating eloquent models and return StdClass objects as a database query builder would, but still leverage the usefulness of an eloquent query builder syntax?
Yes, it's possible using the getQuery or toBase method. For example:
User::where('age', '>', 34)->getQuery()->get();
or
User::where('age', '>', 34)->toBase()->get();
In my opinion,
Hydrating models rarely affects application performance
There are so many ORMs out there and if you look at any framework, these questions keep popping up - but the truth, as I've come to realize, is that ORMs hardly affect performance.
More often than not the culprits are the queries themselves and not
the ORM
Let me give you a few examples of why Eloquent models may perhaps be slower than DB facade queries:
1. Model events:
When you have model events (such as saving, creating, etc.) in your models, they sometimes slow down processing. Not to say that events should be avoided, you just need to be careful when and when not to use them
2. Loading Relationships:
Countless times have I seen folks load relationships using appends lists provided by Eloquent and sometimes models have 5-10 relationships. That's 5-10 joins each time you fire an Eloquent query! If you compare that with a DB facade query, it would definitely be faster. But then again, who's the real culprit? Not the ORM, it's the queries (with the extra joins!)
As an example, not so long ago someone asked a question on this, wondering why an Eloquent query was slower than a raw one. Check it out!
3. Not understanding what triggers an Eloquent query
This is by far the most prominent reason why people think ORMs are slower. They usually (not always) don't understand what triggers a query.
As an example, let's say you want to update a products table and set the price of product #25 to $250.
Perhaps, you write in your controller, the following:
$id = 25;
$product = Product::findOrFail($id);
$product->price = 250;
$product->save();
Then your colleague says, "Hey, this is super slow. Try using the DB facade." So you write:
$id = 25;
DB::table('products')->where('product_id', $id)->update(['price' => 250]);
And boom! It's faster. Again, the culprit isn't the ORM. It's the query. The one above is actually 2 queries, the findOrFail triggers a select * query and the save triggers an update query.
You can and should write this as a single query using Eloquent ORM like so:
Product::where('product_id', 25)->update(['price' => 250]);
Some Good Practices for Query Optimization
Have your database do most of the work instead of PHP: E.g. instead of iterating over Eloquent collections, perhaps frame your DB query in such a manner that the database does the work for you.
Mass Updates Over Single Updates: Pretty obvious. Avoid saving models in for loops, yuk!
For heavy queries, use transactions: DB transactions avoid re-indexing on every insert. If you really need to call say thousands of inserts/update queries in a single function call, wrap them into a transaction
Last but not the least, when in doubt check your query: If you're ever ever ever in doubt, that perhaps the ORM is the real culprit - think again! Check your query, try and optimize it.
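To illustrate the mass-update and transaction points above outside of Laravel, here is a small plain-PHP sketch. It assumes the pdo_sqlite extension and a made-up products table; in a Laravel app the DB facade's transaction helper would play the role of beginTransaction()/commit() here.

```php
<?php
// Assumes pdo_sqlite; table and values are hypothetical.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE products (product_id INTEGER PRIMARY KEY, price INTEGER)');

$insert = $pdo->prepare('INSERT INTO products (product_id, price) VALUES (?, ?)');

// Wrap the many writes in one transaction so they commit together.
$pdo->beginTransaction();
try {
    for ($id = 1; $id <= 1000; $id++) {
        $insert->execute([$id, 100]);
    }
    // Mass update: one statement instead of saving 500 rows in a loop.
    $pdo->exec('UPDATE products SET price = 250 WHERE product_id <= 500');
    $pdo->commit();
} catch (Throwable $e) {
    // Undo everything if any statement failed.
    $pdo->rollBack();
    throw $e;
}

$updated = (int) $pdo->query('SELECT COUNT(*) FROM products WHERE price = 250')->fetchColumn();
echo $updated, PHP_EOL; // prints 500
```

How much the transaction helps depends on the database engine and its settings, so it is worth measuring rather than assuming.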
If the ORM is slowing things down, use observers or the Laravel debugbar to compare the queries with and without the ORM. More often than not, you'll find that the queries are different, and the difference isn't in hydration but in the actual queries themselves!
Loading many small models is inefficient, and ->toBase() reduces that inefficiency. For a model with 4-5 persisted attributes averaging 5 to 10 characters each, the memory overhead of loading the model is 90+ percent. Even more inefficient is having many small models, and the definition of hell is when there is also a lot of traffic on them. At that point you should think about another persistence design for that data. Choose wisely when a model is the right home for a piece of data.
Let's say I have an AuthorsTable with a defined "belongs to many" association with the Articles table defined like so:
// In the initialize function of the AuthorsTable class.
$this->belongsToMany('Articles',
['joinTable' => 'authors_articles']
);
(I don't think that the nature of the join is relevant to the question, but just in the interest of giving full context.)
And now, I have an $author entity that was passed to my function that does not have the associated data loaded with it (i.e., it was created using something like $author = $authorsTable->get(19);, so it only has the information in the authors table, not from the articles table).
Is there some kind of entity function in which I can load the associated data, i.e., the articles data after the entity has already been created?
chriss's answer suggests using loadInto for this. For CakePHP 3, the PHP you need within any Author Entity method is:
TableRegistry::get($this->source())->loadInto($this, ['Articles']);
The same code from within an AuthorsTable method:
$this->loadInto($author, ['Articles']);
It's a little late and I don't know if this is still important to you, but maybe it will help someone else.
You can either use the model's loadInto (https://book.cakephp.org/3.0/en/orm/retrieving-data-and-resultsets.html#loading-additional-associations) or you can just use the Lazy Loading Plugin, mentioned in the docs (https://book.cakephp.org/3.0/en/orm/entities.html#lazy-loading-associations)
You can find the plugin here:
https://github.com/jeremyharris/cakephp-lazyload
The plugin works with both methods! If you prefer Eager Loading, you can use contain (normally faster). Otherwise, the associations will be loaded "on demand" (saving memory).
Personally, I prefer Eager Loading but use the Lazy Loading Plugin if another member of the team (out of ignorance) forgot to load the Associations.
Perhaps this is a question with a trivial answer, but nevertheless it has been driving me nuts for a couple of days, so I would like to hear an answer. I've recently been looking up a lot of information related to building a custom data mapper for my own project (and not using an ORM) and have read several threads on Stack Overflow and other websites.
It seems very convincing to me to have AuthorCollection objects, which are basically only a container of Author instances, or BookCollection objects, which hold multiple Book instances. But why would one need a mapper for the single Author object? All fetch criteria I can think of (except the one asking for the object with a specified BookID or AuthorID) will return multiple Book or Author instances, hence BookCollection or AuthorCollection instances. So why bother with a mapper for the single objects, if the one for the appropriate collection is more general and you don't have to be sure that your criteria will only return one result?
Thanks in advance for your help.
Short answer
You don't need to bother creating two mappers for Author and AuthorCollection. If your program doesn't need an AuthorMapper and an AuthorCollectionMapper in order to work smoothly and have a clean source, by all means, do what you're most comfortable with.
Note: Choosing this route means you should be extra careful looking out for SRP violations.
Long(er) answer
It all depends on what you're trying to do. For the sake of this post, let's call AuthorMapper an item data mapper and AuthorCollectionMapper a collection data mapper.
Typically, item mappers won't be as sophisticated as their collection mappers. Item mappers will normally only fetch by a primary key and therefore limit the results, making the mapper clean and uncluttered by additional collection-specific things.
One main part of these "collection-specific things" I bring up is conditions [1] and how they're implemented into queries. Often within collection mappers you'll probably have more advanced, longer, and tedious queries than what would normally be inside an item data mapper. Though entirely possible to combine your average item data mapper query (SELECT ... WHERE id = :id) with a complicated collection mapper query without using a smelly condition [2], it gets more complicated and still bothers the database to execute a lengthy query when all it needed was a simple, generic one.
Additionally, though you pointed out that with an item mapper we really only fetch by a primary key, it usually turns out to be radically simpler to use an item mapper for other things. An item mapper's save() and remove() methods can handle (with the right implementation) the job better than attempting to use a collection mapper to save/remove items. Along the same lines, a collection mapper's own save() and remove() methods may at times want to utilize the item mapper's methods.
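As a loose sketch of that split, something like the following could work. Everything here is an assumption for illustration: the class and method names beyond AuthorMapper and AuthorCollectionMapper, the authors table, the name-prefix condition, and the use of SQLite (via pdo_sqlite, including its INSERT OR REPLACE syntax). It also assumes PHP 8 for constructor property promotion.

```php
<?php
final class Author
{
    public function __construct(public int $id, public string $name) {}
}

/** Item mapper: fetches and saves one Author by primary key. */
final class AuthorMapper
{
    public function __construct(private PDO $pdo) {}

    public function fetch(int $id): ?Author
    {
        $stmt = $this->pdo->prepare('SELECT id, name FROM authors WHERE id = ?');
        $stmt->execute([$id]);
        $row = $stmt->fetch(PDO::FETCH_ASSOC);
        return $row === false ? null : new Author((int) $row['id'], $row['name']);
    }

    public function save(Author $author): void
    {
        // INSERT OR REPLACE is SQLite-specific; other databases need their own upsert.
        $stmt = $this->pdo->prepare('INSERT OR REPLACE INTO authors (id, name) VALUES (?, ?)');
        $stmt->execute([$author->id, $author->name]);
    }
}

/** Collection mapper: owns the condition-driven query and reuses the item mapper to save. */
final class AuthorCollectionMapper
{
    public function __construct(private PDO $pdo, private AuthorMapper $items) {}

    /** @return Author[] */
    public function fetchByNamePrefix(string $prefix): array
    {
        $stmt = $this->pdo->prepare('SELECT id, name FROM authors WHERE name LIKE ? ORDER BY id');
        $stmt->execute([$prefix . '%']);
        return array_map(
            fn (array $row): Author => new Author((int) $row['id'], $row['name']),
            $stmt->fetchAll(PDO::FETCH_ASSOC)
        );
    }

    /** @param Author[] $authors */
    public function save(array $authors): void
    {
        foreach ($authors as $author) {
            $this->items->save($author); // delegate per-item work to the item mapper
        }
    }
}

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)');

$items = new AuthorMapper($pdo);
$collections = new AuthorCollectionMapper($pdo, $items);
$collections->save([new Author(1, 'Adams'), new Author(2, 'Asimov'), new Author(3, 'Banks')]);

$found = $collections->fetchByNamePrefix('A');
echo count($found), PHP_EOL;          // prints 2
echo $items->fetch(3)->name, PHP_EOL; // prints Banks
```

The item mapper stays small and keyed on the primary key, while the collection mapper carries the condition logic and borrows save() from the item mapper instead of duplicating it.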
In response to your question below, there may be numerous times you may want to set conditions when deleting a collection of rows from the database. For example, you may have a spam flag that, when set, hides the post but self-destructs in thirty days. In that case you'd most likely have a condition for the spam flag and one for the time range. Another might be deleting all the comments under an answer thirty days after the answer was deleted. I mention thirty days because it's wise to at least keep this data for a little while in case someone should want their comment or it turns out the row with a spam flag isn't actually spam.
1. Condition here means a property set on the collection instance which the collection mapper's query knows how to handle. If you haven't already, check out @tereško's answer here.
2. This condition is different and refers to the "evil if" people speak of. If you don't understand their nefariousness, I'd suggest watching some Clean Code Talks. This one specifically, but all are great.