Prefetching data vs using ActiveRecord methods in a loop

In my MVC web app, I'm finding myself doing a lot of actions with ActiveRecords where I'll fetch a specific subset of Products from the database (for a search query, say), and then loop through again to display each one -- but to display each one requires several more trips to the database, to fetch things like price, who supplies them, and various other pieces of metadata. To calculate each of these pieces of metadata isn't very simple; it's not really something that could be achieved with a simple JOIN. However, it WOULD be possible (for most of these cases anyway) to batch the required database calls and do them all at once before the loop, and then within the loop refer to those pre-fetched data to do the various calculations.
Just as an example of the type of thing -- in a search, I might want to know what regions the product is provided by. In the database I have various rows which represent a particular supplier's stock of that item, and I can look up all the different suppliers which supply that item, and then get all the regions supplied by those suppliers. That fits nicely into one query, but it would start getting a bit complex to join into the original product search (wouldn't it?).
I have two questions:
1) Does anyone else do something similar to this, and does it sound like a good way to handle the problem? Or does it sound more like a design problem (perhaps the application has outgrown ActiveRecord's usefulness, or perhaps the models need to be split up and combined in different ways)?
2) If I do pre-fetch a bunch of different things I think I'll use inside the loop, I'm having a hard time deciding on the best way to pass the appropriate data back to the model. At the moment I'm using static methods on the model to fetch all the data I need up front, like fetchRegionsForProductIds(array $ids) and so forth; these methods return an array keyed by product ID, so inside the loop I can look up the regions for the current product and pass them as a parameter to the model method that needs them. But that seems kind of hacky and laborious to me. So, can anyone tell me if there is some really obvious and beautiful design pattern I'm missing which could totally resolve this for me, or if it's just a complex problem that needs a kind of ugly, complex solution?
Small update: I wonder if using a datamapper class would put me on the right track? Is there a common way of implementing a data mapper so that it can be told to do large batch queries up front, store that information in an array, and then drip feed it out to the records as they request it?
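To make the "batch up front, drip feed later" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the BatchLoader class, its prime() and get() methods, and the closure that stands in for a real batch query such as fetchRegionsForProductIds(). It is one possible shape, not an established pattern from any particular library.

```php
<?php
// Hypothetical helper: runs one batch fetch for many IDs, then serves results per ID.
final class BatchLoader
{
    /** @var callable takes a list of IDs, returns an array keyed by ID */
    private $fetchBatch;

    /** @var array<int, array> results already loaded, keyed by ID */
    private array $cache = [];

    public function __construct(callable $fetchBatch)
    {
        $this->fetchBatch = $fetchBatch;
    }

    /** Load everything not yet cached for the given IDs in a single call. */
    public function prime(array $ids): void
    {
        $missing = array_values(array_diff($ids, array_keys($this->cache)));
        if ($missing === []) {
            return;
        }
        $rows = ($this->fetchBatch)($missing);
        foreach ($missing as $id) {
            // IDs with no rows are cached as empty so they are not re-queried.
            $this->cache[$id] = $rows[$id] ?? [];
        }
    }

    /** Return the data for one ID, falling back to a single-ID fetch if unprimed. */
    public function get(int $id): array
    {
        if (!array_key_exists($id, $this->cache)) {
            $this->prime([$id]);
        }
        return $this->cache[$id];
    }
}

// Stand-in for a real batch query; counts how often it is called.
$queries = 0;
$regions = new BatchLoader(function (array $ids) use (&$queries): array {
    $queries++;
    $fake = [1 => ['North', 'East'], 2 => ['South']];
    return array_intersect_key($fake, array_flip($ids));
});

$productIds = [1, 2, 3];
$regions->prime($productIds);           // one batch call before the loop
foreach ($productIds as $id) {
    echo $id, ': ', implode(', ', $regions->get($id)), "\n";
}
```

A model could be handed the loader (or a mapper wrapping it) instead of receiving the pre-fetched arrays as method parameters, which removes some of the bookkeeping from the loop.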
I really hope this question makes sense; I've done the best I can to make it clear, and will happily add more detail if someone thinks they can have a go at it!

Related

Mvc and only selecting fields needed

I can't seem to find an acceptable answer to this.
There are two big things I keep seeing:
1) Don't execute queries in the controller. That is the responsibility of business or data.
2) Only select the columns that you need in a query.
My problem is that these two things kind of butt heads since what is displayed in the UI is really what determines what columns need to be queried. This in turn leads to the obvious solution of running the query in the controller, which you aren't supposed to do. Any documentation I have found googling, etc. seems to conveniently ignore this topic and pretend it isn't an issue.
Doing it in the business layer
Now if I take it the other way and query everything in the business layer then I implicitly am making all data access closely reflect the ui layer. This is more a problem with naming of query functions and classes than anything I think.
Take for example an application that has several views for displaying different info about a customer. The natural thing to do would be to name these data transfer classes the same as the view that needs them. But, the business or service layer has no knowledge of the ui layer and therefore any one of these data transfer classes could really be reused for ANY view without breaking any architecture rules. So then, what do I name all of these variations of, say "Customer", where one selects first name and last name, another might select last name and email, or first name and city, and so on. You can only name so many classes "CustomerSummary".
Entity Framework and IQueryable is great. But, what about everything else?
I understand that in Entity Framework I can have a data layer pass back an IQueryable whose execution is deferred and then just tell that IQueryable what fields I want. That is great. It seems to solve the problem. For .NET. The problem is, I also do PHP development. And pretty much all of the ORMs for PHP are designed in a way that totally defeats the purpose of using an ORM at all. And even those don't have the same ability as EF / IQueryable. So I am back to the same problem without a solution again in PHP.
Wrapping it up
So, my overall question is how do I get only the fields I need without totally stomping on all the rules of an n-tier architecture? And without creating a data layer that inevitably has to be designed to reflect the layout of the UI layer?

And pretty much all of the ORMs for php are designed in a way that totally defeat the purpose of using an ORM at all.
The Doctrine PHP ORM offers lazy loading down to the property / field level. You can have everything done through proxies that will only query the database as needed. In my experience letting the ORM load the whole object once is preferable 90%+ of the time. Otherwise if you're not careful you will end up with multiple queries to the database for the same records. The extra DB chatter isn't worthwhile unless your data model is messy and your rows are very long.
Keep in mind a good ORM will also offer a built-in caching layer. Populating a whole object once and caching it is easier and more extensible than having your code keep track of which fields you need to query in various places.
So my answer is don't go nuts trying to only query the fields you need when using an ORM. If you are writing your queries by hand just in the places you need them, then only query the fields you need. But since you are talking good architectural patterns I assume you're not doing this.
Of course there are exceptions, like querying large data sets for reporting or migrations. These will require unique optimizations.

Questions
1) Don't execute queries in the controller. That is the responsibility of business or data.
How you design your application is up to you. That being said, it's always best to consider established patterns and practices. The way I design my controllers is that I pass in the data layer (IRepository) through the constructor and inject it at run time.
public MyController(IRepository repo)
To query my data I simply call
repository.Where(x => x.Prop == "whatever")
Using IQueryable creates a leaky abstraction. It may not be a big deal, but you have to be mindful of how you use your objects, especially if they contain relational data. Once you have queried your data layer, you construct your view model in the controller action with only the data required for your view.
public ActionResult MyAction()
{
    var data = _repository.Single(x => x.Id == 1);
    // Copy only the fields the view needs into the view model.
    var vm = new MyActionViewModel {
        Name = data.Name,
        Age = data.Age
    };
    // Pass the view model to the view; without it the view receives no data.
    return View(vm);
}
If I had any queries that were complex, I would create a business layer to hold that logic. This would include enforcing business rules etc. In my business layer I would pass in the repository and use that.
2) Only select the columns that you need in a query.
With ORMs you usually pass back the whole object. After that you can construct your view model to include only the data you need.
My suggestion to your php problem is maybe to set up a web api for your data. It would return json data that you can then parse in whatever language you need.
Hope this helps.

The way I do it is as follows:
Have a domain object (entity, business object .. things with the same name) for Entities\Customer, that has all fields and associated logic for all of the data, that a complete instance would have. But for persistence create two separate data mappers:
Mappers\Customer for handling all of the data
Mappers\CustomerSummary for only important parts
If you only need to get a customer's name and phone number, you use the "summary mapper", but when you need to examine the user's profile, you have the "all data mapper". The same separation can be really useful when updating data too, especially if your "full customer" gets populated from multiple tables.
// code from a method of some service layer class
$customer = new \Model\Entities\Customer;
$customer->setId($someID);
// default to the cheap mapper, swap in the full one only when needed
$mapper = new \Model\Mappers\CustomerSummary($this->db);
if ($needEverything) {
    $mapper = new \Model\Mappers\Customer($this->db);
}
$mapper->fetch($customer);
As for, what goes where, you probably might want to read this old post.
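The two-mapper arrangement described above could look roughly like the following. This is only a sketch under stated assumptions: the class names (Customer, CustomerMapper, SummaryMapper, FullMapper) are hypothetical and deliberately not namespaced, and an in-memory $rows array stands in for the database connection so the example runs on its own.

```php
<?php
// Hypothetical entity: holds every field a complete customer would have.
final class Customer
{
    public int $id;
    public ?string $name = null;
    public ?string $phone = null;
    public ?string $address = null;

    public function __construct(int $id)
    {
        $this->id = $id;
    }
}

// Both mappers share one interface, so calling code does not care which it gets.
interface CustomerMapper
{
    public function fetch(Customer $customer): void;
}

final class SummaryMapper implements CustomerMapper
{
    private array $rows; // stand-in for a database handle

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    public function fetch(Customer $customer): void
    {
        // Roughly: SELECT name, phone FROM customers WHERE id = ?
        $row = $this->rows[$customer->id];
        $customer->name = $row['name'];
        $customer->phone = $row['phone'];
    }
}

final class FullMapper implements CustomerMapper
{
    private array $rows; // stand-in for a database handle

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    public function fetch(Customer $customer): void
    {
        // Roughly: SELECT name, phone, address FROM customers WHERE id = ?
        $row = $this->rows[$customer->id];
        $customer->name = $row['name'];
        $customer->phone = $row['phone'];
        $customer->address = $row['address'];
    }
}

$rows = [7 => ['name' => 'Ada', 'phone' => '555-0100', 'address' => '1 Main St']];

$needEverything = false;
$mapper = $needEverything ? new FullMapper($rows) : new SummaryMapper($rows);

$customer = new Customer(7);
$mapper->fetch($customer);   // address stays null with the summary mapper
```

The entity stays the same in both cases; only the mapper decides which columns are read, so the UI-driven choice of fields lives in the service layer rather than in the entity or the controller.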

Object Orientated Design with Databases and scalability/optimisation using PHP and mySQL

I'm currently at an impasse in regards to the structural design of my website. At the moment I'm using objects to simplify the structure of my site (I have a person object, a party object, a position object, etc...) and in theory each of these is a row from its respective table in the database.
Now from what I've learnt, OO Design is good for keeping things simple and easy to use/implement, which I agree with - it makes my code look so much cleaner and easier to maintain, but what I'm confused about is how I go about linking my objects to the database.
Let's say there is a person page. I create a person object, which equals one mysql query (which is reasonable), but then that person might have multiple positions which I need to fetch and display on a single page.
What I am currently doing is using a method called getPositions from the person object which gets the data from mysql and creates a separate position object for each row, passing in the data as an array. That keeps the queries down to a minimum (2 to a page) but it seems like a horrible implementation and to me, breaks the rules of object orientated design (should I want to change a mysql row, I'd need to change it in multiple places) but the alternative is worse.
In this case the alternative is just getting the IDs that I need and then creating separate positions, passing in the ID, which then fetches the row from the database in the constructor. If you have 20 positions per page, it can quickly add up, and I've read about how much WordPress is criticised for its high number of queries per page and its CPU usage. The other thing I'll need to consider in this case is sorting, and doing it this way means I'll need to sort the data using PHP, which surely can't be as efficient as doing it natively in MySQL.
Of course, pages will be (and can be) cached, but to me, this seems almost like cheating for poorly built applications. In this case, what is the correct solution?

The way you're doing it now is at least on the right track. Having an array in the parent object with references to the children is basically how the data is represented in the database.
I'm not completely sure from your question if you're storing the children as references in the parent's array, but you should be and that's how PHP should store them by default. If you also use a singleton pattern for your objects that are pulled from the database, you should never need to modify multiple objects to change one row as you suggest in your question.
You should probably also create multiple constructors for your objects (using static methods that return new instances) so you can either create them from their ID and have them pull the data, or create them from data you already have. The latter case would be used when you're creating children; you can have the parent pull all of the data for its children and create all of them using only one query. Getting a child from its ID will probably be used somewhere else, so it's good to have in case it's needed.
For sorting, you could create additional private (or public if you want) arrays that have the children sorted in a particular way with references to the same objects the main array references.
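A rough sketch of the ideas in this answer follows. It rests on assumptions: the Position class, its fromRow() and fromId() methods and the $fetchRow callback are hypothetical names, and the per-ID registry is closer to what is usually called an identity map than to a strict singleton. Rows are plain arrays standing in for query results.

```php
<?php
// Hypothetical class: one shared Position instance per database row ID.
final class Position
{
    /** @var array<int, Position> registry of instances already created */
    private static array $instances = [];

    public int $id;
    public string $title;

    private function __construct(int $id, string $title)
    {
        $this->id = $id;
        $this->title = $title;
    }

    /** Named constructor: build from data the parent already fetched. */
    public static function fromRow(array $row): self
    {
        $id = (int) $row['id'];
        if (!isset(self::$instances[$id])) {
            self::$instances[$id] = new self($id, $row['title']);
        }
        return self::$instances[$id];
    }

    /** Named constructor: build from an ID, querying only if not loaded yet. */
    public static function fromId(int $id, callable $fetchRow): self
    {
        return self::$instances[$id] ?? self::fromRow($fetchRow($id));
    }
}

// The parent fetches all child rows in one query (simulated here) ...
$rows = [
    ['id' => 1, 'title' => 'Treasurer'],
    ['id' => 2, 'title' => 'Chair'],
];
$positions = array_map([Position::class, 'fromRow'], $rows);

// ... and a later lookup by ID reuses the same object without another query.
$same = Position::fromId(2, function (int $id): array {
    throw new RuntimeException('should not hit the database');
});

// A second array sorted by title still points at the same objects.
$byTitle = $positions;
usort($byTitle, fn (Position $a, Position $b) => strcmp($a->title, $b->title));
```

Because both arrays hold references to the same instances, changing a position through one of them is visible through the other.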

How to query for multiple types of an object (using a key-value table) and grouping the results together as a complete object

I hope I asked the question properly. I have a table of objects grouped by object_id. They are stored as a key / value. I thought this would be simple but I cannot find a solution anywhere. I'm trying to get the most efficient method of querying against this table to return a full object based on multiple meta_name values. Here's the table structure:
Here's the code I have so far, which works great to query one value:
SELECT data2.object_id, data2.object, data2.meta_name, data2.value_string, data2.value_text
FROM meta_data AS data1
LEFT JOIN meta_data AS data2 ON (data1.object_id = data2.object_id)
WHERE data1.object = 'domain'
AND data1.meta_name = 'category'
AND data1.value_string = 'programmer'
This gives me the following results. This is great for a single taxonomy (domain in category programmer).
The problem comes when I want to query for all domains with category programmer AND color red AND possibly other meta_name = value_strings. I can find no solution for this outside of making multiple queries from PHP (which I want to avoid for obvious performance reasons).
I need to point out that objects will be created on the fly, and without a specific schema (which is the point of having this structure to begin with) so I cannot hard code and assume anything about an object (Objects may have more meta properties defined to them from the admin panel at any given time).
Again, I hope I am asking this question right, since I have been completely unlucky in finding a solution by searching online for the last 3 days.
Thank you so much ahead of time to the MySQL pro that can help me with this!

In situations like this, solutions typically query all the records at once to avoid multiple queries and then stitch the rows together into data objects in the desired format. You can then develop simple find() methods on those objects to filter the results further (e.g. using array functions).
If you're interested in exact implementation, I encourage you to look at WordPress - you noted taxonomies. As an open source project you can review their code for an example of how this is done. Take a look at the Taxonomies API as well as Meta API.
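The stitch-then-filter approach described in this answer might be sketched as below. The function names groupByObject() and findObjects() are made up for illustration, the sample rows are invented, and only the object_id, meta_name and value_string columns from the question are used. This is not how WordPress implements it, just a small example of the general idea.

```php
<?php
/** Collapse key/value rows into one associative array per object_id. */
function groupByObject(array $rows): array
{
    $objects = [];
    foreach ($rows as $row) {
        $objects[$row['object_id']][$row['meta_name']] = $row['value_string'];
    }
    return $objects;
}

/** Keep only the objects whose meta values match every name => value pair. */
function findObjects(array $objects, array $criteria): array
{
    return array_filter($objects, function (array $meta) use ($criteria): bool {
        // array_intersect_assoc keeps the criteria pairs that also exist in $meta.
        return count(array_intersect_assoc($criteria, $meta)) === count($criteria);
    });
}

// Invented sample rows, shaped like the result of a single query on meta_data.
$rows = [
    ['object_id' => 1, 'meta_name' => 'category', 'value_string' => 'programmer'],
    ['object_id' => 1, 'meta_name' => 'color',    'value_string' => 'red'],
    ['object_id' => 2, 'meta_name' => 'category', 'value_string' => 'programmer'],
    ['object_id' => 2, 'meta_name' => 'color',    'value_string' => 'blue'],
];

$objects = groupByObject($rows);
$matches = findObjects($objects, ['category' => 'programmer', 'color' => 'red']);
print_r(array_keys($matches));
```

The trade-off is that filtering happens in PHP after loading the rows, so this suits moderate data sizes; for large tables the filtering would need to move back into SQL.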

Database model and performance

I'm using a self-constructed database model, built for a webshop application. This is what it looks like:
First of all I have a table for my products. This contains only general data like id and articlenr. For all of the product attributes (like name, price, etc.) I have made separate tables per type, so I have the following tables:
product_att_varchar
product_att_decimal
product_att_int
product_att_select
product_att_text
product_att_date
these tables are related by a relational table procuct_att_relational
My problem is the performance of this structure: if I want all the attributes of a specific product, I have to use so many joins that it slows down considerably.
Does anyone have a solution for this???
Thanks

This model is called EAV (entity-attribute-value) and has its drawbacks and benefits.
Benefits are that it's very flexible and can be extended easily. It may be useful if you have very large number of very sparse attributes, the attributes cannot be predicted at design time (say, user-provided), or the attributes that are rarely used.
The drawbacks are performance and inability to index several attributes at the same time. However, if your database system allows indexed views (like SQL Server) or clustered storage of multiple tables (like Oracle), then by using these techniques performance can be improved.
However, storing all attributes in one record will still be faster.

I don't see any good reason to move those attributes out of the product table. It'd be one thing if you did it because you had some data that suggested a problem, but it looks like you thought "this will be better". Why did you do it this way right off the bat?
If you did it this way because it was generated for you, I'd recommend abandoning that generator.

People keep coming back to this model because they think it's "flexible". Well, it is I suppose, but that flexibility comes at a huge price: Every update and every query is slow and complex. Quassnoi mentions that if the attributes are sparse, i.e. most entity instances have only a small percentage of the possible attributes, this can save space. This is true, but the flip side is that if it is not sparse, this takes hugely more space, because now you have to store the attribute name or code for every attribute in addition to the value, plus you need to repeat some sort of key to identify the logical entity instance for every attribute.
The only time I can think of when this would be a good idea is if the list of attributes needs to be updatable on the fly, that is, a user needs to be able to decide to create a new attribute whenever he likes. But then what will the system do with this attribute? If you just want the user to be able to type it in and then later retrieve what he typed, easy enough. But will it affect processing in any way? Like, if the user decides to add a "clearance sale code", how will your program know how this affects the sale price? It could be done of course: You could have additional screens where the user enters data that somehow describes how each field affects pricing or re-ordering or whatever. But that would add yet more layers of complexity.
So my short answer is: Unless you have a very specialized requirement, don't do this. If you are trying to build a database describing items that you sell, with things like description and price and quantity on hand, then create one table with fields like description and price and quantity on hand. Life is hard enough without going out of your way to make it harder.

Doctrine2...Best hydration mode?

I am designing a room booking system which has nine entities, which all relate to each other. In this specific instance I am retrieving 10-30 rows from the entity entry which has 25 properties. Each entry has one room which has 10 properties. I need all of the entry information as well as entry->room->id and entry->room->name. But it seems like doctrine is loading the entire room when I use Query::HYDRATE_ARRAY. It seems to be lazy-loading in Query::HYDRATE_OBJECT more easily.
So, I am wondering if using the Query::HYDRATE_OBJECT mode is faster or "better" than Query::HYDRATE_ARRAY / Query::HYDRATE_SCALAR/ Query::HYDRATE_SINGLE_SCALAR. Since I am reusing some older code I'd like to use HYDRATE_ARRAY but only if it won't slow the application down.

My 2 cents:
HYDRATE_OBJECT is best for when you plan on using a lot of business logic with your objects. Especially if you're doing a lot of data manipulation. It's also probably the slowest (depending on the situation).
HYDRATE_ARRAY is usually reserved for when you only need a result plus one degree of relational data, and it's going to be used for printing/viewing purposes only.
HYDRATE_NONE is another one I use when I'm only selecting a very small subset of data (like one or two fields instead of the entire row). This behaves much like a raw query result would.
This might also be of interest http://www.doctrine-project.org/2010/03/17/doctrine-performance-revisited.html
This is from the 1.2 docs but I think the Hydration tips apply in 2.0 http://doctrine.readthedocs.org/en/latest/en/manual/improving-performance.html
Another important rule that belongs in this category is: Only fetch objects when you really need them. Doctrine has the ability to fetch "array graphs" instead of object graphs. At first glance this may sound strange because why use an object-relational mapper in the first place then? Take a second to think about it. PHP is by nature a procedural language that has been enhanced with a lot of features for decent OOP. Arrays are still the most efficient data structures you can use in PHP. Objects have the most value when they're used to accomplish complex business logic. It's a waste of resources when data gets wrapped in costly object structures when you have no benefit of that.
On using HYDRATE_ARRAY:
Can you think of any benefit of having objects in the view instead of arrays? You're not going to execute business logic in the view, are you? One parameter can save you a lot of unnecessary processing:
$blogPosts = $q->execute(array(1), Doctrine_Core::HYDRATE_ARRAY);
