MongoDB object mapping (PHP)

Introductory problem:
What is the best practice for building an object of my class T when I receive the data from MongoCursor::getNext()? As far as I can tell, MongoCursor::getNext() returns an array. From that point on I want to use the result as an object of type T.
Should I write my own constructor for type T that accepts an array? Is there a generic solution, for example one where type T extends G and G does the job in a uniform way, recursively (for nested documents)?
I'm new to MongoDB, and I'd like to build my own generic mapper with a nice interface.
Bounty:
What are the possible approaches and patterns, and which would best fit the concept of MongoDB from the point of view of PHP?

This answer has been rewritten.
Most data mappers work by representing one record per object of a class; "model" is the term normally coined for such a class. If you wish to allow access to multiple records through a single object (i.e. $model->find()), the method is normally designed so that it does not return an instance of the model itself but instead returns an array or a MongoCursor that loads model instances into memory.
Such a paradigm is normally associated with "Active Record". This is the method that ORMs, ODMs and frameworks all use to communicate with databases in one way or another, not only for MongoDB but also for SQL and any other databases that happen to crop up (Cassandra, CouchDB, etc.).
It should be noted immediately that even though active record gives a lot of power, it should not be blanketed across the entire application. There are times when using the driver directly would be more beneficial. Most ORMs, ODMs and frameworks provide the ability to quickly and effortlessly access the driver directly for this reason.
There is, as many would say, no lightweight data mapper. If you are going to map your returned data to classes then it will consume resources, end of story. The benefit of doing this is the power you receive when manipulating your objects.
Active record is really good at providing events and triggers from within PHP. A good example is an ORM I made for Yii: https://github.com/Sammaye/MongoYii . It can provide hooks for:
afterConstruct
beforeFind
afterFind
beforeValidate
afterValidate
beforeSave
afterSave
It should be noted that when it comes to events like beforeSave and afterSave, MongoDB does not possess triggers ( https://jira.mongodb.org/browse/SERVER-124 ), so it makes sense that the application should handle this. Beyond that obvious reason, handling it in the application also makes for much better save functions, since you can call native PHP functions to manipulate every document before it touches the database.
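A minimal sketch of how application-level hooks such as beforeSave and afterSave might be wired around a write. The names HookedModel and User, the static $store array (a stand-in for the driver call) and the $log property are hypothetical and are not MongoYii's actual implementation.

```php
<?php
// Hypothetical base class: save() runs the hooks around the write.
abstract class HookedModel
{
    // Stand-in for a collection; a real mapper would call the driver here.
    public static $store = [];
    public $attributes = [];

    // Return false from beforeSave() to veto the write.
    protected function beforeSave() { return true; }
    protected function afterSave() {}

    public function save()
    {
        if (!$this->beforeSave()) {
            return false;
        }
        static::$store[] = $this->attributes;
        $this->afterSave();
        return true;
    }
}

class User extends HookedModel
{
    public $log = [];

    protected function beforeSave()
    {
        // Normalise the document in PHP before it reaches the database.
        $this->attributes['username'] = strtolower(trim($this->attributes['username']));
        return $this->attributes['username'] !== '';
    }

    protected function afterSave()
    {
        $this->log[] = 'saved ' . $this->attributes['username'];
    }
}

$u = new User();
$u->attributes['username'] = '  Sammaye ';
$saved = $u->save();
```

Because the hooks are ordinary PHP methods, a subclass can veto or rewrite a document without the database needing trigger support.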
Most data mappers use PHP's own class semantics to represent their CRUD operations too. For example, to create a new record:
$d=new User();
$d->username='sammaye';
$d->save();
This is quite a good approach since you create a "new" instance of the class ( https://github.com/Sammaye/MongoYii/blob/master/EMongoDocument.php#L46 shows how I prepare for a new record in MongoYii ) to make a "new" record. It fits quite nicely semantically.
Update functions are normally accessed through read functions; you cannot update a model whose existence you don't know of. This brings us to the next step: populating models.
Different ORMs, ODMs and frameworks use different methods to populate a model. For example, my MongoYii extension uses a factory method called model in each class to bring back a new instance of itself so I can call the dynamic find, findOne and other such methods.
Some ORMs, ODMs and frameworks provide the read functions as static functions, making them factory methods themselves, whereas some use the singleton pattern; I chose not to ( https://stackoverflow.com/a/4596323/383478 ).
Most, if not all, implement some form of cursor. It is used to return multiple models and (normally) wraps the MongoCursor directly, replacing the current() method so that it returns a pre-populated model.
For example calling:
User::model()->find();
would return an EMongoCursor (in MongoYii), which stores the fact that the class User was used to instantiate the cursor. When iterated like this:
foreach(User::model()->find() as $k=>$v){
var_dump($v);
}
it calls the current() method here: https://github.com/Sammaye/MongoYii/blob/master/EMongoCursor.php#L102 which returns a new single instance of the model.
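A rough sketch of the cursor-wrapping idea, assuming nothing about the MongoDB driver: an ArrayIterator stands in for the driver cursor, and the names ModelCursor and populate() are hypothetical rather than taken from MongoYii. The wrapper overrides current() so that each step yields a hydrated model instead of a raw array.

```php
<?php
// Hypothetical model: populated from a plain array, as a driver would return it.
class User
{
    public $attributes = [];

    public static function populate(array $doc)
    {
        $model = new static();
        $model->attributes = $doc;
        return $model;
    }
}

// Wraps any iterator of arrays and hydrates one model per step, on demand.
class ModelCursor extends IteratorIterator
{
    private $modelClass;

    public function __construct(Traversable $cursor, $modelClass)
    {
        parent::__construct($cursor);
        $this->modelClass = $modelClass;
    }

    #[\ReturnTypeWillChange]
    public function current()
    {
        $doc = parent::current();
        return call_user_func([$this->modelClass, 'populate'], $doc);
    }
}

// ArrayIterator stands in for a driver cursor here.
$raw = new ArrayIterator([
    ['username' => 'sammaye'],
    ['username' => 'gandalf'],
]);

$names = [];
foreach (new ModelCursor($raw, 'User') as $user) {
    $names[] = $user->attributes['username'];
}
```

Only one model exists per iteration step, which is what keeps this cheaper than loading the whole result set into an array of models.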
Some ORMs, ODMs and frameworks implement eager array loading. This means they load the whole result straight into RAM as an array of models. I personally do not like this approach: it is wasteful, and it does not bode well when you need to use active record for larger updates, for example when new functionality requires a change to old records.
One last topic before I move on is the schemaless nature of MongoDB. The problem with using PHP classes with MongoDB is that you want all the functionality of PHP but with the variable nature of MongoDB. This is easy to overcome in SQL since it has a pre-defined schema: you just query for it and the job is done. MongoDB has no such thing.
This does make schema handling in MongoDB quite hazardous. Most ORMs, ODMs and frameworks demand that you pre-define the schema up front (e.g. Doctrine 2) using private variables with get and set methods. In MongoYii, to keep my life easy and elegant, I decided to retain MongoDB's schemaless nature by using magic methods ( https://github.com/Sammaye/MongoYii/blob/master/EMongoModel.php#L26 is my __get and https://github.com/Sammaye/MongoYii/blob/master/EMongoModel.php#L47 is my __set ). When a property is inaccessible in the class, __get checks whether the field is in an internal _attributes array and, if not, just returns null. Likewise, setting an attribute just sets it in the internal _attributes variable.
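The magic-method approach can be sketched as follows. This is an illustration under assumptions, not the MongoYii source: the class name SchemalessModel and the helper getRawDocument() are hypothetical, and only the _attributes idea comes from the description above.

```php
<?php
// Hypothetical schemaless model: undeclared fields live in $_attributes.
class SchemalessModel
{
    private $_attributes = [];

    public function __get($name)
    {
        // Only called when $name is not an accessible declared property.
        return array_key_exists($name, $this->_attributes) ? $this->_attributes[$name] : null;
    }

    public function __set($name, $value)
    {
        $this->_attributes[$name] = $value;
    }

    public function __isset($name)
    {
        return isset($this->_attributes[$name]);
    }

    // The array that would be handed to the driver on save.
    public function getRawDocument()
    {
        return $this->_attributes;
    }
}

$m = new SchemalessModel();
$m->username = 'sammaye';   // stored in $_attributes via __set
$missing = $m->nickname;    // no such field: null, and no notice is raised
```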
As for assigning this schema, I left internal assignment up to the user. To deal with setting properties from forms etc., however, I used the validation rules ( https://github.com/Sammaye/MongoYii/blob/master/EMongoModel.php#L236 ) by calling a function named getSafeAttributeNames(), which returns the list of attributes that have validation rules against them. Attributes in the incoming $_POST or $_GET array that have no validation rules are not set. This provides a flexible, yet secure, model structure.
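A minimal sketch of rule-driven mass assignment. The method name getSafeAttributeNames() comes from the text above; the class FormModel, the rule format and setAttributes() are assumptions made for illustration and may differ from what MongoYii does.

```php
<?php
// Hypothetical model: only attributes named in a validation rule may be mass-assigned.
class FormModel
{
    public $attributes = [];

    // Each rule: [comma-separated attribute names, validator name].
    public function rules()
    {
        return [
            ['username, email', 'required'],
            ['email', 'email'],
        ];
    }

    public function getSafeAttributeNames()
    {
        $names = [];
        foreach ($this->rules() as $rule) {
            foreach (array_map('trim', explode(',', $rule[0])) as $name) {
                $names[$name] = true;   // keyed to de-duplicate
            }
        }
        return array_keys($names);
    }

    public function setAttributes(array $input)
    {
        $safe = array_flip($this->getSafeAttributeNames());
        // Drop every key that has no rule, e.g. an injected "isAdmin".
        $this->attributes = array_merge($this->attributes, array_intersect_key($input, $safe));
    }
}

$model = new FormModel();
$model->setAttributes(['username' => 'sammaye', 'email' => 's@example.com', 'isAdmin' => true]);
```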
So we have covered how to use the root document; you also ask how data mappers handle subdocuments. Doctrine 2 and many others provide full class-based subdocuments ( http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/embedded-mapping.html ), but this can be extremely resource-intensive. Instead I decided to provide helper functions that allow flexible usage of subdocuments without eager loading them into models and so consuming RAM. Basically, I leave them as they are and provide a validator ( https://github.com/Sammaye/MongoYii/blob/master/validators/ESubdocumentValidator.php ) for validating inside them. The validator is self-spawning, so a rule inside the validator that uses the validator again to validate a nested subdocument will work.
I think that completes a very basic discussion of how ORMs, ODMs and frameworks use data mappers. I could probably write an entire essay on this, but this is a good enough discussion for the moment, I believe.

Related

Selectively expose functions based on external or internal calls to methods

This question closely resembles what I'm trying to achieve, but as has been indicated in excruciating detail in almost every answer to the question, it is bad-design.
PHP, distinguish between internal and external class method call
Here's what I'm trying to achieve:
Track all actions performed on low-activity configuration tables so that the changes can be propagated to production and QA databases. (configuration tables = Configuration is stored in tables.)
Here's a synopsis of my problem:
All models in Yii extend from the CActiveRecord class which provides some methods to manipulate instances of Models.
Let's break these into 2 categories:
Non-Primitives - Which trigger events like onBeforeDelete, onAfterFind, onAfterSave etc. (ex: https://github.com/yiisoft/yii/blob/1.1.13/framework/db/ar/CActiveRecord.php#L1061)
Primitives - Which directly create and execute commands without triggering events - i.e. act as query generators. (ex: https://github.com/yiisoft/yii/blob/1.1.13/framework/db/ar/CActiveRecord.php#L1684)
Now, these primitives are also public members and thus can be called from outside the class at the users' discretion - And they will modify the table without triggering any events.
These are the solutions I have come up with:
Lay down guidelines for all developers to use Non-primitive methods only.
Encapsulate CActiveRecord in my own model class and expose only non-primitives.
Case 1 will be easier to implement but will be more error prone since at some time some developer might forget the restriction and use a primitive method.
Case 2 will require me to write a lot of code / methods exposing methods I wish to be used. Also, this might cause confusion if both the Yii CActiveRecord and my ActiveRecord class don't have the same interface.
A better solution, in my opinion will be to allow usage of primitives internally while restricting external calls, i.e. using the private/public access specifiers. (This is already contradicted by the reason I provided in case 2, but this is the only solution I can come up with.) Since I cannot use private/public specifiers without encapsulating, and I cannot encapsulate, I'd like to distinguish within the method whether the function is an external call or an internal. debug_backtrace is a viable solution, but I'm here for a more elegant, less hacky solution, or a definitive statement that this cannot be done.

First you should take a step back and think about why there is such a difference in behavior. The methods you call non-primitive are supposed to be called on a model instance:
$ar = new Something();
$ar->update(...);
While the methods you call primitive are supposed to be called on the model itself:
Something::model()->updateByPk(...);
It stands to reason that it doesn't make sense to raise events in the second case because
you are not supposed to work directly with ::model() at all, and
depending on the method, the operation can affect multiple records for which there are no corresponding model instances in PHP
So the quest for a solution should start with you answering these two questions:
In a perfect world, how would you be notified when calling any method on the model? (Obviously the method would need to be primitive for the call to make sense).
In a perfect world, how would you be notified of an operation that affects an unknown (in PHP) number of records?

Do Abstract Factories use "new"?

I am trying to use Dependency Injection as much as possible, but I am having trouble when it comes to things like short-lived dependencies.
For example, let's say I have a blog manager object that would like to generate a list of blogs that it found in the database. The options to do this (as far as I can tell) are:
1. new Blog();
2. $this->loader->blog(); (the loader object creates various other types of objects like database objects, text filters, etc.)
3. $this->blogEntryFactory->create();
However, #1 is bad because it creates a strong coupling. #2 still seems bad because it means that the object factory has to be previously injected - exposing all the other objects that it can create.
Number 3 seems okay, but if I use #3, do I put the "new" keywords in the blogEntryFactory itself, OR, do I inject the loader into the blogEntryFactory and use the loader?
If I have many different factories like blogEntryFactory (for example I could have userFactory and commentFactory) it would seem like putting the "new" keyword across all these different factories would be creating dependency problems.
I hope this makes sense...
NOTE
I have had some answers about how this is unnecessary for this specific blog example, but there are, in fact, cases where you should use the Abstract Factory Pattern, and that is the point I am getting at. Do you use "new" in that case, or do something else?

I'm no expert, but I'm going to take a crack at this. This assumes that Blog is just a data model object that acts as a container for some data and gets filled by the controller (new Blog is not very meaningful). In this case, Blog is a leaf of the object graph, and using new is okay. If you are going to test methods that need to create a Blog, you have to simultaneously test the creation of the Blog anyway, and using a mock object doesn't make sense .. the Blog does not persist past this method.
As an example, say that PHP did not have an array construct but had a collections object. Would you call $this->collectionsFactory->create() or would you be satisfied to say new Array;?

In answer to the title: yes, abstract factories typically use new. For example, see the MazeFactory code on page 92 of the GoF book. It includes, return new Maze; return new Wall; return new Room; return new Door;
In answer to the note: a design that uses abstract factories to create data models is highly suspect. The purpose is to vary the behavior of the factory's products while making their concrete implementations invisible to clients. Data models with no behavior do not benefit from an abstract factory.
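To illustrate the point about new, here is a small sketch loosely modelled on the MazeFactory idea. The interface and class names other than MazeFactory and EnchantedMazeFactory are hypothetical, and the example is reduced to a single product (a wall) to stay short.

```php
<?php
interface Wall { public function describe(); }

class PlainWall implements Wall
{
    public function describe() { return 'plain wall'; }
}

class EnchantedWall implements Wall
{
    public function describe() { return 'enchanted wall'; }
}

// The abstract factory: clients depend only on this interface.
interface MazeFactory
{
    public function makeWall();
}

// Concrete factories are the one place where "new" names a concrete class.
class PlainMazeFactory implements MazeFactory
{
    public function makeWall() { return new PlainWall(); }
}

class EnchantedMazeFactory implements MazeFactory
{
    public function makeWall() { return new EnchantedWall(); }
}

// Client code never mentions PlainWall or EnchantedWall.
function buildWalls(MazeFactory $factory, $count)
{
    $walls = [];
    for ($i = 0; $i < $count; $i++) {
        $walls[] = $factory->makeWall()->describe();
    }
    return $walls;
}

$plain = buildWalls(new PlainMazeFactory(), 2);
$enchanted = buildWalls(new EnchantedMazeFactory(), 1);
```

The products here vary in behavior, which is the case where an abstract factory pays off; for a behavior-free data model the extra layer buys nothing.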

php oop MVC design - proper architecture for an application to edit data

Now that I have read an awful lot of posts, articles, questions and answers on OOP, MVC and design patterns, I still have questions about the best way to build what I want to build.
My little framework is built in an MVC fashion. It uses Smarty as the view layer, and I have a class set up as the controller that is called from the URL.
Where I think I get lost is the model part. I might be mixing models and classes/objects too much (or too little).
Anyway an example. When the aim is to get a list of users that reside in my database:
The application is called by e.g. "users/list". The controller then runs the function list, which opens an instance of a class "user" and asks that class to retrieve a list from the table. Once the list is returned, the controller pushes it to the view by assigning the result set (an array) to the template and setting the template.
The user would then click on a line in the table, which would tell the controller to start "user/edit", for example, which would in turn create a form and fill it with the user data for me to edit.
So far so good.
Right now I have all of that combined in one user class, so that class has functions such as create, getMeAListOfUsers and update, and properties like hairType and noseSize.
But proper OOP design would want me to separate "user" (with properties like login name, big nose, curly hair) from "get me a list of users", which feels more like a "user manager class".
If I implemented a user manager class, what should it look like? Should it be an object (I can't really compare it to a real-world thing), or should it be a class with just public functions, so that it more or less looks like a set of functions?
Should it return an array of found records (like: array([0]=>array("firstname"=>"dirk", "lastname"=>"diggler"))) or should it return an array of objects?
All of that is still a bit confusing to me, and I wonder if anyone can give me a little insight into how best to approach this.

The level of abstraction you need for your processing and data (Business Logic) depends on your needs. For example for an application with Transaction Scripts (which probably is the case with your design), the class you describe that fetches and updates the data from the database sounds valid to me.
You can generalize things a bit more by using a Table Data Gateway, Row Data Gateway or Active Record even.
If you get the feeling that you then duplicate a lot of code in your transaction scripts, you might want to create your own Domain Model with a Data Mapper. However, I would not just blindly do this from the beginning because this needs much more code to get started. Also it's not wise to write a Data Mapper on your own but to use an existing component for that. Doctrine is such a component in PHP.
Another existing ORM (Object Relational Mapper) component is Propel which provides Active Records.
If you're just looking for a quick way to query your database, you might find NotORM inspiring.
You can find the Patterns listed in italics in
http://martinfowler.com/eaaCatalog/index.html
which lists all patterns in the book Patterns of Enterprise Application Architecture.

I'm not an expert at this but have recently done pretty much exactly the same thing. The way I set it up is that I have one class for several rows (Users) and one class for one row (User). The "several rows class" is basically just a collection of (static) functions and they are used to retrieve row(s) from a table, like so:
$fiveLatestUsers = Users::getByDate(5);
And that returns an array of User objects. Each User object then has methods for retrieving the fields in the table (like $user->getUsername() or $user->getEmail() etc). I used to just return an associative array but then you run into occasions where you want to modify the data before it is returned and that's where having a class with methods for each field makes a lot of sense.
Edit: The User object also has methods for updating and deleting the current row:
$user->setUsername('Gandalf');
$user->save();
$user->delete();

Another alternative to Doctrine and Propel is PHP Activerecords.
Doctrine and Propel are really mighty beasts. If you are doing a smaller project, I think you are better off with something lighter.
Also, when talking about third-party solutions there are a lot of MVC frameworks for PHP like: Kohana, Codeigniter, CakePHP, Zend (of course)...
All of them have their own ORM implementations, usually lighter alternatives.
For Kohana framework there is also Auto modeler which is supposedly very lightweight.
Personally I'm using Doctrine, but it's a huge project. If I were doing something smaller I'd sooner go with a lighter alternative.

benefit of having a factory for object creation?

I'm trying to understand the factory design pattern.
I don't understand why it's good to have a middleman between the client and the product (object that the client wants).
example with no factory:
$mac = new Mac();
example with a factory:
$appleStore = new AppleStore();
$mac = $appleStore->getProduct('mac');
How does the factory pattern decouple the client from the product?
Could someone give an example of a future code change that would have a negative impact on example 1 but a positive one on example 2, so I understand the importance of decoupling?
Thanks.

I think it has to do with the resources needed to construct some types of objects.
Informally, if you told someone to build a Mac, it would be a painstaking process that would take years of design, development, manufacturing, and testing, and it might not be done right. This process would have to be repeated for every single Mac. However, if you introduce a factory, all the hard work can be done just once, then Macs can be produced more cheaply.
Now consider Joomla's factory.php. From what I can tell, the main purpose of JFactory is to pool objects and make sure objects that should be the same aren't copied. For instance, JFactory::getUser() will return a reference to one and only one object. If something gets changed in that user object, it will appear everywhere. Also, note that JFactory::getUser() returns a reference, not a new object. That is something you simply cannot do with a constructor.
Often, you need local context when constructing an object, and that context may persist and possibly take on many forms. For instance, there might be a MySQL database holding users. If User objects are created with a constructor, you'll need to pass a Database object to the constructor (or have it rely on a global variable). If you decide to switch your application to PostgreSQL, the semantics of the Database object may change, causing all uses of the constructor to need review. Global variables let us hide those details, and so do factories. Thus, a User factory would decouple the details of constructing User objects from places where User objects are needed.
When are factories helpful? When constructing an object involves background details. When are constructors better? When global variables suffice.

Don't know if I can put it any better than IBM did https://www.ibm.com/developerworks/library/os-php-designptrns/#N10076

This example returns an object of type Mac and it can never be anything different:
$mac = new Mac();
It can't be a subclass of Mac, nor can it be a class that matches the interface of Mac.
Whereas the following example may return an object of type Mac or whatever other type the factory decides is appropriate.
$appleStore = new AppleStore();
$mac = $appleStore->getProduct('mac');
You might want a set of subclasses of Mac, each representing a different model of Mac. Then you write code in the factory to decide which of these subclasses to use. You can't do that with the new operator.
So a factory gives you more flexibility in object creation. Flexibility often goes hand in hand with decoupling.
Re your comment: I wouldn't say never use new. In fact, I do use new for the majority of simple object creation. But it has nothing to do with who is writing the client code. The factory pattern is for when you want an architecture that can choose the class to instantiate dynamically.
In your Apple Store example, you would probably want some simple code to instantiate a product and add it to a shopping cart. If you use new and you have different object types for each different product type, you'd have to write a huge case statement so you could make a new object of the appropriate type. Every time you add a product type, you'd have to update that case statement. And you might have several of these case statements in other parts of your application.
By using a factory, you would only have one place to update, that knows how to take a parameter and instantiate the right type of object. All places in your app would implicitly gain support for the new type, with no code changes needed. This is a win whether you're the sole developer or if you're on a team.
But again, you don't need a factory if you don't need to support a variety of subtypes. Just continue to use new in simple cases.
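A small sketch of the "one place to update" idea, reusing the AppleStore and getProduct() names from the question. The product classes MacBookAir and IPhone and the lookup table are hypothetical; the table plays the role of the case statement that would otherwise be repeated across the application.

```php
<?php
class Mac { public function name() { return 'Mac'; } }
class MacBookAir extends Mac { public function name() { return 'MacBook Air'; } }
class IPhone { public function name() { return 'iPhone'; } }

class AppleStore
{
    // The single place that maps a product key to a class.
    private $products = [
        'mac'    => 'MacBookAir',   // the store decides which Mac subclass to hand out
        'iphone' => 'IPhone',
    ];

    public function getProduct($key)
    {
        if (!isset($this->products[$key])) {
            throw new InvalidArgumentException("Unknown product: $key");
        }
        $class = $this->products[$key];
        return new $class();
    }
}

$appleStore = new AppleStore();
$mac = $appleStore->getProduct('mac');
```

Callers asked for a 'mac' and received a subclass of Mac; swapping that subclass or adding a product type touches only the table inside AppleStore.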

What does a Data Mapper typically look like?

I have a table called Cat, and a PHP class called Cat. Now I want to make a CatDataMapper class, so that Cat extends CatDataMapper.
I want that Data Mapper class to provide basic functionality for doing ORM, and for creating, editing and deleting Cat.
For that purpose, maybe someone who knows this pattern very well could give me some helpful advice? I feel it would be a little bit too simple to just provide some functions like update(), delete(), save().
I realize a Data Mapper has this problem: First you create the instance of Cat, then initialize all the variables like name, furColor, eyeColor, purrSound, meowSound, attendants, etc.. and after everything is set up, you call the save() function which is inherited from CatDataMapper. This was simple ;)
But now, the real problem: You query the database for cats and get back a plain boring result set with lots of cats data.
PDO features some ORM capability to create Cat instances. Let's say I use that, or let's even say I have a mapDataset() function that takes an associative array. However, as soon as I get my Cat object from a data set, I have redundant data. At the same time, twenty users could pick up the same cat data from the database and edit the cat object, i.e. rename the cat, and save() it, while another user still thinks about setting another furColor. When all of them save their edits, everything is messed up.
Err... ok, to keep this question really short: What's good practice here?

From DataMapper in PoEA:
"The Data Mapper is a layer of software that separates the in-memory objects from the database. Its responsibility is to transfer data between the two and also to isolate them from each other. With Data Mapper the in-memory objects needn't know even that there's a database present; they need no SQL interface code, and certainly no knowledge of the database schema. (The database schema is always ignorant of the objects that use it.) Since it's a form of Mapper (473), Data Mapper itself is even unknown to the domain layer."
Thus, a Cat should not extend CatDataMapper because that would create an is-a relationship and tie the Cat to the Persistence layer. If you want to be able to handle persistence from your Cats in this way, look into ActiveRecord or any of the other Data Source Architectural Patterns.
You usually use a DataMapper when using a Domain Model. A simple DataMapper would just map a database table to an equivalent in-memory class on a field-to-field basis. However, when the need for a DataMapper arises, you usually won't have such simple relationships. Tables will not map 1:1 to your objects. Instead, multiple tables could form one Object Aggregate and vice versa. Consequently, implementing just the CRUD methods can easily become quite a challenge.
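A deliberately simple sketch of the separation described above: Cat has no base class and knows nothing about storage, while a separate mapper moves data in and out. The CatMapper name, its methods and the in-memory $rows array (standing in for a database table) are hypothetical, and the field-to-field mapping shown is the trivial case, not the aggregate case.

```php
<?php
// Domain object: no base class, no knowledge of storage.
class Cat
{
    public $id;
    public $name;
    public $furColor;
}

// Hypothetical mapper; $rows stands in for a database table.
class CatMapper
{
    private $rows = [];
    private $nextId = 1;

    public function save(Cat $cat)
    {
        if ($cat->id === null) {
            $cat->id = $this->nextId++;   // treat a missing id as an insert
        }
        // Column names differ from property names on purpose.
        $this->rows[$cat->id] = ['cat_name' => $cat->name, 'fur_color' => $cat->furColor];
    }

    public function find($id)
    {
        if (!isset($this->rows[$id])) {
            return null;
        }
        $cat = new Cat();
        $cat->id = $id;
        $cat->name = $this->rows[$id]['cat_name'];
        $cat->furColor = $this->rows[$id]['fur_color'];
        return $cat;
    }
}

$mapper = new CatMapper();
$tom = new Cat();
$tom->name = 'Tom';
$tom->furColor = 'grey';
$mapper->save($tom);
$loaded = $mapper->find($tom->id);
```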
Apart from that, it is one of the more complicated patterns (covers 15 pages in PoEA), often used in combination with the Repository pattern among others. Look into the related questions column on the right side of this page for similar questions.
As for your question about multiple users editing the same Cat, that's a common problem called Concurrency. One solution to that would be locking the row, while someone edits it. But like everything, this can lead to other issues.

If you rely on ORMs like Doctrine or Propel, the basic principle is to create a static class that gets the actual data from the database (for instance, Propel would create CatPeer), and the results retrieved by the Peer class are then "hydrated" into Cat objects.
Hydration is the process of converting a "plain boring" MySQL result set into nice objects that have getters and setters.
So for a retrieve you'd use something like CatPeer::doSelect(). For a new object you'd first instantiate it (or retrieve an instance from the DB):
$cat = new Cat();
The insertion would be as simple as $cat->save(); which is equivalent to an insert (or an update if the object already exists in the DB). The ORM should know how to tell new and existing objects apart by using, for instance, the presence or absence of a primary key.

Implementing a Data Mapper is very hard in PHP < 5.3, since you cannot read/write protected/private fields. You have a few choices when loading and saving the objects:
Use some kind of workaround, like serializing the object, modifying its string representation, and bringing it back with unserialize
Make all the fields public
Keep them private/protected, and write mutators/accessors for each of them
The first method may break with a new release and is a very crude hack; the second is considered (very) bad practice.
The third option is also considered bad practice, since you should not provide getters/setters for all of your fields, only the ones that need it. Your model gets "damaged" from a pure DDD (domain driven design) perspective, since it contains methods that are only needed because of the persistence mechanism.
It also means that now you have to describe another mapping for the fields -> setter methods, next to the fields -> table columns.
PHP 5.3 introduces the ability to access/change all types of fields, by using reflection:
http://hu2.php.net/manual/en/reflectionproperty.setaccessible.php
With this, you can achieve a true data mapper, because the need to provide mutators for all of the fields has ceased.
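A short sketch of reflection-based hydration. The hydrate() function and the Cat class are hypothetical; ReflectionClass and ReflectionProperty are standard PHP. The setAccessible(true) call is needed on PHP versions before 8.1 and has no effect from 8.1 onwards, so it is guarded by a version check here.

```php
<?php
class Cat
{
    private $name;   // no setter: the domain does not need one

    public function getName() { return $this->name; }
}

// Hypothetical hydrator: writes a row into private/protected properties.
function hydrate($object, array $row)
{
    $class = new ReflectionClass($object);
    foreach ($row as $field => $value) {
        if (!$class->hasProperty($field)) {
            continue;   // ignore columns with no matching property
        }
        $property = $class->getProperty($field);
        if (PHP_VERSION_ID < 80100) {
            // Required before PHP 8.1; has no effect from 8.1 onwards.
            $property->setAccessible(true);
        }
        $property->setValue($object, $value);
    }
    return $object;
}

$cat = hydrate(new Cat(), ['name' => 'Tom', 'unknown_column' => 1]);
```

The domain class keeps its fields private and gains no persistence-only setters; the mapping from column to property lives entirely in the hydrator.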

"PDO features some ORM capability to create Cat instances. Lets say I use that, or lets even say I have a mapDataset() function that takes an associative array. However, as soon as I got my Cat object from a data set, I have redundant data. At the same time, twenty users could pick up the same cat data from the database and edit the cat object, i.e. rename the cat, and save() it, while another user still thinks about setting another furColor. When all of them save their edits, everything is messed up."
To keep track of the state of the data, an IdentityMap and/or a UnitOfWork would typically be used to track all the different operations on mapped entities, and at the end of the request cycle all the operations would then be performed.
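A toy sketch of the two patterns named above, combined in one hypothetical Session class. The $table array stands in for the database, and markDirty(), commit() and rawRow() are illustrative names. It shows only in-process tracking within one request; it does not by itself resolve conflicting edits from different users.

```php
<?php
class Cat
{
    public $id;
    public $name;

    public function __construct($id, $name)
    {
        $this->id = $id;
        $this->name = $name;
    }
}

// Hypothetical session combining an identity map with a tiny unit of work.
class Session
{
    private $table;              // stand-in for the database
    private $identityMap = [];
    private $dirty = [];

    public function __construct(array $table) { $this->table = $table; }

    public function find($id)
    {
        // Identity map: the same row always yields the same object.
        if (!isset($this->identityMap[$id])) {
            $this->identityMap[$id] = new Cat($id, $this->table[$id]['name']);
        }
        return $this->identityMap[$id];
    }

    public function markDirty(Cat $cat) { $this->dirty[$cat->id] = $cat; }

    // Unit of work: all pending writes are performed together, here.
    public function commit()
    {
        foreach ($this->dirty as $cat) {
            $this->table[$cat->id]['name'] = $cat->name;
        }
        $written = count($this->dirty);
        $this->dirty = [];
        return $written;
    }

    public function rawRow($id) { return $this->table[$id]; }
}

$session = new Session([1 => ['name' => 'Tom']]);
$a = $session->find(1);
$b = $session->find(1);
$a->name = 'Felix';
$session->markDirty($a);
$before = $session->rawRow(1)['name'];
$written = $session->commit();
```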

To keep the answer short:
You have an instance of Cat. (Maybe it extends CatDbMapper, or Cat3rdpartycatstoreMapper)
You call:
$cats = $cat_model->getBlueEyedCats();
//then you get an array of Cat objects, in the $cats array
I don't know what you use; you might take a look at some PHP framework to get a better understanding.
