I have built object caching into our ORM layer. Basically, a hash of the SQL query is used as the key, and the value contains the collection of objects from the DB result set. However, this creates a problem: if one of the objects in the result set is updated, the cached result set does not include the updated object. There is no write consistency. How would I implement write consistency?
Thanks
UPDATE: Currently I have an ObjectWatcher class that handles which objects are cached and their keys. Objects are cached with retrievable keys, so for the Person class the key is Person.101, for example. The SQL query is hashed, and the hash maps to a Dependency object which holds a list of the dependent objects. So SELECT * FROM person might return a Dependency object from APC that maps to Person.101 and Person.102; the resulting collection is built from this Dependency object. This works fine for the update of a single object. But if I update Person.101 and put the newly updated object into APC, overwriting the stale one, then when an older query is run, that updated object will get put into its result set, which could be incorrect. I need a way to remove not only the object from memory but also all the Dependency objects which hold a reference to the updated object. In APC, is there a way to search for keys or values containing something, or to filter keys and values?
This question is not related to APC.
You need to manage how data is stored in APC (or any other storage). Updating the value of a key in APC when an object changes would only be possible if the object knew the key (the hash of the query) and could collect all the data from the other objects fetched by that query. That sounds like an absurd idea.
Any model should be designed with the Single Responsibility Principle in mind, so if you want to cache whole objects (not a very good idea either), then create a unique key for each object.
Also, objects shouldn't care about how and where they are stored (cached). So you need one more object, which will manage object caching.
And I recommend caching not whole objects, but only the values of records in the DB that take too much time to fetch.
But if you still want to use a hash of the SQL query as the key, then you can use tags and write the "names" of the objects into those tags.
For example, if the result set contains Person, Employer, and Customer objects, then the key will have the tags "person", "employer", and "customer". And when a Customer object is changed, you can delete from the cache all keys marked with the tag "customer".
But anyway, this is not the responsibility of the Customer object; all of this should be managed by another object.
The question was edited, so I'll edit my answer too :)
Tags are not part of APC; they are part of a wrapper around it. Tags are a very useful thing and very handy for your case.
which hold a reference to the updated object
Tags can be that reference. You don't need to search keys by tag; you just need to remove all keys associated with the tag (to keep the data current), and the wrapper has an existing method to do exactly that.
For example:
Say we have the query SELECT * FROM persons WHERE email <> ''; the cached result of this query will be marked with the tag "person".
So when we update any Person object, we remove all keys marked with the tag "person". Our cached result for SELECT * FROM persons WHERE email <> '' is then removed, and on the next request our script will generate a new (up-to-date) value.
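Since APC has no native tag support, a thin wrapper can maintain the tag-to-keys index itself. Below is a minimal sketch (the class and method names are hypothetical, not an existing library). For brevity it stores into a plain array; in a real wrapper each `$this->data` access would be an apc_store()/apc_fetch()/apc_delete() call.

```php
<?php
// Minimal sketch of a tag-aware cache wrapper. Each tag gets a "tag:<name>"
// index entry listing every key stored under it, so invalidating a tag is
// just walking that index.
class TaggedCache
{
    private $data = array(); // stand-in for APC storage

    public function set($key, $value, array $tags = array())
    {
        $this->data[$key] = $value;
        foreach ($tags as $tag) {
            // Record the key in the tag's index entry.
            $this->data['tag:' . $tag][$key] = true;
        }
    }

    public function get($key)
    {
        // Mirror apc_fetch()'s behaviour of returning false on a miss.
        return isset($this->data[$key]) ? $this->data[$key] : false;
    }

    // Invalidate every key stored under the tag, then the index itself.
    public function deleteTag($tag)
    {
        if (isset($this->data['tag:' . $tag])) {
            foreach (array_keys($this->data['tag:' . $tag]) as $key) {
                unset($this->data[$key]);
            }
            unset($this->data['tag:' . $tag]);
        }
    }
}
```

Caching the persons query would then be `$cache->set(md5($sql), $rows, array('person'))`, and any Person update simply calls `$cache->deleteTag('person')`.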
Related
I'm currently at an impasse with regard to the structural design of my website. At the moment I'm using objects to simplify the structure of my site (I have a person object, a party object, a position object, etc.) and in theory each of these is a row from its respective table in the database.
Now from what I've learnt, OO Design is good for keeping things simple and easy to use/implement, which I agree with - it makes my code look so much cleaner and easier to maintain, but what I'm confused about is how I go about linking my objects to the database.
Let's say there is a person page. I create a person object, which equals one mysql query (which is reasonable), but then that person might have multiple positions which I need to fetch and display on a single page.
What I am currently doing is using a getPositions method on the person object, which gets the data from MySQL and creates a separate position object for each row, passing in the data as an array. That keeps the queries down to a minimum (2 per page), but it seems like a horrible implementation and, to me, breaks the rules of object-oriented design (should I want to change a MySQL row, I'd need to change it in multiple places), but the alternative is worse.
In this case the alternative is just getting the IDs that I need and then creating separate positions, passing in the ID, which then goes on to fetch the row from the database in the constructor. If you have 20 positions per page, it can quickly add up, and I've read about how much WordPress is criticised for its high number of queries per page and its CPU usage. The other thing I'll need to consider in this case is sorting: doing it this way means I'll need to sort the data using PHP, which surely can't be as efficient as doing it natively in MySQL.
Of course, pages will be (and can be) cached, but to me, this seems almost like cheating for poorly built applications. In this case, what is the correct solution?
The way you're doing it now is at least on the right track. Having an array in the parent object with references to the children is basically how the data is represented in the database.
I'm not completely sure from your question whether you're storing the children as references in the parent's array, but you should be, and that's how PHP stores them by default. If you also use a singleton pattern for your objects that are pulled from the database, you should never need to modify multiple objects to change one row as you suggest in your question.
You should probably also create multiple constructors for your objects (using static methods that return new instances) so you can create them either from their ID, having them pull their own data, or from data you already have. The latter case would be used when you're creating children: the parent can pull all of the data for its children and create all of them with only one query. Getting a child from its ID will probably be used somewhere else, so it's good to have if it's needed.
For sorting, you could create additional private (or public if you want) arrays that have the children sorted in a particular way with references to the same objects the main array references.
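The two construction paths above can be sketched like this (the Position class, column names, and the PDO connection are illustrative, not from the question):

```php
<?php
// Sketch of "multiple constructors" via static factory methods: build a
// Position either from a row the parent already fetched, or from an id.
class Position
{
    public $id;
    public $title;

    private function __construct($id, $title)
    {
        $this->id = $id;
        $this->title = $title;
    }

    // Build from data already in hand -- used when the parent Person fetches
    // all of its positions in a single query and hydrates each child itself.
    public static function fromRow(array $row)
    {
        return new self($row['id'], $row['title']);
    }

    // Build from an id, pulling the row ourselves. One query per object, so
    // reserve this for cases where the id is all you have.
    public static function fromId(PDO $db, $id)
    {
        $stmt = $db->prepare('SELECT id, title FROM positions WHERE id = ?');
        $stmt->execute(array($id));
        return self::fromRow($stmt->fetch(PDO::FETCH_ASSOC));
    }
}
```

With fromRow, the parent's getPositions stays at one query for any number of children; fromId exists for the odd page that only has an ID.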
Suppose I'm loading a large number of objects from a database. These are normal, plain PHP objects, no inheritance from anything fancy. Suppose I might change a few of these objects and want to write them back to the database, but only use the fields that actually differ in the UPDATE ... SET ... query. Also suppose that I don't know in advance which objects are going to be changed.
I'm thinking that I need to make a copy of all the objects loaded, and keep around for reference and comparison, should I need to write objects back to the database.
I see two possible approaches:
I can either clone all the loaded objects and store in a separate list. When saving, look up the object in the list using an index, and compare the values.
Or, I can simply serialize everything loaded into a string, and keep around. When saving, find the serialized object in the string (somehow), unserialize it, compare the values, and there you go.
In terms of efficiency (mostly memory, but speed is also a consideration), which would be favorable?
Well, you need something to compare against to tell whether the state of the object has changed. If you want to track not only which object has changed but also which member, you need to keep state per member.
Since you don't want to extend the original objects (e.g. they could set a flag when they are changed), you need to track the state from the outside. I'd say serializing is probably the best option then; cloning will take more memory.
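A sketch of the serialize-and-compare approach (the class and method names are mine): keep one serialized string per object as loaded, then on save unserialize the snapshot and diff property by property to get exactly the fields for the UPDATE ... SET ... clause.

```php
<?php
// Snapshot-based change tracking from the outside: the tracked objects need
// no base class or dirty flag of their own.
class ChangeTracker
{
    private $snapshots = array();

    public function track($id, $object)
    {
        // One string per object; cheaper to hold than a live clone.
        $this->snapshots[$id] = serialize($object);
    }

    // Returns property => new value for every property that differs from the
    // snapshot, i.e. exactly the columns that need updating.
    public function diff($id, $object)
    {
        $original = unserialize($this->snapshots[$id]);
        $changes = array();
        foreach (get_object_vars($object) as $prop => $value) {
            if ($original->$prop !== $value) {
                $changes[$prop] = $value;
            }
        }
        return $changes;
    }
}
```

Note that get_object_vars() called from outside the class sees only public properties; for private members you would diff inside the class or use reflection.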
In a project I'm working on, I have an object that is a sort of Collection with a database back end. The exact results this Collection returns depend on its configuration, which is itself dependent on a number of user inputs. I would like to have an element on the page that contains the records in the Collection and can be updated dynamically through an AJAX request. The idea has occurred to me to serialize() this object, store it in memcache, and include the memcache key as a parameter in my AJAX calls. I would then retrieve the string from memcache, unserialize() it, and retrieve the next set of records from the collection.
Is this a good way to achieve the kind of object persistence I want? I considered storing just the configuration, but this feels like a better "set it and forget it" solution in the face of future changes to the user controls. My main concern is that there might be some pitfall with serialize that I'm not aware of that would make this solution fragile, unreliable, or slow. Do I need to be concerned in any of those regards?
serialize/unserialize works well enough with scalars, but can be more problematic when working with objects. I've had a couple of issues that highlight potential pitfalls.
If any of your object properties are resources, these can't be serialized. You'd need to use the magic __sleep and __wakeup methods to cleanly close the resource attribute and restore it again on unserialize.
If your collection contains objects with cyclic references (e.g. a cellCollection object is an array of cell objects, each of which has an attribute pointing back to the parent cellCollection object) then these won't be cleanly restored on unserialize... each cell's parent object will actually be a clone of the original parent. Again, __sleep and __wakeup need to be used to restore the true relationships (not a trivial task).
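A minimal sketch of that repair, reusing the cell/cellCollection naming from above (the exact API is illustrative): each cell excludes its parent pointer in __sleep, and the collection re-attaches itself to every child in __wakeup, so no duplicate parent is created.

```php
<?php
class CellCollection
{
    public $cells = array();

    public function add(Cell $cell)
    {
        $cell->parent = $this;
        $this->cells[] = $cell;
    }

    public function __wakeup()
    {
        // Re-establish the back-references that were deliberately dropped
        // during serialization.
        foreach ($this->cells as $cell) {
            $cell->parent = $this;
        }
    }
}

class Cell
{
    public $value;
    public $parent; // back-reference to the owning CellCollection

    public function __construct($value)
    {
        $this->value = $value;
    }

    public function __sleep()
    {
        // Serialize the value only; the parent pointer is rebuilt by the
        // collection's __wakeup, keeping the relationship a true reference.
        return array('value');
    }
}
```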
If the serialized objects are larger than just queries you are extracting from the database, and have had a lot of processing applied to them, then what you are proposing is actually a very good optimization.
Two references in particular:
http://code.google.com/p/memcached/wiki/FAQ#Cache_things_other_than_SQL_data!
http://www.mysqlperformanceblog.com/2010/05/19/beyond-great-cache-hit-ratio/
Both promote using memcached as being beyond a "row cache".
Is there a way to insert logic based on virtual fields into a Doctrine_Query?
I have defined a virtual field in my model, "getStatus()" which I would ultimately like to utilize in a Where clause in my Doctrine_Query.
...
->AndWhere('x.status = ?',$status);
"status", however, is not a column in the table; it is instead computed by business logic in the model.
Filtering the Collection after executing the query works in some situations, but not when a Doctrine_Pager is thrown into the mix, as it computes its offsets and such before you have access to the Collection.
Am I best off ditching Doctrine_Pager and rebuilding that functionality after modifying the Doctrine_Collection?
If you can do it in SQL you can do it in Doctrine. All Doctrine is doing is working out what you are putting into the DQL parser, be it strings or values, turning that into SQL, and then hydrating objects from the result.
You can't use Doctrine_Pager to page non-query objects; however, you could use sfPager and pass it the results of the Doctrine_Collection as an array. In the worst case you could run the query minus any limits and let the pager handle the paging, but this is really inefficient.
It might be quicker to write the pager "old skool" like you would in plain old PHP.
I don't really know what business logic you're applying to work out the status, but if it's not live (as in, computed per request), I'd compute it on save (using a Doctrine Record Listener or simply a preSave/preInsert hook in the model) and store it in the table, or set up a symfony task to refresh it periodically and run that as a cronjob. That would let you query it in Doctrine just fine and boost performance as a fringe benefit.
Alternatively, if status is dependent on the state of related objects, you can put an event trigger on them that updates the status of the parent object when they're modified. It's hard to recommend a best approach without more context. :)
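The compute-on-save idea can be sketched like this. The status rule and field names are invented; in a real symfony project the hook would be preSave() in a Doctrine_Record subclass, so a tiny stub base class stands in here to keep the example self-contained.

```php
<?php
// Stand-in for Doctrine_Record: save() fires the preSave hook, then persists.
class RecordStub
{
    public function save()
    {
        $this->preSave();
        // ...real persistence would happen here...
    }

    protected function preSave() {}
}

class Order extends RecordStub
{
    public $paid_at;
    public $status;

    protected function preSave()
    {
        // Denormalize the computed status into a real column so DQL can
        // filter on it: ->andWhere('o.status = ?', $status).
        $this->status = $this->paid_at !== null ? 'paid' : 'pending';
    }
}
```

Because the value lives in an actual column after every save, both the pager and the query see it like any other field.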
Bit of an abstract problem here. I'm experimenting with the Domain Model pattern, and barring my other tussles with dependencies - I need some advice on generating Identity for use in an Identity Map.
In most examples for the Data Mapper pattern I've seen (including the one outlined in this book: http://apress.com/book/view/9781590599099) - the user appears to manually set the identity for a given Domain Object using a setter:
$UserMapper = new UserMapper;
//returns a fully formed user object from record sets
$User = $UserMapper->find(1);
//returns an empty object with appropriate properties for completion
$UserBlank = $UserMapper->get();
$UserBlank->setId();
$UserBlank->setOtherProperties();
Now, I don't know if I'm reading the examples wrong - but in the first $User object, the $id property is retrieved from the data store (I'm assuming $id represents a row id). In the latter case, however, how can you set the $id for an object if it has not yet acquired one from the data store?
The problem is generating a valid "identity" for the object so that it can be maintained via an Identity Map - so generating an arbitrary integer doesn't solve it.
My current thinking is to nominate different fields for identity (i.e. email) and demanding their presence in generating blank Domain Objects. Alternatively, demanding all objects be fully formed, and using all properties as their identity...hardly efficient.
(Or alternatively, dump the Domain Model concept and return to DBAL/DAO/Transaction Scripts...which is seeming increasingly elegant compared to the ORM implementations I've seen...)
You would use the setId function if you are controlling the IDs, if you want to override the data store ID, or if you want to update/delete the data without having to retrieve it first (i.e. already have the ID from a POST).
Another alternative would be calling setId() to reserve an ID by "querying" the data store (inserting a record) for the next available ID.
It's not really relevant what the ID is set to until you actually need to use it to reference something. Calling setId with no parameter would do nothing except flag the object as new data. Only when you actually try to "get" the ID would one be generated. Sort of lazy ID generation.
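That lazy generation might look like this (all names are illustrative; the in-memory counter stands in for an insert or sequence call that reserves the next ID in the data store):

```php
<?php
class IdGenerator
{
    private $next = 1;

    public function reserve()
    {
        // In a real mapper this would insert a placeholder row or hit a DB
        // sequence so the reserved id is unique across processes.
        return $this->next++;
    }
}

class User
{
    private $id;
    private $generator;

    public function __construct(IdGenerator $generator)
    {
        $this->generator = $generator;
    }

    // "New" simply means no identity has been claimed yet.
    public function isNew()
    {
        return $this->id === null;
    }

    // Identity is generated on first use, then cached, so the Identity Map
    // always sees a stable key once anyone has asked for it.
    public function getId()
    {
        if ($this->id === null) {
            $this->id = $this->generator->reserve();
        }
        return $this->id;
    }
}
```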