Given a class with some really expensive code, I want to avoid running that code when re-defining an instance.
Best explained with some pseudo-code:
$foo = new Foo('bar');
print $foo->eat_cpu_and_database_resources(); #=> 3.14159
$foo->store_in_cache(); #Uses an existing Memcached and/or caching to store serialized.
#new thread, such as a new HTTP request. Could be days later.
$bar = new Foo('bar');
print $bar->eat_cpu_and_database_resources(); #=> 3.14159
The second $bar should re-initialize the earlier created instance $foo. Inside my actual class, I do several things on eat_cpu_and_database_resources(), which is named get_weighted_tags(): calculate a weighted tagcloud from values in $foo->tags. $foo->tags() was filled with expensive $foo->add_tag() calls. I would like to retrieve the prepared and filled instance from now on, from cache.
I have tried to simply fetch from (serialized) cache on __construct() and assign the retrieved instance to $this, which is not allowed in PHP:
function __construct ($id) {
    if ($cached = $this->cache_get($id)) {
        $this = $cached; // fatal error: cannot re-assign $this
    }
    else {
        #initialize normally
    }
}
Is this a proper way? Or should I treat every instance unique and instead apply caching in the eat_cpu_and_database_resources() method, instead of caching the entire instance?
Is there a built-in way in PHP to revive old instances (in a new thread)?
Depending on the size of Foo, you might want to cache the entire object in the cache store Drupal provides. If it's too big for that, see if it makes sense to just cache the result to the expensive method call(s).
If you want to unserialize an object from the PHP internal format, you have to use the corresponding unserialize method and might want to add the magic __wakeup method to do any post re-initializations:
The intended use of __wakeup is to reestablish any database connections that may have been lost during serialization and perform other reinitialization tasks.
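A minimal sketch of that round trip, assuming a hypothetical TagCloud class whose $connection property is only a string standing in for a real database handle:

```php
<?php
// Hypothetical class; $connection is a placeholder for a real database handle.
class TagCloud
{
    public $tags = array();
    public $connection = 'connected';

    // Called automatically by unserialize(): redo whatever did not survive storage.
    public function __wakeup()
    {
        $this->connection = 'reconnected';
    }
}

$cloud = new TagCloud();
$cloud->tags = array('php' => 3, 'cache' => 1);

$stored  = serialize($cloud);     // this string is what would go into the cache
$revived = unserialize($stored);  // e.g. in a later request
```

The constructor is not run by unserialize(), so anything that must be rebuilt has to happen in __wakeup.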
Since you have to have the serialized string for that first, you might want to add some facility to encapsulate this logic, like a Factory or Builder pattern or a dedicated FooCache.
Personally I find caching the method call the best option because there's no point in caching the whole object when it's really just the method call that's expensive. That will also save you any additional work checking whether there is a serialized string to start with or building a factory.
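Caching only the method result could look roughly like this. ArrayCache is a hypothetical in-memory stand-in for Drupal's cache or Memcached, and the tag weights are made up:

```php
<?php
// Hypothetical stand-in for a real cache backend.
class ArrayCache
{
    private $items = array();

    public function get($key)
    {
        return array_key_exists($key, $this->items) ? $this->items[$key] : null;
    }

    public function set($key, $value)
    {
        $this->items[$key] = $value;
    }
}

class Foo
{
    public $computations = 0; // counts how often the expensive path ran
    private $id;
    private $cache;

    public function __construct($id, ArrayCache $cache)
    {
        $this->id = $id;
        $this->cache = $cache;
    }

    public function get_weighted_tags()
    {
        $key = 'weighted_tags:' . $this->id;
        $cached = $this->cache->get($key);
        if ($cached !== null) {
            return $cached;
        }
        $this->computations++;
        $result = array('php' => 0.75, 'cache' => 0.25); // pretend this was expensive
        $this->cache->set($key, $result);
        return $result;
    }
}

$cache = new ArrayCache();
$foo = new Foo('bar', $cache);
$first = $foo->get_weighted_tags();

$bar = new Foo('bar', $cache);        // a fresh instance, e.g. in a later request
$second = $bar->get_weighted_tags();  // served from the cache
```

Every instance stays cheap to construct, and only the expensive call checks the cache.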
Related
I have some pattern that works great for me, but that I have some difficulty explaining to fellow programmers. I am looking for some justification or literature reference.
I personally work with PHP, but this would also be applicable to Java, Javascript, C++, and similar languages. Examples will be in PHP or Pseudocode, I hope you can live with this.
The idea is to use a lazy evaluation container for intermediate results, to avoid multiple computation of the same intermediate value.
"Dynamic programming":
http://en.wikipedia.org/wiki/Dynamic_programming
The dynamic programming approach seeks to solve each subproblem only once, thus reducing the number of computations: once the solution to a given subproblem has been computed, it is stored or "memo-ized": the next time the same solution is needed, it is simply looked up
Lazy evaluation container:
class LazyEvaluationContainer {
protected $values = array();
function get($key) {
if (isset($this->values[$key])) {
return $this->values[$key];
}
if (method_exists($this, $key)) {
return $this->values[$key] = $this->$key();
}
throw new Exception("Key $key not supported.");
}
protected function foo() {
// Make sure that bar() runs only once.
return $this->get('bar') + $this->get('bar');
}
protected function bar() {
.. // expensive computation.
}
}
Similar containers are used e.g. as dependency injection containers (DIC).
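A runnable variant of the container above, as a sketch: the class name, the $barRuns counter and the return value 21 are assumptions added to make the "runs only once" claim observable.

```php
<?php
class CountingContainer
{
    public $barRuns = 0;
    protected $values = array();

    public function get($key)
    {
        // array_key_exists() also treats a stored null as "already computed".
        if (array_key_exists($key, $this->values)) {
            return $this->values[$key];
        }
        if (method_exists($this, $key)) {
            return $this->values[$key] = $this->$key();
        }
        throw new Exception("Key $key not supported.");
    }

    protected function foo()
    {
        // bar() is requested twice but computed once.
        return $this->get('bar') + $this->get('bar');
    }

    protected function bar()
    {
        $this->barRuns++;
        return 21; // stands in for the expensive computation
    }
}

$container = new CountingContainer();
$result = $container->get('foo');
```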
Details
I usually use some variation of this.
It is possible to have the actual data methods in a different object than the data computation methods?
It is possible to have computation methods with parameters, using a cache with a nested array?
In PHP it is possible to use magic methods (__get() or __call()) for the main retrieval method. In combination with "@property" in the class docblock, this allows type hints for each "virtual" property.
I often use method names like "get_someValue()", where "someValue" is the actual key, to distinguish from regular methods.
It is possible to distribute the data computation to more than one object, to get some kind of separation of concerns?
It is possible to pre-initialize some values?
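As a sketch of the magic-method variant with the "get_" naming mentioned above, assuming a hypothetical Report class with one virtual property:

```php
<?php
/**
 * @property int $total computed lazily by get_total()
 */
class Report
{
    public $computed = 0;
    private $values = array();

    // Invoked for properties that are not declared, such as $report->total.
    public function __get($key)
    {
        if (!array_key_exists($key, $this->values)) {
            $method = 'get_' . $key;
            if (!method_exists($this, $method)) {
                throw new Exception("Key $key not supported.");
            }
            $this->values[$key] = $this->$method();
        }
        return $this->values[$key];
    }

    protected function get_total()
    {
        $this->computed++;
        return 1 + 2 + 3;
    }
}

$report = new Report();
$sum = $report->total + $report->total;
```

The docblock is what lets an IDE suggest $report->total; PHP itself does not check it.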
EDIT: Questions
There is already a nice answer talking about a cute mechanic in Spring @Configuration classes.
To make this more useful and interesting, I extend/clarify the question a bit:
Is storing intermediate values from dynamic programming a legitimate use case for this?
What are the best practices to implement this in PHP? Is some of the stuff in "Details" bad and ugly?
If I understand you correctly, this is quite a standard procedure, although, as you rightly admit, associated with DI (or bootstrapping applications).
A concrete, canonical example would be any Spring @Configuration class with lazy bean definitions; I think it displays exactly the same behavior as you describe, although the actual code that accomplishes it is hidden from view (and generated behind the scenes). Actual Java code could be like this:
@Configuration
public class Whatever {
    @Bean @Lazy
    public OneThing createOneThing() {
        return new OneThing();
    }
    @Bean @Lazy
    public SomeOtherThing createSomeOtherThing() {
        return new SomeOtherThing();
    }
    // here the magic begins:
    @Bean @Lazy
    public SomeThirdThing getSomeThirdThing() {
        return new SomeThirdThing(this.createOneThing(), this.createOneThing(), this.createOneThing(), createSomeOtherThing());
    }
}
Each method marked with @Bean @Lazy represents one "resource" that will be created once it is needed (and the method is called) and, no matter how many times the method appears to be called, the object will only be created once (due to some magic that changes the actual code during loading). So even though createOneThing() appears to be called three times in getSomeThirdThing(), only one call will occur (and only after someone calls getSomeThirdThing() or calls getBean(SomeThirdThing.class) on the ApplicationContext).
I think you cannot have a universal lazy evaluation container for everything.
Let's first discuss what you really have there. I don't think it's lazy evaluation. Lazy evaluation is defined as delaying an evaluation to the point where the value is really needed, and sharing an already evaluated value with further requests for that value.
The typical example that comes to my mind is a database connection. You'd prepare everything to be able to use that connection when it is needed, but only when there really is a database query needed, the connection is created, and then shared with subsequent queries.
The typical implementation would be to pass the connection string to the constructor, store it internally, and when there is a call to the query method, first the method to return the connection handle is called, which will create and save that handle with the connection string if it does not exist. Later calls to that object will reuse the existing connection.
Such a database object would qualify for lazy evaluating the database connection: It is only created when really needed, and it is then shared for every other query.
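A minimal sketch of such an object. LazyDatabase is a hypothetical name, and the string handle is a placeholder where real code would open a connection (for example with PDO):

```php
<?php
class LazyDatabase
{
    public $connectCount = 0;
    private $dsn;
    private $handle = null;

    public function __construct($dsn)
    {
        $this->dsn = $dsn; // only remembered, nothing is opened yet
    }

    private function connection()
    {
        if ($this->handle === null) {
            $this->connectCount++;
            // Placeholder: real code would create the connection here.
            $this->handle = 'handle for ' . $this->dsn;
        }
        return $this->handle;
    }

    public function query($sql)
    {
        return $this->connection() . ': ' . $sql;
    }
}

$db = new LazyDatabase('sqlite::memory:');
$before = $db->connectCount;  // still 0: no query has been made
$db->query('SELECT 1');
$db->query('SELECT 2');       // reuses the handle created by the first query
```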
When I look at your implementation, it would not qualify for "evaluate only if really needed", it will only store the value that was once created. So it really is only some sort of cache.
It also does not really solve the problem of universally only evaluating the expensive computation once globally. If you have two instances, you will run the expensive function twice. But on the other hand, NOT evaluating it twice will introduce global state - which should be considered a bad thing unless explicitly declared. Usually it would make code very hard to test properly. Personally I'd avoid that.
It is possible to have the actual data methods in a different object than the data computation methods?
If you have a look at how the Zend Framework offers the cache pattern (\Zend\Cache\Pattern\{Callback,Class,Object}Cache), you'd see that the real working class is getting a decorator wrapped around it. All the internal stuff of getting the values stored and read them back is handled internally, from the outside you'd call your methods just like before.
The downside is that you do not have an object of the type of the original class. So if you use type hinting, you cannot pass a decorated caching object instead of the original object. The solution is to implement an interface. The original class implements it with the real functions, and then you create another class that extends the cache decorator and implements the interface as well. This object will pass the type hinting checks, but you are forced to manually implement all interface methods, which do nothing more than pass the call to the internal magic function that would otherwise intercept them.
interface Foo
{
public function foo();
}
class FooExpensive implements Foo
{
public function foo()
{
sleep(100);
return "bar";
}
}
class FooCached extends \Zend\Cache\Pattern\ObjectCache implements Foo
{
public function foo()
{
//internally uses instance of FooExpensive to calculate once
$args = func_get_args();
return $this->call(__FUNCTION__, $args);
}
}
I have found it impossible in PHP to implement a cache without at least these two classes and one interface (but on the other hand, implementing against an interface is a good thing, it shouldn't bother you). You cannot simply use the native cache object directly.
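The same interface-plus-decorator shape can be sketched without the framework. This is a hand-rolled stand-in, not Zend's cache pattern, and Greeter, SlowGreeter and CachedGreeter are hypothetical names:

```php
<?php
interface Greeter
{
    public function greet($name);
}

class SlowGreeter implements Greeter
{
    public $calls = 0;

    public function greet($name)
    {
        $this->calls++; // imagine an expensive computation here
        return "Hello, $name";
    }
}

class CachedGreeter implements Greeter
{
    private $inner;
    private $results = array();

    public function __construct(Greeter $inner)
    {
        $this->inner = $inner;
    }

    public function greet($name)
    {
        if (!array_key_exists($name, $this->results)) {
            $this->results[$name] = $this->inner->greet($name);
        }
        return $this->results[$name];
    }
}

// The type hint accepts the real object and the caching decorator alike.
function welcome(Greeter $greeter)
{
    return $greeter->greet('Ann');
}

$slow = new SlowGreeter();
$cached = new CachedGreeter($slow);
$a = welcome($cached);
$b = welcome($cached);
```

The cost is the one named above: every interface method has to be written out again in the decorator.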
It is possible to have computation methods with parameters, using a cache with a nested array?
Parameters are working in the above implementation, and they are used in the internal generation of a cache key. You should probably have a look at the \Zend\Cache\Pattern\CallbackCache::generateCallbackKey method.
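To illustrate the idea of deriving a cache key from the arguments, here is a small sketch. The memoize() helper is hypothetical and is not how Zend builds its keys; it simply hashes the serialized argument list:

```php
<?php
// Hypothetical helper: wraps a callable and caches one result per argument list.
function memoize(callable $fn)
{
    $cache = array();
    return function () use ($fn, &$cache) {
        $args = func_get_args();
        $key = md5(serialize($args));
        if (!array_key_exists($key, $cache)) {
            $cache[$key] = call_user_func_array($fn, $args);
        }
        return $cache[$key];
    };
}

$calls = 0;
$slowAdd = function ($a, $b) use (&$calls) {
    $calls++;
    return $a + $b;
};
$fastAdd = memoize($slowAdd);

$x = $fastAdd(1, 2);
$y = $fastAdd(1, 2); // same arguments: served from the cache
$z = $fastAdd(2, 2); // different arguments: computed
```

A flat key like this avoids the nested array and works for any number of parameters, as long as the arguments can be serialized.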
In PHP it is possible to use magic methods (__get() or __call()) for the main retrieval method. In combination with "@property" in the class docblock, this allows type hints for each "virtual" property.
Magic methods are evil. A documentation block should be considered outdated, as it is no real working code. While I find it acceptable to use magic getters and setters in really easy-to-understand value object code, which would allow storing any value in any property just like stdClass, I do recommend being very careful with __call.
I often use method names like "get_someValue()", where "someValue" is the actual key, to distinguish from regular methods.
I would consider this a violation of PSR-1: "4.3. Methods: Method names MUST be declared in camelCase()." And is there a reason to mark these methods as something special? Are they special at all? They do return the value, don't they?
It is possible to distribute the data computation to more than one object, to get some kind of separation of concerns?
If you cache a complex construction of objects, this is completely possible.
It is possible to pre-initialize some values?
This should not be the concern of a cache, but of the implementation itself. What is the point in NOT executing an expensive computation, but to return a preset value? If that is a real use case (like instantly return NULL if a parameter is outside of the defined range), it must be part of the implementation itself. You should not rely on an additional layer around the object to return a value in such cases.
Is storing intermediate values from dynamic programming a legitimate use case for this?
Do you have a dynamic programming problem? There is this sentence on the Wikipedia page you linked:
There are two key attributes that a problem must have in order for dynamic programming to be applicable: optimal substructure and overlapping subproblems. If a problem can be solved by combining optimal solutions to non-overlapping subproblems, the strategy is called "divide and conquer" instead.
I think that there are already existing patterns that seem to solve the lazy evaluation part of your example: Singleton, ServiceLocator, Factory. (I'm not promoting singletons here!)
There also is the concept of "promises": Objects are returned that promise to return the real value later if asked, but as long as the value isn't needed right now, would act as the value's replacement that could be passed along instead. You might want to read this blog posting: http://blog.ircmaxell.com/2013/01/promise-for-clean-code.html
What are the best practices to implement this in PHP? Is some of the stuff in "Details" bad and ugly?
You used an example that probably comes close to the Fibonacci example. The aspect I don't like about that example is that you use a single instance to collect all values. In a way, you are aggregating global state here - which probably is what this whole concept is about. But global state is evil, and I don't like that extra layer. And you haven't really solved the problem of parameters enough.
I wonder why there are really two calls to bar() inside foo()? The more obvious method would be to duplicate the result directly in foo(), and then "add" it.
All in all, I'm not too impressed until now. I cannot anticipate a real use case for such a general purpose solution on this simple level. I do like IDE auto suggest support, and I do not like duck-typing (passing an object that only simulates being compatible, but without being able to ensure the instance).
There is a function like this in the CodeIgniter framework (I edited it to be more clear, the full function is here: http://pastebin.com/K33amh7r):
function &load_class($class, $directory = 'libraries', $prefix = 'CI_')
{
static $_classes = array();
// Does the class exist? If so, we're done...
if (isset($_classes[$class]))
{
return $_classes[$class];
}
is_loaded($class);
$_classes[$class] = new $name(); // $name is the resolved class name; its assignment is omitted in this shortened version
return $_classes[$class];
}
So, the first time a class is loaded (passed to this function), the instance is saved in this static variable. The next time the same class is loaded, the function checks whether it already exists (whether it is already assigned to the static variable; I'm not sure how this is stored in memory), and if it does, the existing instance is returned (NOT *instantiated* again).
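The effect of the static variable can be seen in a stripped-down sketch. load_once() and Mailer are hypothetical and much simpler than CodeIgniter's real loader:

```php
<?php
class Mailer
{
    public static $constructed = 0;

    public function __construct()
    {
        self::$constructed++;
    }
}

// Simplified illustration, not CodeIgniter's actual load_class().
function load_once($class)
{
    static $instances = array(); // keeps its contents between calls within one request

    if (!isset($instances[$class])) {
        $instances[$class] = new $class();
    }
    return $instances[$class];
}

$first = load_once('Mailer');
$second = load_once('Mailer'); // same object, the constructor does not run again
```

The static array lives only as long as the current request; nothing is carried over to the next one.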
As far as I can see, the only purpose is to save time or memory and not instantiate the same class twice.
My question here is: can instantiating a class really take up so much memory or loading time that it has to be cached like this?
CodeIgniter is geared for rapid prototyping, and is really not a good example of enterprise patterns in almost any case. This behavior is related to their design choice of the relationship the "controller" has to almost all other objects; namely that there is exactly one of almost anything (only one instance of controller, only one instance of each library, etc). This design choice is more for rapid development (justified by the developer "not having to keep track of as much" or some such...).
Basically, there is memory saved by not instantiating an object (as much memory as it takes to store the object's instance variables) and if the object's constructor tries to do a fair bit of work, you can save time, too.
However, the appropriateness of the single-instance imperative is clearly not universal; sometimes you really do want a new instance. When you can justify this, choose a better framework.
The resources and time used in instantiating a class are usually negligible. The main reason I usually see singleton classes used is to maintain data integrity. For example, if you have a class that represents data in a database, creating multiple objects for it can cause the data to become out of sync. If one object changes and commits data to the DB, the other objects could have old data.
It's rather a simple concept: by utilizing the singleton pattern, it makes sure that a class is instantiated only once during an application's execution cycle.
This sort of concept applies more to libraries. Let's see a basic example:
class Authenticate {
public function login($username, $password) {
....
}
public function logout() {
}
}
Now, during the execution of a page, there is hardly any case where an object of the above class needs to be created more than once. The main thing to understand is utilization of resources.
And YES, instantiating the same class over and over again will without a doubt add up in memory. It might be negligible, as in the example I have shown, but it does have an effect.
This is probably a noob question, so please be kind.
I'm trying to implement a cache on an expensive "activity" object. In the constructor I first check the cache to see if this Activity instance already exists. If not, I do all the queries to build up the object, serialize it and save it to cache. The next time I come in, I check the cache and my object is there, so I unserialize it. Now here is my problem: how do I put that object into $this, the current object? I can't just say "$this = unserialize($row[0]);". That fails with the error message, "Cannot re-assign $this in ActivityClass.php". What am I missing?
Thanks a ton!
Mike
If you don't want your construction to leave the class, you can create a factory method:
class Activity
{
public static function Create(/* your params */)
{
// construct cache and key, whatever
$obj = unserialize($cache->get($key));
if ($obj) return $obj;
return new Activity(/* params */);
}
// rest of your stuff
}
You'll have to serialize only your object's internal state, i.e. its parameters (aka "member variables"). In fact, in this instance, serialize() isn't really what you want to do; rather, you want to store your ActivityClass's data to your cache, not the serialization of the entire object. This gets tricky, though, because as you add new parameters later you need to remember to store these in your cache as well.
Alternatively, you can implement a singleton or factory pattern for your ActivityClass. Since you say you're pulling the class from the cache in the constructor, I take it that only one instance of this class is meant to exist at any given time? In this case, you should make your class a singleton, by doing the following:
Make the __construct() method private or protected.
Create a public static method (I tend to call this getInstance()) that will check your cache for the object, or instantiate a new one and then cache it.
Now instead of directly instantiating a new ActivityClass object, you instead write $foo = ActivityClass::getInstance();, which gives you either a new object or unserializes and returns your cached one.
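Those steps might be sketched as follows. The static array is a hypothetical stand-in for a real cache such as Memcached, the id parameter is an assumption, and because instances are keyed by id this is closer to a caching factory than to a strict singleton:

```php
<?php
class ActivityClass
{
    public static $builds = 0;
    private static $cache = array(); // hypothetical stand-in for a real cache backend
    public $id;
    public $rows;

    // Private: outside code has to go through getInstance().
    private function __construct($id)
    {
        self::$builds++;
        $this->id = $id;
        $this->rows = array('row for ' . $id); // pretend these came from slow queries
    }

    public static function getInstance($id)
    {
        $key = 'activity:' . $id;
        if (isset(self::$cache[$key])) {
            return unserialize(self::$cache[$key]);
        }
        $activity = new self($id);
        self::$cache[$key] = serialize($activity);
        return $activity;
    }
}

$a = ActivityClass::getInstance(7);
$b = ActivityClass::getInstance(7); // rebuilt from the serialized copy, no queries
```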
As you noticed, you cannot just override the current object as a whole.
Instead, a possibility would be to store the data you're serializing/unserializing into a property of your object.
This way, you wouldn't serialize your whole object, but only one of its properties -- and only that single property would be overridden when unserializing.
Typically, you wouldn't serialize the connection to the database, which could be another property of your object.
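A sketch of that idea, assuming a hypothetical Activity class with a $data property for the cacheable state and a $db property that is only a placeholder for a live connection:

```php
<?php
class Activity
{
    public $data = array(); // the part worth caching
    public $db;             // placeholder for a live database connection

    public function __construct($db)
    {
        $this->db = $db;
    }

    // Only the data property is turned into a string for the cache.
    public function toCache()
    {
        return serialize($this->data);
    }

    // Only the data property is overwritten; $db stays as it is.
    public function fromCache($string)
    {
        $this->data = unserialize($string);
    }
}

$first = new Activity('connection A');
$first->data = array('title' => 'Hiking');
$stored = $first->toCache();

$second = new Activity('connection B'); // e.g. in a later request
$second->fromCache($stored);
```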
Another possibility would be to not have your object deal with its own (de-)serialization.
Instead, you should:
Use an external class to instantiate your object
With that external class being responsible for either:
Loading data from cache and pushing it into your object,
Or calling the right method of your class, to load data from the database -- and, then, save that object to cache.
I'm trying to track all changes made to a PHP variable. The variable can be an object or array.
For example it looks something like:
$object = array('a', 'b');
This object is then persisted to storage using an object-cache, so that it is available when the PHP script runs again.
So when the script runs the second time, or another script runs and modifies that object, I want those modifications to be tracked, either as they are being done, or in one go after the script executes.
eg:
$object[] = 'c';
I would like to know that 'c' was added to the object.
Now the actual code looks something like this:
$storage = new Storage();
$storage->object = array('a', 'b');
second load:
$storage = new Storage();
var_dump($storage->object); // array('a', 'b')
$storage->object[] = 'c';
What I want to know is that 'c' was pushed into $storage->object so in the class "Storage" I can set that value to persistent storage.
I have tried a few methods, that work, but have downsides.
1) Wrap all objects in a class "Storable" which tracks changes to the object
The class "Storable" just saves the actual data object as a property, and then provides __get() and __set() methods to access it. When a member/property of the object is modified or added, the "Storable" class notes this.
When a property is accessed, __get() on the Storable class returns the property, wrapped in another Storable class so that changes on that are tracked also, recursively for each new level.
The problem is that the objects are no longer native data types, and thus you cannot run array functions on arrays.
eg:
$storage = new Storage();
var_dump($storage->object); // array('a', 'b')
array_push($storage->object, 'c'); // fails
So instead we'd have to implement these array functions as methods of Storable.
eg:
$storage = new Storage();
var_dump($storage->object); // array('a', 'b')
$storage->object->push('c');
This is all good, but I'd like to know if it's possible to somehow use native functions, to reduce the overhead on the library I'm developing, while tracking changes so any changes can be added to persistent storage.
2) Forget about tracking changes, and just update whole object structures
This is the simplest method of keeping the objects in the program synchronized with the objects actually stored in the object-cache (which can be on a different machine).
However, it means whole structures, like an array with 1000 indexes, have to be sent through a socket to the object-cache when a single index changes.
3) Keep a mirror of the object locally
I've also tried cloning the object, and keeping a clone object untouched. Then when all processing is done by the PHP script, compare the clone to the modified object recursively, and submitting changed properties back to the object-cache.
This however requires that the whole object be downloaded in order to use it.
It also requires that the object take up twice as much memory, since it is cloned.
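For plain arrays, the comparison step of this third approach could be sketched like this. diff_arrays() is a hypothetical helper, and the dotted path notation is just one way to name nested keys:

```php
<?php
// Hypothetical helper: compares a snapshot with the current state, recursively.
function diff_arrays(array $before, array $after, $path = '')
{
    $changes = array();
    foreach ($after as $key => $value) {
        $here = $path === '' ? (string) $key : $path . '.' . $key;
        if (!array_key_exists($key, $before)) {
            $changes[$here] = 'added';
        } elseif (is_array($value) && is_array($before[$key])) {
            $changes += diff_arrays($before[$key], $value, $here);
        } elseif ($value !== $before[$key]) {
            $changes[$here] = 'changed';
        }
    }
    foreach ($before as $key => $value) {
        if (!array_key_exists($key, $after)) {
            $here = $path === '' ? (string) $key : $path . '.' . $key;
            $changes[$here] = 'removed';
        }
    }
    return $changes;
}

$snapshot = array('a', 'b', 'meta' => array('views' => 1));
$current = $snapshot;            // arrays are copied by value, so this is the mirror
$current[] = 'c';
$current['meta']['views'] = 2;

$changes = diff_arrays($snapshot, $current);
```

Only the entries listed in $changes would then need to be sent back to the object-cache.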
I know this is pretty vague, but there is a quite a bit of code involved. If anyone wants to see the code I can post it, or put it up on an open SVN repo. The project is open source but I haven't set up a public repository yet.
Since your 'object' is really an array, you can't add functionality to it. Your idea of encapsulating it with class methods is the correct approach. Worrying about performance over proper design at this stage is irrelevant and misguided - the overhead you incur with this approach will likely be insignificant to your overall application performance.
You should look into the SPL array classes such as ArrayObject. They provide a complete array-like interface and are easily extendable.
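As a rough sketch of extending ArrayObject for change tracking: TrackedArray and its $changes log are hypothetical, while ArrayObject, offsetSet() and offsetUnset() are part of SPL.

```php
<?php
// Hypothetical subclass that records every write made through array syntax.
class TrackedArray extends ArrayObject
{
    public $changes = array();

    public function offsetSet($key, $value): void
    {
        parent::offsetSet($key, $value);
        // $key is null for an append such as $list[] = 'c'.
        $this->changes[] = array('set', $key, $value);
    }

    public function offsetUnset($key): void
    {
        parent::offsetUnset($key);
        $this->changes[] = array('unset', $key);
    }
}

$list = new TrackedArray(array('a', 'b'));
$list[] = 'c';
$list[0] = 'z';
unset($list[1]);
```

Functions that require a real array, such as array_push(), still cannot take this object directly; getArrayCopy() returns a plain array when one is needed.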
In all honesty I would reconsider what you're doing. You're really trying to turn PHP into something it's not. This is the kind of ORM you see in Java and C# not PHP, which is basically transient in nature (meaning everything, barring memcache/APC/etc, is recreated on each request). This is anathema to sophisticated object caching and change tracking.
That being said, the only way you could do this is wrap everything in something that overloads __get(), __set() and __isset() and implements ArrayAccess.
Is there any way I can persist objects in PHP?
I am creating some objects in a script, but I have to create them every time the script is run (losing their state unless I save it to DB).
I know variables can be persisted with $_SESSION global, but should I save objects in it?
If object persistence is not possible, what's the use of OOP in PHP?
Serialize the object before you store it in the session:
$s_obj = serialize($myObj);
$_SESSION['myObj'] = $s_obj;
and later, to retrieve and reconstruct it:
$s_obj = $_SESSION['myObj'];
$myObj = unserialize($s_obj);
There is no need to serialize objects:
<?php
class A
{
protected $name;
public function __construct($name) { $this->name = $name; }
public function getName() { return $this->name; }
}
session_start();
if (isset($_SESSION['obj'])) {
die( $_SESSION['obj']->getName() );
}
$_SESSION['obj'] = new A('name');
?>
Object persistence is possible, but it is not automatically provided. You either need to write it yourself, or use an object layer that does it for you. So you'll probably need a database.
PHP is not an environment where your program responds to multiple page requests over time: instead, your program is invoked to respond to a page request and terminates when it's done.
The purpose of object oriented code in PHP is to make it possible to do a whole raft of programming algorithms and styles, and to make it easier to do an even bigger range of coding solutions. Yes, they are instantiated and destroyed within a single page call, so you have to work within that paradigm. Many codebases pass object IDs around between pages or in sessions; as soon as they need the corresponding object, it is instantiated and loaded from persistent storage using that ID. A good object layer will make this easy.
Agree with jcinacio, there is no need to serialize values before inserting them into $_SESSION.
PHP will manage the serialize/unserialize step for you at the start and end of each page request.
Another way to persist objects/sessions is to save them on file/database, "emulating" the php behaviour. In this case you'll need to serialize values to convert them into strings, and unserialize them once retrieved from database to convert them back to object.
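A small sketch of the file-based variant. The Counter class and the temporary file are assumptions for illustration; a database column would be used the same way:

```php
<?php
class Counter
{
    public $hits = 0;
}

// Hypothetical storage location.
$file = tempnam(sys_get_temp_dir(), 'obj');

$counter = new Counter();
$counter->hits = 5;
file_put_contents($file, serialize($counter));    // end of the first request

$loaded = unserialize(file_get_contents($file));  // start of a later request
$loaded->hits++;
unlink($file);
```

The class definition has to be loaded before unserialize() is called, otherwise PHP cannot rebuild a proper Counter instance.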
You may also be interested in the __sleep and __wakeup "Magic Methods" [0] of the object you're going to save. These methods are called when serializing/unserializing the object, to perform actions such as connecting to or disconnecting from a database, etc.
[0] http://php.net/oop5.magic
Note that if your state is truly shared between the various users, you don't want to use $_SESSION. $_SESSION is only available in the same user session - i.e. if you have 50 users on the site at once, every one of them will have to pay the computation penalty at least once.
In those cases, you might want to use a persistent disk-based or in-memory (memcache) cache.
Try a cache like APC http://www.php.net/apc/