Tracking PHP Object property changes - php

I'm trying to track all changes made to a PHP variable. The variable can be an object or array.
For example it looks something like:
$object = array('a', 'b');
This object is then persisted to storage using an object-cache. When php script runs again.
So when the script runs the second time, or another script runs and modifies that object, I want those modifications to be tracked, either as they are being done, or in one go after the script executes.
eg:
$object[] = 'c';
I would like to know that 'c' was added to the object.
Now the actually code looks something like this:
$storage = new Storage();
$storage->object = array('a', 'b');
second load:
$storage = new Storage();
var_dump($storage->object); // array('a', 'b')
$storage->object[] = 'c';
What I want to know is that 'c' was pushed into $storage->object so in the class "Storage" I can set that value to persistent storage.
I have tried a few methods, that work, but have downsides.
1) Wrap all objects in a class "Storable" which tracks changes to the object
The class "Storable" just saves the actual data object as a property, and then provides __get() and __set() methods to access it. When a member/property of the object is modified or added, the "Storable" class notes this.
When a a property is accessed __get() on the Storable class returns the property, wrapped in another Storable class so that changes on that are tracked also, recursively for each new level.
The problem is that the objects are no longer native data types, and thus you cannot run array functions on arrays.
eg:
$storage = new Storage();
var_dump($storage->object); // array('a', 'b')
array_push($storage->object, 'c'); // fails
So instead we'd have to implement these array functions as methods of Storable.
eg:
$storage = new Storage();
var_dump($storage->object); // array('a', 'b')
$storage->object->push('c');
This is all good, but I'd like to know if its possible to somehow use native functions, to reduce the overhead on the library I'm developing, while tracking changes so any changes can be added to persistent storage.
2) Forget about tracking changes, and just update whole object structures
This is the simplest method of keeping the objects in the program synchronized with the objects actually stored in the object-cache (which can be on a different machine).
However, it means whole structures, like an array with 1000 indexes, have to be sent though a socket to the object-cache when a single index changes.
3) Keep a mirror of the object locally
I've also tried cloning the object, and keeping a clone object untouched. Then when all processing is done by the PHP script, compare the clone to the modified object recursively, and submitting changed properties back to the object-cache.
This however requires that the whole object be downloaded in order to use it.
It also requires that the object take up twice as much memory, since it is cloned.
I know this is pretty vague, but there is a quite a bit of code involved. If anyone wants to see the code I can post it, or put it up on an open SVN repo. The project is open source but I haven't set up a public repository yet.

Since your 'object' is really an array, you can't add functionality to it. Your idea of encapsulating it with class methods is the correct approach. Worrying about performance over proper design at this stage is irrelevant and misguided - the overhead you incur with this approach will likely be insignificant to your overall application performance.
You should look into the SPL array classes such as ArrayObject. They provide a complete array-like interface and are easily extendable.

In all honesty I would reconsider what you're doing. You're really trying to turn PHP into something it's not. This is the kind of ORM you see in Java and C# not PHP, which is basically transient in nature (meaning everything, barring memcache/APC/etc, is recreated on each request). This is anathema to sophisticated object caching and change tracking.
That being said, the only way you could do this is wrap everything in something that overloads __get(), __set() and __isset() and implements ArrayAccess.

Related

Mocked object with fixed class name pollutes my following tests

In the framework we're using, there are several operations that are performed considering the class name.
So, while testing, I have to force the mock classname, doing something like this:
$this->getMock('Model', $methods, array($config), 'ModelFoobars');
Sadly I just found out that doing this will pollutes my following tests. This is very strange, since there is no internal cache (singleton pattern) in the objects I'm curently testing.
However, if I debug my test, I can find that my $model variable is an instance of PhpUnit, even if it was created using a new syntax!
$classname = 'ModelFoobars';
$model = new $classname();
I have no opcode cache, nor anything like that. This is really driving me crazy, any suggestions?
After a lot of testing and debugging (and I really mean a lot), I finally found the problem.
When you create a new mocked object, under the hood PhpUnit is dynamically creating new code, then it evals it.
By default, PhpUnit creates random class names, so you usually don't have any problems; however, if you force the class name, you can have "funny" results. Since the code is evaled, the class is pushed into the global scope!
This means that once you created the mock, you can't create another one with the same name: you are stuck using the first one.
The only solution I found is to avoid using fixed class names and always rely on the random ones.

Lazy evaluation container for dynamic programming?

I have some pattern that works great for me, but that I have some difficulty explaining to fellow programmers. I am looking for some justification or literature reference.
I personally work with PHP, but this would also be applicable to Java, Javascript, C++, and similar languages. Examples will be in PHP or Pseudocode, I hope you can live with this.
The idea is to use a lazy evaluation container for intermediate results, to avoid multiple computation of the same intermediate value.
"Dynamic programming":
http://en.wikipedia.org/wiki/Dynamic_programming
The dynamic programming approach seeks to solve each subproblem only once, thus reducing the number of computations: once the solution to a given subproblem has been computed, it is stored or "memo-ized": the next time the same solution is needed, it is simply looked up
Lazy evaluation container:
class LazyEvaluationContainer {
protected $values = array();
function get($key) {
if (isset($this->values[$key])) {
return $this->values[$key];
}
if (method_exists($this, $key)) {
return $this->values[$key] = $this->$key();
}
throw new Exception("Key $key not supported.");
}
protected function foo() {
// Make sure that bar() runs only once.
return $this->get('bar') + $this->get('bar');
}
protected function bar() {
.. // expensive computation.
}
}
Similar containers are used e.g. as dependency injection containers (DIC).
Details
I usually use some variation of this.
It is possible to have the actual data methods in a different object than the data computation methods?
It is possible to have computation methods with parameters, using a cache with a nested array?
In PHP it is possible to use magic methods (__get() or __call()) for the main retrieval method. In combination with "#property" in the class docblock, this allows type hints for each "virtual" property.
I often use method names like "get_someValue()", where "someValue" is the actual key, to distinguish from regular methods.
It is possible to distribute the data computation to more than one object, to get some kind of separation of concerns?
It is possible to pre-initialize some values?
EDIT: Questions
There is already a nice answer talking about a cute mechanic in Spring #Configuration classes.
To make this more useful and interesting, I extend/clarify the question a bit:
Is storing intermediate values from dynamic programming a legitimate use case for this?
What are the best practices to implement this in PHP? Is some of the stuff in "Details" bad and ugly?
If I understand you correctly, this is quite a standard procedure, although, as you rightly admit, associated with DI (or bootstrapping applications).
A concrete, canonical example would be any Spring #Configuration class with lazy bean definitions; I think it displays exactly the same behavior as you describe, although the actual code that accomplishes it is hidden from view (and generated behind the scenes). Actual Java code could be like this:
#Configuration
public class Whatever {
#Bean #Lazy
public OneThing createOneThing() {
return new OneThing();
}
#Bean #Lazy
public SomeOtherThing createSomeOtherThing() {
return new SomeOtherThing();
}
// here the magic begins:
#Bean #Lazy
public SomeThirdThing getSomeThirdThing() {
return new SomeThirdThing(this.createOneThing(), this.createOneThing(), this.createOneThing(), createSomeOtherThing());
}
}
Each method marked with #Bean #Lazy represents one "resource" that will be created once it is needed (and the method is called) and - no matter how many times it seems that the method is called - the object will only be created once (due to some magic that changes the actual code during loading). So even though it seems that in createOneThing() is called two times in createOneThing(), only one call will occur (and that's only after someone tries to call createSomeThirdThing() or calls getBean(SomeThirdThing.class) on ApplicationContext).
I think you cannot have a universal lazy evaluation container for everything.
Let's first discuss what you really have there. I don't think it's lazy evaluation. Lazy evaluation is defined as delaying an evaluation to the point where the value is really needed, and sharing an already evaluated value with further requests for that value.
The typical example that comes to my mind is a database connection. You'd prepare everything to be able to use that connection when it is needed, but only when there really is a database query needed, the connection is created, and then shared with subsequent queries.
The typical implementation would be to pass the connection string to the constructor, store it internally, and when there is a call to the query method, first the method to return the connection handle is called, which will create and save that handle with the connection string if it does not exist. Later calls to that object will reuse the existing connection.
Such a database object would qualify for lazy evaluating the database connection: It is only created when really needed, and it is then shared for every other query.
When I look at your implementation, it would not qualify for "evaluate only if really needed", it will only store the value that was once created. So it really is only some sort of cache.
It also does not really solve the problem of universally only evaluating the expensive computation once globally. If you have two instances, you will run the expensive function twice. But on the other hand, NOT evaluating it twice will introduce global state - which should be considered a bad thing unless explicitly declared. Usually it would make code very hard to test properly. Personally I'd avoid that.
It is possible to have the actual data methods in a different object than the data computation methods?
If you have a look at how the Zend Framework offers the cache pattern (\Zend\Cache\Pattern\{Callback,Class,Object}Cache), you'd see that the real working class is getting a decorator wrapped around it. All the internal stuff of getting the values stored and read them back is handled internally, from the outside you'd call your methods just like before.
The downside is that you do not have an object of the type of the original class. So if you use type hinting, you cannot pass a decorated caching object instead of the original object. The solution is to implement an interface. The original class implements it with the real functions, and then you create another class that extends the cache decorator and implements the interface as well. This object will pass the type hinting checks, but you are forced to manually implement all interface methods, which do nothing more than pass the call to the internal magic function that would otherwise intercept them.
interface Foo
{
public function foo();
}
class FooExpensive implements Foo
{
public function foo()
{
sleep(100);
return "bar";
}
}
class FooCached extends \Zend\Cache\Pattern\ObjectPattern implements Foo
{
public function foo()
{
//internally uses instance of FooExpensive to calculate once
$args = func_get_args();
return $this->call(__FUNCTION__, $args);
}
}
I have found it impossible in PHP to implement a cache without at least these two classes and one interface (but on the other hand, implementing against an interface is a good thing, it shouldn't bother you). You cannot simply use the native cache object directly.
It is possible to have computation methods with parameters, using a cache with a nested array?
Parameters are working in the above implementation, and they are used in the internal generation of a cache key. You should probably have a look at the \Zend\Cache\Pattern\CallbackCache::generateCallbackKey method.
In PHP it is possible to use magic methods (__get() or __call()) for the main retrieval method. In combination with "#property" in the class docblock, this allows type hints for each "virtual" property.
Magic methods are evil. A documentation block should be considered outdated, as it is no real working code. While I found it acceptable to use magic getter and setter in a really easy-to-understand value object code, which would allow to store any value in any property just like stdClass, I do recommend to be very careful with __call.
I often use method names like "get_someValue()", where "someValue" is the actual key, to distinguish from regular methods.
I would consider this a violation of PSR-1: "4.3. Methods: Method names MUST be declared in camelCase()." And is there a reason to mark these methods as something special? Are they special at all? The do return the value, don't they?
It is possible to distribute the data computation to more than one object, to get some kind of separation of concerns?
If you cache a complex construction of objects, this is completely possible.
It is possible to pre-initialize some values?
This should not be the concern of a cache, but of the implementation itself. What is the point in NOT executing an expensive computation, but to return a preset value? If that is a real use case (like instantly return NULL if a parameter is outside of the defined range), it must be part of the implementation itself. You should not rely on an additional layer around the object to return a value in such cases.
Is storing intermediate values from dynamic programming a legitimate use case for this?
Do you have a dynamic programming problem? There is this sentence on the Wikipedia page you linked:
There are two key attributes that a problem must have in order for dynamic programming to be applicable: optimal substructure and overlapping subproblems. If a problem can be solved by combining optimal solutions to non-overlapping subproblems, the strategy is called "divide and conquer" instead.
I think that there are already existing patterns that seem to solve the lazy evaluation part of your example: Singleton, ServiceLocator, Factory. (I'm not promoting singletons here!)
There also is the concept of "promises": Objects are returned that promise to return the real value later if asked, but as long as the value isn't needed right now, would act as the values replacement that could be passed along instead. You might want to read this blog posting: http://blog.ircmaxell.com/2013/01/promise-for-clean-code.html
What are the best practices to implement this in PHP? Is some of the stuff in "Details" bad and ugly?
You used an example that probably comes close to the Fibonacci example. The aspect I don't like about that example is that you use a single instance to collect all values. In a way, you are aggregating global state here - which probably is what this whole concept is about. But global state is evil, and I don't like that extra layer. And you haven't really solved the problem of parameters enough.
I wonder why there are really two calls to bar() inside foo()? The more obvious method would be to duplicate the result directly in foo(), and then "add" it.
All in all, I'm not too impressed until now. I cannot anticipate a real use case for such a general purpose solution on this simple level. I do like IDE auto suggest support, and I do not like duck-typing (passing an object that only simulates being compatible, but without being able to ensure the instance).

PHPUnit: Mock methods of existing object

PHPUnit's getMock($classname, $mockmethods) creates a new object based on the given class name and lets me change/test the behavior of the methods I specified.
I long for something different; it's changing the behavior of methods of an existing object - without constructing a new object.
Is that possible? If yes, how?
When thinking about the problem I came to the conclusion that it would be possible by serializing the object, changing the serialized string to let the object be instance of a new class that extends the old class plus the mocked methods.
I'd like some code for that - or maybe there is such code already somewhere.
While it would certainly be possible to create the to-be-mocked object again, it's just too complicated to do it in my test. Thus I don't want to do that if I don't really really really have to. It's a TYPO3 TSFE instance, and setting that up once in the bootstrapping process is hard enough already.
I know this answer is rather late, but I feel for future viewers of this question there now exists a simpler solution (which has some drawbacks, but depending on your needs can be much easier to implement). Mockery has support for mocking pre-existing objects with what they call a "proxied partial mock." They say that this is for classes with final methods, but it can be used in this case (although the docs do caution that it should be a "last resort").
$existingObjectMock = \Mockery::mock($existingObject);
$existingObjectMock->shouldReceive('someAction')->andReturn('foobar');
It acts by creating a proxy object which hands all method calls and attribute gets/sets to the existing object unless they are mocked.
It should be noted, though, that the proxy suffers from the obvious issue of failing any typechecks or typehints. But this can be usually avoided, because the $existingObject can still be passed around. You should only use the $existingObjectMock when you need the mock capabilities.
Not all pre-existing code can be tested. Code really needs to be designed to be testable. So, while not exactly what you're asking for, you can refactor the code so that the instantiation of the object is in a separate method, then mock that method to return what you want.
class foo {
function new_bar($arg) { return new bar($arg); }
function blah() {
...
$obj = $this->new_bar($arg);
return $obj->twiddle();
}
}
and then you can test it with
class foo_Test extends PHPUnit_Framework_TestCase {
function test_blah() {
$sut = $this->getMock('foo', array('new_bar', 'twiddle'));
$sut->expects($this->once())->method('new_bar')->will($this->returnSelf());
$sut->expects($this->once())->method('twiddle')->will($this->returnValue('yes'));
$this->assertEquals('yes', $sut->blah());
}
}
Let me start of by saying: Welcome to the dark side of unit testing.
Meaning: You usually don't want to do this but as you explained you have what seems to be a valid use case.
runkit
What you can do quite easily, well not trivial but easier than changing your application architecture, is to change class behavior on the fly by using runkit
runkit_method_rename(
get_class($object), 'methodToMock', 'methodToMock_old'
);
runkit_method_add(
get_class($object), 'methodToMock', '$param1, $param2', 'return 7;'
);
runkit::method_add
and after the test to a method_remove and the rename again. I don't know of any framework / component that helps you with that but it's not that much to implement on your own in a UglyTestsBaseTest extends PHPUnit_Framework_TestCase.
Well...
If all you have access to is a reference to that object (as in: The $x in $x = new Foo();) i don't know of any way to say: $x, you are now of type SomethingElse and all other variables pointing to that object should change too.
I'm going to assume you already know things like testing your privates but it doesn't help you because you don't have control over the objects life cycle.
The php test helpers extension
Note: the Test-Helper extension is superseded by https://github.com/krakjoe/uopz
What also might help you out here is: Stubbing Hard-Coded Dependencies using the php-test-helpers extension that allows you do to things like Intercepting Object Creation.
That means while your application calls $x = new Something(); you can hack PHP to make it so that $x then contains an instance of YourSpecialCraftedSomething.
You might create that classing using the PHPUnit Mocking API or write it yourself.
As far as i know those are your options. If it's worth going there (or just writing integration / selenium tests for that project) is something you have to figure out on your own as it heavily depends on your circumstances.

Create an object instance from cache trough the class in PHP

Given a class with some really expensive code, I want to avoid running that code when re-defining an instance.
Best explained with some pseudo-code:
$foo = new Foo('bar');
print $foo->eat_cpu_and_database_resources(); #=> 3.14159
$foo->store_in_cache(); #Uses an existing Memcached and/or caching to store serialized.
#new thread, such as a new HTTP request. Could be days later.
$bar = new Foo('bar');
print $foo->eat_cpu_and_database_resources(); #=> 3.14159
The second $bar should re-initialize the earlier created instance $foo. Inside my actual class, I do several things on eat_cpu_and_database_resources(), which is named get_weighted_tags(): calculate a weighted tagcloud from values in $foo->tags. $foo->tags() was filled with expensive $foo->add_tag() calls. I would like to retrieve the prepared and filled instance from now on, from cache.
I have tried to simply fetch from (serialized) cache on __construct() and assign the retrieved instance to $this, which is not allowed in PHP:
function __construct ($id) {
if ($cached = $this->cache_get($id)) {
$this = $cached
}
else {
#initialize normally
}
}
Is this a proper way? Or should I treat every instance unique and instead apply caching in the eat_cpu_and_database_resources() method, instead of caching the entire instance?
Is there a built-in way in PHP to revive old instances (in a new thread)?
Depending on the size of Foo, you might want to cache the entire object in the cache store Drupal provides. If it's too big for that, see if it makes sense to just cache the result to the expensive method call(s).
If you want to unserialize an object from the PHP internal format, you have to use the corresponding unserialize method and might want to add the magic __wakeup method to do any post re-initializations:
The intended use of __wakeup is to reestablish any database connections that may have been lost during serialization and perform other reinitialization tasks.
Since you have to have the serialized string for that first, you might want to add some facilitiy to encapsulate this logic, like a Factory or Builder pattern or a dedicated FooCache.
Personally I find caching the method call the best option because there's no point in caching the whole object when it's really just the method call that's expensive. That will also save you any additional work checking whether there is a serialized string to start with or building a factory.

Implications of Instantiating Objects with Dynamic Variables in PHP

What are the performance, security, or "other" implications of using the following form to declare a new class instance in PHP
<?php
$class_name = 'SomeClassName';
$object = new $class_name;
?>
This is a contrived example, but I've seen this form used in Factories (OOP) to avoid having a big if/switch statement.
Problems that come immediately to mind are
You lose the ability to pass arguments into a constructor (LIES. Thanks Jeremy)
Smells like eval(), with all the security concerns it brings to the table (but not necessarily the performance concerns?)
What other implications are there, or what search engine terms other than "Rank PHP Hackery" can someone use to research this?
One of the issues with the resolving at run time is that you make it really hard for the opcode caches (like APC). Still, for now, doing something like you describe in your question is a valid way if you need a certain amount of indirection when instanciating stuff.
As long as you don't do something like
$classname = 'SomeClassName';
for ($x = 0; $x < 100000; $x++){
$object = new $classname;
}
you are probably fine :-)
(my point being: Dynamically looking up a class here and then doesn't hurt. If you do it often, it will).
Also, be sure that $classname can never be set from the outside - you'd want to have some control over what exact class you will be instantiating.
It looks you can still pass arguments to the constructor, here's my test code:
<?php
class Test {
function __construct($x) {
echo $x;
}
}
$class = 'Test';
$object = new $class('test'); // echoes "test"
?>
That is what you meant, right?
So the only other problem you mentioned and that I can think of is the security of it, but it shouldn't be too difficult to make it secure, and it's obviously a lot more secure than using eval().
I would add that you can also instanciate it with a dynamic number of parameters using :
<?php
$class = "Test";
$args = array('a', 'b');
$ref = new ReflectionClass($class);
$instance = $ref->newInstanceArgs($args);
?>
But of course you add some more overhead by doing this.
About the security issue I don't think it matters much, at least it's nothing compared to eval(). In the worst case the wrong class gets instanciated, of course this is a potential security breach but much harder to exploit, and it's easy to filter using an array of allowed classes, if you really need user input to define the class name.
There may indeed be a performance hit for having to resolve the name of the variable before looking up the class definition. But, without declaring classes dynamically you have no real way to do "dyanmic" or "meta" programming. You would not be able to write code generation programs or anything like a domain-specific language construct.
We use this convention all over the place in some of the core classes of our internal framework to make the URL to controller mappings work. I have also seen it in many commercial open source applications (I'll try and dig for an example and post it). Anyway, the point of my answer is that it seems well worth what is probably a slight performance decrease if it makes more flexible, dynamic code.
The other trade-off that I should mention, though, is that performance aside, it does make the code slightly less obvious and readable unless you are very careful with your variable names. Most code is written once, and re-read and modified many times, so readability is important.
Alan, there's nothing wrong with dynamic class initialisation. This technique is present also in Java language, where one can convert string to class using Class.forClass('classname') method. It is also quite handy to defer algorithm complexity to several classes instead of having list of if-conditions. Dynamic class names are especially well suited in situations where you want your code to remain open for extension without the need for modifications.
I myself often use different classes in conjunction with database tables. In one column I keep class name that will be used to handle the record. This gives me great power of adding new types of records and handle them in unique way without changing a single byte in existing code.
You shouldn't be concerned about the performance. It has almost no overhead and objects themselves are super fast in PHP. If you need to spawn thousands of identical objects, use Flyweight design pattern to reduce memory footprint. Especially, you should not sacrifice your time as a developer just to save milliseconds on server. Also op-code optimisers work seamlessly with this technique. Scripts compiled with Zend Optimizer did not misbehave.
So I've recently encountered this, and wanted to give my thoughts on the "other" implications of using dynamic instantiation.
For one thing func_get_args() throws a bit of a wrench into things. For example I want to create a method that acts as a constructor for a specific class (e.g. a factory method). I'd need to be able to pass along the params passed to my factory method to the constructor of the class I'm instantiating.
If you do:
public function myFactoryMethod()
{
$class = 'SomeClass'; // e.g. you'd get this from a switch statement
$obj = new $class( func_get_args() );
return $obj;
}
and then call:
$factory->myFactoryMethod('foo','bar');
You're actually passing an array as the first/only param, which is the same as new SomeClass( array( 'foo', 'bar' ) ) This is obviously not what we want.
The solution (as noted by #Seldaek) requires us to convert the array into params of a constructor:
public function myFactoryMethod()
{
$class = 'SomeClass'; // e.g. you'd get this from a switch statement
$ref = new ReflectionClass( $class );
$obj = $ref->newInstanceArgs( func_get_args() );
return $obj;
}
Note: This could not be accomplished using call_user_func_array, because you can't use this approach to instantiate new objects.
HTH!
I use dynamic instantiation in my custom framework. My application controller needs to instantiate a sub-controller based on the request, and it would be simply ridiculous to use a gigantic, ever-changing switch statement to manage the loading of those controllers. As a result, I can add controller after controller to my application without having to modify the app controller to call them. As long as my URIs adhere to the conventions of my framework, the app controller can use them without having to know anything until runtime.
I'm using this framework in a production shopping cart application right now, and the performance is quite favorable, too. That being said, I'm only using the dynamic class selection in one or two spots in the whole app. I wonder in what circumstances you would need to use it frequently, and whether or not those situations are ones that are suffering from a programmer's desire to over-abstract the application (I've been guilty of this before).
One problem is that you can't address static members like that, for instance
<?php
$className = 'ClassName';
$className::someStaticMethod(); //doesn't work
?>
#coldFlame: IIRC you can use call_user_func(array($className, 'someStaticMethod') and call_user_func_array() to pass params
class Test {
function testExt() {
print 'hello from testExt :P';
}
function test2Ext()
{
print 'hi from test2Ext :)';
}
}
$class = 'Test';
$method_1 = "testExt";
$method_2 = "test2Ext";
$object = new $class(); // echoes "test"
$object->{$method_2}(); // will print 'hi from test2Ext :)'
$object->{$method_1}(); // will print 'hello from testExt :P';
this trick works in both php4 and php5 :D enjoy..

Categories