Implementing a memory-efficient ArrayAccess class in PHP

I am creating a class that implements the composite pattern; the class is supposed to be able to represent a tree (so let's call it Tree).
In order to use as little memory as possible, when a Tree instance is passed an array, it keeps it as an array until that index is requested, at which point a new child Tree instance is created and returned.
The reasoning for this: My Tree instances pass through many filters and their content might be traversed and modified by many observers, but their content will be requested only once (on the final rendering). Moreover, a certain chunk might not be rendered at all, so it would be wasted memory to create instances of Tree that are not even going to be used.
In other words, if one uses
$class->set($array, $index); // or $class[$index] = $array;
the array is stored as normal. But when someone uses
$class->get($index); // or $class[$index]
a new instance of Tree is returned (and cached for subsequent calls).
However, I am left with a dilemma. Should I:
1. Create a new instance of Tree when someone sets data?
Pros: the code is easy to write, easy to maintain, and easy for someone else to pick up and improve upon.
Cons: higher memory consumption, especially considering that part of the data entered might never be used. Manipulation also gets messier, as special methods have to be written for special cases (as opposed to working with native arrays).
2. Leave it as is, but have people write $class[$index] = array($index2 => $array)?
Pros: everything stays stored as an array, so normal array functions work, and memory consumption is minimal.
Cons: the usage is less "elegant", and possibly more complex.
I know this is not a forum, so I am not asking for opinions, but for advice from someone who has already developed a (potentially heavy) tree structure in PHP on the best (or "accepted") way to go about it.

The two demos, one creating the Tree on construct and one creating the Tree on demand, are simple tests you can run yourself.
The latter creates new node objects only when they're accessed, so its memory consumption stays much lower.
The beauty of object-oriented programming, encapsulation and facades is that nobody cares what you do internally, as long as the API does what it's supposed to. Whoever or whatever is using your TreeNode doesn't need to know how it works internally. Just go with whatever feels right. Design your APIs well and you can change the internals at any time.
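To make the on-demand idea concrete, here is a minimal sketch in Python rather than PHP, assuming Python's __getitem__/__setitem__/__contains__/__delitem__ play the roles of ArrayAccess's offsetGet/offsetSet/offsetExists/offsetUnset. The class name Tree comes from the question; everything else is illustrative. Raw nested dicts are stored as-is, and a child Tree is only created (and then cached) the first time its key is read.

```python
class Tree:
    """Python analogue of a PHP ArrayAccess tree that wraps children lazily."""

    def __init__(self, data=None):
        # Keep a reference to the raw dict instead of converting it eagerly.
        self._data = data if data is not None else {}
        # Child Tree instances, created on first read and then reused.
        self._children = {}

    def __setitem__(self, key, value):
        # Like offsetSet(): store the raw value, drop any stale cached child.
        self._data[key] = value
        self._children.pop(key, None)

    def __getitem__(self, key):
        # Like offsetGet(): wrap dict values in a Tree only when requested.
        value = self._data[key]
        if not isinstance(value, dict):
            return value
        if key not in self._children:
            self._children[key] = Tree(value)
        return self._children[key]

    def __contains__(self, key):
        # Like offsetExists().
        return key in self._data

    def __delitem__(self, key):
        # Like offsetUnset().
        del self._data[key]
        self._children.pop(key, None)


root = Tree({"header": {"title": "Home"}, "footer": {"links": {}}})
root["sidebar"] = {"widgets": {}}  # stored as a plain dict
print(len(root._children))         # nothing has been wrapped yet
title = root["header"]["title"]    # "header" is wrapped at this point
print(root["header"] is root["header"])
```

Because the child Tree wraps the very same dict object, writes through a child also update the parent's raw data, so no copy is made. Overwriting a key drops its cached child, so a stale node is never returned.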

Related

Are DAO objects better than static DAO classes in PHP?

I understand the reasons for not using statics in Java.
However, I'm currently developing OO code in PHP. I use DAOs with the goal of keeping my queries in one place so I can easily find them. I also instantiate some DAOs so I can incorporate pagination in some (relevant) queries. In many cases, it's not necessary and so I tend to just create static methods (even though technically I don't think I can call that a DAO) in the form:
$info = schemeDAO::someFunction($variable);
Sometimes I need only that single method during a page load (e.g. a specific value in a header file).
Other times I may need to instantiate the same DAO a hundred times as objects are created and destroyed:
$dao = new myDao();
$info = $dao->someFunction($variable);
Either way, wouldn't it be more efficient, in PHP at least, to simply load a static class and keep it in memory?

While static access is acceptable (to an extent), the dynamic approach lets you pass the object on to a third object as a dependency (otherwise the call would have to be initiated from the original class). That third object doesn't need to have data pushed into it; instead, it decides what data or methods it needs and pulls them from the dependency, possibly several times within a single method. A static method can only return a value, whereas an instance can be passed around and called, keeping the method logic together with its data. Code using instances also tends to be shorter, and when you remove an instance, every call that depends on it fails immediately, whereas calls to a static class can linger unnoticed in the code because they have no instantiation prerequisite.
Static classes preserve their state across different objects and method contexts, so they are not automatically "reset" the way a freshly constructed object is with new. Instances encourage a more transparent, pure-function style in which dependencies are passed as parameters. When you pass an object, you keep the service logic together with its data structure; when you pass only an array, the execution logic is lost in transit or moved elsewhere, and eventually has to be called statically and non-transparently, which works against the pure-function idea.
I would compare this to Einstein's versus Newton's equations: in some cases they look practically the same. But to be safe I would use the more versatile instances (or service-locator singletons) rather than static classes. On the other hand, the less versatile static classes might be easier to implement initially, especially if you don't plan to take them as far as you could go with instances. Much as private attributes signal that they are not passed anywhere, static methods can signal pure functions, though they can also signal something bad: that they may be called from anywhere.
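A rough Python sketch of the dependency-passing argument; UserDao, StaticUserDao, FakeDao and HeaderWidget are hypothetical names, and a dict stands in for the database. The widget receives a DAO object and pulls the data it needs, so a different implementation (here a fake for testing) can be swapped in without touching the widget, whereas code that calls StaticUserDao.find_name() by name is tied to that one class.

```python
class UserDao:
    """Hypothetical instance-based DAO; a dict stands in for the database."""

    def __init__(self, rows):
        self._rows = rows

    def find_name(self, user_id):
        return self._rows[user_id]


class StaticUserDao:
    """Static alternative: callers name this class directly."""

    _rows = {1: "Mark"}

    @staticmethod
    def find_name(user_id):
        return StaticUserDao._rows[user_id]


class FakeDao:
    """Test double; usable only because the widget receives its DAO."""

    def find_name(self, user_id):
        return "Test User"


class HeaderWidget:
    """Receives a DAO as a dependency and pulls the data it needs."""

    def __init__(self, dao):
        self._dao = dao

    def render(self, user_id):
        # The widget decides what to fetch and how often, not its caller.
        return "Welcome, " + self._dao.find_name(user_id)


print(HeaderWidget(UserDao({1: "Mark"})).render(1))
print(HeaderWidget(FakeDao()).render(42))
print(StaticUserDao.find_name(1))
```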

PHP Iterator classes

I'm trying to figure out what's the actual benefit of using Iterator classes in Object Oriented PHP over the standard array.
I'm planning to upgrade my framework by converting all arrays to objects, but I just don't understand the actual need apart from making the system fully OOP.
I know that by using IteratorAggregate I can create:
class MyModel implements IteratorAggregate {
    public $records = array();

    public function __construct(array $records) {
        $this->records = $records;
    }

    // Return type avoids the deprecation notice newer PHP versions emit.
    public function getIterator(): Traversable {
        return new ArrayIterator($this->records);
    }
}
and then loop through it just like an array:
$mdlMy = new MyModel(array(
    array('first_name' => 'Mark', 'last_name' => 'Smith'),
    array('first_name' => 'John', 'last_name' => 'Simpson')
));

foreach ($mdlMy as $row) {
    echo $row['first_name'];
    echo $row['last_name'];
}
Could someone explain in simple terms the actual purpose of these, perhaps with a use case?

Shortest Answer
Extensibility & abstraction.
Abstract Answer
As soon as you have the ArrayAccess interface, you've got things that aren't arrays but have an array interface. How will you traverse these? You could do it directly, which is where the Iterator interface comes from. Iterator might not make sense for some classes, either due to the single-responsibility principle, or for performance's sake, which is where you get IteratorAggregate.
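In Python terms (used here only as an analogue of the PHP interfaces), the distinction looks roughly like this: the hypothetical RecordSet hands out a fresh iterator from __iter__, much like IteratorAggregate::getIterator(), while the hypothetical Countdown is its own iterator and tracks the traversal position itself, like a class implementing Iterator.

```python
class RecordSet:
    """Like IteratorAggregate: delegates traversal to a separate iterator."""

    def __init__(self, records):
        self._records = list(records)

    def __getitem__(self, index):
        # Array-style access, roughly like ArrayAccess::offsetGet().
        return self._records[index]

    def __iter__(self):
        # Like getIterator(): hand back a fresh iterator over the data.
        return iter(self._records)


class Countdown:
    """Like Iterator: the object itself holds the traversal position."""

    def __init__(self, start):
        self._current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self._current <= 0:
            raise StopIteration
        value = self._current
        self._current -= 1
        return value


records = RecordSet([{"first_name": "Mark"}, {"first_name": "John"}])
names = [row["first_name"] for row in records]
print(names)
print(list(Countdown(3)))
```

Keeping the traversal state outside the collection is what lets RecordSet be looped over repeatedly (or in nested loops), whereas a Countdown object is exhausted after one pass.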
SPL-based Answer
SPL introduced a number of data structures. Iterators allow these to be traversed in foreach loops. Without iterators, collections would need to be converted to arrays, a potentially costly operation.
Long Answer
Source-Iterators
The first use comes up with data sources (e.g. collections), which aren't all natively held in arrays. Examples (note: there is some overlap):
trees
the file system
the previously mentioned SPL data structures
network communications
database query results
external process results
ongoing computation (PHP 5.5 introduces generators for this case)
Any collection that isn't array-based typically either is an iterator or has a corresponding iterator. Without iterators, each of the above would need to be converted to or collected in an array, which might incur heavy time & space costs. If you only had arrays available for iteration, the process couldn't proceed until the conversion/collection finished. Iterators allow partial results to be processed as they become available, and allow only portions of collections to be in memory at any point in time.
In the particular case outlined in the question, the UserEntityManager::getAll() method could benefit from an Iterator by reducing memory usage. Depending on what is used for data storage, an Iterator will allow just some user records to be processed at a time, rather than loading all of them at once.
ArrayIterator, DirectoryIterator, and the SPL data structures are all examples of source-iterators (i.e. they iterate over a data source).
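As a Python analogue of a source iterator, the generator below pages through records instead of loading them all. fetch_page is a hypothetical stand-in for a LIMIT/OFFSET database query, and the calls list exists only to show that pages are fetched on demand: breaking out of the loop early means later pages are never requested.

```python
calls = []  # records each (hypothetical) query, to show the laziness


def fetch_page(offset, limit):
    """Hypothetical stand-in for a paged database query (LIMIT/OFFSET)."""
    calls.append(offset)
    fake_table = [{"id": i, "name": "user%d" % i} for i in range(1, 8)]
    return fake_table[offset:offset + limit]


def iter_users(page_size=3):
    """Source iterator: yields one user at a time, fetching a page at a time."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            return
        yield from page
        offset += page_size


for user in iter_users():
    if user["id"] > 4:
        break  # later pages are never requested
    print(user["name"])
```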
Processing-Iterators
Another use for iterators is in-place data processing. Processing iterators wrap other iterators, which allows for iterator composition. In PHP, these are the OuterIterators and sometimes have 'IteratorIterator' in their names.
You might ask "Why not just use functions?" The answer is that you could, but iterator composition (like function composition) is another powerful tool that allows for different types of solutions, sometimes achieving better performance or clarity. In particular, functions become a choke point in PHP, since it doesn't have in-language concurrency. Functions must finish before returning a result, which can be costly in terms of time & space, just as using arrays for iteration can be costly.
The choke-point could be side-stepped by returning an iterator from a function, but that places a function call in between each iterator. Iterator composition allows deep iterator-based computations, cutting out the middle-man.
As for use-cases, consider a batch processing system that consumes data from multiple feeds, all of which have different formats. An adapting iterator can normalize the data for processing, allowing a single batch processor to service all the feeds.
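The batch-processing use case can be sketched with Python generators standing in for PHP's OuterIterators. The two feed formats (CSV with who/amount columns, and JSON lines with customer/total in cents) are invented for illustration: each adapter normalizes its feed into the same record shape, positive() is a processing iterator composed on top, and itertools.chain plays the role of AppendIterator.

```python
import csv
import io
import itertools
import json


def from_csv(text):
    """Adapter: hypothetical CSV feed with who,amount columns."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"name": row["who"], "amount": float(row["amount"])}


def from_json_lines(text):
    """Adapter: hypothetical JSON-lines feed with totals in cents."""
    for line in text.splitlines():
        if line.strip():
            record = json.loads(line)
            yield {"name": record["customer"], "amount": record["total"] / 100}


def positive(records):
    """Processing iterator: lazily drops zero or negative amounts."""
    return (r for r in records if r["amount"] > 0)


def batch_total(records):
    """The single batch processor that serves every feed."""
    return sum(r["amount"] for r in records)


csv_feed = "who,amount\nAlice,10.5\nBob,0\n"
json_feed = '{"customer": "Carol", "total": 250}\n{"customer": "Dan", "total": -100}\n'

print(batch_total(positive(from_csv(csv_feed))))
print(batch_total(positive(from_json_lines(json_feed))))
# Appending feeds before processing, like AppendIterator.
print(batch_total(positive(itertools.chain(from_csv(csv_feed), from_json_lines(json_feed)))))
```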
As a reality check, in PHP you typically don't go full iterator-style any more than you'd write full-FP style, though PHP supports it. You usually don't compose more than a few iterators at a time (just as you often don't compose more than a few functions at a time in languages with function composition), and you don't create numerous iterators instead of functions.
RecursiveIteratorIterator is an example of a processing iterator; it linearizes a tree (simplifying tree traversal).
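For comparison, a recursive Python generator can linearize a nested structure in a similar way. This is only an analogue, assuming the default LEAVES_ONLY behaviour of RecursiveIteratorIterator (yield the leaves, skip the containers); leaves and menu are illustrative names.

```python
def leaves(node):
    """Depth-first walk over nested dicts/lists that yields only leaf values."""
    children = node.values() if isinstance(node, dict) else node
    for child in children:
        if isinstance(child, (dict, list)):
            yield from leaves(child)  # descend into the subtree
        else:
            yield child


menu = {"home": "/", "docs": {"api": "/api", "guides": ["/start", "/faq"]}}
print(list(leaves(menu)))
```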
Iterator & Functional Styles
Iterator composition allows for a style closer to functional programming. At its most basic, an iterator is (roughly) a sequence. In FP, the most basic operation is fold (aka reduce), though others (especially append/concat, filter and map) are often implemented natively rather than in terms of fold for performance. PHP supports a few sequence operations on iterators (usually as OuterIterators); many are missing, but are easy to implement.
append: AppendIterator
cons: nothing, but easily (though not efficiently) implemented by creating an iterator that takes a single value, converting it to a single-element sequence, along with AppendIterator. EmptyIterator represents the empty sequence.
filter: CallbackFilterIterator
convolute (aka zip): MultipleIterator
slice: LimitIterator
map - nothing, but easily implemented
fold: nothing. Using a foreach loop and accumulating a value in a variable is probably clearer than implementing fold, but if you find a reason to do so, it's also straightforward (though probably not as an iterator).
flat-map: nothing. Can be written fairly easily (though not efficiently) in terms of append and map.
cycle: InfiniteIterator
unfold: generators (which are, in general, just a special case of iterators).
memoization: CachingIterator. Not so much a sequence operation as an (FP language) feature for function results.
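For readers coming from Python, the same sequence operations map onto built-ins and the itertools/functools modules, shown here as a rough cross-reference rather than an exact equivalence with the PHP classes listed above. The unfold helper is hypothetical (Python has no built-in unfold; a generator fills that role, as the list notes), and functools.lru_cache illustrates memoization of function results.

```python
import functools
import itertools

nums = [1, 2, 3, 4]

appended = list(itertools.chain(nums, [5, 6]))            # append
consed = list(itertools.chain([0], nums))                  # cons via a one-element sequence
evens = list(filter(lambda n: n % 2 == 0, nums))           # filter
zipped = list(zip(nums, "abcd"))                           # convolute / zip
sliced = list(itertools.islice(nums, 1, 3))                # slice
squares = list(map(lambda n: n * n, nums))                 # map
total = functools.reduce(lambda acc, n: acc + n, nums, 0)  # fold
pairs = list(itertools.chain.from_iterable(map(lambda n: [n, -n], nums)))  # flat-map
cycled = list(itertools.islice(itertools.cycle("ab"), 5))  # cycle (bounded by islice)


def unfold(step, seed):
    """Hypothetical unfold: step returns (value, next_seed), or None to stop."""
    while (result := step(seed)) is not None:
        value, seed = result
        yield value


countdown = list(unfold(lambda n: (n, n - 1) if n > 0 else None, 3))


@functools.lru_cache(maxsize=None)
def slow_square(n):
    """Memoization of function results."""
    return n * n
```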
The Future
Part of any language design is considering what the language could be. If concurrency were ever added to PHP, code that uses iterators (especially processing-iterators) could be made concurrent without being changed by making the iterators themselves concurrent.

As written, you have no advantage. Yet. But as soon as you need to implement some new functionality, you'll start to benefit!
If you want to add a record to your 'array' but need to do some checking first, that check will apply everywhere: you don't have to find all the places you use this array, you just add your validation to the constructor.
The same goes for everything else that belongs in a model.
But beware, 'OOP' is not "I'm using objects me!". It is also about what objects you have, what they do, etc. So don't just go packing all your arrays in objects and call it "going fully OOP". There's nothing OOP about that, nor is there a limit on using arrays in an OOP project.
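The validation point can be sketched in Python as a small collection class; UserList and the first_name rule are hypothetical. Records passed to the constructor and records added later go through the same check, so the rule lives in exactly one place.

```python
class UserList:
    """Wraps a plain list so every insertion passes one validation point."""

    def __init__(self, records=()):
        self._records = []
        for record in records:
            self.add(record)  # constructor input is validated too

    def add(self, record):
        # Hypothetical rule: every record needs a non-empty first_name.
        if not record.get("first_name"):
            raise ValueError("record is missing first_name")
        self._records.append(record)

    def __iter__(self):
        return iter(self._records)

    def __len__(self):
        return len(self._records)


users = UserList([{"first_name": "Mark"}])
users.add({"first_name": "John"})
try:
    users.add({"last_name": "Simpson"})
except ValueError as err:
    print("rejected:", err)
```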

PHP Extending class vs. passing objects as parameter - performance issue

I need to build a class for generating (and submitting, validating etc) web forms. I also have a class for generating raw HTML by writing PHP code.
Now, when designing the class that builds web forms, I need a lot of the functionality provided by the HTML generator class. I can achieve this in two ways:
1. Make the form class extend the HTML class: class WebForm extends HTML {}
2. I've already created an object of the HTML class (say, $html) for other purposes in my project, so I can pass it as a parameter to the WebForm constructor:
class WebForm {
    public $html;

    public function __construct($html) {
        $this->html = $html;
    }
}

$html = new HTML(); // already created for some other purpose
$webform = new WebForm($html);
Which method would be faster & why?

It almost certainly makes a negligible difference, unless you are making millions of calls to HTML methods.
If I had to guess (and this really is just a guess), then I'd say that Option #1 could potentially be faster (and negligibly so). The rationale being that there's one less level of indirection required to invoke HTML methods. But the only way to confirm this theory is to profile each option in turn.
Note also that you should be designing your classes for clarity first, performance second (and only once you've confirmed it's an issue). For example, you should ask yourself whether it makes sense to have WebForm extend HTML; is it meaningful to say that a WebForm is an HTML?

Speaking in C terms, the inheritance approach would be faster by one pointer dereference operation than the aggregation approach (which needs the additional $this->html dereference). In PHP it should be comparable.
However, the cost of one pointer dereference is practically negligible. You should not base your architecture on this purely theoretical difference because it's never worth it. Even if you are already serving hundreds of requests per second and you need extra performance, this is not the way to go. This is a classic case where aggregation is to be preferred over inheritance.
Finally, consider giving the Html class only static methods. That's perfectly reasonable for a helper class, since it doesn't really need to maintain any state. If for some reason you feel it needs some state, have the callers maintain it and pass it to Html as function parameters. This will not only improve your design, but also remove the additional pointer dereference mentioned above. As a further bonus, it completely eliminates the overhead of instantiating Html, even if you need to instantiate multiple WebForm objects over the lifetime of a single request (both approaches you are considering have this drawback).
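The static-helper suggestion, sketched in Python with hypothetical method names: Html exposes only static methods, and the one piece of state (an indentation level) is owned by WebForm and passed in as a parameter, so no Html instance is ever created.

```python
class Html:
    """Stateless helper: every method is static, nothing to instantiate."""

    @staticmethod
    def tag(name, content=""):
        return "<%s>%s</%s>" % (name, content, name)

    @staticmethod
    def indented(lines, level):
        # The indent level is caller-owned state, passed in explicitly.
        pad = "  " * level
        return "\n".join(pad + line for line in lines)


class WebForm:
    def __init__(self):
        self.level = 1  # state lives in the caller, not in Html
        self.rows = []

    def add_row(self, label):
        self.rows.append(Html.tag("p", label))

    def render(self):
        return "<form>\n" + Html.indented(self.rows, self.level) + "\n</form>"


form = WebForm()
form.add_row("Name")
print(form.render())
```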

Honestly, you won't get an educated answer until you write your application (or a prototype) and start profiling; otherwise you'll get answers based on assumptions.
You should consider more than just speed when structuring your application: decoupling, maintainability, testability and everything related. The speed difference might be negligible, and if you choose one approach over the other for speed alone, you may end up trapped in maintenance hell.
I suggest that you implement both approaches and see for yourself. There are tools to help with profiling (like Xdebug).

Extending is faster, because you only instantiate a single object. However, the difference is not measurable if you do this only once per page.
Composition is better for maintainability.
BTW, you should seriously consider moving all your HTML-generating code to a template engine. Just render templates from your form class and handle only validation in PHP, not HTML generation, or you will end up doing view-related tasks in your controllers.

Resetting Objects vs. Constructing New Objects

Is it considered better practice and/or more efficient to create a 'reset' function for a particular object that clears/defaults all the necessary member variables to allow for further operations, or to simply construct a new object from outside?
I've seen both methods employed a lot, but I can't decide which one is better. Of course, for classes that represent database connections, you'd use a reset method rather than constructing a new object, which would result in needless connecting/disconnecting, but I'm talking more in terms of abstraction classes.
Can anyone give me some real-world examples of when to use each method? In my particular case I'm thinking mostly in terms of ORM or the Model in MVC; for example, when I want to retrieve a bunch of database objects for display and modify them in one operation.

When you re-use the objects, you're using the Object Pool pattern.
One of the main issues to consider is how much state these objects have, and how much of that state needs to be reset for the next user. With a database connection, you don't want to have to connect again; otherwise you might as well just create a new one. The idea is to leave the object connected but clear any results.
Reasons not to use object pool:
Complexity of the pool
Memory cost of having these objects instantiated when not required. This might even slow down garbage collection.
Establishing exactly what state needs to be reset
Reasons to use an object pool:
It takes too long to create or destroy an object
Further details can be found in a paper by Kircher and Jain.
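A minimal Python sketch of the object pool pattern described above; QueryResult and Pool are hypothetical names. The reset() method is where the "establishing exactly what state needs to be reset" cost shows up: every field a previous user could have changed must be cleared before the object is handed out again.

```python
class QueryResult:
    """Hypothetical reusable object whose state must be cleared between uses."""

    def __init__(self):
        self.rows = []
        self.error = None

    def reset(self):
        # Every mutable field has to be listed here, or state leaks.
        self.rows.clear()
        self.error = None


class Pool:
    def __init__(self, factory):
        self._factory = factory
        self._free = []

    def acquire(self):
        # Reuse an idle object if one exists, otherwise build a new one.
        return self._free.pop() if self._free else self._factory()

    def release(self, obj):
        obj.reset()
        self._free.append(obj)


pool = Pool(QueryResult)
first = pool.acquire()
first.rows.append({"id": 1})
pool.release(first)
second = pool.acquire()
print(second is first, second.rows)
```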

Resetting is done for performance reasons. The default approach is to create a new object when you need one, not to recycle an existing one. If you aren't worried about your PHP code being slow, just create new objects. If you are worried about it, stop and consider the other things you rely on, which are much slower.

I can't help but get the idea I'm doing it all wrong (Python, again)

All of the questions that I've asked recently about Python have been for this project. I have realised that the reason I'm asking so many questions may not be because I'm so new to Python (but I know a good bit of PHP) and is probably not because Python has some inherent flaw.
Thus I will now say what the project is and what my current idea is, and you can tell me that I'm doing it all wrong, that there are a few things I need to learn, that Python is simply not suited to this type of project and language XYZ would be better in this instance, or even that there's some open source project I might want to get involved in.
The project
I run a free turn-based strategy game (think the campaign mode from the Total War series, but with even more complexity and depth) and am creating a combat simulator for it (again, think Total War as an idea of how it'd work). I'm in no way deluded enough to think that I'll ever make anything as good as the Total War games alone, but I do think that I can automate a process that I currently do by hand.
What will it do
It will have to take into account a large range of variables for the units, equipment, training, weather, terrain and so on and so forth. I'm aware it's a big task and I plan to do it a piece at a time in my free time. I've zero budget but it's a hobby that I'm prepared to put time into (and have already).
My current stumbling block
In PHP everything can access everything else; "wrong" though some might consider this, it's really handy here. If I have an array of equipment for use by the units, I can get hold of that array from anywhere. With Python I have to remake that array each time I import the relevant data file, and this seems quite a silly solution for a language that, in my experience, is well thought out. I've put in a system for logging function calls and class creation (because I know from a very basic version of this that I once did in PHP that it'll help a lot down the line). The way I've kept the data in one place is to pass each of my classes an instance of my logging list. It smells like a hack to me, but it's the only way I've gotten it to work.
Thus I conclude I'm missing something and would very much appreciate the insight of anybody willing to give it. Thank you.
Code samples
This creates a list of formations. So far there's only one value (besides the name), but I anticipate adding more, which is why they're a list of class instances rather than just a standard list. This is found within data.py:
formations = []
formationsHash = []


def createFormations(logger):
    """This creates all the formations that will be used"""
    # Standard close quarter formation, maximum number of people per square metre
    formationsHash.append('Tight')
    formations.append(Formation(logger, 'Tight', tightness=1))
    # Standard ranged combat formation, good people per square metre but not too cramped
    formationsHash.append('Loose')
    formations.append(Formation(logger, 'Loose', tightness=0.5))
    # Standard skirmishing formation, very good for moving around terrain and avoiding missile fire
    formationsHash.append('Skirmish')
    formations.append(Formation(logger, 'Skirmish', tightness=0.1))
    # Very inflexible but good for charges
    formationsHash.append('Arrowhead')
    formations.append(Formation(logger, 'Arrowhead', tightness=1))


def getFormation(searchFor):
    """Returns the formation object with this name"""
    indexValue = formationsHash.index(searchFor)
    return formations[indexValue]
I don't have a code sample of where I'd need to access it, because I haven't gotten that far yet, but I anticipate the code looking something like the following:
Python
tempFormation = data.getFormation(unit.formationType)
tempTerrain = data.getTerrain(unit.currentTerrain)
unit.attackDamage = unit.attackDamage * tempTerrain.tighnessBonus(tempFormation.tightness)
The unit contains an integer that links to the index/key of the relevant terrain, formation and so on in the master list. Temporary variables are used to make the third line shorter, but in the long run they may cause issues if I forget to fetch one and it uses a stale value from earlier (that's where the logging comes in handy).
PHP
$unit->attackDamage *= $terrain[$unit->currentTerrain]->tighnessBonus($unit->currentTerrain);
The unit class contains the index (probably a string) of the relevant terrain it's on and the formation it's in.
Maybe this will reveal some massive flaw in my understanding of Python (6 months of Python vs 3 years of PHP).

"With Python I have to remake that array each time I import the relevant data file."
You're missing a subtle point of Python semantics here. When you import a module for a second time, you aren't re-executing the code in that module. The name is found in a list of all modules imported, and the same module is returned to you. So the second time you import your module, you'll get a reference to the same list (in Python, don't say array, say list).
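This behaviour is easy to check. The sketch below writes a throwaway module (the name data_demo is made up) to a temporary directory and imports it twice with importlib.import_module, which uses the same sys.modules cache as a plain import statement. The second import returns the same module object, so the list defined in it is not rebuilt.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a throwaway module (hypothetical name data_demo) to a temp directory.
tmp_dir = Path(tempfile.mkdtemp())
(tmp_dir / "data_demo.py").write_text("formations = ['Tight', 'Loose']\n")
sys.path.insert(0, str(tmp_dir))
importlib.invalidate_caches()

first = importlib.import_module("data_demo")   # executes the module body
second = importlib.import_module("data_demo")  # returned from sys.modules

print(first is second)
print(first.formations is second.formations)  # the list was not rebuilt
```

Only an explicit importlib.reload() re-executes a module's body.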
You'll probably need to post specific code samples to get more help. It seems there are a few Python misconceptions mixed into this, and once those are cleared up you'll have a simpler time.

I have narrowed your issue down to:
"With Python I have to remake that array each time I import the relevant data file."
Well, you really have two choices. The first and easiest is to keep the structure in memory. That way (just like in PHP) you can in theory access it from "anywhere"; you are slightly limited by namespacing, but that is for your own good. In practice it means "anywhere you would like to".
The second choice is to have some data abstraction (like a database, or a data file as you have) that stores the data and from which you retrieve it. This may be better than the first choice if you have far too much data to fit in memory at once. Again, the means of getting this data will be available "anywhere", just like in PHP.
You can either pass these things directly to instances in an explicit way, or you can use module-level globals and import them into places where you need them, as you go on to say:
"and the way that I've kept the data in one place is to pass each of my classes an instance to my logging list"
I can assure you that this is not a hack. It's quite reasonable, depending on the use; e.g. a config object can be used in the same way, since you may want to test your application with several different configs at the same time. Logging might be better suited to a module-level global that is just imported and called, as you probably only ever want one way of logging, but again, it depends on your requirements.
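Both approaches can coexist. In this Python sketch, Config, Unit and the "combat" logger name are hypothetical. logging.getLogger() returns the same logger object for the same name, so it already behaves like a module-level global, while the config is passed explicitly so that tests can construct units with different settings side by side.

```python
import logging

# Module-level logger: every module asking for "combat" gets this same object.
log = logging.getLogger("combat")


class Config:
    """Hypothetical settings object, passed explicitly so tests can vary it."""

    def __init__(self, weather_penalty):
        self.weather_penalty = weather_penalty


class Unit:
    def __init__(self, attack, config):
        self.attack = attack
        self.config = config  # explicit dependency, not a global

    def effective_attack(self):
        value = self.attack * (1 - self.config.weather_penalty)
        log.debug("effective attack %.2f", value)
        return value


print(logging.getLogger("combat") is log)
print(Unit(10, Config(0.25)).effective_attack())
print(Unit(10, Config(0.0)).effective_attack())
```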
I guess to sum up, you really are on the right track. Try not to give in to that "hackish" smell especially when using languages you are not altogether familiar with. A hack in one language might be the gold-standard in another. And of course, best of luck with your project - it sounds fun.

Please don't reinvent the wheel. Your formationsHash, a list of keys kept in parallel with formations, isn't helpful; it duplicates the features of a dictionary.
def createFormations(logger):
    """This creates all the formations that will be used"""
    formations = {}
    formations['Tight'] = Formation(logger, 'Tight', tightness=1)
    formations['Loose'] = Formation(logger, 'Loose', tightness=0.5)
    formations['Skirmish'] = Formation(logger, 'Skirmish', tightness=0.1)
    formations['Arrowhead'] = Formation(logger, 'Arrowhead', tightness=1)
    return formations
Note that you don't actually need getFormation, since it gains you nothing. You can simply use something like this:
formations = createFormations(whatever)
f = formations[name]

"data.py creates an array (well, list), to use this list from another file I need to import data.py and remake said list."
I can't figure out what you're talking about. Seriously.
Here's a main program, which imports the data, and another module.
SomeMainProgram.py
import data
import someOtherModule
print data.formations['Arrowhead']
someOtherModule.function()

someOtherModule.py
import data

def function():
    print data.formations['Tight']

data.py
import theLoggerThing

class Formation( object ):
    pass  # details omitted.

def createFormations( logger ):
    pass  # details omitted

formations= createFormations( theLoggerThing.logger )
So the main program works like this:
1. import data. The data module is imported.
   a. import theLoggerThing. Whatever this is.
   b. class Formation( object ):. Create a class Formation.
   c. def createFormations( logger ):. Create a function createFormations.
   d. formations =. Create an object, formations.
2. import someOtherModule. The someOtherModule module is imported.
   a. import data. Nothing happens: data has already been loaded, so the existing module is reused. This is a reference to what is, effectively, a Singleton. All Python modules are Singletons.
   b. def function. Create a function named function.
3. print data.formations['Arrowhead']. Evaluate data.formations, which is a dictionary object. Look up 'Arrowhead' in the dictionary, which does some magical lookup and returns the object found there (an instance of Formation).
4. someOtherModule.function(). What happens during this function evaluation:
   a. print data.formations['Tight']. Evaluate data.formations, which is a dictionary object. Look up 'Tight' in the dictionary, which does some magical lookup and returns the object found there (an instance of Formation).
