I am trying to figure out what should be considered better for performance:
I have a bunch of objects that contain a lot of page-data.
A few examples of the data that an object can have:
filepath of PHP-file for includes
CSS filepath
JavaScript filepath
Meta data of the page
The object is specific for each type of content. I have an interface that defines the render-function. Each object implements this function differently.
Example:
class PhpFragment extends FragmentBase {
public function render() {
//... render output for this type of data
}
}
I am currently using a parent object whose variables can each hold multiple objects of the type mentioned above. The object looks something like this:
class pageData {
protected $CSS;
protected $PHP;
protected $JS;
protected $Meta;
// etc...
public function getCSS() {
return $this->CSS;
}
public function getPHP() {
return $this->PHP;
}
public function getJS() {
return $this->JS;
}
}
Whenever I load in a page, I walk through a template and render the data of each object that matches the tag in the template.
For example: if a template has a line where CSS is needed, I call the getCSS function of the pageData, which returns an array of objects. For each of these objects I call the render function and add the data to the page.
What do I want?
I want to get rid of these fixed variables in the pageData object to be able to use my design as dynamically as possible. I want the pageData object to disappear and just have an array of different fragment-objects.
To achieve this, I need to replace the get-functions in the pageData with something clever?
My top priority is performance, so I thought I'd loop through all the objects once to collect the different types and use each type as a key in an array. The value for each key would then be a subarray containing the keys of the objects that match that type.
What I was wondering, before I start changing the design entirely, is this faster?
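For reference, the indexing idea described above could look roughly like this. It is only a sketch: the Fragment interface, the getType() method, the two fragment classes and the helper functions are made-up names for illustration, not part of the original code.

```php
<?php
// Hypothetical stand-ins for the fragment classes described in the question.
interface Fragment
{
    public function getType(): string;
    public function render(): string;
}

final class CssFragment implements Fragment
{
    private $path;

    public function __construct(string $path)
    {
        $this->path = $path;
    }

    public function getType(): string
    {
        return 'css';
    }

    public function render(): string
    {
        return '<link rel="stylesheet" href="' . htmlspecialchars($this->path) . '">';
    }
}

final class JsFragment implements Fragment
{
    private $path;

    public function __construct(string $path)
    {
        $this->path = $path;
    }

    public function getType(): string
    {
        return 'js';
    }

    public function render(): string
    {
        return '<script src="' . htmlspecialchars($this->path) . '"></script>';
    }
}

// Walk the fragments once and group their array keys by type.
function indexByType(array $fragments): array
{
    $index = [];
    foreach ($fragments as $key => $fragment) {
        $index[$fragment->getType()][] = $key;
    }
    return $index;
}

// Render every fragment of one type; unknown types render nothing.
function renderType(array $fragments, array $index, string $type): string
{
    $out = '';
    foreach ($index[$type] ?? [] as $key) {
        $out .= $fragments[$key]->render() . "\n";
    }
    return $out;
}

$fragments = [
    new CssFragment('/css/site.css'),
    new JsFragment('/js/app.js'),
    new CssFragment('/css/print.css'),
];
$index = indexByType($fragments);
echo renderType($fragments, $index, 'css');
```

The index is built in one pass, so each template tag afterwards costs one array lookup plus the render calls for that type.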
I don't know if this is the right place to ask this question (it's more a code-review question IMO). Anyway, here's a couple of thoughts I'd consider if I were you:
What are objects
Objects are units of functionality, or entities that represent a specific set of values. DTOs (like your pageData class) serve one purpose: to group and represent a set of values that belong together. The fact that a class has a type (type-hints) and an interface makes a code-base testable, easier to understand, maintain, debug, and document.
At first glance, a simple DTO isn't too different from a simple array, and yes, objects have a marginal performance cost.
The question you need to ask is whether or not you want to shave off those 1 or 2 ms per request at the cost of increased development time and code that is less testable, more error prone, and harder to maintain. I'd argue that for this reason alone, DTOs make more sense than arrays.
pre-declared properties are fast
If you want an object that is as dynamic as possible, then PHP offers you the possibility to add properties to instances on the fly:
Class Foo{}
$x = new Foo;
$x->bar = 'new property';
echo $x->bar;//echoes new property
So in essence, objects are just as flexible as arrays. However, properties that weren't declared beforehand are (again marginally) slower than predeclared properties.
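When a class definition declares 3 properties, the engine knows about them up front: the class keeps a hash table that maps each name to a slot, and since PHP 5.4 the values themselves sit in a fixed per-object slot table. When accessing a member of an instance, the declared properties are checked first, and that lookup is O(1). Any "dynamic" property is stored in a second, per-object hash table that is created on demand. Lookups on this fallback hash table are still O(1) on average rather than O(n), but they add a second lookup and extra memory. Not terrible, but worse than they need be.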
In addition to dynamic properties being less performant, they're also always public, so you have no control over their values (they can be reassigned elsewhere), and they are, of course, susceptible to human error (typos):
$x = new Foo;
$x->foo = 'Set the value of foo';
echo $x->fo;//typo...
Getters and setters are good
The methods you have now don't do anything, true enough, but consider this:
class User
{
protected $email;
public function setEmail($email)
{
if (!filter_var($email, FILTER_VALIDATE_EMAIL)) {
throw new \InvalidArgumentException('Invalid email');
}
$this->email = $email;
return $this;
}
}
A setter like this not only allows me to control/check when and where a property is set, but also to validate the data that someone is trying to assign to it. You can validate the data. You can ensure that, no matter what, if you receive an instance of User, the email will either be null, or a valid email address.
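To make that guarantee concrete, here is a small usage sketch. The User class is repeated so the snippet runs on its own; the getEmail() method and the sample addresses are additions for the example only.

```php
<?php
class User
{
    protected $email;

    public function setEmail($email)
    {
        if (!filter_var($email, FILTER_VALIDATE_EMAIL)) {
            throw new \InvalidArgumentException('Invalid email');
        }
        $this->email = $email;
        return $this;
    }

    // Hypothetical getter, added for the example only.
    public function getEmail()
    {
        return $this->email;
    }
}

$user = new User();
$user->setEmail('alice@example.com');

$rejected = false;
try {
    // The invalid value never reaches the property.
    $user->setEmail('not-an-email');
} catch (\InvalidArgumentException $e) {
    $rejected = true;
}

echo $user->getEmail(), PHP_EOL; // still the last valid address
```

With a plain array, the second assignment would have silently overwritten the valid address.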
There are many more reasons why objects make more sense than arrays, but these alone, to me at least, outweigh the benefits of 2ms/req performance gain.
If performance is such an issue, why not write in a faster language?
If all you're after is performance, you might want to look into languages that outperform PHP to begin with. Don't get me wrong: I honestly like PHP, but it's just a fact that, for example, Go can do the same thing, only faster.
Pass by value, copy-on-write, and (almost) pass by reference
Arrays essentially behave like scalar values: pass an array to a function, and any changes made to that array inside the function don't change the array you passed to it. Objects are (sort of) passed by reference. That's to say: objects are passed by identifier.
Say you have an instance of Foo. The Zend engine will assign a unique ID to that instance (eg 123). When you call a function and pass that instance, internally, you'll pass the identifier of that object to the method not the object itself.
This has several implications: When changing the state of the instance, PHP doesn't have to make a copy of the object: it just uses the ID to get the zval (internal representation of a PHP variable), and operates on the same piece of memory. The net result: you're passing a simple value (an int), and whatever happens to the object, wherever it happens, the state is shared throughout.
Arrays are different: Passing an array is (sort-of) passing a copy of that value. In reality, PHP is clever enough to pass a reference to the existing array, but once you start reassigning values, PHP does have to create a copy. This is the copy-on-write mechanism. Put simply, the idea is: do not create needless copies of values, unless you have to:
function foo(array $data)
{
$x = $data[0];//read, no copy of argument is required
$data[1] = $x * $data[3];//now, we're altering the argument, a copy is created
}
$data = [1, 2, 3, 4];
foo($data);//passes reference
Depending on how you use the arrays or objects you pass to functions, one might perform better than the other. On the whole: passing an array that you'll only use to read values will most likely outperform passing an object. However, if you start operating on the array/object, an object might turn out to outperform arrays...
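The difference in semantics is easy to see side by side. This is a minimal sketch; the Counter class and the two bump functions are hypothetical names, and the void return type assumes PHP 7.1 or newer.

```php
<?php
final class Counter
{
    public $value = 0;
}

function bumpArray(array $data): array
{
    $data['value']++; // write: PHP separates a private copy at this point
    return $data;
}

function bumpObject(Counter $counter): void
{
    $counter->value++; // same object as the caller's, no copy involved
}

$array = ['value' => 0];
$changed = bumpArray($array);

$object = new Counter();
bumpObject($object);

var_dump($array['value']);   // the caller's array is untouched
var_dump($changed['value']); // only the returned copy was changed
var_dump($object->value);    // the caller's object was changed
```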
TL;DR
Yes, arrays are generally faster than objects. But they're less safe, pretty much impossible to test, harder to maintain and non-communicative (public function doStuff(array $data) doesn't tell me as much as public function doStuff(User $data)).
Owing to the copy-on-write and the way instances are passed to functions, it's impossible to say which will be faster with absolute certainty. It really boils down to what you do: is the array fairly small, and are you only reading its values, then it's probably going to be faster than objects.
The moment you start operating on the data, it's entirely possible objects might prove to be faster.
I can't just leave it there without at least mentioning that old mantra:
Premature optimization is the root of all evil
Switching from objects to arrays for performance sake does smell of micro-optimization. If you have in fact reached the point that there's nothing else to optimize but these kinds of trivial things, then the project is either a small one; or you're the first person to actually work on a big project and actually finish it. In all other cases, you shouldn't really be wasting time on this kind of optimization.
Things that are far more important to profile, and then optimize are:
Caching (opcache, memcache, ...)
Disk IO (including files, autoloader mechanisms)
Resource management: open file pointers, DB connections (when to connect, when to close connections)
If you're using a traditional SQL DB: queries... The vast majority of PHP applications can benefit a lot by having a DBA look at the queries and actually optimize those
Server setup
...
Only if you've gone through this list, and more, could you perhaps consider thinking about some micro optimization. That is of course, if by then you haven't encountered any bugs...
So if I need to pass data around the application internally, is there a nice, predictable way to pass it around?
I want to be able to look at a function and know what's in the collection that is getting passed into the function.
An array is an unpredictable data structure, so you never know what you're getting.
I feel like it's better to have an object defined with named properties and pass that around the application instead of an array, because that way there's SOME sort of a definition available.
The issue with that is that there will just be a bunch of objects accumulating in some folder somewhere.
Was wondering anyone's opinion on this, and any other alternatives?
You might be having a larger (although - hidden) problem. A structure in your code should not cross more than one layer boundary.
If some structure (it doesn't matter whether it is an array or an object) is crossing two layers, then, unless it is an intentional compromise, it is a sign that there might be "architectural flaws" indicating fragility of your codebase. Such cross-cutting data structures become the fault lines across which your codebase exhibits non-trivial bugs.
If you have structures that cross 3 or more layer boundaries, your codebase is fucked. It becomes one of those "kill it with fire, before it lays eggs" projects.
The solution that I use is this:
Instead of having dedicated "data structures" being passed around, focus your business logic around domain objects. You create a domain object in the layer where you will actually be using its logic-related behaviour, and you inject it into or return it to another layer only to affect it.
Kinda like this:
public function loginWithPassword(Entity\EmailIdentity $identity, string $password): Entity\CookieIdentity
{
if ($identity->matchPassword($password) === false) {
throw new PasswordMismatch;
}
$identity->setPassword($password);
$this->updateEmailIdentityOnUse($identity);
$cookie = $this->createCookieIdentity($identity);
return $cookie;
}
What is being passed in and out of this method is not some "data structure", but a fully formed logical entity, which contains specific, business-related behaviour.
I have a NoteEntity class that is meant to represent a row in my notes table. A "(data)entity" in my framework strictly means a stored/storable data entry - a database row, if you will. Creating such an object is as straightforward as initializing it with an array returned by mysqli_fetch_array(), which requires me to match the object properties to the column names of my table.
The constructor inherited from the parent DataEntity class:
PHP code
public function __construct($row_array)
{
foreach ($row_array as $column => $value)
{
if (property_exists($this, $column))
{
$this->$column = $value;
}
else
{
trigger_error(get_class($this) . " has no attribute called '$column'.", E_USER_NOTICE);
}
}
}
As you can see, all it does is mapping the corresponding columns to their corresponding object properties.
This is fine for me, since the NoteEntity class is defined only once, and thus it is easy to change its internal workings if the table columns would ever change. But this also requires me to use a getter method for each and every property, if I don't want my entire code to depend on the given table's column names.
Question goes as: is it good practice to have a getter for every property, or should I investigate another approach? I'm asking this from a performance perspective, but I'd wish to keep my code as maintainable as possible, as well.
The reason I'm concerned about performance is that if it gets a bit busier, getting the properties quickly becomes:
PHP code
foreach ($notes as $note_entity)
{
$template->Process(array(
'note_name' => $note_entity->GetName(),
'note_ext' => pathinfo($note_entity->GetFilename(), PATHINFO_EXTENSION), // array_pop(explode(...)) would pass a non-variable by reference
'subject_id' => $note_entity->GetSubjectID(),
'subject_name' => $note_entity->GetSubject(),
'institute_id' => $note_entity->GetInstituteID(),
'institute_nick' => $note_entity->GetInstitute(),
// ...
));
}
... which might be fine for a few dozen notes, but their count is anticipated to reach the thousands per request, which adds significant function call overhead. The present version is extremely convenient to use, because in no part of the code do you need to keep the column names in mind.
Possible solutions that I came up with include an approach that returns every, or a subset of every property in an associative array. This has the following minor downsides:
returning only a subset of data creates implicit logical dependencies between NoteEntity and where it is used, if the wanted subset is varying between calling locations,
returning all properties creates an unnecessary flow of data that will never be used,
updating the column configuration increases the cost of maintenance, since we need to update the structure of the returned array as well.
I think this is more of a design preference, but there are many advantages of using getters/setters.
Encapsulation and hiding internals are typically good practice. Interoperability is another reason why using getters and setters is a good idea (ie. mocking becomes much easier).
Whether you want to do this for primitives or not is debatable, but typically you don't want to update properties by referencing them directly, and especially not externally. So having getters/setters is a good way to insulate them.
Advantages that you get are much greater than the performance that you're about to lose.
Currently, I am creating a script that will parse information out of a certain type of report that is passed to it. There is a part of the script that will pull a student's information from the report and save it for later processing.
Is it best to hold it in the form of a class, or in an array variable? I figure there are 3 methods that I could use.
EDIT: Really, the question comes down to does any one way have any sort of performance advantage? Because otherwise, each method is really the same as the last.
As a class with direct access to variables:
class StudentInformation {
public $studentID;
public $firstName;
public $lastName;
public $middleName;
public $programYear;
public $timeGenerated;
public function __construct() {} // PHP 4-style same-name constructors no longer work in PHP 8
}
As a class with functions:
class StudentInformation {
private $studentID;
private $firstName;
private $lastName;
private $middleName;
private $programYear;
private $timeGenerated;
public function __construct() {} // PHP 4-style same-name constructors no longer work in PHP 8
public function setStudentID($id)
{
$this->studentID = $id;
}
public function getStudentID()
{
return $this->studentID;
}
public function setFirstName($fn)
{
$this->firstName = $fn;
}
/* etc, etc, etc */
}
Or as an array with strings as keys:
$studentInfo = array();
$studentInfo["idnumber"] = $whatever;
$studentInfo["firstname"] = $whatever;
$studentInfo["lastname"] = $whatever;
/* etc, etc, etc */
Trying to optimize the use of an array vs. a simple value object is probably an unnecessary micro-optimization. For the simplest cases, an array is faster because you don't have the overhead of constructing a new object.
It's important to remember this: array is NOT the only data structure that exists. If you don't need the hash capabilities, a simple SplFixedArray will result in lower memory overhead and faster iteration once you get past the initial overhead of object creation. If you're storing a large amount of data the aforementioned fixed array or one of the other SPL data structures are likely a better option.
Finally: value objects should be immutable, so in your case I would strongly recommend the encapsulation afforded by an object over the ability to assign hash map values willy-nilly. If you want the simplicity of using the array notation, have your class implement ArrayAccess and get the best of both worlds. Some would suggest magic getters and setters with __get and __set. I would not, as magic generally obfuscates your code unnecessarily. If you really really need magic, you might reconsider your design.
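As a rough sketch of that combination, here is an immutable value object that still allows array-style reads. The StudentInfo class, its field names and the choice of exceptions are assumptions for the example, and the signatures assume PHP 8.0 or newer (for the mixed type).

```php
<?php
final class StudentInfo implements ArrayAccess
{
    private array $data;

    public function __construct(string $studentId, string $firstName, string $lastName)
    {
        $this->data = [
            'studentId' => $studentId,
            'firstName' => $firstName,
            'lastName'  => $lastName,
        ];
    }

    public function offsetExists(mixed $offset): bool
    {
        return isset($this->data[$offset]);
    }

    public function offsetGet(mixed $offset): mixed
    {
        if (!array_key_exists($offset, $this->data)) {
            throw new OutOfBoundsException("Unknown field '$offset'");
        }
        return $this->data[$offset];
    }

    // Writes are rejected so the value object stays immutable.
    public function offsetSet(mixed $offset, mixed $value): void
    {
        throw new LogicException('StudentInfo is immutable');
    }

    public function offsetUnset(mixed $offset): void
    {
        throw new LogicException('StudentInfo is immutable');
    }
}

$student = new StudentInfo('S-1', 'Ada', 'Lovelace');
echo $student['firstName'], PHP_EOL; // array notation, object guarantees

$writeBlocked = false;
try {
    $student['firstName'] = 'Grace';
} catch (LogicException $e) {
    $writeBlocked = true;
}
```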
There's a reason why the OOP paradigm is recognized as the best programming paradigm that we've come up with -- because it's the best paradigm we've come up with. You should use it. Avoid falling into the trap of many/most PHP devs who use arrays for everything.
You could create a class for a single student, with appropriate operations for a single student, like updating grades or getting a full name (private data members with functions for access). Then create another class for multiple students that contains an array of single students. You can create functions that operate on sets of students, testing and calling the individual student functions as needed.
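A bare-bones version of that split might look like the following. The class names, the grade handling and the namesWithAverageAtLeast() method are invented for illustration; the nullable and void return types assume PHP 7.1 or newer.

```php
<?php
// One student, with operations that only concern that student.
final class Student
{
    private $firstName;
    private $lastName;
    private $grades = [];

    public function __construct(string $firstName, string $lastName)
    {
        $this->firstName = $firstName;
        $this->lastName = $lastName;
    }

    public function getFullName(): string
    {
        return $this->firstName . ' ' . $this->lastName;
    }

    public function addGrade(float $grade): void
    {
        $this->grades[] = $grade;
    }

    // Returns null when no grades have been recorded yet.
    public function getAverage(): ?float
    {
        return $this->grades ? array_sum($this->grades) / count($this->grades) : null;
    }
}

// A set of students, with operations that work across the set.
final class StudentGroup
{
    private $students = [];

    public function add(Student $student): void
    {
        $this->students[] = $student;
    }

    public function namesWithAverageAtLeast(float $minimum): array
    {
        $names = [];
        foreach ($this->students as $student) {
            $average = $student->getAverage();
            if ($average !== null && $average >= $minimum) {
                $names[] = $student->getFullName();
            }
        }
        return $names;
    }
}

$ada = new Student('Ada', 'Lovelace');
$ada->addGrade(90);
$ada->addGrade(80);

$bob = new Student('Bob', 'Smith');
$bob->addGrade(60);

$group = new StudentGroup();
$group->add($ada);
$group->add($bob);

print_r($group->namesWithAverageAtLeast(70));
```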
The high level answer is that you can do this any of the ways you suggest, but most of us will recommend the OOP solution. If your needs are very simple, a simple array may suffice. If your needs change, you may have to re-code the whole thing for objects anyway. Classes can be kept simple too, so I suggest you start with classes and add complexity as needed. I believe that long term, it will maintain and scale better built with classes.
Regarding your performance question, classes are probably faster than arrays, since different instances are stored independently. By putting all your stuff in one giant hash-map (associative array), you are also getting some of the limitations/properties of arrays. For example, ordering. You don't need that. Also, if the PHP interpreter isn't being smart, it will hash your lookup strings each time you lookup. Using classes and static typing that wouldn't be necessary.
The 3 options are valid but with a different degree of encapsulation/protection from the outside world.
from lowest protection to highest :
array
object with direct access to public properties
object with getter/setters
The choice highly depends on the environment of your project (2 hours ? sent to the bin tomorrow ?)
choice 2 seems pragmatic.
take into account that depending on your database wrapper, the data could be fetched into array or objects. If it is fetched as an array, you may have to map those to objects.
This is a subjective question with no definitive answer but I would recommend the OOP way. You could create a class Parser holding an instance of StudentInformation.
It's way more comfortable and you can add more methods if you need some additional processing.
I would go with the class, with the properties being private.
Any operations pertaining to the student information could/should be created in that class.
If this were a one-time thing, I would go for the array. But really, I know what it means to have something that is going to be used one time only, then find out that I need to execute n operations on the data and end up having to go for a class.
Arrays are better if you're only coding a small project without many functions. However, classes have their advantages. For example, would you rather type
$class->db_update("string", $database);
or
$query = "SELECT * FROM `table` WHERE foo='".$array['bar']."'";
mysql_connect("...
mysql_query($query...
Basically, each side has its advantages. I'd recommend the OOP way as most other people here should and would.
EDIT: Also, take a look at this.
I do a lot of work in WordPress, and I've noticed that far more functions return objects than arrays. Database results are returned as objects unless you specifically ask for an array. Errors are returned as objects. Outside of WordPress, most APIs give you an object instead of an array.
My question is, why do they use objects instead of arrays? For the most part it doesn't matter too much, but in some cases I find objects harder to not only process but to wrap my head around. Is there a performance reason for using an object?
I'm a self-taught PHP programmer. I've got a liberal arts degree. So forgive me if I'm missing a fundamental aspect of computer science. ;)
These are the reasons why I prefer objects in general:
Objects not only contain data but also functionality.
Objects have (in most cases) a predefined structure. This is very useful for API design. Furthermore, you can set properties as public, protected, or private.
Objects better fit object-oriented development.
In most IDEs, auto-completion only works for objects.
Here is something to read:
Object Vs. Array in PHP
PHP stdClass: Storing Data in an Object Instead of an Array
When should I use stdClass and when should I use an array in php5 oo code
PHP Objects vs Arrays
Mysql results in PHP - arrays or objects?
PHP objects vs arrays performance myth
A Set of Objects in PHP: Arrays vs. SplObjectStorage
Better Object-Oriented Arrays
This probably isn't something you are going to deeply understand until you have worked on a large software project for several years. Many fresh computer science majors will give you an answer with all the right words (encapsulation, functionality with data, and maintainability) but few will really understand why all that stuff is good to have.
Let's run through a few examples.
If arrays were returned, then either all of the values need to be computed up front or lots of little values need to be returned with which you can build the more complex values from.
Think about an API method that returns a list of WordPress posts. These posts all have authors, authors have names, e-mail address, maybe even profiles with their biographies.
If you are returning all of the posts in an array, you'll either have to limit yourself to returning an array of post IDs:
[233, 41, 204, 111]
or returning a massive array that looks something like:
[ title: 'somePost', body: 'blah blah', 'author': ['name': 'billy', 'email': 'bill#bill.com', 'profile': ['interests': ['interest1', 'interest2', ...], 'bio': 'info...']] ]
[id: '2', .....]]
The first case of returning a list of IDs isn't very helpful to you because then you need to make an API call for each ID in order to get some information about that post.
The second case will pull way more information than you need 90% of the time and be doing way more work (especially if any of those fields is very complicated to build).
An object on the other hand can provide you with access to all the information you need, but not have actually pulled that information yet. Determining the values of fields can be done lazily (that is, when the value is needed and not beforehand) when using an object.
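Lazy loading of this kind can be sketched in a few lines. This is not how WordPress implements it; the Post class, the loader callback and the $lookups counter are hypothetical, with the closure standing in for whatever query or API call fetches the author.

```php
<?php
final class Post
{
    private $title;
    private $authorLoader;
    private $author = null;

    public function __construct(string $title, callable $authorLoader)
    {
        $this->title = $title;
        $this->authorLoader = $authorLoader;
    }

    public function getTitle(): string
    {
        return $this->title;
    }

    // The author is fetched on first use and cached afterwards.
    public function getAuthor(): array
    {
        if ($this->author === null) {
            $this->author = ($this->authorLoader)();
        }
        return $this->author;
    }
}

$lookups = 0;
$post = new Post('somePost', function () use (&$lookups) {
    $lookups++; // stands in for an expensive query or API call
    return ['name' => 'billy'];
});

echo $post->getTitle(), PHP_EOL;          // no lookup has happened yet
echo $post->getAuthor()['name'], PHP_EOL; // first access triggers the lookup
echo $post->getAuthor()['name'], PHP_EOL; // second access reuses the cached value
```

Callers that only need the title never pay for the author lookup.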
Arrays expose more data and capabilities than intended
Go back to the example of the massive array being returned. Now someone may likely build an application that iterates over each value inside the post array and prints it. If the API is updated to add just one extra element to that post array then the application code is going to break since it will be printing some new field that it probably shouldn't. If the order of items in the post array returned by the API changes, that will break the application code as well. So returning an array creates all sorts of possible dependencies that an object would not create.
Functionality
An object can hold information inside of it that will allow it to provide useful functionality to you. A post object, for instance, could be smart enough to return the previous or next posts. An array couldn't ever do that for you.
Flexibility
All of the benefits of objects mentioned above help to create a more flexible system.
My question is, why do they use objects instead of arrays?
Probably a few reasons:
WordPress is quite old
arrays are faster and take less memory in most cases
easier to serialize
Is there a performance reason for using an object?
No. But a lot of good other reasons, for example:
you may store logic in the objects (methods, closures, etc.)
you may force object structure using an interface
better autocompletion in IDE
you don't get notices for undefined array keys
in the end, you may easily convert any object to array
OOP != AOP :)
(For example, in Ruby, everything is an object. PHP was procedural/scripting language previously.)
WordPress (and a fair amount of other PHP applications) use objects rather than arrays, for conceptual, rather than technical reasons.
An object (even if just an instance of stdClass) is a representation of one thing. In WordPress that might be a post, a comment, or a user. An array on the other hand is a collection of things. (For example, a list of posts.)
Historically, PHP hasn't had great object support so arrays became quite powerful early on. (For example, the ability to have arbitrary keys rather than just being zero-indexed.) With the object support available in PHP 5, developers now have a choice between using arrays or objects as key-value stores. Personally, I prefer the WordPress approach as I like the syntactic difference between 'entities' and 'collections' that objects and arrays provide.
My question is, why do they (Wordpress) use objects instead of arrays?
That's really a good question and not easy to answer. I can only assume that it's common in Wordpress to use stdClass objects because they're using a database class that by default returns records as a stdClass object. They got used to it (8 years and more) and that's it. I don't think there is much more thought behind the simple fact.
syntactic sugar for associative arrays
-- Zeev Suraski about the standard object since PHP 3
stdClass objects are not really better than arrays. They are pretty much the same. That's for some historical reasons of the language as well as stdClass objects are really limited and actually are only sort of value objects in a very basic sense.
stdClass objects store values for their members like an array does per entry. And that's it.
Only PHP freaks are able to create stdClass objects with private members. There is not much benefit - if any - doing so.
stdClass objects do not have any methods/functions. So no use of that in Wordpress.
Compared with array, there are far less helpful functions to deal with a list or semi-structured data.
However, if you're used to arrays, just cast:
$array = (array) $object;
And you can access the data previously being an object, as an array. Or you like it the other way round:
$object = (object) $array;
Which will only drop invalid member names, like numbers. So take a little care. But I think you get the big picture: There is not much difference as long as it is about arrays and objects of stdClass.
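A quick round trip shows how interchangeable the two are for plain string keys. The property names and values below are made up for the example.

```php
<?php
$row = new stdClass;
$row->title = 'somePost';
$row->status = 'publish';

// Object to array: public properties become string keys.
$asArray = (array) $row;

// Array to object: string keys become public properties of a new stdClass.
$asObject = (object) $asArray;

var_dump($asArray);
var_dump($asObject == $row);  // same class and same property values
var_dump($asObject === $row); // but a different instance
```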
Related:
Converting to object PHP Manual
Reserved Classes PHP Manual
What is stdClass in PHP?
The code looks cooler that way
Objects pass by reference
Objects are more strongly typed than arrays, hence less prone to errors (or they give you a meaningful error message when you try to use a non-existent member)
All the IDEs today have auto-complete, so when working with defined objects, the IDE does a lot for you and speeds up things
Easily encapsulate logic and data in the same box, whereas with arrays you store the data in the array and then use a set of different functions to process it.
Inheritance: if you had a similar array with almost, but not quite, the same functionality, you would have to duplicate more code than if you did it with objects
Probably some more reasons I haven't thought about
Objects are much more powerful than arrays can be.
Each object as an instance of a class can have functions attached.
If you have data that need processing then you need a function that does the processing.
With an array you would have to call that function on that array and therefore associate the logic yourself to the data.
With an object this association is already done and you don't have to care about it any more.
Also you should consider the OO principle of information hiding. Not everything that comes back from or goes to the database should be directly accessible.
There are several reasons to return objects:
Writing $myObject->property requires fewer "overhead" characters than $myArray['element']
Object can return data and functionality; arrays can contain only data.
Enable chaining: $myobject->getData()->parseData()->toXML();
Easier coding: IDE autocompletion can provide method and property hints for object.
In terms of performance, arrays are often faster than objects. In addition to performance, there are several reasons to use arrays:
The functionality provided by the array_*() family of functions can reduce the amount of coding necessary in some cases.
count() and foreach work on arrays out of the box. Objects need to implement Countable before count() accepts them, and foreach over a plain object only walks its accessible properties unless the class implements Iterator or IteratorAggregate.
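For completeness, this is roughly what it takes for an object to support count() and foreach like an array. PostList is a hypothetical class used only for this sketch.

```php
<?php
final class PostList implements Countable, IteratorAggregate
{
    private $titles;

    public function __construct(array $titles)
    {
        $this->titles = array_values($titles);
    }

    // Called by count($list).
    public function count(): int
    {
        return count($this->titles);
    }

    // Called by foreach ($list as ...).
    public function getIterator(): Traversable
    {
        return new ArrayIterator($this->titles);
    }
}

$posts = new PostList(['first', 'second']);

echo count($posts), PHP_EOL;
foreach ($posts as $title) {
    echo $title, PHP_EOL;
}
```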
It's usually not going to be because of performance reasons. Typically, objects cost more than arrays.
For a lot of APIs, it probably has to do with the objects providing other functionality besides being a storage mechanism. Otherwise, it's a matter of preference and there is really no benefit to returning an object vs an array.
An array is just an index of values, whereas an object contains methods which can generate the result for you. Sure, sometimes you can access an object's values directly, but the "right way to do it" is to access an object's methods (functions operating on the values of that object).
$obj = new MyObject;
$obj->getName(); // this calls a method (function), so it can decide what to return based on conditions or other criteria
$array['name']; // this is just the string "name". there is no logic to it.
Sometimes you access an object's variables directly. This is usually frowned upon, but it still happens quite often.
$obj->name; // accessing the string "name" ... not really different from an array in this case.
However, consider that the MyObject class doesn't have a variable called 'name', but instead has a first_name and last_name variable.
$obj->getName(); // this would return first_name and last_name joined.
$obj->name; // would fail...
$obj->first_name;
$obj->last_name; // would be accessing the variables of that object directly.
This is a very simple example, but you can see where this is going. A class provides a collection of variables and the functions which can operate on those variables all within a self-contained logical entity. An instance of that entity is called an object, and it introduces logic and dynamic results, which an array simply doesn't have.
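Spelled out as runnable code, the example above might look like this. The constructor and the sample names are assumptions; only getName(), first_name and last_name come from the description.

```php
<?php
class MyObject
{
    public $first_name;
    public $last_name;

    public function __construct($first_name, $last_name)
    {
        $this->first_name = $first_name;
        $this->last_name = $last_name;
    }

    // There is no "name" variable; the value is computed on demand.
    public function getName()
    {
        return $this->first_name . ' ' . $this->last_name;
    }
}

$obj = new MyObject('Ada', 'Lovelace');
echo $obj->getName(), PHP_EOL;

// The equivalent array can only hold what was stored in it.
$array = ['first_name' => 'Ada', 'last_name' => 'Lovelace'];
$hasName = isset($array['name']);
var_dump($hasName);
```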
Most of the time objects are just as fast as arrays, if not faster; in PHP there isn't a noticeable difference. The main reason is that objects are more powerful than arrays. Object-oriented programming allows you to create objects and store not only data but also functionality in them. For example, in PHP the MySQLi class gives you a database object that you can manipulate using a host of built-in methods, rather than the procedural approach.
So the main reason is that OOP is an excellent paradigm. I wrote an article about why using OOP is a good idea, and explaining the concept, you can take a look here: http://tomsbigbox.com/an-introduction-to-oop/
As a minor plus you also type less to get data from an object - $test->data is better than $test['data'].
I'm unfamiliar with WordPress. A lot of answers here suggest that a strength of objects is their ability to contain functional code. When returning an object from a function/API call it shouldn't contain utility functions. Just properties.
The strength in returning objects is that whatever lies behind the API can change without breaking your code.
Example: You get an array of data with key/value pairs, key representing the DB column. If the DB column gets renamed your code will break.
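A rough sketch of that idea, assuming a made-up column that was renamed from email to email_address. The function name and both column names are hypothetical; the point is only that the mapping lives in one place behind the API.

```php
<?php
// Hypothetical mapper behind the API: turns a raw DB row into a plain
// object with stable property names (properties only, no utility methods).
function rowToUser(array $row)
{
    $user = new stdClass;
    // The column was renamed at some point; callers never need to know.
    $user->email = isset($row['email_address'])
        ? $row['email_address']   // assumed new column name
        : $row['email'];          // assumed old column name
    return $user;
}

$before = rowToUser(array('email' => 'a@example.com'));
$after  = rowToUser(array('email_address' => 'a@example.com'));
```

Code that reads $user->email keeps working across the rename, whereas code that reads $row['email'] directly from the array would break.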
I ran the following test in PHP 5.3.10 (Windows):
for ($i = 0; $i < 1000000; $i++) {
$x = array();
$x['a'] = 'a';
$x['b'] = 'b';
}
and
for ($i = 0; $i < 1000000; $i++) {
$x = new stdClass;
$x->a = 'a';
$x->b = 'b';
}
Copied from http://atomized.org/2009/02/really-damn-slow-a-look-at-php-objects/comment-page-1/#comment-186961
The function was called by 10 concurrent users, 10 times each (to obtain an average), with these results:
Arrays : 100%
Object : 214% to 216% (about 2 times slower).
In other words, objects are still painfully slow. OOP keeps things tidy, but it should be used carefully.
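If you want to repeat the measurement on your own setup, something like the following should do. The timeIt() helper is my own, not part of the original test, and the ratio you get will depend on your PHP version and machine.

```php
<?php
// Hypothetical helper: wall-clock seconds taken by one call of $fn.
function timeIt($fn)
{
    $start = microtime(true);
    call_user_func($fn);
    return microtime(true) - $start;
}

$iterations = 1000000;

$arrayTime = timeIt(function () use ($iterations) {
    for ($i = 0; $i < $iterations; $i++) {
        $x = array();
        $x['a'] = 'a';
        $x['b'] = 'b';
    }
});

$objectTime = timeIt(function () use ($iterations) {
    for ($i = 0; $i < $iterations; $i++) {
        $x = new stdClass;
        $x->a = 'a';
        $x->b = 'b';
    }
});

// Report the object loop relative to the array loop (array = 100%).
printf("array: 100%%, object: %d%%\n", round($objectTime / $arrayTime * 100));
```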
What does WordPress use? Both: it uses objects, arrays, and combinations of the two. The wpdb class uses the last of these (and it is the heart of WordPress).
It follows the boxing and unboxing principle of OOP. While languages such as Java and C# support this natively, PHP does not. It can be accomplished to some degree in PHP, just not elegantly, as the language itself does not have constructs to support it. Having box types in PHP could help with chaining, keeps everything object oriented, and allows for type hinting in method signatures. The downside is overhead and the fact that you now have extra checking to do using the "instanceof" construct. Having a type system is also a plus when using development tools that have IntelliSense or code assist, like PDT. Rather than having to google/bing/yahoo for the method, it exists on the object, and you can use the tool to provide a drop-down.
Although the points made about objects being more than just data are valid, since they are usually data and behaviour, there is at least one pattern in Martin Fowler's "Patterns of Enterprise Application Architecture" that applies to this type of scenario, in which you're transferring data from one system (the application behind the API) to another (your application).
It's the Data Transfer Object: an object that carries data between processes in order to reduce the number of method calls.
So if the question is whether APIs should return a DTO or an array, I would say that if the performance cost is negligible you should choose the more maintainable option, which I would argue is the DTO. Of course, you also have to consider the skills and culture of the team developing your system, and the language or IDE support for each option.
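For what it's worth, a DTO in PHP can be as small as the sketch below. PostDto, its fields and fetchPost() are all made up for illustration; a real API would fill the object from its own data source.

```php
<?php
// Hypothetical DTO: carries data only, no behaviour.
class PostDto
{
    public $id;
    public $title;
    public $author;

    public function __construct($id, $title, $author)
    {
        $this->id = $id;
        $this->title = $title;
        $this->author = $author;
    }
}

// Stand-in for the API call; the values are placeholders.
function fetchPost($id)
{
    return new PostDto($id, 'Hello world', 'jack');
}

$post = fetchPost(42);
```

Compared with a bare array, the class name documents which fields exist, and an IDE can autocomplete $post->title.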
In PHP, what's the best way to track an object's state, to tell whether or not it has been modified? I have a repository object that creates new entities; they are then read/modified by other components and eventually given back to the repository. If the object has changed, I want the repository to save the data back to the persistent storage (DB).
I can think of four options:
Use an internal boolean property called $_modified, which every setter updates when modifications are made (tedious if there are a lot of setters).
Some horrible ugly hack using serialize() and comparing the strings (I'm sure this is a very bad idea, but I thought I'd add it for completeness)
Something like the above but with a hash of the object's properties (not sure if it would work with objects that contain other objects)
Taking clones of the objects as they come out of the repo and comparing what comes in (more complicated than it sounds, because how do you know which clone to compare the object to?)
or...
Some other clever trick I'm not aware of?
Thanks,
Jack
Option 1, using a boolean flag, is the best way in terms of performance as well as general usability and portability.
All the other options incur excessive overhead which is simply not needed in this case.
You might want to look into the Unit of Work pattern, as this is much the problem which it is designed to tackle. As objects are changed, they register themselves, or are passively registered, with the UoW, which is responsible for keeping track of which objects have been changed, and for figuring out what needs to be saved back to the database at the end of play.
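A very reduced sketch of the idea, in which entities register themselves as dirty from their setters. The class and method names (UnitOfWork, registerDirty, commit, Entity) are my own choices, and a real implementation would also track new and deleted objects.

```php
<?php
// Minimal sketch; all names here are hypothetical.
class UnitOfWork
{
    private $dirty = array();

    public function registerDirty($entity)
    {
        // spl_object_hash() yields one key per live object, so registering
        // the same entity twice still leaves a single entry.
        $this->dirty[spl_object_hash($entity)] = $entity;
    }

    // $save is whatever callable writes one entity to storage.
    public function commit($save)
    {
        foreach ($this->dirty as $entity) {
            call_user_func($save, $entity);
        }
        $count = count($this->dirty);
        $this->dirty = array();
        return $count;
    }
}

class Entity
{
    private $uow;
    private $name;

    public function __construct(UnitOfWork $uow, $name)
    {
        $this->uow = $uow;
        $this->name = $name;
    }

    public function setName($name)
    {
        $this->name = $name;
        $this->uow->registerDirty($this);   // the entity registers itself
    }

    public function getName()
    {
        return $this->name;
    }
}

$uow = new UnitOfWork();
$a = new Entity($uow, 'a');
$b = new Entity($uow, 'b');   // never modified, so never saved
$a->setName('a2');
$a->setName('a3');

$saved = array();
$written = $uow->commit(function ($e) use (&$saved) {
    $saved[] = $e->getName();
});
```

Here only $a is written, and only once, even though it was changed twice.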
First, I generally think a $modified or $dirty property is the best solution here.
But the reason I'm writing was your comment that it would be
tedious if there are a lot of setters
I'm wondering if you're not using PHP5?
If you are, this is a perfect use of the __set() magic method. You could even use it as-is, calling your existing setters from the __set() method.
For example:
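class some_concept
{
    private $x = 1;
    private $y = 2;
    private $dirty = false;

    // This represents your existing setter methods
    function set_y($i)
    {
        $this->y = $i;
    }

    // Called for writes to inaccessible (here: private) properties from outside the class
    function __set($name, $value)
    {
        // Strict comparison, so that e.g. 0 and "0" count as a change
        if ($value !== $this->{$name}) $this->dirty = true;
        // Use the existing setter when there is one, otherwise assign directly
        if (method_exists($this, 'set_' . $name)) {
            $this->{'set_' . $name}($value);
        } else {
            $this->{$name} = $value;
        }
    }

    function __get($name)
    {
        return $this->{$name};
    }
}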
However, there is some merit to your idea of comparing serialized strings. Serialization can be expensive. It's true.
But it is the most accurate solution on your list. Imagine loading the object above, setting $y to 0, then setting it back to 2, then saving. Is it dirty? It will say so, but in reality, it's in the same state it was when it was loaded.
The test I would use here is how expensive is your save(). If saving is a very expensive API call, DB transaction, etc, then you might find that it's worth the expense to serialize the object on-load and save an md5 hash of it.
If that op, which will take a fraction of a second, can save a multi-second transaction, then it could really be worth it.
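Roughly, the hash-on-load idea could look like this. TrackedEntity and stateHash() are hypothetical names; note that serialize() is sensitive to property types, so 2 and "2" produce different hashes.

```php
<?php
// Hypothetical entity with the same two values as the example above.
class TrackedEntity
{
    public $x = 1;
    public $y = 2;
}

// Hypothetical helper: fingerprint of the object's current state.
function stateHash($entity)
{
    return md5(serialize($entity));
}

$entity = new TrackedEntity();
$loadedHash = stateHash($entity);   // taken once, when the object is loaded

$entity->y = 0;
$changed = stateHash($entity) !== $loadedHash;

$entity->y = 2;                     // back to the loaded value
$changedBack = stateHash($entity) !== $loadedHash;
```

After the first assignment the hashes differ; after setting $y back to 2 they match again, so an expensive save() can be skipped.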
Finally, I want to point out a contrarian opinion on this subject from Damien Katz. Katz is a talented developer who created CouchDB, and currently works for MySQL.
His Error Codes v Exceptions post is long but a very good read on this topic.
While it begins by talking about the merits of returning an Error Code or throwing an Exception, he really ends up talking about how to write solid software.
Primarily, he talks about how to create classes that are atomic in the same way that SQL transactions are. The general idea is that you make a copy of an object, modify its state, and only on the last step, if successful, swap that copy in for the primary object. That allows meaningful undo features. A pattern like this, while difficult to shim into an existing app, also provides a solution to this is_modified problem.
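This is not Katz's code, just my own minimal sketch of the copy, modify, swap idea in PHP. Account and applyAtomically() are hypothetical, and clone is shallow, so objects nested inside the entity would need a __clone() method to be copied as well.

```php
<?php
// Hypothetical entity for illustration.
class Account
{
    public $balance = 100;
}

// Hypothetical helper: apply $change to a clone. Returns the clone on
// success, or the untouched original if the change throws.
function applyAtomically($original, $change)
{
    $copy = clone $original;
    try {
        call_user_func($change, $copy);
    } catch (Exception $e) {
        return $original;   // nothing was swapped, nothing was modified
    }
    return $copy;           // the swap happens only on success
}

$account = new Account();
$account = applyAtomically($account, function ($a) {
    $a->balance -= 30;
});

$before = $account;
$account = applyAtomically($account, function ($a) {
    $a->balance -= 500;
    if ($a->balance < 0) {
        throw new Exception('Insufficient funds');
    }
});
```

The failed withdrawal leaves $account pointing at the same object as $before, so comparing the returned object with the original by identity also tells you whether anything was modified.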
I think the first option is best for more readable code and faster execution. I would also consider serializing the objects first (say, to a file) and then hashing the contents.
The same contents should produce the same hash value.