I'm just starting out with a new project which has a product class. An object will represent a product.
Each product has any number of un-defined attributes (could be colour, could be foobar etc..). The product object will contain an array of either attribute objects:
class attr {
var type; // string, float, int etc..
var name; // the name
var value; // the value
...
(and then the product object has an array of these attr objects..)
OR should I store an array for each product:
class product {
var attributes = array('colour' => 'red', 'weight' => '11')
...
Obviously I can make the array 2d and store the attribute type if I needed to.
My main concern is that products may have 20 or so attributes and with lots of users to the site I'm creating this could use up loads of memory - is that right, or just a myth?
I'm just wondering if someone knows what the best practice is for this sort of thing.. the object method feels more right but feels a bit wasteful to me. Any suggestions/thoughts?
As general advice, I'm against early optimization, especially if that means turning your OO models into implicit (non-modeled) things like arrays. While I don't know the context in which you will be using them, having attributes as first-class citizens will allow you to delegate behavior to them if you need to, leading to a much cleaner design IMO. From my point of view, you will mainly benefit from the first approach if you need to manipulate the attributes, their types, etc.
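To make "delegating behavior to attributes" a bit more concrete, here is a minimal sketch. The class names (Attribute, Product), the method names and the list of supported types are all assumptions for illustration, not something taken from the question:

```php
<?php
// Hypothetical sketch: Attribute and Product are illustrative names only.
final class Attribute
{
    private $name;
    private $type;
    private $value;

    public function __construct(string $name, string $type, $value)
    {
        $this->name = $name;
        $this->type = $type;
        $this->value = $value;
    }

    public function getName(): string
    {
        return $this->name;
    }

    // Behavior delegated to the attribute: it casts its own value.
    public function getValue()
    {
        switch ($this->type) {
            case 'int':
                return (int) $this->value;
            case 'float':
                return (float) $this->value;
            default:
                return (string) $this->value;
        }
    }
}

final class Product
{
    /** @var Attribute[] keyed by attribute name */
    private $attributes = [];

    public function addAttribute(Attribute $attribute): void
    {
        $this->attributes[$attribute->getName()] = $attribute;
    }

    public function getAttribute(string $name): ?Attribute
    {
        return $this->attributes[$name] ?? null;
    }
}

$product = new Product();
$product->addAttribute(new Attribute('colour', 'string', 'red'));
$product->addAttribute(new Attribute('weight', 'int', '11'));

var_dump($product->getAttribute('weight')->getValue()); // int(11)
```

With a plain array, the casting logic would have to live wherever the value is read; here it stays with the attribute itself.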
Having said that, there are lots of frameworks that use the array approach for dynamically populating an object, combining an array with the __get()/__set() magic methods. You may want to take a look, for example, at how Elgg handles the $attributes array of the ElggEntity class to see it working and get some ideas.
HTH
There is no best practice for that, AFAIK.
You have two options, as described in your question, and each has its pros and cons. It all depends on how and where the products and attributes will be stored. If they are in a MySQL database there still shouldn't be a difference, as you can fetch either an array or an object from the DB.
If you use classes for every simple thing, then use classes for attributes too; if you use classes only for the big objects, then use arrays. It is up to your preference. There won't be any significant difference in memory consumption between classes and arrays, and certainly not with 20 or so attributes.
If it were up to me, I'd go with classes and an array of objects for the attributes, as that gives more advantages in the future should I need to extend the attributes in some way.
If the Product class really needs to be flexible and accommodate an arbitrary number of attributes, I would be tempted to make use of __get and __set, e.g.
class Product {
protected $attributes = array();
public function __get($name) {
if (array_key_exists($name, $this->attributes)) {
return $this->attributes[$name];
}
}
public function __set($name, $value) {
$this->attributes[$name] = $value;
}
}
$o = new Product();
$o->foo = 123;
var_dump($o->foo);
This would lend itself nicely to implementing ArrayAccess and other SPL iterator type classes in the future - if your solution required it.
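As a rough idea of what the ArrayAccess step could look like, here is a sketch of the same Product class using array notation instead of magic methods. Returning null for a missing attribute and rejecting `$o[] = ...` are my own choices for the example, not requirements of the interface:

```php
<?php
// Sketch only: the Product idea from above, exposed through ArrayAccess.
class Product implements ArrayAccess
{
    protected $attributes = array();

    public function offsetExists($offset): bool
    {
        return array_key_exists($offset, $this->attributes);
    }

    // The attribute below keeps PHP 8.1+ quiet about the untyped return value;
    // older versions treat the line as a comment.
    #[\ReturnTypeWillChange]
    public function offsetGet($offset)
    {
        return $this->attributes[$offset] ?? null;
    }

    public function offsetSet($offset, $value): void
    {
        // $o[] = 'x' arrives here with a null offset; attributes need a name.
        if ($offset === null) {
            throw new InvalidArgumentException('Attributes need a name.');
        }
        $this->attributes[$offset] = $value;
    }

    public function offsetUnset($offset): void
    {
        unset($this->attributes[$offset]);
    }
}

$o = new Product();
$o['foo'] = 123;
var_dump(isset($o['foo'])); // bool(true)
var_dump($o['foo']);        // int(123)
unset($o['foo']);
var_dump(isset($o['foo'])); // bool(false)
```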
I think it's more important to consider how you are going to store them in your database. I guess you want a search where you can say something like "I want a RED HAT, size 12", so you must be able to find all the products whose attributes match. This is all done at the database level. It would not be a good idea to load all products into PHP classes first and then search.
Once you know what you want to show (search result, overview page, details), you load the full product class with its attributes. Since it's all text and numbers, and probably no more than 100 products at once, speed-wise it won't matter what you choose in PHP. Do what you like best.
In your database it could matter (since there you always work with all the products). Make sure you separate strings, numbers, booleans etc. and add the correct indexes, or you could see a major performance drop.
I am trying to figure out what should be considered better for performance:
I have a bunch of objects that contain a lot of page-data.
A few examples of the data that an object can have:
filepath of PHP-file for includes
CSS filepath
JavaScript filepath
Meta data of the page
The object is specific for each type of content. I have an interface that defines the render-function. Each object implements this function differently.
Example:
class PhpFragment extends FragmentBase {
public function render() {
//... render output for this type of data
}
}
I am currently using a parent-object that contains variables that can contain multiple object of the type mentioned above. The object looks something like this:
class pageData {
protected $CSS;
protected $PHP;
protected $JS;
protected $Meta;
// etc...
public function getCSS() {
return $this->CSS;
}
public function getPHP() {
return $this->PHP;
}
public function getJS() {
return $this->JS;
}
}
Whenever I load in a page, I walk through a template and render the data of each object that matches the tag in the template.
For example: If a template has a line where CSS is needed, I call the getCSS function of the pageData which returns an array of objects. Foreach of these objects I call the render function and add the data in the page.
What do I want?
I want to get rid of these fixed variables in the pageData object to be able to use my design as dynamically as possible. I want the pageData object to disappear and just have an array of different fragment-objects.
To achieve this, I need to replace the get-functions in the pageData with something clever?
My top priority is performance, so I thought I'd look through all the objects once to get all the different types, and put all the types as key in the array, the value of the array will then be a subarray that contains the correct key to the objects that match the type.
What I was wondering, before I start changing the design entirely, is this faster?
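For reference, the index described above (type as key, list of positions as value) could be built in a single pass roughly like this. The Fragment interface, its getType() method and the two example classes are assumptions made up for this sketch; the question only shows a render() method:

```php
<?php
// Hypothetical sketch: Fragment, getType() and both classes are assumptions.
interface Fragment
{
    public function getType(): string;
    public function render(): string;
}

final class CssFragment implements Fragment
{
    private $path;

    public function __construct(string $path)
    {
        $this->path = $path;
    }

    public function getType(): string
    {
        return 'css';
    }

    public function render(): string
    {
        return '<link rel="stylesheet" href="' . htmlspecialchars($this->path) . '">';
    }
}

final class JsFragment implements Fragment
{
    private $path;

    public function __construct(string $path)
    {
        $this->path = $path;
    }

    public function getType(): string
    {
        return 'js';
    }

    public function render(): string
    {
        return '<script src="' . htmlspecialchars($this->path) . '"></script>';
    }
}

// One pass over the flat list builds: type => [keys into $fragments].
function indexByType(array $fragments): array
{
    $index = [];
    foreach ($fragments as $key => $fragment) {
        $index[$fragment->getType()][] = $key;
    }
    return $index;
}

// Renders every fragment of one type; unknown types render nothing.
function renderType(string $type, array $fragments, array $index): string
{
    $out = '';
    foreach ($index[$type] ?? [] as $key) {
        $out .= $fragments[$key]->render() . "\n";
    }
    return $out;
}

$fragments = [new CssFragment('a.css'), new JsFragment('app.js'), new CssFragment('b.css')];
$index = indexByType($fragments);
echo renderType('css', $fragments, $index);
```

Whether this is actually faster than the fixed getters is something only a profiler can tell for a given page.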
I don't know if this is the right place to ask this question (it's more a code-review question IMO). Anyway, here's a couple of thoughts I'd consider if I were you:
What are objects
Objects are units of functionality, or entities that represent a specific set of values. DTOs (like your pageData class) serve one purpose: to group and represent a set of values that belong together. The fact that a class has a type (type-hints) and an interface makes a code-base testable, and easier to understand, maintain, debug, and document.
At first glance, a simple DTO isn't too different from a simple array, and yes, objects have a marginal performance cost.
The question you need to ask is whether or not you want to shave off those 1 or 2 ms per request at the cost of increased development time and code that is less testable, more error-prone, and harder to maintain. I'd argue that for this reason alone, DTOs make more sense than arrays.
pre-declared properties are fast
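If you want an object that is as dynamic as possible, then PHP offers you the possibility to add properties to instances on the fly (note that as of PHP 8.2 this is deprecated unless the class is marked with #[\AllowDynamicProperties]):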
Class Foo{}
$x = new Foo;
$x->bar = 'new property';
echo $x->bar;//echoes new property
So in essence, objects are just as flexible as arrays. However, properties that weren't declared beforehand are (again marginally) slower than predeclared properties.
When a class definition declares 3 properties, the engine reserves a fixed slot for each of them, so accessing a declared member is a cheap, constant-time operation. Any "dynamic" property is instead stored in a separate hash table that is created per instance on demand. Lookups in this fallback table are still O(1) on average, but they cost extra hashing work and memory. Not terrible, but worse than they need be.
In addition to dynamic properties being less performant, they're also always public, so you have no control over their values (they can be reassigned elsewhere), and they are, of course, susceptible to human error (typos):
$x = new Foo;
$x->foo = 'Set the value of foo';
echo $x->fo;//typo...
Getters and setters are good
The methods you have now don't do anything, true enough, but consider this:
class User
{
protected $email;
public function setEmail($email)
{
if (!filter_var($email, FILTER_VALIDATE_EMAIL)) {
throw new \InvalidArgumentException('Invalid email');
}
$this->email = $email;
return $this;
}
}
A setter like this not only allows me to control/check when and where a property is set, but also to validate the data that someone is trying to assign to it. You can validate the data. You can ensure that, no matter what, if you receive an instance of User, the email will either be null, or a valid email address.
There are many more reasons why objects make more sense than arrays, but these alone, to me at least, outweigh the benefits of 2ms/req performance gain.
If performance is such an issue, why not write in a faster language?
If all you're after is performance, you might want to look into languages that outperform PHP to begin with. Don't get me wrong: I honestly like PHP, but it's just a fact that, for example, Go can do the same thing, only faster.
Pass by value, copy-on-write, and (almost) pass by reference
Arrays behave, essentially, like scalar values. Pass an array to a function, and any changes made to that array inside the function don't change the array you passed to it. Objects are (sort-of) passed by reference. That's to say: objects are passed by identifier.
Say you have an instance of Foo. The Zend engine will assign a unique ID to that instance (eg 123). When you call a function and pass that instance, internally, you'll pass the identifier of that object to the method not the object itself.
This has several implications: When changing the state of the instance, PHP doesn't have to make a copy of the object: it just uses the ID to get the zval (internal representation of a PHP variable), and operates on the same piece of memory. The net result: you're passing a simple value (an int), and whatever happens to the object, wherever it happens, the state is shared throughout.
Arrays are different: Passing an array is (sort-of) passing a copy of that value. In reality, PHP is clever enough to pass a reference to the existing array, but once you start reassigning values, PHP does have to create a copy. This is the copy-on-write mechanism. Put simply, the idea is: do not create needless copies of values, unless you have to:
function foo(array $data)
{
$x = $data[0];//read, no copy of argument is required
$data[1] = $x * $data[3];//now, we're altering the argument, a copy is created
}
$data = [1, 2, 3, 4];
foo($data);//passes reference
Depending on how you use the arrays or objects you pass to functions, one might perform better than the other. On the whole: passing an array that you'll only use to read values will most likely outperform passing an object. However, if you start operating on the array/object, an object might turn out to outperform arrays...
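A small demonstration of the difference in behaviour, as I understand it. Box, changeArray() and changeObject() are names invented for the example:

```php
<?php
// Illustration only: Box, changeArray() and changeObject() are made-up names.
class Box
{
    public $items = [];
}

function changeArray(array $data): array
{
    $data[] = 'added'; // write: PHP separates a private copy at this point
    return $data;
}

function changeObject(Box $box): void
{
    $box->items[] = 'added'; // same instance the caller holds
}

$array = ['original'];
$copy = changeArray($array);
var_dump(count($array)); // int(1), the caller's array is untouched
var_dump(count($copy));  // int(2)

$box = new Box();
$box->items[] = 'original';
changeObject($box);
var_dump(count($box->items)); // int(2), the change is visible to the caller
```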
TL;DR
Yes, arrays are generally faster than objects. But they're less safe, pretty much impossible to test, harder to maintain and non-communicative (public function doStuff(array $data) doesn't tell me as much as public function doStuff(User $data)).
Owing to the copy-on-write and the way instances are passed to functions, it's impossible to say which will be faster with absolute certainty. It really boils down to what you do: is the array fairly small, and are you only reading its values, then it's probably going to be faster than objects.
The moment you start operating on the data, it's entirely possible objects might prove to be faster.
I can't just leave it there without at least mentioning that old mantra:
Premature optimization is the root of all evil
Switching from objects to arrays for performance sake does smell of micro-optimization. If you have in fact reached the point that there's nothing else to optimize but these kinds of trivial things, then the project is either a small one; or you're the first person to actually work on a big project and actually finish it. In all other cases, you shouldn't really be wasting time on this kind of optimization.
Things that are far more important to profile, and then optimize are:
Caching (opcache, memcache, ...)
Disk IO (including files, autoloader mechanisms)
Resource management: open file pointers, DB connections (when to connect, when to close connections)
If you're using a traditional SQL DB: queries... The vast majority of PHP applications can benefit a lot by having a DBA look at the queries and actually optimize those
Server setup
...
Only if you've gone through this list, and more, could you perhaps consider thinking about some micro optimization. That is of course, if by then you haven't encountered any bugs...
I have a NoteEntity class that is meant to represent a row in my notes table. A "(data) entity" in my framework strictly means a stored/storable data entry - a database row, if you will. Creating such an object is as straightforward as initializing it with an array returned by mysqli_fetch_array(), which requires me to match the object properties to the column names of my table.
The constructor inherited from the parent DataEntity class:
public function __construct($row_array)
{
foreach ($row_array as $column => $value)
{
if (property_exists($this, $column))
{
$this->$column = $value;
}
else
{
trigger_error(get_class($this) . " has no attribute called '$column'.", E_USER_NOTICE);
}
}
}
As you can see, all it does is map the columns to their corresponding object properties.
This is fine for me, since the NoteEntity class is defined only once, and thus it is easy to change its internal workings if the table columns ever change. But this also requires me to use a getter method for each and every property if I don't want my entire code to depend on the given table's column names.
My question is: is it good practice to have a getter for every property, or should I investigate another approach? I'm asking this from a performance perspective, but I'd like to keep my code as maintainable as possible as well.
The reason I'm concerned about performance is that if it gets a bit busier, getting the properties quickly becomes:
foreach ($notes as $note_entity)
{
$template->Process(array(
'note_name' => $note_entity->GetName(),
'note_ext' => pathinfo($note_entity->GetFilename(), PATHINFO_EXTENSION),
'subject_id' => $note_entity->GetSubjectID(),
'subject_name' => $note_entity->GetSubject(),
'institute_id' => $note_entity->GetInstituteID(),
'institute_nick' => $note_entity->GetInstitute(),
// ...
));
}
... which might be fine for a few dozen notes, but their count is anticipated to reach the thousands per request, which adds significant function call overhead. The present version is extremely convenient to use, because in no part of the code do you need to keep the column names in mind.
Possible solutions that I came up with include an approach that returns every, or a subset of every property in an associative array. This has the following minor downsides:
returning only a subset of data creates implicit logical dependencies between NoteEntity and where it is used, if the wanted subset is varying between calling locations,
returning all properties creates an unnecessary flow of data that will never be used,
updating the column configuration increases the maintenance cost, since we need to update the structure of the returned array as well.
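The associative-array approach described above might be sketched as follows. The toArray() method, the EXPORT_MAP constant and the column names are hypothetical; the point is only that the key-to-column mapping lives in one place inside the entity:

```php
<?php
// Hypothetical sketch: toArray(), EXPORT_MAP and the columns are assumptions.
class NoteEntity
{
    // Public key => internal column name, maintained in a single spot.
    private const EXPORT_MAP = [
        'note_name'  => 'name',
        'note_file'  => 'filename',
        'subject_id' => 'subject_id',
    ];

    protected $name;
    protected $filename;
    protected $subject_id;

    public function __construct(array $row)
    {
        foreach ($row as $column => $value) {
            if (property_exists($this, $column)) {
                $this->$column = $value;
            }
        }
    }

    // Returns all exported values, or only the requested public keys.
    public function toArray(array $keys = []): array
    {
        $map = $keys === []
            ? self::EXPORT_MAP
            : array_intersect_key(self::EXPORT_MAP, array_flip($keys));

        $out = [];
        foreach ($map as $key => $column) {
            $out[$key] = $this->$column;
        }
        return $out;
    }
}

$note = new NoteEntity(['name' => 'Algebra', 'filename' => 'algebra.pdf', 'subject_id' => 7]);
print_r($note->toArray(['note_name', 'subject_id']));
```

This trades several method calls per note for one, at the cost of the downsides listed above.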
I think this is more of a design preference, but there are many advantages of using getters/setters.
Encapsulation and hiding internals are typically good practice. Interoperability is another reason why using getters and setters is a good idea (ie. mocking becomes much easier).
Whether you want to do this for primitives or not is debatable, but typically you don't want to update properties by referencing them directly, and especially not externally. So having getters/setters is a good way to insulate them.
Advantages that you get are much greater than the performance that you're about to lose.
Currently, I am creating a script that will parse information out of a certain type of report that is passed to it. There is a part of the script that will pull a student's information from the report and save it for later processing.
Is it best to hold it in the form of a class, or in an array variable? I figure there are 3 methods that I could use.
EDIT: Really, the question comes down to does any one way have any sort of performance advantage? Because otherwise, each method is really the same as the last.
As a class with direct access to variables:
class StudentInformation {
public $studentID;
public $firstName;
public $lastName;
public $middleName;
public $programYear;
public $timeGenerated;
public function __construct() {}
}
As a class with functions:
class StudentInformation {
private $studentID;
private $firstName;
private $lastName;
private $middleName;
private $programYear;
private $timeGenerated;
public function __construct() {}
public function setStudentID($id)
{
$this->studentID = $id;
}
public function getStudentID()
{
return $this->studentID;
}
public function setFirstName($fn)
{
$this->firstName = $fn;
}
/* etc, etc, etc */
}
Or as an array with strings as keys:
$studentInfo = array();
$studentInfo["idnumber"] = $whatever;
$studentInfo["firstname"] = $whatever;
$studentInfo["lastname"] = $whatever;
/* etc, etc, etc */
Trying to optimize the use of an array vs. a simple value object is probably an unnecessary micro-optimization. For the simplest cases, an array is faster because you don't have the overhead of constructing a new object.
It's important to remember this: array is NOT the only data structure that exists. If you don't need the hash capabilities, a simple SplFixedArray will result in lower memory overhead and faster iteration once you get past the initial overhead of object creation. If you're storing a large amount of data, the aforementioned fixed array or one of the other SPL data structures is likely a better option.
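For anyone who hasn't used it, a minimal SplFixedArray sketch looks like this (the ID values are made up):

```php
<?php
// Sketch: a fixed-size, integer-indexed list instead of a general-purpose array.
$ids = new SplFixedArray(3);
$ids[0] = 1001;
$ids[1] = 1002;
$ids[2] = 1003;

foreach ($ids as $index => $id) {
    echo $index, ' => ', $id, PHP_EOL;
}

var_dump($ids->getSize()); // int(3)

try {
    $ids[3] = 1004; // index 3 is outside the fixed size
} catch (RuntimeException $e) {
    echo 'Out of range: ', $e->getMessage(), PHP_EOL;
}
```

Note that it only accepts integer indexes within its size, so it replaces list-style arrays, not string-keyed ones like $studentInfo above.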
Finally: value objects should be immutable, so in your case I would strongly recommend the encapsulation afforded by an object over the ability to assign hash map values willy-nilly. If you want the simplicity of using the array notation, have your class implement ArrayAccess and get the best of both worlds. Some would suggest magic getters and setters with __get and __set. I would not, as magic generally obfuscates your code unnecessarily. If you really really need magic, you might reconsider your design.
There's a reason why the OOP paradigm is recognized as the best programming paradigm that we've come up with -- because it's the best paradigm we've come up with. You should use it. Avoid falling into the trap of many/most PHP devs who use arrays for everything.
You could create a class for a single student, with appropriate operations for a single student, like updating grades or getting a full name (private data members with functions for access). Then create another class for multiple students that contains an array of single students. You can create functions that operate on sets of students, testing and calling the individual student functions as needed.
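A rough sketch of that split, one class for a single student and one for a set of students. All names here (Student, StudentList, namesInYear()) are invented for illustration:

```php
<?php
// Hypothetical sketch: class and method names are illustrative only.
final class Student
{
    private $studentId;
    private $firstName;
    private $lastName;
    private $programYear;

    public function __construct(string $studentId, string $firstName, string $lastName, int $programYear)
    {
        $this->studentId = $studentId;
        $this->firstName = $firstName;
        $this->lastName = $lastName;
        $this->programYear = $programYear;
    }

    public function getFullName(): string
    {
        return $this->firstName . ' ' . $this->lastName;
    }

    public function getProgramYear(): int
    {
        return $this->programYear;
    }
}

final class StudentList
{
    /** @var Student[] */
    private $students = [];

    public function add(Student $student): void
    {
        $this->students[] = $student;
    }

    // Works on the whole set by calling the single-student methods.
    public function namesInYear(int $year): array
    {
        $names = [];
        foreach ($this->students as $student) {
            if ($student->getProgramYear() === $year) {
                $names[] = $student->getFullName();
            }
        }
        return $names;
    }
}

$list = new StudentList();
$list->add(new Student('s1', 'Ada', 'Lovelace', 2));
$list->add(new Student('s2', 'Alan', 'Turing', 1));
$list->add(new Student('s3', 'Grace', 'Hopper', 2));

print_r($list->namesInYear(2));
```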
The high level answer is that you can do this any of the ways you suggest, but most of us will recommend the OOP solution. If your needs are very simple, a simple array may suffice. If your needs change, you may have to re-code the whole thing for objects anyway. Classes can be kept simple too, so I suggest you start with classes and add complexity as needed. I believe that long term, it will maintain and scale better built with classes.
Regarding your performance question, classes are probably faster than arrays, since different instances are stored independently. By putting all your stuff in one giant hash-map (associative array), you are also getting some of the limitations/properties of arrays. For example, ordering. You don't need that. Also, if the PHP interpreter isn't being smart, it will hash your lookup strings each time you look something up. With classes and static typing that wouldn't be necessary.
The 3 options are valid but with a different degree of encapsulation/protection from the outside world.
from lowest protection to highest :
array
object with direct access to public properties
object with getter/setters
The choice highly depends on the environment of your project (2 hours ? sent to the bin tomorrow ?)
choice 2 seems pragmatic.
take into account that depending on your database wrapper, the data could be fetched into array or objects. If it is fetched as an array, you may have to map those to objects.
This is a subjective question with no definitive answer but I would recommend the OOP way. You could create a class Parser holding an instance of StudentInformation.
It's way more comfortable and you can add more methods if you need some additional processing.
I would go with the class, with the properties being private.
Any operations pertaining to the student information could/should be created in that class.
If this were a one-time thing, I would go for the array. But really, I know what it means to have something that is going to be used one time only, then find out that I need to execute n operations on the data, and end up having to go for a class.
Arrays are better if you're only coding a small project without many functions. However, classes have their advantages. For example, would you rather type
$class->db_update("string", $database);
or
$query = "SELECT * FROM `table` WHERE foo='".$array['bar']."'";
mysql_connect("...
mysql_query($query...
Basically, each side has its advantages. I'd recommend the OOP way as most other people here should and would.
EDIT: Also, take a look at this.
I'm only half-familiar with OO PHP. I have made a few simple classes before, and used many downloaded ones, but would like to make one properly for once and am hoping for some reco's on either how to do it best, or which resource / tutorials to consult to get my head properly wrapped around it.
I will start with the background on what I want to do.
I'm building a program that takes a user uploaded file (an excel file) which contains a list and various values. It would look something like this:
Item, person1, person2, person3
Car, 1, 3, 4
Bike, 5, 0, 1
Now, the main part of the class is used to create a graphic for each person that shows how many bikes and cars they have (and possibly many more items). So I create the class with some functions to specify background imagery, size etc... that is straightforward enough I think.
What I want to do though is figure out the best way to pass the data to that class so it can efficiently make my graphic. Should I just read through the data in the excel file using some excel file reading class and do something like:
foreach column {
foreach row {
$data[$vehiclename] = $columnVehicleCount;
}
$image = new MyClassName;
$image->loadData($data);
...
}
and then within my class I just iterate through the $data's key-value pairs? Or should I create some other object form to store the data before passing it? My hunch is that this is the right thing to do, and sort of the point of OOP, but this is where I am at a loss. Is this where I extend something? I am getting used to JS objects using JSON, but I'm still not 100% certain of the best way to use them.
Would I create some sort of subclass? How would it work?
$thisItem = $image->addItem('Bike');
$thisItem->quantity(5);
And then within the main class something like
foreach($this->items as $item) {
draw($item->name);
resize($item->quantity);
}
I'm sure that last bit is all wrong, but that's why I'm here. Any help or direction would be greatly appreciated.
Cheers,
M
I recommend that you build your classes according to OO principles.
Think about each object: what it can do (methods) and what defines it (properties).
Afterwards, think about how you create the objects (Factory) and how you initialize them with data (Builder).
My question is more of a theoretical one.
Say you have an object, that represents the list of something (articles, pages, accounts etc.)
class ObjCollection
You have a class, that represents a specific item in collection:
class objItem
I have a problem thinking of a basic responsibilities of each object.
Which class is responsible for creating a new objItem?
Which class is responsible for deleting an objItem? Should it delete itself as a method?
Update 1:
Techpriester: Is it ok to use object's constructor as a function to create new item?
I think of that like:
class objItem {
public function __construct($id = 0) {
if ($id > 0) {
// load item data...
} else {
// make new item...
}
}
}
But what if something goes wrong in the code, and instead of passing an $id > 0, it passes 0? In this case a more expected behavior would be an empty object, and not the new one, or am I wrong?
A way of thinking about this:
objItem usually have a class constructor so this class might be responsible for creating objects of type objItem.
When an objItem is inserted into a list/collection, let's say objCollection, it can be objCollection's responsibility to delete it from the collection.
"objItem usually have a class constructor so this class is responsible for creating objects of type objItem."
Constructor has nothing to do with responsibility (usually). Thinking this way, every object would be only responsible for itself.
Responsibility is a concept not directly bound to class hierarchy.
If:
ObjCollection = Nest, objItem = Egg, and there is a third object, Bird, then Bird takes responsibility for creating eggs (even if the nest contains the egg). It is not about programming, it is about common sense... :)
There is no such thing as an "empty object". Objects have "state". Either you create an object and then you have it, or you don't create it and then there is no object.
All you have to worry about is if your constructor will work fine in both cases, with new object created and without it.
Usually it is better to inject object as a constructor parameter (instead of $id) not to create it inside another object.
I know this doesn't answer your question, but since you tagged this as PHP I'm going to assume that it will almost certainly be applied with some sort of database model.
In that case, it's probably a better idea to do away with 'collections' altogether since if you made each class represent only one object, if you wanted to view 10 blog posts, for example, you would be calling 10 separate SELECT queries each retrieving only an individual database record, because you decided to have the 'BlogPost' class encapsulate its retrieval method.
The alternative is to let the class represent either one or more records, that way, you only need to run one SELECT query whether you're retrieving 5000 records or only one. Pretty much every object-relational-mapper does this.
When doing object-oriented programming, it's better to think in terms of behavior or responsibility than whether or not the object is a tangible 'thing'. That's the problem with theoretical discussion of OOP. It's very tempting to use analogies like animals and fruits which have very little relevance to real-world programming.
Since an object cannot delete itself, that has to be the responsibility of the collection.
Whether you let the collection create its objects like $collection->makeNewItem(); (which then calls the item's constructor) or use $item = new Item(); directly and then some $collection->addItem($item); method is entirely up to you and the needs of your application.
I'd recommend using regular instantiation if the items themselves are also used outside of the collection.
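Both variants can live side by side. A minimal sketch, where Item and ItemCollection stand in for objItem and ObjCollection, and removeItem() is a name I made up for the deletion side:

```php
<?php
// Sketch: Item and ItemCollection are stand-ins for objItem / ObjCollection.
final class Item
{
    private $id;

    public function __construct(int $id)
    {
        $this->id = $id;
    }

    public function getId(): int
    {
        return $this->id;
    }
}

final class ItemCollection implements Countable
{
    /** @var Item[] keyed by item id */
    private $items = [];

    // Variant 1: the caller instantiates the item and hands it over.
    public function addItem(Item $item): void
    {
        $this->items[$item->getId()] = $item;
    }

    // Variant 2: the collection creates the item itself.
    public function makeNewItem(int $id): Item
    {
        $item = new Item($id);
        $this->addItem($item);
        return $item;
    }

    // Removal is the collection's job; the item never deletes itself.
    public function removeItem(int $id): void
    {
        unset($this->items[$id]);
    }

    public function count(): int
    {
        return count($this->items);
    }
}

$collection = new ItemCollection();
$collection->addItem(new Item(1));
$made = $collection->makeNewItem(2);
$collection->removeItem(1);

var_dump(count($collection)); // int(1)
```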