I can define constants in a file and then include them, or
I can store them in the DB and seed them.
From your experience, which one is better, faster, or less resource-costly for, say, dozens, hundreds, or thousands of constants?
This is an ever-present question, which differs from person to person, and is a bit of a balancing act. For me, it all depends on how big and static the data-set is presumed to be. In my applications I have about 10 code points where I will refer to constants rather than going through all the overhead to pull data from a database.
Another benefit of using constants is that you can refer to them throughout the codebase, which helps reduce defects and improves semantics. For example:
class Ad
{
    const ID_AD_TYPE_BANNER = 1;
    const ID_AD_TYPE_SKYSCRAPER = 2;
}
// Somewhere in app
if (array_key_exists(Ad::ID_AD_TYPE_BANNER, $array)) {
    // handle banner ads
}
However there are a few hmmm downsides? I hesitate to call them downsides because they are not difficult to work around. First, if your constants are ints, and your application ever inserts these constants into a table, you will not be able to join to a 'lookup' table and see the lookup values as part of the query results. Instead you will just have to know what an id of 1 means in the query.
Second, be sure to place constants in common sense locations, and try to namespace if you can, that way you can easily find the values if you end up forgetting what they are. If you have a clever IDE, it will do this for you. Also know that if a constant changes, you will have to do a code push vs manipulating database data directly.
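The lookup-table limitation mentioned above can be softened by keeping a readable label next to each constant, so code or a report script can translate a stored id back into a name. This is only a minimal sketch: it reuses the Ad class from the example, and the $labels array and labelFor() method are hypothetical additions of mine.

```php
<?php
class Ad
{
    const ID_AD_TYPE_BANNER = 1;
    const ID_AD_TYPE_SKYSCRAPER = 2;

    // Hypothetical id => label map; it has to be kept in sync with the constants by hand.
    private static $labels = array(
        self::ID_AD_TYPE_BANNER => 'Banner',
        self::ID_AD_TYPE_SKYSCRAPER => 'Skyscraper',
    );

    // Translate an id read from a table back into a readable name.
    public static function labelFor($id)
    {
        return isset(self::$labels[$id]) ? self::$labels[$id] : 'Unknown (' . $id . ')';
    }
}

echo Ad::labelFor(Ad::ID_AD_TYPE_SKYSCRAPER), "\n"; // Skyscraper
```

It does not help inside a raw SQL query, but it keeps the meaning of each id in one place.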
I'd say yes to constants, if the list is relatively small (<25) and not prone to change much, however, if your data won't match these constraints, I would suggest popping them into a database.
I would say that the file would be faster but could be bigger in size, whereas the database would be slower because it has to connect, fetch the result and return it. Personally I would use a file.
I work at a school and have been looking at a way to speed up and improve the way some of our database functions work.
We have a PHP formatting class that seems to be slowing things down now that the database is getting bigger and bigger and some of the queries longer.
The class does things like take a foreign key and find the value for that key in a lookup table.
For example a student class will use the formatting class for:
courseID = 114 and studentID are looked up to return Biology and John Doe, each time using a MySQL query.
My issue is that there are classes that generate an array of objects, for example an array of 500 student objects, and each of these student objects accesses this formatter class and thus runs several queries.
I think this is what is slowing things down.
Worst case
500 student objects x 10 lookups in the formatter class that means 5000 queries executed.
I am wondering the best way to deal with this.
Do I preload all the lookup data into arrays in that one formatter class?
Do I make that formatter class a single instance (singleton), so that in the worst case one master class that generates a whole array of objects uses that one and only instance?
Is it better to store all that lookup data already parsed in one array (cache issue?)
Some classes now have so many queries they no longer work.
edit 8/23/2013 below
To add further information.
I am not really concerned with a single lookup, such as a teacher looking up one student's info; those have no speed issues. Having a formatter class run 10 queries is no issue.
The issue is the classes which generate a huge listing of other objects, such as a teacher requesting to see all students, where there are 500 objects.
I have several classes of this type. Creating a join for all of them is probably the fastest but, as someone pointed out, a lot of work.
edit 1/30/2014
Wanted to thank Lorenz Meyer for the great start on my speed issues; I have been working on some of the suggestions!
I do have a further related question.
This is for simpler lookups, say storing 50 pairs of values, for example the teacherIds and the corresponding teacher names.
Option 1:
In some cases I have added a field to some tables and had a script pre-populate those fields with the value, such as the teacher's name for the teacherId in that row. At run time the field already has a value. I did this in some of the huge scripts and it dramatically cut the number of queries.
With a cron job to run the script it is an OK solution, but I could see it becoming an issue to add fields just for rendered data to so many tables.
Option 2:
I have been thinking of using $_SESSION to store that pair data. Once a user logs in, an array of teacherIds and full names is stored in the $_SESSION data. Any class that previously used a lookup to find a teacher's name could use the $_SESSION array instead, with a fallback to query the lookup table just in case.
I don't have many concurrent users, 30 at most, so it seems this would not be hugely taxing, and I would limit it to some of the smaller lookup tables.
What are people's thoughts on these two options, especially Option 2?
I see three solutions. I present them from the easiest to the heaviest, which is also the most effective.
Caching, but limited to one request
This solution would be to include a static variable in this function and use this as a temporary store for the students and classes. This would result in fewer queries, because you query each student only once and each class only once.
Something like this
function format($studentid, $classid){
    static $students = array();
    static $classes = array();
    // Query the database only the first time each id is seen
    if (!isset($students[$studentid])) $students[$studentid] = db_lookup($studentid);
    if (!isset($classes[$classid])) $classes[$classid] = db_lookup($classid);
    $student_name = $students[$studentid];
    $class_name = $classes[$classid];
    (...)
instead of
function format($studentid, $classid){
    $student_name = db_lookup($studentid);
    $class_name = db_lookup($classid);
    (...)
This solution is very easy to implement, but it caches the result only for one request. It helps, for instance, if you display a table which contains the same course many times.
Caching between requests
For caching between requests, you need to use a cache solution such as the PEAR package Cache_Lite. It lets you cache the result of a function call with a fixed value (e.g. db_lookup($studentid=123) ) and store the result in the cache. Cache_Lite implements a memory cache, a file cache and a database cache. I used it with memcache and it worked well.
This solution requires more work, and it will use either disk space or memory.
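To show the idea without depending on any package, here is a minimal file-based sketch. It is not Cache_Lite's API: cached_lookup() is a hypothetical helper of mine, the cache files go into the system temp directory, and the closure stands in for the real db_lookup() call.

```php
<?php
// Hypothetical helper: return a cached value if it is fresh, otherwise compute and store it.
// Note: a cached value of false is treated as a cache miss.
function cached_lookup($key, $ttlSeconds, $compute)
{
    $file = sys_get_temp_dir() . '/lookup_' . md5($key) . '.cache';

    // Serve from disk while the cache file is younger than the TTL.
    if (is_file($file) && (time() - filemtime($file)) < $ttlSeconds) {
        $cached = @unserialize(file_get_contents($file));
        if ($cached !== false) {
            return $cached;
        }
    }

    $value = call_user_func($compute);
    // LOCK_EX keeps two requests from writing the file at the same time.
    file_put_contents($file, serialize($value), LOCK_EX);
    return $value;
}

// Stand-in for the real query; db_lookup(123) would go inside the closure.
$name = cached_lookup('student:123', 300, function () {
    return 'John Doe';
});
echo $name, "\n";
```

The second request within five minutes reads the file instead of hitting the database, which is the trade of disk space for queries described above.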
Code refactoring
The most efficient solution, but the one that requires the most effort, is to refactor your code. It does not make sense to query the database 500 times for one row each time. You should rewrite the code so that one query gets all the data, and then format the data for each row of your record set.
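A rough sketch of what that refactoring could look like. Everything here is an assumption for illustration: format_rows() is a hypothetical function, the two callables stand in for a single "WHERE id IN (...)" query per lookup table, and the names in the sample data are made up.

```php
<?php
// Format many rows with one lookup per table instead of one query per row.
// Each $fetch... argument is a hypothetical callable that takes a list of ids
// and returns an id => name array, e.g. from one "WHERE id IN (...)" query.
function format_rows(array $rows, $fetchStudentNames, $fetchCourseNames)
{
    // Collect each distinct id once.
    $studentIds = array_values(array_unique(array_column($rows, 'studentID')));
    $courseIds  = array_values(array_unique(array_column($rows, 'courseID')));

    // Two lookups in total, however many rows there are.
    $students = call_user_func($fetchStudentNames, $studentIds);
    $courses  = call_user_func($fetchCourseNames, $courseIds);

    $formatted = array();
    foreach ($rows as $row) {
        $formatted[] = array(
            'student' => $students[$row['studentID']],
            'course'  => $courses[$row['courseID']],
        );
    }
    return $formatted;
}

// Fake fetchers with made-up data, in place of real database queries.
$rows = array(
    array('studentID' => 7, 'courseID' => 114),
    array('studentID' => 8, 'courseID' => 114),
);
$fetchStudents = function (array $ids) { return array(7 => 'John Doe', 8 => 'Jane Roe'); };
$fetchCourses  = function (array $ids) { return array(114 => 'Biology'); };
print_r(format_rows($rows, $fetchStudents, $fetchCourses));
```

With 500 students and 10 lookup tables this would mean about 10 queries instead of 5000. A single JOIN in the original query would go one step further.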
First, I apologize if this is just a coding style issue. I'm wondering about the pros and cons of assigning a new variable for each property or function result versus just re-assigning an existing variable. This is assuming you don't need access to the variable beyond the scope.
Here's what I mean (noting that the names $var0,... are just for simplicity),
Option#1:
$var0= array('hello', 'world');
$var1="hello world";
$var2=//some crazy large database query result
$var3=//some complicated function()
vs.
Option#2:
$var0= array('hello', 'world');
$var0="hello world";
$var0=//some crazy large database query result
$var0=//some complicated function()
Does it depend on the memory size of the existing variable? I.e., is re-assigning memory more computationally expensive than assigning a new variable?
Is this always a scope issue, meaning you should use Option#2 only if you don't need each of the variable values outside the scope shown here?
Does it depend on what each variable value is? Does re-assigning to different data types have different costs associated with it?
Technically speaking, reusing variables would be (insignificantly) faster. It will make zero difference in measurable performance though.
While hardware gets cheaper and hours get more expensive, you should rather look to have maintainable code. This will save yourself headaches and your company hard dollars in the long run.
Only optimize where enough performance gain can be made to offset the
amount of work (money) you are putting into it.
In these days of clouds and server clusters, slightly less optimized code will most likely not make for a slower project in the end. It is more probable that your project will run just as fast but take a few more CPU cycles, and therefore cost you a little more money at your hosting provider. This minor added cost, though, will most likely not outweigh hours spent optimizing for performance. Unless, of course, you're asking this because you're developing for Amazon (and even at places like Amazon, with millions and millions of hits per day, reusing variables will not result in any noticeable performance gain).
To get back to your question: I believe you should only reuse a variable when it will hold updated content of the original state. But in general, that doesn't happen too much.
I think in the following situation, reusing the $content var is the logical choice to make
function getContent()
{
    $cacheId = 'someUniqueCacheIdSoItDoesNotTriggerANotice';
    $content = someCacheLoadingCall( $cacheId );
    if (null === $content) {
        $content = someContentGeneratingFunction();
        someCacheSavingCall( $cacheId, $content);
    }
    return $content;
}
Descriptive code
Also, please be kind to your future self and always use descriptive names for your variables. You will thank yourself for it. When you then make a pact with yourself to never reuse variables unless it logically makes sense, you've taken another step towards maintainable code.
Imagine that 6 months from now, after you've done another big project, or a few more small projects, you get a call from an important client that there is a bug in the old project. Holy !##! Gotta fix that right now!
You open it up and see functions like this everywhere:
function gC()
{
    $cI = 'someUniqueCacheIdSoItDoesNotTriggerANotice';
    $c = sclc( $cI );
    if (null === $c) {
        $c = scg_f();
        scsc( $cI, $c);
    }
    return $c;
}
It is much better to use descriptive variable and function names and to get a code editor with good code completion, so you're still coding as fast as you want. Right now, I would recommend Aptana Studio or Zend Studio. Zend has slightly better code completion, but Aptana has proven to be more stable.
PS. I don't know your level of programming, sorry if I babbled on too far. If not relevant for you, I hope to have helped someone else who might read this :)
Personally I would say you should never ever reassign a variable to contain different stuff. This makes it really hard to debug. If you are worried about memory consumption you can always unset() variables.
Also note that you should never ever have variables named $var#. Your variable names should describe what they hold.
At the end of the day it's all about minimizing the number of WTFs in your code. And option two is one big WTF.
Does it depend on the memory size of the existing variable? I.e., is re-assigning memory more computationally expensive than assigning a new variable?
It's about limiting the number of WTFs for both you and other people (re)viewing your code.
Is this always a scope issue, meaning you should use Option#2 only if you don't need each of the variable values outside the scope shown here?
Well, if it is in a completely different scope it is fine to use the same name multiple times, as long as it is clear what the variable contains, e.g.:
// perfectly fine to use the same name again. I would go as far as to say this is preferred.
function doSomethingWithText($articleText)
{
    // do something
}
$articleText = 'Some text of some article';
doSomethingWithText($articleText);
Does it depend on what each variable value is? Does re-assigning to different data types have different costs associated with it?
Not a matter of cost, but a matter of maintainability. Which is often way more important.
You should never use option #2. Reusing variables for unrelated blocks of code is a terrible practice. You shouldn't even be in a situation where option #2 is possible. If your function is so long that you're changing context completely and working on some different problem, you should refactor your function into smaller single-purpose functions.
You should never reuse a variable out of some desire to "recycle" it after the old value is no longer used. If a variable is no longer needed, it should naturally fall out of scope if you're architecting your software correctly. Your decision should have nothing to do with performance or memory optimization, neither of which is affected by the naming of your variables. Your only consideration when picking variable names should be producing maintainable, stable code.
The fact that you're even asking yourself whether to reuse your variables means you're using names which are too generic. Variable names like var0, var1 etc. are terrible. You should be naming your variables according to what they actually contain, and declaring a new variable when you need to store a new value.
I've implemented an Access Control List using 2 static arrays (for the roles and the resources), but I added a new table in my database for the permissions.
The idea of using a static array for the roles is that we won't create new roles all the time, so the data won't change often. I thought the same for the resources, also because I think the resources are something that only the developers should deal with: they're more related to the code than to the data. Do you know of any reasons to use a static array instead of a database table? When/why?
The problem with hardcoding values into your code is that compared with a database change, code changes are much more expensive:
You usually need to create a new package to deploy. That package would need to be regression tested, to verify that no bugs have been introduced. Hint: even if you only change one line of code, regression tests are necessary to verify that nothing went wrong in the build process (e.g. a library isn't correctly packaged, causing a module to fail).
Updating code can mean downtime, which also increases risk: there is always a chance that the update fails.
In an enterprise environment it is usually a lot quicker to get DB updates approved than code changes.
All that costs time/effort/money. Note, in my opinion holding reference data or static data in a database does not mean a hit on performance, because the data can always be cached.
Your static array is an example of 'hard-coding' your data into your program, which is fine if you never ever want to change it.
In my experience, for your use case, this is not ever going to be true, and hard-coding your data into your source will result in you being constantly asked to update those things you assume will never change.
Protip: to a project manager and/or client, nothing is immutable.
I think this just boils down to how you think the database will be used in the future. If you leave the data in arrays, and then later want to create another application that interacts with this database, you will start to have to maintain the roles/resources data in both code bases. But, if you put the roles/resources into the database, the database will be the one authority on them.
I would recommend putting them in the database. You could read the tables into arrays at startup, and you'll have the same performance benefits and the flexibility to have other applications able to get this information.
Also, when/if you get to writing a user management system, it is easier to display the roles/resources of a user by joining the tables than it is to get back the roles/resources IDs and have to look up the pretty names in your arrays.
Using static arrays you get performance, considering that you do not need to access the database all the time, but safety is more important than performance, so I suggest you do the control of permissions in the database.
Read up on RBAC (role-based access control).
Things considered static should be coded static. That is if you really consider them static.
But I suggest using class constants instead of static array values.
I see programmers putting a lot of information into databases that could otherwise be put in a file that holds arrays. Instead of arrays, they'll use many tables of SQL which, I believe, is slower.
CitrusDB has a table in the database called "holiday". This table consists of just one date column called "holiday_date" that holds dates that are holidays. The idea is to let the user add holidays to the table. Citrus and the programmers I work with at my workplace will prefer to put all this information in tables because it is "standard".
I don't see why this would be true unless you are allowing the user, through a user interface, to add holidays. I have a feeling there's something I'm missing.
Sometimes you want to design in a bit of flexibility to a product. What if your product is released in a different country with different holidays? Just tweak the table and everything will work fine. If it's hard coded into the application, or worse, hard coded in many different places through the application, you could be in a world of pain trying to get it to work in the new locale.
By using tables, there is also a single way of accessing this information, which probably makes the program more consistent, and easier to maintain.
Sometimes efficiency/speed is not the only motivation for a design. Maintainability, flexibility, etc are very important factors.
The main advantage I have found of storing 'configuration' in a database, rather than in a property file or a file full of arrays, is that the database is usually centrally stored, whereas an application may often be spread across a farm of several, or even hundreds of, servers.
I have implemented such a solution in a corporate environment. Being able to change configuration at a single point of access, knowing that it will immediately be propagated to all servers without the concern of a deployment process, is very powerful, and we have come to rely on it quite heavily.
The actual dates of some holidays change every year. The flexibility to update the holidays with a query or with a script makes putting it in the database the easiest way. One could easily implement a script that updates the holidays each year for their country or region when it is stored in the database.
Theoretically, databases are designed and tuned to provide faster access to data than doing a disk read from a file. In practice, for small to mid-sized applications this difference is minuscule. Best practices, however, are typically oriented at larger scale. By implementing best practices on your small application, you create one that is capable of scaling up.
There is also the consideration of the accessibility of the data in terms of other aspects of the project. Where is most of the data in a web-based application? In the database. Thus, we try to keep ALL the data in the database, or as much as is feasible. That way, in the future, if you decide that now you need to join the holiday dates against a list of events (for example), all the data is in a single place. This segmenting of disparate layers creates tiers within your application. When each tier can be devoted to exclusive handling of the roles within its domain (database handles data, HTML handles presentation, etc), it is again easier to change or scale your application.
Last, when designing an application, one must consider the "hit by a bus principle". So you, Developer 'A', put the holidays in a PHP file. You know they are there, and when you work on the code it doesn't create a problem. Then.... you get hit by a bus. You're out of commission. Developer 'B' comes along, and now your boss wants the holiday dates changed - we don't get President's Day off any more. Um. Johnny Next Guy has no idea about your PHP file, so he has to dig. In this example, it sounds a little trivial, maybe a little silly, but again, we always design with scalability in mind. Even if you KNOW it isn't going to scale up. These standards make it easier for other developers to pick up where you left off, should you ever leave off.
The answer lies in many realms. I used to code my own software to read and write to my own flat-file database format. For small systems, with few fields, it may seem worth it. Once you learn SQL, you'll probably use it for even the smallest things.
File parsing is slow. String readers, comparing characters, looking for character sequences, all take time. SQL Databases do have files, but they are read and then cached, both more efficiently.
Updating and saving arrays requires you to read everything, rebuild everything, write everything back, and then close the file.
Options: SQL has many built-in features to do many powerful things, from putting things in order to only returning x through y results.
Security
Synchronization - say you have the same page accessed twice at the same time. PHP will read from your flat file, process, and write at the same time. The two requests will overwrite each other, resulting in data loss.
The number of features SQL provides, the ease of access, the lack of things you need to code, and plenty of other things contribute to why hard-coded arrays aren't as good.
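If you do stay with a flat file, the synchronization point above can be worked around with an exclusive lock around the read-modify-write cycle. A minimal sketch using PHP's flock(); the add_holiday() helper and the one-date-per-line file format are my own assumptions for illustration:

```php
<?php
// Hypothetical helper: add one date to a flat file under an exclusive lock,
// so two simultaneous requests cannot overwrite each other's changes.
function add_holiday($file, $date)
{
    // 'c+' opens for reading and writing, creates the file if missing, never truncates.
    $handle = fopen($file, 'c+');
    if ($handle === false) {
        return false;
    }
    // Block until no other process holds the lock.
    if (!flock($handle, LOCK_EX)) {
        fclose($handle);
        return false;
    }

    // Read the current list, one date per line, skipping empty lines.
    $contents = stream_get_contents($handle);
    $dates = array_values(array_filter(explode("\n", $contents), 'strlen'));
    if (!in_array($date, $dates, true)) {
        $dates[] = $date;
    }

    // Rewrite the whole file while still holding the lock.
    ftruncate($handle, 0);
    rewind($handle);
    fwrite($handle, implode("\n", $dates) . "\n");
    fflush($handle);
    flock($handle, LOCK_UN);
    fclose($handle);
    return true;
}

add_holiday(sys_get_temp_dir() . '/holidays_example.txt', '2024-12-25');
```

This is exactly the kind of plumbing a database gives you for free, which is part of the argument above.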
The answer is it depends on what kind of lists you are dealing with. It seems that here, your list consists of a small, fixed set of values.
For many valid reasons, database administrators like having value tables for enumerated values. It helps with data integrity and with ETL, to give two examples of why you want it.
At least in Java, for these kinds of short, fixed lists, I usually use Enums. In PHP, you can use what seems to be a good way of doing enums in PHP.
The benefit of doing this is the value is an in-memory lookup, but you can still get data integrity that DBAs care about.
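I can't reproduce the exact approach referred to above, so take this as one possible pattern rather than that one: an enum-style class built from class constants, with a reflection-based validity check. The AdType class and its values are hypothetical.

```php
<?php
// Hypothetical enum-style class: the constants are the only allowed values.
final class AdType
{
    const BANNER = 1;
    const SKYSCRAPER = 2;

    // All allowed values, read from the class constants via reflection.
    public static function values()
    {
        $reflection = new ReflectionClass(__CLASS__);
        return array_values($reflection->getConstants());
    }

    // In-memory check that mirrors what a foreign key to a value table enforces.
    public static function isValid($value)
    {
        return in_array($value, self::values(), true);
    }
}

var_dump(AdType::isValid(AdType::SKYSCRAPER)); // bool(true)
var_dump(AdType::isValid(3));                  // bool(false)
```

The DBA can still keep a matching value table with the same ids, so the database enforces integrity while the application does its lookups in memory.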
If you need to find a single piece of information out of 10, reading a file vs. querying a database may not give a serious advantage either way. Reading a single piece of data from hundreds or thousands, etc, has a serious advantage when you read from a database. Rather than load a file of some size and read all the contents, taking time and memory, querying from the database is quick and returns exactly what you query for. It's similar to writing data to a database vs text files - the insert into the database includes only what you are adding. Writing a file means reading the entire contents and writing them all back out again.
If you know you're dealing with very small numbers of values, and you know that requirement will never change, put data into files and read them. If you're not 100% sure about it, don't shoot yourself in the foot. Work with a database and you're probably going to be future proof.
This is a big question. The short answer would be, never store 'data' in a file.
First you have to deal with read/write file permission issues, which introduces security risk.
Second, you should always plan on an application growing. When the 'holiday' array becomes very large, or needs to be expanded to include holiday types, you're going to wish it was in the DB.
I can see other answers rolling in, so I'll leave it at that.
Generally, application data should be stored in some kind of storage (not flat files).
Configuration/settings can be stored in a key-value store (such as Redis) and then accessed via a REST API.
I am curious, is there any performance gain, like using less memory or resources in PHP for:
50 different setting variables saved into array like this
$config['facebook_api_secret'] = 'value here';
Or 50 different setting variables saved into a Constant like this
define('facebook_api_secret', 'value here');
I think this is in the realm of being a micro-optimization. That is, the difference is small enough that it isn't worth using one solution over the other for the sake of performance. If performance were that critical to your app, you wouldn't be using PHP! :-)
Use whatever is more convenient or that makes more sense. I put config data into constants if only because they shouldn't be allowed to change after the config file is loaded, and that's what constants are for.
In my informal tests, I've actually found that accessing/defining constants is a bit slower than normal variables/arrays.
It's not going to make a difference anyway; more than likely whatever you do with these will happen in thousandths of a second.
Optimizing your DB (indexing, using EXPLAIN to check your queries) and server setup (using APC) will make more of a difference in the long run.
Performance gains for 50 variables using a different coding technique / clever programming tricks is the wrong way to do things in PHP. Always remember: the optimizer is smarter than you are.
You will not receive any kind of performance gain for either of these. The real question is which one is more useful.
For scalar values (Strings, ints, etc) that are defined once, should never change, and need to be accessible all over the place, you should use a constant.
If you have some kind of complex nested configuration, eg:
$config->facebook->apikey = 'secret_key';
$config->facebook->url = 'http://www.facebook.com';
You may want to use an array or a configuration api provided by one of the many frameworks out there (Zend_Config isn't bad)
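For the plain-array route, a small sketch of what that could look like. The config_get() helper and its dot-separated path syntax are hypothetical, not something from Zend_Config or any other framework, and the values are the placeholders from the example above.

```php
<?php
// Nested settings kept in one array; the keys and values are placeholders.
$config = array(
    'facebook' => array(
        'apikey' => 'secret_key',
        'url'    => 'http://www.facebook.com',
    ),
);

// Hypothetical helper: read "facebook.apikey" style paths, with a default for missing keys.
function config_get(array $config, $path, $default = null)
{
    $node = $config;
    foreach (explode('.', $path) as $key) {
        if (!is_array($node) || !array_key_exists($key, $node)) {
            return $default;
        }
        $node = $node[$key];
    }
    return $node;
}

echo config_get($config, 'facebook.url'), "\n"; // http://www.facebook.com
```

A missing key returns the default instead of raising a notice, which is most of what you would want from a config wrapper for a setup this small.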