PHP Options Map Discussion - php

Every site should have options, changeable via a Control Panel. My new project is going to have this. However, I am curious as to which is the best method for storing these options.
Here are my known methods:
I have tested these, although not very thoroughly, so in all likelihood I won't have found their long-term problems or benefits.
Using a MySQL table with the fields key and value, where each row
is a new key/value pair. The downside to this is that MySQL can be
slow and would require a loop on every page load to fetch the
options from the database and parse them into an array.
Using a MySQL table with a field for each value and a single record.
The downside to this is that each new option requires a new field,
and this is not the standard use of MySQL tables, but a big benefit
is that a single fetch brings it into a PHP associative array.
Using a flat file containing the options array in serialized form,
using the PHP functions serialize and unserialize. The main
problem with this method is that I would have to read in the whole
file on every request, and serializing/unserializing can be slow, so
it would get slower as more options are created. On the plus side, it
offers a small layer of obfuscation for the data.
Using an ini file. Ini parsers are rather fast, and this option
would make it easy to pass around a site configuration. However, as
above, I would have to read the file from disk on every request, and
using an ini file with PHP is fairly uncommon. (Both file-based
approaches are sketched below.)
Other formats, such as XML and JSON, have all been considered
too. However, they all require some sort of storage, and I am mostly
curious about the benefits of each kind of storage.
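To make the file-based options above concrete, here is roughly what the serialized flat file and the ini file look like in code (file names are made up; this is a sketch, not a finished implementation):

    <?php
    // Serialized flat file: one read plus one unserialize per request.
    $options = unserialize(file_get_contents(__DIR__ . '/options.dat'));

    // Writing back after a Control Panel change:
    $options['site_name'] = 'My Site';
    file_put_contents(__DIR__ . '/options.dat', serialize($options), LOCK_EX);

    // Ini file: parse_ini_file() is built into PHP, no extension required.
    $options = parse_ini_file(__DIR__ . '/options.ini', true); // true = keep [sections]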
These are my specific (idealistic) requirements: the basics I am looking for are speed, security, and portability. I want the configuration to not be human-readable (so an unencrypted flat file is out), to be easily portable (ruling out MySQL), and to have an almost-zero but constant performance impact (which rules out most of the options above).
I am not trying to ask people to write code for me, or anything like that. I just need a second pair of eyes on this problem, possibly bringing up points that I never factored in.
Thank you for your help. - Daniel.

Using a MySQL table with the fields key and value, where each row is a
new key/value pair. The downside to this is that MySQL can be slow and
would require a loop on every page load to fetch the options from the
database and parse them into an array.
That is false. Unless you plan on storing a couple hundred million configuration pairs, you will be fine and dandy. If you are worried about performance with this method, simply cache the query (and wipe the cache only when you make changes to the table).
This will also give you the most flexibility, ease of use, and so on.
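As a rough illustration of that suggestion (table, column, and file names here are hypothetical), the whole key/value table can be pulled into one array and cached until an option actually changes:

    <?php
    // Minimal sketch: load every option from a key/value table, with a simple
    // file cache that the Control Panel deletes whenever it saves an option.
    function load_options(PDO $pdo, string $cacheFile): array
    {
        if (is_file($cacheFile)) {
            return unserialize(file_get_contents($cacheFile));
        }

        $options = [];
        foreach ($pdo->query('SELECT option_key, option_value FROM options') as $row) {
            $options[$row['option_key']] = $row['option_value'];
        }

        file_put_contents($cacheFile, serialize($options), LOCK_EX);
        return $options;
    }

    // $pdo is an existing PDO connection to the site database.
    $options = load_options($pdo, __DIR__ . '/cache/options.cache');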

Related

MySQL + PHP: ~500kb JSON file - Loading data as tables and fields VS as a single serialized variable

I am making a website that interacts with an offline project through json files sent from the offline project to the site.
The site will need to load these files and manipulate the data.
Is it feasible with modern computing power to simply load these files into the database as a single serialized field, which can then be loaded and decoded for every use?
Or would it save significant overhead to properly store the JSON as tables and fields and refer to those for every use?
Without knowing more about the project, a table with multiple fields is probably the better solution.
There will be more options for the data in the long run; for example, indexing fields, searching through fields, and many other MySQL operations that would not be possible if it were all stored in a single variable.
Consider future versions of the project too: for example, adding another field to a table is easy, but adding another field to a block of JSON would be more difficult.
Consider project growth as well: if you experience 100x or 1000x growth, will the table handle the extra load?
500 KB is a relatively small data block, so there shouldn't be any issue with computing power regardless of which method is used, although more information would be handy here: for example, is it 500 KB per user or per upload, how many stores per day, and how often is it accessed?
Debugging will also be easier.
The new MySQL Shell has a bulk JSON loader that is not only very quick but also gives you a lot of control over how the data is handled. See https://dev.mysql.com/doc/mysql-shell/8.0/en/mysql-shell-utilities-json.html
Load it as JSON.
Think about what queries you need to perform.
Copy selected fields into MySQL columns so that those queries can use MySQL's WHERE, GROUP BY, and ORDER BY instead of having to do the processing in the client.
A database table contains a bunch of similarly structured rows. Each row has a constant set of columns. (NULLs can be used to indicate missing columns for a given row.) JSON complicates things by providing a complex column. My advice above is a compromise between the open-ended flexibility of JSON and the need to use the database server to process lots of data.
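A rough sketch of that compromise on MySQL 5.7+ (table, column, and JSON field names are made up): keep the raw JSON, but copy the fields you filter or sort on into real, indexable columns, here via generated columns.

    <?php
    // $pdo is an existing PDO connection.
    $pdo->exec("
        CREATE TABLE IF NOT EXISTS uploads (
            id      INT AUTO_INCREMENT PRIMARY KEY,
            payload JSON NOT NULL,
            -- copied out of the JSON so WHERE / ORDER BY can use an index
            user_id INT AS (payload->>'\$.user_id') STORED,
            INDEX idx_user (user_id)
        )
    ");

    // Store the whole document once...
    $stmt = $pdo->prepare('INSERT INTO uploads (payload) VALUES (?)');
    $stmt->execute([json_encode(['user_id' => 42, 'items' => []])]);

    // ...and let MySQL do the filtering instead of decoding everything in PHP.
    $rows = $pdo->query('SELECT payload FROM uploads WHERE user_id = 42')->fetchAll();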

All WordPress options as single option (serialized multidimensional array), or multiple options?

I know my way around WordPress, but right now I'm developing a rather big and advanced WordPress plugin.
For this reason I've put a lot of thought into my data structure.
When I was a beginner I always used to save each option on its own, like this: get_option('prefix_option_name').
Then I started using multidimensional arrays, registering one for each settings section, and now I typically save all plugin options in one big multidimensional array, like this: plugin_options[section][option][possibly more subs here][etc]
This does work fine, and I do like the fact that I can just pull all the options out once in the init hook ($plugin_options = get_option('plugin_options')), so I can work with the $options "locally" in the plugin, HOWEVER...
Taking into account that WordPress already caches the get_option call (via the WP Cache API), which is better for performance? Even though my plugin has a lot of options, I guess you could never reach the limit of the longtext type (4 GB of data or something), even if I packed it all into one single serialized multidimensional array? But I want to do what's best from a performance point of view, so in short, here's my question again:
What is best (for a rather big and complex WordPress plugin)?
Saving all your plugin options as a single option (serialized multidimensional array), e.g. name='plugin_options[section][option]'
Splitting each options tab and options page into its own options entry, e.g. section[option][etc]
Simply prefixing all your plugin options and storing them as separate DB entries, e.g. pluginname_option_1, pluginname_option_2
I like the "single plugin option" approach, but right now I'm confused as to whether or not fetching/updating one big array from the DB really is the best way to go if the array gets REALLY big - like in a very big and advanced plugin.
The problem with 3, as I see it, is that with 1 you would only need one DB call to fetch all options, whereas with 3 (where you save each option as its own DB entry) you would have to query the DB for each specific and individual option.
But which is better: one call for all options, one per section, or one per individual option? (I guess my question could be narrowed down to this in the end :D) Can the serialized "single option" multidimensional array realistically grow too big? Should it be split up?
Look forward to hearing your opinions on this. Cheers. :-)
I tend to prefer separate options for unrelated data. In most cases it doesn't matter for performance, but there are significant benefits compared to combining them.
Performance
If the options are autoloaded -- which they are by default -- then using separate options won't result in any extra database queries.
If you're using Memcache, then by default objects are limited to 1 MB, and if your option grows beyond that, it will bypass caching and hit the database every time.
Other Considerations
I think Core has been designed with the assumption that options will be separated, so if you combine them then you have to do extra work to utilize some of the helpful things that Core otherwise gives you for free.
For example, all of these common practices are easy to do with individual options, but require extra logic to reduce the combined option to the targeted entry:
pre_update_option_{$option_name}
Sanitization callbacks with register_setting
Retrieving settings from the REST API
It can also be an "all your eggs in one basket" problem; if a bug or some other factor causes unexpected data loss, the user can lose everything, instead of just a single datum. That's rare in practice, but I've seen it happen, with very damaging effects. People should have backups, but many don't, and I've also seen backups fail in practice.
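As a rough example of that "free" Core plumbing (the option name and settings group below are made up), an individual option gets its own sanitization, default, and REST exposure in a single register_setting() call:

    <?php
    add_action( 'init', function () {
        register_setting( 'myplugin', 'myplugin_items_per_page', array(
            'type'              => 'integer',
            'default'           => 10,
            'sanitize_callback' => 'absint', // runs automatically on every save
            'show_in_rest'      => true,     // exposed via the settings REST endpoint
        ) );
    } );

    // Per-option hook: fires only when this specific option is updated.
    add_filter( 'pre_update_option_myplugin_items_per_page', function ( $value, $old_value ) {
        return max( 1, (int) $value );
    }, 10, 2 );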
Personally, I usually go for the single option.
If you're using multiple options in a single page load, then each of those would be one DB call (unless they're autoloaded). If you're using all or most of your options on a page anyway, then you would be saving a DB call for each option by putting them all in a single option. The savings here can stack up quickly.
However, this still depends on how much data you're putting in your single option. If it's quite large, then unserializing or serializing your data might have a performance impact on your server (see: serialize a large array in PHP?). You will have to do some sort of benchmark to measure which is faster if you're handling lots of data.
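For comparison, a minimal sketch of the single-option approach being discussed (option and array keys are hypothetical):

    <?php
    // One database read (or object-cache hit) for the whole array.
    $plugin_options = get_option( 'plugin_options', array() );

    // Read nested values "locally", with fallbacks.
    $per_page = isset( $plugin_options['display']['per_page'] )
        ? (int) $plugin_options['display']['per_page']
        : 10;

    // Updating any single value means re-saving (and re-serializing) the whole array.
    $plugin_options['display']['per_page'] = 20;
    update_option( 'plugin_options', $plugin_options );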
FWIW, WordPress already autoloads and caches all options on each page load (unless an option is saved as not to do so), so it doesn't really matter.

when should I use a static array instead of a new table in my database?

I've implemented an Access Control List using 2 static arrays (for the roles and the resources), but I added a new table in my database for the permissions.
The idea of using a static array for the roles is that we won't create new roles all the time, so the data won't change often. I thought the same for the resources, also because I think the resources are something that only the developers should handle, because they're more related to the code than to data. Do you know of any reasons to use a static array instead of a database table? When/why?
The problem with hardcoding values into your code is that compared with a database change, code changes are much more expensive:
You usually need to create a new package to deploy. That package would need to be regression tested to verify that no bugs have been introduced. Hint: even if you only change one line of code, regression tests are necessary to verify that nothing went wrong in the build process (e.g. a library isn't correctly packaged, causing a module to fail).
Updating code can mean downtime, which also increases risk: what if the update fails? There is always a risk of that.
In an enterprise environment it is usually a lot quicker to get DB updates approved than code changes.
All that costs time/effort/money. Note that, in my opinion, holding reference or static data in a database does not mean a performance hit, because the data can always be cached.
Your static array is an example of 'hard-coding' your data into your program, which is fine if you never ever want to change it.
In my experience, for your use case, this is not ever going to be true, and hard-coding your data into your source will result in you being constantly asked to update those things you assume will never change.
Protip: to a project manager and/or client, nothing is immutable.
I think this just boils down to how you think the database will be used in the future. If you leave the data in arrays, and then later want to create another application that interacts with this database, you will start to have to maintain the roles/resources data in both code bases. But, if you put the roles/resources into the database, the database will be the one authority on them.
I would recommend putting them in the database. You could read the tables into arrays at startup, and you'll have the same performance benefits and the flexibility to have other applications able to get this information.
Also, when/if you get to writing a user management system, it is easier to display the roles/resources of a user by joining the tables than it is to get back the roles/resources IDs and have to look up the pretty names in your arrays.
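A minimal sketch of that setup (table and column names are made up; $pdo is an existing PDO connection): the roles stay in the database, but each request only pays for one small query at startup.

    <?php
    function load_roles(PDO $pdo): array
    {
        $roles = [];
        foreach ($pdo->query('SELECT id, name FROM roles') as $row) {
            $roles[(int) $row['id']] = $row['name'];
        }
        return $roles;
    }

    // From here on the access pattern is the same as with a hard-coded static array,
    // but other applications (or a user-management UI) can read the same table.
    $roles = load_roles($pdo);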
With static arrays you get performance, since you do not need to hit the database all the time, but safety is more important than performance, so I suggest you keep the permission control in the database.
Read up on RBAC.
Things considered static should be coded static. That is, if you really consider them static.
But I suggest using class constants instead of static array values.
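A short sketch of what that could look like (the role names and the surrounding check are purely illustrative):

    <?php
    final class Role
    {
        public const ADMIN  = 'admin';
        public const EDITOR = 'editor';
        public const VIEWER = 'viewer';
    }

    // Referencing Role::EDITOR instead of a loose 'editor' string gives you
    // autocompletion, a single place to rename, and protection against typos.
    // $currentRole would come from your session / ACL layer.
    $canEdit = in_array($currentRole, [Role::ADMIN, Role::EDITOR], true);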

Which is better? An extra database call or a generated PHP file?

I want to add some static information, associated with string keys, to all of my pages. The individual PHP pages use some of that information, filtered by a query string. Which is the better approach to add this information? Generate a 100 KB (or larger, if more info is needed later) PHP file with an associative array, or add another DB table with this info and query that?
The first solution involves loading the 100 KB file every time, even if I use only some of the information on the current page. The second, on the other hand, adds an extra database call to the rendering of every page.
Which is the less costly if there are a large number of pages? Loading a PHP file or making an extra db call?
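For reference, the "generated PHP file" option could look roughly like this (file name and keys are made up); with an opcode cache the include is close to free after the first hit:

    <?php
    // A build or admin step writes the whole map out once:
    $info = ['home.title' => 'Home', 'about.title' => 'About' /* ... */];
    file_put_contents(
        __DIR__ . '/static_info.generated.php',
        '<?php return ' . var_export($info, true) . ';',
        LOCK_EX
    );

    // Every page then just includes it and picks the keys it needs:
    $info  = require __DIR__ . '/static_info.generated.php';
    $title = $info['home.title'] ?? 'Untitled';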
Unless it is shown to really be a bottleneck (be it including the PHP file or querying the database), you should choose the option that is most maintainable.
My guess is that it is the second option. Store it in a database.
Storing it in a database is a much better plan. With the database you can provide better data constraints, more easily cross reference with other data and create strong relationships. You may or may not need that at this time, but it's a much more flexible solution in the end.
What is the data used for? I'm wondering if the data you need could be stored in a session variable/cookie once it is pulled from the database which would allow you to not query the db on the rendering of every page.
If you were to go with a PHP file, then utilizing APC or some other opcode cache will mitigate the performance concerns, as your PHP file will only be re-parsed when it changes.
However, as others have noted, a database is the best place to store this stuff as it is much easier to maintain (this should be your priority to begin with).
Having ensured ease of maintenance and a working application, should you require a performance boost then generally accepted practice would be to cache this static data in an in-memory key/value store such as memcached. This will give you rapid access to your static values (for most requests).
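A minimal sketch of that memcached layer (key, table, and credentials are made up; assumes the pecl memcached extension):

    <?php
    $memcached = new Memcached();
    $memcached->addServer('127.0.0.1', 11211);

    $static = $memcached->get('site_static_info');
    if ($static === false) {
        // Cache miss: fall back to the database, then prime the cache.
        $pdo    = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
        $static = $pdo->query('SELECT info_key, info_value FROM static_info')
                      ->fetchAll(PDO::FETCH_KEY_PAIR);
        $memcached->set('site_static_info', $static, 3600); // refresh hourly
    }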
I wouldn't call this information "static".
To me, it's just a routine call to get some information from the database, among the other calls being made to assemble the whole page. What am I missing?
And I do agree with Dennis: all optimizations should be based on real needs and profiling. Otherwise their effect could be the opposite.
If you want to utilize some caching, consider implementing a Conditional GET for the whole page.

Is PHP serialization a good choice for storing data of a small website modified by a single person

I'm planning a PHP website architecture. It will be a small website with few visitors and small set of data. The data is modified exclusively by a single user (administrator).
To make things easier, I don't want to bother with a real database or XML data. I think about storing all data through PHP serialization into several files. So for example if there are several categories, I will store an array containing Category class instances for each category.
Are there any pitfalls using PHP serialization in those circumstances?
Use a database -- it is not that difficult, and any extra time spent will be well spent learning database use.
The pitfalls I see are as Yehonatan mentioned:
1. Maintenance and adding functionality.
2. No easy way to query or look at data.
3. Very insecure -- take a look at "hackthissite.org". A lot of the beginner examples have to do with hacking sites where someone put hard-coded data in files.
4. Serialization will work for one array, meaning one table. If you have to do anything like having parent categories that need to match up to other data, it's not going to work so well.
The pitfalls come with maintenance and adding functionality.
It is a very good way to learn, but you will appreciate databases more after the lessons.
I tried using PHP serialization to store website data. For those who want to do the same thing, here's some feedback from a project started a few months ago and heavily modified since:
Pros:
It was very easy to load and save data. I don't have to write SQL queries, optimize them, etc. The code is shorter (with parametrized SQL queries, it may grow a lot).
The deployment does not require additional effort. We don't care about what is supported on the web server: if there is just PHP with no additional extensions, database servers, etc., the website will still work. Sqlite is a good thing, but it is not possible to install it on some servers, and it also requires a PHP extension.
We don't have to care about updating a database server, nor about the database server to use (thus avoiding the scenario where the customer wants to migrate from Microsoft SQL Server to Oracle, etc.).
We can add more properties to the objects without having to break everything (just like we can add other columns to the database).
Cons:
Like Kerry said in his answer, there is "no easy way to query or look at data". It means that any business intelligence/statistics cases are impossible or require a huge amount of work. By the way, some basic scenarios become extremely complicated. Let's say we store products and we want to know how many products there are. Instead of just writing select count(1) from Products, in my case it requires creating a PHP file just for that, loading all the data, then counting the number of items, sometimes adding stuff manually.
Some changes required implementing data migration, which was painful and required more work than just executing an SQL query.
To conclude, I would recommend using PHP serialization for storing data of a small website modified by a single person only if all the following conditions are true:
The deployment context is unknown and there are chances to have a server which supports only basic PHP with no extensions,
Nobody cares about business intelligence or similar usages of the information,
There will be no changes to the requirements with large impact on the data structure.
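For anyone weighing the same trade-off, the load/save core of this approach is only a few lines (the file name is illustrative and error handling is omitted):

    <?php
    const DATA_FILE = __DIR__ . '/data/site-data.ser';

    function load_data(): array
    {
        return is_file(DATA_FILE)
            ? (unserialize(file_get_contents(DATA_FILE)) ?: [])
            : [];
    }

    function save_data(array $data): void
    {
        // LOCK_EX guards against a half-written file if two saves overlap.
        file_put_contents(DATA_FILE, serialize($data), LOCK_EX);
    }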
I would say use a small database like sqlite if you don't want to go through setting up a full db server. However I will also say that serializing an array and storing that in a text file is pretty dang fast. I've had to serialize an array with a few thousand records (a dump from a database) and used that as a temp database when our DB server was being rebuilt for a few days.
