Why does vBulletin use ENUMs? - php

Today, I've had some debate with my colleague about choosing data types in our projects.
We're web developers; we write our back-end code in PHP and use MySQL for the database.
So I searched the internet a bit, and the advice I found recommends against the ENUM data type for various reasons (I've also read here on SO that it is not recommended). For ENUM('yes','no'), for example, you should use tinyint(1) instead.
If ENUMs are bad and should be avoided, why does vBulletin, for example, use them?
Why use them at all when you can use VARCHAR, TEXT and so on, and enforce the use of 1 of 2 possible values in PHP?
Thank you for your answers.

Enums aren't ideal, but they are waaaay better than your alternative suggestion of using a VARCHAR and enforcing one of a few possible values!
Enums store their data as a numeric value. This is ideal for storing a field with a limited set of possible values such as 'yes' or 'no', because it uses the minimum amount of space, and gives the quickest possible access, especially for searches.
Where enums fall over is if you later need to add additional values to the list. Let's say you need to have 'maybe' as well as 'yes' or 'no'. Because it's stored in an enum, this change requires a database change. This is a bad thing for several reasons - for example, if you have a large data set, it can take a significant amount of time to rebuild the table.
The solution to this is to use a related table which stores a list of possible values, and your original field would now simply contain an ID reference to your new table, and queries would make a join to the lookup table to get the string value. This is called "normalisation" and is considered good database practice. It's a classic relational database scenario to have a large number of these lookup tables.
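The lookup-table approach described above can be sketched as follows. This is only an illustration, not anyone's production schema: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere, and the table and column names (answer_options, survey_answers, option_id) are made up for the example.

```python
import sqlite3


def build_demo():
    """Create a lookup table of allowed values plus a table that references it."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE answer_options (
            id    INTEGER PRIMARY KEY,
            label TEXT NOT NULL UNIQUE
        );
        CREATE TABLE survey_answers (
            id        INTEGER PRIMARY KEY,
            option_id INTEGER NOT NULL REFERENCES answer_options(id)
        );
        INSERT INTO answer_options (id, label) VALUES (1, 'yes'), (2, 'no');
        INSERT INTO survey_answers (option_id) VALUES (1), (2), (1);
    """)
    return conn


def answers_with_labels(conn):
    """Join to the lookup table to get the string value back."""
    rows = conn.execute("""
        SELECT a.id, o.label
        FROM survey_answers AS a
        JOIN answer_options AS o ON o.id = a.option_id
        ORDER BY a.id
    """)
    return rows.fetchall()


def add_option(conn, label):
    """Adding a new allowed value is a plain INSERT, not a schema change."""
    return conn.execute(
        "INSERT INTO answer_options (label) VALUES (?)", (label,)
    ).lastrowid


def record_answer(conn, option_id):
    """Return True if the answer was stored, False if the value is not allowed."""
    try:
        conn.execute(
            "INSERT INTO survey_answers (option_id) VALUES (?)", (option_id,)
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

Adding 'maybe' is then a call like add_option(conn, 'maybe'), and a reference to an option that does not exist is rejected by the foreign key rather than by application code.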
Obviously, if you are fairly sure that the field will never store anything other than 'yes' or 'no', then it can be overkill to have a whole extra table for it, and an enum may be appropriate.
Some database products do not even provide an enum data type, so if you're using these DBs, you are forced to use the lookup table solution (or just a simple numeric field, and map the values in your application).
What is never appropriate in this situation is to use an actual string value in the table. This is considered extremely poor practice.
VARCHARs take up much more disk space than the numeric values used by an enum. They are also slower to read, and slower to look up in a query. In addition, they remove the enforcement of fixed values provided by enum. This means that a bug in your program could result in invalid values going into the data, as could an inadvertent update using PHPMyAdmin or a similar tool.
I hope that helps.

Related

Should I store product lists to users with arrays in MySQL?

Well, I have a big dilemma about how to store product lists when I need to link those lists to an ID or something like that.
For this first example I'm going to use the user cart.
I've always used this, even though I've never liked it:
ID | PRODUCT
12 | Ring
12 | Necklace
12 | Bracelet
But lately I've been thinking about storing arrays in MySQL. While it sounds like a good idea (at first glance), by using arrays I'd only be able to manage the carts through PHP, by creating some kind of control panel or something.
Unfortunately, there is a downside. Although the whole system takes less space than the other way, I wouldn't be able to handle things through MySQL itself. For example, if someone makes an order, I wouldn't be able to SUM the prices*quantity to get the order value.
What is the best way? Is there another way?
As soon as you store unstructured information in MySQL you lose most of the benefits of using a relational database and you only have an over-engineered file system (with, that's true, excellent multi-user capabilities). If you aren't familiar with SQL (which I suspect is the case) you'll initially think you're speeding development. But one day, if the shop hopefully grows, you'll realise that retrieving everything for further PHP postprocessing doesn't scale well.
While it can certainly make sense to outsource certain features (not the complete app) to a NoSQL database if you have a high-concurrency site, if you use a relational database you'd better use it as it's meant to be used: with properly structured and normalised information.
If you are going to need these operations over the elements:
searching
sorting
filtering elements
aggregate functions
I would recommend storing the elements in separate rows. This way you get full control over the elements in "your arrays".
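As a rough sketch of what one row per cart item buys you, the order value becomes a single aggregate query. The example uses Python's sqlite3 module in place of MySQL so that it is runnable as-is; the cart_items table, its columns and the prices are invented for illustration (the products and cart ID 12 are taken from the question).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE cart_items (
        cart_id     INTEGER NOT NULL,
        product     TEXT    NOT NULL,
        price_cents INTEGER NOT NULL,  -- hypothetical prices, stored in cents
        quantity    INTEGER NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO cart_items VALUES (?, ?, ?, ?)",
    [
        (12, "Ring", 2500, 1),
        (12, "Necklace", 4000, 2),
        (12, "Bracelet", 1500, 1),
    ],
)


def order_total(conn, cart_id):
    """Let the database compute SUM(price * quantity) for one cart."""
    row = conn.execute(
        "SELECT SUM(price_cents * quantity) FROM cart_items WHERE cart_id = ?",
        (cart_id,),
    ).fetchone()
    # SUM over zero rows yields NULL, which arrives in Python as None.
    return row[0] or 0


def products_sorted(conn, cart_id):
    """Sorting and filtering also stay in SQL instead of PHP post-processing."""
    rows = conn.execute(
        "SELECT product FROM cart_items WHERE cart_id = ? ORDER BY product",
        (cart_id,),
    )
    return [r[0] for r in rows]
```

With the sample rows, order_total(conn, 12) adds 2500 + 2 * 4000 + 1500, and an unknown cart ID simply totals 0.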
"I wouldn't be able to handle things through MySQL itself. Like, if someone make an order, I wouldn't be able to SUM the prices*quantity to get the order value."
It seems you understand why your 'arrays' idea isn't a good one :)
Consider that XML (or JSON or similar) was invented because distributed systems get disconnected. It is OK to store such 'complex' values in a database, even better if the DBMS has support for XML as a data type.
However, if you need to query the 'complex' value in the DBMS then you should be looking for first class support for such data types. For example, most SQL DBMSs have support for a timestamp/datetime temporal type: being able to declare a column of that type and compare two values of the type is one thing but what you really need are the temporal functions that will e.g. get time granules (years, days, etc), formats (weekday etc), support UTC and timezones, etc.
For your 'arrays' I'm guessing you will need to handle serialization and storage yourself, plus all that entails e.g. convert to a delimited string, escaping delimiting characters, and storing in a text column. Once you have done that work you will be left with no support on the DBMS side.

Using VARCHAR in MySQL for everything! (on small or micro sites)

I tried searching for this as I felt it would be a commonly asked beginner's question, but I could only find things that nearly answered it.
We have a small PHP app that is at most used by 5 people (total, ever) and maybe 2 simultaneously, so scalability isn't a concern.
However, I still like to do things in a best practice manner, otherwise bad habits form into permanent bad habits and spill into code you write that faces more than just 5 people.
Given this context, my question is: is there any strong reason to use anything other than VARCHAR(250+) in MySQL for a small PHP app that is constantly evolving/changing? If I picked INT but that later needed to include characters, it would be annoying to have to go back and change it when I could have just future-proofed it and made it a VARCHAR to begin with. In other words, choosing anything other than VARCHAR with a large character count seems pointlessly limiting for a small app. Is this correct?
Thanks for reading and possibly answering!
If you have the numbers 1 through 12 in VARCHAR, and you need them in numerical order, you get 1,10,11,12,2,3,4,5,6,7,8,9. Is that OK? Well, you could fix it in SQL by saying ORDER BY col+0. Do you like that kludge?
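The ordering problem is easy to reproduce. The sketch below uses Python's sqlite3 module rather than MySQL so that it can be run directly; the table names as_text and as_int are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE as_text (col TEXT)")
conn.execute("CREATE TABLE as_int (col INTEGER)")

numbers = range(1, 13)
conn.executemany("INSERT INTO as_text VALUES (?)", [(str(n),) for n in numbers])
conn.executemany("INSERT INTO as_int VALUES (?)", [(n,) for n in numbers])

# Strings compare character by character, so '10' sorts before '2'.
text_order = [r[0] for r in conn.execute("SELECT col FROM as_text ORDER BY col")]

# The kludge: adding 0 forces a numeric comparison for the sort.
kludge_order = [
    r[0] for r in conn.execute("SELECT col FROM as_text ORDER BY col + 0")
]

# With a numeric column no trick is needed.
int_order = [r[0] for r in conn.execute("SELECT col FROM as_int ORDER BY col")]

print(text_order)
print(kludge_order)
print(int_order)
```

The first list comes out in the 1, 10, 11, 12, 2, ... order described above, while the other two are in numeric order.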
One of the major drawbacks will be that you will have to add consistency checks in your code. For a small, private database, no problem. But for larger projects...
Using the proper types will do a lot of checks automatically. E.g., are there any wrong characters in the value; is the date valid...
As a bonus, it is easy to add extra constraints when using right types; is the age less than 110; is the start date less than the end date; is the indexing an existing value in another table?
I prefer to make the types as specific as possible. Although server errors can be nasty and hard to debug, it is way better than having a database that is not consistent.
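A minimal sketch of the kind of checks mentioned above (age below 110, start date before end date), again using Python's sqlite3 module as a stand-in for MySQL. The bookings table and its columns are hypothetical. Two caveats: SQLite has no real date type, so the dates are ISO-8601 strings, which happen to compare correctly as text; and MySQL only enforces CHECK constraints from version 8.0.16 onward.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE bookings (
            id         INTEGER PRIMARY KEY,
            age        INTEGER NOT NULL CHECK (age >= 0 AND age < 110),
            start_date TEXT    NOT NULL,
            end_date   TEXT    NOT NULL,
            CHECK (start_date < end_date)
        )
    """)
    return conn


def try_insert(conn, age, start_date, end_date):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(
            "INSERT INTO bookings (age, start_date, end_date) VALUES (?, ?, ?)",
            (age, start_date, end_date),
        )
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_db()
accepted = try_insert(conn, 42, "2024-01-01", "2024-01-05")
too_old = try_insert(conn, 150, "2024-01-01", "2024-01-05")
backwards = try_insert(conn, 42, "2024-02-01", "2024-01-01")
```

Only the first insert succeeds; the other two are refused by the database itself, without any validation code in the application.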
It's probably not a great idea to make a habit out of it, as it will become inefficient with any real amount of data. If you use the text type, the amount of storage space used for the same amount of data will differ depending on your storage engine.
If you do as you suggested, don't forget that values that would normally be of a numeric type will come back from the database as strings in PHP. For example, if you store the value "123" as a varchar or text type and retrieve it as $someVar, you should convert it explicitly:
$someVar = intval($someVar);
PHP will usually coerce a numeric string in arithmetic on its own, but without the conversion the value is still a string, so a strict comparison such as $someVar === 123 is false.
As you may already know, VARCHAR columns are variable-length strings, so each value only takes the space it actually needs (plus a small length prefix).
VARCHAR is stored inline with the table, which makes it faster when the size is reasonable.
If your app needs performance, you can go with CHAR, which is a little faster than VARCHAR.

MySql Tinytext vs Varchar vs Char

Building a system that has the potential to get hammered pretty hard with hits and traffic.
It's a typical Apache/PHP/MySql setup.
I have built plenty of systems before, but never had a scenario where I really had to make decisions regarding potential scalability of this size. I have dozens of questions regarding building a system of this magnitude, but for this particular question, I am trying to decide on what to use as the data type.
Here is the 100ft view:
We have a table which (among other things) has a description field. We have decided to limit it to 255 characters. It will be searchable (ie: show me all entries with description that contains ...). Problem: this table is likely to have millions upon millions of entries at some point (or so we think).
I have not yet figured out the strategy for the search (I am guessing the MySQL LIKE operator is likely to be slow and/or a hog for such a large number of records), but that's for another SO question. For this question, I am wondering what the pros and cons are of creating this field as a tinytext, varchar, or char.
I am not a database expert, so any and all commentary is helpful. Thanks -
Use a CHAR.
BLOB's and TEXT's are stored outside the row, so there will be an access penalty to reading them.
VARCHAR's are variable length, which saves storage space but could introduce a small access penalty (since the rows aren't all fixed length).
If you create your index properly, however, either VARCHAR or CHAR can be stored entirely in the index, which will make access a lot faster.
See: varchar(255) v tinyblob v tinytext
And: http://213.136.52.31/mysql/540
And: http://forums.mysql.com/read.php?10,254231,254231#msg-254231
And: http://forums.mysql.com/read.php?20,223006,223683#msg-223683
Incidentally, in my experience the MySQL regex operator is a lot faster than LIKE for simple queries (i.e., SELECT ID WHERE SOME_COLUMN REGEXP 'search.*'), and obviously more versatile.
I believe with varchar you've got a variable length stored in the actual database at the low levels, which means it could take less disk space, whereas with the text field it's fixed length even if a row doesn't use all of it. The fixed length string should be faster to query.
Edit: I just looked it up, text types are stored as variable length as well. Best thing to do would be to benchmark it with something like mysqlslap
In regards to your other un-asked question, you'd probably want to build some sort of a search index that ties every useful word in the description field individually to a description; then you can index that and search it instead. It will be way, way faster than using LIKE '%...%'.
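The word-to-description index suggested above is usually called an inverted index. Here is a toy in-memory sketch in Python, just to show the idea; the function names and the sample descriptions are invented, and a real system would keep the index in its own table or hand the job to a full-text engine.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9]+")


def build_word_index(descriptions):
    """Map each lowercase word to the set of entry IDs whose description contains it."""
    index = defaultdict(set)
    for entry_id, text in descriptions.items():
        for word in WORD.findall(text.lower()):
            index[word].add(entry_id)
    return index


def search(index, query):
    """Return the IDs of entries that contain every word of the query."""
    words = WORD.findall(query.lower())
    if not words:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample data: entry ID -> description.
descriptions = {
    1: "Red leather sofa",
    2: "Blue leather chair",
    3: "Red wooden chair",
}
index = build_word_index(descriptions)
```

A lookup such as search(index, "red chair") only touches the two small sets for "red" and "chair" instead of scanning every description, which is the whole point compared with LIKE '%...%'.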
In your situation all three types are bad if you'll use LIKE (a LIKE '%string%' won't use any index created on that column, regardless of its type) . Everything else is just noise.
I am not aware of any major difference between TINYTEXT and VARCHAR up to 255 chars, and CHAR is just not meant for variable length strings.
So my suggestion: pick VARCHAR or TINYTEXT (I'd personally go for VARCHAR) and index the content of that column using a full text search engine like Lucene, Sphinx or any other that does the job for you. Just forget about LIKE (even if that means you need to custom build the full text search index engine yourself for whatever reasons you might have, i.e. you need support for a set of features that no engine out there can satisfy).
If you want to search among millions of rows, store all these texts in a different table (which will decrease row size of your big table) and use VARCHAR if your text data is short, or TEXT if you require greater length.
Instead of searching with LIKE use a specialized solution like Lucene, Sphinx or Solr. I don't remember which, but at least one of them can be easily configured for real-time or near real-time indexing.
EDIT
My proposal of storing the text in a different table reduces the IO required for the main table, but it requires maintaining an additional index when data is inserted and adds join overhead to selects, so it is only valid if you use your table to read a few descriptions at once and the other data from the table is used more often.

What is the best strategy to store user searches for an email alert?

Users can do advanced searches (there are many possible parameters):
/search/?query=toto&topicId=12&minimumPrice=0&maximumPrice=1000
I would like to store the search parameters (after the /search/?) for an email alert.
I have 2 possibilities:
1. Storing the raw request (query=toto&topicId=12&minimumPrice=0&maximumPrice=1000) in a table with a structure like id, parameters.
2. Storing the request in a structured table id, query, topicId, minimumPrice, maximumPrice, etc.
Each solution has its pros and cons. Of course the solution 2 is the cleaner, but is it really worth the (over)effort?
If you already have implemented such a solution and have experienced the maintenance of it, what is the best solution?
The better solution should be the best for each dimension:
Rigidity
Fragility
Viscosity
Performance
Daniel's solution is likely to be the cleanest solution, but I get your point about performance. I'm not very familiar with PHP, but there should be some db abstraction library that takes care of relations and multiple inserts so that you get the best performance, right? I only mention it because there may not be a real performance issue. Do you have load tests that point to an issue, perhaps?
Anyway, if it is between your original 2 solutions, I would have to select the first. Having a table with column names (like your solution #2) is just asking for trouble. If you add new params, you have to modify the table columns. And there is the ever present issue of "what do we put to indicate not selected vs left empty?"
So I don't agree that solution 2 is cleaner.
You could have a table consisting of three columns: search_id, key, value with the two first being the primary key. This way you can reconstruct a particular search if you have the ID of a saved search. This also allows you to expand with additional search keywords without having to actually modify your table.
If you wish, you can also have key be a foreign key to another table containing valid search terms to ensure integrity. Whether you want to do that depends on your specific needs though.
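A rough sketch of the three-column layout, using Python's sqlite3 and urllib.parse so it can be run as-is. The table name saved_search_params and the helper functions are made up for the example. The columns are called param_name and param_value rather than key and value because KEY is a reserved word in MySQL and would need quoting there.

```python
import sqlite3
from urllib.parse import parse_qsl, urlencode


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE saved_search_params (
            search_id   INTEGER NOT NULL,
            param_name  TEXT    NOT NULL,
            param_value TEXT    NOT NULL,
            PRIMARY KEY (search_id, param_name)
        )
    """)
    return conn


def save_search(conn, search_id, query_string):
    """Split the raw query string and store one row per parameter."""
    pairs = parse_qsl(query_string, keep_blank_values=True)
    # One executemany call instead of a separate round trip per parameter.
    conn.executemany(
        "INSERT INTO saved_search_params VALUES (?, ?, ?)",
        [(search_id, name, value) for name, value in pairs],
    )


def load_search(conn, search_id):
    """Rebuild the query string for a saved search, in insertion order."""
    rows = conn.execute(
        "SELECT param_name, param_value FROM saved_search_params "
        "WHERE search_id = ? ORDER BY rowid",
        (search_id,),
    ).fetchall()
    return urlencode(rows)


def searches_with(conn, name, value):
    """Find saved searches by one parameter, which a raw string cannot do cleanly."""
    rows = conn.execute(
        "SELECT search_id FROM saved_search_params "
        "WHERE param_name = ? AND param_value = ? ORDER BY search_id",
        (name, value),
    )
    return [r[0] for r in rows]
```

Note that the (search_id, param_name) primary key allows each parameter only once per search; if repeated parameters are possible, the key would need an extra column.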
Well that's completely dependent on what you want to do with the data. For the PHP part, you need to process it anyway, either on insertion or selection time.
For a really large number of parameters you may save some time on database management/maintenance with the 1st option, since you don't need to change anything about your database schema.
Daniel's answer is a generic solution, but if you consider performance an issue, you may end up doing too many inserts on the database side for a single search (one for each parameter). Too many inserts is a common source of performance problems.
You know your resources.

Thoughts/Input about Database Design for a CMS

I'm just about to expand the functionality of our own CMS but was thinking of restructuring the database to make it simpler to add/edit data types and values.
Currently, the CMS is quite flat - the CMS requires a field in the database for every type of stored value (manually created).
The first option that comes to mind is simply a table which keeps the data types (ie: Address 1, Suburb, Email Address etc) and another table which holds values for each of these data types. Just like how Wordpress keeps values in the 'options' table, PHP serialize would be used to store an array of values.
The second option is how Drupal works, the CMS creates tables for every data type. Unlike Wordpress, this can be a bit of an overkill but really useful for SQL queries when ordering and grouping by a particular value.
What's everyone's thoughts?
In my opinion, you should avoid serialization where possible. Your relational database should be relational, and thus be structured as such. This would include the 'Drupal Method', e.g. one table per data type. This also keeps your database healthy in the sense that it can be indexed and easily queried.
Unless you plan to have lots of different data types that will be added in the future which are unknown now, this is not really going to help you and would be overkill. If you have very wide tables and lots of holes in your data (i.e. lots of columns that seem to be NULL at random), then that is a pattern that is screaming for a separate table for data that may only belong to certain entries.
Keep it simple and logical. Don't abstract for the sake of abstraction. Indeed, storing integers is cheaper with regards to storage space but unless that is a problem then don't do it in this case.
