How to store searchable arrays in MySQL

How to store searchable arrays in MySQL - php

So I've got this form with an array of checkboxes to search for an event. When you create an event, you choose one or more of the checkboxes and then the event gets created with these "attributes". What is the best way to store it in a MySQL database if I want to filter results when searching for these events? Would creating several columns with boolean values be the best way? Or possibly a new table with the checkbox values only?
I'm pretty sure selializing is out of the question because I wouldn't be able to query the selialized string for whether the checkbox was ticked or not, right?
Thanks

You can use the set datatype or a separate table that you join. Either will work.
I would not do a bunch of columns though.
You can search the set easily using FIND_IN_SET(), but it's not indexed, so it depends on how many rows you expect (up to a few thousand is probably OK - it's a very fast search).
The normal solution is a separate table with one column being the ID of the event, and the second column being the attribute using the enum datatype (don't use text, it's slower).

create separate columns or you can store them all in one column using bit mask

One way would be to create a new table with a column for each checkbox, as already described by others. I'll not add to that.
However, another way is to use a bitmask. You have just one column myCheckboxes and store the values as an int. Then in the code you have constants or another appropriate way to store the correlation between each checkbox and it's bit. I.e.:
CHECKBOX_ONE 1
CHECKBOX_TWO 2
CHECKBOX_THREE 4
CHECKBOX_FOUR 8
...
CHECKBOX_NINE 256
Remember to always use the next power of two for new values, otherwise you'll get values that overlap.
So, if the first two checkboxes have been checked you should have 3 as the value of myCheckboxes for that row. If you have ONE and FOUR checked you'd have 9 as the values of myCheckboxes, etc. When you want to see which rows have say checkboxes ONE, THREE and NINE checked your query would be like:
SELECT * FROM myTable where myCheckboxes & 1 AND myCheckboxes & 4 AND myCheckboxes & 256;
This query will return only rows having all this checkboxes marked as checked.
You should also use bitwise operations when storing and reading the data.
This is a very efficient way when it comes to speed. You have just a single column, probably just a smallint, and your searches are pretty fast. This can make a big difference if you have several different collections of checkboxes that you want to store and search trough. However, this makes the values harder to understand. If you see the value 261 in the DB it'll not be easy for a human to immeditely see that this means checkboxes ONE, THREE and NINE have been checked whereas it is much easier for a human seeing separate columns for each checkbox. This normally is not an issue, cause humans don't need to manually poke the database, but it's something worth mentioning.
From the coding perspective it's not much of a difference, but you'll have to be careful not to corrupt the values, cause it's not that hard to mess up a single int, it's magnitudes easier than screwing the data than when it's stored in different columns. So test carefully when adding new stuff. All that said, the speed and low memory benefits can be very big if you have a ton of different collections.

Related

MySQL Table - Large Gaps in ID Field

I'm new to MySQL so please forgive me if this question is too "basic".
We imported data from another database to MySQL. In two of the tables, there are large gaps in the ID field. For example, in one table, IDs 1 to 5438 have smaller gaps but then the next few IDs are 5823, 6612, 7880, 8577, 12541 and it continues like this to 54189. Then it jumps to 441739936 and continues to increase with large gaps in between to 3872082950. I'm assuming that when we start adding data to this table the next ID will be 3872082951 (it's set to auto-increment). The table only has 5234 rows.
My questions are:
Is there any problem with having these large gaps? Will it negatively impact query response time? Are there any other negative side effects of having these large gaps?
Is it fine to leave it as is or are we better off renumbering the IDs so they are sequential without gaps?

There is no problem or penalty with allowing large gaps to remain in the database. There's no impact to performance. Auto-increment id's must be unique, but there's no need for them to be consecutive.

1) No, it wont
2) Probably you should not, remember that id can be used to reference a data row in other table, so if you renumber this, you can change the relations.

There isn't any problem having the gaps, as stated by Bill, but be careful to set the correct data type for the field to cope with the size of the value (bigint, etc).
You can read more here: http://dev.mysql.com/doc/refman/5.7/en/example-auto-increment.html and find the Numeric Data types here: http://dev.mysql.com/doc/refman/5.7/en/numeric-types.html

How to insert data in the nested set model(MySQL);

In the nested set model we have LEFT and Right columns
the first time when the table is empty, what i need to insert to the RIGHT column, if i don't know how many children will i have
LEFT 1 - forever
RIGHT ? - what value goes here??
how to make it dynamic? not static.
ps: using php

I'm assuming from your tags and title that you are looking for a solution that works with MySQL.
Yes, you are right that unless you know the number of elements in advance the value for right needs to be calculated dynamically. There are two approaches you can use:
You could start with the least value that works (2 in this case) and increase it later as needed.
You could just make a guess like 10000000 and hope that's enough, but you need to be prepared for the possibility that it wasn't enough and may need adjusting again later.
In both cases you need to implement that the left and right values for multiple rows may need to be adjusted when inserting new rows, but in the second case you only actually need to perform the updates if your guesses were wrong. So the second solution is more complex, but can give better performance.
Note that of the four common ways to store heirarchical data, the nested sets approach is the hardest to perform inserts and updates. See slide 69 of Bill Karwin's Models for Heirarchical Data.

When to use comma-separated values in a DB Column?

OK, I know the technical answer is NEVER.
BUT, there are times when it seems to make things SO much easier with less code and seemingly few downsides, so please here me out.
I need to build a Table called Restrictions to keep track of what type of users people want to be contacted by and that will contain the following 3 columns (for the sake of simplicity):
minAge
lookingFor
drugs
lookingFor and drugs can contain multiple values.
Database theory tells me I should use a join table to keep track of the multiple values a user might have selected for either of those columns.
But it seems that using comma-separated values makes things so much easier to implement and execute. Here's an example:
Let's say User 1 has the following Restrictions:
minAge => 18
lookingFor => 'Hang Out','Friendship'
drugs => 'Marijuana','Acid'
Now let's say User 2 wants to contact User 1. Well, first we need to see if he fits User 1's Restrictions, but that's easy enough EVEN WITH the comma-separated columns, as such:
First I'd get the Target's (User 1) Restrictions:
SELECT * FROM Restrictions WHERE UserID = 1
Now I just put those into respective variables as-is into PHP:
$targetMinAge = $row['minAge'];
$targetLookingFor = $row['lookingFor'];
$targetDrugs = $row['drugs'];
Now we just check if the SENDER (User 2) fits that simple Criteria:
COUNT (*)
FROM Users
WHERE
Users.UserID = 2 AND
Users.minAge >= $targetMinAge AND
Users.lookingFor IN ($targetLookingFor) AND
Users.drugs IN ($targetDrugs)
Finally, if COUNT == 1, User 2 can contact User 1, else they cannot.
How simple was THAT? It just seems really easy and straightforward, so what is the REAL problem with doing it this way as long as I sanitize all inputs to the DB every time a user updates their contact restrictions? Being able to use MySQL's IN function and already storing the multiple values in a format it will understand (e.g. comma-separated values) seems to make things so much easier than having to create join tables for every multiple-choice column. And I gave a simplified example, but what if there are 10 multiple choice columns? Then things start getting messy with so many join tables, whereas the CSV method stays simple.
So, in this case, is it really THAT bad if I use comma-separated values?
****ducks****

You already know the answer.
First off, your PHP code isn't even close to working because it only works if user 2 has only a single value in LookingFor or Drugs. If either of these columns contains multiple comma-separated values then IN won't work even if those values are in the exact same order as User 1's values. What do expect IN to do if the right-hand side has one or more commas?
Therefore, it's not "easy" to do what you want in PHP. It's actually quite a pain and would involve splitting user 2's fields into single values, writing dynamic SQL with many ORs to do the comparison, and then doing an extremely inefficient query to get the results.
Furthermore, the fact that you even need to write PHP code to answer such a relatively simple question about the intersection of two sets means that your design is badly flawed. This is exactly the kind of problem (relational algebra) that SQL exists to solve. A correct design allows you to solve the problem in the database and then simply implement a presentation layer on top in PHP or some other technology.
Do it correctly and you'll have a much easier time.

Suppose User 1 is looking for 'Hang Out','Friendship' and User 2 is looking for 'Friendship','Hang Out'
Your code would not match them up, because 'Friendship','Hang Out' is not in ('Hang Out','Friendship')
That's the real problem here.

best way to store options in a db

i have a table and one of the columns is co_com
this is communication preferences
there are three options (and only ever will be)
i dont want to have a seperate column for these values
so i was thinking of storing them as
sms/email/fax
sms = yes
email = no
fax = yes
which would be stored as: 101
but,
im thinking thats not the best way
what other ways can you see?
yes i am aware that this is a subjective question
but im not sure how else to ask.

You're correct. That is in fact not the best way.
You say you don't want to have separate columns for these values, but that's exactly what you should be doing.

Storing combinations of logical values as coded binary is... 1900's. Seriously, how much does disk space cost these days, and how much do you save by cramming three bits of information into a single number rather than three bytes or characters?
Go on, create three columns with sensible names, and store either 0's and 1's in them, or if your DB is weird that way, story 'Y' and 'N'. But don't do this binary cleverness stuff. It will bite you eventually when you try to write sensible queries.

In my mind, columns is the best way to go, for ease of use if nothing else. The columns are straight forward and won't be confusing in the future. BUT I wouldn't say storing them in a single column as 3 digits is necessarily bad, just confusing. Save yourself the headaches later and do 3 columns.

Another point of view would be to have another table called com_options for example. Have an ID field and an options field, store all of the different communication options combinations in the options field along with a unique ID in the ID field and in your co_com table have an opt_id field referencing the ID in the com_options table. Then use an INNER JOIN to join these 2 tables together.

If your DB is MySQL, then you can use SET datatype.
It's OK, don't worry -- sometimes we should denormalize tables :)
But if your DB isn't MySQL, then you also can use this method, but implementation will be non-your-DB-native. Also bitwise logic on a big bunch of data works very well vs default normalize d one-to-many relation. Because it`s more computer-oriented.

mysql insert multiple data into a single column or multiple row

just want to ask for an opinion regarding mysql.
which one is the better solution?
case1:
store in 1 row:-
product_id:1
attribute_id:1,2,3
when I retreive out the data, I split the string by ','
I saw some database, the store the data in this way, the record is a product, the column is stored product attribute:
a:3:{s:4:"spec";a:2:{i:1;s:6:"black";i:3;s:2:"37";}s:21:"spec_private_value_id";a:2:{i:1;s:11:"12367591683";i:3;s:11:"12367591764";}s:13:"spec_value_id";a:2:{i:1;s:1:"5";i:3;s:2:"29";}}
or
case2:
store in 3 row:-
product_id:1
attribute_id:1
product_id:1
attribute_id:2
product_id:1
attribute_id:3
this is the normal I do, to store 3 rows for the attribute for a record.
In term of performance and space, anyone can tell me which one is better. From what I see is case1 save space, but need to process the data in PHP (or other server side scripting).
case2 is more straight forward, but use spaces.

Save space? Seriously? You're talking about saving bytes when a one terabyte disk goes for 70 dollars?
And maybe you're not even saving bytes. If you store attributes as "12234,23342,243234", that's like 30 bytes for 3 attributes. If you'd store them as smallint, they'd take up 6 bytes.

Depends on whether the attributes are important for searching later, for example.
It may be good if you keep attributes as serialized array in just one field in case you actually don't care about them and in case that you, for example, won't need to run a query to show all products that have one attribute.
However, finding all products that have one attribute would be at least "lousy" in case you have attributes as comma-separated (you need to use LIKE), and in case you store attributes as serialized arrays they are completely unusable for any kind of sorting or grouping using sql queries.
Using separate table for multiple relations between products and attributes is far better if they are of any importance for selecting/grouping/sorting other data.

In case 1, although you save space, there's time spent on splitting the string.
You also must take care of the size of your field: If you have 50 products with 2 attributes and one with 100 attributes, you must make the field ~ varchar(200)... You will not save space at all.
I think case 2 is the best and recommended solution.

You need to consider the SELECT statements that would be using these values. If you wish to search for records that have certain attributes, it is much more efficient to store them in separate columns and index them. Otherwise, you are doing "LIKE" statements which take much longer to process.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.