i have a table and one of the columns is co_com
this is communication preferences
there are three options (and only ever will be)
i dont want to have a seperate column for these values
so i was thinking of storing them as
sms/email/fax
sms = yes
email = no
fax = yes
which would be stored as: 101
but,
im thinking thats not the best way
what other ways can you see?
yes i am aware that this is a subjective question
but im not sure how else to ask.
You're correct. That is in fact not the best way.
You say you don't want to have separate columns for these values, but that's exactly what you should be doing.
Storing combinations of logical values as coded binary is... 1900's. Seriously, how much does disk space cost these days, and how much do you save by cramming three bits of information into a single number rather than three bytes or characters?
Go on, create three columns with sensible names, and store either 0's and 1's in them, or if your DB is weird that way, story 'Y' and 'N'. But don't do this binary cleverness stuff. It will bite you eventually when you try to write sensible queries.
In my mind, columns is the best way to go, for ease of use if nothing else. The columns are straight forward and won't be confusing in the future. BUT I wouldn't say storing them in a single column as 3 digits is necessarily bad, just confusing. Save yourself the headaches later and do 3 columns.
Another point of view would be to have another table called com_options for example. Have an ID field and an options field, store all of the different communication options combinations in the options field along with a unique ID in the ID field and in your co_com table have an opt_id field referencing the ID in the com_options table. Then use an INNER JOIN to join these 2 tables together.
If your DB is MySQL, then you can use SET datatype.
It's OK, don't worry -- sometimes we should denormalize tables :)
But if your DB isn't MySQL, then you also can use this method, but implementation will be non-your-DB-native. Also bitwise logic on a big bunch of data works very well vs default normalize d one-to-many relation. Because it`s more computer-oriented.
Related
I have been looking for some optimization tips since I´m doing a RPG modification which uses MySQL to store data by PHP.
I´m using one unique table to store all user information in columns by his unique ID, and I have to store (many?) data for each user. Weapons and other information.
I´m using explode and implode as a method to store the weapons, for example, in one column with the 'text' value. I don´t know if that´s a good practice and I don´t know if I will have performance problems if I get thousands of players doing tons of UPDATES , SELECT , etc, requests.
I read that a Junction table may be better to store the weapons and all those information, but I don´t know if that will get better information that you request it by the explode method.
I mean, I should store all the weapons in a different table, each weapon with his information (each weapon have some information, like different columns, I use multiple explode for that inside the main explode) and the user owner of that weapon to identify the weapon than just have them in one column.
It can be 100 items at least to store, I don´t know if it´s good to make 100 records per user on a different table and call all of them all the time better than just call the column and use explode.
Also I want to improve my skills and knowledge to make the best performance MySQL database I can.
I hope somebody can tell me something.
Thanks, and sorry for my stupid english grammar.
It is almost always best practice to normalize your table data. There are some exceptions to this rule (especially in very high volume databases), but you probably do not need to worry about those exceptions until you get to the point of first understanding how to properly normalize and index your tables.
Typically, try to arrange your tables in a way that mimics real-world objects and their relations to each other.
So, in your case you have users - that is one table. Each user might have multiple weapons. So, you now have a weapons table. Since multiple different users might have the same weapon and each user might have multiple weapons, you have a many-to-many relationship between them, so you should have a table "users_weapons" or similar that does nothing but relate user id's to weapon id's.
Now say the users can all have armor. So now you add an armor table and a users_armor table (as this is likely many-to-many as well).
Just think through the different aspects of your game and try to understand the relationships between them. Make sure you can model these relationships in database tables before you even bother writing any code to actually implement the functionality.
Yes it is better to use several tables instead of one. It's better to db performance, easier to understand, easier to maintain and simplier to use as well.
Let's suggest that one user has several weapons with multiple features(but not unique among all weapons). And in one place in your game you just need to know the value of one specific feature:
doing it by your way you'll need to find user row in users table, fetch on column, explode it several times, and there you have your value, but it complicates even more if you want to change it and save then.
better way is having one table for user details(login, password, email etc), another table which keeps user weapons(name of weapon, image maybe) and table in which will be all features, special powers of weapons kept. You could keep all possible features of all weapons in extra table as well. This way you if you already know user id from user table, you'll have to only join 2 tables in your sql query, and there you got value of feature of specific weapon of user.
Example pseudo schema of tables:
users
user_id
user_name
password
email
weapons
weapon_id
user_id
weapon_name
image
weapons_features
feature_id
weapon_id
feature_name
feature_value
And if you really want to use some ordered data in text field in database encode it to JSON or serialize it. This way you don't have to explode and implode it!
As all guys said, typically you should start from normalized database structure.
If performance is ok, then great, nothing to do.
If not, you can try many different things:
Find and optimize query which works slow.
Denormalize queries - sometimes joins kill performance.
Change data access pattern used in application.
Store data in file system or use NoSQL/polyglot persistence solution.
So I've got this form with an array of checkboxes to search for an event. When you create an event, you choose one or more of the checkboxes and then the event gets created with these "attributes". What is the best way to store it in a MySQL database if I want to filter results when searching for these events? Would creating several columns with boolean values be the best way? Or possibly a new table with the checkbox values only?
I'm pretty sure selializing is out of the question because I wouldn't be able to query the selialized string for whether the checkbox was ticked or not, right?
Thanks
You can use the set datatype or a separate table that you join. Either will work.
I would not do a bunch of columns though.
You can search the set easily using FIND_IN_SET(), but it's not indexed, so it depends on how many rows you expect (up to a few thousand is probably OK - it's a very fast search).
The normal solution is a separate table with one column being the ID of the event, and the second column being the attribute using the enum datatype (don't use text, it's slower).
create separate columns or you can store them all in one column using bit mask
One way would be to create a new table with a column for each checkbox, as already described by others. I'll not add to that.
However, another way is to use a bitmask. You have just one column myCheckboxes and store the values as an int. Then in the code you have constants or another appropriate way to store the correlation between each checkbox and it's bit. I.e.:
CHECKBOX_ONE 1
CHECKBOX_TWO 2
CHECKBOX_THREE 4
CHECKBOX_FOUR 8
...
CHECKBOX_NINE 256
Remember to always use the next power of two for new values, otherwise you'll get values that overlap.
So, if the first two checkboxes have been checked you should have 3 as the value of myCheckboxes for that row. If you have ONE and FOUR checked you'd have 9 as the values of myCheckboxes, etc. When you want to see which rows have say checkboxes ONE, THREE and NINE checked your query would be like:
SELECT * FROM myTable where myCheckboxes & 1 AND myCheckboxes & 4 AND myCheckboxes & 256;
This query will return only rows having all this checkboxes marked as checked.
You should also use bitwise operations when storing and reading the data.
This is a very efficient way when it comes to speed. You have just a single column, probably just a smallint, and your searches are pretty fast. This can make a big difference if you have several different collections of checkboxes that you want to store and search trough. However, this makes the values harder to understand. If you see the value 261 in the DB it'll not be easy for a human to immeditely see that this means checkboxes ONE, THREE and NINE have been checked whereas it is much easier for a human seeing separate columns for each checkbox. This normally is not an issue, cause humans don't need to manually poke the database, but it's something worth mentioning.
From the coding perspective it's not much of a difference, but you'll have to be careful not to corrupt the values, cause it's not that hard to mess up a single int, it's magnitudes easier than screwing the data than when it's stored in different columns. So test carefully when adding new stuff. All that said, the speed and low memory benefits can be very big if you have a ton of different collections.
OK, I know the technical answer is NEVER.
BUT, there are times when it seems to make things SO much easier with less code and seemingly few downsides, so please here me out.
I need to build a Table called Restrictions to keep track of what type of users people want to be contacted by and that will contain the following 3 columns (for the sake of simplicity):
minAge
lookingFor
drugs
lookingFor and drugs can contain multiple values.
Database theory tells me I should use a join table to keep track of the multiple values a user might have selected for either of those columns.
But it seems that using comma-separated values makes things so much easier to implement and execute. Here's an example:
Let's say User 1 has the following Restrictions:
minAge => 18
lookingFor => 'Hang Out','Friendship'
drugs => 'Marijuana','Acid'
Now let's say User 2 wants to contact User 1. Well, first we need to see if he fits User 1's Restrictions, but that's easy enough EVEN WITH the comma-separated columns, as such:
First I'd get the Target's (User 1) Restrictions:
SELECT * FROM Restrictions WHERE UserID = 1
Now I just put those into respective variables as-is into PHP:
$targetMinAge = $row['minAge'];
$targetLookingFor = $row['lookingFor'];
$targetDrugs = $row['drugs'];
Now we just check if the SENDER (User 2) fits that simple Criteria:
COUNT (*)
FROM Users
WHERE
Users.UserID = 2 AND
Users.minAge >= $targetMinAge AND
Users.lookingFor IN ($targetLookingFor) AND
Users.drugs IN ($targetDrugs)
Finally, if COUNT == 1, User 2 can contact User 1, else they cannot.
How simple was THAT? It just seems really easy and straightforward, so what is the REAL problem with doing it this way as long as I sanitize all inputs to the DB every time a user updates their contact restrictions? Being able to use MySQL's IN function and already storing the multiple values in a format it will understand (e.g. comma-separated values) seems to make things so much easier than having to create join tables for every multiple-choice column. And I gave a simplified example, but what if there are 10 multiple choice columns? Then things start getting messy with so many join tables, whereas the CSV method stays simple.
So, in this case, is it really THAT bad if I use comma-separated values?
****ducks****
You already know the answer.
First off, your PHP code isn't even close to working because it only works if user 2 has only a single value in LookingFor or Drugs. If either of these columns contains multiple comma-separated values then IN won't work even if those values are in the exact same order as User 1's values. What do expect IN to do if the right-hand side has one or more commas?
Therefore, it's not "easy" to do what you want in PHP. It's actually quite a pain and would involve splitting user 2's fields into single values, writing dynamic SQL with many ORs to do the comparison, and then doing an extremely inefficient query to get the results.
Furthermore, the fact that you even need to write PHP code to answer such a relatively simple question about the intersection of two sets means that your design is badly flawed. This is exactly the kind of problem (relational algebra) that SQL exists to solve. A correct design allows you to solve the problem in the database and then simply implement a presentation layer on top in PHP or some other technology.
Do it correctly and you'll have a much easier time.
Suppose User 1 is looking for 'Hang Out','Friendship' and User 2 is looking for 'Friendship','Hang Out'
Your code would not match them up, because 'Friendship','Hang Out' is not in ('Hang Out','Friendship')
That's the real problem here.
I'm new to web development and could really use some advice. Thank you all in advance!
I have a form where I ask users to rate their experience in several areas (Mental, Physical, Fun) from 1 - 10. There are about 10 areas in all. Is it worth it to make a table of areas and a intersection table with areaID and userID? In this case, wouldn't it be easier to just create one table with 11 columns (one for each area plus a userID)?
I understand that a table of 'areas' with a key of areaID would be the most robust, and allow me to later add more areas but I really don't see myself doing that. Also, as I only expect <1000 users, would adding another column to a MySQL db be a lot of work?
Finally, what is the cutoff point where I actually should use a many-to-many structure? Later in the form I have users rate ~30 fields from 1 - 4. Should I use the two table system there?
Thanks again!
It'd be better to use the separate child table. The SQL overhead for a join is minimal, and you get total flexibility in how many/few entries you want to allow, without ever having to change the database structure.
It may only be 10 areas now, but PHBs are notorious for changing their minds about something that was "bedrock" on the hour before launch.
I would always model a many-to-many relationship with a proper intersection table because you can never predict what the future will bring.
Suppose you want to count the number of areas per user. How would you do that if you use a single table with 11 columns?
Implement this with the intersection table. You will be glad you did.
Both makes sense, but I would choose the many-to-many approach as you can then specify in the areas table the name of each area (and maybe add some comments what 1 or 10 for that specific area means, etc..)
There's no wrong answer to this, but consider this: are you sure that that number will remain fixed at ten?
Personally, I would still use the many-to-many relationship. It's the more elegant solution, and, in my mind, better. However, doing a job well is often a waste of time.
just want to ask for an opinion regarding mysql.
which one is the better solution?
case1:
store in 1 row:-
product_id:1
attribute_id:1,2,3
when I retreive out the data, I split the string by ','
I saw some database, the store the data in this way, the record is a product, the column is stored product attribute:
a:3:{s:4:"spec";a:2:{i:1;s:6:"black";i:3;s:2:"37";}s:21:"spec_private_value_id";a:2:{i:1;s:11:"12367591683";i:3;s:11:"12367591764";}s:13:"spec_value_id";a:2:{i:1;s:1:"5";i:3;s:2:"29";}}
or
case2:
store in 3 row:-
product_id:1
attribute_id:1
product_id:1
attribute_id:2
product_id:1
attribute_id:3
this is the normal I do, to store 3 rows for the attribute for a record.
In term of performance and space, anyone can tell me which one is better. From what I see is case1 save space, but need to process the data in PHP (or other server side scripting).
case2 is more straight forward, but use spaces.
Save space? Seriously? You're talking about saving bytes when a one terabyte disk goes for 70 dollars?
And maybe you're not even saving bytes. If you store attributes as "12234,23342,243234", that's like 30 bytes for 3 attributes. If you'd store them as smallint, they'd take up 6 bytes.
Depends on whether the attributes are important for searching later, for example.
It may be good if you keep attributes as serialized array in just one field in case you actually don't care about them and in case that you, for example, won't need to run a query to show all products that have one attribute.
However, finding all products that have one attribute would be at least "lousy" in case you have attributes as comma-separated (you need to use LIKE), and in case you store attributes as serialized arrays they are completely unusable for any kind of sorting or grouping using sql queries.
Using separate table for multiple relations between products and attributes is far better if they are of any importance for selecting/grouping/sorting other data.
In case 1, although you save space, there's time spent on splitting the string.
You also must take care of the size of your field: If you have 50 products with 2 attributes and one with 100 attributes, you must make the field ~ varchar(200)... You will not save space at all.
I think case 2 is the best and recommended solution.
You need to consider the SELECT statements that would be using these values. If you wish to search for records that have certain attributes, it is much more efficient to store them in separate columns and index them. Otherwise, you are doing "LIKE" statements which take much longer to process.