i don't even know if calling it serialized column is right, but i'm going to explain myself, for example, i have a table for users, i want to store the users phone numbers(cellphone, home, office, etc), so, i was thinkin' to make a column for each number type, but at the same time came to my head an idea, what if i save a json string in a single column, so, i will never have a column that probably will never be used and i can turn that string into a php array when reading the data from database, but i would like to hear the goods and bads of this practice, maybe it is just a bad idea, but first i want to know what other people have to say about
thanks
Short Answer, Multiple columns.
Long Answer:
For the love of all that is holy in the world please do not store mutiple data sets in a single text column
I am assuming you will have a table that will either be
+------------------------------+ +----------------------+
| User | cell | office | home | OR | User | JSON String |
+------------------------------+ +----------------------+
First I will say both these solutions are not the best solution but if you were to pick the from the two the first is best. There are a couple reasons mainly though the ability to modify and query specifically is really important. Think about the algrothim to modify the second option.
SELECT `JSON` FROM `table` WHERE `User` = ?
Then you have to do a search and replace in either your server side or client side language
Finally you have to reinsert the JSON string
This solution totals 2 queries and a search and replace algorithm. No Good!
Now think about the first solution.
SELECT * FROM `table` WHERE `User` = ?
Then you can do a simple JSON encode to send it down
To modify you only need one Query.
UPDATE `table` SET `cell` = ? WHERE `User` = ?
to update more than one its again a simple single query
UPDATE `table` SET `cell` = ?, `home` = ? WHERE `User` = ?
This is clearly better but it is not best
There is a third solution Say you want a user to be able to insert an infinite number of phone numbers.
Lets use a relation table for that so now you have two tables.
+-------------------------------------+
+---------+ | Phone |
| Users | +-------------------------------------+
+---------+ | user_name| phone_number | type |
| U_name | +-------------------------------------+
+---------+
Now you can query all the phone numbers of a user with something like this
Now you can query the table via a join
SELECT Users., phone. FROM Phone, Users WHERE phone.user_name = ? AND Users.U_name = ?
Inserts are just as easy and type checking is easy too.
Remember this is a simple example but SQL really provides a ton of power to your data-structure you should use it rather than avoiding it
I would only do this with non-essential data, for example, the user's favorite color, favorite type of marsupial (obviously 'non-essential' is for you to decide). The problem with doing this for essential data (phone number, username, email, first name, last name, etc) is that you limit yourself to what you can accomplish with the database. These include indexing fields, using ORDER BY clauses, or even searching for a specific piece of data. If later on you realize you need to perform any of these tasks it's going to be a major headache.
Your best best in this situation is using a relational table for 1 to many objects - ex UserPhoneNumbers. It would have 3 columns: user_id, phone_number, and type. The user_id lets you link the rows in this table to the appropriate User table row, the phone_number is self explanatory, and the type could be 'home', 'cell', 'office', etc. This lets you still perform the tasks I mentioned above, and it also has the added benefit of not wasting space on empty columns, as you only add rows to this table as you need to.
I don't know how familiar you are with MySQL, but if you haven't heard of database normalization and query JOINs, now is a good time to start reading up on them :)
Hope this helps.
If you work with json, there are more elegant ways than MySQL. Would recommend to use either another Database working better with json, like mongoDB or a wrapper for SQL like Persevere, http://www.persvr.org/Documentation (see "Perstore")
I'm not sure what the advantages of this approach would be. You say "so, i will never have a column that probably will never be used..." What I think you meant was (in your system) that sometimes a user may not have a value for each type of phone number available, and that being the case, why store records with empty columns?
Storing records with some empty columns is not necessarily bad. However, if you wanted to normalize your database, you could have a separate table for user_phonenumber, and create a 1:many relationship between user and user_phonenumber records. The user_phonenumber table would basically have four columns:
id (primary key)
userid (foreign key to user table)
type (e.g. cellphone, home, office, etc.)
value (the phone number)
Constraints would be that id is a primary key, userid is a foreign key for user.id, and type would be an enum (of all possible phone number types).
Related
When storing relationship data for a user (potentially a thousand friends per user), would it be faster to create a new row for each relationship, or to concatenate all of of their friends into a string and then parse that later?
I.e.
Primary id | Friend1ID | Friend2ID|
1| 234| 5789|
2| 5789| 234|
Where the IDs are references to primary IDs in a 'Users' table.
Or for the 'Users' table to just have a column called friends which may look like this:
Primary id | Friend1ID |
234| 5789.123.8474|
5789| 234|
I'm of the understanding that string concatenation and parsing is generally quite slow, so I'd be tempted to lean towards the first method. However as the number of users grows, this then becomes a case of selecting one row and parsing it V searching millions of rows for rows which match the WHERE criteria.
Is one method distinctly faster than the other? Particularly as the number of users grows.
You should use a second table to store the friends.
Users Table
----------
userid | username
1 | Bob
2 | Mike
3 | John
Users Friends Table
--------------------
userid | friend_id
1 | 2
3 | 2
Here you can see that Mike is friends with both Bob and John.... This is of course a very simply demonstration.
Your second option will not scale, some people may have hundreds of thousands of friends, storing each Id in a single field is going to cause a headache further down the line. adding friends, removing friends. working out complex relationships between people. Lots of over head.
Querying millions of records with a WHERE clause on a properly indexed table should take no more than a second, the first option is the better one.
The "correct" way would probably be keeping multiple rows. This allows for much easier statistical analysis and more complex queries (like friends of friends) without any hacky stuff. Integer storage size is also often smaller than string storage, even though you're repeating one ID - especially if you use an appropriately sized integer store (like mediumint).
It's also more maintainable, scalable (if they start getting a damn lot of friends) export and importable. The speed gain from concatenation, if any, wouldn't be worth the rest of the benefits.
If you wanted for instance to search if Bob was a friend of Jane, this would be a single row lookup in the multiple row implementation, or in the single row implementation: get Bob's row, decode field, loop through field looking for Jane - found Jane. DBMS optimisation and indexing would make the multiple row implementation much faster in this case - if you had the primary key as (id, friendid) then it'd be pretty much instantaneous as the table would probably be hashed on that key.
I believe the proper way to do it which might be more faster is two do a two columns table
user | friend
1 | 2
1 | 3
It will simple and will make queering and updating much easier and you can have as many relationship as you want.
Don't over complicate the problem...
... Asking for the more "correct" way is wrong itself.
It depends based on case.
If you have low access rate to your web application having more rows won't change anything on the other side of the coins (i'm not English), on large and medium application access it's maybe better to have the minimal access to the db possible.
To obtain this as you've already thinked you can concatenate the values and then split them on login of the user and then put everything into the $_SESSION supervar.
At least this is what i think.
Description:
I am building a rating system with mysql/php. I am confused as to how I would set up the database.
Here is my article setup:
Article table:
id | user_id | title | body | date_posted
This is my assumed rating table:
Rating table:
id | article_id | score | ? user_id ?
Problem:
I don't know if I should place the user_id in the rating table. My plan is to use a query like this:
SELECT ... WHERE user_id = 1 AND article_id = 10
But I know that it's redundant data as it stores the user_id twice. Should I figure out a JOIN on the tables or is the structure good as is?
It depends. I'm assuming that the articles are unique to individual users? In that case, I could retain the user_id in your rating table and then just alter your query to:
SELECT ... WHERE article_id = 10
or
SELECT ... WHERE user_id = 1
Depending on what info you're trying to pull.
You're not "storing the user_id twice" so much as using the user_id to link the article to unique data associated to the user in another table. You're taking the right approach, except in your query.
I don't see anything wrong with this approach. The user id being stored twice is not particularly relevant since one is regarding a rating entry and the other, i assume, is related to the article owner.
The benefit of this way is you can prevent multiple scores being recorded for each user by making article_id and user_id unique and use replace into to manage scoring.
There are many things to elaborate on this depending on whether or not this rating system needs to be intelligent to prevent gaming, etc. How large the user base is, etc.
I bet for any normal person, this setup would not be detrimental to even a relatively large scale system.
... semi irrelevant:
Just FYI, depending on the importance and gaming aspects of this score, you could use STDDEV() to fetch an average factoring the standard deviation on the score column...
SELECT STDDEV(`score`) FROM `rating` WHERE `article_id` = {article_id}
That would factor outliers supposing you cared whether or not it looked like people were ganging up on a particular article to shoot it down or praise it without valid cause.
you should not, due to 3rd normal form, you need to keep the independence.
"The third normal form (3NF) is a normal form used in database normalization. 3NF was originally defined by E.F. Codd in 1971.[1] Codd's definition states that a table is in 3NF if and only if both of the following conditions hold:
The relation R (table) is in second normal form (2NF)
Every non-prime attribute of R is non-transitively dependent (i.e. directly dependent) on every superkey of R."
Source here: http://en.wikipedia.org/wiki/Third_normal_form
First normal Form: http://en.wikipedia.org/wiki/First_normal_form
Second normal Form: http://en.wikipedia.org/wiki/Second_normal_form
you should take a look to normalization and E/R model it will help you a lot.
normalization in wikipedia: http://en.wikipedia.org/wiki/Database_normalization
Soon I'll be working on catalog(php+mysql) that will have multilang content support. And now I'm considering the best approach to design the database structure. At the moment I see 3 ways for multilang handling:
1) Having separate tables for each language specific data, i.e. schematicly it'll look like this:
There will be one table Main_Content_Items, storing basic data that cannot be translated like ID, creation_date, hits, votes on so on - it will be only one and will refer to all languages.
And here are tables that will be dublicated for each language:
Common_Data_LANG table(example: common_data_en_us) (storing common/"static" fields that can be translated, but are present for eny catalog item: title, desc and so on...)
Extra_Fields_Data_LANG table (storing extra fields data that can be translated, but can be different for custom item groups, i.e. like: | id | item_id | field_type | value | ...)
Then on items request we will look in table according to user/default language and join translatable data with main_content table.
Pros:
we can update "main" data(i.e. hits, votes...) that are updated most often with only one query
we don't need o dublicate data 4x or more times if we have 4 or more languages in comparison with structure using only one table with 'lang' field. So MySql queries would take less time to go through 100000(for example) records catalog rather then 400000 or more
Cons:
+2 tables for each language
2) Using 'lang' field in content tables:
Main_Content_Items table (storing basic data that cannot be translated like ID, creation_date, hits, votes on so on...)
Common_Data table (storing common/"static" fields that can be translated, but are present for eny catalog item: | id | item_id | lang | title | desc | and so on...)
Extra_Fields_Data table (storing extra fields data that can be translated, but can be different for custom item groups, i.e. like: | id | item_id | lang | field_type | value | ...)
So we'll join common_data and extra_fields to main_content_items according to 'lang' field.
Pros:
we can update "main" data(i.e. hits, votes...) that are updated most often with only one query
we only 3 tables for content data
Cons:
we have custom_data and extra_fields table filled with data for all languages, so its X time bigger and queries run slower
3) Same as 2nd way, but with Main_Content_Items table merged with Common_Data, that has 'lang' field:
Pros:
...?
Cons:
we need to update update "main" data(i.e. hits, votes...) that are updated most often with for every language
we have custom_data and extra_fields table filled with data for all languages, so its X time bigger and queries run slower
Will be glad to hear suggestions about "what is better" and "why"? Or are there better ways?
Thanks in advance...
I've given a similar anwer in this question and highlighted the advantages of this technique (it would be, for example, important for me to let the application decide on the language and build the query accordingly by only changing the lang parameter in the WHERE clause of the SQL query.
This get's pretty close to your second solution. I didn't quite got the "extra_fields" but if it makes sense, you could(!) merge it into the common_data table. I would advise you against the first idea since there will be too many tables and it can be easy to lose track about the items in there.
To your edit: I still consider the second approach the better one (it's my optinion so it's relative ;)) I'm no expert on optimization but I think that with proper indexes and proper table structure speed should be not be a problem. As always, the best way to find the most effective way is doing both methods and see which is best since speed will vary from data, structure, ....
I have a table which would contain information about a certain month, and one column in that row would have mysql row id's for another table in it to grab multiple information from
is there a more efficent way to get the information than exploding the ids and doing seperate sql queryies on each... here is an example:
Row ID | Name | Other Sources
1 Test 1,2,7
the Other Sources has the id's of the rows from the other table which are like so
Row ID | Name | Information | Link
1 John | No info yet? | http://blah.com
2 Liam | No info yet? | http://blah.com
7 Steve| No info yet? | http://blah.com
and overall the information returned wold be like the below
Hi this page is called test... here is a list of our sources
- John (No info yet?) find it here at http://blah.com
- Liam (No info yet?) find it here at http://blah.com
- Steve (No info yet?) find it here at http://blah.com
i would do this... i would explode the other sources by , and then do a seperate SQL query for each, i am sure there could be a better way?
Looks like a classic many-to-many relationship. You have pages and sources - each page can have many sources and each source could be the source for many pages?
Fortunately this is very much a solved problem in relational database design. You would use a 3rd table to relate the two together:
Pages (PageID, Name)
Sources (SourceID, Name, Information, Link)
PageSources (PageID, SourceID)
The key for the "PageSources" table would be both PageID and SourceID.
Then, To get all the sources for a page for example, you would use this SQL:
SELECT s.*
FROM Sources s INNER JOIN PageSources ps ON s.SourceID = ps.SourceID
AND ps.PageID = 1;
Not easily with your table structure. If you had another table like:
ID Source
1 1
1 2
1 7
Then join is your friend. With things the way they are, you'll have to do some nasty splitting on comma-separated values in the "Other Sources" field.
Maybe I'm missing something obvious (been known to), but why are you using a single field in your first table with a comma-delimited set of values rather than a simple join table. The solution if do that is trivial.
The problem with these tables is that having a multi-valued column doesn't work well with SQL. Tables in this format are considered to be normalized, as multi-valued columns are forbidden in First Normal Form and above.
First Normal Form means...
There's no top-to-bottom ordering to the rows.
There's no left-to-right ordering to the columns.
There are no duplicate rows.
Every row-and-column intersection contains exactly one
value from the applicable domain (and
nothing else).
All columns are regular [i.e. rows have no hidden components such as
row IDs, object IDs, or hidden timestamps].
—Chris Date, "What First Normal Form Really Means", pp. 127-8[4]
Anyway, the best way to do it is to have a many to many relationship. This is done by putting a third table in the middle, like Dominic Rodger does in his answer.
Here is the scenario 1.
I have a table called "items", inside the table has 2 columns, e. g. item_id and item_name.
I store my data in this way:
item_id | item_name
Ss001 | Shirt1
Sb002 | Shirt2
Tb001 | TShirt1
Tm002 | TShirt2
... etc, i store in this way:
first letter is the code for clothes, i.e S for shirt, T for tshirt
second letter is size, i.e s for small, m for medium and b for big
Lets say in my items table i got 10,000 items. I want to do fast retrieve, lets say I want to find a particular shirt, can I use:
Method1:
SELECT * from items WHERE item_id LIKE Sb99;
or should I do it like:
Method2:
SELECT * from items WHERE item_id LIKE S*;
*Store the result, then execute second search for the size, then third search for the id. Like the hash table concept.
What I want to achieve is, instead of search all the data, I want to minimize the search by search the clothes code first, follow by size code and then id code. Which one is better in term of speed in mysql. And which one is better in long run. I want to reduce the traffic and not to disturb the database so often.
Thanks guys for solving my first scenario. But another scenario comes in:
Scenario 2:
I am using PHP and MySQL. Continue from the preivous story. If my users table structure is like this:
user_id | username | items_collected
U0001 | Alex | Ss001;Tm002
U0002 | Daniel | Tb001;Sb002
U0003 | Michael | ...
U0004 | Thomas | ...
I store the items_collected in id form because one day each user can collect up to hundreds items, if I store as string, i.e. Shirt1, pants2, ..., it would required a very large amount of database spaces (imagine if we have 1000 users and some items name are very long).
Would it be easier to maintain if I store in id form?
And if lets say, I want to display the image, and the image's name is the item's name + jpg. How to do that? Is it something like this:
$result = Select items_collected from users where userid= $userid
Using php explode:
$itemsCollected = explode($result, ";");
After that, matching each item in the items table, so it would like:
shirt1, pants2 etc
Den using loop function, loop each value and add ".jpg" to display the image?
The first method will be faster - but IMO it's not the right way of doing it. I'm in agreement with tehvan about that.
I'd recommend keeping the item_id as is, but add two extra fields one for the code and one for the size, then you can do:
select * from items where item_code = 'S' and item_size = 'm'
With indexes the performance will be greatly increased, and you'll be able to easily match a range of sizes, or codes.
select * from items where item_code = 'S' and item_size IN ('m','s')
Migrate the db as follows:
alter table items add column item_code varchar(1) default '';
alter table items add column item_size varchar(1) default '';
update items set item_code = SUBSTRING(item_id, 1, 1);
update items set item_size = SUBSTRING(item_id, 2, 1);
The changes to the code should be equally simple to add. The long term benefit will be worth the effort.
For scenario 2 - that is not an efficient way of storing and retrieving data from a database. When used in this way the database is only acting as a storage engine, by encoding multiple data into fields you are precluding the relational part of the database from being useful.
What you should do in that circumstance is to have another table, call it 'items_collected'. The schema would be along the lines of
CREATE TABLE items_collected (
id int(11) NOT NULL auto_increment KEY,
userid int(11) NOT NULL,
item_code varchar(10) NOT NULL,
FOREIGN KEY (`userid`) REFERENCES `user`(`id`),
FOREIGN KEY (`itemcode`) REFERENCES `items`(`item_code`)
);
The foreign keys ensure that there is Referential integrity, it's essential to have referential integrity.
Then for the example you give you would have multiple records.
user_id | username | items_collected
U0001 | Alex | Ss001
U0001 | Alex | Tm002
U0002 | Daniel | Sb002
U0002 | Daniel | Tb001
U0003 | Michael | ...
U0004 | Thomas | ...
The first optimization would be splitting the id into three different fields:
one for type, one for size, one for the current id ending (whatever the ending means)
If you really want to keep the current structure, go for the result straight away (option 1).
If you want to speed up for results you should split up the column into multiple columns, one for each property.
Step 2 is to create an index for each column. Remember that mysql only uses one index per table per query. So if you really want speedy queries and your queries vary a lot with these properties, then you might want to create an index on (type,size,ending), (type,ending,size) etc.
For example a query with
select * from items where type = s and size = s and ending = 001
Can benefit from the index (type,size,ending) but:
select * from items where size = s and ending = 001
Can not, because the index will only be used in order, so it needs type, then size, then ending. This is why you might want multiple indexes if you really want fast searches.
One other note, generally it is not a good idea to use * in queries, but to select only the columns you need.
You need to have three columns for the model, size and id, and index them this way:
CREATE INDEX ix_1 ON (model, size, id)
CREATE INDEX ix_2 ON (size, id)
CREATE INDEX ix_3 ON (id, model)
Then you'll be able to search efficiently on any subset of the parameters:
model-size-id, model-size and model queries will use ix_1;
size-id and size queries will use ix_2;
model-id and id queries will use ix_3
Index on your column as it is now is equivalent to ix_1, and you can use this index to efficiently search on the appropriate conditions (model-size-id, model-size and model).
Actually, there is a certain access path called INDEX SKIN SCAN that may be used to search on non-first columns of a composite index, but MySQL does not support it AFAIK.
If you need to stick to your current design, you need to index the field and use queries like:
WHERE item_id LIKE #model || '%'
WHERE item_id LIKE #model || #size || '%'
WHERE item_id = #model || #size || #id
All these queries will use the index if any.
There is not need to put in into multiple queries.
I'm comfortable that you've designed your item_id to be searchable with a "Starts with" test. Indexes will solve that quickly for you.
I don't know MySQL, but in MSSQL having an index on a "Size" column that only has choices of S, M, L most probably won't achieve anything, the index won't be used because the values it contains are not sufficiently selective - i.e. its quicker to just go through all the data rather than "Find the first S entry in the index, now retrieve the data page for that row ..."
The exception is where the query is covered by the index - i.e. several parts of the WHERE clause (and indeed, all of them and also the SELECT columns) are included in the index. In this instance, however, the first field in the index (in MSSQL) needs to be selective. So put the column with the most distinct values first in the index.
Having said that if your application has a picklist for Size, Colour, etc. you should have those data attributes in separate columns in the record - and separate tables with lists of all the available Colours and Sizes, and then you can validate that the Colour / Size given to a Product is actually defined in the Colour / Size tables. Cuts down the Garbage-in / Garbage-out problem!
Your item_selected needs to be in a separate table so that it is "normalised". Don't store a delimited list in a single column, store it using individual rows in a separate table
Thus your USERS table will contain user_id & username
Your, new, items_collected table will contains user_id & item_id (and possibly also Date Purchased or Invoice Number)
You can then say "What did Alex buy" (your design has that) and also "Who bought Ss001" (which, in your design, would require ploughing through all the rows in your USERS table and splitting out the items_collected to find which ones contained Ss001 [1])
[1] Note that using LIKE wouldn't really be safe for that because you might have an item_id of "Ss001XXX" which would match WHERE items_collected LIKE '%Ss001%'