For the purpose of this question, let's assume we have a many-to-many relationship set up using three tables:
courses - with an id and name
students - with id and name
enrollment - with course_id and student_id
In Rails, if you set up a has_and_belongs_to_many association, you can do
course.student_ids = [1,2,3,4,5]
And Rails says that it will add / delete ids as necessary. I would like to do something similar in PHP. Something like
set_courses($student_id, array(1,2,3,4,5));
What I was wondering is if there is a good way to implement this efficiently in PHP. I can think of a way to do it in 3 queries (one to get current ids, one to delete unnecessary ones, and one to add new ones). Is there a way to do this in one or even two queries?
Thanks!
If I'm understanding you correctly, then you can do it in 2:
delete all current enrollments
insert enrollments
I'd use transactions, though, to make sure both are executed.
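The two-step approach above can be sketched as follows. This is a minimal illustration in Python with the standard sqlite3 module rather than PHP/MySQL, purely so it runs anywhere; set_courses mirrors the function name proposed in the question, while the connection argument and the sample data are assumptions made for the example.

```python
import sqlite3

def set_courses(conn, student_id, course_ids):
    """Make the student's enrollments match course_ids using two statements."""
    # "with conn" wraps both statements in one transaction:
    # commit if both succeed, rollback if either raises.
    with conn:
        conn.execute("DELETE FROM enrollment WHERE student_id = ?", (student_id,))
        conn.executemany(
            "INSERT INTO enrollment (course_id, student_id) VALUES (?, ?)",
            [(cid, student_id) for cid in set(course_ids)],
        )

# Hypothetical sample data: student 7 starts in courses 1, 2 and 9.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enrollment ("
    " course_id INTEGER, student_id INTEGER,"
    " PRIMARY KEY (course_id, student_id))"
)
conn.executemany("INSERT INTO enrollment VALUES (?, ?)", [(1, 7), (2, 7), (9, 7), (1, 8)])
conn.commit()

set_courses(conn, 7, [1, 2, 3, 4, 5])
current = [
    row[0]
    for row in conn.execute(
        "SELECT course_id FROM enrollment WHERE student_id = ? ORDER BY course_id", (7,)
    )
]
print(current)
```

The trade-off is that every call rewrites all of the student's rows, even the ones that did not change; for small lists that is usually acceptable in exchange for the simpler logic.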
I'm interested in how and why a many-to-many relationship is better than storing the information in one row.
Example: I have two tables, Users and Movies (very big data). I need to establish a "view" relationship.
I have two ideas:
Make another column in the Users table called "views", where I will store the ids of the movies this user has viewed, in a string, for example: "2,5,7...". Then I will process this information in PHP.
Make a new table users_movies (many-to-many), with columns user_id and movie_id. A row with user_id=5 and movie_id=7 means that user 5 has viewed movie 7.
I'm interested in which of these methods is better and WHY. Please consider that the data is quite big.
The second method is better in just about every way. Not only will you utilize your DB's indexes to find records faster, it will also make modification far, far easier.
Approach 1) could answer the question "Which movies has user X viewed?" with SQL like "... FIND_IN_SET(movie_id, user_movielist) ...". But the other way round ("Which users have viewed movie X?") won't work well on an SQL basis.
That's why I would always go for approach 2): a clear, normalized structure where both directions are simple joins.
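To make "both ways are simple joins" concrete, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL. The users_movies table follows the question; the name and title columns, the sample rows and the two helper functions are assumptions for illustration only.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE users  (id INTEGER PRIMARY KEY, name  TEXT);
CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE users_movies (
    user_id  INTEGER NOT NULL,
    movie_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, movie_id)
);
INSERT INTO users  VALUES (5, 'ann'), (6, 'bob');
INSERT INTO movies VALUES (2, 'Alien'), (7, 'Brazil');
INSERT INTO users_movies VALUES (5, 2), (5, 7), (6, 7);
""")

def movies_viewed_by(user_id):
    """Which movies has this user viewed?"""
    rows = db.execute(
        "SELECT m.title FROM users_movies AS um"
        " JOIN movies AS m ON m.id = um.movie_id"
        " WHERE um.user_id = ? ORDER BY m.title",
        (user_id,),
    )
    return [r[0] for r in rows]

def viewers_of(movie_id):
    """Which users have viewed this movie? Same shape, opposite direction."""
    rows = db.execute(
        "SELECT u.name FROM users_movies AS um"
        " JOIN users AS u ON u.id = um.user_id"
        " WHERE um.movie_id = ? ORDER BY u.name",
        (movie_id,),
    )
    return [r[0] for r in rows]

print(movies_viewed_by(5), viewers_of(7))
```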
It all depends on your needs. If you need performance, then you must accept some redundancy in the information and add a column. If your main goal is to respect the normalization paradigm, then you should not have redundancy at all.
When I have to make this type of choice, I try to weigh the space lost to redundancy against the frequency of the query of interest and its performance.
A few more thoughts.
In your first situation, if you look up a particular user you can easily get the list of ids for the films they have seen. But you would then need a separate query to get the details, such as the titles of those movies. This might be one query using IN with the list of ids, or one query per film id. This would be inefficient and clunky.
With MySQL there is a possible fudge to join in this situation using the FIND_IN_SET() function (although a downside of this is that you are straying into non-standard SQL). You could join your table of films to the users using ON FIND_IN_SET(film.id, users.film_id) > 0. However, this is not going to use an index for the join, and it involves a function (which, while quick for what it does, will be slow when performed on thousands of rows).
If you wanted to find all the users who had viewed any film a particular user had viewed, then it is a bit more difficult. You can't just use FIND_IN_SET, as it requires a single string and a comma-separated list. As a single query, you would need to join the particular user to the film table to get a lot of intermediate rows, and then join that back against the users again (using FIND_IN_SET) to find the other users.
There are ways in SQL to split up a comma separated list of values, but they are messy and anyone who has to maintain such code will hate it!
These are all fudges. With the 2nd solution these queries are easy to do, and any resulting joins can easily use indexes (and possibly the whole queries can just use indexes without touching the actual data).
A further issue with the first solution is data integrity. You will have to manually check that a film doesn't appear twice for a user (with the 2nd solution this can easily be enforced using a unique key). You also cannot just add a foreign key to ensure that any film id for a user does actually exist. Further, you will have to manually ensure that nothing enters a character string in your delimited list of ids.
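The integrity checks mentioned above can be pushed into the schema itself. The sketch below uses Python's sqlite3 module as a stand-in for MySQL (note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON); the table layout, sample ids and the try_insert helper are assumptions for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: FK checks are off by default
con.executescript("""
CREATE TABLE users  (id INTEGER PRIMARY KEY);
CREATE TABLE movies (id INTEGER PRIMARY KEY);
CREATE TABLE users_movies (
    user_id  INTEGER NOT NULL REFERENCES users(id),
    movie_id INTEGER NOT NULL REFERENCES movies(id),
    UNIQUE (user_id, movie_id)           -- a film cannot appear twice for a user
);
INSERT INTO users  VALUES (5);
INSERT INTO movies VALUES (7);
""")

def try_insert(user_id, movie_id):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with con:  # commit on success, rollback on error
            con.execute(
                "INSERT INTO users_movies (user_id, movie_id) VALUES (?, ?)",
                (user_id, movie_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert(5, 7),    # valid row
    try_insert(5, 7),    # duplicate: rejected by the unique key
    try_insert(5, 999),  # unknown movie id: rejected by the foreign key
]
print(results)
```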
I have been looking for some optimization tips, since I'm building an RPG modification that uses MySQL to store data via PHP.
I'm using a single table to store all user information in columns, keyed by each user's unique ID, and I have to store a lot of data for each user: weapons and other information.
I'm using explode and implode to store the weapons, for example, in one column of type 'text'. I don't know if that's good practice, and I don't know if I will have performance problems once thousands of players are issuing tons of UPDATE, SELECT, etc. requests.
I read that a junction table may be better for storing the weapons and all that information, but I don't know whether retrieving the information that way would be any better than the explode method.
I mean, I would store all the weapons in a different table, each weapon with its own information (each weapon has several pieces of information, which would become different columns; currently I use multiple explode calls for that inside the main explode) and the user who owns that weapon, rather than having them all in one column.
There can be at least 100 items to store. I don't know whether keeping 100 records per user in a different table and fetching all of them every time is better than just reading the one column and using explode.
Also, I want to improve my skills and knowledge so I can build the best-performing MySQL database I can.
I hope somebody can tell me something.
Thanks, and sorry for my English grammar.
It is almost always best practice to normalize your table data. There are some exceptions to this rule (especially in very high volume databases), but you probably do not need to worry about those exceptions until you get to the point of first understanding how to properly normalize and index your tables.
Typically, try to arrange your tables in a way that mimics real-world objects and their relations to each other.
So, in your case you have users - that is one table. Each user might have multiple weapons, so you now have a weapons table. Since multiple different users might have the same weapon and each user might have multiple weapons, you have a many-to-many relationship between them, so you should have a table "users_weapons" or similar that does nothing but relate user ids to weapon ids.
Now say the users can all have armor. So now you add an armor table and a users_armor table (as this is likely many-to-many as well).
Just think through the different aspects of your game and try to understand the relationships between them. Make sure you can model these relationships in database tables before you even bother writing any code to actually implement the functionality.
Yes, it is better to use several tables instead of one. It's better for DB performance, easier to understand, easier to maintain and simpler to use as well.
Let's suppose that one user has several weapons, each with multiple features (which are not unique among all weapons), and that in one place in your game you just need to know the value of one specific feature:
Doing it your way, you'll need to find the user's row in the users table, fetch one column, and explode it several times before you have your value. It gets even more complicated if you want to change the value and save it back.
A better way is to have one table for user details (login, password, email, etc.), another table that keeps the users' weapons (name of the weapon, maybe an image), and a table that holds all the features and special powers of the weapons. You could keep all possible features of all weapons in an extra table as well. This way, if you already know the user id from the users table, you only have to join 2 tables in your SQL query to get the value of a feature of a specific weapon of that user.
Example pseudo schema of tables:
users
    user_id
    user_name
    password
    email
weapons
    weapon_id
    user_id
    weapon_name
    image
weapons_features
    feature_id
    weapon_id
    feature_name
    feature_value
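As a rough illustration of the two-table join described above, here is the pseudo schema turned into runnable form with Python's sqlite3 module (a stand-in for MySQL). The column types, the sample rows and the feature_value helper are assumptions; only the table and column names come from the schema.

```python
import sqlite3

game = sqlite3.connect(":memory:")
game.executescript("""
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY, user_name TEXT, password TEXT, email TEXT
);
CREATE TABLE weapons (
    weapon_id INTEGER PRIMARY KEY, user_id INTEGER, weapon_name TEXT, image TEXT
);
CREATE TABLE weapons_features (
    feature_id INTEGER PRIMARY KEY, weapon_id INTEGER,
    feature_name TEXT, feature_value INTEGER
);
INSERT INTO users   VALUES (1, 'ann', 'hash', 'ann@example.com');
INSERT INTO weapons VALUES (10, 1, 'sword', NULL), (11, 1, 'bow', NULL);
INSERT INTO weapons_features VALUES
    (100, 10, 'damage', 12), (101, 11, 'damage', 7), (102, 11, 'range', 30);
""")

def feature_value(user_id, weapon_name, feature_name):
    """Look up one feature of one weapon owned by a known user (joins 2 tables)."""
    row = game.execute(
        "SELECT f.feature_value"
        " FROM weapons AS w"
        " JOIN weapons_features AS f ON f.weapon_id = w.weapon_id"
        " WHERE w.user_id = ? AND w.weapon_name = ? AND f.feature_name = ?",
        (user_id, weapon_name, feature_name),
    ).fetchone()
    return row[0] if row else None

print(feature_value(1, "sword", "damage"))
```

Changing a value is then a single UPDATE on weapons_features instead of an explode, edit, implode round trip.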
And if you really want to keep some ordered data in a text field in the database, encode it as JSON or serialize it. This way you don't have to explode and implode it!
As everyone has said, you should typically start from a normalized database structure.
If performance is ok, then great, nothing to do.
If not, you can try many different things:
Find and optimize the queries that run slowly.
Denormalize selectively - sometimes joins kill performance.
Change the data access pattern used in the application.
Store data in the file system or use a NoSQL/polyglot persistence solution.
I have several models that are related to one another through a HABTM relationship.
Workouts has many Exercises
Exercises has many Workouts
Exercises has one Logs
Users has many Exercises_Workouts
All of these table relations are set in one table
What I would like to do:
As you can see, user_id and workout_id are not unique but exercise_id and log_id will always be unique.
I want to find the data for one user then all workouts and have it return all the exercises and their corresponding information as well as each exercise's log information.
Final output would look something like this.
I have tried several methods and none of them have returned positive results. I would also like to hear how someone much more experienced than myself would handle this situation. The only thing I can think of that would possibly get what I want is multiple SELECT statements.
Thank you for your help.
Cheers!
"All of these table relations are set in one table": What do you call that table? And that's not the usual way to define relationships.
"Workouts has many Exercises | Exercises has many Workouts | Exercises has one Logs | Users has many Exercises_Workouts": Workouts HABTM Exercises, (if logs is not connected to any other table, include its fields in exercises table), User hasMany Workouts.
"I have tried several methods and none of them have returned positive results. I would also like to hear how someone much more experienced than myself would handle this situation. The only thing I can think of that would possibly get what I want is multiple SELECT statements.": How familiar are you with SQL, PHP, and CakePHP? If you are new to all of those, it's kind of hard to explain how to do what you want. Show us what approaches you have used so far.
Thank you for your help. I ended up figuring out my own answer.
The reason all of these have to be in one table is that I need to be able to access a single ID that relates to all of these branches. Also, the branches are never updated, only referenced. What I ended up doing is using the Containable behavior. I actually found this answer while looking through my own code; I had a sneaking suspicion that I had come across this problem earlier.
The Containable behavior allowed me to go deeper into my associations than I was able to before.
Originally I was only pulling data from Workout->Exercise->Log, but I also needed LogDay (individual entries). So I used the Containable behavior to get that data as well as to remove other unnecessary data.
Thanks again!
I have many fields which are multi-valued and I'm not sure how to store them. If I go to 3NF, then there are many tables. For example: nationality.
A person can have single or dual nationality. Dual nationality means this is a one-to-many relationship, so I create a user table and a user_nationality table (there is already a nationality lookup table). Or I could put both nationalities into the same row, like "American, German", then unserialize it at run time. But then I don't know if I can search this. If I search for only German people, will this row show up?
This is just one example; I have over 30 fields which are multi-valued, so I assume I will not be creating 61 tables for this? 1 user table, 30 lookup tables to hold each multi-valued item's lookups and 30 tables to hold the user_ values for the multi-valued items?
You must also keep in mind that some multi-valued fields group together. For example, "colleges I have studied at" has a group of fields such as college name, degree type, timeline, etc., and a user can have one to many of these. So I assume I can create a separate table for this, like user_education, with these fields. But let's assume one of these fields is also a fixed-list multi-valued field, like "college campuses I visited"; then we end up with a never-ending chain of FK tables, which isn't a good design for social networks, as the goal is to put as much data into as few tables as possible for performance.
If you need to keep using SQL, you will need to create these tables. You will need to decide how far you are willing to go, and impose limitations on the system (such as only being able to specify one campus).
As far as nationality goes, if you will only require two nationalities (worst-case scenario), you could consider a second nationality field (Nationality and Nationality2) to account for this. Of course this only applies to fields with a small maximum number of different values.
If your user table has a lot of related attributes, then one possibility is to create one attributes table with rows like (user_id, attribute_name, attribute_value). You can store all your attributes in that one table, use it to fetch the attributes for given users, and also search by attribute names and values.
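A minimal sketch of such an attributes table, written with Python's sqlite3 module as a stand-in for the SQL database in question. The table name user_attributes, the index, the sample rows and both helper functions are assumptions for illustration; only the (user_id, attribute_name, attribute_value) row shape comes from the answer.

```python
import sqlite3

eav = sqlite3.connect(":memory:")
eav.executescript("""
CREATE TABLE user_attributes (
    user_id         INTEGER NOT NULL,
    attribute_name  TEXT    NOT NULL,
    attribute_value TEXT    NOT NULL,
    PRIMARY KEY (user_id, attribute_name, attribute_value)
);
-- Supports searching by attribute name and value.
CREATE INDEX idx_attr_lookup ON user_attributes (attribute_name, attribute_value);
INSERT INTO user_attributes VALUES
    (1, 'nationality', 'American'),
    (1, 'nationality', 'German'),
    (2, 'nationality', 'French'),
    (3, 'nationality', 'German'),
    (3, 'campus',      'North');
""")

def attributes_of(user_id):
    """Fetch every (name, value) pair stored for one user."""
    return eav.execute(
        "SELECT attribute_name, attribute_value FROM user_attributes"
        " WHERE user_id = ? ORDER BY attribute_name, attribute_value",
        (user_id,),
    ).fetchall()

def users_with(name, value):
    """Search users by attribute, e.g. everyone with German nationality."""
    rows = eav.execute(
        "SELECT user_id FROM user_attributes"
        " WHERE attribute_name = ? AND attribute_value = ? ORDER BY user_id",
        (name, value),
    )
    return [r[0] for r in rows]

print(users_with("nationality", "German"))
```

A user with dual nationality simply gets two rows, so the "search for only German people" case from the question is an ordinary WHERE clause. The cost of this layout is that every value is stored as text and per-attribute constraints are harder to express.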
The simple solution is to stop using an SQL table. This is what NoSQL is designed for. Check out CouchDB or Mongo. There, each value can be stored as a full structure - so this whole problem could be reduced to a single (not-really-)table.
The downside of pretty much any SQL-based solution is that it will be slow: either slow when fetching a single user (a massive JOIN statement won't execute quickly) or slow when searching (if you decide to store these values serialized).
You might also want to look at an ORM, which will map your objects to a database automatically.
http://en.wikipedia.org/wiki/List_of_object-relational_mapping_software#PHP
"This is an example, i have over 30 fields which are multi-valued, so i assume i will not be creating 61 tables for this?"
You're right that 61 is the maximum number of tables, but in reality it'll likely be fewer. Take your own example:
"colleges I have studied at"
"college campuses I visited"
In this case you'll probably only have one "college" table, so there would be four tables in this layout, not five.
I'd say don't be afraid of using lots of tables if the data set you're modelling is large - just make sure you keep an up-to-date ERD so you don't get lost! Also, don't get caught up too much in the "link table" paradigm - "link tables" can be "entities" in their own right. For example, you could think of the "colleges I have studied at" link table as a "college enrolments" table instead, give it its own primary key, and store each of the times you pay your course fees as rows in a (linked) "college enrolment payments" table.
I'm building a movies website... I need to display info about each movie, including genres, actors, and a lot of other info (like IMDB.com)...
I created a 'movies' table including an ID and some basic information.
For the genres I created a 'genres' table including 2 columns: ID and genre.
Then I use a 'genres2movies' table with two columns, movieID and genreID, to connect the genres and the movies tables...
This way, for example, if a movie has 5 different genres, I get the movieID in 5 different rows of the 'genres2movies' table. It's better than including the genre each time for each movie, but...
Is there a better way of doing this?
I need to do this also for actors, languages and countries so performance and database size is really important.
Thanks!!!
It sounds like you are following proper normalisation rules at the moment, which is exactly what you want.
However, you may find that if performance is a key factor you may want to de-normalise some parts of your data, since JOINs between tables are relatively expensive operations.
It's usually a trade-off between proper/full normalisation and performance.
You are on the right track. That's the way to do many-to-many relationships. Database size won't grow much because you use integers, and for speed you must set up correct indexes for those IDs. When writing SELECT queries, check out EXPLAIN - it helps to find the speed bottlenecks.
You're on exactly the right track - this is the correct, normalized, approach.
The only thing I would add is to ensure that your index on the join table (genres2movies) includes both the genre id and the movie id. It is generally worthwhile (depending upon the selects used) to define indexes in both directions, i.e. two indexes, one ordered genre-id, movie-id and the other movie-id, genre-id. This ensures that any range select on either genre or movie will be able to use an index to retrieve all the data it needs, without resorting to a full table scan or even having to access the table rows themselves.
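The two-index layout can be tried out as below. This sketch uses Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN as a stand-in; MySQL's EXPLAIN output looks different, and the index name idx_genre_movie and the plan helper are assumptions for illustration.

```python
import sqlite3

idx_db = sqlite3.connect(":memory:")
idx_db.executescript("""
CREATE TABLE genres2movies (
    movieID INTEGER NOT NULL,
    genreID INTEGER NOT NULL,
    PRIMARY KEY (movieID, genreID)   -- first index: ordered movie, then genre
);
-- Second index in the opposite order: genre, then movie.
CREATE INDEX idx_genre_movie ON genres2movies (genreID, movieID);
""")

def plan(sql, params=()):
    """Return SQLite's query plan for a statement as one readable string."""
    rows = idx_db.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan detail text

# All movies in a genre: can be answered from the (genreID, movieID) index.
by_genre = plan("SELECT movieID FROM genres2movies WHERE genreID = ?", (3,))
# All genres of a movie: can be answered from the primary key index.
by_movie = plan("SELECT genreID FROM genres2movies WHERE movieID = ?", (10,))

print(by_genre)
print(by_movie)
```

Because each index contains both columns, either lookup can be served from the index alone, which is the "not even access the table rows" case described above.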