Join tables vs JSON data in MySQL - php

I am wondering which is the best approach to handle relationships in my data.
My person table currently stores 'name' and 'url' for multiple instances of the following (column names changed for simplicity):
Favourite books
Favourite websites
Favourite shops
Favourite places
So, for each of the above, I have a column in the person table containing a JSON string. For example, for a single person, I might have the following in the 'favourite_shops' column:
[{"name":"john lewis", "url":"http://www.johnlewis.com"},{"name":"tesco", "url":"http://www.tesco.co.uk"}]
And the following in the 'favourite_websites' column
[{"name":"bbc", "url":"http://www.bbc.co.uk"},{"name":"the guardian", "url":"http://www.guardian.co.uk"}]
Obviously these JSON strings can contain many more items, and this is repeated for the other two columns - 'favourite_books' and 'favourite_places'.
So, one query to the database can return all the data I need. But I must then loop through the JSON in AngularJS for display on the front end.
My question is this - would it be beneficial (especially due to the front end processing required), to have 4 join tables instead of the columns listed above?
Or perhaps a single join table (as the four share the same properties, name and url) with an additional column indicating which of the four the row refers to?
Things to add:
I am using Laravel 5.2 and Eloquent
It is very unlikely I will ever need to display all 'persons' AND their related data at once, the view will be one person at a time.
It is also very unlikely I would ever need to query for specific items (e.g. a 'favourite_shops' entry named 'john lewis') - the data is supplementary to a person.
The relationship between a person and the four options is one-to-many - persons will not need to share the data; any matching entries are simply stored as additional rows. The actual columns I am using are highly unlikely to ever need data shared between persons.
I am building the above as a RESTful api, so all data will be json_encoded.
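For reference, the single-join-table variant mentioned in the question could be sketched roughly like this. This is only an illustration (using SQLite in place of MySQL, and with assumed table/column names such as `persons` and `favourites`), not the asker's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE persons (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- One table for all four favourite types, discriminated by a 'type' column
CREATE TABLE favourites (
    id        INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES persons(id),
    type      TEXT NOT NULL,   -- 'book', 'website', 'shop' or 'place'
    name      TEXT NOT NULL,
    url       TEXT
);
""")
conn.execute("INSERT INTO persons (id, name) VALUES (1, 'alice')")
conn.executemany(
    "INSERT INTO favourites (person_id, type, name, url) VALUES (?, ?, ?, ?)",
    [(1, 'shop', 'john lewis', 'http://www.johnlewis.com'),
     (1, 'shop', 'tesco', 'http://www.tesco.co.uk'),
     (1, 'website', 'bbc', 'http://www.bbc.co.uk')])

# One query still fetches everything for a single person; grouping by type
# in application code replaces the per-column JSON decode
rows = conn.execute(
    "SELECT type, name, url FROM favourites WHERE person_id = ?", (1,)).fetchall()
by_type = {}
for t, name, url in rows:
    by_type.setdefault(t, []).append({"name": name, "url": url})
print(by_type)
```

The grouped dictionary can then be `json_encode`d for the API response, much like the original JSON columns.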

Related

Many to many vs one row [duplicate]

(Closed as a duplicate of "Many database rows vs one comma separated values row".)
I'm interested how and why many to many relationship is better than storing the information in one row.
Example: I have two tables, Users and Movies (very big data). I need to establish a relationship "view".
I have two ideas:
1. Make another column in the Users table called "views", where I will store the ids of the movies this user has viewed in a string, for example: "2,5,7...". Then I will process this information in PHP.
2. Make a new table users_movies (many to many), with columns user_id and movie_id. A row with user_id=5 and movie_id=7 means that user 5 has viewed movie 7.
I'm interested in which of these methods is better and WHY. Please consider that the data is quite big.
The second method is better in just about every way. Not only will you utilise your DB's indexes to find records faster, it will also make modifications far easier.
Approach 1) could answer the question "Which movies has user X viewed?" with an SQL condition like "...FIND_IN_SET(movie_id, user_movielist)...". But the other way round ("Which users have viewed movie X?") won't work at the SQL level.
That's why I would always go for approach 2): a clear, normalised structure in which both directions are simple joins.
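The point that both directions become simple joins can be demonstrated concretely. A minimal sketch (SQLite used as a stand-in for MySQL; the sample names and ids are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE users_movies (
    user_id  INTEGER REFERENCES users(id),
    movie_id INTEGER REFERENCES movies(id)
);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(5, 'eve'), (6, 'bob')])
conn.executemany("INSERT INTO movies VALUES (?, ?)", [(7, 'Alien'), (8, 'Heat')])
conn.executemany("INSERT INTO users_movies VALUES (?, ?)",
                 [(5, 7), (5, 8), (6, 7)])

# Direction 1: which movies has user 5 viewed?
viewed = [t for (t,) in conn.execute(
    "SELECT m.title FROM movies m "
    "JOIN users_movies um ON um.movie_id = m.id WHERE um.user_id = 5")]

# Direction 2: which users have viewed movie 7? Same shape, just as simple.
viewers = [n for (n,) in conn.execute(
    "SELECT u.name FROM users u "
    "JOIN users_movies um ON um.user_id = u.id WHERE um.movie_id = 7")]
print(viewed, viewers)
```

With the comma-separated-string design, only the first direction is even expressible without string-splitting tricks.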
It's all about your needs. If you need performance then you must accept redundancy of the information and add a column. If your main goal is to respect the normalisation paradigm then you should not have redundancy at all.
When I have to make this kind of choice I try to weigh the space cost of the redundancy against the frequency and performance of the queries of interest.
A few more thoughts.
In your first situation, if you look up a particular user you can easily get the list of ids for the films they have seen. But you would then need a separate query to get details such as the titles of those movies. This might be one query using IN with the list of ids, or one query per film id. Either way it is inefficient and clunky.
With MySQL there is a possible fudge to join in this situation using the FIND_IN_SET() function (although a downside is that you are straying into non-standard SQL). You could join your table of films to the users using ON FIND_IN_SET(film.id, users.film_id) > 0. However, this is not going to use an index for the join, and it involves a function (which, while quick for what it does, will be slow when performed on thousands of rows).
If you wanted to find all the users who had viewed any film a particular user had viewed, then it is a bit more difficult. You can't just use FIND_IN_SET, as it requires a single string and a comma-separated list. As a single query you would need to join the particular user to the film table to get a lot of intermediate rows, and then join that back against the users again (using FIND_IN_SET) to find the other users.
There are ways in SQL to split up a comma separated list of values, but they are messy and anyone who has to maintain such code will hate it!
These are all fudges. With the 2nd solution these things are easy to do, and any resulting joins can easily use indexes (possibly the whole queries can be satisfied from indexes without touching the actual data).
A further issue with the first solution is data integrity. You will have to manually check that a film doesn't appear twice for a user (with the 2nd solution this can easily be enforced using a unique key). You also cannot add a foreign key to ensure that any film id stored for a user actually exists. Further, you will have to manually ensure that nothing inserts a stray character string into your delimited list of ids.
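The integrity points above (no duplicates via a unique key, no dangling film ids via a foreign key) can be enforced declaratively with the join-table design. A sketch, using SQLite in place of MySQL and invented table names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this pragma; MySQL/InnoDB enforces FKs by default
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE films (id INTEGER PRIMARY KEY);
CREATE TABLE users_films (
    user_id INTEGER NOT NULL REFERENCES users(id),
    film_id INTEGER NOT NULL REFERENCES films(id),
    UNIQUE (user_id, film_id)   -- a film cannot appear twice for a user
);
""")
conn.execute("INSERT INTO users VALUES (1)")
conn.execute("INSERT INTO films VALUES (10)")
conn.execute("INSERT INTO users_films VALUES (1, 10)")

duplicate_rejected = missing_film_rejected = False
try:
    conn.execute("INSERT INTO users_films VALUES (1, 10)")   # duplicate pair
except sqlite3.IntegrityError:
    duplicate_rejected = True
try:
    conn.execute("INSERT INTO users_films VALUES (1, 999)")  # film id does not exist
except sqlite3.IntegrityError:
    missing_film_rejected = True
print(duplicate_rejected, missing_film_rejected)
```

Both bad inserts fail at the database level, with no manual checking code in PHP.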

When working with tick-boxes, should I use a serialized array or a separate database table?

I'm writing a lesson observation system for the school I work at.
The database structure looks like this:
I'm currently developing the input form for the Observations table, and quite a few of its fields require tick-boxes. For example, the Focus field can be any combination of twelve options; the Positive and Negative fields can each be any combination of nearly twenty options.
In the image above, I've used VARCHAR to allow for a serialize()'d array.
However, I'm starting to wonder if this is the best route to go down, especially considering I may want to do some semi-complex analysis of the data, such as finding the top 5 staff for a specific positive attribute such as behaviour (which would involve counting the number of behaviour attributes each member of staff had under the Positive field in the Observations table).
Two questions here:
Should I be using extra tables? For Focus I would imagine the first table to have three fields - ID (key and auto-increment), Observation_ID and Focus_ID. The Focus_ID would correlate with another table called focii or something which would simply have two fields - Focus_ID and Title. I would need two tables for Focus and three tables for Positive/Development (same options but different logging).
If I use these extra database tables, what would my SQL statement look like when trying to retrieve the information from the Observations table including all associated focuses, positives and developments?
Thanks in advance,
Yes, you should always normalise in this kind of situation. Storing serialised data is hellish if you want to do calculations or even modifications. (How are you going to remove the 3rd element?)
Example of a query:
SELECT
o.id,
COUNT(f.id) AS number_of_focusses
FROM
observations o
INNER JOIN focuses f
ON f.observation_id = o.id
GROUP BY o.id
This would give you the number of 'focus records' belonging to each observation.
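A quick way to sanity-check that query is to run it against a toy data set (SQLite here as a stand-in for MySQL; the row counts are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE observations (id INTEGER PRIMARY KEY);
CREATE TABLE focuses (
    id             INTEGER PRIMARY KEY,
    observation_id INTEGER REFERENCES observations(id)
);
""")
conn.executemany("INSERT INTO observations VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO focuses (observation_id) VALUES (?)",
                 [(1,), (1,), (1,), (2,)])

# The query from the answer: count focus rows per observation
counts = dict(conn.execute("""
    SELECT o.id, COUNT(f.id) AS number_of_focusses
    FROM observations o
    INNER JOIN focuses f ON f.observation_id = o.id
    GROUP BY o.id
"""))
print(counts)
```

Note that qualifying the grouping column as o.id avoids an "ambiguous column" error, since both tables have an id column.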

Many types of data - One table vs multiple tables

I am working on a web application whose main functionality will be to present some data to the user. However, there are several types of data, and each of them has to be presented in a different way.
For example I have to list 9 results - 3 books, 3 authors and 3 files.
Book is described with (char)TITLE, (text)DESCRIPTION.
Author is described with (char)TITLE, (char)DESCRIPTION.
File is described with (char)URL.
Moreover, every type has fields like ID, DATE, VIEWS etc.
Book and Author are presented with simple HTML code; File uses an external reader embedded in the website.
Should I build three different tables and use JOINs when getting this data, or build one table and store all the types in there? Which approach is more efficient?
Additional info - there are going to be really huge numbers of records.
The logical way of doing this is to keep things separate, following the 3NF rules of database design. This gives more flexibility when retrieving different kinds of results, especially when there is a huge amount of data. Putting everything in a single table is bad DB practice.
That depends on the structure of your data.
If you have 1:1 relationships, say one book has one author, you can put the records in one row. If one book has several authors, or one author has several books, you should set up separate tables books and authors and link them with a table author_has_books holding both foreign keys. This way you won't store duplicate data and you avoid inconsistencies.
More information about db normalization here:
http://en.wikipedia.org/wiki/Database_normalization
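The author_has_books layout described above might be sketched as follows (SQLite stand-in; the sample books and authors are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books   (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE authors (id INTEGER PRIMARY KEY, name  TEXT);
-- Link table holding both foreign keys
CREATE TABLE author_has_books (
    author_id INTEGER REFERENCES authors(id),
    book_id   INTEGER REFERENCES books(id)
);
""")
conn.execute("INSERT INTO books VALUES (1, 'Good Omens')")
conn.executemany("INSERT INTO authors VALUES (?, ?)",
                 [(1, 'Terry Pratchett'), (2, 'Neil Gaiman')])
# One book, two authors: two link rows, no duplicated book data
conn.executemany("INSERT INTO author_has_books VALUES (?, ?)",
                 [(1, 1), (2, 1)])

authors_of = [n for (n,) in conn.execute(
    "SELECT a.name FROM authors a "
    "JOIN author_has_books ab ON ab.author_id = a.id "
    "WHERE ab.book_id = 1 ORDER BY a.name")]
print(authors_of)
```

Each fact (the book's title, each author's name) is stored exactly once, so updating it in one place keeps the data consistent.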
Separate them and create a relationship. That way, when you start to get a lot of data, you'll notice a performance boost because you are only fetching 3 fields at a time (i.e. when you are just looking at a book) instead of 7.

MySQL query - getting a list of 'extra' information from JOIN table similar to a nested array

This was a bit difficult to explain in the title, but I should be able to here. I have two tables that look like this:
Table 1:
-id
-created
-last_modified
-title
Table 2:
-id
-parent_id
-type
-value
The structure is somewhat akin to the following: an item from table one can have many attributes associated with it. Each attribute is listed in the second table, with a reference back to the original.
The issue I have, is that I want to be able to get a list of records from table 1 to display in a table (using pagination), but also want to be able to retrieve all the attributes from Table 2 associated with each Table 1 record at the same time, so that I might have the following:
(Table 1) ID1 [Title] has attributes x, y, z
(Table 1) ID2 [Title] has attributes x, y, z
(Table 1) ID3 [Title] has attributes x, y, z
and so on. Ideally I would like to be able to associate each attribute with its type as well... currently, with a join, I receive multiple rows for the same record (with the joined data different each time), and grouping them removes some of the joined data entirely.
Essentially what I'm after is an array of attributes to be returned for each record from Table 1 (in some sort).
I'm thinking of using MongoDB for this project as I know I can do it simply with that, but I'm trying to do it with MySQL as that is what the existing platform is using.
I hope I've made sense with what I'm asking :) Any help would be appreciated!
Dan
Sounds like more of a display problem. The joined query is the best way to go. You'd then just have a simple loop in your retrieve/display code that checks for when you transition from one Table 1 record to another and adjusts the output as necessary.
You could retrieve all the child records as a single field using MySQL's GROUP_CONCAT() function, but then you just end up with (basically) a monolithic string of concatenated data, not the individual records the joined query/display loop will provide. GROUP_CONCAT also has a length limit on how much data it will return (1024 bytes by default), which can easily be hit with large data sets.
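For comparison, here is what the GROUP_CONCAT approach looks like on the two-table layout from the question (SQLite also supports group_concat; sample data invented). Note how the individual attribute rows collapse into one string, illustrating the caveat above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE table2 (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES table1(id),
    type      TEXT,
    value     TEXT
);
""")
conn.execute("INSERT INTO table1 VALUES (1, 'Widget')")
conn.executemany("INSERT INTO table2 (parent_id, type, value) VALUES (?, ?, ?)",
                 [(1, 'colour', 'red'), (1, 'size', 'large')])

# One row per table1 record, attributes flattened into a single string
row = conn.execute("""
    SELECT t1.title, GROUP_CONCAT(t2.type || '=' || t2.value) AS attrs
    FROM table1 t1
    JOIN table2 t2 ON t2.parent_id = t1.id
    GROUP BY t1.id
""").fetchone()
print(row)
```

The application then has to split the attrs string back apart, which is exactly the work the plain joined query avoids.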

How to store multi-valued profile details?

I have many fields which are multi-valued and I am not sure how to store them. If I apply 3NF then there are many tables. For example: Nationality.
A person can have single or dual nationality. If dual, this means it is a one-to-many relationship, so I create a user table and a user_nationality table (there is already a nationality lookup table). Or I could put both nationalities into the same row, like "American, German", then unserialize it at run-time. But then I don't know if I can search this - if I search for only German people, will it show up?
This is an example; I have over 30 fields which are multi-valued, so I assume I will not be creating 61 tables for this? 1 user table, 30 lookup tables to hold each multi-valued item's lookups, and 30 tables to hold the user_ values for the multi-valued items?
You must also keep in mind that some multi-valued fields group together, like "colleges I have studied at" - it has a group of fields such as college name, degree type, timeline, etc., and a user can have 1 to many of these. So I assume I can create a separate table for this, like user_education with these fields. But let's assume one of these fields is itself a fixed-list multi-valued field, like "college campuses I visited" - then we end up in a never-ending chain of FK tables, which isn't a good design for social networks, where the goal is to put as much data into as few tables as possible for performance.
If you need to keep using SQL, you will need to create these tables. You will need to decide how far you are willing to go, and impose limitations on the system (such as only being able to specify one campus).
As far as nationality goes, if you will only require two nationalities (worst-case scenario), you could consider a second nationality field (Nationality and Nationality2) to account for this. Of course this only applies to fields with a small maximum number of different values.
If your user table has a lot of related attributes, then one possibility is to create a single attributes table with rows like (user_id, attribute_name, attribute_value). You can store all your attributes in one table, fetch the attributes for given users, and also search by attribute names and values.
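The (user_id, attribute_name, attribute_value) table suggested above, often called an entity-attribute-value layout, might be sketched like this (SQLite stand-in; sample data invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE user_attributes (
    user_id         INTEGER NOT NULL,
    attribute_name  TEXT    NOT NULL,
    attribute_value TEXT,
    UNIQUE (user_id, attribute_name, attribute_value)
)""")
conn.executemany(
    "INSERT INTO user_attributes VALUES (?, ?, ?)",
    [(1, 'nationality', 'American'),
     (1, 'nationality', 'German'),
     (2, 'nationality', 'German')])

# All attributes for one user
user1_attrs = conn.execute(
    "SELECT attribute_name, attribute_value FROM user_attributes "
    "WHERE user_id = 1").fetchall()

# Search by value: which users are German? (the search the CSV column can't index)
germans = [u for (u,) in conn.execute(
    "SELECT user_id FROM user_attributes "
    "WHERE attribute_name = 'nationality' AND attribute_value = 'German' "
    "ORDER BY user_id")]
print(user1_attrs, germans)
```

One table replaces the 30 per-field link tables, at the cost of losing per-field typing and foreign-key checks, the usual EAV trade-off.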
The simple solution is to stop using a SQL table. This is what NoSQL is designed for. Check out CouchDB or Mongo. There, each value can be stored as a full structure, so this whole problem could be reduced to a single (not-really-)table.
The downside of pretty much any SQL-based solution is that it will be slow: either slow when fetching a single user (a massive JOIN statement won't execute quickly), or slow when searching (if you decide to store these values serialized).
You might also want to look at ORM which will map your objects to a database automatically.
http://en.wikipedia.org/wiki/List_of_object-relational_mapping_software#PHP
"This is an example; I have over 30 fields which are multi-valued, so I assume I will not be creating 61 tables for this?"
You're right that 61 is the maximum number of tables, but in reality it'll likely be fewer; take your own example:
"colleges i have studied at"
"college campuses i visited"
In this case you'll probably only have one "college" table, so there would be four tables in this layout, not five.
I'd say don't be afraid of using lots of tables if the data set you're modelling is large - just make sure you keep an up-to-date ERD so you don't get lost! Also, don't get too caught up in the "link table" paradigm - "link tables" can be entities in their own right. For example, you could think of the "colleges I have studied at" link table as a "college enrolments" table instead, give it its own primary key, and store each of the times you pay your course fees as rows in a (linked) "college enrolment payments" table.
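That "link table as entity" idea can be sketched like this (SQLite stand-in; all table names and figures are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users    (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE colleges (id INTEGER PRIMARY KEY, name TEXT);
-- The link table is an entity in its own right, with its own primary key...
CREATE TABLE college_enrolments (
    id         INTEGER PRIMARY KEY,
    user_id    INTEGER REFERENCES users(id),
    college_id INTEGER REFERENCES colleges(id)
);
-- ...which lets further child tables hang off it
CREATE TABLE college_enrolment_payments (
    id           INTEGER PRIMARY KEY,
    enrolment_id INTEGER REFERENCES college_enrolments(id),
    amount       REAL
);
""")
conn.execute("INSERT INTO users VALUES (1, 'dan')")
conn.execute("INSERT INTO colleges VALUES (1, 'Springfield College')")
conn.execute("INSERT INTO college_enrolments VALUES (100, 1, 1)")
conn.executemany(
    "INSERT INTO college_enrolment_payments (enrolment_id, amount) VALUES (?, ?)",
    [(100, 250.0), (100, 250.0)])

# Total fees paid for one enrolment
total = conn.execute(
    "SELECT SUM(amount) FROM college_enrolment_payments "
    "WHERE enrolment_id = 100").fetchone()[0]
print(total)
```

Because the enrolment row has its own key, payments attach to the specific user-college relationship rather than to the user or the college alone.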
