Using JSON format to decrease JOINs in MySQL - PHP

I have many tables in my database. One example is the table fs_user; the following is an extract of its columns (the ones dealing with privacy settings):
4 columns from the table fs_user:
show_email_to
show_address_to
show_gender_to
show_interested_in_to
Like many social networks, I need not only to specify which data is private and which is public, but also which data is available to chosen users and which is not.
As I have about 30 fields like the 4 above, I think it would be bad to create one table for every field and make a many-to-many relation with the table fs_user.
This is why I got the idea of saving this data in JSON form in every column (of type TEXT), for example:
show_email_to => {1:'ALL',2:'BUT',3:'3'}
This means: show the email to all users, except the user whose id=3.
Another example:
show_email_to => {1:'NONE',2:'BUT',3:'3',4:'80',5:'10'}
This means no user will see the email except the users with id=3, id=80 and id=10.
Of course, the MySQL query will select this data, and PHP/JS will extract what I need from the JSON.
Another point is that sometimes a user wants to show data only to his friends, except for 3 of them.
This will do :
show_email_to => {1:'FRIENDS',2:'BUT',3:'3'}
This means that the email will be shown to all his friends, except the user with id=3.
My question is: how performant and flexible (for other uses) would this system be compared to the many-to-many solution (which requires spreading the data across many tables)?
Note: I already know that saving many elements in one column is bad practice, but here I think the JSON value can be considered a single object.

This is a good question. What you propose is, with respect, a very bad idea indeed if you're using any flavor of SQL. You are proposing to denormalize your tables in a way that will defeat every attempt to speed up searching or querying in the future.
What should you do instead? You could take a look at using an XML-centric DBMS like MarkLogic. It's capable of creating indexes that accelerate various XPath-style queries, so you would be able to search on relationships. If you do that, I hope you have a big budget.
Or, you could use normalized permission tables.
item_to_show (item id)
order (an integer specifying rule ordering, needed for this scheme)
recipient (user id)
isdenied (0 means recipient is allowed, 1 means she is denied)
In this table, the primary key is a compound key constructed of the first two columns.
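A minimal sketch of that table in MySQL (names are illustrative; note that order is a reserved word, so it is renamed rule_order here):

CREATE TABLE email_permission (
  item_to_show INT NOT NULL,      -- id of the item being shown
  rule_order   INT NOT NULL,      -- rule ordering
  recipient    INT NOT NULL,      -- user id of the viewer
  isdenied     TINYINT NOT NULL,  -- 0 = allowed, 1 = denied
  PRIMARY KEY (item_to_show, rule_order)
);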
I'm aware that you have many types of items. You assert that it's bad to have an extra table for each item type in your system. I don't agree that it's inherently bad. I believe your proposed solution is far worse.
You could arrange to give each item a unique id number to allow you to use a single permission table. See this question for an example of how to do that: Fastest way to generate 11,000,000 unique ids
Or you could have a single permission table with a type id.
item_to_show (item id)
item_type_to_show (item type id)
order (an integer specifying rule ordering, needed for this scheme)
recipient (user id)
isdenied (0 means recipient is allowed, 1 means she is denied)
In this case the primary key is the first three columns.
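Here is a hedged sketch of how a visibility check might run against such a table (assumptions: the table is named item_permission, rule order is stored as rule_order, recipient 0 stands for "all users", and the highest-order matching rule wins):

-- May viewer 42 see item 7 of type 3?
SELECT isdenied
FROM item_permission
WHERE item_to_show = 7
  AND item_type_to_show = 3
  AND recipient IN (0, 42)   -- 42 being the viewer whose access is checked
ORDER BY rule_order DESC
LIMIT 1;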
Or, you can do what you don't want to do and have a separate permission table for each item type.

You say, "As I have about 30 fields like the 4 above, I think it would be bad to create one table for every field and make a many-to-many relation with the table fs_user"
I agree with the first part of your statement only. You only need one table. For the sake of a name, I'll call it ShowableItems. Fields would be ShowableItemId (PK) and Item. Some of these items would be email, gender, address, etc.
Then you need a many-to-many table that shows which items can be shown to whom. Your three fields would be the id of the person who owns the item, the showable item id, and the id of the person who can see it.
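A minimal sketch of that design (ShowableItems follows the answer; the many-to-many table's name and column types are assumptions):

CREATE TABLE ShowableItems (
  ShowableItemId INT AUTO_INCREMENT PRIMARY KEY,
  Item VARCHAR(50) NOT NULL          -- 'email', 'gender', 'address', ...
);

CREATE TABLE ItemVisibility (        -- hypothetical name
  OwnerId        INT NOT NULL,       -- the user who owns the item
  ShowableItemId INT NOT NULL,
  ViewerId       INT NOT NULL,       -- the user who may see it
  PRIMARY KEY (OwnerId, ShowableItemId, ViewerId)
);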

Related

MySQL: enumeration tables - use only one or many?

Until now, I have always created separate enumeration tables with the columns ID and CODE (money_type, payment_type, shipping_type, ...).
Is it better to make every enumeration table separately, or only one with columns ID, CODE, TYPE, where TYPE would be "money", "payment", "shipping", ...?
In my system there will be at least 50 enumeration tables.
From a purely technical point of view, there is little against creating a giant lookup table for such purposes. However, there are some valid business reasons against doing so:
Security. You may not want the same user to be able to edit all lookup data. If you store all of them in a single table, then restricting access to certain records only can be difficult. Obviously, you can do that with views, but if you create as many views as lookup types, then what's the point of having a single table in the first place?
Configurability. In many cases these lookup tables may hold more data than just an id and a human-readable description, and the additional data would be specific to that given parameter only. For example, in a tax code lookup table you may be able to specify that a given tax code is applicable to domestic transactions only, and not to cross-border ones (such as VAT). If you have a giant table holding all configuration data, then such customization is a lot more difficult.
Obviously, if the purpose of having these lookup tables is to provide a human readable description (e.g. a definitions table you can use for GUI), then you can have a single table for that. Otherwise I would go for the 1 lookup table per parameter approach, even if you need to have 50 lookup tables.
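For concreteness, the two alternatives look roughly like this (a sketch; names are illustrative):

-- One table per enumeration:
CREATE TABLE money_type   (id INT PRIMARY KEY, code VARCHAR(20) NOT NULL);
CREATE TABLE payment_type (id INT PRIMARY KEY, code VARCHAR(20) NOT NULL);

-- One shared lookup table with a discriminator column:
CREATE TABLE lookup (
  id   INT PRIMARY KEY,
  code VARCHAR(20) NOT NULL,
  type VARCHAR(20) NOT NULL   -- 'money', 'payment', 'shipping', ...
);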

What is the proper way to store user preferences in a database?

I have a MySQL database that stores user emails and news articles that my service provides. I want users to be able to save/bookmark articles they would like to read later.
My plan for accomplishing this was to have a column, in the table where I store the users' emails, that holds comma-delimited strings of unique IDs, where the unique IDs are values assigned to each article as they are added into the database. These articles are stored in a separate table and I use UUID_SHORT() to generate the unique IDs of type BIGINT.
For example, let's say in the table where I store my articles, I have
ArticleID OtherColumn
4419350002044764160 other stuff
4419351050184556544 other stuff
In the table where I store user data, I would have
UserEmail ArticlesSaved OtherColumn
example1@email.com 4419350002044764160,4419351050184556544,... other stuff
example2@email.com 4419350002044764160,4419351050184556544,... other stuff
to indicate the first two users have saved the articles with IDs 4419350002044764160 and 4419351050184556544.
Is this a proper way to store something like this on a database? If there is a better method, could someone explain it please?
One other option I was thinking of was having a separate table for each user where I can store the IDs of the articles they saved into a column, though the answer to this post says that is not very efficient: Database efficiency - table per user vs. table of users
I would suggest one table for the user and one table for his/her bookmarked articles.
USERs
id - int autoincrement
user_email - varchar(50)
PREFERENCES
id int autoincrement
article_index (whatever datatype fits your article IDs)
id_user (integer)
This way it will be easy for a user to bookmark and unbookmark an article. Connecting the two tables is done with id in users and id_user in preferences. Make sure that each row in preferences/bookmarks is one article (don't store anything comma-separated). Doing it this way will save you much time and many complications - I promise!
A typical query to fetch a user's bookmarked pages would look something like this.
SELECT u.id, p.article_index, p.id_user FROM users u
LEFT JOIN preferences p ON u.id = p.id_user
WHERE u.id = 1; -- user id goes here; make sure it's an int and apply appropriate security to your queries
"Proper" is a squirrely word, but the approach you suggest is pretty flawed. The resulting database no longer satisfies even first normal form, and that predicts practical problems even if you don't immediately see them. Some of the problems you would be likely to encounter are
the number of articles each user can "save" will be limited by the data type of the ArticlesSaved column;
you will have issues around duplicate "saved" article IDs; and
queries about which articles are saved will be more difficult to formulate and will probably run slower; in part because
you cannot meaningfully index the ArticlesSaved column.
The usual way to model a many-to-many relationship (such as between users and articles) is via a separate table. In this case, such a table would have one row for each (user, saved article) pair.
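A minimal sketch of such a junction table (assuming a numeric user id; the BIGINT matches the UUID_SHORT() values from the question):

CREATE TABLE user_saved_article (      -- hypothetical name
  user_id    INT NOT NULL,
  article_id BIGINT UNSIGNED NOT NULL,
  PRIMARY KEY (user_id, article_id)    -- also prevents duplicate saves
);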
Saving data in CSV format in a database field is (almost) never a good idea. You should have 3 tables:
1 table describing users with everything concerning directly the user
1 table describing articles with data about it
1 table with 2 columns "userid" and "articleid" linking both. If a user bookmarks 10 articles, this table will have 10 records with a different articleid each time.

Advice: MySQL Database, using concatenated data as single row or create several rows

I'm making a table (with MySQL) to store some data, but I'm not sure how to do it properly because of the amount of data. For example, suppose it's an address book database.
So there is a table for users and a table for contacts. Each user can own hundreds of contacts, and there could be thousands of users. Should I add a new row for every single contact (it will make a lot of rows!), or can I just concatenate all of them in one row with the user id?
This is just an example, but in my case once contacts are INSERTED they will never be UPDATED - no modifications, they can only be DELETED.
To go by the normal forms, you should have three tables
1) Users -> {User_id} (primary key)
2) Contacts -> {Contact_id} (primary key)
3) Users_Contacts -> {User_id, Contact_id} (Compound key)
The Junction table Users_Contacts will have one record per contact - meaning for each unique value of User_id+Contact_id, there will be one record.
However, in practice it is not always necessary to stick to the rule book. Depending on the use case, it is often advisable to have a denormalized table. The call is yours.
There is also another option of using NoSQL-style storage with MySQL. For example, the contacts can be serialized into JSON and stored. MySQL 5.7 supports a native JSON data type. See this for details.
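If you do go that route, a minimal sketch with MySQL 5.7's JSON type might look like this (table and column names are illustrative):

CREATE TABLE user_contacts (
  user_id  INT PRIMARY KEY,
  contacts JSON NOT NULL   -- e.g. '[{"name": "Alice", "phone": "555-0101"}]'
);

-- Extract the first contact's name:
SELECT JSON_EXTRACT(contacts, '$[0].name') FROM user_contacts WHERE user_id = 1;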
Say, for example, you add 3 contacts for a single user. Since, as you mentioned, you will be deleting contacts, it's better to insert all three contacts, each in a new row with its user id, because then deleting any one of the three is easy.
Concatenating all the contacts for a user into one row can lead to many issues. What if the requirements change in the future and you need to lay out all of a user's contacts with edit/delete actions for individual contacts? So you should have one contact per row.
You can optimize your query by indexing the columns.
Say user id 1234 has 1000 contacts in the contact table, where the primary key is idcontact (indexed by default) and there is another indexed field called iduser; then SELECTs filtered by iduser on the contact table will be fast.
This is the standard approach with a MySQL database. Many apps maintain millions of rows this way, so a contact table with a new row per contact will be fine.
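A sketch of that indexing arrangement (names follow the answer; standard MySQL syntax):

CREATE TABLE contact (
  idcontact INT AUTO_INCREMENT PRIMARY KEY,   -- indexed by default
  iduser    INT NOT NULL,
  name      VARCHAR(100),
  INDEX idx_contact_iduser (iduser)           -- makes lookups by iduser fast
);

SELECT * FROM contact WHERE iduser = 1234;    -- served by idx_contact_iduser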
I wouldn't worry about lots of rows. You have to keep in mind the granularity of control the user would expect (deleting/adding a contact, rearranging the list based on different factors, etc.). It's always better to break things out into their own rows if they are going to be treated independently from a similar item (contacts, users, addresses, etc.). Additionally, if you were to concatenate your data, re-ordering for display or removing data becomes extremely resource intensive, whereas MySQL is designed to do exactly that "on the cheap".
MySQL can easily handle millions of rows of data. If you are worried about speed, just make sure your indexes are in place before your data collection gets too big (I would venture a guess and say you'll need to index the user ID the contact belongs to and the first/last names). Indexes are a double-edged sword, however, as they take up disk space but allow fast querying of large data sets. So you don't want to go overboard and index everything, only what you'll be sorting/searching by.
(Why on earth will contacts never be updated?...)

Build a PHP function to retrieve a variety of MySQL database queries and correctly traverse multiple tables via their foreign key relationships

I am trying to build a robust PHP function that allows me to traverse over my normalized database. My MySQL database has 6 tables with the following column names (I am only including the primary and foreign keys, as well as some limited table columns, for simplicity) so that you can see how they are related.
tableA:
partID (primary key)
tableABJunction:
itemID (foreign key)
partID (foreign key)
tableB:
itemID (primary key)
itemName
sales:
customerID (foreign key)
itemID (foreign key)
partDate:
itemID (foreign key)
customer:
customerID (primary key)
nameFirst
nameLast
When I need to generate a query, such as "What are the names of the customers that ordered itemID = 12?", I first have to query the sales table for all customerIDs where itemID=12 and then query the customer table to find out their first and last names. Sometimes I may need to return data from all 6 tables, based on a query asking for all information pertaining to the customer named John Smith. Is there any easy way to build a function to handle this variety of queries, without having to build a query for every possible type of search?
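For concreteness, that example can be answered with a single hand-built query, something like:

SELECT c.nameFirst, c.nameLast
FROM customer c
JOIN sales s ON s.customerID = c.customerID
WHERE s.itemID = 12;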
Currently, my approach is to pass the following to php via AJAX:
web_conditionArray (contains the column names and values of the data provided, such as nameFirst => 'John', nameLast => 'Smith'); web_resultArray (contains the table name and the columns that I am requesting: sales => 'itemID, itemName').
The issue I am having with this approach is finding a way to store the relationships between all of the MySQL tables and their foreign keys, so that my PHP program knows how to link all the tables together to run the correct query to get from the data provided for one table to the data requested from another table. Any suggestions or a better way to solve this? I was initially thinking of a doubly linked list, but the flow from table to table is not linear, given that there is a fork where tableB links to the sales and partDate tables.
I tried to be as specific as I could in describing this situation without writing a novel; however, please let me know if you need any additional information to refine my question further.
Looking at your table structure, I imagine it would be possible to construct logic to calculate the relationships between tables, and dynamically construct queries, but it seems to me that that would be far more work than manually constructing queries for your particular database. I'm assuming that your tables have many more fields in them, but that you've only included the most important, and have definitely included all primary and foreign keys.
Based on that, you have only three information objects in your database: Parts, Items and Customers. You should, therefore, not need more than 12 manually constructed queries to make your system work. You just need to ensure that you simplify your queries to work with whole information objects, and use the PHP layer to filter them later.
So, you reduce your query logic to:
"Fetch me all [Parts, Items or Customers] (and possibly also all [Parts, Items or Customers]) related to [Part, Item or Custromer] (and possibly [Part, Item or Customer])"
This results in the following queries:
All Customers for a Part
All Customers for an Item
All Customers for a Part and an Item
All Items for a Part
All Items for a Customer
All Items for a Part and a Customer
All Parts for an Item
All Parts for a Customer
All Parts for a Customer and an Item
All Parts and Customers for an Item
All Customers and Items for a Part
All Items and Parts for a Customer
(This is the full list of logical relationships - some may not make any sense practically, which makes your life easier)
So, your PHP script needs to perform the following tasks:
Identify which object(s) are required for the criteria of the query. This is based on the fields supplied.
Construct a WHERE clause for your query which identifies the primary key for the criteria objects from the fields passed.
Identify which object(s) are required for the result of the query, based on the fields requested.
Select the query based on the criteria and return objects, and insert the constructed WHERE clause.
Perform the query, extracting all information available about the requested objects
Filter the results, extracting only the required information
Return the final results.
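For instance, the "All Customers for a Part and an Item" case might look something like this (a sketch; it assumes tableABJunction links parts to items as in the question):

-- All customers who ordered item 12, where that item uses part 7:
SELECT DISTINCT c.customerID, c.nameFirst, c.nameLast
FROM customer c
JOIN sales s           ON s.customerID = c.customerID
JOIN tableABJunction j ON j.itemID = s.itemID
WHERE s.itemID = 12
  AND j.partID = 7;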
First, know that my answer will most likely be downvoted to hell (as this methodology is constantly downvoted despite its correctness). DBAs want you to believe that just because a complex query can be done with a SQL statement, it should be (like how server-siders think everything client-side should be done server-side, or how client-siders think layouts should be done with client-side code instead of CSS). No. Complex queries are for people sitting at command lines needing to come up with on-demand data grabbing for specific, non-routine reasons. For processing speed, SELECTing, UPDATEing, and DELETEing should always be done off the PK server-side.
It sounds like you have a set of legitimately large tables.
Assuming it's large and speed is the primary concern (and not development time), use only a primary key and no other indexes, because the more indexes you have, the more the database has to reindex, when really the comparisons that DBAs would have you do in SQL are faster server-side.
The primary key will take some finagling, but it's the most important thing past data types and lengths. For instance, the non-FK, independent tables like tableA, tableB, and customer should probably have an auto-increment INT PK (generally, remember that computers think in terms of integers), but the ones with multiple FKs should probably have no auto-increment INT and instead a composite PK with the less variant SELECTed FK first. For example, on my site I store vote totals on links by userID and linkID. If a user's logged in, they'll need to know how many votes they've placed on a link; the userID is the one less likely to change, so that's first in my PK on that table. Counting this on demand, whether database-side or server-side, was a performance nightmare.
For just a few lines of code, you will GREATLY improve speed. Sorting on the PK via php will cut latency by 50%. Absorbing JOINs into php will decrease the rate of latency spikes. Having no on demand MySQL calculations will keep your site from becoming paralyzed.
If you step away from the dogma that just because a SQL statement can get you the results that you should use a SQL statement instead of a server-side language (C++ being the fastest), you'll see performance skyrocket.
If you can be more specific with the tables you're trying to obfuscate, I can get more specific, but you probably get the idea.
AJAX has changed the game and forced refocus. CSS for layouts; js for client-side programming; server-side for...server-side processing; database for storing everything that lasts longer than a moment.
Bring on the downvotes! LOL

How to apply normalization on mysql using php

I don't really have any idea, although I've done some reading on the topic. All I know is that it is used to make the data in the database more efficient and easier to handle, and that it can also be used to save disk space. And lastly, if you use normalization, you will have to generate more tables.
Now I have a lot of questions to ask.
First, how does normalization help save disk space, or whatever space the database occupies?
Second, is it possible to add data to multiple tables using only 1 query?
Please help, I'm just a newbie wanting to learn from you. Thanks.
Ok, couple of things:
PHP has got nothing to do with this; normalization is about modelling data.
Normalization is not about saving disk space. It is about organizing data so that it is easily maintainable, which in turn is a way to maintain data integrity.
Normalization is typically described in a few stages, or 'normal forms'. In practice, people that design relational databases often intuitively 'get it right' most of the time. But it is still good to be aware of the normal forms and their characteristics. There is a lot of documentation on that on the internet (e.g. http://en.wikipedia.org/wiki/Database_normalization), and you should certainly do your own research, but the most important stages are:
Unnormalized data: in this stage, data is not truly tabular ('relational'). There is a lot of discussion of what tabular really means, and experts disagree with one another, but most people agree that data is unnormalized in case there are multi-valued attributes (columns that can, for one row, contain lists as values), or in case there are repeating groups (multiple columns, or multiple groups of columns, for storing the same type of data).
Example of multi-valued column: person (first_name, last_name, phonenumbers)
Here, phonenumbers implies there could be more phonenumbers, stored in one column
Example of repeating group: person(first_name, last_name, child1_first_name, child1_birth_date, child2_first_name, child2_birth_date..., childN_first_name, childN_birth_date)
Here, the person table has a number of column pairs (child_first_name, child_birth_date) to store the person's children.
Note that something like order (shipping_address, billing_address) is not a repeating group: the addresses for billing and shipping may be similar pieces of data, but each has its own distinct role for an order, both just represent a different aspect of an order. child1 thru child10 do not - children do not have specific roles, and the list of children is variable (you never know how many groups you should reserve in advance)
In both cases, multi-valued columns and repeating groups, you basically have "nested table" structure - a table within a table. Data is said to be in 1NF (first normal form) if neither of these occur.
The 1NF is about structural characteristics: the tabular form of the data. All subsequent normal forms have to do with eliminating redundancy. Redundancy occurs when the same information is independently stored multiple times. Redundancy is bad: if you want to change some fact, you have to change it in multiple places. If you forget to change one of them, you have inconsistent data - the data is contradicting itself.
There are a lot of processes that can eliminate redundancy, each leading to a higher normal form, all the way from 1NF up to 6NF. However, most databases are adequately normalized at 3NF (or a slight variation of it called Boyce-Codd normal form, BCNF). You should study 2NF and 3NF, but the principle is very simple: a table is adequately normalized if:
the table is in 1nf
the table has a key (a column or column combination whose values are required, and which uniquely identifies a row - ie. there can be only one row having that combination of values in the key columns)
there are no functional dependencies between the non-key columns
non-key columns are not functionally dependent upon part of the key (but are completely functionally dependent upon the entire key).
Functional dependency means that a column's value can be derived from another column. A simple example:
order_item (order_id, item_number, customer_id, product_code, product_description, amount)
Let's assume (order_id, item_number) is the key. product_code and product_description are functionally dependent upon each other: for one particular product_code, you will always find the same product description (as if product_description is a function of product_code). The problem is: suppose a product description changes for a particular product code; you have to change all orders that use that product_code. Forget only one and you have an inconsistent database.
The way to solve it is to create a new product table with (product_code, product_description), having (product_code) as key, and then, instead of storing all product fields in each order, only store a reference to a row in the product table in the order_item records. (In this case, order_item should only keep product_code, which is sufficient to look up a row in the product table and find the product_description.)
So as you can see, with this solution you do actually save space (by not storing all these product descriptions in each order_item that happens to order the product) and you do get more tables (product split off from order_item). But remember that it is not about saving disk space: it is about eliminating redundancy, thus making it easier to maintain the data, because now you only have to change one row in the product table to change the description.
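A minimal sketch of that split in SQL (column types are assumptions):

CREATE TABLE product (
  product_code        VARCHAR(20) PRIMARY KEY,
  product_description VARCHAR(200) NOT NULL
);

CREATE TABLE order_item (
  order_id     INT NOT NULL,
  item_number  INT NOT NULL,
  customer_id  INT NOT NULL,
  product_code VARCHAR(20) NOT NULL,
  amount       DECIMAL(10,2) NOT NULL,
  PRIMARY KEY (order_id, item_number),
  FOREIGN KEY (product_code) REFERENCES product (product_code)
);

-- Changing a description is now a single-row update:
UPDATE product SET product_description = 'New description' WHERE product_code = 'ABC';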
There are a lot of similar questions on StackOverflow already, for example, Can someone please give an example of 1NF, 2NF and 3NF in plain english?
Look in the Related sidebar to the right for a bunch of them. That'll get you started.
As for your specific questions:
Normalization saves disk space by reducing redundant data storage. This has another benefit: if you have multiple copies of a given entity attribute in your database, they can get out of sync, while if you have a normalized database and use referential integrity, this cannot happen.
The INSERT statement references only one table. A TRIGGER on the insert statement can add rows to other tables, but there's no way to supply data to the trigger other than those columns in the table that spawned it.
When you need to insert dependent rows after inserting a row to the parent table, use the LAST_INSERT_ID() function to retrieve the auto-generated primary key value of the last INSERT statement in your session.
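A sketch of that pattern, reusing the order_item example from above (it assumes an orders table whose order_id is AUTO_INCREMENT):

INSERT INTO orders (customer_id) VALUES (42);

-- LAST_INSERT_ID() returns the id generated by the last INSERT in this session:
INSERT INTO order_item (order_id, item_number, customer_id, product_code, amount)
VALUES (LAST_INSERT_ID(), 1, 42, 'ABC', 9.99);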
I think you will learn this when you start creating the schema for your database.
Think in reverse when you add a field that already exists somewhere else in your database.
By reverse I mean, ask yourself: if I have to modify this field, how many queries will I have to run?
You will probably end up with the answer that you would have to run the query 2 or X times to modify the content of your column.
Keep it simple: assign an ID to each piece of content that is duplicated in your database.
For example, take an address column.
This is not good:
update clients set address = 'new address' where clientid=500;
update orders set address = 'new address' where orderid=300;
A good approach would be to create an addresses table and run a single query:
update addresses set address = 'new address' where addressid=100;
Then use the address id 100 everywhere in your database tables as a foreign key reference (clients + orders). This way the id 100 itself never changes, but if you update the content of the address, all linked tables will pick up the change.
The third normal form (3NF) is enough for you this time.
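A sketch of that foreign key arrangement (names follow the example above):

CREATE TABLE addresses (
  addressid INT AUTO_INCREMENT PRIMARY KEY,
  address   VARCHAR(200) NOT NULL
);

CREATE TABLE clients (
  clientid  INT AUTO_INCREMENT PRIMARY KEY,
  addressid INT NOT NULL,
  FOREIGN KEY (addressid) REFERENCES addresses (addressid)
);

-- orders would reference addresses(addressid) the same way.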
Normalization is a set of rules. The more of them you follow, the higher the "level" of normalization your database has. In general, level 3 is the highest level sought after.
Normalized data is theoretically "purer" than non-normalized data. It is easier to reason about, and it removes redundancy, which reduces the chance of data getting out of sync.
From a practical viewpoint, however, normalized data isn't always the best design, even if it is in theory. If you don't really know the finer points, though, aiming for normalized data isn't such a bad idea.
In phpMyAdmin >= 4.3.0, under Structure -> Table structure, above the table you get:
"Print", "Propose table structure", "Track table", "Move columns", "Improve table structure". Under "Improve table structure" you get a wizard which says:
Improve table structure (Normalization):
Select up to what step you want to normalize
First step of normalization (1NF)
Second step of normalization (1NF+2NF)
Third step of normalization (1NF+2NF+3NF)
To question 2: no, it is not possible to insert data into multiple tables with one query.
See the INSERT syntax.
In addition to other answers, you can also search here on SO for normalization and find e.g. the question: Normalization in MySQL
