Optimizing the friend-relationship storage in MySQL - php

Assumptions
If A is a friend of B, B is also a friend of A.
I searched for this question and there are already lots of similar questions on Stack Overflow, but all of them suggest the same approach.
They create a friend table with three columns: from, to and status. This serves both purposes: who sent the friend request, and who are friends once the status is accepted.
But this means that if there are m users and each user has n friends, I will have m×n rows in the friends table.
What I was thinking is to store the friends list in a text column. For every user I have a single row, with a friends column that holds all accepted friends' IDs separated by a character, say |, which I can explode to get the full friends list. Similarly, I will have another column for pending requests. When a request is accepted, the ID moves from the pending requests column to the friends column.
Now, this should significantly reduce the number of rows in the table and the search time.
The only overhead will be when I have to delete a friend: I will have to retrieve the friends string, find the ID of the friend to be deleted, remove it and update the column. However, this is almost negligible if I assume a user cannot have more than 2000 friends.
I assume I am definitely forgetting some situations, or that this approach has certain pitfalls, so please correct me if so.

The answer is NO! Do not try to implement this idea - it's a complete disaster.
Let me describe more precisely why:
Relations. You are storing just keys separated with |. What if you want to display a list with the names of friends? You will have to fetch the list, explode it and run another n queries against the DB. With a relation table (from | to | status) you can do that with one JOIN.
Deletions. Just horrible.
Inserts. For every insert you will need to do a SELECT + UPDATE instead of a single INSERT.
Types. You should keep items in the DB as they are, so integers as integers. Converting ints into strings and back invites errors and bugs.
No ORM support. In the future you will probably move from plain PHP to a framework. Keep in mind that none of them will support this idea.
Search time?
Please do some tests. Search with WHERE + PRIMARY KEY is very fast.
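For reference, a minimal sketch of the relation-table approach the answer recommends (a users table with id and name is assumed; the exact table and column names are illustrative):

CREATE TABLE friendship (
    from_user INT UNSIGNED NOT NULL,
    to_user   INT UNSIGNED NOT NULL,
    status    ENUM('pending', 'accepted') NOT NULL DEFAULT 'pending',
    PRIMARY KEY (from_user, to_user),
    KEY idx_to_user (to_user)
);

-- All accepted friends of user 42, with their names, in one JOIN:
SELECT u.id, u.name
FROM friendship f
JOIN users u ON u.id = f.to_user
WHERE f.from_user = 42
  AND f.status = 'accepted';

Since friendship here is mutual, you would either insert the row in both directions or query with an OR over the two key columns; the sketch only shows the basic table shape.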

Related

Many to many vs one row [duplicate]

I'm interested in how and why a many-to-many relationship is better than storing the information in one row.
Example: I have two tables, Users and Movies (very big data). I need to establish a relationship "view".
I have two ideas:
Make another column in the Users table called "views", where I will store the ids of the movies this user has viewed, as a string, for example: "2,5,7...". Then I will process this information in PHP.
Make a new table users_movies (many to many), with columns user_id and movie_id. A row with user_id=5 and movie_id=7 means that user 5 has viewed movie 7.
I'm interested in which of these methods is better and WHY. Please consider that the data is quite big.
The second method is better in just about every way. Not only will you use your DB's indexes to find records faster, it will also make modification far, far easier.
Approach 1) could answer the question "Which movies has user X viewed?" with an SQL condition like "...FIND_IN_SET(movie_id, user_movielist)...". But the other way round ("Which users have viewed movie X?") cannot be answered without scanning and parsing every user's list.
That's why I would always go for approach 2): a clear, normalized structure where both directions are simple joins.
It's just about the needs you have. If you need performance then you may have to accept redundancy of information and add a column. If your main goal is to respect the normalization paradigm then you should not have any redundancy at all.
When I have to make this kind of choice I try to weigh the space lost to redundancy against the frequency and performance of the query of interest.
A few more thoughts.
In your first situation, if you look up a particular user you can easily get the list of ids for the films they have seen. But then you would need a separate query to get the details, such as the titles of those movies. This might be one query using IN with the list of ids, or one query per film id. Either way it is inefficient and clunky.
With MySQL there is a possible fudge to do the join in this situation using the FIND_IN_SET() function (although a downside is that you are straying into non-standard SQL). You could join your table of films to the users using ON FIND_IN_SET(film.id, users.film_id) > 0. However, this join cannot use an index, and it involves a function (which, while quick for what it does, is slow when performed over thousands of rows).
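As a sketch of that fudge (the film table with a title column and the users.film_id list column follow the answer's naming; user 5 is a made-up example):

SELECT u.id, f.title
FROM users u
JOIN film f ON FIND_IN_SET(f.id, u.film_id) > 0
WHERE u.id = 5;
-- The join predicate cannot use an index; every film row is evaluated per matching user.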
If you wanted to find all the users who had viewed any film a particular user had viewed, it is a bit more difficult. You can't just use FIND_IN_SET, as it requires a single string and a comma separated list. As a single query you would need to join the particular user to the film table to get a lot of intermediate rows, and then join that back against the users again (using FIND_IN_SET) to find the other users.
There are ways in SQL to split up a comma separated list of values, but they are messy and anyone who has to maintain such code will hate it!
These are all fudges. With the 2nd solution these things are easy to do, and any resulting joins can easily use indexes (possibly the whole query can even be served from indexes without touching the actual data).
A further issue with the first solution is data integrity. You have to manually check that a film doesn't appear twice for a user (with the 2nd solution this can easily be enforced with a unique key). You also cannot add a foreign key to ensure that every film id stored for a user actually exists. Further, you have to manually ensure that nothing inserts a stray character string into your delimited list of ids.
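A sketch of the 2nd solution with those constraints in place (the composite primary key prevents duplicate views, the foreign keys guarantee the ids exist; table and column names are illustrative):

CREATE TABLE users_movies (
    user_id  INT UNSIGNED NOT NULL,
    movie_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (user_id, movie_id),        -- no duplicate views per user
    KEY idx_movie (movie_id),               -- supports the reverse lookup
    FOREIGN KEY (user_id)  REFERENCES users (id),
    FOREIGN KEY (movie_id) REFERENCES movies (id)
);

-- Which movies has user 5 viewed?
SELECT m.*
FROM users_movies um
JOIN movies m ON m.id = um.movie_id
WHERE um.user_id = 5;

-- Which users have viewed movie 7?
SELECT u.*
FROM users_movies um
JOIN users u ON u.id = um.user_id
WHERE um.movie_id = 7;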

Is DynamoDB the right option for this use case?

I want to love DynamoDB, but its major drawback is the query/scan over the whole DB to pull the results for one query. Would I be better off sticking with MySQL, or is there another solution I should be aware of?
Uses:
Newsfeed items (pulls the most recent items from a table where id IN (x,x,x,x,x))
User profile relationships (users follow and friend each other)
User lists (users can have up to 1,000 items in one list)
I am happy to mix and match database solutions. The main use is lists.
There will be a few million lists eventually, ranging from 5 to 1,000 items per list. The list table is formatted as follows: list_id (bigint) | order (int(1)) | item_text (varchar(500)) | item_text2 (varchar(12)) | timestamp (int(11))
The main queries on this DB would be on the 'list_relations' table:
SELECT item_text FROM lists WHERE list_id = 539830
I suppose this is my main question: can we get all items for a particular list_id without a slow query/scan? And by 'slow', do people mean a second, or a few minutes?
Thank you
I'm not going to address whether or not it's a good choice or the right choice, but you can do what you're asking. I have a large DynamoDB instance with vehicle VINs as the hash key and something else as the range key, plus a secondary index on VIN and a timestamp field. I am able to run fast queries over thousands of records for specific vehicles across timestamp ranges, no problem.
Constructing your schema in DynamoDB requires different considerations than building in MySQL.
You want to avoid scans as much as possible, which means picking your hash key carefully.
Depending on your exact queries, you may also need multiple tables holding the same data but with different hash keys, to suit your different access patterns.
You also did not mention the LSI and GSI features of DynamoDB (local and global secondary indexes); these also help your query-ability, but have their own sets of drawbacks. It is difficult to advise further without knowing more details about your requirements.

MySQL database table design for black lists

I have a PHP application and I need to store black list data. Site members can add any user to their black list, so they won't see that user's texts.
Every user's black list is different.
A user can have 1000-1500 users in his/her black list.
User can add/remove anybody from his/her list.
Black list will have member's id and black listed people's ids.
I'm trying to design the database table for this, but I'm not sure what the structure should be.
I have 7-8 MySQL tables but none of them is like this.
Way 1:
--member ID-----black listed people (BLOB)
-----------------------------------------
--1234----------(Some BLOB data)---------
--6789----------(Some BLOB data)---------
I can serialize the blacklisted people's IDs and save them inside a BLOB column. When a user wants to edit his/her list, I fetch the BLOB data from the table, remove the unwanted ID and update the column with the new data. It seems like a slow operation when a user has 1k-2k IDs.
Way 2:
--member ID----black listed ID--------
--------------------------------------
--1234---------113434545--------------
--1234---------444445454--------------
--1234---------676767676--------------
--6789---------534543545--------------
--6789---------353453454--------------
In this way, when a user wants to see his/her black list, I return all the users in the "black listed ID" column. When editing, I add or remove rows. This operation is fast, but the table can become huge over time.
Way 3:
--member ID----113434545----444445454----676767676---534543545-----353453454
----------------------------------------------------------------------------
--1234--------yes------------yes------------yes------------no------no-------
--6789--------no-------------no-------------no-------------yes------yes------
Yes means black listed, No means not black listed. I create a new column for each black listed person and update that column when a user adds or removes that person.
Way 4:
???
These are my ideas. I would really appreciate it if you could offer me a better one.
Thank you.
What you are creating is a so-called n-to-m (many-to-many) relation table.
3rd version
The 3rd version would require n rows × n columns, where n is the number of registered users. InnoDB has a limit of roughly 1000 columns, which breaks your logic as soon as the 1001st user registers. Not to mention that you don't want to ALTER TABLE for every new user. Forget that.
1st version
The first solution is really slow: BLOB data won't really be indexed, it tends to spill into a second page (another file on the hard disk, effectively doubling disk I/O), it has massive data-size overhead, sorting and grouping won't happen in RAM, and you have no efficient way to search backwards (how many people have blacklisted user xy?)... As general advice, try to avoid BLOBs unless absolutely necessary.
2nd version
The second solution is the way to go. MySQL is optimized for exactly this kind of thing, and a table with two numeric, indexed columns is really fast.
Table design
I would create a table consisting of
blocker_id | blocked_id
and no separate primary key. Instead I would create a two-column primary key with blocker as the first column and blocked as the second. That way you save one B-tree (an extra index is expensive to maintain) and can search quickly both for all users blocked by a blocker (using the left half of the key) and for the existence of a single combination (that will be most relevant for filtering posts, and should be optimized for).
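A minimal sketch of that design; the answer only specifies the two columns and the composite key, so the exact names are illustrative:

CREATE TABLE blacklist (
    blocker_id INT UNSIGNED NOT NULL,
    blocked_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (blocker_id, blocked_id)
);

-- Everyone member 1234 has blocked (uses the left half of the composite key):
SELECT blocked_id FROM blacklist WHERE blocker_id = 1234;

-- Does member 1234 block member 113434545? (uses the full key; useful when filtering posts):
SELECT 1 FROM blacklist WHERE blocker_id = 1234 AND blocked_id = 113434545;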
I think you should make the blacklist like way 2:
black_list_id | blocker | blocked
So when you want to see whom a user blocks, you get it with SELECT * FROM black_list_table WHERE blocker = :user_id.
To get who is blocking the user: SELECT * FROM black_list_table WHERE blocked = :user_id.
You can easily get how many people block a user and how many people a user has blocked; moreover, you can put indexes on all columns and fetch the other users' data using JOIN statements.
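With that structure, those counts and the JOIN might look like this (table and column names follow the answer; a users table with an id column is assumed):

-- How many people block this user:
SELECT COUNT(*) FROM black_list_table WHERE blocked = :user_id;

-- How many people this user has blocked:
SELECT COUNT(*) FROM black_list_table WHERE blocker = :user_id;

-- The blocked users' profile data in one query:
SELECT u.*
FROM black_list_table b
JOIN users u ON u.id = b.blocked
WHERE b.blocker = :user_id;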

MySQL database optimization for 20.000 users or more

I have been looking for some optimization tips since I'm making an RPG modification which uses MySQL, via PHP, to store data.
I'm using one single table to store all user information in columns keyed by the user's unique ID, and I have to store a lot of data for each user: weapons and other information.
I'm using explode and implode as the method to store the weapons, for example, in one column of the 'text' type. I don't know if that's good practice, and I don't know if I will have performance problems once I get thousands of players doing tons of UPDATE, SELECT, etc. requests.
I read that a junction table may be better for storing the weapons and all that information, but I don't know whether retrieving it that way will be better than the explode method.
I mean, should I store all the weapons in a separate table, each weapon with its own information (each weapon has some information, like different columns; I currently use nested explodes for that inside the main explode) and the user who owns it, rather than just keeping them all in one column?
There can be at least 100 items to store, and I don't know whether having 100 rows per user in a separate table and querying them all the time is better than just reading the column and using explode.
Also, I want to improve my skills and knowledge so I can build the best-performing MySQL database I can.
I hope somebody can tell me something.
Thanks, and sorry for my stupid english grammar.
It is almost always best practice to normalize your table data. There are some exceptions to this rule (especially in very high volume databases), but you probably do not need to worry about those exceptions until you get to the point of first understanding how to properly normalize and index your tables.
Typically, try to arrange your tables in a way that mimics real-world objects and their relations to each other.
So, in your case you have users - that is one table. Each user might have multiple weapons. So, you now have a weapons table. Since multiple different users might have the same weapon and each user might have multiple weapons, you have a many-to-many relationship between them, so you should have a table "users_weapons" or similar that does nothing but relate user ids to weapon ids.
Now say the users can all have armor. So now you add an armor table and a users_armor table (as this is likely many-to-many as well).
Just think through the different aspects of your game and try to understand the relationships between them. Make sure you can model these relationships in database tables before you even bother writing any code to actually implement the functionality.
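A rough sketch of that layout, under the assumption that weapons are shared definitions that users simply own (all names are illustrative):

CREATE TABLE users (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(64) NOT NULL
);

CREATE TABLE weapons (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(64) NOT NULL
);

-- Does nothing but relate user ids to weapon ids:
CREATE TABLE users_weapons (
    user_id   INT UNSIGNED NOT NULL,
    weapon_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (user_id, weapon_id),
    FOREIGN KEY (user_id)   REFERENCES users (id),
    FOREIGN KEY (weapon_id) REFERENCES weapons (id)
);

-- armor and users_armor would follow the same pattern.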
Yes, it is better to use several tables instead of one. It is better for DB performance, easier to understand, easier to maintain and simpler to use as well.
Let's suppose that one user has several weapons, each with multiple features (which are not unique among all weapons), and in one place in your game you just need to know the value of one specific feature:
doing it your way, you'll need to find the user row in the users table, fetch one column, explode it several times, and there you have your value - and it gets even more complicated if you then want to change it and save it.
the better way is to have one table for user details (login, password, email etc.), another table which keeps the user's weapons (name of weapon, image maybe), and a table in which all features and special powers of weapons are kept. You could keep all possible features of all weapons in an extra table as well. This way, if you already know the user id from the users table, you only have to join 2 tables in your SQL query, and there you have the value of a feature of a specific weapon of that user (see the query sketch after the schema below).
Example pseudo schema of tables:
users
user_id
user_name
password
email
weapons
weapon_id
user_id
weapon_name
image
weapons_features
feature_id
weapon_id
feature_name
feature_value
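Following that pseudo schema, the two-table join described above might look like this (the user id, weapon name and feature name are made-up example values):

SELECT wf.feature_value
FROM weapons w
JOIN weapons_features wf ON wf.weapon_id = w.weapon_id
WHERE w.user_id = 42                   -- already known from the users table
  AND w.weapon_name = 'longsword'      -- example value
  AND wf.feature_name = 'damage';      -- example value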
And if you really want to keep some structured data in a text field in the database, encode it as JSON or serialize it. That way you don't have to explode and implode it!
As the others said, typically you should start from a normalized database structure.
If performance is ok, then great, nothing to do.
If not, you can try many different things:
Find and optimize query which works slow.
Denormalize your schema - sometimes joins kill performance.
Change data access pattern used in application.
Store data in file system or use NoSQL/polyglot persistence solution.

How to store multi-valued profile details?

I have many fields which are multi-valued and I'm not sure how to store them. If I do 3NF then there are many tables. For example: nationality.
A person can have single or dual nationality. If dual, this is a 1-to-many relationship, so I create a user table and a user_nationality table (there is already a nationality lookup table). Or I could put both nationalities into the same row, like "American, German", and then unserialize it at run-time. But then I don't know if I can search it - if I search for only German people, will they show up?
This is just one example; I have over 30 fields which are multi-valued, so I assume I will not be creating 61 tables for this? 1 user table, 30 lookup tables to hold each multi-valued item's options, and 30 tables to hold the user_ values for the multi-valued items?
You must also keep in mind that some multi-valued fields group together, like "colleges I have studied at": it has a group of fields such as college name, degree type, timeline, etc., and a user can have 1 to many of these. So I assume I can create a separate table for this, like user_education with these fields, but let's assume one of those fields is itself a fixed-list multi-valued field, like "college campuses I visited"; then we end up in a never-ending chain of FK tables, which isn't a good design for social networks, as the goal is to put as much data into as few tables as possible for performance.
If you need to keep using SQL, you will need to create these tables. You will need to decide how far you are willing to go, and impose limitations on the system (such as only being able to specify one campus).
As far as nationality goes, if you will only ever need two nationalities (the worst-case scenario), you could consider a second nationality field (Nationality and Nationality2) to account for this. Of course this only applies to fields with a small maximum number of different values.
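For the nationality example, that two-column approach might look like this (the column names on the users table are assumed); note that any search then has to check both columns:

ALTER TABLE users
    ADD COLUMN nationality  VARCHAR(32) NULL,
    ADD COLUMN nationality2 VARCHAR(32) NULL;

-- "Find all German users" has to test both fields:
SELECT *
FROM users
WHERE nationality = 'German' OR nationality2 = 'German';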
If your user table has a lot of related attributes, then one possibility is to create a single attributes table with rows like (user_id, attribute_name, attribute_value). You can store all your attributes in that one table, use it to fetch the attributes of given users, and also search by attribute names and values.
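A minimal sketch of that attributes table and the two kinds of query it supports (names and values are illustrative):

CREATE TABLE user_attributes (
    user_id         INT UNSIGNED NOT NULL,
    attribute_name  VARCHAR(64)  NOT NULL,
    attribute_value VARCHAR(100) NOT NULL,
    PRIMARY KEY (user_id, attribute_name, attribute_value),
    KEY idx_name_value (attribute_name, attribute_value)
);

-- All attributes for user 42:
SELECT attribute_name, attribute_value
FROM user_attributes
WHERE user_id = 42;

-- All users with German nationality (multi-valued fields simply become multiple rows):
SELECT user_id
FROM user_attributes
WHERE attribute_name = 'nationality' AND attribute_value = 'German';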
The simple solution is to stop using a SQL table. This is what NoSQL is designed for. Check out CouchDB or Mongo. There, each value can be stored as a full structure, so this whole problem could be reduced to a single (not-really-a-)table.
The downside of pretty much any SQL-based solution is that it will be slow: either slow when fetching a single user (a massive JOIN statement won't execute quickly) or slow when searching (if you decide to store these values serialized).
You might also want to look at ORM which will map your objects to a database automatically.
http://en.wikipedia.org/wiki/List_of_object-relational_mapping_software#PHP
This is an example, i have over 30 fields which are multi-valued, so i assume i will not be creating 61 tables for this?
You're right that 61 is the maximum number of tables, but in reality it will likely be fewer; take your own example:
"colleges i have studied at"
"college campuses i visited"
In this case you'll probably only have one "college" table, so there would be four tables in this layout, not five.
I'd say don't be afraid of using lots of tables if the data set you're modelling is large - just make sure you keep an up-to-date ERD so you don't get lost! Also, don't get too caught up in the "link table" paradigm - "link tables" can be entities in their own right. For example, you could think of the "colleges I have studied at" link table as a "college enrolments" table instead, give it its own primary key, and store each of the times you pay your course fees as rows in a (linked) "college enrolment payments" table.
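A sketch of that "link table as an entity" idea; all table and column names are illustrative, and users/colleges tables with id columns are assumed:

CREATE TABLE college_enrolments (
    enrolment_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    user_id      INT UNSIGNED NOT NULL,
    college_id   INT UNSIGNED NOT NULL,
    degree_type  VARCHAR(64)  NULL,
    started_on   DATE         NULL,
    ended_on     DATE         NULL,
    FOREIGN KEY (user_id)    REFERENCES users (id),
    FOREIGN KEY (college_id) REFERENCES colleges (id)
);

CREATE TABLE college_enrolment_payments (
    payment_id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    enrolment_id INT UNSIGNED NOT NULL,
    paid_on      DATE          NOT NULL,
    amount       DECIMAL(10,2) NOT NULL,
    FOREIGN KEY (enrolment_id) REFERENCES college_enrolments (enrolment_id)
);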
