MySQL database table design for black lists - php

I have a PHP application and I need to store black list data. My site members will add any user to his/her black list. So they won't see the texts of that users.
Every user's black list is different.
A user can have 1000-1500 users in his/her black list.
User can add/remove anybody from his/her list.
Black list will have member's id and black listed people's ids.
I'm trying to design database table for this. But I couldn't be sure about how can structure be ?
I have 7-8 MySQL tables but none of them is like this.
Way 1:
--member ID-----black listed people (BLOB)
-----------------------------------------
--1234----------(Some BLOB data)---------
--6789----------(Some BLOB data)---------
I can serialize blacklisted people's IDs and save them inside a BLOB data column. When a user want to edit his/her list, I get BLOB data from table, remove unwanted ID and update column with new data. IT seems like a bit slow operation when a user has 1k-2k IDs.
Way 2:
--member ID----black listed ID--------
--------------------------------------
--1234---------113434545--------------
--1234---------444445454--------------
--1234---------676767676--------------
--6789---------534543545--------------
--6789---------353453454--------------
In this way, when a user wants to see his/her black list I give them all users in "black listed ID" column. When editing I add/remove new rows to table. This operation is fast but the table can be huge in time.
Way 3:
--member ID----113434545----444445454----676767676---534543545-----353453454
----------------------------------------------------------------------------
--1234--------yes------------yes------------yes------------no------no-------
--6789--------no-------------no-------------no-------------yes------yes------
Yes shows black listed, No shows not black listed. I create new column for each black listed person and update that column when a user adds a person or removes it.
Way 4:
???
These are my ideas. I really appreciate if you can offer me a better one?
Thank you.

What you are creating is a so-called 1 to n relation table.
3rd version
The 3rd version would require to have n rows x n columns, where n is the amount of registred users. InnoDB has a limit of 1000 columns, breaking your logic as soon as 1001st. user registers. Not to mention that you don´t want to ALTER TABLE for every new user. Forget that.
1st version
The first solution is really slow: BLOB data won´t be really idexed, it tends to get into a second page (file on harddisk, effectively doubeling disk I/O), it has massive datasize overhead, sorting and grouping won't happen in RAM, and you have no efficient way for backwards search (how many people did blacklist user xy?)... as a general advise, try to avoid BLOB untill absolutely necesarry.
2nd version
The second solution is the way to go. MySQL is optimized for stuff like that, and a table with 2 numeric, indexed rows is really fast.
Table design
I would create a table consisting of
blocker_id | blocked_id
and no separate primary key. Instead I would create a 2-column-primary-key with blocker beeing the first column and blocked the second. That way you save a B-Tree (expensive to create index) and can search fast for both all blockeds from a blocker (using half of the key) and for the existence of a single combination. (That will be most relevant for filtering posts, and should be optimized for.)

I think you should make the blacklist like way 2:
black_list_id | blocker | blocked
So when you want to take whom a user blocks you would get it by SELECT * FROM black_list_table WHERE blocker = :user_id.
To get who is blocking the user you get SELECT SELECT * FROM black_list_table WHERE blocked = :user_id.
You can easily take how many people block user, how many blocked people user has, and moreover, you can set indices on all columns and get other users' data using JOIN statements.

Related

Optimizing the friend-relationship storage in MySQL

Assumptions
If A is a friend of B, B is also a friend of A.
I searched for this question and there are already lots of questions on Stack Overflow. But all of them suggest the same approach.
They are creating a table friend and have three columns from, to and status. This serves both purposes : who sent friend request as well as who are friends if status is accepted.
But this means if there are m users and each user has n friends, then I will have mn rows in the friends table.
What I was thinking is to store friends list in a text column. For every user I have a single row and a friends column which will have all accepted friends' IDs separated by a character, say | which I can explode to get all friends list. Similarly, I will have another column named pending requests. When a request is accepted, IDs will move from pending requests to friends column.
Now, this should significantly reduce the entries in the table and the search time.
The only overhead will be when I will have to delete a friend, I will have to retrieve the friend string, search the ID of the friend to be deleted, delete the ID and update the column. However, this is almost negligible if I assume a user cannot have more than 2000 friends.
I assume that I will definitely be forgetting some situations or this approach will have certain pitfalls. So please correct if so.
The answer is NO! Do not try to implement this idea - its complete disaster.
I am going to describe more precise why:
Relations. You are storing just keys separeted with |. What if you want to display list with names of friends? You will have to get list, explode it and make another n queries to DB. With relation table from | to | status you will be able to do that with one JOIN.
Deletions. Just horrible.
Inserts. For every insert you will need to do SELECT + UPDATE instead of INSERT.
Types. You should keep items in DB as they are, so integers as integers. Converting ints into string and back could cause some errors, bugs etc.
No ORM support. In future you will probably leave plain PHP for some framework. Take in mind that none of them will support your idea.
Search time?
Please do some tests. Search with WHERE + PRIMARY KEY is very fast.

MySQL : For big storage, should I use a single heavy column or a table with thousand of rows?

I build a like system for a website and I'm front of a dilemma.
I have a table where all the items which can be liked are stored. Call it the "item table".
In order to preserve the speed of the server, do I have to :
add a column in the item table.
It means that I have to search (with a regex in my PHP) inside a string where all the ID of the users who have liked the item are registered, each time a user like an item. This in order verify if the user in question has (or not) already liked the item before. In this case, I show a different button on my html.
Problem > If I have (by chance) 3000 liked on an item, I fear the string to begin very big and heavy to regex each time ther is a like
on it...
add a specific new table (LikedBy) and record each like separately with the ID of the liker, the name of the item and the state of the like (liked or not).
Problem > In this case, I fear for the MySQL server with thousand of rows to analyze each time a new user like one popular item...
Server version: 5.5.36-cll-lve MySQL Community Server (GPL) by Atomicorp
Should I put the load on the PHP script or the MySql Database? What is the most performant (and scalable)?
If, for some reasons, my question does not make sens could anyone tell me the right way to do the trick?
thx.
You have to create another table call it likes_table containing id_user int, id_item int that's how it should be done, if you do like your proposed first solution your database won't be normalized and you'll face too many issues in the future.
To get count of like you just have to
SELECT COUNT(*) FROM likes_table WHERE id_item='id_item_you_are_looking_for';
To get who liked what:
SELECT id_item FROM likes_table WHERE id_user='id_user_you_are_looking_for';
No regex needed nothing, and your database is well normalized for data to be found easily. You can tell mysql to index id_user and id_item making them unique in likes_table this way all your queries will run much faster
With MySQL you can set the user ID and the item ID as a unique pair. This should improve performance by a lot.
Your table would have these 2 columns: item id, and user id. Every row would be a like.

MySQL optimization: more entries vs complex queries

I want to improve the speed of a notification board. It retrieves data from the event table.
At this moment the events MySQL table looks like this
id | event_type | who_added_id | date
In the event table I store one row with information regarding a particular event. Each time a users A asks for new notifications, the query runs through the table and looks if the notifications added by the user B suit him (they have to be friends, members of the same groups, have previously chatted).
Table events became big, because of the bulky query the page loads slow.
I'm thinking of changing entirely this design and, instead of adding one event row and then compare if the user's event suits or not, to add as many rows as interested users. I would change the table events structure as follows:
id | event_type | who_added_id | forwho_id | date
Now, if user B creates an event which interests other 50 members, I create 50 rows with the same information and in the 'forwho_id' field I mention those 50 members which must get this notification.
I think the query will become much more simple and it will take less time to search through it.
How do you think:
1. Is this a good approach in storing such kind of data or we should avoid duplicate data at any cost?
2. How do you think the events table will behave if the number of interested users will be not 50 but hundreds?
Thank you for reading this and I hope I made myself understandable.
Duplicated data is not "bad", and it's not to be "avoided at all cost".
What is "bad" is uncontrolled redundancy, and the kind of problems that come up when the logical data model isn't third normal form. It is acceptable and expected that an implementation will deviate from a logical data model, and introduce redundancy for performance.
Your revised design looks appropriate for your needs.

Save additional information to MYSQL Database and use a simple query, or use complex query?

I have a drupal site, and am trying to use php to grab some data from my database. What I need to do is to display, in a user's profile, how many times they were the first person to review a venue (exactly like Yelp's "First" tally). I'm looking at two options, and trying to decide which is the better way to approach it.
First Option: The first time a venue is reviewed, save the value of the reviewer's user ID into a table in the database. This table will be dedicated to storing the UID of the first user to review each venue. Then, use a simple query to display a count in the user's profile of the number of times their UID appears in this table.
Second Option: Use a set of several more complex queries to display the count in the user's profile, without storing any extra data in the database. This will rely on several queries which will have to do something along the lines of:
Find the ID for each review the user has created
Check the ID of the venue contained in each review
First review for each venue based on the venue ID stored in the review
Get the User ID of the author for the first review
Check which, if any, of these Author UIDs match the current user's UID
I'm assuming that this would involve creating an array of the IDs in step one, and then somehow executing each step for each item in the array. There would also be 3 or 4 different tables involved in the query.
I'm relatively new to writing SQL queries, so I'm wondering if it would be better to perform the set of potentially longer queries, or to take the small database hit and use a much much smaller count query instead. Is there any way to compare the advantages of either, or is it like comparing apples and oranges?
The volume of extra data stored will be negligible; the simplification to the processing will be significant. The data won't change (the first person to review a venue won't change), so there is a negligible update burden. Go with the extra data and simpler query.

MySQL database optimization for 20.000 users or more

I have been looking for some optimization tips since I´m doing a RPG modification which uses MySQL to store data by PHP.
I´m using one unique table to store all user information in columns by his unique ID, and I have to store (many?) data for each user. Weapons and other information.
I´m using explode and implode as a method to store the weapons, for example, in one column with the 'text' value. I don´t know if that´s a good practice and I don´t know if I will have performance problems if I get thousands of players doing tons of UPDATES , SELECT , etc, requests.
I read that a Junction table may be better to store the weapons and all those information, but I don´t know if that will get better information that you request it by the explode method.
I mean, I should store all the weapons in a different table, each weapon with his information (each weapon have some information, like different columns, I use multiple explode for that inside the main explode) and the user owner of that weapon to identify the weapon than just have them in one column.
It can be 100 items at least to store, I don´t know if it´s good to make 100 records per user on a different table and call all of them all the time better than just call the column and use explode.
Also I want to improve my skills and knowledge to make the best performance MySQL database I can.
I hope somebody can tell me something.
Thanks, and sorry for my stupid english grammar.
It is almost always best practice to normalize your table data. There are some exceptions to this rule (especially in very high volume databases), but you probably do not need to worry about those exceptions until you get to the point of first understanding how to properly normalize and index your tables.
Typically, try to arrange your tables in a way that mimics real-world objects and their relations to each other.
So, in your case you have users - that is one table. Each user might have multiple weapons. So, you now have a weapons table. Since multiple different users might have the same weapon and each user might have multiple weapons, you have a many-to-many relationship between them, so you should have a table "users_weapons" or similar that does nothing but relate user id's to weapon id's.
Now say the users can all have armor. So now you add an armor table and a users_armor table (as this is likely many-to-many as well).
Just think through the different aspects of your game and try to understand the relationships between them. Make sure you can model these relationships in database tables before you even bother writing any code to actually implement the functionality.
Yes it is better to use several tables instead of one. It's better to db performance, easier to understand, easier to maintain and simplier to use as well.
Let's suggest that one user has several weapons with multiple features(but not unique among all weapons). And in one place in your game you just need to know the value of one specific feature:
doing it by your way you'll need to find user row in users table, fetch on column, explode it several times, and there you have your value, but it complicates even more if you want to change it and save then.
better way is having one table for user details(login, password, email etc), another table which keeps user weapons(name of weapon, image maybe) and table in which will be all features, special powers of weapons kept. You could keep all possible features of all weapons in extra table as well. This way you if you already know user id from user table, you'll have to only join 2 tables in your sql query, and there you got value of feature of specific weapon of user.
Example pseudo schema of tables:
users
user_id
user_name
password
email
weapons
weapon_id
user_id
weapon_name
image
weapons_features
feature_id
weapon_id
feature_name
feature_value
And if you really want to use some ordered data in text field in database encode it to JSON or serialize it. This way you don't have to explode and implode it!
As all guys said, typically you should start from normalized database structure.
If performance is ok, then great, nothing to do.
If not, you can try many different things:
Find and optimize query which works slow.
Denormalize queries - sometimes joins kill performance.
Change data access pattern used in application.
Store data in file system or use NoSQL/polyglot persistence solution.

Categories