I'm attempting to build a database that stores messages for multiple users. Each user will be able to send/receive 5 different message "types" (strictly a label, actual data types will be the same). My initial thought was to create multiple tables for each user, representing the 5 different message types. I quickly learned this is not such a good idea. My next thought was to create 1 table per message type with a users column, but I'm not sure that's the best method either from a performance perspective. What happens if user 1 sends 100 message type 1's, while user 3 only sends 10? The remaining fields would be null values, and I'm really not sure if that makes a difference or not. Thoughts? Suggestions and/or suggested reading? Thank you in advance!
No, that (the idea given in the subject of this question) will be tremendously inefficient. You'll need to introduce a new table each time a new user is created, and querying them all at once would be a nightmare.
It's far easier to use a single table for storing information about messages. Each row in this table will correspond to one - and only one - message.
Besides, this table should have three 'referential' columns: two for linking a specific message to its sender and receiver, and one for storing its type, which can take only a limited set of values.
For example:
MSG_ID | SENDER_ID | RECEIVER_ID | MSG_TYPE | MSG_TEXT
------------------------------------------------------
1 | 1 | 2 | 1 | .......
2 | 2 | 1 | 1 | #######
3 | 1 | 3 | 2 | $$$$$$$
4 | 3 | 1 | 2 | %%%%%%%
...
It'll be quite easy to get all the messages sent by someone (with a WHERE sender_id = %someone_id% clause), sent to someone (WHERE receiver_id = %someone_id%), or of some specific type (WHERE msg_type = %some_type%). Best of all, these clauses can easily be combined into more sophisticated filters.
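For example, a small sketch assuming the table above is named messages:
-- all type-2 messages that user 1 sent to user 3
SELECT msg_id, msg_text
FROM messages
WHERE sender_id = 1
  AND receiver_id = 3
  AND msg_type = 2;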
What you initially thought of, it seems, looks like this:
IS_MSG_TYPE1 | IS_MSG_TYPE2 | IS_MSG_TYPE3 | IS_MSG_TYPE4
---------------------------------------------------------
1 | 0 | 0 | 0
0 | 1 | 0 | 0
0 | 0 | 1 | 0
It could be NULLs instead of 0s; the core is still the same. And it's broken. Yes, you can still get all the messages of a single type with a WHERE is_msg_type_1 = 1 clause. But even such an easy task as getting the type of a specific message becomes, well, not so easy: you'll have to check each of these type columns until you find the one holding a truthy value.
Similar difficulties await anyone who tries to count the number of messages of each type (which is almost trivial with the structure given above: COUNT(msg_id) ... GROUP BY msg_type).
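For instance, against the same hypothetical messages table:
-- number of messages of each type
SELECT msg_type, COUNT(msg_id) AS msg_count
FROM messages
GROUP BY msg_type;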
So please, don't do this. Unless you have a very strong reason not to, try to structure your tables so that, as time passes, they grow in height - not in width.
The remaining fields would be null values
If you design your database vertically, there will be no remaining fields:
CREATE TABLE messages (
  user  INT,
  msgid INT,
  msg   TEXT
);
create table `tv_ge_main`.`Users`(
`USER_ID` bigint NOT NULL AUTO_INCREMENT ,
`USER_NAME` varchar(128),
PRIMARY KEY (`USER_ID`)
);
create table `tv_ge_main`.`Message_Types`(
`MESSAGE_TYPE_ID` bigint NOT NULL AUTO_INCREMENT ,
`MESSAGE_TYPE` varchar(128),
PRIMARY KEY (`MESSAGE_TYPE_ID`)
);
create table `tv_ge_main`.`Messages`(
`MESSAGE_ID` bigint NOT NULL AUTO_INCREMENT ,
`USER_ID` bigint ,
`MESSAGE_TYPE_ID` bigint ,
`MESSAGE_TEXT` varchar(255) ,
PRIMARY KEY (`MESSAGE_ID`)
);
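An illustrative query against this schema, treating USER_ID and MESSAGE_TYPE_ID in Messages as foreign keys into the other two tables:
-- count each user's messages, broken down by type
SELECT u.USER_NAME, t.MESSAGE_TYPE, COUNT(*) AS msg_count
FROM `tv_ge_main`.`Messages` AS m
JOIN `tv_ge_main`.`Users` AS u ON u.USER_ID = m.USER_ID
JOIN `tv_ge_main`.`Message_Types` AS t ON t.MESSAGE_TYPE_ID = m.MESSAGE_TYPE_ID
GROUP BY u.USER_NAME, t.MESSAGE_TYPE;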
I am making a social application which needs paging for the posts.
Here is the database:
id | post | time |
---------|---------------|----------|
1 | "oldest" | 9:00 |
2 | "old" | 10:00 |
3 | "new" | 11:00 |
4 | "newest" | 12:00 |
In my app:
Newest posts are on top and I only load 2 posts at the time.
Let's say the first 2 posts are loaded into the app:
4 (12:00) newest
3 (11:00) new
The user scrolls down, the app detects that the last post was reached, so it asks the PHP file to download 2 more, in the following order:
2 (10:00) old
1 (9:00) oldest
It works fine. The following is my code:
$qry = $db->prepare('SELECT id, post
FROM posts
WHERE id < :lastLoadedId
ORDER BY time DESC LIMIT 0, 2');
The problem / question:
My server automatically deletes really old posts (in order to save space).
Let's assume that after a while the MySQL table reaches its limit (the last available id, which is 2,147,483,647 for a signed INT).
Then I need to start assigning ids from 1 again, and here comes the problem:
id | post | time |
--------------|---------------|----------|
1 | "new" | 11:00 |
2 | "newest" | 12:00 |
2,147,483,646 | "oldest" | 9:00 |
2,147,483,647 | "old" | 10:00 |
The first 2 posts are loaded again into my app.
2 (12:00) newest
1 (11:00) new
When it tries to load more, it searches for ids smaller than 2; but since 2,147,483,646 and 2,147,483,647 are bigger, it would never return the "oldest" and "old" posts.
Should I worry about this?
How do big companies handle that much data? Do they start a new table after a while?
According to the MySQL website, an unsigned BIGINT can go up to 18,446,744,073,709,551,615. If you insert 1 million records per second, 24x7, it will take about 584,542 years to reach that limit. So I don't think you should worry too much.
Here is an example :
CREATE TABLE foo (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`id`)
);
Note that the 20 stands for the number of digits to be displayed and has nothing to do with storage.
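If an existing table is still on a signed INT, a hypothetical sketch of widening it in place (this assumes the table is named posts and that nothing else references posts.id):
-- widen id to unsigned BIGINT; the AUTO_INCREMENT attribute must be restated
ALTER TABLE posts MODIFY `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT;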
Why not add another column?
ALTER TABLE `files` ADD `real_id` BIGINT NOT NULL AFTER `id`;
So before adding an article, search for the last (biggest) real_id and then increment it.
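A minimal sketch of that lookup, using the files table from the statement above:
-- next real_id to assign; COALESCE covers the empty-table case
SELECT COALESCE(MAX(real_id), 0) + 1 AS next_real_id FROM files;
Be aware that this read-then-insert, done from the application, is racy under concurrent writers, which is exactly the problem AUTO_INCREMENT solves.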
You have a problem with your design.
Your id is not a real id. The id should be the primary key and auto-incremented. There shouldn't be a case of reusing ids; it's confusing.
Your INT id data type is not enough to support real-life data volumes. As suggested by other developers, change it to BIGINT.
MySQL internally stores DATETIME or TIMESTAMP values as integers, so the size impact of keeping the time column is small. But you should page by id only (ORDER BY id DESC instead of by time) and make sure it's the primary key. This will make your query very fast, because it works directly on the clustered index.
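A sketch of the paging query rewritten to page on the primary key (same :lastLoadedId placeholder as in the question):
SELECT id, post
FROM posts
WHERE id < :lastLoadedId
ORDER BY id DESC
LIMIT 2;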
I have a table that looks like this
+----+------------+-----+
| id | restaurant | ... |
+----+------------+-----+
| 1  | one        | ... |
| 2  | tow        | ... |
+----+------------+-----+
And now I want to add in hours of operation for each restaurant. From what I have read, it would be bad to add a new column called hours, make it a varchar, and store something like
"9:00-22:00,10:00-20:00,10:00-21:00"
and then, when I pull this data into my app later, split it at the commas to make an array. I'm not 100% sure why this is bad, but I know that I am not supposed to do that, right?
So I was thinking of making a new table called "Restaurant_Hours" and having it look like this:
+----+------------+------------+-------------+
| id | restaurant | mon        | tue         | etc...
+----+------------+------------+-------------+
| 1  | one        | 9:00-22:00 | 10:00-22:00 |
| 2  | tow        | etc.       | etc.        |
Is this strategy of making the new table and laying it out the way I showed best, or is this also not the correct way of doing things? And then restaurant would be my unique key in each row, so I could get the hours that way?
The base of what I'm thinking is something like this:
CREATE TABLE `restaurant_hours` (
`restaurant_hour_id` INT NOT NULL AUTO_INCREMENT,
`restaurant_id` INT NOT NULL,
`day_of_week` TINYINT NULL,
`opens_at` TIME NOT NULL,
`closes_at` TIME NOT NULL,
`hours_desc` CHAR(16) NOT NULL DEFAULT '',
PRIMARY KEY (`restaurant_hour_id`)  -- an AUTO_INCREMENT column must be keyed
);
Of course, restaurant_id should be a FOREIGN KEY to restaurants.id, and you might want a UNIQUE constraint on (restaurant_id, day_of_week, hours_desc); see the sketch just below. If they have special hours for holidays, you might want to use day_of_week = 0 as a "flag".
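A hypothetical sketch of those constraints, assuming restaurants has an INT primary key named id:
ALTER TABLE `restaurant_hours`
  ADD FOREIGN KEY (`restaurant_id`) REFERENCES `restaurants` (`id`),
  ADD UNIQUE KEY `uq_restaurant_hours` (`restaurant_id`, `day_of_week`, `hours_desc`);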
... or if you're feeling really ambitious, have it also reference some sort of "day_descriptions" table, where 1-7 correspond to Sunday-Saturday, and >=8 can be used to signal things that may need to be calculated by year (specific holidays).
Edit: hours_desc is intended for things like "Breakfast", "Lunch", "Dinner", etc...
Even without that, a query to find out "what's open when" would go something like this:
SELECT r.restaurant
FROM restaurant_hours AS rh
INNER JOIN restaurants AS r ON rh.restaurant_id = r.id
WHERE rh.day_of_week = DAYOFWEEK(@theWhen)
  AND rh.opens_at < TIME(@theWhen)
  AND rh.closes_at > TIME(@theWhen)
;
This is a general question, one that I've been scratching my head on for a while now. My company's database handles about 2k rows a day. 99.9% of the time, we have no problem with the values that are returned in the different SELECT statements that are set up. However, on a very rare occasion, our database will "glitch" and return the value for a completely different row than what was requested.
This is a very basic example:
+---------+-------------------------+
| row_id | columnvalue |
+---------+-------------------------+
| 1 | 10 |
| 2 | 20 |
| 3 | 30 |
| 4 | 40 |
+---------+-------------------------+
SELECT columnvalue FROM table_name WHERE row_id = 1 LIMIT 1
Returns: 10
But on the very rare occasion, it may return: 20, or 30, etc.
I am completely baffled as to why it does this sometimes and would appreciate some insight into what appears to be a programming phenomenon.
More specific information:
SELECT
USERID, CONCAT( LAST, ', ', FIRST ) AS NAME, COMPANYID
FROM users, companies
WHERE users.COMPANYCODE = companies.COMPANYCODE
AND USERID = 9739 LIMIT 1
mysql> DESCRIBE users;
+------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+----------------+
| USERID | int(10) | NO | PRI | NULL | auto_increment |
| COMPANYCODE| varchar(255)| NO | MUL | | |
| FIRST | varchar(255)| NO | MUL | | |
| LAST | varchar(255)| NO | MUL | | |
+------------+-------------+------+-----+---------+----------------+
mysql> DESCRIBE companies;
+------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+----------------+
| COMPANYID | int(10) | NO | PRI | NULL | auto_increment |
| COMPANYCODE| varchar(255)| NO | MUL | | |
| COMPANYNAME| varchar(255)| NO | | | |
+------------+-------------+------+-----+---------+----------------+
What the results were supposed to be: 9739, "L----, E----", 2197
What the results were instead: 9739, "L----, E----", 3288
Basically, it returned the wrong company id based off the join with companycode. Given the nature of our company, I can't share any more information than that.
I have run this query 5k times and made every modification to the code imaginable in order to reproduce the second set of results, and I have not been able to duplicate it. I'm not quick to blame MySQL -- this has been happening (though rarely) for over 8 years, and I have exhausted all other possible causes. I suspected the results were manually changed after the query was run, but the timestamps state otherwise.
I'm just scratching my head as to why this can run perfectly 499k out of 500k times.
Now that we have a more realistic query, I notice right away that you are joining the tables not on the primary key, but on the company code. Are we certain that the company code is enforced as a unique index on companies? The LIMIT 1 would hide a second row if one were found.
From a design perspective, I would make the join on the primary key, to avoid even the possibility of duplicate keys, and keep company code as a unique indexed field for display and lookup only.
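If the code is indeed meant to be unique, one hypothetical fix is to enforce that at the database level (this will fail if duplicate codes already exist, which is itself diagnostic):
-- enforce uniqueness of the company code
ALTER TABLE companies ADD UNIQUE INDEX `ux_companycode` (`COMPANYCODE`);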
This behavior is due either to an incredibly unlikely SERIOUS bug in MySQL, or to MySQL returning a result that is valid at the time the statement is run, with some other software garbling the displayed result.
One possibility to consider is that the row had been modified (by some other statement) at the time your SQL statement executed, and then the row was changed again later. (That's the most likely explanation we'd have for MySQL returning an unexpected result.)
The use of the LIMIT 1 clause is curious, because if the predicate uniquely identifies a row, there should be no need for the LIMIT 1, since the query is guaranteed to return no more than one row.
This leads me to suspect that row_id is not unique, and that the query actually returns more than one row. With the LIMIT clause, there is no guarantee as to which of the rows will get returned (absent an ORDER BY clause.)
Otherwise, the most likely culprit is outdated cache contents, or other problems in the code.
UPDATE
The previous answer was based on the example query given; I purposefully omitted the possibility that EMP was a view that was doing a JOIN, since the question originally said it was a table, and the example query showed just the one table.
Based on the new information in the question, I suggest that you OMIT the LIMIT 1 clause from the query. That will reveal whether the query actually returns more than one row.
From the table definitions, we see that the database isn't enforcing a UNIQUE constraint on the COMPANYCODE column in the companies table (it's keyed as MUL, not UNI).
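A quick sketch to check whether duplicate codes exist:
-- list any company codes that appear more than once, with the ids they map to
SELECT COMPANYCODE, COUNT(*) AS cnt, GROUP_CONCAT(COMPANYID) AS company_ids
FROM companies
GROUP BY COMPANYCODE
HAVING COUNT(*) > 1;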
We also know there isn't a foreign key defined referencing the primary key of the companies table; the datatypes wouldn't even match (COMPANYCODE is a varchar, COMPANYID an int).
Normally, the foreign key would be defined referencing the PRIMARY KEY of the target table.
That is, we'd expect the users table to have a company_id column which references the COMPANYID (primary key) column in the companies table.
(Instead, the join condition is matching on the non-unique varchar COMPANYCODE column rather than on the primary key, which is very odd.)
There are several reasons this could happen. I suggest you look at the assumptions you're making. For example:
If you're using GROUP BY and one of the selected columns isn't an aggregate or a grouping expression, you're going to get an unpredictable value in that column. Make sure you use an appropriate aggregate (such as MAX or MIN) to get a predictable result in each column (see the sketch after this list).
If you're assuming a row order without making it explicit, and using LIMIT to get only the first row, the actual returned order of rows differs depending on that result's execution plan, which is going to differ in large resultsets based on the statistics available to the optimiser. Make sure you use ORDER BY in such situations.
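Both points, sketched with the illustrative names from the example table above:
-- point 1: every selected column is either grouped or aggregated
SELECT row_id, MAX(columnvalue) AS columnvalue
FROM table_name
GROUP BY row_id;

-- point 2: make the row order explicit before applying LIMIT
SELECT columnvalue
FROM table_name
ORDER BY row_id
LIMIT 1;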
I am still new to PHP and I was wondering which alternative would be better or maybe someone could suggest a better way.
I have a set of users, and I have to track all of their interactions with posts. If a user taps a button, it will add the post to a list, and if they tap it again, it will remove the post. So would it be better to:
Have a column of a JSON array of postIDs stored in the table for each user (probably thousands).
-or-
Have a separate table with every save (combination of postID and userID) (probably millions) and return all results where the userID's match?
For the purposes of this question, there are two tables: Table A is users and Table B is posts. How should I store all of the user's saved posts?
EDIT: Sorry, but I didn't mention that posts will have multiple user interactions and users will have multiple post interactions (Many to Many relationship). I think that would affect Bob's answer.
This is an interesting question!
The solution really depends on your expected use case. If each user has a list of posts they've tagged, and that is all the information you need, it will be expedient to list these as a field in the user's table (or in their blob if you're using a nosql backend - a viable option if this is your use case!). There will be no impact on transmission time since the list will be the same size either way, but in this solution you will probably save on lookup time, since you're only using one table and dbs will optimize to keep this information close together.
On the other hand, if you have to be able to query a given post for all the users that have tagged it, then option two will be much better. In the former method, you'd have to query all users and see if each one had the post. In this option, you simply have to find all the relations and work from there. Presumably you'd have a user table, a post table and a user_post table with foreign keys to the first two tables. There are other ways to do this, but it necessitates maintaining multiple lists and cross checking each time, which is an expensive set of operations and error-prone.
Note that the latter option shouldn't choke on 'millions' of connections, since the db should be optimized for this sort of quick read. (pro tip: index the proper columns!) Do be careful about any data massage, though. One unnecessary for-loop will kill your performance.
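A minimal sketch of that latter option, with illustrative names (it assumes users.id and posts.id are unsigned INT primary keys):
CREATE TABLE user_post (
  user_id INT UNSIGNED NOT NULL,
  post_id INT UNSIGNED NOT NULL,
  PRIMARY KEY (user_id, post_id),  -- one row per save; un-saving is a single DELETE
  KEY idx_post (post_id),          -- supports "who saved this post?" lookups
  FOREIGN KEY (user_id) REFERENCES users (id),
  FOREIGN KEY (post_id) REFERENCES posts (id)
);
Toggling a save then maps to a single INSERT or DELETE of the (user_id, post_id) pair.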
For the purposes of this question, there are two tables: Table A is users and Table B is posts. How should I store all of the user's saved posts?
If each user has a unique ID of some sort (primary key), then add a field to each post that refers to the unique ID of the user.
mysql> describe users;
+----------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+------------------+------+-----+---------+----------------+
| id | int(11) unsigned | NO | PRI | NULL | auto_increment |
| email | varchar(200) | YES | | NULL | |
| username | varchar(20) | YES | | NULL | |
+----------+------------------+------+-----+---------+----------------+
mysql> describe posts;
+---------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------+------------------+------+-----+---------+----------------+
| id | int(11) unsigned | NO | PRI | NULL | auto_increment |
| user | int(11) unsigned | NO | | NULL | |
| text | text | YES | | NULL | |
+---------+------------------+------+-----+---------+----------------+
Then to get posts for a user, for example:
SELECT text
FROM posts
WHERE user=5;
Or to get all the posts from a particular organization:
SELECT posts.text, users.username
FROM posts, users
WHERE posts.user = users.id
AND users.email LIKE '%@example.com';
I think it would make sense to keep a third table holding all of the post status data.
If your user interface shows, say, 50 posts per page, then the UI only needs to keep track of 50 posts at a time. They'll all have unique IDs in your database, so that shouldn't be a problem.
I'm developing a QA web-app which will have some points to evaluated assigned to one of the following Categories.
Call management
Technical skills
Ticket management
As these aren't likely to change, it's not worth making them dynamic; the worse part is that the points themselves are likely to change.
At first I had a "quality" table which had a column for each point, but then the requirements changed and I'm kinda blocked.
I have to store "evaluations" that have all points with their values but maybe, in the future, those points will change.
I thought that in the quality table I could make some kind of string that has something like this:
1=1|2=1|3=2
where each pair holds the ID of a point and the punctuation given for that point.
Can someone point me to a better method to do that?
As mentioned many times here on SO: NEVER PUT MORE THAN ONE VALUE INTO A DB FIELD IF YOU WANT TO ACCESS THEM SEPARATELY.
So I suggest to have 2 additional tables:
CREATE TABLE categories (id int AUTO_INCREMENT PRIMARY KEY, name VARCHAR(50) NOT NULL);
INSERT INTO categories VALUES (1,"Call management"),(2,"Technical skills"),(3,"Ticket management");
and
CREATE TABLE qualities (id int AUTO_INCREMENT PRIMARY KEY, category int NOT NULL, punctuation int NOT NULL);
then store and query your data accordingly.
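For example, a quick sketch of storing one evaluation's punctuations and reading them back with their category names:
-- store the punctuations from the question's example string 1=1|2=1|3=2
INSERT INTO qualities (category, punctuation) VALUES (1, 1), (2, 1), (3, 2);

-- read them back, one row per category
SELECT c.name, q.punctuation
FROM qualities AS q
JOIN categories AS c ON c.id = q.category;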
This table is not normalized. It violates 1st Normal Form (1NF):
Evaluation
----------------------------------------
EvaluationId | List Of point=punctuation
1 | 1=1|2=1|3=2
2 | 1=5|2=6|3=7
You can read more about Database Normalization basics.
The table could be normalized as:
Evaluation
-------------
EvaluationId
1
2
Quality
---------------------------------------
EvaluationId | Point | Punctuation
1 | 1 | 1
1 | 2 | 1
1 | 3 | 2
2 | 1 | 5
2 | 2 | 6
2 | 3 | 7