I am making a social application which needs paging for the posts.
Here is the database:
id | post     | time
---|----------|------
 1 | "oldest" |  9:00
 2 | "old"    | 10:00
 3 | "new"    | 11:00
 4 | "newest" | 12:00
In my app:
Newest posts are on top and I only load 2 posts at a time.
Let's say the first 2 posts are loaded into the app:
4 (12:00) newest
3 (11:00) new
User scrolls down, the app detects that the last post was reached, so it asks the PHP file to download 2 more, in the following order:
2 (10:00) old
1 (9:00) oldest
It works fine. The following is my code:
$qry = $db->prepare('SELECT id, post
FROM posts
WHERE id < :lastLoadedId
ORDER BY time DESC LIMIT 0, 2');
The problem / question:
My server automatically deletes really old posts (in order to save space).
Let's assume that after a while the MySQL table reaches its limit (the last available INT id, which is 2,147,483,647).
Then I need to start giving out ids from 1 again, and here comes the problem.
id            | post     | time
--------------|----------|------
1             | "new"    | 11:00
2             | "newest" | 12:00
2,147,483,646 | "oldest" |  9:00
2,147,483,647 | "old"    | 10:00
The first 2 posts are loaded into my app again:
2 (12:00) newest
1 (11:00) new
When it tries to load more, it searches for ids smaller than 2; but since 2,147,483,646 and 2,147,483,647 are bigger, the "oldest" and "old" posts would never be returned.
Should I worry about this?
How do big companies handle that much data? Do they start a new table after a while?
According to the MySQL website, an unsigned BIGINT can go up to 18,446,744,073,709,551,615. If you insert 1 million records per second, 24x7, it will take 584,542 years to reach the limit. So I don't think you should worry too much.
Here is an example :
CREATE TABLE foo (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`id`)
);
Note that the 20 stands for the number of digits to be displayed and has nothing to do with storage.
Why not add another column?
ALTER TABLE `files` ADD `real_id` BIGINT NOT NULL AFTER `id`;
So before adding an article, search for the last (biggest) real_id and then increment it.
You have a problem with your design.
Your id is not a real id. The id should be the primary key and auto-increment; there should never be a case of reusing ids, as it's confusing.
Your id's INT data type is not enough to support real-life data. As other answers suggest, change it to BIGINT.
MySQL internally stores a DATETIME or TIMESTAMP as an integer, so the size impact is small. But you should page by id only (ORDER BY id DESC instead of by time) and make sure it is the primary key. This will make your query very fast, because it works directly on the clustered index.
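Paging by id, as suggested, can be sketched end to end. This is a minimal illustration, with SQLite standing in for MySQL and the table/column names taken from the question; the SQL itself is the same keyset-pagination pattern:

```python
import sqlite3

# In-memory stand-in for the posts table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, post TEXT)")
conn.executemany("INSERT INTO posts (id, post) VALUES (?, ?)",
                 [(1, "oldest"), (2, "old"), (3, "new"), (4, "newest")])

def next_page(last_loaded_id, page_size=2):
    # Page strictly by id: no ties, and the primary-key index drives the scan.
    return conn.execute(
        "SELECT id, post FROM posts WHERE id < ? "
        "ORDER BY id DESC LIMIT ?",
        (last_loaded_id, page_size)).fetchall()

# First page has no cursor yet; later pages use the last loaded id.
first = conn.execute(
    "SELECT id, post FROM posts ORDER BY id DESC LIMIT 2").fetchall()
second = next_page(first[-1][0])
```

Here `first` yields posts 4 and 3, and `second` yields 2 and 1, matching the behavior described in the question.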
Related
I have a PHP script pulling a JSON file that is static and updates every 10 seconds. It has details about some events that happen and it just adds to the top of the JSON file. I then insert them into a MySQL database.
Because I have to pull every event every time I pull the file, I will only be inserting new events. The easy way would be to search for the event in the database (primary keys are not the same), but I am talking about ~4000 events every day, and I do not want that many queries just to see if it exists.
I am aware of INSERT IGNORE, but it looks like it only uses PRIMARY_KEY to do this.
What can I do (preferably easily) to prevent duplicates on two keys?
Example:
I have a table events with the following columns:
ID (irrelevant, really)
event_id (that I need to store from the source)
action_id (many action_ids belong to one event_id)
timestamp
whatever...
And the data in my JSON comes out on the first pull like this:
event_id|action_id|...
1 | 1
1 | 2
1 | 3
2 | 1
2 | 2
2 | 3
Then the next pull is this:
event_id|action_id|...
1 | 1
1 | 2
1 | 3
1** | 4**
1** | 5**
2 | 1
2 | 2
2 | 3
2** | 4**
I only want the rows marked with asterisks to be inserted, and the others to be ignored. Remember, the primary-key column id is irrelevant in this table; I just use it for ubiquity.
What command can I use to INSERT every event I pull, but only add those that aren't duplicated across the two columns event_id and action_id?
Thanks.
Create a unique index on both columns.
CREATE
UNIQUE INDEX event_action
ON tablename (event_id, action_id)
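The whole flow can be sketched like this. SQLite stands in for MySQL here (its `INSERT OR IGNORE` corresponds to MySQL's `INSERT IGNORE`), and the column names follow the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE events (
    id INTEGER PRIMARY KEY,
    event_id INTEGER,
    action_id INTEGER)""")
# The unique index over both columns is what makes IGNORE skip duplicates.
conn.execute("CREATE UNIQUE INDEX event_action ON events (event_id, action_id)")

first_pull = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3)]
# Second pull repeats everything and adds three new rows.
second_pull = first_pull + [(1, 4), (1, 5), (2, 4)]

for pull in (first_pull, second_pull):
    conn.executemany(
        "INSERT OR IGNORE INTO events (event_id, action_id) VALUES (?, ?)",
        pull)

row_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

After both pulls the table holds 9 rows: the 6 originals plus the 3 starred ones, with every duplicate silently skipped in a single statement per pull.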
I'm attempting to build a database that stores messages for multiple users. Each user will be able to send/receive 5 different message "types" (strictly a label; the actual data types will be the same).
My initial thought was to create multiple tables for each user, representing the 5 different message types. I quickly learned this is not such a good idea. My next thought was to create 1 table per message type with a users column, but I'm not sure that's the best method either from a performance perspective.
What happens if user 1 sends 100 message type 1's, while user 3 only sends 10? The remaining fields would be null values, and I'm really not sure if that makes a difference or not.
Thoughts? Suggestions and/or suggested reading? Thank you in advance!
No, that (the idea given in the subject of this question) will be tremendously inefficient. You'd need to introduce a new table each time a new user is created, and querying them all at once would be a nightmare.
It's far easier to do this with a single table for storing information about messages. Each row in this table will correspond to one, and only one, message.
Besides, this table should probably have three 'referential' columns: two for linking a specific message to its sender and receiver, and one for storing its type, which can take only a limited set of values.
For example:
MSG_ID | SENDER_ID | RECEIVER_ID | MSG_TYPE | MSG_TEXT
------------------------------------------------------
1 | 1 | 2 | 1 | .......
2 | 2 | 1 | 1 | #######
3 | 1 | 3 | 2 | $$$$$$$
4 | 3 | 1 | 2 | %%%%%%%
...
It'll be quite easy to get all the messages sent by someone (with a WHERE sender_id = %someone_id% clause), sent to someone (WHERE receiver_id = %someone_id%), or of some specific type (WHERE msg_type = %some_type%). Best of all, one can easily combine these clauses to set up more sophisticated filters.
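These combinable filters can be sketched directly against the example table above (SQLite stands in for MySQL; the data mirrors the sample rows):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE messages (
    msg_id INTEGER PRIMARY KEY,
    sender_id INTEGER,
    receiver_id INTEGER,
    msg_type INTEGER,
    msg_text TEXT)""")
conn.executemany(
    "INSERT INTO messages (sender_id, receiver_id, msg_type, msg_text) "
    "VALUES (?, ?, ?, ?)",
    [(1, 2, 1, "......."), (2, 1, 1, "#######"),
     (1, 3, 2, "$$$$$$$"), (3, 1, 2, "%%%%%%%")])

# All messages sent by user 1:
sent_by_1 = conn.execute(
    "SELECT msg_id FROM messages WHERE sender_id = 1").fetchall()
# The same clause combined with a type filter:
sent_by_1_type_2 = conn.execute(
    "SELECT msg_id FROM messages "
    "WHERE sender_id = 1 AND msg_type = 2").fetchall()
```

Against the sample data, user 1 sent messages 1 and 3, and only message 3 is of type 2; narrowing a result set is just one more AND.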
What you initially thought of, it seems, looks like this:
IS_MSG_TYPE1 | IS_MSG_TYPE2 | IS_MSG_TYPE3 | IS_MSG_TYPE4
---------------------------------------------------------
1 | 0 | 0 | 0
0 | 1 | 0 | 0
0 | 0 | 1 | 0
It can be NULLs instead of 0s; the core is still the same. And it's broken. Yes, you can still get all the messages of a single type with a WHERE is_msg_type_1 = 1 clause. But even such an easy task as getting the type of a specific message becomes, well, not so easy: you'll have to check each of these 5 columns until you find the one that holds a truthy value.
Similar difficulties await anyone who tries to count the number of messages of each type (which is almost trivial with the structure given above: COUNT(msg_id) ... GROUP BY msg_type).
So please, don't do this. Unless you have a very strong reason not to, try to structure your tables so that, as time passes, they grow in height, not in width.
The remaining fields would be null values
Except if you're designing your database vertically, there will be no remaining fields.
user int
msgid int
msg text
create table `tv_ge_main`.`Users` (
  `USER_ID` bigint NOT NULL AUTO_INCREMENT,
  `USER_NAME` varchar(128),
  PRIMARY KEY (`USER_ID`)
)
create table `tv_ge_main`.`Message_Types` (
  `MESSAGE_TYPE_ID` bigint NOT NULL AUTO_INCREMENT,
  `MESSAGE_TYPE` varchar(128),
  PRIMARY KEY (`MESSAGE_TYPE_ID`)
)
create table `tv_ge_main`.`Messages` (
  `MESSAGE_ID` bigint NOT NULL AUTO_INCREMENT,
  `USER_ID` bigint,
  `MESSAGE_TYPE_ID` bigint,
  `MESSAGE_TEXT` varchar(255),
  PRIMARY KEY (`MESSAGE_ID`)
)
Thanks for reading.
This is not a coding question as much as it is a logic one. But if my current logic is wrong, some coding help would be appreciated.
I have made a table on my database which is a log of everything that happens on my site.
When a user registers, it's saved. When he logs in, again. And so on. Each action is represented by a number.
The data looks like this
----------------------------
| id | action | timestamp |
----------------------------
| 1 | 1 | 1299132900 |
| 2 | 2 | 1346876672 |
| 3 | 14 | 1351983948 |
| 4 | 1 | 1359063373 |
----------------------------
ID and action are of type INT(11) and timestamp is TIMESTAMP
I'm using a query to retrieve all records from the last 30 days.
SELECT id, action, timestamp FROM log WHERE timestamp >= DATE_SUB( CURDATE(),INTERVAL 30 DAY)
It works, and gives me all the correct values.
I need to arrange this data to make a graphic in flot.
As I see it, there are 2 steps:
Group the results by action number.
Then, inside each group, separate values by date, so the X axis of the graphic is date and Y axis is count.
With those arrays I could build the different JavaScript data arrays to pass to flot.
Am I on the right track?
Should there be several mysql queries, or a GROUP BY clause?
I'm kind of lost here and would appreciate any help.
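You're on the right track, and both steps collapse into a single GROUP BY over (action, day). A minimal sketch, with SQLite's `date(ts, 'unixepoch')` standing in for MySQL's `DATE(FROM_UNIXTIME(ts))` and the sample rows from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE log (id INTEGER PRIMARY KEY, action INTEGER, ts INTEGER)")
conn.executemany("INSERT INTO log (action, ts) VALUES (?, ?)",
                 [(1, 1299132900), (2, 1346876672),
                  (14, 1351983948), (1, 1359063373)])

# One query: bucket by action and calendar day, count rows per bucket.
rows = conn.execute(
    "SELECT action, date(ts, 'unixepoch') AS day, COUNT(*) AS hits "
    "FROM log GROUP BY action, day "
    "ORDER BY action, day").fetchall()
```

Each resulting row is one (action, day, count) point; in PHP the same result set can be split per action into the per-series arrays flot expects for its X (date) and Y (count) axes.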
I have a report I'm rewriting for an application using MySQL as the database. Currently, the report is using a lot of grunt work coming from php, which creates arrays, re-stores them into a temp database then generates results from that temp DB.
One of the main goals of rewriting the bulk of this code is to simplify and clean a lot of my old code, and I am wondering whether the process below can be simplified, or better yet done solely in MySQL, letting PHP just handle the distribution of the data to the client.
I will use a made up scenario to describe what I am attempting to do:
Let's assume the following table (please note in real app, this table's information is actually pulled from several tables, but this should get the point across for clarity):
+----+-----------+--------------+--------------+
| id | location | date_visited | time_visited |
+----+-----------+--------------+--------------+
| 1 | place 1 | 2012-04-20 | 11:00:00 |
+----+-----------+--------------+--------------+
| 2 | place 2 | 2012-04-20 | 11:06:00 |
+----+-----------+--------------+--------------+
| 3 | place 1 | 2012-04-20 | 11:06:00 |
+----+-----------+--------------+--------------+
| 4 | place 3 | 2012-04-20 | 11:20:00 |
+----+-----------+--------------+--------------+
| 5 | place 2 | 2012-04-20 | 11:21:00 |
+----+-----------+--------------+--------------+
| 6 | place 1 | 2012-04-20 | 11:22:00 |
+----+-----------+--------------+--------------+
| 7 | place 3 | 2012-04-20 | 11:23:00 |
+----+-----------+--------------+--------------+
The report I need requires me to first list each location and then the number of visits made to that place. However, the caveat that makes the query difficult for me is that a time interval needs to be met for a visit to count within this report.
For example: Let's say the interval between visits to any given place is 10 minutes.
The first entry is locked in automatically because there are no previous entries, and so is the second since there are no other entries for 'place 2' yet. However on the third entry, place 1 is checked for the last time it was visited, which was less than the interval defined (10 minutes), therefore the report would ignore this entry and move along to the next one.
In essence, we are checking on a case by case scenario where the time interval is not from the last entry, but from the last entry from the same location.
The results from the report should look something like this in the end:
+----+-----------+--------+
| id | location | visits |
+----+-----------+--------+
| 1 | place 1 | 2 |
+----+-----------+--------+
| 2 | place 2 | 2 |
+----+-----------+--------+
| 3 | place 3 | 1 |
+----+-----------+--------+
My current implementation on a basic level goes through the following steps to acquire the above result set:
MySQL query creates one temp table with a list of all the required locations and their ID.
MySQL query selects all the visit data within the specified time frame and passes it to PHP.
PHP & MySQL populate the temporary table with the visits data, PHP does the grunt work here.
MySQL selects data from temporary table and returns it to client for display.
My question is: is there a way to do most of this with MySQL alone? What I've been trying to find is a way to write a MySQL query that selects only the visits which meet the above criteria, then groups them by location and provides me with a COUNT(*) of each group.
I really don't know if it's possible and am in hopes that one of the database gurus out there might be able to shed some light on how to do this.
Suppose you have a table (probably temporary) of a slightly different structure:
CREATE TABLE `visits` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`location` varchar(45) NOT NULL,
`visited` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `loc_vis` (`location`,`visited`)
) ENGINE=InnoDB;
INSERT INTO visits (location, visited) VALUES
('place 1', '2012-04-20 11:00:00'),
('place 2', '2012-04-20 11:06:00'),
('place 1', '2012-04-20 11:06:00'),
('place 3', '2012-04-20 11:20:00'),
('place 2', '2012-04-20 11:21:00'),
('place 1', '2012-04-20 11:22:00'),
('place 1', '2012-04-20 11:23:00');
which, as you see, has an index on (location,visited). Then the following query will use the index, that is read data in the order of the index, and return the results you expected:
SELECT
location,
COUNT(IF(#loc <> #loc:=location,
#vis:=visited,
IF(#vis + INTERVAL 10 MINUTE < #vis:=visited,
visited,
NULL))) as visit_count
FROM visits,
(SELECT #loc:='', #vis:=FROM_UNIXTIME(0)) as init
GROUP BY location;
Result:
+----------+-------------+
| location | visit_count |
+----------+-------------+
| place 1 | 2 |
| place 2 | 2 |
| place 3 | 1 |
+----------+-------------+
3 rows in set (0.00 sec)
Some explanation:
The key of the solution is that it fades out the functional nature of SQL, and uses MySQL implementation specifics (they say it is bad, never do it again!!!).
If a table has an index (an ordered representation of column values) and the index is used in a query, that means that the data from the table is read in the order of the index.
GROUP BY operation will benefit from an index (since the data is already grouped there) and will choose it if it is applicable.
All aggregating functions in SQL (except for COUNT(*) which has a special meaning) check each row, and use the value only if it is not NULL (the expression within COUNT above returns NULL for wrong conditions)
The rest is just a hacky representation of procedural iteration over a list of rows (which are read in the order of the index, that is, ordered by location asc, visited asc): I initialize some variables; if the location differs from the previous row's, I count it; if not, I check the interval and return NULL if it is wrong.
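For what it's worth, the "previous row within the same location" comparison that the user variables emulate is exactly what `LAG() OVER (PARTITION BY location ORDER BY visited)` expresses in engines that support window functions. A sketch in SQLite (3.25+) with the same data and the same 10-minute rule:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visits (id INTEGER PRIMARY KEY, location TEXT, visited TEXT)")
conn.executemany("INSERT INTO visits (location, visited) VALUES (?, ?)",
                 [("place 1", "2012-04-20 11:00:00"),
                  ("place 2", "2012-04-20 11:06:00"),
                  ("place 1", "2012-04-20 11:06:00"),
                  ("place 3", "2012-04-20 11:20:00"),
                  ("place 2", "2012-04-20 11:21:00"),
                  ("place 1", "2012-04-20 11:22:00"),
                  ("place 1", "2012-04-20 11:23:00")])

# Count a visit when it is the first for its location, or more than
# 600 seconds after the previous visit to the same location.
rows = conn.execute("""
    SELECT location, COUNT(*) AS visit_count
    FROM (SELECT location, visited,
                 LAG(visited) OVER
                     (PARTITION BY location ORDER BY visited) AS prev
          FROM visits)
    WHERE prev IS NULL
       OR strftime('%s', visited) - strftime('%s', prev) > 600
    GROUP BY location
    ORDER BY location""").fetchall()
```

This reproduces the same counts as the variable-based query (2, 2, 1), without relying on evaluation-order specifics.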
You can populate the temporary table using a INSERT / SELECT statement.
See manual. http://dev.mysql.com/doc/refman/5.0/en/insert-select.html
I'd use the GROUP BY in the SELECT statement to narrow down the places.
For the visits column that can be populated as a COUNT operation, and I think it might be possible to perform that as also part of the INSERT / SELECT.
See manual. http://dev.mysql.com/doc/refman/5.1/en/counting-rows.html
So your SQL might look something like this.
INSERT INTO temp
SELECT * FROM (
    SELECT location, COUNT(*) AS visits
    FROM source AS table1
    WHERE date_visited > xxxx AND date_visited < xxxx
    GROUP BY location
) AS table2
Seriously, that is off the top of my head but it should give you some ideas on how SQL can be structured. But you likely can do the report using just one good query.
I'm developing a QA web-app which will have some points to be evaluated, each assigned to one of the following categories:
Call management
Technical skills
Ticket management
As these aren't likely to change, it's not worth making them dynamic; the bad part is that the points themselves are likely to.
At first I had a 'quality' table which had a column for each point, but then the requirements changed and I'm kind of blocked.
I have to store "evaluations" that hold all the points with their values, but maybe, in the future, those points will change.
I thought that in the quality table I could keep some kind of string like
1=1|2=1|3=2
where each pair holds the ID of a point and the punctuation given to it.
Can someone point me to a better method to do that?
As mentioned many times here on SO: NEVER PUT MORE THAN ONE VALUE INTO A DB FIELD IF YOU WANT TO ACCESS THEM SEPARATELY.
So I suggest to have 2 additional tables:
CREATE TABLE categories (id int AUTO_INCREMENT PRIMARY KEY, name VARCHAR(50) NOT NULL);
INSERT INTO categories VALUES (1,"Call management"),(2,"Technical skills"),(3,"Ticket management");
and
CREATE TABLE qualities (id int AUTO_INCREMENT PRIMARY KEY, category int NOT NULL, punctuation int NOT NULL);
Then store and query your data accordingly.
This table is not normalized. It violates 1st Normal Form (1NF):
Evaluation
----------------------------------------
EvaluationId | List Of point=punctuation
1 | 1=1|2=1|3=2
2 | 1=5|2=6|3=7
You can read more about Database Normalization basics.
The table could be normalized as:
Evaluation
-------------
EvaluationId
1
2
Quality
---------------------------------------
EvaluationId | Point | Punctuation
1 | 1 | 1
1 | 2 | 1
1 | 3 | 2
2 | 1 | 5
2 | 2 | 6
2 | 3 | 7
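With the normalized Quality table, reading the scores back is a plain query instead of string parsing. A short sketch (SQLite stands in for the actual database; the table and column names follow the example above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Quality (
    EvaluationId INTEGER,
    Point INTEGER,
    Punctuation INTEGER)""")
conn.executemany("INSERT INTO Quality VALUES (?, ?, ?)",
                 [(1, 1, 1), (1, 2, 1), (1, 3, 2),
                  (2, 1, 5), (2, 2, 6), (2, 3, 7)])

# All scores for evaluation 1 -- no "1=1|2=1|3=2" splitting needed:
scores = conn.execute(
    "SELECT Point, Punctuation FROM Quality "
    "WHERE EvaluationId = 1 ORDER BY Point").fetchall()
# And aggregates come for free, e.g. the average score per point:
averages = conn.execute(
    "SELECT Point, AVG(Punctuation) FROM Quality "
    "GROUP BY Point ORDER BY Point").fetchall()
```

Adding a new point in the future is just a new row per evaluation; neither the schema nor any parsing code has to change.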