I'm building a small messaging system for my app. The primary idea is to have two tables.
Table 1: messages
Id, sender_id, title, body, file, parent_id
This is where messages are stored, decoupled from who will receive them, to allow for multiple recipients.
parent_id links to the parent message if it's a reply, and file is a BLOB that stores a single file attached to the message.
Table 2: message_users
Id, thread_id, user_id, is_read, stared, deleted
This links the parent thread to the target users.
Now, for a single user, to get the count of unread messages I can do:
Select count(*) from message_users where user_id = 1 and is_read is null
To get the count of all messages in his inbox I can do:
Select count(*) from message_users where user_id = 1;
The question is: how do I combine both into a single optimized query?
So you're trying to produce a result that totals the rows meeting one condition alongside the rows meeting an extra condition:
|---------|---------|
|  total  | unread  |
|---------|---------|
|   20    |   12    |
|---------|---------|
As such, you'll need something with a form along the lines of:
SELECT A total, B unread FROM message_users WHERE user_id=1
A is fairly straightforward; you already more-or-less have it: COUNT(Id).
B is marginally more complicated and might take the form SUM( IF(is_read IS NULL,1,0) ) -- add 1 each time is_read is null; the exact condition will depend on your database specifics.
Or B might look like: COUNT(CASE WHEN is_read IS NULL THEN 1 END) unread -- this is saying 'when is_read is null, count another 1'.
In total:
SELECT COUNT(Id) total, COUNT(CASE WHEN is_read IS NULL THEN 1 END) unread FROM message_users WHERE user_id=1
Or:
SELECT COUNT(Id) total, SUM( IF(is_read IS NULL,1,0) ) unread FROM message_users WHERE user_id=1
In terms of optimisation, I'm not aware of a query that can necessarily go quicker than this. (I'd love to know of one if it does exist!) There may be ways to speed things up if you have a problem with performance:
Examine your indexes: use the built-in EXPLAIN tool, plus some reading around, etc.
Use caches and/or pre-compute the value and store it elsewhere -- e.g. keep a field unread_messages against the user and grab this value directly. Obviously there will need to be some on-write invalidation, or indeed a service running to keep these values up to date. There are many ways of achieving this: tools in MySQL, hand-rolling your own, etc.
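As a sketch of those two ideas against the schema above -- the index name, the users table, and its unread_messages column are assumptions for illustration, not part of the original design:
-- covering index: both counts can be answered from the index alone
ALTER TABLE message_users ADD INDEX idx_user_read (user_id, is_read);
-- denormalised counter, maintained on write (assumes new rows start unread;
-- a corresponding trigger would decrement it when is_read gets set)
ALTER TABLE users ADD COLUMN unread_messages INT NOT NULL DEFAULT 0;
CREATE TRIGGER message_users_after_insert
AFTER INSERT ON message_users
FOR EACH ROW
UPDATE users SET unread_messages = unread_messages + 1 WHERE id = NEW.user_id;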
In short, start optimising from a requirement and some real data. ("My query takes 0.8s; I need the results in 0.1s and they need to be consistent 100% of the time -- how can I achieve this?") Then you can tweak and experiment with the SQL, the hardware the server runs on (maybe?), caching/pre-calculating at different points, etc.
In MySQL, when you count a field, it only counts non-null occurrences of that field, so you should be able to do something like this:
SELECT COUNT(user_id) AS total, COUNT(user_id) - COUNT(is_read) AS unread
FROM message_users
WHERE user_id = 1;
Untested, but it should point you in the right direction.
You can use SUM with a CASE WHEN clause. If is_read is null then +1 is added to the sum, else +0.
SELECT count(*) AS count_total,
SUM(CASE WHEN is_read IS NULL THEN 1 ELSE 0 END) AS count_unread
FROM message_users WHERE user_id = 1;
Related
I have a query over two tables -- matchoverview:
id, home_id, away_id, date, season, result
and matchattributes:
id, match_id, attribute_id, attribute_value
My query
select m.id from matchOverview m
join matchAttributes ma on ma.match_id=m.id and ma.attribute_id in (3,4,5,6)
group by m.id
having sum(case when ma.attribute_id in (3,4)
then ma.attribute_value end) > 3
or sum(case when ma.attribute_id in (5,6)
then ma.attribute_value end) > 3;
Which returns all match ids where the sum of attributes 3 and 4 or 5 and 6 is greater than 3.
This particular query returns 900k rows. Unsurprisingly, in phpMyAdmin it takes a good deal of time, as I imagine it needs to format the results into a table, but phpMyAdmin clocks the query itself at 0.0113 seconds.
Yet when I make this query over PHP it takes 15 seconds. If I alter the query to LIMIT it to only 100 results, it runs almost instantly, leaving me with the belief that the only possibility is that the amount of data being transferred is what is slowing it down.
But would it really take 15 seconds to transfer 1M 4-byte ints over the network?
Is the only solution to further limit the query so that it returns fewer results?
EDIT
Results of an EXPLAIN on my query:
id  select_type  table  type   possible_keys    key      key_len  ref                 rows     Extra
1   SIMPLE       m      index  PRIMARY          PRIMARY  4        NULL                2790717  Using index
1   SIMPLE       ma     ref    match,attribute  match    4        opta_matches2.m.id  2        Using where
How I am timing my SQL query:
$time_pre = microtime(true);          // timestamp just before issuing the query
$quer = $db->query($sql);             // with default buffered queries this call also transfers the full result set
$time_post = microtime(true);         // timestamp once the call returns
$exec_time = $time_post - $time_pre;  // elapsed wall-clock seconds
Data from slow query log
# Thread_id: 15 Schema: opta_matches2 QC_hit: No
# Query_time: 15.594386 Lock_time: 0.000089 Rows_sent: 923962 Rows_examined: 15688514
# Rows_affected: 0 Bytes_sent: 10726615
I am OK with dealing with a 15-second query if that is how long it takes the data to move over the network, but if the query or my table can be optimized, that is the best solution.
The row count is not the issue; the following query
select m.id from matchOverview m
join matchAttributes ma on ma.match_id=m.id and ma.attribute_id in (1,2,3,4)
group by m.id
having sum(case when ma.attribute_id in (3,4)
then ma.attribute_value end) > 8
and sum(case when ma.attribute_id in (1,2)
then ma.attribute_value end) = 0;
returns only 24 rows, but it also takes ~15 seconds.
phpMyAdmin doesn't give you all of the results; it silently applies a default LIMIT of 25 rows.
If you change this limit via the "Number of rows" select box, or type a LIMIT into the query yourself, the query will take much longer to run.
I think if you rewrote the conditions, at a minimum you might find something out. For instance, I think this does the same as the second example (the 24-results one):
SELECT
    m.id
  , at.total_12
  , at.total_34
FROM matchOverview AS m
JOIN (
    SELECT
        ma.match_id
      , SUM(IF (ma.attribute_id IN(1,2), ma.attribute_value, 0)) AS total_12
      , SUM(IF (ma.attribute_id IN(3,4), ma.attribute_value, 0)) AS total_34
    FROM matchAttributes AS ma
    WHERE ma.attribute_id IN(1,2,3,4)
    GROUP BY ma.match_id
) AS at ON at.match_id = m.id
WHERE at.total_12 = 0
  AND at.total_34 > 8
It's more verbose, but it could help you triangulate where the bottleneck(s) come from more readily.
For instance, if a working version of the above is still slow, then run the inner query with the GROUP BY intact. Still slow? Remove the GROUP BY. Move the GROUP BY/SUM into the outer query; what happens?
That kind of thing. I can't run it here, so I can't work out a more precise answer, much as I would like to.
There are probably two significant parts to the timing: Locate the rows and decide which ids to send; then send them. I will address both.
Here's a way to better separate the elapsed time for just the query (and not the network): SELECT COUNT(*) FROM (...) AS x; Where '...' is the 1M-row query.
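For instance, applied to the query from the question, the probe would look like:
SELECT COUNT(*) FROM (
    select m.id from matchOverview m
    join matchAttributes ma on ma.match_id=m.id and ma.attribute_id in (3,4,5,6)
    group by m.id
    having sum(case when ma.attribute_id in (3,4) then ma.attribute_value end) > 3
        or sum(case when ma.attribute_id in (5,6) then ma.attribute_value end) > 3
) AS x;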
Speeding up the query
Since you aren't really using matchoverview, let's get rid of it:
select ma.match_id
from matchAttributes ma
WHERE ma.attribute_id in (3,4,5,6)
group by ma.match_id
having sum(case when ma.attribute_id in (3,4) then ma.attribute_value end) > 3
or sum(case when ma.attribute_id in (5,6) then ma.attribute_value end) > 3;
And have a composite index with the columns in this order:
INDEX(attribute_id, attribute_value, match_id)
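In ALTER TABLE form (the index name is arbitrary):
ALTER TABLE matchAttributes
    ADD INDEX idx_attr_val_match (attribute_id, attribute_value, match_id);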
As for the speedy LIMIT, that is because it can stop short. But a LIMIT without an ORDER BY is rather meaningless. If you add an ORDER BY, it will have to gather all the results, sort them, and finally perform the LIMIT.
Network transfer time
Transferring millions of rows (I see 10.7MB in the slowlog) over the network is time-consuming, but takes virtually no CPU time.
One EXPLAIN implies that there might be 2.8M rows; is that about correct? The slowlog says that about 16M rows are touched -- this may be because of the two tables, join, group by, etc. My reformulation and index should decrease the 16M significantly, hence decrease the elapsed time (before the network transfer time).
923K rows "sent" -- what will the client do with that many rows? In general, I find that more than a few thousand rows "sent" indicates poor design.
"take 15 seconds to transfer 1M 4 byte ints over the network" -- That is elapsed time, and cannot be sped up except by sending fewer rows. (BTW, they are probably sent as strings of several digits, plus overhead for each row; I don't know whether the 10726615 is actual network bytes or counts only the ints.)
"the ids are used in an internal calculation" -- How do you calculate with ids? If you are looking up the ids in some other place, perhaps you can add complexity to the query, thereby doing more work before hitting the network and then shipping less data?
If you want to discuss further, please provide SHOW CREATE TABLE. (It may have some details that don't show up in your simplified table definition.)
Beginner here so please go easy on me :)
So I have these two tables in my DB
Reply Table
+------------------------------------------------+
| message_id | client_id | message | date_posted |
+------------------------------------------------+
Request Table (Exactly the same)
+------------------------------------------------+
| message_id | client_id | message | date_posted |
+------------------------------------------------+
Problem:
They serve a messaging app I was testing, but now I don't know how to query these two tables to get the whole chat ordered by date. For example:
Client 14 (2 hours ago): Hello there // Coming from request table
Admin (1 hour ago): Welcome // Coming from reply table
So the messages are displayed oldest first...
I tried using a JOIN on client_id since that is what I want. However, it doesn't seem to work.
I also tried selecting from a subquery containing a UNION ALL, with no luck either... Any ideas on how this can be done? Thanks in advance!
A union is what you're looking for. In your case, a join would combine columns from the two tables into a single row, whereas you're looking to union rows from multiple tables into a single result set.
You'll want to parenthesize your select statements individually, and then add the order clause.
Edit: Updating this answer to include a column for the source table, as per OP's comment
(select 'reply_table' as source, r.* from reply_table r)
union all
(select 'request_table' as source, q.* from request_table q)
order by date_posted asc
MySQL's docs are pretty good, and its page on unions outlines several sorting scenarios: https://dev.mysql.com/doc/refman/5.7/en/union.html
But the instruction specific to your case is:
To use an ORDER BY or LIMIT clause to sort or limit the entire UNION result, parenthesize the individual SELECT statements and place the ORDER BY or LIMIT after the last one.
select a.message
from table1 a
inner join table2 b on a.client_id = b.client_id
order by a.date_posted desc;
In our project we've got a user table where user data is stored: a name and different kinds of scores (overall score, quest score, etc.). How the values are calculated doesn't matter; just take them as separate values.
Let's say the table 'users' looks like below:
id name score_overall score_trade score_quest
1 one 40000 10000 20000
2 two 20000 15000 0
3 three 30000 1000 50000
4 four 80000 60000 3000
For showing the scores there is a dummy table plus one table for each kind of score, where the username is stored together with the point score and a rank. All the tables look the same but have different names:
id name score rank
They are separated to allow the users to search and filter the tables. Let's say there is one row for the player "playerX", who has rank 60. If I filter the score table for "playerX" I only see this row, but still with rank 60. That means the ranks are "hard stored" and not just displayed dynamically via a row number or something like that.
The different score tables are filled via a cronjob (with the help of an additional dummy table) which does the following:
copies the user data to a dummy table
reorders the dummy table with ORDER BY score
copies the dummy table to the specific score table, so the AUTO_INCREMENT primary key (rank) is automatically filled with the right values, representing the rank for each user
That means: when there are five specific scores there are also five score tables plus the dummy table, making a total of six.
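A sketch of what that cron job boils down to, with hypothetical table names (TRUNCATE resets the AUTO_INCREMENT counter, so the key doubles as the rank):
-- 1) refill the dummy table, ordered by one of the scores
TRUNCATE TABLE dummy;
INSERT INTO dummy (name, score)
SELECT name, score_overall FROM users ORDER BY score_overall DESC;
-- 2) copy into the per-score table; its AUTO_INCREMENT key (the rank) fills 1, 2, 3, ...
TRUNCATE TABLE overall_scores;
INSERT INTO overall_scores (name, score)
SELECT name, score FROM dummy ORDER BY id;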
How to optimize?
What I would like to do is optimize the whole thing, dropping the duplicate tables (and avoiding the dummy table if possible), and store all the score data in one table which has the following cols:
userid, overall_score, overall_rank, trade_score, trade_rank, quest_score, quest_rank
My question is now how I could best do this, and whether there is another way than the one shown above (with all the different tables)? MySQL statements and/or PHP code are welcome.
Some time ago I tried using row numbers, but this doesn't work a) because they can't be used in insert statements and b) because when filtering, every player (like "playerX" in the example above) would be on rank 1, as theirs is the only row returned.
Well, you can try creating a table with the following configuration:
id | name | score_overall | score_trade | score_quest | overall_rank | trade_rank | quest_rank
If you do that, you can use the following query to populate the table:
SET @overall_rank := 0;
SET @trade_rank := 0;
SET @quest_rank := 0;
SELECT u.*, ovr.overall_rank, tr.trade_rank, qr.quest_rank
FROM users u
INNER JOIN (SELECT id,
                   @overall_rank := @overall_rank + 1 AS overall_rank
            FROM users
            ORDER BY score_overall DESC) ovr
        ON u.id = ovr.id
INNER JOIN (SELECT id,
                   @trade_rank := @trade_rank + 1 AS trade_rank
            FROM users
            ORDER BY score_trade DESC) tr
        ON u.id = tr.id
INNER JOIN (SELECT id,
                   @quest_rank := @quest_rank + 1 AS quest_rank
            FROM users
            ORDER BY score_quest DESC) qr
        ON u.id = qr.id
ORDER BY u.id ASC
I've prepared an SQL-fiddle for you.
Although I think performance will become a concern once you start getting a lot of records.
A bit of explanation: the @*_rank things are SQL user variables. They get incremented by 1 on every new row.
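To actually fill the combined table rather than just view the result, the same query (after the SET statements above) can feed an INSERT ... SELECT; the target table name user_ranks is an assumption based on the column layout suggested above:
INSERT INTO user_ranks (id, name, score_overall, score_trade, score_quest,
                        overall_rank, trade_rank, quest_rank)
SELECT u.id, u.name, u.score_overall, u.score_trade, u.score_quest,
       ovr.overall_rank, tr.trade_rank, qr.quest_rank
FROM users u
INNER JOIN (SELECT id, @overall_rank := @overall_rank + 1 AS overall_rank
            FROM users ORDER BY score_overall DESC) ovr ON u.id = ovr.id
INNER JOIN (SELECT id, @trade_rank := @trade_rank + 1 AS trade_rank
            FROM users ORDER BY score_trade DESC) tr ON u.id = tr.id
INNER JOIN (SELECT id, @quest_rank := @quest_rank + 1 AS quest_rank
            FROM users ORDER BY score_quest DESC) qr ON u.id = qr.id;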
Info: I have this table (PERSONS):
PERSON_ID int(10)
POINTS int(6)
4 OTHER COLUMNS which are of type int(5 or 6)
The table consist of 25M rows and is growing 0.25M a day. The distribution of points is around 0 to 300 points and 85% of the table has 0 points.
Question: I would like to return to the user the rank he/she has, provided they have at least 1 point. How, and where, would be the fastest way to do it: in SQL, in PHP, or a combination?
Extra info: those lookups can happen 100 times per second. The solutions I have seen so far are not fast enough; if more info is needed, please ask.
Any advice is welcome; as you can tell, I am new to PHP and MySQL :)
Create an index on t(points) and on t(person_id, points). Then run the following query:
select count(*)
from persons p
where p.points >= (select p2.points from persons p2 where p2.person_id = <particular person>)
The subquery should use the second index as a lookup. The first should be an index scan on the first index.
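In DDL form, those two indexes might look like this (the index names are arbitrary):
CREATE INDEX idx_points ON persons (points);
CREATE INDEX idx_person_points ON persons (person_id, points);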
Sometimes MySQL can be a little strange about optimization. So, this might actually be better:
select count(*)
from persons p cross join
     (select p2.points from persons p2 where p2.person_id = <particular person>) const
where p.points >= const.points;
This just ensures that the lookup for the points for the given person happens once, rather than for each row.
Partition your table into two partitions - one for people with 0 points and one for people with one or more points.
Add one index on points to your table and another on person_id (if these indexes don't already exist).
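A sketch of the partitioning step, assuming the table is named person as in the queries below (note that MySQL requires the partitioning column to be part of every unique key, so the primary key may need to become (person_id, points) first):
ALTER TABLE person
    PARTITION BY RANGE (points) (
        PARTITION p_zero   VALUES LESS THAN (1),        -- the ~85% with 0 points
        PARTITION p_scored VALUES LESS THAN MAXVALUE    -- everyone with 1+ points
    );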
To find the dense rank of a specific person, run the query:
select count(distinct p2.points)+1
from person p1
join person p2 on p2.points > p1.points
where p1.person_id = ?
To find the non-dense rank of a specific person, run the query:
select count(*)
from person p1
join person p2 on p2.points >= p1.points
where p1.person_id = ?
(I would expect the dense rank query to run significantly faster.)
I have a table with the fields id, votes (for each user), and rating.
Task: compute each user's rating based on the votes for him and for the others; that is, each time I update the votes field, the rating field needs to be recalculated.
Which means someone can be in 3rd place, receive a vote, and move up to 2nd place, while another user goes the other way, from 2nd to 3rd (in the rating field).
How do I solve this problem? Recalculating the users' ratings in PHP and running a lot of UPDATE queries in MySQL every time the field changes is very expensive.
If you want to get the ratings with a select, without having a rating column, then this is the way. However, from a performance perspective I cannot guarantee this will be your best option. The way it works is that if two users have the same number of votes they will have the same rating, and then it skips ahead by the necessary number for the next different rating:
set @rating := 0;
set @count := 1;
set @votes := -1; -- sentinel so the first row counts as a new votes value (assumes votes are never negative)
select id,
       case when @votes <> votes then @rating := @rating + @count
            else @rating end as rating,
       case when @votes = votes then @count := @count + 1
            else @count := 1 end as count,
       @votes := votes as votes
from t1
order by votes desc
sqlfiddle
This gives you an extra column which you can ignore, or you could wrap this select in a subquery and have:
select t2.id, t2.votes, t2.rating from (
    select id,
           case when @votes <> votes then @rating := @rating + @count
                else @rating end as rating,
           case when @votes = votes then @count := @count + 1
                else @count := 1 end as count,
           @votes := votes as votes
    from t1
    order by votes desc) as t2
but the sqlfiddle is strangely giving inconsistent results, so you'd have to do some testing. (MySQL documents that the order of evaluation of expressions involving user variables is undefined, which is the most likely culprit.)
If you want to get the rating for just one user, then using the subquery option with a WHERE clause after the FROM should give you the desired result. sqlfiddle - but again, inconsistent results: run it a few times and sometimes it gives the rating as 10, other times as 30. I think testing in your db to see what happens will be best.
Well, it depends on a lot of factors.
Do you have a large system that is growing exponentially?
Do you require the voting data for historical reporting?
Do users need to register when they vote?
Will this system be used only for one voting type throughout the system's life cycle, or will more voting on different subjects take place?
If all of the answers are NO, then your current update method will work just fine. Just ensure that you apply best coding and MySQL table practices anyway.
Let's assume most or all of your answers were YES. Then I would suggest the following:
Every time a vote takes place, INSERT the record into your table.
With the INSERT, add a timestamp and a user id; if that's not possible, then maybe an IP address/location.
Assign a subject id as a foreign key from the vote_subject table. In this table, store the subject and date of voting.
Now you can create a SELECT statement that counts the votes and calculates the ratings. The person at the top of the vote-count list will get rating 1 in the SELECT. Furthermore, you can filter per subject, per day, or per user, and you should be able to gauge volume depending on the result required.
All of this of course depends on how your system will scale in future. It might be way overkill, but it's something to think about.
Yes, aggregations are expensive. You could update a rank table every five minutes or so and query from there. The query, as you probably already know, is this:
select id, count(*) as votes
from users
group by id
order by votes desc
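A minimal sketch of that idea, with a hypothetical user_ranks snapshot table refreshed by a cron job every few minutes:
-- one-time setup
CREATE TABLE IF NOT EXISTS user_ranks (
    id    INT PRIMARY KEY,
    votes INT NOT NULL
);
-- refresh step, run every five minutes or so
TRUNCATE TABLE user_ranks;
INSERT INTO user_ranks (id, votes)
SELECT id, COUNT(*) AS votes
FROM users
GROUP BY id;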
Instead of having the fields id, votes and rating, alter the table to have the fields id, rating_sum and rating_count. Each time you get a new rating, you query the database like this:
"UPDATE `ratings` SET `rating_count` = `rating_count` + 1, `rating_sum` = `rating_sum` + $user_rating WHERE `id` = $id"
Now the rating is just the average: rating_sum / rating_count. No need to have a field for the rating.
Also, to prevent a user from rating more than once, you could create a table named rating_users that has 2 foreign keys, users.id and ratings.id. The primary key will be (users.id, ratings.id). So each time a user tries to rate, you first check this table.
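A sketch of both pieces, following the names used in this answer (the literal 42 stands in for the $id above):
-- reading the average back
SELECT rating_sum / rating_count AS rating FROM ratings WHERE id = 42;
-- guard table so each user can rate a given item only once
CREATE TABLE rating_users (
    user_id   INT NOT NULL,
    rating_id INT NOT NULL,
    PRIMARY KEY (user_id, rating_id),
    FOREIGN KEY (user_id)   REFERENCES users (id),
    FOREIGN KEY (rating_id) REFERENCES ratings (id)
);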
I would recommend doing this when querying the data. It would be much simpler. Order by votes descending.
Perhaps create a view and use the view when querying the data.
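A minimal sketch of the view idea; note that MySQL views cannot reference user variables, so the view can only pre-aggregate, with any rank numbering applied when selecting from it:
CREATE VIEW user_votes AS
SELECT id, COUNT(*) AS votes
FROM users
GROUP BY id;
-- query it ordered; rank is implied by row position
SELECT id, votes FROM user_votes ORDER BY votes DESC;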
You could try something like this:
SET @rank := 0;
select id, count(*) as votes, @rank := @rank + 1
from users
group by id
order by votes desc
Or
SET @rank := 0;
select id, votes, @rank := @rank + 1
from users
order by votes desc