i'm having a good time coding a little visitor counter. it's a PHP5/SQLite3 mix.
made two database tables, one for the visitors, and one for the hits. structure and sample data:
CREATE TABLE 'visitors' (
'id' INTEGER DEFAULT NULL PRIMARY KEY AUTOINCREMENT,
'ip' TEXT DEFAULT NULL,
'hash' TEXT DEFAULT NULL,
UNIQUE(ip)
);
INSERT INTO "visitors" ("id","ip","hash") VALUES ('1','1.2.3.4','f9702c362aa9f1b05002804e3a65280b');
INSERT INTO "visitors" ("id","ip","hash") VALUES ('2','1.2.3.5','43dc8b0a4773e45deab131957684867b');
INSERT INTO "visitors" ("id","ip","hash") VALUES ('3','1.2.3.6','9ae1c21fc74b2a3c1007edf679c3f144');
CREATE TABLE 'hits' (
'id' INTEGER DEFAULT NULL PRIMARY KEY AUTOINCREMENT,
'time' INTEGER DEFAULT NULL,
'visitor_id' INTEGER DEFAULT NULL,
'host' TEXT DEFAULT NULL,
'location' TEXT DEFAULT NULL
);
INSERT INTO "hits" ("id","time","visitor_id","host","location") VALUES ('1','1418219548','1','localhost','/some/path/example.php');
INSERT INTO "hits" ("id","time","visitor_id","host","location") VALUES ('2','1418219550','1','localhost','/some/path/example.php');
INSERT INTO "hits" ("id","time","visitor_id","host","location") VALUES ('3','1418219553','1','localhost','/some/path/example.php');
INSERT INTO "hits" ("id","time","visitor_id","host","location") VALUES ('4','1418219555','2','localhost','/some/path/example.php');
INSERT INTO "hits" ("id","time","visitor_id","host","location") VALUES ('5','1418219557','1','localhost','/some/path/example.php');
INSERT INTO "hits" ("id","time","visitor_id","host","location") VALUES ('6','1418219558','3','localhost','/some/path/example.php');
i now want to fetch the visitors' data, but only for those who were active in the last 30 seconds, for example. i need the following data as output, here with user id 1 as an example:
$visitor = Array(
[id] => 1
[ip] => 1.2.3.4
[hash] => f9702c362aa9f1b05002804e3a65280b
[first_hit] => 1418219548
[last_hit] => 1418219557
[last_host] => localhost
[last_location] => /some/path/example.php
[total_hits] => 4
[idle_since] => 11
)
i'll get this with my current query, all good, but as you can see i need a lot of sub-selects for this:
SELECT
visitors.id,
visitors.ip,
visitors.hash,
(SELECT hits.time FROM hits WHERE hits.visitor_id = visitors.id ORDER BY hits.id ASC LIMIT 1) AS first_hit,
(SELECT hits.time FROM hits WHERE hits.visitor_id = visitors.id ORDER BY hits.id DESC LIMIT 1) AS last_hit,
(SELECT hits.host FROM hits WHERE hits.visitor_id = visitors.id ORDER BY hits.id DESC LIMIT 1) AS last_host,
(SELECT hits.location FROM hits WHERE hits.visitor_id = visitors.id ORDER BY hits.id DESC LIMIT 1) AS last_location,
(SELECT COUNT(hits.id) FROM hits WHERE hits.visitor_id = visitors.id) AS total_hits,
(SELECT strftime('%s','now') - hits.time FROM hits WHERE hits.visitor_id = visitors.id ORDER BY hits.id DESC LIMIT 1) AS idle_since
FROM visitors
WHERE idle_since < 30
ORDER BY last_hit DESC
so, is this ok for my use case or do you know a better approach to get this data out of those two tables? i already played around with JOINS, but no matter how i tweaked it, COUNT() gave me wrong outputs, like user id 1 has only one total hit for example.
i probably have to re-model the database, if i wanna use JOINS properly, i guess.
Update: based on AeroX' Answer i've built the new query. it basically had just one little bug. you can't have MAX() in a WHERE clause. using HAVING now after the GROUPING.
i also tested both the old and the new one with EXPLAIN and EXPLAIN QUERY PLAN. looks much better. Thank you guys!
SELECT
V.id,
V.ip,
V.hash,
MIN(H.time) AS first_hit,
MAX(H.time) AS last_hit,
strftime('%s','now') - MAX(H.time) AS idle_since,
COUNT(H.id) AS total_hits,
LH.host AS last_host,
LH.location AS last_location
FROM visitors AS V
INNER JOIN hits AS H ON (V.id = H.visitor_id)
INNER JOIN (
SELECT visitor_id, MAX(id) AS id
FROM hits
GROUP BY visitor_id
) AS L ON (V.id = L.visitor_id)
INNER JOIN hits AS LH ON (L.id = LH.id)
GROUP BY V.id, V.ip, V.hash, LH.host, LH.location
HAVING idle_since < 30
ORDER BY last_hit DESC
You probably want to clean this up, but it should give you the idea of how to make the joins and how to use GROUP BY to aggregate your hits table for each visitor. This should be more efficient than using lots of sub-queries.
I've included comments on the joins so that you can see why I'm making them.
SELECT
V.id,
V.ip,
V.hash,
MIN(H.time) AS first_hit,
MAX(H.time) AS last_hit,
COUNT(H.id) AS total_hits,
strftime('%s','now') - MAX(H.time) AS idle_since,
LH.host AS last_host,
LH.location AS last_location
FROM visitors AS V
-- Join hits table so we can calculate aggregates (MIN/MAX/COUNT)
INNER JOIN hits AS H ON (V.id = H.visitor_id)
-- Join a sub-query as a table which contains the most recent hit.id for each visitor.id
INNER JOIN (
SELECT visitor_id, MAX(id) AS id
FROM hits
GROUP BY visitor_id
) AS L ON (V.id = L.visitor_id)
-- Use the most recent hit.id for each visitor.id to fetch that most recent row (for last_host/last_location)
INNER JOIN hits AS LH ON (L.id = LH.id)
GROUP BY V.id, V.ip, V.hash, LH.host, LH.location
HAVING idle_since < 30
ORDER BY last_hit DESC
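To check the grouped query against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. The helper names (build_db, active_visitors) are made up for illustration, and the current time is passed in as a parameter instead of strftime('%s','now') so that the result is reproducible. The HAVING clause repeats the aggregate expression rather than the alias, which is an assumption made for portability, not something the original query requires.

```python
import sqlite3

SCHEMA = """
CREATE TABLE visitors (
    id   INTEGER PRIMARY KEY AUTOINCREMENT,
    ip   TEXT UNIQUE,
    hash TEXT
);
CREATE TABLE hits (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    time       INTEGER,
    visitor_id INTEGER,
    host       TEXT,
    location   TEXT
);
"""

VISITORS = [
    (1, "1.2.3.4", "f9702c362aa9f1b05002804e3a65280b"),
    (2, "1.2.3.5", "43dc8b0a4773e45deab131957684867b"),
    (3, "1.2.3.6", "9ae1c21fc74b2a3c1007edf679c3f144"),
]

# (id, time, visitor_id) from the question; host and location are constant there.
HITS = [
    (1, 1418219548, 1),
    (2, 1418219550, 1),
    (3, 1418219553, 1),
    (4, 1418219555, 2),
    (5, 1418219557, 1),
    (6, 1418219558, 3),
]

QUERY = """
SELECT V.id, V.ip, V.hash,
       MIN(H.time)        AS first_hit,
       MAX(H.time)        AS last_hit,
       COUNT(H.id)        AS total_hits,
       :now - MAX(H.time) AS idle_since,
       LH.host            AS last_host,
       LH.location        AS last_location
FROM visitors AS V
INNER JOIN hits AS H ON V.id = H.visitor_id
INNER JOIN (SELECT visitor_id, MAX(id) AS id
            FROM hits
            GROUP BY visitor_id) AS L ON V.id = L.visitor_id
INNER JOIN hits AS LH ON L.id = LH.id
GROUP BY V.id, V.ip, V.hash, LH.host, LH.location
HAVING :now - MAX(H.time) < :max_idle
ORDER BY last_hit DESC
"""


def build_db():
    """Create an in-memory database holding the question's sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO visitors (id, ip, hash) VALUES (?, ?, ?)", VISITORS)
    conn.executemany(
        "INSERT INTO hits (id, time, visitor_id, host, location) "
        "VALUES (?, ?, ?, 'localhost', '/some/path/example.php')",
        HITS,
    )
    return conn


def active_visitors(conn, now, max_idle=30):
    """Return one dict per visitor whose last hit is less than max_idle seconds old."""
    rows = conn.execute(QUERY, {"now": now, "max_idle": max_idle})
    return [dict(row) for row in rows]
```

Calling active_visitors(build_db(), now=1418219568) should return the three sample visitors, with visitor 1 showing 4 total hits and an idle time of 11 seconds, matching the array in the question.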
One of the best ways to measure query performance is to use EXPLAIN.
From the SQLite documentation:
The EXPLAIN QUERY PLAN SQL command is used to obtain a high-level
description of the strategy or plan that SQLite uses to implement a
specific SQL query. Most significantly, EXPLAIN QUERY PLAN reports on
the way in which the query uses database indices. This document is a
guide to understanding and interpreting the EXPLAIN QUERY PLAN output.
Background information is available separately:
Notes on the query optimizer.
How indexing works.
The next generation query planner.
An EXPLAIN QUERY PLAN command returns zero or more rows of four
columns each. The column names are "selectid", "order", "from",
"detail". The first three columns contain an integer value. The final
column, "detail", contains a text value which carries most of the
useful information.
EXPLAIN QUERY PLAN is most useful on a SELECT statement, but may also
appear with other statements that read data from database tables
(e.g. UPDATE, DELETE, INSERT INTO ... SELECT).
An example of an explain query is:
EXPLAIN SELECT * FROM COMPANY WHERE Salary >= 20000;
http://www.tutorialspoint.com/sqlite/sqlite_explain.htm
Below are more complex usage examples.
How can I analyse a Sqlite query execution?
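As a small illustration of reading the plan, the sketch below runs EXPLAIN QUERY PLAN before and after adding an index. It uses Python's sqlite3 module; the table is a cut-down version of hits and the index name idx_hits_visitor is made up for the example. The exact wording of the plan text and the names of the first three columns differ between SQLite versions, so the helper only looks at the last column, which holds the descriptive text.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the descriptive text (last column) of each EXPLAIN QUERY PLAN row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hits (id INTEGER PRIMARY KEY, time INTEGER, visitor_id INTEGER)"
)

sql = "SELECT time FROM hits WHERE visitor_id = ?"

# Without an index on visitor_id the planner has to scan the whole table.
plan_before = query_plan(conn, sql, (1,))

# Hypothetical index; with it the planner can search instead of scan.
conn.execute("CREATE INDEX idx_hits_visitor ON hits (visitor_id)")
plan_after = query_plan(conn, sql, (1,))

for line in plan_before + plan_after:
    print(line)
```

The first plan should mention a SCAN of hits, the second a SEARCH using idx_hits_visitor. That difference is usually the first thing to look for when a query feels slow.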
Data takes too long to load when I search by a specific date. My project has two tables. One table has unique entries; there I added a unique index on "request_id" and a primary index on the auto-incremented "id". The other table has multiple records per request_id, and there I only added a primary index on the auto-incremented "id". Now I want to search all these records with a join across both tables to get the count for every "request_id".
I am using the query below:
SELECT
m.id,m.request_id as id,count(m.request_id) as count,m.reqtype,m.request_time,w.status as status,w.updated_time as updated_time,w.reg_date as reg_date
FROM
multi_requests m JOIN unique w ON m.request_id = w.request_id
WHERE
m.request_time
Between
'2015-07-05'
AND
'2015-07-06'
GROUP BY
m.request_id
ORDER BY
m.id asc
LIMIT
0,10;
I also tried adding an index on "request_id" in the multi_requests table. But when I add that index and search with the above query, no records show up in the UI.
The multi_requests table has 6033030 records in total.
Please suggest what I can do.
This is your query:
SELECT m.id, m.request_id as id, count(m.mac_address) as count, m.reqtype,
m.request_time, w.status as status, w.updated_time as updated_time,
w.reg_date as reg_date
FROM multi_requests m JOIN
unique w
ON m.mac_address = w.mac_address
WHERE m.request_time Between '2015-07-05' AND '2015-07-06'
GROUP BY m.request_id
ORDER BY m.id asc
LIMIT 0, 10 ;
It is a bit strange, because you have a ton of columns in the select, but only one in the group by. Let me assume that you know what you are doing.
For this query, the best indexes are on multi_requests(request_time, mac_address, request_id) and unique(mac_address).
(Apologies if this a duplicate - I have tried searching, but I may not know the right word for what I'm trying to achieve - feel free to correct me!)
The Background
So I have a PHP based app (Codeigniter, but I'm using normal SQL language for this part), that has a MySQL database, with 2 tables - 'contact' and 'order'.
For simplicity, let's assume that:
'contact' has 3 cols : Id, FirstName, LastName
'order' has 4 cols : Id, ContactId, ItemBought, ItemValidDate
Example of a row in 'order' table: 22, 11, Adult Membership, 2012/13
Id is primary key of both tables, ContactId is foreign key for 'contact' and ItemBought and ItemValidDate are both simple varchar (we're storing 'seasons' rather than dates -I know, its not ideal but its what the client wants)
At some point, I know, I am going to have to extend this for 3 tables and use an OrderItem table, to allow an order to have multiple items, so I'd like to find a solution that can be built on. But at present, I don't even understand the basics so I've kept it to 2 tables
The Problem
I want to create a search page that allows the user to find subsets of records based on lots of different criteria.
See screenshot of search page
This form submits as an array of criteria like this:
[order_type_operator] => Array
(
[0] => equal
[1] => equalor
[2] => notequal
)
[order_type] => Array
(
[0] => Adult Membership
[1] => Adult Membership
[2] => Adult Membership
)
[order_expire] => Array
(
[0] => 2005/06
[1] => 2006/07
[2] => 2010/11
)
[submit] => Start Search
I then cycle through this array, testing to see if values have been submitted, and build up my SQL query.
So, I hope I've explained it properly, so that its clear a user may use this form to search for records that match lots of different conditions - in theory, unlimited numbers of conditions - to end up with a list of contacts that match this criteria.
What I have Tried
Example 1 - simple WHERE
"find contact records that have an order record for 'Adult Membership' in '2009/10'"
i.e. SELECT * FROM contact
JOIN order ON contact.Id = order.ContactId WHERE (order.ItemBought = 'Adult Membership' AND order.ItemValidDate = '2009/10')
This works fine.
Example 2 - WHERE OR WHERE
"find contact records that have an order record for 'Adult Membership' in '2009/10' OR have an order record for 'Adult Membership' in '2010/11'"
i.e. SELECT * FROM contact
JOIN order ON contact.Id = order.ContactId WHERE (order.ItemBought = 'Adult Membership' AND order.ItemValidDate = '2009/10') OR (order.ItemBought = 'Adult Membership' AND order.ItemValidDate = '2010/11')
This works fine as long as EVERY condition the user is asking for is an OR query. I assume that I can build this query up using brackets and OR for as big as I like? E.g. find Adult membership in 2005/06, OR 2006/07, OR 2007/08, OR 2008/09 etc etc will be just like the above SQL with lots more brackets joined by 'OR'?
Example 3 - WHERE AND WHERE - I'm stuck!
"find contact records that have an order record for 'Adult Membership' in '2009/10' OR '2010/11', AND have an order record for 'Adult Membership' in '2012/13'"
At the moment, I've been trying UNION, however if there are more queries to follow this (e.g Adult membership in 2008 OR 2009 AND 2010) this means doing more than one SELECT. (Perhaps this is the answer?)
e.g. SELECT * FROM contact
JOIN order ON contact.Id = order.ContactId WHERE (order.ItemBought = 'Adult Membership' AND order.ItemValidDate = '2009/10') OR (order.ItemBought = 'Adult Membership' AND order.ItemValidDate = '2010/11')
UNION
SELECT * FROM contact
JOIN order ON contact.Id = order.ContactId WHERE (order.ItemBought = 'Adult Membership' AND order.ItemValidDate = '2012/13')
Example 4 - But does NOT have a record.... Blows my mind
"find contact records that have an order record for 'Adult Membership' in '2009/10' AND have an order record for 'Adult Membership' in '2010/10' BUT DO NOT have an order of 'Sponsorship' in '2007/08'"
I wondered about running these queries, storing the results in a PHP array and then doing a IN (*array of ids already selected*), but this just seems like I'm not using SQL properly.
So clever people - what am I doing wrong?
Thank you so much in advance for your help.
PS. Not asking you write the code for me!
PPS. If you know of any good tutorials then I'll happily follow them!
PPPS. If this is a duplicate, then please accept my apologies!
As ZorleQ says it can rapidly get to be a mess
For your 3rd question a possible solution using joins of subselects would be as follows
SELECT contact.*, order.*
FROM contact
INNER JOIN order
ON contact.Id = order.ContactId
INNER JOIN (SELECT DISTINCT ContactId
FROM order
WHERE (ItemBought = 'Adult Membership' AND ItemValidDate = '2009/10')
OR (ItemBought = 'Adult Membership' AND ItemValidDate = '2010/11')) Sub1
ON contact.Id = Sub1.ContactId
INNER JOIN (SELECT DISTINCT ContactId
FROM order
WHERE (ItemBought = 'Adult Membership' AND ItemValidDate = '2012/13')) Sub2
ON contact.Id = Sub2.ContactId
You could probably do this without using the subselects and just a plain join as follows
SELECT contact.*, order.*
FROM contact
INNER JOIN order
ON contact.Id = order.ContactId
LEFT OUTER JOIN order Sub1
ON contact.Id = Sub1.ContactId AND Sub1.ItemBought = 'Adult Membership' AND Sub1.ItemValidDate = '2009/10'
LEFT OUTER JOIN order Sub2
ON contact.Id = Sub2.ContactId AND Sub2.ItemBought = 'Adult Membership' AND Sub2.ItemValidDate = '2010/11'
INNER JOIN order Sub3
ON contact.Id = Sub3.ContactId AND Sub3.ItemBought = 'Adult Membership' AND Sub3.ItemValidDate = '2012/13'
WHERE Sub1.ContactId IS NOT NULL OR Sub2.ContactId IS NOT NULL
Your 4th question can be done using a LEFT OUTER JOIN to find a record with Sponsorship bought for 2007/08, and only returning rows where a match isn't found (ie, check the ContactId on the LEFT OUTER JOINed table is NULL).
SELECT contact.*, order.*
FROM contact
INNER JOIN order
ON contact.Id = order.ContactId
INNER JOIN order Sub1
ON contact.Id = Sub1.ContactId AND Sub1.ItemBought = 'Adult Membership' AND Sub1.ItemValidDate = '2009/10'
INNER JOIN order Sub2
ON contact.Id = Sub2.ContactId AND Sub2.ItemBought = 'Adult Membership' AND Sub2.ItemValidDate = '2010/10'
LEFT OUTER JOIN order Sub3
ON contact.Id = Sub3.ContactId AND Sub3.ItemBought = 'Sponsorship' AND Sub3.ItemValidDate = '2007/08'
WHERE Sub3.ContactId IS NULL
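To see the LEFT OUTER JOIN ... IS NULL pattern in action, here is a minimal sketch with made-up sample rows, run through Python's sqlite3 module. The three contacts and their orders are invented for the example. The table name is quoted as "order" because ORDER is a reserved word, and DISTINCT is added so that each contact is listed once; both are choices made for this sketch.

```python
import sqlite3

ANTI_JOIN_SQL = """
SELECT DISTINCT c.Id
FROM contact AS c
INNER JOIN "order" AS a
        ON a.ContactId = c.Id
       AND a.ItemBought = 'Adult Membership' AND a.ItemValidDate = '2009/10'
INNER JOIN "order" AS b
        ON b.ContactId = c.Id
       AND b.ItemBought = 'Adult Membership' AND b.ItemValidDate = '2010/10'
LEFT JOIN "order" AS s
        ON s.ContactId = c.Id
       AND s.ItemBought = 'Sponsorship' AND s.ItemValidDate = '2007/08'
WHERE s.ContactId IS NULL   -- keep only contacts with no matching Sponsorship row
ORDER BY c.Id
"""

# Invented sample rows: (Id, ContactId, ItemBought, ItemValidDate)
ORDERS = [
    (1, 1, "Adult Membership", "2009/10"),
    (2, 1, "Adult Membership", "2010/10"),
    (3, 1, "Sponsorship", "2007/08"),        # contact 1 is excluded by this row
    (4, 2, "Adult Membership", "2009/10"),
    (5, 2, "Adult Membership", "2010/10"),   # contact 2 has both and no sponsorship
    (6, 3, "Adult Membership", "2009/10"),   # contact 3 lacks the 2010/10 order
]


def build_db():
    """Create an in-memory database with three contacts and the orders above."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE contact (Id INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT)")
    conn.execute(
        'CREATE TABLE "order" (Id INTEGER PRIMARY KEY, ContactId INTEGER, '
        "ItemBought TEXT, ItemValidDate TEXT)"
    )
    conn.executemany(
        "INSERT INTO contact VALUES (?, ?, ?)",
        [(1, "Ann", "Lee"), (2, "Bob", "Ray"), (3, "Cy", "Kim")],
    )
    conn.executemany('INSERT INTO "order" VALUES (?, ?, ?, ?)', ORDERS)
    return conn


def matching_contacts(conn):
    """Return the Ids of contacts that satisfy example 4."""
    return [row[0] for row in conn.execute(ANTI_JOIN_SQL)]
```

With these rows only contact 2 should come back: contact 1 is dropped because the Sponsorship row matches, and contact 3 never passes the second INNER JOIN.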
I think you have to take a step back and try to visualize your question on paper first. Examples 1 and 2 are pretty easy, but let's look at example 3.
When all your criteria are 'AND' or all are 'OR', things are very simple. Just build one long WHERE, just like before.
However, when you start mixing them you have to answer yourself a serious question:
How do you split the conditions?
Let's say someone picked these criteria:
and A
or B
and C
This gives you so many permutations of your query! eg:
(A or B) and C
A or (B and C)
(A and C) or B
If you add one more 'OR' to it, you will end up with tens more combinations, leaving you in a place where you have to guess what to do. I don't even want to think about what would happen if there is a NOT involved...
This is not a direct answer to your question, but more of a pointer towards a possible solution. The last time we had to do something similar, we've ended up grouping the conditions together into blocks.
You could either add a condition within a block or add a new search block. Think of the blocks as the brackets in the example above. Everything inside a block is an 'AND' or 'AND NOT', and between blocks you can specify 'and' or 'or'. This way you know straight away how to structure your query. This worked like a charm in a standalone application. It might be a bit tricky to implement nicely on a web page, but you get the idea.
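The block idea can be sketched as a small query builder. This is only an illustration under a few assumptions: it targets the contact/order tables from the question, each condition is expressed as an EXISTS test (so that 'AND' can span different order rows), and the function name build_query and the tuple layout are invented here. Values go in as bound parameters rather than being pasted into the SQL string.

```python
import sqlite3

# One "has an order for this item in this season" test, correlated to contact c.
EXISTS_SQL = (
    'EXISTS (SELECT 1 FROM "order" AS o WHERE o.ContactId = c.Id '
    "AND o.ItemBought = ? AND o.ItemValidDate = ?)"
)


def build_query(blocks):
    """Build (sql, params) from search blocks.

    blocks is a list of (connector, conditions) pairs. connector is 'AND' or
    'OR' and says how the block attaches to the blocks before it (it is
    ignored for the first block). conditions is a list of
    (item, season, negate) tuples that are ANDed together inside the block.
    """
    if not blocks:
        raise ValueError("at least one block is required")
    parts, params = [], []
    for index, (connector, conditions) in enumerate(blocks):
        if connector not in ("AND", "OR"):
            raise ValueError("connector must be 'AND' or 'OR'")
        if not conditions:
            raise ValueError("a block needs at least one condition")
        terms = []
        for item, season, negate in conditions:
            terms.append(("NOT " if negate else "") + EXISTS_SQL)
            params.extend([item, season])
        # Each block becomes one bracketed group.
        block_sql = "(" + " AND ".join(terms) + ")"
        parts.append(block_sql if index == 0 else connector + " " + block_sql)
    sql = "SELECT c.Id FROM contact AS c WHERE " + " ".join(parts) + " ORDER BY c.Id"
    return sql, params
```

Example 3 from the question, (2009/10 OR 2010/11) AND 2012/13, would be entered as two blocks joined by 'OR': one block with 2009/10 and 2012/13, the other with 2010/11 and 2012/13. Note that connectors between blocks follow normal SQL precedence (AND binds tighter than OR), so sticking to 'OR' between blocks keeps the meaning unambiguous.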
My solution to all issues like this where multiple criteria may or may not be provided by the user is the following... (this example is for oracle, but should be able to be done in MySQL as well)...
You pass in all the filter variables, regardless of whether they are null or filled with a value. In this example, I'll say I have 3 values the user may or may not fill that act as filters on the SELECT.
SELECT
*
FROM
table
WHERE
NVL2(InputVariable1, InputVariable1, Column1) = Column1
AND NVL2(InputVariable2, InputVariable2, Column2) = Column2
AND NVL2(InputVariable3, InputVariable3, Column3) = Column3
NVL2 - This is an oracle function. If the first value is not null, it returns the second value, otherwise it returns the third value. If you aren't using oracle, and there is no equivalent function for NVL2, simply write the function yourself.
So, using the above example, the code ALWAYS passes all three InputVariables into the select statement, even if they are NULL. By using NVL2 or an equivalent function, the comparison is between the InputVariable and the Column ONLY if the InputVariable is not null; otherwise it is between the Column and the Column, which will of course always be true, thereby effectively ignoring that filter variable, which is what you want (i.e. a null filter value matches all rows - i.e. if user does not specify LastName, then include all LastNames).
This solution allows you to use many filter variables without having to do a lot of processing up front - just pass them all down into the SELECT every time, whether they are null or not.
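For databases without NVL2, COALESCE(input, column) = column behaves the same way for this purpose. The sketch below tries it with Python's sqlite3 module on a made-up contact table; the table contents and the search helper are assumptions for illustration. The filters are combined with AND so that every supplied value narrows the result; with OR, a single empty filter would match every row.

```python
import sqlite3

FILTER_SQL = """
SELECT Id
FROM contact
WHERE COALESCE(:first_name, FirstName) = FirstName
  AND COALESCE(:last_name,  LastName)  = LastName
ORDER BY Id
"""


def build_db():
    """Create an in-memory contact table with three invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contact (Id INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT)"
    )
    conn.executemany(
        "INSERT INTO contact VALUES (?, ?, ?)",
        [(1, "Ann", "Lee"), (2, "Bob", "Lee"), (3, "Ann", "Kim")],
    )
    return conn


def search(conn, first_name=None, last_name=None):
    """Always bind both filters; a filter left as None is ignored by the query."""
    params = {"first_name": first_name, "last_name": last_name}
    return [row[0] for row in conn.execute(FILTER_SQL, params)]
```

One caveat with this trick: a row whose column is itself NULL never matches, because NULL = NULL is not true in SQL, so it fits best on columns declared NOT NULL.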
If you have sets of filter variables (i.e. the user enables a set of input values via a checkbox or some similar mechanism), you can do the above inside of a CASE statement. Each case should check the enable value for a given set, and return the result of evaluating the entire set of filter variables (exactly like the above). You then compare the result of the entire CASE structure to 1, as in...
WHERE
CASE [ expression ]
WHEN enableSet1
THEN NVL2(InputVariable1, InputVariable1, Column1) = Column1
AND NVL2(InputVariable2, InputVariable2, Column2) = Column2
AND NVL2(InputVariable3, InputVariable3, Column3) = Column3
WHEN condition_2 THEN result_2
...
WHEN condition_n THEN result_n
END = 1
This works because the value of a CASE structure is the result of the THEN block which was evaluated.
This will allow you to do ALL or MOST of your desired filtering within the confines of a single SELECT statement - again, without having to do a lot of pre-processing to build the SELECT.
I am writing a web app in PHP using mySQL that models an election.
I have three tables: Candidates, Elections, and Votes. Votes contains CandidateID, ElectionID and Count, which is the number of times that the given candidate was voted for in the given Election. Votes also contains TimeStamp which is the last time the row was modified which is used for breaking ties (the earlier vote wins). A candidate may have run in multiple elections. How do I find how many elections a given candidate has ever won?
All help greatly appreciated, thanks.
Some sample data:
CREATE TABLE IF NOT EXISTS `Votes` (
`ElectionID` int(11) unsigned NOT NULL,
`CandidateID` int(11) unsigned NOT NULL,
`Count` smallint(5) unsigned NOT NULL DEFAULT '0',
`stamp` int(11) unsigned NOT NULL,
PRIMARY KEY (`ElectionID`,`CandidateID`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
INSERT INTO `Votes` (`ElectionID`, `CandidateID`, `Count`, `stamp`)
VALUES
(1, 1, 3, 1332897534),
(4, 1, 3, 1333149930),
(4, 4, 2, 1333149947),
(4, 5, 3, 1333149947),
(1, 4, 4, 1333153373);
Desired output: One row, with one column, being the number of wins for a certain candidate
You can write:
SELECT COUNT(1)
FROM Elections AS e
INNER
JOIN Votes AS v1 -- representing the candidate of interest
ON v1.ElectionID = e.ID
AND v1.CandidateID = ...
LEFT
OUTER
JOIN Votes AS v2 -- representing a candidate who beat the candidate of interest
ON v2.ElectionID = e.ID
AND ( v2.Count > v1.Count
OR ( v2.Count = v1.Count
AND v2.stamp < v1.stamp
)
)
WHERE v2.ElectionID IS NULL -- meaning that no candidate beat the candidate of interest
;
(It's also possible to represent either or both of those joins with EXISTS and a correlated subquery; or the first join could be changed to IN with an uncorrelated subquery; but the above is the most likely to perform best, IMHO, and my experience on StackOverflow has been that people seem to like joins better than subqueries for some reason. If you'd prefer a subquery answer, let me know.)
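Here is a quick way to try the join above against the sample data from the question, using Python's sqlite3 module instead of MySQL. The Elections table is not shown in the question, so its single ID column and its two rows are assumptions, as is the count_wins helper. The "..." placeholder is replaced by a bound parameter.

```python
import sqlite3

WINS_SQL = """
SELECT COUNT(*)
FROM Elections AS e
INNER JOIN Votes AS v1                      -- the candidate of interest
        ON v1.ElectionID = e.ID
       AND v1.CandidateID = :candidate
LEFT JOIN Votes AS v2                       -- anyone who beat that candidate
       ON v2.ElectionID = e.ID
      AND (v2.Count > v1.Count
           OR (v2.Count = v1.Count AND v2.stamp < v1.stamp))
WHERE v2.ElectionID IS NULL                 -- nobody beat the candidate
"""

# (ElectionID, CandidateID, Count, stamp) exactly as in the question.
VOTES = [
    (1, 1, 3, 1332897534),
    (4, 1, 3, 1333149930),
    (4, 4, 2, 1333149947),
    (4, 5, 3, 1333149947),
    (1, 4, 4, 1333153373),
]


def build_db():
    """Create an in-memory database with the sample votes."""
    conn = sqlite3.connect(":memory:")
    # Assumed minimal Elections table; only the ID is needed by the query.
    conn.execute("CREATE TABLE Elections (ID INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE Votes ("
        " ElectionID INTEGER NOT NULL,"
        " CandidateID INTEGER NOT NULL,"
        " Count INTEGER NOT NULL DEFAULT 0,"
        " stamp INTEGER NOT NULL,"
        " PRIMARY KEY (ElectionID, CandidateID))"
    )
    conn.executemany("INSERT INTO Elections (ID) VALUES (?)", [(1,), (4,)])
    conn.executemany("INSERT INTO Votes VALUES (?, ?, ?, ?)", VOTES)
    return conn


def count_wins(conn, candidate_id):
    """Return how many elections the given candidate won."""
    return conn.execute(WINS_SQL, {"candidate": candidate_id}).fetchone()[0]
```

With the sample rows, candidate 4 wins election 1 outright (4 votes against 3), while election 4 is a 3 to 3 tie between candidates 1 and 5 that goes to candidate 1 because of the earlier stamp. So count_wins should give 1 for candidate 1, 1 for candidate 4 and 0 for candidate 5.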
SELECT CandidateID, MAX(Count) FROM Votes GROUP BY ElectionID
should do the trick
Your query basically needs to return every election and WHO won it. Then apply that result to the specific candidate you are interested in, to find out how many of all the elections that person won. Ex: in the U.S. Republican race, you have 4 candidates... only 2 are really considered serious by most, regardless of party affiliation. Each candidate runs a campaign in each state and they all have their respective votes tallied. So, at the end of ex: 21 states, you will only have 21 winners, but the question is who won how many. Candidate "A" may win 10, "B" wins 6, "C" wins 3 and "D" wins 2. So if you wanted to know how many Candidate "B" won, your desired answer is 6... from my impression of your question.
This will give you all qualifying "First Place" elections for a given candidate. If all you care about is the HOW MANY, you can just change the Prequery.fields to COUNT(*). If you want to get the candidate's name and the name/info of the election, you can add that as join conditions AFTER the PreQuery has been executed.
select
PreQuery.idVotes,
PreQuery.CandidateID,
PreQuery.ElectionID,
PreQuery.Votes,
PreQuery.LastEntry
from
( select
v.*,
@WinRow := if( @LastElection = v.ElectionID, @WinRow +1, 1 ) as FinalPlace,
@LastElection := v.ElectionID as ignoreMe
from
Votes v,
( select @WinRow := 0, @LastElection := 0 ) sqlvars
order by
v.ElectionID,
v.Votes DESC,
v.LastEntry ASC ) PreQuery
where
PreQuery.FinalPlace = 1
AND PreQuery.CandidateID = CandidateIDYouAreInterestedIn
Basically what you want to do is group the Votes rows by ElectionID, and order by Count descending, stamp ascending. That will give you a result set of ordered Votes rows, with the "winners" of each election as the first row within each group.
Next, you want to select these first rows within each group and discard the rest (see here for how to do a Top-N query: http://www.sqlines.com/mysql/how-to/get_top_n_each_group).
Finally, you want to select count(*) from this result set where CandidateID = whatever candidate you're looking for. Alternatively, you can group by CandidateID and leave out the where clause if you want the number of wins for all candidates instead of a specific one.
Hope this helps.
Apparently I'm late to the party, but this is the first thing I thought of:
First have one subquery to select the winning count for each unique electionid in Votes, call this table wins.
Then, join wins with Votes where electionid and count are equal. Because there may be a tie, we also need to choose the Votes row with the lowest stamp, so we'll group by electionid and count but this time choose the minimum stamp. We'll call this resulting table wins_with_stamp/wws for short.
Now, wins_with_stamp has all of the rows from Votes that are "winning" rows, so selecting how many a particular candidate won is just a matter of a where candidateid = ? clause.
-- Returns how many Votes rows that is the winner of its election
-- and candidateid is the candidate in question
select count(*)
from Votes v2
right join (
-- Gets the earliest stamp for the votes with the winning count for each election
select v.electionid, v.count, min(v.stamp) as minstamp
from Votes v
right join (
-- Gets the winning count for each election
select electionid, max(count) as max
from Votes
group by electionid
) wins on wins.max = v.count and wins.electionid = v.electionid
group by electionid, count
) wws on wws.count = v2.count and wws.electionid = v2.electionid and wws.minstamp = v2.stamp
where candidateid = [YOUR_CANDIDATEID]
so I'm trying to create a ranking system for my website, however as a lot of the records have same number of points, they all have same rank, is there a way to avoid this?
currently have
$conn = $db->query("SELECT COUNT( * ) +1 AS 'position' FROM tv WHERE points > ( SELECT points FROM tv WHERE id ={$data['id']} )");
$d = $db->fetch_array($conn);
echo $d['position'];
And DB structure
`id` int(11) NOT NULL,
`name` varchar(150) NOT NULL,
`points` int(11) NOT NULL,
Edited below,
What I'm doing right now is getting records by lets say
SELECT * FROM tv WHERE type = 1
Now I run a while loop, and I need to make myself a function that will get the rank, but it would make sure that the ranks aren't duplicate
How would I go about making a ranking system that doesn't have same ranking for two records? lets say if the points count is the same, it would order them by ID and get their position? or something like that? Thank you!
If you are using MS SQL Server 2008R2, you can use the RANK function.
http://msdn.microsoft.com/en-us/library/ms176102.aspx
If you are using MySQL, you can look at one of the below options:
http://thinkdiff.net/mysql/how-to-get-rank-using-mysql-query/
http://www.fromdual.ch/ranking-mysql-results
select @rnk:=@rnk+1 as rnk,id,name,points
from table,(select @rnk:=0) as r order by points desc,id
You want to use ORDER BY. Applying on multiple columns is as simple as comma delimiting them: ORDER BY points, id DESC will sort by points and if the points are the same, it will sort by id.
Here's your SELECT query:
SELECT * FROM tv WHERE points > ( SELECT points FROM tv WHERE id ={$data['id']} ) ORDER BY points, id DESC
Documentation to support this: http://dev.mysql.com/doc/refman/5.0/en/sorting-rows.html
Many Database vendors have added special functions to their products to do this, but you can also do it with straight SQL:
Select *, 1 +
(Select Count(*) From myTable
Where ColName < t.ColName) Rank
From MyTable t
or to avoid giving records with the same value of colName the same rank, (This requires a key)
Select *, 1 +
(Select Count(Distinct KeyCol)
From myTable
Where ColName < t.ColName or
(ColName = t.ColName And KeyCol < t.KeyCol)) Rank
From MyTable t
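Applied to the tv table from the question, the tie-breaking variant could look like the sketch below, run here through Python's sqlite3 module with made-up rows. Two things are assumptions of this sketch: the comparison is flipped to > so that the highest points get position 1, and ties are broken by the lower id.

```python
import sqlite3

RANK_SQL = """
SELECT t.id, t.name, t.points,
       1 + (SELECT COUNT(*)
            FROM tv AS o
            WHERE o.points > t.points
               OR (o.points = t.points AND o.id < t.id)) AS position
FROM tv AS t
ORDER BY position
"""


def build_db(rows):
    """Create an in-memory tv table from (id, name, points) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tv (id INTEGER NOT NULL, name TEXT NOT NULL, points INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO tv VALUES (?, ?, ?)", rows)
    return conn


def ranking(conn):
    """Return (id, name, points, position) tuples, best position first."""
    return conn.execute(RANK_SQL).fetchall()
```

For the rows (1, 'a', 10), (2, 'b', 20), (3, 'c', 10), (4, 'd', 30) this should rank id 4 first, then 2, then 1 and 3, with the two 10-point records getting positions 3 and 4 instead of sharing one.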
I have 3 queries. I was told that they were potentially inefficient so I was wondering if anyone who is experienced could suggest anything. The logic is somewhat complex so bear with me.
I have two tables: shoutbox, and topic. Topic stores all information on topics that were created, while shoutbox stores all comments pertaining to each topic. Each comment comes with a group labelled by reply_chunk_id. The earliest timestamp is the first comment, while any following with the same reply_chunk_id and a later timestamp are replies. I would like to find the latest comment for each group that was started by the user (made first comment) and if the latest comment was made this month display it.
What I have written achieves that with one problem: all the latest comments are displayed in random order. I would like to organize these groups/latest comments. I really appreciate any advice
Shoutbox
Field Type
-------------------
id int(5)
timestamp int(11)
user varchar(25)
message varchar(2000)
topic_id varchar(35)
reply_chunk_id varchar(35)
Topic
id mediumint(8)
topic_id varchar(35)
subject_id mediumint(8)
file_name varchar(35)
topic_title varchar(255)
creator varchar(25)
topic_host varchar(255)
timestamp int(11)
color varchar(10)
mp3 varchar(75)
custom_background varchar(55)
description mediumtext
content_type tinyint(1)
Query
$sql="SELECT reply_chunk_id FROM shoutbox
GROUP BY reply_chunk_id
HAVING count(*) > 1
ORDER BY timestamp DESC ";
$stmt16 = $conn->prepare($sql);
$result=$stmt16->execute();
while($row = $stmt16->fetch(PDO::FETCH_ASSOC)){
$sql="SELECT user,reply_chunk_id, MIN(timestamp) AS grp_timestamp
FROM shoutbox WHERE reply_chunk_id=? AND user=?";
$stmt17 = $conn->prepare($sql);
$result=$stmt17->execute(array($row['reply_chunk_id'],$user));
while($row2 = $stmt17->fetch(PDO::FETCH_ASSOC)){
$sql="SELECT t.topic_title, t.content_type, t.subject_id,
t.creator, t.description, t.topic_host,
c1.message, c1.topic_id, c1.user, c1.timestamp AS max
FROM shoutbox c1
JOIN topic t ON (t.topic_id = c1.topic_id)
WHERE reply_chunk_id = ? AND c1.timestamp > ?
ORDER BY c1.timestamp DESC, c1.id
LIMIT 1";
$stmt18 = $conn->prepare($sql);
$result=$stmt18->execute(array($row2['reply_chunk_id'],$month));
while($row3 = $stmt18->fetch(PDO::FETCH_ASSOC)){
Make the first query:
SELECT reply_chunk_id FROM shoutbox
GROUP BY reply_chunk_id
HAVING count(*) > 1
ORDER BY timestamp DESC
This does the same, but is faster.
Make sure you have an index on reply_chunk_id.
The second query:
SELECT user,reply_chunk_id, MIN(timestamp) AS grp_timestamp
FROM shoutbox WHERE reply_chunk_id=? AND user=?
The GROUP BY is unneeded, because only one row gets returned, because of the MIN() and the equality tests.
The third query:
SELECT t.topic_title, t.content_type, t.subject_id,
t.creator, t.description, t.topic_host,
c1.message, c1.topic_id, c1.user, c1.timestamp AS max
FROM shoutbox c1
JOIN topic t ON (t.topic_id = c1.topic_id)
WHERE reply_chunk_id = ? AND c1.timestamp > ?
ORDER BY c1.timestamp DESC, c1.id
LIMIT 1
Doing it all in one query:
SELECT
t.user,t.reply_chunk_id, MIN(t.timestamp) AS grp_timestamp,
t.topic_title, t.content_type, t.subject_id,
t.creator, t.description, t.topic_host,
c1.message, c1.topic_id, c1.user, c1.timestamp AS max
FROM shoutbox c1
INNER JOIN topic t ON (t.topic_id = c1.topic_id)
LEFT JOIN shoutbox c2 ON (c1.reply_chunk_id = c2.reply_chunk_id and c1.timestamp < c2.timestamp)
WHERE c2.timestamp IS NULL AND t.user = ?
GROUP BY t.reply_chunk_id
HAVING count(*) > 1
ORDER BY t.reply_chunk_id
or the equivalent
SELECT
t.user,t.reply_chunk_id, MIN(t.timestamp) AS grp_timestamp,
t.topic_title, t.content_type, t.subject_id,
t.creator, t.description, t.topic_host,
c1.message, c1.topic_id, c1.user, c1.timestamp AS max
FROM shoutbox c1
INNER JOIN topic t ON (t.topic_id = c1.topic_id)
WHERE c1.timestamp = (SELECT max(timestamp) FROM shoutbox c2
WHERE c2.reply_chunk_id = c1.reply_chunk_id)
AND t.user = ?
GROUP BY t.reply_chunk_id
HAVING count(*) > 1
ORDER BY t.reply_chunk_id
How does this work?
1. The GROUP BY selects one entry per reply_chunk_id.
2. The LEFT JOIN (c1.reply_chunk_id = c2.reply_chunk_id and c1.`timestamp` < c2.`timestamp`) plus WHERE c2.`timestamp` IS NULL selects only the shoutbox rows with the highest timestamp in their chunk. For every row c1, the join looks for a row c2 in the same chunk with a later timestamp. If no such row exists, the c2 columns come back as NULL, which means c1 is the latest comment in its chunk, and the WHERE clause keeps exactly those rows.
3. If you don't understand point 2, see: http://dev.mysql.com/doc/refman/5.0/en/example-maximum-column-group-row.html
Note that the PDO is autoescaping the fields with backticks
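The LEFT JOIN ... IS NULL trick for picking the latest row per group is easier to follow on a tiny data set. The sketch below uses Python's sqlite3 module with a trimmed shoutbox table and four invented comments; it joins rows of the same chunk on reply_chunk_id, which is an assumption about what the join is meant to compare. The topic join and the user filter are left out to keep the example short.

```python
import sqlite3

LATEST_SQL = """
SELECT c1.reply_chunk_id, c1.user, c1.message, c1.timestamp
FROM shoutbox AS c1
LEFT JOIN shoutbox AS c2
       ON c1.reply_chunk_id = c2.reply_chunk_id
      AND c1.timestamp < c2.timestamp       -- c2 is a later comment in the same chunk
WHERE c2.id IS NULL                         -- no later comment exists: c1 is the latest
ORDER BY c1.timestamp DESC
"""

# Invented rows: (id, timestamp, user, message, reply_chunk_id)
COMMENTS = [
    (1, 100, "amy", "first", "a"),
    (2, 200, "bob", "reply", "a"),
    (3, 300, "amy", "latest a", "a"),
    (4, 150, "bob", "only b", "b"),
]


def build_db():
    """Create an in-memory shoutbox table with the comments above."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE shoutbox (id INTEGER PRIMARY KEY, timestamp INTEGER, "
        "user TEXT, message TEXT, reply_chunk_id TEXT)"
    )
    conn.executemany("INSERT INTO shoutbox VALUES (?, ?, ?, ?, ?)", COMMENTS)
    return conn


def latest_per_chunk(conn):
    """Return the newest comment of every chunk, newest chunk first."""
    return conn.execute(LATEST_SQL).fetchall()
```

This should return one row per chunk, 'latest a' followed by 'only b'. If two comments in a chunk share the same highest timestamp, both come back, so a tie-breaker on id would be needed in that case.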
Sounds like most of it should be directly from your ShoutBox table. Prequery to find all "Chunks" the user replied to... of those chunks (and topic_ID since each chunk is always the same topic), get their respective minimum and maximum. Using the "Having count(*) > 1" will force only those that HAVE a second posting by a given user (what you were looking for).
THEN, re-query to the chunks to get the minimum regardless of user. This prevents the need of querying ALL chunks. Then join only what a single user is associated with back to the Topic.
Additionally, and I could be incorrect and need to adjust (minimally), but it appears that the ShoutBox table's ID column is an auto-increment column, and each row also gets its timestamp at creation. That said, for a given "Chunk", the earliest ID would belong to the same row as the earliest timestamp, since both are assigned when the row is created. That also makes the subsequent JOINs and the sub-query easier.
Using STRAIGHT_JOIN should force the "PreQuery" to run FIRST and come up with a very limited set, then apply the WHERE clause and joins afterwards.
select STRAIGHT_JOIN
T.topic_title,
T.content_type,
T.subject_id,
T.creator,
T.description,
T.topic_host,
sb2.Topic_ID,
sb2.message,
sb2.user,
sb2.TimeStamp
from
( select
sb1.Reply_Chunk_ID,
sb1.Topic_ID,
count(*) as TotalEntries,
min( sb1.id ) as FirstIDByChunkByUser,
min( sbJoin.id ) as FirstIDByChunk,
max( sbJoin.id ) as LastIDByChunk,
max( sbJoin.timestamp ) as LastTimeByChunk
from
ShoutBox sb1
join ShoutBox sbJoin
on sb1.Reply_Chunk_ID = sbJoin.Reply_Chunk_ID
where
sb1.user = CurrentUser
group by
sb1.Reply_Chunk_ID,
sb1.Topic_ID
having
min( sb1.id ) = min( sbJoin.ID ) ) PreQuery
join Topic T on
PreQuery.Topic_ID = T.topic_id
join ShoutBox sb2 on
PreQuery.LastIDByChunk = sb2.ID
where
sb2.TimeStamp >= YourTimeStampCriteria
order by
sb2.TimeStamp desc
EDIT ---- QUERY EXPLANATION -- with Modified query.
I've changed the query from re-reading (as was almost midnight when answered after holiday weekend :)
First, "STRAIGHT_JOIN" is a MySQL keyword telling the engine to "do the query in the way / sequence I've stated". Sometimes an engine will try to think for you and optimize in ways that may appear more efficient, but if you know from your data which part retrieves the smallest set first, forcing that order and then joining to the other lookup tables may in fact be better. Second, the "PreQuery". If you have a SQL SELECT statement (within parens) in the FROM clause, it needs an alias; "PreQuery" is just the alias of that result set. I could have called it anything, it just makes sense because it is a stand-alone query of its own. (Ooops... fixed to ShoutBox :) As for case-sensitivity, column names are typically NOT case-sensitive. Table names can be, depending on the operating system and server settings, so "MyTest" could be a different table than "mytest" or "MYTEST". Supplying an alias also helps readability (especially with VeryLongTableNamesUsed).
This should work after the re-reading and adjustments. Try the first "PreQuery" on its own to see how many records it returns. On its own, for a single "CurrentUser" parameter value, it should return every "Reply_Chunk_ID" (which will always have the same topic_id) together with the first ID the person entered (min()). By JOINing to ShoutBox again on the chunk ID, we get the minimum and maximum ID per chunk REGARDLESS of who started or responded, but only for chunks the user posted in. The HAVING clause then keeps only the chunks where that same person STARTED the chunk (hence both min() values are the same).
Finally, once those have been qualified, join directly to the TOPIC and SHOUTBOX tables again on their own merits of topic_id and LastIDByChunk and order the final results by the latest comment response timestamp descending.
I've added a where clause to further limit your "timestamp" criteria where the most recent final timestamp is on/after the given time period you want.
I would be curious how this query's time performance works compared to your already accepted answer too.