mysql - getting only the results with differences from same table - php

So I have a single table that holds a score system for points. It looks something like this:
Columns:
ID  Name   Date        Points
1   Peter  2014-07-15  5
2   John   2014-07-15  6
3   Bill   2014-07-15  3
and so on...
Every day, the new results are put into the table with the total number of points accumulated; however, to keep historic values, the results are put into new rows. So on 2014-07-16, the table will look like this:
ID  Name   Date        Points
1   Peter  2014-07-15  5
2   John   2014-07-15  6
3   Bill   2014-07-15  3
4   Peter  2014-07-16  11
5   John   2014-07-16  12
6   Bill   2014-07-16  3
However sometimes when a player doesn't take part for the whole day and doesn't get any points, he will still be added, but the points will remain the same (here this is shown by the case of Bill).
My question is how to count the number of each type of player: active (Peter and John, i.e. the points value changes from one date to another) and inactive (Bill, i.e. the points value stays the same).
I have managed to get this query to select only players who have the same value, but it's giving me the list of players rather than the count. I could potentially be wrong with this query, though:
SELECT Points, name, COUNT(*)
FROM points
WHERE DATE(Date) = '2014-07-15' OR DATE(Date) = '2014-07-16'
GROUP BY Points
HAVING COUNT(*)>1
I'm not sure how to count the number of rows (could do a bypass trick with PHP getting the number of rows, but interested in SQL only) or how to invert it, to get a count of players who have a different score (again, could get total of rows and then subtract the above number, but not interested in that either - I'd prefer the SQL).
Regards and thanks in advance.

You are pretty close.
If you have at most one row per "player" per "date", you could do something like this:
SELECT SUM(IF(c.cnt_distinct_points<2,1,0)) AS cnt_inactive
, SUM(IF(c.cnt_distinct_points>1,1,0)) AS cnt_active
FROM ( SELECT p.name
, COUNT(DISTINCT p.points) AS cnt_distinct_points
FROM points p
WHERE DATE(p.Date) IN ('2014-07-15','2014-07-16')
GROUP BY p.name
) c
The inline view query (aliased as c) gets a count of the distinct number of "points" values for each player. We need to "group by" name, so we can get a distinct list of players, along with an indication whether the points value was different or not. If all of the non-NULL "points" values for a given player are the same, COUNT(DISTINCT ) will return a value of 1. Otherwise, we'll get a value larger than 1.
The outer query processes that list, collapsing all of the rows into a single row. The "trick" is to use an expression in the SELECT list that returns 1 or 0, depending on whether the player is "inactive", and perform a SUM aggregate on that. Do the same thing with a different expression that returns 1 if the player is "active".
If the count of distinct points for a player is 1, we'll essentially be adding 1 to cnt_inactive. Similarly, if the count of distinct points for a player is greater than 1, we'll be adding 1 to cnt_active.
If this doesn't make sense, let me know.
NOTE: Ideally, we'd avoid using the DATE() function around the p.Date column reference, so we could enable an appropriate index.
If the Date column is defined as (MySQL datatype) DATE, then the DATE() function is unnecessary. If the Date column is defined as (MySQL datatype) DATETIME or TIMESTAMP, we could use an equivalent predicate:
WHERE p.Date >= '2014-07-15' AND p.Date < '2014-07-16' + INTERVAL 1 DAY
That looks more complicated, but a predicate of that form is sargable (i.e. MySQL can use an index range scan to satisfy it, rather than having to look at every row in the table.)
For performance, we'd probably benefit from an index with leading columns of name and date:
... ON points (`name`,`date`)
(MySQL may be able to avoid a "Using filesort" operation for the GROUP BY).
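To check the counting query from this answer against the sample rows in the question, here is a minimal sketch that runs it in an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for MySQL here (an assumption made so the example is runnable anywhere), so IF() is written as CASE WHEN and the date filter uses the half-open range form.

```python
import sqlite3

# SQLite stands in for MySQL; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE points (id INTEGER PRIMARY KEY, name TEXT, date TEXT, points INTEGER)"
)
conn.executemany(
    "INSERT INTO points (name, date, points) VALUES (?, ?, ?)",
    [
        ("Peter", "2014-07-15", 5), ("John", "2014-07-15", 6), ("Bill", "2014-07-15", 3),
        ("Peter", "2014-07-16", 11), ("John", "2014-07-16", 12), ("Bill", "2014-07-16", 3),
    ],
)

# Inner query: one row per player with the number of distinct points values.
# Outer query: turn that into two conditional sums.
query = """
SELECT SUM(CASE WHEN c.cnt_distinct_points < 2 THEN 1 ELSE 0 END) AS cnt_inactive,
       SUM(CASE WHEN c.cnt_distinct_points > 1 THEN 1 ELSE 0 END) AS cnt_active
FROM (SELECT p.name, COUNT(DISTINCT p.points) AS cnt_distinct_points
      FROM points p
      WHERE p.date >= '2014-07-15' AND p.date < '2014-07-17'
      GROUP BY p.name) c
"""
cnt_inactive, cnt_active = conn.execute(query).fetchone()
print(cnt_inactive, cnt_active)
```

With the six sample rows this reports one inactive player (Bill) and two active players (Peter and John).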

I would solve this problem by looking at the previous number of points and then doing a comparison:
select date(p.date), count(*) as NumActives
from (select p.*,
             (select p2.points
              from points p2
              where p2.name = p.name and p2.date < p.date
              order by p2.date desc
              limit 1
             ) as prev_points
      from points p
     ) p
where prev_points is NULL or prev_points <> points
group by date(p.date);
Of course, you can add a where clause to get the count for any particular day.
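As a rough illustration of the previous-points comparison, the sketch below runs it on the question's sample rows in an in-memory SQLite database (a stand-in for MySQL, chosen only so the example is self-contained). It also shows one consequence worth knowing: because a missing previous value counts as a change, every player is counted as active on the first recorded day. The second variant, which drops the IS NULL branch, is my own addition and not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE points (id INTEGER PRIMARY KEY, name TEXT, date TEXT, points INTEGER)"
)
conn.executemany(
    "INSERT INTO points (name, date, points) VALUES (?, ?, ?)",
    [
        ("Peter", "2014-07-15", 5), ("John", "2014-07-15", 6), ("Bill", "2014-07-15", 3),
        ("Peter", "2014-07-16", 11), ("John", "2014-07-16", 12), ("Bill", "2014-07-16", 3),
    ],
)

# The correlated subquery fetches each player's most recent earlier points value.
BASE = """
SELECT p.date, COUNT(*) AS num_actives
FROM (SELECT p1.name, p1.date, p1.points,
             (SELECT p2.points FROM points p2
              WHERE p2.name = p1.name AND p2.date < p1.date
              ORDER BY p2.date DESC LIMIT 1) AS prev_points
      FROM points p1) p
WHERE {condition}
GROUP BY p.date
ORDER BY p.date
"""

# As in the answer: no previous row counts as "active".
with_first_day = conn.execute(
    BASE.format(condition="p.prev_points IS NULL OR p.prev_points <> p.points")
).fetchall()

# Variant (assumption): only count players whose points actually changed.
changed_only = conn.execute(
    BASE.format(condition="p.prev_points <> p.points")
).fetchall()

print(with_first_day)
print(changed_only)
```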

Related

Can 4 fields be put together to use as a key to join two tables?

I am trying to create a solution which allows pilots to swap a 1, 2, 3 or 4 day trip for another pilot's 1, 2, 3 or 4 day trip. I have 3 tables: Pilot, Have and Want. A pilot creates a Have which, for example, is a 2 day on 10/20 (October 20th). This pilot wants a 3 day starting on the 22nd. Another pilot has the 3 day and wants something like the other pilot's 2 day. The tables look like this:
id_pilot  name   phone     employee_num  aircraft  base  seat
1         Steve  363-0040  123454        320       DCA   FO
2         Ted    992-5380  123455        320       DCA   FO
id_have  id_pilot  daytrip  start_month  start_day
1        1         02       10           20
2        2         03       10           22
id_want  id_have  daytrip  start_month  start_day
1        1        03       10           22
To see what Wants are out there for a particular Have I need to join the Want and Have table on something that looks like DCA|320|FO|10|20|2. I only want to see the Wants for a particular Have that are for the aircraft, base and seat. I can do this by creating a new join field but having such a simplistic understanding of MySQL I imagine there is a way to do this on the fly. I used joins to grab information via the primary keys but this seems like it's one step removed from that. What would such a query look like?
To give an example of what Marc B is saying, the join clause can compare the daytrip, start_month and start_day columns between the have and want tables, e.g.
select have.id_pilot, want.id_want, want.daytrip, want.start_month, want.start_day
from have
inner join want
on have.daytrip = want.daytrip
and have.start_month = want.start_month
and have.start_day = want.start_day
The ON clause is evaluated for each candidate pair of rows from the two tables, and the only requirement is that it evaluates to true or false. So any column from either table can be used in the condition, in any combination.
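The multi-column join above can be extended to the aircraft, base and seat requirement from the question by joining the pilot table twice. The following is a sketch under assumptions, not the answerer's query: it uses SQLite via Python's sqlite3 as a stand-in for MySQL, stores daytrip as an integer, and the aliases mine, wanter, offer and owner are names I made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pilot (id_pilot INTEGER PRIMARY KEY, name TEXT, aircraft TEXT, base TEXT, seat TEXT);
CREATE TABLE have  (id_have INTEGER PRIMARY KEY, id_pilot INTEGER, daytrip INTEGER,
                    start_month INTEGER, start_day INTEGER);
CREATE TABLE want  (id_want INTEGER PRIMARY KEY, id_have INTEGER, daytrip INTEGER,
                    start_month INTEGER, start_day INTEGER);
INSERT INTO pilot VALUES (1, 'Steve', '320', 'DCA', 'FO'), (2, 'Ted', '320', 'DCA', 'FO');
INSERT INTO have  VALUES (1, 1, 2, 10, 20), (2, 2, 3, 10, 22);
INSERT INTO want  VALUES (1, 1, 3, 10, 22);
""")

# For each want: find the pilot who posted it (via the want's own have),
# then every other have that matches the wanted trip and whose owner
# flies the same aircraft, from the same base, in the same seat.
query = """
SELECT w.id_want, wanter.name, offer.id_have, owner.name
FROM want w
JOIN have  mine   ON mine.id_have = w.id_have
JOIN pilot wanter ON wanter.id_pilot = mine.id_pilot
JOIN have  offer  ON offer.daytrip = w.daytrip
                 AND offer.start_month = w.start_month
                 AND offer.start_day = w.start_day
JOIN pilot owner  ON owner.id_pilot = offer.id_pilot
                 AND owner.aircraft = wanter.aircraft
                 AND owner.base = wanter.base
                 AND owner.seat = wanter.seat
"""
matches = conn.execute(query).fetchall()
print(matches)
```

No concatenated key such as DCA|320|FO|10|20|2 is needed; each part of that key simply becomes one equality in an ON clause.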

Finding Interval of a data present on latest 2 dates

I'm developing a web-based tool that can help analyze the number intervals that occur in a 6-digit lottery.
Let us focus on a certain number first. Say 7
The sql query I've done so far:
SELECT * FROM `l642` WHERE `1d`=7 OR `2d`=7 OR `3d`=7 OR `4d`=7 OR `5d`=7
OR `6d`=7 ORDER BY `draw_date` DESC LIMIT 2
This pulls the two most recent draw dates on which the number 7 appears.
I'm thinking of using DATEDIFF, but I'm confused about how to get the previous value so I can subtract it from the latest draw_date.
My goal is to list the intervals for the numbers 1-42, and I plan to accomplish it using PHP.
Looking forward to your help.
A few ideas spring to mind.
(1) First, since your result set is already ordered, loop over the two rows in PHP: take $date1 = $row['draw_date'] from the first row, then fetch the next (last) row and set $date2 = $row['draw_date']. date_diff() expects date objects rather than strings, so with these two you have
$diff = date_diff(date_create($date1), date_create($date2));
and $diff->days is the difference in days.
(2) A second way is to have MySQL return the DATEDIFF by including a row number in the result set and doing a self-join, with, say, alias a for row 1 and alias b for row 2:
datediff(a.draw_date, b.draw_date)
How one goes about getting rownumber could be either:
(2a) rownumber found here: With MySQL, how can I generate a column containing the record index in a table?
(2b) a work table with an id int auto_increment primary key column, filled by an INSERT ... SELECT from your LIMIT 2 query (with a TRUNCATE TABLE on the work table between iterations 1 to 42 to reset the auto_increment counter).
The entire thing could be wrapped in an outer query over a table of the numbers 1 to 42, so that 42 rows come back with 2 columns (num, number_of_days), but that wasn't your question.
Considering how infrequently you are probably doing this, I would recommend not over-engineering it and would go with #1.
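Option (1), fetching the two latest draws and subtracting the dates in application code, can be sketched as follows. This is only an illustration in Python with an in-memory SQLite table standing in for MySQL; the three draws are made-up sample data, and latest_interval_days is a hypothetical helper name.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE l642 (draw_date TEXT, "1d" INT, "2d" INT, "3d" INT, '
    '"4d" INT, "5d" INT, "6d" INT)'
)
# Made-up draws for illustration only.
conn.executemany(
    "INSERT INTO l642 VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("2014-10-01", 3, 7, 12, 20, 33, 41),
        ("2014-10-04", 1, 5, 9, 18, 27, 40),
        ("2014-10-08", 7, 11, 15, 22, 30, 42),
    ],
)

def latest_interval_days(conn, number):
    """Days between the two most recent draws containing `number`, or None."""
    rows = conn.execute(
        'SELECT draw_date FROM l642 '
        'WHERE ? IN ("1d", "2d", "3d", "4d", "5d", "6d") '
        'ORDER BY draw_date DESC LIMIT 2',
        (number,),
    ).fetchall()
    if len(rows) < 2:
        return None  # the number has appeared fewer than two times
    latest, previous = (date.fromisoformat(r[0]) for r in rows)
    return (latest - previous).days

print(latest_interval_days(conn, 7))
```

Looping the helper over the numbers 1 to 42 would give the full list of intervals the question asks for.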

Lowest free value in mysql column

I already searched, but I always find LEAST and GREATEST as hints. I want to get the next ascending number in a column that's not used yet. Like the following:
entries
1
2
3
5
6
7
If each of these numbers is one row in my table, I want the number 4 as the result, and in the following example:
1
2
3
4
5
6
I want the number 7 as the result. Is there any possibility to accomplish this in an SQL statement?
Best,
Robin
This query assumes that the number 1 is in your table:
select min(number) + 1 from entries e1
where not exists (
select 1 from entries e2
where e2.number = e1.number + 1
)
If you want all missing numbers (where gaps are no larger than 1) instead of the smallest one, then remove min().
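To see how the NOT EXISTS query behaves on the two examples from the question, here is a small sketch that runs it in an in-memory SQLite database via Python's sqlite3 (a stand-in for MySQL). The helper name lowest_free is hypothetical.

```python
import sqlite3

def lowest_free(numbers):
    """Run the NOT EXISTS query from the answer on a throwaway table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entries (number INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO entries (number) VALUES (?)", [(n,) for n in numbers])
    # Rows with no successor mark the end of a run; the smallest one + 1
    # is the first unused number after the start of the sequence.
    row = conn.execute("""
        SELECT MIN(e1.number) + 1
        FROM entries e1
        WHERE NOT EXISTS (SELECT 1 FROM entries e2
                          WHERE e2.number = e1.number + 1)
    """).fetchone()
    conn.close()
    return row[0]

print(lowest_free([1, 2, 3, 5, 6, 7]))
print(lowest_free([1, 2, 3, 4, 5, 6]))
print(lowest_free([2, 3]))
```

The last call illustrates the stated assumption: when 1 itself is missing, the query still returns the number after the first run (4 here), not 1.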
I think the solution is to do a self-join with the next value and take the lowest result. Example:
Table: values, with column value (VALUES is a reserved word in MySQL, so the table name is quoted with backticks)
SELECT v1.value + 1 AS lowest_free
FROM `values` v1
LEFT JOIN `values` v2 ON v2.value = v1.value + 1
WHERE v2.value IS NULL
ORDER BY v1.value ASC
LIMIT 1

Count of column within every 30 day range

So I have a table that looks like this:
Person  Product  Date       Quantity
1       A        1/11/2014  1
2       A        1/11/2014  2
1       A        1/20/2014  2
3       A        1/21/2014  1
3       B        1/21/2014  1
1       A        1/25/2014  1
I want to find the Count of Quantity where Product is A and Person has a Count > 1 WITHIN ANY SLIDING 30 DAY RANGE. Another key is that once two records meet the criteria, they should not add to the count again. For example, Person 1 will have a count of 3 for 1/11 and 1/20, but will not have a count of 3 for 1/20 and 1/25. Person 2 will have a count of 2. Person 3 will not show up in the results, because the second product is B. This query will run within a specific date range also (e.g, 1/1/2014 - 10/27/2014).
My product is written in MySQL and PHP and I would prefer to do this exclusively in MySQL, but this seems more like an OLAP problem. I greatly appreciate any guidance.
"Another key is that once two records meet the criteria, they should not add to the count again."
This is not relational. In order for this to be meaningful, we have to define the order in which records are evaluated. While SQL does have ORDER BY, that's for display purposes only. It does not affect the order in which the query is computed. The order of evaluation is not meant to matter.
I do not believe this can be expressed as a SELECT query at all. If I am correct, that leaves you with a stored procedure or a non-SQL language.
If you're willing to drop this requirement (and perhaps implement it in post-processing, see below), this becomes doable. Start with a view of all the relevant date ranges:
CREATE VIEW date_ranges(
start_date, -- DATE
end_date -- DATE
) AS
SELECT DISTINCT date, DATE_ADD(date, INTERVAL 30 day)
FROM your_table;
Now, create a view of relevant counts:
CREATE VIEW product_counts(
person, -- INTEGER REFERENCES your_table(person)
count, -- INTEGER
start_date, -- DATE
end_date -- DATE
) AS
SELECT y.person,
sum(y.quantity),
r.start_date,
r.end_date
FROM date_ranges r
JOIN your_table y
ON y.date BETWEEN r.start_date AND r.end_date
WHERE y.product = 'A'
GROUP BY y.person, r.start_date, r.end_date
HAVING sum(y.quantity) > 1;
For post-processing, you need to look at each row in the product_counts view and look up the purchase orders (rows of your_table) which correspond to it. Check whether you've seen any of those orders before (using a hash set), and if so, exclude them from consideration, reducing the count of the current item and possibly eliminating it entirely. This is best done in a procedural language other than SQL.
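The post-processing step described above could look roughly like this in Python. It is a sketch under several assumptions that the answer does not spell out: each row gets a hypothetical order_id, windows are processed in order of start date, only product A is counted, and a window is dropped once none of its orders are unseen. The windows themselves are recomputed in Python here rather than read from the views.

```python
from datetime import date, timedelta

# (order_id, person, product, date, quantity); order_id is a made-up row id.
ORDERS = [
    (1, 1, "A", date(2014, 1, 11), 1),
    (2, 2, "A", date(2014, 1, 11), 2),
    (3, 1, "A", date(2014, 1, 20), 2),
    (4, 3, "A", date(2014, 1, 21), 1),
    (5, 3, "B", date(2014, 1, 21), 1),
    (6, 1, "A", date(2014, 1, 25), 1),
]

def dedup_window_counts(orders, product="A", days=30):
    relevant = [o for o in orders if o[2] == product]
    starts = sorted({o[3] for o in orders})      # distinct dates, like date_ranges
    persons = sorted({o[1] for o in relevant})
    seen = set()                                 # order ids already counted
    result = []
    for start in starts:
        end = start + timedelta(days=days)
        for person in persons:
            in_window = [o for o in relevant
                         if o[1] == person and start <= o[3] <= end]
            if sum(o[4] for o in in_window) <= 1:  # mirrors HAVING sum(...) > 1
                continue
            fresh = [o for o in in_window if o[0] not in seen]
            seen.update(o[0] for o in fresh)
            count = sum(o[4] for o in fresh)
            if count > 0:                          # drop fully-seen windows
                result.append((person, start, count))
    return result

counts = dedup_window_counts(ORDERS)
print(counts)
```

Under this reading Person 1 ends up with a single window of 4 (all three of their product A orders fall within 30 days of 1/11), Person 2 with 2, and Person 3 does not appear. The question's expected 3 for Person 1 implies a pairing rule that would need to be defined more precisely before it could be implemented.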

Fastest way to get row number with percentage selects (LIMIT X, 1)?

In my MySQL database I have a table (PERSONS) with over 10 million rows, the two important columns are:
ID
POINTS
I would like to know the rank of the person with ID = randomid
I want to return to the person his "rank", which depends on his points. But his rank will not be the exact row number, but more like a percentage layer. Like: "You are in the top 5%" or "You are in the layer 10% - 15%".
Of course I could query the table and convert the row number to the layer % by dividing it by the total number of rows. But my question is: would it be faster (with 10M+ rows) to just grab several rows with LIMIT X, 1, where X is the row at percentage 100, 95, 90, 85 ... of the table? Next step: check if the points of this row are lower than the current person's points; if yes, grab the next layer % row, and if not, return the previous layer row.
In the persons table there are 9 columns with 2 bigints, 4 varchars 150, 1 date and 2 booleans.
Of course I would prefer to get the exact row rank, but from what I tested, this is slow and takes at least several seconds; with my way it can be done in a few hundredths of a second.
Also, the way I suggested is not precise when there are several layers with the same points, but it doesn't need to be that precise, so we can neglect that fact.
Extra info, I program in PHP, so if there is a specific solution for this in PHP + MySQL it would be nice too.
Finally, it's worth mentioning that the table grows by 20k rows an hour (almost 500k a day).
I appreciate all the help.
You could try this. I first count the number of rows with more points, and then add one to that, in case there are a number of rows with the same number of points. So if there are 10 rows with the same number of points, they all have the same rank as the first one in that group.
SELECT SUM(CASE WHEN points > (SELECT POINTS FROM YOUR_TABLE WHERE ID = randomid) THEN 1 ELSE 0 END) + 1 as `Rank`,
(SUM(CASE WHEN points > (SELECT POINTS FROM YOUR_TABLE WHERE ID = randomid) THEN 1 ELSE 0 END) + 1) / COUNT(*) as Pct
FROM YOUR_TABLE
If that is slow, I would run two queries. First get that ID's points and then plug that into a second query to determine the rank/pct.
SELECT POINTS
FROM YOUR_TABLE
WHERE ID = randomid
Then compute the rank and pct, plugging in the points value returned by the first query (shown here as the placeholder :person_points).
SELECT SUM(CASE WHEN points > :person_points THEN 1 ELSE 0 END) + 1 as `Rank`,
(SUM(CASE WHEN points > :person_points THEN 1 ELSE 0 END) + 1) / COUNT(*) as Pct
FROM YOUR_TABLE
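The two-query variant can be sketched end to end as below, again with Python's sqlite3 and an in-memory table standing in for MySQL. The ten sample rows and the helper name rank_and_pct are made up for illustration. The percentage is computed in Python because SQLite, unlike MySQL, performs integer division on integer operands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (id INTEGER PRIMARY KEY, points INTEGER)")
# Made-up sample: ids 1..10 with these points; ids 4 and 5 are tied.
sample_points = [50, 20, 90, 70, 70, 10, 30, 80, 60, 40]
conn.executemany(
    "INSERT INTO persons (id, points) VALUES (?, ?)",
    list(enumerate(sample_points, start=1)),
)

def rank_and_pct(conn, person_id):
    """Return (rank, fraction of table at or above this rank), or None."""
    # Query 1: the person's own points.
    row = conn.execute(
        "SELECT points FROM persons WHERE id = ?", (person_id,)
    ).fetchone()
    if row is None:
        return None
    # Query 2: how many rows have strictly more points, plus the table size.
    higher, total = conn.execute(
        "SELECT SUM(CASE WHEN points > ? THEN 1 ELSE 0 END), COUNT(*) FROM persons",
        (row[0],),
    ).fetchone()
    rank = higher + 1  # tied players share the rank of the first in their group
    return rank, rank / total

print(rank_and_pct(conn, 4))
print(rank_and_pct(conn, 3))
```

A result such as (3, 0.3) can then be mapped to a layer like "top 30%" in application code.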
