Count of column within every 30 day range - php

So I have a table that looks like this:
Person Product Date Quantity
1 A 1/11/2014 1
2 A 1/11/2014 2
1 A 1/20/2014 2
3 A 1/21/2014 1
3 B 1/21/2014 1
1 A 1/25/2014 1
I want to find the Count (the sum of Quantity) where Product is A and the Person's count is > 1 WITHIN ANY SLIDING 30-DAY RANGE. Another key is that once two records meet the criteria, they should not add to the count again. For example, Person 1 will have a count of 3 for 1/11 and 1/20, but will not have a count of 3 for 1/20 and 1/25. Person 2 will have a count of 2. Person 3 will not show up in the results, because the second product is B. This query will also run within a specific date range (e.g., 1/1/2014 - 10/27/2014).
My product is written in MySQL and PHP and I would prefer to do this exclusively in MySQL, but this seems more like an OLAP problem. I greatly appreciate any guidance.

Another key is that once two records meet the criteria, they should not add to the count again.
This is not relational. In order for this to be meaningful, we have to define the order in which records are evaluated. While SQL does have ORDER BY, that's for display purposes only. It does not affect the order in which the query is computed. The order of evaluation is not meant to matter.
I do not believe this can be expressed as a SELECT query at all. If I am correct, that leaves you with procedural SQL (a MySQL stored procedure, or PL/SQL on Oracle) or a non-SQL language.
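To see why evaluation order matters, fix an order first: each person's rows sorted by date. The "don't count again" rule then becomes a short sequential loop in a non-SQL language. This Python sketch is one possible reading of the asker's rule, not a definitive one. It assumes a group closes as soon as its running total exceeds 1, and that a group may not extend more than 30 days past its first row. `greedy_counts` and its parameters are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

# Sample rows from the question: (person, product, date, quantity)
ROWS = [
    (1, "A", date(2014, 1, 11), 1),
    (2, "A", date(2014, 1, 11), 2),
    (1, "A", date(2014, 1, 20), 2),
    (3, "A", date(2014, 1, 21), 1),
    (3, "B", date(2014, 1, 21), 1),
    (1, "A", date(2014, 1, 25), 1),
]

def greedy_counts(rows, product="A", window_days=30, start=None, end=None):
    """Assumed rule: walk each person's rows in date order, accumulate the
    quantity, and emit (then reset) as soon as the running total exceeds 1."""
    per_person = defaultdict(list)
    for person, prod, day, qty in rows:
        if prod != product:
            continue
        if start is not None and day < start:
            continue
        if end is not None and day > end:
            continue
        per_person[person].append((day, qty))

    result = {}
    for person, items in per_person.items():
        items.sort()  # the explicit evaluation order: by date
        group_start, total, emitted = None, 0, []
        for day, qty in items:
            # Open a new group if none is open or the open one is older than the window.
            if group_start is None or day > group_start + timedelta(days=window_days):
                group_start, total = day, 0
            total += qty
            if total > 1:
                emitted.append(total)
                group_start, total = None, 0  # these rows may not count again
        if emitted:
            result[person] = emitted
    return result

print(greedy_counts(ROWS))
```

On the sample data this returns {1: [3], 2: [2]}, which matches the counts described in the question. The optional start/end arguments stand in for the 1/1/2014 - 10/27/2014 filter.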
If you're willing to drop this requirement (and perhaps implement it in post-processing, see below), this becomes doable. Start with a view of all the relevant date ranges:
CREATE VIEW date_ranges(
start_date, -- DATE
end_date -- DATE
) AS
SELECT DISTINCT date, DATE_ADD(date, INTERVAL 30 day)
FROM your_table;
Now, create a view of relevant counts:
CREATE VIEW product_counts(
person, -- INTEGER REFERENCES your_table(person)
count, -- INTEGER
start_date, -- DATE
end_date -- DATE
) AS
SELECT y.person,
sum(y.quantity),
r.start_date,
r.end_date
FROM date_ranges r
JOIN your_table y
ON y.date BETWEEN r.start_date AND r.end_date
WHERE y.product = 'A'
GROUP BY y.person, r.start_date, r.end_date
HAVING sum(y.quantity) > 1;
For post-processing, you need to look at each row in the product_counts view and look up the purchase orders (rows of your_table) which correspond to it. Check whether you've seen any of those orders before (using a hash set), and if so, exclude them from consideration, reducing the count of the current item and possibly eliminating it entirely. This is best done in a procedural language other than SQL.
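Here is a minimal Python sketch of that post-processing. It rests on several assumptions:
- the rows of your_table are in memory with a hypothetical order_id;
- the two views are emulated in Python (one window per distinct date, inclusive BETWEEN, product A only);
- windows are processed in start_date order;
- an order is marked as seen only when the window that uses it is kept.

The function names are hypothetical.

```python
from datetime import date, timedelta

# Sample rows from the question plus a hypothetical order id:
# (order_id, person, product, date, quantity)
ORDERS = [
    (1, 1, "A", date(2014, 1, 11), 1),
    (2, 2, "A", date(2014, 1, 11), 2),
    (3, 1, "A", date(2014, 1, 20), 2),
    (4, 3, "A", date(2014, 1, 21), 1),
    (5, 3, "B", date(2014, 1, 21), 1),
    (6, 1, "A", date(2014, 1, 25), 1),
]

def product_counts(orders, product="A", days=30):
    """Mimics date_ranges + product_counts: one window per distinct date,
    rows grouped by (person, window), kept only when the sum exceeds 1."""
    rows = []
    for start in sorted({o[3] for o in orders}):
        end = start + timedelta(days=days)
        by_person = {}
        for oid, person, prod, day, qty in orders:
            if prod == product and start <= day <= end:  # BETWEEN is inclusive
                by_person.setdefault(person, []).append((oid, qty))
        for person, items in sorted(by_person.items()):
            if sum(q for _, q in items) > 1:
                rows.append((person, start, items))
    return rows

def post_process(rows):
    """Exclude orders already counted by an earlier kept window (hash set of
    order ids), reduce each window's count, and drop windows left at 1 or less."""
    seen = set()
    kept = []
    for person, start, items in rows:
        fresh = [(oid, q) for oid, q in items if oid not in seen]
        total = sum(q for _, q in fresh)
        if total > 1:
            kept.append((person, start, total))
            seen.update(oid for oid, _ in fresh)
    return kept

print(post_process(product_counts(ORDERS)))
```

On the question's sample data this keeps (1, 2014-01-11, 4) and (2, 2014-01-11, 2) and drops Person 1's 1/20 window. Person 1 gets 4 rather than the 3 the asker expects, because the window starting 1/11 also reaches 1/25. That gap is exactly why the order of evaluation has to be defined.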

Related

MySQL sorting date then by part number

Essentially, I want these parts (below) grouped, and then the groups placed in order of time, with the latest time at the top of the list.
ID Parts Time
1 SMH_2010 08:59:18
2 JJK_0101 08:59:26
3 FTD_0002 08:59:24
4 JJK_0102 08:59:27
5 FTD_0001 08:59:22
6 SMH_2010 08:59:20
7 FTD_0003 08:59:25
So, the results would look like:
ID Parts Time
1 JJK_0101 08:59:26
2 JJK_0102 08:59:27
3 FTD_0001 08:59:22
4 FTD_0002 08:59:24
5 FTD_0003 08:59:25
6 SMH_2010 08:59:20
7 SMH_2010 08:59:18
Please, I would be grateful for any help.
What you are asking is not sorting in the traditional meaning. Your first attempt orders the result by time, and then by part if multiple timestamps occur at the same time.
What you want neither sorts the result alphabetically by part name, nor ascending/descending by timestamp. What you are asking for can't be accomplished by the sort operation in SQL. Having the parts in sequence is not ordering.
I finally found a solution to this. Not my ideal solution, but nevertheless it works.
I added another field called max_date, which is set to now() whenever a new part is inserted.
I create a prefix from the part being inserted, something like "SMH_", stored in a variable: $prefix = "SMH_";
A second query directly follows the insert and sets max_date to now() again for every part that matches $prefix:
UPDATE parts SET max_date = now() WHERE part LIKE '$prefix%'
To display the results, I use something along the lines of:
SELECT * FROM parts ORDER BY parts.max_date DESC, parts.part ASC
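In effect, the max_date column stores on every row the latest time seen for that row's prefix group. As a sketch, the same ordering can instead be computed at read time. The Python below does this on the question's sample data, assuming the group is the text before the first "_". The `prefix` and `order_parts` names are hypothetical. Within a group it sorts by part name. For identical part names it sorts by time, newest first, which is how the expected SMH_2010 rows appear.

```python
# Sample rows from the question: (id, part, time)
ROWS = [
    (1, "SMH_2010", "08:59:18"),
    (2, "JJK_0101", "08:59:26"),
    (3, "FTD_0002", "08:59:24"),
    (4, "JJK_0102", "08:59:27"),
    (5, "FTD_0001", "08:59:22"),
    (6, "SMH_2010", "08:59:20"),
    (7, "FTD_0003", "08:59:25"),
]

def prefix(part):
    # Assumed grouping rule: the text before the first "_", e.g. "SMH".
    return part.split("_", 1)[0]

def order_parts(rows):
    # Latest time per prefix group: the value max_date ends up holding.
    # Zero-padded HH:MM:SS strings compare correctly as plain strings.
    group_max = {}
    for _, part, t in rows:
        p = prefix(part)
        group_max[p] = max(t, group_max.get(p, t))
    # Python's sort is stable, so apply the least significant key first.
    result = sorted(rows, key=lambda r: r[2], reverse=True)  # time, newest first
    result = sorted(result, key=lambda r: r[1])              # part name, A-Z
    result = sorted(result, key=lambda r: group_max[prefix(r[1])], reverse=True)  # group, newest first
    return result

for row in order_parts(ROWS):
    print(row)
```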

Finding Interval of a data present on latest 2 dates

I'm developing a web-based tool that can help analyze number intervals that occurs in a 6-digit lottery.
Let's focus on one number first, say 7.
The sql query I've done so far:
SELECT * FROM `l642` WHERE `1d`=7 OR `2d`=7 OR `3d`=7 OR `4d`=7 OR `5d`=7
OR `6d`=7 ORDER BY `draw_date` DESC LIMIT 2
This pulls the two latest draw dates on which the number 7 appears.
I'm thinking of using DATEDIFF, but I'm confused about how to get the previous value so I can subtract it from the latest draw_date.
My goal is to list the intervals for numbers 1-42, and I plan to accomplish it using PHP.
Looking forward to your help
A few ideas spring to mind.
(1) First, since your result set is already ordered, use a PHP loop over the two rows: take $date1 = $row['draw_date'] from the first row, then fetch the next (last) row and set $date2 = $row['draw_date']. With these two you have
$diff = date_diff(date_create($date1), date_create($date2));
and $diff->days is the difference in days.
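Here is option (1) sketched in Python rather than PHP. It assumes the draws have already been fetched as (draw_date, numbers) pairs. The DRAWS data is made up for illustration, and `latest_interval` is a hypothetical name. Looping n over 1 to 42 gives the full list of intervals the question is after.

```python
from datetime import date

# Hypothetical draws: (draw_date, the six numbers drawn)
DRAWS = [
    (date(2014, 7, 1), (1, 7, 12, 20, 33, 41)),
    (date(2014, 7, 4), (2, 7, 15, 22, 30, 42)),
    (date(2014, 7, 8), (3, 7, 9, 25, 36, 40)),
]

def latest_interval(draws, number):
    """Days between the two most recent draws containing `number`, or None.
    Mirrors ORDER BY draw_date DESC LIMIT 2 followed by a date difference."""
    dates = sorted((d for d, nums in draws if number in nums), reverse=True)[:2]
    if len(dates) < 2:
        return None  # the number appeared fewer than twice
    return (dates[0] - dates[1]).days

intervals = {n: latest_interval(DRAWS, n) for n in range(1, 43)}
print(intervals[7])
```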
(2) A second way is to have MySQL return the DATEDIFF by including a row number in the result set and doing a self-join with aliases, say alias a for row 1 and alias b for row 2:
DATEDIFF(a.draw_date, b.draw_date)
How one goes about getting rownumber could be either:
(2a) rownumber found here: With MySQL, how can I generate a column containing the record index in a table?
(2b) a work table with an id INT AUTO_INCREMENT PRIMARY KEY column, filled with INSERT ... SELECT from your LIMIT 2 query shown above, plus a TRUNCATE TABLE worktable between iterations (numbers 1 to 42) to reset the auto_increment counter.
The entire thing could be wrapped with an outer table 1 to 42 where 42 rows are brought back with 2 columns (num, number_of_days), but that wasn't your question.
So, considering how infrequently you are probably doing this, I would recommend not over-engineering it and would shoot for #1.

mysql - getting only the results with differences from same table

So I have a single table in which I keep a score system for points. It looks something like this:
Columns:
ID Name Date Points
1 Peter 2014-07-15 5
2 John 2014-07-15 6
3 Bill 2014-07-15 3
and so on...
Every day, the new results are put into the table with the total number of points accumulated. However, in order to be able to get historic values, the results are put into new rows. So on 2014-07-16, the table will look like this:
ID Name Date Points
1 Peter 2014-07-15 5
2 John 2014-07-15 6
3 Bill 2014-07-15 3
4 Peter 2014-07-16 11
5 John 2014-07-16 12
6 Bill 2014-07-16 3
However, when a player doesn't take part for a whole day and doesn't get any points, he will still be added, but his points will remain the same (shown here by Bill).
My question is how to count the number of each type of player: active (Peter and John, i.e. the points value changes from one date to another) and inactive (Bill, i.e. the points value stays the same).
I have managed to get this query to select only players who do have the same value, but it gives me the list of players rather than the count. I could potentially be wrong with this query, though:
SELECT Points, name, COUNT(*)
FROM points
WHERE DATE(Date) = '2014-07-15' OR DATE(Date) = '2014-07-16'
GROUP BY Points
HAVING COUNT(*)>1
I'm not sure how to count the number of rows (could do a bypass trick with PHP getting the number of rows, but interested in SQL only) or how to invert it, to get a count of players who have a different score (again, could get total of rows and then subtract the above number, but not interested in that either - I'd prefer the SQL).
Regards and thanks in advance.
You are pretty close.
If you have at most one row per "player" per "date", you could do something like this:
SELECT SUM(IF(c.cnt_distinct_points<2,1,0)) AS cnt_inactive
, SUM(IF(c.cnt_distinct_points>1,1,0)) AS cnt_active
FROM ( SELECT p.name
, COUNT(DISTINCT p.points) AS cnt_distinct_points
FROM points p
WHERE DATE(p.Date) IN ('2014-07-15','2014-07-16')
GROUP BY p.name
) c
The inline view query (aliased as c) gets a count of the distinct "points" values for each player. We need to "group by" name, so we get a distinct list of players, along with an indication of whether the points value was different or not. If all of the non-NULL "points" values for a given player are the same, COUNT(DISTINCT ...) will return a value of 1. Otherwise, we'll get a value larger than 1.
The outer query processes that list, collapsing all of the rows into a single row. The "trick" is to use expressions in the SELECT list that return 1 or 0, depending on whether the player is "inactive", and perform a SUM aggregate on that. Do the same thing, but a different expression to return a 1 if the player is "active".
If the count of distinct points for a player is 1, we'll essentially be adding 1 to cnt_inactive. Similarly, if the count of distinct points for a player is greater than 1, we'll be adding 1 to cnt_active.
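The two-level logic can be mirrored in a few lines of Python as a sanity check against the sample data: distinct points per player, then sums of 0/1 flags. This is a sketch, not the answer's SQL. `active_inactive` is a hypothetical name, and a set stands in for COUNT(DISTINCT ...).

```python
from collections import defaultdict

# Sample rows from the question: (name, date, points)
ROWS = [
    ("Peter", "2014-07-15", 5),
    ("John", "2014-07-15", 6),
    ("Bill", "2014-07-15", 3),
    ("Peter", "2014-07-16", 11),
    ("John", "2014-07-16", 12),
    ("Bill", "2014-07-16", 3),
]

def active_inactive(rows, dates):
    # Inner step: the set of distinct points values per player within the dates.
    distinct = defaultdict(set)
    for name, day, points in rows:
        if day in dates:
            values = distinct[name]  # every player in range gets a group, like GROUP BY
            if points is not None:   # COUNT(DISTINCT ...) ignores NULLs
                values.add(points)
    # Outer step: one 0/1 flag per player, summed into two counters.
    inactive = sum(1 for values in distinct.values() if len(values) < 2)
    active = sum(1 for values in distinct.values() if len(values) > 1)
    return inactive, active

print(active_inactive(ROWS, {"2014-07-15", "2014-07-16"}))
```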
If this doesn't make sense, let me know if you have questions.
NOTE: Ideally, we'd avoid using the DATE() function around the p.Date column reference, so we could enable an appropriate index.
If the Date column is defined as (MySQL datatype) DATE, then the DATE() function is unnecessary. If the Date column is defined as (MySQL datatype) DATETIME or TIMESTAMP, we could use an equivalent predicate:
WHERE p.Date >= '2014-07-15' AND p.Date < '2014-07-16' + INTERVAL 1 DAY
That looks more complicated, but a predicate of that form is sargable (i.e. MySQL can use an index range scan to satisfy it, rather than having to look at every row in the table.)
For performance, we'd probably benefit from an index with leading columns of name and date
... ON points (`name`,`date`)
(MySQL may be able to avoid a "Using filesort" operation for the GROUP BY).
I would solve this problem by looking at the previous number of points and then doing a comparison:
select date(date), count(*) as NumActives
from (select p.*,
(select p2.points
from points p2
where p2.name = p.name and p2.date < p.date
order by p2.date desc
limit 1
) as prev_points
from points p
) p
where prev_points is NULL or prev_points <> points
group by date(date);
Of course, you can add a where clause to get the count for any particular day.
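Here is the correlated subquery's logic as a Python sketch. The name `actives_per_day` is hypothetical, and like the un-indexed subquery it is quadratic. A player's first row has no previous value, so it counts as active, just as under the `prev_points is NULL` condition.

```python
from collections import Counter

# Sample rows from the question: (name, date, points)
ROWS = [
    ("Peter", "2014-07-15", 5),
    ("John", "2014-07-15", 6),
    ("Bill", "2014-07-15", 3),
    ("Peter", "2014-07-16", 11),
    ("John", "2014-07-16", 12),
    ("Bill", "2014-07-16", 3),
]

def actives_per_day(rows):
    """For each row, find the same player's most recent earlier points value
    and count the row as active when there is none or it differs."""
    counts = Counter()
    for name, day, points in rows:
        earlier = [(d, p) for n, d, p in rows if n == name and d < day]
        prev_points = max(earlier)[1] if earlier else None  # latest earlier date
        if prev_points is None or prev_points != points:
            counts[day] += 1
    return dict(counts)

print(actives_per_day(ROWS))
```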

MySQL view to generate monthly bonus

I'm building a website for a marketing company. As per their requirements, when a customer makes a booking, a certain bonus amount is distributed among employees based on their hierarchy. The distribution starts 60 days after the booking, and the bonus is given for 24 months.
The tables are
bookings
bid book_date
1 2012-05-09
2 2012-05-10
bonus
bid empid amount
1 1 300
1 2 400
2 2 300
2 3 400
Is it possible to write a MySQL view that generates the bonus an employee gets for every month? I didn't find a solution for how to do the update with a MySQL view. Any hint will be of great help.
Instead of a view, I would suggest writing a MySQL function that accepts the employee ID and returns the bonus.
Using a MySQL stored function, you will have more room to write procedural logic.
Inner join on bid and filter to include only eligible bonuses by comparing the book date to today's date. If today's date is less than 60 days after, or more than 24 months plus 60 days after, the original book date, exclude it. (You can go to mysql.com to learn more about how to manipulate dates in MySQL. I forget...)
You will be left with multiple rows containing only empid and amount. In the second round, use a "select empid, sum(amount) from (...put your other query here...) as t group by empid" to get the aggregate bonus per employee.
This approach (and I think any solution) requires a nested SQL statement, and so if you're not comfortable with that syntax you can use that term to explore in google or SO. Cheers!
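The two rounds described above can be sketched in Python under these assumptions:
- `amount` is the monthly payout per employee per booking;
- the payout window is [book_date + 60 days, that date + 24 months);
- "24 months" means calendar months.

`monthly_bonus` and `add_months` are hypothetical helpers, not MySQL functions.

```python
import calendar
from collections import defaultdict
from datetime import date, timedelta

# Sample data from the question
BOOKINGS = {1: date(2012, 5, 9), 2: date(2012, 5, 10)}         # bid -> book_date
BONUS = [(1, 1, 300), (1, 2, 400), (2, 2, 300), (2, 3, 400)]   # (bid, empid, amount)

def add_months(d, months):
    """Calendar-month addition, clamping the day (e.g. Jan 31 + 1 month -> Feb 29 in 2012)."""
    years, month0 = divmod(d.month - 1 + months, 12)
    year, month = d.year + years, month0 + 1
    return date(year, month, min(d.day, calendar.monthrange(year, month)[1]))

def monthly_bonus(today):
    """Round 1: join bonus to bookings on bid, keeping rows whose payout window
    [book_date + 60 days, +24 months) contains `today`.
    Round 2: sum the surviving amounts per empid."""
    totals = defaultdict(int)
    for bid, empid, amount in BONUS:
        first_payout = BOOKINGS[bid] + timedelta(days=60)
        if first_payout <= today < add_months(first_payout, 24):
            totals[empid] += amount
    return dict(totals)

print(monthly_bonus(date(2013, 1, 1)))
```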

Popularity Algorithm

I'm making a digg-like website that is going to have a homepage with different categories. I want to display the most popular submissions.
Our rating system is simply "likes", like "I like this" and whatnot. We basically want to display the submissions with the highest number of "likes" per time. We want to have three categories: all-time popularity, last week, and last day.
Does anybody know of a way to help? I have no idea how to go about doing this and making it efficient. I thought that we could use some sort of cron-job to run every 10 minutes and pull in the number of likes per the last 10 minutes...but I've been told that's pretty inefficient?
Help?
Thanks!
Typically Digg and Reddit-like sites go by the date of the submission and not the times of the votes. This way all it takes is a simple SQL query to find the top submissions for X time period. Here's a pseudo-query to find the 10 most popular links from the past 24 hours using this method:
select * from submissions
where (current_time - post_time) < 86400
order by score desc limit 10
Basically, this query says to find all the submissions where the number of seconds between now and the time each was posted is less than 86400, the number of seconds in 24 hours.
If you really want to measure popularity within X time interval, you'll need to store the post and time for every vote in another table:
create table votes (
post integer,
time datetime,
vote integer, -- +1 for upvote, -1 for downvote
foreign key (post) references submissions(id));
Then you can generate a list of the most popular posts between X and Y times like so:
select sum(vote), post from votes
where X < time and time < Y
group by post
order by sum(vote) desc limit 10;
From here you're just a hop, skip, and inner join away from getting the post data tied to the returned ids.
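To check the votes-table approach end to end, here is a sketch using Python's built-in sqlite3 module in place of MySQL. The schema is adapted to SQLite, with time stored as 'YYYY-MM-DD HH:MM:SS' text. The `title` column, the `top_posts` name and the sample rows are all made up. The query is the range aggregation above plus the inner join back to submissions.

```python
import sqlite3

def top_posts(conn, start, end, limit=10):
    """Sum votes per post strictly between start and end, then join back to
    the submission row to get the post data."""
    sql = """
        SELECT s.id, s.title, SUM(v.vote) AS score
        FROM votes v
        JOIN submissions s ON s.id = v.post
        WHERE ? < v.time AND v.time < ?
        GROUP BY s.id, s.title
        ORDER BY score DESC
        LIMIT ?
    """
    return conn.execute(sql, (start, end, limit)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE submissions (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE votes (
        post INTEGER REFERENCES submissions(id),
        time TEXT,    -- fixed-width text, so string comparison orders by time
        vote INTEGER  -- +1 for upvote, -1 for downvote
    );
""")
conn.executemany("INSERT INTO submissions VALUES (?, ?)",
                 [(1, "first post"), (2, "second post")])
conn.executemany("INSERT INTO votes VALUES (?, ?, ?)", [
    (1, "2014-07-15 10:00:00", 1),
    (1, "2014-07-15 11:00:00", 1),
    (2, "2014-07-15 12:00:00", 1),
    (2, "2014-07-16 09:00:00", -1),
    (2, "2014-07-14 09:00:00", 1),  # falls outside the window queried below
])
print(top_posts(conn, "2014-07-15 00:00:00", "2014-07-17 00:00:00"))
```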
Do you have a decent DB setup? Can we please hear about your CREATE TABLE details and indices? Assuming a sane setup, the DB should be able to pull the counts you require fast enough to suit your needs! For example (net of indices and keys, that somewhat depend on what DB engine you're using), given two tables:
CREATE TABLE submissions (subid INT, `when` DATETIME, etc etc)
CREATE TABLE likes (subid INT, `when` DATETIME, etc etc)
you can get the top 33 all-time popular submissions as
SELECT submissions.*, COUNT(likes.subid) AS score
FROM submissions
JOIN likes USING(subid)
GROUP BY submissions.subid
ORDER BY COUNT(likes.subid) DESC
LIMIT 33
and those voted for within a certain time range as
SELECT submissions.*, COUNT(likes.subid) AS score
FROM submissions
JOIN likes USING(subid)
WHERE likes.when BETWEEN initial_time AND final_time
GROUP BY submissions.subid
ORDER BY COUNT(likes.subid) DESC
LIMIT 33
If you were storing "votes" (positive or negative) in likes, instead of just counting each entry there as +1, you could simply use SUM(likes.vote) instead of the COUNTs.
For stable lists like all-time and last week, which aren't supposed to change very fast, I think you should save the list in your cache with an expiration time of around 1 day or longer.
If you are concerned about an accurate count in real time, you can check on every page view by comparing the page's score with the lowest-scoring page in the cache.
All you need to take care of is keeping the cache synchronized with the actual database.
thethanghn
Queries where the order is some function of the current time can become real performance problems. Things get much simpler if you can bucket by calendar time and update scores for each bucket as people vote.
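Here is a minimal sketch of the bucketing idea, assuming hourly buckets. Each vote increments one (post, hour) counter, and "last day" becomes the sum of the most recent 24 buckets, so the ranking query no longer depends on per-vote timestamps. The `HourlyScores` class is a hypothetical in-memory stand-in. In MySQL the same counters could live in a table keyed by (post, hour) and be updated with INSERT ... ON DUPLICATE KEY UPDATE.

```python
from collections import defaultdict
from datetime import datetime, timedelta

class HourlyScores:
    """In-memory stand-in for a hypothetical (post, hour, score) table:
    each vote increments one hourly bucket instead of being stored raw."""

    def __init__(self):
        self.buckets = defaultdict(int)  # (post, hour_start) -> summed votes

    @staticmethod
    def hour_of(ts):
        return ts.replace(minute=0, second=0, microsecond=0)

    def record_vote(self, post, ts, vote=1):
        self.buckets[(post, self.hour_of(ts))] += vote

    def top(self, now, hours, limit=10):
        """Sum the last `hours` buckets (including the current one) per post."""
        newest = self.hour_of(now)
        oldest = newest - timedelta(hours=hours - 1)
        totals = defaultdict(int)
        for (post, hour), score in self.buckets.items():
            if oldest <= hour <= newest:
                totals[post] += score
        # Highest score first; ties broken by post id for a deterministic result.
        return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]

scores = HourlyScores()
scores.record_vote("a", datetime(2014, 7, 15, 10, 5))
scores.record_vote("a", datetime(2014, 7, 15, 10, 30))
scores.record_vote("b", datetime(2014, 7, 15, 11, 15))
scores.record_vote("b", datetime(2014, 7, 14, 9, 0))  # older than the last 24 buckets
print(scores.top(datetime(2014, 7, 15, 12, 0), hours=24))
```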
To complete nobody_'s answer I would suggest you read up on the documentation (if you are using MySQL of course).
