I have a MySQL database that holds users' activities on the site. Each time a user completes something, I log the amount of time it took them to complete it. The field is stored as a decimal.
I want to know how I can get the lowest and the highest per-user sums. Let's say user1 has performed 2 tasks taking .5 each. Their sum would be 1.0. User2 has performed 10 tasks taking 1.5 each. Their sum would be 15. User3 has performed 20 tasks taking .25 each. Their sum would be 5.
So running a query over the DB the lowest amount would be 1 and the highest amount would be 15.
I know how to get the sum of a column, but I am not sure how to return the lowest and the highest of those sums.
Thanks.
You can do this with a subquery and two levels of aggregation:
select min(totaltime), max(totaltime)
from (select user, sum(amountoftime) as totaltime
      from t
      group by user
     ) per_user;
Actually getting the users associated with the min and max is a bit more difficult, but that is not this question.
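To see the two levels of aggregation in action, here is a minimal sketch that loads the three users from the question and runs the query. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL; the table name t and the column names user and amountoftime are taken from the answer's query, and the alias per_user is just an illustrative name.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (user TEXT, amountoftime REAL)")

# The sample data from the question: 2 x 0.5, 10 x 1.5, 20 x 0.25.
rows = [("user1", 0.5)] * 2 + [("user2", 1.5)] * 10 + [("user3", 0.25)] * 20
conn.executemany("INSERT INTO t (user, amountoftime) VALUES (?, ?)", rows)

# Inner query: one total per user. Outer query: min and max of those totals.
lowest, highest = conn.execute(
    """
    SELECT MIN(totaltime), MAX(totaltime)
    FROM (SELECT user, SUM(amountoftime) AS totaltime
          FROM t
          GROUP BY user) AS per_user
    """
).fetchone()

print(lowest, highest)
```

With the question's data, lowest comes out as 1.0 and highest as 15.0.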
I want to display 10 rows from the "questions" table, with probability 0.2 for questions that have type_id = 1 and probability 0.8 for questions that have type_id = 2.
Below is my query; how do I add the probability?
$query = "select * from questions ORDER BY RAND() LIMIT 10";
I want to display 10 questions, of which 20% have type_id = 2 and 80% have type_id = 1.
Can someone help me, please?
As I noted in the comments, you won't be able to use anything as obvious as ORDER BY RAND() if you want to include probabilities or anything like that. ORDER BY RAND() simply doesn't support that kind of thing. ORDER BY RAND() is also very slow, and not really suitable for use on a database of any significant size anyway.
There are a whole bunch of approaches you can use to do a random sort order with weighting or probabilities; I'm not going to try to discuss them all; I'll just give you a relatively simple one, but please be aware that the best technique for you will depend on your specific use case.
A simple approach would be something like this:
Create a new integer field on your table called weight or something similar.
Add a DB index for this field to enable you to query it quickly.
Set the first record to a value equal to its weighting as a whole number. ie a probability of 0.2 could be a weight of 20.
Set each subsequent record to the max value of this field plus the weight for that record. So if the second record is also 0.2, it would get a value of 40; if the one after that is only 0.1, it would be 50; and so on.
Do likewise for any new records that get added.
Now you can select a random record, with different weights for each record, as follows:
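-- Draw the random number once; RAND() written directly in WHERE is re-evaluated for every row.
SET @pick = FLOOR(RAND() * (SELECT MAX(weight) FROM questions));
SELECT * FROM questions
WHERE weight > @pick
ORDER BY weight
LIMIT 1;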
(Note: I'm writing this answer in a hurry and without the resources to test it; I haven't run this query, so I may have got the syntax wrong, but the basic technique is sound.)
This will pick a random number between zero and the largest weight value, and then find the question record with the lowest weight value that is still above that random number.
Also, because the weight field is indexed, this query will be quick and efficient.
Downsides of this technique: It assumes that the weights for any given record won't change. If the weight of a record does need to change, then you would have to update the weight value for every record after it in the index.
[EDIT]
Let's imagine a table like this:
id  Name
1   Question One
2   Question Two
3   Question Three
4   Question Four
5   Question Five
In this example, we want Questions 1 and 2 to have a probability of 0.2, question 3 to have a probability of 0.1 and questions 4 and 5 to have a probability of 0.3. Those probabilities can be expressed as integers by multiplying them by 100. (multiply by 10 also works, but 100 means we can have probabilities like 0.15 as well)
We add the weight column and the index for it, and set the weight values as follows:
id  Name            Weight
1   Question One    20
2   Question Two    40   (ie previous value + 20)
3   Question Three  50   (ie previous value + 10)
4   Question Four   80   (ie previous value + 30)
5   Question Five   110  (ie previous value + 30)
Now we can run our query.
The random part of the query, FLOOR(RAND() * (SELECT MAX(weight) FROM questions)), will select a whole number from 0 to 109. Let's imagine it gives 68.
Now the rest of our query says to pick the first record where the weight is greater than 68. In this case, that means that the record we get is record #4.
This gives us our probability because the random number could be anything, but is more likely to select a given record if the gap between its weight and the one before it is larger. You'll get record #4 three times as often as record #3.
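The same lookup can be sketched outside the database to check the arithmetic. This is only an illustration in Python, not the query itself: the weights list mirrors the example table, and the names cumulative, pick_index and pick_random are made up for this sketch.

```python
import bisect
import random

# Per-question weights from the example (probabilities multiplied by 100).
weights = [20, 20, 10, 30, 30]

# Running totals, i.e. the values stored in the weight column.
cumulative = []
total = 0
for w in weights:
    total += w
    cumulative.append(total)  # ends up as [20, 40, 50, 80, 110]

def pick_index(r):
    """Return the 0-based index of the first record whose weight is greater than r."""
    return bisect.bisect_right(cumulative, r)

def pick_random(rng=random):
    """Draw a whole number from 0 to max(weight) - 1 and look up the matching record."""
    return pick_index(rng.randrange(cumulative[-1]))

# Count how many of the 110 possible random values land on each record.
hits = [0] * len(weights)
for r in range(cumulative[-1]):
    hits[pick_index(r)] += 1

print(pick_index(68) + 1)  # record number chosen for the value 68
print(hits)
```

Each record is hit by exactly as many random values as its own weight, which is where the probabilities come from; the value 68 lands on record #4, as in the walkthrough.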
I am trying to calculate the class average for a class where students can take different numbers of subjects (e.g. two students in a class, of which one takes 3 subjects and the other 2). How do I get the class average that applies to all students? I already have their total term scores in a table. I know the formula for calculating the class average is simply the sum of the individual students' averages divided by the number of students, but I can't seem to get it working. Can anyone please help me write a query that will calculate and print out this class average?
Here is what I have tried so far:
SELECT student_id, SUM(CA_total)/count(term_total) AS average
FROM score_entry
GROUP BY student_id
Here is the outcome: [image: result from database]
But I want the final result to be the sum of 25 + 26.5, which is 51.5, divided by the 2 students in the class, which gives 25.75. This is the class average. I want the query or code that will echo out the final answer, 25.75. Thanks.
You are looking for the SQL AVG function. More about it here: http://www.w3schools.com/sql/sql_func_avg.asp
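One way AVG could be applied here is to average the per-student averages in an outer query. The sketch below is an assumption-laden illustration: it uses Python's sqlite3 module as a stand-in for MySQL, a cut-down score_entry table with only student_id and term_total, and made-up scores chosen so that the two students average 25 and 26.5 as in the question.

```python
import sqlite3

# In-memory SQLite table standing in for score_entry (simplified, hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE score_entry (student_id INTEGER, term_total REAL)")
conn.executemany(
    "INSERT INTO score_entry (student_id, term_total) VALUES (?, ?)",
    [(1, 20), (1, 25), (1, 30),   # student 1: 3 subjects, average 25
     (2, 26), (2, 27)],           # student 2: 2 subjects, average 26.5
)

# Inner query: one average per student. Outer query: average of those averages.
(class_average,) = conn.execute(
    """
    SELECT AVG(student_avg)
    FROM (SELECT student_id, AVG(term_total) AS student_avg
          FROM score_entry
          GROUP BY student_id) AS per_student
    """
).fetchone()

print(class_average)
```

With this sample data the result is 25.75, i.e. (25 + 26.5) / 2.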
I have a poll on my website with a 5-star rating:
1 star - 1 (worst)
2 stars - 2
3 stars - 3
4 stars - 4
5 stars - 5 (best)
Now, how should I store the poll records in MySQL? How do I calculate the rating?
The default rating value is 5, but if a user rates it 1 star, the value should change to 1 instead, and from then on it should be calculated somehow... First I need an idea of how to store the votes in my database. You probably have more experience with that.
Store votes in a separate table, this way you will have record on who has voted.
user_id, topic_id, vote, date will be enough for now. Calculating is easy: sum all the votes related to the topic and divide by the number of those votes. This will give you the average. If you want to show it as 1-5, you can round() it. To avoid doing this calculation every time you load a topic, you can store the result in a field in the topics table and update that field each time you add or remove a record in the votes table.
Just store the votes in an integer field (1 to 5) in the table, combined with other info (eg to make sure the user can vote only once).
When you want to show the result, you use the cast votes, eg to calculate an average, or other statistics.
Recalculating (and storing) the statistics after each vote is cast is also possible, but not really required unless you have many more page views than votes cast, in which case it might result in less resource usage. (This also depends on the complexity of your statistical calculations, of course.)
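As a rough sketch of the votes table described in the answers above, the following uses Python's sqlite3 module as a stand-in for MySQL. The column name voted_at, the CHECK constraint and the sample votes are assumptions made for illustration; the composite primary key is one possible way to enforce a single vote per user and topic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE votes (
        user_id  INTEGER NOT NULL,
        topic_id INTEGER NOT NULL,
        vote     INTEGER NOT NULL CHECK (vote BETWEEN 1 AND 5),
        voted_at TEXT DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (user_id, topic_id)  -- one vote per user per topic
    )
    """
)

# Three hypothetical users vote on topic 1.
conn.executemany(
    "INSERT INTO votes (user_id, topic_id, vote) VALUES (?, ?, ?)",
    [(1, 1, 5), (2, 1, 4), (3, 1, 2)],
)

# Average and number of votes for the topic, computed on demand.
avg_vote, num_votes = conn.execute(
    "SELECT AVG(vote), COUNT(*) FROM votes WHERE topic_id = ?", (1,)
).fetchone()
stars = round(avg_vote)  # whole-star value for display

# A second vote by the same user on the same topic is rejected.
try:
    conn.execute("INSERT INTO votes (user_id, topic_id, vote) VALUES (1, 1, 3)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(avg_vote, num_votes, stars, duplicate_rejected)
```

Storing the average in the topics table, as suggested above, would then just be an UPDATE with these computed values after each insert or delete.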
I have multiple tables/content types searched for a keyword and a fixed number of "result slots" for the autocomplete in the UI.
Let's assume there are 4 tables (persons,pages,articles,places) and 12 result slots. When a search returns 3 or more hits in each table, 3 results are displayed for each table.
I need an algorithm (preferably PHP) that increases the number of slots for a table when there are fewer than three results in the others. It should "fill up" the slots with results from the other tables as long as there are slots (and of course results) left.
e.g.
person: 6
pages: 3
articles: 2
places: 1
thanks!
Interesting question.
Let's say you have 4 categories A, B, C, D in order of priority.
Fetch the number of rows of A,B,C,D
The function min(3,X) returns the smaller of 3 and X. Now do your initial allocation of slots by
Alloc_A=min(3,A)
Alloc_B=min(3,B)
Alloc_C=min(3,C)
Alloc_D=min(3,D)
The results left over in each category (those that did not get a slot yet) are then:
Rem_A=A-Alloc_A
and so on.
The number of free slots is then:
free_slots=12-Alloc_A-Alloc_B-Alloc_C-Alloc_D
As for filling in the free slots, you can do it in proportion to the number of remaining results. We can allocate in proportion by
Alloc_A+=round(free_slots*Rem_A/(Rem_A+Rem_B+Rem_C+Rem_D))
Alloc_B+=round(free_slots*Rem_B/(Rem_A+Rem_B+Rem_C+Rem_D))
and so on. For example, if there are 4 free slots, with 9 results remaining in B and 3 in D, this will allocate 3 of the 4 slots to B and 1 to D. But this can get unfair if, say, B is 10 times as large as D. You can cap the others at, say, 3x the smallest one.
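The allocation steps above can be sketched roughly as follows. This is written in Python rather than PHP, and it deviates from the description in one place: instead of a plain round(), which can hand out one slot too many or too few, it floors the proportional shares and gives the leftover slots to the largest fractional parts. The function name allocate_slots and its parameters are made up for this sketch, and the suggested 3x cap is not implemented.

```python
def allocate_slots(counts, total_slots=12, base=3):
    """Distribute total_slots over categories given their result counts."""
    # Initial allocation: up to `base` slots per category.
    alloc = {k: min(base, n) for k, n in counts.items()}
    # Results that did not get a slot yet, and slots still unused.
    remaining = {k: counts[k] - alloc[k] for k in counts}
    free = total_slots - sum(alloc.values())
    total_remaining = sum(remaining.values())
    if free <= 0 or total_remaining == 0:
        return alloc

    # Never hand out more slots than there are results left.
    free = min(free, total_remaining)

    # Proportional shares, rounded down first.
    shares = {k: free * remaining[k] / total_remaining for k in counts}
    extra = {k: int(shares[k]) for k in counts}

    # Give the slots lost to rounding to the largest fractional parts.
    leftover = free - sum(extra.values())
    by_fraction = sorted(counts, key=lambda k: shares[k] - extra[k], reverse=True)
    for k in by_fraction[:leftover]:
        extra[k] += 1

    return {k: alloc[k] + extra[k] for k in counts}


# The example from the question: the 3 unused slots all go to persons.
print(allocate_slots({"persons": 6, "pages": 3, "articles": 2, "places": 1}))

# The example from the answer: 4 free slots, 9 left in B and 3 left in D.
print(allocate_slots({"A": 1, "B": 12, "C": 1, "D": 6}))
```

For the question's example this fills all 12 slots as persons 6, pages 3, articles 2, places 1; for the answer's example B ends up with 6 slots and D with 4.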
I have a news system where you can rate news with 1 to 5 stars. In the database I save the count, the sum and the absolute rating as an int up to 100 (for HTML output, so 5 stars would be 100 and 1 star would be 20 percent).
Now i have three toplists:
Best Rated
Most viewed
Most commented
The last two are simple, but the first is kind of tricky.
Before I took that thing over it was all a big mess, and they just put the 5 best-rated news items there. So if there was a news item rated 4.995 with 100k votes and another one with 5 stars from 1 vote, the "better rated" one with the single vote is on top, even though that is obviously ridiculous.
For the moment I have capped the list so that only news with a certain number of votes (like 10 or 20) can be in the list.
But I do not really like that. Is there a nice method to give those things a "weight" based on the count or something like that?
Have you considered using a weighted Bayesian rating system? It'll weight the results based on the number of votes and the vote values themselves.
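One common form of such a rating pulls every item toward a prior mean, with the pull fading as votes accumulate. The sketch below assumes that form; the function name bayesian_rating, the prior mean of 3.0 and the prior weight of 10 are arbitrary illustrative choices, not values from the thread. It only needs the vote sum and vote count that the question says are already stored.

```python
def bayesian_rating(vote_sum, vote_count, prior_mean, prior_weight):
    """Average that behaves as if `prior_weight` extra votes of `prior_mean` were cast."""
    return (prior_weight * prior_mean + vote_sum) / (prior_weight + vote_count)


# The two news items from the question (sum of stars, number of votes).
one_vote = bayesian_rating(5, 1, prior_mean=3.0, prior_weight=10)
many_votes = bayesian_rating(499_500, 100_000, prior_mean=3.0, prior_weight=10)

print(round(one_vote, 2), round(many_votes, 2))
```

With these assumed parameters the single 5-star vote scores about 3.18, while the item with 100k votes stays close to its 4.995 average, so it ranks first. A typical choice for the prior mean is the average rating across all items.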
You could explore the statistical confidence in the rating perhaps based around the average rating received for all entries and the standard deviation of all votes. While an entry has an average rating of 5, if you only have a few votes then you may not be able to say with more than 90% confidence that the actual rating is above 4.7 say. You can then rate the entries based upon the rating for which you have 90% confidence.
I'm not sure if this meets your requirement of being simple.
You could use the median of the user ratings as the total rating.
You would have five fields with each article, each one containing how many times the article was rated n stars. The median is then the star value at which the running total of those counts first reaches half of all votes. (Simply selecting the field with the biggest value would give you the most common rating, the mode, rather than the median.) The median has the advantage of ignoring the outliers in the ratings.
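A small sketch of reading the median off five per-star counters, written in Python for illustration. The function name median_from_counts and the sample counts are hypothetical; for an even number of votes it returns the lower of the two middle values.

```python
def median_from_counts(counts):
    """counts[i] is the number of votes with (i + 1) stars; returns the median star value."""
    total = sum(counts)
    if total == 0:
        return None  # no votes yet
    running = 0
    for stars, n in enumerate(counts, start=1):
        running += n
        # The median is the first star value where the running total covers half the votes.
        if running * 2 >= total:
            return stars


# Mostly 5-star votes with one 1-star outlier: the outlier does not move the median.
print(median_from_counts([1, 0, 0, 2, 10]))

# Here the most common rating is 1 star, but the median is 3 stars.
print(median_from_counts([4, 0, 3, 3, 0]))
```

The second example shows why the biggest counter and the median can disagree.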