Select 10 rows whose numeric field is closest to a value - php

In PHP, I have a contest question such as "How many people will participate?". I need to select the 10 answers closest to the actual number of participants.
I have a table called answers with an ID field and a number field.
Let's say the total number of participants is 100 and I want 10 results.
I need to select the 10 rows whose number is closest to 100, whether above or below it.
How can I do that?
Thanks,

Select the absolute difference (delta) and order by it:
select id, number, abs(100 - number) as delta
from answers
order by delta
limit 0, 10
Something like this.
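To see the query in action, here is a small sketch that runs it against an in-memory SQLite database from Python. SQLite stands in for MySQL (both accept LIMIT 0, 10), the guesses are made-up sample data, and the closest helper is a hypothetical name. An id tie-breaker is added so rows with the same delta come back in a predictable order; the query above leaves that order unspecified.

```python
import sqlite3


def closest(conn, target, n=10):
    """Hypothetical helper: (id, number, delta) rows for the n answers nearest target."""
    return conn.execute(
        "SELECT id, number, ABS(? - number) AS delta "
        "FROM answers ORDER BY delta, id LIMIT 0, ?",  # id breaks ties deterministically
        (target, n),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE answers (id INTEGER PRIMARY KEY, number INTEGER)")
# Made-up guesses; ids 1..14 are assigned in insertion order.
guesses = [3, 40, 75, 90, 96, 99, 100, 101, 104, 110, 120, 150, 500, 1000]
conn.executemany("INSERT INTO answers (number) VALUES (?)", [(g,) for g in guesses])

rows = closest(conn, 100)
print([number for _, number, _ in rows])
# [100, 99, 101, 96, 104, 90, 110, 120, 75, 150]
```

Both guesses above and below 100 are returned, which is what the question asks for.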

You can calculate the proximity as the absolute value of the difference:
$proximity = abs($answer - 100);
The smaller the value, the closer the answer!
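If the answers are already loaded into the application, the same idea works without SQL: compute abs(answer - 100) for each one and keep the 10 smallest. This Python sketch stands in for the PHP side; the sample data and the closest_answers name are illustrative assumptions. heapq.nsmallest avoids sorting the whole list when only the top 10 are needed, and it keeps ties in input order.

```python
import heapq


def closest_answers(answers, target, n=10):
    """Hypothetical helper: the n (id, number) pairs whose number is closest to target."""
    # Proximity is the absolute difference: the smaller, the closer.
    return heapq.nsmallest(n, answers, key=lambda a: abs(a[1] - target))


# Made-up (id, number) answers, ids starting at 1.
answers = list(enumerate([3, 40, 75, 90, 96, 99, 100, 101, 104, 110, 120, 150, 500, 1000], start=1))
picked = closest_answers(answers, 100)
print([number for _, number in picked])
# [100, 99, 101, 96, 104, 90, 110, 120, 75, 150]
```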

Related

select RAND() with probability

I want to display 10 rows of the "questions" table, with probability 0.2 for questions that have type_id = 1 and probability 0.8 for questions that have type_id = 2.
Below is my query; how can I add the probability?
$query = "select * from questions ORDER BY RAND() LIMIT 10";
I want to display 10 questions, of which 20% have type_id = 2 and 80% have type_id = 1.
Can someone please help me?
As I noted in the comments, you won't be able to use anything as simple as ORDER BY RAND() if you want to include probabilities or anything like that; ORDER BY RAND() on its own simply doesn't support it. ORDER BY RAND() is also very slow, because MySQL has to generate a random value for every row and then sort them all, so it isn't really suitable for a database of any significant size anyway.
There are many approaches you can use to produce a random sort order with weightings or probabilities. I won't try to discuss them all; I'll just give you a relatively simple one, but please be aware that the best technique for you will depend on your specific use case.
A simple approach would be something like this:
Create a new integer field on your table called weight or something similar.
Add a DB index for this field to enable you to query it quickly.
Set the first record's weight to its probability expressed as a whole number, i.e. a probability of 0.2 could be a weight of 20.
Set each subsequent record's weight to the current maximum value of this field plus that record's own weight. So if the second record also has a probability of 0.2, it gets a value of 40; if the one after that has only 0.1, it gets 50; and so on.
Do likewise for any new records that get added.
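The bookkeeping in these steps amounts to a running total of integer weights. A minimal Python sketch, assuming the probabilities are known up front and scaled by 100 as in the worked example further down (cumulative_weights is a hypothetical name):

```python
from itertools import accumulate


def cumulative_weights(probabilities, scale=100):
    """Hypothetical helper: turn probabilities into the cumulative 'weight' column values."""
    # round() turns e.g. 0.3 * 100 (30.000000000000004 in floating point) into the integer 30.
    return list(accumulate(round(p * scale) for p in probabilities))


print(cumulative_weights([0.2, 0.2, 0.1, 0.3, 0.3]))  # [20, 40, 50, 80, 110]
```

A newly added record simply gets the last value plus its own weight, which is the "do likewise" step.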
Now you can select a random record, with different weights for each record, as follows:
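-- Compute the random number once; RAND() inside WHERE would be re-evaluated for every row.
SET @r = FLOOR(RAND() * (SELECT MAX(weight) FROM questions));
SELECT * FROM questions
WHERE weight > @r
ORDER BY weight
LIMIT 1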
(Note: I'm writing this answer in a hurry and without the means to test it; I haven't run this query, so I may have got the syntax wrong, but the basic technique is sound.)
This picks a random whole number from zero up to (but not including) the largest weight value, and then returns the first question record, in weight order, whose weight is greater than that random number.
Also, because the weight field is indexed, this query will be quick and efficient.
Downside of this technique: it assumes that the weight of any given record won't change. If a record's weight does need to change, you would have to update the weight value of every record after it in the index.
[EDIT]
Let's imagine a table like this:
id  Name
1   Question One
2   Question Two
3   Question Three
4   Question Four
5   Question Five
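In this example, we want questions 1 and 2 to have a probability of 0.2, question 3 to have a probability of 0.1, and questions 4 and 5 to have a probability of 0.3. Those probabilities can be expressed as integers by multiplying them by 100. (Multiplying by 10 also works here, but 100 means we can have probabilities like 0.15 as well.) Strictly speaking, these values add up to 1.1 rather than 1, so they act as relative weights: each question's real probability is its share of the total, e.g. 20/110 for question 1.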
We add the weight column and the index for it, and set the weight values as follows:
id  Name            Weight
1   Question One    20
2   Question Two    40   (previous value + 20)
3   Question Three  50   (previous value + 10)
4   Question Four   80   (previous value + 30)
5   Question Five   110  (previous value + 30)
Now we can run our query.
The random part of the query, FLOOR(RAND() * (SELECT MAX(weight) FROM questions)), will produce a whole number from 0 to 109. Let's imagine it gives 68.
Now the rest of our query says to pick the first record, in weight order, whose weight is greater than 68. In this case, that means the record we get is record #4 (weight 80).
This gives us our probabilities because the random number is equally likely to be any value in that range, so a record is more likely to be selected when the gap between its weight and the previous record's weight is larger. Record #4 (gap 30) will come up three times as often as record #3 (gap 10).
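The lookup can be mirrored in Python with a binary search over the cumulative weights, which is roughly what the index on weight lets MySQL do. This is a sketch, not the answer's own code: pick is a hypothetical name, and bisect_right returns the position of the first weight strictly greater than r, matching "the first record whose weight is greater than 68".

```python
import bisect
import random


def pick(cumulative, r=None):
    """Hypothetical helper: 0-based index of the record chosen for random value r.

    Mirrors: WHERE weight > FLOOR(RAND() * MAX(weight)) ORDER BY weight LIMIT 1
    """
    if r is None:
        # A random whole number in [0, max weight), like FLOOR(RAND() * MAX(weight)).
        r = random.randrange(cumulative[-1])
    # First position whose cumulative weight is strictly greater than r.
    return bisect.bisect_right(cumulative, r)


weights = [20, 40, 50, 80, 110]  # cumulative weights from the example table
print(pick(weights, 68))  # 3, the 0-based index of record #4

# Enumerate every possible random value to see how often each record wins.
counts = [0] * len(weights)
for r in range(weights[-1]):
    counts[pick(weights, r)] += 1
print(counts)  # [20, 20, 10, 30, 30]
```

Each record wins exactly as many of the 110 possible values as its gap, so record #4 comes up three times as often as record #3.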

Get lowest sum and highest sum of users from MySQL

I have a MySQL database that holds users' activities on the site. Each time a user completes something, I log the amount of time it took them to complete it. The field is stored as a decimal.
I want to know how to get the lowest and the highest per-user sums. Let's say user1 has performed 2 tasks taking 0.5 each; their sum would be 1.0. User2 has performed 10 tasks taking 1.5 each; their sum would be 15. User3 has performed 20 tasks taking 0.25 each; their sum would be 5.
So a query over the DB should return 1 as the lowest amount and 15 as the highest.
I know how to get the sum of a column, but I'm not sure how to return the lowest and the highest sums.
Thanks.
You can do this with a subquery and two levels of aggregation:
select min(totaltime), max(totaltime)
from (select user, sum(amountoftime) as totaltime
from t
group by user
) totals;
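A quick way to check the query is to run it against the sample data from the question. This sketch uses Python's built-in sqlite3 module in place of MySQL; the table name t and the columns user and amountoftime come from the answer, while REAL is an assumption standing in for the DECIMAL column (SQLite has no exact decimal type, but these particular values are exact in binary floating point).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# REAL stands in for the DECIMAL column; 0.5, 1.5 and 0.25 are exact in binary.
conn.execute("CREATE TABLE t (user TEXT, amountoftime REAL)")
rows = [("user1", 0.5)] * 2 + [("user2", 1.5)] * 10 + [("user3", 0.25)] * 20
conn.executemany("INSERT INTO t (user, amountoftime) VALUES (?, ?)", rows)

# Inner level: one total per user. Outer level: min and max of those totals.
lowest, highest = conn.execute(
    """
    SELECT MIN(totaltime), MAX(totaltime)
    FROM (SELECT user, SUM(amountoftime) AS totaltime
          FROM t
          GROUP BY user) AS totals
    """
).fetchone()
print(lowest, highest)  # 1.0 15.0
```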
Actually getting the users associated with the min and max is a bit more difficult, but that is not this question.

Accurate star rating calculation

I need to calculate a star rating for a product.
I know how to calculate the weighted average, but it's not good enough.
Example: (5*252 + 4*124 + 3*40 + 2*29 + 1*33) / (252+124+40+29+33) = 1967 / 478 ≈ 4.115
I want to avoid the case where one product gets 1000 five-star ratings and one four-star rating, while another gets just a single five-star rating and ends up on top.
I know there is a way, but I couldn't find it.
Thanks.
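For reference, the weighted average in the example works out as follows (a plain Python sketch of the same arithmetic):

```python
counts = {5: 252, 4: 124, 3: 40, 2: 29, 1: 33}  # stars -> number of ratings
total_votes = sum(counts.values())
weighted_sum = sum(stars * n for stars, n in counts.items())
average = weighted_sum / total_votes
print(weighted_sum, total_votes, round(average, 3))  # 1967 478 4.115
```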
Try sorting your product records by a new column, (rating * votes).
It will help you find the most-voted product with the best rating.
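A minimal Python sketch of sorting by rating * votes, using the two products described in the question (the product dictionaries are illustrative, not real data):

```python
# Question's scenario: A has 1000 five-star ratings plus one four-star; B has one five-star.
products = [
    {"name": "A", "avg": (5 * 1000 + 4) / 1001, "votes": 1001},
    {"name": "B", "avg": 5.0, "votes": 1},
]
# Sort key is rating * votes, highest first.
ranked = sorted(products, key=lambda p: p["avg"] * p["votes"], reverse=True)
print([p["name"] for p in ranked])  # ['A', 'B']
```

Since rating * votes equals the total number of stars awarded, this key leans heavily on the vote count.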
Use a sort that also takes the number of ratings into account, something like the number of votes multiplied by the calculated average rating.
You can multiply the score by a weight function that penalizes products with fewer votes and converges to 1 as the number of votes grows. A parabola truncated at 1 should do the job.
Example:
convergence_step = 1000
if voters < convergence_step:
    meanscore = score * (voters / float(convergence_step)) ** 2
else:
    meanscore = score
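The pseudocode above can be wrapped in a function and applied to the two products from the question. A sketch, assuming score is the plain average rating and voters is the number of ratings; weighted_score is a hypothetical name.

```python
def weighted_score(score, voters, convergence_step=1000):
    """Hypothetical helper: scale score by (voters / convergence_step)^2, capped at 1."""
    if voters < convergence_step:
        # Parabolic penalty: few voters -> factor close to 0.
        return score * (voters / convergence_step) ** 2
    # Enough voters: the factor is truncated at 1, so the score is unchanged.
    return score


a = weighted_score((5 * 1000 + 4) / 1001, 1001)  # 1000 five-star + one four-star
b = weighted_score(5.0, 1)                        # a single five-star rating
print(a > b)  # True
```

With half the convergence step (500 voters) the factor is 0.25, so a 4.0 average is shown as 1.0.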

Fastest way to get row number with percentage selects (LIMIT X, 1)?

In my MySQL database I have a table (PERSONS) with over 10 million rows; the two important columns are:
ID
POINTS
I would like to know the rank of the person with ID = randomid.
I want to show each person their "rank", which depends on their points. However, the rank won't be the exact row number but more like a percentage layer, e.g. "You are in the top 5%" or "You are in the 10% - 15% layer".
Of course I could query the table and convert the row number to a layer percentage by dividing it by the total number of rows. But my question is: would it be faster (with 10M+ rows) to just fetch a few rows with LIMIT X, 1, where X is the row at the 100%, 95%, 90%, 85%, ... position of the table? Next step: check whether the points of that row are lower than the current person's points; if so, fetch the next layer's row; if not, return the previous layer.
The PERSONS table has 9 columns: 2 BIGINTs, 4 VARCHAR(150)s, 1 DATE and 2 BOOLEANs.
Of course I would prefer to get the exact rank, but in my tests that is slow and takes at least several seconds, whereas my way can be done in a few hundredths of a second.
Also, my approach is not precise when several layers have the same points, but it doesn't need to be that precise, so we can ignore that.
Extra info: I program in PHP, so a PHP + MySQL-specific solution would be welcome too.
Finally, it's worth mentioning that the table grows by about 20k rows an hour (almost 500k a day).
I appreciate all the help.
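However the rank is obtained, turning it into a layer label is simple integer arithmetic. A Python sketch, assuming rank is a 1-based position when rows are ordered by points descending and that layers are 5% wide; percent_band is a hypothetical name.

```python
def percent_band(rank, total, step=5):
    """Hypothetical helper: map a 1-based rank to a label like "top 5%" or "10% - 15%"."""
    # Exact integer ceiling of rank * 100 / (total * step): the number of step-wide layers.
    bands = -(-rank * 100 // (total * step))
    upper = bands * step
    if upper <= step:
        return f"top {step}%"
    return f"{upper - step}% - {upper}%"


print(percent_band(1, 100))   # top 5%
print(percent_band(12, 100))  # 10% - 15%
```

Integer arithmetic avoids floating-point surprises right at the layer boundaries (for example rank 5 of 100).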
You could try this. I first count the number of rows with more points and then add one to that. This way, if a number of rows have the same number of points, they all get the same rank: if there are 10 rows with the same number of points, they all have the same rank as the first one in that group.
-- RANK is a reserved word in MySQL 8.0 and later, so the alias is person_rank.
SELECT SUM(CASE WHEN points > (SELECT POINTS FROM YOUR_TABLE WHERE ID = randomid) THEN 1 ELSE 0 END) + 1 AS person_rank,
(SUM(CASE WHEN points > (SELECT POINTS FROM YOUR_TABLE WHERE ID = randomid) THEN 1 ELSE 0 END) + 1) / COUNT(*) AS pct
FROM YOUR_TABLE
If that is slow, I would run two queries. First get that ID's points and then plug that into a second query to determine the rank/pct.
SELECT POINTS
FROM YOUR_TABLE
WHERE ID = randomid
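Then compute the rank and pct, binding the points from above to the :points placeholder (for example with PDO). Writing points > POINTS would just compare the column with itself, since MySQL column names are case-insensitive.
SELECT r.person_rank, r.person_rank / r.total AS pct
FROM (SELECT SUM(CASE WHEN points > :points THEN 1 ELSE 0 END) + 1 AS person_rank,
             COUNT(*) AS total
      FROM YOUR_TABLE) AS r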
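The two-query approach can be tried out with Python's sqlite3 module. This sketch uses a hypothetical persons table with made-up points, and rank_and_pct is a hypothetical name. SQLite, unlike MySQL, performs integer division on integers, so the percentage is computed in Python here rather than with / in SQL, and ? placeholders replace the PHP-side value binding.

```python
import sqlite3


def rank_and_pct(conn, person_id):
    """Hypothetical helper: (rank, rank / total rows) for one person."""
    # Query 1: fetch this person's points.
    (points,) = conn.execute(
        "SELECT points FROM persons WHERE id = ?", (person_id,)
    ).fetchone()
    # Query 2: count rows with strictly more points; +1 gives a 1-based rank,
    # so tied rows share the rank of the first row in their group.
    higher, total = conn.execute(
        "SELECT SUM(CASE WHEN points > ? THEN 1 ELSE 0 END), COUNT(*) FROM persons",
        (points,),
    ).fetchone()
    rank = higher + 1
    return rank, rank / total


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (id INTEGER PRIMARY KEY, points INTEGER)")
conn.executemany(
    "INSERT INTO persons (id, points) VALUES (?, ?)",
    [(1, 50), (2, 40), (3, 40), (4, 30), (5, 20), (6, 10)],
)
print(rank_and_pct(conn, 3))  # (2, 0.3333333333333333)
```

Persons 2 and 3 both have 40 points, so they share rank 2.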

Calculating poll votes

I have a poll on my website with a 5-star rating:
1 star - 1 (worst)
2 stars - 2
3 stars - 3
4 stars - 4
5 stars - 5 (best)
Now, how should I store the poll records in MySQL, and how should I calculate the result?
The default rating is 5, but if a user rates it 1 star, the value should change to 1 and from then on be calculated somehow... First, I need an idea of how to store the votes in my database; you probably have more experience with that.
Store votes in a separate table; this way you will have a record of who has voted.
The columns user_id, topic_id, vote and date will be enough for now. Calculating is easy: sum all votes and divide by the total number of votes for the topic. This gives you the average. If you want to show it as 1-5, you can round() it. To avoid doing this calculation every time you load a topic, you can store it in a field in the topics table and update that field each time you add or remove a record from the votes table.
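A sketch of that design, using Python's sqlite3 module in place of MySQL. The columns user_id, topic_id, vote and date come from the answer; the CHECK and UNIQUE constraints (the latter enforcing one vote per user per topic), the cached avg_rating column and the add_vote helper are hypothetical additions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE topics (
        id INTEGER PRIMARY KEY,
        avg_rating REAL                      -- hypothetical cached average
    );
    CREATE TABLE votes (
        user_id  INTEGER NOT NULL,
        topic_id INTEGER NOT NULL REFERENCES topics(id),
        vote     INTEGER NOT NULL CHECK (vote BETWEEN 1 AND 5),
        date     TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
        UNIQUE (user_id, topic_id)           -- one vote per user per topic
    );
    INSERT INTO topics (id) VALUES (1);
    """
)


def add_vote(conn, user_id, topic_id, vote):
    """Hypothetical helper: record a vote and refresh the cached average together."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "INSERT INTO votes (user_id, topic_id, vote) VALUES (?, ?, ?)",
            (user_id, topic_id, vote),
        )
        # Average = sum of all votes / number of votes for this topic.
        conn.execute(
            "UPDATE topics SET avg_rating = "
            "(SELECT AVG(vote) FROM votes WHERE topic_id = ?) WHERE id = ?",
            (topic_id, topic_id),
        )


for user_id, vote in [(10, 5), (11, 1), (12, 4)]:
    add_vote(conn, user_id, 1, vote)

avg = conn.execute("SELECT avg_rating FROM topics WHERE id = 1").fetchone()[0]
print(round(avg, 2), round(avg))  # 3.33 3
```

A second vote by the same user on the same topic raises sqlite3.IntegrityError and leaves the cached average untouched.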
Just store the votes in an integer field (1 to 5) in the table, combined with other info (e.g. to make sure the user can vote only once).
When you want to show the result, use the cast votes to calculate an average or other statistics.
Recalculating (and storing) the statistics after each vote is cast is also possible but not really required. If you have many more page views than votes cast, however, it might result in less resource usage. (This also depends on the complexity of your statistical calculations, of course.)
