Calculating overall rating - php

If I have a series of 10 objects, each rated from 1 to 10, how can I calculate the overall rating?
For example, if I have a list like this:
Entertainment - 8/10
Fun - 9/10
Comedy - 6/10
Dance - 8/10
and so on, for 10 objects in total. How do I calculate the overall rating out of 10?
Overall - ?/10
I am very weak at maths. Someone told me to add up all the scores; if the total is 83, then the overall rating is 8.3/10. Is this correct?
I am doing this for my PHP website, so if someone knows how to write a query for this, that would be very helpful.

Average the ratings and you will get the answer.
What you were told is correct, provided there are exactly 10 criteria being scored.
SELECT avg(score) FROM tbl
MySQL has a built-in function for this. See:
http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_avg

Yes, to get the average, add them all together and divide by how many there are. Example:
//do a MySQL query instead of this
$result_out_of_10 = array(
'fun' => 9,
'comedy' => 6,
'dance' => 8
);
$total = 0;
$total_results = 0;
foreach( $result_out_of_10 as $result )
{
$total += $result;
$total_results++;
}
$final_average_out_of_10 = $total / $total_results;
print "Average rating: $final_average_out_of_10 out of 10.";
EDIT: Meherzad has a better way - using the MySQL AVG() function - which I didn't know about. Use his way instead (although mine still works, it's more code than necessary).
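If the scores are already in a PHP array rather than in a table, the loop above can be shortened with the built-in array_sum() and count(). This is only a minimal sketch: the function name overall_rating and the ten sample scores are made up for illustration.

```php
<?php
// Hypothetical helper: mean of a list of scores, rounded to one decimal place.
function overall_rating(array $scores)
{
    if (count($scores) === 0) {
        return null; // no scores, so there is nothing to average
    }
    return round(array_sum($scores) / count($scores), 1);
}

// Ten sample scores (illustrative data) that add up to 83.
$scores = array(8, 9, 6, 8, 9, 8, 9, 8, 9, 9);
echo "Overall: " . overall_rating($scores) . "/10\n";
```

With ten scores totalling 83 this gives 8.3, which matches the add-them-up rule from the question. That shortcut only holds when there are exactly ten scores; dividing by count() works for any number.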

Related

PHP - Optimize finding closest point in an Array

I have created a script which takes a big array of points and, for each one, finds the closest point in 3D space from a limited array of chosen points. It works great. However, sometimes I get over 2 million points to compare against an array of 256 items, so that is over 530 million calculations! This takes a considerable amount of time and power (given that it will be doing such comparisons a few times a minute).
I have a limited group of 3D coordinates like this:
array (size=XXX)
0 => 10, 20, 30
1 => 200, 20, 13
2 => 36, 215, 150
3 => ...
4 => ...
... // this is limited to max 256 items
Then I have another very large group of, let's say, random 3D coordinates which can vary in size from 2,500 to over 2,000,000 items. Basically, what I need to do is to iterate through each of those points and find the closest point. To do that I use Euclidean distance:
$d = \sqrt{(q_1-p_1)^2 + (q_2-p_2)^2 + (q_3-p_3)^2}$
This gives me the distance, which I compare to the closest distance found so far. If it is smaller, it replaces the closest; otherwise I continue with the next point.
I have been looking at how to change it so I don't have to do so many calculations. I have been looking at Voronoi diagrams, with the idea of placing the points in the diagram and then seeing which section each one belongs to. However, I have no idea how I can implement such a thing in PHP.
Any idea how I can optimize it?
Just a quick shot from the hip ;-)
You should be able to gain a nice speed-up if you don't compare each point with every other point. Many points can be skipped because they are already too far away if you just look at one of the x/y/z coordinates.
<?php
$coord = array(18,200,15);
$points = array(
array(10,20,30),
array(200,20,13),
array(36,215,150)
);
$closestPoint = $closestDistance = false;
foreach($points as $point) {
list($x,$y,$z) = $point;
// Not compared yet, use first point as closest
if($closestDistance === false) {
$closestPoint = $point;
$closestDistance = distance($x,$y,$z,$coord[0],$coord[1],$coord[2]);
continue;
}
// If distance in any direction (x/y/z) is bigger than closest distance so far: skip point
if(abs($coord[0] - $x) > $closestDistance) continue;
if(abs($coord[1] - $y) > $closestDistance) continue;
if(abs($coord[2] - $z) > $closestDistance) continue;
$newDistance = distance($x,$y,$z,$coord[0],$coord[1],$coord[2]);
if($newDistance < $closestDistance) {
$closestPoint = $point;
$closestDistance = $newDistance; // reuse the value computed above
}
}
var_dump($closestPoint);
function distance($x1,$y1,$z1,$x2,$y2,$z2) {
return sqrt(pow($x1-$x2,2) + pow($y1 - $y2,2) + pow($z1 - $z2,2));
}
A working code example can be found at http://sandbox.onlinephpfunctions.com/code/8cfda8e7cb4d69bf66afa83b2c6168956e63b51e
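Another cheap saving is to skip the square root altogether. Because sqrt() is monotonic, the point with the smallest squared distance is also the point with the smallest distance. A rough sketch of that idea follows; the function name closest_point is hypothetical and it assumes every point is an array of three numbers.

```php
<?php
// Hypothetical helper: returns the entry of $points nearest to $coord.
// It compares squared distances, so sqrt() is never called inside the loop.
function closest_point(array $coord, array $points)
{
    $best = null;
    $bestSq = INF;
    foreach ($points as $point) {
        $dx = $point[0] - $coord[0];
        $dy = $point[1] - $coord[1];
        $dz = $point[2] - $coord[2];
        $sq = $dx * $dx + $dy * $dy + $dz * $dz;
        if ($sq < $bestSq) {
            $bestSq = $sq;
            $best = $point;
        }
    }
    return $best; // null when $points is empty
}

$points = array(array(10, 20, 30), array(200, 20, 13), array(36, 215, 150));
print_r(closest_point(array(18, 200, 15), $points));
```

For the sample data this picks (36, 215, 150), whose squared distance of 18774 is the smallest of the three. The per-axis skip from the answer above can be combined with this by comparing each squared axis difference against $bestSq.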

I'm creating a random array in PHP and my code doesn't seem to output a truly random answer

I want to construct an array of 3 offers that are output in a random order. I have the following code, and whilst it does output 3 random offers, the order doesn't appear to be random. The first value in the generated array always seems to be one of the first 2 records in my offers table. The offers table only has 5 records in it (I don't know if this is affecting things).
$arrayOfferCount = $offerCount-1;
$displayThisManyOffers = 3;
$range = range(0, $arrayOfferCount);
$vals = array_rand($range, $displayThisManyOffers);
Any help or advice would be appreciated.
Working fine here. Benchmark it over lots of runs instead of just gut feeling... here it is for 1,000 tries:
<?php
$offerCount = 5;
$arrayOfferCount = $offerCount-1;
$displayThisManyOffers = 3;
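$range = range(0, $arrayOfferCount);
$counts = array_fill(0, $offerCount, 0); // start every tally at 0 to avoid undefined-index notices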
for($i = 0; $i < 1000; $i++) {
$vals = array_rand($range, $displayThisManyOffers);
foreach($vals as $val) {
$counts[$val]++;
}
}
sort($counts);
print_r($counts);
Generates:
Array
(
[0] => 583
[1] => 591
[2] => 591
[3] => 610
[4] => 625
)
I know that mt_rand() is a much better PRNG.
However, in your case you can let the database select them for you:
SELECT * FROM ads ORDER BY RAND() LIMIT 0, 3
It is probably picking randomly, but array_rand() returns the chosen keys in the same order they appear in your array, so the first value is always the smallest of the three indexes. With 5 records, the first record comes first in 6 of the 10 possible combinations and the second record in 3 of them; the third record comes first only when exactly the last 3 out of 5 are chosen, which is 1 combination in 10. If you run it enough times you should see the third one show up first about once in every 10 tries.
array_rand() seems not to work properly in some cases (see the comments in the PHP manual).
Workaround: get the array size and pick a random index with mt_rand().
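If the goal is three distinct offers in a random order, one option is to shuffle the whole index range and take the first three. The sketch below assumes the offers are addressed by the indexes 0 to $offerCount - 1, as in the question; the function name pick_random_offers is made up.

```php
<?php
// Hypothetical helper: picks $howMany distinct indexes from 0..$offerCount-1
// and returns them in random order.
function pick_random_offers($offerCount, $howMany)
{
    $indexes = range(0, $offerCount - 1);
    shuffle($indexes); // randomises the order in place
    return array_slice($indexes, 0, $howMany);
}

print_r(pick_random_offers(5, 3));
```

Unlike array_rand(), which hands back keys in their original order, every index has the same chance of landing in the first position here.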

PHP lottery chances

I am working on a lottery script where, from a list of 5 users, a random one is chosen. My problem is that some of the users must have a 50% higher chance of winning.
For example
bob - 1 chance
rob - 1.5 chance
mike - 1 chance
john - 1 chance
todd - 1.5 chance
How can I make this work? I was thinking of building an array and using array_rand to get a random user, but I have no idea how to distribute the chances.
Thanks
For an uneven distribution like this, the approach is as follows:
Add all the weightings together, to get a total. This is the figure that you would use as a random number ceiling. In your example, this would be 6.
Now build an array in which each element holds the running total: its own weight plus the weights of all the elements before it (the sort order doesn't matter).
Thus you would have an array like this:
bob = 1
rob = 2.5
mike = 3.5
john = 4.5
todd = 6
Now you can get a random number between 0 and the total, and pick the first array element whose running total is greater than the random number.
This will give you your weighted randomness, regardless of how uneven the weightings are.
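The running-total approach can be sketched roughly as follows. The function name pick_weighted is hypothetical, and the random number is passed in as a parameter so that the selection logic can be checked with fixed values; it is assumed to satisfy 0 <= $r < total.

```php
<?php
// Hypothetical helper: $weights maps name => weight, and $r is a random
// number with 0 <= $r < array_sum($weights). Returns the first name whose
// running total exceeds $r.
function pick_weighted(array $weights, $r)
{
    $running = 0;
    foreach ($weights as $name => $weight) {
        $running += $weight;
        if ($r < $running) {
            return $name;
        }
    }
    return null; // only reached if $r is out of range
}

$weights = array('bob' => 1, 'rob' => 1.5, 'mike' => 1, 'john' => 1, 'todd' => 1.5);
// mt_rand() / (mt_getrandmax() + 1) is a float in [0, 1); scale it to [0, total).
$r = mt_rand() / (mt_getrandmax() + 1) * array_sum($weights);
echo pick_weighted($weights, $r) . " won\n";
```

Here bob owns the range 0 to 1, rob 1 to 2.5, mike 2.5 to 3.5, john 3.5 to 4.5 and todd 4.5 to 6, so rob and todd each cover 1.5 times as much of the range as the others. A float is used for $r because an integer would not respect the half-weights.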
Change your list to
bob - 1 chance
bob - 1 chance
rob - 1.5 chance
rob - 1.5 chance
rob - 1.5 chance
mike - 1 chance
mike - 1 chance
john - 1 chance
john - 1 chance
todd - 1.5 chance
todd - 1.5 chance
todd - 1.5 chance
Then when you select one entry at random from this list, rob and todd (3 entries each) have 50% more chance of winning than the others (2 entries each).
By Changes you probably mean chances.
Well, what you can do is make an array, that has the users:
$boughtLotteryTicket = array('bob', 'rob', 'rob', 'mike', 'john', 'todd', 'todd');
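And pick from it however you like :). Since rob and todd are in there twice, they each have double the chance of winning; for exactly 50% more, list everyone else twice and rob and todd three times.
Same principle as when participating in any lottery: buy more tickets == higher chance of winning.
Sum up all the chances, then walk through the users, accumulating each one's share of the total: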
$sum = $pSum = 0;
foreach($users as $participant)
$sum += $participant['chance'];
// random float in [0, 1); getrandmax() is the largest value rand() can return
$rand = rand() / (getrandmax() + 1);
foreach($users as $participant) {
$pSum += $participant['chance']/$sum;
if($pSum > $rand) {
$winner = $participant;
break;
}
}
This is probably not the best way to handle it, but for the sake of getting something this small done (I assume it is for something work related), make a simple array of names. I take a 50% extra chance to win as roughly equal to 2 in 1, so I would duplicate those names in the array.
$lottoNames = array('bob', 'rob', 'rob', 'mike', 'john', 'todd', 'todd');
$x = count($lottoNames)-1;
$lotto = rand(0, $x);
echo $lottoNames[$lotto];
I can't say for sure that this would work, and it's a shoddy idea, but it is one to run with for the simplicity of it all.
This might work:
$chance = array(
'bob' => 1,
'rob' => 1.5,
'mike' => 1,
'john' => 1,
'todd' => 1.5,
);
$range = array_sum($chance);
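// random float in [0, $range); an integer offset would not respect the half-weights
$offset = mt_rand() / (mt_getrandmax() + 1) * $range;
$i = 0;
foreach ($chance as $person => $weight)
{
$i += $weight;
if ($offset < $i)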
{
echo $person . ' won';
break;
}
}

how to rate or rank votes

I'm really sorry if my question is unclear, but I would like some ideas for a ranking algorithm that takes into account the time at which votes were submitted.
Nice Question!
Okay, let's bring it on!
First of all, one thing you cannot ignore when calculating good ratings is the Bayesian average.
You can read up on it, but put very simply it takes care of the following:
Entries with few votes are not rated at the true mean of their votes; their rating also has a component of the mean rating across your whole dataset. For example, on IMDb the default rating is somewhere around 6.4, so a film with only 2 votes of 10 stars each may still end up somewhere between 6 and 7. The more votes there are, the more meaningful they become together, and the rating is "pulled away" from the default. IMDb also requires a minimum number of votes for a movie to show up in listings.
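One common way to write this down is a weighted mix of the item's own mean and the global mean. The sketch below is an assumption about the general technique, not IMDb's actual formula: the function name bayesian_average and the prior weight of 25 votes are made up for illustration, and $priorVotes is assumed to be greater than 0.

```php
<?php
// Hypothetical helper implementing one common form of the Bayesian average:
//   (votes * itemMean + priorVotes * globalMean) / (votes + priorVotes)
// $priorVotes is a tuning constant: how many "default" votes every item starts with.
function bayesian_average($itemMean, $votes, $globalMean, $priorVotes)
{
    return ($votes * $itemMean + $priorVotes * $globalMean) / ($votes + $priorVotes);
}

// Two 10-star votes, global mean 6.4, prior weight of 25 votes (all illustrative).
echo round(bayesian_average(10, 2, 6.4, 25), 2) . "\n";
```

With these numbers the film lands at about 6.67 rather than 10, and the result moves towards the item's own mean as the vote count grows.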
Another thing that I find confusing is: Why is the time of the vote important? Didn't you mean the time of the entry that was voted on? So in our movies example just released movies are more important?
But anyway! In both cases good results are often achieved by applying non-linear functions such as logarithms or square roots.
For our movie example, a movie's relevance could be multiplied by
1 + 1/SQRT(1 + CURRENT_YEAR - RELEASE_YEAR )
So 1 is the base rating that every movie gets.
A movie from the current year gets a boost of 100% (200% relevance), as the above evaluates to 2. Last year's movies get about 171%, 2-year-old movies about 158%, and so on.
But the difference between a movie from 1954 and one from 1963 is nowhere near as great.
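To see how quickly the boost flattens out, the multiplier can be evaluated for a few ages. This is just a sketch of the formula above; the function name age_boost and the reference year 2024 are arbitrary choices for the example.

```php
<?php
// Hypothetical helper for the multiplier 1 + 1/SQRT(1 + CURRENT_YEAR - RELEASE_YEAR).
function age_boost($releaseYear, $currentYear)
{
    $age = max(0, $currentYear - $releaseYear); // treat future releases as age 0
    return 1 + 1 / sqrt(1 + $age);
}

foreach (array(0, 1, 2, 10, 60) as $age) {
    printf("%2d years old: %.0f%%\n", $age, 100 * age_boost(2024 - $age, 2024));
}
```

The first few years differ by tens of percentage points, while movies that are decades old differ by less than one point per decade.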
So remember:
Everything you use in your calculations. Is it really linear? May it distort your ratings? Are the relations throughout the dataset sane?
If you want recent votes to count more, you can do that in exactly the same way by weighting your votes. It also makes sense if you want recently voted items to be "warmed up", for example because they are currently hot and being discussed in your community.
That being said, it remains hard work, with a lot of playing around.
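As a rough illustration of weighting votes by age, the sketch below computes a weighted mean in which each vote counts for 1/sqrt(1 + age in days). The function name, the array keys score and age_days, and the choice of weight are all assumptions for the example, not an established standard.

```php
<?php
// Hypothetical helper: weighted mean of votes where newer votes count more.
// Each vote is array('score' => ..., 'age_days' => ...).
function time_weighted_rating(array $votes)
{
    $weightedSum = 0.0;
    $weightTotal = 0.0;
    foreach ($votes as $vote) {
        $weight = 1 / sqrt(1 + $vote['age_days']); // illustrative decay
        $weightedSum += $weight * $vote['score'];
        $weightTotal += $weight;
    }
    return $weightTotal > 0 ? $weightedSum / $weightTotal : null;
}

$votes = array(
    array('score' => 9, 'age_days' => 0),  // cast today, weight 1
    array('score' => 3, 'age_days' => 99), // cast 99 days ago, weight 0.1
);
echo round(time_weighted_rating($votes), 2) . "\n";
```

A plain average of these two votes would be 6, whereas the weighted result is about 8.45 because today's vote dominates.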
Let me give you one last idea.
At the company where I work, we calculate a relevance for movies.
We have a config array where we store the "weighting" of several factors in the final relevance.
It looks like this:
$weights = array(
"year" => 2, // release year
"rating" =>13, // rating 0-100
"cover" => 4, // cover available?
"shortdescription" => 4, // short descr available?
"trailer" => 3, // trailer available?
"imdbpr" => 13, // google pagerank of imdb site
);
Then we calculate a value between 0 and 1 for every metric. There are different methods, but let me show you our release-year metric as an example. (The rating is itself an aggregate of ratings from several platforms that we crawl, which have different weightings themselves.)
$yearDiff = date('Y') - $data["year"]; // age of the movie in years
//year
if (!$data["year"]){
$values['year'] = 0;
} else if($yearDiff==0) {
$values['year'] = 1;
} else if($yearDiff < 3) {
$values['year'] = 0.8;
} else if($yearDiff < 10) {
$values['year'] = 0.6;
} else {
$values['year'] = 1/sqrt(abs($yearDiff));
}
So you see we hardcoded some "age intervals" and relied on the sqrt function only for older movies. In fact the difference there is minimal, so the SQRT example here is a very poor one.
But mathematical functions are very often useful!
You can, for example, also use periodic functions like sine curves to calculate seasonal relevance! For example, if you map the year onto a range from 0 to 1, you can use a sine function to weight up summer hits / winter hits / autumn hits for the current time of the year!
One last example, for the IMDb pagerank. It is completely hardcoded, as there are only 10 different values possible and they are not distributed in a statistically homogeneous way (pagerank 1 or 2 is even worse than none):
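A seasonal weight along those lines could be sketched like this. It uses a cosine, which is just a shifted sine, so that the weight peaks at a chosen point of the year. The function name seasonal_weight and the peak position 0.55 (roughly mid-July) are illustrative assumptions.

```php
<?php
// Hypothetical helper: $yearFraction and $peakFraction are positions in the
// year between 0 and 1 (0 = 1 January). Returns a weight between 0 and 1
// that is 1 at the peak and 0 half a year away from it.
function seasonal_weight($yearFraction, $peakFraction)
{
    return 0.5 + 0.5 * cos(2 * M_PI * ($yearFraction - $peakFraction));
}

// date('z') is the zero-based day of the year; dividing by 365 is approximate in leap years.
$today = date('z') / 365;
echo "Summer-hit weight today: " . round(seasonal_weight($today, 0.55), 2) . "\n";
```

The result can be fed into the weighting scheme like any other metric, since it already lies between 0 and 1.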
if($imdbpr >= 7) {
$values['imdbpr'] = 1;
} else if($imdbpr >= 6) {
$values['imdbpr'] = 0.9;
} else if($imdbpr >= 5) {
$values['imdbpr'] = 0.8;
} else if($imdbpr >= 4) {
$values['imdbpr'] = 0.6;
} else if($imdbpr >= 3) {
$values['imdbpr'] = 0.5;
} else if($imdbpr >= 2) {
$values['imdbpr'] = 0.3;
} else if($imdbpr >= 1) {
$values['imdbpr'] = 0.1;
} else if($imdbpr >= 0) {
$values['imdbpr'] = 0.0;
} else {
$values['imdbpr'] = 0.4; // no pagerank available. probably new
}
Then we sum it up like this:
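$malus = 0; // start the weighted sum at zero
$weightTotal = array_sum($weights); // computed once instead of on every iteration
foreach($values as $field=>$value) {
$malus += ($value*$weights[$field]) / $weightTotal;
}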
This may not be an exact answer to your question, and it is rather broader than what you asked, but I hope I pointed you in the right direction and gave you some points where your thoughts can pick up!
Have fun and success with your application!
Reddit's code is open source. There is a pretty good discussion of their ranking algorithm here, with code: http://amix.dk/blog/post/19588

Calculate average without being thrown by strays

I am trying to calculate an average without being thrown off by a small set of far-off numbers (e.g. in 1,2,1,2,3,4,50 the single 50 will throw off the entire average).
If I have a list of numbers like so:
19,20,21,21,22,30,60,60
The average is 31
The median is 30
The mode is 21 & 60 (averaged to 40.5)
But anyone can see that the majority is in the range 19-22 (5 in, 3 out), and if you take the average of just that major range it's 20.6 (very different from any of the numbers above).
I am thinking that you can get this like so:
c+d-r
Where c is the count of numbers, d is the number of distinct values, and r is the range. Then you can apply this to all the possible ranges, and the range with the highest score is the optimal one to take an average from.
For example 19,20,21,21,22 would be 5 numbers, 4 distinct values, and the range is 3 (22 - 19). If you plug this into my equation you get 5+4-3=6
If you applied this to the entire number list it would be 8+6-41=-27
I think this works pretty well, but I have to create a huge loop to test against all possible ranges. In just my small example there are 21 possible ranges:
19-19, 19-20, 19-21, 19-22, 19-30, 19-60, 20-20, 20-21, 20-22, 20-30, 20-60, 21-21, 21-22, 21-30, 21-60, 22-22, 22-30, 22-60, 30-30, 30-60, 60-60
I am wondering if there is a more efficient way to get an average like this.
Or does someone have a better algorithm altogether?
You might get some use out of standard deviation here, which basically measures how concentrated the data points are. You can define an outlier as anything more than 1 standard deviation (or whatever other number suits you) from the average, throw them out, and calculate a new average that doesn't include them.
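A minimal sketch of that idea follows, assuming the population standard deviation and a cut-off of one standard deviation by default. The function name mean_without_outliers and the parameter $k are made up for the example.

```php
<?php
// Hypothetical helper: mean of the values lying within $k population
// standard deviations of the overall mean.
function mean_without_outliers(array $values, $k = 1.0)
{
    $n = count($values);
    if ($n === 0) {
        return null;
    }
    $mean = array_sum($values) / $n;
    $variance = 0.0;
    foreach ($values as $v) {
        $variance += ($v - $mean) * ($v - $mean);
    }
    $sd = sqrt($variance / $n);

    // Keep only the values close enough to the mean.
    $kept = array();
    foreach ($values as $v) {
        if (abs($v - $mean) <= $k * $sd) {
            $kept[] = $v;
        }
    }
    return count($kept) > 0 ? array_sum($kept) / count($kept) : null;
}

echo round(mean_without_outliers(array(19, 20, 21, 21, 22, 30, 60, 60)), 2) . "\n";
```

For the list in the question the mean is 31.625 and the standard deviation about 16.7, so the two 60s are dropped and the remaining six values average to about 22.17. Note that the 30 is kept; a smaller $k would be stricter.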
Here's a pretty naive implementation that you could fix up for your own needs. I purposely kept it pretty verbose. It's based on the five-number-summary often used to figure these things out.
function get_median($arr) {
sort($arr);
$c = count($arr) - 1;
if ($c%2) {
$b = round($c/2);
$a = $b-1;
return ($arr[$b] + $arr[$a]) / 2 ;
} else {
return $arr[($c/2)];
}
}
function get_five_number_summary($arr) {
sort($arr);
$n = count($arr);
// split into the halves below and above the median (the middle value is left out when $n is odd)
$half = (int) floor($n / 2);
$lower_half = array_slice($arr, 0, $half);
$upper_half = array_slice($arr, $n - $half);
// minimum, lower quartile, median, upper quartile, maximum
return array($arr[0], get_median($lower_half), get_median($arr), get_median($upper_half), $arr[$n - 1]);
}
function find_outliers($arr) {
$fns = get_five_number_summary($arr);
$interquartile_range = $fns[3] - $fns[1];
$low = $fns[1] - $interquartile_range;
$high = $fns[3] + $interquartile_range;
foreach ($arr as $v) {
if ($v > $high || $v < $low)
echo "$v is an outlier<br>";
}
}
//$numbers = array( 19,20,21,21,22,30,60 ); // 60 is an outlier
$numbers = array( 1,230,239,331,340,800); // 1 is an outlier, 800 is an outlier
find_outliers($numbers);
Note that this method, albeit much simpler to implement than standard deviation, will not find the two 60 outliers in your example, but it works pretty well. Use the code for whatever, hopefully it's useful!
To see how the algorithm works and how I implemented it, go to: http://www.mathwords.com/o/outlier.htm
This, of course, doesn't calculate the final average, but it's kind of trivial after you run find_outliers() :P
Why don't you use the median? It's not 30, it's 21.5.
You could put the values into an array, sort the array, and then find the median, which is usually a better number than the average anyway because it discounts outliers automatically, giving them no more weight than any other number.
You might sort your numbers, choose your preferred subrange (e.g., the middle 90%), and take the mean of that.
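This is usually called a trimmed mean. A rough sketch, where the function name trimmed_mean and the parameter $trim (the fraction cut from each end, assumed to be at least 0 and below 0.5) are hypothetical:

```php
<?php
// Hypothetical helper: drops the lowest and highest $trim fraction of the
// sorted values and averages what is left.
function trimmed_mean(array $values, $trim)
{
    sort($values);
    $n = count($values);
    $drop = (int) floor($n * $trim); // how many values to cut from each end
    $middle = array_slice($values, $drop, $n - 2 * $drop);
    return count($middle) > 0 ? array_sum($middle) / count($middle) : null;
}

// Keeping the middle 50% of the question's list leaves 21, 21, 22, 30.
echo trimmed_mean(array(19, 20, 21, 21, 22, 30, 60, 60), 0.25) . "\n";
```

With a trim of 0.25 the question's list gives 23.5. With only 8 values, keeping the middle 90% (a trim of 0.05) removes nothing, so the subrange has to be chosen with the sample size in mind.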
There is no one true answer to your question, because there are always going to be distributions that will give you a funny answer (e.g., consider a biased bimodal distribution). This is why many statistics are presented using box-and-whisker diagrams showing mean, median, quartiles, and outliers.
