Creating a scoring metric - php

I am trying to create a way to rank domains on a scale of 1 to 100 based on a bunch of different metrics that range from 1 to 999,999,999. The idea is to use 3 different metrics and come up with a single number that accurately measures how good or bad a given domain is.
One of the metrics I am using is Alexa Rank, which ranges from 1 to 999,999 (I think); lower is obviously better. Another is the number of pages indexed by Google, where a low number is bad.
I think the correct way to do this would be to give a base score to each range of numbers. For example, a domain with Alexa Rank 1 could have a base score of 49.9995, one with an AR of 313 could be 46.7648, one with an AR of 123000 could be 24.4875, and one with an AR of 999000 could be 2.5478.
Does anyone know of an exponential formula I can use for this? It doesn't really matter which language it is in (I prefer PHP); I would just like some examples of the logic. Any ideas are much appreciated.
Thanks
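One common way to squash a metric that spans several orders of magnitude into a fixed range is a logarithmic scale (the inverse of an exponential one). The sketch below is a hypothetical PHP example, not a standard formula: `alexaScore`, `indexedPagesScore`, the 50-point cap per metric and the default maximums are all assumptions, and it does not reproduce the exact sample scores in the question.

```php
<?php
// Hypothetical log-scale scoring: each metric is mapped onto 0..$maxScore.

// Lower Alexa rank is better: rank 1 -> $maxScore, rank $maxRank -> 0.
function alexaScore(int $rank, int $maxRank = 1000000, float $maxScore = 50.0): float
{
    $rank = max(1, min($rank, $maxRank));
    return $maxScore * (1 - log($rank) / log($maxRank));
}

// More indexed pages is better: 0 pages -> 0, $maxPages -> $maxScore.
// log(1 + x) keeps the score finite when a site has 0 indexed pages.
function indexedPagesScore(int $pages, int $maxPages = 1000000000, float $maxScore = 50.0): float
{
    $pages = max(0, min($pages, $maxPages));
    return $maxScore * log(1 + $pages) / log(1 + $maxPages);
}

printf("%.2f\n", alexaScore(1));        // 50.00
printf("%.2f\n", alexaScore(1000));     // 25.00
printf("%.2f\n", indexedPagesScore(0)); // 0.00
```

With a log scale, each tenfold change in rank moves the score by the same amount (about 8.3 points here, since 1 to 1,000,000 spans six steps of 50/6), so ranks 100 and 1,000 are as far apart as 100,000 and 1,000,000. Adding the per-metric scores gives the overall figure.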

Alexa: ranks run from 1 to 99,999,999. If a website ranks 1, it gets a 100% score.
Google indexed pages: you need to know how many pages the site has in total to work this out. If the number indexed is close to the total, the site gets 100%. If 30 out of 100 total pages are indexed, it gets 30%, obviously.
Let's say the maximum possible Alexa score is 200 points and the maximum possible Google score is 200 points; then we can do this math:
If a site scored 70% in Alexa, it gets 140 points here. If it scored 60% in Google indexed pages, it gets 120. 140 + 120 = 260 total score.
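The arithmetic above can be sketched as follows. The answer does not say how an Alexa rank becomes a percentage, so the linear mapping in the hypothetical `alexaPercent` (rank 1 = 100%, the worst rank = 0%) is an assumption; the 200-point maximums come from the example.

```php
<?php
// Assumption: linear mapping from Alexa rank to a 0..1 fraction,
// where rank 1 is 100% and $maxRank is 0%.
function alexaPercent(int $rank, int $maxRank = 99999999): float
{
    $rank = max(1, min($rank, $maxRank));
    return ($maxRank - $rank) / ($maxRank - 1);
}

// Share of the site's pages that Google has indexed (0..1).
function indexedPercent(int $indexed, int $totalPages): float
{
    return $totalPages > 0 ? min(1.0, $indexed / $totalPages) : 0.0;
}

// Each metric is worth up to $maxPoints; the total is their sum.
function totalScore(float $alexaPct, float $googlePct, int $maxPoints = 200): float
{
    return $alexaPct * $maxPoints + $googlePct * $maxPoints;
}

echo totalScore(0.70, 0.60), "\n";  // 260
echo indexedPercent(30, 100), "\n"; // 0.3
```

A linear rank mapping leaves almost every real site near 100%, so in practice a logarithmic mapping may spread the Alexa part out better.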
There are multiple ways of doing this, you just need all the right numbers.
I hope I'm making sense lol.

Related

Ranking players based off the percentages

I have a database with over 1,000 users and points that rank them by skill. I'm going to take the user with the most points and base my rankings on that alone. So, say that player has 900 points; I will create 6 "divisions" based on that. The higher the division, the smaller its percentage and the harder it is to get into. With that said, let me show you an example.
Here are the 6 divisions, from best to worst:
Master
Platinum
Gold
Silver
Bronze
Iron
So, the player with the most points would already be in the "Master" division. Then I want to create percentages based off of these divisions. So, for example, here is how I plan on setting up the percentages.
Master 5%
Platinum 10%
Gold 15%
Silver 20%
Bronze 25%
Iron 25%
So, as you can see, I need to apply these percentages to the points of the player who has the most points, and display the result on any user's profile. That is, I need to take the points of the user whose profile I am viewing, compare them with the top player's points, and work out which division that user belongs to.
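function rank($rank, $mostpoints) {
    // $rank is this user's points; returns them as a fraction (0..1) of the top player's points
    $count = $rank / $mostpoints;
    return $count;
}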
I am aware it isn't much, and I know I can finish the rest with if statements, but I want to know the best way to take percentages of this number. I need to take the current rank and output a division. However, I realized I can't just take the Master percentage (5%), divide it by the most points and have it allocate Master. I need a way to break mostpoints into 6 sections based on the percentages, and then use if statements to check which section the user's rank falls into. Can anyone suggest the most efficient way to do this?
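One way to read the percentages is as bands of the top scorer's points: Master covers the top 5% of the range (95% to 100% of the most points), Platinum the next 10% (85% to 95%), and so on down to Iron (0% to 25%). The sketch below assumes that interpretation; `divisionFor` is a hypothetical helper, not part of the question's code. If the percentages are instead meant as shares of players (the top 5% of users are Master), you would sort users by points and compare each user's position to the user count rather than to the top score.

```php
<?php
// Hypothetical helper: bands are shares of the top scorer's points,
// listed from the best division down (5 + 10 + 15 + 20 + 25 + 25 = 100).
function divisionFor(float $points, float $mostPoints): string
{
    $bands = [
        'Master'   => 5,
        'Platinum' => 10,
        'Gold'     => 15,
        'Silver'   => 20,
        'Bronze'   => 25,
        'Iron'     => 25,
    ];
    if ($mostPoints <= 0) {
        return 'Iron';
    }
    // Player's points as a percentage of the top score, clamped to 0..100.
    $percent = min(100.0, max(0.0, $points * 100 / $mostPoints));

    // Walk down from 100%, lowering the cut-off by each band's width:
    // Master >= 95, Platinum >= 85, Gold >= 70, Silver >= 50, Bronze >= 25.
    $cutoff = 100.0;
    foreach ($bands as $name => $width) {
        $cutoff -= $width;
        if ($percent >= $cutoff) {
            return $name;
        }
    }
    return 'Iron'; // only reached if the bands do not sum to 100
}

echo divisionFor(900, 900), "\n"; // Master
echo divisionFor(700, 900), "\n"; // Gold (about 77.8% of the top score)
echo divisionFor(100, 900), "\n"; // Iron (about 11.1%)
```

Keeping the bands in one array means the percentages can be changed without touching the if/else logic.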

How to calculate luck factor based on bet winchance/result?

I am working on a website where users can bet on events with variable win chances. One of the requests is to display the "Luck factor" of a given user, based on his bets.
Here is the definition of the Luck factor:
The luck percentage displayed shows how many bets you have won compared to how many you 'should' have won. For example, if you play 10 times with a 10% chance of winning and win two of the 10 bets, your luck will show as 200%, since you have won twice as many as you 'should' have. Bet size is not taken into account when calculating luck, so it is possible to have a luck less than 100% and still show a profit if your winning bets risked more than your losing bets.
Here is my (MySQL) database structure:
Table bet
Columns:
winchance (0.01 - 99.99)
win (true/false)
The application is written in php, but I am sure a pseudocode example would push me to the right direction.
If I understand your question right, you can take the average winning probability from the winchance column (divided by 100, since it is stored as a percentage) and the real winning ratio (number of wins / total number of bets). Given these two values, the luck factor is real ratio / average win chance * 100.
For instance, if the average win chance is 0.1 (10%) and the real winning ratio is 2 / 10 = 0.2, then the luck factor is 0.2 / 0.1 * 100 = 200%. This can easily be calculated with MySQL's built-in functions.
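The answer's calculation can be sketched in PHP as below. It assumes each bet is an array with `winchance` stored as a percentage (0.01 to 99.99, per the table definition) and `win` as a boolean; `luckFactor` is a hypothetical name. Dividing actual wins by the sum of win probabilities gives the same result as the answer's ratio of averages, because both averages are taken over the same number of bets.

```php
<?php
// Hypothetical helper: $bets is a list of ['winchance' => float, 'win' => bool],
// with winchance stored as a percentage (0.01 - 99.99).
function luckFactor(array $bets): ?float
{
    $chanceSum = 0.0; // sum of win chances, in percent
    $wins = 0;
    foreach ($bets as $bet) {
        $chanceSum += $bet['winchance'];
        if ($bet['win']) {
            $wins++;
        }
    }
    if ($chanceSum <= 0) {
        return null; // no bets: luck is undefined
    }
    $expectedWins = $chanceSum / 100; // percent -> expected number of wins
    return $wins / $expectedWins * 100;
}

// The example from the definition: 10 bets at 10%, 2 of them won.
$bets = array_fill(0, 10, ['winchance' => 10.0, 'win' => false]);
$bets[0]['win'] = true;
$bets[1]['win'] = true;
echo luckFactor($bets), "\n"; // 200
```

In MySQL the same ratio could be computed over one user's bets with something like `SUM(win) / (SUM(winchance) / 100) * 100`, assuming `win` is stored as 0/1 (MySQL's BOOLEAN is an alias for TINYINT(1)).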

MYSQL sorting content by rating logic and opinion?

I'm designing a site and don't know what logic to use for the rating system.
The outcome I want is for an item with 4 stars and 1000 votes to rank higher than an item with a single 5-star vote. However, I don't want an item with 1 star and 1000 votes to rank higher than an item with 4 stars and 200 votes.
Anyone have any ideas or advice on what to do?
I found these two questions
Sorting by weighted rating in SQL?
MySQL Rating System - Find Rating
Both have their drawbacks, and in the first one I don't understand what the accepted answer means by "You may want to denormalize this rating value into event for performance reasons if you have a lot of ratings coming in." Could you share some insight? Thank you!
Here's a quick sketch of such a system, which works by defining a bonus factor xₙ for each star value n. According to your question, you want:
x₄*4*1000 > x₅*1*5
and
x₁*1*1000 < x₄*4*200
Setting the factors to, for example, x₁=1, x₄=2 and x₅=2 will satisfy this, but you will of course want to adjust them and add the missing factors.
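A sketch of that scoring in PHP, assuming each item's votes are grouped by star count and every vote contributes xₙ × n to the score. The factors x₁ = 1, x₄ = 2 and x₅ = 2 come from the answer; x₂ and x₃ are hypothetical placeholders, and `bonusScore` is an illustrative name.

```php
<?php
// Score = sum over star values n of x_n * n * (number of n-star votes).
// x1, x4, x5 are from the answer; x2 and x3 are placeholder guesses.
function bonusScore(
    array $votesByStars,
    array $factors = [1 => 1.0, 2 => 1.0, 3 => 1.5, 4 => 2.0, 5 => 2.0]
): float {
    $score = 0.0;
    foreach ($votesByStars as $stars => $count) {
        $score += $factors[$stars] * $stars * $count;
    }
    return $score;
}

// The two conditions from the question:
var_dump(bonusScore([4 => 1000]) > bonusScore([5 => 1]));   // bool(true): 8000 > 10
var_dump(bonusScore([1 => 1000]) < bonusScore([4 => 200])); // bool(true): 1000 < 1600
```

Because this score grows with the number of votes, a heavily voted 2-star item (2000 points for 1000 votes) can still outrank a 4-star item with 200 votes, so the factors need tuning against real data.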
He means you should copy the rating data into the event table (and thus keep redundant data) to optimize it for performance.
See the Wikipedia article on denormalization: http://en.wikipedia.org/wiki/Denormalization
The data you have to determine the rank of items is:
average rating
number of ratings
The hard part is probably making the rules for the ranking, for example: if an item's average rating is above 4 and it has fewer than 4 ratings, treat it as if it were rated 3.9.
For convenience, I would put this value (how to treat the items for ranking) in the item-table.
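The rule-based approach could look like this: compute a ranking value per item with the example rule (an average above 4 from fewer than 4 ratings is treated as 3.9), then sort by it, breaking ties by number of ratings. `rankingValue`, `sortItems` and the `avg`/`count` keys are hypothetical; as suggested above, the ranking value could also be stored in the item table instead of being recomputed on every request.

```php
<?php
// Example rule from the answer: a high average from very few ratings
// is capped so it cannot outrank well-established items.
function rankingValue(float $avgRating, int $numRatings): float
{
    if ($avgRating > 4 && $numRatings < 4) {
        return 3.9;
    }
    return $avgRating;
}

// Sort items (each ['id' => ..., 'avg' => float, 'count' => int]) best first.
function sortItems(array $items): array
{
    usort($items, function (array $a, array $b): int {
        // Higher ranking value first; more ratings wins a tie.
        return [rankingValue($b['avg'], $b['count']), $b['count']]
            <=> [rankingValue($a['avg'], $a['count']), $a['count']];
    });
    return $items;
}

$sorted = sortItems([
    ['id' => 'A', 'avg' => 5.0, 'count' => 1],
    ['id' => 'B', 'avg' => 4.0, 'count' => 1000],
]);
echo $sorted[0]['id'], "\n"; // B
```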

Adding an extra factor (number of clicks) to a Bayesian ranking system

I run a music website for amateur musicians where we have a rating system based on a score out of 10, which is then calculated into an overall score out of 100. We have a "credibility" points system for users which directly influences the average score at the point of rating, but the next step is to implement a chart system which uses this data effectively.
I'll try and explain exactly how it all works so you can see which data I have at my disposal.
A site member rates a track between 1 and 10.
That site member has a "credibility" score, which is just a total of points accumulated for various activities around the site. A user gains, for example, 100 points for giving a rating, so the more ratings they give, the higher their "credibility" score. Only the total credibility score is saved in the database, updated each time a user performs an activity with a points reward attached. These individual activities are not stored.
Based on the credibility of this user compared to other users who have rated the track, a weighted average is calculated for the track, which is then stored as a number between 1 and 100 in the tracks table.
In the tracks table, the number of times a track is listened to (i.e. number of plays) is also stored as a total.
So the data I have to work with is:
Overall rating for the track (number between 1 and 100)
Number of ratings for the track
Number of plays for the track
In the chart system I want to create a ranking that uses the above 3 sets of data to create a fair balance between quality (overall rating, normalized with number of ratings) and popularity (number of plays). BUT the system should factor quality more heavily than popularity, so for example the quality aspect makes up 75% of the normalized ranking and popularity 25%.
After a search on this site I found the IMDB Bayesian-style system which is helpful for working out the quality aspect, but how do I add in the popularity (number of plays) and have it balanced in the way I want?
The site is written in PHP and MySQL if that helps.
EDIT: the title says "number of clicks" but this is basically the direct equivalent of "number of plays".
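You may want to try the following. The IMDB equation you mentioned uses weighting to lean toward either the average rating of the movie or the average rating of all movies:
WR = (v/(v+m)) × R + (m/(v+m)) × C
where R is the item's average rating, v its number of votes, m the minimum number of votes required (a constant you choose) and C the mean rating across all items.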
So
v << m => v/(v+m) -> 0; m/(v+m) -> 1 => WR -> C
and
v >> m => v/(v+m) -> 1; m/(v+m) -> 0 => WR -> R
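The weighting can be sketched directly in PHP. Here R is the track's own (credibility-weighted) rating on the 1 to 100 scale, v its number of ratings, C the mean rating across all tracks, and m the number of ratings at which the track's own rating and the global mean count equally; the value of m used below, like the function name, is an assumption to tune.

```php
<?php
// Bayesian-style weighted rating: pulls R toward C when v is small.
function weightedRating(float $R, int $v, int $m, float $C): float
{
    if ($v + $m === 0) {
        return $C; // no ratings and no prior weight: fall back to the mean
    }
    return ($v / ($v + $m)) * $R + ($m / ($v + $m)) * $C;
}

// A track rated 90, site-wide mean 60, m = 45:
printf("%.1f\n", weightedRating(90, 5, 45, 60));   // 63.0 (few ratings: close to C)
printf("%.1f\n", weightedRating(90, 450, 45, 60)); // 87.3 (many ratings: close to R)
```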
This should generally be fair. Calculating a popularity score between 0 and 100 based on the number of plays is pretty tricky unless you really know your data. As a first try, calculate the average number of plays avg(p) and the variance var(p). You can then use these to scale the number of plays with a technique called standardization (whitening in one dimension), which divides by the standard deviation sqrt(var(p)):
WHITE(P) = (p - avg(p))/sqrt(var(p))
If your data looks roughly like a bell curve, about two thirds of tracks will land between -1 and 1; values outside that range are possible, so clamp them to [-1, 1]. You can then scale this to the range 0 - 100:
POP = 50 * (1 + WHITE(P))
To combine the score based on some weighting factor w (e.g. 0.75) you'd simply do:
RATING = w x WR + (1 - w) x POP
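Put together, the popularity and combination steps might look like the following sketch. It standardizes a track's play count with the mean and population standard deviation of all play counts, clamps the result to [-1, 1] so POP stays within 0 to 100, and blends it with WR using w = 0.75. The clamping, the use of the population (rather than sample) standard deviation, and the function names are assumptions for illustration.

```php
<?php
// Mean and population standard deviation of all tracks' play counts.
function meanAndStdDev(array $plays): array
{
    $n = count($plays);
    if ($n === 0) {
        return [0.0, 0.0];
    }
    $mean = array_sum($plays) / $n;
    $sumSq = 0.0;
    foreach ($plays as $p) {
        $sumSq += ($p - $mean) ** 2;
    }
    return [$mean, sqrt($sumSq / $n)];
}

// POP = 50 * (1 + WHITE(P)), with WHITE(P) clamped to [-1, 1].
function popularity(float $p, float $mean, float $stdDev): float
{
    $z = $stdDev > 0 ? ($p - $mean) / $stdDev : 0.0;
    $z = max(-1.0, min(1.0, $z));
    return 50 * (1 + $z);
}

// RATING = w * WR + (1 - w) * POP
function combinedRating(float $wr, float $pop, float $w = 0.75): float
{
    return $w * $wr + (1 - $w) * $pop;
}

// Example: a track with WR = 72 and 800 plays, among four tracks.
$allPlays = [120, 45, 3000, 800];
[$mean, $sd] = meanAndStdDev($allPlays);
printf("%.1f\n", combinedRating(72.0, popularity(800, $mean, $sd)));
```

Play counts are often heavily skewed, so in practice it may be worth checking how many tracks end up clamped at 0 or 100.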
Play with these and let me know how you get on.
NOTE: this does not account for the fact that a user can "game" the popularity by playing a track many times. You could get around this by penalising multiple plays of a single song:
deltaP = (1 - (Puser - 1)/TPuser)
Where:
deltaP = Change in # plays
Puser = number of times this user has played this track
TPuser = total number of tracks (not unique) played by the user
So the more times a user plays just the one track, the less it counts toward the total number of plays for that track. If the user's listening habits are diverse, TPuser will be large and so deltaP will tend back to 1. This can still be gamed, but it is a good start.
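The penalty can be applied as a fractional increment each time a play is recorded. This sketch assumes `$pUser` already includes the play being recorded and `$tpUser` counts all of the user's plays including this one, so 1 <= pUser <= tpUser; `playWeight` is a hypothetical name.

```php
<?php
// deltaP = 1 - (Puser - 1) / TPuser
// A user's first play of a track always counts fully; repeat plays count
// less unless the user's overall listening is spread across many tracks.
function playWeight(int $pUser, int $tpUser): float
{
    if ($pUser < 1 || $tpUser < $pUser) {
        throw new InvalidArgumentException('Expected 1 <= pUser <= tpUser');
    }
    return 1 - ($pUser - 1) / $tpUser;
}

printf("%.4f\n", playWeight(1, 1));   // 1.0000
printf("%.4f\n", playWeight(4, 4));   // 0.2500 (user only ever plays this track)
printf("%.4f\n", playWeight(4, 400)); // 0.9925 (diverse listener)
```

The track's play total would then be incremented by this weight instead of by 1.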

How to calculate percentage of "equal or better" in mysql DB?

Figuring out a title for this question was hard, but the following is harder for me. I hope someone can help.
I have a simple MySQL database table. It is filled with records containing an ID and the number of visitors per week, covering 2 years of data for about 200+ websites.
To summarize, I want to be able to know two things:
1.) - "In week 54 of 2009 the website somethingonline.com had 300 visitors" (Easy, of course; I can do this.)
2.) - "The website somethingonline.com was among the 8% best scoring websites in that week."
Now, how can I get number 2? Of course, I want to know that percentage for all websites in every week, so I get a list like:
somethingonline1.com - 300 visitors - 8% of the websites score like this or better
somethingonline2.com - 400 visitors - 4% of the websites score like this or better
somethingonline3.com - 500 visitors - 2% of the websites score like this or better
somethingonline4.com - 600 visitors - 1% of the websites score like this or better
How can I get these results? Is this possible in one query?
I use MySQL and PHP.
The key is to involve two different "copies" of your visits table. In this query v1 represents the website you're actually looking at. For each of those v1 websites, we'll join to a copy of the visits table, matching any row that covers a site with more visits in the same week.
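SELECT v1.website_name, v1.visits, COUNT(v2.id) AS better_sites
FROM visits AS v1
-- LEFT JOIN so the week's top site (no better rows) still appears, with a count of 0
LEFT JOIN visits AS v2 ON (v2.week_number = v1.week_number AND v2.visits > v1.visits)
WHERE v1.week_number = 54
GROUP BY v1.id, v1.website_name, v1.visits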
This will tell you the number of sites that had more visitors. To get that as a percentage, run a separate query to simply count the total number of sites that had any visits in that week. In your PHP script you can then do the simple division to get the percentage you want.
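Once a week's rows are fetched, the PHP side can also compute the "equal or better" share directly. This sketch assumes an array mapping website name to that week's visits; `equalOrBetterPercent` is a hypothetical helper. Unlike the query above, which counts only sites with strictly more visits, it counts ties and the site itself, so the week's top site gets 1 out of N rather than 0.

```php
<?php
// For each site, the percentage of sites in the same week with at least as many visits.
// $visitsBySite: ['somesite.com' => 300, ...] for a single week.
function equalOrBetterPercent(array $visitsBySite): array
{
    $total = count($visitsBySite);
    $result = [];
    foreach ($visitsBySite as $site => $visits) {
        $atLeast = 0; // includes the site itself and any ties
        foreach ($visitsBySite as $otherVisits) {
            if ($otherVisits >= $visits) {
                $atLeast++;
            }
        }
        $result[$site] = $atLeast / $total * 100;
    }
    return $result;
}

$week = ['a.com' => 300, 'b.com' => 400, 'c.com' => 500, 'd.com' => 600];
foreach (equalOrBetterPercent($week) as $site => $pct) {
    printf("%s - %d visitors - %.0f%% score like this or better\n", $site, $week[$site], $pct);
}
```

The nested loop is quadratic, which is fine for a couple of hundred sites per week; for much larger sets, sorting the visit counts once would be cheaper.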
