I'm looking for a scoring algorithm, but I haven't found quite what I'm looking for.
Let's imagine an application in which a user can post reviews, vote on articles, and like or dislike reviews/articles. The goal is to rank users depending on their activity.
The system itself should grow like a natural (Napierian) logarithm or something similar, so the more points you have, the harder it is to earn more. Also, a newbie must have a voice alongside a very experienced user, but his voice is less important.
I'm thinking about:
Each action has a base value.
A user has an amount of points, which determines his weight.
His amount of points determines his level, and thus the set of actions he can perform.
When a user performs an action, his weight affects the action's base value; his points increase, and so his weight does too.
Does this sound correct? And do you know of any algorithms or example code implementing this kind of thing? Thanks!
PS: In particular, which function should I use as the weight?
How about something like this:
Each action has a base value of at least 100 points (but not more than 1000).
The user starts with 100 points.
If a user with X points in total performs an action with base value Y, you give him 10 * Y / log2(log2(X)) extra points.
So after consecutively performing actions with a base value of 100, the user receives (rounding down):
366 points (466 points in total)
317 points (783 points in total)
306 points (1089 points in total)
and so on
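A minimal PHP sketch of that formula (the function name award() is mine, and the floor matches the rounding in the figures above):

function award($totalPoints, $baseValue) {
    // log2(log2(X)) grows extremely slowly, so the reward shrinks
    // gently as the user's total climbs. Requires $totalPoints > 2.
    return floor(10 * $baseValue / log(log($totalPoints, 2), 2));
}

$total = 100;                  // starting points
$total += award($total, 100);  // +366 => 466
$total += award($total, 100);  // +317 => 783
$total += award($total, 100);  // +306 => 1089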
I have a database with over 1,000 users and points that rank them by skill. I'm going to take the user with the most points and base my rankings on that alone. So, say that player has 900 points: I will create six "divisions" based on that. The higher the division, the smaller its percentage and the harder it is to get into. So, with that being said, let me show you an example.
Here are the six divisions, ordered from best to worst.
Master
Platinum
Gold
Silver
Bronze
Iron
So, the player with the most points would already be in the "Master" division. Then I want to create percentages based on these divisions. For example, here is how I plan on setting up the percentages.
Master 5%
Platinum 10%
Gold 15%
Silver 20%
Bronze 25%
Iron 25%
So, as you can see, I need to work out these percentages and apply them relative to the player with the most points, and I need to be able to display the result on any user's profile. That means taking the points of the profile I am looking at, comparing them with the player who has the most points, and deriving a division for that profile.
function rank($rank, $mostpoints) {
    // Fraction of the top player's points this user has (0.0 to 1.0).
    $count = $rank / $mostpoints;
    return $count;
}
I am aware it isn't much, and I know I can finish the rest off with if statements, but I want to know the best way to take percentages of this number. I need to be able to take the current rank and spit out a division. I realized, however, that I can't just take the Master percentage (5%), divide it by the most points, and have it allocate the Master division. I need a way to break mostpoints into six sections based on the percentages and then use if statements to see which one the user belongs in based on his rank (see the sketch below). Can anyone provide feedback on the most efficient way to do this?
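One possible sketch of that threshold idea (division names and percentages are taken from the question; the function name divisionFor() is hypothetical):

function divisionFor($points, $mostPoints) {
    // Percentages from best to worst; they sum to 100.
    $bands = array('Master' => 5, 'Platinum' => 10, 'Gold' => 15,
                   'Silver' => 20, 'Bronze' => 25, 'Iron' => 25);
    $ratio = $points / $mostPoints;  // 0.0 .. 1.0 relative to the top player
    $cutoff = 1.0;
    foreach ($bands as $division => $percent) {
        $cutoff -= $percent / 100;   // lower bound of this band
        if ($ratio >= $cutoff) {
            return $division;
        }
    }
    return 'Iron';                   // fallback below all bands
}

echo divisionFor(900, 900);  // Master
echo divisionFor(500, 900);  // Silver (500/900 is about 0.56)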
I'm designing a site and don't know how to structure the logic of its rating system.
The outcome I want: an item with 4 stars and 1000 votes should rank higher than an item with a single 5-star vote. However, I don't want an item with 1 star and 1000 votes to rank higher than an item with 4 stars and 200 votes.
Anyone have any ideas or advice on what to do?
I found these two questions
Sorting by weighted rating in SQL?
MySQL Rating System - Find Rating
and they have their drawbacks. In the first one, I don't understand what the winning answer means by "You may want to denormalize this rating value into event for performance reasons if you have a lot of ratings coming in." Please share some insight? Thank you!
Here's a quick sketch of such a system, which works by defining a bonus factor xₙ for each star count n. According to your question you want:
x₄*4*1000 > x₅*1*5
and
x₁*1*1000 < x₄*4*200
Setting the factors to, for example, x₁ = 1, x₄ = 2 and x₅ = 2 will satisfy this, but you will of course want to adjust them and add the missing factors.
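A minimal sketch of that scoring rule, assuming votes are grouped by star level (the factor values below are illustrative, not tuned):

function weightedScore(array $votesPerStar, array $factors) {
    $score = 0;
    foreach ($votesPerStar as $stars => $count) {
        // bonus factor x_n, times the star value, times the vote count
        $score += $factors[$stars] * $stars * $count;
    }
    return $score;
}

$factors = array(1 => 1, 2 => 1.2, 3 => 1.5, 4 => 2, 5 => 2);
weightedScore(array(4 => 1000), $factors);  // 8000: 4 stars, 1000 votes
weightedScore(array(5 => 1), $factors);     // 10: a single 5-star vote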
He means you should copy the rating data into the event table (and thus have redundant data) to optimize it for performance.
See the wiki for Denormalization: http://en.wikipedia.org/wiki/Denormalization
The data you have to determine the rank of items is:
average rating
number of ratings
The hard part is probably making rules for the ranking, e.g.: if the average rating for an item is > 4 but the number of ratings is < 4, treat it as rated 3.9.
For convenience, I would put this value (how to treat the item for ranking) in the item table.
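As a sketch, that rule could be precomputed whenever a rating comes in and stored in the item table (the function name is mine; the 3.9 cutoff follows the example above):

function rankingValue($avgRating, $numRatings) {
    // Too few ratings to trust a high average: cap it below 4.
    if ($avgRating > 4 && $numRatings < 4) {
        return 3.9;
    }
    return $avgRating;
}
// Store the result in the item table and ORDER BY it when listing items.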
EDIT: I'm sorry guys, my explanation of the problem wasn't clear! This should be better:
The user sends the ID numbers of the articles and the maximum number of bundles (packages).
The API searches all prices available for the articles and calculates the best result for the minimum number of bundles (limited to the maximum number provided by the customer).
ONE bundle is one package of items delivered to ONE platform (buyer).
Thanks!
This is a fun little problem. I spent a few hours on it this morning, and while I don't have a complete solution, I think I have enough for you to get started (which I believe was what you asked for).
First of all, I'm assuming these things, based on your description of the problem:
All buyers quote a price for all the items
There's no assumption about the items, they may all be different
The user can only interact with a limited number of buyers
The user wants to sell every item, each to one buyer
The user may sell multiple items to a single buyer
Exact solution -- brute force approach
For this, the first thing to realize is that, for a given set of buyers, it is straightforward to calculate the maximum total revenue, because you can just choose the highest price offered in that set of buyers for each item. Add up all those highest prices, and you have the max total revenue for that set of buyers.
Now all you have to do is make that calculation for every possible combination of buyers. That's a basic combinations problem: "n choose k", where n is the total number of buyers and k is the number of buyers you're limited to. There are functions out there that will generate lists of these combinations (I wrote my own... there's also this PEAR package for PHP).
Once you have a max total revenue for every combination of chosen buyers, just pick the biggest one, and you've solved the problem.
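A rough PHP sketch of the brute force, assuming $prices[$buyer][$item] holds each buyer's quote (combinations() is a simple recursive helper, not a library function):

function combinations(array $pool, $k) {
    if ($k === 0) return array(array());
    if (count($pool) < $k) return array();
    $first = array_shift($pool);
    $with = array();
    foreach (combinations($pool, $k - 1) as $c) {
        $with[] = array_merge(array($first), $c);        // subsets containing $first
    }
    return array_merge($with, combinations($pool, $k));  // plus those without it
}

function bestRevenue(array $prices, $k) {
    $items = array_keys(reset($prices));
    $best = 0;
    foreach (combinations(array_keys($prices), $k) as $subset) {
        $total = 0;
        foreach ($items as $item) {
            // Highest price offered for this item within the chosen buyers
            $bids = array();
            foreach ($subset as $buyer) $bids[] = $prices[$buyer][$item];
            $total += max($bids);
        }
        $best = max($best, $total);
    }
    return $best;
}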
More elegant algorithm?
However, as I intimated by calling this "brute force", the above is not fast, and scales horribly. My machine runs out of memory with 20 buyers and 20 items. I'm sure a better algorithm exists, and I've got a good one, but it isn't perfect.
It's based on opportunity costs. I calculate the difference between the highest price and the second highest price for each item. That difference is an opportunity cost for not picking the buyer with that highest price.
Then I pick buyers offering high prices for items where the opportunity cost is the highest (thus avoiding the worst opportunity costs), until I have k - 1 buyers (where k is the max I can pick). The final choice is tricky, and instead of writing a much more complicated algorithm, I just run all the possibilities for the final buyer and pick the best revenue.
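A sketch of just the opportunity-cost step, under the same $prices[$buyer][$item] assumption:

function opportunityCosts(array $prices) {
    $costs = array();
    foreach (array_keys(reset($prices)) as $item) {
        $quotes = array_column($prices, $item);  // every buyer's quote for this item
        rsort($quotes);
        // The gap between the best and second-best price is what you lose
        // by not picking the top buyer for this item.
        $costs[$item] = $quotes[0] - (isset($quotes[1]) ? $quotes[1] : 0);
    }
    return $costs;
}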
This strategy picks the best combination most of the time, and if it misses, it doesn't miss by much. It also scales relatively well. It's 10x faster than brute force on small scales, and if I quadruple all the parameters (buyers, buyer limit, and items), calculation time goes up by a factor of 20. Considering how many combinations are involved, that's pretty good.
I've got some code drafted, but it's too long for this post. Let me know if you're interested, and I'll figure out a way to send it to you.
This is a graph problem. It can be solved with Edmonds's blossom algorithm (for example the Blossom V implementation), a matching algorithm that finds the best pairwise matching, as used for instance in dating programs. You may also want to look at the 1-D bin-packing problem: there you have a limited set of items to assign to an unlimited number of boxes or shelves, the goal being to fill the boxes as well as possible.
If I understand the problem correctly, it is NP-complete via reduction from Minimum Set Cover. We can translate an instance of Set Cover into an instance of the OP's problem as follows:
Let an instance of Set Cover be given by a set X of size n and a collection of subsets S_1, S_2, ..., S_m of X. Construct an instance of the OP's problem where the seller has n items to sell to m buyers, where buyer i offers a price of 1 for item j if S_i contains item j and 0 otherwise. A solution to the OP's problem where the number of buyers is limited by k and the total price paid is n corresponds to a solution to the original Set Cover problem with k sets. So, if you had a polynomial-time solution to the OP's problem, you could solve Minimum Set Cover by successively solving it for the case of 1, 2, 3, etc... buyers until you found a solution with total price equal to n.
I am trying to create a way to rank domains between 1 and 100 based on a bunch of different metrics that range from 1 to 999,999,999. The idea is to use 3 different metrics and come up with a single number that can be accurately used to measure how good or bad a certain domain is.
One of the metrics I am using for this is AlexaRank which ranges from 1 to 999,999 (I think). 1 is obviously better. Another one would be the number of pages indexed by Google, where 1 is bad.
I think the correct way of doing this would be to give a certain base score to each range of numbers. For example, a domain with Alexa rank 1 could have a base score of 49.9995, one with an AR of 313 could be 46.7648, whereas one with an AR of 123000 could be 24.4875, and one with an AR of 999000 could be 2.5478.
Does anyone know of exponential logic that I can use for this? It really doesn't matter which language it is (I prefer PHP); I would just like some examples of the logic. Any ideas are much appreciated.
Thanks
Alexa ranks run from 1 to 99,999,999. If a website scores 1, then it has a 100% score.
Google indexed pages (you need to know how many pages the site has in total): if the indexed count is close to the total, they get 100%. If 30 out of 100 total pages are indexed, then they obviously get 30%.
Let's say that the total possible Alexa score is 200 points and the total possible Google score is 200 points; then we can do this math:
if they scored 70% on Alexa, they score 140 here. If they scored 60% on Google indexed pages, they score 120. 140 + 120 = 260 total score.
There are multiple ways of doing this, you just need all the right numbers.
I hope I'm making sense lol.
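For the log-shaped part of the question, here's one possible sketch. The 50-point ceiling mirrors the question's example (rank 1 maps to roughly 50); the constants are assumptions to tune and won't reproduce the question's exact figures:

function alexaScore($rank, $maxScore = 50, $worstRank = 999999) {
    $rank = max(1, min($rank, $worstRank));
    // Linear in log space: rank 1 => $maxScore, $worstRank => 0.
    return $maxScore * (1 - log($rank) / log($worstRank));
}

alexaScore(1);       // 50.0
alexaScore(313);     // about 29.2
alexaScore(123000);  // about 7.6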
I'd like to populate the homepage of my user-submitted-illustrations site with the "hottest" illustrations uploaded.
Here are the measures I have available:
How many people have favourited that illustration (the votes table includes the date voted)
When the illustration was uploaded (the illustration table has the date created)
Number of comments (not so good, as the maximum is about 10 comments in total at the moment; the comments table has the comment date)
I have searched around, but most algorithms include user authority, which I don't want to play a part.
I also need to find out whether it's better to do the calculation in the MySQL query that fetches the data, or in a PHP/cron method that runs every hour or so.
I only need 20 illustrations to populate the home page. I don't need any sort of paging for this data.
How do I weight age against votes? Surely a site with fewer submissions needs less weight on the date added?
Many sites that use some type of popularity ranking do so by using a standard algorithm to determine a score and then decaying eternally over time. What I've found works better for sites with less traffic is a multiplier that gives a bonus to new content/activity - it's essentially the same, but the score stops changing after a period of time of your choosing.
For instance, here's a rough example of something you might want to try. Of course, you'll want to adjust how much weight you're attributing to each category based on your own experience with your site. Comments are rare, but take more effort from the user than a favourite/vote, so they probably should receive more weight.
$score = ($votes / 10) + $comments;
$age = time() - strtotime($dateCreated);  // seconds since upload
if ($age < 86400) {                       // uploaded within the past day
    $score *= 1.5;
}
This type of approach would give a bonus to new content uploaded in the past day. If you wanted to approach this in a similar way only for content that had been favorited or commented on recently, you could just add some WHERE constraints on your query that grabs the score out from the DB.
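For example, assuming a precomputed score column and a last_activity timestamp (both names are assumptions):

SELECT id, score FROM illustrations
WHERE last_activity > NOW() - INTERVAL 1 DAY
ORDER BY score DESC
LIMIT 20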
There are actually two big reasons NOT to calculate this ranking on the fly.
Requiring your DB to fetch all of that data and do a calculation on every page load just to reorder items results in an expensive query.
Probably a smaller gotcha, but if you have a relatively small amount of activity on the site, small changes in the ranking can cause content to move pretty drastically.
That leaves you with either caching the results periodically or setting up a cron job to update a new database column holding this score you're ranking by.
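A sketch of the cron variant: an hourly job just runs one UPDATE that recomputes the stored score using the formula above (table and column names are assumptions):

UPDATE illustrations i
SET hotness = (
      (SELECT COUNT(*) FROM votes v WHERE v.illustration_id = i.id) / 10
    + (SELECT COUNT(*) FROM comments c WHERE c.illustration_id = i.id)
  ) * IF(i.date_created > NOW() - INTERVAL 1 DAY, 1.5, 1);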
Obviously there is some subjectivity in this - there's no one "correct" algorithm for determining the proper balance - but I'd start out with something like votes per unit age. MySQL can do basic math so you can ask it to sort by the quotient of votes over time; however, for performance reasons, it might be a good idea to cache the result of the query. Maybe something like
SELECT images.url FROM images
ORDER BY (SELECT COUNT(*) FROM votes WHERE votes.image_id = images.id)
       / GREATEST(TIMESTAMPDIFF(SECOND, images.date, NOW()), 1) DESC
LIMIT 20
but my SQL is rusty ;-)
Taking a simple average will, of course, bias in favor of new images showing up on the front page. If you want to remove that bias, you could, say, count only those votes that occurred within a certain time limit after the image was posted. For images more recent than that time limit, you'd have to normalize by multiplying the number of votes by the time limit and dividing by the age of the image. Or alternatively, you could give the votes a continuously varying weight, something like exp(-(time(vote) - time(image))). And so on and so on... depending on how particular you are about what this algorithm should do, it could take some experimentation to figure out which formula gives the best results.
I've no useful ideas as far as the actual algorithm is concerned, but in terms of implementation, I'd suggest caching the result somewhere, with a periodic update - if the computation results in an expensive query, you probably don't want to slow your response times.
Something like:
score = (count of favourites + k) / (time since last activity)
The higher k is, the less weight the number of favourites carries.
You could also change the time to something like the time it first appeared plus the time of the last activity; this would ensure that older illustrations vanish over time.
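In PHP this is a one-liner (the constant k and the activity timestamp are values to tune):

$k = 5;
$score = ($favourites + $k) / max(1, time() - $lastActivity);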