Probabilities Function - - php

This is a bit of a math question, and because I'm quite weak at math [ :( ] I can't figure this out.
I have an application that must "randomly" decide if you won or not, with a maximum number of daily winners. The problem is that I don't want to use a simple x chance of winning, because this might result in 20 people winning at the start of the day, after which everyone will keep losing. Is there a generic formula to do this?
tl;dr
I have x amount of Gifts (x=20)
The user must know immediately if he won or not (can't do it at the end of the day)
And I want to randomly spread them throughout the day, is there a generic function/script?
After some suggestions in the comments, I could settle for either:
a solution that takes a predictable number of daily contestants (I'll just make a rough guess for the first few days and change it accordingly)
a solution considering the time of the day, the gifts won so far, and the remaining gifts
Any ideas?

There is no math question here, not really, just some decisions that you need to make.
One possibility is to make the probability of winning be X/N where N is the expected number of visitors, until the gifts run out for that day. It is random, so it might be the case that on some days the gifts exhaust early. So what? That is how probability works. Extreme imbalances are unlikely. For example, say you have 20 gifts and 1000 visitors on an average day. The probability that the gifts will be exhausted by the 500th visitor is a binomial probability: the probability of having at least 20 successes in 500 trials where the probability of success is 20/1000 = 0.02. This probability works out to be just 0.003.
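The binomial figure quoted above can be checked numerically. The following is a minimal sketch; the function name binomialTail is made up for illustration, and it assumes 0 < p < 1.

```php
<?php
// Hypothetical helper: P(at least $k successes in $n independent trials,
// each succeeding with probability $p). Assumes 0 < $p < 1.
function binomialTail(int $n, int $k, float $p): float
{
    $q = 1.0 - $p;
    $term = pow($q, $n); // P(exactly 0 successes)
    $cdf = 0.0;          // accumulates P(at most $k - 1 successes)
    for ($i = 0; $i < $k; $i++) {
        $cdf += $term;
        // Recurrence: P(i + 1) = P(i) * (n - i) / (i + 1) * p / q
        $term *= ($n - $i) / ($i + 1) * $p / $q;
    }
    return 1.0 - $cdf;
}

// 20 gifts, 1000 expected visitors: chance the gifts are gone by visitor 500.
printf("%.4f\n", binomialTail(500, 20, 0.02));
```

The result should land close to the 0.003 mentioned in the answer.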
On days when there are unclaimed gifts -- increase the gift count for the next day and correspondingly increase the probability of winning. If you spin it the right way, this could increase interest in the game in sort of the same way that people buy more lottery tickets on days when a jackpot goes unclaimed.
Note that essentially the same idea can be implemented on different time resolutions. For example, use 4-hour time slots in place of whole days (with X and N adjusted accordingly). This will guarantee a more even spread of the gifts throughout the day (but to pull it off you might need to take into account that the expected number of visitors in a 4-hour time slot is unlikely to be constant over the course of a day. Different time slots might need different denominators).
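As a rough sketch of the X/N draw described in this answer: the function name visitorWins and its parameters are assumptions for illustration, and in a real application the remaining gift count would come from your own storage rather than a local variable.

```php
<?php
// Hypothetical helper: decide whether the current visitor wins.
// $giftsPerDay is X, $expectedVisitors is N (your estimate),
// $giftsRemaining is the number of gifts still unclaimed today.
function visitorWins(int $giftsRemaining, int $giftsPerDay, int $expectedVisitors): bool
{
    if ($giftsRemaining <= 0 || $expectedVisitors <= 0) {
        return false; // gifts exhausted (or no sensible estimate)
    }
    // Win with probability X/N: draw an integer in [1, N], win if it is <= X.
    return random_int(1, $expectedVisitors) <= $giftsPerDay;
}

// Simulate one day with 20 gifts and 1000 visitors.
$remaining = 20;
$winners = 0;
for ($visitor = 1; $visitor <= 1000; $visitor++) {
    if (visitorWins($remaining, 20, 1000)) {
        $remaining--;
        $winners++;
    }
}
echo "Winners today: $winners\n";
```

For the time-slot variant, the same function could be called with the slot's own X and N instead of the daily values.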

Related

How to calculate luck factor based on bet winchance/result?

I am working on a website where users can bet on events with variable win chance. One of the requests is to display the "Luck factor" of a certain user, based on his bets.
Here is the definition of the Luck factor:
The luck percentage displayed shows how many bets you have won compared to how many you 'should' have won. For example, if you play 10 times with a 10% chance of winning and win two of the 10 bets, your luck will show as 200%, since you have won twice as many as you 'should' have. Bet size is not taken into account when calculating luck, so it is possible to have a luck less than 100% and still show a profit if your winning bets risked more than your losing bets.
Here is my (MySQL) database structure:
Table bet
Columns:
winchance (0.01 - 99.99)
win (true/false)
The application is written in php, but I am sure a pseudocode example would push me to the right direction.
If I understand your question correctly, you need two values: the average winning probability, taken from the winchance column, and the real winning ratio (number of wins / total number of bets). Given these two values, the luck factor is real ratio / average win chance * 100. Since winchance is stored as a percentage (0.01 - 99.99), divide it by 100 first.
For instance, if the average win chance is 0.1 and the real winning ratio is 2 / 10 = 0.2, then the luck factor is 0.2 / 0.1 * 100 = 200%. This can easily be calculated with MySQL's built-in functions.
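A minimal PHP sketch of that calculation follows. The function name luckFactor and the array shape are assumptions for illustration; winchance is treated as a percentage, as in the table description.

```php
<?php
// $bets: rows from the bet table, each with 'winchance' (percent, 0.01 - 99.99)
// and 'win' (bool). The array shape is assumed, not taken from the question.
function luckFactor(array $bets): ?float
{
    if (count($bets) === 0) {
        return null; // no bets yet, luck is undefined
    }
    $wins = 0;
    $expectedWins = 0.0;
    foreach ($bets as $bet) {
        $expectedWins += $bet['winchance'] / 100; // percent to probability
        if ($bet['win']) {
            $wins++;
        }
    }
    // wins / expected wins equals (real ratio) / (average win chance)
    return $wins / $expectedWins * 100;
}

// Example from the question: 10 bets at 10%, two of them won.
$bets = array_fill(0, 10, ['winchance' => 10.0, 'win' => false]);
$bets[0]['win'] = true;
$bets[1]['win'] = true;
echo round(luckFactor($bets)) . "%\n"; // prints 200%
```

The same ratio could probably be computed directly in MySQL with something like SUM(win) / SUM(winchance / 100) * 100, assuming win is stored as 0/1.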

What are the chances of getting 100 using mt_rand(1,100)?

I'm wondering what are the chances of getting 100 using mt_rand(1,100)?
Are the chances 1-100? Does that mean I'll get at least one 100 if I "roll" 100 times?
I've been wondering about this for a while but I can't find any solution.
The reason I ask is that I'm trying to calculate how many times I have to roll in order to get 100 guaranteed.
<?php
$roll = mt_rand(1,100);
echo $roll;
?>
Regards Dennis
Are the chances 1-100? Does that mean I'll get at least one 100 if I "roll" 100 times?
No, that's not how random number generators work. Take an extreme example:
mt_rand(1, 2)
One would assume that over a long enough time frame the number of 1s and the number of 2s would be about the same. However, it is perfectly possible to get a sequence of 10 consecutive 1s. Just because it's random doesn't mean that a specific number must appear; if that were the case it would no longer be random.
I'm trying to calculate how many times I have to roll in order to get 100 guaranteed.
Mathematically, there is no number where 100 is guaranteed to be in the sequence. If each roll is independent there is a 99/100 chance that it won't be 100.
For two rolls this is (99/100)^2, or 98% likely. For 100 rolls it's about 37% likely that you won't roll a single 100 in that set. In fact, you need to roll in sets of 230 to have a less than 10% chance of having no 100s in the set, and in sets of 459 to get that below 1%.
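These thresholds follow from (99/100)^n and can be computed directly. A small sketch, with function names made up for illustration:

```php
<?php
// Probability of seeing no 100 at all in $rolls independent
// calls to mt_rand(1, 100).
function chanceOfNoHundred(int $rolls): float
{
    return pow(99 / 100, $rolls);
}

// Smallest number of rolls for which that probability drops below
// $threshold, solving (99/100)^n < threshold for n.
function rollsNeeded(float $threshold): int
{
    return (int) ceil(log($threshold) / log(99 / 100));
}

printf("%.3f\n", chanceOfNoHundred(100)); // about 0.366
echo rollsNeeded(0.10) . "\n";            // 230
echo rollsNeeded(0.01) . "\n";            // 459
```

No finite number of rolls brings the probability to exactly zero, which is why there is no "guaranteed" count.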
The probability of getting 100 from a single call to this function is 1/100; however, there is no guarantee of getting 100 when you call it 100 times. You have to take a much bigger sample. For example, if you call this function 100,000,000 times, you can expect 100 to come up roughly 1,000,000 times.
This can be answered in a better way if you let us know about your use case in more detail.
"1 out of 100 rolls" is just a statistical way of explaining it. Although the chance is 1% (meaning 1 out of 100), it doesn't mean you really will get exactly one 100 out of 100 rolls. It's a matter of chance.
mt_rand uses the Mersenne Twister to generate pseudo-random numbers, which are said to be uniformly distributed. So if you set min and max values, the results should (most likely) also be uniformly distributed.
So you can only talk about the probability of getting a number in the given range, and about the expected number of tries until you get a specific number or all numbers in the range.
This means there is no number of rolls that guarantees you get a specific number at least once.

Create fixed length non-repeating permutation within certain ranges in PHP

I've got a table with 1000 recipes in it, each recipe has calories, protein, carbs and fat values associated with it.
I need to figure out an algorithm in PHP that will allow me to specify value ranges for calories, protein, carbs and fat as well as dictating the number of recipes in each permutation. Something like:
getPermutations($recipes, $lowCal, $highCal, $lowProt, $highProt, $lowCarb, $highCarb, $lowFat, $highFat, $countRecipes)
The end goal is allowing a user to input their calorie/protein/carb/fat goals for the day (as a range, 1500-1600 calories for example), as well as how many meals they would like to eat (count of recipes in each set) and returning all the different meal combinations that fit their goals.
I've tried this previously by populating a table with every possible combination (see: Best way to create Combination of records (Order does not matter, no repetition allowed) in mySQL tables ) and querying it with the range limits, however that proved not to be efficient as I end up with billions of records to scan through and it takes an indefinite amount of time.
I've found some permutation algorithms that are close to what I need, but don't have the value range restraint for calories/protein/carbs/fat that I'm looking for (see: Create fixed length non-repeating permutation of larger set) I'm at a loss at this point when it comes to this type of logic/math, so any help is MUCH appreciated.
Based on some comment clarification, I can suggest one way to go about it. Specifically, this is my "try the simplest thing that could possibly work" approach to a problem that is potentially quite tricky.
First, the tricky part is that the sum of all meals has to be in a certain range, but SQL does not have a built-in feature that I'm aware of that does specifically what you want in one pass; that's ok, though, as we can just implement this functionality in PHP instead.
So let's say you request 5 meals that will total 2000 calories. We leave the other variables aside for simplicity, but they will work the same way. We then calculate that the 'average' meal is 2000/5=400 calories, but obviously any one meal could be over or under that amount. I'm no dietician, but I assume you'll want no meal that takes up more than 1.25x-2x the average meal size, so we can restrict our initial query to this amount.
$maxCalPerMeal = ($highCal / $countRecipes) * 1.5;
$mealPlanCaloriesRemaining = $highCal; # more on this one in a minute
We then request 1 random meal which is less than $maxCalPerMeal, and 'save' it as our first meal. We then subtract its actual calorie count from $mealPlanCaloriesRemaining. We now recalculate:
$maxCalPerMeal = ($highCal / $countRecipesRemaining) * 1.5; # 1.5 being a maximum deviation from average multiple
Now the next query will ask for a random meal that is less than both $maxCalPerMeal AND $mealPlanCaloriesRemaining, AND NOT one of the meals you already have saved in this particular meal plan option (thus ensuring unique meals: no mac'n'cheese for breakfast, lunch, and dinner!). We update the variables as in the last query, and repeat until we reach the end. For the last meal requested we don't care about the average and its associated multiple, as thanks to the compound query you'll get what you want anyway and don't need to complicate your control loops.
Assuming the worst case with the 5 meal 2000 calorie max diet:
Meal 1: 600 calories
Meal 2: 437
Meal 3: 381
Meal 4: 301
Meal 5: 281
Or something like that, and in most cases you'll get something a bit nicer and more random. But in the worst case it still works! Now this actually just plain works for the usual case. Adding more maximums, like for fat and protein, is easy, so let's deal with the lows next.
All we need to do to support "minimum calories per day" is add another set of averages, as such:
$minCalPerMeal = ($lowCal / $countRecipes) * .5; # this time our multiplier is less than one: as we allow for meals to be bigger than average, we must allow them to be smaller as well
And you restrict the query to being greater than this calculated minimum, recalculating with each loop, and happiness naturally ensues.
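The selection loop described so far could be sketched as follows. This is only an illustration under some assumptions: recipes are held in an in-memory array rather than queried one at a time, only the calorie maximum is enforced, and the function name pickMealPlan and the array shape are made up.

```php
<?php
// Sketch only: picks $count distinct recipes at random so that total calories
// stay at or below $highCal. $recipes is an assumed in-memory array of
// ['id' => int, 'cal' => int]; the answer does each pick as a database query.
function pickMealPlan(array $recipes, int $highCal, int $count, float $multiple = 1.5): ?array
{
    $plan = [];
    $caloriesRemaining = $highCal;
    for ($slotsRemaining = $count; $slotsRemaining > 0; $slotsRemaining--) {
        // Per-meal cap, recalculated each round as in the answer's formula.
        $maxCalPerMeal = ($highCal / $slotsRemaining) * $multiple;
        $candidates = array_filter(
            $recipes,
            function ($r) use ($plan, $maxCalPerMeal, $caloriesRemaining) {
                return !isset($plan[$r['id']])        // not already in the plan
                    && $r['cal'] <= $maxCalPerMeal
                    && $r['cal'] <= $caloriesRemaining;
            }
        );
        if (count($candidates) === 0) {
            return null; // nothing fits this slot: caller must retry or give up
        }
        $choice = $candidates[array_rand($candidates)];
        $plan[$choice['id']] = $choice;
        $caloriesRemaining -= $choice['cal'];
    }
    return array_values($plan);
}
```

A minimum could be added the same way, by also filtering on a recalculated lower bound, and the other nutrients by repeating the pattern per value.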
Finally we must deal with the degenerate case: what if, using this method, you end up needing a meal that is too small or too big to fill the last slot? Well, you can handle this in a number of ways. Here's what I'd recommend.
The easiest is just returning fewer than the desired number of meals, but this might be unacceptable. You could also have special low-calorie meals that, due to the minimum average dietary content, would only be likely to be returned if someone really had to squeeze in a light meal to make the plan work. I rather like this solution.
The second easiest is to throw out the meal plan you have so far and regenerate from scratch; it might work this time, or it might not, so you'll need a cap on the number of retries to make sure you don't get into an infinite, work-intensive loop.
The least easy option again requires a cap on loop iterations, but here you use a specific strategy to try to get a more acceptable meal plan. You take the meal with the highest value that is exceeding your dietary limits and throw it out, then try pulling a smaller meal, perhaps one that is no greater than the newly calculated average. It might make the plan as a whole work, or you might go over on another value, forcing you back into a loop that could be unresolvable, or it might just take a few dozen iterations to get one that works.
Though this sounds like a lot when writing it out, even a very slow computer should be able to churn out hundreds of thousands of suggested meal plans every few seconds without pausing. Your database will be under very little strain even if you have millions of recipes to choose from, and the meal plans you return will be as random as it gets. It would also be easy to make certain multiple suggested meal plans are not duplicates with a simple comparison and another call or two for an extra meal plan to be generated - without fear of noticeable delay!
By breaking things down to small steps with minimal mathematical overhead a daunting task becomes manageable - and you don't even need a degree in mathematics to figure it out :)
(As an aside, I think you have a very nice website built there, so no worries!)

Algorithm for Removing Outliers from a dataset of prices

This is kind of a neat problem and I've enjoyed thinking it through...
Assume that you run a "Widget Rental" website, and in your application you want to allow prospective purchasers to sort the widgets based on prices (low to high or high to low).
Each widget can have a different price based on the time of year. Some widgets will have dozens of different prices depending on the season as you get "high" seasons and "low" seasons.
However, the sellers of the "Widgets" are especially mischievous, and have realised that if they set their widget to be really expensive for one day of the year, and also really cheap one day of the year, then they can easily appear at the low and high sort ranges.
Currently, I took a very naive solution in order to calculate the "lowest price" for a Widget, which is to just take the lowest( N ) value from a dataset.
What I would like to do is get a "lowest from price" for a widget, which accurately portrays the price it could be rented from, and remove the lower/higher-band outliers.
Take a look at this chart... with values...
[Chart: X axis is time (each significant interval is a day); Y axis is price.]
The X axis is time, and the Y axis is the price. Now, this contains a normal distribution, and there aren't any real statistical outliers in that dataset. It's common to see the price between the lowest value and the upper value to fluctuate as much as 200%.
However, take a look at this second chart... It contains a single day tariff, which is only 20 euros...
I've played around with using Grubbs test and it seems to work quite well.
The important thing is that I want to get a "from price". That is to say, I want to be able to say, "You can rent this widget from XXXX". So it should reflect the overall pricing taken as a whole and ignore clear outliers.
PHP bonus points if you point me in the direction of anything that already exists. (But I'm happy to code this myself in PHP).
One issue is that there are multiple definitions for what an outlier actually is. However, for this purpose a straight forward solution seems sufficient.
You could remove outliers by limiting the range of values to either +- some percentage or +- some number of standard deviations (probably one or two, but it could vary) from the average price. You'd likely want to use a combination of both: if the prices don't vary much, then a discount could be viewed as an outlier, which may or may not be appropriate. In any case, you'd likely have to do some experimenting to determine how sensitive it is. Chances are you'd want to set it so outliers must be at least some percentage away from the mean, even if it's only 5-20 percent. Below are a few percentage-based limits based on an average of $500.
90%: $50 to $950
75%: $125 to $875
50%: $250 to $750
30%: $350 to $650
25%: $375 to $625
If multiple passes are used, then it would be easier to sort the prices, then remove the price that is farthest from the average (perhaps considering the highest price as well as the lowest price) as long as it exceeds the range. This ends up being O(N*D log D) to obtain the result of continuous single passes until they have no effect, instead of O(N*D) for a single pass, where N is the number of items to rent and D is the number of days considered.
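A single pass of the combined rule (standard deviations plus a minimum percentage distance from the mean) might look like the sketch below. The function name removePriceOutliers and the default thresholds are assumptions chosen for illustration, not tuned values.

```php
<?php
// Sketch: a price counts as an outlier only if it is BOTH more than
// $maxDeviations standard deviations from the mean AND more than
// $minPercent (as a fraction) of the mean away from it.
function removePriceOutliers(array $prices, float $maxDeviations = 2.0, float $minPercent = 0.20): array
{
    $n = count($prices);
    if ($n === 0) {
        return [];
    }
    $mean = array_sum($prices) / $n;
    $sumSquares = 0.0;
    foreach ($prices as $price) {
        $sumSquares += ($price - $mean) ** 2;
    }
    $stdDev = sqrt($sumSquares / $n); // population standard deviation

    $kept = array_filter($prices, function ($price) use ($mean, $stdDev, $maxDeviations, $minPercent) {
        $distance = abs($price - $mean);
        $isOutlier = $distance > $maxDeviations * $stdDev
            && $distance > $minPercent * $mean;
        return !$isOutlier;
    });
    return array_values($kept);
}

// One widget's daily prices, including a single 20 euro "bait" day.
$prices = [480, 500, 520, 510, 490, 505, 495, 515, 485, 20];
echo min(removePriceOutliers($prices)) . "\n"; // prints 480
```

Running the function repeatedly until the result stops changing would give the multi-pass behaviour, at the cost of the extra work mentioned above.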
You also might find the Ramer–Douglas–Peucker algorithm useful for finding points of interest after a bit of experimenting with how to define the value of epsilon.

How to calculate percentile rank for point totals over different time spans?

On a PHP & CodeIgniter-based web site, users can earn reputation for various actions, not unlike Stack Overflow. Every time reputation is awarded, a new entry is created in a MySQL table with the user_id, action being rewarded, and value of that bunch of points (e.g. 10 reputation). At the same time, a field in a users table, reputation_total, is updated.
Since all this is sort of meaningless without a frame of reference, I want to show users their percentile rank among all users. For total reputation, that seems easy enough. Let's say my user_id is 1138. Just count the number of users in the users table with a reputation_total less than mine, count the total number of users, and divide to find the percentage of users with a lower reputation than mine. That'll be user 1138's percentile rank, right? Easy!
But I'm also displaying reputation totals over different time spans--e.g., earned in the past seven days, which involves querying the reputation table and summing all my points earned since a given date. I'd also like to show percentile rank for the different time spans--e.g., I may be 11th percentile overall, but 50th percentile this month and 97th percentile today.
It seems I would have to go through and find the reputation totals of all users for the given time span, and then see where I fall within that group, no? Is that not awfully cumbersome? What's the best way to do this?
Many thanks.
I can think of a few options off the top of my head here:
As you mentioned, total up the reputation points earned during the time range and calculate the percentile ranks based on that.
Track updates to reputation_total on a daily basis - so you have a table with user_id, date, reputation_total.
Add some new columns to the user table (reputation_total, reputation_total_today, reputation_total_last30days, etc) for each time range. You could also normalize this into a separate table (reputation_totals) to prevent you from having to add a new column for each time span you want to track.
Option #1 is the easiest, but it's probably going to get slow if you have lots of rows in your reputation transaction table - it won't scale very well, especially if you need to calculate these in real time.
Option #2 is going to require more storage over time (one row per user per day) but would probably be significantly faster than querying the transaction table directly.
Option #3 is less flexible, but would likely be the fastest option.
Both options 2 & 3 would likely require a batch process to calculate the totals on a daily basis, so that's something to consider as well.
I don't think any option is necessarily the best - they all involve different tradeoffs of speed/storage space/complexity/flexibility. What you do will ultimately depend on the requirements for your application of course.
I don't see why that would be too overly complex. Generally all you would need is to add to your WHERE clause a query that limits results like:
WHERE DatePosted between #StartOfRange and #EndOfRange
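Once the per-user totals for a time span are available (for example from a SUM grouped by user_id with a date filter like the one above), the percentile rank itself is a simple count. A minimal sketch; the function name percentileRank and the array shape are assumptions:

```php
<?php
// $totals: user_id => reputation earned in the chosen time span (assumed shape,
// e.g. built from a grouped, date-limited query on the reputation table).
// Returns the percentage of users with a strictly lower total.
function percentileRank(array $totals, int $userId): ?float
{
    if (!array_key_exists($userId, $totals)) {
        return null; // user earned nothing in this span, or is unknown
    }
    $mine = $totals[$userId];
    $lower = 0;
    foreach ($totals as $total) {
        if ($total < $mine) {
            $lower++;
        }
    }
    return $lower / count($totals) * 100;
}

$totals = [1138 => 40, 7 => 10, 8 => 25, 9 => 90];
echo percentileRank($totals, 1138) . "\n"; // 2 of 4 users are lower: prints 50
```

Note that users with no rows in the time span will be missing from such a query result, so you may want to add them with a total of 0 before ranking.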
