I'm wondering: what are the chances of getting 100 using mt_rand(1,100)?
Are the chances 1 in 100? Does that mean I'll get at least one 100 if I "roll" 100 times?
I've been wondering this for a while, but I can't find an answer.
The reason I ask is that I'm trying to calculate how many times I have to roll in order to get 100 guaranteed.
<?php
$roll = mt_rand(1,100);
echo $roll;
?>
Regards Dennis
Are the chances 1 in 100? Does that mean I'll get at least one 100 if I "roll" 100 times?
No, that's not how random number generators work. Take an extreme example:
mt_rand(1, 2)
One would assume that over a long enough time frame the number of 1s and the number of 2s would be the same. However, it is perfectly possible to get a sequence of 10 consecutive 1s. Just because it's random doesn't mean that a specific number must appear; if that were the case, it would no longer be random.
I'm trying to calculate how many times I have to roll in order to get 100 guaranteed.
Mathematically, there is no number of rolls at which a 100 is guaranteed to appear in the sequence. Each roll is independent, and on each roll there is a 99/100 chance that it won't be a 100.
For two rolls that's (99/100)^2, so about a 98% chance of seeing no 100. For 100 rolls it's about 37%. In fact, you need about 230 rolls before the chance of seeing no 100s drops below 10%, and about 459 rolls before it drops below 1%.
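You can sanity-check those figures with a couple of lines of PHP (a minimal sketch; the roll counts are just the examples from above):
<?php
// Chance of seeing no 100 in $n rolls of mt_rand(1, 100) is (99/100)^$n.
foreach (array(1, 2, 100, 230, 459) as $n) {
    printf("%3d rolls: %5.1f%% chance of no 100\n", $n, 100 * pow(0.99, $n));
}
// 1 roll: 99.0%, 2: 98.0%, 100: 36.6%, 230: 9.9%, 459: 1.0%
?>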
The probability of getting 100 from a single call to this function is 1/100; however, there is no guarantee of getting a 100 when you call it 100 times. You have to look at a much bigger sample. For example, if you call this function 100,000,000 times, there is a good chance that 100 will come up roughly 1,000,000 of those times.
This can be answered in a better way if you let us know about your use case in more detail.
"1 out of 100 rolls" is just a statistical way of describing it. Though the chance is 1% (meaning 1 out of 100), it doesn't mean you will really get exactly one 100 in every 100 rolls; it's a matter of chance.
mt_rand uses the Mersenne Twister to generate pseudo-random numbers that are said to be uniformly distributed. So if you set min and max values, the result should (most likely) also be uniformly distributed.
So: you can only talk about the probability of getting a number in the given range, and about an expected number of tries until you get a specific number, or all numbers in the range.
This means: there is no number of rolls that guarantees you will see a specific number at least once.
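To illustrate "expected number of tries": a run of mt_rand(1, 100) until the first 100 takes about 100 rolls on average (a geometric distribution with p = 1/100), but any single run can take far more or far fewer. A small simulation sketch:
<?php
// Average the number of rolls until a 100 appears, over many runs.
$trials = 10000;
$total = 0;
for ($t = 0; $t < $trials; $t++) {
    $rolls = 0;
    do {
        $rolls++;
    } while (mt_rand(1, 100) !== 100);
    $total += $rolls;
}
printf("average rolls until a 100: %.1f\n", $total / $trials); // ~100
?>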
Related
This is a bit of a math question, and because I'm quite weak at math :( I can't figure it out.
I have an application that must "randomly" decide whether you won or not, with a maximum number of daily winners. The problem is that I don't want to use a simple x% chance of winning, because that might result in 20 people winning at the start of the day, after which everyone keeps losing. Is there a generic formula for this?
tl;dr
I have x gifts (x = 20)
The user must know immediately if he won or not (can't do it at the end of the day)
And I want to spread them randomly throughout the day. Is there a generic function/script?
After some suggestions in the comments, I could settle for either:
a solution that assumes a predictable number of daily contestants (I'll just make a rough guess for the first few days and adjust it accordingly)
a solution considering the time of day, the gifts won so far, and the remaining gifts
Any ideas?
There is no math question here, not really, just some decisions that you need to make.
One possibility is to make the probability of winning X/N, where N is the expected number of visitors, until the gifts run out for that day. It is random, so on some days the gifts might be exhausted early. So what? That is how probability works. Extreme imbalances are unlikely. For example, say you have 20 gifts and 1000 visitors on an average day. The probability that the gifts will be exhausted by the 500th visitor is a binomial probability: the probability of at least 20 successes in 500 trials where the probability of success is 20/1000 = 0.02. This works out to be just 0.003.
On days when there are unclaimed gifts -- increase the gift count for the next day and correspondingly increase the probability of winning. If you spin it the right way, this could increase interest in the game in sort of the same way that people buy more lottery tickets on days when a jackpot goes unclaimed.
Note that essentially the same idea can be implemented at different time resolutions. For example, use 4-hour time slots in place of whole days (with X and N adjusted accordingly). This will guarantee a more even spread of the gifts throughout the day (but to pull it off, you might need to take into account that the expected number of visitors in a 4-hour time slot is unlikely to be constant over the course of a day; different time slots might need different denominators).
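A minimal PHP sketch of the X/N scheme, using the example numbers above; loadRemainingGifts() and saveRemainingGifts() are hypothetical placeholders for whatever storage (database, cache) the application actually uses:
<?php
define('DAILY_GIFTS', 20);      // X: gifts per day
define('DAILY_VISITORS', 1000); // N: estimated visitors per day

function visitorWins() {
    $remaining = loadRemainingGifts();      // hypothetical storage read
    if ($remaining <= 0) {
        return false;                       // today's gifts are exhausted
    }
    // Each visitor wins with probability X/N = 20/1000 = 2%.
    if (mt_rand(1, DAILY_VISITORS) <= DAILY_GIFTS) {
        saveRemainingGifts($remaining - 1); // hypothetical storage write
        return true;
    }
    return false;
}
?>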
I have an issue. I am using the Facebook Score API, but by default it only sorts scores descending, so for example 1000 is higher than 10. My problem is that my scores are based on time, so in my case 10 is better than 1000. And I don't really want to do it manually, which would require looping over every Facebook friend, seeing if they have a score, caching it in an object if they do, and then reverse-sorting.
So I am wondering if there is some way that I could make 10 or 6 or whatever a larger number than 1000 (so basically large numbers become small, and small numbers become large), which could then be reversed. I can do something at both ends (before the scores are posted, and when I retrieve them), but they have to remain numbers.
Any ideas if this is possible?
It cannot be a decimal or a negative number. The scores will never be higher than 100,000, so the range is basically 1-100000.
If 100,000 is the highest number the score can be, then store the score as 100000 - actual_score. Later, you can retrieve the actual score by doing the same operation: 100000 - recorded_score.
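In PHP that could look like the following sketch (MAX_SCORE is an assumption based on the 100,000 cap mentioned above):
<?php
define('MAX_SCORE', 100000);

// Invert times so smaller (better) times become larger stored scores.
function encodeScore($time) {
    return MAX_SCORE - $time;   // a time of 10 is stored as 99990
}

function decodeScore($stored) {
    return MAX_SCORE - $stored; // the same operation recovers the time
}

// 10 -> 99990 and 1000 -> 99000, so Facebook's descending sort now
// puts the 10-second score first.
?>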
Web application compares pairs of sets of positive integers. Each set has only unique values, no greater than 210 000 000 (fits into 28 bits). Up to 5 000 000 values in each set.
Comparing sets A & B, I need three result sets: "unique to A", "unique to B", and "common to A & B". A particular sub-task is to answer the question "is number N present in set S?"
So far the project runs within the limited resources of shared hosting, under a LAMP stack. The quick'n'dirty solution I came up with was to outsource the job to the hosting's MySQL, which has more resources: a temporary table for each set, with the numbers in the only column, which is the primary index. Only rarely are the sets small enough to fit into engine=Memory, which is fast. It works, but it is too slow.
I'm looking for a way to keep a set like this in memory, efficient for the task of searching for a particular number within it, while keeping the memory footprint as low as possible.
I came up with the idea of coding each set as a bit mask of 2^28 bits (32 MB). A number present in the set = a 1 bit set. 5 million numbers = 5 million bits set out of 210 million. Many zeroes == should compress effectively?
It seems like I'm reinventing the wheel. Please direct me to a "well-known" solution for this particular case of binary compression. I have read about Huffman coding, which seems not to be the right solution, as its focus is size reduction, while my task requires many searches over a compressed set.
Update: I just found an article on Golomb coding and an example of its application to run-length encoding.
There is a standard compression technique for representing large sets of integers in a range, which allows for efficient iteration (so it can easily do intersection, union, set difference, etc.) but does not allow random access (so it's no good for "is N in S"). For this particular problem, it will reduce the dataset to around seven bits per element, which works out to around 4.5MB for a set of 5,000,000 values. In case it's useful, I'll describe it below.
Bit-vectors of size 210,000,000 bits (26MB each, roughly) are computationally efficient, both to answer the "is N in S" query, and for bitwise operations, since you can do them rapidly with vectorized instructions on modern processors; it's probably as fast as you're going to get for a 5,000,000-element intersection computation. It consumes a lot of memory, but if you've got that much memory, go for it.
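For reference, here is a bare-bones PHP sketch of such a bit-vector (the class name and sizes are illustrative, not a production implementation; PHP's & operator ANDs two strings byte-wise, which is what makes the intersection a one-liner):
<?php
class BitSet
{
    public $bits;

    public function __construct($maxValue)
    {
        // One bit per possible value: 210,000,000 bits ~ 26 MB.
        $this->bits = str_repeat("\0", (int)($maxValue / 8) + 1);
    }

    public function add($n)
    {
        $byte = $n >> 3;
        $this->bits[$byte] = chr(ord($this->bits[$byte]) | (1 << ($n & 7)));
    }

    public function contains($n)
    {
        return (bool)((ord($this->bits[$n >> 3]) >> ($n & 7)) & 1);
    }

    // "common to A & B": a single byte-wise string AND.
    public function intersect(BitSet $other)
    {
        $result = clone $this;
        $result->bits = $this->bits & $other->bits;
        return $result;
    }
}

$a = new BitSet(210000000);
$a->add(12345);
var_dump($a->contains(12345)); // bool(true)
var_dump($a->contains(12346)); // bool(false)
?>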
The compression technique, which is simple and just about optimal if the sets are uniformly distributed random samples of the specified size, is as follows:
1. Sort the set (or ensure that it is sorted).
2. Set the "current value" to 0.
3. For each element in the set, in order:
a. subtract the "current value" from the element;
b. while that difference is at least 32, output a single 1 bit and subtract 32 from the difference;
c. output a single 0 bit, followed by the difference encoded in five bits;
d. set the "current value" to one more than the element.
To justify my claim that the compression will result in around seven bits per element:
It's clear that every element will occupy six bits (a 0 plus a five-bit delta); in addition, we have to account for the 1 bits in step 3b. Note, however, that the sum of all the deltas is at most the largest element in the set, which cannot be more than 210,000,000; consequently, we cannot execute step 3b more than 210,000,000/32 times. So step 3b will account for less than seven million bits, while step 3c will account for 6 * 5,000,000 = 30 million bits, for a total of around 37 million bits, or 7.4 bits per element (in practice, it will usually be a bit less than this).
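Here is a minimal PHP sketch of steps 1-3 and the matching decoder. For readability it collects bits into a string of '0'/'1' characters instead of packing them, so it demonstrates the scheme rather than the real memory savings:
<?php
function encodeSet(array $set)
{
    sort($set);
    $bits = '';
    $current = 0;
    foreach ($set as $element) {
        $diff = $element - $current;              // step 3a
        while ($diff >= 32) {                     // step 3b: overflow 1 bits
            $bits .= '1';
            $diff -= 32;
        }
        // step 3c: a 0 bit, then the remaining difference in five bits
        $bits .= '0' . str_pad(decbin($diff), 5, '0', STR_PAD_LEFT);
        $current = $element + 1;                  // step 3d
    }
    return $bits;
}

function decodeSet($bits)
{
    $set = array();
    $current = 0;
    $i = 0;
    $len = strlen($bits);
    while ($i < $len) {
        $diff = 0;
        while ($bits[$i] === '1') {               // undo step 3b
            $diff += 32;
            $i++;
        }
        $i++;                                     // skip the 0 marker
        $diff += bindec(substr($bits, $i, 5));    // undo step 3c
        $i += 5;
        $element = $current + $diff;
        $set[] = $element;
        $current = $element + 1;
    }
    return $set;
}

$set = array(3, 4, 70, 500);
var_dump(decodeSet(encodeSet($set)) === $set);    // bool(true)
?>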
Can someone tell me about this performance issue?
I've got 2 arrays.
I need to pick 5 numbers from these 2 arrays and run my logic on them:
the first array has 5 numbers, out of which I need to pick 3,
and the second array has 4 numbers, out of which I need to pick 2.
So, taking this into consideration, 5C3 = 10 and 4C2 = 6,
which means 60 iterations for a single case.
Is the method I'm using the right approach?
Is there any performance issue with this kind of iteration?
If you have to go through the whole array and pick numbers, then there is no optimization for that. The execution time depends on the size of the arrays: the bigger the size, the higher the execution time.
Although, if you know that it will always be exactly 5 numbers from two arrays whose elements will not change, then I think you could generate all the possible valid combinations in advance, store them in a database or file, and return a random one (if a random choice is what you are looking for). In this case, you will achieve some optimization.
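For illustration, a PHP sketch that enumerates all 5C3 * 4C2 = 60 combinations up front (the array contents are made up):
<?php
// All ways to choose $k elements from $items, preserving order.
function combinations(array $items, $k)
{
    if ($k === 0) {
        return array(array());
    }
    $result = array();
    foreach ($items as $i => $item) {
        $rest = array_slice($items, $i + 1);
        foreach (combinations($rest, $k - 1) as $tail) {
            $result[] = array_merge(array($item), $tail);
        }
    }
    return $result;
}

$first  = array(1, 2, 3, 4, 5); // pick 3: 5C3 = 10 ways
$second = array(6, 7, 8, 9);    // pick 2: 4C2 = 6 ways

$all = array();
foreach (combinations($first, 3) as $a) {
    foreach (combinations($second, 2) as $b) {
        $all[] = array_merge($a, $b);
    }
}
echo count($all); // 60
?>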
If I have a system where a hash is generated from a total of about a million possibilities, and there's a 10% chance of a collision, should I worry about the generating algorithm running 5 times?
I have a system similar to jsfiddle, where a user can "save" a file on my server. I'm using the alphabet '23456789abcdefghijkmnopqrstuvwxyz', which is 33 chars, and the filename is 4 chars long, for a total of 33^4 = 1,185,921 possibilities.
The "filename" is generated randomly, and if there's a collision it reruns to get another filename. Using a birthday paradox calculator, I can see that after 500 entries I have a 10% chance of a collision.
What are the chances that I'll get a collision more than 5 times in a row? What about 4?
Is there any way to figure this out? Should I worry about it? What happens after 5000 entries?
Is there a program out there that can figure this out with arbitrary inputs?
I don't think the birthday paradox calculations apply here. There's a difference between the odds that 500 random numbers out of 1,185,921 are all different and the odds that one new number collides once you already have 500 known unique numbers.
If you have 500 assigned numbers and generate a new number at random, it will have odds of 500/1185921 of being a collision. With 500 names taken, the chances of 4 collisions in a row are (500/1185921)^4 < 10^-13. With 5000 existing file names, the odds of a new name being a collision are 5000/1185921, and the chance of 4 collisions in a row is < 10^-9.
My math is a little rusty so bear with me.
The chance of getting x collisions in a row is simply:
(chance of collision)^x
where the chance of a collision is:
entries / space (which is 500/1185921, or about 0.04%).
You can see from the above that this gets worse with more entries (and better with a bigger space).
Also note that the birthday paradox is perhaps not quite what you want: the 10% chance is the chance that some pair of entries has collided, not the chance of a collision for the next entry.
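A quick PHP sanity check of the numbers in both answers above:
<?php
$space = pow(33, 4);                  // 1185921 possible 4-char names

foreach (array(500, 5000) as $entries) {
    $p = $entries / $space;           // chance the next name collides
    printf("%d entries: next-name collision %.4f%%, 4 in a row %.2e\n",
           $entries, 100 * $p, pow($p, 4));
}
// 500 entries:  ~0.0422%, ~3.2e-14
// 5000 entries: ~0.4216%, ~3.2e-10
?>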