I run a digg-like website that promotes content to the front page when it reaches a certain number of votes. Right now it doesn't take date submitted into consideration.
I'd like to use a simple algorithm that just uses the number of votes and the date submitted to determine whether something should be promoted. I don't want the algorithm to do anything more complex than that (such as iterating over all the vote dates).
EDIT:
Shouldn't the formula be something like this:
30 / (days between post date and now) * (vote count) = weighted vote
Here are some scenarios that seem reasonable for my site. They indicate that the algorithm needs to be more lenient for older items (since older items are less discoverable on the site):
30 / 30 * 30 = 30 (30 days old, promoted with 30 votes)
30 / 5 * 15 = 90 (5 days old, promoted with 15 votes)
30 / 1 * 10 = 300 (1 day old, promoted with 10 votes)
How can the formula be modified so the above 3 give close to the same min weighted vote required for promotion?
You can use the difference between the current date and the submission date to weight the votes.
(threshold - (days between post date and now))/threshold * (vote count) = weighted vote
In code:
$weightedVote = ($threshold - $daysOld) / $threshold * $voteCount;
This would have the effect of eliminating posts older than the threshold from consideration. For example, a post 10 days old would have its votes multiplied by 20/30.
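As a rough sketch of that weighting, the formula can be wrapped in a function. The name weightedVote() and the clamping at zero for posts older than the threshold are assumptions made for illustration, not part of the formula above.

```php
<?php
// Sketch of the linear age weighting described above.
// weightedVote() and the clamp at zero are illustrative assumptions.
function weightedVote(int $voteCount, int $daysOld, int $threshold = 30): float
{
    // Days left inside the threshold window, never below zero,
    // so posts older than the threshold get a weight of 0.
    $daysLeft = max(0, $threshold - $daysOld);
    return $daysLeft * $voteCount / $threshold;
}

// A 10-day-old post keeps 20/30 of its votes.
echo weightedVote(30, 10), "\n";
```

A post would then be promoted when weightedVote() reaches whatever minimum the site chooses.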
Is there a reason why you are assigning an arbitrary number to content when the condition is vote based? I mean - it seems you'd be better off weighing the users and their votes rather than giving a piece of content more or less votes based on the date.
I wrote some pretty mean voting software for a company that had $10,000 + contests and our algorithm considered the user and their history of behavior, which ended up filtering out a lot of spam votes.
This sounds complex but it is not really.
As for your balancing code -
You want 1 day old content to be promoted at 10 votes, where a 30 day item requires 30 votes?
Or do you mean 1 day content with 10 votes is promoted, while a 30 day item with, say, 6 votes could be promoted because it is older and less likely to be seen, so the vote tolerance is reduced?
function daysDifference($endDate, $beginDate)
{
    $date_parts1 = explode("-", $beginDate);
    $date_parts2 = explode("-", $endDate);
    $start_date = gregoriantojd($date_parts1[1], $date_parts1[2], $date_parts1[0]);
    $end_date = gregoriantojd($date_parts2[1], $date_parts2[2], $date_parts2[0]);
    return $end_date - $start_date;
}

$diff = 30 - daysDifference(date("Y-m-d"), $postdate);
if ($diff > 0)
    $weight = 30 / $diff + $votes;
else
    $weight = $votes;
So, suppose the daysDifference function returned 26 and there were 4 votes originally. This would read 30 / (30 - 26 = 4) = 7.5 + 4, so 11.5 votes total.
For a one day old item with 10 votes, it would read 30 / (30 - 1 = 29) = 1.03 + 10. So 11.03 total.
Roughly the same for this sample, but will vary for others.
The if means that content 30 or more days old gets no bonus: its weight is just its actual vote count.
I could have just misunderstood your needs though.
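If the calendar extension (which provides gregoriantojd) is not available, the same day difference can be sketched with PHP's built-in date classes. The helper names daysBetween() and ageAdjustedVotes() are hypothetical; the weighting rule is the same as in the snippet above.

```php
<?php
// Whole days from $beginDate to $endDate (both 'Y-m-d'); negative if reversed.
function daysBetween(string $beginDate, string $endDate): int
{
    $begin = new DateTimeImmutable($beginDate);
    $end = new DateTimeImmutable($endDate);
    // %r prints a minus sign for negative intervals, %a the total day count.
    return (int) $begin->diff($end)->format('%r%a');
}

// Same rule as the snippet above: add 30 / (30 - age) while the post is
// younger than 30 days, otherwise leave the vote count unchanged.
function ageAdjustedVotes(int $votes, int $daysOld): float
{
    $diff = 30 - $daysOld;
    return $diff > 0 ? 30 / $diff + $votes : (float) $votes;
}

echo ageAdjustedVotes(4, daysBetween('2014-01-01', '2014-01-27')), "\n";
```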
Related
I'm trying to calculate a sort of "daily random number" in a range, which can't be guessed, for each user on our site, but I can't figure out how to do it.
I don't want a truly random number either; it must be a number calculated in PHP, not an additional database field or anything similar.
I thought of a function that takes the user's ID and the day of the year and calculates this number.
Example:
USERID: 12345, Range: 0-7 (constant values for every user)
DayOfYear: 250 (change every day)
Then something like: ((12345 + 250) MODULO 8) (so I get a range from 0 to 7 for each user). The problem is that the same number comes up every 8 days, in a loop that users will spot very quickly.
Each user doesn't necessarily need a different number every day. Even the same number for a few days would be OK, but the users must not all have the same number. Most importantly, there must be no loop, so a user can't guess their daily number.
Thank you for your help.
There are so many answers to this question and none would be the best ... but I'm in a funny mood:
$id = hexdec(substr(md5($userId . date('z')), 0, 3)) % 8;
Use the md5-function to get a hex-string from a string. Then use a part of this string to calculate the mod 8.
For the next 10 days the id will be 5,7,5,3,6,0,2,0,2,4 when using your user id
But guessing a number between 0 and 7 isn't hard, so don't use this for security ...
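A variation on the same idea, shown here only as a sketch: a keyed hash (hash_hmac) replaces the plain md5, so the sequence cannot be recomputed by someone who knows the user ID and the date, and the full date is used so the values do not repeat each year. The function name dailyNumber() and the $secret parameter are assumptions, not part of the answer above.

```php
<?php
// Hypothetical helper: a per-user number in 0..$range-1 that changes daily.
// $secret is an assumed server-side key that is never shown to users.
function dailyNumber(int $userId, string $date, string $secret, int $range = 8): int
{
    // Keyed hash of "user|date"; without the key it cannot be recomputed.
    $hash = hash_hmac('sha256', $userId . '|' . $date, $secret);
    // Use the first 7 hex digits (28 bits) so hexdec() stays an integer.
    return hexdec(substr($hash, 0, 7)) % $range;
}

echo dailyNumber(12345, date('Y-m-d'), 'replace-with-a-private-key'), "\n";
```

The same caveat applies: with only 8 possible values, a guess is right one time in eight.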
I have several objects, each object should be rated by [q]Quality, [v]Value and [s]Suitability by a user.
Currently I am calculating the overall average for each object as Score = (q + v + s) / 3. That said, I run into the common issue where an object with 1 rating of 10,10,10 is rated higher than an object with 3 ratings of 10,9,9 | 9,10,10 | 10,10,8. Not good!
I want to score each object by a total. Is there any algorithm that would be best suited? The end result will be in a PHP environment. An example could be roughly what Awwwards has currently for each of its websites listed.
I've looked around and can see similar requirements with the Bayesian method being suggested, but I'm not sure how it would match my requirements, given that it needs a 'minimum' to be known.
Digging around a bit more, I've found this - applied to some SQL would this work? Any issues?
<?php
$avg_num_votes = 17; // Average number of reviews on all objects
$avg_rating = 4.5; // Average review for all objects
$this_num_votes = 17; // Number of reviews on this object
$this_rating = 4; // Review for this object
$bayesian_rating = (($avg_num_votes * $avg_rating) + ($this_num_votes * $this_rating)) / ($avg_num_votes + $this_num_votes);
echo $bayesian_rating;
//(FR) = ((av * ar) + (v * r)) / (av + v)
//(FR) = ((17 * 4.5) + (17 * 4)) / (17 + 17)
//(FR) = (76.5 + 68) / 34
//(FR) = 144.5 / 34
//(FR) = 4.25
?>
Laplace smoothing is simple to implement, although you have to choose one parameter. It is what is being called "the Bayesian estimate" or "the Bayesian method" although that is not quite right, and there are many other techniques that more accurately implement Bayesian updating for different choices of prior distributions.
Choose M, called the number of "minimum" ratings by some. Calculate the average rating A over all categories. Give each object M average ratings in addition to the users' ratings. If you change M, this changes how much you trust a small sample. Larger values of M give less credit to small numbers of ratings.
You don't need to adjust this based on having three scores. Call the sum the rating.
For example, suppose the average rating anywhere is 25, you have chosen M=3, and you are comparing one object with 1 rating of 30 to an object with 7 ratings of 27. For the first, you calculate a smoothed rating of (30*1 + 25*3)/(1+3) = 26.25. The smoothed rating of the second is (27*7+25*3)/(7+3) = 26.4. So, the second object would have a slightly higher smoothed rating than the first.
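The worked example above can be checked with a few lines of PHP. This is a minimal sketch; the function name smoothedRating() and its parameter names are made up for illustration.

```php
<?php
// Smoothed rating: pad the object's ratings with $m ratings at the global average.
// $ratingSum is the sum of all ratings the object received (each rating being q + v + s).
function smoothedRating(float $ratingSum, int $ratingCount, float $globalAverage, int $m): float
{
    return ($ratingSum + $globalAverage * $m) / ($ratingCount + $m);
}

echo smoothedRating(30 * 1, 1, 25, 3), "\n"; // 26.25
echo smoothedRating(27 * 7, 7, 25, 3), "\n"; // 26.4
```

In practice the global average and the per-object sum and count would come from the database, and M stays a tuning choice.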
I have an activity log table which tracks the timestamp of a user's action. I need to be able to identify users who perform more actions in a given time period than there are minutes in that time period, with at least 10 actions required for a "block of actions" to be identified.
E.g.
Flagged
11 activities in 10 minutes
25 activities in 20 minutes
Not flagged
9 activities in 5 minutes
23 activities in 30 minutes
I need to be able to identify the largest block which satisfies these conditions, so for example if a user performs the following actions:
Minutes Action_Count
1 3
2 0
3 0
4 3
5 0
6 3
7 0
8 1
9 0
10 0
11 0
12 1
13 3
Even though the actions from minutes 1-8 will be flagged as there are 10 in a "less than 10 minute period", the actions from minutes 12 and 13 should also be included as they constitute 14 actions in 13 minutes, even though the action at 12 minutes is back inside the threshold.
The data set per user is likely to be approximately 100 timestamps over a 7 day period, so there will likely be several blocks of actions per user, corresponding to when they are active.
I can identify blocks of actions which occur within the same minute with the following:
$timestamps = array(1392700382,1392700458,1392700486,1392700612,1392700619,1392700636,1392700648,1392700671,1392700679,1392700701,1392700860,1392815451,1392815486,1392815499,1392815532,1392815539,1392815680,1392815699,1392815763,1392815851,1392815972,1392816075,1392903882,1392903950,1392904029,1392904181,1392904259,1392904377,1392904402,1392904411,1392904437,1392904445,1392904469,1392904638,1392904735,1392988830,1392988858,1392988889,1392988917,1392988980,1392989016,1392989078,1392989108,1392989140,1392989167,1392989203,1392989251,1392989393,1392989401,1392989408,1392989415,1392989511,1393065019,1393065352,1393065448,1393066105,1393066110,1393066114,1393066136,1393066139,1393066144,1393066148,1393066203,1393114548,1393114563,1393114696,1393114697,1393114717,1393114723,1393114742,1393114748,1393114753,1393114785,1393114824,1393204378,1393204383,1393204387,1393204391,1393204408,1393204414,1393204419,1393204424,1393204474);
$elements = array();
foreach ($timestamps as $timestamp) {
    $oneMinuteAgo = $timestamp - 60;
    $elements[] = $timestamp;
    $postsInLastMinute = array_filter($elements, function ($value) use ($oneMinuteAgo) {
        return $value > $oneMinuteAgo;
    });
    echo implode(', ', $postsInLastMinute)."\n";
}
(see output here: http://pastebin.com/LDEQWMxn)
However, I'm not sure how this information helps me, or even whether it is the right way to approach the problem.
Geobits has already mentioned the naive approach and if you are dealing with about 100 time stamps, that seems reasonably fast. I wouldn't bother putting the stamps into one-minute buckets first. Calculate the rate activities / time_span and find the longest stretch based on number of activities or time span where the rate exceeds 1/60. (Or just find out whether such a time span exists.)
In your example data (those in the code example), activities happen in chunks of at most 20 minutes over a period of a week. These chunks have large gaps between them. You can use the nature of the data to improve the naive algorithm by looking at your time stamps chunk-wise. This has the benefit that you can rule out sequences with fewer than 10 timestamps right away. Also, the naiveté of the approach is less dominant, because your inner loop has to do only n - 10 iterations where n rarely exceeds 20.
Here's the approach in pseudocode. (I'm not familiar with PHP.)
flag_piecewise(time[])
    gap = 60 * 60    # choose appropriate minimum gap, e.g. 1h
    start = 0
    curr = 1
    while true
        if (curr == len(time) || time[curr] - time[curr - 1] > gap)
            if (flag_range(time[start:curr])) return true
            if (curr == len(time)) break
            start = curr
        curr++
    return false
Here flag_range is the implementation of the naive approach. In a quick test on your sample data, I got a speed-up of about 20. (The naive approach was fast enough, I think, but you get the speed-up without adding much complexity.)
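For readers who want it in PHP, here is one possible translation of the pseudocode, offered as a sketch. The naive check in flagRange() is an assumed reading of the rule (at least 10 actions, and more actions than elapsed minutes); timestamps are assumed to be in seconds and sorted ascending, as in the sample data.

```php
<?php
// Naive check (an assumed reading of the rule): does any run of at least
// $minCount consecutive timestamps contain more actions than elapsed minutes?
function flagRange(array $times, int $minCount = 10): bool
{
    $n = count($times);
    for ($i = 0; $i < $n; $i++) {
        // Start $j far enough ahead that the run holds at least $minCount actions.
        for ($j = $i + $minCount - 1; $j < $n; $j++) {
            $actions = $j - $i + 1;
            $minutes = ($times[$j] - $times[$i]) / 60;
            if ($actions > $minutes) {
                return true;
            }
        }
    }
    return false;
}

// Split the sorted timestamps at gaps longer than $gap seconds and
// run the naive check on each chunk separately.
function flagPiecewise(array $times, int $gap = 3600): bool
{
    $n = count($times);
    $start = 0;
    for ($curr = 1; $curr <= $n; $curr++) {
        if ($curr === $n || $times[$curr] - $times[$curr - 1] > $gap) {
            if (flagRange(array_slice($times, $start, $curr - $start))) {
                return true;
            }
            $start = $curr;
        }
    }
    return false;
}
```

This only answers whether a flagged block exists. Finding the largest block would mean recording the best ($i, $j) pair in flagRange() instead of returning early.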
I have found an Excel file online that helps calculate a drug's half-life and determine how much of a given drug is likely to remain in one's system, based on:
The half-life in hours
The quantity of the drug taken per dose
How much is in your system from previous doses
Below is a screenshot of the Excel file showing both the Output with the calculations already performed and also shows the actual Math that is involved for each day...
Columns A, B, C, D, E, etc. each represent one day (24 hours)
Column D Row 6 is the half-life for a drug in hours
From the image below you can see that the calculation is performed and that value is then used in the next day's equation
OK, so I am not that knowledgeable about math. Outside of basic addition, subtraction, multiplication, and division, I do not know much more than that.
My goal is to create a tool similar to this Excel file but with PHP, I am not sure how to do so but I think all the answers are right here in the image above as far as the math portion.
Looking at D3 I can see that it takes...
D2's value which is 30 in the Image
It then Adds C3's value
Then Multiplies that by 1/2
I am not sure what the ^ does though?
Then it Divides 24 hours by D6 which holds the Hour Number
In my PHP I would like to have a Function that I can pass an array of Data, so let's say I pass in an Array with...
the Number of Days to calculate (my image shows like 4 days so if I pass 10 days, it will shows the daily results up to 10 days)
Then an array of the daily amount consumed in mg (so in my image this would be an array with 30,0,30,0,0,0,0)
Then I would also pass in the Half Life in Hours, so in my image the drug used has a half-life of 4.5 hours
This function would then return an array with the data for each day, should show the remaining mg in one's system for each day, I can then use this data result to build charts, graphs, or simply a List
I would appreciate any help to get me started, I think I can pull this off on my own but I need help getting the math portion 100%, above I break down the equation as I see it, please help me understand better for example I am not sure what ^ does in the equation or how to do it in PHP
I hope my question is not too vague, I will come back with more specific once I get a good start on this but please help if you can so far, thank you for reading.
function calcHalfLife( $mgTaken , $drugHalfLifeHours , $day ) {
    //total number of half-lifes elapsed
    $total_half_lifes = ($day * 24) / $drugHalfLifeHours;
    //total reduction in dosage
    $reductionFactor = pow( 0.5 , $total_half_lifes );
    //return the current dosage in the person's system
    return round( $mgTaken * $reductionFactor , 2 );
}
The above function should do the trick. Pass in the MG dosage taken, the number of hours for that drug's half-life, and the number of days since the dosage.
The function calculates the number of half-lives experienced by the drug by taking the number of days * 24 hours in a day and dividing that by the total number of hours it takes for a half-life of that drug. This is the number of times the drug's dosage would be cut in half.
It then takes 0.5 (50% as a decimal) and raises it to the power of the total number of half-lives experienced by the drug (this is what ^ means in the spreadsheet formula; in PHP it is pow()). So 1 half-life would be 0.5, 2 would be 0.5 * 0.5 = 0.25, and so on. This is the fraction of the drug left in the person's system.
It then multiplies that remainder by the original amount taken and rounds it off to 2 decimal places. The return floating point value will be a representation of the remaining MG dosage of the drug in the person's system.
If you're looking to build a function/system that allows you to calculate a daily dosage of that drug (ie: the person takes the same dosage each day as opposed to once) that is a very different formula and function, but the basic principle is the same as the one I wrote for you here.
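For the multi-day case the question describes, one possible sketch follows the spreadsheet rule as the asker read it: each day's remainder is (that day's dose + the previous day's remainder) * 0.5 ^ (24 / half-life). The function name remainingByDay() is hypothetical, and the sketch assumes each dose is taken at the start of its day and the remainder is measured 24 hours later.

```php
<?php
// Hypothetical helper following the spreadsheet rule from the question:
// remaining today = (today's dose + yesterday's remainder) * 0.5 ^ (24 / half-life).
function remainingByDay(array $dailyDosesMg, float $halfLifeHours): array
{
    // Fraction of the drug that survives one full day.
    $decayPerDay = pow(0.5, 24 / $halfLifeHours);
    $remaining = 0.0;
    $result = [];
    foreach ($dailyDosesMg as $doseMg) {
        // Carry the unrounded value forward; round only what is reported.
        $remaining = ($doseMg + $remaining) * $decayPerDay;
        $result[] = round($remaining, 2);
    }
    return $result;
}

// Doses from the question's example, with a 4.5 hour half-life.
print_r(remainingByDay([30, 0, 30, 0, 0, 0, 0], 4.5));
```

The returned array has one entry per day, which can feed a chart or a simple list.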
Good luck ;)
I've been given the task of fixing up a bit of very messy code one of our clients had written by a dodgy freelancer... I'm not terribly keen on rewriting the whole thing, as it's been in use for some time and has only really caused problems now that they've asked for another feature. The database is utilised by various other parts of the program and this function is only a small portion of the overall system.
Essentially the purpose of the script is to manage all meetings in a virtual office.
They need a page to display "current" meetings. This was their original query.
$CURRENT_TIME=date("Gi");
mysql_connect(localhost,$USERNAME,$PASSWORD);
#mysql_select_db($DATABASE) or die( "Unable to select database");
$query="SELECT * FROM ROOM_1 WHERE
EVENT_ROOM LIKE'%$URLROOMNAME%'
AND EVENT_YEAR='$CURRENT_YEAR'
AND EVENT_MONTH='$CURRENT_MONTH'
AND EVENT_DATE='$CURRENT_DATE'
AND START_TIME<='$CURRENT_TIME'
AND END_TIME>='$CURRENT_TIME' ";
$result=mysql_query($query);
$num=mysql_numrows($result);
mysql_close();
The time is stored in the mysql table in an int field as a 24hour value. Eg 3:43pm is stored simply as 1543.
The new requirement was for a meeting to be able to have a setup time allowance.
The freelancer's... "ingenious" solution was to add another int field to the table and change this line
AND START_TIME<='$CURRENT_TIME'
to..
AND (START_TIME-APPEAR_START_TIME)<='$CURRENT_TIME'
Now whilst it may work for some meetings, it won't for others. E.g. a meeting starting at 1405 with a 20 minute setup allowance would result in 1385...
So I'm looking for a clever solution that allows me to leave the rest alone and just subtract the APPEAR_START_TIME field from the START_TIME column in the query, but in minutes.
Any ideas ?
If I understand you correctly, you have two integer values in the form of "HHMM", where HH is hours, and MM is minutes; and you want to calculate difference between two time values.
You can get HH with value / 100, and MM with value % 100. Then you can calculate delta for hours and minutes separately.
Hours delta is (HH values delta + probably 1 hour caused by minutes delta, if MM-start < MM-appear-start):
(START_TIME / 100 - APPEAR_START_TIME / 100) + (START_TIME % 100 - APPEAR_START_TIME % 100) / 60
Minutes delta is:
(START_TIME % 100 - APPEAR_START_TIME % 100) % 60
Then you can concatenate HH and MM parts of delta:
HOURS_DELTA * 100 + MINUTES_DELTA
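First of all, thanks Kel :)
I think this one is easier. It converts START_TIME to minutes, subtracts APPEAR_START_TIME (in minutes), and converts the result back to HHMM. In MySQL, use DIV for the integer divisions, since / returns a decimal:
(((START_TIME DIV 100)*60+(START_TIME%100) - APPEAR_START_TIME) DIV 60) * 100 + (((START_TIME DIV 100)*60+(START_TIME%100) - APPEAR_START_TIME)%60)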
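The same arithmetic can be sanity-checked in PHP before it goes into the query. This is a sketch only: the function name subtractMinutesFromHhmm() is made up, and it assumes the result stays on the same day (no wrap past midnight).

```php
<?php
// Hypothetical helper: subtract a minute allowance from an HHMM integer.
// Assumes the result does not cross midnight.
function subtractMinutesFromHhmm(int $hhmm, int $minutes): int
{
    // HHMM -> minutes since midnight, minus the allowance.
    $total = intdiv($hhmm, 100) * 60 + $hhmm % 100 - $minutes;
    // Minutes since midnight -> HHMM.
    return intdiv($total, 60) * 100 + $total % 60;
}

echo subtractMinutesFromHhmm(1405, 20), "\n"; // 1345
```

For the question's example, 1405 with a 20 minute allowance gives 1345 rather than 1385.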