I don't know if this is allowed, but I need clarification on a solution given to a question.
What is the best way to count page views in PHP/MySQL?
I have the exact same question. I just have no idea how the solution makes any sense; here is the solution:
$sample_rate = 100;
if(mt_rand(1,$sample_rate) == 1) {
$query = mysql_query(" UPDATE posts
SET views = views + {$sample_rate}
WHERE id = '{$id}' ");
// execute query, etc
}
Any help?
Here mt_rand() generates a random number between 1 and 100, so the probability of that number being 1 is 1/100.
If it generates 1, we increase the value of views by 100.
So, the effective increase in the database per view
= (probability of increasing views) * (increase in database)
= (1/100) * 100
= 1
So in the long run it will increase the database value by 1 per view, on average.
This is a trade-off between accuracy and speed, as MySQL queries are far more expensive than a PHP mt_rand() call.
For each user that views a page, a random number between 1 and $sample_rate (100) is generated. If the number equals 1, the views counter is incremented by the sample rate.
This is simply a sampling technique used to save resources, and it is common on larger websites.
If you are running a smaller operation, you should simply update the database each time the page is viewed, as opposed to using a sampling method.
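For reference, here is a minimal sketch of the same sampling idea using mysqli prepared statements instead of the old mysql_* functions; the posts table, views column and $id come from the snippet above, while the $mysqli connection is assumed:

$sample_rate = 100;

// Only about 1 request in $sample_rate touches the database; that request adds the full sample rate.
if (mt_rand(1, $sample_rate) === 1) {
    $stmt = $mysqli->prepare('UPDATE posts SET views = views + ? WHERE id = ?');
    $stmt->bind_param('ii', $sample_rate, $id);
    $stmt->execute();
}

On average the counter still grows by 1 per view, exactly as the explanation above works out.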
I have a betting script, which allows people to 'invest' in it. The table that stores which user has invested how much is named 'invested' and looks like this:
Every time someone places a bet and loses, for example, the script should share those losses out proportionally according to how much each user has invested, and likewise distribute the winnings if they win.
To clarify: since the sum of 'invested' is 0.6, if someone bets 0.6 and wins, both users in the example would lose their entire investment.
Let's say the bet amount is defined in $wager.
$win_lose is "1" for a win and "0" for a loss.
Any ideas on the PHP script for this?
I am CERTAIN that there is a more optimized query and procedure for this, but here is some off-the-cuff code:
//! loop through your rows relative to your game
Use the SQL from this answer to get your percentages: MySQL query to calculate percentage of total column
//! then, depending on the win/lose case, debit/credit the records:
// $gameresult is truthy when the bettor wins (the question's $win_lose), so investors are debited
$multiplier = ($gameresult) ? -1 * $wager : $wager;
// $investors maps player_id => that player's share of the total invested (from the percentage query)
foreach ($investors as $playerid => $stake) {
    $val = $multiplier * $stake;
    // round to a few decimals rather than whole units, and actually run the update
    $sql = "UPDATE invested SET invested = ROUND(invested + " . $val . ", 8) WHERE player_id = " . (int) $playerid;
    mysqli_query($link, $sql); // $link being your database connection
}
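For the percentage step linked above, a minimal hedged sketch (assuming the invested table has player_id and invested columns, and $link is the same mysqli connection) could be:

// Build $investors as player_id => share of the total pot (a fraction between 0 and 1)
$investors = array();
$result = mysqli_query($link,
    "SELECT player_id, invested / (SELECT SUM(invested) FROM invested) AS share FROM invested");
while ($row = mysqli_fetch_assoc($result)) {
    $investors[$row['player_id']] = (float) $row['share'];
}

That fills the $investors array the loop above expects, so each debit/credit stays proportional to the player's stake.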
I am interested in using this ranking class, based on an article by Evan Miller, to rank a table I have that has upvotes and downvotes. I have a system very similar to Stack Overflow's up/down voting for an events site I am working on, and by using this ranking class I feel the results will be more accurate. My question is: how do I order by the function 'hotness'?
private function _hotness($upvotes = 0, $downvotes = 0, $posted = 0) {
$s = $this->_score($upvotes, $downvotes);
$order = log(max(abs($s), 1), 10);
if($s > 0) {
$sign = 1;
} elseif($s < 0) {
$sign = -1;
} else {
$sign = 0;
}
$seconds = $posted - 1134028003;
return round($order + (($sign * $seconds)/45000), 7);
}
I suppose each time a user votes I could have a column in my table that has the hotness data recalculated for the new vote, and order by that column on the main page. But I am interested in doing this more on the fly, incorporating the function above, and I am not sure if that is possible.
From Evan Miller, he uses:
SELECT widget_id, ((positive + 1.9208) / (positive + negative) -
1.96 * SQRT((positive * negative) / (positive + negative) + 0.9604) /
(positive + negative)) / (1 + 3.8416 / (positive + negative))
AS ci_lower_bound FROM widgets WHERE positive + negative > 0
ORDER BY ci_lower_bound DESC;
But I would rather not do this calculation in the SQL, as I feel this is messy and difficult to change down the line if I use this code on multiple pages, etc.
Accessing the corresponding "Posts" table for anything (reading, writing, sorting, comparing, etc.) is extremely quick and thus relying on the database is the "most on-the-fly" alternative you have for non-temporary data storage (memory/sessions are still quicker but, logically, cannot be used to store this information).
You should be more worried about building a good ranking algorithm delivering the results you want (you are proposing two different systems, delivering different results) and working on making the whole code and the code-database communication as efficient as possible.
In principle, small pieces of code issuing simple, iterative queries offer the quickest and most reliable solution for this kind of situation. For example:
A ranking function (like the first one you propose, or any other built on the ranking rules you want) called every time a vote is cast. It writes to the corresponding column(s) in the "Posts" table (the simpler the query, the better: you can make the ranking system as complex as you wish, but try to rely on PHP rather than on queries). A sketch of this is given below.
Every time a comparison between posts is required, the "Posts" table is read with a simple SELECT ordering the records by ranking (you can have various "assessing columns", e.g. up-votes, down-votes, further considerations, but it is better to have one column holding the definitive ranking).
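A minimal sketch of that first point, assuming you add a hotness column to the posts table and have a PDO connection in $pdo (both of which are assumptions, not part of the question's class):

// Call this whenever a vote is registered for $postId.
function updateHotness(PDO $pdo, $postId, $upvotes, $downvotes, $postedTimestamp) {
    // Same formula as the question's _hotness(); here _score() is assumed to be upvotes - downvotes.
    $score = $upvotes - $downvotes;
    $order = log(max(abs($score), 1), 10);
    $sign  = ($score > 0) ? 1 : (($score < 0) ? -1 : 0);
    $hotness = round($order + ($sign * ($postedTimestamp - 1134028003)) / 45000, 7);

    $stmt = $pdo->prepare('UPDATE posts SET hotness = :hotness WHERE id = :id');
    $stmt->execute(array(':hotness' => $hotness, ':id' => $postId));
}

The front page then only needs SELECT ... FROM posts ORDER BY hotness DESC, with no per-request math.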
You are right, a query like this is rather messy and expensive as well.
Mixing PHP/MySQL on the fly is a bad idea as well, since you would have to select values for all posts, calculate hotness, and then select a list of the hottest ones. Extremely expensive.
You should consider saving at least part of your calculation to the database. The order component definitely should go to the database. It's always better to calculate something and save it just once on every save/update, instead of calculating it each time it is displayed. Try to benchmark how much time you save by calculating order on save/update instead of every time you calculate the hotness. The good thing is that order never changes unless someone upvotes/downvotes, which you save to the db anyway; the same goes for the sign.
Even if you save the sign to the db, you are still not able to avoid calculating on the fly entirely, due to the posted timestamp parameter.
I would see what difference it makes and where, and recalculate hotness with a CLI script every x amount of time for the records where this is crucial, and every y amount of time where it makes less of a difference.
Taking this approach you will be recalculating hotness only when necessary. This will make your application much more efficient.
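A rough sketch of such a CLI script (the upvotes, downvotes, posted, updated_at and hotness columns, and the connection details, are all assumptions here):

<?php
// recalc_hotness.php - run from cron, re-ranking only recently touched posts
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

$posts  = $pdo->query("SELECT id, upvotes, downvotes, posted FROM posts
                       WHERE updated_at > NOW() - INTERVAL 1 DAY");
$update = $pdo->prepare('UPDATE posts SET hotness = ? WHERE id = ?');

foreach ($posts as $row) {
    $score = $row['upvotes'] - $row['downvotes'];
    $sign  = ($score > 0) ? 1 : (($score < 0) ? -1 : 0);
    $hotness = round(log(max(abs($score), 1), 10)
             + ($sign * ($row['posted'] - 1134028003)) / 45000, 7);
    $update->execute(array($hotness, $row['id']));
}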
I am not sure if it is possible with your DB and schema; however, have you considered writing a UDF for custom sorting?
A post on Stack Overflow talks about how to do this here.
In PHP, how do I display 5 results from a possible 50 randomly, but ensure all results are displayed an equal number of times?
For example, the table has 50 entries.
I wish to show 5 of these randomly with every page load, but I also need to ensure all results are displayed rotationally, an equal number of times.
I've spent hours googling for this but can't work it out - would very much like your help please.
Please scroll down to "biased randomness" if you don't want to read everything.
In MySQL you can just use SELECT * FROM table ORDER BY RAND() LIMIT 5.
What you want just does not work; it's logically contradictory.
You have to understand that complete randomness by definition only means equal distribution after an infinite period of time.
The longer the interval of selection, the more even the distribution.
If you MUST have an even distribution of selections within, for example, every 24h interval, you cannot use a purely random algorithm. It is contradictory by definition.
It really depends on what your goal is.
You could, for example, take some element at random and then lower the probability of the same element being re-chosen on the next run. This way you can build a heuristic that gives you a more even distribution after a shorter amount of time. But it's not random. Well, certain parts are.
You could also randomly select from your database, mark the elements as selected, and now select only from those not yet selected. When no element is left, reset all.
Very trivial but might do your job.
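A hedged sketch of that approach, assuming you add a shown flag column to the table (named entries here; both names are made up) and use a mysqli connection in $link:

// Pick 5 rows that have not yet been shown in the current rotation.
$ids = array();
$result = mysqli_query($link, "SELECT id FROM entries WHERE shown = 0 ORDER BY RAND() LIMIT 5");
while ($row = mysqli_fetch_assoc($result)) {
    $ids[] = (int) $row['id'];
}

// Rotation (nearly) exhausted: reset every flag and draw the full 5 again.
if (count($ids) < 5) {
    mysqli_query($link, "UPDATE entries SET shown = 0");
    $ids = array();
    $result = mysqli_query($link, "SELECT id FROM entries ORDER BY RAND() LIMIT 5");
    while ($row = mysqli_fetch_assoc($result)) {
        $ids[] = (int) $row['id'];
    }
}

// Mark the chosen rows so they sit out until the next reset.
if (!empty($ids)) {
    mysqli_query($link, "UPDATE entries SET shown = 1 WHERE id IN (" . implode(',', $ids) . ")");
}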
You can also do something like that with timestamps to make the distribution a bit more elegant.
This could probably look like ORDER BY RAND()*((timestamp-min(timestamp))/(max(timestamp)-min(timestamp))) DESC or something like that. Basically you could normalize the timestamp of the last selection of an entry over the time window so it becomes something between 0 and 1, and then multiply it by RAND(); then you have 50% freshness bias (fresh stuff less likely to be selected) and 50% randomness. I am not sure about the formula above, I just typed it down; it is probably wrong, but the principle works.
I think what you want is generally referred to as "biased randomness". There are a lot of papers on that and some articles on SO, for example here:
Biased random in SQL?
Copy the 50 results to some temporary place (file, database, whatever you use). Then every time you need random values, select 5 random values from the 50 and delete them from your temporary data set.
Once your temporary data set is empty, create a new one copying the original again.
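A minimal sketch of that idea, using the session as the temporary place ($allIds standing in for the 50 ids loaded once from your table; both names are made up):

session_start();

// Refill the pool from the full id list whenever it runs dry.
if (empty($_SESSION['pool'])) {
    $_SESSION['pool'] = $allIds;
}

// Draw 5 random ids and delete them from the pool.
shuffle($_SESSION['pool']);
$picked = array_splice($_SESSION['pool'], 0, 5);

Note that a session-based pool rotates per visitor; for a site-wide rotation you would keep the pool in a file or table instead, as the answer suggests.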
I'd like to populate the homepage of my user-submitted-illustrations site with the "hottest" illustrations uploaded.
Here are the measures I have available:
How many people have favourited that illustration (the votes table includes the date voted)
When the illustration was uploaded (the illustration table has a date created)
Number of comments (not so good, as the maximum total is only about 10 at the moment; the comments table has a comment date)
I have searched around, but I don't want user authority to play a part, and most algorithms include that.
I also need to find out whether it's better to do the calculation in the MySQL query that fetches the data, or whether there should be a PHP/cron method running every hour or so.
I only need 20 illustrations to populate the home page. I don't need any sort of paging for this data.
How do I weight age against votes? Surely a site with fewer submissions needs less weight on date added?
Many sites that use some type of popularity ranking do so by using a standard algorithm to determine a score and then decaying eternally over time. What I've found works better for sites with less traffic is a multiplier that gives a bonus to new content/activity - it's essentially the same, but the score stops changing after a period of time of your choosing.
For instance, here's a pseudo-example of something you might want to try. Of course, you'll want to adjust how much weight you're attributing to each category based on your own experience with your site. Comments are rare, but take more effort from the user than a favorite/vote, so they probably should receive more weight.
score = (votes / 10) + comments
age = UNIX_TIMESTAMP() - UNIX_TIMESTAMP(date_created)
if(age < 86400) score = score * 1.5
This type of approach would give a bonus to new content uploaded in the past day. If you wanted to approach this in a similar way only for content that had been favorited or commented on recently, you could just add some WHERE constraints on your query that grabs the score out from the DB.
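As a hedged translation of that pseudo-example into a single MySQL query (an illustrations table with denormalized votes and comments counter columns is assumed; otherwise substitute COUNT subqueries against your votes and comments tables):

SELECT id,
       ((votes / 10) + comments)
       * IF(UNIX_TIMESTAMP() - UNIX_TIMESTAMP(date_created) < 86400, 1.5, 1) AS score
FROM illustrations
ORDER BY score DESC
LIMIT 20;

If you cache that score in a column (e.g. via the cron approach mentioned below), the homepage query reduces to a plain ORDER BY score DESC LIMIT 20.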
There are actually two big reasons NOT to calculate this ranking on the fly.
Requiring your DB to fetch all of that data and do a calculation on every page load just to reorder items results in an expensive query.
Probably a smaller gotcha, but if you have a relatively small amount of activity on the site, small changes in the ranking can cause content to move pretty drastically.
That leaves you with either caching the results periodically or setting up a cron job to update a new database column holding this score you're ranking by.
Obviously there is some subjectivity in this - there's no one "correct" algorithm for determining the proper balance - but I'd start out with something like votes per unit age. MySQL can do basic math so you can ask it to sort by the quotient of votes over time; however, for performance reasons, it might be a good idea to cache the result of the query. Maybe something like
SELECT images.url FROM images ORDER BY (SELECT COUNT(*) FROM votes WHERE votes.image_id = images.id) / TIMESTAMPDIFF(SECOND, images.date, NOW()) DESC LIMIT 20
but my SQL is rusty ;-)
Taking a simple average will, of course, bias in favor of new images showing up on the front page. If you want to remove that bias, you could, say, count only those votes that occurred within a certain time limit after the image being posted. For images that are more recent than that time limit, you'd have to normalize by multiplying the number of votes by the time limit then dividing by the age of the image. Or alternatively, you could give the votes a continuously varying weight, something like exp(-time(vote) + time(image)). And so on and so on... depending on how particular you are about what this algorithm will do, it could take some experimentation to figure out what formula gives the best results.
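For instance, the continuously varying weight could look roughly like this in SQL (the votes.date column and the one-day decay constant are assumptions):

SELECT images.url,
       COALESCE(SUM(EXP(-TIMESTAMPDIFF(SECOND, images.date, votes.date) / 86400)), 0) AS weighted_votes
FROM images
LEFT JOIN votes ON votes.image_id = images.id
GROUP BY images.id, images.url
ORDER BY weighted_votes DESC
LIMIT 20;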
I've no useful ideas as far as the actual algorithm is concerned, but in terms of implementation, I'd suggest caching the result somewhere, with a periodic update - if the resulting computation results in an expensive query, you probably don't want to slow your response times.
Something like:
(count favorited + k) / time since last activity
The higher k is, the less weight the number of people who favourited it carries.
You could also change the time to something like the time since it first appeared plus the time since the last activity; this would ensure that older illustrations vanish over time.
I mean, what is the most efficient way to get the number of items on your page and make the SQL query with the LIMIT that you need? Or should I get all items and then crop the array with PHP functions?
Right now I do 2 queries: the first to count all items and the second to get the items that I need with LIMIT.
OK, I'll be more concrete. For example, I need to show a question on my page and 20 answers to this question. At the bottom there should be page controls: links to the next and previous pages and so on. I want to show the proper number of links (number of answers / 20), and when I go to any link I want to receive the proper answers (for example, 41 to 60 on the 3rd page). So what's the best way to get the number of items (answers), to show the proper number of links, and to get the proper answers for each link?
I guess you're trying to say you want to know how many items/answers there are in the query but only read up to 20 items at a time, for pagination.
Firstly: you really should look for a pagination package; lots and lots of people have had the same problem before, and there probably exist both free/open-source and proprietary solutions for your programming language and framework. (If you say what language you are using, I'm sure someone can recommend a solution for you.)
Anyway, I know I like to know how things work, so this is how it usually works:
As far as I know, the pagination code calculates the number of pages by doing one query, select count(*) from tblX where something, dividing this number by the items-per-page number and rounding up with ceiling (e.g. 4.1 => 5).
For listing the results per page a new query is required; don't worry, the count query is very much faster than getting every result and discarding the ones you don't need. DO NOT DO THAT (that's the recipe for becoming the top story on this page). Something like select * from tblX where something limit Y offset Z, where Y is the number of items per page and Z is (requested_page - 1) * Y; page 1 will have offset 0, page 2 offset 20 (if that's what Y is), and so on. A sketch of both queries follows.
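A hedged sketch of those two queries in PHP with PDO (tblX, the WHERE condition and the $pdo connection are placeholders carried over from the explanation above):

$per_page = 20;
$page     = isset($_GET['page']) ? max(1, (int) $_GET['page']) : 1;

// 1) one cheap query for the total, to work out the number of page links
$total = (int) $pdo->query("SELECT COUNT(*) FROM tblX WHERE something")->fetchColumn();
$pages = (int) ceil($total / $per_page);

// 2) one query for just the rows of the requested page (integers are cast, so safe to inline)
$offset = ($page - 1) * $per_page;
$rows = $pdo->query("SELECT * FROM tblX WHERE something LIMIT $per_page OFFSET $offset")
            ->fetchAll(PDO::FETCH_ASSOC);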
But do not try to implement this manually; it's unnecessary, tedious and error-prone, and your time is much better spent customizing a ready-made solution.
I'm assuming you want a count of the number of rows you'll be reading so as to do some pagination or similar? I don't understand your need for the LIMIT in the context of your question. However, if you just want a count of how many rows have been found, use one of the following.
You can select the count of all rows, such as:
select count(*) as counted
from contact
Or use found rows:
SELECT SQL_CALC_FOUND_ROWS name, address
from contact
This may be MySQL specific, I'm not sure.
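For what it's worth, SQL_CALC_FOUND_ROWS only pays off together with a LIMIT, and the count is read back with a second statement; a hedged sketch against the same contact table:

SELECT SQL_CALC_FOUND_ROWS name, address
FROM contact
LIMIT 20 OFFSET 40;     -- page 3, at 20 rows per page

SELECT FOUND_ROWS();    -- total matching rows, ignoring the LIMIT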
Update:
For pagination you would do something like the following (pseudocode):
$rows      = ...;  // all matching rows, fetched into an array
$num_rows  = ...;  // total row count, e.g. from SELECT FOUND_ROWS()
$per_page  = 20;
$pages     = ceil($num_rows / $per_page);

// rows for the requested page (1-based $page_number)
$offset         = ($page_number - 1) * $per_page;
$rows_this_page = array_slice($rows, $offset, $per_page);