Bayesian algorithm returning 0 - php

I'm trying to get the top rated photos within the last week through MySQL and PHP. I've found that the Bayesian formula may be what I need, but I've been messing with it to no avail.
The following code doesn't return any errors, it only returns a single '0'. Why that is I haven't the slightest.
$bayesian_algo = "SELECT
photo_id,
(SELECT count(photo_id) FROM photo_ratings) /
(SELECT count(DISTINCT photo_id) FROM photo_ratings) AS avg_num_votes,
(SELECT avg(rating) FROM photo_ratings) AS avg_rating,
count(photo_id) as this_num_votes,
avg(rating) as this_rating
FROM photo_ratings
WHERE `date` > '$timeframe'
GROUP BY photo_id";
$bayesian_info = $mysqli->query($bayesian_algo);
$all_bayesian_info = array();
while($row=$bayesian_info->fetch_assoc()) array_push($all_bayesian_info,$row);
list($photo_id,$avg_num_votes,$avg_rating,$this_num_votes,$this_rating) = $all_bayesian_info;
$photo_id = intval($photo_id);
$avg_num_votes = intval($avg_num_votes);
$avg_rating = intval($avg_rating);
$this_num_votes = intval($this_num_votes);
$this_rating = intval($this_rating);
$bayesian_result = (($avg_num_votes * $avg_rating) + ($this_num_votes * $this_rating)) / ($avg_num_votes + $this_num_votes);
echo $bayesian_result; // 0??
My database looks like this:
photo_id | user_id | rating | date
Where all fields are stored as INTs (I'm storing date as a UNIX timestamp).
I'm tired and coding recklessly, normally I could at least get a little further if there were error messages (or anything!), but there's no way the data I get if I var_dump($all_bayesian_info) would ever return 0.

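Let's do the Bayesian calculation in the MySQL query itself.
Your PHP prints 0 because list() is applied to $all_bayesian_info, which is an array of rows. Each variable therefore receives a whole row (or null) rather than a column value, and intval() of an array or null is only ever 1 or 0.
The code can be rewritten like this: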
$bayesian_algo_result = "SELECT *,
(((resultdata.avg_num_votes * resultdata.avg_rating) + (resultdata.this_num_votes * resultdata.this_rating)) / (resultdata.avg_num_votes + resultdata.this_num_votes)) AS bayesian_result
FROM
(
SELECT
photo_id,
(SELECT count(photo_id) FROM photo_ratings) /
(SELECT count(DISTINCT photo_id) FROM photo_ratings) AS avg_num_votes,
(SELECT avg(rating) FROM photo_ratings) AS avg_rating,
count(photo_id) as this_num_votes,
avg(rating) as this_rating
FROM photo_ratings
WHERE `date` > '$timeframe'
GROUP BY photo_id
) AS resultdata;
";
$bayesian_result_info = $mysqli->query($bayesian_algo_result);
// Loop through the rows. fetch_assoc() returns string keys, so read the
// columns by name; positional list() only works with numeric keys.
while ($row = $bayesian_result_info->fetch_assoc()) {
    $photo_id = (int) $row['photo_id'];
    $bayesian_result = (float) $row['bayesian_result'];
    echo 'Bayesian rating for photo ' . $photo_id . ' is: ' . $bayesian_result . "\n";
}
Note:
Here is a working sql fiddle: http://sqlfiddle.com/#!2/d4a71/1/0
I did not make any logic changes to your formula, so please make sure the formula itself is correct.
If/when UNIX timestamps move to a 64-bit data type, you'll have to use a MySQL BIGINT to store them (for your `date` column).
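If you would rather keep the arithmetic in PHP, here is a minimal sketch of the same formula. The function name and parameter names are my own assumptions, chosen to mirror the column aliases in the query. It works on floats rather than intval(), since both averages are usually fractional.

```php
<?php
// Hypothetical helper (not from the original post): applies the same
// weighted-average formula as the query, using floats instead of intval().
function bayesianRating(float $avgNumVotes, float $avgRating, int $thisNumVotes, float $thisRating): float
{
    $denominator = $avgNumVotes + $thisNumVotes;
    if ($denominator == 0) {
        return 0.0; // no votes anywhere, so there is nothing to average
    }
    return (($avgNumVotes * $avgRating) + ($thisNumVotes * $thisRating)) / $denominator;
}

// Made-up numbers: a photo with 2 votes averaging 5.0, in a table where
// photos average 10 votes and the overall mean rating is 3.0.
echo bayesianRating(10.0, 3.0, 2, 5.0), "\n"; // roughly 3.33
```

With only two votes, the photo's 5.0 is pulled most of the way back towards the overall mean of 3.0, which is the point of the weighting.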

Related

selecting sql variable returning an empty array

I'm attempting to SET 3 variables in MySQL and get the sum of two of them.
The first two variables, @cFollow and @cComment, should each return an integer value (the count of how many rows are returned); the third one is the sum of those two integers.
This is my SQL:
SET @cFollow = (SELECT COUNT(*) FROM followers WHERE unix > :unix AND following = :user);
SET @cComment = (SELECT COUNT(*) FROM comments WHERE comment_unix > :unix AND comment_track IN (SELECT upload_id FROM uploads WHERE upload_artist = :user));
SET @total = @cFollow + @cComment;
SELECT @total;
When I tested this on PHPMyAdmin, it returned the correct values and worked perfectly fine. However, when I tested it within PHP, it returned an empty array.
This is my PHP:
$holdPoint = (int)Input::get("hold_point");
$_SQL = "
SET @cFollow = (SELECT COUNT(*) FROM followers WHERE unix > :unix AND following = :user);
SET @cComment = (SELECT COUNT(*) FROM comments WHERE comment_unix > :unix AND comment_track IN (SELECT upload_id FROM uploads WHERE upload_artist = :user));
SET @total = @cFollow + @cComment;
SELECT @total;";
$_PARAMS = [":unix" => $holdPoint, ":user" => $user_id];
$check = DB::getInstance()->queryPro($_SQL, $_PARAMS);
var_dump($check);
This is the result of that var_dump:
array(0){} // not very impressive...
// should be something like int(1) instead
I've been searching around all night learning how to return a variable in PHP from a MySQL query, and this is as far as I've gotten.
All help is appreciated,
Cheers.
This is not really meant as an answer, more as a comment.
Note that your queries
SET @cFollow = (SELECT COUNT(*) FROM followers WHERE unix > :unix AND following = :user);
SET @cComment = (SELECT COUNT(*) FROM comments WHERE comment_unix > :unix AND comment_track IN (SELECT upload_id FROM uploads WHERE upload_artist = :user));
SET @total = @cFollow + @cComment;
SELECT @total;
can most likely be rewritten as one query:
SELECT
SUM(alias.c) AS total
FROM (
SELECT COUNT(*) AS c FROM followers WHERE unix > :unix AND following = :user
UNION ALL
SELECT COUNT(*) AS c FROM comments WHERE comment_unix > :unix AND comment_track IN (SELECT upload_id FROM uploads WHERE upload_artist = :user)
) AS alias
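A single SELECT also sidesteps the likely cause of the empty array: in a multi-statement batch the first statement is a SET, which produces no result set of its own. Here is a rough, self-contained sketch of the one-query version using PDO. It assumes an in-memory SQLite database as a stand-in for the MySQL tables, made-up rows, and the pdo_sqlite extension. The placeholders are given distinct names (:unix1, :unix2, ...) because PDO does not allow the same named marker to be reused in one statement unless emulated prepares are on.

```php
<?php
// Assumption: in-memory SQLite stands in for MySQL; table contents are made up.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE followers (unix INTEGER, following INTEGER)');
$pdo->exec('CREATE TABLE uploads (upload_id INTEGER, upload_artist INTEGER)');
$pdo->exec('CREATE TABLE comments (comment_unix INTEGER, comment_track INTEGER)');
$pdo->exec('INSERT INTO followers VALUES (200, 7), (50, 7), (300, 8)');
$pdo->exec('INSERT INTO uploads VALUES (1, 7), (2, 8)');
$pdo->exec('INSERT INTO comments VALUES (150, 1), (250, 1), (400, 2)');

// One statement, one result set, one row.
$sql = 'SELECT SUM(counts.c) AS total
FROM (
    SELECT COUNT(*) AS c FROM followers WHERE unix > :unix1 AND following = :user1
    UNION ALL
    SELECT COUNT(*) AS c FROM comments WHERE comment_unix > :unix2
        AND comment_track IN (SELECT upload_id FROM uploads WHERE upload_artist = :user2)
) AS counts';

$stmt = $pdo->prepare($sql);
$params = [':unix1' => 100, ':user1' => 7, ':unix2' => 100, ':user2' => 7];
foreach ($params as $name => $value) {
    $stmt->bindValue($name, $value, PDO::PARAM_INT);
}
$stmt->execute();

$total = (int) $stmt->fetchColumn();
echo $total, "\n"; // 3: one follower plus two comments newer than 100
```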

Select a fixed number of records from a particular user in a sql result

I have 2 tables - users and articles.
users:
user_id (int)
name (varchar)
articles:
article_id (int)
user_id (int)
title (varchar)
description (text)
In my application I need to display 20 RANDOM articles on a page.
My query is like this:
SELECT a.title
, a.description
, u.name
FROM articles a
JOIN users u
USING (user_id)
ORDER
BY RAND()
LIMIT 20
A user can have any number of articles in the database.
Now the problem is sometimes out of 20 results, there are like 9-10 articles from one single user.
I want those 20 records on the page to not contain more than 3 (or say 4) articles from a particular user.
Can I achieve this through an SQL query? I am using PHP and MySQL.
Thanks for your help.
You could try this. It needs MySQL 8.0 or later, because ROW_NUMBER() is a window function:
SELECT * FROM
(
SELECT B.* FROM
(
SELECT A.*, ROW_NUMBER() OVER (PARTITION BY A.user_id ORDER BY A.r) AS user_row_number
FROM
(
SELECT a.user_id, a.title, a.description, u.name, RAND() AS r FROM articles a
INNER JOIN users u USING (user_id)
) A
) B
WHERE B.user_row_number <= 4
) C
ORDER BY RAND() LIMIT 20
Interesting. I don't think this is possible through a pure SQL query.
My best idea would be to keep an array of the articles that you'll eventually display, and to query the database with the standard SELECT * FROM articles ORDER BY RAND() LIMIT 20.
Then go through the results, making sure that you have indeed got 20 articles and that no user has breached the limit of 3 or 4 per user.
Keep a second array of per-user counts, using the user id as the index and the count as the value.
As you go through the rows, add them to your final array; once a user hits the limit, exclude that user from later queries.
Keep running the random query, excluding those users and the articles already chosen, until you reach the desired number.
Let me try some code (it's been a while since I did PHP):
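// Assumes $mysqli is an open mysqli connection (hypothetical variable name).
$maxPerUser = 4;
$finalArray = [];  // article_id => row
$userCounts = [];  // user_id => number of that user's articles chosen so far
while (count($finalArray) < 20) {
    $query = "SELECT * FROM articles";
    $where = [];
    if ($finalArray) {
        // Skip articles that were already chosen.
        $where[] = "article_id NOT IN (" . implode(',', array_map('intval', array_keys($finalArray))) . ")";
    }
    // Skip users who have already reached the limit.
    $fullUsers = array_keys(array_filter($userCounts, fn($n) => $n >= $maxPerUser));
    if ($fullUsers) {
        $where[] = "user_id NOT IN (" . implode(',', array_map('intval', $fullUsers)) . ")";
    }
    if ($where) {
        $query .= " WHERE " . implode(' AND ', $where);
    }
    $query .= " ORDER BY RAND() LIMIT 20";
    $result = $mysqli->query($query);
    if (!$result || $result->num_rows === 0) {
        break; // query failed, or no eligible articles are left
    }
    while ($row = $result->fetch_assoc()) {
        $userId = (int) $row['user_id'];
        if (count($finalArray) >= 20 || ($userCounts[$userId] ?? 0) >= $maxPerUser) {
            continue;
        }
        $finalArray[(int) $row['article_id']] = $row;
        $userCounts[$userId] = ($userCounts[$userId] ?? 0) + 1;
    }
}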
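A variation on the same idea, in case it is acceptable to fetch one larger random pool and cap it in PHP instead of re-querying: the sketch below walks an already shuffled list and keeps at most a fixed number of rows per user. The helper name capPerUser and the sample data are made up for illustration.

```php
<?php
// Hypothetical helper: pick up to $limit rows from an already shuffled pool,
// taking at most $maxPerUser rows for any single user_id.
function capPerUser(array $rows, int $limit, int $maxPerUser): array
{
    $picked = [];
    $perUser = []; // user_id => rows taken so far
    foreach ($rows as $row) {
        if (count($picked) >= $limit) {
            break;
        }
        $userId = $row['user_id'];
        $taken = $perUser[$userId] ?? 0;
        if ($taken >= $maxPerUser) {
            continue; // this user already has the maximum number of slots
        }
        $perUser[$userId] = $taken + 1;
        $picked[] = $row;
    }
    return $picked;
}

// Made-up pool: user 1 has six articles, users 2 and 3 have two each.
$pool = [];
foreach ([1 => 6, 2 => 2, 3 => 2] as $userId => $articleCount) {
    for ($i = 1; $i <= $articleCount; $i++) {
        $pool[] = ['user_id' => $userId, 'title' => "u{$userId}-a{$i}"];
    }
}
shuffle($pool); // stands in for ORDER BY RAND()

$page = capPerUser($pool, 20, 3);
echo count($page), "\n"; // 7: three from user 1, two each from users 2 and 3
```

The trade-off is that the pool has to be large enough to still yield 20 rows after capping, so the pool size is something you would need to tune to your data.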

Echo value and percentage from SQL server Union query in PHP

I have an issue where I'm trying to find the percentage of two queries but want to retain the session value to echo elsewhere. Let me explain.
My first union query returns a value E.G. 100
$sql1 = "select COUNT(*) From
(select inc.INCIDENT_NUMBER as TICKET
From dbo.HELP_DESK as inc
Where inc.STATUS < 3
and inc.ASSIGNED_GROUP = 'Scheduling'
UNION ALL
SELECT chg.INFRASTRUCTURE_CHANGE_ID AS TICKET
FROM dbo.CHANGE as chg
WHERE chg.CHANGE_REQUEST_STATUS NOT IN (1,4,5,8,9,10,11,12)
and chg.ASGRP = 'Scheduling') num1";
I then want to echo the result as a session value so:
$SCH = sqlsrv_query($conn,$sql1);
if( $SCH === false) {
die( print_r( sqlsrv_errors(), true) );
}
while( $Row = sqlsrv_fetch_array( $SCH, SQLSRV_FETCH_NUMERIC) ) {
$_SESSION['$SCH'] = $Row[0];
}
so far so good. However, I want to calculate the percentage based on my second union query.
$sql2 = "select COUNT(*) From
(select inc.INCIDENT_NUMBER as TICKET
From dbo.HELP_DESK as inc
Where inc.STATUS < 4
and inc.ASSIGNED_GROUP = 'Scheduling'
UNION ALL
SELECT chg.INFRASTRUCTURE_CHANGE_ID AS TICKET
FROM dbo.CHANGE as chg
WHERE chg.CHANGE_REQUEST_STATUS < 9
and chg.ASGRP = 'Scheduling') num2"
The second value is 250 as an example. So I'm trying to calculate the percentage e.g. (100/250) * 100 = 40%
function percentage ($sql1,$sql2)
{
return ($sql1/$sql2) * 100;
}
Echo percentage
The last bit pulls back nothing, blank screen. There's probably a much easier way of doing this and given my relative php newcomer status, I'm probably mixing up several functions at once.
I'd appreciate a little help getting it working. Thanks for taking the time to read.
I managed to resolve the percentage issue and can now output separate session values.
select
s.activeCount * 100 / s.totalCount as Percentage
FROM
(select activecount = (select COUNT(*) From
(select inc.INCIDENT_NUMBER AS TICKET
From dbo.HELP_DESK as inc
Where inc.STATUS < 3
and inc.ASSIGNED_GROUP = 'Scheduling'
UNION ALL
SELECT chg.INFRASTRUCTURE_CHANGE_ID AS TICKET
FROM dbo.CHANGE as chg
WHERE chg.CHANGE_REQUEST_STATUS NOT IN (1,4,5,8,9,10,11,12)
and chg.ASGRP = 'Scheduling') num),
totalCount = (select COUNT(*) From
(select inc.INCIDENT_NUMBER AS TICKET
From dbo.HELP_DESK as inc
Where inc.STATUS < 4
and inc.ASSIGNED_GROUP = 'Scheduling'
UNION ALL
SELECT chg.INFRASTRUCTURE_CHANGE_ID AS TICKET
FROM dbo.CHANGE as chg
WHERE chg.CHANGE_REQUEST_STATUS < 9
and chg.ASGRP = 'Scheduling') num)) s
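If you still want the calculation on the PHP side, note that the percentage() function in the question was given the SQL strings ($sql1, $sql2) rather than the counts fetched from them, and was never actually called. A minimal sketch, assuming the two counts have already been read from their result sets (the parameter names are my own):

```php
<?php
// Hypothetical helper: takes the two counts already fetched from the result
// sets, not the SQL strings themselves.
function percentage(int $part, int $total): float
{
    if ($total === 0) {
        return 0.0; // avoid division by zero when the second query finds nothing
    }
    return ($part / $total) * 100;
}

// The example figures from the question: 100 out of 250.
echo percentage(100, 250), "%\n"; // prints 40%
```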

Getting total in while statement with UNION query

I am trying to calculate how much a user has earned so it reflects on the users home page so they know how much their referrals have earned.
This is the code I have.
$get_ref_stats = $db->query("SELECT * FROM `members` WHERE `referral` = '".$user_info['username']."'");
$total_cash = 0;
while($ref_stats = $get_ref_stats->fetch_assoc()){
$get_ref_cash = $db->query("SELECT * FROM `completed` WHERE `user` = '".$ref_stats['username']."' UNION SELECT * FROM `completed_repeat` WHERE `user` = '".$ref_stats['username']."'");
$countr_cash = $get_ref_cash->fetch_assoc();
$total_cash += $countr_cash['cash'];
$countr_c_rate = $setting_info['ref_rate'] * 0.01;
$total_cash = $total_cash * $countr_c_rate;
}
It worked fine when I just had
$get_ref_cash = $db->query("SELECT * FROM `completed` WHERE `user` = '".$ref_stats['username']."'");
but as soon as I added in the UNION it no longer calculated correctly.
For example, there is 1 entry in completed and 1 entry in completed_repeat, and both have a cash value of 0.75. $countr_c_rate is 0.10, so $total_cash should equal 0.15, but instead it displays as 0.075. With or without the UNION, it acts as if it is not counting the row from the other table.
I hope this makes sense as I wasn't sure how to explain the issue, but I am very unsure what I have done wrong here.
In your second query you should use UNION ALL instead of UNION, since UNION eliminates duplicate rows from the result set. That is why you get 0.075 instead of 0.15.
Also, instead of hitting your database multiple times from client code, you are better off calculating the cash total in one query.
It might be inaccurate without seeing your table structures and sample data, but the query might look like this:
SELECT SUM(cash) cash_total
FROM
(
SELECT c.cash
FROM completed c JOIN members m
ON c.user = m.username
WHERE m.referral = ?
UNION ALL
SELECT r.cash
FROM completed_repeat r JOIN members m
ON r.user = m.username
WHERE m.referral = ?
) q
Without prepared statements, your PHP code might then look like this (assuming $db is a mysqli connection):
// Escape the value, since it is interpolated directly into the SQL string.
$referrer = $db->real_escape_string($user_info['username']);
$sql = "SELECT SUM(cash) cash_total
FROM
(
SELECT c.cash
FROM completed c JOIN members m
ON c.user = m.username
WHERE m.referral = '$referrer'
UNION ALL
SELECT r.cash
FROM completed_repeat r JOIN members m
ON r.user = m.username
WHERE m.referral = '$referrer'
) q";
$result = $db->query($sql);
if (!$result) {
die($db->error); // TODO: better error handling
}
if ($row = $result->fetch_assoc()) {
// ref_rate is a percentage, as in the question's code, hence the 0.01.
$total_cash = $row['cash_total'] * $setting_info['ref_rate'] * 0.01;
}
On a side note: use prepared statements in mysqli instead of building queries by concatenation, which is vulnerable to SQL injection.
With $countr_cash = $get_ref_cash->fetch_assoc(); you only fetch the first row of your result. With the UNION, however, you get two rows in your case.
Therefore, you need to iterate over all rows in order to get all values.
OK, so there is only one row in the members table, and you iterate over it only once. The UNION query then returns two rows, not one, but you read the cash column of the first row only and add that to $total_cash.
What you need to do is iterate over all rows returned by the UNION query and add each row's cash to $total_cash. That gives you the required result.
$get_ref_stats = $db->query("SELECT * FROM `members` WHERE `referral` = '".$user_info['username']."'");
$total_cash = 0;
while($ref_stats = $get_ref_stats->fetch_assoc()){
// UNION ALL keeps rows that happen to be identical in both tables.
$get_ref_cash = $db->query("SELECT * FROM `completed` WHERE `user` = '".$ref_stats['username']."' UNION ALL SELECT * FROM `completed_repeat` WHERE `user` = '".$ref_stats['username']."'");
// Add up every row the query returns, not just the first one.
while($countr_cash = $get_ref_cash->fetch_assoc()){
$total_cash += $countr_cash['cash'];
}
}
// Apply the referral rate once, after all referrals have been summed.
$countr_c_rate = $setting_info['ref_rate'] * 0.01;
$total_cash = $total_cash * $countr_c_rate;
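To check the arithmetic without a database, here is a small sketch of the same logic: sum every fetched row, then apply the rate once. The function name referralEarnings is hypothetical, and the rows are the two 0.75 entries from the question.

```php
<?php
// Hypothetical helper: $rows holds the rows fetched for all referrals,
// $ratePercent is the ref_rate setting expressed as a percentage.
function referralEarnings(array $rows, float $ratePercent): float
{
    $total = 0.0;
    foreach ($rows as $row) {
        $total += (float) $row['cash']; // every row counts, not just the first
    }
    return $total * $ratePercent * 0.01; // apply the rate once, at the end
}

// One row from completed and one from completed_repeat, both worth 0.75.
$rows = [['cash' => '0.75'], ['cash' => '0.75']];
echo round(referralEarnings($rows, 10), 4), "\n"; // 0.15
```

Fetching only the first row gives 0.75 * 0.10 = 0.075, which matches the wrong figure reported in the question.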

How To Optimize PostgreSQL generate_series function

I have a query that uses the PostgreSQL generate_series function, but when it comes to large amounts of data, the query can be slow. An example of the code that generates the query is below:
$yesterday = date('Y-m-d',(strtotime ( '-1 day' ) ));
$query = "
WITH interval_step AS (
SELECT gs::date AS interval_dt, random() AS r
FROM generate_series('$yesterday'::timestamp, '2015-01-01', '1 day') AS gs)
SELECT articles.article_id, article_title, article_excerpt, article_author, article_link, article_default_image, article_date_published, article_bias_avg, article_rating_avg
FROM development.articles JOIN interval_step ON articles.article_date_added::date=interval_step.interval_dt ";
if (isset($this -> registry -> get['category'])) {
$query .= "
JOIN development.feed_articles ON articles.article_id = feed_articles.article_id
JOIN development.rss_feeds ON feed_articles.rss_feed_id = rss_feeds.rss_feed_id
JOIN development.news_categories ON rss_feeds.news_category_id = news_categories.news_category_id
WHERE news_category_name = $1";
$params = array($category_name);
$query_name = 'browse_category';
}
$query .= " ORDER BY interval_step.interval_dt DESC, RANDOM() LIMIT 20;";
This series looks only for content that goes one day back and sorts the results in random order. My question is: in what ways can generate_series be optimized to improve performance?
You don't need that generate_series at all. And do not concatenate query strings. Avoid it by making the parameter an empty string (or null) if it is not set:
if (!isset($this -> registry -> get['category']))
$category_name = '';
$query = "
select articles.article_id, article_title, article_excerpt, article_author, article_link, article_default_image, article_date_published, article_bias_avg, article_rating_avg
from
development.articles
inner join
development.feed_articles using (article_id)
inner join
development.rss_feeds using (rss_feed_id)
inner join
development.news_categories using (news_category_id)
where
(news_category_name = $1 or $1 = '')
and articles.article_date_added >= current_date - 1
order by
date_trunc('day', articles.article_date_added) desc,
random()
limit 20;
";
$params = array($category_name);
Passing $yesterday to the query is also not necessary as it can be done entirely in SQL.
If $category_name is empty it will return all categories:
(news_category_name = $1 or $1 = '')
Imho, try removing that random() in your order by statement. It probably has a much larger performance impact than you think. As things are it's probably ordering the entire set by interval_dt desc, random(), and then picking the top 20. Not advisable...
Try fetching e.g. 100 rows ordered by interval_dt desc instead, then shuffle them per the same logic, and pick 20 in your app. Or wrap the entire thing in a subquery limit 100, and re-order accordingly along the same lines.
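The app-side shuffle could look roughly like this. It is only a sketch: the function name is hypothetical, and it assumes the rows come back with article_date_added as a 'YYYY-MM-DD ...' string, fetched with something like ORDER BY article_date_added DESC LIMIT 100 and no random() in the SQL.

```php
<?php
// Hypothetical helper: keeps the newest-day-first ordering, randomizes the
// order within each day, and returns at most $limit rows.
function pickRandomRecent(array $rows, int $limit = 20): array
{
    // Group the rows by calendar day.
    $byDay = [];
    foreach ($rows as $row) {
        $day = substr($row['article_date_added'], 0, 10); // 'YYYY-MM-DD'
        $byDay[$day][] = $row;
    }
    krsort($byDay); // newest day first

    $picked = [];
    foreach ($byDay as $dayRows) {
        shuffle($dayRows); // random order within the day
        foreach ($dayRows as $row) {
            if (count($picked) >= $limit) {
                return $picked;
            }
            $picked[] = $row;
        }
    }
    return $picked;
}

// Made-up rows: two articles from the newest day, one from the day before.
$rows = [
    ['article_id' => 1, 'article_date_added' => '2015-03-02 09:00:00'],
    ['article_id' => 2, 'article_date_added' => '2015-03-02 17:30:00'],
    ['article_id' => 3, 'article_date_added' => '2015-03-01 12:00:00'],
];
$page = pickRandomRecent($rows, 2);
// $page holds articles 1 and 2 in random order; article 3 is older and is cut.
```

One caveat: if a single day has more rows than the batch size, the sample is drawn only from whichever rows the database returned for that day.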
