Executing a "weighted" SQL query - php

In my current setup, I have two tables: product and rating.
Product Table
product_id
rating
The product table contains a whole bunch of additional information, but for this question, I am focussed on those two fields only.
Rating Table
product_id
rating
user_id (who rated)
is_admin - bool on whether the user that rated was an admin
We collect the admin ratings separately because we want to weight admin ratings slightly higher (60%) than those of regular users (40%). The rating column in the product table is equal to the AVG of all the admin ratings. Ratings in general are between 1 and 5.
So for each product we have to consider four scenarios:
RATINGS BY      TOTAL RATING
USER    ADMIN
----    -----
no      no      = 0
yes     no      = AVG of user ratings (`ratings` table)
yes     yes     = 0.6 * AVG of admin ratings (`product_table`) + 0.4 * AVG of user ratings (`ratings` table)
no      yes     = AVG of admin ratings (`product_table`)
The SQL query which currently retrieves the datasets looks like this:
$sql = "SELECT p.product_id,
(COALESCE(p.rating,0)+COALESCE(j.sum,0)) / (COALESCE(p.rating/p.rating,0)
+ COALESCE(j.tot,0)) AS rating
FROM product p
LEFT JOIN
(SELECT SUM(rating) AS sum ,
COUNT(rating) AS tot,
product_id FROM rating
WHERE is_admin_rating=FALSE GROUP BY product_id) j
ON (p.product_id = j.product_id) LEFT JOIN product_description pd
ON (p.product_id = pd.product_id) LEFT JOIN product_to_store p2s
ON (p.product_id = p2s.product_id)";
A variety of sort options (rating being the default) is then appended to this query. In addition, we use LIMIT to "paginate" the search results.
Is there a way to incorporate the weighted ratings into the query? Or will I have to break it up into several queries?

Since this looks like a web-based system, I would strongly suggest a slight denormalization: add 5 columns to the product table for
UserRatings, UserCount, AdminRatings, AdminCount, FinalRating
When any entries are added or updated to the ratings table, you could apply a simple update trigger, something like
-- Recompute the stored totals for the one product that was just rated.
update Product p,
       ( select r.product_id,
                sum( if( is_admin_rating = FALSE, 1, 0 ) ) as UserCount,
                sum( if( is_admin_rating = FALSE, rating, 0 ) ) as UserRatings,
                sum( if( is_admin_rating = TRUE, 1, 0 ) ) as AdminCount,
                sum( if( is_admin_rating = TRUE, rating, 0 ) ) as AdminRatings
           from Ratings r
          where r.product_id = ProductIDThatCausedThisTrigger
          group by r.product_id ) as PreSum
   set p.UserCount = PreSum.UserCount,
       p.UserRatings = PreSum.UserRatings,
       p.AdminCount = PreSum.AdminCount,
       p.AdminRatings = PreSum.AdminRatings,
       p.FinalRating = case when PreSum.UserCount = 0 and PreSum.AdminCount = 0
                               then 0
                            when PreSum.UserCount = 0
                               then PreSum.AdminRatings / PreSum.AdminCount
                            when PreSum.AdminCount = 0
                               then PreSum.UserRatings / PreSum.UserCount
                            else
                               -- weighted sum: 40% user average + 60% admin average
                               ( PreSum.UserRatings / PreSum.UserCount * .4 )
                               + ( PreSum.AdminRatings / PreSum.AdminCount * .6 )
                       end
 where p.product_id = PreSum.product_id
This way, you never have to join to the ratings table and aggregate at query time, which only gets slower as more data accumulates. Your query can then read straight from the product table without any coalesce; the counts and rating totals for each group are already there.
The case/when for FinalRating covers every combination of user and admin counts: 0/0, +/0, 0/+ or +/+
If there is no count for either, the case/when sets the rating to 0.
If only the user count has a value, just take that average rating (UserRatings / UserCount).
If only the admin count has a value, take the admin average rating (AdminRatings / AdminCount).
If BOTH have counts, you combine the respective averages weighted by .4 (user) and .6 (admin). These weights are the one adjustment you might want to tweak.
Although the query itself looks somewhat monstrous and confusing, the "PreSum" subquery only runs for the one product that has just been rated, as identified by the trigger. After that it is a simple update from those results, joined on the single product ID.
Getting this to work might offer a better long-term solution for you.
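If you would rather keep everything normalized, the four scenarios from the question can also be computed in a single query with conditional aggregation. The following is only a sketch, run against an in-memory SQLite database from Python so that it is self-contained. It assumes, unlike the original setup, that admin ratings are stored in the rating table alongside user ratings and flagged with is_admin_rating; the sample rows are made up.

```python
import sqlite3

# Assumed, simplified schema: all ratings (admin and user) live in `rating`.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (product_id INTEGER PRIMARY KEY);
    CREATE TABLE rating (product_id INTEGER, rating REAL, is_admin_rating INTEGER);
    INSERT INTO product VALUES (1), (2), (3), (4);
    INSERT INTO rating VALUES
        (2, 4, 0), (2, 2, 0),
        (3, 5, 1),
        (4, 5, 1), (4, 3, 0), (4, 2, 0);
""")

# AVG ignores NULLs, so each CASE without ELSE averages only its own group
# and yields NULL when that group has no rows.
QUERY = """
    SELECT p.product_id,
           CASE
               WHEN r.admin_avg IS NULL AND r.user_avg IS NULL THEN 0
               WHEN r.admin_avg IS NULL THEN r.user_avg
               WHEN r.user_avg IS NULL THEN r.admin_avg
               ELSE 0.6 * r.admin_avg + 0.4 * r.user_avg
           END AS weighted_rating
    FROM product p
    LEFT JOIN (
        SELECT product_id,
               AVG(CASE WHEN is_admin_rating = 1 THEN rating END) AS admin_avg,
               AVG(CASE WHEN is_admin_rating = 0 THEN rating END) AS user_avg
        FROM rating
        GROUP BY product_id
    ) r ON r.product_id = p.product_id
    ORDER BY p.product_id
"""

# Map product_id -> weighted rating.
weighted = dict(conn.execute(QUERY).fetchall())
print(weighted)
```

Product 1 has no ratings, product 2 only user ratings, product 3 only an admin rating, and product 4 both, so the result exercises all four branches. Sorting and LIMIT can be appended to this query as before, at the cost of aggregating on every request.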

Related

How to create new table for the output of DISTINCT COUNT to be distributed in rows, not in column?

My query displays the DISTINCT count of buyers for each ticket serial number. I need to automatically calculate the SOLD and BALANCE columns and save them into the database, ideally into the existing table (table1) in the rows that correspond to each ticket serial. I have searched extensively but I just can't figure it out. As an alternative, I tried to create a new table in the database to hold the output of the DISTINCT COUNT, so that I could INNER JOIN that new table with table1; with PRINTED and SOLD available together, I could subtract one from the other to obtain the BALANCE column. However, I didn't find any sample query to follow.
The existing table1 and table2 hold records entered into the database through an HTML form:
Table1
Ticket Serial    Printed Copies    SOLD (sold)           Balance
TS#1234          50                ? (should be auto)    ?
TS#5678          80                ? (should be auto)    ?
(so on and so forth...)
Table2
Buyer Ticket Serial
Adam TS#1234
Kathy TS#1234
Sam TS#5678
(so on and so forth...)
The COUNT DISTINCT outputs the qty. of sold tickets:
<td> <?php print '<div align="center">'.$row['COUNT(ticketserial)'];?></td>
...
$query = "SELECT *, COUNT(ticketserial) FROM buyers WHERE ticketsold != 'blank' GROUP BY
ticketserial ";
Its COUNT output looks like this:
Ticket Serial    Distinct Count
TS#1234          7
TS#5678          25
(so on and so forth...)
I tried to update the SOLD and BALANCE columns using UPDATE or INSERT inside a foreach loop, but only the first row in the table was updated.
Table1
Ticket Serial Printed Copies Sold Balance
TS#1234 50 **7** 0
TS#5678 80 **0** 0
TS#8911 40 **0** 0
(so on and so forth...)
Note: The field name "sold" in table1 is not the same as the field name "ticketsold" in table2; the former is a quantity and the latter holds ticket serials.
Your question is a bit hard to follow. However, this looks like a left join on an aggregate query:
select
t1.ticket_serial,
t1.printed_copies,
coalesce(t2.sold, 0) sold,
t1.printed_copies - coalesce(t2.sold, 0) balance
from table1 t1
left join (
select ticket_serial, count(*) sold
from table2
group by ticket_serial
) t2 on t2.ticket_serial = t1.ticket_serial
If you are looking to update the main table:
update table1 t1
left join (
select ticket_serial, count(*) sold
from table2
group by ticket_serial
) t2 on t2.ticket_serial = t1.ticket_serial
set
t1.sold = coalesce(t2.sold, 0),
t1.balance = t1.printed_copies - coalesce(t2.sold, 0)
I would not actually recommend storing the sold and balance in the main table: this is derived information that can be easily computed when needed, and it would be tedious to maintain. If needed, you could create a view using the first SQL statement above, which will give you an always up-to-date perspective on your data.
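As a rough illustration of the view idea, here is a small sketch that runs against an in-memory SQLite database from Python. The view name ticket_balance and the extra buyer "Lee" are made up for the example; the column names follow the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ticket_serial TEXT PRIMARY KEY, printed_copies INTEGER);
    CREATE TABLE table2 (buyer TEXT, ticket_serial TEXT);
    INSERT INTO table1 VALUES ('TS#1234', 50), ('TS#5678', 80), ('TS#8911', 40);
    INSERT INTO table2 VALUES ('Adam', 'TS#1234'), ('Kathy', 'TS#1234'), ('Sam', 'TS#5678');

    -- Hypothetical view name; sold and balance are derived, never stored.
    CREATE VIEW ticket_balance AS
    SELECT t1.ticket_serial,
           t1.printed_copies,
           COALESCE(t2.sold, 0) AS sold,
           t1.printed_copies - COALESCE(t2.sold, 0) AS balance
    FROM table1 t1
    LEFT JOIN (
        SELECT ticket_serial, COUNT(*) AS sold
        FROM table2
        GROUP BY ticket_serial
    ) t2 ON t2.ticket_serial = t1.ticket_serial;
""")

def read_balances():
    """Return (ticket_serial, printed_copies, sold, balance) rows from the view."""
    return conn.execute(
        "SELECT * FROM ticket_balance ORDER BY ticket_serial"
    ).fetchall()

before = read_balances()

# A new sale shows up in the view immediately, with no UPDATE on table1.
conn.execute("INSERT INTO table2 VALUES ('Lee', 'TS#8911')")
after = read_balances()
print(before)
print(after)
```

Because the view is evaluated on read, there is no loop over rows and nothing to keep in sync after each sale.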

Efficient comment system pagination query

So I've been looking around the web for information about pagination.
From what I've seen there are 3 kinds: (a) LIMIT/OFFSET, (b) WHERE id > :num ORDER BY id LIMIT 10, and (c) cursor pagination like that used on Facebook and Twitter.
I decided that for my project I'll go with the "b" option as it looks pretty straightforward and efficient.
I'm trying to create some kind of "facebook" like post and comment system, but not as complex.
I have a ranking system for the posts and comments, and the top 2 comments for each post are fetched together with the post.
The rest of the comments for each specific post are being fetched when people click on to see more comments.
This is a query for post comments:
SELECT
c.commentID,
c.externalPostID,
c.numOfLikes,
c.createdAt,
c.customerID,
c.numOfComments,
(CASE WHEN cl.customerID IS NULL THEN false ELSE true END) isLiked,
cc.text,
cu.reputation,
cu.firstName,
cu.lastName,
c.ranking
FROM
(SELECT *
FROM Comments
WHERE Comments.externalPostID = :externalPostID) c
LEFT JOIN CommentLikes cl ON cl.commentID = c.commentID AND cl.customerID = :customerID
INNER JOIN CommentContent cc ON cc.commentTextID = c.commentID
INNER JOIN Customers cu ON cu.customerID = c.customerID
ORDER BY c.weight DESC, c.createdAt ASC LIMIT 10 OFFSET 2
The OFFSET of 2 is there because 2 comments were already fetched earlier as the top 2 comments.
I'm looking for a similar way of seeking the next 10 comments each time without the DB going through all the rows, as it does with LIMIT/OFFSET.
The problem is that I have two columns sorting the results, and that won't allow me to use this method:
SELECT * FROM Comments WHERE id > :lastId LIMIT :limit;
HUGE thanks to anyone who helps!
Solution So Far:
To paginate efficiently we need a single column whose values are as unique as possible and form a sequence that we can sort by and page through.
My example sorts the data by two columns, which is the problem.
What I did was combine the time (ascending sort order) and the weight of the comment (descending sort order); the weight is a total of how much users have engaged with that comment.
I achieved this by taking the plain integer out of the DateTime value and dividing it by the weight. Let's call the result "ranking".
This way a comment with a weight will always have a lower ranking than a comment without a weight.
The DateTime after stripping is a 14-digit integer, so dividing it by another number shouldn't be a problem.
So now we have one column that sorts the comments so that those with engagement come first, followed by the older comments, and so on until the newly posted comments at the end.
Now we can use this high-performance pagination method, which scales well:
SELECT * FROM Comments WHERE ranking > :lastRanking ORDER BY ranking ASC LIMIT :limit;
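A different technique that avoids merging the two sort columns is keyset (seek) pagination over both of them: remember the sort values of the last row shown and ask only for rows that sort after it. The sketch below is not the method described above; it runs against an in-memory SQLite database from Python, uses commentID as an assumed tie-breaker, and the sample rows and the fetch_page helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Comments (
        commentID INTEGER PRIMARY KEY,
        externalPostID INTEGER,
        weight INTEGER,
        createdAt TEXT
    );
    INSERT INTO Comments VALUES
        (1, 1, 0, '2020-01-01 10:00:00'),
        (2, 1, 5, '2020-01-01 11:00:00'),
        (3, 1, 5, '2020-01-01 09:00:00'),
        (4, 1, 0, '2020-01-01 08:00:00'),
        (5, 1, 2, '2020-01-01 12:00:00'),
        (6, 2, 9, '2020-01-01 07:00:00');
""")

def fetch_page(post_id, limit, after=None):
    """Fetch one page ordered by weight DESC, createdAt ASC, commentID ASC.

    `after` is the last (commentID, weight, createdAt) row of the previous
    page, or None for the first page.
    """
    sql = ("SELECT commentID, weight, createdAt FROM Comments "
           "WHERE externalPostID = :post")
    params = {"post": post_id, "limit": limit}
    if after is not None:
        params.update(id=after[0], w=after[1], c=after[2])
        # Seek predicate: keep only rows that sort strictly after the last one.
        sql += (" AND (weight < :w"
                " OR (weight = :w AND (createdAt > :c"
                " OR (createdAt = :c AND commentID > :id))))")
    sql += " ORDER BY weight DESC, createdAt ASC, commentID ASC LIMIT :limit"
    return conn.execute(sql, params).fetchall()

page1 = fetch_page(1, 2)
page2 = fetch_page(1, 2, after=page1[-1])
page3 = fetch_page(1, 2, after=page2[-1])
print(page1, page2, page3)
```

With a suitable composite index the database can seek directly to the start of each page instead of counting past skipped rows. One caveat: if a comment's weight changes between page loads, it can be skipped or shown twice.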
OK, I want to suggest another way, which in my opinion is very useful.
$rowCount = 10; // number of rows fetched each time
$page = 1; // used to calculate the offset; increase only this value for each new page
$offset = ($page - 1) * $rowCount; // offset
SELECT
c.commentID,
c.externalPostID,
c.numOfLikes,
c.createdAt,
c.customerID,
c.numOfComments,
(CASE WHEN cl.customerID IS NULL THEN false ELSE true END) isLiked,
cc.text,
cu.reputation,
cu.firstName,
cu.lastName,
c.ranking
FROM
(SELECT *
FROM Comments
WHERE Comments.externalPostID = :externalPostID) c
LEFT JOIN CommentLikes cl ON cl.commentID = c.commentID AND cl.customerID = :customerID
INNER JOIN CommentContent cc ON cc.commentTextID = c.commentID
INNER JOIN Customers cu ON cu.customerID = c.customerID
ORDER BY c.ranking DESC, c.createdAt ASC LIMIT $rowCount OFFSET $offset
There may be an error in this, since I didn't test it; please bear with that.

Joins and correlated subquery

I cannot figure out a query for the situation where I want to display only customers with an unverified order, but exclude customers who already have at least one verified order. One customer can have multiple records in the DB, since a new record is created in the customers table for every order, so the only way to track a specific user is by customer_number.
My DB structure (simplified):
customers:
id | customer_number
orders:
id | customer_id | isVerified
I would probably need to combine join and correlated queries (to search records in customers table for every customer_number and check isVerified column for false) which in the end could be really slow especially for thousands of records.
I use Laravel so Eloquent ORM is available if this can make things easier.
(Second thought: Or maybe it would be faster and more efficient to rewrite that part to create only one user record for orders of specific user.)
Any ideas? Thank you
There are probably a few ways to do this, but you can achieve this result with a join, aggregation and conditional sum:
select a.customer_number,
sum( case when b.isVerified = 1 then 1 else 0 end ) as Num_Verified,
sum( case when b.isVerified = 0 then 1 else 0 end ) as Num_unVerified
from customers as a
left join
orders as b
on a.id = b.customer_id
group by a.customer_number
having Num_Verified = 0
and Num_unVerified > 0
SQLfiddle here
You can achieve it like this:
$customer_id = Customer::join('orders','customers.id','orders.customer_id')
->where('isVerified',true)
->select('orders.*')
->groupBy('customer_id')
->pluck('customer_id');
This will give customers with at least one verified order.
Now get customers with unverified orders as:
$customers = Customer::join('orders','customers.id','orders.customer_id')
->where('isVerified',false)
->whereNotIn('customer_id',$customer_id)
->select('customers.customer_number','orders.*')
->groupBy('customer_id')
->pluck('customer_id');
How about this one?
$customer_list = customers::where('customer_number',$customer_number)->with('orders',function($query){
$query->where('isVerified',0);
})->get();
One method is an aggregation query:
select c.customer_number
from customers c join
orders o
on c.id = o.customer_id
group by c.customer_number
having sum(isVerified = 1) = 0 and
sum(isVerified = 0) > 0;
This structure assumes that isVerified is a number that takes on the values of 0 for false and 1 for true.
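To see the aggregation approach working end to end, here is a small sketch against an in-memory SQLite database from Python. It assumes the simplified schema from the question (customers.id, customers.customer_number, orders.customer_id, orders.isVerified as 0/1); the customer numbers and rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, customer_number TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, isVerified INTEGER);
    -- One customers row per order, so a customer_number can repeat.
    INSERT INTO customers VALUES
        (1, 'C-100'), (2, 'C-100'), (3, 'C-200'), (4, 'C-300'), (5, 'C-300');
    INSERT INTO orders VALUES
        (1, 1, 0), (2, 2, 1),   -- C-100: one unverified, one verified
        (3, 3, 0),              -- C-200: unverified only
        (4, 4, 0), (5, 5, 0);   -- C-300: two unverified
""")

QUERY = """
    SELECT c.customer_number
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    GROUP BY c.customer_number
    HAVING SUM(o.isVerified = 1) = 0   -- no verified order at all
       AND SUM(o.isVerified = 0) > 0   -- at least one unverified order
    ORDER BY c.customer_number
"""

unverified_only = [row[0] for row in conn.execute(QUERY)]
print(unverified_only)
```

Grouping by customer_number rather than by the customers row is what lets a verified order on one record exclude the same person's unverified orders on another record.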

Need SQL query with good performance to select data that does NOT match criteria

I have a database with
a company table
a country table
a company_country n:n table which defines which company is available in which country
a product table (each product belongs to one specific categoryId)
and a company_product_country n:n:n table that defines which company offers which product in which country.
The latter has the three primary key columns companyId, productId, countryId and the additional columns val and limitedAvailability. val is an ENUM with the values yes|no|n/a, and limitedAvailability is an ENUM with the values 0|1.
Products within categories 1 or 2 are available in all countries and therefore get countryId = 0. But at the same time, only these very products may have a limitedAvailability = 1.
An SQLFiddle with a test database can be found here: http://www.sqlfiddle.com/#!9/a065a/1/0
It contains five countries, products and companies.
Background information on what I need to select from the database:
A PHP script generates a search form where an arbitrary list of countries and products can be selected. The products are separated by categories (I did not add the category table in the sample database, because it is not needed in this case). For the first category, I can select whether to exclude products with limited availability.
Generating the desired result works fine:
It displays all companies that are available in the selected countries and have at least one of the selected products available. The result offers a column that defines how many of the selected products are available by company.
If the user defines that one or more categories should not contain products with limited availability, then the products within the corresponding categories will not count as a match if the company offers them with limited availability only.
I am pleased with the performance of this query. My original database has got around 15 countries, 100 companies and 150 products. Selecting everything in the search form occupies the MySQL server for around two seconds which is acceptable for me.
The problem:
After generating the result list of companies which matches as many product search criteria as possible, I use PHP to iterate through those companies and run another SQL query that should give me the list of products that the company does not offer corresponding to the search criteria. The following is an example query for companyId 1 to find out which products are not available when
the desired products have the productIds 2, 4 and 5
the product's country availability should be at least one of the countryIds 1, 2 or 3
the product should not have a limitedAvailability when it is from categoryId = 2:
SELECT DISTINCT p.name
FROM `product` p
LEFT JOIN `company_product_country` cpc ON `p`.`productId` = `cpc`.`productId` AND `cpc`.`companyId` = 1
WHERE NOT EXISTS(
SELECT *
FROM company_product_country cpcTmp
WHERE `cpcTmp`.`companyId` = 1
AND cpcTmp.val = 'yes'
AND (
cpcTmp.limitedAvailability = 0
OR p.categoryId NOT IN(2)
)
AND cpcTmp.productId = p.productId
)
AND p.`productId` IN (2,4,5)
AND countryId IN(0,1,2,3);
The database along with this query can be found on the SQLFiddle linked above.
The query generates the correct result, but its performance decreases dramatically with the number of products. My local SQL server needs about 4 seconds per company when searching for 150 products in 15 countries. This is unacceptable when iterating through 100 companies. Is there any way to improve this query, like avoiding the IN(...) function containing up to 150 products? Or should I maybe split the query into two like so:
First fetch the unmatched products that do not have country Id 0 and are IN the desired countryIds
Then fetch the unmatched products in countryId = 0 and if applicable filter limitedAvailability = 0
?
Your help is gladly appreciated!
I would suggest writing the query like this:
SELECT p.name
FROM product p
WHERE EXISTS (select 1
from company_product_country cpc
where p.productid = cpc.productid and
cpc.companyid = 1 and
cpc.countryid in (1, 2, 3)
) and
NOT EXISTS (select 1
from company_product_country cpcTmp
where cpcTmp.productId = p.productId and
cpcTmp.companyId = 1 and
cpcTmp.val = 'yes' and
cpcTmp.limitedAvailability = 0
) AND
NOT EXISTS (select 1
from company_product_country cpcTmp
where cpcTmp.productId = p.productId and
cpcTmp.companyId = 1 and
cpcTmp.val = 'yes' and
p.categoryId NOT IN (2)
) AND
p.`productId` IN (2, 4, 5) ;
Then, you want the following indexes:
product(productid, categoryid, name)
company_product_country(productid, companyid, countryid)
company_product_country(productid, companyid, val, limitedavailability)
company_product_country(productid, companyid, val, category)
Note: these indexes completely "cover" the query, meaning that all columns in the query come from the indexes. For most purposes, it is probably sufficient to have a single index on company_product_country. Any of the three would do.
Take the query that identifies the products that match the user selection. Subquery it and outer join it to the products table. Exclude the matches.
SQL Fiddle
SELECT p.name
FROM
product p LEFT JOIN
(
SELECT productId
FROM company_product_country cpcTmp
WHERE companyId = 1 AND
countryId IN (0,1,2,3) AND
(
productId IN (4, 5) OR
(productId = 2 AND limitedAvailability = 0)
)
) t
ON p.productId = t.productId
WHERE
t.productId IS NULL AND
p.productId IN (2,4,5)
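The same anti-join idea can also be extended to return the missing products for every company in one statement, instead of running one query per company from PHP. This is only a sketch against an in-memory SQLite database from Python, under an assumed reading of the rules: a product counts as offered when the company has a 'yes' row in one of the selected countries (or countryId 0) that is either not limited or belongs to a category other than 2. The company table layout and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (companyId INTEGER PRIMARY KEY);
    CREATE TABLE product (productId INTEGER PRIMARY KEY, name TEXT, categoryId INTEGER);
    CREATE TABLE company_product_country (
        companyId INTEGER, productId INTEGER, countryId INTEGER,
        val TEXT, limitedAvailability INTEGER,
        PRIMARY KEY (companyId, productId, countryId)
    );
    INSERT INTO company VALUES (1), (2);
    INSERT INTO product VALUES
        (2, 'Product 2', 2), (4, 'Product 4', 3), (5, 'Product 5', 3);
    INSERT INTO company_product_country VALUES
        (1, 2, 0, 'yes', 1),   -- limited and category 2: does not count
        (1, 4, 1, 'yes', 0),   -- counts
        (1, 5, 9, 'yes', 0),   -- country 9 was not selected: does not count
        (2, 2, 0, 'yes', 0),   -- counts
        (2, 4, 2, 'no',  0),   -- val is 'no': does not count
        (2, 5, 3, 'yes', 0);   -- counts
""")

# Pair every company with every selected product, then keep the pairs
# for which no qualifying availability row exists.
QUERY = """
    SELECT c.companyId, p.name
    FROM company c
    CROSS JOIN product p
    LEFT JOIN company_product_country cpc
           ON cpc.companyId = c.companyId
          AND cpc.productId = p.productId
          AND cpc.val = 'yes'
          AND cpc.countryId IN (0, 1, 2, 3)
          AND (cpc.limitedAvailability = 0 OR p.categoryId NOT IN (2))
    WHERE p.productId IN (2, 4, 5)
      AND cpc.productId IS NULL
    ORDER BY c.companyId, p.productId
"""

missing = conn.execute(QUERY).fetchall()
print(missing)
```

Whether this is faster than the per-company loop depends on the data and indexes, so it is worth checking with EXPLAIN on the real database.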

SQL to search by associative items

I have the below two tables and I need to be able to search by items to find the shopping_list_id. Also, I want to limit the query so that it doesn't bring back other shopping lists with additional items on it. Essentially, I'm checking to see if this is a shopping list the user has saved before. The below query does NOT handle if there are shopping lists that match but with additional items, I'm stumped as to how to do that.
tables:
shopping_list
shopping_list_id
user
shopping_list_name
shopping_list_item
shopping_list_item_id
shopping_list_id
category_id
qty
qty_unit_id
This example has three items, but there could be any number. My PHP code dynamically generates the SQL joins and where clause based on the user's input.
Query that I have:
SELECT DISTINCT sli.shopping_list_id
FROM shopping_list_item sli
JOIN shopping_list sl ON sli.shopping_list_id=sl.shopping_list_id
JOIN shopping_list_item sli0 on sli.shopping_list_id=sli0.shopping_list_id
JOIN shopping_list_item sli1 on sli.shopping_list_id=sli1.shopping_list_id
JOIN shopping_list_item sli2 on sli.shopping_list_id=sli2.shopping_list_id
WHERE sl.user_id=:webuser_id
AND sli0.category_id=3 AND sli0.qty=1 AND sli0.qty_unit_id=3
AND sli1.category_id=683 AND sli1.qty=1 AND sli1.qty_unit_id=3
AND sli2.category_id=309 AND sli2.qty=1 AND sli2.qty_unit_id=7
You can do this pretty easily with the group by/having approach to this type of query:
select sli.shopping_list_id
from shopping_list_item sli
group by sli.shopping_list_id
having sum(sli.category_id = 3 AND sli.qty = 1 AND sli.qty_unit_id = 3) = 1 and
sum(sli.category_id = 683 AND sli.qty = 1 AND sli.qty_unit_id = 3) = 1 and
sum(sli.category_id = 309 AND sli.qty = 1 AND sli.qty_unit_id = 7) = 1 and
count(*) = 3;
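Since the number of items varies, the HAVING clause has to be generated. Below is a rough sketch of that generation with bound parameters, run against an in-memory SQLite database from Python rather than PHP. The find_exact_lists helper and the sample lists are made up for illustration, and the user filter from the question is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shopping_list_item (
        shopping_list_item_id INTEGER PRIMARY KEY,
        shopping_list_id INTEGER,
        category_id INTEGER,
        qty INTEGER,
        qty_unit_id INTEGER
    );
    INSERT INTO shopping_list_item
        (shopping_list_id, category_id, qty, qty_unit_id) VALUES
        (1, 3, 1, 3), (1, 683, 1, 3), (1, 309, 1, 7),                -- exact match
        (2, 3, 1, 3), (2, 683, 1, 3), (2, 309, 1, 7), (2, 42, 2, 1), -- one extra item
        (3, 3, 1, 3), (3, 683, 1, 3);                                -- one item short
""")

def find_exact_lists(items):
    """Return ids of lists containing exactly the given items.

    `items` is a list of (category_id, qty, qty_unit_id) tuples.
    """
    if not items:
        return []
    # One condition per wanted item: it must appear exactly once in the list.
    clauses = ["SUM(category_id = ? AND qty = ? AND qty_unit_id = ?) = 1"] * len(items)
    sql = ("SELECT shopping_list_id FROM shopping_list_item "
           "GROUP BY shopping_list_id HAVING "
           + " AND ".join(clauses)
           # The total row count rules out lists with additional items.
           + " AND COUNT(*) = ? ORDER BY shopping_list_id")
    params = [value for item in items for value in item] + [len(items)]
    return [row[0] for row in conn.execute(sql, params)]

wanted = [(3, 1, 3), (683, 1, 3), (309, 1, 7)]
matches = find_exact_lists(wanted)
print(matches)
```

List 2 contains all three items but is rejected by the COUNT(*) check, and list 3 is rejected because one of the per-item sums is 0.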
