Efficient comment system pagination query - php

I've been looking around the web for information about pagination.
From what I've seen there are three kinds: (a) LIMIT/OFFSET, (b) WHERE id > :num ORDER BY id LIMIT 10, and (c) cursor pagination like that used on Facebook and Twitter.
I decided to go with option "b" for my project, as it looks pretty straightforward and efficient.
I'm trying to create a Facebook-like post and comment system, but not as complex.
I have a ranking system for posts and comments, and the top 2 comments for each post are fetched together with the post.
The rest of the comments for a specific post are fetched when people click to see more comments.
This is a query for post comments:
SELECT
c.commentID,
c.externalPostID,
c.numOfLikes,
c.createdAt,
c.customerID,
c.numOfComments,
(CASE WHEN cl.customerID IS NULL THEN false ELSE true END) isLiked,
cc.text,
cu.reputation,
cu.firstName,
cu.lastName,
c.ranking
FROM
(SELECT *
FROM Comments
WHERE Comments.externalPostID = :externalPostID) c
LEFT JOIN CommentLikes cl ON cl.commentID = c.commentID AND cl.customerID = :customerID
INNER JOIN CommentContent cc ON cc.commentTextID = c.commentID
INNER JOIN Customers cu ON cu.customerID = c.customerID
ORDER BY c.weight DESC, c.createdAt ASC LIMIT 10 OFFSET 2
The OFFSET 2 is there because 2 comments were already fetched earlier as the top 2 comments.
I'm looking for a similar way of fetching the next 10 comments each time without the DB going through all the preceding rows, as it does with LIMIT/OFFSET.
The problem is that I have two columns sorting the results, and that won't allow me to use this method:
SELECT * FROM Comments WHERE id > :lastId LIMIT :limit;
Huge thanks to anyone who helps!
Solution So Far:
For efficient pagination we need a single column whose values are as unique as possible and form a sequence, so that we can sort the data by it and paginate through it.
My example sorts by two columns, which is the problem.
What I did was combine the creation time (ascending sort order) with the weight of the comment (descending sort order); the weight is a total of how much users have engaged with that comment.
I achieved this by taking the plain integer out of the DateTime format and dividing it by the weight. Let's call the result "ranking".
This way a comment with a weight will always have a lower ranking than a comment without a weight.
The DateTime after stripping is a 14-digit integer, so dividing it by another number shouldn't be a problem.
So now we have one column that sorts the comments so that comments with engagement come first, followed by the older comments, and so on down to the newest comments at the end.
Now we can use this high-performance pagination method that scales well:
SELECT * FROM Comments WHERE ranking > :lastRanking ORDER BY ranking ASC LIMIT :limit;
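One thing to watch with a single ranking column: if two comments ever share the same ranking, a `ranking > :lastRanking` condition can skip one of them. A commonly used alternative is to keep both sort columns and seek on all of them, with the primary key as a tie-breaker. The following is only a minimal sketch of that idea, not the method described above. It uses Python with an in-memory SQLite database purely so that it can be run as is; the cut-down Comments table, the sample rows and the helper name `next_page` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Comments ("
    " commentID INTEGER PRIMARY KEY,"
    " externalPostID INTEGER, weight INTEGER, createdAt TEXT)"
)
rows = [
    (1, 7, 5, "2018-01-01 10:00:00"),
    (2, 7, 5, "2018-01-01 10:00:00"),  # ties with comment 1 on both sort columns
    (3, 7, 0, "2018-01-02 09:00:00"),
    (4, 7, 9, "2018-01-03 08:00:00"),
    (5, 7, 0, "2018-01-01 12:00:00"),
    (6, 8, 3, "2018-01-01 12:00:00"),  # belongs to another post
]
conn.executemany("INSERT INTO Comments VALUES (?, ?, ?, ?)", rows)


def next_page(post_id, last=None, limit=2):
    """Return the next `limit` comments of a post.

    `last` is the last row already shown, as (commentID, weight, createdAt),
    or None for the first page.
    """
    sql = (
        "SELECT commentID, weight, createdAt FROM Comments"
        " WHERE externalPostID = :post"
    )
    params = {"post": post_id, "limit": limit}
    if last is not None:
        # Seek past the cursor in the order weight DESC, createdAt ASC, commentID ASC.
        sql += (
            " AND (weight < :w"
            " OR (weight = :w AND createdAt > :c)"
            " OR (weight = :w AND createdAt = :c AND commentID > :id))"
        )
        params.update(id=last[0], w=last[1], c=last[2])
    sql += " ORDER BY weight DESC, createdAt ASC, commentID ASC LIMIT :limit"
    return conn.execute(sql, params).fetchall()


page1 = next_page(7)
page2 = next_page(7, last=page1[-1])
page3 = next_page(7, last=page2[-1])
print([r[0] for r in page1], [r[0] for r in page2], [r[0] for r in page3])
```

The client only has to send back the three values of the last comment it displayed. An index covering (externalPostID, weight, createdAt, commentID) would be the natural companion to such a query, although whether the optimizer uses it for the mixed sort directions depends on the database and version.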

I'd like to suggest another way, which in my opinion is very useful.
$rowCount = 10; // number of rows fetched each time
$page = 1; // used to calculate the offset; increase only this value for each new page
$offset = ($page - 1) * $rowCount; // offset
SELECT
c.commentID,
c.externalPostID,
c.numOfLikes,
c.createdAt,
c.customerID,
c.numOfComments,
(CASE WHEN cl.customerID IS NULL THEN false ELSE true END) isLiked,
cc.text,
cu.reputation,
cu.firstName,
cu.lastName,
c.ranking
FROM
(SELECT *
FROM Comments
WHERE Comments.externalPostID = :externalPostID) c
LEFT JOIN CommentLikes cl ON cl.commentID = c.commentID AND cl.customerID = :customerID
INNER JOIN CommentContent cc ON cc.commentTextID = c.commentID
INNER JOIN Customers cu ON cu.customerID = c.customerID
ORDER BY c.ranking DESC, c.createdAt ASC LIMIT $rowCount OFFSET $offset
There may be an error in it because I haven't tested it, so please bear with me.

Related

Mysql (doctrine) is duplicating count results probably due to grouping by but can't figure it out

I'm using Doctrine as an ORM layer, but running the plain MySQL query in my database tool gives the same results. So my problem:
I have an invoices table, an invoice_items table and an invoice_payments table, and what I would like to get as a result is all invoices that are unpaid or not yet fully paid. I know the query should be almost correct because it returns the correct number of invoices... the only problem is that it multiplies the amount by some number, probably the number of joined rows.
So the query that i have for now:
select invoice.*, sum(item.amount * item.quantity) as totalDue,
sum(payment.amount) as totalPaid
from invoices as invoice
left join invoice_items as item on item.invoice_id = invoice.id
left join invoice_payments as payment on payment.invoice_id = invoice.id
and payment.status = 'successful'
where invoice.invoice_number is not null
and invoice.sent_at is not null
and invoice.due_date >= '2018-05-15'
group by invoice.id
having count(payment.id) = 0
or sum(payment.amount) < sum(item.amount * item.quantity)
order by invoice.issue_date desc, sum(payment.amount) desc;
As you can see, I also have totalDue and totalPaid in my select (those are for reference only and should be removed once the query is correct).
What I saw is that the amount is multiplied by six (because the invoice has 6 rows in the payments table).
So maybe someone could point me in the right direction so that totalDue is not multiplied. I was thinking it's probably because of the group by, but without it my query fails.
By simply using a distinct in my query I fixed the problem.
select invoice.*, sum(distinct(item.amount * item.quantity)) as totalDue,
sum(payment.amount) as totalPaid
from invoices as invoice
left join invoice_items as item on item.invoice_id = invoice.id
left join invoice_payments as payment on payment.invoice_id = invoice.id
and payment.status = 'successful'
where invoice.invoice_number is not null
and invoice.sent_at is not null
and invoice.due_date >= '2018-05-15'
group by invoice.id
having count(payment.id) = 0
or sum(payment.amount) < sum(distinct(item.amount * item.quantity))
order by invoice.issue_date desc, sum(payment.amount) desc;
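One caveat with SUM(DISTINCT ...): it removes duplicate values, not duplicate rows, so two items on the same invoice that happen to have the same amount * quantity would be counted only once, and sum(payment.amount) is still repeated once per item row. A common alternative is to aggregate items and payments in separate subqueries before joining them to the invoice. The sketch below illustrates that with Python and an in-memory SQLite database so that it can be run directly; the reduced table definitions and sample rows are assumptions, and the invoice_number, sent_at and due_date filters are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoices (id INTEGER PRIMARY KEY);
CREATE TABLE invoice_items (id INTEGER PRIMARY KEY, invoice_id INTEGER,
                            amount REAL, quantity INTEGER);
CREATE TABLE invoice_payments (id INTEGER PRIMARY KEY, invoice_id INTEGER,
                               amount REAL, status TEXT);
INSERT INTO invoices VALUES (1), (2);
-- invoice 1: two items with the same line total (2 x 10 each), two payments
INSERT INTO invoice_items VALUES (1, 1, 10, 2), (2, 1, 10, 2);
INSERT INTO invoice_payments VALUES (1, 1, 15, 'successful'), (2, 1, 15, 'successful');
-- invoice 2: one item, fully paid
INSERT INTO invoice_items VALUES (3, 2, 50, 1);
INSERT INTO invoice_payments VALUES (3, 2, 50, 'successful');
""")

# Each subquery yields at most one row per invoice, so the joins cannot
# multiply item rows by payment rows.
unpaid = conn.execute("""
SELECT invoice.id, items.totalDue, COALESCE(payments.totalPaid, 0) AS totalPaid
FROM invoices AS invoice
LEFT JOIN (SELECT invoice_id, SUM(amount * quantity) AS totalDue
           FROM invoice_items
           GROUP BY invoice_id) AS items
       ON items.invoice_id = invoice.id
LEFT JOIN (SELECT invoice_id, SUM(amount) AS totalPaid
           FROM invoice_payments
           WHERE status = 'successful'
           GROUP BY invoice_id) AS payments
       ON payments.invoice_id = invoice.id
WHERE COALESCE(payments.totalPaid, 0) < items.totalDue
ORDER BY invoice.id
""").fetchall()

print(unpaid)
```

In this sample data invoice 1 is due 40 and has 30 paid, so it is the only row returned; with the joined SUM(DISTINCT ...) form its two identical line totals would collapse into one.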
I would like to thank all the people who had taken the time to re-style my question ;-)

Sorting data from MySQL by SUM with Grouping By - not working properly

I have a problem with MySQL. I'm kind of new to it, but looking to improve my skills :)
I have code like this:
if($where > 0) $query = mysql_query("SELECT img.*, user.user as owner_name, cat.name as cat_name FROM tentego_img AS img LEFT JOIN tablicacms_users AS user ON user.id = img.owner LEFT JOIN tentego_img_cat AS cat ON cat.id = img.cat WHERE img.`is_waiting` LIKE ".$where.$cat." INNER JOIN tentego_img_vote ON tentego_img.id = tentego_img_vote.object_id GROUP BY tentego_img_vote.object_id ORDER BY SUM ( (CASE WHEN tentego_img_vote.vote = '0' THEN '-1' ELSE '1' END) ) DESC LIMIT ".$page.",".$objPerPage);
I need to sort by number of votes, descending.
Still, it sorts the results in its own way.
In the votes table I have these columns:
ID - vote id for table purposes
object_id - id of the object, joined with another table to show results
User ID - user id
Vote - where values are 0 for dislike and 1 for like (so -1 for 0, and +1 for 1)
So, as I understand it, I need to sum up all records for each unique object_id, then sort by the sum of vote values of each.
This code worked before my script provider decided to upgrade it, so right now I don't know how to fix it :(
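In SQL, joins belong to the FROM clause and have to come before WHERE, and a table that has been given an alias (img) must be referenced by that alias afterwards; the query above does neither for tentego_img_vote. As a hedged sketch of the summing and sorting described above, the following uses Python with an in-memory SQLite database so that it can be run directly. The cut-down tentego_img and tentego_img_vote tables, the user_id column name and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tentego_img (id INTEGER PRIMARY KEY, is_waiting INTEGER);
CREATE TABLE tentego_img_vote (id INTEGER PRIMARY KEY, object_id INTEGER,
                               user_id INTEGER, vote INTEGER);
INSERT INTO tentego_img VALUES (1, 0), (2, 0), (3, 0);
INSERT INTO tentego_img_vote (object_id, user_id, vote) VALUES
  (1, 10, 1), (1, 11, 0),
  (2, 10, 1), (2, 11, 1), (2, 12, 1),
  (3, 10, 0);
""")

# Joins first, then WHERE, then GROUP BY / ORDER BY / LIMIT.
# vote = 0 counts as -1, vote = 1 as +1, and a missing vote as 0.
ranked = conn.execute("""
SELECT img.id,
       COALESCE(SUM(CASE v.vote WHEN 0 THEN -1 WHEN 1 THEN 1 ELSE 0 END), 0) AS score
FROM tentego_img AS img
LEFT JOIN tentego_img_vote AS v ON v.object_id = img.id
WHERE img.is_waiting = 0
GROUP BY img.id
ORDER BY score DESC
LIMIT 10
""").fetchall()

print(ranked)
```

The LEFT JOIN keeps images that have no votes yet; an INNER JOIN, as in the original query, would drop them from the list.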

Better way than current query to assemble random categorized entries?

I am trying to display exactly 6 random 'entertainment' entries, but my current query picks a random number between 1 and 6 and displays that many entries. How do I update this query so that it displays exactly 6 random entertainment entries from my Articles table? Also, I don't want to use ORDER BY RAND() because my table will grow over time. Here's my current query:
SELECT
r1.*
FROM
Articles AS r1
INNER JOIN (SELECT(RAND() * (SELECT MAX(id) FROM Articles)) AS id) AS r2
WHERE
r1.id >= r2.id
AND r1.category = 'entertainment'
LIMIT 6;
Table structure:
table Articles
- id (int)
- category (varchar)
- title (varchar)
- image (varchar)
- link (varchar)
- Counter (int)
- dateStamp (datetime)
Your 'entertainment' entries should all have unique ids, which should be integers.
If this is the case, you could generate 6 random integers between 1 and the number of entries you have using PHP's rand() function. Here is a function I've written which may be useful.
function selectSixRandomEntries() {
    // Collect 6 distinct random ids; 200 stands in for the highest id in the table.
    $ids = array();
    while (count($ids) < 6) {
        $randomNumber = rand(1, 200);
        if (in_array($randomNumber, $ids)) {
            continue; // skip ids that were already picked
        }
        $ids[] = $randomNumber;
    }
    // Build a single condition such as "r1.id IN (3, 17, 42, 88, 120, 199)".
    return "r1.id IN (" . implode(", ", $ids) . ")";
}
And to use it you could try
$query = "SELECT
r1.*
FROM
Articles AS r1
INNER JOIN (SELECT(RAND() * (SELECT MAX(id) FROM Articles)) AS id) AS r2
WHERE
" . selectSixRandomEntries() . "
AND r1.category = 'entertainment'
LIMIT 6";
With
select floor(rand() * m.maxId + 1) as randomId
from Articles a
join (SELECT MAX(id) maxId FROM Articles) m
limit 100
you will create 100 random ids. I take 100 because you have gaps in your id column, so the probability of not getting enough existing ids will be (very) small. Then you can use that result to select only 6 rows with those ids:
select distinct a.*
from (
select id, floor(rand() * m.maxId + 1) as randomId
from Articles a
join (SELECT MAX(id) maxId FROM Articles) m
limit 100
) r
join Articles a on a.id = r.randomId
order by r.id -- only need it for small tables. will slow down the query on big tables
limit 6
The best value for LIMIT in the subselect depends on percentage of gaps in your ids. 100 should be enough and fast.
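The same oversampling idea can also be done on the application side. The sketch below is one possible variant, assuming Python with an in-memory SQLite database so that it can be run as is; the helper name `six_random_articles`, the reduced Articles table and the pattern of deleted ids are all made up for illustration.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Articles (id INTEGER PRIMARY KEY, category TEXT)")
# ids 1..1000 with every tenth id missing, to mimic deleted rows
conn.executemany(
    "INSERT INTO Articles VALUES (?, 'entertainment')",
    [(i,) for i in range(1, 1001) if i % 10 != 0],
)


def six_random_articles(candidates=100, wanted=6):
    """Pick `wanted` random existing ids by oversampling candidate ids."""
    max_id = conn.execute("SELECT MAX(id) FROM Articles").fetchone()[0]
    # Oversample: some candidates fall into gaps or repeat, so draw many more
    # than needed and let the set remove the repeats.
    ids = {random.randint(1, max_id) for _ in range(candidates)}
    placeholders = ", ".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id FROM Articles WHERE id IN ({placeholders})", tuple(ids)
    ).fetchall()
    existing = [r[0] for r in rows]
    # Choose among the ids that really exist; fewer than `wanted` are returned
    # only if too many candidates hit gaps.
    return random.sample(existing, min(wanted, len(existing)))


picked = six_random_articles()
print(picked)
```

As with the SQL version, the number of candidates has to be chosen with the share of gaps in mind.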
Update
If you need to filter by category you can add a WHERE a.category = 'entertainment' clause before ORDER BY and LIMIT. But in that case you will need to adjust the number of generated random ids.
For example: if you have inserted 1M articles but 10% of them have been deleted, then on average 90 of the 100 randomly generated ids really exist. If 10% of articles have category = 'entertainment', then an average of 9 random rows will match the condition. Average means that it might be 3 and might also be 16. So you need to generate more random ids to be sure that you get at least 6 articles. With LIMIT 1000 in the subselect you will get an average of 90 random entertainment articles. This way you are very unlikely to get fewer than 6. So you need to know the statistics of your table in order to pick a good LIMIT.
Another issue with the WHERE clause is that MySQL might reverse the join order to use an index for filtering. This might be faster for a small number of generated random ids, but might be slower if the LIMIT in the subselect is huge. You can force the join order by using STRAIGHT_JOIN instead of JOIN, but in my test with LIMIT 10000 it didn't make a measurable difference.
If your condition is too selective (e.g. only 1% of articles have category = 'entertainment'), a simple ORDER BY RAND() can be faster, because otherwise you would need to create too many random ids. But with up to 10K rows matching your condition, ORDER BY RAND() will be fast enough.
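The arithmetic in this example can be checked with a small calculation. The sketch below treats every generated id as an independent draw that "hits" with probability 0.9 * 0.1, which is a simplifying assumption (it ignores ids that are generated twice), and uses the binomial distribution to estimate how likely it is to end up with fewer than 6 matching rows. The function name `prob_fewer_than` is made up for this illustration.

```python
from math import comb


def prob_fewer_than(k, n, p):
    """P(X < k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


# An id "hits" if it still exists (90%) and is in the category (10%).
p_hit = 0.9 * 0.1

for n in (100, 1000):
    expected = n * p_hit
    risk = prob_fewer_than(6, n, p_hit)
    print(f"LIMIT {n}: expected matches = {expected:.0f}, P(fewer than 6) = {risk:.6f}")
```

Plugging in the gap and category percentages of a real table gives a rough feel for how large the LIMIT in the subselect should be.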

A Delete-proof way to get the index of a table row?

I have this nice and neat way of loading posts for my blog website to fit specified page:
$end = $count - ($page * $ppp); //count = select max(id) from art;
$start = $count- ($page * $ppp) - ($ppp-1);
$nxtpage = $page +1; //this is set beforehand in case no posts exists
$prvpage = $page == 0 ? 0 : $page -1;
$sql = "SELECT
a.id AS id,
a.nazwa AS nazwa,
a.data AS data,
a.wstep AS wstep,
a.imgs AS imgs,
a.zdj AS zdj,
GROUP_CONCAT(t.nazwa) all_tags
FROM
art a INNER JOIN tagart ta ON a.id = ta.id INNER JOIN tags t ON t.idt = ta.idt
WHERE a.id BETWEEN $start AND $end
GROUP BY a.id
ORDER BY a.id desc";
This way I can load only a specified number of posts, depending on the blog's page (pagination).
There is a pretty big problem with it, though.
Let's say my client made a mistake in an article half a year ago, such as writing something inappropriate, and now has to delete it.
Or even better, has to delete about 10 posts. When posts in the middle are deleted, the whole algorithm gets messed up, because it selects posts based on their ID.
So my question is: what better way of picking the posts could I use that would always get the posts in the correct order?
It looks like you're trying to implement your own version of LIMIT, which is a MySQL feature that handles pagination. Instead of manually defining your start and end IDs, you should order your posts and then fetch only the next X posts, no matter what their IDs are. Here's how you would do that:
$start = ($page - 1) * $ppp;
$sql = "SELECT
a.id AS id,
a.nazwa AS nazwa,
a.data AS data,
a.wstep AS wstep,
a.imgs AS imgs,
a.zdj AS zdj,
GROUP_CONCAT(t.nazwa) all_tags
FROM
art a INNER JOIN tagart ta ON a.id = ta.id INNER JOIN tags t ON t.idt = ta.idt
GROUP BY a.id
ORDER BY a.id DESC
LIMIT $start,$ppp";
LIMIT is used as either
LIMIT 5 #Fetch first 5 items
or
LIMIT 5,10 #Skip the first 5 items, then fetch the next 10 items
Instead of using max(id) to determine the number of posts, use count(id) or count(*) to actually count them. If a post gets deleted, the count takes that into account.
In the select query, use limit to select the range of posts to show.
As you already figured out, you should not be relying on MAX(id) to count the number of records.
I would just use a separate count query to get the count. It's simple and relatively inexpensive:
SELECT COUNT(id) FROM art
And as others have already mentioned, use LIMIT to paginate instead of limiting by id.
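To see why counting rows and paging with LIMIT survives deletions, here is a small runnable sketch. It assumes Python with an in-memory SQLite database and a cut-down art table; the helper name `fetch_page` and the 1-based page numbering are choices made for this illustration, not taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE art (id INTEGER PRIMARY KEY, nazwa TEXT)")
conn.executemany(
    "INSERT INTO art VALUES (?, ?)", [(i, f"post {i}") for i in range(1, 11)]
)
# Delete posts from the middle so that the ids have gaps.
conn.execute("DELETE FROM art WHERE id IN (4, 5, 8)")

ppp = 3  # posts per page


def fetch_page(page):
    """Return the ids shown on a 1-based page, newest first."""
    start = (page - 1) * ppp
    rows = conn.execute(
        "SELECT id FROM art ORDER BY id DESC LIMIT ? OFFSET ?", (ppp, start)
    ).fetchall()
    return [r[0] for r in rows]


total = conn.execute("SELECT COUNT(*) FROM art").fetchone()[0]
pages = (total + ppp - 1) // ppp  # ceiling division

for page in range(1, pages + 1):
    print(page, fetch_page(page))
```

Every page except the last is full even though ids 4, 5 and 8 are gone, and the page count comes from the number of rows that actually exist rather than from the highest id.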

Nested count() function with % percentage operation

I’m designing a program for my school to keep student attendance records. So far I have the following query working fine and now I would like to add an IF statement to perform a percentage operation when a certain condition is given. As it is, the query is using INNER JOIN to search for data from two different tables (oxadmain and stuattend) and it’s displaying the results well on a results table:
SELECT o.name
, o.year
, o.photoID
, o.thumbs
, s.ID
, s.studid
, s.date
, s.teacher
, s.subject
, s.attendance
FROM stuattend s
JOIN oxadmain o
ON s.studid = o.stuid
ORDER
BY name ASC
Now I would like to add an “if” statement that
1) finds when stuattend.attendance is = Absent, calculates the percentage of absences the students may have in any given period of time, and then stores that (%) value in “percentage” and
2) ELSE assigns the value of 100% to “Percentage”.
So far I’ve been trying with the following:
<?php $_GET['studentID'] = $_row_RepeatedRS['WADAstuattend']; ?>
SELECT oxadmain.name , oxadmain.year , oxadmain.photoID , oxadmain.thumbs , stuattend.ID , stuattend.studid , stuattend.date , stuattend.teacher, stuattend.subject , stuattend.attendance
CASE
WHEN stuattend.attendance = Absent THEN SELECT Count (studentID) AS ClassDays, (SELECT Count(*) FROM stuattend WHERE studentID = stuattend.studid AND Absent = 1) AS ClassAbsent, ROUND ((ClassAbsent/ClassDays)*100, 2) AS Percentage
ELSE
Percentage = 100
END
FROM stuattend INNER JOIN oxadmain ON stuattend.studid=oxadmain.stuid
ORDER BY name ASC
Any suggestions on how to do this well?
Thank you for your attention
The basic idea would be:
select stuattend.studid, sum(stuattend.attendance = 'Absent') / count(*)
from stuattend
group by stuattend.studid;
This depends very much on there being exactly one entry per student per day, and of course yields 0 if there are no absences and 1 if the student is always absent.
To make this a bit more robust, I would suggest creating a calendar table that simply lists all days, with a column indicating whether each one is a school day, so workday=1 means the students should have been there and workday=0 means a Sunday or holiday. Then you could left join from this table to the presence and absence days, and it would give good results even when presence is not recorded in your table.
Just ask once you have decided which way to go.
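A runnable version of the basic idea might look like the sketch below. It assumes Python with an in-memory SQLite database, a reduced stuattend table with one row per student and class day, and attendance stored as the strings 'Present' and 'Absent'; the column aliases are made up for this illustration. The CASE form is used instead of summing the comparison directly so that the query does not rely on MySQL treating a boolean as 0 or 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stuattend (ID INTEGER PRIMARY KEY, studid INTEGER,
                        date TEXT, attendance TEXT);
INSERT INTO stuattend (studid, date, attendance) VALUES
  (1, '2018-05-01', 'Present'), (1, '2018-05-02', 'Absent'),
  (1, '2018-05-03', 'Present'), (1, '2018-05-04', 'Present'),
  (2, '2018-05-01', 'Present'), (2, '2018-05-02', 'Present');
""")

# One output row per student: class days, absences, and absences as a percentage.
rows = conn.execute("""
SELECT studid,
       COUNT(*) AS classDays,
       SUM(CASE WHEN attendance = 'Absent' THEN 1 ELSE 0 END) AS classAbsent,
       ROUND(100.0 * SUM(CASE WHEN attendance = 'Absent' THEN 1 ELSE 0 END)
             / COUNT(*), 2) AS absentPercentage
FROM stuattend
GROUP BY studid
ORDER BY studid
""").fetchall()

for studid, class_days, class_absent, pct in rows:
    print(studid, class_days, class_absent, pct)
```

The per-student result can then be joined back to oxadmain on stuid to show the name and photo next to the percentage. A date filter in a WHERE clause would restrict it to a given period.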
