Sorting by date in reviews - php

Many review pages let you sort by date, helpfulness, and so on. Essentially, these work by running a new MySQL query with a different ORDER BY, right?
For example:
$sql = "SELECT * FROM TABLE ORDER BY TimeAdded";
$sql = "SELECT * FROM TABLE ORDER BY Helpfulness";
Or would there be a better way to do it?
Secondly, making pages for reviews. Is it as simple as using a loop and a limit in the MySQL query to show 10 results per page?
Edit:
For a huge review site, would letting MySQL handle all the sorting be ideal?

You could either let the database manage the ordering as you have there (recommended) or you could load all the results and then sort them using your PHP code. Alternatively, you could put them all into HTML and sort them using JavaScript.
For pagination, you can use the LIMIT and OFFSET clauses (or LIMIT 10,10) to page through a resultset (of course, if the resultset changes, your pagination may also change).
SELECT * FROM posts ORDER BY helpfulness LIMIT 10,10
OR
SELECT * FROM posts ORDER BY helpfulness LIMIT 10 OFFSET 10
For combining pagination and sorting, I'd definitely recommend leaving it up to the database to handle.
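As a rough illustration of combining the two, here is a minimal sketch of turning a user's sort choice and page number into one query. The function name, the `reviews` table, and the sort keys are assumptions made for the example; only the TimeAdded and Helpfulness columns come from the question.

```php
<?php
// Hypothetical helper: buildReviewQuery and the "reviews" table are
// illustrative names, not something from the original posts.
function buildReviewQuery(string $sort, int $page, int $perPage = 10): string
{
    // Map user-facing sort keys to fixed ORDER BY fragments so that
    // user input is never interpolated into the SQL directly.
    $allowed = [
        'date'        => 'TimeAdded DESC',
        'helpfulness' => 'Helpfulness DESC',
    ];
    $orderBy = $allowed[$sort] ?? $allowed['date']; // unknown key: default sort

    $page    = max(1, $page);           // pages are 1-based
    $perPage = max(1, $perPage);
    $offset  = ($page - 1) * $perPage;  // number of rows to skip

    // Both values are integers at this point, so embedding them is safe.
    return "SELECT * FROM reviews ORDER BY $orderBy LIMIT $perPage OFFSET $offset";
}

echo buildReviewQuery('helpfulness', 2), "\n";
```

The whitelist is the main design choice here: a column name cannot be bound as a prepared-statement parameter, so the sort option is looked up rather than passed through.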

You got it right on both counts, Doug.
For a huge review site, you still want MySQL to sort the results; you just want to make use of server-side optimizations (a good my.cnf, server-side caching, etc.) and code caching. That's a whole 'nother ball of wax, but the basics are the same.

Related

MySQL+PHP: How to paginate data from complex query with ORDER BY on user-selected column

I have a table with currently ~1500 rows which is expected to grow over time (can't say how much, but still), the website is read-only and lets users do complex queries through the use of some forms, then the search query is completely URL-encoded since it's a public database. It's important to know that users can select what column data must be sorted by.
I'm not concerned about putting some indexes and slowing down INSERTs and UPDATEs (just performed occasionally by admins) since it's basically heavy-reading, but I need to paginate results as some popular queries can return 900+ results and that takes up too much space and RAM on client-side (results are further processed to create a quite rich <div> HTML element with an <img> for each result, btw).
I'm aware of the use of LIMIT {$n} OFFSET {$m}, but would like to avoid it.
I'm also aware of this query:
SELECT *
FROM table
WHERE {$filters} AND id > {$last_id}
ORDER BY id ASC
LIMIT {$results_per_page}
and that's what I'd like to use, but that requires rows to be sorted only by their ID!
I've come up with (what I think is) a very similar query to custom sort results and allow efficient pagination.
Query:
SELECT *
FROM table
WHERE {$filters} AND {$column_id} > {$last_column_id}
ORDER BY {$column} ASC
LIMIT {$results_per_page}
but that unfortunately requires a {$last_column_id} value to be passed between pages!
I know indexes (especially unique indexes) are basically automatically-updated integer-based columns that "rank" a table by values of a column (be it integer, varchar etc.), but I really don't know how to make MySQL return the needed $last_column_id for that query to work!
The only thing I can come up with is to put an additional "XYZ_id" integer column next to every "XYZ" column users can sort results by, then update values periodically through some scripts, but is it the only way to make it work? Please help.
(Too many comments to fit into a 'comment'.)
Is the query I/O bound? Or CPU bound? It seems like a mere 1500 rows would lead to being CPU-bound and fast enough.
What engine are you using? How much RAM? What are the settings of key_buffer_size and innodb_buffer_pool_size?
Let's see SHOW CREATE TABLE. If the table is full of big BLOBs or TEXT fields, we need to code the query to avoid fetching those bulky fields only to throw them away because of OFFSET. Hint: Fetch the LIMIT IDs, then reach back into the table to get the bulky columns.
The only way for this to be efficient:
SELECT ...
WHERE x = ...
ORDER BY y
LIMIT 100,20
is to have INDEX(x,y). But even that will still have to step over 100 cow paddies.
You have implied that there are many possible WHERE and ORDER BY clauses? That would imply that adding enough indexes to cover all cases is probably impractical?
"Remembering where you left off" is much better than using OFFSET, so try to do that. That avoids the already-discussed problem with OFFSET.
Do not use WHERE (a,b) > (x,y); that construct used not to be optimized well. (Perhaps 5.7 has fixed it, but I don't know.)
My blog on OFFSET discusses your problem. (However, it may or may not help your specific case.)
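One possible way to "remember where you left off" when the sort column is chosen by the user is to carry both the last row's sort value and its id from page to page, using the id as a tie-breaker. The sketch below is only an illustration under assumptions: the `items` table, the whitelist of sortable columns, and the function name are all made up, and the `{$filters}` part of the question is left out for brevity. It writes the comparison in expanded form rather than as a row constructor.

```php
<?php
// Sketch only: "items", the sortable column list, and buildKeysetQuery
// are hypothetical names chosen for this example.
function buildKeysetQuery(string $sortColumn, ?array $cursor, int $perPage): array
{
    $sortable = ['name', 'year', 'price']; // assumed whitelist of sort columns
    if (!in_array($sortColumn, $sortable, true)) {
        throw new InvalidArgumentException("Cannot sort by $sortColumn");
    }
    $perPage = max(1, $perPage);

    $where  = '';
    $params = [];
    if ($cursor !== null) {
        // Rows strictly after the last one shown: a larger sort value,
        // or the same sort value with a larger id.
        $where  = "WHERE ($sortColumn > ? OR ($sortColumn = ? AND id > ?)) ";
        $params = [$cursor['value'], $cursor['value'], $cursor['id']];
    }

    $sql = "SELECT * FROM items {$where}ORDER BY $sortColumn ASC, id ASC LIMIT $perPage";
    return [$sql, $params];
}

// First page has no cursor; later pages pass the last row of the previous page.
[$sql, $params] = buildKeysetQuery('name', ['value' => 'Miller', 'id' => 412], 20);
echo $sql, "\n";
```

The cursor values can travel in the URL next to the other search parameters, so no extra "XYZ_id" columns are needed. For this to stay fast, each sortable column would presumably want an index of the form INDEX(column, id).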

Show relationship using two table JOIN, or use PHP functions?

I'm making a micro-blogging website. The users can follow each other. I have to build a stream of posts (activity stream) for the current user ($userid) based on the users the current user is following, like on Twitter. I know two ways of implementing this. Which one is better?
Tables:
Table: posts
Columns: PostID, AuthorID, TimeStamp, Content
Table: follow
Columns: poster, follower
The first way, by joining these two tables:
select `posts`.* from `posts`,`follow` where `follow`.`follower`='$userid' and
`posts`.`AuthorID`=`follow`.`poster` order by `posts`.`postid` desc
The second way is by making an array of users the $userid is following (posters), then doing php implode on this array, and then doing where in:
One thing I should mention here is that I'm storing the number of users a user is following in the `following` field of the `users` table, so I'll use this number as a limit when extracting the list of posters (the 'followingList'):
function followingList($userid){
    $listArray=array();
    $limit="select `following` from `users` where `userid`='$userid' limit 1";
    $limit=mysql_query($limit);
    $limit=mysql_fetch_row($limit);
    $limit= (int) $limit[0];
    $sql="select `poster` from `follow` where `follower`='$userid' limit $limit";
    $result=mysql_query($sql);
    while($data = mysql_fetch_row($result)){
        $listArray[] = $data[0];
    }
    $posters=implode("','",$listArray);
    return $posters;
}
Now I have a comma-separated list of user IDs the current $userid is following. Next, I select the posts to make the activity stream:
$posters=followingList($userid);
$sql = "select * from `posts` where (`AuthorID` in ('$posters'))
order by `postid` desc";
Which of the two methods is better?
And can knowing the total number of following (number of users the current user is following), make things faster in the first method as it's doing in the second method?
Any other better method?
You should go all the way with the first option. Always try as much as possible to process the data on the mysql server instead of in your PHP code. PHP will not implicitly cache the results of the operations while MySQL will do it.
The most important thing is to make sure you index your data correctly. Try using "EXPLAIN" statements to make sure you have optimized your database as much as possible and use #1 to link your data together.
http://dev.mysql.com/doc/refman/5.0/en/explain.html
This will allow you later to compute statistics also, while the second method requires you to process a part of the statistics.
The first important point is that PHP is good at building pages but very bad at managing data: everything PHP manipulates fills memory, and PHP has no special mechanism to keep it from using too much memory, other than crashing.
On the other side, the database's job is to analyse the relations between the tables and the real numbers involved in the query (cardinality of indexes and statistics on rows and index usage, in fact), and the engine can choose among many different mechanisms depending on the size of the data (merge joins, temporary tables, etc.). That means you could have 256.278.242 posts and 145.268 users with 5.684 followers on average, and the database's job would be to find the fastest way to give you an answer. Well, when you hit really big numbers you'll see that not all databases are equal, but that's another problem.
On the PHP side, retrieving the list of users from the first query could become very slow with a large number of followed users, let's say 15.000. Simply building the query string with 15.000 identifiers inside would take quite a large amount of memory. Transferring this new query to the SQL server would also be slow. It's definitely the wrong way.
Now be careful with the way you build your SQL request. A request is something you should be able to read from top to bottom, explaining what you really want. This will help a (good) SQL engine choose the right solution.
select `posts`.*
from `posts`
INNER JOIN `follow` ON `posts`.`AuthorID`=`follow`.`poster`
where `follow`.`follower`='#userid'
order by `posts`.`postid` desc
LIMIT 15
Several remarks:
I have used an INNER JOIN. I want an INNER JOIN, so let's write it: it will be easier for me to read later, and it should make no difference to the query analyser.
If #userid is an int, do not use quotes. Please use ints for identifiers (they are much faster than strings). On the PHP side, cast to int, as in "SELECT ..." . (int) $user_id . " ORDER ...", or use a query with parameters (this is for security).
I have used a LIMIT 15; an offset could be used as well if you want to show some pagination control around the posts. Let's say this query would retrieve 15.263 documents from my 5.642 followed users: you do not want, and the user does not want, to show these 15.263 documents on a web page. Knowing from $limit that the number is 15.263 is a good thing, but certainly not as a request limit. You know this number, but the database may know it as well if it has a good query analyser and some good internal statistics.
The request limit has several goals:
1. Limit the size of data transferred from the database to your PHP script
2. Limit the memory usage of your PHP script (an array with 15.263 documents containing some HTML stuff... ouch)
3. Limit the size of the final user output (and get a faster response)
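To make the remarks about casting, parameters, and LIMIT concrete, here is a minimal sketch of how the JOIN query could be prepared on the PHP side. The function name and the cap of 100 rows per page are assumptions for the example; the PDO calls in the comment presume an already opened connection in `$pdo`.

```php
<?php
// Hypothetical helper: followedPostsQuery and the page-size cap are
// illustrative choices, not part of the original answer.
function followedPostsQuery($userId, int $limit = 15, int $offset = 0): array
{
    $limit  = max(1, min($limit, 100)); // never ask for more than 100 rows
    $offset = max(0, $offset);

    $sql = 'SELECT `posts`.* FROM `posts` '
         . 'INNER JOIN `follow` ON `posts`.`AuthorID` = `follow`.`poster` '
         . 'WHERE `follow`.`follower` = ? '
         . 'ORDER BY `posts`.`PostID` DESC '
         . "LIMIT $limit OFFSET $offset"; // integers only, safe to embed

    // The user id is bound as a parameter and cast to int.
    return [$sql, [(int) $userId]];
}

[$sql, $params] = followedPostsQuery('42', 15);
// With an existing PDO connection in $pdo, this could be run as:
//   $stmt = $pdo->prepare($sql);
//   $stmt->execute($params);
//   $posts = $stmt->fetchAll(PDO::FETCH_ASSOC);
echo $sql, "\n";
```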

Creating new pages dynamically based on amount of database entries? (and sorting issue)

I didn't quite know how to thoroughly explain my question in the title, but here's my idea:
My website takes user submissions and writes them to the database (they're displayed on a new page). They're ordered by time of submission (oldest first). I'd like to reverse the order they're displayed in (so it would display newest submissions at the top), and only display 20-30 per page, creating a new page every time it exceeded that number. I'm new to php, and completely unsure on what to do here. Thanks for any help.(:
Create an SQL Query, no need for PHP code.
Off the top of my head:
SELECT * FROM submissions
ORDER BY subID DESC
LIMIT 30
Use mysql_query() and manipulate the LIMIT with PHP
e.g.
$limit = 25;
$query = mysql_query("SELECT * FROM submissions ORDER BY subID DESC LIMIT $limit");
The term for this is pagination, and as the others have said, it is something that has to be handled by your query (as the issue isn't just a matter of the UI, but also the number of records and their size returned from the database).
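As a small sketch of the arithmetic behind "a new page every time it exceeds that number": given the total row count (which could come from a SELECT COUNT(*) on the table), the number of pages and the offset for the current page follow directly. The function name and the array keys are assumptions for the example; the query shows newest submissions first, as the question asks.

```php
<?php
// Hypothetical helper: paginate() and its return keys are illustrative.
function paginate(int $totalRows, int $page, int $perPage = 25): array
{
    $perPage    = max(1, $perPage);
    $totalPages = max(1, (int) ceil($totalRows / $perPage));
    $page       = min(max(1, $page), $totalPages); // clamp to a valid page

    return [
        'page'       => $page,
        'totalPages' => $totalPages,
        'limit'      => $perPage,
        'offset'     => ($page - 1) * $perPage,
    ];
}

$p   = paginate(103, 3); // 103 submissions, viewing page 3
$sql = sprintf(
    'SELECT * FROM submissions ORDER BY subID DESC LIMIT %d OFFSET %d',
    $p['limit'],
    $p['offset']
);
echo $sql, "\n";
```

The `totalPages` value is what the page links ("1 2 3 ...") would be generated from.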
jQuery.DataTables will do all this for you.

How to efficiently paginate large datasets with PHP and MySQL?

As some of you may know, use of the LIMIT keyword in MySQL does not preclude it from reading the preceding records.
For example:
SELECT * FROM my_table LIMIT 10000, 20;
Means that MySQL will still read the first 10,000 records and throw them away before producing the 20 we are after.
So, when paginating a large dataset, high page numbers mean long load times.
Does anyone know of any existing pagination class/technique/methodology that can paginate large datasets in a more efficient way i.e. that does not rely on the LIMIT MySQL keyword?
In PHP if possible as that is the weapon of choice at my company.
Cheers.
First of all, if you want to paginate, you absolutely have to have an ORDER BY clause. Then you simply have to use that clause to dig deeper in your data set. For example, consider this:
SELECT * FROM my_table ORDER BY id LIMIT 20
You'll have the first 20 records, let's say their id's are: 5,8,9,...,55,64. Your pagination link to page 2 will look like "list.php?page=2&id=64" and your query will be
SELECT * FROM my_table WHERE id > 64 ORDER BY id LIMIT 20
No offset, only 20 records read. It doesn't allow you to jump arbitrarily to any page, but most of the time people just browse the next/prev page. An index on "id" will improve the performance, even with big OFFSET values.
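A minimal sketch of how the "next" link could be derived from the rows just fetched is shown here. The function name is an assumption; `list.php` and the `id` parameter follow the example above, with the page counter left out for brevity.

```php
<?php
// Hypothetical helper: nextPageUrl is an illustrative name. $rows is the
// result of the current page's query, ordered by id ascending.
function nextPageUrl(array $rows, int $perPage, string $baseUrl = 'list.php'): ?string
{
    // Fewer rows than requested means this was the last page.
    if (count($rows) < $perPage) {
        return null;
    }
    $last = end($rows); // last row shown on this page
    return $baseUrl . '?' . http_build_query(['id' => (int) $last['id']]);
}

$rows = [['id' => 5], ['id' => 8], ['id' => 64]];
echo nextPageUrl($rows, 3), "\n";
```

If the final page happens to contain exactly `$perPage` rows, this sketch still emits a link that leads to an empty page; fetching one extra row per query is one way to avoid that.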
A solution might be to not use the limit clause, and use a join instead -- joining on a table used as some kind of sequence.
For more information, I found this question / answer on SO, which gives an example that might help you ;-)
There are basically 3 approaches to this, each of which has its own trade-offs:
1. Send all 10000 records to the client, and handle pagination client-side via JavaScript or the like. Obvious benefit is that only a single query is necessary for all of the records; obvious downside is that if the record size is in any way significant, the size of the page sent to the browser will be of proportionate size - and the user might not actually care about the full record set.
2. Do what you're currently doing, namely SQL LIMIT and grab only the records you need with each request, completely stateless. Benefit in that it only sends the records for the page currently requested, so requests are small, downsides in that a) it requires a server request for each page, and b) it's slower as the number of records/pages increases for later pages in the result, as you mentioned. Using a JOIN or a WHERE clause on a monotonically increasing id field can sometimes help in this regard, specifically if you're requesting results from a static table as opposed to a dynamic query.
3. Maintain some sort of state object on the server which caches the query results and can be referenced in future requests for a limited period of time. Upside is that it has the best query speed, since the actual query only needs to run once; downside is having to manage/store/cleanup those state objects (especially nasty for high-traffic websites).
SELECT * FROM my_table LIMIT 10000, 20;
means show 20 records starting from record #10000 in the result set. If you're using primary keys in the WHERE clause, there will not be a heavy load on MySQL.
Any other method of pagination, such as a join, will put a really huge load on the server.
I'm not aware of the performance decrease that you've mentioned, and I don't know of any other solution for pagination; however, an ORDER BY clause might help you reduce the load time.
The best way is to define an index field in my_table and increment it for every newly inserted row. Then you can use WHERE YOUR_INDEX_FIELD BETWEEN 10001 AND 10020.
It will be much faster.
Some other options:
Partition the tables per page, so that the limit is not needed
Store the results into a session (a good idea would be to create a hash of that data using md5, then using that cache the session per multiple users)

What is the best way to paginate results in php

I need to display many pages of news in a site. Should I do the pagination in the database query using LIMIT or with the PHP script after getting all the results?
Use limit in SQL! Every time!
Otherwise you're throwing around considerably more data than you need to, which makes your scripts unnecessarily slow, and will lead to scalability problems as the amount of data in your tables increases.
Limit is your friend!
Use limit - you don't want to transfer masses of data from the database to the scripting engine if you can avoid it.
If you only want to work with a DBMS that supports this, then do it in the DBMS. If you want to support other DBMSs in the future, then add a layer in between that can handle it depending on the current DBMS.
You can use some existing libraries to help you:
Pear::Pager can help with the output, and to limit the database traffic to only what you need, you can use a wrapper provided in the examples that come with it.
Here's a tutorial I just googled that has it all...
In addition to using LIMIT, I'd suggest using an explicit WHERE clause to set the offset, and order the results on that column. For example:
-- First page (showing first 50 records)
SELECT * FROM people ORDER BY id LIMIT 50
-- Second page
SELECT * FROM people WHERE id > 50 ORDER BY id LIMIT 50
This further limits the numbers of rows returned to those within the desired range. Using the WHERE approach (as opposed to a LIMIT clause with a separate offset, e.g. LIMIT 50,50) allows you to deal effectively with paging through records with other natural keys, e.g. alphabetically by name, or by date order.
Personally, I would use the query to do it. Obviously, that can change if you're dealing with AJAX and such, but just doing a basic limit in the query and outputting the results is simple and efficient.
