I have a simple query using server-side pagination. The issue is the WHERE Clause makes a call to an expensive function and the functions argument is the user input, eg. what the user is searching for.
SELECT
*
FROM
( SELECT /*+ FIRST_ROWS(numberOfRows) */
query.*,
ROWNUM rn FROM
(SELECT
myColumns
FROM
myTable
WHERE expensiveFunction(:userInput)=1
ORDER BY id ASC
) query
)
WHERE rn >= :startIndex
AND ROWNUM <= :numberOfRows
This works and is quick assuming numberOfRows is small. However I would also like to have the total row count of the query. Depending on the user input and database size the query can take up to minutes. My current approach is to cache this value but that still means the user needs to wait minutes to see first result.
The results should be displayed in the Jquery datatables plugin which greatly helps with things like serer-side paging. It however requires the server to return a value for the total records to correctly display paging controls.
What would be the best approach? (Note: PHP)
I thought if returning first page immediately with a fake (better would be estimated) row count. After the page is loaded do an ajax call to a method that determines total row count of the query (what happens if the user pages during that time?) and then update the faked/estimated total row count.
However I have no clue how to do an estimate. I tried count(*) * 1000 with SAMPLE (0.1) but for whatever reason that actually takes longer than the full count query. Also just returning a fake/random value seems a bit hacky too. It would need to be bigger than 1 page size so that the "Next" button is enabled.
Other ideas?
One way to do it is as I said in the comments, to use a 'countless' approach. Modify the client side script in such a way that the Next button is always enabled and fetch the rows until there are none, then disable the Next button. You can always add a notification message to say that there are no more rows so it will be more user friendly.
Considering that you are expecting a significant amount of records, I doubt that the user will paginate through all the results.
Another way is to schedule a cron job that will do the counting of the records in the background and store that result in a table called totals. The running intervals of the job should be set up based on the frequency of the inserts / deletetions.
Then in the frontend, just use the count previously stored in totals. It should make a decent aproximation of the amount.
Depends on your DB engine.
In mysql, solution looks like this :
mysql> SELECT SQL_CALC_FOUND_ROWS * FROM tbl_name
-> WHERE id > 100 LIMIT 10;
mysql> SELECT FOUND_ROWS();
Basically, you add another attribute on your select (SQL_CALC_FOUND_ROWS) which tells mysql to count the rows as if limit clause was not present, while executing the query, while FOUND_ROWS actually retrieves that number.
For oracle, see this article :
How can I perform this query in oracle
Other DBMS might have something similar, but I don't know.
Related
This is a problem with a ordering search results on my website,
When a search is made, random results appear on the content page, this page includes pagination too. I user following as my SQL query.
SELECT * FROM table ORDER BY RAND() LIMIT 0,10;
so my questions are
I need to make sure that everytime user visits the next page, results they already seen not to appear again (exclude them in the next query, in a memory efficient way but still order by rand() )
everytime the visitor goes to the 1st page there is a different sets of results, Is it possible to use pagination with this, or will the ordering always be random.
I can use seed in the MYSQL, however i am not sure how to use that practically ..
Use RAND(SEED). Quoting docs: "If a constant integer argument N is specified, it is used as the seed value." (http://dev.mysql.com/doc/refman/5.0/en/mathematical-functions.html#function_rand).
In the example above the result order is rand, but it is always the same. You can just change the seed to get a new order.
SELECT * FROM your_table ORDER BY RAND(351);
You can change the seed every time the user hits the first results page and store it in the user session.
Random ordering in MySQL is as sticky a problem as they come. In the past, I've usually chosen to go around the problem whenever possible. Typically, a user won't ever come back to a set of pages like this more than once or twice. So this gives you the opportunity to avoid all of the various disgusting implementations of random order in favor of a couple simple, but not quite 100% random solutions.
Solution 1
Pick from a number of existing columns that already indexed for being sorted on. This can include created on, modified timestamps, or any other column you may sort by. When a user first comes to the site, have these handy in an array, pick one at random, and then randomly pick ASC or DESC.
In your case, every time a user comes back to page 1, pick something new, store it in session. Every subsequent page, you can use that sort to generate a consistent set of paging.
Solution 2
You could have an additional column that stores a random number for sorting. It should be indexed, obviously. Periodically, run the following query;
UPDATE table SET rand_col = RAND();
This may not work for your specs, as you seem to require every user to see something different every time they hit page 1.
First you should stop using the ORDER BY RAND syntax. This will bad for performance in large set of rows.
You need to manually determine the LIMIT constraints. If you still want to use the random results and you don't want users to see the same results on next page the only way is to save all the result for this search session in database and manipulate this information when user navigate to next page.
The next thing in web design you should understand - using any random data blocks on your site is very, very, very bad for users visual perception.
You have several problems to deal with! I recommend that you go step by step.
First issue: results they already seen not to appear again
Every item returned, store it in one array. (assuming the index id on the example)
When the user goes to the next page, pass to the query the NOT IN:
MySQL Query
SELECT * FROM table WHERE id NOT IN (1, 14, 25, 645) ORDER BY RAND() LIMIT 0,10;
What this does is to match all id that are not 1, 14, 25 or 645.
As far as the performance issue goes: in a memory efficient way
SELECT RAND( )
FROM table
WHERE id NOT
IN ( 1, 14, 25, 645 )
LIMIT 0 , 10
Showing rows 0 - 9 (10 total, Query took 0.0004 sec)
AND
SELECT *
FROM table
WHERE id NOT
IN ( 1, 14, 25, 645 )
ORDER BY RAND( )
LIMIT 0 , 10
Showing rows 0 - 9 (10 total, Query took 0.0609 sec)
So, don't use ORDER BY RAND(), preferably use SELECT RAND().
I would have your PHP generate your random record numbers or rows to retrieve, pass those to your query, and save a cookie on the user's client indicating what records they've already seen.
There's no reason for that user specific data to live on the server (unless you're tracking it, but it's random anyway so who cares).
The combination of
random ordering
pagination
HTTP (stateless)
is as ugly as it comes: 1. and 2. together need some sort of "persistent randomness", while 3. makes this harder to achieve. On top of this 1. is not a job a RDBMS is optimized to do.
My suggestion depends on how big your dataset is:
Few rows (ca. <1K):
select all PK values in first query (first page)
shuffle these in PHP
store shuffled list in session
for each page call select the data according to the stored PKs
Many rows (10K+):
This assumes, you have an AUTO_INCREMENT unique key called ID with a manageable number of holes. Use a amintenace script if needed (high delete ratio)
Use a shuffling function that is parameterized with e.g. the session ID to create a function rand_id(continuous_id)
If you need e.g. the records 100,000 to 100,009 calculate $a=array(rand_id(100,000), rand_id(100,001), ... rand_id(100,009));
$a=implode(',',$a);
$sql="SELECT foo FROM bar WHERE ID IN($a) ORDER BY FIELD(ID,$a)";
To take care of the holes in your ID select a few records too many (and throw away the exess), looping on too few records selected.
can someone please explain to me how to approach this issue?
I have a table with 4000 records, a select option and a search field to filter out the table
however I'm trying to figure out the best way to set up pagination.
1st Approach:
Should I make one query against the database and populate the table with all records and then set up the pagination with javascript so I can be able to search for records that are not only shown on the current page?
2nd Approach
Set up the pagination with PHP, make a query via ajax for each page? however I'm afraid that this approach would not allow me to filter out the table if I need to search for a record that is not on the current page.
1st approach: definitely not, loading large number of results for nothing is not recommended.
2nd approach makes more sense. You can use the LIMIT statement inside your SELECT statement to decide which records you want. Ex. if you paginate on 10 elements perpage, you could do
SELECT * FROM Table LIMIT 10
then
SELECT * FROM Table LIMIT 10,10
and so on. Indexes start at 0.
Note that you don't even need Ajax to do that, your next button can specify the offset for the next load of the page. It's a choice based on your knowledge, the size of the rest of the page (minimize load time), ...
I wrote a small script which uses the concept of long polling.
It works as follows:
jQuery sends the request with some parameters (say lastId) to php
PHP gets the latest id from database and compares with the lastId.
If the lastId is smaller than the newly fetched Id, then it kills the
script and echoes the new records.
From jQuery, i display this output.
I have taken care of all security checks. The problem is when a record is deleted or updated, there is no way to know this.
The nearest solution i can get is to count the number of rows and match it with some saved row count variable. But then, if i have 1000 records, i have to echo out all the 1000 records which can be a big performance issue.
The CRUD functionality of this application is completely separated and runs in a different server. So i dont get to know which record was deleted.
I don't need any help coding wise, but i am looking for some suggestion to make this work while updating and deleting.
Please note, websockets(my fav) and node.js is not an option for me.
Instead of using a certain ID from your table, you could also check when the table itself was modified the last time.
SQL:
SELECT UPDATE_TIME
FROM information_schema.tables
WHERE TABLE_SCHEMA = 'yourdb'
AND TABLE_NAME = 'yourtable';
If successful, the statement should return something like
UPDATE_TIME
2014-04-02 11:12:15
Then use the resulting timestamp instead of the lastid. I am using a very similar technique to display and auto-refresh logs, works like a charm.
You have to adjust the statement to your needs, and replace yourdb and yourtable with the values needed for your application. It also requires you to have access to information_schema.tables, so check if this is available, too.
Two alternative solutions:
If the solution described above is too imprecise for your purpose (it might lead to issues when the table is changed multiple times per second), you might combine that timestamp with your current mechanism with lastid to cover new inserts.
Another way would be to implement a table, in which the current state is logged. This is where your ajax requests check the current state. Then generade triggers in your data tables, which update this table.
You can get the highest ID by
SELECT id FROM table ORDER BY id DESC LIMIT 1
but this is not reliable in my opinion, because you can have ID's of 1, 2, 3, 7 and you insert a new row having the ID 5.
Keep in mind: the highest ID, is not necessarily the most recent row.
The current auto increment value can be obtained by
SELECT AUTO_INCREMENT FROM information_schema.tables
WHERE TABLE_SCHEMA = 'yourdb'
AND TABLE_NAME = 'yourtable';
Maybe a timestamp + microtime is an option for you?
I am making a simple message board in PHP with a MySQL database. I have limited messages to 20 a page with the 'LIMIT' operation.
An example of my URL is: http://www.example.com/?page=1
If page is not specified, it defaults to 1. As I mentioned earlier, there is a limit of 20 per page, however if that is out of a possible 30 and I wish to view page 2, I only end up with 10 results. In this case, the LIMIT part of my query resembles LIMIT 20,40 - How can I ensure in this case that 20 are returned?
I would prefer to try and keep this as much on the MySQL side as possible.
EDIT:
To clarify, if I am on page 2, I will be fetching rows 20-30, however this is only 10 rows, so I wish to select 10-30 instead.
EDIT:
I am currently using the following query:
My query:
SELECT MOD(COUNT(`ID`),20) AS lmt WHERE `threadID`=2;
SELECT * FROM `msg_messages` WHERE `threadID`=2 LIMIT 20-(20-lmt) , 40-(20-lmt) ;
There are 30 records this matches.
I'm not sure to really understand the question, but if I do I think that the best practive would be to prevent users to go to a page with no results. To do so, you can easily check how many rows you have in total even if you are using the "LIMIT" clause using SQL_CALC_FOUND_ROWS.
For example you could do:
Select SQL_CALC_FOUND_ROWS * from blog_posts where balblabla...
Then you have to run another query like this:
Select FOUND_ROWS() as posts_count
It will return the total rows very quickly. With this result and knowing the current page, you can decide if you display next/prev links to the user.
You could do a:
SELECT COUNT(*)/20 AS pages FROM tbl;
..to get the max number of pages, then work out if you're going to be left with a partial set and adjust your paging query accordingly.
This is a problem with a ordering search results on my website,
When a search is made, random results appear on the content page, this page includes pagination too. I user following as my SQL query.
SELECT * FROM table ORDER BY RAND() LIMIT 0,10;
so my questions are
I need to make sure that everytime user visits the next page, results they already seen not to appear again (exclude them in the next query, in a memory efficient way but still order by rand() )
everytime the visitor goes to the 1st page there is a different sets of results, Is it possible to use pagination with this, or will the ordering always be random.
I can use seed in the MYSQL, however i am not sure how to use that practically ..
Use RAND(SEED). Quoting docs: "If a constant integer argument N is specified, it is used as the seed value." (http://dev.mysql.com/doc/refman/5.0/en/mathematical-functions.html#function_rand).
In the example above the result order is rand, but it is always the same. You can just change the seed to get a new order.
SELECT * FROM your_table ORDER BY RAND(351);
You can change the seed every time the user hits the first results page and store it in the user session.
Random ordering in MySQL is as sticky a problem as they come. In the past, I've usually chosen to go around the problem whenever possible. Typically, a user won't ever come back to a set of pages like this more than once or twice. So this gives you the opportunity to avoid all of the various disgusting implementations of random order in favor of a couple simple, but not quite 100% random solutions.
Solution 1
Pick from a number of existing columns that already indexed for being sorted on. This can include created on, modified timestamps, or any other column you may sort by. When a user first comes to the site, have these handy in an array, pick one at random, and then randomly pick ASC or DESC.
In your case, every time a user comes back to page 1, pick something new, store it in session. Every subsequent page, you can use that sort to generate a consistent set of paging.
Solution 2
You could have an additional column that stores a random number for sorting. It should be indexed, obviously. Periodically, run the following query;
UPDATE table SET rand_col = RAND();
This may not work for your specs, as you seem to require every user to see something different every time they hit page 1.
First you should stop using the ORDER BY RAND syntax. This will bad for performance in large set of rows.
You need to manually determine the LIMIT constraints. If you still want to use the random results and you don't want users to see the same results on next page the only way is to save all the result for this search session in database and manipulate this information when user navigate to next page.
The next thing in web design you should understand - using any random data blocks on your site is very, very, very bad for users visual perception.
You have several problems to deal with! I recommend that you go step by step.
First issue: results they already seen not to appear again
Every item returned, store it in one array. (assuming the index id on the example)
When the user goes to the next page, pass to the query the NOT IN:
MySQL Query
SELECT * FROM table WHERE id NOT IN (1, 14, 25, 645) ORDER BY RAND() LIMIT 0,10;
What this does is to match all id that are not 1, 14, 25 or 645.
As far as the performance issue goes: in a memory efficient way
SELECT RAND( )
FROM table
WHERE id NOT
IN ( 1, 14, 25, 645 )
LIMIT 0 , 10
Showing rows 0 - 9 (10 total, Query took 0.0004 sec)
AND
SELECT *
FROM table
WHERE id NOT
IN ( 1, 14, 25, 645 )
ORDER BY RAND( )
LIMIT 0 , 10
Showing rows 0 - 9 (10 total, Query took 0.0609 sec)
So, don't use ORDER BY RAND(), preferably use SELECT RAND().
I would have your PHP generate your random record numbers or rows to retrieve, pass those to your query, and save a cookie on the user's client indicating what records they've already seen.
There's no reason for that user specific data to live on the server (unless you're tracking it, but it's random anyway so who cares).
The combination of
random ordering
pagination
HTTP (stateless)
is as ugly as it comes: 1. and 2. together need some sort of "persistent randomness", while 3. makes this harder to achieve. On top of this 1. is not a job a RDBMS is optimized to do.
My suggestion depends on how big your dataset is:
Few rows (ca. <1K):
select all PK values in first query (first page)
shuffle these in PHP
store shuffled list in session
for each page call select the data according to the stored PKs
Many rows (10K+):
This assumes, you have an AUTO_INCREMENT unique key called ID with a manageable number of holes. Use a amintenace script if needed (high delete ratio)
Use a shuffling function that is parameterized with e.g. the session ID to create a function rand_id(continuous_id)
If you need e.g. the records 100,000 to 100,009 calculate $a=array(rand_id(100,000), rand_id(100,001), ... rand_id(100,009));
$a=implode(',',$a);
$sql="SELECT foo FROM bar WHERE ID IN($a) ORDER BY FIELD(ID,$a)";
To take care of the holes in your ID select a few records too many (and throw away the exess), looping on too few records selected.