I'm working on an application with an infinite-scroll feed that loads roughly 20 results at a time as you scroll. The feed consists of constantly added user-generated content, which means the result set can change between one query offset by X and the next.
So let's say we load 20 results, scroll, load another 20, and then, before scrolling triggers the next 20, another user uploads a new piece of content that matches the query conditions. Because we're using OFFSET to fetch additional results, the whole result set shifts by one, and the next page of 20 effectively contains a duplicate of a row the feed has already shown.
What is the best and most efficient way around this? We've dabbled with using the id of a row in a WHERE condition to prevent duplicate results, using only LIMIT without OFFSET for each new fetch: WHERE id < 170 LIMIT 20, then WHERE id < 150 LIMIT 20, then WHERE id < 130 LIMIT 20, and so on, to control and prevent duplicates... HOWEVER, this does not work in every possible scenario, because our result sets aren't always ordered by the id column descending.
Soo.. what other options are there?..
Why are you using the where clause instead of limit with the offset option? Limit can take two arguments. The offset argument seems to do exactly what you want. For instance:
limit 100, 20
Takes 20 rows starting at the 101st row. Then:
limit 120, 20
Takes 20 rows starting at the 121st row. (The offsets start at 0 rather than 1 in MySQL counting.)
The one enhancement you need to make is to ensure that the sort order for the records is deterministic, meaning no two rows can tie on the sort key. To make this happen, just add the id column as the last column in the sort. It is unique, so even if rows share the same values in the other sort columns, the id breaks the tie and keeps the ordering stable.
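A minimal sketch of that, assuming a hypothetical posts table with created_at and id columns and a PDO connection (both are assumptions, adjust the names to your schema):

<?php
// A minimal sketch, assuming a hypothetical `posts` table with `created_at`
// and `id` columns and an existing PDO connection in $pdo.
$perPage = 20;
$page    = 2;                         // zero-based page number
$offset  = (int) ($page * $perPage);

// The unique id as the last sort column breaks any ties on created_at,
// so the same offset always starts at the same row (as long as nothing
// new has been inserted in the meantime).
$sql  = "SELECT * FROM posts
         ORDER BY created_at DESC, id DESC
         LIMIT $offset, $perPage";
$rows = $pdo->query($sql)->fetchAll(PDO::FETCH_ASSOC);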
You might want to try a database solution. On the initial request, create and populate a table. Use that table for the feed.
Make sure the table name starts with something consistent, like TableToFeedMyApp, and ends with something guaranteed to make it unique, such as a session id or timestamp. Then set up a scheduled job to find and drop all of these tables that were created earlier than whatever interval you deem safe.
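A rough sketch of the snapshot idea; the feed_items table, its created_at column, and the session handling are assumptions for illustration:

<?php
// A rough sketch, assuming a hypothetical `feed_items` source table with a
// `created_at` column, an active session, and an existing PDO connection in $pdo.
$snapshot = 'TableToFeedMyApp_' . uniqid();   // consistent prefix + unique suffix

// On the first request, freeze the current result set into its own table.
$pdo->exec("CREATE TABLE `$snapshot` AS
            SELECT * FROM feed_items ORDER BY created_at DESC");
$_SESSION['feed_table'] = $snapshot;

// On every request, page through the snapshot with a plain offset; new
// uploads land in feed_items, not in the snapshot, so pages never shift.
$perPage = 20;
$page    = 0;                                  // zero-based page number
$offset  = (int) ($page * $perPage);
$table   = $_SESSION['feed_table'];
$rows = $pdo->query("SELECT * FROM `$table`
                     ORDER BY created_at DESC
                     LIMIT $offset, $perPage")->fetchAll(PDO::FETCH_ASSOC);

// A scheduled job can later DROP TABLE any TableToFeedMyApp_% table that is
// older than whatever interval you consider safe.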
Related
I'm trying to figure out the fastest way to get a certain number of rows from a table, offset by some number of rows and ordered by a date column.
The problem I have is I'm paginating the rows from the query into pages of 10 rows per page, but I only need the nth page.
For example if I only need page 4 from the table, I need to select all the rows:
SELECT * FROM posts ORDER BY date
Then I need to paginate the array using PHP and get the 4th page (if it exists). This is less than ideal as it seems a waste to have to get the whole table.
Is there a better way to query the table in this situation? For example, if I have 10 posts per page and I want the 4th page, is there a way to offset the query so it skips the first 30 rows (and is still ordered by date)?
You're looking for LIMIT
SELECT * FROM posts ORDER BY date LIMIT 10, 20
where 10 is the offset and 20 is the number of rows. For your example (10 posts per page, page 4) that would be LIMIT 30, 10.
LIMIT is the answer. According to the MySQL documentation:
The LIMIT clause can be used to constrain the number of rows returned
by the SELECT statement. LIMIT takes one or two numeric arguments,
which must both be nonnegative integer constants (except when using
prepared statements).
With two arguments, the first argument specifies the offset of the
first row to return, and the second specifies the maximum number of
rows to return. The offset of the initial row is 0 (not 1):
SELECT * FROM tbl LIMIT 5,10; # Retrieve rows 6-15
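A minimal sketch of how that maps to the question above (page number in, 10 rows out), assuming a posts table with a date column and a PDO connection:

<?php
// A minimal sketch, assuming a `posts` table with a `date` column and an
// existing PDO connection in $pdo.
$perPage = 10;
$page    = 4;                                  // 1-based page number from the request
$offset  = (int) (($page - 1) * $perPage);     // page 4 -> skip the first 30 rows

$rows = $pdo->query("SELECT * FROM posts ORDER BY date LIMIT $offset, $perPage")
            ->fetchAll(PDO::FETCH_ASSOC);
// $rows holds only the 10 posts for page 4; there is no need to pull the
// whole table and paginate the array in PHP.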
I have just started working with Sphinx and PHP. I was just wondering: say I set the limit to 20 records per call,
$cl->SetLimits ( 0, 20);
and the index is rebuilt, say, every 5 minutes with the --rotate option.
So when my application has to fetch the next 20 search results, I call
$cl->SetLimits ( 20, 20);
Suppose the index is recreated between the two SetLimits calls, and a new document is inserted with, say, the highest weight (and I am sorting results by relevance).
Wouldn't the results shift down by one position, so that the earlier 20th record is now the 21st record, meaning I get the same result at position 21 that I already got at position 20, and my application displays a duplicate search result? Is this true? Has anybody else run into this problem?
Or how should I overcome this?
Thanks!
Edit (note: the next SetLimits call is made in response to a user event, e.g. clicking 'See more results').
Yes, that can happen.
But it usually happens so rarely that nobody notices.
About the only way to avoid it would be to store some sort of marker along with the query. So as well as a page number, you include a last id. Then, on the second page and beyond, you use that id to exclude any results created since the search started.
On the first-page query you look up the biggest id in the index, which requires running a second query.
(This at least copes with new additions to the index; changes to existing documents are harder to handle, but can be dealt with in a similar way.)
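A rough sketch of that idea with the PHP Sphinx client; it assumes the index exposes the row id as an integer attribute named post_id (e.g. via sql_attr_uint in the source), that the index is called my_index, and that the value is remembered between requests in the session, all of which are assumptions about your setup:

<?php
// A rough sketch, not a drop-in solution. Assumes a `post_id` integer
// attribute in the index, sphinxapi.php on the include path, and an
// active session to remember the cut-off id between requests.
require 'sphinxapi.php';

$page  = 0;                 // increments on each 'See more results' click
$query = 'search terms';

$cl = new SphinxClient();
$cl->SetServer('localhost', 9312);

if ($page === 0) {
    // First page: run a cheap extra query to find the largest id currently
    // in the index and remember it for the follow-up pages.
    $cl->SetSortMode(SPH_SORT_ATTR_DESC, 'post_id');
    $cl->SetLimits(0, 1);
    $top   = $cl->Query('', 'my_index');
    $maxId = 0;
    if (!empty($top['matches'])) {
        $first = reset($top['matches']);
        $maxId = $first['attrs']['post_id'];
    }
    $_SESSION['feed_max_id'] = $maxId;
}

// Every page: only return documents that already existed when the search
// started, so a re-index with --rotate cannot shift the offsets.
$cl->SetSortMode(SPH_SORT_RELEVANCE);
$cl->SetFilterRange('post_id', 0, $_SESSION['feed_max_id']);
$cl->SetLimits($page * 20, 20);
$result = $cl->Query($query, 'my_index');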
SetLimits sets the offset on the result set server-side: http://php.net/manual/en/sphinxclient.setlimits.php.
So to answer your question: no. It will run the query with max_matches and save a result set, and from there you work with that result set and not the indexed data.
One question though, why are you indexing it every 5 minutes? It would be better just to re-index every time your data changes.
I'm implementing pagination with a WHERE clause.
I need to know the true behaviour of the offset.
Is the offset calculated based on all the rows that match the where clause? Or
Is the offset calculated like an id, with all rows considered? E.g. if you specify an offset of 5, will rows be returned starting from the 6th row in the table, even if some of the first rows don't match the WHERE clause?
Edit: I want to be sure since the second behaviour would be totally incorrect and cause problems.
Thanks for your answers. I can't comment as my browser fails at javascript and ajax horribly.
Yes, the offset is calculated based on all rows that match the WHERE clause. Just try it.
Are you talking about the LIMIT clause? LIMIT caps the number of matching rows returned, not the total rows examined. The offset portion of LIMIT is counted over the matching rows rather than over all eligible rows. MySQL will not necessarily scan rows in any given order, and may not scan some rows at all, so it won't shortchange you just because a non-matching row happens to have a lower index.
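A tiny way to convince yourself, assuming an existing PDO connection and permission to create a throwaway temporary table:

<?php
// A tiny sanity check, assuming an existing PDO connection in $pdo.
$pdo->exec("CREATE TEMPORARY TABLE demo (id INT PRIMARY KEY)");
$pdo->exec("INSERT INTO demo (id) VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)");

// Only even ids match the WHERE clause; the offset of 2 skips the first two
// MATCHING rows (2 and 4), so this returns 6 and 8 -- not rows 3 and 4 of
// the table.
$rows = $pdo->query("SELECT id FROM demo WHERE id % 2 = 0
                     ORDER BY id LIMIT 2, 2")->fetchAll(PDO::FETCH_COLUMN);
print_r($rows);   // 6, 8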
In PHP, how do I display 5 results out of a possible 50 at random, while ensuring all results are displayed an equal number of times?
For example, the table has 50 entries.
I wish to show 5 of these at random on every page load, but I also need to ensure that, rotating through them, all results are displayed an equal number of times.
I've spent hours googling for this but can't work it out - would very much like your help please.
Please scroll down to "biased randomness" if you don't want to read all of this.
In MySQL you can just use SELECT * FROM table ORDER BY RAND() LIMIT 5.
But what you want just does not work; it's logically contradictory.
You have to understand that complete randomness, by definition, only gives an equal distribution over an infinite period of time.
The longer the selection interval, the more even the distribution.
If you MUST have an even distribution of selections within, for example, every 24-hour interval, you cannot use a purely random algorithm; that is contradictory by definition.
It really depends on what your goal is.
You could, for example, pick an element at random and then lower the probability of that same element being chosen again on the next run. That way you have a heuristic that gives you a more even distribution after a shorter amount of time. But it's not random. Well, certain parts are.
You could also select randomly from your database, mark the chosen elements as selected, and then select only from those not yet selected. When no element is left, reset them all.
It's very trivial, but it might do the job.
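A minimal sketch of that mark-and-reset rotation, assuming a hypothetical entries table with an extra shown flag column (both names are assumptions) and a PDO connection:

<?php
// A minimal sketch, assuming a hypothetical `entries` table with a `shown`
// TINYINT flag (0 = not yet displayed in the current rotation) and an
// existing PDO connection in $pdo.

// If fewer than 5 unseen entries remain, start a new rotation.
$left = (int) $pdo->query("SELECT COUNT(*) FROM entries WHERE shown = 0")
                  ->fetchColumn();
if ($left < 5) {
    $pdo->exec("UPDATE entries SET shown = 0");
}

// Pick 5 random entries that haven't been shown in this rotation yet...
$rows = $pdo->query("SELECT * FROM entries WHERE shown = 0
                     ORDER BY RAND() LIMIT 5")->fetchAll(PDO::FETCH_ASSOC);

// ...and mark them as shown so the next page load picks different ones.
$ids = array_column($rows, 'id');
if ($ids) {
    $in = implode(',', array_map('intval', $ids));
    $pdo->exec("UPDATE entries SET shown = 1 WHERE id IN ($in)");
}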
You can also do something like that with timestamps to make the distribution a bit more elegant.
This could probably look like ORDER BY RAND() * ((timestamps - MIN(timestamps)) / (MAX(timestamps) - MIN(timestamps))) DESC, or something like that. Basically you normalize the timestamp at which each entry was selected over the time window, so it becomes a value between 0 and 1, and then multiply it by RAND(); that way the freshly shown stuff is less likely to be selected while you keep a good deal of randomness. I am not sure about the formula above, I just typed it down; it's probably wrong, but the principle works.
I think what you want is generally referred to as "biased randomness". There are a lot of papers on it and some articles on SO, for example here:
Biased random in SQL?
Copy the 50 results to some temporary place (a file, a database table, whatever you use). Then, every time you need random values, select 5 at random from what remains and delete them from your temporary data set.
Once your temporary data set is empty, create a new one by copying the original again.
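A quick session-based sketch of that, assuming a hypothetical entries table and that keeping the pool of ids in $_SESSION is acceptable for your app:

<?php
// A quick sketch, assuming a hypothetical `entries` table, an existing PDO
// connection in $pdo, and that session_start() has already been called.

// Refill the pool with all 50 ids (shuffled) whenever it runs dry.
if (empty($_SESSION['pool'])) {
    $ids = $pdo->query("SELECT id FROM entries")->fetchAll(PDO::FETCH_COLUMN);
    shuffle($ids);
    $_SESSION['pool'] = $ids;
}

// Take 5 ids off the pool; they won't come back until the pool is refilled,
// so every entry is shown once per rotation.
$pick = array_splice($_SESSION['pool'], 0, 5);
$in   = implode(',', array_map('intval', $pick));
$rows = $pdo->query("SELECT * FROM entries WHERE id IN ($in)")
            ->fetchAll(PDO::FETCH_ASSOC);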
I am trying to implement pagination in PHP, using MySQL as the back-end database.
There will be lots of records, but the user will see only 10 at a time.
To show the first page, I do:
SELECT * FROM USERS LIMIT 10
Now I am not able to write a query to get the next 10 and the subsequent 10 records. Please help me fetch these in-between records to support the pagination logic, and also provide any other suggestions you have for pagination.
You should use the OFFSET option.
SELECT * FROM Users LIMIT 10 OFFSET 10; (or 20, or 30, and so on)
That way you just pass the start position in the request when you hit next (or the page number) and you'll retrieve the records you want.
MySQL's limit feature can take two arguments:
select * from USERS limit 10,10
The above would retrieve 10 rows starting at offset 10, i.e. the 11th row. Bear in mind that the MySQL row offset is 0-based, not 1-based. The first argument is the starting offset, the second is the page size.
Also, if your page size is consistent, all you need to do is pass in the current page (defaulting to zero). That lets you compute the starting offset as page * size, as in the sketch below.
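A short sketch of that, assuming the USERS table from the question, a zero-based page request parameter, and a PDO connection:

<?php
// A short sketch, assuming the USERS table from the question, an existing
// PDO connection in $pdo, and a zero-based `page` request parameter.
$size   = 10;
$page   = isset($_GET['page']) ? max(0, (int) $_GET['page']) : 0;
$offset = $page * $size;

// page 0 -> LIMIT 0, 10   page 1 -> LIMIT 10, 10   page 2 -> LIMIT 20, 10 ...
$users = $pdo->query("SELECT * FROM USERS LIMIT $offset, $size")
             ->fetchAll(PDO::FETCH_ASSOC);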