Optimizing ORDER BY LIMIT queries in MySQL


In my web app I made an internal messaging system. I want to place a 'previous' and a 'next' link on each page where the user views a message.
In order to get the next and previous id I execute two queries:
For the previous one:
SELECT id FROM pages WHERE (id<$requestedPageId) ORDER BY id DESC LIMIT 1
And for the next one:
SELECT id FROM pages WHERE (id>$requestedPageId) ORDER BY id LIMIT 1
EXPLAIN says the query type is "range", and the rows column says it would examine all rows that have a smaller or bigger id than the page's id (a big number). The Extra column says "Using where".
It seems MySQL ignores the fact that I want only one row. Isn't MySQL smart enough to optimize this kind of query so that it finds the row for the page and then searches backward/forward for the first matching row?
Is there a better approach to get the next and previous page's id?
Additional notes:
This problem seems to exist in every ORDER BY ... LIMIT query (e.g. when I split a long list into multiple pages).
The WHERE clause is not this simple (I want to let users reach only the next/previous page they have permission to access; no joins, though).
All columns that appear in the WHERE clause are indexed (id is the primary key).
Variables are protected against injection.
EDIT1:
So the query I'm using currently:
SELECT id
FROM reports
WHERE (id<$requestedPageId) AND ((isPublic=1) OR (recipientId=$recipient))
ORDER BY id DESC
LIMIT 1
Or, when I refactor it as the answer suggested:
SELECT MAX(id)
FROM reports
WHERE (id<$requestedPageId) AND ((isPublic=1) OR (recipientId=$recipient))

For the previous:
SELECT MAX(id) FROM pages WHERE id<$requestedPageId
And for the next:
SELECT MIN(id) FROM pages WHERE id>$requestedPageId
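To check that the MAX/MIN form returns the same neighbours as the ORDER BY ... LIMIT 1 form from the question, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (both queries are plain SQL that MySQL also accepts). The sample ids are hypothetical, and the page id is passed as a bound parameter instead of being interpolated into the string. One practical difference: MAX/MIN always returns one row, with NULL (None in Python) when nothing matches, while ORDER BY ... LIMIT 1 returns no row at all.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY)")
# Hypothetical sample ids with gaps, as happens after deletions.
conn.executemany("INSERT INTO pages (id) VALUES (?)", [(i,) for i in (1, 2, 5, 9, 14)])


def neighbours(page_id):
    """(prev_id, next_id) via MAX/MIN; either is None at the edges."""
    prev_id = conn.execute(
        "SELECT MAX(id) FROM pages WHERE id < ?", (page_id,)).fetchone()[0]
    next_id = conn.execute(
        "SELECT MIN(id) FROM pages WHERE id > ?", (page_id,)).fetchone()[0]
    return prev_id, next_id


def neighbours_order_by(page_id):
    """Same lookup with the ORDER BY ... LIMIT 1 form from the question."""
    prev_row = conn.execute(
        "SELECT id FROM pages WHERE id < ? ORDER BY id DESC LIMIT 1", (page_id,)).fetchone()
    next_row = conn.execute(
        "SELECT id FROM pages WHERE id > ? ORDER BY id LIMIT 1", (page_id,)).fetchone()
    # LIMIT 1 yields no row at all when nothing matches, so map that to None.
    return (prev_row[0] if prev_row else None,
            next_row[0] if next_row else None)
```

For example, `neighbours(5)` gives `(2, 9)`, and `neighbours(1)` gives `(None, 2)` because there is no earlier page.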

The database is behaving as expected. Your query is a range query because of the less-than comparison (id < $requestedPageId). The OR condition makes it harder to use a single index to find the results. And sorting the results means it has to fetch all matching rows to perform the sort, even though you only want one row.
You're not going to be able to make this a "const" type query, but you may be able to optimize it using indexes, subqueries, and/or UNION queries.
Here is one query to rule them all. I'm not saying this is the best solution, but just one way of approaching the problem. To start, this query will work better if you create two indexes, one on recipientId and another on isPublic.
SELECT
GREATEST(
( SELECT MAX( id ) FROM reports
WHERE id < $requestedPageId AND recipientId = $recipient ),
( SELECT MAX( id ) FROM reports
WHERE id < $requestedPageId AND isPublic = 1 )
) AS prev_id,
LEAST(
( SELECT MIN( id ) FROM reports
WHERE id > $requestedPageId AND recipientId = $recipient ),
( SELECT MIN( id ) FROM reports
WHERE id > $requestedPageId AND isPublic = 1 )
) AS next_id
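One caveat with the query above: MySQL's GREATEST() and LEAST() return NULL if any argument is NULL, so prev_id becomes NULL whenever one of the two subqueries finds no row, even if the other one did. The sketch below reproduces this with Python's sqlite3 module as a stand-in for MySQL (SQLite's two-argument max() follows the same NULL rule) and shows one NULL-safe alternative: taking the aggregate MAX() over a UNION ALL of the two branches, since aggregates ignore NULLs. That alternative is a swapped-in technique, not part of the answer; the table contents are hypothetical, and the same pattern with MIN() applies to next_id.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (id INTEGER PRIMARY KEY, recipientId INTEGER, isPublic INTEGER)")
# Hypothetical rows: recipient 7 owns one private report (id 8); ids 2 and 6 are public.
conn.executemany(
    "INSERT INTO reports (id, recipientId, isPublic) VALUES (?, ?, ?)",
    [(2, 3, 1), (4, 3, 0), (6, 3, 1), (8, 7, 0)],
)

# GREATEST-style: the two-argument max() returns NULL if either argument is NULL.
GREATEST_STYLE = """
SELECT max(
  (SELECT MAX(id) FROM reports WHERE id < :page AND recipientId = :rcpt),
  (SELECT MAX(id) FROM reports WHERE id < :page AND isPublic = 1))
"""

# NULL-safe: the aggregate MAX() skips the branch that found nothing.
UNION_STYLE = """
SELECT MAX(id) FROM (
  SELECT MAX(id) AS id FROM reports WHERE id < :page AND recipientId = :rcpt
  UNION ALL
  SELECT MAX(id) AS id FROM reports WHERE id < :page AND isPublic = 1) AS t
"""


def prev_id(sql, page, rcpt):
    """Run one of the queries above for a page id and recipient id."""
    return conn.execute(sql, {"page": page, "rcpt": rcpt}).fetchone()[0]
```

For page 7 and recipient 7, the private branch finds nothing, so `GREATEST_STYLE` yields None while `UNION_STYLE` correctly yields 6.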

Related

Joining two mysql select statements where the second statement uses a column in first

I apologize in advance if this is super simple for some, but I'm not quite sure how to phrase the question to get relevant search results. I'm also new to this. Thank you in advance for taking the time to look at my question.
I have two tables:
#1 - quote_requests. This is where all data is saved once a customer submits a quote request. Its primary key is called id.
#2 - quote_messages. This holds all the replies for all quote_requests: basically a back-and-forth chat between the client and the sales rep. Its quote_id column refers to the id column of quote_requests.
So what I do in PHP is first run this statement:
SELECT * FROM `quote_requests` WHERE `archived` = 0 AND `owner_id` != 0 AND `owner_id` = 64 ORDER BY `id` DESC
Then I go through the results with a while loop in PHP and, for each quote request, run the query below to see who replied last on it: was it the client or the sales rep?
SELECT `reply_as`, `member_id` FROM `quote_messages` WHERE `quote_id` = :quote_id ORDER BY ID DESC LIMIT 1
Obviously this is very bad: the page takes 40 seconds to load.
My question is:
How do I combine these two SELECT statements into one, given that the second depends on the results of the first (quote_id in quote_messages matches id in quote_requests)?
Thank you so much!

Hmmm . . . your method might be fine if there are not too many quote requests.
So, I might start just by using indexes on the existing queries:
quote_requests(owner_id, archived, id desc)
quote_messages(quote_id, id desc)
However, if you are doing a loop in PHP (which your question is not really explicit about), then you might want to run just one query in the database instead of a loop.
If I understand correctly, the single query would look like this:
SELECT qq.*
FROM (SELECT qm.quote_id, qm.reply_as, qm.member_id,
ROW_NUMBER() OVER (PARTITION BY qm.quote_id ORDER BY qm.id DESC) as seqnum
FROM quote_requests qr JOIN
quote_messages qm
ON qr.id = qm.quote_id
WHERE qr.archived = 0 AND qr.owner_id = 64
) qq
WHERE seqnum = 1;
This query benefits from the same indexes as above.
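A runnable sketch of the ROW_NUMBER() query, using Python's sqlite3 module as a stand-in for MySQL. Window functions need MySQL 8.0 or later (and SQLite 3.25 or later). The sample rows are hypothetical, the join uses quote_requests.id as the key, and only the three useful columns are selected instead of qq.* (which would also return seqnum).

```python
import sqlite3

# In-memory SQLite database standing in for MySQL 8.0.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE quote_requests (id INTEGER PRIMARY KEY, owner_id INTEGER, archived INTEGER);
CREATE TABLE quote_messages (id INTEGER PRIMARY KEY, quote_id INTEGER,
                             reply_as TEXT, member_id INTEGER);
-- Hypothetical sample data: quote 3 is archived, quote 4 belongs to another owner.
INSERT INTO quote_requests VALUES (1, 64, 0), (2, 64, 0), (3, 64, 1), (4, 12, 0);
INSERT INTO quote_messages VALUES
  (10, 1, 'client', 500), (11, 1, 'rep', 64),
  (12, 2, 'rep', 64),     (13, 2, 'client', 501),
  (14, 3, 'client', 502), (15, 4, 'client', 503);
""")

# seqnum = 1 marks the newest message (highest id) within each quote.
LATEST_REPLY = """
SELECT quote_id, reply_as, member_id
FROM (SELECT qm.quote_id, qm.reply_as, qm.member_id,
             ROW_NUMBER() OVER (PARTITION BY qm.quote_id ORDER BY qm.id DESC) AS seqnum
      FROM quote_requests qr
      JOIN quote_messages qm ON qr.id = qm.quote_id
      WHERE qr.archived = 0 AND qr.owner_id = ?) AS qq
WHERE seqnum = 1
ORDER BY quote_id
"""

# One round trip replaces the per-quote queries inside the PHP while loop.
rows = conn.execute(LATEST_REPLY, (64,)).fetchall()
```

Here `rows` holds one (quote_id, reply_as, member_id) tuple per open quote of owner 64.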

There are two ways to replace the while loop:
1. Fetch the latest reply for all quotes in a single query:
SELECT `quote_id`, `reply_as`, `member_id`
FROM `quote_messages`
WHERE id IN (
SELECT MAX(id)
FROM `quote_messages`
WHERE `quote_id` IN (:quote_ids)
GROUP BY `quote_id`
)
2. Add two columns to quote_requests that always hold the latest reply_as and member_id, updated whenever a new message is saved.
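A sketch of the first option, with Python's sqlite3 module standing in for MySQL and hypothetical sample rows. It groups by quote_id (one MAX(id) per quote) and returns quote_id so each reply can be matched to its quote. It also shows one way to fill the IN list: a single placeholder binds a single value, not an array, so the sketch generates one `?` per id; the same applies to PDO in PHP.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE quote_messages (id INTEGER PRIMARY KEY, quote_id INTEGER,
                             reply_as TEXT, member_id INTEGER);
-- Hypothetical sample data.
INSERT INTO quote_messages VALUES
  (10, 1, 'client', 500), (11, 1, 'rep', 64),
  (12, 2, 'rep', 64),     (13, 2, 'client', 501),
  (14, 3, 'client', 502);
""")


def latest_replies(quote_ids):
    """Map quote_id -> (reply_as, member_id) of that quote's newest message."""
    if not quote_ids:
        return {}
    # One "?" per id, because a single placeholder cannot be bound to a list.
    marks = ", ".join("?" for _ in quote_ids)
    sql = f"""
        SELECT quote_id, reply_as, member_id
        FROM quote_messages
        WHERE id IN (SELECT MAX(id) FROM quote_messages
                     WHERE quote_id IN ({marks})
                     GROUP BY quote_id)
    """
    return {q: (r, m) for q, r, m in conn.execute(sql, list(quote_ids))}
```

Ids with no messages (such as 99 in `latest_replies([3, 99])`) simply do not appear in the result.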

Is there a way to identify which data has been selected with the sql statement

I have a sql statement:
$feed=$conn->prepare("SELECT * FROM posts WHERE post_by=? OR id=? ORDER BY id DESC LIMIT 10");
$feed->bind_param("ii",$friend['id'],$like['id']);
$feed->execute();
$friend['id'] is the id of one of the user's friends, and $like['id'] is the id of a post that the friend liked.
The posts fetched with this query appear in a page.
What I want is to know which posts were posted by the user's friends (fetched using $friend['id']) and which posts were liked by the user's friends and appear in the feed (fetched using $like['id']).
I'd like to know all the options I could try to achieve this.
I have tried changing my query to use UNION ALL, but it shows errors and I couldn't achieve what I want.
Currently there are no errors, but I want the user to know how each post appeared in the news feed.
I hope this makes my question clear. Any kind of hack is welcome; I'm also willing to change the structure of my query.
Please comment for more info.
Thanks in advance.

SELECT *, post_by = ?friendId AS post_by_friend
FROM posts
WHERE post_by = ?friendId OR
id = ?likeId
ORDER BY id DESC
LIMIT 10
post_by_friend will be 1 if it matched the first condition, otherwise 0. I haven't benchmarked it, but this method should be faster than StuartLC's UNION suggestion.
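A quick sketch of the marker-column idea with Python's sqlite3 module standing in for MySQL; in both databases a comparison evaluates to 1 or 0. The posts are hypothetical. With positional `?` placeholders, the friend's id has to be bound twice (once for the marker, once for the WHERE clause), which with mysqli would mean three `?` and `bind_param("iii", ...)`.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, post_by INTEGER, body TEXT)")
# Hypothetical posts: user 7 is the friend; post 3 (by user 9) is the one the friend liked.
conn.executemany("INSERT INTO posts VALUES (?, ?, ?)",
                 [(1, 7, "a"), (2, 8, "b"), (3, 9, "c"), (4, 7, "d")])

friend_id, like_id = 7, 3
rows = conn.execute(
    """SELECT id, post_by = ? AS post_by_friend
       FROM posts
       WHERE post_by = ? OR id = ?
       ORDER BY id DESC
       LIMIT 10""",
    (friend_id, friend_id, like_id),  # friend id bound once per "?" that uses it
).fetchall()
```

Each row carries 1 if the friend posted it and 0 if it is only in the feed because the friend liked it.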

What you can do is break the query up on its 'OR' clause into a UNION of two separate queries, and add a marker column to indicate whether the row was found by friend or by like:
SELECT *
FROM
(
SELECT *, 'Friend' AS HowFound
FROM posts
WHERE post_by = ?friendId
UNION
SELECT *, 'Like' AS HowFound
FROM posts
WHERE id = ?likeId AND post_by <> ?friendId
) x
ORDER BY id DESC
LIMIT 10;
You'll want to exclude rows that match both the friend and the like conditions from one of the selects; otherwise they will be reported twice (or your app will need to combine them).
I'm no PHP guru, but I'm sure there is a way to name the parameters to allow the above exclusion.
The derived table is needed to order and restrict the overall result.
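The UNION version can be tried with Python's sqlite3 module as a stand-in for MySQL; the posts are hypothetical. This sketch uses UNION ALL instead of UNION, on the assumption that the `post_by <> ?` exclusion already keeps the two branches disjoint, so there is nothing to deduplicate. Positional `?` placeholders again mean the friend's id is bound twice.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, post_by INTEGER, body TEXT)")
# Hypothetical posts: user 7 is the friend.
conn.executemany("INSERT INTO posts VALUES (?, ?, ?)",
                 [(1, 7, "a"), (2, 8, "b"), (3, 9, "c"), (4, 7, "d")])


def feed(friend_id, like_id):
    """Feed rows as (id, HowFound), newest first."""
    sql = """
        SELECT id, HowFound FROM (
            SELECT id, 'Friend' AS HowFound FROM posts WHERE post_by = ?
            UNION ALL
            -- a liked post that the friend also wrote is reported only as 'Friend'
            SELECT id, 'Like' AS HowFound FROM posts WHERE id = ? AND post_by <> ?
        ) AS x
        ORDER BY id DESC
        LIMIT 10
    """
    return conn.execute(sql, (friend_id, like_id, friend_id)).fetchall()
```

`feed(7, 3)` tags post 3 as 'Like', while `feed(7, 4)` reports post 4 only once, as 'Friend'.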

How would I duplicate MySQL Delete using an offset in Elastic Search?

I have a MySQL script that takes a database query and cuts off a certain amount of rows depending on some settings. So if I have a user with a subscription of 100,000 things, and the user uploads 110,000, the script cuts off the last 10,000.
Here is the MySQL script:
DELETE FROM `my_table`
WHERE some_id = $this->id AND id <= (
SELECT id
FROM (
SELECT id
FROM `my_table`
WHERE some_id = $this->id
ORDER BY id DESC
LIMIT 1 OFFSET $max
) sp
)
where $max is 100,000.
This deletes any extra rows. I have since started implementing Elastic Search, and I am now trying to duplicate this functionality, but I don't know where to start because I'm not yet well versed in this software.
I have been looking at the deleteByQuery method in the PHP API, but I don't see anything about offsets or anything like that.
Can someone point me in the right direction?

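Try this one; it will delete the extra records:
DELETE FROM my_table WHERE id IN (
SELECT id FROM (
SELECT id
FROM my_table
WHERE some_id = $this->id
ORDER BY id ASC
LIMIT $maxRecordsAllowed, $countHowManyToDelete
) AS extra
)
MySQL does not allow LIMIT directly inside an IN subquery, which is why the inner SELECT is wrapped in a derived table.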
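The SQL part of this answer can be exercised with Python's sqlite3 module as a stand-in for MySQL (it does not address the Elastic Search side of the question). In this sketch, `trim` is a hypothetical helper that first counts the owner's rows to get `$countHowManyToDelete`, then deletes everything past the first `max_rows` rows in id ASC order. SQLite would accept the LIMIT inside IN without the derived table; it is kept so the statement matches the MySQL form. The data are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, some_id INTEGER)")
# Hypothetical data: owner 1 has 8 rows (ids 1-8), owner 2 has 3 rows (ids 9-11).
conn.executemany("INSERT INTO my_table (id, some_id) VALUES (?, ?)",
                 [(i, 1) for i in range(1, 9)] + [(i, 2) for i in range(9, 12)])


def trim(owner, max_rows):
    """Delete the owner's rows beyond the first max_rows by id ASC; return count."""
    (total,) = conn.execute(
        "SELECT COUNT(*) FROM my_table WHERE some_id = ?", (owner,)).fetchone()
    surplus_count = total - max_rows
    if surplus_count <= 0:
        return 0
    cur = conn.execute(
        """DELETE FROM my_table WHERE id IN (
               SELECT id FROM (
                   SELECT id FROM my_table
                   WHERE some_id = ?
                   ORDER BY id ASC
                   LIMIT ?, ?          -- offset, count (same order as MySQL)
               ) AS surplus
           )""",
        (owner, max_rows, surplus_count),
    )
    return cur.rowcount
```

`trim(1, 5)` removes ids 6 to 8 and leaves owner 2's rows untouched.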

I need to select newest rows from a MySQL database, but verify that I am also returning a row with a given ID

I'm new to this, sorry if the title is confusing. I am building a simple PHP/MySQL gallery of sorts. It shows the newest 25 entries when a user first visits it, and also allows off-site linking to individual items in the list. If the URL contains an ID, JavaScript scrolls to it. But if there are more than 25 entries, my query may fetch the newest results and omit an older entry whose ID happens to be in the URL.
That means I need to do something like this...
SELECT * FROM `submissions` WHERE uid='$sid'
But after that has successfully found the submission with the special ID, also do
SELECT * FROM `submissions` ORDER BY `id` DESC LIMIT 0, 25
So that I can populate the rest of the gallery.
I could query that database twice, but I am assuming there's some nifty way to avoid that. MySQL is also ordering everything (based on newest, views, and other vars) and using two queries would break that.

You could limit across a UNION like this:
(SELECT * FROM submissions WHERE uid = '$uid')
UNION
(SELECT * FROM submissions WHERE uid <> '$uid' ORDER BY `id` DESC LIMIT 25)
LIMIT 25
Note that LIMIT is listed twice because, if the first query returns a result, the union set would contain 26 rows. This will also place the searched-for item first in the result set (with the other 24 results in sort order). If this is not desirable, you could put an ORDER BY across the union, but your searched-for result would be truncated if it happened to be the 26th record.
If you need 25 rows with all of them being sorted, my guess is that you would need to do the two query approach (limiting second query to either 24 or 25 records depending on whether the first query matched), and then simply insert the uid-matched result into the sorted records in the appropriate place before display.
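The two-query approach described above can be sketched in Python with sqlite3 standing in for MySQL. `gallery` is a hypothetical helper, the 30 sample submissions are made up, and "sorted" here means newest first by id only (the question's other sort criteria are not modelled). The linked row is fetched first, the second query takes 24 or 25 other rows accordingly, and the linked row is then inserted where id DESC ordering would put it.

```python
import sqlite3

PAGE = 25  # gallery size from the question

# In-memory SQLite database standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE submissions (id INTEGER PRIMARY KEY, uid TEXT)")
# Hypothetical data: 30 submissions whose uid is "s<id>".
conn.executemany("INSERT INTO submissions VALUES (?, ?)",
                 [(i, f"s{i}") for i in range(1, 31)])


def gallery(uid):
    """Newest PAGE rows, guaranteed to include the row with this uid if it exists."""
    linked = conn.execute(
        "SELECT id, uid FROM submissions WHERE uid = ?", (uid,)).fetchone()
    limit = PAGE - 1 if linked else PAGE  # leave room for the linked row
    rows = conn.execute(
        "SELECT id, uid FROM submissions WHERE uid <> ? ORDER BY id DESC LIMIT ?",
        (uid, limit),
    ).fetchall()
    if linked:
        # Insert before the first row with a smaller id, keeping id DESC order.
        pos = next((i for i, r in enumerate(rows) if r[0] < linked[0]), len(rows))
        rows.insert(pos, linked)
    return rows
```

If the linked row is already among the newest 25, the result is exactly the normal first page; otherwise it is appended after the 24 newest rows.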

I think the better solution is:
SELECT *
FROM `submissions`
order by (case when uid = $sid then 0 else 1 end),
id desc
limit 25
I don't think the union is guaranteed to return results in the order of the union (there is no guarantee in the standard or in other databases).
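The CASE-based ORDER BY can be checked with Python's sqlite3 module as a stand-in for MySQL; the 30 submissions are hypothetical, and the uid is passed as a bound parameter rather than interpolated. A single ordered query makes the position of the linked row explicit instead of relying on UNION output order.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE submissions (id INTEGER PRIMARY KEY, uid TEXT)")
# Hypothetical data: 30 submissions whose uid is "s<id>".
conn.executemany("INSERT INTO submissions VALUES (?, ?)",
                 [(i, f"s{i}") for i in range(1, 31)])


def gallery(uid, page=25):
    """Linked row first (if it exists), then the newest rows by id."""
    return conn.execute(
        """SELECT id, uid FROM submissions
           ORDER BY (CASE WHEN uid = ? THEN 0 ELSE 1 END), id DESC
           LIMIT ?""",
        (uid, page),
    ).fetchall()
```

Note that this places the linked row first rather than in its natural position; when the uid does not exist, the result is just the newest 25 rows.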

Getting random results from large tables

I'm trying to get 4 random results from a table that holds approx 7 million records. Additionally, I also want to get 4 random records from the same table that are filtered by category.
Now, as you would imagine doing random sorting on a table this large causes the queries to take a few seconds, which is not ideal.
One other method I thought of for the non-filtered result set would be to just get PHP to select some random numbers between 1 and 7,000,000 or so and then do an IN(...) with the query to only grab those rows - and yes, I know that this method has a caveat in that you may get fewer than 4 if a record with that id no longer exists.
However, the above method obviously will not work with the category filtering as PHP doesn't know which record numbers belong to which category and hence cannot select the record numbers to select from.
Are there any better ways I can do this? The only way I can think of would be to store the record IDs for each category in another table, select random results from that, and then select only those record IDs from the main table in a second query; but I'm sure there is a better way!?

You could of course use the RAND() function in a query with a LIMIT and a WHERE (for the category). However, as you pointed out, that entails a scan of the table, which takes time, especially in your case due to the volume of data.
Your other alternative, again as you pointed out, is to store id/category_id in another table. That might prove a bit faster, but again there has to be a LIMIT and WHERE on that table, which will also contain the same number of records as the master table.
A different approach (if applicable) would be to have a table per category that stores the IDs. If your categories are fixed or do not change often, this approach should work. You effectively remove the WHERE clause, and a RAND() with a LIMIT on each category table would be faster, since each category table contains only a subset of the records in your main table.
Another alternative would be to use a key/value database just for that operation. MongoDB or Google App Engine can help with that and are really fast.
You could also go towards the approach of a Master/Slave in your MySQL. The slave replicates content in real time but when you need to perform the expensive query you query the slave instead of the master, thus passing the load to a different machine.
Finally you could go with Sphinx which is a lot easier to install and maintain. You can then treat each of those category queries as a document search and let Sphinx randomize the results. This way you offset this expensive operation to a different layer and let MySQL continue with other operations.
Just some issues to consider.

Working off your random number approach:
Get the max id in the database.
Create a temp table to store your matches.
Loop n times doing the following
Generate a random number between 1 and maxId
Get the first record with a record Id greater than the random number and insert it into your temp table
Your temp table now contains your random results.
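Or you could dynamically generate SQL with a UNION to do the query in one step, where $rand1 to $rand4 are random numbers between 1 and maxId generated in PHP:
(SELECT * FROM myTable WHERE ID > $rand1 AND Category = zzz ORDER BY ID LIMIT 1)
UNION
(SELECT * FROM myTable WHERE ID > $rand2 AND Category = zzz ORDER BY ID LIMIT 1)
UNION
(SELECT * FROM myTable WHERE ID > $rand3 AND Category = zzz ORDER BY ID LIMIT 1)
UNION
(SELECT * FROM myTable WHERE ID > $rand4 AND Category = zzz ORDER BY ID LIMIT 1)
Note: RAND() alone cannot be used here, because it returns a value between 0 and 1 and, in a WHERE clause, is re-evaluated for every row. I'm not a MySQL guy, but the theory should be sound.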
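The random-starting-point loop above can be sketched in Python with sqlite3 standing in for MySQL. The table, its gaps, and the `random_rows` helper are hypothetical; a dict replaces the temp table, and the sketch takes the first matching row at or above each random number. As the question anticipates, it can return fewer than n rows (a random start past the last matching ID, or two starts landing on the same row), and rows that follow large gaps in the ids are picked more often.

```python
import random
import sqlite3

# In-memory SQLite database standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (ID INTEGER PRIMARY KEY, Category INTEGER)")
# Hypothetical data with gaps (every third id missing); categories alternate 1/2.
conn.executemany("INSERT INTO myTable VALUES (?, ?)",
                 [(i, i % 2 + 1) for i in range(1, 1000) if i % 3])


def random_rows(category, n=4, rng=random):
    """Up to n distinct random rows of the category (may return fewer)."""
    (max_id,) = conn.execute("SELECT MAX(ID) FROM myTable").fetchone()
    found = {}  # plays the role of the temp table; keyed by ID to drop duplicates
    for _ in range(n):
        start = rng.randint(1, max_id)  # random number generated in the application
        row = conn.execute(
            "SELECT ID, Category FROM myTable WHERE ID >= ? AND Category = ? "
            "ORDER BY ID LIMIT 1",
            (start, category),
        ).fetchone()
        if row:  # None when start lies past the last matching ID
            found[row[0]] = row
    return list(found.values())
```

Each lookup is an index range seek on the primary key, so it stays cheap even on a 7-million-row table.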

First you need to get the number of rows, something like this:
select count(1) from tbl where category = ?
then pick a random zero-based offset
$offset = rand(0, $rowsNum - 1);
and select the row at that offset
select * FROM tbl WHERE category = ? ORDER BY id LIMIT $offset, 1
This way you avoid the problem of missing ids. The only drawback is that you need to run the second query several times; a UNION may help in this case.
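A sketch of the count-then-offset approach with Python's sqlite3 module standing in for MySQL; the table contents and the `random_by_offset` helper are hypothetical. It draws distinct zero-based offsets so the same row is not returned twice, and filters and orders by id in the offset query so the offset refers to the same set of rows that was counted. Keep in mind that a large offset still makes the database step over that many rows before returning one.

```python
import random
import sqlite3

# In-memory SQLite database standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, category INTEGER)")
# Hypothetical data: ids 1-300 with gaps at multiples of 7; category = id % 3.
conn.executemany("INSERT INTO tbl VALUES (?, ?)",
                 [(i, i % 3) for i in range(1, 301) if i % 7])


def random_by_offset(category, n=4, rng=random):
    """Up to n distinct random rows of the category, picked by row offset."""
    (rows_num,) = conn.execute(
        "SELECT COUNT(1) FROM tbl WHERE category = ?", (category,)).fetchone()
    if rows_num == 0:
        return []
    # Offsets are zero-based, so valid values are 0 .. rows_num - 1.
    offsets = rng.sample(range(rows_num), min(n, rows_num))
    return [
        conn.execute(
            "SELECT id, category FROM tbl WHERE category = ? ORDER BY id LIMIT ?, 1",
            (category, off),
        ).fetchone()
        for off in offsets
    ]
```

Unlike the random-id approach, every row of the category is equally likely, regardless of gaps in the ids.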

For MySQL you can use RAND():
SELECT column FROM table
ORDER BY RAND()
LIMIT 4
