I have an orders grid holding 1 million records. The page has pagination, sorting, and search options. If the sort is by customer name with a search key and the page number is 1, it works fine:
SELECT * FROM orders WHERE customer_name like '%Henry%' ORDER BY
customer_name desc limit 10 offset 0
It becomes a problem when the user clicks on the last page:
SELECT * FROM orders WHERE customer_name like '%Henry%' ORDER BY
customer_name desc limit 10 offset 100000
The above query takes forever to load. Indexes are set on the order id, customer name, and date of order columns.
I could use the late-row-lookup solution from https://explainextended.com/2009/10/23/mysql-order-by-limit-performance-late-row-lookups/ if the sort were always on the primary key, but in my case the sort column is user-selected: it can change between order id, customer name, date of order, etc.
Any help would be appreciated. Thanks.
Problem 1:
LIKE "%..." -- The leading wildcard requires a full scan of the data, or at least until it finds the 100000+10 rows. Even
... WHERE ... LIKE '%qzx%' ... LIMIT 10
is problematic, since there probably not 10 such names. So, a full scan of your million names.
... WHERE name LIKE 'James%' ...
can at least start partway into the index, if there is an index starting with name. But still, the LIMIT and OFFSET might conspire to require reading the rest of the table.
Problem 2: (before you edited your Question!)
If you leave out the WHERE, do you really expect the user to page through a million names looking for something?
This is a UI problem.
If you have a million rows, and the output is ordered by Customer_name, that makes it easy to see the Aarons and the Zywickis, but not anyone else. How would you get to me (James)? Either you have 100K links and I am somewhere near the middle, or the poor user would have to press [Next] 'forever'.
My point is that no amount of database tuning will make this UI pattern efficient; the UI design is where the fix belongs.
In some other situations, it is meaningful to go to the [Next] (or [Prev]) page. In those situations, "remember where you left off", then use that value to reach into the table efficiently; OFFSET is not efficient. A sketch follows below. More on Pagination
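For example, a minimal keyset-pagination sketch. It assumes a composite index on (customer_name, id) and that the application remembers the last row of the page just shown; :last_name and :last_id are placeholder parameters, not part of the original schema:

-- first page
SELECT * FROM orders
WHERE customer_name LIKE '%Henry%'
ORDER BY customer_name DESC, id DESC   -- id breaks ties so the order is deterministic
LIMIT 10;

-- "next" page: seek past the last row shown instead of using OFFSET
SELECT * FROM orders
WHERE customer_name LIKE '%Henry%'
  AND (customer_name, id) < (:last_name, :last_id)
ORDER BY customer_name DESC, id DESC
LIMIT 10;

The leading-wildcard problem described above still applies; this only removes the OFFSET cost.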
I use a special concept for this. First I have a table called pager. It contains a primary key pager_id and some values to identify the user (user_id, session_id), so that the pager data can't be stolen.
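A minimal sketch of that table (the exact column types are assumptions based on the description):

CREATE TABLE pager (
    pager_id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
    user_id    INT UNSIGNED NOT NULL,   -- identifies the user
    session_id VARCHAR(64)  NOT NULL,   -- so the pager data can't be stolen
    PRIMARY KEY (pager_id)
);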
Then I have a second table called pager_filter. It consists of 3 columns:

CREATE TABLE pager_filter (
    pager_id     INT UNSIGNED NOT NULL,  -- id of table pager
    order_id     INT UNSIGNED NOT NULL,  -- store the sort order here
    reference_id INT UNSIGNED NOT NULL,  -- reference into the data table
    PRIMARY KEY (pager_id, order_id)
);
As a first operation I select all records matching the filter rules from the data table and insert them into pager_filter:
DELETE FROM pager_filter WHERE pager_id = $PAGER_ID;
INSERT INTO pager_filter (pager_id,order_id,reference_id)
SELECT $PAGER_ID pager_id,
       ROW_NUMBER() OVER (ORDER BY $ORDERING) order_id,
       data_id reference_id
FROM data_table
WHERE $CONDITIONS
After filling the filter table you can use an inner join for pagination:
SELECT d.*
FROM pager_filter f
INNER JOIN data_table d ON d.data_id = f.reference_id
WHERE f.pager_id = $PAGER_ID AND f.order_id BETWEEN 100000 AND 100099
ORDER BY f.order_id
or
SELECT d.*
FROM pager_filter f
INNER JOIN data_table d ON d.data_id = f.reference_id
WHERE f.pager_id = $PAGER_ID
ORDER BY f.order_id
LIMIT 100 OFFSET 100000
Hint: all of the code above is untested pseudocode.
I apologize in advance if this is super simple for some, but I'm not quite sure how to phrase the question to get relevant search results. I'm also new to this. Thank you in advance for taking the time to look at my question.
I have two tables:
#1 - quote_requests. This is where all data is saved once a customer submits a quote request. It has a primary key called id.
#2 - quote_messages. These are all the replies for all quote requests, basically a chat back and forth between the client and the sales rep. There's a column called quote_id that references the id column of quote_requests.
So what I do in PHP is first run this statement:
SELECT * FROM `quote_requests` WHERE `archived` = 0 AND `owner_id` != 0 AND `owner_id` = 64 ORDER BY `id` DESC
Then I go through the results with a while loop in PHP, with the purpose of seeing who was the last person that replied to the messages on that particular quote request: was it the client or the sales rep?
SELECT `reply_as`, `member_id` FROM `quote_messages` WHERE `quote_id` = :quote_id ORDER BY ID DESC LIMIT 1
Now obviously this is very bad because it takes 40 seconds for the page to process.
My question is:
How do I combine these two SELECT statements into one, considering that the second statement is tied to the results of the first one (the quote_id of quote_messages being the same as the id of quote_requests)?
Thank you so much!
Hmmm . . . your method might be fine if there are not too many quote requests.
So, I might start just by using indexes on the existing queries:
quote_requests(owner_id, archived, id desc)
quote_messages(quote_id, id desc)
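In MySQL those could be created like so (the index names are made up; the DESC key parts take effect on MySQL 8.0+, while older versions parse but ignore them):

ALTER TABLE quote_requests ADD INDEX idx_qr_owner_archived_id (owner_id, archived, id DESC);
ALTER TABLE quote_messages ADD INDEX idx_qm_quote_id (quote_id, id DESC);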
However, if you are doing a loop in PHP (your question is not really explicit about that), then you might want to run just one query in the database instead of a loop.
If I understand correctly, the one query would look like:
SELECT qq.*
FROM (SELECT qm.quote_id, qm.reply_as, qm.member_id,
ROW_NUMBER() OVER (PARTITION BY qm.quote_id ORDER BY qm.id DESC) as seqnum
FROM quote_requests qr JOIN
quote_messages qm
ON qm.quote_id = qr.id
WHERE qr.archived = 0 AND qr.owner_id = 64
) qq
WHERE seqnum = 1;
And for this you want the same indexes as above.
There are two ways to replace the while loop:
1. Fetch the latest message for all quotes in a single query:
SELECT `reply_as`, `member_id`
FROM `quote_messages`
WHERE id IN (
    SELECT MAX(id)
    FROM `quote_messages`
    WHERE `quote_id` IN (:quote_ids)
    GROUP BY `quote_id`
)
2. Add 2 columns to quote_requests which maintain the latest reply_as and member_id; a sketch follows below.
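A rough sketch of that second option (the column names last_reply_as and last_member_id are made up, and :reply_as/:member_id/:quote_id are placeholders; the UPDATE would run right after each INSERT into quote_messages):

ALTER TABLE quote_requests
    ADD COLUMN last_reply_as  VARCHAR(32)  NULL,
    ADD COLUMN last_member_id INT UNSIGNED NULL;

-- run after inserting a new message for :quote_id
UPDATE quote_requests
SET last_reply_as  = :reply_as,
    last_member_id = :member_id
WHERE id = :quote_id;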
First a bit of background about the tables & DB.
I have a MySQL db with a few tables in:
films:
Contains all film/series info with netflixid as a unique primary key.
users:
Contains user info; "ratingid" is a unique primary key.
rating:
Contains ALL user rating info: netflixid, and a unique primary key that is the compound "netflixid-userid".
This statement works:
SELECT *
FROM films
WHERE
INSTR(countrylist, 'GB')
AND films.netflixid NOT IN (SELECT netflixid FROM rating WHERE rating.userid = 1)
LIMIT 1
but it takes longer and longer to retrieve a new film record that you haven't rated (currently 6.8 seconds with around 2400 user ratings against an 8000-row film table).
First I thought it was the INSTR(countrylist, 'GB'), so I split the countries out into their own tinyint columns, but it made no difference.
I have tried NOT EXISTS as well, but the times are similar.
Any thoughts/ideas on how to select a new "unrated" row from films quickly?
Thanks!
Try just joining?
SELECT *
FROM films
LEFT JOIN rating on rating.ratingid=CONCAT(films.netflixid,'-',1)
WHERE
INSTR(countrylist, 'GB')
AND rating.ratingid IS NULL
LIMIT 1
Or doing the equivalent NOT EXISTS.
I would recommend not exists:
select *
from films f
where
instr(countrylist, 'GB')
and not exists (
select 1 from rating r where r.userid = 1 and f.netflixid = r.netflixid
)
This should take advantage of the primary key index of the rating table, so the subquery executes quickly.
That said, the instr() function in the outer query also represents a bottleneck. The database cannot take advantage of an index here because of the function call: it basically needs to apply the computation to the whole table before it can filter. To avoid this, you would probably need to review your design: that is, have a separate table to represent the relationship between films and countries, with each pair on its own row; then you could use another EXISTS subquery to filter on the country.
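For illustration, a sketch of that normalized design (the film_country table and its columns are assumptions, not your current schema):

CREATE TABLE film_country (
    netflixid INT NOT NULL,
    country   CHAR(2) NOT NULL,
    PRIMARY KEY (netflixid, country)
);

SELECT f.*
FROM films f
WHERE EXISTS (SELECT 1 FROM film_country fc
              WHERE fc.netflixid = f.netflixid AND fc.country = 'GB')
  AND NOT EXISTS (SELECT 1 FROM rating r
                  WHERE r.userid = 1 AND r.netflixid = f.netflixid)
LIMIT 1;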
The INSTR(countrylist, 'GB') could be changed to countrylist = 'GB', or to countrylist LIKE '%GB%' if countrylist contains more than one country.
Also, don't select all columns with * if you only need some of them; depending on the number of columns, the query could be really slow.
Let's start by saying that I can't use INDEXING, as I need INSERT, DELETE and UPDATE on this table to be super fast, which they are.
I have a page that displays a summary of order units collected in a database table. To populate the table, an order number is created and then the individual units associated with that order are scanned in, to record which units belong to each order.
For the purposes of this example the table has the following columns.
id, UID, order, originator, receiver, datetime
The individual unit quantities can be in the 1000's per order and the entire table is growing to hundreds of thousands of units.
The summary page displays the number of units per order and the first and last unit number for each order. I limit the number of orders to be displayed to the last 30 order numbers.
For example:
Order 10 has 200 units. first UID 1510 last UID 1756
Order 11 has 300 units. first UID 1922 last UID 2831
..........
..........
Currently the response time for the query is about 3 seconds as the code performs the following:
Look up the last 30 orders by id and sort by order number
While looking at each order number in the array
-- Count the number of database rows that have that order number
-- Select the first UID from all the rows as first
-- Select the last UID from all the rows as last
Display the result
I've determined that the majority of the time is taken by counting the number of units in each order (~1.8 seconds) and by determining the first and last numbers in each order (~1 second).
I am really interested in whether there is a way to speed up these queries without INDEXING. Here is the code with the queries.
The first request selects the last 30 orders processed, selected by id and grouped by order number. This gives the last 30 unique order numbers.
$result = mysqli_query($con, "SELECT `order`, ANY_VALUE(receiver) AS receiver, ANY_VALUE(originator) AS originator, ANY_VALUE(id) AS id
    FROM scandb
    GROUP BY `order`
    ORDER BY id DESC
    LIMIT 30");
While fetching the last 30 order numbers, count the number of units and find the first and last UID for each order.
while ($row = mysqli_fetch_array($result)) {
    $count = mysqli_fetch_array(mysqli_query($con, "SELECT `order`, COUNT(*) AS count FROM scandb WHERE `order` = '".$row['order']."'"));
    $firstLast = mysqli_fetch_array(mysqli_query($con, "SELECT (SELECT UID FROM scandb WHERE `order` = '".$row['order']."' ORDER BY UID LIMIT 1) AS 'first', (SELECT UID FROM scandb WHERE `order` = '".$row['order']."' ORDER BY UID DESC LIMIT 1) AS 'last'"));
    echo "<td align=center>".$count['count']."</td>";
    echo "<td align=center>".$firstLast['first']."</td>";
    echo "<td align=center>".$firstLast['last']."</td>";
}
With 100K rows in the database this whole process takes about 3 seconds, and the majority of the time is spent in the $count and $firstLast queries. I'd like to know if there is a more efficient way to get the same data faster without indexing the table. Any special tricks that anyone has would be greatly appreciated.
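For what it's worth, even without indexes the per-order COUNT and first/last lookups can be folded into the first query as aggregates, so the table is scanned once instead of once per order. A sketch against the columns above (untested; it assumes "first"/"last" mean smallest/largest UID, as in the original subqueries):

SELECT `order`,
       ANY_VALUE(receiver)   AS receiver,
       ANY_VALUE(originator) AS originator,
       COUNT(*)              AS unit_count,
       MIN(UID)              AS first_uid,
       MAX(UID)              AS last_uid,
       MAX(id)               AS last_id
FROM scandb
GROUP BY `order`
ORDER BY last_id DESC
LIMIT 30;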
Design your database with caution
This first tip may seem obvious, but the fact is that most database problems come from badly-designed table structures.
For example, I have seen people storing information such as client info and payment info in the same database column. For both the database system and developers who will have to work on it, this is not a good thing.
When creating a database, always put information on various tables, use clear naming standards and make use of primary keys.
Know what you should optimize
If you want to optimize a specific query, it is extremely useful to be able to get an in-depth look at how it is executed. Using the EXPLAIN statement, you will get lots of useful info about the execution plan of a specific query, as shown in the example below:
EXPLAIN SELECT * FROM ref_table, other_table WHERE ref_table.key_column = other_table.column;
Don’t select what you don’t need
A very common way to get the desired data is to use the * symbol, which will get all fields from the desired table:
SELECT * FROM wp_posts;
Instead, you should definitely select only the desired fields as shown in the example below. On a very small site with, let’s say, one visitor per minute, that wouldn’t make a difference. But on a site such as Cats Who Code, it saves a lot of work for the database.
SELECT title, excerpt, author FROM wp_posts;
Avoid queries in loops
When using SQL along with a programming language such as PHP, it can be tempting to use SQL queries inside a loop. But doing so is like hammering your database with queries.
This example illustrates the whole “queries in loops” problem:
foreach ($display_order as $id => $ordinal) {
$sql = "UPDATE categories SET display_order = $ordinal WHERE id = $id";
mysql_query($sql);
}
Here is what you should do instead:
UPDATE categories
SET display_order = CASE id
WHEN 1 THEN 3
WHEN 2 THEN 4
WHEN 3 THEN 5
END
WHERE id IN (1,2,3)
Use join instead of subqueries
As a programmer, subqueries are something that you can be tempted to use and abuse. Subqueries, as shown below, can be very useful:
SELECT a.id,
       (SELECT MAX(created)
        FROM posts
        WHERE author_id = a.id) AS latest_post
FROM authors a
Although subqueries are useful, they can often be replaced by a join, which is definitely faster to execute. (Note that the INNER JOIN below drops authors with no posts; use a LEFT JOIN if you need to keep them.)
SELECT a.id, MAX(p.created) AS latest_post
FROM authors a
INNER JOIN posts p
ON (a.id = p.author_id)
GROUP BY a.id
Source: http://20bits.com/articles/10-tips-for-optimizing-mysql-queries-that-dont-suck/
I want a per-day sales item count. I already created a query for that, but it takes too much time (around 55.585 s). The query is:
SELECT
    td.db_date,
    (
        SELECT COUNT(*) FROM `order` WHERE DATE(`order`.created_on) = td.db_date
    ) AS day_contribute
FROM time_dimension AS td
So can anyone please let me know how I may optimize this query and reduce the execution time?
You can modify your query to use a join, like:
SELECT
    td.db_date, COUNT(`order`.id) AS day_contribute
FROM time_dimension AS td
LEFT JOIN `order` ON DATE(`order`.created_on) = td.db_date
GROUP BY td.db_date;
I do not know the primary key of your order table, so I just used "order.id"; replace it with yours.
Also, it is very important to check that you have an index on the td.db_date field.
And one more important thing: it is better to avoid DATE(order.created_on), because it means the DATE() function is called every time the DB compares two dates. If possible, convert order.created_on to the same format as td.db_date, or join on other fields; that will add speed too.
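One way to get such a join-friendly column in MySQL 5.7+ is a stored generated column with an index. A sketch (it assumes the table can be altered and that the primary key is id):

ALTER TABLE `order`
    ADD COLUMN created_date DATE GENERATED ALWAYS AS (DATE(created_on)) STORED,
    ADD INDEX idx_created_date (created_date);

SELECT td.db_date, COUNT(o.id) AS day_contribute
FROM time_dimension AS td
LEFT JOIN `order` o ON o.created_date = td.db_date
GROUP BY td.db_date;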
First you should make sure you have an index on the created_on column of the order table.
However, if you have many records in time_dimension and many records in the order table, it might be hard to optimize the query, because for each record from time_dimension a search of the order table is needed.
You can also change count(*) into count(order_id) (assuming the primary key of the order table is order_id), or add an extra date-only column to the order table (created_on_date, with an index on it), so your query could look like this:
SELECT
    td.db_date,
    (
        SELECT COUNT(order_id) FROM `order` WHERE `order`.created_on_date = td.db_date
    ) AS day_contribute
FROM time_dimension AS td
However, the execution time might still be too high if you have many records in both tables, so it might be necessary to create one extra table that holds the number of orders for each day, and update it from cron or whenever records are added, updated, or deleted in the order table.
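A minimal sketch of such a summary table (all names are made up; the REPLACE would be run from cron or after order changes):

CREATE TABLE order_counts_per_day (
    db_date     DATE NOT NULL PRIMARY KEY,
    order_count INT UNSIGNED NOT NULL DEFAULT 0
);

REPLACE INTO order_counts_per_day (db_date, order_count)
SELECT DATE(created_on), COUNT(*)
FROM `order`
GROUP BY DATE(created_on);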
I have a voting system for articles. Articles are stored in a 'stories' table and all votes are stored in a 'votes' table. id in the 'stories' table is equal to item_name in the 'votes' table (so each vote is related to an article via item_name).
I want to make it so that when the sum of votes reaches 10, the 'showing' field in the 'stories' table is updated to a value of 1.
I was thinking about setting up a cron job that runs every hour to check all posts that have showing = 0. For each of those it would sum up the votes related to that article and set showing = 1 if the sum of votes >= 10. I'm not sure how efficient that is; it might take up a lot of server resources.
So could anyone suggest a cron job that could do the task?
Here is my database structure:
Stories table
Votes table
Edit:
For example this row from 'stories' table:
id| 12
st_auth | author name
st_date | story date
st_title| story title
st_category| story category
st_body| story body
showing| 0 for unapproved and 1 for approved
This row is related to this one from 'votes' table
id| 83
item_name| 12 (id of article)
vote_value| 1 for upvote -1 for downvote
...
Couple of things:
Why did you name the column item_name in the votes table when it is actually the id of the stories table? I would recommend making it match the stories table's id in type: an int(11) rather than a varchar(255). Also, you should add a foreign key constraint to the votes table, so that if an article is ever deleted you don't orphan rows in the votes table.
Why is the vote_value column an int(11)? If it can only hold two states (1 or -1), you can use a signed tinyint(1) (signed to allow the -1).
The ip column in the votes table is a bit concerning. If you are regulating 'unique' votes by IP, did you account for proxy IPs? Something like this should be handled at the account level, so that several users behind the same proxy IP can issue individual votes.
I wouldn't use a cron job to determine whether the showing column should be flagged 0 or 1. Rather, I would issue a count every time a vote is cast against the article: when someone up-votes or down-votes, recalculate the story's total, update showing, and store the result in cache for future reads. A sketch follows below.
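A sketch of that vote-time check (:story_id is a placeholder for the article just voted on; the columns are the ones from the question):

-- run right after inserting a row into votes
UPDATE stories s
SET s.showing = 1
WHERE s.id = :story_id
  AND s.showing = 0
  AND (SELECT COALESCE(SUM(v.vote_value), 0)
       FROM votes v
       WHERE v.item_name = s.id) >= 10;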
Using this query, you get a list of all articles plus a column containing the sum of the associated vote values.
SELECT s.*, SUM(v.vote_value) AS votes_total
FROM stories AS s INNER JOIN votes AS v
ON v.item_name = s.id
GROUP BY s.id
This way, you can create a view from which you can filter on votes_total >= 10, without need of the cron job.
Or you can use it as a normal query, something like this:
SELECT * FROM (
    SELECT s.*, SUM(v.vote_value) AS votes_total
    FROM stories AS s INNER JOIN votes AS v
    ON v.item_name = s.id
    GROUP BY s.id
) AS t WHERE votes_total >= 10;
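If you go the view route, it could look like this (the view name is an example; selecting s.* with GROUP BY s.id requires id to be the primary key of stories, which MySQL 5.7+ accepts):

CREATE VIEW story_vote_totals AS
SELECT s.*, SUM(v.vote_value) AS votes_total
FROM stories AS s
INNER JOIN votes AS v ON v.item_name = s.id
GROUP BY s.id;

SELECT * FROM story_vote_totals WHERE votes_total >= 10;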
I would use a trigger (an insert trigger) and handle your logic there, in the database itself. This would remove the polling code (the cron job) altogether.
I would also make your foreign key (in VOTES) the same type as the primary key (in STORIES).
Using a trigger instead of polling will be much cleaner in the long run.
You don't specify your database, but in TSQL (for SQL Server) it could be close to this
CREATE TRIGGER myTrigger
ON VOTES
FOR INSERT
AS
DECLARE @I INT --HOLDS COUNT OF VOTES
DECLARE @IN VARCHAR(255) --HOLDS FK ID FOR LOOKUP INTO STORIES IF UPDATE REQUIRED
SELECT @IN = ITEM_NAME FROM INSERTED --NOTE: assumes single-row inserts; INSERTED can hold many rows
SELECT @I = COUNT(*) FROM VOTES WHERE ITEM_NAME = @IN
IF (@I >= 10)
BEGIN
    UPDATE STORIES SET SHOWING = 1 WHERE ID = @IN --This is why your PK/FK should be refactored
END
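Since the rest of this thread looks like MySQL, a rough MySQL sketch of the same idea follows. It keeps a running total in a new vote_total column on stories (an assumption, not an existing column), so the trigger does not have to re-sum the votes table on every insert:

ALTER TABLE stories ADD COLUMN vote_total INT NOT NULL DEFAULT 0;

DELIMITER //
CREATE TRIGGER votes_after_insert
AFTER INSERT ON votes
FOR EACH ROW
BEGIN
    -- showing is decided from the old total plus the new vote,
    -- then the running total itself is updated
    UPDATE stories
    SET showing    = IF(vote_total + NEW.vote_value >= 10, 1, showing),
        vote_total = vote_total + NEW.vote_value
    WHERE id = NEW.item_name;
END//
DELIMITER ;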