I have a system in which a store is an account and customers shop in those stores. There is a table that stores the many-to-many association of customers and stores; its key attributes are accountid, customerid and last_visit_date. For a set of accountids, I need to find the most recent visit of each customer. I have a query that returns the correct result but seems to be inefficient, because it runs out of memory at around 21,000 customers.
SELECT ac.customerId
FROM account_customer ac
INNER JOIN (
    SELECT customerId, MAX(last_visit_date) AS LastVisitDate
    FROM account_customer
    WHERE accountId IN (311,307,318,320,321,322,323,332,347,439,519,630,634,643)
    GROUP BY customerId
) grouped_ac
    ON ac.customerId = grouped_ac.customerId
    AND ac.last_visit_date = grouped_ac.LastVisitDate
    AND ac.last_visit_date <= '2016-10-18'
    OR ac.last_visit_date is null
When I run the above query, it gives me the correct result for a smaller dataset, but for a larger dataset I get a memory error. I am not even talking about a very large set, just around 20,000+ customers.
Any help would be appreciated.
Do you possibly mean:
ac.customerId = grouped_ac.customerId
AND ac.last_visit_date = grouped_ac.LastVisitDate
and (ac.last_visit_date <= '2016-10-18' or ac.last_visit_date is null)
I think without the parentheses, the query may be returning all records where the last_visit_date is null.
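For concreteness, here is the full query with that grouping applied (a sketch reusing the tables and dates from the question):
SELECT ac.customerId
FROM account_customer ac
INNER JOIN (
    SELECT customerId, MAX(last_visit_date) AS LastVisitDate
    FROM account_customer
    WHERE accountId IN (311,307,318,320,321,322,323,332,347,439,519,630,634,643)
    GROUP BY customerId
) grouped_ac
    ON ac.customerId = grouped_ac.customerId
    AND ac.last_visit_date = grouped_ac.LastVisitDate
    AND (ac.last_visit_date <= '2016-10-18' OR ac.last_visit_date IS NULL)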
Take a look at the answer to How exactly does using OR in a MySQL statement differ with/without parentheses?.
I want to fetch records from MySQL from last to first (LIMIT 20). My database has over 1M records. I am aware of ORDER BY, but from my understanding, when using ORDER BY it takes forever to load just 20 records; I think MySQL fetches all the records before ordering them.
SELECT bookings.created_at, bookings.total_amount,
passengers.name, passengers.id_number, payments.amount,
passengers.ticket_no,bookings.phone,bookings.source,
bookings.destination,bookings.date_of_travel FROM bookings
INNER JOIN passengers ON bookings.booking_id = passengers.booking_id
INNER JOIN payments on payments.booking_id = bookings.booking_id
ORDER BY bookings.booking_id DESC LIMIT 10
I suppose if you execute the query without the ORDER BY, the time would be satisfactory?
You might try creating an index on the column you are ordering by:
create index idx_bookings_booking_id on bookings(booking_id)
You can inspect the complexity of the query using EXPLAIN:
EXPLAIN SELECT bookings.created_at, bookings.total_amount,
passengers.name, passengers.id_number, payments.amount,
passengers.ticket_no,bookings.phone,bookings.source,
bookings.destination,bookings.date_of_travel FROM bookings
INNER JOIN passengers ON bookings.booking_id = passengers.booking_id
INNER JOIN payments on payments.booking_id = bookings.booking_id
ORDER BY bookings.booking_id DESC LIMIT 10
Then check that the proper index has been created on the table:
SHOW INDEX FROM `db_name`.`table_name`;
If the index is not there, create the proper index on the table.
Please add if anything is missing.
The index lookup table needs to be able to reside in memory, if I'm not mistaken (a filesort is much slower than an in-memory lookup).
Use small index / column sizes.
For double the range at the same size, use UNSIGNED columns if you don't need negative values.
Tune sort_buffer_size and read_rnd_buffer_size (probably better at the connection level than globally); see the sketch below.
See https://dev.mysql.com/doc/refman/5.7/en/order-by-optimization.html, particularly regarding using EXPLAIN and maybe trying another execution plan strategy.
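A minimal sketch of that connection-level tuning (the values here are placeholders, not recommendations):
-- Applies to the current connection only; pick sizes based on your workload.
SET SESSION sort_buffer_size = 4 * 1024 * 1024;      -- 4 MB
SET SESSION read_rnd_buffer_size = 2 * 1024 * 1024;  -- 2 MB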
Otherwise, you may need a workaround like a materialized view.
Tell me if this sounds like it:
Create another table like the bookings table, e.g. CREATE TABLE booking_short LIKE bookings, though you only need the booking_id column.
Then check your code for where exactly you create booking orders, i.e. where you first insert into bookings. There, run SELECT COUNT(*) FROM booking_short; if it is already at 20, delete the oldest record, then insert the new booking_id.
You can select the IDs from that small table and join from there for the details in the rest of the tables.
You won't need LIMIT or sorting.
Of course, this needs thorough documentation to avoid maintenance problems; a sketch follows.
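A minimal sketch of that idea (it assumes booking_id is an unsigned auto-increment integer; the trimming statement is one way to keep only the newest 20 ids):
-- A tiny side table that always holds only the newest booking ids.
CREATE TABLE booking_short (
    booking_id INT UNSIGNED NOT NULL PRIMARY KEY
);

-- Wherever the application inserts into bookings, also run:
INSERT INTO booking_short (booking_id) VALUES (LAST_INSERT_ID());

-- Trim to the newest 20 ids; the derived table works around MySQL's
-- restriction on deleting from a table referenced in a subquery.
DELETE FROM booking_short
WHERE booking_id < (
    SELECT MIN(booking_id) FROM (
        SELECT booking_id FROM booking_short
        ORDER BY booking_id DESC LIMIT 20
    ) AS newest
);
The report query can then start from booking_short and join bookings, passengers and payments on booking_id, with no ORDER BY or LIMIT on the big tables.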
Either that or https://stackoverflow.com/a/5912827/6288442
I need help with this MySQL query that takes too long to execute, or does not finish at all.
(What I am trying to do is part of a more complex problem: I want to create a PHP cron script that executes a few heavy queries, calculates data from the results, and then stores the calculated data in the database for more convenient use later. Most likely I will ask a question here about that process.)
First, let's try to solve one of the problems with these heavy queries.
Here is the thing:
I have a table: users_bonitet. This table has the fields: id, user_id, bonitet, tstamp.
First important note: when I say user, please understand that users are actually companies, not people. So user.id is the id of some company, but for other reasons the table I am using here is called "users".
The three key fields in the users_bonitet table are: user_id (referencing user.id), bonitet (representing the strength of a user; it can have 3 values, 1 - 2 - 3, where 3 is the best), and tstamp (storing the time of the bonitet insert; every time the bonitet value changes for some user, a new row is inserted with the tstamp of that insert and, of course, the new bonitet value). So basically a user can have a bonitet of 1, indicating that he is in a bad situation, but after some time it can change to 3, indicating that he is doing great, and the time of that change is stored in tstamp.
Now I will just list the other tables that we need in the query, and then I will explain why. The tables are: user, club, club_offer and club_territories.
Some users (companies) are members of a club. A member of the club can have some club offers (he presents his products to the public and other club members) and he operates on some territory.
What I need to do is get the bonitet value for every club offer (made by some user who is a member of a club), but only for the specific territory with id 1100000. Since bonitet values change over time for each user, I need to get only the latest one. So if some user had a bonitet of 1 on 21.01.2012, but later, on 26.05.2012, it changed to 2, I need to get only 2, since that is the current value.
I made an SQL Fiddle with an example db schema and the query that I am using right now. On this small database the query does what I want and is fast, but on the real database it is very slow, and sometimes does not execute at all.
See it here: http://sqlfiddle.com/#!9/b0d98/2
My question is: am I using the wrong query to get all this data? I am getting the right result, but maybe my query is bad and that is why it executes so slowly? How can I speed it up? I have tried adding indexes using phpMyAdmin, but it didn't help very much.
Here is my query:
SELECT users_bonitet.user_id, users_bonitet.bonitet, users_bonitet.tstamp,
       club_offer.id AS offerId, club_offer.rank
FROM users_bonitet
INNER JOIN (
    SELECT MAX(tstamp) AS lastDate, user_id
    FROM users_bonitet
    GROUP BY user_id
) lastDate ON users_bonitet.tstamp = lastDate.lastDate
          AND users_bonitet.user_id = lastDate.user_id
JOIN users ON users_bonitet.user_id = users.id
JOIN club ON users.id = club.user_id
JOIN club_offer ON club.id = club_offer.club_id
JOIN club_territories ON club.id = club_territories.club_id
WHERE club_territories.territory_id = 1100000
So I am selecting bonitet values for all club offers made by users that are members of a club and operate on the territory with id 1100000. The important thing is that I am selecting club_offer.id AS offerId, because I need that offerId in my application code so I can do some calculations based on the bonitet values returned for each offer, and insert the calculated data into the club_offer.rank field for each row with the id of offerId.
Your query looks fine. I suspect your query performance may be improved if you add a compound index to help the subquery that finds the latest entry in users_bonitet for each user.
The subquery is:
SELECT max( tstamp ) AS lastDate, user_id
FROM users_bonitet
GROUP BY user_id
If you add (user_id, tstamp) as an index to this table, that subquery can be satisfied with a very efficient loose index scan.
ALTER TABLE users_bonitet ADD KEY maxfinder (user_id, tstamp);
Notice that if this users_bonitet table had an auto-incrementing id number in it, your subquery could be refactored to use that instead of tstamp. That would eliminate the possibility of duplicates and be even more efficient, because there's a unique id for joining. Like so:
FROM users_bonitet
INNER JOIN (
    SELECT MAX(id) AS id
    FROM users_bonitet
    GROUP BY user_id
) ubmax ON users_bonitet.id = ubmax.id
In this case your compound index would be (user_id, id).
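The matching index statement would look like this (the key name is arbitrary):
ALTER TABLE users_bonitet ADD KEY maxfinder_id (user_id, id);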
Pro tip: Don't add lots of indexes unless you know you need them. It's a good idea to read up on how indexes can help you, for example: http://use-the-index-luke.com/
I have a database that is already in use and I have to improve the performance of the system that's using this database.
There are 2 major queries running about 1000 times in a loop, and these queries each have inner joins to 3 other tables. This in turn is making the system very slow.
I actually tried to remove the queries from the loop, fetch all the data only once, and process it in PHP. But this puts too much load on memory (RAM), and the system hangs if 2 or more clients try to use it.
There is a lot of data in the tables even after removing the expired data.
I have attached the queries below.
Can anyone help me with this issue?
select * from inventory
where (region_id = 38 or region_id = -1)
and (tour_opp_id = 410 or tour_opp_id = -1)
and room_plan_id = 141 and meal_plan_id = 1 and bed_type_id = 1 and hotel_id = 1059
and FIND_IN_SET(supplier_code, 'QOA,QTE,QM,TEST,TEST1,MQE1,MQE3,PERR,QKT')
and ( ('2014-11-14' between from_date and to_date) )
order by hotel_id desc ,supplier_code desc, region_id desc,tour_opp_id desc,inventory.inventory_id desc
SELECT *, pinfo.fri AS pi_day_fri, pinfoadd.fri AS pa_day_fri, pinfochld.fri AS pc_day_fri
FROM `profit_markup`
inner join profit_markup_info as pinfo on pinfo.profit_id = profit_markup.profit_markup_id
inner join profit_markup_add_info as pinfoadd on pinfoadd.profit_id = profit_markup.profit_markup_id
inner join profit_markup_child_info as pinfochld on pinfochld.profit_id = profit_markup.profit_markup_id
where profit_markup.hotel_id = 1059 and (`booking_channel` = 1 or `booking_channel` = 2)
and (`rate_region` = -1 or `rate_region` = 128)
and ( ( period_from <= '2014-11-14' and period_to >= '2014-11-14' ) )
ORDER BY profit_markup.hotel_id DESC,supplier_code desc, rate_region desc,operators_list desc, profit_markup_id DESC
Since we have not seen your SHOW CREATE TABLE output and EXPLAIN EXTENDED plan, it is hard to give you one definitive answer.
But generally speaking, in regard to your first query (which I re-wrote below):
SELECT
    hotel_id, supplier_code, region_id, tour_opp_id, inventory_id
FROM
    inventory
WHERE
    region_id IN (38, -1)
    AND tour_opp_id IN (410, -1)
    AND room_plan_id = 141
    AND meal_plan_id = 1
    AND bed_type_id = 1
    AND hotel_id = 1059
    AND supplier_code IN ('QOA', 'QTE', 'QM', 'TEST', 'TEST1', 'MQE1', 'MQE3', 'PERR', 'QKT')
    AND ('2014-11-14' BETWEEN from_date AND to_date)
ORDER BY
    hotel_id DESC, supplier_code DESC, region_id DESC, tour_opp_id DESC, inventory_id DESC
Do not use * to get all the columns. You should list only the columns that you really need. Using * is just a lazy way of writing a query; limiting the columns limits the amount of data being selected.
How often are the records in inventory updated/inserted/deleted? If not too often, you can consider using SQL_CACHE. However, caching a query will cause you problems if the inventory table is updated very often. In addition, to use the query cache you must check the value of query_cache_type on your server: SHOW GLOBAL VARIABLES LIKE 'query_cache_type';. If it is set to 0, the cache feature is disabled and SQL_CACHE will be ignored. If it is set to 1, the server will cache all queries unless you tell it not to using SQL_NO_CACHE. If it is set to 2, MySQL will cache the query only where the SQL_CACHE clause is used. Here is the documentation about query_cache_type.
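For example, with query_cache_type set to 2 (DEMAND) the hint would be used like this (a sketch against the inventory query above; note that the query cache, and with it SQL_CACHE, was removed in MySQL 8.0):
SELECT SQL_CACHE
    hotel_id, supplier_code, region_id, tour_opp_id, inventory_id
FROM inventory
WHERE region_id IN (38, -1)
  AND tour_opp_id IN (410, -1)
  AND ('2014-11-14' BETWEEN from_date AND to_date);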
It will help if you have an index on the following columns, in this order: (hotel_id, supplier_code, region_id, tour_opp_id, inventory_id)
ALTER TABLE inventory
ADD INDEX (hotel_id, supplier_code, region_id, tour_opp_id, inventory_id);
If possible, increase sort_buffer_size on your server, as most likely your issue here is that you are doing too much sorting.
As for the second query (which I re-wrote below):
SELECT
*, pinfo.fri as pi_day_fri,
pinfoadd.fri as pa_day_fri,
pinfochld.fri as pc_day_fri
FROM
profit_markup
INNER JOIN
profit_markup_info AS pinfo ON pinfo.profit_id = profit_markup.profit_markup_id
INNER JOIN
profit_markup_add_info AS pinfoadd ON pinfoadd.profit_id = profit_markup.profit_markup_id
INNER JOIN
profit_markup_child_info AS pinfochld ON pinfochld.profit_id = profit_markup.profit_markup_id
WHERE
profit_markup.hotel_id = 1059
AND booking_channel IN (1, 2)
AND rate_region IN (-1, 128)
AND period_from <= '2014-11-14'
AND period_to >= '2014-11-14'
ORDER BY
profit_markup.hotel_id DESC, supplier_code DESC, rate_region DESC,
operators_list DESC, profit_markup_id DESC
Again, eliminate the use of * from your query.
Make sure that the following columns have the same type/collation and the same size: pinfo.profit_id, profit_markup.profit_markup_id, pinfoadd.profit_id and pinfochld.profit_id, and that each one has an index in its table. If the columns have different types, MySQL will have to convert the data every time to join the records; even with an index it will be slower. Also, if those columns are character types (i.e. VARCHAR()), make them CHAR() with a latin1_general_ci collation, as this is faster for finding IDs; using INT() is better still.
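A hypothetical sketch of checking and aligning those join columns (the ALTER assumes the ids fit in INT UNSIGNED; adjust to your actual schema):
-- Compare the column definitions first:
SHOW CREATE TABLE profit_markup;
SHOW CREATE TABLE profit_markup_info;

-- If, say, profit_markup_info.profit_id were a VARCHAR while
-- profit_markup.profit_markup_id is an INT, align and index it:
ALTER TABLE profit_markup_info
    MODIFY profit_id INT UNSIGNED NOT NULL,
    ADD INDEX (profit_id);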
Use the 3rd and 4th tricks I listed for the previous query.
Try using STRAIGHT_JOIN ("you must know what you're doing here or it will bite you!"). Here is a good thread about this: When to use STRAIGHT_JOIN with MySQL.
I hope this helps.
For the first query, I am not sure you can do much (assuming you have already indexed the fields you are ordering by) apart from replacing the * with column names (don't expect this to increase performance drastically).
For the second query: before you go through the loop and put in the selection arguments, you could create a view with all the tables joined and ordered, then make a prepared statement that selects from the view and bind the arguments in the loop.
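A sketch of that idea (the view name profit_markup_joined is made up; the joins come from the second query):
CREATE VIEW profit_markup_joined AS
SELECT pm.*, pinfo.fri AS pi_day_fri, pinfoadd.fri AS pa_day_fri, pinfochld.fri AS pc_day_fri
FROM profit_markup AS pm
INNER JOIN profit_markup_info AS pinfo ON pinfo.profit_id = pm.profit_markup_id
INNER JOIN profit_markup_add_info AS pinfoadd ON pinfoadd.profit_id = pm.profit_markup_id
INNER JOIN profit_markup_child_info AS pinfochld ON pinfochld.profit_id = pm.profit_markup_id;

-- Prepare once, then execute with fresh arguments on each loop iteration:
PREPARE sel FROM 'SELECT * FROM profit_markup_joined
                  WHERE hotel_id = ? AND rate_region IN (-1, ?)
                    AND period_from <= ? AND period_to >= ?';
SET @hotel = 1059, @region = 128, @d = '2014-11-14';
EXECUTE sel USING @hotel, @region, @d, @d;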
Also, if your PHP server and the database server are on two different machines, it is better to do the selection through a stored procedure in the database.
(If nothing works out, then memcache is the way to go... Although I have personally never done this)
Note that this is about improving query performance, not overall database performance.
For both queries, first check whether indexes are available on the WHERE and ON (join) clause columns; if an index is missing, you have to add it to improve query performance.
Check the EXPLAIN plan before creating an index.
If possible, show us the EXPLAIN plan of both queries; that will help.
I'm having a terrible time with a MySQL query. I've spent most of my weekend and most of my day today attempting to make this query run a bit faster. I've made it considerably faster, but I know I can make it better.
SELECT m.id, other_fields, C.contacts_count
FROM marketingDatabase AS m
LEFT OUTER JOIN (
    SELECT COUNT(*) AS contacts_count, rid
    FROM contacts
    WHERE status = 'Active' AND install_id = 'XXXX'
    GROUP BY rid
) AS C ON C.rid = m.id
WHERE (RAND() * 2612 < 50)
  AND do_not_call != 'true'
  AND `ACTUAL SALES VOLUME` >= '800000'
  AND `ACTUAL SALES VOLUME` <= '1200000'
  AND status = 'Pending'
  AND install_id = 'XXXXX'
ORDER BY RAND()
I have an index on 'install_id', 'category' and 'status', but the EXPLAIN shows it was sorting over 9,100 rows.
My Explain is here:
https://s3.amazonaws.com/jas-so-question/Screen+Shot+2012-03-13+at+12.34.04+AM.png
Anybody have any suggestions on what I can do to make this a bit faster? The entire point of the query is to select a random record from an account's records (install_id) that matches certain criteria like sales volume, status and do_not_call. I'm currently gathering 25 records and caching it (using PHP) so I only have to run this query once every 25 requests, but I'm already dealing with thousands of requests per day. It currently takes 0.2 seconds to run. I realize that by using ORDER BY RAND() I'm already taking a major performance hit, but it's just sorting 25 rows.
Thanks in advance for the help.
**EDIT: I forgot to mention that the 'contact_sort' index is on the 'contacts' table, and indexes install_id, status, and rid. (rid references the Record ID in marketingDatabase so it knows which record a contact belongs to.)
**EDIT 2: The 2612 number in the query represents the number of rows in marketingDatabase that match the criteria (install_id, status, actual sales volume, etc.)
Since I do not see your index definitions, I am not sure they are correct. The query would benefit from the following indexes:
a composite index (install_id, status, rid) on the contacts table
a composite index (install_id, status, `ACTUAL SALES VOLUME`) on marketingDatabase
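For example (the index names are arbitrary):
ALTER TABLE contacts ADD INDEX idx_contacts_lookup (install_id, status, rid);
ALTER TABLE marketingDatabase ADD INDEX idx_marketing_lookup (install_id, status, `ACTUAL SALES VOLUME`);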
I played around with a few queries, and I don't think you'll ever get an indexed query to work with RAND(), especially when you're using it in both a WHERE clause and an ORDER BY clause. If at all possible, I'd introduce the random element in the PHP logic, and look at whether two simple queries make more sense than one fairly complex one. Added to that, you have a LEFT OUTER JOIN onto a random result set, which may also be increasing the amount of work that has to be done.
In summary, my guess would be: rewrite to exclude RAND(), and see if you can get rid of the LEFT OUTER JOIN. Two straightforward indexed queries with a bit of PHP in between may be a lot better.
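A sketch of what that two-query version might look like (the OFFSET value 42 stands in for one your PHP picks, e.g. with mt_rand(0, $count - 1)):
-- Step 1: count the matching rows with a cheap indexed query.
SELECT COUNT(*)
FROM marketingDatabase
WHERE install_id = 'XXXXX'
  AND status = 'Pending'
  AND do_not_call != 'true'
  AND `ACTUAL SALES VOLUME` BETWEEN 800000 AND 1200000;

-- Step 2: fetch one row at a random offset chosen by the application.
SELECT id
FROM marketingDatabase
WHERE install_id = 'XXXXX'
  AND status = 'Pending'
  AND do_not_call != 'true'
  AND `ACTUAL SALES VOLUME` BETWEEN 800000 AND 1200000
LIMIT 1 OFFSET 42;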
At the moment, I select rows from table01 and table02 using:
SELECT t1.*,t2.* FROM table01 AS t1
INNER JOIN table02 AS t2 ON (t1.ID = t2.t1ID)
WHERE t1.UUID = 'whatever';
The UUID column is a unique index, type char(15), with alphanumeric input. I know this isn't the fastest way to select data from the database, but the UUID is the only row identifier that is available to the front-end.
Since I have to select by UUID and not ID, I need to know which of these two options I should go for if, say, the table consists of 100,000 rows. What speed differences would I be looking at, and would the index on the UUID grow too large and slow down the DB?
Get the ID before doing the "big" select
1. $id = SELECT ID FROM table01 WHERE UUID = '{alphanumeric character}';
2. SELECT t1.*,t2.* FROM table01 AS t1
INNER JOIN table02 AS t2 ON (t1.ID = t2.t1ID)
WHERE t1.ID = $id;
Or keep it the way it is now, using the UUID.
2. SELECT t1.*,t2.* FROM table01 AS t1
INNER JOIN table02 AS t2 ON (t1.ID = t2.t1ID)
WHERE t1.UUID = 'whatever';
Side note: All new rows are created by checking whether the system-generated unique id already exists before trying to insert a new row, keeping the column always unique.
Why not just try it out? Create a new DB with those tables. Write a quick PHP script to populate the tables with more records than you can imagine being stored (if you're expecting 100k rows, insert 10 million). Then experiment with different indexes and queries (remember, EXPLAIN is your friend)...
When you finally get something you think works, put the query into a script on a webserver and hit it with ab (Apache Bench). You can watch what happens as you increase the concurrency of the requests (1 at a time, 2 at a time, 10 at a time, etc).
All this shouldn't take too long (maybe a few hours at most), but it will give you a FAR better answer than anyone at SO could for your specific problem (as we don't know your DB server config, exact schema, memory limits, etc)...
The second solution has the best performance. You need to look up the row by UUID in both solutions, but in the first solution you do that lookup first and then a second, faster lookup by primary key. Since the UUID lookup has already found the right row, it doesn't matter that the second lookup is faster: the second lookup is unnecessary altogether.
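You can verify this with EXPLAIN (assuming the UNIQUE index on UUID described in the question):
EXPLAIN SELECT t1.*, t2.*
FROM table01 AS t1
INNER JOIN table02 AS t2 ON (t1.ID = t2.t1ID)
WHERE t1.UUID = 'whatever';
-- With a UNIQUE index on table01.UUID, the t1 row should show type = const,
-- i.e. a single-row lookup: the same cost the separate ID pre-fetch would pay anyway.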