I am running a complex LEFT JOIN query of two tables.
Table A - 1.6 million rows
Table B - 700k rows.
All columns are indexed.
I tried various debugging approaches but had no success finding the problem, since I don't think it's that much data.
Anyway, I found that there is no problem if I remove the WHERE clause from my query.
But even this simple query on table A returns "Lost connection":
SELECT id FROM table_A ORDER BY id LIMIT 10
What is the best practice for running this query? I don't want to exceed the timeout.
Are my tables too big? Should I "empty" the old data or something?
How do you handle big tables with millions of rows and JOINs? The only thing I know of that helps is indexing, and I've already done that.
A million rows -- not a problem; a billion rows -- then it gets interesting. Your tables are not "too big".
"All columns are indexed." -- Usually a mistake. We need to see the actual query before commenting on what index(es) would be useful.
Possibly you need a "composite" index.
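To illustrate what a composite index buys you, here is a minimal sketch using Python's built-in sqlite3 as a stand-in for MySQL (the plan wording differs from MySQL's EXPLAIN). The table table_a and its status/created columns are hypothetical: a single index on (status, created) serves both the WHERE filter and the ORDER BY, so the plan should show no separate sort step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: filtered on one column, sorted on another.
conn.execute("CREATE TABLE table_a (id INTEGER PRIMARY KEY, status INTEGER, created TEXT)")
# One composite index covering the WHERE column first, then the ORDER BY column.
conn.execute("CREATE INDEX idx_status_created ON table_a (status, created)")


def plan(sql, params=()):
    """Return the human-readable detail strings of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the detail text


composite_plan = plan(
    "SELECT id FROM table_a WHERE status = ? ORDER BY created LIMIT 10", (1,)
)
print(composite_plan)
```

With separate single-column indexes on status and created, the engine can generally use only one of them for this query and may need an extra sort; the composite index avoids that.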
SELECT id FROM table_A ORDER BY id LIMIT 10 -- If there is an index starting with id, that will return nearly instantly. Please provide SHOW CREATE TABLE table_A so we can see the schema.
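A quick way to see the effect of an index on ORDER BY id LIMIT 10 is to compare query plans. This sketch again uses SQLite as a stand-in for MySQL, with two hypothetical tables: when id is indexed, rows can be read in id order and the scan stops after 10; without an index, the engine has to sort first (the TEMP B-TREE step).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: id is the primary key (indexed) in one, a plain column in the other.
conn.execute("CREATE TABLE with_index (id INTEGER PRIMARY KEY, payload TEXT)")
conn.execute("CREATE TABLE without_index (id INTEGER, payload TEXT)")


def plan(sql):
    """Return the detail strings of EXPLAIN QUERY PLAN for a statement."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


indexed_plan = plan("SELECT id FROM with_index ORDER BY id LIMIT 10")
unindexed_plan = plan("SELECT id FROM without_index ORDER BY id LIMIT 10")
print(indexed_plan)
print(unindexed_plan)
```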
I'm building a sales system for a company, but I'm stuck with the following issue.
Every day I load an .XML product feed into a database called items. The rows in the product feed are never in the same order, so sometimes the row with Referentie = 380083 is at the very top, and the next day that very same row is at the very bottom.
I also need to get all the instock values, but when I run the following query:
SELECT `instock` FROM SomeTable WHERE `id` > 0
I get all the values, but not in the same order as in the other table.
So I have to get the instock value of all rows where referentie in table A is the same as it is in table B.
I already have this query:
select * from `16-11-23 wed 09:37` where `referentie` LIKE '4210310AS'
This query does the right job, but I have about 500 rows in the table.
So I need a way to automate the LIKE '4210310AS' part so that it selects all 500 values in one go.
Can somebody tell me how that can be done?
I'm not even sure I understand your problem...
Don't take this personally, but you seem to be concerned/confused by the ordering of the data in the tables which suggests to me your understanding of relational databases and SQL is lacking. I suggest you brush up on the basics.
Can't you just use the following query?
SELECT a.referentie
, b.instock
FROM tableA a
, tableB b
WHERE b.referentie = a.referentie
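The join matches rows by their referentie value, so the physical order of rows in either table does not matter. A minimal sketch with SQLite standing in for MySQL, using hypothetical sample data in which tableB is loaded in a different order than tableA:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (referentie TEXT)")
conn.execute("CREATE TABLE tableB (referentie TEXT, instock INTEGER)")
conn.executemany("INSERT INTO tableA VALUES (?)", [("380083",), ("4210310AS",), ("555",)])
# Same keys, deliberately inserted in a different order, like a reshuffled daily feed.
conn.executemany(
    "INSERT INTO tableB VALUES (?, ?)", [("555", 7), ("380083", 0), ("4210310AS", 12)]
)

# Rows are paired by value, not by position; ORDER BY only makes the output stable.
rows = conn.execute(
    """
    SELECT a.referentie, b.instock
    FROM tableA a
    JOIN tableB b ON b.referentie = a.referentie
    ORDER BY a.referentie
    """
).fetchall()
print(rows)
```

No LIKE is needed: every referentie in tableA is matched against tableB in one query.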
I'm using PDO, and I need to know how many rows a SELECT statement returns. My question is: is the following slower, the same, or faster than doing it in two queries? phpMyAdmin tells me how long the whole SELECT statement takes, but not the COUNT part on its own, so I'm having trouble telling how long each query takes.
Query in question:
SELECT *, (SELECT COUNT(*) from table) AS count FROM table
Faster, same or slower than splitting it into two queries?
Thanks.
You can write this query as:
SELECT t.*, const.totalcount
FROM table t cross join
(select count(*) as totalcount from table) const;
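A small sketch comparing the single-query form with the two-query form, using SQLite in place of MySQL and a hypothetical table named some_table (table itself is a reserved word in MySQL). It only checks that both approaches return the same rows and count; it does not measure speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO some_table (name) VALUES (?)", [("a",), ("b",), ("c",)])

# Single query: the derived table returns one row, which CROSS JOIN attaches to every row.
combined = conn.execute(
    """
    SELECT t.id, t.name, const.totalcount
    FROM some_table t
    CROSS JOIN (SELECT COUNT(*) AS totalcount FROM some_table) const
    """
).fetchall()

# Two queries: fetch the rows, then the count separately.
rows = conn.execute("SELECT id, name FROM some_table").fetchall()
total = conn.execute("SELECT COUNT(*) FROM some_table").fetchone()[0]

print(combined)
print(rows, total)
```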
This may or may not be faster than running two queries. Two queries involve "query running" overhead -- compiling each query and transmitting data back and forth. On the other hand, this version adds another column to every row, so it increases the total amount of data in the result set.
Two queries are going to be faster. If MySQL executes this as a dependent subquery, it will run once for every record in the parent. If it's a MyISAM table, the subquery will be very fast and you may not notice it with a small number of records.
Do an EXPLAIN on it and see what MySQL reports back.
I have a tableA with the following structure:
I restructured it into tableB, shown below, to reduce the number of rows; the set of categories is fixed.
Assume tableA holds 21 lakh (2.1 million) rows; after conversion to the new structure, tableB contains only 70k rows.
In some cases I want to SUM all the values in the table:
QUERY1: SELECT SUM(val) AS total FROM tableA;
vs
QUERY2: SELECT SUM(cate1+cate2+cate3) AS total FROM tableB;
QUERY1 executes faster than QUERY2, even though tableB contains fewer rows than tableA.
I expected QUERY2 to be faster, but QUERY1 is the faster one.
Can you help me understand why QUERY2 performs worse?
MySQL is optimized to speed up relational operations. There is not so much effort at speeding up the other kinds of operations MySQL can perform. Cate1+Cate2+Cate3 is a perfectly legitimate operation, but there's nothing particularly relational about it.
tableA is actually simpler than tableB in terms of the relational model of data, even though tableA has more rows. It's worth noting in passing that tableA conforms to first normal form but tableB does not. Those three columns are really a repeating group, even though it's been made to look like they are not.
So First Normal form is good for you in terms of performance (most of the time).
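To make the comparison concrete, here is a sketch with SQLite standing in for MySQL. The grp and category columns and the sample values are hypothetical, since the original table structures are not shown. It builds the long tableA, pivots it into the wide tableB, and runs both SUM queries: both return the same total, but the wide version has to evaluate cate1 + cate2 + cate3 for each row before summing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Long (1NF) layout: one row per (group, category) value.
conn.execute("CREATE TABLE tableA (grp INTEGER, category TEXT, val INTEGER)")
# Wide layout: the fixed categories folded into columns (a repeating group).
conn.execute("CREATE TABLE tableB (grp INTEGER, cate1 INTEGER, cate2 INTEGER, cate3 INTEGER)")

long_rows = [
    (1, "cate1", 10), (1, "cate2", 20), (1, "cate3", 30),
    (2, "cate1", 5), (2, "cate2", 0), (2, "cate3", 7),
]
conn.executemany("INSERT INTO tableA VALUES (?, ?, ?)", long_rows)

# Pivot long rows into the wide table with conditional aggregation.
conn.execute(
    """
    INSERT INTO tableB
    SELECT grp,
           SUM(CASE WHEN category = 'cate1' THEN val END),
           SUM(CASE WHEN category = 'cate2' THEN val END),
           SUM(CASE WHEN category = 'cate3' THEN val END)
    FROM tableA
    GROUP BY grp
    """
)

query1 = conn.execute("SELECT SUM(val) AS total FROM tableA").fetchone()[0]
query2 = conn.execute("SELECT SUM(cate1 + cate2 + cate3) AS total FROM tableB").fetchone()[0]
print(query1, query2)
```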
In your first query, MySQL just needs to do the summation (1 step).
In your second query, MySQL first needs to add the three columns for each row, then sum the results (2 steps).
I have to select 4 random rows from a column.
Is it better to randomly generate 4 ids and perform 4 requests like 'select column from database where id = ...',
or to select all the rows in one request and pick afterwards?
If you can generate random IDs that are guaranteed to exist, I think the best approach is a clause like WHERE id IN (id1, id2, id3, id4). That gets all 4 records in one query, so no unnecessary queries are run and no unnecessary records are fetched.
As said before, WHERE id IN (id1, id2, id3, id4) is the fastest way from MySQL's perspective. However, you will need some logic in the application to generate those IDs: all 4 IDs must exist, be randomly distributed, and contain no duplicates. In the worst case you will be retrieving a list of all existing IDs with a huge query, extracting 4 random values, and querying again.
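The application-side approach described above can be sketched like this, with SQLite in place of MySQL and Python in place of PHP. The table foobar, with gaps in its ids, is hypothetical; random.sample guarantees 4 distinct ids, and the final fetch is a single parameterized IN query.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foobar (id INTEGER PRIMARY KEY, col TEXT)")
# Hypothetical data with gaps in the id sequence, e.g. after deletes.
conn.executemany(
    "INSERT INTO foobar (id, col) VALUES (?, ?)",
    [(i, f"value {i}") for i in range(1, 101) if i % 7 != 0],
)

# Step 1: list every existing id (the "huge query" on a big table).
existing_ids = [row[0] for row in conn.execute("SELECT id FROM foobar")]
# Step 2: pick 4 distinct ids; random.sample never returns duplicates.
chosen = random.sample(existing_ids, 4)
# Step 3: fetch all 4 rows in one round trip with an IN list of placeholders.
placeholders = ", ".join("?" for _ in chosen)
picked = conn.execute(
    f"SELECT id, col FROM foobar WHERE id IN ({placeholders})", chosen
).fetchall()
print(picked)
```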
With all that logic to be done, it can be wise to move selection into MySQL:
SELECT * FROM foobar
ORDER BY RAND()
LIMIT 4;
Be aware that this is slow in MySQL, but you gain speed in the application logic and can be sure that the random rows are drawn evenly from your whole table.
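For comparison, here is the same selection done entirely in the database. SQLite, used as a stand-in, spells MySQL's RAND() as RANDOM(); the table and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foobar (id INTEGER PRIMARY KEY, col TEXT)")
conn.executemany("INSERT INTO foobar (col) VALUES (?)", [(f"value {i}",) for i in range(1000)])

# Every row is read and given a random sort key (why this is slow on big tables);
# only the first 4 rows of that random order are returned.
picked = conn.execute("SELECT * FROM foobar ORDER BY RANDOM() LIMIT 4").fetchall()
print(picked)
```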
EDIT:
A comment asks whether PHP is faster than MySQL at this task. The answer is no.
It is not done by "using rand". You need an array in PHP containing all those IDs. That means a huge query, lots of TCP traffic, and a huge array to be built by the Zend Engine. Then, with the chosen IDs, you must fire a second query to fetch the rows for those IDs.
Although the RAND() function may be slow, so far I have not had significant problems with speed. My strategy is to join the table back to a subquery on itself that returns a limited list of random IDs.
SELECT *
FROM table AS t1
JOIN (
SELECT rowID
FROM table
ORDER BY RAND()
LIMIT 4
) AS t2
ON t1.rowID = t2.rowID
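A sketch of the join-back approach, again in SQLite with RANDOM() in place of RAND(). The table items and its row_id column are hypothetical (row_id plays the role of rowID, renamed to avoid confusion with SQLite's built-in rowid). Only the narrow id column is shuffled in the subquery; the join then fetches the full rows for the 4 chosen ids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (row_id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO items (title, body) VALUES (?, ?)",
    [(f"title {i}", "x" * 200) for i in range(500)],
)

picked = conn.execute(
    """
    SELECT t1.*
    FROM items AS t1
    JOIN (
        SELECT row_id          -- sort only the ids, not the wide rows
        FROM items
        ORDER BY RANDOM()
        LIMIT 4
    ) AS t2 ON t1.row_id = t2.row_id
    """
).fetchall()
print([row[:2] for row in picked])
```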
A more robust solution also exists; try checking out this question (asked in 2010).
I'm trying to optimize a report query run on an ecommerce site. I'm pretty sure that I'm doing something stupid, since this query shouldn't be taking nearly as long to run as it does.
The query in question is:
SELECT inventories_name, inventories_code, SUM(shop_orders_inventories_qty) AS qty,
SUM(shop_orders_inventories_price) AS tot_price, inventories_categories_name,
inventories_price_list, inventories_id
FROM shop_orders
LEFT JOIN shop_orders_inventories ON (shop_orders_id = join_shop_orders_id)
LEFT JOIN inventories ON (join_inventories_id = inventories_id)
WHERE {$date_type} BETWEEN '{$start_date}' AND '{$end_date}'
AND shop_orders_x_response_code = 1
GROUP BY join_inventories_id, join_shop_categories_id
{$order}
{$limit}
It's basically trying to get total sales per item over a period of time; the values in curly braces are filled in from a form. It works fine for a period of a couple of days, but querying an interval of a week or more can take 30+ seconds.
I feel like it's joining way too many rows in order to calculate the aggregate values and sucking up huge amounts of memory, but I'm not sure how to limit it.
Note - I realize that I'm selecting fields which aren't in the group by, but they correspond 1-1 with inventory ID, which is in the group by.
Any suggestions?
-- Edit --
The current indices are:
inventories:
  join_categories - BTREE
  inventories_name, inventories_code, inventories_description - FULLTEXT
shop_orders_inventories:
  shop_orders_inventories_id - BTREE
shop_orders:
  shop_orders_id - BTREE
Two sequential LEFT JOINs will take quite long on a big table. Try using JOIN instead of LEFT JOIN (unless you have records in shop_orders with no matching records in shop_orders_inventories or inventories), or split this query into a couple of smaller ones. Also, by using SUM and GROUP BY you are forcing MySQL to create temporary tables; you might want to raise the in-memory temporary table limits (tmp_table_size and max_heap_table_size) so those tables fit in memory (otherwise MySQL will write them to disk, which also increases execution time).
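The LEFT JOIN caveat is easy to see on a toy data set. In this sketch (SQLite standing in for MySQL; order_date and the sample rows are hypothetical), order 3 has no line items: LEFT JOIN keeps it as a row full of NULLs, while JOIN drops it. If such orders cannot exist, or should not appear in a sales report anyway, JOIN returns the same useful rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE shop_orders (shop_orders_id INTEGER PRIMARY KEY, order_date TEXT);
    CREATE TABLE shop_orders_inventories (
        join_shop_orders_id INTEGER,
        join_inventories_id INTEGER,
        shop_orders_inventories_qty INTEGER
    );
    INSERT INTO shop_orders VALUES (1, '2024-01-02'), (2, '2024-01-03'), (3, '2024-01-04');
    -- Order 3 has no line items.
    INSERT INTO shop_orders_inventories VALUES (1, 10, 2), (2, 10, 1), (2, 11, 5);
    """
)


def row_count(join_kind):
    """Count joined rows for the given join keyword."""
    sql = f"""
        SELECT COUNT(*)
        FROM shop_orders
        {join_kind} shop_orders_inventories ON shop_orders_id = join_shop_orders_id
    """
    return conn.execute(sql).fetchone()[0]


left_rows = row_count("LEFT JOIN")   # keeps order 3 with NULL item columns
inner_rows = row_count("JOIN")       # drops order 3
print(left_rows, inner_rows)
```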
The first and foremost rule to indexing is... index the columns that you will search on!
For each possible value of {$date_type}, create an index for that date column.
Once you have lots of data in the table (say 2 years, or about 100 weeks), a single week's data is about 1% of the index, so the date index becomes a good starting point for the query.
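A sketch of the date index in action, with SQLite standing in for MySQL. order_date is a hypothetical example of one of the columns {$date_type} might name; the plan should show a SEARCH on that index for the BETWEEN range rather than a full table scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shop_orders (shop_orders_id INTEGER PRIMARY KEY, "
    "order_date TEXT, shop_orders_x_response_code INTEGER)"
)
# One index per date column the report can filter on.
conn.execute("CREATE INDEX idx_shop_orders_order_date ON shop_orders (order_date)")

date_plan = [
    row[-1]  # detail text of each plan step
    for row in conn.execute(
        "EXPLAIN QUERY PLAN SELECT shop_orders_id FROM shop_orders "
        "WHERE order_date BETWEEN ? AND ? AND shop_orders_x_response_code = 1",
        ("2024-01-01", "2024-01-07"),
    )
]
print(date_plan)
```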
Even though MySQL allows non-aggregated columns in the SELECT list that are not in the GROUP BY, I personally would keep the two in sync:
SELECT inventories_name, inventories_code,
SUM(shop_orders_inventories_qty) AS qty,
SUM(shop_orders_inventories_price) AS tot_price,
inventories_categories_name, inventories_price_list, inventories_id
FROM ...
GROUP BY inventories_id, join_shop_categories_id, inventories_name,
inventories_code, inventories_categories_name, inventories_price_list
...
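Putting the suggestions together, here is a trimmed, runnable sketch of the report query: inner joins, a date index, the date range passed as query parameters, and every non-aggregated column repeated in GROUP BY. It uses SQLite in place of MySQL and drops some of the original columns for brevity. The order_date column and the sample rows are hypothetical, and the {$order} and {$limit} parts are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE shop_orders (
        shop_orders_id INTEGER PRIMARY KEY,
        order_date TEXT,
        shop_orders_x_response_code INTEGER
    );
    CREATE TABLE shop_orders_inventories (
        join_shop_orders_id INTEGER,
        join_inventories_id INTEGER,
        join_shop_categories_id INTEGER,
        shop_orders_inventories_qty INTEGER,
        shop_orders_inventories_price REAL
    );
    CREATE TABLE inventories (
        inventories_id INTEGER PRIMARY KEY,
        inventories_name TEXT,
        inventories_code TEXT
    );
    CREATE INDEX idx_order_date ON shop_orders (order_date);
    INSERT INTO shop_orders VALUES (1, '2024-01-02', 1), (2, '2024-01-03', 1), (3, '2024-01-03', 2);
    INSERT INTO inventories VALUES (10, 'Lamp', 'L-10'), (11, 'Desk', 'D-11');
    INSERT INTO shop_orders_inventories VALUES
        (1, 10, 1, 2, 40.0), (2, 10, 1, 1, 20.0), (2, 11, 2, 5, 500.0), (3, 11, 2, 9, 900.0);
    """
)

report = conn.execute(
    """
    SELECT i.inventories_id, i.inventories_name, i.inventories_code,
           SUM(soi.shop_orders_inventories_qty)   AS qty,
           SUM(soi.shop_orders_inventories_price) AS tot_price
    FROM shop_orders o
    JOIN shop_orders_inventories soi ON o.shop_orders_id = soi.join_shop_orders_id
    JOIN inventories i ON soi.join_inventories_id = i.inventories_id
    WHERE o.order_date BETWEEN ? AND ?
      AND o.shop_orders_x_response_code = 1
    -- Every non-aggregated column in SELECT is also listed here.
    GROUP BY i.inventories_id, soi.join_shop_categories_id,
             i.inventories_name, i.inventories_code
    ORDER BY i.inventories_id
    """,
    ("2024-01-01", "2024-01-07"),
).fetchall()
print(report)
```

Order 3 is excluded by the response-code filter, so only the line items of orders 1 and 2 are aggregated.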