I have three tables, each containing some common information and some information that is unique to that table.
For example: uid and date are common to all three tables, but one table may have a column type while another has currency.
I need to query the database and get the 20 most recent entries (date DESC) across all three tables combined.
My options are:
Query the database once with one large query combining three SELECTs with UNION ALL, passing placeholder values for the columns each table doesn't have, e.g.:
FROM (
SELECT uid, date, currency, 0, 0, 0
and later on
FROM (
SELECT uid, date, 0, type, 0, 0
This would leave me with a lot of null-valued fields.
Or I can query the database three times and sort through the results in PHP to get the combined latest 20 posts. This would leave me with an excess of information - 60 posts to look through ((LIMIT 20) * 3) - and force me to perform some additional sort every time.
Which option is better, and are there any alternative ideas?
Thanks.
Those two options are more similar than you make it sound.
When you perform the single large query with UNIONs, MySQL will still be performing three separate queries, just as you propose doing in your alternative plan, and then combining them into a single result.
So, you can either let MySQL do the filtering (and LIMIT) for you, or you can do it yourself. Given that choice, letting MySQL do all the work sounds far preferable.
Having extra columns in the result set could theoretically hinder performance, but with so small a result set as your 20 rows, I wouldn't expect it to have any detectable impact.
It all depends on how big your tables are. If each table has a few thousand records, you can go with the first solution (UNION), and you'll be fine.
On bigger tables, I'd probably go with the second solution, mostly because it will use much less resources (RAM) than the UNION approach, and still be reasonably fast.
But I would advise you to think about your data model, and maybe optimize it. The fact that you have to use UNION-based queries usually means there's room for optimization, typically by merging the three tables, with an added "type" field (the name isn't great, but you see my point).
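For example, a rough sketch of what a merged table could look like (the table, column, and index names here are made up for illustration; entry_type distinguishes the three kinds of rows):
CREATE TABLE entries (
    uid        INT NOT NULL,
    date       DATETIME NOT NULL,
    entry_type VARCHAR(20) NOT NULL,  -- which of the three original kinds of entry this row is
    currency   CHAR(3) NULL,          -- only meaningful for one kind of entry
    type       VARCHAR(50) NULL,      -- only meaningful for another kind
    KEY idx_entries_date (date)
);

-- The original question then becomes a single indexed query:
SELECT uid, date, entry_type, currency, type
FROM entries
ORDER BY date DESC
LIMIT 20;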
If you know your limits, you can LIMIT each query and have the UNION run on only a little data. This should be better, as MySQL will return only 20 rows and will sort them faster than you can in PHP:
SELECT * FROM (
    (SELECT uid, date, currency, 0, 0, 0 FROM table_a ORDER BY date DESC LIMIT 20)
    UNION ALL
    (SELECT uid, date, 0, type, 0, 0 FROM table_b ORDER BY date DESC LIMIT 20)
    ...
) AS combined ORDER BY date DESC LIMIT 20
Related
We have records with a count field on a unique id.
The columns are:
mainId = unique
mainIdCount = 1320 (this 'views' field gets a + 1 when the page is visited)
How can you insert all these mainIdCounts as separate records into another table IN ANOTHER DATABASE in one query?
Yes, I do mean 1320 times an insert with the same mainId! :-)
We actually have records where an id has been counted over 10,000 times. It just has to be like this.
This is a weird one, but we really do need the copies of all these counts like this.
The most straightforward way to do this is with a JOIN operation between your table and another row source that provides a set of integers. We match each row from the original table to as many rows from the set of integers as needed to produce the desired result.
As a brief example of the pattern:
INSERT INTO newtable (mainId, n)
SELECT t.mainId
     , r.n
  FROM mytable t
  JOIN ( SELECT 1 AS n
         UNION ALL SELECT 2
         UNION ALL SELECT 3
         UNION ALL SELECT 4
         UNION ALL SELECT 5
       ) r
    ON r.n <= t.mainIdCount
If mytable contains the row mainId=5, mainIdCount=4, we'd get back the rows (5,1), (5,2), (5,3), (5,4).
Obviously, the rowsource r needs to be of sufficient size. The inline view I've demonstrated here would return a maximum of five rows. For larger sets, it would be beneficial to use a table rather than an inline view.
This leads to the follow-up question, "How do I generate a set of integers in MySQL?",
e.g. Generating a range of numbers in MySQL
And getting that done is a bit tedious. We're looking forward to an eventual feature in MySQL that will make it much easier to return a bounded set of integer values; until then, having a pre-populated table is the most efficient approach.
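As a minimal sketch of the pre-populated approach (the integers table name and the 0..9999 range are just examples):
-- A small helper table of integers, populated once up to the largest count you expect.
CREATE TABLE integers (
    n INT NOT NULL PRIMARY KEY
);

-- seed 0..9
INSERT INTO integers (n) VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

-- grow it to 0..99, then to 0..9999, by cross-joining the table to itself
INSERT INTO integers (n)
SELECT a.n * 10 + b.n FROM integers a CROSS JOIN integers b WHERE a.n > 0;

INSERT INTO integers (n)
SELECT a.n * 100 + b.n FROM integers a CROSS JOIN integers b WHERE a.n > 0;

-- The earlier query can then join against it instead of the inline view:
--   JOIN integers r ON r.n BETWEEN 1 AND t.mainIdCount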
I have to randomly select 4 rows from a table.
Is it better to randomly generate 4 ids and perform 4 requests: 'select column from database where id = ...'?
Or to select all the rows in one request and pick 4 afterwards?
If you are able to generate random existing ids, I think the best approach is to use a clause like where id in (id1, id2, id3, id4). This gets the 4 records in one query, so no unnecessary queries or records are fetched.
As mentioned before, where id in (id1, id2, id3, id4) is the fastest way from the MySQL perspective. However, you will need some logic in the application to generate those IDs: all 4 IDs must exist, be randomly distributed, and you want to avoid duplicates. In the worst case you will be retrieving a list of all existing IDs with a huge query, extracting 4 random values, and querying again.
With all that logic to be done, it can be wise to move the selection into MySQL:
SELECT * FROM foobar
ORDER BY RAND()
LIMIT 4;
You must understand that this is slow in MySQL, but you gain speed in the application logic and can be sure the random values are spread evenly over your table.
EDIT:
A comment asks whether PHP is faster at this task than MySQL. The answer is no.
It is not done by just "using rand()". You need an array containing all those IDs in PHP. That means a huge query, lots of TCP traffic, a huge array to be built in PHP, and a huge structure to be built by the Zend engine. Then, with the IDs, you must fire a second query to get the rows for those IDs.
Although the RAND() function may be slow, so far I have not had significant problems with speed. My strategy is actually to join the table back against a query of itself that returns a list of random IDs with a limit.
SELECT *
  FROM `table` AS t1
  JOIN (
        SELECT rowID
          FROM `table`
         ORDER BY RAND()
         LIMIT 4
       ) AS t2
    ON t1.rowID = t2.rowID
There is also a more robust solution - try checking out this question (asked in 2010).
I'm trying to get 4 random results from a table that holds approx 7 million records. Additionally, I also want to get 4 random records from the same table that are filtered by category.
Now, as you would imagine doing random sorting on a table this large causes the queries to take a few seconds, which is not ideal.
One other method I thought of for the non-filtered result set would be to have PHP pick some random numbers between 1 and 7,000,000 or so and then do an IN(...) in the query to grab only those rows - and yes, I know this method has the caveat that you may get fewer than 4 rows if a record with one of those ids no longer exists.
However, the above method obviously will not work with the category filtering, as PHP doesn't know which record numbers belong to which category and hence cannot choose which record numbers to pick.
Are there any better ways I can do this? The only way I can think of would be to store the record IDs for each category in another table, select random results from that, and then select only those record IDs from the main table in a secondary query; but I'm sure there is a better way!?
You could of course use the RAND() function on a query using a LIMIT and a WHERE (for the category). That, however, as you pointed out, entails a scan of the table, which takes time, especially in your case due to the volume of data.
Your other alternative, again as you pointed out, to store id/category_id in another table might prove a bit faster, but again there has to be a LIMIT and WHERE on that table, which will also contain the same number of records as the master table.
A different approach (if applicable) would be to have a table per category and store the IDs in it. If your categories are fixed or do not change that often, then you should be able to use that approach. In that case you effectively remove the WHERE clause, and a RAND() with a LIMIT on each category table would be faster since each category table contains only a subset of the records from your main table.
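A rough sketch of that idea, with made-up table names (category_12_ids holding only the IDs belonging to one category, main_table being the 7-million-row table):
-- Pick 4 random IDs from the much smaller per-category table...
SELECT id
  FROM category_12_ids
 ORDER BY RAND()
 LIMIT 4;

-- ...then fetch the full rows from the main table with the IDs returned above
-- (the numbers here are placeholders):
SELECT *
  FROM main_table
 WHERE id IN (1234, 98765, 400001, 6543210);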
Some other alternatives would be to use a key/value pair database just for that operation. MongoDB or Google AppEngine can help with that and are really fast.
You could also go towards the approach of a Master/Slave in your MySQL. The slave replicates content in real time but when you need to perform the expensive query you query the slave instead of the master, thus passing the load to a different machine.
Finally you could go with Sphinx which is a lot easier to install and maintain. You can then treat each of those category queries as a document search and let Sphinx randomize the results. This way you offset this expensive operation to a different layer and let MySQL continue with other operations.
Just some issues to consider.
Working off your random number approach:

1. Get the max ID in the database.
2. Create a temp table to store your matches.
3. Loop n times, doing the following:
   - Generate a random number between 1 and maxId.
   - Get the first record with an ID greater than the random number and insert it into your temp table.

Your temp table now contains your random results (a stored-procedure sketch of this loop follows below).
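As a minimal sketch of that loop as a MySQL stored procedure (the names myTable, ID and Category are placeholders carried over from the query below; duplicate picks are possible and not handled here):
DELIMITER //
CREATE PROCEDURE pick_random_rows(IN how_many INT, IN cat INT)
BEGIN
    DECLARE i INT DEFAULT 0;
    DECLARE max_id INT;

    SELECT MAX(ID) INTO max_id FROM myTable;

    DROP TEMPORARY TABLE IF EXISTS tmp_random;
    CREATE TEMPORARY TABLE tmp_random LIKE myTable;

    WHILE i < how_many DO
        -- grab the first row at or above a random point in the ID range
        INSERT INTO tmp_random
        SELECT *
          FROM myTable
         WHERE ID >= FLOOR(1 + RAND() * max_id)
           AND Category = cat
         ORDER BY ID
         LIMIT 1;
        SET i = i + 1;
    END WHILE;

    SELECT * FROM tmp_random;
END //
DELIMITER ;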
Or you could dynamically generate SQL with a UNION to do the query in one step.
(SELECT * FROM myTable WHERE ID >= FLOOR(1 + RAND() * (SELECT MAX(ID) FROM myTable)) AND Category = zzz ORDER BY ID LIMIT 1)
UNION
(SELECT * FROM myTable WHERE ID >= FLOOR(1 + RAND() * (SELECT MAX(ID) FROM myTable)) AND Category = zzz ORDER BY ID LIMIT 1)
UNION
(SELECT * FROM myTable WHERE ID >= FLOOR(1 + RAND() * (SELECT MAX(ID) FROM myTable)) AND Category = zzz ORDER BY ID LIMIT 1)
UNION
(SELECT * FROM myTable WHERE ID >= FLOOR(1 + RAND() * (SELECT MAX(ID) FROM myTable)) AND Category = zzz ORDER BY ID LIMIT 1)
Note: my SQL may not be exactly right, as I'm not a MySQL guy, but the theory should be sound.
First you need to get the number of rows in the category ... something like this:
select count(1) from tbl where category = ?
then pick a random offset
$offset = rand(0, $rowsNum - 1);
and select one row at that offset, filtering by the same category
SELECT * FROM tbl WHERE category = ? LIMIT $offset, 1
In this way you avoid missing IDs. The only problem is that you need to run the second query several times, once per row. A UNION may help in this case, as sketched below.
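For example, a rough sketch with four distinct offsets generated in PHP beforehand (the $offsetN values are placeholders):
(SELECT * FROM tbl WHERE category = ? LIMIT $offset1, 1)
UNION ALL
(SELECT * FROM tbl WHERE category = ? LIMIT $offset2, 1)
UNION ALL
(SELECT * FROM tbl WHERE category = ? LIMIT $offset3, 1)
UNION ALL
(SELECT * FROM tbl WHERE category = ? LIMIT $offset4, 1)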
For MySQL you can use
RAND()
SELECT column FROM table
ORDER BY RAND()
LIMIT 4
I wonder what is better for performance and programming style for a simple task: getting the counts for 2 values from one table (all the queries below do the same job).
Make 2 separate queries:
SELECT count(*) FROM `a` WHERE categories_id=2
SELECT count(*) FROM `a` WHERE group_id=92
or use subquery
SELECT (SELECT count(*) FROM `a` WHERE categories_id=2) AS categories
     , (SELECT count(*) FROM `a` WHERE group_id=92) AS groups
or union
SELECT count(*) FROM `a` WHERE categories_id=2
UNION
SELECT count(*) FROM `a` WHERE group_id=92
The main difference between the three is the handling of the result values, though that is not traumatic.
The first example returns the two values in two separate fetch operations (on separate statements).
The second example returns the two values as part of a single fetch operation.
The third example returns the two values in two separate fetch operations (on the same statement).
Performance-wise, with just two rows of data, there is very little to choose between the three. The second (two sub-query) solution does the most with a single statement, and only requires a single fetch operation, so it might be the quickest. The first requires separate parsing of two statements, plus two sets of operations, so it should be the slowest. But whether you can truly measure that depends on lots of factors. If the client is in Australia and the server is in Europe, then the round-trip latency is likely to mean that the second or third solution is best (and the difference may depend on whether the DBMS returns multiple rows with a single client-server message exchange). If the client is on the same machine as the server, then the round-trip latency is much less critical.
For ease of understanding, the UNION version is probably sufficiently clean; it won't confuse anyone reading it. The first version might be slightly cleaner (one keyword less) but the difference is minimal.
If the number of alternatives increases (more than one group_id value, or more than one categories_value), then I think the UNION wins on clarity:
SELECT 'G' AS type, group_id, COUNT(*)
FROM a
WHERE group_id IN (92, 104, 137, 291)
GROUP BY type, group_id
UNION
SELECT 'C' AS type, categories_id, COUNT(*)
FROM a
WHERE categories_id IN (2, 3, 13, 17, 19, 21)
GROUP BY type, categories_id
The 'type' column allows you to distinguish between a group ID and a category ID that share the same ID number (albeit that they are two different sorts of ID).
Because it is easier to expand, I'd probably go with option 3 (UNION) unless there were compelling timing experiments on live data showing that option 2 (sub-queries) was in fact quicker.
The first option, doing two SELECTs, will always be slightly less efficient as it involves an extra round trip to the database. Between the other two, the UNION version will in theory be ever so slightly slower, as UNION (unlike UNION ALL) forces the database to sort the values and eliminate duplicates. In practice, and for only two values, this isn't going to be measurable against the time spent doing the two main parts of the query.
I'm trying to optimize a report query run on an ecommerce site. I'm pretty sure that I'm doing something stupid, since this query shouldn't be taking nearly as long to run as it does.
The query in question is:
SELECT inventories_name, inventories_code, SUM(shop_orders_inventories_qty) AS qty,
SUM(shop_orders_inventories_price) AS tot_price, inventories_categories_name,
inventories_price_list, inventories_id
FROM shop_orders
LEFT JOIN shop_orders_inventories ON (shop_orders_id = join_shop_orders_id)
LEFT JOIN inventories ON (join_inventories_id = inventories_id)
WHERE {$date_type} BETWEEN '{$start_date}' AND '{$end_date}'
AND shop_orders_x_response_code = 1
GROUP BY join_inventories_id, join_shop_categories_id
{$order}
{$limit}
It's basically trying to get total sales per item over a period of time; values in curly brackets are filled in via a form. It works fine for a period of a couple of days, but querying a time interval of a week or more can take 30+ seconds.
I feel like it's joining way too many rows in order to calculate the aggregate values and sucking up huge amounts of memory, but I'm not sure how to limit it.
Note - I realize that I'm selecting fields which aren't in the group by, but they correspond 1-1 with inventory ID, which is in the group by.
Any suggestions?
-- Edit --
The current indices are:
inventories:
join_categories - BTREE
inventories_name, inventories_code, inventories_description - FULLTEXT
shop_orders_inventories:
shop_orders_inventories_id - BTREE
shop_orders:
shop_orders_id - BTREE
Two sequential LEFT JOINs will take quite a long time on a big table. Try to use JOIN instead of LEFT JOIN (unless you have records in shop_orders with no matching records in shop_orders_inventories or inventories), or split this query into a couple of smaller ones. Also, by using SUM and GROUP BY you are forcing MySQL to create temporary tables - you might want to increase the relevant MySQL settings so those tables fit into memory (otherwise MySQL will dump them to disk, which will also increase SQL execution time).
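For instance, the size of in-memory temporary tables is capped by tmp_table_size and max_heap_table_size (the effective limit is the smaller of the two); the values below are just example sizes:
-- Example only: raise both limits so the GROUP BY temp tables can stay in memory.
-- SET GLOBAL affects new connections; use SET SESSION to test on the current one.
SET GLOBAL tmp_table_size      = 268435456;  -- 256 MB
SET GLOBAL max_heap_table_size = 268435456;  -- 256 MB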
The first and foremost rule to indexing is... index the columns that you will search on!
For each possible value of {$date_type}, create an index for that date column.
Once you have lots of data in the table (say 2 years or 100 weeks), a single week's data is 1% of the index, so it becomes a good starting point.
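A sketch of what that could look like, assuming (hypothetically) that one of the columns the form can pass as {$date_type} is named shop_orders_date; adjust to whatever the real date columns are:
-- Hypothetical column name; repeat for each date column the form can select.
ALTER TABLE shop_orders
    ADD INDEX idx_shop_orders_date (shop_orders_date);

-- A composite index matching the WHERE clause can help even more:
ALTER TABLE shop_orders
    ADD INDEX idx_response_date (shop_orders_x_response_code, shop_orders_date);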
Even though MySQL allows non-aggregated columns in the SELECT clause, I personally would keep the SELECT list and the GROUP BY in sync:
SELECT inventories_name, inventories_code,
SUM(shop_orders_inventories_qty) AS qty,
SUM(shop_orders_inventories_price) AS tot_price,
inventories_categories_name, inventories_price_list, inventories_id
FROM ...
GROUP BY inventories_id, join_shop_categories_id, inventories_name,
inventories_code, inventories_categories_name, inventories_price_list
...