Which is better, for performance and for programming style, for a simple task: getting the counts for two values from one table? (All of the queries below do the same job.)
Run two separate queries:
SELECT count(*) FROM `a` WHERE categories_id=2
SELECT count(*) FROM `a` WHERE group_id=92
or use subqueries:
SELECT (SELECT count(*) FROM `a` WHERE categories_id=2) AS categories
     , (SELECT count(*) FROM `a` WHERE group_id=92) AS groups
or a UNION:
SELECT count(*) FROM `a` WHERE categories_id=2
UNION
SELECT count(*) FROM `a` WHERE group_id=92
The main difference between the three is the handling of the result values, though that is not traumatic.
The first example returns the two values in two separate fetch operations (on separate statements).
The second example returns the two values as part of a single fetch operation.
The third example returns the two values in two separate fetch operations (on the same statement). Note that plain UNION removes duplicate rows, so if the two counts happen to be equal you would get back only one row; UNION ALL avoids that.
Performance-wise, with just two rows of data, there is very little to choose between the three. The second (two sub-query) solution does the most with a single statement, and only requires a single fetch operation, so it might be the quickest. The first requires separate parsing of two statements, plus two sets of operations, so it should be the slowest. But whether you can truly measure that depends on lots of factors. If the client is in Australia and the server is in Europe, then the round-trip latency is likely to mean that the second or third solution is best (and the difference may depend on whether the DBMS returns multiple rows with a single client-server message exchange). If the client is on the same machine as the server, then the round-trip latency is much less critical.
For ease of understanding, the UNION version is probably sufficiently clean; it won't confuse anyone reading it. The first version might be slightly cleaner (one keyword less) but the difference is minimal.
If the number of alternatives increases (more than one group_id value, or more than one categories_id value), then I think the UNION wins on clarity:
SELECT 'G' AS type, group_id, COUNT(*)
FROM a
WHERE group_id IN (92, 104, 137, 291)
GROUP BY type, group_id
UNION
SELECT 'C' AS type, categories_id, COUNT(*)
FROM a
WHERE categories_id IN (2, 3, 13, 17, 19, 21)
GROUP BY type, categories_id
The 'type' column allows you to distinguish between a group ID and a category ID that share the same ID number (albeit that they are two different sorts of ID).
Because it is easier to expand, I'd probably go with option 3 (UNION) unless there were compelling timing experiments on live data showing that option 2 (sub-queries) was in fact quicker.
The first option, doing two SELECTs, will always be slightly less efficient as it involves an extra round trip to the database. Between the other two, the UNION version will in theory be ever so slightly slower, as the UNION forces the database to sort the values and remove duplicates. In practice, and for only two values, this isn't going to be measurable against the time spent on the two main parts of the query.
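As an aside (not one of the original three options), MySQL also lets you compute both counts in a single pass over the table with conditional aggregation, relying on the fact that a boolean comparison evaluates to 1 or 0:

```sql
-- One scan of `a`: each comparison yields 1 or 0, so SUM() counts the matches
SELECT SUM(categories_id = 2) AS categories
     , SUM(group_id = 92)     AS groups
FROM `a`;
```

Like the sub-query version, this returns both values on a single row, but it touches the table only once; whether it actually beats two indexed counts depends on the indexes available.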
We have records with a count field on a unique id.
The columns are:
mainId = unique
mainIdCount = 1320 (this 'views' field gets a + 1 when the page is visited)
How can you insert all these mainIdCounts as separate records in another table, in another database, in one query?
Yes, I do mean 1320 times an insert with the same mainId! :-)
We actually have records that go over 10,000 times an id. It just has to be like this.
This is a weird one, but we do need the copies of all these (just) counts like this.
The most straightforward way to do this is with a JOIN operation between your table and another row source that provides a set of integers. We'd match each row from the original table to as many rows from the set of integers as needed to produce the desired result.
As a brief example of the pattern:
INSERT INTO newtable (mainId,n)
SELECT t.mainId
, r.n
FROM mytable t
JOIN ( SELECT 1 AS n
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4
UNION ALL SELECT 5
) r
WHERE r.n <= t.mainIdCount
If mytable contains row mainId=5 mainIdCount=4, we'd get back rows (5,1),(5,2),(5,3),(5,4)
Obviously, the rowsource r needs to be of sufficient size. The inline view I've demonstrated here would return a maximum of five rows. For larger sets, it would be beneficial to use a table rather than an inline view.
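For example, a sketch of one common way to pre-populate such a table (the names `nums` and `d` are placeholders, not from the original answer): cross join sets of digits to cover 1..1000, adding further joins for larger ranges:

```sql
CREATE TABLE nums (n INT UNSIGNED PRIMARY KEY);

-- Cross join three sets of digits 0-9 to produce 0..999, then add 1
INSERT INTO nums (n)
SELECT ones.d + tens.d * 10 + hundreds.d * 100 + 1
  FROM (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
        UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6
        UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) ones
 CROSS JOIN
       (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
        UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6
        UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) tens
 CROSS JOIN
       (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
        UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6
        UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) hundreds;
```

Each additional cross-joined digit set multiplies the range by ten, so a fourth join covers the 10,000-plus counts mentioned above.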
This leads to the follow-up question, "How do I generate a set of integers in MySQL?",
e.g. Generating a range of numbers in MySQL
And getting that done is a bit tedious on older MySQL versions. MySQL 8.0 added recursive common table expressions, which make it much easier to return a bounded set of integer values; before 8.0, having a pre-populated table is the most efficient approach.
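On MySQL 8.0 and later, a sketch of the recursive-CTE approach (the name `seq` is arbitrary):

```sql
SET SESSION cte_max_recursion_depth = 10000;  -- the default recursion cap is 1000

-- Each iteration adds one row until n reaches 10000
WITH RECURSIVE seq (n) AS (
  SELECT 1
  UNION ALL
  SELECT n + 1 FROM seq WHERE n < 10000
)
SELECT n FROM seq;
```

The same CTE can feed the INSERT directly in place of the inline view `r` above.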
I'm using PDO, and I need to know how many rows are returned by a SELECT statement. My question is: is the following slower than, the same as, or faster than doing it in two queries? phpMyAdmin will tell me how long just the SELECT statement takes, but not just the COUNT, so I'm having trouble telling how long each part takes.
Query in question:
SELECT *, (SELECT COUNT(*) from table) AS count FROM table
Faster, same or slower than splitting it into two queries?
Thanks.
You can write this query as:
SELECT t.*, const.totalcount
FROM table t cross join
(select count(*) as totalcount from table) const;
This may or may not be faster than running two queries. Two queries involve "query running" overhead -- compiling the query, transmitting the data back and forth. This adds another column, so it increases the total amount of data in the result set.
Two queries may be faster. If MySQL executes that as a dependent subquery, it will run for every record in the parent; an uncorrelated subquery like this should only be executed once, but it's worth verifying. If it's a MyISAM table, the subquery will be very fast and you may not notice it with a small number of records.
Do an EXPLAIN on it and see what MySQL reports back.
I have a tableA that contains the following structure.
I modified this structure into tableB, as below, to reduce the number of rows; the category columns are fixed length.
Assume I have 21 lakh (2.1 million) rows in tableA; after converting to the new structure, tableB contains only 70k rows.
In some cases I want to SUM all the values in the table:
QUERY1: SELECT SUM(val) AS total FROM tableA;
vs
QUERY2: SELECT SUM(cate1+cate2+cate3) AS total FROM tableB;
QUERY1 is executing faster while comparing to QUERY2.
tableB contains fewer rows than tableA.
I expected QUERY2 to be faster, but QUERY1 is the faster one.
Help me understand why the performance is worse with QUERY2.
MySQL is optimized to speed up relational operations; not as much effort has gone into speeding up the other kinds of operations MySQL can perform. cate1+cate2+cate3 is a perfectly legitimate expression, but there's nothing particularly relational about it.
tableA is actually simpler, in terms of the relational model of data, than tableB, even though tableA has more rows. It's worth noting in passing that tableA conforms to first normal form but tableB does not: those three columns are really a repeating group, even though they have been made to look like they are not.
So first normal form is good for you in terms of performance (most of the time).
In your first query, MySQL just needs to do the summation (one operation per row).
In your second query, MySQL first needs to perform an arithmetic addition across three columns, then sum the results (two operations per row).
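One caveat worth adding here (an aside, assuming the cate columns are nullable, which the original question doesn't say): if any of the three columns is NULL for a row, cate1+cate2+cate3 is NULL for that row and SUM() silently skips it. COALESCE guards against that:

```sql
-- Treat NULL as 0 so no row is silently dropped from the total
SELECT SUM(COALESCE(cate1, 0) + COALESCE(cate2, 0) + COALESCE(cate3, 0)) AS total
FROM tableB;
```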
I have a SQL query that has 4 UNIONs and 4 LEFT JOINs. It is laid out as such:
SELECT ... FROM table1
LEFT JOIN other_table1
UNION SELECT ... FROM table2
LEFT JOIN other_table2
UNION SELECT ... FROM table3
LEFT JOIN other_table3
UNION SELECT ... FROM table4
LEFT JOIN other_table4
Would it be better to run 4 separate queries and then merge the results with PHP after the fact, or should I keep them together? Which would provide the fastest execution?
The most definitive answer is to test each method; however, the UNION is most likely to be faster, as MySQL runs only one query as opposed to four, one for each part of the union.
You also remove the overhead of reading the data into memory in PHP and concatenating it. Instead, you can just do a while() or foreach() or whatever on one result.
In this case, it depends on the number of records you are going to get in the result. Since you are using LEFT JOIN in all the union branches, I suggest doing separate fetches to avoid a bottleneck in SQL and merging the results in PHP.
When a query is executed from a programming language, the following steps occur:
A connection is created between the application and the database (or an existing connection is taken from the pool)
Query is sent to database
Database sends the result back
Connection is released to pool
If you are running N queries, the above steps happen N times, which, as you can guess, will slow down the process. So ideally we should keep the number of queries as low as possible.
It makes sense to break a query into multiple parts if a single query becomes complex, is difficult to maintain, and takes a long time to execute. Even in that case, the better approach is usually to optimize the query itself.
As your query is pretty simple, and as someone has pointed out that UNION also helps remove duplicate rows, the best way is to do it in the SQL query rather than in PHP code. Try optimization techniques such as creating proper indexes on the tables.
The UNION clause can be faster, because it returns distinct records at once (duplicate records are not returned); otherwise you would need to deduplicate in the application. In this case it may also help reduce traffic.
From the documentation:
The default behavior for UNION is that duplicate rows are removed from
the result. The optional DISTINCT keyword has no effect other than the
default because it also specifies duplicate-row removal. With the
optional ALL keyword, duplicate-row removal does not occur and the
result includes all matching rows from all the SELECT statements.
You can mix UNION ALL and UNION DISTINCT in the same query. Mixed UNION
types are treated such that a DISTINCT union overrides any ALL union
to its left. A DISTINCT union can be produced explicitly by using
UNION DISTINCT or implicitly by using UNION with no following DISTINCT
or ALL keyword.
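A minimal illustration of the quoted behaviour:

```sql
SELECT 1 AS x UNION     SELECT 1;  -- duplicate removed: returns one row
SELECT 1 AS x UNION ALL SELECT 1;  -- duplicates kept: returns two rows
```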
I have three tables, each contain some common information, and some information that is unique to the table.
For example: uid, date are universal among the tables, but one table can contain a column type while the other contains currency.
I need to query the database and get the last 20 entries (date DESC) that have been entered in all three tables.
My options are:
Query the database once, with one large query containing three UNION ALL clauses, and pass along fake values for the columns that don't apply, i.e.:
FROM (
SELECT uid, date, currency, 0, 0, 0
and later on
FROM (
SELECT uid, date, 0, type, 0, 0
This would leave me with a lot of null-valued fields.
Or I can query the database three times and somehow, within PHP, sort through the information to get the combined latest 20 posts. This would leave me with an excess of information, 60 posts to look through ((LIMIT 20) * 3), and force me to perform some type of additional sort every time.
What option is better/any alternate ideas?
Thanks.
Those two options are more similar than you make it sound.
When you perform the single large query with UNIONs, MySQL will still be performing three separate queries, just as you propose doing in your alternative plan, and then combining them into a single result.
So, you can either let MySQL do the filtering (and LIMIT) for you, or you can do it yourself. Given that choice, letting MySQL do all the work sounds far preferable.
Having extra columns in the result set could theoretically hinder performance, but with so small a result set as your 20 rows, I wouldn't expect it to have any detectable impact.
It all depends on how big your tables are. If each table has a few thousand records, you can go with the first solution (UNION) and you'll be fine.
On bigger tables, I'd probably go with the second solution, mostly because it will use far fewer resources (RAM) than the UNION approach, and still be reasonably fast.
But I would advise you to think about your data model, and maybe optimize it. The fact that you have to use UNION-based queries usually means there's room for optimization, typically by merging the three tables, with an added "type" field (the name isn't good at all, but you see my point).
If you know your limits, you can apply a LIMIT to each query and have the UNION operate on only a little data. This should be better, as MySQL will return only 20 rows and will do the sorting faster than you can in PHP:
select * from (
(SELECT uid, date, currency, 0, 0, 0 from table_a order by date desc limit 20)
union
(SELECT uid, date, 0, type, 0, 0 from table_b order by date desc limit 20)
...
) t order by date desc limit 20
(In MySQL, each SELECT that carries its own ORDER BY/LIMIT inside a UNION must be parenthesized, and the derived table needs an alias.)