Is my JOIN + GROUP BY ... HAVING COUNT query correct?

Is my JOIN + GROUP BY ... HAVING COUNT query correct? - php

I'm new to SQL and I want to implement the following query:
I've got two tables, LicenseTbl and UnlockTbl:
LicenseTbl contains information about a purchased software license:
LicenseID, ProgramID, Owner, Location, OrderNo, BlockTime
UnlockTbl contains information about a specific software registration:
UnlockID, LicenseID (foreign key into LicenseTbl), Timestamp, SerialNo, Key, UninstallTime
where BlockTime and UninstallTime contain a timestamp if the license was blocked or the software uninstalled and NULL otherwise.
I want to devise a query that gives me ALL LicenseIDs for which the following conditions hold:
belongs to a given customer,
is not blocked,
is either not listed in the UnlockTbl or there are < X different SerialNo's in lines which are not marked as uninstalled.
I have written this, but I'm not sure if it is absolutely correct (it's one of my first SQL queries ever):
SELECT LicenseID FROM LicenseTbl
JOIN UnlockTbl
ON (LicenseTbl.LicenseID = UnlockTbl.LicenseID)
WHERE LicenseTbl.OrderNo = '$givenOrderNo'
AND LicenseTbl.Owner = '$givenOwner'
AND LicenseTbl.Location = '$givenLocation'
AND LicenseTbl.BlockTime IS NULL
AND UnlockTbl.UninstallTime IS NULL
GROUP BY LicenseTbl.LicenseID, UnlockTbl.Key
HAVING COUNT(*) < $X
(which is supposed to mean, list all licenses which have only been used less than X times simultaneously. I would prefer those that have been used the least first but don't know how to sort like that.)

This is a good start, but I would change the query to the following...
SELECT
LicenseID
FROM
LicenseTbl
LEFT JOIN
UnlockTbl
ON UnlockTbl.LicenseID = LicenseTbl.LicenseID
AND UnlockTbl.UninstallTime IS NULL
WHERE
LicenseTbl.OrderNo = '$givenOrderNo'
AND LicenseTbl.Owner = '$givenOwner'
AND LicenseTbl.Location = '$givenLocation'
AND LicenseTbl.BlockTime IS NULL
GROUP BY
LicenseTbl.LicenseID
HAVING
COUNT(DISTINCT UnlockTbl.SerialNo) < $X
ORDER BY
COUNT(DISTINCT UnlockTbl.SerialNo)
1). LEFT JOIN
A LEFT JOIN ensures that all rows in LicenseTbl are returned, even if there are no matches in the UnlockTbl table. (If there are no matches, the UnlockTbl table's values are all represented as NULL.)
2). UnlockTbl.UninstallTime IS NULL in the JOIN and not the WHERE
The WHERE clause is applied after the JOIN. This means that any records in UnlockTbl where UninstallTime have a real value (NOT NULL) get joined and then get filtered out. This in turn means that if all the relevant records in UnlockTbl have a non-NULL value in UninstallTime, all the rows for that License will get filtered.
3). GROUP BY on just the license, not the Key.
Simply, I don't know why you had it there, and it doesn't appear in the English description of what you want.
As you want a list of LicenseIDs, grouping by only that field ensures that you get one record per LicenseID.
4). HAVING clause modified to look at COUNT(DISTINCT SerialNo)
COUNT(*) counts all records. Even if there was no match (All the UnlockTbl values appearing as NULL), this would return 1.
COUNT(SerialNo) counts only records where SerialNo is NOT NULL. If there was no match (All the UnlockTbl values appearing as NULL), this would return 0.
COUNT(DISTINCT SerialNo) also counts only records where SerialNo is NOT NULL, but treats duplicates of the sme value as just 1 entry.
5). ORDER BY COUNT(DISTINCT SerialNo)
Takes the same value as is being filtered in the HAVING clause, and orders by it.

Related

Speed-up/Optimise MySQL statement - finding a new row that hasn't been selected before

First a bit of background about the tables & DB.
I have a MySQL db with a few tables in:
films:
Contains all film/series info with netflixid as a unique primary key.
users:
Contains user info "ratingid" is a unique primary key
rating:
Contains ALL user rating info, netflixid and a unique primary key of a compound "netflixid-userid"
This statement works:
SELECT *
FROM films
WHERE
INSTR(countrylist, 'GB')
AND films.netflixid NOT IN (SELECT netflixid FROM rating WHERE rating.userid = 1)
LIMIT 1
but it takes longer and longer to retrieve a new film record that you haven't rated. (currently at 6.8 seconds for around 2400 user ratings on an 8000 row film table)
First I thought it was the INSTR(countrylist, 'GB'), so I split them out into their own tinyint columns - made no difference.
I have tried NOT EXISTS as well, but the times are similar.
Any thoughts/ideas on how to select a new "unrated" row from films quickly?
Thanks!

Try just joining?
SELECT *
FROM films
LEFT JOIN rating on rating.ratingid=CONCAT(films.netflixid,'-',1)
WHERE
INSTR(countrylist, 'GB')
AND rating.pk IS NULL
LIMIT 1
Or doing the equivalent NOT EXISTS.

I would recommend not exists:
select *
from films f
where
instr(countrylist, 'GB')
and not exists (
select 1 from rating r where r.userid = 1 and f.netflixid = r.netflixid
)
This should take advantage of the primary key index of the rating table, so the subquery executes quickly.
That said, the instr() function in the outer query also represents a bottleneck. The database cannot take advantage of an index here, because of the function call: basically it needs to apply the computation to the whole table before it is able to filter. To avoid this, you would probably need to review your design: that is, have a separate table to represent the relationship between movies and countries, which each tuple on a separate row; then, you could use another exists subquery to filter on the country.

The INSTR(countrylist, 'GB') could be changed on countrylist = 'GB' or countrylist LIKE '%GB%' if the countrylist contains more than the country.
Then don't select all '*' if you need only some columns details. Depends on the number of columns, the query could be really slow

Retail inventory Mysql query optimization

Given the following tables for a retail administration system:
STORES: store_id, name
PRODUCTS: product_id, name, cost
PRODUCT_ENTRIES: key, store_id, date
PRODUCT_ENTRIES_CONTENT: product_entries_key, product_id, quantity
PRODUCT_EXITS: key, store_id, product_id, quantity, status, date
SALES: key, store_id, date
SALES_CONTENT: sales_key, product_id, quantity
RETURNS: key, store_id, date
RETURNS_CONTENT: returns_key, product_id, quantity
In order to calculate stock values I run through the contents of the products table and for each product_id:
Sum quantities of product_entries_content as well as returns_content
Subtract quantities of product_exits_content (where status = 2 or 3) as well as sales_content
To calculate the cost of the inventory of each store, I'm running the following query through a PHP loop for each distinct store and outputting the result:
SELECT
SUM((((
(SELECT COALESCE(SUM(product_entries_content.quantity), 0)
FROM product_entries
INNER JOIN product_entries_content ON
product_entries_content.product_entries_key = product_entries.key
WHERE product_entries_content.product_id = products.id
AND product_entries.store_id = '.$row['id'].'
AND DATE(product_entries.date) <= DATE(NOW()))
-
(SELECT COALESCE(SUM(quantity), 0)
FROM sales_content
INNER JOIN sales ON sales.key = sales_content.sales_key
WHERE product_id = products.product_id AND sales.store_id = '.$row['id'].'
AND DATE(sales_content.date) <= DATE(NOW()))
+
(SELECT COALESCE(SUM(quantity), 0)
FROM returns_content
INNER JOIN returns ON returns.key = returns_content.returns_key
WHERE product_id = products.product_id AND returns.store_id = '.$row['id'].'
AND DATE(returns.date) <= DATE(NOW()))
-
(SELECT COALESCE(SUM(quantity), 0)
FROM product_exits
WHERE product_id = products.product_id AND (status = 2 OR status = 3)
AND product_exits.store_id = '.$row['id'].' #store_id
AND DATE(product_exits.date) <= DATE(NOW()))
) * products.cost) / 100) ) AS "'.$row['key'].'" #store_name
FROM products WHERE 1
All foreign keys and indexes are properly set. The problem is because of the large amount of stores and movements in each store the query is becoming increasingly heavy, and because inventory is calculated from the beginning of each store's history it only gets slower with time.
What could I do to optimize this scheme?

Ideally, SHOW CREATE TABLE tablename for each table would really help a lot in any optimization question. The data type of each column is EXTREMELY important to performance.
That said, from the information you've given the following should be helpful, assuming the column data types are all appropriate.
Add the following indexes, if they do not exist. IMPORTANT: Single column indexes are NOT valid replacements for the following composite indexes. You stated that
All foreign keys and indexes are properly set.
but that tells us nothing about what they are, and if they are "proper" for optimization.
New indexes
ALTER TABLE sales
CREATE INDEX `aaaa` (`store_id`,`key`)
ALTER TABLE sales_content
CREATE INDEX `bbbb` (`product_id`,`sales_key`,`date`,`quantity`)
ALTER TABLE returns
CREATE INDEX `cccc` (`store_id`,`date`,`sales_key`)
ALTER TABLE returns_content
CREATE INDEX `dddd` (`product_id`,`returns_key`,`quantity`)
ALTER TABLE product_exits
CREATE INDEX `eeee` (`product_id`,`status`,`store_id`,`date`,`quantity`)
ALTER TABLE product_entries
CREATE INDEX `ffff` (`store_id`,`date`,`key`)
ALTER TABLE product_entries_content
CREATE INDEX `gggg` (`product_id`,`product_entries_key`,`quantity`)
(Use more appropriate names than aaaa. I just used those to save time.)
Each of the above indexes will allow the database to read only one row for each table. Most performance issues involving joins comes from what is known as a double lookup.
Understanding indexes and double lookups
An index is just a copy of the table data. Each column listed in the index is copied from the table, in the order listed in the index, and then the primary key is appended to that row in the index. When the database uses an index to look up a value, if not all the information is contained in the index, the primary key will be used to access the clustered index of the table to obtain the rest of the information. This is what a double look up is, and it is VERY bad for performance.
Example
All the above indexes are designed to avoid double lookups. Let's look at the second subquery to see how the indexes related to that query will work.
ALTER TABLE sales
CREATE INDEX `aaaa` (`store_id`,`key`)
ALTER TABLE sales_content
CREATE INDEX `bbbb` (`product_id`,`sales_key`,`date`,`quantity`)
Subquery (I added aliases and adjusted how the date column is accessed, but otherwise it is unchanged):
SELECT COALESCE(SUM(sc.quantity), 0)
FROM sales_content sc
INNER JOIN sales s
ON s.key = sc.sales_key
WHERE sc.product_id = p.product_id
AND s.store_id = '.$row['id'].'
AND sc.date < DATE_ADD(DATE(NOW()), INTERVAL 1 DAY)
Using the aaaa index, the database will be able to look up only those rows in the sales table that match the store_id, since that is listed first in the index. Think of this in the same way as a phone book, where store_id is the last name, and key is the first name. If you have the last name, then it is EXTREMELY easy to flip to that point of the phone book, and quickly get all the first names that go with that last name. Likewise, the database is able to very quickly "flip" to the part of the index that contains the given store_id value, and find all the key values. In this case, we do not need the primary key at all (which would be the phone number, in the phone book example.)
So, done with the sales table, and we have all the key values we need from there.
Next, the database moves onto the bbbb index. We already have product_id from the main query, and we have the sales_key from the aaaa index. That is like having both first and last name in the phone book. The only thing left to compare is the date, which could be like the address in a phone book. The database will store all the dates in order, and so by giving it a cutoff value, it can just look at all the dates up to a certain point.
The last part of the bbbb index is the quantity, which is there so that the database can quickly sum up all those quantities. To see why this is fast, consider again the phone book. Imagine in addition to last name, first name, and address information, that there is also a quantity column (of something, it doesn't matter what). If you wanted the sum of the quantities for a specific last name, first name, and for all addresses that start with the number 5 or less, that is easy, isn't it? Just find the first one, and add them up in order until you reach the first address that starts with a number greater than 5. The database benefits the same way when using the date column in this way (date is like the address column, in this example.)
The date columns
Finally, I noted earlier, I changed how the date column was accessed. You never want to run a function on a database column that you are comparing to another value. The reason is this: What would happen if you had to convert all the addresses into roman numerals, before you did any comparison? You wouldn't be able to just go down the list like we did earlier. You'd have to convert ALL the values, and THEN check each one to make sure it was within the limit, since we no longer know if the values are sorted correctly to just be able to do the "read them all and then stop at a certain value" shortcut I described above.
You and I may know that converting a datetime value to a date isn't going to change the order, but the database will not know (it might be possible it optimizes this conversion, but that's not something I want to assume.) So, keep the columns pure. The change I made was to just take the NOW() date, and add one day, and then make it a < instead of a <=. After all, comparing two values and saying the date must be equal to or less than today's date is equivalent to saying the datetime must be less than tomorrow's date.
The query
Below is my final query for you. As stated, not much has changed other than the date change and aliases. However, you had a typo in the first subquery where you accessed products.id. I corrected the id to be product_id, given that that matches what you stated were the columns for the products table.
SELECT
SUM(
(
(
(
(
SELECT COALESCE(SUM(pec.quantity), 0)
FROM product_entries pe
INNER JOIN product_entries_content pec
ON pec.product_entries_key = pe.key
WHERE pec.product_id = p.product_id
AND pe.store_id = '.$row['id'].'
AND pe.date < DATE_ADD(DATE(NOW()), INTERVAL 1 DAY)
)
-
(
SELECT COALESCE(SUM(sc.quantity), 0)
FROM sales_content sc
INNER JOIN sales s
ON s.key = sc.sales_key
WHERE sc.product_id = p.product_id
AND s.store_id = '.$row['id'].'
AND sc.date < DATE_ADD(DATE(NOW()), INTERVAL 1 DAY)
)
+
(
SELECT COALESCE(SUM(rc.quantity), 0)
FROM returns_content rc
INNER JOIN returns r
ON r.key = rc.returns_key
WHERE rc.product_id = p.product_id
AND r.store_id = '.$row['id'].'
AND r.date < DATE_ADD(DATE(NOW()), INTERVAL 1 DAY)
)
-
(
SELECT COALESCE(SUM(pex.quantity), 0)
FROM product_exits pex
WHERE pex.product_id = p.product_id
AND (pex.status = 2 OR pex.status = 3)
AND pex.store_id = '.$row['id'].' #store_id
AND pex.date < DATE_ADD(DATE(NOW()), INTERVAL 1 DAY)
)
)
* p.cost)
/ 100)
) AS "'.$row['key'].'" #store_name
FROM products p WHERE 1
You may be able to further optimize this by splitting the subquery on the product_exits table into 2 separate sub queries, rather than using a OR, which many times will perform poorly. Ultimately, you'll have to benchmark that to see how well the database optimizes the OR on its own.

Select two table and put ordering priority

I have two tables,
ir1_police is included messages that reported to admin.
ir1_police_flag, admin can flag reporters from 1 to 2. (1 means medium 2 means low and nothing means high priority).
If someone try to report something that is not real. Admin will flag it as 1 or 2 .
So I would make a list of report that shows first high priority, second medium and at last low.
I use the mysql statement but there is a problem. if there was nothing ir1_police_report nothing will be shown. or if exist only shows they are on ir_police_flags.
I have no idea to select them if no record exists on ir1_police_flags
SELECT * FROM ir1_police
JOIN ir1_police_flags on ir1_police_flags.uid = `ir1_police.uid
WHERE
ir1_police.status=0 AND ir1_police.parent_id=0
ORDER BY ir1_police.time DESC

Replace JOINwith LEFT JOIN. The former only selects rows from the tables where a match is found, whereas the latter selects all rows from the first table, even when there is no match in the other table.
Then you can add a second field to ORDER BY:
SELECT * FROM ir1_police
LEFT JOIN ir1_police_flags ON ir1_police_flags.uid = ir1_police.uid
WHERE ir1_police.status=0 AND ir1_police.parent_id=0
ORDER BY
ir1_police_flags.flag ASC,
ir1_police.time DESC
Notice the LEFT JOIN produces results where all ir1_police_flags's fields are NULL where there is no match in this table. This is perfect in your case, because NULL is considered smaller than any value, as far as ORDER BY is concerned.
Your application might justify this structure, but you should ask yourself whether this flag shouldn't be just a column in the table ir1_police altogether.

MYSQL: GROUP BY on all values except 0 and null?

I have a simple SQL Query:
SELECT tid,
COUNT(*) AS bpn
FROM mark_list
WHERE userid = $userid
GROUP BY tid
Now the column tid is basically a category list associated with each entry. The categories are unique numeric values.
What I am trying to do is get an overall count of how many records there as per userid, but I only want to count an entire category one time (meaning if category 3 has 10000 records, it should only receive a count of 1).
The caveat is that sometimes the category is listed as null or sometimes a 0. If the item has either a 0 or a null, it has no category and I want them counted as their own separate entities and not lumped into a single large category.

Wheeee!
SELECT SUM(`tid` IS NULL) AS `total_null`,
SUM(`tid` = 0) AS `total_zero`,
COUNT(DISTINCT `tid`) AS `other`
FROM `mark_list`
WHERE `user_id` = $userid
Edit: note that if total_zero is greater than 0, you will have to subtract one from the "other" result (because tid=0 will get counted in that column)

You can alter the query to not take into account those particular values (via the WHERE clause), and then perhaps run a separate query that ONLY takes into account those values.
There may be a way to combine it into only one query, but this way should work, too.

Comparing rows in table for differences between fields

I have a table (client) with 20+ columns that is mostly historical data.
Something like:
id|clientID|field1|field2|etc...|updateDate
If my data looks like this:
10|12|A|A|...|2009-03-01
11|12|A|B|...|2009-04-01
19|12|C|B|...|2009-05-01
21|14|X|Y|...|2009-06-11
27|14|X|Z|...|2009-07-01
Is there an easy way to compare each row and highlight the differences in the fields?
I need to be able to simply highlight the fields that changed between revisions (except for the key and the date of course)
There may be multiple fields updated in each new row (or just one).
This would be on a client by client basis so I could select on the clientID to filter.
It could be on the server or client side, which ever is easiest.
More details
I should expand my description a little:
I'm looking to just see if there was a difference between the fields (one is different in any way). Some of the data is numeric, some is text others are dates. A more complete example might be:
10|12|A|A|F|G|H|I|J|...|2009-03-01
11|12|A|B|F|G|H|I|J|...|2009-04-01
19|12|C|B|F|G|Z|I|J|...|2009-05-01 ***
21|14|X|Y|L|M|N|O|P|...|2009-06-11
27|14|X|Z|L|M|N|O|P|...|2009-07-01
I'd want to be able to isplay each row for clientID 12 and highlight B from row 11 and C & Z from row 19.

Any expression in SQL must reference columns only in one row (barring subqueries).
A JOIN can be used to make two different rows into one row of the result set.
So you can compare values on different rows by doing a self-join. Here's an example that shows joining each row to every other row associated with the same client (excluding a join of a row to itself):
SELECT c1.*, c2.*
FROM client c1
JOIN client c2 ON (c1.clientID = c2.clientID AND c1.id <> c2.id)
Now you can write expressions that compare columns. For example, to restrict the above query to those where field1 differs:
SELECT c1.*, c2.*
FROM client c1
JOIN client c2 ON (c1.clientID = c2.clientID AND c1.id <> c2.id)
WHERE c1.field1 <> c2.field1;
You don't specify what kinds of comparisons you need to make, so I'll leave that to you. The key point is that in general, you can use a self-join to compare rows in a given table.
Re your comments and clarification: Okay, so your "difference" is not simply by value but by ordinal position of the row. Remember that relational databases don't have a concept of row number, they only have order of rows with respect to some order you must specify in an ORDER BY clause. Don't confuse the "id" pseudokey with row number, the numbers are assigned as monotonically increasing only by coincidence of their implementation.
In MySQL, you could take advantage of user-defined variables to achieve the effect you're looking for. Order the query by clientId and then by id, and track values per column in MySQL user variables. When the value in a current row differs from the value in the variable, do whatever highlighting you were going to do. I'll show an example for one field:
SET #clientid = -1, #field1 = '';
SELECT id, clientId, field1, #clientid, #field1,
IF(#clientid <> clientid,
((#clientid := clientid) AND (#field1 := field1)) = NULL,
IF (#field1 <> field1,
(#field1 := field1),
NULL
)
) AS field1_changed
FROM client c
ORDER BY clientId, id;
Note this solution is not really different from just selecting all rows with plain SQL, and tracking the values with application variables as you fetch rows.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

Is my JOIN + GROUP BY ... HAVING COUNT query correct? - php

Related

Speed-up/Optimise MySQL statement - finding a new row that hasn't been selected before

Retail inventory Mysql query optimization

Select two table and put ordering priority

MYSQL: GROUP BY on all values except 0 and null?

Comparing rows in table for differences between fields

Categories

Resources