How to count unique set values in MySQL


I would appreciate your input to help me count unique values for a SET type in MySQL. I have a column named "features" defined as a SET field as follows:
CREATE TABLE cars (features SET('power steering', 'power locks', 'satellite radio', 'power windows', 'sat nav', 'turbo'));
As I fill this table, since the features are not mutually exclusive, I will get records which include a combination of 2 or more of these features. For example:
Car 1 has power steering and power windows, but none of the remaining features.
Car 2 has all features.
Car 3 has all features, except sat nav and turbo.
What I want to do is to get a list of all single listed features in the table, including the count of records associated to each in a similar fashion as a SELECT statement using a GROUP BY clause. So, following with the example above, I should be able to get the following result:
features        | count
----------------+------
power steering  | 3     // all cars have this feature
power locks     | 2     // only cars 2 and 3 have it
satellite radio | 2     // only cars 2 and 3 have it
power windows   | 3
sat nav         | 1     // only car 2 has it
turbo           | 1     // only car 2 has it
I have tried using the following query with the expectation of obtaining the aforementioned result:
SELECT features, COUNT(features) FROM cars GROUP BY features;
However, instead of what I was expecting, I got the count of each of the existing feature combinations:
features                                        | count
------------------------------------------------+-------
power steering, power windows                   | 1   // only 1 car has exactly these
                                                |     // 2 features (car 1 in this example)
------------------------------------------------+-------
power steering, power locks, satellite radio,   |
power windows, sat nav, turbo                   | 1
------------------------------------------------+-------
power steering, power locks, satellite radio,   |
power windows                                   | 1
So, the question is: Is there a way of obtaining the count of each single feature, as shown in the first table, using one single MySQL query? I could do it by executing one query for each feature, but I'm sure there must be a way of avoiding such hassle. Someone might as well suggest using a different table for the features and joining, but it is not possible at this point without heavily impacting the rest of the project. Thanks in advance!

SELECT set_list.features, COUNT(cars.features) FROM
    (SELECT TRIM("'" FROM SUBSTRING_INDEX(SUBSTRING_INDEX(
        (SELECT TRIM(')' FROM SUBSTR(column_type, 5)) FROM information_schema.columns
        WHERE table_name = 'cars' AND column_name = 'features'),
        ',', @r:=@r+1), ',', -1)) AS features
    FROM (SELECT @r:=0) deriv1,
    (SELECT ID FROM information_schema.COLLATIONS) deriv2
    HAVING @r <=
        (SELECT LENGTH(column_type) - LENGTH(REPLACE(column_type, ',', ''))
        FROM information_schema.columns
        WHERE table_name = 'cars' AND column_name = 'features')) set_list
LEFT OUTER JOIN cars
ON FIND_IN_SET(set_list.features, cars.features) > 0
GROUP BY set_list.features
Adapted from:
MySQL: Query for list of available options for SET
My query takes the SQL from the post above as its basis to get a list of the available column values. All of the indented SQL is that one query; if you execute it alone you get the list, which I turn into a derived table called "set_list". I copied that query as is. It basically does a lot of string manipulation to get the list. As Mike Brant suggested, the code would be far simpler (though not as dynamic) if you put the list into another table and just joined to that.
I then join set_list back against the cars table, matching each item from set_list with the rows in cars that contain that feature, using FIND_IN_SET(). It's an outer join, so any value from the set list that no car has still appears, with a count of zero.

Typically, we use the FIND_IN_SET function.
You could use a query like this to return the specified result:
SELECT f.feature
, COUNT(1)
FROM ( SELECT 'power steering' AS feature
UNION ALL SELECT 'power locks'
UNION ALL SELECT 'satellite radio'
UNION ALL SELECT 'power windows'
UNION ALL SELECT 'sat nav'
UNION ALL SELECT 'turbo'
) f
JOIN cars c
ON FIND_IN_SET(f.feature,c.features)>0
GROUP BY f.feature
ORDER BY f.feature
You could omit >0 and get the same result. This query omits "zero counts": rows for a feature that doesn't appear for any car. To get those, you could use an outer join: add the LEFT keyword before JOIN and, rather than COUNT(1) in the SELECT list, use COUNT(expr), where expr is a column from cars that is NOT NULL, or some other expression that is non-NULL when a matching row is found and NULL when it is not.
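The outer-join variant described above can be tried without a MySQL server. The following is only a sketch in Python, assuming SQLite stands in for MySQL: SQLite has no SET type and no FIND_IN_SET, so the column is stored as the comma-separated text MySQL would return, a small user-defined find_in_set function emulates the MySQL one, and an id column (not in the original table) is added so there is something non-NULL to count.

```python
import sqlite3


def find_in_set(needle, haystack):
    """Rough stand-in for MySQL's FIND_IN_SET: 1-based position, 0 if absent."""
    if needle is None or haystack is None:
        return None
    items = haystack.split(",")
    return items.index(needle) + 1 if needle in items else 0


conn = sqlite3.connect(":memory:")
conn.create_function("find_in_set", 2, find_in_set)

# Assumption: an id column is added; features holds the SET value as text.
conn.execute("CREATE TABLE cars (id INTEGER PRIMARY KEY, features TEXT)")
conn.executemany(
    "INSERT INTO cars (features) VALUES (?)",
    [
        ("power steering,power windows",),
        ("power steering,power locks,satellite radio,power windows,sat nav,turbo",),
        ("power steering,power locks,satellite radio,power windows",),
    ],
)

features = [
    "power steering", "power locks", "satellite radio",
    "power windows", "sat nav", "turbo",
]
# Build the UNION ALL list of features with bound parameters.
feature_list = " UNION ALL ".join("SELECT ? AS feature" for _ in features)

query = f"""
SELECT f.feature, COUNT(c.id)
FROM ({feature_list}) f
LEFT JOIN cars c ON find_in_set(f.feature, c.features) > 0
GROUP BY f.feature
ORDER BY f.feature
"""
counts = dict(conn.execute(query, features).fetchall())
for feature, count in counts.items():
    print(feature, count)
```

Because COUNT(c.id) only counts non-NULL values, a feature that no car has would still be listed, with a count of 0.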

Related

MySQL join query duplicates users in output

I have the following tables
ea_users
id
first_name
last_name
email
password
id_roles
ea_user_cfields
id
c_id = custom field ID
u_id = user ID
data
ea_customfields
id
name = name of custom field
description
I want to get all users which have a certain role, but I also want to retrieve all the custom fields per user. This is for the backend of my software where all the ea_users and custom fields should be shown.
I tried the following, but for each custom field, it duplicates the same user
$this->db->join('(SELECT GROUP_CONCAT(data) AS custom_data, id AS dataid, u_id, c_id
FROM ea_user_cfields userc
GROUP BY id) AS tt', 'tt.u_id = ea.id','left');
$this->db->join('(SELECT GROUP_CONCAT(name) AS custom_name, id AS customid
FROM ea_customfields AS cf
GROUP BY id) AS te', 'tt.c_id = te.customid','left');
$this->db->where('id_roles', $customers_role_id);
return $this->db->get('ea_users ea')->result_array();

The problem is that you have not fully understood how JOIN works.
It is normal to get duplicates in the result when you have a one-to-many relation.
In short, in your case the engine fetches data from table "A" (ea_users) and then joins another table "B" (ea_customfields) according to the conditions. With a one-to-many relation between the tables, one record from table "A" (call it A1) can have several related rows in table "B" (call them B1.1, B1.2, B1.3 and B1.4). The engine joins these records and keeps the join result in memory, so in memory you would see something like:
| FromTable A | FromTableB |
| A1 | B1.1 |
| A1 | B1.2 |
| A1 | B1.3 |
| A1 | B1.4 |
If you have 10 records in table "B" related to one record in table "A", the data from table "A" is copied 10 times in memory during fetching, and that is what gets returned to you.
Depending on the join type, rows with no related records are either skipped entirely (INNER JOIN) or filled with NULLs (LEFT JOIN or RIGHT JOIN), etc.
When you think about JOINs, imagine joining a few big tables on paper. You would always need to mark which data comes from which table in order to work with it later, so it is quite logical to write row "A1" from table "A" as many times as needed to fill the empty spaces whenever you find a matching record in table "B". Otherwise you would have something like this on your paper:
| FromTable A | FromTableB |
| A1 | B1.1 |
| | B1.2 |
| | B1.3 |
| | B1.4 |
Yes, this looks fine even with empty cells in the "FromTable A" column when you have 5-10 records and can easily work with them (for example, you can sort it in your head: you just need to imagine what belongs in each empty space, but for that you must always remember the order in which you wrote the data on the paper). But suppose you have 100-1000 records. If you can still sort that easily, make things more complicated and say that values in table "A" can themselves be empty, and so on. That is why it is simpler for the MySQL engine to repeat the data from table "A" many times.
Basically, I always stick to this kind of example: imagine how you would join huge tables on paper, select something from them, sort the result, look through the tables, and so on.
GROUP_CONCAT, grouping
The next mistake is that you have not understood how GROUP_CONCAT works.
In the first step the MySQL engine loads a structure into memory by applying all WHERE conditions, evaluating subqueries and appending all joins. Once the structure is loaded, it performs the grouping: it selects from the temporary table all rows related to "A1" and then applies the aggregate function to the selected data. GROUP_CONCAT means we want to concatenate the values in the selected group, so we would see something like "B1.1, B1.2, B1.3, B1.4". That is a simplification, but I hope it helps a little.
I found a sample table structure online so you can try some queries there:
http://www.mysqltutorial.org/tryit/query/mysql-left-join/#1
Here is an example of how GROUP_CONCAT works; try executing this query there:
SELECT
c.customerNumber, c.customerName, GROUP_CONCAT(orderNumber) AS allOrders
FROM customers c
LEFT JOIN orders o ON (c.customerNumber = o.customerNumber)
GROUP BY 1,2
;
You can compare the results with those of the previous query.
The power of GROUP BY lies in the aggregate functions you can use with it, for example COUNT(), MAX(), GROUP_CONCAT() and many others.
Here is an example of fetching a count (try executing it):
SELECT c.customerName, COUNT(o.orderNumber) AS ordersCount
FROM customers AS c
LEFT JOIN orders AS o ON (c.customerNumber = o.customerNumber)
GROUP BY 1
;
So, my opinion:
it is simpler and better to solve this on the client side or in the backend after fetching, because from the MySQL engine's point of view a response with duplicated column values is absolutely correct. Of course, you can also solve it using grouping with concatenation, for example, but I have a feeling that would overcomplicate the logic for your task.
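Collapsing the duplicated rows after fetching could look roughly like the sketch below. It is written in Python rather than PHP, and the row shape (the keys id, first_name, field_name and data) is an assumption about what a plain LEFT JOIN of the three tables would return, not something taken from the question.

```python
# Hypothetical rows as a LEFT JOIN would return them: one row per
# (user, custom field) pair, with NULLs (None) for users without fields.
rows = [
    {"id": 1, "first_name": "Ann", "field_name": "phone", "data": "555-0100"},
    {"id": 1, "first_name": "Ann", "field_name": "city", "data": "Oslo"},
    {"id": 2, "first_name": "Bob", "field_name": None, "data": None},
]


def group_users(rows):
    """Fold the repeated user rows into one record per user id."""
    users = {}
    for row in rows:
        user = users.setdefault(
            row["id"],
            {"id": row["id"], "first_name": row["first_name"], "custom_fields": {}},
        )
        # A user with no custom fields arrives as a single row of NULLs.
        if row["field_name"] is not None:
            user["custom_fields"][row["field_name"]] = row["data"]
    return list(users.values())


users = group_users(rows)
for user in users:
    print(user)
```

Each user then appears once, with the custom fields attached as a nested mapping, which is usually easier to render in a backend view than a concatenated string.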
PS.
"GROUP BY 1" means grouping by column 1 of the SELECT list, so after loading the data into memory MySQL groups all rows by the first column. It is better not to use this notation in production. In the first query above, position 1 refers to c.customerNumber, so it is the same as "GROUP BY c.customerNumber".
PPS. I also read comments like "use DISTINCT", etc.
To use DISTINCT or ordering functions you need to understand how they work, because incorrect usage can remove data from your selection (the same goes for GROUP BY, INNER JOIN, etc.). At first glance your code might work fine, but it can introduce logic bugs, which are the hardest to track down later.
Moreover, DISTINCT will not help when you have a one-to-many relation (as in your particular case). You can try executing these queries:
SELECT
c.customerName, orderNumber AS nr
FROM customers c
INNER JOIN orders o ON (c.customerNumber = o.customerNumber)
WHERE c.customerName='Alpha Cognac'
;
SELECT
DISTINCT(c.customerName), orderNumber AS nr
FROM customers c
INNER JOIN orders o ON (c.customerNumber = o.customerNumber)
WHERE c.customerName='Alpha Cognac'
;
The results should be the same: duplicated customer names alongside the order numbers.
And here is an example of how to lose data with an incorrect query ;):
SELECT
c.customerName, orderNumber AS nr
FROM customers c
INNER JOIN orders o ON (c.customerNumber = o.customerNumber)
WHERE c.customerName='Alpha Cognac'
GROUP BY 1
;

MySQL or PHP: Find optimum combination of rows based on score

I have a MySQL database with the following columns
Flavour1| Flavour2 | Score
-------------------------------------
Vanilla | Strawberry | 7
Choc | Toffee | 8
Vanilla | Choc | 6
Toffee | Vanilla | 7
Etc.
I want to be able to select N rows from the table which, in combination, have the highest total score, but are subject to restrictions on the number of times each flavour can feature.
For example, I may want to choose the 5 best flavour combinations (rows) with no single flavour appearing more than 3 times (count of Flavour1+Flavour2 <= 3).
I'm struggling to get my head around how to do it due to the fact the db has to compare all combinations to get the score, whilst keeping count of the number of times a flavour has featured.
Any help much appreciated!!
EDIT - if there's an algorithmic way to do this in PHP that would also be acceptable.

As specified, there is no "efficient" way to do this in SQL. You can do this by generating all combinations and then applying the rules you want in a where clause. Let me also assume that you have an id on each row, uniquely identifying a pair.
In your case, because you allow duplicates, I would add a count to each combination and use that for the combination:
create view v_withcounts as
select t.*, 1 as cnt
from table t
union all
select t.*, 2 as cnt
from table t
union all
select t.*, 3 as cnt
from table t;
Then for the query:
select v1.id, coalesce(v1.cnt),
v2.id, coalesce(v2.cnt),
v3.id, coalesce(v3.cnt),
v4.id, coalesce(v4.cnt),
v5.id, coalesce(v5.cnt)
from v_withcounts v1 left join
v_withcounts v2
on v2.id not in (v1.id) left join
v_withcounts v3
on v3.id not in (v1.id, v2.id) left join
v_withcounts v4
on v4.id not in (v1.id, v2.id, v3.id) left join
v_withcounts v5
on v5.id not in (v1.id, v2.id, v3.id, v4.id)
where (coalesce(v1.cnt, 0) + coalesce(v2.cnt, 0) + coalesce(v3.cnt, 0) +
coalesce(v4.cnt, 0) + coalesce(v5.cnt, 0)
) = 5
Algorithmically, there are probably more efficient ways to solve this problem. I suspect that a greedy algorithm would be much faster and would generate your desired result.
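A greedy pass of the kind hinted at above might look like the following Python sketch. The function name pick_greedy and its parameters are made up for illustration. It takes rows in descending score order and skips any row that would push a flavour over its cap; this is a heuristic and is not guaranteed to find the best possible total in every case.

```python
from collections import Counter


def pick_greedy(rows, n, max_per_flavour):
    """Pick up to n rows by descending score, keeping every flavour within its cap."""
    used = Counter()
    chosen = []
    for f1, f2, score in sorted(rows, key=lambda r: r[2], reverse=True):
        if len(chosen) == n:
            break
        # A row naming the same flavour twice would consume two slots.
        needed = Counter([f1, f2])
        if all(used[f] + k <= max_per_flavour for f, k in needed.items()):
            chosen.append((f1, f2, score))
            used.update(needed)
    return chosen


rows = [
    ("Vanilla", "Strawberry", 7),
    ("Choc", "Toffee", 8),
    ("Vanilla", "Choc", 6),
    ("Toffee", "Vanilla", 7),
]

best = pick_greedy(rows, n=3, max_per_flavour=2)
total = sum(score for _, _, score in best)
print(best, total)
```

For small tables, an exhaustive search over all combinations of n rows would give a guaranteed optimum at a much higher cost; the greedy version trades that guarantee for speed.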

Mysql conditions with grouping many-to-many tables

I was wondering if somebody can think of a more elegant solution to my problem. I have trouble finding similar cases.
I have 5 tables. 3 are details for employees, skills and subskills. The remaining 2 are linking tables.
skill_links
skill_id subskill_id
1 4
1 5
2 4
2 6
emp_skill_links
employee_id subskill_id acquired
1 4 2013-04-05 00:00:00
1 5 2014-02-24 00:00:00
2 6 2012-02-26 00:00:00
2 5 2011-06-14 00:00:00
Both have many-to-many relations. Skills with subskills (skill_links) and employees with subskills (emp_skill_links).
I want to pick employees who have acquired all subskills for a skill. I tried doing it with one query, but couldn't manage it with the grouping involved. At the moment my solution is two separate queries and matching these in php array later. That is:
SELECT sl.skill_id, COUNT(sl.subskill_id) as expected
FROM skill_links sl
GROUP BY sl.skill_id
to be compared with:
SELECT sl.skill_id, esl.employee_id, COUNT(esl.subskill_id) as provided
FROM emp_skill_links esl
INNER JOIN skill_links sl
ON sl.subskill_id = esl.subskill_id
GROUP BY sl.skill_id, esl.employee_id
Is there a more efficient single query solution to my problem? Or would it not be worth the complexity involved?

If you consider a query consisting of sub-queries as meeting your requirement for "a more efficient single query solution" (depends on your definition of "single query"), then this will work.
SELECT employeeTable.employee_id
FROM
(SELECT sl.skill_id, COUNT(*) AS subskill_count
FROM skill_links sl
GROUP BY sl.skill_id) skillTable
JOIN
(SELECT esl.employee_id, sl2.skill_id, COUNT(*) AS employee_subskills
FROM emp_skill_links esl
JOIN skill_links sl2 ON esl.subskill_id = sl2.subskill_id
GROUP BY esl.employee_id, sl2.skill_id) employeeTable
ON skillTable.skill_id = employeeTable.skill_id
WHERE employeeTable.employee_subskills = skillTable.subskill_count
What the query does:
Select the count of sub-skills for each skill
Select the count of sub-skills for each employee for each main skill
Join those results based on the main skill
Select the employees from that who have a sub-skill count equal to the count of sub-skills for the main skill
DEMO
In this example, users 1 and 3 each have all sub-skills of main skill 1. User 2 only has 2 of the 3 sub-skills of main skill 2.
You'll note that the logic here is similar to what you're already doing, but it has the advantage of just one db request (instead of two) and it doesn't involve the PHP work of creating, looping through, comparing, and reducing arrays.
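To see the count-matching query at work on the data from the question, here is a small Python sketch. It assumes SQLite (through the standard sqlite3 module) as a stand-in for MySQL; the derived-table aliases s and e are shortened names chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE skill_links (skill_id INTEGER, subskill_id INTEGER);
CREATE TABLE emp_skill_links (employee_id INTEGER, subskill_id INTEGER, acquired TEXT);
INSERT INTO skill_links VALUES (1, 4), (1, 5), (2, 4), (2, 6);
INSERT INTO emp_skill_links VALUES
  (1, 4, '2013-04-05 00:00:00'),
  (1, 5, '2014-02-24 00:00:00'),
  (2, 6, '2012-02-26 00:00:00'),
  (2, 5, '2011-06-14 00:00:00');
""")

# s: sub-skills expected per skill; e: sub-skills held per employee and skill.
query = """
SELECT e.employee_id, e.skill_id
FROM (SELECT skill_id, COUNT(*) AS subskill_count
      FROM skill_links
      GROUP BY skill_id) s
JOIN (SELECT esl.employee_id, sl.skill_id, COUNT(*) AS employee_subskills
      FROM emp_skill_links esl
      JOIN skill_links sl ON esl.subskill_id = sl.subskill_id
      GROUP BY esl.employee_id, sl.skill_id) e
  ON s.skill_id = e.skill_id
WHERE e.employee_subskills = s.subskill_count
ORDER BY e.employee_id, e.skill_id
"""
complete = conn.execute(query).fetchall()
print(complete)
```

With the question's sample rows, only employee 1 holds every sub-skill of a skill (skill 1); employee 2 holds one of the two sub-skills of each skill and is filtered out.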

Using left join to get total rows in other table

Hi all, I have two tables, yalladb_hotel and yalladb_room_types, and their structures are:
yalladb_hotel
-----------------------------------------
| id | name | address | fax | telephone |
-----------------------------------------
And yalladb_room_types
----------------------------------------
| id | hotel_id | roomtype_name | rate |
----------------------------------------
Now I want to get all information from the hotel table together with the total number of room types related to each hotel. I used a left join because not every hotel necessarily has room types. So I used the following query:
SELECT
h.*,
count(rm.*) as total_room_types
FROM
yalladb_hotel h
LEFT JOIN yalladb_room_types rm
ON h.id=rm.hotel_id
LIMIT
0,5
But it produces the following error, and I am unable to understand what the error is:
#1064 - You have an error in your SQL syntax;
check the manual that corresponds to your MySQL server version
for the right syntax to use near '*) as total_room_types FROM
yalladb_hotel h LEFT JOIN yalladb_room_types rm ON h' at line 1
Can anyone tell me what is wrong?
Regards

If you are using an aggregate function, you should put every column (except the aggregated column) in the GROUP BY clause.
http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html
Explanation:
SELECT (these columns should be in the GROUP BY clause), COUNT(agr_col)
FROM table
GROUP BY (the same columns go here)
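Applied to the tables from the question, the GROUP BY advice might be combined with a LEFT JOIN as in the Python sketch below. It assumes SQLite (the standard sqlite3 module) in place of MySQL, only a subset of the hotel columns, and invented sample rows. It counts rm.id rather than rm.*, which is not valid syntax inside COUNT().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE yalladb_hotel (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE yalladb_room_types (
  id INTEGER PRIMARY KEY, hotel_id INTEGER, roomtype_name TEXT, rate REAL
);
-- Invented sample data: Hotel A has two room types, Hotel B has none.
INSERT INTO yalladb_hotel VALUES (1, 'Hotel A'), (2, 'Hotel B');
INSERT INTO yalladb_room_types VALUES (1, 1, 'Double', 80.0), (2, 1, 'King', 120.0);
""")

rows = conn.execute("""
SELECT h.id, h.name, COUNT(rm.id) AS total_room_types
FROM yalladb_hotel h
LEFT JOIN yalladb_room_types rm ON h.id = rm.hotel_id
GROUP BY h.id, h.name
ORDER BY h.id
LIMIT 5
""").fetchall()
print(rows)
```

COUNT(rm.id) skips the NULLs produced by the LEFT JOIN, so a hotel without room types is reported with 0 rather than 1.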

Just do
COUNT(rm.id) as Total_Room_Types
unless you have a specific classification to differentiate between, such as Double, Queen or King size bedrooms.
If your Hotel room "Name" is the classification of room type as described above, you should pre-query the room types first and join to that.
SELECT
h.*,
COALESCE( PreQuery.Name, " " ) as RoomType,
COALESCE( PreQuery.RoomTypeCount, 0 ) as RoomTypeCount
FROM
yalladb_hotel h
LEFT JOIN ( select rm.hotel_id,
rm.name,
count(*) as RoomTypeCount
from
yalladb_room_types rm
group by
rm.hotel_id,
rm.name ) PreQuery
ON h.id=PreQuery.hotel_id
LIMIT
0,5
EDIT CLARIFICATIONS...
To clarify my answer. Instead of just a count of how many rooms, you wanted them per room type. Per your original listed structure, you had "Name" as a column which is now listed as roomType_Name per your edits. I suspected this column to describe the type of room. So my inner query (as opposed to an inner join) tells the query to pre-aggregate this stuff first, grouping by the criteria and let its results be known as an alias of "PreQuery" for the join condition. THEN, back to the main hotel table LEFT joined to "PreQuery" on the hotel ID.
Since a left join otherwise produces NULL values when no match is found in the "other" table, COALESCE() says: take the value of parameter 1 and, if that is NULL, take the second value, and put the result into a final query column called RoomType or RoomTypeCount, as in this example. So your final result will not contain any NULLs, and each column still has the expected data type (char and numeric respectively).

Checking that a recipe contains an ingredient - MYSQL

Hey everyone. I'm having a bit of trouble running a query / php combination efficiently. I seem to be just looping over too many result sets in inner loops in my php. I'm sure there is a more efficient way of doing this. Any help very much appreciated.
I've got a table that holds 3500 recipes ([recipe]):
rid | recipe_name
And another table that holds 600 different ingredients ([ingredients])
iid | i_name
Each recipe has x number of ingredients associated to it, and I use a nice joining table to create the association ([recipe_ingredients])
uid | rid | iid
(where uid is just a unique id for the table)
For example:
rid: 1 | recipe_name: Lemon Tart
.....
iid: 99 | i_name: lemon curd
iid: 154 | i_name: flour
.....
1 | 1 | 99
2 | 1 | 154
The query I'm trying to run allows the user to enter the ingredients they have and tells them anything they can make with those ingredients. A recipe doesn't have to use all of the user's ingredients, but the user does need to have all the ingredients for the recipe.
For instance, if I had flour, egg, salt, milk and lemon curd I could make 'Pancakes' and 'Lemon Tart' (if we assume lemon tart has no other ingredients :)), but couldn't make 'Risotto' (as I didn't have any rice, or anything else that's needed in it).
In my PHP I have an array containing all the ingredients the user has. At the moment the way I'm running this is going through every recipe (loop 1) and then checking all ingredients in that recipe to see if each ingredient is contained in my ingredients array (loop 2). As soon as it finds an ingredient in the recipe that isn't in my array, it says "no" and goes on to the next recipe. If every ingredient is found, it stores the rid in a new array that I use later to display the results.
But if we look at the efficiency of that, if I assume 3500 recipes and I've got 40 ingredients in my array, the worst case scenario is it running through 3500 x 40n, where n = number of ingredients in the recipe. The best case is still 3500 x 40 (it doesn't find the first ingredient of any recipe, so it exits each time).
I think my whole approach to this is wrong, and I think there must be some clever sql that I'm missing here. Any thoughts? I can always build up an sql statement from the ingredient array I have ......
Thanks a lot in advance, much appreciated

I'd suggest storing the count of the number of ingredients for the recipe in the recipe table, just for efficiency's sake (it will make the query quicker if it doesn't have to calculate this information every time). This is denormalization, which is bad for data integrity but good for performance. You should be aware that this can cause data inconsistencies if recipes are updated and you are not careful to make sure the number is updated in every relevant place. I've assumed you've done this with the new column set as ing_count in the recipe table.
Make sure you escape the values for NAME1, NAME2, etc. if they are provided via user input, otherwise you are at risk of SQL injection.
select recipe.rid, recipe.recipe_name, recipe.ing_count, count(ri.iid) as ing_match_count
from recipe_ingredients ri
inner join (select i.iid from ingredients i where i.i_name='NAME1' or i.i_name='NAME2' or i.i_name='NAME3') ing
on ri.iid = ing.iid
inner join recipe
on recipe.rid = ri.rid
group by recipe.rid, recipe.recipe_name, recipe.ing_count
having ing_match_count = recipe.ing_count
If you don't want to store the recipe count, you could do something like this:
select recipe.rid, recipe.recipe_name, count(*) as ing_count, count(ing.iid) as ing_match_count
from recipe_ingredients ri
left outer join (select i.iid from ingredients i where i.i_name='NAME1' or i.i_name='NAME2' or i.i_name='NAME3') ing
on ri.iid = ing.iid
right outer join recipe
on recipe.rid = ri.rid
group by recipe.rid, recipe.recipe_name
having ing_match_count = ing_count
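The second approach (no stored ingredient count) can be sketched end to end as follows. This is an illustration in Python using SQLite via the standard sqlite3 module as a stand-in for MySQL, with invented sample rows and a hypothetical helper named makeable_recipes. The user's ingredient names are passed as bound parameters instead of being escaped by hand, which is one common way to avoid SQL injection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE recipe (rid INTEGER PRIMARY KEY, recipe_name TEXT);
CREATE TABLE ingredients (iid INTEGER PRIMARY KEY, i_name TEXT);
CREATE TABLE recipe_ingredients (uid INTEGER PRIMARY KEY, rid INTEGER, iid INTEGER);
INSERT INTO recipe VALUES (1, 'Lemon Tart'), (2, 'Pancakes'), (3, 'Risotto');
INSERT INTO ingredients VALUES
  (1, 'flour'), (2, 'egg'), (3, 'salt'), (4, 'milk'), (5, 'lemon curd'), (6, 'rice');
INSERT INTO recipe_ingredients (rid, iid) VALUES
  (1, 5), (1, 1),
  (2, 1), (2, 2), (2, 4),
  (3, 6), (3, 3);
""")


def makeable_recipes(conn, have):
    """Return names of recipes whose every ingredient is in `have`."""
    placeholders = ", ".join("?" for _ in have)
    # COUNT(*) is the number of ingredients the recipe needs;
    # COUNT(ing.iid) is how many of those the user actually has.
    query = f"""
    SELECT r.recipe_name
    FROM recipe r
    JOIN recipe_ingredients ri ON ri.rid = r.rid
    LEFT JOIN ingredients ing
      ON ing.iid = ri.iid AND ing.i_name IN ({placeholders})
    GROUP BY r.rid, r.recipe_name
    HAVING COUNT(*) = COUNT(ing.iid)
    ORDER BY r.recipe_name
    """
    return [name for (name,) in conn.execute(query, list(have))]


found = makeable_recipes(conn, ["flour", "egg", "salt", "milk", "lemon curd"])
print(found)
```

Only the placeholder string is interpolated into the SQL text; the ingredient names themselves never are.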

You could use an "IN"-type query:
select recipe.rid, count(recipe_ingredients.iid) as cnt
from recipe
left join recipe_ingredients on recipe.rid = recipe_ingredients.rid
where recipe_ingredients.iid in (the,list,of,ingredients,the,user,has)
group by recipe.rid
having cnt > some_threshold_amount
order by cnt desc
I'm doing this off the top of my head, but basically: pull out any recipes where at least one of the user-provided ingredients is listed, sort by the total ingredient count, and then only return the recipes where more than a threshold number of ingredients are present.
I've probably got the threshold bit wrong (I have a sneaky suspicion it'll count the recipe's ingredients and not the user-provided ones), but the rest of the query should be a good start for what you need.

Question: why isn't your query written directly in SQL?
You can optimize by eliminating the recipes that cannot match:
first, eliminate the recipes that have more ingredients than the user has;
then make a recursive greedy pass:
pick the first rid|iid pair;
if the ingredient is among the user's ingredients, continue;
if not, remove all rows with that rid from the Recipe_Ingredients table, giving new_table;
restart using new_table, and stop when the new_table count = 0.
It should give the best results statistically.
Hope it helps.
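On the application side, the same elimination idea can be expressed with sets. The Python sketch below is an alternative to the row-by-row pass described above, not a literal implementation of it, and the recipe data is invented: a recipe is dropped early if it needs more ingredients than the user has, and otherwise kept only if its ingredient set is a subset of the user's.

```python
# Invented sample data: recipe name -> set of required ingredients.
recipes = {
    "Lemon Tart": {"lemon curd", "flour"},
    "Pancakes": {"flour", "egg", "milk"},
    "Risotto": {"rice", "salt"},
}
have = {"flour", "egg", "salt", "milk", "lemon curd"}

makeable = sorted(
    name
    for name, needed in recipes.items()
    # Cheap size check first, then the subset test (needed <= have).
    if len(needed) <= len(have) and needed <= have
)
print(makeable)
```

Set membership tests are hash lookups, so each recipe costs roughly the number of ingredients it needs rather than that number multiplied by the size of the user's list.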

Something like this:
SELECT r.*, COUNT(ri.iid) AS count FROM recipe r
INNER JOIN recipe_ingredients ri ON r.rid = ri.rid
INNER JOIN ingredients i ON i.iid = ri.iid
WHERE i.i_name IN ('milk', 'flour')
GROUP BY r.rid
HAVING count = 2
It's pretty easy to understand: count holds the number of ingredients from the list (milk, flour) that were matched for each recipe. If count equals the number of ingredients in the WHERE clause (in this case 2), the recipe is returned.

SELECT irl.ingredient_amount, r.*, i.thumbnail
FROM recipes r
LEFT JOIN recipe_images i ON (i.recipe_id = r.recipe_id)
LEFT JOIN ingredients_recipes_link irl ON (irl.recipe_id = r.recipe_id)
WHERE irl.recipe_id IN (
    SELECT recipe_id
    FROM `ingredients_recipes_link`
    WHERE ingredient_id IN (24, 21, 22)
    GROUP BY recipe_id
    HAVING COUNT(*) = 3
)
GROUP BY r.recipe_id
