Mysql conditions with grouping many-to-many tables


I was wondering if somebody can think of a more elegant solution to my problem. I have had trouble finding similar cases.
I have 5 tables. 3 are details for employees, skills and subskills. The remaining 2 are linking tables.
skill_links
skill_id  subskill_id
1         4
1         5
2         4
2         6
emp_skill_links
employee_id  subskill_id  acquired
1            4            2013-04-05 00:00:00
1            5            2014-02-24 00:00:00
2            6            2012-02-26 00:00:00
2            5            2011-06-14 00:00:00
Both have many-to-many relations. Skills with subskills (skill_links) and employees with subskills (emp_skill_links).
I want to pick employees who have acquired all subskills for a skill. I tried doing it with one query, but couldn't manage it with the grouping involved. At the moment my solution is two separate queries whose results I match afterwards in a PHP array. That is:
SELECT sl.skill_id, COUNT(sl.subskill_id) as expected
FROM skill_links sl
GROUP BY sl.skill_id
to be compared with:
SELECT sl.skill_id, esl.employee_id, COUNT(esl.subskill_id) as provided
FROM emp_skill_links esl
INNER JOIN skill_links sl
ON sl.subskill_id = esl.subskill_id
GROUP BY sl.skill_id, esl.employee_id
Is there a more efficient single query solution to my problem? Or would it not be worth the complexity involved?

If you consider a query consisting of sub-queries as meeting your requirement for "a more efficient single query solution" (depends on your definition of "single query"), then this will work.
SELECT employeeTable.employee_id
FROM
(SELECT sl.skill_id, COUNT(*) AS subskill_count
FROM skill_links sl
GROUP BY sl.skill_id) skillTable
JOIN
(SELECT esl.employee_id, sl2.skill_id, COUNT(*) AS employee_subskills
FROM emp_skill_links esl
JOIN skill_links sl2 ON esl.subskill_id = sl2.subskill_id
GROUP BY esl.employee_id, sl2.skill_id) employeeTable
ON skillTable.skill_id = employeeTable.skill_id
WHERE employeeTable.employee_subskills = skillTable.subskill_count
What the query does:
Select the count of sub-skills for each skill
Select the count of sub-skills for each employee for each main skill
Join those results based on the main skill
Select the employees from that who have a sub-skill count equal to the count of sub-skills for the main skill
In the demo example, users 1 and 3 each have all sub-skills of main skill 1. User 2 only has 2 of the 3 sub-skills of main skill 2.
You'll note that the logic here is similar to what you're already doing, but it has the advantage of just one db request (instead of two) and it doesn't involve the PHP work of creating, looping through, comparing, and reducing arrays.
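To try the single-query approach against the sample rows from the question, here is a minimal sketch. It assumes an in-memory SQLite database standing in for MySQL, and the function names are made up for illustration. The query is the one above, with skill_id added to the SELECT list so you can see which skill each employee has completed.

```python
import sqlite3


def build_skill_db():
    """Create the two linking tables with the sample rows from the question."""
    conn = sqlite3.connect(":memory:")  # SQLite is a stand-in for MySQL here
    conn.executescript("""
        CREATE TABLE skill_links (skill_id INTEGER, subskill_id INTEGER);
        CREATE TABLE emp_skill_links (
            employee_id INTEGER, subskill_id INTEGER, acquired TEXT);
        INSERT INTO skill_links VALUES (1, 4), (1, 5), (2, 4), (2, 6);
        INSERT INTO emp_skill_links VALUES
            (1, 4, '2013-04-05 00:00:00'),
            (1, 5, '2014-02-24 00:00:00'),
            (2, 6, '2012-02-26 00:00:00'),
            (2, 5, '2011-06-14 00:00:00');
    """)
    return conn


def employees_with_complete_skills(conn):
    """Return (employee_id, skill_id) pairs where every subskill is acquired."""
    return conn.execute("""
        SELECT employeeTable.employee_id, employeeTable.skill_id
        FROM (SELECT sl.skill_id, COUNT(*) AS subskill_count
              FROM skill_links sl
              GROUP BY sl.skill_id) skillTable
        JOIN (SELECT esl.employee_id, sl2.skill_id,
                     COUNT(*) AS employee_subskills
              FROM emp_skill_links esl
              JOIN skill_links sl2 ON esl.subskill_id = sl2.subskill_id
              GROUP BY esl.employee_id, sl2.skill_id) employeeTable
          ON skillTable.skill_id = employeeTable.skill_id
        WHERE employeeTable.employee_subskills = skillTable.subskill_count
        ORDER BY employeeTable.employee_id, employeeTable.skill_id
    """).fetchall()


if __name__ == "__main__":
    print(employees_with_complete_skills(build_skill_db()))
```

With the question's data only employee 1 qualifies, and only for skill 1: employee 2 holds subskills 5 and 6, which is one of two for each skill.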

Related

How to count unique set values in MySQL

I would appreciate your input to help me count unique values for a SET type in MySql. I have a column named "features" defined as a SET field as follows:
CREATE TABLE cars (features SET('power steering', 'power locks', 'satellite radio', 'power windows', 'sat nav', 'turbo'));
As I fill this table, since the features are not mutually exclusive, I will get records which include a combination of 2 or more of these features. For example:
Car 1 has power steering and power windows, but none of the remaining features.
Car 2 has all features.
Car 3 has all features, except sat nav and turbo.
What I want to do is to get a list of all single listed features in the table, including the count of records associated to each in a similar fashion as a SELECT statement using a GROUP BY clause. So, following with the example above, I should be able to get the following result:
features |count
---------------+------
power steering | 3 //All cars have this feature
power locks | 2 //Only cars 2 and 3 have it
satellite radio| 2 //Only cars 2 and 3 have it
power windows | 3
sat nav | 1 //only car 2 has it
turbo | 1 //only car 2 has it
I have tried using the following query with the expectation of obtaining the aforementioned result:
SELECT features, COUNT(features) FROM cars GROUP BY features;
However, instead of what I was expecting, I got the count of each of the existing feature combinations:
features |count
------------------------------------------------+--------
power steering, power windows | 1 //i.e. only 1 car has
| //only these 2 features
| //(car 1 in this example)
|
------------------------------------------------+-------
power steering, power locks, satellite radio, |
power windows, sat nav, turbo | 1
------------------------------------------------+-------
power steering, power locks, satellite radio, |
power windows | 1
So, the question is: Is there a way of obtaining the count of each single feature, as shown in the first table, using one single MySQL query? I could do it by executing one query for each feature, but I'm sure there must be a way of avoiding such hassle. Someone might as well suggest using a different table for the features and joining, but it is not possible at this point without heavily impacting the rest of the project. Thanks in advance!

SELECT set_list.features, COUNT(cars.features) FROM
(SELECT TRIM("'" FROM SUBSTRING_INDEX(SUBSTRING_INDEX(
(SELECT TRIM(')' FROM SUBSTR(column_type, 5)) FROM information_schema.columns
WHERE table_name = 'cars' AND column_name = 'features'),
',', @r:=@r+1), ',', -1)) AS features
FROM (SELECT @r:=0) deriv1,
(SELECT ID FROM information_schema.COLLATIONS) deriv2
HAVING @r <=
(SELECT LENGTH(column_type) - LENGTH(REPLACE(column_type, ',', ''))
FROM information_schema.columns
WHERE table_name = 'cars' AND column_name = 'features')) set_list
LEFT OUTER JOIN cars
ON FIND_IN_SET(set_list.features, cars.features) > 0
GROUP BY set_list.features
Adapted from:
MySQL: Query for list of available options for SET
My query takes the SQL at the above post as the basis, to get a list of the available column values. All of the indented SQL is that one query, if you execute it alone you'll get the list, and I create a result set from it which I call "set_list". I just copied that query as is, but it is basically doing a lot of string manipulation to get the list - as Mike Brant suggested, the code would be far simpler (but maybe just not as dynamic) if you put the list into another table, and just joined that.
I then join set_list back against the cars table, joining each item from set_list against the rows in cars that contain that feature - FIND_IN_SET(). It's an outer join, so if anything from the set list isn't represented, it will be there with a count of zero.

Typically, we use the FIND_IN_SET function.
You could use a query like this to return the specified result:
SELECT f.feature
, COUNT(1)
FROM ( SELECT 'power steering' AS feature
UNION ALL SELECT 'power locks'
UNION ALL SELECT 'satellite radio'
UNION ALL SELECT 'power windows'
UNION ALL SELECT 'sat nav'
UNION ALL SELECT 'turbo'
) f
JOIN cars c
ON FIND_IN_SET(f.feature,c.features)>0
GROUP BY f.feature
ORDER BY f.feature
You could omit >0 and get the same result. This query omits "zero counts": rows with a "feature" that doesn't appear for any car. To get those, you could use an outer join: add the LEFT keyword before JOIN, and rather than COUNT(1) in the SELECT list, use COUNT(expr), where expr is a column from cars that is NOT NULL, or some other expression that will be non-NULL when a matching row is found and NULL when a matching row is not found.
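The outer-join variant described above could look roughly like the sketch below. Several things here are assumptions: SQLite stands in for MySQL, the SET column is modelled as comma-separated text, instr() on comma-wrapped strings emulates FIND_IN_SET (which SQLite does not have), the id column is added so there is a NOT NULL column to count, and 'sunroof' is a hypothetical extra feature included only to show a zero count.

```python
import sqlite3

# The six features from the question plus a hypothetical 'sunroof'
# that no car has, to demonstrate the zero-count row.
FEATURES = ["power steering", "power locks", "satellite radio",
            "power windows", "sat nav", "turbo", "sunroof"]


def build_cars_db():
    """Cars 1 to 3 from the question, features stored as comma-separated text."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cars (id INTEGER PRIMARY KEY, features TEXT)")
    conn.executemany("INSERT INTO cars (features) VALUES (?)", [
        ("power steering,power windows",),
        ("power steering,power locks,satellite radio,"
         "power windows,sat nav,turbo",),
        ("power steering,power locks,satellite radio,power windows",),
    ])
    return conn


def count_features(conn, features=FEATURES):
    """Count cars per feature, keeping features that no car has (count 0)."""
    feature_list = " UNION ALL ".join("SELECT ? AS feature" for _ in features)
    sql = f"""
        SELECT f.feature, COUNT(c.id) AS cars_with_feature
        FROM ({feature_list}) f
        LEFT JOIN cars c
          -- stand-in for MySQL's FIND_IN_SET(f.feature, c.features) > 0
          ON instr(',' || c.features || ',', ',' || f.feature || ',') > 0
        GROUP BY f.feature
    """
    return dict(conn.execute(sql, features).fetchall())


if __name__ == "__main__":
    for feature, count in count_features(build_cars_db()).items():
        print(feature, count)
```

COUNT(c.id) is what makes the zero rows come out as 0: for an unmatched feature the joined c.id is NULL, and COUNT skips NULLs, whereas COUNT(1) would report 1.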

MYSQL count based off of two tables

Sorry for asking this, but I haven't found an answer to what I'm trying to do anywhere!
Basically, I have a database with two tables. Below are two examples I'll use:
Table 1:
Process ID Date
---------- -----------
1 2008/08/21
2 2008/08/23
3 2008/08/21
Table 2:
Process ID Qty
---------- ---
1 1
2 4
3 6
Basically, I want to do something in PHP where I will select table 1, and find all processes that occur today (in this example I'll say the 21st of August). I then want to take those process ids, and match them in Table two and give a count of their quantities.
The end result I'm trying to figure out in this example is how do I get the output to be "7" by using PHP to select the processes that happened today in one table, then add up the corresponding process quantities in another table.

SELECT sum(t2.qty)
FROM table1 t1
JOIN table2 t2 ON t1.pid = t2.pid
WHERE t1.date = '2008/08/21'
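If you want to check the join against the example rows, a small sketch follows. It assumes SQLite in place of MySQL, the pid/date/qty column names used in the query above, and a made-up helper name; the date is passed as a bound parameter rather than concatenated into the SQL.

```python
import sqlite3


def build_process_db():
    """Tables 1 and 2 from the question, using the column names of the query."""
    conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database
    conn.executescript("""
        CREATE TABLE table1 (pid INTEGER, date TEXT);
        CREATE TABLE table2 (pid INTEGER, qty INTEGER);
        INSERT INTO table1 VALUES
            (1, '2008/08/21'), (2, '2008/08/23'), (3, '2008/08/21');
        INSERT INTO table2 VALUES (1, 1), (2, 4), (3, 6);
    """)
    return conn


def total_quantity_for_date(conn, day):
    """Sum the quantities of all processes that occurred on the given day."""
    row = conn.execute("""
        SELECT SUM(t2.qty)
        FROM table1 t1
        JOIN table2 t2 ON t1.pid = t2.pid
        WHERE t1.date = ?
    """, (day,)).fetchone()
    # SUM over zero rows yields NULL, so fall back to 0 in that case.
    return row[0] or 0


if __name__ == "__main__":
    print(total_quantity_for_date(build_process_db(), "2008/08/21"))
```

Processes 1 and 3 fall on the 21st, so the sum is 1 + 6 = 7, the figure the question asks for.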

How to find the top 10 most popular values in a comma separated list with PHP & MYSQL

I am new to all of this and I have Googled and searched on here, but to no avail. Using google and some of the responses here, I've managed to solve a separate problem, but this is what I'm really interested in and am wondering if this is even possible/how to accomplish it.
I have mysql table that looks like this:
id type of game players timestamp
1 poker a,b,c,d,e,f,g,h 2011-10-08 08:00:00
2 fencing i,j,k,l,m,n,o,p 2011-10-08 08:05:00
3 tennis a,e,k,g,p,o,d,z 2011-10-08 08:10:00
4 football x,y,f,b 2011-10-08 08:15:00
There are 7 types of games, and either 4 or 8 players separated by commas for each gametype.
However, the players are IRC nicknames so potentially there could be new players with unique nicknames all the time.
What I am trying to do is look in the players column of the entire table and find the top 10 players in terms of games played, regardless of the gametype, and print it out to a website in this format, e.g.:
Top 10 Players:
a (50 games played)
f (39 games played)
o (20 games played)
......
10 g (2 games played)
Does anyone have any idea how to accomplish this? Any help is appreciated! Honestly, without this website I would not have even come this far in my project!

My suggestion is that you don't keep a list of the players for each game in the same table, but rather implement a relationship between a games table and a players table.
The new model could look like:
TABLE Games:
id type of game timestamp
1 poker 2011-10-08 08:00:00
2 fencing 2011-10-08 08:05:00
3 tennis 2011-10-08 08:10:00
4 football 2011-10-08 08:15:00
TABLE Players:
id name
1 a
2 b
3 c
.. ..
TABLE PlayersInGame:
id idGame idPlayer current
1 1 1 true //Player a is currently playing poker
When a player starts a game, add it to the PlayersInGame table.
When a player exits a game, set the current status to false.
To retrieve the number of games played by a player, query the PlayersInGame table.
SELECT COUNT(*) FROM PlayersInGame WHERE idPlayer=1
For faster processing you need to de-normalize the table (not actually denormalization, but I don't know what else to call it) and keep track of the number of games for each player in the Players table. This would increase the table size but provide better speed.
So insert a games_played column into Players and then query it:
SELECT * FROM Players ORDER BY games_played DESC LIMIT 10
EDIT:
As Ilmari Karonen pointed out, to gain speed from this you must create an INDEX for the column games_played.

Unless you have a huge number of players, you probably don't need the denormalization step suggested at the end of Luchian Grigore's answer. Assuming tables structured as he initially suggests, and an index on PlayersInGame (idPlayer), the following query should be reasonably fast:
SELECT
name,
COUNT(*) AS games_played
FROM
PlayersInGame AS g
JOIN Players AS p ON p.id = g.idPlayer
GROUP BY g.idPlayer
ORDER BY games_played DESC
LIMIT 10
This does require a filesort, but only on the grouped data, so its performance will only depend on the number of players, not the number of games played.
Ps. If you do end up adding an explicit games_played column to the player table, do remember to create an index on it — otherwise the denormalization will gain you nothing.
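For anyone wanting to try the normalized layout on the rows from the question, here is a rough sketch. It assumes SQLite in place of MySQL, skips the Games table and the current column because the count does not need them, and uses made-up function and index names. The extra ORDER BY on name is my addition so that ties come out in a stable order.

```python
import sqlite3

# The four sample rows from the question: (id, type of game, players).
GAMES = [
    (1, "poker", "a,b,c,d,e,f,g,h"),
    (2, "fencing", "i,j,k,l,m,n,o,p"),
    (3, "tennis", "a,e,k,g,p,o,d,z"),
    (4, "football", "x,y,f,b"),
]


def build_normalized_db(games=GAMES):
    """Split the comma-separated player lists into Players / PlayersInGame."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Players (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE PlayersInGame (idGame INTEGER, idPlayer INTEGER);
        CREATE INDEX idx_pig_player ON PlayersInGame (idPlayer);
    """)
    for game_id, _game_type, players in games:
        for name in players.split(","):
            conn.execute(
                "INSERT OR IGNORE INTO Players (name) VALUES (?)", (name,))
            (player_id,) = conn.execute(
                "SELECT id FROM Players WHERE name = ?", (name,)).fetchone()
            conn.execute(
                "INSERT INTO PlayersInGame VALUES (?, ?)", (game_id, player_id))
    return conn


def top_players(conn, limit=10):
    """Return (name, games_played) for the most active players."""
    return conn.execute("""
        SELECT p.name, COUNT(*) AS games_played
        FROM PlayersInGame AS g
        JOIN Players AS p ON p.id = g.idPlayer
        GROUP BY g.idPlayer, p.name
        ORDER BY games_played DESC, p.name
        LIMIT ?
    """, (limit,)).fetchall()


if __name__ == "__main__":
    for rank, (name, played) in enumerate(top_players(build_normalized_db()), 1):
        print(f"{rank} {name} ({played} games played)")
```

The loop in build_normalized_db is also roughly what a one-off migration from the comma-separated column would have to do.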

Output of invoice from number of tables

I have the following tables
customers
cust_id cust_name
1 a company
2 a company 2
3 a company 3
tariffs
tariff_id cost_1 cost_2 cost_3
1 2 0 3
2 1 1 1
3 4 0 0
terminals
term_id cust_id term_number tariff_id
1 1 12345 1
2 1 67890 2
3 2 14324 1
4 3 78788 3
usage
term_ident usage_type usage_amount date
12345 1 20 11/12/2010
67890 2 10 31/12/2010
14324 1 1 01/01/2011
14324 2 5 01/01/2011
78788 1 0 14/01/2011
In real life the tables are quite large - there are 5000 customers, 250 tariffs, 500000 terminals and 5 million usage records.
In the terminals table - term_id, cust_id and tariff_id are all foreign keys. There are no foreign keys in the usage table - this is just raw data imported from a csv file.
There could be terminals in the usage table that do not exist in the terminals table - these can be ignored.
What I need to do is produce an invoice per customer on usage. I only want to include usage between 15/12/2010 and 15/01/2011 - this is the billing period. I need to calculate the line items of the invoice for the usage based on its tariff ... for example: take the first record in the usage table - the cost of usage_1 (for term_id 1) would be 90x2=180, this is because term_ident uses tariff_id number 1.
The output should be as follows
Customer 2
date terminal usage_cost_1 usage_cost_2 usage_cost_3 total cost
01/01/2011 14324 18 0 6 24
I am a competent PHP developer - but only a beginner with SQL. What I need some advice on is the most efficient process for producing the invoices - perhaps there is an SQL query that would help me before the processing in PHP starts to calculate the costs - or perhaps the SQL statement could produce the costs too ? Any advice is welcome ....
Edit:
1) There is something currently running this process - it's written in C++ and takes around 24 hours to process this ... I do not have access to the source.
2) I am using Doctrine in Symfony - I'm not sure how helpful Doctrine will be, as retrieving data as Objects is only going to slow down the process - and I'm not sure if the use of Objects will help too much here ?
Edit @13:54 ->
Had the usage table specified incorrectly ... Sorry !
I have to map the usage_type to a cost on the specific tariff for each terminal ie usage_type of 1 = cost_1 in appropriate tariff ... I guess that makes it slightly more complicated ?

There you go, should take less than 24 hours ;)
SELECT u.date, u.term_ident terminal,
(ta.cost_1 * u.usage_1) usage_cost_1,
(ta.cost_2 * u.usage_2) usage_cost_2,
(ta.cost_3 * u.usage_3) usage_cost_3,
(ta.cost_1 * u.usage_1 + ta.cost_2 * u.usage_2 + ta.cost_3 * u.usage_3) total_cost
FROM `usage` u
INNER JOIN terminals te ON te.term_number = u.term_ident
INNER JOIN tariffs ta ON ta.tariff_id = te.tariff_id
INNER JOIN customers c ON c.cust_id = te.cust_id
WHERE u.date BETWEEN '2010-12-15' AND '2011-01-15'
AND c.cust_id = 2
This query is only for the customer with cust_id = 2. If you want a result for the whole dataset, just remove the condition.
Update
It's not that trivial with your new requirements. You could transform the usage table to the new one you posted before.
To make a decision in SELECT queries you can do something like this. But this is not the result you expect. It could be used to create the transformed new usage table.
SELECT u.date, u.term_ident terminal,
CASE u.usage_type
WHEN 1 THEN ta.cost_1 * u.usage_amount
WHEN 2 THEN ta.cost_2 * u.usage_amount
WHEN 3 THEN ta.cost_3 * u.usage_amount
END AS usage_cost
FROM `usage` u
INNER JOIN terminals te ON te.term_number = u.term_ident
INNER JOIN tariffs ta ON ta.tariff_id = te.tariff_id
INNER JOIN customers c ON c.cust_id = te.cust_id
WHERE u.date BETWEEN '2010-12-15' AND '2011-01-15'
AND c.cust_id = 2
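One way to get from the per-row CASE to the invoice layout in the question is conditional aggregation: wrap one CASE per cost column in SUM and group by date and terminal. This is a different technique from the query above, sketched under assumptions: SQLite stands in for MySQL, the usage table has the usage_type / usage_amount columns from the edited question, dates are stored as ISO yyyy-mm-dd text so that BETWEEN compares correctly, and the function names are made up.

```python
import sqlite3


def build_billing_db():
    """Sample tariffs, terminals and usage rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tariffs (tariff_id INTEGER, cost_1 INTEGER,
                              cost_2 INTEGER, cost_3 INTEGER);
        CREATE TABLE terminals (term_id INTEGER, cust_id INTEGER,
                                term_number INTEGER, tariff_id INTEGER);
        CREATE TABLE `usage` (term_ident INTEGER, usage_type INTEGER,
                              usage_amount INTEGER, date TEXT);
        INSERT INTO tariffs VALUES (1, 2, 0, 3), (2, 1, 1, 1), (3, 4, 0, 0);
        INSERT INTO terminals VALUES
            (1, 1, 12345, 1), (2, 1, 67890, 2),
            (3, 2, 14324, 1), (4, 3, 78788, 3);
        INSERT INTO `usage` VALUES
            (12345, 1, 20, '2010-12-11'), (67890, 2, 10, '2010-12-31'),
            (14324, 1, 1, '2011-01-01'), (14324, 2, 5, '2011-01-01'),
            (78788, 1, 0, '2011-01-14');
    """)
    return conn


def invoice_lines(conn, cust_id, start="2010-12-15", end="2011-01-15"):
    """One row per date and terminal with a cost column per usage type."""
    return conn.execute("""
        SELECT u.date, u.term_ident AS terminal,
          SUM(CASE WHEN u.usage_type = 1
                   THEN ta.cost_1 * u.usage_amount ELSE 0 END) AS usage_cost_1,
          SUM(CASE WHEN u.usage_type = 2
                   THEN ta.cost_2 * u.usage_amount ELSE 0 END) AS usage_cost_2,
          SUM(CASE WHEN u.usage_type = 3
                   THEN ta.cost_3 * u.usage_amount ELSE 0 END) AS usage_cost_3,
          SUM(CASE u.usage_type WHEN 1 THEN ta.cost_1
                                WHEN 2 THEN ta.cost_2
                                WHEN 3 THEN ta.cost_3
                                ELSE 0 END * u.usage_amount) AS total_cost
        FROM `usage` u
        INNER JOIN terminals te ON te.term_number = u.term_ident
        INNER JOIN tariffs ta ON ta.tariff_id = te.tariff_id
        WHERE u.date BETWEEN ? AND ? AND te.cust_id = ?
        GROUP BY u.date, u.term_ident
        ORDER BY u.date, u.term_ident
    """, (start, end, cust_id)).fetchall()


if __name__ == "__main__":
    print(invoice_lines(build_billing_db(), 2))
```

The inner joins drop usage rows whose terminal is not in the terminals table, which matches the "these can be ignored" requirement. The customers table is left out because terminals already carries cust_id.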

Best way to tell 3 or more consecutive records missing

I'm implementing an achievement system. One of the "badges" I'm trying to create will determine:
If a user has joined in at least one coding challenge
Then hasn't joined in 3 consecutive coding challenges
Then started participating again.
The badge is simply called "I'll be back" ;-)
The tables
users
==================
id fullname
1 Gary Green
challenge
==================================
id name start_date
1 challenge1 01-AUG-2010
2 challenge2 03-AUG-2010
3 challenge3 06-SEP-2010
4 challenge4 07-SEP-2010
5 challenge5 30-OCT-2010
6 challenge6 05-NOV-2010
entries
====================================================
id challengeid userid type code
1 1 1 1 -
2 2 1 1 -
3 6 1 1 -
4 6 1 2 -
The "type" in the entries table refers to if the entry type is either a non-regex based entry or regex based one. A user can submit both a regex and non-regex entry, therefore the above entry for challenge 6 is valid.
Example output
This is the style output of the query I would like (in this case the badge should be awarded):
(for userid 1)
Challenge 1 --> Joined in
Challenge 2 --> Joined in
Challenge 3 --> NULL
Challenge 4 --> NULL
Challenge 5 --> NULL
Challenge 6 --> Joined in
How?
Here are my questions
What's the best way to do this in a query?
Is there a function I can use in MySQL to SELECT this range without resorting to some PHP?
The query so far
I'm doing a LEFT OUTER JOIN to join the challenge table and entries table (LEFT OUTER to make sure to preserve the challenges the user has not joined in), then sort by challenge start_date to see if the user has not joined in for 3 or more consecutive challenges.
SELECT challenge.id AS challenge_id, entries.id AS entry_id
FROM challenge_entries entries
LEFT OUTER JOIN challenge_details challenge
ON entries.challengeid = challenge.id
WHERE entries.userid = <user_id>
GROUP BY entries.challengeid
ORDER BY challenge.start_date
important edit: for this badge to make sense the criteria will need to be 3 or more consecutive challenges sandwiched between challenges that were joined in i.e. like the example output above. Otherwise anyone who joins in a challenge for the first time will automatically receive the badge. The user has to be seen to have been "away" from participating in challenges for a while (>=3)

I think you need to use the other table to start from...
SELECT challenge.id AS challenge_id,
entries.id AS entry_id
FROM
challenge_details challenge
LEFT JOIN challenge_entries entries ON entries.challengeid = challenge.id and entries.userid = <user_id>
ORDER BY challenge.start_date
adding a group by can be done as you want...
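To go from that ordered list to the badge decision, one option is to scan the rows in application code and look for a run of three or more missed challenges that has a joined challenge on both sides. The sketch below assumes SQLite in place of MySQL, ISO-formatted start dates so that ORDER BY sorts chronologically, and made-up function names. It does the scan outside SQL, so it does not answer the "pure MySQL" part of the question.

```python
import sqlite3


def build_challenge_db():
    """Challenges and entries from the question (the code column is omitted)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE challenge_details (
            id INTEGER PRIMARY KEY, name TEXT, start_date TEXT);
        CREATE TABLE challenge_entries (
            id INTEGER PRIMARY KEY, challengeid INTEGER,
            userid INTEGER, type INTEGER);
        INSERT INTO challenge_details VALUES
            (1, 'challenge1', '2010-08-01'), (2, 'challenge2', '2010-08-03'),
            (3, 'challenge3', '2010-09-06'), (4, 'challenge4', '2010-09-07'),
            (5, 'challenge5', '2010-10-30'), (6, 'challenge6', '2010-11-05');
        INSERT INTO challenge_entries VALUES
            (1, 1, 1, 1), (2, 2, 1, 1), (3, 6, 1, 1), (4, 6, 1, 2);
    """)
    return conn


def participation(conn, user_id):
    """Return (challenge_id, joined) for every challenge, oldest first."""
    rows = conn.execute("""
        SELECT c.id, COUNT(e.id) > 0 AS joined
        FROM challenge_details c
        LEFT JOIN challenge_entries e
          ON e.challengeid = c.id AND e.userid = ?
        GROUP BY c.id, c.start_date
        ORDER BY c.start_date
    """, (user_id,)).fetchall()
    return [(challenge_id, bool(joined)) for challenge_id, joined in rows]


def earned_ill_be_back(flags, min_gap=3):
    """True if min_gap or more misses sit between two joined challenges."""
    seen_joined = False
    gap = 0
    for joined in flags:
        if joined:
            if seen_joined and gap >= min_gap:
                return True
            seen_joined = True
            gap = 0
        elif seen_joined:
            gap += 1  # only count misses after the first participation
    return False


if __name__ == "__main__":
    flags = [joined for _, joined in participation(build_challenge_db(), 1)]
    print(earned_ill_be_back(flags))
```

Grouping by challenge collapses the two entries for challenge 6 (regex and non-regex) into a single "joined" row, and misses before the first participation are never counted, so a first-time entrant does not get the badge.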
