This is an expansion of my original question located here:
How do I pull all rows from a table with one unique field and specific values for another field?
I have a table with two fields: user_id and skill_id.
I want to pull out all rows that have certain skill_id values, but I have a large number of skill_ids to search for (~30). I was using the self-join suggestion from the question linked above, but with so many skills to look for, that query is proving extremely slow.
How can I look for a large number of skill_ids without bogging down the query?
EDIT:
Here's an example of what I'm looking for. Using the table below, I want to pull out all rows of users that have skill_id of 10 AND 11 AND 12, etc. (except I'd be looking for more like 30 skills at a time).
TABLE
user_id | skill_id
=====================
1 | 10
1 | 11
1 | 12
1 | 13
2 | 10
2 | 12
2 | 13
3 | 15
3 | 16
4 | 10
5 | 45
5 | 46
If I understand your question correctly, the query below might help. It assumes (user_id, skill_id) is UNIQUE or the primary key.
SELECT user_id
FROM tab
WHERE skill_id IN (30 entries)
GROUP BY user_id
HAVING SUM(skill_id IN (30 entries)) = 30;
You can test it here: http://www.sqlfiddle.com/#!2/f73dfe/1/0
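For instance, with the sample table above and the four skills 10, 11, 12 and 13, a concrete version of the same query would be (only user 1 has all four):
SELECT user_id
FROM tab
WHERE skill_id IN (10, 11, 12, 13)
GROUP BY user_id
HAVING SUM(skill_id IN (10, 11, 12, 13)) = 4;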
select user_id
from tab
where skill_id IN (10, 11, 12, ...)
Make sure skill_id is indexed.
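For example (a sketch, reusing tab as the placeholder table name from the answer above):
-- covers the IN filter; including user_id makes the lookup index-only
CREATE INDEX idx_skill_user ON tab (skill_id, user_id);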
Related
I have a codescore table which will soon cross 500,000 rows of dummy data. The table is used to hold the points scored by players in a game. Following are the columns present in the table.
questionid bigint,
playerid varchar(20),
playerscore float,
questionlevel enum('EASY', 'MODERATE', 'HARD'),
questionstatus enum('PENDING','SOLVED'),
lastmodified datetime,
created datetime
Primary key (questionid, playerid)
Sample data:
+------------+----------+-------------+----------------+
| questionid | playerid | playerscore | questionstatus |
+------------+----------+-------------+----------------+
|          1 |        1 |           5 | PENDING        |
|          1 |        2 |          10 | SOLVED         |
|          3 |        1 |          10 | SOLVED         |
|          2 |        3 |          10 | SOLVED         |
+------------+----------+-------------+----------------+
Each questionid has some points associated with it. When a player solves the problem correctly, he is awarded the points, an entry is made in the codescore table, and the questionstatus for that entry is set to SOLVED. If the player has only partially solved a problem, then part of the total points is awarded and an entry is made with questionstatus set to PENDING.
A player can solve a given problem any number of times, as long as the status is PENDING, to improve his score. Once the status changes to SOLVED, no further score update is done, though the player is still allowed to solve the problem for practice.
Now the main part of the problem:
Find top 10 players (order by score in decreasing order) for each questionlevel (EASY, MODERATE, HARD)
Using the following query to find top 10 players for the HARD level:
SELECT playerid, SUM(playerscore) AS score FROM codescore
WHERE questionlevel = 'HARD'
GROUP BY playerid
ORDER BY score DESC
LIMIT 10;
Is the above query going to perform well once the table crosses 500K or 1 million rows in a production environment?
Is there any better solution for this type of problem or a better query that can be used to accomplish this task?
[Backend: using MySQL and CodeIgniter]
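For what it's worth, the first thing usually worth trying for a query like this is a covering index, so the scan never has to touch the base rows. A sketch, assuming the table really is named codescore as above:
-- questionlevel is filtered by equality, playerid drives the GROUP BY,
-- and playerscore is included so SUM(playerscore) can be computed from the index alone
ALTER TABLE codescore ADD INDEX idx_level_player_score (questionlevel, playerid, playerscore);
The ORDER BY score DESC LIMIT 10 still has to aggregate every player at that level before sorting, so benchmark it at 500K to 1M rows, but the index keeps the whole query off the table itself.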
I have a table with scores like this:
score | user
-------------------
2 | Mark
4 | Alex
3 | John
2 | Elliot
10 | Joe
5 | Dude
The table is gigantic in reality and the real scores go from 1 to 25.
I need this:
range | counts
-------------------
1-2 | 2
3-4 | 2
5-6 | 1
7-8 | 0
9-10 | 1
I've found some MySQL solutions, but they seemed pretty complex; some of them even suggested UNION, and performance is very important. As mentioned, the table is huge.
So I thought: why not simply run a query like this:
SELECT COUNT(*) as counts FROM score_table GROUP BY score
I get this:
score | counts
-------------------
1 | 0
2 | 2
3 | 1
4 | 1
5 | 1
6 | 0
7 | 0
8 | 0
9 | 0
10 | 1
And then with PHP, sum the count of scores of the specific ranges?
Is this even worse for performance or is there a simple solution that I am missing?
Or you could probably even make a JavaScript solution...
Your solution:
SELECT score, COUNT(*) as counts
FROM score_table
GROUP BY score
ORDER BY score;
However, this will not return rows with a count of 0. As long as every score occurs at least once, the full list of scores is not an issue; you just won't get counts of zero.
You can do what you want with something like:
select (case when score between 1 and 2 then '1-2'
when score between 3 and 4 then '3-4'
. . .
end) as scorerange, count(*) as count
from score_table
group by scorerange
order by min(score);
There is no reason to do additional processing in PHP. This type of query is quite typical for SQL.
EDIT:
According to the MySQL documentation, you can use a column alias in the group by. Here is the exact quote:
An alias can be used in a query select list to give a column a
different name. You can use the alias in GROUP BY, ORDER BY, or HAVING
clauses to refer to the column:
SELECT
  SUM(CASE WHEN score BETWEEN 1 AND 2 THEN 1 ELSE 0 END) AS `1-2`,
  SUM(CASE WHEN score BETWEEN 3 AND 4 THEN 1 ELSE 0 END) AS `3-4`
  -- ...one SUM(CASE ...) per range
FROM score_table
Honestly, I can't tell you whether this is faster than passing "SELECT COUNT(*) as counts FROM score_table GROUP BY score" to PHP and letting PHP handle it... but it adds a level of flexibility to your setup. Create a three-column table as 'group_ID', 'score', 'range' and insert values into it to get your groupings right:
1,1,1-2
1,2,1-2
1,3,3-4
1,4,3-4
etc...
Join to it on score and group by range. The addition of the group_ID column allows you to set up multiple groupings: maybe group 1 breaks scores into pairs, while group_ID = 2 uses five-score ranges (or whatever you might want).
I find that a lookup table like this is decently fast, requires little code change, and can readily be extended if you need additional groupings or if the groupings change (if you do the groupings in code, the entire CASE section needs to be redone to change the groupings even slightly).
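A rough sketch of that setup (all names here are illustrative; the score table is assumed to be called score_table, as in the question):
-- lookup table; group_ID lets several alternative groupings live side by side
CREATE TABLE score_ranges (
  group_ID INT,
  score    INT,
  `range`  VARCHAR(10)
);

INSERT INTO score_ranges VALUES
  (1, 1, '1-2'), (1, 2, '1-2'),
  (1, 3, '3-4'), (1, 4, '3-4');
  -- ...and so on up to score 25

SELECT r.`range`, COUNT(*) AS counts
FROM score_table s
JOIN score_ranges r
  ON r.score = s.score
 AND r.group_ID = 1
GROUP BY r.`range`
ORDER BY MIN(s.score);
Swapping group_ID = 1 for another group gives you a different bucketing without touching the query or the data.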
How about this:
select concat((score + (1 * (score mod 2))) - 1, '-', (score + (1 * (score mod 2)))) as score
     , count(*)
from TBL1
group by (score + (1 * (score mod 2)))
You can see it working in this fiddle: http://sqlfiddle.com/#!2/215839/6
For the input
score | user
-------------------
2 | Mark
4 | Alex
3 | John
2 | Elliot
10 | Joe
5 | Dude
It generates this:
range | counts
-------------------
1-2 | 2
3-4 | 2
5-6 | 1
9-10 | 1
If you want a simple but very powerful solution, add an extra field to your table and store a range value for each score, so that 1 and 2 get the value 1, 3 and 4 get the value 2, and so on. You can then group by that value. The only cost is that you have to fill in the extra field whenever you insert a score. Your table would look like this:
score | user | range
--------------------------
2 | Mark | 1
4 | Alex | 2
3 | John | 2
2 | Elliot | 1
10 | Joe | 5
5 | Dude | 3
Now you can do:
select count(score), `range` from score_table group by `range`;
This is generally faster if your application reads far more often than it writes.
When inserting, compute the range like this:
$scoreRange = 2;
$range = ceil($score/$scoreRange);
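The same value can also be computed in SQL at insert time; a hypothetical insert (table and column names assumed from the examples above, with `range` backticked to stay clear of keywords):
INSERT INTO score_table (score, `user`, `range`)
VALUES (7, 'Anna', CEIL(7 / 2));  -- CEIL(7 / 2) = 4, i.e. the 7-8 bucket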
I have a table which contains a standard auto-incrementing ID, a type identifier, a number, and some other irrelevant fields. When I insert a new object into this table, the number should auto-increment based on the type identifier.
Here is an example of how the output should look:
id type_id number
1 1 1
2 1 2
3 2 1
4 1 3
5 3 1
6 3 2
7 1 4
8 2 2
As you can see, every time I insert a new object, the number increments according to the type_id (i.e. if I insert an object with type_id of 1 and there are 5 objects matching this type_id already, the number on the new object should be 6).
I'm trying to find a performant way of doing this with huge concurrency. For example, there might be 300 inserts within the same second for the same type_id and they need to be handled sequentially.
Methods I've tried already:
PHP
This was a bad idea but I've added it for completeness. A request was made to get the MAX() number for the item type and then insert that number + 1 as part of the insert. This is quick but doesn't work under concurrency, as there could be 200 inserts between the MAX() request and that particular insert, leading to multiple objects with the same number and type_id.
Locking
Manually locking and unlocking the table before and after each insert in order to maintain the increment. This caused performance issues due to the number of concurrent inserts and because the table is constantly read from throughout the app.
Transaction with Subquery
This is how I'm currently doing it but it still causes massive performance issues:
START TRANSACTION;
INSERT INTO objects (type_id,number) VALUES ($type_id, (SELECT COALESCE(MAX(number),0)+1 FROM objects WHERE type_id = $type_id FOR UPDATE));
COMMIT;
Another downside of this approach is that I need a follow-up query to get the number that was just added (i.e. searching for an object with that $type_id ordered by number DESC so I can see the number that was created; this is done per $user_id, so it works, but it adds an extra query which I'd like to avoid).
Triggers
I looked into using a trigger to add the number dynamically upon insert, but this wasn't performant as I need to query the table I'm inserting into (which isn't allowed directly, so it has to be done via a subquery, causing performance issues).
Grouped Auto-Increment
I've had a look at grouped auto-increment (so that the number would auto-increment based on type_id) but then I lose my auto-increment ID.
Does anybody have any ideas on how I can make this performant at the level of concurrent inserts that I need? My table is currently InnoDB on MySQL 5.5
Appreciate any help!
Update: Just in case it is relevant, the objects table has several million objects in it. Some of the type_id can have around 500,000 objects assigned to them.
Use a transaction and SELECT ... FOR UPDATE. This will resolve the concurrency conflicts.
For the Transaction with Subquery approach, try adding an index on the type_id column. I think an index on type_id will speed up your subquery.
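A sketch of that index (assuming the table is named objects, as in the question); making it a composite index on (type_id, number) also lets MAX(number) for a given type be read straight off the end of the index:
ALTER TABLE objects ADD INDEX idx_type_number (type_id, number);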
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,type_id INT NOT NULL
);
INSERT INTO my_table VALUES
(1,1),(2,1),(3,2),(4,1),(5,3),(6,3),(7,1),(8,2);
SELECT x.*
, COUNT(*) rank
FROM my_table x
JOIN my_table y
ON y.type_id = x.type_id
AND y.id <= x.id
GROUP
BY id
ORDER
BY type_id
, rank;
+----+---------+------+
| id | type_id | rank |
+----+---------+------+
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 4 | 1 | 3 |
| 7 | 1 | 4 |
| 3 | 2 | 1 |
| 8 | 2 | 2 |
| 5 | 3 | 1 |
| 6 | 3 | 2 |
+----+---------+------+
or, if performance is an issue, just do the same thing with a couple of user variables (@vars).
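For reference, the @variable version might look roughly like this (a sketch only; it relies on the per-row evaluation order of user variables, which MySQL 5.x tolerates in practice but never formally guarantees):
SELECT id, type_id,
       @rank := IF(@prev_type = type_id, @rank + 1, 1) AS rank,
       @prev_type := type_id AS prev_type
FROM (SELECT id, type_id FROM my_table ORDER BY type_id, id) x
CROSS JOIN (SELECT @rank := 0, @prev_type := NULL) vars;
-- the derived table fixes the row order, so the running rank resets whenever type_id changes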
Perhaps an idea: create a (temporary) table for all rows with a common type_id.
In that table you can use auto-increment for your num column.
Your num should then be fully reliable.
Then you can select your data and update your first table.
I need some help, please:
I have this table, which has these fields with their respective values:
agency_id | hostess_id
3 | 12-4-6
5 | 19-4-7
1 | 1
The hostess_id column stores all the hostess ids associated with that agency_id, separated by a "-".
Well, I log in as a hostess and I have id = 4.
I need to retrieve all the agency_id values whose hostess_id contains the id 4. I can't do this with the LIKE operator. I tried saving the hostess_id row to an array and then imploding it, but I couldn't solve it that way either.
Please, please any idea?
You should change your database design. What you are describing is a typical N:N relation
Agencies:
agency_id | name
3 | Miami
5 | Annapolis
1 | New York
Hosteses
Hostes_id | name
4 | Helen
12 | May
19 | June
AgencyHostes
Hostes_id | agency_id
4 | 1
4 | 3
4 | 5
12 | 1
12 | 3
19 | 1
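With the AgencyHostes junction table in place, the lookup you ask for becomes a trivial indexed query:
-- all agencies that hostess 4 is associated with
SELECT agency_id
FROM AgencyHostes
WHERE Hostes_id = 4;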
First, let me say that I absolutely agree with @JvdBerg that this is terrible database design that needs to be normalized.
Let's assume for a minute, though, that you have no way of changing the database layout and must solve this with SQL. An inefficient but working solution would be
select agency_id from tablename where
hostess_id = '4' OR
hostess_id LIKE '4-%' OR
hostess_id LIKE '%-4-%' OR
hostess_id LIKE '%-4'
if you were searching for all agencies with hostess id 4. I built this on sqlfiddle to illustrate more thoroughly: http://sqlfiddle.com/#!2/09a52/1
Mind, though, that this SQL statement is hard to optimize, since an index structure for substring matching is rarely available. For very short id lists it will work okay. If you have ANY chance of changing the table structure, normalize your schema as @JvdBerg suggested and look up database design and normal forms on Google.
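If the layout really cannot change, FIND_IN_SET is another option worth knowing about; it is still unindexable, but it avoids juggling multiple LIKE patterns. A sketch, using the same tablename placeholder as above:
-- REPLACE turns the dash-separated list into the comma-separated form FIND_IN_SET expects
SELECT agency_id
FROM tablename
WHERE FIND_IN_SET('4', REPLACE(hostess_id, '-', ',')) > 0;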
Ok, so I have an organization id column named org_id in several databases.
I am writing a search function that will query two of these databases, and look for all org_id's associated with a value in these two tables.
I ONLY want to ultimately have the org_id's that correspond with the values I am querying for in EACH table.
For example:
Say I have an org_id of 3 that is in the tables cult_xref and cat_xref, which are both associated with an organization table, which isn't really relevant for this. I want to ONLY pull those org_id's which are in BOTH tables cult_xref and cat_xref, based on values I put into those tables, say 2 and 6 respectively.
So:
cult_xref
org_id | cult_id
3 | 2
4 | 2
3 | 5
and
cat_xref
org_id | cat_id
3 | 6
3 | 1
7 | 6
I would only want to pull the org_id's that fulfill cult_id['2'] and cat_id['6'] at the SAME TIME.
In SQL, this is called a JOIN.
SELECT org_id
FROM cult_xref INNER JOIN cat_xref USING (org_id)
WHERE (cult_id,cat_id) = (2,6)
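One small caveat: if an org can match the same cult_id or cat_id more than once, the join will return duplicate org_id values; adding DISTINCT avoids that:
SELECT DISTINCT org_id
FROM cult_xref INNER JOIN cat_xref USING (org_id)
WHERE (cult_id, cat_id) = (2, 6);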
For a nice introduction to joins, see A Visual Explanation of SQL Joins