I have a user table with a membergroupids column. The table looks like this:
userid membergroupids
1 1,2
2 2,3
3 2,3,4
and I want to use SQL to output a result like this:
membergroupid count
1 1
2 3
3 2
4 1
I tried running SELECT membergroupids FROM user and then looping through the result in PHP to get the counts. That works with a small user table, but mine is really big and the SELECT query alone takes more than a minute to finish. Is there a better way to do this?
There is a much better way to do it. Your tables need to be normalized:
Instead of
userid membergroupids
1 1,2
2 2,3
3 2,3,4
It needs to be
userid membergroupids
1 1
1 2
2 2
2 3
3 2
3 3
3 4
From here, it's a simple query to get the number of users in each group (assuming this table is called your_table):
select membergroupids as membergroupid, count(userid) as numberofusers
from your_table
group by membergroupids
order by membergroupids
The real problem, then, is getting your tables normalized. If your membergroupids only go up to 9, then you could use like '%1%' to find all userids with membergroupid 1. But once you reach 10, it can no longer distinguish between 1 and 10. And sadly, you can't count on the commas to help you distinguish, because the first and last numbers in each list are not surrounded by commas.
unless...
Create new field with group ids encapsulated by commas
You could create a new field and populate it with membergroupids surrounded by commas, using concat (check your database's docs). Something along these lines:
update your_table set temp=concat(',', membergroupids, ',');
This could give you a table structure like so:
userid membergroupids temp
1 1,2 ,1,2,
2 2,3 ,2,3,
3 2,3,4 ,2,3,4,
Now you can pick out individual member group ids in the new field, i.e. where temp like '%,1,%' finds the userids with membergroupid 1, because every id is now enclosed by commas. From here you can build your new normalized table, which I'll call user_member.
Insert membergroupid 1:
insert into user_member (userid,membergroupid) select userid,'1' from your_table where temp like '%,1,%';
You could make a php script that loops through all the membergroupids.
Keep in mind that like %...% is not very efficient, so don't even think about relying on this to do your count. It'll work, but it's not scalable. It would be much better to use this to build the normalized table.
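As a rough sketch of the one-off migration described above, the split can also be done in application code rather than with one like query per group id. The example below uses Python with an in-memory SQLite database instead of PHP and MySQL, which is an assumption made only so that it is self-contained; the table and column names follow the question and the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (userid INTEGER PRIMARY KEY, membergroupids TEXT);
    INSERT INTO user VALUES (1, '1,2'), (2, '2,3'), (3, '2,3,4');
    CREATE TABLE user_member (
        userid        INTEGER NOT NULL,
        membergroupid INTEGER NOT NULL,
        PRIMARY KEY (userid, membergroupid)
    );
""")

# One-off migration: turn each comma-separated list into one row per (user, group).
rows = conn.execute("SELECT userid, membergroupids FROM user").fetchall()
pairs = [
    (userid, int(group_id))
    for userid, id_list in rows
    for group_id in id_list.split(",")
    if group_id.strip()
]
conn.executemany("INSERT INTO user_member VALUES (?, ?)", pairs)

# With the normalized table, the count is a plain GROUP BY.
counts = conn.execute("""
    SELECT membergroupid, COUNT(*)
    FROM user_member
    GROUP BY membergroupid
    ORDER BY membergroupid
""").fetchall()
print(counts)  # [(1, 1), (2, 3), (3, 2), (4, 1)]
```

On a really big table the rows would be read and inserted in batches, but the shape of the migration stays the same.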
This is easy to do IF the data is stored with one membergroupid per row:
SELECT `membergroupids`, COUNT(`membergroupids`) AS CountOfMembergroupids
FROM `TBL_TEST01`
GROUP BY `membergroupids`
ORDER BY `membergroupids`
Since you mentioned that you have to process a large amount of data, I'd strongly suggest revising your table structure along those lines.
I have a table of hashtags that looks something like this:
TABLE: hashtags
hashtag_id hashtag_name - UNIQUE
1 apples
2 oranges
3 life
And a table of topics that have had these hashtags tagged to them like so:
TABLE: topic_hashtags, unique index on hashtag_id and topic_id
topic_hashtag_id hashtag_id topic_id
1 1 2
2 3 18
3 3 30
4 2 15
I've been trying (without success) to write a MySQL statement that does the following, in pseudocode:
INSERT INTO topic_hashtags (hashtag_id, topic_id)
VALUES(SELECT LAST_INSERT_ID(INSERT INTO hashtags (hashtag_name) VALUES('somehashtag')
ON DUPLICATE KEY SELECT hashtag_id FROM hashtags WHERE hashtag_name = 'somehashtag'), 'x');
Basically I need to get the ID of the hashtag name; if it does not exist, insert it and get the last inserted id; then use these values to record it in the topic_hashtags table. The thing is, ON DUPLICATE KEY doesn't allow you to select a column instead. Is there another way I can do this, or is it just faster to use multiple queries? (I feel like it's not, though.) I really don't want to have to do:
SELECT hashtag_id FROM hashtags WHERE hashtag_name = 'somehashtag'
if (mysqli_num_rows($result) == 1):
use the id and insert
else:
insert
mysqli_insert_id
use that id and insert
endif;
You can't do all of that in one query. Just use multiple queries. It will not create excessive load because modern databases and file systems are excellent at caching.
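A minimal sketch of the multi-query approach, assuming Python with an in-memory SQLite database rather than PHP and MySQL so that it runs on its own. The helper name tag_topic is hypothetical. INSERT OR IGNORE is SQLite syntax; MySQL spells the same idea INSERT IGNORE. The sequence is: try the insert, read the id back by name, then insert the link row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hashtags (
        hashtag_id   INTEGER PRIMARY KEY,
        hashtag_name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE topic_hashtags (
        topic_hashtag_id INTEGER PRIMARY KEY,
        hashtag_id       INTEGER NOT NULL,
        topic_id         INTEGER NOT NULL,
        UNIQUE (hashtag_id, topic_id)
    );
    INSERT INTO hashtags (hashtag_name) VALUES ('apples'), ('oranges'), ('life');
""")

def tag_topic(conn, name, topic_id):
    """Hypothetical helper: link topic_id to the hashtag `name`, creating it if needed."""
    with conn:  # run the three statements in one transaction
        # 1. Create the hashtag unless the UNIQUE constraint says it already exists.
        conn.execute("INSERT OR IGNORE INTO hashtags (hashtag_name) VALUES (?)", (name,))
        # 2. Read the id back by name; this works for both new and existing rows.
        (hashtag_id,) = conn.execute(
            "SELECT hashtag_id FROM hashtags WHERE hashtag_name = ?", (name,)
        ).fetchone()
        # 3. Record the link; the unique index makes a repeat call a no-op.
        conn.execute(
            "INSERT OR IGNORE INTO topic_hashtags (hashtag_id, topic_id) VALUES (?, ?)",
            (hashtag_id, topic_id),
        )
    return hashtag_id

existing_id = tag_topic(conn, "life", 42)        # hashtag already present
created_id = tag_topic(conn, "somehashtag", 42)  # hashtag created on the fly
```

Reading the id back by name avoids branching on whether the insert happened, which is what the pseudocode in the question was trying to get away from.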
I have a one-to-many relationship of rooms and their occupants:
Room | User
1 | 1
1 | 2
1 | 4
2 | 1
2 | 2
2 | 3
2 | 5
3 | 1
3 | 3
Given a list of users, e.g. 1, 3, what is the most efficient way to determine which room is completely/perfectly filled by them? In this case it should return room 3 because, although they are both in room 2 as well, room 2 has other occupants too, which is not a "perfect" fit.
I can think of several solutions to this, but am not sure about the efficiency. For example, I can do a group concatenate on the user (ordered ascending) grouping by room, which will give me comma separated strings such as "1,2,4", "1,2,3,5" and "1,3". I can then order my input list ascending and look for a perfect match to "1,3".
Or I can do a count of the total number of users in a room AND containing both users 1 and 3. I will then select the room which has the count of users equal to two.
Note that I want the most efficient way, or at least a way that scales up to millions of users and rooms. Each room will have around 25 users. Another thing I want to consider is how to pass this list to the database. Should I construct a query by concatenating AND userid = 1 AND userid = 3 AND userid = 5 and so on? Or is there a way to pass the values as an array into a stored procedure?
Any help would be appreciated.
For example, I can do a group concatenate on the user (ordered ascending) grouping by room, which will give me comma separated strings such as "1,2,4", "1,2,3,5" and "1,3". I can then order my input list ascending and look for a perfect match to "1,3".
First, a word of advice, to improve your level of function as a developer. Stop thinking of the data, and of the solution, in terms of CSVs. It limits you to thinking in spreadsheet terms, and prevents you from thinking in Relational Data terms. You do not need to construct strings and then match strings: when the data is in the database, you can match it there.
Solution
Now then, in Relational data terms, what exactly do you want? You want the rooms where the count of users that match your argument user list is highest. Is that correct? If so, the code is simple.
You haven't given the tables. I will assume room, user, room_user, with deadly ids on the first two, and a composite key on the third. I can give you the SQL solution, you will have to work out how to do it in the non-SQL.
Another thing I want to consider is how to pass this list to the database. Should I construct a query by concatenating AND userid = 1 AND userid = 3 AND userid = 5 and so on? Or is there a way to pass the values as an array into a stored procedure?
To pass the list to the stored proc, because it needs a single calling parm, the length of which is variable, you have to create a CSV list of users. Let's call that parm @user_list. (Note, that is not contemplating the data as CSV, that is passing a list to a proc in a single parm, because you can't pass an unknown number of identified users to a proc otherwise.)
Since you constructed the @user_list on the client, you may as well compute @user_count (the number of members in the list) while you are at it, on the client, and pass that to the proc.
Something like:
CREATE PROC room_user_match_sp (
@user_list CHAR(255),
@user_count INT
...
)
AS
-- validate parms, etc
...
-- note: a CSV held in a single variable is compared as ONE value by IN ( ),
-- so the list has to be expanded first (dynamic SQL or a split)
SELECT room_id,
match_count,
match_count * 100.0 / @user_count AS match_pct -- 100.0 avoids integer division
FROM (
SELECT room_id,
COUNT(user_id) AS match_count -- no of users matched
FROM room_user
WHERE user_id IN ( @user_list )
GROUP BY room_id -- get one row per room
) AS match_room -- has any matched users
ORDER BY match_count DESC -- best matches first (an aggregate such as MAX() is not allowed in WHERE)
It is not clear whether you want full matches only. In that case, filter the outer query:
WHERE match_count = @user_count
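Note that a filter on the matched count alone returns every room that contains all the listed users, including rooms that also hold other occupants (room 2 in the question). A hedged sketch of the "perfect fit" variant follows, using Python and SQLite rather than a stored procedure. The function name exact_rooms is hypothetical, and the list is passed as bound parameters instead of a CSV string. The idea is to require both the room's total occupant count and its matched count to equal the size of the list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE room_user (
        room_id INTEGER NOT NULL,
        user_id INTEGER NOT NULL,
        PRIMARY KEY (room_id, user_id)
    )
""")
conn.executemany(
    "INSERT INTO room_user VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 4), (2, 1), (2, 2), (2, 3), (2, 5), (3, 1), (3, 3)],
)

def exact_rooms(conn, user_ids):
    """Hypothetical helper: rooms whose occupants are exactly user_ids."""
    ids = sorted(set(user_ids))
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))  # one bound parameter per user id
    sql = f"""
        SELECT room_id
        FROM room_user
        GROUP BY room_id
        HAVING COUNT(*) = ?                                     -- nobody else in the room
           AND SUM(CASE WHEN user_id IN ({placeholders})
                        THEN 1 ELSE 0 END) = ?                  -- every listed user present
        ORDER BY room_id
    """
    return [row[0] for row in conn.execute(sql, [len(ids), *ids, len(ids)])]

print(exact_rooms(conn, [1, 3]))  # [3]
```

This form scans every room, so at the scale mentioned in the question it would probably need to be restricted first to the rooms of one listed user; that refinement is left out of the sketch.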
Expectation
You have asked for a proc-based solution, so I have given that. Yes, it is the fastest. But keep in mind that for this kind of requirement and solution, you could construct the SQL string on the client, and execute it on the "server" in the usual manner, without using a proc. The proc is faster here only because the code is compiled and that step is removed, as opposed to that step being performed every time the client calls the "server" with the SQL string.
The point I am making here is that, with the data in a reasonably Relational form, you can obtain the result you are seeking using a single SELECT statement. You don't have to mess around with work tables or temp tables or intermediate steps, which require a proc. Here, the proc is not required; you are implementing a proc for performance reasons.
I make this point because it is clear from your question that your expectation of the solution is "gee, I can't get the result directly, I have to work with the data first, I am ready and willing to do that". Such intermediate work steps are required only when the data is not Relational.
Maybe not the most efficient SQL, but something like:
SELECT x.room_id,
SUM(x.occupants) AS occupants,
SUM(x.selectees) AS selectees,
SUM(x.selectees) / SUM(x.occupants) as percentage
FROM ( SELECT room_id,
COUNT(user_id) AS occupants,
NULL AS selectees
FROM Rooms
GROUP BY room_id
UNION ALL
SELECT room_id,
NULL AS occupants,
COUNT(user_id) AS selectees
FROM Rooms
WHERE user_id IN (1,3)
GROUP BY room_id
) x
GROUP BY x.room_id
ORDER BY percentage DESC
This will give you a list of rooms ordered by the "best fit" percentage, i.e. it works out a fulfilment ratio from the number of people in the room and the number of people from your set who are in the room.
I have a small problem with a PHP/MySQL query and am looking for help.
I have a family tree table where, for each person, I store his/her ancestors' ids separated by commas, like so:
id ancestors
10 1,3,4,5
So the person of id 10 is fathered by id 5 who is fathered by id 4 who is fathered by 3 etc...
Now I wish to select all the people who have id x in their ancestors, so the query will be something like:
select * from people where ancestors like '%x%'
Now this would work fine, except that if id x is, let's say, 2 and a record has an ancestor id 32, this like query will retrieve that record because 32 contains 2. And if I use '%,x,%' (including the commas), the query will ignore the records where ancestor x is on either edge (left or right) of the column. It will also ignore the records where x is the only ancestor, since no commas are present.
So in short, I need a like query that looks up an expression that is either surrounded by commas or not surrounded by anything, or a regular expression query that matches x only when no digits are adjacent to it. And I need it to be as efficient as possible (I suck at writing regular expressions).
Thank you.
Edit: Okay guys, help me come up with a better schema.
You are not storing your data in a proper way. Anyway, if you still want to use this schema you should use FIND_IN_SET instead of LIKE to avoid undesired results.
SELECT *
FROM mytable
WHERE FIND_IN_SET(2, ancestors) <> 0
You should consider redesigning your database structure. Add new table "ancestors" to database with columns:
id id_person ancestor
1 10 1
2 10 3
3 10 4
After that, use a JOIN query with "WHERE ... IN" on the ancestor column to choose the right rows.
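A small sketch of that redesign, using Python and SQLite so that it is self-contained. Two details are assumptions of the example, not part of the answer: a composite primary key is used instead of the separate id column, and the index name and sample people are made up. Because each ancestor is its own row, looking for ancestor 2 can no longer hit ancestor 32.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE ancestors (
        id_person INTEGER NOT NULL,
        ancestor  INTEGER NOT NULL,
        PRIMARY KEY (id_person, ancestor)
    );
    -- index so that "who descends from x" does not scan the whole table
    CREATE INDEX ancestors_by_ancestor ON ancestors (ancestor);

    INSERT INTO people VALUES (10, 'person 10'), (11, 'person 11');
    INSERT INTO ancestors VALUES (10, 1), (10, 3), (10, 4), (10, 5), (11, 32);
""")

def descendants_of(conn, ancestor_id):
    """Ids of all people who have ancestor_id among their ancestors."""
    rows = conn.execute("""
        SELECT p.id
        FROM people AS p
        JOIN ancestors AS a ON a.id_person = p.id
        WHERE a.ancestor = ?
        ORDER BY p.id
    """, (ancestor_id,))
    return [row[0] for row in rows]

print(descendants_of(conn, 3))  # [10]
print(descendants_of(conn, 2))  # [] (32 is no longer mistaken for 2)
```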
You're having this issue because of the wrong database design. First, relational databases aren't really meant for this kind of data; graph databases are more likely to fit this kind of problem.
If it contains only a small amount of data you could use MySQL, but the design is still wrong. If you only care about each person's father, then just add a column to the person table (or whatever you call it): if it is null, the father is unknown or there is none; otherwise it contains the (int) id of the parent.
In case you need more than just the 'father' relationship, you could use a pivot table that holds the relationship between two persons, but that's not a simple task.
There are a few established ways of storing hierarchical data in RDBMS. I've found this slideshow to be very helpful in the past:
Models for Hierarchical Design
Since the data deals with ancestry - and therefore you wouldn't expect it to change that often - a closure table could fit the bill.
Whatever model you choose, be sure to look around and see if someone else has already implemented it.
You could store your values as a JSON array:
id | ancestors
10 | ["1","3","4","5"]
and then query as follows:
$query = 'select * from people where ancestors like \'%"x"%\'';
A better option, of course, is a mapping table for your many-to-many relation.
You can do this with regexp:
SELECT * FROM mytable WHERE ancestors REGEXP '(^|,)x(,|$)'
where x is your searched value
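A candidate pattern is easy to check against a few sample lists before running it on the table. The sketch below uses Python's re module and the pattern (^|,)x(,|$), which only accepts x when it is bounded by a comma or by the start or end of the string. A pattern in which the commas are optional would also accept 32 when searching for 2. The helper name has_ancestor is hypothetical.

```python
import re

def has_ancestor(ancestors, x):
    """True if the comma-separated list `ancestors` contains the id x as a whole item."""
    pattern = r"(^|,)" + re.escape(str(x)) + r"(,|$)"
    return re.search(pattern, ancestors) is not None

samples = ["2", "2,3", "1,2", "1,2,3", "32", "1,32,4", "12"]
matches = [s for s in samples if has_ancestor(s, 2)]
print(matches)  # ['2', '2,3', '1,2', '1,2,3']
```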
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,ancestors VARCHAR(250) NOT NULL
);
INSERT INTO my_table VALUES(10,',1,3,4,5');
SELECT *
FROM my_table
WHERE CONCAT(ancestors,',') LIKE '%,5,%';
+----+-----------+
| id | ancestors |
+----+-----------+
| 10 | ,1,3,4,5 |
+----+-----------+
SELECT *
FROM my_table
WHERE CONCAT(ancestors,',') LIKE '%,4,%';
+----+-----------+
| id | ancestors |
+----+-----------+
| 10 | ,1,3,4,5 |
+----+-----------+
This is an example of my table:
| id | class | group | name |
| 5 | 1 | A | XX |
| 19 | 1 | B | XX |
| 12 | 2 | A | XX |
| 28 | 2 | B | XX |
| 8 | 3 | A | XX |
| 50 | 3 | B | XX |
It has about 30 rows per class and group. What I'm trying to do is to fetch all data after the row | 12 | 2 | A | XX |. Can't just state "where class > 2" since there are still some rows with class and group 2A that I need to be in the select.
Is there a way to do that, either in the SELECT or maybe with a fetch() argument in PHP and MySQL?
Thanks!
Try this:
SELECT * FROM `table`
WHERE
CONCAT(`CLASS`, `GROUP`, `NAME`) >= '2AMarcus'
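One caveat with comparing a concatenated string: the comparison is textual, so a class of 10 would sort before class 2. A hedged alternative is to compare the columns one at a time. The sketch below uses Python and SQLite; the table name roster, the column name grp (instead of group), and the sample names are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE roster (id INTEGER PRIMARY KEY, class INTEGER, grp TEXT, name TEXT)"
)
conn.executemany(
    "INSERT INTO roster VALUES (?, ?, ?, ?)",
    [
        (5, 1, "A", "Zed"), (19, 1, "B", "Amy"),
        (7, 2, "A", "Bob"), (12, 2, "A", "Marcus"), (13, 2, "A", "Nina"),
        (28, 2, "B", "Al"), (8, 3, "A", "Cy"), (50, 3, "B", "Di"),
        (60, 10, "A", "Eve"),
    ],
)

def rows_from(conn, class_, grp, name):
    """Ids of all rows at or after (class_, grp, name) in (class, grp, name) order."""
    sql = """
        SELECT id FROM roster
        WHERE class > ?
           OR (class = ? AND (grp > ? OR (grp = ? AND name >= ?)))
        ORDER BY class, grp, name
    """
    return [row[0] for row in conn.execute(sql, (class_, class_, grp, grp, name))]

ids = rows_from(conn, 2, "A", "Marcus")
print(ids)  # [12, 13, 28, 8, 50, 60]

# The concatenated form would drop the class 10 row, because the strings compare textually:
print("10AEve" >= "2AMarcus")  # False
```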
Select all ids and loop through them creating a comma-delimited list in PHP of the ids after 12 is found. Then do your select where id in ().
Or
Create the list of ids to exclude until 12 is found. Then do select where id not in ().
It looks like you need to do some work on normalizing your tables before worrying about the SQL statements.
If you need the rows after Class 2 Group A Name Marcus, that tells me that something happened in real life at that point in time, i.e. an event. So I would add a new column with a timestamp, or some other data describing that event, and then go back to the SQL and use that new column in the appropriate SELECT / WHERE.
I am quite new to PHP and MySQL, but have experience of VBA and C++. In short, I am trying to count the occurrences of a value (text string), which can appear in any of 14 columns in my table.
I think I will need to populate a single-dimensional array from this table, but the table has 14 columns (named 'player1' to 'player14'). I want each of these 'players' to be entered into the one-dimensional array (if not NULL), before proceeding to the next row.
I know there is the SELECT DISTINCT statement in MySQL, but can I use this to count distinct occurrences across 14 columns?
For background, I am building a football results database, where player1 to player14 are the starting 11 (and 3 subs), and my PHP code will count the number of times a player has made an appearance.
Thanks for all your help!
Matt.
Rethink your database schema. Try this:
Table players:
player_id
name
Table games:
game_id
Table appearances:
appearance_id
player_id
game_id
This reduces the amount of duplicate data. Read up on normalization. It allows you to do a simple select count(*) from appearances inner join players using (player_id) where name='Joe Schmoe'
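A minimal sketch of that schema and the count query, run against an in-memory SQLite database from Python so that it is self-contained. The sample players and game ids are made up, and the UNIQUE constraint on (player_id, game_id) is an extra assumption to stop a player being recorded twice for one game.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE players (player_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE games   (game_id   INTEGER PRIMARY KEY);
    CREATE TABLE appearances (
        appearance_id INTEGER PRIMARY KEY,
        player_id     INTEGER NOT NULL REFERENCES players (player_id),
        game_id       INTEGER NOT NULL REFERENCES games (game_id),
        UNIQUE (player_id, game_id)
    );

    INSERT INTO players VALUES (1, 'Joe Schmoe'), (2, 'Jane Doe');
    INSERT INTO games VALUES (100), (101), (102);
    INSERT INTO appearances (player_id, game_id)
    VALUES (1, 100), (1, 101), (1, 102), (2, 101);
""")

def appearance_count(conn, name):
    """Number of games in which the named player appeared."""
    (count,) = conn.execute("""
        SELECT COUNT(*)
        FROM appearances
        INNER JOIN players USING (player_id)
        WHERE name = ?
    """, (name,)).fetchone()
    return count

print(appearance_count(conn, "Joe Schmoe"))  # 3
```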
First of all, the database schema you're using is terrible, and you just found out a reason why.
That being said, I see no other way than to first get a list of all players by selecting the distinct player names into an array. Before each insertion, you would have to check whether the name is already in the array (if it is, don't add it again).
Then, when you have the list of names, you would have to run an SQL statement for each player, adding up the number of occurrences, like so:
SELECT COUNT(*)
FROM <Table>
WHERE player1=? OR player2=? OR player3=? OR ... OR player14 = ?
That is all pretty complicated, and as I said, you should really change your database schema.
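If the schema has to stay as it is for now, one possible shortcut is to "unpivot" the 14 columns with UNION ALL and count in a single query instead of one query per player. This is a different technique from the per-player loop described above. It is sketched in Python and SQLite; the table name results, the helper add_game, and the sample names are assumptions.

```python
import sqlite3

COLUMNS = [f"player{i}" for i in range(1, 15)]  # player1 .. player14

conn = sqlite3.connect(":memory:")
column_defs = ", ".join(f"{col} TEXT" for col in COLUMNS)
conn.execute(f"CREATE TABLE results (game_id INTEGER PRIMARY KEY, {column_defs})")

def add_game(conn, game_id, players):
    """Hypothetical helper: store one game, padding unused player slots with NULL."""
    padded = list(players) + [None] * (len(COLUMNS) - len(players))
    placeholders = ", ".join("?" * (len(COLUMNS) + 1))
    conn.execute(f"INSERT INTO results VALUES ({placeholders})", [game_id, *padded])

add_game(conn, 1, ["Ann", "Bob", "Cat"])
add_game(conn, 2, ["Bob", "Cat"])
add_game(conn, 3, ["Cat"])

# Stack the 14 columns into one "player" column, then count per name.
unpivot = " UNION ALL ".join(f"SELECT {col} AS player FROM results" for col in COLUMNS)
appearances = dict(conn.execute(f"""
    SELECT player, COUNT(*)
    FROM ({unpivot})
    WHERE player IS NOT NULL
    GROUP BY player
"""))
print(appearances)
```

UNION ALL (rather than UNION) matters here, because duplicates are exactly what is being counted.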
This sounds like a job for fetch_assoc (http://php.net/manual/de/mysqli-result.fetch-assoc.php).
If you use mysqli, you would get each row as an associative array.
On the other hand the table design seems a bit flawed, as suggested before.
Suppose you had one table team, with the team name and whatnot, and one table player, with the player names:
TEAM
| id | name | founded | foo |
PLAYER
| id | team_id | name | bar |
With that structure you could add 14 players, which point at the same team and by joining the two tables, extract the players that match your search.