I am an amateur programmer facing a problem for which I am sure an algorithm, or at least a procedure (maybe a recursive one), exists, but I cannot figure it out by myself.
Problem:
There are several groups, and a member can belong to more than one group. I want to find out which groups can meet on the same day, i.e. which groups share no more than two members (conflicts) with each other. In a loop I compare the groups pairwise and count the conflicts, i.e. the members that are in both groups. I save the result in a database table that looks like this:
id | group1 | group2 | conflicts
---+--------+--------+----------
1  | G1     | G2     | 3
2  | G1     | G3     | 1
3  | G1     | G4     | 2
4  | G2     | G3     | 6
5  | G2     | G4     | 1
6  | G3     | G4     | 6
I now want to know which groups can meet on the same day. The criterion is that there are no more than two conflicts.
The problem now is: I select G1 and find that G1 and G2 cannot meet, because they have more than 2 conflicts. G1 and G3 can meet, because they have only 1 conflict. G1 and G4 can meet, because they have 2 conflicts, which is not more than 2.
So the intermediate result is: G1, G3 and G4 are possible. But now I also have to check whether G3 and G4 can meet, and as a human being I can tell from the table that this is not possible, because there are 6 conflicts between G3 and G4. In addition, it would be better to let G2 and G4 meet, because they have only 1 conflict.
So the best result would be:
Day 1 - G1 and G3 meet
Day 2 - G2 and G4 meet
Is there a way to find that "best" result with a program in PHP? I somehow have to make some kind of "chained" check, starting with the values in group1 and group2, then taking the value in group2 as the new group1 value, and so on... I can't find an elegant solution for that. To be honest, up to now I have not found any programmable solution at all.
Thank you for your help!
I already tried nested loops, but I never find the best result.
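The check described above can be sketched as a brute-force search. This is only an illustration, written in Python rather than PHP, and it assumes exactly two groups meet per day and that the number of groups is even; the names `pairings` and `best_schedule` are made up for this example. It tries every way of splitting the groups into pairs, discards plans containing a pair with more than 2 conflicts, and keeps the plan with the fewest conflicts in total.

```python
def pairings(groups):
    """Yield every way to split an even-length list into unordered pairs."""
    if not groups:
        yield []
        return
    first, rest = groups[0], groups[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def best_schedule(groups, conflicts, limit=2):
    """Return the valid pairing with the fewest total conflicts, or None."""
    best, best_cost = None, None
    for plan in pairings(sorted(groups)):
        costs = [conflicts[frozenset(pair)] for pair in plan]
        if any(cost > limit for cost in costs):
            continue  # some pair of groups shares too many members
        total = sum(costs)
        if best_cost is None or total < best_cost:
            best, best_cost = plan, total
    return best


# Conflict counts copied from the table in the question.
conflicts = {
    frozenset(("G1", "G2")): 3,
    frozenset(("G1", "G3")): 1,
    frozenset(("G1", "G4")): 2,
    frozenset(("G2", "G3")): 6,
    frozenset(("G2", "G4")): 1,
    frozenset(("G3", "G4")): 6,
}

schedule = best_schedule(["G1", "G2", "G3", "G4"], conflicts)
for day, (a, b) in enumerate(schedule, start=1):
    print(f"Day {day} - {a} and {b} meet")
```

The number of pairings grows very quickly with the number of groups, so this exhaustive approach is probably only practical for small inputs.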
I have a dataset of "collections" or let's call them groups or wishlists...
a collection is a list of items
collectionId | itemdId
---------------------------------
123 | 2345
123 | 3465
123 | 876
123 | 567
123 | 980
777 | 980
777 | 332
777 | 3465
777 | 876
777 | 678
777 | 567
you see items 876 and 980 are included in both collections (777 and 123), so they are a popular couple/pair
my users generate these collections, and I'm curious to extract two insights:
which are the most common items (this is easy)
which are the most common pairs/couple (or more than 2) of items (this is my question)
eg.
say many wish-lists contain iphones and pink iphone covers
among other accessories, but i want to extract the fact that iphone +
pink iphone cover is a common recurring "couple"
all in all, basically i'm trying to do what Amazon does: if you look at an iphone i want to suggest a pink iphone cover, because many other users have saved/favorited that together
Do I have to compare the similarity between collections first, to see how many items they have in common, and then rate the similarity with an index?
what is the best approach to this with mysql?
do i need PHP as well?
UPDATE:
in PHP I would probably do something loopy, like this pseudocode:
for each collection:
select all items from collection 1
select all items from collection 2
do array_intersect(c1, c2)
store the matching items
repeat...
select all items from collection 3
do array_intersect(c1, c3)
store the matching items
repeat...
...then select all items from collection 2 and repeat all the iterations..
For two collections you can do a self-join:
select a.itemID
from my_table a
join my_table b on a.itemID = b.itemID
where a.collection = 123
and b.collection = 777
For all collections you can try a cartesian product of the table with itself (two copies of the table for pairs, three copies for groups of three, and so on):
select a.itemID
from my_table a
cross join my_table b
where a.itemID = b.itemID
and a.collection <> b.collection
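To get at the "most common pairs" part of the question, one option is to count, for every pair of items, how many collections contain both. Below is a minimal sketch of that idea, written in Python rather than PHP/MySQL for brevity; the rows are the sample data from the question and `pair_counts` is a hypothetical helper name.

```python
from collections import Counter
from itertools import combinations

# (collectionId, itemId) rows copied from the question.
rows = [
    (123, 2345), (123, 3465), (123, 876), (123, 567), (123, 980),
    (777, 980), (777, 332), (777, 3465), (777, 876), (777, 678), (777, 567),
]


def pair_counts(rows):
    """Count in how many collections each unordered item pair appears."""
    items_by_collection = {}
    for collection_id, item_id in rows:
        items_by_collection.setdefault(collection_id, set()).add(item_id)
    counts = Counter()
    for items in items_by_collection.values():
        # sorted() makes (876, 980) and (980, 876) the same key.
        counts.update(combinations(sorted(items), 2))
    return counts


counts = pair_counts(rows)
# Pairs that occur together in at least two collections.
common = {pair for pair, n in counts.items() if n >= 2}
for pair, n in counts.most_common(3):
    print(pair, n)
```

The same counting can also be expressed in SQL with a self-join on the collection column and a GROUP BY on the two item columns; the in-memory version is shown only because it is easy to run and check.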
First of all: Sorry for the long post, I am trying to explain a hard situation in an easy way and, at the same time, trying to give as much information as I can.
I have an algorithm that tries to determine user expectation during a search. There are a couple of ways I can use it and I have the same problem with both of them, so let's say I use it for disambiguation. Take a db structure like this one (or any other that does the job):
post
ID | TITLE
---+----------------------------------------------
1 | Orange developed the first 7G phone
2 | Orange: the fruit of gods
3 | Theory of Colors: Orange
4 | How to prepare the perfect orange juice
keywords
ID | WORD | ABOUT
---+----------+---------
1 | orange | company
2 | orange | fruit
3 | orange | color
post_keywords
ID | POST | KEYWORD
---+-------+---------
1 | 1 | 1
2 | 2 | 2
3 | 3 | 3
4 | 4 | 2
If a user searches for the word "orange" in a search box, the algorithm sees that orange may refer to the company, the color, or the fruit and, by answering a couple of questions, it tries to determine which one the user is looking for. After all that I get an array like this one:
$e = array(
'fruit' => 0.153257,
'color' => 0.182332,
'company' => 0.428191,
);
At this point I know the user is probably looking for information about the fruit (because the fruit's value is closest to 0), and if I am wrong my second bet goes to the color. At the bottom of the list is the company.
So, with a Join and ORDER BY FIELD(keywords.id, 2,3,1) I can give the results the (almost) perfect order:
- Orange: the fruit of gods
- How to prepare the perfect orange juice
- Theory of Colors: Orange
- Orange developed the first 7G phone
Well... as you can imagine, I wouldn't come for help if everything were so nice. The problem is that in the previous example we have only 4 possible results, so if the user really was looking for the company he can find it in the 4th position and everything is okay. But if we have 200 posts about the fruit and 100 posts about the color, the first post about the company comes in 301st position.
I am looking for a way to alternate the order (in a predictable and repeatable way), now that I know the user is most likely looking for the fruit, followed by the color, with the company at the end. I want to be able to show a post about the fruit in the first position (and possibly the second), followed by a post about the color, followed by one about the company, and then start this cycle again until the results end.
Edit: I'll be happy with a MySQL trick or with an idea to change the approach, but I can't accept third-party solutions.
You can use user variables to provide a custom sort field:
SELECT
p.*,
CASE k.about
WHEN 'company' THEN @sort_company := @sort_company + 1
WHEN 'color' THEN @sort_color := @sort_color + 1
WHEN 'fruit' THEN @sort_fruit := @sort_fruit + 1
ELSE NULL
END AS sort_order,
k.about
FROM post p
JOIN post_keywords pk ON (p.id = pk.post)
JOIN keywords k ON (pk.keyword = k.id)
JOIN (SELECT @sort_fruit := 0, @sort_color := 0, @sort_company := 0) AS vars
ORDER BY sort_order, FIELD(k.id, 2, 3, 1)
Result will look like this:
| id | title | sort_order | about |
|---:|:----------------------------------------|-----------:|:--------|
| 2 | Orange: the fruit of gods | 1 | fruit |
| 3 | Theory of Colors: Orange | 1 | color |
| 1 | Orange developed the first 7G phone | 1 | company |
| 4 | How to prepare the perfect orange juice | 2 | fruit |
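The same round-robin ordering can also be done in application code after fetching the rows. The following is a rough sketch in Python rather than PHP or MySQL, assuming each row carries its `about` category; `interleave` is a made-up helper name. Each post gets its running position within its own category, and the list is then sorted by that position first and by category preference second.

```python
from collections import defaultdict


def interleave(posts, scores):
    """Order posts round-robin by category; lower score = preferred category."""
    preference = sorted(scores, key=scores.get)
    position = {about: i for i, about in enumerate(preference)}
    seen = defaultdict(int)
    keyed = []
    for post in posts:
        seen[post["about"]] += 1  # 1st fruit post, 2nd fruit post, ...
        keyed.append((seen[post["about"]], position[post["about"]], post))
    keyed.sort(key=lambda entry: (entry[0], entry[1]))
    return [post for _, _, post in keyed]


# Posts and scores copied from the question.
posts = [
    {"id": 1, "title": "Orange developed the first 7G phone", "about": "company"},
    {"id": 2, "title": "Orange: the fruit of gods", "about": "fruit"},
    {"id": 3, "title": "Theory of Colors: Orange", "about": "color"},
    {"id": 4, "title": "How to prepare the perfect orange juice", "about": "fruit"},
]
scores = {"fruit": 0.153257, "color": 0.182332, "company": 0.428191}

ordered = interleave(posts, scores)
for post in ordered:
    print(post["id"], post["about"], post["title"])
```

Because the sort is stable and depends only on the input order and the scores, the result is repeatable for the same query.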
I think you do need some way of categorizing, or, I would prefer to say, clustering the answers. If you can do this, you can then start by showing the users the top scoring answer from each cluster. Hey, sometimes maximising diversity really is worth doing just for its own sake!
I think you should be able to cluster answers. You have some sort of scoring formula which tells you how good an answer a document is to a user query, perhaps based on a "bag of words" model. I suggest that you use this to tell how close one document is to another document by treating the other document as a query. If you do exactly this you might want to treat each document as a query with the other as an answer and average the two scores, so that the score d(a, b) has the property that d(a, b) = d(b, a).
Now you have a score (unfortunately probably not a distance: that is, with a score, high values mean close together) and you need a clustering algorithm. Ideally you want a fast one, but maybe it just has to be fast enough to be faster than a human reading through the answers.
One fast clustering algorithm is to keep track of N (for some parameter N) cluster centres. Initialise these to the first N documents retrieved, then consider every other document one at a time. At each stage you are trying to reduce the maximum score found between any two documents in the cluster centre (which amounts to getting the documents as far apart as possible). When you consider a new document, compute the score between that document and each of the N current cluster centres. If the maximum of these scores is less than the current maximum score between the N current cluster centres, then this document is further away from the cluster centres than they are from each other so you want it. Swap it with one of the N cluster centres - whichever one makes that maximum score between the new N cluster centres the least.
This isn't a perfect clustering algorithm - for one thing, the result depends on the order in which documents are presented, which is a bad sign. It is, however, reasonably fast for small N, and it has one nice property: if you have k <= N clusters, and (switching from scores to distances) every distance within a cluster is smaller than every distance between two points from different clusters, then the N cluster centres at the end will include at least one point from each of the k clusters. The first time you see a member of a cluster you haven't seen before, it will become a cluster centre, and you will never reduce the number of clusters represented among the centres, because you would be ejecting a point which is in a different cluster from the other centres, which won't increase the minimum distance between any two points held as cluster centres (reduce the maximum score between any two such points).
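A minimal sketch of the centre-swapping procedure described above, in Python. It assumes a symmetric `score(a, b)` function where higher means more similar; the function names and the toy data (integers, with negative distance as the score) are made up for illustration and are not from the original answer.

```python
from itertools import combinations


def max_pair_score(centres, score):
    """Highest similarity between any two of the current centres."""
    return max(score(a, b) for a, b in combinations(centres, 2))


def pick_diverse_centres(docs, n, score):
    """Keep n centres, swapping in documents that lower the max pairwise score."""
    centres = list(docs[:n])
    if len(centres) < 2:
        return centres
    for doc in docs[n:]:
        current = max_pair_score(centres, score)
        # Skip documents that are at least as close to some centre
        # as the centres already are to each other.
        if max(score(doc, c) for c in centres) >= current:
            continue
        best_index, best = None, current
        for i in range(len(centres)):
            trial = centres[:i] + [doc] + centres[i + 1:]
            trial_max = max_pair_score(trial, score)
            if trial_max < best:
                best_index, best = i, trial_max
        if best_index is not None:
            centres[best_index] = doc
    return centres


def similarity(a, b):
    """Toy score: numbers that are closer together count as more similar."""
    return -abs(a - b)


# Three obvious clusters: around 10, around 50, and 90.
docs = [10, 11, 12, 50, 51, 90]
centres = pick_diverse_centres(docs, 3, similarity)
print(sorted(centres))
```

Each new document costs on the order of N^3 score evaluations here, which is one reason the approach is only attractive for small N.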
I have a "users" table like this:
+-----+------------+---------+---------------+
| uid | first_name | surname | email         |
+-----+------------+---------+---------------+
|   1 | joe        | bloggs  | joe@test.com  |
|   2 | bill       | bloggs  | bill@test.com |
|   3 | john       | bloggs  | john@test.com |
|   4 | karl       | bloggs  | karl@test.com |
+-----+------------+---------+---------------+
and a "connections" table like this:
+----+---------+----------+--------+
| id | user_id | user_id2 | status |
+----+---------+----------+--------+
|  1 |       1 |        3 |      1 |
|  2 |       3 |        1 |      1 |
|  3 |       4 |        3 |      1 |
|  4 |       3 |        4 |      1 |
|  5 |       2 |        3 |      1 |
|  6 |       3 |        2 |      1 |
+----+---------+----------+--------+
Here id is auto_increment, and a user's id is saved in either user_id or user_id2. Status 1 means the connection is approved and active.
Now I want to send an email alert to users with profile suggestions, like Facebook or LinkedIn do. I assume it is possible to get the mutual connections between users, but I am not sure how to do it. I have tried, but the result is not perfect. I want to get each user together with their suggested connection profiles in one MySQL query. Any idea how to do this?
Many thanks in advance!
Such algorithms are never perfect: you can never know exactly if two people know each other. People might live in the same building, go to the same work, have 100 friends in common and even share the same hobbies without knowing each other (of course the odds are not that great).
What social networks do exactly is of course unknown (that's part of the way they make money). But some aspects are known. For instance, the number of mutual friends is important (together with, for instance, location, interests, hobbies, education, work, surname,...).
Based on what you provide, one can more or less only use the number of mutual friends. This can be done using the following query:
SELECT a.user_id, b.user_id2, count(*) -- Select the two ids and count the number of transitive relations
FROM connections as a, connections as b -- Use the table twice (transitivity)
WHERE a.user_id2 = b.user_id -- Transitivity constraint
AND a.user_id < b.user_id2 -- Maintain strict ordering (can be dropped when checked)
AND a.status = 1 -- First connection must be confirmed.
AND b.status = 1 -- Second connection must be confirmed.
AND NOT EXISTS ( -- Not yet friends
SELECT *
FROM connections as c
WHERE c.user_id = a.user_id
AND c.user_id2 = b.user_id2
)
GROUP BY a.user_id, b.user_id2 -- Make sure we count them correctly.
As you can see here, the fiddle calculates that (1,2), (1,4) and (2,4) are not yet friends, and all have one mutual friend.
Once the number of mutual friends surpasses a certain threshold, one can propose friendship.
I would however advise you to make your table more compact: add a CHECK to the table such that user_id is always strictly less than user_id2 (CHECK(user_id < user_id2)). This makes the database more compact and, for most database implementations, faster as well, and the queries become simpler. What, after all, is the difference between (1,3,1) and (3,1,1)?
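The mutual-friends query can be tried out against the sample "connections" data without a MySQL server. The sketch below uses Python's built-in sqlite3 module as a stand-in, so it only demonstrates the logic; SQLite and MySQL differ in details. The explicit JOIN syntax, the ORDER BY, and the `threshold` variable are additions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE connections (
    id INTEGER PRIMARY KEY,
    user_id INTEGER,
    user_id2 INTEGER,
    status INTEGER
);
INSERT INTO connections (user_id, user_id2, status) VALUES
    (1, 3, 1), (3, 1, 1), (4, 3, 1), (3, 4, 1), (2, 3, 1), (3, 2, 1);
""")

MUTUAL_FRIENDS = """
SELECT a.user_id, b.user_id2, COUNT(*) AS mutual
FROM connections AS a
JOIN connections AS b ON a.user_id2 = b.user_id   -- friend of a friend
WHERE a.user_id < b.user_id2                      -- report each pair once
  AND a.status = 1
  AND b.status = 1
  AND NOT EXISTS (                                -- not connected yet
      SELECT 1 FROM connections AS c
      WHERE c.user_id = a.user_id AND c.user_id2 = b.user_id2
  )
GROUP BY a.user_id, b.user_id2
ORDER BY a.user_id, b.user_id2
"""

suggestions = conn.execute(MUTUAL_FRIENDS).fetchall()

# Hypothetical cut-off: suggest a pair once they share this many friends.
threshold = 1
to_suggest = [(u, v) for u, v, mutual in suggestions if mutual >= threshold]
print(to_suggest)
```

With only four users every candidate pair shares exactly one friend (user 3), so a threshold of 1 suggests all three pairs; on real data a higher cut-off would likely be more useful.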
I have a problem trying to apply rules about direct matches in a football [soccer] app. I have read this thread and it was very helpful for creating the standings table ordered by points, goal difference and goals scored.
But I would like to know if it is possible to order the teams' positions by direct matches.
Look at this standings table:
Pos Team Pld W D L F A GD Pts
1 FC Barcelona 5 2 3 0 8 5 3 9
2 **Inter Milan** 6 2 2 2 11 10 1 8
3 *Real Madrid* 6 2 2 2 8 8 0 8
4 AC Milan 5 0 3 2 8 12 -4 3
As you can see, Inter Milan and Real Madrid are tied on points, and Inter is ahead of Real Madrid because of its goal difference. The result that I want to get is this:
Pos Team Pld W D L F A GD Pts
1 FC Barcelona 5 2 3 0 8 5 3 9
2 **Real Madrid** 6 2 2 2 8 8 0 8
3 *Inter Milan* 6 2 2 2 11 10 1 8
4 AC Milan 5 0 3 2 8 12 -4 3
Notice that this time Real Madrid is ahead of Inter Milan, because it won the two direct matches between them.
I have one table for the teams and another for the results.
I would like to achieve this using a query in MySQL if possible. Or maybe it would be better to do this ordering on the server side (PHP).
Thanks, any help would be appreciated.
It is impossible to efficiently do what you request in a single query that would return the results you ask for and sort the ties in points with that criteria.
The reasoning is simple: let's assume that you could get a column in your query that would provide or help with the kind of sorting you want. That is to say, it would order teams that are tied on points according to which one has more victories over the others (a tie is very likely to involve more than 2 teams). To make that calculation by hand you would need a double-entry table that shows the number of matches won between those teams, as follows:
| TeamA | TeamB | TeamC
------------------------------
TeamA | 0 | XAB | XAC
TeamB | XBA | 0 | XBC
TeamC | XCA | XCB | 0
So you would just add up each row, and sorting the totals in descending order would provide you with the needed data.
The problem is that you don't know which teams are tied before you actually get the data. So creating that column for the general case would mean you need to create the whole table of every team against every team (which is no small task); and then you need to add the logic to the query to only add up the columns of a team against those that are tied with it in points... for which you need the original result set (that you should be creating with the same query anyhow).
It may be possible to get that information in a single query, but it will surely be way too heavy on the DB. You're better off adding that logic in code after getting the data you know you will need (getting the number of games won by TeamA against TeamB or TeamC is not too complicated). You would still need to be careful about how you build that query and how many you run; after all, during the first few games of a league you will have lots of teams tied with each other, so getting the data will effectively be the same as building the whole double-entry table I used as an example before, for all teams against all teams.
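As a rough illustration of doing the tie-break in code, here is a sketch in Python (the question mentions PHP; the logic carries over). The standings are copied from the question, and the only head-to-head fact used is the one the question states, that Real Madrid won both direct matches against Inter Milan. The function name `rank` and the `h2h_wins` dictionary layout are assumptions made for this example.

```python
from collections import defaultdict


def rank(standings, h2h_wins):
    """Sort by points, then wins against teams tied on points, then goal difference."""
    teams_by_points = defaultdict(list)
    for team in standings:
        teams_by_points[team["pts"]].append(team["name"])

    def sort_key(team):
        rivals = [r for r in teams_by_points[team["pts"]] if r != team["name"]]
        # Only direct matches against teams on the same points are counted.
        wins = sum(h2h_wins.get((team["name"], rival), 0) for rival in rivals)
        return (-team["pts"], -wins, -team["gd"])

    return sorted(standings, key=sort_key)


standings = [
    {"name": "FC Barcelona", "pts": 9, "gd": 3},
    {"name": "Inter Milan", "pts": 8, "gd": 1},
    {"name": "Real Madrid", "pts": 8, "gd": 0},
    {"name": "AC Milan", "pts": 3, "gd": -4},
]
# (winner, loser) -> number of direct matches won.
h2h_wins = {("Real Madrid", "Inter Milan"): 2}

table = rank(standings, h2h_wins)
for pos, team in enumerate(table, start=1):
    print(pos, team["name"], team["pts"])
```

Only the head-to-head counts for teams that are actually tied need to be fetched from the results table, which keeps the extra queries small once the season is past its first few rounds.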
Create a temporary table in a stored procedure and call the procedure:
create temporary table tab1 (position int not null auto_increment ,
team_name varchar(200),
points int,
goal_pt int,
primary key(position));
insert into tab1(team_name,
points,
goal_pt)
select team_name,
points,
goal_pt
from team
order by points desc,
goal_pt desc ;
First question from me, so do guide me through any etiquette you're all used to! It's a bit of a complex one, so I shall describe what I want first before going into code.
MySQL database, I have a table of organisations, a table of services, and a table that cross-references organisations with services. I'm building a search function that will retrieve the organisations that match ANY selected services, but will sort them based on how many services they match AND list what those services are.
OK, the pertinent bits of code n whatnot:
Organisations table (organisations):
id: INT, AI
name: VARCHAR(150)
Services table (services):
id: INT, AI
name: VARCHAR(150)
Cross-reference table (xref_org_services):
org_id: INT
services_id: INT
I can get a list of organisations that match any of the services I search for:
SELECT DISTINCT organisations.name AS org
FROM organisations
JOIN xref_org_services ON organisations.id=xref_org_services.org_id
WHERE xref_org_services.services_id IN (x, y, z)
ORDER BY org ASC
where x, y, z would be the ID of the services I've selected. Done, not a problem.
Now I want to return how many services each of the organisations match, and what they are. I've had a try with
SELECT organisations.name AS org, COUNT(xref_org_services.services_id) AS matched
but that's giving me the number of rows matched total which isn't what I want. Am I looking at a nested SELECT somewhere to do the count?
Ultimately, my output needs to go something like this:
Org_1 matched 4 services
- Serv_1
- Serv_3
- Serv_4
Org_2 matched 3 services
- Serv_1
- Serv_2
- Serv_4
Org_3 matched 1 service
- Serv_2
Am I making any sense at all? Let me throw some "data" into this then to see if I can clarify things a bit.
organisations:
id | name
-------------------------
1 | A1 Train
2 | Faux LLC
3 | Shakey Practice
services:
id | name
-------------------------
1 | PTS
2 | Track safety
3 | Assessors
4 | Structural
5 | Signalling
xref_org_services:
org_id | services_id
-------------------------
1 | 1
1 | 2
1 | 4
2 | 1
2 | 4
3 | 5
So, this means:
A1 Train offers PTS, Track Safety and Structural
Faux LLC offers PTS and Structural
Shakey Practice offers Signalling
If I want to retrieve all organisations that offer PTS, Track Safety and Structural I want the output to say something along the lines of:
A1 Train matched
- PTS
- Track Safety
- Structural
Faux LLC matched
- PTS
- Structural
I hope this makes sense. Pointers, hints, full answers all welcome. Let me know if you need clarifications or further information.
Thanks
You want to use GROUP BY:
SELECT
organisations.name AS org,
COUNT(*) AS matched,
GROUP_CONCAT(services.name SEPARATOR "\n") AS services
FROM
organisations
LEFT JOIN xref_org_services
ON xref_org_services.org_id = organisations.id
LEFT JOIN services
ON services.id = xref_org_services.services_id
WHERE xref_org_services.services_id IN (x, y, z)
GROUP BY organisations.id
ORDER BY matched DESC, org ASC;
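The GROUP BY approach can be checked against the sample data from the question. The sketch below runs it with Python's built-in sqlite3 module as a stand-in for MySQL, so some syntax differs: SQLite writes the separator as the second argument of `GROUP_CONCAT` instead of using `SEPARATOR`. The `wanted` list, the `|` separator and the ordering by match count (which the question asks for) are choices made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organisations (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE services (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE xref_org_services (org_id INTEGER, services_id INTEGER);
INSERT INTO organisations VALUES
    (1, 'A1 Train'), (2, 'Faux LLC'), (3, 'Shakey Practice');
INSERT INTO services VALUES
    (1, 'PTS'), (2, 'Track safety'), (3, 'Assessors'),
    (4, 'Structural'), (5, 'Signalling');
INSERT INTO xref_org_services VALUES
    (1, 1), (1, 2), (1, 4), (2, 1), (2, 4), (3, 5);
""")

wanted = [1, 2, 4]  # PTS, Track safety, Structural
placeholders = ",".join("?" * len(wanted))
query = f"""
SELECT o.name, COUNT(*) AS matched, GROUP_CONCAT(s.name, '|') AS services
FROM organisations AS o
JOIN xref_org_services AS x ON x.org_id = o.id
JOIN services AS s ON s.id = x.services_id
WHERE x.services_id IN ({placeholders})
GROUP BY o.id
ORDER BY matched DESC, o.name ASC
"""

# The order of names inside GROUP_CONCAT is not guaranteed, so sort them.
results = [
    (name, matched, sorted(services.split("|")))
    for name, matched, services in conn.execute(query, wanted)
]
for name, matched, services in results:
    print(f"{name} matched {matched} services")
    for service in services:
        print(f"- {service}")
```

Passing the selected ids as bound parameters rather than pasting them into the SQL string keeps the search safe when the ids come from user input.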