Ordering users with most matching user's skills


I want to show recommended users to a user, ordered so that those whose skills match the most come first.
The issue is that I am storing the skills in a single field, in this form:
user_skill
musician,pop,singer
Note: the input will be musician,pop,singer.
What I want to achieve: show users who have all three skills at the top, then those having two, and last those having only one.
So the output will be like:
**user_name skills**
sam musician,pop,singer
smith musician,pop,singer
ali musician,singer
nasira musician,pop
siri musician
taylor pop
andrew singer
Can this be achieved with a single MySQL query?
If this is not possible, can it be done in PHP code? I don't want to change the table structure, as this will require lots of redo.
Thanks for your help.

You might be able to do this with SQL, but the query would be so complex that I probably wouldn't.
On the PHP side you could iterate over the result set, tally the skills for each user_name within the given set of skills you are looking for, and then just sort them:
$desired = explode(',', $input);
$users = array();
// I presume you have the query worked out to find users with any one of the input
// skills attributed to them, so let's say that $stmt is the PDO statement where you
// have executed that query
while (false !== ($row = $stmt->fetch(PDO::FETCH_ASSOC))) {
    $row['skills'] = explode(',', $row['skills']);
    // assign an array containing only skills that were in your $input
    // to $row['desired_skills']
    $row['desired_skills'] = array_intersect($desired, $row['skills']);
    $row['nb_desired_skills'] = count($row['desired_skills']);
    $users[] = $row;
}
// sort by number of matching skills, highest first
usort($users, function ($a, $b) {
    return $b['nb_desired_skills'] - $a['nb_desired_skills'];
});
// now you can loop over $users and display the fields you need
However, the definition and attribution of these skills is a key part of your application, so you should just normalize the tables now. It's only going to become a larger refactor if you put it off.
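A self-contained version of the tally-and-sort idea above might look like the sketch below. It assumes the rows have already been fetched into an array with user_name and user_skill keys (taken from the question); the function name rankUsersBySkills is made up for illustration.

```php
<?php
// Hypothetical helper: rank users by how many of the desired skills they have.
// $users is a list of rows with 'user_name' and 'user_skill' keys (assumed names).
function rankUsersBySkills(array $users, string $input): array
{
    $desired = array_map('trim', explode(',', $input));
    $ranked = [];
    foreach ($users as $user) {
        $skills = array_map('trim', explode(',', $user['user_skill']));
        $user['nb_desired_skills'] = count(array_intersect($desired, $skills));
        // Keep only users with at least one matching skill.
        if ($user['nb_desired_skills'] > 0) {
            $ranked[] = $user;
        }
    }
    // Most matches first.
    usort($ranked, function ($a, $b) {
        return $b['nb_desired_skills'] <=> $a['nb_desired_skills'];
    });
    return $ranked;
}

$users = [
    ['user_name' => 'taylor', 'user_skill' => 'pop'],
    ['user_name' => 'ali',    'user_skill' => 'musician,singer'],
    ['user_name' => 'sam',    'user_skill' => 'musician,pop,singer'],
    ['user_name' => 'bob',    'user_skill' => 'drummer'],
];
$ranked = rankUsersBySkills($users, 'musician,pop,singer');
$names = array_column($ranked, 'user_name');
```

Users with the same number of matches come out in no particular order; add a secondary comparison (for example on user_name) if you need a fixed order.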

I don't want to change table structure as this will require lots of redo.
I strongly (!) recommend you make a separate skills table, and link that to your users table using foreign keys. You will be infinitely thankful to yourself once your project gets even slightly more complex.
If you don't know how, I'm sure people on this site would help you with a script that does the conversion if you were to post another question.
MySQL
CREATE TABLE skills
(
name VARCHAR(64) NOT NULL,
user_id INT NOT NULL,
PRIMARY KEY (name, user_id),
FOREIGN KEY (user_id) REFERENCES users(id)
);
This way you can easily select users by most skills using a simple subquery (assuming you have a users table with an id column):
MySQL
SELECT *,
(SELECT COUNT(*)
FROM skills
WHERE user_id = u.id) AS num_skills
FROM users u
ORDER BY num_skills DESC;
So why would you take this approach instead of relying on PHP?
Creates minimal extra overhead in database,
significantly reduces chatter between database and webserver by not requiring to query the entire users table if you just want, for example, only the top 10 users,
enables more complex and varying queries in the future easily.
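The following MySQL snippet queries only those users who have a given skill (for example, "singer") and sorts them in descending order by the number of matching skills. To rank against several skills at once, replace the name = 'singer' condition with name IN ('musician', 'pop', 'singer'). The query does not return users with zero matching skills:
MySQL
SELECT *,
    (SELECT COUNT(*)
     FROM skills
     WHERE user_id = u.id
       AND name = 'singer') AS num_skills
FROM users u
-- HAVING, not WHERE: a column alias cannot be referenced in a WHERE clause
HAVING num_skills > 0
ORDER BY num_skills DESC;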
Of course, you can also search for skills by ID: if your skills table has an id column, just replace the second part of the WHERE clause inside the subquery with AND id = 3, which will select those users who have the skill with an ID of 3.
The next step towards a more optimal database would be creating a real skills table, which stores only the skills that you have registered in your database, and a user_skills table, that links it and the users table together.
This would enable you to significantly reduce database size in the long run, and to run complex but 'clean' queries that do not depend on the webserver.
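As a rough idea of what the conversion step could involve, the sketch below splits the existing comma-separated field into one (user_id, name) pair per skill, ready to be inserted into the skills table. It assumes each user row carries an id and the user_skill field from the question; the function name splitSkills is hypothetical.

```php
<?php
// Hypothetical helper: turn rows with a comma-separated user_skill field
// into (user_id, name) pairs suitable for inserting into a skills table.
function splitSkills(array $users): array
{
    $rows = [];
    foreach ($users as $user) {
        $names = array_map('trim', explode(',', $user['user_skill']));
        // Drop empty entries and duplicates so the (name, user_id) primary key holds.
        $names = array_unique(array_filter($names, 'strlen'));
        foreach ($names as $name) {
            $rows[] = ['user_id' => $user['id'], 'name' => $name];
        }
    }
    return $rows;
}

$rows = splitSkills([
    ['id' => 1, 'user_skill' => 'musician,pop,singer'],
    ['id' => 2, 'user_skill' => 'musician, singer,,musician'],
]);
```

Each element of $rows could then be inserted with a prepared INSERT INTO skills (name, user_id) statement.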

Related

JOIN query too slow on real database, on small one it runs fine

I need help with this MySQL query, which takes too long to execute or does not execute at all.
(What I am trying to do is part of a more complex problem: I want to create a PHP cron script that will execute a few heavy queries, calculate data from the results returned, and then store that data in the database for more convenient use later. Most likely I will post a separate question here about that process.)
First, let's try to solve one of the problems with these heavy queries.
Here is the thing:
I have table: users_bonitet. This table has fields: id, user_id, bonitet, tstamp.
First important note: when I say user, please understand that users are actually companies, not people. So user.id is id of some company, but for some other reasons table that I am using here is called "users".
Three key fields in users_bonitet table are: user_id ( referencing user.id), bonitet ( represents the strength of user, it can have 3 values, 1 - 2 - 3, where 3 is the best ), and tstamp ( stores the time of bonitet insert. Every time when bonitet value changes for some user, new row is inserted with tstamp of that insert and of course new bonitet value.). So basically some user can have bonitet of 1 indicating that he is in bad situation, but after some time it can change to 3 indicating that he is doing great, and time of that change is stored in tstamp.
Now, I will just list other tables that we need to use in query, and then I will explain why. Tables are: user, club, club_offer and club_territories.
Some users ( companies ) are members of a club. Member of the club can have some club offers ( he is representing his products to the people and other club members ) and he is operating on some territory.
What I need to do is to get bonitet value for every club offer ( made by some user who is member of a club ) but only for specific territory with id of 1100000; Since bonitet values are changing over time for each user, that means that I need to get the latest one only. So if some user have bonitet of 1 at 21.01.2012, but later at 26.05.2012 it has changed to 2, I need to get only 2, since that is the current value.
I made an SQL Fiddle with an example db schema and the query that I am using right now. On this small database, the query does what I want and it is fast, but on the real database it is very slow, and sometimes it does not execute at all.
See it here: http://sqlfiddle.com/#!9/b0d98/2
My question is: am I using the wrong query to get all this data? I am getting the right result, but maybe my query is bad and that is why it executes so slowly? How can I speed it up? I have tried adding indexes using phpMyAdmin, but it didn't help very much.
Here is my query:
SELECT users_bonitet.user_id, users_bonitet.bonitet, users_bonitet.tstamp,
club_offer.id AS offerId, club_offer.rank
FROM users_bonitet
INNER JOIN (
SELECT max( tstamp ) AS lastDate, user_id
FROM users_bonitet
GROUP BY user_id
)lastDate ON users_bonitet.tstamp = lastDate.lastDate
AND users_bonitet.user_id = lastDate.user_id
JOIN users ON users_bonitet.user_id = users.id
JOIN club ON users.id = club.user_id
JOIN club_offer ON club.id = club_offer.club_id
JOIN club_territories ON club.id = club_territories.club_id
WHERE club_territories.territory_id = 1100000
So I am selecting bonitet values for all club offers made by users that are members of a club and operate on territory with an id of 1100000. Important thing is that I am selecting club_offer.id AS offerId, because I need to use that offerId in my application code so I can do some calculations based on bonitet values returned for each offer, and insert data that was calculated to the field "club_offer.rank" for each row with the id of offerId.

Your query looks fine. I suspect your query performance may be improved if you add a compound index to help the subquery that finds the latest entry from users_bonitet for each user.
The subquery is:
SELECT max( tstamp ) AS lastDate, user_id
FROM users_bonitet
GROUP BY user_id
If you add (user_id, tstamp) as an index to this table, that subquery can be satisfied with a very efficient loose index scan.
ALTER TABLE users_bonitet ADD KEY maxfinder (user_id, tstamp);
Notice that if the id column of this users_bonitet table is autoincrementing, your subquery could be refactored to use that instead of tstamp. That would eliminate the possibility of duplicates and be even more efficient, because there's a unique id for joining. Like so:
FROM users_bonitet
INNER JOIN (
    SELECT MAX(id) AS id
    FROM users_bonitet
    GROUP BY user_id
) ubmax ON users_bonitet.id = ubmax.id
In this case your compound index would be (user_id, id).
Pro tip: Don't add lots of indexes unless you know you need them. It's a good idea to read up on how indexes can help you. For example. http://use-the-index-luke.com/

Storing MySQL values as integers

I have two database tables that I am using to create a Twitter-style following system.
sh_subscriptions
=> id
=> user_id
=> feed_id
sh_feeds
=> id
=> item
=> shop_name
=> feed_id
The problem with storing feed_id rather than shop_name in sh_subscriptions is that it requires a lot of table joining:
$id = $_POST['id'];
$user_id = $id['id'];
$shop_name = mysqli_escape_string($con, $_POST['shop_name']);
$query = "SELECT * FROM sh_subscriptions s INNER JOIN sh_feeds f ON s.feed_id = f.feed_id WHERE s.user_id = $user_id AND f.shop_name = '$shop_name'";
$result = mysqli_query($con, $query) or die(mysqli_error($con));
if (mysqli_num_rows($result) > 0)
{
$query2 = "DELETE FROM sh_subscriptions s INNER JOIN sh_feeds f ON s.feed_id = f.feed_id WHERE s.user_id = $user_id AND f.shop_name = '$shop_name'";
$result2 = mysqli_query($con, $query2) or die(mysqli_error($con));
}
else
{
// insert the row instead
}
(I know there's an error somewhere in the if statement, but I'll worry about that later.)
If I were to replace feed_id with shop_name, I would be able to replace line 5 with this:
$query = "SELECT * FROM sh_subscriptions WHERE user_id = $user_id AND shop_name = '$shop_name'";
My question is: is it always preferable to store MySQL values as integers where possible, or in a situation like this, would it be faster to have sh_subscriptions contain shop_name rather than feed_id?

Your sh_subscriptions table is actually a many-to-many join table that relates users to feeds. This is considered a fine way to design database schemas.
Your basic concept is this: you have a collection of users and a collection of feeds. Each user can subscribe to zero or more feeds, and each feed can have zero or more subscribers.
To enter a subscription you create a row in the sh_subscriptions table. To cancel it you delete the row.
You say there's "a lot of table joining." With respect, this is not a lot of table joining. MySQL is made for this kind of joining, and it will work well.
I have some suggestions about your sh_subscriptions table.
Get rid of the id column. Instead make the user_id and feed_id columns into a composite primary key. That way you will automatically prevent duplicate subscriptions.
Add an active column (a short integer) to the table. When it is set to a value of 1 your subscription is active. That way you can cancel a subscription by setting active to 0.
You might also add a subscribed_date column if you care about that.
Create two compound non-unique indexes, (active,user_id,feed_id) and (active,feed_id,user_id), on the table. These will greatly accelerate queries that join tables like this.
Query fragment:
FROM sh_feeds f
JOIN sh_subscriptions s ON (f.feed_id = s.feed_id AND s.active = 1)
JOIN sh_users u ON (s.user_id = u.user_id)
WHERE f.shop_name = 'Joe the Plumber'
If you get to the point where you have hundreds of millions of users or feeds, you may need to consider denormalizing this table, that is, for example, relocating the shop name text so it's in the sh_subscriptions table. But not now.
Edit: I am proposing multiple compound covering indexes. If you're joining feeds to users, for example, MySQL starts satisfying your query by determining the row in sh_feeds that matches your selection.
It then determines the feed_id, and random-accesses your compound index on feed_id. Then, it needs to look up all the user_id values for that feed_id. It can do that by scanning the index from the point where it random-accessed it, without referring back to the table. This is very fast indeed. It's called a covering index.
The other covering index deals with queries that start with a known user and proceed to look up the feeds. The order of columns in indexes matters: random access can only start with the first (leftmost) column of the index.
The trick to understand is that these indexes are both randomly accessible and sequentially scannable.
One other note: if you only have two columns in the join table, one of your covering indexes is also your primary key, and the other contains the columns in the reverse order from the primary key. You don't need any duplicate indexes.

Return all records of non-given id if just one of those records matches the given id of another field

After searching for a damn long time, I've not found a query to make this happen.
I have an "offers" table with a "listing_id" field and a "user_id" field and I need to get ALL the records for all listing_id's where at least one record matches the given user_id.
In other words, I need a query that determines the listing_id's that the given user is involved in, and then returns all the offer records of those listing_id's regardless of user_id.
That last part is the problem: getting all the other users' offer records to return when I'm only providing one user's id and no listing ids.
I was thinking of first determining the listing_ids in a separate query and then using a php loop to create a WHERE clause for a second query that would consist of a bunch of "listing_id = $var ||" but then I couldn't bring myself to do it because I figured there must be a better way.
Hopefully this is easy and the only reason it has escaped me is because I've had my head up my ass. Will be happy to get this one behind me.
Thanks for taking the time.
Josh

You could combine the two queries on the MySQL side by using a subquery, like this:
SELECT * FROM offers WHERE listing_id IN (SELECT listing_id FROM offers WHERE user_id = 1)

If I understand what you are after, you should join offers to itself on matching listing_id and filter the second copy by the given user_id:
select distinct t1.* from offers AS t1
inner join offers AS t2 on t1.listing_id = t2.listing_id and t2.user_id = 1;

Order by votes - PHP

I have a voting script which pulls out the number of votes per user.
Everything is working, except I need to now display the number of votes per user in order of number of votes. Please see my database structure:
Entries:
UserID, FirstName, LastName, EmailAddress, TelephoneNumber, Image, Status
Voting:
item, vote, nvotes
The item field contains vt_img and then the UserID, so for example: vt_img4 and both vote & nvotes display the number of votes.
Any ideas how I can relate those together and display the users in order of the most voted at the top?
Thanks

You really need to change the structure of the voting table so that you can do a normal join. I would strongly suggest adding either a pure userID column, or at the very least not making it a concat of two other columns. Based on an ID you could then easily do something like this:
select
a.userID,
a.firstName,
b.votes
from
entries a
join voting b
on a.userID=b.userID
order by
b.votes desc
The other option is to consider (if it is a one to one relationship) simply merging the data into one table which would make it even easier again.
At the moment, this really is an XY problem, you are looking for a way to join two tables that aren't meant to be joined. While there are (horrible, ghastly, terrible) ways of doing it, I think the best solution is to do a little extra work and alter your database (we can certainly help with that so you don't lose any data) and then you will be able to both do what you want right now (easily) and all those other things you will want to do in the future (that you don't know about right now) will be oh so much easier.
Edit: It seems like this is a great opportunity to use a Trigger to insert the new row for you. A MySQL trigger is an action that the database will make when a certain predefined action takes place. In this case, you want to insert a new row into a table when you insert a row into your main table. The beauty is that you can use a reference to the data in the original table to do it:
CREATE TRIGGER Entries_Trigger AFTER insert ON Entries
FOR EACH ROW BEGIN
insert into Voting values(new.UserID,0,0);
END;
This will work in the following manner - When a row is inserted into your Entries table, the database will insert the row (creating the auto_increment ID and the like) then instantly call this trigger, which will then use that newly created UserID to insert into the second table (along with some zeroes for votes and nvotes).

Your database is badly designed. It should be:
Voting:
item, user_id, vote, nvotes
Placing the item id and the user id into the same column as a concatenated string with a delimiter is just asking for trouble. This isn't scalable at all. Look up the basics on Normalization.

You could try this:
SELECT *
FROM Entries e
JOIN Voting v ON (CONCAT('vt_img', e.UserID) = v.item)
ORDER BY nvotes DESC
but please notice that this query might be quite slow due to the fact that the join field for Entries table is built at query time.
You should consider changing your database structure so that Voting contains a UserID field in order to do a direct join.
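Until the table is changed, the same matching could also be done on the PHP side by parsing the UserID out of the item value. This is only a sketch: it assumes both tables have already been fetched into arrays keyed by the column names from the question, and rankEntriesByVotes is a made-up name.

```php
<?php
// Hypothetical helper: attach vote counts to entries by parsing the
// UserID out of Voting.item values such as "vt_img4", then sort.
function rankEntriesByVotes(array $entries, array $voting): array
{
    $votesByUser = [];
    foreach ($voting as $row) {
        // Only accept items of the exact form vt_img<digits>.
        if (preg_match('/^vt_img(\d+)$/', $row['item'], $m)) {
            $votesByUser[(int) $m[1]] = (int) $row['nvotes'];
        }
    }
    foreach ($entries as &$entry) {
        // Entries without a Voting row count as zero votes.
        $entry['nvotes'] = $votesByUser[(int) $entry['UserID']] ?? 0;
    }
    unset($entry);
    // Most voted first.
    usort($entries, function ($a, $b) {
        return $b['nvotes'] <=> $a['nvotes'];
    });
    return $entries;
}

$entries = [
    ['UserID' => 4, 'FirstName' => 'Ann'],
    ['UserID' => 7, 'FirstName' => 'Bob'],
    ['UserID' => 9, 'FirstName' => 'Cy'],
];
$voting = [
    ['item' => 'vt_img4', 'vote' => 3,  'nvotes' => 3],
    ['item' => 'vt_img7', 'vote' => 10, 'nvotes' => 10],
];
$ranked = rankEntriesByVotes($entries, $voting);
$names = array_column($ranked, 'FirstName');
```

This fetches both tables in full, so it only makes sense for small tables; a real UserID column in Voting remains the cleaner fix.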

I'm figuring the Entries table is where votes are cast (your database schema doesn't make much sense to me; it seems like you could work it a little better). If the votes are actually on the Votes table and that's connected to a user, then you should have a UserID field in that table too. Either way the example will help.
Let's say you add UserID to the Votes table and this is where a user's votes are stored; then this would be your query:
SELECT Users.id,
    SUM(Votes.nvotes) AS user_votes
FROM Users
JOIN Votes ON Users.id = Votes.UserID
GROUP BY Users.id
ORDER BY user_votes DESC

USE ORDER BY in your query --
SELECT column_name(s)
FROM table_name
ORDER BY column_name(s) ASC|DESC

Best method for storing quiz results in MySQL

I'm trying to record test/quiz scores in a database. What's the best method to do this when there might be a lot of tests and users?
These are some options I considered: should I create a new column for each quiz and a row for each user, or does this have its limitations? Might this be slow? Should I create a new row for each user & quiz? Should I stick to my original 'user' database and encode it in text?
Elaborating a little on the plan: a JavaScript quiz submits the score with AJAX, and a script sends it to the database. I'm new to PHP so I'm not sure about a good approach.
Any help would be greatly appreciated :) This is for a school science fair.

I'd suggest 3 data tables in your database: students, tests, and scores.
Each student needs to have fields for an ID and whatever else (name, dob, etc) you want to record about them.
Tests should have fields for an ID and whatever else (name, date, weight, etc).
Scores should have the student ID, a test ID, and the score (and anything else).
This means you can query a student and join with the scores table to get all the student's scores. You can also join the tests table to these results to put labels onto each score and calculate a grade based on scores and weight.
Alternately you can query for a test and join with the scores to get all the scores on a given test to get the class stats.

I would say create one database table that lists all students (name, dob, student id), and then one for all tests (score, date, written by). Will only you access the db, or can your students access it too? If the latter is the case, you need to make sure to set up appropriate security or "views" so that each student can only see their own grades (not everyone's).

Definitely do not create dynamic columns (no column for each quiz)! Also, adding columns to the users table (or generally any table) when they do not identify the user (or generally the table's item) is a bad approach...
This is a textbook example of normalization: you should avoid storing any redundant data. To do that you would create 3 tables, with foreign keys to ensure scores always reference an existing user and quiz. E.g.:
users - id, nickname, name
quizzes - id, quizName, quizOtherData
scores - id, user_id (references users.id), quiz_id (references quizzes.id), score
Then add rows to the scores table per user per quiz. Additionally, you could create a UNIQUE key on the columns user_id and quiz_id to prevent users from completing one quiz more than once.
This will be fast and will not store redundant (unneeded extra) data.
To get the results of the quiz with id 4, for example, along with the user info of the people who submitted this quiz, ordered from highest to lowest score, you would run a query like:
SELECT users.*, scores.score
FROM scores RIGHT JOIN users ON(users.id=scores.user_id)
WHERE scores.quiz_id = 4
ORDER BY score DESC
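The reason I used a RIGHT JOIN here is that there might be users who didn't take this quiz, whereas every score always has an existing user and quiz (due to the foreign keys). Note that the WHERE scores.quiz_id = 4 filter drops users without a score for this quiz, so the result is the same as with an INNER JOIN.
To get overall info on all users, quizzes and scores you would do something like: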
SELECT *
FROM quizzes
LEFT JOIN scores ON(quizzes.id=scores.quiz_id)
LEFT JOIN users ON(users.id=scores.user_id)
ORDER BY quizzes.id DESC, scores.score DESC, users.name ASC
BTW: If you are new to PHP (or anybody reading this), use PHP's PDO interface to communicate with your database :) AVOID functions like mysql_query, at least use mysqli_query, but for portability I would recommend stay with PDO.
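A minimal sketch of what that could look like with PDO and prepared statements, using the three-table layout above. An in-memory SQLite database stands in for MySQL here purely so the example is self-contained; the helper names saveScore and quizResults and the sample data are made up for illustration.

```php
<?php
// Sketch only: in-memory SQLite stands in for MySQL (assumption). With MySQL
// you would use a DSN such as 'mysql:host=...;dbname=...;charset=utf8mb4'.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)');
$pdo->exec('CREATE TABLE quizzes (id INTEGER PRIMARY KEY, quizName TEXT NOT NULL)');
$pdo->exec('CREATE TABLE scores (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    quiz_id INTEGER NOT NULL REFERENCES quizzes(id),
    score INTEGER NOT NULL,
    UNIQUE (user_id, quiz_id)
)');
$pdo->exec("INSERT INTO users (id, name) VALUES (1, 'Ann'), (2, 'Bob')");
$pdo->exec("INSERT INTO quizzes (id, quizName) VALUES (4, 'Physics')");

// Hypothetical helper: store one score; values are bound, never concatenated.
function saveScore(PDO $pdo, int $userId, int $quizId, int $score): void
{
    $stmt = $pdo->prepare(
        'INSERT INTO scores (user_id, quiz_id, score) VALUES (:user_id, :quiz_id, :score)'
    );
    $stmt->execute([':user_id' => $userId, ':quiz_id' => $quizId, ':score' => $score]);
}

// Hypothetical helper: results of one quiz, highest score first.
function quizResults(PDO $pdo, int $quizId): array
{
    $stmt = $pdo->prepare(
        'SELECT users.name, scores.score
           FROM scores
           JOIN users ON users.id = scores.user_id
          WHERE scores.quiz_id = :quiz_id
          ORDER BY scores.score DESC'
    );
    $stmt->execute([':quiz_id' => $quizId]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

saveScore($pdo, 1, 4, 70);
saveScore($pdo, 2, 4, 90);
$results = quizResults($pdo, 4);
```

Because of the UNIQUE key, a second saveScore call for the same user and quiz throws a PDOException instead of silently storing a duplicate.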
