Introduction
I have a project where we help the IT community to organize better IT meetups (just like meetup.com but only for IT community) called https://codotto.com/
We are developing a new feature where we would like to show the number of usages of certain tags. The algorithm should work just like Stackoverflow's one. You write javascript and you get a list with the tags that match your query sorted by the most used ones.
I'm currently using Laravel but I will post the raw query so that it's easier for mysql wizards to help me out if possible :)
Problem
I have the following tables
tags table
id name
1 javascript
2 javascript-tools
3 javascript-security
group_has_tags table
group_id tag_id
1 2
2 2
2 3
We have the tags javascript-tools being used two times, javascript-security is used one while javascript is not used at all.
Now, if a user search for javascript, he should get first javascript (because it is a direct match) followed by the rest of the tags sorted by their usage.
In Laravel this is the code that I have (simplified ofc)
$tags = Tag::withCount('groups')
->orderBy('groups_count', 'DESC')
->where('name', 'LIKE', 'javascript%')
->get(2);
The problem is that since I only return back 2 results, javascript is not being included in the results, even tho it's what the user literally wrote
For the mysql magicians, here is the raw query
select
`tags`.*,
(
select
count(*)
from
`meetups`
inner join `meetup_has_tags` on `meetups`.`id` = `meetup_has_tags`.`meetup_id`
where
`tags`.`id` = `meetup_has_tags`.`tag_id`
) as `meetups_count`
from
`tags`
where
`title` LIKE 'javascript%'
order by
`meetups_count` desc
limit
2 offset 0
Question
The main objective here is to return the most relevant result to the user. He writes javascript and javascript shows up first followed by less "relevant" results. The way I found was to sort by the number of times a tag was used.
Is there a solution where I can do "please, fetch the results that match this query first, then return the most relevant results"?
By "most relevant" results, I simply mean "what the user is looking for". If he writes "javascript" it should return "javascript" followed by "javascript-tools" (because "javascript-tools" was used twice but the user is literally searching for "javascript")
Here is your query:
SELECT * FROM
(SELECT tags.*,COALESCE((SELECT COUNT(*) FROM group_has_tags WHERE tag_id = tags.id),0) AS usage
FROM tags
WHERE title LIKE 'javascript%') AS tmp
ORDER BY tmp.name = 'javascript' DESC,usage DESC
For each matching tag you get the number of times it has been used.
Then you first sort by whether the tag matches literally what the user has typed, then by the usage.
Of course you will have to parameterize this query but I hope you get the idea.
Related
I have an ajax script that searches database tables for expressions similar to google search. The SELECT statement just uses LIKE and finds matches in the relevant fields. It worked fine at first but as content has grown, it is giving way too many matches for most search strings.
For example, if you search for att, you get att but also attention, attaboy, buratta etc.
Good search engines such as Google seem to have an intermediate table of suggestions that have been vetted by others. Rather than search the data directly, they seem to search the approved phrases such as AT&T and succeed in narrowing the number of results. Has anyone coded something like this and suggest the right dbase schema and query to get relevant results.
Right now I am searching table of say names directly with something like
$sql = "SELECT lastname from people WHERE lastname LIKE '%$searchstring%'";
I imagine besides people I should create some intermediate table along the lines of
people
id|firstname|lastname|description
niceterms
id|niceterm|peopleid
Then the query could be:
$sql = "SELECT p.lastname,p.peopleid, n.niceterm, n.peopleid,
FROM `people` p
LEFT JOIN `niceterms` n
on p.id = n.peopleid
WHERE niceterm LIKE '%$searchterm%'";
..so when you type something in the search box, you get nice search terms that will yield better results.
But how do I populate the niceterms table. Is this the right approach? I'm not trying to create a whole backweb or pagerank. Just want to narrow search results so they are relevant.
Thanks for any suggestions.
You might want to take a look at FULLTEXT search in Mysql. It allowes you to create powerfull query's based on relevance. You can for example create a BOOLEAN search which allowes you to create a scorerow in your result. The score will be based on rules like does the text start with a karakter combination (yes? +2, no but it does contain the combination: +1)
The below code is just another column and it has 3 rules in it:
Does the p1.name field contain Bl or rock? if yes -> add score
Does the p1.name field start with either Bl or rock? if yes -> add score
IS the p1.name equal to Bl rock? if yes -> add score
MATCH p1.name AGAINST('>Bl* >rock* >((+Bl*) (+rock*)) >("Bl rock")' IN BOOLEAN MODE) AS match
Now just order by match and it will show you the most relevant searches. You can also combine the order by with multiple statements and add a limit like below:
Orders by most recent date, highest match and then orders the matches that have the same score by their character length
ORDER BY `date` DESC, `match` DESC, LENGTH(`p1`.`name`) ASC
Keep in mind that the above code somehow creates a relevant result based on common cases. Copying Google will be imposible since their algorithms for optimal results / speed are incredible.
If FULLTEXT search is a step to much, try to make a tag system. Tagging content with unique tag combinations will also result in a more reliable search result
I'm currently learning how to build a site in PHP MySQL. However, I seem to fail to understand COUNT() as count and wouldn't mind some further explanation.
I get the principles of COUNT, 0 || 1, and how it returns all the values that pertain to that query.
But, don't see how COUNT as count works. Anyhow, this is how the code I'm writing goes - so we have a working example - and where I first became perplexed.
"SELECT COUNT(id) as count, id
FROM user
WHERE email='$email' AND password='".md5$password."'"
That is what is called alias which is sometimes used to show a more appealing column header to users or the calling code
SELECT COUNT(`id`) as `count`....
will print
count
--------
5
The alias standing as the column header instead of any arbitrary string: See the SQLFiddle to see the difference
From the fiddle you can see that the header column looks somehow e.g.
count(*)
--------
5
With Count() you can count the returning rows of a result set. The also the official MySQL documentation about count:
Databases are often used to answer the question, “How often does a certain type of data occur in a table?” For example, you might want to know how many pets you have, or how many pets each owner has, or you might want to perform various kinds of census operations on your animals.
Counting the total number of animals you have is the same question as “How many rows are in the pet table?” because there is one record per pet. COUNT(*) counts the number of rows, so the query to count your animals looks like this:
SELECT COUNT(*) FROM pet;
The part with AS count means that this colum will get a name which you can use e.g. in PHP. See also this explenation on w3schools:
You can give a table or a column another name by using an alias. This can be a good thing to do if you have very long or complex table names or column names.
An alias name could be anything, but usually it is short.
as count is just an alias. You can use as for any field or method selected. it means you change the name of the column being returned in your dataset.
SELECT `field` as another_name
So:
SELECT COUNT(*) as `count`
Just renames the column from COUNT(*) to count making it easier to work with whereever you are maniuplating your result set.
It also makes for easier access within your current query. Many would do the following with large table names:
SELECT * FROM `table_with_ridiculous_name` as twrn WHERE twrn.id = 1
If you ran this sql:
SELECT COUNT(id), id ....
You would get (after doing a *_fetch_assoc) $row['numberofrecordshere'] which would be very hard to echo (or use in a comparison) unless you knew how many records there would be (which would defeat the purpose of this result, anyway)
Returning it as count allows you to get to it in the resulting array by using $row['count']
I am new to php and mysql and I am trying for 2 day to create a web application which can store specific data about the user from the form and than allow users to search based on similar slightly complicated form. I got the forms working and also input to the database but I have the problem with receiving the data from mysql and maybe in the future I will have with the search function too but i did not think much about that yet. Right now I am focusing on getting that specific data from db.
SELECT players.*,
brackets.name AS interested_in,
classes.name AS looking_for,
tags.name AS tagged_with
FROM players JOIN player_brackets ON (players.id = player_brackets.player_id)
JOIN brackets ON (player_brackets.bracket_id = brackets.id)
JOIN player_classes ON (player_brackets.player_id = player_classes.player_id)
JOIN classes ON (player_classes.class_id = classes.id)
JOIN player_tags ON (player_classes.player_id = player_tags.player_id)
JOIN tags ON (player_tags.tag_id = tags.id)
WHERE players.id = '18'
I have this query which works fine but the data i get from it is every combination of the data in the db which is right now about more than 100 results which are all almost the same. Its multiplying the results by number of fields in the db for example players (1) * brackets (3) * classes (11) * tags (5) which will output 165 rows of almost the same data with the difference in fields brackets, classes and tags.
If I group that results by players.id I get just one row but the data from other fields are lost. I want to get 1 row which will contains all of it (col with multiple data in in an array probably).
DB got 7 tables - players, tags, classes, brackets and 3 tables which contains player id and tag/class/bracket id combination.
I have been searching web and SO but nothing really helped me. It would be very nice if someone can help me with this one. Thanks for reply and I am sorry for my poor english its not my primary language.
When you join, MySQL is going to give you every combination.
As you've seen, you can GROUP BY, but then it will only give you one combination.
I think you're looking for a series of queries ("get all brackets for player 1" then "get all classes for player 1 in bracket 12", etc).
Or you can parse the results intelligently in your app.
There's also GROUP_CONCAT, but that rarely meets anyone's needs (since it comes back as one string).
I have a table with about 150 websites listed in it with the columns "site_name", "visible_name" (basically a formatted name), and "description." For a given page on my site, I want to pull site_name and visible_name for every site in the table, and I want to pull all three columns for the selected site, which comes from the $_GET array (a URL parameter).
Right now I'm using 2 queries to do this, one that says "Get site_name and visible_name for all sites" and another that says "Get all 3 fields for one specific site." I'm guess a better way to do it is:
SELECT * FROM site_list;
thus reducing to 1 query, and then doing the rest post-query, which brings up 2 questions:
The "description" field for each site is about 200-300 characters. Is it bad from a performance standpoint to pull this for all 150 sites if I'm only using it for 1 site?
How do I reference the specific row from the MySQL result set for the site specificed in the URL? For example, if the URL is "mysite.com/results?site_name=foo" how do I do the post-query equivalent of SELECT * FROM site_list where site_name=foo; ?
I don't know how to get the data for "site_name=foo" without looping through the entire result array and checking to see if site_name matches the URL parameter. Isn't there a more efficient way to do it?
Thanks,
Chris
PS: I noticed a similar question on stackoverflow and read through all the answers but it didn't help in my situation, which is why I'm posting this.
Thanks,
Chris
I believe what you do now, keeping sperated queries for getting a list of sites with just titles and one detailed view with description for a single given site, is good. You don't pull any unneeded data and both queries being very simple are fast.
It is possible to combine both your queries into one, using left join, something maybe like:
SELECT s1.site_name, s1.visible_name, s2.description
FROM site_list s1
LEFT JOIN
( SELECT site_name, description
FROM site_list
WHERE site_name = 'this site should go with description' ) s2
ON s2.site_name = s1.site_name
resulting in all sites without matching name having NULL as description, you could even sort it using
ORDER BY description DESC, site_name
to get the site with description as first fetched row, thus eliminating need to iterate through results to find it, but mysql would have to do a lot more work to give you this result, negating any possible gain you could hope for. So basically stick to what you have now, its good.
Generally, it's good practice to have an 'id' field in the table as an auto_increment value. Then, you would:
SELECT id,url,display_name FROM table;
and you'd have the 'id' value there to later:
SELECT * FROM table WHERE id=123;
That's probably your most efficient method if you had WAAAY more entries in the table.
However, with only 150 rows in the table, you're probably just fine doing
SELECT * FROM table;
and only accessing that last field for a matching row based on your criteria.
If you only need the description for the site named foo you could just query the database with SELECT * FROM site_list WHERE site_name = 'foo' LIMIT 1
Otherwise you would have to loop though the result array and do a string comparison on site_name to find the correct description.
I have a webapp development problem that I've developed one solution for, but am trying to find other ideas that might get around some performance issues I'm seeing.
problem statement:
a user enters several keywords/tokens
the application searches for matches to the tokens
need one result for each token
ie, if an entry has 3 tokens, i need the entry id 3 times
rank the results
assign X points for token match
sort the entry ids based on points
if point values are the same, use date to sort results
What I want to be able to do, but have not figured out, is to send 1 query that returns something akin to the results of an in(), but returns a duplicate entry id for each token matches for each entry id checked.
Is there a better way to do this than what I'm doing, of using multiple, individual queries running one query per token? If so, what's the easiest way to implement those?
edit
I've already tokenized the entries, so, for example, "see spot run" has an entry id of 1, and three tokens, 'see', 'spot', 'run', and those are in a separate token table, with entry ids relevant to them so the table might look like this:
'see', 1
'spot', 1
'run', 1
'run', 2
'spot', 3
you could achive this in one query using 'UNION ALL' in MySQL.
Just loop through the tokens in PHP creating a UNION ALL for each token:
e.g if the tokens are 'x', 'y' and 'z' your query may look something like this
SELECT * FROM `entries`
WHERE token like "%x%" union all
SELECT * FROM `entries`
WHERE token like "%y%" union all
SELECT * FROM `entries`
WHERE token like "%z%" ORDER BY score ect...
The order clause should operate on the entire result set as one, which is what you need.
In terms of performance it won't be all that fast (I'm guessing), however with databases the main overhead in terms of speed is often sending the query to the database engine from PHP and receiving the results. With this technique this only happens once instead of once per token, so performance will increase, I just don't know if it'll be enough.
I know this isn't strictly an answer to the question you're asking but if your table is thousands rather than millions of rows, then a FULLTEXT solution might be the best way to go here.
In MySQL when you use MATCH on your indexed column, each keyword you supply will be given a relevance score (calculated roughly by the number of times each keyword was mentioned) that will be more accurate than your method and certainly more effecient for multiple keywords.
See here:
http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
If you're using the UNION ALL pattern you may also want to include the following parts to your query:
SELECT COUNT(*) AS C
...
GROUP BY ID
ORDER BY c DESC
While this is a really trivial example it does get you the frequency of the matches for each result and this could be a pseudo rank to start with.
You'll probably get much better performance if you used a data structure designed for search tasks rather than a database. For example, you might try looking at building an inverted index. Rather than writing it youself, however, you might also want to look into something like Lucene which does most of the work for you.