I've been working on a live search feature for my site. I take a bunch of table values (i.e. bands, genres, albums, etc.) and I put them in their respective tables, as well as a search table for indexing. My main question is how do I make it more accurate.
Here is my mysql query (NOTE: the results are the same with or without Desc):
$search = "
(SELECT * FROM `search`
WHERE `search_name` LIKE '%$search_term%'
ORDER BY '%$search_term%' DESC
LIMIT 0, 8)";
Here is an example of the database:
Search ID | Search Name | Type |
----------------------------------------------------------
8 | Big Deal (What's He Done Lately?) | Album |
12 | Henry's Funeral Shoe | Band |
The problem is that when I type, say, H into the search box, I expect Henry's Funeral Shoe to be at the top, but instead I get Big Deal (What's He Done Lately?) before it, because it also contains an H and comes earlier in the table than the more appropriate match.
So my question: Is there a MySQL function that can sort through the table and find the most relevant results and weigh them against those less relevant?
A basic search is easy. Good search is hard.
When setting up a search, you need to understand the data and what users will be searching for. In your case you want items starting with the terms to be returned first (I assume).
MySQL is not the best platform for search in general, but what you can do is run two searches. The first is:
search_name LIKE '$search_term%'
Note the missing % in front of the search term. These results are the higher ranked ones.
Secondly you should use:
search_name LIKE '%$search_term%'
These are your lower ranked results. Any results that are also in the higher rank list should be removed from the lower ranked list.
Finally, combine the result sets, making sure not to accidentally mix up the ranks.
While neither perfect nor adjustable, that might help in a basic search field.
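The two passes described above can be sketched as follows. This is only an illustration under some assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL (the LIKE patterns are the same), the column name search_id and the helper ranked_search are made up for the example, and wildcard characters typed by the user (% and _) are not escaped.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL `search` table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE search (search_id INTEGER PRIMARY KEY, search_name TEXT, type TEXT)"
)
conn.executemany(
    "INSERT INTO search VALUES (?, ?, ?)",
    [
        (8, "Big Deal (What's He Done Lately?)", "Album"),
        (12, "Henry's Funeral Shoe", "Band"),
    ],
)


def ranked_search(conn, term, limit=8):
    """Hypothetical helper: prefix matches first, then 'contains' matches."""
    sql = (
        "SELECT search_id, search_name FROM search "
        "WHERE search_name LIKE ? ORDER BY search_name LIMIT ?"
    )
    # Pass 1: names that start with the term (higher rank).
    prefix = conn.execute(sql, (term + "%", limit)).fetchall()
    # Pass 2: names that contain the term anywhere (lower rank).
    contains = conn.execute(sql, ("%" + term + "%", limit)).fetchall()
    # Remove pass-2 rows already present in pass 1, then append them,
    # so the two ranks never get interleaved.
    seen = {row[0] for row in prefix}
    combined = prefix + [row for row in contains if row[0] not in seen]
    return combined[:limit]


results = ranked_search(conn, "H")
for search_id, name in results:
    print(search_id, name)
```

Bound parameters are used instead of interpolating $search_term into the SQL string, which also avoids SQL injection.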
Related
I have a database containing more than 100,000 values. The structure looks something like as follows:
id | countryid | webid | categoryid | title | dateaddedon
If I use a basic RAND(), with so many ids it doesn't give me a well-mixed result: I end up seeing titles with the same webid next to each other. I would rather have titles from different webids displayed. Since there are only 4-5 different values of webid, I figured it might be a better option to randomize the output based on that column. I am unable to figure out how to specify which column's values should be randomized in a MySQL SELECT statement.
I am currently using the following:
SELECT * FROM table WHERE countryid='1' ORDER BY dateaddedon DESC, RAND(3)
I am currently using 3 as the seed value. I am not sure what impact the seed value has on RAND(). I would highly appreciate it if someone could explain that too.
If a seed value is specified, RAND() produces a repeatable sequence of values. Unless you require a repeatable sequence, leave it out. Also, you should have RAND() as the first clause in ORDER BY; placed after dateaddedon, it only shuffles rows that share the same date.
SELECT * FROM table WHERE countryid='1' ORDER BY RAND(),dateaddedon DESC
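To see why the position of the random term in ORDER BY matters, here is a small sketch. It is an illustration only: it uses Python's sqlite3 module rather than MySQL, SQLite's RANDOM() (which takes no seed) stands in for RAND(), and the items table and its data are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, webid INTEGER, "
    "title TEXT, dateaddedon TEXT)"
)
# 20 made-up rows spread over 4 webids and 5 dates.
rows = [(i, i % 4, f"title {i}", f"2013-01-{(i % 5) + 1:02d}") for i in range(1, 21)]
conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?)", rows)

# Date first, random second: the random term only breaks ties between
# rows that share the same date, so the dates still come out sorted.
date_first = [
    r[0]
    for r in conn.execute(
        "SELECT dateaddedon FROM items ORDER BY dateaddedon DESC, RANDOM()"
    )
]

# Random first: every row gets its own random sort key, so the date
# term almost never has a tie to break and the whole result is shuffled.
random_first = [
    r[0]
    for r in conn.execute("SELECT id FROM items ORDER BY RANDOM(), dateaddedon DESC")
]

print(date_first)
print(random_first)
```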
I have a database with over 60 million records indexed by SphinxQL 2.1.1. Each record has a title and a catid (among other things). When a new record is inserted into the database, I am trying to get sphinx to guess the catid based on the text in the title.
I have managed to get it working for single words like so:
SELECT #groupby, catid, count(*) c FROM sphinx WHERE MATCH('*LANDLORDS*') group by catid order by c desc
However the actual title is likely to be something like this:
Looking for Landlords - Long term lease - No fees!!!
Is there any way to just dump the whole title string into sphinx and have it break down each of the words and perform some sort of fuzzy match, returning the most likely category?
Well, Sphinx as such isn't 'magical', and it doesn't have a 'fuzzy match' function.
But you can approximate one :) Two main steps...
Change from requiring all 'words' to just requiring some.
Change the ranking, so that the best 'intersection' between the query and the title gets a high weight and therefore 'bubbles' to the top.
You can then just take the top result and treat it as a 'best guess'.
(There is actually a third: words like 'for' and 'the' are likely to cause lots of false positives, so you may want to exclude them, either using stopwords on the index, or by just stripping them from the query.)
A prototype of such a query might be something like
SELECT catid FROM sphinx WHERE MATCH('"Looking Landlords Long term lease No fees"/1') LIMIT 1 OPTION ranker=wordcount;
That's using the quorum operator to affect matching, and choosing a different ranker.
Using this version with grouping probably won't work, as it will include lots of low quality matches. Although you could perhaps try using AVG or SUM to get a composite weight?
SELECT SUM(WEIGHT()) as w, catid FROM sphinx WHERE MATCH('"Looking Landlords Long term lease No fees"/1') GROUP BY catid ORDER BY w DESC LIMIT 1 OPTION ranker=wordcount
There are lots of ways to tweak this...
You can try other rankers, e.g. matchany. Or even some custom ranking expressions.
Or change the quorum, e.g. rather than requiring just 1 word, require at least a few.
Or if you can extract phrases, e.g.
'"Looking Landlords" | "Long term lease" | "No fees"'
might work?
Also, rather than just taking the top result, you could take the top 5-10 results and show them all to the user; that compensates for the fact that the results are very approximate.
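To make the idea concrete without a Sphinx install, here is a rough pure-Python imitation of it. It is not Sphinx and does not reproduce its rankers exactly: it strips a made-up stopword list, requires at least min_words query words to match (the "quorum"), scores each record by the number of distinct matching words, and sums the scores per category. The sample records and category ids are invented.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"for", "the", "a", "an", "to", "of"}


def tokenize(text):
    """Lowercase, split on non-alphanumerics, drop stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def guess_category(title, records, min_words=1):
    """Return (catid, score) pairs, best guess first.

    records is an iterable of (title, catid) pairs. A record counts only
    if at least `min_words` query words appear in it (the quorum); its
    score is the number of distinct query words it shares with the title.
    """
    query = set(tokenize(title))
    scores = Counter()
    for rec_title, catid in records:
        overlap = len(query & set(tokenize(rec_title)))
        if overlap >= min_words:
            scores[catid] += overlap
    return scores.most_common()


records = [
    ("Landlords wanted for long term lease", 10),
    ("Long term flat rental, no agency fees", 10),
    ("Looking for a used car", 20),
    ("Garden tools for sale", 30),
]

ranked = guess_category("Looking for Landlords - Long term lease - No fees!!!", records)
print(ranked)
```

Raising min_words plays the same role as raising the quorum threshold: weak one-word matches stop contributing to any category.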
I'm a self-taught programmer/web guy, and I am completely new to MySQL. This is not a how-to question, because I can make it do what I want. Rather, I am asking for advice/critiques on how to do it better with regard to the DB design and queries I used.
A little background about me
I am not a professional programmer. Programming has always been more of a hobby interest for me. My job is designing custom jewelry using a CAD program and then have my models milled out via a CNC made especially for carving jewelry models out of wax (and a few days ago we got a 3D printer that prints in wax suitable for jewelry casting...yay!).
Why I'm doing this mysql/php project
For about a year, I've used a physical bulletin board by my workstation to track my 20 to 30 some odd design projects I always seem to have in queue. I have been wanting to make a website whereby I could enter some details about the job ticket and track it that way. This has various benefits, not the least of which is getting rid of the physical board which I always seem to be knocking the tickets off of. But mainly, the boss can view ticket statuses from anywhere in the world. A few months ago I finally got it roughed-in and it is even usable now, but I wanted to add a few more things to the project (namely, notes and special instructions).
So, I've got a main table that's approximately like...
ticket_id | lname | fname | job_desc | state_id |
----------|-------|-------|----------|-----------|
1 | Smith | Bob | dia ring | 1 |
----------|-------|-------|----------|-----------|
2 | Parker| Gil | pendant | 3 |
----------|-------|-------|----------|-----------|
There are other columns in the tickets table, but that gives the general idea. The state_id refers to another table of states (statuses), which looks like so...
state_id | state_desc |
---------|----------------|
1 | Not Started |
---------|----------------|
2 | Being Designed |
---------|----------------|
3 | Rendered |
There are more states a ticket can have, but that gives the idea. The states end up being the column heads, just as on the physical bulletin board.
I would like to have special instructions and notes for each ticket. This was something that has been awkward, though not impossible, on the physical board. It will be much nicer on the website version.
My first thought was to add another column to the tickets table like so....
ticket_id | lname | fname | job_desc | state_id | sp_instr | notes |
----------|-------|-------|----------|----------|----------|-------|
1 | Smith | Bob | dia ring | 1 | | |
----------|-------|-------|----------|----------|----------|-------|
For each ticket, there will be multiple instructions/notes and I want them to end up as unordered lists in the web page. So I would have to do something like...
Finger Size: 6.5|||Use three largest diamonds only|||Use customer's gold
And then explode on the "|||" (or some other arbitrary character combination) to get the array to make the list from. Simple enough. It was my first direction. Not too bad for the instructions. Never gonna have more than 5 or 6 short sentences. But the notes (which could double as an activity log, too) could be more complex and numerous.
So then I considered making a separate table like so...
ticket_id | instr |
----------|----------------------------------|
12 | Finger Size: 6.5 |
----------|----------------------------------|
12 | Use three largest diamonds only |
----------|----------------------------------|
12 | Use customer's gold |
----------|----------------------------------|
18 | Put bird on pendant somehow |
----------|----------------------------------|
18 | Use cust's white gold in pendant |
----------|----------------------------------|
Below is the meat of the php code that uses the database (connection info is not shown and table names have been changed)....
$db = new mysqli("{connection info}");
// put column heads in an array
$result = $db->query("SELECT * FROM states_table");
$col_heads = array();
while($row = $result->fetch_assoc()){
$col_heads[] = $row;
}
// put ticket info in an array
$result = $db->query("SELECT * FROM main_table ORDER BY due_date");
$tickets = array();
while($row = $result->fetch_assoc()){
$tickets[]=$row;
}
// do a bunch of stuff here to make HTML for the columns & tickets
// collect special instructions
$result = $db->query("SELECT main_table.ticket_id, instructions.instr FROM main_table LEFT JOIN instructions ON main_table.ticket_id = instructions.ticket_id");
while($row = $result->fetch_object()) {
var_dump($row);
}
But it's that last query that is giving me a headache. For tickets that have no special instructions, I get...
object(stdClass)#4 (2) {
["ticket_id"]=>
string(1) "6"
["instr"]=>
NULL
}
But when they do, I get a separate object for each record in the special instructions table like this...
object(stdClass)#2 (2) {
["ticket_id"]=>
string(2) "39"
["instr"]=>
string(22) "Use dias marked in red"
}
object(stdClass)#4 (2) {
["ticket_id"]=>
string(2) "39"
["instr"]=>
string(32) "Cust likes side shank of RG ring"
}
object(stdClass)#2 (2) {
["ticket_id"]=>
string(2) "39"
["instr"]=>
string(73) "Cust likes top of WG ring as well as the top of the ring from our website"
}
I am definitely able to (and did) go through that mess with a while loop and collect the instructions for a single ticket id, pack them into an array and associate them with the id so I have key=>value pairs like so...
ticket_id => array of instructions
So I can strong-arm my way through it, but I feel like I am overlooking some power in the MYSQL query statements somehow. Are my database tables and queries laughable? Have I overlooked something common/useful/powerful? Or is that just how linking a single record from one table to another table with multiple records is?
Sorry for the rambling post.
Because your question is more about MySQL than anything else, I'd say add a query we can comment on. I'll just assume what kind of advice you need here.
As for DB normalisation, your setup looks good. Some common MySQL advice bullet points you might not be aware of:
Always give each table a unique (auto-increment) ID column, so add that to the notes table.
Consider adding field prefixes derived from the table name. This makes multi-table queries more readable. Often the prefix is truncated to 3 or 4 characters, like noteTicketID inside notes and tickID inside the tickets table.
Try to make table names the plural of what they contain, but don't make the field names plural unless one field actually contains multiple values.
Do not rely on queries without an ORDER BY clause when the row order matters; without one, MySQL does not guarantee any particular order.
As for DB queries.. a query can be simply:
SELECT * FROM `notes` WHERE `noteTicketID`=3 ORDER BY `noteCreated` DESC
But perhaps you want to integrate several queries into one, and make a huge overview of open tickets. MySQL allows you to do this: (list every ticket in certain states, including notes)
SELECT * FROM `tickets`
LEFT JOIN `notes` ON `notes`.`noteTicketID`=`tickets`.`tickID`
WHERE `tickStateID`=0 OR `tickStateID`=1
It's certainly not "good design" to make arrays from MySQL results which you then use to run further queries. If you can let MySQL do it for you in one swoop, that's a win-win situation.
As for solving the company problem, consider defining it and looking for existing solutions. Perhaps you need enterprise solutions like ERP, CRM, SCM, or just a simple issue tracker like MantisBT. This might not require much MacGyvering, and you'll likely move towards such a solution eventually anyway. The MacGyvering is certainly worth the effort though, if it helps define the problem, but it takes a lot of time. Nonetheless, this is how we all learned programming, plus feedback like this on SO.
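For reference, this is roughly what collecting the rows of one LEFT JOIN into "ticket_id => array of instructions" looks like in application code. It is a sketch only: Python's sqlite3 stands in for PHP/mysqli and MySQL, the table names main_table and instructions follow the question, and the id and lname columns and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE main_table (ticket_id INTEGER PRIMARY KEY, lname TEXT);
CREATE TABLE instructions (id INTEGER PRIMARY KEY, ticket_id INTEGER, instr TEXT);
INSERT INTO main_table VALUES (6, 'Smith'), (39, 'Parker');
INSERT INTO instructions (ticket_id, instr) VALUES
    (39, 'Use dias marked in red'),
    (39, 'Cust likes side shank of RG ring');
""")

# One query for all tickets; tickets without instructions come back
# once, with instr = NULL (None in Python).
rows = conn.execute(
    "SELECT m.ticket_id, i.instr "
    "FROM main_table m "
    "LEFT JOIN instructions i ON m.ticket_id = i.ticket_id "
    "ORDER BY m.ticket_id, i.id"
)

# Group the flat rows into ticket_id => list of instructions.
by_ticket = {}
for ticket_id, instr in rows:
    bucket = by_ticket.setdefault(ticket_id, [])
    if instr is not None:
        bucket.append(instr)

print(by_ticket)
```

The loop runs once over a single result set, so no follow-up queries are issued per ticket.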
Conclusion:
Not your question, but install MantisBT and see how far it gets you. At worst, it helps define the problem. But you may see a few workflow tricks that you can use in a custom solution, or you can check out its intricate database design. And best of all, it's also written in PHP and uses MySQL.
If I had to do over again, I would have called this question: MYSQL: Query that Groups Rows from Left Joined Tables???
I knew so little about MYSQL I could hardly think how to ask the question. My main issue was that the left joined table had multiple records per ticket id. I needed a way to group this data.
Enter GROUP_CONCAT.
The query that worked for me?
SELECT c.*,
GROUP_CONCAT(i.instr SEPARATOR '|||') AS instructions,
GROUP_CONCAT(n.note SEPARATOR '|||') AS notes
FROM custom_tickets c
LEFT JOIN ct_instr i ON c.ticket_id = i.ticket_id
LEFT JOIN ct_notes n ON c.ticket_id = n.ticket_id
GROUP BY c.ticket_id
The above query, as used in my PHP code, creates an array with key => value pairs that look something like...
ticket_id => "1"
lname => "Smith"
fname => "Bob"
job_desc => "a cool ring"
instructions => "finger size: 6|||use cust's gold|||don't screw it up"
notes => "Use more diamonds now|||Make it look less stupid"
Some cool things that I (think I) learned
how to reference table names via variables within the query statement
that you can reference a joined table prior to using the JOIN keyword
(apparently) the results of GROUP_CONCAT should be treated as a column
GROUP_CONCAT is a great place to use the AS keyword. It makes for an ugly key in the array if you don't.
Also, by default, GROUP_CONCAT uses the comma as the separator. But as my notes and instructions will commonly include commas, I needed to specify a separator. I randomly picked a group of characters I am unlikely to ever use ("|||").
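One thing worth checking with the two-join version: when a ticket has several instructions and several notes, the two LEFT JOINs pair every instruction with every note, so each value can appear repeatedly in the concatenated strings. The sketch below shows the row multiplication and one possible way around it, aggregating each child table in its own subquery. It is an illustration under assumptions: Python's sqlite3 stands in for MySQL (SQLite spells the separator as a second argument, group_concat(x, '|||')), and the id columns and sample data are invented.

```python
import sqlite3

SEP = "|||"

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE custom_tickets (ticket_id INTEGER PRIMARY KEY, lname TEXT);
CREATE TABLE ct_instr (id INTEGER PRIMARY KEY, ticket_id INTEGER, instr TEXT);
CREATE TABLE ct_notes (id INTEGER PRIMARY KEY, ticket_id INTEGER, note TEXT);
INSERT INTO custom_tickets VALUES (1, 'Smith'), (2, 'Parker');
INSERT INTO ct_instr (ticket_id, instr) VALUES
    (1, 'finger size: 6'), (1, 'use cust''s gold'), (1, 'check prongs');
INSERT INTO ct_notes (ticket_id, note) VALUES
    (1, 'Use more diamonds now'), (1, 'Resize band');
""")

# 3 instructions x 2 notes = 6 joined rows for ticket 1.
joined_rows = conn.execute(
    "SELECT COUNT(*) FROM custom_tickets c "
    "LEFT JOIN ct_instr i ON c.ticket_id = i.ticket_id "
    "LEFT JOIN ct_notes n ON c.ticket_id = n.ticket_id "
    "WHERE c.ticket_id = 1"
).fetchone()[0]

# Aggregating each child table separately avoids the multiplication.
query = """
SELECT c.ticket_id, c.lname,
       (SELECT group_concat(i.instr, '|||') FROM ct_instr i
         WHERE i.ticket_id = c.ticket_id) AS instructions,
       (SELECT group_concat(n.note, '|||') FROM ct_notes n
         WHERE n.ticket_id = c.ticket_id) AS notes
FROM custom_tickets c
ORDER BY c.ticket_id
"""

tickets = {}
for ticket_id, lname, instructions, notes in conn.execute(query):
    tickets[ticket_id] = {
        "lname": lname,
        # group_concat returns NULL when there are no child rows.
        "instructions": instructions.split(SEP) if instructions else [],
        "notes": notes.split(SEP) if notes else [],
    }

print(joined_rows)
print(tickets)
```

In MySQL itself, GROUP_CONCAT(DISTINCT ...) is another option, with the caveat that it also collapses values that are legitimately entered twice.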
Thanks for everyone's help.
I'm designing a MySQL database, and I'd like some input on an efficient way to store blog/article data for searching.
Right now, I've made a separate column that stores the content to be searched - no duplicate words, no words shorter than four letters, and no words that are too common. So, essentially, it's a list of keywords from the original article. Also searched would be a list of tags, and the title field.
I'm not quite sure how MySQL indexes fulltext columns, so would storing the data like that be ineffective, or redundant somehow? A lot of the articles are on the same topic, so would the score be hurt by so many of the rows having similar keywords?
Also, for this project, solutions like Sphinx, Lucene or Google Custom Search can't be used -- only PHP & MySQL.
Thanks!
EDIT - Let me clarify:
Basically, I'm asking which way fulltext would provide the fastest, most relevant results: by finding many instances of the search term in all the data, or just the single keyword among a handful of other words.
I think a separate keywords table would be over the top for what I need, so should I forget the keywords column and search on the article, or continue to select keywords for each row?
You should build the word list (according to the rules you've specified) in a separate table and then map it to each article in a join table, along with the number of occurrences:
words: id | name
articles: id | title | content
articles_words: id | article_id | word_id | occurrences
Now you can scan through the join table and even rank the articles by the occurrence of the word, and probably place some importance on the order in which the words were typed in the search query string.
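A minimal sketch of that three-table layout and an occurrence-ranked lookup, assuming Python's sqlite3 as a stand-in for MySQL. The helper names index_article and search, the tiny COMMON word list and the sample articles are all made up for illustration; the "four letters or more" rule comes from the question.

```python
import re
import sqlite3
from collections import Counter

# Hypothetical list of words considered too common to index.
COMMON = {"that", "with", "this", "from"}

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE words (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, content TEXT);
CREATE TABLE articles_words (
    id INTEGER PRIMARY KEY,
    article_id INTEGER, word_id INTEGER, occurrences INTEGER
);
""")


def index_article(conn, article_id, title, content):
    """Store the article and one articles_words row per indexed word."""
    conn.execute("INSERT INTO articles VALUES (?, ?, ?)", (article_id, title, content))
    counts = Counter(
        w for w in re.findall(r"[a-z]+", content.lower())
        if len(w) >= 4 and w not in COMMON
    )
    for word, n in counts.items():
        conn.execute("INSERT OR IGNORE INTO words (name) VALUES (?)", (word,))
        word_id = conn.execute(
            "SELECT id FROM words WHERE name = ?", (word,)
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO articles_words (article_id, word_id, occurrences) "
            "VALUES (?, ?, ?)",
            (article_id, word_id, n),
        )


def search(conn, term):
    """Return (article_id, occurrences), most occurrences first."""
    return conn.execute(
        "SELECT a.id, aw.occurrences FROM articles a "
        "JOIN articles_words aw ON aw.article_id = a.id "
        "JOIN words w ON w.id = aw.word_id "
        "WHERE w.name = ? ORDER BY aw.occurrences DESC, a.id",
        (term.lower(),),
    ).fetchall()


index_article(conn, 1, "Index tuning", "index index index query")
index_article(conn, 2, "Query basics", "query index")
hits = search(conn, "index")
print(hits)
```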
Of course, this is a very academic solution. I'm not sure what your project requires, but FULLTEXT indexing is very powerful and you're usually better off using it in practical situations.
HTH.
I have a table which contains information about a certain month, and one column in each row holds MySQL row ids for another table, from which I need to grab several pieces of information.
Is there a more efficient way to get the information than exploding the ids and doing a separate SQL query for each? Here is an example:
Row ID | Name | Other Sources
1 Test 1,2,7
The Other Sources column has the ids of the rows from the other table, which looks like so:
Row ID | Name | Information | Link
1 John | No info yet? | http://blah.com
2 Liam | No info yet? | http://blah.com
7 Steve| No info yet? | http://blah.com
and overall the information returned would be like the below:
Hi this page is called test... here is a list of our sources
- John (No info yet?) find it here at http://blah.com
- Liam (No info yet?) find it here at http://blah.com
- Steve (No info yet?) find it here at http://blah.com
Right now I would explode Other Sources on the comma and then do a separate SQL query for each id. I am sure there is a better way?
Looks like a classic many-to-many relationship. You have pages and sources - each page can have many sources and each source could be the source for many pages?
Fortunately this is very much a solved problem in relational database design. You would use a 3rd table to relate the two together:
Pages (PageID, Name)
Sources (SourceID, Name, Information, Link)
PageSources (PageID, SourceID)
The key for the "PageSources" table would be both PageID and SourceID.
Then, To get all the sources for a page for example, you would use this SQL:
SELECT s.*
FROM Sources s INNER JOIN PageSources ps ON s.SourceID = ps.SourceID
AND ps.PageID = 1;
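Here is a small runnable sketch of that three-table design, using Python's sqlite3 as a stand-in for MySQL. The table and column names follow the answer; the second page ('Other') and the helper sources_for_page are invented to show one source being shared by two pages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Pages (PageID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Sources (
    SourceID INTEGER PRIMARY KEY, Name TEXT, Information TEXT, Link TEXT
);
-- Junction table: the key is the pair (PageID, SourceID).
CREATE TABLE PageSources (
    PageID INTEGER REFERENCES Pages(PageID),
    SourceID INTEGER REFERENCES Sources(SourceID),
    PRIMARY KEY (PageID, SourceID)
);
INSERT INTO Pages VALUES (1, 'Test'), (2, 'Other');
INSERT INTO Sources VALUES
    (1, 'John', 'No info yet?', 'http://blah.com'),
    (2, 'Liam', 'No info yet?', 'http://blah.com'),
    (7, 'Steve', 'No info yet?', 'http://blah.com');
INSERT INTO PageSources VALUES (1, 1), (1, 2), (1, 7), (2, 2);
""")


def sources_for_page(conn, page_id):
    """All sources linked to one page, fetched with a single join."""
    return conn.execute(
        "SELECT s.Name, s.Information, s.Link "
        "FROM Sources s "
        "INNER JOIN PageSources ps ON s.SourceID = ps.SourceID "
        "WHERE ps.PageID = ? "
        "ORDER BY s.SourceID",
        (page_id,),
    ).fetchall()


page1 = sources_for_page(conn, 1)
for name, info, link in page1:
    print(f"- {name} ({info}) find it here at {link}")
```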
Not easily with your table structure. If you had another table like:
ID Source
1 1
1 2
1 7
Then join is your friend. With things the way they are, you'll have to do some nasty splitting on comma-separated values in the "Other Sources" field.
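If the comma-separated column has to stay for now, the splitting can at least be followed by one query instead of one per id. A sketch, assuming Python's sqlite3 in place of PHP and MySQL, with invented table names (pages, sources) and a hypothetical helper sources_from_csv:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pages (id INTEGER PRIMARY KEY, name TEXT, other_sources TEXT);
CREATE TABLE sources (id INTEGER PRIMARY KEY, name TEXT, information TEXT, link TEXT);
INSERT INTO pages VALUES (1, 'Test', '1,2,7');
INSERT INTO sources VALUES
    (1, 'John', 'No info yet?', 'http://blah.com'),
    (2, 'Liam', 'No info yet?', 'http://blah.com'),
    (3, 'Ann', 'No info yet?', 'http://blah.com'),
    (7, 'Steve', 'No info yet?', 'http://blah.com');
""")


def sources_from_csv(conn, page_id):
    """Split the comma-separated ids, then fetch them all with one IN query."""
    row = conn.execute(
        "SELECT other_sources FROM pages WHERE id = ?", (page_id,)
    ).fetchone()
    if row is None or not row[0]:
        return []
    # int() validates each piece, so only numbers reach the query.
    ids = [int(piece) for piece in row[0].split(",") if piece.strip()]
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))  # e.g. "?,?,?"
    return conn.execute(
        f"SELECT name FROM sources WHERE id IN ({placeholders}) ORDER BY id", ids
    ).fetchall()


names = [row[0] for row in sources_from_csv(conn, 1)]
print(names)
```

This still leaves the ids unindexed and unchecked by the database, which is why the separate table remains the better fix.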
Maybe I'm missing something obvious (been known to), but why are you using a single field in your first table with a comma-delimited set of values rather than a simple join table? The solution, if you do that, is trivial.
The problem with these tables is that a multi-valued column doesn't work well with SQL. Tables in this format are not considered normalized, as multi-valued columns are forbidden in First Normal Form and above.
First Normal Form means...
There's no top-to-bottom ordering to the rows.
There's no left-to-right ordering to the columns.
There are no duplicate rows.
Every row-and-column intersection contains exactly one value from the applicable domain (and nothing else).
All columns are regular [i.e. rows have no hidden components such as row IDs, object IDs, or hidden timestamps].
- Chris Date, "What First Normal Form Really Means", pp. 127-8
Anyway, the best way to do it is to have a many to many relationship. This is done by putting a third table in the middle, like Dominic Rodger does in his answer.