Database Search minus "The" prefix - php

Could someone please point me in the right direction? I currently have a searchable database and have run into a problem when searching by title.
If a title begins with "The", it ends up in the 'T' section. What is a good way to keep "The" from being searched? Should I split the title into two fields, concatenating them for display but searching only the second field, which excludes the prefix? Or is there another way to do this? Any advice or direction would be great. Thanks.

A few choices:
a) Store the title in "Library" format, which means you process the title and store it as
Scarlet Pimpernel, The
Tale of Two Cities, A
b) Store the original unchanged title for display purposes, and add a new "library_title" field to store the processed version from a).
c) Add a new field to store the leading article, and store the bare title in the title field. For display, you'd concatenate the two fields; for searching, you'd look only in the title field.
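Options a) and c) can be sketched in PHP roughly as follows. This is a minimal sketch, assuming the articles to move are "The", "A" and "An" (the `LEADING_ARTICLES` list and the function names are hypothetical); the two parts returned by `splitArticle()` would go into the article and title fields.

```php
<?php
// Hypothetical list of leading articles to move; extend it for your catalog.
const LEADING_ARTICLES = ['the', 'a', 'an'];

/**
 * Option c): split a title into [article, bareTitle].
 * "The Scarlet Pimpernel" => ["The", "Scarlet Pimpernel"]
 */
function splitArticle(string $title): array
{
    $title = trim($title);
    $parts = preg_split('/\s+/', $title, 2);
    if (count($parts) === 2 && in_array(strtolower($parts[0]), LEADING_ARTICLES, true)) {
        return [$parts[0], $parts[1]];
    }
    return ['', $title]; // no leading article
}

/** Option a): "Library" format, e.g. "Scarlet Pimpernel, The". */
function toLibraryTitle(string $title): string
{
    [$article, $bare] = splitArticle($title);
    return $article === '' ? $bare : "$bare, $article";
}

/** Display form for option c): concatenate the two stored fields. */
function displayTitle(string $article, string $bare): string
{
    return $article === '' ? $bare : "$article $bare";
}
```

Titles such as "Theory of Everything" are left alone, because only a whole first word is compared against the article list.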

I believe the best approach is to use full-text search, with 'the' in the stopword list. That would solve the search problem (i.e., 'the' in search phrases would be ignored).
However, if you are ordering the results by title, a title starting with 'The' would still be sorted "in the 'T' section", as you put it. To solve that, there are several possible approaches. Here are some of them:
Separating the fields, as you suggested in the question
Having a separate field with the number of chars to be ignored from the beginning when sorting
Replacing initial 'The's for sorting
Among others...
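The "number of characters to ignore" approach could look like this in PHP, assuming a hypothetical `sort_offset` column computed when the title is saved. In MySQL the same idea would be roughly `ORDER BY SUBSTRING(title, sort_offset + 1)`; the PHP below only shows how the offset might be derived and used.

```php
<?php
/**
 * Characters to skip when sorting, stored in a hypothetical sort_offset
 * column: 4 for "The Hobbit" ("The "), 0 when there is no leading article.
 */
function sortOffset(string $title, array $articles = ['the', 'a', 'an']): int
{
    if (preg_match('/^(\w+)\s+/', $title, $m) && in_array(strtolower($m[1]), $articles, true)) {
        return strlen($m[0]); // length of the article plus the following whitespace
    }
    return 0;
}

/** Sort rows of ['title' => ..., 'sort_offset' => ...] by the title minus its offset. */
function sortByTitle(array $rows): array
{
    usort($rows, function (array $a, array $b): int {
        return strcasecmp(
            substr($a['title'], $a['sort_offset']),
            substr($b['title'], $b['sort_offset'])
        );
    });
    return $rows;
}
```

Storing the offset instead of a second copy of the title keeps the displayed title untouched while still sorting "The Hobbit" under 'H'.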

If you are using MySQL, you could use the REPLACE() function to remove "The" in your query; or, if you are using PHP, Ruby or another language, you can simply sanitize the search string before sending it to the database server.
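A minimal PHP sketch of sanitizing the search string, assuming only a single leading "The", "A" or "An" should be dropped. Articles in the middle are kept, so a search for "Catcher in the Rye" still matches a stored "The Catcher in the Rye" with a `LIKE '%...%'` query. `stripLeadingArticle()` is a hypothetical helper name.

```php
<?php
/**
 * Remove a leading "The", "A" or "An" (any case) from a user's search
 * string before it is sent to the database. Words elsewhere are kept.
 */
function stripLeadingArticle(string $search): string
{
    // The article must be followed by whitespace, so "Theory" or "Anathem" are untouched.
    $stripped = preg_replace('/^\s*(the|an?)\s+/i', '', $search);
    return trim($stripped);
}
```

The cleaned string should still be passed to the query as a bound parameter (or escaped), never interpolated raw.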

Create three columns in the database
1) TitlePrefix
2) Title
3) TitlePostfix
Then write four methods, such as:
searchTitleOnly(textToSearch) // search the title column only
searchTitleWithPrefixAndPostfix(textToSearch) // concatenate all three columns and search
searchTitlePrefix(textToSearch) // search the title prefix only
searchTitlePostfix(textToSearch) // search the title postfix only
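The four methods can be sketched in PHP against an in-memory array of rows, which keeps the example runnable without a database. The class and its row format are assumptions; against MySQL each method would become a WHERE clause, for example `CONCAT_WS(' ', TitlePrefix, Title, TitlePostfix) LIKE ?` for the combined search. The example data (an edition note as the postfix) is hypothetical too.

```php
<?php
/**
 * In-memory sketch of the four search methods. Each row is assumed to be
 * ['TitlePrefix' => ..., 'Title' => ..., 'TitlePostfix' => ...]; matching is
 * a case-insensitive substring test, similar to LIKE '%text%' under a _ci collation.
 */
final class TitleSearch
{
    private $rows;

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    // Keep the rows whose selected text contains $textToSearch.
    private function filter(string $textToSearch, callable $select): array
    {
        $hits = [];
        foreach ($this->rows as $row) {
            if (stripos($select($row), $textToSearch) !== false) {
                $hits[] = $row;
            }
        }
        return $hits;
    }

    public function searchTitleOnly(string $textToSearch): array
    {
        return $this->filter($textToSearch, fn (array $r) => $r['Title']);
    }

    public function searchTitleWithPrefixAndPostfix(string $textToSearch): array
    {
        // Join the three columns with spaces, similar to CONCAT_WS(' ', ...) in SQL.
        return $this->filter($textToSearch, fn (array $r) => trim($r['TitlePrefix'] . ' ' . $r['Title'] . ' ' . $r['TitlePostfix']));
    }

    public function searchTitlePrefix(string $textToSearch): array
    {
        return $this->filter($textToSearch, fn (array $r) => $r['TitlePrefix']);
    }

    public function searchTitlePostfix(string $textToSearch): array
    {
        return $this->filter($textToSearch, fn (array $r) => $r['TitlePostfix']);
    }
}
```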

Try looking into SQL string functions such as LTRIM, RTRIM and TRIM, and apply them to a temporary column that holds an exact copy of the data. Note that in MySQL, LTRIM and RTRIM only strip spaces; to drop a leading word, use something like TRIM(LEADING 'The ' FROM title). Modify the copy to drop whichever words you like, then perform the search on the modified column and return the entire row as the result.
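Here is a sketch of the same idea in PHP: compute a normalized copy of each title (the temporary column), search that copy, and return the untouched rows. It uses a regular expression because PHP's `ltrim()` works on a list of characters rather than a word, so `ltrim('The Terminal', 'The ')` returns `'rminal'`. The function names and the default word list are assumptions.

```php
<?php
/**
 * Build a lowercase copy of a title with the chosen leading words removed
 * (the "temporary column"). A regex is used because ltrim() strips a set of
 * characters, not a word: ltrim('The Terminal', 'The ') gives 'rminal'.
 */
function normalizeTitle(string $title, array $dropWords): string
{
    $alternation = implode('|', array_map(fn (string $w) => preg_quote($w, '/'), $dropWords));
    // Remove one or more of the drop words at the start, each followed by whitespace.
    $normalized = preg_replace('/^(?:(?:' . $alternation . ')\s+)+/i', '', trim($title));
    return strtolower($normalized);
}

/**
 * Search the normalized copy (prefix match, so "catcher" finds
 * "The Catcher in the Rye") and return the entire original rows.
 */
function searchNormalized(array $rows, string $search, array $dropWords = ['the', 'a', 'an']): array
{
    $needle = normalizeTitle($search, $dropWords);
    $matches = [];
    foreach ($rows as $row) {
        // Computed on the fly here; in a database this would be a stored column.
        $copy = normalizeTitle($row['title'], $dropWords);
        if ($needle !== '' && strpos($copy, $needle) === 0) {
            $matches[] = $row;
        }
    }
    return $matches;
}
```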

Related

Autocomplete SQL Query suggestions (Ajax+PHP)

I have a question about SQL best practices when writing a query for an autocomplete form (jQuery Ajax + PHP).
Let us assume the following:
I have a database with the titles of books
Some books have titles without a leading article ("The" or "A"), such as "Life of Pi"
Some books have titles with a leading article ("The" or "A"), such as "The Catcher in the Rye"
As a result, users may type a title either with "The" at the beginning or without any article.
Three possible queries seem to exist:
SELECT title FROM books WHERE title LIKE '%$string'
or
SELECT title FROM books WHERE title LIKE '$string%'
or
SELECT title FROM books WHERE title LIKE '%$string%'
When using the first query (where the % comes before the string), it is difficult to get any results, because '%$string' only matches titles that end with the typed text.
When using the second query, a title only matches if it starts with the typed text, so the user has to include "The". Thus, a user searching for "The Catcher in the Rye" will find the book, but a user searching for "Catcher in the Rye" will not.
The last query gives the best results, since it has a wildcard before and after the string. However, it also produces the longest autocomplete list, so the user has to type a few letters to narrow it down.
Any ideas for a more efficient query? Or is the third option the best one (seeing as it is not feasible to separate the leading article from the title of a book)?
Thanks in advance,
You can search using regular expressions (note that REGEXP cannot use an index, so MySQL scans the table), and don't forget to limit your results.
A small example:
SELECT title FROM books WHERE title REGEXP '$string' LIMIT 20
Or you can use word boundaries:
SELECT title FROM books WHERE title REGEXP '[[:<:]]$string[[:>:]]' LIMIT 20
(MySQL 8.0.4 and later no longer support [[:<:]] and [[:>:]]; use \b there instead, written '\\b' inside an SQL string literal.)
See the documentation: http://dev.mysql.com/doc/refman/5.5/en/regexp.html
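// $link is an open mysqli connection; escape user input before putting it in SQL.
$safe = mysqli_real_escape_string($link, $string);
$result = mysqli_query($link, "SELECT title FROM books WHERE title REGEXP '$safe' LIMIT 20");
if ($result && mysqli_num_rows($result) == 0) {
    // First remove stop words such as for, the, to, of, a from the search string.
    $stopWords = array('/\bfor\b/i', '/\bthe\b/i', '/\bto\b/i', '/\bof\b/i', '/\ba\b/i');
    $string = trim(preg_replace('/\s+/', ' ', preg_replace($stopWords, '', $string)));
    // Then run the query again with the cleaned string.
    $safe = mysqli_real_escape_string($link, $string);
    $result = mysqli_query($link, "SELECT title FROM books WHERE title REGEXP '$safe' LIMIT 20");
}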
I would suggest using the third method with wildcards on either side of the string. If you are worried about the size of the returned result set, perhaps limit the results to a certain number, and as the user types the list will naturally get smaller and more specific.
You may also consider allowing searches such as 'Catcher Rye', which should still match.
In that case, you would tokenize each word in the title as well as the words entered by the user, and find the best matches.
Otherwise, only autocomplete after, say, four or more characters have been entered, and use option 3.
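The tokenizing idea can be sketched in PHP as below, assuming a small hypothetical stop-word list and the four-character minimum mentioned above. Each input word counts as a match if some word of the title starts with it, so a half-typed last word still matches; titles are ranked by how many input words they match.

```php
<?php
// Illustrative stop-word list (an assumption; tune it for your titles).
const STOP_WORDS = ['the', 'a', 'an', 'of', 'in', 'for', 'to'];

/** Lowercase words of a string, with stop words removed. */
function tokenize(string $text): array
{
    $words = preg_split('/[^\p{L}\p{N}]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return array_values(array_diff($words, STOP_WORDS));
}

/**
 * Rank titles by how many input words they match. An input word matches
 * if a title word starts with it, so a half-typed last word still counts.
 */
function suggestTitles(string $input, array $titles, int $minChars = 4, int $limit = 10): array
{
    if (strlen(trim($input)) < $minChars) {
        return []; // too little input: don't suggest yet
    }
    $queryTokens = tokenize($input);
    $scored = [];
    foreach ($titles as $title) {
        $titleTokens = tokenize($title);
        $score = 0;
        foreach ($queryTokens as $q) {
            foreach ($titleTokens as $t) {
                if (strpos($t, $q) === 0) {
                    $score++;
                    break;
                }
            }
        }
        if ($score > 0) {
            $scored[] = [$score, $title];
        }
    }
    // Highest score first; ties in alphabetical order.
    usort($scored, fn (array $x, array $y) => [$y[0], $x[1]] <=> [$x[0], $y[1]]);
    return array_column(array_slice($scored, 0, $limit), 1);
}
```

For a large table this ranking would normally run on a pre-filtered candidate set (for example the LIMITed result of option 3) rather than on every title.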
If you're worried about the quantity of suggestions, can you modify the change event to only retrieve suggestions after they have typed some minimum number of characters in the field?

Backend for autosuggest for fulltext search

I want to create an autosuggest for a fulltext search with AJAX, PHP & MySQL.
I am looking for the right way to implement the backend. While the user is typing, the input field should offer suggestions, generated from the text entries in a table.
Some information about these entries: they are stored as full text, extracted from PDFs of 3-4 pages each. There are no more than 100 entries for now, and there will be at most 2000 in the next few years.
When the user starts typing, the word being typed should be completed with a word stored in the DB, sorted by number of occurrences, descending. The next step is to suggest combinations with other words that occur frequently in the entries matching the first word. It is comparable to Google's autosuggest.
I am thinking about 3 different ways to implement this:
Generate an index via a cron job that counts occurrences of words and word combinations overnight. The user searches this index.
I do a live search over the entries with LIKE "%search%". Then I look at the word following the match and GROUP the results by occurrence.
I create a logfile for all user searches, and look for good combinations like in 1), so the search gets more intelligent with each search action.
What is the best way to start with this? The search should be fast and performant.
Is there a better possibility I did not think about?
I'd use MySQL's MATCH() AGAINST() (http://dev.mysql.com/doc/refman/5.5/en/fulltext-search.html), which requires a FULLTEXT index on the column, e.g.:
SELECT *
FROM your_table
WHERE MATCH(your_column) AGAINST('search')
ORDER BY MATCH(your_column) AGAINST('search') DESC
Another advantage is that you could further tweak the importance of the words being searched for (if necessary), like:
MATCH(your_column) AGAINST('>important <lessimportant' IN BOOLEAN MODE)
Or specify that certain words of the search term are required, while others must not be present in the result, e.g.:
MATCH(your_column) AGAINST('+required -prohibited' IN BOOLEAN MODE)
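For autosuggest, a boolean-mode search string can be built from the user's input, making every word required and adding the trailing `*` truncation operator to the word still being typed. This is a sketch with a hypothetical function name; the result would be passed as a bound parameter, e.g. `WHERE MATCH(...) AGAINST(? IN BOOLEAN MODE)`. Depending on the server configuration, very short words and stopwords may be ignored by the full-text engine.

```php
<?php
/**
 * Turn raw user input into a boolean-mode search string for autosuggest:
 * every complete word is required (+word) and the last word, which may
 * still be incomplete, gets the trailing * truncation operator.
 */
function booleanAutosuggestQuery(string $input): string
{
    // Drop boolean-mode operator characters so user input cannot change the query logic.
    $clean = preg_replace('/[+\-<>()~*"@]+/', ' ', $input);
    $words = preg_split('/\s+/', trim($clean), -1, PREG_SPLIT_NO_EMPTY);
    if (!$words) {
        return '';
    }
    $last = array_pop($words);
    $terms = array_map(fn (string $w) => '+' . $w, $words);
    $terms[] = '+' . $last . '*';
    return implode(' ', $terms);
}
```

An empty return value means there is nothing to search for, so the request can be skipped.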
I think idea no. 1 is the best. By the way, don't forget to eliminate stopwords (an, the, by, ...) from the autosuggest.
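Idea no. 1 can be sketched in PHP as follows: a batch job (run from cron) counts word occurrences across all entries, skipping stopwords, and suggestions are the words that start with the typed prefix, most frequent first. The names and stop-word list are assumptions; in a real setup the counts would live in a table such as a hypothetical `word_counts(word, occurrences)`, queried with `WHERE word LIKE 'pre%' ORDER BY occurrences DESC LIMIT 5`.

```php
<?php
// Stop words to leave out of suggestions (illustrative list).
const AUTOSUGGEST_STOP_WORDS = ['a', 'an', 'and', 'by', 'in', 'of', 'the', 'to'];

/** Batch step (e.g. nightly via cron): count how often each word occurs. */
function buildWordIndex(array $entries): array
{
    $counts = [];
    foreach ($entries as $text) {
        $words = preg_split('/[^\p{L}\p{N}]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
        foreach ($words as $word) {
            if (!in_array($word, AUTOSUGGEST_STOP_WORDS, true)) {
                $counts[$word] = ($counts[$word] ?? 0) + 1;
            }
        }
    }
    return $counts;
}

/** Request step: words starting with the typed prefix, most frequent first. */
function suggestWords(array $index, string $prefix, int $limit = 5): array
{
    $prefix = strtolower(trim($prefix));
    $hits = [];
    foreach ($index as $word => $count) {
        $word = (string) $word; // numeric words come back as integer array keys
        if ($prefix !== '' && strpos($word, $prefix) === 0) {
            $hits[] = [$count, $word];
        }
    }
    // Highest count first; ties in alphabetical order.
    usort($hits, fn (array $x, array $y) => [$y[0], $x[1]] <=> [$x[0], $y[1]]);
    return array_column(array_slice($hits, 0, $limit), 1);
}
```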

Need a help to write search query

My database contains a list of phone numbers stored as VARCHAR. A phone number may be in any of these formats:
12323232323
1-232-323 2323
232-323-2323
2323232323
Instead of the - symbol there may be (, ), a comma, a period, or a space.
If I search for 12323232323, 1-232-323 2323, 232-323-2323, or 2323232323, it should display all of these results. I need to write a query for this.
I don't think it is efficient to do this in real time. I propose two options:
Clean the data, so that there is only one format.
Add another column that contains the cleaned data; when searching, search this column, and when displaying, show the data in its original format.
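For the second option, the value for the clean column (and the search input) could be normalized like this. This is a sketch assuming North American numbers (10 digits with an optional leading 1); `normalizePhone()` and the column name `phone_clean` are hypothetical. With both sides normalized, the search becomes `WHERE phone_clean = ?`, which can use an index.

```php
<?php
/**
 * Normalize a phone number for the clean search column: keep digits only
 * and drop a leading country code 1 from 11-digit numbers, so
 * "1-232-323 2323", "232-323-2323" and "(232) 323.2323" all become "2323232323".
 * Assumption: North American numbers (10 digits, optional leading 1).
 */
function normalizePhone(string $raw): string
{
    $digits = preg_replace('/\D+/', '', $raw); // strip -, (, ), comma, period, space, ...
    if (strlen($digits) === 11 && $digits[0] === '1') {
        $digits = substr($digits, 1);
    }
    return $digits;
}
```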
I agree with James, but if you really need to search the database as it is, perhaps MySQL's REPLACE() function will get you where you need to go. Something like:
select * from mytable where replace(crazynumber,'-','')='23232323';
How to Replace Multiple Characters in SQL?
Can MySQL replace multiple characters?
I agree with James, but if you really need to do this, the two links above propose the perfect solutions for your scenario.
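If the data has to stay as it is, the nested REPLACE() expression for all the separator characters from the question can be generated rather than typed by hand. A sketch with a hypothetical helper name; the column name must be a trusted identifier, and the characters are assumed not to include a backslash (which MySQL treats as an escape character in string literals).

```php
<?php
/**
 * Build a nested MySQL REPLACE() expression that removes each given
 * character from a column, e.g. for WHERE <expression> = ?.
 * $column must be a trusted identifier, never user input.
 */
function nestedReplaceSql(string $column, array $chars): string
{
    $expr = $column;
    foreach ($chars as $char) {
        // Quote the character as an SQL string literal, doubling single quotes.
        $quoted = "'" . str_replace("'", "''", $char) . "'";
        $expr = "REPLACE($expr, $quoted, '')";
    }
    return $expr;
}
```

For example, `nestedReplaceSql('crazynumber', ['-', '(', ')', ',', '.', ' '])` covers all the separators from the question. Because stored numbers may or may not start with 1, comparing the last 10 digits with `RIGHT(expression, 10) = ?` is one option. Either way the expression cannot use an index, so every row is scanned.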

Need a PHP MySQL script to search for keywords in a database

I need to implement a search option for user comments that are stored in a MySQL database. I would optimally like it to work in a similar manner to a standard web page search engine, but I am trying to avoid the large scale solutions. I'd like to just get a feel for the queries that would give me decent results. Any suggestions? Thanks.
It's possible to create a full indexing solution with some straightforward steps. You could create a table that maps words to each post, then when you search for some words find all posts that match.
Here's a short algorithm:
When a comment is posted, convert the string to lowercase and split it into words (split on spaces, and optionally dashes/punctuation).
In a "words" table store each word with an ID, if it's not already in the table. (Here you might wish to ignore common words like 'the' or 'for'.)
In an "indexedwords" table map the IDs of the words you just inserted to the post ID of the comment (or article if that is what you want to return).
When searching, split the search term into words and find all posts that contain each of the words. (Again, you might want to ignore common words here.)
Order the results by number of occurrences. If the results must contain all the words, you'd need to find the intersection of your different arrays of posts.
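The algorithm above can be sketched in PHP with arrays standing in for the "words" and "indexedwords" tables. The class name and stop-word list are hypothetical; `search()` returns only posts that contain every search word (the intersection), ordered by total occurrences.

```php
<?php
/**
 * In-memory sketch of the "words" and "indexedwords" tables.
 * $words: word => word ID; $indexed: word ID => [post ID => occurrences].
 */
final class CommentIndex
{
    private const STOP_WORDS = ['the', 'for', 'a', 'an', 'and', 'of', 'to'];
    private $words = [];
    private $indexed = [];

    // Lowercase, split on anything that is not a letter or digit, drop common words.
    private static function split(string $text): array
    {
        $tokens = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
        return array_values(array_diff($tokens, self::STOP_WORDS));
    }

    public function add(int $postId, string $comment): void
    {
        foreach (self::split($comment) as $word) {
            if (!isset($this->words[$word])) {
                $this->words[$word] = count($this->words) + 1; // new word ID
            }
            $id = $this->words[$word];
            $this->indexed[$id][$postId] = ($this->indexed[$id][$postId] ?? 0) + 1;
        }
    }

    /** Post IDs containing all search words, most total occurrences first. */
    public function search(string $terms): array
    {
        $result = null;
        foreach (array_unique(self::split($terms)) as $word) {
            $posts = isset($this->words[$word]) ? $this->indexed[$this->words[$word]] : [];
            if ($result === null) {
                $result = $posts;
                continue;
            }
            // Intersection: keep posts that also contain this word, summing counts.
            $common = array_intersect_key($result, $posts);
            foreach ($common as $postId => $n) {
                $common[$postId] = $n + $posts[$postId];
            }
            $result = $common;
        }
        if (!$result) {
            return [];
        }
        arsort($result);
        return array_keys($result);
    }
}
```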
As an entry point, you can use MySQL LIKE queries.
For example if you have a table 'comments' with a column named 'comment', and you want to find all comments that contain the word 'red', use:
SELECT comment FROM comments WHERE comment LIKE '% red %';
Please note that scanning the full text with LIKE can be slow (a pattern that starts with % cannot use an index), so if your database is very large or if you run this query a lot, you will want to find an optimized solution, such as Sphinx (http://sphinxsearch.com).
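To see why `'% red %'` is only an entry point: it requires a space on both sides of the word, so it misses the word at the start or end of a comment or next to punctuation. The PHP sketch below mimics that pattern and compares it with a word-boundary match; both helper names are hypothetical. In MySQL, REGEXP with word boundaries or a FULLTEXT index would be the corresponding alternatives.

```php
<?php
/**
 * Mimics: comment LIKE '% word %' (case-insensitive, as with a _ci collation).
 * The pattern only matches when " word " appears with a space on each side.
 */
function likeSpacedWord(string $comment, string $word): bool
{
    return stripos($comment, ' ' . $word . ' ') !== false;
}

/** Whole-word match using regex word boundaries instead of literal spaces. */
function containsWord(string $comment, string $word): bool
{
    return preg_match('/\b' . preg_quote($word, '/') . '\b/i', $comment) === 1;
}
```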

Finding strings similar to a given one by keywords, where each keyword has its own 'power'

This question is a challenge for me; my friend can't tell me how to do it, even though he is a really good programmer (I think).
Users can add sentences to the database. When a user adds a sentence, it is saved in the sentences table.
Next, the sentence is split into words, and the soundex of each word is saved in the tags table along with the id of the sentence.
Finally, the soundex of each word is put into the weights table; if the same soundex is already there, the function adds 1 to its counter.
(For those who don't know: soundex is a function that returns a phonetic representation of a string, i.e. the way it sounds.)
Structure of the database:
The sentences table contains two columns: id and sentence.
The tags table contains id (which is the id of a sentence) and tag (which is one word from the sentence).
The tag isn't really the plain word, but the soundex of that word.
The last table, weights, contains tag and weight (a number telling how many such tags there are in the tags table).
My question is: how can I write a function that returns sentences similar to a given string?
It should use tags (the soundex of each word), and each tag should have its own power based on the weights table.
Tags that are used often are more important than rarer tags. Can this be done in a single MySQL query?
Next question: I think this way of looking for similar sentences is good, but what about the speed of this function?
I need to use it very often on my site.
Well instead of having a weights table, why don't you have a table that relates tags to sentences? So have a table called sentence_tags with a sentence_id and a tag_id column. Then you can compute the weights by doing a join on those two tables, and still reference back to the sentence that contains the tag. You may as well store both the tag and the soundex in the tags table, while you're at it.
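A sketch of the weighted search in PHP, following this design: tags are the soundex codes of the words, and a tag's weight is derived at run time from the sentence/tag relation (what `SELECT tag_id, COUNT(*) FROM sentence_tags GROUP BY tag_id` would return) instead of being stored. As the question asks, tags that occur more often weigh more; if rarer tags should count more, use `1 / $weights[$tag]` instead. The function names are hypothetical.

```php
<?php
/** Unique soundex tags of the words in a sentence. */
function soundexTags(string $sentence): array
{
    $words = preg_split('/[^a-z]+/i', $sentence, -1, PREG_SPLIT_NO_EMPTY);
    return array_values(array_unique(array_map('soundex', $words)));
}

/**
 * Return the IDs of sentences sharing tags with $input, best first.
 * Score = sum of the weights of the shared tags, where a tag's weight is
 * the number of sentences carrying it (computed here, not stored).
 */
function similarSentences(string $input, array $sentences, int $limit = 5): array
{
    // The sentence_tags relation: sentence id => list of tags.
    $sentenceTags = array_map('soundexTags', $sentences);

    // Weights derived from the relation, like a GROUP BY tag count.
    $weights = array_count_values(array_merge(...array_values($sentenceTags)));

    $inputTags = soundexTags($input);
    $scores = [];
    foreach ($sentenceTags as $id => $tags) {
        $score = 0;
        foreach (array_intersect($tags, $inputTags) as $tag) {
            $score += $weights[$tag];
        }
        if ($score > 0) {
            $scores[$id] = $score;
        }
    }
    arsort($scores);
    return array_slice(array_keys($scores), 0, $limit);
}
```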
Perhaps the Levenshtein distance is what you are looking for. It counts the minimum number of single-character edits (insertions, deletions or substitutions) needed to turn one word into another.
Be aware that this is a costly operation.
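PHP has a built-in `levenshtein()` function. The sketch below (hypothetical helper name and distance threshold) keeps the candidates within a maximum distance, closest first. Each comparison costs roughly the product of the two string lengths, which is why comparing against every row is expensive; narrowing the candidates first, for example by soundex, helps.

```php
<?php
/**
 * Return the candidates within $maxDistance edits of $word, closest first
 * (ties in alphabetical order), using PHP's built-in levenshtein().
 */
function closestWords(string $word, array $candidates, int $maxDistance = 2): array
{
    $hits = [];
    foreach ($candidates as $candidate) {
        $d = levenshtein(strtolower($word), strtolower($candidate));
        if ($d <= $maxDistance) {
            $hits[] = [$d, $candidate];
        }
    }
    // Compare [distance, word] pairs: smaller distance first, then alphabetical.
    usort($hits, fn (array $x, array $y) => $x <=> $y);
    return array_column($hits, 1);
}
```

For example, `closestWords('rye', ['rye', 'rae', 'ryes', 'catcher'])` returns `['rye', 'rae', 'ryes']`.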
Joe K's suggestion seems spot on for good database design.
Do not store information that can be extrapolated.
Meaning, use the join statement and PHP to calculate the weight at run-time.
I understand this may not be the right solution for your design, but a few moments spent on smart database structure design will often make everything work much better.
