Dynamic mysql search when Fulltext is not a viable solution - php

Many articles point to Fulltext indexing as a simple solution for MySQL searches. That may well be the case under the right circumstances, but I've yet to see a solution that comes close to Fulltext when Fulltext cannot be used (for instance, across tables). The solution I'm looking for would preferably match the search words anywhere in the text, in any order.
So searching for James Woods or for Woods James would both return the row containing the text James Woods. Basic search methods can't handle this kind of mixed word order.
The likely answers are replacing Fulltext with REGEXP or LIKE, then replacing the whitespace in the search term with | or %. James Woods might become James|Woods, so a row containing James or Woods, in any order, is returned. Or it might become '%James%Woods%', which is less useful because the words must appear in that order, but it still returns matches that aren't necessarily exact.
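The whitespace substitution described above could be sketched as follows. This is Python purely for illustration (the question itself uses PHP); the helper names are hypothetical, and the sketch assumes search words consist only of ASCII letters and digits, so no REGEXP or LIKE metacharacters need escaping.

```python
import re


def split_words(term):
    """Split a search term into its alphanumeric words."""
    return re.findall(r"[A-Za-z0-9]+", term)


def to_regexp_pattern(term):
    """'James Woods' -> 'James|Woods' (a row matches if it contains either word)."""
    return "|".join(split_words(term))


def to_like_pattern(term):
    """'James Woods' -> '%James%Woods%' (the words must appear in this order)."""
    return "%" + "%".join(split_words(term)) + "%"


print(to_regexp_pattern("James Woods"))
print(to_like_pattern("James Woods"))
```

Note the difference in behavior: the REGEXP form accepts rows that contain only one of the words, while the LIKE form rejects Woods James when the stored text is James Woods.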
Example SQL
SELECT * FROM people
LEFT JOIN
(SELECT people_id, GROUP_CONCAT(other_data) AS people_data
FROM people_details -- hypothetical table name; the original omitted the FROM clause
GROUP BY people_id)
AS t2 ON (t2.people_id = people.id)
WHERE CONCAT_WS(' ', people.firstname, people.lastname, people_data) LIKE {$query}
Is this really the best way? Are there any tricks to making this method (or another method) work more efficiently? I'm really looking for a mysql solution, so if your answer is to use another DB service, well, so be it and I'll accept that as an answer, but the real question is the best solution for mysql. Thanks.

In MySQL, have you tried combining the MATCH() and AGAINST() functions?
I think they will give you the result you are looking for.
For example, given the following set of data:
mysql> select * from temp;
+----+---------------------------------------------+
| id | string                                      |
+----+---------------------------------------------+
|  1 | James Wood is the matyr.                    |
|  2 | Wood James is the saviour.                  |
|  3 | James thames are rhyming words.             |
|  4 | Wood is a natural product.                  |
|  5 | Don't you worry child - Swedish House Mafia |
+----+---------------------------------------------+
5 rows in set (0.00 sec)
This query returns the following results if you require either james or wood to be present:
mysql> select string from temp where match (string) against ('james wood' in boolean mode);
+---------------------------------+
| string                          |
+---------------------------------+
| James Wood is the matyr.        |
| Wood James is the saviour.      |
| James thames are rhyming words. |
| Wood is a natural product.      |
+---------------------------------+
4 rows in set (0.00 sec)
If you require that both James and Wood be present, then this query works. Note the '+' sign before each word (see Boolean mode in the manual).
mysql> select string from temp where match (string) against ('+james +wood' in boolean mode);
+----------------------------+
| string                     |
+----------------------------+
| James Wood is the matyr.   |
| Wood James is the saviour. |
+----------------------------+
2 rows in set (0.00 sec)
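Building such a boolean-mode string from user input might look like the sketch below. It is written in Python for illustration; `boolean_mode_query` is a hypothetical helper, and the `%s` placeholder assumes a MySQL driver that uses that parameter style. Non-alphanumeric characters are dropped so that user input cannot smuggle in boolean operators.

```python
import re


def boolean_mode_query(term, require_all=True):
    """Turn free text into a search string for AGAINST(... IN BOOLEAN MODE).

    With require_all=True every word gets a leading '+', so all words must
    be present; otherwise any one of the words is enough.
    """
    words = re.findall(r"[A-Za-z0-9]+", term.lower())
    prefix = "+" if require_all else ""
    return " ".join(prefix + word for word in words)


# The search string is passed as a bound parameter, never interpolated.
sql = "SELECT string FROM temp WHERE MATCH (string) AGAINST (%s IN BOOLEAN MODE)"
params = (boolean_mode_query("James Wood"),)
print(sql, params)
```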
Finding a word with any suffix (a prefix search) works in a similar way:
mysql> select string from temp where match (string) against ('Jame*' in boolean mode);
+---------------------------------+
| string                          |
+---------------------------------+
| James Wood is the matyr.        |
| Wood James is the saviour.      |
| James thames are rhyming words. |
+---------------------------------+
3 rows in set (0.02 sec)
But note that a leading wildcard is not supported in MySQL fulltext searches, so you cannot match on the middle or the end of a word:
mysql> select string from temp where match (string) against ('*ame*' in boolean mode);
Empty set (0.00 sec)
I hope this helps.
This reply is very late, but the question was interesting enough for me to answer.
To learn more, see http://dev.mysql.com/doc/refman/5.5/en//fulltext-search.html

I'm a bit late to the party - so apologies for that.
You mentioned that you cannot use fulltext functionality because you're using joins - well, although that is kind of the case, there is a popular way to get around this.
Consider the following for using fulltext search with joins:
SELECT
article.articleID,
article.title,
topic.title
FROM articles AS article
-- the fulltext search runs in its own subquery
INNER JOIN (
SELECT articleID
FROM articles
WHERE MATCH (title, keywords) AGAINST ("cat" IN BOOLEAN MODE)
ORDER BY postDate DESC
LIMIT 0, 30
) AS ftResults ON article.articleID = ftResults.articleID
-- the remaining joins stay as they were
LEFT JOIN topics AS topic ON article.topicID = topic.topicID
GROUP BY article.articleID
ORDER BY article.postDate DESC
Notice how I managed to keep my topics join intact by running my fulltext search in another query and joining/matching the results by ID.
If you're not on shared hosting, also consider using Sphinx or Lucene Solr alongside MySQL for extra fast fulltext searches. I've used Sphinx and highly recommend it.

Related

Get Matching MySQL Rows from List?

I've got a list of ID numbers, (ex: 100230, 123890, 342098...). I've also got a table, with one column devoted to ID numbers:
thisID | name | dateBirth | State
----------------------------------
192465 | Fred | 94-12-06 | OR
197586 | Alex | 78-04-26 | NM
197586 | Alex | 78-04-26 | CA
178546 | Sam | 65-12-01 | NY
112354 | Katy | 89-06-22 | CO
...
I need to return any rows with 'thisID' that matches any of the items in the list I've got. Also, note that sometimes there may be multiple rows with the same ID that match an item in the list... in that case, all matching records should be returned.
I've looked around, and seen some recommendations to use arrays or temporary tables or something, but nothing definitive. How should I do this?
You can use the IN sql syntax for this, if I understand you correctly.
SELECT * FROM tablename
WHERE thisID IN (100230, 123890, 342098);
You can do like this:
select * from [table_name] where thisID in ([your IDs]);
It will return all the rows that match the given IDs.
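When the ID list comes from application code, the IN list is usually built with one placeholder per ID rather than by pasting values into the SQL. The sketch below shows the idea in Python, using the standard-library sqlite3 module as a stand-in for MySQL; the table name `people` and the helper name are assumptions.

```python
import sqlite3


def rows_matching_ids(conn, ids):
    """Return every row whose thisID is in ids, duplicates included."""
    if not ids:
        return []  # "IN ()" is a syntax error, so short-circuit
    placeholders = ", ".join("?" for _ in ids)
    sql = f"SELECT thisID, name, State FROM people WHERE thisID IN ({placeholders})"
    return conn.execute(sql, list(ids)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (thisID INTEGER, name TEXT, dateBirth TEXT, State TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        (192465, "Fred", "94-12-06", "OR"),
        (197586, "Alex", "78-04-26", "NM"),
        (197586, "Alex", "78-04-26", "CA"),
        (178546, "Sam", "65-12-01", "NY"),
        (112354, "Katy", "89-06-22", "CO"),
    ],
)

# 100230 is in the list but not in the table, so it contributes no rows.
matches = rows_matching_ids(conn, [197586, 112354, 100230])
print(matches)
```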
See the SQLFiddle Demo

Most efficient JOIN query - MySQL

Below is a gross oversimplification of 2 very large tables I'm working with.
campaign table
| id | uid | name | contact | pin | icon |
| 1 | 7 | bob | ted | y6w | yuy |
| 2 | 7 | ned | joe | y6e | ygy |
| 3 | 6 | sam | jon | y6t | ouy |
records table
| id | uid | cid | fname | lname | address | city | phone |
| 1 | 7 | 1 | lars | jack | 13 main | lkjh | 55555 |
| 2 | 7 | 1 | rars | jock | 10 maun | oyjh | 55595 |
| 2 | 7 | 1 | ssrs | frck | 10 eaun | oyrh | 88595 |
The page loops through the records table and prints the results to an HTML table. The existing code, for some reason, does a separate query for each record: "select name from campaign where id = $res['cid']". I'd like to get rid of the second query and do some kind of join, but what is the most effective way to do it?
I need to
SELECT * FROM records
and also
SELECT name FROM campaigns WHERE campaigns.id = records.cid
in a single query.
How can I do this efficiently?
Simply join the two tables. You already have the required WHERE condition. Select all columns from one but only one column from the other. Like this:
SELECT records.*, campaigns.name
FROM records, campaigns
WHERE campaigns.id = records.cid
Note that a record row without matching campaign will get lost. To avoid that, rephrase your query like this:
SELECT records.*, campaigns.name
FROM records LEFT JOIN campaigns
ON campaigns.id = records.cid
Now you'll get NULL names instead of missing rows.
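The difference between the two join forms can be seen on a tiny data set. This is a sketch in Python using the standard-library sqlite3 module as a stand-in for MySQL; the sample rows, including a record whose cid points at a non-existent campaign, are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE campaigns (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE records (id INTEGER PRIMARY KEY, cid INTEGER, fname TEXT);
    INSERT INTO campaigns VALUES (1, 'bob'), (2, 'ned');
    -- record 3 references campaign 9, which does not exist
    INSERT INTO records VALUES (1, 1, 'lars'), (2, 1, 'rars'), (3, 9, 'ssrs');
""")

# Inner join: the record without a matching campaign is dropped.
inner_rows = conn.execute(
    "SELECT records.fname, campaigns.name FROM records "
    "INNER JOIN campaigns ON campaigns.id = records.cid "
    "ORDER BY records.id"
).fetchall()

# Left join: every record is kept; the missing campaign name becomes NULL (None).
left_rows = conn.execute(
    "SELECT records.fname, campaigns.name FROM records "
    "LEFT JOIN campaigns ON campaigns.id = records.cid "
    "ORDER BY records.id"
).fetchall()

print(inner_rows)
print(left_rows)
```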
The "most efficient" part is where the answer becomes very tricky. Generally a great way to do this would be to simply write a query with a join on the two tables and happily skip away singing songs about kittens. However, it really depends on a lot more factors. How big are the tables, and are they indexed nicely on the right columns for the query? When the query runs, how many records are generated? Are the results being ordered in the query?
This is where it starts being a little more art than science. Have a look at the explain plan, understand what is happening, and look for ways to make it more efficient or simpler. Sometimes running two subqueries in the FROM clause that each generate only a subset of the data is much more efficient than trying to join the entire tables and select the data you need from there.
Answering this question in more detail, while hoping to be accurate for your particular case, would need a LOT more information.
If I were to guess at some of these things in your database, I would suggest using a simple join if your tables are less than a few million rows and your database performance is decent. If you are re-running the EXACT query multiple times, even a slow query can be cached by MySQL VERY nicely (on versions that still have the query cache; it was removed in MySQL 8.0), so look at that as well. I have an application running on a terribly specc'ed machine, where I wrote a cron job that simply runs a few queries with new data that is loaded overnight, and all my users think the queries are instant because I make sure that they are cached. Sometimes it is the little tricks that really pay off.
Lastly, if you are just starting out with SQL, or aren't yet as familiar with it as you might eventually get, you might want to read this Q&A that I wrote, which covers a lot of basic to intermediate topics on queries, such as joins, subqueries, aggregate queries and a lot more stuff that is worth knowing.
You can use this query
SELECT records.*, campaigns.name
FROM records, campaigns
WHERE campaigns.id = records.cid
But it's much better to use INNER JOIN (the explicit join syntax introduced with the ANSI SQL-92 standard) because it's more readable and you can easily replace INNER with LEFT or other types of join.
SELECT records.*, campaigns.name
FROM records INNER JOIN campaigns
ON campaigns.id = records.cid
More explanation here:
SQL Inner Join. ON condition vs WHERE clause
INNER JOIN ON vs WHERE clause
SELECT *
FROM records
LEFT JOIN campaigns
on records.cid = campaigns.id;
Using a left join instead of inner join guarantees that you will still list every records entry.

Pull records in order of most unique occurrences of dynamic array?

I am trying to pull the most relevant jobs based on a user defined keywords list on my website. So as a user, if I specify the following keywords:
builder
bricks
concrete
I want to work out how to search all jobs in the database that have at least one of these, but order them by the jobs that contains all three of these words.
My database table is as follows -
job_id INT
job_title VARCHAR
job_description TEXT
So I want to check job_description field and if it finds all 3 of these keywords it orders this at the top, then those jobs where 2 of the 3 are in job_description somewhere then 1.
Horrible hack, but with some client-side processing of the source array, you can dynamically build a query that looks like:
SELECT
(LOCATE('red', your_text_field) > 0) +
(LOCATE('green', your_text_field) > 0) +
-- etc...
(LOCATE('purple', your_text_field) > 0) AS color_count
FROM ...
ORDER BY color_count DESC
LOCATE returns the position of the match, or 0 if a particular color doesn't exist, so each > 0 comparison contributes 1 to the sum for a color that is present and 0 otherwise.
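Generating that kind of query from the keyword array might look like the following sketch. It is Python for illustration only; the table name `jobs`, the helper name, and the `%s` placeholder style (as used by common MySQL drivers) are assumptions. Each LOCATE is wrapped in `> 0` so that every keyword counts once, regardless of where it appears.

```python
def build_ranked_search(keywords):
    """Return (sql, params) ranking jobs by how many keywords they contain."""
    if not keywords:
        raise ValueError("at least one keyword is required")
    # (LOCATE(...) > 0) evaluates to 1 or 0 in MySQL, so the sum is a match count.
    terms = " + ".join("(LOCATE(%s, job_description) > 0)" for _ in keywords)
    sql = (
        f"SELECT job_id, job_title, {terms} AS match_count "
        "FROM jobs "
        "HAVING match_count > 0 "
        "ORDER BY match_count DESC"
    )
    return sql, list(keywords)


sql, params = build_ranked_search(["builder", "bricks", "concrete"])
print(sql)
print(params)
```

The HAVING clause drops jobs that contain none of the keywords, so jobs with all three come first, then two, then one.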
I would use a full text search for the first part of your problem. The second part, ranking on unique occurrences, is a bit harder.
Example:
SELECT SQL_CALC_FOUND_ROWS
something_tbl.*,
MATCH(something_tbl.field_1, something_tbl.field_2)
AGAINST (:keywords) AS score
FROM something_tbl
WHERE MATCH(something_tbl.field_1, something_tbl.field_2)
AGAINST (:keywords IN BOOLEAN MODE)
ORDER BY score DESC
Without more details on what your table structure looks like, this is only vaguely answerable.
However, consider using a fulltext index if your data is based on string-like datatypes.
Basic example, where your_field is in a fulltext index:
+----+-------------------------------------------------------------+
| id | your_field                                                  |
+----+-------------------------------------------------------------+
|  1 | red                                                         |
|  2 | green red                                                   |
|  3 | black red                                                   |
|  4 | yellow red green blue orange                                |
|  5 | black blue                                                  |
+----+-------------------------------------------------------------+
And now the SQL:
SELECT *,
MATCH (your_field)
AGAINST ('+yellow +red +green +blue +orange' IN BOOLEAN MODE) AS 'val'
FROM yourtable
WHERE MATCH (your_field)
AGAINST ('+yellow +red +green +blue +orange' IN BOOLEAN MODE)
ORDER BY val DESC;
More information can be found here in the manual.

Searching a database with forgiveness.

I have a database of 30k elements, each one a game name.
I'm currently using:
SELECT * from gamelist WHERE name LIKE '%$search%' OR aliases LIKE '%$search%' LIMIT 10
for the searchbar.
However, it can get really picky about things like:
'Animal Crossing: Wild World'
See, many users won't know that ':' is in that phrase, so when they start searching:
Animal Crossing Wild World
It won't show up.
How can I have the sql be more forgiving?
Replace the non alphanumeric characters in the search parameter with wildcards so Animal Crossing Wild World becomes %Animal%Crossing%Wild%World% and filter on that.
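A minimal sketch of that replacement, in Python for illustration, with the standard-library sqlite3 module standing in for MySQL. The helper name is hypothetical and the word pattern only covers ASCII letters and digits. Because every other character is replaced, any % or _ typed by the user is also neutralized.

```python
import re
import sqlite3


def forgiving_like_pattern(search):
    """'Animal Crossing Wild World' -> '%Animal%Crossing%Wild%World%'."""
    words = re.findall(r"[A-Za-z0-9]+", search)
    return "%" + "%".join(words) + "%"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gamelist (name TEXT)")
conn.executemany(
    "INSERT INTO gamelist VALUES (?)",
    [("Animal Crossing: Wild World",), ("Wild Arms",)],
)

pattern = forgiving_like_pattern("Animal Crossing Wild World")
found = conn.execute(
    "SELECT name FROM gamelist WHERE name LIKE ? LIMIT 10", (pattern,)
).fetchall()
print(pattern, found)
```

The words still have to appear in the order typed, so World Wild Animal Crossing would not match with this approach.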
I would suggest you make two more tables, one which contains the keywords and one which links them to games, like
+---------+------------+
| game_id | keyword_id |
+---------+------------+
|       1 |          1 |
|       1 |          2 |
|       1 |          3 |
|       1 |          4 |
+---------+------------+

+------------+--------------+
| keyword_id | keyword_name |
+------------+--------------+
|          1 | animal       |
|          2 | crossing     |
|          3 | wild         |
|          4 | world        |
+------------+--------------+
After that you can easily explode the user-given text into keywords and search for them in the database, which will give you the IDs of the possible games he/she was looking for.
Oh, and remove special symbols, like ":" or "-", so you don't need multiple keywords for the same phrase.
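The keyword-table idea could be sketched like this, in Python with the standard-library sqlite3 module standing in for MySQL. The table name `game_keywords` and the function names are assumptions, and `INSERT OR IGNORE` is SQLite syntax (MySQL spells it `INSERT IGNORE`).

```python
import re
import sqlite3


def keywords_of(text):
    """Lower-case the text and split it into words, dropping symbols like ':' or '-'."""
    return re.findall(r"[a-z0-9]+", text.lower())


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE keywords (keyword_id INTEGER PRIMARY KEY, keyword_name TEXT UNIQUE);
    CREATE TABLE game_keywords (game_id INTEGER, keyword_id INTEGER);
""")


def index_game(game_id, title):
    """Store one (game_id, keyword_id) row per distinct word in the title."""
    for word in set(keywords_of(title)):
        conn.execute("INSERT OR IGNORE INTO keywords (keyword_name) VALUES (?)", (word,))
        (keyword_id,) = conn.execute(
            "SELECT keyword_id FROM keywords WHERE keyword_name = ?", (word,)
        ).fetchone()
        conn.execute("INSERT INTO game_keywords VALUES (?, ?)", (game_id, keyword_id))


def search_games(text):
    """Return ids of games whose titles contain every keyword in the search text."""
    words = sorted(set(keywords_of(text)))
    if not words:
        return []
    marks = ", ".join("?" for _ in words)
    sql = (
        "SELECT gk.game_id FROM game_keywords gk "
        "JOIN keywords k ON k.keyword_id = gk.keyword_id "
        f"WHERE k.keyword_name IN ({marks}) "
        "GROUP BY gk.game_id HAVING COUNT(DISTINCT k.keyword_id) = ? "
        "ORDER BY gk.game_id"
    )
    return [row[0] for row in conn.execute(sql, words + [len(words)])]


index_game(1, "Animal Crossing: Wild World")
index_game(2, "Wild Arms")
print(search_games("animal crossing wild world"))
```

Unlike a single LIKE pattern, this lookup does not care about word order, and each keyword comparison can use an index.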
The following is from MySQL LIKE %string% not quite forgiving enough. Anything else I can use? by the user M_M:
If you're using MyISAM, you can use full text indexing. See this tutorial
If you're using a different storage engine, you could use a third party full text engine like sphinx, which can act as a storage engine for mysql or a separate server that can be queried.
With MySQL full text indexing, a search on A J Kelly would match AJ Kelly (not to confuse matters, but A, J and AJ would be ignored as they are too short by default, and it would match on Kelly). Generally Fulltext is much more forgiving (and usually faster than LIKE '%string%') because it allows partial matches which can then be ranked on relevance.
You can also use SOUNDEX to make searches more forgiving by indexing the phonetic equivalents of words and search them by applying SOUNDEX on your search terms and then using those to search the index. With soundex mary, marie, and marry will all match, for example.
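To illustrate why those three names match, here is a small Python sketch of the classic four-character American Soundex code. It is not MySQL's implementation: MySQL's SOUNDEX() can return longer strings and may differ in edge cases, so codes stored in the database should be produced by the same function that is used at search time.

```python
# Map each consonant to its Soundex digit; vowels, H, W and Y have no code.
CODES = {}
for letters, digit in (
    ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
    ("L", "4"), ("MN", "5"), ("R", "6"),
):
    for letter in letters:
        CODES[letter] = digit


def soundex(word):
    """Return the classic 4-character Soundex code, e.g. 'Robert' -> 'R163'."""
    letters = [c for c in word.upper() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    previous = CODES.get(first, "")
    for letter in letters[1:]:
        if letter in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(letter, "")
        if code and code != previous:
            digits.append(code)
        previous = code  # a vowel resets this, so a repeated code counts again
    return (first + "".join(digits) + "000")[:4]


for name in ("mary", "marie", "marry"):
    print(name, soundex(name))
```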
You can try Match () AGAINST () if your MySQL engine is MyISAM or InnoDB:
http://dev.mysql.com/doc/refman/5.6/en/fulltext-search.html
http://dev.mysql.com/doc/refman/5.6/en/fulltext-boolean.html
Your resulting SQL will be like this:
SELECT * from gamelist WHERE MATCH (name, aliases) AGAINST ('$search' IN BOOLEAN MODE) LIMIT 10
The behavior of the search is more like the boolean search used in search engines.

(My)SQL Query to search for multiple values on multiple tables (some rows, some columns)

I am creating a search using MySQL & PHP on an existing table structure.
Multiple search keywords can be entered and the user can opt to match either ALL or ANY. The ANY form is not too difficult, but I am racking my brain trying to write an efficient solution for the AND form.
The following is about the AND form, so all the search keywords must be found.
The 2 tables I have to work with (search in) have a structure as follows:
Table1
- item_id (non-unique)
- text
Table2
- item_id (unique)
- text_a
- text_b
- text_c
(The real solution will also have a 3rd table, but that is structured the same way as Table1. Table2 will have around 20 searchable columns.)
Table1 can have multiple rows for each item_id with different text.
Consider having only 2 search keywords (there can be more in real life); then both must exist:
- both in a single row/column
or:
- in 2 different columns of maybe different tables.
or:
- in 2 different rows with the same item_id (in case of both keywords found in different rows of Table1)
All I could come up with are very intensive sub-queries, but those would bring the server down or make the response times huge.
As I am using PHP, I could use intermediate queries and store the results for use in a later final query.
Does anyone have some good suggestions?
Edit: There were requests for real examples, so here goes.
Consider the following 2 tables with data:
Table 1
+---------+-----------+-----------+-----------+-----------+
| item_id | t1_text_a | t1_text_b | t1_text_c | t1_text_d |
+---------+-----------+-----------+-----------+-----------+
|       1 | aaa bbb   | NULL      | ccc       | ddd       |
|       2 | aaa ccc   | ddd       | fff       | ggg       |
|       3 | bbb       | NULL      | NULL      | NULL      |
+---------+-----------+-----------+-----------+-----------+
Table2
+---------+----------+---------+
| item_id | sequence | t2_text |
+---------+----------+---------+
|       1 |        1 | kkk lll |
|       2 |        1 | kkk     |
|       2 |        2 | lll     |
|       3 |        1 | mmm     |
+---------+----------+---------+
PS: In the real database (which I cannot change, so full text indexes or changes to the table definitions are not an option) Table1 has about 20 searchable columns and there are 2 tables like Table2. This should not make a difference to the solution, although it is something to consider from a performance perspective.
Example searches:
Keywords: aaa bbb
Should return:
- item_id=1. Both keywords are found in column t1_text_a.
Keywords: ccc ddd
Should return:
- item_id=1. "ccc" is found in t1_text_c, "ddd" is found in t1_text_d.
- item_id=2. "ccc" is found in t1_text_a, "ddd" is found in t1_text_b.
Keywords: kkk lll
Should return:
- item_id=1. Both keywords found in a single row of Table2 in column t2_text.
- item_id=2. Both keywords found in Table2, but in separate rows with the same item_id.
Keywords: bbb mmm
Should return:
- item_id=3. "bbb" is found in table1.t1_text_a, "mmm" is found in table2.t2_text.
My progress so far
For now, I have actually given up on trying to do this mostly in SQL.
What I did instead is create a query for each table that retrieves any row matching at least 1 of the search keywords. If there is only 1 search keyword the query uses a LIKE, otherwise a REGEXP 'keyword1|keyword2'.
These rows are put in a PHP array with the item_id as the index and a concatenation of all the strings (searchable columns) as the value. When all possible rows have been retrieved, I search the array for entries that match all keywords in the concatenated field.
This is most likely not the best solution, and it will not scale very well if the search returns many candidate rows with at least 1 match.
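The post-filtering step described above can be sketched as follows. This is Python rather than the asker's PHP, the function name is hypothetical, and the candidate rows are hard-coded from the example tables instead of being fetched by the per-table queries.

```python
def match_all_keywords(rows, keywords):
    """Return the item_ids whose combined text contains every keyword.

    rows is an iterable of (item_id, text) candidates gathered from any
    table or column; text may be None for NULL columns.
    """
    combined = {}
    for item_id, text in rows:
        if text:
            combined[item_id] = combined.get(item_id, "") + " " + text
    wanted = [keyword.lower() for keyword in keywords]
    return sorted(
        item_id
        for item_id, blob in combined.items()
        if all(keyword in blob.lower() for keyword in wanted)
    )


# Every searchable value from the two example tables, one (item_id, text) pair each.
candidates = [
    (1, "aaa bbb"), (1, None), (1, "ccc"), (1, "ddd"),
    (2, "aaa ccc"), (2, "ddd"), (2, "fff"), (2, "ggg"),
    (3, "bbb"), (3, None), (3, None), (3, None),
    (1, "kkk lll"), (2, "kkk"), (2, "lll"), (3, "mmm"),
]

print(match_all_keywords(candidates, ["ccc", "ddd"]))
```

Because the text of all rows and columns is merged per item_id before the check, keywords found in different rows or different tables still count toward the same item.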
It's hard to provide you with a definitive answer since you do not give a lot of details about your case.
But maybe this can give you a starting point:
SELECT * FROM table1 AS tbl1
INNER JOIN table2 AS tbl2 ON tbl1.item_id = tbl2.item_id
WHERE
tbl1.text LIKE '%search_word1%'
AND tbl1.text LIKE '%search_word2%'
AND tbl2.text_a LIKE '%search_word1%'
AND tbl2.text_a LIKE '%search_word2%'
AND tbl2.text_b LIKE '%search_word1%'
AND tbl2.text_b LIKE '%search_word2%'
AND tbl2.text_c LIKE '%search_word1%'
AND tbl2.text_c LIKE '%search_word2%'
You can adapt this with JOIN, INNER JOIN, LEFT JOIN, RIGHT JOIN and different combinations of LIKE and AND/OR conditions to obtain the result you're looking for.
Google some join examples with LIKE statements for more details.
But as Tom H. said, it'd be better if you could post a more precise table structure and a real example of search terms...
