I have a requirement to add a search feature to a site I'm building and was wondering if anyone has done something similar.
I have a sample table that contains the details of cats in this format:
Name, place, type, age, gender and size.
And I only have one search box where users can enter their search terms. My question is, how do I search the table if, for example, someone types in "cat in Paris"?
I want to be able to search all the fields and return something if a match is found.
Is there any way to achieve this rather than having lots of boxes for them to select a search criteria? Any help or suggestion would be appreciated.
One of the simpler approaches that works very well in this situation is to do a full-text search in MySQL. You can have it index all of the text columns and do a natural language search.
If you had a mysql table called cats with the following schema:
mysql> desc cats;
+--------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(100) | YES | MUL | NULL | |
| place | varchar(100) | YES | | NULL | |
| type | varchar(100) | YES | | NULL | |
| age | int(11) | YES | | NULL | |
| gender | varchar(100) | YES | | NULL | |
| size | varchar(100) | YES | | NULL | |
+--------+--------------+------+-----+---------+----------------+
You can run the following SQL to create the index:
CREATE FULLTEXT INDEX cats_search ON cats (name, type, place, gender);
Then when you get the search string 'male tabby in paris' you can search the table with this SQL:
SELECT *
, MATCH(name, type, place, gender)
AGAINST ('male tabby in paris' IN BOOLEAN MODE) relevance
FROM cats
WHERE MATCH(name, type, place, gender)
AGAINST ('male tabby in paris' IN BOOLEAN MODE)
ORDER BY relevance DESC;
This will return all of the rows that match those terms, in the order MySQL decides is most relevant.
You will have to research MySQL full-text searches to fine-tune the results the way you want, but this should get you off the ground.
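If you want every word the user types to be required rather than optional, boolean mode lets you prefix each term with +. Here is a minimal sketch in Python of turning the raw search box text into such an expression. The function name to_boolean_query and the min_len cutoff of 3 are my own assumptions for illustration; the cutoff should be matched to your server's minimum word length setting.

```python
import re

def to_boolean_query(user_input, min_len=3):
    """Turn free text into a BOOLEAN MODE expression that requires every term."""
    # Keep only letters and digits so the input cannot smuggle in
    # boolean operators such as + - * " ( )
    words = re.findall(r"[A-Za-z0-9]+", user_input.lower())
    # Drop very short words (assumed cutoff; adjust to the server's setting)
    terms = [w for w in words if len(w) >= min_len]
    return " ".join("+" + w for w in terms)

query = to_boolean_query("male tabby in Paris")
print(query)  # +male +tabby +paris
```

The resulting string is meant to be passed as a bound parameter, i.e. AGAINST (? IN BOOLEAN MODE), rather than concatenated into the SQL.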
I'm creating an application using PHP (Codeigniter/MySQL) and within the application are organisations.
Each organisation can have multiple locations, regions, departments, etc (I'm calling these areas)
Each area has an administrator, and sometimes I will need to escalate things to a higher area.
I've currently got all the data in 1 table, and I am using a parent_area_id and area_level to determine the parents,children etc.
But I think this is very inefficient, and I've been pointed towards closure loops, which I have no knowledge of.
Here is the database table. Is this OK, will it be efficient, or is there a better way to do it?
+----------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+-------------+------+-----+---------+----------------+
| area_id | int(12) | NO | PRI | NULL | auto_increment |
| area_title | varchar(40) | NO | | NULL | |
| area_name | varchar(40) | NO | | NULL | |
| address1 | varchar(40) | YES | | NULL | |
| address2 | varchar(40) | YES | | NULL | |
| address3 | varchar(40) | YES | | NULL | |
| town | varchar(20) | YES | | NULL | |
| county | varchar(20) | YES | | NULL | |
| post_code | varchar(10) | YES | | NULL | |
| has_ra | varchar(1) | YES | | 0 | |
| org_id | int(12) | NO | MUL | NULL | |
| parent_area_id | int(8) | YES | | NULL | |
| area_level | int(1) | YES | | NULL | |
+----------------+-------------+------+-----+---------+----------------+
EDIT:
(better explanation of how this is being used)
1) Areas relate to customers of the business only.
2) The areas are different area(region,location,department) that a customer might have. (South region, Oxford Office, Accounts Dept).
3) Each area may have many employees allocated.
SO
If I had a regional administrator for example, they might have the following areas under them: e.g:
South Region
    Oxford Office
        Sales Department
        Accounts Department
    London Office
        Marketing
        Planning
SO
If I wanted to get the user_ids of all employees under the regional administrator, using the above database structure, I would need to:
1) Query the db to get all area_id's that have a parent_area_id of the regional administrator.
2) Loop through each returned area_id, and query the db and get all area_id's that have a parent_area_id of the returned area_id
3) Continue looping through returned area_id's until we get to the bottom level
4) Query the db to get all user_id's that have an area_id of all above returned records
SO
That doesn't seem very efficient, and needs multiple SQL queries and programming loops to get a list of users associated with a regional manager.
If that's the most efficient way to do it then fine, but I'm not convinced, and I'm sure there must be an easier way.
There's no serious problem here if you're escalating one level at a time. I've got no idea how "closure loops" would factor in here; that's programming related, not a database schema concern, and is largely a matter of personal preference.
So long as you don't violate the Zero, One or Infinity Rule of design, you should be okay. Your multiple address fields skirt the line; they might be better represented as a single field that accepts multiple lines of text, but that is also how a lot of databases traditionally represent arbitrary street addresses.
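For the case where you do need everything under an area at once, the looping described in the question can be collapsed into one recursive query, provided the database supports recursive common table expressions (MySQL 8.0 and later does). Below is a rough sketch using Python's sqlite3 as a stand-in so it runs anywhere. The users table, the sample rows and the users_under helper are assumptions made up for the example; only area_id, parent_area_id and user_id come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE areas (area_id INTEGER PRIMARY KEY, area_name TEXT, parent_area_id INTEGER);
CREATE TABLE users (user_id INTEGER PRIMARY KEY, area_id INTEGER);
INSERT INTO areas VALUES
  (1, 'South Region', NULL),
  (2, 'Oxford Office', 1),
  (3, 'Sales Department', 2),
  (4, 'Accounts Department', 2),
  (5, 'London Office', 1),
  (6, 'Marketing', 5),
  (7, 'Planning', 5),
  (8, 'North Region', NULL);
INSERT INTO users VALUES (101, 3), (102, 4), (103, 6), (104, 8), (105, 1);
""")

SUBTREE_USERS = """
WITH RECURSIVE subtree(area_id) AS (
    SELECT area_id FROM areas WHERE area_id = ?      -- start at the given area
    UNION ALL
    SELECT a.area_id                                 -- then add each child level
    FROM areas a JOIN subtree s ON a.parent_area_id = s.area_id
)
SELECT u.user_id
FROM users u JOIN subtree s ON u.area_id = s.area_id
ORDER BY u.user_id
"""

def users_under(conn, area_id):
    """Return user_ids assigned to the area or to any area beneath it."""
    return [row[0] for row in conn.execute(SUBTREE_USERS, (area_id,))]

print(users_under(conn, 1))  # [101, 102, 103, 105]
print(users_under(conn, 5))  # [103]
```

This keeps the existing parent_area_id column and needs no area_level bookkeeping; an index on parent_area_id would likely help on larger tables.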
I'm trying to build a Facebook-style messaging system (conversations).
This is the conversation table.
DESCRIBE conversation;
+----------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+-------------+------+-----+---------+----------------+
| c_id | int(11) | NO | PRI | NULL | auto_increment |
| user_one | int(11) | NO | | NULL | |
| user_two | int(11) | NO | | NULL | |
| ip | varchar(30) | NO | | NULL | |
| time | int(11) | NO | | NULL | |
+----------+-------------+------+-----+---------+----------------+
Now before the user can read a conversation, I need to check if the conversation (c_id) exists, and if the user is the owner of the given conversation id. What is the best possible way to write this query?
Example of what I have, which is not working:
$cid = intval($_GET['cid']);
$conv = $this->db->fetchRow('SELECT c_id FROM `conversation` WHERE
user_one=? OR
user_two=? AND
c_id=?',
array($this->user->id, $this->user->id, $cid));
if ($conv) {
// get the conversation replies etc..
}
I see a couple of problems.
One is that you seem to have overlooked that AND has a higher precedence than OR. So the logic of your condition works as if you had written it this way:
WHERE user_one=? OR (user_two=? AND c_id=?)
Whereas I would guess that you intended the logic to work this way:
WHERE (user_one=? OR user_two=?) AND c_id=?
But if that's how you intended it to work, I wonder why you need to search for the user id's at all, since the condition on c_id=? will select only one row (or zero rows if there's no match), because it's searching for one specific primary key value.
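To see the precedence difference in action, here is a small sketch using Python's sqlite3 (SQLite applies the same AND-before-OR precedence as MySQL). The sample rows and the find helper are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE conversation (c_id INTEGER PRIMARY KEY, user_one INTEGER, user_two INTEGER)"
)
conn.executemany(
    "INSERT INTO conversation VALUES (?, ?, ?)",
    [(1, 10, 20), (2, 10, 30)],
)

# Without parentheses this parses as: user_one=? OR (user_two=? AND c_id=?)
LOOSE = "SELECT c_id FROM conversation WHERE user_one=? OR user_two=? AND c_id=? ORDER BY c_id"
# With parentheses the c_id condition applies to both user columns
STRICT = "SELECT c_id FROM conversation WHERE (user_one=? OR user_two=?) AND c_id=? ORDER BY c_id"

def find(sql, user_id, c_id):
    """Run one of the queries above for the given user and conversation id."""
    return [row[0] for row in conn.execute(sql, (user_id, user_id, c_id))]

print(find(LOOSE, 10, 99))   # [1, 2]  conversation 99 does not exist, yet rows come back
print(find(STRICT, 10, 99))  # []
print(find(STRICT, 20, 1))   # [1]     user 20 takes part in conversation 1
print(find(STRICT, 20, 2))   # []      user 20 is not part of conversation 2
```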
I'm using generic Sphinx with Python (though I tested this against PHP as well and got the same problem). I have a table with several fields I want to be able to search against in Sphinx, but it seems like only some of the fields get indexed.
Here's my source (dbconfig just has the connection information):
source bill_src : dbconfig
{
sql_query = \
SELECT id,title,official_title,summary,state,chamber,UNIX_TIMESTAMP(last_action) AS bill_date FROM bill
sql_attr_timestamp = bill_date
sql_query_info = SELECT * FROM bill WHERE id=$id
}
Here's the index
index bills
{
source = bill_src
path = /var/data/bills
docinfo = extern
charset_type = sbcs
}
I'm trying to use extended match mode. It seems that title and summary are fine but the official_title, the state and the chamber fields are ignored in the index. So for example if I do:
@official_title Affordable Care Act
I get:
query error: no field 'official_title' found in schema
but the same query with @summary produces results. Any ideas what I'm missing?
EDIT
Here's the table I'm trying to index:
+--------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| bt50_id | int(11) | YES | MUL | NULL | |
| type | varchar(10) | YES | | NULL | |
| title | varchar(255) | YES | | NULL | |
| official_title | text | YES | | NULL | |
| summary | text | YES | | NULL | |
| congresscritter_id | int(11) | NO | MUL | NULL | |
| last_action | datetime | YES | | NULL | |
| sunlight_id | varchar(45) | YES | | NULL | |
| number | int(11) | YES | | NULL | |
| state | char(2) | YES | | NULL | |
| chamber | varchar(45) | YES | | NULL | |
| session | varchar(45) | YES | | NULL | |
| featured | tinyint(1) | YES | | 0 | |
| source_url | varchar(255) | YES | | | |
+--------------------+--------------+------+-----+---------+----------------+
I seem to have fixed the problem, though I'll admit this is all dumb luck so it might not be a root cause:
First, I thought maybe it didn't like the order of the fields in the query. I had the only attribute field last, so I decided to move it to just after the ID:
SELECT id, UNIX_TIMESTAMP(last_action) AS bill_date, \
title,official_title,summary,state,chamber FROM bill
This did not fix the problem.
Secondly, I noticed all the example date fields are converted using UNIX_TIMESTAMP and then aliased to the same name, so instead of UNIX_TIMESTAMP(last_action) AS bill_date I changed it to UNIX_TIMESTAMP(last_action) AS last_action ... the first attempt tripped me up though because it still wasn't working.
Finally I dropped the date altogether and added each field successfully (re-indexing and testing each time). Each time it worked and finally I added the date field on the end and I was able to sort by it and search all the fields. So the final query is:
SELECT \
id,title,official_title,summary,state,chamber, \
UNIX_TIMESTAMP(last_action) AS last_action FROM bill
It seems that attribute fields must come after the full text fields and aliases must be the same name as the actual field name. I find it strange that the date field seemed fine but other fields suddenly disappeared (randomly!).
I hope this helps someone else though I feel it might be some kind of isolated bug that doesn't affect many people. (This is on OSX and sphinx was compiled by hand)
I'm a little rusty on Sphinx, but I believe your source { } block needs a sql_field_string definition.
source bill_src : dbconfig
{
sql_query = \
SELECT \
id,title,official_title,summary,state,chamber, \
UNIX_TIMESTAMP(last_action) AS bill_date \
FROM bill
sql_attr_timestamp = bill_date
sql_field_string = official_title
sql_query_info = SELECT * FROM bill WHERE id=$id
}
According to http://sphinxsearch.com/docs/1.10/conf-sql-field-string.html the sql_field_string declaration will index and store the string for referencing. That's different from a sql_attr_string, which is stored but not indexed.
We are implementing a system that analyses books. The system is written in PHP, and for each book loops through the words and analyses each of them, setting certain flags (that translate to database fields) from various regular expressions and other tests.
This results in a matches table, similar to the example below:
+------------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------------------+--------------+------+-----+---------+----------------+
| id | bigint(20) | NO | PRI | NULL | auto_increment |
| regex | varchar(250) | YES | | NULL | |
| description | varchar(250) | NO | | NULL | |
| phonic_description | varchar(255) | NO | | NULL | |
| is_high_frequency | tinyint(1) | NO | | NULL | |
| is_readable | tinyint(1) | NO | | NULL | |
| book_id | bigint(20) | YES | | NULL | |
| matched_regex | varchar(255) | YES | | NULL | |
| [...] | | | | | |
+------------------------+--------------+------+-----+---------+----------------+
Most of the omitted fields are tinyint, either 0 or 1. There are currently 25 fields in the matches table.
There are ~2,000,000 rows in the matches table, the output of analyzing ~500 books.
Currently, there is a "reports" area of the site which queries the matches table like this:
SELECT COUNT(*)
FROM matches
WHERE is_readable = 1
AND other_flag = 0
AND another_flag = 1
However, at present it takes over a minute to fetch the main index report as each query takes about 0.7 seconds. I am caching this at a query level, but it still takes too long for the initial page load.
As I am not very experienced in how to manage datasets such as this, can anyone advise me of a better way to store or query this data? Are there any optimisations I can use with MySQL to improve the performance of these COUNTs, or am I better off using another database or data structure?
We are currently using MySQL with MyISAM tables and a VPS for this, so switching to a new database system altogether isn't out of the question.
You need to use indexes. Create them on the columns you filter on most frequently in WHERE clauses.
ALTER TABLE `matches` ADD INDEX ( `is_readable` )
etc..
You can also create indexes on multiple columns, which is useful if you're running the same type of query over and over. phpMyAdmin has the index option at the bottom of the table's structure page.
Add a multi-column index to this table, since you are filtering on more than one field. The index below should help a lot. This type of index works well for boolean / int columns. For indexes on varchar values, read more here: http://dev.mysql.com/doc/refman/5.0/en/create-index.html
ALTER TABLE `matches` ADD INDEX ( `is_readable`, `other_flag`, `another_flag` )
One more thing: check your queries with EXPLAIN {YOUR WHOLE SQL STATEMENT} to see which index the database uses. In this example you would run:
EXPLAIN SELECT COUNT(*) FROM matches WHERE is_readable = 1 AND other_flag = 0 AND another_flag = 1
More info on EXPLAIN: http://dev.mysql.com/doc/refman/5.0/en/explain.html
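As a rough illustration of the idea, the sketch below builds a small matches table in SQLite through Python's sqlite3, adds the multi-column index, and asks the engine which access path it picks. SQLite's EXPLAIN QUERY PLAN is only an analogue of MySQL's EXPLAIN, and the index name idx_matches_flags and the sample rows are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE matches ("
    "id INTEGER PRIMARY KEY, is_readable INTEGER, other_flag INTEGER, another_flag INTEGER)"
)
conn.executemany(
    "INSERT INTO matches (is_readable, other_flag, another_flag) VALUES (?, ?, ?)",
    [(1, 0, 1), (1, 0, 1), (0, 0, 1), (1, 1, 1)],
)
# One index covering all three flag columns used in the WHERE clause
conn.execute(
    "CREATE INDEX idx_matches_flags ON matches (is_readable, other_flag, another_flag)"
)

sql = (
    "SELECT COUNT(*) FROM matches "
    "WHERE is_readable = 1 AND other_flag = 0 AND another_flag = 1"
)

count = conn.execute(sql).fetchone()[0]
# The last column of each plan row holds the human-readable description
plan = " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

print(count)
print(plan)
```

If the index name shows up in the plan, the count is being answered from the index instead of a full table scan, which is the same thing you would look for in the key column of MySQL's EXPLAIN output.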
I'm facing the following situation.
We've got a CMS with an entity with translations. These translations are stored in a different table with a one-to-many relationship, for example newsarticles and newsarticle_translations. The number of available languages is determined dynamically by the same CMS.
When entering a new newsarticle, the editor is required to enter at least one translation; which of the available languages he chooses is up to him.
In the newsarticle overview in our CMS we would like to show a column with the (translated) article title, but since no particular language is mandatory (one of them is, but I don't know which one), I don't really know how to construct my MySQL query to select a title for each newsarticle, regardless of the entered language.
And to make it all a little harder, our manager asked for the possibility to sort on title as well, so fetching the translations in a separate query is ruled out as far as I know.
Does anyone have an idea on how to solve this in the most efficient way?
Here are my table schemas, in case they help:
> desc news;
+-----------------+----------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+----------------+------+-----+-------------------+----------------+
| id | int(10) | NO | PRI | NULL | auto_increment |
| category_id | int(1) | YES | | NULL | |
| created | timestamp | NO | | CURRENT_TIMESTAMP | |
| user_id | int(10) | YES | | NULL | |
+-----------------+----------------+------+-----+-------------------+----------------+
> desc news_translations;
+-----------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| enabled | tinyint(1) | NO | | 0 | |
| news_id | int(1) unsigned | NO | | NULL | |
| title | varchar(255) | NO | | | |
| summary | text | YES | | NULL | |
| body | text | NO | | NULL | |
| language | varchar(2) | NO | | NULL | |
+-----------------+------------------+------+-----+---------+----------------+
PS: I've thought about subqueries and COALESCE() solutions, but those seem rather dirty tricks. I'm wondering if there is something better that I'm not thinking of.
This is not a fast approach, but I think it gives you what you want.
Let me know how it works, and we can work on speed next :)
select nt.title
from news n
join news_translations nt on(n.id = nt.news_id)
where nt.title is not null
and nt.language = (
select max(x.language)
from news_translations x
where x.title is not null
and x.news_id = nt.news_id)
order
by nt.title;
Assuming I've read your problem aright, you want to get a list of titles for articles, preferring the "required" language? A query for that might go along the lines of ...
SELECT * FROM (
SELECT nt.`title`, nt.news_id
FROM news n
INNER JOIN news_translations nt ON (n.id = nt.news_id)
WHERE title != ''
ORDER BY
CASE
WHEN nt.language = 'en' THEN 3
WHEN nt.language = 'jp' THEN 2
WHEN nt.language = 'de' THEN 1
ELSE 0 END DESC
) AS t1
GROUP BY `news_id`
This example prefers a title in English (en) if available, Japanese (jp) as a second preference, and German (de) as a third, but will display the first 'other' entry if none of the requested languages are available.
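One caveat: picking the "first" row per group with GROUP BY depends on MySQL's handling of non-aggregated columns, which does not guarantee which row is chosen, and the query is rejected when ONLY_FULL_GROUP_BY is enabled. A variant that makes the choice explicit is a correlated subquery with ORDER BY ... LIMIT 1. The sketch below runs it through Python's sqlite3 as a stand-in; the sample rows and the trimmed-down table definitions are assumptions for the example, and the language priorities are the same ones used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE news (id INTEGER PRIMARY KEY);
CREATE TABLE news_translations (
    id INTEGER PRIMARY KEY, news_id INTEGER, title TEXT, language TEXT);
INSERT INTO news VALUES (1), (2), (3);
INSERT INTO news_translations (news_id, title, language) VALUES
  (1, 'Zebra',    'en'),
  (1, 'Zebra DE', 'de'),
  (2, 'Abeille',  'fr'),
  (3, 'Mitte',    'de'),
  (3, 'Naka',     'jp');
""")

PREFERRED_TITLES = """
SELECT n.id,
       (SELECT nt.title
        FROM news_translations nt
        WHERE nt.news_id = n.id AND nt.title <> ''
        ORDER BY CASE nt.language            -- higher number = more preferred
                   WHEN 'en' THEN 3
                   WHEN 'jp' THEN 2
                   WHEN 'de' THEN 1
                   ELSE 0
                 END DESC,
                 nt.language                 -- tie-breaker so the pick is stable
        LIMIT 1) AS title
FROM news n
ORDER BY title
"""

rows = conn.execute(PREFERRED_TITLES).fetchall()
print(rows)  # [(2, 'Abeille'), (3, 'Naka'), (1, 'Zebra')]
```

Each article gets exactly one title, and the outer ORDER BY gives the sort on title that the manager asked for.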