My database contains a list of phone numbers stored in a VARCHAR column. A phone number may be in any of these formats:
12323232323
1-232-323 2323
232-323-2323
2323232323
Instead of the - symbol, there may be (, ), a comma, a period, or a space.
If I search for 12323232323, 1-232-323 2323, 232-323-2323, or 2323232323, it should display all of these results. I need to write a query for this.
I don't think it is efficient to do this in real time. I propose two options:
Clean the data, so there is only one format.
Add another column that contains the cleaned data. When you search, search this column; when you display, you can show the original, variously formatted data.
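A minimal sketch of the second option, assuming North American numbers where a leading 1 is the country code (so 12323232323 and 2323232323 count as the same number). It uses Python's built-in sqlite3 so it runs standalone; the table and column names (phones, phone_clean) are hypothetical, and in MySQL the cleaned column would be an indexed VARCHAR filled in by the application or a migration.

```python
import re
import sqlite3


def normalize_phone(raw: str) -> str:
    """Keep only digits; drop a leading 1 from 11-digit numbers (assumed country code)."""
    digits = re.sub(r"\D", "", raw)  # removes -, (, ), commas, periods and spaces
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits


conn = sqlite3.connect(":memory:")
# phone keeps the original text for display; phone_clean is the searchable copy.
conn.execute("CREATE TABLE phones (id INTEGER PRIMARY KEY, phone TEXT, phone_clean TEXT)")
conn.execute("CREATE INDEX idx_phone_clean ON phones (phone_clean)")
raw_numbers = ["12323232323", "1-232-323 2323", "232-323-2323", "2323232323", "(555) 010.9999"]
conn.executemany(
    "INSERT INTO phones (phone, phone_clean) VALUES (?, ?)",
    [(p, normalize_phone(p)) for p in raw_numbers],
)


def search(term: str) -> list:
    """Normalize the user's input the same way, then match on the clean column."""
    rows = conn.execute(
        "SELECT phone FROM phones WHERE phone_clean = ? ORDER BY id",
        (normalize_phone(term),),
    )
    return [row[0] for row in rows]
```

Because the search input goes through the same normalize_phone() as the stored data, any of the four formats from the question finds all four rows, and the equality lookup can use the index on phone_clean.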
I agree with James, but if you really need to search the database as it is, perhaps MySQL's REPLACE() function will get you where you need to go. Something like:
select * from mytable where replace(crazynumber,'-','')='2323232323';
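One common way to strip several different characters is to nest one REPLACE() call per separator. The sketch below builds that nested expression in Python and runs it against SQLite, whose replace() behaves like MySQL's for this purpose. The separator list comes from the question; comparing only the last 10 digits (substr(x, -10) in SQLite, RIGHT(x, 10) in MySQL) is an assumption to make the leading 1 optional. This still scans every row, since no index can be used on the computed expression.

```python
import sqlite3

# Separators the question says may appear in the stored numbers.
SEPARATORS = ("-", "(", ")", ",", ".", " ")


def strip_expr(column: str, chars=SEPARATORS) -> str:
    """Wrap `column` in one REPLACE(..., 'c', '') call per separator character."""
    expr = column
    for ch in chars:
        expr = f"REPLACE({expr}, '{ch}', '')"
    return expr


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (crazynumber TEXT)")
conn.executemany(
    "INSERT INTO mytable (crazynumber) VALUES (?)",
    [("12323232323",), ("1-232-323 2323",), ("232-323-2323",),
     ("(232) 323.2323",), ("555-010-9999",)],
)

# Compare the last 10 digits so the leading 1 is optional:
# substr(x, -10) in SQLite corresponds to RIGHT(x, 10) in MySQL.
sql = ("SELECT crazynumber FROM mytable "
       f"WHERE substr({strip_expr('crazynumber')}, -10) = ? ORDER BY rowid")
matches = [row[0] for row in conn.execute(sql, ("2323232323",))]
```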
How to Replace Multiple Characters in SQL?
Can MySQL replace multiple characters?
I agree with James, but if you really need to do this, the two links above propose the perfect solutions for your scenario.
How can I use regular expressions in MySQL to rewrite a column value so that it can be matched against an exact string? I can only find guides that do the opposite.
SELECT * FROM customers WHERE REGEXP_REPLACE(phone, '[^0-9]', '') = '0123456789';
The reason is that the column can contain all kinds of formatting, e.g. "012-345 6789", "(0)12-3456789", and so on.
Please note: this is NOT a question about how the data should be stored, but about whether regexp replacements are possible. The example is only meant to illustrate the question and its nature.
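For what it's worth, MySQL 8.0 and later do provide a built-in REGEXP_REPLACE(expr, pat, repl), with the column as the first argument; earlier versions have no regex replacement function. To try such a query without a MySQL server, the sketch below registers a Python function under that name in SQLite (which has no such built-in). The sample values are the formats from the question; this is an emulation, not MySQL's implementation.

```python
import re
import sqlite3


def regexp_replace(expr, pattern, replacement):
    """Mimic the argument order of MySQL 8.0's REGEXP_REPLACE(expr, pat, repl)."""
    if expr is None:
        return None  # NULL in, NULL out, as with most SQL string functions
    return re.sub(pattern, replacement, expr)


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL can call REGEXP_REPLACE(...) with 3 arguments.
conn.create_function("REGEXP_REPLACE", 3, regexp_replace)
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, phone TEXT)")
conn.executemany(
    "INSERT INTO customers (phone) VALUES (?)",
    [("012-345 6789",), ("(0)12-3456789",), ("0123456789",), ("099-999 9999",)],
)
rows = conn.execute(
    "SELECT phone FROM customers WHERE REGEXP_REPLACE(phone, '[^0-9]', '') = ? ORDER BY id",
    ("0123456789",),
).fetchall()
matched = [r[0] for r in rows]
```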
You can improve your application with these two steps:
Write a migration that converts your data from its different formats to one canonical format.
Move the formatting of these values to your view layer.
This approach gives you:
easy searching by this field
flexibility to use different formats for this field in different views
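A minimal sketch of the view-layer side, assuming the migration has already reduced every number to a canonical 10-digit string; the style names are hypothetical examples of the per-view formats the answer mentions.

```python
def format_phone(digits: str, style: str = "dashed") -> str:
    """Render a canonical 10-digit number for display in one of several styles."""
    if len(digits) != 10 or not digits.isdigit():
        raise ValueError(f"expected 10 digits, got {digits!r}")
    area, exchange, line = digits[:3], digits[3:6], digits[6:]
    if style == "dashed":
        return f"{area}-{exchange}-{line}"
    if style == "parens":
        return f"({area}) {exchange}-{line}"
    if style == "dotted":
        return f"{area}.{exchange}.{line}"
    if style == "plain":
        return digits
    raise ValueError(f"unknown style {style!r}")
```

The stored value never changes; each view just picks a style, so searching stays a simple comparison on the canonical column.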
I have a PHP interface with a keyword search, working off a MySQL database that has a Keywords field.
The keywords field is a VARCHAR containing all the words, formatted as shown below:
the, there, theyre, their, thermal etc...
If I want to return only the exact word 'the' from the search, how would I achieve this?
I have tried using 'the%' and '%the' in the PHP, but it does not return all of the rows in which the keyword appears.
Is there a better (more accurate) way to go about this?
Thanks
If you want to select the rows whose keyword is exactly 'the':
SELECT * FROM mytable WHERE keyword='the'
If you want to select the rows that contain 'the' anywhere:
SELECT * FROM mytable WHERE keyword LIKE '%the%'
If you want to select the rows that start with 'the':
SELECT * FROM mytable WHERE keyword LIKE 'the%'
If you want to select the rows that end with 'the':
SELECT * FROM mytable WHERE keyword LIKE '%the'
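Try this:
SELECT * FROM tablename
WHERE fieldname REGEXP '[[:<:]]the[[:>:]]'
[[:<:]] and [[:>:]] are word-boundary markers. They are supported in MySQL versions before 8.0.4; MySQL 8.0.4 and later use the ICU regular expression library, which does not support them, so there you would write '\\bthe\\b' instead.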
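The same word-boundary idea can be tried in Python, where `\b` in the `re` module matches between a word character and a non-word character, so 'the' inside 'there' or 'thermal' does not match. A small sketch using the keyword strings from the question (the row ids are made up):

```python
import re


def has_keyword(keywords: str, word: str) -> bool:
    """True if `word` appears as a whole word in the comma-separated string."""
    # re.escape guards against regex metacharacters in user input.
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, keywords, flags=re.IGNORECASE) is not None


rows = {
    1: "the, there, theyre, their, thermal",
    2: "there, theyre, their, thermal",
    3: "thermal, the",
}
matching_ids = [row_id for row_id, kw in rows.items() if has_keyword(kw, "the")]
```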
MySQL Regular Expressions
If you also search for the commas, you can be sure you are getting the whole word:
where keywordField = 'the'
or keywordField like '%, the, %'
or keywordField like '%, the'
or keywordField like 'the, %'
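Here is the comma-based approach as a sketch with bound parameters rather than literal strings, run against SQLite for convenience. It assumes, as in the question's sample, that items are separated by exactly a comma and one space, and it also covers a field holding only the word itself. If the search word could contain % or _, those would need escaping, since LIKE treats them as wildcards.

```python
import sqlite3


def keyword_conditions(word: str):
    """Return a WHERE fragment and parameters matching `word` as a whole list item."""
    sql = ("(keywordField = ? OR keywordField LIKE ? "
           "OR keywordField LIKE ? OR keywordField LIKE ?)")
    # only item, first item, last item, middle item
    params = [word, word + ", %", "%, " + word, "%, " + word + ", %"]
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, keywordField TEXT)")
conn.executemany("INSERT INTO items (keywordField) VALUES (?)", [
    ("the, there, thermal",),   # 'the' is the first item
    ("there, their",),          # only longer words that start with 'the'
    ("their, the",),            # 'the' is the last item
    ("thermal, the, there",),   # 'the' is a middle item
    ("the",),                   # 'the' is the only item
])
where, params = keyword_conditions("the")
ids = [r[0] for r in conn.execute(f"SELECT id FROM items WHERE {where} ORDER BY id", params)]
```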
Maybe I didn't understand the question properly, but if you want all the rows where 'the' appears, LIKE '%the%' should work.
If the table of words is HUGE, a LIKE '%word%' search may become too slow to be practical. That can be solved in two ways:
1- Use a database with better built-in text search (not many people would choose this one, though). For example, SQL Server has a CONTAINS predicate that works better than LIKE '%word%'.
2- Use an external search tool based on an inverted index. I used Sphinx for a project and it worked quite well. This works best if you rarely UPDATE the rows you search, which should be the case here.
Sphinx, for example, builds an index file from your MySQL table and uses this file to answer searches (it's very fast). The file must be re-indexed every time you insert or update rows in the table, which makes it a much better solution if you rarely update or insert new rows.
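To illustrate what an inverted index buys you, here is a toy version in Python: each word maps to the set of rows containing it, so a lookup is a dictionary access rather than a scan of every row. This only sketches the idea behind tools like Sphinx and is not Sphinx's API; as noted above, the index has to be rebuilt (here, by calling build_inverted_index again) when rows change.

```python
from collections import defaultdict


def build_inverted_index(docs: dict) -> dict:
    """Map each lower-cased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().replace(",", " ").split():
            index[word].add(doc_id)
    return index


def search(index: dict, query: str) -> set:
    """Return ids containing every query word (AND semantics)."""
    words = query.lower().split()
    if not words:
        return set()
    # Copy the first posting set so the in-place intersection doesn't alter the index.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


docs = {1: "the, there, thermal", 2: "their, thermal", 3: "the, their"}
idx = build_inverted_index(docs)
```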
It looks like you have a one-to-many relationship going on within a column. It might be better to create a separate table for keywords, with a row for each keyword and a foreign key to whatever it is you're searching on.
Doing LIKE '%???%' is generally a bad idea because, with a leading wildcard, the database can't make use of an index, so it will scan the whole table. Whether this matters depends on the size of the data you're working with, but it's worth considering up front. The single best way to help database performance is in the initial table design, which can be tricky to change later.
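A minimal sketch of that separate keyword table, using SQLite so it runs standalone; the table names item and item_keyword are hypothetical. Each keyword becomes its own row, so an exact match is a plain equality lookup that can use the index, and 'the' no longer matches 'there'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE item_keyword (
        item_id INTEGER NOT NULL REFERENCES item(id),
        keyword TEXT NOT NULL,
        PRIMARY KEY (item_id, keyword)
    );
    -- Lets an exact keyword lookup use an index instead of scanning every row.
    CREATE INDEX idx_keyword ON item_keyword (keyword);
""")
conn.executemany("INSERT INTO item (id, name) VALUES (?, ?)",
                 [(1, "first"), (2, "second"), (3, "third")])
# Split the old comma-separated strings into one row per keyword.
old_keywords = {1: "the, there, thermal", 2: "there, their", 3: "their, the"}
conn.executemany(
    "INSERT INTO item_keyword (item_id, keyword) VALUES (?, ?)",
    [(item_id, kw.strip()) for item_id, text in old_keywords.items() for kw in text.split(",")],
)


def items_with_keyword(word: str) -> list:
    """Names of items tagged with exactly `word`."""
    rows = conn.execute(
        "SELECT i.name FROM item i JOIN item_keyword k ON k.item_id = i.id "
        "WHERE k.keyword = ? ORDER BY i.id",
        (word,),
    )
    return [r[0] for r in rows]
```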
I have a feed that comes from the State of Florida in a CSV that I need to load daily into MySQL. It is a listing of all homes for sale in my area. One field has a list of codes, separated by commas. Here's one such sample:
C02,C11,U01,U02,D02,D32,D45,D67
These codes all mean something (pool, fenced in area, etc) and I have the meanings in a separate table. My question is, how should I handle loading these? Should I put them in their own field as they are in the CSV? Should I create a separate table that holds them?
If I do leave them as they are in a field (called feature_codes), how can I get the descriptions from the table that holds them? That table simply has the columns feature_code and feature_code_description. I don't know how to break the codes apart in my query so that I can join in the descriptions.
Thank you
As a general rule, CSV data should never be stored in a field, especially if you actually need to consider individual items of the CSV data instead of just the CSV string as a whole.
You SHOULD normalize the design and split each of those sub "fields" into their own table.
That being said, MySQL does have FIND_IN_SET(), which lets you sort of search those CSV strings and treat each item as its own distinct datum. It's not particularly efficient, but it does put a band-aid on the design.
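If you do keep the codes in one field, the FIND_IN_SET() band-aid would be used as a join condition. SQLite has no FIND_IN_SET(), so this sketch registers a rough Python emulation (1-based position, 0 when absent) to show the shape of the query. The table names and the placeholder descriptions are hypothetical, since the real code meanings aren't given. In MySQL you would use the built-in function directly, and the join still has to test every code against every listing.

```python
import sqlite3


def find_in_set(needle, haystack):
    """Rough emulation of MySQL FIND_IN_SET(str, strlist): 1-based position or 0."""
    if needle is None or haystack is None:
        return None
    items = haystack.split(",") if haystack else []
    return items.index(needle) + 1 if needle in items else 0


conn = sqlite3.connect(":memory:")
conn.create_function("FIND_IN_SET", 2, find_in_set)
conn.executescript("""
    CREATE TABLE listing (house_id INTEGER PRIMARY KEY, feature_codes TEXT);
    CREATE TABLE feature (feature_code TEXT PRIMARY KEY, feature_code_description TEXT);
""")
conn.execute("INSERT INTO listing VALUES (1, 'C02,C11,U01,U02,D02,D32,D45,D67')")
conn.executemany("INSERT INTO feature VALUES (?, ?)", [
    ("C02", "placeholder C02"), ("D67", "placeholder D67"), ("X99", "placeholder X99"),
])
# Join each listing to every feature whose code appears in its CSV string.
rows = conn.execute("""
    SELECT l.house_id, f.feature_code, f.feature_code_description
    FROM listing l
    JOIN feature f ON FIND_IN_SET(f.feature_code, l.feature_codes) > 0
    ORDER BY f.feature_code
""").fetchall()
```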
You should keep the information about feature codes in a separate table, where each row pairs a house identifier with a feature identifier:
HouseID FeatureID
1 C07
1 D67
2 D02
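A sketch of the daily load into that pair table, assuming a simplified two-column extract of the feed (house_id, feature_codes); the real file's column names and the placeholder descriptions are hypothetical, and SQLite stands in for MySQL. Each comma-separated code becomes one row, after which the descriptions come from an ordinary join.

```python
import csv
import io
import sqlite3

# Hypothetical two-column extract of the daily feed; the real file has more columns.
feed = io.StringIO('house_id,feature_codes\n1,"C07,D67"\n2,"D02"\n')

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE house_feature (
        house_id INTEGER NOT NULL,
        feature_id TEXT NOT NULL,
        PRIMARY KEY (house_id, feature_id)
    );
    CREATE TABLE feature (feature_code TEXT PRIMARY KEY, feature_code_description TEXT);
""")
for record in csv.DictReader(feed):
    # One row per (house, feature) pair instead of one comma-separated string.
    codes = [c.strip() for c in record["feature_codes"].split(",") if c.strip()]
    conn.executemany(
        "INSERT INTO house_feature (house_id, feature_id) VALUES (?, ?)",
        [(int(record["house_id"]), code) for code in codes],
    )
conn.executemany("INSERT INTO feature VALUES (?, ?)",
                 [("C07", "placeholder C07"), ("D02", "placeholder D02"), ("D67", "placeholder D67")])
descriptions = conn.execute("""
    SELECT hf.house_id, f.feature_code_description
    FROM house_feature hf JOIN feature f ON f.feature_code = hf.feature_id
    ORDER BY hf.house_id, hf.feature_id
""").fetchall()
```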
You can use explode() to split your CSV string: http://php.net/manual/en/function.explode.php
$string = 'C02,C11,U01,U02,D02,D32,D45,D67';
$array = explode(',', $string);
Then, with your list of feature codes, you can easily retrieve each feature_code_description, but you need another query to get an array of all your feature_code and feature_code_description pairs.
Or split your field and put it in another table with the home_id.
You can save it in your database as is, and when you read it out you can run the PHP function explode(). Go check that function out. It builds an array from a string, separating the values by whatever delimiter you choose. In your case you can use:
$array_of_codes = explode(",", $db_return_string);
This makes an array of the codes, split on the commas between them. Good luck.
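For comparison, the application-side approach sketched in Python: str.split(',') plays the role of PHP's explode(), and a dictionary built once from the description table replaces a query per code. The lookup contents are placeholders.

```python
def describe_codes(feature_codes: str, descriptions: dict) -> list:
    """Split the stored string (like PHP explode(',', ...)) and look up each code."""
    codes = feature_codes.split(",")
    # Unknown codes are kept with None so missing descriptions are easy to spot.
    return [(code, descriptions.get(code)) for code in codes]


# Hypothetical lookup table, e.g. loaded once with a single query on the description table.
lookup = {"C02": "placeholder C02", "D67": "placeholder D67"}
result = describe_codes("C02,C11,D67", lookup)
```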
Could someone please point me in the right direction? I currently have a searchable database and have run into a problem with searching by title.
If a title begins with "The", it will obviously end up in the 'T' section. What is a good way to keep "The" from being searched? Should I concatenate two fields to display the title, but search only the second field, ignoring the prefix? Or is there another way to do this? Advice or direction would be great. Thanks.
A few choices:
a) Store the title in "Library" format, which means you process the title and store it as
Scarlet Pimpernel, The
Tale of Two Cities, A
b) Store the original unchanged title for display purposes, and add a new "library_title" field to store the processed version from a).
c) Add a new field to store the article, and store the bare title in the title field. For display, you'd concatenate the two fields; for searching, you'd just look in the title field.
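A minimal sketch of options (a) and (c), assuming English titles where the only leading articles to move are The, A, and An (a hypothetical list you would adjust). split_article() gives the two fields for option (c); library_title() builds the "Library" form for options (a) and (b).

```python
ARTICLES = ("The", "A", "An")  # assumed leading articles; adjust to your data


def split_article(title: str) -> tuple:
    """Option (c): return (article, bare_title); article is '' when there is none."""
    first, sep, rest = title.partition(" ")
    if sep and rest and first.capitalize() in ARTICLES:
        return first, rest
    return "", title


def library_title(title: str) -> str:
    """Options (a)/(b): 'The Scarlet Pimpernel' becomes 'Scarlet Pimpernel, The'."""
    article, bare = split_article(title)
    return f"{bare}, {article}" if article else bare
```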
I believe the best approach is to use full-text search, with 'the' in the stopwords list. That would solve the search problem (i.e., 'the' on search phrases would be ignored).
However, if you are ordering the results by title, a title starting with 'The' would still be sorted, "in the 'T' section", as you put it. To solve that, there are several possible approaches. Here are some of them:
Separating the fields, the way you suggested in the question
Having a separate field with the number of chars to be ignored from the beginning when sorting
Replacing initial 'The's for sorting
Among others...
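A sketch of the second approach, where each row stores how many leading characters to skip when sorting; the rows and skip counts are made up for illustration. In MySQL the same ordering could presumably be written as ORDER BY SUBSTRING(title, title_skip + 1), where title_skip is the hypothetical extra column.

```python
# Hypothetical rows: (title, number of leading characters to ignore when sorting).
books = [
    ("The Scarlet Pimpernel", 4),
    ("Treasure Island", 0),
    ("A Tale of Two Cities", 2),
    ("Moby-Dick", 0),
]


def sort_key(row):
    """Sort on the title with its leading article (if any) skipped."""
    title, skip = row
    return title[skip:].lower()


ordered = [title for title, _ in sorted(books, key=sort_key)]
```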
If you are using MySQL, you could use the REPLACE() function to remove "The" in your query; or, if you are using PHP (str_replace()), Ruby, or another language, you can simply sanitize your query before sending it to the database server.
Create three columns in the database:
1) TitlePrefix
2) Title
3) TitlePostfix
Write code so that you have four methods, such as:
searchTitleOnly(testToSearch) // search only title column
searchTitleWithPrefixAndPostfix(testToSearch)//concat all the three columns and search
searchTitlePrefix(testToSearch) // search title prefix only
searchTitlePostfix(testToSearch) // search title postfix only
Try looking into SQL functions such as TRIM, LTRIM, and RTRIM, and use them on a temporary column that holds exactly the same data. Modify that column's data with these functions, dropping whichever words you please (note that in MySQL, LTRIM and RTRIM only strip spaces, while TRIM(LEADING 'The ' FROM title) removes a leading word). Then perform the search on the modified column and return the entire row as the result!
Say I had a table of books in a MySQL database and I wanted to search the 'title' field for keywords (entered by the user in a search field); what's the best way of doing this in PHP? Is the MySQL LIKE operator the most efficient way to search?
Yes, searching in the database is usually the most efficient way. To do that you have three alternatives:
LIKE (ILIKE in PostgreSQL) to match exact substrings
RLIKE (a synonym for REGEXP in MySQL) to match regular expressions
FULLTEXT indexes, which offer three different kinds of search (natural language, boolean, and query expansion) aimed at natural-language text
So which one is best depends on what you will actually be searching for. For book titles I'd offer a LIKE search for exact substring matches, useful when people know the book they're looking for, and also a FULLTEXT search to help find titles similar to a word or phrase. I'd give them different names in the interface, of course, probably something like exact for the substring search and similar for the full-text search.
An example about fulltext: http://www.onlamp.com/pub/a/onlamp/2003/06/26/fulltext.html
Here's a simple way you can break apart some keywords to build some clauses for filtering a column on those keywords, either ANDed or ORed together.
$terms = explode(',', $_GET['keywords'] ?? '');
$clauses = array();
$params = array();
foreach ($terms as $term)
{
    //remove any chars you don't want to be searching - adjust to suit
    //your requirements
    $clean = trim(preg_replace('/[^a-z0-9]/i', '', $term));
    //compare against '' rather than using empty(), which would also
    //discard a legitimate search term of "0"
    if ($clean !== '')
    {
        //bind each term as a parameter instead of concatenating it into
        //the SQL; the old mysql_* escaping functions were removed in PHP 7
        $clauses[] = "title LIKE ?";
        $params[] = '%' . $clean . '%';
    }
}
if (!empty($clauses))
{
    //concatenate the clauses together with AND or OR, depending on
    //your requirements
    $filter = '(' . implode(' AND ', $clauses) . ')';
    //build and execute the required SQL; $pdo is assumed to be an
    //existing PDO connection
    $stmt = $pdo->prepare("SELECT * FROM foo WHERE $filter");
    $stmt->execute($params);
}
else
{
    //no search term, do something else, find everything?
}
Consider using Sphinx. It's an open-source full-text engine that can consume your MySQL database directly. It's far more scalable and flexible than hand-coding LIKE statements (and far less susceptible to SQL injection).
You may also look at the SOUNDEX() function and the SOUNDS LIKE operator in the MySQL manual: http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_soundex
They are useful for returning approximate matches when, for example, strict checking (by LIKE or =) did not return any results.
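To see how sound-alike matching can serve as a fallback, here is the classic four-character American Soundex in Python plus a search that uses it only when the strict substring match finds nothing. This is an illustration, not MySQL's implementation: MySQL's SOUNDEX() returns a longer string by default (the manual suggests SUBSTRING() to get a standard four-character code), so codes will not always be identical. The titles are made up.

```python
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word: str) -> str:
    """Classic American Soundex: first letter plus three digits, zero-padded."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        # 'h' and 'w' do not separate letters with the same code; vowels do.
        if ch not in "hw":
            prev = code
    return (result + "000")[:4]


def search_titles(titles, term):
    """Strict substring match first; fall back to sound-alike words if it finds nothing."""
    exact = [t for t in titles if term.lower() in t.lower()]
    if exact:
        return exact
    target = soundex(term)
    return [t for t in titles if any(soundex(w) == target for w in t.split())]
```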
Paul Dixon's code example gets the main idea across well for the LIKE-based approach.
I'll just add this usability idea: provide an (AND | OR) radio-button set in the interface, defaulting to AND. Then, if a user's query returns zero (0) matches and contains at least two words, respond with an option to the effect of:
"Sorry, no matches were found for your search phrase. Expand the search to match on ANY word in your phrase?"
Maybe there's a better way to word this, but the basic idea is to guide the person toward another query (one that may be successful) without the user having to think in terms of the Boolean logic of AND and OR.
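The flow can be sketched in a few lines of Python; plain substring matching stands in for LIKE '%word%', and the titles are made up. search() returns the results plus a flag telling the interface whether to offer the broader ANY-word search.

```python
TITLES = ["The Hobbit", "The Silmarillion", "Treasure Island"]


def match_all(titles, words):
    """AND: every word must appear in the title."""
    return [t for t in titles if all(w in t.lower() for w in words)]


def match_any(titles, words):
    """OR: at least one word must appear in the title."""
    return [t for t in titles if any(w in t.lower() for w in words)]


def search(titles, phrase, mode="AND"):
    """Return (results, offer_any_word_search)."""
    words = phrase.lower().split()
    results = match_all(titles, words) if mode == "AND" else match_any(titles, words)
    # Offer the broader search only when AND found nothing and there are 2+ words.
    offer_or = mode == "AND" and not results and len(words) >= 2
    return results, offer_or
```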
I think LIKE is the most efficient way for a single word. Multiple words can be split with the explode() function, as already mentioned, and then searched individually in a loop. If the same row is returned twice, you can detect it by reading the values into an array and ignoring any value that already exists there. Then, with the count() function, you'll know where to stop while printing with a loop. Sorting can be done with the similar_text() function: its percentage output is used to sort the array. That's the best approach, in my view.
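A sketch of the merge-and-rank step in Python. It deduplicates with a set instead of searching an array, and it uses difflib.SequenceMatcher.ratio() as a swapped-in stand-in for PHP's similar_text() percentage; the two algorithms differ, so scores and tie order will not match PHP's. It assumes titles are unique; in practice you would deduplicate on the row id.

```python
from difflib import SequenceMatcher


def combined_results(per_word_results, query):
    """Merge per-word result lists, drop duplicates, sort by similarity to the query."""
    seen = set()
    merged = []
    for results in per_word_results:
        for title in results:
            if title not in seen:  # skip rows already returned for another word
                seen.add(title)
                merged.append(title)
    # Highest similarity first; ratio() is 1.0 for identical strings.
    return sorted(
        merged,
        key=lambda t: SequenceMatcher(None, query.lower(), t.lower()).ratio(),
        reverse=True,
    )
```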