Suppose I have a user table. One column of the table stores the user's first name.
Also suppose there are three rows in the table. The users' first names are as follows:
'Suman','Sumon','Papiya'.
Now I want a MySQL query such that if a user searches the table by first name with 'Suman', the result shows two rows: one for 'Suman' and another for 'Sumon'.
You can use SOUNDEX(). It checks whether the values in firstname sound like the word you provide.
According to the docs:
When using SOUNDEX(), you should be aware of the following
limitations:
This function, as currently implemented, is intended to work well with strings that are in the English language only. Strings in other
languages may not produce reliable results.
This function is not guaranteed to provide consistent results with strings that use multi-byte character sets, including utf-8.
select *
from t
where soundex(firstname)=soundex('Suman')
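To see why the query above matches both 'Suman' and 'Sumon', here is a minimal Python sketch of the common four-character American Soundex. This is an illustration only: the helper name `soundex` is mine, and MySQL's own SOUNDEX() is a different variant (it can return longer codes and treats vowels differently), so the exact codes may not match what MySQL prints.

```python
def soundex(name: str) -> str:
    """Standard American Soundex (NOT MySQL's exact variant)."""
    groups = (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
              ("L", "4"), ("MN", "5"), ("R", "6"))
    codes = {ch: digit for letters, digit in groups for ch in letters}

    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        digit = codes.get(ch, "")
        # Emit a digit only when it differs from the previous code.
        if digit and digit != prev:
            result += digit
        # H and W do not separate equal codes; vowels do.
        if ch not in "HW":
            prev = digit
    # Pad with zeros and cut to one letter plus three digits.
    return (result + "000")[:4]


names = ["Suman", "Sumon", "Papiya"]
matches = [n for n in names if soundex(n) == soundex("Suman")]
```

With the three sample names, `matches` ends up holding 'Suman' and 'Sumon', because the two names differ only in a vowel and vowels carry no code.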
I have a database with 150,000 records, and I need to match the FULL value of a column against parts of a string.
I want to check whether the STRING contains the COLUMN value, NOT whether the COLUMN contains the string.
(Example below)
For testing purposes, let's say the database has one TABLE with one COLUMN and one record, and the records look like this:
come to me
and I need to match it against this #STRING:
She wants to come to me now
I want to execute something similar to this (but it doesn't work, of course):
SELECT * from TABLE where #STRING like '%'+COLUMN+'%'
I can't seem to solve this with SQL. Using PHP is possible, but I'd prefer a SQL solution. If a PHP solution is available, please propose it, and note that the database has over 150,000 records.
SELECT * from TABLE where LOCATE(COLUMN, #STRING)>0;
LOCATE(a,b) returns the 1-based position of the first occurrence of a in b, or 0 if a does not occur in b.
See MySQL LOCATE()
The docs note that LOCATE() is only case sensitive when at least one of the strings is a 'binary string'. That probably doesn't affect your use case, though if it became an issue you could CONVERT() the binary strings to a nonbinary character set and use LOWER() to get lower case.
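If it helps to see the direction of the test (the STRING contains the COLUMN value, not the other way round), here is a small Python sketch that mimics LOCATE(). The `locate` helper and the second row 'go away' are made up for illustration, and lowercasing both sides is only an approximation of a case-insensitive collation.

```python
def locate(substr: str, text: str) -> int:
    """Mimic LOCATE(substr, text): 1-based position, or 0 if absent."""
    # str.find returns -1 when nothing is found, so adding 1 gives 0.
    return text.lower().find(substr.lower()) + 1


# Hypothetical column values; only 'come to me' is from the question.
rows = ["come to me", "go away"]
search = "She wants to come to me now"

# Keep the rows whose whole value appears somewhere inside the search string.
matches = [value for value in rows if locate(value, search) > 0]
```

Here `matches` contains only 'come to me', which is the same row the LOCATE(COLUMN, #STRING) > 0 condition would keep.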
MySQL String Functions
The dynamic LIKE syntax for MySQL is:
SELECT * from TABLE where #STRING like CONCAT('%',COLUMN,'%')
I have a database with a field 'clinicNo', and that field contains values like 1234A, 2343B, 9999Z ......
If by mistake I use '1234B' instead of '1234A' in the select statement, I want to get a result set containing the clinicNos that differ from the given string (i.e. 1234B above) by only one character.
Eg. Field may contain following values.
1234A, 1235B, 5433A, 4444S, 2978C
If I use '1235A' for the select query, it should give 1234A and 1235B as the result.
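You could use SUBSTRING to compare only the leading part of the column. Note that positions in SUBSTRING start at 1, not 0. The example below returns every clinicNo that starts with '1235', whatever the final character is:
select * from TableName WHERE SUBSTRING(clinicNo, 1, 4) = '1235'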
What you're looking for is called the Levenshtein Distance algorithm. While there is a levenshtein function in PHP, you really want to do this in MySQL.
There are two ways to implement a Levenshtein function in MySQL. The first is to create a stored function, which works much like a stored procedure except that it has distinct inputs and a single return value. This is fine for small datasets, but a little slow on anything approaching several thousand rows. You can find more info here: http://kristiannissen.wordpress.com/2010/07/08/mysql-levenshtein/
The second method is to implement a User Defined Function in C/C++ and link it into MySQL as a shared library (*.so file). The function is then registered in MySQL (with CREATE FUNCTION ... SONAME) and called like any other function, which means the actual query for this or the first method may be identical (provided the inputs to both functions are the same). You can find out more about this method here: http://samjlevy.com/2011/03/mysql-levenshtein-and-damerau-levenshtein-udfs/
With either of these methods, your query would be something like:
SELECT clinicNo FROM words WHERE levenshtein(clinicNo, '1234A') < 2;
It's important to remember that the threshold should scale with the length of the original word. It's better to think of it as a percentage: for example, a 50% threshold on the four-letter word 'term' allows a distance of 2. In your case, you would probably look for a distance < 2 (i.e. at most a 1 character difference), but you could go further to account for additional errors.
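For anyone who wants to check the threshold against the sample data before installing anything in MySQL, here is a minimal Python sketch of the classic dynamic-programming Levenshtein distance. The function and variable names are mine and this is not the code behind either link; it only shows what a `< 2` filter would return for the clinicNo values in the question.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    # previous[j] is the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


clinic_nos = ["1234A", "1235B", "5433A", "4444S", "2978C"]

# Same idea as: WHERE levenshtein(clinicNo, '1235A') < 2
close = [c for c in clinic_nos if levenshtein(c, "1235A") < 2]
```

For the search term '1235A' this leaves 1234A and 1235B in `close`, which is the result the question asks for.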
Also see: Wikipedia: Levenshtein Distance.
SELECT * FROM TABLE
WHERE ClinicNo LIKE CONCAT(LEFT('1235A', 4), '%')
In general development, you could use a function like Levenshtein to measure the difference between two strings; it returns a number that tells you how far apart they are. You probably then want the result with the smallest distance, i.e. the most similar one.
To get Levenshtein also in MySQL, read this post.
Or just get all results and use the Levenshtein function of PHP.
In the interest of good relational database design:
There are currently two columns in the DB: "GroupName" and "WebGroupName". The second column is used for simple URL access to a profile, e.g. www.example.com/myWebGroupName. The reason for this is that it avoids passing spaces in the URL; for example, www.example.com/my Web Group Name would not work.
To re-iterate the DB structure; column 1 would store "My Group Name" and column two would store "MyGroupName".
Possible solutions may err on the side of storing the group name without spaces, then using some regular expression to add the spaces back. The focus of my question is how to eliminate the need for two columns storing near-duplicate data.
Thank you for your time
Assuming that you really have a problem with spaces in URLs (as Larry Lustig pointed out, it isn't necessarily a problem), it isn't bad relational database design to have two columns that often hold very similar information.
The kind of repetition that is to be avoided (normalized) deals with repetition across multiple rows. If you have two columns which are meant to contain different, but related information, then these two columns are perfectly OK and you aren't breaking any rules. The fact that sometimes these two columns are equal (coincidentally) is not a problem.
You said:
Possible solutions may err on the side of storing the group name
without spaces then using some regular expression to add the spaces
back. The focus of my question is how to eliminate the need for two
columns storing near-duplicate data.
From this I assume that what is most important to your system is the web group name. If the group name were the driver then writing an expression that removes spaces would be trivial. If the web group name is something that can be set arbitrarily based on the group name, then you should store the name with spaces and replace them with empty strings when you need a web group name. If the web group name is not completely arbitrary then you really do have two independent data points and they need to be stored in two separate columns.
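As a rough sketch of the "store the name with spaces and derive the web name" option, assuming the web group name is nothing more than the group name with its spaces removed (the function name is hypothetical):

```python
def web_group_name(group_name: str) -> str:
    """Derive the URL-friendly name by dropping the spaces."""
    return group_name.replace(" ", "")


derived = web_group_name("My Group Name")
```

Going the other way is not reliable: once the spaces are gone, a name like "MyGroupName" can be split back on capital letters, but a name such as "My ABC Group" cannot be recovered, which is why the name with spaces should be the one that is stored.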
I have two separate tables that contain parts of a user's name (don't ask why)...
t1
---------------
firstName
lastName
t2
---------------
middleName
stage_firstName
stage_middleName
stage_lastName
So before I output the name, I run it through a function that capitalizes the first letter of each name and uses the stage name if one is provided.
It works OK, but I now have a case where I need to display multiple names. My question is: should I store the properly formatted name in MySQL when the user first enters it, or keep the values in multiple tables and keep using the function to format them? I think I could gain some performance by reading a single value from one table, even if that means adding a column, rather than keeping the fields in two separate tables and then passing each name through this huge function.
Am I wrong with these assumptions?
And if, at some point, you need to extract the name and display/use it in a different format, you will need to then perform some kind of translation on the already formatted string.
You could write the formatting into the query though.
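For reference, the kind of formatting described in the question can stay quite small when it is kept apart from storage, so the raw parts remain available for other formats. This is only a guess at the rules: the function name is hypothetical, and falling back to the real name field by field when a stage name part is missing is my assumption, not something the question states.

```python
from typing import Optional


def display_name(first: Optional[str], middle: Optional[str], last: Optional[str],
                 stage_first: Optional[str] = None,
                 stage_middle: Optional[str] = None,
                 stage_last: Optional[str] = None) -> str:
    """Build a display name, preferring stage name parts when provided."""
    # Assumption: each stage field overrides the matching real-name field.
    parts = [stage_first or first, stage_middle or middle, stage_last or last]
    # Capitalize only the first letter and leave the rest untouched,
    # so names like 'mcDonald' are not flattened to 'Mcdonald'.
    return " ".join(p[:1].upper() + p[1:] for p in parts if p)


plain = display_name("john", None, "smith")
staged = display_name("john", "q", "smith", stage_first="johnny")
```

The same fallback could be expressed in a query with COALESCE over the joined tables, but the exact column handling would depend on how empty stage names are stored.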
I have a table full of data, where one column is full of different entries for each row, where the data is formatted like this:
A:some text|B:some other text|C:some more text|
I want to separate those strings of text into two columns on a new table. So the new table should have one column for A, B, C etc. and the other column will have the rest of the text in their respective rows.
And there is another value (a DATETIME value) in a separate column of the first table that I would like to copy into a third column for each of the separated entries.
Let me know if this needs clarification. I know it's kind of confusing, and I'm pretty fuzzy with MySQL. Thanks!
MySQL supports SUBSTRING. Together with LOCATE, you could probably whip up something nice based on the pipe symbol you seem to use as a separator.
http://dev.mysql.com/doc/refman/5.1/en/string-functions.html#function_locate
http://dev.mysql.com/doc/refman/5.1/en/string-functions.html#function_substring
In most cases I prefer to write "converters" in another language rather than perform the conversion directly in the database. However, in this situation it looks like there isn't that much data, so it 'might' work fine.
I think you would be better off writing a simple script in VBScript, PHP or any other scripting language of your choice. All scripting languages provide string manipulation and date formatting functions. Database queries won't allow you to handle the "unexpected".
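A script along those lines could look like the Python sketch below. It assumes every entry has the form `key:text` and ends with a pipe, as in the sample from the question; the function name and the example timestamp are made up, and reading from and writing to the database is left out.

```python
from datetime import datetime
from typing import List, Tuple


def split_entries(packed: str, stamp: datetime) -> List[Tuple[str, str, datetime]]:
    """Turn 'A:text|B:text|' into (key, text, stamp) rows."""
    rows = []
    for entry in packed.split("|"):
        if not entry:
            continue  # the trailing pipe leaves an empty piece
        key, sep, text = entry.partition(":")
        if not sep:
            continue  # the "unexpected": an entry without a colon is skipped
        rows.append((key, text, stamp))
    return rows


# Hypothetical DATETIME value taken from the same source row.
stamp = datetime(2012, 1, 31, 12, 0, 0)
rows = split_entries("A:some text|B:some other text|C:some more text|", stamp)
```

Each tuple in `rows` maps to one row of the new table: the letter, the rest of the text, and the copied DATETIME. Because `partition` splits on the first colon only, any later colons stay inside the text.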