My MySQL table contains an entry
Foo Bar
(i.e. it may include capitals and whitespace).
I need to search the DB using the string 'foobar' and match this entry.
foobar is originally generated from the DB value using:
$i = str_replace(' ','',$i);
$i = preg_replace("/[^A-Za-z0-9]/", "", $i);
$i = strtolower($i);
It isn't easy to convert Foo Bar to foobar in MySQL prior to performing comparisons, although it is possible with a UDF. Given a UDF that behaves like PHP's preg_replace(), you need only do:
WHERE LOWER(preg_replace('/[^A-Za-z0-9]/', '', field)) = 'foobar'
The easiest option would be to use regular expression pattern matching, which isn't case sensitive by default—simply insert [^[:alnum:]]* between every character in foobar (within PHP) and then search the database:
WHERE field RLIKE '^[^[:alnum:]]*f[^[:alnum:]]*o[^[:alnum:]]*o[^[:alnum:]]*b[^[:alnum:]]*a[^[:alnum:]]*r[^[:alnum:]]*$'
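As a rough sketch of the "insert [^[:alnum:]]* between every character" step, here it is in Python rather than PHP. The function name build_rlike_pattern is made up for illustration, and it assumes the search string has already been reduced to ASCII letters and digits, so nothing in it needs regex escaping.

```python
def build_rlike_pattern(normalized: str) -> str:
    """Interleave a 'skip non-alphanumerics' class around every character."""
    # Assumption: the input is already normalized (ASCII letters/digits only),
    # so it contains no regex metacharacters and is safe to embed as-is.
    if not (normalized.isascii() and normalized.isalnum()):
        raise ValueError("expected a non-empty ASCII alphanumeric string")
    skip = "[^[:alnum:]]*"
    return "^" + skip + skip.join(normalized) + skip + "$"


print(build_rlike_pattern("foobar"))
```

The resulting string would then be passed to the query as the RLIKE operand, ideally as a bound parameter.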
I think the most elegant (though somewhat harder) solution would be to define your own collation that ignores non-alphanumeric characters and letter case, and then use that.
If only spaces are the problem, and your collation is case-insensitive:
WHERE REPLACE(tn.fieldname,' ','') LIKE 'foobar';
If your collation is case-sensitive:
WHERE LOWER(REPLACE(tn.fieldname,' ','')) LIKE LOWER('foobar');
However, this results in a full-table scan, so performance will not be great. Normalizing fieldname itself to the form you search for, or adding another column that holds the normalized value, would perform better, especially with an index on it.
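To illustrate the normalized-column idea, here is a small sketch in Python. It uses an in-memory SQLite database purely as a stand-in for MySQL, and the column name fieldname_norm and the index name are assumptions, not anything from the question. normalize() mirrors the PHP steps from the question.

```python
import re
import sqlite3


def normalize(value: str) -> str:
    """Mirror the PHP steps: drop non-alphanumerics, then lowercase."""
    return re.sub(r"[^A-Za-z0-9]", "", value).lower()


# SQLite stands in for MySQL here; the idea carries over unchanged.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tn (fieldname TEXT, fieldname_norm TEXT)")
conn.execute("CREATE INDEX idx_fieldname_norm ON tn (fieldname_norm)")

# Store the normalized form once, at write time.
for name in ["Foo Bar", "Baz-Qux", "foo  BAR!"]:
    conn.execute("INSERT INTO tn VALUES (?, ?)", (name, normalize(name)))

# The lookup is now a plain equality test that can use the index.
rows = conn.execute(
    "SELECT fieldname FROM tn WHERE fieldname_norm = ? ORDER BY rowid",
    ("foobar",),
).fetchall()
print(rows)
```

The cost is that the extra column has to be kept in sync whenever fieldname changes.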
MySQL (before version 8.0, which added REGEXP_REPLACE()) doesn't have anything equivalent to gawk's gensub() or PHP's preg_replace(). While you could use MySQL's REPLACE() and LOWER() functions to remove specific characters, you can't tell those versions of MySQL to remove /[^[:alnum:]]/.
My suggestion would be to use a UDF.
I have this issue with MySQL when querying a DB from PHP.
The PHP code is:
$Query = "SELECT COUNT(*) FROM theTable WHERE fieldValue REGEXP 'Dom-R[eéèêë]my'";
$DBR = mysql_query($Query,$Connection);
I expect this query to count the rows containing values such as:
Dom-Remy
Dom-Rémy
Dom-Rèmy
...etc...
But I get zero. What is wrong with the code? I have tried several variations, and none of them work.
This is a Unicode issue.
What happens is that each of é, è, ê, ë in your example is not a single byte but two in UTF-8 (the accent may even be stored as a separate combining character). A regex engine that works on bytes therefore does not see each of them as one character, which brings many complexities and rules that need to be followed to handle Unicode correctly.
You could try something like ([\x{0049}-\x{0130}]) to match accented letters, but the syntax may vary depending on whether you use the expression in .NET, Java, JavaScript or PHP.
You can also look up the code point of each character here:
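The sizes involved can be checked quickly. The following is just an illustrative Python snippet (not PHP or MySQL), with the helper name utf8_len made up for the example; it shows how many characters and how many UTF-8 bytes each letter takes, and what the decomposed form looks like.

```python
import unicodedata


def utf8_len(ch: str) -> int:
    """Number of bytes the character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


# Precomposed letters written as escapes so the source encoding cannot interfere.
letters = ["e", "\u00e9", "\u00e8", "\u00ea", "\u00eb"]  # e, é, è, ê, ë
for ch in letters:
    print(ch, "characters:", len(ch), "utf-8 bytes:", utf8_len(ch))

# The same letter can also be stored as a base letter plus a combining accent.
decomposed = unicodedata.normalize("NFD", "\u00e9")
print("decomposed length:", len(decomposed))
```

A byte-oriented matcher sees the two bytes of é as two separate items, so a bracket expression listing accented letters does not behave like a list of single characters.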
http://www.fileformat.info/info/unicode/char/search.htm?q=%C4%B0&preview=entity
According to the official MySQL documentation, REGEXP matches in a byte-wise fashion:
The REGEXP and RLIKE operators compare characters by their
byte values and accented characters may not compare as equal even if a
given collation treats them as equal.
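If you can accept any character in place of [eéèêë], matching one or two bytes at that position should be sufficient (the byte-wise regex library does not support lazy quantifiers such as +?):
$Query = "SELECT COUNT(*) FROM theTable WHERE fieldValue REGEXP '^Dom-R.{1,2}my$'";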
If
the column's CHARACTER SET is utf8 or utf8mb4, and
the connection between your client and the MySQL server also uses one of those character sets, and
you are not using COLLATION utf8_bin, then
'Dom-Remy' = 'Dom-Rémy' = ...
WHERE ... = ... and WHERE ... LIKE ... will abide by the above. REGEXP (RLIKE) cannot be used, for the reasons already discussed.
This shows what is equal (for = and LIKE.)
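As a rough illustration of what an accent- and case-insensitive comparison does, here is a Python approximation. It is not MySQL's actual collation algorithm; the function name fold and the sample values are assumptions made up for the example.

```python
import unicodedata


def fold(text: str) -> str:
    """Strip combining accents and ignore case, roughly like a *_ci collation."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


# Hypothetical column values; \u00e9 is é and \u00e8 is è.
names = ["Dom-Remy", "Dom-R\u00e9my", "Dom-R\u00e8my", "Dom-Ramy"]
matches = [n for n in names if fold(n) == fold("Dom-Remy")]
print(matches)
```

The first three values compare as equal after folding, while Dom-Ramy does not, which mirrors what = and LIKE do under a suitable collation.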
If you are simply searching a string for Dom-Remy, use
fieldValue LIKE '%Dom-Remy%'
instead of REGEXP/RLIKE.
If you have something more complex that needs REGEXP, then start a new question with the details.
I have an array of randomly generated strings. How can I check whether each string is a correctly spelled word, based on a US English dictionary? This way, I can remove non-English words from the list.
What I do right now is loop through the list and query a database of dictionary words for each string. Unfortunately, this is not efficient, especially if my list contains hundreds of words.
I have read about Aspell, but I would have to install it, and I cannot, because the program is hosted on shared web hosting.
Anyway, here's what I have so far:
// generate random strings using the method I coded
// returns a string array of generated strings
// no duplicates generated here
// just plain permutations
$generated_list = generate();
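I have read an article (Performing A Query In A Loop) that advises against looping and running a query for each string, so I now run a single query instead:
$only_english_list = [];
if (count($generated_list)) {
// quote and escape each string so the IN (...) list is valid SQL
$quoted = array_map(function ($w) use ($connection) { return "'" . $connection->real_escape_string($w) . "'"; }, $generated_list);
// note the closing parenthesis after the list
$result = $connection->query("SELECT `word` FROM `us_eng` WHERE `word` IN (" . implode(',', $quoted) . ")");
// fetch_assoc() returns rows keyed by column name; fetch_row() would not
while ($row = $result->fetch_assoc()) {
$only_english_list[] = $row['word'];
}
}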
However, is there a more efficient way of checking whether a string is in an English dictionary? Something like a method that returns true or false?
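One possible shape for such a true/false check, sketched in Python rather than PHP: load the word list into a set once, then test membership in memory instead of querying per word. The names load_dictionary and is_english and the tiny word list are made up for illustration; it assumes the dictionary fits in memory.

```python
def load_dictionary(words):
    """Build a lookup set from any iterable of words (e.g. lines of a file)."""
    return {w.strip().lower() for w in words if w.strip()}


def is_english(word, dictionary):
    """Return True when the word is in the dictionary (average O(1) lookup)."""
    return word.lower() in dictionary


# Stand-in for a real word list; in practice this would come from a file or table.
dictionary = load_dictionary(["lovely", "volley", "table"])

generated = ["vleoly", "lovely", "volley", "yllove"]
only_english = [w for w in generated if is_english(w, dictionary)]
print(only_english)
```

In PHP the same idea could be expressed with an array keyed by word and isset(), though whether that fits in a shared host's memory limit depends on the size of the dictionary.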
I now have an answer to this problem. Instead of generating permutations, which takes a massive amount of time and resources, I use the capabilities of MySQL directly. That is, I use REGEXP or LIKE against a table of English words of a given length.
So, to find the English words that can be formed from vleoly, I run this query against a table of English words of length 6, named us_6:
SELECT word FROM us_6 WHERE
word REGEXP 'v' AND
word REGEXP 'l.*l' AND
word REGEXP 'e' AND
word REGEXP 'o' AND
word REGEXP 'y'
The results are lovely and volley.
For more information, check MySQL, REGEXP - Find Words Which Contain Only The Following Exact Letters
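The REGEXP conditions in that query follow mechanically from the letter counts, so they can be generated. Below is a sketch in Python; the function names are hypothetical, and the us_N table naming is taken from the us_6 example above and assumed to hold for other lengths.

```python
from collections import Counter


def letter_conditions(letters: str) -> list:
    """One REGEXP condition per distinct letter; repeats become 'x.*x'."""
    # Assumption: input is plain ASCII letters, so no regex or SQL escaping is needed.
    if not (letters.isascii() and letters.isalpha()):
        raise ValueError("expected ASCII letters only")
    counts = Counter(letters)  # keeps first-seen order of the letters
    return ["word REGEXP '" + ".*".join(ch * n) + "'" for ch, n in counts.items()]


def build_query(letters: str) -> str:
    """Assemble the full query against the table of words of matching length."""
    table = "us_" + str(len(letters))
    return "SELECT word FROM " + table + " WHERE " + " AND ".join(letter_conditions(letters))


print(build_query("vleoly"))
```

Because the table only holds words of the same length as the input, requiring every letter with its multiplicity is enough to accept exactly the anagrams.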
Thanks for taking time to read my question.
I have created a MySQL table, an HTML form, and a PHP program that connects the form to the MySQL table and retrieves sequences by matching on the Annotations column, which has the text data type.
This column contains letters and also one or more hyphens, commas, parentheses, periods or spaces.
Please look at the following code that I used for select query:
$values=mysql_query("SELECT Sequence
FROM oats
WHERE Foldchange = '$Foldchange' AND
RustvsMockPvalue = '$RustvsMockpvalue' AND
Annotations REGEXP '%$Annotation%[-]+'");
Here $Annotation is the form variable which holds the value entered by the user in the form. Annotations is the column name in the MySQL table.
Annotations column has characters A-Z or a-z and one or more of hyphen, comma, space or parentheses like the following.
Sequence is another text column in the MySQL table but does not have ,./().
Example data from Annotations column:
ADP, ATP carrier protein, mitochondrial precursor (ADP/ATP translocase) (Adenine nucleotide translocator) (ANT).
I am not able to retrieve the Sequence column data when I search for any Annotations value that contains a comma, parenthesis, period or slash. It works fine for records that do not contain any of ,.()/.
I tried to use LIKE instead of REGEXP, but it didn't work either.
A record from mysql table:(columns that you see below: contigid,source,genelength,rustmeans, mockmeans,foldchange,pvalue,rustmockteststatistic,Annotations and Sequence)
as_rcr_contig_10002 ORME1 2101 506.33 191 -2.18 2.21E-10 -6.35 Tesmin/TSO1-like, CXC domain containing protein. AACAATTCCCCTCAACCAACCTTTTATTTCATCCCATTTTTATCATCTGTCCGGTTACAGATTTTGCTTCCAGTTAGGTGCCACTTCTTCAAACGCTCAACCCTTACCCACTACCACCCCACCAAAACCAACCCCCCAAGATGCAGTTCATCACTCTCGCCGTTGCTTTTGCTTTCTTTGCTGGTGCCANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCTTTTGCTTTCTTTGCTGGTGCCACCTCGTCGCCGGTTTCCATGGACCCCAAAGCCGAGAAGTCCGGCTCCTCGGGATCCGGTGGCGCCCCTCTGGGCACTGCTAGCCCCTATCCCCAAAGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGGTGGCCCTCAGTCGCCAGGCTCTGGCCAACCCGGTAGGATGCCATGGGGTAGCGACCAATCTGCCTACGGTGGTGGTTTCCCTTATGGATCATTCCCCTCGGTTTCGGGGCAATCCCAATCGACGGCCTATGCTCAAGCTCAATCATCCAGTTTCCCCTCAAACGGTGTCCCGACACACTCCTCGGCCTCCGCCCAAGCGCAATCATCCGGTCCTGGACAAGCTCAGGCAGCCGCTTCTGCCCAGGTTCCCGGCGGCCCCCACGGTCAAGGTTCTAACGGATTTGGCGCACAAGGCCAGTTTGGACAGAACGGGCAGAACGGCCTCTATGGTCAAGACGGCAATGGCTTTAGTGCCCAAGGCCAATTTGGACAGAGTGGACAGAATGGCTTCTATGGTCA
Could someone please help me with the correct syntax for the SELECT statement? Thank you.
You need to familiarise yourself with regex - it's its own little language.
Use REGEXP with the right regex:
WHERE ...
AND Annotations REGEXP '[-A-Za-z(). ]+'
AND Annotations NOT REGEXP '[A-Za-z]+'
If MySQL supported regex lookaheads, this could be done in one test.
First of all, you are not using REGEXP properly.
You should check the differences between LIKE and REGEXP.
REGEXP uses regular expressions, which have a very particular syntax.
LIKE uses simple text matching with wildcard characters such as % and _.
Here you are using REGEXP with %, which is why it's not working. % is a wildcard for LIKE only.
In REGEXP, however, . and - are special characters that you need to escape too.
If you want to check for several characters, REGEXP is the way to go:
Annotations REGEXP '.*$Annotation.*[\-(),\.]+.*'
This matches:
.* : 0 to n characters
$Annotation : Your keyword
.* : 0 to n characters
[\-(),\.]+ : At least 1 character from the list : - ( ) , .
.* : 0 to n characters
Tell us whether that matches your data.
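Since the user's input itself may contain ( ) . and similar characters, it probably needs escaping before it is embedded in a regular expression. Here is a minimal sketch of that step in Python, with Python's re module standing in for MySQL's regex engine. The function name escape_ere and the chosen set of special characters are assumptions.

```python
import re

# Characters treated as special in extended regular expressions (assumed set).
ERE_SPECIAL = set(".[]()*+?{}|^$\\")


def escape_ere(text: str) -> str:
    """Backslash-escape regex metacharacters so the text matches literally."""
    return "".join("\\" + c if c in ERE_SPECIAL else c for c in text)


annotation = (
    "ADP, ATP carrier protein, mitochondrial precursor "
    "(ADP/ATP translocase) (Adenine nucleotide translocator) (ANT)."
)
term = "(ADP/ATP translocase)"
pattern = escape_ere(term)

print(pattern)
print(re.search(pattern, annotation) is not None)
```

When the escaped pattern is sent to MySQL inside a quoted string literal, the backslashes would additionally need doubling, or the pattern could be passed as a bound parameter.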
Since we can't craft a regular expression that would work in your case without getting into some crazy matching schemes (ordering and so forth), you'll need to construct the SQL statement yourself to find what you're looking for. Luckily, you're using PHP.
Here I'm starting with a simple space-delimited entry. Remember that you can't wrap something in parentheses, because the parentheses might not match up in your result set.
$search_input = 'ADP ANT';
//example of array from a search page full of check boxes or fields
$annSearches = explode(' ',$search_input);
/* $annSearches is now an array containing ADP and ANT */
$sql = "SELECT Sequence FROM oats WHERE Foldchange = '$Foldchange' AND RustvsMockPvalue = '$RustvsMockpvalue'";
foreach ($annSearches as $Annotation){
$sql .= " AND Annotations LIKE '%$Annotation%'";
}
The output SQL statement would look like this (wrapped for clarity):
SELECT Sequence FROM oats WHERE
Foldchange = '$Foldchange'
AND RustvsMockPvalue = '$RustvsMockpvalue'
AND Annotations LIKE '%ADP%'
AND Annotations LIKE '%ANT%';
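The same construction can be done with placeholders instead of interpolating the user's input into the SQL string, which avoids quoting problems. This is a sketch in Python, not PHP; the %s placeholder style is the one used by common Python MySQL drivers, and the function name and sample values are made up for illustration.

```python
def build_query(foldchange, pvalue, search_input):
    """Return (sql, params): one LIKE condition per whitespace-separated term."""
    sql = "SELECT Sequence FROM oats WHERE Foldchange = %s AND RustvsMockPvalue = %s"
    params = [foldchange, pvalue]
    for term in search_input.split():
        sql += " AND Annotations LIKE %s"
        # The wildcards go into the bound value, not into the SQL text.
        # Note: % or _ inside the term itself would still act as wildcards.
        params.append("%" + term + "%")
    return sql, params


sql, params = build_query("-2.18", "2.21E-10", "ADP ANT")
print(sql)
print(params)
```

The pair would then be handed to the driver's execute(sql, params) call, which takes care of quoting each value.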
If the query gets really long, this will get slower and slower, as MySQL has to check every record in the table against each condition.
FULLTEXT SEARCH OPTION
Another way that you could potentially do this is to enable FULLTEXT search functionality on the Annotations field in the table in the database.
ALTER TABLE oats ADD FULLTEXT(Annotations);
This would allow you to do a search something like this:
SELECT Sequence FROM oats WHERE
Foldchange = '$Foldchange'
AND RustvsMockPvalue = '$RustvsMockpvalue'
AND MATCH(Annotations) AGAINST ('ADP ANT');
The string looks like this:
abc def
123 'abc' abc "def"
bla bla
So I want to replace abc with something else, but to not affect the abc that's within quotes. Same with def, which is using double quotes...
This is actually for a string that contains SQL queries, I want to replace table names without replacing by mistake data from fields which could contain the same word.
You can use regular expressions with negative lookahead and negative lookbehind. You can read more about that here: http://www.regular-expressions.info/lookaround.html
Here is an example that matches abc:
(?<!['"])(?<target>abc)(?!['"]) - this will match any abc not surrounded by single or double quotes.
A negative lookbehind and lookahead should suffice:
(?<!')abc(?!')
There's the negative lookbehind (?<!') to ensure abc is not preceded by a single quote, as well as a negative lookahead (?!') to ensure that abc is not followed by a single quote.
Obviously, this is trivially changeable to switch out single for double quotes:
(?<!")def(?!")
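Applied to the sample string from the question, the lookaround approach might look like this. It is a sketch in Python (the pattern syntax is the same in PHP's preg functions); the replacement text xyz is arbitrary.

```python
import re

text = "abc def\n123 'abc' abc \"def\"\nbla bla"

# abc that is neither directly preceded nor directly followed by a quote.
pattern = re.compile(r"""(?<!['"])abc(?!['"])""")

result = pattern.sub("xyz", text)
print(result)
```

This only looks at the characters immediately next to abc, so a quoted string with other words around the name, such as 'foo abc bar', would still be changed.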
Table names appear after specific keywords:
FROM tablename
JOIN tablename
INNER JOIN tablename
LEFT JOIN tablename
...
I would build a replacement pattern based on this fact.
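A sketch of that keyword-based pattern, in Python; the function name rename_table and the sample query are hypothetical. It only rewrites a name that directly follows FROM or JOIN (which also covers INNER JOIN and LEFT JOIN).

```python
import re


def rename_table(sql: str, old: str, new: str) -> str:
    """Replace `old` only where it directly follows FROM or JOIN."""
    pattern = re.compile(
        r"(\b(?:FROM|JOIN)\s+)" + re.escape(old) + r"\b",
        re.IGNORECASE,
    )
    # Keep the keyword and its whitespace (group 1), swap only the name.
    return pattern.sub(lambda m: m.group(1) + new, sql)


query = "SELECT t.id FROM abc t INNER JOIN def d ON d.id = t.id WHERE t.name = 'abc'"
print(rename_table(query, "abc", "xyz"))
```

Column references qualified with the table name (abc.id), other statements such as UPDATE or INSERT INTO, and data that happens to contain "from abc" are not handled by this sketch.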
You can use a regexp like this one:
//
$new_string = preg_replace("/\"(.)\"|\'(.)\'/","test",$string);
Which version of SQL are you using? Some versions of SQL (most?) don't have regular expression support.
The main approach that I'm aware of (in, for example, MS SQL Server) is to write a function that actually parses the entire string, checking the conditions you are interested in and replacing as and when necessary.
As this would be a multi-statement function, it has its own overheads. As this sounds like a one-off, that shouldn't normally be too bad.
An alternative approach, which is best avoided in lower-level languages but is possibly suitable in higher-level languages...
Find all occurrences that you DON'T want to replace, and replace them with a holding pattern. (This pattern must be known not to already exist in the data.) Do your replacement, then turn the holding pattern back into the original string.
Start = abc "abc" 'abc'
Step1 = abc "xxx" 'abc' -- REPLACE(start, '"abc"', '"xxx"')
Step2 = abc "xxx" 'xxx' -- REPLACE(step1, '''abc''', '''xxx''')
Step3 = ??? "xxx" 'xxx' -- REPLACE(step2, 'abc', '???')
Final = ??? "abc" 'abc' -- REPLACE(step3, 'xxx', 'abc')
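The holding-pattern steps can be sketched as follows, in Python. The function name replace_unquoted and the placeholder value are assumptions; the placeholder is checked against the input because the approach breaks if it already occurs in the data.

```python
def replace_unquoted(text, old, new, placeholder="\x00HOLD\x00"):
    """Replace `old` except where it appears as "old" or 'old'."""
    if placeholder in text:
        raise ValueError("placeholder already occurs in the text")
    step = text.replace('"' + old + '"', '"' + placeholder + '"')   # Step1: hide "old"
    step = step.replace("'" + old + "'", "'" + placeholder + "'")   # Step2: hide 'old'
    step = step.replace(old, new)                                   # Step3: real replacement
    return step.replace(placeholder, old)                           # Final: restore hidden ones


print(replace_unquoted("abc \"abc\" 'abc'", "abc", "???"))
```

Like the steps above, this only protects occurrences that are exactly the quoted word, not the word appearing inside a longer quoted string.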
I have this piece of code here:
$query = mysql_query("SELECT * FROM example WHERE text LIKE '%$value%'");
Would it make a difference if I would use:
$query = mysql_query("SELECT * FROM example WHERE text LIKE '$value'");
If yes, what would it be? What would be the difference?
It's not a difference in PHP; % is a wildcard in SQL. You can read more about it here. Essentially, % matches any number of characters, even zero characters.
Yes, this has nothing to do with PHP; it's an SQL thing. % is a wildcard that stands for zero or more characters.
%abc% matches abc, aabca, aabc, abcd
%abc matches dabc, abc but not abcd or tabcd
abc% matches abcd, abc but not dabc, tabcd
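Those examples can be checked with a small emulation of LIKE, sketched here in Python. It is an approximation only: it handles % and _, ignores case as MySQL's default collations do, and does not implement LIKE's escape character. The function name like_match is made up.

```python
import re


def like_match(pattern: str, value: str) -> bool:
    """Approximate SQL LIKE: % is any run of characters, _ is exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    regex = "".join(parts)
    return re.fullmatch(regex, value, flags=re.DOTALL | re.IGNORECASE) is not None


for pattern in ["%abc%", "%abc", "abc%", "abc"]:
    hits = [v for v in ["abc", "abcd", "dabc", "aabca"] if like_match(pattern, v)]
    print(pattern, "->", hits)
```

Without any % the whole value has to equal the pattern, which is the difference the question asks about.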
The difference is that '%$value%' will search for matches containing $value (for example, if $value = 'foo' it could return 'foobar' or 'barfoobar'), whereas '$value' only matches the exact value of $value.
In PHP there is no difference (although you might want to take a look at using PDO for your database queries), the '%' symbols affect the query executed in MySQL.
% acts as a wildcard, so the first query will return anything that contains the term 'value' in its text attribute, whereas the second will return only records that match the term 'value' exactly.