Force MySQL field to lowercase with PHP


I have a database with an email field, and my script goes through the database to grab all the transactions for a given email address.
Users who type their email in lowercase when it is stored with a couple of capitals don't see their transactions. When I change the input to match the stored case exactly, it works.
How can I modify this query so that the comparison with the email field ignores case? Or should I change how the email gets stored?
$result = mysql_query("SELECT * FROM `example_orders` WHERE `buyer_email`='$useremail';") or die(mysql_error());
Thanks ahead of time!

Uh... you realize that email addresses are case sensitive, right? From RFC 2821:
Verbs and argument values (e.g., "TO:" or "to:" in the RCPT command
and extension name keywords) are not case sensitive, with the sole
exception in this specification of a mailbox local-part (SMTP
Extensions may explicitly specify case-sensitive elements). That is,
a command verb, an argument value other than a mailbox local-part,
and free form text MAY be encoded in upper case, lower case, or any
mixture of upper and lower case with no impact on its meaning. This
is NOT true of a mailbox local-part. The local-part of a mailbox
MUST BE treated as case sensitive. Therefore, SMTP implementations
MUST take care to preserve the case of mailbox local-parts. Mailbox
domains are not case sensitive. In particular, for some hosts the
user "smith" is different from the user "Smith". However, exploiting
the case sensitivity of mailbox local-parts impedes interoperability
and is discouraged.
(emphasis added)

A mixed PHP/MySQL solution:
$result = mysql_query("
SELECT *
FROM example_orders
WHERE LOWER(buyer_email) = '" . strtolower($useremail) . "';
") or die(mysql_error());
This converts both sides of the comparison to lowercase. It is not very efficient, because wrapping the column in LOWER() prevents MySQL from using an index for the search.
A more efficient, pure SQL solution:
$result = mysql_query("
SELECT *
FROM example_orders
WHERE buyer_email = '$useremail' COLLATE utf8_general_ci;
") or die(mysql_error());
In this case, we are forcing the use of a case-insensitive collation for the comparison. You wouldn't need that if the column had a case-insensitive collation in the first place.
Here is how to change the column collation, as suggested by Basti in a comment:
ALTER TABLE `example_orders`
CHANGE `buyer_email` `buyer_email` VARCHAR( 100 )
CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL
If you choose to do that, you can run the query without COLLATE utf8_general_ci.
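The queries above build the SQL string by hand from $useremail. As a minimal sketch, the same lowercase comparison can be written with a bound parameter. To keep it runnable without a server, this assumes PDO with an in-memory SQLite database as a stand-in for MySQL; the function name findOrdersByEmail and the sample rows are hypothetical.

```php
<?php
// Stand-in database (assumption): in-memory SQLite via PDO instead of MySQL.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE example_orders (id INTEGER PRIMARY KEY, buyer_email TEXT NOT NULL)');
$pdo->exec("INSERT INTO example_orders (buyer_email) VALUES ('John.Doe@Example.com'), ('jane@example.com')");

// Hypothetical helper: lowercases both sides and binds the value
// instead of interpolating it into the SQL string.
function findOrdersByEmail(PDO $pdo, string $useremail): array
{
    $stmt = $pdo->prepare('SELECT * FROM example_orders WHERE LOWER(buyer_email) = ?');
    $stmt->execute([strtolower($useremail)]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

$orders = findOrdersByEmail($pdo, 'john.doe@example.com');
echo count($orders), "\n"; // prints 1
```

With a MySQL connection and a case-insensitive column collation, the LOWER() call could be dropped and the bound parameter kept.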

If you do WHERE buyer_email LIKE '...' it'll do a case-insensitive match, provided the column has a case-insensitive collation (which is the default).
With e-mail fields, though, I prefer to lowercase the e-mail address when I insert it into the DB.
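A minimal sketch of lowercasing the address before it is stored. The function name normalizeEmail and the $domainOnly switch are hypothetical; the switch is there because, as quoted from the RFC above, only the domain part is guaranteed to be case insensitive.

```php
<?php
// Hypothetical helper: normalize an e-mail address before INSERT.
// By default the whole address is lowercased; with $domainOnly = true
// the local part keeps its case and only the domain is folded.
function normalizeEmail(string $email, bool $domainOnly = false): string
{
    $email = trim($email);
    if (!$domainOnly) {
        return strtolower($email);
    }
    $at = strrpos($email, '@');
    if ($at === false) {
        return $email; // no domain part found; leave the value untouched
    }
    return substr($email, 0, $at) . strtolower(substr($email, $at));
}

echo normalizeEmail(' John.Doe@Example.COM '), "\n";      // john.doe@example.com
echo normalizeEmail('John.Doe@Example.COM', true), "\n";  // John.Doe@example.com
```

The same function would have to be applied to the user's input at lookup time, otherwise the stored and searched values can still differ in case.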

Related

Mysql where condition in different languages other than English [duplicate]

In a table x, there is a column with the values u and ü.
SELECT * FROM x WHERE column='u'.
This returns u AND ü, although I am only looking for the u.
The table's collation is utf8mb4_unicode_ci . Wherever I read about similar problems, everyone suggests to use this collation because they say that utf8mb4 really covers ALL CHARACTERS. With this collation, all character set and collation problems should be solved.
I can insert ü, è, é, à, Chinese characters, etc. When I make a SELECT *, they are also retrieved and displayed correctly.
The problem only occurs when I COMPARE two strings as in above example (SELECT WHERE) or when I use a UNIQUE INDEX on the column. When I use the UNIQUE INDEX, a "ü" is not inserted when I have a "u" in the column already. So, when SQL compares u and ü in order to decide whether the ü is unique, it thinks it is the same as the u and doesn't insert the ü.
I changed everything to utf8mb4 because I don't want to worry about character sets and collation anymore. However, it seems that utf8mb4 isn't the solution either when it comes to COMPARING strings.
I also tried this:
SELECT * FROM x WHERE _utf8mb4 'ü' COLLATE utf8mb4_unicode_ci = column.
This code is executable (looks pretty sophisticated). However, it also returns ü AND u.
I have talked to some people in India and here in China about this issue. We haven't found a solution yet.
If anyone could solve the mystery, it would be really great.
Add_On: After reading all the answers and comments below, here is a code sample which solves the problem:
SELECT * FROM x WHERE 'ü' COLLATE utf8mb4_bin = column
By adding "COLLATE utf8mb4_bin" to the SELECT query, SQL is invited to put the "binary glasses" (ending _bin) on when it looks at the characters in the column. With the binary glasses on, SQL sees now the binary code in the column. And the binary code is different for every letter and character and emoji which one can think of. So, SQL can now also see the difference between u and ü. Therefore, now it only returns the ü when the SELECT query looks for the ü and doesn't also return the u.
In this way, one can leave everything (database collation, table collation) the same, but only add "COLLATE utf8mb4_bin" to a query when exact differentiation is needed.
(Actually, SQL takes all other glasses off (utf8mb4_german_ci, _general_ci, _unicode_ci etc.) and only does what it does when it is not forced to do anything additional. It simply looks at the binary code and doesn't adjust its search to any special cultural background.)
Thanks everybody for the support, especially to Pred.

Collation and character set are two different things.
Character set is just an 'unordered' list of characters and their representation.
utf8mb4 is a character set and covers a lot of characters.
Collation defines the order of characters (it determines the result of ORDER BY, for example) and other rules, such as which characters or character combinations should be treated as the same. Collations are derived from character sets, and there can be more than one collation for the same character set. (It is sort of an extension to the character set.)
In utf8mb4_unicode_ci all (most?) accented characters are treated as the same character as their base letter, which is why you get both u and ü. In short, this is an accent-insensitive collation.
This is similar to the fact that German collations treat ss and ß as the same.
utf8mb4_bin is another collation, and it treats all characters as different. You may or may not want to use it as the default; that is up to you and your business rules.
You can also convert the collation in queries, but be aware that doing so will prevent MySQL from using indexes.
Here is an example using a similar, but maybe a bit more familiar part of collations:
The ci at the end of a collation name means Case Insensitive, and many such collations have a counterpart ending in cs, meaning Case Sensitive.
When your column is case insensitive, the where condition column = 'foo' will find all of these: foo, Foo, fOo, foO, FOo, FoO, fOO, FOO.
If you switch the column to a case-sensitive collation (latin1_general_cs, for example, or utf8mb4_0900_as_cs in MySQL 8.0), all of the above values are treated as different values.
The localized collations (German, UK, US, Hungarian, and so on) follow the rules of the named language. In German, ss and ß are the same, and this is stated in the rules of the language. When a German user searches for the value Straße, they will expect software (supporting the German language or written in Germany) to return both Straße and Strasse.
To go further, when it comes to ordering, the two words are equal and their meaning is the same, so there is no particular order between them.
Don't forget that the UNIQUE constraint is just a way of ordering/filtering values. So if a unique key is defined on a column with a German collation, it will not allow you to insert both Straße and Strasse, since by the rules of the language they should be treated as equal.
Now let's look at our original collation, utf8mb4_unicode_ci. This is a 'universal' collation, which means that it tries to simplify everything: since ü is not a very common character and most users have no idea how to type it, this collation makes it equal to u. This simplification helps support most languages, but as you already know, it has side effects (in ordering, filtering, unique constraints, etc.).
utf8mb4_bin is the other end of the spectrum. This collation is designed to be as strict as it can be. To achieve this, it literally uses the character codes to distinguish characters. This means that each and every form of a character is different; the collation is implicitly case sensitive and accent sensitive.
Both of these have drawbacks: the localized collations are designed for one specific language, and the general ones for providing a common solution. (utf8mb4_unicode_ci is the 'extension' of the old utf8_general_ci collation.)
The binary collation requires extra caution when it comes to user interaction. Since it is CS and AS, it can confuse users who are used to getting the value 'Foo' when they look for the value 'foo'. As a developer, you also have to be extra cautious with joins and other features. An INNER JOIN on 'foo' = 'Foo' will return nothing, since 'foo' is not equal to 'Foo'.
I hope that these examples and explanations help a bit.
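As a toy model of the explanation above, a collation can be thought of as a function that maps each string to a comparison key: two strings are "equal" when their keys are equal. The sketch below is not how MySQL implements collations (real ones use full Unicode weight tables); the function names and the tiny folding table are assumptions made purely for illustration, and the file is assumed to be saved as UTF-8.

```php
<?php
// Toy "binary" collation: the key is the string itself, byte for byte.
function keyBin(string $s): string
{
    return $s;
}

// Toy accent- and case-insensitive collation. The folding table is an
// illustrative assumption that covers only a handful of letters.
function keyAccentCaseInsensitive(string $s): string
{
    $fold = ['ü' => 'u', 'Ü' => 'u', 'é' => 'e', 'É' => 'e'];
    return strtolower(strtr($s, $fold));
}

// Two strings compare equal under a collation when their keys match.
function equalUnder(callable $key, string $a, string $b): bool
{
    return $key($a) === $key($b);
}

var_dump(equalUnder('keyAccentCaseInsensitive', 'u', 'ü'));   // bool(true)
var_dump(equalUnder('keyBin', 'u', 'ü'));                     // bool(false)
var_dump(equalUnder('keyAccentCaseInsensitive', 'Foo', 'foo')); // bool(true)
var_dump(equalUnder('keyBin', 'Foo', 'foo'));                 // bool(false)
```

A UNIQUE index behaves the same way in this model: it rejects a new value whose key already exists, which is why ü is refused when u is already present under an accent-insensitive collation.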

utf8_collations.html lists what letters are 'equal' in the various utf8 (or utf8mb4) collations. With rare exceptions, all accents are stripped before comparing in any ..._ci collation. Some of the exceptions are language-specific, not Unicode in general. Example: In Icelandic É > E.
..._bin is the only collation that treats accented letters as different. Ditto for case folding.
If you are doing a lot of comparing, you should change the collation of the column to ..._bin. When using the COLLATE clause in WHERE, an index cannot be used.
A note on ß. ss = ß in virtually all collations. The exception is utf8_general_ci (which used to be the default), which treats them as unequal. That one collation makes no effort to treat any 2-letter combination (ss) as a single 'letter'. Also, due to a mistake in 5.0, utf8_general_mysql500_ci treats them as unequal.
Going forward, utf8mb4_unicode_520_ci is the best through version 5.7. For 8.0, utf8mb4_0900_ai_ci is 'better'. The "520" and "900" refer to Unicode standards, so there may be even newer ones in the future.

You can try the utf8_bin collation and you shouldn't face this issue, but it will be case sensitive. The bin collations compare strictly, only separating the characters out according to the encoding selected, and once that's done, comparisons are done on a binary basis, much like many programming languages would compare strings.

I'll just add to the other answers that a _bin collation has its peculiarities as well.
For example, after the following:
CREATE TABLE `dummy` (`key` VARCHAR(255) NOT NULL UNIQUE);
INSERT INTO `dummy` (`key`) VALUES ('one');
this will fail:
INSERT INTO `dummy` (`key`) VALUES ('one ');
This is described in The binary Collation Compared to _bin Collations.
Edit: I've posted a related question here.

->where() clauses seem to be case sensitive

I have a query like so:
$profilesx->where('qZipCode', $location)->
orWhere('qCity', 'LIKE', '%'.$location.'%');
Where location is equal to belgrade, and the database column says Belgrade.
It seems to be case sensitive (using either = or LIKE) so if I search for Belgrade I get a result but if I search for belgrade I do not get any results.
How to make it case insensitive?

The default character set and collation are latin1 and
latin1_swedish_ci, so nonbinary string comparisons are case
insensitive by default. This means that if you search with col_name
LIKE 'a%', you get all column values that start with A or a. To make
this search case sensitive, make sure that one of the operands has a
case sensitive or binary collation. For example, if you are comparing
a column and a string that both have the latin1 character set, you can
use the COLLATE operator to cause either operand to have the
latin1_general_cs or latin1_bin collation:
source: http://dev.mysql.com/doc/refman/5.7/en/case-sensitivity.html
What's actually happened is that the case sensitivity has been switched off (which is not necessarily a bad thing). The solution is given in the next section of that document. Try something like
orWhere('qCity', 'COLLATE latin1_general_cs LIKE', '%'.$location.'%');
If Laravel doesn't like it, you will have to use a raw query or change the collation setting for the column.
As a side note, try to avoid LIKE %something% queries if you can. MySQL cannot use an index for these sorts of queries, and they generally tend to be slow on large tables because of it.

This is usually down to the collation settings of the database. You need to set it to a case-insensitive collation.
Read this link for more info:
http://dev.mysql.com/doc/refman/5.7/en/case-sensitivity.html

MYSQL Email Validation with Regex not working

I just want to fetch the invalid email addresses from my database. I tried the following query, but it's not working:
$sql=mysql_query("SELECT * FROM mytable WHERE email!='' and email NOT REGEXP '^[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$'");
And the invalid email is a.bcdefg-3@abccom

It looks like Richard's answer is correct; however, it may not work depending on the collation used.
Therefore, if you have a case-sensitive collation, you may want to lowercase your field.
Try this query:
SELECT * FROM `mytable` WHERE `email` NOT REGEXP '^[[:alnum:]._%-\+]+@[[:alnum:].-]+[.][[:alnum:]]{2,4}$';
I have updated the regex to use character classes instead of character ranges, to avoid a lower (or upper) case transformation.
Moreover, in some IDEs you may have to escape "." with two backslashes, so I use
[.]
instead of an escaped dot.
I updated it again to allow subdomains.
Edited to allow +, thanks to @Charlie Brumbaugh's comment.

Try this:
SELECT * FROM `mytable` WHERE `email` NOT REGEXP '^[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$';
Works for me.
Furthermore, do not use the MySQL driver as it is deprecated in PHP.
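The pattern from these answers can also be tried out on the PHP side before running it against the table. This is a rough sketch, assuming the pattern is written with a literal @ between the local part and the domain; the function name isInvalidEmail is hypothetical, and the /i flag stands in for a case-insensitive collation. The pattern itself is deliberately simple (it rejects + in the local part and top-level domains longer than four letters), so it is not a full e-mail validator.

```php
<?php
// Pattern taken from the query above, wrapped in PCRE delimiters.
// The /i flag makes the A-Z ranges match lowercase letters as well.
$pattern = '/^[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$/i';

// Hypothetical helper: true when the address does NOT match the pattern,
// mirroring the NOT REGEXP condition of the SQL query.
function isInvalidEmail(string $email, string $pattern): bool
{
    return preg_match($pattern, $email) !== 1;
}

var_dump(isInvalidEmail('a.bcdefg-3@abccom', $pattern)); // bool(true): no dot in the domain
var_dump(isInvalidEmail('User@Example.com', $pattern));  // bool(false)
```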

Alter table with PHP and MySQL, add a column with an '@' sign

I am trying to check if a table has a certain column in it, and if not add that column to it. My code appears to work fine as long as the input value does not have an @ sign. I have tried surrounding
'$email'
with and without single quotes as an input string. Any help would be really appreciated.
$email = strtolower(mysql_real_escape_string($_SESSION['email']));
$result = mysql_query("SHOW COLUMNS FROM `selections` LIKE '$email'",$conn);
$exists = (mysql_num_rows($result))?TRUE:FALSE;
if ($exists == FALSE) {
$query2 = "ALTER TABLE selections ADD $email VARCHAR( 120 ) NOT NULL";
$add= mysql_query($query2,$conn);
var_dump($query2);
echo("this error". mysql_error());
}
$query2 was taken directly from phpMyAdmin and seems to work there even with an @ sign in the input.
Thanks for your help!

Before anything else, please, please consider doing this another way. You will be adding a field to the table for every email. What you probably do not know is that this increases the size of your table by increasing the size of its rows, and also limits you to a fixed number of fields (MySQL allows at most 65535 bytes per row, and every VARCHAR character takes between 1 and 4 bytes depending on the charset).
The real reason why your request is failing is that @ is a special character in your SQL queries, and phpMyAdmin happens to be smart enough to escape it. @ denotes a variable in the SQL dialect used by MySQL. You can either backtick-escape it for MySQL, or you can drop this approach in favour of a table structure like this:
selection:
* id
* your metadata here
emails:
* id
* email_address
selection_emails:
* id
* selection_id
* email_id
The third table is called an associative table. It allows you to keep normalizing your data.
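The three-table layout above could be sketched as follows. To stay runnable without a server, this assumes PDO with an in-memory SQLite database in place of MySQL; the label column, the helper name linkEmail, and the sample data are hypothetical, and INSERT OR IGNORE is SQLite syntax (MySQL spells it INSERT IGNORE).

```php
<?php
// Stand-in database (assumption): in-memory SQLite via PDO instead of MySQL.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// "label" is a placeholder for "your metadata here".
$pdo->exec('CREATE TABLE selection (id INTEGER PRIMARY KEY, label TEXT)');
$pdo->exec('CREATE TABLE emails (id INTEGER PRIMARY KEY, email_address TEXT NOT NULL UNIQUE)');
$pdo->exec('CREATE TABLE selection_emails (
    id INTEGER PRIMARY KEY,
    selection_id INTEGER NOT NULL REFERENCES selection(id),
    email_id INTEGER NOT NULL REFERENCES emails(id)
)');

// Hypothetical helper: store the address as a row (not as a column name)
// and link it to a selection through the associative table.
function linkEmail(PDO $pdo, int $selectionId, string $email): void
{
    $email = strtolower($email);
    $pdo->prepare('INSERT OR IGNORE INTO emails (email_address) VALUES (?)')
        ->execute([$email]);
    $find = $pdo->prepare('SELECT id FROM emails WHERE email_address = ?');
    $find->execute([$email]);
    $emailId = (int) $find->fetchColumn();
    $pdo->prepare('INSERT INTO selection_emails (selection_id, email_id) VALUES (?, ?)')
        ->execute([$selectionId, $emailId]);
}

$pdo->exec("INSERT INTO selection (id, label) VALUES (1, 'first selection')");
linkEmail($pdo, 1, 'User@Example.com');
linkEmail($pdo, 1, 'other@example.com');

// All addresses attached to selection 1, via the associative table.
$stmt = $pdo->prepare(
    'SELECT e.email_address
       FROM selection_emails se
       JOIN emails e ON e.id = se.email_id
      WHERE se.selection_id = ?
      ORDER BY e.email_address'
);
$stmt->execute([1]);
$addresses = $stmt->fetchAll(PDO::FETCH_COLUMN);
print_r($addresses);
```

Because the address is bound as a value, the @ sign needs no special treatment, and no ALTER TABLE is required when a new email appears.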

You can surround the email with curly brackets {$email} to define it explicitly as a variable within a string, but you probably also need to escape odd characters in this variable before this.
When altering the table you should also surround this with back-ticks, to allow for odd characters.
The best approach would be to use parameterized queries, and drop the DEPRECATED mysql library. And also to not allow odd characters to be used in field names.
I would also question why you are adding email as a new column.

SQLite: execute case-sensitive LIKE query on a specific query

I want to run a SELECT ... LIKE query in SQLite that is case-sensitive. But I only want this one query to be case sensitive, and nothing else.
I know there is
PRAGMA case_sensitive_like = boolean;
But that seems to change all LIKE queries.
How can I enable case-sensitive LIKE on a single query?
Examples:
I want a query for "FuN" to match "blah FuN blah", but not "foo fun bar".
(This is running under PHP using PDO)
I might be able to toggle that on, then off after the query, but I am concerned about the repercussions that may have (efficiency etc). Is there any harm?
I don't have write access to the database.
(This is under Windows Server 2008)
I also tried SELECT id, summary, status FROM Tickets WHERE summary COLLATE BINARY LIKE '%OPS%'; but that did not do a case-sensitive SELECT, it still returned results like laptops.

Why not go the simple way of using
PRAGMA case_sensitive_like = true/false;
before and after each query you want to be case sensitive? But beware: case sensitivity only works for ASCII characters, not Unicode, which makes SQLite not fully Unicode-compliant at this time.
Alternatively, SQLite allows applications to implement the REGEXP operator, which might help according to www.sqlite.org/lang_expr.html.
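A minimal sketch of the toggle approach under PDO, assuming an in-memory SQLite database with a hypothetical Tickets table and sample rows. The pragma is a per-connection setting, so it needs no write access to the database file; the helper name caseSensitiveLike is made up for this example, and the pragma is assumed to be available in the SQLite build in use (newer SQLite releases mark it as deprecated).

```php
<?php
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE Tickets (id INTEGER PRIMARY KEY, summary TEXT, status TEXT)');
$pdo->exec("INSERT INTO Tickets (summary, status) VALUES ('OPS outage', 'open'), ('broken laptops', 'open')");

// Hypothetical helper: turn case-sensitive LIKE on for one query only,
// and always switch it back off afterwards.
function caseSensitiveLike(PDO $pdo, string $needle): array
{
    $pdo->exec('PRAGMA case_sensitive_like = ON');
    try {
        $stmt = $pdo->prepare('SELECT summary FROM Tickets WHERE summary LIKE ?');
        $stmt->execute(['%' . $needle . '%']);
        $rows = $stmt->fetchAll(PDO::FETCH_COLUMN);
        $stmt->closeCursor();
        return $rows;
    } finally {
        $pdo->exec('PRAGMA case_sensitive_like = OFF');
    }
}

$sensitive = caseSensitiveLike($pdo, 'OPS');
$default = $pdo->query("SELECT summary FROM Tickets WHERE summary LIKE '%OPS%'")
               ->fetchAll(PDO::FETCH_COLUMN);

print_r($sensitive); // only 'OPS outage'
print_r($default);   // both rows, since LIKE is case insensitive again
```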

Try:
SELECT id, summary, status FROM Tickets WHERE summary GLOB '*OPS*';
Note that there is no space between * and OPS.
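To see the difference between GLOB and LIKE from PHP, here is a small sketch using PDO with an in-memory SQLite database. The Tickets table and its two rows are made-up sample data.

```php
<?php
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE Tickets (id INTEGER PRIMARY KEY, summary TEXT, status TEXT)');
$pdo->exec("INSERT INTO Tickets (summary, status) VALUES ('OPS outage', 'open'), ('broken laptops', 'open')");

// GLOB is case sensitive and uses * and ? as wildcards (instead of % and _).
$stmt = $pdo->prepare('SELECT summary FROM Tickets WHERE summary GLOB ?');
$stmt->execute(['*OPS*']);
$globRows = $stmt->fetchAll(PDO::FETCH_COLUMN);

// LIKE is case insensitive for ASCII letters by default.
$stmt = $pdo->prepare('SELECT summary FROM Tickets WHERE summary LIKE ?');
$stmt->execute(['%OPS%']);
$likeRows = $stmt->fetchAll(PDO::FETCH_COLUMN);

print_r($globRows); // only 'OPS outage'
print_r($likeRows); // 'OPS outage' and 'broken laptops'
```

Since GLOB affects only the query it appears in, no connection-wide setting has to be changed and restored.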

I think you may need to do a separate check in your PHP code on the returned values to see if they match your case-sensitive data.
$rs = mysql_query("Select * from tbl where myfield like '%$Value%'");
$Matches = array();
// Fetch rows one at a time (single =: assignment, not comparison)
while ($row = mysql_fetch_assoc($rs))
{
// strpos() is case sensitive, so only exact-case matches are kept
if (strpos($row['myfield'], $Value) !== false)
{
$Matches[] = $row;
}
}
print_r($Matches);

You can try something like this:
SELECT YOUR_COLUMN
FROM YOUR_TABLE
WHERE YOUR_COLUMN
COLLATE latin1_general_cs LIKE '%YOUR_VALUE%'
Not sure what your collation set is on the column. I picked latin as an example. Run the query and change 'cs' to 'ci' at the end. You should see different results.
UPDATE
Sorry, I read the question too fast. The above collation is for MySQL. For SQLite, you should be able to use BINARY, which should give you a case-sensitive search.
ref: http://www.sqlite.org/datatype3.html#collation

You can do that per column, not per query (which may be your case). For this, use sqlite collations.
CREATE TABLE user (name VARCHAR(255) COLLATE NOCASE);
All LIKE operations on this column then will be case insensitive.
You also can COLLATE on queries even though the column isn't declared with a specific collation:
SELECT * FROM list WHERE name LIKE '%php%' COLLATE NOCASE
Note that only ASCII chars are case insensitive by collation. "A" == "a", but "Æ" != "æ"
SQLite allows you to declare new collation types with sqlite_create_collation and then implement the collation logic on the PHP side, but PDO doesn't expose this in older PHP versions (later versions provide PDO::sqliteCreateCollation).

SELECT * FROM table WHERE field LIKE '%search_term%'
In this form the SELECT is case insensitive.
