I have a MySQL database of LaTeX snippets. Each snippet contains normal text and LaTeX commands. The commands are all preceded by a backslash (\). I would like to search through these snippets such that the text is case-insensitive but the commands are case-sensitive. So selecting for vector gives results where the text contains either vector or Vector, whereas selecting for \Vector will not return \vector.
Your question is about collations. A column in a table has a collation setting, for example utf8mb4_general_ci or utf8mb4_bin. The first of those is case-insensitive, meaning a search like this
WHERE col LIKE '%vector%'
will yield rows containing both ...Vector... and ...vector...
If you use the utf8mb4_bin (binary match, case-sensitive) collation, that search excludes ...Vector...
You can specify the collation to use in a filter clause, like so:
WHERE col LIKE '%Vector%' COLLATE utf8mb4_bin;
and that will force MySQL to use the collation you want. If you don't specify the collation (the normal case), MySQL uses the collation specified at the time you created the table or the column.
When you specify a collation explicitly, and it's different from the column's collation, MySQL's query planner cannot use an index on the column to satisfy your query. So it might be slower. Of course, the filter-clause pattern column LIKE '%value%' (with a leading %) also prevents the use of the index.
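A minimal sketch pulling these pieces together, assuming a hypothetical snippets table (the table and column names are made up for illustration):

```sql
-- Hypothetical table: the column defaults to a case-insensitive collation.
CREATE TABLE snippets (
    body TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci
);

-- Uses the column's own (case-insensitive) collation:
-- matches rows containing 'vector' or 'Vector'.
SELECT body FROM snippets WHERE body LIKE '%vector%';

-- Per-query override for LaTeX commands: case-sensitive match.
-- A literal backslash needs four backslashes here, because both the
-- string literal and the LIKE pattern treat \ as an escape character.
SELECT body FROM snippets WHERE body LIKE '%\\\\Vector%' COLLATE utf8mb4_bin;
```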
I am using laravel but can use raw sql if needed.
I have multiple JSON fields, and in that JSON there is translated data for each language. So, for example, the post table has a field title, and that title is {"en": "Title in english", "et": "Title in estonian"}.
Now I need to make a full-text search that searches these fields; for some columns I need to search the term in all languages, not just the active one.
I am using MariaDB latest stable.
If I make an index of these fields for full-text search, I can search fine, but the search is case-sensitive.
How can I make the search case-insensitive? The JSON fields are currently longtext with utf8mb4_bin; Laravel chose that for JSON fields. I know bin is a case-sensitive collation, but what else could I use so that the functionality to find records by translated slug (for example) would still be there?
In Laravel, one can search like ->where('slug->en', 'some-post-slug'). So I need to keep Laravel's JSON field functionality intact.
I have been trying to achieve this for two days now; I need some external input.
The MySQL documentation states the following:
By default, the search is performed in case-insensitive fashion. To perform a case-sensitive full-text search, use a case-sensitive or binary collation for the indexed columns. For example, a column that uses the utf8mb4 character set can be assigned a collation of utf8mb4_0900_as_cs or utf8mb4_bin to make it case-sensitive for full-text searches.
I bet it's the same in MariaDB. If you change your collation, it will work with case-insensitive searches as expected. You can change it this way:
ALTER TABLE mytable
CONVERT TO CHARACTER SET utf8mb4
COLLATE utf8mb4_general_ci;
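As a hedged sketch of what the search could look like after the conversion (the posts table and the ft_title index name are assumptions, not from the question):

```sql
-- Assumed names; adjust to your schema. MATCH ... AGAINST requires
-- a FULLTEXT index on the column.
ALTER TABLE posts ADD FULLTEXT INDEX ft_title (title);

-- With a _ci collation on the column, both searches should match a row
-- whose JSON contains "Title in english":
SELECT * FROM posts WHERE MATCH(title) AGAINST('english');
SELECT * FROM posts WHERE MATCH(title) AGAINST('ENGLISH');

-- Laravel's ->where('title->en', ...) compiles to a JSON path lookup,
-- which keeps working after the collation change:
SELECT * FROM posts
WHERE JSON_UNQUOTE(JSON_EXTRACT(title, '$.en')) = 'Title in english';
```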
In my Laravel app, in order to search the products' title column, I use the following code:
$products->where('title', 'like', '%' . $request->title . '%');
The title column is a string column and the data stored in it is in Persian. Also, the database collation is utf8_general_ci. However, when I search for something, some titles are found and some aren't. I need the result to include every product whose title column contains $request->title.
Can you help me?
Change Collation UTF8_general_ci to latin1_swedish_ci
Collations have these general characteristics:
Two different character sets cannot have the same collation.
Each character set has one collation that is the default collation. For example, the default collation for latin1 is latin1_swedish_ci. The output for SHOW CHARACTER SET indicates which collation is the default for each displayed character set.
There is a convention for collation names: They start with the name of the character set with which they are associated, they usually include a language name, and they end with _ci (case insensitive), _cs (case sensitive), or _bin (binary).
In cases where a character set has multiple collations, it might not be clear which collation is most suitable for a given application. To avoid choosing the wrong collation, it can be helpful to perform some comparisons with representative data values to make sure that a given collation sorts values the way you expect.
reference here
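To see which collations are available and to test how one behaves before committing to it, something like this can help (assuming a utf8mb4 connection):

```sql
-- Default collation of each character set:
SHOW CHARACTER SET;

-- All collations available for one character set:
SHOW COLLATION WHERE Charset = 'utf8mb4';

-- Probe a collation's behavior with representative values:
SELECT 'vector' = 'Vector' COLLATE utf8mb4_general_ci;  -- 1 (treated as equal)
SELECT 'vector' = 'Vector' COLLATE utf8mb4_bin;         -- 0 (treated as different)
```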
I have a MySQL database in which I store more than 100,000 keywords in different languages. For example, I have three columns: [id] [turkish (utf8_turkish_ci)] [german (utf8)].
Users can enter a German or a Turkish word in the search box. If the user enters a German word, all is fine and it prints out the Turkish word, but how do I solve it for the Turkish one? I ask because each language has its own additional characters, like ä ü ö ş etc.
So should I use
mb_convert_encoding
to convert the string? But then how do I check whether it is a German or a Turkish string? I think that would be too complex. Or is the encoding of the tables wrong?
I'm stuck now, so how do I implement this so that users can enter keywords in either language?
You have several issues to solve to make this work correctly.
First, you've chosen the utf8 character set to hold all your text. That is a good choice. If this is a new-in-2016 application, you might choose the utf8mb4 character set instead. Once you have chosen a character set your users should be able to read your text.
Second, for the sake of searching and sorting (WHERE and ORDER BY) you need to choose an appropriate collation for each language. For modern German, utf8_general_ci will work tolerably well. utf8_unicode_ci works a little better if you need standard lexical ordering. Read this: http://dev.mysql.com/doc/refman/5.7/en/charset-unicode-sets.html
For modern Spanish, you should use utf8_spanish_ci. That's because in Spanish the N and Ñ characters are not considered the same. I don't know whether the general collation works for Turkish.
Notice that you seem to have confused the notions of character set and collation in your question. You've mentioned a collation with your Turkish column and a character set with your German column.
You can explicitly specify character set and collation in queries. For example, you can write
WHERE _utf8 'München' COLLATE utf8_unicode_ci = table.name;
In this expression, _utf8 'München' is a character constant, and
constant COLLATE utf8_unicode_ci = table.name
is a query specifier which includes an explicit collation name. Read this: http://dev.mysql.com/doc/refman/5.7/en/charset-collate.html
Third, you may want to assign a default collation to each language specific column. Default collations are baked into indexes, so they'll help accelerate searching.
Fourth, your users will need to use an appropriate input method (keyboard mapping, etc) to present data to your application. Turkish-language users hopefully know how to type Turkish words.
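Putting the second and third points together, a sketch of per-column default collations (the table and column layout are hypothetical; the utf8mb4 variants of the collations are used here):

```sql
-- Hypothetical keywords table: each language column gets a collation
-- tuned to that language, and an index that bakes the collation in.
CREATE TABLE keywords (
    id      INT PRIMARY KEY AUTO_INCREMENT,
    turkish VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_turkish_ci,
    german  VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
    INDEX (turkish),
    INDEX (german)
);

-- Each comparison uses the column's own default collation,
-- so the index can be used:
SELECT turkish FROM keywords WHERE german  = 'München';
SELECT german  FROM keywords WHERE turkish = 'şehir';
```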
I have a word list stored in MySQL, and its size is around 10k words. The column is marked as unique. However, I cannot insert both the full-width and the half-width version of a punctuation mark.
Here are some examples:
(half-width, full-width)
('?', '？')
('/', '／')
The purpose is that I have many articles containing both full-width and half-width characters, and I want to find out whether the articles contain these words. I use PHP to do the comparison, and it can tell that '?' is different from '？'. Is there any way to do the same in MySQL? Or is there some way to make PHP treat them as equal?
I use utf8_unicode_ci for the database encoding, and the column also uses utf8_unicode_ci. When I run these queries, both return the same record, '？測試':
SELECT word FROM word_list WHERE word='?測試'
SELECT word FROM word_list WHERE word='？測試'
The most likely explanation is a character set translation issue; for example, the column you are storing the value in is defined with the latin1 character set.
But it's not necessarily the character set of the column that's causing the issue. It's a character set conversion happening somewhere.
If you aren't familiar with character set encodings, I recommend consulting the source of all knowledge: Google.
I highly recommend the two top hits for this search:
what every programmer needs to know about character encoding
http://www.joelonsoftware.com/articles/Unicode.html
http://kunststube.net/encoding/
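If the goal is simply to make MySQL distinguish full-width from half-width characters, one per-query option (a sketch, assuming a utf8mb4 connection and the utf8 column from the question) is to force a binary collation, which compares code points rather than collation weights:

```sql
-- '？' (U+FF1F, full-width) and '?' (U+003F, half-width) are different
-- code points, so a _bin collation keeps them distinct:
SELECT '？測試' = '?測試' COLLATE utf8mb4_bin;  -- 0 (different)

-- Applying the collation to the column (its character set is utf8 in
-- the question, so utf8_bin matches) without altering the table:
SELECT word FROM word_list WHERE word COLLATE utf8_bin = '？測試';
```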
Basically, I feel like the problem is simple, but I can't find any fix for it.
In a login form, I use PHP to query my database, which checks the passed username and password by selecting from the database table any row that has those two values.
What seems to be the problem: when I log in with, for example,
user: mm
pass: oo
that works, and that is right, as those values are in the db table.
But now if I use
user: MM
pass: oo
it still works?? Which it should not, as my db only has the user 'mm', not 'MM'.
I need it to distinguish between upper and lower case,
because in other rows I have a mix of upper- and lower-case letters.
You'll have to change the collation of the column from a case-insensitive one to a case-sensitive or binary one, such as latin1_general_cs or utf8mb4_bin (note that MySQL does not ship a utf8_general_cs collation; for utf8/utf8mb4 the case-sensitive choice is a _bin collation).
you need to use case sensitive encoding
From 10.1.2. Character Sets and Collations in MySQL
There is a convention for collation names: They start with the name of the character set with which they are associated, they usually include a language name, and they end with _ci (case insensitive), _cs (case sensitive), or _bin (binary).
so use a case-sensitive collation like latin1_general_cs, or a binary one like utf8_bin
You can update the table using the query:
ALTER TABLE `your_table` CHARSET=latin1 COLLATE=latin1_general_cs;
Where it says latin1_general_cs, the cs specifies that it is case sensitive. If you ever desire it not to be case sensitive in the future you can just use this query:
ALTER TABLE `your_table` CHARSET=latin1 COLLATE=latin1_general_ci;
The ci specifies that it is case insensitive.
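If altering the table is not an option, a per-query sketch (the users table and its column names are assumptions, not from the question) that forces a case-sensitive comparison:

```sql
-- BINARY forces a byte-by-byte comparison for just this query:
SELECT id FROM users
WHERE BINARY username = 'MM' AND BINARY password = 'oo';
-- finds nothing when the stored username is 'mm'

-- Equivalent with an explicit collation, assuming a latin1 column:
SELECT id FROM users
WHERE username COLLATE latin1_general_cs = 'MM';
```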