I am using Laravel but can use raw SQL if needed.
I have multiple JSON fields, and that JSON holds the translated data for each language. So, for example, the post table has a title field and that title is {"en": "Title in English", "et": "Title in Estonian"}.
Now I need to build a fulltext search that covers these fields; for some columns I need to search the term across all languages, not just the active one.
I am using the latest stable MariaDB.
If I create a fulltext index on these fields I can search fine, but the search is case sensitive.
How can I make the search case insensitive? The JSON fields are currently longtext with the utf8mb4_bin collation, which Laravel chose for JSON columns. I know bin is a case-sensitive collation, but what else could I use so the ability to find records by a translated slug (for example) would still be there?
In Laravel, one can search like ->where('slug->en', 'some-post-slug'), so I need to keep Laravel's JSON field functionality intact.
I have been trying to achieve this for two days now and need some external input.
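For illustration, the kind of query involved looks roughly like this (a simplified sketch; posts, title and description are example names, not my exact schema):

-- fulltext search across the JSON translation columns (requires a FULLTEXT index on exactly these columns)
SELECT *
FROM posts
WHERE MATCH(title, description) AGAINST('search term' IN NATURAL LANGUAGE MODE);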
The MySQL documentation states the following:
By default, the search is performed in case-insensitive fashion. To perform a case-sensitive full-text search, use a case-sensitive or binary collation for the indexed columns. For example, a column that uses the utf8mb4 character set can be assigned a collation of utf8mb4_0900_as_cs or utf8mb4_bin to make it case-sensitive for full-text searches.
I bet it's the same in MariaDB. If you change your collation, case-insensitive searches will work as expected. You can change it like this:
ALTER TABLE mytable
CONVERT TO CHARACTER SET utf8mb4
COLLATE utf8mb4_general_ci;
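If you would rather not convert the whole table, a narrower option is to change the collation of just the JSON column. This is an untested sketch, assuming the column is posts.title stored as LONGTEXT; double-check that any json_valid CHECK constraint Laravel/MariaDB put on the column survives the change:

-- changes only this column's collation (the table is rebuilt, so it can take a while on large tables)
ALTER TABLE posts
MODIFY title LONGTEXT
CHARACTER SET utf8mb4
COLLATE utf8mb4_general_ci;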
Related
I want to sort a column alphabetically using the orderBy method in Laravel Eloquent, like below:
Post::where("some conditions")->orderBy('name','desc')->get();
I'm working with the Persian language, and the result seems to be ordered using the Arabic alphabet.
I want to change only this default alphabet ordering. Can Eloquent and MySQL handle this at all?
Here's an answer on how you could accomplish this in MySQL:
Ordering non-english letters in MySQL
Here's how to configure a default collation setting in Laravel:
https://laravel.com/docs/8.x/database#read-and-write-connections
Or you could use Laravel's raw expressions to add the collation yourself on a specific query:
https://laravel.com/docs/8.x/queries#raw-expressions
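As a rough sketch of what the raw approach boils down to in SQL (assuming the column is utf8mb4 and that utf8mb4_persian_ci is available on your server; check with SHOW COLLATION LIKE '%persian%'):

-- force a Persian collation for the ordering of this query only
SELECT *
FROM posts
ORDER BY name COLLATE utf8mb4_persian_ci DESC;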
I have a MySQL database of LaTeX snippets. Each snippet contains normal text and LaTeX commands. The commands are all preceded by a backslash \. I would like to search through these snippets such that the text is case insensitive but the commands are case sensitive. So searching for vector gives results where the text contains either vector or Vector, whereas searching for \Vector will not return \vector.
Your question is about collations. A column in a table has a collation setting, for example utf8mb4_general_ci or utf8mb4_bin. The first of those is case-insensitive, meaning a search like this
WHERE col LIKE '%vector%'
will yield rows containing both ...Vector... and ...vector...
If you use the utf8_bin (binary match, case sensitive) collation, that search excludes ...Vector...
You can specify the collation to use in a filter clause, like so:
WHERE col LIKE '%Vector%' COLLATE utf8_bin;
and that will force MySQL to use the collation you want. If you don't specify the collation (the normal case), MySQL uses the collation specified at the time you created the table or the column.
When you specify a collation explicitly, and it's different from the column's collation, MySQL's query planner cannot use an index on the column to satisfy your query, so it might be slower. Of course, the filter-clause pattern column LIKE '%value%' (with a leading %) also prevents the use of the index.
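Putting that together for the snippets case, a sketch might look like this (snippets and body are hypothetical names; use utf8mb4_bin instead of utf8_bin if your connection/column character set is utf8mb4):

-- case-insensitive: matches both "vector" and "Vector" (relies on the column's _ci collation)
SELECT * FROM snippets WHERE body LIKE '%vector%';

-- case-sensitive: matches \Vector but not \vector by forcing a binary collation;
-- the four backslashes end up as one literal backslash after string-literal and LIKE escaping
SELECT * FROM snippets WHERE body LIKE '%\\\\Vector%' COLLATE utf8_bin;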
I have a FULLTEXT search in a table of part numbers. Some part numbers have hyphens.
The table engine is InnoDB using MySQL 5.6.
The problem I am having is that MySQL treats the hyphen (-) character as a word separator.
So I created a new MySQL charset collation in which the hyphen is treated as a letter.
I followed this tutorial: http://dev.mysql.com/doc/refman/5.0/en/full-text-adding-collation.html
I made a test table using the syntax at the bottom of the link; however, I used the InnoDB engine. I searched for '----' and received "syntax error, unexpected '-'".
However, if I change the engine to MyISAM, I get the correct result.
How do I get this to work with the InnoDB engine?
It seems that with MySQL it's one step forward and two steps back.
Edit: I found this link for 5.6 (http://dev.mysql.com/doc/refman/5.6/en/full-text-adding-collation.html), which is the same tutorial using InnoDB as the engine.
But here's my test:
create table test (a TEXT CHARACTER SET latin1 COLLATE latin1_fulltext_ci, FULLTEXT INDEX(a)) ENGINE=InnoDB
Added a row that is just "----"
select * from test where MATCH(a) AGAINST('----' IN BOOLEAN MODE)
syntax error, unexpected '-'
Drop the table and recreate it with MyISAM:
create table test (a TEXT CHARACTER SET latin1 COLLATE latin1_fulltext_ci, FULLTEXT INDEX(a)) ENGINE=MyISAM
Added a row that is just "----"
select * from test where MATCH(a) AGAINST('----' IN BOOLEAN MODE)
1 result
I encountered this exact issue recently. I had previously added a custom collation per the docs, was using MyISAM, and it was working fine. Then a few weeks ago I switched to InnoDB and things stopped working. I tried:
Rebuilding my collation and A/B testing to make sure they are working
Disabling stopwords by setting innodb_ft_enable_stopword to 0
Rebuilding my fulltext table and index
In the end I took a different approach since InnoDB doesn't seem to follow the same rules as MyISAM when it comes to fulltext indexing. This is a bit hacky but works for my application:
Create a special search column containing the data I need to search for. This column has a fulltext index and exists for the sole purpose of doing a fulltext search, which is still very fast on a table with millions of rows.
Search/replace all - in my search column with an unused character that is considered a "word" character. See my question here regarding this: https://dba.stackexchange.com/questions/248607/which-characters-are-considered-word-characters. Figuring out what word characters are turns out to be not so easy but here are a few that worked for me: Ω œ π µ. These characters are probably not used in the data you need to be searching but they will be recognized by the parser as searchable characters. In my case I replace - with Ω. Since I only need the row ID, it doesn't matter what the data in this column looks like to human eyes.
Revise my updates and inserts to keep the search column data and substitutions up to date. In my case this was easy since there is only one place in the application that updates this particular table. A couple of triggers could also be used to handle this:
CREATE TRIGGER update_search BEFORE UPDATE ON animals
FOR EACH ROW SET NEW.search = REPLACE(NEW.animal_name, '-', 'Ω');
CREATE TRIGGER insert_search BEFORE INSERT ON animals
FOR EACH ROW SET NEW.search = REPLACE(NEW.animal_name, '-', 'Ω');
Replace - in my search queries with Ω.
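The query side then does the same substitution before matching. A sketch against the hypothetical animals table from the triggers above (the user typed 'blue-whale', and the application has already replaced '-' with 'Ω'):

-- matches rows whose search column contains the substituted term
SELECT id, animal_name
FROM animals
WHERE MATCH(search) AGAINST('blueΩwhale' IN BOOLEAN MODE);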
Voila. Here's a fiddle demonstrating it: https://www.db-fiddle.com/f/x1WZpZP6wcqbTTvTEFFXYc/0
The above workaround might not be realistic for every application but hopefully it's useful for someone. Would be great to have a real solution to this for InnoDB.
The InnoDB FULLTEXT search is probably treating the hyphens as stop-words. So when it gets to the second hyphen, it expects a word, not a hyphen. This would explain the 'syntax error'.
The reason it doesn't do this in MyISAM is that the InnoDB implementation of FULLTEXT indexes is quite different; after all, they were only added for InnoDB in MySQL 5.6.
What can you do about this? It seems you can influence the list of stop-words through a special table: http://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_ft_user_stopword_table. This could stop MySQL from treating hyphens as stop-words.
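As a rough sketch of that approach (the database and table names are examples; the stopword table must use InnoDB and have a single VARCHAR column named value, and the FULLTEXT index has to be rebuilt afterwards for the change to take effect):

-- an empty stopword table effectively disables the default stopword list for InnoDB fulltext
CREATE TABLE my_stopwords (value VARCHAR(30)) ENGINE = InnoDB;
SET GLOBAL innodb_ft_user_stopword_table = 'mydb/my_stopwords';
-- then drop and re-create the FULLTEXT index so it picks up the new stopword list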
It has been written many times already that OpenCart's basic search isn't good enough. Well, I have come across this issue:
When a customer searches for a product in my country (Slovakia, UTF-8), they probably won't use diacritics. So they type "cucoriedka" and find nothing.
But there is a product named "čučoriedka" in the database, and I want it to be displayed too, since that's what they were looking for.
Do you have an idea how to get this to work? The simpler the better!
I'm ignorant of Slovak, I am sorry. But the Slovak collation utf8_slovak_ci treats the Slovak letter č as distinct from c. (Do the surnames starting with Č all come after those starting with C in your telephone directories? They probably do. The creators of MySQL certainly think they do.)
The collation utf8_general_ci treats č and c the same. Here's an SQL Fiddle demonstrating all this: http://sqlfiddle.com/#!9/46c0e/1/0
If you change the collation of the column containing your product name to utf8_general_ci, you will get a more search-friendly table. Suppose your table is called product and the column with the name in it is called product_name. Then this SQL data-definition statement will convert the column as you require. You should look up the actual datatype of the column instead of using varchar(nnn) as I have done in this example.
alter table product modify product_name varchar(nnn) collate utf8_general_ci
If you can't alter the table, then you can change your WHERE clause to work like this, specifying the collation explicitly.
WHERE 'userInput' COLLATE utf8_general_ci = product_name
But this will be slower to search than changing the column collation.
You can use MySQL's SOUNDEX() function or the SOUNDS LIKE operator.
These compare phonetics.
The accuracy of SOUNDEX is doubtful for languages other than English, but it can be improved if you use it like this:
select soundex('ball')=soundex('boll') from dual
SOUNDS LIKE can also be used.
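For example (product and product_name are hypothetical names):

-- SOUNDS LIKE is shorthand for SOUNDEX(product_name) = SOUNDEX('cucoriedka')
SELECT * FROM product WHERE product_name SOUNDS LIKE 'cucoriedka';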
Using a combination of both SOUNDEX() and SOUNDS LIKE will improve accuracy.
Kindly refer to the MySQL documentation for details, or see mysql-sounds-like-and-soundex.
What MySQL collation should I use for my tables to support all European languages, all Latin American languages, and maybe Chinese and other Asian languages? Thanks!
What is the rule when it comes to using indexes on MySQL table columns? When should you not use an index for a column in a table?
UTF-8 would probably be the best choice; more specifically, utf8_general_ci.
Indexes should not be added to a table that you're going to perform a huge number of insertions into. Indexes speed up SELECT queries, but they need to be updated every time you INSERT into the table. So, if you have a table that, say, stores news articles, suitable indexes might be on the title or on something else you might want to "search" for.
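A minimal sketch tying both points together (articles and its columns are just example names):

-- utf8 / utf8_general_ci as the table default, plus an index on the column you expect to search by
CREATE TABLE articles (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    body TEXT,
    INDEX idx_title (title)
) ENGINE = InnoDB DEFAULT CHARSET = utf8 COLLATE = utf8_general_ci;

If you need characters outside the Basic Multilingual Plane (emoji, some rarer CJK characters), utf8mb4 with utf8mb4_general_ci is the safer choice.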
Hope this clears some things up.
utf8, more specifically utf8_general_ci, is a universal character set and collation.
You should not use an index on a column when you're sure you will not search by it (via a WHERE clause).