I have a copy of the GeoNames database loaded into MySQL, and a PHP application that allows users to search it for their city. It works fine if they type the city name in English, but I want them to be able to search in their native language.
For example, instead of asking a Japanese speaker to search for Tokyo, they should be able to search for 東京.
The GeoNames documentation describes the alternatenames column as "alternatenames, comma separated, ascii names automatically transliterated, convenience attribute from alternatename table, varchar(10000)".
For example, the alternatenames value for the Tokyo row is Edo,TYO,Tochiu,Tocio,Tokija,Tokijas,Tokio,Tokió,Tokjo,Tokyo,Toquio,Toquio - dong jing,Toquio - æ±äº¬,Tòquio,Tókýó,Tóquio,TÅkyÅ,dokyo,dong jing,dong jing dou,tokeiyw,tokkiyo,tokyo,twkyw,twqyw,Τόκιο,Токио,Токё,Токіо,ÕÕ¸Õ¯Õ«Õ¸,טוקיו,توكيو,توکیو,طوكيو,ܛܘܟÜܘ,ܜܘܟÜܘ,टोकà¥à¤¯à¥‹,டோகà¯à®•à®¿à®¯à¯‹,โตเà¸à¸µà¸¢à¸§,ტáƒáƒ™áƒ˜áƒ,东京,æ±äº¬,æ±äº¬éƒ½,ë„ì¿„.
Those values don't contain 東京 exactly, but I'm guessing that they contain a form of it that has been encoded or converted in some way. So I'm assuming that if I perform the same encoding/conversion on my search string, then I'll be able to match the row. For example:
mysql_query( sprintf( "
SELECT * FROM geoname
WHERE
MATCH( name, asciiname, alternatenames )
AGAINST ( %s )
LIMIT 1",
iconv( 'UTF-8', 'ASCII', '東京' )
) );
The problem is that I don't know what that conversion would be. I've tried lots of combinations of iconv(), mb_convert_encoding(), etc., but with no luck.
The MySQL table looks like this:
CREATE TABLE `geoname` (
`geonameid` int(11) NOT NULL DEFAULT '0',
`name` varchar(200) DEFAULT NULL,
`asciiname` varchar(200) DEFAULT NULL,
`alternatenames` mediumtext,
`latitude` decimal(10,7) DEFAULT NULL,
`longitude` decimal(10,7) DEFAULT NULL,
`fclass` char(1) DEFAULT NULL,
`fcode` varchar(10) DEFAULT NULL,
`country` varchar(2) DEFAULT NULL,
`cc2` varchar(60) DEFAULT NULL,
`admin1` varchar(20) DEFAULT NULL,
`admin2` varchar(80) DEFAULT NULL,
`admin3` varchar(20) DEFAULT NULL,
`admin4` varchar(20) DEFAULT NULL,
`population` int(11) DEFAULT NULL,
`elevation` int(11) DEFAULT NULL,
`gtopo30` int(11) DEFAULT NULL,
`timezone` varchar(40) DEFAULT NULL,
`moddate` date DEFAULT NULL,
PRIMARY KEY (`geonameid`),
KEY `timezone` (`timezone`),
FULLTEXT KEY `namesearch` (`name`,`asciiname`,`alternatenames`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8mb4
Can anyone point me in the right direction?
When I download the Japan file and set up a database like this:
CREATE TABLE geonames (
geonameid SERIAL,
name varchar(200),
asciiname varchar(200),
alternatenames varchar(10000),
latitude float,
longitude float,
featureclass varchar(1),
featurecode varchar(10),
countrycode varchar(2),
cc2 varchar(200),
admin1code varchar(20),
admin2code varchar(80),
admin3code varchar(20),
admin4code varchar(20),
population BIGINT,
elevation INT,
dem INT,
timezone varchar(40),
modificationdate DATE
) CHARSET utf8mb4;
Then I load the data like this:
LOAD DATA INFILE '/tmp/JP.txt' INTO TABLE geonames CHARACTER SET utf8mb4;
And select it like this:
SELECT alternatenames FROM geonames WHERE geonameid=1850147\G
I get this:
*************************** 1. row ***************************
alternatenames: Edo,TYO,Tochiu,Tocio,Tokija,Tokijas,Tokio,Tokió,Tokjo,Tokyo,Toquio,Toquio - dong jing,Toquio - 東京,Tòquio,Tókýó,Tóquio,Tōkyō,dokyo,dong jing,dong jing dou,tokeiyw,tokkiyo,tokyo,twkyw,twqyw,Τόκιο,Токио,Токё,Токіо,Տոկիո,טוקיו,توكيو,توکیو,طوكيو,ܛܘܟܝܘ,ܜܘܟܝܘ,टोक्यो,டோக்கியோ,โตเกียว,ტოკიო,东京,東京,東京都,도쿄
I can also do a search like this:
SELECT name FROM geonames WHERE alternatenames LIKE '%,東京,%';
Which is a long way of saying: Note the charset declaration when I created the table. I believe this is what you failed to do when you created your database.
Recommended reading:
https://www.joelonsoftware.com/articles/Unicode.html
http://kunststube.net/encoding/
In terms of MySQL, what is of critical importance is the character set of the MySQL connection. That's the character set that MySQL Server thinks the client is using in its communication.
SHOW VARIABLES LIKE 'character_set%';
If that isn't set right (for example, the client is sending latin1/ISO-8859-1 but MySQL Server thinks it's receiving UTF-8, or vice versa), there's potential for mojibake.
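As a sketch of what that looks like from PHP (using PDO rather than the old mysql_* API; host, database name and credentials are placeholders), declaring the connection character set means no manual conversion of the search string is needed at all:

<?php
// Open the connection declaring utf8mb4, then run the full-text search
// with a bound parameter. DSN and credentials are placeholders.
$pdo = new PDO(
    'mysql:host=localhost;dbname=geonames;charset=utf8mb4',
    'username',
    'password',
    [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]
);

$stmt = $pdo->prepare(
    'SELECT * FROM geoname
     WHERE MATCH(name, asciiname, alternatenames) AGAINST (?)
     LIMIT 1'
);
$stmt->execute(['東京']);   // sent as UTF-8 over the utf8mb4 connection; no iconv()/mb_convert_encoding() needed
$row = $stmt->fetch(PDO::FETCH_ASSOC);

(Whether the full-text index can actually match 東京 is a separate issue; see the note on ideographic languages below.)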
Also of importance is the character set of the alternatenames column.
One issue when dealing with multibyte character sets is going to be the PHP sprintf function. Many of the string handling functions in PHP have "multibyte" equivalents that correctly handle strings containing multibyte characters.
https://secure.php.net/manual/en/book.mbstring.php
Unfortunately, there is no builtin mb_sprintf function.
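For example (a small sketch, nothing GeoNames-specific), the byte-oriented and multibyte functions give different answers for the same string:

$city = '東京';
echo strlen($city);                    // 6: counts bytes, strlen() is not multibyte-aware
echo mb_strlen($city, 'UTF-8');        // 2: counts characters
echo mb_substr($city, 0, 1, 'UTF-8');  // 東: substr() would return a broken byte sequence here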
For a more detailed description of string handling in PHP, including multibyte characters and character sets:
https://secure.php.net/manual/en/language.types.string.php#language.types.string.details
excerpt:
Ultimately, this means writing correct programs using Unicode depends on carefully avoiding functions that will not work and that most likely will corrupt the data and using instead the functions that do behave correctly, generally from the intl and mbstring extensions. However, using functions that can handle Unicode encodings is just the beginning. No matter the functions the language provides, it is essential to know the Unicode specification.
Also, a Google search for "utf8 all the way through" may return some helpful notes. But be aware that this mantra is not a silver bullet or a panacea for these issues.
Another possible issue, noted in the MySQL Reference Manual:
https://dev.mysql.com/doc/refman/5.7/en/fulltext-restrictions.html
13.9.5 Full-Text Restrictions
Ideographic languages such as Chinese and Japanese do not have word delimiters. Therefore, the built-in full-text parser cannot determine where words begin and end in these and other such languages.
In MySQL 5.7.6, a character-based ngram full-text parser that supports Chinese, Japanese, and Korean (CJK), and a word-based MeCab parser plugin that supports Japanese are provided for use with InnoDB and MyISAM tables.
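So, on MySQL 5.7.6 or later, one option (a sketch, assuming the geoname table from the question) is to rebuild the full-text index with the ngram parser so that 東京 actually gets tokenized and indexed:

ALTER TABLE geoname
    DROP INDEX namesearch,
    ADD FULLTEXT INDEX namesearch (name, asciiname, alternatenames) WITH PARSER ngram;

SELECT name FROM geoname
WHERE MATCH(name, asciiname, alternatenames) AGAINST ('東京' IN NATURAL LANGUAGE MODE)
LIMIT 1;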
Related
I tried to store GPS latitude and longitude values in DMS format, with the degree and apostrophe symbols, in MySQL with PHP.
The previously suggested solution is to convert them into decimal.
Is there any particular datatype for storing the values in MySQL with the symbols, like below:
12°53′17″N
80°13′52″E
The problem is that the symbols in 12°53′17″ N, 80°13′52″ E get replaced with other characters when stored.
Create your table with the character set set to UTF-8 Unicode (utf8). If you specify a Unicode character set for the table, it will accept the original latitude/longitude values.
Reference code for SQL:
ALTER TABLE `gps`
DEFAULT CHARACTER SET=utf8;
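Note that changing a table's default character set only affects columns added afterwards; to convert columns that already exist you would need something along these lines (a sketch; back up the table first):

ALTER TABLE `gps` CONVERT TO CHARACTER SET utf8;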
The following works for me
CREATE TABLE `gps` (
`lat` varchar(255) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
I have a table 'emails' with a column 'kategorija' that stores values from checkboxes (the values come from the 'kategorija' column of the table 'proizvodi').
The 'email' column stores the users' email addresses. Users can tick checkboxes (the values from the 'kategorija' column of 'proizvodi'), and the checked values are stored in the 'kategorija' column of 'emails'.
When a new product (with a category value like 'alati', 'satovi', etc.) is added to the database, I need to somehow automatically send an email, saying that a new product is available in that 'kategorija', to every user who selected it in the checkboxes, using the addresses stored in the 'emails' table.
Table 'emails':
CREATE TABLE IF NOT EXISTS `emails` (
`id` int(15) NOT NULL,
`email` varchar(255) CHARACTER SET utf8 COLLATE utf8_croatian_ci NOT NULL,
`kategorija` varchar(255) COLLATE utf8_bin NOT NULL
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
Table 'proizvodi':
CREATE TABLE IF NOT EXISTS `proizvodi` (
`id` int(11) NOT NULL,
`naziv` varchar(55) CHARACTER SET utf8 DEFAULT NULL,
`cijena` decimal(8,2) DEFAULT NULL,
`slika` text CHARACTER SET utf8,
`opis` text CHARACTER SET utf8,
`kategorija` enum('alati','glazbeni_instrumenti','smartphone','laptop','fotoaparat_kamera','tehnicka_roba_ostalo','sportska_oprema','satovi','kucanski_aparati','ostalo_ostalo') CHARACTER SET utf8 DEFAULT NULL
) ENGINE=InnoDB AUTO_INCREMENT=196 DEFAULT CHARSET=latin1;
I might not understand the question properly. I think it is quite simple:
When you insert the new product, retrieve its product id and category.
Then run something like SELECT email FROM emails WHERE kategorija LIKE '%...%' (with the new product's category in place of ...) to get the email addresses to use.
Finally iterate over those email addresses and send out the emails with your favorite email lib in your favorite language.
You might want to find a better solution for storing multiple categories per email address; right now you are using a plain string, which is not exactly bulletproof. You should rather use an n:m relation between a (yet to be defined) category table and the email table, and reference that category table via a foreign key instead of using a hardcoded enum in the table definition of products, as sketched below. Having a dedicated category table allows you to add further categories at any time without changing any code or table structure.
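A rough sketch of such a layout (table and column names are only illustrative, and it assumes emails.id is the primary key of the existing emails table):

CREATE TABLE category (
    id int unsigned NOT NULL AUTO_INCREMENT,
    name varchar(100) NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE email_category (
    email_id int NOT NULL,
    category_id int unsigned NOT NULL,
    PRIMARY KEY (email_id, category_id),
    FOREIGN KEY (email_id) REFERENCES emails (id),
    FOREIGN KEY (category_id) REFERENCES category (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;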
I (although I am not a native English speaker either) find it a very good idea to use plain English terms for table names, column names, variables, etc. This makes it easier to involve other developers later on.
Some other databases, Oracle for example, have a built-in procedural language which allows you to send an email right from the database. Combining that with a database trigger that fires when a new product is inserted, you can have this whole functionality implemented in about 10 lines of PL/SQL code.
I find Oracle well worth its price, and if you can't afford it, Oracle XE is available free of charge. With the built-in PL/SQL language and lots of built-in packages, it makes database-oriented application development a snap. In my experience it can be two orders of magnitude faster than, say, Hibernate plus MySQL in a heavy-I/O transactional application.
Sorry, it is not possible to achieve your goal in MySQL. See https://stackoverflow.com/questions/31667462/how-to-send-an-email-from-mysql-using-a-stored-procedure for the reasons.
A possible workaround, in which the database writes outgoing emails to files that are then processed by another application, is described here: How to send email from MySQL 5.1
Another workaround is to create a custom UDF which can be called from your trigger on the product table. User-defined functions (UDFs) allow you to call external code from your database, so you can implement whatever feature you need this way.
If you could use Oracle, the built-in UTL_MAIL package allows you to send email directly from the database, for example from a trigger on the product table.
An example of sending an email is here: http://www.orafaq.com/wiki/Send_mail_from_PL/SQL
I have created a new table 'obavijest'.
CREATE TABLE IF NOT EXISTS `obavijest` (
`id` int(10) unsigned NOT NULL,
`email` varchar(255) COLLATE utf8_bin NOT NULL,
`kategorija` varchar(255) COLLATE utf8_bin NOT NULL
) ENGINE=InnoDB AUTO_INCREMENT=14 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
On the table 'emails' I created a trigger to automatically copy values into the table 'obavijest'.
DELIMITER $$
CREATE TRIGGER `obavijesttrig` AFTER INSERT ON `emails`
FOR EACH ROW BEGIN
    INSERT INTO obavijest (id, email, kategorija)
    VALUES (NEW.id, NEW.email, NEW.kategorija);
END$$
DELIMITER ;
The only thing left is to create a PHP script that fetches the email addresses from the 'obavijest' table, sends mail to them, and is run as a cron job, roughly like the sketch below.
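A rough sketch of such a script (the DSN, credentials, message text, and the decision to delete rows after sending are all assumptions; PHP's built-in mail() is used for simplicity):

<?php
// Cron job sketch: read pending rows from 'obavijest', send one mail per row,
// then remove the rows that were sent. DSN and credentials are placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=shop;charset=utf8', 'username', 'password');

$rows = $pdo->query('SELECT id, email, kategorija FROM obavijest')->fetchAll(PDO::FETCH_ASSOC);
$done = $pdo->prepare('DELETE FROM obavijest WHERE id = ?');

foreach ($rows as $row) {
    $subject = 'New product in category ' . $row['kategorija'];
    $body    = 'A new product is available in the category you subscribed to: ' . $row['kategorija'];
    $headers = 'Content-Type: text/plain; charset=UTF-8';

    if (mail($row['email'], $subject, $body, $headers)) {
        $done->execute([$row['id']]);   // only remove the row once mail() accepted the message
    }
}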
I am currently working on a project which is translated into 18 languages, such as Russian, German, Swedish and Chinese. I have some issues with sorting country names in different languages. For example, country names in French are sorted like this:
- États-Unis
- Éthiopie
- Afghanistan
I don't have this issue on my local server using MAMP.
My database's character set is configured as utf8 and the collation is utf8_unicode_ci. I have exactly the same configuration on the remote server.
I created a my.cnf file on my local server with the following settings in order to correctly display special characters:
[mysqld]
skip-character-set-client-handshake
collation_server=utf8_unicode_ci
character_set_server=utf8
On the remote server, the my.cnf file does not contain these lines. When I tried to add them, MySQL no longer recognised special characters, as if it were interpreting them as latin1.
I checked collation_database and all the character_set_* variables, but they are all set to utf8 / utf8_unicode_ci.
Here is the SQL code for the creation of the table :
CREATE TABLE esth_countries (
country_id varchar(2) COLLATE utf8_unicode_ci NOT NULL,
name varchar(100) COLLATE utf8_unicode_ci NOT NULL,
region varchar(40) COLLATE utf8_unicode_ci NOT NULL,
language_id varchar(2) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (country_id,language_id),
KEY language_id (language_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Special characters are correctly displayed on my remote server. The only problem is sorting with the ORDER BY clause.
It seems like there is something wrong with the remote server's configuration, but I can't figure out what.
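One way to narrow this down (a diagnostic sketch rather than a fix) is to force the collation in the query itself; if this sorts États-Unis after Afghanistan on the remote server, the stored data is fine and the session or connection collation is the culprit:

SELECT name
FROM esth_countries
WHERE language_id = 'fr'
ORDER BY name COLLATE utf8_unicode_ci;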
I've recently started using Laravel for a project I'm working on, and I'm currently having problems displaying data from my database in the correct character encoding.
My current system consists of a separate script responsible for populating the database with data, while the Laravel project is responsible for displaying it. The view that is used is set to display all text as UTF-8, which works, as I've successfully printed special characters in the view. Text from the database, however, is not printed as UTF-8 and does not display special characters the right way. I've tried using both Eloquent models and DB::select(), but they both show the same poor result.
The charset in database.php is set to utf8 and the collation to utf8_unicode_ci.
The database table:
CREATE TABLE `RssFeedItem` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`feedId` smallint(5) unsigned NOT NULL,
`title` varchar(250) COLLATE utf8_unicode_ci NOT NULL,
`url` varchar(250) COLLATE utf8_unicode_ci NOT NULL,
`created_at` datetime NOT NULL,
`updated_at` datetime NOT NULL,
`text` mediumtext COLLATE utf8_unicode_ci,
`textSha1` varchar(250) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `url` (`url`),
KEY `feedId` (`feedId`),
CONSTRAINT `RssFeedItem_ibfk_1` FOREIGN KEY (`feedId`) REFERENCES `RssFeed` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=6370 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
I've also set up a test page in order to see if the problem could be my database setup, but the test page prints everything just fine. The test page uses PDO to select all data and prints it on a simple HTML page.
Does anyone know what the problem might be? I've tried searching around, but besides this link I haven't found anything that might help me.
I did eventually end up solving this myself. The problem was caused by the separate script responsible for populating my database with data. It was solved by running a SET NAMES utf8 query before inserting data into the database. The original data was pulled out and then sent back in after running said query.
The reason it worked outside Laravel was simply that said query wasn't executed on my test page. If I ran the query before retrieving the data, it came out with the wrong encoding, because the query told MySQL the data was encoded as utf8 when it really wasn't.
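For reference, a sketch of what that looks like in the populating script (DSN, credentials and the inserted values are placeholders); setting charset in the PDO DSN has roughly the same effect as issuing SET NAMES on connect:

<?php
// Declare the connection character set before inserting, so MySQL knows
// the bytes it receives are UTF-8. DSN and credentials are placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=rss;charset=utf8', 'username', 'password');
// or, on an existing connection:
$pdo->exec('SET NAMES utf8');

$title = 'Ærlig talt';                 // sample data containing non-ASCII characters
$url   = 'http://example.com/item/1';

$stmt = $pdo->prepare('INSERT INTO RssFeedItem (feedId, title, url, created_at, updated_at)
                       VALUES (?, ?, ?, NOW(), NOW())');
$stmt->execute([1, $title, $url]);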
I need ideas on how to make a completely multilanguage website. I have come across many approaches; some keep an XML file with the translated template strings. That works if I only want the main template translated, but the content needs to be translated as well.
For example, when I have a new entry in English, it should be translated into 4 other languages. Most attributes are common.
What I have so far is a table for the main website template with the attributes:
lang, tag, value
My template then does a match on lang and tag.
What is the best way to translate the rest of the website (dynamic PHP pages using MySQL)?
You need a table for languages as below:
CREATE TABLE `language` (
`langid` tinyint(3) unsigned NOT NULL AUTO_INCREMENT,
`language` varchar(35) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`langid`)
) ENGINE=InnoDB
Then for example you have a table for posts as below:
CREATE TABLE `post` (
`postid` int unsigned NOT NULL AUTO_INCREMENT,
`langid` tinyint(3) unsigned NOT NULL,
`content` TEXT CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`title` varchar(35) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`postid`)
) ENGINE=InnoDB
In the post table you need a key like langid, which refers to the specific language in the language table. In your dashboard you will then have one text box per language, and each text box refers to that specific language.
You should have another table for the site's menus, with a langid foreign key in it as well, and then you should be well on your way. Fetching content for one language is then a simple join, as sketched below.
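For example, selecting all posts in a given language (a sketch based on the tables above):

SELECT p.postid, p.title, p.content
FROM post AS p
JOIN language AS l ON l.langid = p.langid
WHERE l.language = 'english';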
Look into the gettext extension - http://php.net/manual/en/book.gettext.php
Then use a program like Poedit or simplepo to do the actual editing of the language files.
IMO, this is the best way I have found for a multilingual site.
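A minimal gettext setup might look like this (the locale directory layout, the 'messages' domain, and the de_DE locale are assumptions; the .mo files are what you compile from Poedit):

<?php
// Minimal gettext sketch; expects ./locale/de_DE/LC_MESSAGES/messages.mo to exist.
$locale = 'de_DE.utf8';
putenv('LC_ALL=' . $locale);
setlocale(LC_ALL, $locale);

bindtextdomain('messages', __DIR__ . '/locale');
bind_textdomain_codeset('messages', 'UTF-8');
textdomain('messages');

echo _('Welcome to our website');   // prints the German translation if one exists in the .mo file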
You can also look into the Zend_Translate module