Problem: a UTF-8 encoded Cyrillic string, for example "Михаил", entered in an HTML form and saved by PHP into MySQL turns into unreadable krakozyabras (mojibake) like "Михайлович".
This is not a new problem, but I have found no solution so far... Please help if you have encountered this before.
The HTML page is UTF-8 encoded and has a properly set META tag; the saving PHP script is UTF-8 encoded (with or without BOM, it doesn't matter). The MySQL table has DEFAULT CHARSET=utf8:
CREATE TABLE `cms_deposit_request` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Any input welcome! Thanks!
Always call
mysql_set_charset('utf8');
(or the equivalent function from the API you are using) right after connecting to the database.
If there is no such function, run the SQL query
SET NAMES utf8
in the same place.
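A minimal sketch of the same advice with mysqli, since the mysql_* functions were removed in PHP 7. The helper name openUtf8Connection and its parameters are hypothetical; set_charset() is the mysqli counterpart of mysql_set_charset().

```php
<?php
/**
 * Open a MySQL connection and set the connection charset before any query.
 * Hypothetical helper: the name and parameters are illustrative only.
 */
function openUtf8Connection(string $host, string $user, string $pass, string $db, string $charset = 'utf8'): mysqli
{
    // Make mysqli throw exceptions instead of failing silently.
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

    $conn = new mysqli($host, $user, $pass, $db);

    // Counterpart of mysql_set_charset('utf8'). Preferred over a raw
    // "SET NAMES" query because it also tells the client library which
    // charset is in use (relevant for real_escape_string()).
    $conn->set_charset($charset);

    return $conn;
}
```

Every query issued on the returned connection is then sent and received as UTF-8, which should match the table's DEFAULT CHARSET=utf8.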
It's actually my fault that I did not think about this earlier: my remote server's MySQL version (on shared hosting) is 5.5.6, but my local MySQL version is 5.7.19.
I developed a Laravel (v6.6.0) web application and ran the migrations on the very first run, but as it's purely a personal project, I kept modifying the database by hand where and how necessary (off the record, I kept changing the migration files as well, though I never ran them after the first time).
I migrated all the data from some other tables and my application was ready to deploy. But when I exported the local database tables and imported them into the remote database, I got a well-known error:
Specified key was too long; max key length is 767 bytes
I actually ignored it because all the tables seemed to import nicely. But recently I found the catch: all the AUTO_INCREMENT and PRIMARY KEY definitions are missing on my remote database.
I searched as much as I could, but all the solutions suggest deleting the database and creating it again with UTF-8, which is not an option in my case. A PHP-side solution like the following does not apply either, because I get the error while importing my tables with phpMyAdmin:
// File: app/Providers/AppServiceProvider.php
use Illuminate\Support\Facades\Schema;
public function boot()
{
Schema::defaultStringLength(191);
}
I also tried running the following command on my target database:
SET @global.innodb_large_prefix = 1;
But no luck. I also tried replacing all the occurrences in my local .sql file:
from utf8mb4 to utf8, and
from utf8mb4_unicode_ci to utf8_general_ci
but had no luck again.
As far as I can tell, the error comes from the longer foreign key names, like xy_section_books_price_unit_id_foreign, and at this stage, with everything done, I don't know how to refactor all the foreign keys to be 5.5 compatible.
Can anybody please shed some light on my issue?
How can I deploy my local database (v5.7) to a v5.5 MySQL database without losing my PRIMARY KEYs, FOREIGN KEYs and INDEXes, keeping the data intact?
Change your key names. You can override the very long default generated key names when you create them. See "Available Index Types" in the documentation: https://laravel.com/docs/5.8/migrations
I ran into a similar issue when migrating from SQL Server to MySQL: the autogenerated key names, which included full long namespaces, were simply too long. Replacing them all with hand-crafted unique index names got me around those problems.
You don't really need unique names in MySQL, but if you use SQLite for unit tests you do need unique names.
So instead of:
public function up()
{
....
$table->primary('id');
// generates something like work_mayeenul_islam_workhorse_models_model_name_id_primary_key
$table->index(['foobar','bazbal']);
// generates something like work_mayeenul_islam_workhorse_models_model_name_foobar_bazbal_index
}
use your own index names, which you know to be short:
public function up()
{
....
$table->primary('id', 'PK_short_namespace_modelname_id');
$table->index(['foobar', 'bazbal'], 'IX_short_namespace_modelname_foobar_bazbal');
}
Thank you @Tschallacka for your answer. My problem was that I cannot run php artisan migrate anymore because I have live data in those tables. First of all, the issue let me learn new things (thanks to my colleague Nazmul Hasan):
Lesson Learnt
Keys are unique but could even be gibberish
First, I found a pattern in the foreign keys: {table_name}_{column_name}_foreign. Similarly in index keys: {table_name}_{column_name}_index. The lesson learned is that a foreign key or index key name doesn't have to follow such a format to work. It has to be unique, but it can be anything, even gibberish. So the password_resets_email_index key can just as well be pre_idx or anything else.
But that was not the issue.
Solution
For the solution, I dug through the .sql file table by table and statement by statement. I found that only 2 of the UNIQUE key declarations raised a blocking error, and 3 other statements raised warnings:
ALTER TABLE `contents` ADD KEY `contents_slug_index` (`slug`); -- throws a warning
ALTER TABLE `foo_bar` ADD UNIQUE KEY `slug` (`slug`); -- throws an error
ALTER TABLE `foo_bar` ADD KEY `the_title_index` (`title`) USING BTREE; -- throws a warning
ALTER TABLE `password_resets` ADD KEY `password_resets_email_index` (`email`); -- throws a warning
ALTER TABLE `users` ADD UNIQUE KEY `users_email_unique` (`email`); -- throws an error
Finally, the solution came from this particular StackOverflow thread:
INNODB utf8 VARCHAR(255)
INNODB utf8mb4 VARCHAR(191)
Inspecting those tables with the knowledge from that SO thread, I found:
With the utf8mb4 character set (collation utf8mb4_unicode_ci) in MySQL 5.5/5.6, an indexed column cannot be longer than 191 characters, because 191 × 4 bytes = 764 bytes stays under the 767-byte key limit.
With the utf8 character set (collation utf8_unicode_ci) in MySQL 5.5/5.6, an indexed column cannot be longer than 255 characters (255 × 3 bytes = 765 bytes). But with utf8_unicode_ci you cannot save emoji etc.
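The 191 and 255 figures follow from dividing the 767-byte key limit by the maximum bytes per character of each character set. A small sketch of that arithmetic; the constant and function names are made up for illustration:

```php
<?php
// Maximum bytes MySQL reserves per character for each character set.
const BYTES_PER_CHAR = ['latin1' => 1, 'utf8' => 3, 'utf8mb4' => 4];

// Key length limit quoted in the error message.
const MAX_KEY_BYTES = 767;

/** Worst-case size in bytes of an index on a VARCHAR($length) column. */
function indexBytes(int $length, string $charset): int
{
    return $length * BYTES_PER_CHAR[$charset];
}

/** Longest VARCHAR length that can be fully indexed under the limit. */
function maxIndexableLength(string $charset): int
{
    return intdiv(MAX_KEY_BYTES, BYTES_PER_CHAR[$charset]);
}

foreach (['utf8', 'utf8mb4'] as $charset) {
    printf("%s: VARCHAR(%d) at most\n", $charset, maxIndexableLength($charset));
}
// utf8: VARCHAR(255) at most
// utf8mb4: VARCHAR(191) at most
```

By the same arithmetic, an indexed utf8mb4 VARCHAR(300) would need 1200 bytes, which is why the UNIQUE key on such a column fails.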
So I decided to stay with utf8_unicode_ci to keep the comparatively longer length. As a temporary remedy:
For those particular columns, I changed the collation from utf8mb4_unicode_ci to utf8_unicode_ci
Where those particular columns exceeded 255 characters, I reduced them to 255
So for example, if the table is like below:
CREATE TABLE `foo_bar` (
`id` bigint(20) UNSIGNED NOT NULL,
`cover` varchar(500) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`title` varchar(300) COLLATE utf8mb4_unicode_ci NOT NULL,
`slug` varchar(300) COLLATE utf8mb4_unicode_ci NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
I changed only the necessary columns:
CREATE TABLE `foo_bar` (
`id` bigint(20) UNSIGNED NOT NULL,
`cover` varchar(500) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`title` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`slug` varchar(255) COLLATE utf8_unicode_ci NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
And that's it. This temporary remedy is working just fine, and I didn't have to change the foreign key or index key.
Why this temporary remedy? Because eventually I'll go with MySQL 5.7+, but before that, at least try to cope with the previous versions.
I'm trying to convert a database to use utf8mb4 instead of utf8. Everything is going fine except one table:
CREATE TABLE `search_terms` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`search_term` varchar(128) NOT NULL,
`time_added` timestamp NULL DEFAULT NULL,
`count` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `search_term` (`search_term`),
KEY `search_term_count` (`count`)
) ENGINE=InnoDB AUTO_INCREMENT=198981 DEFAULT CHARSET=utf8;
Basically all it does is save an entry every time somebody searches something in a form so we can track the number of searches, very simple.
There's a unique index on search_term because we want to only have one row per search term and instead increment the count value.
However when converting to utf8mb4 I am getting duplicate entry errors. Here is the command I am running:
ALTER TABLE `search_terms` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Looking in the database I can see various examples like this:
fm2012
fm2012
fm2012
In its current utf8 character set, these are all being treated as unique and exist within the database without ever having an issue with the unique index on search_term.
But when converting to utf8mb4 they are now being considered equal and throwing an error due to that index.
I can figure out how to merge these together easily enough, but I'm concerned this may be a symptom of a greater underlying problem. I'm not really sure how this has happened or what the consequences may be, so my questions are a bit vague:
Why is utf8mb4 treating these differently to utf8?
What are the possible consequences?
Is there some way I can do a conversion so things like "ｆm2012" never appear in my database and I only have "fm2012"? (I am also using Laravel 5.1)
Your problem is the change of collation: you're using general_ci and you're converting to unicode_ci: general_ci is quite a simple collation that doesn't know much about unicode, but unicode_ci does.
The first "f" in your example string is a "Fullwidth Latin Small Letter F" (U+FF46) which is considered equal to "Latin Small Letter F" (U+0066) by unicode_ci but not by general_ci.
Normally it's recommended to use unicode_ci exactly because of its unicode-awareness but you could convert to utf8mb4_general_ci to prevent this problem.
To prevent this problem in the future, you should normalize your input before saving it in the DB. Normally you'd use NFC, but your case seems to call for NFKC. This should bring all "equivalent" strings to the same form.
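A minimal sketch of that normalization step, assuming the intl extension is loaded (it provides the Normalizer class). The function name normalizeSearchTerm is hypothetical:

```php
<?php
/**
 * Normalize a search term to NFKC before it is saved or looked up.
 * Requires the intl extension (Normalizer class).
 */
function normalizeSearchTerm(string $term): string
{
    $normalized = Normalizer::normalize($term, Normalizer::FORM_KC);

    // normalize() returns false on failure (e.g. invalid UTF-8);
    // fall back to the unmodified input in that case.
    return $normalized === false ? $term : $normalized;
}
```

With this in place, normalizeSearchTerm("\u{FF46}m2012") returns the plain ASCII "fm2012", so the fullwidth variant and the ASCII variant end up in the same row. Rows already stored would still have to be merged once by hand.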
Despite what was said previously, it is not about general_ci being more simplistic than unicode_ci. That may be true, but the point is that you need to keep the collation matching the sub-type you already have.
For example, my database is utf8_bin. I cannot convert to utf8mb4_unicode_ci nor to utf8mb4_general_ci. These commands will throw an error of a duplicate key being found. However the correct collation utf8mb4_bin completes without issues.
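As a sketch of "keep the sub-type", the helper below derives the utf8mb4 collation from the current one by swapping only the character set prefix. Both function names are hypothetical, and it assumes the server offers a utf8mb4 counterpart for the collation in question:

```php
<?php
/**
 * Map a utf8 collation to the utf8mb4 collation with the same suffix,
 * e.g. utf8_bin -> utf8mb4_bin. Hypothetical helper for illustration.
 */
function matchingUtf8mb4Collation(string $collation): string
{
    // Handles both the "utf8_" prefix and the newer "utf8mb3_" alias;
    // collations that are already utf8mb4 are returned unchanged.
    return preg_replace('/^utf8(mb3)?_/', 'utf8mb4_', $collation);
}

/** Build the conversion statement for one table. */
function buildConvertStatement(string $table, string $currentCollation): string
{
    return sprintf(
        'ALTER TABLE `%s` CONVERT TO CHARACTER SET utf8mb4 COLLATE %s;',
        $table,
        matchingUtf8mb4Collation($currentCollation)
    );
}

echo buildConvertStatement('search_terms', 'utf8_general_ci'), "\n";
// ALTER TABLE `search_terms` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
```

Because the comparison rules stay the same, values that were distinct before the conversion remain distinct afterwards and the unique index is not violated.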
When I insert a link through a form with PHP, my database puts backslashes before the links. I use tinyMCE.
Example, this is how it looks in the database:
(14, 'MULTIMÉDIA', 'multimdia', '<h2><strong>RISING HARMONY </strong></h2>\r\n<p><strong>ITT A ZENE VILÁGA URALKODIK.</strong></p>\r\n<p><strong>KEDVENCÉT ÖN IS MEGOSZTHATJA </strong><strong>AZ ALÁBBI CÍMEN:</strong></p>\r\n<p><strong>personicum#gmail.com </strong></p>', '<p><iframe src=\\"\\\\"\\\\\\\\"\\\\\\\\\\\\\\\\"/www.youtube.com/embed/1ov6USLXwGA\\\\\\\\\\\\\\\\"\\\\\\\\"\\\\"\\" width=\\"\\\\"\\\\\\\\"\\\\\\\\\\\\\\\\"270\\\\\\\\\\\\\\\\"\\\\\\\\"\\\\"\\" height=\\"\\\\"\\\\\\\\"\\\\\\\\\\\\\\\\"152\\\\\\\\\\\\\\\\"\\\\\\\\"\\\\"\\" frameborder=\\"\\\\"\\\\\\\\"\\\\\\\\\\\\\\\\"0\\\\\\\\\\\\\\\\"\\\\\\\\"\\\\"\\" allowfullscreen=\\"allowfullscreen\\"></iframe> <iframe src=\\"\\\\"\\\\\\\\"\\\\\\\\\\\\\\\\"/www.youtube.com/embed/bnv6dPQ5f88\\\\\\\\\\\\\\\\"\\\\\\\\"\\\\"\\" width=\\"\\\\"\\\\\\\\"\\\\\\\\\\\\\\\\"263\\\\\\\\\\\\\\\\"\\\\\\\\"\\\\"\\" height=\\"\\\\"\\\\\\\\"\\\\\\\\\\\\\\\\"150\\\\\\\\\\\\\\\\"\\\\\\\\"\\\\"\\" frameborder=\\"\\\\"\\\\\\\\"\\\\\\\\\\\\\\\\"0\\\\\\\\\\\\\\\\"\\\\\\\\"\\\\"\\" allowfullscreen=\\"allowfullscreen\\"> )
This should be a youtube video, inserted through tinyMCE. It does the same thing to images and any kind of links. So, my question is, why do these things appear?
Here is the table:
CREATE TABLE IF NOT EXISTS `blog_posts_seo` (
`postID` int(11) unsigned NOT NULL AUTO_INCREMENT,
`postTitle` varchar(255) DEFAULT NULL,
`postSlug` varchar(255) DEFAULT NULL,
`postDesc` text,
`postCont` text,
`postDate` datetime DEFAULT NULL,
PRIMARY KEY (`postID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=25 ;
With another table it worked fine, so it means that the problem isn't with my php, but I will copy it here if it is needed. What is this problem caused by? On the website there is a 404 error instead of the video.
If I insert them manually through MySQL, then everything is fine, but if I insert it through the form it looks like this. Also, if I use the same php with another database, it works fine. It's a paradox and I am not experienced. I couldn't find the problem.
Also, locally it works fine, it shows like this:
<p><iframe src="//www.youtube.com/embed/Lcu8SdcsYnY" width="425" height="350"></iframe></p>
Thank you, in advance.
You are running addslashes(), or other escaping functions, multiple times over your input. You might also have magic quotes on. You should keep your strings raw and escape them only right before inserting them into the database, ideally by using PDO's bound parameters.
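A minimal sketch of the bound-parameter approach. It uses an in-memory SQLite database as a stand-in for MySQL so that it runs without a server (an assumption for the demo; with MySQL only the DSN changes). The function name savePostContent is hypothetical, the table and column names come from the question:

```php
<?php
/**
 * Store raw HTML with a bound parameter and read it back.
 * Hypothetical helper; table and column names mirror the question.
 */
function savePostContent(PDO $pdo, string $html): string
{
    // No addslashes() and no manual quoting: the driver handles the value.
    $insert = $pdo->prepare('INSERT INTO blog_posts_seo (postCont) VALUES (:cont)');
    $insert->execute([':cont' => $html]);

    $select = $pdo->prepare('SELECT postCont FROM blog_posts_seo WHERE postID = :id');
    $select->execute([':id' => (int) $pdo->lastInsertId()]);

    return (string) $select->fetchColumn();
}

// In-memory SQLite stands in for MySQL so the sketch runs without a server.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE blog_posts_seo (postID INTEGER PRIMARY KEY AUTOINCREMENT, postCont TEXT)');

$html = '<iframe src="//www.youtube.com/embed/Lcu8SdcsYnY" width="425" height="350"></iframe>';
var_dump(savePostContent($pdo, $html) === $html); // bool(true)

// For comparison: escaping twice is what piles up the backslashes.
var_dump(addslashes(addslashes('"')));            // string(4) "\\\""
```

The value comes back byte for byte as it went in. Each extra round of addslashes() escapes the backslashes added by the previous round, which would explain the growing runs of backslashes in the dump.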
I'm uploading images into a little CMS on my PHP server, and now I have a file called "1372609671-Terrassenböden Watrawood.jpg" which causes some serious problems. I have downloaded everything to my Mac and debugged everything down... and I'm facing this:
In my MySQL table, everything seems fine, the "ö" appears as "ö" and I can find the file when I write a search query with the exact filename:
But my PHP code fails when doing the same query. When I get the filename through the filesystem, with readdir, the resulting query seems strange:
As you can notice, the "ö" is no real "ö" anymore.. it is slightly bigger, but not as big as a capital "Ö".. even the cursor is funny: I can stop in the middle of the character, and when I then hit Backspace to delete the char, it first deletes the dots over it, and the second time the remaining "o"..
When I convert the filename using e.g. rawurlencode I get this:
You can see an "o" before the UTF-8 stuff starts.. and then a %CC giving the dots and %88 giving a kind of space... what the hell is this? How can I get this down to a simple UTF-8 "ö"? Using this stuff for a search query will be useless.. :-/
For more details, the database schema:
CREATE SCHEMA IF NOT EXISTS `cms` DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci ;
DROP TABLE IF EXISTS `upload`;
/*!40101 SET @saved_cs_client = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `upload` (
`id` int(11) NOT NULL auto_increment,
`file_name` varchar(255) NOT NULL,
`file_type` varchar(20) NOT NULL,
`file_path` varchar(255) NOT NULL,
`timestamp` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
`session_id` varchar(45) default NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=8965 DEFAULT CHARSET=utf8;
/*!40101 SET character_set_client = @saved_cs_client */;
everything is so far utf-8 on my cms:
<meta charset="utf-8">
There's nothing wrong with what you have here. It's an o followed by U+0308 COMBINING DIAERESIS, which is a correct way to produce an ö. It's called a "decomposed form", while U+00F6 LATIN SMALL LETTER O WITH DIAERESIS is a "composed form". Decomposed forms are more general, while not every character has a composed form (they mostly exist for backwards compatibility). There's nothing not "real" about the decomposed form, and if it displays wrong in your editor it's only because your editor has poor Unicode support. When it comes to searching, again, any correctly-working search engine should treat U+006F U+0308 exactly the same as U+00F6.
However, if you do need to work with broken stuff, what you want is Unicode normalization, provided in PHP by the Normalizer class. NFKC should give you the form you expect.
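A minimal sketch of composing such a filename, assuming the intl extension is loaded. The answer suggests NFKC; plain NFC is already enough to compose "o" + U+0308 into U+00F6 and is used here as the more conservative choice (swap in Normalizer::FORM_KC to follow the answer literally). The function name composeFileName is hypothetical:

```php
<?php
/**
 * Bring a file name (e.g. one returned by readdir()) to composed form so
 * that it compares equal to the name stored in the database.
 * Requires the intl extension (Normalizer class).
 */
function composeFileName(string $name): string
{
    $composed = Normalizer::normalize($name, Normalizer::FORM_C);

    // normalize() returns false on failure; keep the input in that case.
    return $composed === false ? $name : $composed;
}
```

For the decomposed input "o\u{0308}", rawurlencode() yields "o%CC%88" as in the question; after composeFileName() it yields "%C3%B6", the single-character "ö" that the search query expects.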
A password needs to be matched against a password hash that was originally created on a .NET platform and stored in MSSQL (so the hashing algorithm is probably SHA1).
Here is what the MySQL table looks like:
CREATE TABLE IF NOT EXISTS `test` (
`id` int(10) NOT NULL AUTO_INCREMENT,
`UserName` varchar(100) COLLATE latin1_general_ci DEFAULT NULL,
`PasswordHash` varchar(100) CHARACTER SET utf8 DEFAULT NULL,
`PasswordSalt` int(10) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci AUTO_INCREMENT=12535 ;
--
-- Dumping data for table `test`
--
INSERT INTO `test` (`id`, `UserName`, `PasswordHash`, `PasswordSalt`) VALUES(9836, 'demoadmin', '?z1??9t|????e&??9aK', -1190254076);
INSERT INTO `test` (`id`, `UserName`, `PasswordHash`, `PasswordSalt`) VALUES(12534, 'sunny', '??o\\(R?8~??6>?t????o', 549612932);
I've found two very close examples of what I need, but I was unable to make them work.
Example 1: http://gilbert.pellegrom.me/replicating-net-password-hashing-in-php/
Example 2: http://www.kevinbruce.com/Blog?area_id=6&blog_id=3&ba_id=27
Usernames and passwords are:
First user: demoadmin/demotest
Second user: sunny/eclyptix
Please help!
It looks like you have an encoding problem:
'?z1??9t|????e&??9aK'
It seems that your original code was broken and was converting characters out of the printable ASCII range into question marks.
You could try to replicate this behaviour in PHP. However continuing to use this broken scheme will compromise the security of your system as it is much more likely that a hash collision can be found. It might be necessary to get all your users to change their passwords. This time make sure that the hashes are stored correctly. You may also want to consider storing them as hexadecimal strings instead of binary data to minimize the risk of further encoding problems.
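A sketch of the hexadecimal storage idea, together with an imitation of the damage described above. The SHA-1 over password plus salt is only a placeholder assumption, since the thread does not say how the .NET side combined the two values; both function names are hypothetical:

```php
<?php
/**
 * Hash a password and return it as a 40-character hexadecimal string.
 * ASSUMPTION: SHA-1 over $password . $salt is a placeholder; the thread
 * does not say how the .NET side combined the two values.
 */
function hashAsHex(string $password, string $salt): string
{
    $raw = sha1($password . $salt, true); // 20 raw bytes, mostly unprintable

    return bin2hex($raw);                 // plain ASCII, safe in any charset
}

/** Imitate the damage: every byte outside printable ASCII becomes "?". */
function lossyAscii(string $bytes): string
{
    return preg_replace('/[^\x20-\x7E]/', '?', $bytes);
}
```

The hex string can be turned back into the original bytes with hex2bin(), whereas the output of lossyAscii() cannot: many different byte sequences collapse into the same run of question marks, which is why the stored values are more prone to collisions.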
Kevin Bruce here (from the second example you cited).
For what it's worth, I never got the problem solved in my case. I actually spoke with Elizabeth Smith (who works on PHP core for Windows) and she agreed that PHP's hash support is not on the same level as .NET's, due to character encoding support in PHP. This is what I suspected.