I have a database that seems to use the latin1_swedish collation. I need to add some more text to it, and the new text contains some Brazilian Portuguese words. Example:
tilápia
Cachaça
...
The old text already in the database has these words too, but it is stored like this:
tilápia
The PHP page still displays it as the real word, with the correct accent.
How can I add the new text so that PHP keeps displaying it correctly? For example, I want to add tilápia to my table and have MySQL keep it as tilápia.
Thanks, hope it's not confusing.
While the collation should definitely be something more generic like utf8_general_ci, that won't change how things are displayed. MySQL stores whatever you throw at it and returns exactly the same thing when you ask for it. Hence, you just have to make sure to use the same encoding for reading and writing. In general it's a good idea to use UTF-8 throughout the whole application (including the database). For that you would need to convert the content in your database.
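The "tilápia" form is what UTF-8 text looks like when its bytes are read as Latin-1: "á" is the two bytes C3 A1, which Latin-1 shows as "Ã" and "¡". A minimal PHP sketch, assuming the mbstring extension is available and the file is saved as UTF-8, reproduces that double encoding and reverses it. The helper names are hypothetical:

```php
<?php
// Requires the mbstring extension; save this file as UTF-8.

// Simulate the bug: treat each UTF-8 byte as one Latin-1 character and
// re-encode it as UTF-8 (what happens when UTF-8 bytes pass through a
// connection or column declared as latin1).
function doubleEncode(string $utf8): string
{
    return mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
}

// Undo it: map each character back to its single Latin-1 byte, which
// recovers the original UTF-8 byte sequence.
function undoDoubleEncode(string $mojibake): string
{
    return mb_convert_encoding($mojibake, 'ISO-8859-1', 'UTF-8');
}

foreach (['tilápia', 'Cachaça'] as $word) {
    $broken = doubleEncode($word);
    echo $word, ' -> ', $broken, ' -> ', undoDoubleEncode($broken), PHP_EOL;
}
```

Note that MySQL's latin1 is really Windows-1252, which differs from ISO-8859-1 in the 0x80-0x9F range, so for some characters (e.g. "Ç", bytes C3 87) the mojibake MySQL shows differs from this sketch.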
I have been poring over Stack Overflow all night looking for a way to solve my issue, but I cannot get the browser to display my Unicode characters correctly when pulling them from my database. In particular, I am trying to use the "combining macron" character (U+0304), added after a character to put a macron over it. I want the user to have the option to turn macrons on and off, and having one character to look for and ignore seems easier than converting between individual macroned letters and their plain counterparts (e.g. Ā -> A).
It would be trivial to use the HTML entity (&#772;) to accomplish this, but I want the data to stay easily transferable in case I use the MySQL database for something other than a web page. I have tested the HTML entity, and it does successfully add a macron to the previous character.
However, when I use the Unicode character in my MySQL table, I cannot get the browser to print anything other than question marks (?). In the table itself, the entry is a VARCHAR(64) and looks like 'word¯' with the macron appearing afterwards, but I assume it's just a limitation of the cmd environment that the macron isn't placed over the d. The column collation is latin1_swedish_ci, if that makes a difference. Here is what I have tried to get the entry to print correctly:
Changing my php.ini to have a default charset of utf-8
Making the top of my php file read:
<?php
header('Content-Type: text/html; charset=utf-8');
?>
And setting the DSN (the first parameter) of my PDO connection to 'mysql:dbname=NAME;host=localhost;charset=utf8'.
When I simply make the PHP file echo the character I want, it prints to the page correctly. So I'm thinking the problem isn't the page encoding? Or maybe the encodings of the database and the server don't match, and that is producing the question marks?
EDIT:
I can get it to display correctly if I insert the value from phpMyAdmin, but not when I enter it through the command line (cmd). In both cases I am pasting the same word, ending with the character U+0304. Is there a reason it works with phpMyAdmin and not with a direct query, and what can I do so it works with both?
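One likely cause of the question marks is the latin1 column itself: U+0304 COMBINING MACRON has no Latin-1 code point, so when MySQL converts incoming UTF-8 to the column's character set it has nothing to store. Depending on the SQL mode, MySQL then either replaces the character with '?' (with a warning) or rejects the value. The sketch below mimics that conversion with mbstring's mb_convert_encoding() (not MySQL's own code path) to show the same loss:

```php
<?php
// Requires mbstring. "\u{0304}" is PHP 7+ syntax for U+0304 COMBINING MACRON.
$word = "word\u{0304}";

// In UTF-8 the combining macron is the two bytes CC 84.
echo bin2hex("\u{0304}"), PHP_EOL;

// Latin-1 has no code point for U+0304, so the conversion falls back to
// mbstring's default substitute character, '?'.
$latin1 = mb_convert_encoding($word, 'ISO-8859-1', 'UTF-8');
echo $latin1, PHP_EOL;

// Converting back cannot recover the macron: the information is gone.
echo mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1'), PHP_EOL;
```

The same reasoning suggests the fix used elsewhere in this thread: a Unicode character set (such as utf8mb4) for the column and the connection, then re-inserting the data.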
I'm storing data in a MySQL database that may contain special characters. I'm wondering how to store it so that these characters are preserved whether they are output to HTML via PHP or via JavaScript (e.g. createTextNode).
For example, the division symbol (÷) has an HTML entity code, and when I store it as that entity it shows up fine when PHP puts it directly into the HTML. But when I pull it into JavaScript using $.getJSON and then insert it with createTextNode, it shows up as the escaped code instead of ÷.
I also tried storing the symbol in the SQL directly, but my understanding is that the column would need to be changed from VARCHAR to NVARCHAR and that would cause a performance hit that doesn't seem necessary.
Given that I can modify the SQL, the PHP, or the JavaScript, is there an easy fix here? Maybe a way to unescape the HTML entity in JavaScript?
As Yogesh answered, you should switch the collation of your DB to utf8_general_ci.
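Changing the database's default character set or collation only affects tables created afterwards; existing tables have to be converted one by one. A minimal PHP sketch, using a hypothetical helper `convertTableSql()` and a hypothetical table name, builds the MySQL statement that converts a table's text columns. It assumes utf8mb4 rather than utf8, because MySQL's `utf8` is the 3-byte utf8mb3 and cannot store 4-byte characters such as emoji:

```php
<?php
// Hypothetical helper: returns SQL that converts every text column of a
// table to utf8mb4. Backticks inside the name are doubled so the
// identifier stays correctly quoted. $collation is assumed to be trusted.
function convertTableSql(string $table, string $collation = 'utf8mb4_general_ci'): string
{
    $quoted = '`' . str_replace('`', '``', $table) . '`';
    return "ALTER TABLE {$quoted} CONVERT TO CHARACTER SET utf8mb4 COLLATE {$collation}";
}

echo convertTableSql('symbols'), PHP_EOL;
```

CONVERT TO assumes the stored bytes really are latin1. If a latin1 column already holds UTF-8 bytes (the "tilápia" situation), this statement encodes them a second time, so back up the table and check a few rows first.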
So there are probably two things going on:
JSON escapes special characters.
Somewhere, something in your code flow is URL-encoding the strings too.
So you just need to decode the string in your JavaScript, or find the part of your code that is URL-encoding those strings and fix it.
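JSON's own escaping is harmless: json_encode() writes non-ASCII characters as \uXXXX escapes by default, and any JSON parser turns them back into the real character. So if the real ÷ is stored instead of an entity, it survives the trip to createTextNode. A minimal PHP sketch of both directions, assuming the database returns UTF-8 text:

```php
<?php
// Assumes the value arrives from the database as UTF-8.
$fromDb = '÷';

// Default flags escape non-ASCII as \uXXXX; both outputs are valid JSON.
echo json_encode(['symbol' => $fromDb]), PHP_EOL;
echo json_encode(['symbol' => $fromDb], JSON_UNESCAPED_UNICODE), PHP_EOL;

// A JSON parser restores the same character from the escaped form.
var_dump(json_decode('{"symbol":"\u00f7"}', true)['symbol'] === $fromDb);

// If the column already holds HTML entities, decode them before encoding,
// so the browser receives the character rather than the entity text.
$entity = '&divide;';
$decoded = html_entity_decode($entity, ENT_QUOTES | ENT_HTML5, 'UTF-8');
echo json_encode(['symbol' => $decoded], JSON_UNESCAPED_UNICODE), PHP_EOL;
```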
I'm unsure whether this is a PHP, FileMaker, MySQL or ODBC driver issue.
For security reasons, the input fields of my current PHP web form convert special characters into character codes (for example, ' becomes &#39;). This code is saved in the database and is also shown as the code in FileMaker 11. This is not what I want.
How can I make sure the special character will be displayed as it should be?
The other way round (from FileMaker to the database), no conversion is done when inserting special characters.
How can I make sure everything will be consistent?
Kind regards,
Jeroen
FileMaker is just showing the data stored in MySQL. If you open the DB in a tool like phpMyAdmin, you should see that the VARCHAR contains the encoded characters as well. Since FMP treats it simply as a text field, it shows the encoding that was stored. If you want to decode in FMP, you could show a calc field of the VARCHAR that uses a custom function to decode the text (but that won't allow updating the data). You could also try a trigger on record load that decodes the data in the fields so that you can properly view and edit it.
Solved it! It turned out I had to add an extra line to my PHP script.
After setting up the connection, PHP needs to tell MySQL which encoding to use. This can be done with the following line:
$dbh->query("SET NAMES 'utf8'");
Thanks for the effort guys!
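Since PHP 5.3.6, PDO_MYSQL also honors a charset option in the DSN, which sets the connection character set at connect time and has the same effect as the SET NAMES query. A minimal sketch, assuming the connection is made with PDO (the `$dbh` above suggests it) and using hypothetical helper names `mysqlDsn()` and `connect()`; utf8mb4 requires MySQL 5.5.3 or later:

```php
<?php
// Hypothetical helper: builds a PDO DSN that carries the connection charset.
function mysqlDsn(string $host, string $dbName, string $charset = 'utf8mb4'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbName, $charset);
}

// Hypothetical helper: opens the connection with that DSN, so no separate
// SET NAMES query is needed afterwards.
function connect(string $host, string $dbName, string $user, string $pass): PDO
{
    return new PDO(mysqlDsn($host, $dbName), $user, $pass, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
}

echo mysqlDsn('localhost', 'NAME'), PHP_EOL;
```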
This kind of encoding (&#39;) is not done automatically by the browser; something is doing it. Normally you encode only on output, not on input.
You can use html_entity_decode() to undo it. But I strongly suggest you figure out why it's happening in the first place.
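A minimal sketch of undoing it with html_entity_decode(). ENT_QUOTES is passed explicitly because, before PHP 8.1, the default flags left single-quote entities such as &#39; untouched. The sample string is made up:

```php
<?php
// Text as it might arrive from the web form / database, with the
// apostrophe stored as a numeric HTML entity.
$stored = 'It&#39;s 5 &lt; 6 &amp; 7';

// ENT_QUOTES makes sure &#39; is decoded too; UTF-8 is the output charset.
$plain = html_entity_decode($stored, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $plain, PHP_EOL;
```

For rows already in the table, the same call can be run once in a migration script; the longer-term fix is to stop encoding on input and escape with htmlspecialchars() only when printing HTML.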
When I display contents from the database, I get this:
��Some will have a job. Others will want one. They are my people, they are my clients and they are being denied their rights.
This text was entered by the user in a textarea with TinyMCE. How can I use preg_replace() to replace special characters in the sentence with ' ', except for the characters: <>?
This article is totally worth a read. Dealing with UTF-8 characters is something that we all go through at some point. The trick seems to be to catch them before they go into the database or to fix the database so that when they're going in they aren't broken. Once they're in there though it's slightly more difficult.
As Chuck mentioned above, it is a database problem. If you only wish to display non-Unicode (i.e. plain Latin) characters, then yes, preg_replace is the way to go. You will need to know the character sets well enough to filter out what you don't want.
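If plain ASCII is all you want to keep, a byte-oriented preg_replace() does it. This minimal sketch (hypothetical helper `asciiOnly()`) deliberately omits the /u modifier, so it also works on text that is not valid UTF-8; with /u, preg_replace() returns null for invalid input. It also removes legitimate accented letters, and since `<` and `>` are ASCII they are kept:

```php
<?php
// Replace every run of non-ASCII bytes (0x80-0xFF) with a single space.
// Without the /u modifier the pattern works on raw bytes, so invalid UTF-8
// (which browsers render as the replacement character) is handled too.
function asciiOnly(string $text): string
{
    return preg_replace('/[^\x00-\x7F]+/', ' ', $text);
}

// "\xEF\xBF\xBD" is the UTF-8 encoding of U+FFFD, the replacement character.
$raw = "\xEF\xBF\xBD\xEF\xBF\xBDSome will have a job. <p>Others will want one.</p>";
echo asciiOnly($raw), PHP_EOL;
```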
But if you just want everything to display nicely, i.e. no garbage characters, then change the corresponding parts of the DB to accept UTF-8.
For example, if you are using MySQL, try changing the field and table encoding so they can accept UTF-8. The default before MySQL 8.0 is latin1_swedish_ci; try changing it to utf8_general_ci. Hope that explains my point.
I have just imported a huge MySQL database. Most fields are latin1_swedish_ci, and they contain lots of corrupted strings.
e.g. Cavit Y�r�kl� instead of Cavit Yürüklü
I have been trying to find a solution to fix these corruptions using PHP, as that's the only language I know a little. I have experimented unsuccessfully with utf8_encode/utf8_decode and iconv.
Please help! There are loads of corrupted strings.
UPDATE: I reimported it as Latin-1 and now get Cavit Y�r�kl� for the example above. So it's definitely different, but the SQL itself seems to be corrupted.
Yeah, it's using the wrong encoding. Check out http://www.oreillynet.com/onlamp/blog/2006/01/turning_mysql_data_in_latin1_t.html to see how to fix it. You just need to find out what encoding the data is in now and what you want it to be in, and then you can convert. Or set up the DB to match the encoding of the data you are importing (if that's an option).
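A "�" in place of each accented letter usually means single-byte (Latin-family) text is being displayed as UTF-8. Once the source encoding is known, the conversion in PHP is one call. This minimal sketch assumes mbstring and guesses ISO-8859-9 (Latin-5, the Turkish variant) for a Turkish name: ü is byte FC in both Latin-1 and Latin-5, but letters such as ş and ğ exist only in Latin-5:

```php
<?php
// Requires mbstring. The raw bytes stand in for a value read from the
// latin1 column: "Cavit Yürüklü" with ü stored as the single byte 0xFC.
$raw = "Cavit Y\xFCr\xFCkl\xFC";

var_dump(mb_check_encoding($raw, 'UTF-8'));   // not valid UTF-8

// Assumption: the data is Turkish, so ISO-8859-9 is the likely source.
$fixed = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-9');

var_dump(mb_check_encoding($fixed, 'UTF-8'));
echo $fixed, PHP_EOL;
```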
First I would make a copy of the DB dump, then I would try using iconv. I know you said you tried, but there are many, many combinations of character encodings you can try. I once had to fix some corrupted Russian Cyrillic data, and what ended up working was specifying an output value of 'UTF-8//TRANSLIT'. I would try all the combinations you can, but remember to keep a copy of the original.
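Trying many combinations is easy to script. A minimal sketch with a hypothetical helper `tryEncodings()` runs one sample string through iconv() for each candidate source encoding, so you can eyeball which output reads correctly before touching the real data. The candidate list is an assumption, and the accepted encoding names depend on the platform's iconv library:

```php
<?php
// Hypothetical helper: converts $raw from each candidate encoding to UTF-8.
// //TRANSLIT (as in the answer above) asks iconv to approximate characters
// the target cannot represent; with UTF-8 as the target it rarely matters.
// A failed conversion is recorded as null instead of aborting the loop.
function tryEncodings(string $raw, array $candidates): array
{
    $results = [];
    foreach ($candidates as $from) {
        $converted = @iconv($from, 'UTF-8//TRANSLIT', $raw);
        $results[$from] = ($converted === false) ? null : $converted;
    }
    return $results;
}

$sample = "Cavit Y\xFCr\xFCkl\xFC"; // one corrupted value copied from the dump
foreach (tryEncodings($sample, ['ISO-8859-1', 'ISO-8859-9', 'CP1254']) as $from => $text) {
    echo str_pad($from, 12), ' => ', $text ?? '(conversion failed)', PHP_EOL;
}
```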