TinyButStrong Russian issue - PHP

Plugin Used : http://www.tinybutstrong.com/plugins.php
Russian characters are not displaying correctly.
In the MySQL database they are stored correctly; the collation is utf8_general_ci.
I used define('OPENTBS_ALREADY_UTF8','already_utf8');

It looks like a UTF-8 problem.
You have to check that the whole data chain is UTF-8:
all your PHP scripts
the data injected into the template (usually from a database); you also have to check that your PHP script retrieves the data as UTF-8. See « How do I make MySQL return UTF-8? »
the template (it is already UTF-8, since it is a LibreOffice or MS Office template)
Once this chain is OK, you have to use the OPENTBS_ALREADY_UTF8 option to load the template.
$TBS->LoadTemplate('my_template.odt', OPENTBS_ALREADY_UTF8);
You can check that your chain is OK with a test like this:
echo "<!doctype html><html><head><title>Test</title><meta charset='UTF-8'></head><body>";
echo $my_data_from_database;
echo "</body></html>";
exit;
where $my_data_from_database is a data item retrieved from the database that contains special characters, such as Russian characters.
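For reference, here is a minimal sketch of the full UTF-8 chain described above, assuming mysqli; the include paths, the table/column names (articles, title_ru) and the merge field name are illustrative assumptions, not part of the original question:

require_once 'tbs_class.php';            // TinyButStrong engine
require_once 'tbs_plugin_opentbs.php';   // OpenTBS plug-in

// 1) Make MySQL return UTF-8 to PHP.
$db = new mysqli('localhost', 'user', 'password', 'mydb');
$db->set_charset('utf8');
$row = $db->query("SELECT title_ru FROM articles LIMIT 1")->fetch_assoc();

// 2) Load the template, telling OpenTBS that the data chain is already UTF-8.
$TBS = new clsTinyButStrong;
$TBS->Plugin(TBS_INSTALL, OPENTBS_PLUGIN);
$TBS->LoadTemplate('my_template.odt', OPENTBS_ALREADY_UTF8);
$TBS->MergeField('data', $row);          // fills [data.title_ru] fields in the template
$TBS->Show(OPENTBS_DOWNLOAD, 'result.odt');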

Related

encoding issues in drupal when importing from wordpress

I am currently moving blog posts from WordPress to Drupal. However, after moving them,
some of the text is not being displayed correctly.
wordpress is displaying :
When it hasn’t (html code is <h2>When it hasn’t</h2>)
Drupal is displaying :
When it hasnâ€™t (html code is <h2>When it hasnâ€™t</h2>)
In both the WordPress and Drupal databases the value is correct. The source is the same.
<h2>When it hasn’t</h2>
I did a search and found many options. None of them helped.
Below are the ones I have done and checked.
1) I double-checked that UTF-8 is the character encoding in Drupal and WP.
I also made a simple test.php file to check that nothing else was getting in the way,
and it still did not display correctly.
2) I made sure that UTF-8 is used when we take a mysqldump and upload it to Drupal.
3) I also made sure the .php file is in utf-8 when saved.
4) I changed the encoding type in chrome for every option available and nothing
displayed it correctly.
5) I also used php functions to recode it but they did not work.
$value2="<h2>When it hasn’t</h2>";
$out = recode_string('..utf-8', $value2);
//output - When it hasnt
$out2= mb_convert_encoding($value2,'UTF-8', "UTF-8");
// output - When it hasn’t
$out3= @iconv('UTF-8', 'utf-8', $value2);
// output - When it hasn’t
I have run out of options now and I am stuck. Please help.
You say the text in both databases is correct, but actually this doesn't mean much: to view the content of a record you must use some client, and quite a few transformations may happen depending on how the text is rendered so you can read it.
So only two things matter:
the encoding of the column
the encoding of the HTML page returned by Drupal
Since your page outputs â€™ (the bytes 0xE2 0x80 0x99 read as CP1252) for ’ (Unicode U+2019, whose UTF-8 encoding is 0xE2 0x80 0x99), I guess the column is indeed UTF-8; however, there's someone between the database and the browser who thinks the text is CP1252. This is what you have to check:
If using MySQL, the connection encoding must be UTF-8 so that what you have in your PHP script is UTF-8 text. You can use SET NAMES 'utf8' (see the sketch after this list). Note that if you don't need the Unicode set, you can even use CP1252: the only important thing is that you know the encoding, since PHP strings are just byte arrays.
Explicitly define the response encoding in the HTTP Content-Type header. That is, configure Drupal to call header('Content-Type: text/html; charset=utf-8');
If the HTTP response encoding is different from the one used for the text retrieved from the db, transcode the query result accordingly.
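A minimal sketch of the first two checks, assuming mysqli (the credentials and the queried table are placeholders, not Drupal specifics):

// Hypothetical illustration of the checklist above.
$db = new mysqli('localhost', 'user', 'password', 'drupal');
$db->set_charset('utf8');   // equivalent to SET NAMES utf8: the connection now speaks UTF-8

// Declare the response encoding before any output reaches the browser.
header('Content-Type: text/html; charset=utf-8');

// Placeholder query: with both steps in place, ’ is delivered and rendered correctly.
$row = $db->query("SELECT body FROM node_revisions LIMIT 1")->fetch_assoc();
echo $row['body'];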

PHP/MySQL encoding problems. � instead of certain characters

I have come across some problems when inputting certain characters into my MySQL database using PHP. What I am doing is submitting user-inputted text to a database. I cannot figure out what I need to change to allow any kind of character to be put into the database and printed back out through PHP as it's supposed to be.
My MySQL collation is: latin1_swedish_ci
Just before I send the text to the database from my form I use mysql_real_escape_string() on the data.
Example below
this text:
�People are just as happy as they make up their minds to be.�
� Abraham Lincoln
is supposed to look like this:
“People are just as happy as they make up their minds to be.”
― Abraham Lincoln
As mentioned by others, you need to convert to UTF8 from end to end if you want to support "special" characters. This means your web page, PHP, mysql connection and mysql table. The web page is fairly simple, just use the meta tag for UTF8. Ideally your headers would say UTF8 also.
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
Set your PHP to use UTF8. Things would probably work anyway, but it's a good measure to do this:
mb_internal_encoding('UTF-8');
mb_http_output('UTF-8');
mb_http_input('UTF-8');
For mysql, you want to convert your table to UTF8, no need to export/import.
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8
You can, and should, configure mysql to default utf8. But you can also run the query:
SET NAMES UTF8
as the first query after establishing a connection and that will "convert" your database connection to UTF8.
That should solve all your character display problems.
The likeliest cause of the problem is that the database connection is set to latin1 but you are feeding it text encoded in UTF-8. The simplest way to solve this is to convert your input into what the client expects:
$quote = iconv("UTF-8", "WINDOWS-1252//TRANSLIT", $quote);
(What MySQL calls latin1 is windows-1252 in the rest of the world.) Note that many characters, such as the quotation dash U+2015 that you use there, cannot be represented in this encoding and will be converted into something else. Ideally you should change the column encoding to utf8.
An alternative solution: set the database connection to utf8. It doesn't matter how the columns are encoded: MySQL internally converts text from the connection encoding into the storage encoding, you can keep the columns as latin1 if you want to. (If you do, the quotation dash U+2015 will be turned into a question mark ? because it's not in latin1)
How to set the connection encoding depends on what library you are using: if you use the deprecated MySQL extension it's mysql_set_charset, if MySQLi it's mysqli_set_charset, and if PDO, add charset=utf8 to the DSN (see the sketch below).
If you do this, you'll also have to set the page encoding to UTF-8 with the Content-Type header.
Otherwise you would have the same problem with the browser: feeding it text encoded in UTF-8 when it's expecting something else:
header("Content-Type: text/html; charset=utf-8");
The solutions provided are helpful if you are starting from scratch. Setting every possible connection to UTF-8 is indeed the safest. UTF-8 is the most used charset on the net for a variety of reasons.
Some suggestions and a word of warning:
copy the tables you want to sanitize with a unique prefix (tmp_)
although your db-connection is forced to utf8, check your General Settings collation and change it to utf8_bin if that was not done yet
you need to run this on the local server
the funny char error is mostly due to mixing LATIN1 with UTF-8 configurations. This solution is designed for that case. It could work with character sets other than LATIN1, but I haven't checked this
check these tmp_ tables extensively before copying them back to the originals
Build the two arrays needed for the magic:
$chars = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES, "UTF-8");
$LATIN1 = $UTF8 = array();
foreach ($chars as $key => $val) {
    $UTF8[]   = $key;   // the raw character (UTF-8)
    $LATIN1[] = $val;   // its HTML entity
}
Now build up the routines you need: (tables->)rows->fields and at each field call
$row[$field] = mysql_real_escape_string(str_replace($LATIN1 , $UTF8 , $row[$field]));
$q[] = "$field = '{$row[$field]}'";
Finally build up and send the query:
mysql_query("UPDATE $table SET " . implode(" , " , $q) . " WHERE id = '{$row['id']}' LIMIT 1");
Change the MySQL collation to utf8_unicode_ci or utf8_general_ci, including the table and the database.
You will need to set your database to UTF-8, yes. There are many ways to do it: by changing the config file, via phpMyAdmin, or by calling a PHP function (for example mysql_set_charset) right before you insert and update.
Unfortunately, I think you will have to re-enter any data you entered before.
One thing you also need to know, from personal experience: make sure all tables with relations have the same collation, or you won't be able to JOIN them.
As a reference: http://dev.mysql.com/doc/refman/5.6/en/charset-syntax.html
Also, it can be an Apache setting. We've experienced the same issue on a 'free-hosting' server as well as on my brother's server. Once we switched to another server, all the characters became neat. Verify your Apache settings; sorry, but I can't bring more light to Apache's config.
Get rid of everything else; you just need to follow these points, and every problem regarding special language characters will be resolved.
1- You need to define the collation of your table as utf8_general_ci.
2- Define <meta http-equiv="content-type" content="text/html; charset=utf-8"> in the HTML, after the head tag.
3- You need to call mysql_set_charset('utf8', $link_identifier); in the file where you make the connection with the database, right after selecting the database with mysql_select_db. Using mysql_set_charset there will allow you to add and retrieve data properly, whatever language it is (see the sketch below).
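A minimal sketch of point 3 (credentials and database name are placeholders):

$link = mysql_connect('localhost', 'user', 'password');
mysql_select_db('mydb', $link);
mysql_set_charset('utf8', $link);   // call this right after mysql_select_db()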
If your text has been encoded and decoded with the wrong encoding and so the mojibake is actually "solidified" into unicode characters, then the solutions mentioned so far won't work. I ended up having success with the ftfy Python package to automatically detect/fix mojibake:
https://github.com/LuminosoInsight/python-ftfy
https://pypi.org/project/ftfy/
https://ftfy.readthedocs.io/en/latest/
>>> import ftfy
>>> print(ftfy.fix_encoding("(à¸‡'âŒ£')à¸‡"))
(ง'⌣')ง
Hopefully this helps people who are in a similar situation.

Same dataset outputs different characters : phpmyadmin / own query

I'm trying to get some data from the db, but the output isn't what I expected.
Doing my own querying on the db, I get this output: string 'C�te d�Ivoire' (length=13)
Querying the db from phpMyAdmin, I get normal output: Côte d’Ivoire
The php.ini default charset, the MySQL db default charset and the <meta> charset are all set to UTF-8.
I can't figure out where the encoding is happening such that I get different output with the same configuration.
P.S.: using the mysqli driver.
In the same page that gives you wrong results, try first running this instruction
print base64_encode("Côte");
The correct answer is Q8O0dGU.... If you get something else, like Q/R0ZQo..., this means that your script is working with another charset (here Latin-1) instead of UTF-8. It's still possible that MySQL and the browser are also playing tricks, but the line above proves that PHP and/or your editor are the ones playing you false.
Next, extract Côte from the database and output its base64_encode. If you see Q8O0..., then the connection between MySQL and PHP is safely UTF8. If not, then whatever else might also be needed, you need to change the MySQL charset (SET NAMES utf8 and/or ALTER of table and database collation).
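One possible way to run that second check, assuming mysqli and a hypothetical countries table holding the Côte d’Ivoire row:

$db = new mysqli('localhost', 'user', 'password', 'mydb');
$db->set_charset('utf8');   // try the test with and without this line

$row = $db->query("SELECT name FROM countries LIMIT 1")->fetch_assoc();
print base64_encode($row['name']);   // Q8O0... means the MySQL-to-PHP link is delivering UTF-8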
If PHP is UTF8, and MySQL is UTF8, and still you see invalid characters, then it's something between PHP and the browser. Verify that the content type header is sent correctly; if not, try sending it yourself as first thing in the script:
Header('Content-Type: text/html; charset=UTF8');
For example in Apache configuration you should have
AddDefaultCharset utf-8
Verify also that your browser is not set to override both server charset and auto-detection.
NOTE: as a rule of thumb, if you get a single diamond with a question mark instead of a UTF-8 international character, this means that a UTF-8 reader received an invalid UTF-8 byte sequence. In other words, the entity showing the diamond (your browser) is expecting UTF-8 but is receiving something else, for example Latin1, a.k.a. ISO-8859-1.
Another difficult-to-track way of getting that error is if the output somehow contains a byte order mark (BOM). This may happen if you create a file such as
###<?php
Header("Content-Type: text/html; charset=UTF8");
?>
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF8" />
</head>
<body>
Hellò, world!
</body>
</html>
where that ### is an (invisible in most editors) UTF8 BOM. To remove it, you either need to save the file as "without BOM" if the editor allows it, or use a different editor.
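If you suspect a BOM, a quick ad-hoc check can also be done from PHP itself (the file name is whatever script produces your page):

// Does the script start with the UTF-8 BOM bytes 0xEF 0xBB 0xBF?
$firstBytes = file_get_contents('index.php', false, null, 0, 3);
if ($firstBytes === "\xEF\xBB\xBF") {
    echo "index.php starts with a UTF-8 BOM - re-save it without the BOM.\n";
}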
If you do your "own querying" with the command line tool mysql, you have to set the option --default-character-set=utf8, too. Otherwise, please tell us how you do your own querying.

Ajax autocompletion lost in character encoding

The scheme is a text input field in an HTML form to be autocompleted using jQuery.autocomplete, getting an appropriate server response (e.g. a city name JSON list). The whole package works well... except that the client does not get data returned from the server when typing accented characters (éèà..). Like many others, it looks like I'm facing a character encoding issue, but I cannot manage to figure out where and how to solve it despite many tries (iconv, utf8_encode, urldecode...) and readings like this one for example.
Therefore I'd need some help/hints to understand where to act (before prototyping the jQuery autocomplete code ... ?)
EDIT: it might also be a jQuery accent-folding issue; I'll try that way too.
Configuration:
server: Apache2.2 (debian lenny)
php : compiled 5.3.3 (so the option JSON_UNESCAPED_UNICODE is not available for json_encode)
mysql: 5.1.49 with MySQL charset: UTF-8 Unicode (utf8),
class: using a modified PFBC2.x version for the php form building
meta
The website is mostly for French users, so it's all designed with ISO-8859-1 (a bad initial choice, I guess):
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
jQuery autocomplete code (applied to the city input field)
// DEBUG Testing (tested w/ and w/o the $charset_attr: no change)
$charset_attr = 'contentType: "application/x-www-form-urlencoded;charset=ISO-8859-1"';
echo 'jQuery("#' . $this->attributes["id"] . '").autocomplete({source:"' , $this->xhr_path . '", minLength:2, ' . $charset_attr .'});';
The generated code for that input field matches the above expectation.
Converting MySQL rows into UTF-8 using this function:
I convert the MySQL returned array into UTF-8 prior to sending the JSON back to the client. Actually I also tested and wrote other functions, but this does not change anything, so I guess the problem is not there.
$encoded_arr = utf8json($returnData);
echo json_encode($encoded_arr);
flush();
Encoding control 1 (client side)
An embedded control in the HTML form, in order to check which character encoding is actually passed to jQuery.autocomplete:
jQuery(document).ready(function() {
<?php
$test_str ="foobar";
$check_encoding = "'" . mb_detect_encoding($test_str) . "'";
?>
alert('Check charset server encoding: ' + <?php echo $check_encoding;?> ); // output : ASCII
});
Encoding control 2 (server side)
$inputData = (isset($_GET['term'])) ? htmlspecialchars($_GET['term'], ENT_COMPAT, 'UTF-8') : NULL;
$encoding_get = mb_detect_encoding($_GET['term']);
$encoding_data = mb_detect_encoding($inputData);
$utf8converted = @iconv(strtolower($encoding_get), 'utf-8', $inputData);
$checkconversion = mb_detect_encoding($utf8converted);
Sending lowercase normal characters (ea..), I get everything as ASCII.
Sending lowercase accented characters (éèà..), I get everything as UTF-8.
So I'm lost, as the server receives the proper character string and produces a JSON return (tested without Ajax), but it looks like the client does not receive or interpret it properly.
For those facing the same kind of ...%$# issue, here is what I've done to solve my case:
Checked the character encoding at each node (e.g. client, Apache server, MySQL server), using mb_detect_encoding on the server side,
Finally pointed out the problem node: in my case, UTF-8 characters were being passed to the MySQL server instead of Latin ISO-8859-1, so the MySQL server did not return the expected answers, which I could not detect or debug by POSTing data directly to the server script via the URL. So I had to log the input and output in a file, checking the incoming character encoding and the MySQL server output.
Changed the Ajax request to POST instead of GET,
Solved it by encoding the $_POST data to ISO prior to sending the MySQL server request, using mb_convert_encoding, as well described here (see the sketch below).
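A minimal sketch of that last step, assuming a hypothetical latin1 cities table (credentials and names are placeholders):

// The Ajax request arrives as UTF-8, but the MySQL connection and tables use latin1
// (ISO-8859-1), so the search term is converted before querying.
$term = isset($_POST['term']) ? $_POST['term'] : '';
$term = mb_convert_encoding($term, 'ISO-8859-1', 'UTF-8');

$link = mysqli_connect('localhost', 'user', 'password', 'mydb');
$term = mysqli_real_escape_string($link, $term);
$result = mysqli_query($link, "SELECT name FROM cities WHERE name LIKE '$term%' LIMIT 10");

$names = array();
while ($row = mysqli_fetch_assoc($result)) {
    // convert back to UTF-8 because json_encode() expects UTF-8 strings
    $names[] = mb_convert_encoding($row['name'], 'UTF-8', 'ISO-8859-1');
}
header('Content-Type: application/json; charset=utf-8');
echo json_encode($names);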

Non Latin Characters & ouch

I'm getting to know Cake PHP, which has unearthed a general question about best practice in terms of PHP / MySQL character set stuff, which I'm hoping can be answered here.
My (practice) system contains a mysql table of movies. This list was sourced from an Excel sheet, which was exported as CSV, and imported via phpMyAdmin.
I noticed that titles with more "exotic" glyphs have issues rendering in the browser, eg The é in Amélie. Using Cake or plain PHP, it renders as a ?, unless transformed via htmlentities into a é. Links with the special characters don't render at all.
If I use my Cake input form to enter an <alt>0233, this is rendered correctly in source, but as é via htmlentities.
After a quick SO search, I decided maybe UTF-8 would fix stuff, hence I
changed the PHP source, and CSV file encoding to UTF-8
made sure the <meta> stuff was there (it was already via Cake's default layout).
made sure my browsers thinks the doc is UTF-8 (they do)
changed the collation on the MySQL DB to utf8_general_ci (as an educated stab from the available UTF-8 options)
deleted and reimported my data
However, I'm still stuck. I note that phpMyAdmin manages to render the characters "correctly" in its HTML source when browsing records.
I sense that document encoding is to blame; however, I am wondering if someone can provide the best answers to:
what's the best way to move my data from Excel to MySQL to preserve glyphs?
what's the optimum settings for my tables to accommodate this?
I'd prefer to use UTF-8 to natively display the likes of é. What can I do in Cake to avoid making loads of calls to the likes of htmlentities, i.e. is there a configuration setting or a way to set things up that makes this more friendly and lets Cake's native helpers like Html->link work?
Some code, just in case:
movies controller excerpt..
function index() {
$this->set('movies' , $this->Movie->find('all'));
}
index.ctp view excerpt
<?php foreach ($movies as $movie): ?>
<tr>
<td><?php echo $movie['Movie']['id']; ?></td>
<td><?php echo htmlentities($movie['Movie']['title']); ?></td>
<td><?php echo $this->Html->link($movie['Movie']['title'] ,
array('controller' => 'movies' , 'action' => 'view' , $movie['Movie']['id'])); ?>
</td>
<td><?php echo $this->Html->link("Edit",
array('action' => 'edit' , $movie['Movie']['id'])); ?>
</td>
<td>
<?php echo $this->Html->link('Delete', array('action' => 'delete', $movie['Movie']['id']), null, 'Are you sure?')?>
</td>
</tr>
<?php endforeach; ?>
Thanks in advance for any help / tips.
Make sure the MySQL connection is set to UTF-8 while importing the data. The collation is only used for sorting and comparison, not for saving data.
You can set the charset of the connection using SET NAMES 'utf8'; at the beginning of your SQL file.
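For example, the top of the import file could look like this (the table and data are just an illustration):

SET NAMES utf8;

-- ... the rest of the dump follows ...
INSERT INTO movies (id, title) VALUES (1, 'Amélie');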
That question comes up here often.
UTF-8 should work. Make sure that:
Your database collation uses utf8 (utf8_bin or utf8_general_ci)
Your HTML document encoding tag is set to utf8
AND VERY IMPORTANT - most people forget this bit - make sure all your source files are saved as UTF-8. Use Notepad++ on PC or Coda/TextMate/TextWrangler on Mac to make sure the encoding is correct. If you don't do that, some transformation/re-interpretation of the characters may happen.
EDIT: And forget about htmlentities; you don't need it if you use UTF-8 encoding all throughout.
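Once the whole chain is UTF-8, the view excerpt from the question can output titles directly, or through Cake's h() shorthand for htmlspecialchars(), which leaves UTF-8 accents untouched; a sketch:

<!-- index.ctp excerpt: no htmlentities() needed once everything is UTF-8 -->
<td><?php echo h($movie['Movie']['title']); ?></td>
<td><?php echo $this->Html->link($movie['Movie']['title'],
    array('controller' => 'movies', 'action' => 'view', $movie['Movie']['id'])); ?>
</td>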
