Character missing when saving a row? - php

$userTb = new My_Tb_User(); //Child of Zend_Db_Table_Abstract
$row = $userTb->find(9)->current();
$row->name = 'STÖVER';
$row->save();
In the user table, row 9 ends up with ST in the name column instead of STÖVER.
Ö is a German character that UTF-8 supports. If I enter 'STÖVER' manually using phpMyAdmin, it is stored correctly.
I also passed the charset parameter with the value utf8 when creating the db adapter, but still no luck!

If you read the manual entry for utf8_encode, it converts an ISO-8859-1 encoded string to UTF-8. The function name is a horrible misnomer, as it suggests some sort of automagic encoding that is necessary. That is not the case. If your source code is saved as UTF-8 and you assign "STÖVER" to $string, then $string holds the characters "STÖVER" encoded in UTF-8. No further action is necessary. In fact, trying to convert the UTF-8 string (incorrectly) from ISO-8859-1 to UTF-8 will garble it.
utf8_encode('STÖVER');
check this question in stackoverflow
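To make the garbling concrete, here is a minimal sketch. It assumes the mbstring extension is loaded and uses mb_convert_encoding() with ISO-8859-1 as the source, which is the same conversion utf8_encode() performs. The byte-escaped literal stands in for a source file saved as UTF-8.

```php
<?php
// "STÖVER" written as explicit UTF-8 bytes, so the example does not
// depend on how this file is saved (Ö is the two bytes C3 96).
$name = "ST\xC3\x96VER";

// Same conversion utf8_encode() performs: treat every input byte as
// an ISO-8859-1 character and re-encode it as UTF-8.
$garbled = mb_convert_encoding($name, 'UTF-8', 'ISO-8859-1');

echo strlen($name), "\n";      // 7 bytes: already valid UTF-8
echo strlen($garbled), "\n";   // 9 bytes: each byte of Ö was encoded again
echo bin2hex($garbled), "\n";  // 5354c383c296564552
```

The two bytes of Ö become four, which is why the value no longer matches what was typed.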

Using utf8_encode is bad practice; it adds a lot of complexity to your app. Try to solve the problem by looking for its source.
Some thoughts:
a database server charset problem (check the encoding of your server)
a database client charset problem (check the encoding of your connection)
a database table charset problem (check the encoding of your table)
a PHP default encoding problem (check the default_charset setting in php.ini)
a misconfigured multibyte extension (see the mbstring settings in php.ini)
a <form> charset problem (check that it is sent as utf-8)
an <html> charset problem (where no charset is declared in your HTML file)
a Content-Type: problem (where the wrong charset is sent by Apache).
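One way to work through a checklist like this is to collect the charset settings in one place and flag the ones that are not UTF-8. The sketch below is only an illustration: findNonUtf8Settings() is a hypothetical helper and the sample values are made up.

```php
<?php
/**
 * Returns the settings whose value is not a UTF-8 charset.
 * $settings maps a setting name to its value, e.g. the rows of
 * SHOW VARIABLES LIKE 'character_set_%' (hypothetical input shape).
 */
function findNonUtf8Settings(array $settings): array
{
    return array_filter(
        $settings,
        // MySQL reports utf8, utf8mb3 or utf8mb4; PHP reports UTF-8.
        fn (string $value): bool => !preg_match('/^utf-?8/i', $value)
    );
}

// Example input, made up for illustration.
$settings = [
    'character_set_client'     => 'utf8',
    'character_set_connection' => 'latin1',
    'character_set_database'   => 'utf8mb4',
    'php default_charset'      => 'UTF-8',
];

print_r(findNonUtf8Settings($settings));

// With a live PDO connection the MySQL part could be filled like this:
// $settings = $pdo->query("SHOW VARIABLES LIKE 'character_set_%'")
//                 ->fetchAll(PDO::FETCH_KEY_PAIR);
```

On a real server character_set_filesystem is normally binary and can be ignored.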

Related

Illegal mix of collations PHP MYSQL, latin1_swedish_ci and utf8_general_ci

I get the following collation problem in my application when I try to select something where two strings are equal:
SQLSTATE[HY000]: General error: 1267 Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='
In the stack trace I can see the parameter Lamellt\xE4ckning, which means Lamelltäckning, and I think my parameter implicitly invokes the latin1_swedish_ci collation.
My whole database uses this:
DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci
When I insert Strings from PHP I just make a simple insert:
$name = "Lamelltäcke";
$db->update("insert into....");
The data I am now trying to use comes from a CSV file, and I do not know if I can solve this just by setting the collation in some way or if I need to convert the string in some way.
What is the problem here? And how can I solve it?
It seems to be a problem when I insert data from PHP. I made the PDO connection like this:
$db = new \PDO($dsn, $config->db_user, $config->db_pass, array(\PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8"));
When I define a string like
$str = "åuuäuuö";
In PHP and insert it there is no problem. But when I receive the string from a post request I can echo it out just fine
"åuuäuuö"
In the database however it now gets inserted like
"?uu?uu?"
mb_detect_encoding($str);
gives: UTF-8
The problem was the encoding of the string itself. My database uses UTF-8 but the encoding was ISO-8859-x. To make everything worse my Java client also had another encoding which made this hard to debug. It is called "Quoted String".
What finally helped me solve the problem was this piece of PHP code, which takes a string, converts it from every available encoding to UTF-8, and prints each result. Look for a row where your string is printed correctly; that row names the encoding of your string. Once you know the correct encoding, re-encode your string using mb_convert_encoding.
$str = "String of unknown encoding with chars like äåö or something else";
foreach(mb_list_encodings() as $chr){
echo mb_convert_encoding($str, 'UTF-8', $chr)." : ".$chr."\r\n";
}
A note is to make sure that the client also uses the right encoding. In my case this was a Java program, in a normal case this would be your webapp/browser and .
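Once the real encoding is known, the conversion can be wrapped in a small guard so that strings that are already UTF-8 are left alone. This is a minimal sketch: ensureUtf8() is a hypothetical helper, and ISO-8859-1 as the fallback is an assumption you would replace with whatever the loop above revealed for your data.

```php
<?php
/**
 * Returns $input as UTF-8. If the bytes are already valid UTF-8 they
 * are returned untouched; otherwise they are assumed to be in
 * $assumedEncoding (a guess you have to confirm for your own data).
 */
function ensureUtf8(string $input, string $assumedEncoding = 'ISO-8859-1'): string
{
    if (mb_check_encoding($input, 'UTF-8')) {
        return $input;
    }
    return mb_convert_encoding($input, 'UTF-8', $assumedEncoding);
}

$fromCsv = "Lamellt\xE4ckning";            // the ISO-8859-1 bytes from the stack trace
echo bin2hex(ensureUtf8($fromCsv)), "\n";  // the ä byte e4 becomes c3a4
```

Checking validity first avoids the double-encoding that blind conversion would cause.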
Try to convert your PHP file to UTF8
Right after database connection, make a query with SET NAMES 'utf8'
Also, check that your field charset is UTF8.

php 5.2.17 to php 5.4.4 encoding issues

I'm having issues updating to PHP 5.4.4, since my database records are shown in the browser with errors.
I've searched, and many people said they had to make fixes to their code, but no one says which fixes they made.
So since I'm not a PHP expert, I'm asking here for help to point me in the right direction.
What I want: "á é í ó ú"
I have
<form id="editar" accept-charset="UTF-8" action="javascript:void(0)" method="post">
on submit my php does this
$txt_edi = htmlspecialchars($_POST['text_to_edit'], ENT_QUOTES, 'ISO-8859-1');
$query = $ligacao -> prepare("UPDATE mytable SET description = '".utf8_encode($txt_edi)."' ");
OK, on PHP 5.2.17 my MySQL record is like " á é í ó ú "
on PHP 5.4.4 my MySQL record is like this " á é í ó ú "
So I'm assuming something has changed in utf8_encode, otherwise my record would be the same in both PHP versions...
NOTE that if I don't add the third parameter to htmlspecialchars in PHP 5.4.4 my string comes out empty (if I change it to UTF-8 it comes out empty too); in 5.2.17 it goes to the database with no problem.
My educated guess (given your alarming lack of code) is that you are using htmlspecialchars(). That function has changed the default value for its third argument in PHP/5.4, from ISO-8859-1 to UTF-8:
encoding
From PHP 5.6.0, default_charset value is used as default. From PHP 5.4.0, UTF-8 is the default. PHP prior to 5.4.0, ISO-8859-1 is used as the default. Although this argument is technically optional, you are highly encouraged to specify the correct value for your code.
Solution: always provide the third parameter.
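A small sketch of why the string can come back empty: when the input bytes are not valid in the encoding you name, and neither ENT_SUBSTITUTE nor ENT_IGNORE is set, htmlspecialchars() returns an empty string. The sample bytes below are made up for illustration.

```php
<?php
$latin1 = "\xE1 \xE9";          // "á é" as ISO-8859-1 bytes
$utf8   = "\xC3\xA1 \xC3\xA9";  // the same text as UTF-8 bytes

// Bytes are not valid UTF-8 and no substitute/ignore flag: empty string.
var_dump(htmlspecialchars($latin1, ENT_QUOTES, 'UTF-8'));                   // string(0) ""

// Encoding argument matches the actual bytes: text passes through.
var_dump(htmlspecialchars($latin1, ENT_QUOTES, 'ISO-8859-1') === $latin1);  // bool(true)
var_dump(htmlspecialchars($utf8, ENT_QUOTES, 'UTF-8') === $utf8);           // bool(true)
```

So the third parameter has to describe the bytes you actually have, not the encoding you wish you had.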
Edit: some random thoughts about updated question
App (supposedly) uses ISO-8859-1 but you force the browser to convert to UTF-8:
accept-charset="UTF-8"
When you receive the form you process it as if it was ISO-8859-1 (which it isn't):
htmlspecialchars($_POST['text_to_edit'], ENT_QUOTES, 'ISO-8859-1')
Finally you convert from fake ISO-8859-1 to UTF-8:
utf8_encode($txt_edi)
... and you inject the untrusted input into a SQL statement:
"UPDATE mytable SET description = '".utf8_encode($txt_edi)."' "
... even though your database class apparently supports prepared statements:
$query = $ligacao -> prepare(...)
Nothing in this code illustrates the problem in the original question (displaying data) but I have the impression that either it works in PHP/5.2 by pure chance or stuff in database is already corrupted (or both).
At this point, I'd normally suggest switching everything to UTF-8 and forgetting about encodings and conversions forever. But there's an added problem: you convert to HTML before storing in the database.
So, sorry, I'm completely lost. I'll gladly remove this answer if you consider it isn't useful.
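For what it's worth, this is roughly what the "everything UTF-8, escape only on output" approach could look like. It is a sketch under assumptions: an in-memory SQLite database (needs pdo_sqlite) stands in for the MySQL connection so it runs without a server, the id column and WHERE clause are hypothetical, and $text stands in for $_POST['text_to_edit'].

```php
<?php
// Assumption: in-memory SQLite replaces the real MySQL connection.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE mytable (id INTEGER PRIMARY KEY, description TEXT)');
$pdo->exec("INSERT INTO mytable (id, description) VALUES (1, '')");

// Stand-in for $_POST['text_to_edit'], already UTF-8 from the form.
$text = "\xC3\xA1 \xC3\xA9 <b>";

// Store the raw text through a bound parameter: no utf8_encode(),
// no htmlspecialchars(), no string concatenation.
$stmt = $pdo->prepare('UPDATE mytable SET description = :description WHERE id = :id');
$stmt->execute([':description' => $text, ':id' => 1]);

// Escape only at the point where the value is written into HTML.
$stored = $pdo->query('SELECT description FROM mytable WHERE id = 1')->fetchColumn();
echo htmlspecialchars($stored, ENT_QUOTES, 'UTF-8'), "\n";
```

The database then holds the text exactly as typed, and the HTML escaping can change later without touching stored data.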

Why does my output change?

I'm working with UTF-8 encoding in PHP and I keep managing to get the output just as I want it. And then without anything happening with the code, the output all of a sudden changes.
Previously I was getting Hebrew output. Now I'm getting "&&&&&".
Any ideas what might be causing this?
These are the most common places where the problem can come from:
Your editor that you’re creating the PHP/HTML files in
The web browser you are viewing your site through
Your PHP web application running on the web server
The MySQL database
Anywhere else external you’re reading/writing data from (memcached, APIs, RSS feeds, etc)
And a few things you can try:
Configuring your editor
Ensure that your text editor, IDE or whatever you’re writing the PHP code in saves your files in UTF-8 format. Your FTP client, scp, SFTP client doesn’t need any special UTF-8 setting.
Making sure that web browsers know to use UTF-8
To make sure your users’ browsers all know to read/write all data as UTF-8 you can set this in two places.
The content-type tag
Ensure the content-type META header specifies UTF-8 as the character set like this:
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
The HTTP response headers
Make sure that the Content-Type response header also specifies UTF-8 as the character-set like this:
ini_set('default_charset', 'utf-8')
Configuring the MySQL Connection
Now you know that all of the data you’re receiving from the users is in UTF-8 format we need to configure the client connection between the PHP and the MySQL database.
There’s a generic way of doing this by simply executing the MySQL query:
SET NAMES utf8;
…and depending on which client/driver you’re using there are helper functions to do this more easily instead:
With the built in mysql functions
mysql_set_charset('utf8', $link);
With MySQLi
$mysqli->set_charset("utf8")
With PDO_MySQL (as you connect)
$pdo = new PDO(
'mysql:host=hostname;dbname=defaultDbName',
'username',
'password',
array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8")
);
The MySQL Database
We’re pretty much there now, you just need to make sure that MySQL knows to store the data in your tables as UTF-8. You can check their encoding by looking at the Collation value in the output of SHOW TABLE STATUS (in phpmyadmin this is shown in the list of tables).
If your tables are not already in UTF-8 (it’s likely they’re in latin1) then you’ll need to convert them by running the following command for each table:
ALTER TABLE myTable CHARACTER SET utf8 COLLATE utf8_general_ci;
One last thing to watch out for
With all of these steps complete now your application should be free of any character set problems.
There is one thing to watch out for: most of the PHP string functions are not Unicode-aware, so for example if you run strlen() against a string containing multi-byte characters it’ll return the number of bytes in the input, not the number of characters. You can work around this by using the Multibyte String PHP extension, though it’s not that common for these byte/character issues to cause problems.
Taken from here: http://webmonkeyuk.wordpress.com/2011/04/23/how-to-avoid-character-encoding-problems-in-php/
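The byte-versus-character difference is easy to see with a short Hebrew word. A minimal sketch, assuming the mbstring extension is loaded; the word is written as explicit UTF-8 bytes so the result does not depend on how the file is saved.

```php
<?php
$word = "\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D"; // Hebrew "שלום" as UTF-8 bytes

echo strlen($word), "\n";                             // 8: bytes
echo mb_strlen($word, 'UTF-8'), "\n";                 // 4: characters
echo bin2hex(substr($word, 0, 1)), "\n";              // d7: half of a character
echo bin2hex(mb_substr($word, 0, 1, 'UTF-8')), "\n";  // d7a9: the whole first letter
```

Cutting with substr() in the middle of a character is one way valid output can suddenly turn into garbage.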
Try after setting the content type with header like this
header('Content-Type: text/html; charset=utf-8');
Try this function:
$html = "Bla Bla Bla...";
$html = mb_convert_encoding($html, 'HTML-ENTITIES', "UTF-8");
for more - http://php.net/manual/en/function.mb-convert-encoding.php
I put together this method and called it in the file I'm working with, and that seemed to resolve the issue.
function setutf_8()
{
header('Content-Type: text/html; charset=utf-8');
mb_internal_encoding('UTF-8');
mb_http_output('UTF-8');
mb_http_input('UTF-8');
mb_language('uni');
mb_regex_encoding('UTF-8');
ob_start('mb_output_handler');
}
Thank you for all your help! :)

PHP/MySQL encoding problems. � instead of certain characters

I have come across some problems when inputting certain characters into my MySQL database using PHP. What I am doing is submitting user-entered text to a database. I cannot figure out what I need to change to allow any kind of character to be put into the database and printed back out through PHP as it's supposed to be.
My MySQL collation is: latin1_swedish_ci
Just before I send the text to the database from my form I use mysql_real_escape_string() on the data.
Example below
this text:
�People are just as happy as they make up their minds to be.�
� Abraham Lincoln
is supposed to look like this:
“People are just as happy as they make up their minds to be.”
― Abraham Lincoln
As mentioned by others, you need to convert to UTF8 from end to end if you want to support "special" characters. This means your web page, PHP, mysql connection and mysql table. The web page is fairly simple, just use the meta tag for UTF8. Ideally your headers would say UTF8 also.
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
Set your PHP to use UTF8. Things would probably work anyway, but it's a good measure to do this:
mb_internal_encoding('UTF-8');
mb_http_output('UTF-8');
mb_http_input('UTF-8');
For mysql, you want to convert your table to UTF8, no need to export/import.
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8
You can, and should, configure MySQL to default to utf8. But you can also run the query:
SET NAMES UTF8
as the first query after establishing a connection and that will "convert" your database connection to UTF8.
That should solve all your character display problems.
The likeliest cause of the problem is that the database connection is set to latin1 but you are feeding it text encoded in UTF-8. The simplest way to solve this is to convert your input into what the client expects:
$quote = iconv("UTF-8", "WINDOWS-1252//TRANSLIT", $quote);
(What MySQL calls latin1 is windows-1252 in the rest of the world.) Note that many characters, such as the quotation dash U+2015 that you use there, cannot be represented in this encoding and will be converted into something else. Ideally you should change the column encoding to utf8.
An alternative solution: set the database connection to utf8. It doesn't matter how the columns are encoded: MySQL internally converts text from the connection encoding into the storage encoding, you can keep the columns as latin1 if you want to. (If you do, the quotation dash U+2015 will be turned into a question mark ? because it's not in latin1)
How to set the connection encoding depends on what library you are using: if you use the deprecated MySQL library it's mysql_set_charset, if MySQLi it's mysqli_set_charset, if PDO add charset=utf8 to the DSN.
If you do this you'll have to set the page encoding to UTF-8 with the Content-Type header.
Otherwise you would be having the same problem with the browser: feeding it text encoded in UTF-8 when it's expecting something else:
header("Content-Type: text/html; charset=utf-8");
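For the PDO case, the DSN parameter that pdo_mysql reads is charset (honoured from PHP 5.3.6 on). A minimal sketch follows; mysqlDsn() is a hypothetical helper, and the host, database and credentials are placeholders.

```php
<?php
/**
 * Builds a pdo_mysql DSN that fixes the connection charset.
 * Host and database names passed in are placeholders.
 */
function mysqlDsn(string $host, string $dbname, string $charset = 'utf8'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbname, $charset);
}

echo mysqlDsn('hostname', 'defaultDbName'), "\n";
// mysql:host=hostname;dbname=defaultDbName;charset=utf8

// Usage (needs a reachable server, so left commented out):
// $pdo = new PDO(mysqlDsn('hostname', 'defaultDbName'), 'username', 'password');
```

MySQL's utf8 stores at most three bytes per character; pass 'utf8mb4' instead if you need four-byte characters such as emoji.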
The solutions provided are helpful if starting from scratch. Putting all possible connections to UTF-8 is indeed the safest. UTF-8 is the most used charset on the net for a variety of reasons.
Some suggestions and a word of warning:
copy the tables you want to sanitize with a unique prefix (tmp_)
although your db-connection is forced to utf8, check your General Settings collation and change it to utf8_bin if that was not done yet
you need to run this on the local server
the funny char error is mostly due to mixing LATIN1 with UTF-8 configurations. This solution is designed for this. It could work with char-sets other than LATIN1 but I haven't checked this
check these tmp_tables extensively before copying back to the original
Build the 2 arrays needed for the magic:
$chars = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES, "UTF-8");
$LATIN1 = $UTF8 = array();
// each() was removed in PHP 8; foreach walks the same key/value pairs
foreach ($chars as $key => $val) {
$UTF8[] = $key;
$LATIN1[] = $val;
}
Now build up the routines you need: (tables->)rows->fields and at each field call
$row[$field] = mysql_real_escape_string(str_replace($LATIN1 , $UTF8 , $row[$field]));
$q[] = "$field = '{$row[$field]}'";
Finally build up and send the query:
mysql_query("UPDATE $table SET " . implode(" , " , $q) . " WHERE id = '{$row['id']}' LIMIT 1");
change the MySQL collation to utf8_unicode_ci or utf8_general_ci, including the table and the database.
You will need to set your database to utf-8, yes. There are many ways to do it: by changing the config file, via phpMyAdmin, or by calling a PHP function (sorry, memory blank) right before inserting into and updating MySQL.
Unfortunately, I think you will have to re-enter any data you entered before.
One thing you also need to know, from personal experience: make sure all tables with relations have the same collation or you won't be able to JOIN them.
As reference: http://dev.mysql.com/doc/refman/5.6/en/charset-syntax.html
Also, it can be an Apache setting. We've experienced the same issue on a 'free-hosting' server as well as on my brother's server. Once we switched to another server, all the characters became neat. Verify your Apache settings; sorry, but I can't shed more light on Apache's config.
Get rid of everything else; you just need to follow these three points and every problem with special language characters will be resolved.
1- Define the collation of your table to be utf8_general_ci.
2- Define <meta http-equiv="content-type" content="text/html; charset=utf-8"> in the HTML after the head tag.
3- Call mysql_set_charset('utf8',$link_identifier); in the file where you make the connection to the database, right after selecting the database with mysql_select_db. This will allow you to add and retrieve data properly in whatever language it is.
If your text has been encoded and decoded with the wrong encoding and so the mojibake is actually "solidified" into unicode characters, then the solutions mentioned so far won't work. I ended up having success with the ftfy Python package to automatically detect/fix mojibake:
https://github.com/LuminosoInsight/python-ftfy
https://pypi.org/project/ftfy/
https://ftfy.readthedocs.io/en/latest/
>>> import ftfy
>>> print(ftfy.fix_encoding("(à¸‡'âŒ£')à¸‡"))
(ง'⌣')ง
Hopefully this helps people who are in a similar situation.
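If you need to stay in PHP, one layer of the most common kind of mojibake (UTF-8 bytes that were read as Windows-1252 and then saved as UTF-8 again) can sometimes be reversed by hand. This is only a sketch of that single case, not a port of ftfy: undoMojibakeLayer() is a hypothetical helper and it assumes the mbstring extension.

```php
<?php
/**
 * Tries to undo one layer of "UTF-8 read as Windows-1252" mojibake.
 * Returns the input unchanged when the repair does not apply.
 */
function undoMojibakeLayer(string $text): string
{
    // Map each character back to the single Windows-1252 byte it came from.
    $bytes = mb_convert_encoding($text, 'Windows-1252', 'UTF-8');

    // Accept the result only if nothing was lost on the way (round trip)
    // and the recovered bytes form valid UTF-8.
    $roundTrip = mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252');
    if ($roundTrip === $text && mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }
    return $text;
}

$broken = "ST\xC3\x83\xE2\x80\x93VER";           // "STÃ–VER" stored as UTF-8
echo bin2hex(undoMojibakeLayer($broken)), "\n";  // 5354c396564552, i.e. "STÖVER"
```

Text that was never garbled fails one of the two checks (or is plain ASCII) and comes back as it went in, but test on a copy of your data before trusting it.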

Same dataset outputs different characters : phpmyadmin / own query

I'm trying to get some data from the db, but the output isn't what I expected.
Doing my own querying on the db, I get this output: string 'C�te d�Ivoire' (length=13)
Querying the db from phpMyAdmin I get normal output: Côte d’Ivoire
php.ini default charset, MySQL db default charset, and <meta> charset are all set to utf-8.
I can't figure out where the encoding goes wrong, given that I get different output with the same configuration.
P.S. : using mysqli driver .
In the same page that gives you wrong results, try first running this instruction
print base64_encode("Côte");
The correct answer is Q8O0dGU.... If you get something else, like Q/R0ZQo..., this means that your script is working with another charset (here Latin-1) instead of UTF-8. It's still possible that also MySQL and also the browser are playing tricks, but the line above ensures that PHP and/or your editor are playing you false.
Next, extract Côte from the database and output its base64_encode. If you see Q8O0..., then the connection between MySQL and PHP is safely UTF8. If not, then whatever else might also be needed, you need to change the MySQL charset (SET NAMES utf8 and/or ALTER of table and database collation).
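The two base64 values come straight from the bytes, which you can confirm with explicit byte strings. A minimal sketch; the escaped literals stand in for what your editor or your database would hand to PHP.

```php
<?php
$utf8   = "C\xC3\xB4te"; // "Côte" as UTF-8: ô is the two bytes C3 B4
$latin1 = "C\xF4te";     // "Côte" as Latin-1: ô is the single byte F4

echo base64_encode($utf8), "\n";    // Q8O0dGU=
echo base64_encode($latin1), "\n";  // Q/R0ZQ==
```

A trailing newline in the input changes only the tail of the output (for example ZQo= instead of ZQ==), so compare the first four characters.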
If PHP is UTF8, and MySQL is UTF8, and still you see invalid characters, then it's something between PHP and the browser. Verify that the content type header is sent correctly; if not, try sending it yourself as first thing in the script:
Header('Content-Type: text/html; charset=UTF8');
For example in Apache configuration you should have
AddDefaultCharset utf-8
Verify also that your browser is not set to override both server charset and auto-detection.
NOTE: as a rule of thumb, if you get a single diamond with a question mark instead of a UTF8 international character, this means that a UTF8 reader received an invalid UTF8 byte sequence. In other words, the entity showing the diamond (your browser) is expecting UTF8, but is receiving something else, for example Latin1 a.k.a. ISO-8859-1.
Another difficult-to-track way of getting that error is if the output somehow contains a byte order mark (BOM). This may happen if you create a file such as
###<?php
Header("Content-Type: text/html; charset=UTF8");
?>
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF8" />
</head>
<body>
Hellò, world!
</body>
</html>
where that ### is an (invisible in most editors) UTF8 BOM. To remove it, you either need to save the file as "without BOM" if the editor allows it, or use a different editor.
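If the BOM arrives in data you read at runtime (an included template, a CSV, an API response) rather than in the script file itself, it can be detected and stripped in code. A minimal sketch; stripUtf8Bom() is a hypothetical helper.

```php
<?php
/** Removes a leading UTF-8 byte order mark (EF BB BF), if present. */
function stripUtf8Bom(string $text): string
{
    return strncmp($text, "\xEF\xBB\xBF", 3) === 0 ? substr($text, 3) : $text;
}

$withBom = "\xEF\xBB\xBF<html>";

echo bin2hex(substr($withBom, 0, 3)), "\n";  // efbbbf
echo stripUtf8Bom($withBom), "\n";           // <html>
```

This does not help when the BOM sits in front of the opening PHP tag of the script itself; that file has to be re-saved without a BOM.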
If you do your "own querying" with the command line tool mysql, you have to set the option --default-character-set=utf8, too. Otherwise, please tell us how you do your own querying.
