PHP 5.4 multipart/form-data UTF-8 encoding - php

I'm having a problem with UTF-8 encoding when posting form data as "multipart/form-data"; without multipart/form-data everything works well. But since I have to upload files in the same POST, I need to use multipart/form-data.
The problem started after upgrading from PHP 5.3.x to PHP 5.4.4-14 (bundled with Debian Wheezy); the same scripts work well on a PHP 5.3 test server.
All of my documents are saved in UTF-8 and have <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> tags.
I tested with different browsers on different computers.
mb_detect_encoding() detects the posted string as UTF-8.
I tried AddDefaultCharset utf-8 in the Apache configuration.
You can test my scripts here by copying and pasting the following string with Turkish characters (example string: öçşipğopüp )
http://sa.chelona.com.tr/haber-ekle.html
I also found a related question, "UTF-8 text is garbled when form is posted as multipart/form-data in PHP", but it recommends re-installing Apache/PHP and that's not possible in my situation. Is this a known PHP/Apache bug?

Do a simple conversion from UTF-8 to ISO-8859-9 (the Turkish alphabet encoding) and the problem should be solved:
iconv('UTF-8', "ISO-8859-9", $string);
Example Input : öçşipğopüp
Example Form:
<form method="post" enctype="multipart/form-data" action ="self.php">
<input type="text" name="hello" />
<input type="submit" name="test" />
</form>
Simple dump:
var_dump($_POST['hello'],iconv('UTF-8', "ISO-8859-9", $_POST['hello']));
Output
string 'öçşipğopüp ' (length=16)
string 'öçþipðopüp ' (length=11)

I'm writing this to answer my own question... I hope it will help somebody else...
If you use PHP 5.4.x, changing mbstring.http_input from "auto" to "pass" may solve your problem.

My PHP version is 5.4.45, and changing mbstring.http_input from auto to pass works very well. In the php.ini file the default value is pass. For more detail about this setting, see here.
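To confirm which value is actually in effect on a given server, the settings can be read at runtime. This is a minimal sketch and not part of the original answers: the helper name mbstring_request_settings() is made up for illustration, and the choice of which directives to list is an assumption.

```php
<?php
// Hypothetical helper: collect the mbstring directives that affect
// how PHP treats incoming request data.
function mbstring_request_settings(): array
{
    $names = ['mbstring.http_input', 'mbstring.encoding_translation'];
    $settings = [];
    foreach ($names as $name) {
        // ini_get() returns false when the directive does not exist,
        // for example when the mbstring extension is not loaded.
        $settings[$name] = ini_get($name);
    }
    return $settings;
}

foreach (mbstring_request_settings() as $name => $value) {
    echo $name, ' = ', $value === false ? '(not available)' : var_export($value, true), PHP_EOL;
}
```

Running this from the same SAPI that serves the form (not only from the command line) matters, because the CLI and the web server can load different php.ini files.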

You should try re-installing your WAMP or XAMPP stack, or your Apache and PHP, and run your code on someone else's machine with the same PHP version. If the code runs there, try to figure out why it is not working on your server, or check the file_uploads setting in your PHP configuration.

If uncommenting the default charset line in php.ini does something, it will be easy to fix.
Remember to restart Apache after changing it.

I don't think you should be using mb_detect_encoding to determine the encoding in this case.
If you must use it, then maybe you need to set the detection order to make sure UTF-8 is higher up the list, see http://www.php.net/manual/en/function.mb-detect-order.php
You've set the form's accept-charset to UTF-8; you've set the original page to UTF-8: all current browsers will send UTF-8. HTML 5 specifies this FWIW: http://www.w3.org/TR/2011/WD-html5-20110405/association-of-controls-and-forms.html#multipart-form-data
The string will be UTF-8, don't attempt any conversion of it, and you will be fine.
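If the goal is only to confirm that a posted value really is UTF-8, validating it is more reliable than guessing its encoding. A minimal sketch, assuming the mbstring extension is available; the helper name is_valid_utf8() is hypothetical, and the test strings are written as byte escapes so the result does not depend on how this file itself is saved:

```php
<?php
// Hypothetical helper: accept a posted value only if it is well-formed UTF-8.
// Requires the mbstring extension.
function is_valid_utf8(string $value): bool
{
    return mb_check_encoding($value, 'UTF-8');
}

$utf8   = "\xC3\xB6\xC3\xA7\xC5\x9F"; // "öçş" encoded as UTF-8
$latin5 = "\xF6\xE7\xFE";             // "öçş" encoded as ISO-8859-9

var_dump(is_valid_utf8($utf8));   // bool(true)
var_dump(is_valid_utf8($latin5)); // bool(false)
```

In a form handler the same check could be applied to a field such as $_POST['hello'] before storing it.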
But if you post some of your PHP code then maybe it will be clear what you're trying to do and what's going wrong...

Sorry this is more of an idea for a workaround than actual solution, however if all traditional methods have failed, and you can't reinstall anything, try converting from the UTF8 code points. Something like using a base64 encoding before sending and then decode on receive. Or convert to a hex string and decode after receiving.
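The round trip behind this workaround can be sketched as follows. This only illustrates the idea: both halves are shown in PHP, whereas in a real form the encoding step would have to happen on the client before submission, and the variable names are made up.

```php
<?php
// Sender side (shown in PHP for illustration only).
$original = "\xC3\xB6\xC3\xA7\xC5\x9F"; // "öçş" as UTF-8 bytes
$asHex    = bin2hex($original);          // ASCII-only: "c3b6c3a7c59f"
$asBase64 = base64_encode($original);    // ASCII-only as well

// Receiver side: decode back to the original bytes.
$fromHex    = hex2bin($asHex);
$fromBase64 = base64_decode($asBase64, true); // strict mode: false on invalid input

var_dump($fromHex === $original);    // bool(true)
var_dump($fromBase64 === $original); // bool(true)
```

Because the transported value contains only ASCII characters, nothing between browser and script can reinterpret it in a different charset.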

You need to add headers in both PHP and HTML, in lowercase, like this:
<?php header('content-type: text/html; charset=utf-8'); ?>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<form method="post" enctype="multipart/form-data" action ="self.php">
...
</form>
</body>
</html>
Remember: save all PHP and HTML files as UTF-8 without BOM.

Your example page looks correct and the steps you have taken seem to cover most of the important points, but there is one more thing I would check. You wrote that the data is stored in a MySQL database with the UTF-8 charset, but this doesn't necessarily mean that the PHP connection object works with this charset too.
// tells the mysqli connection to deliver UTF-8 encoded strings.
$db = new mysqli($dbHost, $dbUser, $dbPassword, $dbName);
$db->set_charset('utf8');
// tells the pdo connection to deliver UTF-8 encoded strings.
$dsn = "mysql:host=$dbHost;dbname=$dbName;charset=utf8";
$db = new PDO($dsn, $dbUser, $dbPassword);
The examples above show how to set the charset for MySQLi or PDO. Preparing the connection object this way makes you independent of the database configuration; if necessary, the connection will even convert the returned/sent data.
To test this in your page, make sure that the charset is set, before inserting/querying the database.

mb_internal_encoding("UTF-8");
Add this line before the code that handles your string.

After a long time trying unpack() and the proposals from the answers here, I found a pitfall, and maybe your encoding problem has the same cause.
All I had to do was call htmlentities() with UTF-8 specified explicitly:
htmlentities(stripslashes(trim(rtrim($_POST['title']))), ENT_COMPAT, "utf-8");
This is for PHP 5.2.x.
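The effect of the third argument can be seen on a single character. A small sketch, not taken from the answer; note that before PHP 5.4 the default charset of htmlentities() was ISO-8859-1, which is why being explicit mattered on 5.2:

```php
<?php
$title = "\xC3\xBC"; // "ü" encoded as UTF-8 (two bytes)

// Charset stated explicitly: the two bytes are read as one character.
$correct = htmlentities($title, ENT_COMPAT, 'UTF-8');
echo $correct, PHP_EOL; // &uuml;

// Wrong charset: each byte is treated as a separate Latin-1 character,
// so two unrelated entities are produced instead of one.
$wrong = htmlentities($title, ENT_COMPAT, 'ISO-8859-1');
echo $wrong, PHP_EOL;
```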

Related

can't save arabic in php variable?

I am really, really new to PHP.
I have been doing ASP.NET etc. earlier, and PHP appears a whole lot different.
I am using Drupal 7, and the project has already been built.
I was told to do something very trivial but I am unable to do so. It concerns Arabic.
I declare a simple variable like $simpleText = "شهس ". Then I call drupal_set_message($simpleText).
What I then see in the web browser is ??? instead of the Arabic. I have confirmed that the content type of the page is set to UTF-8. This is the meta tag of the rendered HTML in the browser
Can you please help me identify how to fix this issue?
Thanks.
I completed many PHP projects (including Drupal projects) and there is nothing wrong with PHP and Arabic :)
add this to your HTML and it should work just fine
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
or using PHP do this before releasing any output
header('Content-Type: text/html; charset=utf-8');
You need to set this line in connection string:
mysql_query("set character_set_server='utf8'");
mysql_query("set names 'utf8'");
You should save your file with UTF-8 encoding. If you are using Visual Studio, use "Advanced Save Options" with encoding: "Unicode (UTF-8 without signature) - Codepage 65001"

php - foreign characters not showing correctly

I am trying to get foreign characters to display correctly on my website.
When I try to write "Português" it outputs this:
Português
The code I use is:
$name = htmlspecialchars(stripslashes($f['forum_name']));
I also tried this:
$name = html_entity_decode(stripslashes(stripslashes($f['forum_desc'])));
But that gave me:
Português
What am I doing wrong?
Edit: $f is coming from this:
$sf=mysql_query("SELECT * FROM forum_cats WHERE forum_type='0' AND forum_type_id='".$h['forum_id']."'");
First, make sure your PHP program file is saved with UTF-8 encoding. (a decent editor should allow you to set the encoding)
Second, make sure that your HTML code specifies UTF-8 encoding: Make sure you have the following meta tag in your HTML head:
<meta charset="UTF-8">
Thirdly, throw away all that entity decoding and especially throw away the stripslashes().
You may also need to do further work to make sure that everything in your system is using UTF-8 encoding (eg the database, other input files).
Make use of utf-8 decode
<?php
echo utf8_decode("Português");//Português
EDIT : (From your latest question update)
Add this on top of your PHP code.
<?php
ini_set('default_charset','utf-8');
mysql_set_charset('utf8');
header('Content-type: text/html; charset=utf-8');
Try this:
<?php echo iconv(mb_detect_encoding($f['forum_name'], "UTF-8,ISO-8859-1"), "UTF-8", $f['forum_name']); ?>
Use mb_detect_encoding() to detect the charset of your strings and iconv() to convert a string to the requested character encoding.
You can refer to the official documentation for mb_detect_encoding() and iconv().
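Encoding detection is a heuristic and can misjudge short strings, so a more predictable variant is sketched below. It swaps mb_detect_encoding() for a validity check with mb_check_encoding(): a string that is already valid UTF-8 is left alone, and anything else is assumed to be ISO-8859-1. The helper name to_utf8() and the fallback charset are assumptions for illustration; mbstring and iconv are required.

```php
<?php
// Hypothetical helper: return $value as UTF-8, assuming it is either
// already UTF-8 or encoded in $fallback (ISO-8859-1 by default).
function to_utf8(string $value, string $fallback = 'ISO-8859-1'): string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value; // already well-formed UTF-8: do not convert twice
    }
    $converted = iconv($fallback, 'UTF-8', $value);
    // iconv() returns false on failure; keep the input in that case.
    return $converted === false ? $value : $converted;
}

$latin1 = "Portugu\xEAs";     // "Português" in ISO-8859-1
$utf8   = "Portugu\xC3\xAAs"; // "Português" in UTF-8

var_dump(to_utf8($latin1) === $utf8); // bool(true)
var_dump(to_utf8($utf8) === $utf8);   // bool(true)
```

This is a repair for mixed legacy data; fixing the database connection charset so that strings arrive as UTF-8 in the first place is still the cleaner solution.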

Why does my output change?

I'm working with UTF-8 encoding in PHP and I keep managing to get the output just as I want it. And then without anything happening with the code, the output all of a sudden changes.
Previously I was getting hebrew output. Now I'm getting "&&&&&".
Any ideas what might be causing this?
These are the most common places where the problem occurs:
Your editor that you’re creating the PHP/HTML files in
The web browser you are viewing your site through
Your PHP web application running on the web server
The MySQL database
Anywhere else external you’re reading/writing data from (memcached, APIs, RSS feeds, etc)
And a few things you can try:
Configuring your editor
Ensure that your text editor, IDE or whatever you’re writing the PHP code in saves your files in UTF-8 format. Your FTP client, scp, SFTP client doesn’t need any special UTF-8 setting.
Making sure that web browsers know to use UTF-8
To make sure your users’ browsers all know to read/write all data as UTF-8 you can set this in two places.
The content-type tag
Ensure the content-type META header specifies UTF-8 as the character set like this:
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
The HTTP response headers
Make sure that the Content-Type response header also specifies UTF-8 as the character-set like this:
ini_set('default_charset', 'utf-8')
Configuring the MySQL Connection
Now you know that all of the data you’re receiving from the users is in UTF-8 format we need to configure the client connection between the PHP and the MySQL database.
There’s a generic way of doing by simply executing the MySQL query:
SET NAMES utf8;
…and depending on which client/driver you’re using there are helper functions to do this more easily instead:
With the built in mysql functions
mysql_set_charset('utf8', $link);
With MySQLi
$mysqli->set_charset("utf8")
With PDO_MySQL (as you connect)
$pdo = new PDO(
'mysql:host=hostname;dbname=defaultDbName',
'username',
'password',
array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8")
);
The MySQL Database
We’re pretty much there now, you just need to make sure that MySQL knows to store the data in your tables as UTF-8. You can check their encoding by looking at the Collation value in the output of SHOW TABLE STATUS (in phpmyadmin this is shown in the list of tables).
If your tables are not already in UTF-8 (it’s likely they’re in latin1) then you’ll need to convert them by running the following command for each table:
ALTER TABLE myTable CHARACTER SET utf8 COLLATE utf8_general_ci;
One last thing to watch out for
With all of these steps complete now your application should be free of any character set problems.
There is one thing to watch out for: most of the PHP string functions are not Unicode-aware, so for example if you run strlen() against a string containing multi-byte characters it’ll return the number of bytes in the input, not the number of characters. You can work around this by using the Multibyte String (mbstring) PHP extension, though it’s not that common for these byte/character issues to cause problems.
Taken from here: http://webmonkeyuk.wordpress.com/2011/04/23/how-to-avoid-character-encoding-problems-in-php/
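The byte/character difference described above can be seen with a short test. This sketch assumes the mbstring extension is loaded and writes the sample string as byte escapes so the result does not depend on the encoding of the source file:

```php
<?php
// "öçşipğopüp" encoded as UTF-8: 5 two-byte characters plus 5 ASCII letters.
$text = "\xC3\xB6\xC3\xA7\xC5\x9Fip\xC4\x9Fop\xC3\xBCp";

echo strlen($text), PHP_EOL;             // 15 (bytes)
echo mb_strlen($text, 'UTF-8'), PHP_EOL; // 10 (characters)

// substr() cuts between bytes and can split a character in half;
// mb_substr() cuts between characters.
$firstByte = substr($text, 0, 1);
$firstChar = mb_substr($text, 0, 1, 'UTF-8');

var_dump(mb_check_encoding($firstByte, 'UTF-8')); // bool(false)
var_dump($firstChar === "\xC3\xB6");              // bool(true)
```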
Try after setting the content type with header like this
header('Content-Type: text/html; charset=utf-8');
Try this function:
$html = "Bla Bla Bla...";
$html = mb_convert_encoding($html, 'HTML-ENTITIES', "UTF-8");
for more - http://php.net/manual/en/function.mb-convert-encoding.php
I put together this method and called it in the file I'm working with, and that seemed to resolve the issue.
function setutf_8()
{
header('content-type: text/html; charset=utf-8');
mb_internal_encoding('UTF-8');
mb_http_output('UTF-8');
mb_http_input('UTF-8');
mb_language('uni');
mb_regex_encoding('UTF-8');
ob_start('mb_output_handler');
}
Thank you for all your help! :)

Same dataset outputs different characters : phpmyadmin / own query

I'm trying to get some data from the database, but the output isn't what I expected.
Doing my own querying on the database, I get this output: string 'C�te d�Ivoire' (length=13)
Querying the database from phpMyAdmin, I get normal output: Côte d’Ivoire
The php.ini default charset, the MySQL database default charset, and the <meta> charset are all set to UTF-8.
I can't figure out where the encoding goes wrong, given that I get different output with the same configuration.
P.S. : using mysqli driver .
In the same page that gives you wrong results, try first running this instruction
print base64_encode("Côte");
The correct answer is Q8O0dGU.... If you get something else, like Q/R0ZQo..., this means that your script is working with another charset (here Latin-1) instead of UTF-8. It's still possible that also MySQL and also the browser are playing tricks, but the line above ensures that PHP and/or your editor are playing you false.
Next, extract Côte from the database and output its base64_encode. If you see Q8O0..., then the connection between MySQL and PHP is safely UTF8. If not, then whatever else might also be needed, you need to change the MySQL charset (SET NAMES utf8 and/or ALTER of table and database collation).
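The two base64 values mentioned above can be reproduced independently of the editor's encoding by writing the bytes out explicitly. A small sketch:

```php
<?php
$utf8   = "C\xC3\xB4te"; // "Côte" encoded as UTF-8 (5 bytes)
$latin1 = "C\xF4te";     // "Côte" encoded as ISO-8859-1 (4 bytes)

echo base64_encode($utf8), PHP_EOL;   // Q8O0dGU=
echo base64_encode($latin1), PHP_EOL; // Q/R0ZQ==
```

The Q/R0ZQo... variant quoted above corresponds to the same Latin-1 bytes followed by a trailing newline. Either way, a result starting with Q8O0 indicates UTF-8 and one starting with Q/R0 indicates Latin-1.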
If PHP is UTF8, and MySQL is UTF8, and still you see invalid characters, then it's something between PHP and the browser. Verify that the content type header is sent correctly; if not, try sending it yourself as first thing in the script:
header('Content-Type: text/html; charset=UTF-8');
For example in Apache configuration you should have
AddDefaultCharset utf-8
Verify also that your browser is not set to override both server charset and auto-detection.
NOTE: as a rule of thumb, if you get a single diamond with a question mark instead of a UTF-8 international character, this means that a UTF-8 reader received an invalid UTF-8 byte sequence. In other words, the entity showing the diamond (your browser) is expecting UTF-8, but is receiving something else, for example Latin-1, a.k.a. ISO-8859-1.
Another difficult-to-track way of getting that error is if the output somehow contains a byte order mark (BOM). This may happen if you create a file such as
###<?php
Header("Content-Type: text/html; charset=UTF8");
?>
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF8" />
</head>
<body>
Hellò, world!
</body>
</html>
where that ### is an (invisible in most editors) UTF8 BOM. To remove it, you either need to save the file as "without BOM" if the editor allows it, or use a different editor.
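Since the BOM is invisible in most editors, it can be easier to check for it in code. A minimal sketch, with a made-up helper name strip_utf8_bom(), that looks for the three BOM bytes EF BB BF at the start of a string:

```php
<?php
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: remove a leading UTF-8 BOM from a string, if present.
function strip_utf8_bom(string $contents): string
{
    if (strncmp($contents, UTF8_BOM, 3) === 0) {
        return substr($contents, 3);
    }
    return $contents;
}

$withBom = UTF8_BOM . "<html>";
var_dump(strip_utf8_bom($withBom) === "<html>"); // bool(true)
var_dump(strip_utf8_bom("<html>") === "<html>"); // bool(true)
```

To inspect a script on disk, the same comparison could be applied to the result of file_get_contents() for that file; a match means the file should be re-saved without BOM.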
If you do your "own querying" with the command line tool mysql, you have to set the option --default-character-set=utf8, too. Otherwise, please tell us how you do your own querying.

php using utf8 without utf_encode

I'm running a German website which gets its content from a MySQL database.
I've defined the charset as UTF-8 as follows:
<meta http-equiv='Content-Type' content='text/html;charset=utf-8' />
The problem is that when fetching and displaying content from the database, I always need to use utf8_encode() in order to get the proper German umlauts.
I want to keep the UTF-8 charset for my site, as I'll have to add more languages which have special characters.
Any ideas on how to echo database contents 1:1 without having to call utf8_encode()?
thanks
Hard to tell without seeing how you are connecting to your database, but a common problem is the database connection itself.
After opening / selecting the database you need to set:
$db->exec('SET CHARACTER SET utf8'); // PDO
mysql_set_charset('utf8'); // Deprecated mysql_* extension
Whenever I want to use utf-8 with PHP and MySQL, I found that usually these two functions are the ones you should use after mysql_connect():
mysql_set_charset('utf8', $link);
mysql_query('SET NAMES utf8', $link);
Setting the content type in the header may do the trick:
header('content-type: text/html; charset=utf-8');
I had a similar problem and I solved it by adding this at the beginning of my PHP file:
ini_set('default_charset', 'UTF-8');
mb_internal_encoding('UTF-8');
Additionally, it is very important to check that you are saving your PHP file in UTF-8 format without BOM; I had a big headache with this. I recommend Notepad++: it shows the current file encoding and allows you to convert to UTF-8 without BOM if necessary.
If you would like to see my problem and solution, it is here.
Hope it can help you!
