This question already has answers here: UTF-8 all the way through (13 answers). Closed 7 years ago.
I've looked at all the relevant topics on the site, and this particular gotcha does not seem to be covered.
I exported a Word 2016 document (in French) as filtered HTML (filtered = no Office-specific stuff included). If I open the file in my browser as HTML, everything is fine: all the accents show correctly. (The charset is UTF-8, and the content isn't coming from a database.) But when I change the extension to .php and run it, all the French characters are shown as black diamonds with a question mark inside.
If I express the French characters as HTML entities, they show correctly, but I don't want to use that as the fix: there are hundreds of them, and I don't want to edit the text. It's not mine, and the author would have to proofread it all again just to check the accents.
So I figured it's a PHP (5.5.26) issue, but I can't see anything in the php.ini file that might affect this. It looks like UTF-8 is the default charset if you don't change anything.
What's the fix?
If this is a problem with just one file, you can use the following:
<?php
header('Content-Type: text/html; charset=iso-8859-1');
?>
In the HTML part, add or change:
<meta http-equiv="Content-type" content="text/html; charset=iso-8859-1" />
If you have thousands of files and don't want to change them manually, you can try modifying the following line in your php.ini:
default_charset = "utf-8"
to
default_charset = "iso-8859-1"
Save and restart the server.
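Declaring ISO-8859-1 only helps if the exported file's bytes really are ISO-8859-1 rather than UTF-8, so it can be worth checking that first. The following is a minimal sketch, assuming the file is ISO-8859-1. Word's HTML export may instead use Windows-1252, which differs from ISO-8859-1 only in the 0x80 to 0x9F range (curly quotes, the euro sign and similar), and this sketch does not handle that range. The helper names are hypothetical. Converting the bytes once is the alternative to changing the declared charset, and it leaves the wording of the text untouched.

```php
<?php
// True if $s is well-formed UTF-8; preg_match() returns false for invalid input.
function is_valid_utf8($s)
{
    return preg_match('//u', $s) === 1;
}

// Convert ISO-8859-1 bytes to UTF-8. Every Latin-1 byte is the code point of
// the same value, so bytes >= 0x80 become a two-byte UTF-8 sequence.
// Assumption: the input really is ISO-8859-1, not Windows-1252.
function latin1_to_utf8($bytes)
{
    $out = '';
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            $out .= $bytes[$i];                 // ASCII is identical in both
        } else {
            $out .= chr(0xC0 | ($b >> 6))       // lead byte
                  . chr(0x80 | ($b & 0x3F));    // continuation byte
        }
    }
    return $out;
}

$bytes = "Caf\xE9";                  // "Café" as ISO-8859-1: é is the single byte E9
var_dump(is_valid_utf8($bytes));     // bool(false): a UTF-8 reader shows a diamond here
echo latin1_to_utf8($bytes), "\n";   // Café
```

To convert a whole exported page you would apply latin1_to_utf8() to file_get_contents() of that file and write the result back; keep a copy of the original first.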
The Issue
I've been having some trouble with what I think is a UTF-8 encoding issue where posts are not being saved to my database.
The issue occurs when a user copies and pastes text from MS Word. There seems to be a particular combination of characters causing this issue (I haven't found any other variations that cause the same issue yet):
% b
% B
This means that when I var_dump() my input, I get:
string(5) "70�ck"
Instead of:
string(8) "70% back"
Edit: The database error I get is:
Incorrect string value: '\xBAck an...' for column [...]
What I've tried
I'm using the Summernote JS plugin. I've tried a different plugin (WYSIHTML5), and I've tried with no plugin at all. I've tried pasting the clipboard text as plain text. I've even got an onPaste callback on Summernote that strips all the stupid encoding/styling from MS Word (which is a Summernote-specific issue, I think).
Unfortunately, I haven't been able to get anywhere by searching 'encoding issue "% b"' and variations thereof... but I presume the combination of characters above is somehow getting translated into a character that the database doesn't support.
Database is MySQL 5.7.10 and I'm using utf8_general_ci collation on all columns.
I've set the charset to UTF-8 within CodeIgniter: $config['charset'] = 'UTF-8';
Within CodeIgniter's database config I've specified 'char_set' => 'uft8', 'dbcollat' => 'utf8_general_ci'
The page's meta tag is set to use utf-8: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
The form has the accept-charset="utf-8" attribute
Update: I've also tried the solution suggested in this question.
I think I've done all the usual troubleshooting, and I'm a bit stuck. Does anyone know why this specific combination of characters causes an issue? Perhaps I'm wrong and it's not an encoding issue at all? Does anyone have any other ideas?
You should look into doing more on the front-end side. Try setting the encoding on the form; most browsers should then send only UTF-8 to your server:
<form ... accept-charset="UTF-8">
...
</form>
See this answer for more detail.
Also, if you are using an editor, check out Quill, which allows pasting from Word.
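The database error names the single byte 0xBA, which on its own is not valid UTF-8, so MySQL rejects it. One observation worth checking, a hypothesis that nobody in the thread confirms: 0xBA is exactly what percent-decoding produces from "%ba", and rawurldecode('70%back') yields the same five bytes ("70", 0xBA, "ck") as the var_dump in the question. The sketch below reproduces that and adds a server-side guard that rejects invalid UTF-8 before the INSERT. The helper names are hypothetical.

```php
<?php
// True if $s is well-formed UTF-8 (preg_match() returns false on bad bytes).
function is_well_formed_utf8($s)
{
    return preg_match('//u', $s) === 1;
}

// Hypothetical guard to run on each submitted field before writing to MySQL.
function require_utf8($field)
{
    if (!is_well_formed_utf8($field)) {
        throw new InvalidArgumentException('Field is not valid UTF-8: ' . bin2hex($field));
    }
    return $field;
}

// Reproduce the suspected mechanism: "%ba" is read as a percent-escape.
$decoded = rawurldecode('70%back');
var_dump(is_well_formed_utf8($decoded));   // bool(false)
```

If the guard fires, the bin2hex() dump in the message shows exactly which byte arrived, which helps locate the step that altered the text.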
I'm using https://github.com/farjadtahir/pdf-invoicr.
The problem is that when I add diacritics to $invoice->setFrom() (or anywhere else), they don't show up.
I tried $invoice->setFrom(array(iconv("UTF-8", "ISO-8859-1", "ÆØÅ"))); from these comments, but the diacritics still don't work.
Next I tried https://stackoverflow.com/a/21555497/2893691; again, it didn't work.
So how can I finally get ľščťžýáíé to work as UTF-8 in invoicr?
EDIT - NEW INFO
I used mb_detect_encoding(), and it already returns UTF-8. But when I try to show, for example, the string ičo123, the result is empty; nothing is shown.
I tried adding header('Content-Type: text/html; charset=utf-8'); and the diacritics still don't work.
EDIT 2 - NEW INFO
I tried this script http://www.fpdf.org/en/script/script92.php and it's still not working. Here is a screenshot of the downloaded example from the link above:
Problem solved: I used the script from EDIT 2 and removed all iconv() calls from the phpinvoice.php file.
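The empty ičo123 in the first edit fits a simple explanation, offered here as an inference rather than something the thread states: ISO-8859-1 has no ľ, š, č, ť or ž (they belong to Central European encodings such as ISO-8859-2), so iconv("UTF-8", "ISO-8859-1", ...) cannot convert them. Without a //TRANSLIT or //IGNORE suffix on the target charset, iconv() then returns false, which prints as nothing. The sketch below lists which characters of a UTF-8 string fall outside ISO-8859-1 (code points above U+00FF). The helper names are hypothetical, and the input is assumed to be valid UTF-8.

```php
<?php
// Decode one UTF-8 character (1 to 4 bytes) into its Unicode code point.
function utf8_char_to_codepoint($ch)
{
    $b = array_map('ord', str_split($ch));
    switch (count($b)) {
        case 1:
            return $b[0];
        case 2:
            return (($b[0] & 0x1F) << 6) | ($b[1] & 0x3F);
        case 3:
            return (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6) | ($b[2] & 0x3F);
        default:
            return (($b[0] & 0x07) << 18) | (($b[1] & 0x3F) << 12)
                 | (($b[2] & 0x3F) << 6) | ($b[3] & 0x3F);
    }
}

// Characters that ISO-8859-1 cannot represent, i.e. code points above U+00FF.
function chars_missing_from_latin1($utf8)
{
    $missing = array();
    // '//u' with PREG_SPLIT_NO_EMPTY splits the string into whole characters.
    foreach (preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
        if (utf8_char_to_codepoint($ch) > 0xFF) {
            $missing[] = $ch;
        }
    }
    return $missing;
}

print_r(chars_missing_from_latin1('ľščťžýáíé'));
```

For 'ľščťžýáíé' this reports ľ, š, č, ť and ž; ý, á, í and é do exist in ISO-8859-1. For 'ÆØÅ' it reports nothing.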
First, sorry, my English isn't very good.
I have a problem: when I download an Excel file from a website (direct download), it works on Windows but not on Mac.
I get the last names, first names, etc. from a MySQL database.
The German "ä", "ü", "ö" are not converted properly on Mac.
How can I convert them? Do you know what I mean?
I work with Notepad++.
Programming Language is PHP
Excel version: 2010.
From what you've said, I suppose you have a PHP script that generates a CSV file with data from your database.
So this sounds like a typical encoding/charset problem to me. You have to define which encoding you want to store your texts in the database. In the most common case these days, that's UTF-8. For German texts (I assume that's the language, because of the umlauts) you could also use the ISO-8859-15 encoding.
It's just a guess, but in your case I think you may not have specified how the browser should interpret the received CSV file.
You normally tell the browser about it in an HTTP header.
Content-Type: text/plain; charset="ISO-8859-15"
or whatever charset you are using (maybe "UTF-8" instead).
The documentation for PHP's header() function may help you set the HTTP header.
It's also possible to define the charset in the HTML page, but I think in your case the PHP script sends a CSV file, not HTML. For the record, this is how to set the charset in HTML:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
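A minimal sketch of the header approach applied to a CSV download, assuming the names come out of MySQL as UTF-8, so the header declares UTF-8. The function names rows_to_csv() and send_csv() are hypothetical. The sketch uses text/csv, the registered media type for CSV; the charset parameter works the same way with the text/plain type from the answer. It only covers the HTTP side the answer describes. Whether Excel itself uses that charset when it opens the saved file is a separate question.

```php
<?php
// Turn rows (arrays of UTF-8 strings, e.g. names from MySQL) into CSV text.
function rows_to_csv(array $rows)
{
    $fh = fopen('php://memory', 'r+');
    foreach ($rows as $row) {
        // Separator, enclosure and escape are passed explicitly for portability.
        fputcsv($fh, $row, ',', '"', '\\');
    }
    rewind($fh);
    $csv = stream_get_contents($fh);
    fclose($fh);
    return $csv;
}

// Send the CSV as a download. The charset declared here must match the
// bytes actually in $rows, which is the point of the answer above.
function send_csv(array $rows, $filename)
{
    header('Content-Type: text/csv; charset=UTF-8');
    header('Content-Disposition: attachment; filename="' . $filename . '"');
    echo rows_to_csv($rows);
}
```

In the download script you would call something like send_csv($rows, 'names.csv') before producing any other output, because headers cannot be sent after the body has started.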
I am really, really new to PHP.
I was doing ASP.NET etc. before, and PHP appears a whole lot different.
I am using Drupal 7, and the project has already been built.
I was asked to do something very trivial, but I am unable to do it. It concerns Arabic.
I declare a simple variable like $simpleText = "شهس ". Then I call drupal_set_message($simpleText).
What I then see in the web browser is ??? instead of the Arabic. I have confirmed, via the meta tag in the rendered HTML in the browser, that the content type of the page is set to UTF-8.
Can you please help me figure out how to fix this issue?
Thanks.
I have completed many PHP projects (including Drupal projects), and there is nothing wrong with PHP and Arabic :)
Add this to your HTML and it should work just fine:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Or, using PHP, do this before sending any output:
header('Content-Type: text/html; charset=utf-8');
You need to run these statements on the database connection, right after connecting:
mysql_query("set character_set_server='utf8'");
mysql_query("set names 'utf8'");
You should save your file with UTF-8 encoding. If you are using Visual Studio, use "Advanced Save Options" with encoding: "Unicode (UTF-8 without signature) - Codepage 65001"
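To check what the editor actually wrote, you can dump the bytes of the literal. This is a sketch with hypothetical helper names. In UTF-8, "شهس" is the six bytes d8 b4 d9 87 d8 b3. One possibility, an assumption about editor behavior that the thread does not confirm, is that an editor saving the file in a code page without Arabic replaces each character with a literal question mark (byte 3f). In that case the ??? is already in the PHP file, and no header or meta tag can bring the text back.

```php
<?php
// Show the raw bytes of a string as space-separated hex pairs.
function byte_dump($s)
{
    return trim(chunk_split(bin2hex($s), 2, ' '));
}

// Hypothetical triage of a string literal taken from the source file.
function diagnose_literal($s)
{
    if ($s !== '' && trim($s, '? ') === '') {
        // Nothing but '?' (and spaces) left: the characters were lost on save.
        return 'lost on save';
    }
    return preg_match('//u', $s) === 1 ? 'valid UTF-8' : 'not UTF-8';
}

$simpleText = "شهس ";
echo byte_dump($simpleText), "\n";        // d8 b4 d9 87 d8 b3 20 if saved as UTF-8
echo diagnose_literal($simpleText), "\n";
```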
I'm trying to get some data from the DB, but the output isn't what I expected.
When I query the DB myself, I get this output: string 'C�te d�Ivoire' (length=13)
Querying the DB from phpMyAdmin, I get the normal output: Côte d’Ivoire
The php.ini default charset, the MySQL DB default charset and the <meta> charset are all set to UTF-8.
I can't figure out where the conversion happens that makes me get different output with the same configuration.
P.S.: I'm using the mysqli driver.
On the same page that gives you the wrong results, first try running this instruction:
print base64_encode("Côte");
The correct answer is Q8O0dGU.... If you get something else, like Q/R0ZQo..., your script is working with another charset (here Latin-1) instead of UTF-8. MySQL and the browser may still be playing tricks as well, but the line above tells you whether PHP and/or your editor are playing you false.
Next, extract Côte from the database and output its base64_encode(). If you see Q8O0..., the connection between MySQL and PHP is safely UTF-8. If not, then whatever else might also be needed, you need to change the MySQL charset (SET NAMES utf8 and/or ALTER the table and database collation).
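The two base64 checks can be wrapped in a small helper. This is a sketch: the function and constant names are hypothetical, and the fingerprints are the full base64 strings of "Côte" with no trailing newline, as UTF-8 (bytes 43 C3 B4 74 65) and as Latin-1 (bytes 43 F4 74 65).

```php
<?php
// base64 fingerprints of "Côte" (no trailing newline) in two encodings.
const COTE_AS_UTF8   = 'Q8O0dGU=';   // bytes 43 C3 B4 74 65
const COTE_AS_LATIN1 = 'Q/R0ZQ==';   // bytes 43 F4 74 65

// Report which encoding a "Côte" string arrived in.
function cote_encoding($s)
{
    $b64 = base64_encode($s);
    if ($b64 === COTE_AS_UTF8) {
        return 'UTF-8';
    }
    if ($b64 === COTE_AS_LATIN1) {
        return 'Latin-1';
    }
    return 'unknown';
}

// Step 1: a literal typed in this script (tests PHP and your editor).
echo cote_encoding('Côte'), "\n";
// Step 2: pass the same word as fetched from MySQL, e.g. a column of a mysqli row.
```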
If PHP is UTF-8 and MySQL is UTF-8 and you still see invalid characters, the problem is somewhere between PHP and the browser. Verify that the Content-Type header is sent correctly; if not, try sending it yourself as the first thing in the script:
header('Content-Type: text/html; charset=UTF-8');
For example, in the Apache configuration you should have:
AddDefaultCharset utf-8
Verify also that your browser is not set to override both server charset and auto-detection.
NOTE: as a rule of thumb, if you get a single diamond with a question mark instead of an international character, a UTF-8 reader received an invalid UTF-8 byte sequence. In other words, whatever is showing the diamond (your browser) is expecting UTF-8 but is receiving something else, for example Latin-1 (ISO-8859-1).
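The rule of thumb can be reproduced in PHP itself. This sketch uses htmlspecialchars() with the ENT_SUBSTITUTE flag, which replaces invalid UTF-8 sequences with U+FFFD (bytes EF BF BD), the character browsers draw as the diamond. The helper name is hypothetical, and a browser's decoder may split bad sequences slightly differently. Without ENT_SUBSTITUTE (or ENT_IGNORE), htmlspecialchars() returns an empty string for invalid input, which is another way accented text can silently disappear.

```php
<?php
// Roughly what a UTF-8 reader does with bytes that are not valid UTF-8:
// bad sequences become U+FFFD, the diamond with a question mark.
// Note: htmlspecialchars() also escapes & < > " ' as it would for HTML output.
function as_seen_by_utf8_reader($bytes)
{
    return htmlspecialchars($bytes, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$utf8   = "C\xC3\xB4te";   // "Côte" in UTF-8: ô is C3 B4
$latin1 = "C\xF4te";       // the same word in Latin-1: ô is the lone byte F4

echo as_seen_by_utf8_reader($utf8), "\n";    // Côte
echo as_seen_by_utf8_reader($latin1), "\n";  // ô comes out as U+FFFD
```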
Another hard-to-track way of getting that error is if the output somehow contains a byte order mark (BOM). This may happen if you create a file such as:
###<?php
header("Content-Type: text/html; charset=UTF-8");
?>
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
</head>
<body>
Hellò, world!
</body>
</html>
where ### stands for a UTF-8 BOM, which is invisible in most editors. To remove it, either save the file "without BOM" if your editor allows it, or use a different editor.
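If the editor offers no such option, a few lines of PHP can detect and strip the BOM, which in UTF-8 is always the three bytes EF BB BF at the very start of the file. This is a sketch with hypothetical helper names. It rewrites the file in place, so keep a backup.

```php
<?php
const UTF8_BOM = "\xEF\xBB\xBF";

// True if the data starts with a UTF-8 byte order mark.
function has_utf8_bom($bytes)
{
    return strncmp($bytes, UTF8_BOM, 3) === 0;
}

// Remove a leading BOM, leaving everything else untouched.
function strip_utf8_bom($bytes)
{
    return has_utf8_bom($bytes) ? substr($bytes, 3) : $bytes;
}

// Strip the BOM from a file in place; returns true only if one was removed.
function strip_bom_from_file($path)
{
    $bytes = file_get_contents($path);
    if ($bytes === false || !has_utf8_bom($bytes)) {
        return false;
    }
    return file_put_contents($path, strip_utf8_bom($bytes)) !== false;
}
```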
If you do your "own querying" with the mysql command-line tool, you have to set the option --default-character-set=utf8 too. Otherwise, please tell us how you do your own querying.