Portuguese charset problem - php

I’m a headache with the damn charset.
Portuguese charset=iso-8859-1
On my HTML I have:
<meta http-equiv="Content-type" content="text/html; charset=iso-8859-1" />
On my config.php:
$config['charset'] = 'ISO-8859-1';
I have the word ‘café’, coffee.
It is been displayed like: cafŽ.
Any ideas?!
Thanks in advance for any help
**Edit
I don't know if it matters but I'm using Eclipse

What's the encoding of the file in Eclipse set to? Right-Klick on the file in Eclipse, check under "Properties". It must be the same as in your meta-tag.

thanks so much, i believe your answer is the best one:
$string = 'café';
utf8_decode($string);
OR
$string = 'café';
utf8_encode($string);
with meta charset in the header of each file, the issue of portugues characters will be solved.

Why don't you switch to UTF-8?
edit You might also want to switch to using entities.
é would be the é
http://www.w3schools.com/tags/ref_entities.asp

I would look at the default charset in the browser first, it could be set to ISO-8859-15 or UTF8. I have had the reverse problem of my browser encoding was set to ISO-8859-1 instead of UTF8.
Secondly is this data static or coming from a database? If it is from mySQL for example, check the collation of the database, is it in latin1 or utf8?
If coming from a UTF8 collated database (or not - as you're using PHP) you can try
$string = 'café';
utf8_decode($string);
OR
$string = 'café';
utf8_encode($string);
Moving to UTF8 may be a good idea because functions like PHPs utf8_encode() and utf8_decode() but if it's not appropriate to your market then that is that.
If the utf8_encode or utf8_decode functions work, you should look at your input method and input encoding as you will likely find a problem there.
P.S. I have the same problems from time to time being in Brazil... I feel your pain mate!

Try this one here:
$string = 'café';
htmlentities($string, ENT_COMPAT, 'utf-8');
Take care!

Go to the resource bundle in the project explorer and then right click on that file and change char set to utf=8 and then save the settings.

Related

Converting t’to apostrophe in php [duplicate]

I've tried converting the text to or from utf8, which didn't seem to help.
I'm getting:
"It’s Getting the Best of Me"
It should be:
"It’s Getting the Best of Me"
I'm getting this data from this url.
To convert to HTML entities:
<?php
echo mb_convert_encoding(
file_get_contents('http://www.tvrage.com/quickinfo.php?show=Surviver&ep=20x02&exact=0'),
"HTML-ENTITIES",
"UTF-8"
);
?>
See docs for mb_convert_encoding for more encoding options.
Make sure your html header specifies utf8
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
That usually does the trick for me (obviously if the content IS utf8).
You don't need to convert to html entities if you set the content-type.
Your content is fine; the problem is with the headers the server is sending:
Connection:Keep-Alive
Content-Length:502
Content-Type:text/html
Date:Thu, 18 Feb 2010 20:45:32 GMT
Keep-Alive:timeout=1, max=25
Server:Apache/2.2.8 (Ubuntu) PHP/5.2.4-2ubuntu5.7 with Suhosin-Patch
X-Powered-By:PHP/5.2.4-2ubuntu5.7
Content-Type should be set to Content-type: text/plain; charset=utf-8, because this page is not HTML and uses the utf-8 encoding. Chromium on Mac guesses ISO-8859-1 and displays the characters you're describing.
If you are not in control of the site, specify the encoding as UTF-8 to whatever function you use to retrieve the content. I'm not familiar enough with PHP to know how exactly.
I know the question was answered but setting meta tag didn't help in my case and selected answer was not clear enough, so I wanted to provide simpler answer.
So to keep it simple, store string into a variable and process that like this
$TVrageGiberish = "It’s Getting the Best of Me";
$notGiberish = mb_convert_encoding($TVrageGiberish, "HTML-ENTITIES", 'UTF-8');
echo $notGiberish;
Which should return what you wanted It’s Getting the Best of Me
If you are parsing something, you can perform conversion while assigning values to a variable like this, where $TVrage is array with all the values, XML in this example from a feed that has tag "Title" which may contain special characters such as ‘ or ’.
$cleanedTitle = mb_convert_encoding($TVrage->title, "HTML-ENTITIES", 'UTF-8');
If you're here because you're experiencing issues with junk characters in your WordPress site, try this:
Open wp-config.php
Comment out define('DB_CHARSET', 'utf8') and define('DB_COLLATE', '')
/** MySQL hostname */
define('DB_HOST', 'localhost');
/** Database Charset to use in creating database tables. */
//define('DB_CHARSET', 'utf8');
/** The Database Collate type. Don't change this if in doubt. */
//define('DB_COLLATE', '');
It sounds like you're using standard string functions on a UTF8 characters (’) that doesn't exist in ISO 8859-1. Check that you are using Unicode compatible PHP settings and functions. See also the multibyte string functions.
We had success going the other direction using this:
mb_convert_encoding($text, "HTML-ENTITIES", "ISO-8859-1");
Just try this
if $text contains strange charaters do this:
$mytext = mb_convert_encoding($text, "HTML-ENTITIES", 'UTF-8');
and you are done..
if all seems not to work, this could be your best solution.
<?php
$content="It’s Getting the Best of Me";
$content = str_replace("’", "'", $content);
echo $content;
?>
==or==
<?php
$content="It’s Getting the Best of Me";
$content = str_replace("’", "'", $content);
echo $content;
?>
try this :
html_entity_decode(mb_convert_encoding(stripslashes($text), "HTML-ENTITIES", 'UTF-8'))
For fopen and file_put_contents, this will work:
str_replace("’", "'", htmlspecialchars_decode(mb_convert_encoding($string_to_be_fixed, "HTML-ENTITIES", "UTF-8")));
You Should check encode encoding origin then try to convert to correct encode type.
In my case, I read csv files then import to db. Some files displays well some not. I check encoding and see that file with encoding ASCII displays well, other file with UTF-8 is broken. So I use following code to convert encoding:
if(mb_detect_encoding($content) == 'UTF-8') {
$content = iconv("UTF-8", "ASCII//TRANSLIT", $content);
file_put_contents($file_path, $content);
} else {
$content = mb_convert_encoding($content, 'UTF-8', 'UTF-8');
file_put_contents($file_path, $content);
}
After convert I push the content to file then process import to DB, now it displays well in front-end
If none of the above solutions work:
In my case I noticed that the single quote was a different style of single quote. Instead of ' my data had a ’. Notice the difference in the single quote? So I simply wrote a str_replace to replace it and it fixed the problem. Probably not the most elegant solution but it got the job done.
$string= str_replace("’","'",$string);
I looked at the link, and it looks like UTF-8 to me. i.e., in Firefox, if you pick View, Character Encoding, UTF-8, it will appear correctly.
So, you just need to figure out how to get your PHP code to process that as UTF-8. Good luck!
use this
<meta http-equiv="Content-Type" content="text/html; charset=utf8_unicode_ci" />
instead of this
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
If nothing works try this mb_convert_encoding($elem->textContent, 'UTF-8', 'utf8mb4');

UTF8 character encoding, htmlentities and Cross Site Scripting

I am currently overworking my website and would like to make consequent usage of correct character encoding since I had some with this trouble so far.
My HTML and PHP files are encoded with UTF8, my HTML head looks like this:
<!DOCTYPE html><html><head><meta charset="utf-8" />.....
I use MySQLi and configured my connection like this:
$DB->set_charset("utf8");
My database is using utf8_general_ci collation.
So I think I set up everything correctly.
Now my problem:
To avoid cross site scripting I everytimes use htmlentities() when displaying data the user has saved into my database in order to break HTML characters like < > " '
echo(htmlentities($str, ENT_QUOTES, "UTF-8"));
But when $str contains other special characters like ÄäÖöÜü&ßàé they also get broken and my browser only displays only & Auml; and so on...
Where is my fault? What is the correct method to achieve safety against xss and display utf-8 characters? I think I could need a complete crash course within these topics.
This solution does not match my problem because I already use this parameter.
I would be very thankful if you could help me with this issue.
Best regards
Lukas
I could fix this issue by switching to htmlspecialchars. I think this function does enough for security against xss.

My wordpress the_content is giving me trash characters "£45", not sure where to check for encoding

I'm picking up trash characters whenever I use a £ symbol, I'm assuming something is wrong with my encoding.
I'm already using:
<meta charset="utf-8">
I have AddDefaultCharset utf-8 in my apache2.conf
Oh yea and I'm doing some filtering on the_content but I've applied this:
$html = mb_convert_encoding($content, 'HTML-ENTITIES', "UTF-8");
I should also state that this doesn't happen locally.
Is there anywhere else I need to declare encoding?
When I have come across this issue I ended up doing:
$str = str_replace("£", "£", $str);
I'm sure this isn't the best way to do it, though.
Edit: Make sure your database is set to utf8_general_ci

Getting ’ instead of an apostrophe(') in PHP

I've tried converting the text to or from utf8, which didn't seem to help.
I'm getting:
"It’s Getting the Best of Me"
It should be:
"It’s Getting the Best of Me"
I'm getting this data from this url.
To convert to HTML entities:
<?php
echo mb_convert_encoding(
file_get_contents('http://www.tvrage.com/quickinfo.php?show=Surviver&ep=20x02&exact=0'),
"HTML-ENTITIES",
"UTF-8"
);
?>
See docs for mb_convert_encoding for more encoding options.
Make sure your html header specifies utf8
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
That usually does the trick for me (obviously if the content IS utf8).
You don't need to convert to html entities if you set the content-type.
Your content is fine; the problem is with the headers the server is sending:
Connection:Keep-Alive
Content-Length:502
Content-Type:text/html
Date:Thu, 18 Feb 2010 20:45:32 GMT
Keep-Alive:timeout=1, max=25
Server:Apache/2.2.8 (Ubuntu) PHP/5.2.4-2ubuntu5.7 with Suhosin-Patch
X-Powered-By:PHP/5.2.4-2ubuntu5.7
Content-Type should be set to Content-type: text/plain; charset=utf-8, because this page is not HTML and uses the utf-8 encoding. Chromium on Mac guesses ISO-8859-1 and displays the characters you're describing.
If you are not in control of the site, specify the encoding as UTF-8 to whatever function you use to retrieve the content. I'm not familiar enough with PHP to know how exactly.
I know the question was answered but setting meta tag didn't help in my case and selected answer was not clear enough, so I wanted to provide simpler answer.
So to keep it simple, store string into a variable and process that like this
$TVrageGiberish = "It’s Getting the Best of Me";
$notGiberish = mb_convert_encoding($TVrageGiberish, "HTML-ENTITIES", 'UTF-8');
echo $notGiberish;
Which should return what you wanted It’s Getting the Best of Me
If you are parsing something, you can perform conversion while assigning values to a variable like this, where $TVrage is array with all the values, XML in this example from a feed that has tag "Title" which may contain special characters such as ‘ or ’.
$cleanedTitle = mb_convert_encoding($TVrage->title, "HTML-ENTITIES", 'UTF-8');
If you're here because you're experiencing issues with junk characters in your WordPress site, try this:
Open wp-config.php
Comment out define('DB_CHARSET', 'utf8') and define('DB_COLLATE', '')
/** MySQL hostname */
define('DB_HOST', 'localhost');
/** Database Charset to use in creating database tables. */
//define('DB_CHARSET', 'utf8');
/** The Database Collate type. Don't change this if in doubt. */
//define('DB_COLLATE', '');
It sounds like you're using standard string functions on a UTF8 characters (’) that doesn't exist in ISO 8859-1. Check that you are using Unicode compatible PHP settings and functions. See also the multibyte string functions.
We had success going the other direction using this:
mb_convert_encoding($text, "HTML-ENTITIES", "ISO-8859-1");
Just try this
if $text contains strange charaters do this:
$mytext = mb_convert_encoding($text, "HTML-ENTITIES", 'UTF-8');
and you are done..
if all seems not to work, this could be your best solution.
<?php
$content="It’s Getting the Best of Me";
$content = str_replace("’", "'", $content);
echo $content;
?>
==or==
<?php
$content="It’s Getting the Best of Me";
$content = str_replace("’", "'", $content);
echo $content;
?>
try this :
html_entity_decode(mb_convert_encoding(stripslashes($text), "HTML-ENTITIES", 'UTF-8'))
For fopen and file_put_contents, this will work:
str_replace("’", "'", htmlspecialchars_decode(mb_convert_encoding($string_to_be_fixed, "HTML-ENTITIES", "UTF-8")));
You Should check encode encoding origin then try to convert to correct encode type.
In my case, I read csv files then import to db. Some files displays well some not. I check encoding and see that file with encoding ASCII displays well, other file with UTF-8 is broken. So I use following code to convert encoding:
if(mb_detect_encoding($content) == 'UTF-8') {
$content = iconv("UTF-8", "ASCII//TRANSLIT", $content);
file_put_contents($file_path, $content);
} else {
$content = mb_convert_encoding($content, 'UTF-8', 'UTF-8');
file_put_contents($file_path, $content);
}
After convert I push the content to file then process import to DB, now it displays well in front-end
If none of the above solutions work:
In my case I noticed that the single quote was a different style of single quote. Instead of ' my data had a ’. Notice the difference in the single quote? So I simply wrote a str_replace to replace it and it fixed the problem. Probably not the most elegant solution but it got the job done.
$string= str_replace("’","'",$string);
I looked at the link, and it looks like UTF-8 to me. i.e., in Firefox, if you pick View, Character Encoding, UTF-8, it will appear correctly.
So, you just need to figure out how to get your PHP code to process that as UTF-8. Good luck!
use this
<meta http-equiv="Content-Type" content="text/html; charset=utf8_unicode_ci" />
instead of this
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
If nothing works try this mb_convert_encoding($elem->textContent, 'UTF-8', 'utf8mb4');

PHP output showing little black diamonds with a question mark

I'm writing a php program that pulls from a database source. Some of the varchars have quotes that are displaying as black diamonds with a question mark in them (�, REPLACEMENT CHARACTER, I assume from Microsoft Word text).
How can I use php to strip these characters out?
If you see that character (� U+FFFD "REPLACEMENT CHARACTER") it usually means that the text itself is encoded in some form of single byte encoding but interpreted in one of the unicode encodings (UTF8 or UTF16).
If it were the other way around it would (usually) look something like this: ä.
Probably the original encoding is ISO-8859-1, also known as Latin-1. You can check this without having to change your script: Browsers give you the option to re-interpret a page in a different encoding -- in Firefox use "View" -> "Character Encoding".
To make the browser use the correct encoding, add an HTTP header like this:
header("Content-Type: text/html; charset=ISO-8859-1");
or put the encoding in a meta tag:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
Alternatively you could try to read from the database in another encoding (UTF-8, preferably) or convert the text with iconv().
I also faced this � issue. Meanwhile I ran into three cases where it happened:
substr()
I was using substr() on a UTF8 string which cut UTF8 characters, thus the cut chars could not be displayed correctly. Use mb_substr($utfstring, 0, 10, 'utf-8'); instead. Credits
htmlspecialchars()
Another problem was using htmlspecialchars() on a UTF8 string. The fix is to use: htmlspecialchars($utfstring, ENT_QUOTES, 'UTF-8');
preg_replace()
Lastly I found out that preg_replace() can lead to problems with UTF. The code $string = preg_replace('/[^A-Za-z0-9ÄäÜüÖöß]/', ' ', $string); for example transformed the UTF string "F(×)=2×-3" into "F � 2� ". The fix is to use mb_ereg_replace() instead.
I hope this additional information will help to get rid of such problems.
This is a charset issue. As such, it can have gone wrong on many different levels, but most likely, the strings in your database are utf-8 encoded, and you are presenting them as iso-8859-1. Or the other way around.
The proper way to fix this problem, is to get your character-sets straight. The simplest strategy, since you're using PHP, is to use iso-8859-1 throughout your application. To do this, you must ensure that:
All PHP source-files are saved as iso-8859-1 (Not to be confused with cp-1252).
Your web-server is configured to serve files with charset=iso-8859-1
Alternatively, you can override the webservers settings from within the PHP-document, using header.
In addition, you may insert a meta-tag in you HTML, that specifies the same thing, but this isn't strictly needed.
You may also specify the accept-charset attribute on your <form> elements.
Database tables are defined with encoding as latin1
The database connection between PHP to and database is set to latin1
If you already have data in your database, you should be aware that they are probably messed up already. If you are not already in production phase, just wipe it all and start over. Otherwise you'll have to do some data cleanup.
A note on meta-tags, since everybody misunderstands what they are:
When a web-server serves a file (A HTML-document), it sends some information, that isn't presented directly in the browser. This is known as HTTP-headers. One such header, is the Content-Type header, which specifies the mimetype of the file (Eg. text/html) as well as the encoding (aka charset).
While most webservers will send a Content-Type header with charset info, it's optional. If it isn't present, the browser will instead interpret any meta-tags with http-equiv="Content-Type". It's important to realise that the meta-tag is only interpreted if the webserver doesn't send the header. In practice this means that it's only used if the page is saved to disk and then opened from there.
This page has a very good explanation of these things.
As mentioned in earlier answers, it is happening because your text has been written to the database in iso-8859-1 encoding, or any other format.
So you just need to convert the data to utf8 before outputting it.
$text = “string from database”;
$text = utf8_encode($text);
echo $text;
To make sure your MYSQL connection is set to UTF-8 (or latin1, depending on what you're using), you can do this to:
$con = mysql_connect("localhost","username","password");
mysql_set_charset('utf8',$con);
or use this to check what charset you are using:
$con = mysql_connect("localhost","username","password");
$charset = mysql_client_encoding($con);
echo "The current character set is: $charset\n";
More info here: http://php.net/manual/en/function.mysql-set-charset.php
I chose to strip these characters out of the string by doing this -
ini_set('mbstring.substitute_character', "none");
$text= mb_convert_encoding($text, 'UTF-8', 'UTF-8');
Just Paste This Code In Starting to The Top of Page.
<?php
header("Content-Type: text/html; charset=ISO-8859-1");
?>
Based on your description of the problem, the data in your database is almost certainly encoded as Windows-1252, and your page is almost certainly being served as ISO-8859-1. These two character sets are equivalent except that Windows-1252 has 16 extra characters which are not present in ISO-8859-1, including left and right curly quotes.
Assuming my analysis is correct, the simplest solution is to serve your page as Windows-1252. This will work because all characters that are in ISO-8859-1 are also in Windows-1252. In PHP you can change the encoding as follows:
header('Content-Type: text/html; charset=Windows-1252');
However, you really should check what character encoding you are using in your HTML files and the contents of your database, and take care to be consistent, or convert properly where this is not possible.
Add this function to your variables
utf8_encode($your variable);
Try This Please
mb_substr($description, 0, 490, "UTF-8");
This will help you. Put this inside <head> tag
<meta charset="iso-8859-1">
That can be caused by unicode or other charset mismatch. Try changing charset in your browser, in of the settings the text will look OK. Then it's question of how to convert your database contents to charset you use for displaying. (Which can actually be just adding utf-8 charset statement to your output.)
what I ended up doing in the end after I fixed my tables was to back it up and change back the settings to utf-8 then I altered my dump file so that DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci are my character set entries
now I don't have characterset issues anymore because the database and browser are utf8.
I figured out what caused it. It was the web page+browser effects on the DB. On the terminals that are linux (ubuntu+firefox) it was encoding the database in latin1 which is what the tabes are set. But on the windows 10+edge terminals, the entries were force coded into utf8. Also I noticed the windows 10 has issues staying with latin1 so I decided to bend with the wind and convert all to utf8.
I figured it was a windows 10 issue because we started using win 10 terminals.
so yet again microsoft bugs causes issues. I still don't know why the encoding changes on the forms because the browser in windows 10 shows the latin1 characterset but when it goes in its utf8 encoded and I get the data anomaly. but in linux+firefox it doesn't do that.
This happened to work in my case:
$text = utf8_decode($text)
I turns the black diamond character into a question mark so you can:
$text = str_replace('?', '', utf8_decode($text));
Just add these lines before headers.
Accurate format of .doc/docx files will be retrieved:
if(ini_get('zlib.output_compression'))
ini_set('zlib.output_compression', 'Off');
ob_clean();
When you extract data from anywhere you should use functions with the prefix md_FUNC_NAME.
Had the same problem it helped me out.
Or you can find the code of this symbol and use regexp to delete these symbols.
You can also change the caracter set in your browser. Just for debug reasons.
Using the same charset (as suggested here) in both the database and the HTML has not worked for me... So remembering that the code is generated as HTML, I chose to use the "(HTML code) or the " (ISO Latin-1 code) in my database text where quotes were used. This solved the problem while providing me a quotation mark. It is odd to note that prior to this solution, only some of the quotation marks and apostrophes did not display correctly while others did, however, the special code did work in all instances.
I ran the "detect encoding" code after my collation change in phpmyadmin and now it comes up as Latin_1.
but here is something I came across looking a different data anomaly in my application and how I fixed it:
I just imported a table that has mixed encoding (with diamond question marks in some lines, and all were in the same column.) so here is my fix code. I used utf8_decode process that takes the undefined placeholder and assigns a plain question mark in the place of the "diamond question mark " then I used str_replace to replace the question mark with a space between quotes.
here is the
[code]
include 'dbconnectfile.php';
//// the variable $db comes from my db connect file
/// inx is my auto increment column
/// broke_column is the column I need to fix
$qwy = "select inx,broke_column from Table ";
$res = $db->query($qwy);
while ($data = $res->fetch_row()) {
for ($m=0; $m<$res->field_count; $m++) {
if ($m==0){
$id=0;
$id=$data[$m];
echo $id;
}else if ($m==1){
$fix=0;
$fix=$data[$m];
$fix = utf8_decode($fix);
$fixx =str_replace("?"," ",$fix);
echo $fixx;
////I echoed the data to the screen because I like to see something as I execute it :)
}
}
$insert= "UPDATE Table SET broke_column='".$fixx."' where inx='".$id."'";
$insresult= $db->query($insert);
echo"<br>";
}
?>
For global purposes.
Instead of converting, codifying, decodifying each text I prefer to let them as they are and instead change the server php settings.
So,
Let the diamonds
From the browser, on the view menu select
"text encoding" and find the one which let's you see your text
correctly.
Edit your php.ini and add:
default_charset = "ISO-8859-1"
or instead of ISO-8859 the one which fits your text encoding.
Go to your phpmyadmin and select your database and just increase the length/value of that table's field to 500 or 1000 it will solve your problem.

Categories