i just started dabbling in php and i'm afraid i need some help to figure out how to manipulate utf-8 strings.
I'm working in ubuntu 11.10 x86, php version 5.3.6-13ubuntu3.2. I have a utf-8 encoded file (vim :set encoding confirms this) which i then proceed to reading it using
$file = fopen("file.txt", "r");
while(!feof($file)){
$line = fgets($file);
//...
}
fclose($file);
using mb_detect_encoding($line) reports UTF-8
If i do echo $line I can see the line properly (no mangled characters) in the browser
so I guess everything is fine with browser and apache. Though i did search my apache configuration for AddDefaultCharset and tried adding http meta-tags for character encoding (just in case)
When i try to split the string using $arr = mb_split(';',$line) the fields of the resulting array contain mangled utf-8 characters (mb_detect_encoding($arr[0]) reports utf-8 as well).
So echo $arr[0] will result in something like this: ΑΘΗÎÎ.
I have tried setting mb_detect_order('utf-8'), mb_internal_encoding('utf-8'), but nothing changed. I also tried to manually detect utf-8 using this w3 perl regex because i read somewhere that mb_detect_encoding can sometimes fail (myth?), but results were the same as well.
So my question is how can i properly split the string? Is going down the mb_ path the wrong way? What am I missing?
Thank you for your help!
UPDATE: I'm adding sample strings and base64 equivalents (thanks to #chris' for his suggestion)
1. original string: "ΑΘΗΝΑ;ΑΙΓΑΛΕΩ;12242;37.99452;23.6889"
2. base64 encoded: "zpHOmM6Xzp3OkTvOkc6ZzpPOkc6bzpXOqTsxMjI0MjszNy45OTQ1MjsyMy42ODg5"
3. first part (the equivalent of "ΑΘΗΝΑ") base64 encoded before splitting: "zpHOmM6Xzp3OkQ=="
4. first part ($arr[0] after splitting): "ΑΘΗÎΑ"
5. first part after splitting base64 encoded: "77u/zpHOmM6Xzp3OkQ=="
Ok, so after doing this there seems to be a 77u/ difference between 3. and 5. which according to this is a utf-8 BOM mark. So how can i avoid it?
UPDATE 2: I woke up refreshed today and with your tips in mind i tried it again. It seems that $line=fgets($file) reads correctly the first line (no mangled chars), and fails for each subsequent line. So then i base64_encoded the first and second line, and the 77u/ bom appeared on the base64'd string of the first line only. I then opened up the offending file in vim, and entered :set nobomb :w to save the file without the bom. Firing up php again showed that the first line was also mangled now. Based on #hakre's remove_utf8_bom i added it's complementary function
function add_utf8_bom($str){
$bom= "\xEF\xBB\xBF";
return substr($str,0,3)===$bom?$str:$bom.$str;
}
and voila each line is read correctly now.
I do not much like this solution, as it seems very very hackish (i can't believe that an entire framework/language does not provide for a way to deal with nobombed strings). So do you know of an alternate approach? Otherwise I'll proceed with the above.
Thanks to #chris, #hakre and #jacob for their time!
UPDATE 3 (solution): It turns out after all that it was a browser thing: it was not enough to add header('Content-type: text/html; charset=UTF-8') and meta-tags like <meta http-equiv="Content-type" value="text/html; charset=UTF-8" />. It also had to be properly enclosed inside an <html><body> section or the browser would not understand the encoding correctly. Thanks to #jake for his suggestion.
Morale of the story: I should learn more about html before trying coding for the browser in the first place. Thanks for your help and patience everyone.
UTF-8 has the very nice feature that it is ASCII-compatible. With this I mean that:
ASCII characters stay the same when encoded to UTF-8
no other characters will be encoded to ASCII characters
This means that when you try to split a UTF-8 string by the semicolon character ;, which is an ASCII character, you can just use standard single byte string functions.
In your example, you can just use explode(';',$utf8encodedText) and everything should work as expected.
PS: Since the UTF-8 encoding is prefix-free, you can actually use explode() with any UTF-8 encoded separator.
PPS: It seems like you try to parse a CSV file. Have a look at the fgetcsv() function. It should work perfectly on UTF-8 encoded strings as long as you use ASCII characters for separators, quotes, etc.
When you write debug/testing scripts in php, make sure you output a more or less valid HTML page.
I like to use a PHP file similar to the following:
<!DOCTYPE html>
<html>
<head>
<meta charset=utf-8>
<title>Test page for project XY</title>
</head>
<body>
<h1>Test Page</h1>
<pre><?php
echo print_r($_GET,1);
?></pre>
</body>
</html>
If you don't include any HTML tags, the browser might interpret the file as a text file and all kinds of weird things could happen. In your case, I assume the browser interpreted the file as a Latin1 encoded text file. I assume it worked with the BOM, because whenever the BOM was present, the browser recognized the file as a UTF-8 file.
Edit, I just read your post closer. You're suggesting this should output false, because you're suggesting a BOM was introduced by mb_split().
header('content-type: text/plain;charset=utf-8');
$s = "zpHOmM6Xzp3OkTvOkc6ZzpPOkc6bzpXOqTsxMjI0MjszNy45OTQ1MjsyMy42ODg5";
$str = base64_decode($s);
$peices = mb_split(';', $str);
var_dump(substr($str, 0, 10) === $peices[0]);
var_dump($peices);
Does it? It works as expected for me( bool true, and the strings in the array are correct)
The mb_splitDocs function should be fine, but you should define the charset it's using as well with mb_regex_encodingDocs:
mb_regex_encoding('UTF-8');
About mb_detect_encodingDocs: it can fail, but that's just by the fact that you can never detect an encoding. You either know it or you can try but that's all. Encoding detection is mostly a gambling game, however you can use the strict parameter with that function and specify the encoding(s) you're looking for.
How to remove the BOM mask:
You can filter the string input and remove a UTF-8 bom with a small helper function:
/**
* remove UTF-8 BOM if string has it at the beginning
*
* #param string $str
* #return string
*/
function remove_utf8_bom($str)
{
if ($bytes = substr($str, 0, 3) && $bytes === "\xEF\xBB\xBF")
{
$str = substr($str, 3);
}
return $str;
}
Usage:
$line = remove_utf8_bom($line);
There are probably better ways to do it, but this should work.
Related
I have a test project which allows upload of various text(subs), then content of these text file gets displayed. Problem occurs when characters used, are not alphabet, i.e Cyrillic as in diacritics as ŠŽČĆ. Characters in text file are ok pre upload, but when i opened a uploaded file on server, all characters ŠŽČĆĐ get replaced by a . Yes, you saw it correctly, it's a rectangle thingy.
I use this line which work great on localhost, but on shared hosting throws a fit.
$temp = iconv(mb_detect_encoding($tmp, mb_detect_order(), true), "UTF-8", utf8_encode($tmp));
Where $temp variable is the string to be decoded.
Is it hosting thing, could i do something to prevent it?
PS: If i don't use utf8_encode on $tmp variable, server throws an error.
Edit1:
First image shows how it looks when file is opened on shared hosting.
And when i copy/paste that thing, it looks like this
sadly it doesn't get rendered on SO. Or lucky, depends of how you look at it ...
Above this sentence is an image, not typed out characters. It is how ever, a text i typed and character that is in a uploaded file copied then pasted when you making post on SO.
Edit2:
I sort a figured it out what's the problem. File is correctly saved as utf8 which contains previously said letters.
When file gets uploaded, these letters get changed to rectangle thing. So when i open file on server, instead ŠŽČĆĐ, i get rectangles. How to prevent server changing anything and to upload as is?
So it's not a formatting thing, athrough setting encoding to utf8 seems to help to at least display it and if i don't set encoding to utf8, it throws an error.
I'm using Laravel as backend.
Edit3:
If i test specific char after being read from file with this
mb_convert_encoding(file($path)[8][9])//It should be **š** character
It shows it's utf8, but if it was it will be shown.
If i try this line:
mb_convert_encoding(file($path)[8][9], "UTF-8", "ISO-8859-1")
then it shows rectangle thing like in file on server.
If i use to detect encoding with additional parameters like:
mb_detect_encoding(file($path)[8], "UTF-8", TRUE);
to determine if it's actual utf8, it says it's false.
And if i paste rectangle thing into google translate it shows an "š".
which is correct letter.
If i use bin2hex() to see hex code and for example argument is š letter, i get 9a hex code.
If anybody has any idea how to recreate function that will differentiate between these rectangles and show correct hex code or char itself, or how to upload to shared hosting without allowing it to change letters encoding in text file, or how to approach whole problem, it would be much obliged.
Do not use utf8-encode. It is only to converting from ISO-8859-1 and it doesn’t work with Windows-1252.
https://www.php.net/manual/en/function.utf8-encode.php
The second problem is that your code do a double encoding. I have marked two function that convert a string to UTF-8.
$temp = iconv(mb_detect_encoding($tmp, mb_detect_order(), true), "UTF-8", utf8_encode($tmp));
/* ^^^^^ ^^^^^^^^^^^ */
If the code below do not work, I would debug output of mb_detect_encoding($tmp, mb_detect_order(), true). The default values for mb_detect_order() may be far for optimal for Your situation.
https://www.php.net/manual/en/function.mb-detect-encoding.php
https://www.php.net/manual/en/function.mb-detect-order.php
$temp = iconv(mb_detect_encoding($tmp, mb_detect_order(), true), "UTF-8", $tmp);
You can use mb_convert_encoding() in place of iconv.
https://www.php.net/manual/en/function.mb-convert-encoding.php
For Your problem I would write this code:
/* If there are no Asian languages, the UTF-8 is the only encoding the mb_detect_encoding can recognize. */
if (mb_detect_encoding($tmp, 'UTF-8')) {
$temp = $tmp;
} else {
/* It is not UTF-8. Assume WINDOWS-1252. */
$temp = mb_convert_encoding($tmp, 'UTF-8', 'WINDOWS-1252');
}
It is very hard to reliably detect a particular single byte encoding. I am not aware of any build in PHP function for this.
I have a unicode string received over HTTP Post or fetched from a DB (does not matter)
In PHP I checked the encoding of the string using "mb_detect_encoding" and got UTF-8 as the result.
SO therefore the string is in Unicode.
But how do I write the string from php to a output file with the proper encoding
$fd = fopen('myfile.php', "wb");
fwrite($fd, $msg."\n");
What I see is "टेसà¥à¤Ÿ" instead of the actual string which is
टेस्ट्
Pasting the 'junk' into Notepad++ and then from menu option doing 'encoding UTF-8' will show the proper text.
EDIT
*SOLUTION*
Sorry for posting the question and figuring out the answer myself.
I found the solution at the following site
http://www.codingforums.com/showthread.php?t=129270
function writeUTF8File($filename,$content) {
$f=fopen($filename,"w");
# Now UTF-8 - Add byte order mark
fwrite($f, pack("CCC",0xef,0xbb,0xbf));
fwrite($f,$content);
fclose($f);
}
PHP does not change the encoding of the string or does anything with it when you write to a file. It simply dumps the bytes of the string (PHP strings are really byte arrays) into the file, period. If you actually receive the string as UTF-8 and do not do anything with it except write it to a file, the content of the file will be UTF-8 encoded. Your problem is most likely that whatever application you're using to view the file does not properly read it as UTF-8 encoded.
That BOM solution is not necessarily the best. A BOM is not necessary for UTF-8 and many applications have problems with it. It only helps applications that are otherwise unable (too stupid) to detect that a file is UTF-8 encoded. The better solution may be to simply explicitly tell the application in question that it needs to treat the file as UTF-8 encoded when opening the file. Or use a better application.
You have to specify the strict parameter of mb_detect_encoding, or you'll get many false positives.
Also, while the output may be UTF-8, you will have to specify the right headers (content-encoding) and/or the charset meta tag (if it's HTML).
Sorry for posting the question and figuring out the answer myself.
I found the solution at the following site here
function writeUTF8File($filename,$content) {
$f=fopen($filename,"w");
# Now UTF-8 - Add byte order mark
fwrite($f, pack("CCC",0xef,0xbb,0xbf));
fwrite($f,$content);
fclose($f);
}
I have this text...
“I’m not trying to be credible,†David admits with a smile broadening"
...and I would like to delete those funny characters, I've tried str_replace() but it does not work.
Any ideas?
You probably have handled text in a different encoding then its source encoding.
So if the text is UTF-8, you are not handling it currently as UTF-8. The easiest way is to send a header such as...
header('Content-Type: text/html; charset=UTF-8');
You could also add the meta element, but ensure it is the first child of your head element.
You need to fix that at the source instead of trying to patch it later (which will never work well).
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
...
</head>
Different sources often have different encodings, so you need to specify the encoding in which you are presenting the view. Utf-8 is the most popular, since it covers all of ASCII and many, many other languages.
php's utf8_(de)encode converts iso-8859-1 to utf-8 and the opposite and regular string manipulating functions are not multibyte-(which utf-8 can be) character aware. Either you use functions specific to mb_strings or enable encoding with certain parameters.
//comment if i'm mistaken
Well, you are using a different character encoding that you should probably use(you should be using utf-8 encoding), so I would change that instead of trying to just fix it on the spot with a quick-fix(you will run into less problems overall that way).
If you really want to fix it using PHP, you can use the ctype_alpha() function; you should be able to do something like this:
$theString = "your text here"; // your input string
$newString = ""; // your new string
$i = 0;
while($theString[$i]) // while there are still characters in the string
{
if(ctype_alpha($theString[$i]) // if it's a character in your current set
{
$newString .= $theString[$i]; // add it to the new string, increment pointer, and go to next loop iteration
$i++;
continue;
} // if the specific character at the $i index is an alphabetical character, add it to the new string
else
{
$i++;
} // if it's a bad character, just move the pointer up by one for the next iteration
}
Then use $newString however you want to. Really though, just change your character encoding instead of doing it this way. You want the encoding to be the same across your entire project.
I have a large file that contains world countries/regions that I'm seperating into smaller files based on individual countries/regions. The original file contains entries like:
EE.04 Järvamaa
EE.05 Jõgevamaa
EE.07 Läänemaa
However when I extract that and write it to a new file, the text becomes:
EE.04 Järvamaa
EE.05 Jõgevamaa
EE.07 Läänemaa
To save my files I'm using the following code:
mb_detect_encoding($text, "UTF-8") == "UTF-8" ? : $text = utf8_encode($text);
$fp = fopen(MY_LOCATION,'wb');
fwrite($fp,$text);
fclose($fp);
I tried saving the files with and without utf8_encode() and neither seems to work. How would I go about saving the original encoding (which is UTF8)?
Thank you!
First off, don't depend on mb_detect_encoding. It's not great at figuring out what the encoding is unless there's a bunch of encoding specific entities (meaning entities that are invalid in other encodings).
Try just getting rid of the mb_detect_encoding line all together.
Oh, and utf8_encode turns a Latin-1 string into a UTF-8 string (not from an arbitrary charset to UTF-8, which is what you really want)... You want iconv, but you need to know the source encoding (and since you can't really trust mb_detect_encoding, you'll need to figure it out some other way).
Or you can try using iconv with a empty input encoding $str = iconv('', 'UTF-8', $str); (which may or may not work)...
It doesn't work like that. Even if you utf8_encode($theString) you will not CREATE a UTF8 file.
The correct answer has something to do with the UTF-8 byte-order mark.
This to understand the issue:
- http://en.wikipedia.org/wiki/Byte_order_mark
- http://unicode.org/faq/utf_bom.html
The solution is the following:
As the UTF-8 byte-order mark is '\xef\xbb\xbf' we should add it to the document's header.
<?php
function writeStringToFile($file, $string){
$f=fopen($file, "wb");
$file="\xEF\xBB\xBF".$string; // utf8 bom
fputs($f, $string);
fclose($f);
}
?>
The $file could be anything text or xml...
The $string is your UTF8 encoded string.
Try it now and it will write a UTF8 encoded file with your UTF8 content (string).
writeStringToFile('test.xml', 'éèàç');
Maybe you want to call htmlentities($text) before writing it into file and html_entity_decode($fetchedData) before output. It'll work with Scandinavian letters.
It appears that your source file is not, in fact, in UTF-8. You might want to try using the same approach you've been using, but with a different encoding, such as UTF-16 perhaps.
You can do it as follows:
<?php
$s = "This is a string éèàç and it is in utf-8";
$f = fopen('myFile',"w");
fwrite($f, utf8_encode($s));
fclose($f);
?>
I'm writing a php program that pulls from a database source. Some of the varchars have quotes that are displaying as black diamonds with a question mark in them (�, REPLACEMENT CHARACTER, I assume from Microsoft Word text).
How can I use php to strip these characters out?
If you see that character (� U+FFFD "REPLACEMENT CHARACTER") it usually means that the text itself is encoded in some form of single byte encoding but interpreted in one of the unicode encodings (UTF8 or UTF16).
If it were the other way around it would (usually) look something like this: ä.
Probably the original encoding is ISO-8859-1, also known as Latin-1. You can check this without having to change your script: Browsers give you the option to re-interpret a page in a different encoding -- in Firefox use "View" -> "Character Encoding".
To make the browser use the correct encoding, add an HTTP header like this:
header("Content-Type: text/html; charset=ISO-8859-1");
or put the encoding in a meta tag:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
Alternatively you could try to read from the database in another encoding (UTF-8, preferably) or convert the text with iconv().
I also faced this � issue. Meanwhile I ran into three cases where it happened:
substr()
I was using substr() on a UTF8 string which cut UTF8 characters, thus the cut chars could not be displayed correctly. Use mb_substr($utfstring, 0, 10, 'utf-8'); instead. Credits
htmlspecialchars()
Another problem was using htmlspecialchars() on a UTF8 string. The fix is to use: htmlspecialchars($utfstring, ENT_QUOTES, 'UTF-8');
preg_replace()
Lastly I found out that preg_replace() can lead to problems with UTF. The code $string = preg_replace('/[^A-Za-z0-9ÄäÜüÖöß]/', ' ', $string); for example transformed the UTF string "F(×)=2×-3" into "F � 2� ". The fix is to use mb_ereg_replace() instead.
I hope this additional information will help to get rid of such problems.
This is a charset issue. As such, it can have gone wrong on many different levels, but most likely, the strings in your database are utf-8 encoded, and you are presenting them as iso-8859-1. Or the other way around.
The proper way to fix this problem, is to get your character-sets straight. The simplest strategy, since you're using PHP, is to use iso-8859-1 throughout your application. To do this, you must ensure that:
All PHP source-files are saved as iso-8859-1 (Not to be confused with cp-1252).
Your web-server is configured to serve files with charset=iso-8859-1
Alternatively, you can override the webservers settings from within the PHP-document, using header.
In addition, you may insert a meta-tag in you HTML, that specifies the same thing, but this isn't strictly needed.
You may also specify the accept-charset attribute on your <form> elements.
Database tables are defined with encoding as latin1
The database connection between PHP to and database is set to latin1
If you already have data in your database, you should be aware that they are probably messed up already. If you are not already in production phase, just wipe it all and start over. Otherwise you'll have to do some data cleanup.
A note on meta-tags, since everybody misunderstands what they are:
When a web-server serves a file (A HTML-document), it sends some information, that isn't presented directly in the browser. This is known as HTTP-headers. One such header, is the Content-Type header, which specifies the mimetype of the file (Eg. text/html) as well as the encoding (aka charset).
While most webservers will send a Content-Type header with charset info, it's optional. If it isn't present, the browser will instead interpret any meta-tags with http-equiv="Content-Type". It's important to realise that the meta-tag is only interpreted if the webserver doesn't send the header. In practice this means that it's only used if the page is saved to disk and then opened from there.
This page has a very good explanation of these things.
As mentioned in earlier answers, it is happening because your text has been written to the database in iso-8859-1 encoding, or any other format.
So you just need to convert the data to utf8 before outputting it.
$text = “string from database”;
$text = utf8_encode($text);
echo $text;
To make sure your MYSQL connection is set to UTF-8 (or latin1, depending on what you're using), you can do this to:
$con = mysql_connect("localhost","username","password");
mysql_set_charset('utf8',$con);
or use this to check what charset you are using:
$con = mysql_connect("localhost","username","password");
$charset = mysql_client_encoding($con);
echo "The current character set is: $charset\n";
More info here: http://php.net/manual/en/function.mysql-set-charset.php
I chose to strip these characters out of the string by doing this -
ini_set('mbstring.substitute_character', "none");
$text= mb_convert_encoding($text, 'UTF-8', 'UTF-8');
Just Paste This Code In Starting to The Top of Page.
<?php
header("Content-Type: text/html; charset=ISO-8859-1");
?>
Based on your description of the problem, the data in your database is almost certainly encoded as Windows-1252, and your page is almost certainly being served as ISO-8859-1. These two character sets are equivalent except that Windows-1252 has 16 extra characters which are not present in ISO-8859-1, including left and right curly quotes.
Assuming my analysis is correct, the simplest solution is to serve your page as Windows-1252. This will work because all characters that are in ISO-8859-1 are also in Windows-1252. In PHP you can change the encoding as follows:
header('Content-Type: text/html; charset=Windows-1252');
However, you really should check what character encoding you are using in your HTML files and the contents of your database, and take care to be consistent, or convert properly where this is not possible.
Add this function to your variables
utf8_encode($your variable);
Try This Please
mb_substr($description, 0, 490, "UTF-8");
This will help you. Put this inside <head> tag
<meta charset="iso-8859-1">
That can be caused by unicode or other charset mismatch. Try changing charset in your browser, in of the settings the text will look OK. Then it's question of how to convert your database contents to charset you use for displaying. (Which can actually be just adding utf-8 charset statement to your output.)
what I ended up doing in the end after I fixed my tables was to back it up and change back the settings to utf-8 then I altered my dump file so that DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci are my character set entries
now I don't have characterset issues anymore because the database and browser are utf8.
I figured out what caused it. It was the web page+browser effects on the DB. On the terminals that are linux (ubuntu+firefox) it was encoding the database in latin1 which is what the tabes are set. But on the windows 10+edge terminals, the entries were force coded into utf8. Also I noticed the windows 10 has issues staying with latin1 so I decided to bend with the wind and convert all to utf8.
I figured it was a windows 10 issue because we started using win 10 terminals.
so yet again microsoft bugs causes issues. I still don't know why the encoding changes on the forms because the browser in windows 10 shows the latin1 characterset but when it goes in its utf8 encoded and I get the data anomaly. but in linux+firefox it doesn't do that.
This happened to work in my case:
$text = utf8_decode($text)
I turns the black diamond character into a question mark so you can:
$text = str_replace('?', '', utf8_decode($text));
Just add these lines before headers.
Accurate format of .doc/docx files will be retrieved:
if(ini_get('zlib.output_compression'))
ini_set('zlib.output_compression', 'Off');
ob_clean();
When you extract data from anywhere you should use functions with the prefix md_FUNC_NAME.
Had the same problem it helped me out.
Or you can find the code of this symbol and use regexp to delete these symbols.
You can also change the caracter set in your browser. Just for debug reasons.
Using the same charset (as suggested here) in both the database and the HTML has not worked for me... So remembering that the code is generated as HTML, I chose to use the "(HTML code) or the " (ISO Latin-1 code) in my database text where quotes were used. This solved the problem while providing me a quotation mark. It is odd to note that prior to this solution, only some of the quotation marks and apostrophes did not display correctly while others did, however, the special code did work in all instances.
I ran the "detect encoding" code after my collation change in phpmyadmin and now it comes up as Latin_1.
but here is something I came across looking a different data anomaly in my application and how I fixed it:
I just imported a table that has mixed encoding (with diamond question marks in some lines, and all were in the same column.) so here is my fix code. I used utf8_decode process that takes the undefined placeholder and assigns a plain question mark in the place of the "diamond question mark " then I used str_replace to replace the question mark with a space between quotes.
here is the
[code]
include 'dbconnectfile.php';
//// the variable $db comes from my db connect file
/// inx is my auto increment column
/// broke_column is the column I need to fix
$qwy = "select inx,broke_column from Table ";
$res = $db->query($qwy);
while ($data = $res->fetch_row()) {
for ($m=0; $m<$res->field_count; $m++) {
if ($m==0){
$id=0;
$id=$data[$m];
echo $id;
}else if ($m==1){
$fix=0;
$fix=$data[$m];
$fix = utf8_decode($fix);
$fixx =str_replace("?"," ",$fix);
echo $fixx;
////I echoed the data to the screen because I like to see something as I execute it :)
}
}
$insert= "UPDATE Table SET broke_column='".$fixx."' where inx='".$id."'";
$insresult= $db->query($insert);
echo"<br>";
}
?>
For global purposes.
Instead of converting, codifying, decodifying each text I prefer to let them as they are and instead change the server php settings.
So,
Let the diamonds
From the browser, on the view menu select
"text encoding" and find the one which let's you see your text
correctly.
Edit your php.ini and add:
default_charset = "ISO-8859-1"
or instead of ISO-8859 the one which fits your text encoding.
Go to your phpmyadmin and select your database and just increase the length/value of that table's field to 500 or 1000 it will solve your problem.