I'm having a problem where I'm trying to parse an email, and then post the email content to a website. The email may contain Japanese or English. The Japanese displays 99% correctly on the website, but every now and then a character will be swapped for another, or it will display as garbage.
Here's the code being used to get the proper encoding for the email body:
$post->content = quoted_printable_decode($parser->getMessageBody('text'));
$isISO2022 = $parser->isISO2022();
$post->content = ($isISO2022)
? mb_convert_encoding($post->content, 'UTF-8', 'iso-2022-jp')
: mb_convert_encoding($post->content, 'UTF-8', mb_detect_encoding($post->content));
$post->save();
The parser's isISO2022 function:
public function isISO2022() {
$isISO2022 = false;
foreach ($this->parts as $part) {
if (isset($part['headers']['content-type']) && preg_match('/iso-2022-jp/i',$part['headers']['content-type'])) {
$isISO2022 = true;
}
}
return $isISO2022;
}
Anyone have any ideas what's going on?
Added:
I have heard that there are some specific characters that are not supported by iso-2022-jp, and you should use iso-2022-jp-ms instead, but when I try to use iso-2022-jp-ms, it says invalid encoding. It also seems to me that the characters I've seen it not display correctly are basic characters, and should be universally supported.
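One thing that may be worth trying is to take the charset from each part's own Content-Type header instead of relying on mb_detect_encoding(), and to check at runtime whether this PHP build knows ISO-2022-JP-MS before asking for it. The sketch below is only an illustration under those assumptions: charsetFromContentType() and partToUtf8() are hypothetical helpers, not part of the parser used above, and the sample body is generated in place of a real mail.

```php
<?php
// Hypothetical helper: pull the charset parameter out of a Content-Type header value.
function charsetFromContentType(string $contentType): ?string
{
    if (preg_match('/charset\s*=\s*"?([^";\s]+)"?/i', $contentType, $m)) {
        return $m[1];
    }
    return null;
}

// Hypothetical helper: convert one already-decoded body part to UTF-8 using its declared charset.
function partToUtf8(string $body, ?string $charset): string
{
    // Canonical names only; aliases such as "US-ASCII" are not covered by this simple check.
    $known = array_map('strtoupper', mb_list_encodings());
    $charset = strtoupper((string) $charset);

    // Prefer the Microsoft superset when this PHP build offers it.
    if ($charset === 'ISO-2022-JP' && in_array('ISO-2022-JP-MS', $known, true)) {
        $charset = 'ISO-2022-JP-MS';
    }
    if ($charset !== '' && in_array($charset, $known, true)) {
        return mb_convert_encoding($body, 'UTF-8', $charset);
    }
    // Unknown or missing charset: leave the bytes alone rather than guess.
    return $body;
}

$header = 'text/plain; charset="iso-2022-jp"';
// Stand-in for a real mail body: the same text encoded as ISO-2022-JP.
$raw = mb_convert_encoding('日本語のテスト', 'ISO-2022-JP', 'UTF-8');
echo partToUtf8($raw, charsetFromContentType($header)), "\n";
```

Checking mb_list_encodings() first avoids the "invalid encoding" error on builds that lack ISO-2022-JP-MS, at the cost of falling back to plain ISO-2022-JP there.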
Related
I have a reference emojis file used by my php code. Inside there is for example "woman-woman-boy", but the browser (chrome) replaces this name by "family_mothers_one_boy"...
Why are there two versions of emojis' names?
Is there an error (or several) in my file, or do I have to do something in my code to avoid the conversion?
NOTE:
The code related to this emoji is:
1F469;👩👦
Here are the two functions I'm using to manage the emojis:
1. When I display the emoji, I replace the tag :name: with its HTML rendering (using Unicode)
function replaceEmojiNameByUnicode($inputText){
$emoji_unicode = getTabEmojiUnicode();
preg_match_all("/:([a-zA-Z0-9'_+-]+):/", $inputText, $emojis);
foreach ($emojis[1] as $emojiname) {
if (isset($emoji_unicode[$emojiname])) {
$inputText = str_replace(":".$emojiname.":", "&#x".$emoji_unicode[$emojiname].";", $inputText);
}
else {
$inputText = str_replace(":".$emojiname.":", "(:".$emojiname.":)", $inputText);
}
}
return $inputText;
}
2. When I want to offer the list of emoji, I display an HTML SELECT in the page. The following function returns the list of options to add inside it:
/* Display the options in the HTML select */
function displayEmojisOptions(){
$emoji_unicode = getTabEmojiUnicode();
foreach ($emoji_unicode as $name => $unicode) {
echo '<option value="&#x'.$unicode.';">'.$name.' => &#x'.$unicode.';</option>';
}
}
In the array $emoji_unicode there is one entry (with 3 semicolons removed so the emoji is not rendered here):
'family_one_girl' => '1F468;‍👩‍👧',
For example: to make it work, I have to replace the line 'thinking_face' => '1F914', with 'thinking' => '1F914',
My question is: why ??
Thank you
No, the emoji text was not changed by any code. I guess it was due to a wrong emoji file I used. I corrected all the emoji manually and I no longer see the mismatch.
If someone needs the corrected file, I can provide it.
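On the family entry quoted in the question: such emoji are sequences of several code points joined by U+200D (zero width joiner), so a single "&#x...;" entity cannot represent them. As a rough sketch, and assuming the file stored each emoji as a list of hex code points (which is not how the file above is laid out), the entities could be built like this. codepointsToEntities() is a made-up name.

```php
<?php
// Hypothetical helper: turn a list of hex code points into a run of HTML entities.
function codepointsToEntities(array $hexCodepoints): string
{
    $out = '';
    foreach ($hexCodepoints as $hex) {
        $out .= '&#x' . strtoupper($hex) . ';';
    }
    return $out;
}

// man + ZWJ + woman + ZWJ + girl (U+200D is the zero width joiner)
$family = ['1F468', '200D', '1F469', '200D', '1F467'];
echo codepointsToEntities($family), "\n";
```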
We are fetching data from csv file through php, and trying to compare data to insert the right data in our database but php comparison is not working as data in file is with french accents.
Here is a piece of code we are working with.
if($data[0]=='Expression' && $data[1]=='Domaine (Domain)' && utf8_decode($data[2])==utf8_decode('DŽfinition (Definition)') && $data[3]=='Commentaire (Commentary)' && $data[4]=='Voir aussiÉ (See also É)' && $data[5]=='ƒquivalent anglais (English equivalent)' && $data[6]=='En contexte / exemple(s) É (In context / examples)' && $data[7]=='Source' )
{
echo "<tr>
<td>".$data['0']."</td>
<td>".$data['1']."</td>
<td>".$data['2']."</td>
<td>".$data['3']."</td>
<td>".$data['4']."</td>
<td>".$data['5']."</td>
<td>".$data['6']."</td>
<td>".$data['7']."</td>
<td><i class='fa fa-close text-navy'></i></td>
</tr>";
return true;
}
else
{
echo "invalid data";
exit;
}
We have tried with this as well.
function convert($data)
{
$value = utf8_encode($data);
$value = iconv('UTF-8', 'ASCII//TRANSLIT', $value);
return $value;
}
The header is already set, and the output looks fine:
header('Content-Type: text/html; charset=iso-8859-1');
We have tried several PHP functions such as utf8_decode, htmlentities, htmlspecialchars, and htmlspecialchars_decode, but nothing works.
echo print_r(utf8_decode($data[2]));
output is as following:
D?finition (Definition)1invalid data
Actual word is : DŽfinition (Definition)'
We are working on a French dictionary and also need to do real-time searching on the data. Please help with MySQL as well: which functions need to be called to decode the data before insertion, and which to encode it before showing it back to the user?
I hope my question is clear.
Thanks in advance
To save the CSV as UTF-8, open it in Notepad.
Go to File - Save As.
Change the Encoding to UTF-8.
Or with LibreOffice:
https://csvimproved.com/support/questions-and-answers/916-save-a-csv-file-as-utf-8
Hope it helps.
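If re-saving the file is not always possible, another option is to normalise each field to UTF-8 in PHP before comparing it against UTF-8 literals. The following is a minimal sketch only: toUtf8Field() is a hypothetical helper, and ISO-8859-1 is just an assumed fallback, since the real export encoding of the CSV is not known from the question.

```php
<?php
// Hypothetical helper: return $value as UTF-8, converting only when it is not valid UTF-8 already.
// $legacyEncoding is an assumption about the source file; adjust it to the real export encoding.
function toUtf8Field(string $value, string $legacyEncoding = 'ISO-8859-1'): string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }
    return mb_convert_encoding($value, 'UTF-8', $legacyEncoding);
}

$latin1Cell = "D\xE9finition (Definition)"; // "Définition" as ISO-8859-1 bytes
$utf8Cell   = "Définition (Definition)";    // already UTF-8 (this script is saved as UTF-8)

var_dump(toUtf8Field($latin1Cell) === $utf8Cell);
var_dump(toUtf8Field($utf8Cell) === $utf8Cell);
```

With every field normalised this way, the comparison no longer needs utf8_decode(), provided the PHP file containing the expected strings is itself saved as UTF-8.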
I want to save special characters like the following string in a database:-
:¦:-•:":•.-:¦:-•EXCELLENT!•-:¦:-•:•-:¦:-•:*''•
Below is the code that I am using.
$Fields ['CommentText']=$CommentText;
$Fields = prepareMySQLi($FieldsNotifications,$linkMysqli);
$insert = mysqli_query($linkMysqli,"INSERT INTO `feeds` SET $Fields");
function prepareMySQLi($MyArray,$linkMysqli) {
foreach($MyArray as $col => $val) {
if($val=='Invalid Request') $val='';
if ($val!='' && !is_array($val)) {
$col = mysqli_real_escape_string($linkMysqli,$col);
$val = mysqli_real_escape_string($linkMysqli,$val);
if(isset($fields)) {
$fields .= ", `$col` = '$val' ";
} else {
$fields = " `$col` = '$val' ";
}
}
}
return $fields;
}
But the above code saves the result like:-
•:¨¨:•.EXCELLENT.•:¨¨:••:¨¨:•.
Can anyone guide me how can I save the string same as it is in the database?
This looks like an encoding issue. The encoding of the data you receive should match the encoding the database uses to store it.
The general practice is to use UTF-8 for both.
So check which encoding the database stores the data in, and convert the received data to that encoding (or change the database to match the data).
You can use the utf8_encode function to convert data to UTF-8, but note that it assumes the input is ISO-8859-1, so it only helps when that is really the case.
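To see why a mismatch matters, the sketch below simulates one layer reading UTF-8 bytes as if they were Windows-1252. It is only an illustration of the mechanism; which layer makes the wrong assumption in the setup from the question is not known.

```php
<?php
// The bullet character U+2022 as UTF-8 bytes (E2 80 A2).
$original = "\u{2022}";

// Simulate a layer that wrongly treats those bytes as Windows-1252 and re-encodes them as UTF-8.
$garbled = mb_convert_encoding($original, 'UTF-8', 'Windows-1252');
echo $garbled, "\n";

// The damage can be undone only when the wrong assumption is known.
$repaired = mb_convert_encoding($garbled, 'Windows-1252', 'UTF-8');
var_dump($repaired === $original);
```

One bullet turns into three unrelated characters, which is the typical look of text that was stored or transmitted under the wrong charset.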
Please check these simple things; I hope they help:
set the collation of the column that will store this data (the special characters) to utf8_bin.
change the column type to BLOB, TEXT, or LONGTEXT.
in your code, try: mysqli_set_charset($linkMysqli, "utf8");
encode the text before saving it, and decode it again before displaying it.
The simple fix is to change
the collation in the database to utf8 -> utf8_unicode_ci
and, if the field type is not TEXT, change it to TEXT.
That may be the simple solution.
You can also check whether your mysqli DB class has
'charset' => 'utf8',
set or not.
If the charset is missing or different, change it to utf8.
Finally, check your page header
and add
<meta charset="utf-8">
That's it; hopefully this gets you somewhere.
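Putting the connection-charset advice together, a sketch of the insert could look like the following. The table `feeds` and column `CommentText` come from the question; saveComment() is a made-up name, utf8mb4 is used here instead of utf8 on the assumption that the column is (or can be) utf8mb4, and a prepared statement stands in for the manual escaping.

```php
<?php
// Sketch: store a comment through a connection whose charset is set explicitly.
// Table `feeds` and column `CommentText` come from the question; the function name is made up.
function saveComment(mysqli $db, string $commentText): bool
{
    // Tell MySQL that the bytes sent and received on this connection are UTF-8.
    if (!$db->set_charset('utf8mb4')) {
        return false;
    }
    // A prepared statement sends the value separately from the SQL, so no manual escaping is needed.
    $stmt = $db->prepare('INSERT INTO `feeds` (`CommentText`) VALUES (?)');
    if ($stmt === false) {
        return false;
    }
    $stmt->bind_param('s', $commentText);
    $ok = $stmt->execute();
    $stmt->close();
    return $ok;
}
```

It would be called as saveComment($linkMysqli, $CommentText) once the connection is open. The page that later displays the text still needs to declare UTF-8 as well.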
I would like to be able for visitors in my website to register with a Hebrew username.
In my website when I try to register with a Hebrew character username it gives me no error,
the user receives an email saying he was successfully registered and the admin gets an email with the user's details,
BUT
The user does not actually register to the database... he does not show in the users table and when he tries to log-in there is an error saying there is no such user...
This does not happen with English usernames
Is there any way to fix this?
Short description of the problem: (and possible solution)
The problem occurs when WordPress creates the user's nicename ("A string that contains a URL-friendly name for the user").
If you specify a Hebrew username, the nicename becomes a percent-encoded string, six characters per Hebrew letter.
In my test, the Hebrew username was "בדיקהקהקהק" with a sanitized username of "%d7%91%d7%93%d7%99%d7%a7%d7%94%d7%a7%d7%94%d7%a7%d7%a7" (54 characters).
The length of the user_nicename field is 50 characters in the WordPress database (which, by the way, explains why it worked with usernames of up to 8 characters).
Taking a look at the Wordpress code of user.php, we can see that Wordpress is checking the length of user_nicename before inserting the user, but it doesn't work.
Let me show you why:
the code to check user_nicename length is something like this:
$user_nicename = sanitize_user( $userdata['user_nicename'], true );
if ( mb_strlen( $user_nicename ) > 50 ) {
return new WP_Error( 'user_nicename_too_long', __( 'Nicename may not be longer than 50 characters.' ) );
}
When using a Hebrew username, regardless of the length of the generated user_nicename, the condition will never be true, because sanitize_user() "removes all unsafe characters" (including percent-encoded octets and HTML entities) and returns an empty string, with length 0.
Because the check passes, you get the impression that the user was created without error, but the insert fails because the data exceeds the length of the user_nicename field.
Possible solutions are to programmatically cut the string to 50 characters, to force a static user_nicename like "user-1", "user-2", etc., or to use Latin characters for user_nicename, either with the PHP Transliterator class or with a custom library/function.
I picked the last option with a custom function, because I was on shared hosting with no option to install additional libraries.
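The length arithmetic can be checked with a short standalone sketch. It runs outside WordPress, so the preg_replace() call is only an assumed stand-in for the octet-stripping step and not sanitize_user() itself; the username is the nine-letter one whose encoded form is quoted above.

```php
<?php
// Nine Hebrew letters, written as escapes to keep the source ASCII-only.
$username = "\u{05D1}\u{05D3}\u{05D9}\u{05E7}\u{05D4}\u{05E7}\u{05D4}\u{05E7}\u{05E7}";

// Percent-encode the UTF-8 bytes, lower-cased to match the form quoted above.
$encoded = strtolower(rawurlencode($username));
echo $encoded, ' (', strlen($encoded), " characters)\n";

// Stand-in for the octet-stripping step (an assumption, not WordPress's sanitize_user itself).
$stripped = preg_replace('/%[0-9a-f]{2}/i', '', $encoded);
var_dump(strlen($stripped) > 50); // the length check only ever sees what is left after stripping

// One of the options listed above: cut the encoded string to fit the column.
// 48 is a multiple of 6, so no two-byte letter is split in the middle.
$truncated = substr($encoded, 0, 48);
echo strlen($truncated), "\n";
```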
Working code
inserted into functions.php of my theme:
// user creation with Hebrew usernames
//function to transliterate username
function convert_user_nicename($hebrew_username) {
$charset = array('א'=>'a','ב'=>'b','ג'=>'g','ד'=>'d','ה'=>'h','ו'=>'v','ז'=>'z','ח'=>'h','ט'=>'t','י'=>'y','ך'=>'k','כ'=>'k','ל'=>'l','ם'=>'m','מ'=>'m','ן'=>'n','נ'=>'n','ס'=>'s','ע'=>'e','ף'=>'p','פ'=>'p','ץ'=>'ts','צ'=>'ts','ק'=>'q','ר'=>'r','ש'=>'sh','ת'=>'t');
$latin_user_slug = '';
$chars = mbStringToArray($hebrew_username);
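foreach ($chars as $char) {
// Keep characters that have no entry in the map (digits, Latin letters) unchanged
$latin_user_slug .= isset($charset[$char]) ? $charset[$char] : $char;
}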
return($latin_user_slug);
}
function mbStringToArray ($string) {
// Split a UTF-8 string into an array of single characters
$chars = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
// preg_split() returns false on invalid UTF-8; fall back to an empty array
return ($chars === false) ? array() : $chars;
}
// remove the filter
remove_filter( 'pre_user_nicename', 'filter_pre_user_nicename', 10, 1 );
function filter_pre_user_nicename($user_nicename) {
$heb = html_entity_decode($user_nicename);
$user_nicename = convert_user_nicename($heb);
return $user_nicename;
};
// add new filter
add_filter( 'pre_user_nicename', 'filter_pre_user_nicename', 10, 1 );
You'll need to change the MySQL Server connection collation. If you have PhpMyAdmin, you can simply change this by logging in to PhpMyAdmin. And then, on the front page you will see the Server connection collation (on the General tab).
Change the server connection collation into hebrew_general_ci to support Hebrew based texts in the database.
References:
https://www.serverintellect.com/support/sqlserver/change-database-collation/
http://dev.mysql.com/doc/refman/5.1/en/charset-connection.html
Hope that helps!
I generate a lot of posts in WordPress from an XML file. The problem: accented characters.
The header of the feed is:
<?xml version="1.0" encoding="ISO-8859-15"?>
Here is the complete feed: http://flux.netaffiliation.com/rsscp.php?maff=177053821BA2E13E910D54
My site is in UTF-8.
So I use the utf8_encode function, but that does not solve the problem: the accented characters are still displayed incorrectly.
Does anyone have an idea?
EDIT 04-10-2011 18:02 (French time):
Here is my code :
/**
* Parse an RSS feed from NetAffiliation and convert each item to a post.
* @param string $flux external link to the feed
* @return bool
*/
private function parseFluxNetAffiliation($flux)
{
$content = file_get_contents($flux);
$content = iconv("iso-8859-15", "utf-8", $content);
$xml = new DOMDocument;
$xml->loadXML($content);
//get the first link : http://www.netaffiliation.com
$link = $xml->getElementsByTagName('link')->item(0);
//echo $link->textContent;
//we get all items and create a multidimensional array
$items = $xml->getElementsByTagName('item');
$offers = array();
//we walk items
foreach($items as $item)
{
$childs = $item->childNodes;
//we walk the child nodes
foreach($childs as $child)
{
$offers[$child->nodeName][] = $child->nodeValue;
}
}
unset($offers['#text']);
//we create one article foreach offer
$nbrPosts = count($offers['title']);
if($nbrPosts <= 0)
{
echo self::getFeedback("Le flux ne continent aucune offre",'error');
return false;
}
$i = 0;
while($i < $nbrPosts)
{
// Create post object
$description = '<p>'.$offers['description'][$i].'</p><p>'.$offers['link'][$i].'</p>';
$my_post = array(
'post_title' => $offers['title'][$i],
'post_content' => $description,
'post_status' => 'publish',
'post_author' => 1,
'post_category' => array(self::getCatAffiliation())
);
// Insert the post into the database
wp_insert_post($my_post);
$i++;
}
echo self::getFeedback("Le flux a généré {$nbrPosts} article(s) depuis le flux NetAffiliation dans la catégorie affiliation",'updated');
return true;
}
All the posts are generated, but the accented characters are garbled. You can see the result here: http://monsieur-mode.com/test/
There are plenty of difficulties to master when converting between encodings. In addition, encodings that use more than one byte per character (so-called multibyte encodings) such as UTF-8, which WordPress uses, deserve special attention in PHP.
First, make sure that all the files you create are saved in the same encoding in which they will be served. For example, make sure the encoding you choose in the "Save as..." dialog matches the one in the HTTP Content-Type header.
Second, you need to verify that the input has the same encoding as the file you want to deliver. In your case, the input file has the encoding ISO-8859-15, so you'll need to convert it to UTF-8 using iconv().
Third, you must know that PHP's core string functions work on bytes and are not aware of multibyte encodings such as UTF-8. Functions such as htmlentities() can produce strange characters when the encoding they assume differs from that of the data. For many of these functions there are multibyte alternatives, prefixed with mb_. If your encoding is UTF-8, check your files for such functions and replace them if necessary.
For more information about these topics, see Wikipedia about variable-width encodings, and the page in the PHP-Manual.
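A small illustration of the byte versus character distinction, as a sketch with a made-up sample word:

```php
<?php
$word = "\u{00E9}t\u{00E9}"; // "été" as UTF-8: two bytes per accented letter

echo strlen($word), "\n";             // counts bytes
echo mb_strlen($word, 'UTF-8'), "\n"; // counts characters

// Byte-based substr() can cut a character in half; mb_substr() works on whole characters.
var_dump(mb_check_encoding(substr($word, 0, 1), 'UTF-8'));
var_dump(mb_substr($word, 0, 1, 'UTF-8') === "\u{00E9}");
```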
By default, most applications work with UTF-8 data and output UTF-8 content. WordPress is no exception and certainly works on a UTF-8 basis.
I would simply not convert anything when printing, and instead change your header to UTF-8 rather than ISO-8859-15.
If your incoming XML data is ISO-8859-15, use iconv() to convert it:
$stream = file_get_contents("stream.xml");
$stream = iconv("iso-8859-15", "utf-8", $stream);
mb_convert_encoding() saved my life.
Here is my solution:
$content = preg_replace('/ encoding="ISO-8859-15"/is','',$content);
$content = mb_convert_encoding($content,"UTF-8","ISO-8859-15");
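A possible alternative, sketched below with a made-up one-item document, is to skip the manual conversion entirely. As far as I know, DOMDocument reads the encoding declaration itself and hands node values back as UTF-8, so converting the bytes first while leaving the declaration in place can make the text get decoded twice. This assumes the underlying libxml2 build supports ISO-8859-15.

```php
<?php
// A tiny ISO-8859-15 document; \xE9 is "é" in that encoding.
$xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-15\"?>\n"
     . "<rss><channel><item><title>Caf\xE9</title></item></channel></rss>";

$doc = new DOMDocument();
// No iconv() beforehand: the parser reads the declaration and decodes the bytes itself.
$doc->loadXML($xml);

$title = $doc->getElementsByTagName('title')->item(0)->nodeValue;
var_dump(mb_check_encoding($title, 'UTF-8'));
echo $title, "\n";
```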