IMAP Encoding and decoding: UTF-8 issue - php

I have a function that moves a mail from one folder to another on a Gmail account.
Moving the mail works fine. However, my problem appears when
working with UTF-8 encoded mailbox names. I decode the folder names from the IMAP folder list response
and compare them with the name entered by the client, but a dump of the two values gives different results.
// Getting the folders
$folders = imap_list(CONNECTION, MAILBOX, PATTERN);
// After a foreach, stripping slash, prefix and such
// $folder is the raw mailbox name from the IMAP list
$mailbox = utf8_encode(imap_utf7_decode($folder)); // = string(12) "Tæstbåks"
// The entered search from the client
$search_for = "Tæstbåks"; // = string(10) "Tæstbåks"
if ($search_for == $mailbox)
    print "Yeah!";
else
    print "Noo!";
My problem is that I do not know why those two strings do not match.

PHP's imap_utf7_decode() function is documented to return a string in ISO-8859-1 encoding. IMAP's modified UTF-7 scheme can encode the whole range of Unicode (over a million code points), whereas ISO-8859-1 can represent only 256 characters, so you cannot rely on that function in this context. I would go as far as to suggest that the PHP developer who decided to offer such a useless function was not in his best shape the day he designed it.
It looks like the mbstring extension can do what you really want here: use something like $mailbox = mb_convert_encoding($folder, "UTF-8", "UTF7-IMAP"), as suggested in the comments below the PHP docs.
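As a minimal sketch of that suggestion, assuming the mbstring extension is loaded: the raw folder name below is a made-up example of how a server might return the folder from the question in IMAP's modified UTF-7.

```php
<?php
// Hypothetical raw folder name from imap_list(), after stripping the prefix.
$folder = 'T&AOY-stb&AOU-ks';

// Decode IMAP modified UTF-7 straight to UTF-8, without a detour via ISO-8859-1.
$mailbox = mb_convert_encoding($folder, 'UTF-8', 'UTF7-IMAP');

// The search term, written with escapes so this file's own encoding does not matter.
$search_for = "T\u{00E6}stb\u{00E5}ks";

var_dump($mailbox === $search_for);
```

The `\u{...}` escapes need PHP 7 or later. With a literal "Tæstbåks" instead, the source file itself has to be saved as UTF-8 for the comparison to hold.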


Convert and separate emoji from strings

I have been searching for some time for a way to do this. I am not an expert in this emoji 'stuff' and I need some help.
I have an application with multiple SMS service providers attached (2 of them at the moment). They send SMS messages to us (inbound), and we send SMS messages through them (outbound), all via API.
When an SMS message is received (from either provider), our platform sees the incoming text (with emojis) as:
"hello \ud83e\udd2a"
I have already changed my database to store the emojis, and I changed the charset in my PHP application to display them correctly in HTML, including the email forward, so I am all good there.
The issue I am running into is that one of the providers (when sending) will accept this as a valid emoji:
"hello to you \ud83e\udd2a"
But the other will not. The second provider needs (I think) the HTML decimal format, so when I send to them, it needs to look like this:
"hello to you &#128512;"
I have 2 separate PHP functions to send to each provider, so I can do any conversion I need in the front-end app before it sends the message to the provider.
My front-end is using a jQuery emoji picker, so the PHP form post sends "hello to you \ud83e\udd2a" to the PHP function that calls the API.
Any insight you can give will be greatly appreciated!
Thanks in advance.
$utf32 = mb_convert_encoding($message['text'], 'UTF-32', 'UTF-8' );
$hex4 = bin2hex($utf32);
$dec = hexdec($hex4);
$emoji_replaced = "&#$dec;";
echo "&#$dec;";
This was giving me a decimal value of the whole string.
As a start, when the JSON document is decoded the \ud83e\udd2a will be decoded to the standard 🤪 emoji. These escapes are part of the JSON spec.
As for the provider that "doesn't support" emojis, they are likely using a text encoding other than UTF-8 that will fail to display any "special" character or non-English text, not just emojis. This is further evidenced by the fact that they can still render the emoji when you send it as an HTML entity.
While you can use the code below to convert emojis to HTML entity sequences, you should also check that other provider's docs for the encoding they use, and convert the messages appropriately. That said, non-UTF-8 encodings simply do not have emojis at all, so you may still want to convert to entities first.
As their name implies, HTML entities will only work when viewed as part of an HTML document. YMMV.
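<?php
// Convert a single character to a decimal HTML entity
function emoji_to_entity($emoji) {
    return sprintf(
        '&#%s;',
        // 'N' reads a 32-bit big-endian integer, so ask for UTF-32BE explicitly
        unpack(
            'Ntgt',
            mb_convert_encoding($emoji, 'UTF-32BE', 'UTF-8')
        )['tgt']
    );
}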
// PHP>=8.2 supports the Extended_Pictographic property
//$emoji_regex = '/\p{Extended_Pictographic}/u';
// PHP<8.2 needs to use a monstrous regex like this
// culled from: https://raw.githubusercontent.com/PCRE2Project/pcre2/master/maint/Unicode.tables/emoji-data.txt
$emoji_regex = '/[\x{23}-\x{23}\x{2a}-\x{2a}\x{30}-\x{39}\x{a9}-\x{a9}\x{ae}-\x{ae}\x{200d}-\x{200d}\x{203c}-\x{203c}\x{2049}-\x{2049}\x{20e3}-\x{20e3}\x{2122}-\x{2122}\x{2139}-\x{2139}\x{2194}-\x{2199}\x{21a9}-\x{21aa}\x{231a}-\x{231b}\x{2328}-\x{2328}\x{2388}-\x{2388}\x{23cf}-\x{23cf}\x{23e9}-\x{23f3}\x{23f8}-\x{23fa}\x{24c2}-\x{24c2}\x{25aa}-\x{25ab}\x{25b6}-\x{25b6}\x{25c0}-\x{25c0}\x{25fb}-\x{25fe}\x{2600}-\x{2605}\x{2607}-\x{2612}\x{2614}-\x{2685}\x{2690}-\x{2705}\x{2708}-\x{2712}\x{2714}-\x{2714}\x{2716}-\x{2716}\x{271d}-\x{271d}\x{2721}-\x{2721}\x{2728}-\x{2728}\x{2733}-\x{2734}\x{2744}-\x{2744}\x{2747}-\x{2747}\x{274c}-\x{274c}\x{274e}-\x{274e}\x{2753}-\x{2755}\x{2757}-\x{2757}\x{2763}-\x{2767}\x{2795}-\x{2797}\x{27a1}-\x{27a1}\x{27b0}-\x{27b0}\x{27bf}-\x{27bf}\x{2934}-\x{2935}\x{2b05}-\x{2b07}\x{2b1b}-\x{2b1c}\x{2b50}-\x{2b50}\x{2b55}-\x{2b55}\x{3030}-\x{3030}\x{303d}-\x{303d}\x{3297}-\x{3297}\x{3299}-\x{3299}\x{fe0f}-\x{fe0f}\x{1f000}-\x{1f0ff}\x{1f10d}-\x{1f10f}\x{1f12f}-\x{1f12f}\x{1f16c}-\x{1f171}\x{1f17e}-\x{1f17f}\x{1f18e}-\x{1f18e}\x{1f191}-\x{1f19a}\x{1f1ad}-\x{1f1ff}\x{1f201}-\x{1f20f}\x{1f21a}-\x{1f21a}\x{1f22f}-\x{1f22f}\x{1f232}-\x{1f23a}\x{1f23c}-\x{1f23f}\x{1f249}-\x{1f53d}\x{1f546}-\x{1f64f}\x{1f680}-\x{1f6ff}\x{1f774}-\x{1f77f}\x{1f7d5}-\x{1f7ff}\x{1f80c}-\x{1f80f}\x{1f848}-\x{1f84f}\x{1f85a}-\x{1f85f}\x{1f888}-\x{1f88f}\x{1f8ae}-\x{1f8ff}\x{1f90c}-\x{1f93a}\x{1f93c}-\x{1f945}\x{1f947}-\x{1faff}\x{1fc00}-\x{1fffd}\x{e0020}-\x{e007f}]/u';
$input = json_decode('"hello to you \ud83e\udd2a"');
var_dump(
    $input,
    preg_replace_callback(
        $emoji_regex,
        function($a) { return emoji_to_entity($a[0]); },
        $input
    )
);
Output:
string(17) "hello to you 🤪"
string(22) "hello to you &#129322;"
And outside of a code block, where the entity is rendered as HTML: string(22) "hello to you 🤪"
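As an alternative sketch, not the approach used in the answer above, mbstring's mb_encode_numericentity() can do the conversion without a regex. The function name astral_to_entities and the chosen range are assumptions for illustration: the map only covers code points above U+FFFF, which includes most emoji but misses the ones in the Basic Multilingual Plane (for example U+2600).

```php
<?php
// Sketch: turn every code point above U+FFFF into a decimal HTML entity.
// The map format is [start, end, offset, mask]; the range is an assumption.
function astral_to_entities(string $text): string
{
    $map = [0x10000, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($text, $map, 'UTF-8');
}

// json_decode() turns the \ud83e\udd2a surrogate pair into one UTF-8 character.
$input = json_decode('"hello to you \ud83e\udd2a"');
echo astral_to_entities($input), "\n";
```

Widening the map to further ranges would catch more symbols, at the cost of also converting non-emoji characters that fall inside those ranges.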

Is there a way to change the encoding of the headers in SwiftMailer?

I'm using SwiftMailer to send emails but I have some encoding problems with UTF-8 subjects. SwiftMailer uses QPHeaderEncoder by default to encode email headers, and its safeMap seems to have problems with some UTF-8 French characters. One subject I use contains the word trouvé (found in French) and when the subject gets to the user it shows trouv.
I'd like to use something similar to the NativeQPContentEncoder that's available as a content encoder, but for headers there are only Base64 and Quoted Printable encoders.
Is there a way to fix this? Maybe I'm doing something wrong, so here is the code I'm using:
$message = Swift_Message::newInstance()
    // set encoding in 8 bit
    ->setEncoder(Swift_Encoding::get8BitEncoding())
    // Give the message a subject
    ->setSubject($subject)
    // Set the From address with an associative array
    ->setFrom(array($from => $niceFrom))
    // Set the To addresses with an associative array
    ->setTo(array($to));
Check whether the mbstring.func_overload option in your PHP configuration has any value other than 0. If it does, change it to 0, reload your web server and try to send the message again.
mbstring.func_overload overrides some of PHP's string functions and may lead to tricky bugs with UTF-8.
Personally, I solved exactly this problem by disabling mbstring.func_overload.
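A quick way to check the setting at runtime could look like the sketch below. The helper name mbstring_overload_enabled is made up for illustration. Note that mbstring.func_overload was deprecated in PHP 7.2 and removed in PHP 8.0, so on current versions the check simply reports false.

```php
<?php
// Hypothetical helper: true when mbstring function overloading is active.
// ini_get() returns false when the option does not exist, which is the
// case on PHP 8.0+ or when the mbstring extension is not loaded.
function mbstring_overload_enabled(): bool
{
    $value = ini_get('mbstring.func_overload');
    return $value !== false && (int) $value !== 0;
}

var_dump(mbstring_overload_enabled());
```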
First, make sure you know how your subject string is encoded. If it is ISO-8859-1 rather than UTF-8, utf8_encode() it.
Also, make sure you call setCharset('utf-8') on your message.

Best method of converting user input to UTF-8

I'm building a PHP web application, and it works in UTF-8. The database is UTF-8, the pages are served as UTF-8 and I set the charset to UTF-8 using a meta tag. Of course, with users using Internet Explorer, and copying & pasting from Microsoft Office, I occasionally still get non-UTF-8 input.
The ideal solution would be to throw an HTTP 400 Bad Request error, but obviously I can't do that. The next best thing is converting $_GET, $_POST and $_REQUEST to UTF-8. Is there any way to see what character encoding the input is in so I can pass it off to iconv? If not, what's the best solution for doing this?
Check out mb_detect_encoding(). Example:
$utf8 = iconv(mb_detect_encoding($input), 'UTF-8', $input);
There's also utf8_encode() if you can guarantee that the input string is ISO-8859-1.
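Since detection is only a guess, one option is to validate the input as UTF-8 first and convert only when validation fails. The sketch below assumes that any invalid input is Windows-1252 (typical for text pasted from Microsoft Office on Western systems). That fallback is an assumption, and the helper name to_utf8 is made up for illustration.

```php
<?php
// Hypothetical helper: return $input as UTF-8.
// Assumption: anything that is not valid UTF-8 is treated as $fallback.
function to_utf8(string $input, string $fallback = 'Windows-1252'): string
{
    if (mb_check_encoding($input, 'UTF-8')) {
        return $input; // already valid UTF-8, leave it untouched
    }
    return mb_convert_encoding($input, 'UTF-8', $fallback);
}

// "caf\xE9" is "café" in Windows-1252; 0xE9 alone is not valid UTF-8.
var_dump(to_utf8("caf\xE9") === "caf\u{00E9}");
```

Valid UTF-8 passes through unchanged, so applying the helper to every request value does not double-encode input that was already correct.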
In some cases using just utf8_encode() or general checks is OK, but you might lose some characters within the string. If you can build a basic list of candidate encodings for your input (Windows code pages in this example), you can salvage quite a bit more.
if (!mb_detect_encoding($fileContents, "UTF-8", true)) {
    // Candidate source encodings, in order of preference
    $checkArr = array("windows-1252", "windows-1251");
    $candidates = array();
    foreach ($checkArr as $encode) {
        if (mb_check_encoding($fileContents, $encode)) {
            $candidates[] = $encode;
        }
    }
    // Only convert when at least one candidate matched
    if ($candidates) {
        $fileContents = mb_convert_encoding($fileContents, "UTF-8", implode(",", $candidates));
    }
}

using a base64 encoded string in url with codeigniter

I have an encrypted, base64-encoded array that I need to put into a URL and insert into emails we send to clients so that they can be uniquely identified. The problem is that base64_encode() often appends an = symbol or two after its string of characters, which by default is disallowed by CI.
Here's an example:
http://example.com/cec/pay_invoice/VXpkUmJnMWxYRFZWTEZSd0RXZFRaMVZnQWowR2N3TTdEVzRDZGdCbkQycFFaZ0JpQmd4V09RRmdWbkVMYXdZbUJ6OEdZQVJ1QlNJTU9Bb3RWenNFSmxaaFVXcFZaMXQxQXpWV1BRQThVVEpUT0ZFZ0RRbGNabFV6VkNFTlpsTWxWV29DTmdackEzQU5Nd0lpQURNUGNGQS9BRFlHWTFacUFTWldOZ3M5QmpRSGJBWTlCREVGWkF4V0NtQlhiZ1IzVm1CUk9sVm5XMllEWlZaaEFHeFJZMU51VVdNTmJsdzNWVzlVT0EwZw==
Now I understand I can allow the = sign in config.php, but I don't fully understand the security implications of doing so (it must have been disabled for a reason, right?).
Does anyone know why it might be a bad idea to allow the = symbol in URLs?
Thanks!
John.
Not sure why = is disallowed, but you could also leave off the equals signs.
$base_64 = base64_encode($data);
$url_param = rtrim($base_64, '=');
// and later: restore the padding so the length is a multiple of 4 again
$base_64 = $url_param . str_repeat('=', (4 - strlen($url_param) % 4) % 4);
$data = base64_decode($base_64);
The base64 spec only allows = signs at the end of the string, and they are used purely as padding, so there is no chance of data loss.
Edit: It's possible that it doesn't allow this as a compatibility option. There's no reason that I can think of from a security perspective, but there's a possibility that it may mess with query string parsing somewhere in the tool chain.
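Standard base64 output can also contain + and /, which are awkward in a URL segment too. A common variant, sketched here under the assumption that both ends of the link are under your control, is the URL-safe alphabet from RFC 4648 (- and _ instead of + and /) with the padding stripped. The function names are made up for illustration.

```php
<?php
// Hypothetical helpers for URL-safe base64 (RFC 4648 section 5), unpadded.
function base64url_encode(string $data): string
{
    // Swap the two URL-unfriendly characters, then drop the = padding.
    return rtrim(strtr(base64_encode($data), '+/', '-_'), '=');
}

function base64url_decode(string $text)
{
    $b64 = strtr($text, '-_', '+/');
    // Restore padding so the length is a multiple of 4 again.
    $b64 .= str_repeat('=', (4 - strlen($b64) % 4) % 4);
    return base64_decode($b64, true); // false on invalid input
}

$token = base64url_encode("\xfb\xff\xfe");
var_dump($token, base64url_decode($token) === "\xfb\xff\xfe");
```

The resulting token only uses letters, digits, - and _, so whether it passes a given permitted_uri_chars setting depends on that setting allowing those characters.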
Add the character "=" to $config['permitted_uri_chars'] in your config.php file. You can find that file in the application/config folder.
No character in a URL is harmful in itself. It is inexperienced developers or badly written software that make some characters dangerous.
As for =, I don't see any issues with using it in URLs.
Instead of updating the config file, you can use PHP's native urlencode() and urldecode() functions.
$str=base64_encode('test');
$url_to_be_send=urlencode($str);
//send it via url
//now on the receiving side
//assuming the value passed via GET is stored in $encoded_str
$decoded_str=base64_decode(urldecode($encoded_str));

POST from Flash (AS2) to PHP, outputs ??? when non-english characters are used

I am trying to use POST in Flash (ActionScript 2) to send values to a PHP mail script.
I tried the PHP mail script with an HTML form, and it worked perfectly fine.
But when I POST from Flash and input non-English characters, I get "????" in the mail.
I tried utf8_encode($_POST["name"]), but it doesn't help.
Edit:
I also tried utf8_decode($_POST["name"]); it didn't work.
Update: (So you won't have to go through all the comments)
I checked the variables in Flash;
the values are stored correctly.
The HTML page where the Flash is embedded is UTF-8 encoded.
I watched the POST headers with Firebug; the POST itself is already messed up, showing "????" instead of the real value.
The messed-up "????" value is then URL-encoded by Flash and decoded by PHP, resulting in $_POST["name"] == "???";
I suspect it's the sendAndLoad method that creates the mess.
Update:
Here is the flash code:
System.useCodepage = true;
send_btn.onRelease = function() {
    my_vars = new LoadVars();
    my_vars.email = email_box.text;
    my_vars.name = name_box.text;
    my_vars.family_box = comment.text;
    my_vars.phone = phone_box.text;
    if (my_vars.email != "" and my_vars.name != "") {
        my_vars.sendAndLoad("http://aram.co.il/ido/sendMail.php", my_vars, "POST");
        gotoAndStop(2);
    } else {
        error_clip.gotoAndPlay(2);
    }
    my_vars.onLoad = function() {
        gotoAndStop(3);
    };
};
email_box.onSetFocus = name_box.onSetFocus = message_box.onSetFocus = function () {
    if (error_clip._currentframe != 1) {
        error_clip.gotoAndPlay(6);
    }
};
Flash uses UTF-8 encoding for all strings anyway. If you use LoadVars, transfer as a URL-encoded string should also work automatically.
So your problem is most probably in the PHP part of your application. For example, in order for UTF-8 to work correctly, all individual PHP files must be saved in UTF-8 encoded format as well.
If just changing the file encoding doesn't work, try parsing $HTTP_RAW_POST_DATA first (on PHP 7 and later, read php://input instead, since $HTTP_RAW_POST_DATA was removed), check if all the fields have been transferred correctly, then go on and echo your way through until you find the place where the encoding is lost.
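A minimal sketch of that kind of inspection, assuming a URL-encoded body: the helper below (dump_post_bytes is a made-up name) decodes the raw body and returns each field as hex, so you can see whether the characters were already replaced by question marks (byte 3f) before PHP touched them.

```php
<?php
// Hypothetical debugging helper: decode a raw urlencoded body and return each
// field's bytes as hex, so lost characters (0x3f is "?") are easy to spot.
function dump_post_bytes(string $rawBody): array
{
    parse_str($rawBody, $fields);
    $hex = [];
    foreach ($fields as $name => $value) {
        // Nested fields (name[]=...) are skipped in this sketch.
        if (is_string($value)) {
            $hex[$name] = bin2hex($value);
        }
    }
    return $hex;
}

// In a real request the body would come from file_get_contents('php://input').
print_r(dump_post_bytes('name=%D7%A9%D7%9C&phone=%3F%3F'));
```

If a field that should hold non-English text shows only 3f bytes, the data was lost on the sending side and no amount of decoding in PHP will bring it back.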
Update:
Here is your problem: you use System.useCodepage = true;. This tells Flash to use the operating system's code page instead of Unicode, so you would have to encode all your data as Unicode yourself before sending it. Unless you have other documents in other encodings, and/or allow your users to upload their own text data in their localized encodings, set System.useCodepage = false;, and your UTF-8 problem should go away.
If you receive data from Flash, you need to use utf8_decode() and not utf8_encode().
Flash uses UTF-8, as long as you don't tell it to use the local character set. And you want PHP to decode that to good old ISO-8859-1, which older PHP configurations assume by default.
You'd only use utf8_encode() when preparing data for Flash.
