iconv(): Detected an illegal character in input string - php

I am using the iconv function to convert a string to the requested character encoding. See the code below:
$sms_text = 'A:'f3*'F'; // Output received from SMPP
$result = iconv('UTF-16BE' ,'UTF-8//IGNORE' , $sms_text);
echo 'Ignore: ' .$result;
echo $sms_text = iconv('UTF-16BE' ,'UTF-8' , $sms_text);
$result1 = iconv('UTF-16BE' ,'UTF-8//TRANSLIT' , $sms_text); //line no (53)
echo 'Transilt: '.$result1;
And I received the output below.
If the string is in Dari or Pashto, only the first word is shown; the rest of the string after the blank space is not returned. Even //IGNORE gives the same output.
Should I replace the blank spaces with some other character so that I get the complete string?
Note: I am passing the string received from SMPP (receiver).
SMS Sent to SMPP : افغانستان کابل
Output received from SMPP : 'A:'f3*'F
String back converted by iconv : افغانستان
English strings work fine.
Thanks in advance.

You are converting the string to UTF-8 on this line:
echo $sms_text = iconv('UTF-16BE' ,'UTF-8' , $sms_text);
The error appears because the next iconv() call then tries to convert the already-converted string from UTF-16BE a second time.
To resolve this, do not overwrite the $sms_text variable on that line.
Here is code, with its output, that works for me:
$sms_text ="فغانستان کابل";
echo "UTF-8 : $sms_text \n";
$sms_text = iconv('UTF-8', 'UTF-16BE', $sms_text);
echo "UTF-16BE : $sms_text \n";
echo 'Ignore: ' . iconv('UTF-16BE' ,'UTF-8//IGNORE' , $sms_text) . "\n";
echo 'Simple: ' . iconv('UTF-16BE' ,'UTF-8' , $sms_text) . "\n";
echo 'Transilt: '. iconv('UTF-16BE' ,'UTF-8//TRANSLIT' , $sms_text) . "\n";
Output:
UTF-8 : فغانستان کابل
UTF-16BE : A:'F3*'F
Ignore: فغانستان کابل
Simple: فغانستان کابل
Transilt: فغانستان کابل
As for blank spaces, could you please share the test string?
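The single-conversion advice above can be sketched as follows. This is a minimal sketch rather than the poster's code: `decodeUtf16Be` is a hypothetical helper name, and the payload is simulated by encoding a UTF-8 string, since the raw SMPP bytes are not available here.

```php
<?php
// Hypothetical helper: decode a UTF-16BE payload exactly once and
// leave the raw bytes untouched so they are never converted twice.
function decodeUtf16Be(string $raw): string
{
    // UTF-16 uses 2 bytes per code unit, so an odd length means truncated input.
    if (strlen($raw) % 2 !== 0) {
        throw new InvalidArgumentException('UTF-16BE payload has an odd byte length');
    }
    $utf8 = iconv('UTF-16BE', 'UTF-8', $raw);
    if ($utf8 === false) {
        throw new RuntimeException('iconv could not decode the payload');
    }
    return $utf8;
}

// Stand-in for the bytes received from SMPP (assumption: they are UTF-16BE).
$original = 'افغانستان کابل';
$raw = iconv('UTF-8', 'UTF-16BE', $original);

$text = decodeUtf16Be($raw);   // convert once, into a new variable
echo bin2hex($raw), "\n";      // inspect the raw bytes, including the space (0020)
echo $text, "\n";
```

Dumping the payload with bin2hex() is also a quick way to check whether the bytes after the blank space actually arrived from the receiver.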

Related

Php removing unicode arabic characters

I have a text such as " مُشْكِلَةٌ " in the database. How can I get " مشكلة " from this text in PHP (str_replace etc.)? I tried str_replace, but it didn't work.
Code
$string = 'مُشْكِلَةٌ';
$diacritic = array('ِ', 'ُ', 'ٓ', 'ٰ', 'ْ', 'ٌ', 'ٍ', 'ً', 'ّ', 'َ');
$newString = str_replace($diacritic, '', $string);
echo "Old String : ".$string;
echo "New String : ".$newString;
Output
Old String : مُشْكِلَةٌ
New String : مشكلة
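A regular expression over Unicode ranges is an alternative to listing each mark by hand. The sketch below assumes the marks to strip are U+064B to U+0653 plus the superscript alef U+0670, which covers the ten characters in the array above; `stripArabicDiacritics` is a hypothetical name.

```php
<?php
// Hypothetical helper: remove Arabic diacritics (harakat) from UTF-8 text.
function stripArabicDiacritics(string $text): string
{
    // The /u modifier makes PCRE treat both pattern and subject as UTF-8.
    $clean = preg_replace('/[\x{064B}-\x{0653}\x{0670}]/u', '', $text);
    // preg_replace() returns null on error (for example, invalid UTF-8 input);
    // fall back to the unchanged text in that case.
    return $clean ?? $text;
}

$string = 'مُشْكِلَةٌ';
$clean = stripArabicDiacritics($string);
echo "Old String : $string\n";
echo "New String : $clean\n";
```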

PHP UTF-8 mb_convert_encode and Internet-Explorer

I have been reading about character encoding for a few days, and I want to serve all my pages as UTF-8 for compatibility. But I am stuck converting user input to UTF-8: it works in all browsers except Internet Explorer (as always).
I don't know what is wrong with my code; it seems fine to me.
I set the header with the character encoding.
I saved the file as UTF-8 (no BOM).
This happens only when the page is accessed with a $_GET parameter in Internet Explorer: myscript.php?c=äüöß
When I write special characters directly into the page, they are displayed correctly.
This is my Code:
// User Input
$_GET['c'] = "äüöß"; // Access URL ?c=äüöß
//--------
header("Content-Type: text/html; charset=utf-8");
mb_internal_encoding('UTF-8');
$_GET = userToUtf8($_GET);

function userToUtf8($string) {
    if (is_array($string)) {
        $tmp = array();
        foreach ($string as $key => $value) {
            $tmp[$key] = userToUtf8($value);
        }
        return $tmp;
    }
    return userDataUtf8($string);
}

function userDataUtf8($string) {
    print("1: " . mb_detect_encoding($string) . "<br>"); // Shows: 1: UTF-8
    $string = mb_convert_encoding($string, 'UTF-8', mb_detect_encoding($string)); // Convert non UTF-8 String to UTF-8
    print("2: " . mb_detect_encoding($string) . "<br>"); // Shows: 2: ASCII
    $string = preg_replace('/[\xF0-\xF7].../s', '', $string);
    print("3: " . mb_detect_encoding($string) . "<br>"); // Shows: 3: ASCII
    return $string;
}

echo $_GET['c']; // Shows nothing
echo mb_detect_encoding($_GET['c']); // ASCII
echo "äöü+#"; // Shows "äöü+#"
The most confusing part is that it reports the string was converted from UTF-8 to ASCII. Can someone tell me why the special characters are not shown correctly and what is wrong here? Or is this a bug in Internet Explorer?
Edit:
If I disable the conversion, it says everything is UTF-8, but the characters are still not shown; they are displayed like "????".
Note: This happens ONLY in Internet Explorer!
I prefer URL-encoded strings in the address bar, but in your case you can try encoding $_GET['c'] to UTF-8 (note that utf8_encode assumes the input is ISO-8859-1), e.g.:
$_GET['c'] = utf8_encode($_GET['c']);
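A slightly more defensive variant of the same idea is to convert only when the value is not already valid UTF-8. This is a sketch under assumptions: `toUtf8` is a hypothetical helper, and Windows-1252 is only a guess at what the browser sends for an unencoded query string, so adjust the fallback to your setup.

```php
<?php
// Hypothetical helper: normalise one request value to UTF-8.
function toUtf8(string $value, string $fallback = 'Windows-1252'): string
{
    // Valid UTF-8 (including plain ASCII) is returned unchanged,
    // which avoids double-encoding values that were already correct.
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }
    // Otherwise assume the fallback single-byte encoding.
    return mb_convert_encoding($value, 'UTF-8', $fallback);
}

$legacy = "\xE4\xFC\xF6\xDF";               // "äüöß" as single-byte characters
$modern = "\u{E4}\u{FC}\u{F6}\u{DF}";       // "äüöß" as UTF-8

echo toUtf8($legacy), "\n";
echo toUtf8($modern), "\n";
```

Both calls print the same UTF-8 text, so the helper can be applied to every value without tracking where it came from.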
An approach that worked for displaying the characters in IE 11.0.18:
Retrieve the Unicode code point of your character, for example 'ü' = 'U+00FC'.
According to this post, convert it to an HTML entity and decode that entity to UTF-8.
Decode it using utf8_decode before dumping.
The line of code illustrating this for the 'ü' character is:
var_dump(utf8_decode(html_entity_decode(preg_replace("/U\+([0-9A-F]{4})/", "&#x\\1;", 'U+00FC'), ENT_NOQUOTES, 'UTF-8')));
To summarize: for display purposes, go from the Unicode code point to UTF-8, then decode it before displaying it.
Other resources:
a post to retrieve characters' unicode

UTF (Chinese char) convert to hexadecimal format in PHP

I am passing my message to an SMS API.
This is what the documentation says:
Normally Unicode Messages are Arabic and Chinese Message, which are
defined by GSM Standards. Unicode messages are nothing but normal text
type messages but it has to be submitted in HEX form. To submit
Unicode messages following Url to be used.
I tried bin2hex(), but it does not give the expected output.
$str = '人';
//$str = 'a';
$output = bin2hex($str);
echo $output;
//output
//人 = e4baba ; I would expect '4EBA'
I found a similar solution, but it is in VB.NET. Can anyone convert it?
http://www.supportchain.com/index.php?/Knowledgebase/Article/View/28/7/unable-to-send-sms-with-chinese-character-using-api
The sample I tried there works:
Example of conversion: a converted to hexadecimal is 0061, 人 converted to hexadecimal is 4EBA
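The issue you are facing has to do with encoding. bin2hex() dumps the raw bytes of the string, and in a UTF-8 source file 人 is stored as the three bytes e4 ba ba. The API expects the UTF-16 code units instead, so convert the encoding before converting to hex.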
Each of these outputs exactly what you were looking for when I run them:
echo bin2hex(iconv('UTF-8', 'ISO-10646-UCS-2', '人')) . PHP_EOL;
//Outputs 4eba
echo bin2hex(iconv('UTF-8', 'UNICODE-1-1', '人')) . PHP_EOL;
//Outputs 4eba
echo bin2hex(iconv('UTF-8', 'UTF-16BE', '人')) . PHP_EOL;
//Outputs 4eba
Pick whichever one you fancy.
If you want to convert back:
echo iconv('UTF-16BE', 'UTF-8', hex2bin('4eba')) . PHP_EOL;
//outputs 人
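Wrapped into a function, the conversion might look like the sketch below. `toUnicodeHex` is a hypothetical name, and the uppercase output is an assumption based on the "4EBA" example in the question; the API documentation quoted above does not say whether case matters.

```php
<?php
// Hypothetical helper: UTF-8 text -> uppercase hex of its UTF-16BE code units.
function toUnicodeHex(string $utf8): string
{
    $utf16 = iconv('UTF-8', 'UTF-16BE', $utf8);
    if ($utf16 === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return strtoupper(bin2hex($utf16));
}

echo toUnicodeHex('a'), "\n";          // 0061
echo toUnicodeHex("\u{4EBA}"), "\n";   // 4EBA (the character 人)
```

Characters outside the Basic Multilingual Plane come out as two code units (a surrogate pair), so their hex form is eight digits rather than four.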

PHP substr giving weird characters from xml feed

For some reason, when using substr I get a weird character in the output, like this: "�".
All I am doing is this:
$price = substr($game->price, 1);
$dollars_original = substr($game->fullPrice, 1);
echo "Price: $" . $price . "<br />\n";
echo "Original Price: $" . $dollars_original . "<br />\n";
This comes from an XML feed here: http://itch.io/browse/platform-linux/price-sale.xml which I parse like so:
$url = 'http://itch.io/browse/platform-linux/price-sale.xml';
$xml = simplexml_load_string(file_get_contents($url));
So, for example, a price might be £0.89, but after removing the £ sign it comes out as �0.89
What am I missing here?
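It looks like you have a mix-up in character encodings: which of your data is Latin-1 or Windows-1252, and which is UTF-8? substr() counts bytes, and in UTF-8 the £ sign takes two bytes (C2 A3), so substr($price, 1) removes only the first byte and leaves a stray one behind. If you are working with UTF-8 data, use mb_substr(), which is multibyte-aware.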
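The difference between the two functions can be seen in a short sketch. It assumes the feed value is UTF-8, and the price is written with an escape sequence here so the example does not depend on how the script file itself is saved.

```php
<?php
$price = "\u{A3}0.89"; // "£0.89" in UTF-8; the pound sign is the two bytes C2 A3

// substr() works on bytes: it drops C2 and leaves the orphaned byte A3.
$byBytes = substr($price, 1);

// mb_substr() works on characters when told the encoding.
$byChars = mb_substr($price, 1, null, 'UTF-8');

echo bin2hex($byBytes), "\n"; // a3302e3839
echo $byChars, "\n";          // 0.89
```

When parsing with SimpleXML, cast the element to a string first, for example mb_substr((string) $game->price, 1, null, 'UTF-8').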

Accents become question marks in PHP when parsing HTML

I'm getting a PT-BR text automatically by downloading an HTML page, and the accented characters become question marks when I use utf8_decode. This is my function:
function pegaMsg($string)
{
    $bot_url = "http://website.com";
    //&rnd=&msg="
    $rand_msg = rand(0,100);
    $url = $bot_url . $rand_msg . "&msg=" . $string;
    $url = str_replace(" ", "%20", $url);
    //echo "\n" . $url;
    $download = http_get($url, $referer="");
    $download['FILE'] = utf8_decode($download['FILE']);
    $download['FILE'] = str_replace("var resp = ", "", $download['FILE']);
    $download['FILE'] = str_replace("\\r\\n", "", $download['FILE']);
    $download['FILE'] = str_replace(";", "", $download['FILE']);
    $download['FILE'] = str_replace("\'", "", $download['FILE']);
    $download['FILE'] = trim($download['FILE']);
    return $download['FILE'];
}
This is the expected output:
VOCÊ TINHA DUAS ESCOLHAS:
and this is what I get:
'VOC? TINHA DUAS ESCOLHAS:
What can I do? I want the ^ displayed! Thanks, and sorry for the bad English.
utf8_decode replaces invalid code unit sequences with ?. The reason you're getting a ? is likely that the text you're passing to utf8_decode was not in UTF-8 to begin with.
In fact, it may already have been in ISO-8859-1, which is the encoding of the string returned by utf8_decode. In that case, the solution is simply to omit the call to utf8_decode.
If the original text was in neither UTF-8 nor ISO-8859-1 (which I assume is what you want, since you're calling utf8_decode), you have to use iconv or mb_convert_encoding.
A final possibility is that whatever interprets the script output assumes an encoding different from the actual one, and it too converts invalid code unit sequences to ?.
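One way to tell the first two cases apart is to check whether the downloaded text is valid UTF-8 before decoding it. The sketch below is an illustration only: `toLatin1` is a hypothetical helper, it uses mb_convert_encoding in place of utf8_decode, and it assumes that anything that is not valid UTF-8 is already ISO-8859-1.

```php
<?php
// Hypothetical helper: return ISO-8859-1 text whether the input is
// UTF-8 or (by assumption) already ISO-8859-1.
function toLatin1(string $text): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        // Genuine UTF-8: convert it down to ISO-8859-1.
        return mb_convert_encoding($text, 'ISO-8859-1', 'UTF-8');
    }
    // Not valid UTF-8: leave it alone instead of turning bytes into "?".
    return $text;
}

$fromUtf8   = toLatin1("VOC\u{CA}"); // "VOCÊ" encoded as UTF-8
$fromLatin1 = toLatin1("VOC\xCA");   // "VOCÊ" already in ISO-8859-1

echo bin2hex($fromUtf8), "\n";   // 564f43ca
echo bin2hex($fromLatin1), "\n"; // 564f43ca
```

Pure ASCII text passes the UTF-8 check and converts to itself, so the helper is safe for unaccented input as well.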
Try using utf8_encode instead:
$download['FILE'] = utf8_encode($download['FILE']);
