PHP & Android - UTF-8 encoding problems

We're sending a POST request from Android to PHP, and Android passes a UTF-8 string. Our database is currently cp1252 (it has a live iPhone backend, hence we have to work with it).
However, we can't seem to get the string converted to cp1252. We tried the following code without results:
$userkey = utf8_decode($userkey);
$userkey = iconv('UTF-8', 'cp1252', $userkey);
$userkey = mb_convert_encoding($userkey, 'cp1252', 'UTF-8');
Checking the result with
echo 'userkey: '.mb_detect_encoding($userkey);
always returns UTF-8.
Furthermore, if $userkey is sent with the value "no" and we do the following:
if($userkey == "no"){
echo "not registered";
}else{
echo "registered find db record";
}
The code always seems to drop into else - Any help would be great :)
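A byte-level check is usually more telling than mb_detect_encoding(), which only guesses from the bytes it sees and cannot reliably tell cp1252 from other single-byte encodings. The sketch below is a minimal illustration, not a confirmed fix: the helper name toCp1252() is hypothetical, and the idea that stray whitespace around the posted value makes the "no" comparison fail is an assumption, since the request itself isn't shown.

```php
<?php
// Hypothetical helper: convert a UTF-8 string to Windows-1252 (cp1252).
function toCp1252(string $utf8): string
{
    return mb_convert_encoding($utf8, 'Windows-1252', 'UTF-8');
}

// "é" is two bytes in UTF-8 (C3 A9) and a single byte in cp1252 (E9).
$utf8   = "\xC3\xA9";
$cp1252 = toCp1252($utf8);

echo bin2hex($utf8), PHP_EOL;   // bytes before conversion
echo bin2hex($cp1252), PHP_EOL; // bytes after conversion

// Assumption: the posted value may carry whitespace or a trailing newline.
$userkey = "no\n";
var_dump($userkey == "no");        // comparison on the raw value
var_dump(trim($userkey) === "no"); // comparison after trimming
```

Dumping bin2hex($userkey) on the server in the same way would show whether the posted value really is just the two bytes of "no".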

Related

Login string with accent from android application to PHP script

I'm developing an Android application.
The app has to connect to a server with a login and password. This launches a PHP script on the remote server, which then receives the login and password strings.
When I use a character such as "é" in the login (which is possible), the PHP script receives the character "�".
How can I get the correct character in my PHP script?
Thanks.

You can convert the character to UTF-8 using mb_convert_encoding() in PHP
// PHP
$enc = mb_convert_encoding("é", "UTF-8");
echo $enc;
Set charset=UTF-8 in your Android application HTTP client as well.
// JAVA
byte[] bytes = "é".getBytes("UTF-8");
String body = new String(bytes, Charset.forName("UTF-8"));
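If the PHP script receives bytes in a single-byte encoding, naming the source encoding explicitly avoids relying on PHP's internal default. A minimal sketch, assuming the incoming bytes are ISO-8859-1 (the original post doesn't say which encoding the client actually sends); ensureUtf8() is a hypothetical helper name:

```php
<?php
// Hypothetical helper: return $value as UTF-8, converting only when needed.
// Assumption: input that is not valid UTF-8 is ISO-8859-1.
function ensureUtf8(string $value, string $assumedSource = 'ISO-8859-1'): string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value; // already valid UTF-8, leave untouched
    }
    return mb_convert_encoding($value, 'UTF-8', $assumedSource);
}

$latin1 = "caf\xE9";          // "café" encoded as ISO-8859-1
$utf8   = ensureUtf8($latin1);
echo bin2hex($utf8), PHP_EOL; // inspect the resulting bytes
```

Checking validity first means a value that is already UTF-8 is not converted a second time.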

GSM modem AT commands UCS2 error 500

Three days ago I started making a simple app to send SMS. I already tested it and it works in GSM CSCS mode, but when I switch it to UCS2 it doesn't show the Cyrillic letters
<?php
error_reporting(E_ALL);
$fp = fopen("/dev/ttyUSB0", 'w+');
$msg = strtoupper(bin2hex(mb_convert_encoding("Тест", "UCS-2", "UTF-8")));
$number_raw = "+359000000000";
$number = bin2hex(mb_convert_encoding($number_raw, "UCS-2", "UTF-8"));
echo $number."<br>";
echo $msg;
$debug = false;
if(!$fp || $debug){
//echo "Can not open!";
}else{
fwrite($fp, "AT+CMGF=1".chr(13)); // OK
sleep(5);
fwrite($fp, 'AT+CSCS="UCS2"'.chr(13)); // OK
sleep(5);
fwrite($fp, 'AT+CMGS="'.$number.'"'.chr(13)); // OK
sleep(5);
fwrite($fp, $msg.chr(26)); // ERROR 500
echo "<br>Sent";
}
?>
Number and message are encoded properly according to this source: http://www.columbia.edu/kermit/ucs2.html
When the message is sent, I receive it (so the encoding of the number is correct) but the content isn't shown properly.
https://imgur.com/a/cCaTU
What are possible causes for this behaviour, and could it be my PHP file encoding? Also, why is Linux finding 3 GSM tty devices?

After you have removed all the horrible sleep calls and implemented proper final response parsing, you then need to fix the parsing so that it waits for the "\r\n> " response of AT+CMGS.
Until all of that is fixed, character set encoding trouble is a very minor problem. Once you run AT+CSCS="UCS2", every single string will be encoded that way by the modem and must be encoded that way by you until eternity (or until another character set is selected). So, for instance, switching from UCS2 to UTF-8 would be AT+CSCS="005500540046002D0038" and not AT+CSCS="UTF-8".
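The point about AT+CSCS can be illustrated with a small helper that produces the hexadecimal UCS2 form of a string, the same transformation the question applies to its message text. This is only a sketch: toUcs2Hex() is a hypothetical name, and big-endian byte order is assumed, which is what the example value above corresponds to.

```php
<?php
// Hypothetical helper: encode a UTF-8 string as upper-case hex UCS-2
// (big-endian), the form used for string parameters once
// AT+CSCS="UCS2" is active.
function toUcs2Hex(string $utf8): string
{
    return strtoupper(bin2hex(mb_convert_encoding($utf8, 'UCS-2BE', 'UTF-8')));
}

// Switching to another character set: the name itself must be UCS2-encoded.
$command = 'AT+CSCS="' . toUcs2Hex('UTF-8') . '"';
echo $command, PHP_EOL;
```

The same helper would apply to every other string parameter sent while UCS2 is selected, including the destination number in AT+CMGS.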

How to store and retrieve extended ASCII characters in MSSQL

I was surprised that I was unable to find a straightforward answer to this question by searching.
I have a web application in PHP that takes user input. Due to the nature of the application, users may often use extended ASCII characters (a.k.a. "ALT codes").
My specific issue at the moment is with ALT code 26, which is a right arrow (→). This will be accompanied with other text to be stored in the same field (for example, 'this→that').
My column type is NVARCHAR.
Here's what I've tried:
I've tried doing no conversions and just inserting the value as normal, but the value gets stored as thisâ??that.
I've tried converting the value to UCS-2 in PHP using iconv('UTF-8', 'UCS-2', $value), but I get an error saying Unclosed quotation mark after the character string 't'.. The query ends up looking like this: UPDATE myTable SET myColumn = 'this�!that'.
I've tried doing the above conversion and then adding an N before the quoted value, but I get the same error message. The query looks like this: UPDATE myTable SET myColumn = N'this�!that'.
I've tried removing the UCS-2 conversion and just adding the N before the quoted value, and the query works again, but the value is stored as thisâ that.
I've tried using utf8_decode($value) in PHP, but then the arrow is just replaced with a question mark.
So can anyone answer the (seemingly simple) question of, how can I store this value in my database and then retrieve it as it was originally typed?
I'm using PHP 5.5 and MSSQL 2012. If any question of driver/OS version comes into play, it's a Linux server connecting via FreeTDS. There is no possibility of changing this.

You might try base64 encoding the input. This is fairly trivial to handle with PHP's base64_encode() and base64_decode(), and it should handle whatever your users throw at it.
(edit: You can apparently also do the base64 encoding on the SQL Server side. This doesn't seem like something it should be responsible for imho, but it's an option.)

It seems like your freetds.conf is wrong. You need a TDS protocol version >= 7.0 to support Unicode. See this for more details.
Edit your freetds.conf:
[global]
# TDS protocol version
tds version = 7.4
client charset = UTF-8
Also make sure to configure PHP correctly:
ini_set('mssql.charset', 'UTF-8');

The accepted answer seems to do the job: yes, you can encode the value to base64 and decode it back again, but then every application that uses that remote database has to change to support base64-encoded fields. My thought is that if there is a remote MS SQL Server database, another application (or several) may be using it, so those applications would also have to be changed, and you would have to handle both plain text and base64-encoded text.
I searched a little and found how to send Unicode text to MS SQL Server using MS SQL commands, with PHP converting the Unicode bytes to hex numbers.
If you go to the PHP documentation for mssql_fetch_array (http://php.net/manual/ru/function.mssql-fetch-array.php#80076), you'll see in the comments a pretty good solution that converts the text to Unicode hex values and then sends that hex data directly to MS SQL Server like this:
Convert Unicode Text to HEX Data
// sending data to database
$utf8 = 'Δοκιμή με unicode → Test with Unicode'; // some Greek text for example
$ucs2 = iconv('UTF-8', 'UCS-2LE', $utf8);
// converting UCS-2 string into "binary" hexadecimal form
$arr = unpack('H*hex', $ucs2);
$hex = "0x{$arr['hex']}";
// IMPORTANT!
// please note that value must be passed without apostrophes
// it should be "... values(0x0123456789ABCEF) ...", not "... values('0x0123456789ABCEF') ..."
mssql_query("INSERT INTO mytable (myfield) VALUES ({$hex})", $link);
Now all the text actually is stored to the NVARCHAR database field correctly as UNICODE, and that's all you have to do in order to send and store it as plain text and not encoded.
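The conversion can be wrapped in a pair of helpers and checked without a database connection. A minimal sketch, assuming iconv supports UCS-2LE on the system; the function names are hypothetical and not part of any MSSQL driver:

```php
<?php
// Hypothetical helper: UTF-8 text -> 0x... literal holding UCS-2LE bytes,
// meant to be placed unquoted into the INSERT statement.
function toMssqlHexLiteral(string $utf8): string
{
    $ucs2 = iconv('UTF-8', 'UCS-2LE', $utf8);
    if ($ucs2 === false) {
        throw new InvalidArgumentException('Input is not convertible to UCS-2LE.');
    }
    return '0x' . bin2hex($ucs2);
}

// Hypothetical helper: UCS-2LE bytes (as read back via VARBINARY) -> UTF-8.
function fromUcs2le(string $bytes): string
{
    return iconv('UCS-2LE', 'UTF-8', $bytes);
}

$text    = "this\xE2\x86\x92that"; // "this→that" as UTF-8 bytes
$literal = toMssqlHexLiteral($text);
echo $literal, PHP_EOL;

// Round trip: strip the 0x prefix, turn the hex back into bytes, decode.
$roundTrip = fromUcs2le(hex2bin(substr($literal, 2)));
var_dump($roundTrip === $text);
```

Each character occupies two bytes in this form, so a nine-character value such as the one above yields 18 bytes (36 hex digits).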
To retrieve that text, you need to ask MS SQL Server to send back UNICODE encoded text like this:
Retrieving Unicode Text from MS SQL Server
// retrieving data from database
// IMPORTANT!
// please note that "varbinary" expects number of bytes
// in this example it must be 200 (bytes), while size of field is 100 (UCS-2 chars)
// myfield is of 50 length, so I set VARBINARY to 100
$result = mssql_query("SELECT CONVERT(VARBINARY(100), myfield) AS myfield FROM mytable", $link);
while (($row = mssql_fetch_array($result, MSSQL_BOTH)))
{
// we get data in UCS-2
// I use UTF-8 in my project, so I encode it back
echo '1. '.iconv('UCS-2LE', 'UTF-8', $row['myfield']).PHP_EOL;
// or you can even use mb_convert_encoding to convert from UCS-2LE to UTF-8
echo '2. '.mb_convert_encoding($row['myfield'], 'UTF-8', 'UCS-2LE').PHP_EOL;
}
[Screenshot: the MS SQL table with the Unicode data after the INSERT]
[Screenshot: the output of a PHP page displaying the values]
I'm not sure if you can reach my test page here, but you can try to see the live results:
http://dbg.deve.wiznet.gr/php56/mssql/test1.php

Base64Decode to file - what's missing?

I have a base64-encoded string in the $_POST field $_POST['nimage']. If I echo it directly as the src value of an img tag, I see the image just fine in the browser: echo "<img src='".$_POST['nimage']."'>";
Now, I'm obviously missing a step, because when I base64_decode the string and write it to a file locally on the server, an attempt to view the created file in the browser gives this error:
"The image 'xxxx://myserversomewhere.com/images/img1.jpg' cannot be displayed because it contains errors"
My decode and file put are:
$file = base64_decode($_POST['nimage']);
file_put_contents('images/'. $_POST['imgname'], $file);
which results in images/img1.jpg on the local server. What am I doing wrong in the decode here? Although the base64 output doesn't appear to be URLencoded I have tried urldecode() on it first before base64_decode() just for safe measure with same results.
The first few lines of the base64-encoded data are:
data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAMCAgICAgMCAgIDAwMDBAYEBAQEBAgGBgUGCQgKCgkICQkKDA8MCgsOCwkJDRENDg8QEBEQCgwSExIQEw8QEBD/2wBDAQMDAwQDBAgEBAgQCwkLEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBD/wAARCAF4AqsDAREAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwD2gJt+XPJPUGv2A/NB2044oAdtY9M8ccCgB6r8+0jtSYDxEW4xz2qQFCnGOPQ0AAQDJIz9KAF8rI6/hQA9Y+SBgjHIqWA5Yxz2xUsBwUdAMdzSAcFGAB0NADgCVK/KB/OgB6BNzc49agse2OgX2BFZvcCRUO7g

The data you're decoding has a data URI header attached:
data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD...
The header is used by the browser to identify the file type and encoding, but it isn't part of the encoded data.
Strip the header (data:image/jpeg;base64,) from the data and base64 decode the rest before writing it to a file: you should be good to go.
$b64 = 'data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD...';
$dat = explode(',', $b64);
// element 1 of array from explode() contains B64-encoded data
if (($fileData = base64_decode($dat[1])) === false) {
exit('Base64 decoding error.');
}
file_put_contents($someFileName, $fileData);
NB: Check the return value of your call to base64_decode() for false and abort somehow with a message. It will trap any problems with the decoding process (like not removing the header!).
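A slightly more defensive variant validates the header instead of assuming the comma is present. This is a sketch, with decodeDataUri() as a hypothetical helper name; it uses strict mode so that invalid characters are rejected rather than silently skipped:

```php
<?php
// Hypothetical helper: decode a base64 data URI and return the raw bytes,
// or null when the header is missing or the payload is not valid base64.
function decodeDataUri(string $dataUri): ?string
{
    // Expect "data:<mime type>;base64,<payload>".
    if (!preg_match('#^data:[\w.+-]+/[\w.+-]+;base64,(.*)$#s', $dataUri, $m)) {
        return null;
    }
    $bytes = base64_decode($m[1], true); // strict: false on invalid characters
    return $bytes === false ? null : $bytes;
}

$bytes = decodeDataUri('data:text/plain;base64,aGVsbG8=');
var_dump($bytes);                    // decoded payload
var_dump(decodeDataUri('aGVsbG8=')); // no data URI header present
```

The caller can then pass the returned bytes to file_put_contents() only when the result is not null.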

utf8_encode or decode isn't doing what I expect

I am taking an XML file and reading it into various strings before writing them to a database. However, I am having difficulty with German characters.
The XML file starts off
<?xml version="1.0" encoding="UTF-8"?>
Then an example of where I am having problems is this part
<name><![CDATA[PONS Großwörterbuch Deutsch als Fremdsprache Android]]></name>
My PHP has this relevant section
$dom = new DOMDocument();
$domNode = $xmlReader->expand();
$element = $dom->appendChild($domNode);
$domString = utf8_encode($dom->saveXML($element));
$product = new SimpleXMLElement($domString);
//read in data
$arr = $product->attributes();
$link_ident = $arr["id"];
$link_id = $platform . "" . $link_ident;
$link_name = $product->name;
So $link_name becomes PONS GroÃwÃ¶rterbuch Deutsch als Fremdsprache Android
I then did a
$link_name = utf8_decode($link_name);
Which when I echoed back in terminal worked fine
PONS GroÃwÃ¶rterbuch Deutsch als Fremdsprache Android as is now
PONS Großwörterbuch Deutsch als Fremdsprache Android after utf8decode
However when it is written into my database it appears as:
PONS KompaktwÃ¶rterbuch Deutsch-Englisch (Android)
The collation for link_name in MySQL is utf8_general_ci
How should I be doing this to get it correctly written into my database?
This is the code I use to write to the database
$link_name = utf8_decode($link_name);
$link_id = mysql_real_escape_string($link_id);
$link_name = mysql_real_escape_string($link_name);
$description = mysql_real_escape_string($description);
$metadesc = mysql_real_escape_string($metadesc);
$link_created = mysql_real_escape_string($link_created);
$link_modified = mysql_real_escape_string($link_modified);
$website = mysql_real_escape_string($website);
$cost = mysql_real_escape_string($cost);
$image_name = mysql_real_escape_string($image_name);
$query = "REPLACE into jos_mt_links
(link_id, link_name, alias, link_desc, user_id, link_published,link_approved, metadesc, link_created, link_modified, website, price)
VALUES ('$link_id','$link_name','$link_name','$description','63','1','1','$metadesc','$link_created','$link_modified','$website','$cost')";
echo $link_name . " has been inserted ";
and when I run it from shell I see
PONS Kompaktwörterbuch Deutsch-Englisch (Android) has been inserted

You've got a UTF-8 string from an XML file, and you're putting it into a UTF-8 database. So there is no encoding or decoding to be done; just shove the original string into the database. Make sure you've called mysql_set_charset('utf8') first to tell the database that UTF-8 strings are coming.
utf8_decode and utf8_encode are misleadingly named. They are only for converting between UTF-8 and ISO-8859-1 encodings. Calling utf8_decode, which converts to ISO-8859-1, will naturally lose any characters you have that don't fit in that encoding. You should generally avoid these functions unless there's a specific place where you need to be using 8859-1.
You should not consider what the terminal shows when you echo a string to be definitive. The terminal has its own encoding problems and especially under Windows it is likely to be impossible to output every character properly. On a Western Windows install the system code page (which the terminal will use to turn the bytes PHP spits out into characters to display on-screen) will be code page 1252, which is similar to but not the same as ISO-8859-1. This is why utf8_decode, which spits out ISO-8859-1, appeared to make the text appear as you expected. But that's of little use. Internally you should be using UTF-8 for all strings.
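The effect of calling utf8_encode() on data that is already UTF-8, as the question's code does, can be reproduced in isolation. The sketch below uses mb_convert_encoding() with ISO-8859-1 as the source, which performs the same conversion as utf8_encode(); the variable names are illustrative only:

```php
<?php
$original = "w\xC3\xB6rter"; // "wörter", already valid UTF-8

// Treating UTF-8 bytes as ISO-8859-1 and encoding again doubles each
// multi-byte character: "ö" (C3 B6) becomes "Ã¶" (C3 83 C2 B6).
$doubleEncoded = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');
echo bin2hex($doubleEncoded), PHP_EOL;

// Reversing that step restores the original bytes, which is consistent
// with the first utf8_decode() call in the question appearing to help.
$restored = mb_convert_encoding($doubleEncoded, 'ISO-8859-1', 'UTF-8');
var_dump($restored === $original);
```

Dropping both the encode and the decode step leaves the bytes untouched, which matches the advice to pass the original string straight through.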

You must use the mb_convert_encoding or iconv function before you write into your database.
