I am using Delphi 7 and the ICS components to communicate with a PHP script and insert some data into a MySQL database...
How do I post Unicode data using HTTP POST?
After encoding with Utf8Encode (used with the TNT controls), this is how I post to the PHP script:
<?php
echo "Note = " . $_POST['note'];
if ($_POST['action'] == 'i') {
    /*
     * This code will add new notes to the database
     */
    $sql = "INSERT INTO app_notes VALUES ('', '" . mysql_real_escape_string($_POST['username']) . "', '" . mysql_real_escape_string($_POST['note']) . "', NOW(), '')";
    $result = mysql_query($sql, $link) or die('0 - Ins');
    echo '1 - ' . mysql_insert_id($link);
}
?>
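One extra point worth checking on the PHP side (my suggestion, not part of the original script): the connection charset. With the old mysql_* API shown above, a minimal sketch would be:

// Assumed setup: make the MySQL connection itself talk UTF-8 before any
// escaping or INSERT, otherwise mysql_real_escape_string() and the stored
// bytes can be interpreted in the server's default charset (often latin1).
$link = mysql_connect('localhost', 'user', 'password'); // hypothetical credentials
mysql_select_db('appdb', $link);                        // hypothetical database name
mysql_set_charset('utf8', $link); // available since PHP 5.2.3 with MySQL >= 5.0.7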
Delphi code:
data := Format('date=%s&username=%s&password=%s&hash=%s&note=%s&action=%s',
  [UrlEncode(FormatDateTime('yyyymmddhh:nn', now)),
   UrlEncode(edtUserName.Text),
   UrlEncode(getMd51(edtPassword.Text)),
   UrlEncode(getMd51(dataHash)),
   UrlEncode(Utf8Encode(memoNote.Text)),
   'i']);
// try function StrHtmlEncode (const AStr: String): String; from IdStrings
HttpCli1.SendStream := TMemoryStream.Create;
HttpCli1.SendStream.Write(Data[1], Length(Data));
HttpCli1.SendStream.Seek(0, 0);
HttpCli1.RcvdStream := TMemoryStream.Create;
HttpCli1.URL := Trim(ActionURLEdit.Text);
HttpCli1.PostAsync;
But when I post it, the Unicode value is totally different from the original one that I see in the TNT memo.
Is there something I am missing?
Also, does anybody know how to do this with Indy?
Thanks.
Your example code shows your data coming from a TNT Unicode control. That value will have type WideString, so to get UTF-8 data, you should call Utf8Encode, which will return an AnsiString value. Then call UrlEncode on that value. Make sure UrlEncode's input type is AnsiString. So, something like this:
var
  data, date, username, passhash, datahash, note: AnsiString;
begin
  date := FormatDateTime('yyyymmddhh:nn', now);
  username := Utf8Encode(edtUserName.Text);
  passhash := getMd51(edtPassword.Text);
  datahash := getMd51(data);
  note := Utf8Encode(memoNote.Text);
  data := Format('date=%s&username=%s&password=%s&hash=%s&note=%s&action=%s',
    [UrlEncode(date),
     UrlEncode(username),
     UrlEncode(passhash),
     UrlEncode(datahash),
     UrlEncode(note),
     'i']);
end;
There should be no need to UTF-8-encode the MD5 values since MD5 string values are just hexadecimal characters. However, you should double-check that your getMd51 function accepts WideString. Otherwise, you may be losing data before you ever send it anywhere.
Next, you have the issue of receiving UTF-8 data in PHP. I expect there's nothing special you need to do there or in MySQL. Whatever you store, you should get back identically later. Send that back to your Delphi program, and decode the UTF-8 data back into a WideString.
In other words, your Unicode data will look different in your database because you're storing it as UTF-8. In your database, you're seeing UTF-8-encoded data, but in your TNT controls, you're seeing the regular Unicode characters.
So, for instance, if you type the character "ش" into your edit box, that's Unicode character U+0634, Arabic letter sheen. As UTF-8, that's the two-byte sequence 0xD8 0xB4. If you store those bytes in your database, and then view the raw contents of the field, you may see characters interpreted as though those bytes are in some ANSI encoding. One possible interpretation of those bytes is as the two-character sequence "Ø´", which is the Latin capital letter o with stroke followed by an acute accent.
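You can see this byte-level double view in a few lines of PHP (my illustration, not from the original answer):

$bytes = "\xD8\xB4";                                      // UTF-8 encoding of U+0634 (ش)
echo bin2hex($bytes), PHP_EOL;                            // d8b4
echo mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');  // Ø´ : same bytes read as Latin-1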
When you load that string back out of your database, it's still encoded as UTF-8, just as it was when you stored it, so you will need to decode it. As far as I can tell, neither PHP nor MySQL does any massaging of your data, so whatever UTF-8 character you give them will be returned to you as-is. If you are using the data in Delphi, then call Utf8Decode, which is the complement to the Utf8Encode function that you called previously. If you are using the data in PHP, then you might be interested in PHP's utf8_decode function, although that converts to ISO-8859-1, which doesn't include our example Arabic character. Stack Overflow already has a few questions related to using UTF-8 in PHP, so I won't attempt to add to them here. For example:
- Best practices in PHP and MySQL with international strings
- UTF-8 all the way through…
Encode the UTF-8 data as application/x-www-form-urlencoded. This ensures that the server can read the data over the HTTP connection.
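For comparison, a one-line PHP sketch (my addition) showing what a correct application/x-www-form-urlencoded encoder emits for the sheen example above: the percent-encoded UTF-8 bytes.

echo rawurlencode("\xD8\xB4"); // %D8%B4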
I would expect (without knowing for sure) that you'd have to output them as &#nnnnn entities (with the number in decimal rather than hex ... I think)
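If you do go the entity route in PHP, mb_encode_numericentity can emit decimal &#nnnnn; entities; a minimal sketch, where the conversion map (all code points above ASCII) is my assumption:

$map = array(0x80, 0x10FFFF, 0, 0x10FFFF);                // begin, end, offset, mask
echo mb_encode_numericentity("\xD8\xB4", $map, 'UTF-8');  // &#1588;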
Related
In PHP, json_encode() will encode UTF-8 as hex entities, e.g.
json_encode('中'); // becomes "\u4e2d"
Assume the string "\u4e2d" is now stored in MySQL. Is it possible to convert "\u4e2d" back to 中 without using PHP, in plain MySQL?
On my configuration, select hex('中'); returns E4B8AD, which is the hex of the UTF-8 bytes. Naturally it is not the same as the hex of the code point 4E2D, but you can get that with select hex(cast('中' as char(1) character set utf16));.
Update: The questioner has edited the question into what looks to me like a completely different question. Apparently it is now: how to get '中' given a string containing '\u4e2d', where 4E2D is the code point of 中 and the default character set is utf8. Okay, that is:
select cast(char(conv(right('\u4e2d', 4), 16, 10) using utf16) as char(1) character set utf8);
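For comparison, the same code-point-to-character conversion in PHP (my addition; mb_chr needs PHP 7.2+):

$cp = hexdec(substr('\u4e2d', 2)); // 0x4E2D = 20013
echo mb_chr($cp, 'UTF-8');         // 中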
Encoding non-ASCII characters as JavaScript entities is only one of the different things that JSON encoders will do—and it isn't actually mandatory:
echo json_encode('中'), PHP_EOL;
echo json_encode('中', JSON_UNESCAPED_UNICODE), PHP_EOL;
echo json_encode('One "Two" Three \中'), PHP_EOL;
"\u4e2d"
"中"
"One \"Two\" Three \\\u4e2d"
Thus the only safe decoding approach is using a dedicated JSON decoder. MySQL bundles the required abilities since 5.7.8:
SET @input = '"One \\"Two\\" Three \\\\\\u4e2d"';
SELECT @input AS json_string, JSON_UNQUOTE(@input) AS original_string;
json_string                   original_string
============================  ====================
"One \"Two\" Three \\\u4e2d"  One "Two" Three \中
If you have an older version, you'll have to resort to more elaborate solutions (you can Google for third-party UDFs).
In any case, I suggest you go back to the design table. It's strange that you need JSON data in a context where you don't have a proper JSON decoder available.
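For completeness, if the data can pass through PHP after all, its native decoder already covers every case above (my sketch; note the extra escaping PHP's single quotes need):

$stored = '"One \"Two\" Three \\\\\u4e2d"'; // the same JSON string as in the demo
echo json_decode($stored);                  // One "Two" Three \中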
I am building an app with Apache Cordova for my company's support team, and everything was OK while I was using a test database encoded in UTF-8.
Then, when I implemented the real DB, I noticed it was encoded with Windows-1252.
The problem is that even though the DB is Windows-1252, many rows use special characters like "ç", "~", "´" and "`", and because of that, when I run the PHP, none of the rows in those tables show up.
Keep in mind I can't convert the DB to UTF-8.
PS: The only solution I see is to go through each row and remove those characters, but that isn't a good solution (about 20,000 rows).
PHP file:
header("Access-Control-Allow-Origin: *");
$dbconn = pg_connect("host=localhost dbname=bdgestclientes2
user=postgres password=postgres")
or die('Could not connect: ' . pg_last_error());
$data=array();
$q=pg_query($dbconn,"SELECT * FROM clientes WHERE idcliente = 3");
$row=pg_fetch_object($q)){$data[]=$row};
echo json_encode($data);
I just needed to add a line in PHP to set the client encoding to Unicode, so I could use the data and display it the way it is:
pg_set_client_encoding($dbconn, "UNICODE");
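In context, the call goes right after pg_connect and before any query; against the script above, that is (my arrangement):

$dbconn = pg_connect("host=localhost dbname=bdgestclientes2 user=postgres password=postgres")
    or die('Could not connect: ' . pg_last_error());
// ask PostgreSQL to convert between the WIN1252 database encoding
// and the client encoding on the fly ("UNICODE" is an alias for UTF8)
pg_set_client_encoding($dbconn, "UNICODE");
$q = pg_query($dbconn, "SELECT * FROM clientes WHERE idcliente = 3");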
That shouldn't be a problem at all.
Windows-1252 supports “ç” (code point 0xE7), “~” (code point 0x7E), “`” (code point 0x60) and “´” (code point 0xB4).
PostgreSQL will automatically convert the characters to the database encoding.
You will get problems if you want to store characters that do not occur in Windows-1252, like “Σ”.
In that case, the correct solution is to use a database with a different encoding (UTF8).
If you cannot do that, you'll have to store the strings as binary objects (data type bytea) and handle the encoding in your application. That will only work well if you don't need to process these strings in the database (e.g., use an index for case-insensitive search).
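A minimal sketch of that bytea fallback (table and column names are hypothetical):

// store the raw UTF-8 bytes untouched; encoding becomes the application's job
$bytes = pg_escape_bytea($dbconn, $_POST['nome']);
pg_query($dbconn, "INSERT INTO clientes_raw (nome) VALUES ('$bytes')");

// reading back: pg_unescape_bytea() returns the original byte string
$res  = pg_query($dbconn, "SELECT nome FROM clientes_raw");
$row  = pg_fetch_assoc($res);
$text = pg_unescape_bytea($row['nome']);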
I had a similar issue where I could not modify the database setup, but I used PHP's HTML entity encoding to work around it.
I removed the key HTML characters from the native htmlentities table because I work with WYSIWYG editors and need to keep the content as-is. If you have no such limitations, you can just use htmlentities on the string.
function makeFriendly($string)
{
    $list = get_html_translation_table(HTML_ENTITIES);
    unset($list['"']);
    unset($list['\'']);
    unset($list['<']);
    unset($list['>']);
    unset($list['&']);

    $search  = array_keys($list);                  // the raw characters
    $replace = array_values($list);                // their entity equivalents
    $search  = array_map('utf8_encode', $search);  // match the UTF-8 input

    return str_replace($search, $replace, $string);
}
If I need the actual characters I can always call html_entity_decode on the database string to get the 'real' string.
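A round-trip usage sketch (my example string; assumes the pre-PHP-5.4 default ISO-8859-1 translation table that the utf8_encode call implies):

$safe = makeFriendly("maçã à moda");  // "ma&ccedil;&atilde; &agrave; moda"
echo html_entity_decode($safe);       // back to "maçã à moda"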
I'm having some trouble with my $_POST/$_REQUEST data; it appears to still be utf8_encoded.
I am sending conventional ajax post requests, in these conditions:
oXhr.setRequestHeader("Content-type", "application/x-www-form-urlencoded; charset=utf-8");
js file saved in UTF-8 (no BOM) format
meta tags set up in the HTML <head>
php files saved in UTF-8 (no BOM) format as well
encodeURIComponent is used, but I tried without it and it gives the same result
OK, so everything is fine: the database is also UTF-8 and receives the data this way, and the pages display well.
But when I receive the character "º" for example (through $_REQUEST or $_POST), its binary representation is 11000010 10111010, while "º" hardcoded in PHP (UTF-8...) has the binary representation 10111010 only.
WTF? I just don't know whether this is a good thing or not... For instance, if I use "#º#" as a delimiter for PHP's explode function, it won't get detected, and this is actually the problem that led me here.
Any help will be, as usual, greatly appreciated. Thank you so much for your time.
Best regards.
EDIT1: checking against mb_check_encoding
if (mb_check_encoding($_REQUEST[$i], 'UTF-8')) {
    raise('$_REQUEST is encoded properly in utf8 at index ' . $i);
} else {
    raise(false);
}
The encoding was confirmed; the message was raised properly.
Single-byte UTF-8 characters do not have bit 7 (the eighth bit) set, so 10111010 is not UTF-8; your file is probably encoded in ISO-8859-1.
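You can verify that byte-level claim directly in PHP (my sketch):

$latin1 = "\xBA";     // "º" in ISO-8859-1: 1011 1010, high bit set
$utf8   = "\xC2\xBA"; // "º" in UTF-8: two bytes
var_dump(mb_check_encoding($latin1, 'UTF-8')); // false: a lone continuation byte
var_dump(mb_check_encoding($utf8, 'UTF-8'));   // true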
The scheme is a text input field in an HTML form, autocompleted using jQuery.autocomplete, which gets the appropriate server response (e.g. a city-name JSON list). The whole package works well... except that the client does not get data back from the server when typing accented characters (éèà...). Like many others, it looks like I'm facing a character-encoding issue, but I cannot manage to figure out where and how to solve it, despite many tries (iconv, utf8_encode, urldecode...) and readings like this one, for example.
Therefore I need some help/hints to understand where to act (before prototyping jQuery autocomplete code...?).
EDIT: it might also be a jQuery accent-folding issue; I'll try that route as well.
Configuration:
- server: Apache 2.2 (Debian Lenny)
- php: compiled 5.3.3 (so the JSON_UNESCAPED_UNICODE option is not available for json_encode)
- mysql: 5.1.49 with MySQL charset: UTF-8 Unicode (utf8)
- class: using a modified PFBC 2.x version for the PHP form building
- meta: the website is mostly for French users, so it's all designed with ISO-8859-1 (a bad initial choice, I guess):
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
jQuery autocomplete code (applied to the city input field):
// DEBUG testing (tested with and without $charset_attr: no change)
$charset_attr = 'contentType: "application/x-www-form-urlencoded;charset=ISO-8859-1"';
echo 'jQuery("#' . $this->attributes["id"] . '").autocomplete({source:"' . $this->xhr_path . '", minLength:2, ' . $charset_attr . '});';
The generated code for that input field matches the above expectation.
Converting MySQL rows into UTF-8 using this function:
I convert the MySQL result array into UTF-8 before sending the JSON back to the client. I actually tested and wrote other functions as well, but this changes nothing, so I guess the problem is not there.
$encoded_arr = utf8json($returnData);
echo json_encode($encoded_arr);
flush();
Encoding control 1 (client side)
An embedded control in the HTML form to check which character encoding is actually passed to jQuery.autocomplete:
jQuery(document).ready(function() {
<?php
$test_str ="foobar";
$check_encoding = "'" . mb_detect_encoding($test_str) . "'";
?>
alert('Check charset server encoding: ' + <?php echo $check_encoding;?> ); // output : ASCII
});
Encoding control 2 (server side)
$inputData = (isset($_GET['term'])) ? htmlspecialchars($_GET['term'], ENT_COMPAT, 'UTF-8') : NULL;
$encoding_get = mb_detect_encoding($_GET['term']);
$encoding_data = mb_detect_encoding($inputData);
$utf8converted = @iconv(strtolower($encoding_get), 'utf-8', $inputData);
$checkconversion = mb_detect_encoding($utf8converted);
Sending lowercase normal characters (ea...), I get everything as ASCII.
Sending lowercase accented characters (éèà...), I get everything as UTF-8.
So I'm lost: the server receives the proper character string and produces a JSON response (tested without AJAX), but it looks like the client does not receive or interpret it properly.
For those facing the same kind of ...%$# issue, here is what I've done to solve my case:
- Checked the character encoding at each node (client, Apache server, MySQL server), using mb_detect_encoding on the server side.
- Finally pinpointed the problem node: in my case, I was passing UTF-8 characters to the MySQL server instead of Latin ISO-8859-1, so the MySQL server did not return the expected answers. I could not detect or debug this by POSTing data directly via URL to the server script, so I logged the input and output to a file, checking the incoming character encoding and the MySQL server output.
- Changed the AJAX request from GET to POST.
- Solved it by converting the $_POST data to ISO before sending the MySQL request, using mb_convert_encoding, as well described here.
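The last two steps together might look like this (a sketch with hypothetical table and column names, not the poster's exact code):

// the client POSTs UTF-8; the MySQL database expects ISO-8859-1
$term = isset($_POST['term']) ? $_POST['term'] : '';
$term = mb_convert_encoding($term, 'ISO-8859-1', 'UTF-8');
$sql  = sprintf("SELECT city FROM cities WHERE city LIKE '%s%%'",
                mysql_real_escape_string($term));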
We have a PHP server that sends a JSON string in UTF-8 encoding.
I'm responsible for the iPhone app that gets the data.
I want to be sure that everything is correct on my side:
// after I download the data stream:
NSString *content = [[NSString alloc] initWithData:self.m_dataToParse encoding:NSUTF8StringEncoding];
// here the data is shown correctly in the console
NSLog(@"%@", content);
SBJsonParser *_parser = [[SBJsonParser alloc] init];
NSDictionary *jsonContentDictionary = [_parser objectWithData:self.m_dataToParse];
// here, when I print values of array IN array, I see \u454 \u545 \u4545 format. Any ideas why?
for (id key in jsonContentDictionary)
{
    NSLog(@"key:%@, value:%@", key, [jsonContentDictionary objectForKey:key]);
}
I'm using the latest version of the JSON library:
https://github.com/stig/json-framework/
Is the problem on the iPhone side (the JSON parser?) or on the PHP server?
Just to be clear again:
1. On the console, before JSON parsing, the string looks OK.
2. After JSON parsing, the array-in-array values are in the format \u545 \u453 \u545.
Thanks in advance.
Your code is correct.
A possible reason for the issue, which you should investigate with your content provider (the server that sends you the JSON), is this: even if the whole JSON string is correctly encoded as UTF-8 (remember: a JSON text is a sequence of characters, so an encoding must be specified), it may happen that some or all of the text content (that is, the values of the individual objects contained in the JSON message) was originally encoded in another format, typically HTML encoding (ISO-8859), especially when particular characters are used (e.g. Cyrillic or Asian ones). The JSON framework decodes all data as UTF-8 by default, but if there is a coding mismatch between the UTF-8 characters and the ISO-8859 ones (just to stay with the example), then the only way to represent them is the \u escape format. This happens quite often, especially when PHP scripts extract their info from HTML pages, which are usually encoded using ISO-8859. Consider also that iOS cannot convert the whole set of ISO-8859 characters to Unicode (e.g. Cyrillic).
So possible solutions are:
- do a content encoding of the texts server side (ISO-8859 --> UTF-8),
- or, if this is not possible, it's up to you to recognize the \uXXXX sequences that come from your content provider and replace them with the corresponding UTF-8 characters.
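On the PHP side, the first option can be as simple as re-encoding each value before json_encode (a sketch, assuming the values really are ISO-8859-1):

// re-encode ISO-8859-1 values to UTF-8 before building the JSON, so the
// \uXXXX escapes json_encode emits decode back to the intended characters
$fixed = array_map(function ($v) {
    return mb_convert_encoding($v, 'UTF-8', 'ISO-8859-1');
}, $rows); // $rows: hypothetical array of string values
echo json_encode($fixed);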