Special characters in Flex - php

I am working on a Flex app that has a MySQL database. Data is retrieved from the DB using PHP, and I am using AMFPHP to pass the data on to Flex.
The problem I am having is that the data is copied from Word documents, so some of the more unusual characters do not display properly. For example, Word uses different characters for opening and closing double quotes instead of just " (the standard double quote). Another example is the long dash instead of -.
All of these characters result in one or more accented capital A characters appearing instead. Not only that, each time the document is saved, the characters are replaced again resulting in an ever-increasing number of these accented A's appearing.
Doing a search and replace for each troublesome character to swap it for one of the normal characters seems to work, but obviously this requires compiling a list of all the characters that may appear, and the problem can recur whenever a new character is used for the first time. It also seems like a brute-force way of getting round the problem rather than a proper solution.
Does anyone know what causes this and have any good workarounds / fixes? I have had similar problems when using utf-8 characters in html documents that aren't set to use utf-8. Is this the same thing and if so, how do I get flex to use utf-8?
Many thanks
Adam

It is the same thing, and smart quotes aren't special as such: you will in fact be failing for every non-ASCII character. As such, a trivial ad-hoc replace for the smart quote characters will be pointless.
At some point, someone is mis-decoding a sequence of bytes as ISO-8859-1 or Windows code page 1252 when it should have been UTF-8. Difficult to say where without detail/code.
What is “the document”? What format is it? Does that format support UTF-8 content? If it does not, you will need to encode output you put into it at the document-creation phase to the encoding the consumer of that document expects, eg. using iconv.
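The mis-decoding described above can be reproduced in a few lines. This is a minimal sketch, written in Python purely to show the byte-level mechanics (the same applies to PHP or Flex). It assumes the faulty step reads UTF-8 bytes as ISO-8859-1. The helper names `bad_save` and `undo_one_round` are hypothetical and not part of any library.

```python
def bad_save(text: str) -> str:
    """Simulate one faulty save: UTF-8 bytes read back as ISO-8859-1."""
    return text.encode("utf-8").decode("latin-1")


def undo_one_round(text: str) -> str:
    """Reverse one faulty save by re-reading the same bytes as UTF-8."""
    return text.encode("latin-1").decode("utf-8")


dash = "\u2014"  # the long dash that Word inserts
history = [dash]
for _ in range(3):
    history.append(bad_save(history[-1]))

# Every non-ASCII character becomes two characters on each faulty save.
lengths = [len(s) for s in history]

# Accented capital A characters (U+00C3, U+00C2) show up from the second save on.
accented_a_counts = [s.count("\u00c3") + s.count("\u00c2") for s in history]
```

Here `lengths` comes out as `[1, 3, 6, 12]`, which matches the ever-increasing run of accented characters in the question. Windows code page 1252 behaves the same way except for the bytes 0x80 to 0x9F, which it maps to printable characters such as curly quotes.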

Related

Convert &#xC3;&#xA4; to Umlaut

I am parsing a Document via xpath and fetch info from a metatag.
I am passing this string through utf8_decode( $metadesc ) but still get no normal Umlauts. The Document is UTF-8.
I want to convert &#xC3;&#xA4; to ä.
I am debugging via the console in firebug and write the data also into a DB.
In both cases, I get the same result.
For text inside divs it works. Only the text from the metatag is wrong.
Many Thanks
Well, it's true that xC3A4 is the UTF-8 encoding of the Unicode character xE4 which is ä. But in XML, the sequence &#xC3;&#xA4; represents something quite different: it represents "capital A with tilde" followed by "currency sign" (that is, Ã¤). If you use an XML parser, you will see these two characters, and you won't get any indication that they started life as hex character references.
If possible, you should try and fix the program that generated this incorrect encoding of the character: that's much better than trying to repair the damage later.
If you do want to do it by a "repair" operation, you need to take into account that the sequence &#xC3;&#xA4; might actually represent the two characters that XML says it represents: how will you tell the difference? I don't know any PHP, but basically the way to do it is to extract the hex value xC3A4 and then put this through UTF-8 decoding.
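The repair step (take the hex values C3 and A4 as bytes, then decode them as UTF-8) might look like the sketch below. It is in Python rather than PHP, and the function name `repair_reference` is hypothetical. It assumes that every character produced by the references fits in one byte. If the bytes are not valid UTF-8, it leaves the text unchanged.

```python
import html


def repair_reference(raw: str) -> str:
    """Turn per-byte character references back into the intended text."""
    # Resolve the references the way a parser would: one character per reference.
    chars = html.unescape(raw)
    try:
        # Map each character back to the byte it stood for, then decode as UTF-8.
        return chars.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not a mis-encoded UTF-8 sequence, so keep what the parser saw.
        return chars


fixed = repair_reference("&#xC3;&#xA4;")
untouched = repair_reference("&#xE4;")
```

The ambiguity mentioned above remains: a document that really meant "capital A with tilde" followed by "currency sign" would also be rewritten, so a repair like this is only safe when the source is known to be mis-encoded throughout.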

PHP and XML fusion, correctly encoding special characters

I've got a database that outputs a great deal of information.
I'm currently building a PHP application to build this database into an XML format for another application to read.
I'm a little stuck with special characters.
In the database, some characters are printing strangely:
Ø becomes Ã˜
° becomes Â°
I'm using fwrite() to write the XML file in the PHP and I think the error resides there somehow.
I need a way to overcome this, perhaps by detecting where these characters occur and replacing them appropriately.
I'm using PHP and I'm not sure how to replace these characters on an individual basis, and more importantly, I'm not sure what to replace them with!
Can someone help?
Ø becomes Ã˜, ° becomes Â°
It looks like UTF-8 encoded characters are being passed to some display device, and the display device is being told that they are ISO-8859-X or Windows-125X encoded characters.
Tell the display device that this is indeed UTF-8 (which is the default encoding for XML).
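The symptom can be checked at the byte level. The sketch below (Python, for illustration only) assumes the consumer reads the file as Windows-1252. The function `xml_bytes` and the `<item>` element are hypothetical stand-ins for whatever the PHP script writes with fwrite().

```python
from xml.sax.saxutils import escape


def xml_bytes(value: str) -> bytes:
    """Build a tiny XML document and encode it as UTF-8, as the declaration says."""
    doc = f'<?xml version="1.0" encoding="UTF-8"?>\n<item>{escape(value)}</item>\n'
    return doc.encode("utf-8")


# What a Windows-1252 reader shows for correctly written UTF-8 bytes.
misread = {ch: ch.encode("utf-8").decode("cp1252") for ch in ("\u00d8", "\u00b0")}

data = xml_bytes("\u00d8 30\u00b0")
read_correctly = data.decode("utf-8")
read_wrongly = data.decode("cp1252")
```

If the bytes decode cleanly as UTF-8 and only the Windows-1252 reading shows the two-character sequences, the file itself is fine and the reader's encoding setting is what needs to change.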

Weird character encoding in Firefox

I have a problem with character encoding in Firefox. When I copy/paste a paragraph from Microsoft Word (2007), it can contain special characters like these (dots/squares to make a list, or quotes):
 Te’st
 Ze’f
• Gzg’a
The quote ’ is different compared to this quote ' (typed directly using keyboard). So I paste this in a textarea and save (using AJAX in some cases). In the database (which has a collation latin1_swedish_ci) it shows perfectly fine. But when getting this data to edit again using Firefox, it shows weird binary symbols. Works fine in Chrome and IE.
I don't want to modify the charset of the database. Is there any way to solve this problem?
Note: you can also test by viewing this post in Chrome and FF
The characters you copypasted (assuming they got transmitted correctly into this forum) contain, in addition to letters, three occurrences of U+2019 RIGHT SINGLE QUOTATION MARK, which is the correct punctuation apostrophe in English and many other languages, one occurrence of U+2022 BULLET, which sounds ok, and two occurrences of U+F0A7, which is in the Private Use (PU) range and should not be used in public information exchange, only for special purposes by mutual agreement between interested parties.
It is possible that some notations in Word 2007 documents get converted to PU characters in copy and paste, but at least normal list bullet normally becomes U+2022 BULLET. So it is a bit of a mystery where the PU characters come from.
Regarding single quotes, they are representable in windows-1252 too, and latin1_swedish_ci seems to cover it (though it is, as far as I understand, just the definition of collating order, rather than a character encoding). And as you are saying that the data looks fine in the database, it seems that problem is in the way in which the data is written in an HTML document served to the browser.
In particular, if the encoding of the page in which the data is then presented is UTF-8 and the actual data is there in windows-1252 encoding, problems arise. It would mean a problem like the one you describe, as U+2019 is encoded as 0x92 in windows-1252, and this causes a character-level data error when interpreted as UTF-8.
You can check the situation by using View→Encoding in Firefox when viewing the result page. If my hypothesis is correct, you will see UTF-8 selected there, and changing it to “West European (windows-1252)” makes the single quote appear (and may mess up other things on the page thoroughly).
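The hypothesis about byte 0x92 can be checked directly. The following Python sketch assumes the stored data is windows-1252 and the page is interpreted as UTF-8. The variable names are illustrative only.

```python
# U+2019 as stored in a windows-1252 column: a single byte, 0x92.
stored = "Te\u2019st".encode("cp1252")

# A strict UTF-8 decoder rejects the lone 0x92 byte.
try:
    stored.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# A lenient decoder (as browsers are) substitutes U+FFFD REPLACEMENT CHARACTER.
lenient = stored.decode("utf-8", errors="replace")

# Decoding with the encoding the data was written in restores the quote.
correct = stored.decode("cp1252")
```

How a browser renders the substituted character varies, which may be why the same page can look different in Firefox, Chrome and IE.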

Replace all special characters from a string using PHP

I am using a jQuery editor with PHP. It works fine for plain text (text without special characters),
but if I try to post text which contains special characters, those characters are not stored in the DB table.
When I tried replacing a special character with its HTML code, it worked fine.
But it is too difficult to replace all special characters one by one.
Is there any script which replace all special characters from a string...?
Do you mean something like PHP's str_replace()?
http://php.net/manual/en/function.str-replace.php
Is there any script which replace all special characters from a string...?
This is the wrong approach. You need to get your character sets right, so there will be no need to replace anything.
I don't know what you're doing, but if you are transmitting data through Ajax, it is probably UTF-8 encoded. If your database is in a different character set, you may need to convert it.
Basic (deep) reading: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
For more specific information, you will need to provide more details about your situation. Here are a few questions that deal with the subject, maybe one of them already helps:
Special characters in PHP / MySQL
How to store characters like ♥☆ to DB?
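Converting between character sets, rather than replacing characters one by one, can be sketched as follows. This is Python for illustration only. The function name `to_db_bytes` is hypothetical, and Windows-1252 is just an assumed database character set. The point is that a conversion fails loudly for characters the target set cannot hold, instead of silently dropping them.

```python
def to_db_bytes(text: str, db_charset: str = "cp1252") -> bytes:
    """Encode text for a database that uses a single-byte character set."""
    try:
        return text.encode(db_charset)
    except UnicodeEncodeError as exc:
        bad = text[exc.start:exc.end]
        raise ValueError(f"{bad!r} has no {db_charset} representation") from exc


# Input arrives as UTF-8 bytes (typical for Ajax requests) and is decoded first.
incoming = "caf\u00e9 \u2019quoted\u2019".encode("utf-8")
stored = to_db_bytes(incoming.decode("utf-8"))
```

A character such as ♥ would raise the error here, which is a sign that the database itself should use UTF-8 if such input is expected.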

Issues with stored JSON encoded POSTs in MySQL

I have a situation where, after several years of use, we suddenly have some JSON-encoded values that are giving our Perl script fits due to backslashes.
The issues are with accented characters like í and é. An example is Matí encoded as Mat\ud873.
It is unclear what may have changed in the environment. PHP, Perl, and MySQL are involved. The table collation is latin1_swedish_ci and this may have been changed by a co-worker screwing around.
Does this ring any bells for anyone?
The problem here is internationalization on the JavaScript end, not the collation of your DB table. If you had no such problems before, it's likely that no users were inputting international characters before, or the character set of your HTML pages was ISO-8859-1/cp1252 (which would have limited form POST data on the client end.) New users or changed HTML headers could have caused this problem to manifest itself, but the issue is really on the side of the Perl script.
JSON defines strings as double-quoted sequences of characters and allows \uXXXX Unicode escape sequences; encoders such as PHP's json_encode() emit these escapes by default for anything outside 7-bit ASCII. The first 128 characters (the ASCII range) are represented as-is, but any extended-ASCII/multi-byte characters will end up as \uXXXX values. For example, character é (e-acute), which is #233 in ISO-8859-1, will show up as \u00E9 (since é is U+00E9 in Unicode), and the string "résumé" would be stored as "r\u00E9sum\u00E9".
Not knowing what your Perl script is attempting to do, all I can say is it may be experiencing difficulty when trying to de-reference the escape sequence. Perl has its own set of escape sequences, and \u mid-string actually means "make the next character upper-case", so you're probably seeing a lot of "00E9" stuff from your Perl script instead of the accented characters, or you may get parse errors depending on your script.
Since you're creating/storing the JSON from POST data in PHP, you have some options:
Convert the special characters to HTML entities (htmlentities())
Force all special characters to reduce from UTF-8 sequences (if that's what your POST data comes in) to ISO-8859-1 via utf8_decode() (you may lose data with this approach)
Scrub the resultant JSON by replacing this REGEX match: /\\u[0-9a-fA-F]{4}/ with "" (nothing) (you may lose data with this approach)
Double-escape the resultant JSON by changing all "\" characters to "\\" before feeding it to your Perl script (be wary of SQL injection!)
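For comparison with the options above, the \uXXXX escapes are harmless when the stored value is read with a real JSON parser instead of being treated as a plain string. The Python sketch below only illustrates the round trip. It says nothing about what the Perl script in question actually does.

```python
import json

# Default encoding escapes everything outside ASCII, like the stored values.
encoded = json.dumps("r\u00e9sum\u00e9")

# A JSON parser resolves the escapes back to the original characters.
decoded = json.loads(encoded)

# Encoders can usually be told to keep non-ASCII characters as they are.
unescaped = json.dumps("r\u00e9sum\u00e9", ensure_ascii=False)
```

With unescaped output, the stored bytes depend on the connection and table character sets, so those need to be consistent before switching.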
