We have to implement a system called "Zahlen mit Code" (German for "pay by code"), a convenient, fast and lossless way of initiating credit transfers using your smartphone and a QR code.
Although mPDF (used by the PHP-based invoicing system InvoicePlane for exports) handles QR-code generation with flying colors, we are desperately looking for a solution to one problem: creating QR codes that contain control characters such as LF and CR. We need them to meet the requirement "Elements are separated with line endings, where both variants Lf and CrLf are permitted." (Reference, worth reading: https://www.stuzza.at/de/download/qr-code/339-qr-code-und-bcd-definition-2-en/file.html, page 5.)
To create a QR code within our UTF-8-encoded HTML, we use snippets like:
<barcode code="Two\rlines" size="0.8" type="QR" error="M" class="barcode" />
Unfortunately, using "\r" within the QR-code text doesn't do the trick. No actual carriage return ends up in the code, only the typed characters (e.g. a backslash followed by "r"). We tried several variants, such as \n, the decimal and hexadecimal HTML character references, 0x0D (UTF-8 hex) and so on. Our guess is that the character gets escaped by some sanitizing code, or that we are simply escaping it the wrong way (or using the wrong special/control character).
This is only possible with mPDF since version 7.0. The \r\n and \n characters in the barcode code parameter are treated as actual newlines.
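The core of the problem is the difference between the two characters backslash + "r" and a real carriage return (U+000D). Below is a minimal Python sketch that shows that difference and builds a payload whose elements are separated by real line endings. Python is used only because it runs standalone; the same logic applies to the PHP string passed to mPDF. `build_payload` is a hypothetical helper, and the field values are placeholders, not the field list from the referenced spec.

```python
def build_payload(fields, eol="\n"):
    """Join payload elements with a real line ending.

    Hypothetical helper. The quoted requirement permits only LF or CRLF
    as separators, so any other separator is rejected.
    """
    if eol not in ("\n", "\r\n"):
        raise ValueError("only LF or CRLF separators are permitted")
    return eol.join(fields)


# What "\r" typed into the markup attribute amounts to: a backslash, then "r".
literal = "Two\\rlines"
# An actual carriage return (U+000D) between the two words.
real = "Two\rlines"
print(len(literal), len(real))  # 10 9

# Placeholder fields only; consult the referenced spec for the real element list.
payload = build_payload(["FIELD_1", "FIELD_2", "FIELD_3"], eol="\r\n")
print(repr(payload))  # 'FIELD_1\r\nFIELD_2\r\nFIELD_3'
```

The separator is checked explicitly so that a lone CR, which the quoted requirement does not allow, cannot slip in.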
Related
I am translating with Poedit. However, Poedit seems to be ignoring apostrophes: for example, "shouldn't" comes through as "shouldnt". I am encoding in UTF-8. Does anyone know why this happens and whether there is a solution?
I assure you that Poedit isn't somehow ignoring or eating apostrophes — that's preposterous. It's just an editor that puts whatever you wrote, exactly as you wrote it (yes, including ' or any Unicode characters), into your PO and MO files.
Your problem is in your PHP code where you incorrectly escape the (translated) strings before printing them — how and in what context you do that is unfortunately something you didn't share.
But this is why e.g. WordPress has functions like esc_attr_e that do any necessary escaping and do it correctly, so that you don't have to do anything ridiculous (and painful to work with!) like substituting ' with ’ in all your translations (which wouldn't even work when using untranslated text…).
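To illustrate the point about escaping, here is a Python sketch using the standard `html` module. It stands in for PHP/WordPress helpers such as esc_attr_e as an analogy only, not as their actual code. Correct escaping never deletes the apostrophe. It turns the apostrophe into a character reference, which the browser turns back into the original character:

```python
import html

translated = "shouldn't"

# Escaping for an HTML attribute: the apostrophe becomes a character
# reference (&#x27;), it is not removed.
escaped = html.escape(translated, quote=True)
print(escaped)  # shouldn&#x27;t

# A browser (or html.unescape) turns the reference back into the original text.
print(html.unescape(escaped) == translated)  # True
```

If the apostrophe disappears entirely, some code between the MO file and the output is stripping or mangling it. Poedit itself is not the cause.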
You need to use the HTML entity: &rsquo;
Source: http://geektnt.com/tag/poedit
Some text characters need to be converted into HTML entities, otherwise they will not display correctly. A very common example is a word containing an apostrophe or single quote ('), which needs to be replaced with &rsquo;. For example, Chloe O'Brian should be written as Chloe O&rsquo;Brian. For a complete list of HTML entities, visit W3Schools.
I have an XML file with special characters encoded as &#xxx; character references. As long as I output these characters to a browser, that works fine, since they're (sort of) HTML encodings.
But I need to read the XML file with simplexml_load_string, which produces garbage for certain characters because they're in the extended ASCII table.
For example, &#154; translates to š, but when I try to use html_entity_decode, I get an empty character.
I tried almost everything from iconv to mb_decode_numericentity; nothing worked.
How do I convert those &#xxx; references to the real characters?
[Edit]
I found this table (http://www.ascii-code.com) which claims that &#154; is an extended ASCII character in ISO-8859-1.
I'm confused...
You're apparently dealing with two different characters that look almost identical when printing:
'LATIN SMALL LETTER S WITH CARON' (U+0161) actually encodes as &#353;
&#154; corresponds to 'SINGLE CHARACTER INTRODUCER' (U+009A)
I've found that none of my fonts or text editors handle the second one properly. So you most likely get a blank character for that precise reason.
The second one appears to be some kind of obscure control character whose exact purpose escapes me:
To be followed by a single printable character (0x20 through 0x7E) or format effector (0x08 through 0x0D). The intent was to provide a means by which a control function or a graphic character that would be available regardless of which graphic or control sets were in use could be defined. Definitions of what the following byte would invoke was never implemented in an international standard. Not part of the first edition of ISO/IEC 6429
It's worth noting that character references in XML always refer to Unicode (ISO/IEC 10646) code points, regardless of the file's own encoding. If the author of the XML file doesn't follow this convention, you'll face one of two outcomes. Either the XML is invalid, which effectively prevents it from being parsed with an XML library, or the XML is valid but contains corrupted data, which at best requires tedious post-processing.
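The "tedious post-processing" could look like the following Python sketch. It assumes the XML author wrote Windows-1252 byte values (such as 154 for š) as if they were code points. That is a guess about the author's intent, not something the parser can know. After parsing, any C1 control character (U+0080 to U+009F) is reinterpreted as a Windows-1252 byte. `repair_c1` is a hypothetical helper name:

```python
import re
import xml.etree.ElementTree as ET

# Code points U+0080..U+009F are C1 control characters. Assumption: an author
# who meant Windows-1252 bytes wrote them as character references by mistake.
C1_CONTROLS = re.compile(r"[\x80-\x9f]")


def repair_c1(text):
    """Hypothetical helper: reinterpret C1 controls as Windows-1252 bytes."""
    def fix(match):
        ch = match.group(0)
        try:
            # U+00xx encodes to the single byte 0xxx in Latin-1,
            # which is then decoded as Windows-1252.
            return ch.encode("latin-1").decode("cp1252")
        except UnicodeDecodeError:
            # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are undefined in cp1252; keep them.
            return ch
    return C1_CONTROLS.sub(fix, text)


root = ET.fromstring("<name>&#154;koda &#353;koda</name>")
print(hex(ord(root.text[0])))  # 0x9a  (the parser applies the XML rule: U+009A)
print(repair_c1(root.text))    # škoda škoda
```

The parser itself behaves correctly here: &#154; really is U+009A in XML. The repair belongs after parsing, and only if you know the source was produced this way.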
I have some questions about \r\n:
Are newlines browser-dependent? (Not how they are displayed in a browser, but how a <textarea> sends them to PHP via the HTTP request.)
Are newlines system-dependent? (Where PHP runs.)
Will PHP apply some implicit conversion?
Will MySQL apply some implicit conversion?
Thanks in advance!
Are newlines browser-dependent?
No. Use <br> to get a newline in a browser.
Are newlines system-dependent? (Where PHP runs.)
Yes: \n on OS X, \n on Unix/Linux, \r\n on Windows.
Will PHP apply some implicit conversion?
No.
Will MySQL apply some implicit conversion?
No.
Generally, browsers treat \r and \n as whitespace characters, like ' ' (space) or \t (tab). Inside some elements (script, pre, etc.) they are treated as line breaks, and the browser will understand any of the common line-break sequences (\r, \r\n, \n).
When data comes from a <textarea>, line breaks are always submitted as \r\n.
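Because <textarea> submissions arrive with \r\n, a common server-side step is to normalize line breaks before storing or comparing the text. Here is a minimal Python sketch. The function name is hypothetical, and in PHP the same two replacements can be done with str_replace:

```python
def normalize_newlines(text):
    """Convert CRLF and lone CR to LF.

    CRLF is replaced first so that it becomes one LF, not two.
    """
    return text.replace("\r\n", "\n").replace("\r", "\n")


submitted = "line one\r\nline two\r\nline three"
print(normalize_newlines(submitted).split("\n"))  # ['line one', 'line two', 'line three']
```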
Line breaks in PHP files don't depend on the system where the files run; they depend on the settings of the editor used to create them. When you copy a PHP file to another system, its line-break format does not change.
For example, look at this code:
print_r("
" === "\r\n");
Its result depends on the settings of the editor used to create this file, not on the current system.
But if you read other files on your system (text files, for example), they will most probably use the system's usual line-break format.
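Since a file keeps whatever line endings its editor wrote, it can help to check which convention a given file actually uses. Below is a small Python sketch with a hypothetical helper name. It works on raw bytes, so no newline translation happens while reading:

```python
def detect_eol(data: bytes) -> str:
    """Hypothetical helper: report which line-ending convention a byte string uses."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # LFs not preceded by a CR
    cr = data.count(b"\r") - crlf   # CRs not followed by an LF
    styles = [name for name, n in (("CRLF", crlf), ("LF", lf), ("CR", cr)) if n]
    if not styles:
        return "none"
    return styles[0] if len(styles) == 1 else "mixed"


print(detect_eol(b'print_r("\r\n" === "\r\n");'))  # CRLF
```

To check a real file, open it in binary mode (`open(path, "rb")`) and pass the result of `.read()` to the function. Binary mode matters because text mode would already have translated the line endings.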
No, PHP and MySQL don't apply implicit conversions.
The portable way is to use the PHP_EOL constant, which holds the line ending of the platform PHP is running on.
Newlines are not browser-dependent. Outside an element styled with CSS white-space: pre, you must run PHP's nl2br() function to convert newlines to <br> tags.
You may be interested in nl2br: it takes newline characters like the ones you described and inserts an HTML line break (<br />) before each of them, keeping the newline itself.
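For illustration, here is a Python re-implementation of the behavior the PHP manual documents for nl2br. It is not PHP's actual source. A break tag is inserted before each \r\n, \n\r, \n or \r, and the newline itself is kept:

```python
import re

# The two-character sequences come first so that "\r\n" counts as one break.
_NEWLINE = re.compile(r"\r\n|\n\r|\n|\r")


def nl2br(text, xhtml=True):
    """Insert an HTML line break before each newline, keeping the newline."""
    tag = "<br />" if xhtml else "<br>"
    return _NEWLINE.sub(lambda m: tag + m.group(0), text)


print(repr(nl2br("one\ntwo\r\nthree")))  # 'one<br />\ntwo<br />\r\nthree'
```

Keeping the original newline leaves the HTML source readable, while the inserted tag produces the visible line break.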
A big gotcha for me was that in single quoted strings 'like\nthis' escape sequences (like \n) will not be interpreted. You have to use double quotes "like\nthis" to get an actual newline.
<br> is browser-independent, and \n should be too.
I don't know about \r.
MySQL won't convert it.
I am currently translating my PHP application using gettext with Poedit. Since I respect the print margin in my source code, I am used to writing strings like this:
print $this->translate("A long string of text
that needs to follow the print margin and since
php outputs whitespaces for every break line I do
my sites renders correctly.");
However, as expected, Poedit does not convert the line breaks to spaces:
A long string of text\n
that needs to follow the print margin and since\n
php outputs whitespaces for every break line I do\n
my websites render correctly.\n
I know one approach would be to close the string at each line break in the source code, like this:
print $this->translate("A long string of text " .
"that needs to follow the print margin and since " .
"php outputs whitespaces for every break line I do " .
"my sites renders correctly. ");
But this approach doesn't scale for me when the texts change and the print margin must still be respected, unless NetBeans (the IDE I use) can do it for me automatically, as Eclipse does for Java.
So, in conclusion: is there a preference that tells the Poedit parser to convert line breaks to spaces?
I know the strings are still translatable even though the line breaks are kept. I'm asking so that my translator (sometimes even the customer/user) isn't confused into thinking he needs to reproduce the line breaks while translating in Poedit.
You have to make sure that you're using the right line breaks in your script and your app:
LF: Line Feed, U+000A
FF: Form Feed, U+000C
CR: Carriage Return, U+000D
CR+LF: CR (U+000D) followed by LF (U+000A)
NEL: Next Line, U+0085
LS: Line Separator, U+2028
PS: Paragraph Separator, U+2029
On Windows (MS-DOS) systems the line break is CR+LF, on Unix-like systems it is LF, and on 8-bit Commodore machines it is CR.
You have to make sure that the source location uses the same type of line breaks as your editing location.
Your server may handle line breaks differently from the host the editor is running on. Double-check this and develop some means of automatically replacing the characters depending on your OS.
As you say you're "translating my PHP application using gettext with POEdit", I would create a script (shell, DOS or PHP) that goes through all your files and automatically converts the line-break characters to the convention of the system you're running on.
So if you're working on Windows, you would search for all U+000A characters and replace them with U+000D U+000A.
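One caveat about the search-and-replace described above: replacing every LF with CR LF would turn an existing CR LF into CR CR LF. Here is a minimal Python sketch that avoids this by normalizing to LF first. The helper name is hypothetical:

```python
def to_crlf(text):
    """Hypothetical helper: convert every line ending to CRLF without doubling existing ones."""
    # Normalize to LF first so an existing "\r\n" does not become "\r\r\n".
    lf_only = text.replace("\r\n", "\n").replace("\r", "\n")
    return lf_only.replace("\n", "\r\n")


print(repr(to_crlf("a\nb\r\nc")))  # 'a\r\nb\r\nc'
```

When applying this to files from Python, open them with `newline=""` so that Python's own newline translation doesn't interfere with the conversion on read or write.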
I copied the euro symbol from Wikipedia and echoed it in my parent page, and there it works. But when I replace the same HTML content using jQuery (echoing the same symbol in the other page), it is not displayed. Why is that? Or is there any way to display the same thing using HTML?
In HTML you do this:
&euro;
And of course this works with jQuery, or any other web-based language you are using.
You need to ensure that your data is encoded using $X, that your server claims it is encoded using $X, and that any meta tags or xml prologs you may have also claim it is encoded using $X.
... where $X is a character encoding which includes the euro symbol. UTF-8 is recommended.
The W3C have an introduction to character encoding.
You can bypass this using HTML entities (&euro; in this case), which let you represent characters using ASCII (a subset of pretty much any character encoding you care to name). This has the advantage of being easy to type on a keyboard that doesn't have that character. However, it requires a tiny bit more bandwidth and makes the source code of documents with a lot of non-ASCII characters hard to read.
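To see why the declared encoding matters, here is a Python sketch showing how the euro sign is stored in a few encodings, and how the ASCII-only reference form sidesteps the question. The byte values are standard; the encodings compared were chosen only for illustration:

```python
import html

euro = "\u20ac"  # EURO SIGN

# The same character in two encodings: UTF-8 needs three bytes, Windows-1252 one.
print(euro.encode("utf-8"))   # b'\xe2\x82\xac'
print(euro.encode("cp1252"))  # b'\x80'

# ISO-8859-1 has no euro sign at all, so the text cannot be encoded in it.
try:
    euro.encode("latin-1")
except UnicodeEncodeError:
    print("not representable in ISO-8859-1")

# Falling back to an ASCII-only numeric character reference.
print(euro.encode("ascii", "xmlcharrefreplace"))  # b'&#8364;'
print(html.unescape("&euro;") == euro)            # True
```

If the bytes are UTF-8 but the page is declared as something else (or vice versa), the browser decodes the bytes differently and the symbol gets lost. The entity avoids this because it consists of ASCII characters only.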
Note that HTML entities only work when dealing with HTML. It will break if you try things such as $(input).val('&euro;'), because .val() sets the value as plain text and does not decode entities.