I'm using SimpleXML to read nodes, and I echo out the image file name. Using foreach, I print them out:
assets/project_Guide2Big1.jpg
assets/project_Guide2Big2.jpg
assets/project_Guide2Big3.jpg
assets/project_Guide2Big4.jpg
assets/project_Guide2Big5.jpg
I inserted these values into my img tags, but the images don't appear except for the first one.
I copy "assets/project_Guide2Big1.jpg" into the browser. I see the image, but when I copy "assets/project_Guide2Big2.jpg", the address changes to this
asset/%E2%80%8BprojectGuide2Big2.jpg.
It looks like some urlencoding(?). I tried to decode, but my images still aren't working. This is so wierd.
Were does the %E2%80%8B come from?
That looks suspiciously like a UTF-8 character sequence representing some Unicode character which you didn't expect to be there.
Using this online converter, we can see that the sequence of UTF-8 bytes E2 80 8B represent the Unicode codepoint U+200B, which is a "Zero Width Space".
So somehow, your source XML includes an invisible character after the slash. When echoed to the screen, it is completely invisible - even when viewing source, since the source is still just text. But when you try to load the URL, that character is outside the valid range for URLs, so gets automatically encoded by the browser.
You might be wondering what the point of a zero-width space is, but consider automatic word-wrap functions - they may look for a space to break on, but a URL contains no spaces. So inserting a zero-width space makes the text look the same, but allows it to wrap at that specific point. Another character useful for this is the "soft hyphen", which has the beautifully apt entity name of - as a friend of mine put it, "the soft hyphen is shy, and may not appear". :)
Related
I store codes like "\u1F603" within messages in my database, and now I need to display the corresponding emoji on my web page.
How can I convert \u1F603 to \xF0\x9F\x98\x83 using PHP for displaying emoji icons in a web page?
You don't need to convert emoji character codes to UTF-8 sequences, you can simply use the original 21-bit Unicode value as numeric character reference in HTML like this: 😃 which renders as: 😃.
The Wikipedia article "Unicode and HTML" explains:
In order to work around the limitations of legacy encodings, HTML is designed such that it is possible to represent characters from the whole of Unicode inside an HTML document by using a numeric character reference: a sequence of characters that explicitly spell out the Unicode code point of the character being represented. A character reference takes the form &#N;, where N is either a decimal number for the Unicode code point, or a hexadecimal number, in which case it must be prefixed by x. The characters that compose the numeric character reference are universally representable in every encoding approved for use on the Internet.
For example, a Unicode code point like U+5408, which corresponds to a particular Chinese character, has to be converted to a decimal number, preceded by &# and followed by ;, like this: 合, which produces this: 合.
So if in your PHP code you have a string containing '\u1F603', then you can create the corresponding HTML string using preg_replace, as in following example:
$text = "This is fun \\u1F603!"; // this has just one backslash, it had to be escaped
echo "Database has: $text<br>";
$html = preg_replace("/\\\\u([0-9A-F]{2,5})/i", "&#x$1;", $text);
echo "Browser shows: $html<br>";
This outputs:
Database has: This is fun \u1F603!
Browser shows: This is fun 😃!
Note that if in your data you would use the literal \u notation also for lower range Unicode characters, i.e. with hex numbers of 2 to 4 digits, you must make sure the next user's character is not also a hex digit, as it would lead to a wrong interpretation of where the \u escape sequence stops. In that case I would suggest to always left-pad these hex numbers with zeroes in your data so they are always 5 digits long.
To ensure your browser uses the correct character encoding, do the following:
Specify the UTF-8 character encoding in the HTML head section:
<meta charset="utf-8">
Save your PHP file in UTF-8 encoding. Depending on your editor, you may need to use a "Save As" option, or find such a setting in the editor's "Preferences" or "Options" menu.
Hell everyone,
after many try i can found solution.
I user below code:
https://github.com/BriquzStudio/php-emoji
include 'Emoji.php';
$message = Emoji::Decode($message);
This one working fine for me!! :)Below is my reslut
I have an XML with special Characters encoded as &#xxx; in it. As long as I'd output these characters to a browser, that would work fine as they're HTML-Encodings (sort of).
But I need to read the XML-File with simplexml_load_string, which results in garbage for certain characters, because they're in the extended ASCII-table.
For example:
translates to š - but when I try to use html_entity_decode, I get an empty character.
I tried almost everything from iconv to mb_decode_numericentity - nothing worked.
How do I convert those &#xxx; to the real characters???
[Edit]
I found this table http://www.ascii-code.com that claims the is an extended ASCII Character using ISO-8859-1
I'm confused...
You're apparently dealing with two different characters that look almost identical when printing:
'LATIN SMALL LETTER S WITH CARON' (U+0161) actually encodes as š
corresponds to 'SINGLE CHARACTER INTRODUCER' (U+009A)
I've found that none of my fonts or text editors handle the second one properly. So you most likely get a blank character for that precise reason.
The second one appears to be some kind of weird control character whose exact purpose escapes from my understanding:
To be followed by a single printable character (0x20 through 0x7E) or
format effector (0x08 through 0x0D). The intent was to provide a means
by which a control function or a graphic character that would be
available regardless of which graphic or control sets were in use
could be defined. Definitions of what the following byte would invoke
was never implemented in an international standard. Not part of the
first edition of ISO/IEC 6429
It's worth noting that character references in XML use numeric codes from a fixed encoding (some UCS variant). If the author of the XML file doesn't follow this convention you'll be faced with either invalid XML (something that effectively prevents it from being parsed with an XML library) or valid XML that contains corrupted data (something that, at most, will require tedious post-processing).
I have a problem with a hidden character not showing up either in database (phpmyadmin) nor website. Website has utf-8 character encoding. If I copy/paste the string with the "hidden" character into Notepad I can see it. It looks just like a bullet character but is hidden. What type of character is this and can it be removed with PHP?
The user able to type this character is using Mac and are probably doing a copy/paste from a document (maybe unicode?) into a form on our website and saves it. So this character is not visible with utf-8 encoding but visible if I copy my string into a Notepad document.
This is the hidden character at the end of the string. Looks like a bullet:
Copy the character, then fire up PowerShell and do the following (yes, it's convoluted, sorry):
'U+{0:X4}'-f+[char]'<PASTE>'
and paste the character where it says <PASTE>. It should give you the Unicode code point of that character. You then should be able to write something that removes it from the string, but from my eyes there shouldn't be any input that destroys the document layout, except maybe fun things like RTL markers.
Short explanation of the above: [char]'x' converts a single-character string to a char, + will then treat it as a number (similar to [int], but shorter). The rest is a format string and the formatting operator -f.
Here's a link I found, which even has a character I need to play with for other projects of mine.
http://www.fileformat.info/info/unicode/char/2446/index.htm
There is a box with the Title of: "Encodings" on that page. And I am wondering about some of the rows.
I obviously need a course on this sort of thing, but I'm wondering what the difference is between "HTML Entity (decimal)" and "HTML Entity (hex)".
The funny thing is, which confuses me, I throw those characters on a web page, and they display fine. But I haven't specified any UTF-8 encoding in the php page.
<?php
$string1 = '⑆';
$string2 = '⑆';
echo $string1;
echo '<br>';
echo $string2;
?>
Does the browser know how to display both automatically?
And to make it weirder, I can only see those characters on my Mac, in Firefox.
But my windows box doesn't want to show them. I've tested it in chrome, and firefox. Do I need to tell the browsers to view them correctly? Or is it an operating system modification?
They're both valid numeric HTML entities, and the browser does indeed know how to decode them. The difference is the first is a hexadecimal number, while the latter is decimal.
0x2446 = 9286
Note that 0x means hexadecimal.
Also note that it is good practice to always have your server explicitly specify an encoding. The W3C explains how to do so. UTF-8 is a good choice.
If you use any Unicode encoding, you can always put the character right on your page, so you don't have to use entities.
To be exact, neither is an entity reference. & is an entity reference that refers to the entity named amp that is defined as:
<!ENTITY amp CDATA "&" -- ampersand, U+0026 ISOnum -->
Here you can see that the entity’s value is just another reference: &.
⑆ and ⑆ are “just” character references (numeric character references to be exact) and refers to characters by specifying the code position of a character in the Universal Character Set, i.e. the Unicode character set.
You can use any "HTML Entity" in any encoding and in practice, if You have installed appropriate fonts, every browser will work fine. Well, it was created for displaying characters that are not included in current encoding. In Your situations it looks You have to install some fonts on Your Windows box.
On the other hand, it has almost nothing to do with PHP.
i have tried to copy euro symbol from Wikipedia...and echo it (in my parent page),at that time it is working.but when i replace the same html content using jquery(used same symbol to echo in the other page).it is not displaying.why is it so..(or is der any way to display the same thing using html)?
In HTML you do this
€
And of course this works with jQuery, or any other web based language you are using
For more information look here
You need to ensure that your data is encoded using $X, that your server claims it is encoded using $X, and that any meta tags or xml prologs you may have also claim it is encoded using $X.
... where $X is a character encoding which includes the euro symbol. UTF-8 is recommended.
The W3C have an introduction to character encoding.
You can bypass this using HTML entities (€ in this case), which let you represent characters using ASCII (which is a subset of pretty much any character encoding you care to name). This has the advantage of being easy to type of a keyboard which doesn't have that character, but requires a tiny bit more bandwidth and will make it hard to read the source code of documents which include a lot of non-ASCII characters.
Note that HTML entities will only work when dealing with HTML. You'll find it breaking if you try things such as $(input).val('€').