Get UTF-8 equivalent of a Wingdings character - php

The Problem
I'm developing a PHP application which displays Wingdings and Webdings characters. If I just put a font tag around it, the character gets displayed correctly. However, once it gets copy-pasted, it reverts to the underlying character it was before, such as "a".
What I think would be the Solution
This problem could be solved by replacing every Wingdings character on the page with its UTF-8 equivalent. UTF-8 can represent so many characters that I'm guessing Wingdings characters and the like are also covered.
Question
How can I map/create UTF-8 characters from Wingdings characters?

Here is a list of the Unicode equivalents of the Wingdings characters.
Is this what you are looking for?
http://www.alanwood.net/demos/wingdings.html
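A lookup table like that can be turned into a small translation function. The following is only a sketch: the map is deliberately partial (three entries), the constant and function names are made up for illustration, and the remaining entries would have to be filled in from the linked list.

```php
<?php
// Partial, illustrative map from the typed character to its Unicode equivalent.
// Only three entries are shown; complete it from the table linked above.
const WINGDINGS_TO_UNICODE = [
    'J' => "\u{263A}", // white smiling face
    'L' => "\u{2639}", // white frowning face
    'a' => "\u{264B}", // Cancer (zodiac sign)
];

// Hypothetical helper: replaces every mapped character and leaves
// unmapped characters unchanged.
function wingdingsToUtf8(string $text): string
{
    return strtr($text, WINGDINGS_TO_UNICODE);
}

echo wingdingsToUtf8('a'), "\n";
```

The converted text should be output as UTF-8 and no longer wrapped in the Wingdings font tag, otherwise the browser may try to render the symbol code points with that font.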

Related

Strange umlaut encoding on file system

From time to time I encounter files that have a strange (wrong?) encoding of umlaut characters in their file names. Maybe the encoding comes from a Mac system, but I'm not sure. I work with Windows.
For example:
Volkszählung instead of Volkszählung (try to use Backspace after the first ä).
When pasting it into an ANSI encoded file with notepad++ it inserts Volksza¨hlung.
I have two questions:
a) Where does that come from and which encoding is it?
b) Using glob() in PHP does not list these files when using the wildcard character *. How is it possible to detect them in PHP?
That's a combining character: specifically, U+0308 COMBINING DIAERESIS. Combining characters are what let you put things like umlauts on any character, not just specific "precomposed" characters with built-in umlauts like U+00E4 LATIN SMALL LETTER A WITH DIAERESIS. Although it's not necessary to use a combining character in this case (since a suitable precomposed character exists), it's not wrong either.
(Note, this isn't an "encoding" at all: in the context of Unicode, an encoding is a method for transforming Unicode codepoint numbers into byte sequences so they can be stored in a file. UTF-8 and UTF-16 are encodings. But combining characters are Unicode codepoints, just like normal characters; they're not something produced by the encoding process.)
If you're working with Unicode text, you should be using PHP's mbstring functions. The built-in string functions aren't Unicode-aware, and see strings only as sequences of bytes rather than sequences of characters. I'm not sure how mbstring treats combining characters, though; the documentation doesn't mention them at all, as far as I can see.
You should also take a look at the grapheme functions, which are specifically meant to cope with combining characters. A "grapheme unit" is the single visual character produced by a base character codepoint plus any combining characters that follow it.
Finally, the PCRE regex functions support a \X escape sequence that matches whole grapheme clusters rather than individual codepoints.
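As a rough sketch of the points above, the snippet below compares the decomposed and precomposed spellings, detects combining marks with \p{Mn}, and counts grapheme clusters with \X. The helper names are made up for illustration, and the normalization step assumes the intl extension is installed (it is skipped otherwise).

```php
<?php
// "a" followed by U+0308 COMBINING DIAERESIS versus precomposed U+00E4.
$decomposed  = "Volksza\u{0308}hlung";
$precomposed = "Volksz\u{00E4}hlung";

// True if the string contains at least one nonspacing combining mark.
function hasCombiningMark(string $s): bool
{
    return preg_match('/\p{Mn}/u', $s) === 1;
}

// Counts grapheme clusters by matching \X in UTF-8 mode.
function graphemeCount(string $s): int
{
    return (int) preg_match_all('/\X/u', $s);
}

var_dump($decomposed === $precomposed);    // the byte sequences differ
var_dump(hasCombiningMark($decomposed));   // combining mark present
var_dump(graphemeCount($decomposed) === graphemeCount($precomposed));

// With the intl extension, NFC normalization composes the pair into U+00E4.
if (class_exists('Normalizer')) {
    var_dump(Normalizer::normalize($decomposed, Normalizer::FORM_C) === $precomposed);
}
```

Applying hasCombiningMark() to the names returned by scandir() is one possible way to spot such files.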

preg_replace PREG_BAD_UTF8_ERROR

I have an annoying problem with preg_replace and charsets. I'm doing a couple of preg_replace calls in a row, but unfortunately, the first time any special character like äöüß is inserted by preg_replace, I get PREG_BAD_UTF8_ERROR on subsequent calls.
Besides that, the inserted special characters are displayed just fine; they just break any subsequent preg_replace call. Is preg_ UTF-8 only?
The text preg_replace is working on comes from a MySQL database, and the replacement is also crafted in the PHP file with values from MySQL. mb_detect_encoding() says ASCII for the text until the first replacement with special characters; it then detects UTF-8, so the encoding changes, and this might be the problem.
For your information, I'm working with ISO-8859-1 encoding (PHP, MySQL, meta charset). Furthermore, I have a workaround using htmlentities on the replacement string that works for now.
Any ideas on how to solve it?
What you are looking for is probably mb_ereg_replace. It handles multibyte encodings and should work fine with different ones. Be sure to use mb_regex_encoding along with it.
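A minimal sketch of what is probably going on, assuming the patterns use the /u modifier: an ISO-8859-1 byte such as 0xDF is not valid UTF-8, so the call fails. The sample string is made up; two possible ways out are shown, converting to UTF-8 first or using the mb_ereg_* functions with an explicit regex encoding.

```php
<?php
// "Straße" as ISO-8859-1 bytes: 0xDF is ß there, but not valid UTF-8 on its own.
$latin1 = "Stra\xDFe";

// In UTF-8 mode (/u) the invalid subject makes the call fail with null.
$result = preg_replace('/e$/u', 'en', $latin1);
var_dump($result === null);
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR);

// Option 1: convert to UTF-8 first, then use preg_* with /u.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
var_dump(preg_replace('/e$/u', 'en', $utf8) === "Stra\u{00DF}en");

// Option 2: the mb_ereg_* family with an explicit regex encoding.
mb_regex_encoding('UTF-8');
var_dump(mb_ereg_replace("\u{00DF}", 'ss', $utf8) === 'Strasse');
```

Whichever option is used, the subject, the pattern, and the replacement all need to be in the same encoding.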

preg_match multiple and accented characters

The code below
preg_match('~\b(rain|dry|certain|clear)\b~i',$string);
It works like a charm, but when I search for words with accented characters, it doesn't work.
Can somebody help me?
Well, technically a and á and à are all different characters to the interpreter. They are encoded differently, and there is no way to know which different encodings represent a "similar" character (in some languages accented characters are radically different letters). So you would need to include all variants you want to match. However, if you need the actual offset within the string, you might encounter difficulties, because for UTF-8 strings the offset is given in bytes, not characters.
See this SO question for an example how to include all versions of a character.
And this bug report in case you encounter the problem with the wrong offsets.
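A short sketch of both points, using a made-up French sample string: the accepted variants are listed in a character class, the /u modifier makes PCRE treat pattern and subject as UTF-8, and the byte offset returned by PREG_OFFSET_CAPTURE is converted to a character offset with mb_strlen().

```php
<?php
// Sample text "La terre est très sèche" (assumed to be UTF-8).
$text = "La terre est tr\u{00E8}s s\u{00E8}che";

// Accept e, è (U+00E8) or é (U+00E9) in that position.
$pattern = '~\b(pluie|s[e\x{00E8}\x{00E9}]che)\b~iu';

if (preg_match($pattern, $text, $m, PREG_OFFSET_CAPTURE) === 1) {
    [$word, $byteOffset] = $m[1];

    // The reported offset counts bytes; count the characters before it
    // to get a character offset instead.
    $charOffset = mb_strlen(substr($text, 0, $byteOffset), 'UTF-8');

    echo "$word: byte offset $byteOffset, character offset $charOffset\n";
}
```

The two offsets differ here because the è in "très" takes two bytes in UTF-8.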

Replace all special characters from a string using PHP

I am using a jQuery editor with PHP. It works fine for plain text (text without special characters),
but if I try to post text which contains special characters, it does not store these special characters in the DB table.
When I tried to replace a special character with its HTML code, it worked fine.
But it is too difficult to replace all special characters one by one.
Is there any script which replaces all special characters in a string?
Do you mean something like PHP's str_replace()?
http://php.net/manual/en/function.str-replace.php
Is there any script which replace all special characters from a string...?
This is the wrong approach. You need to get your character sets right, so there will be no need to replace anything.
I don't know what you're doing, but if you are transmitting data through Ajax, it is probably UTF-8 encoded. If your database is in a different character set, you may need to convert it.
Basic (deep) reading: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
For more specific information, you will need to provide more details about your situation. Here are a few questions that deal with the subject, maybe one of them already helps:
Special characters in PHP / MySQL
How to store characters like ♥☆ to DB?
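To illustrate why the target character set matters, here is a hedged sketch of a check that tells whether a UTF-8 string (such as Ajax input) can be stored losslessly in another charset. The function name is hypothetical, and the check simply converts there and back with mbstring and compares the result.

```php
<?php
// Hypothetical helper: true if $utf8 survives a round trip through $target.
function fitsCharset(string $utf8, string $target): bool
{
    if (!mb_check_encoding($utf8, 'UTF-8')) {
        return false; // not valid UTF-8 to begin with
    }
    $converted = mb_convert_encoding($utf8, $target, 'UTF-8');

    // Characters missing from $target get substituted or dropped,
    // so the round trip no longer matches the input.
    return mb_convert_encoding($converted, 'UTF-8', $target) === $utf8;
}

var_dump(fitsCharset("caf\u{00E9}", 'ISO-8859-1'));      // é exists in ISO-8859-1
var_dump(fitsCharset("\u{2665}\u{2606}", 'ISO-8859-1')); // ♥☆ do not
```

If the check fails for real data, switching the column and connection to a Unicode character set is likely a better fix than replacing characters.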

Special characters in Flex

I am working on a Flex app that has a MySQL database. Data is retrieved from the DB using PHP then I am using AMFPHP to pass the data on to Flex
The problem that I am having is that the data is being copied from Word documents, which sometimes results in some of the more unusual characters not displaying properly. For example, Word uses different characters for starting and ending double quotes instead of just " (the standard double quotes). Another example is the long dash instead of -.
All of these characters result in one or more accented capital A characters appearing instead. Not only that, each time the document is saved, the characters are replaced again resulting in an ever-increasing number of these accented A's appearing.
Doing a search and replace for each troublesome character to swap it for one of the normal characters seems to work, but obviously this requires compiling a list of all the characters that may appear, and the problem can recur whenever a new character is used for the first time. It also seems like a bit of a brute-force way of getting round the problem rather than a proper solution.
Does anyone know what causes this and have any good workarounds / fixes? I have had similar problems when using utf-8 characters in html documents that aren't set to use utf-8. Is this the same thing and if so, how do I get flex to use utf-8?
Many thanks
Adam
It is the same thing, and smart quotes aren't special as such: you will in fact be failing for every non-ASCII character. As such, a trivial ad hoc replacement for the smart quote characters will be pointless.
At some point, someone is mis-decoding a sequence of bytes as ISO-8859-1 or Windows code page 1252 when it should have been UTF-8. Difficult to say where without detail/code.
What is “the document”? What format is it? Does that format support UTF-8 content? If it does not, you will need to encode output you put into it at the document-creation phase to the encoding the consumer of that document expects, eg. using iconv.
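The effect can be reproduced in a few lines. This sketch simulates the suspected bug (UTF-8 bytes read as Windows code page 1252 and written back as UTF-8) on a single smart quote; the function name is made up, and where the mis-decoding really happens in the Flex/AMFPHP/MySQL chain is not known.

```php
<?php
// Left double quotation mark U+201C, bytes E2 80 9C in UTF-8.
$original = "\u{201C}";

// Simulate the bug: treat UTF-8 bytes as Windows-1252, re-encode as UTF-8.
function misdecode(string $bytes): string
{
    return mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252');
}

$once  = misdecode($original); // one bad save
$twice = misdecode($once);     // a second bad save makes it longer again

echo $once, "\n", $twice, "\n";
echo mb_strlen($original, 'UTF-8'), ' ',
     mb_strlen($once, 'UTF-8'), ' ',
     mb_strlen($twice, 'UTF-8'), "\n";
```

Each pass turns every byte of the previous result into its own character, which matches the ever-growing runs of accented letters described in the question.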
