Characters with accents keep appearing as "�" - php

I'm using a simple PHP script to scrape an RSS feed, store the scraped data in a temporary flat-file cache, then display it along the side of my website. However, all the characters with accents appear as "�". What is causing this and how can I fix it?

You're having a problem with your character encoding. Whatever encoding the feed uses, you either have to use that same encoding to display your data, or convert the data to the encoding you're using on your website. PHP offers iconv() for that purpose, for example.
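A minimal sketch of the iconv() route, assuming the feed is ISO-8859-1 and the website is UTF-8. The helper name feed_text_to_utf8() and the default source encoding are illustrative assumptions; in practice, take the source encoding from the feed's XML declaration or its HTTP Content-Type header. Converting before writing the cache file means the cache and the page end up in one encoding.

```php
<?php
// Hypothetical helper: convert text taken from the feed into UTF-8
// before it is written to the cache file.
// Assumption: the feed is ISO-8859-1; pass whatever the feed actually declares.
function feed_text_to_utf8(string $text, string $from = 'ISO-8859-1'): string
{
    if (strtoupper($from) === 'UTF-8') {
        return $text; // already in the site's encoding, nothing to do
    }
    $converted = iconv($from, 'UTF-8', $text);
    if ($converted === false) {
        throw new RuntimeException("Could not convert feed text from $from");
    }
    return $converted;
}

// "café" as ISO-8859-1 bytes (é is the single byte 0xE9)
// becomes "caf\xC3\xA9" in UTF-8 (é is two bytes there).
$title = feed_text_to_utf8("caf\xE9");
```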
In case the encoding is UTF-8 (or any other multibyte encoding), you also have to make sure you use multibyte-safe functions/methods in your PHP scripts, in case you process the feed in your application.
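For example, if the sidebar truncates long titles, substr() counts bytes and can cut a UTF-8 character in half, which the browser then shows as "�". A sketch using the mbstring functions instead, assuming the mbstring extension is enabled; the helper name shorten_title() and the length limit are arbitrary choices for illustration.

```php
<?php
// Hypothetical sidebar helper: shorten a UTF-8 title to $max characters.
// mb_strlen()/mb_substr() count characters; strlen()/substr() count bytes.
function shorten_title(string $title, int $max = 20): string
{
    if (mb_strlen($title, 'UTF-8') <= $max) {
        return $title;
    }
    return mb_substr($title, 0, $max, 'UTF-8') . '...';
}

$title  = "caf\xC3\xA9";            // "café": 4 characters, 5 bytes
$broken = substr($title, 0, 4);     // "caf" plus half of "é": invalid UTF-8
$safe   = shorten_title($title, 4); // fits in 4 characters, returned unchanged
```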
To deliver your content in UTF-8, for example, you have to send the appropriate content header before any other output.
Example:
header('Content-Type: text/html; charset=utf-8');

Related

How to get degree symbol to show inside my table [duplicate]

I have a program which extracts GPS coordinates from metadata and imports the results into a database. I then display the data using PHP on a webpage.
My problem: I've recently created a new template, but for whatever reason it no longer shows the degree symbol '°', only a '�'.
I just find it strange that it works with one template but not the other.
I've tried changing fonts, but had no luck.
See DEGREE CHARACTER.
Specifically, the HTML entity &deg;.
Check that both templates have the proper DOCTYPE and character encoding declarations, and that they are correct.
You can use:
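utf8_encode('YOUR TEXT');
utf8_encode converts the string data from the ISO-8859-1 encoding to UTF-8. (It is deprecated as of PHP 8.2; iconv('ISO-8859-1', 'UTF-8', $text) or mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1') does the same job.)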
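A short sketch of that ISO-8859-1 to UTF-8 conversion applied to the degree sign, assuming the database hands back Latin-1 bytes. The coordinate string is made up for illustration.

```php
<?php
// Assumption: the coordinate comes back from the database as ISO-8859-1,
// where the degree sign is the single byte 0xB0. The value is made up.
$latin1 = "51\xB0 30' N";

// Two equivalent ways to re-encode it as UTF-8 (what utf8_encode() does).
$viaIconv = iconv('ISO-8859-1', 'UTF-8', $latin1);
$viaMb    = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

// In UTF-8 the degree sign is the two-byte sequence 0xC2 0xB0, which a page
// served as UTF-8 displays as "°" instead of "�".
```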
Save your PHP file with UTF-8 encoding.
Serve your PHP file with charset=UTF-8.
Add a META-tag in your HTML with charset=UTF-8.
This will solve (almost) all of your Unicode character problems.
When you insert data into or pull it from your database, use htmlentities().
You can find a good guide to this function here: http://php.net/manual/en/function.htmlentities.php
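A short sketch of htmlentities() on a coordinate string, assuming the stored value is already UTF-8 (the sample value is made up). Because the degree sign comes out as the entity &deg;, it displays correctly regardless of which charset the template declares.

```php
<?php
// Hypothetical value read from the database, assumed to be UTF-8.
$coord = "51\xC2\xB0 30' N";

// htmlentities() turns the degree sign into &deg;. ENT_QUOTES also escapes
// the apostrophe, and the charset argument must match the input bytes.
$html = htmlentities($coord, ENT_QUOTES, 'UTF-8');
// $html is now: 51&deg; 30&#039; N
```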
For degree Celsius in HTML:
<span>℃</span>

Encoding problems using PHP Gettext

I am trying to start using Gettext for my php project.
However, I have some encoding problems. If I use UTF-8 encoding in the .mo files and use
"bind_textdomain_codeset('messages', 'UTF-8');"
I don't see the accents properly in the browser. In Firefox, to see them correctly, I have to change the browser's text encoding to UTF-8 (it is not the default encoding). As I can't expect my visitors to change their browser encoding, what should I do?
I also tried changing everything to ISO-8859-15 and, although accents work OK (even with the browser's default encoding), the € sign doesn't work. I have also read there are problems with languages like Russian, so it doesn't seem to be the right way.
How should I proceed?
Thank you :)
You should instruct the browser that the page you are sending is encoded in UTF-8. Do this using header before you actually output any content:
header('Content-Type: text/html; charset=utf-8');
Of course this assumes that the page is in UTF-8 in the first place.
In general, the one law that you can never disregard is that all content in your page must be in the same encoding (and that's the encoding you use when declaring the Content-Type).
If all sources for the content (e.g. your hardcoded stuff, what comes from gettext, what comes from a database) are in that encoding, everything is fine. If not then you have to manually convert all content from sources that diverge to the encoding of the page, which is possible through iconv or mb_convert_encoding.
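A minimal sketch of that manual conversion, assuming the page is UTF-8 and that you know the encoding of each diverging source. The helper name to_page_encoding() is hypothetical, and the sample string stands in for text stored as ISO-8859-15, where the euro sign is the single byte 0xA4.

```php
<?php
// Hypothetical helper: bring text from any source into the page encoding
// (UTF-8 by default) before it is echoed. $sourceEncoding must be known.
function to_page_encoding(string $text, string $sourceEncoding, string $pageEncoding = 'UTF-8'): string
{
    if (strcasecmp($sourceEncoding, $pageEncoding) === 0) {
        return $text; // source already matches the page, leave it alone
    }
    return mb_convert_encoding($text, $pageEncoding, $sourceEncoding);
}

// "10 €" stored as ISO-8859-15 becomes "10 \xE2\x82\xAC" in UTF-8.
$price = to_page_encoding("10 \xA4", 'ISO-8859-15');
```

The same helper works for gettext output, hardcoded strings, or database rows: each source is converted once, at the boundary, so everything reaching the page shares the encoding named in the Content-Type header.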

Displaying utf8 on flash?

I am using flash to read contents from a UTF8 page, which has unicode in it.
The problem is that when Flash loads the data, it displays ???????? instead of the Unicode text.
What could be the problem?
By default Flash treats strings as if they are encoded using UTF-8. The reason that you are seeing characters that possibly substitute non-printable characters or invalid / missing glyphs could be that you set System.useCodepage to true - if that's what happened, then why did you do that?
Otherwise, the font that is used to display the characters may be missing glyphs for the characters you need. You can check that by using Font.hasGlyphs("string with the glyphs"); to make sure the text can be displayed. This would normally only apply to embedded fonts.
Yet another possibility is that the source text you are trying to display is not UTF-8 encoded at all. Popular formats such as XML and HTML carry an encoding declaration that need not match the actual payload (for example, the XML declaration <?xml version="1.0" encoding="utf-8"?> can be attached to any XML document regardless of its real encoding). To make sure the text is UTF-8, read it as a ByteArray and check its structure: in UTF-8, every byte with the high bit set belongs to a multibyte sequence, in which a lead byte (110xxxxx, 1110xxxx or 11110xxx) is followed by exactly one, two or three continuation bytes (10xxxxxx). Single-byte encodings with national characters also set the high bit, but their bytes rarely line up into valid sequences like this. If every byte has the high bit clear, the text is plain ASCII, which is valid in both.
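Because the XML in this kind of stack is usually produced by PHP, a structural UTF-8 check can also run on the server before the data reaches Flash. A simplified sketch of the rule: a lead byte of the form 110xxxxx, 1110xxxx or 11110xxx must be followed by one, two or three continuation bytes of the form 10xxxxxx. The function name looks_like_utf8() is hypothetical, and the sketch does not reject overlong forms, surrogate code points or values above U+10FFFF, so treat it as a heuristic; PHP's own mb_check_encoding($bytes, 'UTF-8') or preg_match('//u', $bytes) performs the full check.

```php
<?php
// Simplified structural UTF-8 check: each lead byte announces how many
// 10xxxxxx continuation bytes must follow it.
function looks_like_utf8(string $bytes): bool
{
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {                  // 0xxxxxxx: plain ASCII
            continue;
        } elseif (($b & 0xE0) === 0xC0) { // 110xxxxx: one continuation byte
            $n = 1;
        } elseif (($b & 0xF0) === 0xE0) { // 1110xxxx: two continuation bytes
            $n = 2;
        } elseif (($b & 0xF8) === 0xF0) { // 11110xxx: three continuation bytes
            $n = 3;
        } else {
            return false; // stray continuation byte or invalid lead byte
        }
        for ($j = 1; $j <= $n; $j++) {
            if ($i + $j >= $len || (ord($bytes[$i + $j]) & 0xC0) !== 0x80) {
                return false; // sequence truncated or continuation byte missing
            }
        }
        $i += $n; // skip the continuation bytes just checked
    }
    return true;
}
```

For example, "caf\xC3\xA9" (UTF-8 "café") passes, while "caf\xE9" (the same word in ISO-8859-1) fails because 0xE9 looks like a three-byte lead with nothing after it.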
Flash reads external text as UTF-8 by default, so there should not be a problem if the entire stack uses UTF-8 encoding.
You probably have an implicit decode/encode step somewhere along the way.
This could really be a million things, unfortunately. Start from the ground up and insert traces and/or log messages to see where the conversion fails. Make sure your XML content uses UTF-8, and especially if you're using PHP, make sure that all the PHP source files are saved in UTF-8 encoding: simple text editors often save files in a legacy Windows or Mac codepage (such as Windows-1252 or Mac Roman) instead, which will then break your character encoding. Also, check the HTTP request/response headers for an encoding mismatch.

How to make PHP use the right charset?

I'm making a KSSN (Korean ID Number) checker in PHP using a MySQL database.
I check if it is working by using a file_get_contents call to an external site.
The problem is that the requests (with Hangul/Korean characters in them) are using the wrong charset.
When I echo the string, the Korean characters just get replaced by question marks.
How can I make it use Korean? Should I change anything in the database too?
What should be the charset?
PHP Source and SQL Dump: http://www.multiupload.com/RJ93RASZ31
NOTE: I'm using Apache (HTML), not CLI.
You need to:
tell the browser what encoding you wish to receive in the form submission, by setting Content-Type by header or <meta> as in aviv's answer.
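tell the database what encoding you're sending it bytes in, using mysql_set_charset() (or mysqli_set_charset() / $mysqli->set_charset(), since the old mysql_* extension was removed in PHP 7).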
Currently you are using EUC-KR in the database, so presumably you want to use that encoding in both of the above points. In this century I would suggest instead using UTF-8 throughout for all web apps/databases, as the East Asian multibyte encodings are an anachronistic unpleasantness. (This also has potential security implications: if mysql_real_escape_string doesn't know the correct encoding, a multibyte sequence containing ' or \ can let an SQL injection sneak through.)
However, if enpang.com are using EUC-KR for the encoding of the Name URL parameter you would need either to stick with EUC-KR, or to transcode the name value from UTF-8 to EUC-KR for that purpose using iconv(). (It's not clear to me what encoding enpang.com are using for URL parameters to their name check service; I always get the same results anyway.)
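A minimal sketch of that transcoding step, assuming the remote service expects EUC-KR in its Name parameter. The helper build_name_query() and the URL http://example.com/check are hypothetical placeholders, not enpang.com's real endpoint.

```php
<?php
// Hypothetical: build the query URL for a service that expects EUC-KR
// in its Name parameter. The base URL is a placeholder.
function build_name_query(string $baseUrl, string $nameUtf8): string
{
    $nameEucKr = iconv('UTF-8', 'EUC-KR', $nameUtf8);
    if ($nameEucKr === false) {
        throw new RuntimeException('Name contains characters EUC-KR cannot encode');
    }
    // urlencode() works on raw bytes, so it percent-encodes the EUC-KR bytes.
    return $baseUrl . '?Name=' . urlencode($nameEucKr);
}

$name = "\xED\x95\x9C\xEA\xB5\xAD"; // "한국" in UTF-8 (3 bytes per syllable)
$url  = build_name_query('http://example.com/check', $name);
```

In EUC-KR each of these syllables takes 2 bytes instead of 3, so the server on the other end receives exactly the bytes it expects.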
I don't know the charset, but if you are using HTML to show the results, you should set the charset of the HTML page, e.g. EUC-KR for Korean:
<META http-equiv="Content-Type" content="text/html; charset=EUC-KR">
You can also use iconv (php function) to convert the charset to a different charset
http://php.net/manual/en/book.iconv.php
And last but not least, check your database encoding for the tables.
But I guess that in your case you will only have to change the meta tag.
Basically all charset problems stem from the fact that they're being mixed and/or misinterpreted.
A string (text) is a sequence of bytes in a specific order. The string is encoded using some specific charset, which in itself is neither right nor wrong. The problem arises when you try to read the string, the sequence of bytes, assuming the wrong charset. Bytes encoded using, for example, KS X 1001 just don't make sense when you read them assuming they're UTF-8; that's where the question marks come from.
The site you're getting the text from sends it to you in some specific character set, let's assume KS X 1001. Let's assume your own site uses UTF-8. Embedding a stream of bytes representing KS X 1001 encoded text in the middle of UTF-8 encoded text and telling the browser to interpret the whole site as UTF-8 leads to the KS X 1001 encoded text not making sense to the UTF-8 parser.
UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU
KSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKS
UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU
will be rendered as
Hey, this is UTF-8 encoded text, awesome!
???????I?have?no?idea?what?this?is???????
Hey, this is UTF-8 encoded text, awesome!
To solve this problem, convert the fetched text into UTF-8 (or whatever encoding you're using on your site). Look at the Content-Type header of that other site, it should tell you what encoding the site is in. If it doesn't, take a guess.
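A sketch of reading the charset from the other site's Content-Type header and converting the body to UTF-8. The helper names are hypothetical, and the fallback guess of EUC-KR is an assumption for this Korean case. After file_get_contents(), PHP exposes the raw response header lines in $http_response_header; pick the line starting with "Content-Type:" and pass its value in.

```php
<?php
// Hypothetical helper: read the charset named in a Content-Type header value.
// Assumption: when no charset is given, fall back to a guess (EUC-KR here).
function charset_from_content_type(string $contentType, string $guess = 'EUC-KR'): string
{
    if (preg_match('/charset\s*=\s*"?([A-Za-z0-9._:-]+)"?/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    return $guess;
}

// Convert a fetched body into the encoding of our own site (UTF-8 here).
function fetched_to_utf8(string $body, string $contentType): string
{
    $charset = charset_from_content_type($contentType);
    if ($charset === 'UTF-8') {
        return $body; // already matches the page, nothing to convert
    }
    $converted = iconv($charset, 'UTF-8', $body);
    if ($converted === false) {
        throw new RuntimeException("Body is not valid $charset");
    }
    return $converted;
}
```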

Proper rendering of special characters in Flash, parsed from XML and generated with PHP/MySQL

Probably a problem many of you have encountered before, but I'm having problems with rendering special characters in Flash (AS2 and AS3).
So my question is: what is the proper and foolproof way to display characters like ', ", ë, ä, etc. in a Flash text field? The data is collected from a PHP-generated XML file, with content retrieved from an SQL database.
I believe it has something to do with the UTF-8 encoding of the retrieved database data (which I've tried already), but I have yet to find a solid solution.
Just setting the header to UTF-8 won't work; it's a bit like changing the covers on a book from English to French and expecting the contents to change with it.
What you need to do is make sure your text is UTF-8 from beginning to end and store it as UTF-8 in the database; if you can't do that, make sure you encode your output properly.
If you get all those steps down, it should all work just fine in Flash, assuming you've embedded the proper glyphs (unless you're using a system font).
AS2 has a setting called System.useCodepage. It may seem to solve the problem, but it will likely break things even more for users on different codepages, so avoid it unless you're really sure of what you're doing.
Sometimes having those extra letters in your language actually helps ;)
I think it's enough for you to put this in the XML header:
<?xml version="1.0" encoding="UTF-8"?>
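The declaration only states an encoding; the bytes that follow still have to match it. A sketch of a PHP script building the XML for Flash, assuming the database rows are already UTF-8 (with mysqli, $mysqli->set_charset('utf8mb4') asks the server to send UTF-8). The helper build_items_xml(), the element names and the sample rows are made up for illustration.

```php
<?php
// Hypothetical rows, standing in for the database query result.
$rows = ["Zo\xC3\xAB & \"friends\"", "M\xC3\xA4rz"]; // Zoë & "friends", März

function build_items_xml(array $titles): string
{
    $xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n<items>\n";
    foreach ($titles as $title) {
        // Refuse to emit bytes that contradict the declaration above.
        if (!mb_check_encoding($title, 'UTF-8')) {
            throw new InvalidArgumentException('Row is not valid UTF-8');
        }
        // htmlspecialchars() escapes &, < and >; ENT_QUOTES adds both quote
        // characters, and ENT_XML1 makes ' come out as &apos;.
        $xml .= '  <item>' . htmlspecialchars($title, ENT_XML1 | ENT_QUOTES, 'UTF-8') . "</item>\n";
    }
    return $xml . "</items>\n";
}

// In a real script, send the matching header before echoing:
// header('Content-Type: text/xml; charset=utf-8');
$xmlOut = build_items_xml($rows);
```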
If your special characters are part of the Unicode set (and they should be, otherwise you're basically on your own), you just need to ensure that the font you're using to render the text has all of the necessary glyphs, and that the database output produces proper Unicode text.
Some fonts don't necessarily include all the Unicode glyphs, only a subset of them (usually dropping international glyphs and special characters). Make sure the font has them (test the font in a word processor, for example). Also, if you're using embedded fonts, be sure to embed all the characters you need.
