jpgraph charset problem - php

I use php jpgraph library, but there i a small problem with charsets.
lets assume graph.php generates the image, and i call it from some.php
some.php
...
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
...
<img src="graph.php" />
...
as you see, i set charsets in some.php, but it doesn't show the text of the graph, which is in foreign language.(maybe i must set it in graph.php? but how?)
What is the problem?
Thanks
UPDATE
even when i try to enter numerical HTML encoding
of the Unicode character from here, it doesn't work for Armenia language:/

The meta tag only applies to the content of the page, not to images displayed within the page (even images generated dynamically by your graph.php script)
Quoting from the jpgraph faq
16 How can I print unicode characters?
Use &#XXXX; format in your strings
where XXXX is the decimal value for
the unicode character. You may find a
list of Unicode characters and there
encodings at www.unicode.org Please
observe that the encoding in the lists
are given in hexadecimal and these
values must be converted to decimal.
Note: If You are working in an UTF-8
environment then the characters may be
input directly.

Related

strange words appear when extract arabic text from pdf (PdfToText)

I have a problem when extract arabic text from pdf.
I use PdfToText library
The text appears in this figure (΋ΎϬϧϟ΍υϔΣϟ΍ΦϳέΎΗ ΏϟΎρϟ΍ϡϳΩϘΗΝΫϭϣϧ ΩϳϘϟ΍ϡϗέ)
How can i solve it ? i tried
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
but this did not solve my problem
English letters are part of basic ASCII char set so the output is usually without any problems however any other languages using various accents or even different letters, ie. Arabic, Azbuka, Greek, etc. uses letters out of the basic set.
Make sure all three sources are using same encoding:
all the PHP scripts generating the output
the HTML encoding meta tag
the output file as well
ad 1
Check your editor how it saves the PHP scripts to the file system. The way how to set it up differs from each editor
ad 2
Use HTML meta tag <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
ad 3
define the encoding to use UTF-8 for example: pdftotext -enc UTF-8 your.pdf. According to the documentation the PdfToText class generates UTF8-encoded text.

Kohana 3 - In View ® shows invalid character

We are in the process of converting to UTF-8 throughout our site. For the most part we've not had any problems, but presently the ® symbol shows up in the views as an invalid character, but when if the value is output in the Controller then it is displays correctly.
We have made certain to include the meta attribute in the main view:
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
Is there a setting that we are missing?
Your files should be UTF-8 encoded (UTF-8 without BOM, ANSI as UTF-8).
Note: Kohanas' HTML::chars() uses Kohana::$charset to decide which charset to use when encoding, so use it instead of htmlspecialchars().

PHP SimpleXML Values returned have weird characters in place of hyphens and apostrophes

I have looked around and can't seem to find a solution so here it is.
I have the following code:
$file = "adhddrugs.xml";
$xmlstr = simplexml_load_file($file);
echo $xmlstr->report_description;
This is the simple version, but even trying this any hyphens r apostrophes are turned into: ^a (euro sign) trademark sign.
Things I have tried are:
echo = (string)$xmlstr->report_description; /* did not work */
echo = addslashes($xmlstr->report_description); /* yes I know this doesnt work with hyphens, was mainly trying to see if I could escape the apostrophes */
echo = addslashes((string)$xmlstr->report_description); /* did not work */
also htmlspecial(again i know does not work with hyphens), htmlentities, and a few other tricks.
Now the situation is I am getting the XML files from a feed so I cannot change them, but they are pretty standard. The text with the hyphens etc are encapsulated in a cdata tag and encoding is UTF-8. If I check the source I am shown the hyphens and apostrophes in the source.
Now just to see if the encoding was off or mislabeled or something else weird, I tried to view the raw XML file and sure enough it is displayed correctly.
I am sure that in my rush to find the answer I have overlooked something simple and the fact that this is really the first time I have ever used SimpleXML I am missing a very simple solution. Just don't dock me for it I really did try and find the answer on my own.
Thanks again.
This is the simple version, but even
trying this any hyphens apostrophes
are turned into: ^a (euro sign)
trademark sign.
This is caused by incorrect charset guessing (and possibly recoding).
If a text contains a "curly apostrophe" = "Right single quotation mark" = U+2019 character, saving it in UTF-8 encoding results in bytes 0xE2 0x80 0x99. If the same file is then read again assuming its charset is windows-1252, the byte stream of the apostrophe character (0xE2 0x80 0x99) is interpreted as characters ’ (=small "a" with circumflex, euro sign, trademark sign). Again if this incorrectly interpreted text is saved as UTF-8 the original character results in byte stream 0xC3 0xA2 0xE2 0x82 0xAC 0xE2 0x84 0xA2
Summary: Your original data is UTF-8 and some part of your code that reads the data assumes it is windows-1252 (or ISO-8859-1, which is usually actually treated as windows-1252). A probable reason for this charset assumption is that default charset for HTTP is ISO-8859-1. 'When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP.' Source: RFC 2616, Hypertext Transfer Protocol -- HTTP/1.1
PS. this is a very common problem. Just do a Google or Bing search with query doesn’t -doesn't and you'll see many pages with this same encoding error.
Do you know the document's character set?
You could do header('Content-Type: text/html; charset=utf-8'); before any content is printed, if you havent already.
Make sure you have set up SimpleXML to use UTF-8 too.
Be sure that all the entities are encoded using hex notation, not HTML entities.
Also maybe:
$string = html_entity_decode($string, ENT_QUOTES, "utf-8");
will help.
This is a symptom of declaring an incorrect character set in the <head> section of your page (or not declaring and using default character set without accents and special characters).
This does the trick for latin languages.
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
For TOTAL NEWBIES, html pages for browsers have a basic layout, with a HEAD or HEADER which serves to tell the browser some basic stuff about the page, as well as preload some scripts that the page will use to achieve its functionality(ies).
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
Hello world
</body>
</html>
if the <head> section is omitted, html will use defaults (take some things for granted - like using the northamerican character set, which does NOT include many accented letters, whch show up as "weird characters".

Browser displays � instead of ´

I have a PHP file which has the following text:
<div class="small_italic">This is what you´ll use</div>
On one server, it appears as:
This is what you´ll use
And on another, as:
This is what you�ll use
Why would there be a difference and what can I do to make it appear properly (as an apostrophe)?
Note to all (for future reference)
I implemented Gordon's / Gumbo's suggestion, except I implemented it on a server level rather than the application level. Note that (a) I had to restart the Apache server and more importantly, (b) I had to replace the existing "bad data" with the corrected data in the right encoding.
/etc/php.ini
default_charset = "iso-8859-1"
You have to make sure the content is served with the proper character set:
Either send the content with a header that includes
<?php header("Content-Type: text/html; charset=[your charset]"); ?>
or - if the HTTP charset headers don't exist - insert a <META> element into the <head>:
<meta http-equiv="Content-Type" content="text/html; charset=[your charset]" />
Like the attribute name suggests, http-equiv is the equivalent of an HTTP response header and user agents should use them in case the corresponding HTTP headers are not set.
Like Hannes already suggested in the comments to the question, you can look at the headers returned by your webserver to see which encoding it serves. There is likely a discrepancy between the two servers. So change the [your charset] part above to that of the "working" server.
For a more elaborate explanation about the why, see Gumbo's answer.
The display of the REPLACEMENT CHARACTER � (U+FFFD) most likely means that you’re specifying your output to be Unicode but your data isn’t.
In this case, if the ACUTE ACCENT ´ is for example encoded using ISO 8859-1, it’s encoded with the byte sequence 0xB4 as that’s the code point of that character in ISO 8859-1. But that byte sequence is illegal in a Unicode encoding like UTF-8. In that case the replacement character U+FFFD is shown.
So to fix this, make sure that you’re specifying the character encoding properly according to your actual one (or vice versa).
To sum it maybe up a little bit:
Make sure the FILE saved on the web server has the right encoding
Make sure the web server also delivers it with the right encoding
Make sure the HTML meta tags is set to the right encoding
Make sure to use "standard" special chars, i.e. use the ' instead of ´of you want to write something like "Luke Skywalker's code"
For encoding, UTF-8 might be good for you.
If this answer helps, please mark as correct or vote for it. THX
The simple solution is to use ASCII code for special characters.
The value of the apostrophe character in ASCII is ’. Try putting this value in your HTML, and it should work properly for you.
Set your browser's character set to a defined value:
For example,
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
This is probably being caused by the data you're inserting into the page with PHP being in a different character encoding from the page itself (the most common iteration is one being Latin 1 and the other UTF-8).
Check the encoding being used for the page, and for your database. Chances are there will be a mismatch.
Create an .htaccess file in the root directory:
AddDefaultCharset utf-8
AddCharset utf-8 *
<IfModule mod_charset.c>
CharsetSourceEnc utf-8
CharsetDefault utf-8
</IfModule>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

Unicode and PHP - am I doing something wrong?

I'm using Kohana 3, which has full support for Unicode.
I have this as the first child of my <head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
The Unicode character I am inserting into is é as in Café.
However, I am getting the triangle with a ? (as in could not decode character).
As far as I can tell in my own code, I am not doing any string manipulation on the text.
In fact, I have placed the accent straight into a view's PHP file and it is still not working.
I copied the character from this page: http://www.fileformat.info/info/unicode/char/00e9/index.htm
I've only just started examining PHP's Unicode limitations, so I could be doing something horribly wrong.
So, how do I display this character? Do I need to resort to the HTML entity?
Update
So this works
Caf<?php echo html_entity_decode('é', ENT_NOQUOTES, 'UTF-8'); ?>
Why does that work? If I copy the output accented e from that script and insert it into my document, it doesn't work.
View the http headers. You should see something like
Content-Type: text/html; charset=UTF-8
Browsers don't pay much attention to meta tags, if there was a real http header stating a different encoding.
update
Whatcha get from this?
echo bin2hex('é');
echo chr(0xc3) . chr(0xa9);
You should get c3a9é, otherwise I'd say file encoding issue.
I guess, you see �, the replacement character for invalid UTF-8 byte sequences. Your text is not UTF-8 encoded. Check your editor’s settings to control the encoding of the PHP file.
If you’re not sure about the encoding of your sources, you can enforce UTF-8 compatibilty as described here (German text): Force UTF-8.
You should never need entities except the basic ones.

Categories