I have a problem extracting Arabic text from a PDF.
I use the PdfToText library.
The text comes out like this: (ΎϬϧϟυϔΣϟΦϳέΎΗ ΏϟΎρϟϡϳΩϘΗΝΫϭϣϧ ΩϳϘϟϡϗέ)
How can I solve it? I tried
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
but this did not solve my problem
English letters are part of the basic ASCII character set, so they usually come out without any problems. Other languages that use accents or entirely different alphabets (Arabic, Cyrillic, Greek, etc.) use letters outside that basic set.
Make sure all three sources use the same encoding:
all the PHP scripts generating the output
the HTML encoding meta tag
the output file as well
ad 1
Check how your editor saves the PHP scripts to the file system. How to set this up differs from editor to editor.
ad 2
Use HTML meta tag <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
ad 3
Set the output encoding to UTF-8, for example: pdftotext -enc UTF-8 your.pdf. According to the documentation, the PdfToText class generates UTF-8-encoded text.
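The three checks above can be sketched in PHP roughly as follows. This is only a minimal sketch: `is_valid_utf8` and `utf8_page` are hypothetical helper names (not part of PdfToText), and it assumes the mbstring extension is loaded. Note that this only catches byte-level mismatches; text that is valid UTF-8 but mapped to the wrong characters would still pass.

```php
<?php
// Hypothetical helper (not part of PdfToText): reports whether a string
// is well-formed UTF-8, so a mismatch is caught before the page is sent.
function is_valid_utf8(string $text): bool
{
    return mb_check_encoding($text, 'UTF-8');
}

// Hypothetical helper: wraps text in a minimal page whose meta tag
// declares the same encoding the text is expected to be in.
function utf8_page(string $text): string
{
    if (!is_valid_utf8($text)) {
        throw new InvalidArgumentException('Text is not valid UTF-8');
    }
    return '<!DOCTYPE html><html><head><meta charset="UTF-8"></head><body>'
        . htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8')
        . '</body></html>';
}
```

If `utf8_page` throws for the extracted text, the extraction step is probably not producing UTF-8, and the meta tag alone will not help.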
Related
I want to use PHP to read a plain text file containing mixed ASCII and Japanese characters encoded in UTF-8. I've tried:
$input = file_get_contents('filename');
However, this converts all Japanese characters to "?", when I print them on a page with this header:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
I've tried creating the file with and without a UTF-8 signature, but that made no difference. I'm using PHP 5.5.1 on Windows 7 with IIS.
Can anybody help?
Thanks.
fm API to get event description, venue name etc...
Now sometimes I get special chars back like: ' é à , but they show up scrambled.
So how can I display them properly? Also, with the description I get HTML tags back, but I do want to keep these.
Can someone help me out with both cases? The language I'm using is PHP.
Thanks in advance
specify encoding in the header:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
...
encode string when handling the input
$str=utf8_encode($str);
if you are displaying the input back as-is, no encoding is required;
however, if the value is the content of an input or textarea, escape the html characters
<?php echo htmlspecialchars($str); ?>
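For the input/textarea case, it can help to pass the encoding to `htmlspecialchars` explicitly so it matches the page's declared charset. A minimal sketch, assuming the page is served as UTF-8; `escape_for_form` is a hypothetical helper name:

```php
<?php
// Hypothetical helper: escapes a UTF-8 string for use inside an
// <input value="..."> attribute or a <textarea> body.
function escape_for_form(string $value): string
{
    // ENT_QUOTES also escapes single quotes; ENT_SUBSTITUTE replaces
    // invalid byte sequences instead of returning an empty string.
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}
```

Accented characters such as é pass through unchanged; only the HTML-significant characters are escaped. For the description field, where the tags should be kept, the string would be printed without this step.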
For Latin characters, use
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
in your <head> section.
You need to be sure about two things: the meta header stating which encoding you will be using, and the encoding actually used for the text served.
If you are using a UTF-8 header, make sure to convert the text served to UTF-8; see encoding-conversion functions such as mb_convert_encoding.
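A minimal sketch of that conversion, assuming the incoming text is ISO-8859-1 (an assumption; the real source encoding has to be known or checked) and that the mbstring extension is available. `latin1_to_utf8` is a hypothetical helper name. Unlike `utf8_encode`, which only handles ISO-8859-1 input and is deprecated as of PHP 8.2, `mb_convert_encoding` accepts other source encodings as well.

```php
<?php
// Hypothetical helper: converts text that is known to be ISO-8859-1
// into UTF-8 so it matches a UTF-8 Content-Type declaration.
function latin1_to_utf8(string $text): string
{
    // Arguments are: input string, target encoding, source encoding.
    return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
}
```

Converting text that is already UTF-8 a second time produces the familiar doubled characters, so the conversion should only run on input whose encoding is actually ISO-8859-1.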
OK so I have a PHP file with several strings of text in various languages. For most languages like French or Spanish I just simply type in the characters.
The problem I have is with Russian language characters. The PHP file is encoded in UTF-8, how can I make sure that the Russian characters are both saved correctly and displayed correctly on the output web page... Is it just a case of pasting the text into the PHP file, or is there a way to guarantee the characters will be saved into the file correctly - perhaps converting it into HTML-like notation for example?
Obviously I am assuming the end user will have the correct encoding set in their web browser, I just want to make sure I got it all covered from my end.
I am using Notepad++ on Windows to edit my PHP file.
Thanks!
If you want to tell browsers your encoding, place it inside your <head> tag:
<meta http-equiv='Content-Type' content='text/html; charset=utf-8'>
Or short version
<meta charset='utf-8'>
That should be enough for Russian characters to display correctly on a webpage.
If your doctype is HTML, declare <meta http-equiv='Content-Type' content='text/html; charset=UTF-8'>, but if your doctype is XHTML, declare <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />.
Never assume that the end user will act correctly when you design.
If you already have a document, edit its meta tag for the charset declaration, use Notepad++ Encoding > Convert to UTF-8 without BOM, save the document, and carry on with your multilingual structure from there.
The php tag is irrelevant to your question, since you don't mention any database charset setting.
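On the "UTF-8 without BOM" point: a BOM at the start of a PHP file is sent as output before any code runs. A minimal sketch for checking a file's contents for it; `has_utf8_bom` and `strip_utf8_bom` are hypothetical helper names:

```php
<?php
// The UTF-8 byte order mark is the three bytes EF BB BF.
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: true if the string starts with a UTF-8 BOM.
function has_utf8_bom(string $contents): bool
{
    return strncmp($contents, UTF8_BOM, 3) === 0;
}

// Hypothetical helper: returns the string without a leading UTF-8 BOM.
function strip_utf8_bom(string $contents): string
{
    return has_utf8_bom($contents) ? substr($contents, 3) : $contents;
}
```

It could be applied to the result of `file_get_contents` on a script or text file to confirm what the editor actually saved.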
There is no difference between Latin and Cyrillic characters in UTF-8. Both are just byte sequences. Configure your server or PHP script to send Content-Type: text/html;charset=utf-8, and you are fairly safe.
Your editor might have problems when the font you are using does not contain Russian characters. Choose another font then.
And please ignore the <meta> element recommendations. You don't need that: it is useless when your HTTP headers are correct, and maybe harmful if they aren’t.
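Sending that header from a PHP script might look like the sketch below. `content_type_header` and `send_content_type` are hypothetical helper names; `header()` and `headers_sent()` are the standard PHP functions.

```php
<?php
// Hypothetical helper: builds the Content-Type header line.
function content_type_header(string $charset = 'utf-8'): string
{
    return 'Content-Type: text/html; charset=' . $charset;
}

// Hypothetical helper: sends the header if that is still possible.
// Headers must go out before any output, so this returns false when
// something (even a stray BOM or blank line) has already been sent.
function send_content_type(string $charset = 'utf-8'): bool
{
    if (headers_sent()) {
        return false;
    }
    header(content_type_header($charset));
    return true;
}
```

Calling `send_content_type()` at the very top of the script, before any HTML, is the usual place for it.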
Well, you have to check two things.
To ensure that the *.php file is UTF-8, I use PSPad. If the file is not in UTF-8, I save it like this: http://stepolabs.com/upload/utf-8.png
Then your website must declare UTF-8 encoding in a <meta> tag:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
... more about metatagging.
Finally, if both are done correctly (file format and meta declaration), everything should be displayed properly!
I am working in PHP. In my index.php page I have included right.php, which contains Greek text. index.php has the HTML headers. The Greek text is not showing correctly. When I open right.php in Dreamweaver and save the page, it gives a warning about the text. What can I do to solve this? right.php has common content that is used in many pages.
This is all to do with the content type of your pages. Most likely you are trying to save / display the text in latin1 format which doesn't support the characters you are trying to display.
The most sensible thing to do is convert everything to UTF-8. If you're manually editing the text, ensure your text editor (e.g. Dreamweaver) is set to save the files as UTF-8, and then ensure you have the following meta tag on your page:
<meta http-equiv="content-type" content="text/html;charset=utf-8" />
Make sure you are saving your files with UTF-8 encoding (check the preferences in Dreamweaver to find the file encoding). Also make sure your HTML <head> section includes a charset declaration similar to this: <META http-equiv="Content-Type" content="text/html; charset=UTF-8">
You can use a different character set if you prefer, but UTF-8 supports the entire Unicode character space, so it's pretty safe.
You have to set file encoding to utf-8 and set it also in <meta> charset tag in <head> HTML.
I use the PHP JpGraph library, but there is a small problem with charsets.
Let's assume graph.php generates the image, and I call it from some.php
some.php
...
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
...
<img src="graph.php" />
...
As you see, I set the charset in some.php, but the graph's text, which is in a foreign language, doesn't show. (Maybe I must set it in graph.php? But how?)
What is the problem?
Thanks
UPDATE
Even when I try to enter the numerical HTML encoding
of the Unicode character from here, it doesn't work for the Armenian language :/
The meta tag only applies to the content of the page, not to images displayed within the page (even images generated dynamically by your graph.php script)
Quoting from the jpgraph faq
16 How can I print unicode characters?
Use &#XXXX; format in your strings
where XXXX is the decimal value for
the unicode character. You may find a
list of Unicode characters and there
encodings at www.unicode.org Please
observe that the encoding in the lists
are given in hexadecimal and these
values must be converted to decimal.
Note: If You are working in an UTF-8
environment then the characters may be
input directly.
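The &#XXXX; format described in the FAQ can be generated from a UTF-8 string rather than typed by hand. A minimal sketch, assuming PHP 7.4+ with mbstring; `to_decimal_entities` is a hypothetical helper name and is not part of JpGraph:

```php
<?php
// Hypothetical helper: replaces every non-ASCII character in a UTF-8
// string with its decimal numeric entity, e.g. U+0540 becomes &#1344;.
function to_decimal_entities(string $utf8): string
{
    $out = '';
    foreach (mb_str_split($utf8, 1, 'UTF-8') as $char) {
        // mb_ord returns the code point as a decimal integer.
        $code = mb_ord($char, 'UTF-8');
        $out .= $code < 128 ? $char : '&#' . $code . ';';
    }
    return $out;
}
```

The result could then be passed as a title or label string in graph.php. Whether the glyphs actually render still depends on the font used for the graph containing them.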