Multilingual support with Unicode characters. A little confusion - PHP

I am creating a web application framework, in which I am providing support for multilingual content.
By that I mean content such as a paragraph that has two sentences in English and two more in Hindi (an Indian language). I have several questions about this.
1) A user or admin will add the content to the website. They will be presented with a textarea where they can paste their content. When they submit the post, I will save the content in a database. I also want to provide a web-based typewriter interface where they can type content in a given language, copy it from there, and paste it into my main textarea.
Questions:
1a) Will I need to do something to the textarea so that it accepts Unicode characters?
1b) Where can I find a typewriter interface for a language of my choice? Does TinyMCE support that?
1c) I should set the database encoding to UTF-8, right?
2) Then I need to fetch the content from the database and show it on a web page. This content is UTF-8 encoded, since it can contain many languages. What do I need to do? I am guessing that just setting the page encoding to UTF-8 will do. What will happen if a font required by a language is not installed on the client's PC?
I am using the PhpEd editor. Must my PHP files be saved as UTF-8, or is it enough to specify UTF-8 in the HTML encoding tag?
I am a bit stumped. Please help.

1a) Yes, the textarea will accept text in any language, as long as the web page that contains it is encoded in UTF-8. If it doesn't work, double-check both the HTTP Content-Type header and the HTML META http-equiv tag for Content-Type. If both are present, they must agree; either one alone is sufficient.
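As a minimal sketch of 1a, assuming the page is generated by PHP: the HTTP header and the meta tag both declare UTF-8, and the form adds accept-charset="UTF-8" so the browser submits the textarea contents as UTF-8. The function names sendUtf8Header() and renderEditor() and the save.php target are hypothetical.

```php
<?php
// Sends the Content-Type header; it must run before any output is produced.
function sendUtf8Header(): void
{
    if (!headers_sent()) {
        header('Content-Type: text/html; charset=UTF-8');
    }
}

// Builds an editor page whose <meta> tag agrees with the header above.
function renderEditor(string $content): string
{
    // Escape the text for HTML, stating the charset explicitly.
    $safe = htmlspecialchars($content, ENT_QUOTES, 'UTF-8');

    // accept-charset asks the browser to submit the textarea as UTF-8.
    return <<<HTML
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Editor</title>
</head>
<body>
<form method="post" action="save.php" accept-charset="UTF-8">
<textarea name="content" rows="10" cols="60">{$safe}</textarea>
<button type="submit">Save</button>
</form>
</body>
</html>
HTML;
}

// Usage:
// sendUtf8Header();
// echo renderEditor('');
```

Nothing has to be done to the textarea element itself; the charset declarations on the page and the form are what matter.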
1c) What to do with your database depends on the specific DBMS you use. If it is supported, make sure that
1. the table encoding
2. the connection/the client encoding
are both set to UTF-8.
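For 1c, a sketch assuming MySQL accessed through PDO (the answer itself is DBMS-agnostic). The table's character set is declared in CREATE TABLE, and the connection character set comes from the charset parameter of the PDO DSN. It uses utf8mb4 because MySQL's older utf8 charset stores at most 3 bytes per character. The table name posts and the helper mysqlUtf8Dsn() are illustrative only.

```php
<?php
// 1. Table encoding: declared when the table is created (hypothetical "posts" table).
$createPostsTable = <<<SQL
CREATE TABLE posts (
    id INT AUTO_INCREMENT PRIMARY KEY,
    body TEXT NOT NULL
) DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci
SQL;

// 2. Connection encoding: PDO's MySQL driver reads "charset" from the DSN.
function mysqlUtf8Dsn(string $host, string $dbName): string
{
    return "mysql:host={$host};dbname={$dbName};charset=utf8mb4";
}

// Usage (not run here, since it needs a MySQL server and real credentials):
// $pdo = new PDO(mysqlUtf8Dsn('localhost', 'app'), $user, $password);
// $pdo->exec($createPostsTable);
// With mysqli instead of PDO: $mysqli->set_charset('utf8mb4');
```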
2) Again, set the page encoding to UTF-8 (see 1a). If the client system lacks suitable fonts, you lose, but in that case the end user probably couldn't have read the text anyway (most users do have fonts for text in their native languages).
The encoding of the PHP files is only relevant if they contain non-ASCII text (which you should avoid).
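One way to follow that advice, as a sketch assuming PHP 7.0 or later: write non-ASCII text in the source with \u{...} escapes, so the file stays pure ASCII and the encoding it is saved in no longer matters. The escapes below spell the Hindi word नमस्ते.

```php
<?php
// Hindi "namaste" written as six code-point escapes; the file itself stays ASCII.
$namaste = "\u{0928}\u{092E}\u{0938}\u{094D}\u{0924}\u{0947}";

// At runtime the string holds ordinary UTF-8 bytes: 3 per Devanagari code point.
echo strlen($namaste), "\n";     // 18
echo bin2hex("\u{0928}"), "\n";  // e0a4a8 (the UTF-8 bytes of U+0928)
```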

Related

Build a website in Arabic language

I saw a website like this one: http://www.a3malcom.com/index.php. I want to build the same kind of website in Arabic. Do the entries in the database tables also need to be in Arabic, or in English?
What if I need the website in two languages, i.e. English and Arabic? In what language should the data be entered in the DB?
Take a look at this comprehensive article (thanks to Deceze for a great article):
Handling Unicode Front To Back In A Web App
It also has an Arabic example, along with other languages.
Yes, you should insert the data in Arabic into the DB table. That way you can read it directly in the web page, with no need to convert it.
And use UTF-8 encoding when displaying the page.
Usually Unicode (UTF-8) is all that is needed.
Your source file should be UTF-8 encoded if you want to write Arabic in the PHP source. Note that some text editors don't support Arabic properly.
For the database, just create your database with UTF-8 encoding.
For HTML output, add this to your HEAD section:
<meta http-equiv="Content-type" content="text/html;charset=UTF-8" />
Alternatively, you can send it as an HTTP header... but why bother.
Anyway, I have encountered a problem generating Arabic sentences as images in GD using a custom font. It turns out GD doesn't render Arabic properly (at least in my case).
That was solved using the following library:
http://ar-php.org/
(yes, the website is pretty ugly, but the library works, is well packaged, and contains documentation...)
All I had to do to fix the problem, in my case, was:
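require_once 'I18N/Arabic.php'; // Ar-PHP class file; the path depends on where the library is installed
$Arabic = new I18N_Arabic('Glyphs');
// Reshape the Arabic letters into joined glyph forms in visual order, which GD can draw
$text = $Arabic->utf8Glyphs($_GET['txt'] ?? '');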
And then feed $text to GD.
I'm still encountering a minor problem with text direction, though, and am looking for a solution. But at least I have valid Arabic now.
In most cases, though, you won't need that library; you just need to make sure that you're using UTF-8 throughout your development process.
Hope it helps.

utf-8 encoding in HTML and utf8_unicode_ci char set & collation for MySQL - can I store & display any type of text in it now?

I want to develop a small web app that would ideally be used worldwide. For the sake of the discussion, let's say it's a recipe-sharing site; it's a good enough metaphor.
My app will allow users to enter or upload text in their native languages. My HTML header says that the site uses UTF-8 encoding. I am now creating my MySQL DB, and I suppose that I should select utf8 as the character set and utf8_unicode_ci as the collation.
Is that correct?
Is that all I need to do to be able to receive, store, and display safe user-generated content in the users' chosen languages? If not, what am I missing?
(I am aware of the safety concerns associated with displaying UGC; that is not what this question is about. Here I am solely looking for advice on dealing with safe content.)
That is all you need for the HTML and DB part, but you must also ensure that your programming language is UTF-8 aware so it doesn't garble your text. If you use PHP, make sure that the functions you use are UTF-8 aware: the classic string functions such as strlen() and substr() work on bytes, while their mb_* counterparts work on characters. If a function isn't UTF-8 aware, the manual usually mentions it.
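To illustrate the difference between byte-based functions and their UTF-8-aware counterparts, a small sketch assuming the mbstring extension is installed (it provides the mb_* functions):

```php
<?php
$text = "日本";  // "Japan": 2 characters, 3 bytes each in UTF-8

var_dump(strlen($text));                        // int(6): counts bytes
var_dump(mb_strlen($text, 'UTF-8'));            // int(2): counts characters

// substr() cuts at a byte offset and can split a character in half...
$broken = substr($text, 0, 4);
var_dump(mb_check_encoding($broken, 'UTF-8'));  // bool(false)

// ...while mb_substr() cuts at character boundaries.
var_dump(mb_substr($text, 0, 1, 'UTF-8'));      // string(3) "日"
```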
As far as the HTML and the DB are concerned, I think this is all you need.
The only other place where you may need to declare that your input is UTF-8 encoded is where you send/receive your data (for example, with a form and a POST request).
You can check post #1281123 in this forum; it helped a lot when I had encoding problems in a similar situation.
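For the receiving side, a sketch under the assumption that the form already declares accept-charset="UTF-8": the server should still check that submitted values really are valid UTF-8 before storing them. readUtf8Field() is a hypothetical helper built on mb_check_encoding() (mbstring assumed).

```php
<?php
// Returns the named field if it is a valid UTF-8 string, otherwise null.
function readUtf8Field(array $source, string $name): ?string
{
    $value = $source[$name] ?? null;
    if (!is_string($value) || !mb_check_encoding($value, 'UTF-8')) {
        return null; // missing, not a string, or not valid UTF-8: reject it
    }
    return $value;
}

// Usage in a POST handler:
// $body = readUtf8Field($_POST, 'content');
// if ($body === null) { http_response_code(400); exit; }
```

Rejecting invalid input at the boundary keeps malformed bytes out of the database, where they would later surface as garbled characters.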

Arabic Language Encoding Problem in PHP and ODBC(Sybase)

I am a PHP developer. I recently developed a website using PHP, and I connect to a Sybase database using ODBC.
My database connection is OK and I can display the data in my web pages, but Arabic data is displayed as squares and unreadable symbols.
I cannot change the database encoding, and it is correct, as some other Arabic data is displayed well on another web page.
I tried the same configuration used in that web page, but it does not work either.
I read many related problems, tried some of the solutions, and read about UTF-8, ISO, Windows and Unicode encodings. I also tried changing the HTML meta tag to display the Arabic words, but the problem is not solved.
I think the encoding of my PHP file itself may be the problem.
Can I change the encoding of a PHP file? How do I do that, if it is possible?
Is there any way to display Arabic coming from the database correctly in PHP web pages? It is a tiresome problem :(
I will appreciate any hint or suggestion to solve this problem, but please mention your references and give an example if one is available.
Thanks in advance...
Just changing the Meta tag isn't enough.
Assuming you want to change your environment to UTF-8, you need to make sure that the following is UTF-8 encoded:
The database tables
The database connection
The page's encoding (in the Meta tag, or preferably the Content-Type header)
The PHP file's encoding is irrelevant, unless it contains non-ASCII content itself. In that case, you need to adjust its encoding as well, usually in your IDE's "Save as" dialog.
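If the database side really cannot be changed and the ODBC driver hands PHP text in a legacy single-byte Arabic encoding, one workaround is to convert it to UTF-8 in PHP before output. This sketch assumes ISO-8859-6 (mbstring required); Windows-1256 is also common for Arabic data, so check which encoding the database actually uses, since converting from the wrong one yields wrong letters. toUtf8() is a hypothetical helper name.

```php
<?php
// Converts text from an assumed legacy encoding to UTF-8 (mbstring required).
function toUtf8(string $legacy, string $fromEncoding = 'ISO-8859-6'): string
{
    return mb_convert_encoding($legacy, 'UTF-8', $fromEncoding);
}

// In ISO-8859-6, bytes 0xC7 and 0xC8 are ALEF (U+0627) and BEH (U+0628).
$fromDb = "\xC7\xC8";               // made-up sample, not real query output
$utf8 = toUtf8($fromDb);

echo bin2hex($utf8), "\n";          // d8a7d8a8
// Only UTF-8 text should reach a page served as text/html; charset=UTF-8.
```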

Getting funny squares in browser when displaying content

I have content stored in a Postgres DB. Every time I fetch the content and display it using PHP, I get funny squares in IE and funny square-type question marks in Firefox.
Example below:
* - March � May 2009
How do I remove this?
I do not have access to the server, so I can't adjust the encoding there; I only have the Postgres DB details and FTP access to upload my files.
I would also recommend The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky. I read it only recently myself; it will definitely help you sort out your problems.
You need to make sure that Postgres, PHP, and your browser all agree on the content encoding, and that you have an appropriate font selected in your browser. The simplest way to do that is to choose UTF-8 for everything.
I don't know about PHP, but I do know about databases and browsers. First you need to find out whether the database is UTF-8. (From psql, I would run "\l" and look at the encoding.) Then you need to find out whether PHP supports UTF-8 (I have no idea how you do that). Then you need to see how those characters are being stored in the database by the PHP app. Then you need to figure out whether the web server is correctly reporting the content encoding. (On Linux/Unix, I'd use the program "HEAD" (not "head") to see the headers it's returning.) And then you need to figure out whether your browser is using a font that has glyphs for those characters.
Or, you could just make sure you only store ASCII and forget the rest of the world exists. Not recommended.
Wrong charset somewhere. The characters could already be stored incorrectly in the database, or you could have the wrong charset in the page's meta tags (try changing the charset manually in the browser), or the page could be using the wrong encoding when it communicates with the database.
Check this page for more information: http://www.postgresql.org/docs/8.2/static/multibyte.html
Try to use the same encoding everywhere, preferably UTF-8.
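One common cause of the "March � May" symptom, offered as a guess rather than a diagnosis of this particular data: the text was stored as Windows-1252, where byte 0x96 is an en dash, but the page is served as UTF-8. A lone 0x96 byte is not valid UTF-8, so the browser shows the replacement character instead. The sketch below (mbstring assumed) shows the check and a one-time conversion; the string is a made-up sample.

```php
<?php
$stored = "March \x96 May 2009";   // made-up Windows-1252 bytes; 0x96 is an en dash there

// A lone 0x96 byte is not valid UTF-8, hence the replacement character in the browser.
var_dump(mb_check_encoding($stored, 'UTF-8'));   // bool(false)

// Convert once, at the point where the data enters the UTF-8 pipeline.
$fixed = mb_convert_encoding($stored, 'UTF-8', 'Windows-1252');
var_dump($fixed === "March \u{2013} May 2009");   // bool(true)
```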
You have encoding issues. Make sure the encoding is set right in the database, in the html markup and make sure the files themselves are saved in proper encoding.

Questions about iPhone emoji and web pages


Okay, so emoji basically show up as the above on a computer. Is that another programming language? How do I put those little boxes into a PHP file? When I put them into a PHP file, they turn into question marks and whatnot. Also, how can I store these in MySQL without them turning into question marks and other weird things?
how do I put those little boxes into a php file?
Same way as any other Unicode character. Just paste them and make sure you're saving the PHP file and serving the PHP page as UTF-8.
When I put it into a php file, it turns into question marks and what not
Then you have an encoding problem. Work it out with Unicode characters you can actually see properly first, for example ąαд™日本, before worrying about the emoji.
Your PHP file should be saved as UTF-8; the page it produces should be served as Content-Type: text/html; charset=UTF-8 (or with a similar meta tag); the MySQL database should use a UTF-8 collation to store data, and PHP should talk to MySQL using UTF-8.
However, even if you handle everything correctly like this, PCs will still not show the emoji. That's because:
they don't have fonts that include shapes for those characters, and
emoji are still completely unstandardised. Those characters you posted are in the Unicode Private Use Area, which means they don't have any official meaning at all.
Each network in Japan uses different character codes for their emoji, mapped to different areas in the PUA. So even on another mobile phone, it probably won't display the correct character, unless you spend ages manually converting emoji codes for different networks. I'm guessing the ones you posted above are from SoftBank (iPhone?).
There is an ongoing proposal led by Google and Apple to collate the different networks' emoji and give them a proper standardised place in Unicode. Until then, getting emoji to display consistently across networks is an exercise in unhappiness. See the character overview from the standardisation work to see how much converting you would have to do.
God, I hate emoji. All that pain for such a load of useless twee rubbish.
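Regarding the MySQL part of that answer: characters above U+FFFF take 4 bytes in UTF-8, and MySQL stores them only in utf8mb4 columns. The Private Use Area emoji discussed here fit in 3 bytes; emoji that Unicode later standardised, such as U+1F600, do not. A sketch of a check, with needsUtf8mb4() as a hypothetical helper:

```php
<?php
// True if the text contains a code point above U+FFFF (4 bytes in UTF-8).
// Invalid UTF-8 makes preg_match() fail, which also yields false here.
function needsUtf8mb4(string $text): bool
{
    // The /u modifier makes PCRE treat the pattern and subject as UTF-8.
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $text) === 1;
}

var_dump(needsUtf8mb4("\u{E001}"));   // bool(false): Private Use Area, 3 bytes
var_dump(needsUtf8mb4("\u{1F600}"));  // bool(true): standardised emoji, 4 bytes
var_dump(strlen("\u{1F600}"));        // int(4)
```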
This has nothing to do with programming languages, just with encoding and fonts. As a very brief overview: Every character is stored by its character code (e.g.: 0x41 = A, 0x42 = B, etc), which is rendered as a meaningful character on your screen using a font (which says "the character with the code 0x41 should look like this ...").
These emoji occupy the "private use area" of the Unicode table, which is a range of codes that are undefined and free for anyone to use. That makes them perfectly valid character codes, it's just that no standard font has an appropriate character to display for them, since they are undefined. Only the iPhone and other handhelds, mostly in Japan, have appropriate icons for these codes. This is done to save bandwidth; instead of transmitting relatively large image files back and forth, emoji can be transmitted using a single character code.
As for how to store them: They should be storable as is, as long as you don't try to convert them to another encoding, in which case they may get lost. Just be aware that they only make sense on the iPhone and other SoftBank phones in Japan.
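To find such Private Use Area characters in incoming text, a sketch using a PCRE character range in UTF-8 mode. It checks only the BMP private use range U+E000 to U+F8FF (Unicode also has supplementary private use planes); containsPrivateUse() is a hypothetical helper name.

```php
<?php
// True if the text contains a code point in the BMP Private Use Area.
function containsPrivateUse(string $text): bool
{
    return preg_match('/[\x{E000}-\x{F8FF}]/u', $text) === 1;
}

var_dump(containsPrivateUse("Hello \u{E001}"));  // bool(true)
var_dump(containsPrivateUse("Hello"));           // bool(false)
```

A match only says the character's meaning depends on the sender's convention; it does not say which carrier's mapping applies.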
Character Viewer http://img.skitch.com/20091110-e7nkuqbjrisabrdipk96p4yt59.png
If you're on OS X, you can copy and paste the character into the Character Viewer to find out what it is. I think there's a similar Character Map on Windows (albeit inferior ;-P). You could put it through PHP's ord(), but that returns only the value of the first byte, so it only works on ASCII characters. See the discussion on the ord page for UTF-8 functions.
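To see the difference between a byte value and a code point in PHP, a sketch assuming PHP 7.2+ with mbstring, where mb_ord() and mb_chr() are available:

```php
<?php
$char = "日";                                    // U+65E5, 3 bytes in UTF-8

echo ord("A"), "\n";                             // 65 (0x41), same as its code point
echo ord($char), "\n";                           // 230: just the first byte, 0xE6
echo dechex(mb_ord($char, 'UTF-8')), "\n";       // 65e5: the full code point
echo bin2hex($char), "\n";                       // e697a5: the 3 UTF-8 bytes
echo mb_chr(0x65E5, 'UTF-8') === $char ? "round-trip ok\n" : "mismatch\n";
```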
BTW, just for the fun of it, these characters display fine on the iPhone as is, because the iPhone has a font which has icons for them:
iPhone http://img.skitch.com/20091110-bjt3tutjxad1kw4p9uhem5jhnk.png
I'm using FF3.5 and WinXP. I see little boxes in my browser, too.
This tells me the string requires a font (glyphs for these characters) that is not installed on my computer.
When you put the string into a PHP file, the question marks tell you the same thing: your computer doesn't know how to display the characters.
You could store these emoji characters in MySQL if you encoded them differently, probably using UTF-8.
Do a web search for character encoding, as it relates to MySQL.
