Getting funny squares in browser when displaying content - php

I have content stored in a Postgres DB. Every time I fetch the content and display it using PHP, I get funny squares in IE and funny square-type question marks in Firefox.
Example below:
* - March � May 2009
How do I remove this?
I do not have access to the server, so I can't adjust the encoding there. I only have the Postgres DB details and FTP access to upload my files.

I would also recommend: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky. I've read it only recently myself, and it will definitely help you sort out your problems.

You need to make sure that Postgres, PHP, and your browser all agree on the content encoding, and that you have an appropriate font selected in your browser. The simplest way to do that is to choose UTF8 for everything.
I don't know about PHP, but I do know about databases and browsers. First you need to find out if the database is UTF8. (From psql, I would do a "\l" and look at the encoding.) Then you need to find out if PHP supports UTF8 (I have no idea how you do that). Then you need to see how those characters are being stored in the database by the PHP app. Then you need to figure out if the web server is correctly reporting the content encoding. (On Linux/Unix, I'd use the program "HEAD" (not "head") to see the headers it's returning.) And then you need to figure out if your browser is using a font that has glyphs for those characters.
Or, you could just make sure you only store ASCII and forget the rest of the world exists. Not recommended.

There is a wrong charset somewhere. The characters could already be stored wrongly in the database, or you have the wrong charset in the meta tags on the page (try manually changing the charset in the browser), or the wrong encoding could be in use when the page communicates with the database.
Check this page http://www.postgresql.org/docs/8.2/static/multibyte.html for more information.
Try to have same encoding on all places, preferably UTF-8

You have encoding issues. Make sure the encoding is set right in the database, in the html markup and make sure the files themselves are saved in proper encoding.
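One way the "�" in the question can arise, sketched here under an assumption the thread never confirms: the dash is stored as a single Windows-1252 byte (0x96) while the browser reads the page as UTF-8. The helper name to_utf8 and the guessed source encoding are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helper: returns $text unchanged if it is already valid UTF-8,
// otherwise treats it as $assumed (a guess, not a detected fact) and converts it.
function to_utf8(string $text, string $assumed = 'Windows-1252'): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    return mb_convert_encoding($text, 'UTF-8', $assumed);
}

// Assumed sample value: 0x96 is an en dash in Windows-1252.
$fromDb = "March \x96 May 2009";

// A lone 0x96 byte is not valid UTF-8, so a UTF-8 page shows a replacement glyph.
var_dump(mb_check_encoding($fromDb, 'UTF-8'));

// After conversion the dash is a proper UTF-8 en dash (U+2013).
echo to_utf8($fromDb), "\n";
```

Since the asker cannot touch the server configuration, the same agreement can also be requested from PHP itself: header('Content-Type: text/html; charset=UTF-8') before any output, and pg_set_client_encoding($conn, 'UTF8') on the Postgres connection, where $conn stands for whatever connection resource the script already has.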

Related

utf8 filenames and greek chars

I'm trying to figure this out but I'm quite puzzled at the mo.
I have a directory in my website containing pdf files with greek filenames (ie ΤΙΜΟΚΑΤΑΛΟΓΟΣ.pdf)
I want to have links for the files on a web page so that users can open or save the files.
So far I can list the files ok but if I click on them I get a 404 error. It's as if the server thinks they're not there although they are.
I understand it's probably an encoding issue, but beyond that I'm not sure what to look for. The website encoding is utf-8, and in order to display the filenames correctly I had to use mb_convert_encoding($file->filename, 'utf8', 'iso-8859-7').
This is the url: http://www.med4u.gr/timokatalogoi/
This is the directory listing: http://www.med4u.gr/pricelists/
The site is based on Joomla and it's hosted on a linux server.
Any ideas?
ISO-8859-* MUST DIE! (That's not personal!) Do everything in UTF-8. Everything. With good reason, some of us get upset when we see them being used, especially Latin-1 (8859-1) which bites a lot of people. I think you would find it very helpful to just dump them and move on to UTF-8.
Things to check:
Store your files encoded in UTF-8: Usually no difficulties with that.
Make sure your server is sending the files with UTF-8 charset: add header('Content-Type: text/html;charset=UTF-8'); near the top of your PHP.
Just in case someone saves your page, it's helpful in that case to put the same thing in a <meta> tag in the head.
Check it all in your browser: right click, view page info, and make sure the encoding is right.
CPanel is very flexible, so that's all doable without much fuss. Feel free to comment if you want more detail.
If you have a database, there are a few more hoops to jump through, but it's worth it. With UTF-8 you never have to worry, and it's the definitive, future-proof way of doing things.
Let's suppose for the sake of argument that the file name on disk is aa.pdf but your conversion displays it as ab.pdf. You need either to revert the conversion so it points back to aa.pdf, or teach the server to remap or redirect requests for ab.pdf to this file. Or if you prefer, rename the file to ab.pdf instead, if your file system can handle this name.
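The "revert the conversion" option above can be sketched as follows: convert the name only for the visible label, and build the href from the untouched bytes read from disk. This assumes the names on disk really are ISO-8859-7, as the question's mb_convert_encoding call suggests; pdf_link and the /pricelists/ base path are illustrative names, not part of the asker's code.

```php
<?php
// Hypothetical helper. $rawName is the file name exactly as read from disk
// (assumed to be ISO-8859-7 bytes); $baseUrl is the directory URL.
function pdf_link(string $rawName, string $baseUrl): string
{
    // Label: converted to UTF-8 so it displays correctly on a UTF-8 page.
    $label = mb_convert_encoding($rawName, 'UTF-8', 'ISO-8859-7');

    // Target: percent-encode the ORIGINAL bytes so the server finds the file.
    $href = $baseUrl . rawurlencode($rawName);

    return '<a href="' . htmlspecialchars($href, ENT_QUOTES, 'UTF-8') . '">'
         . htmlspecialchars($label, ENT_QUOTES, 'UTF-8') . '</a>';
}

// "\xD4\xC9" is the two Greek capitals tau and iota in ISO-8859-7.
echo pdf_link("\xD4\xC9.pdf", '/pricelists/'), "\n";
```

The design choice is that display and addressing use different encodings on purpose: the label follows the page, the URL follows the file system.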
It's definitely an encoding problem. You'll need to escape the URL, or convert it to whatever character set your server recognises.
e.g. 'ΤΙΜΟΚΑΤΑΛΟΓΟΣ LASER.pdf' in iso-8859-7 = 'ÔÉÌÏÊÁÔÁËÏÃÏÓ LASER.pdf' in iso-8859-1
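The garbling in that example can be reproduced by writing text in one single-byte encoding and reading the same bytes back in another. A minimal sketch, assuming the mbstring extension is available and the script file is saved as UTF-8; misread is a hypothetical helper name.

```php
<?php
// Hypothetical helper: encode $utf8 as $actual, then misinterpret those
// bytes as $assumed and return what a reader would see (as UTF-8).
function misread(string $utf8, string $actual, string $assumed): string
{
    $bytes = mb_convert_encoding($utf8, $actual, 'UTF-8');
    return mb_convert_encoding($bytes, 'UTF-8', $assumed);
}

// Greek file name stored as ISO-8859-7 but read as ISO-8859-1.
echo misread('ΤΙΜΟΚΑΤΑΛΟΓΟΣ LASER.pdf', 'ISO-8859-7', 'ISO-8859-1'), "\n";
```

If the bytes on disk are ISO-8859-7, the output should match the garbled ISO-8859-1 form quoted above; ASCII parts such as " LASER.pdf" pass through unchanged because both encodings share the ASCII range.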

PHP charset accents issue

I have a form in my page for users to leave a comment.
I'm currently using this charset:
meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"
but when retrieving the comment from the DB, accents are not displayed correctly ( Ex. è =>è ).
Which parameters should I care about for correct handling of accents?
SOLVED
changed meta tag to charset='utf-8'
changed the MySQL table character set (ALTER TABLE comments CONVERT TO CHARACTER SET utf8)
changed the connection character set both when inserting and when retrieving records ($conn->query('SET NAMES utf8'))
Now accents are displayed correctly
thanks
Luca
Character sets can be complicated and a pain to debug when it comes to LAMP web applications. At each stage where one piece of software talks to another, there's scope for incorrect charset translation or incorrect storage of data.
The places you need to look out for are:
- Between the browser and the web server (which you've listed already)
- Between PHP and the MySQL server
The character you've listed looks like a normal European character that is included in the ISO-8859-1 charset.
Things to check for:
Even though you're specifying the character set in a meta header, have a look in your browser to be sure which character set the browser is actually using. If you've specified it, the browser should use that charset to render the page, but I've seen cases where it attempts to auto-detect the charset and fails. Most browsers have an "encoding" menu (perhaps under "view") that allows you to choose the charset. Ensure that it says ISO-8859-1 (Western European).
MySQL can happily support character set conversion if required to but in most cases you want to have your tables and client connection set to use the same encoding. When configured this way MySQL won't attempt to do any encoding conversion and will just write the data you input byte for byte into the table. When read it'll come out the same way byte for byte.
You've not said if you're reading data from the database back out with the same web-app or with some other client. I'd suggest you try to read it out with the same web application and using the same meta charset header (again, check the browser is really setting it) and see what is displayed in the browser.
Debugging these issues requires you to be really sure whether the client/console you're using is doing any conversion too. The safest way is sometimes to get the data into a hex editor, where you can be sure that nothing else is interfering with any translation.
If it doesn't look like it's a browser-side problem please can you include the output of the following commands against your database:
Run from a connection that your web-app makes (not from some other MySQL client):
SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';
Run from any MySQL client:
SHOW CREATE TABLE myTable;
(where myTable is the table you're reading/writing data from/to)
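The hex-editor advice above can also be followed from PHP by dumping the raw bytes of a value before anything has a chance to convert them. A small sketch, assuming the mbstring extension; describe_bytes is a hypothetical helper name.

```php
<?php
// Hypothetical helper: show the raw bytes of a string as hex, plus its
// byte length and whether those bytes form valid UTF-8.
function describe_bytes(string $s): string
{
    $valid = mb_check_encoding($s, 'UTF-8') ? 'valid UTF-8' : 'not valid UTF-8';
    return bin2hex($s) . ' (' . strlen($s) . ' bytes, ' . $valid . ')';
}

$utf8   = "\u{00E8}";                                        // è as UTF-8
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8'); // è as ISO-8859-1

echo describe_bytes($utf8), "\n";   // c3a8 (2 bytes, valid UTF-8)
echo describe_bytes($latin1), "\n"; // e8 (1 bytes, not valid UTF-8)
```

Running describe_bytes on a value fetched from the database shows which of the two forms is actually stored, independent of what the browser or console chooses to display.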
The ISO-8859-1 character set covers Western European (Latin) characters only. Try UTF-8, and make sure that the database columns these characters come from are also UTF-8.

Help with multi-lingual text, php, and mysql

I have had no end of problems trying to do what I thought would be relatively simple:
I need to have a form which can accept user input text in a mix of English and other languages, some multi-byte (e.g. Japanese, Korean, etc.), and this gets processed by php and is stored (safely, avoiding SQL injection) in a mysql database. It also needs to be accessed from the database, processed, and used on-screen.
I have it set up fine for Latin chars, but when I add a mix of Latin and multi-byte chars it turns garbled.
I have tried to do my homework but just am banging my head against a wall now.
Magic quotes is off, I have tried using utf8_encode/decode, htmlentities, addslashes/stripslashes, and (in mysql) both "utf8_general_ci" and "utf8_unicode_ci" for the field in the table.
Part of the problem is that there are so many places where I could be messing it up that I'm not sure where to begin solving the problem.
Thanks very much for any and all help with this. Ideally, if someone has working php code examples and/or knows the right mysql table format, that would be fantastic.
Here is a laundry list of things to check are in UTF8 mode:
MySQL table encoding. You seem to have already done this.
MySQL connection encoding. Do SHOW VARIABLES LIKE 'char%' and you will see what MySQL is using. You need character_set_client, character_set_connection and character_set_results set to utf8, which can easily be set in your application by doing SET NAMES 'utf8' at the start of all connections. This is the one most people forget to check, IME.
If you use them, your CLI and terminal settings. In bash, this means LANG=(something).UTF-8.
Your source code (this is not usually a problem unless you have UTF8 constant text).
The page encoding. You seem to have this one right, too, but your browser's debug tools can help a lot.
Once you get all this right, all you will need in your app is mysql_real_escape_string().
Oh, and it is (sadly) possible to successfully store correctly encoded UTF8 text in a column with the wrong encoding type or from a connection with the wrong encoding type. And it can come back "correctly", too. Until you fix all the bits that aren't UTF8, at which point it breaks.
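The connection-encoding item in the checklist above can be sketched with PDO, which accepts the charset in the DSN and so makes a separate SET NAMES query unnecessary. This is an illustration under assumptions, not the asker's code: the table notes(body), the helper names, and the credentials are hypothetical, and utf8mb4 is chosen because MySQL's utf8 stores at most three bytes per character (pass 'utf8' on older servers, as in the thread). Prepared statements take over the escaping job.

```php
<?php
// Hypothetical helper: build a PDO MySQL DSN that fixes the connection charset.
function mysql_utf8_dsn(string $host, string $db, string $charset = 'utf8mb4'): string
{
    return "mysql:host={$host};dbname={$db};charset={$charset}";
}

// Hypothetical helper: store one text value in an assumed table notes(body).
function store_text(PDO $pdo, string $text): void
{
    // Reject anything that is not valid UTF-8 before it reaches the database.
    if (!mb_check_encoding($text, 'UTF-8')) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    // Placeholders keep the data out of the SQL text, so no manual escaping.
    $stmt = $pdo->prepare('INSERT INTO notes (body) VALUES (?)');
    $stmt->execute([$text]);
}

// Usage needs a real server and credentials, so it is left commented out:
// $pdo = new PDO(mysql_utf8_dsn('localhost', 'app'), 'user', 'secret',
//                [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
// store_text($pdo, "English + 日本語 + 한국어");

echo mysql_utf8_dsn('localhost', 'app'), "\n";
```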
I don't think you have any practical alternatives to UTF-8. You're going to have to track down where the encoding and/or decoding breaks. Start by checking whether you can round-trip multi-language text to the data base from the mysql command line, or perhaps through phpmyadmin. Track down and eliminate problems at that level. Then move out one more level by simulating input to your php and examining the output, again dealing with any problems. Finally add browsers into the mix.
First you need to check if you can add multi-language text to your database directly. If that's possible, you can do it in your application too.
Are you serializing any data by chance? PHP's serialize function has some issues when serializing non-English characters.
Everything you do should be utf-8 encoded.
One thing you could try is to json_encode() the data when putting it into the database and json_decode() it when it's retrieved.
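A possible reason the JSON suggestion can appear to work, shown as a small sketch: with default flags, json_encode() escapes every non-ASCII character as \uXXXX, so the stored value is plain ASCII and survives a misconfigured connection. That sidesteps the charset problem rather than fixing it, and json_encode() itself requires valid UTF-8 input.

```php
<?php
$text = "caf\u{00E9}";          // "café" as UTF-8
$json = json_encode($text);     // default flags escape non-ASCII characters

echo $json, "\n";                             // "caf\u00e9"
var_dump(mb_check_encoding($json, 'ASCII'));  // the encoded form is pure ASCII
var_dump(json_decode($json) === $text);       // decoding restores the UTF-8 text

// Input that is not valid UTF-8 (here a lone ISO-8859-1 byte) is rejected.
var_dump(json_encode("\xE9"));
```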
The problem was caused by my not having the default char set in the php.ini file, and (possibly) not having set the char set in the mysql table (in PhpMyAdmin, via the Operations tab).
Setting the default char set to "utf-8" fixed it. Thanks for the help!!
Check your database connection settings. It also needs to support UTF-8.

Charset encoding problem

I am developing an Arabic web site, and I use AJAX to save some text in my database. The AJAX itself works fine. My problem is that when I save the data in my database and try to print it on my screen, it returns weird text. I have used the PHP function mb_detect_encoding to determine how the database deals with the text. The function returned UTF-8.
So I used iconv("windows-1256","UTF-8",$row["text"]) to print the text on the screen, but it is still returning this weird text. Please give me a hand.
Thanks
please take a look at this thread (and use the search before posting a question first).
in your case, i think you've forgotten to set the correct charset for your database connection (using a SET NAMES statement or mysql_set_charset()) - but that's hard to say.
this is a quote from chazomaticus, who has given a perfect answer in the linked thread, listing all the points you have to take care of:
Storage:
Specify utf8_unicode_ci (or equivalent) collation on all tables and text columns in your database. This makes MySQL physically store and retrieve values natively in UTF-8.
Retrieval:
In PHP, in whatever DB wrapper you use, you'll need to set the connection charset to utf8. This way, MySQL does no conversion from its native UTF-8 when it hands data off to PHP.
Note that if you don't use a DB wrapper, you'll probably have to issue a query to tell MySQL to give you results in UTF-8: SET NAMES 'utf8' (as soon as you connect).
Delivery:
You've got to tell PHP to deliver the proper headers to the client, so text will be interpreted as UTF-8. In PHP, you can use the default_charset php.ini option, or manually issue the Content-Type header yourself, which is just more work but has the same effect.
Submission:
You want all data sent to you by browsers to be in UTF-8. Unfortunately, the only way to reliably do this is add the accept-charset attribute to all your <form> tags: <form ... accept-charset="UTF-8">.
Note that the W3C HTML spec says that clients "should" default to sending forms back to the server in whatever charset the server served, but this is apparently only a recommendation, hence the need for being explicit on every single <form> tag.
Although, on that front, you'll still want to verify every submitted string as being valid UTF-8 before you try to store it or use it anywhere. PHP's mb_check_encoding() does the trick, but you have to use it religiously.
Processing:
This is, unfortunately, the hard part. You need to make sure that every time you process a UTF-8 string, you do so safely. Easiest way to do this is by making extensive use of PHP's mbstring extension.
PHP's string operations are NOT by default UTF-8 safe. There are some things you can safely do with normal PHP string operations (like concatenation), but for most things you should use the equivalent mbstring function.
To know what you're doing (read: not mess it up), you really need to know UTF-8 and how it works on the lowest possible level. Check out any of the links from utf8.com for some good resources to learn everything you need to know.
Also, I feel like this should be said somewhere, even though it may seem obvious: every PHP or HTML file you'll be serving should be encoded in valid UTF-8.
note that you don't need to use utf-8 - the important part is to use the same charset everywhere, independent of what charset that might be. but if you need to change things anyway, use utf-8.
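The "Processing" point from the quote can be illustrated with a short sketch, assuming the mbstring extension is loaded: byte-oriented functions such as strlen() and substr() count bytes, so they can cut a multi-byte character in half, while the mb_* equivalents count characters. The sample word is an arbitrary five-letter Arabic string written with escapes so the example does not depend on how the file is saved.

```php
<?php
// Five Arabic letters; each one takes two bytes in UTF-8.
$word = "\u{0645}\u{0631}\u{062D}\u{0628}\u{0627}";

echo strlen($word), "\n";              // 10: counts bytes
echo mb_strlen($word, 'UTF-8'), "\n";  // 5: counts characters

// Byte-based cut: 3 bytes ends in the middle of the second letter.
$broken = substr($word, 0, 3);
var_dump(mb_check_encoding($broken, 'UTF-8'));  // false: no longer valid UTF-8

// Character-based cut: 3 whole letters, still valid UTF-8.
$safe = mb_substr($word, 0, 3, 'UTF-8');
var_dump(mb_check_encoding($safe, 'UTF-8'));    // true
```

A truncated string like $broken is a common source of the replacement glyphs discussed in these threads, even when every charset setting is correct.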
I recommend changing your web pages to UTF-8.
Ideally, you should use the same encoding (UTF-8?) in your webpages, database, and JavaScript/AJAX. Many people forget to set charset for AJAX requests/responses, which gives you mangled data in some browsers (cough cough).
Thank you guys for your support, and sorry oezi for that confusion. I really made a search and didn't find my answer. However, it works fine with me now. I am going to explain what I did to make it work, so anybody else can get benefit of it:
- I made my tables charset to utf8_unicode_ci.
- To submit the data, I used AJAX with the default charset UTF-8.
- When I get the data from my DB, I used the iconv function as follows:
iconv("UTF-8","windows-1256",$row["text"]) , and it works
I hope that's clear
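The argument order of iconv() is (from, to, string), which is likely why the first attempt in the question failed and this one worked: the stored text is UTF-8, and the page is presumably served as windows-1256 (an assumption, since the thread does not show the page headers). A minimal sketch, assuming the iconv extension knows the windows-1256 name:

```php
<?php
$utf8 = "\u{0628}";  // one Arabic letter, two bytes in UTF-8

// Correct direction for UTF-8 data shown on a windows-1256 page.
$cp1256 = iconv('UTF-8', 'windows-1256', $utf8);
echo strlen($utf8), ' -> ', strlen($cp1256), " byte(s)\n";

// Converting back restores the original UTF-8 text.
var_dump(iconv('windows-1256', 'UTF-8', $cp1256) === $utf8);

// Wrong direction: each UTF-8 byte is treated as a separate windows-1256
// character, which yields different (garbled) text.
$wrong = iconv('windows-1256', 'UTF-8', $utf8);
var_dump($wrong === $utf8);
```

Serving the page itself as UTF-8 would make the conversion step unnecessary, which is what the other answers in this thread recommend.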

Working with Foreign languages

My DB needs to hold strings containing foreign language characters. As an example, a user enters string into form, the form is submitted and the string is added to DB. The string will be displayed on a web page. I would like to use UTF-8 as this will be able to handle all of the required languages.
Currently, I believe my DB is set to 'latin1', but webpages are capable of displaying correct characters anyways. Problems arise when trying to set textareas to hold foreign characters and when viewing DB via command-line.
How can I implement this effectively? My plan was to blitz the whole site such that the DB charset is UTF-8, and the web page charset is UTF-8. Could someone give me the minimal commands on how to do this so I don't end up duplicating things (Having "UTF-8" commands everywhere when I really just need one) and making things too difficult to control?
edit: Using MySQL, PHP and JavaScript/HTML
That would be the way to go (UTF-8) in the DB. Here's what you want to look at:
Does your browser display UTF-8 characters properly (make sure the font you use has glyphs for all of the scripts that you need to support), and is the meta charset tag set correctly?
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"> (in HTML4)
<?xml version="1.0" encoding="UTF-8"?> (In XHTML)
Are you sending a Content-Type header? If so, make sure it matches what you define in your meta or XML version tag.
As for your command line, make sure the charset of the terminal you're using and of your shell also match. Check the locale on your server (assuming it's *NIX, you can do this by typing "locale"). Doing this will change your locale setting:
export LANG="en_US.UTF-8"
If you're using windows check the system's locale. http://windows.microsoft.com/en-us/windows-vista/Change-the-system-locale . If you're using PuTTY to log into your server, you're going to want to make sure you've set it to unicode mode to support UTF-8.
The other thing you're going to want to look at in mysql is the table collation setting. Make sure it's a collation that makes sense for what you're looking to do or indices will have unexpected behaviors. (see http://dev.mysql.com/doc/refman/5.0/en/charset-mysql.html )
More likely than not, you'd want to set up a header file for your website that holds the character set encoding information, and include that in every view. As far as the DB is concerned, your text and varchar fields obviously need to support the right encoding. There's no simple way to do this without altering each table to make sure its individual charset and collation are right (once a table has been created in one charset, you need to convert it).
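The shared header file idea can be sketched like this, so that the charset is named in exactly one place and both the HTTP header and the meta tag are derived from it. The constant and function names are hypothetical; the meta tag follows the HTML4 form quoted earlier in this answer.

```php
<?php
// Hypothetical include file, e.g. charset.php, required at the top of every view.
const SITE_CHARSET = 'UTF-8';

// Value for the HTTP Content-Type header.
function content_type_header(string $charset = SITE_CHARSET): string
{
    return "Content-Type: text/html; charset={$charset}";
}

// Matching meta tag for the <head>, useful when a page is saved to disk.
function meta_charset_tag(string $charset = SITE_CHARSET): string
{
    return '<meta http-equiv="Content-Type" content="text/html; charset='
         . strtolower($charset) . '">';
}

// Send the header once, before any output has been produced.
function send_charset_header(): void
{
    if (!headers_sent()) {
        header(content_type_header());
    }
}

// Typical use in a view:
// send_charset_header();
// echo meta_charset_tag();
echo content_type_header(), "\n";
echo meta_charset_tag(), "\n";
```

Deriving both values from one constant keeps the header and the meta tag from drifting apart, which is the mismatch the Content-Type check above warns about.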
