UTF-8 characters in PHP files turn into ? after PHP processes them (server-related, not file-related?)

This is so weird; it has never happened to me. I've worked with UTF-8 a lot and this is the first time I've seen it.
Since last week, all my sites that have UTF-8 characters in their files show ? instead of the actual character!
The files are OK and I can see the characters fine when I edit them, but once a file has been processed by PHP, the UTF-8 characters are replaced with ?.
The UTF-8 characters stored in the database load just fine; the problem is only with strings that live in PHP files.
Note that I said since last week: it happened all of a sudden, so obviously something changed on the server.
I contacted my hosting company, but they have no clue what to look for and I don't know what to tell them to look for.
Any clue what could have been changed on the server?
so to conclude:
it's not a database problem
it's not a file encoding problem (I hope not; I have 30+ sites with a different CMS on each one and cannot afford to edit them all)
it's not a content-type issue in the HTML, because the UTF-8 characters turn into ? when the file is parsed by PHP
it could also be a WordPress problem, but I'm sure this happened after some change on the server side

it's not a database problem - Check
it's not a file encoding problem - THIS actually could be it
it's not a content-type - Check (charset names are case-insensitive, so utf-8 and UTF-8 both work in the meta tag)
WordPress problem - Maybe, in combination with file encoding
I can imagine a situation where the mbstring module for PHP was removed or disabled and you then edited your template through WordPress. At that point your characters got mangled.
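If you need something concrete to hand to the hosting company, a small diagnostic script can help. This is only a sketch: the function name `charsetDiagnostics` is made up, and the list of settings is my assumption about the usual suspects, not a complete list. Run it on the affected server and compare the output with a server that still works.

```php
<?php
// Diagnostic sketch: collect PHP settings that commonly affect how
// non-ASCII output is handled. The selection is an assumption.
function charsetDiagnostics(): array
{
    $settings = [
        'default_charset',
        'output_handler',
        'mbstring.http_output',
        'mbstring.encoding_translation',
    ];

    $report = [
        'php_version'     => PHP_VERSION,
        'mbstring_loaded' => extension_loaded('mbstring'),
    ];
    foreach ($settings as $name) {
        // ini_get() returns false when the directive is not registered,
        // for example when the mbstring extension is missing.
        $report[$name] = ini_get($name);
    }
    return $report;
}

var_export(charsetDiagnostics());
```

A changed `default_charset` or a newly enabled output handler after a server upgrade would be the first things to ask the host about.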

Related

Can't get rid of diamonds with question marks in the middle since php 5.6 switch

Rackspace upgraded their servers to PHP 5.6/Apache 2.4 and ever since then several of my sites show these strange characters. I have gone all through Google to apply patches/fixes but absolutely nothing is working. Here is an example of one: believerschallenge.com/index.php
Any help would be GREATLY appreciated!
While it's not immediately obvious what characters those are supposed to be, I can think of two ways to try to address this. The first is to check the source where these characters are appearing and remove / alter the characters that aren't being rendered correctly.
The second is more complicated but probably better in the long run, and it involves checking the various things detailed in this question:
I need help fixing Broken UTF8 encoding
In particular, this one is probably the cause:
Change your PHP default charset to utf-8:
ini_set("default_charset", 'utf-8');
See UTF-8 all the way through
You probably have latin1-encoded bytes in your text and have not specified that the connection between the client and the server is utf8. Both should be changed to get rid of the black-diamond-with-question-mark.
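To illustrate the latin1 point, here is a minimal sketch that checks whether a string is already valid UTF-8 and converts it only if it is not. The helper name `toUtf8` is hypothetical, and ISO-8859-1 as the source encoding is an assumption you would need to confirm for your own data. It requires the mbstring extension.

```php
<?php
// Hypothetical helper: returns $bytes unchanged when they already form valid
// UTF-8, otherwise converts them from an assumed legacy encoding.
function toUtf8(string $bytes, string $assumedSource = 'ISO-8859-1'): string
{
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }
    return mb_convert_encoding($bytes, 'UTF-8', $assumedSource);
}

$latin1 = "caf\xE9";      // "café" as Latin-1 bytes, invalid as UTF-8
$utf8   = "caf\xC3\xA9";  // "café" as UTF-8 bytes

var_dump(toUtf8($latin1) === $utf8);  // converted
var_dump(toUtf8($utf8) === $utf8);    // left alone
```

Checking before converting matters: converting text that is already UTF-8 a second time is exactly how double-encoded garbage gets created.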

UTF8 encoding not working when reading from PO files in CakePHP 2.3.2

My site is to be in Turkish, and I've created a locale file in app/Locale/tur/LC_MESSAGES/default.po
I've set the configuration Configure::write('Config.language','tr'); in my App controller's before filter. It is reading from the intended PO file. However, the characters are garbled when shown. Example: Ürünler is shown as �r�nler
I've set the character encoding to UTF-8 in the page headers. Database encoding works fine. If I echo Ürünler as a literal string it still works fine. It's only when the text comes from the PO file that it causes problems.
I am developing my site in CakePHP 2.3.2. I've done many many multilingual sites in Cake but never faced this problem.
My PO file is okay as I even tried one of the PO files that is working fine in my past projects, it still doesn't work.
Any help appreciated. Thanks!!
It is not enough to set your headers to utf8. You also need to save files that contain utf8 chars as utf8. So check your file and make sure this is the case (utf8 without bom!).
Please make sure that default.po is saved in Unicode(utf-8)
Dreamweaver CS6 - Open file -> Modify -> Page Properties -> Document encoding (select Unicode (UTF-8))
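If you would rather not trust the editor's idea of the encoding, you can check the bytes directly. A rough sketch follows; `describeEncoding` is a made-up helper name, and the sample strings just stand in for the real file contents. It needs mbstring.

```php
<?php
// Hypothetical helper: classifies raw file contents by encoding state.
function describeEncoding(string $bytes): string
{
    // A UTF-8 byte order mark is the three bytes EF BB BF at the very start.
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        return 'UTF-8 with BOM';
    }
    return mb_check_encoding($bytes, 'UTF-8') ? 'UTF-8 without BOM' : 'not valid UTF-8';
}

// Against the real file (path taken from the question):
// echo describeEncoding(file_get_contents('app/Locale/tur/LC_MESSAGES/default.po'));

echo describeEncoding("msgstr \"\xC3\x9Cr\xC3\xBCnler\""), "\n";  // UTF-8 bytes of Ürünler
echo describeEncoding("msgstr \"\xDCr\xFCnler\""), "\n";          // single-byte (ISO-8859-9 / Latin-1) bytes
```

If the real file reports "not valid UTF-8", re-save it as UTF-8 without BOM; if it reports valid UTF-8, the file is not the problem and the cause lies further down the chain.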

utf8 filenames and greek chars

I'm trying to figure this out but I'm quite puzzled at the mo.
I have a directory on my website containing PDF files with Greek filenames (e.g. ΤΙΜΟΚΑΤΑΛΟΓΟΣ.pdf)
I want to have links for the files on a web page so that users can open or save the files.
So far I can list the files ok but if I click on them I get a 404 error. It's as if the server thinks they're not there although they are.
I understand it's probably an encoding issue but beyond that I'm not sure what to look for. The website encoding is utf-8 and in order to display the filenames correctly I had to use mb_convert_encoding($file->filename, 'utf8', 'iso-8859-7').
This is the url: http://www.med4u.gr/timokatalogoi/
This is the directory listing: http://www.med4u.gr/pricelists/
The site is based on Joomla and it's hosted on a linux server.
Any ideas?
ISO-8859-* MUST DIE! (That's not personal!) Do everything in UTF-8. Everything. With good reason, some of us get upset when we see them being used, especially Latin-1 (8859-1) which bites a lot of people. I think you would find it very helpful to just dump them and move on to UTF-8.
Things to check:
Store your files encoded in UTF-8: Usually no difficulties with that.
Make sure your server is sending the files with UTF-8 charset: add header('Content-Type: text/html;charset=UTF-8'); near the top of your PHP.
Just in case someone saves your page, it's helpful in that case to put the same thing in a <meta> tag in the head.
Check it all in your browser: right click, view page info, and make sure the encoding is right.
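The header and meta tag steps from the checklist could look roughly like this. It is a sketch only; `contentTypeHeader` is a hypothetical helper whose only job is to keep the charset spelled in one place.

```php
<?php
// Hypothetical helper so the charset is stated in exactly one place.
function contentTypeHeader(string $mime = 'text/html', string $charset = 'UTF-8'): string
{
    return sprintf('Content-Type: %s; charset=%s', $mime, $charset);
}

// header() only has an effect before any output has been sent.
if (!headers_sent()) {
    header(contentTypeHeader());
}

// Matching tag for the <head>, useful when the page is saved to disk.
$metaTag = '<meta charset="UTF-8">';
```

The HTTP header takes precedence over the meta tag in browsers, so the two should always agree.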
CPanel is very flexible, so that's all doable without much fuss. Feel free to comment if you want more detail.
If you have a database, there are a few more hoops to jump through, but it's worth it. With UTF-8 you never have to worry, and it's the definitive, future-proof way of doing things.
Let's suppose for the sake of argument that the file name on disk is aa.pdf but your conversion displays it as ab.pdf. You need either to revert the conversion so it points back to aa.pdf, or teach the server to remap or redirect requests for ab.pdf to this file. Or if you prefer, rename the file to ab.pdf instead, if your file system can handle this name.
It's definitely an encoding problem. You'll need to escape the URL, or convert it to whatever character set your server recognises.
e.g. 'ΤΙΜΟΚΑΤΑΛΟΓΟΣ LASER.pdf' in iso-8859-7 = 'ÔÉÌÏÊÁÔÁËÏÃÏÓ LASER.pdf' in iso-8859-1
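A sketch of the "escape the URL" approach: percent-encode the bytes exactly as they are on disk for the link, and convert only the visible label to UTF-8. This assumes the names on disk really are ISO-8859-7, as the question's mb_convert_encoding call suggests. `pdfLink` is a made-up helper, and `/pricelists/` is taken from the directory listing URL in the question.

```php
<?php
// Assumption: names read from disk are ISO-8859-7 bytes, as in the question.
function pdfLink(string $rawName, string $baseUrl = '/pricelists/'): string
{
    // Percent-encode the on-disk bytes so the request matches the real file name.
    $href = $baseUrl . rawurlencode($rawName);
    // Convert only the visible label to the page's encoding.
    $label = mb_convert_encoding($rawName, 'UTF-8', 'ISO-8859-7');
    return sprintf(
        '<a href="%s">%s</a>',
        $href,
        htmlspecialchars($label, ENT_QUOTES, 'UTF-8')
    );
}

// Simulate a name as stored on disk (this source file itself is saved as UTF-8).
$raw = mb_convert_encoding('ΤΙΜΟΚΑΤΑΛΟΓΟΣ.pdf', 'ISO-8859-7', 'UTF-8');
echo pdfLink($raw), "\n";
```

The 404 in the question is consistent with the link being built from the converted UTF-8 name, which no longer matches the bytes of the file name on disk.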

UTF8 characters not printed as such in Drupals HTML

I am trying to debug a nasty utf-8 problem, and do not know where to start.
A page contains the word 'categorieÃ«n', which should be 'categorieën'. Clearly something is wrong with the UTF-8. This happens with all these multibyte characters. I have scanned the gazillion topics here on UTF-8, but they mostly cover the basics, not this situation where everything appears to be configured correctly but clearly is not.
The pages are served by Drupal, from a MySQL database.
The database was migrated (not by me) by SQL-dumping and importing through phpMyAdmin. There is a good chance something went wrong there, because there was no problem before, and because the problem occurs only on older, imported items. Editing these items or inserting new ones, and fixing the wrongly encoded characters by hand, fixes the problem, though I cannot see a difference in the database.
Content re-edited through Drupal does not have this problem.
When I read that text out with the MySQL CLI, I get the correct ë character, both on the articles that render "correct" and on those that render "incorrect" characters.
The tables have collation utf8_general_ci
Headers appear to be sent with correct encoding: Vary Accept-Encoding and Content-Type text/html; charset=utf-8
HTML head contains a <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
The HTTP headers tell me there is a Varnish proxy in between. Could that cause UTF-8 conversion or breakage?
Content is served gzipped, which is normal in Drupal, and I have never seen this UTF-8 issue with gzipping, but you never know.
It appears the import is the culprit and I would like to know
a) what went wrong.
b) why I cannot see a difference in the mysql cli client between "wrong" and "correct" characters
c) how to fix the database, or where to start looking and learning on how to fix it.
The dump file was probably output as UTF-8, but interpreted as latin1 during import.
The Ã«, the latin1 reading of the two bytes of UTF-8's ë, is physically in your tables as UTF-8 data.
Seeing as you have a mix of intact and broken data, this will be tough to fix in a general way, but usually, this dirty workaround* will work well:
UPDATE table SET column = REPLACE(column, "Ã«", "ë");
Unless you are working with languages other than dutch, the range of broken characters should be extremely limited and you might be able to fix it with a small number of such statements.
Related questions with the same problem:
Detecting utf8 broken characters in MySQL
I need help fixing Broken UTF8 encoding
* (of course, don't forget to make backups before running anything like this!)
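To see what "output as UTF-8 but interpreted as latin1" does to a single character, here is a small PHP sketch that reproduces the damage and reverses it. It only demonstrates the mechanism on one string and is not a migration script. It needs mbstring and PHP 7+ for the `\u{...}` escapes.

```php
<?php
$correct = "\u{00EB}";   // ë, stored as the two UTF-8 bytes C3 AB

// What the broken import did: treat each UTF-8 byte as one latin1 character
// and store the result as UTF-8 again.
$broken = mb_convert_encoding($correct, 'UTF-8', 'ISO-8859-1');

// The reverse step: map the two characters back to the original two bytes.
$repaired = mb_convert_encoding($broken, 'ISO-8859-1', 'UTF-8');

var_dump($broken === "\u{00C3}\u{00AB}");  // the two characters Ã and «
var_dump($repaired === $correct);
var_dump(bin2hex($broken));                // four bytes instead of two
```

The same reversal applied to text that was never broken would damage it, which is why a mix of intact and broken rows is hard to fix in one pass.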
Nothing should have gone wrong in exporting and importing a Drupal dump, unless the person doing it somehow managed to set the export to something other than UTF-8. We export/import dumps a lot and have never bumped into such a problem.
Hopefully Pekka's answer will help you resolve the issue if it is in the DB, but you could also check whether the data shown on the web page is being run through some PHP functions that aren't multibyte-friendly.
Here are some equivalents of normal functions in mb: http://php.net/manual/en/ref.mbstring.php
ps. If you have recently moved your site to another server (so it's not just a db import), you should check what headers your site is sending out with a tool such as http://www.webconfs.com/http-header-check.php
Make sure the last row has UTF8 in it.
You mention that the import might be the problem. In that case it's possible that during the import the connection between the client and the MySQL server wasn't using UTF-8. I've had this problem a couple of times in the past, so I'd like to share these MySQL settings with you (in my.cnf):
Under the server settings add these:
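# UTF 8
# Note: MySQL 5.5 and later reject default-character-set under [mysqld]; drop the next line there and rely on character-set-server.
default-character-set=utf8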
character-set-server=utf8
collation-server=utf8_general_ci
skip-character-set-client-handshake
And under the client settings add:
default-character-set=utf8
This might save you some headache the next time.
To be absolutely sure you have utf8 from start to end:
- source code files in utf8 without BOM
- database with utf8 collation
- database tables with utf8 collation
- database connection in utf8 (query it with 'SET CHARSET UTF8')
- pages header set to utf8 (the ajax ones too)
- meta tag to set page in utf8
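For the "database connection in utf8" item, one option from PHP is to name the charset in the PDO DSN instead of issuing a separate query. A minimal sketch, where `mysqlDsn` is a hypothetical helper and the host, database name and credentials are placeholders that do not come from the thread:

```php
<?php
// Hypothetical helper: builds a PDO MySQL DSN that names the connection charset.
// 'utf8' matches the settings in this thread; MySQL's utf8mb4 is the variant
// that also covers 4-byte characters.
function mysqlDsn(string $host, string $dbName, string $charset = 'utf8'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbName, $charset);
}

// Placeholder values, not taken from the thread.
$dsn = mysqlDsn('localhost', 'drupal');
echo $dsn, "\n";

// Connecting is left commented out because it needs a real server:
// $pdo = new PDO($dsn, 'user', 'secret');
```

Whichever way you set it, the point of the checklist is that the connection charset has to match what the tables and the pages use.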

Getting funny squares in browser when displaying content

I have content stored in a Postgres DB. Every time I fetch the content and display it using PHP, I get funny squares in IE and funny square-type question marks in Firefox.
Example below
* - March � May 2009
How do I remove this?
I do not have access to the server so can't adjust the encoding there, only have postgres DB details and FTP access to upload my files
I would also recommend: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky, I've read it only recently myself, it will definitely help you sort out your problems.
You need to make sure that Postgres, PHP, and your browser all agree on the content encoding, and that you have an appropriate font selected in your browser. The simplest way to do that is to choose UTF8 for everything.
I don't know about PHP, but I do know about databases and browsers. First you need to find out if the database is UTF8. (From psql, I would do a "\l" and look at the encoding.) Then you need to find out if PHP supports UTF8 (I have no idea how you do that). Then you need to see how those characters are being stored in the database by the PHP app. Then you need to figure out if the web server is correctly reporting the content encoding. (On Linux/Unix, I'd use the program "HEAD" (not "head") to see the headers it's returning.) And then you need to figure out if your browser is using a font that supports UTF8.
Or, you could just make sure you only store ASCII and forget the rest of the world exists. Not recommended.
Wrong charset somewhere. The characters could already be stored wrongly in the database, or you have the wrong charset in the meta tags on the page (try manually changing the charset in the browser), or the wrong encoding could be in use when the page communicates with the database.
Check this page http://www.postgresql.org/docs/8.2/static/multibyte.html for more information.
Try to have same encoding on all places, preferably UTF-8
You have encoding issues. Make sure the encoding is set right in the database, in the html markup and make sure the files themselves are saved in proper encoding.
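Since you only have FTP access, you can also do the conversion in PHP just before output. The sketch below rests on a guess: that the unreadable character in "March � May 2009" is an en dash stored as the single Windows-1252 byte 0x96. The post does not say which encoding was used, so verify that against your own data first. It needs mbstring.

```php
<?php
// Assumption: the unreadable character is an en dash stored as the single
// Windows-1252 byte 0x96; the post does not state the actual encoding.
$stored = "March \x96 May 2009";

// A lone 0x96 byte is not valid UTF-8, which is why browsers show a placeholder.
var_dump(mb_check_encoding($stored, 'UTF-8'));

// Converting from Windows-1252 maps 0x96 to U+2013 (en dash).
$fixed = mb_convert_encoding($stored, 'UTF-8', 'Windows-1252');
var_dump($fixed === "March \u{2013} May 2009");

echo htmlspecialchars($fixed, ENT_QUOTES, 'UTF-8'), "\n";
```

Converting from ISO-8859-1 instead would not help here, because in that encoding byte 0x96 is a control character and not a dash. Another route worth trying is pg_set_client_encoding() on the connection, so Postgres does the conversion for you, provided the database encoding is declared correctly.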
