I am getting a file with a faroese name and trying to save it in a PHP script:
2010_08_Útflutningur.xls
Ubuntu 10.04 LTS is saving it as:
2010_08_�tflutningur.xls (invalid encoding)
I've installed and run utf8-migration-tool, but with no effect.
Is this an Ubuntu error that I can fix, or do I just have to give up and modify the name in PHP?
Thanks
Ubuntu uses UTF-8 internally for its filenames. In this particular case utf8_encode() does the trick, as the original filename is ISO-8859-1 encoded. In other cases I could use iconv(), and detect the encoding if it is unknown.
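For illustration, a minimal sketch of both approaches (the filename literal and the encoding list are just examples):
<?php
// The incoming name is ISO-8859-1 encoded; the filesystem expects UTF-8.
$original = "2010_08_\xDAtflutningur.xls"; // 0xDA is "Ú" in ISO-8859-1
$utf8 = utf8_encode($original);            // equivalent to iconv('ISO-8859-1', 'UTF-8', $original)
// If the source encoding is unknown, try to detect it first (detection
// of single-byte encodings is a heuristic, so check the result):
$from = mb_detect_encoding($original, 'UTF-8, ISO-8859-1, Windows-1252', true);
if ($from !== false) {
    $utf8 = iconv($from, 'UTF-8', $original);
}
?>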
"Ú" it's not the ubuntu error.Basically your "Ú" chartered takes as a unreadable special charecter.So it's better to modify the name.
I'm using exif_read_data() to extract EXIF data from uploaded pictures. This worked fine on my Windows machine, but on my Mac with the latest XAMPP all fields seem to be extracted correctly except the keywords/tags. If I look in the file, the camera model (which is extracted correctly) seems to be encoded in ASCII (one byte per char). However, the keywords (which were originally edited in Windows Explorer) seem to be encoded in UTF-16LE (i.e. ASCII code followed by 0x00). So it seems to be a mix of character encodings.
I tried to force the character encoding to a certain standard (with e.g. ini_set('exif.encode_unicode', 'byte2le')), but most of the time I get question marks in the keywords, or nothing at all.
Anyone any idea what's wrong, how to fix it, and why this worked fine on Windows XAMPP and not Mac XAMPP?
Thanks
I found the answer:
Forcing exif.decode_unicode_motorola to UCS-2LE instead of the default value UCS-2BE did the trick.
ini_set('exif.decode_unicode_motorola', 'UCS-2LE');
Still don't understand why it works on a Windows machine without this.
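For reference, a minimal sketch of how the setting is applied before reading the tags (photo.jpg is a placeholder):
<?php
// Tell the exif extension that "Motorola" (big-endian) Unicode fields are
// actually little-endian, which is how Windows Explorer writes the XP tags.
ini_set('exif.decode_unicode_motorola', 'UCS-2LE');
$exif = exif_read_data('photo.jpg', null, true);
var_dump($exif); // the Windows keywords typically appear in the WINXP section
?>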
I have been using iconv to convert data from my company's database (windows-1250) to UTF-8. It all worked fine until recently.
I'm not really sure what happened, as I've noticed the change only recently. The problem is that iconv seems to have stopped working well, even though it still throws notices when I use a bad encoding name.
Earlier, when I saved a string to the db with
htmlspecialchars(iconv('UTF-8', 'windows-1250', $string), ENT_QUOTES) it was fine. Now only question marks are written to my db instead of e.g. ąęś.
When I correct them via PL/SQL Developer and read them via PHP: htmlspecialchars_decode(iconv('windows-1250', 'UTF-8', $string), ENT_QUOTES)
I receive aes. I tried to set the encoding in PHP, right before the string output:
header('Content-Type: text/html; charset=utf-8');, but it didn't help.
My software is:
PHP 5.3.15 (cli)
iconv (GNU libc) 2.15
Apache/2.2.22
openSUSE 12.2
Oracle 10.2.0.4
oci
After some help from hakre, I was able to solve my problem.
My strings were already transliterated when I selected them from the database. The fourth parameter of PHP's oci_connect() is the character set. If you don't provide it, it is taken from the NLS_LANG environment variable.
I had neither the fourth parameter nor the environment variable, so my database connection charset was wrong. Once I added the NLS_LANG variable, it started to work fine.
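For illustration, a sketch with the character set passed explicitly (connection details are placeholders; EE8MSWIN1250 is Oracle's name for windows-1250):
<?php
// The fourth argument of oci_connect() is the connection character set.
// Without it, PHP falls back to the NLS_LANG environment variable.
$conn = oci_connect('user', 'password', '//dbhost/ORCL', 'EE8MSWIN1250');
?>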
Thank you hakre! : )
I've been searching for a long time, but without any helpful result.
I'm developing a PHP project using Eclipse on an Ubuntu 11.04 VM. Everything worked fine; I never needed to look at the file encoding. But after deploying the project to my server, all content was shown with the wrong encoding. After a manual conversion to UTF-8 with Notepad++ my problems were solved.
Now I want to change it in my Ubuntu VM, too. And there's the problem. I've checked the preferences in Eclipse, but every property is set to UTF-8: general content types, workspace, project settings, everything ...
If I check the encoding in the terminal, it says "test_new.dat: text/plain; charset=us-ascii". All files are saved in ASCII format. If I create a new file in the terminal (with "touch"), it's the same.
Then I tried to convert the files with iconv:
iconv -f US-ASCII -t UTF8 -o test.dat test_new.dat
But the encoding doesn't change. PHP files in particular seem to be resistant, yet I have some *.ini files in my project for which the conversion works?!
Any idea what to do?
Here are my locale settings of Ubuntu:
LANG=de_DE.UTF-8
LANGUAGE=de_DE:en
LC_CTYPE="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_COLLATE="de_DE.UTF-8"
LC_MONETARY="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_PAPER="de_DE.UTF-8"
LC_NAME="de_DE.UTF-8"
LC_ADDRESS="de_DE.UTF-8"
LC_TELEPHONE="de_DE.UTF-8"
LC_MEASUREMENT="de_DE.UTF-8"
LC_IDENTIFICATION="de_DE.UTF-8"
LC_ALL=
I was also wondering about character encoding and found something that might be useful here.
When I create a new empty .txt file on my Ubuntu 12.04 and ask for its character encoding with "file -bi filename.txt", it shows me: charset=binary. After opening it, writing something inside like "haha", and saving via "save as" with UTF-8 explicitly chosen as the character encoding, it very strangely did not show charset=UTF-8 when I asked again, but returned charset=us-ascii. This already seemed strange. But it got even stranger when I did the whole thing again, this time including some German-specific characters ("ä" in this case) in the file, and saved again (without "save as", I just pressed save). Now it said charset=UTF-8.
It therefore seems that at least gedit is checking the file and downgrading from UTF-8 to us-ascii if there is no need for UTF-8 since the file can be encoded using us-ascii.
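The behaviour is easy to reproduce, e.g. with a small PHP sketch (file names are arbitrary):
<?php
// Both files are valid UTF-8, but only the second contains a byte >= 0x80,
// so detection tools such as file(1) report the first one as us-ascii.
file_put_contents('plain.txt', "haha\n");         // reported as us-ascii
file_put_contents('umlaut.txt', "h\xC3\xA4ha\n"); // "ä" in UTF-8; reported as utf-8
?>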
Hope this helped a bit, even though it is not PHP-related.
Greetings
UTF-8 is compatible with ASCII. An ASCII text file is therefore also valid UTF-8, and a conversion from ASCII to UTF-8 is a no-op.
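A one-line check illustrates the no-op:
<?php
$s = 'pure ASCII text';
// Converting ASCII to UTF-8 returns the identical byte sequence.
var_dump(iconv('US-ASCII', 'UTF-8', $s) === $s); // bool(true)
?>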
I have a client that wants to export a .csv to the server, where it will be parsed by PHP in order to generate a table with its data. I'm using iconv to convert to the appropriate encoding (UTF-8). Unfortunately I'm on Windows, so I don't know what the source encoding is.
What encoding would Mac Excel use to generate a .csv? I've tried so many different combinations, but none work on the French accents, which are, as far as I know, not arranged the same way in the Mac's charset as in UTF-8.
For example:
The correct display should be:
'Délégation'
Most types of encoding (including utf8_encode()) give:
'DÈlÈgation'
macintosh to UTF-8 gives:
'D»l»gation'
If I open the .csv file that was saved on the Mac on my PC, I see the French 'é' accents as 'È'. So is it possible that saving the file onto my computer (or server) re-encoded it, so that the 'È' are now the literal character values instead of a UTF-8 encoding misinterpretation?
Hex Dump
Using bin2hex(), the hex dump for the string:
'DÈlÈgation 1' is:
44c86cc8676174696f6e2031
-- in fact, I'm assuming that it's DÈlÈgation and not Délégation because if I open the .csv file in Notepad (on my PC), it shows up as È and not é.
A common encoding for Mac programs to use is MacRoman.
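If the file really is MacRoman, converting it in PHP is a one-liner ('macintosh' is the iconv name for MacRoman; the file name is a placeholder):
<?php
$raw  = file_get_contents('export.csv');
$utf8 = iconv('macintosh', 'UTF-8', $raw); // MacRoman -> UTF-8
file_put_contents('export-utf8.csv', $utf8);
?>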
Would it be possible for your client to install the trial version of Apple Numbers from the Apple website, open the .csv file in Numbers, then go to "File", "Export", "CSV", pick either "UTF-8" or "Windows Latin 1", and resend you the UTF-8 and the Windows Latin 1 files?
The "Numbers" application on a Mac solves problematic issues encountered on Excel sometimes...
I'm trying this code (on my local web server)
<?php
echo 'the word is / думата е '.$_GET['word'];
?>
but I get a corrupted result when I enter ?word=проба
the word is / думата е ����
The document is saved as 'UTF-8 without BOM' and headers are also UTF-8.
I have tried urlencode() and urldecode(), but the effect was the same.
When I upload it to the web server, it works fine...
What if you try sending an HTTP Content-type header, to indicate to the browser which encoding/charset your page is generating?
For instance, something like this might help:
header('Content-type: text/html; charset=UTF-8');
echo 'the word is / думата е '.$_GET['word'];
Of course, this is if you are generating HTML -- you probably are.
Considering there is a configuration setting at the server level that defines which encoding is sent by default, maybe the default encoding on your remote server is OK, while the one on your local server is not.
Sending such a header yourself would solve the problem: it would make sure the encoding is always set properly.
I suppose you are using the Apache web server.
There is a common problem with Apache configuration: a line with "AddDefaultCharset" in the config should be commented out (add # at the beginning of the line, or replace the line with "AddDefaultCharset off"), because it "overrides any encoding given in the files in meta http-equiv or xml encoding tags".
In my current installation (Apache 2 on Ubuntu Linux) the line is found in "/etc/apache2/conf.d/charset", but in other (Linux/Unix) setups it can be in "/etc/apache2/httpd.conf" or "/etc/apache/httpd.conf" (if you are using Apache 1). If you don't find it in these files, you can search for it with "cd /etc/apache2 ; grep -r AddDefaultCharset *" (for Apache 2 on Unix/Linux).
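For example, the relevant part of the charset file could be changed like this (exact contents vary by installation):
# /etc/apache2/conf.d/charset -- either comment the directive out:
#AddDefaultCharset UTF-8
# ... or disable it explicitly:
AddDefaultCharset off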
Take a look at Changing the server encoding. An excellent read!
Cheers!
If you receive $_GET from AJAX, make sure that your blablabla.js file is UTF-8 encoded. You can also use iconv("cp1251", "utf8", $_GET['word']); to display $_GET['word'] in UTF-8.
I just had this issue, and it sometimes happens if you filter the GET variable with htmlentities(). It seems this function converts Cyrillic characters into weird stuff.
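A likely explanation: before PHP 5.4, htmlentities() assumed ISO-8859-1 unless told otherwise, so UTF-8 Cyrillic bytes get mangled. A sketch of the usual fix:
<?php
// Pass the charset explicitly so multi-byte characters are interpreted correctly.
$safe = htmlentities($_GET['word'], ENT_QUOTES, 'UTF-8');
// htmlspecialchars() with the same arguments escapes only the HTML
// metacharacters and leaves Cyrillic text untouched.
$safe = htmlspecialchars($_GET['word'], ENT_QUOTES, 'UTF-8');
?>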