exif_read_data: keywords decoded incorrectly - php

I'm using exif_read_data() to extract EXIF data from uploaded pictures. This worked fine on my Windows machine, but on my Mac with the latest XAMPP all fields are extracted correctly except the keywords/tags. Looking at the file, the camera model (which is extracted correctly) appears to be encoded in ASCII (one byte per character). The keywords, which were originally edited on Windows (in Explorer), appear to be encoded in UTF-16LE (i.e. each ASCII code is followed by 0x00). So the file seems to contain a mix of character encodings.
I tried to force the character encoding to a specific standard (e.g. with ini_set('exif.encode_unicode', 'byte2le')), but most of the time I get question marks in the keywords or nothing at all.
Does anyone have an idea what's wrong, how to fix it, and why this worked fine on Windows XAMPP but not on Mac XAMPP?
Thanks

I found the answer:
Forcing exif.decode_unicode_motorola to UCS-2LE instead of the default value UCS-2BE did the trick.
ini_set('exif.decode_unicode_motorola', 'UCS-2LE');
Still don't understand why it works on a Windows machine without this.
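To illustrate what the setting changes: the PHP manual describes exif.decode_unicode_motorola as the decoding charset used for images stored in Motorola (big-endian) byte order, while the keywords described in the question are little-endian. The sketch below does not call exif_read_data(); it only shows the same UCS-2LE decoding step by hand, assuming the mbstring extension is loaded. The helper name decode_xp_keywords is hypothetical.

```php
<?php
// Hypothetical helper: decode a raw UCS-2LE byte string (ASCII code followed
// by 0x00, as described in the question) into UTF-8. Requires mbstring.
function decode_xp_keywords(string $raw): string
{
    $utf8 = mb_convert_encoding($raw, 'UTF-8', 'UCS-2LE');
    // Drop any trailing NUL terminator left over from the stored string.
    return rtrim($utf8, "\0");
}

// "tag" stored as UCS-2LE with a two-byte NUL terminator.
$raw = "t\0a\0g\0\0\0";
echo decode_xp_keywords($raw), PHP_EOL; // tag
```

Decoding the same bytes as UCS-2BE would pair them up the wrong way round, which is consistent with the question marks mentioned in the question.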

Related

Hungarian/Bulgarian characters from CSV file end up garbled in PHP

I'm trying to import a CSV file which looks something like this:
"source "," destination "
férfi-/ruházat-Öltöny," férfi-/ruházat-blézer_zakó",
Note that this is just a sample of the CSV, not the whole CSV.
The way I'm reading the file is pretty straightforward:
$line = fgets($this->fileHandle) ;
$line = mb_convert_encoding($line , 'UTF-8', mb_detect_encoding($line));
Where $this->fileHandle is just a resource pointing to the file opened using fopen. So nothing too special there.
I want to do some string manipulation on the strings inside the CSV. The import itself works fine.
But when I read from the file, whether with fgets, fread or any other function I can think of, I end up with garbled text.
Something along the lines of this:
So far I've tried mb_internal_encoding("UTF-8"), to ISO-8859-2 and a few other encodings. Nothing worked.
I've also tried mb_convert_encoding($line , 'UTF-8', mb_detect_encoding($line)) where $line is the line read from the csv.
Again, nothing. Still garbled text.
Next I assumed it might be something in my OS. I'm using a Mac with a Docker instance running Ubuntu.
The Mac is on High Sierra v10.13.4.
A locale command in the terminal gives me:
LANG="C.UTF-8"
LC_COLLATE="C"
LC_CTYPE="C"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=
As for the Docker instance:
Distributor ID: Ubuntu
Description: Ubuntu 14.04.5 LTS
Release: 14.04
Codename: trusty
# locale
LANG=C.UTF-8
LANGUAGE=
LC_CTYPE="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_PAPER="C.UTF-8"
LC_NAME="C.UTF-8"
LC_ADDRESS="C.UTF-8"
LC_TELEPHONE="C.UTF-8"
LC_MEASUREMENT="C.UTF-8"
LC_IDENTIFICATION="C.UTF-8"
LC_ALL=
So everything seems to be fine in that regard.
I've also tried an online PHP interpreter and that works fine. So clearly the issue is on my side.
To be honest I have no idea where the issue lies.
Any pointing in the right direction is greatly appreciated.
To answer my own question:
I had to call ini_set("default_charset", "UTF-8"). The default was an empty string.
I have no idea how it worked without it so far, I assume it has some sort of fallback encoding.
Either way, I hope this helps anybody else who gets stuck on this.
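For anyone landing here, a minimal sketch of the fix combined with an explicit source encoding instead of mb_detect_encoding() (which can only guess). The helper name csv_line_to_utf8 is hypothetical, and ISO-8859-2 is purely an assumption for illustration; the thread never states what encoding the file actually had.

```php
<?php
// Make PHP's default charset explicit, as in the answer above.
ini_set('default_charset', 'UTF-8');

// Hypothetical helper: convert one CSV line from a known source encoding.
// ISO-8859-2 is an assumed default, not something the thread confirms.
function csv_line_to_utf8(string $line, string $sourceEncoding = 'ISO-8859-2'): string
{
    return mb_convert_encoding($line, 'UTF-8', $sourceEncoding);
}

// "férfi" as ISO-8859-2 bytes (é is 0xE9 in that charset).
$utf8 = csv_line_to_utf8("f\xE9rfi");
echo bin2hex($utf8), PHP_EOL; // 66c3a9726669
```

If the file is already UTF-8, this conversion would garble it, so it is worth checking the actual bytes (for example with bin2hex() on one line) before picking a source encoding.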

Ubuntu encoding of new files

I've been searching for a long time without finding anything helpful.
I'm developing a PHP project using Eclipse on an Ubuntu 11.04 VM. Everything works fine there, and I never needed to check the file encoding. But after deploying the project to my server, all content was displayed with the wrong encoding. A manual conversion to UTF-8 with Notepad++ solved the problem.
Now I want to fix it in my Ubuntu VM too, and that's where the problem is. I've checked the preferences in Eclipse, but every property is set to UTF-8: general content types, workspace, project settings, everything.
If I check the encoding in the terminal, it says "test_new.dat: text/plain; charset=us-ascii". All files are saved as ASCII. If I create a new file in the terminal (with "touch"), the result is the same.
Then I've tried to convert the files with iconv:
iconv -f US-ASCII -t UTF8 -o test.dat test_new.dat
But the encoding doesn't change. PHP files in particular seem to be resistant, yet the conversion works for some *.ini files in my project.
Any idea what to do?
Here are my locale settings of Ubuntu:
LANG=de_DE.UTF-8
LANGUAGE=de_DE:en
LC_CTYPE="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_COLLATE="de_DE.UTF-8"
LC_MONETARY="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_PAPER="de_DE.UTF-8"
LC_NAME="de_DE.UTF-8"
LC_ADDRESS="de_DE.UTF-8"
LC_TELEPHONE="de_DE.UTF-8"
LC_MEASUREMENT="de_DE.UTF-8"
LC_IDENTIFICATION="de_DE.UTF-8"
LC_ALL=
I was also wondering about character encoding and found something that might be useful here.
When I create a new empty .txt file on my Ubuntu 12.04 and ask for its character encoding with "file -bi filename.txt", it shows charset=binary. I then opened it, wrote something like "haha" into it, and saved it using "save as", explicitly choosing UTF-8 as the character encoding. Strangely, asking again did not show charset=UTF-8 but charset=us-ascii. It got even stranger when I repeated the whole thing but this time included a German-specific character (ä in this case) in the file and saved again (this time with a plain save, not "save as"). Now it said charset=UTF-8.
So it looks as if gedit checks the file and downgrades from UTF-8 to us-ascii when UTF-8 is not needed because the file can be encoded in us-ascii. More precisely, the reported charset depends only on the bytes that are actually in the file.
Hope this helps a bit, even though it is not PHP related.
Greetings
UTF-8 is compatible with ASCII. An ASCII text file is therefore also valid UTF-8, and a conversion from ASCII to UTF-8 is a no-op.
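This can be checked from PHP as well. The sketch below assumes the mbstring extension and reuses the "haha" and "ä" examples from the previous answer; the variable names are only for illustration.

```php
<?php
$ascii = 'haha';
$withUmlaut = "h\xC3\xA4h\xC3\xA4"; // "hähä" written as UTF-8 bytes

// A pure-ASCII string is valid in both encodings.
$asciiIsAscii = mb_check_encoding($ascii, 'ASCII');
$asciiIsUtf8 = mb_check_encoding($ascii, 'UTF-8');

// Converting ASCII to UTF-8 leaves every byte unchanged.
$unchanged = mb_convert_encoding($ascii, 'UTF-8', 'ASCII') === $ascii;

// Once a non-ASCII character is present, only UTF-8 still matches.
$umlautIsAscii = mb_check_encoding($withUmlaut, 'ASCII');
$umlautIsUtf8 = mb_check_encoding($withUmlaut, 'UTF-8');

var_dump($asciiIsAscii, $asciiIsUtf8, $unchanged, $umlautIsAscii, $umlautIsUtf8);
```

This is why iconv appears to do nothing and why "file" keeps reporting us-ascii: a file containing only ASCII bytes is byte-for-byte identical in both encodings, so there is nothing to convert.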

What encoding does MAC Excel use?

I have a client who wants to upload a .csv to the server, where it will be parsed by PHP to generate a table from its data. I'm using iconv to convert to the appropriate encoding (UTF-8). Unfortunately I'm on Windows, so I don't know what the source encoding is.
What encoding would Mac Excel use to generate a .csv? I've tried many different combinations, but none of them work for the French accents, which are, as far as I know, not at the same positions in the Mac's charset as in UTF-8.
For example:
The correct display should be:
'Délégation'
Most source encodings I tried (including using utf8_encode()) give:
'DÈlÈgation'
macintosh to UTF-8 gives:
'D»l»gation'
If I open the .csv file (the one saved on the Mac) on my PC, I see the French 'é' accents as 'È'. Is it possible that saving the file onto my computer (or server) already converted it, so that the 'È' are now the actual character values rather than a misinterpretation of the original encoding?
Hex Dump
Using bin2hex(), the hex dump for the string:
'DÈlÈgation 1' is:
44c86cc8676174696f6e2031
In fact, I'm assuming that the string is DÈlÈgation and not Délégation because when I open the .csv file in Notepad (on my PC), it shows È and not é.
A common encoding for Mac programs to use is MacRoman.
Would it be possible for your client to install the trial version of Apple Numbers from the Apple website, open the .csv file in Numbers, then go to "File", "Export", "CSV", pick "UTF-8" and then "Windows Latin 1", and send you both files?
The Numbers application on a Mac sometimes solves problems like this that come up with Excel.
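A quick way to sanity-check the hex dump from the question is to look at the raw bytes. The sketch below assumes the mbstring extension and relies on two byte values: in MacRoman "é" is the single byte 0x8E, and in UTF-8 it is the two bytes 0xC3 0xA9. The variable names are only for illustration.

```php
<?php
// The hex dump posted in the question.
$bytes = hex2bin('44c86cc8676174696f6e2031');

// Read as ISO-8859-1, byte 0xC8 is "È", which matches what Notepad shows.
$asLatin1 = mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');
echo $asLatin1, PHP_EOL; // DÈlÈgation 1

// Neither the MacRoman byte for "é" nor its UTF-8 sequence occurs in the dump.
$hasMacRomanEacute = strpos($bytes, "\x8E") !== false;
$hasUtf8Eacute = strpos($bytes, "\xC3\xA9") !== false;
var_dump($hasMacRomanEacute, $hasUtf8Eacute); // bool(false) bool(false)
```

Since the dump contains neither form of "é", it may be that the string was already converted somewhere before it was dumped. Running bin2hex() on the untouched upload would show which bytes Excel really wrote.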

echo and UTF-8 (PHP)

I installed Apache on my server (I wasn't using Apache before) and special characters started to display incorrectly.
So I converted every file to UTF-8 and configured MySQL to work with UTF-8, and everything worked fine. However, my Python app (which retrieves some information from the website) no longer works properly.
For example, I had a file "test.php" which returned either 0 or 1. The Python code then acted on that result.
But now my Python app doesn't receive "0", and I don't know what it gets from the website. I made the app send a GET request to my site containing what it had received, and it sent me this: "???0".
What can I do? I tried changing the header to send the result as ISO-8859-1 (as it was before), but that doesn't work either.
It's the BOM (byte order mark). Remove it from the script in the Notepad++ editor (Menu -> Encoding -> Encode in UTF-8 without BOM).
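The three "?" characters fit this diagnosis: a UTF-8 BOM is exactly three bytes (0xEF 0xBB 0xBF), and any bytes before the opening PHP tag are sent as output ahead of the "0". A minimal sketch for detecting and stripping it from a string follows; the helper name strip_utf8_bom is hypothetical.

```php
<?php
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: remove a leading UTF-8 BOM if one is present.
function strip_utf8_bom(string $text): string
{
    if (strncmp($text, UTF8_BOM, 3) === 0) {
        return substr($text, 3);
    }
    return $text;
}

// Simulates what the client receives when the script file starts with a BOM.
$response = UTF8_BOM . '0';
var_dump(strlen($response));         // int(4)
var_dump(strip_utf8_bom($response)); // string(1) "0"
```

The same strncmp() check applied to the contents of test.php (read with file_get_contents()) would confirm whether the script itself starts with a BOM. Re-saving the file without a BOM remains the real fix; stripping on the client side only hides the symptom.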

filename encoding issue

I am receiving a file with a Faroese name and trying to save it in a PHP script:
2010_08_Útflutningur.xls
Ubuntu 10.04 LTS saves it as:
2010_08_�tflutningur.xls (invalid encoding)
I've installed and run utf8-migration-tool, but with no effect.
Is this an Ubuntu error that I can fix, or do I just have to give up and modify the name in PHP?
Thanks
Ubuntu uses UTF-8 for its filenames. In this particular case utf8_encode does the trick, because the original filename is ISO-8859-1 encoded. In other cases I could use iconv, and detect the encoding if it is unknown.
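Note that utf8_encode() is deprecated as of PHP 8.2. A rough equivalent using mbstring is sketched below. The helper name filename_to_utf8 is hypothetical, and ISO-8859-1 as the fallback is an assumption carried over from this particular case.

```php
<?php
// Hypothetical helper: return the filename as UTF-8, converting only when
// the bytes are not already valid UTF-8. The fallback encoding is assumed.
function filename_to_utf8(string $name, string $assumedEncoding = 'ISO-8859-1'): string
{
    if (mb_check_encoding($name, 'UTF-8')) {
        return $name; // already UTF-8 (or plain ASCII), leave untouched
    }
    return mb_convert_encoding($name, 'UTF-8', $assumedEncoding);
}

// The filename from the question with "Ú" as the single ISO-8859-1 byte 0xDA.
$latin1Name = "2010_08_\xDAtflutningur.xls";
$utf8Name = filename_to_utf8($latin1Name);
echo bin2hex(substr($utf8Name, 8, 2)), PHP_EOL; // c39a
```

The validity check keeps names that are already UTF-8 from being encoded a second time, which a bare utf8_encode() call would not prevent.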
The "Ú" is not an Ubuntu error. Your "Ú" character is being treated as an unreadable special character, so it's better to modify the name.
