I am loading images from a MySQL database to display them in a web GUI.
This is pretty standard and worked well, until I tried to install the software in Russia...
Here is an example of the code that loads the image:
// Load overview image
if ($global_mode == 'overview') {
    // Load the image from the database.
    mysql_select_db("$db_x");
    $sql = "SELECT $db_x.sensor_images.image
            FROM $db_x.sensor_images
            WHERE $db_x.sensor_images.image_id = '" . $global_image_id . "'";
    $result = mysql_query($sql);
    $row = mysql_fetch_assoc($result);
    // Image output.
    header('Content-type: image/jpeg');
    echo $row['image'];
}
I have installed the software on many European laptops and never had the problem that images were not displayed...
On Russian laptops (Windows 7, XAMPP, MySQL) this was apparently not the case: images were not displayed.
I did some research and found (on my own laptop, where images are displayed...) that if I change the encoding of the PHP file (in this case show_image.php), I can replicate the error I saw on the Russian laptops.
If the encoding is set to ANSI, the images get displayed...
Here I have disabled the header, so the browser displays the binary data (the encoding of the PHP file is set to ANSI)...
EXAMPLE A
Now I set the encoding of the PHP file to UTF-8.
With this change the images are no longer displayed...
This is the output when I try to display the data without the header...
EXAMPLE B
As you can see, the output is different...
On my laptop (European):
ANSI: images are displayed; the data (without the header) looks like EXAMPLE A
UTF-8: images are not displayed; the data (without the header) looks like EXAMPLE B
On the Russian laptops:
ANSI: images are not displayed; the data (without the header) looks like EXAMPLE B
UTF-8: images are not displayed; the data (without the header) looks like EXAMPLE B
I still don't understand why changing the encoding of a PHP file has an impact on the output of binary data, i.e. an image...
On the Russian laptops the PHP files are always interpreted as if the encoding were set to UTF-8, no matter whether I set it to ANSI or something else...
Please help!
Thx.
In your IDE you see "UTF-8" and "UTF-8 without BOM". You're choosing UTF-8, which in this case means with BOM. The BOM is prepended to the file and is the first thing that's output. This may a) break the output of your header, thereby breaking the data display, and b) give the browser a clue that the following data is supposedly UTF-8 encoded, so the browser interprets the data as UTF-8, which results in a lot of UNICODE REPLACEMENT CHARACTERS �. Check your error logs; you should see PHP complain about "Headers already sent".
The data you're sending is always the same, it's just interpreted in different encodings depending on the machine's default and the presence or absence of a UTF-8 BOM.
The only reason it breaks at all under any circumstances is that you're outputting the wrong headers and/or are sending additional content before or after the image data. Check with a low-level tool like curl what exactly is output, and find and remove anything that doesn't belong.
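If you want to verify this from PHP itself rather than with curl, here is a minimal sketch that checks whether a script starts with the three-byte UTF-8 BOM (the file name show_image.php is taken from the question; adjust it to your own file):
<?php
// Read the first three bytes of the script and compare them to the UTF-8 BOM.
$firstBytes = file_get_contents('show_image.php', false, null, 0, 3);
if ($firstBytes === "\xEF\xBB\xBF") {
    echo "show_image.php starts with a UTF-8 BOM - re-save it as UTF-8 without BOM.\n";
} else {
    echo "No BOM found at the start of show_image.php.\n";
}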
Save the PHP file that sends out the picture as "UTF-8 w/o BOM". If any files are included, it is mandatory that they are saved as either "ANSI" or "UTF-8 w/o BOM", too. There must also be no space or other text before the <?php tag.
If you want to send any text, e.g. an error message because the picture file does not exist, you need to send header("Content-Type: text/html; charset=utf-8"); right before the text in order to display all characters correctly - but not in combination with the image:
<?php
include "/someSafeDir/utils.inc.php";
$pic = secureGet($_GET['pic']);
$imagePath = "/somePath/";
$picture = $imagePath . $pic;
if (file_exists($picture)) {
    if ($fd = fopen($picture, "r")) {
        $fsize = filesize($picture);
        header("Content-type: image/jpeg");
        header("Content-length: $fsize");
        readfile($picture);
        fclose($fd);
    } else {
        echo "File \"$pic\" could not be opened.\n";
    }
} else {
    header("Content-Type: text/html; charset=utf-8");
    echo "File \"$pic\" not existent";
}
?>
It could be that your PHP server charset is not the correct one.
In your php.ini file, try setting the following directive:
default_charset = "UTF-8"
And restart your server.
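As a quick check you can also inspect (and, where allowed, override) the value at runtime; a small sketch:
<?php
// Show which default_charset the running PHP actually uses.
var_dump(ini_get('default_charset'));
// default_charset can normally be changed at runtime as well:
ini_set('default_charset', 'UTF-8');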
Related
Currently I am trying to check with PHP whether a file exists. The file I am checking has an accented character in its name; it is called: 13067-AP-03 A - Situation projetée.pdf.
The code I use to check if the file exists is:
$filename = 'C:/13067-AP-03 A - Situation projetée.pdf';
if (file_exists($filename)) {
    echo "The file exists";
} else {
    echo "The file does not exist";
}
The problem I am facing is that whenever I check whether the file exists, I get the message that it does not. If I remove the é, I get the message that the file does exist.
It looks like PHP somehow doesn't recognize the file if its name contains an accented character. I tried the following:
urlencode($filename);
addslashes($filename);
utf8_encode($filename);
None of which worked. I also tried:
setlocale(LC_ALL, "en_US.utf8");
Maybe worth noting: when I get the filename straight from PHP, I get the following:
13067-AP-03 A - Situation projet�e.pdf
I have to do the following to have the filename displayed correctly:
$filename = iconv( "CP437", 'UTF-8', $filename);
I was wondering if someone had the same problem before and could help me out with this one. All help is greatly appreciated.
For those who are interested, the script runs on a windows machine.
Strangely, this worked: I copied all the source code from Sublime Text 3 into Notepad and then saved it from Notepad, overwriting the PHP file.
Now when I check to see if the file exists it shows the following filename that exists:
13067-AP-03 A - Situation projet�e.pdf
The only problem I am facing now is that I want to download the file using file_get_contents, but file_get_contents doesn't interpret the � as the é.
I think it's a problem with PHP under Windows. I downloaded a Windows binary of PHP onto my Windows machine, which is set to Japanese, and successfully reproduced your problem.
According to https://bugs.php.net/bug.php?id=47096
So, if you have a generic name of a file (along with its path) as a Unicode string $u (for example UTF-8 encoded) and you want to try to save it with that name under Windows, you must first check the current locale calling setlocale(LC_CTYPE, 0) to retrieve the current code page, then you must convert $u to an array of bytes according to the code page; if one or more code points have no counterpart in the current code page, the file cannot be saved with that name from PHP. Dot.
My code page is CP932; you can see yours by running chcp in cmd.
So the code is expected to be:
$filename='C:\Users\Frederick\Desktop\13067-AP-03 A - Situation projetée.pdf';
$filename=mb_convert_encoding($filename, 'CP932', 'UTF-8');
var_dump($filename);
var_dump(file_exists($filename));
But this won't work! Why? Because CP932 doesn't contain the character é!
According to https://msdn.microsoft.com/en-us/library/windows/desktop/dd317748%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396
NTFS stores file names in Unicode. In contrast, the older FAT12, FAT16, and FAT32 file systems use the OEM character set.
Windows itself uses UTF-16LE (which Microsoft calls Unicode) to store its file names, but PHP does not support UTF-16LE encoded file names.
In conclusion, it's a pity that I cannot find a way to solve the problem other than avoiding such characters in file names if you work on Windows. I also don't think the PHP team will solve this in the near future.
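For a Western-European machine, however, the conversion shown above does work, because code pages such as CP1252 do contain é. A sketch assuming the machine's code page is Windows-1252 (check with chcp and adjust):
<?php
// Convert the UTF-8 file name to the ANSI code page that Windows/PHP expect.
$utf8Name  = 'C:/13067-AP-03 A - Situation projetée.pdf';
$localName = mb_convert_encoding($utf8Name, 'Windows-1252', 'UTF-8');
var_dump(file_exists($localName));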
Make sure that your text editor is saving the file as "UTF-8 without BOM"
The BOM is the Byte Order Mark, a few bytes placed at the start of the file. For UTF-16 it lets reading software determine whether the file was saved little-endian or big-endian; the UTF-8 BOM is simply the three bytes EF BB BF. The PHP interpreter does not strip these bytes - they end up in the output before your code runs - so you must save the file without the byte order mark.
Try this at the start of your PHP file:
<?php
header('Content-Type: text/html; charset=utf-8');
?>
I use this code to retrieve and display an image:
header("Content-type: image/png");
echo file_get_contents(site_domain().image_asset_module_url("header.png",$this->name));
on my local WAMP setup it works, but on the remote server file_get_contents returns what looks like wrongly encoded data:
Local:
‰PNG IHDR^jRÀ2¡ pHYsÒÝ~üÿÿIDATxÚ콘Uõµþ¿`ŠŠÔéÃÕ¨¹&&ù'77¹i¦˜è‰=V:RlH‡™aAlH™B¯Jbh...
Remote:
�PNG IHDR^jR�2� pHYs��~���IDATx����U����`������ը�&&�'77�i��草=V:Rl...
If I use utf8_encode I get:
PNG IHDR^jRÀ2¡ pHYsÒÝ~üÿÿIDATxÚì½Uõµþ¿`ÔéÃÕ¨¹&&ù'77¹i¦è=V:RlHaAlHB¯Jbh...
So on my remote server I always get a broken picture - why, and what is the solution?
The data is always the same. file_get_contents does not alter data in any way. You're also not dealing with text in some encoding, but with binary data. Any sort of text-encoding or conversion thereof does not apply here.
Your first sample is the binary image data as interpreted as Latin-1 encoded text.
Your second sample is the same binary data as interpreted as UTF-8 encoded text.
I.e., the data is fine, the interpretation is wrong. The interpretation should be set by the Content-Type header, perhaps this is not being set correctly on the remote server. For this problem, inspect the raw HTTP response headers and see How to fix "Headers already sent" error in PHP.
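A small PHP-side sketch to go with that advice: headers_sent() tells you whether something (a BOM, whitespace, a warning) was already output before you tried to send the image header. $data here stands in for the binary image string fetched earlier.
<?php
// If output already started, log where it started instead of silently sending a broken image.
if (headers_sent($file, $line)) {
    error_log("Output already started in $file on line $line - the image will be corrupted");
} else {
    header('Content-Type: image/png');
}
echo $data;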
I would rather have used:
<?php
$file = 'http://url/to_image.png';
$data = file_get_contents($file);
header('Content-type: image/png');
echo $data;
Or you can try this:
$remoteImage = "http://www.example.com/gifs/logo.gif";
$imginfo = getimagesize($remoteImage);
header("Content-type: " . $imginfo['mime']);
readfile($remoteImage);
I'm retrieving data from my Postgres DB in UTF-8. The db and the client_connection settings are in UTF-8.
Then I send 2 headers to the visitor:
header("Content-Type: application/msexcel");
header("Content-Disposition: $mode; filename=export.xls");
and start outputting plain text data in a CSV-like manner. This will open as a simple Excel file on the visitor's desktop.
$cols = array ("col1", "col2", "col3");
echo implode("\t", $cols)."\r\n";
Works fine, until special characters like é, è etc. are encountered.
I tried changing my client_encoding to latin-1 while retrieving the data from the db, which works in most cases but not for all characters. So that is not a solution.
How can I send the output file as UTF-8? I don't think converting the data from the db to latin-1 is possible, since some characters don't exist in latin-1... so I need Excel to treat the file as UTF-8.
I'd look into using the PHPExcel engine. It uses UTF-8 as default and it can generate a whole list of spreadsheet file types (Excel, OpenOffice, CSV, etc.).
I would recommend not sending plain-text and masquerading it as Excel. XLS files are typically binary, and while binary isn't required, the official Excel method of using non-binary data is to format it as XML.
You mention "CSV" in the title, but nothing about your problem includes anything related to CSV. I bring this up because I believe that you should actually change your tabs to commas, and then you could simply output a standard .csv file, which is read by Excel still but doesn't rely on undocumented or unstable functionality.
If you truly want to send application/msexcel, then you should use a real Excel library, because currently, you are not creating a real Excel file.
Use ; charset=UTF-8 after application/xxxxxx. I use:
header("Content-Type: application/vnd.ms-excel; charset=UTF-8");
// header("Content-Length: " . strlen($thecontent)); // this is not mandatory
header('Content-Disposition: attachment; filename="file.xls"');
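A minimal sketch of this approach, additionally writing a UTF-8 BOM so Excel detects the encoding (a trick also mentioned further down in this document); $rows is assumed to hold arrays of already-UTF-8-encoded strings:
<?php
header('Content-Type: application/vnd.ms-excel; charset=UTF-8');
header('Content-Disposition: attachment; filename="export.xls"');
echo "\xEF\xBB\xBF"; // UTF-8 BOM so Excel picks up the encoding
foreach ($rows as $row) {
    echo implode("\t", $row) . "\r\n";
}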
Try mb_convert_encoding function.
Try to use iconv, for converting string into required charset.
Have you tried utf8_encode() on the strings?
So something like: echo implode("\t", array_map('utf8_encode', $cols)) . "\r\n";
Not sure if that would work, but give it a go.
I am building a data import tool for the admin section of a website I am working on. The data is in both French and English, and contains many accented characters. Whenever I attempt to upload a file, parse the data, and store it in my MySQL database, the accents are replaced with '?'.
I have text files containing data (charset is iso-8859-1) which I upload to my server using CodeIgniter's file upload library. I then read the file in PHP.
My code is similar to this:
$this->upload->do_upload();
$data = array('upload_data' => $this->upload->data());
$fileHandle = fopen($data['upload_data']['full_path'], "r");
while (($line = fgets($fileHandle)) !== false) {
    echo $line;
}
This produces lines with the accents replaced by '?'. Everything else is correct.
If I download my uploaded file from the server over FTP, the charset is still iso-8859-1, but a diff reveals that the file has changed. However, if I open the file in TextEdit, it displays properly.
I attempted to use PHP's stream_encoding method to explicitly set my file stream to iso-8859-1, but my build of PHP does not have the method.
After running out of ideas, I tried wrapping my strings in both utf8_encode and utf8_decode. Neither worked.
If anyone has any suggestions about things I could try, I would be extremely grateful.
It's important to see whether the corruption is happening before or after the query is issued to MySQL. There are too many possible things happening here to pinpoint it. Are you able to output your MySQL query to check this?
Assuming that your query IS properly formed (no corruption at the stage the query is being outputted) there are a couple of things that you should check.
What is the character encoding of the database itself? (collation)
What is the Charset of the connection - this may not be set up correctly in your mysql config and can be manually set using the 'SET NAMES' command
In my own application I issue a 'SET NAMES utf8' as my first query after establishing a connection as I am unable to change the MySQL config.
See this.
http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html
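A minimal sketch of the 'SET NAMES' approach mentioned above, assuming a plain PDO connection (with CodeIgniter you would normally set char_set to utf8 in the database config instead):
<?php
// Connection parameters are placeholders - adjust to your environment.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'password');
// First query after connecting, as described above:
$pdo->exec("SET NAMES utf8");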
Edit: If the issue is not related to MySQL, I'd check the following:
You say the encoding of the file is 'charset is iso-8859-1' - can I ask how you are sure of this?
What happens if you save the file itself as utf8 (Without BOM) and try to reprocess it?
What is the encoding of the php file that is performing the conversion? (What are you using to write your php - it may be 'managing' this for you in an undesired way)
(an aside) Are the files you are processing suitable for processing using fgetcsv instead?
http://php.net/manual/en/function.fgetcsv.php
Files uploaded to your server should be returned the same on download. That means, the encoding of the file (which is just a bunch of binary data) should not be changed. Instead you should take care that you are able to store the binary information of that file unchanged.
To achieve that with your database, create a BLOB field. That's the right column type for it. It's just binary data.
Assuming you're using MySQL, this is the reference: The BLOB and TEXT Types, look out for BLOB.
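A short sketch of storing an uploaded file unchanged in a BLOB column; the table name uploads and column content are hypothetical, PDO is assumed, and the upload path comes from the question's CodeIgniter code:
<?php
$pdo  = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'password');
$stmt = $pdo->prepare("INSERT INTO uploads (content) VALUES (?)");
// Bind the raw bytes of the uploaded file as a LOB so nothing gets re-encoded.
$stmt->bindValue(1, file_get_contents($data['upload_data']['full_path']), PDO::PARAM_LOB);
$stmt->execute();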
The problem is that you are using iso-8859-1 instead of utf-8. To convert the data to the correct charset, you can use the iconv function, like so:
$output_string = iconv("ISO-8859-1", "UTF-8//TRANSLIT", $input_string);
iso-8859-1 cannot represent every character that UTF-8 can.
It would be so much better if everything were utf-8, as it handles virtually every character known to man.
I am creating a file using PHP's fwrite() and I know all my data is in UTF-8 (I have done extensive testing on this: when saving the data to the db and outputting it on a normal web page, everything works fine and reports as UTF-8). But I am being told the file I am outputting contains non-UTF-8 data :( Is there a command in bash (CentOS) to check the encoding of a file?
When using vim it shows the content as:
Donâ~#~Yt do anything .... Itâ~#~Ys a
great site with
everything....Weâ~#~Yve only just
launched/
Any help would be appreciated: either confirming the file is UTF-8 or telling me how to write UTF-8 content to a file.
UPDATE
To clarify how I know my data is in UTF-8, I have done the following:
The db is set to utf8. When saving data to the database I run this first:
$enc = mb_detect_encoding($data);
$data = mb_convert_encoding($data, "UTF-8", $enc);
Just before I run fwrite I check the data with the code below; each piece of data returns 'IS utf-8':
if (strlen($data)==mb_strlen($data, 'UTF-8')) print 'NOT UTF-8';
else print 'IS utf-8';
Thanks!
If you know the data is in UTF-8, then you want to set up the file's header (its first bytes) accordingly.
I wrote a solution answering another thread.
The solution is the following: as the UTF-8 byte-order mark is \xef\xbb\xbf, we should add it to the beginning of the document.
<?php
function writeStringToFile($file, $string){
    $f = fopen($file, "wb");
    $string = "\xEF\xBB\xBF" . $string; // prepend the UTF-8 BOM - this is what makes the magic
    fputs($f, $string);
    fclose($f);
}
?>
You can adapt it to your code, basically you just want to make sure that you write a UTF8 file (as you said you know your content is UTF8 encoded).
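For instance, a hypothetical call to the helper above (file name and content are placeholders):
<?php
writeStringToFile('export.txt', "Don’t do anything... It’s a great site.\r\n");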
fwrite() writes the bytes it is given, but the mode in which the file was opened can still mangle your data - be it correctly encoded or not - on some systems.
To be on the safe side, you should open the file with fopen() using the binary mode flag, that's b. fwrite() will then save your string data "as is", and in PHP that means as binary data, because strings in PHP are binary strings.
Background: some systems distinguish between text and binary data. The binary flag explicitly tells PHP on such systems to use binary output. When you deal with UTF-8 you should take care that the data does not get mangled; handling the string data as binary data prevents that.
However: if, contrary to what you said in your question, the UTF-8 encoding of the data is not preserved, then your encoding got broken earlier and even binary-safe handling will keep it broken. With the binary flag you at least ensure that it is not the fwrite() part of your application that is breaking things.
It has rightfully been written in another answer here that you cannot know the encoding if you only have the data. However, you can check whether the data is valid UTF-8 or not, giving you at least some chance to verify the encoding. I have posted a PHP function which does this in a UTF-8 related question, so it might be of use if you need to debug things: Answer to: SimpleXML and Chinese - look for can_be_valid_utf8_statemachine, that's the name of the function.
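If the mbstring extension is available, a simpler alternative to a hand-rolled state machine is mb_check_encoding(); a sketch:
<?php
// Returns true only if $data is a valid UTF-8 byte sequence.
if (mb_check_encoding($data, 'UTF-8')) {
    print 'IS utf-8';
} else {
    print 'NOT UTF-8';
}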
//add BOM to fix UTF-8 in Excel
fputs($fp, $bom =( chr(0xEF) . chr(0xBB) . chr(0xBF) ));
I find this piece works for me :)
The problem is your data is double encoded. I assume your original text is something like:
Don’t do anything
with ’, i.e., not the straight apostrophe, but the right single quotation mark.
If you write a PHP script with this content and encoded in UTF-8:
<?php
//File in UTF-8
echo utf8_encode("Don’t"); //this will double encode
You will get something similar to your output.
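The counterpart, for comparison: if the script file itself is already saved as UTF-8, echo the string as-is and do not call utf8_encode() on it again.
<?php
// File saved as UTF-8: the string is already UTF-8, so no re-encoding is needed.
echo "Don’t";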
$handle = fopen($file, "w");
fwrite($handle, pack("CCC", 0xef, 0xbb, 0xbf)); // UTF-8 BOM first
fwrite($handle, $data);                         // then the string content to be saved
fclose($handle);
I know all my data is in UTF8 - wrong.
The encoding is not part of the file itself. So, check the charset in the headers of the page you are taking the data from:
header("Content-type: text/html; charset=utf-8;");
And check whether the data really contains multi-byte characters:
if (strlen($data)==mb_strlen($data, 'UTF-8')) print 'not UTF-8';
else print 'utf-8';
One possible reason: the data you get from the database is not UTF-8 in the first place.
If you are sure that it is, use this; I always use it and it works:
$file= fopen('../logs/logs.txt','a');
fwrite($file,PHP_EOL."_____________________output_____________________".PHP_EOL);
fwrite($file,print_r($value,true));
The only thing I had to do was add a UTF-8 BOM to the CSV. The data was correct, but the file reader (an external application) couldn't read the file properly without the BOM.
Try this simple approach: add the following to the top of the page, before the <body> tag:
<head>
<meta charset="utf-8">
</head>