PHP File Opening Encoding Problem?

When I try to open a .log file created by a game in PHP, I get output like this:
ÿþ*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*�*� �
K�2� �E�n�g�i�n�e� �s�t�a�r�t� �u�p�.�.�.� �
[�2�0�0�9�/�2�2�/�0�9�]� �
[�1�6�:�0�7�:�3�3�]� �
[�0�.�1�.�4�6�.�0�]� �
[�0�]� �
I have no idea why. My code is:
$file = trim($_GET['id']);
$handle = @fopen($file, "r"); // read-only; "a+" would also create the file if it were missing
if ($handle) {
    print "<table>";
    while (!feof($handle)) {
        $buffer = stream_get_line($handle, 10000, "\n");
        echo "<tr><td width=10>" . __LINE__ . "</td><td>" . $buffer . "</td></tr>";
    }
    print "</table>";
    fclose($handle);
}
I'm using stream_get_line because it is apparently better for large files?

PHP doesn't really know much about encodings. In particular, it knows nothing about the encoding of your file.
The data looks like UTF-16LE, so you'll need to convert it into something you can handle, or, since you're just printing, you could make the entire script output its HTML as UTF-16LE as well.
I would probably prefer converting to UTF-8 and using that as the page encoding, so you're sure no characters are lost. Take a look at iconv, assuming it's available (a PHP extension is required on Windows, I believe).
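A minimal sketch of the iconv route, assuming the whole log fits in memory and the iconv extension is enabled. The helper name `utf16le_to_utf8` and the file name `game.log` are made up for illustration. It converts the raw UTF-16LE bytes to UTF-8 so the page can be served as UTF-8:

```php
<?php
// Hypothetical helper: convert raw UTF-16LE bytes to a UTF-8 string.
// Assumes the whole file fits in memory. A leading byte order mark is
// NOT removed here; iconv() would typically keep it as U+FEFF.
function utf16le_to_utf8(string $bytes): string
{
    $converted = iconv('UTF-16LE', 'UTF-8', $bytes);
    if ($converted === false) {
        throw new RuntimeException('Input is not valid UTF-16LE');
    }
    return $converted;
}

// Usage ('game.log' is a placeholder file name):
// header('Content-Type: text/html; charset=UTF-8');
// echo htmlspecialchars(utf16le_to_utf8(file_get_contents('game.log')));
```

Converting the whole buffer at once avoids splitting a two-byte UTF-16 character between reads.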
Note that whatever you do, you should strip the first two bytes of the first line, assuming the encoding is always the same. In the data you're showing, these bytes (FF FE, displayed as "ÿþ") are the byte order mark, which tells us the file's encoding (UTF-16LE, as I mentioned earlier).
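A small sketch of stripping that BOM before converting, assuming the file really is UTF-16LE. The helper `strip_utf16le_bom` and the sample bytes are hypothetical:

```php
<?php
// Hypothetical helper: drop a leading UTF-16LE byte order mark.
// The BOM is the two bytes FF FE, which show up as "ÿþ" when the
// file is misread as ISO-8859-1/Windows-1252.
function strip_utf16le_bom(string $bytes): string
{
    if (strncmp($bytes, "\xFF\xFE", 2) === 0) {
        return substr($bytes, 2);
    }
    return $bytes; // no BOM: leave the data untouched
}

// Strip the BOM first, then convert the remaining bytes to UTF-8.
$raw  = "\xFF\xFEK\x002\x00"; // "K2" as UTF-16LE with a BOM (sample data)
$text = iconv('UTF-16LE', 'UTF-8', strip_utf16le_bom($raw));
echo $text, "\n"; // K2
```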
However, since it appears to be plain text and all you're doing is printing it, consider simply opening it in a plain text editor that supports Unicode. Not knowing your operating system, I'm hesitant to suggest a specific one, but if you're on Windows and the file is relatively small, Notepad can do it.
As a side note, __LINE__ does not give you the line number of the file you're reading; it gives the line number of the script line that is currently executing.
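A sketch of numbering the lines with a plain counter instead, assuming the text has already been decoded to UTF-8. The function name `lines_to_table` is hypothetical; it also escapes each line with htmlspecialchars() so log content can't break the table markup:

```php
<?php
// Hypothetical helper: render already-decoded UTF-8 text as an HTML table,
// numbering the lines of the *file* with a counter instead of __LINE__.
function lines_to_table(string $utf8Text): string
{
    $html = "<table>\n";
    $lineNo = 0;
    // Split on both Windows (\r\n) and Unix (\n) line endings.
    foreach (preg_split('/\r\n|\n/', $utf8Text) as $line) {
        $lineNo++;
        $html .= '<tr><td width="10">' . $lineNo . '</td><td>'
            . htmlspecialchars($line, ENT_QUOTES, 'UTF-8') . "</td></tr>\n";
    }
    return $html . "</table>\n";
}

echo lines_to_table("Engine start up...\r\n[0.1.46.0]");
```

A trailing newline at the end of the file produces one final empty row; trim the text first if that matters.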

You might be running into a UTF-8 Byte Order Mark: http://en.wikipedia.org/wiki/Byte-order_mark
Try reading it like so:
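<?php
// Opens a file and reads past the UTF-8 BOM (EF BB BF) if it is there.
function fopen_utf8($filename, $mode) {
    $file = @fopen($filename, $mode);
    if ($file === false) {
        return false;
    }
    $bom = fread($file, 3);
    if ($bom !== "\xEF\xBB\xBF") {
        // No BOM: go back to the start so no data is lost.
        rewind($file);
    }
    // Otherwise the pointer is already just past the BOM.
    return $file;
}
?>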
From: http://us3.php.net/manual/en/function.fopen.php#78308
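The function above only recognizes the UTF-8 BOM, while the file in the question starts with FF FE. A sketch that reports which BOM (if any) a file starts with and how many bytes to skip; the name `detect_bom` and the placeholder path are assumptions, and UTF-32 BOMs are deliberately not handled:

```php
<?php
// Hypothetical helper: look at the first bytes of a file and report which
// byte order mark it starts with, plus the BOM's length in bytes.
// UTF-32 is not handled: a UTF-32LE BOM (FF FE 00 00) is reported as UTF-16LE.
function detect_bom(string $head): ?array
{
    $boms = [
        'UTF-8'    => "\xEF\xBB\xBF",
        'UTF-16LE' => "\xFF\xFE",
        'UTF-16BE' => "\xFE\xFF",
    ];
    foreach ($boms as $encoding => $bom) {
        if (strncmp($head, $bom, strlen($bom)) === 0) {
            return [$encoding, strlen($bom)];
        }
    }
    return null; // no recognised BOM
}

// Usage with a real file (path is a placeholder):
// $fh = fopen('game.log', 'rb');
// $found = detect_bom(fread($fh, 4));
// fseek($fh, $found ? $found[1] : 0); // skip the BOM if there is one
```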

Related

PHP: converting a UCS-2 LE BOM string to UTF-8 stops working once I write the string to a file

I am having a hard time doing what should be the simplest thing:
I have a UCS-2 LE BOM encoded file that I am converting to UTF-8.
Notepad++ reports the encoding as UCS-2 LE BOM.
My conversion routine is simple:
I open the input file and create an output file.
I parse the input file and convert every line to UTF-8 on the fly.
Once the conversion is done, I remove the input file.
Once the input file is removed, I rename my output file to the name of the input file.
Here is the code that does it:
public function convertCsvToUtf8(string $absolutePathToFile) : string {
    $dotPosition = strrpos($absolutePathToFile, ".");
    $absolutePathToNewFile = substr($absolutePathToFile, 0, $dotPosition)."-utf8.csv";
    $res_input_file = fopen($absolutePathToFile, "r");
    $res_output_file = fopen($absolutePathToNewFile, "w+");
    while($input_string = fgets($res_input_file)){
        $inputEncoding = mb_detect_encoding($input_string, mb_list_encodings(), true);
        $output_string = iconv($inputEncoding, 'UTF-8', $input_string);
        fputs($res_output_file, ($output_string));
    }
    fclose($res_input_file);
    fclose($res_output_file);
    unlink($absolutePathToFile);
    rename($absolutePathToNewFile, $absolutePathToFile);
    return $absolutePathToFile;
}
Here you can see an example of an execution:
So... everything seems to be okay at first glance (except that the "°" is replaced by a weird character), but when I open the output file with Notepad++, here is a sample of what I see:
I have no idea what is going on here.
Any help would be awesome!
Feel free to ask for more details!
Thanks in advance,
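One possible approach, offered as a sketch rather than as the thread's answer: decode the whole file at once with the source encoding named explicitly. Reading a UTF-16LE file line by line with fgets() is risky, because fgets() stops at the single byte 0x0A while a UTF-16LE newline is the two bytes 0A 00, so the 00 ends up at the start of the next "line" and shifts every following character by one byte. The function name and the temporary file suffix are hypothetical, and the file is assumed to fit in memory:

```php
<?php
// Sketch: convert a UCS-2 LE / UTF-16LE file (with or without BOM) to UTF-8
// in one pass, replacing the original file.
function convert_utf16le_file_to_utf8(string $path): void
{
    $bytes = file_get_contents($path);
    if ($bytes === false) {
        throw new RuntimeException("Cannot read $path");
    }
    if (strncmp($bytes, "\xFF\xFE", 2) === 0) {
        $bytes = substr($bytes, 2); // drop the UTF-16LE BOM
    }
    // Decode the whole buffer at once, naming the source encoding
    // explicitly instead of guessing it per line.
    $utf8 = mb_convert_encoding($bytes, 'UTF-8', 'UTF-16LE');
    $tmp = $path . '.utf8tmp'; // hypothetical temporary file name
    if (file_put_contents($tmp, $utf8) === false) {
        throw new RuntimeException("Cannot write $tmp");
    }
    rename($tmp, $path); // replace the original with the converted copy
}
```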

Encoding issue with PHP while writing in a .csv file

I'm working with a PHP array that contains values parsed in a previous scraping step (using Simple HTML DOM Parser). I can print/echo the values of this array, which contain special characters (é, à, è, etc.), without problems. BUT the problem is the following:
When I use fwrite to save the values in a .csv file, some characters are not saved correctly. For example, Székesfehérvár displays correctly in my PHP view in HTML, but is saved as Székesfehérvár in the .csv file that I generate with the PHP script below.
I've already set up several things in the PHP script:
The page I'm scraping seems to be UTF-8 encoded.
My PHP script also declares UTF-8 in its header.
I've tried many iconv and mb_* encoding functions in different places in the code.
NOTE: when I console.log my PHP array in JavaScript (using json_encode), the characters are also broken, which may be linked to the original encoding of the page I'm scraping.
Here's the part of the script that writes the values to the .csv file:
<?php
$data = array(
    array("item1", "item2"),
    array("item1", "item2"),
    array("item1", "item2"),
    array("item1", "item2")
    // ...
);
// filename
$filename = 'myFileName.csv';
$string_txt = ""; // the whole content of the .csv, built up as a string
foreach ($data as $line) {
    // builds one line of the .csv
    $line_txt = "";
    foreach ($line as $item) {
        // each line of the .csv holds the values of the PHP subarray, tab-separated
        $line_txt .= $item . "\t";
    }
    // PHP end-of-line constant, starts the next line of the .csv
    $line_txt .= PHP_EOL;
    // append the line to the global content of the .csv
    $string_txt .= $line_txt;
}
// write the string to the .csv file
$file = fopen($filename, 'w+');
fwrite($file, $string_txt);
fclose($file);
I'm stuck because I can't save values with accented characters correctly.
Put this line in your code:
header('Content-Type: text/html; charset=UTF-8');
Hope this helps you!
Try this:
$file = fopen('myFileName.csv', 'w');
foreach ($data as $row) {
    fputcsv($file, array_map("utf8_decode", $row)); // UTF-8 -> ISO-8859-1, field by field
}
fclose($file);
Excel has trouble displaying UTF-8 encoded CSV files; I've seen this before. You can try adding a UTF-8 BOM. I tried it and it works for me. This simply means adding these bytes once, at the start of your UTF-8 string:
$string_txt = chr(239) . chr(187) . chr(191) . $string_txt; // prepend the BOM before calling fwrite()
For more info:
Encoding a string as UTF-8 with BOM in PHP
Alternatively, you can use Excel's file import feature and make sure File origin is set to "65001 : Unicode (UTF-8)". It should then display your text properly; you will need to save it as an Excel file to preserve the format.
The solution (provided by @misorude):
When scraping HTML content from web pages, there is a difference between what's displayed in your debug output and what the script has actually scraped. I had to use html_entity_decode so that PHP works with the real characters of the scraped HTML code, not with the browser's rendering of it.
To verify that values were retrieved correctly before storing them somewhere, you can console.log them in JavaScript:
PHP
// decode the numeric HTML entities that represent "Sóstói Stadion"
$b = html_entity_decode("S&#243;st&#243;i Stadion");
JavaScript (to test):
<script>
var b = <?php echo json_encode($b) ;?>;
// prints "Sóstói Stadion" correctly
console.log(b);
</script>
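A sketch combining two suggestions from this thread: decode numeric HTML entities left over from scraping, then write tab-separated output that starts with a single UTF-8 BOM so Excel can detect the encoding. The function names are hypothetical, and the explicit enclosure and empty escape arguments to fputcsv() are a choice made here, not something the thread specifies:

```php
<?php
// Decode numeric (and named) HTML entities into real UTF-8 characters.
function clean_scraped(string $s): string
{
    // "Sz&#233;kesfeh&#233;rv&#225;r" -> "Székesfehérvár" (UTF-8)
    return html_entity_decode($s, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// Build tab-separated text in memory, prefixed with one UTF-8 BOM.
function rows_to_tsv_with_bom(array $rows): string
{
    $fh = fopen('php://temp', 'w+');
    foreach ($rows as $row) {
        // Explicit enclosure/escape arguments avoid relying on defaults.
        fputcsv($fh, array_map('clean_scraped', $row), "\t", '"', '');
    }
    rewind($fh);
    $body = stream_get_contents($fh);
    fclose($fh);
    return "\xEF\xBB\xBF" . $body; // BOM once, at the very start
}

// Usage: file_put_contents('myFileName.csv', rows_to_tsv_with_bom($data));
```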

Trying to export CSV for Microsoft Excel with special characters like Chinese, but it fails

I have a web app where I am trying to export to CSV from a database.
It runs perfectly with the English character set, but when I put some Chinese text in the database, my CSV shows garbage characters like ????.
<?php
if (isset($_GET['csv']))
{
    $con = mysqli_connect(global_dbhost, global_dbusername, global_dbpassword, global_dbdatabase);
    $query = 'SELECT CONCAT("TC00", `t_id`),m_id,s_id,t_name,Description,start_date,end_date,start_time,end_time,status,active FROM tc_task';
    $today = date("dmY");
    // Replace this line with what is appropriate for your DB abstraction layer
    $sql_csv = mysqli_query($con, $query) or die("Error: " . mysqli_error($con));
    file_put_contents("csvLOG.txt", "\n inside ajax", FILE_APPEND);
    header("Content-Type: application/octet-stream");
    header("Content-Disposition: attachment; filename=caring_data" . $today . ".csv");
    while ($row = mysqli_fetch_row($sql_csv)) {
        print '"' . stripslashes(implode('","', $row)) . "\"\n";
    }
    exit;
}
?>
Solution available here:
Open in Notepad (or equivalent)
Re-save the CSV as Unicode (not UTF-8)
Open in Excel
Profit
Excel does not handle UTF-8 well: if you go to the import options for CSV files, you will notice there is no choice for UTF-8 encoding. Since "Unicode" (which in Notepad means UTF-16LE with a byte order mark) is supported, the steps above should work, though they add an extra step. The equivalent can likely be done on the PHP side, if you save as Unicode instead of UTF-8. This seems doable according to this page, which suggests:
$unicode_str_for_Excel = chr(255).chr(254).mb_convert_encoding( $utf8_str, 'UTF-16LE', 'UTF-8');
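A sketch of that UTF-16LE route applied to a whole export: build the CSV as UTF-8, convert it to UTF-16LE and prepend the FF FE byte order mark (the same bytes as chr(255).chr(254) above). The function name is hypothetical, and the tab default is an assumption: Excel is often reported to split UTF-16 text files on tabs more reliably than on commas, so test with your Excel version:

```php
<?php
// Build CSV text from rows, then return it as UTF-16LE with a BOM.
function csv_for_excel_utf16le(array $rows, string $sep = "\t"): string
{
    $fh = fopen('php://temp', 'w+');
    foreach ($rows as $row) {
        fputcsv($fh, $row, $sep, '"', ''); // rows are assumed to be UTF-8
    }
    rewind($fh);
    $utf8 = stream_get_contents($fh);
    fclose($fh);
    return "\xFF\xFE" . mb_convert_encoding($utf8, 'UTF-16LE', 'UTF-8');
}

// Usage in a download script (file name is a placeholder):
// header('Content-Type: text/csv');
// header('Content-Disposition: attachment; filename=export.csv');
// echo csv_for_excel_utf16le($rows);
```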

Excel csv export into a php file with fgetcsv

I'm using Excel 2010 Professional Plus to create an Excel file.
Later on, I export it as a UTF-8 .csv file.
I do this by saving it as CSV (symbol-separated; sorry, I don't know the exact wording, as I don't have the English version and I fear it is not translated 1:1).
There I click Tools -> Web Options and select Unicode (UTF-8) as the encoding.
The example .csv is as follows:
ID;englishName;germanName
1;Austria;Österreich
So far so good, but if I now open the file with my PHP code:
header('Content-Type: text/html; charset=UTF-8');
iconv_set_encoding("internal_encoding", "UTF-8");
iconv_set_encoding("output_encoding", "UTF-8");
setlocale(LC_ALL, 'de_DE.utf8');
$fp=fopen($filePathName,'r');
while (($dataRow = fgetcsv($fp, 0, ";", '"')) !== FALSE)
{
    print_r($dataRow);
}
I get "�sterreich" as the result on screen (since that is the "error", I've cut all other parts of the result).
If I open the file with Notepad++ and look at the encoding, I see "ANSI" instead of UTF-8.
If I change the encoding in Notepad++ to UTF-8, the ö, ä, ... are replaced by special characters, which I have to correct manually.
If I go another route and create a new UTF-8 file with Notepad++ containing the same data as the Excel file, I get "Österreich" on screen when I open it with the PHP file.
So my question is: why doesn't it work with the file from Excel? Am I doing something wrong, or am I overlooking something?
Edit:
As the program will eventually be installed on Windows servers provided by customers, I need a solution that doesn't require installing additional tools (PHP libraries etc. are OK, but having to install VMware, Cygwin, etc. is not).
Also, there won't be an Excel (or Office) installation on the server, as the customer will upload the .csv file via a file upload dialog. (The dialog itself is not part of the problem, as I know how to handle those; I stumbled over the problem itself when I created an Excel file and converted it to .csv on a test machine where Excel was installed.)
Thanks
From the PHP documentation for fgetcsv():
Locale setting is taken into account by this function. If LANG is e.g. en_US.UTF-8, files in one-byte encoding are read wrong by this function.
You can try:
header('Content-Type: text/html; charset=UTF-8');
$fp = fopen("log.txt", "r");
echo "<pre>";
while (($dataRow = fgetcsv($fp, 1000, ";")) !== FALSE) {
    // convert each field from ISO-8859-1 to UTF-8
    $dataRow = array_map("utf8_encode", $dataRow);
    print_r($dataRow);
}
fclose($fp);
Output
Array
(
[0] => ID
[1] => englishName
[2] => germanName
)
Array
(
[0] => 1
[1] => Austria
[2] => Österreich
)
I don't know why Excel is generating an ANSI file instead of UTF-8 (as you can see in Notepad++), but if that is the case, you can convert the file using iconv:
iconv --from-code=ISO-8859-1 --to-code=UTF-8 my_csv_file.csv > my_csv_file_utf8.csv
And for people in the Czech Republic (Windows-1250 files):
function convert( $str ) {
    return iconv( "CP1250", "UTF-8", $str );
}
...
while (($data = fgetcsv($this->fhandle, 1000, ";")) !== FALSE) {
    $data = array_map( "convert", $data );
    ...
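For servers where the iconv command-line tool isn't available (the question mentions customer Windows servers), the same file conversion can be sketched in PHP itself, assuming the PHP iconv extension is enabled and the file fits in memory. The function name is hypothetical:

```php
<?php
// Sketch: PHP equivalent of
//   iconv --from-code=ISO-8859-1 --to-code=UTF-8 in.csv > out.csv
function convert_file_to_utf8(string $in, string $out, string $from = 'ISO-8859-1'): void
{
    $bytes = file_get_contents($in);
    if ($bytes === false) {
        throw new RuntimeException("Cannot read $in");
    }
    $utf8 = iconv($from, 'UTF-8', $bytes);
    if ($utf8 === false) {
        throw new RuntimeException("Cannot convert $in from $from");
    }
    if (file_put_contents($out, $utf8) === false) {
        throw new RuntimeException("Cannot write $out");
    }
}

// Usage, mirroring the command line above:
// convert_file_to_utf8('my_csv_file.csv', 'my_csv_file_utf8.csv');
// For Czech (Windows-1250) files, pass 'CP1250' as the third argument.
```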
The problem must be your file's encoding; it looks like it's not UTF-8.
When I tried your example with a file that I double-checked is indeed UTF-8, it worked for me. I get:
Array ( [0] => 1 [1] => Austria [2] => Österreich )
Use LibreOffice (OpenOffice), it's more reliable for these sort of things.
From what you say, I suspect Excel writes a UTF-8 file without a BOM, which makes it slightly trickier to guess that the encoding is UTF-8. You can confirm this diagnosis if the characters appear correctly in Notepad++ when you choose Format -> Encode in UTF-8 (without BOM) (rather than Format -> Convert to UTF-8 (without BOM)).
And are you sure every user is going to use UTF-8? It sounds to me like you need something that does a little smart guessing of what your real input encoding is. By "smart", I mean that the guessing recognizes BOM-less UTF-8.
To cut to the chase, I'd do something like this:
$f = fopen('file.csv', 'r');
while (($row = fgets($f)) !== false) {
    if (mb_detect_encoding($row, 'UTF-8', true) !== false) {
        var_dump(str_getcsv($row, ';'));
    } else {
        var_dump(str_getcsv(utf8_encode($row), ';'));
    }
}
fclose($f);
This works because you read the characters to guess the encoding, rather than lazily trusting the first 3 bytes, so UTF-8 without a BOM is still recognized as UTF-8. Of course, if your CSV file is not too big, you could run the encoding detection on the whole file contents, with something like mb_detect_encoding(file_get_contents(...), ...).
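A sketch of that whole-file variant: strip a UTF-8 BOM if present, keep the data if it is valid UTF-8 (BOM or not), and otherwise convert it from a single-byte code page. The function name is hypothetical, and the Windows-1252 fallback is an assumption; pick the code page your users' Excel actually writes (for example Windows-1250 for Czech data):

```php
<?php
// Return the file contents as UTF-8, guessing between UTF-8 and Windows-1252.
function csv_text_as_utf8(string $bytes): string
{
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        $bytes = substr($bytes, 3); // drop the UTF-8 BOM
    }
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes; // already valid UTF-8, with or without a BOM
    }
    // Assumed fallback: Excel's "ANSI" output on Western Windows systems.
    return mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252');
}

// Usage (file name as in the example above):
// $text = csv_text_as_utf8(file_get_contents('file.csv'));
// foreach (preg_split('/\r\n|\n/', trim($text)) as $line) {
//     print_r(str_getcsv($line, ';', '"', ''));
// }
```

Splitting on newlines before parsing assumes no quoted field contains a line break.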

PHP fputcsv with UTF-8 Problem

I'm trying to let my clients view some of the MySQL data in Excel. I have used PHP's fputcsv() function, like this:
public function generate() {
    setlocale(LC_ALL, 'ko_KR.UTF8');
    $this->filename = date("YmdHis");
    $create = $this->directory."Report".$this->filename.".csv";
    $f = fopen("$create","w") or die("can't open file");
    fwrite($f, "\xEF\xBB\xBF");
    $i = 1;
    $length = count($this->inputarray[0]);
    fwrite($f, $this->headers."\n");
    // print column titles
    foreach($this->inputarray[0] as $key=>$value) {
        $delimiter = ($i == $length) ? "\n\n" : ",";
        fwrite($f, $key.$delimiter);
        $i++;
    }
    // print actual rows
    foreach($this->inputarray as $row) {
        fputcsv($f, $row);
    }
    fclose($f);
}
My clients are Korean, and a good chunk of the MySQL database contains values in utf8_unicode_ci. Using the above function, I successfully generated a CSV file with correctly encoded data that opens fine in Excel on my machine (Win7 in English), but when I opened the file in Excel on the client's computer (Win7 in Korean), the characters were broken again. I tried taking out the BOM (\xEF\xBB\xBF) and commenting out the setlocale call, to no avail.
Can you help me figure this out?
If, as you say, your CSV file has "correctly encoded data" (i.e. it contains a valid UTF-8 byte stream), and assuming that the byte stream of the file on your client's site is the same (e.g. it has not been corrupted in transit by a file transfer problem), then it sounds like the issue is Excel on the client's machine not correctly interpreting the UTF-8. This might be because it isn't supported, or because some option needs to be selected when importing to indicate the encoding. As such, you might try producing your file in a different encoding (using mb_convert_encoding or iconv).
If you get your client to export a CSV containing Korean characters then you'll be able to take a look at that file and determine the encoding that is being produced. You should then try using that encoding.
Try encoding the data as UTF-16LE, and ensure that the file has the appropriate BOM.
Alternatively, send your clients an Excel file rather than a CSV; then the encoding shouldn't be a problem.
Try wrapping the text in each fwrite call with utf8_encode.
Then use what is suggested here: http://www.php.net/manual/en/function.fwrite.php#69566
