File encoding command - PHP

I have 1200 files encoded in ANSI. I need to convert them to UTF-8. Converting each file by hand with File / Save As is not reasonable!
Is there a command in PHP that converts files from ANSI to UTF-8?

You can do this with the iconv library, which has a PHP binding (https://secure.php.net/manual/en/function.iconv.php). Consider using the iconv command-line program to convert your source files once, and then keeping everything in UTF-8 instead of juggling encodings.
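A minimal sketch of the iconv approach described above. The function name `convertFileToUtf8` and the `CP1252` default are assumptions for illustration: "ANSI" usually means a Windows code page, but which one depends on the machine that produced the files.

```php
<?php
/**
 * Convert one file to UTF-8 with iconv().
 * Hypothetical helper; the CP1252 default is an assumption about what "ANSI" means here.
 */
function convertFileToUtf8(string $source, string $target, string $from = 'CP1252'): bool
{
    $bytes = file_get_contents($source);
    if ($bytes === false) {
        return false; // the source file could not be read
    }
    $utf8 = iconv($from, 'UTF-8', $bytes);
    if ($utf8 === false) {
        return false; // the input contained bytes that are invalid in $from
    }
    return file_put_contents($target, $utf8) !== false;
}

// Hypothetical usage: convert every .txt file in the current directory into ./utf8/.
// foreach (glob('*.txt') as $file) {
//     convertFileToUtf8($file, 'utf8/' . $file);
// }
```

Writing to a separate target keeps the originals intact, so a wrong guess about the source encoding can be corrected by running the conversion again.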

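I found a solution using PHP.
This is the code I used:
<?php
set_time_limit(30000);
for ($k = 0; $k < 1232; $k++) {
    $fres = "contenu_url" . $k . ".txt";
    $inF = fopen($fres, "r");
    if ($inF === false) {
        // Skip this file rather than reading from an invalid handle.
        echo "<p>Unable to open the file.</p>\n";
        continue;
    }
    // Read the whole source file into memory.
    $contenu_ancien = "";
    while (!feof($inF)) {
        $contenu_ancien .= fgets($inF, 4096);
    }
    fclose($inF);
    // utf8_encode() treats its input as ISO-8859-1 and is deprecated as of PHP 8.2.
    $contenu_utf8 = utf8_encode($contenu_ancien);
    // Open the output file only once there is content to write.
    $fres1 = "contenu_utf" . $k . ".txt";
    $OutF = fopen($fres1, "w+");
    fputs($OutF, $contenu_utf8);
    fclose($OutF);
}
?>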

Related

Special characters encoding during CSV import

I have a script that reads a *.CSV file and then exports its content to an MSSQL database. The script runs only via the CLI.
My problem is that this CSV file contains strings with national characters like ą,ó,ż,ź,ś. For example, I have the word pracowników, but in the CLI I see pracownikˇw.
My code
$handler = fopen($file, "r");
if ($handler !== false) {
while (($this->currentRow = fgetcsv($handler, 0, $this->csvDelimiter)) !== false) {
$row = $this->setHeaders(
$this->currentRow,
$this->config[$type]['columnMapping']
);
if ($row !== false) {
$this->dataImported[$type][] = $row;
}
}
fclose($handler);
}
What I tried
Using fgetcsv with or without setlocale: not working.
Replacing fgetcsv with fgets and parsing each line with str_getcsv: not working.
Using utf8_encode on each row: not working.
Additional info
According to my PHP (PHP 5.3) and a few editors, this file is encoded in ANSI. I tried to decode it with iconv, but all special characters are always replaced with strange symbols, as shown above.
While looping over $this->currentRow, try this on each element that contains a special character:
echo mb_convert_encoding($data[$c],"HTML-ENTITIES","UTF-8");
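Another possibility, sketched under an assumption the question does not confirm: the file may be Windows-1250 (a common "ANSI" code page for Polish text). That would be consistent with the symptom, since the Windows-1250 byte for ó (0xF3) is shown as ˇ on a console using code page 852. The helper name `rowToUtf8` is hypothetical.

```php
<?php
/**
 * Convert every string field of a parsed CSV row to UTF-8.
 * Hypothetical helper; CP1250 is an assumed source encoding, not confirmed by the question.
 */
function rowToUtf8(array $row, string $from = 'CP1250'): array
{
    return array_map(
        static function ($field) use ($from) {
            // fgetcsv() can return null for empty lines, so only convert strings.
            return is_string($field) ? iconv($from, 'UTF-8', $field) : $field;
        },
        $row
    );
}

// Example: the Windows-1250 bytes of "pracowników" followed by a plain ASCII field.
$row = rowToUtf8(["pracownik\xF3w", '5']);
echo $row[0], PHP_EOL;
```

In the question's loop this would be applied to $this->currentRow before setHeaders() is called. If the database receives correct text but the console still shows wrong characters, the console code page rather than the data is the likely culprit.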

Encoding Issue while reading Excel file through PHP COM

I am reading an Excel spreadsheet with the PHP COM utility. Everything works fine, except that some cells in the Excel file contain data in a different language. When I read this data through PHP COM, it is displayed as ???????
$ExlApp = new COM ( "Excel.Application" );
$workbook = $ExlApp->Workbooks->Open ( 'f:\dev\htdocs\excel\testfile.xlsx' );
$worksheet = $workbook->worksheets ( 1 );
$done = false;
$row_index = 1;
while ( $done == false ) {
$english = $worksheet->cells ( $row_index, 1 )->value;
$dari = $worksheet->cells ( $row_index, 2 )->value;
if ($english != '') {
$row_index ++;
echo "<div style='float:left;width:420px'>".$english."</div><div>".$dari."</div>";
} else {
$done = true;
}
}
$workbook->close ();
I have checked the page encoding and it is set to UTF-8. When I open the original Excel file it shows the correct text, but when I read it through PHP COM the encoding is lost. Does anyone have a solution to this problem?
EDIT
How can I ensure that the value given by Excel, $worksheet->cells ( $row_index,2)->value, is in the correct encoding? Or is there a property in Excel that I can set through PHP COM so that it returns data in UTF-8?
I checked the encoding of the value returned by the Excel cell with PHP's mb_detect_encoding function, and it gives ASCII, whereas it should give UTF-16 or UTF-8. It appears that Excel does not return the value in the correct encoding.
Here is the Excel file I am reading with this script:
http://asimishaq.com/myfiles/testfile.xlsx
Note that the solution must use PHP COM interop only.
As pointed out by @rc, we need to specify the codepage parameter in the COM constructor to obtain data in the correct encoding.
$ExlApp = new COM ( "Excel.Application", NULL, CP_UTF8 );
With this line changed in the script, the data is displayed correctly.

How to properly read .nfo file with PHP

I'm trying to open and display an NFO file with a PHP script.
Everything works, but the result doesn't look like the NFO file. I get special characters like this:
�������������������������
When I open the source code of the result, I can see the NFO file as it really is!
Do I need to use some special tricks for HTML or something like that?
You can convert the character encoding of your NFO text (to output it as, e.g., UTF-8):
$nfoContent = file_get_contents('foo.nfo');
$nfoContent = mb_convert_encoding($nfoContent, 'UTF-8', 'ASCII');
Thanks for your reply!
I found it, in PHP you just need to do :
header('Content-Type: text/plain; charset=ansi');
<?php
header('Content-Type: text/html; charset=UTF-8');
$file = 'CORE.NFO';
$nfo = file_get_contents($file);
echo '<pre>';
echo iconv('CP850', 'UTF-8', $nfo).PHP_EOL;
echo '</pre>';
http://pastebin.com/uqxg4yYC
CP850: Western Europe
CP866: Russia
https://de.wikipedia.org/wiki/NFO
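The iconv approach above can be wrapped in a small function that also escapes HTML metacharacters, so that a `<` or `&` inside the NFO does not break the page. This is a sketch only: the name `nfoToHtml` is hypothetical, and the CP850 default simply follows the answer above; the right code page depends on how the file was created.

```php
<?php
/**
 * Convert raw NFO bytes to an HTML <pre> block in UTF-8.
 * Hypothetical helper; $codepage defaults to CP850 as in the answer above.
 */
function nfoToHtml(string $bytes, string $codepage = 'CP850'): string
{
    $utf8 = iconv($codepage, 'UTF-8', $bytes);
    if ($utf8 === false) {
        throw new RuntimeException("Could not convert from $codepage");
    }
    // Escape after converting, so htmlspecialchars() sees valid UTF-8.
    return '<pre>' . htmlspecialchars($utf8, ENT_QUOTES, 'UTF-8') . '</pre>';
}

// Hypothetical usage:
// header('Content-Type: text/html; charset=UTF-8');
// echo nfoToHtml(file_get_contents('CORE.NFO'));
```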

How do I get fgetcsv() in PHP to work with Japanese characters?

I have the following data being generated from a google spreadsheet rss feed.
いきます,go,5
きます,come,5
かえります,"go home, return",5
がっこう,school,5
スーパー,supermarket,5
えき,station,5
ひこうき,airplane,5
Using PHP I can do the following:
$url = 'http://google.com.....etc/etc';
$data = file_get_contents($url);
echo $data; // This prints all Japanese symbols
But if I use:
$url = 'http://google.com.....etc/etc';
$handle = fopen($url, 'r');
while($row = fgetcsv($handle)) {
print_r($row); // Outputs [0]=>,[1]=>'go',[2]=>'5', etc, i.e. the Japanese characters are skipped
}
So it appears the Japanese characters are skipped when using either fopen or fgetcsv.
My file is saved as UTF-8, it has the PHP header to set it as UTF-8, and there is a meta tag in the HTML head to mark it as UTF-8. I don't think it's the document itself, because it can display the characters through the file_get_contents method.
Thanks
I can't comment on Darien's answer.
I reproduced the problem, and changing the locale solved it.
You must install the Japanese locale on the server before trying this.
Ubuntu
Add a new row to the file /var/lib/locales/supported.d/local
ja_JP.UTF-8 UTF-8
And run command
sudo dpkg-reconfigure locales
Or
sudo locale-gen
Debian
Just execute "dpkg-reconfigure locales" and select the necessary locales (ja_JP.UTF-8).
I don't know how to do it on other systems; try searching for the keywords "locale-gen locale" for your server OS.
In the PHP file, before opening the CSV file, add this line:
setlocale(LC_ALL, 'ja_JP.UTF-8');
This looks like it might be the same as PHP Bug 48507.
Have you tried changing your PHP locale setting prior to running the code and resetting it afterwards?
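The "change the locale, then reset it afterwards" idea could look like the sketch below. The function name `readCsvWithLocale` is hypothetical, and the locale passed in (for example 'ja_JP.UTF-8') must already be installed on the server, otherwise setlocale() returns false.

```php
<?php
/**
 * Read a CSV file with LC_CTYPE temporarily switched to $locale,
 * then restore the previous setting. Hypothetical helper for illustration.
 */
function readCsvWithLocale(string $path, string $locale): array
{
    $previous = setlocale(LC_CTYPE, '0'); // '0' only queries the current setting
    if (setlocale(LC_CTYPE, $locale) === false) {
        throw new RuntimeException("Locale $locale is not available");
    }
    $rows = [];
    try {
        $handle = fopen($path, 'r');
        if ($handle === false) {
            throw new RuntimeException("Cannot open $path");
        }
        // Delimiter, enclosure and escape are passed explicitly.
        while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
            $rows[] = $row;
        }
        fclose($handle);
    } finally {
        setlocale(LC_CTYPE, $previous); // restore whatever was active before
    }
    return $rows;
}

// Hypothetical usage:
// $rows = readCsvWithLocale('words.csv', 'ja_JP.UTF-8');
```

Only LC_CTYPE is switched here rather than LC_ALL, on the assumption that character classification is the only part of the locale that matters for CSV parsing; this leaves number and date formatting untouched.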
You might want to consider this library. I remember using it some time back, and it is much nicer than the built-in PHP functions for handling CSV files. がんばって!
Maybe iconv character encoding conversion will help you:
http://php.net/manual/en/function.iconv.php
You can do that by hand, without using fgetcsv and friends:
<?php
$file = file('http://google.com.....etc/etc');
foreach ($file as $row) {
$row = preg_split('/,(?!(?:[^",]|[^"],[^"])+")/', trim($row));
foreach ($row as $n => $cell) {
$cell = str_replace('\\"', '"', trim($cell, '"'));
echo "$n > $cell\n";
}
}
Alternatively you can opt in for a more fancy closures-savvy way:
<?php
$file = file('http://google.com.....etc/etc');
array_walk($file, function (&$row) {
$row = preg_split('/,(?!(?:[^",]|[^"],[^"])+")/', trim($row));
array_walk($row, function (&$cell) {
$cell = str_replace('\\"', '"', trim($cell, '"'));
});
});
foreach ($file as $row) foreach ($row as $n => $cell) {
echo "$n > $cell\n";
}

Exporting PHP output as Excel

include_once 'mysqlconn.php';
include_once "functions.php";
$filename = $_GET['par'].".xls";
header("Content-type: application/x-msexcel");
header('Content-Disposition: attachment; filename="'.basename($filename).'"');
if ($_GET['i'] == "par1") {
func1();
} else if ($_GET['i'] == "par2") {
echo "şşşıııİİİ";
func2();
} else if ($_GET['i'] == "par3") {
echo "şşşıııİİİ";
func3();
}
This is my export2excel.php file. func1, func2 and func3 are in functions.php and produce table output. Everything works well except the character encoding, which behaves strangely. I am using UTF-8 encoding for all my files. The second else if branch above produces correctly encoded output, but the other two encode my output with strange characters like "BÃœTÇE İÇİ". It should be "BÜTÇE İÇİ" in Turkish.
In short: same files, same encoding, same database, but different results.
Any idea?
Excel uses UTF-16LE + BOM as default Unicode encoding.
So you have to convert your output to UTF-16LE and prepend the UTF-16LE-BOM "\xFF\xFE".
Some further information:
Microsoft Excel mangles Diacritics in .csv files?
Exporting data to CSV and Excel in your Rails apps
Instead I would use one of the existing libraries
PHP Excel Extension PECL extension by Ilia Alshanetsky (Core PHP Developer & Release Master)
Spreadsheet_Excel_Writer PEAR Package
PHPExcel
Edit:
Some code that could help if you really do not want to use an existing library:
<?php
$output = <<<EOT
<table>
<tr>
<td>Foo</td>
<td>IñtërnâtiônàlizætiøöäÄn</td>
</tr>
<tr>
<td>Bar</td>
<td>Перевод русского текста в транслит</td>
</tr>
</table>
EOT;
// Convert to UTF-16LE
$output = mb_convert_encoding($output, 'UTF-16LE', 'UTF-8');
// Prepend BOM
$output = "\xFF\xFE" . $output;
header('Pragma: public');
header("Content-type: application/x-msexcel");
header('Content-Disposition: attachment; filename="utf8_bom.xls"');
echo $output;
If anyone is using the excel_writer in Moodle and getting encoding issues in the output (say, you're developing a report that has a URL as data in a field), I fixed it by simply wrapping the data in quotes so that it at least opens in Excel. Here's my example:
// Moodles using the PEAR excel_writer export
$table->setup();
$ex=new table_excel_export_format($table);
$ex->start_document( {string} );
$ex->start_table( {string} );
// heading on the spreadsheet
$title = array('Report Title'=>'Report 1');
$ex->add_data($title);
// end heading
$ex->output_headers( array_keys($table->columns) );
foreach($data as $row){
$string="'".trim($row->resname,"'")."'";
$row->resname=$string;
$ex->add_data( $table->get_row_from_keyed($row) );
}
$ex->finish_table();
$ex->finish_document();
Excel uses UTF-16LE as the default encoding. So you should either convert UTF-8 to UTF-16LE yourself or use one of the tried and tested Excel PHP libs instead of trying to reinvent the wheel. I would recommend using PHPExcel...
