Encoding Issue while reading Excel file through PHP COM

Encoding Issue while reading Excel file through PHP COM - php

I am reading an excel spreadsheet with PHP COM utility, everything is working fine except there are some cells in Excel file having different language data. When I read this data through PHP Com it displays like ???????
$ExlApp = new COM ( "Excel.Application" );
$workbook = $ExlApp->Workbooks->Open ( 'f:\dev\htdocs\excel\testfile.xlsx' );
$worksheet = $workbook->worksheets ( 1 );
$done = false;
$row_index = 1;
while ( $done == false ) {
$english = $worksheet->cells ( $row_index, 1 )->value;
$dari = $worksheet->cells ( $row_index, 2 )->value;
if ($english != '') {
$row_index ++;
echo "<div style='float:left;width:420px'>".$english."</div><div>".$dari."</div>";
} else {
$done = true;
}
}
$workbook->close ();
I have checked page encoding and its set to UTF-8. When I open original excel file it shows correct text but when I read it from PHP COM the encoding is lost. Does anyone have solution to this problem.
EDIT
How I can ensure that the value given by excel $worksheet->cells ( $row_index,2)->value is in correct encoding OR is there any property in Excel which I can set through PHP COM so it return data in UTF-8?
I have checked the encoding of value returned by Excel cell through mb_detect_encoding function in PHP and it gives ASCII where as it must give UTF-16 or UTF-8. It appears that excel does not give value in correct encoding.
Here is the Excel file I am reading with this script:
http://asimishaq.com/myfiles/testfile.xlsx
Note that the solution is required using PHP COM-INTEROP only.

As pointed out by #rc we need to specify codepage property in COM constructor to obtain data in correct encoding.
$ExlApp = new COM ( "Excel.Application", NULL, CP_UTF8 );
By changing the above line in script the data is displayed correctly.

Related

Using PHP mySQL with CSV data containing BOM

I have a database that holds stock levels for certain items that are supplied by different suppliers. Each supplier sends me a daily CSV file with their current stock levels. I am trying to update the stock levels into my database.
The problem I am having is that when I extract the data from the CSV and send it through queries, it is not being working properly.
I have echoed the queries prior to sending them and the output is fine. Using phpMyAdmin, if I just paste the code as it is echoed, it works fine. This has led me to believe that it is an encoding problem.
Viewing the CSV file in cPanel File Manager I see there is an odd character at the beginning of the file. (I believe this is caleld a BOM). If I delete this characted and save the CSV file then my code works perfectly and the databse updates as expected.
Editing the file in cPanel File Manager, the Encoding opens as ansi_x3.110-1983. While manually deleting the character will fix the issue, it is not an option as I want this to be a fully automated daily process.
My code to open the file and extract the data from CSV:
// Open File
$csvData = fopen($file, "r");
if($csvData !== FALSE)
{
while(!feof($csvData))
{
$csvRow[] = fgetcsv($csvData, 100);
}
}
// Close file
fclose($csvData);
My code to build a simple search query
foreach($csvRow as $row)
{
$searchQuery = "SELECT * FROM supplier WHERE supplierItemCode = '".$row[0]."'";
$result = $conn->query($searchQuery);
echo "<br>".$searchQuery;
if($result->num_rows > 0)
{
// CODE NEVER REACHES HERE
}
As mentioned, if I simply paste the echo of $searchQuery into phpMyAdmin and run the query it works fine.
I have tried using fseek($csvData, 2) which successfully removes the BOM characters from the first row of data, but that is having no effect.
As suggested, I have tried using
$csvData = fopen($file, "r");
$BOM = null;
if($csvData !== FALSE)
{
$BOM = fread($csvData, 3);
if($BOM !== FALSE)
{
if($BOM != "\xef\xbb\xbf")
{
echo "<h5>BOM: ".$BOM; // This code is executed every time
fseek($csvData, 0);
}
}
//fseek($csvData, 2); // This was my earlier attempts without the above BOM filter
while(!feof($csvData))
{
$csvRow[] = fgetcsv($csvData, 100);
}
}
Using the BOM filter method produces this output.
As a further note, you'll notice that in my Update query output, there is a blank space in the SET quantity column. This space is not visible in the csv file.
This query is built with
$updateQuery = "UPDATE supplier SET ".$supplier." = '".$row[2]."' WHERE supplierItemCode = '".$row[0]."'";
Any suggestions on what exactly is causing this issue and how I can get around it.
Thanks in advance.

Try the following modification to the code that opens and reads the CSV file. It checks for the presence of the BOM and bypasses it if present:
$cvsRow = [];
// Open File
$csvData = fopen($file, "r");
if($csvData !== FALSE)
{
$BOM = fread($csvData, 4); // read potential BOM sequences to see if one is present or not
if ($BOM !== FALSE)
{
if (strlen($BOM) >= 3 && substr_compare($BOM, "\xef\xbb\xbf", 0, 3) == 0)
{
fseek($csvData, 3); // found UTF-8 encoded BOM
}
elseif (strlen($BOM) >= 2 && (substr_compare($BOM, "\xfe\xff", 0, 2) == 0 || substr_compare($BOM, "\xff\xfe", 0, 2) == 0))
{
fseek($csvData, 2); // found UTF-16 encoded BOM
}
elseif ($BOM != "\00\00\xfe\xff" && $BOM != "\xff\xfe\00\00")
{
fseek($csvData, 0); // did not find UTF-32 encoded BOM
}
while(!feof($csvData))
{
$csvRow[] = fgetcsv($csvData, 100);
}
}
// Close file (only if it has been successfully opened)
fclose($csvData);
}

I finally got a solution to work. After doing a lot of investigating, I believed it was encoded in UTF-16, despite what the BOM characters may have been saying.
I just wrote a simple function to convert each CSV value I was passing to the SQL.
function Convert($str)
{
return mb_convert_encoding($str, "UTF-8", "UTF-16BE");
}
........
$updateQuery = "UPDATE supplier SET ".$supplier." = '".Convert($row[2])."' WHERE supplierItemCode = '".Convert($row[0])."'";
I'm not sure why the BOM was causing such issue and why removing it entirely was not working. Thanks for everyone's help that lead me to discover the encoding problem.

How to "clean" string readed from file() [duplicate]

It drives me crazy ... I try to parse a csv file and there is a very strange behavior.
Here is the csv
action;id;nom;sites;heures;jours
i;;"un nom a la con";1200|128;;1|1|1|1|1|1|1
Now the php code
$required_fields = array('id','nom','sites','heures','jours');
if (($handle = fopen($filename, "r")) !== FALSE)
{
$cols = 0;
while (($row = fgetcsv($handle, 1000, ";")) !== FALSE)
{
$row = array_map('trim',$row);
// Identify headers
if(!isset($headers))
{
$cols = count($row);
for($i=0;$i<$cols;$i++) $headers[strtolower($row[$i])] = $i;
foreach($required_fields as $val) if(!isset($headers[$val])) break 2;
$headers = array_flip($headers);
print_r($headers);
}
elseif(count($row) >= 4)
{
$temp = array();
for($i=0;$i<$cols;$i++)
{
if(isset($headers[$i]))
{
$temp[$headers[$i]] = $row[$i];
}
}
print_r($temp);
print_r($temp['action']);
var_dump(array_key_exists('action',$temp));
die();
}
}
}
And the output
Array
(
[0] => action
[1] => id
[2] => nom
[3] => sites
[4] => heures
[5] => jours
)
Array
(
[action] => i
[id] =>
[nom] => un nom a la con
[sites] => 1200|128
[heures] =>
[jours] => 1|1|1|1|1|1|1
)
<b>Notice</b>: Undefined index: action in <b>index.php</b> on line <b>110</b>
bool(false)
The key "action" exists in $temp but $temp['action'] returns Undefined and array_key_exists returns false. I've tried with a different key name, but still the same. And absolutely no problem with the others keys.
What's wrong with this ?
PS: line 110 is the print_r($temp['action']);
EDIT 1
If i add another empty field in the csv at the begining of each line, action display correctly
;action;id;nom;sites;heures;jours
;i;;"un nom a la con";1200|128;;1|1|1|1|1|1|1

Probably there is some special character at the beginning of the first line and trim isn't removing it.
Try to remove every non-word character this way:
// Identify headers
if(!isset($headers))
{
for($i=0;$i<$cols;$i++)
{
$headers[preg_replace("/[^\w\d]/","",strtolower($row[$i]))] = $i;
....

If your CSV file is in UTF-8 encoding,
make sure that it's UTF-8 and not UTF-8-BOM.
(you can check that in Notepad++, Encoding menu)

I had the same problem with CSV files generated in MS Excel using UTF-8 encoding. Adding the following code to where you read the CSV solves the issue:
$handle = fopen($file, 'r');
// ...
$bom = pack('CCC', 0xef, 0xbb, 0xbf);
if (0 !== strcmp(fread($handle, 3), $bom)) {
fseek($handle, 0);
}
// ...
What it does, is checking for the presence of UTF-8 byte order mark. If there is one, we move the pointer past BOM. This is not a generic solution since there are other types BOMs, but you can adjust it as needed.

Sorry I am posting on an old thread, but thought my answer could add to ones already provided here...
I'm working with a Vagrant guest VM (Ubuntu 16.04) from a Windows 10 host. When I first came across this bug (in my case, seeding a database table using Laravel and a csv file), #ojovirtual's answer immediately made sense, since there can be formatting issues between Windows and Linux.
#ojovirtual's answer didn't quite work for me, so I ended up doing touch new_csv_file.csv through Bash, and pasting contents from the 'problematic' CSV file (which was originally created on my Windows 10 host) into this newly-created one. This definitely fixed my issues - it would have been good to learn and debug some more, but I just wanted to get my particular task completed.

I struggled with this issue for a few hours only to realize that the issue was being caused by a null key in the array. Please ensure that none of the keys has a null value.

I struggled with this issue until I realised that my chunk of code has been run twice.
First run when index was present and my array was printed out properly, and the second run when index was not present and the notice error is triggered. That left me wondering "why my obviously existing and properly printed out array is triggering an 'Undefined index' notice". :)
Maybe this will help somebody.

Special characters encoding during CSV import

I have script that read *.CSV file and then export it content to MSSQL Database. Script is running only via CLI.
My problem is that this CSV file contains string with national characters like ą,ó,ż,ź,ś. For example i have word pracowników but in CLI i see word pracownikˇw.
My code
$handler = fopen($file, "r");
if ($handler !== false) {
while (($this->currentRow = fgetcsv($handler, 0, $this->csvDelimiter)) !== false) {
$row = $this->setHeaders(
$this->currentRow,
$this->config[$type]['columnMapping']
);
if ($row !== false) {
$this->dataImported[$type][] = $row;
}
}
fclose($handler);
}
What i tried
Using fgetcsv with setlocale or without - not working.
Replace fgetcsv with fgets and read each line via str_getcsv - not working.
Using utf8_encode for each row - not working.
Additional info
According to my PHP (PHP5.3) and few editors this file is encoded in ANSII, i tried to decoded it with iconv but all special characters are always replace with some strange symbols, like showed before.

On loop of $this->currentRow try to use for each element which has special char.
echo mb_convert_encoding($data[$c],"HTML-ENTITIES","UTF-8");

File encoding commande

I have 1200 files encoded ANSI. I need to convert them into UTF-8. It is not reasonable to convert each file using the simple solution file/save as!
Is there a commande in php which convert files from ANSI to UTF-8?

You can do this with the iconv library, which has a PHP binding (https://secure.php.net/manual/en/function.iconv.php). Consider using the command-line program to convert your source files instead, and keeping everything in utf8 instead of juggling encodings.

I have found the solution using PHP.
This is the code used:
<?php
set_time_limit ( 30000 );
$k=0;
while ($k<1232)
{
$fres="contenu_url".$k.".txt";
$inF = fopen($fres,"r");
$fres1="contenu_utf".$k.".txt";
$OutF = fopen($fres1,"w+");
$k=$k+1;
if($inF == false)
echo "<p>Impossible d'ouvrir le fichier</p>.\n";
$contenu_ancien="";
while (!feof($inF))
$contenu_ancien .= fgets($inF, 4096);
$contenu_utf8 = utf8_encode ($contenu_ancien);
fputs($OutF,$contenu_utf8);
fclose($OutF); fclose($inF);
}
?>

PHP MYSQL BLOB export to csv format

I am trying to create a script that can export data from a mysql database containing files (pdf mainly). This data is then imported onto the local version of the system.
However I have a problem with the BLOB field, when I export and import using PHPMYADMIN, it works fine, however when using my script, the BLOB field has additional code added to the top. Almost as if it is code to instruct programs how to deal with it. When i try to import this version into PHPMYADMIN, it doesnt work.
Is is a formatting error, do i need to convert it to a type. At present it is simply pulled from the database as a row['content'] item and then outputted to the csv
Any help would be much appreciated
Regards
///////////////// source code of my script
$file = 'supportingFiles'; // csv name.
$result = mysql_query("SHOW COLUMNS FROM files");
$a = 0;
if (mysql_num_rows($result) > 0) { //print column titles based on number of columns in the two files
while ($row = mysql_fetch_assoc($result)) {
$a++;
}
}
$values = mysql_query("SELECT * FROM files "); //select client details wehere required
while ($rows = mysql_fetch_array($values)) { //print client information
for ($k=0;$k<$a;$k++) {
$csv_output .= $rows[$k].",";
}
$csv_output .= "\n"; //end of line
}
$filename = $file."_".date("d-m-Y_H-i",time());
header("Content-type: application/csv");
header("Content-disposition: csv" . date("Y-m-d") . ".csv");
header( "Content-disposition: filename=supportingFiles.csv");
echo $csv_output; //output data file
when an export is ran using this script the following proceeds the actual content outputted when a csv export of the table is run from phpmyadmin...
%âãÏÓ
%%ISIS AfpToPdf-V.6.2/h3 '2008-05-19 (build:6.20.0.08205)'
4 0 obj
[
/DeviceRGB
]
endobj
5 0 obj
[/Pattern 4 0 R]
endobj
6 0 obj
[
/DeviceCMYK
]
endobj
7 0 obj
[/Pattern 6 0 R]
endobj
14 0 obj
<</Length 1221/Filter/FlateDecode>>
stream
xÚ•WÛnã6ýÿÃ¼mÌ%)’¢úo6Š6Û„ûTÈckW–RYÎ6ýüh‡CJ¶Ûi€\à™áp.gÇ‹|Æ!ÿ6{wÃA#~‚ãçgø3Í %ph×3ü^ïÿæ[P2cB‚É4Käw?
R<¦èØo-\Ã¯ðs³_oüqŽ.yL11°Ø·uåžà}ÕìÜTûáÏüã-\-ó©.ÿ„ÅŸ^þ!Ÿý=óõ‘YÆRƒ=±ÌÀ¬Á$´îÊ”)•TÉ ±6a¶×JŸúXËY¦ÇJµÖ²äìQ«/]k&ÍyÇœyV›¦gµši{^›0c†¨\c-jÏÆlì!ªÚQ! Íä^#YrÁóQ™_juÊ5x¶“˜µaâÂYÉôùZ)ëQÛk³ÉY•2~>_¥Ž1×{þ4[ä’ŒÉÅv2<¾ÿâ{Œ_ïªr…ŒžÂð©©„9øúÂ™FFÄZÄ«ª†1Ö ŒÎ’58Tj˜â‡$áb¯D`²H‡¾
üí</m›ºÛ|¿ƒÕ¦h×n²ã¯—ˆøQc¿¼Ç‰ãvûå·ê k<ÙyJ+¦
"2¦75^‚Ó ãg&x M„ñ4–êØå wÈ°\äå%BÄËØŒ4ì‰·è÷çßASÃsb¢ñ(¾t_bO¼Gî…›7]Qõe„û¦=ªo:9ø| Ö”3¾2ÃµØéPü74–š 2qÔYle%Ûn‹ö š{xjö-«U³¯»·5Wáhz‹ª¨Wˆ¿¶ÙÂCëËf¿ƒeœ¥ Ÿ”Ã\{IÆðE 9i=î… ð3ŠŠß‹§«;DõÊ•îî¸ 4ñþYöóÓÎ^¯<ÁeN$…ö±öG ¥§÷kÔå”^]¥Xò2¼cˆ}\>¢LdQÅSA7=6RN¾‘¥B— dXêjë[w{z¿Uâ‡WIÈxVÊK¥`Ü¬pÐ—Öˆx—þ! U¦¨¿ús ½%ÁBŒœãÃ%Øƒï¤ïž«p†ËzMÁaÞ6Ê»†"
iç\lMÏ>ÈL®v‘€šû8ê:Ú<kwh
‰h¶l#ì†ý"ô·+ÊjÇN½ü§7\2¥éû=0å‹¡¾ô½&±HW´ƒ„J^…–Ç¯"?

that is because CSV is not well standardized as a format (not even the "comma" as in "comma seperated values") and it is because phpmyadmin does some encoding/decoding on the values when exporting/importing.
You need this encoding/decoding part, because your binary BLOB (as you said, mostly PDF) can easily contain commas, line breaks, quotes, and all kinds of stuff that breaks a CSV parser.
To import your files using phpmyadmin, you would have to replicate the encoding machanism used there - lucky you that it's open source and you can have a look at the code.
Alternatively, if you want your own export/import mechanism (lets say: you write your own importer that matches your exporter) then you could make good use of base64 encoding here to ensure your CSV (which is intended as a plain-text format by the way) stores binary data correctly.
exporter:
// convert binary blob to text format
$plaintextdata_for_csv = base64_encode($binarydata_from_blob);
importer:
// decode text format to binary blob
$binarydata_for_blob = base64_decode($plaintextdata_from_csv);

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

Encoding Issue while reading Excel file through PHP COM - php

As pointed out by #rc we need to specify codepage property in COM constructor to obtain data in correct encoding. $ExlApp = new COM ( "Excel.Application", NULL, CP_UTF8 ); By changing the above line in script the data is displayed correctly.

Related

Using PHP mySQL with CSV data containing BOM

How to "clean" string readed from file() [duplicate]

Special characters encoding during CSV import

File encoding commande

PHP MYSQL BLOB export to csv format

Categories

Resources