Using PHP mySQL with CSV data containing BOM - php

I have a database that holds stock levels for certain items that are supplied by different suppliers. Each supplier sends me a daily CSV file with their current stock levels. I am trying to update the stock levels into my database.
The problem is that when I extract the data from the CSV and send it through queries, it does not work properly.
I have echoed the queries prior to sending them and the output is fine. Using phpMyAdmin, if I just paste the code as it is echoed, it works fine. This has led me to believe that it is an encoding problem.
Viewing the CSV file in cPanel File Manager, I see there is an odd character at the beginning of the file (I believe this is called a BOM). If I delete this character and save the CSV file, my code works perfectly and the database updates as expected.
Editing the file in cPanel File Manager, the Encoding opens as ansi_x3.110-1983. While manually deleting the character will fix the issue, it is not an option as I want this to be a fully automated daily process.
My code to open the file and extract the data from CSV:
// Open File
$csvData = fopen($file, "r");
if($csvData !== FALSE)
{
while(!feof($csvData))
{
$csvRow[] = fgetcsv($csvData, 100);
}
}
// Close file
fclose($csvData);
My code to build a simple search query
foreach($csvRow as $row)
{
    $searchQuery = "SELECT * FROM supplier WHERE supplierItemCode = '".$row[0]."'";
    $result = $conn->query($searchQuery);
    echo "<br>".$searchQuery;
    if($result->num_rows > 0)
    {
        // CODE NEVER REACHES HERE
    }
}
As mentioned, if I simply paste the echo of $searchQuery into phpMyAdmin and run the query it works fine.
I have tried using fseek($csvData, 2) which successfully removes the BOM characters from the first row of data, but that is having no effect.
As suggested, I have tried using
$csvData = fopen($file, "r");
$BOM = null;
if($csvData !== FALSE)
{
$BOM = fread($csvData, 3);
if($BOM !== FALSE)
{
if($BOM != "\xef\xbb\xbf")
{
echo "<h5>BOM: ".$BOM; // This code is executed every time
fseek($csvData, 0);
}
}
//fseek($csvData, 2); // This was my earlier attempts without the above BOM filter
while(!feof($csvData))
{
$csvRow[] = fgetcsv($csvData, 100);
}
}
Using the BOM filter method produces this output.
As a further note, you'll notice that in my Update query output, there is a blank space in the SET quantity column. This space is not visible in the csv file.
This query is built with
$updateQuery = "UPDATE supplier SET ".$supplier." = '".$row[2]."' WHERE supplierItemCode = '".$row[0]."'";
Any suggestions on what exactly is causing this issue and how I can get around it?
Thanks in advance.

Try the following modification to the code that opens and reads the CSV file. It checks for the presence of the BOM and bypasses it if present:
$csvRow = [];
// Open file
$csvData = fopen($file, "r");
if ($csvData !== FALSE)
{
    $BOM = fread($csvData, 4); // read enough bytes to recognise any BOM
    // Check UTF-32 first: its little-endian BOM starts with the UTF-16 little-endian BOM
    if ($BOM === "\x00\x00\xfe\xff" || $BOM === "\xff\xfe\x00\x00")
    {
        // found UTF-32 encoded BOM: the 4 bytes just read are the BOM, so stay here
    }
    elseif ($BOM !== FALSE && strncmp($BOM, "\xef\xbb\xbf", 3) == 0)
    {
        fseek($csvData, 3); // found UTF-8 encoded BOM
    }
    elseif ($BOM !== FALSE && (strncmp($BOM, "\xfe\xff", 2) == 0 || strncmp($BOM, "\xff\xfe", 2) == 0))
    {
        fseek($csvData, 2); // found UTF-16 encoded BOM
    }
    else
    {
        fseek($csvData, 0); // no BOM found: start again from the beginning
    }
    // fgetcsv() returns FALSE at end of file, which ends the loop
    while (($row = fgetcsv($csvData, 100)) !== FALSE)
    {
        $csvRow[] = $row;
    }
    // Close file (only if it has been successfully opened)
    fclose($csvData);
}
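The same BOM check can be sketched as a small reusable function. This is only an illustration: `detect_bom` is a hypothetical helper name, not something from the answer above, and it only recognises the five common BOM signatures.

```php
<?php
// Hypothetical helper: map the first bytes of a file to an encoding name
// and the length of the BOM in bytes.
function detect_bom(string $head): array
{
    // Longest signatures first: the UTF-32 LE BOM starts with the UTF-16 LE BOM.
    $signatures = [
        "\x00\x00\xFE\xFF" => 'UTF-32BE',
        "\xFF\xFE\x00\x00" => 'UTF-32LE',
        "\xEF\xBB\xBF"     => 'UTF-8',
        "\xFE\xFF"         => 'UTF-16BE',
        "\xFF\xFE"         => 'UTF-16LE',
    ];
    foreach ($signatures as $bom => $encoding) {
        if (strncmp($head, $bom, strlen($bom)) === 0) {
            return [$encoding, strlen($bom)];
        }
    }
    return [null, 0]; // no BOM recognised
}
```

After reading the first four bytes with fread(), the second element of the result can be passed to fseek() to position the handle just past the BOM.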

I finally got a solution to work. After a lot of investigating, I concluded that the file was encoded in UTF-16, despite what the BOM characters may have been saying.
I just wrote a simple function to convert each CSV value I was passing to the SQL.
function Convert($str)
{
return mb_convert_encoding($str, "UTF-8", "UTF-16BE");
}
........
$updateQuery = "UPDATE supplier SET ".$supplier." = '".Convert($row[2])."' WHERE supplierItemCode = '".Convert($row[0])."'";
I'm not sure why the BOM was causing such an issue or why removing it entirely did not work. Thanks to everyone whose help led me to discover the encoding problem.
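One plausible explanation, offered as an assumption rather than a confirmed diagnosis: in UTF-16 every ASCII character is stored with an accompanying NUL byte. Those bytes are invisible when the query is echoed, yet they make the value differ from what is stored in the database, and skipping the BOM does not remove them. A minimal sketch that converts the whole file once instead of each value, assuming the mbstring extension and a UTF-16 file that starts with a BOM (`utf16_csv_to_rows` is a hypothetical name):

```php
<?php
// Sketch only: assumes mbstring is available and the data starts with a UTF-16 BOM.
function utf16_csv_to_rows(string $raw): array
{
    if (strncmp($raw, "\xFF\xFE", 2) === 0) {
        $utf8 = mb_convert_encoding(substr($raw, 2), 'UTF-8', 'UTF-16LE');
    } elseif (strncmp($raw, "\xFE\xFF", 2) === 0) {
        $utf8 = mb_convert_encoding(substr($raw, 2), 'UTF-8', 'UTF-16BE');
    } else {
        $utf8 = $raw; // no UTF-16 BOM: leave the bytes untouched
    }

    $rows = [];
    // Simple line split; it does not handle quoted fields that contain newlines.
    foreach (preg_split('/\r\n|\r|\n/', $utf8) as $line) {
        if ($line !== '') {
            $rows[] = str_getcsv($line, ',', '"', '\\');
        }
    }
    return $rows;
}
```

It could be called as utf16_csv_to_rows(file_get_contents($file)), after which each row holds plain UTF-8 strings.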

Related

Why does the character "×" turn into garbled text starting with "Ã" when placed into a .csv file after running my PHP script? [duplicate]

This question already has answers here:
UTF-8 all the way through
(13 answers)
Closed 4 years ago.
I wrote a PHP script that connects to a distributor's server, downloads several inventory files, and creates a massive .csv file to import into WooCommerce. Everything works except for one thing: when I look at the exported .csv file, the "×" character in my "caliber" column is always converted to a garbled sequence starting with "Ã".
updateInventoryFunctions.php:
function fixCalibers($number, $attributesList) {
$calibers = array (
# More calibers...
"9×23mm Winchester" => '9X23'
);
$pos = array_search($attributesList[$number], $calibers);
if ($pos !== false) {
$attributesList[$number] = $pos;
return $attributesList[$number];
} elseif ($attributesList[$number] == "40SW") {
$attributesList[$number] = '.40 S&W';
return $attributesList[$number];
} # More conditionals...
}
updateInventory.php:
# Code that connects to the distributor's server, downloads files, and places headers into the .csv file.
if (($localHandle = fopen("current_rsr_inventory.csv", "w")) !== false) {
# Code that defines arrays for future fixes and creates a multidimensional array of attributes...
foreach ($tempInventoryFile as &$line) {
$line = explode(";", $line);
# Code that fixes several inconsistencies from the distributor...
$matchingKey = array_search($line[0], $skuList);
$attributesList = array();
if ($matchingKey !== false) {
# Code that fixes more inconsistencies...
if ($attributesList[18] === "" || $attributesList[18] === null) {
array_splice($attributesList, 18, 1);
include_once "updateInventoryFunctions.php";
$attributesList[17] = fixCalibers(17, $attributesList);
} # More conditionals...
# Code that fixes more inconsistencies...
foreach ($attributesList as $attribute) {
$line[] = $attribute;
} // End foreach.
} // End if.
fputcsv($localHandle, $line);
} // End foreach.
} // End if.
# Code that closes files and displays success message...
The caliber "9×23mm Winchester" is displayed in the .csv file with the "×" replaced by a garbled sequence starting with "Ã". I've tried placing single quotes around the array key and escaping the character "×". There are multiple instances of this mysterious switch.
Thanks in advance for any help!
This is an encoding issue. The character "×" is written as UTF-8 but then interpreted as ISO-8859-1. Specifying the output encoding as UTF-8, for example with header('Content-Type: text/html; charset=utf-8');, or manually selecting the encoding in your browser will solve this issue.
"×" is U+00D7, which UTF-8 encodes as the two bytes C3 97, and byte C3 in ISO-8859-1 is "Ã" (A with tilde).
Try to put header on top of your script:
header('Content-Type: text/html; charset=utf-8');
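The byte-level effect can be reproduced in a few lines. This is a small illustration, assuming the mbstring extension is available; it deliberately misreads the UTF-8 bytes as ISO-8859-1 to show where the "Ã" comes from.

```php
<?php
$times = "\xC3\x97"; // UTF-8 bytes of "×" (U+00D7)

// Treat each byte as one ISO-8859-1 character and re-encode the result as UTF-8.
$garbled = mb_convert_encoding($times, 'UTF-8', 'ISO-8859-1');

echo bin2hex($times), "\n";   // c397
echo bin2hex($garbled), "\n"; // c383c297 (c383 is "Ã")
```

Reading the file with the same encoding it was written in avoids the second conversion entirely.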

PHP Reading string as exponential number

I am reading a CSV file to import data. One column holds auto-generated codes made of letters and digits. The problem is that in some rows my script reads the value as a number in exponential notation.
Example: 58597E68 is considered 5.86E+72
I need it to be read as a string, not a number. The issue occurs only when the character E appears in the middle of the auto-generated code.
$feed = 'path-to-csv/import.csv';
if (!file_exists($feed)) {
//$feed = 'import.csv';
exit('Cannot find the CSV file: ' . $feed);
}
$row=0;
if (($handle = fopen($feed, 'r')) !== FALSE) {
while (($data_csv_rows = fgetcsv($handle, 1000000, ',')) !== FALSE) {
$row++;
if ($row == 1) {
continue;
} // skipping header row
echo "Row " . ($row-1) . "<br>";print_r($data_csv_rows);echo "<br><br>";
}
}
The problem is not your CSV but the original software (probably Excel) that produced the CSV.
CSV is a simple data format. When you find something like 5.86E+72 in it, the value is already stored that way in the CSV data and it is too late to fix it.
To avoid this, make sure you export the data correctly into CSV.
Some PHP code to find this kind of bad data in a field:
if (strpos($value, 'E+') !== FALSE) {
    preg_match('~E\+[0-9]+$~', $value, $preg_result);
    if (isset($preg_result[0])) {
        die('Probably wrong data found within "'.$value.'".');
    }
}
In your case it seems that 58597E68 is converted to float(5.8597E+72).
At least with str_getcsv() I cannot recreate the problem, see https://3v4l.org/RZ1eA.
By definition it would be correct, since there are no " around this data, so PHP tries to determine the type of the data and whether it is potentially a numeric value. So be sure to add " around strings. See the PHP String to Numeric Conversion documentation.
Update: I cannot reproduce your use case! 58597E68 becomes "58597E68" with str_getcsv(), and with fgetcsv() it is not auto-converted to float either! See https://3v4l.org/oXkBu for details. I suspect there is something wrong with the data you provided or with your validation.
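The distinction can be checked directly. A minimal sketch with a made-up input line: the parser returns a string, and the exponential form only appears once something treats that string as a number.

```php
<?php
// Illustrative input line; the second field is invented for the example.
$fields = str_getcsv('58597E68,widget', ',', '"', '\\');

var_dump($fields[0]);             // string(8) "58597E68"
var_dump(is_numeric($fields[0])); // bool(true): it looks numeric to PHP
var_dump((float) $fields[0]);     // float(5.8597E+72) only after an explicit cast
```

So if the value shows up as a float, the conversion is likely happening in a later comparison, cast, or arithmetic step, or already in the exported file.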

Special characters encoding during CSV import

I have a script that reads a *.CSV file and then exports its content to an MSSQL database. The script runs only via CLI.
My problem is that this CSV file contains strings with national characters like ą, ó, ż, ź, ś. For example, I have the word pracowników, but in the CLI I see pracownikˇw.
My code
$handler = fopen($file, "r");
if ($handler !== false) {
while (($this->currentRow = fgetcsv($handler, 0, $this->csvDelimiter)) !== false) {
$row = $this->setHeaders(
$this->currentRow,
$this->config[$type]['columnMapping']
);
if ($row !== false) {
$this->dataImported[$type][] = $row;
}
}
fclose($handler);
}
What I tried
Using fgetcsv with setlocale or without - not working.
Replace fgetcsv with fgets and read each line via str_getcsv - not working.
Using utf8_encode for each row - not working.
Additional info
According to my PHP (PHP 5.3) and a few editors, this file is encoded in ANSI. I tried to decode it with iconv, but all special characters are always replaced with strange symbols, as shown above.
While looping over $this->currentRow, try the following for each element that contains special characters:
echo mb_convert_encoding($data[$c],"HTML-ENTITIES","UTF-8");
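Another possibility, stated as an assumption: the "ˇ" shown in place of "ó" is how byte 0xF3 renders in DOS code page 852, and 0xF3 is "ó" in Windows-1250. If the file really is Windows-1250, each field could be converted to UTF-8 with iconv. `row_to_utf8` is a hypothetical helper, and the `CP1250` name depends on the iconv implementation in use.

```php
<?php
// Assumption: the CSV is Windows-1250 ("ANSI" on a Polish Windows system).
function row_to_utf8(array $row, string $from = 'CP1250'): array
{
    return array_map(
        function ($field) use ($from) {
            return iconv($from, 'UTF-8', $field);
        },
        $row
    );
}
```

It would be applied to each row returned by fgetcsv() before the data is passed on. Whether the console then shows the characters correctly still depends on the console's own code page.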

CSV parsing conditions in PHP

I created a CSV parser that works fine for some CSV files I've found online, but one that I converted from XLS to CSV via Microsoft Excel 2011 does not work.
The ones that work are formatted as such:
"Sort Order","Common Name","Formal Name","Type","Sub Type","Sovereignty","Capital","ISO 4217 Currency Code","ISO 4217 Currency Name","ITU-T Telephone Code","ISO 3166-1 2 Letter Code","ISO 3166-1 3 Letter Code","ISO 3166-1 Number","IANA Country Code TLD"
"1","Afghanistan","Islamic State of Afghanistan","Independent State",,,"Kabul","AFN","Afghani","+93","AF","AFG","004",".af".........................etc...
The one that doesn't work is formatted like this:
Order Id,Date Ordered,Date Returned,Product Id,Description,Order Reason Code,Return Qty,Order Return Comment,Ship To Name,Ship To Address1,Ship To Address2,Ship To Address3,Ship To City,Ship To State,Ship To Zipcode,Ship To Country,Disposition,Ship To Email,ShipVia
5555555,2013-07-05 13:58:36.000,2013-08-16 00:00:00.000,5555-55,0555 - Some Test Thing,Refund,2,,jeric beatty,123 fake st,,,burke,NJ,55055,US,Discard,test@test.com,Super Fast Shipping
Is there any way to get Excel to export in the same format as the first one? I would like to avoid doing this manually, as the file is huge and I would have to edit lots of parts of it by hand where I couldn't do a "replace all". Another issue could be that there are double and sometimes triple commas in some places, though this appears in both files.
Here is the parser:
function ingest_csv() {
$file_url = 'http://www.path.to/csv/file.csv';
$record_num = 0;
$records = array();
$header = array();
if (($handle = fopen($file_url, "r")) !== FALSE) {
$records['id'] = '';
while (($data = fgetcsv($handle)) !== FALSE) {
$records['id'][$record_num] = '';
$cell_num = 0;
foreach ($data as $cell) {
if($record_num == 0) {
$header = $data;
} else {
$current_key = $header[$cell_num];
$records['id'][$record_num][$current_key] = $cell;
}
$cell_num++;
}
$record_num++;
}
fclose($handle);
}
else {
echo 'could not open file.';
}
return array($record_num, $records);
}
function batch_csv() {
    list($num_rows, $rows) = ingest_csv();
    print_r($num_rows);
    print_r($rows);
}
As mentioned in the comments, you may be reinventing the wheel here. That said, I've asked questions myself where I didn't want to give a long, rambling explanation of why I was forced to use an unconventional approach, so in case this is one of those situations, here's an answer.
In OpenOffice Calc (for example), when you save as CSV you get a number of further options, including the option to double quote all fields.
Unfortunately Excel doesn't give you the choice, but Microsoft do offer up a workaround using a macro - http://support.microsoft.com/kb/291296/en-us
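It may also be worth checking whether the quoting matters to the parser at all. A short sketch with shortened sample lines (abbreviated from the two formats in the question) suggests PHP's built-in CSV functions read quoted and unquoted fields the same way, including the empty fields produced by consecutive commas.

```php
<?php
// Shortened sample lines, abbreviated from the two formats in the question.
$quoted   = '"1","Afghanistan",,"Kabul"';
$unquoted = '5555555,2013-07-05 13:58:36.000,,jeric beatty';

$a = str_getcsv($quoted, ',', '"', '\\');
$b = str_getcsv($unquoted, ',', '"', '\\');

// Pair the values with a header row to get an associative record.
$header = ['Order Id', 'Date Ordered', 'Date Returned', 'Ship To Name'];
$record = array_combine($header, $b);
print_r($record);
```

If both lines parse cleanly, the difference between the two files is more likely in how the rows are collected afterwards than in the quoting style.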

CSV file generation error

I'm working on a project for a client - a wordpress plugin that creates and maintains a database of organization members. I'll note that this plugin creates a new table within the wordpress database (instead of dealing with the data as custom_post_type meta data). I've made a lot of modifications to much of the plugin, but I'm having an issue with a feature (that I've left unchanged).
One half of this feature does a csv import and insert, and that works great. The other half of this sequence is a feature to download the contents of this table as a csv. This part works fine on my local system, but fails when running from the server. I've pored over each portion of this script and everything seems to make sense. I'm, frankly, at a loss as to why it's failing.
The php file that contains the logic is simply linked to. The file:
<?php
// initiate wordpress
include('../../../wp-blog-header.php');
// phpinfo();
function fputcsv4($fh, $arr) {
$csv = "";
while (list($key, $val) = each($arr)) {
$val = str_replace('"', '""', $val);
$csv .= '"'.$val.'",';
}
$csv = substr($csv, 0, -1);
$csv .= "\n";
if (!@fwrite($fh, $csv))
return FALSE;
}
//get member info and column data
$table_name = $wpdb->prefix . "member_db";
$year = date ('Y');
$members = $wpdb->get_results("SELECT * FROM ".$table_name, ARRAY_A);
$columns = $wpdb->get_results("SHOW COLUMNS FROM ".$table_name, ARRAY_A);
// echo 'SQL: '.$sql.', RESULT: '.$result.'<br>';
//output headers
header("Content-type: application/octet-stream");
header("Content-Disposition: attachment; filename=\"members.csv\"");
//open output stream
$output = fopen("php://output",'w');
//output column headings
$data[0] = "ID";
$i = 1;
foreach ($columns as $column){
//DIAG: echo '<pre>'; print_r($column); echo '</pre>';
$field_name = '';
$words = explode("_", $column['Field']);
foreach ($words as $word) $field_name .= $word.' ';
if ( $column['Field'] != 'id' && $column['Field'] != 'date_updated' ) {
$data[$i] = ucwords($field_name);
$i++;
}
}
$data[$i] = "Date Updated";
fputcsv4($output, $data);
//output data
foreach ($members as $member){
// echo '<pre>'; print_r($member); echo '</pre>';
$data[0] = $member['id'];
$i = 1;
foreach ($columns as $column){
//DIAG: echo '<pre>'; print_r($column); echo '</pre>';
if ( $column['Field'] != 'id' && $column['Field'] != 'date_updated' ) {
$data[$i] = $member[$column['Field']];
$i++;
}
}
$data[$i] = $member['date_updated'];
//echo '<pre>'; print_r($data); echo '</pre>';
fputcsv4($output, $data);
}
fclose($output);
?>
So, obviously, a routine wherein a query is run, $output is established with fopen, each row is then formatted as comma delimited and fwrited, and finally the file is fclosed where it gets pushed to a local system.
The error that I'm getting (from the server) is
Error 6 (net::ERR_FILE_NOT_FOUND): The file or directory could not be found.
But it clearly is being found; it's just failing. If I enable phpinfo() (PHP Version 5.2.17) at the top of the file, I definitely get a response, notably Cannot modify header information (I'm pretty sure because phpinfo() has already generated a header). All the expected data does get printed to the bottom of the page (after all the phpinfo diagnostics), however, so that much at least is working correctly.
I am guessing there is something preventing the fopen, fwrite, or fclose functions from working properly (a server setting?), but I don't have enough experience with this to identify exactly what the problem is.
I'll note again that this works exactly as expected in my test environment (localhost/XAMPP, netbeans).
Any thoughts would be most appreciated.
update
Ok - spent some more time with this today. I've tried each of the suggested fixes, including @Rudu's writeCSVLine fix and @Fernando Costa's file_put_contents() recommendation. The fact is, they all work locally. Whether I just echo or use the fopen, fwrite, fclose routine doesn't matter; both work great.
What does seem to be a problem is the inclusion of the wp-blog-header.php at the start of the file and then the additional header() calls. (The path is definitely correct on the server, btw.)
If I comment out the include, I get a csv file downloaded with some errors planted in it (because $wpdb doesn't exist). And if I comment out the headers, I get all my data printed to the page.
So... any ideas what could be going on here?
Some obvious conflict of the wordpress environment and the proper creation of a file.
Learning a lot, but no closer to an answer... Thinking I may need to just avoid the wordpress stuff and do a manual sql query.
Ok, so I'm wondering why you've taken this approach. Nothing wrong with php://output, but all it does is let you write to the output buffer the same way as print and echo... if you're having trouble with it, just use print or echo :) Any optimization you could have gained from using fwrite on the stream is lost anyway, because you build the $csv string first and then write it to the output stream in one go (not that optimizations are particularly necessary). With all that in mind, my solution (in keeping with your original design) would be this:
function escapeCSVcell($val) {
return str_replace('"','""',$val);
//What about new lines in values? Perhaps not relevant to your
// data but they'll mess up your output ;)
}
function writeCSVLine($arr) {
$first=true;
foreach ($arr as $v) {
if (!$first) {echo ",";}
$first=false;
echo "\"".escapeCSVcell($v)."\"";
}
echo "\n"; // May want to use \r\n depending on consuming script
}
Now use writeCSVLine in place of fputcsv4.
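As an alternative to hand-rolled escaping, PHP's built-in fputcsv() handles quotes, delimiters, and newlines inside values. A minimal sketch, where `csv_line` is a hypothetical wrapper that writes one row to an in-memory stream and returns it as a string:

```php
<?php
// Hypothetical wrapper around the built-in fputcsv().
function csv_line(array $fields): string
{
    $stream = fopen('php://memory', 'w+');
    fputcsv($stream, $fields, ',', '"', '\\');
    rewind($stream);
    $line = stream_get_contents($stream);
    fclose($stream);
    return $line;
}

$line = csv_line(['Jane "JJ" Doe', 'Springfield, IL', '42']);
echo $line;
```

Note that fputcsv() only quotes fields that need it, so the output is not identical to the always-quoted format produced by writeCSVLine, although it should parse back to the same values.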
Ran into this same issue. Stumbled upon this thread which does the same thing but hooks into the 'plugins_loaded' action and exports the CSV then. https://wordpress.stackexchange.com/questions/3480/how-can-i-force-a-file-download-in-the-wordpress-backend
Exporting the CSV early eliminates the risk of the headers already being modified before you get to them.
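A rough sketch of that early-export idea follows. The `member_csv_export` query parameter and both function names are hypothetical; the table name follows the question, and a real plugin would also need a permission check before sending any data.

```php
<?php
// Hypothetical names: the "member_csv_export" parameter and both functions
// are illustrative and not part of the plugin discussed above.
function is_member_export_request(array $query): bool
{
    return isset($query['member_csv_export']) && $query['member_csv_export'] === '1';
}

function maybe_export_members_csv(): void
{
    if (!is_member_export_request($_GET)) {
        return;
    }
    global $wpdb;
    $rows = $wpdb->get_results("SELECT * FROM {$wpdb->prefix}member_db", ARRAY_A);

    header('Content-Type: text/csv; charset=utf-8');
    header('Content-Disposition: attachment; filename="members.csv"');

    $out = fopen('php://output', 'w');
    foreach ($rows as $row) {
        fputcsv($out, $row, ',', '"', '\\');
    }
    fclose($out);
    exit; // stop before WordPress renders anything else
}

// Register the hook only when WordPress is loaded, so the file also runs standalone.
if (function_exists('add_action')) {
    add_action('plugins_loaded', 'maybe_export_members_csv');
}
```

Because the callback runs before any theme output, the header() calls happen while headers can still be sent.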
