How to read multibyte characters from a CSV file using PHP

I have a CSV file which contains a mixture of English and Chinese characters (it is a list of contacts exported from the Mozilla Thunderbird email program). I am trying to create a function which can extract the information from this file. It appears that fgetcsv() does not support multibyte characters. Since I am running PHP 5.2, I do not have access to str_getcsv().
Although the situation above refers to English and Chinese, I am looking for a solution which will work with any language.
Right now I have the function namecards_import_str_getcsv() as my CSV parsing function, which tries to mimic str_getcsv().
function namecards_import_str_getcsv($input, $delimiter = ',', $enclosure = '"', $escape = '\\', $eol = '\n') {
if (!function_exists('str_getcsv')) {
if (is_string($input) && !empty($input)) {
$output = array();
$tmp = preg_split("/".$eol."/",$input);
if (is_array($tmp) && !empty($tmp)) {
while (list($line_num, $line) = each($tmp)) {
if (preg_match("/" . $escape . $enclosure . "/", $line)) {
while ($strlen = strlen($line)) {
$pos_delimiter = strpos($line, $delimiter);
$pos_enclosure_start = strpos($line, $enclosure);
if (is_int($pos_delimiter) && is_int($pos_enclosure_start) && ($pos_enclosure_start < $pos_delimiter)) {
$enclosed_str = substr($line, 1);
$pos_enclosure_end = strpos($enclosed_str, $enclosure);
$enclosed_str = substr($enclosed_str, 0, $pos_enclosure_end);
$output[$line_num][] = $enclosed_str;
$offset = $pos_enclosure_end + 3;
}
else {
if (empty($pos_delimiter) && empty($pos_enclosure_start)) {
$output[$line_num][] = substr($line, 0);
$offset = strlen($line);
}
else {
$output[$line_num][] = substr($line,0,$pos_delimiter);
$offset = (!empty($pos_enclosure_start) && ($pos_enclosure_start < $pos_delimiter))? $pos_enclosure_start : $pos_delimiter + 1;
}
}
$line = substr($line,$offset);
}
}
else {
$line = preg_split("/" . $delimiter . "/", $line);
/*
* Validating against pesky extra line breaks creating false rows.
*/
if (is_array($line) && !empty($line[0])) {
$output[$line_num] = $line;
}
}
}
return $output;
}
else {
return false;
}
}
else {
return false;
}
}
else {
return str_getcsv($input);
}
}
This function is called by the following line of code:
$file = $_SESSION['namecards_csv_file'];
if (file_exists($file->uri)) {
// Load raw csv content into a handler variable.
$handle = fopen($file->uri, "r");
$cardinfo = array();
while (($data = fgets($handle)) !== FALSE) {
$data = namecards_import_str_getcsv($data);
dsm($data);
$cardinfo[] = $data[0];
}
fclose($handle);
}
else {
drupal_set_message(t('CSV file doesn\'t exist'), 'error');
}
In the array of results, the strings of Chinese characters are in the correct place, but they appear as symbols, e.g. "��".
Another method I tried before this was to simply use fgetcsv() (see the example below). In that case, however, the elements of the returned array were empty.
$file = $_SESSION['namecards_csv_file'];
if (file_exists($file->uri)) {
// Load raw csv content into a handler variable.
$handle = fopen($file->uri, "r");
$cardinfo = array();
while (($data = fgetcsv($handle, 5000, ",")) !== FALSE) {
dsm($data);
$cardinfo[] = $data;
}
fclose($handle);
}
else {
drupal_set_message(t('CSV file doesn\'t exist'), 'error');
}
In case you are interested, here are the contents of the CSV file:
First Name,Last Name,Display Name,Nickname,Primary Email,Secondary Email,Screen Name,Work Phone,Home Phone,Fax Number,Pager Number,Mobile Number,Home Address,Home Address 2,Home City,Home State,Home ZipCode,Home Country,Work Address,Work Address 2,Work City,Work State,Work ZipCode,Work Country,Job Title,Department,Organization,Web Page 1,Web Page 2,Birth Year,Birth Month,Birth Day,Custom 1,Custom 2,Custom 3,Custom 4,Notes,
Ben,Gunn,Ben Gunn,Benny,ben1#asdf.com,ben2#asdf.com,,+94 (10) 11111111,+94 (10) 22222222,+94 (10) 33333333,,+94 44444444444,12 Benny Lane,,Beijing,Beijing,100028,China,13 asdfsdfs,,sdfsf,sdfsdf,134323,China,Manager,Sales,Benny Inc,,,,,,,,,,,
乔,康,乔 康,小康,,,,,,,,,,,,,,,北京市朝阳区,,,,,,,,,,,,,,,,,,,

Just writing up as an answer what was figured out in the comments:
fgetcsv is locale sensitive, so make sure to setlocale to a UTF-8 locale.
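A minimal sketch of that advice, assuming the file is UTF-8 encoded and that at least one UTF-8 locale is installed on the server. The locale names, the helper name read_csv_utf8() and the sample data are assumptions for illustration, not part of the question:

```php
<?php
// Hypothetical helper: read every row of a UTF-8 CSV file with fgetcsv().
function read_csv_utf8($path) {
    // Try a few common UTF-8 locale names. setlocale() returns false if
    // none of them is installed, in which case the current locale is kept.
    setlocale(LC_ALL, 'en_US.UTF-8', 'en_US.utf8', 'C.UTF-8');

    $rows = array();
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return $rows;
    }
    // Length 0 means "no line length limit". The fifth (escape) argument
    // needs PHP 5.3 or later; drop it on older versions.
    while (($data = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        $rows[] = $data;
    }
    fclose($handle);
    return $rows;
}

// Usage with a small sample file (contents are illustrative only).
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "First Name,Last Name\nBen,Gunn\n乔,康\n");
$rows = read_csv_utf8($path);
unlink($path);
```

If setlocale() cannot find any of the listed locales, running `locale -a` on the server should show which names are available.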

Related

Replace values of a csv file

I need to find and replace all the values in the rows of a CSV file using PHP.
I am trying this, but it replaces even the headers with 0, and the row values are not doubled as they are supposed to be.
public function checkForNumericValues()
{
// Read the columns and detect numeric values
if (($this->handle = fopen($this->csvFile, "r")) !== FALSE)
{
$fhandle = fopen($this->csvFile,"r");
$content = fread($fhandle,filesize($this->csvFile));
while (($this->data = fgetcsv($this->handle, 1000, ",")) !== FALSE)
{
$this->num = count($this->data);
// Skipping the header
if($this->row == 1)
{
$this->row++;
continue;
}
$this->row++;
// Check and replace the numeric values
for ($j=0; $j < $this->num; $j++)
{
if(is_numeric($this->data[$j]))
{
$content = str_replace($this->data[$j], $this->data[$j] * 2, $content);
}
else
{
$content = str_replace($this->data[$j], 0, $content);
}
}
break;
// print_r($content);
}
$fhandle = fopen($this->csvFile,"w");
fwrite($fhandle,$content);
fclose($this->handle);
}
echo "Numeric and String values been changed in rows of the CSV!";
}
CSV is like this:
You shouldn't update the entire $content when you're processing each field in the CSV, just update that field. Your str_replace() will replace substrings elsewhere in the file; for instance, if the current field contains 5, you'll replace all the 5's in the file with 10, so 125 will become 1210.
You can do it correctly by replacing the element in the $this->data array. After you do that, you can then join them back into a string with implode(). Then you can keep all the updated lines in a string, which you write back to the file at the end.
You can skip the header line by calling fgets() before the while loop.
public function checkForNumericValues()
{
// Read the columns and detect numeric values
if (($this->handle = fopen($this->csvFile, "r")) !== FALSE)
{
$output = "";
$output .= fgets($this->handle); // Copy header line to output
while (($this->data = fgetcsv($this->handle, 1000, ",")) !== FALSE)
{
$this->num = count($this->data);
// Check and replace the numeric values
for ($j=0; $j < $this->num; $j++)
{
if(is_numeric($this->data[$j]))
{
$this->data[$j] *= 2;
}
else
{
$this->data[$j] = 0;
}
}
$output .= implode(',', $this->data) . "\n";
}
fclose($this->handle);
$fhandle = fopen($this->csvFile,"w");
fwrite($fhandle,$output);
fclose($fhandle);
}
echo "Numeric and String values been changed in rows of the CSV!";
}
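One caveat with implode(',') is that it does not quote fields that themselves contain commas or quotes. A sketch of the same loop that writes each row with fputcsv() instead, written as a standalone function rather than the class method above. The name double_numeric_csv() and the sample data are hypothetical, and the explicit escape arguments assume a reasonably recent PHP version:

```php
<?php
// Hypothetical standalone version: double numeric fields, set all other
// fields to 0, and leave the header row untouched.
function double_numeric_csv($path) {
    $in = fopen($path, 'r');
    if ($in === false) {
        return false;
    }
    $out = fopen('php://temp', 'r+');   // buffer for the rewritten rows

    $header = fgets($in);               // copy the header line unchanged
    if ($header !== false) {
        fwrite($out, $header);
    }
    while (($row = fgetcsv($in, 0, ',', '"', '\\')) !== false) {
        if ($row === array(null)) {
            continue;                   // fgetcsv() returns array(null) for a blank line
        }
        foreach ($row as $j => $value) {
            $row[$j] = is_numeric($value) ? $value * 2 : 0;
        }
        fputcsv($out, $row, ',', '"', '\\');
    }
    fclose($in);

    // Only overwrite the original file once every row has been processed.
    rewind($out);
    file_put_contents($path, stream_get_contents($out));
    fclose($out);
    return true;
}

// Usage with illustrative data.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "name,qty,price\napple,5,125\npear,x,2.5\n");
double_numeric_csv($path);
$result = file_get_contents($path);
unlink($path);
```

Because each field is updated in the row array rather than through str_replace() on the whole file, 125 becomes 250 and is not affected by the 5 in the neighbouring column.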

PHP Add or Edit line in text File

I'm trying to search a text file, if a line contains a specific key ID I want to update the entire line, if not add a new line.
So based on this text file :
admin,1234,ID1345,NW
staff,1325,ID1001,NE
staff,2157,ID2003,SW
staff,8519,ID3001,NS
I want to search for ID1345 and then update that line only, the other lines stay exactly as they are. If no lines contain ID1345 then add a new line to the text file.
So far I've got:
$search = 'ID1345';
$result = 'admin,6698,ID1345,OP';
$reading = fopen('myfile', 'r');
$writing = fopen('myfile.tmp', 'w');
$replaced = false;
while (!feof($reading)) {
$line = fgets($reading);
if (stristr($line, $search)) {
$line = "$result\n";
$replaced = true;
}
fputs($writing, $line);
}
if (!$replaced) fputs($writing, "$result\n");
fclose($reading); fclose($writing);
rename('myfile.tmp', 'myfile');
This seems to work for the find and replace, but if the line doesn't exist it keeps adding it, rather than adding it just once.
I know this is due to the if (!$replaced) line, but I'm not sure how to do this.
The example above is small, but there could be a few thousand entries in the file.
Thanks
function file_properties($fileProperties) {
$result = array();
$lines = explode("\n", $fileProperties); // split() was removed in PHP 7
$key = "";
$isWaitingOtherLine = false;
foreach($lines as $i=>$line) {
if(empty($line) || (!$isWaitingOtherLine && strpos($line,"#") === 0)) continue;
if(!$isWaitingOtherLine) {
$key = substr($line,0,strpos($line,'='));
$value = substr($line,strpos($line,'=') + 1, strlen($line));
} else {
$value .= $line;
}
/* Check if ends with single '\' */
if(strrpos($value,"\\") === strlen($value)-strlen("\\")) {
$value = substr($value, 0, strlen($value)-1)."\n";
$isWaitingOtherLine = true;
} else {
$isWaitingOtherLine = false;
}
$result[$key] = $value;
unset($lines[$i]);
}
return $result;
}
This parses key=value content read from a file with any extension, such as .env in Laravel or any *.properties file in Java.
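Returning to the add-or-update behaviour described in the question, here is a minimal sketch. The helper name upsert_line() is hypothetical; it assumes the file fits in memory (a few thousand lines should be fine) and, like the question's code, it replaces every line that contains the key:

```php
<?php
// Hypothetical helper: replace each line containing $search with $result,
// or append $result once if no line matches.
function upsert_line($path, $search, $result) {
    $lines = file_exists($path)
        ? file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES)
        : array();

    $replaced = false;
    foreach ($lines as $i => $line) {
        if (stripos($line, $search) !== false) {
            $lines[$i] = $result;
            $replaced = true;
        }
    }
    if (!$replaced) {
        $lines[] = $result;
    }

    // Write to a temporary file first, then swap it into place.
    $tmp = $path . '.tmp';
    file_put_contents($tmp, implode("\n", $lines) . "\n");
    return rename($tmp, $path);
}

// Usage with illustrative data.
$path = tempnam(sys_get_temp_dir(), 'txt');
file_put_contents($path, "admin,1234,ID1345,NW\nstaff,1325,ID1001,NE\n");
upsert_line($path, 'ID1345', 'admin,6698,ID1345,OP');   // updates the existing line
upsert_line($path, 'ID9999', 'staff,0001,ID9999,NW');   // appends a new line
upsert_line($path, 'ID9999', 'staff,0001,ID9999,NW');   // second call: no duplicate
$contents = file_get_contents($path);
unlink($path);
```

Since the appended line contains the search key, a later call with the same key finds it and replaces it instead of appending again.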

Call to undefined function str_getcsv()

I get this error
Call to undefined function str_getcsv()
It seems to be a PHP version issue: the function was not available until version 5.3.
Does anyone know a way to replace this function instead of upgrading the PHP version?
I don't know if this actually works, but on the manual page there is some example implementation which you can use as a fallback like this:
if(!function_exists('str_getcsv')) {
function str_getcsv($input, $delimiter = ',', $enclosure = '"') {
if( ! preg_match("/[$enclosure]/", $input) ) {
return (array)preg_replace(array("/^\\s*/", "/\\s*$/"), '', explode($delimiter, $input));
}
$token = "##"; $token2 = "::";
//alternate tokens "\034\034", "\035\035", "%%";
$t1 = preg_replace(array("/\\\[$enclosure]/", "/$enclosure{2}/",
"/[$enclosure]\\s*[$delimiter]\\s*[$enclosure]\\s*/", "/\\s*[$enclosure]\\s*/"),
array($token2, $token2, $token, $token), trim(trim(trim($input), $enclosure)));
$a = explode($token, $t1);
foreach($a as $k=>$v) {
if ( preg_match("/^{$delimiter}/", $v) || preg_match("/{$delimiter}$/", $v) ) {
$a[$k] = trim($v, $delimiter); $a[$k] = preg_replace("/$delimiter/", "$token", $a[$k]); }
}
$a = explode($token, implode($token, $a));
return (array)preg_replace(array("/^\\s/", "/\\s$/", "/$token2/"), array('', '', $enclosure), $a);
}
}
Aha, I found this code snippet in php manual
<?php
if (!function_exists('str_getcsv')) {
function str_getcsv($input, $delimiter = ',', $enclosure = '"', $escape = '\\', $eol = '\n') {
if (is_string($input) && !empty($input)) {
$output = array();
$tmp = preg_split("/".$eol."/",$input);
if (is_array($tmp) && !empty($tmp)) {
while (list($line_num, $line) = each($tmp)) {
if (preg_match("/".$escape.$enclosure."/",$line)) {
while ($strlen = strlen($line)) {
$pos_delimiter = strpos($line,$delimiter);
$pos_enclosure_start = strpos($line,$enclosure);
if (
is_int($pos_delimiter) && is_int($pos_enclosure_start)
&& ($pos_enclosure_start < $pos_delimiter)
) {
$enclosed_str = substr($line,1);
$pos_enclosure_end = strpos($enclosed_str,$enclosure);
$enclosed_str = substr($enclosed_str,0,$pos_enclosure_end);
$output[$line_num][] = $enclosed_str;
$offset = $pos_enclosure_end+3;
} else {
if (empty($pos_delimiter) && empty($pos_enclosure_start)) {
$output[$line_num][] = substr($line,0);
$offset = strlen($line);
} else {
$output[$line_num][] = substr($line,0,$pos_delimiter);
$offset = (
!empty($pos_enclosure_start)
&& ($pos_enclosure_start < $pos_delimiter)
)
?$pos_enclosure_start
:$pos_delimiter+1;
}
}
$line = substr($line,$offset);
}
} else {
$line = preg_split("/".$delimiter."/",$line);
/*
* Validating against pesky extra line breaks creating false rows.
*/
if (is_array($line) && !empty($line[0])) {
$output[$line_num] = $line;
}
}
}
return $output;
} else {
return false;
}
} else {
return false;
}
}
}
?>
You could try doing it with fgetcsv(), although you'd have to write the string to a stream first to be able to read it.
Example:
$fh = fopen('php://temp', 'r+');
fwrite($fh, $string);
rewind($fh);
$row = fgetcsv($fh);
fclose($fh);
I found that the above code snippets do not properly parse csv files where the entries are contained in quotes and contain commas themselves.
But you can combine the other responses here into something that worked for me.
Basically, if the str_getcsv function isn't defined, define it yourself, and its implementation should create an in-memory file handle that is then passed to fgetcsv. For some reason PHP on GoDaddy (where I'm hosting a site) doesn't have str_getcsv but DOES have fgetcsv. Go figure.
if (!function_exists('str_getcsv')) {
function str_getcsv($input, $delimiter = ',', $enclosure = '"', $escape = '\\', $eol = '\n') {
$fh = fopen('php://temp', 'r+');
fwrite($fh, $input);
rewind($fh);
$row = fgetcsv($fh, 0, $delimiter, $enclosure); // pass the caller's delimiter and enclosure through
fclose($fh);
return $row;
}
}
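A quick way to check that the stream-based approach copes with quoted fields containing commas. This sketch uses a hypothetical helper name, csv_line_to_array(), so that it does not clash with the built-in str_getcsv() on newer PHP versions, and it passes the escape character explicitly, which assumes PHP 5.3 or later:

```php
<?php
// Hypothetical helper using the same idea as the polyfill: write the string
// to an in-memory stream and let fgetcsv() do the parsing.
function csv_line_to_array($input, $delimiter = ',', $enclosure = '"') {
    $fh = fopen('php://temp', 'r+');
    fwrite($fh, $input);
    rewind($fh);
    $row = fgetcsv($fh, 0, $delimiter, $enclosure, '\\');
    fclose($fh);
    return $row;
}

// A quoted field with a comma, and a quoted field with doubled quotes.
$row = csv_line_to_array('1,"Gunn, Ben","say ""hi"""');

// A different delimiter is honoured because it is passed on to fgetcsv().
$semi = csv_line_to_array('a;b;c', ';');
```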

Search through CSV, returning only exact fit value

I'm working on a CSV filter again. It searches through about 500 lines of promotional codes and returns the matching amount to an AJAX receiver. The weird thing is that if I enter only 2 letters, instead of looking for an exact match, the PHP script returns a result as soon as it finds a value that contains the letters I entered! I need it to match only an exact 4-character value.
Here's my code so far:
<?php
// if data are received via POST, with index of 'test'
if (isset($_POST['test'])) {
$promocodevalid = false;
$file = fopen('test.csv', 'r');
$coupon = array($_POST['test']);
$coupondef = $_POST['test']; // get data
$coupon = array_map('preg_quote', $coupon);
$regex = '/'.implode('|', $coupon).'/i';
while (($line = fgetcsv($file)) !== FALSE) {
list($promocode, $amount) = $line;
if(preg_match($regex, $promocode)) {
$validity = 1;
echo $amount."[BRK]".$promocode."[BRK]".$validity;
$promocodevalid = true;
break;
}
}
if(!$promocodevalid) {
$validity = 0;
echo $amount."[BRK]".$promocode."[BRK]".$validity;
}
}
?>
Try to avoid regexes where they are not needed. Look for the str* function that fits instead.
The code above could look like this:
if (isset($_POST['test'])) {
$promocodevalid = false;
$file = fopen('test.csv', 'r');
$coupondef = $_POST['test']; // get data
while (($line = fgetcsv($file)) !== FALSE) {
list($promocode, $amount) = $line;
// strcasecmp() compares the whole string, ignoring case;
// use $promocode === $coupondef for a case-sensitive exact match.
if(strcasecmp($promocode, $coupondef) === 0) {
$validity = 1;
echo $amount."[BRK]".$promocode."[BRK]".$validity;
$promocodevalid = true;
break;
}
}
if(!$promocodevalid) {
$validity = 0;
echo $amount."[BRK]".$promocode."[BRK]".$validity;
}
}
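The exact-match lookup can also be wrapped in a small function, which makes it easier to try out without a form post. This is only a sketch: find_promo_amount() is a hypothetical name, and the in-memory data stands in for test.csv:

```php
<?php
// Hypothetical helper: return the amount for an exactly matching promo code
// (case-insensitive), or null when no row matches.
function find_promo_amount($handle, $code) {
    $code = trim($code);
    while (($line = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if (count($line) < 2) {
            continue; // skip blank or malformed rows
        }
        list($promocode, $amount) = $line;
        if (strcasecmp($promocode, $code) === 0) {
            return $amount;
        }
    }
    return null;
}

// Illustrative data in memory instead of test.csv.
$fh = fopen('php://temp', 'r+');
fwrite($fh, "ABCD,10\nABEF,20\n");
rewind($fh);
$partial = find_promo_amount($fh, 'AB');    // only part of a code: no match
rewind($fh);
$exact = find_promo_amount($fh, 'abef');    // full code, different case: match
fclose($fh);
```

Returning null for "not found" also avoids echoing the amount and code of whichever row happened to be read last.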

Regex Unclosed Quote

I am trying to figure out a way to read a CSV with returns in it in PHP. The problem is when you read the file like this:
if (($handle = fopen($file, "r")) !== FALSE) {
while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
// row data...
}
}
If there is a return (line break) inside a field in the CSV, it does not work: the line break is treated as the start of a new row.
My idea was to use a regex to check for unclosed quotes, but I don't know of anything like that. Any ideas?
Since it seems the built-in fgetcsv does not correctly handle the CSV standard, there are suggestions for alternatives on the PHP man page for fgetcsv - here's one of them:
From http://www.php.net/manual/en/function.fgetcsv.php
The PHP's CSV handling stuff is non-standard and contradicts with RFC4180, thus fgetcsv() cannot properly deal with files like this example ...
There is a quick and dirty RFC-compliant realization of CSV creation and parsing:
<?php
function array_to_csvstring($items, $CSV_SEPARATOR = ';', $CSV_ENCLOSURE = '"', $CSV_LINEBREAK = "\n") {
$string = '';
$o = array();
foreach ($items as $item) {
if (stripos($item, $CSV_ENCLOSURE) !== false) {
$item = str_replace($CSV_ENCLOSURE, $CSV_ENCLOSURE . $CSV_ENCLOSURE, $item);
}
if ((stripos($item, $CSV_SEPARATOR) !== false)
|| (stripos($item, $CSV_ENCLOSURE) !== false)
|| (stripos($item, $CSV_LINEBREAK) !== false)) {
$item = $CSV_ENCLOSURE . $item . $CSV_ENCLOSURE;
}
$o[] = $item;
}
$string = implode($CSV_SEPARATOR, $o) . $CSV_LINEBREAK;
return $string;
}
function csvstring_to_array(&$string, $CSV_SEPARATOR = ';', $CSV_ENCLOSURE = '"', $CSV_LINEBREAK = "\n") {
$o = array();
$cnt = strlen($string);
$esc = false;
$escesc = false;
$num = 0;
$i = 0;
while ($i < $cnt) {
$s = $string[$i];
if ($s == $CSV_LINEBREAK) {
if ($esc) {
$o[$num] .= $s;
} else {
$i++;
break;
}
} elseif ($s == $CSV_SEPARATOR) {
if ($esc) {
$o[$num] .= $s;
} else {
$num++;
$esc = false;
$escesc = false;
}
} elseif ($s == $CSV_ENCLOSURE) {
if ($escesc) {
$o[$num] .= $CSV_ENCLOSURE;
$escesc = false;
}
if ($esc) {
$esc = false;
$escesc = true;
} else {
$esc = true;
$escesc = false;
}
} else {
if ($escesc) {
$o[$num] .= $CSV_ENCLOSURE;
$escesc = false;
}
$o[$num] .= $s;
}
$i++;
}
// $string = substr($string, $i);
return $o;
}
?>
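Before swapping parsers, it may be worth checking whether the field with the line break is actually enclosed in double quotes in the file. As far as I know, fgetcsv() in current PHP versions keeps a line break that sits inside a quoted field. The sketch below uses made-up data and passes 0 as the length, so the 1000-byte limit from the question does not apply:

```php
<?php
// Illustrative data: the second field of the data row contains a line break.
$csv = "id,note,city\n1,\"first line\nsecond line\",Beijing\n";

$fh = fopen('php://temp', 'r+');
fwrite($fh, $csv);
rewind($fh);

$rows = array();
// Length 0 removes the per-line limit, so a long multi-line field is not cut off.
while (($data = fgetcsv($fh, 0, ',', '"', '\\')) !== false) {
    $rows[] = $data;
}
fclose($fh);
```

If the line break is not inside quotes, no CSV parser can tell it apart from a row separator, and the export would need fixing instead.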
