How to process a CSV file with different enclosures - php

I am uploading a CSV file and parsing it with the fgetcsv() function. I run into three cases:
1) The CSV file is clean and contains no enclosures
--> e.g. Name,Age,Address. In this scenario the file is processed properly and uploaded.
2) The CSV file uses " as the enclosure
--> e.g. "Name","Age","Address". In this scenario the file is processed properly if I pass the double quote as the enclosure to fgetcsv().
3) The CSV file uses ' as the enclosure
--> e.g. 'Name','Age','Address'.
In this scenario the file is not processed at all and not uploaded.
I want to handle all of these cases, meaning that a CSV file using either enclosure should be processed properly.

Check the manual for fgetcsv(). The function accepts several optional arguments, including the enclosure (the fourth parameter).
For your third case, call it like this:
fgetcsv($handle, 0, ',', "'");

You can read the first character of the file. If it is a single quote, pass a single quote to fgetcsv() as the enclosure character; otherwise pass a double quote.
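The first-character check suggested above could be sketched roughly as follows. The helper names detectEnclosure() and readCsvRows() are made up for illustration, and the sketch assumes the file is either unquoted, fully double-quoted, or fully single-quoted; a file whose first field merely begins with an apostrophe would be misdetected.

```php
<?php
// Hypothetical helper: guess the enclosure from the first byte of the file.
function detectEnclosure(string $path): string
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $first = fgetc($handle); // false on an empty file
    fclose($handle);

    // Single quote only if the file starts with one; otherwise the default.
    return $first === "'" ? "'" : '"';
}

// Hypothetical helper: read every row using the detected enclosure.
function readCsvRows(string $path): array
{
    $enclosure = detectEnclosure($path);
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $rows = [];
    // The escape character is passed explicitly (backslash is PHP's default).
    while (($row = fgetcsv($handle, 0, ',', $enclosure, '\\')) !== false) {
        $rows[] = $row;
    }
    fclose($handle);
    return $rows;
}
```

Because the double quote is the fallback, unquoted files and double-quoted files go through the same call, which covers cases 1 and 2 from the question.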

Related

How to use fgetcsv() when the CSV has several double quotes """" or if the entire line is wrapped in quotes?

Some CSV files that we import to our server cannot be parsed correctly.
We are reading the CSV file with PHP's fgetcsv():
while (($line = fgetcsv($file)) !== false) { ... }
However, when the CSV line is wrapped in quotes (and contains two double quotes inside), for example:
"first entry,"""","""",Data Chunk,2022-05-30"
The fgetcsv() function cannot handle the line correctly and sees the first entry,"""","""",Data Chunk,2022-05-30 as one entry.
How can we make sure the function treats first entry as a separate entry, and also interprets the other parts """" as empty entries?
On more research I found:
Fields containing double quotes ("), Line Break (CRLF) and Comma must be enclosed with double quotes.
If Fields enclosed by double quotes (") contain double quotes character then the double quotes inside the field must be preceded with another double quote as an escape sequence. Source
This is likely the issue that we face here.
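To make the quoting rule concrete, here is a small sketch using str_getcsv() on two literal lines. The first line is a made-up example of a correctly escaped field; the second is the problem line from above. Since the outer quotes wrap the entire record, the parser has no way to see the inner commas as separators.

```php
<?php
// A doubled quote inside a quoted field is read back as one literal quote.
$ok = str_getcsv('"He said ""hi""",2022-05-30', ',', '"', '\\');
// $ok has two fields: He said "hi"  and  2022-05-30

// The problem line: the outer quotes enclose everything, so the commas
// inside them are treated as data and the whole record is a single field.
$wrapped = str_getcsv('"first entry,"""","""",Data Chunk,2022-05-30"', ',', '"', '\\');

var_dump($ok);
var_dump(count($wrapped)); // one field
```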
A more complete data example of the CSV:
Allgemeines
Subject,Body,Attachment,Author,Created At,Updated At
"Hello everyone, this is a sample. Kind regards,"""","""",Author name (X),2022-05-30 14:54:32 UTC,2022-05-30 14:54:37 UTC"
","""",https://padlet-uploads.storage.googleapis.com/456456456/testfile.docx,Author name (X),2022-05-15 13:53:04 UTC,2022-05-15 13:54:40 UTC"
",""Hello everyone!"
This is some fun text.
More to come.
Another sentence.
And more text.
Even more text
See you soon.
","",Author name (X),2021-07-22 09:41:06 UTC,2021-07-23 16:12:42 UTC
""
Important Things to Know in 2022
Subject,Body,Attachment,Author,Created At,Updated At
"","
01.01.2022 First day of new year
02.02.2202 Second day of new year
Please plan ahead.
","",Author name (X),2021-07-22 09:58:19 UTC,2022-03-24 14:16:50 UTC
""
Note: each line starts with a double quote and ends with a double quote followed by a carriage return and a line feed.
It turns out the CSV data was corrupted.
The user edited the CSV in Excel and, as stated in the comments, likely overwrote the original file, which caused the double escaping.
For anyone facing the same issue:
Do not waste your time trying to recover corrupted CSV files with a custom parser.
Ask your user to give you access to the original CSV export site and generate the CSV yourself.
Check the CSV integrity with the code below.
$file = fopen($csvfile, 'r');
// Check that all records have the same number of fields. Here empty lines give a
// count of 1 and full entries a count of 6; this depends on your CSV structure.
$length_array = array();
while (($data = fgetcsv($file, 1000, ",")) !== false) {
    // record the number of fields in this row
    $length_array[] = count($data);
}
$length_array = array_unique($length_array);
// release the file handle
fclose($file);
// depending on your CSV structure, count($length_array) should be 1 or 2
if (count($length_array) > 2) {
    // field count mismatch
    return 'Invalid CSV!';
}

Changing a comma delimited csv file to a double quoted comma delimited csv file PHP

I have a CSV file that I'm trying to change from comma delimited to double-quoted comma delimited with PHP, and I don't even know where to begin.
Here are the first few lines of the file:
ID,LastName,FirstName,Grade
13,Donahue,Apple,5
21,Westen,Craig,4
I tried using str_replace, but it doesn't add the " to the beginning or the end of each row.
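One possible approach, sketched under a few assumptions: fputcsv() only encloses fields that need it, so plain values like Donahue would stay unquoted. The sketch below therefore parses each row with fgetcsv() and builds the quoted line by hand. The function names quoteCsvLine() and convertToQuotedCsv() are invented for this example, and arrow functions require PHP 7.4 or later.

```php
<?php
// Hypothetical helper: wrap every field in double quotes, doubling any
// double quote that appears inside a field.
function quoteCsvLine(array $fields): string
{
    $quoted = array_map(
        static fn($field) => '"' . str_replace('"', '""', (string) $field) . '"',
        $fields
    );
    return implode(',', $quoted);
}

// Hypothetical helper: read a comma delimited file and write a fully quoted copy.
function convertToQuotedCsv(string $inPath, string $outPath): void
{
    $src = fopen($inPath, 'r');
    $dst = fopen($outPath, 'w');
    if ($src === false || $dst === false) {
        throw new RuntimeException('Cannot open input or output file');
    }
    while (($row = fgetcsv($src, 0, ',', '"', '\\')) !== false) {
        if ($row === [null]) {
            continue; // fgetcsv returns [null] for a blank line
        }
        fwrite($dst, quoteCsvLine($row) . "\n");
    }
    fclose($src);
    fclose($dst);
}
```

With the sample rows above, quoteCsvLine(['13', 'Donahue', 'Apple', '5']) produces "13","Donahue","Apple","5".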

fgetcsv/fputcsv $escape parameter fundamentally broken

Overview
fgetcsv and fputcsv support an $escape argument; however, either it is broken or I don't understand how it's supposed to work. Ignore the fact that the $escape parameter isn't documented for fputcsv: it is supported in the PHP source, and a small bug prevents it from showing up in the documentation.
The functions also support $delimiter and $enclosure parameters, defaulting to a comma and a double quote respectively. I would expect that $escape is what allows a field to contain any one of those metacharacters (backslash, comma or double quote); however, this certainly isn't the case. (I now understand from reading Wikipedia that such fields are to be enclosed in double quotes.)
What I've tried
Take for example the pitfall that has affected numerous posters in the comments section from the fgetcsv documentation. The case where we'd like to write a single backslash to a field.
$r = fopen('/tmp/test.csv', 'w');
fwrite($r, '"\"');
fclose($r);
$r = fopen('/tmp/test.csv', 'r');
var_dump(fgetcsv($r));
fclose($r);
This returns false. I've also tried "\\", however that also returns false. Padding the backslash(es) with some nebulous text gives fgetcsv the boost it needs... "hi\\there" and "hi\there" both parse and have the same result, but the result has only 1 backslash, so what's the point of the $escape at all?
I've observed the same behavior when not enclosing the backslash in double quotes. Writing a 'CSV' file containing the string \, and \\, have the same result when parsed by fgetcsv, 1 backslash.
Let's ask PHP how it might encode a backslash as a field in a CSV using fputcsv
$r = fopen('/tmp/test.csv', 'w');
fputcsv($r, array('\\'));
fclose($r);
echo file_get_contents('/tmp/test.csv');
The result is a single backslash enclosed in double quotes (I've tried 3 versions of PHP > 5.5.4, when $escape support was supposedly added to fputcsv). The hilarity of this is that fgetcsv can't even read it properly, per my notes above: it returns false. I'd expect fputcsv not to enclose the backslash in double quotes, or fgetcsv to be able to read "\" as fputcsv has written it, or really, in my apparently misconstrued mind, fputcsv to write a double-quote-enclosed pair of backslashes and fgetcsv to be able to parse it properly!
Reproducible Test
Try writing a single backslash to a file using fputcsv, then reading it back via fgetcsv.
$aBackslash = array('\\');
// Write a single backslash to a file using fputcsv
$r = fopen('/tmp/test.csv', 'w');
fputcsv($r, $aBackslash);
fclose($r);
// Read the file using fgetcsv
$r = fopen('/tmp/test.csv', 'r');
$aFgetcsv = fgetcsv($r);
fclose($r);
// Compare the read value from fgetcsv to our original value
if(count(array_diff($aBackslash, $aFgetcsv)))
echo "PHP CSV support is broken\n";
Questions
Taking a step back I have some questions
What's the point of the $escape parameter?
Given the loose definition of CSV files, can it be said PHP is supporting them correctly?
What's the 'proper' way to encode a backslash in a CSV file?
Background
I initially discovered this when a co-worker gave me a CSV file produced by Python, which wrote out a single backslash enclosed in double quotes. After fgetcsv failed to read it, I had the gall to ask him if he could use a standard Python function. Little did I know the PHP CSV toolkit is a tangled mess! (FWIW: the Python dev tells me he's using the CSV writing module.)
From a quick look at Python's documentation on CSV Format Parameters, the escape character used within enclosed values (i.e. inside double quotes) is another double quote.
For PHP, the default escape character is a backslash (^); to match Python's behaviour you need to use this:
$data = fgetcsv($r, 0, ',', '"', '"');
(^) Actually fgetcsv() treats both $enclosure||$enclosure and $escape||$enclosure in the same way, so the $escape argument is used to avoid treating the backslash as a special character.
(^^) Setting the $length parameter to 0 (no limit), as in the call above, instead of a fixed hard limit makes parsing slightly less efficient.
EDIT 2
So after some sleep and another look at the code, it turns out fputcsv doesn't accept the escape parameter, and I was being stupid. I've updated the code below to proper working code. The same basic principle applies: the escape parameter is there to change the escape character, so you can load a CSV with backslashes without them being treated as escape characters. The trick is to use a character that isn't contained in the CSV. You can find one by grepping the file for candidate characters until you hit one that returns no matches.
EDIT
Ok, so the verdict is that it checks for the escape char, and then never stops checking. So, if it finds it, it's escaped. That simple.
That said, the purpose of the escape parameter is to allow for this exact situation, where you can alter the escape char to a character that isn't needed.
Here I've converted your example code into working code:
$aBackslash = array('\\');
// Write a single backslash to a file using fputcsv
$r = fopen('/tmp/test.csv', 'w');
fputcsv($r, $aBackslash, ',', '"'); // EDIT 2: Removed escape param that causes PHP Notice.
fclose($r);
// Read the file using fgetcsv
$r = fopen('/tmp/test.csv', 'r');
$aFgetcsv = fgetcsv($r, 0, ',', '"', '#'); // $length (0 = no limit) must come before the delimiter
fclose($r);
// Compare the read value from fgetcsv to our original value
if(count(array_diff($aBackslash, $aFgetcsv)))
echo "PHP CSV support is broken\n";
else
echo "PHP WORKS!\n";
One important caveat is that both fgetcsv and fputcsv must have the same parameters, otherwise the returned array will not match up to the original array.
ORIGINAL ANSWER
You are very correct. This is a failing with the language. I've tried every permutation of slashes that I can think of, and I've yet to actually achieve a successful response from the CSV. It always returns just as your example says.
I think what @deceze was mentioning is that in your example you use array('\\'), which is actually the string literal "\". PHP interprets it as such and passes "\" to the CSV, which is then returned that way. This returns the erroneous response \", which, as I stated above, is definitely wrong.
I did manage to find a workaround, so that the result is actually appropriate:
First, for your example we'll either need to generate /tmp/test.csv with "\" as the body, or alter the array slightly. The easiest method is just changing the array to:
array('"\\\\"');
After that, we should change up the fgetcsv request a bit.
$aFgetcsv = fgetcsv($r);
$aFgetcsv = array_map('stripslashes', $aFgetcsv);
By doing this, we're telling PHP to strip the first slash, thus making the string within $aFgetcsv "\"
Just had the same problem. The solution was to set $escape to false:
$row = ['a', '{"b":"single dquote=\""}', 'c'];
fputcsv($f, $row); // invalid csv: a,"{""b"":""single dquote=\"""}",c
fputcsv($f, $row, ',', '"', false); // valid csv: a,"{""b"":""single dquote=\""""}",c
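A round-trip sketch of that idea, assuming PHP 7.4 or later, where an empty string for $escape is documented to disable PHP's proprietary escape mechanism (false is coerced to an empty string in non-strict mode). The function name csvRoundTrip() is invented for illustration; the point is that writing and reading with the same parameters returns the original values, including a lone backslash.

```php
<?php
// Hypothetical helper: write one row to an in-memory stream and read it back,
// using the same delimiter, enclosure and (disabled) escape on both sides.
function csvRoundTrip(array $row): array
{
    $f = fopen('php://memory', 'w+');
    fputcsv($f, $row, ',', '"', ''); // '' disables the backslash escape
    rewind($f);
    $back = fgetcsv($f, 0, ',', '"', '');
    fclose($f);
    return $back;
}

$backslash = csvRoundTrip(['\\']);
$json = csvRoundTrip(['a', '{"b":"single dquote=\""}', 'c']);
var_dump($backslash === ['\\']);
var_dump($json === ['a', '{"b":"single dquote=\""}', 'c']);
```

With the escape disabled, the only special sequence left inside an enclosed field is the doubled double quote, which is what most other CSV writers produce.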

Export to txt file error occurred

I have exported data to a text file using PHP/MySQL. I wrote into file.txt using the instruction below:
fwrite($fp ,'' . $value. '' . "\t");
Everything works, but a problem appears when a field in the DB contains a ',' character, like this:
Section= Society, Education & Youth
In the resulting text file the section value is split across two columns, which is wrong: the section is a single value and should end up in one cell (I see the problem when opening the file in Excel).
So the question is, how can I tell the output to ignore the ',' in some values so that they are not split into two columns?
A comma-separated values (CSV) file can use an enclosure character, usually a double quote, to mark text for just that case. If your data does not have quotes within the text, then you can give that a try. You can tell Excel what the delimiter is. You can also use a different character (tab, comma, etc.) to delimit the fields, as long as Excel knows how you're delimiting the data.
// try with quotes
fwrite($fp, "\"$value\"\t");
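As an alternative sketch, fputcsv() can do the quoting for you: it takes the delimiter as an argument (a tab here) and encloses any field that needs it, doubling embedded quotes as well. The row contents are taken from the question; writing to php://memory instead of file.txt is only so the example is self-contained.

```php
<?php
$row = ['Section', 'Society, Education & Youth'];

// In real code this would be the handle opened on file.txt.
$fp = fopen('php://memory', 'w+');

// Tab as delimiter, double quote as enclosure, backslash as escape (PHP's default).
fputcsv($fp, $row, "\t", '"', '\\');

// Read the line back with the same settings to confirm the value stays in one field.
rewind($fp);
$readBack = fgetcsv($fp, 0, "\t", '"', '\\');
fclose($fp);

var_dump($readBack === $row);
```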

Use fgetcsv with and without quotations around entries

Edit: is there an alternative to fgetcsv?
The code below processes CSV files where each entry is encased in quotes and separated by commas, e.g. "Name","Last"... The problem I'm having is that sometimes the CSV files do not have quotes around each entry and only use the comma as a separator, e.g. Name,Last. How can I handle both types?
$uploadcsv = "/temp/files/Load15.csv";
$handle = fopen($uploadcsv, 'r');
$column_headers = array();
$row_count = 0;
while (($data = fgetcsv($handle, 100000, ",")) !== FALSE) {
    if ($row_count == 0) {
        $column_headers = $data;
    } else {
        print_r($data);
    }
    ++$row_count;
}
this csv works:
"Name","Last"
"Mike","Aidens"
"Mike1","Aidens1"
this csv does not work:
Name,Last
Mike,Aidens
Mike1,Aidens1
Edit: Strange error... I tried a small snippet from the CSV file with no quotations and it worked. Odd. Then I tried a larger piece, and then the entire CSV content (all pasted into a new test.csv file), and it worked. Both files are exactly the same size, 17,151 KB, yet the original CSV file will not process. There are no trailing spaces or empty lines at the end.
Set the 4th parameter to an empty string. It sets the enclosure, which defaults to ".
fgetcsv($handle, 100000, ",", '');
Use this line of code before the fgetcsv function call:
ini_set('auto_detect_line_endings',TRUE);
As far as I am aware fgetcsv should work fine with or without quotes around the data.
Unless the CSV file is malformed, this will "just work".
In other words, you don't need to worry about whether or not every field has quotes around it; fgetcsv will take care of this for you.
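A quick way to check this claim is to feed both variants from the question through the same fgetcsv() call and compare the results. The helper name parseCsvText() is invented for this sketch, and it reads from an in-memory stream rather than the uploaded file.

```php
<?php
// Hypothetical helper: parse CSV text with one fixed fgetcsv() configuration.
function parseCsvText(string $text): array
{
    $handle = fopen('php://memory', 'w+');
    fwrite($handle, $text);
    rewind($handle);

    $rows = [];
    while (($data = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        $rows[] = $data;
    }
    fclose($handle);
    return $rows;
}

$quoted = parseCsvText("\"Name\",\"Last\"\n\"Mike\",\"Aidens\"\n");
$plain  = parseCsvText("Name,Last\nMike,Aidens\n");

// Both inputs yield the same rows, so the quotes are not the deciding factor.
var_dump($quoted === $plain);
```

If the original file still fails while a pasted copy works, the difference is more likely in line endings, encoding, or locale than in the quoting.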
Had the same problem, it couldn't read Hebrew (utf-8) letters without double quotes. It ran fine on the command line (could read Hebrew without double quotes), but in Apache it read only the header which had double quotes and returned empty strings instead of Hebrew strings in the rest of the lines which did not have double quotes at all.
Checked the locale in Apache and it returned the letter "C", but in the command line it returned "LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=C;LC_COLLATE=C;LC_MONETARY=C;LC_MESSAGES=C;LC_PAPER=C;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=C;LC_IDENTIFICATION=C"
Thus I've added the following line before the fgetcsv command:
setlocale(LC_CTYPE, 'en_US.UTF-8');
And it worked, and read Hebrew letters without double quotes successfully.
