fgetcsv/fputcsv $escape parameter fundamentally broken - php

Overview
fgetcsv and fputcsv support an $escape argument; however, it's either broken or I'm not understanding how it's supposed to work. Ignore the fact that you don't see the $escape parameter documented on fputcsv: it is supported in the PHP source, and a small bug is preventing it from showing up in the documentation.
The function also supports $delimiter and $enclosure parameters, defaulting to a comma and a double quote respectively. I would expect the $escape character to be needed for a field to contain any one of those metacharacters (backslash, comma or double quote), but this certainly isn't the case. (I now understand from reading Wikipedia that such fields are to be enclosed in double quotes.)
What I've tried
Take, for example, the pitfall that has affected numerous posters in the comments section of the fgetcsv documentation: the case where we'd like to write a single backslash to a field.
$r = fopen('/tmp/test.csv', 'w');
fwrite($r, '"\"');
fclose($r);
$r = fopen('/tmp/test.csv', 'r');
var_dump(fgetcsv($r));
fclose($r);
This returns false. I've also tried "\\", but that also returns false. Padding the backslash(es) with some nebulous text gives fgetcsv the boost it needs: "hi\\there" and "hi\there" both parse and give the same result, but the result has only one backslash, so what's the point of $escape at all?
I've observed the same behavior when not enclosing the backslash in double quotes. Writing a 'CSV' file containing the string \, or \\, gives the same result when parsed by fgetcsv: one backslash.
Let's ask PHP how it might encode a backslash as a field in a CSV using fputcsv:
$r = fopen('/tmp/test.csv', 'w');
fputcsv($r, array('\\'));
fclose($r);
echo file_get_contents('/tmp/test.csv');
The result is a single backslash enclosed in double quotes (I've tried 3 versions of PHP > 5.5.4, the release in which $escape support was supposedly added to fputcsv). The hilarity of this is that fgetcsv can't even read it properly; per my notes above, it returns false. I'd expect either fputcsv not to enclose the backslash in double quotes, or fgetcsv to be able to read "\" as fputcsv has written it; or really, in my apparently misconstrued mind, for fputcsv to write a double-quote-enclosed pair of backslashes and for fgetcsv to be able to parse it properly!
Reproducible Test
Try writing a single backslash to a file using fputcsv, then reading it via fgetcsv.
$aBackslash = array('\\');
// Write a single backslash to a file using fputcsv
$r = fopen('/tmp/test.csv', 'w');
fputcsv($r, $aBackslash);
fclose($r);
// Read the file using fgetcsv
$r = fopen('/tmp/test.csv', 'r');
$aFgetcsv = fgetcsv($r);
fclose($r);
// Compare the read value from fgetcsv to our original value
if(count(array_diff($aBackslash, $aFgetcsv)))
echo "PHP CSV support is broken\n";
Questions
Taking a step back, I have some questions:
What's the point of the $escape parameter?
Given the loose definition of CSV files, can it be said PHP is supporting them correctly?
What's the 'proper' way to encode a backslash in a CSV file?
Background
I initially discovered this when a co-worker provided me a CSV file produced from Python, which wrote out a single backslash enclosed in double quotes, and fgetcsv failed to read it. I had the gall to ask him if he could use a standard Python function. Little did I know the PHP CSV toolkit is a tangled mess! (FWIW: the Python dev tells me he's using the CSV writing module.)

From a quick look at Python's documentation on CSV Format Parameters, the escape character used within enclosed values (i.e. inside double quotes) is another double quote.
For PHP, the default escape character is a backslash (^); to match Python's behaviour you need to use this:
$data = fgetcsv($r, 0, ',', '"', '"');
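(^) Actually, fgetcsv() treats both a doubled enclosure ("") and the escape character followed by the enclosure (\") the same way: neither one ends the field. Passing the enclosure as the $escape argument therefore just stops the backslash from being treated as a special character.
Note: passing 0 as the $length parameter, as above, removes the line-length limit, which is slightly less efficient than a fixed hard limit.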
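To see what passing the enclosure as $escape does, here is a small sketch using str_getcsv(), which takes the same $delimiter, $enclosure and $escape arguments as fgetcsv() but parses a string, so no temporary file is needed. The sample line is made up to mimic Python's csv output: a lone backslash in quotes, and a field whose quotes are escaped by doubling.

```php
<?php
// Made-up line in the style Python's csv module writes: field 1 is a lone
// backslash in quotes, field 2 escapes its quotes by doubling them.
$line = '"\\","say ""hi"""'; // characters: "\","say ""hi"""

// Pass the enclosure (") as $escape too: quotes are then only ever escaped
// by doubling, and the backslash is plain data.
$fields = str_getcsv($line, ',', '"', '"');

var_dump($fields); // two fields: a single backslash, and: say "hi"
```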

EDIT 2
So after sleep and a relook at the code, it turns out fputcsv doesn't accept the escape parameter on PHP versions before 5.5.4, and I was being stupid. I've updated the code below to proper working code. The same basic principle applies: the escape parameter is there to change the escape character, so you can load a CSV containing backslashes without them being treated as escape characters. The trick is to use a character that isn't contained in the CSV. You can find one by grepping the file for candidate characters until you find one that isn't returned.
EDIT
OK, so the verdict is that the parser checks for the escape character and never stops checking: wherever it finds one, the character that follows is treated as escaped. That simple.
That said, the purpose of the escape parameter is to allow for exactly this situation: you can change the escape character to one that doesn't occur in your data.
Here I've converted your example code to a working code:
$aBackslash = array('\\');
// Write a single backslash to a file using fputcsv
$r = fopen('/tmp/test.csv', 'w');
fputcsv($r, $aBackslash, ',', '"'); // EDIT 2: Removed escape param that causes PHP Notice.
fclose($r);
// Read the file using fgetcsv
$r = fopen('/tmp/test.csv', 'r');
$aFgetcsv = fgetcsv($r, 0, ',', '"', '#'); // 0 = no line-length limit; '#' = escape char that never appears in the data
fclose($r);
// Compare the read value from fgetcsv to our original value
if(count(array_diff($aBackslash, $aFgetcsv)))
echo "PHP CSV support is broken\n";
else
echo "PHP WORKS!\n";
One important caveat is that both fgetcsv and fputcsv must have the same parameters, otherwise the returned array will not match up to the original array.
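A minimal round-trip sketch of this approach, assuming PHP 5.5.4 or later (where fputcsv() accepts $escape) and using an in-memory php://memory stream instead of /tmp/test.csv. The helper name roundTrip() is hypothetical; the point is that both calls receive the same delimiter, enclosure and escape character, here '#', which does not occur in the data.

```php
<?php
// Hypothetical helper: write one row with fputcsv() and read it back with
// fgetcsv(), passing the same delimiter, enclosure and escape to both.
// Requires PHP >= 5.5.4 for the $escape argument of fputcsv().
function roundTrip(array $row, $escape)
{
    $h = fopen('php://memory', 'w+');          // in-memory stand-in for /tmp/test.csv
    fputcsv($h, $row, ',', '"', $escape);
    rewind($h);                                // back to the start before reading
    $read = fgetcsv($h, 0, ',', '"', $escape); // 0 = no line-length limit
    fclose($h);
    return $read;
}

// '#' does not occur in any of these values, so it is a safe escape character.
var_dump(roundTrip(['\\'], '#') === ['\\']);             // single backslash
var_dump(roundTrip(['a\\', 'b'], '#') === ['a\\', 'b']); // trailing backslash
var_dump(roundTrip(['a\\"b'], '#') === ['a\\"b']);       // backslash before a quote
```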
ORIGINAL ANSWER
You are very correct. This is a failing with the language. I've tried every permutation of slashes that I can think of, and I've yet to actually achieve a successful response from the CSV. It always returns just as your example says.
I think what @deceze was mentioning is that in your example you use array('\\'), which is actually the string literal "\" (PHP interprets \\ as a single backslash), so "\" is passed to the CSV and returned that way. This returns the erroneous response \", which, as I stated above, is definitely wrong.
I did manage to find a work around, so that the result is actually appropriate:
First, for your example we'll need to either generate /tmp/test.csv with "\" as its body, or alter the array slightly. The easiest method is just changing the array to:
array('"\\\\"');
After that, we should change up the fgetcsv request a bit.
$aFgetcsv = fgetcsv($r);
$aFgetcsv = array_map('stripslashes', $aFgetcsv);
By doing this, we're telling PHP to strip the first slash, thus making the string within $aFgetcsv "\"

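Just had the same problem. The solution was to set $escape to false (in PHP's default, non-strict typing mode it is coerced to an empty string, which disables the escape mechanism as of PHP 7.4):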
$row = ['a', '{"b":"single dquote=\""}', 'c'];
fputcsv($f, $row); // invalid csv: a,"{""b"":""single dquote=\"""}",c
fputcsv($f, $row, ',', '"', false); // valid csv: a,"{""b"":""single dquote=\""""}",c
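A sketch that checks this with an in-memory stream, assuming PHP 7.4 or later and passing the empty string directly instead of false. It captures the line fputcsv() writes and reads it back with fgetcsv() using the same empty escape; php://memory stands in for the real file behind $f.

```php
<?php
$row = ['a', '{"b":"single dquote=\""}', 'c'];

$f = fopen('php://memory', 'w+');     // in-memory stand-in for the real file
fputcsv($f, $row, ',', '"', '');      // '' = no escape character (PHP >= 7.4)
rewind($f);
$line = stream_get_contents($f);      // the raw CSV line that was written
rewind($f);
$back = fgetcsv($f, 0, ',', '"', ''); // read it back with the same settings
fclose($f);

echo $line;                           // a,"{""b"":""single dquote=\""""}",c
var_dump($back === $row);             // bool(true)
```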

Related

Is there a way to replace some quotation marks with preg_split and to leave some out?

I'm trying to use preg_split() but the results aren't what I expect to get from the function.
I'm new to PHP and the whole preg_split() scene, and it seems too complicated for me to understand, at least for the moment.
$row = 'EL10,40,2019-02-06,55555,2019-01-06,ar#email.com,"Text , random text , 52555885/ 48484848484",Yes,One Two,Broke,2019-01-01,000.00,0.00,0.0,0.0,0.0,0.00,0.00,0.0,VRA "Morning"';
$row_expl = preg_split('/(?:[^"]*"|)\K\s*(,\s*|$)/',$row);
I expect to remove comma delimiters while leaving commas in quotation marks.
Everything almost seems to work, the only problem occurs at the very end.
It adds extra quotation marks to: VRA "Morning".
The result seems like this: "VRA ""Morning"""
A regex actually is the wrong tool for your problem. A CSV parser that defines the delimiter and enclosure character is the tool you need.
str_getcsv('EL10,40,2019-02-06,55555,2019-01-06,ar#email.com,"Text , random text , 52555885/ 48484848484",Yes,One Two,Broke,2019-01-01,000.00,0.00,0.0,0.0,0.0,0.00,0.00,0.0,VRA "Morning"')
The default delimiter for str_getcsv is , and the default enclosure character is ", so you should be all set with the default options. You can see more about the function here: http://php.net/manual/en/function.str-getcsv.php.
https://3v4l.org/GFWkr
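Here is the same call as a runnable sketch, with the question's data written as a single-quoted PHP string (so the embedded double quotes need no escaping) and with the default delimiter, enclosure and escape passed explicitly. The quoted field keeps its commas, and the unenclosed last field keeps its quotes as plain text.

```php
<?php
$row = 'EL10,40,2019-02-06,55555,2019-01-06,ar#email.com,"Text , random text , 52555885/ 48484848484",Yes,One Two,Broke,2019-01-01,000.00,0.00,0.0,0.0,0.0,0.00,0.00,0.0,VRA "Morning"';

// Delimiter, enclosure and escape are PHP's defaults, passed explicitly.
$fields = str_getcsv($row, ',', '"', '\\');

echo count($fields), "\n"; // 20
echo $fields[6], "\n";     // Text , random text , 52555885/ 48484848484
echo $fields[19], "\n";    // VRA "Morning"
```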

Saving source code from HTML textarea to file

I am saving C++ code from a textarea of an HTML form using PHP.
The problem is if my code is like below,
printf("%d\n");
printf("%d\n");
the code that is saved to the file is like this:
printf(\"%d\\n\");\nprintf(\"%d\\n\");
I want the original code to be saved in the file. If I use,
$sourceCode = str_replace('\n',"\n", $sourceCode);
$sourceCode = str_replace('\"',"\"", $sourceCode);
the result is like below (saved in the file):
printf("%d\
");printf("%d\
");
It is clear that replacing \n in the source code replaces all the HTML-created \n along with the \n that the user gave as input (the original text). The only difference is that the user's input has an additional \ before the \n, that is, \\n.
How can I resolve the problem such that only the implicit escape characters will be replaced, but the explicit escape characters, that the user wrote himself, will not be changed?
As mentioned by KenB, we need to see the PHP code that you are using to process the form input.
Processing Form Input
It looks to me like addslashes has been used on the form input.
If you are doing that in your code, don't. This is not the proper way to process form input. Instead, you should use the correct function (such as htmlspecialchars or mysqli_real_escape_string) to escape the input before you use it. Read about addslashes.
If you are using an older version of PHP where magic_quotes_gpc is on by default, then you should fix that. Read about 'Disabling Magic Quotes'.
Stripping Out the Slashes
If you have no control over the code that is adding the slashes, then you can remove them with a simple PHP function called stripslashes.
$sourceCode = stripslashes($sourceCode);
Read about stripslashes.
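A quick illustration of stripslashes() on one line of the submitted code, assuming the slashes were added by addslashes() or magic quotes as described above. The variable $saved is hypothetical and holds the text exactly as it ended up in the file.

```php
<?php
// Hypothetical: one line of the C++ code as it was saved, after addslashes()
// (or magic quotes) had escaped it: printf(\"%d\\n\");
$saved = 'printf(\\"%d\\\\n\\");';

// stripslashes() undoes addslashes(): \" becomes " and \\ becomes \.
$sourceCode = stripslashes($saved);
echo $sourceCode, "\n"; // printf("%d\n");

// Round trip: stripslashes(addslashes($x)) gives $x back.
var_dump(stripslashes(addslashes($sourceCode)) === $sourceCode); // bool(true)
```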
Understanding Escape Sequences
Your str_replace code shows a lack of understanding about escape sequences and/or a lack of understanding about single vs double quotes.
In the following code, a literal \n is replaced with a line break. With the double quotes, PHP interprets the \n as an escape sequence rather than a literal string.
$sourceCode = str_replace('\n',"\n", $sourceCode);
What you want is to replace a literal \\n with a literal \n. Note that to specify a literal backslash it must be doubled; hence the triple backslash you see below.
$sourceCode = str_replace('\\\n', '\n', $sourceCode);
And although this next line accomplishes what you wanted...
$sourceCode = str_replace('\"',"\"", $sourceCode);
...it could have been written differently. The following code is easier to read, saves you having to escape the literal ", and doesn't require PHP to interpret the string.
$sourceCode = str_replace('\"', '"', $sourceCode);
I've given the above code as examples to explain how PHP interprets escape sequences, but don't use them. Either avoid adding the slashes in the first place or strip them using the proper function, as explained in the first part of this answer.
Read more about escape sequences and quoting strings.
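A small sketch of the single- vs double-quote rules discussed above, using strlen() to count what each literal actually contains. The sample string 'a\\\nb' is made up for illustration.

```php
<?php
// Single quotes: only \\ and \' are escapes, so '\n' is a backslash plus "n".
var_dump(strlen('\n'));   // int(2)
// Double quotes: "\n" is one line-feed character.
var_dump(strlen("\n"));   // int(1)
// '\\\n' is an escaped backslash (\\) followed by a literal \n: three characters.
var_dump(strlen('\\\n')); // int(3)

// The replacement from this answer turns the three characters \\n into \n.
$input  = 'a\\\nb';       // a \ \ n b
$output = str_replace('\\\n', '\n', $input);
var_dump($output === 'a\nb'); // bool(true): a \ n b
```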
The Literal \n Between Lines
I'm not sure what you are doing to add the literal \n between the lines. We'd need to see your code. But to remove it after the fact, you could try the following:
$sourceCode = str_replace(';\n', ";\n", $sourceCode);
Of course, then you'd likely need to correct other C++ end-of-line sequences. So it is better to not add it in the first place.

php function for csv conversion w/ commas and other formatting characters

I am downloading my data from MySQL to .csv format. I am having no problem using mysql_real_escape_string(), but this function removes any commas or formatting characters that exist in my data. So the .csv structure is maintained, but my grammatical characters (such as commas) are, as expected, removed.
mysql_real_escape_string doesn't REMOVE data. It simply makes a string safe to insert into an SQL query. The standard rule for CSV is to enclose any string containing commas in double quotes, so
This is my comma, containing string
becomes
"This is my comma, containing string"
in the CSV output. And any fields containing double-quotes should have the quotes doubled:
This is my "little" friend
becomes
"This is my ""little"" friend"
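PHP's fputcsv() applies these rules for you. A sketch writing the two example strings, with php://memory standing in for the real output file and PHP's default delimiter, enclosure and escape passed explicitly:

```php
<?php
$h = fopen('php://memory', 'w+');
fputcsv($h, ['This is my comma, containing string'], ',', '"', '\\');
fputcsv($h, ['This is my "little" friend'], ',', '"', '\\');
rewind($h);
$csv = stream_get_contents($h);
fclose($h);

echo $csv;
// "This is my comma, containing string"
// "This is my ""little"" friend"
```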
Enclosing each field with double quotes helps.
A function to convert an array to CSV:
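function arr2csv($twoDarray) {
    foreach ($twoDarray as $k => $v) {
        // Double any embedded quotes so they survive inside an enclosed field
        $v = array_map(function ($field) { return str_replace('"', '""', $field); }, $v);
        // Enclose every field in quotes and join the fields with ","
        $row = implode('","', $v);
        echo '"' . $row . '"' . "\r\n"; // CRLF line ending (CR before LF)
    }
}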
I solved this by wrapping each field in quotes, then doubling any quotes inside it, so that commas and quotes in the data keep their formatting:
...
$csv_output .= "\"" . eregi_replace("\"", "\"\"", stripslashes($rowr[$j])) . "\",";
...
You'll note that I strangely applied stripslashes(). Unfortunately, the script I am working on only works in PHP 4, where slashes are added by the default .ini settings (magic quotes), so I just strip them out.
I'll also probably replace eregi_replace() with str_replace(), as eregi_replace() is deprecated as of PHP 5.3 and was removed in PHP 7.
Anyhow. The above solution works to remove commas and slashes and maintains them where

PHP exec() and spaces in paths

I'm executing the following in a PHP application:
$source = '/home/user/file.ext';
$output_dir = $this->setOutputString();
chdir('/home/ben/xc/phplib/bgwatcher-2011a/a01/');
exec("php bin/createjob.php $source $output_dir", $output);
return $output[0];
The problem is this: I have control over $source, but not $output_dir, which is a legacy Windows filesystem, and there are spaces in the path. An example $output_dir is:
/home/vol1/district id/store id/this_is_the_file.html
When inserting the output string into the exec() function, I have tried both:
addslashes($output_dir) and '"' . $output_dir . '"' to escape the entire output string. In the first case, the path gets concatenated to:
/home/vol1/districtthis_is_the_file.html
... where everything between the first space and the filename gets dropped. In the second case, exec() appears to throw a shoe and doesn't execute properly - unfortunately, the error message is getting lost in the machinery - I can provide it if it's absolutely necessary, but I'm also under time constraints to find a solution.
What's the solution, here? Do I sprintf() the entire string for exec()? I'm very confused as to why addslashes isn't working correctly to escape the spaces, and I assume it has something to do with sanitization with exec(), but I can't find any documentation to back it up.
Update: I've tried escapeshellarg() and preg_replace() without success. Thinking about this further, do I need to double-escape the path? Or escape the path and the command? If the path is being unescaped once by exec(), and once by PHP before it executes the command, does it stand to reason that I need to account for both escapes? Or is that not how it works?
I don't believe addslashes() does anything with spaces. escapeshellarg() might be what you want instead. Docs on escapeshellarg
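A sketch of that suggestion applied to the command from the question, assuming a Unix-like host (on Windows, escapeshellarg() wraps the value in double quotes instead). The exec() call is left commented out so the snippet only builds and prints the command.

```php
<?php
$source     = '/home/user/file.ext';
$output_dir = '/home/vol1/district id/store id/this_is_the_file.html';

// escapeshellarg() wraps each value in single quotes on Unix-like systems,
// so the spaces stay inside a single argument.
$cmd = 'php bin/createjob.php ' . escapeshellarg($source) . ' ' . escapeshellarg($output_dir);
echo $cmd, "\n";

// exec($cmd, $output); // run from the same working directory as in the question
```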
From the PHP docs:
Returns a string with backslashes before characters that need to be quoted in database queries etc. These characters are single quote ('), double quote ("), backslash (\) and NUL (the NULL byte).
This won't do anything to the spaces. What you will need to do is use str_replace() to add slashes, like this:
$new_string = str_replace(" ", "\\ ", $old_string);
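A short sketch comparing addslashes() with this str_replace() approach on the example path from the question. Backslash-escaped spaces are only understood by a POSIX-style shell, so this assumes a Unix-like host.

```php
<?php
$path = '/home/vol1/district id/store id/this_is_the_file.html';

// addslashes() only escapes ', ", \ and NUL, so the spaces pass through as-is.
var_dump(addslashes($path) === $path); // bool(true)

// Backslash-escape every space instead, as suggested above.
$escaped = str_replace(' ', '\\ ', $path);
echo $escaped, "\n"; // /home/vol1/district\ id/store\ id/this_is_the_file.html
```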
According to the PHP docs,
Returns a string with backslashes before characters that need to be quoted in database queries etc. These characters are single quote ('), double quote ("), backslash (\) and NUL (the NULL byte).
Looks like you'll have to preg_replace the spaces yourself.
Edit:
Even though this is the topic of another discussion, if performance is an issue, then after looking into it a little more, it seems that str_replace is actually quite a bit faster than preg_replace:
The test labeled "str_replace()" was the faster by 0.9053 seconds (it took 10.3% the time.)
The first test took 1.0093 seconds. (preg_replace)
The second test took 0.104 seconds. (str_replace)
Benchmark found here.
I've used exec() with paths with spaces before, on both Windows and Linux hosts, and in both cases quoting the path worked perfectly for me.
That said, if you have no control over the safety of a shell argument, always run it through escapeshellarg() first!
You can very well use shell quotes, since that is what all exec commands run through:
exec("php bin/createjob.php '$source' '$output_dir'", $output);
By the way, it works not just for arguments, but also for the command itself:
exec('"/bin/echo" "one parameter"');
Use escapeshellcmd() anyway.
This works for me when using exec() with soffice (LibreOffice):
$file_name = "Some, file name.xlsx";
exec('/usr/bin/soffice --headless --convert-to pdf '."'".$file_name."'".' 2>&1', $output, $r);
You can use double quotes and the escape character together to work around this:
$fileName = "filename with spaces.pdf";
exec("php bin/createjob.php >\"".$fileName."\" 2> error.log" , $output, $return);

Use fgetcsv with and without quotation marks around entries

Edit: is there an alternative to fgetcsv?
The code below processes CSV files where each entry is enclosed in quotes and separated by commas, e.g. "Name","Last"... The problem I'm having is that sometimes the CSV files do not have quotes around each entry and just use a comma to separate them, e.g. Name,Last. How can I handle both types?
$uploadcsv = "/temp/files/Load15.csv";
$handle = fopen($uploadcsv, 'r');
$column_headers = array();
$row_count = 0;
while (($data = fgetcsv($handle, 100000, ",")) !== FALSE) {
if ($row_count==0){
$column_headers = $data;
} else {
print_r($data);
}
++$row_count;
}
this csv works:
"Name","Last"
"Mike","Aidens"
"Mike1","Aidens1"
this csv does not work:
Name,Last
Mike,Aidens
Mike1,Aidens1
Edit: Strange error... I tried a small snippet from the CSV file with no quotation marks and it worked. Oddly, when I then tried a large piece and then the entire CSV content (all pasted into a new test.csv file), it worked too. Both files are exactly the same size, 17,151 KB, yet the original CSV file will not process. There are no trailing spaces or lines at the end.
Set the 4th parameter to an empty string; it sets the enclosure, which defaults to ".
fgetcsv($handle, 100000, ",", '');
Use this line of code before calling fgetcsv (note that the auto_detect_line_endings setting is deprecated as of PHP 8.1):
ini_set('auto_detect_line_endings',TRUE);
As far as I am aware fgetcsv should work fine with or without quotes around the data.
Unless the CSV file is malformed, this will "just work".
In other words, you don't need to worry about whether or not every field has quotes around it; fgetcsv will take care of this for you.
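A sketch that checks this claim by feeding both sample files from the question through fgetcsv() and comparing the parsed rows. The helper parseCsv() is hypothetical, and php://memory stands in for the uploaded file.

```php
<?php
// Hypothetical helper: parse a whole CSV string with fgetcsv(), row by row.
function parseCsv(string $csv): array
{
    $h = fopen('php://memory', 'w+');
    fwrite($h, $csv);
    rewind($h);
    $rows = [];
    while (($data = fgetcsv($h, 100000, ',', '"', '\\')) !== false) {
        $rows[] = $data;
    }
    fclose($h);
    return $rows;
}

$quoted   = "\"Name\",\"Last\"\n\"Mike\",\"Aidens\"\n\"Mike1\",\"Aidens1\"\n";
$unquoted = "Name,Last\nMike,Aidens\nMike1,Aidens1\n";

var_dump(parseCsv($quoted) === parseCsv($unquoted)); // bool(true)
```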
Had the same problem, it couldn't read Hebrew (utf-8) letters without double quotes. It ran fine on the command line (could read Hebrew without double quotes), but in Apache it read only the header which had double quotes and returned empty strings instead of Hebrew strings in the rest of the lines which did not have double quotes at all.
Checked the locale in Apache and it returned the letter "C", but in the command line it returned "LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=C;LC_COLLATE=C;LC_MONETARY=C;LC_MESSAGES=C;LC_PAPER=C;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=C;LC_IDENTIFICATION=C"
Thus I've added the following line before the fgetcsv command:
setlocale(LC_CTYPE, 'en_US.UTF-8');
And it worked, and read Hebrew letters without double quotes successfully.
