PHP 5.3 fgetcsv \" - php

I'm importing a CSV from DB2 into MySQL. All goes well until, half a million rows in, I encounter \" in a column with encrypted data.
Here is an example:
100,"foo","bar","µ┬;¬µ┬;→ºµ┬;Öì\"
101,"foo","bar","$⌠ù¶∙$∙µ┬µ┬;→ºµ┬;Öì"
When fgetcsv parses this, it treats the \" as an escaped double quote and includes the next line as if it were part of that field.
I see a few bug reports and in PHP 5.3 they added an escape param for fgetcsv.
What does DB2 use as an escape? Just "?

From the comments on the fgetcsv manual page it looks like this is a fairly common issue with no real good workaround. There are however some alternate functions which people have been kind enough to post on the page which might do what you need.
Here is a link to one of them: http://us3.php.net/manual/en/function.fgetcsv.php#98800
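As a rough sketch of how the escape parameter mentioned in the question can be used: on PHP 7.4 and later (an assumption; PHP 5.3 itself does not accept an empty escape string), passing an empty string as the fifth argument of fgetcsv turns off the backslash escape, so a field ending in \ no longer swallows the closing quote. The sample data and the readRows helper below are hypothetical.

```php
<?php
// Two sample rows; the last field of the first row ends with a literal backslash.
$csv = <<<'CSV'
100,"foo","bar","abc\"
101,"foo","bar","def"
CSV;

/**
 * Hypothetical helper: parse a CSV string with a given escape character.
 * An empty $escape disables fgetcsv's escape mechanism (PHP 7.4+).
 */
function readRows(string $csv, string $escape): array
{
    $handle = fopen('php://memory', 'r+');
    fwrite($handle, $csv);
    rewind($handle);

    $rows = [];
    while (($row = fgetcsv($handle, 0, ',', '"', $escape)) !== false) {
        $rows[] = $row;
    }
    fclose($handle);

    return $rows;
}

// With the escape mechanism disabled, each line is parsed as its own row.
$rows = readRows($csv, '');
print_r($rows);
```

Whether this matches what DB2 exports depends on how DB2 escapes quotes, which the question leaves open.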

Related

mysql/php import csv into table with enclosures inside delimiter

I have come across a weird issue when importing a CSV into MySQL, either through SQL or through PHP with data manipulation.
I have a CSV from a third party (which I have no control over and cannot change) that is delimited by commas and uses double quotes as enclosures. Simple enough. However, some of the cells contain data such as:
"first" value, secondvalue, thirdvalue, "fourth, value"
Now when I import this into SQL, the first value is split because of the enclosure. How can I get it to ignore such cells and just insert them as first value, while still honouring the enclosures so they work on "fourth, value"?
Is there a regex that I could run on each line as I import it into the table (I don't mind importing lines one by one by reading them through PHP and then using INSERT), or is there functionality in SQL to allow this?
I have tried the following statement, but it does not work:
load data local infile '../htdocs/invoice/upload/importthis.csv'
into table items_raw
fields terminated by ','
enclosed by '"' lines terminated by '\n'
(date, clid_nu, clid, dnid, dcontext_nu, channel_nu,
dstchannel_nu, lastapp_nu, lastdata_nu, duration, billsec_nu, disposition_nu,
amaflags_nu, accountcode_nu, uniqueid_nu, userfield_nu)
I have also tried using OPTIONALLY ENCLOSED BY '"', but this does not work either.
I have also tried using fgetcsv, but I am getting the same results from it.
Any ideas?
EDIT
So the regex "((.*),(.*))" seems to match the fourth value but not the first value. Is this the best way to go, or am I overcomplicating this?
This looks like malformed CSV to me. This line should be:
"""first"" value", secondvalue, thirdvalue, "fourth, value"
where " is commonly used as an escape character.
The problem with using regexps on CSV input is that CSV is not a regular language.
Try using fgetcsv, see if that function has the same behavior as your SQL importer. Count the number of items it finds on each row. You might be able to catch all the anomalies that way.
Is it good enough to detect anomalies or do you also want to fix them automatically? - for instance if the number of anomalies is very high.
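A minimal sketch of the field-counting idea, assuming the expected number of columns is known in advance. The function name findAnomalousRows and the sample data are made up for illustration.

```php
<?php
/**
 * Hypothetical helper: return [row number => field count] for every row
 * whose field count differs from $expectedFields.
 */
function findAnomalousRows(string $csv, int $expectedFields): array
{
    $handle = fopen('php://memory', 'r+');
    fwrite($handle, $csv);
    rewind($handle);

    $anomalies = [];
    $rowNo = 0;
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        $rowNo++;
        if ($row === [null]) {
            continue; // fgetcsv reports a blank line as an array holding one null
        }
        if (count($row) !== $expectedFields) {
            $anomalies[$rowNo] = count($row);
        }
    }
    fclose($handle);

    return $anomalies;
}

// Row 3 has an unquoted extra comma, so it yields four fields instead of three.
$sample = "a,b,c\nd,\"e,f\",g\nh,i,j,k\n";
print_r(findAnomalousRows($sample, 3));
```

For a real import the same loop could read the file directly with fopen on its path and log the offending rows for manual review.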
Alternatively you could write your own CSV parser that can read this, and convert the file to proper CSV.
Writing a CSV parser is actually not that hard. I can give an outline if you want.
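One possible outline of such a parser, as a sketch only. It assumes a leniency rule that is not stated in the thread: a double quote opens an enclosure only at the start of a field, and anything that follows the closing quote up to the next comma is appended to the field as-is. The function name parseLenientCsvLine is hypothetical.

```php
<?php
/**
 * Hypothetical lenient parser for a single comma-delimited line.
 * Doubled quotes ("") inside an enclosure produce one literal quote.
 */
function parseLenientCsvLine(string $line): array
{
    $line = rtrim($line, "\r\n");
    $fields = [];
    $field = '';
    $inQuotes = false;
    $atFieldStart = true;
    $len = strlen($line);

    for ($i = 0; $i < $len; $i++) {
        $ch = $line[$i];
        if ($inQuotes) {
            if ($ch !== '"') {
                $field .= $ch;
            } elseif ($i + 1 < $len && $line[$i + 1] === '"') {
                $field .= '"'; // doubled quote: keep one and skip the other
                $i++;
            } else {
                $inQuotes = false; // closing quote
            }
        } elseif ($ch === ',') {
            $fields[] = $field;
            $field = '';
            $atFieldStart = true;
        } elseif ($atFieldStart && ($ch === ' ' || $ch === "\t")) {
            // ignore whitespace before the field content starts
        } elseif ($atFieldStart && $ch === '"') {
            $inQuotes = true;
            $atFieldStart = false;
        } else {
            $field .= $ch; // includes quotes that appear mid-field
            $atFieldStart = false;
        }
    }
    $fields[] = $field;

    return $fields;
}

print_r(parseLenientCsvLine('"first" value, secondvalue, thirdvalue, "fourth, value"'));
```

The parsed fields can then be written back out with fputcsv to produce a well-formed file for LOAD DATA. This sketch does not handle enclosed fields that span several lines.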

PHPExcel - column ranges stripped from formula

I'm trying to insert a "COUNTIFS" formula and I know it's not a supported function, but I have the calculation setting turned off on the writer. It's not throwing any errors. However,
$active_sheet->setCellValue('C3', "=COUNTIFS('INVENTORY'!$H:$H,1,'INVENTORY'!$C:$C,\"ADMIN\",'INVENTORY'!$N:$N,1)");
is getting written to the file as:
COUNTIFS('INVENTORY'!:,1,'INVENTORY'!:,"ADMIN",'INVENTORY'!:,1)
I read on another page somewhere that referencing columns like this wasn't supported either, but I also tried putting them in like "$C2:$C3000" and it didn't help.
The issue would be that you're using double quotes around your second parameter. PHP is trying to replace $H, $C and $N with actual variables.
Try using single quotes and escaping the existing single quotes within the string.
Here are the docs on how PHP parses double quoted strings which might help.
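A small illustration of the difference, using a shortened formula fragment. The variable $H is defined here only to make the substitution visible; in the question it was undefined, so PHP substituted an empty string.

```php
<?php
// Hypothetical variable, defined only so the interpolation is easy to see.
$H = 'X';

// Double quotes: PHP replaces $H with the variable's value.
$interpolated = "'INVENTORY'!$H:$H";

// Single quotes: no interpolation, but the inner single quotes need escaping.
$single = '\'INVENTORY\'!$H:$H';

// Double quotes with escaped dollar signs: also no interpolation.
$escaped = "'INVENTORY'!\$H:\$H";

echo $interpolated, "\n"; // 'INVENTORY'!X:X
echo $single, "\n";       // 'INVENTORY'!$H:$H
echo $escaped, "\n";      // 'INVENTORY'!$H:$H
```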
So the solution was to remove or escape the dollar signs ($). I removed them and it worked fine, and then I escaped them and it also worked!!
Resulting example line:
$active_sheet->setCellValue('C3', "=COUNTIFS('INVENTORY'!\$H:\$H,1,'INVENTORY'!\$C:\$C,\"ADMIN\",'INVENTORY'!\$N:\$N,1)");
edit: wow, noob mistake. Credit goes to Ian Belcher.
Column and row ranges aren't supported by PHPExcel at this point: you need to supply an actual range (e.g. 'INVENTORY'!$H1:$H500 rather than simply 'INVENTORY'!$H:$H).
There are also a couple of known issues when inserting or deleting new rows or columns into a worksheet when it doesn't correctly adjust an existing formula to reflect that change. Those issues have been fixed in the latest develop branch code on github, so you may want to try again using that.

The holy grail of cleaning input and output in php?

I've been developing and running a classifieds site for the last 8 months, and all the bugs have come down to one cause: how the users input their text...
My question is: Is there a php class, a plugin, something that I can do
$str = UltimateClean($str) before sending $str to my sql??
PS. I also noticed the problems doubled when I started using JSON, because I also have to be careful when outputting the result as JSON.
Some issues I faced: multi-language strings (different charsets), copy-paste from Excel sheets.
Note: I am not worried about SQL injection.
No, there isn't.
Different modes of escaping are for different purposes. You cannot universally escape something.
For Databases: Use PDO with prepared queries
For HTML: Use htmlspecialchars()
For JSON: json_encode() handles this for you
For character sets: You should be using UTF-8 on your page. Do this, and set your databases accordingly, and watch those issues disappear.
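To illustrate the point that each output context needs its own escaping, here is a sketch that sends one raw string to two of the contexts listed above. The sample input is made up, and the database case is left out because it needs a live connection.

```php
<?php
// Raw user input, stored and passed around unmodified (UTF-8).
$input = "O'Reilly <b>\"5 inch\"</b> & caf\u{00e9}";

// HTML context: escape the markup-significant characters at output time.
$html = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// JSON context: json_encode produces a valid JSON document on its own.
$json = json_encode(['title' => $input], JSON_UNESCAPED_UNICODE);

echo $html, "\n";
echo $json, "\n";

// Decoding the JSON gives back exactly the raw input.
var_dump(json_decode($json, true)['title'] === $input);
```

The same raw string ends up in two different encoded forms, which is why a single UltimateClean() applied before storage cannot serve both.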

Displaying Code Snippets Properly Without Escape Characters

I have a PHP script that stores my code snippets.
To insert, I use:
$snippet_code = mysqli_real_escape_string($conn,trim($_POST['snippet_code']));
To display, I use the following which is wrapped in a pre tag:
$snippet_code = htmlentities($row['SnippetText']);
I notice that sometimes I get a lot of escape characters like \\\\ when the snippet is displayed on the page. The escape characters are present wherever single or double quotes appear in the code. The problem seems to be more severe in non-English language browsers.
How can I properly do this? How can I properly store and display code on a page?
Assuming you mean slash escape sequences like \", and not HTML escape sequences like &amp;, try this:
$snippet_code = htmlentities(stripslashes($row['SnippetText']));
If it is actually HTML escapes causing you trouble, just omit the htmlentities call.
If you are getting ' converted to \', your server is probably configured with a legacy option called Magic Quotes. You can read about it in the PHP manual. My advice is to disable them if possible.
Also, check your database. It's possible that your current data is corrupted. If so, you can write a small script that uses stripslashes() to fix it.
From your comments, it seems that you are in fact talking about slashes found before quotes.
It's not clear from the limited information you've given us why non-English browsers would show more of these.
However, it is likely that these slashes should not be present in the first place. Perhaps you are running mysql_real_escape_string several times, instead of just once... but, again, nothing you've shown us indicates that.
Either way, you should fix the data in the database and not just hack around the issue on display.
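A sketch of how escaping more than once produces the stray backslashes. It uses addslashes() as a stand-in for Magic Quotes or a repeated escaping call, which is an assumption made so the example runs without a database connection.

```php
<?php
$original = 'echo "hi";';

// First round of escaping: each quote gains one backslash.
$once = addslashes($original);

// Second round: the backslashes themselves get escaped as well.
$twice = addslashes($once);

echo $original, "\n"; // echo "hi";
echo $once, "\n";     // echo \"hi\";
echo $twice, "\n";    // echo \\\"hi\\\";

// Each stripslashes() call undoes exactly one round.
var_dump(stripslashes($twice) === $once);
var_dump(stripslashes($once) === $original);
```

If the stored rows look like the second or third line, the data went through one or two extra rounds of escaping before it reached the database.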

why can't php just convert quotes to html entities for mysql?

PHP uses "magic quotes" by default but has gotten a lot of flak for it. I understand it will be disabled in the next major version of PHP.
While the arguments against it make sense, what I don't get is why not just use HTML entities to represent quotes instead of adding and stripping slashes? After all, a VAST majority of MySQL is used for outputting to web browsers?
For example, &#039; is used instead of ' and it won't affect the database at all.
Another question, why can't PHP just have configurations set up for each version of PHP with this tag <?php4 or <?php5 so appropriate interpreters can be loaded for those versions?
Just curious. :)
Putting &#039; into a string column in a database would be fine, if all you use database content for is outputting to a web page. But that's not true.
It's better to escape output at the time you output it. That's the only time you know for sure that the output is going to a web page -- not a log file, an email, or other destination.
PS: PHP already turns magic quotes off by default in the standard php.ini file. It's deprecated in PHP 5.3, and it was removed from the language entirely in PHP 5.4.
Here's a good reason, mostly in response to your own posted answer: Using htmlspecialchars() or htmlentities() does not make your SQL query safe. That's what mysql_real_escape_string() is for.
You seem to be making the assumption that it's only the single and double quote characters that pose a problem. MySQL queries are actually vulnerable to the \x00, \n, \r, \, ', " and \x1a characters in your data. If you are not using prepared statements or mysql_real_escape_string(), then you have an SQL injection vulnerability.
htmlspecialchars() and htmlentities() do not convert all of these characters, ergo you cannot make your query safe by using these functions. To that end, addslashes() does not make your query safe either!
Other smaller downsides include what the other posters have already mentioned about MySQL not always being used for web content, as well as the fact that you are increasing the amount of storage and index space needed for your data (consider one byte of storage for a quote character, versus six or more bytes of storage for its entity form).
I will reply to your first question only.
Validation of input is a wrong approach anyway, because it's not input that matters, the problem is where it's used. PHP can't assume that all input to a MySQL query would be output to a context where a HTML Entity would make sense.
It's nice to see that magic_quotes is going; it's the cause of a lot of security issues with PHP, and it's nice to see them taking a new approach :)
You'll do yourself a big favour if you reframe your validation approaches to validate on OUTPUT, for the context you are working in. Only you, as the programmer, can know this.
The reason that MySQL doesn't convert ' to &#039; is because &#039; is not '. If you want to convert your data for output, you should be doing that at the view layer, not in your database. It's really not very hard to just call htmlentities before/when you echo.
Thanks everyone. I had to REALLY think what you meant and the implications it may have if I change the quotes to HTML entities instead of adding slashes to them but again, isn't that actually changing the output/input too?
I cannot think of a reason why we CANNOT or SHOULDN'T use HTML entities for mySQL as long as we make it clear that all data is encoded using HTML entities. After all, my argument is based on a fact that the majority of mySQL is used for outputting to HTML browsers and also the fact that ' and " and / can seriously harm mySQL databases. So, isn't it actually SAFER to encode ' and " and / as HTML entities before sending them as INSERT queries? Also, we're going XML so why waste time writing htmlentities and stripslashes and addslashes when accessing data that's ALREADY encoded in HTML entities?
You can't just convert ' to &#039;. Think about it: what happens when you want to store the string "&#039;"? If you store &#039; then when you load the page it will display ' and not &#039;.
So now you have to convert ALL HTML entities, not just quotes. Then you start getting into all sorts of weird conversion problems. The simplest solution is to just store the real data in the database, then you can display it how you like. You might want to use the actual quotes - in most cases " and ' don't do any harm outside of the tag brackets.
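The ambiguity can be shown in a few lines. The quotesToEntities function below is a hypothetical version of the "convert only quotes before storing" idea, not something PHP provides.

```php
<?php
/** Hypothetical "encode quotes before storing" scheme from the question. */
function quotesToEntities(string $s): string
{
    return str_replace(["'", '"'], ['&#039;', '&quot;'], $s);
}

$quote   = "'";       // the user typed a single quote
$literal = '&#039;';  // the user typed the six characters &#039;

// Both inputs are stored as the same value, so the original is lost.
var_dump(quotesToEntities($quote) === quotesToEntities($literal));

// Storing raw data and escaping on output keeps them distinct.
echo htmlspecialchars($quote, ENT_QUOTES, 'UTF-8'), "\n";   // &#039;
echo htmlspecialchars($literal, ENT_QUOTES, 'UTF-8'), "\n"; // &amp;#039;
```

In a browser the last two lines render as ' and &#039; respectively, which is what each user actually typed.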
Sometimes you may want to store actual HTML in a field and display it raw (as long as it's checked and sanitized on its way in/out).
