Line break issue from CSV to MySQL - php

I am importing a .csv file into MySQL and everything works fine, except the line breaks that are in the file.
One of my .csv rows looks like this:
42,E-A-R™ Classic™ Earplugs,ear,images/ear/classic.jpg,5%,"Proven size, shape, and foam
3M's most popular earplug
Corded and uncorded in a variety of individual packs
NRR 29 dB / CSA Class AL",312-1201,,"E-A-R™ Classic™ Uncorded Earplugs, in Poly Bag",310-1001,,E-A-R™ Classic™ Uncorded Earplugs in Pillow Pack,311-1101,,"E-A-R™ Classic™ Corded Earplugs, in Poly Bag"
The sixth field should keep its line breaks when it is output, but it doesn't. When importing the .csv I select Lines terminated by \r. I have tried \n and auto, but no luck.
The weird thing is that the field looks correct in the database, with all of the appropriate breaks. If I manually insert the line breaks in phpMyAdmin, it prints correctly. Each field is set to UTF-8 as well.
Any ideas on this? Thanks.
edit: here is the MySQL statement
LOAD DATA LOCAL INFILE '/tmp/php89FC0F' REPLACE INTO TABLE `ohes_flyer_products`
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
ESCAPED BY '\\'
LINES TERMINATED BY '\r'

LOAD DATA LOCAL INFILE '/tmp/php89FC0F' REPLACE INTO TABLE `ohes_flyer_products`
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
ESCAPED BY '\\'
LINES TERMINATED BY '\r\n'

Maybe you could use fgetcsv() to parse each CSV record into an array and then insert that array into the database?
Something along the lines of:
$fd = fopen($csvfile, "r");
while (($line = fgetcsv($fd)) !== false)
{
    // fgetcsv() keeps line breaks inside quoted fields as part of the field value
    $sql = sprintf("INSERT INTO tablename (...) VALUES ('%s', ...)", $line[0], ...);
    $res = mysql_query($sql);
}
fclose($fd);
Note 1: this code is not ready for production; as written it is open to SQL injection.
Note 2: please use prepared statements (mysqli or PDO; the mysql_* functions were removed in PHP 7), as they will speed things up a lot (or build one multi-row INSERT statement).
Note 3: wrap everything in a transaction.
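Notes 2 and 3 can be combined into a rough sketch. The helper name buildBulkInsert, the table name products, and the column names id and description are made up for illustration, and the sample data is a shortened version of the row from the question. The final step assumes an existing PDO connection and is only shown as a comment.

```php
<?php
// Hypothetical helper: builds one multi-row INSERT with "?" placeholders.
// Returns [sql, flat parameter list] for use with a prepared statement.
function buildBulkInsert(string $table, array $columns, array $rows): array
{
    $rowPlaceholder = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';
    $sql = sprintf(
        'INSERT INTO `%s` (`%s`) VALUES %s',
        $table,
        implode('`, `', $columns),
        implode(', ', array_fill(0, count($rows), $rowPlaceholder))
    );
    return [$sql, array_merge(...$rows)];
}

// An in-memory stream stands in for the uploaded CSV file.
$fd = fopen('php://memory', 'w+');
fwrite($fd, "42,\"Proven size, shape, and foam\n3M's most popular earplug\"\n43,plain text\n");
rewind($fd);

$rows = [];
// Delimiter, enclosure and escape character are passed explicitly.
while (($line = fgetcsv($fd, 0, ',', '"', '\\')) !== false) {
    $rows[] = $line; // quoted fields keep their embedded line breaks
}
fclose($fd);

[$sql, $params] = buildBulkInsert('products', ['id', 'description'], $rows);
echo $sql, PHP_EOL;

// With an existing PDO connection in $pdo (assumed, not created here):
// $pdo->beginTransaction();
// $pdo->prepare($sql)->execute($params);
// $pdo->commit();
```

Because the values travel as bound parameters, the apostrophe in "3M's" and the line break inside the description need no manual escaping.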

Your CSV file has some qualities that you might be able to exploit.
The fields containing carriage returns that do not terminate the record are enclosed in quotation marks.
The carriage return denoting the end of a record follows a field enclosed in quotation marks. If this is true for all records, it is a possible way to distinguish mid-field carriage returns from record terminators.
Knowing this, here are some things you can try:
Using a program like UltraEdit (or Notepad++) and its find/replace features (that include regular expression handling):
Find all carriage returns that are preceded by a quotation mark and replace them with a unique character or string. I suggest the pipe character "|", but first ensure it isn't used anywhere in the CSV file. These will represent end-of-record.
Next, replace all remaining carriage returns with spaces. This will bring your fields with unwanted carriage returns back into alignment with the other data.
Finally, replace all special end-of-record characters with carriage returns. The end result is that the only carriage returns present are end-of-record indicators.
Given that the carriage returns appear within a field that is enclosed by a delimiter (the quotation marks), you can specify that the import engine should only honor field and record delimiters outside of quotations. (MySQL LOAD DATA INFILE syntax) Specifically, look at the ENCLOSED BY 'char' parameter. Since not all of your fields use the delimiter, you will need to specify OPTIONALLY. In theory you should be able to specify how the CSV file is constructed and not need to parse it beforehand. I am of the opinion, however, that the in-field carriage returns should probably be removed so that the text will wrap properly when output in a new context.
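The three find/replace steps from the first option can also be done in PHP rather than in an editor. This is only a sketch: the function name flattenMidFieldBreaks is made up, it assumes every record ends with a closing quotation mark and that no mid-field line break directly follows a quotation mark, and it restores record terminators as "\n" (swap in "\r" if your import expects that).

```php
<?php
// Hypothetical helper implementing the three find/replace steps.
function flattenMidFieldBreaks(string $csv, string $marker = '|'): string
{
    if (strpos($csv, $marker) !== false) {
        throw new InvalidArgumentException('Marker already occurs in the data');
    }
    // Step 1: a line break directly after a quotation mark ends a record.
    $csv = preg_replace('/(?<=")(\r\n|\r|\n)/', $marker, $csv);
    // Step 2: any remaining line break sits inside a field; use a space instead.
    $csv = preg_replace('/\r\n|\r|\n/', ' ', $csv);
    // Step 3: turn the markers back into record terminators.
    return str_replace($marker, "\n", $csv);
}

$csv = "1,\"line one\nline two\",\"x\"\n2,\"single\",\"y\"\n";
echo flattenMidFieldBreaks($csv);
```

If the marker does occur in the data, the function refuses to run instead of silently corrupting records, which mirrors the "first ensure it isn't used" check in the manual procedure.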

Your CSV appears to be non-standard, but that's often the reality of dealing with customer datasets.
As tools like MySQL's LOAD DATA statement are made to handle only the perfect use case, I've found that dealing with non-standard datasets like this requires code.
One way to handle this is to first scrub your CSV, replacing mid-field line breaks with a special, unique string (like ===MIDFIELD_LINE_BREAK===). Then I would write a custom CSV parser in a scripting language (Python, Ruby, PHP, Perl, etc).
In your CSV parser, iterate through lines in the file. For each line:
Swap the \n or \r characters back in for each ===MIDFIELD_LINE_BREAK=== marker.
Construct and execute an INSERT statement.
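A minimal PHP sketch of such a parser, assuming the scrubbing pass has already happened so that each physical line is exactly one record. The function name parseScrubbedCsv is invented for this example, and the INSERT step is left as a comment because the table layout is not known.

```php
<?php
// Hypothetical parser for a CSV that was scrubbed beforehand: every mid-field
// line break is assumed to have been replaced by $marker already.
function parseScrubbedCsv(string $scrubbed, string $marker = '===MIDFIELD_LINE_BREAK==='): array
{
    $rows = [];
    // After scrubbing, every remaining line break terminates a record.
    foreach (preg_split('/\r\n|\r|\n/', $scrubbed, -1, PREG_SPLIT_NO_EMPTY) as $line) {
        $fields = str_getcsv($line, ',', '"', '\\');
        foreach ($fields as $i => $field) {
            // Swap the real line break back in for the marker.
            $fields[$i] = str_replace($marker, "\n", (string) $field);
        }
        $rows[] = $fields; // here you would construct and execute the INSERT
    }
    return $rows;
}

$scrubbed = "42,\"Proven size===MIDFIELD_LINE_BREAK===NRR 29 dB\",312-1201\n";
$rows = parseScrubbedCsv($scrubbed);
echo json_encode($rows[0][1]), PHP_EOL;
```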

This worked for me:
$query = <<<EOT
LOAD DATA LOCAL INFILE '$file' REPLACE INTO TABLE `$table`
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
ESCAPED BY '\\\'
LINES TERMINATED BY '\\\n'
IGNORE 1 ROWS;
EOT;
I had to tweak @Krunal's answer, because I was getting errors, by adding a few extra backslashes.
Unix line returns used here, by the way.
DOS: \\\r\\\n
Old Mac: \\\r
Unix: \\\n
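The extra backslashes are needed because a PHP heredoc processes backslash escapes like a double-quoted string does, before MySQL ever sees the statement. The sketch below uses doubled backslashes so that MySQL receives '\\' and '\n'; the file path and table name are placeholder values, not taken from the original post.

```php
<?php
// Placeholder values; the real ones come from the upload handling code.
$file  = '/tmp/upload.csv';
$table = 'products';

$query = <<<EOT
LOAD DATA LOCAL INFILE '$file' REPLACE INTO TABLE `$table`
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
ESCAPED BY '\\\\'
LINES TERMINATED BY '\\n'
IGNORE 1 ROWS;
EOT;

// Print the statement exactly as MySQL will receive it.
echo $query, PHP_EOL;
```

Printing the query before running it is a quick way to check how many backslashes survive PHP's own string processing.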

Related

Export to txt file error occurred

I have exported data to a text file using PHP/MYSQL. I wrote into file.txt created using the instruction below:
fwrite($fp ,'' . $value. '' . "\t");
Everything goes right, but a problem appears when a field in the DB contains a ',' character, like this:
Section= Society, Education & Youth
In the generated text file the section value appears split across two columns, which is wrong: the section is a single value and should go into one cell (I see the problem when opening the file in Excel).
So the problem is: how can I tell the output to ignore the ',' in some values so that it isn't treated as a column separator?
A comma-separated values (CSV) file can wrap text in an enclosure character, usually double quotes, for just that case. If your data does not have quotes within the text, then you can give that a try. You can tell Excel what the delimiter is. You can also use a different character (tab, comma, etc.) to delimit the fields, as long as Excel knows how you're delimiting the data.
// try with quotes
fwrite($fp, "\"$value\"\t");
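If the data can contain double quotes after all, one possible extension is to double each embedded quote before wrapping the value, which is the convention Excel reads. The helper name quoteField and the second sample value are made up for this sketch.

```php
<?php
// Hypothetical helper: wraps a value in double quotes and doubles any
// embedded double quote so the field stays intact when opened in Excel.
function quoteField(string $value): string
{
    return '"' . str_replace('"', '""', $value) . '"';
}

$values = ['Society, Education & Youth', 'He said "hi"'];
// Tab-separated line, matching the "\t" separator used in the question.
$line = implode("\t", array_map('quoteField', $values)) . "\n";
echo $line;
```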

What is the proper New Line Character in Outlook Contact Export?

I have a CSV parser, that takes Outlook 2010 Contact Export .CSV file, and produces an array of values.
I break each row on the new line symbol, and each column on the comma. It works fine, until someone puts a new line inside a field (typically Address). This new line, which I assume is "\n" or "\r\n", explodes the row where it shouldn't, and the whole file becomes messed up from there on.
In my case, it happens when Business Street is written in two lines:
123 Apple Dr. Unit A
My code:
$file = file_get_contents("outlook.csv");
$rows = explode("\r\n",$file);
foreach($rows as $row)
{
$columns = explode(",",$row);
// Further manipulation here.
}
I have tried both "\n" and "\r\n", same result.
I figured I could calculate the number of columns in the first row (keys), and then find a way to not allow a new line until this many columns have been parsed, but it feels shady.
Is there another character for the new line that I can try, that would not be inside the data fields themselves?
The most common way of handling newlines in CSV files is to "quote" fields which contain significant characters such as newlines or commas. It may be worth looking into whether your CSV generator does this.
I recommend using PHP's fgetcsv() function, which is intended for this purpose. As you've discovered, splitting strings on commas works only in the most trivial cases.
In cases where that doesn't work, a more sophisticated, reportedly RFC 4180-compliant parser is available here.
I also recommend fgetcsv()
fgetcsv will also take care of commas inside strings ( between quotes ).
Interesting parsing tutorial
+1 to the previous answer ;)
PS: fgetcsv is a bit slower than reading the whole file and using explode() on the contents, but IMO it's worth it.
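As a rough illustration of the fgetcsv() suggestion, the loop from the question could be rewritten along these lines. The function name readCsvWithHeader and the "Name" column are assumptions for the example; an in-memory stream stands in for outlook.csv, and the export is assumed to quote fields that contain commas or line breaks.

```php
<?php
// Hypothetical helper: reads a CSV stream whose first record holds the column names.
function readCsvWithHeader($handle): array
{
    $header = fgetcsv($handle, 0, ',', '"', '\\');
    $rows = [];
    while (($fields = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if ($fields === [null]) {
            continue; // fgetcsv() reports a blank line as a single null field
        }
        $rows[] = array_combine($header, $fields);
    }
    return $rows;
}

// In-memory stand-in for the exported file, with a two-line street address.
$h = fopen('php://memory', 'w+');
fwrite($h, "Name,Business Street\r\n\"Doe, Jane\",\"123 Apple Dr.\r\nUnit A\"\r\n");
rewind($h);
$contacts = readCsvWithHeader($h);
fclose($h);

echo count($contacts), PHP_EOL;
```

The line break inside the quoted address stays part of the field, so the record is no longer split in two.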

How do I handle NULL values in a mysql SELECT ... OUTFILE statement in conjunction with FIELDS ESCAPED BY? NULL values are currently being truncated

I'm encountering some difficulties using MySQL's SELECT ... OUTFILE on result sets that include both null values and columns that require double quote escaping (ie, columns that contain '"' characters). This is the outfile syntax I am using:
INTO OUTFILE '$csv_file'
FIELDS ESCAPED BY '""' TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\r\n'
My problem is concerning the FIELDS ESCAPED BY portion of the query - if this portion is omitted, then null values will export properly (...,"\N",... is what it looks like in the csv).
However, columns that contain double quotes will get split across multiple lines/columns in Excel. This is because Excel requires '"' characters inside columns to be escaped by writing them as '""'.
Including the FIELDS ESCAPED BY clause fixes the Excel problem with columns containing double quote characters; however, it breaks NULL columns. NULL columns get exported as ( ..."N,... ), missing both the backslash and the trailing quotation mark on the column. In Excel, this causes multiple columns to collapse into each other due to the lack of a closing quotation mark.
My goal is to be able to export columns that contain double quotes and newlines, as well as export NULL columns as \N; however, I can't seem to figure out how to do it. The MySQL docs state that FIELDS ESCAPED BY affects how NULL columns are output, but I can't figure out how an escape sequence of '""' results in dropping the backslash and the trailing quote on a NULL column.
Currently, my solution is to perform a string replace on each line as I output it to the user, by using FIELDS ESCAPED BY and replacing '"N,' with '"\N",'. This seems to work, but it doesn't feel right, and I'm afraid of it causing some sort of issue down the line.
IFNULL( ) on the select columns is potentially an option, but the way we are using this in our code makes it quite difficult to implement. It also needs to be done for each column that could potentially have NULL values, so it's a solution I'd like to avoid if I can.
Thanks!
I was able to successfully save MySQL query results as CSV and import them into Excel as follows:
Use the form...
IFNULL(ColumnA, "" ) AS "Column A",
...for each column or expression in your SELECT statement that can possibly return a NULL (\N). This will ensure NULL values in your CSV file appear as properly quoted empty strings rather than improperly quoted \N's. Instead of an empty string, you could possibly specify a value to represent a NULL, e.g...
IFNULL(ColumnA, "~NULL~" ) AS "Column A",
Use the following OUTFILE options:
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
ESCAPED BY '"'
LINES TERMINATED BY '\r\n'
Note that ESCAPED BY specifies one double quote, as does ENCLOSED BY. I haven't tested whether OPTIONALLY ENCLOSED BY will be successful, so I just leave OPTIONALLY out.
Using a double quote to escape another double quote within a quoted field value is required per the CSV specification: RFC 4180, section 2, rule 7.
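Since the question mentions that adding IFNULL() to every column by hand is awkward, one possible way around that is to generate the SELECT list in PHP. This is a sketch only: the function name nullSafeSelectList and the table name my_table are invented, the column names are assumed to come from trusted code rather than user input, and the placeholder is assumed not to contain backslashes.

```php
<?php
// Hypothetical helper: wraps every column in IFNULL() so the SELECT list
// does not have to be edited by hand.
function nullSafeSelectList(array $columns, string $placeholder = ''): string
{
    // Quote the placeholder as a SQL string literal (single quotes doubled).
    $quotedPlaceholder = "'" . str_replace("'", "''", $placeholder) . "'";
    $parts = [];
    foreach ($columns as $column) {
        // Backtick-quote the identifier, doubling any embedded backtick.
        $name = '`' . str_replace('`', '``', $column) . '`';
        $parts[] = "IFNULL($name, $quotedPlaceholder) AS $name";
    }
    return implode(', ', $parts);
}

echo 'SELECT ', nullSafeSelectList(['ColumnA', 'ColumnB'], '~NULL~'), ' FROM my_table', PHP_EOL;
```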
Try using the COALESCE function to convert any column that can be NULL to "".
http://dev.mysql.com/doc/refman/5.0/en/comparison-operators.html#function_coalesce

Creating carriage returns in csv cell via php

I'm trying to dynamically generate a CSV file in which some cells contain multiple lines; the address, for example, needs to be grouped into a single "address" cell instead of separate address, city, state, etc. cells. All is going well, but for the last two days I've tried inserting \r, \r\n, \n, chr(10), and chr(13), as well as a literal carriage return in the code, to create the line break I'm looking for within the cell. All of these fail: either they are printed literally in my CSV as "\r" etc., or, when I put a manual carriage return in the code, it generates a new row. I'm using this to create the breaks in my cells, but it isn't working:
$groupedCell = implode('\r',$data);
I'm pretty sure the code is correct, as it's placing \r where I would like a carriage return, just not the actual return I'm looking for. I've tried some different encodings but still no luck. I am testing in OpenOffice, which I guess could be the issue, but I would assume it can handle carriage returns within a cell, and I haven't seen any documentation to suggest otherwise. Thanks for reading!
The CSV spec is one I find implemented in many different ways... it basically seems like it's only half-specified, which is frustrating given its popularity.
To include a newline within a cell in a CSV, the cell may need to be wrapped, or the newline may need to be escaped. You'll notice from the linked doc there are three ways to do this, and different programmes treat it differently:
Excel wraps the whole cell in double quotes: a cell can have (unescaped) newline characters within it and be considered a single cell, as long as it's wrapped in double quotes (note that you'll also need to use Excel-style double-quote escaping within the cell contents).
Other programmes insert a single backslash before the character, so a line ending in \ is not considered the end of a line, but a newline character within the cell. A cell can have unescaped newline characters within it as long as they're preceded by the backslash character.
Others still replace a newline with C-style character escaping, the actual character sequence \n or \r\n. In this case the cell has fully escaped newline characters.
The problem is compounded by the potential need to escape the control characters as well as other content (e.g. " in #1, and \ in #2 and #3), and by different styles of escaping (e.g. an embedded quote could be escaped as a doubled double quote "" or as backslash-double quote \").
My advice: generate an OpenOffice document with multiple lines and key escape characters, and see how OpenOffice writes it out as a CSV file. From there you can decide which of the above methods to use for newlines within cells, and which escaping method.
example of style-1 (excel):
#num,str,num
1,"Hello
World",1990
2,"Yes",1991
example of style-2:
#num,str,num
1,Hello \
World,1990
2,Yes,1991
example of style-3:
#num,str,num
1,Hello \nWorld,1990
2,Yes,1991
You need to use "\r". You can't use escape sequences (aside from \' and \\) in single-quoted strings. '\n' and '\r' are a literal backslash followed by an n or r, while "\n" and "\r" are a newline and a carriage return respectively.
As for inserting new lines in your CSV file, it's up to your implementation. There is no single binding standard for CSV (RFC 4180 is only informational), so you'll have to figure out what format to use based on the system you're supplying the CSV data to. Some might accept a '\n' sequence and interpret it as a new line, others might allow a literal newline provided the cell is enclosed in quotes, and still others will not accept new lines at all.
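To make the quoting difference concrete, here is a small sketch that builds the grouped cell both ways and then lets fputcsv() write the quoted, multi-line variant. The address values are invented, and a line feed ("\n") is used inside the cell as an assumption about what the target spreadsheet expects.

```php
<?php
$data = ['123 Apple Dr.', 'Unit A', 'Springfield'];

$literal = implode('\r', $data); // single quotes: a backslash and an "r", two characters
$real    = implode("\n", $data); // double quotes: an actual line feed character

$fh = fopen('php://memory', 'w+');
// fputcsv() encloses the multi-line value in double quotes on its own.
fputcsv($fh, ['1', $real, '1990'], ',', '"', '\\');
rewind($fh);
$csv = stream_get_contents($fh);
fclose($fh);

echo $csv;
```

The written record spans three physical lines, but a parser that honors the enclosing quotes reads it back as one row with three cells.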
Created an Excel 2010 worksheet with 3 columns.
Added a heading row with literal values: one, two, three
Added 1 data row with literal values: abc, abc, abc except that within the 2nd column I pressed ALT+ENTER after each letter to create a carriage return and line feed.
Did SAVE AS > OTHER and choose CSV while ignoring the warnings.
Examined the CSV data using NOTEPAD++ and clicked the Show All Characters button in toolbar.
One can see the following:
one, two, three[CR][LF]
abc,"a[LF]
b[LF]
c",abc[CR][LF]
Hope this lends more clarity.

How to remove new line (and some other) characters for csv output?

I have some data (which users input using a WYSIWYG editor). I have created a tool to create a CSV copy of the data for backup purposes. For each record I run:
$csv_data .= str_replace(
array('<br />','<br/>', '\n', ','),
'',
strip_tags($db_data['description'])
).",";
For some of the records I find the product description split across multiple lines, even though I am removing BR tags, newline characters, etc. above, and this breaks the CSV file. Any idea what I am doing wrong? Thank you very much for your help.
You use single quotes ('') around the \n. Single-quoted strings do not interpret escape sequences like \n; use double quotes ("") instead.
See:
http://www.php.net/manual/en/language.types.string.php#language.types.string.syntax.single
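A corrected version of the snippet might look like the sketch below. The function name flattenForCsv is made up, and replacing line breaks with a space (rather than with nothing, as in the question) is a choice made here so that words on adjacent lines do not run together.

```php
<?php
// Hypothetical helper based on the snippet in the question.
function flattenForCsv(string $html): string
{
    return str_replace(
        ["\r\n", "\r", "\n", ','], // double quotes: real CR and LF characters
        [' ', ' ', ' ', ''],       // line breaks become a space, commas are dropped
        strip_tags($html)          // strip_tags() already removes <br /> tags
    );
}

echo flattenForCsv("Soft foam,<br />\r\nfits most ears"), PHP_EOL;
```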
