Dealing with line breaks in individual cells in a CSV file - php

I'm currently working on a web application that takes the data from a CSV file and converts it into an array for easier access to specific data in the file. However, I have a problem. Currently my code looks something like this:
$data = file_get_contents('data.csv');
$array = str_getcsv($data,"\n");
$i = 0;
foreach($array as $row){
    $newArray[$i] = str_getcsv($row,";");
    $i++;
}
This works fine for the most part, but it breaks when there is a line break inside an individual value. I'm working with product descriptions, and some of the companies put intentional line breaks in their descriptions. When I open the file in Excel I can see these line breaks clearly. My question is: how do I deal with them? I've tried a lot of different approaches and read a lot online, but nothing seems to work for me.
I hope you can help me find a solution.
EDIT
Here is an example from the CSV file:
5124;"Altid billig el og 5 stjerner på Trustpilot";"Altid billig el og 5 stjerner på Trustpilot.#
Vi har kun ét produkt og du skal ikke længere spekulere i om du nu har en billig elleverandør efter 6 måneder eller lign. Vi går efter at være blandt de billigste elleverandører, hvilket vi også beviser hvert år."
I don't know if it makes much sense, but this is where the problem is. In this particular example there is a "forced" line break when I open it in Excel, where I placed the "#". As far as I can tell, this should be valid?

str_getcsv expects a single row of CSV-formatted text; you cannot use it to parse an entire file. Let fgetcsv read and parse the file one record at a time. It keeps line breaks that sit inside a quoted field as part of that field:
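$file = fopen('data.csv', 'r');
$data = [];
// The file uses ';' as its delimiter, so pass it explicitly (fgetcsv defaults to ',')
while (($row = fgetcsv($file, 0, ';')) !== false) {
    $data[] = $row;
}
fclose($file);
var_dump($data);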
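To check that fgetcsv keeps a quoted line break inside a single field, here is a small self-contained sketch. It assumes the ';' delimiter from the question's sample, uses php://memory in place of data.csv, and the field text is shortened, made-up sample data rather than the real file.

```php
<?php
// Sketch: the third field of the first record contains a line break inside quotes.
// php://memory stands in for data.csv; the text is hypothetical sample data.
$csv = "5124;\"Altid billig el\";\"Line one\nLine two\"\n"
     . "5125;\"Next product\";\"Short text\"\n";

$handle = fopen('php://memory', 'r+');
fwrite($handle, $csv);
rewind($handle);

$rows = [];
// fgetcsv keeps reading past the newline while it is inside an enclosure,
// so the first record comes back as three fields, not two broken rows
while (($row = fgetcsv($handle, 0, ';', '"', '\\')) !== false) {
    $rows[] = $row;
}
fclose($handle);

print_r($rows);
```

Each element of $rows is one logical record, so the product description arrives with its line break intact.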

Related

Reading and parsing English with Japanese characters from a CSV file - php

I have a CSV file with lots of lines like this:
I Want It All (Tribute to Queen);Dancer (おもしろ♪ Ver.)
Hijo De La Luna (Tribute to Mecano);Perfect (おもしろ♪ Ver.)
You've Got A Friend In Me (おもしろ♪英語 Ver.) [映画『トイ·ストーリー』より]
The CSV file has two columns. The first contains only English strings, but the second contains a mix of English and Japanese characters. My code to read this CSV file:
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
<?php
header('Content-Encoding: UTF-8');
$string = file_get_contents('myfile.csv');
echo $string;
?>
// My output
��I Want It All (Tribute to Queen);Dancer (J0�0W0�0j& Ver.)
Hijo De La Luna (Tribute to Mecano);Perfect (J0�0W0�0j& Ver.)
��You've Got A Friend In Me (J0�0W0�0j&񂞊 Ver.) [ f;u0�0�0��0�0�0�0�00�0�0]
If I try:
echo "Losing My Religion (Tribute to R.E.M.);I Love It (オモシロ♪ヴォイス ver.)";
it displays the text with Japanese characters correctly. I have tried all the solutions I found on this site, but I'm still unable to parse the CSV file correctly.
I need help to parse this file correctly. Thanks in advance for your help!
I solved the issue with this:
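$f = file_get_contents('myfile.csv');                    // Get the whole file as a string
$f = mb_convert_encoding($f, 'UTF-8', 'UTF-16LE');       // Convert the file from UTF-16LE to UTF-8
$f = preg_replace('/^\x{FEFF}/u', '', $f);               // Drop the byte order mark, if present
$f = preg_split('/\R/u', $f, -1, PREG_SPLIT_NO_EMPTY);   // Split by line breaks ('u' keeps multibyte characters intact)
$f = array_map(fn($line) => str_getcsv($line, ';'), $f); // Parse each line; the file is ';'-separated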
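The same decoding steps can be tried without the real file. This sketch assumes the file is UTF-16LE with a byte order mark (the garbled output in the question is consistent with that, but it is an assumption) and builds such a string in memory from two of the question's lines; it needs the mbstring extension. The u modifier on the patterns matters: without it, \R can also match the single byte 0x85, which occurs inside many multibyte UTF-8 characters.

```php
<?php
// Sketch: simulate a UTF-16LE file with a BOM, then decode and parse it.
// Requires the mbstring extension.
$utf8Source = "I Want It All (Tribute to Queen);Dancer (おもしろ♪ Ver.)\r\n"
            . "Hijo De La Luna (Tribute to Mecano);Perfect (おもしろ♪ Ver.)\r\n";
// Stands in for file_get_contents('myfile.csv'): BOM bytes FF FE + UTF-16LE text
$raw = "\xFF\xFE" . mb_convert_encoding($utf8Source, 'UTF-16LE', 'UTF-8');

$text = mb_convert_encoding($raw, 'UTF-8', 'UTF-16LE');
$text = preg_replace('/^\x{FEFF}/u', '', $text);           // strip the decoded BOM
$lines = preg_split('/\R/u', $text, -1, PREG_SPLIT_NO_EMPTY); // handles \r\n as one break
$rows = array_map(fn(string $line) => str_getcsv($line, ';', '"', '\\'), $lines);

print_r($rows);
```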

PHP translating via csv file

I have written a small PHP class that reads a CSV file as the source of language strings.
class getLang {
    var $lang;
    function __construct() {
        $theBigArray = file('/inc/lang.csv');
        foreach ($theBigArray as $key => $value) {
            $line = explode(',', $value);
            $this->lang[trim($line[0])] = trim($line[1]);
        }
    }
}
Imagine the CSV file has a content like this :
Hello , Hallo
goodbye , Vaarwel
pizza , pizza
food , voedsel
morning , ochtend-
thanks, dank je
morning , Goedemorgen
nice , Leuk je te zien
day , goedendag
...
last_word , laatste woord
It works well for what I need, but I'm looking for a better way to read the CSV file and return a simple array or an object.
Note: the CSV file will be filled in by people who don't know programming; they mostly use MS Excel to create and edit it. Regular expressions are out of scope for this post.
Best regards
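One possible alternative, sketched here rather than taken from an answer, is to let fgetcsv do the splitting, so that a cell Excel saved with quotes (for example one containing a comma) stays in one piece. The function name loadTranslations and the sample file contents are hypothetical; the ',' delimiter follows the question's sample, though Excel may write ';' instead depending on regional settings.

```php
<?php
// Sketch: load a two-column "source,translation" CSV into a simple lookup array.
// loadTranslations is a hypothetical name; adjust the delimiter to match your file.
function loadTranslations(string $path, string $delimiter = ','): array
{
    $lang = [];
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return $lang;
    }
    while (($row = fgetcsv($handle, 0, $delimiter, '"', '\\')) !== false) {
        if (count($row) < 2) {
            continue; // skip blank or malformed lines
        }
        $lang[trim($row[0])] = trim($row[1]);
    }
    fclose($handle);
    return $lang;
}

// Usage with a temporary file standing in for /inc/lang.csv
$tmp = tempnam(sys_get_temp_dir(), 'lang');
file_put_contents($tmp, "Hello , Hallo\ngoodbye , Vaarwel\n\"thanks, all\",\"dank je, allemaal\"\n");
$lang = loadTranslations($tmp);
unlink($tmp);

print_r($lang);
```

The quoted third line shows the benefit over explode(','): the comma inside each cell no longer splits it.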

Extract information from some lines of a CSV file with PHP

I am trying to extract some information from a CSV file. I only want to extract it from the lines that contain the email address and ID.
I have extracted information from CSV files with PHP before without problems, but this CSV file stores its information in a very unusual way.
Here I show part of the contents of the csv file:
As you can see, the information entered is not in a conventional format.
Using this small piece of code:
$file = fopen("mails.csv","rb");
// fgets returns false at end of file, which ends the loop without an extra empty iteration
while (($line = fgets($file)) !== false)
{
    echo $line . "<br />";
}
fclose($file);
I get this on screen:
The file is much larger and has more data; what I have shown is only a small part of it, used as an example.
What I want to do is extract information from the lines that contain the radio-type input, because those lines hold the email address, which is the information I really want.
I have tried the usual PHP functions for extracting information from CSV files, but they don't work for me: they throw multiple errors, and I cannot get just the lines that contain the emails and save those emails in an array.
Any ideas on how to get only the lines that contain the radio inputs, extract the email address from each of those lines, and save it in an array?
I would appreciate any help you can give me.
Assuming that your format never changes (including the invalid lines that you wish to ignore), the code below should work.
Note: I have not tested this code so use this as a pointer and make adjustments as required.
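$file = fopen("mails.csv","rb");
// Read until fgets returns false (a feof() loop runs one extra time at the end)
while (($contents = fgets($file)) !== false){
    // Skip lines that start with '#'
    if (substr($contents,0,1) != "#"){
        $val = explode(",", $contents);
        echo $val[0]. "<br />";
    }
}
fclose($file);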
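The question's actual goal is an array of email addresses. The file layout is not shown, so this sketch assumes each relevant line is comma-separated with the address in its own field, keeps the answer's rule of skipping lines that start with '#', and uses filter_var with FILTER_VALIDATE_EMAIL to pick out the address. The function name extractEmails and the sample data are hypothetical.

```php
<?php
// Sketch under assumptions: comma-separated lines, email in its own field.
// extractEmails is a hypothetical helper name.
function extractEmails($handle): array
{
    $emails = [];
    while (($line = fgets($handle)) !== false) {
        if ($line === '' || $line[0] === '#') {
            continue; // ignore comment/header lines, as in the answer
        }
        foreach (str_getcsv(trim($line), ',', '"', '\\') as $field) {
            $field = trim((string) $field);
            // keep only fields that validate as an email address
            if (filter_var($field, FILTER_VALIDATE_EMAIL) !== false) {
                $emails[] = $field;
            }
        }
    }
    return $emails;
}

// Hypothetical sample data standing in for mails.csv
$sample = "# header line to ignore\n"
        . "anna@example.com,radio,42\n"
        . "not an email,text,7\n"
        . "bo@example.org,radio,43\n";
$handle = fopen('php://memory', 'r+');
fwrite($handle, $sample);
rewind($handle);
$emails = extractEmails($handle);
fclose($handle);

print_r($emails);
```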

How to "clean" a string read from file() [duplicate]

It's driving me crazy... I'm trying to parse a CSV file and I'm seeing very strange behavior.
Here is the CSV:
action;id;nom;sites;heures;jours
i;;"un nom a la con";1200|128;;1|1|1|1|1|1|1
Now the PHP code:
$required_fields = array('id','nom','sites','heures','jours');
if (($handle = fopen($filename, "r")) !== FALSE)
{
    $cols = 0;
    while (($row = fgetcsv($handle, 1000, ";")) !== FALSE)
    {
        $row = array_map('trim',$row);
        // Identify headers
        if(!isset($headers))
        {
            $cols = count($row);
            for($i=0;$i<$cols;$i++) $headers[strtolower($row[$i])] = $i;
            foreach($required_fields as $val) if(!isset($headers[$val])) break 2;
            $headers = array_flip($headers);
            print_r($headers);
        }
        elseif(count($row) >= 4)
        {
            $temp = array();
            for($i=0;$i<$cols;$i++)
            {
                if(isset($headers[$i]))
                {
                    $temp[$headers[$i]] = $row[$i];
                }
            }
            print_r($temp);
            print_r($temp['action']);
            var_dump(array_key_exists('action',$temp));
            die();
        }
    }
}
And the output:
Array
(
[0] => action
[1] => id
[2] => nom
[3] => sites
[4] => heures
[5] => jours
)
Array
(
[action] => i
[id] =>
[nom] => un nom a la con
[sites] => 1200|128
[heures] =>
[jours] => 1|1|1|1|1|1|1
)
Notice: Undefined index: action in index.php on line 110
bool(false)
The key "action" appears in $temp, but $temp['action'] triggers an "Undefined index" notice and array_key_exists returns false. I've tried a different key name, with the same result. The other keys work without any problem.
What's wrong here?
PS: line 110 is the print_r($temp['action']);
EDIT 1
If I add an extra empty field at the beginning of each line of the CSV, action displays correctly:
;action;id;nom;sites;heures;jours
;i;;"un nom a la con";1200|128;;1|1|1|1|1|1|1
There is probably some special character at the beginning of the first line that trim isn't removing.
Try to remove every non-word character this way:
// Identify headers
if(!isset($headers))
{
for($i=0;$i<$cols;$i++)
{
$headers[preg_replace("/[^\w\d]/","",strtolower($row[$i]))] = $i;
....
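To see why trim() is not enough here, this small sketch assumes the special character is a UTF-8 byte order mark (the bytes EF BB BF), as the answers below suggest. trim() does not strip those bytes by default, so the first header key is nine bytes long instead of six and the lookup fails.

```php
<?php
// Sketch: a UTF-8 BOM glued to the first header cell survives trim(),
// so the array key becomes "\xEF\xBB\xBFaction" rather than "action".
$firstHeader = "\xEF\xBB\xBF" . 'action';

$withBom = [trim($firstHeader) => 'i'];
var_dump(array_key_exists('action', $withBom)); // bool(false)
var_dump(strlen(trim($firstHeader)));           // int(9), not 6

// Removing the BOM bytes explicitly makes the lookup work again
$clean = preg_replace('/^\xEF\xBB\xBF/', '', $firstHeader);
var_dump(array_key_exists('action', [$clean => 'i'])); // bool(true)
```

Since print_r does not show the invisible bytes, the key looks identical to "action" in the question's output.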
If your CSV file is UTF-8 encoded, make sure it is saved as UTF-8 and not UTF-8-BOM (you can check this in Notepad++ under the Encoding menu).
I had the same problem with CSV files generated in MS Excel using UTF-8 encoding. Adding the following code to where you read the CSV solves the issue:
$handle = fopen($file, 'r');
// ...
$bom = pack('CCC', 0xef, 0xbb, 0xbf);
if (0 !== strcmp(fread($handle, 3), $bom)) {
fseek($handle, 0);
}
// ...
It checks for a UTF-8 byte order mark. If one is present, the file pointer stays just past the BOM; otherwise it is rewound to the start of the file. This is not a generic solution, since there are other types of BOM, but you can adjust it as needed.
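Putting that check together with the question's data, a minimal self-contained sketch might look like this. php://memory stands in for the real file, and array_combine is used here only to build the header-to-value map more compactly than the question's loop.

```php
<?php
// Sketch: skip a UTF-8 BOM before reading the header row with fgetcsv.
// The data mirrors the question's CSV, with a BOM prepended.
$csv = "\xEF\xBB\xBFaction;id;nom;sites;heures;jours\n"
     . "i;;\"un nom a la con\";1200|128;;1|1|1|1|1|1|1\n";

$handle = fopen('php://memory', 'r+');
fwrite($handle, $csv);
rewind($handle);

// Same idea as the answer: if the first 3 bytes are not a BOM, go back to the start
$bom = pack('CCC', 0xef, 0xbb, 0xbf);
if (fread($handle, 3) !== $bom) {
    rewind($handle);
}

$headers = fgetcsv($handle, 0, ';', '"', '\\');
$values  = fgetcsv($handle, 0, ';', '"', '\\');
fclose($handle);

$record = array_combine($headers, $values);
var_dump(array_key_exists('action', $record)); // bool(true)
```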
Sorry for posting on an old thread, but I thought my answer could add to the ones already provided here...
I'm working with a Vagrant guest VM (Ubuntu 16.04) from a Windows 10 host. When I first came across this bug (in my case, seeding a database table using Laravel and a csv file), #ojovirtual's answer immediately made sense, since there can be formatting issues between Windows and Linux.
#ojovirtual's answer didn't quite work for me, so I ended up running touch new_csv_file.csv in Bash and pasting the contents of the 'problematic' CSV file (originally created on my Windows 10 host) into the newly created one. This definitely fixed my issue; it would have been good to debug further, but I just wanted to get my particular task done.
I struggled with this issue for a few hours only to realize that the issue was being caused by a null key in the array. Please ensure that none of the keys has a null value.
I struggled with this issue until I realised that my chunk of code had been run twice.
On the first run the index was present and my array was printed out properly; on the second run the index was missing and the notice was triggered. That left me wondering why my obviously existing, properly printed array was triggering an "Undefined index" notice. :)
Maybe this will help somebody.
