While reading a CSV file with PHP, I ran into a problem with a line break inside the file. The contents of one cell are split whenever a comma is followed by a line break:
$csv = array_map('str_getcsv', file($file));
first,second,"third,
more,text","forth"
next,dataset
This will result in:
1) first | second | third
2) more text | forth
3) next | dataset
While it should result in:
1) first | second | third more text | forth
2) next | dataset
Is this a bug within str_getcsv?
Don't do that; use fgetcsv(). You're having problems because file() splits on every line break and doesn't care about the quoted strings in your file, so str_getcsv only ever sees one half of the quoted cell at a time.
$fh = fopen('file.csv', 'r');
while( $line = fgetcsv($fh) ) {
// do a thing
}
fclose($fh);
https://secure.php.net/manual/en/function.fgetcsv.php
And try not to store all the lines into an array before performing your operations if you can help it. Your system's memory usage will thank you.
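As a minimal sketch of why this works, the sample from the question can be fed through fgetcsv() via an in-memory stream. The helper name parseCsvString and the use of php://memory are assumptions made for illustration; with a real file you would simply fopen() it as shown above.

```php
<?php
// Hypothetical helper: parse a CSV string with fgetcsv() so that quoted
// fields containing line breaks stay in a single cell.
function parseCsvString(string $csv): array
{
    $fh = fopen('php://memory', 'r+');
    fwrite($fh, $csv);
    rewind($fh);

    $rows = [];
    // Enclosure and escape are passed explicitly to avoid relying on defaults.
    while (($row = fgetcsv($fh, 0, ',', '"', '\\')) !== false) {
        if ($row === [null]) {
            continue; // fgetcsv() reports a blank line as an array holding one null
        }
        $rows[] = $row;
    }
    fclose($fh);

    return $rows;
}

// The sample data from the question.
$rows = parseCsvString("first,second,\"third,\nmore,text\",\"forth\"\nnext,dataset\n");
```

Here the quoted cell keeps its embedded line break instead of being split into a second row.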
<?php
$csvString = "ID,Condition,Condition,Condition,Condition,AdSize,Content:Text,Content:Text,Content:Text,Content:ImageUrl,Content:LandingPageUrl,Archive,Default
ID,Locations:Region,Device Properties:Device,Weather:Condition,Dmp:Liveramp,AdSize,title1,description1,price1,imageUrl1,landingPageUrl1,Archive,Default
ROW_001,\"Wa, Ca, Tn\",Mobile,Snow,12345,300x250,Hello Washingtonian,My Custom Description,10,http://domain/Snow.jpg,https://www.example.com,TRUE,
ROW_002,Wa,Mobile,Snow,12345,300x250,Hello Washingtonian,My Custom Description,10,http://domain/New_Snow.jpg,https://www.example.com,,
ROW_003,Wa,Mobile,,,300x250,Hello Washingtonian,My Custom Description,10,http://domain/clear.jpg,https://www.example.com,,
ROW_004,,,,,300x250,Hello,My Custom Description,20,http://domain/clear.jpg,https://www.example.com,,TRUE";
function csvToArray($csvString, $delimiter = ',', $lineBreak = "\n") {
$csvArray = [];
$rows = str_getcsv($csvString, $lineBreak); // Parses the rows. Treats the rows as a CSV with \n as a delimiter
foreach ($rows as $row) {
$csvArray[] = str_getcsv($row, $delimiter); // Parses individual rows. Now treats a row as a regular CSV with ',' as a delimiter
}
return $csvArray;
}
print_r(csvToArray($csvString));
https://gist.github.com/sul4bh/d392315c7049abd86916e077707bf123
I have CSV files that I would like to combine into a single CSV file. I have managed to remove all headers. The headers write to the new file in the right column and row, but the rows from the CSV files do not line up: they start from column B instead of column A. Data from each CSV is put in an array and added to the new file. Is there a way I could remove the trailing commas and add a PHP_EOL? Here is an example of the array with data.
Array([0]=>"Joe,Soap,,,25,11,,,,,,"
[1]=>"Jimmy,Tesla,10,,4,,,,,,,,")
I would like each element to write on new line starting from column A. Here is my script.
$fileload = $filecontents;
$lines = file($fileload);
foreach ($lines as $key => $line) {
$lineArr = explode(',',$line);
if(count(array_filter($lineArr)) <= 3)
{
continue;
}
if(count(array_intersect($lineArr, $outputheaders)) >= 1)
{
continue;
}
//Row Data
$parts[] = $line;
}
$headers = implode(",",$putheaders);
$sTmp = $sTmp.$headers;
$details = implode("','",$parts);
$sTmp = $sTmp.$details;
file_put_contents($Out, $sTmp, FILE_APPEND | LOCK_EX);
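One possible way to get each row onto its own line starting in column A is to strip the trailing commas and line endings from every element and then join the rows with PHP_EOL. This is only a sketch: the function name normalizeRows is an assumption, and dropping trailing commas changes the number of columns per row, which may or may not be acceptable for the combined file.

```php
<?php
// Hypothetical helper: trim trailing commas/newlines and join rows with PHP_EOL.
function normalizeRows(array $lines): string
{
    $clean = [];
    foreach ($lines as $line) {
        // rtrim() with a character list removes any run of these characters at the end
        $clean[] = rtrim($line, ",\r\n");
    }

    return implode(PHP_EOL, $clean) . PHP_EOL;
}

$parts = [
    "Joe,Soap,,,25,11,,,,,,\n",
    "Jimmy,Tesla,10,,4,,,,,,,,",
];
$body = normalizeRows($parts);
```

Joining with PHP_EOL rather than "','" is what keeps each element on its own line.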
I am trying to parse a CSV file into an array. Unfortunately one of the columns contains commas and quotes (example below). Any suggestions on how I can avoid breaking up that column into multiple columns?
I have tried changing the delimiter in the fgetcsv function, but that didn't work, so I tried using str_replace to escape all the commas, but that broke the script.
Example of CSV format
title,->link,->description,->id
Achillea,->http://www.example.com,->another,short example "Of the product",->346346
Seeds,->http://www.example.com,->"please see description for more info, thanks",->34643
Ageratum,->http://www.example.com,->this is, a brief description, of the product.,->213421
// Open the CSV
if (($handle = fopen($fileUrl, "r")) !==FALSE) {
// Set the parent array key to 0
$key = 0;
// While there is data available loop through unlimited times (0) using separator (,)
while (($data = fgetcsv($handle, 0, ",")) !==FALSE) {
// Count the total keys in each row
$c = count($data);
//Populate the array
for ($x = 0; $x < $c; $x++) {
$arrCSV[$key][$x] = $data[$x];
}
$key++;
} // end while
// Close the CSV file
fclose($handle);
}
Maybe you should think about using PHP's file() function, which reads your CSV file into an array of lines.
Depending on your delimiter, you could then use explode() to split the lines into cells.
Here is an example:
$csv_file = file("test_file.csv"); // one array element per line
foreach($csv_file as $line){
$cell = explode(",->", $line); // ==> if ",->" is your csv-delimiter!
$title[] = $cell[0];
$link[] = $cell[1];
$description[] = $cell[2];
$id[] = $cell[3];
}
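Since fgetcsv() only accepts a single-character delimiter, splitting on the multi-character ",->" by hand is one workable route. The sketch below assumes exactly four columns per row (as in the example data) and only removes quotes that wrap a whole cell; the function name splitArrowRow is hypothetical.

```php
<?php
// Hypothetical helper: split one line on the multi-character delimiter ",->".
function splitArrowRow(string $line, int $columns = 4): array
{
    // The limit keeps any stray ",->" inside the last cell from creating extra cells.
    $cells = explode(',->', rtrim($line, "\r\n"), $columns);

    foreach ($cells as $i => $cell) {
        // Strip quotes only when they enclose the entire cell.
        if (strlen($cell) >= 2 && $cell[0] === '"' && substr($cell, -1) === '"') {
            $cells[$i] = substr($cell, 1, -1);
        }
    }

    return $cells;
}

$a = splitArrowRow('Achillea,->http://www.example.com,->another,short example "Of the product",->346346');
$b = splitArrowRow('Seeds,->http://www.example.com,->"please see description for more info, thanks",->34643');
```

Plain commas inside a description are left alone because only the full ",->" sequence counts as a separator.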
I have 12 XML files from which I extract one CSV file; from that CSV I extract column 1 and append the values to a tt.txt file.
Now I need to extract the values from this .txt file every time data is written to it.
But the problem is, when I use
$contents = fread ($fd,filesize ($filename));
fclose ($fd);
$delimiter = ',' ;
$splitcontents = explode($delimiter, $contents);
it reads from the first value of the file every time tt.txt is appended to!
I hope you understand the problem. What I need is for $contents to hold only the new data that was appended; instead it reads from the start of the file every time.
Is there a way to achieve this, or can PHP not do it?
The workflow is: extract from the TXT file -> perform computations -> write into a new TXT file. The problem is that I can't read from a middle position up to the new end; PHP always reads from the start of the file.
I think you need to store the last file position.
Call filesize to get the current length and read the file. Later, check whether filesize is different (or maybe you know this some other way), use fseek to move the cursor in the file, and then read from there.
IE:
$previousLength = 0;
// your loop when you're calling your new read function
$length = filesize($filename);
fseek($fd,$previousLength);
$contents = fread($fd,$length - $previousLength);
$previousLength = $length;
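The idea above can be wrapped into a small function that returns both the newly appended data and the new offset to remember. This is a sketch under assumptions: the name readAppended is hypothetical, and the offset would have to be stored somewhere (a variable, a file, a database) between calls. Note that filesize() results are cached by PHP, so clearstatcache() is called first.

```php
<?php
// Hypothetical helper: return only the bytes appended since $offset,
// together with the new offset to remember for the next call.
function readAppended(string $filename, int $offset): array
{
    clearstatcache(true, $filename); // filesize() is cached per request
    $length = filesize($filename);

    if ($length === false || $length <= $offset) {
        return ['', $offset]; // nothing new
    }

    $fd = fopen($filename, 'rb');
    fseek($fd, $offset);
    $data = fread($fd, $length - $offset);
    fclose($fd);

    return [$data, $length];
}

// Demonstration with a temporary file standing in for tt.txt.
$tmp = tempnam(sys_get_temp_dir(), 'tt');
file_put_contents($tmp, "1,2,");
list($first, $pos) = readAppended($tmp, 0);

file_put_contents($tmp, "3,4", FILE_APPEND);
list($second, $pos) = readAppended($tmp, $pos);
unlink($tmp);
```

The second call sees only "3,4", which can then be passed to explode() as before.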
It is only reading the first field because PHP does not automatically assume that a newline character (\n) means a new record; you have to handle this, yourself.
Using what you already have, I would do the following:
$contents = fread($fd, filesize($filename));
fclose($fd);
/* Now, split up $contents by newline, turning this into an array, where each element
* is, in effect, a new line in the CSV file. */
$contents = explode("\n", $contents);
/* Now, explode each element in the array, into itself. */
foreach ($contents as &$c) {
$c = explode(",", $c);
}
In the future, if you want to go line-by-line, as you run the risk of hogging too many resources by reading the entire file in, use fgets().
I'm not great at arrays, but it sounds to me like you need an associative array. I'm doing a similar thing with the following code.
$posts = array(); // initialise once; resetting it inside the loop would discard earlier rows
$lines = explode("\n", $contents);
foreach ($lines as $line) {
$parts = explode(',', $line);
if (count($parts) >= 7) { // indexes 0..6 are used below
$posts[] = array('name' => $parts[3],'email' => $parts[4],'phone' => $parts[5],'link' => $parts[6],'month' => $parts[0],'day' => $parts[1],'year' => $parts[2]);
}
}
foreach ($posts as $post):
// trim every value and drop the empty ones
$post = array_filter(array_map('trim', $post));
endforeach;
How can I find out whether the fields of a CSV file are tab-delimited or comma-delimited? I need PHP validation for this. Can anyone please help? Thanks in advance.
It's too late to answer this question but hope it will help someone.
Here's a simple function that will return a delimiter of a file.
function getFileDelimiter($file, $checkLines = 2){
$file = new SplFileObject($file);
$delimiters = array(
',',
"\t", // double quotes, so the returned delimiter is a real tab rather than a backslash followed by "t"
';',
'|',
':'
);
$results = array();
$i = 0;
while($file->valid() && $i < $checkLines){
$line = $file->fgets();
foreach ($delimiters as $delimiter){
$regExp = '/['.$delimiter.']/';
$fields = preg_split($regExp, $line);
if(count($fields) > 1){
if(!empty($results[$delimiter])){
$results[$delimiter]++;
} else {
$results[$delimiter] = 1;
}
}
}
$i++;
}
$results = array_keys($results, max($results));
return $results[0];
}
Use this function as shown below:
$delimiter = getFileDelimiter('abc.csv'); //Check 2 lines to determine the delimiter
$delimiter = getFileDelimiter('abc.csv', 5); //Check 5 lines to determine the delimiter
P.S I have used preg_split() instead of explode() because explode('\t', $value) won't give proper results.
UPDATE: Thanks to @RichardEB for pointing out a bug in the code. I have updated this now.
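The remark about explode('\t', $value) comes down to PHP's quoting rules: inside single quotes, \t is two literal characters (a backslash and a "t"), while inside double quotes it is a single tab character. A short illustration, with variable names chosen only for this example:

```php
<?php
$literal = '\t'; // backslash + "t", two characters
$tab     = "\t"; // a real tab, one character

$row = "a\tb\tc";

$withLiteral = explode($literal, $row); // delimiter never found, row stays whole
$withTab     = explode($tab, $row);     // splits into three cells
```

So explode() works fine for tab-separated data as long as the delimiter is written in double quotes.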
Here's what I do.
Parse the first 5 lines of a CSV file
Count the number of delimiters [commas, tabs, semicolons and colons] in each line
Compare the number of delimiters in each line. If you have a properly formatted CSV, then one of the delimiter counts will match in each row.
This will not work 100% of the time, but it is a decent starting point. At minimum, it will reduce the number of possible delimiters (making it easier for your users to select the correct delimiter).
/* Rearrange this array to change the search priority of delimiters */
$delimiters = array('tab' => "\t",
'comma' => ",",
'semicolon' => ";"
);
$handle = file( $file ); # Grabs the CSV file, loads into array
$line = array(); # Stores the count of delimiters in each row
$valid_delimiter = array(); # Stores Valid Delimiters
# Count the number of Delimiters in Each Row
for ( $i = 1; $i < 6; $i++ ){
foreach ( $delimiters as $key => $value ){
$line[$key][$i] = count( explode( $value, $handle[$i] ) ) - 1;
}
}
# Compare the Count of Delimiters in Each line
foreach ( $line as $delimiter => $count ){
# Check that the first two values are not 0
if ( $count[1] > 0 and $count[2] > 0 ){
$match = true;
$prev_value = '';
foreach ( $count as $value ){
if ( $prev_value != '' )
$match = ( $prev_value == $value and $match == true ) ? true : false;
$prev_value = $value;
}
} else {
$match = false;
}
if ( $match == true ) $valid_delimiter[] = $delimiter;
}//foreach
# Set Default delimiter to comma
$delimiter = !empty( $valid_delimiter[0] ) ? $valid_delimiter[0] : "comma";
/* !!!! This is good enough for my needs since I have the priority set to "tab"
!!!! but you will want to have the user select from the delimiters in $valid_delimiter
!!!! if multiple delimiter counts match
*/
# The Delimiter for the CSV
echo $delimiters[$delimiter];
There is no 100% reliable way to determine this. What you can do is:
If you have a method to validate the fields you read, try to read a few fields using either separator and validate against your method. If it breaks, use another one.
Count the occurrence of tabs or commas in the file. Usually one is significantly higher than the other
Last but not least: Ask the user, and allow him to override your guesses.
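The first suggestion could look roughly like the following, using "every sampled row has the same number of fields, and more than one" as the validation rule. That rule, the function name guessDelimiter, and the candidate list are all assumptions for this sketch; a null result means the guess failed and the user should be asked.

```php
<?php
// Hypothetical helper: pick the candidate that splits every sample line
// into the same number (> 1) of fields; prefer the one yielding most fields.
function guessDelimiter(array $lines, array $candidates = [',', "\t", ';']): ?string
{
    $best = null;
    $bestColumns = 1;

    foreach ($candidates as $delimiter) {
        $counts = [];
        foreach ($lines as $line) {
            if (trim($line) === '') {
                continue; // ignore blank lines
            }
            $counts[] = count(str_getcsv($line, $delimiter, '"', '\\'));
        }
        if ($counts === []) {
            continue;
        }
        // Consistent means: one distinct field count across all sampled lines.
        if (count(array_unique($counts)) === 1 && $counts[0] > $bestColumns) {
            $best = $delimiter;
            $bestColumns = $counts[0];
        }
    }

    return $best; // null: no candidate validated, fall back to asking the user
}

$semicolon = guessDelimiter(["name;price;stock", "Widget;2,50;7"]);
$unknown   = guessDelimiter(["abc", "def"]);
```

In the first sample the comma loses because it yields a different field count on each line, even though it does occur in the data.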
I'm just counting the occurrences of the different delimiters in the CSV file, the one with the most should probably be the correct delimiter:
//The delimiters array to look through
$delimiters = array(
'semicolon' => ";",
'tab' => "\t",
'comma' => ",",
);
//Load the csv file into a string
$csv = file_get_contents($file);
foreach ($delimiters as $key => $delim) {
$res[$key] = substr_count($csv, $delim);
}
//reverse sort the values, so the first element is the delimiter that occurred most often
arsort($res);
reset($res);
$first_key = key($res);
return $delimiters[$first_key];
In my situation users supply CSV files which are then entered into an SQL database. They may save an Excel spreadsheet as a comma- or tab-delimited file. A program converting the spreadsheet to SQL needs to identify automatically whether fields are tab- or comma-separated.
Many Excel CSV exports have field headings as the first line. The heading text is unlikely to contain commas except as a delimiter. For my situation I counted the commas and tabs in the first line and used whichever count was greater to decide whether the file is comma- or tab-delimited.
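A minimal sketch of that header-line comparison might look like this. The function name delimiterFromHeader is hypothetical, and ties are resolved in favour of the comma here, which is an arbitrary choice.

```php
<?php
// Hypothetical helper: compare tab and comma counts in the heading line.
function delimiterFromHeader(string $headerLine): string
{
    $tabs   = substr_count($headerLine, "\t");
    $commas = substr_count($headerLine, ',');

    return $tabs > $commas ? "\t" : ',';
}

$fromTsv = delimiterFromHeader("id\tname\tprice");
$fromCsv = delimiterFromHeader("id,name,price");
```

The header line itself could be obtained with a single fgets() call on the opened file.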
Thanks for all your inputs, I made mine using your tricks : preg_split, fgetcsv, loop, etc.
But I implemented something that was surprisingly not here, the use of fgets instead of reading the whole file, way better if the file is heavy!
Here's the code :
ini_set("auto_detect_line_endings", true);
function guessCsvDelimiter($filePath, $limitLines = 5) {
if (!is_readable($filePath) || !is_file($filePath)) {
return false;
}
$delimiters = array(
'tab' => "\t",
'comma' => ",",
'semicolon' => ";"
);
$fp = fopen($filePath, 'r', false);
$lineResults = array(
'tab' => array(),
'comma' => array(),
'semicolon' => array()
);
$lineIndex = 0;
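while (!feof($fp)) {
$line = fgets($fp);
if ($line === false) break; // nothing left to read
foreach ($delimiters as $key=>$delimiter) {
// Parse the line just read; calling fgetcsv($fp, ...) here would consume further lines from the file
$lineResults[$key][$lineIndex] = count(str_getcsv($line, $delimiter)) - 1;
}
$lineIndex++;
if ($lineIndex > $limitLines) break;
}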
fclose($fp);
// Calculating average
foreach ($lineResults as $key=>$entry) {
$lineResults[$key] = array_sum($entry)/count($entry);
}
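arsort($lineResults);
reset($lineResults);
// $lineResults is keyed by delimiter name, so compare the two highest averages by position
$averages = array_values($lineResults);
return ($averages[0] !== $averages[1]) ? $delimiters[key($lineResults)] : $delimiters['comma'];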
}
I used @Jay Bhatt's solution for finding out a CSV file's delimiter, but it didn't work for me, so I applied a few fixes and added comments to make the process easier to understand.
See my version of @Jay Bhatt's function:
function decide_csv_delimiter($file, $checkLines = 10) {
// use php's built in file parser class for validating the csv or txt file
$file = new SplFileObject($file);
// array of predefined delimiters. Add any more delimiters if you wish
$delimiters = array(',', '\t', ';', '|', ':');
// store all the occurrences of each delimiter in an associative array
$number_of_delimiter_occurences = array();
$results = array();
$i = 0; // using 'i' for counting the number of actual row parsed
while ($file->valid() && $i <= $checkLines) {
$line = $file->fgets();
foreach ($delimiters as $idx => $delimiter){
$regExp = '/['.$delimiter.']/';
$fields = preg_split($regExp, $line);
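// construct the array with all the keys as the delimiters
// and the values as the number of delimiter occurrences, summed over all lines read
// (plain assignment would keep only the count from the last line)
$number_of_delimiter_occurences[$delimiter] = (isset($number_of_delimiter_occurences[$delimiter]) ? $number_of_delimiter_occurences[$delimiter] : 0) + count($fields) - 1;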
}
$i++;
}
// get key of the largest value from the array (comparing only the array values)
// in our case, the array keys are the delimiters
$results = array_keys($number_of_delimiter_occurences, max($number_of_delimiter_occurences));
// in case the delimiter happens to be a 'tab' character ('\t'), return it in double quotes
// otherwise when using as delimiter it will give an error,
// because it is not recognised as a special character for 'tab' key,
// it shows up like a simple string composed of '\' and 't' characters, which is not accepted when parsing csv files
return $results[0] == '\t' ? "\t" : $results[0];
}
I personally use this function for helping automatically parse a file with PHPExcel, and it works beautifully and fast.
I recommend parsing at least 10 lines, for the results to be more accurate. I personally use it with 100 lines, and it is working fast, no delays or lags. The more lines you parse, the more accurate the result gets.
NOTE: This is just a modified version of @Jay Bhatt's solution to the question. All credit goes to @Jay Bhatt.
When I output a TSV file I write the tabs using \t, the same way one would write a line break as \n. With that in mind, a method could be as follows:
<?php
$mysource = file_get_contents($path); // $path is a placeholder: get the source however you wish
if(strpos($mysource, "\t") !== false){ // !== false also catches a tab at position 0
//We have a tab separator
}else{
// it might be CSV
}
?>
I guess this may not be the right approach, because you could have tabs and commas in the actual content as well. It's just an idea. Using regular expressions may be better, although I am not too clued up on that.
You can simply use PHP's native fgetcsv() function in this way:
function getCsvDelimeter($file)
{
if (($handle = fopen($file, "r")) !== FALSE) {
$delimiters = array(',', ';', '|', ':'); //Put all that need check
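foreach ($delimiters AS $item) {
rewind($handle); // every candidate must be tested against the first line
// fgetcsv() returns a single-element array when the delimiter is not found
$fields = fgetcsv($handle, 0, $item, '"');
if (is_array($fields) && count($fields) > 1) {
$delimiter = $item;
break;
}
}
fclose($handle);
}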
return (isset($delimiter) ? $delimiter : null);
}
Aside from the trivial answer that CSV files are always comma-separated (it's in the name), I don't think you can come up with any hard rules. Both TSV and CSV files are sufficiently loosely specified that you can come up with files that would be acceptable as either.
A\tB,C
1,2\t3
(Assuming \t == TAB)
How would you decide whether this is TSV or CSV?
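To make the ambiguity concrete, here is the first line of that example parsed both ways with str_getcsv(). Variable names are for illustration only.

```php
<?php
$line = "A\tB,C";

// Read as CSV: two fields, the first containing a tab.
$asCsv = str_getcsv($line, ',', '"', '\\');

// Read as TSV: two fields, the second containing a comma.
$asTsv = str_getcsv($line, "\t", '"', '\\');
```

Both interpretations produce a well-formed two-column row, so nothing in the data itself settles the question.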
You can also use fgetcsv (http://php.net/manual/en/function.fgetcsv.php), passing it a delimiter parameter. Note that fgetcsv returns false only at end of file or on error; with the wrong $delimiter it returns the whole line as a single field, so check the number of fields as well.
Sample to check whether the delimiter is ';':
if (($data = fgetcsv($your_csv_handler, 1000, ';')) !== false && count($data) > 1) { $csv_delimiter = ';'; }
How about something simple?
function findDelimiter($filePath, $limitLines = 5){
$file = new SplFileObject($filePath);
$delims = $file->getCsvControl();
return $delims[0];
}
This is my solution.
It works if you know how many columns you expect.
At the end, the separator character is in $actual_separation_character.
$separator_1=",";
$separator_2=";";
$separator_3="\t";
$separator_4=":";
$separator_5="|";
$separator_1_number=0;
$separator_2_number=0;
$separator_3_number=0;
$separator_4_number=0;
$separator_5_number=0;
/* YOU NEED TO CHANGE THIS VARIABLE */
// Expected number of separation characters per row ( 3 columns ==> 2 separation characters / row )
$expected_separation_character_number=2;
$file = fopen("upload/filename.csv","r");
while(! feof($file)) //read file rows
{
$row= fgets($file);
$row_1_replace=str_replace($separator_1,"",$row);
$row_1_length=strlen($row)-strlen($row_1_replace);
if(($row_1_length==$expected_separation_character_number)or($expected_separation_character_number==0)){
$separator_1_number=$separator_1_number+$row_1_length;
}
$row_2_replace=str_replace($separator_2,"",$row);
$row_2_length=strlen($row)-strlen($row_2_replace);
if(($row_2_length==$expected_separation_character_number)or($expected_separation_character_number==0)){
$separator_2_number=$separator_2_number+$row_2_length;
}
$row_3_replace=str_replace($separator_3,"",$row);
$row_3_length=strlen($row)-strlen($row_3_replace);
if(($row_3_length==$expected_separation_character_number)or($expected_separation_character_number==0)){
$separator_3_number=$separator_3_number+$row_3_length;
}
$row_4_replace=str_replace($separator_4,"",$row);
$row_4_length=strlen($row)-strlen($row_4_replace);
if(($row_4_length==$expected_separation_character_number)or($expected_separation_character_number==0)){
$separator_4_number=$separator_4_number+$row_4_length;
}
$row_5_replace=str_replace($separator_5,"",$row);
$row_5_length=strlen($row)-strlen($row_5_replace);
if(($row_5_length==$expected_separation_character_number)or($expected_separation_character_number==0)){
$separator_5_number=$separator_5_number+$row_5_length;
}
} // while(! feof($file)) END
fclose($file);
/* THE FILE ACTUAL SEPARATOR (delimiter) CHARACTER */
/* $actual_separation_character */
if ($separator_1_number==max($separator_1_number,$separator_2_number,$separator_3_number,$separator_4_number,$separator_5_number)){$actual_separation_character=$separator_1;}
else if ($separator_2_number==max($separator_1_number,$separator_2_number,$separator_3_number,$separator_4_number,$separator_5_number)){$actual_separation_character=$separator_2;}
else if ($separator_3_number==max($separator_1_number,$separator_2_number,$separator_3_number,$separator_4_number,$separator_5_number)){$actual_separation_character=$separator_3;}
else if ($separator_4_number==max($separator_1_number,$separator_2_number,$separator_3_number,$separator_4_number,$separator_5_number)){$actual_separation_character=$separator_4;}
else if ($separator_5_number==max($separator_1_number,$separator_2_number,$separator_3_number,$separator_4_number,$separator_5_number)){$actual_separation_character=$separator_5;}
else {$actual_separation_character=";";}
/*
if the number of columns more than what you expect, do something ...
*/
if ($expected_separation_character_number>0){
if ($separator_1_number==0 and $separator_2_number==0 and $separator_3_number==0 and $separator_4_number==0 and $separator_5_number==0){/* do something ! more columns than expected ! */}
}
If you have a very large file, for example several GB, take the first few lines with head, put them in a temporary file, and open the temporary file in vi:
head test.txt > te1
vi te1
The easiest way I know to answer this is to open the file in a plain text editor, or in TextMate.
I'm trying to read data from a .csv file to output it on a webpage as text.
It's the first time I'm doing this and I've run into a nasty little problem.
My .csv file (which gets opened by Excel by default) has multiple rows, and I read the entire thing as one long string,
like this:
$contents = file_get_contents("files/data.csv");
In this example file I made, there are 2 lines.
Paul    Blueberryroad 85    us    Flashlight,Bag    November 20, 2008, 4:39 pm
Hellen    Blueberryroad 85    us    lens13mm,Flashlight,Bag,ExtraBatteries    November 20, 2008, 16:41:32
But the string read by PHP is this:
Paul;Blueberryroad 85;us;Flashlight,Bag;November 20, 2008, 4:39 pmHellen;Blueberryroad 85;us;lens13mm,Flashlight,Bag,ExtraBatteries;November 20, 2008, 16:41:32
I'm splitting this with:
list($name[], $street[], $country[], $accessories[], $orderdate[]) = split(";",$contents);
What I want is for $name[] to contain "Paul" and "Hellen" as its contents. And the other arrays to receive the values of their respective columns.
Instead I get only Paul and the content of $orderdate[] is
November 20, 2008, 4:39 pmHellen
So all the rows are concatenated. Can someone show me how I can achieve what I need?
EDIT: solution found, just one weird thing remaining:
I've solved it now by using this piece of code:
$fo = fopen("files/users.csv", "rb+");
while(!feof($fo)) {
$contents[] = fgetcsv($fo,0,';');
}
fclose($fo);
For some reason, although my CSV file only has 2 rows, it returns 2 arrays and 1 boolean. The first 2 are my data arrays and the boolean is 0.
You are better off using fgetcsv() which is aware of CSV file structure and has designated options for handling CSV files. Alternatively, you can use str_getcsv() on the contents of the file instead.
The file() function reads a file in an array, every line is an entry of the array.
So you can do something like:
$rows = array();
$name = array();
$street = array();
$country = array();
$rows = file("file.csv");
foreach($rows as $r) {
$data = explode(";", $r);
$name[] = $data[0];
$street[] = $data[1];
$country[] = $data[2];
}
The remark about fgetcsv is correct.
I will still answer your question, for educational purposes. First, I don't understand the difference between your data (with commas) and the "string read by PHP" (it substitutes some spaces with semicolons, but not all?).
PS: I looked at the source code of your message; it looks like an odd mix of TSV (tabs) and CSV (commas).
Besides, if you want to go this way, you first need to split the file into lines, then the lines into fields.
The best way is of course fgetcsv() as pointed out.
$f = fopen ('test.csv', 'r');
while (false !== $data = fgetcsv($f, 0, ';'))
$arr[] = $data;
fclose($f);
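The stray boolean reported in the question most likely comes from the while(!feof(...)) pattern: feof() only becomes true after a read has already run into the end of the file, so the last fgetcsv() call returns false and that value is stored. The sketch below contrasts the two loops; the temporary file and its contents are stand-ins invented for the demonstration.

```php
<?php
// Hypothetical sample file written to a temp location for the demonstration.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "Paul;Blueberryroad 85;us\nHellen;Blueberryroad 85;us\n");

// Pattern from the question: the final fgetcsv() call can return false,
// and that false is appended to the result.
$fo = fopen($path, 'rb');
$withFeof = [];
while (!feof($fo)) {
    $withFeof[] = fgetcsv($fo, 0, ';', '"', '\\');
}
fclose($fo);

// Testing the return value instead stops before anything bogus is stored.
$fo = fopen($path, 'rb');
$rows = [];
while (($data = fgetcsv($fo, 0, ';', '"', '\\')) !== false) {
    $rows[] = $data;
}
fclose($fo);
unlink($path);

// If the feof() loop must stay, non-array entries can be filtered out afterwards.
$cleaned = array_values(array_filter($withFeof, 'is_array'));
```

Either way, only the two data rows remain.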
But if you have the contents in a variable and want to split it, and str_getcsv is unavailable you can use this:
function str_split_csv($text, $seperator = ';') {
$regex = '#' . preg_quote($seperator) . '|\v#';
preg_match('|^.*$|m', $text, $firstline);
$chunks = substr_count($firstline[0], $seperator) + 1;
$split = array_chunk(preg_split($regex, $text), $chunks);
$c = count($split) - 1;
if (isset($split[$c]) && ((count($split[$c]) < $chunks) || (($chunks == 1) && ($split[$c][0] == ''))))
unset($split[$c]);
return $split;
}