I have a CSV file where the data is in landscape (transposed) orientation, e.g.:
name, test name
age, 20
gender,Male
The first column contains the headers and the second column the data. I tried using Laravel's maatwebsite/Excel package, but after reading the file, the first row (name, test name) is taken as the headers.
Is there any method to read this type of CSV file in Laravel using maatwebsite/Excel?
You can use this function:
public function readCSV($csvFile, $array)
{
    $line_of_text = [];
    $file_handle = fopen($csvFile, 'r');
    // fgetcsv() returns false at EOF, so checking its return value avoids
    // appending a trailing false entry (which a feof() loop would do)
    while (($line = fgetcsv($file_handle, 0, $array['delimiter'])) !== false) {
        $line_of_text[] = $line;
    }
    fclose($file_handle);
    return $line_of_text;
}
And call it like this:
$csvFileName = "test.csv";
$csvFile = public_path('csv/' . $csvFileName);
$rows = $this->readCSV($csvFile, array('delimiter' => ','));
You don't need an entire library. PHP has a built-in function for this: str_getcsv (http://php.net/manual/en/function.str-getcsv.php).
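For example, a quick illustration of what it does:
// str_getcsv() parses a single CSV line from a string into an array of fields
$fields = str_getcsv('age, 20');
// $fields === ['age', ' 20'] — note the leading space; trim if needed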
It doesn't matter for such small CSVs, but try to demand that whoever gives you these CSVs provides them properly formatted rather than transposed, or use an online tool to transpose them yourself. Here's a solution, but note that it stores the entire array in memory, even though the package in the first example is specifically built to avoid that.
With composer
You may use Spatie's simple-excel like so:
use Spatie\SimpleExcel\SimpleExcelReader;

$csv = __DIR__ . '/data.csv';
$data = [];

SimpleExcelReader::create($csv)
    // ->useDelimiter(';') // Optional
    ->noHeaderRow() // Optional
    ->getRows()
    ->each(function (array $row) use (&$data) {
        // $row[0] is the header (first column); the remaining cells are values
        $length = count($row);
        for ($i = 1; $i < $length; $i++) {
            $data[$i - 1] ??= [];
            $data[$i - 1][$row[0]] = $row[$i];
        }
    });
I also opened an issue for your use case (it was closed because the proposed solution was not memory-efficient).
Without composer
As "online Thomas" said, there's a native PHP function for that, and in general I find it easiest to use like so:
$csv = __DIR__ . '/data.csv';
$data = array_map('str_getcsv', file($csv));
Caveat: this does not produce the desired result if fields contain line breaks.
Use a closure in your case, or if you need a delimiter other than ',', etc.:
$csv = __DIR__ . '/data.csv';
$data = [];

// the use (&$data) belongs on the closure, not on the for loop
array_map(function ($line) use (&$data) {
    $row = str_getcsv($line);
    $length = count($row);
    for ($i = 1; $i < $length; $i++) {
        $data[$i - 1] ??= [];
        $data[$i - 1][$row[0]] = $row[$i];
    }
}, file($csv, FILE_IGNORE_NEW_LINES)); // strip trailing newlines before parsing
For 2022 readers: what I am using right now is https://github.com/spatie/simple-excel.
It works like a charm with virtually no memory usage, thanks to LazyCollections.
Related
I have a large CSV file. Because of memory concerns (with MySQL), I would like to only read a part of it at a time, if possible.
The fact that it's CSV might not be important. The important thing is that it needs to be cut on a line break.
Example content:
Some CSV content
that will break
on a line break
This could be my path:
$path = 'path/to/my.csv';
In my mind, a solution could look like this:
$csv_content1 = read_csv_file($path, 0, 100);
$csv_content2 = read_csv_file($path, 101, 200);
It reads the raw content on line 0-100.
It reads the raw content on line 101-200.
Information
No parsing is needed (just split the content into lines).
The file exists on my own server.
Don't read the whole file into the memory.
I want to be able to do the second read at another time, not in the same run. I'm fine with saving temporary values such as pointers if needed.
I've been trying to read other topics but did not find an exact match to this problem.
Maybe some of these could somehow work?
SplFileObject
fgetcsv
Maybe I can't use $csv_content2 before I've used $csv_content1, because I need to save some kind of pointer? In that case it's fine; I will read them in order anyway.
After much thinking and reading I finally think I found the solution to my problem. Correct me if this is a bad solution because of memory usage or from other perspectives.
First run
$buffer = part($path_to_file, 0, 100);
Next run
$buffer = part($path_to_file, $buffer['pointer'], 100);
Function
function part($path, $offset, $rows) {
    $buffer = array();
    $buffer['content'] = '';
    $buffer['pointer'] = $offset;

    $handle = fopen($path, "r");
    if ($handle) {
        fseek($handle, $offset);
        for ($i = 0; $i < $rows && !feof($handle); $i++) {
            $buffer['content'] .= fgets($handle);
        }
        // fseek() works in bytes, so track the byte length (strlen, not mb_strlen)
        // and remember to add the offset we started from
        $buffer['pointer'] = $offset + strlen($buffer['content']);
    }
    fclose($handle);

    return $buffer;
}
In my more object-oriented environment it looks more like this:
function part() {
    $handle = fopen($this->path, "r");
    if ($handle) {
        fseek($handle, $this->pointer);
        for ($i = 0; $i < 2; $i++) {
            if ($this->pointer != $this->filesize) {
                $line = fgets($handle);
                $this->content .= $line;
                // advance the pointer only by the bytes just read (strlen, not mb_strlen),
                // otherwise repeated calls would count the accumulated content again
                $this->pointer += strlen($line);
            }
        }
    }
    fclose($handle);
}
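If tracking byte offsets feels fragile, a line-based alternative using SplFileObject (which the question itself lists as a candidate) could look like the sketch below. The readLines() helper name is just illustrative; persisting $startLine between runs is up to you.
// Sketch: return $rows lines starting at $startLine (zero-based).
// $startLine would be persisted between requests (session, DB, etc.).
function readLines(string $path, int $startLine, int $rows): array
{
    $file = new SplFileObject($path, 'r');
    $file->seek($startLine); // SeekableIterator::seek() is line-based, unlike fseek()

    $lines = [];
    for ($i = 0; $i < $rows && !$file->eof(); $i++) {
        $lines[] = $file->current();
        $file->next();
    }

    return $lines;
}

// First run
$buffer = readLines($path_to_file, 0, 100);
// Next run
$buffer = readLines($path_to_file, 100, 100);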
I am trying to write a file to a database 500 lines at a time, so I don't run low on memory from dealing with very large arrays. For some reason, I am not getting any errors, but I am seeing only a very, very small fraction of the file entered into my table.
$ln = intval(shell_exec("wc -l $text_filename_with_path"));
echo "FILENAME WITH PATH: " . $text_filename_with_path ."\n\n";
echo "ARRAY LENGTH: " . $ln . "\n\n";
//pointer is initialized at zero
$fp = fopen($text_filename_with_path, "r");
$offset = 0;
$c = 0;
while($offset < $ln){
$row_limit = 500;
//get a 500 row section of the file
$chunk = fgets($fp, $row_limit);
//prepare for `pg_copy_from` by exploding to array
$chunk = explode("\n", $chunk);
//each record from the file being read is just one element
//prepare for three column DB table by adding columns (one
//unique PK built from UNIX time concat with counter, the
//other from a non-unique batch ID)
array_walk($chunk,
function (&$item, $key) use ($datetime, $c) {
$item = time() . $c . $key . "\t" . $datetime . "\t" . $item;
}
);
//increase offset to in order to move pointer forward
$offset += $row_limit;
//set pointer ahead to new position
fseek($fp, $offset);
echo "CURRENT POINTER: " . ftell($fp) . "\n"; //prints out 500, 1000, 1500 as expected
//insert array directly into DB from array
pg_copy_from($con, "ops.log_cache_test", $chunk, "\t", "\\NULL");
//increment to keep PK column unique
$c++;
}
As I say, I am getting only a fraction of the contents of the file, and lots of the data looks a bit messed up; for example, about half of the entries are blank in the part of each array element that gets assigned by $item within my array_walk() callback. Further, it seems that exploding on \n is not working properly, as lines seem to be exploded at non-uniform positions (i.e., log records don't look symmetrical). Have I just made a total mess of this?
You are not using fgets properly (its 2nd parameter is the maximum number of bytes to read, not the number of rows).
There are two ways I can think of at the moment to solve it:
1. A loop getting one line at a time, until you've reached your row limit.
The code should look something like this (not tested; assuming the end-of-line character is "\n" with no "\r"):
<?php
/** Your code and initialization here */
while (!feof($file)) {
    $counter = 0;
    $buffer = array();
    // check the counter before reading, so a line isn't read and then discarded
    while ($counter < $row_limit && ($line = fgets($file)) !== false) {
        $line = str_replace("\n", "", $line); // fgets keeps the newline char at the end of the line
        $buffer[] = $line;
        $counter++;
    }
    insertRows($buffer);
}

function insertRows($rows) {
    /** your code here */
}
?>
2. Assuming the file isn't too big: using file_get_contents().
The code should look something like this (same assumptions):
<?php
/** Your code and initialization here */
$data = file_get_contents($filename);
if ($data === FALSE) {
    echo "Could not get content for file $filename\n";
    exit;
}
$data = explode("\n", $data);
for ($offset = 0; $offset < count($data); $offset += $row_limit) {
    insertRows(array_slice($data, $offset, $row_limit));
}

function insertRows($rows) {
    /** your code here */
}
I didn't test it, so I hope it's ok.
I'm using the following to convert CSV to JSON (https://gist.github.com/robflaherty/1185299). I need to modify it so that instead of using an exact file URL path, it pulls the newest file URL in the directory as its "source" in $feed.
Any help would be great! I've tried using the code found here: PHP: Get the Latest File Addition in a Directory, but can't seem to figure out how to modify it so that it would work.
<?php
header('Content-type: application/json');
// Set your CSV feed
$feed = 'http://myurl.com/test.csv';
// Arrays we'll use later
$keys = array();
$newArray = array();
// Function to convert CSV into associative array
function csvToArray($file, $delimiter) {
    $arr = array(); // avoid returning an undefined variable if fopen() fails
    if (($handle = fopen($file, 'r')) !== FALSE) {
        $i = 0;
        while (($lineArray = fgetcsv($handle, 4000, $delimiter, '"')) !== FALSE) {
            for ($j = 0; $j < count($lineArray); $j++) {
                $arr[$i][$j] = $lineArray[$j];
            }
            $i++;
        }
        fclose($handle);
    }
    return $arr;
}
// Do it
$data = csvToArray($feed, ',');
// Set number of elements (minus 1 because we shift off the first row)
$count = count($data) - 1;
//Use first row for names
$labels = array_shift($data);
foreach ($labels as $label) {
$keys[] = $label;
}
// Add Ids, just in case we want them later
$keys[] = 'id';
for ($i = 0; $i < $count; $i++) {
$data[$i][] = $i;
}
// Bring it all together
for ($j = 0; $j < $count; $j++) {
$d = array_combine($keys, $data[$j]);
$newArray[$j] = $d;
}
// Print it out as JSON
echo json_encode($newArray);
?>
It's a difficult question to answer because there isn't enough detail.
Here are some questions that need to be answered.
1). Are you creating the csv files that are being read? If you are, just make sure that the file you want to read is called "latest.csv", and when you go to create "latest.csv", check for an existing "latest.csv" and rename/archive it first. Your directory then contains archives, but the latest one always has the same name.
2). If you are not creating the csv files, then you might want to ask the provider of the csv files if there's a way for you to identify the latest one; surely, if they are providing them, they expect to be providing everyone the latest feed and should have a mechanism for doing that.
3). If you don't know the provider and want to take a guess, look at how the files are named and try to predict the latest one. E.g., if they appear to include a month and year, do a file_exists() (if you can) on the predicted next latest file (see the sketch after this list). Again, just a possibility.
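For what it's worth, a rough sketch of option 3, assuming a hypothetical naming pattern like feed-YYYY-MM.csv in a locally accessible directory (file_exists() won't work on a plain HTTP URL):
// Hypothetical pattern "feed-YYYY-MM.csv"; adjust to whatever the provider actually uses.
$dir = '/path/to/csv';
$candidate = sprintf('%s/feed-%s.csv', $dir, date('Y-m'));

if (!file_exists($candidate)) {
    // fall back to last month's file if this month's hasn't appeared yet
    $candidate = sprintf('%s/feed-%s.csv', $dir, date('Y-m', strtotime('first day of last month')));
}
$feed = $candidate;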
Based on your comments, if the files reside on the same server or are accessible on a filesystem that supports the file functions, then:
array_multisort(array_map('filemtime', $files=glob('/path/to/*.csv')), SORT_DESC, $files);
$newest = $files[0];
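For example, a sketch wiring this into the script above, assuming the CSV files sit in a local directory (the csv/ path here is just an illustration):
$files = glob(__DIR__ . '/csv/*.csv');                  // hypothetical local directory
array_multisort(array_map('filemtime', $files), SORT_DESC, $files);

$feed = $files[0] ?? null;                              // newest file by modification time
if ($feed === null) {
    exit('No CSV files found');
}

$data = csvToArray($feed, ',');                         // hand it to the existing function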
For remote access you could look at something like this: How can I download the most recent file on FTP with PHP?
I seem to be in a catch-22 with a small app I'm developing in PHP on Google App Engine using Quercus;
I have a remote csv-file which I can download & store in a string
To parse that string I'd ideally use str_getcsv, but Quercus doesn't have that function yet
Quercus does seem to know fgetcsv, but that function expects a file handle which I don't have (and I can't make a new one as GAE doesn't allow files to be created)
Anyone got an idea of how to solve this without having to dismiss the built-in PHP csv-parser functions and write my own parser instead?
I think the simplest solution really is to write your own parser. It's a piece of cake anyway and will get you to learn more regex. It makes no sense that there is no CSV string-to-array parser in PHP, so it's totally justified to write your own. Just make sure it's not too slow ;)
You might be able to create a new stream wrapper using stream_wrapper_register.
Here's an example from the manual which reads global variables: http://www.php.net/manual/en/stream.streamwrapper.example-1.php
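For reference, here is a trimmed-down, read-only sketch along the lines of that manual example (the manual's full VariableStream also implements writing and seeking; only reading is needed for fgetcsv below, and $csvStr has to be a global variable for this particular wrapper to see it):
// Minimal read-only wrapper: exposes a global variable, named by the "host"
// part of the URL, as a stream (modeled on the manual's VariableStream example).
class VariableStream
{
    private $position = 0;
    private $varname;

    public function stream_open($path, $mode, $options, &$opened_path)
    {
        $this->varname = parse_url($path, PHP_URL_HOST);
        $this->position = 0;
        return isset($GLOBALS[$this->varname]);
    }

    public function stream_read($count)
    {
        $chunk = substr($GLOBALS[$this->varname], $this->position, $count);
        $this->position += strlen($chunk);
        return $chunk === false ? '' : $chunk;
    }

    public function stream_eof()
    {
        return $this->position >= strlen($GLOBALS[$this->varname]);
    }

    public function stream_stat()
    {
        return [];
    }
}

stream_wrapper_register('var', 'VariableStream');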
You could then use it like a normal file handle:
$csvStr = '...';
$fp = fopen('var://csvStr', 'r+');
while ($row = fgetcsv($fp)) {
// ...
}
fclose($fp);
This shows a simple manual parser I wrote, with example input covering qualified fields, non-qualified fields, and the escape feature. It can be used for both the header and data rows, and includes an assoc-array function to turn your data into a key-value style array.
//example data
$fields = strparser('"first","second","third","fourth","fifth","sixth","seventh"');
print_r(makeAssocArray($fields, strparser('"asdf","bla\"1","bl,ah2","bl,ah\"3",123,34.234,"k;jsdfj ;alsjf;"')));
//do something like this
$fields = strparser(<csvfirstline>);
foreach ($lines as $line)
$data = makeAssocArray($fields, strparser($line));
function strparser($string, $div = ",", $qual = "\"", $esc = "\\") {
    $buff = "";
    $data = array();
    $isQual = false; // the result will be a qualifier
    $inQual = false; // currently parsing inside a qualifier

    // iterate through the string byte by byte
    for ($i = 0; $i < strlen($string); $i++) {
        switch ($string[$i]) {
            case $esc:
                // add the next byte to the buffer and skip it
                $buff .= $string[$i + 1];
                $i++;
                break;
            case $qual:
                // see if this is an escaped qualifier
                if (!$inQual) {
                    $isQual = true;
                    $inQual = true;
                    break;
                } else {
                    $inQual = false; // done parsing the qualifier
                    break;
                }
            case $div:
                if (!$inQual) {
                    $data[] = $buff; // add value to data
                    $buff = "";      // reset buffer
                    break;
                }
                // inside a qualifier: fall through and treat the divider as data
            default:
                $buff .= $string[$i];
        }
    }
    // get the last item, as it doesn't have a divider after it
    $data[] = $buff;
    return $data;
}

function makeAssocArray($fields, $data) {
    $array = array();
    foreach ($fields as $key => $field) {
        $array[$field] = $data[$key];
    }
    return $array;
}
If it can be dirty and quick, I would just use exec (http://php.net/manual/en/function.exec.php) to pass it in, and use sed and awk (http://shop.oreilly.com/product/9781565922259.do) to parse it. I know you wanted to use the PHP parser; I've tried before and failed, simply because it's not vocal about its errors.
Hope this helps.
Good luck.
You might be able to use fopen with php://temp or php://memory (php.net) to get it to work. What you would do is open either php://temp or php://memory, write to it, then rewind it (php.net), and then pass it to fgetcsv. I didn't test this, but it might work.
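A minimal sketch of that idea (untested on Quercus, as noted, but these are all standard PHP stream functions):
$csvStr = "name,age\nAlice,30\nBob,25\n"; // the CSV you downloaded into a string

$fp = fopen('php://temp', 'r+'); // or 'php://memory'
fwrite($fp, $csvStr);
rewind($fp);

while (($row = fgetcsv($fp)) !== false) {
    // $row is an array of fields for one CSV line
    print_r($row);
}
fclose($fp);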
The closest I've seen in the PHP docs is fread() with a given length, but that doesn't specify which line to start from. Any other suggestions?
Yes, you can do that easily with SplFileObject::seek
$file = new SplFileObject('filename.txt');
$file->seek(1000);
for($i = 0; !$file->eof() && $i < 1000; $i++) {
echo $file->current();
$file->next();
}
This is a method from the SeekableIterator interface and not to be confused with fseek.
And because SplFileObject is iterable you can do it even easier with a LimitIterator:
$file = new SplFileObject('longFile.txt');
$fileIterator = new LimitIterator($file, 1000, 2000);
foreach($fileIterator as $line) {
echo $line, PHP_EOL;
}
Again, this is zero-based, so it's line 1001 to 2001.
You are not going to be able to read starting from line X, because lines can be of arbitrary length. So you will have to read from the start, counting the number of lines read, to get to line X. For example:
<?php
$f = fopen('sample.txt', 'r');
$lineNo = 0;
$startLine = 3;
$endLine = 6;
while ($line = fgets($f)) {
$lineNo++;
if ($lineNo >= $startLine) {
echo $line;
}
if ($lineNo == $endLine) {
break;
}
}
fclose($f);
Unfortunately, in order to be able to read from line x to line y, you'd need to be able to detect line breaks... and you'd have to scan through the whole file. However, assuming you're not asking about this for performance reasons, you can get lines x to y with the following:
$x = 10; // inclusive start line (zero-based)
$y = 20; // inclusive end line (zero-based)
$lines = file('myfile.txt');
// array_slice() takes a length as its third argument, not an end index
$my_important_lines = array_slice($lines, $x, $y - $x + 1);
See: array_slice
Well, you can't use fseek to find the appropriate position, because it works with a byte offset, not a line number.
I think it's not possible without some sort of cache or going through the lines one after the other, as in the fgets-based answer above.
If you're looking for lines, then you can't use fread, because that relies on a byte offset, not the number of line breaks. You actually have to read the file to find the line breaks, so a different function is more appropriate: fgets will read the file line by line. Throw that in a loop and capture only the lines you want.
I was afraid of that... I guess it's plan B then :S
For each AJAX request I'm going to:
Read into a string the number of lines I'm going to return to the client.
Copy the rest of the file into a temp file.
Return string to the client.
It's lame, and it will probably be pretty slow with 10,000+ line files, but I guess it's better than reading the same data over and over again; at least the temp file gets shorter with every request... No? (A rough sketch of this idea follows.)
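For what it's worth, a minimal sketch of that "plan B", assuming $path and $linesPerRequest are defined and nothing else writes to the file between requests:
$src = fopen($path, 'r');
$out = '';
for ($i = 0; $i < $linesPerRequest && ($line = fgets($src)) !== false; $i++) {
    $out .= $line; // the lines returned to the client
}

// copy whatever is left into a temp file, then swap it in for the original
$tmpPath = $path . '.tmp';
$tmp = fopen($tmpPath, 'w');
stream_copy_to_stream($src, $tmp); // remainder of the file, streamed rather than loaded
fclose($src);
fclose($tmp);
rename($tmpPath, $path);

echo $out;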