Best way to read a large file in PHP [duplicate]

This question already has answers here:
Reading very large files in PHP
(8 answers)
Closed 1 year ago.
I have a file with around 100 records for now.
The file has one user in JSON format per line.
E.g.
{"user_id" : 1,"user_name": "Alex"}
{"user_id" : 2,"user_name": "Bob"}
{"user_id" : 3,"user_name": "Mark"}
Note: This is just a very simple example; I have more complex JSON values per line in the file.
I am reading the file line by line and storing each line in an array, which will obviously become big if there are a lot of items in the file.
public function read(string $file) : array
{
    // Open the file in "read only" mode.
    $fileHandle = fopen($file, "r");
    // If we failed to get a file handle, throw an Exception.
    if ($fileHandle === false) {
        throw new Exception('Could not get file handle for: ' . $file);
    }
    $lines = [];
    // While we haven't reached the end of the file, read the current line in.
    while (($line = fgets($fileHandle)) !== false) {
        $lines[] = json_decode($line);
    }
    // Finally, close the file handle.
    fclose($fileHandle);
    return $lines;
}
Next, I'll process this array, take only the parameters I need (some parameters might be further processed), and then export the result to CSV.
public function processInput($users) {
    $data = [];
    foreach ($users as $key => $user)
    {
        $data[$key]['user_id'] = $user->user_id;
        $data[$key]['user_name'] = strtoupper($user->user_name);
    }
    // Call export to CSV with $data.
}
What would be the best way to read the file in case it is big?
I know file_get_contents is not an optimized approach and that fgets is better.
Is there a much better way, considering that I need to read a big file and then write it out to CSV?

You need to modify your reader to make it more "lazy" in some sense. For example, consider this:
public function read(string $file, callable $rowProcessor) : void
{
    // Open the file in "read only" mode.
    $fileHandle = fopen($file, "r");
    // If we failed to get a file handle, throw an Exception.
    if ($fileHandle === false) {
        throw new Exception('Could not get file handle for: ' . $file);
    }
    // While we haven't reached the end of the file, read the current line in
    // and hand it straight to the row processor instead of collecting it.
    while (($line = fgets($fileHandle)) !== false) {
        $rowProcessor(json_decode($line));
    }
    // Finally, close the file handle.
    fclose($fileHandle);
}
Then you will need different code that works with this:
function processAndWriteJson($filename) { // Names are hard
    $writer = fopen('output.csv', 'w');
    read($filename, function ($row) use ($writer) {
        // Do the processing of the single row here,
        // e.g. the same mapping as processInput() in the question.
        $processedRow = [$row->user_id, strtoupper($row->user_name)];
        fputcsv($writer, $processedRow);
    });
    fclose($writer);
}
If you want to get the same result as before with your read method, you can do:
$lines = [];
read($filename, function ($row) use (&$lines) {
    $lines[] = $row;
});
This does provide some more flexibility. Unfortunately, it means you can only process one line at a time, and scanning up and down the file is harder.
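If you prefer iterating to passing a callback, a generator gives the same one-row-at-a-time behaviour. A minimal sketch, assuming the same one-JSON-record-per-line input and an output.csv target as above (readRows is just an illustrative name):
function readRows(string $file) : Generator
{
    $fileHandle = fopen($file, "r");
    if ($fileHandle === false) {
        throw new Exception('Could not get file handle for: ' . $file);
    }
    while (($line = fgets($fileHandle)) !== false) {
        // Hand back one decoded row at a time; nothing accumulates in memory.
        yield json_decode($line);
    }
    fclose($fileHandle);
}

$writer = fopen('output.csv', 'w');
foreach (readRows($filename) as $row) {
    // Same per-row processing as processInput(), streamed straight to CSV.
    fputcsv($writer, [$row->user_id, strtoupper($row->user_name)]);
}
fclose($writer);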

Related

Fatal Error - Out of Memory while reading a *.dat file in php [duplicate]

I am reading a file containing around 50k lines using the file() function in PHP. However, it gives an out-of-memory error since the contents of the file are stored in memory as an array. Is there any other way?
Also, the lengths of the lines stored are variable.
Here's the code. Also, the file is 700 kB, not MB.
private static function readScoreFile($scoreFile)
{
    $file = file($scoreFile);
    $relations = array();
    for ($i = 1; $i < count($file); $i++)
    {
        $relation = explode("\t", trim($file[$i]));
        $relation = array(
            'pwId_1' => $relation[0],
            'pwId_2' => $relation[1],
            'score'  => $relation[2],
        );
        if ($relation['score'] > 0)
        {
            $relations[] = $relation;
        }
    }
    unset($file);
    return $relations;
}
Use fopen, fread and fclose to read a file sequentially:
$handle = fopen($filename, 'r');
if ($handle) {
    while (!feof($handle)) {
        echo fread($handle, 8192);
    }
    fclose($handle);
}
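Applied to the question's readScoreFile(), the same sequential idea could look roughly like this (only a sketch; it keeps the original field layout and skips the header line just as the original loop starting at $i = 1 did):
private static function readScoreFile($scoreFile)
{
    $handle = fopen($scoreFile, 'r');
    if ($handle === false) {
        return array();
    }
    $relations = array();
    $lineNumber = 0;
    while (($line = fgets($handle)) !== false) {
        // Skip the header line.
        if ($lineNumber++ === 0) {
            continue;
        }
        $parts = explode("\t", trim($line));
        // Only keep rows with a positive score, as the original code did.
        if ($parts[2] > 0) {
            $relations[] = array(
                'pwId_1' => $parts[0],
                'pwId_2' => $parts[1],
                'score'  => $parts[2],
            );
        }
    }
    fclose($handle);
    return $relations;
}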
EDIT after the question was updated and the comments on fabjoa's answer:
There is definitely something fishy if a 700 kB file eats up 140 MB of memory with the code you gave (you could unset $relation at the end of each iteration, though). Consider using a debugger to step through it to see what happens. You might also want to consider rewriting the code to use SplFileObject's CSV functions (or their procedural cousins).
SplFileObject::setCsvControl example
$file = new SplFileObject("data.csv");
$file->setFlags(SplFileObject::READ_CSV);
$file->setCsvControl('|');
foreach ($file as $row) {
list ($fruit, $quantity) = $row;
// Do something with values
}
For an OOP approach to iterate over the file, try SplFileObject:
SplFileObject::fgets example
$file = new SplFileObject("file.txt");
while (!$file->eof()) {
echo $file->fgets();
}
SplFileObject::next example
// Read through the file line by line
$file = new SplFileObject("misc.txt");
while (!$file->eof()) {
    echo $file->current();
    $file->next();
}
or even
foreach (new SplFileObject("misc.txt") as $line) {
    echo $line;
}
Pretty much related (if not duplicate):
How to save memory when reading a file in Php?
If you don't know the maximum line length and you are not comfortable using a magic number for it, then you'll need to do an initial scan of the file to determine the maximum line length.
Other than that the following code should help you out:
// $length is a large number or calculated from an initial file scan
while (!feof($handle)) {
    $buffer = fgets($handle, $length);
    echo $buffer;
}
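If you do want to calculate it, a minimal sketch of that initial scan (assuming $handle points to a seekable file opened with fopen()) could be:
// First pass: find the longest line in the file.
$length = 0;
while (($line = fgets($handle)) !== false) {
    $length = max($length, strlen($line));
}
// fgets($handle, $length) reads at most $length - 1 bytes, so add one.
$length++;
// Rewind so the real pass starts from the beginning again.
rewind($handle);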
Old question, but since I haven't seen anyone mentioning it: PHP generators are a great way to reduce memory consumption.
For example:
function read($fileName)
{
    $fileHandler = fopen($fileName, 'rb');
    while (($line = fgets($fileHandler)) !== false) {
        yield rtrim($line, "\r\n");
    }
    fclose($fileHandler);
}
foreach (read(__DIR__ . '/filenameHere') as $line) {
    echo $line;
}
Allocate more memory during the operation, with something like ini_set('memory_limit', '16M');. Don't forget to go back to the initial memory limit once the operation is done.
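A minimal sketch of that save-and-restore pattern (the '256M' value is only an example):
// Remember the current limit, raise it for the heavy work, then restore it.
$originalLimit = ini_get('memory_limit');
ini_set('memory_limit', '256M');

// ... do the memory-hungry processing here ...

ini_set('memory_limit', $originalLimit);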

Parse tabulation-based data file php

I have several files to parse (with PHP) in order to insert their respective content in different database tables.
First point: the client gave me 6 files; 5 are CSV with values separated by commas. The last one does not come from the same database and its content is tabulation-based.
I built a FileParser that uses SplFileObject to execute a method on each line of the file content (basically, create an Entity from each dataset and persist it to the database, with Symfony2 and Doctrine2).
But I cannot manage to parse the tabulation-based text file with SplFileObject; it does not split the content into lines as I expect it to...
// In my controller context
$parser = new MyAmazingFileParser();
$parser->parse($filename, $delimitor, function ($data) use ($em) {
    $e = new Entity();
    $e->setSomething($data[0]);
    // [...]
    $em->persist($e);
});
// In my parser
public function parse($filename, $delimitor = ',', $run = null) {
    if (is_callable($run)) {
        $handle = new SplFileObject($filename);
        $infos = new SplFileInfo($filename);
        if ($infos->getExtension() === 'csv') {
            // Everything is going well here
            $handle->setCsvControl(',');
            $handle->setFlags(SplFileObject::DROP_NEW_LINE | SplFileObject::READ_AHEAD | SplFileObject::SKIP_EMPTY | SplFileObject::READ_CSV);
            foreach (new LimitIterator($handle, 1) as $data) {
                $result = $run($data);
            }
        } else {
            // Why does the Iterator way not work?
            $handle->setCsvControl("\t");
            // I have tried all the possible flag combinations, without success...
            foreach (new LimitIterator($handle, 1) as $data) {
                // It always only gets the first line...
                $result = $run($data);
            }
            // And the old memory-killing dirty way works?
            $fd = fopen($filename, 'r');
            $contents = fread($fd, filesize($filename));
            foreach (explode("\t", $contents) as $line) {
                // Gets all the lines as I want... but it's dirty and memory-expensive!
                $result = $run($line);
            }
        }
    }
}
It is probably related to the horrible formatting of my client's file, but after a long discussion with them, they really cannot provide another format, for some acceptable reasons (constraints on their side), unfortunately.
The file is currently 49,459 lines long, so I really think memory matters at this step; I have to make the SplFileObject approach work, but I do not know how.
An extract of the file can be found here :
Data-extract-hosted
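For what it's worth, a streaming alternative to the explode()-on-the-whole-file fallback, assuming the tab-separated file does use normal newlines as record separators ($run is the same callable as in the parser above); this is only a sketch, not an explanation of the SplFileObject behaviour:
// Stream the tab-separated file row by row instead of reading it whole.
$fd = fopen($filename, 'r');
if ($fd === false) {
    throw new RuntimeException('Could not open: ' . $filename);
}
$lineNumber = 0;
while (($data = fgetcsv($fd, 0, "\t")) !== false) {
    // Skip the header row, as LimitIterator($handle, 1) did.
    if ($lineNumber++ === 0) {
        continue;
    }
    $result = $run($data);
}
fclose($fd);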

How to edit a particular line of the csv file using PHP?

I have a PHP script that allows users to upload their data. The first line of the CSV file contains the headers (fname, lname, age, address, email).
My plan is: after the users upload their CSV, my script will run a function to check the spelling of the headers. If there are misspelled headers, my script will correct them. I am using the code below to correct the headers:
if (($file = fopen($csvFile, "r")) !== FALSE) {
    $ctr = 0;
    while (($record = fgetcsv($file, 1024)) !== FALSE) {
        if ($ctr == 0) {
            correctHeader($record);
            # write to new csv.
        } else {
            # write to new csv.
        }
        $ctr++;
    }
}
After correcting, the header values and the succeeding lines are appended to the new CSV file. I think this step could be optimized if I could just edit the first line of the CSV (the header) and skip the "write to new csv" step.
One of the ways I can think of is the following:
1. Use fgets() to get the first line of the file (instead of fgetcsv()).
2. Save the length of the line in bytes.
3. Parse the line with str_getcsv().
4. Correct the headers as needed.
5. Save the headers into a new CSV file.
6. fopen() the original CSV file for reading.
7. fseek() the original CSV file handle to the length of the first line (saved in step 2) + 1.
8. fopen() the new CSV file for writing (appending, actually).
9. fread() the original CSV file in a loop until EOF and fwrite() the chunks into the new CSV file.
10. Fix bugs.
11. Have a pint. :)
Here's the code (minus the loop for reading):
<?php

$from = 'd.csv';
$to = 'd.good.csv';

$old = fopen($from, 'r');
if (!is_resource($old)) {
    die("Failed to read from source file: $from");
}

$headerLine = fgets($old);
$headerLine = fixHeaders($headerLine);

$new = fopen($to, 'w');
if (!is_resource($new)) {
    die("Failed to write to destination file: $to");
}

// Save the fixed header into the new file
fputs($new, $headerLine);

// Read the rest of the old file and save it to the new one.
// The old file is already open and we are at the second line.
// For large files, reading should probably be done in a loop with chunks.
fwrite($new, fread($old, filesize($from)));

// Close files
fclose($old);
fclose($new);

// Just an example
function fixHeaders($line) {
    return strtoupper($line);
}
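For large files, the single fread() above can be swapped for a chunked loop, roughly like this (8192 bytes is an arbitrary chunk size):
// Copy the remainder of the old file to the new one in fixed-size chunks.
while (!feof($old)) {
    $chunk = fread($old, 8192);
    if ($chunk === false || $chunk === '') {
        break;
    }
    fwrite($new, $chunk);
}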

How can I split a CSV file in PHP?

I have a big CSV file. I want to split it into separate files based on the value of one of the fields.
This is what I have done. Using fgetcsv, I convert each line of the CSV into an array, and using in_array, I check whether it contains the search string and display it if so.
I will be getting the comparison string iteratively from another text file to check whether it is contained in the CSV. In this case I have specified it as "Testing".
Below is the code:
if (($handle = fopen("test.csv", "r")) !== FALSE) {
while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
if(in_array("Testing", $data))
{
var_dump($data);
}
}
fclose($handle);
}
This is working, but now I am stuck. How do I write $data into another CSV file? Or is there a better way to do this?
It's actually pretty simple, and if the string just has to appear somewhere on the line, you don't even need fgetcsv. Just:
$srcFile = new SplFileObject('test.csv');
$destFile = new SplFileObject('new.csv', 'w+');
foreach ($srcFile as $line) {
    if (strpos($line, 'Testing') !== FALSE) {
        $destFile->fwrite($line);
    }
}
This will create two file objects: the first one holds the content of your source file, and the second one creates an entirely new file for the lines containing your search string. We then iterate over each line and check whether the search string exists. If so, we write the line to the destination file.
The source file will not be touched this way. If you want one file with the search string and one file without it, just create a third SplFileObject and add an else block to the if, writing the line to that one instead (a sketch follows below). In the end, delete the source CSV file.
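A minimal sketch of that three-file variant (the output file names are just examples):
$srcFile = new SplFileObject('test.csv');
$matchFile = new SplFileObject('with_testing.csv', 'w+');
$restFile = new SplFileObject('without_testing.csv', 'w+');
foreach ($srcFile as $line) {
    if (strpos($line, 'Testing') !== FALSE) {
        $matchFile->fwrite($line);
    } else {
        $restFile->fwrite($line);
    }
}
// Once the split is verified, the original can be removed, e.g. unlink('test.csv').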
You have to do something a bit tricky here. I am providing a basic idea of how to do it; here is the code:
// Open the file
if ($fp = fopen('log.csv', 'r')) {
    $line_number = 0;
    // Read the CSV file line by line
    while ($line = fgetcsv($fp, 0, ';')) {
        if ($line_number++ == 0) {
            continue;
        }
        // Split the first column so the required part can be used as the file name
        $date = explode(' ', $line[0]);
        // Change the column used according to your needs
        $file = $date[0] . '.log';
        file_put_contents(
            // Change the folder name according to your needs
            'monthly/' . $file,
            // Append the row to the target file
            implode(';', $line) . "\n",
            FILE_APPEND
        );
    }
    // Close the file
    fclose($fp);
}
It reads the CSV file line by line, extracts the date part from the first column, and creates a new file (or appends to it if it already exists) named after that date.
Note: the "monthly" folder must be writable.
