PHPExcel reading large CSV is failing

I am attempting to import a CSV using PHPExcel into my application so that I can load the data into a table. When the file exceeds 2 MB, the code fails.
I'm running Laravel on WAMP64. The failing code is:
$objPHPExcel = PHPExcel_IOFactory::load(Input::file('file')->getRealPath());
The error message is:
ErrorException: file_get_contents(C:\wamp\www\imax\public): failed to open stream: Permission denied in C:\wamp\www\imax\vendor\phpoffice\phpexcel\Classes\PHPExcel\Shared\OLERead.php:85
I know it's a size issue because the code completes properly when the file is 2048 KB. If I add one character, pushing it to 2049 KB, it fails. So it's not a permissions issue.
The line that fails in OLERead.php is:
// Get the file identifier
// Don't bother reading the whole file until we know it's a valid OLE file
$this->data = file_get_contents($sFileName, FALSE, NULL, 0, 8);
Wampserver 3.0.6
PHP 7.0.10

It sounds like you need to increase the memory allocated to PHP. You can do that at runtime or by changing your php.ini config file.
To raise the memory limit at runtime:
ini_set('memory_limit','16M'); Feel free to change the 16M to what you need.
To change it permanently:
Open your php.ini file, look for the line upload_max_filesize = 2M, and change the 2M to what you need. (I also believe WAMP lets you edit this by right-clicking the tray icon and choosing one of the options.)
Note: You may want to just search for upload_max_filesize and leave out the = 2M part, as yours may be different.
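Given that the failure threshold sits exactly at PHP's default 2M upload cap, the php.ini setting most likely responsible is the upload limit. A minimal sketch (16M is an arbitrary example; post_max_size also caps uploads, so keep it at least as large, and restart WAMP after editing):
; php.ini -- raise the upload cap and the matching POST body cap
upload_max_filesize = 16M
post_max_size = 16M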

If the code is using OLERead, then either you're not reading a CSV file at all but a BIFF-format .xls file, or you're letting PHPExcel try to identify the file type itself. If you know it's a CSV file, instantiate the CSV Reader manually rather than letting PHPExcel guess.
// Tell PHPExcel that you want to load a CSV file
$objReader = new PHPExcel_Reader_CSV();
// Load the $inputFileName to a PHPExcel Object
$objPHPExcel = $objReader->load(Input::file('file')->getRealPath());
However, if you know that you're working with CSV files, it's more efficient to use PHP's native CSV reading function, fgetcsv(), as sketched below.
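A minimal sketch of that approach (the Input::file() call is taken from the question; error handling is kept deliberately simple):
$handle = fopen(Input::file('file')->getRealPath(), 'r');
if ($handle === false) {
    die('Could not open the uploaded CSV file');
}
while (($row = fgetcsv($handle)) !== false) {
    // $row is a numerically indexed array of the line's fields;
    // insert it into your table here.
}
fclose($handle);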

Related

High memory usage when determining mime type using finfo->buffer(), but only with certain text file types?

Is this a bug in how the finfo->buffer method works? I noticed that PHP allocates a significant amount of memory when determining the MIME type using finfo, but only with certain file types like CSV and TXT files. Quick sample code below:
// Load a 60 MB CSV file
$contents = file_get_contents('file.csv');
$finfo = new finfo(FILEINFO_MIME_TYPE);
// Outputs text/plain. Peak memory usage after this is 400+ MB
echo $finfo->buffer($contents);
If I try a different file type, though, like a large zipped archive, memory usage barely changes after the file_get_contents call.
This was tested on PHP 7.2.28.
It may be a PHP or libmagic bug, as mentioned here:
https://bugs.php.net/bug.php?id=79263
https://bugs.php.net/bug.php?id=78987
I saw the issue on a machine with:
PHP 7.3.31-1~deb10u1
libmagic 533
but there wasn't any problem on another machine with this config:
PHP 7.4.29
libmagic 537
So if you update your software packages, the problem should be fixed.
Another workaround is to use the file() method. Instead of:
$finfo->buffer($contents);
save the content somewhere on disk, then:
$finfo->file($filepath);
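A sketch of that workaround using a temporary file (the tempnam() prefix is arbitrary):
$tmp = tempnam(sys_get_temp_dir(), 'mime');
file_put_contents($tmp, $contents);
$finfo = new finfo(FILEINFO_MIME_TYPE);
echo $finfo->file($tmp); // text/plain, without the buffer() memory spike
unlink($tmp);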
Links:
libmagic file issue, on version 5.33

Getting mimetype from URL in PHP on Windows

I'm trying to write a script that fetches some external resource and counts its size. To do so I need to detect whether the given URL serves a binary file (e.g. graphics) or an HTML document. I thought of checking the MIME type, but when I run the following code:
$finfo = new \finfo(FILEINFO_MIME);
return $finfo->file($this->url);
I get the warning Warning: finfo::file(): Failed identify data 0:no magic files loaded in. I've already uncommented the line extension=php_fileinfo.dll in my php.ini, but for some reason there are no mimetype definitions. I run the code on a standard XAMPP installation.
How can I fix it? Or maybe you have some other ideas for such detection?

PHP's fgetcsv returning false at beginning of file

This is a PHP script running under Windows. It had been working but has recently stopped.
The file is opened and a valid file handle is returned: $fh = fopen($filename, 'r');
However, the very first time I call fgetcsv it returns false:
$headers = fgetcsv($fh, 6000, ',');
$line_no++;
if ($headers === FALSE) {
    echo 'Error parsing file headers';
}
This is now happening on all CSV files I try. Other changes I have tried, to no avail, are:
ini_set('auto_detect_line_endings', true); right before opening the file
rewind($fh); right after opening the file
Using either 0 or a number like 6000 for the second parameter, length
Changing the file's line-ending style from Unix to Windows and Mac
It seems like something with Windows is causing this file not to parse.
Is there any way to return the actual error from fgetcsv? The documentation doesn't say there is, just that it returns false on any error. Are there other Windows settings that could be causing issues? The Windows security settings give everyone full control of the files.
The issue turned out to be that a change at the beginning of the script was using the same file as a lock file, so the script wouldn't be run on the same file twice at the same time. Later in the script, when I actually wanted to parse the file, I opened it again (which succeeded), but then I couldn't actually read the contents.
The solution I used was to create a temporary lock file based on the filename instead of locking the actual file, e.g. $filename.'.lock'
It was a silly mistake on my part; however, it would have been much more helpful if PHP had returned or logged an error or warning at some point.
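A sketch of that lock-file approach (names are illustrative; flock() on a separate file leaves the data file free to read):
$lockFile = $filename . '.lock';
$lock = fopen($lockFile, 'c'); // create if missing, don't truncate
if (!flock($lock, LOCK_EX | LOCK_NB)) {
    exit('Another instance is already processing ' . $filename);
}
// The data file itself is untouched by the lock, so parsing works normally.
$fh = fopen($filename, 'r');
$headers = fgetcsv($fh, 6000, ',');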
The canonical way to debug this would be print_r($headers).
Since fgetcsv normally returns an array, the result must be empty or not an array at all. If you can configure (or have configured) PHP to log errors to a known location (on Windows with IIS that would be "syslog", which shows up in the Event Viewer), you should be able to figure out what's wrong.

What to check if PharData::buildFromDirectory fails to write contents of a file to a tar?

I have a background script which generates HTML files (each 100-500 KB in size) as a by-product, and when it has accumulated 500 of them, it packs them into a .tar.gz and archives them. It ran non-stop for several weeks and generated 131 .tar.gz files until this morning, when it threw the following exception:
Uncaught exception 'PharException' with message 'tar-based phar
"E:/xampp/.../archive/1394109645.tar" cannot be created, contents of file
"58836.html" could not be written' in E:/xampp/.../background.php:68
The code responsible for archiving
$name = $path_archive . $set . '.tar';
$archive = new PharData($name);
$archive->buildFromDirectory($path_input); // <--- line 68
$archive->compress(Phar::GZ); // writes $name.gz alongside the .tar
unset($archive);
unlink($name); // remove the uncompressed .tar
array_map('unlink', glob($path_input . '*')); // clear the input directory
What I've checked and made sure of so far:
I couldn't find anything irregular in the HTML file itself;
nothing else was touching the file during the process;
the script's timeout and memory were unlimited;
and there was enough spare memory and disk space.
What could be causing the exception and/or is there a way to get a more detailed message back from PharData::buildFromDirectory?
Env: Virtual XP (in VirtualBox) running portable XAMPP (1.8.2, PHP 5.4.25) in a shared folder of a Win7 host
I solved a similar problem after hours of bug-hunting today. It was caused by too little space on one partition of the disk. I had enough space on the partition where the tar.gz archive was created, but after removing some log files from another partition everything worked again.
I think the PharData object stores some temporary data somewhere, which is why this can happen even if there is enough space on the disk where you create the tar.gz archive.
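If you suspect the temp partition, a quick sanity check along these lines may help (disk_free_space() and sys_get_temp_dir() are standard PHP; $path_archive is from the code above):
$tmp = sys_get_temp_dir();
printf("PHP temp dir %s: %.1f MB free\n", $tmp, disk_free_space($tmp) / 1048576);
printf("Archive dir %s: %.1f MB free\n", $path_archive, disk_free_space($path_archive) / 1048576);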

Dynamically created zip files by ZipStream in PHP won't open in OSX

I have a PHP site with a lot of media files, and users need to be able to download multiple files at a time as a .zip. I'm trying to use ZipStream to serve the zips on the fly with "store" compression so I don't actually have to create a zip on the server, since some of the files are huge and it's prohibitively slow to compress them all.
This works great, and the resulting files can be opened with no errors by every zip program I've tried except OS X's default unzipping program, Archive Utility. You double-click the .zip file and Archive Utility decides it doesn't look like a real zip and instead compresses it into a .cpgz file.
Using unzip or ditto in the OS X terminal, or StuffIt Expander, unzips the file with no problem, but I need the default program (Archive Utility) to work for the sake of our users.
What sort of things (flags, etc.) in otherwise acceptable zip files can trip Archive Utility into thinking a file isn't a valid zip?
I've read this question, which seems to describe a similar issue, but I don't have any of the general-purpose bit flag bits set, so it's not the third-bit issue, and I'm pretty sure I have valid CRC-32s because when I don't, WinRAR throws a fit.
I'm happy to post some code or a link to a "bad" zip file if it would help, but I'm pretty much just using ZipStream, forcing it into "large file mode" and using "store" as the compression method.
Edit - I've tried the "deflate" compression algorithm as well and get the same results, so I don't think it's the "store" method. It's also worth pointing out that I'm pulling the files down one at a time from a storage server and sending them out as they arrive, so a solution that requires all the files to be downloaded before sending anything isn't viable (an extreme example is 5 GB+ of 20 MB files; users can't wait for all 5 GB to transfer to the zipping server before their download starts, or they'll think it's broken).
Here's a 140 byte, "store" compressed, test zip file that exhibits this behavior: http://teknocowboys.com/test.zip
The problem was in the "version needed to extract" field, which I found by doing a hex diff of a file created by ZipStream against a file created by Info-ZIP and working through the differences.
ZipStream sets it to 0x0603 by default; Info-ZIP sets it to 0x000A. Zip files with the former value don't seem to open in Archive Utility. Perhaps it doesn't support the features implied by that version?
Forcing the "version needed to extract" to 0x000A made the generated files open in Archive Utility as well as they do everywhere else.
Edit: Another cause of this issue is if the zip file was downloaded using Safari (user-agent version >= 537) and you under-reported the file size in your Content-Length header.
The solution we employ is to detect Safari >= 537 server-side; if that's the browser, we determine the difference between the Content-Length size and the actual size (how you do this depends on your specific application) and, after calling $zipStream->finish(), we echo chr(0) bytes to reach the correct length, as sketched below. The resulting file is technically malformed and any comment you put in the zip won't be displayed, but all zip programs will be able to open it and extract the files.
IE requires the same hack if you're misreporting your Content-Length, but instead of downloading a file that doesn't work, it just won't finish downloading and throws a "download interrupted" error.
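A minimal sketch of that padding step ($reportedLength and $bytesSent are hypothetical counters your application would have to maintain):
$zipStream->finish();
// Pad with NUL bytes until the body matches the advertised Content-Length.
$padding = $reportedLength - $bytesSent;
if ($padding > 0) {
    echo str_repeat(chr(0), $padding);
}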
Use ob_clean() and flush().
Example:
$file = __UPLOAD_PATH . $projectname . '/' . $fileName;
$zipname = "whatever.zip";
$zip = new ZipArchive();
$zip_full_path_name = __UPLOAD_PATH . $projectname . '/' . $zipname;
$zip->open($zip_full_path_name, ZipArchive::CREATE);
$zip->addFile($file); // Adding one file for testing
$zip->close();
if (file_exists($zip_full_path_name)) {
    header('Content-type: application/zip');
    header('Content-Disposition: attachment; filename="' . $zipname . '"');
    ob_clean(); // drop anything already buffered
    flush();    // push headers out before the file body
    readfile($zip_full_path_name);
    unlink($zip_full_path_name);
}
I've had this exact issue but with a different cause.
In my case the PHP-generated zip would open from the command line, but not via Finder in OS X.
I had made the mistake of allowing some HTML content into the output buffer prior to creating the zip file and sending that back as the response.
<some html></....>
<?php
// Output a zip file...
The command-line unzip program was evidently tolerant of this, but the Mac unarchive function was not.
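A defensive check before streaming can catch this (ob_get_level() and ob_end_clean() are standard PHP):
// Discard anything already buffered so no stray HTML
// corrupts the zip bytes that follow.
while (ob_get_level() > 0) {
    ob_end_clean();
}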
No idea. If the external ZipStream class doesn't work, try another option. The PHP ZipArchive extension won't help you, since it doesn't support streaming but only ever writes to files.
But you could try the standard Info-ZIP utility. It can be invoked from within PHP like this:
#header("Content-Type: archive/zip");
passthru("zip -0 -q -r - *.*");
That would send an uncompressed zip file directly back to the client.
If that doesn't help, then the macOS zip frontend probably doesn't like uncompressed entries. Remove the -0 flag in that case.
The Info-ZIP command-line tool I'm using, on both Windows and Linux, writes version 20 into the zip's "version needed to extract" field. That is needed for PHP's output as well, since the default compression is the Deflate algorithm; thus the "version needed to extract" field should really be 0x0014. If you alter the "(6 << 8) + 3" code in the referenced ZipStream class to just "20", you should get a valid zip file across platforms.
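Concretely, that one-line patch in the ZipStream source might look like this (the variable name is illustrative; the constant is what matters):
// Before: claims version 6.3, which Archive Utility balks at.
// $version = (6 << 8) + 3; // 0x0603
// After: version 2.0, sufficient for Deflate.
$version = 20; // 0x0014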
The author is basically telling you that the zip file was created in OS/2 using the HPFS file system, and the Zip version needed predates InfoZip 1.0. Not many implementations know what to do about that one any longer ;)
For those using ZipStream in Symfony, here's your solution: https://stackoverflow.com/a/44706446/136151
use Symfony\Component\HttpFoundation\StreamedResponse;
use Aws\S3\S3Client;
use ZipStream;
//...
/**
 * @Route("/zipstream", name="zipstream")
 */
public function zipStreamAction()
{
    // Test file on S3.
    $s3keys = array(
        "ziptestfolder/file1.txt"
    );
    $s3Client = $this->get('app.amazon.s3'); // S3 client service
    $s3Client->registerStreamWrapper(); // required
    $response = new StreamedResponse(function () use ($s3keys, $s3Client) {
        // Define suitable options for the ZipStream archive.
        $opt = array(
            'comment' => 'test zip file.',
            'content_type' => 'application/octet-stream'
        );
        // Initialise ZipStream with the output zip filename and options.
        $zip = new ZipStream\ZipStream('test.zip', $opt);
        // Loop over the keys; useful for multiple files.
        foreach ($s3keys as $key) {
            // Use the file name in the S3 key so we can save it
            // to the zip file under the same name.
            $fileName = basename($key);
            // Concatenate the s3:// path.
            $bucket = 'bucketname';
            $s3path = "s3://" . $bucket . "/" . $key;
            // Stream the S3 object straight into the zip.
            if ($streamRead = fopen($s3path, 'r')) {
                $zip->addFileFromStream($fileName, $streamRead);
            } else {
                die('Could not open stream for reading');
            }
        }
        $zip->finish();
    });
    return $response;
}
If your controller action's response is not a StreamedResponse, you are likely going to get a corrupted zip containing HTML, as I found out.
It's an old question, but I'll leave what worked for me in case it helps someone else.
When setting the options you need to set zero header to true and enable zip64 to false (this will limit the archive to 4 GB though):
$options->setZeroHeader(true);
$options->setEnableZip64(false);
Everything else as described by Forer.
Solution found on https://github.com/maennchen/ZipStream-PHP/issues/71
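Assuming ZipStream-PHP's Option\Archive API (the exact class names depend on the installed release, so treat this as a sketch), the full setup might look like:
use ZipStream\Option\Archive;
use ZipStream\ZipStream;

$options = new Archive();
$options->setZeroHeader(true);   // write local headers without sizes up front
$options->setEnableZip64(false); // caps the archive at 4 GB
$zip = new ZipStream('test.zip', $options);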
