xls parser for laravel for very large files - php

Is there any package for Laravel (or Lumen) that can help me parse (read) huge .xls files? I say huge because packages like maatwebsite/excel and phpoffice/phpspreadsheet have not helped. I need this for .xls, not .xlsx. Please help.

phpoffice's phpspreadsheet does in fact have support for .xls files, as can be found in the reading column of the table on their documentation page.
edit 1:
The mentioned package has support for reading in chunks:
This can be particularly useful for conserving memory, by allowing you to read and process a large workbook in "chunks": an example of this usage might be when transferring data from an Excel worksheet to a database.
$inputFileType = 'Xls';
$inputFileName = './sampleData/example2.xls';
/** Define a Read Filter class implementing \PhpOffice\PhpSpreadsheet\Reader\IReadFilter */
class ChunkReadFilter implements \PhpOffice\PhpSpreadsheet\Reader\IReadFilter
{
    private $startRow = 0;
    private $endRow = 0;
    /** Set the list of rows that we want to read */
    public function setRows($startRow, $chunkSize) {
        $this->startRow = $startRow;
        $this->endRow = $startRow + $chunkSize;
    }
    public function readCell($column, $row, $worksheetName = '') {
        // Only read the heading row, and the configured rows
        if (($row == 1) || ($row >= $this->startRow && $row < $this->endRow)) {
            return true;
        }
        return false;
    }
}
/** Create a new Reader of the type defined in $inputFileType **/
$reader = \PhpOffice\PhpSpreadsheet\IOFactory::createReader($inputFileType);
/** Define how many rows we want to read for each "chunk" **/
$chunkSize = 2048;
/** Create a new Instance of our Read Filter **/
$chunkFilter = new ChunkReadFilter();
/** Tell the Reader that we want to use the Read Filter **/
$reader->setReadFilter($chunkFilter);
/** Loop to read our worksheet in "chunk size" blocks **/
for ($startRow = 2; $startRow <= 65536; $startRow += $chunkSize) {
    /** Tell the Read Filter which rows we want this iteration **/
    $chunkFilter->setRows($startRow, $chunkSize);
    /** Load only the rows that match our filter **/
    $spreadsheet = $reader->load($inputFileName);
    // Do some processing here
}
See samples/Reader/12_Reading_a_workbook_in_chunks_using_a_configurable_read_filter_ for a working example of this code.
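The loop above runs up to row 65536, the row limit of the .xls format, so it keeps calling load() even after the data has run out. A minimal sketch of deriving the chunk windows from a known last row instead: `chunkWindows()` is a hypothetical helper (not part of PhpSpreadsheet), and the row count is assumed to come from something like the reader's `listWorksheetInfo()`, which reports `totalRows` for each sheet.

```php
<?php
/**
 * Yield [startRow, chunkSize] pairs covering rows $firstRow..$lastRow.
 * Hypothetical helper, not part of PhpSpreadsheet.
 */
function chunkWindows(int $firstRow, int $lastRow, int $chunkSize): Generator
{
    if ($chunkSize < 1) {
        throw new InvalidArgumentException('chunkSize must be at least 1');
    }
    for ($startRow = $firstRow; $startRow <= $lastRow; $startRow += $chunkSize) {
        yield [$startRow, $chunkSize];
    }
}

// Assumption: in real code this value would come from
// $reader->listWorksheetInfo($inputFileName)[0]['totalRows'].
$totalRows = 5000;

foreach (chunkWindows(2, $totalRows, 2048) as [$startRow, $chunkSize]) {
    // In the real loop:
    //   $chunkFilter->setRows($startRow, $chunkSize);
    //   $spreadsheet = $reader->load($inputFileName);
    //   ...process the chunk, then free it before the next one:
    //   $spreadsheet->disconnectWorksheets();
    //   unset($spreadsheet);
    echo "rows {$startRow}-" . ($startRow + $chunkSize - 1) . PHP_EOL;
}
```

Releasing each chunk's workbook before loading the next one is what keeps peak memory close to the size of a single chunk.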

Related

PhpSpreadSheet : Writing to a specific WorkSheet

I'm new to PhpSpreadsheet, and I'd like to know if there is a way to load a CSV into a specific worksheet.
I tried the code below, but it keeps loading the CSVs into the first worksheet.
<?php
require 'vendor/autoload.php';
use PhpOffice\PhpSpreadsheet\Spreadsheet;
use PhpOffice\PhpSpreadsheet\Writer\Xlsx;
use PhpOffice\PhpSpreadsheet\Reader\Csv;
$spreadsheet = new Spreadsheet();
$spreadsheet->setActiveSheetIndex(0);
$pathToCsv1 = 'files/csv_files/1.csv';
$pathToCsv2 = 'files/csv_files/2.csv';
$pathToCsv3 = 'files/csv_files/3.csv';
$pathToCsv4 = 'files/csv_files/4.csv';
$aCsvFiles = array($pathToCsv1, $pathToCsv2, $pathToCsv3, $pathToCsv4);
foreach ($aCsvFiles as $index => $csvFile) {
$reader = new Csv();
$reader->setDelimiter(';');
$reader->loadIntoExisting($csvFile, $spreadsheet);
$workSheet = $spreadsheet->createSheet();
$spreadsheet->setActiveSheetIndex($index + 1);
}
$writer = new Xlsx($spreadsheet);
$writer->save('files/xls_files/all.xlsx');
I only get the contents of 4.csv in all.xlsx, although the additional worksheets are created.
Combining Multiple Files into a Single Spreadsheet Object
While you can limit the number of worksheets that are read from a
workbook file using the setLoadSheetsOnly() method, certain readers also
allow you to combine several individual "sheets" from different files
into a single Spreadsheet object, where each individual file is a
single worksheet within that workbook. For each file that you read, you
need to indicate which worksheet index it should be loaded into using
the setSheetIndex() method of the $reader, then use the
loadIntoExisting() method rather than the load() method to actually read
the file into that worksheet.
Example:
$inputFileType = 'Csv';
$inputFileNames = [
    './sampleData/example1.csv',
    './sampleData/example2.csv',
    './sampleData/example3.csv'
];
/** Create a new Reader of the type defined in $inputFileType **/
$reader = \PhpOffice\PhpSpreadsheet\IOFactory::createReader($inputFileType);
/** Extract the first named file from the array list **/
$inputFileName = array_shift($inputFileNames);
/** Load the initial file to the first worksheet in a `Spreadsheet` Object **/
$spreadsheet = $reader->load($inputFileName);
/** Set the worksheet title (to the filename that we've loaded) **/
$spreadsheet->getActiveSheet()
    ->setTitle(pathinfo($inputFileName, PATHINFO_BASENAME));
/** Loop through all the remaining files in the list **/
foreach ($inputFileNames as $sheet => $inputFileName) {
    /** Increment the worksheet index pointer for the Reader **/
    $reader->setSheetIndex($sheet + 1);
    /** Load the current file into a new worksheet in Spreadsheet **/
    $reader->loadIntoExisting($inputFileName, $spreadsheet);
    /** Set the worksheet title (to the filename that we've loaded) **/
    $spreadsheet->getActiveSheet()
        ->setTitle(pathinfo($inputFileName, PATHINFO_BASENAME));
}
Note that using the same sheet index for multiple sheets won't append files into the same sheet, but overwrite the results of the previous load. You cannot load multiple CSV files into the same worksheet.
https://phpspreadsheet.readthedocs.io/en/develop/topics/reading-files/#combining-multiple-files-into-a-single-spreadsheet-object
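Applied to the loop in the question, the change is to call setSheetIndex() on the reader before each loadIntoExisting(), rather than switching the active sheet afterwards. A sketch, assuming PhpSpreadsheet is installed through Composer and already autoloaded; `combineCsvFiles()` and `sheetTitleFor()` are hypothetical helper names:

```php
<?php
use PhpOffice\PhpSpreadsheet\Reader\Csv;
use PhpOffice\PhpSpreadsheet\Spreadsheet;

/** Derive a worksheet title from a file path (Excel caps titles at 31 characters). */
function sheetTitleFor(string $path): string
{
    return substr(pathinfo($path, PATHINFO_FILENAME), 0, 31);
}

/** Load each CSV file into its own worksheet of a single Spreadsheet. */
function combineCsvFiles(array $csvFiles, string $delimiter = ';'): Spreadsheet
{
    $spreadsheet = new Spreadsheet();
    foreach (array_values($csvFiles) as $index => $csvFile) {
        $reader = new Csv();
        $reader->setDelimiter($delimiter);
        // Choose the target worksheet BEFORE loading the file
        $reader->setSheetIndex($index);
        $reader->loadIntoExisting($csvFile, $spreadsheet);
        $spreadsheet->getSheet($index)->setTitle(sheetTitleFor($csvFile));
    }
    return $spreadsheet;
}
```

The result can be saved as in the question, for example with `(new Xlsx(combineCsvFiles($aCsvFiles)))->save('files/xls_files/all.xlsx');`.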

How to save image metadata in Laravel 5.6

I'm trying to build an image gallery. For this purpose I'm storing the original images (about 7,000 right now, and more than 60,000 in the future) in the Laravel storage path.
Next I run a job that stores the path and metadata (file size, resolution, MIME type, width and height) in the DB.
The problem is that it's very slow.
This is my controller:
public function startJob() {
// Start doing Jobs
CreateDirectories::withChain([
new RecordPaths,
// new OptimizeImage,
// new SendNotification,
])->dispatch()->delay(now()->addSeconds(3));
echo 'create directories and stored paths to database!';
}
In my controller I dispatch some jobs.
The first one creates the directory where I store the thumbs. When that job is done, the next one, RecordPaths, records the paths in the DB.
This is where the problem is: it is very slow (about 2 seconds per image).
This is my job:
class RecordPaths implements ShouldQueue
{
use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;
public function handle()
{
$this->truncate();
$files = Storage::disk('gallery')->allFiles();
foreach($files as $file) {
$thumb = new Thumb;
$thumb->brand = explode("/", $file, 2)[0];
$thumb->name = array_slice(explode("/", $file),-1)[0];
$thumb->path = $file;
//
$thumb->size = $this->imageMetadata($file, 'fileSize');
$thumb->width = $this->imageMetadata($file, 'imageWidth');
$thumb->height = $this->imageMetadata($file, 'imageHeight');
$thumb->mime = $this->imageMetadata($file, 'mimeType');
//
$thumb->save();
}
}
public function truncate() {
return Thumb::truncate();
}
public function imageMetadata($file, $type) {
$metaData = [];
$metaData['mimeType'] = \Image::make(storage_path("app\public\gallery\\") . $file)->exif('MimeType');
$metaData['fileSize'] = \Image::make(storage_path("app\public\gallery\\") . $file)->exif('FileSize');
$metaData['imageWidth'] = \Image::make(storage_path("app\public\gallery\\") . $file)->exif('ExifImageWidth');
$metaData['imageHeight'] = \Image::make(storage_path("app\public\gallery\\") . $file)->exif('ExifImageLength');
return $metaData[$type];
}
}
The $files = Storage::disk('gallery')->allFiles(); call in the handle method returns this: [screenshot not preserved]
And my DB after some inserts: [screenshot not preserved]
Does anybody have any idea how to speed it up?
First of all, call \Image::make only once per file instead of 4 times, then call the ->exif method on that one instance with the different parameters.
Second, replace $thumb->save(); with batch inserting. You can use a library for this or write your own code.
P.S. This will reduce your execution time.
P.P.S. You can also try Laravel's chunking, or split the images between several job workers.
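A sketch of both suggestions, under stated assumptions. It swaps the four \Image::make() calls for PHP's built-in getimagesize() and filesize(), which read each file once, and it groups the rows with array_chunk() so that each group can be written in a single insert. `collectThumbRows()` and `batchRows()` are hypothetical helpers; the column names follow the Thumb model in the question.

```php
<?php
/**
 * Build one row of metadata per image, touching each file once.
 * $baseDir is the gallery root; $files are paths relative to it.
 */
function collectThumbRows(string $baseDir, array $files): array
{
    $rows = [];
    foreach ($files as $file) {
        $fullPath = rtrim($baseDir, '/\\') . DIRECTORY_SEPARATOR . $file;
        $info = @getimagesize($fullPath); // false if not a readable image
        if ($info === false) {
            continue; // skip anything that is not an image
        }
        $rows[] = [
            'brand'  => explode('/', $file, 2)[0],
            'name'   => basename($file),
            'path'   => $file,
            'size'   => filesize($fullPath),
            'width'  => $info[0],
            'height' => $info[1],
            'mime'   => $info['mime'],
        ];
    }
    return $rows;
}

/** Split rows into batches; each batch is meant for one multi-row insert. */
function batchRows(array $rows, int $batchSize = 500): array
{
    return array_chunk($rows, $batchSize);
}
```

Inside the job, each batch could then be written with `Thumb::insert($batch)`. That is Eloquent's multi-row insert; it bypasses model events and does not fill timestamps, so add those columns to the rows if the table needs them.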

Unable to convert an excel to pdf using php

I have installed PhpSpreadsheet and dompdf successfully using composer.
I need to convert an Excel sheet into a PDF. I got it working using the default settings; this is the code I have used:
use PhpOffice\PhpSpreadsheet\Spreadsheet;
use PhpOffice\PhpSpreadsheet\Writer\Xlsx;
use PhpOffice\PhpSpreadsheet\Writer\Csv;
use PhpOffice\PhpSpreadsheet\Exception;
use PhpOffice\PhpSpreadsheet\IOFactory;
use \PhpOffice\PhpSpreadsheet\Writer\Pdf\Dompdf;
$spreadsheet = new Spreadsheet();
try {
$sheet = $spreadsheet->getActiveSheet();
// code to fill in the data
$spreadsheet->getActiveSheet()->fromArray(
$data, // The data to set
NULL, // Array values with this value will not be set
'A2' // Top left coordinate of the worksheet range where
);
} catch (Exception $e) {
}
$writer = new Xlsx($spreadsheet);
try {
IOFactory::registerWriter("PDF", Dompdf::class);
$pdfwriter = IOFactory::createWriter($spreadsheet, 'PDF');
$pdfwriter->save($filepath . 'pdf_test.pdf');
} catch (\PhpOffice\PhpSpreadsheet\Writer\Exception $e) {
}
I have left out some code for brevity. This code works fine and generates a PDF file, but I need the PDF to be printed in landscape mode. For that, the docs mention a custom implementation or configuration of the PDF library, so I created a file called PDFBase_DOMPDF that looks like this:
use Dompdf\Dompdf;
class PDFBase_DOMPDF extends Dompdf
{
}
And I have created a file called PDFBase_Writer that looks like this.
use PhpOffice\PhpSpreadsheet\Writer\Pdf\Dompdf;
class PDFBase_Writer extends Dompdf
{
protected function createExternalWriterInstance()
{
$instance = new PDFBase_DOMPDF();
$instance->setPaper('A4', 'landscape');
return $instance;
}
}
I modified the original code to use the new pdf class so the line changed to this.
IOFactory::registerWriter("PDF", PDFBase_Writer::class);
The problem is I get an exception with the following error
Registered writers must implement PhpOffice\PhpSpreadsheet\Writer\IWriter
How exactly do I fix this?
Reading and writing to a persisted storage is not possible using the base PhpSpreadsheet classes. For this purpose, PhpSpreadsheet provides readers and writers, which are implementations of \PhpOffice\PhpSpreadsheet\Reader\IReader and \PhpOffice\PhpSpreadsheet\Writer\IWriter.
You must load the Excel file like this:
$reader = new \PhpOffice\PhpSpreadsheet\Reader\Xlsx();
$reader->setReadDataOnly(true);
$spreadsheet = $reader->load("TestRead.xlsx");
Registered writers must implement PhpOffice\PhpSpreadsheet\Writer\IWriter
PHPSpreadsheet Writer classes must implement all the methods defined in the IWriter interface. You're creating a new Writer, so it needs to provide an implementation of all those methods:
interface IWriter
{
    /**
     * IWriter constructor.
     *
     * @param Spreadsheet $spreadsheet
     */
    public function __construct(Spreadsheet $spreadsheet);
    /**
     * Save PhpSpreadsheet to file.
     *
     * @param string $pFilename Name of the file to save
     *
     * @throws \PhpOffice\PhpSpreadsheet\Writer\Exception
     */
    public function save($pFilename);
}
So your Writer needs to implement a constructor that accepts a Spreadsheet object as an argument, and a save() method that accepts a filename (as a string) argument.
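Separately from the interface requirement, landscape output may not need a custom writer at all. A minimal sketch, assuming the stock Dompdf writer takes its orientation from the worksheet's page setup; `saveLandscapePdf()` is a hypothetical helper, and Composer autoloading of PhpSpreadsheet and dompdf is assumed:

```php
<?php
use PhpOffice\PhpSpreadsheet\IOFactory;
use PhpOffice\PhpSpreadsheet\Spreadsheet;
use PhpOffice\PhpSpreadsheet\Worksheet\PageSetup;
use PhpOffice\PhpSpreadsheet\Writer\Pdf\Dompdf;

/** Mark every worksheet as landscape, then write the workbook as a PDF. */
function saveLandscapePdf(Spreadsheet $spreadsheet, string $targetFile): void
{
    foreach ($spreadsheet->getAllSheets() as $sheet) {
        // Orientation is a property of each worksheet's page setup
        $sheet->getPageSetup()->setOrientation(PageSetup::ORIENTATION_LANDSCAPE);
    }

    // Register the stock Dompdf-based writer under the name 'Pdf'
    IOFactory::registerWriter('Pdf', Dompdf::class);
    $writer = IOFactory::createWriter($spreadsheet, 'Pdf');
    $writer->save($targetFile);
}
```

With the question's variables this would be called as `saveLandscapePdf($spreadsheet, $filepath . 'pdf_test.pdf');`.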

PHPExcel Reading CSV Reading values incorrectly, values having special characters

I am having a hard time reading values from a CSV correctly. The values are fetched, but they contain special characters, e.g. �5�0�0�0�.
I have to perform some calculations on the values. Casting them to float/int didn't work.
This is how I am doing it:
$inputFileName=$this->fileName;
/** Include PHPExcel_IOFactory */
$included=require_once __SITE_PATH.DS.'assets'.DS.'script'.DS.'PHPExcel'.DS.'PHPExcel'.DS.'IOFactory.php';
if( empty($included) )
{
header('HTTP/1.1 400 Bad Request',true,400);
echo 'xml lib not found';
return false;
}
/** Load $inputFileName to a PHPExcel Object **/
//$objPHPExcel = PHPExcel_IOFactory::load($inputFileName);
/** Identify the type of $inputFileName **/
$inputFileType = PHPExcel_IOFactory::identify($inputFileName);
/** Create a new Reader of the type that has been identified **/
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
/** Advise the Reader that we only want to load cell data **/
$objReader->setReadDataOnly(true);
/** Advise the Reader of which WorkSheets we want to load **/
//$objReader->setLoadSheetsOnly($sheetname);
/** Load $inputFileName to a PHPExcel Object **/
$this->fileHandler = $objReader->load($inputFileName);
/*get the worksheet*/
$objWorksheet = $this->fileHandler->getSheet(0);
$this->eXjobs=$objWorksheet->toArray(null,true,true,true);
Now when I loop over $this->eXjobs and var_dump() the values, I can see the special characters.
Any help will be much appreciated.

how to speed up PHPExcel reader

Hi all, I am a newbie in PHP.
Can anyone tell me how to speed up PHPExcel? This is my code. It reads 20,000 rows with 4 columns and takes more than 15 seconds. Thank you so much.
function upload_fl($FILES){
$file = $FILES['excel'];
//echo getcwd();
//print_r($file);
require 'phpexcel/PHPExcel.php';
if(move_uploaded_file($file['tmp_name'],'C:/wamp/www/datatable_017/php/upload/'.$file['name'])){
$data = '../php/upload/'.$file['name'];
$objPHPExcel = PHPExcel_IOFactory::load($data);
$sheetData = $objPHPExcel->getActiveSheet()->toArray(null,true,true,true);
//var_dump($sheetData);
echo sizeof($sheetData);
//$writefile = fopen($FILES['excel']['name'].'.txt','w');
/*
foreach($sheetData as $row){
foreach($row as $col->$value){
//$value_inser = (is_numeric($value) == true ? ''number',''.$value.'',''':''text','',''.$value.''');
//fwrite($writefile,$cate_set_id[$col].','.$com_id.','.$year.','.$value_insert.'\n');
}
}
-->
fclose($writefile);
*/
return 'upload/'.$file['name'].'.txt';
}else{
return 'upload failed';
}
}//function
If you have multiple worksheets, but don't need to load all of them, then you can limit the worksheets that the Reader will load using the setLoadSheetsOnly() method. To load a single named worksheet:
$inputFileType = 'Excel5';
$inputFileName = './sampleData/example1.xls';
$sheetname = 'Data Sheet #2';
/** Create a new Reader of the type defined in $inputFileType **/
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
/** Advise the Reader of which WorkSheets we want to load **/
$objReader->setLoadSheetsOnly($sheetname);
/** Load $inputFileName to a PHPExcel Object **/
$objPHPExcel = $objReader->load($inputFileName);
Or you can specify several worksheets with one call to setLoadSheetsOnly() by passing an array of names:
$inputFileType = 'Excel5';
$inputFileName = './sampleData/example1.xls';
$sheetnames = array('Data Sheet #1','Data Sheet #3');
/** Create a new Reader of the type defined in $inputFileType **/
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
/** Advise the Reader of which WorkSheets we want to load **/
$objReader->setLoadSheetsOnly($sheetnames);
/** Load $inputFileName to a PHPExcel Object **/
$objPHPExcel = $objReader->load($inputFileName);
If you only need to access part of a worksheet, then you can define a Read Filter to identify just which cells you actually want to load:
$inputFileType = 'Excel5';
$inputFileName = './sampleData/example1.xls';
$sheetname = 'Data Sheet #3';
/** Define a Read Filter class implementing PHPExcel_Reader_IReadFilter */
class MyReadFilter implements PHPExcel_Reader_IReadFilter {
    public function readCell($column, $row, $worksheetName = '') {
        // Read rows 1 to 7 and columns A to E only
        if ($row >= 1 && $row <= 7) {
            if (in_array($column, range('A', 'E'))) {
                return true;
            }
        }
        return false;
    }
}
/** Create an Instance of our Read Filter **/
$filterSubset = new MyReadFilter();
/** Create a new Reader of the type defined in $inputFileType **/
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
/** Advise the Reader of which WorkSheets we want to load
It's more efficient to limit sheet loading in this manner rather than coding it into a Read Filter **/
$objReader->setLoadSheetsOnly($sheetname);
echo 'Loading Sheet using filter';
/** Tell the Reader that we want to use the Read Filter that we've Instantiated **/
$objReader->setReadFilter($filterSubset);
/** Load only the rows and columns that match our filter from $inputFileName to a PHPExcel Object **/
$objPHPExcel = $objReader->load($inputFileName);
If you don't need to load formatting information, but only the worksheet data, then the setReadDataOnly() method will tell the reader only to load cell values, ignoring any cell formatting:
$inputFileType = 'Excel5';
$inputFileName = './sampleData/example1.xls';
/** Create a new Reader of the type defined in $inputFileType **/
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
/** Advise the Reader that we only want to load cell data, not formatting **/
$objReader->setReadDataOnly(true);
/** Load $inputFileName to a PHPExcel Object **/
$objPHPExcel = $objReader->load($inputFileName);
Note, however, that reading only the raw data like this won't allow you to differentiate between date values and floats.
If you want to work with large Excel files, don't build a large PHP array in memory with that toArray() call: it carries a big memory overhead, and there is also a big performance cost in constantly allocating more memory as the array is built. If you're going to process a row at a time, use the iterators built into PHPExcel, or just use a loop to access each individual row in turn.
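A sketch of the row-at-a-time approach from the last paragraph, using PHPExcel's getRowIterator() and getCellIterator(). `eachRow()` is a hypothetical helper that hands one row's values to a callback, so only a single row's array exists at any moment:

```php
<?php
/**
 * Walk a worksheet row by row and pass each row's values to $handleRow.
 * $worksheet is expected to be a PHPExcel_Worksheet.
 * Returns the number of rows visited.
 */
function eachRow($worksheet, callable $handleRow): int
{
    $count = 0;
    foreach ($worksheet->getRowIterator() as $row) {
        $cellIterator = $row->getCellIterator();
        // Include empty cells so every row exposes the same columns
        $cellIterator->setIterateOnlyExistingCells(false);

        $values = [];
        foreach ($cellIterator as $cell) {
            $values[$cell->getColumn()] = $cell->getValue();
        }

        // $values goes out of scope after each row, unlike toArray()
        $handleRow($row->getRowIndex(), $values);
        $count++;
    }
    return $count;
}
```

In the question's upload function this would replace the toArray() call, for example `eachRow($objPHPExcel->getActiveSheet(), function ($rowIndex, array $values) { /* write one line to the output file */ });`.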
