Validating SVG file in PHP with XMLReader - php

I am validating an SVG document (which I believe to be valid) against the SVG spec. I'm using XMLReader in PHP, and would rather stick with that as I will be using XMLReader elsewhere; that said, if there are other stream-based readers that will do this more easily or better, do let me know.
OK, here's some code:
// Set some values for the purpose of this example
$this->path = '/Users/jon/Development/Personal/Visualised/master/test-assets/import-png.svg';
$xsdPath = '/Users/jon/Development/Personal/Visualised/master/test-assets/xsd/SVG.xsd';
$reader = new XMLReader();
$reader->open($this->path);
$valid = $reader->setSchema($xsdPath);
$reader->close();
OK, so the XSD files I've got in my xsd folder are:
SVG.xsd
xlink.xsd
xml.xsd
It seems that the parser imports the second and third XSD from the first - I want any dependencies to be stored on disk, not retrieved from the internet.
OK, here's the output:
XMLReader::setSchema(): Element '{http://www.w3.org/2001/XMLSchema}import': Skipping import of schema located at '/Users/jon/Development/Personal/Visualised/master/test-assets/xsd/xml.xsd' for the namespace 'http://www.w3.org/XML/1998/namespace', since this namespace was already imported with the schema located at 'http://www.w3.org/2001/xml.xsd'. in /Users/jon/Development/Personal/Visualised/master/lib/Visualised/Document.php on line 45
Warning: XMLReader::setSchema(): Element '{http://www.w3.org/2001/XMLSchema}attribute': The attribute 'type' is not allowed. in /Users/jon/Development/Personal/Visualised/master/lib/Visualised/Document.php on line 45
Warning: XMLReader::setSchema(): Element '{http://www.w3.org/2001/XMLSchema}attribute': The attribute 'type' is not allowed. in /Users/jon/Development/Personal/Visualised/master/lib/Visualised/Document.php on line 45
Warning: XMLReader::setSchema(): Element '{http://www.w3.org/2001/XMLSchema}attribute': The attribute 'type' is not allowed. in /Users/jon/Development/Personal/Visualised/master/lib/Visualised/Document.php on line 45
Warning: XMLReader::setSchema(): Unable to set schema. This must be set prior to reading or schema contains errors. in /Users/jon/Development/Personal/Visualised/master/lib/Visualised/Document.php on line 45
It seems like maybe I have imported the wrong version of a schema somewhere - I found all the XSD docs just through a web search. Any ideas?
Edit: the last error suggests the schema should be set before reading the document. OK, so I've changed the code to this:
$reader = new XMLReader();
$valid = $reader->setSchema($xsdPath);
$reader->open($this->path);
$reader->close();
-- some of the initial warnings go, but I still get the Unable to set schema one.

The XSD file for SVG you link to is from an old working draft version of SVG 1.1. There's currently no officially supported XML schema for SVG 1.1. Please see this answer for more details.

Related

Edit Excel XML Worksheet File from XLSM in PHP

I have an XLSM File, where I need to edit some Cell Values with PHP. As I couldn't find a proper library, which can actually edit an xlsm file (most read excel and create a whole new excel file, which in this case would delete the macros inside the excel or even throw too many exceptions), I decided to unzip the xlsm file and directly edit the worksheet xml file by changing the values in the cells:
<c r="K15" s="52">
<v>83221.56</v>
</c>
For example I would change the Value inside the "v" Tag.
As SimpleXML doesn't work, because it messes up some namespaces inside the file, I decided to edit it with regular expressions.
So far so good: I got the change into the file. But formulas inside the Excel file that depend on the cell I just changed won't recognize my change. When you open the Excel file, it shows the correct value in the edited cell, but other cells that use that value in their formula won't update.
Does anyone have any idea how to properly change the XML file while keeping the Excel file intact?
Thanks!
As I could not figure out a solution in PHP, and previous solutions with C++ (Is there a PHP library for XLSM(Excel with Macro) parsing/editing?) were too complicated for me, I found a solution with Python that I want to share.
My environment is Ubuntu 16.04, I have Python installed. I have installed https://editpyxl.readthedocs.io/en/latest/
I placed a little script in the same directory as the PHP script, which I call with PHP:
import logging
import sys

from editpyxl import Workbook

logging.basicConfig()

# Expect exactly three arguments after the script name.
if len(sys.argv) != 4:
    print("Three arguments accepted, got " + str(len(sys.argv) - 1))
    print("Argument 1: Sheet name, Argument 2: Cell Identifier, Argument 3: New Value")
    sys.exit(1)

wb = Workbook()
source_filename = r'OriginalFile.xlsm'
wb.open(source_filename)
# Select the sheet by name and overwrite the given cell.
ws = wb[sys.argv[1]]
ws.cell(sys.argv[2]).value = sys.argv[3]
destination_filename = "NewFile.xlsm"
wb.save(destination_filename)
wb.close()
In PHP I call it via
exec('python excel.py "SheetName" "CellName" "NewValue"');
It is a workaround, but it works (especially on Linux) and is very easy to implement. This solution has a performance limitation though: the Python script opens the workbook, changes one value and saves the file on every run. If you only have a few values to change, this might not be a problem, but if you plan to edit larger Excel files with many cells to change, you might want to write the complete editing code in Python.
This code, however, works for me. It edits the Excel file, all formulas/calculations inside stay fine, and the macros are left untouched.
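One way to soften the per-run cost described above is to pass several cell/value pairs in a single call, so the workbook is opened and saved only once. The following is a minimal sketch under that assumption: the helper names parse_edits and apply_edits and the "SHEET CELL VALUE [CELL VALUE ...]" argument layout are hypothetical, and only the editpyxl calls already used in the script above are relied on.

```python
import sys


def parse_edits(args):
    """Turn a flat list [cell, value, cell, value, ...] into (cell, value) pairs."""
    if len(args) == 0 or len(args) % 2 != 0:
        raise ValueError("expected one or more CELL VALUE pairs")
    return list(zip(args[0::2], args[1::2]))


def apply_edits(worksheet, edits):
    """Write every (cell, value) pair to an already opened worksheet."""
    for cell, value in edits:
        worksheet.cell(cell).value = value


def main(argv):
    # Imported here so the helpers above can be used without editpyxl installed.
    from editpyxl import Workbook

    sheet_name, edits = argv[1], parse_edits(argv[2:])
    wb = Workbook()
    wb.open("OriginalFile.xlsm")      # same file names as in the script above
    apply_edits(wb[sheet_name], edits)
    wb.save("NewFile.xlsm")           # one save for all edits
    wb.close()


if __name__ == "__main__":
    if len(sys.argv) < 4:
        print("usage: excel.py SHEET CELL VALUE [CELL VALUE ...]")
    else:
        main(sys.argv)
```

From PHP the call would then carry all pairs at once, for example "SheetName" "K15" "1" "K16" "2", instead of one exec per cell.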

League CSV package - reading one line at a time from a resource/stream

I'm using The PHP League CSV importer/exporter to import a large CSV file in Laravel. Since the file is large, I would like to stream it to the CSV parser and handle it one line at a time, without loading every line into memory.
Laravel uses flysystem for the underlying filesystem, and I am using that to obtain a PHP resource to the source CSV.
What I don't understand is how - if it is at all possible - I can feed that resource stream into League CSV so that it reads one line at a time for me to process, before reading in the next line. All the documentation seems to imply that a CSV file is always read fully into memory, and that is what I want to avoid.
Do I need to use callbacks? If so, how can I be sure the stream resource is only being read one line at a time as needed, and not all at once?
I'm guessing I start by creating a stream reader?
use League\Csv\Reader;
$reader = Reader::createFromStream($resource, 'r');
You can iterate over the rows without loading the whole file by using the IteratorAggregate interface of the Reader. So you basically just do
foreach ($reader as $row) {
    // do stuff
}
If you are using a mac to read or create the CSV file you will need to add this to your code for it to work correctly:
if (!ini_get("auto_detect_line_endings")) {
    ini_set("auto_detect_line_endings", '1');
}

File downloaded via Podio Export API is not readable through code

Use-case:
Export data from different apps using batch EXPORT APIs to merge into a master excel file and use it for reporting purpose...
Implementation Details
I'm using Podio export API to get data from an application. Application name is Kall8-number-text (as an example).. here is the code snippet
Code Snippet
$batch_id = PodioItem::export(11804702,"xlsx",array("filters" => array( "kall8-number-text" => "510-592-5916") ));
PodioBatch::get( $batch_id );
$file = PodioFile::get($file_id);
// Download the file. This might take a while...
$file_content = $file->get_raw();
// Store the file on local disk
$path_to_file= "downloads/".$name;
file_put_contents($path_to_file, $file_content);
Problem Description:
I'm trying to read downloaded file using phpexcel library but getting error "You tried to set a sheet active by the out of bounds index: 0. The actual number of sheets is 0"
This error shows that file has NO sheet but it is not true. File has data/sheet and it shows upon opening that file.
One interesting fact, if I open the same excel file (manually by double click) and SAVE without making any change, then same code works fine. In my end to end process, I cannot add a manual step to open file every-time to proceed further...
For your information, I thought this was a PHPExcel bug and contacted Mark Baker (coordinator of the PHPOffice suite), and he replied with the following remark, which seems plausible.
"My guess would be non-standard namespacing in the file that's generated, which loading and saving in MS Excel fixes"
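To check that guess without opening Excel, the generated file can be inspected directly, since an .xlsx file is a zip archive of XML parts. The sketch below is only a diagnostic, not a fix: the function name describe_xlsx is hypothetical, and it assumes the workbook part sits at the usual xl/workbook.xml location, which a non-standard file may not honour (in that case the returned part list is the thing to look at).

```python
import zipfile
import xml.etree.ElementTree as ET


def describe_xlsx(xlsx_path, workbook_part="xl/workbook.xml"):
    """Report the archive parts, the workbook root tag and the sheet names."""
    with zipfile.ZipFile(xlsx_path) as archive:
        parts = archive.namelist()
        if workbook_part not in parts:
            # Nothing at the expected location: return the part list only.
            return {"parts": parts, "root_tag": None, "sheets": []}
        root = ET.fromstring(archive.read(workbook_part))

    # ElementTree tags look like "{namespace-uri}localname"; compare the local
    # name only, so sheets are found whatever namespace or prefix is used.
    sheets = [
        element.get("name")
        for element in root.iter()
        if element.tag.rsplit("}", 1)[-1] == "sheet"
    ]
    return {"parts": parts, "root_tag": root.tag, "sheets": sheets}
```

Running it on the downloaded file and on the copy re-saved by Excel, then comparing the two results, would show whether the part names or the namespace in root_tag differ between them.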
File Reading Code
$objReader = new PHPExcel_Reader_Excel2007();
$objReader->setReadDataOnly(true);
$objPHPExcel = $objReader->load('callsheet.xlsx');
$objPHPExcel->setActiveSheetIndex(0);
$dataArray = $objPHPExcel->getActiveSheet()->toArray(null, true,true,true);
var_dump($dataArray);
Error Trace
Fatal error: Uncaught exception 'PHPExcel_Exception' with message 'You tried to set a sheet active by the out of bounds index: 0. The actual number of sheets is 0.' in E:\xampp\htdocs\podioexcel\Classes\PHPExcel.php:688 Stack trace: #0 E:\xampp\htdocs\podioexcel\test.php(18): PHPExcel->setActiveSheetIndex(0) #1 {main} thrown in E:\xampp\htdocs\podioexcel\Classes\PHPExcel.php on line 688
Can you help me to address this issue? This is holding up my project completely.
File Path: https://drive.google.com/file/d/0B79S561prrEBUDY1NEhXQ1JySWM/view
Original question at Stackoverflow: Error While loading excel sheet Using phpexcel
Ejaz

PHPExcel - Clone sheet and keep its original style

I've tried to check every possible similar solution both here and in the PHPExcel official documentation / forums, but I didn't find any solution to my issue.
The problem
I'm trying to clone (or copy, being honest) a sheet to parse it into another file created through phpexcel by keeping the style of the cloned sheet.
The setup is:
sheet.xls <--- File to OPEN & COPY
PHPExcel object <-- File that gets created X times in a for loop, where I need to append Y Sheets according to a set of arrays.
What works
The cloning & appending works beautifully, but it takes time because of some strange notices raised from a PHPExcel file:
Notice: Undefined offset: 1 in \serverpath\PHPExcel\Classes\PHPExcel.php on line 729
Notice: Undefined offset: 2 in \serverpath\PHPExcel\Classes\PHPExcel.php on line 729
Notice: Undefined offset: 3 in \serverpath\PHPExcel\Classes\PHPExcel.php on line 729
Notice: Undefined offset: 4 in \serverpath\PHPExcel\Classes\PHPExcel.php on line 729
EDIT ::
Line 729 refers to this:
foreach ($sheet->getCellCollection(false) as $cellID) {
    $cell = $sheet->getCell($cellID);
    ++$countReferencesCellXf[$cell->getXfIndex()]; // line 729
}
Which is about styles as far as I can tell.
There are thousands of these notices and I have no idea where they are coming from. The files are generated correctly; they just lose their formatting.
What doesn't work
The generated files LOSE the original format but keep the formulas, hence every single border (and any other style) of the original "template" (sheet.xls) is lost.
The relevant part of the code
I'm only posting the really relevant code here, mostly because it's about a thousand lines of code.
Creating the file that will later be saved (happens in the parent foreach) :
$file = new PHPExcel();
Cloning (happens inside a child foreach after the creation above) :
$sd = $objReader->load("sheet.xls");
$sc = $sd ->getActiveSheet()->copy();
$clonedSheet = clone $sc;
Appending (happens N times inside a child foreach of the cloning above) :
$ficheName = "not relevant tbh and less than 31 characters";
$temporarySheet = clone $clonedSheet;
$temporarySheet->setTitle($ficheName);
$file->addSheet($temporarySheet,0);
$file->setActiveSheetIndex($file->getIndex($temporarySheet));
unset($temporarySheet);
// some actions are done here
Saving (outside of the child foreach, in the same parent foreach where the PHPExcel object gets created) :
$objWriter = PHPExcel_IOFactory::createWriter($file, 'Excel5');
$objWriter->save($filename);
Restrictions
I have absolutely no restrictions on which Excel format I use. I'm using 2003 because I have some machines that only work with Excel 2003, but they will soon be upgraded to Office 2010, so literally any reader and writer is okay; I'm using 2003 because I've always used it and have had no problems so far.
I am forced, though, to clone the XLS sheet inside another file, the only possible trick I can do is clone the sheet inside the same file and save it later by keeping the original one, but if there is any other chance to "export" the style I would really appreciate it.
What I have already checked:
PHPExcel clone .xlsm with macros
http://www.mindfiresolutions.com/Cloning-a-XLS-worksheet-in-PHP--Mindfire-Solutions-933.php
PHPExcel 1.8.0 - Creating many sheets by cloning a template sheet gets slower with each clone
Workaround for copying style with PHPExcel
EDIT ::
I've also tried to:
Open the file and get the sheet instead of cloning the original one - Problem persists.
Tried to use Excel2007 both for reading and writing - Problem persists.
Tried NOT to use ->copy() - Problem persists.
UPDATED phpexcel to 1.8, now the Notice above appears on line 1079, but refers to the same exact piece of code - Problem persists.
Okay, I've figured out a possible workaround.
Because the problem seems to be with:
clone
PHPExcel Worksheet ->copy() prototype
Referencing PHPExcel Worksheet
I've thought about this:
Instead of creating a new PHPExcel object instance, just OPEN the original file.
Append the file with other instances of the same file, by copying the sheet still from the same file.
Remove the LAST sheet when finished.
So, in a nutshell, I've changed this:
$file = new PHPExcel();
To this:
$file = $objReader->load("sheet.xlsx"); // decided to work with excel2007
And this:
$objWriter = PHPExcel_IOFactory::createWriter($file, 'Excel5');
$objWriter->save($filename);
To this:
$sheetCount = $file->getSheetCount();
$file->removeSheetByIndex($sheetCount - 1);
$objWriter = PHPExcel_IOFactory::createWriter($file, 'Excel2007'); // same story, excel 2007 instead of 2003
$objWriter->save($filename);
Now I don't get any errors and everything is working as expected, although I'm sure there may be a cleverer solution.
If you don't change the format of sheet.xls, then try the following:
A) use .xlsx
B) rename *.xlsx to *.zip
C) unzip sheet.zip and the files you have saved
D) copy xl/styles.xml from sheet to the saved files
E) repack and rename *.zip to *.xlsx
and your format is back.
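Steps B) to E) can also be scripted instead of done by hand. The following is a minimal sketch using only Python's standard zipfile module; the function name copy_styles and the file arguments are hypothetical. It assumes that both files are real .xlsx archives with the styles part at xl/styles.xml, and, like the manual steps, it only helps when the style indexes referenced by the saved file's cells match the template's style table.

```python
import zipfile

STYLES_PART = "xl/styles.xml"


def copy_styles(template_xlsx, target_xlsx, output_xlsx):
    """Write output_xlsx as a copy of target_xlsx with the template's styles part.

    output_xlsx must be a different path from target_xlsx, because the target
    is still being read while the output is written.
    """
    with zipfile.ZipFile(template_xlsx) as template:
        styles = template.read(STYLES_PART)

    with zipfile.ZipFile(target_xlsx) as source, \
            zipfile.ZipFile(output_xlsx, "w", zipfile.ZIP_DEFLATED) as out:
        for name in source.namelist():
            # Swap in the template's styles; copy every other part unchanged.
            data = styles if name == STYLES_PART else source.read(name)
            out.writestr(name, data)
```

A call such as copy_styles("sheet.xlsx", "generated.xlsx", "generated-styled.xlsx") would then stand in for the rename, unzip, copy and repack steps.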
You can reduce the problem a bit by not generating the files in a loop inside PHP, but rather running the PHP script itself in a loop.

PHPExcel: 'PHPExcel_Reader_excel.php' not found

I'm using PHPExcel and I have a problem: when creating a reader object I get this error:
Fatal error: Class 'PHPExcel_Reader_excel.php' not found in C:\xampp\htdocs\phpexcel\Classes\PHPExcel\IOFactory.php on line 170
my code is:
<?php
require_once(dirname(__FILE__)."/Classes/phpexcel.php");
//or
require_once(dirname(__FILE__)."/Classes/PHPExcel/IOFactory.php");
//$phpexcel = new PHPExcel();
$reader = PHPExcel_IOFactory::createReader("excel.php");
?>
I checked IOFactory.php on line 170 and found this:
$searchType = 'IReader';
// Include class
foreach (self::$_searchLocations as $searchLocation) {
    if ($searchLocation['type'] == $searchType) {
        $className = str_replace('{0}', $readerType, $searchLocation['class']);
        $instance = new $className();
        if ($instance !== NULL) {
            return $instance;
        }
    }
}
But it is not possible to locate any class because they are using _ instead of / (the path is phpexcel\Classes\PHPExcel\Reader and there are files like excel5.php and excel2007.php, but not excel.php).
What is wrong? The documentation is a little bit confusing.
Unless you've added a custom reader of your own called PHPExcel_Reader_excel.php then this will return an error.
As described in section 1 of PHPExcel User Documentation - Reading Spreadsheet Files online and in the /Documentation folder, there are 7 different readers available for 7 different spreadsheet formats:
PHPExcel can read a number of different spreadsheet file formats, although not all features are supported by all of the readers. Check the Functionality Cross-Reference document (Functionality Cross-Reference.xls) for a list that identifies which features are supported by which readers.
Currently, PHPExcel supports the following File Types for Reading:
Excel5
The Microsoft Excel™ Binary file format (BIFF5 and BIFF8) is a binary file format that was used by Microsoft Excel™ between versions 95 and 2003. The format is supported (to various extents) by most spreadsheet programs. BIFF files normally have an extension of .xls. Documentation describing the format can be found online at http://msdn.microsoft.com/en-us/library/cc313154(v=office.12).aspx or from http://download.microsoft.com/download/2/4/8/24862317-78F0-4C4B-B355-C7B2C1D997DB/[MS-XLS].pdf (as a downloadable PDF).
Excel2003XML
Microsoft Excel™ 2003 included options for a file format called SpreadsheetML. This file is a zipped XML document. It is not very common, but its core features are supported. Documentation for the format can be found at http://msdn.microsoft.com/en-us/library/aa140066%28office.10%29.aspx though it’s sadly rather sparse in its detail.
Excel2007
Microsoft Excel™ 2007 shipped with a new file format, namely Microsoft Office Open XML SpreadsheetML, and Excel 2010 extended this still further with its new features such as sparklines. These files typically have an extension of .xlsx. This format is based around a zipped collection of eXtensible Markup Language (XML) files. Microsoft Office Open XML SpreadsheetML is mostly standardized in ECMA 376 (http://www.ecma-international.org/news/TC45_current_work/TC45_available_docs.htm) and ISO 29500.
OOCalc
aka Open Document Format (ODF) or OASIS, this is the OpenOffice.org XML File Format for spreadsheets. It comprises a zip archive including several components all of which are text files, most of these with markup in the eXtensible Markup Language (XML). It is the standard file format for OpenOffice.org Calc and StarCalc, and files typically have an extension of .ods. The published specification for the file format is available from the OASIS Open Office XML Format Technical Committee web page (http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office#technical). Other information is available from the OpenOffice.org XML File Format web page (http://xml.openoffice.org/general.html), part of the OpenOffice.org project.
SYLK
This is the Microsoft Multiplan Symbolic Link Interchange (SYLK) file format. Multiplan was a predecessor to Microsoft Excel™. Files normally have an extension of .slk. While not common, there are still a few applications that generate SYLK files as a cross-platform option, because (despite being limited to a single worksheet) it is a simple format to implement, and supports some basic data and cell formatting options (unlike CSV files).
Gnumeric
The Gnumeric file format is used by the Gnome Gnumeric spreadsheet application, and typically files have an extension of .gnumeric. The file contents are stored using eXtensible Markup Language (XML) markup, and the file is then compressed using the GNU project's gzip compression library. http://projects.gnome.org/gnumeric/doc/file-format-gnumeric.shtml
CSV
Comma Separated Value (CSV) file format is a common structuring strategy for text format files. In CSV files, each line in the file represents a row of data and (within each line of the file) the different data fields (or columns) are separated from one another using a comma (“,”). If a data field contains a comma, then it should be enclosed (typically in quotation marks (")). Sometimes tabs “\t” or the pipe symbol (“|”) are used as separators instead of a comma. Because CSV is a text-only format, it doesn't support any data formatting options.
You need to specify the reader by name when you use the createReader() method, e.g:
$reader = PHPExcel_IOFactory::createReader("Excel5");
There are plenty of examples in the /Examples folder showing this usage for different readers, for letting PHPExcel itself select the correct reader using load(), and for verifying that your file is of the correct format before setting the reader using the identify() method.
I have to confess, I'd thought this documentation was fairly straightforward, especially with the examples that are included.
To make it easier, you could use
$objReader = PHPExcel_IOFactory::createReaderForFile($file);
and it will automatically pick a reader for your file.
