How to stop PHPExcel from re-rendering graphs in a template? - php

I am exporting info from MySQL to an Excel2007 file.
The script basically reads a template Excel2007 file, adds some data and writes it in a new Excel2007 file.
The template has some graphs within it.
The issue is that they get re-rendered and come out slightly different.
(axis title orientation changed, graph title orientation changed, scale changed...)
Is there any way to just add data without PHPExcel interpreting and recompiling the rest of the file?
Thanks!
Gabrie

'fraid not - PHPExcel is not a file editor. The Reader parses the file into the elements that PHPExcel recognises, discarding anything that it doesn't recognise. The Writer takes its elements from the PHPExcel object, unaware that there may have been elements discarded when it was originally loaded.
So it will either discard the chart completely (unless you set includeCharts to TRUE); or it will recreate the chart with some default values for elements or attributes that it doesn't yet handle; in more extreme cases, some features will be lost (secondary axes being a prime example); while in the most extreme it might completely corrupt a chart (but hopefully this should be extremely rare).
Text orientation isn't yet handled.
Scale is set purely to default (automatic scaling).
In time, more features will be added to charting (handling of stock charts is my next target), but until then it's still limited.
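For reference, a minimal sketch of the includeCharts route mentioned above, assuming a PHPExcel version with chart support is reachable on the include path. The function name, the file paths and the cell being written are placeholders for illustration, not part of the library. The charts are still re-created by the writer, so the limitations described above still apply.

```php
<?php
// Hypothetical helper: load a template, add a value, write a copy, keeping charts.
function copyTemplateWithCharts($templatePath, $outputPath)
{
    // Assumption: PHPExcel is installed and reachable on the include path.
    require_once 'PHPExcel/IOFactory.php';

    $reader = PHPExcel_IOFactory::createReader('Excel2007');
    $reader->setIncludeCharts(true);   // without this, charts are dropped on load
    $workbook = $reader->load($templatePath);

    // Placeholder edit; replace with the data pulled from MySQL.
    $workbook->getActiveSheet()->setCellValue('A1', 'example value');

    $writer = PHPExcel_IOFactory::createWriter($workbook, 'Excel2007');
    $writer->setIncludeCharts(true);   // without this, charts are dropped on save
    $writer->save($outputPath);
}
```

The flag has to be set on both the reader and the writer; setting it on only one side loses the charts.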

I think I have found the answer in a library: TinyButStrong - http://www.tinybutstrong.com/plugins/opentbs/demo/demo.html
Great for templating! Would be awesome to have them join forces with PHPExcel.
Thanks All for the time spent.

Related

How to export data to Excel template, retaining VBA code and data validation, using PHP?

I need to export data to an Excel template that contains VBA code and data validation in PHP.
I tried using PHPExcel library but it is removing the VBA code and data validations from the template.
I also tried the PHPReport library, but didn't get a proper solution.
The template contains multiple worksheets and they are interdependent.
E.g.: Worksheet 1 contains employee data, then worksheet 2 contains salary with respect to employee name.
I have spent a great deal of time working on this problem, and the problem of memory consumption for large data sets. All of the PHP libs I have found keep all of the cells in memory, which is not viable for anything more than a small sheet.
What I ended up doing was writing a set of Java utilities using Apache POI and packaged them with PHP/Java Bridge so I can call them from PHP. This will allow you to create a new workbook based upon a template, keeping the macros intact. You can also use POI's streaming API, so you can handle massive data sets without crashing your server.
If you have Java chops, I highly recommend going this route, it's really the only way to do brain surgery on Excel files from PHP.
If you have any questions about how to do this or would like some example code, I'll be glad to help.

Set the zoom level for the html file generated using PHPExcel?

I am trying to generate a HTML file using PHPExcel. I have more than 30 columns and would like to zoom the page.
I've tried using the code below, but it hasn't worked.
$objPHPExcel->getActiveSheet()->getPageSetup()->setFitToPage(true);
$objPHPExcel->getActiveSheet()->getPageSetup()->setFitToWidth(2);
I've also tried using this, but it hasn't worked either.
$objPHPExcel->getActiveSheet()->getSheetView()->setZoomScale(250);
Both options don't work in the case of HTML page, but the zoom works if it is an Excel file. Where am I going wrong?
As you may see from the chaining of commands, the setZoomScale() method is part of the Worksheet class and it will only have an impact in case a worksheet is written, not when it is read.
The PHPDoc for PHPExcel lists the internal command _writeSheetViews() for Excel2007 writer and _storeZoom for old Excel versions whereas PHPExcel_Writer_HTML doesn't offer similar behavior.
What you may try is adding custom CSS styling to the created HTML file that may use smaller font-sizes for the table. AFAIK you won't be able to change the zoom level of the browser programmatically.
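The CSS suggestion could be sketched as a post-processing step on the HTML that the writer produced. The helper name injectCss() and the CSS rule are made up for illustration. It only assumes that the generated page contains a closing head tag, and falls back to prepending the style block if it does not.

```php
<?php
// Hypothetical helper: insert a <style> block just before </head>.
function injectCss(string $html, string $css): string
{
    $style = "<style type=\"text/css\">\n{$css}\n</style>\n";
    $pos = stripos($html, '</head>');
    if ($pos === false) {
        // No head section found: put the styles in front of the markup.
        return $style . $html;
    }
    return substr($html, 0, $pos) . $style . substr($html, $pos);
}
```

For example, save the HTML file with the writer, read it back with file_get_contents(), and pass it through injectCss($html, 'table td { font-size: 8px !important; }'). Because the block is inserted last in the head, it comes after any styles already there; the !important is an assumption in case the generated styles set a font size per cell.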

Extracting dynamically changing data in excel via php

I have an open excel sheet that's constantly being updated by another program via DDE. I wish to have a php script that accesses some of the data in this excel sheet. I have tried using PHPExcel and it seems that I cannot have the changes I make (e.g. via setCellValue) being immediately reflected in the open excel sheet. Similarly, if I change value of a cell (without saving sheet to the file system) the new value of the cell is not available via getValue().
Is this functionality supported by phpExcel? If so, could someone please point me to documentation that shows how this can be done? Alternatively, is there another way (not using phpExcel, for example) to do this?
Thanks.
I was able to do this using the method shown at the rarified blog webpage
This worked for me, both for "pushing" cell values from php to excel, as well as getting modified values (without saving the file) from excel to perl. This site also has a nifty ajax-based function that keeps auto-refreshing my webpage with the latest values in excel.
Many thanks to the author of the blog.
It's not supported by PHPExcel... PHPExcel loads the workbook into memory at the point in time when you issue the load() call, and after that it can't "autorefresh" whenever the workbook is changed by your DDE, because the DDE update is to the workbook on disk, not the PHPExcel copy that's in PHP memory.
You'd need to be constantly loading and reloading to pick up changes to the underlying file.
Likewise, if you change the workbook in PHPExcel, it doesn't write that change back to the file on disk unless you explicitly save(), so the change will not be visible to your DDE program.
I'm not aware whether you can even do this with MS Excel itself... if you load a workbook using MS Excel itself, you're loading from disk into memory, and if anything else is accessing that workbook at the time, you find that you've loaded it in read-only mode, and (as far as I'm aware) it won't automatically refresh whenever the DDE program updates the original version. If anything can work with this the way you need, it's likely to be COM, but I wouldn't build up your hopes too much.

PHPExcel large data sets with multiple tabs - memory exhausted

Using PHPExcel I can run each tab separately and get the results I want, but if I add them all into one Excel file it just stops, with no error or anything.
Each tab consists of about 60 to 80 thousand records and I have about 15 to 20 tabs. So about 1600000 records split into multiple tabs (This number will probably grow as well).
Also I have tested the 65000 row limitation of .xls by using the .xlsx extension, with no problems if I run each tab in its own Excel file.
Pseudo code:
read data from db
start the PHPExcel process
parse out data for each page (some styling/formatting but not much)
(each numeric field value does get summed up in a totals column at the bottom of the excel using the formula SUM)
save excel (xlsx format)
I have 3GB of RAM so this is not an issue and the script is set to execute with no timeout.
I have used PHPExcel in a number of projects and have had great results but having such a large data set seems to be an issue.
Has anyone ever had this problem? Workarounds? Tips? etc...
UPDATE:
on error log --- memory exhausted
Besides adding more RAM to the box, are there any other tips I could try?
Has anyone ever saved the current state and edited the Excel file with new data?
I had the exact same problem and googling around did not find a valuable solution.
As PHPExcel generates Objects and stores all data in memory, before finally generating the document file which itself is also stored in memory, setting higher memory limits in PHP will never entirely solve this problem - that solution does not scale very well.
To really solve the problem, you need to generate the XLS file "on the fly". That's what I did, and now I can be sure that the "download SQL resultset as XLS" works no matter how many (million) rows are returned by the database.
Pity is, I could not find any library which features "drive-by" XLS(X) generation.
I found this article on IBM Developer Works which gives an example on how to generate the XLS XML "on-the-fly":
http://www.ibm.com/developerworks/opensource/library/os-phpexcel/#N101FC
Works pretty well for me - I have multiple sheets with LOTS of data and did not even touch the PHP memory limit. Scales very well.
Note that this example uses the Excel plain XML format (file extension "xml") so you can send your uncompressed data directly to the browser.
http://en.wikipedia.org/wiki/Microsoft_Office_XML_formats#Excel_XML_Spreadsheet_example
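A minimal sketch of that "on the fly" idea, not the code from the IBM article: each row is written to the output stream as soon as it is available, so memory use does not grow with the row count. The function name streamSpreadsheetXml() and the shape of its input (sheet name mapped to an iterable of rows) are assumptions made for this example.

```php
<?php
// Hypothetical helper: write Excel 2003 XML (SpreadsheetML) row by row.
// $out is an open stream; $sheets maps a sheet name to any iterable of rows,
// e.g. a generator that walks a database cursor.
function streamSpreadsheetXml($out, iterable $sheets): void
{
    fwrite($out, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
    fwrite($out, "<?mso-application progid=\"Excel.Sheet\"?>\n");
    fwrite($out, '<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"'
        . ' xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">' . "\n");

    foreach ($sheets as $name => $rows) {
        $safeName = htmlspecialchars((string) $name, ENT_QUOTES | ENT_XML1, 'UTF-8');
        fwrite($out, "<Worksheet ss:Name=\"{$safeName}\"><Table>\n");

        foreach ($rows as $row) {
            $cells = '';
            foreach ($row as $value) {
                // Ints and floats become Number cells, everything else String.
                $type = (is_int($value) || is_float($value)) ? 'Number' : 'String';
                $text = htmlspecialchars((string) $value, ENT_QUOTES | ENT_XML1, 'UTF-8');
                $cells .= "<Cell><Data ss:Type=\"{$type}\">{$text}</Data></Cell>";
            }
            // Only the current row is held in memory before it is written out.
            fwrite($out, "<Row>{$cells}</Row>\n");
        }

        fwrite($out, "</Table></Worksheet>\n");
    }

    fwrite($out, "</Workbook>\n");
}
```

To send the result straight to the browser, the stream could be fopen('php://output', 'w'), with each sheet's rows supplied by a generator that fetches one database row at a time.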
If you really need to generate an XLSX, things get even more complicated. XLSX is a compressed archive containing multiple XML files. For that, you must write all your data on disk (or memory - same problem as with PHPExcel) and then create the archive with that data.
http://en.wikipedia.org/wiki/Office_Open_XML
It may also be possible to generate compressed archives "on the fly", but this approach seems really complicated.

How do I extract the text layer and background layer from a PDF?

In my project I have to build a PDF viewer in HTML5/CSS3, and the application has to allow users to add comments and annotations. Actually, I have to do something very similar to crocodoc.com.
At the beginning I was thinking of creating images from the PDF and letting users mark an area and post comments associated with that area. Unfortunately, the client also wants to navigate in the PDF and add comments only on allowed sections (for example, paragraphs or selected text).
So now I'm facing one problem: how to get the text, and what the best way to do it is. If anybody has some clues on how I can achieve this, I would appreciate it.
I tried pdftohtml, but the output doesn't look like the original document, which is really complex (example of document). Even this one doesn't really reflect the output, but it is much better than pdftohtml.
I'm open to any solution, with a preference for the command line under Linux.
I've been down the same road as you, with even much more complex tasks.
After trying out everything I ended up using C# under Mono (so it runs on linux) with iTextSharp.
Even with a very complete library such as iTextSharp, some tasks required a lot of trial-and-error :)
Extracting the text from a page is easy (check the snippet below); however, if you intend to keep the text coordinates, fonts and sizes, you will have more work to do.
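using iTextSharp.text.pdf;

int pdf_page = 5;
string page_text = "";
PdfReader reader = new PdfReader("path/to/pdf/file.pdf");
// Tokenise the raw content stream of the chosen page.
PRTokeniser token = new PRTokeniser(reader.GetPageContent(pdf_page));
while (token.NextToken())
{
    if (token.TokenType == PRTokeniser.TokType.STRING)
    {
        // String operands carry the visible text.
        page_text += token.StringValue;
    }
    else if (token.StringValue == "Tj")
    {
        // "Tj" is the show-text operator; add a space after each text run.
        page_text += " ";
    }
}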
Do a Console.WriteLine(token.StringValue) on all tokens to see how paragraphs of text are structured in PDFs. This way you can detect coordinates, font, font size, etc.
Addition:
Given the task you are required to do, I have a suggestion for you:
Extract the text with coordinates, font families and sizes - all the information about each paragraph. Then do a PDF-to-images conversion, and in your online viewer, apply invisible selectable text over the paragraphs on the image where needed.
This way your users can select a part of the text where needed, without the need of reconstructing the whole PDF in html :)
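The overlay idea could be sketched like this in PHP. The function name and the shape of the $runs array are made up for the example, and the coordinates are assumed to be already converted to CSS pixels measured from the top-left corner of the page image (PDF coordinates normally start at the bottom-left, so a conversion step is needed first).

```php
<?php
// Hypothetical helper: page image with transparent, selectable text on top.
// $runs: list of ['text' => string, 'x' => float, 'y' => float, 'size' => float]
function renderPageOverlay(string $imageUrl, int $width, int $height, array $runs): string
{
    $esc = function ($s) {
        return htmlspecialchars((string) $s, ENT_QUOTES, 'UTF-8');
    };

    // The container is positioned so the spans can be placed relative to it.
    $html = sprintf(
        '<div class="page" style="position:relative;width:%dpx;height:%dpx">'
        . '<img src="%s" width="%d" height="%d" alt="">',
        $width, $height, $esc($imageUrl), $width, $height
    );

    foreach ($runs as $run) {
        // Transparent text stays selectable but does not hide the image.
        $html .= sprintf(
            '<span style="position:absolute;left:%.1Fpx;top:%.1Fpx;'
            . 'font-size:%.1Fpx;color:transparent;white-space:pre">%s</span>',
            $run['x'], $run['y'], $run['size'], $esc($run['text'])
        );
    }

    return $html . '</div>';
}
```

Each span can then carry an id or data attribute so comments can be attached to the paragraph it covers.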
I recently researched and discovered a native PHP solution to achieve this using FOSS. The FPDI PHP class can be used to import a PDF document for use with either the TCPDF or FPDF PHP classes, both of which provide functionality for creating, reading, updating and writing PDF documents. Personally, I prefer TCPDF over FPDF as it provides a larger feature set, a richer API, more usage examples and a more active community forum.
Choose one of the before mentioned classes, or another, to programmatically handle PDF documents. Focusing on both current and possible future deliverables, as well as the desired user experience, decide where (e.g. server - PHP, client - JavaScript, both) and to what extent (feature driven) your interactive logic should be implemented.
Personally, I would use a TCPDF instance obtained by importing a PDF document via FPDI to iteratively inspect, translate to a common format (XML, JSON, etc.) and store the resulting representation in relational tables designed to persist data pertinent to the desired level of document hierarchy and detail. The necessary level of detail is often dictated by a specifications document and its mention of both current and possible future deliverables.
Note: In this case, I strongly advise translating documents and storing them in a common format to create a layer of abstraction and transparency. For example, a possible and unforeseen future deliverable might be to provide the same application functionality for users uploading Microsoft Word documents. If the uploaded Microsoft Word document was not translated and stored in a common format then updates to the Web service API and dependent business logic would almost certainly be necessary. This ultimately results in storing bloated, sub-optimal data and inefficient use of development resources in designing, developing and supporting multiple translators. It would also be an inefficient use of server resources to translate outbound data for every request, as opposed to translating inbound data to an optimal format only once.
I would then extend the base document tables by designing and relating additional tables for persisting functionality specific document asset data such as:
Versioned Additions / Edits / Deletions
  What
    Header / Footer
    Text
      Original Value
      New Value
    Image
      Page(s) (one, many or all)
      Location (relative - textual anchor, absolute - x/y coordinates)
      File (relative or absolute directory or url)
    Brush (drawing)
      Page(s) (one, many or all)
      Location (relative - textual anchor, absolute - x/y coordinates)
      Shape (x/y coordinates to redraw line, square, circle, user defined, etc.)
      Type (pen, pencil, marker, etc.)
      Weight (1px, 3px, 5px, etc.)
      Color
    Annotation
      Page
      Location (relative - textual anchor, absolute - x/y coordinates)
      Shape (line, square, circle, user defined, etc.)
      Value (annotation text)
    Comment
      Target (page, another text/image/brush/annotation asset, parent comment - threading)
      Value (comment text)
  When
    Date
    Time
  Who
    User
Once some, all or more, of the document and its asset data has a place to persist I would design, document and develop a PHP Web service API to expose CRUD and PDF document upload functionality to the UI consumer, while enforcing core business rules. At this point, the remaining work now lies on the Client-side. Currently, I have relational tables persisting both a document and its asset data, as well as an API exposing sufficient functionality to the consumer, in this case the Client-side JavaScript.
I can now design and develop a Client-side application using the latest Web technologies such as HTML5, JavaScript and CSS3. I can upload and request PDF documents using the Web service API and easily render the returned common format out to the browser however I decide (probably HTML in this case). I can then use 100% native JavaScript and/or 3rd party libraries for DOM helper functionality, creating vector graphics to provide drawing and annotation features, as well as access and control functional and stylistic attributes of currently selected document text and/or images. I can provide a real-time collaborative experience by employing WebSockets (before mentioned WebService API does not apply), or a semi-delayed, but still fairly seamless experience using XMLHttpRequest.
From this point forward the sky is the limit and the ball is in your court!
It's a hard task you're trying to accomplish.
To read text from a PDF, have a look at PEAR's PDF_Reader proposal code.
There's also very extensive documentation around Zend_PDF(), which also allows the loading and parsing of a PDF document. The various elements of the PDF can be iterated over and thus transformed to HTML5 or whatever you like. You may even embed the annotations from your website into the PDFs and vice versa.
Still, you have been given no easy task. Good Luck.
pdftk is a very good tool for things like that (I don't know if it can do exactly this task).
http://www.pdflabs.com/docs/pdftk-cli-examples/
