Here is my problem: My organization wants to upload Word documents from users to the server. On the server side, each Word document (with enforced styles) needs to be converted to XML files. Next, I need to use PHP to parse the Open XML format files and put the content into the database. Does anyone know how to convert Word to XML on the server side automatically? Is there any API or sample code for PHP to parse Open XML formats? Your suggestions are appreciated.
Have you looked at using VBA?
I have had to do similar work, and I used VBA within a WSF or VBS file. If your server is a Windows environment, it will run right from the OS. You can execute this from PHP (not recommended) or drop the .docx file into a hot folder outside of the web server environment. I recommend the latter, since the web server environment can introduce security issues.
Another note, if you want to separate content from styling, you're going to need to perform some post-processing on the output markup. Word is a "word" processor so styling is what it is designed to do. If this is a requirement, I would suggest moving to a structured, XML-based authoring tool instead.
Hope this helps!
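To make the post-processing point a bit more concrete, here is a minimal sketch of pulling paragraph text apart from its style name once you have the main document XML of a .docx file. It is written in Python purely for illustration; the function name and the sample XML are made up for this example, and a real document will contain far more markup (tables, runs with direct formatting, and so on).

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml in .docx files.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def paragraphs_with_styles(document_xml):
    """Return (style_id, text) pairs, one per <w:p> paragraph."""
    root = ET.fromstring(document_xml)
    rows = []
    for p in root.iter(f"{{{W}}}p"):
        style = p.find("w:pPr/w:pStyle", NS)
        # Paragraphs without an explicit style use the document default.
        style_id = style.get(f"{{{W}}}val") if style is not None else None
        text = "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))
        rows.append((style_id, text))
    return rows


# Hand-written sample standing in for a real word/document.xml.
SAMPLE = f"""<w:document xmlns:w="{W}"><w:body>
<w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr><w:r><w:t>Title</w:t></w:r></w:p>
<w:p><w:r><w:t>Body </w:t></w:r><w:r><w:t>text.</w:t></w:r></w:p>
</w:body></w:document>"""

if __name__ == "__main__":
    for style_id, text in paragraphs_with_styles(SAMPLE):
        print(style_id, "|", text)
```

Each pair could then become one database row, with the style name deciding which column or table the text belongs to. The same walk should be possible in PHP with its built-in XML extensions.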
Please see the screenshot below. I have been sent this file by a client. The file is exported from another piece of software, and it loads in Word as a fully styled and presented document, but the filename has an "xml" extension.
This is a bit confusing. The source starts out as XML but is followed by a load of what looks like encrypted garbage, so is it really XML at all? I don't quite understand why their software marks it as XML if it isn't.
My second question is: Is this actually a Word document? Normally it would be .doc or .docx; I have never seen a Word document saved with an .xml extension before.
Final question: How would we then use PHP to convert this to a PDF? I don't mind using third-party software installed on the server and run via the command line; I just need to figure out how to convert this. I could do it in C#, but PHP is preferred so that our backend server can do the work rather than the desktop software we have that links with it.
Is there any way to display Word documents, Excel sheets, and PowerPoint presentations in the browser without downloading them?
I assume that you are going to use PHP for this, so you can try checking some libraries, such as PHPWord from the PHPOffice project.
If you only wish to display the document content, that is possible with a scripting language such as PHP. Basically, Office 2007+ formats are zipped XML documents with a changed extension. Make a simple Word 2007+ document, save it, and change the extension from .docx to .zip; then you can extract it and see what it's made of. You can find a lot of details here. Displaying the content may be a little tricky, though. As mentioned, there are libraries out there to handle this, but I am not really sure how well they handle the documents. Most of them are abandoned, and PHPWord has been in beta since 2011.
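The "zipped XML" point can also be checked from a script instead of renaming the file by hand. The following is a minimal sketch in Python (chosen only for illustration; PHP's ZipArchive class can do the same job). The helper names are hypothetical, and make_sample_docx builds a tiny stand-in archive rather than a valid Word file.

```python
import io
import zipfile


def list_parts(docx_bytes):
    """Return the names of the files packed inside a .docx archive."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        return sorted(archive.namelist())


def read_main_part(docx_bytes):
    """Return the raw XML of the main document part as text."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        return archive.read("word/document.xml").decode("utf-8")


def make_sample_docx():
    """Build a tiny stand-in archive; a real .docx contains more parts."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr("word/document.xml", "<w:document/>")
    return buffer.getvalue()


if __name__ == "__main__":
    sample = make_sample_docx()
    print(list_parts(sample))
    print(read_main_part(sample))
```

With a real file you would pass the uploaded bytes instead of the sample. In a Word document the body text lives in word/document.xml, which is the part to parse for display.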
There are some indications that Apache is working on a cloud version of OpenOffice, but there is no release date yet. Once it is done, you will have a full-featured office suite web app.
If you feel really creative, you could use a cron job (or a scheduled task, if you like Windows) to open a document, take a screenshot, and basically make a .jpg or .png version of the document, which can be displayed in a browser without much complication. This works fine with short documents; longer ones may be problematic. It is also possible to schedule an export to .pdf, since most browsers can display PDFs natively or through a plugin.
To sum up, using PHP for parsing simple documents should be fine, but getting complex documents to display properly may be a much more difficult task and possibly not worth your time. I would go for a cron export to PDF, to preserve most if not all of the document's structure.
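As a rough idea of what the cron export could look like, here is a sketch that converts every office file in one folder to PDF in another. It assumes LibreOffice is installed with its soffice binary on the PATH (my assumption, not something stated above), and the folder layout and function names are hypothetical. The script is in Python only for illustration.

```python
import subprocess
from pathlib import Path

# Assumption: LibreOffice is installed and "soffice" is on the PATH.
SOFFICE = "soffice"
EXTENSIONS = {".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx"}


def build_command(source, out_dir):
    """Return the headless LibreOffice command that converts source to PDF."""
    return [SOFFICE, "--headless", "--convert-to", "pdf",
            "--outdir", str(out_dir), str(source)]


def pending_documents(in_dir, out_dir):
    """Return office files in in_dir that have no PDF in out_dir yet."""
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    return [
        path for path in sorted(in_dir.iterdir())
        if path.suffix.lower() in EXTENSIONS
        and not (out_dir / (path.stem + ".pdf")).exists()
    ]


def convert_all(in_dir, out_dir):
    """Convert every pending document; intended to be run from cron."""
    for path in pending_documents(in_dir, out_dir):
        subprocess.run(build_command(path, out_dir), check=True, timeout=120)
```

A cron entry would then call convert_all with the upload folder and a public PDF folder. Skipping files that already have a PDF keeps repeated runs cheap.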
I don't know much about Delphi / ClientDataSets but I'm willing to look into it. I have a question before I pursue it though, to determine if what I want to achieve is feasible.
I want to use a PHP script to save a dozen subsets of my MySQL database to CDS files once weekly. Is there a file specification that I can follow to create a CDS file? I'll be running the script on a shared web host using Linux, so I don't think running Delphi scripts on the server is viable.
Thanks!
There is a related question on Stack Overflow which includes a partial XSD:
Anyone that has a partial XSD that describes the METADATA section of Delphi TClientDataSet XML files?
You can use this XSD and an XML library to create XML files from your data which are compatible with TClientDataSet, so they can be opened in a Delphi application.
I don't know PHP XML libraries, but in many languages XML libraries are able to create mapping code based on the XSD, which then can be used to read and write XML files based on the schema definition.
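Without generated mapping code, the file can also be built by hand with a plain XML library. The sketch below is in Python for illustration only. The element and attribute names (DATAPACKET, METADATA, FIELDS, FIELD, PARAMS, ROWDATA, ROW, attrname, fieldtype, WIDTH) follow the layout commonly seen in XML files saved by TClientDataSet, but treat them as an assumption and check them against the XSD and a file saved from Delphi before relying on this.

```python
import xml.etree.ElementTree as ET


def build_datapacket(fields, rows):
    """Build a DATAPACKET document as a string.

    fields: list of (name, fieldtype, width) tuples; width may be None.
    rows: list of dicts mapping field name to value.
    """
    packet = ET.Element("DATAPACKET", Version="2.0")
    metadata = ET.SubElement(packet, "METADATA")
    field_list = ET.SubElement(metadata, "FIELDS")
    for name, fieldtype, width in fields:
        attrs = {"attrname": name, "fieldtype": fieldtype}
        if width is not None:
            attrs["WIDTH"] = str(width)
        ET.SubElement(field_list, "FIELD", attrs)
    ET.SubElement(metadata, "PARAMS")
    rowdata = ET.SubElement(packet, "ROWDATA")
    for row in rows:
        # Each record is one <ROW> element with one attribute per field.
        ET.SubElement(rowdata, "ROW", {k: str(v) for k, v in row.items()})
    return ET.tostring(packet, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical two-column table; "i4" and "string" are assumed type names.
    print(build_datapacket(
        [("ID", "i4", None), ("NAME", "string", 20)],
        [{"ID": 1, "NAME": "Ann"}],
    ))
```

The rows would come from the MySQL query results. Opening one generated file in a small Delphi test application is the quickest way to confirm the field types were accepted.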
We run multiple Windows/IIS/.Net sites (up to 30+ sites per server). Each site is customized for the individual customer via a configuration file that contains the settings.
I am tasked with writing a small tool that will 'grep' all of the config files on a certain server for a particular config setting (or settings) and return the values for display in a nice table on a web page. It will save many groups lots of time, especially since most groups don't have access to production servers but need to know how a customer is currently configured.
I have working code that finds all .config files from a starting path, and I can easily extend this to do my grep'ing. Here are the challenges:
I want to aggregate this data from MULTIPLE servers. That means, the tool will be hosted on its own server -- and will make calls to a list of servers.
I'm limited to using .NET/ASP on the actual servers (they won't install PHP on IIS), but I'm writing the tool in PHP.
PROPOSED DESIGN: From my vantage point, I'm thinking the best way to accomplish this is to write my PHP tool and have it make AJAX or CURL requests to ASP scripts that live on each server in the list. Each ASP script could do the recursive directory parsing to find the config files and individually grep the files for the data, and return it in the RESPONSE.
Is that the best way to accomplish this? Should the ASP or PHP side do the 'heavy lifting'? Is there a recommended data format I should be using to pass the data?
Any ideas or samples would be great. If you need more info, I can provide!
Thanks!
Update: Here's an example of a config. It's a basic ASP file that gets included in other scripts.
custConfig1 = " 8,9,6:5:5 "
custConfig2 = " On "
I think you're bang on using PHP for the "receiving" script, and pretty sure you have that in hand.
Based on the format of your example config file, you could use ExecuteGlobal in classic ASP to load each file as you loop through them in your recursive directory lookup. Then you can use the custConfig1 et al. names in your script. e.g. (pseudo)
For Each file In configFiles
    ExecuteGlobal fileText   ' fileText: contents of the current file, read beforehand
    output("custConfig1") = custConfig1
Next
Return what you need as JSON using a handy library and then do all the "hard" work of collating it and outputting it in PHP.
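For the collating side, here is a rough sketch of turning the per-server JSON into rows for a table. It is in Python for illustration; in PHP the same loop would sit on top of json_decode. The payload shape (site name mapped to its settings) and the server and site names are assumptions made up for this example.

```python
import json


def collate(responses, settings):
    """Flatten per-server JSON payloads into table rows.

    responses: dict mapping server name to the JSON text it returned.
    settings: setting names to show as columns.
    """
    rows = []
    for server, payload in sorted(responses.items()):
        sites = json.loads(payload)  # assumed shape: {site: {setting: value}}
        for site, config in sorted(sites.items()):
            row = {"server": server, "site": site}
            for name in settings:
                # Values in the sample config are padded with spaces.
                row[name] = config.get(name, "").strip()
            rows.append(row)
    return rows


if __name__ == "__main__":
    # Hypothetical response from one server.
    responses = {
        "web01": json.dumps(
            {"siteA": {"custConfig1": " 8,9,6:5:5 ", "custConfig2": " On "}}
        ),
    }
    for row in collate(responses, ["custConfig1", "custConfig2"]):
        print(row)
```

Keeping the ASP scripts down to "read and return" and doing the filtering here means a new setting only needs a change in one place.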
Yes. In my humble opinion, "grep" (if by that you mean importing a text file and using regular expressions to navigate it) isn't the best solution. Use either JSON or XML as the format, and use PHP's built-in XML or JSON tools.
JSON: http://php.net/manual/en/book.json.php
XML: http://php.net/manual/en/book.simplexml.php
You could use the DOM to navigate the XML instead of SimpleXML, but SimpleXML is easier to learn (again, in my opinion) and will work for your needs.
I realise this may just be speculation, but I'd appreciate comments from anyone who has some insight into this.
Something like MS Word COM add-in, or an OO bridge, or a custom implementation.
The reason I want to know is that I want to provide basic online document editing (really basic, just rich text at this point) for a PHP web app. I'm guessing I will store the markup in HTML format and then convert to RTF/DOC etc. for user convenience.
The Apache POI project (written in Java) offers an interface to many file types from the MS Office suite.
You can run the Java code from within PHP using the PHP/Java bridge.
I used this once for an application where MS Word documents had to be indexed in a web application. I remember that setting everything up was quite a hassle, but then it worked very well and reasonably fast. (Unfortunately, the code was written in PHP4 and I don't own it, so I cannot help you out with any snippets here.)
P.S. I cannot post links since I'm a new user, so google for "Apache POI" and "PHP/Java bridge" to get to the respective projects' homepages.
This class might help you. I've never used it but here are some links:
Reading from a Word Document with COM in PHP
create a word document
Create Word Document using PHP in Linux
They have probably written their own, maybe starting from wvWare or something similar. I have noticed that Google Desktop on Linux seems to use wvWare to parse MS Word documents.
The documentation for the Word file formats is available, but reading through it makes you realize that it would not be an easy task.
Automating Word or OpenOffice would be the easiest, but there might be licensing issues with using Word like that, and possible concurrency issues with using either of them on a web server.
A popular way to do it is to generate RTF and give the file a .doc extension. It works fine with Word and other editors, and users remain happy that it is "a DOC file".
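A minimal sketch of the RTF approach, in Python for illustration: escape the text, wrap it in a small RTF header, and save the result with a .doc extension. The function names are hypothetical, the header is about the smallest one that should open, and only plain text with line breaks is handled here (no bold, tables or images).

```python
def escape_rtf(text):
    """Escape plain text for use inside an RTF document."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)
        elif ch == "\n":
            out.append("\\par\n")
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # Non-ASCII: emit \uN? per UTF-16 code unit, as signed 16-bit.
            data = ch.encode("utf-16-le")
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "little", signed=True)
                out.append("\\u%d?" % unit)
    return "".join(out)


def make_rtf(text):
    """Wrap plain text in a minimal RTF document."""
    header = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Arial;}}\\f0\\fs24 "
    return header + escape_rtf(text) + "}"


if __name__ == "__main__":
    # The output is pure ASCII, so it can be written as-is to a .doc file.
    print(make_rtf("Hello {world}\nSecond line"))
```

Since the stored markup would be HTML, a real converter needs an extra step that maps tags such as bold and italic onto RTF control words before this wrapping is applied.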