I need help. I want to display a Word document (.doc or .docx) as HTML using PHP, without downloading it. I tried the code below, but it didn't work: it shows the document as plain text (no margins, no paragraphs, not as it looks in the .docx).
<?php
function docx2text($filename) {
return readZippedXML($filename, "word/document.xml");
}
function readZippedXML($archiveFile, $dataFile) {
// Create new ZIP archive
$zip = new ZipArchive;
// Open received archive file
if (true === $zip->open($archiveFile)) {
// If done, search for the data file in the archive
if (($index = $zip->locateName($dataFile)) !== false) {
// If found, read it to the string
$data = $zip->getFromIndex($index);
// Close archive file
$zip->close();
// Load XML from a string
// Skip errors and warnings
$xml = new DOMDocument();
$xml->loadXML($data, LIBXML_NOENT | LIBXML_XINCLUDE | LIBXML_NOERROR | LIBXML_NOWARNING);
// Return data without XML formatting tags
return strip_tags($xml->saveXML());
}
$zip->close();
}
// In case of failure return empty string
return "";
}
echo docx2text("resume.docx"); // Save this contents to file
?>
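One reason the output loses its paragraphs is that strip_tags() throws away the w:p (paragraph) elements together with every other tag. Below is a minimal sketch, not a full converter, that keeps one HTML paragraph per w:p element. The function name documentXmlToHtml is made up for illustration, and it assumes you pass it the XML string read from word/document.xml. It ignores margins, fonts, tables, images and text boxes, so the result will still not look exactly like the document in Word.

```php
<?php
// Namespace URI of WordprocessingML elements such as w:p and w:t.
const WORD_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

// Hypothetical helper: turn the XML of word/document.xml into simple HTML paragraphs.
function documentXmlToHtml(string $documentXml): string
{
    if ($documentXml === '') {
        return '';
    }
    $dom = new DOMDocument();
    // Suppress parser warnings; an unparsable string yields an empty result.
    if (!@$dom->loadXML($documentXml)) {
        return '';
    }
    $html = '';
    // Each w:p element is one paragraph.
    foreach ($dom->getElementsByTagNameNS(WORD_NS, 'p') as $paragraph) {
        $text = '';
        // The visible text of a paragraph lives in its w:t elements.
        foreach ($paragraph->getElementsByTagNameNS(WORD_NS, 't') as $run) {
            $text .= $run->textContent;
        }
        // Escape the text so it is safe to place inside HTML.
        $html .= '<p>' . htmlspecialchars($text) . "</p>\n";
    }
    return $html;
}
```

You would call it with the string returned by $zip->getFromIndex($index) instead of running strip_tags() on it.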
Goal
I have never touched PHP.
My goal is to retrieve BLOB .docx content from MySQL. I have found this resource to help me: Get content of docx file which saved in mysql dabase as blob type in php
I have just installed something called xampp along with Apache and PHP.
I created a folder within htdocs called technical. Inside it I have two files, test3.php and test.docx.
At this moment I am not using MySQL at all. I am trying to see what PHP can do for me.
I have copied the code from the link above.
Code
<?php
/*Name of the document file*/
$document = 'test.docx';
/**Function to extract text*/
function extracttext($filename) {
//Check for extension
$ext = end(explode('.', $filename));
//if its docx file
if($ext == 'docx')
$dataFile = "word/document.xml";
//else it must be odt file
else
$dataFile = "content.xml";
//Create a new ZIP archive object
$zip = new ZipArchive;
// Open the archive file
if (true === $zip->open($filename)) {
// If successful, search for the data file in the archive
if (($index = $zip->locateName($dataFile)) !== false) {
// Index found! Now read it to a string
$text = $zip->getFromIndex($index);
// Load XML from a string
// Ignore errors and warnings
$xml = DOMDocument::loadXML($text, LIBXML_NOENT | LIBXML_XINCLUDE | LIBXML_NOERROR | LIBXML_NOWARNING);
// Remove XML formatting tags and return the text
return strip_tags($xml->saveXML());
}
//Close the archive file
$zip->close();
}
// In case of failure return a message
return "File not found";
}
echo extracttext($document);
?>
I run it on localhost, following instructions from other websites on how to run PHP files.
Output
Notice: Only variables should be passed by reference in C:\xampp\htdocs\technical\test3.php on line 9
Testing
You need to store the result of the explode() call on line 9 in a variable, and then pass that variable to the end() function. This will resolve your problem.
Refer to the corrected code below.
<?php
/*Name of the document file*/
$document = 'test.docx';
/**Function to extract text*/
function extracttext($filename) {
//Check for extension
$tmp = explode('.', $filename);
$ext = end( $tmp );
//if its docx file
if($ext == 'docx')
$dataFile = "word/document.xml";
//else it must be odt file
else
$dataFile = "content.xml";
//Create a new ZIP archive object
$zip = new ZipArchive;
// Open the archive file
if (true === $zip->open($filename)) {
// If successful, search for the data file in the archive
if (($index = $zip->locateName($dataFile)) !== false) {
// Index found! Now read it to a string
$text = $zip->getFromIndex($index);
// Load XML from a string
// Ignore errors and warnings
// (loadXML() is an instance method, so create the DOMDocument first)
$xml = new DOMDocument();
$xml->loadXML($text, LIBXML_NOENT | LIBXML_XINCLUDE | LIBXML_NOERROR | LIBXML_NOWARNING);
// Remove XML formatting tags and return the text
return strip_tags($xml->saveXML());
}
//Close the archive file
$zip->close();
}
// In case of failure return a message
return "File not found";
}
echo extracttext($document);
?>
The end() function takes its array argument by reference (see https://www.php.net/manual/en/function.end.php), and you passed it the result of another function directly. You have to store the result of explode() in a variable and pass that variable to end().
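As a small sketch of that fix in isolation (the function names fileExtension and fileExtensionViaPathinfo are made up for this example), the first helper stores the explode() result in a variable before calling end(), and the second avoids end() altogether by using pathinfo():

```php
<?php
// Hypothetical helper: return the lowercase extension of a file name.
function fileExtension(string $filename): string
{
    // end() takes its argument by reference, so give it a real variable.
    $parts = explode('.', $filename);
    return strtolower(end($parts));
}

// Hypothetical alternative that does not need end() at all.
function fileExtensionViaPathinfo(string $filename): string
{
    return strtolower(pathinfo($filename, PATHINFO_EXTENSION));
}
```

The two differ for names without a dot: the first returns the whole name, while the pathinfo() version returns an empty string.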
I have an MS Word file sitting on the server.
When a user tries to open it, I want it to open directly in MS Word on their computer.
<?php
$document ='MyWordDocument.docx';
/**Function to extract text*/
function extracttext($filename)
{
//Check for extension
$ext = end(explode('.', $filename));
//if its docx file
if($ext == 'docx')
$dataFile = "word/document.xml";
//else it must be odt file
else
$dataFile = "content.xml";
//Create a new ZIP archive object
$zip = new ZipArchive;
// Open the archive file
if (true === $zip->open($filename)) {
// If successful, search for the data file in the archive
if (($index = $zip->locateName($dataFile)) !== false) {
// Index found! Now read it to a string
$text = $zip->getFromIndex($index);
// Load XML from a string
// Ignore errors and warnings
$xml = DOMDocument::loadXML($text, LIBXML_NOENT | LIBXML_XINCLUDE | LIBXML_NOERROR | LIBXML_NOWARNING);
// Remove XML formatting tags and return the text
return strip_tags($xml->saveXML());
}
//Close the archive file
$zip->close();
}
// In case of failure return a message
return "File not found";
}
echo extracttext($document);
I tried the code above, but it reads the file and displays its content in the browser.
Can we achieve this using PHP?
The only thing you can do is download* the file to the user's computer. It is impossible to force the file to open as well. Why? Security.
*or display a download dialog, depending on the browser and its settings.
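A minimal sketch of offering the file as a download, assuming the .docx sits at a path your script can read. The helper names docxDownloadHeaders and sendDocx are made up for this example; the header values are the standard ones for a .docx attachment.

```php
<?php
// Hypothetical helper: build the headers that ask the browser to download a .docx file.
function docxDownloadHeaders(string $path): array
{
    return [
        'Content-Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document',
        'Content-Disposition: attachment; filename="' . basename($path) . '"',
        'Content-Length: ' . filesize($path),
    ];
}

// Hypothetical helper: send the headers, then stream the file to the client.
// Nothing may be echoed before this is called, or the headers cannot be sent.
function sendDocx(string $path): void
{
    foreach (docxDownloadHeaders($path) as $header) {
        header($header);
    }
    readfile($path);
}
```

You would call sendDocx('MyWordDocument.docx') instead of echoing the extracted text. Whether the browser then saves the file or offers to open it in Word is up to the browser and the user's settings.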
I'm working on a project where I need to create a .docx document. I was using PHPWord, loading a template and then saving the file. This document has a lot of nested tables and PHPWord is breaking the tables after some replaces in the template.
So I decided to save the document as Word XML document (.xml) and do the replaces myself. I will load the text into a variable, do the replaces and then save as a new word document. My problem is that I don't know how to create a .docx document using a .xml.
Would you have some code snippets I could use?
Thanks for any help
I have come up with the piece of code below. It saves the file, but when I try to open it in Word, it reports an invalid document.
$xmlString = simplexml_load_file($this->config->application->fileTemplateFolder.'coi.xml')->asXML();
$xmlString = str_replace('${coi_number}', $coi['application_number'], $xmlString);
$path = $this->config->application->fileTemplateFolder.'test.docx';
$zip = new ZipArchive();
$zip->open($path, ZipArchive::CREATE);
$zip->addFromString("word/document.xml", $xmlString);
$zip->close();
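Here is how I solved the issue: I copy the existing .docx template, which already contains the other files of the archive, and replace only word/document.xml in the copy: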
private function CreateWordDocument($xmlString) {
$templateFolder = $this->config->fileTemplateFolder;
// Make sure the folder path ends with a slash (endsWith() is not a PHP built-in)
$templateFolder = rtrim($templateFolder, '/').'/';
$temp_file = tempnam(sys_get_temp_dir(), 'coi_').'.docx';
copy($templateFolder. 'coi.docx', $temp_file);
$zip = new ZipArchive();
if($zip->open($temp_file)===TRUE) {
$zip->deleteName('word/document.xml');
$zip->addFromString("word/document.xml", $xmlString);
$zip->close();
return $temp_file;
}
else {
return null;
}
}
$dir = "temp/docx";
$errors = array();
$zip = new ZipArchive;
if($zip->open($file_path) === false){
$errors[] = 'Failed to open file';
}
if (empty($errors)) {
$zip->extractTo($dir,"word/document.xml");
$zip->close();
$files = scandir($dir);
print_r($files);
}
OK, so basically the extraction won't work for some reason. After seeing that the folders were empty, I added a scandir() call to check whether the files were being deleted after the PHP script finished. Nothing: $files contains nothing (apart from .. and ., of course).
The zip is actually a .docx file. After explicitly checking for errors, PHP seems to think that opening the archive works, but I'm not sure whether this is just a false positive.
I'm wondering if this happens because the file is actually a .docx and I need to explicitly save it as a zip file on the server. Or perhaps it is because this runs straight after the upload and the temp file gets deleted before I can do anything with it (which I imagine is unlikely, as other formats work fine). Perhaps neither assumption is close, or maybe I wrote the whole thing wrong. Any help?
Here you go:
<?php
/*Name of the document file*/
$document = 'demo.docx';
/*Directory*/
$dir = "temp/docx/";
/**Function to extract text*/
function extracttext($filename, $action) {
//Use the target directory defined above
global $dir;
//Check for extension
$tmp = explode('.', $filename);
$ext = end($tmp);
//Check if DOCX file
if($ext == 'docx'){
$dataFile = "word/document.xml";
//else it's probably an ODT file
} else {
$dataFile = "content.xml";
}
//Create a new ZIP archive object
$zip = new ZipArchive;
// Open the archive file
if (true === $zip->open($filename)) {
// If successful, search for the data file in the archive
if (($index = $zip->locateName($dataFile)) !== false) {
// Index found! Now read it to a string
$text = $zip->getFromIndex($index);
// Load XML from a string
// Ignore errors and warnings
$xml = new DOMDocument();
$xml->loadXML($text, LIBXML_NOENT | LIBXML_XINCLUDE | LIBXML_NOERROR | LIBXML_NOWARNING);
if($action == "save"){
// Create the target folder if needed, then save xml to file
if (!is_dir($dir . "word")) {
mkdir($dir . "word", 0777, true);
}
file_put_contents($dir ."word/document.xml", $xml->saveXML());
return "File successfully saved.";
} else if($action == "text"){
// Remove XML formatting tags and return the text
return strip_tags($xml->saveXML());
}
}
//Close the archive file
$zip->close();
}
// In case of failure return a message
return "File not found";
}
//Save xml file
echo extracttext($document, "save");
//Echo text from file
echo extracttext($document, "text");
?>
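One more thing worth checking in the question's code: ZipArchive::open() returns true on success and an integer error code on failure, so a check of the form `=== false` never fires and a failed open can look like a success. Below is a minimal sketch of extracting a single entry with that check done against true. The helper name extractDocxEntry is made up for this example, and it assumes the target directory is writable.

```php
<?php
// Hypothetical helper: extract one entry (e.g. word/document.xml) from a .docx into $dir.
// Returns true on success, false if the archive cannot be opened or the entry cannot be extracted.
function extractDocxEntry(string $archive, string $entry, string $dir): bool
{
    $zip = new ZipArchive();
    // open() returns true on success and an integer error code otherwise,
    // so compare against true rather than false.
    if ($zip->open($archive) !== true) {
        return false;
    }
    // extractTo() keeps the entry's internal path, e.g. $dir/word/document.xml.
    $ok = $zip->extractTo($dir, $entry);
    $zip->close();
    return $ok;
}
```

For example, extractDocxEntry($file_path, 'word/document.xml', 'temp/docx') would place the file at temp/docx/word/document.xml when it succeeds.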
I am saving a .docx file as a BLOB in a MySQL database. After saving it, I try to view the file's content by fetching the field, but it shows unreadable content. This works well for files with the .doc extension, but I don't know why it doesn't work for .docx files. If you have an answer, please include a proper explanation.
Make a query to select the data, then put the result in a variable.
Use file_put_contents() to write the .docx file to disk. Just be careful with the headers.
To read it, the process is different from a .doc. You have to "unzip" the .docx and read the XML file inside it. You can use this function:
<?php
/*Name of the document file*/
$document = 'filename.docx';
/**Function to extract text*/
function extracttext($filename) {
//Check for extension
$tmp = explode('.', $filename);
$ext = end($tmp);
//if its docx file
if($ext == 'docx')
$dataFile = "word/document.xml";
//else it must be odt file
else
$dataFile = "content.xml";
//Create a new ZIP archive object
$zip = new ZipArchive;
// Open the archive file
if (true === $zip->open($filename)) {
// If successful, search for the data file in the archive
if (($index = $zip->locateName($dataFile)) !== false) {
// Index found! Now read it to a string
$text = $zip->getFromIndex($index);
// Load XML from a string
// Ignore errors and warnings
// (loadXML() is an instance method, so create the DOMDocument first)
$xml = new DOMDocument();
$xml->loadXML($text, LIBXML_NOENT | LIBXML_XINCLUDE | LIBXML_NOERROR | LIBXML_NOWARNING);
// Remove XML formatting tags and return the text
return strip_tags($xml->saveXML());
}
//Close the archive file
$zip->close();
}
// In case of failure return a message
return "File not found";
}
echo extracttext($document);
?>
(source of the code: http://www.botskool.com/geeks/how-extract-text-docx-or-odt-files-using-php)
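The first two steps (put the BLOB in a variable, write it to a file) can be sketched as follows. The helper name blobToTempFile is made up for this example, and $blob stands for whatever value your own query returned; the database access itself is left out. The file gets a .docx suffix because extracttext() above decides by extension which XML file to read.

```php
<?php
// Hypothetical helper: write BLOB bytes fetched from the database to a temporary
// .docx file so that ZipArchive, which works on file paths, can open it.
function blobToTempFile(string $blob): string
{
    $base = tempnam(sys_get_temp_dir(), 'docx_');
    if ($base === false) {
        throw new RuntimeException('Could not create a temporary file');
    }
    // Give the file a .docx extension, then write the raw bytes unchanged.
    $path = $base . '.docx';
    if (!rename($base, $path) || file_put_contents($path, $blob) === false) {
        throw new RuntimeException('Could not write the temporary file');
    }
    return $path;
}
```

You would then pass the returned path to extracttext() and unlink() the file once you are done with it.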
Docx is a zipped file type (see the tag wiki).
That's why you can't get the content of the document from the raw content.
I found this solution:
"update blob_table set blob_col=LOAD_FILE('$tmp_name')";
where $tmp_name is the file you upload. This is my answer to this six-year-old question, using the LOAD_FILE function. Maybe it is a newly added function in MySQL.