Does anyone know how to merge (concatenate) docx documents with PHP (or Python if it's not possible in PHP)?
To clarify, my server is Linux based. I have two existing docx documents, and I need to put them into a new docx document using PHP or possibly Python.
Merging two different Docx files can be very complicated because headers, styles, charts, comments, tracked changes and other special content are saved in separate inner XML sub-files in each Docx. As a result, two Docx files may contain different objects that share the same ids. It would be a huge job to list all possible objects in the two documents, give them new inner ids, and reassign them in a single document. Probably only MS Office can do this currently.
Nevertheless, if you know that your two documents to be merged have the same styles, and if you know you have no charts, headers and other special objects, then the merging becomes something quite easy to perform.
In this case, you only have to use a Zip reader, such as TbsZip, to open the first Docx file (which is technically a zip archive containing XML sub-files), then read the sub-file "word/document.xml" and extract the part between the tags <w:body> and </w:body>.
In the second Docx file, open "word/document.xml" and insert the extracted content just before the tag </w:body>. Save the result in a new Docx file.
This can be done using TbsZip, like this :
<?php
include_once('tbszip.php');
$zip = new clsTbsZip();
// Open the first document
$zip->Open('doc1.docx');
$content1 = $zip->FileRead('word/document.xml');
$zip->Close();
// Extract the content of the first document
$p = strpos($content1, '<w:body');
if ($p===false) exit("Tag <w:body> not found in document 1.");
$p = strpos($content1, '>', $p);
$content1 = substr($content1, $p+1);
$p = strpos($content1, '</w:body>');
if ($p===false) exit("Tag </w:body> not found in document 1.");
$content1 = substr($content1, 0, $p);
// Insert into the second document
$zip->Open('doc2.docx');
$content2 = $zip->FileRead('word/document.xml');
$p = strpos($content2, '</w:body>');
if ($p===false) exit("Tag </w:body> not found in document 2.");
$content2 = substr_replace($content2, $content1, $p, 0);
$zip->FileReplace('word/document.xml', $content2, TBSZIP_STRING);
// Save the merge into a third file
$zip->Flush(TBSZIP_FILE, 'merge.docx');
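Since the question also allows Python, the same approach can be sketched with the standard zipfile module. This is only a minimal sketch under the same assumptions as above (identical styles, no images, charts or headers); the names merge_docx and body_inner are made up for this example and are not part of any library.

```python
import zipfile

DOC_PART = "word/document.xml"


def body_inner(xml):
    """Return everything between <w:body ...> and </w:body>."""
    start = xml.index("<w:body")          # raises ValueError if missing
    start = xml.index(">", start) + 1     # skip to the end of the opening tag
    end = xml.index("</w:body>", start)
    return xml[start:end]


def merge_docx(first_path, second_path, out_path):
    """Append the body of first_path to the end of second_path's body."""
    with zipfile.ZipFile(first_path) as first:
        inner = body_inner(first.read(DOC_PART).decode("utf-8"))

    with zipfile.ZipFile(second_path) as second, \
            zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as out:
        # Copy every member of the second archive, patching only the main part.
        for item in second.infolist():
            data = second.read(item.filename)
            if item.filename == DOC_PART:
                xml = data.decode("utf-8")
                pos = xml.rindex("</w:body>")
                data = (xml[:pos] + inner + xml[pos:]).encode("utf-8")
            out.writestr(item, data)
```

Calling merge_docx('doc1.docx', 'doc2.docx', 'merge.docx') mirrors the PHP snippet: the result is doc2 with the body of doc1 inserted just before its closing </w:body> tag.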
You can merge two Word documents with PHPDocX in a few lines of code (source: Merging Word documents with PHPDocX):
require_once 'path /classes/DocxUtilities.inc';
$newDocx = new DocxUtilities();
$myOptions = array('mergeType' => 0);
$newDocx->mergeDocx('firstWordDoc.docx', 'secondWordDoc.docx', 'mergedWord.docx',
$myOptions);
This merge lets you preserve the whole section structure (paper size, margins, associated footers and headers, ...), includes all the required styles, manages all lists (this may seem trivial, but it is not so in the OOXML standard), and preserves images and charts as well as footnotes, endnotes and comments.
Moreover, there is an option to preserve the original numbering (by default the page numbering continues).
Via the mergeType option, you may also discard the section structure of the merged document and add it at the end of the first document as part of its last section. In this case, of course, the headers and footers are not imported, but all other elements are still preserved.
Aspose.Words Cloud SDK for PHP can merge/join several Word documents into one Word document while keeping the formatting of either the appended or the destination document, depending on the ImportFormatMode parameter value. It is a commercial API, but the free pricing plan allows 150 free API calls per month.
<?php
require_once('D:\xampp\htdocs\aspose-words-cloud-php-master\vendor\autoload.php');
//TODO: Get your ClientId and ClientSecret at https://dashboard.aspose.cloud (free registration is required).
$ClientSecret="xxxxxxxxxxxxxxxxxxxxxxxxxxxx";
$ClientId="xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx";
$wordsApi = new Aspose\Words\WordsApi($ClientId,$ClientSecret);
try {
$remoteDataFolder = "Temp";
$localFile = "C:/Temp/02_pages_adobe.docx";
$remoteFileName = "02_pages_adobe.docx";
$localFile1 = "C:/Temp/Sections.docx";
$remoteFileName1 = "Sections.docx";
$outputFileName = "TestAppendDocument.docx";
$uploadRequest = new Aspose\Words\Model\Requests\UploadFileRequest($localFile,$remoteDataFolder."/".$remoteFileName,null);
$wordsApi->uploadFile($uploadRequest);
$uploadRequest1 = new Aspose\Words\Model\Requests\UploadFileRequest($localFile1,$remoteDataFolder."/".$remoteFileName1,null);
$wordsApi->uploadFile($uploadRequest1);
$requestDocumentListDocumentEntries0 = new Aspose\Words\Model\DocumentEntry(array(
"href" => $remoteDataFolder . "/" . $remoteFileName1,
"import_format_mode" => "KeepSourceFormatting",
));
$requestDocumentListDocumentEntries = [
$requestDocumentListDocumentEntries0,
];
$requestDocumentList = new Aspose\Words\Model\DocumentEntryList(array(
"document_entries" => $requestDocumentListDocumentEntries,
));
$request = new Aspose\Words\Model\Requests\AppendDocumentRequest(
$remoteFileName,
$requestDocumentList,
$remoteDataFolder,
NULL,
NULL,
NULL,
$remoteDataFolder . "/" . $outputFileName,
NULL,
NULL
);
$result = $wordsApi->appendDocument($request);
##Download file
$request = new Aspose\Words\Model\Requests\DownloadFileRequest($remoteDataFolder."/".$outputFileName,NULL,NULL);
$result = $wordsApi->downloadFile($request);
copy($result->getPathName(),"AppendOutput.docx");
} catch (Exception $e) {
echo "Something went wrong: ", $e->getMessage(), PHP_EOL;
}
?>
P.S: I'm developer evangelist at Aspose.
I want to split the slides of one pptx file into separate pptx files containing one slide each. The content/text is copied, but the layout and styling are not. Here is the code.
Can anyone please help?
<?php
use PhpOffice\PhpPresentation\PhpPresentation;
use PhpOffice\PhpPresentation\IOFactory;
use PhpOffice\PhpPresentation\Style\Color;
use PhpOffice\PhpPresentation\Style\Alignment;
use PhpOffice\PhpPresentation\Slide\SlideLayout;
$objReader = \PhpOffice\PhpPresentation\IOFactory::createReader('PowerPoint2007');
$objPHPPowerPoint = $objReader->load('a.pptx');
$totalSlides = $objPHPPowerPoint->getSlideCount();
$oMasterSlide = $objPHPPowerPoint->getAllMasterSlides()[0];
$documentProperties = $objPHPPowerPoint->getDocumentProperties();
for ( $count = 0; $count < $totalSlides; $count++ ) {
$objPHPPresentation = new PhpPresentation();
$slide = $objPHPPowerPoint->getSlide( $count );
$background = $slide->getBackground();
$newSlide = $objPHPPresentation->addSlide( $slide );
$newSlide->setBackground ( $background );
$objPHPPresentation->setAllMasterSlides( $oMasterSlide );
$objPHPPresentation->removeSlideByIndex(0);
$oWriterPPTX = \PhpOffice\PhpPresentation\IOFactory::createWriter($objPHPPresentation, 'PowerPoint2007');
$oWriterPPTX->save($count.'.pptx');
}
I don't think it's an issue with your code. It looks more like an issue with the underlying libraries, as mentioned here: PhpPresentation imagecreatefromstring(): Data is not in a recognized format - PHP7.2
I ran a test to see if it was something I could replicate, and I was able to. The key difference in my test was that one presentation had a simple background and the other had a gradient.
This slide caused problems:
But this one was copied over fine:
With the more complex background I got errors like:
PHP Warning: imagecreatefromstring(): Data is not in a recognized format
My code is even less complicated than yours, I just clone the original slideshow and remove all except a single slide before saving it:
for ( $count = 0; $count < $totalSlides; $count++ ) {
$copyVersion = clone $objPHPPowerPoint;
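// Walk from the last index down so that removing a slide does not shift the indexes still to be checked.
for ( $index = $copyVersion->getSlideCount() - 1; $index >= 0; $index-- ) {
if ($index !== $count) {
$copyVersion->removeSlideByIndex($index);
}
}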
$oWriterPPTX = \PhpOffice\PhpPresentation\IOFactory::createWriter($copyVersion, 'PowerPoint2007');
$oWriterPPTX->save($count.'.pptx');
}
Sorry if this doesn't exactly solve your problem, but hopefully it can help identify why it's happening. The other answer I linked to has more information about finding unsupported images types in your slides.
You can try using Aspose.Slides Cloud SDK for PHP to split a presentation into separate slides and save them to many formats. You can evaluate this REST-based API making 150 free API calls per month for API learning and presentation processing. The following code example shows you how to split a presentation and save slides to PPTX format using Aspose.Slides Cloud:
use Aspose\Slides\Cloud\Sdk\Api\Configuration;
use Aspose\Slides\Cloud\Sdk\Api\SlidesApi;
use Aspose\Slides\Cloud\Sdk\Model;
$configuration = new Configuration();
$configuration->setAppSid("my_client_id");
$configuration->setAppKey("my_client_key");
$slidesApi = new SlidesApi(null, $configuration);
$filePath = "example.pptx";
// Upload the file to the default storage.
$fileStream = fopen($filePath, 'r');
$slidesApi->uploadFile($filePath, $fileStream);
// Split the file and save the slides in PPTX format in the same folder.
$response = $slidesApi->split($filePath, null, Model\SlideExportFormat::PPTX);
// Download files of the slides.
foreach($response->getSlides() as $slide) {
$slideFilePath = pathinfo($slide->getHref())["basename"];
$slideFile = $slidesApi->downloadFile($slideFilePath);
echo $slideFile->getRealPath(), "\r\n";
}
Sometimes it is necessary to split a presentation without using any code. In this case, you can use Online PowerPoint Splitter.
I work as a Support Developer at Aspose.
I have tried to use the imagick library to create two functions like this:
function storeCoordinatesImage($img_path, $coordinates){
$im = new imagick($img_path);
$im->setImageProperty("coords", $coordinates);
$im->writeImage($img_path);
}
function getCoordinatesImage($img_path){
$im = new imagick($img_path);
return $im->getImageProperty("coords");
}
If I run:
if(!storeCoordinatesImage("I.jpg", "hi")) echo "fal";
echo getCoordinatesImage("I.jpg");
Nothing is returned.
But if I run:
$im = new imagick($img_path);
$im->setImageProperty("coords", "hello");
echo $im->getImageProperty("coords");
it returns "hello".
So it must be some issue with writing to the image? None of these functions returns false, though (i.e. they all appear to work).
Use an image's profile payload to store arbitrary data. Although a JSON payload stored in an image comment (i.e. JPG_COM) tag seems quick and easy, several competing technologies exist for this purpose. The most popular is Exif, but I would recommend XMP.
eXtensible Metadata Platform
In my opinion, XMP seems over-engineered, but it does offer a platform that keeps each vendor's proprietary information separated via XML namespaces.
Wikipedia has a great overview, and Adobe's white papers (pdf) do a great job outlining the dos and don'ts for a vendor to implement.
ImageMagick doesn't handle anything beyond reading and writing profile payloads, so you would be responsible for implementing an XML manager.
For example...
// A minimal XMP template
$XMP_BASE='<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMPTk 2.8">'
.'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"></rdf:RDF>'
.'</x:xmpmeta>';
$xmp = new DOMDocument();
$xmp->loadXML($XMP_BASE);
// Create a <rdf:Description> & <Coords> DOM element.
foreach($xmp->getElementsByTagName('RDF') as $node) {
$coords = $xmp->createElement('Coords', 'hello');
$description = $xmp->createElement('rdf:Description');
$description->setAttribute('about', '');
$description->appendChild($coords);
$node->appendChild($description);
}
$img = new Imagick('rose:');
// Write profile to image.
$img->setImageProfile('xmp', $xmp->saveXML());
$img->writeImage('output.jpg');
// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
$img2 = new Imagick('output.jpg');
$xmp2 = new DOMDocument();
// Read profile from image.
$xmp2->loadXML($img2->getImageProfile('xmp'));
// Grab `Coords' value
foreach($xmp2->getElementsByTagName('Coords') as $coords) {
print $coords->textContent . PHP_EOL;
}
//=> "hello"
And you can verify with the identify utility.
identify -verbose output.jpg | grep Coords
#=> Coords: hello
Of course, you would also need to implement an XML DOM "merge" method in the event that an image already contains an XMP profile and you don't wish to overwrite existing data.
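To give an idea of what such a "merge" could look like, here is a rough sketch in Python using only the standard library. It assumes the existing XMP packet is already available as a string (how it is read from and written back to the image is left out), and the function name add_coords is made up for this example. Any <?xpacket ...?> wrapper around the packet is dropped by the parser, so it would have to be re-added if you need it.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
X_NS = "adobe:ns:meta/"

# Keep the conventional prefixes when the tree is serialized again.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("x", X_NS)


def add_coords(xmp_packet, coords):
    """Append a new rdf:Description holding <Coords> to an existing packet."""
    root = ET.fromstring(xmp_packet)
    rdf_tag = "{%s}RDF" % RDF_NS
    # The packet root is usually x:xmpmeta, but it may be rdf:RDF itself.
    rdf = root if root.tag == rdf_tag else root.find(rdf_tag)
    if rdf is None:
        raise ValueError("no rdf:RDF element found in the XMP packet")
    description = ET.SubElement(rdf, "{%s}Description" % RDF_NS)
    description.set("{%s}about" % RDF_NS, "")
    ET.SubElement(description, "Coords").text = coords
    return ET.tostring(root, encoding="unicode")
```

Existing rdf:Description elements are left untouched; the new data is simply added next to them. A real implementation would probably put Coords into its own XML namespace and update an existing value instead of appending a second one.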
As Ben mentioned this is not possible. Instead you can add a "comment":
function storeCommentImage($img_path, $coordinates){
$im = new imagick($img_path);
$im->commentImage($coordinates);
return $im->writeImage($img_path);
}
function getCommentImage($img_path){
$im = new imagick($img_path);
return $im->getImageProperty("comment");
}
Seems like you can't persist that data for jpegs: https://github.com/ImageMagick/ImageMagick/issues/55#issuecomment-157114261
Maybe try with a png?
I need to replace content in some word documents based on User input. I am trying to read a template file (e.g. "template.docx"), and replace First name {fname}, Address {address} etc.
template.docx:
To,
The Office,
{officeaddress}
Sub: Authorization Letter
Sir / Madam,
I/We hereby authorize to {Ename} whose signature is attested here below, to submit application and collect Residential permit for {name}
Kindly allow him to support our International assignee
{name} {Ename}
Is there a way to do the same in Laravel 5.3?
I am trying to do this with PHPWord, but I can only find code to write new Word files, not to read and replace content in existing ones. Also, when I simply read and write the file, the formatting is messed up.
Code:
$file = public_path('template.docx');
$phpWord = \PhpOffice\PhpWord\IOFactory::load($file);
$phpWord->save('b.docx');
b.docx
To,
The Office,
{officeaddress}
Sub:
Authorization Letter
Sir / Madam,
I/We hereby authorize
to
{Ename}
whose signature is attested here below, to submit a
pplication and collect Residential permit
for
{name}
Kindly allow him to support our International assignee
{name}
{
E
name}
This is the working version of @addweb-solution-pvt-ltd's answer.
//This is the main document in Template.docx file.
$file = public_path('template.docx');
$phpword = new \PhpOffice\PhpWord\TemplateProcessor($file);
$phpword->setValue('{name}','Santosh');
$phpword->setValue('{lastname}','Achari');
$phpword->setValue('{officeAddress}','Yahoo');
$phpword->saveAs('edited.docx');
However, not all of the {name} fields are changing. Not sure why.
Alternatively:
// Creating the new document...
$zip = new \PhpOffice\PhpWord\Shared\ZipArchive();
//This is the main document in a .docx file.
$fileToModify = 'word/document.xml';
$file = public_path('template.docx');
$temp_file = storage_path('/app/'.date('Ymdhis').'.docx');
copy($file,$temp_file);
if ($zip->open($temp_file) === TRUE) {
//Read contents into memory
$oldContents = $zip->getFromName($fileToModify);
echo $oldContents;
//Modify contents:
$newContents = str_replace('{officeaddress}', 'Yahoo \n World', $oldContents);
$newContents = str_replace('{name}', 'Santosh Achari', $newContents);
//Delete the old...
$zip->deleteName($fileToModify);
//Write the new...
$zip->addFromString($fileToModify, $newContents);
//And write back to the filesystem.
$return =$zip->close();
If ($return==TRUE){
echo "Success!";
}
} else {
echo 'failed';
}
Works well. Still trying to figure how to save it as a new file and force a download.
I had the same task of editing a .doc or .docx file in PHP, and I used this code for it.
Reference : http://www.onlinecode.org/update-docx-file-using-php/
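// Assumption: the template is template.docx and the edited copy is written to result.docx
$template_file_name = 'template.docx';
$full_path = 'result.docx';
//Copy the Template file to the Result Directory
copy($template_file_name, $full_path);
// add class Zip Archive
$zip_val = new ZipArchive;
//Docx file is nothing but a zip file. Open this Zip File
// open() returns true on success and an integer error code on failure, so compare strictly.
if($zip_val->open($full_path) === true)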
{
// In the Open XML Wordprocessing format, the content is stored
// in the document.xml file located in the word directory.
$key_file_name = 'word/document.xml';
$message = $zip_val->getFromName($key_file_name);
$timestamp = date('d-M-Y H:i:s');
// this data Replace the placeholders with actual values
$message = str_replace("{officeaddress}", "onlinecode org", $message);
$message = str_replace("{Ename}", "ingo#onlinecode.org", $message);
$message = str_replace("{name}", "www.onlinecode.org", $message);
//Replace the content with the new content created above.
$zip_val->addFromString($key_file_name, $message);
$zip_val->close();
}
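For comparison, the same zip-level replacement can be sketched in Python with the standard library. This is only an illustration: the function name fill_template and its parameters are made up, and it assumes each placeholder is stored in document.xml as one uninterrupted string. As the b.docx dump in the question suggests, Word sometimes splits a placeholder such as {Ename} into several runs, and a plain string replace will not find those.

```python
import zipfile
from xml.sax.saxutils import escape

DOC_PART = "word/document.xml"


def fill_template(template_path, out_path, values):
    """Write a copy of the template with {key} placeholders replaced."""
    with zipfile.ZipFile(template_path) as src, \
            zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == DOC_PART:
                xml = data.decode("utf-8")
                for key, value in values.items():
                    # Escape &, < and > so user input cannot break the XML.
                    xml = xml.replace("{" + key + "}", escape(str(value)))
                data = xml.encode("utf-8")
            # All other parts (styles, media, ...) are copied unchanged.
            dst.writestr(item, data)
```

Writing to a separate output file leaves the template untouched, and escaping the values matters as soon as user input may contain characters like & or <.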
To read and replace content in a Docx file, you can use the PHPWord package, which you can install with this composer command:
composer require phpoffice/phpword
As of version v0.12.1, you need to require the PHPWord Autoloader.php from the src/PhpWord folder and register it:
require_once 'src/PhpWord/Autoloader.php';
\PhpOffice\PhpWord\Autoloader::register();
1) Open document
$template = new \PhpOffice\PhpWord\TemplateProcessor('YOURDOCPATH');
2) Replace a single string variable
$template->setValue('variableName', 'MyVariableValue');
3) Replace string variables that occur multiple times
- Clone your array placeholder row once for each element of your array
$template->cloneRow('arrayName', count($array));
- Replace variable value
for($number = 0; $number < count($array); $number++) {
$template->setValue('arrayName#'.($number+1), htmlspecialchars($array[$number], ENT_COMPAT, 'UTF-8'));
}
4) Save the changed document
$template->saveAs('PATHTOUPDATED.docx');
UPDATE
You can pass a limit as the third parameter of $template->setValue($search, $replace, $limit) to specify how many matches should be replaced.
If you are looking for a simple solution, you can use this library
Example:
This code will replace $search with $replace in the $pathToDocx file
$docx = new IRebega\DocxReplacer($pathToDocx);
$docx->replaceText($search, $replace);
The phpoffice/phpword library works fine.
For it to work correctly, the placeholders in your Word document must use the right syntax, like this:
${name}
${lastname}
${officeAddress}
and in the "setValue" method you must pass only the names, like:
'name'
'lastname'
'officeAddress'
It works very well within Laravel, Lumen, and other frameworks.
Example:
//This is the main document in Template.docx file.
$file = public_path('template.docx');
$phpword = new \PhpOffice\PhpWord\TemplateProcessor($file);
$phpword->setValue('name','Santosh');
$phpword->setValue('lastname','Achari');
$phpword->setValue('officeAddress','Yahoo');
$phpword->saveAs('edited.docx');
I am working on a program that parses text files uploaded by a user and then saves the parsed XML file on the server. However, when I write the XML file I get the text &#13; at the end of each line. This text is not in my original text file. I didn't even notice it until I opened the new XML file to verify that it was writing all of the content. Has anyone run into this before, and if so, can you tell me if it's due to the way I'm creating and writing my file?
fileUpload.php - These lines run when the user uploads the file.
$fileName = basename($_FILES['fileaddress']['name']);
$fileContents = file_get_contents($_FILES['fileaddress']['tmp_name']);
$xml = $parser->parseUnformattedText($fileContents);
$parsedFileName = pathinfo($fileName, PATHINFO_FILENAME) . ".xml";
file_put_contents($parsedFileName, $xml);
parser.php
function parseUnformattedText($inputText, $bookName = "")
{
//create book, clause, text nodes
$book = new SimpleXmlElement("<book></book>");
$book->addAttribute("bookName", $bookName);
$conj = $book->addChild("conj", "X");
$clause = $book->addChild("clause");
$trimmedText = $this->trimNewLines($inputText);
$trimmedText = $this->trimSpaces($inputText);
$text = $clause->addChild("text", $trimmedText);
$this->addChapterVerse($text, "", "");
//make list of pconj's for beginning of file
$pconjs = $this->getPconjList();
//convert the xml to string
$xml = $book->asXml();
//combine the list of pconj's and xml string
$xml = "$pconjs\n$xml";
return $xml;
}
Input text file
1:1 X
it seemed good to me also,
X
having had perfect understanding of all things from the very first
to write you an orderly account, [most] excellent Theophilius
and
1:4
that
you may know the certainty of those things in which you were instructed
1:5 X
There was in the days of Herod, the king of Judea and a certain priest named Zacharias
X
his wife[was] of the daughters of Aaron
and
her name [was] Elizabeth.
1:8 So
it was,
that
while he was serving as priest 1:9 before God in the order of his division,
1:10 and
the whole multitude of the people was praying outside at the hour of incense
but
therefore
it was done.
Going off of Seroczynski's answer, I was able to create a function that removes any carriage returns from the text. The XML output looked fine after that. Here's the function I used to fix the issue:
function trimCarriageReturns($text)
{
$textOut = str_replace("\r", "\n", $text);
$textOut = str_replace("\n\n", "\n", $textOut);
return $textOut;
}
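For readers doing the same parsing step in Python, the normalization could be sketched as follows. The function name normalize_newlines is made up for this example. Unlike the PHP function above, which also collapses every pair of consecutive newlines into one, this variant converts only the line endings and keeps blank lines.

```python
def normalize_newlines(text):
    """Convert Windows (CRLF) and old Mac (CR) line endings to LF."""
    # Handle CRLF first so it does not turn into two newlines.
    return text.replace("\r\n", "\n").replace("\r", "\n")
```

Running the uploaded text through a function like this before it is added to the XML tree means no carriage return is left to be serialized as a character reference.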
&#13; is the XML character reference for a carriage return (\r), which doesn't seem to come out correctly from parseUnformattedText().
Try $xml = nl2br($parser->parseUnformattedText($fileContents));
I'm using OpenTBS to merge two docx files.
include_once('tbszip.php');
$zip = new clsTbsZip();
// Open the first document
$zip->Open('file-1.docx');
$content1 = $zip->FileRead('word/document.xml');
$zip->Close();
// Extract the content of the first document
$p = strpos($content1, '<w:body');
if ($p===false) exit("Tag <w:body> not found in document 1.");
$p = strpos($content1, '>', $p);
$content1 = substr($content1, $p+1);
$p = strpos($content1, '</w:body>');
if ($p===false) exit("Tag </w:body> not found in document 1.");
$content1 = substr($content1, 0, $p);
// Insert into the second document
$zip->Open('file-2.docx');
$content2 = $zip->FileRead('word/document.xml');
$p = strpos($content2, '</w:body>');
if ($p===false) exit("Tag </w:body> not found in document 2.");
$content2 = substr_replace($content2, $content1, $p, 0);
$zip->FileReplace('word/document.xml', $content2, TBSZIP_STRING);
// Save the merge into a third file
$zip->Flush(TBSZIP_DOWNLOAD, 'merge1.docx');
file-1.docx contains an image and text; file-2.docx contains only text.
But the generated merge1.docx does not show the image from file-1.docx.
Please suggest a solution, thanks.
P.S. Sorry for my English.
When I reversed the order in which the files are opened, merge1.docx had the full content. Why?
// Open the first document
$zip->Open('file-2.docx');
$content1 = $zip->FileRead('word/document.xml');
$zip->Close();
..........
// Insert into the second document
$zip->Open('file-1.docx');
It is quite difficult to merge two DOCX files because of internal elements such as pictures and charts.
In the archive, pictures must be saved in the word/media/ directory.
They must be declared in the file /[Content_Types].xml.
Each must also be assigned a unique Id in the file /word/_rels/document.xml.rels.
That unique Id must then be used in the XML element corresponding to the picture in the word/document.xml file.
So in order to merge two DOCX files, you have to apply your snippet, then copy the pictures from one DOCX to the other, and then perform the operations above.
You're using TbsZip, which is used by OpenTBS, but it is not the same tool.
OpenTBS won't help you merge two DOCX files together.
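As a first step towards the renumbering described above, the relationship Ids of two documents can be compared to see which ones would clash. The sketch below is in Python with the standard library only; the function names relationship_ids and colliding_ids are made up for this example, and the input is assumed to be the content of word/_rels/document.xml.rels from each archive.

```python
import xml.etree.ElementTree as ET

# Namespace used by the .rels parts of an OPC package such as a DOCX file.
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def relationship_ids(rels_xml):
    """Return the set of Id values declared in a .rels part."""
    root = ET.fromstring(rels_xml)
    return {rel.get("Id") for rel in root.iter("{%s}Relationship" % REL_NS)}


def colliding_ids(first_rels, second_rels):
    """Return the Ids that are used by both documents."""
    return relationship_ids(first_rels) & relationship_ids(second_rels)
```

Every Id returned by colliding_ids would need a new value in both the copied relationship and the document.xml fragment that refers to it. This is probably also why reversing the order worked in the question: when file-1.docx is the container, its own media files and relationships are still in the archive, and the text-only body of file-2.docx needs none.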