I am using PHP's DOMDocument, and at one point I call createTextNode:
$copyrightNode = $doc->createTextNode('©');
$copyrightContainer = $dom_output->createElement('copyright-statement');
$copyrightContainer->appendChild($copyrightNode);
In the XML that is generated some time later, I am getting:
<copyright-statement>&#xA9;</copyright-statement>
And my goal is to have
<copyright-statement>&copy;</copyright-statement>
Any idea on how to do that?
Thank you in advance.
When PHP outputs an XML document, any characters that cannot be represented in the specified output encoding will be replaced with numeric entities (either decimal or hexadecimal, both are equivalent):
<?php
$dom = new DOMDocument;
$node = $dom->createElement('copyright-statement', '©');
$dom->appendChild($node);
$dom->encoding = 'UTF-8';
print $dom->saveXML(); // <copyright-statement>©</copyright-statement>
$dom->encoding = 'ASCII';
print $dom->saveXML(); // <copyright-statement>&#169;</copyright-statement>
The correct thing to do here is to use the createEntityReference method (e.g. createEntityReference("copy");), and then appendChild this entity.
Example:
<?php
// Create the reference from the same document that owns the container element.
$copyrightNode = $dom_output->createEntityReference("copy");
$copyrightContainer = $dom_output->createElement('copyright-statement');
$copyrightContainer->appendChild($copyrightNode);
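As a minimal, self-contained sketch of the entity-reference approach (the single `$dom` variable and the use of `saveXML($node)` to print only the element are assumptions made for illustration, not part of the original snippets):

```php
<?php
// Sketch: one DOMDocument owns both the element and the entity reference.
$dom = new DOMDocument('1.0', 'UTF-8');
$container = $dom->createElement('copyright-statement');
$container->appendChild($dom->createEntityReference('copy'));
$dom->appendChild($container);

// Serialize only the element to see how the reference is written out.
$xml = $dom->saveXML($container);
echo $xml, PHP_EOL;
```

Note that `copy` is not one of the five entities predefined by XML, so a consumer that parses this output will probably need a DTD that declares it.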
To create &copy; you could (I believe) do:
$copyrightNode = $doc->createCDATASection("&copy;");
$copyrightContainer = $dom_output->createElement('copyright-statement');
$copyrightContainer->appendChild($copyrightNode);
Related
I want to save changed element values back into an existing XML file. I found a replace function, but it is in JavaScript and I need the equivalent in PHP.
I tried the save and saveXML functions, but they didn't work. My XML has tags with a colon, such as "iaiext:auction_title". Fetching them with getElementsByTagName works, and my function that cuts the title to 50 characters works too, but how can I replace the old title with the new one if I don't use a path, as with simplexml_load_file? How do I reference that path in my script?
$dom = new DOMDocument;
$dom->load('p.xml');
$i = 0;
$tytuly = $dom->getElementsByTagName('auction_title');
foreach ($tytuly as $tytul){
$title = $tytul->nodeValue;
$end_title = doTitleCut($title);
//echo "<pre>";
//echo($end_title);
//echo "<pre>";
$i = $i+1;
}
In your loop, you can update a particular node's value the same way you fetch it: with nodeValue. So in your loop, just update it each time...
$tytul->nodeValue = doTitleCut($title);
Then after your loop, you can just echo the new XML out using
echo $dom->saveXML();
or save it using
$dom->save("3.xml");
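Put together, the loop might look like the sketch below. The `doTitleCut()` body is a hypothetical stand-in (the original helper is not shown), and the inline XML replaces `p.xml` so the example runs on its own:

```php
<?php
// Hypothetical stand-in for the asker's doTitleCut(): keep at most 50 characters.
function doTitleCut(string $title): string
{
    return mb_substr($title, 0, 50);
}

$dom = new DOMDocument();
$dom->loadXML(
    '<auctions>'
    . '<auction_title>' . str_repeat('x', 80) . '</auction_title>'
    . '<auction_title>Short title</auction_title>'
    . '</auctions>'
);

// Read each title, shorten it, and write it back through nodeValue.
foreach ($dom->getElementsByTagName('auction_title') as $tytul) {
    $tytul->nodeValue = doTitleCut($tytul->nodeValue);
}

$result = $dom->saveXML();
echo $result;
```

The titles here contain no `&` or `<`; for text that may contain them, writing through `textContent` is probably the safer choice.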
It is the same basic API in PHP, although browsers implement more, or other, parts of it. There are 5 revisions of the API (DOM Level 1 to 4 and DOM LS). DOM 3 added a property to read/write the text content of a node: https://www.w3.org/TR/DOM-Level-3-Core/core.html#Node3-textContent
The following example prefixes the titles:
$xml = <<<'XML'
<auctions>
<auction_title>World!</auction_title>
<auction_title>World &amp; Universe!</auction_title>
</auctions>
XML;
$document = new DOMDocument();
$document->loadXML($xml);
$titleNodes = $document->getElementsByTagName('auction_title');
foreach ($titleNodes as $titleNode) {
$title = $titleNode->textContent;
$titleNode->textContent = 'Hello '.$title;
}
echo $document->saveXML();
Output:
<?xml version="1.0"?>
<auctions>
<auction_title>Hello World!</auction_title>
<auction_title>Hello World &amp; Universe!</auction_title>
</auctions>
PHP's DOMNode::$nodeValue implementation does not match the W3C API definition. It behaves the same as DOMNode::$textContent for reads and does not fully escape on write.
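A small sketch of the `textContent` side of that statement: a string containing `&` and `<` is written as plain text and escaped only on serialization. The sample string and variable names are made up for illustration:

```php
<?php
$document = new DOMDocument();
$document->loadXML('<auctions><auction_title/></auctions>');
$titleNode = $document->getElementsByTagName('auction_title')->item(0);

// Write raw text that contains XML special characters.
$raw = 'Fish & Chips <fresh>';
$titleNode->textContent = $raw;

// Reading gives the raw text back; serializing escapes it.
$readBack = $titleNode->textContent;
$serialized = $document->saveXML($titleNode);
echo $serialized, PHP_EOL;
```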
I am trying to export xml with CDATA tags. I use the following code:
$xml_product = $xml_products->addChild('product');
$xml_product->addChild('mychild', htmlentities("<![CDATA[" . $mytext . "]]>"));
The problem is that I get the CDATA tags with < and > escaped as &lt; and &gt;, like the following:
<mychild>&lt;![CDATA[My some long long long text]]&gt;</mychild>
but I need:
<mychild><![CDATA[My some long long long text]]></mychild>
If I use htmlentities() I get lots of errors like "tag raquo is not defined", even though there are no such tags in my text. Probably htmlentities() tries to parse the text inside the CDATA and convert it, but I don't want that either.
Any ideas how to fix that? Thank you.
UPD_1 My function which saves xml to file:
public static function saveFormattedXmlFile($simpleXMLElement, $output_file) {
$dom = new DOMDocument('1.0', 'UTF-8');
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom->loadXML(urldecode($simpleXMLElement->asXML()));
$dom->save($output_file);
}
Here is a short example of how to add a CDATA section; note how it switches to DOMDocument to do so. The code builds up a <product> element, and a new <mychild> element is created in $xml_product. This $newNode is then imported as a DOMElement using dom_import_simplexml. The DOMDocument createCDATASection method then creates the CDATA section properly, and it is appended to the node.
$xml = new SimpleXMLElement('<?xml version="1.0" encoding="UTF-8"?><Products />');
$xml_product = $xml->addChild('product');
$newNode = $xml_product->addChild('mychild');
$mytext = "<html></html>";
$node = dom_import_simplexml($newNode);
$cdata = $node->ownerDocument->createCDATASection($mytext);
$node->appendChild($cdata);
echo $xml->asXML();
This example outputs...
<?xml version="1.0" encoding="UTF-8"?>
<Products><product><mychild><![CDATA[<html></html>]]></mychild></product></Products>
The database has the value: This is a test.<br><h1>this is also a test.</h1>This is a test.<br>this is a test.<br>
Using mysql the value is given by: $DBval['test'].
the row settings are:
Type = LONGTEXT
Collation = UTF8_general_ci
$doc = new DOMDocument();
$test = $doc->createElement("div");
$doc->appendChild($test);
$test_value = $doc->createElement("p", $DBval['test']);
$test->appendChild($test_value);
echo $doc->saveXML();
result:
"This is a test.<br><h1>this is also a test.</h1>This is a test.<br>this is a test.<br>"
The result is written in plain text and weirdly enough in double quotes.
I just want the result to be written in HTML like this:
This is a test.this is also a test.This is a
test.this is a test.
There are a few reasons why this will not work (at least not as expected):
If you have 'malformed' HTML, you will need to use saveHTML() instead of saveXML().
Since your string already contains HTML tags, you will need to load it with loadHTML() before inserting it.
You can echo ONLY one element by passing the DOMElement to saveHTML($test_value), so you don't echo the whole document.
Take into account that DOMDocument will encapsulate any 'free-floating' text in a <p> tag. For a text-only node, use ->createTextNode() instead.
Now, here is the tricky part. You can do:
$doc = new DOMDocument();
$doc->loadHTML($DBval['test']);
echo $doc->saveHTML();
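To illustrate the earlier point about passing an element to saveHTML(), here is a minimal sketch; the markup is a made-up stand-in for the database value:

```php
<?php
$doc = new DOMDocument();
$doc->loadHTML('<html><body><div><p>This is a test.</p></div></body></html>');

// Pick the element we want to print on its own.
$div = $doc->getElementsByTagName('div')->item(0);

$fragment = $doc->saveHTML($div); // only the <div> and its children
$whole = $doc->saveHTML();        // the complete document, including <html> and <body>

echo $fragment, PHP_EOL;
```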
But if you want to actually 'import' HTML into another DOMElement, you do need to IMPORT it. Here is a function I used (adapted for your case and commented for explanation):
//For a valid html5 DOCTYPE declaration
//$doc = new DOMDocument();
$dom = new DOMImplementation;
$doc = $dom->createDocument(null, 'html', $dom->createDocumentType('html'));
//To keep things tidy
$doc->preserveWhiteSpace = false;
$doc->formatOutput = true;
$doc->encoding = 'utf8';
//Creates your test div
$test = $doc->createElement("div");
$doc->appendChild($test);
/** HERE STARTS THE MAGIC */
$tempDoc= new DOMDocument; //Create a temp Doc to import the new html
libxml_use_internal_errors(true); //This prevents some garbage warnings.
//Prevent encoding garbage on import, change according to your setup
$htmlToImport = mb_convert_encoding($DBval['test'], 'HTML-ENTITIES', 'utf8');
//Load your HTML into the temp document
//As commented, we'll encapsulate the HTML in a span to prevent DOM from automatically adding the 'p' tag
$tempDoc->loadHTML('<span>'.$htmlToImport.'</span>');
//$tempDoc->loadHTML($htmlToImport); //#REMOVED: was adding 'p' tag
//Restore Garbage Warning report
libxml_clear_errors();
libxml_use_internal_errors(false);
//Get the HTML to import, now stored in the body of the temp document
$bodyToImport = $tempDoc->getElementsByTagName('body')->item(0);
//Import all those new children into your div
foreach($bodyToImport->childNodes as $node){
$test->appendChild($doc->importNode($node->cloneNode(true),true));
}
/** All this to replace these 2 lines :(
$test_value = $doc->createElement("p", $DBval['test']);
$test->appendChild($test_value);
*/
//echo $doc->saveXML();
echo $doc->saveHTML(); //echo all the document
//echo $doc->saveHTML($test); //echo only the test 'div'
I've used the term 'garbage' because these are mainly errors you can ignore, but while you develop you might want to take a look at them.
I know this looks like overkill, but it's the only way I managed to work with any HTML and charset and make it work in a clean way.
Really hope this helps. DOM can be tricky, but it has the ability to keep things structured if used properly.
I have the following script snippet. Originally I did not realize to use getElementById that I needed to include createDocumentType, but now I get the error listed above. What am I doing wrong here? Thanks in advance!
...
$result = curl_exec($ch); //contains some webpage i am grabbing remotely
$dom = new DOMDocument();
$dom->createDocumentType('html', '-//W3C//DTD HTML 4.01 Transitional//EN', 'http://www.w3.org/TR/html4/loose.dtd');
$elements = $dom->loadHTML($result);
$e = $elements->getElementById('1');
...
Edit: Additional note, I verified the DOM is correct on the remote page.
DOMDocument does not have a method named createDocumentType, as you can see in the manual. The method belongs to the DOMImplementation class. It is used like this (taken from the manual):
// Creates an instance of the DOMImplementation class
$imp = new DOMImplementation;
// Creates a DOMDocumentType instance
$dtd = $imp->createDocumentType('graph', '', 'graph.dtd');
// Creates a DOMDocument instance
$dom = $imp->createDocument("", "", $dtd);
Since you want to load HTML into the document, you don't need to specify a document type; it is determined from the imported HTML. You just need some id attributes, or a DTD that identifies another attribute as an id. This is part of the HTML file, not the parsing PHP code.
$dom = new DOMDocument();
$dom->loadHTML($result);
$element = $dom->getElementById('my_id');
will do the job.
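A self-contained sketch of that approach, with a hard-coded string standing in for the cURL response (the markup and the `my_id` value are illustrative assumptions):

```php
<?php
// Stand-in for the page fetched with curl_exec().
$result = '<html><body><div id="my_id">Hello</div></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // real-world HTML often triggers parser warnings
$dom->loadHTML($result);
libxml_clear_errors();

// loadHTML() returns a boolean, so query the document object itself.
$element = $dom->getElementById('my_id');
$text = $element !== null ? $element->textContent : null;
echo $text, PHP_EOL;
```

The key difference from the question's code is that getElementById() is called on `$dom`, not on the return value of loadHTML().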
I was successfully using the following code to merge multiple large XML files into a new (larger) XML file. Found at least part of this on StackOverflow
$docList = new DOMDocument();
$root = $docList->createElement('documents');
$docList->appendChild($root);
$doc = new DOMDocument();
foreach($xmlFilenames as $xmlfilename) {
$doc->load($xmlfilename);
$xmlString = $doc->saveXML($doc->documentElement);
$xpath = new DOMXPath($doc);
$query = self::getQuery(); // this is the name of the ROOT element
$nodelist = $xpath->evaluate($query, $doc->documentElement);
if( $nodelist->length > 0 ) {
$node = $docList->importNode($nodelist->item(0), true);
$xmldownload = $docList->createElement('document');
if (self::getShowFileName())
$xmldownload->setAttribute("filename", $filename);
$xmldownload->appendChild($node);
$root->appendChild($xmldownload);
}
}
$newXMLFile = self::getNewXMLFile();
$docList->save($newXMLFile);
I started running into OUT OF MEMORY issues when the number of files grew as did the size of them.
I found an article here which explained the issue and recommended using XMLWriter
So, now trying to use PHP XMLWriter to merge multiple large XML files together into a new (larger) XML file. Later, I will execute xpath against the new file.
Code:
$xmlWriter = new XMLWriter();
$xmlWriter->openMemory();
$xmlWriter->openUri('mynewFile.xml');
$xmlWriter->setIndent(true);
$xmlWriter->startDocument('1.0', 'UTF-8');
$xmlWriter->startElement('documents');
$doc = new DOMDocument();
foreach($xmlfilenames as $xmlfilename)
{
$fileContents = file_get_contents($xmlfilename);
$xmlWriter->writeElement('document',$fileContents);
}
$xmlWriter->endElement();
$xmlWriter->endDocument();
$xmlWriter->flush();
Well, the resultant (new) xml file is no longer correct since elements are escaped - i.e.
&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
&lt;CONFIRMOWNX&gt;
&lt;Confirm&gt;
&lt;LglVeh id=&quot;GLE&quot;&gt;
&lt;AddrLine1&gt;GLEACHER &amp;amp; COMPANY&lt;/AddrLine1&gt;
&lt;AddrLine2&gt;DESCAP DIVISION&lt;/AddrLine2&gt;
Can anyone explain how to take the content from the XML file and write them properly to new file?
I'm burnt on this and I KNOW it'll be something simple I'm missing.
Thanks.
Robert
See, the problem is that XMLWriter::writeElement is intended to, well, write a complete XML element. That's why it automatically sanitizes (replaces & with &amp;, for example) the contents of what's been passed to it as the second param.
One possible solution is to use XMLWriter::writeRaw method instead, as it writes the contents as is - without any sanitizing. Obviously it doesn't validate its inputs, but in your case it does not seem to be a problem (as you're working with already checked source).
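A minimal sketch of the writeRaw approach follows. The inline strings stand in for `file_get_contents($xmlfilename)`, and stripping each file's XML declaration before writing it is an assumption added here, since a declaration repeated in the middle of the output would not be well-formed:

```php
<?php
// Inline stand-ins for the source files; the contents are hypothetical.
$sources = [
    '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
        . '<Confirm><AddrLine1>GLEACHER &amp; COMPANY</AddrLine1></Confirm>',
    '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
        . '<Confirm><AddrLine1>DESCAP DIVISION</AddrLine1></Confirm>',
];

$xmlWriter = new XMLWriter();
$xmlWriter->openMemory();
$xmlWriter->startDocument('1.0', 'UTF-8');
$xmlWriter->startElement('documents');

foreach ($sources as $fileContents) {
    // Assumption: remove the per-file XML declaration so it is not repeated.
    $fragment = preg_replace('/^\s*<\?xml[^>]*\?>\s*/', '', $fileContents);

    $xmlWriter->startElement('document');
    $xmlWriter->writeRaw($fragment); // written as is, no escaping
    $xmlWriter->endElement();
}

$xmlWriter->endElement();
$xmlWriter->endDocument();
$merged = $xmlWriter->outputMemory();
echo $merged;
```

For large inputs, opening the writer with openUri() instead of openMemory() and calling flush() inside the loop would likely keep memory use lower.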
Hmm, Not sure why it's converting it to HTML Characters, but you can decode it like so
htmlspecialchars_decode($data);
It converts special HTML entities back to characters.