PHP DOMDocument and DOMXPath

I am trying to find the last paragraph tag in a block of HTML using DOMDocument/DOMXpath but can't seem to figure it out.
# Create DOMDocument Object
$dom = new DOMDocument;
# Load HTML into DomDocument Object
$dom->loadHTML($data['component2']);
# Create DOMXPath Object and load DOMDocument Object into XPath for magical goodness
$xpath = new DOMXPath($dom);
# Loop through each paragraph node
foreach($xpath->query('//p') as $node) {
// krumo($node->parentNode);
print_r($node->parentNode->lastChild);
}
exit();
The print_r returns an empty DOMText Object ( )... any idea on how to find the last paragraph in a block of HTML using DOMDocument/DOMXPath?
Working Code:
# Create DOMDocument Object
$dom = new DOMDocument;
$dom->preserveWhiteSpace = false;
# Load HTML into DomDocument Object
$dom->loadHTML($data['component2']);
# Create DOMXPath Object and load DOMDocument Object into XPath for magical goodness
$xpath = new DOMXPath($dom);
$q = $xpath->query('//div[@class="t_content"]/p[last()]');
$data['component2'] = str_replace(utf8_decode($q->item(0)->nodeValue), "", $data['component2']);

Use this instead:
print_r($node->parentNode->lastChild->nodeValue);
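A note on why lastChild can surprise here: it returns the last child node of any type, which may be a whitespace text node rather than a p element. Selecting the paragraph in XPath avoids that. The following is a minimal sketch using made-up sample markup (the $html string is an assumption, not the asker's data); it also shows that //p[last()] matches the last p of every parent, while (//p)[last()] matches only the final p in the document.

```php
<?php
// Hypothetical sample markup standing in for $data['component2'].
$html = '<div><p>A</p><p>B</p></div><div><p>C</p><p>D</p></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Last <p> within each parent element: matches B and D.
$perParent = $xpath->query('//p[last()]');

// Last <p> in the whole document: matches only D.
$final = $xpath->query('(//p)[last()]');

foreach ($perParent as $p) {
    echo 'last in parent: ', $p->textContent, PHP_EOL;
}
echo 'last in document: ', $final->item(0)->textContent, PHP_EOL;
```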

Related

Symfony DOMCrawler: How to change html?

How can I edit the HTML of elements? I tried this, but I get this error:
Fatal error: Uncaught InvalidArgumentException: Attaching DOM nodes
from multiple documents in the same crawler is forbidden.
$crawler = new Crawler('<h1>The title</h1>');
$crawler
->filter('h1,h2,h3,h4,h5,h6')
->each(function (Crawler $crawler, $i) use (&$replace) {
$crawler->html('<span>test</span>' . $crawler->html());
});
Use this:
$doc = new DOMDocument;
$doc->loadHTML($html);
$crawler = new Crawler($doc);
$crawler
->filter('h1,h2,h3,h4,h5,h6')
->each(function (Crawler $crawler) use ($doc) {
foreach ($crawler as $node) {
$span = $doc->createElement('span', 'test');
$node->parentNode->insertBefore($span, $node);
}
});
Important: create the new tag with the same DOMDocument object that was passed to the Crawler.
As explained in The DomCrawler Component docs:
An instance of the Crawler represents a set of DOMElement objects, which are nodes that can be traversed...
So you need to traverse the Crawler object before manipulating the DOMElements.
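The same-document rule can be seen without the Crawler at all. The following is a rough sketch using only ext/DOM, assuming the goal from the question (putting a span in front of each heading's existing content); the sample markup and the choice to prepend inside the heading are assumptions for illustration.

```php
<?php
// Hypothetical sample markup.
$doc = new DOMDocument();
$doc->loadHTML('<h1>The title</h1><h2>Sub</h2>');
$xpath = new DOMXPath($doc);

$headings = $xpath->query('//h1|//h2|//h3|//h4|//h5|//h6');
foreach ($headings as $heading) {
    // The span is created by the same document that owns the heading,
    // so it can be inserted without importing it first.
    $span = $doc->createElement('span', 'test');
    // Prepend the span to the heading's existing children.
    $heading->insertBefore($span, $heading->firstChild);
}

// saveHTML($node) serializes a single node.
$firstHeading = $doc->saveHTML($headings->item(0));
echo $firstHeading, PHP_EOL;
```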

Add DocType to existing DOMDocument

I have a PHP class called "Extension_DOMDocument" that extends the PHP "DOMDocument" class.
I create a new Extension_DOMDocument object and would like to add a DocType to it.
My code is:
// $this->data is an array to convert array to xml
$objcDom = new Extension_DOMDocument('1.0', 'utf-8');
$objcDom->fromMixed($this->data);
How can I add a DocType to $objcDom?
You can use the DOM implementation to create a document type object. Document type objects are still DOM nodes. You can append them to an existing document.
class MyDOMDocument extends DOMDocument {}
$dom = new MyDOMDocument();
$implementation = new DOMImplementation();
$dom->appendChild($implementation->createDocumentType('example'));
$dom->appendChild($dom->createElement('foo'));
echo $dom->saveXml();
Output:
<?xml version="1.0"?>
<!DOCTYPE example>
<foo/>
I would use this
<?php
// Creates an instance of the DOMImplementation class
$imp = new DOMImplementation;
// Creates a DOMDocumentType instance
$dtd = $imp->createDocumentType('graph', '', 'graph.dtd');
// Creates a DOMDocument instance
$dom = $imp->createDocument("", "", $dtd);
// Set other properties
$dom->encoding = 'UTF-8';
$dom->standalone = false;
// Create an empty element
$element = $dom->createElement('graph');
// Append the element
$dom->appendChild($element);
// Retrieve and print the document
echo $dom->saveXML();
?>
Check: http://php.net/manual/en/domimplementation.createdocumenttype.php

How to insert HTML to PHP DOMNode?

Is there any way I can insert an HTML template to existing DOMNode without content being encoded?
I have tried to do that with:
$dom->createElement('div', '<h1>Hello world</h1>');
$dom->createTextNode('<h1>Hello world</h1>');
The output is pretty much the same, with only difference that first code would wrap it in a div.
I have tried to loadHTML from a string, but I have no idea how to append its body content to another DOMDocument.
In javascript, this process seems to be quite simple and obvious.
You can use
DOMDocumentFragment::appendXML — Append raw XML data
Example:
// just some setup
$dom = new DOMDocument;
$dom->loadXml('<html><body/></html>');
$body = $dom->documentElement->firstChild;
// this is the part you are looking for
$template = $dom->createDocumentFragment();
$template->appendXML('<h1>This is <em>my</em> template</h1>');
$body->appendChild($template);
// output
echo $dom->saveXml();
Output:
<?xml version="1.0"?>
<html><body><h1>This is <em>my</em> template</h1></body></html>
If you want to import from another DOMDocument, replace the three lines with
$tpl = new DOMDocument;
$tpl->loadXml('<h1>This is <em>my</em> template</h1>');
$body->appendChild($dom->importNode($tpl->documentElement, TRUE));
Using TRUE as the second argument to importNode will do a recursive import of the node tree.
If you need to import (malformed) HTML, change loadXml to loadHTML. This will trigger the HTML parser of libxml (what ext/DOM uses internally):
libxml_use_internal_errors(true);
$tpl = new DOMDocument;
$tpl->loadHtml('<h1>This is <em>malformed</em> template</h2>');
$body->appendChild($dom->importNode($tpl->documentElement, TRUE));
libxml_use_internal_errors(false);
Note that libxml will try to correct the markup, e.g. it will change the wrong closing </h2> to </h1>.
It works with another DOMDocument for parsing the HTML code. But you need to import the nodes into the main document before you can use them in it:
$newDiv = $dom->createElement('div');
$tmpDoc = new DOMDocument();
$tmpDoc->loadHTML($str);
foreach ($tmpDoc->getElementsByTagName('body')->item(0)->childNodes as $node) {
$node = $dom->importNode($node, true);
$newDiv->appendChild($node);
}
And as a handy function:
function appendHTML(DOMNode $parent, $source) {
$tmpDoc = new DOMDocument();
$tmpDoc->loadHTML($source);
foreach ($tmpDoc->getElementsByTagName('body')->item(0)->childNodes as $node) {
$node = $parent->ownerDocument->importNode($node, true);
$parent->appendChild($node);
}
}
Then you can simply do this:
$elem = $dom->createElement('div');
appendHTML($elem, '<h1>Hello world</h1>');
I do not want to struggle with XML, because it throws errors more readily, and I am not a fan of prefixing an @ to suppress error output. loadHTML does the better job in my opinion, and it is as simple as this:
$doc = new DOMDocument();
$div = $doc->createElement('div');
// use a helper to load the HTML into a string
$helper = new DOMDocument();
$helper->loadHTML('This is my HTML Link.');
// now the magic!
// import the document node of the $helper object deeply (true)
// into the $div and append as child.
$div->appendChild($doc->importNode($helper->documentElement, true));
// add the div to the $doc
$doc->appendChild($div);
// final output
echo $doc->saveHTML();
Here is a simple example using DOMDocumentFragment:
$doc = new DOMDocument();
$doc->loadXML("<root/>");
$f = $doc->createDocumentFragment();
$f->appendXML("<foo>text</foo><bar>text2</bar>");
$doc->documentElement->appendChild($f);
echo $doc->saveXML();
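Since appendXML() expects well-formed XML, it can help to check its return value before appending the fragment. A small sketch, assuming two made-up snippets (one valid, one with a mismatched closing tag) and libxml_use_internal_errors() to keep parse warnings out of the output:

```php
<?php
// Collect libxml parse errors instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadXML('<root/>');

// Hypothetical snippets: one well-formed, one with a mismatched closing tag.
$snippets = ['<foo>text</foo>', '<h1>broken</h2>'];
$results = [];

foreach ($snippets as $snippet) {
    $fragment = $doc->createDocumentFragment();
    // appendXML() returns false when the string is not well-formed XML.
    if ($fragment->appendXML($snippet)) {
        $doc->documentElement->appendChild($fragment);
        $results[] = true;
    } else {
        $results[] = false;
    }
}

// Restore the previous libxml error handling.
libxml_clear_errors();
libxml_use_internal_errors($previous);

echo $doc->saveXML();
```

For markup that fails this check, the loadHTML-based approaches shown earlier are the more forgiving route.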
Here is a helper function for replacing a DOMNode:
/**
* Helper function for replacing $node (DOMNode)
* with an XML code (string)
*
* @param DOMNode $node
* @param string $xml
*/
public function replaceNodeXML(&$node, $xml) {
$f = $this->dom->createDocumentFragment();
$f->appendXML($xml);
$node->parentNode->replaceChild($f,$node);
}
Source: Some old "PHP5 Dom Based Template" article.
And here is another suggestion posted by Pian0_M4n to use value attribute as workaround:
$dom = new DomDocument;
// main object
$object = $dom->createElement('div');
// html attribute
$attr = $dom->createAttribute('value');
// ugly html string
$attr->value = "<div> this is a really html string ©</div><i></i> with all the © that XML hates!";
$object->appendChild($attr);
// jquery fix (or javascript as well)
$('div').html($(this).attr('value')); // and it works!
$('div').removeAttr('value'); // to clean-up
Not ideal, but at least it works.
Gumbo's code works perfectly! One small enhancement: add the TRUE parameter so that it also works with nested HTML snippets. Change
$node = $parent->ownerDocument->importNode($node);
to
$node = $parent->ownerDocument->importNode($node, TRUE);

Can I get the matched DOM string with PHP and DOMDocument?

I've got my HTML inside of $html.
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$tags = $xpath->query('//div[@id="header"]');
foreach($tags as $tag) {
var_dump($tag->nodeValue); // the text content of that element (not its innerHTML)
var_dump($tag); // object(DOMElement)#3 (0) { }
}
Is there a way to get that node, or remove it?
Basically, I'm parsing an existing website and need to remove elements from it. What method do I call to do that?
Thanks
Have you checked out DOMNode::removeChild ?
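A minimal sketch of both parts of the question, assuming made-up sample markup: saveHTML($node) gives the markup of the matched node, and removeChild() is called on the node's parent to remove it. Copying the result into an array first is a precaution; XPath results are a snapshot, but lists from getElementsByTagName are live and change while nodes are removed.

```php
<?php
// Hypothetical sample page.
$html = '<html><body><div id="header">Header</div><p>Body text</p></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Copy the matches into an array before modifying the tree.
$matches = iterator_to_array($xpath->query('//div[@id="header"]'));

$removedMarkup = '';
foreach ($matches as $node) {
    // Serialize just this node, including its own tag.
    $removedMarkup = $dom->saveHTML($node);
    // A node is removed through its parent.
    $node->parentNode->removeChild($node);
}

$result = $dom->saveHTML();
echo $removedMarkup, PHP_EOL;
echo $result;
```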

Add node to a XML variable and Save it

I want to add an XML node, i.e. <name>B07BZFZV8D</name>, to the XML variable before saving it.
I want to add the 'name' node inside the 'Self' element.
# Previously I used to save it directly like this:
# $Response holds the response from the API
$dom = new DOMDocument;
$dom->preserveWhiteSpace = FALSE;
$dom->loadXML($Response);
##saving in file
$myfile = file_put_contents('data.xml', $Response.PHP_EOL , FILE_APPEND | LOCK_EX);
With DOM you use methods of the document object to create the node and methods of the parent node to insert/add it to the hierarchy.
DOMDocument has create* methods for the different node types (element, text, cdata section, comment, ...). The parent nodes (element, document, fragment) have methods like appendChild and insertBefore to add/remove them.
Xpath can be used to fetch nodes from the DOM.
$document = new DOMDocument;
$document->preserveWhiteSpace = FALSE;
$document->loadXML($xmlString);
$xpath = new DOMXpath($document);
// fetch the first Data element inside the Report document element
foreach ($xpath->evaluate('/Report/Data[1]') as $data) {
// create the name element and append it
$name = $data->appendChild($document->createElement('name'));
// create a node for the text content and append it
$name->appendChild($document->createTextNode('Vivian'));
}
$document->formatOutput = TRUE;
echo $document->saveXML();
Output:
<?xml version="1.0" encoding="UTF-8"?>
<Report>
<Data>
<id>87236</id>
<purchase>3</purchase>
<address>XXXXXXXX</address>
<name>Vivian</name>
</Data>
</Report>
Using @ThW's code:
The createElement call needs to change so that it sets the value directly:
$document = new DOMDocument;
$document->preserveWhiteSpace = FALSE;
$document->loadXML($xmlString);
$xpath = new DOMXpath($document);
// fetch the first Data element inside the Report document element
foreach ($xpath->evaluate('/Report/Data[1]') as $data) {
// create the name element with value and append it
$xmlElement = $document->createElement('name', 'Vivian');
$data->appendChild($xmlElement);
}
$document->formatOutput = TRUE;
echo $document->saveXML();
It works for me with php7.0. Check it works for you.
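One caveat with passing the value as the second argument to createElement(): that argument is not escaped, so a value containing an ampersand can trigger a warning. A text node is the safer choice for arbitrary values. The sketch below also writes the modified document to a file, since the question saved the untouched $Response string; the sample XML, the value and the temporary file path are assumptions for illustration.

```php
<?php
// Hypothetical response; the real XML would come from the API.
$xmlString = '<Report><Data><id>87236</id></Data></Report>';

$document = new DOMDocument();
$document->preserveWhiteSpace = false;
$document->loadXML($xmlString);
$xpath = new DOMXPath($document);

foreach ($xpath->evaluate('/Report/Data[1]') as $data) {
    $name = $data->appendChild($document->createElement('name'));
    // A text node is escaped on output, so "&" becomes "&amp;".
    $name->appendChild($document->createTextNode('Tom & Jerry'));
}

$document->formatOutput = true;
$xml = $document->saveXML();

// Save the modified document rather than the original response string.
// The temporary file path is a placeholder for 'data.xml'.
$path = tempnam(sys_get_temp_dir(), 'dom');
file_put_contents($path, $xml, LOCK_EX);

echo $xml;
```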
