Simple HTML DOM Parser - find class inside another class - php

I have this HTML script:
<div class="find-this">I do not need this</div>
<div class="content">
<div class="find-this">I need this</div>
</div>
<div class="content">
<div class="find-this">I need this</div>
<div class="find-this">I need this as well</div>
</div>
So far, I have this:
foreach($html->find('div[class=content]') as $key => $element) :
$result = $html->find('div[class=find-this]', $key)->innertext;
echo $result;
endforeach;
How do I find the find-this class that is inside the content class, and not the one above, without knowing how many are inside the needed class and how many are outside? Thank you.

XPath might be what you are looking for. With this code you get only the three nodes that you need.
/* Creates a new DomDocument object */
$dom = new DomDocument;
/* Load the HTML */
$dom->loadHTMLFile("test.html");
/* Create a new XPath object */
$xpath = new DomXPath($dom);
/* Query all <divs> with the class name */
$nodes = $xpath->query("//div[@class='content']//div[@class='find-this']");
/* Set HTTP response header to plain text for debugging output */
header("Content-type: text/plain");
/* Traverse the DOMNodeList object to output each DomNode's nodeValue */
foreach ($nodes as $i => $node) {
echo "Node($i): ", $node->nodeValue, "\n";
}
Note: I based my answer on this other related answer.
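For anyone who wants to try this without a test.html file, here is a minimal sketch of the same query with the question's markup inlined as a string. The $html and $texts variable names are only for illustration.

```php
<?php
// HTML from the question, inlined so the sketch runs without test.html.
$html = <<<'HTML'
<div class="find-this">I do not need this</div>
<div class="content">
<div class="find-this">I need this</div>
</div>
<div class="content">
<div class="find-this">I need this</div>
<div class="find-this">I need this as well</div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Only div.find-this elements that are descendants of a div.content.
$nodes = $xpath->query("//div[@class='content']//div[@class='find-this']");

$texts = [];
foreach ($nodes as $node) {
    $texts[] = trim($node->nodeValue);
}
print_r($texts);
```

Note that @class='content' only matches when the attribute is exactly that string. If the elements may carry several classes, a test such as contains(concat(' ', normalize-space(@class), ' '), ' content ') is the usual XPath 1.0 workaround.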

Related

Content of a div inside a php variable

I have a lot of php included pages inside a template.
<h1> tag is also inside an included page, but I need to change them dynamically:
<div id='xnavact'>abc</div>
js
var a = $('#xnavact').html();
$('h1').html(a);
This works but I've heard that Google Search does not include changed content via javascript.
Am I right about this, and how could I make the same thing using php?
Something like:
<h1><?php echo $content_of_xnavact ?></h1>
But how to get content of a div inside a php variable?
You can parse your HTML content in PHP with some packages like Symfony 2 Dom Crawler.
If you just want to use a value many times in your script, consider storing it in a variable and reusing that, rather than keeping the whole HTML element in a static file and running a parser over it.
An example of using Dom Crawler library for you:
use Symfony\Component\DomCrawler\Crawler;
use Symfony\Component\CssSelector\CssSelector;
CssSelector::disableHtmlExtension();
function getInnerHtml( $node ) {
$innerHTML= '';
$children = $node->childNodes;
foreach ($children as $child) {
$innerHTML .= $child->ownerDocument->saveHtml( $child );
}
return $innerHTML;
}
$html = <<<'HTML'
<div>
<div>foo</div>
<div id="xnavact"><span>bar</span></div>
</div>
HTML;
$crawler = new Crawler($html);
$crawler = $crawler->filter('#xnavact');
foreach ($crawler as $domElement) {
print getInnerHtml($domElement); //result: <span>bar</span>
}
Alternatively, you could use preg_replace, unless I have misunderstood your question.
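If pulling in Symfony is more than you need, the same idea can probably be done with PHP's built-in DOM extension alone. This is only a sketch: inner_html_by_id() is a made-up helper name, and it assumes the page markup is already available as a string.

```php
<?php
// Hypothetical helper: returns the inner HTML of the element with the given
// id, or null when no such element exists.
function inner_html_by_id(string $html, string $id): ?string
{
    libxml_use_internal_errors(true);   // tolerate imperfect markup
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // $id is interpolated into the expression, so it must not contain quotes.
    $element = $xpath->query("//*[@id='$id']")->item(0);
    if ($element === null) {
        return null;
    }

    // Serialize each child node to get the element's inner HTML.
    $inner = '';
    foreach ($element->childNodes as $child) {
        $inner .= $dom->saveHTML($child);
    }
    return $inner;
}

$page = '<div><div>foo</div><div id="xnavact"><span>bar</span></div></div>';
$content_of_xnavact = inner_html_by_id($page, 'xnavact');
echo '<h1>' . $content_of_xnavact . '</h1>';
```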

How to traverse child elements by xpath in DomNodeList?

I have this PHP script:
<?php
libxml_use_internal_errors(true);
/* Create a new DomDocument object */
$dom = new DomDocument;
$dom_grep = new DomDocument;
/* Load the HTML */
$dom->loadHTMLFile("http://domain.com/catalog/0_1.html");
/* Create a new XPath object */
$xpath = new DomXPath($dom);
/* Query all <table> nodes containing specified class name */
$nodes = $xpath->query("/html/.//table[@class='right']");
/* Set HTTP response header to plain text for debugging output */
header("Content-type: text/plain");
/* How to make Xpath in code below??? */
foreach ($nodes as $i => $node) {
$child[$i]["title"] = $node->query("//tr[@class='bg3']//h3");
$child[$i]["href"] = $node->query("a['href=/catalog/details']");
}
?>
But I got this error in result:
"Fatal error: Call to undefined method DOMElement::query()" in $child array
How to make another xpath query in $nodes?
Thank you!
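The fatal error comes from the fact that query() is a method of DOMXPath, not of DOMElement. DOMXPath::query() accepts a context node as its second argument, so a relative expression can be evaluated against each table. A minimal sketch follows, with stand-in markup (the real catalog page is not available) and an assumed link pattern based on the '/catalog/details' string in the question:

```php
<?php
// Stand-in markup; the structure is assumed from the question's queries.
$html = <<<'HTML'
<table class="right">
<tr class="bg3"><td><h3>First item</h3><a href="/catalog/details/1">details</a></td></tr>
</table>
<table class="right">
<tr class="bg3"><td><h3>Second item</h3><a href="/catalog/details/2">details</a></td></tr>
</table>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$child = [];
foreach ($xpath->query("//table[@class='right']") as $i => $table) {
    // The second argument is the context node; the leading "." keeps the
    // expression relative to it instead of searching the whole document.
    $title = $xpath->query(".//tr[@class='bg3']//h3", $table)->item(0);
    $link  = $xpath->query(".//a[starts-with(@href, '/catalog/details')]", $table)->item(0);

    $child[$i] = [
        'title' => $title ? trim($title->nodeValue) : null,
        'href'  => $link ? $link->getAttribute('href') : null,
    ];
}
print_r($child);
```

Without the leading dot, an expression such as //tr[...] still starts from the document root even when a context node is passed, so every table would return the same matches.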

How to change root of a node with DOMDocument methods?

How to only change root's tag name of a DOM node?
In the DOM model we cannot change the documentElement property of a DOMDocument object, so we need to "rebuild" the node. But how do we rebuild it from its childNodes property?
NOTE: I can do this by converting to a string with saveXML and cutting out the root with regular expressions, but that is a workaround, not a DOM solution.
What I tried (PHP examples)
PHP example (does not work, but why?):
Try-1
// DOMElement::documentElement can not be changed, so...
function DomElement_renameRoot1($ele,$ROOTAG='newRoot') {
if (gettype($ele)=='object' && $ele->nodeType==XML_ELEMENT_NODE) {
$doc = new DOMDocument();
$eaux = $doc->createElement($ROOTAG); // DOMElement
foreach ($ele->childNodes as $node)
if ($node->nodeType == 1) // DOMElement
$eaux->appendChild($node); // error!
elseif ($node->nodeType == 3) // DOMText
$eaux->appendChild($node); // error!
return $eaux;
} else
die("ERROR: invalid DOM object as input");
}
The appendChild($node) call causes an error:
Fatal error: Uncaught exception 'DOMException'
with message 'Wrong Document Error'
Try-2
From @can's suggestion (only a link) and my interpretation of the poor dom-domdocument-renamenode manual.
function DomElement_renameRoot2($ele,$ROOTAG='newRoot') {
$ele->ownerDocument->renameNode($ele,null,"h1");
return $ele;
}
The renameNode() method caused an error,
Warning: DOMDocument::renameNode(): Not yet implemented
Try-3
From PHP manual, comment 1.
function renameNode(DOMElement $node, $newName)
{
$newNode = $node->ownerDocument->createElement($newName);
foreach ($node->attributes as $attribute)
$newNode->setAttribute($attribute->nodeName, $attribute->nodeValue);
while ($node->firstChild)
$newNode->appendChild($node->firstChild); // changes firstChild to next!?
$node->ownerDocument->replaceChild($newNode, $node); // changes $node?
// not need return $newNode;
}
The replaceChild() method caused an error,
Fatal error: Uncaught exception 'DOMException' with message 'Not Found Error'
As this has not been really answered yet, the error you get about not found is because of a little error in the renameNode() function you've copied.
In a somewhat related question about renaming different elements in the DOM I've seen this problem as well, and in my answer there I used an adaptation of that function that does not have this error:
/**
* Renames a node in a DOM Document.
*
* @param DOMElement $node
* @param string $name
*
* @return DOMNode
*/
function dom_rename_element(DOMElement $node, $name) {
$renamed = $node->ownerDocument->createElement($name);
foreach ($node->attributes as $attribute) {
$renamed->setAttribute($attribute->nodeName, $attribute->nodeValue);
}
while ($node->firstChild) {
$renamed->appendChild($node->firstChild);
}
return $node->parentNode->replaceChild($renamed, $node);
}
You might have spotted it in the last line of the function body: This is using ->parentNode instead of ->ownerDocument. As $node was not a child of the document, you did get the error. And it also was wrong to assume that it should be. Instead use the parent element to search for the child in there to replace it ;)
This has not been outlined in the PHP manual usernotes so far, however, if you did follow the link to the blog-post that originally suggested the renameNode() function you could find a comment below it offering this solution as well.
Anyway, my variant here uses slightly different variable naming and is more explicit about the types. Like the example in the PHP manual, it lacks a variant that deals with namespaced nodes. I have not yet decided what would be best, e.g. creating an additional function for that case, taking over the namespace from the node being renamed, or changing the namespace explicitly in a different function.
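To make the parentNode point concrete, here is a small usage sketch. The function is repeated from above so the snippet runs on its own; the sample XML and the element names entry and newRoot are made up for illustration.

```php
<?php
// Same logic as dom_rename_element() above, repeated so the sketch is
// self-contained.
function dom_rename_element(DOMElement $node, $name) {
    $renamed = $node->ownerDocument->createElement($name);
    foreach ($node->attributes as $attribute) {
        $renamed->setAttribute($attribute->nodeName, $attribute->nodeValue);
    }
    while ($node->firstChild) {
        $renamed->appendChild($node->firstChild);
    }
    return $node->parentNode->replaceChild($renamed, $node);
}

$doc = new DOMDocument();
$doc->loadXML('<root id="r"><item>one</item><item>two</item></root>');

// Nested element: its parent is <root>, not the document, which is why
// calling replaceChild() on ownerDocument fails with "Not Found Error" here.
dom_rename_element($doc->getElementsByTagName('item')->item(0), 'entry');

// Root element: its parent is the DOMDocument itself, so the same call works.
dom_rename_element($doc->documentElement, 'newRoot');

echo $doc->saveXML($doc->documentElement);
```

Because the replacement always goes through the node's own parent, the same function covers both the root element and any nested element.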
First, you need to understand that the DOMDocument is only the hierarchical root of the document tree. Its name is always #document. You want to rename the root element, which is $document->documentElement.
If you want to copy nodes from one document to another, you'll need to use the importNode() function: $document->importNode($nodeInAnotherDocument)
Edit:
renameNode() is not implemented yet, so you should make another root, and simply replace it with the old one. If you use DOMDocument->createElement() you don't need to use importNode() on it later.
$oldRoot = $doc->documentElement;
$newRoot = $doc->createElement('new-root');
foreach ($oldRoot->attributes as $attr) {
$newRoot->setAttribute($attr->nodeName, $attr->nodeValue);
}
while ($oldRoot->firstChild) {
$newRoot->appendChild($oldRoot->firstChild);
}
$doc->replaceChild($newRoot, $oldRoot);
This is a variation of my "Try-3" (see question), and it works fine!
function xml_renameNode(DOMElement $node, $newName, $cpAttr=true) {
$newNode = $node->ownerDocument->createElement($newName);
if ($cpAttr && is_array($cpAttr)) {
foreach ($cpAttr as $k=>$v)
$newNode->setAttribute($k, $v);
} elseif ($cpAttr)
foreach ($node->attributes as $attribute)
$newNode->setAttribute($attribute->nodeName, $attribute->nodeValue);
while ($node->firstChild)
$newNode->appendChild($node->firstChild);
return $newNode;
}    
Of course, if you show how to use DOMDocument::renameNode (without errors!), the bounty goes to you!
ISTM in your approach you attempt to import nodes from another DOMDocument, so you need to use the importNode() method:
$d = new DOMDocument();
/* Make a `foo` element the root element of $d */
$root = $d->createElement("foo");
$d->appendChild($root);
/* Append a `bar` element as the child element of the root of $d */
$child = $d->createElement("bar");
$root->appendChild($child);
/* New document */
$d2 = new DOMDocument();
/* Make a `baz` element the root element of $d2 */
$root2 = $d2->createElement("baz");
$d2->appendChild($root2);
/*
* Import a clone of $child (from $d) into $d2,
* with its child nodes imported recursively
*/
$child2 = $d2->importNode($child, true);
/* Add the clone as the child node of the root of $d2 */
$root2->appendChild($child2);
However, it is far easier to append the child nodes to a new parent element (thereby moving them), and replace the old root with that parent element:
$d = new DOMDocument();
/* Make a `foo` element the root element of $d */
$root = $d->createElement("foo");
$d->appendChild($root);
/* Append a `bar` element as the child element of the root of $d */
$child = $d->createElement("bar");
$root->appendChild($child);
/* <?xml version="1.0"?>
<foo><bar/></foo> */
echo $d->saveXML();
$root2 = $d->createElement("baz");
/* Make the `bar` element the child element of `baz` */
$root2->appendChild($child);
/* Replace `foo` with `baz` */
$d->replaceChild($root2, $root);
/* <?xml version="1.0"?>
<baz><bar/></baz> */
echo $d->saveXML();
I hope I am not missing anything, but I happened to have a similar problem and was able to solve it by using DomDocument::replaceChild(...).
/* @var $doc DOMDocument */
$doc = (new DOMImplementation())->createDocument(null, 'oldRoot');
/* @var $newRoot DomElement */
$newRoot = $doc->createElement('newRoot');
/* all the code to create the elements under $newRoot */
$doc->replaceChild($newRoot, $doc->documentElement);
/* the document element is now $newRoot, so this expression is true */
$doc->documentElement->isSameNode($newRoot) === true;
What threw me off initially was that $doc->documentElement is read-only, but the above worked and seems to be a much simpler solution IF $newRoot was created with the same DomDocument; otherwise you'll need to use the importNode solution described above. From your question it appears that $newRoot could be created from the same $doc.
Let us know if this worked out for you. Cheers.
EDIT: Noticed in version 20031129 that the DomDocument::$formatOutput, if set, does not format $newRoot output when you finally call $doc->saveXML()

Building an xml document from functions that return DOMNodes

My question is a rather simple one for anyone familiar with the DOM* classes in PHP.
Basically, I have different classes, and I want each of them to return something that I can append to my XML document.
The following pseudo-code should demonstrate it better:
Class ChildObject{ function exportToXML( return a DOMNode ? ) }
Class ContainerObject{
function exportToXML(){
$domSomething = new DOM*SOMETHING*;
foreach($children as $child) $domSomething->appendChild($child->exportToXML);
return $domSomething ;
}
}
Now i want to create the entire DOMDocument
$xml = new DOMDocument();
$root = $xml->createElement('root');
foreach($containers as $container) $root->appendChild($container->exportToXML());
I tried passing the DOMDocument object by reference, which did not work. I tried creating DOMNodes, but that didn't work either. So I'm looking for a simple answer: what data type do I need to return in order to achieve the above functionality?
<?php
$xml = new DOMDocument();
$h = $xml->createElement('hello');
$node1 = new DOMNode('aaa');
$node1->appendChild(new DOMText('new text content'));
//node1 is being returned by a function
$node2 = new DOMNode('bbb');
$node2->appendChild(new DOMText('new text content'));
//node2 is being returned by some other function
$h->appendChild($node1);//append to this element the returned node1
$h->appendChild($node2);//append to this element the returned node2
$xml->appendChild($h);//append to the document the root node
$content = $xml->saveXML();
file_put_contents('xml.xml', $content);//output to an xml file
?>
The above code should do the following:
consider that i want to build the following xml
<hello>
<node1>aaa</node1>
<node2>bbb</node2>
</hello>
node1 could be again a node that has multiple children so node1 could be as well as something like this:
<node1>
<child1>text</child1>
<child2>text</child2>
<child3>
<subchild1>text</subchild1>
</child3>
</node1>
Basically when i call exportToXML() something should be returned, call it $x that i can append in my document using $xml->appendChild($x);
I want to create the above structure and return the object that can be appended in the DOMDocument
The following code:
<?php
$xml = new DOMDocument();
$h = $xml->appendChild($xml->createElement('hello'));
$node1 = $h->appendChild($xml->createElement('aaa'));
$node1->appendChild($xml->createTextNode('new text content'));
$node2 = $h->appendChild($xml->createElement('bbb'));
$node2->appendChild($xml->createTextNode('new text content'));
$xml->save("xml.xml");
?>
will produce:
<?xml version="1.0"?>
<hello>
<aaa>new text content</aaa>
<bbb>new text content</bbb>
</hello>
Your example XML showed <node1>aaa</node1> but I think your various code snippet examples went out of sync when you were editing =) In case you need that output, try:
<?php
$xml = new DOMDocument();
$h = $xml->appendChild($xml->createElement('hello'));
$node1 = $h->appendChild($xml->createElement('node1'));
$node1->appendChild($xml->createTextNode('aaa'));
$node2 = $h->appendChild($xml->createElement('node2'));
$node2->appendChild($xml->createTextNode('bbb'));
$xml->save("xml.xml");
?>
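As for the class design in the question, one possible approach is to pass the target DOMDocument into each exportToXML() call, so that every returned node is created by that document and can be appended directly. This is only a sketch: the class names come from the question's pseudo-code, while the constructors and properties are assumptions made for illustration.

```php
<?php
// Hypothetical classes named after the question's pseudo-code.
class ChildObject
{
    private $name;
    private $text;

    public function __construct(string $name, string $text)
    {
        $this->name = $name;
        $this->text = $text;
    }

    // The caller passes in the target document, so the returned element
    // already belongs to it and can be appended without importNode().
    public function exportToXML(DOMDocument $doc): DOMElement
    {
        $element = $doc->createElement($this->name);
        $element->appendChild($doc->createTextNode($this->text));
        return $element;
    }
}

class ContainerObject
{
    private $name;
    private $children;

    public function __construct(string $name, array $children)
    {
        $this->name = $name;
        $this->children = $children;
    }

    public function exportToXML(DOMDocument $doc): DOMElement
    {
        $element = $doc->createElement($this->name);
        foreach ($this->children as $child) {
            $element->appendChild($child->exportToXML($doc));
        }
        return $element;
    }
}

$xml = new DOMDocument();
$hello = new ContainerObject('hello', [
    new ChildObject('node1', 'aaa'),
    new ChildObject('node2', 'bbb'),
]);
$xml->appendChild($hello->exportToXML($xml));
echo $xml->saveXML($xml->documentElement);
```

Since a ContainerObject can itself appear in another container's child list, nested structures like the node1 example in the question fall out of the same recursion. Nodes constructed directly with new DOMNode or new DOMText are not tied to a document, which is likely why the earlier attempt failed.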

How to insert HTML to PHP DOMNode?

Is there any way I can insert an HTML template to existing DOMNode without content being encoded?
I have tried to do that with:
$dom->createElement('div', '<h1>Hello world</h1>');
$dom->createTextNode('<h1>Hello world</h1>');
The output is pretty much the same, with only difference that first code would wrap it in a div.
I have tried loadHTML from a string, but I have no idea how to append its body content to another DOMDocument.
In javascript, this process seems to be quite simple and obvious.
You can use
DOMDocumentFragment::appendXML — Append raw XML data
Example:
// just some setup
$dom = new DOMDocument;
$dom->loadXml('<html><body/></html>');
$body = $dom->documentElement->firstChild;
// this is the part you are looking for
$template = $dom->createDocumentFragment();
$template->appendXML('<h1>This is <em>my</em> template</h1>');
$body->appendChild($template);
// output
echo $dom->saveXml();
Output:
<?xml version="1.0"?>
<html><body><h1>This is <em>my</em> template</h1></body></html>
If you want to import from another DOMDocument, replace the three lines with
$tpl = new DOMDocument;
$tpl->loadXml('<h1>This is <em>my</em> template</h1>');
$body->appendChild($dom->importNode($tpl->documentElement, TRUE));
Using TRUE as the second argument to importNode will do a recursive import of the node tree.
If you need to import (malformed) HTML, change loadXml to loadHTML. This will trigger the HTML parser of libxml (what ext/DOM uses internally):
libxml_use_internal_errors(true);
$tpl = new DOMDocument;
$tpl->loadHtml('<h1>This is <em>malformed</em> template</h2>');
$body->appendChild($dom->importNode($tpl->documentElement, TRUE));
libxml_use_internal_errors(false);
Note that libxml will try to correct the markup, e.g. it will change the wrong closing </h2> to </h1>.
It works with another DOMDocument for parsing the HTML code. But you need to import the nodes into the main document before you can use them in it:
$newDiv = $dom->createElement('div');
$tmpDoc = new DOMDocument();
$tmpDoc->loadHTML($str);
foreach ($tmpDoc->getElementsByTagName('body')->item(0)->childNodes as $node) {
$node = $dom->importNode($node, true);
$newDiv->appendChild($node);
}
And as a handy function:
function appendHTML(DOMNode $parent, $source) {
$tmpDoc = new DOMDocument();
$tmpDoc->loadHTML($source);
foreach ($tmpDoc->getElementsByTagName('body')->item(0)->childNodes as $node) {
$node = $parent->ownerDocument->importNode($node, true);
$parent->appendChild($node);
}
}
Then you can simply do this:
$elem = $dom->createElement('div');
appendHTML($elem, '<h1>Hello world</h1>');
I do not want to struggle with XML, because it throws errors more readily, and I am not a fan of prefixing calls with @ to suppress error output. In my opinion loadHTML does the better job, and it is as simple as this:
$doc = new DOMDocument();
$div = $doc->createElement('div');
// use a helper to load the HTML into a string
$helper = new DOMDocument();
$helper->loadHTML('This is my HTML Link.');
// now the magic!
// import the document node of the $helper object deeply (true)
// into the $div and append as child.
$div->appendChild($doc->importNode($helper->documentElement, true));
// add the div to the $doc
$doc->appendChild($div);
// final output
echo $doc->saveHTML();
Here is a simple example using DOMDocumentFragment:
$doc = new DOMDocument();
$doc->loadXML("<root/>");
$f = $doc->createDocumentFragment();
$f->appendXML("<foo>text</foo><bar>text2</bar>");
$doc->documentElement->appendChild($f);
echo $doc->saveXML();
Here is a helper function for replacing a DOMNode:
/**
* Helper function for replacing $node (DOMNode)
* with an XML code (string)
*
* @var DOMNode $node
* @var string $xml
*/
public function replaceNodeXML(&$node, $xml) {
$f = $this->dom->createDocumentFragment();
$f->appendXML($xml);
$node->parentNode->replaceChild($f,$node);
}
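The helper above is a method that relies on $this->dom. Outside a class, a free-function variant could look roughly like this; replace_node_with_xml() is a made-up name, and the sketch takes the document from the node itself.

```php
<?php
// Hypothetical free-function variant of replaceNodeXML(): it uses the
// node's own ownerDocument instead of $this->dom.
function replace_node_with_xml(DOMNode $node, string $xml): bool
{
    $fragment = $node->ownerDocument->createDocumentFragment();
    // appendXML() returns false when $xml is not well-formed; the warning
    // it raises in that case is suppressed here.
    if (!@$fragment->appendXML($xml)) {
        return false;
    }
    $node->parentNode->replaceChild($fragment, $node);
    return true;
}

$doc = new DOMDocument();
$doc->loadXML('<root><placeholder/></root>');
$ok = replace_node_with_xml(
    $doc->documentElement->firstChild,
    '<foo>text</foo><bar>text2</bar>'
);
echo $doc->saveXML($doc->documentElement);
```

Replacing a node with a fragment inserts all of the fragment's children in its place, which is why one placeholder can turn into several sibling elements.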
Source: Some old "PHP5 Dom Based Template" article.
And here is another suggestion posted by Pian0_M4n to use value attribute as workaround:
$dom = new DomDocument;
// main object
$object = $dom->createElement('div');
// html attribute
$attr = $dom->createAttribute('value');
// ugly html string
$attr->value = "<div> this is a really html string ©</div><i></i> with all the © that XML hates!";
$object->appendChild($attr);
// jquery fix (or javascript as well)
$('div').html($(this).attr('value')); // and it works!
$('div').removeAttr('value'); // to clean-up
Not ideal, but at least it works.
Gumbo's code works perfectly! Just one small enhancement: add the TRUE parameter so that it works with nested HTML snippets.
// before
$node = $parent->ownerDocument->importNode($node);
// after
$node = $parent->ownerDocument->importNode($node, TRUE);
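The difference is easy to see in a small sketch: importNode() is shallow by default, so a nested snippet loses its children unless the second argument is true. The markup and variable names below are only for illustration.

```php
<?php
$dom = new DOMDocument();

// Parse a nested snippet in a temporary document.
$tmpDoc = new DOMDocument();
$tmpDoc->loadHTML('<h1>Hello <em>world</em></h1>');
$h1 = $tmpDoc->getElementsByTagName('h1')->item(0);

// Shallow import: the <h1> element is copied without its child nodes.
$shallow = $dom->importNode($h1);

// Deep import: the whole subtree is copied, including <em> and the text.
$deep = $dom->importNode($h1, true);

echo $shallow->childNodes->length, "\n";
echo $deep->textContent, "\n";
```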
