PHP DOMDocument exception with loadHTMLFile

I am seeing strange behavior in my script that has me confused.
Script 1.
$dom = new DOMDocument();
$dom->loadHTMLFile("html/signinform.html");//loads file here
$form = $dom->getElementsByTagName("form")->item(0);
$div = $dom->createElement("div");
$dom->appendChild($div)->appendChild($form);
echo $dom->saveHTML();
Script 2.
$dom = new DOMDocument();
$div = $dom->createElement("div");
$dom->loadHTMLFile("html/signinform.html");//loads file here
$form = $dom->getElementsByTagName("form")->item(0);
$dom->appendChild($div)->appendChild($form);
echo $dom->saveHTML();
Script 1 works without problems and shows the form. However, Script 2 throws the following error: Fatal error: Uncaught exception 'DOMException' with message 'Wrong Document Error' in C:\Users
Could someone explain why merely moving the loadHTMLFile call results in this error? Thanks

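You created an element (the div) from the document and then loaded a file into that same object. loadHTMLFile() replaces the document's internal tree with the freshly parsed one, so the div still belongs to the old document, and mixing it with nodes from the new tree raises the Wrong Document Error.
Load the file first if you intend to use one, and create new elements afterwards.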
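The following is a minimal, self-contained sketch of the failure and the fix. It assumes an inline HTML string in place of html/signinform.html (loadHTML() goes through the same replacement as loadHTMLFile(), only the input source differs), and the variable names $orphan, $signin and $wrapper are illustrative. The second half wraps the form in a div in place, which is one way to get the form inside a div once the document is loaded.

```php
<?php
// Hypothetical stand-in for html/signinform.html.
$html = '<html><body><form action="/signin"><input name="user"></form></body></html>';

// Wrong order: the div is created before the document is loaded.
$dom = new DOMDocument();
$orphan = $dom->createElement('div');
$dom->loadHTML($html);                 // replaces the tree $orphan came from
$form = $dom->getElementsByTagName('form')->item(0);

$error = null;
try {
    $orphan->appendChild($form);       // nodes from two different documents
} catch (DOMException $e) {
    $error = $e->getMessage();
}
echo $error, "\n";

// Right order: load first, then create nodes from the loaded document.
$doc = new DOMDocument();
$doc->loadHTML($html);
$signin = $doc->getElementsByTagName('form')->item(0);
$wrapper = $doc->createElement('div');
// Put the div where the form was, then move the form inside the div.
$signin->parentNode->replaceChild($wrapper, $signin);
$wrapper->appendChild($signin);
echo $doc->saveHTML();
```

Because every node in the second half comes from $doc after loading, no cross-document append happens.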

For DOM manipulation you do not need to re-insert an element that already exists: $dom->appendChild($form) only moves the same form element. When you pull an element using $dom->getElementsByTagName("form")->item(0), you get a DOMElement object that references that node inside the document, so you can use it directly and append to it. A proper example would be:
$dom = new DOMDocument();
$dom->loadHTMLFile("assets/dom_document-form.html");
$div = $dom->createElement("div");
$form = $dom->getElementsByTagName("form")->item(0);
$form->appendChild($div);
echo $dom->saveHTML();
Append directly to the object you pulled from the DOM, and load the document first.
To address your original question as well:
Append directly to the element you pulled, as it references the node inside the document.
new DOMDocument can be used to create multiple, independent documents.
Calling DOMDocument::createElement before loadHTMLFile effectively gives you two documents: loadHTMLFile replaces the document's tree, and the element still belongs to the old one.
DOMDocument::createDocumentFragment behaves the same way: a fragment created before loading belongs to the old document.
If you would like to keep your code in the same order and work across the two documents, use DOMDocument::importNode, for example:
$dom = new DOMDocument();
$div = $dom->createElement("div");
$dom->loadHTMLFile("assets/dom_document-form.html");
$node = $dom->importNode($div);
$form = $dom->getElementsByTagName("form")->item(0);
$form->appendChild($node);
echo $dom->saveHTML();

Related

Retrieving certain attributes using DOMDocument

I'm trying to figure out how to parse an HTML page to get a form's action value, the labels within the form tag, as well as the input field names. I took a look at the DOMDocument page on php.net, and it tells me to get a child node, but all that does is give me errors that it doesn't exist. I also tried doing print_r of the variable holding the HTML content, and all it shows me is length=1. Can someone show me a few samples that I can use, because php.net is confusing to follow?
<?php
$content = "some-html-source";
$content = preg_replace("/&(?!(?:apos|quot|[gl]t|amp);|#)/", '&amp;', $content); // escape bare ampersands
$dom = new DOMDocument;
$dom->preserveWhiteSpace = FALSE;
$dom->loadHTML($content);
$form = $dom->getElementsByTagName('form');
print_r($form);
I suggest using DOMXPath instead of getElementsByTagName because it allows you to select attribute values directly and returns a DOMNodeList object just like getElementsByTagName. The @ in @action indicates that we're selecting an attribute.
$doc = new DOMDocument();
$doc->loadHTML($content);
$xpath = new DOMXPath($doc);
$action = $xpath->query('//form/@action')->item(0);
var_dump($action);
Similarly, to get the first input:
$input = $xpath->query('//form/input')->item(0);
To get all input fields, run the query once and iterate over the result:
$inputs = $xpath->query('//form/input');
foreach ($inputs as $input) {
    var_dump($input);
}
If you're not familiar with XPath, I recommend viewing these examples.
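To tie the pieces together for the original question (action, labels and input names), here is a small sketch. The markup in $content is a made-up stand-in, since the real page source isn't shown; attribute nodes returned by a query expose their value through ->nodeValue.

```php
<?php
// Made-up markup standing in for "some-html-source".
$content = '<html><body><form action="/login.php" method="post">'
    . '<label for="u">Username</label><input type="text" name="username" id="u">'
    . '<label for="p">Password</label><input type="password" name="password" id="p">'
    . '</form></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($content);
$xpath = new DOMXPath($doc);

// The query returns a DOMAttr; its nodeValue is the attribute's value.
$action = $xpath->query('//form/@action')->item(0)->nodeValue;

// Text of every <label> anywhere inside the form.
$labels = [];
foreach ($xpath->query('//form//label') as $label) {
    $labels[] = trim($label->textContent);
}

// The name attribute of every <input> inside the form.
$names = [];
foreach ($xpath->query('//form//input/@name') as $attr) {
    $names[] = $attr->nodeValue;
}

echo $action, "\n";
print_r($labels);
print_r($names);
```

Using //form//... rather than //form/... also catches labels and inputs nested inside other elements of the form.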

Call to undefined method DOMDocument::createDocumentType()

I have the following script snippet. Originally I did not realize that to use getElementById I needed to include createDocumentType, but now I get the error listed above. What am I doing wrong here? Thanks in advance!
...
$result = curl_exec($ch); //contains some webpage i am grabbing remotely
$dom = new DOMDocument();
$dom->createDocumentType('html', '-//W3C//DTD HTML 4.01 Transitional//EN', 'http://www.w3.org/TR/html4/loose.dtd');
$elements = $dom->loadHTML($result);
$e = $elements->getElementById('1');
...
Edit: Additional note, I verified the DOM is correct on the remote page.
DOMDocument does not have a method named createDocumentType, as you can see in the manual. The method belongs to the DOMImplementation class. It is used like this (taken from the manual):
// Creates an instance of the DOMImplementation class
$imp = new DOMImplementation;
// Creates a DOMDocumentType instance
$dtd = $imp->createDocumentType('graph', '', 'graph.dtd');
// Creates a DOMDocument instance
$dom = $imp->createDocument("", "", $dtd);
Since you want to load HTML into the document, you don't need to specify a document type, because it is determined from the imported HTML. You just need some id attributes, or a DTD that identifies another attribute as an ID. This is part of the HTML file, not the PHP parsing code.
$dom = new DOMDocument();
$dom->loadHTML($result);
$element = $dom->getElementById('my_id');
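will do the job. Note also that loadHTML() returns a boolean indicating success, not a document, so getElementById() has to be called on $dom itself rather than on the return value, as in the original $elements->getElementById('1').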
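A runnable sketch of the same idea, assuming a made-up HTML string in place of the curl result and a hypothetical id of intro. It also checks the boolean that loadHTML() returns before using the document.

```php
<?php
// Made-up stand-in for the page fetched with curl_exec().
$result = '<html><body><p id="intro">Hello</p></body></html>';

$dom = new DOMDocument();
$ok = $dom->loadHTML($result);   // true on success, false on failure
if (!$ok) {
    throw new RuntimeException('Could not parse the HTML');
}

// Look the element up on the document object, not on $ok.
$element = $dom->getElementById('intro');
echo $element->textContent, "\n";
```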

Accessing an imported element after the original DOMDocument is destroyed

I've been messing around with DOMDocument lately, and I've noticed that in order to transfer elements from one document to the next, I have to call $DOMDocument->importNode() on the target DOMDocument.
However, I'm running into weird issues, where once the originating document is destroyed, the cloned element misbehaves.
For example, here's some lovely working code:
$dom1 = new DOMDocument;
$dom2 = new DOMDocument;
$dom2->loadHTML('<div id="div"><span class="inner"></span></div>');
$div = $dom2->getElementById('div');
$children = $dom1->importNode( $div, true )->childNodes;
echo $children->item(0)->tagName; // Output: "span"
Here's a demo: http://codepad.viper-7.com/pjd9Ty
The problem arises when I try using the elements after their original document is out of scope:
global $dom;
$dom = new DOMDocument;
function get_div_children () {
global $dom;
$local_dom = new DOMDocument;
$local_dom->loadHTML('<div id="div"><span class="inner"></span></div>');
$div = $local_dom->getElementById('div');
return $dom->importNode( $div, true )->childNodes;
}
echo get_div_children()->item(0)->tagName;
The above results in the following errors:
PHP Warning: Couldn't fetch DOMElement. Node no longer exists in ...
PHP Notice: Undefined property: DOMElement::$tagName in ...
Here's a demo: http://codepad.viper-7.com/c0kqOA
My question is twofold:
Shouldn't the returned elements exist even after the original document was destroyed, since they were cloned into the current document?
A workaround. For various reasons, I have to manipulate the elements after the original document is destroyed, but before I actually insert them into the DOM of the other DOMDocument. Is there any way to accomplish this?
Clarification: I understand that if the elements are inserted into the DOM, it behaves as expected. But, as outlined above, my setup calls for the elements to be manipulated before being inserted into the DOM (long story). Given that the first example here works - and that manipulating elements outside of the DOM is standard procedure in JavaScript - shouldn't this be possible here as well?
The imported node has a reference to $dom, but $dom has no reference to it, because the node is not attached to the document tree. PHP's internal memory management destroys such detached nodes when the calling context changes. The only way to create a reference from the document side is to attach the node, e.g. $dom->documentElement->appendChild($node).
So, use code like this (the static keyword keeps the variable alive after the function returns, so the nodes are not destroyed):
global $dom;
$dom = new DOMDocument;
function get_div_children () {
global $dom;
$local_dom = new DOMDocument;
$local_dom->loadHTML('<div id="div"><span class="inner"></span></div>');
$div = $local_dom->getElementById('div');
static $nodes;
$nodes = $dom->importNode( $div, true )->childNodes;
return $nodes;
}
echo get_div_children()->item(0)->tagName;
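As an alternative workaround, here is a sketch (not the answerer's approach) that returns the imported element itself instead of only its childNodes, so the caller holds a PHP reference to the detached copy for as long as it needs it. import_div() is a hypothetical helper, $dom is passed in as a parameter instead of using global, and this assumes a current PHP release; older versions, such as the one behind the codepad demos, may behave differently.

```php
<?php
// Hypothetical helper: copy the <div> out of a temporary document.
function import_div(DOMDocument $target): DOMElement
{
    $local = new DOMDocument();
    $local->loadHTML('<div id="div"><span class="inner"></span></div>');
    $div = $local->getElementById('div');
    // Deep import: the copy is owned by $target, not by $local.
    return $target->importNode($div, true);
}

$dom = new DOMDocument();
$imported = import_div($dom);   // $local no longer exists at this point

// Manipulate the detached copy before inserting it anywhere.
$imported->firstChild->setAttribute('data-note', 'edited');
echo $imported->firstChild->tagName, "\n";

// Later, attach it to the target document.
$dom->appendChild($imported);
echo $dom->saveHTML();
```

Holding the parent element keeps its whole subtree reachable, which is why the children stay usable here.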

PHP DOMXpath not picking anything up

I'm trying to write a script that grabs the URL of the first image from this website: http://www.slothradio.com/covers/?adv=&artist=pantera&album=vulgar+display+of+power
Here's my script:
$content = file_get_contents($url);
$doc = new DOMDocument();
$doc->loadHTML($content);
$xpath = new DOMXpath($doc);
$elements = $xpath->query("*/div[@class='album0']/img");
echo '<pre>';print_r($elements);exit;
When I run that, it outputs
DOMNodeList Object
(
)
Even when I change my query to $xpath->query("*/img"), I still get nothing. What am I doing wrong?
DOMDocument::loadHTMLFile() takes a file path (or URL), not HTML content; see the documentation:
http://php.net/manual/en/domdocument.loadhtmlfile.php
So instead of fetching the content first, you can pass the URL directly:
$doc = new DOMDocument();
$doc->loadHTMLFile($url);
To output the elements, use:
var_dump(iterator_to_array($elements));
//Or
print_r(iterator_to_array($elements));
Thanks
:)
What am I doing wrong?
You are using print_r, but DOMNodeList does not offer any output for that function (because it's an internal class). You can start by outputting the number of items, for example. In the end, you need to iterate over the node list and deal with each node on your own.
printf("Found %d element(s).\n", $elements->length);
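For completeness, a sketch of iterating the result. It assumes made-up markup standing in for the slothradio.com page (whose real structure may differ) and also changes the query: */div is evaluated relative to the document node, so it only matches div elements that are direct children of the root html element, which is most likely why both queries came back empty. // searches the whole document, and attributes are selected with @.

```php
<?php
// Made-up markup standing in for the real page.
$content = '<html><body><div id="covers">'
    . '<div class="album0"><img src="cover0.jpg"></div>'
    . '<div class="album1"><img src="cover1.jpg"></div>'
    . '</div></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($content);
$xpath = new DOMXPath($doc);

// Relative path: only <img> children of <html> itself, so nothing here.
$relative = $xpath->query('*/img');

// "//" searches the whole document; "@class" selects the class attribute.
$elements = $xpath->query("//div[@class='album0']/img");

printf("Found %d element(s).\n", $elements->length);
foreach ($elements as $img) {
    echo $img->getAttribute('src'), "\n";
}
```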

Getting an element from PHP DOM and changing its value

I'm using PHP/Zend to load HTML into a DOM, and then I get a specific div by id that I want to modify.
$dom = new Zend_Dom_Query($html);
$element = $dom->query('div[id="someid"]');
How do I modify the text/content/HTML displayed inside that $element div, and then save the changes to the $dom or $html so I can print the modified HTML? Any idea how to do this?
Zend_Dom_Query is tailored just for querying a DOM, so it doesn't provide an interface of its own to alter the DOM and save it, but it does expose the native PHP DOM objects that will let you do so. Something like this should work:
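$dom = new Zend_Dom_Query($html);
$elements = $dom->query('div[id="someid"]');
// The query result exposes the DOMDocument the query ran against
// (Zend_Dom_Query::getDocument() only returns the source HTML string)
$document = $elements->getDocument();
foreach ($elements as $element) {
    // $element is an instance of DOMElement (http://www.php.net/DOMElement)
    // New nodes have to be created from the document that will own them
    $node = $document->createElement("div", "contents of div");
    $element->appendChild($node);
}
// saveHTML() keeps HTML serialization (saveXML() would emit XML instead)
$newHtml = $document->saveHTML();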
Take a look at the PHP Doc for DOMElement to get an idea of how you can alter the dom:
http://www.php.net/DOMElement
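If Zend isn't a requirement, the same edit can be sketched with plain DOMDocument and DOMXPath. This is a substitute for Zend_Dom_Query, not its API; the markup and the replacement span are made up. It clears the div's existing children, adds a new node created from the same document, and serializes the result with saveHTML().

```php
<?php
// Made-up markup standing in for $html.
$html = '<html><body><div id="someid">old text</div><p>other</p></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

foreach ($xpath->query('//div[@id="someid"]') as $div) {
    // Remove the existing children (here, the "old text" text node).
    while ($div->firstChild !== null) {
        $div->removeChild($div->firstChild);
    }
    // New nodes must be created by the document that will own them.
    $div->appendChild($doc->createElement('span', 'new contents'));
}

$newHtml = $doc->saveHTML();
echo $newHtml;
```

For plain text only, assigning $div->textContent = 'new text' would replace the children in one step instead of the loop.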
