PHP DomDocument, append element of a page to another - php

I have a PHP script that generates a shop.
First, I load my HTML page with DOMDocument:
$oPage = new webHTML("boutique_panier_HTML");
$oInter = $oPage->getElementById("inter");
webHTML() is just a custom DOMDocument class. I retrieve my main div (inter) and do some processing inside that div before returning $oPage->saveHTML();
So far, this works.
Now I need to load another page, retrieve an element (a form), and put that element into $oInter.
So, just before return $oPage->saveHTML();, I do:
$oPage2 = new webHTML("formulaire_bon_commande");
$oInter2 = $oPage2->getElementsByTagName("form");
$oInter->appendChild($oInter2);
So I load the page "formulaire_bon_commande", retrieve its form element, and try to append that element to my $oInter div.
With this code, I just get a blank page and nothing else. Any ideas?

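The method getElementsByTagName returns a DOMNodeList, but appendChild expects a DOMNode, so you have to iterate over $oInter2. Because the nodes belong to another document, each one also has to be imported into $oPage with importNode before it can be appended: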
$oInter2 = $oPage2->getElementsByTagName("form");
foreach ($oInter2 as $el){
$node = $oPage->importNode($el, true);
$oInter->appendChild($node);
}
Example:
$oPage = new DOMDocument();
$oPage->loadHTML('<html><p id="inter"></p></html>');
$oInter = $oPage->getElementById("inter");
$oPage2 = new DOMDocument();
$oPage2->loadHTML('<html><form><button></button></form></html>');
$oInter2 = $oPage2->getElementsByTagName("form");
foreach($oInter2 as $el) {
$node = $oPage->importNode($el, true);
$oInter->appendChild($node);
}
echo $oPage->saveHTML();
output:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p id="inter"><form><button></button></form></p></body></html>
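If only the first form is needed, the loop can be replaced by item(0) plus a null check. This is a minimal sketch, not part of the original answer: the helper name appendFirstByTag is hypothetical, and plain DOMDocument stands in for the custom webHTML class.

```php
<?php
// Hypothetical helper (name chosen for illustration): copy the first
// element with the given tag name from $source into $target.
function appendFirstByTag(DOMElement $target, DOMDocument $source, string $tag): bool
{
    $found = $source->getElementsByTagName($tag)->item(0);
    if ($found === null) {
        return false; // no such element in the source document
    }
    // importNode() returns a copy owned by the target's document;
    // true makes the copy deep, so child nodes come along.
    $copy = $target->ownerDocument->importNode($found, true);
    $target->appendChild($copy);
    return true;
}

$oPage = new DOMDocument();
$oPage->loadHTML('<html><body><div id="inter"></div></body></html>');

$oPage2 = new DOMDocument();
$oPage2->loadHTML('<html><body><form><button></button></form></body></html>');

$ok = appendFirstByTag($oPage->getElementById('inter'), $oPage2, 'form');
$html = $oPage->saveHTML();
```

Returning false instead of throwing keeps the caller free to decide what a missing form means.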

Related

I'm trying to scrape a specific div with an id on a page

I want to scrape the contents of a page, well really just a single div from that page, and display it to the user inside of a small div on a webpage. I just need a piece of info from a carfax page that needs user credentials so I can't post the exact code but I tried using google.com and have the same problem so the solution should cross over.
Right now I've tried this:
$webPage = file_get_contents('http://www.google.com');
$doc = new DOMDocument();
$doc->loadHTML($webPage);
$div = $doc->getElementById('lga');//this is the id to the div holding the image above the textbox
//echo $webPage;//this displays www.google.com minus the image. I imagine because of the file path
//var_dump($div);//this display "object(DOMElement)#2 (0) { }" and I'm not sure what that means
//echo $div;//this has a server error
I'm also looking at simple_html_dom.php trying to figure that out.
You can use this:
/**
* Downloads a web page from $url, selects the element by $id
* and returns its XML string representation.
*/
function getElementByIdAsString($url, $id, $pretty = true) {
$doc = new DOMDocument();
// @ suppresses parser warnings; loadHTMLFile() returns false on failure
if(!@$doc->loadHTMLFile($url)) {
throw new Exception("Failed to load $url");
}
// Obtain the element
$element = $doc->getElementById($id);
if(!$element) {
throw new Exception("An element with id $id was not found");
}
if($pretty) {
$doc->formatOutput = true;
}
// Return the string representation of the element
return $doc->saveXML($element);
}
// call it:
echo getElementByIdAsString('http://www.google.com', 'lga');

Building an xml document from functions that return DOMNodes

My question is a rather simple one for anyone familiar with the DOM* classes in PHP.
Basically, I have different classes, and I want each of them to return something that I can append to my XML document.
The following pseudo-code should demonstrate it better:
Class ChildObject{ function exportToXML( return a DOMNode ? ) }
Class ContainerObject{
function exportToXML(){
$domSomething = new DOM*SOMETHING*;
foreach($children as $child) $domSomething->appendChild($child->exportToXML);
return $domSomething ;
}
}
Now i want to create the entire DOMDocument
$xml = new DOMDocument();
$root = $xml->createElement('root');
foreach($containers as $container) $root->appendChild($container->exportToXML());
I tried passing the DOMDocument object by reference, but that did not work. I tried creating DOMNodes, but that did not work either. So I'm looking for a simple answer: what data type do I need to return in order to achieve the above functionality?
<?php
$xml = new DOMDocument();
$h = $xml->createElement('hello');
$node1 = new DOMNode('aaa');
$node1->appendChild(new DOMText('new text content'));
//node1 is being returned by a function
$node2 = new DOMNode('bbb');
$node2->appendChild(new DOMText('new text content'));
//node2 is being returned by some other function
$h->appendChild($node1);//append to this element the returned node1
$h->appendChild($node2);//append to this element the returned node2
$xml->appendChild($h);//append to the document the root node
$content = $xml->saveXML();
file_put_contents('xml.xml', $content);//output to an xml file
?>
The above code should do the following.
Consider that I want to build the following XML:
<hello>
<node1>aaa</node1>
<node2>bbb</node2>
</hello>
node1 could be again a node that has multiple children so node1 could be as well as something like this:
<node1>
<child1>text</child1>
<child2>text</child2>
<child3>
<subchild1>text</subchild1>
</child3>
</node1>
Basically, when I call exportToXML(), something should be returned (call it $x) that I can append to my document using $xml->appendChild($x);
I want to create the above structure and return an object that can be appended to the DOMDocument.
The following code:
<?php
$xml = new DOMDocument();
$h = $xml->appendChild($xml->createElement('hello'));
$node1 = $h->appendChild($xml->createElement('aaa'));
$node1->appendChild($xml->createTextNode('new text content'));
$node2 = $h->appendChild($xml->createElement('bbb'));
$node2->appendChild($xml->createTextNode('new text content'));
$xml->save("xml.xml");
?>
will produce:
<?xml version="1.0"?>
<hello>
<aaa>new text content</aaa>
<bbb>new text content</bbb>
</hello>
Your example XML showed <node1>aaa</node1> but I think your various code snippet examples went out of sync when you were editing =) In case you need that output, try:
<?php
$xml = new DOMDocument();
$h = $xml->appendChild($xml->createElement('hello'));
$node1 = $h->appendChild($xml->createElement('node1'));
$node1->appendChild($xml->createTextNode('aaa'));
$node2 = $h->appendChild($xml->createElement('node2'));
$node2->appendChild($xml->createTextNode('bbb'));
$xml->save("xml.xml");
?>
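To keep the exportToXML() structure from the question, one option is to pass the owning DOMDocument into each method and return a DOMElement created by that document, since a node generally has to be created by (or imported into) the document it is appended to. This is only a sketch: the class names follow the question's pseudo-code, while the constructors and the 'container' element name are assumptions made for illustration.

```php
<?php
// Hypothetical classes, named after the pseudo-code in the question.
class ChildObject
{
    private $name;
    private $text;

    public function __construct(string $name, string $text)
    {
        $this->name = $name;
        $this->text = $text;
    }

    // The owning document is passed in, so the element is created by it.
    public function exportToXML(DOMDocument $doc): DOMElement
    {
        $el = $doc->createElement($this->name);
        $el->appendChild($doc->createTextNode($this->text));
        return $el;
    }
}

class ContainerObject
{
    private $children;

    public function __construct(array $children)
    {
        $this->children = $children;
    }

    public function exportToXML(DOMDocument $doc): DOMElement
    {
        // 'container' is an assumed element name.
        $el = $doc->createElement('container');
        foreach ($this->children as $child) {
            $el->appendChild($child->exportToXML($doc));
        }
        return $el;
    }
}

$xml = new DOMDocument();
$root = $xml->appendChild($xml->createElement('root'));
$container = new ContainerObject([
    new ChildObject('node1', 'aaa'),
    new ChildObject('node2', 'bbb'),
]);
$root->appendChild($container->exportToXML($xml));
$out = $xml->saveXML();
```

Because every node comes from the same document, no importNode call is needed when the pieces are assembled.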

How to insert HTML to PHP DOMNode?

Is there any way I can insert an HTML template to existing DOMNode without content being encoded?
I have tried to do that with:
$dom->createElement('div', '<h1>Hello world</h1>');
$dom->createTextNode('<h1>Hello world</h1>');
The output is pretty much the same, with only difference that first code would wrap it in a div.
I have tried to loadHTML from a string, but I have no idea how to append its body content to another DOMDocument.
In javascript, this process seems to be quite simple and obvious.
You can use
DOMDocumentFragment::appendXML — Append raw XML data
Example:
// just some setup
$dom = new DOMDocument;
$dom->loadXml('<html><body/></html>');
$body = $dom->documentElement->firstChild;
// this is the part you are looking for
$template = $dom->createDocumentFragment();
$template->appendXML('<h1>This is <em>my</em> template</h1>');
$body->appendChild($template);
// output
echo $dom->saveXml();
Output:
<?xml version="1.0"?>
<html><body><h1>This is <em>my</em> template</h1></body></html>
If you want to import from another DOMDocument, replace the three lines with
$tpl = new DOMDocument;
$tpl->loadXml('<h1>This is <em>my</em> template</h1>');
$body->appendChild($dom->importNode($tpl->documentElement, TRUE));
Using TRUE as the second argument to importNode will do a recursive import of the node tree.
If you need to import (malformed) HTML, change loadXml to loadHTML. This will trigger the HTML parser of libxml (what ext/DOM uses internally):
libxml_use_internal_errors(true);
$tpl = new DOMDocument;
$tpl->loadHtml('<h1>This is <em>malformed</em> template</h2>');
$body->appendChild($dom->importNode($tpl->documentElement, TRUE));
libxml_use_internal_errors(false);
Note that libxml will try to correct the markup, e.g. it will change the wrong closing </h2> to </h1>.
It works with another DOMDocument for parsing the HTML code. But you need to import the nodes into the main document before you can use them in it:
$newDiv = $dom->createElement('div');
$tmpDoc = new DOMDocument();
$tmpDoc->loadHTML($str);
foreach ($tmpDoc->getElementsByTagName('body')->item(0)->childNodes as $node) {
$node = $dom->importNode($node, true);
$newDiv->appendChild($node);
}
And as a handy function:
function appendHTML(DOMNode $parent, $source) {
$tmpDoc = new DOMDocument();
$tmpDoc->loadHTML($source);
foreach ($tmpDoc->getElementsByTagName('body')->item(0)->childNodes as $node) {
$node = $parent->ownerDocument->importNode($node, true);
$parent->appendChild($node);
}
}
Then you can simply do this:
$elem = $dom->createElement('div');
appendHTML($elem, '<h1>Hello world</h1>');
I do not want to struggle with XML, because it throws errors more readily, and I am not a fan of prefixing an @ to suppress error output. loadHTML does the better job in my opinion, and it is as simple as this:
$doc = new DOMDocument();
$div = $doc->createElement('div');
// use a helper to load the HTML into a string
$helper = new DOMDocument();
$helper->loadHTML('This is my HTML Link.');
// now the magic!
// import the document node of the $helper object deeply (true)
// into the $div and append as child.
$div->appendChild($doc->importNode($helper->documentElement, true));
// add the div to the $doc
$doc->appendChild($div);
// final output
echo $doc->saveHTML();
Here is a simple example using DOMDocumentFragment:
$doc = new DOMDocument();
$doc->loadXML("<root/>");
$f = $doc->createDocumentFragment();
$f->appendXML("<foo>text</foo><bar>text2</bar>");
$doc->documentElement->appendChild($f);
echo $doc->saveXML();
Here is a helper function for replacing a DOMNode:
/**
* Helper function for replacing $node (DOMNode)
* with XML code (string)
*
* @param DOMNode $node
* @param string $xml
*/
public function replaceNodeXML(&$node, $xml) {
$f = $this->dom->createDocumentFragment();
$f->appendXML($xml);
$node->parentNode->replaceChild($f,$node);
}
Source: Some old "PHP5 Dom Based Template" article.
And here is another suggestion, posted by Pian0_M4n, that uses a value attribute as a workaround:
$dom = new DomDocument;
// main object
$object = $dom->createElement('div');
// html attribute
$attr = $dom->createAttribute('value');
// ugly html string
$attr->value = "<div> this is a really html string &copy;</div><i></i> with all the &copy; that XML hates!";
$object->appendChild($attr);
// jquery fix (or javascript as well)
$('div').html($(this).attr('value')); // and it works!
$('div').removeAttr('value'); // to clean-up
Not ideal, but at least it works.
Gumbo's code works perfectly! One small enhancement: add the TRUE parameter so that it works with nested HTML snippets. Change
$node = $parent->ownerDocument->importNode($node);
to
$node = $parent->ownerDocument->importNode($node, TRUE);

Can I get the matched DOM string with PHP and DOMDocument?

I've got my HTML inside of $html.
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$tags = $xpath->query('//div[@id="header"]');
foreach($tags as $tag) {
var_dump($tag->nodeValue); // the innerHTML of that element
var_dump($tag); // object(DOMElement)#3 (0) { }
}
Is there a way to get that node, or remove it?
Basically, I'm parsing an existing website and need to remove elements from it. What method do I call to do that?
Thanks
Have you checked out DOMNode::removeChild? It is called on the parent of the node you want to remove.
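As a rough sketch of both parts of the question (getting the matched markup as a string, then removing the node), saveHTML() accepts a node argument and removeChild() is called through parentNode. The sample $html string below is made up for illustration.

```php
<?php
// Made-up input standing in for the page being parsed.
$html = '<html><body><div id="header"><h1>Title</h1></div><p>Body text</p></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$removed = [];
// Copy the matches into an array first so that removing nodes
// cannot disturb the iteration.
foreach (iterator_to_array($xpath->query('//div[@id="header"]')) as $tag) {
    // saveHTML($node) serializes just that node, markup included.
    $removed[] = $dom->saveHTML($tag);
    // A node is removed through its parent.
    $tag->parentNode->removeChild($tag);
}

$remaining = $dom->saveHTML();
```

After the loop, $removed holds the markup of each matched div and $remaining is the document without them.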

Add node to a XML variable and Save it

I want to add an XML node, i.e. <name>B07BZFZV8D</name>, to the XML variable before saving it.
I want to add the 'name' node inside the 'Self' element.
Previously I saved it directly like this:
$Response #this is the response from the API
$dom = new DOMDocument;
$dom->preserveWhiteSpace = FALSE;
$dom->loadXML($Response);
##saving in file
$myfile = file_put_contents('data.xml', $Response.PHP_EOL , FILE_APPEND | LOCK_EX);
With DOM you use methods of the document object to create the node and methods of the parent node to insert/add it to the hierarchy.
DOMDocument has create* methods for the different node types (element, text, cdata section, comment, ...). The parent nodes (element, document, fragment) have methods like appendChild and insertBefore to add/remove them.
Xpath can be used to fetch nodes from the DOM.
$document = new DOMDocument;
$document->preserveWhiteSpace = FALSE;
$document->loadXML($xmlString);
$xpath = new DOMXpath($document);
// fetch the first Data element inside the Report document element
foreach ($xpath->evaluate('/Report/Data[1]') as $data) {
// create the name element and append it
$name = $data->appendChild($document->createElement('name'));
// create a node for the text content and append it
$name->appendChild($document->createTextNode('Vivian'));
}
$document->formatOutput = TRUE;
echo $document->saveXML();
Output:
<?xml version="1.0" encoding="UTF-8"?>
<Report>
<Data>
<id>87236</id>
<purchase>3</purchase>
<address>XXXXXXXX</address>
<name>Vivian</name>
</Data>
</Report>
Using @ThW's code:
The createElement call needs to change so that it sets the value directly:
$document = new DOMDocument;
$document->preserveWhiteSpace = FALSE;
$document->loadXML($xmlString);
$xpath = new DOMXpath($document);
// fetch the first Data element inside the Report document element
foreach ($xpath->evaluate('/Report/Data[1]') as $data) {
// create the name element with value and append it
$xmlElement = $document->createElement('name', 'Vivian');
$data->appendChild($xmlElement);
}
$document->formatOutput = TRUE;
echo $document->saveXML();
It works for me with PHP 7.0. Check whether it works for you.
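One caveat with the shorter form: the second argument of createElement() is not fully escaped, so a value containing a bare & is treated as the start of an entity reference and can trigger a warning. When the value comes from outside (an API response, for instance), the createTextNode() variant is the safer choice. A small sketch, with a made-up value:

```php
<?php
$document = new DOMDocument();
$data = $document->appendChild($document->createElement('Data'));

// Made-up value containing a character that is special in XML.
$value = 'Tom & Jerry';

// createTextNode() stores the raw text; it is escaped on serialization.
$name = $data->appendChild($document->createElement('name'));
$name->appendChild($document->createTextNode($value));

$xmlOut = $document->saveXML();
```

The serialized document then contains the ampersand as &amp;, and reading the node back through textContent returns the original string.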
