Cannot access child nodes while parsing XML with PHP and Xpath - php

I have the following XML:
<Logs>
<UnplugDate>
<Date>2013-09-10T09:20:00</Date>
<Date>2013-09-09T16:03:00</Date>
</UnplugDate>
What I'm trying to do is read the values of both <Date> elements under the <UnplugDate> tag.
I tried using hasChildNodes(), but when I debug, execution never enters the
foreach($unplug_date as $node)
block.
Any idea how I can read these values? Thanks in advance.
$logs = $key->getElementsByTagName(tag_constants::TAG_LOGS);
$unplug_date = $logs->item(0)->getElementsByTagName(tag_constants::TAG_UNPLUG_DATE)->item(0);
foreach($unplug_date as $node) {
if($node->hasChildNodes()) {
foreach ($node->childNodes as $unplug_date_value) {
$unplug_date_value = $unplug_date->getElementsByTagName(tag_constants::TAG_DATE)->item(0)->nodeValue;
}
}
}
NOTE:
tag_constants::TAG_LOGS -> Logs
tag_constants::TAG_UNPLUG_DATE -> UnplugDate
tag_constants::TAG_DATE -> Date

I finally found the solution. Writing:
$test = $unplug_date->getElementsByTagName(tag_constants::TAG_DATE);
instead of
$unplug_date = $logs->item(0)->getElementsByTagName(tag_constants::TAG_UNPLUG_DATE)->item(0);
solves the problem.
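The likely reason this works: item(0) returns a single DOMElement, and a foreach over that element does not walk its children, whereas getElementsByTagName() returns a DOMNodeList that can be iterated. A minimal sketch of reading both dates, assuming the XML from the question with a closing </Logs> tag added so that it parses, and with the tag_constants values written out as literal strings:

```php
<?php
// Sample document mirroring the question (closing </Logs> added as an assumption).
$xml = <<<XML
<Logs>
  <UnplugDate>
    <Date>2013-09-10T09:20:00</Date>
    <Date>2013-09-09T16:03:00</Date>
  </UnplugDate>
</Logs>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

// item(0) yields one DOMElement; it is not itself a list of its children.
$unplugDate = $doc->getElementsByTagName('UnplugDate')->item(0);

// Iterate the DOMNodeList of <Date> elements instead.
$dates = [];
foreach ($unplugDate->getElementsByTagName('Date') as $dateNode) {
    $dates[] = $dateNode->nodeValue;
}

// The same selection expressed with XPath.
$xpath = new DOMXPath($doc);
$viaXPath = [];
foreach ($xpath->query('/Logs/UnplugDate/Date') as $dateNode) {
    $viaXPath[] = $dateNode->nodeValue;
}

print_r($dates);
```

Both loops collect the two date strings in document order.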

Related

Determine what element is now xpath html foreach?

I am in the middle of a process that should extract something from a HTML page. I am fairly new to DomDocument in PHP, but I got this together from some tutorials and Stack Overflow.
Unfortunately, I need to know what element I am currently getting in the foreach loop below. As far as I know, the getName() function has something to do with XML, because it gives an Undefined Function Fatal error. Do you guys know any way to do this?
$rawdom = new DOMDocument();
$rawdom->loadHTML($page);
$finder = new DomXPath($rawdom);
$nodes = $finder->query("//dl[contains(@class, 'layout__definitionlist')]");
$tmp_dom = new DOMDocument();
foreach ($nodes as $node) {
echo $node->getName();
$tmp_dom->appendChild($tmp_dom->importNode($node,true));
}
$innerHTML = $tmp_dom->saveHTML();
echo $innerHTML;
With DOMElement objects, the element name is not accessible using a getName() function, but as property $tagName:
echo $node->tagName;
getName() is only available with SimpleXMLElement, which is another XML/XPath API for PHP.
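As a small illustration of the point above, here is a sketch that records the tag name of each matched element and of its element children. The $page markup is a made-up stand-in for the real page:

```php
<?php
// Hypothetical markup standing in for the fetched page.
$page = '<html><body><dl class="layout__definitionlist"><dt>Key</dt><dd>Value</dd></dl></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($page);
$finder = new DOMXPath($dom);

$names = [];
foreach ($finder->query("//dl[contains(@class, 'layout__definitionlist')]") as $node) {
    // DOMElement exposes its name as the tagName property.
    $names[] = $node->tagName;
    foreach ($node->childNodes as $child) {
        // Child nodes may also be text nodes; only elements have a tagName.
        if ($child instanceof DOMElement) {
            $names[] = $child->tagName;
        }
    }
}

echo implode(', ', $names), "\n";
```

The nodeName property is available on every DOMNode, so it can be used when the loop may also encounter text or comment nodes.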

PHP DOMDocument - trouble accessing list index

I am writing some code for an IRC bot written in PHP and running on the Linux CLI. I'm having a little trouble with my code to retrieve a website's title tag and display it using a DOMDocument node list. Basically, on websites with two or more title tags (and you would be surprised how many there actually are...) I want to process only the first title tag. As you can see from the code below (which works fine for processing one or more tags), there is a foreach block that iterates through each title tag.
public function onReceivedData($data) {
// loop through each message token
foreach ($data["message"] as $token) {
// if the token starts with www, add http file handle
if (strcmp(substr($token, 0, 4), "www.") == 0) {
$token = "http://" . $token;
}
// validate token as a URL
if (filter_var($token, FILTER_VALIDATE_URL)) {
// create timeout stream context
$theContext['http']['timeout'] = 3;
$context = stream_context_create($theContext);
// get contents of url
if ($file = file_get_contents($token, false, $context)) {
// instantiate a new DOMDocument object
$dom = new DOMDocument;
// load the html into the DOMDocument obj
@$dom->loadHTML($file);
// retrieve the title from the DOM node
// if assignment is valid then...
if ($title = $dom->getElementsByTagName("title")) {
// send a message to the channel
foreach ($title as $theTitle) {
$this->privmsg($data["target"], $theTitle->nodeValue);
}
}
} else {
// notify of failure
$this->privmsg($data["target"], "Site could not be reached");
}
}
}
}
What I'd prefer is to somehow limit it to processing only the first title tag. I'm aware that I can just wrap an if statement around it with a flag variable so it only echoes once, but I'd rather use a "for" statement to process a single iteration. However, when I do this, I can't access the title's value with $title->nodeValue; it says it's undefined, and only when I use foreach ($title as $theTitle) can I access the values. I've tried $title[0]->nodeValue and $title->nodeValue(0) to retrieve the first title from the list, but to no avail. I'm a bit stumped, and a quick Google search didn't turn up much.
Any help would be greatly appreciated! Cheers, and I'll keep looking too.
You can solve this with XPath:
$dom = new DOMDocument();
@$dom->loadHTML($file);
$xpath = new DOMXPath($dom);
$title = $xpath->query('//title')->item(0)->nodeValue;
Try something like this:
$title->item(0)->nodeValue;
http://www.php.net/manual/en/class.domnodelist.php
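Note that getElementsByTagName() returns a DOMNodeList object even when nothing matches, so the if around the assignment in the question is always true. A minimal sketch that checks the list length before reading the first entry, using a made-up $file with two title tags:

```php
<?php
// Hypothetical page content with two <title> tags.
$file = '<html><head><title>First</title><title>Second</title></head><body></body></html>';

$dom = new DOMDocument();
// Collect parser warnings internally instead of silencing them with @.
libxml_use_internal_errors(true);
$dom->loadHTML($file);
libxml_clear_errors();

$titles = $dom->getElementsByTagName('title');

// The node list exists even when it is empty, so test its length.
$first = $titles->length > 0 ? trim($titles->item(0)->nodeValue) : null;

if ($first !== null) {
    echo $first, "\n";
}
```

In the bot, the echo would be replaced by the privmsg() call from the question.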

How to change root of a node with DOMDocument methods?

How to only change root's tag name of a DOM node?
In the DOM document model we cannot change the documentElement property of a DOMDocument object, so we need to "rebuild" the node... But how do we "rebuild" it with the childNodes property?
NOTE: I can do this by converting to a string with saveXML and cutting the root out with regular expressions... But that is a workaround, not a DOM solution.
Attempts that do not work (PHP examples)
PHP examples (they do not work, but WHY?):
Try-1
// DOMElement::documentElement can not be changed, so...
function DomElement_renameRoot1($ele,$ROOTAG='newRoot') {
if (gettype($ele)=='object' && $ele->nodeType==XML_ELEMENT_NODE) {
$doc = new DOMDocument();
$eaux = $doc->createElement($ROOTAG); // DOMElement
foreach ($ele->childNodes as $node)
if ($node->nodeType == 1) // DOMElement
$eaux->appendChild($node); // error!
elseif ($node->nodeType == 3) // DOMText
$eaux->appendChild($node); // error!
return $eaux;
} else
die("ERROR: invalid DOM object as input");
}
The appendChild($node) call causes an error:
Fatal error: Uncaught exception 'DOMException'
with message 'Wrong Document Error'
Try-2
From @can's suggestion (just a link) and my interpretation of the sparse DOMDocument::renameNode manual page.
function DomElement_renameRoot2($ele,$ROOTAG='newRoot') {
$ele->ownerDocument->renameNode($ele,null,"h1");
return $ele;
}
The renameNode() method caused an error,
Warning: DOMDocument::renameNode(): Not yet implemented
Try-3
From PHP manual, comment 1.
function renameNode(DOMElement $node, $newName)
{
$newNode = $node->ownerDocument->createElement($newName);
foreach ($node->attributes as $attribute)
$newNode->setAttribute($attribute->nodeName, $attribute->nodeValue);
while ($node->firstChild)
$newNode->appendChild($node->firstChild); // changes firstChild to next!?
$node->ownerDocument->replaceChild($newNode, $node); // changes $node?
// not need return $newNode;
}
The replaceChild() method caused an error,
Fatal error: Uncaught exception 'DOMException' with message 'Not Found Error'
As this has not really been answered yet: the "Not Found Error" you get is caused by a small mistake in the renameNode() function you copied.
In a somewhat related question about renaming different elements in the DOM I ran into this problem as well, and in my answer I used an adaptation of that function that does not have this error:
/**
* Renames a node in a DOM Document.
*
* @param DOMElement $node
* @param string $name
*
* @return DOMNode
*/
function dom_rename_element(DOMElement $node, $name) {
$renamed = $node->ownerDocument->createElement($name);
foreach ($node->attributes as $attribute) {
$renamed->setAttribute($attribute->nodeName, $attribute->nodeValue);
}
while ($node->firstChild) {
$renamed->appendChild($node->firstChild);
}
return $node->parentNode->replaceChild($renamed, $node);
}
You might have spotted it in the last line of the function body: it uses ->parentNode instead of ->ownerDocument. Because $node was not a direct child of the document, you got the error; it was also wrong to assume that it would be. Instead, call replaceChild() on the parent node, which is where the child to replace actually lives ;)
This has not been pointed out in the PHP manual user notes so far. However, if you follow the link to the blog post that originally suggested the renameNode() function, you will find a comment below it offering this solution as well.
Anyway, my variant here uses slightly different variable naming and is more explicit about the types. Like the example in the PHP manual, it lacks a variant that deals with namespaced nodes. I have not yet decided what would be best, e.g. creating an additional function for it, taking over the namespace from the node being renamed, or changing the namespace explicitly in a different function.
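For illustration, here is a self-contained sketch of the same parentNode-based approach applied to a root element. The function name rename_element_sketch and the sample document are made up for this example, and unlike the function above it returns the new element rather than the result of replaceChild(). Namespaces are not handled.

```php
<?php
// Hypothetical helper following the approach above; no namespace handling.
function rename_element_sketch(DOMElement $node, string $name): DOMElement
{
    $renamed = $node->ownerDocument->createElement($name);

    // Copy the attributes over to the new element.
    foreach ($node->attributes as $attribute) {
        $renamed->setAttribute($attribute->nodeName, $attribute->nodeValue);
    }

    // Appending a node moves it, so firstChild advances until none are left.
    while ($node->firstChild) {
        $renamed->appendChild($node->firstChild);
    }

    // Replace via the parent; for the root element the parent is the document.
    $node->parentNode->replaceChild($renamed, $node);

    return $renamed;
}

$doc = new DOMDocument();
$doc->loadXML('<root id="r"><item>a</item><item>b</item></root>');

rename_element_sketch($doc->documentElement, 'newRoot');

// prints <newRoot id="r"><item>a</item><item>b</item></newRoot>
echo $doc->saveXML($doc->documentElement), "\n";
```

Because the replacement goes through parentNode, the same call works for nested elements as well as for the root.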
First, you need to understand that the DOMDocument is only the hierarchical root of the document tree. Its name is always #document. You want to rename the root element, which is $document->documentElement.
If you want to copy nodes from one document to another, you'll need to use the importNode() function: $document->importNode($nodeInAnotherDocument)
Edit:
renameNode() is not implemented yet, so you should make another root, and simply replace it with the old one. If you use DOMDocument->createElement() you don't need to use importNode() on it later.
$oldRoot = $doc->documentElement;
$newRoot = $doc->createElement('new-root');
foreach ($oldRoot->attributes as $attr) {
$newRoot->setAttribute($attr->nodeName, $attr->nodeValue);
}
while ($oldRoot->firstChild) {
$newRoot->appendChild($oldRoot->firstChild);
}
$doc->replaceChild($newRoot, $oldRoot);
This is a variation of my "Try-3" (see question), and it works fine!
function xml_renameNode(DOMElement $node, $newName, $cpAttr=true) {
$newNode = $node->ownerDocument->createElement($newName);
if ($cpAttr && is_array($cpAttr)) {
foreach ($cpAttr as $k=>$v)
$newNode->setAttribute($k, $v);
} elseif ($cpAttr)
foreach ($node->attributes as $attribute)
$newNode->setAttribute($attribute->nodeName, $attribute->nodeValue);
while ($node->firstChild)
$newNode->appendChild($node->firstChild);
return $newNode;
}    
Of course, if you show how to use DOMDocument::renameNode (without errors!), the bounty goes for you!
It seems to me that your approach attempts to import nodes from another DOMDocument, so you need to use the importNode() method:
$d = new DOMDocument();
/* Make a `foo` element the root element of $d */
$root = $d->createElement("foo");
$d->appendChild($root);
/* Append a `bar` element as the child element of the root of $d */
$child = $d->createElement("bar");
$root->appendChild($child);
/* New document */
$d2 = new DOMDocument();
/* Make a `baz` element the root element of $d2 */
$root2 = $d2->createElement("baz");
$d2->appendChild($root2);
/*
* Import a clone of $child (from $d) into $d2,
* with its child nodes imported recursively
*/
$child2 = $d2->importNode($child, true);
/* Add the clone as the child node of the root of $d2 */
$root2->appendChild($child2);
However, it is far easier to append the child nodes to a new parent element (thereby moving them), and replace the old root with that parent element:
$d = new DOMDocument();
/* Make a `foo` element the root element of $d */
$root = $d->createElement("foo");
$d->appendChild($root);
/* Append a `bar` element as the child element of the root of $d */
$child = $d->createElement("bar");
$root->appendChild($child);
/* <?xml version="1.0"?>
<foo><bar/></foo> */
echo $d->saveXML();
$root2 = $d->createElement("baz");
/* Make the `bar` element the child element of `baz` */
$root2->appendChild($child);
/* Replace `foo` with `baz` */
$d->replaceChild($root2, $root);
/* <?xml version="1.0"?>
<baz><bar/></baz> */
echo $d->saveXML();
I hope I am not missing anything, but I had a similar problem and was able to solve it by using DOMDocument::replaceChild(...).
/* @var $doc DOMDocument */
$doc = (new DOMImplementation())->createDocument(null, 'oldRoot');
/* @var $newRoot DOMElement */
$newRoot = $doc->createElement('newRoot');
/* all the code to create the elements under $newRoot */
$doc->replaceChild($newRoot, $doc->documentElement);
$doc->documentElement->isSameNode($newRoot) === true;
What threw me off initially was that $doc->documentElement is read-only, but the above worked and seems to be a much simpler solution IF $newRoot is created with the same DOMDocument; otherwise you'll need the importNode() solution described above. From your question it appears that $newRoot could be created from the same $doc.
Let us know if this worked out for you. Cheers.
EDIT: Noticed in version 20031129 that the DomDocument::$formatOutput, if set, does not format $newRoot output when you finally call $doc->saveXML()

Change relative path to full in css

I am using Simple HTML DOM to extract data from a website and parse it. However, I cannot change one of the relative paths in the style tag to a full one. I have tried many combinations.
I found a post here that uses a PEAR script with Simple HTML DOM, and it has worked on all links except the one below.
require_once 'includes/URL2.php';
$uri = new Net_URL2('http://www.stormcinemas.ie'); // URI of the resource
$baseURI = $uri;
foreach ($htmlcss->find('background[url]') as $elem) {
$elem->url = $baseURI->resolve($elem->url)->__toString();
}
foreach ($html->find('*[src]') as $elem) {
$elem->src = $baseURI->resolve($elem->src)->__toString();
}
foreach ($html->find('*[href]') as $elem) {
if (strtoupper($elem->tag) === 'BASE') continue;
$elem->href = $baseURI->resolve($elem->href)->__toString();
}
foreach ($html->find('form[action]') as $elem) {
$elem->action = $baseURI->resolve($elem->action)->__toString();
}
style.css
<style>
div.spriteImgSmall { background: url(/images/css_sprites/film_sprites/smallimages_sprite.jpg); }
</style>
Thanks
The solution was posted here but was unfortunately deleted. Thanks again, it actually did solve my problem.
Here it is for future reference.
$htmlcss = preg_replace('/url\(\s*[\'"]?\/?(.+?)[\'"]?\s*\)/i', 'url('.
$baseURI.'/$1)', $htmlcss);
I would still be interested if someone knows how to use Simple HTML DOM on CSS, as I could not find anything about it online. It may not even be possible.
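One caveat with the regex above is that it also prefixes URLs that are already absolute. The following is a sketch of a callback-based variant that skips those. The function name absolutize_css_urls is made up, and it assumes every relative path should be resolved against the site root, which is a simplification of real URL resolution:

```php
<?php
// Hypothetical helper: prefixes relative url(...) values with a base URI.
// Assumption: all relative paths are treated as relative to the site root.
function absolutize_css_urls(string $css, string $base): string
{
    return preg_replace_callback(
        '/url\(\s*([\'"]?)(.+?)\1\s*\)/i',
        function (array $m) use ($base): string {
            $path = $m[2];
            // Leave absolute, protocol-relative and data: URLs untouched.
            if (preg_match('#^(?:[a-z][a-z0-9+.-]*:|//)#i', $path)) {
                return $m[0];
            }
            return 'url(' . rtrim($base, '/') . '/' . ltrim($path, '/') . ')';
        },
        $css
    );
}

$css = 'div.spriteImgSmall { background: url(/images/css_sprites/film_sprites/smallimages_sprite.jpg); }';
echo absolutize_css_urls($css, 'http://www.stormcinemas.ie'), "\n";
```

For paths that are relative to the stylesheet's own location, a resolver such as the Net_URL2::resolve() call used in the question could be applied inside the callback instead of plain concatenation.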

Php wrapper class for XML

I'm working on a new class to wrap XML handling. I want my class to use simplexml if it's installed, and the built in XML functions if it's not. Can anyone give me some suggestions on a skeleton class to do this? It seems "wrong" to litter each method with a bunch of if statements, and that also seems like it would make it nearly impossible to correctly test.
Any upfront suggestions would be great!
EDIT: I'm talking about these built-in xml functions.
Which built-in xml functions are you referring to? SimpleXml is a standard extension, which uses libxml underneath - just as the dom extension does. So if the dom extension is installed, chances are that so is SimpleXml.
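If a fallback is still wanted, one way to avoid if statements in every method is to put each implementation behind a shared interface and choose one in a single place. The sketch below assumes that the "built-in XML functions" are the xml_parser_* family, which the question does not confirm. All class, interface and function names are hypothetical.

```php
<?php
// Hypothetical interface; each backend can be tested in isolation.
interface XmlBackend
{
    /** Returns the text content of every element named $tag. */
    public function textOf(string $xml, string $tag): array;
}

final class SimpleXmlBackend implements XmlBackend
{
    public function textOf(string $xml, string $tag): array
    {
        $values = [];
        foreach (simplexml_load_string($xml)->xpath('//' . $tag) as $element) {
            $values[] = (string) $element;
        }
        return $values;
    }
}

// Assumption: the fallback uses the xml_parser_* functions.
final class ParserFunctionsBackend implements XmlBackend
{
    public function textOf(string $xml, string $tag): array
    {
        $parser = xml_parser_create();
        // Keep tag names as written instead of upper-casing them.
        xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
        xml_parse_into_struct($parser, $xml, $struct);

        $values = [];
        foreach ($struct as $entry) {
            if ($entry['tag'] === $tag && isset($entry['value'])) {
                $values[] = $entry['value'];
            }
        }
        return $values;
    }
}

// The only place that decides which implementation is used.
function make_xml_backend(): XmlBackend
{
    return extension_loaded('simplexml')
        ? new SimpleXmlBackend()
        : new ParserFunctionsBackend();
}

$xml = '<word name="nameofitem"><en>value</en><pt>valor</pt></word>';
print_r(make_xml_backend()->textOf($xml, 'pt'));
```

The wrapper class then holds one XmlBackend instance and delegates to it, so tests can exercise each backend directly regardless of which extensions the factory would pick.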
I've made a class which wraps SimpleXml functionality... take what you may from it...
bXml.class.inc
There is one weird thing: SimpleXml doesn't allow its constructor to be overridden, so you can't do things at instantiation, like overriding the input value (i.e. so you can accept XML as an input). I got around that limitation by using an ArrayObject class to wrap the new SimpleXml class.
I use something like this for doing xml translations and content:
Assuming xml structure something like this (important to use a regular structure, means you can pull off some nice agile tricks!):
<word name="nameofitem">
<en>value</en>
<pt>valor</pt>
<de>value_de</de>
</word>
and then a class to handle the xml:
class translations
{
public $xml = null;
private $file = null;
private $dom = null;
private $haschanges = false;
function __construct($file="translations") {
// get xml
$this->file = $file;
$this->haschanges = false;
$this->xml = file_get_contents($_SERVER['DOCUMENT_ROOT']."/xml/".$file.".xml");
$this->dom = new DOMdocument();
$this->dom->loadXML($this->xml);
}
function updateNode($toupdate, $newvalue, $lang="pt",$rootnode="word"){
$this->haschanges = true;
$nodes = $this->dom->getElementsByTagName($rootnode);
foreach ($nodes as $key => $value) {
if ($value->getAttribute("name")==$toupdate) {
$nodes->item($key)->getElementsByTagName($lang)->item(0)->nodeValue = htmlspecialchars($newvalue,ENT_QUOTES,'UTF-8');
}
}
}
function saveUpdated(){
$toSave = $this->dom->saveXML();
if ($this->haschanges === true) {
file_put_contents($_SERVER['DOCUMENT_ROOT']."/xml/".$this->file.".xml", $toSave);
return true;
}
else {
return false;
}
}
}
I took out a few of the methods I have, for brevity, but I extend this with things to handle file and image uploads etc too.
Once you have all this you can do:
$xml = new translations();
// loop through all the language posts
foreach ($_POST["xml"]["en"] as $key => $value) {
$xml->updateNode($key, stripslashes($value), "en");
}
Or something ;) hope this gives you some ideas!
