Getting nodes by attribute value in DomXML and PHP

I am trying to select a specific node to edit its content and attributes later on, but I am not able to select a notice node (see the PHP code below).
XML document
<?xml version="1.0" encoding="UTF-8"?>
<notices lastID="0001">
<!--
<notice id="0000" endDate="31-12-2025">
Content of the notice.
</notice>
-->
<notice id="0001" endDate="13-01-2013" active="1">
One amazing notice.
</notice>
</notices>
The $id value is "0001" (string).
PHP
$document = new DOMDocument;
$document = dom_import_simplexml($this->xml);
$notices= $document->getElementsByTagName('notice');
#var_dump($notices);
foreach($notices as $element)
{
if($element->getAttribute('id') == $id)
{
$notice = $element;
var_dump($notice); //Returns absolutely nothing (I guess the if returns false).
}
}
$notice->removeAttribute("endDate");
$notice->setAttribute("endDate",$endDate);
Every time I fall into the if statement, $notice has no value.
I tried an XPath query (//notice[@id="{$id}"]) without any success.
To clarify, my problem is that $element->getAttribute('id') does not seem to work.
I also tried with SimpleXML:
PHP
$document = new DOMDocument;
$document = simplexml_import_dom($this->xml);
$notice = "";
foreach($document->notices->children() as $element)
{
$attributes = $element->attributes();
if($attributes['id'] == $id)
{
$notice= $element;
var_dump($notice);
}
}
$notice->removeAttribute("endDate");
$notice->setAttribute("endDate",$endDate);
SimpleXML gives me the error "Node no longer exists" on the following line:
foreach($document->notices->children() as $element)

I finally got it.
PHP
$document = simplexml_import_dom($this->xml);
$notice = null;
$xpathResults = $document->xpath('//notice');
foreach($xpathResults as $element)
{
// SimpleXML reads attributes with array syntax: $element['id']
$elementID = (string) $element['id'];
if($elementID === $id)
{
// convert back to a DOMElement so removeAttribute()/setAttribute() are available
$notice = dom_import_simplexml($element);
}
}
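For comparison, here is a minimal DOM-only sketch that selects the notice with an XPath attribute predicate (@id) and updates endDate in place. It assumes the XML is available as a string and that $id and $endDate hold plain strings; the new endDate value is made up for illustration. The commented-out notice is a comment node, so the query never matches it.

```php
<?php
// Sketch: select <notice> by its id attribute with DOMXPath and update endDate.
// The XML string and the $endDate value are illustrative stand-ins for the
// question's $this->xml and method parameters.
$xml = '<notices lastID="0001">'
     . '<!-- <notice id="0000" endDate="31-12-2025">Content of the notice.</notice> -->'
     . '<notice id="0001" endDate="13-01-2013" active="1">One amazing notice.</notice>'
     . '</notices>';

$id = '0001';
$endDate = '31-01-2013';

$document = new DOMDocument();
$document->loadXML($xml);

$xpath = new DOMXPath($document);
// Attributes are addressed with @ in XPath.
$nodes = $xpath->query(sprintf('//notice[@id="%s"]', $id));

if ($nodes->length > 0) {
    $notice = $nodes->item(0); // a DOMElement
    // setAttribute() overwrites an existing attribute, so removeAttribute() is optional
    $notice->setAttribute('endDate', $endDate);
}

echo $document->saveXML();
```

Because setAttribute() replaces the old value, the removeAttribute()/setAttribute() pair in the question can be reduced to a single call.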

Related

PHP - Get value from xml file

Using the following xml: http://www.bnr.ro/nbrfxrates.xml
How can I get the EUR value?
I've been trying this, but with no luck:
$xmlDoc = new DOMDocument();
$xmlDoc->load('http://www.bnr.ro/nbrfxrates.xml');
$searchNode = $xmlDoc->getElementsByTagName("Cube");
var_dump($searchNode);
foreach ($searchNode as $searchNode) {
$valueID = $searchNode->getAttribute('Rate');
echo $valueID;
}
Check this
<?php
$xmlDoc = new DOMDocument();
$xmlDoc->load('http://www.bnr.ro/nbrfxrates.xml');
foreach ($xmlDoc->getElementsByTagName('Rate') as $searchNode) {
if($searchNode->getAttribute('currency') === 'EUR') {
echo $searchNode->nodeValue;
}
}
?>
First, Rate is not an attribute but an element, so you would need another getElementsByTagName('Rate') call and a loop over its results. However, the XML uses a default namespace, so getElementsByTagNameNS('http://www.bnr.ro/xsd', 'Rate') would be the correct way.
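A minimal sketch of the getElementsByTagNameNS() route. The inline XML is a hypothetical, trimmed stand-in for the real feed: only the namespace URI and the EUR value come from this thread, and the USD rate is a placeholder.

```php
<?php
// Sketch: read the EUR rate with getElementsByTagNameNS().
// Hypothetical trimmed sample of nbrfxrates.xml; USD value is a placeholder.
$xml = '<DataSet xmlns="http://www.bnr.ro/xsd"><Body><Cube>'
     . '<Rate currency="USD">1.2345</Rate>'
     . '<Rate currency="EUR">4.4961</Rate>'
     . '</Cube></Body></DataSet>';

$document = new DOMDocument();
$document->loadXML($xml);

$eur = null;
// Match Rate elements by namespace URI + local name, independent of any prefix.
foreach ($document->getElementsByTagNameNS('http://www.bnr.ro/xsd', 'Rate') as $rate) {
    if ($rate->getAttribute('currency') === 'EUR') {
        $eur = (float) $rate->nodeValue;
        break;
    }
}

var_dump($eur);
```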
An easier way is to use XPath to fetch the value directly:
$document = new DOMDocument();
$document->load('http://www.bnr.ro/nbrfxrates.xml');
$xpath = new DOMXpath($document);
$xpath->registerNamespace('r', 'http://www.bnr.ro/xsd');
var_dump(
$xpath->evaluate('number(//r:Cube/r:Rate[@currency="EUR"])')
);
Output:
float(4.4961)
XPath does not have a default namespace, so you have to register your own prefix for the document's namespace (I used r in the example).
The XPath expression, step by step:
fetch any {http://www.bnr.ro/xsd}Cube
//r:Cube
fetch all {http://www.bnr.ro/xsd}Rate children
//r:Cube/r:Rate
filter by the currency attribute
//r:Cube/r:Rate[@currency="EUR"]
cast the first found node into a number
number(//r:Cube/r:Rate[@currency="EUR"])
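To see why the prefix registration matters, this sketch runs the query with and without a registered prefix against a small, hypothetical inline document (the namespace URI is from the answer; the USD rate is a placeholder). Without a prefix, //Rate means "Rate in no namespace", so it finds nothing.

```php
<?php
// Sketch: unprefixed XPath names do not match default-namespaced elements.
// Hypothetical trimmed sample; only the namespace URI and EUR value are from the answer.
$xml = '<DataSet xmlns="http://www.bnr.ro/xsd"><Body><Cube>'
     . '<Rate currency="USD">1.2345</Rate><Rate currency="EUR">4.4961</Rate>'
     . '</Cube></Body></DataSet>';

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);

// No prefix: matches only Rate elements in *no* namespace, so the result is empty.
$unprefixed = $xpath->query('//Rate')->length;

// Register an alias for the namespace URI and use it in every step.
$xpath->registerNamespace('r', 'http://www.bnr.ro/xsd');
$prefixed = $xpath->query('//r:Cube/r:Rate')->length;

// evaluate() with number() returns a float directly.
$eur = $xpath->evaluate('number(//r:Cube/r:Rate[@currency="EUR"])');

var_dump($unprefixed, $prefixed, $eur);
```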
<?php
$xmlDoc = new DOMDocument();
$xmlDoc->load('http://www.bnr.ro/nbrfxrates.xml');
$value = null; // stays null if no EUR rate is found
foreach($xmlDoc->getElementsByTagName("Rate") as $node)
{
$currency = $node->getAttribute('currency');
if($currency == 'EUR')
{
$value = $node->nodeValue;
}
}
echo 'value for EUR is - '. $value;
?>

How can I retrieve infos from PHP DOMElement?

I'm working on a function that gets the whole content of the style.css file and returns only the CSS rules that are needed by the currently viewed page (it will be cached too, so the function only runs when the page has changed).
My problem is with parsing the DOM (I've never done it before with PHP DOM). I have the following function, but $element->tagname returns NULL. I also want to check the element's "class" attribute, but I'm stuck here.
function get_rules($html) {
$arr = array();
$dom = new DOMDocument();
$dom->loadHTML($html);
foreach($dom->getElementsByTagName('*') as $element ){
$arr[sizeof($arr)] = $element->tagname;
}
return array_unique($arr);
}
What can I do? How can I get the tag name and class of every DOM element in the HTML?
Because tagname is an undefined property: it's supposed to be tagName (camel case), and PHP property names are case-sensitive.
function get_rules($html) {
$arr = array();
$dom = new DOMDocument();
$dom->loadHTML($html);
foreach($dom->getElementsByTagName('*') as $element ){
$e = array();
$e['tagName'] = $element->tagName; // tagName not tagname
// get all elements attributes
foreach($element->attributes as $attr) {
$attrs = array();
$attrs['name'] = $attr->nodeName;
$attrs['value'] = $attr->nodeValue;
$e['attributes'][] = $attrs;
}
$arr[] = $e;
}
return $arr;
}
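The question also asks for the class attribute. A minimal sketch, assuming you want each element's classes as a list; get_tags_and_classes() is a hypothetical name and the HTML string is sample input. getAttribute() returns an empty string when the attribute is missing, so elements without a class simply get an empty list.

```php
<?php
// Sketch: collect each element's tag name and its class list.
// get_tags_and_classes() is a hypothetical helper name.
function get_tags_and_classes(string $html): array {
    $dom = new DOMDocument();
    // Keep libxml warnings about imperfect HTML out of the output.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $result = [];
    foreach ($dom->getElementsByTagName('*') as $element) {
        $result[] = [
            'tagName' => $element->tagName, // camelCase property
            // '' when there is no class attribute, which splits to []
            'classes' => preg_split('/\s+/', $element->getAttribute('class'), -1, PREG_SPLIT_NO_EMPTY),
        ];
    }
    return $result;
}

$info = get_tags_and_classes('<div class="box wide"><p class="intro">Hi</p><span>x</span></div>');
foreach ($info as $item) {
    echo $item['tagName'], ': ', implode(' ', $item['classes']), "\n";
}
```

Note that loadHTML() wraps a fragment in html and body elements, so those tags appear in the list as well.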

Extracting certain portions of HTML from within PHP

OK, so I'm writing an application in PHP to check whether all the links on my sites are valid, so I can update them if I have to.
And I ran into a problem. I've tried using SimpleXML and DOMDocument objects to extract the tags, but when I run the app against a sample site I usually get a ton of errors with SimpleXML.
So is there a way to scan the HTML document for href attributes that's pretty much as simple as using SimpleXML?
<?php
// what I want to do is get a similar effect to the code described below:
foreach($html->html->body->a as $link)
{
// store the $link into a file
foreach($link->attributes() as $attribute=>$value)
{
//procedure to place the href value into a file
}
}
?>
So basically I'm looking for a way to perform the above operation. The thing is, I'm confused about how I should treat the string that contains the HTML code...
Just to be clear, I'm using the following primitive way of getting the HTML file:
<?php
$target = "http://www.targeturl.com";
$file_handle = fopen($target, "r");
$a = "";
while (!feof($file_handle)) $a .= fgets($file_handle, 4096);
fclose($file_handle);
?>
Any info would be useful, as would alternatives in other languages where the above problem is solved more elegantly (Python, C or C++).
You can use DOMDocument::loadHTML
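A minimal sketch of that approach for the link checker: load the page with loadHTML(), walk the <a> elements and keep their href values. The inline HTML stands in for the page you would fetch with file_get_contents($target); libxml_use_internal_errors(true) keeps libxml's warnings about messy markup out of the output.

```php
<?php
// Sketch: list every href on a page with DOMDocument::loadHTML.
// The inline HTML is sample input standing in for the fetched page.
$html = '<html><body>'
      . '<a href="http://example.com/a">A</a>'
      . '<p>text <a href="/relative">B</a></p>'
      . '<a name="anchor-without-href">C</a>'
      . '</body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // real-world HTML is rarely well-formed
$dom->loadHTML($html);
libxml_clear_errors();

$hrefs = [];
foreach ($dom->getElementsByTagName('a') as $link) {
    // Skip anchors that have no href (e.g. named anchors).
    if ($link->hasAttribute('href')) {
        $hrefs[] = $link->getAttribute('href');
    }
}

// One URL per line, ready to be written to a file.
echo implode("\n", $hrefs), "\n";
```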
Here's a bunch of code we use for an HTML parsing tool we wrote.
$target = "http://www.targeturl.com";
$result = file_get_contents($target);
$dom = new DOMDocument;
$dom->preserveWhiteSpace = false;
@$dom->loadHTML($result); // @ silences warnings about malformed HTML
// getTags() returns an array of HTML snippets, so join them before the regex runs
$links = extractLink(implode('', getTags($dom, 'a')));
function extractLink( $html, $argument = 1 ) {
$href_regex_pattern = '/<a[^>]*?href=[\'"](.*?)[\'"][^>]*?>(.*?)<\/a>/si';
preg_match_all($href_regex_pattern,$html,$matches);
if (count($matches)) {
if (is_array($matches[$argument]) && count($matches[$argument])) {
return $matches[$argument][0];
}
return $matches[1];
} else {
return array();
}
}
function getTags( $dom, $tagName, $element = false, $children = false ) {
$html = array(); // snippets are collected in an array, not a string
$domxpath = new DOMXPath($dom);
$children = ($children) ? "/".$children : '';
$filtered = $domxpath->query("//$tagName" . $children);
$i = 0;
while( $myItem = $filtered->item($i++) ){
$newDom = new DOMDocument;
$newDom->formatOutput = true;
$node = $newDom->importNode( $myItem, true );
$newDom->appendChild($node);
$html[] = $newDom->saveHTML();
}
if ($element !== false && isset($html[$element])) {
return $html[$element];
} else
return $html;
}
You could just use strpos($html, 'href=') and then parse the URL. You could also search for <a or .php

ignoring nested elements when parsing xml with php

Probably a simple question for someone to answer:
xml:
<foobar>
<foo>i am a foo</foo>
<bar>i am a bar</bar>
<foo>i am a <bar>bar</bar></foo>
</foobar>
In the above, I want to display all <foo> elements. When the script gets to the line with the nested <bar>, the result is "i am a bar", which isn't the result I had hoped for.
Is it possible to print out the entire contents of that element as-is, so that I see "i am a <bar>bar</bar>"?
php:
$xml = file_get_contents('sample');
$dom = new DOMDocument;
@$dom->loadHTML($xml);
$resources= $dom->getElementsByTagName('foo');
foreach ($resources as $resource){
echo $resource->nodeValue . "\n";
}
After some trawling and trying to do what I needed with SimpleXML, I arrived at the following conclusion. My issue with SimpleXML was where the elements are: if the XML is structured and the hierarchy is standard, I have no problem.
If the XML is a web page, for example, and the <foo> element can be anywhere, SimpleXML doesn't have a convenient facility like getElementsByTagName to pull out the element wherever it may be.
<?php
$doc = new DOMDocument();
$doc->load('sample');
$element_name = 'foo';
if ($doc->getElementsByTagName($element_name)->length > 0) {
$resources = $doc->getElementsByTagName($element_name);
foreach ($resources as $resource) {
$id = null;
if (!$resource->hasAttribute('id')) {
$resource->setAttribute('id', gen_uuid()); // gen_uuid() is the author's own helper, not shown here
}
$innerHTML = null;
$children = $resource->childNodes;
foreach ($children as $child) {
$tmp_doc = new DOMDocument();
$tmp_doc->appendChild($tmp_doc->importNode($child,true));
$innerHTML .= rtrim($tmp_doc->saveHTML());
}
$resource->nodevalue = $innerHTML;
}
}
echo $doc->saveHTML();
?>
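The root cause in the question is that nodeValue returns only the concatenated text of the element and its descendants, which is why the nested tags vanish. A shorter sketch, assuming the input is XML (so loadXML() rather than loadHTML()), serializes each child node with DOMDocument::saveXML() to build the inner markup:

```php
<?php
// Sketch: print each <foo>'s inner markup instead of its text content.
// Sample input mirrors the question's XML (whitespace between elements dropped).
$xml = '<foobar>'
     . '<foo>i am a foo</foo>'
     . '<bar>i am a bar</bar>'
     . '<foo>i am a <bar>bar</bar></foo>'
     . '</foobar>';

$dom = new DOMDocument();
$dom->loadXML($xml);

$inner = [];
foreach ($dom->getElementsByTagName('foo') as $foo) {
    $markup = '';
    foreach ($foo->childNodes as $child) {
        // saveXML($node) serializes just that node (text or element)
        $markup .= $dom->saveXML($child);
    }
    $inner[] = $markup;
}

echo implode("\n", $inner), "\n";
```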
Rather than writing all that code, you might try XPath. That expression would be "//foo", which would get a list of all the elements in the document named "foo".
http://php.net/manual/en/simplexmlelement.xpath.php
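For the SimpleXML route, a minimal sketch: xpath('//foo') finds <foo> elements at any depth, and asXML() on each result returns its markup including the nested <bar>. Note that this is the outer markup, so the <foo> tags themselves are included.

```php
<?php
// Sketch: find <foo> anywhere with SimpleXMLElement::xpath() and keep its markup.
$sx = simplexml_load_string(
    '<foobar><foo>i am a foo</foo><bar>i am a bar</bar><foo>i am a <bar>bar</bar></foo></foobar>'
);

$out = [];
foreach ($sx->xpath('//foo') as $foo) {
    // asXML() on a non-root element returns just that element's markup
    $out[] = $foo->asXML();
}

echo implode("\n", $out), "\n";
```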

Retrieving single node value from a nodelist

I'm having difficulty extracting a single node value from a nodelist.
My code takes an XML file which holds several fields, some containing text, file paths and full image names with extensions.
I run an XPath query over it, looking for the item node with a certain id. The matched node is then stored in $oldnode.
Now my problem is trying to extract a value from that $oldnode. I have tried var_dump($oldnode) and print_r($oldnode), but both return the following: "object(DOMElement)#8 (0) { } "
I'm guessing the $oldnode variable is an object, but how do I access it?
I am able to echo out the whole node list by using: echo $oldnode->nodeValue;
This displays all the nodes in the list.
Here is the code that handles the XML file. Line 6 is the line in question:
$xpathexp = "//item[@id=". $updateID ."]";
$xpath = new DOMXpath($xml);
$nodelist = $xpath->query($xpathexp);
if($nodelist !== false && $nodelist->length > 0) {
$oldnode = $nodelist->item(0);
echo $oldnode->nodeValue;
//$imgUpload = strchr($oldnode->nodeValue, ' ');
//$imgUpload = strrchr($imgUpload, '/');
//explode('/',$imgUpload);
//$imgUpload = trim($imgUpload);
$newItem = new DomDocument;
$item_node = $newItem ->createElement('item');
//Create attribute on the node as well
$item_node ->setAttribute("id", $updateID);
$largeImageText = $newItem->createElement('largeImgText');
$largeImageText->appendChild( $newItem->createCDATASection($largeImgText));
$item_node->appendChild($largeImageText);
$urlANode = $newItem->createElement('urlA');
$urlANode->appendChild( $newItem->createCDATASection($urlA));
$item_node->appendChild($urlANode);
$largeImg = $newItem->createElement('largeImg');
$largeImg->appendChild( $newItem->createCDATASection($imgUpload));
$item_node->appendChild($largeImg);
$thumbnailTextNode = $newItem->createElement('thumbnailText');
$thumbnailTextNode->appendChild( $newItem->createCDATASection($thumbnailText));
$item_node->appendChild($thumbnailTextNode);
$urlB = $newItem->createElement('urlB');
$urlB->appendChild( $newItem->createCDATASection($urlA));
$item_node->appendChild($urlB);
$thumbnailImg = $newItem->createElement('thumbnailImg');
$thumbnailImg->appendChild( $newItem->createCDATASection(basename($_FILES['thumbnailImg']['name'])));
$item_node->appendChild($thumbnailImg);
$newItem->appendChild($item_node);
$newnode = $xml->importNode($newItem->documentElement, true);
// Replace
$oldnode->parentNode->replaceChild($newnode, $oldnode);
// Display
$xml->save($xmlFileData);
//header('Location: index.php?a=112&id=5');
Any help would be great.
Thanks
Isn't it supposed to be echo $oldnode->firstChild->nodeValue;? As I recall, you technically need the value from the text node, but I might be mistaken; it's been a while. Could you give it a try?
After our discussion in the comments on this answer, I came up with this solution. I'm not sure whether it can be done more cleanly, but it should work.
$nodelist = $xpath->query($xpathexp);
if($nodelist !== false && $nodelist->length > 0) {
$oldnode = $nodelist->item(0);
$largeImg = null;
$thumbnailImg = null;
foreach( $oldnode->childNodes as $node ) {
if( $node->nodeName == "largeImg" ) {
$largeImg = $node->nodeValue;
} else if( $node->nodeName == "thumbnailImg" ) {
$thumbnailImg = $node->nodeValue;
}
}
var_dump($largeImg);
var_dump($thumbnailImg);
}
You could also call getElementsByTagName on $oldnode, then check whether it found anything (and if a node was found, read $oldnode->getElementsByTagName("thumbnailImg")->item(0)->nodeValue). This might be cleaner than looping through the children.
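The getElementsByTagName() variant, as a minimal sketch. The document below is a hypothetical, trimmed version of the question's file (element names from the question, values made up), and childText() is a hypothetical helper that returns null instead of failing when the child is missing.

```php
<?php
// Sketch: read child values of the matched <item> via getElementsByTagName().
// Hypothetical sample data; element names follow the question.
$xml = new DOMDocument();
$xml->loadXML(
    '<items>'
  . '<item id="5"><largeImg><![CDATA[big.jpg]]></largeImg>'
  . '<thumbnailImg><![CDATA[small.jpg]]></thumbnailImg></item>'
  . '</items>'
);

$updateID = 5;
$xpath = new DOMXPath($xml);
$nodelist = $xpath->query('//item[@id="' . $updateID . '"]');

// Hypothetical helper: text of the first <$name> descendant, or null if absent.
function childText(DOMElement $parent, string $name): ?string {
    $found = $parent->getElementsByTagName($name);
    return $found->length > 0 ? $found->item(0)->nodeValue : null;
}

$largeImg = $thumbnailImg = null;
if ($nodelist !== false && $nodelist->length > 0) {
    $oldnode = $nodelist->item(0);
    // nodeValue includes the text of CDATA sections
    $largeImg = childText($oldnode, 'largeImg');
    $thumbnailImg = childText($oldnode, 'thumbnailImg');
}

var_dump($largeImg, $thumbnailImg);
```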
