php getElementsByTagName with specific attribute - php

I think this gets the first element called <gallery>
$gallery = $objDOM->getElementsByTagName('gallery')->item(0);
I'm trying to get <gallery name="Third">
I think I need something equivalent to:
$gallery = $objDOM->getElementsByTagName('gallery[@name="Third"]')->item;
Thanks, Andy

This is only possible with DOMXPath, e.g.
$xp = new DOMXPath($yourDOMDocument);
$nodes = $xp->query('//gallery[@name="Third"]');
or by iterating over the node list after the call to getElementsByTagName with
foreach ($objDOM->getElementsByTagName('gallery') as $gallery) {
if($gallery->getAttribute('name') === 'Third') {
// do something
}
}

As the name suggests getElementsByTagName() only accepts tag names. Try XPath instead
$xpath = new DOMXPath ($objDOM);
$nodeList = $xpath->query('//gallery[@name="Third"]');
$gallery = $nodeList->item(0);
I haven't tested it, so there may be errors or typos.
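For reference, here is a minimal, self-contained sketch of the XPath approach from the answers above. The sample XML (a <galleries> root with three <gallery> children) is an assumption made up for illustration; only the name="Third" attribute comes from the question.

```php
<?php
// Hypothetical sample document; the real structure may differ.
$xml = '<galleries>'
     . '<gallery name="First"/>'
     . '<gallery name="Second"/>'
     . '<gallery name="Third"><image>c.jpg</image></gallery>'
     . '</galleries>';

$objDOM = new DOMDocument();
$objDOM->loadXML($xml);

$xpath = new DOMXPath($objDOM);
// "//" searches the whole document; "@name" tests the attribute value.
$nodeList = $xpath->query('//gallery[@name="Third"]');

// item(0) returns null when nothing matched, so check before using it.
$gallery = $nodeList->item(0);
if ($gallery !== null) {
    echo $gallery->getAttribute('name'), "\n";
}
```

The null check matters because query() returns an empty list, not an error, when no element has the requested attribute value.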

Related

PHP XPath issue

Having a real bugger of an Xpath issue. I am trying to match the nodes with a certain value.
Here is an example XML fragment.
http://pastie.org/private/xrjb2ncya8rdm8rckrjqg
I am trying to match a given MatchNumber node value to see if there are two or more. Assuming that this is stored in a variable called $data, I am using the expression below. It's been a while since I've done much XPath, as most things seem to be JSON these days, so please excuse any rookie oversights.
$doc = new DOMDocument;
$doc->load($data);
$xpath = new DOMXPath($doc);
$result = $xpath->query("/CupRoundSpot/MatchNumber[.='1']");
I need to basically match any node that has a Match Number value of 1 and then determine if the result length is greater than 1 ( i.e. 2 or more have been found ).
Many thanks in advance for any help.
Your XML document has a default namespace: xmlns="http://www.fixtureslive.com/".
You have to register this namespace on the DOMXPath object and use the (registered) prefix in your query.
$xpath->registerNamespace ('fl' , 'http://www.fixtureslive.com/');
$result = $xpath->query("/fl:ArrayOfCupRoundSpot/fl:CupRoundSpot/fl:MatchNumber[.='1']");
foreach( $result as $e ) {
echo '.';
}
The following XPath:
/CupRoundSpot[MatchNumber = 1]
Returns all the CupRoundSpot nodes where MatchNumber equals 1. You could use these nodes further in your PHP code.
Executing:
count(/CupRoundSpot[MatchNumber = 1])
Returns you the total CupRoundSpot nodes found where MatchNumber equals 1.
You have to register the namespace. After that you can use the XPath count() function. An expression like that will only work with evaluate(), not with query(): query() can only return node lists, not scalar values.
$dom = new DOMDocument();
$dom->loadXml($xml);
$xpath = new DOMXpath($dom);
$xpath->registerNamespace('fl', 'http://www.fixtureslive.com/');
var_dump(
$xpath->evaluate(
'count(/fl:ArrayOfCupRoundSpot/fl:CupRoundSpot[number(fl:MatchNumber) = 1])'
)
);
Output:
float(2)
DEMO: https://eval.in/130366
To iterate the CupRoundSpot nodes, just use foreach:
$nodes = $xpath->evaluate(
'/fl:ArrayOfCupRoundSpot/fl:CupRoundSpot[number(fl:MatchNumber) = 1]'
);
foreach ($nodes as $node) {
//...
}
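To see why the unprefixed query from the question finds nothing, here is a small sketch that runs both variants against the same data. The sample XML is hypothetical, modelled on the element names used in the answers above; the pastie content itself is not reproduced here.

```php
<?php
// Hypothetical sample data modelled on the element names in the answers above.
$xml = <<<XML
<ArrayOfCupRoundSpot xmlns="http://www.fixtureslive.com/">
  <CupRoundSpot><MatchNumber>1</MatchNumber></CupRoundSpot>
  <CupRoundSpot><MatchNumber>1</MatchNumber></CupRoundSpot>
  <CupRoundSpot><MatchNumber>2</MatchNumber></CupRoundSpot>
</ArrayOfCupRoundSpot>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// Unprefixed names only match elements in no namespace, so this finds nothing.
$unprefixed = $xpath->query("/ArrayOfCupRoundSpot/CupRoundSpot/MatchNumber[.='1']");

// With the namespace registered, the prefixed query matches the elements.
$xpath->registerNamespace('fl', 'http://www.fixtureslive.com/');
$prefixed = $xpath->query("/fl:ArrayOfCupRoundSpot/fl:CupRoundSpot/fl:MatchNumber[.='1']");

// "Two or more found" is then a simple length check.
$hasDuplicates = $prefixed->length > 1;
echo $unprefixed->length, ' ', $prefixed->length, ' ', var_export($hasDuplicates, true), "\n";
```

The prefix fl is arbitrary; only the namespace URI has to match the one declared in the document.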

PHP DOMDocument, retrieve just content of a div, without div tag

I'm using DOMDocument to retrieve a particular div from an HTML page.
I just want to retrieve the content of this div, without the div tag.
For example :
$dom = new DOMDocument;
$dom->loadHTML($webtext['content']);
$main = $dom->getElementById('inter');
$dom->saveHTML();
Here, I get this result:
<div id="inter">
//SOME THINGS IN MY DIV
</div>
And I just want to have:
//SOME THINGS IN MY DIV
Ideas? Thanks!
I'm going to go with simple does it. You already have:
$dom = new DOMDocument;
$dom->loadHTML($webtext['content']);
$main = $dom->getElementById('inter');
$dom->saveHTML();
Now, DOMDocument::getElementById() returns one DOMElement, which extends DOMNode, which has the public string property $nodeValue. Since you don't specify whether you are expecting anything but text within that div, I'm going to assume that you want anything that may be stored in there as plain text. For that, we are going to remove $dom->saveHTML(); and replace it with:
$divString = $main->nodeValue;
With that, $divString will contain //SOME THINGS IN MY DIV, which, from your example, is the desired output.
If, however, you want the HTML of the inside of it and not just a String representation - replace it with the following instead:
$divString = "";
foreach($main->childNodes as $c)
$divString .= $c->ownerDocument->saveXML($c);
This takes advantage of the inherited DOMNode::childNodes property, a DOMNodeList whose items are each a DOMNode (for reference, see above). We loop through each child, get its ownerDocument (a DOMDocument), and call DOMDocument::saveXML() on it. We pass the current $c node to the function so that only that node is serialized rather than an entire valid document, and we do it one child at a time, with no children left behind. (sorry, it's late, couldn't resist.)
Now, after either option, you can do with $divString what you will. I hope this has helped explain the process to you and hopefully you walk away with a better understanding of what is going on instead of rote copying of code just because it works. ^^
You can use my custom function to remove the extra div from the content:
$html_string = '<div id="inter">
SOME THINGS IN MY DIV
</div>';
// custom function
function DOMgetinnerHTML($element)
{
$innerHTML = "";
$children = $element->childNodes;
foreach ($children as $child)
{
$tmp_dom = new DOMDocument();
$tmp_dom->appendChild($tmp_dom->importNode($child, true));
$innerHTML.=trim($tmp_dom->saveHTML());
}
return $innerHTML;
}
Your code will look like this:
$dom = new DOMDocument;
$dom->loadHTML($html_string);
$divs = $dom->getElementsByTagName('div');
$innerHTML_contents = DOMgetinnerHTML($divs->item(0));
echo $innerHTML_contents;
And your output will be:
SOME THINGS IN MY DIV
You can use XPath:
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//div[@id="inter"]/*') as $node)
{
echo $node->nodeValue;
}
Or you can simply edit your code, like this:
$main = $dom->getElementById('inter');
echo $main->nodeValue;
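The two outcomes discussed above (plain text versus inner HTML) can be compared side by side in a short sketch. The input markup is made up for illustration, and this version passes each child to DOMDocument::saveHTML() rather than the saveXML() call used in the earlier answer, so the children are serialized as HTML.

```php
<?php
// Hypothetical input; the id "inter" is taken from the question.
$html = '<html><body><div id="inter"><p>Hello <b>world</b></p></div></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$main = $dom->getElementById('inter');

// Plain text only: markup inside the div is dropped.
$text = $main->textContent;

// Inner HTML: serialize each child node, but not the div itself.
$inner = '';
foreach ($main->childNodes as $child) {
    $inner .= $dom->saveHTML($child);
}

echo $text, "\n", $inner, "\n";
```

Which variant fits depends on whether the div is expected to contain nested tags that need to survive.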

How to delete element with DOMDocument?

Is it possible to delete element from loaded DOM without creating a new one? For example something like this:
$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadHTML($html);
foreach($dom->getElementsByTagName('a') as $href)
if($href->nodeValue == 'First')
//delete
You remove the node by telling the parent node to remove the child:
$href->parentNode->removeChild($href);
See DOMNode::$parentNode and DOMNode::removeChild() in the PHP documentation.
See as well:
How to remove attributes using PHP DOMDocument?
How to remove an HTML element using the DOMDocument class
This took me a while to figure out, so here's some clarification:
If you're deleting elements from within a loop (as in the OP's example), you need to loop backwards:
$elements = $completePage->getElementsByTagName('a');
for ($i = $elements->length; --$i >= 0; ) {
$href = $elements->item($i);
$href->parentNode->removeChild($href);
}
DOMNodeList documentation: You can modify, and even delete, nodes from a DOMNodeList if you iterate backwards
Easily:
$href->parentNode->removeChild($href);
I know this has already been answered but I wanted to add to it.
In case someone faces the same problem I have faced.
Looping through the DOMNodeList and removing items directly can cause issues.
I read this page and, based on it, wrote a method in my own code base that works: https://www.php.net/manual/en/domnode.removechild.php
Here is what I would do:
$links = $dom->getElementsByTagName('a');
$links_to_remove = [];
foreach($links as $link){
$links_to_remove[] = $link;
}
foreach($links_to_remove as $link){
$link->parentNode->removeChild($link);
}
$dom->saveHTML();
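The same "copy first, then remove" idea can be written more compactly with iterator_to_array(), combined with the condition from the question. This is only a sketch; the markup is hypothetical.

```php
<?php
// Hypothetical markup with two links whose text is "First".
$html = '<html><body><p><a href="#1">First</a> <a href="#2">Second</a> <a href="#3">First</a></p></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// Copy the live node list into a plain array before modifying the tree.
$links = iterator_to_array($dom->getElementsByTagName('a'));

foreach ($links as $link) {
    if ($link->nodeValue === 'First') {
        $link->parentNode->removeChild($link);
    }
}

// Only the "Second" link should be left in the document.
$remaining = $dom->getElementsByTagName('a');
echo $remaining->length, "\n";
```

Because the array is a snapshot, removing nodes does not shift the positions being iterated, which is the problem that the backwards loop works around.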
To remove a tag or something similar (note that the Dom class used here is not PHP's built-in DOMDocument):
removeChild($element->id());
Full example:
$dom = new Dom;
$dom->loadFromUrl('URL');
$html = $dom->find('main')[0];
$html2 = $html->find('p')[0];
$span = $html2->find('span')[0];
$html2->removeChild($span->id());
echo $html2;

php xpath parse script src

I am trying to parse all script src link values, but I get an empty array.
$dom = new DOMDocument();
$file = @$dom->loadHTML($remote);
$xpath = new DOMXpath($dom);
$link = $xpath->query('//script[contains(@src, "pcode")]');
$return = array();
foreach($link as $links) {
$return[] = $links->nodeValue;
}
Your XPath query looks valid; it should grab every <script> whose src attribute contains pcode.
If it's returning an empty array, there's a few things to check:
Make sure the DOM document is loading, and that there are no errors when loading it into XPath. It is possible that the suppressed loadHTML call is giving an error or warning. If you query elsewhere and it works, then ignore this.
Make sure the tags in your document are case-matching.
Try
$link = $xpath->query("//script[contains(@src, 'pcode')]");
Seems silly, just switching quote marks, but you never know.
Be sure to check namespaces. If your HTML contains a declaration like this
<html xmlns="http://www.w3.org/1999/xhtml">
You'll need to register the namespace with the document
$xp = new domxpath( $xml);
$xp->registerNamespace('html', 'http://www.w3.org/1999/xhtml' );
And look for elements like this:
$elements = $xp->query( "//html:script", $xml );
Namespaces, because paranoia breeds confidence.
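One more thing that may be worth checking, beyond the points above: nodeValue returns the text inside an element, which is empty for an external <script>, whereas the URL itself is stored in the src attribute. The sketch below reads the attribute instead and collects parser errors with libxml_use_internal_errors() rather than silencing them with @. The page content and URLs are placeholders.

```php
<?php
// Hypothetical page; the URLs are placeholders.
$remote = '<html><head>'
        . '<script src="https://example.com/js/pcode.js"></script>'
        . '<script src="https://example.com/js/other.js"></script>'
        . '<script>var inline = 1;</script>'
        . '</head><body></body></html>';

// Collect parser warnings instead of hiding them with @.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($remote);
$errors = libxml_get_errors();
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$return = [];
foreach ($xpath->query('//script[contains(@src, "pcode")]') as $script) {
    // The URL lives in the src attribute; nodeValue would be the (empty) script body.
    $return[] = $script->getAttribute('src');
}

print_r($return);
```

Inspecting $errors after loading shows whether the parser rejected anything in the real page.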

Finding number of nodes in PHP, DOM, XPath

I am loading HTML into DOM and then querying it using XPath in PHP. My current problem is how do I find out how many matches have been made, and once that is ascertained, how do I access them?
I currently have this dirty solution:
$i = 0;
foreach($nodes as $node) {
echo $dom->savexml($nodes->item($i));
$i++;
}
Is there a cleaner solution to find the number of nodes? I have tried count(), but that does not work.
You haven't posted any code related to $nodes so I assume you are using DOMXPath and query(), or at the very least, you have a DOMNodeList.
DOMXPath::query() returns a DOMNodeList, which has a length member. You can access it via (given your code):
$nodes->length
If you just want to know the count, you can also use DOMXPath::evaluate.
Example from PHP Manual:
$doc = new DOMDocument;
$doc->load('book.xml');
$xpath = new DOMXPath($doc);
$tbody = $doc->getElementsByTagName('tbody')->item(0);
// our query is relative to the tbody node
$query = 'count(row/entry[. = "en"])';
$entries = $xpath->evaluate($query, $tbody);
echo "There are $entries english books\n";
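Putting the two options together, here is a short sketch that gets the count both ways and then walks the matches without a manual counter. The HTML and the //li query are assumptions chosen for illustration.

```php
<?php
// Hypothetical HTML; any document and query would do.
$html = '<html><body><ul><li>one</li><li>two</li><li>three</li></ul></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$nodes = $xpath->query('//li');

// The number of matches is available directly on the node list.
$viaLength = $nodes->length;

// evaluate() with count() returns the number as a float.
$viaCount = $xpath->evaluate('count(//li)');

// foreach yields each node, so no manual counter is needed.
$items = [];
foreach ($nodes as $node) {
    $items[] = $dom->saveXML($node);
}

echo $viaLength, ' ', $viaCount, "\n", implode("\n", $items), "\n";
```

The length property is usually the simpler choice when the nodes themselves are needed afterwards; evaluate() avoids building a node list when only the number matters.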
