Traversing child nodes with PHP DOMXpath?

I'm having some trouble understanding what exactly is stored in childNodes. Ideally I'd like to run another XPath query on each of the child nodes, but I can't seem to get it right. Here's my scenario:
Data:
<div class="something">
    <h3>
        <a href="...">Link text 1</a>
    </h3>
    <div class="somethingelse">Something else text 1</div>
</div>
<div class="something">
    <h3>
        <a href="...">Link text 2</a>
    </h3>
    <div class="somethingelse">Something else text 2</div>
</div>
<div class="something">
    <h3>
        <a href="...">Link text 3</a>
    </h3>
    <div class="somethingelse">Something else text 3</div>
</div>
And the code:
$html = new DOMDocument();
$html->loadHTMLFile($local_file);
$xpath = new DOMXPath($html);
$nodelist = $xpath->query("//div[@class='something']");
foreach ($nodelist as $n) {
    // Can I run another query here?
}
For each "something" element (i.e., $n) I want to access the values of the two pieces of text and the href. I tried using childNodes and another XPath query but couldn't get anything to work. Any help would be greatly appreciated!

Yes, you can run another XPath query, something like this:
foreach ($nodelist as $n)
{
    $other_nodes = $xpath->query('div[@class="somethingelse"]', $n);
    echo $other_nodes->length;
}
This gets you the inner div with the class somethingelse. The second argument of the $xpath->query method tells the query to use that node as its context; for more, see http://fr2.php.net/manual/en/domxpath.query.php
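As a minimal, self-contained sketch of the context-node idea: the markup below is a hypothetical stand-in for the question's data (the href values are made up), and $counts is just an illustrative variable name.

```php
<?php
// Hypothetical sample markup modelled on the question's data.
$sample = '<div class="something"><h3><a href="#one">Link text 1</a></h3>'
        . '<div class="somethingelse">Something else text 1</div></div>'
        . '<div class="something"><h3><a href="#two">Link text 2</a></h3>'
        . '<div class="somethingelse">Something else text 2</div></div>';

$doc = new DOMDocument();
$doc->loadHTML($sample);
$xpath = new DOMXPath($doc);

$counts = [];
foreach ($xpath->query('//div[@class="something"]') as $n) {
    // No leading slash: the path is resolved relative to the context node $n,
    // so only the direct child div of this particular block is matched.
    $inner = $xpath->query('div[@class="somethingelse"]', $n);
    $counts[] = $inner->length;
}

print_r($counts);
```

Each outer block should report exactly one inner match, because the second query never looks outside its context node.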

If I understand your question correctly, it worked for me when I used the descendant:: axis. Try this:
foreach ($nodelist as $n) {
    $other_nodes = $xpath->query('descendant::div[@class="some-descendant"]', $n);
    echo $other_nodes->length;
    echo $other_nodes->item(0)->nodeValue;
}
Sometimes, though, it's enough to combine queries using the // path expression to narrow your search. Note that a path starting with // searches the whole document regardless of the context node; inside a path, // matches descendants at any depth.
$nodes = $xpath->query('//div[@class="some-descendant"]//div[@class="some-descendant-of-that-descendant"]');
Then loop through those for the stuff you need. Hope this helps.
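The difference between a leading //, a leading .// and the descendant:: axis is easy to check with a small experiment. This is only a sketch; the class names outer and inner are hypothetical.

```php
<?php
// Hypothetical markup: two outer blocks holding three inner divs in total.
$sample = '<div class="outer"><div class="inner">A</div></div>'
        . '<div class="outer"><div class="inner">B</div><div class="inner">C</div></div>';

$doc = new DOMDocument();
$doc->loadHTML($sample);
$xpath = new DOMXPath($doc);

// Use the first outer block as the context node.
$first = $xpath->query('//div[@class="outer"]')->item(0);

// Leading "//": starts at the document root and ignores the context node.
$absolute = $xpath->query('//div[@class="inner"]', $first)->length;

// Leading ".//": descendants of the context node only.
$relative = $xpath->query('.//div[@class="inner"]', $first)->length;

// "descendant::" is the unabbreviated way to stay inside the context node.
$axis = $xpath->query('descendant::div[@class="inner"]', $first)->length;

echo "$absolute $relative $axis\n";
```

With this markup the absolute query sees all three inner divs, while the two relative forms see only the one inside the first block.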

Trexx had it but he missed the last sentence of the question:
foreach ($nodelist as $n) {
    $href = $xpath->query('h3/a', $n)->item(0)->getAttribute('href');
    $a_text = $xpath->query('h3/a', $n)->item(0)->nodeValue;
    $div_text = $xpath->query('div', $n)->item(0)->nodeValue;
}
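If some blocks might lack the link or the inner div, item(0) returns null and the chained call fails. A slightly more defensive sketch of the same idea follows; the function name extractItems, the array keys and the sample href are all hypothetical.

```php
<?php
// Collect href, link text and extra text for every div.something.
// extractItems() is an illustrative helper, not part of the DOM extension.
function extractItems(string $html): array
{
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    $items = [];
    foreach ($xpath->query('//div[@class="something"]') as $n) {
        // Relative paths, evaluated against the current block only.
        $a   = $xpath->query('h3/a', $n)->item(0);
        $div = $xpath->query('div', $n)->item(0);

        // item(0) is null when nothing matched, so guard before dereferencing.
        $items[] = [
            'href'  => $a ? $a->getAttribute('href') : null,
            'text'  => $a ? trim($a->nodeValue) : null,
            'extra' => $div ? trim($div->nodeValue) : null,
        ];
    }
    return $items;
}

// Hypothetical input: the second block has no link and no inner div.
$items = extractItems(
    '<div class="something"><h3><a href="page1.html">Link text 1</a></h3>'
    . '<div class="somethingelse">Something else text 1</div></div>'
    . '<div class="something"><h3>No link</h3></div>'
);
print_r($items);
```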

Here is a code snippet that lets you access the information contained in each of the nodes with the class attribute "something":
$nodes_tracker = 0;
$nodes_array = array();
foreach ($nodelist as $n) {
    // Paths starting with // search the whole document, not just $n,
    // so $nodes_tracker picks the match that belongs to the current node.
    $info = $xpath->query('//h3//a', $n)->item($nodes_tracker)->nodeValue;
    $extra_info = $xpath->query('//div[@class="somethingelse"]', $n)->item($nodes_tracker)->nodeValue;
    array_push($nodes_array, $info . ' - ' . $extra_info . '<br>'); // Add each info string to the array
    $nodes_tracker++;
}
print_r($nodes_array);

Related

Get content and class attribute value from child nodes of the DOM

First of all, sorry for my bad English!
This is my simple cURL result:
<li class="result">
<div class="song_info">
<span class="artist_name">art1</span>
<span class="song_name">name1</span>
<span class="views">100 time</span>
</div>
</li>
//again
<li class="result">
<div class="song_info">
<span class="artist_name">art2</span>
<span class="song_name">name2</span>
<span class="views">200 time</span>
</div>
</li>
and many more like that...
I used this code to extract values from the HTML:
$classname = 'song_info';
$dom = new DOMDocument;
$dom->loadHTML($html); // my HTML result
$xpath = new DOMXPath($dom);
$get = $xpath->query("//*[@class='" . $classname . "']");
$text = $get->item(0)->nodeValue;
echo $text;
This code gives me just the first result:
art1
name1
100 time
I want to get all the results (preferably as JSON).
Can anyone help me?

The DOMXPath::query method returns a DOMNodeList, which implements the Traversable interface, so you can loop through it with foreach. Rename the $get variable to $nodes so that the name says what it holds. Then:
foreach ($nodes as $curNode) {
    $childNodes = $curNode->childNodes;
    foreach ($childNodes as $curChildNode) {
        // childNodes can also contain whitespace text nodes; skip anything that is not an element
        if (!$curChildNode instanceof DOMElement) {
            continue;
        }
        // use $curChildNode->textContent to get content
        // and $curChildNode->getAttribute('class') to get class name
    }
}
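Since the question asks for JSON, here is one way the loop could be turned into a JSON string. It is a sketch under assumptions: the function name songsToJson is hypothetical, the sample is wrapped in a ul for well-formedness, and each span's class is used as the key.

```php
<?php
// Build a JSON array with one object per .song_info block.
// songsToJson() is an illustrative helper name.
function songsToJson(string $html): string
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    $songs = [];
    foreach ($xpath->query("//*[@class='song_info']") as $info) {
        $song = [];
        // Only direct child spans that carry a class attribute.
        foreach ($xpath->query('span[@class]', $info) as $span) {
            $song[$span->getAttribute('class')] = trim($span->textContent);
        }
        $songs[] = $song;
    }
    return json_encode($songs);
}

// Sample input modelled on the question's cURL result.
$html = '<ul>'
      . '<li class="result"><div class="song_info">'
      . '<span class="artist_name">art1</span>'
      . '<span class="song_name">name1</span>'
      . '<span class="views">100 time</span></div></li>'
      . '<li class="result"><div class="song_info">'
      . '<span class="artist_name">art2</span>'
      . '<span class="song_name">name2</span>'
      . '<span class="views">200 time</span></div></li>'
      . '</ul>';

echo songsToJson($html), "\n";
```

Querying for span elements directly sidesteps the whitespace text nodes that childNodes would include.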

I found my answer:
$text = $get->item(0)->nodeValue; // gives the first result
$text = $get->item(1)->nodeValue; // gives the second result
I wrote a loop and now receive all the results.

Zend_Dom_Query how to get html code of current node

I have nodes and iterate over them in a loop.
$html = <<<HTML
<div id="test">
<span>1</span>
<span>2</span>
</div>
HTML;
$dom= new Zend_Dom_Query($html);
$results = $dom->query('span');
foreach($results as $node){
...
}
How do I get the HTML code of a node? (Not the innerHTML, but the full HTML code, e.g. <span>1</span>.)

$htmlNode = iconv('UTF-8', 'ISO-8859-1', $results->getDocument()->saveXML($node));
The iconv call is there because I have Russian characters.
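For comparison, the same "outer HTML" idea can be sketched with plain DOMDocument, which is an assumption here since the question uses Zend_Dom_Query. DOMDocument::saveHTML() accepts an optional node argument and serialises just that node, tags included.

```php
<?php
// Plain DOMDocument stand-in for the Zend_Dom_Query example.
$doc = new DOMDocument();
$doc->loadHTML('<div id="test"><span>1</span><span>2</span></div>');

$outer = [];
foreach ($doc->getElementsByTagName('span') as $span) {
    // Passing a node serialises that node itself (its "outer HTML"),
    // rather than the whole document.
    $outer[] = trim($doc->saveHTML($span));
}

print_r($outer);
```

With Zend_Dom_Query, the same call should be available on the document returned by $results->getDocument(), in the same way the answer above calls saveXML($node) on it.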

I was recently working with Zend_Dom_Query and had a very hard time figuring this out. I finally found a solution, so this answer is for those still struggling out there.
$dom = new Zend_Dom_Query($html);
$results = $dom->query('div#test');
foreach ($results as $node) {
    if ($node->hasChildNodes()) {
        $childNodes = $node->childNodes;
        $countOfNodes = $childNodes->length;
        $firstSpan = $childNodes->item(0)->C14N();
    }
}
$firstSpan will contain <span>1</span>, provided the span is the very first child; if whitespace precedes it, as in the question's HTML, item(0) may be a text node instead. You can also loop through the nodes using $countOfNodes to get the second span or the nth element.
Please check PHP:DOMElement - Manual and PHP:DOMNodeList for more info.

Strip an entire block of html based on class or id with php

I have the following PHP function, which is supposed to remove a block of HTML based on a given class name or id. I got this function from http://www.katcode.com/php-html-parsing-extracting-and-removing-html-tag-of-specific-class-from-string/
The function works as it should, but it has problems when there are nested tags. In the example below I'm trying to remove the entire div block that has the class 'two'.
It doesn't remove the div block properly because it has trouble working out where the block begins and ends. How can I rework this function to remove an entire tag regardless of how many nested elements it contains? I'm open to other PHP suggestions. I can easily do this with jQuery, but I'm looking for a server-side PHP solution.
html looks like this
<div class="test">
<div>testing1</div>
<div class="two">
<div>testing3</div>
<div>testing3</div>
</div>
<div>testing3</div>
<div>testing4</div>
</div>
php
<?php
$x = '<div class="test"><div>testing1</div><div class="two"><div>testing3</div><div>testing3</div></div><div>testing3</div><div>testing4</div></div>';
function removeTag($str,$id,$start_tag,$end_tag){
while(($pos_srch = strpos($str,$id))!==false){
$beg = substr($str,0,$pos_srch);
$pos_start_tag = strrpos($beg,$start_tag);
$beg = substr($beg,0,$pos_start_tag);
$end = substr($str,$pos_srch);
$end_tag_len = strlen($end_tag);
$pos_end_tag = strpos($end,$end_tag);
$end = substr($end,$pos_end_tag+$end_tag_len);
$str = $beg.$end;
}
return $str;
}
echo removeTag($x,'two','<div','/div>');
?>

Not tested, but try something like:
$doc = new DOMDocument();
$doc->loadHTML($x);
$xpath = new DOMXPath($doc);
$query = "//div[contains(@class, 'two')]";
$oldnodes = $xpath->query($query);
foreach ($oldnodes as $node) {
    // Move the children into a fragment, then swap the fragment in for the div.
    // This drops the div wrapper but keeps its contents.
    $fragment = $doc->createDocumentFragment();
    while ($node->childNodes->length > 0) {
        $fragment->appendChild($node->childNodes->item(0));
    }
    $node->parentNode->replaceChild($fragment, $node);
}
echo $doc->saveHTML();
Hope it helps.

HTML should probably never be parsed with PHP string functions that way.
Use PHP's DOMDocument class to load the HTML as an object. You can then use XPath to search the document for the block you are looking for, loop through the XPath results and remove them, and then save the document back to text.
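The steps just described could look roughly like this. It is a sketch, not a drop-in replacement for removeTag(): the function name removeByClass is hypothetical, and it assumes the class name is a simple token without quotes, since it is interpolated into the XPath expression.

```php
<?php
// Remove every element carrying the given class, including all nested content.
// removeByClass() is an illustrative helper name.
function removeByClass(string $html, string $class): string
{
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    // Match the class as a whole word inside the class attribute.
    $query = sprintf(
        '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );

    // Take a snapshot of the matches before modifying the tree.
    foreach (iterator_to_array($xpath->query($query)) as $node) {
        if ($node->parentNode !== null) {
            $node->parentNode->removeChild($node);
        }
    }

    // Serialise only the children of <body>, not the html/body wrappers
    // that loadHTML adds around a fragment.
    $body = $doc->getElementsByTagName('body')->item(0);
    $out = '';
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $out .= $doc->saveHTML($child);
        }
    }
    return $out;
}

$x = '<div class="test"><div>testing1</div><div class="two"><div>testing3</div>'
   . '<div>testing3</div></div><div>testing3</div><div>testing4</div></div>';
echo removeByClass($x, 'two');
```

Because the parser knows where each element ends, nesting depth does not matter. The serialiser may insert line breaks between block elements, so the output is not guaranteed to be byte-identical to the input minus the removed block.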

What xPath should I use to display the requested data?

I am using the following script to get the POST TITLE and the CONTENT of an RSS feed. Its structure is as follows (I hope I didn't make any mistakes):
<div id="feedBody">
<div id="feedContent">
<div class="entry">
<h3>TITLE OF POST</h3>
<div base="http://feeds.feedburner.com/blogspot/hyMBI"
class="feedEntryContent"
> CONTENT OF POST </div>
</div>
</div>
</div>
<?php
$dom = new DOMDocument;
libxml_use_internal_errors(TRUE);
$dom->loadHTMLFile('http://feeds.feedburner.com/blogspot/hyMBI');
libxml_clear_errors();
$xPath = new DOMXPath($dom);
$links = $xPath->query('????????????????');
foreach($links as $link) {
printf("%s \n", $link->nodeValue);
}
?>
What XPath should I use to get the data? Is there any way of getting them separately?
Thanks a million; hopefully this is my last question on my project...

First, you should load the XML using load, not loadHTMLFile.
Judging by your variable name "$links", I guess you're wanting the values of the <link> elements inside the <item> elements. So construct an xpath query that says just that: //item/link.

Basic XPath: //div[@class="entry"] gets you a node list of all entries. You can get the first (or only) entry with //div[@class="entry"][1]. With that, you can use h3 to get the text of the title node, and div[1] to get the contents (if it's guaranteed that there's only one; otherwise specify the class).
You can put them together like //div[@class="entry"][1]/h3 if you like, so that you only have to query the root node. Otherwise, save the new node for the next query, like:
$entries = $xPath->query('//div[@class="entry"][1]');
foreach ($entries as $entry) {
    // For node-set expressions, evaluate() returns a DOMNodeList
    $title = $xPath->evaluate('h3[1]', $entry);
    $post = $xPath->evaluate('div[1]', $entry);
}
If your RSS returns a whole group of posts, you can leave off the first [1] and loop through the whole group this way.
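To get the title and content as separate plain strings, one option is to wrap each relative path in the XPath string() function, which makes evaluate() return a PHP string instead of a node list. The markup below is a hypothetical two-entry version of the structure in the question.

```php
<?php
// Hypothetical feed page with two entries, following the question's structure.
$html = '<div id="feedContent">'
      . '<div class="entry"><h3>First title</h3>'
      . '<div class="feedEntryContent">First body</div></div>'
      . '<div class="entry"><h3>Second title</h3>'
      . '<div class="feedEntryContent">Second body</div></div>'
      . '</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xPath = new DOMXPath($dom);

$posts = [];
foreach ($xPath->query('//div[@class="entry"]') as $entry) {
    $posts[] = [
        // string(...) yields the text of the first matching node, or '' if none.
        'title'   => trim($xPath->evaluate('string(h3[1])', $entry)),
        'content' => trim($xPath->evaluate('string(div[@class="feedEntryContent"][1])', $entry)),
    ];
}

print_r($posts);
```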

Grabbing links using xpath in php

I am trying to grab links from the Google search page. I am using the XPath below to grab the links:
//div[@id='ires']/ol[@id='rso']/li/h3/a/@href
XPather evaluates it and gives the result, but when I use it in my PHP code it doesn't show any result. Can someone please tell me what I am doing wrong? There is nothing wrong with the cURL request.
Below is my code:
$dom = new DOMDocument();
@$dom->loadHTML($result);
$xpath = new DOMXPath($dom);
$elements = $xpath->evaluate("//div[@id='ires']/ol[@id='rso']/li/h3/a");
foreach ($elements as $element)
{
    $link = $element->getElementsByTagName("href")->item(0)->nodeValue;
    echo $link."<br>";
}
Sample Html provided by Robert Pitt
<li class="g w0">
<h3 class="r">
<em>LINK</em>
</h3>
<button class="ws" title=""></button>
<div class="s">
META
</div>
</li>

You can make life simpler by using the original XPath expression that you quoted:
//div[@id='ires']/ol[@id='rso']/li/h3/a/@href
Then, loop over the matching attributes like:
$hrefs = $xpath->evaluate(...);
foreach ($hrefs as $href) {
    echo $href->value . "<br>";
}
Be sure to check whether any attributes were matched (var_dump($hrefs->length) would suffice).
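A self-contained sketch of both routes, selecting the attribute nodes directly or selecting the a elements and reading the attribute, might look like this. The markup and example.com URLs are hypothetical; the real Google result page will differ.

```php
<?php
// Hypothetical markup that matches the XPath from the question.
$html = '<div id="ires"><ol id="rso">'
      . '<li><h3><a href="http://example.com/a">A</a></h3></li>'
      . '<li><h3><a href="http://example.com/b">B</a></h3></li>'
      . '</ol></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Route 1: select the attribute nodes; each result is a DOMAttr.
$links = [];
foreach ($xpath->query("//div[@id='ires']/ol[@id='rso']/li/h3/a/@href") as $href) {
    $links[] = $href->value;
}

// Route 2: select the elements and read the attribute from each one.
$viaElements = [];
foreach ($xpath->query("//div[@id='ires']/ol[@id='rso']/li/h3/a") as $a) {
    $viaElements[] = $a->getAttribute('href');
}

print_r($links);
```

Both routes should yield the same list; which one reads better is mostly a matter of taste.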

There's no element called href; that's an attribute:
$link = $element->getElementsByTagName("href")->item(0)->nodeValue;
You can just use
$link = $element->getAttribute('href');

Did you try
$element->getElementsByTagName("a")
instead of
$element->getElementsByTagName("href")
