What xPath should I use to display the requested data? - php

I am using the following script to get the post title and the content of an RSS feed. Its structure is as follows (I don't think I made any mistakes):
<div id="feedBody">
<div id="feedContent">
<div class="entry">
<h3>TITLE OF POST</h3>
<div base="http://feeds.feedburner.com/blogspot/hyMBI"
class="feedEntryContent"
> CONTENT OF POST </div>
</div>
</div>
</div>
<?php
$dom = new DOMDocument;
libxml_use_internal_errors(TRUE);
$dom->loadHTMLFile('http://feeds.feedburner.com/blogspot/hyMBI');
libxml_clear_errors();
$xPath = new DOMXPath($dom);
$links = $xPath->query('????????????????');
foreach($links as $link) {
printf("%s \n", $link->nodeValue);
}
?>
What XPath should I use to get the data? Is there any way of getting the title and the content separately?
Thanks a million, hopefully this is my last question on my project...

First, you should load the XML using load, not loadHTMLFile.
Judging by your variable name "$links", I guess you're wanting the values of the <link> elements inside the <item> elements. So construct an xpath query that says just that: //item/link.
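As a rough sketch of that advice, the snippet below parses a small inline feed as XML and reads the title, link and content of each item separately. The sample feed is made up for illustration, and it assumes plain RSS 2.0 element names (item, title, link, description); an Atom feed uses namespaced elements and would need DOMXPath::registerNamespace before querying.

```php
<?php
// Hypothetical sample feed standing in for the real RSS document.
$rss = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
      <description>Content of the first post</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/second</link>
      <description>Content of the second post</description>
    </item>
  </channel>
</rss>
XML;

$dom = new DOMDocument;
$dom->loadXML($rss);   // for a remote feed, $dom->load($url) would be used instead
$xPath = new DOMXPath($dom);

$posts = [];
foreach ($xPath->query('//item') as $item) {
    // string(...) is evaluated relative to $item and returns plain text
    $posts[] = [
        'title'   => $xPath->evaluate('string(title)', $item),
        'link'    => $xPath->evaluate('string(link)', $item),
        'content' => $xPath->evaluate('string(description)', $item),
    ];
}

foreach ($posts as $post) {
    printf("%s (%s)\n%s\n\n", $post['title'], $post['link'], $post['content']);
}
```

Querying the item first and then using relative paths keeps each title paired with its own content.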

Basic XPath: //div[@class="entry"] gets you a node list of all entries. You can get the first (or only) entry with //div[@class="entry"][1]. From that node, the relative path h3 selects the title element and div[1] selects the content element (if it's guaranteed that there's only one; otherwise specify the class).
You can put them together like //div[@class="entry"][1]/h3 if you like, so that you only have to query from the root node. Otherwise, keep the entry node as the context for the next query, like:
$entries = $xPath->query('//div[@class="entry"][1]');
foreach ($entries as $entry) {
// Relative paths are evaluated against $entry; each call returns a DOMNodeList
$title = $xPath->evaluate('h3[1]', $entry);
$post = $xPath->evaluate('div[1]', $entry);
}
If your RSS returns a whole group of posts, you can leave off the first [1] and loop through the whole group this way.
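A minimal sketch of that loop over a whole group of posts, run against an inline copy of the structure from the question (the second entry is invented for illustration). It selects the content div by its class rather than by position, and wraps each path in string() so the result is text rather than a node list.

```php
<?php
// Inline sample mirroring the structure in the question; second entry is made up.
$html = <<<HTML
<div id="feedContent">
  <div class="entry">
    <h3>TITLE OF POST</h3>
    <div class="feedEntryContent">CONTENT OF POST</div>
  </div>
  <div class="entry">
    <h3>SECOND TITLE</h3>
    <div class="feedEntryContent">SECOND CONTENT</div>
  </div>
</div>
HTML;

$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xPath = new DOMXPath($dom);

$entries = [];
foreach ($xPath->query('//div[@class="entry"]') as $entry) {
    $entries[] = [
        // Both paths are relative to the current entry node
        'title' => trim($xPath->evaluate('string(h3[1])', $entry)),
        'post'  => trim($xPath->evaluate('string(div[@class="feedEntryContent"][1])', $entry)),
    ];
}
print_r($entries);
```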

Related

Changing a tag <a> to <div> with DOMDocument on WordPress

I'm a beginner in PHP and I would like to set up several functions to replace specific code bits on WordPress (including plugin elements that I can't edit directly).
Below is an example (first line: initial result, second line: desired result):
<span class="fn" itemprop="name">Gael Beyries</span>
<div class="vcard author"><span class="fn" itemprop="name">Gael Beyries</span></div>
PS: I came across this topic: Parsing WordPress post content, but the example is too complicated for what I want to do. Could you show me example code that solves this problem, so I can try to adapt it to other HTML elements?
Although I'm not sure how this fits into WP, I have basically taken the code from the linked answer and adapted it to your requirements.
I've assumed you want to find the <a> tags with class="vcard author" and this is the basis of the XPath expression. The code in the foreach() loop just copies the data into a new node and replaces the old one...
function replaceAWithDiv($content){
$dom = new DOMDocument();
$dom->loadHTML($content);
$xpath = new DOMXPath($dom);
$aTags = $xpath->query('//a[@class="vcard author"]');
foreach($aTags as $a){
// Create replacement element
$div = $dom->createElement("div");
$div->setAttribute("class", "vcard author");
// Move contents from a tag to div. appendChild() removes the node from
// $a, so take firstChild repeatedly instead of iterating the live
// childNodes list (which would skip every other node).
while ($a->firstChild) {
$div->appendChild($a->firstChild);
}
// Replace a tag with div
$a->parentNode->replaceChild($div, $a);
}
return $dom->saveHTML();
}
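For reference, here is one way the function might be exercised on a fragment. This is only a sketch: the input markup (the wrapping div with class "byline" and the href) is hypothetical, and the LIBXML_HTML_NOIMPLIED and LIBXML_HTML_NODEFDTD flags are an addition of this example, used so that saveHTML() returns just the fragment instead of a full document with doctype, html and body.

```php
<?php
function replaceAWithDiv(string $content): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    // Flags keep libxml from adding <!DOCTYPE>, <html> and <body> around the fragment
    $dom->loadHTML($content, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//a[@class="vcard author"]') as $a) {
        $div = $dom->createElement('div');
        $div->setAttribute('class', 'vcard author');
        // Move every child of the <a> into the new <div>
        while ($a->firstChild) {
            $div->appendChild($a->firstChild);
        }
        $a->parentNode->replaceChild($div, $a);
    }
    return $dom->saveHTML();
}

// Hypothetical input: a single root element containing the link to replace
$input = '<div class="byline"><a class="vcard author" href="/author/gael">'
       . '<span class="fn" itemprop="name">Gael Beyries</span></a></div>';
$output = replaceAWithDiv($input);
echo $output;
```

The fragment flags behave best when the input has a single root element, which is why the sample wraps the link in one div.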

php - parentNode of a string found with preg_match

I'm trying to access the parentNode of an element found with preg_match, because I would like to read the result found with the regex through the DOM of the document. I can't access it directly through PHP's DOMDocument because the number of divs is variable and they have no actual ID or any other attribute that I could match on.
To illustrate this: in the example below I'd match match_me with preg_match, and then I'd want to access the parentNode (div) and put all the child elements (the p's) in a DOMDocument object, so I can easily display them.
<div>
.... variable amount of divs
<div>
<div>
<p>1 match_me</p><p>2</p>
</div>
</div>
</div>
Use DOMXpath to query for the node by the value of its child:
$dom = new DOMDocument();
// Load your doc however necessary...
$xpath = new DOMXpath($dom);
// This query matches the parent div itself (// searches at any depth)
$nodes = $xpath->query('//div[p/text() = "1 match_me"]');
$your_div = $nodes->item(0);
// Do something with the children
$p_tags = $your_div->childNodes;
// Or in this version, the query returns the `<p>` on which `parentNode` is called
$nodes = $xpath->query('//p[text() = "1 match_me"]');
$your_div = $nodes->item(0)->parentNode;
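A self-contained sketch of the second variant, using the sample markup from the question with the variable outer divs reduced to a fixed number. It uses contains() instead of an exact text comparison, which is an assumption about how loosely "match_me" should be matched.

```php
<?php
$html = '<div><div><div><p>1 match_me</p><p>2</p></div></div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// contains() tolerates surrounding text, unlike an exact text() comparison
$nodes = $xpath->query('//p[contains(., "match_me")]');

$texts = [];
if ($nodes->length > 0) {
    $parent = $nodes->item(0)->parentNode;
    // Collect the text of every <p> sibling, skipping whitespace text nodes
    foreach ($parent->childNodes as $child) {
        if ($child instanceof DOMElement && $child->tagName === 'p') {
            $texts[] = $child->textContent;
        }
    }
}
print_r($texts);
```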

PHP DOMXPath problem

$xpath = new DOMXpath($doc);
$res = $xpath->query(".//*[@id='post2679883']/tr[2]/td[2]/div[2]");
foreach( $res as $obj ) {
var_dump($obj->nodeValue);
}
I need to select all the elements whose id contains the word "post".
Example:
<div id="post2242424">trarata</div>
<div id="post114525">trarata</div>
<div id="post8568686">trarata</div>
Question number two:
I need to get these elements with their HTML tags, but $obj->nodeValue returns the text without HTML tags.
You could use the XPath function starts-with to filter the nodes if all the ids you want start with "post". For example:
$xpath->query(".//*[starts-with(@id, 'post')]/tr[2]/td[2]/div[2]");
The second part has, I think, already been answered: PHP DOMDocument stripping HTML tags
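Both parts can be sketched together on the sample divs from the question (the nested b tag and the non-matching div are added here for illustration). Passing a node to DOMDocument::saveHTML() serializes that node with its markup, which is one way to keep the HTML tags that nodeValue drops.

```php
<?php
// Sample markup based on the question; the <b> and the "other1" div are made up.
$html = '<div id="post2242424">tra<b>rata</b></div>'
      . '<div id="post114525">trarata</div>'
      . '<div id="other1">ignored</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$fragments = [];
foreach ($xpath->query('//*[starts-with(@id, "post")]') as $node) {
    // saveHTML($node) returns the node's markup, tags included
    $fragments[$node->getAttribute('id')] = $doc->saveHTML($node);
}
print_r($fragments);
```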

How can I execute XPath queries on DOMElements using PHP?

I'm trying to do Xpath queries on DOMElements but it doesn't seem to work. Here is the code
<html>
<div class="test aaa">
<div></div>
<div class="link">contains a link</div>
<div></div>
</div>
<div class="test bbb">
<div></div>
<div></div>
<div class="link">contains a link</div>
</div>
</html>
What I'm doing is this:
$dom = new DOMDocument();
$html = file_get_contents("file.html");
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$entries = $xpath->query("//div[contains(@class,'test')]");
if (!$entries->length > 0) {
echo "Nothing\n";
} else {
foreach ($entries as $entry) {
$link = $xpath->query('/div[@class=link]',$entry);
echo $link->item(0)->nodeValue;
// => PHP Notice: Trying to get property of non-object
}
}
Everything works fine up to $xpath->query('/div[@class=link]', $entry);. I don't know how to use XPath on a particular DOMElement ($entry).
How can I use xpath queries on DOMElement?
It looks like you're trying to mix CSS selectors with XPath. You want to be using a predicate ([...]) looking at the value of the class attribute.
For example, your //div.link might look like //div[contains(concat(' ',normalize-space(@class),' '),' link ')].
Secondly, within the loop you try to make a query with a context node then ignore that by using an absolute location path (it starts with a slash).
Updated to reflect changes to the question:
Your second XPath expression (/div[@class=link]) is still a) absolute, and b) has an incorrect condition. You want to be asking for matching elements relative to the specified context node ($entry) with the class attribute having a string value of link.
So /div[@class=link] should become something like div[@class="link"], which searches children of the $entry elements (use .//div[...] or descendant::div[...] if you want to search deeper).
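Put together, a corrected version of the loop might look like the sketch below. The markup is the sample from the question, inlined as a string, with the link text changed so the two results can be told apart.

```php
<?php
$html = <<<HTML
<html><body>
<div class="test aaa">
  <div></div>
  <div class="link">first link</div>
</div>
<div class="test bbb">
  <div></div>
  <div class="link">second link</div>
</div>
</body></html>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$found = [];
foreach ($xpath->query("//div[contains(@class, 'test')]") as $entry) {
    // No leading slash: the path is resolved relative to $entry
    $link = $xpath->query('div[@class="link"]', $entry);
    if ($link->length > 0) {
        $found[] = $link->item(0)->nodeValue;
    }
}
print_r($found);
```

Checking $link->length before calling item(0) avoids the "property of non-object" notice when an entry has no matching child.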

Get only the first result of a foreach loop

I am trying to display all the links that are located in <div class="post">. There are many <div class="post"> inside my page (it is a blog). However there are many links in every <div class="post"> but I want to display the first of every div.
My question is how can I limit my code to show only the first one and continue to the next div?
Below is my code that gets and displays all the links. Gracias!
<?php
$dom = new DOMDocument;
libxml_use_internal_errors(TRUE);
$dom->loadHTMLFile('http://www.mydomain.com/');
libxml_clear_errors();
$xPath = new DOMXPath($dom);
$links = $xPath->query('//div[@class="post"]/a/@href');
foreach($links as $link) {
printf("%s \n", $link->nodeValue);
}
?>
:)
$xPath = new DOMXPath($dom);
$links = $xPath->query('//div[@class="post"]/a/@href');
$i = 1;
foreach($links as $link) {
printf("%s \n", $link->nodeValue);
if($i == 1) break;
}
I want to display the first [link] of every div.
The following asks for the hrefs from only the first link (at any depth) within each of the divs
//div[@class="post"]/descendant::a[1]/@href
If you want to only accept links which are immediate children of the div, then remove descendant:: from the above.
So, similarly to your existing code, the PHP might look something like
$links = $xPath->query('//div[@class="post"]/descendant::a[1]/@href');
foreach($links as $link) {
printf("%s \n", $link->value);
}
You are now searching for every link inside any div, so the data in $links contains no information about which div each link occurred in. To find the first link in each div, you will first have to query (somehow) for all divs. Then, for each div, find all links in it and select the first (or immediately select the first link in that div, if possible).
You can probably get an array of divs via the dom object or via xPath.
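The two-step approach described here can be sketched as follows: query the post divs first, then ask each div for its first link with a relative path. The markup and hrefs are invented for illustration.

```php
<?php
// Hypothetical blog markup: two posts, each with more than one link.
$html = <<<HTML
<div class="post"><p><a href="/a1">one</a></p><a href="/a2">two</a></div>
<div class="post"><a href="/b1">three</a><a href="/b2">four</a></div>
HTML;

$dom = new DOMDocument;
$dom->loadHTML($html);
$xPath = new DOMXPath($dom);

$hrefs = [];
foreach ($xPath->query('//div[@class="post"]') as $post) {
    // First <a> at any depth inside this particular div
    $first = $xPath->query('descendant::a[1]/@href', $post);
    if ($first->length > 0) {
        $hrefs[] = $first->item(0)->value;
    }
}

foreach ($hrefs as $href) {
    printf("%s \n", $href);
}
```

Because each link is looked up from its own div, the loop body is also the natural place to read anything else that belongs to that post.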
