PHP XMLNode, DOMnode Xpath selection predicate for a grandchild attribute value

I have some XML:
<div> First Element
<div>
<level3 name="fred">
</level3>
</div>
</div>
<div> Second Element
<div>
<level3 name="dave">
</level3>
</div>
</div>
<div> Third Element
<div>
<level3 name="jim">
</level3>
</div>
</div>
<div> Fifth Element
<div>
<level3 name="mike">
</level3>
</div>
</div>
I want to extract the XML (as a string, including the XML tags) of a specific top-level div element, based on the name attribute of its grandchild level3 element.
So, to get the top-level div above the level3 node whose name is jim, I have been looking at things like:
$sname="jim";
$spath = new DOMXPath($doc);
// Find a div with a child div with a level3 with a matching attribute name.
$spexp = "//div[./div/level3[contains(@name,\"$sname\")]]";
$story = $spath->evaluate("$spexp");
echo $story->item(0)->nodeValue . "\n";
I have tried various combinations, including 'exists' in the predicate, which I am sure is basic XSLT but apparently not available in PHP(!).
I have googled a lot, but predicates that reach below the immediate child level haven't come up, and PHP's XPath seems to have its own flavour, so general XPath material isn't always useful.

The XPath was OK; the version below just drops the leading ./ inside the first predicate, as it's not needed.
To output the XML including all of its tags, pass the node you want to export to saveXML(). nodeValue only returns the text content.
$sname="jim";
$spath = new DOMXPath($doc);
// Find a div with a child div with a level3 with a matching attribute name.
$spexp = "//div[div/level3[contains(@name,\"$sname\")]]";
$story = $spath->evaluate("$spexp");
echo $doc->saveXML($story->item(0)). "\n";
Gives...
<div> Third Element
<div>
<level3 name="jim">
</level3>
</div>
</div>
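
As a self-contained illustration of the answer above, here is a minimal sketch. It assumes the fragment from the question is wrapped in a hypothetical <root> element (an XML document needs a single root), and it swaps contains() for an exact attribute match, which avoids also matching names such as "jimmy". It also assumes $sname contains no single quotes.

```php
<?php
// Sample data from the question, wrapped in a hypothetical <root> element
// (an assumption: the original fragment has several top-level divs).
$xml = <<<'XML'
<root>
<div> First Element
<div>
<level3 name="fred">
</level3>
</div>
</div>
<div> Third Element
<div>
<level3 name="jim">
</level3>
</div>
</div>
</root>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$sname = "jim";
// Select every div that has a child div containing a level3 whose
// name attribute equals $sname exactly.
$nodes = $xpath->query("//div[div/level3[@name='$sname']]");

// saveXML() with a node argument serializes that node, tags included.
$result = $nodes->length > 0 ? $doc->saveXML($nodes->item(0)) : '';
echo $result, "\n";
```

Only the outer div matches, because the inner div has no div child of its own.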

Related

Get html nodes that are not within a node with attribute X

I have the following XPath query, and I am not sure how to make it find only the elements with the attribute pi-repeat, but not their descendants that also have that attribute.
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//*[@pi-repeat]') as $node) {
// Do stuff
}
Html example:
<body>
<div pi-repeat="thing1">
<div pi-repeat="sub-item"></div>
</div>
<div class="a-class">
<div pi-repeat="thing2">
<div pi-repeat="sub-item"></div>
</div>
</div>
</body>
As you can see, there are four pi-repeat attributes here. I would like my query to select only the elements that are not nested inside another element with a pi-repeat attribute.
In this case, only thing1 and thing2 would be selected.

The second predicate in the XPath below does the job. It filters out elements that have an ancestor element with a pi-repeat attribute:
//*[@pi-repeat][not(ancestor::*[@pi-repeat])]
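
A minimal sketch of that expression applied to the HTML from the question. The variable names and the use of loadHTML() are assumptions made for illustration.

```php
<?php
$html = <<<'HTML'
<body>
<div pi-repeat="thing1">
<div pi-repeat="sub-item"></div>
</div>
<div class="a-class">
<div pi-repeat="thing2">
<div pi-repeat="sub-item"></div>
</div>
</div>
</body>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$found = [];
// First predicate: the element has a pi-repeat attribute.
// Second predicate: none of its ancestors has one.
foreach ($xpath->query('//*[@pi-repeat][not(ancestor::*[@pi-repeat])]') as $node) {
    $found[] = $node->getAttribute('pi-repeat');
}
print_r($found);
```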

how to get content of an element with HTML nodes?

I need to get the content of an element and place that content into another element. I use createTextNode to append that content as a child of the target element.
Because I append it as a text node, < and > are converted into &lt; and &gt;. How can I append that content without the conversion?
For example:
<li id="fn1">
<div>
<a>some text
</a>
</div>
</li>
Expected output:
<p>
<div>
<a>some text
</a>
</div>
</p>
But my output looks like this:
<p>
&lt;div&gt;
&lt;a&gt;some text&lt;/a&gt;
&lt;/div&gt;
</p>
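My code:
$ch = $dom->createElement("p");
$li = $xp->query("//li[contains(@id, 'fn')]");
foreach ($li as $liv) {
    $linodes = $liv->childNodes;
    $pvalue = "";
    foreach ($linodes as $lin) {
        // Serialize each child of the <li> to an XML string
        $pvalue .= $dom->saveXML($lin);
    }
    // createTextNode() escapes the markup, which is the problem described above
    $ch->appendChild($dom->createTextNode($pvalue));
}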
I have also tried
$ch->appendChild($dom->createTextNode(htmlspecialchars_decode($pvalue)));
but the output is the same.

Work with the nodes themselves instead of their serialized text. If you want to:
move a node within the same document, remove that node via DOMNode::removeChild and append the return value of that function to its new parent node via DOMNode::appendChild;
copy the node to a new location within the same document, make a deep clone of the node via DOMNode::cloneNode(true) and append the clone;
transfer the node to another document, import that node into the new document via DOMDocument::importNode and then append it to its new parent.
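
The copy option can be sketched as follows. This is a minimal example under stated assumptions: the <li> from the question is wrapped in a hypothetical <root> element, and the new <p> is appended to that root purely for illustration.

```php
<?php
$dom = new DOMDocument();
// Hypothetical wrapper document around the <li> from the question.
$dom->loadXML('<root><li id="fn1"><div><a>some text</a></div></li></root>');
$xp = new DOMXPath($dom);

$p = $dom->createElement('p');
foreach ($xp->query("//li[contains(@id, 'fn')]") as $li) {
    foreach ($li->childNodes as $child) {
        // cloneNode(true) makes a deep copy, so the markup is kept as
        // real nodes and nothing gets escaped.
        $p->appendChild($child->cloneNode(true));
    }
}
$dom->documentElement->appendChild($p);

$out = $dom->saveXML($p);
echo $out, "\n";
```

Because the children are appended as nodes rather than as a text node, the serialized <p> contains real <div> and <a> elements.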

How to get element whose parents child is element x? PHP Simple HTML DOM Parser

So for example I have a HTML tree like this:
<section class="product">
<div>
<div class="p-image">
<img alt="Product name" src="path/to/image.jpg">
</div>
<div class="p-content">
<h3>Product name</h3>
</div>
<div class="p-info">
<div class="new-price">
<span>400 €</span>
</div>
</div>
</div>
</section>
So I want to get the content of the span element whose parent (div) has a child element (img) with a specific alt attribute. I know how to select an element by its attributes, but I haven't found a way to select an element by its parent's child.
I hope my explanation was understandable.
Thank you.

In jQuery you could use $(selector).parent() to get an element's parent and $('img[alt="x"]') to get the img tag whose alt attribute is equal to x.
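
Since the question is about PHP, here is a sketch of a different technique from the one asked about: PHP's built-in DOMDocument and DOMXPath rather than Simple HTML DOM Parser. It assumes the img and the span share the enclosing section element as a common ancestor, and it wraps the fragment in a page skeleton with a charset declaration so the € sign is read as UTF-8.

```php
<?php
$fragment = <<<'HTML'
<section class="product">
<div>
<div class="p-image">
<img alt="Product name" src="path/to/image.jpg">
</div>
<div class="p-content">
<h3>Product name</h3>
</div>
<div class="p-info">
<div class="new-price">
<span>400 €</span>
</div>
</div>
</div>
</section>
HTML;

// Assumed wrapper: declares UTF-8 so the parser does not guess the encoding.
$html = '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
      . '</head><body>' . $fragment . '</body></html>';

// The HTML parser may report <section> as an unknown tag; collect such
// messages internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Find the section that contains the img with the wanted alt text,
// then descend to the price span inside that same section.
$query = '//section[.//img[@alt="Product name"]]//div[@class="new-price"]/span';
$spans = $xpath->query($query);
$price = $spans->length > 0 ? trim($spans->item(0)->textContent) : null;
echo $price, "\n";
```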

Web scraper with DOMDocument

I'm trying to scrape a web page for content, using file_get_contents to grab the HTML and then using a DOMDocument object. My problem is that I cannot get the appropriate information. I'm not sure if this is because I'm using DOMDocument's methods wrong, or if the (X)HTML in my source is just poor.
In the source, there is an element with an id of 'cards', which has two child divs. I want the first child, which has many child divs, each of which has an anchor child with a div child. I want the href from the anchor and the nodeValue from its child div.
The structure is like this:
<div id="cards">
<div class="grid">
<div class="card-wrap">
<a href="linkValue">
<img src="..."/>
<div>nameValue</div>
</a>
</div>
...
</div>
<div id="...">
</div>
</div>
I've started out with $cards = $dom->getElementById("cards"). I get a DOMText Object, a DOMElement Object, a DOMText Object, a DOMElement Object, and a DOMText Object. I then use $grid = $cards->childNodes->item(1) to get the first DOMElement Object, which is presumably the .grid element. However, when I then iterate through the $grid with:
foreach($grid->childNodes as $item){
if($item->nodeName == "div"){
echo $item->nodeName,' | ',$item->nodeValue,'<br>';
}
}
I end up with a page full of "div | nameValue" where nameValue is the embedded div's nodeValue, and I am unable to locate the anchors to get their href value.
Am I doing something obviously wrong with my DOMDocument, or perhaps there is something more going on here?

Well, in your example code, if($item->nodeName == "div"){ is going to exclude any <a> tag. Additionally, childNodes only lists the direct children of a node; it does not iterate recursively.
Therefore, to access the nodes in question, you could use:
$children = $dom->getElementById("cards")->childNodes
->item(1)->childNodes->item(1)->childNodes;
Yet, as you can see this is very messy... Introducing XPath:
http://php.net/manual/en/class.domxpath.php
http://www.w3schools.com/xpath/xpath_syntax.asp

The XPath way:
$src = <<<EOS
<div id="cards">
<div class="grid">
<div class="card-wrap">
<a href="linkValue">
<img src="..."/>
<div>nameValue</div>
</a>
</div>
</div>
<div id="whatever">
</div>
</div>
EOS;
$xml = new SimpleXMLElement($src);
list ($anchor) = $xml->xpath('//div[@id="cards"]/div[1]/div[1]/a');
echo $anchor->div, ' => ', $anchor['href'], PHP_EOL;
"Get anchor of first child div of first child div of div with an id of 'cards'"
Output:
nameValue => linkValue
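
The same idea with DOMDocument and DOMXPath, which the question started from, might look like the sketch below. It collects every card rather than only the first. It loads the sample with loadXML() because the snippet is well-formed; that is an assumption for this example, and a real page would more likely need loadHTML().

```php
<?php
$src = <<<'XML'
<div id="cards">
<div class="grid">
<div class="card-wrap">
<a href="linkValue">
<img src="..."/>
<div>nameValue</div>
</a>
</div>
</div>
<div id="whatever">
</div>
</div>
XML;

$dom = new DOMDocument();
$dom->loadXML($src);
$xpath = new DOMXPath($dom);

$cards = [];
// Every anchor directly inside a child div of the .grid element.
foreach ($xpath->query('//div[@id="cards"]/div[@class="grid"]/div/a') as $a) {
    // Evaluate relative to the anchor: string value of its child div.
    $name = $xpath->evaluate('string(div)', $a);
    $cards[] = ['href' => $a->getAttribute('href'), 'name' => trim($name)];
}
print_r($cards);
```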

Retrieving (relating) two separate tags/attributes using a single XPath query?

I am running XPath queries against a DOMDocument I have. The general pattern of this document is as follows:
<h2> Title info </h2>
<div> .... </div>
<p> ...</p>
<div class = format_text>
<p>
<img src = "http://sourceofimageOnline.com">
</p>
</div>
<h2> 2nd title</h2>
<div> .... </div>
<p> ...</p>
<div class = format_text>
<p>
<img src = "http://sourceofimageOnline.com"></img>
<img src = "http://sourceofimageonline.com"></img>
</p>
</div>
The key is to return the titles and the src attribute for images that are hyperlinks.
Essentially, I render it as :
Title 1
Img URI 1
Title 2
Img URI 2
Img URI 3
...
..
Now the titles can be easily retrieved using
$dom->getElementsByTagName('h2')
And the img src values are retrieved by an XPath query:
//div[@class = "format_text"]/p/a/img/@src
This returns all the information I need. However, I am struggling to relate the img src values to the titles they fall under. Since the two are retrieved independently, I don't know what kind of XPath query I need to run to retrieve both while keeping that relationship.

Fetch the list of titles with the XPath expression /html/body//h2.
Iterate over this list and evaluate a second XPath expression for each entry.
Inside that expression, refer to the current h2 with . and refer to the matching link with
./../div[@class='format_text']/p/a[$counter]/img
where $counter is the index of the current title in the list.
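
A variation on those steps, offered as a sketch rather than the answerer's exact method: it uses each h2 as the context node and takes the first following-sibling div with class format_text, so no counter is needed. The image URLs are hypothetical placeholders, and the sample assumes the h2 and div elements are siblings, as in the question.

```php
<?php
$html = <<<'HTML'
<h2>Title 1</h2>
<div>...</div>
<p>...</p>
<div class="format_text">
<p><img src="http://example.com/1.png"></p>
</div>
<h2>Title 2</h2>
<div>...</div>
<p>...</p>
<div class="format_text">
<p><img src="http://example.com/2.png"><img src="http://example.com/3.png"></p>
</div>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$result = [];
foreach ($xpath->query('//h2') as $h2) {
    $title = trim($h2->textContent);
    $result[$title] = [];
    // Relative to the current h2: the nearest following format_text div,
    // then the src attribute of every img inside it.
    $srcs = $xpath->query('following-sibling::div[@class="format_text"][1]//img/@src', $h2);
    foreach ($srcs as $src) {
        $result[$title][] = $src->nodeValue;
    }
}
print_r($result);
```

Passing the h2 as the second argument to query() is what ties each set of image URLs to its title.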
