I have the following XPath query, and I am not sure how to make it find only elements with the pi-repeat attribute, but not their descendants that also have that attribute.
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//*[@pi-repeat]') as $node) {
// Do stuff
}
HTML example:
<body>
<div pi-repeat="thing1">
<div pi-repeat="sub-item"></div>
</div>
<div class="a-class">
<div pi-repeat="thing2">
<div pi-repeat="sub-item"></div>
</div>
</div>
</body>
As you can see, there are four pi-repeat attributes here. I would like my query to select only the elements that are not nested inside another element with a pi-repeat attribute.
In this case, only thing1 and thing2 would get selected.
The second predicate in the XPath below does the job: it filters out any element that has an ancestor with a pi-repeat attribute:
//*[@pi-repeat][not(ancestor::*[@pi-repeat])]
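For reference, here is a minimal runnable sketch of that expression against the markup from the question. The variable name $outer is only for illustration.

```php
<?php
// Sample markup copied from the question.
$html = <<<HTML
<body>
<div pi-repeat="thing1">
<div pi-repeat="sub-item"></div>
</div>
<div class="a-class">
<div pi-repeat="thing2">
<div pi-repeat="sub-item"></div>
</div>
</div>
</body>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Keep elements that carry pi-repeat and have no ancestor carrying it.
$outer = [];
foreach ($xpath->query('//*[@pi-repeat][not(ancestor::*[@pi-repeat])]') as $node) {
    $outer[] = $node->getAttribute('pi-repeat');
}

echo implode(', ', $outer), PHP_EOL;
```

The nested sub-item elements are skipped because each of them has an ancestor with the attribute.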
Related
I have some xml
<div> First Element
<div>
<level3 name="fred">
</level3>
</div>
</div>
<div> Second Element
<div>
<level3 name="dave">
</level3>
</div>
</div>
<div> Third Element
<div>
<level3 name="jim">
</level3>
</div>
</div>
<div> Fifth Element
<div>
<level3 name="mike">
</level3>
</div>
</div>
I want to extract the XML (as a string, including the XML tags) of a specific top-level div element, based on the name attribute of its grandchild level3 element.
So to get the top div above the level3 node with the name of jim I have been looking at things like:
$sname="jim";
$spath = new DOMXPath($doc);
// Find a div with a child div with a level3 with a matching attribute name.
$spexp = "//div[./div/level3[contains(@name,\"$sname\")]]";
$story = $spath->evaluate("$spexp");
echo $story->item(0)->nodeValue . "\n";
I have tried various combinations, including 'exists' in the predicate, which I am sure is basic XSLT but does not seem to work in PHP(!).
I have googled loads, but predicates that reach below the immediate child level haven't come up, and it seems PHP's XPath has its own flavour, so general XPath material isn't always useful.
The XPath was OK; the version below just drops the leading ./ inside the first predicate, as it's not needed.
nodeValue only returns the text content. To output the XML including all of the tags, pass the node you want to export to saveXML():
$sname="jim";
$spath = new DOMXPath($doc);
// Find a div with a child div with a level3 with a matching attribute name.
$spexp = "//div[div/level3[contains(@name,\"$sname\")]]";
$story = $spath->evaluate("$spexp");
echo $doc->saveXML($story->item(0)). "\n";
Gives...
<div> Third Element
<div>
<level3 name="jim">
</level3>
</div>
</div>
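A self-contained sketch of the same approach, in case it helps. Two assumptions that are not in the original: the fragments are wrapped in a hypothetical <root> element (loadXML needs a single root), and the comparison uses an exact match on @name instead of contains(). PHP's DOMXPath implements XPath 1.0, which is why the XPath 2.0 function exists() is not available.

```php
<?php
// Trimmed copy of the question's data inside a hypothetical <root> wrapper.
$xml = <<<XML
<root>
<div> Second Element
<div>
<level3 name="dave">
</level3>
</div>
</div>
<div> Third Element
<div>
<level3 name="jim">
</level3>
</div>
</div>
</root>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$spath = new DOMXPath($doc);

$sname = 'jim';
// Exact comparison; contains() would also match names such as "jimmy".
// Note: $sname is interpolated as-is, so it must not contain a quote character.
$story = $spath->query("//div[div/level3[@name='$sname']]");

// Serialise the matched top-level div, tags included.
$fragment = $story->length > 0 ? $doc->saveXML($story->item(0)) : '';
echo $fragment, PHP_EOL;
```

Only the outer div matches, because the inner div has no div child of its own.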
So for example I have a HTML tree like this:
<section class="product">
<div>
<div class="p-image">
<img alt="Product name" src="path/to/image.jpg">
</div>
<div class="p-content">
<h3>Product name</h3>
</div>
<div class="p-info">
<div class="new-price">
<span>400 €</span>
</div>
</div>
</div>
</section>
So I want to get the content of the span element whose containing div also has a descendant img element with a specific alt attribute. I know how to select an element by its attributes, but I haven't found a way to select an element based on another child of its parent.
I hope my explanation was understandable.
Thank you.
In jQuery you could use $(selector).parent() to get an element's parent and $('img[alt="x"]') to get the img tag whose alt attribute equals x.
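If you would rather stay in PHP, one possible XPath sketch is shown below. It assumes the price always sits in a div with class new-price inside the same section as the image; the variable names and the way the markup is loaded are illustrative only.

```php
<?php
// Markup modelled on the question (outer <div> closed, price written as an entity
// so the example does not depend on the input encoding).
$html = <<<HTML
<section class="product">
<div>
<div class="p-image">
<img alt="Product name" src="path/to/image.jpg">
</div>
<div class="p-content">
<h3>Product name</h3>
</div>
<div class="p-info">
<div class="new-price">
<span>400 &euro;</span>
</div>
</div>
</div>
</section>
HTML;

// The HTML parser may complain about HTML5 tags such as <section>; collect errors quietly.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$alt = 'Product name';

// Pick the section that contains the image, then walk down to the price span.
$spans = $xpath->query('//section[.//img[@alt="' . $alt . '"]]//div[@class="new-price"]/span');
$price = $spans->length > 0 ? trim($spans->item(0)->nodeValue) : null;

echo $price, PHP_EOL;
```

The predicate [.//img[@alt="..."]] is what expresses "has a descendant img with this alt"; the rest of the path is an ordinary descent from that section.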
I'm trying to scrape a web page for content, using file_get_contents to grab the HTML and then using a DOMDocument object. My problem is that I cannot get the appropriate information. I'm not sure if this is because I'm using DOMDocument's methods wrong, or if the (X)HTML in my source is just poor.
In the source, there is an element with an id of 'cards', which has two child divs. I want the first child, which has many child divs, each of which has an anchor child with a div child. I want the href from the anchor and the nodeValue from its child div.
The structure is like this:
<div id="cards">
<div class="grid">
<div class="card-wrap">
<a href="linkValue">
<img src="..."/>
<div>nameValue</div>
</a>
</div>
...
</div>
<div id="...">
</div>
</div>
I've started out with $cards = $dom->getElementById("cards"). I get a DOMText Object, a DOMElement Object, a DOMText Object, a DOMElement Object, and a DOMText Object. I then use $grid = $cards->childNodes->item(1) to get the first DOMElement Object, which is presumably the .grid element. However, when I then iterate through the $grid with:
foreach($grid->childNodes as $item){
if($item->nodeName == "div"){
echo $item->nodeName,' | ',$item->nodeValue,'<br>';
}
}
I end up with a page full of "div | nameValue" where nameValue is the embedded div's nodeValue, and I am unable to locate the anchors to get their href value.
Am I doing something obviously wrong with my DOMDocument, or perhaps there is something more going on here?
Well, from your example code, if($item->nodeName == "div"){ is going to exclude any <a> tag. Additionally, childNodes only contains a node's direct children, so iterating over it does not descend into nested elements.
Therefore, to access the nodes in question, you could use:
$children = $dom->getElementById("cards")->childNodes
->item(1)->childNodes->item(1)->childNodes;
Yet, as you can see this is very messy... Introducing XPath:
http://php.net/manual/en/class.domxpath.php
http://www.w3schools.com/xpath/xpath_syntax.asp
The XPath way:
$src = <<<EOS
<div id="cards">
<div class="grid">
<div class="card-wrap">
<a href="linkValue">
<img src="..."/>
<div>nameValue</div>
</a>
</div>
</div>
<div id="whatever">
</div>
</div>
EOS;
$xml = new SimpleXMLElement($src);
list ($anchor) = $xml->xpath('//div[@id="cards"]/div[1]/div[1]/a');
echo $anchor->div, ' => ', $anchor['href'], PHP_EOL;
"Get anchor of first child div of first child div of div with an id of 'cards'"
Output:
nameValue => linkValue
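SimpleXMLElement needs well-formed XML, which scraped pages often are not. As a hedged alternative, the sketch below does the same thing with DOMDocument::loadHTML and DOMXPath, and collects every card rather than just the first. The sample values (link1, name1 and so on) are made up for illustration.

```php
<?php
// Hypothetical sample with two cards, following the structure in the question.
$html = <<<HTML
<div id="cards">
<div class="grid">
<div class="card-wrap">
<a href="link1"><img src="a.png"/><div>name1</div></a>
</div>
<div class="card-wrap">
<a href="link2"><img src="b.png"/><div>name2</div></a>
</div>
</div>
<div id="other"></div>
</div>
HTML;

// Collect parser complaints quietly instead of printing warnings.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Anchors inside any child div of the first child div of #cards.
$cards = [];
foreach ($xpath->query('//div[@id="cards"]/div[1]/div/a') as $a) {
    // string(div) is evaluated relative to the anchor and yields the nested div's text.
    $cards[$a->getAttribute('href')] = trim($xpath->evaluate('string(div)', $a));
}

foreach ($cards as $href => $name) {
    echo $name, ' => ', $href, PHP_EOL;
}
```

Passing the anchor as the second argument to evaluate() makes the expression relative to that node, which avoids chaining childNodes->item() calls.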
I am running XPath queries against a DOMDocument I have. The general pattern of the document is as follows:
<h2> Title info </h2>
<div> .... </div>
<p> ...</p>
<div class = format_text>
<p>
<img src = "http://sourceofimageOnline.com">
</p>
</div>
<h2> 2nd title</h2>
<div> .... </div>
<p> ...</p>
<div class = format_text>
<p>
<img src = "http://sourceofimageOnline.com"></img>
<img src = "http://sourceofimageonline.com"></img>
</p>
</div>
The key is to return the titles and the src attribute for images that are hyperlinks.
Essentially, I render it as :
Title 1
Img URI 1
Title 2
Img URI 2
Img URI 3
...
..
Now the titles can be easily retrieved using
DOMDocument::getElementsByTagName('h2')
and the img src values are retrieved by an XPath query:
//div[@class = "format_text"]/p/a/img/@src
This returns all the information I need. However, I am struggling to relate the img src values to the titles they fall under. Since they are retrieved independently, I cannot work out what kind of XPath query I need to retrieve both while preserving that relationship.
Fetch the list of titles with the XPath expression /html/body//h2.
Iterate over this list, running a second XPath expression for each title.
In that second expression, refer to the current h2 with . and to the link you want with
./../div[@class='format_text']/p/a[$counter]/img
where $counter is the index into the list.
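A variation on this idea, offered as a sketch rather than the only way to do it: use each h2 as the context node and take the first following-sibling div with class format_text. This assumes every title is followed by its own format_text div at the same nesting level; if a title has none, the query would pick up the next title's div instead. The sample URLs are placeholders.

```php
<?php
// Hypothetical sample following the pattern in the question.
$html = <<<HTML
<h2>Title 1</h2>
<div>intro</div>
<p>text</p>
<div class="format_text">
<p><a href="#"><img src="http://example.com/1.png"></a></p>
</div>
<h2>Title 2</h2>
<div>intro</div>
<p>text</p>
<div class="format_text">
<p><a href="#"><img src="http://example.com/2.png"><img src="http://example.com/3.png"></a></p>
</div>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Map each title to the image sources found in the format_text div that follows it.
$grouped = [];
foreach ($xpath->query('//h2') as $h2) {
    $title = trim($h2->nodeValue);
    $grouped[$title] = [];
    $srcs = $xpath->query('following-sibling::div[@class="format_text"][1]//img/@src', $h2);
    foreach ($srcs as $src) {
        $grouped[$title][] = $src->nodeValue;
    }
}

foreach ($grouped as $title => $images) {
    echo $title, PHP_EOL;
    foreach ($images as $src) {
        echo '  ', $src, PHP_EOL;
    }
}
```

Because the images are looked up relative to each h2, the titles and sources stay paired without a separate counter.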
I have an HTML layout like:
<div id="pageno">1</div>
<div id="pageno">2</div>
<div id="pageno">3</div>
<div id="pageno">4</div>
<div id="pageno">5</div>
Using an HTML DOM parser, how can I get the inner text of the last div?
Thanks in advance.
// Create a new DomDocument.
$dom = new DomDocument();
// Load your HTML into it.
$dom->loadHTML('
<div id="pageno">1</div>
<div id="pageno">2</div>
<div id="pageno">3</div>
<div id="pageno">4</div>
<div id="pageno">5</div>
');
// Obtain a list of the DIVs.
$divList = $dom->getElementsByTagName("div");
// Obtain the last element of the list.
$lastDiv = $divList->item($divList->length - 1);
// Output the inner text.
echo $lastDiv->nodeValue;
However, the HTML you have provided is not valid, as element IDs should be unique. This may cause loadHTML to emit a warning about the duplicate ID.
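If you prefer XPath, a possible sketch is below. It selects by the (duplicated) id attribute and takes the last match with last(), and it uses libxml_use_internal_errors() so that any parser complaints are collected rather than printed. The variable names are illustrative.

```php
<?php
$html = '
<div id="pageno">1</div>
<div id="pageno">2</div>
<div id="pageno">3</div>
<div id="pageno">4</div>
<div id="pageno">5</div>
';

// Collect parser warnings quietly instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Parenthesise the node set first, then take its last member.
$last = $xpath->evaluate('string((//div[@id="pageno"])[last()])');

echo $last, PHP_EOL;
```

The parentheses matter: without them, [last()] would apply per parent element rather than to the whole result set.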