How to get the content of an element with HTML nodes? - PHP

I need to get the content of an element and place that content into another element. I use createTextNode to append that content as a child to the target element.
Because I append it as a text node, < and > are converted into &lt; and &gt;. How can I append that content without conversion?
For example:
<li id="fn1">
<div>
<a>some text
</a>
</div>
</li>
Expected output:
<p>
<div>
<a>some text
</a>
</div>
</p>
But my output looks like this:
<p>
&lt;div&gt;
&lt;a&gt;some text&lt;/a&gt;
&lt;/div&gt;
</p>
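My code:
$ch = $dom->createElement("p");
$li = $xp->query("//li[contains(@id, 'fn')]");
foreach ($li as $liv) {
    $linodes = $liv->childNodes;
    $pvalue = "";
    foreach ($linodes as $lin) {
        // Serialize each child node of the <li> to an XML string
        $pvalue .= $dom->saveXML($lin);
    }
    // createTextNode treats the string as plain text, so the markup gets escaped
    $ch->appendChild($dom->createTextNode($pvalue));
}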
I have also tried
$ch->appendChild($dom->createTextNode(htmlspecialchars_decode($pvalue)));
but I get the same output.

If you want to:
move a node within the same document: remove it via DOMNode::removeChild and append the return value of that call to its new parent via DOMNode::appendChild.
copy a node to a new location within the same document: make a deep clone via DOMNode::cloneNode(true) and append the clone.
transfer a node to another document: import it into the new document via DOMDocument::importNode and then append it to its new parent.
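
As a minimal sketch of the "copy" option applied to the question's markup: the snippet below deep-clones each child of the matching li into a new p element instead of serializing it to a string. The surrounding ul wrapper and the variable names are assumptions made for illustration; they are not part of the original question.

```php
<?php
// Assumed sample document: the question's <li> wrapped in a <ul> so it is well-formed XML.
$dom = new DOMDocument();
$dom->loadXML('<ul><li id="fn1"><div><a>some text</a></div></li></ul>');
$xp = new DOMXPath($dom);

$p = $dom->createElement('p');
foreach ($xp->query("//li[contains(@id, 'fn')]") as $li) {
    foreach ($li->childNodes as $child) {
        // cloneNode(true) copies the node together with all of its descendants,
        // so the markup stays markup and nothing is escaped.
        $p->appendChild($child->cloneNode(true));
    }
}
// Attach the new <p> somewhere in the document (here: after the <li>).
$dom->documentElement->appendChild($p);

$out = $dom->saveXML($p);
echo $out, "\n";
```

Because the children are cloned rather than moved, the original li keeps its content. Replacing cloneNode(true) with a plain appendChild of the original node would move it instead.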

Related

XPATH select all descendant text nodes in the order they appear excluding some of them

I want to select all descendant text nodes with XPATH in the order they appear in a certain div. Problem being there are nodes I do not want to select.
<div class="div-needed">
<p> Some random text 1<strong>important text 1</strong> more text </p>
<p> Some random text 2<strong>important text 2</strong> more text </p>
<p> Some random text 3<strong>important text 3</strong> more text </p>
<div class="not-select">Unwanted content</div>
<ul class="unwanted"></ul>
<p> Some random text 4<strong>important text 4</strong> more text </p>
<h3>Text I also need</h3>
<script>
*/Unwanted Code /*
</script>
</div>
I need the <h3>, the <p> elements and their <strong> children, all of which are descendants of <div class="div-needed">, and I need them in the order they appear in the HTML. I have tried div[@class="div-needed"]/descendant::text()[not(@class="not-select")] to select all text nodes while excluding the ones I do not want, but it does not change the final output. Does anyone know how to exclude some nodes when selecting descendants in XPath while keeping the remaining nodes in document order?
I wouldn't look for text nodes. If you want the h3 and p children, select them with e.g. div[@class = 'div-needed']/p | div[@class = 'div-needed']/h3 or div[@class = 'div-needed']/*[self::p|self::h3]. As the strong elements are children of the p elements, the string value of each p element already contains the complete text. You can read it out with string() using a relative XPath for each selected p, or in the DOM via the textContent property of each selected p or h3 element.
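
A small sketch of that suggestion in PHP, assuming the markup is loaded with DOMDocument::loadHTML (the question does not say which parser is used) and using a shortened copy of the sample HTML:

```php
<?php
// Shortened copy of the question's markup (assumed input).
$html = <<<'HTML'
<div class="div-needed">
<p> Some random text 1<strong>important text 1</strong> more text </p>
<div class="not-select">Unwanted content</div>
<ul class="unwanted"></ul>
<p> Some random text 4<strong>important text 4</strong> more text </p>
<h3>Text I also need</h3>
<script>/* unwanted code */</script>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Select only the p and h3 children; the result is in document order.
$texts = [];
foreach ($xpath->query("//div[@class='div-needed']/*[self::p|self::h3]") as $el) {
    // textContent includes the text of nested elements such as <strong>.
    $texts[] = trim($el->textContent);
}
print_r($texts);
```

Selecting the wanted elements directly means the unwanted div, ul and script never enter the result, so no exclusion predicate is needed.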

PHP XMLNode, DOMnode Xpath selection predicate for a grandchild attribute value

I have some xml
<div> First Element
<div>
<level3 name="fred">
</level3>
</div>
</div>
<div> Second Element
<div>
<level3 name="dave">
</level3>
</div>
</div>
<div> Third Element
<div>
<level3 name="jim">
</level3>
</div>
</div>
<div> Fifth Element
<div>
<level3 name="mike">
</level3>
</div>
</div>
I want to extract the XML (as a string, including the XML tags) of a specific top-level div element based on the name attribute of its level3 grandchild.
So to get the top div above the level3 node with the name of jim I have been looking at things like:
$sname="jim";
$spath = new DOMXPath($doc);
// Find a div with a child div with a level3 with a matching attribute name.
$spexp = "//div[./div/level3[contains(@name,\"$sname\")]]";
$story = $spath->evaluate("$spexp");
echo $story->item(0)->nodeValue . "\n";
I have tried various combinations, including 'exists' in the predicate, which I am sure is basic XSLT but apparently not available in PHP.
I have googled a lot, but predicates that reach below the immediate child level haven't come up, and PHP's XPath seems to have its own flavour, so general XPath material isn't always useful.
The XPath was OK; the version below only drops the leading ./ inside the first [ because it is not needed.
To output the XML including all of the tags, pass the node you want to export to saveXML()...
$sname="jim";
$spath = new DOMXPath($doc);
// Find a div with a child div with a level3 with a matching attribute name.
$spexp = "//div[div/level3[contains(@name,\"$sname\")]]";
$story = $spath->evaluate("$spexp");
echo $doc->saveXML($story->item(0)). "\n";
Gives...
<div> Third Element
<div>
<level3 name="jim">
</level3>
</div>
</div>

How to get element whose parents child is element x? PHP Simple HTML DOM Parser

So for example I have a HTML tree like this:
<section class="product">
<div>
<div class="p-image">
<img alt="Product name" src="path/to/image.jpg">
</div>
<div class="p-content">
<h3>Product name</h3>
</div>
<div class="p-info">
<div class="new-price">
<span>400 €</span>
</div>
</div>
</div>
</section>
So I want to get the content of the span element whose parent (div) has a child element (img) with a specific alt attribute. I know how to select an element by its attributes, but I haven't found any solution for selecting an element by its parent's child.
I hope my explanation was understandable.
Thank you.
In jQuery you could use $(selector).parent() to get an element's parent and $('img[alt="x"]') to get the img tag whose alt attribute equals x.
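
The same idea can be sketched on the PHP side with the built-in DOMDocument and DOMXPath classes. Note that this swaps in a different technique: it is neither PHP Simple HTML DOM Parser nor jQuery. The snippet assumes that the img and the span share the same section ancestor, as in the sample tree, and it writes the euro sign as &euro; to sidestep input-encoding questions.

```php
<?php
// Copy of the question's tree (assumed input), with the euro sign as an entity.
$html = <<<'HTML'
<section class="product">
<div>
<div class="p-image"><img alt="Product name" src="path/to/image.jpg"></div>
<div class="p-content"><h3>Product name</h3></div>
<div class="p-info"><div class="new-price"><span>400 &euro;</span></div></div>
</div>
</section>
HTML;

$dom = new DOMDocument();
// Older libxml2 versions warn about HTML5 tags such as <section>; collect the warnings quietly.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// "The span in .new-price inside a section that contains an img with this alt text."
$query = '//section[.//img[@alt="Product name"]]//div[@class="new-price"]/span';
$span = $xpath->query($query)->item(0);
$price = $span !== null ? trim($span->textContent) : null;
echo $price, "\n";
```

The predicate on section plays the role of "go up to the common ancestor", so no explicit parent() step is needed.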

Web scraper with DOMDocument

I'm trying to scrape a web page for content, using file_get_contents to grab the HTML and then using a DOMDocument object. My problem is that I cannot get the appropriate information. I'm not sure if this is because I'm using DOMDocument's methods wrong, or if the (X)HTML in my source is just poor.
In the source, there is an element with an id of 'cards', which has two child divs. I want the first child, which has many child divs, each of which has an anchor child containing a div. I want the href from the anchor and the nodeValue from its child div.
The structure is like this:
<div id="cards">
<div class="grid">
<div class="card-wrap">
<a href="linkValue">
<img src="..."/>
<div>nameValue</div>
</a>
</div>
...
</div>
<div id="...">
</div>
</div>
I've started out with $cards = $dom->getElementById("cards"). I get a DOMText Object, a DOMElement Object, a DOMText Object, a DOMElement Object, and a DOMText Object. I then use $grid = $cards->childNodes->item(1) to get the first DOMElement Object, which is presumably the .grid element. However, when I then iterate through the $grid with:
foreach($grid->childNodes as $item){
if($item->nodeName == "div"){
echo $item->nodeName,' | ',$item->nodeValue,'<br>';
}
}
I end up with a page full of "div | nameValue" where nameValue is the embedded div's nodeValue, and I am unable to locate the anchors to get their href value.
Am I doing something obviously wrong with my DOMDocument, or perhaps there is something more going on here?
Well, from your example code, if($item->nodeName == "div"){ is going to exclude any <a> tag. Additionally, childNodes only contains direct children, so iterating over it does not recurse into descendants.
Therefore, to access the nodes in question, you could use:
$children = $dom->getElementById("cards")->childNodes
->item(1)->childNodes->item(1)->childNodes;
Yet, as you can see this is very messy... Introducing XPath:
http://php.net/manual/en/class.domxpath.php
http://www.w3schools.com/xpath/xpath_syntax.asp
The XPath way:
$src = <<<EOS
<div id="cards">
<div class="grid">
<div class="card-wrap">
<a href="linkValue">
<img src="..."/>
<div>nameValue</div>
</a>
</div>
</div>
<div id="whatever">
</div>
</div>
EOS;
$xml = new SimpleXMLElement($src);
list ($anchor) = $xml->xpath('//div[@id="cards"]/div[1]/div[1]/a');
echo $anchor->div, ' => ', $anchor['href'], PHP_EOL;
"Get anchor of first child div of first child div of div with an id of 'cards'"
Output:
nameValue => linkValue
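
Since the question works with DOMDocument rather than SimpleXML, here is a comparable sketch using DOMDocument::loadHTML with DOMXPath, extended to collect every card instead of only the first. The sample values (link1, name1, and so on) are placeholders, and a real scraped page may need libxml_use_internal_errors(true) to silence parser warnings.

```php
<?php
// Assumed sample markup following the structure described in the question.
$html = <<<'HTML'
<div id="cards">
<div class="grid">
<div class="card-wrap"><a href="link1"><img src="a.png"/><div>name1</div></a></div>
<div class="card-wrap"><a href="link2"><img src="b.png"/><div>name2</div></a></div>
</div>
<div id="other"></div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$links = [];
// Anchors inside every child div of the first child div of #cards.
foreach ($xpath->query('//div[@id="cards"]/div[1]/div/a') as $a) {
    // string(div) is evaluated relative to the current anchor.
    $name = $xpath->evaluate('string(div)', $a);
    $links[$name] = $a->getAttribute('href');
}
print_r($links);
```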

Retrieving (relating) two separate tags/attributes using a single XPath query?

I am running XPath queries against a DOMDocument. The general pattern of the document is as follows:
<h2> Title info </h2>
<div> .... </div>
<p> ...</p>
<div class = format_text>
<p>
<img src = "http://sourceofimageOnline.com">
</p>
</div>
<h2> 2nd title</h2>
<div> .... </div>
<p> ...</p>
<div class = format_text>
<p>
<img src = "http://sourceofimageOnline.com"></img>
<img src = "http://sourceofimageonline.com"></img>
</p>
</div>
The key is to return the titles and the src attribute for images that are hyperlinks.
Essentially, I render it as :
Title 1
Img URI 1
Title 2
Img URI 2
Img URI 3
...
..
Now the Titles can be easily retrieved using
DOMDocument->getElementsByTagName('h2')
And the img src are retrieved by an XPATH query:
//div[@class = "format_text"]/p/a/img/@src
This returns all the information I need. However, I am being challenged by trying to get the img src's relate to the titles they fall under. Since they are retrieved independently, I am unable to comprehend what kind of Xpath query I need to execute to retrieve both such that the above constraint is satisfied.
Fetch the h2 elements with the XPath expression /html/body//h2.
Iterate over the resulting list and evaluate a second XPath expression for each entry.
In that second expression, refer to the current h2 with . and reach the link with
./../div[@class='format_text']/p/a[$counter]/img
where $counter is the index of the current h2 in the list.
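
One possible way to tie each title to its images is sketched below. It is a variation on the counter idea rather than the exact expression above: for each h2 it selects the following format_text divs that have exactly that many h2 siblings before them. It assumes every h2 and its div are siblings under the same parent, and the example.com URLs and titles are made up for illustration.

```php
<?php
// Assumed, cleaned-up sample following the pattern in the question.
$html = <<<'HTML'
<html><body>
<h2>Title 1</h2>
<div class="format_text"><p><img src="http://example.com/1.png"></p></div>
<h2>Title 2</h2>
<div class="format_text"><p>
<img src="http://example.com/2.png">
<img src="http://example.com/3.png">
</p></div>
</body></html>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$result = [];
$position = 0;
foreach ($xpath->query('/html/body//h2') as $h2) {
    $position++;
    // Divs after this h2 that have exactly $position h2 siblings before them,
    // i.e. divs that sit between this h2 and the next one.
    $query = 'following-sibling::div[@class="format_text"]'
           . '[count(preceding-sibling::h2) = ' . $position . ']//img/@src';
    $srcs = [];
    foreach ($xpath->query($query, $h2) as $src) {
        $srcs[] = $src->nodeValue;
    }
    $result[trim($h2->textContent)] = $srcs;
}
print_r($result);
```

The second argument of DOMXPath::query sets the context node, which is what lets the relative expression start from the current h2.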
