Recursive context nodes for xpath->query - php

Basically, what I'm trying to achieve is replacing the value of the src attribute of a bunch of img nodes with the value of the corresponding data-src attribute, in a page like the following one.
<html>
<body>
<div id="a">
<img src="" data-src="myValue" />
<img src="" data-src="myValue2" />
</div>
<img src="" data-src="myValue" />
</body>
</html>
I want to do this by finding a common base node (in this case the img nodes in the div with id a) and, relative to that node, locating
the node containing the value to copy and
the node receiving the value
Script
<?PHP
$html = '<html><body><div id="a"><img src="" data-src="myValue"/><img src="" data-src="myValue2"/></div><img src="" data-src="myValue"/></body></html>';
$doc = new DOMDocument();
@$doc->loadHTML($html);
$basenode = false;
$xpath = new DOMXPath($doc);
$entries = $xpath->query('(//div[@id="a"])');
if ($entries->length > 0) $basenode = $entries->item(0);
if ($basenode) {
$img = $xpath->query('//img', $basenode);
foreach ($img as $curImg) {
$from = $xpath->query('//@data-src', $curImg);
$to = $xpath->query('//@src', $curImg);
$to->item(0)->value = $from->item(0)->value;
}
echo $doc->saveXML();
}
?>
Expected output
<html>
<body>
<div id="a">
<img src="myValue" data-src="myValue" />
<img src="myValue2" data-src="myValue2" />
</div>
<img src="" data-src="myValue" />
</body>
</html>
Actual output
<html>
<body>
<div id="a">
<img src="myValue" data-src="myValue" />
<img src="" data-src="myValue2" />
</div>
<img src="" data-src="myValue" />
</body>
</html>
So, the line
$from = $xpath->query('//@data-src', $curImg);
seems to actually base its search on the root node and not the img-node selected before. How can I solve this?
(I know that a possible workaround would be to skip selecting the img nodes explicitly and use something like from='//div[@id="a"]/img/@data-src' and to='//div[@id="a"]/img/@src', but I'm a bit concerned that I might end up copying values between attributes of different nodes.)

A / at the beginning (which includes //) specifies an absolute location path (i.e., one evaluated from the document root), so the search starts at the root no matter which context node you pass. Instead, you want to use a relative one (relative to the context node).
For example: .//@data-src, or descendant::img/@data-src, and so on.
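Applied to the script from the question, the relative paths could look like the following minimal sketch. It assumes the same sample markup; because src and data-src sit directly on each img, the attribute is addressed as @src and @data-src from the img context node, and the img nodes are found with .//img relative to the div.

```php
<?php
$html = '<html><body><div id="a"><img src="" data-src="myValue"/><img src="" data-src="myValue2"/></div><img src="" data-src="myValue"/></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// The base node is looked up with an absolute path on purpose.
$basenode = $xpath->query('//div[@id="a"]')->item(0);

if ($basenode !== null) {
    // ".//img" is relative: only img nodes below the base node match.
    foreach ($xpath->query('.//img', $basenode) as $curImg) {
        // "@data-src" and "@src" are relative to the current img node.
        $from = $xpath->query('@data-src', $curImg)->item(0);
        $to   = $xpath->query('@src', $curImg)->item(0);
        if ($from !== null && $to !== null) {
            $to->value = $from->value;
        }
    }
}

echo $doc->saveHTML();
```

The img outside the div keeps its empty src, because it is never reached from the base node.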

Related

How to get data attribute value?

I have a url within a data-attribute and I need to get the first one:
<div class="carousel-cell">
<img onerror="this.parentNode.removeChild(this);" class="carousel-cell-image" data-flickity-lazyload="http://esportareinsvizzera.com/site/wp-content/uploads/8.jpg">
</div>
<div class="carousel-cell">
<img onerror="this.parentNode.removeChild(this);" class="carousel-cell-image" data-flickity-lazyload="http://www.finanziamentiprestitimutui.com/wp-content/uploads/2014/09/esportazioni-finanziamento-credito.jpg">
</div>
<div class="carousel-cell">
<img onerror="this.parentNode.removeChild(this);" class="carousel-cell-image" data-flickity-lazyload="http://www.infologis.biz/wp-content/uploads/2013/09/Export.jpg">
</div>
<div class="carousel-cell">
<img onerror="this.parentNode.removeChild(this);" class="carousel-cell-image" data-flickity-lazyload="http://www.cigarettespedia.com/images/2/25/Esportazione_horizontal_name_ks_20_s_green_italy.jpg">
</div>
I have been reading lots of answers like this one and this one but I am not a php guy.
I was using this to get the first img, but now I need the actual data attribute value instead:
<?php
$custom_image = usp_get_meta(false, 'usp-custom-4');
$custom_image = htmlspecialchars_decode($custom_image);
$custom_image = nl2br($custom_image);
$custom_image = preg_replace('/<br \/>/iU', '', $custom_image);
preg_match('/<img.+src=[\'"](?P<src>.+?)[\'"].*>/i',$custom_image, $image);
?>
<img src="<?php echo $image['src']; ?>" alt="<?php the_title(); ?>">
Use DOMDocument to parse the HTML, get the elements corresponding to img tags and get the data-flickity-lazyload attribute of the first img tag:
...
$DOM = new DOMDocument;
$DOM->loadHTML($custom_image);
$items = $DOM->getElementsByTagName('img');
$mySrc = $items->item(0)->getAttribute('data-flickity-lazyload');
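A self-contained variant of the same idea, as a sketch only: the sample string below is a hypothetical stand-in for whatever usp_get_meta() returns, and the example.com URLs are placeholders. It selects the first img that actually carries the attribute and falls back to an empty string when none is found, so a missing image does not cause an error.

```php
<?php
// Hypothetical stand-in for the markup stored in the custom field.
$custom_image = '<div class="carousel-cell"><img class="carousel-cell-image" data-flickity-lazyload="http://example.com/8.jpg"></div>'
              . '<div class="carousel-cell"><img class="carousel-cell-image" data-flickity-lazyload="http://example.com/export.jpg"></div>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($custom_image);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// First img in document order that has the data attribute.
$first = $xpath->query('(//img[@data-flickity-lazyload])[1]')->item(0);
$mySrc = $first !== null ? $first->getAttribute('data-flickity-lazyload') : '';

echo $mySrc;
```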

Edit and manipulate class and data-attributes of DOM elements with PHP

I have some HTML snippets retrieved through PHP/JSON such as:
<div>
<p>Some Text</p>
<img src="example.jpg" />
<img src="example2.jpg" />
<img src="example3.jpg" />
</div>
I am loading it with DOMDocument() and xpath and would like to be able to manipulate it so I can add lazy loading to the images like so:
<div>
<p>Some Text</p>
<img class="lazy" src="blank.gif" data-src="example.jpg" />
<img class="lazy" src="blank.gif" data-src="example2.jpg" />
<img class="lazy" src="blank.gif" data-src="example3.jpg" />
</div>
Which entails:
Add class .lazy
Add data-src attribute from original src attribute
Modify src attribute to blank.gif
I am trying the following but it isn't working:
foreach ($xpath->query("//img") as $node) {
$node->setAttribute( "class", $node->getAttribute("class")." lazy");
$node->setAttribute( "data-src", $node->getAttribute("src"));
$node->setAttribute( "src", "./inc/image/blank.gif");
}
Are you sure? The following works for me.
<?php
$html = <<<EOQ
<div>
<p>Some Text</p>
<img src="example.jpg" />
<img src="example2.jpg" />
<img src="example3.jpg" />
</div>
EOQ;
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//img') as $node) {
$node->setAttribute('class', $node->getAttribute('class') . ' lazy');
$node->setAttribute( "data-src", $node->getAttribute("src"));
$node->setAttribute( "src", "./inc/image/blank.gif");
}
echo $dom->saveHTML();

xpath not returning text if p tag is followed by any other tag

I want to get all the text inside the <p> and <h3> tags for the following HTML:
<div class="bodyText">
<p>
<div class="articleBox articleSmallHorizontal channel-32333770 articleBoxBordered alignRight">
<div class="one">
<img src="url" alt="bar" class="img" width="80" height="60" />
</div>
<div class="two">
<h4 class="preTitle">QIEZ-Lieblinge</h4>
<h3 class="title"><a href="url" title="ABC" onclick="cmsTracking.trackClickOut({element:this, channel : 32333770, channelname : 'top_listen', content : 14832081, callTemplate : '_htmltagging.Text', action : 'click', mouseevent : event});">
Prominente Gastronomen </a></h3>
<span class="postTitle"></span>
<span class="district">Berlin</span> </div>
<div class="clear"></div>
</div>
I want this TEXT</p>
<h3>I want this TEXT</h3>
<p>I want this TEXT</p>
<p>
<div class="inlineImage alignLeft">
<div class="medium">
<img src="http://images03.qiez.de/Restaurant+%C3%96_QIEZ.jpg/280x210/0/167.231.886/167.231.798" width="280" height="210" alt="Schöne Lage: das Restaurant Ø. (c)QIEZ"/>
<span class="caption">
Schöne Lage: das Restaurant Ø. (c)QIEZ </span>
</div>
</div>I want this TEXT</p>
<p>I want this TEXT</p>
<p>I want this TEXT<br /> </p>
<blockquote><img src="url" alt="" width="68" height="68" />
"Eigentlich nur drei Worte: Ich komme wieder."<span class="author">Tina Gerstung</span></blockquote>
<div class="clear"></div>
</div>
I want all of the "I want this TEXT" strings. I used this XPath query:
//div[contains(@class,'bodyText')]/*[local-name()='p' or local-name()='h3']
but it does not give me the text if the <p> tag is followed by any other tag.
It looks like you have div elements contained within your p element, which is not valid and is messing things up. If you use a var_dump in the loop you can see that it does actually pick up the node but the nodeValue is empty.
A quick and dirty fix to your html would be to wrap the first div that is contained in the p element in a span.
<span><div class="articleBox articleSmallHorizontal channel-32333770 articleBoxBordered alignRight">...</div></span>
A better fix would be to put the div element outside the paragraph.
If you use the dirty workaround you will need to change your query like so:
$xpath->query("//div[contains(@class,'bodyText')]/*[local-name()='p' or local-name()='h3']/text()");
If you do not have control of the source HTML, you can make a copy of the HTML and remove the offending divs:
$nodes = $xpath->query("//div[contains(@class,'articleBox')]");
$node = $nodes->item(0);
$node->parentNode->removeChild($node);
It might be easier to work with simple_html_dom. Maybe you can try this:
include('simple_html_dom.php');
$dom = new simple_html_dom();
$dom->load($html);
foreach($dom->find("div[class=bodyText]") as $parent) {
foreach($parent->children() as $child) {
if ($child->tag == 'p' || $child->tag == 'h3') {
// remove the inner text of divs contained within a p element
foreach($dom->find('div') as $e)
$e->innertext = '';
echo $child->plaintext . '<br>';
}
}
}
This is mixed content. Depending on what defines the position of the element, you can use a number of factors. In this case, simply selecting all the text nodes will probably be sufficient:
//div[contains(@class, 'bodyText')]/(p | h3)/text()
If the union operator within a location path is not allowed in your processor, then you can use your syntax as before or, a little simpler in my opinion:
//div[contains(@class, 'bodyText')]/*[local-name() = ('p', 'h3')]/text()
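Note that PHP's DOMXPath implements XPath 1.0, where neither a parenthesized union as a path step nor a comparison against a sequence such as ('p', 'h3') is available. A form that XPath 1.0 accepts is *[self::p or self::h3]. The sketch below uses a simplified, valid stand-in for the markup in the question (an assumption, not the original page) and keeps only the direct text nodes of each p and h3.

```php
<?php
// Simplified, valid stand-in for the markup in the question (assumption).
$html = '<div class="bodyText"><p>first</p><h3>second</h3>'
      . '<p><span>skipped</span>third</p></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// XPath 1.0: select p or h3 children, then their direct text nodes only.
$query = "//div[contains(@class, 'bodyText')]/*[self::p or self::h3]/text()";

$texts = [];
foreach ($xpath->query($query) as $textNode) {
    $value = trim($textNode->nodeValue);
    if ($value !== '') {
        $texts[] = $value; // ignore whitespace-only text nodes
    }
}

print_r($texts);
```

Text inside nested elements (the span here) is not returned, because text() only matches text nodes that are direct children of the p or h3.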

php remove tags before a specified tag

I want to remove all image-tags before the headline starts, but they are not nested the same way. And then remove the empty tags.
<div class="c2">
<img src="image/file" width="480" height="360" alt="Image" />
</div>
<div class="c2">
<div class="headline">
headline
</div>
<div class="headline">
headline2
</div>
</div>
and different nested tags like
<div class="c2">
<p>
<img src="image/A.JPG" width="480" height="319" alt="Image" />
</p>
<div class="headline">
A headline
</div>
</div>
I think that could be solved recursively, but I don't know how.
Thanks for your help!
EDIT: if you want to remove only <img> followed by <div><div class="headline"> or <div class="headline">, use this xpath:
$imgs = $xpath->query("//img[../following-sibling::div[1]/div/#class='headline' or ../following-sibling::div[1]/#class='headline']");
see it working: http://codepad.viper-7.com/QhprLP
Do it like this:
$doc = new DOMDocument();
$doc->loadHTML($x); // assuming HTML in $x
$xpath = new DOMXpath($doc);
$imgs = $xpath->query("//img"); // select all <img> nodes
foreach ($imgs as $img) { // loop through list of all <img> nodes
$parent = $img->parentNode;
$parent->removeChild($img); // delete <img> node
if ($parent->getElementsByTagName('*')->length === 0 && trim($parent->textContent) === '') // if parent node of <img> is now empty (ignoring whitespace), delete it
$parent->parentNode->removeChild($parent);
}
echo htmlentities($doc->saveHTML()); // display the new HTML
see it working: http://codepad.viper-7.com/350Hw6
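If only the images that appear before the first headline should go, regardless of how deeply they are nested, the preceding axis is one option. The following is a sketch under assumptions: the sample markup and the second image (image/B.JPG) are made up for illustration, and "empty" is taken to mean no child elements and no non-whitespace text.

```php
<?php
// Made-up sample: one image before the headline, one after it.
$html = '<div class="c2"><p><img src="image/A.JPG" alt="Image"></p>'
      . '<div class="headline">A headline</div>'
      . '<img src="image/B.JPG" alt="Kept"></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// All img nodes that precede the first headline in document order.
$imgs = $xpath->query('(//div[@class="headline"])[1]/preceding::img');

foreach ($imgs as $img) {
    $parent = $img->parentNode;
    $parent->removeChild($img);

    // Walk upward and delete wrappers that are left with nothing in them.
    while ($parent instanceof DOMElement
        && $parent->nodeName !== 'body'
        && $parent->getElementsByTagName('*')->length === 0
        && trim($parent->textContent) === '') {
        $next = $parent->parentNode;
        $next->removeChild($parent);
        $parent = $next;
    }
}

echo $doc->saveHTML();
```

The upward walk is what handles the different nesting depths: it stops as soon as a wrapper still contains an element or visible text, such as the div that holds the headline.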

Echoing html file using php

<html>
<body>
<?php
$html='<html>
<body>
<p>
Illustrating wget -r , with images too
</p>
<p>
This is the first image
<img src="abc.JPG" alt="First Captured Image"/>
</p>
<p>
This is the second image
<img src="def.JPG" alt="Second Captured Image"/>
</p>
</body>
</html>';
echo($html);
$dom = new domDocument;
@$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
echo $image->getAttribute('src');
echo $image->getAttribute('alt');
echo "<img sr='$image'/>";
}
?>
</body>
</html>
My code is to parse the images in the html code above and then use php to echo it again.
This is my sample code. How do I echo the above html code, using php, and display in the same way?
I tried doing this, but I can't place each image in exactly the same position as where I retrieved it. I don't want to use "header(contents)".
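Your statement echo "<img sr='$image'/>"; is incorrect: the attribute name is misspelled (sr instead of src), and $image is a DOMElement, which cannot be placed in a string directly. Output its src attribute instead:
echo "<img src='" . $image->getAttribute('src') . "'/>";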
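To keep every image exactly where it was, one option is to avoid rebuilding the tags by hand: read what you need from the parsed document and then print the document itself. This is a minimal sketch assuming a shortened version of the HTML from the question; the $found array is only there to show the values that were read.

```php
<?php
// Shortened version of the markup from the question.
$html = '<html><body>'
      . '<p>This is the first image <img src="abc.JPG" alt="First Captured Image"/></p>'
      . '<p>This is the second image <img src="def.JPG" alt="Second Captured Image"/></p>'
      . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// Read the attributes without changing the position of any node.
$found = [];
foreach ($dom->getElementsByTagName('img') as $image) {
    $found[] = [
        'src' => $image->getAttribute('src'),
        'alt' => $image->getAttribute('alt'),
    ];
}

// Serializing the document keeps text and images in their original order.
echo $dom->saveHTML();
```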
