Hi, I would like to remove all the HTML code inside a parent element identified by its id or class.
<?php
$html = '<div class="m-interstitial"><div class="m-interstitial">
<div class="m-interstitial__ad" data-readmore-target="">
<div class="m-block-ad" data-tms-ad-type="box" data-tms-ad-status="idle" data-tms-ad-pos="1">
<div class="m-block-ad__label m-block-ad__label--report-enabled"><span class="m-block-ad__label__text">Advertising</span> <button class="m-block-ad__label__report-link" title="Report this ad" data-tms-ad-report=""> </button></div>
<div class="m-block-ad__content"> </div>
</div>
</div>
<button class="m-interstitial__unlock-btn" data-readmore-unlocker=""> <span class="m-interstitial__unlock-btn__text">Read more</span>
</button></div>';
// I tried the code below, but it does not work
//$remove = preg_replace('#<div class="m-interstitial">(.*?)</div>#', '', $html);
$remove = preg_replace('#<div class="m-interstitial">(.*?)</div>#s', '', $html);
var_dump($remove); // I expect an empty string "", but that is not what I get.
My preg_replace does not work the way I want. Any ideas?
Thank you.
Based on your code example, why don't you just set $html = ''; if that is what you want? If you have differing HTML, then use XPath to find matches:
<?php
$html = '<div class="m-interstitial">
<div class="m-interstitial">
<div class="m-interstitial__ad" data-readmore-target="">
<div class="m-block-ad" data-tms-ad-type="box" data-tms-ad-status="idle" data-tms-ad-pos="1">
<div class="m-block-ad__label m-block-ad__label--report-enabled"><span class="m-block-ad__label__text">Advertising</span> <button class="m-block-ad__label__report-link" title="Report this ad" data-tms-ad-report=""> </button></div>
<div class="m-block-ad__content"> </div>
</div>
</div>
<button class="m-interstitial__unlock-btn" data-readmore-unlocker=""> <span class="m-interstitial__unlock-btn__text">Read more</span></button>
</div>';
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->omitXmlDeclaration = true;
$dom->preserveWhiteSpace = false;
$dom->validateOnParse = false;
$dom->strictErrorChecking = false;
$dom->formatOutput = false;
$dom->loadHTML('<?xml encoding="utf-8" ?>'.$html);
libxml_clear_errors();
libxml_use_internal_errors(false);
$xpath = new DOMXPath($dom);
$child = $xpath->query("(//div[@class='m-interstitial'])[1]");
$parent = $child[0]->parentNode;
$parent->removeChild($child[0]);
echo $dom->saveXML($dom->documentElement);
I am not 100% sure this is what you want to do, but in theory this is how XPath/DOM would be used.
The result is an empty HTML document (since you want to filter out the parent or root element of your HTML):
<html><body/></html>
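When the page contains more than the ad block, every matching element can be removed in one pass. The following is a minimal sketch, not the answerer's code: the sample markup, the wrapper id "page", and the extra class are assumptions made for illustration. It matches the class as a whole token, so elements that only carry "m-interstitial__ad" are not selected by themselves.

```php
<?php
// Hypothetical sample markup; only the class names come from the question.
$html = '<div id="page"><p>Article text</p>'
      . '<div class="m-interstitial extra"><div class="m-interstitial__ad">Ad</div></div>'
      . '<p>More text</p></div>';

libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Match "m-interstitial" as a whole class token, so "m-interstitial__ad" alone does not match.
$query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' m-interstitial ')]";

foreach ($xpath->query($query) as $node) {
    if ($node->parentNode !== null) {
        $node->parentNode->removeChild($node);   // drops the element and everything inside it
    }
}

$result = $dom->saveHTML($dom->documentElement);
echo $result, "\n";
```

The two paragraphs survive and the whole interstitial subtree is gone. The LIBXML_HTML_NOIMPLIED flag is used here only because the sample has a single root element.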
I did almost the same thing, but yours seems better:
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXpath($doc);
$styles = $xpath->query('//div[@class="m-interstitial"]');
if ($styles) {
foreach ($styles as $style) {
$style->textContent = "";
}
}
$html = $doc->saveHTML();
var_dump($html );
Related
I know nothing, ZERO, about xpath or DOM.
In the end I need the href value and the content of the span from 12 H2 tags on the page. I have figured out how to get each item individually but getting them all in one shot isn't clicking, no matter how much I read. A little help?
<h2 class="make-it-pretty">
<a class="more-pretty" href="some-file-somewhere">
<span class="another-class">Product Name</span>
</a>
</h2>
Here is what I use to get them individually.
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$htext = $xpath->query('//h2[contains(@class, "make-it-pretty")]')->item(0);
echo $htext->textContent;
I would probably use $doc->loadHTMLFile instead, but:
<?php
$html = '<html lang="en"><head><meta charset="UTF-8" /><title>Title Here</title></head>
<body>
<h2 class="make-it-pretty"><a class="more-pretty" href="some-file-somewhere"><span class="another-class">Product Name</span></a></h2>
</body></html>';
$doc = new DOMDocument(); $doc->loadHTML($html);
function getElementsByClassName($className, $withinNode = null){
global $doc;
$d = $withinNode ?? $doc;
$r = []; $a = $d->getElementsByTagName('*');
foreach($a as $n){
if($n->getAttribute('class') === $className)$r[] = $n;
}
return $r;
}
$anotherClass = getElementsByClassName('another-class');
// getElementsByClassName('make-it-pretty'); works as well, in this case
echo $anotherClass[0]->textContent;
?>
Try this without XPath:
<?php
$html ='<h2 class="make-it-pretty"> <a class="more-pretty" href="some-file-somewhere"> <span class="another-class">Product Name</span> </a> </h2><h2 class="make-it-pretty"> <a class="more-pretty" href="some-file-somewhere"> <span class="another-class">Product Name</span> </a> </h2><h2 class="make-it-pretty"> <a class="more-pretty" href="some-file-somewhere"> <span class="another-class">Product Name</span> </a> </h2>';
$dom = new DOMDocument("1.0", "utf-8");
if($dom->loadHTML($html, LIBXML_NOWARNING)){
$h2s = $dom->getElementsByTagName('h2');
foreach ($h2s as $h2) {
$as = $h2->getElementsByTagName('a');
echo '<pre>';
//print_r($as);
foreach($as as $a){
print_r('link :'.$a->getAttribute('href')."\n");
// read the spans of this link before moving on to the next one
foreach($a->getElementsByTagName('span') as $span){
print_r('content :'.$span->nodeValue."\n");
}
}
}
}
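Since the question asked for all the values in one shot, the same result can also be collected with a single XPath query. This is only a sketch under assumptions: the two sample headings, their href values, and the product names are made up, and the $products array is a hypothetical name.

```php
<?php
// Hypothetical sample: two headings shaped like the one in the question.
$html = '<h2 class="make-it-pretty"><a class="more-pretty" href="file-1"><span class="another-class">Product A</span></a></h2>'
      . '<h2 class="make-it-pretty"><a class="more-pretty" href="file-2"><span class="another-class">Product B</span></a></h2>';

libxml_use_internal_errors(true);   // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$products = [];

// Select every link inside a matching h2, then read its href and first span.
foreach ($xpath->query('//h2[contains(@class, "make-it-pretty")]/a[@href]') as $a) {
    $span = $xpath->query('.//span', $a)->item(0);   // query relative to this link
    $products[] = [
        'href' => $a->getAttribute('href'),
        'name' => $span !== null ? trim($span->textContent) : '',
    ];
}

print_r($products);
```

Passing the link as the second argument to query() keeps the span lookup scoped to that link, so each href stays paired with its own product name.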
Trying to extract the value "Output" between the span tags only if the title is "ABCD (1,2)", using PHP. Basically, find the title and extract "Output".
Here is the section of html:
<div class="wrap">
<strong title="ABCD (1,2)" class="name">ABCD (1,2):</strong>
<div id="test1">
<div class="testclass" id="test2">
<span>Output</span>
</div>
</div>
</div>
Here is the code I like to use:
<?php
$html = file_get_contents('test.html');
$dom = new DOMDocument;
@$dom->loadHTML($html);
//Some code needs to go here!
$tags = $dom->getElementsByTagName('strong');
?>
One way would be to use XPath with a query that selects the desired element: find the strong element that has that title, take the div that follows it, and go down to the span inside it.
Example (using the markup above):
$html = '
<div class="wrap">
<strong title="ABCD (1,2)" class="name">ABCD (1,2):</strong>
<div id="test1">
<div class="testclass" id="test2">
<span>Output</span>
</div>
</div>
</div>
';
$search_string = 'ABCD (1,2)';
$dom = new DOMDocument;
@$dom->loadHTML($html);
$query = "//strong[@title = '{$search_string}']/following-sibling::div/div/span";
$xpath = new DOMXpath($dom);
$result = $xpath->query($query);
if($result->length > 0) {
echo $result->item(0)->nodeValue;
}
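One thing to watch with an interpolated search string: XPath 1.0 string literals have no escape character, so a title containing a single quote would break a query built this way. A possible workaround is sketched below. The helper name xpathLiteral() is hypothetical (it is not part of PHP), and the sample title is an assumption.

```php
<?php
// Hypothetical helper: turn any PHP string into a valid XPath 1.0 string literal.
function xpathLiteral(string $value): string
{
    if (strpos($value, "'") === false) {
        return "'" . $value . "'";
    }
    if (strpos($value, '"') === false) {
        return '"' . $value . '"';
    }
    // Both quote types present: split on single quotes and glue the parts with concat().
    return "concat('" . implode("', \"'\", '", explode("'", $value)) . "')";
}

// Sample markup with a title that contains a single quote (an assumption for this sketch).
$html = '<div class="wrap"><strong title="It\'s (1,2)" class="name">x</strong>'
      . '<div id="test1"><div class="testclass" id="test2"><span>Output</span></div></div></div>';

libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$search = "It's (1,2)";
$query  = '//strong[@title = ' . xpathLiteral($search) . ']/following-sibling::div/div/span';

$xpath  = new DOMXPath($dom);
$result = $xpath->query($query);
$value  = $result->length > 0 ? $result->item(0)->nodeValue : null;

var_dump($value);
```

For titles without quotes, such as the one in the question, the helper simply wraps the value in single quotes, so the query is the same as the interpolated one.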
I have PHP code which removes all nodes that have at least one attribute. Here is my code:
<?php
$data = <<<DATA
<div>
<p>These line shall stay</p>
<p class="myclass">Remove this one</p>
<p>But keep this</p>
<div style="color: red">and this</div>
</div>
DATA;
$dom = new DOMDocument();
$dom->loadHTML($data, LIBXML_HTML_NOIMPLIED);
$dom->removeChild($dom->doctype);
$xpath = new DOMXPath($dom);
$lines_to_be_removed = $xpath->query("//*[count(@*)>0]");
foreach ($lines_to_be_removed as $line) {
$line->parentNode->removeChild($line);
}
// just to check
echo $dom->saveHTML();
?>
As you see in the fiddle, this is the current output of code above:
<div>
<p>These line shall stay</p>
<p>But keep this</p>
</div>
While this is desired result:
<div>
<p>These line shall stay</p>
Remove this one
<p>But keep this</p>
and this
</div>
How can I do that?
Before removing each element, move its child nodes out and insert them right after it, so they end up in the place the element occupied.
Example:
$data = <<<DATA
<div>
<p>These line shall stay</p>
<p class="myclass">Remove this one</p>
<p>But keep this</p>
<div style="color: red">and this</div>
<div style="color: red">and <p>also</p> this</div>
<div style="color: red">and this <div style="color: red">too</div></div>
</div>
DATA;
$dom = new DOMDocument();
$dom->loadHTML($data, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
foreach ($xpath->query("//*[@*]") as $node) {
$parent = $node->parentNode;
while ($node->hasChildNodes()) {
$parent->insertBefore($node->lastChild, $node->nextSibling);
}
$parent->removeChild($node);
}
echo $dom->saveHTML();
Outputs:
<div>
<p>These line shall stay</p>
Remove this one
<p>But keep this</p>
and this
and <p>also</p> this
and this too
</div>
https://3v4l.org/9qHRM
(I added some nested elements to demonstrate the safety of this approach.)
Couple of asides:
You don't need $dom->removeChild($dom->doctype) if you load with the additional LIBXML_HTML_NODEFDTD flag.
Your XPath expression can be simplified to //*[@*]
You could use replaceChild() with the text content of that node:
foreach ($lines_to_be_removed as $line) {
$line->parentNode->replaceChild($dom->createTextNode($line->textContent),$line);
}
// <div>
// <p>These line shall stay</p>
// Remove this one
// <p>But keep this</p>
// and this
// </div>
However, this can be problematic when the // in your XPath selector also matches nested elements, since textContent flattens any markup inside the replaced node.
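To illustrate one limitation of replacing a node with its textContent, here is a small sketch using a made-up input in which the removed element contains a nested paragraph. The input string is an assumption; the loop is the replaceChild() approach described above.

```php
<?php
// Hypothetical input: the styled div contains a nested <p>.
$data = '<div><div style="color: red">and <p>also</p> this</div></div>';

libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($data, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
foreach ($xpath->query('//*[@*]') as $line) {
    // textContent concatenates all descendant text, so the nested <p> tags are lost.
    $line->parentNode->replaceChild($dom->createTextNode($line->textContent), $line);
}

$out = $dom->saveHTML();
echo $out;
```

The text "and also this" is kept, but the paragraph element around "also" does not survive, which may or may not be acceptable depending on the markup being processed.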
A more manual approach is to copy the child nodes of each target node into its parent node:
$data = '
<div>
<div>1A</div>
<div class="foo">1B
<div>2C</div>
<div class="foo">2D</div>
<div>2E</div>
<div class="foo">2F
<div>3G</div>
<div class="foo">3H</div>
</div>
</div>
</div>';
$dom = new DOMDocument();
$dom->loadHTML($data, LIBXML_HTML_NOIMPLIED);
$dom->removeChild($dom->doctype);
SomeFunctionName( $dom->documentElement );
$html = $dom->saveHTML();
function SomeFunctionName( $parent )
{
$nodesToDelete = array();
if( $parent->hasChildNodes() )
{
foreach( $parent->childNodes as $node )
{
SomeFunctionName( $node );
if( $node->hasAttributes() and count( $node->attributes ) > 0 )
{
foreach( $node->childNodes as $childNode )
{
$node->parentNode->insertBefore( clone $childNode, $node );
}
$nodesToDelete[] = $node;
}
}
}
foreach( $nodesToDelete as $delete)
{
$delete->parentNode->removeChild( $delete );
}
}
// <div>
// <div>1A</div>
// 1B
// <div>2C</div>
// 2D
// <div>2E</div>
// 2F
// <div>3G</div>
// 3H
// </div>
If you want to nest the child elements in a new "div" container, swap out this portion of the code:
foreach( $parent->childNodes as $node )
{
SomeFunctionName( $node );
if( $node->hasAttributes() and count( $node->attributes ) > 0 )
{
$newNode = $node->ownerDocument->createElement('div');
foreach( $node->childNodes as $childNode )
{
$newNode->appendChild( clone $childNode );
}
$node->parentNode->insertBefore( $newNode, $node );
$nodesToDelete[] = $node;
}
}
// <div>
// <div>1A</div>
// <div>1B
// <div>2C</div>
// <div>2D</div>
// <div>2E</div>
// <div>2F
// <div>3G</div>
// <div>3H</div>
// </div>
// </div>
// </div>
This will remove all tags that have a class or style attribute, so it's not bulletproof:
<?php
$data = <<<DATA
<div>
<p>These line shall stay</p>
<p class="myclass">Remove this one</p>
<p>But keep this</p>
<div style="color: red">and this</div>
</div>
DATA;
$dom = new DOMDocument();
$dom->loadHTML($data, LIBXML_HTML_NOIMPLIED);
$dom->removeChild($dom->doctype);
$xpath = new DOMXPath($dom);
$lines_to_be_removed = $xpath->query("//*[count(@class)>0 or count(@style)>0]");
foreach ($lines_to_be_removed as $line) {
$line->parentNode->removeChild($line);
}
// just to check
echo $dom->saveHTML();
?>
Note this line:
$lines_to_be_removed = $xpath->query("//*[count(@class)>0 or count(@style)>0]");
My input
<div id='makeme' class='testme'>
<span id='whatspanID'>somthing</span>
<p class='ptagclass'></p>
</div>
My expected output
<div>
<span></span>
<p></p>
</div>
To remove the content inside the tags I can use the snippet below, but how do I remove the attributes from the tags?
$html = str_get_html($str);
foreach($html->find("text") as $ht) {
$ht->innertext = "";
}
$html->save();
Using DOM and Xpath allows you to select text and attribute nodes.
$html = <<<'HTML'
<div id='makeme' class='testme'>
<span id='whatspanID'>somthing</span>
<p class='ptagclass'></p>
</div>
HTML;
$dom = new DOMDocument();
$dom->loadHtml($html);
$xpath = new DOMXpath($dom);
$div = $xpath->evaluate('//div[@id="makeme"]')->item(0);
$nodes = $xpath->evaluate('.//text()|@*|.//*/@*', $div);
foreach ($nodes as $node) {
if ($node instanceof DOMAttr) {
$node->parentNode->removeAttributeNode($node);
} else {
$node->parentNode->removeChild($node);
}
}
echo $dom->saveHtml($div);
Output:
<div>
<span></span><p></p>
</div>
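For comparison, the same stripping can be done without XPath by walking the tree recursively. This is a rough sketch rather than the answerer's method, and the function name stripNode() is hypothetical.

```php
<?php
$html = "<div id='makeme' class='testme'><span id='whatspanID'>somthing</span><p class='ptagclass'></p></div>";

libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
libxml_clear_errors();

// Hypothetical helper: keep the element skeleton, drop text and attributes.
function stripNode(DOMNode $node): void
{
    // Snapshot the children first, because childNodes is a live list.
    foreach (iterator_to_array($node->childNodes) as $child) {
        if ($child instanceof DOMElement) {
            stripNode($child);
        } else {
            $node->removeChild($child);   // text nodes, comments, and so on
        }
    }
    if ($node instanceof DOMElement) {
        // Remove attributes one at a time until none are left.
        while ($node->attributes->length > 0) {
            $node->removeAttributeNode($node->attributes->item(0));
        }
    }
}

stripNode($dom->documentElement);
$out = $dom->saveHTML();
echo $out;
```

The XPath version is shorter. The recursive walk may be easier to extend if some attributes or text nodes should be kept based on custom rules.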
<?php
libxml_use_internal_errors(true);
$html = '
<html>
<body>
<div>
Message <b>bold</b>, <s>strike</s>
</div>
<div>
<span class="how">
Link, <b> BOLD </b>
</span>
</div>
</body>
</html>
';
$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->strictErrorChecking = false;
$dom->recover = true;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$messages = $xpath->query("//div");
foreach($messages as $message)
{
echo $message->nodeValue;
}
This code returns "Message bold, strike Link, BOLD " without html tags...
I want to output the following code:
Message <b>bold</b>, <s>strike</s>
<span class="how">
Link, <b> BOLD </b>
</span>
Can you help me?
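// $dom must be the DOMDocument that $messages was queried from
foreach($messages as $message)
{
echo $dom->saveHTML($message);
}
Use saveHTML() on the same document the nodes came from, passing each node as the argument. Note that this prints each matched div including its own opening and closing tags.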
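If only the markup inside each div is wanted, without the div tags themselves, one option is to serialize the child nodes one by one. The sketch below assumes a shortened version of the question's markup, and the helper name innerHTML() is hypothetical (DOMNode has no such built-in method in the classic DOM extension).

```php
<?php
// Hypothetical helper: serialize the children of a node, not the node itself.
function innerHTML(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML() must be called on the document that owns the node.
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML('<html><body><div>Message <b>bold</b>, <s>strike</s></div></body></html>');
libxml_clear_errors();

$div   = $dom->getElementsByTagName('div')->item(0);
$inner = innerHTML($div);

echo $inner, "\n";
```

Text nodes and inline tags are both kept, because every child node of the div is serialized in document order.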
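I can do it really quickly using SimpleXML (if it's okay for you to switch from DOMDocument and DOMXPath, you may prefer this solution). Note that //div/* selects only the element children of each div, so text sitting directly inside a div is not printed, and simplexml_load_string() requires the markup to be well-formed XML: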
$html = '
<html>
<body>
<div>
Message <b>bold</b>, <s>strike</s>
</div>
<div>
<span class="how">
Link, <b> BOLD </b>
</span>
</div>
</body>
</html>
';
$xml = simplexml_load_string($html);
$arr = $xml->xpath('//div/*');
foreach ($arr as $x) {
echo $x->asXML();
}