Php Remove content html from specific class

Hi, I would like to remove all the HTML inside a parent element, selected by its id or class.
<?php
$html = '<div class="m-interstitial"><div class="m-interstitial">
<div class="m-interstitial__ad" data-readmore-target="">
<div class="m-block-ad" data-tms-ad-type="box" data-tms-ad-status="idle" data-tms-ad-pos="1">
<div class="m-block-ad__label m-block-ad__label--report-enabled"><span class="m-block-ad__label__text">Advertising</span> <button class="m-block-ad__label__report-link" title="Report this ad" data-tms-ad-report=""> </button></div>
<div class="m-block-ad__content"> </div>
</div>
</div>
<button class="m-interstitial__unlock-btn" data-readmore-unlocker=""> <span class="m-interstitial__unlock-btn__text">Read more</span>
</button></div>';
// I tried the code below, but it does not work
//$remove = preg_replace('#<div class="m-interstitial">(.*?)</div>#', '', $html);
$remove = preg_replace('#<div class="m-interstitial">(.*?)</div>#s', '', $html);
var_dump($remove); // I expect an empty string "", but that is not what I get.
My preg_replace does not work as I expected. Any ideas?
Thank you.

Based on your code example, why don't you just set $html = ''; if that is what you want? If you have differing HTML, then use XPath to find matches:
<?php
$html = '<div class="m-interstitial">
<div class="m-interstitial">
<div class="m-interstitial__ad" data-readmore-target="">
<div class="m-block-ad" data-tms-ad-type="box" data-tms-ad-status="idle" data-tms-ad-pos="1">
<div class="m-block-ad__label m-block-ad__label--report-enabled"><span class="m-block-ad__label__text">Advertising</span> <button class="m-block-ad__label__report-link" title="Report this ad" data-tms-ad-report=""> </button></div>
<div class="m-block-ad__content"> </div>
</div>
</div>
<button class="m-interstitial__unlock-btn" data-readmore-unlocker=""> <span class="m-interstitial__unlock-btn__text">Read more</span></button>
</div>';
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->omitXmlDeclaration = true;
$dom->preserveWhiteSpace = false;
$dom->validateOnParse = false;
$dom->strictErrorChecking = false;
$dom->formatOutput = false;
$dom->loadHTML('<?xml encoding="utf-8" ?>'.$html);
libxml_clear_errors();
libxml_use_internal_errors(false);
$xpath = new DOMXPath($dom);
$child = $xpath->query("(//div[@class='m-interstitial'])[1]");
$parent = $child[0]->parentNode;
$parent->removeChild($child[0]);
echo $dom->saveXML($dom->documentElement);
I am not 100% sure this is what you want to do, but this is how XPath/DOM would be used for it.
The result is an empty HTML document, since you are filtering out the root element of your HTML:
<html><body/></html>
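The regex in the question fails because the lazy `(.*?)` stops at the first `</div>` it meets, which closes an inner div rather than the outer one. As a minimal sketch of the DOM approach for HTML that contains more than the one block, the helper below removes every element whose class list contains a given token. The function name `removeByClass` and the wrapper id `fragment-root` are hypothetical names chosen for this example, and the sketch assumes the class name contains no quote characters.

```php
<?php
// Hypothetical helper: removes every element whose class list contains $class
// and returns the remaining HTML fragment.
function removeByClass(string $html, string $class): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    // Wrap the fragment in a known container (assumed id) so it can be found again after parsing.
    $dom->loadHTML('<?xml encoding="utf-8" ?><div id="fragment-root">' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // Match the class as a whole token, so "m-interstitial__ad" does not count as "m-interstitial".
    $query = sprintf(
        '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );
    // Take a snapshot of the matches before modifying the tree.
    foreach (iterator_to_array($xpath->query($query)) as $node) {
        if ($node->parentNode !== null) {
            $node->parentNode->removeChild($node);
        }
    }

    // Serialize only the children of the wrapper, not the html/body scaffolding.
    $root = $xpath->query('//div[@id="fragment-root"]')->item(0);
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$html = '<p>keep</p><div class="m-interstitial"><div class="m-interstitial__ad">ad</div></div>';
echo removeByClass($html, 'm-interstitial'), "\n";
```

Matching the class as a space-delimited token is a design choice: a plain `contains(@class, ...)` would also hit the BEM-style child classes used in this markup.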

I did almost the same thing, but yours seems better:
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXpath($doc);
$styles = $xpath->query('//div[@class="m-interstitial"]');
if ($styles) {
foreach ($styles as $style) {
$style->textContent = "";
}
}
$html = $doc->saveHTML();
var_dump($html );

Related

Get H2 text and href values from inside all H2 tags on the page using xpath?

I know nothing, ZERO, about xpath or DOM.
In the end I need the href value and the content of the span from 12 H2 tags on the page. I have figured out how to get each item individually but getting them all in one shot isn't clicking, no matter how much I read. A little help?
<h2 class="make-it-pretty">
<a class="more-pretty" href="some-file-somewhere">
<span class="another-class">Product Name</span>
</a>
</h2>
Here is what I use to get them individually.
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$htext = $xpath->query('//h2[contains(@class, "make-it-pretty")]')->item(0);
echo $htext->textContent;

I would probably use $doc->loadHTMLFile instead, but:
<?php
$html = '<html lang="en"><head><meta charset="UTF-8" /><title>Title Here</title></head>
<body>
<h2 class="make-it-pretty"><a class="more-pretty" href="some-file-somewhere"><span class="another-class">Product Name</span></a></h2>
</body></html>';
$doc = @new DOMDocument(); $doc->loadHTML($html);
function getElementsByClassName($className, $withinNode = null){
global $doc;
$d = $withinNode ?? $doc;
$r = []; $a = $d->getElementsByTagName('*');
foreach($a as $n){
if($n->getAttribute('class') === $className)$r[] = $n;
}
return $r;
}
$anotherClass = getElementsByClassName('another-class');
// getElementsByClassName('make-it-pretty'); works as well, in this case
echo $anotherClass[0]->textContent;
?>

Try this without XPath:
<?php
$html ='<h2 class="make-it-pretty"> <a class="more-pretty" href="some-file-somewhere"> <span class="another-class">Product Name</span> </a> </h2><h2 class="make-it-pretty"> <a class="more-pretty" href="some-file-somewhere"> <span class="another-class">Product Name</span> </a> </h2><h2 class="make-it-pretty"> <a class="more-pretty" href="some-file-somewhere"> <span class="another-class">Product Name</span> </a> </h2>';
$dom = new DOMDocument("1.0", "utf-8");
if($dom->loadHTML($html, LIBXML_NOWARNING)){
$h2s = $dom->getElementsByTagName('h2');
foreach ($h2s as $h2) {
$as = $h2->getElementsByTagName('a');
echo '<pre>';
//print_r($as);
foreach($as as $a){
print_r('link :'.$a->getAttribute('href')."\n");
// Read the spans inside this link, so each link is paired with its own content
foreach($a->getElementsByTagName('span') as $span){
print_r('content :'.$span->nodeValue."\n");
}
}
}
}
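Since the question asked for XPath, here is a minimal sketch that collects the href and span text of every matching H2 in one pass. The function name `collectH2Links` is hypothetical, and the class name `make-it-pretty` is taken from the question's markup.

```php
<?php
// Hypothetical helper: returns one ['href' => ..., 'text' => ...] pair per matching <h2> link.
function collectH2Links(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $items = [];
    // Select the link inside each h2 whose class attribute contains "make-it-pretty".
    foreach ($xpath->query('//h2[contains(@class, "make-it-pretty")]/a[@href]') as $a) {
        // Evaluate relative to the current link; string() yields '' when there is no span.
        $text = $xpath->evaluate('string(.//span)', $a);
        $items[] = ['href' => $a->getAttribute('href'), 'text' => trim($text)];
    }
    return $items;
}

$html = str_repeat(
    '<h2 class="make-it-pretty"><a class="more-pretty" href="some-file-somewhere">'
    . '<span class="another-class">Product Name</span></a></h2>',
    3
);
print_r(collectH2Links($html));
```

Passing the link as the context node to `evaluate()` keeps each href paired with the span that belongs to it.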

Extracting value of a node after a certain tag

I am trying to extract the value "Output" between the span tags, but only if the title is "ABCD (1,2)", using PHP. Basically, find the span that follows that title and extract "Output".
Here is the section of html:
<div class="wrap">
<strong title="ABCD (1,2)" class="name">ABCD (1,2):</strong>
<div id="test1">
<div class="testclass" id="test2">
<span>Output</span>
</div>
</div>
</div>
Here is the code I like to use:
<?php
$html = file_get_contents('test.html');
$dom = new DOMDocument;
@$dom->loadHTML($html);
//Some code needs to go here!
$tags = $dom->getElementsByTagName('strong');
?>

One way is to use XPath with a query that selects the desired element: find the element that has that title, move to the following sibling div, and descend to the span.
Example (using the markup above):
$html = '
<div class="wrap">
<strong title="ABCD (1,2)" class="name">ABCD (1,2):</strong>
<div id="test1">
<div class="testclass" id="test2">
<span>Output</span>
</div>
</div>
</div>
';
$search_string = 'ABCD (1,2)';
$dom = new DOMDocument;
@$dom->loadHTML($html);
$query = "//strong[@title = '{$search_string}']/following-sibling::div/div/span";
$xpath = new DOMXpath($dom);
$result = $xpath->query($query);
if($result->length > 0) {
echo $result->item(0)->nodeValue;
}

How can I remove DOM element tags but leave their contents?

I have PHP code which removes all nodes that have at least one attribute. Here is my code:
<?php
$data = <<<DATA
<div>
<p>These line shall stay</p>
<p class="myclass">Remove this one</p>
<p>But keep this</p>
<div style="color: red">and this</div>
</div>
DATA;
$dom = new DOMDocument();
$dom->loadHTML($data, LIBXML_HTML_NOIMPLIED);
$dom->removeChild($dom->doctype);
$xpath = new DOMXPath($dom);
$lines_to_be_removed = $xpath->query("//*[count(@*)>0]");
foreach ($lines_to_be_removed as $line) {
$line->parentNode->removeChild($line);
}
// just to check
echo $dom->saveHTML();
?>
As you see in the fiddle, this is the current output of code above:
<div>
<p>These line shall stay</p>
<p>But keep this</p>
</div>
While this is desired result:
<div>
<p>These line shall stay</p>
Remove this one
<p>But keep this</p>
and this
</div>
How can I do that?

Prior to removing each element, you want to pluck out its child nodes and tack them on behind it.
Example:
$data = <<<DATA
<div>
<p>These line shall stay</p>
<p class="myclass">Remove this one</p>
<p>But keep this</p>
<div style="color: red">and this</div>
<div style="color: red">and <p>also</p> this</div>
<div style="color: red">and this <div style="color: red">too</div></div>
</div>
DATA;
$dom = new DOMDocument();
$dom->loadHTML($data, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
foreach ($xpath->query("//*[@*]") as $node) {
$parent = $node->parentNode;
while ($node->hasChildNodes()) {
$parent->insertBefore($node->lastChild, $node->nextSibling);
}
$parent->removeChild($node);
}
echo $dom->saveHTML();
Outputs:
<div>
<p>These line shall stay</p>
Remove this one
<p>But keep this</p>
and this
and <p>also</p> this
and this too
</div>
https://3v4l.org/9qHRM
(I added some nested elements to demonstrate the safety of this approach.)
Couple of asides:
You don't need $dom->removeChild($dom->doctype) if you load with the additional LIBXML_HTML_NODEFDTD flag.
Your xpath expression can be simplified to //*[@*]

You could use replaceChild() with the text content of that node:
foreach ($lines_to_be_removed as $line) {
$line->parentNode->replaceChild($dom->createTextNode($line->textContent),$line);
}
// <div>
// <p>These line shall stay</p>
// Remove this one
// <p>But keep this</p>
// and this
// </div>
However, this may prove problematic with the // notation in your XPath selector and with nested matches.
A more manual approach is to copy the child contents of the target nodes into their parent nodes:
$data = '
<div>
<div>1A</div>
<div class="foo">1B
<div>2C</div>
<div class="foo">2D</div>
<div>2E</div>
<div class="foo">2F
<div>3G</div>
<div class="foo">3H</div>
<div>3I</div>
<div class="foo">3J</div>
</div>
</div>
</div>';
$dom = new DOMDocument();
$dom->loadHTML($data, LIBXML_HTML_NOIMPLIED);
$dom->removeChild($dom->doctype);
SomeFunctionName( $dom->documentElement );
$html = $dom->saveHTML();
function SomeFunctionName( $parent )
{
$nodesToDelete = array();
if( $parent->hasChildNodes() )
{
foreach( $parent->childNodes as $node )
{
SomeFunctionName( $node );
if( $node->hasAttributes() and count( $node->attributes ) > 0 )
{
foreach( $node->childNodes as $childNode )
{
$node->parentNode->insertBefore( clone $childNode, $node );
}
$nodesToDelete[] = $node;
}
}
}
foreach( $nodesToDelete as $delete)
{
$delete->parentNode->removeChild( $delete );
}
}
// <div>
// <div>1A</div>
// 1B
// <div>2C</div>
// 2D
// <div>2E</div>
// 2F
// <div>3G</div>
// 3H
// <div>3I</div>
// 3J
// </div>
If you want to nest the child elements in a new "div" container, swap out this portion of the code:
foreach( $parent->childNodes as $node )
{
SomeFunctionName( $node );
if( $node->hasAttributes() and count( $node->attributes ) > 0 )
{
$newNode = $node->ownerDocument->createElement('div');
foreach( $node->childNodes as $childNode )
{
$newNode->appendChild( clone $childNode );
}
$node->parentNode->insertBefore( $newNode, $node );
$nodesToDelete[] = $node;
}
}
// <div>
// <div>1A</div>
// <div>1B
// <div>2C</div>
// <div>2D</div>
// <div>2E</div>
// <div>2F
// <div>3G</div>
// <div>3H</div>
// <div>3I</div>
// <div>3J</div>
// </div>
// </div>
// </div>

This removes every tag that has a class or style attribute, so it is not bulletproof:
<?php
$data = <<<DATA
<div>
<p>These line shall stay</p>
<p class="myclass">Remove this one</p>
<p>But keep this</p>
<div style="color: red">and this</div>
</div>
DATA;
$dom = new DOMDocument();
$dom->loadHTML($data, LIBXML_HTML_NOIMPLIED);
$dom->removeChild($dom->doctype);
$xpath = new DOMXPath($dom);
$lines_to_be_removed = $xpath->query("//*[count(@class)>0 or count(@style)>0]");
foreach ($lines_to_be_removed as $line) {
$line->parentNode->removeChild($line);
}
// just to check
echo $dom->saveHTML();
?>
Note this line:
$lines_to_be_removed = $xpath->query("//*[count(@class)>0 or count(@style)>0]");

PHP Simple HTML DOM Parser, Remove attributes from the TAG without any specific unique input

My input:
<div id='makeme' class='testme'>
<span id='whatspanID'>somthing</span>
<p class='ptagclass'></p>
</div>
My expected output:
<div>
<span></span>
<p></p>
</div>
To remove the content inside the tags I can use the snippet below, but how do I remove the attributes from the tags?
$html = str_get_html($str);
foreach($html->find("text") as $ht) {
$ht->innertext = "";
}
$html->save();

Using DOM and Xpath allows you to select text and attribute nodes.
$html = <<<'HTML'
<div id='makeme' class='testme'>
<span id='whatspanID'>somthing</span>
<p class='ptagclass'></p>
</div>
HTML;
$dom = new DOMDocument();
$dom->loadHtml($html);
$xpath = new DOMXpath($dom);
$div = $xpath->evaluate('//div[@id="makeme"]')->item(0);
$nodes = $xpath->evaluate('.//text()|@*|.//*/@*', $div);
foreach ($nodes as $node) {
if ($node instanceof DOMAttr) {
$node->parentNode->removeAttributeNode($node);
} else {
$node->parentNode->removeChild($node);
}
}
echo $dom->saveHtml($div);
Output:
<div>
<span></span><p></p>
</div>

PHP XPath. How to return string with html tags?

<?php
libxml_use_internal_errors(true);
$html = '
<html>
<body>
<div>
Message <b>bold</b>, <s>strike</s>
</div>
<div>
<span class="how">
Link, <b> BOLD </b>
</span>
</div>
</body>
</html>
';
$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->strictErrorChecking = false;
$dom->recover = true;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$messages = $xpath->query("//div");
foreach($messages as $message)
{
echo $message->nodeValue;
}
This code returns "Message bold, strike Link, BOLD " without html tags...
I want to output the following code:
Message <b>bold</b>, <s>strike</s>
<span class="how">
Link, <b> BOLD </b>
</span>
Can you help me?

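Use saveHTML() on the document that owns the nodes (the $dom from your question, not a new DOMDocument), and pass it each child of the div so the inner tags are kept:
foreach($messages as $message)
{
foreach($message->childNodes as $child)
{
echo $dom->saveHTML($child);
}
}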

I can do it quickly using SimpleXML, if it is okay for you to switch from DOMDocument and DOMXPath. Note that simplexml_load_string() requires well-formed XML, which this sample happens to be:
$html = '
<html>
<body>
<div>
Message <b>bold</b>, <s>strike</s>
</div>
<div>
<span class="how">
Link, <b> BOLD </b>
</span>
</div>
</body>
</html>
';
$xml = simplexml_load_string($html);
$arr = $xml->xpath('//div/*');
foreach ($arr as $x) {
echo $x->asXML();
}
