How to Remove the Parent Div using PHP DOMDocument - php

$html_string = '<div class="quote" post_id="57"
style="border:1px solid #000;padding:15px;margin:15px;"
user_id="1" user_name="david_cameron"><strong><span
style="font-size:200%;">My Name is Rashid Farooq</span></strong></div>';
I want to remove the parent div and get only the following output:
<strong><span style="font-size:200%;">My Name is Rashid Farooq</span></strong>
I have tried:
$dom = new DOMDocument;
$dom->loadHTML($html_string);
$divs = $dom->getElementsByTagName('div');
$innerHTML_contents = $divs->item(0)->textContent;
echo $innerHTML_contents;
But it gives me only 'My Name is Rashid Farooq' and strips all the tags.
How can I remove only the parent div and keep all the other HTML inside it?

Try this function:
function DOMinnerHTML($element)
{
    $innerHTML = "";
    $children = $element->childNodes;
    foreach ($children as $child)
    {
        // import each child into a scratch document and serialize it on its own
        $tmp_dom = new DOMDocument();
        $tmp_dom->appendChild($tmp_dom->importNode($child, true));
        $innerHTML .= trim($tmp_dom->saveHTML());
    }
    return $innerHTML;
}
Use it like this:
$dom = new DOMDocument;
$dom->loadHTML($html_string);
$divs = $dom->getElementsByTagName('div');
$innerHTML_contents = DOMinnerHTML($divs->item(0));
echo $innerHTML_contents;
Output:
<strong><span style="font-size:200%;">My Name is Rashid Farooq</span></strong>
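A shorter variant is possible because DOMDocument::saveHTML() accepts a node argument (PHP 5.3.6 and later), so no scratch document is needed. The sketch below is only an illustration: the helper name innerHtml is hypothetical, and the input string is a trimmed copy of the one in the question.

```php
<?php
// Hypothetical helper: serializes every child of $element with its owning document.
function innerHtml(DOMNode $element): string
{
    $html = '';
    foreach ($element->childNodes as $child) {
        // saveHTML($node) serializes only that node's subtree
        $html .= $element->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Trimmed copy of the question's markup (extra attributes dropped for brevity).
$html_string = '<div class="quote"><strong><span style="font-size:200%;">'
    . 'My Name is Rashid Farooq</span></strong></div>';

$dom = new DOMDocument();
$dom->loadHTML($html_string);
$div = $dom->getElementsByTagName('div')->item(0);

$result = innerHtml($div);
echo $result, PHP_EOL;
```

The returned string contains the strong and span tags but not the surrounding div.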

Related

PHP DOMDocument: getElementsByTagName('img') does not work

I'm trying to grab an img tag inside an HTML page. The HTML looks somewhat like this:
<img style='width:198px;height:279px;' class='featureImg' src='image-loader.gif' data-src='http://somesites.com/med/1455.jpg' alt="Picture">
Now I want to grab the image URL http://somesites.com/med/1455.jpg from the data-src attribute.
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
$divs = $dom->getElementsByTagName('img');
foreach ($divs as $div) {
    if (preg_match_all('/\bfeatureImg\b/', $div->getAttribute('class'))) {
        $links = $div->getElementsByTagName('img');
        foreach ($links as $link) {
            $li = $link->getAttribute('data-src');
            echo ($li.'<br>');
        }
    }
}
But it doesn't work. Can anybody help?
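This call already gives you the img elements:
$divs = $dom->getElementsByTagName('img');
An img element has no img elements nested inside it, so the inner getElementsByTagName('img') call returns nothing. If the preg_match_all matches, you can read the 'data-src' attribute directly from the element you already have: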
$html = <<<SOURCE
<img style='width:198px;height:279px;' class='featureImg' src='image-loader.gif' data-src='http://somesites.com/med/1455.jpg' alt="Picture">
SOURCE;
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
$divs = $dom->getElementsByTagName('img');
foreach ($divs as $div) {
    if (preg_match_all('/\bfeatureImg\b/', $div->getAttribute('class'))) {
        echo $div->getAttribute('data-src');
    }
}
This results in:
http://somesites.com/med/1455.jpg
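As an alternative to the regular expression, the class test can be moved into an XPath query. This is a sketch under the assumption that the markup matches the sample above; the variable names $query and $sources are made up for the illustration.

```php
<?php
$html = <<<'SOURCE'
<img style='width:198px;height:279px;' class='featureImg' src='image-loader.gif' data-src='http://somesites.com/med/1455.jpg' alt="Picture">
SOURCE;

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Match img elements whose class list contains the whole word "featureImg".
$query = "//img[contains(concat(' ', normalize-space(@class), ' '), ' featureImg ')]";

$sources = [];
foreach ($xpath->query($query) as $img) {
    $sources[] = $img->getAttribute('data-src');
}
echo implode(PHP_EOL, $sources), PHP_EOL;
```

Padding the class attribute with spaces keeps the match from firing on longer class names that merely contain "featureImg".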

How to loop through all the children under a tag in PHP DOMDocument

I have the following HTML:
$html = '<body><div style="font-color:#000">Hello</div>
<span style="what">My name is rasid</span><div>new to you
</div><div style="rashid">New here</div></body>';
$dom = new DOMDocument();
$dom->loadHTML($html);
$elements = $dom->getElementsByTagName('body');
I have tried:
foreach($elements as $child)
{
    echo $child->nodeName;
}
The output is:
body
But I need to loop through all the tags under body, not the body itself. How can I do that?
In the example above I have also tried replacing
$elements = $dom->getElementsByTagName('body');
with
$elements = $dom->getElementsByTagName('body')->item(0);
But it gives an error. Any solution?
Try this:
$elements = $dom->getElementsByTagName('*');
$i = 1; // counter used to skip the first two elements, since the loop visits "html body div span div div"
foreach($elements as $child)
{
    if ($i > 2) echo $child->nodeName."<br>"; // outputs "div span div div"
    ++$i;
}
If you only want the child nodes of the body element, you can use:
$body = $dom->getElementsByTagName( 'body' )->item( 0 );
foreach( $body->childNodes as $node )
{
    echo $node->nodeName . PHP_EOL;
}
If you want all descendant nodes of the body element, you could use DOMXPath:
$xpath = new DOMXPath( $dom );
$bodyDescendants = $xpath->query( '//body//node()' );
foreach( $bodyDescendants as $node )
{
    echo $node->nodeName . PHP_EOL;
}
Use this code (note that it also lists the html and body elements themselves):
$elements = $dom->getElementsByTagName('*');
foreach($elements as $child)
{
    echo $child->nodeName;
}
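The answers above either list direct children or flatten every descendant. If the nesting matters, a recursive walk over childNodes is one option. The following is a sketch, not one of the original answers: the helper name walkElements is hypothetical, and it skips text nodes by checking nodeType.

```php
<?php
// Hypothetical helper: returns element names depth-first, indented by nesting level.
function walkElements(DOMNode $node, int $depth = 0): array
{
    $lines = [];
    foreach ($node->childNodes as $child) {
        // Skip text, comment and other non-element nodes.
        if ($child->nodeType === XML_ELEMENT_NODE) {
            $lines[] = str_repeat('  ', $depth) . $child->nodeName;
            $lines = array_merge($lines, walkElements($child, $depth + 1));
        }
    }
    return $lines;
}

$html = '<body><div style="font-color:#000">Hello</div>
<span style="what">My name is rasid</span><div>new to you
</div><div style="rashid">New here</div></body>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$body = $dom->getElementsByTagName('body')->item(0);

$lines = walkElements($body);
echo implode(PHP_EOL, $lines), PHP_EOL;
```

For the markup in the question every element sits directly under body, so the walk yields div, span, div, div with no indentation.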

Trying to use PHP DOM to replace node text without changing child nodes

I am trying to use the DOM object to simplify the implementation of a glossary tooltip. What I need to do is replace text in a paragraph, but NOT in an anchor tag that may be embedded in the paragraph.
$html = '<p>Replace this tag not this <a href="#">tag</a></p>';
$document = new DOMDocument();
$document->loadHTML($html);
$document->preserveWhiteSpace = false;
$document->validateOnParse = true;
$nodes = $document->getElementsByTagName("p");
foreach ($nodes as $node) {
    $node->nodeValue = str_replace("tag","element",$node->nodeValue);
}
echo $document->saveHTML();
I get:
'...<p>Replace this element not this element</p>...'
I want:
'...<p>Replace this element not this <a href="#">tag</a></p>...'
How do I implement this so that only the parent node's text is changed and the child node (the a tag) is left unchanged?
Try this:
$html = '<p>Replace this tag not this <a href="#">tag</a></p>';
$document = new DOMDocument();
$document->loadHTML($html);
$document->preserveWhiteSpace = false;
$document->validateOnParse = true;
$nodes = $document->getElementsByTagName("p");
foreach ($nodes as $node) {
    // descend to the first leaf node, which here is the paragraph's leading text node
    while( $node->hasChildNodes() ) {
        $node = $node->childNodes->item(0);
    }
    $node->nodeValue = str_replace("tag","element",$node->nodeValue);
}
echo $document->saveHTML();
Hope this helps.
UPDATE
To answer @paul's question in the comments below, you can create the replacement element and insert it in place of the matched text:
$html = '<p>Replace this tag not this <a href="#">tag</a></p>';
$document = new DOMDocument();
$document->loadHTML($html);
$document->preserveWhiteSpace = false;
$document->validateOnParse = true;
$nodes = $document->getElementsByTagName("p");
foreach ($nodes as $node) {
    while( $node->hasChildNodes() ) {
        $node = $node->childNodes->item(0);
    }
    // markup assigned to nodeValue would be escaped as text, so split the text node around the match instead
    $pos = ($node instanceof DOMText) ? strpos($node->nodeValue, "tag") : false;
    if ($pos !== false) {
        $match = $node->splitText($pos);     // $match now starts at "tag"
        $match->splitText(strlen("tag"));    // cut off whatever follows "tag"
        //create the element which should replace the text in the original string
        $elem = $document->createElement( 'dfn', 'tag' );
        $elem->setAttribute( 'title', 'element' );
        $match->parentNode->replaceChild( $elem, $match );
    }
}
echo $document->saveHTML();
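Another way to limit the replacement to the paragraph's own text is to select only its direct text-node children with XPath. This is a sketch, assuming the second "tag" in the sample is wrapped in an anchor as the question describes; the href value is a placeholder.

```php
<?php
// Assumed input: the second "tag" sits inside an anchor (placeholder href).
$html = '<p>Replace this tag not this <a href="#">tag</a></p>';

$document = new DOMDocument();
$document->loadHTML($html);

$xpath = new DOMXPath($document);
// //p/text() selects text nodes that are direct children of a <p>,
// so text inside the nested <a> is never visited.
foreach ($xpath->query('//p/text()') as $textNode) {
    $textNode->nodeValue = str_replace('tag', 'element', $textNode->nodeValue);
}

$p = $document->getElementsByTagName('p')->item(0);
$result = $document->saveHTML($p);
echo $result, PHP_EOL;
```

Unlike descending to the first leaf, this also covers text that follows the anchor inside the same paragraph.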

get value of <h2> of html page with PHP DOM?

I have a Craigslist URL in $link and load the page contents into $linkhtml, so that variable holds the HTML of the page.
I need to extract the text between <h2> and </h2>. I could use a regexp, but how do I do this with PHP DOM? I have this so far:
$linkhtml = file_get_contents($link);
$dom = new DOMDocument;
@$dom->loadHTML($linkhtml);
What do I do next to put the contents of the <h2> element into a variable $title?
If DOMDocument looks complicated to you, you may try PHP Simple HTML DOM Parser, a third-party library with a simpler, selector-based API for parsing HTML.
require 'simple_html_dom.php';
$html = '<h1>Header 1</h1><h2>Header 2</h2>';
$dom = new simple_html_dom();
$dom->load( $html );
$title = $dom->find('h2',0)->plaintext;
echo $title; // outputs: Header 2
You can use this code:
$linkhtml= file_get_contents($link);
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($linkhtml); // loads your html
$xpath = new DOMXPath($doc);
$h2text = $xpath->evaluate("string(//h2/text())");
// $h2text is your text between <h2> and </h2>
You can do this with XPath (untested, may contain errors):
$linkhtml = file_get_contents($link);
$dom = new DOMDocument;
@$dom->loadHTML($linkhtml);
$xpath = new DOMXpath($dom);
// matches h2 elements that are direct children of body; use "//h2" to match at any depth
$elements = $xpath->query("/html/body/h2");
// query() returns false when the expression is invalid
if ($elements !== false) {
    foreach ($elements as $element) {
        $nodes = $element->childNodes;
        foreach ($nodes as $node) {
            echo $node->nodeValue. "\n";
        }
    }
}
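Without XPath, the same value can be read through getElementsByTagName() and textContent. This is a sketch with stand-in markup in place of the fetched Craigslist page. One difference worth knowing: string(//h2/text()) returns only the first text node of the h2, whereas textContent also includes text from nested elements.

```php
<?php
// Stand-in markup; the question fetches $linkhtml with file_get_contents($link).
$linkhtml = '<html><body><h1>Header 1</h1><h2>Header <b>2</b></h2></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom->loadHTML($linkhtml);
libxml_clear_errors();

$h2 = $dom->getElementsByTagName('h2')->item(0);
// item(0) returns null when the page has no <h2>
$title = $h2 !== null ? trim($h2->textContent) : null;
echo $title, PHP_EOL;
```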

How to get nodes in first level using PHP DOMDocument?

I'm new to the PHP DOM extension and have a problem I can't find a solution for. I have a DOMDocument with the following HTML:
<div id="header">
</div>
<div id="content">
    <div id="sidebar">
    </div>
    <div id="info">
    </div>
</div>
<div id="footer">
</div>
I need to get all nodes that are on the first level (header, content, footer). hasChildNodes() does not help, because a first-level node may not have children (header, footer).
For now my code looks like this:
$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->loadHTML($html);
$childs = $dom->getElementsByTagName('div');
But this gets me all the divs. Any advice?
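You may have to go beyond the DOMDocument methods, for example with DOMXPath or by converting to SimpleXML:
$file = $_SERVER['DOCUMENT_ROOT'] . "/test.html";
$doc = new DOMDocument();
$doc->loadHTMLFile($file);
$xpath = new DOMXpath($doc);
// "/" alone selects only the document root; this selects the direct children of body
$elements = $xpath->query("/html/body/*");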
Here's how I grab the first-level elements (in this case, the top-level TD elements in a table row):
$doc = new DOMDocument();
$doc->preserveWhiteSpace = false;
$doc->loadHTML( $tr_element );
$xpath = new DOMXPath( $doc );
$arr = array();
// start at the first TD of the row, then walk its siblings
$td = $xpath->query("//tr/td[1]")->item(0);
do{
    if( $innerHTML = self::DOMinnerHTML( $td ) )
        array_push( $arr, $innerHTML );
    $td = $td->nextSibling;
} while( $td !== null );
$arr now contains the inner HTML of the top-level TD elements, but not the TDs of nested tables, which you would also get from
$dom->getElementsByTagName( 'td' );
The DOMinnerHTML function is something I snagged somewhere to get the innerHTML of an element/node:
public static function DOMinnerHTML( $element, $deep=true )
{
    $innerHTML = "";
    $children = $element->childNodes;
    foreach ($children as $child)
    {
        $tmp_dom = new DOMDocument();
        $tmp_dom->appendChild( $tmp_dom->importNode( $child, $deep ) );
        $innerHTML.=trim($tmp_dom->saveHTML());
    }
    return $innerHTML;
}
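For the div layout in the question, the first-level nodes can also be collected with plain DOM calls. This is a sketch that relies on loadHTML() wrapping a bare fragment in html and body elements, which is its default behaviour; the markup is a compacted copy of the question's, and $ids is an illustrative variable name.

```php
<?php
// Compacted copy of the question's markup.
$html = '<div id="header"></div>
<div id="content"><div id="sidebar"></div><div id="info"></div></div>
<div id="footer"></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);   // the parser adds the missing <html> and <body> wrappers

$body = $dom->getElementsByTagName('body')->item(0);
$ids = [];
foreach ($body->childNodes as $child) {
    // childNodes also contains whitespace text nodes; keep elements only
    if ($child instanceof DOMElement) {
        $ids[] = $child->getAttribute('id');
    }
}
echo implode(', ', $ids), PHP_EOL;
```

Only header, content and footer are collected; sidebar and info are skipped because they are children of content, not of body.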
