Exclude Content From A Div id - php

My PHP script can fetch the content of a div by its id, but how can I filter the fetched data and exclude the part that sits inside <div id="navbar" class="n">? I have tried the following code, but it's not working:
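$regex = '#\<div id="navbar"\>(.+?)\<\/div\>#s';
// Note: this never matches the tag above, because the pattern expects ">" right after id="navbar" while the real tag also has class="n"
preg_match($regex, $displaybody, $matches);
$match = $matches[0];
echo $match;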
To fetch the content I am using HTML DOM Parser.

Using regular expressions to parse HTML is usually a bad idea. You can select nodes with the DOM just fine:
$input = '<html> <body> some content <span class="a">b</span> <div id="navbar" class="n">find me <span class="a">b</span></div> </html>';
$doc = new DOMDocument;
$doc->loadHTML($input);
$navbar = $doc->getElementById('navbar');
$innerhtml = '';
foreach ($navbar->childNodes as $cn) {
    $innerhtml .= $doc->saveHTML($cn);
}
print $innerhtml;
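If the goal is the other reading of the question, keeping the page but excluding the navbar, the same DOM approach can detach the element instead of reading it. A minimal sketch, assuming the sample markup above; libxml_use_internal_errors() is only there to silence warnings about imperfect HTML.

```php
<?php
// Sketch: drop the #navbar div (and everything inside it) from the document,
// then serialize what is left of the <body>.
$input = '<html><body>some content <span class="a">b</span>'
       . '<div id="navbar" class="n">find me <span class="a">b</span></div></body></html>';

$doc = new DOMDocument();
// Suppress libxml warnings about imperfect real-world HTML.
libxml_use_internal_errors(true);
$doc->loadHTML($input);
libxml_clear_errors();

$navbar = $doc->getElementById('navbar');
if ($navbar !== null) {
    // Detach the element from its parent; its children go with it.
    $navbar->parentNode->removeChild($navbar);
}

// Serialize only the <body> contents, without the doctype and <html> wrapper.
$body = $doc->getElementsByTagName('body')->item(0);
$result = '';
foreach ($body->childNodes as $child) {
    $result .= $doc->saveHTML($child);
}
echo $result;
```

Removing the node rather than post-processing a string means nested tags inside the navbar cannot confuse the match, which is exactly where the regex attempt breaks down.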

Related

Need to get divs from string based on matching class

I have a variable $company_id = 8; and a block of HTML content stored as a string called $all_content:
<div class="company-id-8">
Content One
</div>
<div class="company-id-9">
Content Two
</div>
<div class="company-id-8">
Content Three
</div>
<div class="company-id-3">
Content Four
</div>
I need to remove all of the divs from $all_content that don't match the current company ID class. So, once filtered, the above HTML should become:
<div class="company-id-8">
Content One
</div>
<div class="company-id-8">
Content Three
</div>
I have the following code to filter out divs that don't belong to the current company:
$dom = new DOMDocument();
$dom->loadHTML($all_content);
$finder = new DOMXPath($dom);
$classname = "company-id-" . $company_id;
$nodes = $finder->query("//div[contains(@class, '$classname')]");
foreach ($nodes as $node) {
    // $filtered_content .= ???   <- this is the part I can't work out
}
I can't work out how to get the filtered div nodes back into the $filtered_content string, though.
How can I tidy this up and get it working?
The solution is to do the following:
$filtered_content = "";
foreach ($nodes as $node) {
    // copy the node into a fresh document so saveHTML() outputs just this div
    $tmp_doc = new DOMDocument();
    $tmp_doc->appendChild($tmp_doc->importNode($node, true));
    $filtered_content .= $tmp_doc->saveHTML();
}
$filtered_content ends up being a usable HTML string with the correct content.
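Two refinements may be worth considering. contains(@class, 'company-id-8') also matches a class such as company-id-80, and saveHTML() accepts a node argument (PHP 5.3.6 and later), so the temporary document is not strictly needed. A sketch under those assumptions; the company-id-80 div is invented sample data added to show the false match being avoided.

```php
<?php
// Sketch: keep only the divs whose class list contains the exact token
// "company-id-<id>". The class attribute is padded with spaces so that
// "company-id-80" is not mistaken for "company-id-8".
$company_id = 8;
$all_content = '<div class="company-id-8">Content One</div>'
    . '<div class="company-id-9">Content Two</div>'
    . '<div class="company-id-80">Content Eighty</div>'
    . '<div class="company-id-8">Content Three</div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($all_content);
libxml_clear_errors();

$finder = new DOMXPath($dom);
$classname = 'company-id-' . $company_id;
$query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]";

$filtered_content = '';
foreach ($finder->query($query) as $node) {
    // Serialize the matched element directly from the original document.
    $filtered_content .= $dom->saveHTML($node);
}
echo $filtered_content;
```

The concat/normalize-space pattern is the usual XPath 1.0 idiom for matching one class token; XPath 1.0 has no built-in class selector.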

Retrieve content from DOM->getElementById without element id

When using
$body = $dom->getElementById('content');
The output is the following:
<div id=content>
<div>
<p>some text</p>
</div>
</div>
I need to remove the <div id=content></div> wrapper, since I only need the inner part.
Needed result:
<div>
<p>some text</p>
</div>
My current code:
$url = 'myfile.html';
$file = file_get_contents($url);
$dom = new DOMDocument();
$dom->loadHTML($file);
//$body = $dom->getElementsByTagName('body')->item(0);
$body = $dom->getElementById('nbscontent');
$stringbody = $dom->saveHTML($body);
echo $stringbody;
getElementById() returns a DOMElement, whose childNodes property is a DOMNodeList. You can iterate over that list to get each child and serialize it with saveHTML(); concatenated, those children are the innerHTML.
$str = "<div id='test'><p>inside</p></div>";
$dom = new DOMDocument();
$dom->loadHTML($str);
$body = $dom->getElementById('test');
$innerHTML = '';
foreach ($body->childNodes as $child)
{
    $innerHTML .= $body->ownerDocument->saveHTML($child);
}
echo $innerHTML; // <p>inside</p>
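Since the same childNodes loop appears in several answers here, it can be wrapped in a small function. A sketch only: innerHTML() is a hypothetical helper name, and the markup mirrors the nbscontent example from the question.

```php
<?php
// Hypothetical helper: serialize the children of an element, but not the
// element's own opening and closing tags.
function innerHTML(DOMNode $element): string
{
    $html = '';
    foreach ($element->childNodes as $child) {
        // Serialize each child with the document that owns it.
        $html .= $element->ownerDocument->saveHTML($child);
    }
    return $html;
}

$dom = new DOMDocument();
$dom->loadHTML('<div id="nbscontent"><div><p>some text</p></div></div>');
$content = $dom->getElementById('nbscontent');
// getElementById() returns null when the id is missing, so guard before use.
echo $content !== null ? innerHTML($content) : '';
```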

Replace content specific HTML tag using PHP

I have this HTML:
<div>
<h1>Header</h1>
<code><p>First code</p></code>
<p>Next example</p>
<code><b>Second example</b></code>
</div>
Using PHP, I want to replace all < symbols located inside code elements. For example, I want the code above converted to:
<div>
<h1>Header</h1>
<code>&lt;p>First code&lt;/p></code>
<p>Next example</p>
<code>&lt;b>Second example&lt;/b></code>
</div>
I tried using PHP's DOMDocument class, but without success. Below is my code:
$dom = new DOMDocument();
$dom->loadHTML($content);
$innerHTML = '';
$tmp = '';
if (count($dom->getElementsByTagName('*'))) {
    foreach ($dom->getElementsByTagName('*') as $child) {
        if ($child->tagName == 'code') {
            $tmp = $child->ownerDocument->saveXML($child);
            $innerHTML .= htmlentities($tmp);
        }
        else {
            $innerHTML .= $child->ownerDocument->saveXML($child);
        }
    }
}
So, you're iterating over the markup properly, and your use of saveXML() was close to what you want, but nowhere in your code do you try to actually change the contents of the element. This should work:
<?php
$content='<div>
<h1>Header</h1>
<code><p>First code</p></code>
<p>Next example</p>
<code><b>Second example</b></code>
</div>';
$dom = new DOMDocument();
$dom->loadHTML($content, LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);
foreach ($dom->getElementsByTagName('code') as $child) {
    // get the markup of the children
    $html = implode(array_map([$child->ownerDocument, "saveHTML"], iterator_to_array($child->childNodes)));
    // create a node from the string
    $text = $dom->createTextNode($html);
    // remove existing child nodes; childNodes is live, so removing inside a
    // foreach would skip nodes. Always remove the current first child instead.
    while ($child->firstChild) {
        $child->removeChild($child->firstChild);
    }
    // append the new text node - escaping is done automatically
    $child->appendChild($text);
}
echo $dom->saveHTML();
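The "escaping is done automatically" point is easy to check in isolation: text placed in a text node is escaped when the document is serialized, so no htmlentities() call is needed. A minimal sketch; the string is the second sample from the question.

```php
<?php
// Sketch: markup stored as text, not as elements, is escaped on output.
$dom = new DOMDocument();
$code = $dom->createElement('code');
$code->appendChild($dom->createTextNode('<b>Second example</b>'));
$dom->appendChild($code);

$out = $dom->saveHTML($code);
echo $out; // the <b> tags come out as &lt;b... entities
```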

Get href value from matching anchor text

I'm pretty new to the DOMDocument class and can't find an answer for what I'm trying to do.
I have a large HTML file and I want to grab the link from an element based on its anchor text.
For example:
$html = <<<HTML
<div class="main">
<a href="http://link.com"><span><font><img src="http://images.com/spacer.gif"/>Keyword</font></span></a>
other text
</div>
HTML;
// domdocument
$doc = new DOMDocument();
$doc->loadHTML($html);
I want to get the value of the href attribute of any element that has the text "Keyword". I hope that's clear.
$html = <<<HTML
<div class="main">
<a href="http://link.com"><span><font><img src="http://images.com/spacer.gif"/>Keyword</font></span></a>
other text
</div>
HTML;
$keyword = "Keyword";
// domdocument
$doc = new DOMDocument();
$doc->loadHTML($html);
$as = $doc->getElementsByTagName('a');
foreach ($as as $a) {
    // trim() so surrounding whitespace in the link text doesn't break the comparison
    if (trim($a->nodeValue) === $keyword) {
        echo $a->getAttribute('href'); // prints "http://link.com"
        break;
    }
}
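Because the question asks for any element with that text, not just the first, an XPath query can return every matching href at once. A sketch, assuming the anchor markup above; the second link (http://other.example) is invented sample data that should not match.

```php
<?php
// Sketch: collect the href of every <a> whose visible text is "Keyword".
// normalize-space() trims and collapses whitespace in the link text.
$html = <<<HTML
<div class="main">
<a href="http://link.com"><span><font><img src="http://images.com/spacer.gif"/>Keyword</font></span></a>
<a href="http://other.example">Something else</a>
other text
</div>
HTML;

$keyword = 'Keyword';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$hrefs = [];
// The keyword is embedded directly in the expression; a value containing a
// single quote would need different quoting.
foreach ($xpath->query("//a[normalize-space(.) = '$keyword']/@href") as $attr) {
    $hrefs[] = $attr->value;
}
print_r($hrefs);
```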

PHP: Fetch content from a html page using xpath()

I'm trying to fetch the content of a div in an HTML page using XPath and DOMDocument. This is the structure of the page:
<div id="content">
<div class="div1"></div>
<span class="span1"></span>
<p></p>
<p></p>
<p></p>
<p></p>
<p></p>
<div class="div2"></div>
</div>
I want to get only the content of the p elements, not the spans and divs. I came up with the XPath expression .//*[@id='content']/p, but something seems to be wrong because I'm getting only the first p. I tried other expressions using following-sibling and node(), but they all return only the first p:
.//*[@id='content']/span/following-sibling::p
.//*[@id='content']/node()[self::p]
This is how I use XPath:
$domDocument=new DOMDocument();
$domDocument->encoding = 'UTF-8';
$domDocument->loadHTML($page);
$domXPath = new DOMXPath($domDocument);
$domNodeList = $domXPath->query($this->xpath);
$content = $this->GetHTMLFromDom($domNodeList);
And this is how I get the HTML from the nodes:
private function GetHTMLFromDom($domNodeList){
    $domDocument = new DOMDocument();
    $node = $domNodeList->item(0);
    foreach($node->childNodes as $childNode)
        $domDocument->appendChild($domDocument->importNode($childNode, true));
    return $domDocument->saveHTML();
}
This XPath expression:
//div[@id='content']/p
results in the wanted node set (the five p elements).
EDIT: Now your problem is clear. GetHTMLFromDom() only looks at item(0), the first matched node. You need to iterate over the whole NodeList:
private function GetHTMLFromDom($domNodeList){
    $domDocument = new DOMDocument();
    foreach ($domNodeList as $node) {
        $domDocument->appendChild($domDocument->importNode($node, true));
    }
    return $domDocument->saveHTML();
}
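To confirm that the query really yields all five paragraphs, here is a self-contained run against the page structure from the question. It is a sketch: the numbered p contents are made up so the output is visible, and getHtmlFromNodeList() is a hypothetical standalone variant that serializes each node with saveHTML($node) instead of importing into a second document.

```php
<?php
// Hypothetical standalone helper: serialize every node in the list.
function getHtmlFromNodeList(DOMNodeList $nodes): string
{
    $html = '';
    foreach ($nodes as $node) {
        $html .= $node->ownerDocument->saveHTML($node);
    }
    return $html;
}

$page = '<div id="content"><div class="div1"></div><span class="span1"></span>'
      . '<p>1</p><p>2</p><p>3</p><p>4</p><p>5</p><div class="div2"></div></div>';

$domDocument = new DOMDocument();
libxml_use_internal_errors(true);
$domDocument->loadHTML($page);
libxml_clear_errors();

$domXPath = new DOMXPath($domDocument);
$domNodeList = $domXPath->query("//div[@id='content']/p");
echo $domNodeList->length, PHP_EOL; // number of matched <p> elements

echo getHtmlFromNodeList($domNodeList);
```

Serializing from the original document avoids appending several root elements to a fresh DOMDocument, which is not well-formed as a standalone document.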
