DomDocument parse Newline works with span but not img

See here: https://ideone.com/bjs3IC
Why does the newline display correctly with the spans but not with the imgs?
<?php
outputImages();
outputSpans();
function outputImages(){
    $html = "<div class='test'>
<pre>
<img src='http://d...content-available-to-author-only...e.com/5x5/000/fff'>
<img src='http://d...content-available-to-author-only...e.com/5x5/000/fff'>
<img src='http://d...content-available-to-author-only...e.com/5x5/000/fff'>
</pre>
</div>";
    getHtml($html);
}
function outputSpans(){
    $html = "<div class='test'>
<pre>
<span>a</span>
<span>b</span>
<span>c</span>
</pre>
</div>";
    getHtml($html);
}
function getHtml($html){
    $doc = new DOMDocument;
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);
    $tags = $xpath->query('//div[@class="test"]');
    print(get_inner_html($tags[0]));
}
function get_inner_html( $node ) {
    $innerHTML= '';
    $children = $node->childNodes;
    foreach ($children as $child) {
        $innerHTML .= $child->ownerDocument->saveXML( $child );
    }
    return $innerHTML;
}

DOMDocument::loadHTML() takes a second parameter, $options. It appears that LIBXML_NOBLANKS is (at least one of) the default values there.
You can pass
$doc->loadHTML($html, LIBXML_NOEMPTYTAG);
to override that default value, and your code will then behave the same for both samples.
P.S. I'm not sure why you use
print(get_inner_html($tags[0]));
$tags is a DOMNodeList, so you should use $tags->item(0) to get the first match.
Your complete code should look like this:
<?php
outputImages();
outputSpans();
function outputImages() {
    $html = "<div class='test'>
<pre>
<img src='http://d...content-available-to-author-only...e.com/5x5/000/fff'>
<img src='http://d...content-available-to-author-only...e.com/5x5/000/fff'>
<img src='http://d...content-available-to-author-only...e.com/5x5/000/fff'>
</pre>
</div>";
    getHtml($html);
}
function outputSpans() {
    $html = "<div class='test'>
<pre>
<span>a</span>
<span>b</span>
<span>c</span>
</pre>
</div>";
    getHtml($html);
}
function getHtml($html) {
    $doc = new DOMDocument;
    $doc->loadHTML($html, LIBXML_NOEMPTYTAG);
    $xpath = new DOMXPath($doc);
    $tags = $xpath->query('//div[@class="test"]');
    print(get_inner_html($tags->item(0)));
}
function get_inner_html( $node ) {
    $innerHTML= '';
    $children = $node->childNodes;
    foreach ($children as $child) {
        $innerHTML .= $child->ownerDocument->saveXML( $child );
    }
    return $innerHTML;
}
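
A side note on the serializer, as a minimal sketch rather than part of the fix: get_inner_html() above uses saveXML(), which writes void elements such as img in XML form, while saveHTML() writes them as HTML start tags. The markup and variable names below are hypothetical and only illustrate that difference.

```php
<?php
// Hypothetical sample markup, not the asker's exact input.
$doc = new DOMDocument();
$doc->loadHTML('<div class="test"><img src="a.png"></div>');

$img = $doc->getElementsByTagName('img')->item(0);

// XML serializer: a void element is written self-closed.
$asXml = $doc->saveXML($img);

// HTML serializer: a void element is written as a plain start tag.
$asHtml = $doc->saveHTML($img);

echo $asXml, PHP_EOL;  // <img src="a.png"/>
echo $asHtml, PHP_EOL; // <img src="a.png">
```

If the result is meant to be pasted back into an HTML page, saveHTML($child) is probably the safer choice inside the loop.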

Related

Replace content specific HTML tag using PHP

I have HTML code:
<div>
<h1>Header</h1>
<code><p>First code</p></code>
<p>Next example</p>
<code><b>Second example</b></code>
</div>
Using PHP, I want to replace every < symbol inside the code elements with &lt;. For example, the code above should be converted to:
<div>
<h1>Header</h1>
<code>&lt;p>First code&lt;/p></code>
<p>Next example</p>
<code>&lt;b>Second example&lt;/b></code>
</div>
I tried using PHP's DOMDocument class, but without success. Below is my code:
$dom = new DOMDocument();
$dom->loadHTML($content);
$innerHTML= '';
$tmp = '';
if(count($dom->getElementsByTagName('*'))){
    foreach ($dom->getElementsByTagName('*') as $child) {
        if($child->tagName == 'code'){
            $tmp = $child->ownerDocument->saveXML( $child);
            $innerHTML .= htmlentities($tmp);
        }
        else{
            $innerHTML .= $child->ownerDocument->saveXML($child);
        }
    }
}

You're iterating over the markup properly, and your use of saveXML() was close to what you want, but nowhere in your code do you actually change the contents of the element. This should work:
<?php
$content = '<div>
<h1>Header</h1>
<code><p>First code</p></code>
<p>Next example</p>
<code><b>Second example</b></code>
</div>';
$dom = new DOMDocument();
$dom->loadHTML($content, LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);
foreach ($dom->getElementsByTagName('code') as $child) {
    // get the markup of the children
    $html = implode(array_map([$child->ownerDocument, "saveHTML"], iterator_to_array($child->childNodes)));
    // create a text node from the string
    $text = $dom->createTextNode($html);
    // remove existing child nodes; childNodes is a live list, so loop on firstChild rather than foreach
    while ($child->firstChild) {
        $child->removeChild($child->firstChild);
    }
    // append the new text node - escaping is done automatically
    $child->appendChild($text);
}
echo $dom->saveHTML();
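
The two flags passed to loadHTML() above are what keep the output a bare fragment. A small sketch of the difference, using a hypothetical one-element fragment (the variable names are made up for the comparison):

```php
<?php
// Hypothetical fragment used only for this comparison.
$fragment = '<div><p>Hi</p></div>';

// Default parse: the fragment is wrapped in a full document
// (doctype plus implied <html> and <body>).
$full = new DOMDocument();
$full->loadHTML($fragment);
$fullHtml = $full->saveHTML();

// With both flags: no implied <html>/<body> and no default doctype.
$bare = new DOMDocument();
$bare->loadHTML($fragment, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$bareHtml = $bare->saveHTML();

echo $fullHtml;
echo $bareHtml;
```

In my experience LIBXML_HTML_NOIMPLIED behaves most predictably when the fragment has a single root element, as it does here and in the answer above.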

How to parsing HTML Content [duplicate]

What function do you use to get the innerHTML of a given DOMNode in the PHP DOM implementation? Can someone give a reliable solution?
Of course, outerHTML will do too.

Compare this updated variant with PHP Manual User Note #89718:
<?php
function DOMinnerHTML(DOMNode $element)
{
    $innerHTML = "";
    $children = $element->childNodes;
    foreach ($children as $child)
    {
        $innerHTML .= $element->ownerDocument->saveHTML($child);
    }
    return $innerHTML;
}
?>
Example:
<?php
$dom= new DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
// loadHTML() parses an HTML string; load() would expect the path of an XML file
$dom->loadHTML($html_string);
$domTables = $dom->getElementsByTagName("table");
// Iterate over DOMNodeList (Implements Traversable)
foreach ($domTables as $table)
{
    echo DOMinnerHTML($table);
}
?>

Here is a version in a functional programming style:
function innerHTML($node) {
    return implode(array_map([$node->ownerDocument,"saveHTML"],
                             iterator_to_array($node->childNodes)));
}

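To return the HTML of an element including its own tag (that is, its outerHTML), you can use C14N(). Note that it emits canonical XML rather than HTML, so empty elements are written as start/end tag pairs:
$dom = new DOMDocument();
$dom->loadHTML($html);
$x = new DOMXPath($dom);
foreach($x->query('//table') as $table){
    echo $table->C14N();
}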
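
To see how C14N() differs from saveHTML() in practice, here is a minimal sketch on a hypothetical paragraph containing a br element:

```php
<?php
$dom = new DOMDocument();
// Hypothetical markup chosen because it contains a void element.
$dom->loadHTML('<p>a<br>b</p>');
$p = $dom->getElementsByTagName('p')->item(0);

// Canonical XML: the empty <br> becomes a start/end tag pair.
$canonical = $p->C14N();

// HTML serialization of the same node.
$html = $dom->saveHTML($p);

echo $canonical, PHP_EOL; // <p>a<br></br>b</p>
echo $html, PHP_EOL;      // <p>a<br>b</p>
```

So C14N() is convenient for XML-style processing or comparison, while saveHTML($node) is likely the better fit when the result has to remain valid HTML.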

A simplified version of Haim Evgi's answer:
<?php
function innerHTML(\DOMElement $element)
{
    $doc = $element->ownerDocument;
    $html = '';
    foreach ($element->childNodes as $node) {
        $html .= $doc->saveHTML($node);
    }
    return $html;
}
Example usage:
<?php
$doc = new \DOMDocument();
$doc->loadHTML("<body><div id='foo'><p>This is <b>an <i>example</i></b> paragraph<br>\n\ncontaining newlines.</p><p>This is another paragraph.</p></div></body>");
print innerHTML($doc->getElementById('foo'));
/*
<p>This is <b>an <i>example</i></b> paragraph<br>
containing newlines.</p>
<p>This is another paragraph.</p>
*/
There's no need to set preserveWhiteSpace or formatOutput.

Similar to trincot's nice version with array_map and implode, but this time with array_reduce:
return array_reduce(
    iterator_to_array($node->childNodes),
    function ($carry, \DOMNode $child) {
        return $carry.$child->ownerDocument->saveHTML($child);
    },
    '' // start from an empty string so a node without children yields '' rather than null
);
I still don't understand why there is no reduce() function that accepts arrays and iterators alike.

// Replace all children of $node with the markup in $newvalue.
// appendXML() parses XML, so $newvalue must be well-formed XML, not arbitrary HTML.
function setnodevalue($doc, $node, $newvalue){
    while($node->childNodes->length> 0){
        $node->removeChild($node->firstChild);
    }
    if(!empty($newvalue)){
        $fragment= $doc->createDocumentFragment();
        $fragment->appendXML(trim($newvalue));
        // the fragment already belongs to $doc, so it can be appended directly
        $node->appendChild($fragment);
    }
}
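
A usage sketch of the same fragment-based idea, written as a self-contained variant. The helper name replaceInnerXml() and the sample markup are hypothetical, and the replacement string is assumed to be well-formed XML:

```php
<?php
// Hypothetical helper: replace a node's children with parsed XML markup.
function replaceInnerXml(DOMNode $node, string $xml): void
{
    // Remove the existing children first.
    while ($node->firstChild) {
        $node->removeChild($node->firstChild);
    }
    if ($xml === '') {
        return;
    }
    $fragment = $node->ownerDocument->createDocumentFragment();
    // appendXML() returns false when the markup is not well-formed XML.
    if ($fragment->appendXML($xml)) {
        $node->appendChild($fragment);
    }
}

$doc = new DOMDocument();
$doc->loadHTML('<div id="box">old <i>content</i></div>');
$box = $doc->getElementById('box');

replaceInnerXml($box, '<b>bold</b> text');

$result = $doc->saveHTML($box);
echo $result, PHP_EOL; // <div id="box"><b>bold</b> text</div>
```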

Here's another approach, based on this comment by Drupella on php.net, that worked well for my project. It defines innerHTML() by creating a new DOMDocument and importing and appending the target node to it, instead of explicitly iterating over child nodes.
InnerHTML
Let's define this helper function:
function innerHTML( \DOMNode $n, $include_target_tag = true ) {
    $doc = new \DOMDocument();
    $doc->appendChild( $doc->importNode( $n, true ) );
    $html = trim( $doc->saveHTML() );
    if ( $include_target_tag ) {
        return $html;
    }
    return preg_replace( '#^<' . $n->nodeName .'[^>]*>|</'. $n->nodeName .'>$#', '', $html );
}
where the second argument controls whether the outer target tag is included.
Usage Example
Here we extract the inner HTML for a target tag given by the "first" id attribute:
$html = '<div id="first"><h1>Hello</h1></div><div id="second"><p>World!</p></div>';
$doc = new \DOMDocument();
$doc->loadHTML( $html );
$node = $doc->getElementById( 'first' );
if ( $node instanceof \DOMNode ) {
echo innerHTML( $node, true );
// Output: <div id="first"><h1>Hello</h1></div>
echo innerHTML( $node, false );
// Output: <h1>Hello</h1>
}
Live example:
http://sandbox.onlinephpfunctions.com/code/2714ea116aad9957c3c437d46134a1688e9133b8

Old question, but there is a built-in method for that: just pass the target node to DOMDocument::saveHTML().
Full example:
$html = '<div><p>ciao questa è una <b>prova</b>.</p></div>';
$dom = new DOMDocument();
// note: without a charset declaration, loadHTML() may misread UTF-8 characters such as "è"
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// './/div/*' selects the element inside the div; './/div' selects the div itself
$node = $xpath->query('.//div/*')->item(0); // saveHTML() expects a single DOMNode, not a DOMNodeList
$innerHtml = $dom->saveHTML($node);
var_dump($innerHtml);
Output: <p>ciao questa è una <b>prova</b>.</p>

For people who want to get the HTML from an XPath query, here is my version:
$xpath = new DOMXPath( $my_dom_object );
$DOMNodeList = $xpath->query('//div[contains(@class, "some_custom_class_in_html")]');
if( $DOMNodeList->count() > 0 ) {
    $page_html = $my_dom_object->saveHTML( $DOMNodeList->item(0) );
}
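
One caveat with contains(@class, ...) is that it matches substrings, so a class such as "test-wrapper" also matches "test". A common XPath 1.0 idiom for matching a whole class token is sketched below; the markup and class names are hypothetical:

```php
<?php
$doc = new DOMDocument();
// Hypothetical markup: one div whose class merely contains "test",
// and one div that really carries the "test" class token.
$doc->loadHTML('<div class="test-wrapper">a</div><div class="box test">b</div>');
$xpath = new DOMXPath($doc);

// Substring match: also hits "test-wrapper".
$loose = $xpath->query('//div[contains(@class, "test")]');

// Token match: pad the normalized class list with spaces and look for " test ".
$strict = $xpath->query(
    '//div[contains(concat(" ", normalize-space(@class), " "), " test ")]'
);

echo $loose->length, PHP_EOL;  // 2
echo $strict->length, PHP_EOL; // 1
echo $doc->saveHTML($strict->item(0)), PHP_EOL; // <div class="box test">b</div>
```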

How to loop through all the Childs under a tag in PHP DOMDocument

I have the following html
$html = '<body><div style="font-color:#000">Hello</div>
<span style="what">My name is rasid</span><div>new to you
</div><div style="rashid">New here</div></body>';
$dom = new DOMDocument();
$dom->loadHTML($html);
$elements = $dom->getElementsByTagName('body');
I have tried
foreach($elements as $child)
{
echo $child->nodeName;
}
The output is
body
But I need to loop through all the tags under body, not the body itself. How can I do that?
I have also tried, in the example above, replacing
$elements = $dom->getElementsByTagName('body');
with
$elements = $dom->getElementsByTagName('body')->item(0);
But it gives an error. Any solution?

Try this:
$elements = $dom->getElementsByTagName('*');
$i = 1; // counter used to skip the first two elements; without it the loop below outputs "html body div span div div"
foreach($elements as $child)
{
    if ($i > 2) echo $child->nodeName."<br>"; // outputs "div span div div"
    ++$i;
}

If you only want the child nodes of the body element, you can use:
$body = $dom->getElementsByTagName( 'body' )->item( 0 );
foreach( $body->childNodes as $node )
{
    echo $node->nodeName . PHP_EOL;
}
If you want all descendant nodes of the body element, you could use DOMXPath:
$xpath = new DOMXPath( $dom );
$bodyDescendants = $xpath->query( '//body//node()' );
foreach( $bodyDescendants as $node )
{
    echo $node->nodeName . PHP_EOL;
}
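
Both loops above visit every kind of node, so text nodes can show up as #text next to the tags. If only the tags are wanted, one option is to filter on DOMElement, sketched here with the markup from the question (the $names array is just an illustrative variable):

```php
<?php
$html = '<body><div style="font-color:#000">Hello</div>
<span style="what">My name is rasid</span><div>new to you
</div><div style="rashid">New here</div></body>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$body = $dom->getElementsByTagName('body')->item(0);

$names = [];
foreach ($body->childNodes as $node) {
    // Keep element nodes only; text and comment nodes are skipped.
    if ($node instanceof DOMElement) {
        $names[] = $node->nodeName;
    }
}

echo implode(' ', $names), PHP_EOL; // div span div div
```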

Use this code (note that it also lists the html and body elements themselves):
$elements = $dom->getElementsByTagName('*');
foreach($elements as $child)
{
    echo $child->nodeName;
}

how to output text of elementById in DOM?

It should be very simple. I am loading HTML in PHP via DOMDocument:
$doc = new DOMDocument();
$doc->loadHTML($html);
$el = $doc->getElementById('somethingId');
Let's say I have
<html><head></head><body><div id="somethingId">my
<span style="background:red">something else</span>
information</div></body></html>
Q1. How do I echo the text inside that element ("my information") from $el?
Q2. How do I echo the contents including the span markup (like innerHTML in JavaScript)?
Answer to Q2:
$innerHTML = '';
$children = $el->childNodes;
foreach ($children as $child) {
    $innerHTML .= $child->ownerDocument->saveXML( $child );
}
echo $innerHTML;

You should do
echo ($el->nodeValue);

if (!is_null($el)) {
    $content = $el->nodeValue;
    if (empty($content)) {
        $content = $el->textContent;
    }
    echo $content;
}
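
Note that nodeValue and textContent include the text of nested elements, so for the markup in the question they also return the span's "something else". If Q1 really means only the element's own text, one possible sketch is to concatenate the direct text-node children; the helper name directText() is hypothetical:

```php
<?php
// Hypothetical helper: concatenate only the text nodes that are direct children.
function directText(DOMNode $node): string
{
    $text = '';
    foreach ($node->childNodes as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            $text .= $child->nodeValue;
        }
    }
    // Collapse the whitespace runs left where the skipped elements were.
    return trim(preg_replace('/\s+/', ' ', $text));
}

$html = '<html><head></head><body><div id="somethingId">my
<span style="background:red">something else</span>
information</div></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$el = $doc->getElementById('somethingId');

echo directText($el), PHP_EOL; // my information
```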
