Remove <div> innerHTML with php - php

I am trying to change an HTML page through PHP. The idea is to reinvent the "contenteditable" attribute and change text on the fly, but I want to save the result in the original HTML file.
For this I have some initial text in a div element. I convert it to a form with a textarea, reload the page, and then I can edit the text. Next I want to write the content of the textarea back into the original div, replacing the old text. It seems to work, except that the new text is always appended to the old text, which I cannot get rid of. The problem is probably in the setInnerHTML function. I tried:
$element->parentNode->removeChild($element);
but it did not work for some reason.
Thanks!
<?php
$text = $_POST["text"];
$id = $_GET["id"];
$ref = $_GET["ref"];
$html = new DOMDocument();
$html->loadHTMLFile($ref.".html");
$html->preserveWhiteSpace = false;
$html->formatOutput = true;
$elem = $html->getElementById($id);
function setInnerHTML($DOM, $element, $innerHTML)
{
$DOM->deleteTextNode($innerHTML);
$element->parentNode->removeChild($element);
$node = $DOM->createTextNode($innerHTML);
$element->appendChild($node);
}
setInnerHTML($html, $elem, $text);
$html->saveHTMLFile($ref.".html");
?>

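Try changing your setInnerHTML to look like this:
function setInnerHTML($DOM, $element, $innerHTML) {
    // Remove the existing children first. childNodes is a live list, so
    // removing inside a foreach over it can skip nodes; loop on firstChild instead.
    while ($element->firstChild) {
        $element->removeChild($element->firstChild);
    }
    // Append the new content as a single text node.
    $node = $DOM->createTextNode($innerHTML);
    $element->appendChild($node);
}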
Tell me if it is the result you desired.
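To check the replace-the-contents idea on a div with several child nodes, here is a minimal sketch. The helper name replaceContentsWithText and the sample markup are made up for illustration; only the DOMDocument calls come from PHP itself.

```php
<?php
// Hypothetical helper: empties $element, then adds $text as one text node.
function replaceContentsWithText(DOMDocument $dom, DOMElement $element, string $text): void
{
    // Always remove the current first child until none are left.
    while ($element->firstChild) {
        $element->removeChild($element->firstChild);
    }
    $element->appendChild($dom->createTextNode($text));
}

$dom = new DOMDocument();
// Sample markup (an assumption): a div with three child nodes.
$dom->loadHTML('<div id="box"><b>one</b> two <i>three</i></div>');
$div = $dom->getElementsByTagName('div')->item(0);

replaceContentsWithText($dom, $div, 'New text');

// Print only the div, not the whole document.
echo $dom->saveHTML($div), "\n";
```

Looping on firstChild does not depend on how iteration behaves while the live childNodes list shrinks, which is why it is used here instead of foreach.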

Related

php html tags converted to string

I am trying to process a HTML file with php as a DOM document. Processing is okay, but when I save the html document with $html->saveHTMLFile("file_out.html"); all link tags are converted from:
Click here: <a title="editable" href="http://somewhere.net">somewhere.net</a>
to
Click here: &lt;a title="editable" href="http://somewhere.net"&gt; somewhere.net &lt;/a&gt;
I process the links as php scripts, maybe this makes a difference?
I cannot convert the &lt; back to < with html_entity_decode() or such. Is there any other conversion or encoding I can use?
The php script looks like the following:
<?php
$text = $_POST["textareaX"];
$id = $_GET["id"];
$ref = $_GET["ref"];
$html = new DOMDocument();
$html->preserveWhiteSpace = true;
$html->formatOutput = false;
$html->substituteEntities = false;
$html->loadHTMLFile($ref.".html");
$elem = $html->getElementById($id);
$elem->nodeValue = $innerHTML;
if ($text == "")
{ $text = "--- No details. ---"; }
$newtext = "";
$words = explode(" ",$text);
foreach ($words as $word) {
if (strpos($word, "http://") !== false) {
$newtext .= "<a alt=\"editable\" href=\"".$word."\">".$word."</a>";
}
else {$newtext .= $word." ";}
}
$text = $newtext;
function setInnerHTML($DOM, $element, $innerHTML) {
$node = $DOM->createTextNode($innerHTML);
$children = $element->childNodes;
foreach ($children as $child) {
$element->removeChild($child);
}
$element->appendChild($node);
}
setInnerHTML($html, $elem, $text);
$html->saveHTMLFile($ref.".html");
header('Location: '."tracking.php?ref=$ref&user=unLock");
?>
We get the reference to a file from "id" and "ref" and the input data from array "textareaX". Next I open the file, identify the html element by id and replace its content (a link) with the input data from the textarea. I provide only the href in the textarea and the script builds the hyperlink from that. Next I plug this back into the original file and overwrite the input file.
When I write the new file though, the link <a href= ...> </a> is converted to &lt;a href=...&gt; &lt;/a&gt;, which is a problem.
Here is part of your code with the issue identified:
<?php
function setInnerHTML($DOM, $element, $innerHTML) {
/*********************************
Well, there's your problem:
**********************************/
$node = $DOM->createTextNode($innerHTML);
$children = $element->childNodes;
foreach ($children as $child) {
$element->removeChild($child);
}
$element->appendChild($node);
}
?>
What you are doing is passing your new anchor (a) tag as a string and then creating a text node out of it (text is just that: text, not HTML). When the document is saved, markup characters such as < and > inside a text node are written out as entities, so the tag shows up as visible text in a browser (this is what lets you present HTML as visible code on your page if you choose to).
What you need to do is create the element as HTML (not a text node) then append it:
<?php
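function setInnerHTML($DOM, $element, $innerHTML) {
    // Drop the old content so the new markup replaces it instead of being appended.
    while ($element->firstChild) {
        $element->removeChild($element->firstChild);
    }
    // Parse the string as markup; appendXML() expects well-formed XML.
    $f = $DOM->createDocumentFragment();
    $f->appendXML($innerHTML);
    $element->appendChild($f);
}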
?>
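The difference between the two approaches can be seen side by side in a small sketch. The two empty paragraphs and the variable names are assumptions made for the demonstration; the link string is the one from the question.

```php
<?php
$dom = new DOMDocument();
// Sample markup (an assumption): two empty paragraphs to fill in.
$dom->loadHTML('<div><p id="a"></p><p id="b"></p></div>');
$paragraphs = $dom->getElementsByTagName('p');
$asText = $paragraphs->item(0);
$asMarkup = $paragraphs->item(1);

$link = '<a href="http://somewhere.net">somewhere.net</a>';

// Variant 1: a text node. The string stays text and is escaped on output.
$asText->appendChild($dom->createTextNode($link));

// Variant 2: a document fragment. The string is parsed into real nodes.
$fragment = $dom->createDocumentFragment();
$fragment->appendXML($link);
$asMarkup->appendChild($fragment);

$textResult = $dom->saveHTML($asText);
$markupResult = $dom->saveHTML($asMarkup);
echo $textResult, "\n", $markupResult, "\n";
```

Note that appendXML() parses XML rather than HTML, so input that is not well-formed (for example a bare & in a URL) would likely be rejected; escaping user input before building the link is left out of this sketch.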

PHP DOMDocument Anchor Tags

I am using DOMDocument to parse all anchor tags from a string of HTML. I need to store all the anchors which do not contain a certain href into an array. Right now I am able to loop through all the anchors and filter out the correct ones, but I cannot store the original anchor. I can access the href and text values by doing things like $node->getAttribute('href'), but how do I get the anchor in its original form, like <a href="...">Some Text</a>?
Thanks! Here is the code I have now:
$dom = new DOMDocument();
$dom->loadHtml(mb_convert_encoding($html, 'HTML-ENTITIES', "UTF-8"));
$anchors = array();
foreach ($dom->getElementsByTagName('a') as $node) {
if(strpos($node->getAttribute('href'), 'some value') !== true){
$anchors[] = $node; // TODO: need to store the entire original anchor tag
}
}
Try this:
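// strpos() returns an int offset or false, never true, so test against false
// to keep only the anchors whose href does not contain the value.
if (strpos($node->getAttribute('href'), 'some value') === false) {
    $temp = new DOMDocument();
    $temp->appendChild($temp->importNode($node, true));
    // Serialize the copied anchor back to its HTML markup.
    $anchors[] = trim($temp->saveHTML());
}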
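As an alternative sketch, saveHTML() accepts a node argument (PHP 5.3.6 and later), which avoids the temporary document. The sample HTML and the example.* host names below are made up for illustration.

```php
<?php
// Sample input (an assumption): one anchor to skip, one to keep.
$html = '<p><a href="http://skip.example/some value">Skip</a> '
      . '<a href="http://keep.example/">Keep</a></p>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$anchors = [];
foreach ($dom->getElementsByTagName('a') as $node) {
    // Keep anchors whose href does NOT contain the value.
    if (strpos($node->getAttribute('href'), 'some value') === false) {
        // Serialize just this node, including its tag and attributes.
        $anchors[] = $dom->saveHTML($node);
    }
}

print_r($anchors);
```

The stored strings are re-serialized markup, so attribute quoting or whitespace may differ slightly from the input even though the anchor is equivalent.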

DOMDOCUMENT | PHP: save getElementById output into new HTML file

I'm trying to save the result of getElementById using PHP.
The code I have:
<?php
$doc = new DOMDocument();
$doc->validateOnParse = true;
#$doc->loadHTMLfile('test.htm');
$div = $doc->getElementById('storytext');
echo $doc->saveHTML($div);
?>
This displays the relevant text, I now want to save that to a new file, I have tried using save(), saveHTMLfile() and file_put_contents(), none of those work because they only save strings and I cannot turn $div into a string, so I'm stuck.
If I just save the entire thing:
$doc->saveHTMLfile('name.ext');
It works but it saves everything, not just the part that I need.
I'm a complete DOM noob so I may be missing something very simple but I can't really find much about this through my searches.
function getInnerHtml( $node ) {
$innerHTML= '';
$children = $node->childNodes;
foreach ($children as $child) {
$innerHTML .= $child->ownerDocument->saveXML( $child );
}
return $innerHTML;
}
$html = getInnerHtml($div);
file_put_contents("name.ext", $html);
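Since saveHTML($div) already returns a string, it can also be passed straight to file_put_contents(). The sketch below assumes a small inline document and writes to a temporary file; both are stand-ins for test.htm and name.ext from the question.

```php
<?php
$doc = new DOMDocument();
// Inline sample markup (an assumption) in place of test.htm.
$doc->loadHTML(
    '<html><body><div id="storytext"><p>Once upon a time.</p></div>'
    . '<div>other</div></body></html>'
);

$div = $doc->getElementById('storytext');

// A temporary file stands in for the real output path.
$outFile = tempnam(sys_get_temp_dir(), 'story');

// saveHTML() with a node argument serializes that node only.
file_put_contents($outFile, $doc->saveHTML($div));

$saved = file_get_contents($outFile);
echo $saved, "\n";
```

This writes the div together with its own tag (the outer HTML), whereas the getInnerHtml() function above returns only the children.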

Dom replace entire node

Right now, i have this:
$text = $row->text;
$dom = new DOMDocument();
$dom->loadHTML($text);
$tags = $dom->getElementsByTagName('img');
foreach ($tags as $tag) {
$eg = $tag->getAttribute('data-easygal');
$src = $tag->getAttribute('src');
$values = explode("_",$eg);
$display = $this->prepareAlbum($values[0],$values[1],$src);
}
$row->text = $text;
Is there a way to replace the whole node $tag with what's in the $display string? I can't seem to find out how to str_replace the node, for example.
I used to have preg_replace, but that doesn't work properly on the client's server even though it works at home (and there is some instant anger from the PHP community about preg and HTML).
I tried searching the board, but had no luck finding what I need :S
Something like:
foreach($tags as &$tag) {
...
$tag = new DomNode();
}
Try
$tag->parentNode->replaceChild($newNode, $tag);
This should replace the $tag node with $newNode, a DOM node that you create in the usual way (it has to belong to the same document as $tag).
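Putting this together with a replacement that starts out as a string, a minimal sketch could look like the following. The renderAlbum() function is a hypothetical stand-in for the prepareAlbum() method from the question, and the sample markup is made up.

```php
<?php
// Hypothetical stand-in for prepareAlbum(): returns well-formed markup.
function renderAlbum(string $src): string
{
    return '<span class="album">album for ' . htmlspecialchars($src) . '</span>';
}

$dom = new DOMDocument();
$dom->loadHTML('<p>Before <img src="a.png" data-easygal="1_2"> after</p>');

// Copy the live node list to an array first, because replacing nodes
// while iterating the live list can skip elements.
$tags = iterator_to_array($dom->getElementsByTagName('img'));

foreach ($tags as $tag) {
    $display = renderAlbum($tag->getAttribute('src'));

    // Turn the string into nodes, then swap them in for the img.
    $fragment = $dom->createDocumentFragment();
    $fragment->appendXML($display);
    $tag->parentNode->replaceChild($fragment, $tag);
}

$p = $dom->getElementsByTagName('p')->item(0);
$result = $dom->saveHTML($p);
echo $result, "\n";
```

The modified HTML then comes from saveHTML() rather than from the original $text string, which is never changed by DOM operations.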

Remove HTML element from parsed HTML document on a condition

I've parsed a HTML document using Simple PHP HTML DOM Parser. In the parsed document there's a ul-tag with some li-tags in it. One of these li-tags contains one of those dreaded "Add This" buttons which I want to remove.
To make this worse, the list item has no class or id, and it is not always in the same position in the list. So there is no easy way (correct me if I'm wrong) to remove it with the parser.
What I want to do is to search for the string 'addthis.com' in all li-elements and remove any element that contains that string.
<ul>
<li>Foobar</li>
<li>addthis.com</li><!-- How do I remove this? -->
<li>Foobar</li>
</ul>
FYI: This is purely a hobby project in my quest to learn PHP and not a case of content theft for profit.
All suggestions are welcome!
Couldn't find a method to remove nodes explicitly, but can remove with setting outertext to empty.
$html = new simple_html_dom();
$html->load(file_get_contents("test.html"), false, false); // preserve formatting
foreach($html->find('ul li') as $element) {
if (count($element->find('a.addthis_button')) > 0) {
$element->outertext="";
}
}
echo $html;
Well what you can do is use jQuery after the parsing. Something like this:
$('li').each(function(i) {
if($(this).html() == "addthis.com"){
$(this).remove();
}
});
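This solution uses the DOMDocument class and the DOMNode::removeChild method. The matching elements are collected first and removed in a second loop, so the node list is not modified while it is being iterated: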
$str="<ul><li>Foobar</li><li>addthis.com</li><li>Foobar</li></ul>";
$remove='addthis.com';
$doc = new DOMDocument();
$doc->loadHTML($str);
$elements = $doc->getElementsByTagName('li');
$domElemsToRemove = array();
foreach ($elements as $element) {
$pos = strpos($element->textContent, $remove); // or similar $element->nodeValue
if ($pos !== false) {
$domElemsToRemove[] = $element;
}
}
foreach( $domElemsToRemove as $domElement ){
$domElement->parentNode->removeChild($domElement);
}
$str = $doc->saveHTML(); // <ul><li>Foobar</li><li>Foobar</li></ul>
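The same removal can also be sketched with DOMXPath, which selects the matching li elements in one query. This is an alternative to the two-loop version, not something from the answers above; the variable names are chosen for illustration.

```php
<?php
$str = '<ul><li>Foobar</li><li>addthis.com</li><li>Foobar</li></ul>';

$doc = new DOMDocument();
$doc->loadHTML($str);

$xpath = new DOMXPath($doc);

// Select every li whose text content contains the string.
$matches = $xpath->query('//li[contains(., "addthis.com")]');

// The query result is not a live list, so removing during iteration is safe.
foreach ($matches as $li) {
    $li->parentNode->removeChild($li);
}

$ul = $doc->getElementsByTagName('ul')->item(0);
$result = $doc->saveHTML($ul);
echo $result, "\n";
```

The search string is hard-coded into the XPath expression here; building the expression from user input would need care with quoting.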
