Dom replace entire node - php

Right now, I have this:
$text = $row->text;
$dom = new DOMDocument();
$dom->loadHTML($text);
$tags = $dom->getElementsByTagName('img');
foreach ($tags as $tag) {
    $eg = $tag->getAttribute('data-easygal');
    $src = $tag->getAttribute('src');
    $values = explode("_", $eg);
    $display = $this->prepareAlbum($values[0], $values[1], $src);
}
$row->text = $text;
Is there a way to replace the whole node $tag with what's in the $display string? I can't work out how to, for example, str_replace the node.
I used to use preg_replace, but that doesn't work properly on the client's server even though it works at home (and it drew some instant anger from the PHP community about using preg on HTML).
I tried searching the board, but had no luck finding what I need.

Something like:
foreach($tags as &$tag) {
...
$tag = new DomNode();
}

Try
$tag->parentNode->replaceChild($newNode, $tag);
This should replace the $tag node with $newNode, a DOM node that you create in the usual way.
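A minimal sketch of how that could fit into the loop from the question. renderAlbum() is a hypothetical stand-in for the asker's prepareAlbum(), and the sample markup is invented for illustration. It assumes the generated markup is well-formed XML, because DOMDocumentFragment::appendXML() parses its input as XML rather than as HTML.

```php
<?php
// Hypothetical stand-in for the asker's prepareAlbum(); it only needs to
// return well-formed markup, because appendXML() parses its input as XML.
function renderAlbum(string $album, string $image, string $src): string
{
    return sprintf(
        '<div class="album" data-album="%s" data-image="%s" data-src="%s"></div>',
        htmlspecialchars($album),
        htmlspecialchars($image),
        htmlspecialchars($src)
    );
}

// Invented sample input standing in for $row->text.
$text = '<p>Intro</p><img data-easygal="12_3" src="a.jpg">';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($text);
libxml_clear_errors();

// getElementsByTagName() returns a live list, so copy it before changing the tree.
foreach (iterator_to_array($dom->getElementsByTagName('img')) as $tag) {
    $values = explode('_', $tag->getAttribute('data-easygal'));
    if (count($values) < 2) {
        continue; // not a gallery placeholder; leave the image alone
    }
    $fragment = $dom->createDocumentFragment();
    if ($fragment->appendXML(renderAlbum($values[0], $values[1], $tag->getAttribute('src')))) {
        $tag->parentNode->replaceChild($fragment, $tag);
    }
}

// The modified markup has to be read back from the DOM; $text itself is unchanged.
$result = $dom->saveHTML();
```

Note that saveHTML() without arguments serializes the whole document, including the doctype, html and body wrappers that loadHTML() adds around a fragment.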

Remove <div> innerHTML with php

I am trying to change an HTML page through PHP. The idea is to reinvent the "contenteditable" attribute and change text on the fly, but I want to save the result in the original HTML file.
For this I have some initial text in a div element. I convert it to a form with a textarea, reload the page, and then I can play with the text. Next I want to write the content of the textarea back into the original div, where it should replace the old text. It seems to work, except that the old text always remains and I cannot get rid of it. The problem is probably in the setInnerHTML function. I tried:
$element->parentNode->removeChild($element);
but it did not work for some reason.
Thanks!
<?php
$text = $_POST["text"];
$id = $_GET["id"];
$ref = $_GET["ref"];
$html = new DOMDocument();
$html->loadHTMLFile($ref.".html");
$html->preserveWhiteSpace = false;
$html->formatOutput = true;
$elem = $html->getElementById($id);
function setInnerHTML($DOM, $element, $innerHTML)
{
    $DOM->deleteTextNode($innerHTML);
    $element->parentNode->removeChild($element);
    $node = $DOM->createTextNode($innerHTML);
    $element->appendChild($node);
}
setInnerHTML($html, $elem, $text);
$html->saveHTMLFile($ref.".html");
?>
Try changing your setInnerHTML to look like this:
function setInnerHTML($DOM, $element, $innerHTML) {
    // childNodes is a live list, so remove the first child until none remain
    // rather than removing nodes while iterating over the list with foreach.
    while ($element->firstChild) {
        $element->removeChild($element->firstChild);
    }
    $element->appendChild($DOM->createTextNode($innerHTML));
}
Tell me if this gives the result you wanted.
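One caveat when emptying an element: childNodes is a live list, so removing nodes while iterating over it with foreach can skip some of them. The sketch below loops on firstChild instead and runs against an invented sample document; the function name replaceChildrenWithText and the markup are assumptions for illustration, not part of the original code.

```php
<?php
// Remove every child of $element, then add $text as its only text node.
function replaceChildrenWithText(DOMDocument $dom, DOMElement $element, string $text): void
{
    while ($element->firstChild !== null) {
        $element->removeChild($element->firstChild);
    }
    // createTextNode() stores the string as plain text; any markup in it is
    // escaped when the document is serialized.
    $element->appendChild($dom->createTextNode($text));
}

$dom = new DOMDocument();
$dom->loadHTML('<div id="note">old <b>bold</b> text</div>');

$div = $dom->getElementById('note');
replaceChildrenWithText($dom, $div, 'new text');

$result = $dom->saveHTML($div); // serialize only the div
```

Because the new content goes in as a text node, this suits plain text from a textarea; if the submitted value is meant to contain HTML tags, it would need to be parsed into nodes first.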

DOMDOCUMENT | PHP: save getElementById output into new HTML file

I'm trying to save the result of getElementById using PHP.
The code I have:
<?php
$doc = new DOMDocument();
$doc->validateOnParse = true;
@$doc->loadHTMLFile('test.htm');
$div = $doc->getElementById('storytext');
echo $doc->saveHTML($div);
?>
This displays the relevant text. I now want to save that to a new file. I have tried save(), saveHTMLFile() and file_put_contents(), but none of them work because they only save strings and I cannot turn $div into a string, so I'm stuck.
If I just save the entire thing:
$doc->saveHTMLfile('name.ext');
It works but it saves everything, not just the part that I need.
I'm a complete DOM noob so I may be missing something very simple but I can't really find much about this through my searches.
function getInnerHtml( $node ) {
    $innerHTML = '';
    $children = $node->childNodes;
    foreach ($children as $child) {
        $innerHTML .= $child->ownerDocument->saveXML( $child );
    }
    return $innerHTML;
}
$html = getInnerHtml($div);
file_put_contents("name.ext", $html);
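If the wrapping element should be kept as well, note that DOMDocument::saveHTML() already returns a string when it is given a node, so its result can go straight to file_put_contents(). A minimal sketch with invented sample markup, writing to a temporary file rather than to the name.ext used above:

```php
<?php
$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML(
    '<html><body><div id="storytext"><p>Once upon a time</p></div><p>Footer</p></body></html>'
);
libxml_clear_errors();

$div = $doc->getElementById('storytext');

// saveHTML($node) serializes just that node (outer HTML) and returns a string.
$outer = $doc->saveHTML($div);

// Temporary target path, used here only so the sketch has no side effects elsewhere.
$target = tempnam(sys_get_temp_dir(), 'story');
file_put_contents($target, $outer);
```

getInnerHtml() above returns only the children of the node, while saveHTML($div) includes the div itself; which one fits depends on whether the wrapper is wanted in the new file.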

Modify html attribute with php

I have a html string that contains exactly one a-element in it. Example:
test
In php I have to test if rel contains external and if yes, then modify href and save the string.
I have looked for DOM nodes and objects. But they seem to be too much for only one A-element, as I have to iterate to get html nodes and I am not sure how to test if rel exists and contains external.
$html = new DOMDocument();
$html->loadHTML($txt);
$a = $html->getElementsByTagName('a');
$attr = $a->item(0)->attributes;
...
At this point I get a DOMNamedNodeMap, which seems like overhead. Is there any simpler way to do this, or should I do it with DOM?
Is there any simpler way to do this, or should I do it with DOM?
Do it with DOM.
Here's an example:
<?php
// Sample markup for illustration: one anchor whose rel contains "external"
$html = '<a href="http://example.com" rel="nofollow external">test</a>';
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query("//a[contains(concat(' ', normalize-space(@rel), ' '), ' external ')]");
foreach ($nodes as $node) {
    $node->setAttribute('href', 'http://example.org');
}
echo $dom->saveHTML();
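The XPath predicate in that example pads the rel value with spaces so that "external" matches only as a whole space-separated token, not as part of a longer word. A small sketch showing the difference, using invented sample anchors:

```php
<?php
// Three sample anchors: a real "external" token, a near miss, and no rel at all.
$html = '<p><a href="/a" rel="nofollow external">one</a> '
      . '<a href="/b" rel="externals">two</a> '
      . '<a href="/c">three</a></p>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// normalize-space() trims and collapses whitespace; concat() adds one space on
// each side, so ' external ' can only match a complete token.
$query = "//a[contains(concat(' ', normalize-space(@rel), ' '), ' external ')]";

$matched = [];
foreach ($xpath->query($query) as $node) {
    $matched[] = $node->getAttribute('href');
}
// $matched now holds only '/a'
```

A plain substring test such as strpos($rel, 'external') would also accept rel="externals", which is why the token-based form is usually preferred for space-separated attributes like rel and class.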
I went ahead and did the modification with DOM. This is what I ended up with:
$html = new DOMDocument();
$html->loadHTML('<?xml encoding="utf-8" ?>' . $txt);
$nodes = $html->getElementsByTagName('a');
foreach ($nodes as $node) {
    foreach ($node->attributes as $att) {
        if ($att->name == 'rel') {
            // strpos() returns 0 when 'external' is at the start, so compare against false
            if (strpos($att->value, 'external') !== false) {
                $node->setAttribute('href', 'modified_url_goes_here');
            }
        }
    }
}
$txt = $html->saveHTML();
I did not want to load any other library for just this one string.
The best way is to use an HTML parser/DOM, but here's a regex solution:
$html = 'test<br>
<p> Some text</p>
test2<br>
<a rel="external">test3</a> <-- This won\'t work since there is no href in it.
';
$new = preg_replace_callback('/<a.+?rel\s*=\s*"([^"]*)"[^>]*>/i', function($m){
    if (strpos($m[1], 'external') !== false) {
        $m[0] = preg_replace('/href\s*=\s*(("[^"]*")|(\'[^\']*\'))/i', 'href="http://example.com"', $m[0]);
    }
    return $m[0];
}, $html);
echo $new;
You could use a regular expression: if the string matches
/\s+rel\s*=\s*".*external.*"/
then do a regex replace like
/(<a.*href\s*=\s*")([^"]*)("[^>]*>)/\1[your new href here]\3/
Using a library that can do this kind of thing for you is much easier, though (like jQuery for JavaScript).

Stripping <ins> and <del> tags from <script> tags

I have some code that is generating a diff between two documents, inserting <ins> and <del> tags haphazardly. For the most part it's doing a great job, but every now and then it inserts tags in script, style and the title tags.
Any ideas on how to remove the <del> tags (including the text between them), remove the <ins> tags (but retaining the text within them as part of the original string), however only within those three tags? (title, script and style).
Don't use regex to do this; it sounds like you have to deal with many, many lines. DOMDocument is great.
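$dom = new DOMDocument;
$dom->loadHTML($your_html_string);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//script|//title|//style') as $node) {
    // Copy the live node lists before modifying the tree, and remove or replace
    // each node through its own parent, since it may not be a direct child.
    foreach (iterator_to_array($node->getElementsByTagName('del')) as $delNode) {
        $delNode->parentNode->removeChild($delNode);
    }
    foreach (iterator_to_array($node->getElementsByTagName('ins')) as $insNode) {
        $insNode->parentNode->replaceChild($dom->createTextNode($insNode->nodeValue), $insNode);
    }
}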
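Untested, this may or may not work:
$str = preg_replace('~(<script.*?>.*?)<del>.*?</del>(.*?</script>)~is', '$1$2', $str);
It attempts to look within the <script> ... </script> block of the string and replace an instance of <del>...</del> with an empty string. The pattern uses ~ as the delimiter so that the slashes in the closing tags need no escaping, and the s flag so that . also matches newlines.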
The following ended up working quite well for me:
$tags = array('script', 'title', 'style');
foreach ($tags as $tag) {
    $str = preg_replace_callback(
        '/(<' . ($tag) . '\b[^>]*>)(.*?)(<\/' . ($tag) . '>)/is',
        function($match) {
            $replaced = preg_replace(
                array(
                    '/__Delete-Start__.+__Delete-End__/Uis',
                    '/__Insert-Start__(.+)__Insert-End__/Uis'
                ),
                array(
                    '',
                    '$1'
                ),
                $match[2]
            );
            return ($match[1]) . ($replaced) . ($match[3]);
        },
        $str
    );
}
While the following didn't end up being my solution, it did get me far and could be useful to others:
$dom = new DOMDocument;
$dom->loadHTML($str);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//script|//title|//style') as $node) {
    // Copy the live node lists before modifying the tree, and remove or replace
    // each node through its own parent, since it may not be a direct child.
    foreach (iterator_to_array($node->getElementsByTagName('del')) as $delNode) {
        $delNode->parentNode->removeChild($delNode);
    }
    foreach (iterator_to_array($node->getElementsByTagName('ins')) as $insNode) {
        $insNode->parentNode->replaceChild($dom->createTextNode($insNode->nodeValue), $insNode);
    }
}
$str = (string) $dom->saveXML($dom, LIBXML_NOEMPTYTAG);
Hope this helps someone else.

php regular expression for matching anchor tags

Go to the source of this page: www.songs.pk/indian/7days.html
There are only eight links that start with http://link1
For example: Tune Mera Naam Liya
I want a PHP regular expression that matches the
http://link1.songs.pk/song1.php?songid=2792
and
Tune Mera Naam Liya
Thanks.
You're better off using something like simplehtmldom to find all links, then picking out the links with the relevant text / href.
Parsing HTML with regex isn't always the best solution, and in your case I feel it will bring you only pain.
$href = 'some_href';
$inner_text = 'some text';
$desired_anchors = array();
$html = file_get_html('your_file_or_url');
// Find all anchors; find() returns an array of element objects
foreach ($html->find('a') as $anchor) {
    if ($anchor->href == $href && $anchor->innertext == $inner_text) {
        $desired_anchors[] = $anchor;
    }
}
print_r($desired_anchors);
That should get you started.
Don't use a regex, buddy! PHP has a better-suited tool for this...
$dom = new DOMDocument;
$dom->loadHTML($str);
$matchedAnchors = array();
$anchors = $dom->getElementsByTagName('a');
$match = 'http://link1';
foreach ($anchors as $anchor) {
    if ($anchor->hasAttribute('href') && substr($anchor->getAttribute('href'), 0, strlen($match)) == $match) {
        $matchedAnchors[] = $anchor;
    }
}
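Since the question asks for both the URL and the link text, here is a hedged sketch of how the loop above could collect the two together. The sample markup is an assumption built from the URL and title quoted in the question; the real page was not available for checking.

```php
<?php
// Assumed sample markup: one matching link and one that should be ignored.
$str = '<a href="http://link1.songs.pk/song1.php?songid=2792">Tune Mera Naam Liya</a> '
     . '<a href="http://example.com/">Other</a>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // real-world pages usually trigger parser warnings
$dom->loadHTML($str);
libxml_clear_errors();

$prefix = 'http://link1';
$links = [];
foreach ($dom->getElementsByTagName('a') as $anchor) {
    $href = $anchor->getAttribute('href'); // empty string when the attribute is missing
    if (strncmp($href, $prefix, strlen($prefix)) === 0) {
        // textContent is the visible link text with any nested tags stripped
        $links[$href] = trim($anchor->textContent);
    }
}
// $links maps each matching URL to its link text
```

Keying the result by URL drops duplicate links; an array of [href, text] pairs would be the alternative if the same URL can appear with different titles.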
Here you go:
preg_match_all('~<a .*href="(http://link1\..*)".*>(.*)</a>~Ui',$str,$match,PREG_SET_ORDER);
print_r($match);
