I am trying to append a piece of HTML that contains a template placeholder in attribute position, such as {{ some_attr }}, i.e. an attribute with no value. For example:
<?php
$pageHTML = '<!doctype html>
<html>
<head>
</head>
<body>
<div id="root">Initial content</div>
</body>
</html>';
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($pageHTML);
libxml_use_internal_errors(false);
$tmplCode = '<div {{ some_attr }}>New content</div>';
foreach($dom->getElementsByTagName('body')[0]->getElementsByTagName('*') as $node) {
if($node->getAttribute('id') == 'root') {
$fragment = $dom->createDocumentFragment();
$fragment->appendXML($tmplCode);
$node->appendChild($fragment);
}
}
echo $dom->saveHTML((new \DOMXPath($dom))->query('/')->item(0));
?>
Because appendXML() rejects the valueless attribute (the markup is not well-formed XML), the fragment stays empty and my div with "New content" is never added.
I've tried
$dom->loadHTML($pageHTML, LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);
and
foreach (libxml_get_errors() as $error) {
// Ignore unknown tag errors
if ($error->code === 801) continue;
throw new Exception("Could not parse template");
}
libxml_clear_errors();
before saveHTML(), as described in https://stackoverflow.com/a/39671548
I've also tried
##$fragment = $dom->createDocumentFragment();
##$fragment->appendXML($tmplCode);
as mentioned in https://stackoverflow.com/a/15998516
But none of these solutions work.
Is it possible to append markup with an empty attribute using appendXML()?
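The failure can be reproduced in isolation. The following is a minimal sketch that assumes only the template string from the question; the second, well-formed string is a hypothetical comparison case. It suggests that appendXML() returns false for markup that is not well-formed XML, which would explain why nothing gets appended.

```php
<?php
$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser errors instead of emitting warnings

// Template from the question: a bare placeholder is not a valid XML attribute.
$bad = $dom->createDocumentFragment();
$badOk = $bad->appendXML('<div {{ some_attr }}>New content</div>');

// Hypothetical comparison case: the same element with a well-formed attribute.
$good = $dom->createDocumentFragment();
$goodOk = $good->appendXML('<div some_attr="">New content</div>');

libxml_clear_errors();
libxml_use_internal_errors(false);

var_dump($badOk, $goodOk);
```

Checking the return value of appendXML() before calling appendChild() makes this kind of silent failure visible.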
OK, I've just found a solution at https://stackoverflow.com/a/4401089/3208225:
...
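if ($node->getAttribute('id') == 'root') {
    // Parse the template with the lenient HTML parser instead of appendXML()
    $tmpDoc = new DOMDocument();
    libxml_use_internal_errors(true); // silence any parser warnings about the placeholder
    $tmpDoc->loadHTML($tmplCode);
    libxml_use_internal_errors(false);
    // Clear the old content once, before the loop, so earlier imports are not wiped
    $node->nodeValue = '';
    foreach ($tmpDoc->getElementsByTagName('body')->item(0)->childNodes as $newNode) {
        $node->appendChild($dom->importNode($newNode, true));
    }
}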
...
I have a function that gets all <h2> elements using DOMDocument.
Now I want to check whether there is any HTML tag inside an <h2>, i.e. <h2>[here]</h2>. If there is, I want to skip that <h2> and move on to the next one.
My Code:
foreach ($DOM->getElementsByTagName('*') as $element) {
if ($element->tagName == 'h2') {
$h = $element->textContent;
}
}
I think the easiest thing is to just reuse getElementsByTagName("*") on the element and count how many items are found.
$html = <<<EOT
<html><body><h2>Hello</h2> <h2>World</h2><h2><strong>!</strong></h2></body></html>
EOT;
$dom = new DOMDocument();
$dom->loadHTML($html);
foreach($dom->getElementsByTagName('h2') as $h2) {
if(!count($h2->getElementsByTagName('*'))){
var_dump($h2->textContent);
}
}
Demo here: https://3v4l.org/dI1e4
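The same filter can also be expressed in XPath. This is a sketch of an alternative, not the approach used above: the predicate not(*) keeps only <h2> elements that have no element children. It reuses the sample markup from this answer.

```php
<?php
$html = '<html><body><h2>Hello</h2> <h2>World</h2><h2><strong>!</strong></h2></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Collect the text of every <h2> that contains no child elements.
$plain = [];
foreach ($xpath->query('//h2[not(*)]') as $h2) {
    $plain[] = $h2->textContent;
}

var_dump($plain);
```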
I have the following:
$node = $doc->getElementsByTagName('img');
if ($node->item(0) == null || $node->item(0) == '') {
// do stuff
} elseif ($node->item(0)->hasAttribute('src')) {
// do other stuff
} else {
// do more other stuff
}
What I want is to only return images from the body tag.
I have tried:
$body = $doc->getElementsByTagName('body');
foreach ($body as $body_node) {
$node = $body_node->getElementsByTagName('img');
}
However, if there is an image in the head, it still seems to be returned by
$node->item(0)->hasAttribute('src')
There should never be an img in the head, but I find that some URLs add one inside a noscript tag in the head.
So how do I return only images from the body tag, excluding any found in the head tag?
Do it using DOMXPath:
$xpath = new DOMXpath($doc);
$nodes = $xpath->query('//body//img');
$nodes is now a DOMNodeList that you can iterate over.
If you only want img nodes that have a src attribute:
$nodes = $xpath->query('//body//img[@src]');
Edit: Here is a fully working example:
<?php
$contents = file_get_contents('http://stackoverflow.com/');
$doc = new DOMDocument();
$doc->loadHTML($contents);
$xpath = new DOMXpath($doc);
$nodes = $xpath->query('//body//img');
foreach ($nodes as $node) {
echo $node->getAttribute('src') . "\n";
}
I need to set a class to parent of each text node inside of specific block on my page.
Here is what I'm trying to do:
$pageHTML = '<html><head></head>
<body>
<header>
<div>
<nav>Menu</nav>
<span>Another text</span>
</div>
</header>
<section>Section</section>
<footer>Footer</footer>
</body>
</html>';
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($pageHTML);
libxml_use_internal_errors(false);
foreach($dom->getElementsByTagName('body')[0]->childNodes as $bodyChild) {
if($bodyChild->nodeName == 'header') {
$blockDoc = new DOMDocument();
$blockDoc->appendChild($blockDoc->importNode($bodyChild, true));
$xpath = new DOMXpath($blockDoc);
foreach($xpath->query('//text()') as $textnode) {
if(preg_match('/\S/', $textnode->nodeValue)) { // exclude non-characters
$textnode->parentNode->setAttribute('class','my_class');
}
}
}
}
echo $dom->saveHTML((new \DOMXPath($dom))->query('/')->item(0));
I need <nav> and <span> inside <header> to get the my_class class, but they don't.
As I understand it, I need to put the changed parents back into the DOM after setting the class on them, but how can I do that?
OK, I've found the answer myself: run the XPath query against the original document, using the <header> node as the context node.
...
$xpath = new DOMXpath($dom);
foreach($dom->getElementsByTagName('body')[0]->childNodes as $bodyChild) {
if($bodyChild->nodeName == 'header') {
foreach($xpath->query('.//text()', $bodyChild) as $textnode) {
if(preg_match('/\S/', $textnode->nodeValue)) { // exclude non-characters
$textnode->parentNode->setAttribute('class','my_class');
}
}
}
}
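A more compact variant is sketched below. It assumes a shortened copy of the markup from the question and moves the whitespace check into the XPath expression with normalize-space(), which is a close but not exact substitute for the /\S/ regular expression. Because the query runs on the original document, the changes show up in the output without any re-importing.

```php
<?php
$html = '<html><body><header><div><nav>Menu</nav> <span>Another text</span></div></header>'
      . '<section>Section</section></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // HTML5 tags may be reported as unknown by libxml
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors(false);

$xpath = new DOMXPath($dom);
// Select only the text nodes under <header> that contain non-whitespace characters.
foreach ($xpath->query('//header//text()[normalize-space()]') as $textNode) {
    if ($textNode->parentNode instanceof DOMElement) {
        $textNode->parentNode->setAttribute('class', 'my_class');
    }
}

echo $dom->saveHTML();
```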
Try this code. It gets the nodes by name using getElementsByTagName instead of looking for text nodes.
$pageHTML = '<html>
<head></head>
<body>
<header>
<div>
<nav>Menu</nav>
<span>Another text</span>
</div>
</header>
<section>Section</section>
<footer>Footer</footer>
</body>
</html>';
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($pageHTML);
libxml_use_internal_errors(false);
$elements = $dom->getElementsByTagName('header');
foreach ($elements as $node) {
$nav = $node->getElementsByTagName('nav');
$span = $node->getElementsByTagName('span');
$nav->item(0)->setAttribute('class', 'my_class');
$span->item(0)->setAttribute('class', 'my_class');
}
echo $dom->saveHTML();
Given the following HTML:
$content = '<html>
<body>
<div>
<p>During the interim there shall be nourishment supplied</p>
</div>
</body>
</html>';
How can I alter it to the following HTML:
<html>
<body>
<div>
<p>During the <span>interim</span> there shall be nourishment supplied</p>
</div>
</body>
</html>
I need to do this using DomDocument. Here's what I've tried:
$dom = new DomDocument();
$dom->loadHTML($content);
$dom->preserveWhiteSpace = false;
$xpath = new DOMXpath($dom);
$elements = $xpath->query("//*[contains(text(),'interim')]");
if (!is_null($elements)) {
foreach ($elements as $element) {
$text = $element->nodeValue;
$element->nodeValue = str_replace('interim','<span>interim</span>',$text);
}
}
echo $dom->saveHTML();
However, this outputs literal html entities so it renders like this in the browser:
During the <span>interim</span> there shall be nourishment supplied
I imagine one should use createElement and appendChild methods instead of assigning nodeValue directly but I can't see how to insert an element in the middle of a textNode string?
Marcus Harrison's answer using splitText is a good one, but it can be simplified and needs to use mb_* methods to work with UTF-8 input:
<?php
$html = <<<END
<html>
<meta charset="utf-8">
<body>
<div>
<p>During € the interim there shall be nourishment supplied</p>
</div>
</body>
</html>
END;
$replace = 'interim';
$doc = new DOMDocument;
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$nodes = $xpath->query(sprintf('//text()[contains(., "%s")]', $replace));
foreach ($nodes as $node) {
$start = mb_strpos($node->textContent, $replace);
$end = $start + mb_strlen($replace);
$node->splitText($end); // do this first
$node->splitText($start); // do this last
$newnode = $doc->createElement('span');
$node->parentNode->insertBefore($newnode, $node->nextSibling);
$newnode->appendChild($newnode->nextSibling);
}
$doc->encoding = 'UTF-8';
print $doc->saveHTML($doc->documentElement);
Create a new DomDocument with the modified element and replace the old one:
foreach ($elements as $element) {
$text = $element->nodeValue;
$el = new DomDocument();
$el->loadHTML('<iframe>'. str_replace('interim','<span>interim</span>',$text) . '</iframe>');
$new = $dom->importNode($el->getElementsByTagName('iframe')->item(0), true);
unset($el);
$element->parentNode->replaceChild($new, $element);
}
In order to do this, you must use the splitText method of DOMText. It accepts an offset, which can be retrieved using strpos:
$dom = new DomDocument();
$dom->loadHTML($content);
$dom->preserveWhiteSpace = false;
$xpath = new DOMXpath($dom);
$elements = $xpath->query("//*[contains(text(),'interim')]");
if (!is_null($elements)) {
foreach ($elements as $element) {
$text = $element->childNodes->item(0);
$text->splitText(strpos($text->textContent, "interim"));
$text2 = $element->childNodes->item(1);
$text2->splitText(strpos($text2->textContent, " "));
$element->removeChild($text2);
$span = $dom->createElement("span");
$span->appendChild($dom->createTextNode("interim"));
$element->insertBefore($span, $element->childNodes->item(1));
}
}
echo $dom->saveHTML();
Edits: having just tested it, I realise I hadn't removed the original "interim" in the second text node. Edited this answer to do that. I have also edited this code to be as compatible with old versions of PHP as I can think of making it: as I don't run an old version of PHP it isn't possible for me to test that.
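To see what splitText does on its own, here is a minimal sketch on a hand-built paragraph. The shortened sentence and the variable names are illustrative assumptions. Each call cuts the text node at the given offset and returns the new node holding the remainder, so two calls isolate the word to wrap.

```php
<?php
$doc = new DOMDocument();
$p = $doc->createElement('p');
$doc->appendChild($p);
$text = $doc->createTextNode('During the interim there');
$p->appendChild($text);

// First cut: $text keeps "During the ", $word starts at "interim".
$word = $text->splitText(strpos($text->data, 'interim'));
// Second cut: $word keeps "interim", the remainder " there" becomes a sibling.
$word->splitText(strlen('interim'));

// Put a <span> where the isolated word was, then move the word into it.
$span = $doc->createElement('span');
$p->replaceChild($span, $word);
$span->appendChild($word);

echo $doc->saveHTML($p);
```

The byte offsets from strpos and strlen are only safe here because the sample text is plain ASCII; for multibyte input the mb_* functions are the safer choice.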
Consider the following html:
<html>
<title>Xyz</title>
<body>
<div>
<div class='mycls'>
<div>1 Books</div>
<div>2 Papers</div>
<div>3 Pencils</div>
</div>
</div>
<body>
</html>
$dom = new DOMDocument();
$dom->loadHTML([loaded html of remote url through curl]);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('html/body/div[@class="mycls"]');
Up to here it works fine. I need to replace the node to get the following:
<body>
<div>
<span>
<div>1 Books</div>
<div>2 Papers</div>
<div>3 Pencils</div>
</span>
</div>
<body>
Something like the following should work for you:
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$oldNode = $xpath->query('//div[@class="mycls"]')->item(0);
$span = $dom->createElement('span');
if ($oldNode->hasChildNodes()) {
$children = [];
foreach ($oldNode->childNodes as $child) {
$children[] = $child;
}
foreach ($children as $child) {
$span->appendChild($child->parentNode->removeChild($child));
}
}
$oldNode->parentNode->replaceChild($span, $oldNode);
echo htmlspecialchars($dom->saveHTML());
Demo: http://codepad.viper-7.com/WNTrR5
Note that in the demo I also have fixed your HTML which was utterly broken :-)
If the HTML in your question is really what you get back from the cURL call and you cannot change it (you have no control over it), you can do:
$libxmlErrors = libxml_use_internal_errors(true); // at the start
and
libxml_use_internal_errors($libxmlErrors); // at the end
to prevent the errors from popping up.
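Put together, the pattern might look like the sketch below. It assumes a condensed copy of the markup from the question, including the unclosed <body>. The collected errors can be inspected or logged before they are cleared; how many are reported depends on the libxml version.

```php
<?php
// Markup as posted in the question, condensed onto fewer lines.
$html = "<html><title>Xyz</title><body><div><div class='mycls'>"
      . "<div>1 Books</div><div>2 Papers</div><div>3 Pencils</div>"
      . "</div></div><body></html>";

$previous = libxml_use_internal_errors(true); // at the start
$dom = new DOMDocument();
$dom->loadHTML($html);
$errors = libxml_get_errors();                // inspect or log these if needed
libxml_clear_errors();
libxml_use_internal_errors($previous);        // at the end

$xpath = new DOMXPath($dom);
$items = $xpath->query('//div[@class="mycls"]/div');
printf("%d parse errors, %d items found\n", count($errors), $items->length);
```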