When loading HTML into a <textarea>, I intend to treat different kinds of links differently. Consider the following links:
<a href="http://stackoverflow.com">http://stackoverflow.com</a>
<a href="http://stackoverflow.com">StackOverflow</a>
When the text inside a link matches its href attribute, I want to strip the markup and keep only the URL; otherwise the HTML should remain unchanged.
Here's my code:
$body = 'Some HTML with a <a href="http://stackoverflow.com">http://stackoverflow.com</a>';
$dom = new DOMDocument;
$dom->loadHTML($body, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
foreach ($dom->getElementsByTagName('a') as $node) {
$link_text = $node->ownerDocument->saveHTML($node->childNodes[0]);
$link_href = $node->getAttribute("href");
$link_node = $dom->createTextNode($link_href);
$node->parentNode->replaceChild($link_node, $node);
}
$html = $dom->saveHTML();
The problem with the above code is that DOMDocument encapsulates my HTML into a paragraph tag:
<p>Some HTML with a http://stackoverflow.com</p>
How do I get it to return only the inner HTML of that paragraph?
A valid DOM document needs a single root node.
I suggest wrapping the input in a root <div> so that an existing root element, if any, is not destroyed.
Finally, read the nodeValue of the root node, or strip the wrapper with substr().
$body = 'Some HTML with a <a href="http://stackoverflow.com">http://stackoverflow.com</a>';
$body = '<div>'.$body.'</div>';
$dom = new DOMDocument;
$dom->loadHTML($body, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
foreach ($dom->getElementsByTagName('a') as $node) {
$link_text = $node->ownerDocument->saveHTML($node->childNodes[0]);
$link_href = $node->getAttribute("href");
$link_node = $dom->createTextNode($link_href);
$node->parentNode->replaceChild($link_node, $node);
}
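// serialize the document, then strip the wrapper
$html = $dom->saveHTML();
$html = substr($html, 5, -7); // remove "<div>" (5 characters) and "</div>\n" (7 characters; saveHTML() appends a newline)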
var_dump($html); // "Some HTML with a http://stackoverflow.com"
This also works if the input string is:
<p>Some HTML with a <a href="http://stackoverflow.com">http://stackoverflow.com</a></p>
The output is:
<p>Some HTML with a http://stackoverflow.com</p>
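The question asks for links to be unwrapped only when their text equals their href, whereas the loop above replaces every link. A minimal sketch of that condition follows. The function name unwrapBareLinks is hypothetical, and it collects the inner HTML by serializing the wrapper's children instead of using substr(), which is a swapped-in alternative to the answer's approach.

```php
<?php
// Hypothetical helper (not from the original answer): unwrap <a> elements
// whose visible text is identical to their href attribute.
function unwrapBareLinks(string $body): string
{
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $dom = new DOMDocument();
    // The wrapper <div> gives the parser a single root, so no <p> is added.
    $dom->loadHTML('<div>' . $body . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();

    // Copy the live node list first: replacing nodes while iterating it can skip elements.
    $links = iterator_to_array($dom->getElementsByTagName('a'));
    foreach ($links as $link) {
        $href = $link->getAttribute('href');
        if (trim($link->textContent) === $href) {
            $link->parentNode->replaceChild($dom->createTextNode($href), $link);
        }
    }

    // Serialize only the children of the wrapper, which leaves the wrapper out.
    $html = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $html .= $dom->saveHTML($child);
    }
    return $html;
}

echo unwrapBareLinks(
    'See <a href="http://stackoverflow.com">http://stackoverflow.com</a> and '
    . '<a href="http://stackoverflow.com">StackOverflow</a>'
), "\n";
```

With this input, the first link should come back as plain text while the second one stays a link, because only the first has text equal to its href.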
Related
I'm using a DOM parser to extract the content inside the body tag of an HTML page, something like:
$html = file_get_contents('index.php');
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$body = "";
foreach($dom->getElementsByTagName("body")->item(0)->childNodes as $child) {
$body .= $dom->saveHTML($child);
}
echo $body;
Now I need to put the modified $body variable back into the index.php file, replacing the content inside the body tag. Can I do that with file_put_contents()?
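One possible way to do this is sketched below, assuming the file is plain HTML; embedded PHP tags may not survive a round trip through DOMDocument. The function name replaceBodyContent and its parameters are hypothetical. The new markup is parsed in a temporary document and imported, so it can be ordinary HTML.

```php
<?php
// Hypothetical helper: replace everything inside <body> of an HTML file.
function replaceBodyContent(string $path, string $newBodyHtml): bool
{
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them

    $dom = new DOMDocument();
    $dom->loadHTML(file_get_contents($path));
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return false; // nothing to replace
    }

    // Parse the new markup separately, wrapped in a <div> to give it one root.
    $tmp = new DOMDocument();
    $tmp->loadHTML('<div>' . $newBodyHtml . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();

    // Empty the old body, then import copies of the new nodes into the target document.
    while ($body->firstChild) {
        $body->removeChild($body->firstChild);
    }
    foreach ($tmp->documentElement->childNodes as $child) {
        $body->appendChild($dom->importNode($child, true));
    }

    // file_put_contents() returns the number of bytes written, or false on failure.
    return file_put_contents($path, $dom->saveHTML()) !== false;
}
```

Note that saveHTML() writes the whole document as the parser understood it, so the formatting of the rest of the file may change as well.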
I need to find an element by ID using PHP and then append HTML content to it. It seems simple enough, but I'm new to PHP and can't find the right function for this.
$html = file_get_contents('http://example.com');
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
$descBox = $doc->getElementById('element1');
I just don't know how to do the next step. Any help would be appreciated.
As chris mentioned in his comment, try DOMNode::appendChild, which adds a child node to the selected element, together with DOMDocument::createElement, which creates the element to add:
$html = file_get_contents('http://example.com');
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
//get the element you want to append to
$descBox = $doc->getElementById('element1');
//create the element to append to #element1
$appended = $doc->createElement('div', 'This is a test element.');
//actually append the element
$descBox->appendChild($appended);
Alternatively if you already have an HTML string you want to append you can create a document fragment like so:
$html = file_get_contents('http://example.com');
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
//get the element you want to append to
$descBox = $doc->getElementById('element1');
//create the fragment
$fragment = $doc->createDocumentFragment();
//add content to fragment
$fragment->appendXML('<div>This is a test element.</div>');
//actually append the element
$descBox->appendChild($fragment);
Please note that any elements added with JavaScript will be inaccessible to PHP.
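Both snippets above assume that the element exists and that the string is well-formed XML. A minimal sketch with those two checks added follows; the function name appendToId is hypothetical.

```php
<?php
// Hypothetical helper: append a well-formed XML snippet to the element with the given id.
function appendToId(DOMDocument $doc, string $id, string $xml): bool
{
    $target = $doc->getElementById($id);
    if ($target === null) {
        return false; // no element with that id in the parsed document
    }

    $fragment = $doc->createDocumentFragment();
    // appendXML() expects well-formed XML and returns false otherwise,
    // for example when a tag is left unclosed.
    if (!@$fragment->appendXML($xml)) {
        return false;
    }

    $target->appendChild($fragment);
    return true;
}

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML('<html><body><div id="element1"></div></body></html>');

var_dump(appendToId($doc, 'element1', '<div>This is a test element.</div>'));
var_dump(appendToId($doc, 'missing', '<div>Never appended.</div>'));
```

Returning false instead of calling appendChild() on null avoids a fatal error when the id is not present in the page.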
You can also append this way:
$html = '
<html>
<body>
<ul id="one">
<li>hello</li>
<li>hello2</li>
<li>hello3</li>
<li>hello4</li>
</ul>
</body>
</html>';
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
//get the element you want to append to
$descBox = $doc->getElementById('one');
//create the element to append to #one
$appended = $doc->createElement('li', 'This is a test element.');
//actually append the element
$descBox->appendChild($appended);
echo $doc->saveHTML();
Don't forget to call saveHTML() on the last line.
All I want to do is save the first div with attribute role="main" as a string from an external URL using PHP.
So far I have this:
$doc = new DOMDocument();
@$doc->loadHTMLFile("http://example.com/");
$xpath = new DOMXPath($doc);
$elements = $xpath->query('//div[@role="main"]');
$str = "";
if ($elements->length > 0) {
$str = $elements->item(0)->textContent;
}
echo htmlentities($str);
Unfortunately, $str contains only the text, not the HTML tags.
You can get the HTML via the saveHTML() method.
$str = $doc->saveHTML($elements->item(0));
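To make the difference concrete, here is a small sketch that compares textContent with saveHTML() on the same node. The inline markup stands in for the remote page and is an assumption of this example.

```php
<?php
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
// Inline markup used in place of the external URL (assumption for this sketch).
$doc->loadHTML('<html><body><div role="main"><p>Hello <b>world</b></p></div></body></html>');
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$main = $xpath->query('//div[@role="main"]')->item(0);

$text  = $main->textContent;     // text only, all tags dropped
$outer = $doc->saveHTML($main);  // the element itself, with its markup

// Inner HTML: serialize each child instead of the element itself.
$inner = '';
foreach ($main->childNodes as $child) {
    $inner .= $doc->saveHTML($child);
}

echo htmlentities($outer), "\n";
```

Passing a node to saveHTML() serializes just that node, which is why it keeps the tags that textContent drops.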
I have a string that contains HTML and I would like to insert this HTML in a DOMElement.
For that, I did:
$abstract = '<p xmlns:default="http://www.w3.org/1998/Math/MathML">Test string <formula type="inline"><default:math xmlns="http://www.w3.org/1998/Math/MathML"><default:mi>π</default:mi></default:math></formula></p>';
$dom = new \DOMDocument();
@$dom->loadHTML($abstract);
$frag = $dom->createDocumentFragment();
When I var_dump $frag->nodeValue, I get null. Any idea why?
I am not sure what you expect: you create a new fragment but add no content to it. Even if you did, it would not work, because a document fragment is not a regular node; it is a helper construct for adding an XML fragment to a document.
Here is an example:
$dom = new \DOMDocument();
$body = $dom->appendChild($dom->createElement('body'));
$fragment = $dom->createDocumentFragment();
$fragment->appendXml('<p>first</p>second');
$body->appendChild($fragment);
echo $dom->saveHtml();
Output:
<body><p>first</p>second</body>
Say I have the following link:
<li class="hook">
I_have_underscores
</li>
How would I remove the underscores only in the text and not in the href? I have used str_replace, but this removes all underscores, which isn't ideal.
So basically I would be left with this output:
<li class="hook">
I have underscores
</li>
Any help is much appreciated.
You can use an HTML DOM parser to get the text within the tags, and then run your str_replace() function on the result.
With the parser I linked (str_get_html() and find() below belong to PHP Simple HTML DOM Parser), it is as simple as this:
$html = str_get_html(
'<li class="hook">I_have_underscores</li>');
$links = $html->find('a'); // You can use any css style selectors here
foreach($links as $l) {
$l->innertext = str_replace('_', ' ', $l->innertext);
}
echo $html;
//<li class="hook">I have underscores</li>
That's it.
It's safer to parse HTML with DOMDocument instead of regex. Try this code:
<?php
function replaceInAnchors($html)
{
$dom = new DOMDocument();
// loadHtml() needs mb_convert_encoding() to work well with UTF-8 encoding
// (note: the 'HTML-ENTITIES' target is deprecated as of PHP 8.2)
$dom->loadHtml(mb_convert_encoding($html, 'HTML-ENTITIES', "UTF-8"));
$xpath = new DOMXPath($dom);
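foreach($xpath->query('//text()[(ancestor::a)]') as $node)
{
// only text nodes inside <a> are matched, so attributes such as href are untouched
$replaced = str_replace('_', ' ', $node->nodeValue);
// a text node, unlike appendXML(), also copes with text containing & or <
$newNode = $dom->createTextNode($replaced);
$node->parentNode->replaceChild($newNode, $node);
}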
// get only the body tag with its contents, then trim the body tag itself to get only the original content
return mb_substr($dom->saveXML($xpath->query('//body')->item(0)), 6, -7, "UTF-8");
}
$html = '<li class="hook">
I_have_underscores
</li>';
echo replaceInAnchors($html);