I'm reading in an HTML string from a text editor and need to manipulate some of the elements before saving it to the DB.
What I have is something like this:
<h3>Some Text<img src="somelink.jpg" /></h3>
or
<h3><img src="somelink.jpg" />Some Text</h3>
and I need to put it into the following format
<h3>Some Text</h3><div class="img_wrapper"><img src="somelink.jpg" /></div>
This is the solution that I came up with.
$html = '<html><body>' . $field["data"][0] . '</body></html>';
$dom = new DOMDocument();
$dom->loadHTML($html);
$domNodeList = $dom->getElementsByTagName("img");
// Move img tags out of the h3 and place them right after the h3 tag
foreach ($domNodeList as $domNode) {
if ($domNode->parentNode->nodeName == "h3") {
$parentNode = $domNode->parentNode;
$parentParentNode = $parentNode->parentNode;
$parentParentNode->insertBefore($domNode, $parentNode->nextSibling);
}
}
echo $dom->saveHtml();
You may be looking for a preg_replace
// take a search pattern, wrap the image tag matching parts in a tag
// and put the start and ending parts before the wrapped image tag.
// note: this will not match tags that contain > characters within them,
// and will only handle a single image tag
$output = preg_replace(
'|(<h3>[^<]*)(<img [^>]+>)([^<]*</h3>)|',
'$1$3<div class="img_wrapper">$2</div>',
$input
);
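As a quick check, the pattern above can be run against both input shapes from the question. This is only a sketch: the `$inputs` and `$results` names are made up for the illustration, and the limitations noted in the comments above (a single image, no `>` inside attributes) still apply.

```php
<?php
// Pattern and replacement copied from the answer above.
$pattern = '|(<h3>[^<]*)(<img [^>]+>)([^<]*</h3>)|';
$replacement = '$1$3<div class="img_wrapper">$2</div>';

// The two input shapes from the question (image after or before the text).
$inputs = [
    '<h3>Some Text<img src="somelink.jpg" /></h3>',
    '<h3><img src="somelink.jpg" />Some Text</h3>',
];

$results = [];
foreach ($inputs as $input) {
    // $1 = heading start plus leading text, $2 = the img tag, $3 = trailing text plus </h3>
    $results[] = preg_replace($pattern, $replacement, $input);
}

print_r($results);
```

Both inputs should come out in the same target format, because the text on either side of the image is glued back together before the wrapper is appended.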
I updated the question with the answer, but for good measure, here it is again in the answers section.
$html = '<html><body>' . $field["data"][0] . '</body></html>';
$dom = new DOMDocument();
$dom->loadHTML($html);
$domNodeList = $dom->getElementsByTagName("img");
// Move img tags out of the h3 and place them right after the h3 tag
foreach ($domNodeList as $domNode) {
if ($domNode->parentNode->nodeName == "h3") {
$parentNode = $domNode->parentNode;
$parentParentNode = $parentNode->parentNode;
$parentParentNode->insertBefore($domNode, $parentNode->nextSibling);
}
}
echo $dom->saveHtml();
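Note that the code above moves the image after the heading but does not add the `img_wrapper` div the question asks for. A minimal sketch of one way to add it follows; the function name `wrapHeadingImages` is hypothetical, and it assumes the input is a small HTML fragment like the ones in the question.

```php
<?php
// Hypothetical helper (not from the original post): moves every <img> that is a
// direct child of an <h3> into a <div class="img_wrapper"> placed after that <h3>.
function wrapHeadingImages(string $fragment): string
{
    $dom = new DOMDocument();
    $dom->loadHTML('<html><body>' . $fragment . '</body></html>');

    // Copy the live NodeList into an array because nodes are moved while looping.
    // Reversing keeps several images from one heading in document order.
    $images = array_reverse(iterator_to_array($dom->getElementsByTagName('img')));

    foreach ($images as $img) {
        $heading = $img->parentNode;
        if ($heading->nodeName !== 'h3') {
            continue;
        }
        $wrapper = $dom->createElement('div');
        $wrapper->setAttribute('class', 'img_wrapper');
        $wrapper->appendChild($img); // appendChild moves the node out of the h3
        // A null reference node (no next sibling) makes insertBefore append.
        $heading->parentNode->insertBefore($wrapper, $heading->nextSibling);
    }

    // Serialize only the children of <body>, so the temporary wrapper is dropped.
    $out = '';
    foreach ($dom->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo wrapHeadingImages('<h3><img src="somelink.jpg" />Some Text</h3>');
```

Returning only the body's children also avoids saving the extra html and body tags to the DB.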
I have a function that gets all <h2> elements using DOMDocument.
Now I want to check whether there is any HTML tag inside an <h2> (at [here] in <h2>[here]</h2>); if there is, I want to skip that <h2> and move on to the next one.
My code:
foreach ($DOM->getElementsByTagName('*') as $element) {
if ($element->tagName == 'h2') {
$h = $element->textContent;
}
}
I think the easiest thing is to just reuse getElementsByTagName("*") on the element and count how many items are found.
$html = <<<EOT
<html><body><h2>Hello</h2> <h2>World</h2><h2><strong>!</strong></h2></body></html>
EOT;
$dom = new DOMDocument();
$dom->loadHTML($html);
foreach($dom->getElementsByTagName('h2') as $h2) {
if(!count($h2->getElementsByTagName('*'))){
var_dump($h2->textContent);
}
}
Demo here: https://3v4l.org/dI1e4
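The same "no child elements" check can also be pushed into an XPath expression. The following is a sketch of that alternative, not part of the original answer; the `$titles` array is just an illustrative name.

```php
<?php
$html = '<html><body><h2>Hello</h2> <h2>World</h2><h2><strong>!</strong></h2></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// h2[not(*)] selects only those h2 elements that have no child elements.
$titles = [];
foreach ($xpath->query('//h2[not(*)]') as $h2) {
    $titles[] = $h2->textContent;
}

print_r($titles);
```

With this input, only the first two headings are collected; the third contains a strong element and is skipped.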
I have HTML code:
<div>
<h1>Header</h1>
<code><p>First code</p></code>
<p>Next example</p>
<code><b>Second example</b></code>
</div>
Using PHP, I want to replace all < symbols located in code elements with &lt;. For example, the code above should be converted to:
<div>
<h1>Header</h1>
<code>&lt;p>First code&lt;/p></code>
<p>Next example</p>
<code>&lt;b>Second example&lt;/b></code>
</div>
I tried using PHP's DOMDocument class, but my attempt did not work. Below is my code:
$dom = new DOMDocument();
$dom->loadHTML($content);
$innerHTML= '';
$tmp = '';
if(count($dom->getElementsByTagName('*'))){
foreach ($dom->getElementsByTagName('*') as $child) {
if($child->tagName == 'code'){
$tmp = $child->ownerDocument->saveXML( $child);
$innerHTML .= htmlentities($tmp);
}
else{
$innerHTML .= $child->ownerDocument->saveXML($child);
}
}
}
So, you're iterating over the markup properly, and your use of saveXML() was close to what you want, but nowhere in your code do you try to actually change the contents of the element. This should work:
<?php
$content='<div>
<h1>Header</h1>
<code><p>First code</p></code>
<p>Next example</p>
<code><b>Second example</b></code>
</div>';
$dom = new DOMDocument();
$dom->loadHTML($content, LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);
foreach ($dom->getElementsByTagName('code') as $child) {
// get the markup of the children
$html = implode(array_map([$child->ownerDocument,"saveHTML"], iterator_to_array($child->childNodes)));
// create a node from the string
$text = $dom->createTextNode($html);
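// remove existing child nodes; childNodes is a live list, so removing inside
// a foreach over it can skip nodes. Remove from the front until empty instead.
while ($child->firstChild) {
$child->removeChild($child->firstChild);
}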
// append the new text node - escaping is done automatically
$child->appendChild($text);
}
echo $dom->saveHTML();
I have some HTML from which I want to retrieve the text, excluding the content of the <h1> tag.
$html = '<div class="mytext">
<h1>Title of document</h1>
This is the text that I want, without the title.
</div>';
$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->loadHTML($html);
$xp = new DOMXpath($dom);
foreach($xp->query('//div[@class="mytext"]') as $node) {
$description = $node->nodeValue;
echo $description;
}
End result should be: This is the text that I want, without the title.
Currently it's: Title of document This is the text that I want, without the title
How can I just get the text without the h1 tag?
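Try this. The text() step selects only the div's own text nodes, and the [normalize-space()] predicate drops the ones that contain nothing but whitespace:
foreach($xp->query('//div[@class="mytext"]/text()[normalize-space()]') as $node) {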
$description = $node->nodeValue;
echo $description;
}
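For completeness, here is a self-contained sketch of that query. It assumes the markup from the question (condensed onto one line), and the `$parts` array and the trimming step are additions for the illustration, since the text node still carries the surrounding whitespace.

```php
<?php
$html = '<div class="mytext"><h1>Title of document</h1>
This is the text that I want, without the title.
</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xp = new DOMXPath($dom);

// Collect only the div's direct, non-blank text nodes; the h1 is an element
// child, so its text is never visited.
$parts = [];
foreach ($xp->query('//div[@class="mytext"]/text()[normalize-space()]') as $node) {
    $parts[] = trim($node->nodeValue);
}

$description = implode(' ', $parts);
echo $description;
```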
I have an HTML string that contains exactly one a element. Example:
test
In PHP I have to test whether rel contains external and, if so, modify href and save the string.
I have looked at DOM nodes and objects, but they seem like too much for a single a element: I have to iterate to get the HTML nodes, and I am not sure how to test whether rel exists and contains external.
$html = new DOMDocument();
$html->loadHtml($txt);
$a = $html->getElementsByTagName('a');
$attr = $a->item(0)->attributes;
...
At this point I am going to get a DOMNamedNodeMap, which seems like overhead. Is there any simpler way to do this, or should I do it with DOM?
Is there any simpler way to do this, or should I do it with DOM?
Do it with DOM.
Here's an example:
<?php
$html = 'test';
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query("//a[contains(concat(' ', normalize-space(@rel), ' '), ' external ')]");
foreach($nodes as $node) {
$node->setAttribute('href', 'http://example.org');
}
echo $dom->saveHTML();
I kept working on it with DOM. This is what I ended up with:
$html = new DOMDocument();
$html->loadHtml('<?xml encoding="utf-8" ?>' . $txt);
$nodes = $html->getElementsByTagName('a');
foreach ($nodes as $node) {
foreach ($node->attributes as $att) {
if ($att->name == 'rel') {
if (strpos($att->value, 'external') !== false) { // strpos returns 0 (falsy) for a match at the start
$node->setAttribute('href','modified_url_goes_here');
}
}
}
}
$txt = $html->saveHTML();
I did not want to load any other library for just this one string.
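A substring test on rel would also match values such as "externally", so a token-based check may be safer. Below is a minimal sketch of that idea; the helper name `relHasToken` and the sample anchor string are assumptions made for the illustration, since the original example markup is not shown.

```php
<?php
// Hypothetical helper: true if the element's rel attribute contains $token
// as a whole whitespace-separated word.
function relHasToken(DOMElement $a, string $token): bool
{
    // getAttribute returns '' when rel is missing, which splits to an empty list.
    $tokens = preg_split('/\s+/', $a->getAttribute('rel'), -1, PREG_SPLIT_NO_EMPTY);
    return in_array($token, $tokens, true);
}

// Assumed sample input, not taken from the original post.
$txt = '<a href="http://example.com/old" rel="nofollow external">test</a>';

$dom = new DOMDocument();
// These flags keep libxml from adding a doctype and html/body wrappers.
$dom->loadHTML($txt, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

foreach ($dom->getElementsByTagName('a') as $a) {
    if (relHasToken($a, 'external')) {
        $a->setAttribute('href', 'http://example.org');
    }
}

$txt = $dom->saveHTML();
echo $txt;
```

DOMDocument ships with PHP's standard DOM extension, so this does not pull in any additional library.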
The best way is to use an HTML parser/DOM, but here's a regex solution:
$html = 'test<br>
<p> Some text</p>
test2<br>
<a rel="external">test3</a> <-- This won\'t work since there is no href in it.
';
$new = preg_replace_callback('/<a.+?rel\s*=\s*"([^"]*)"[^>]*>/i', function($m){
if(strpos($m[1], 'external') !== false){
$m[0] = preg_replace('/href\s*=\s*(("[^"]*")|(\'[^\']*\'))/i', 'href="http://example.com"', $m[0]);
}
return $m[0];
}, $html);
echo $new;
You could use a regular expression. If the tag matches
/\s+rel\s*=\s*".*external.*"/
then do a regex replace like
/(<a.*href\s*=\s*")([^"]*)("[^>]*>)/\1[your new href here]\3/
That said, using a library that can do this kind of thing for you is much easier (like jQuery for JavaScript).
The div is like this
<div style="width:90%;margin:0 auto;color:#Black;" id="content">
this is text, several <br/> tags
</div>
How should I get the div's content, including the tags, using DOM in PHP?
Assuming you're using PHP 5, you can use DOMDocument. Note that it doesn't provide a simple means of retrieving the inner HTML of an element. You can do something along the following lines:
function DOMinnerHTML($element)
{
$innerHTML = "";
$children = $element->childNodes;
foreach ($children as $child)
{
$tmp_dom = new DOMDocument();
$tmp_dom->appendChild($tmp_dom->importNode($child, true));
$innerHTML.=trim($tmp_dom->saveHTML());
}
return $innerHTML;
}
$dom = new DOMDocument();
$dom->loadHTML($html);
$items = $dom->getElementsByTagName('div');
if ($items->length)
{
$innerHTML = DOMinnerHTML($items->item(0));
}
echo $innerHTML;
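Since PHP 5.3.6, saveHTML() accepts a node argument, so the temporary document per child is not strictly needed. The following is a sketch of that shorter variant; the function name `innerHtml` and the sample markup are made up for the illustration.

```php
<?php
// Hypothetical variant of the helper above: serializes each child directly
// through the owning document instead of importing it into a new DOMDocument.
function innerHtml(DOMNode $element): string
{
    $html = '';
    foreach ($element->childNodes as $child) {
        $html .= $element->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Assumed sample input in the spirit of the question.
$html = '<div style="width:90%;margin:0 auto;" id="content">this is text,<br>several<br>tags</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$items = $dom->getElementsByTagName('div');
$innerHTML = $items->length ? innerHtml($items->item(0)) : '';

echo $innerHTML;
```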
For something this simple, although I don't normally recommend it, I'd use regex:
preg_match('|<div[^>]+>(.*?)</div>|is', $html, $match);
if ($match)
{
echo 'html is: ' . $match[1]; // $match[1] is the captured string; indexing it with [0] would give only its first character
}
Something like this?
$document = new DOMDocument();
$document->loadHTML($html);
$element = $document->getElementById('content');
To get the values, you can try something like this
$doc = new DOMDocument();
$doc->loadHTMLFile('link-t0-html-file.php');
$xpath = new DOMXPath($doc);
$element = $xpath->query("//*[@id='content']")->item(0);
echo $element->nodeValue;
If I am not mistaken, you want this:
echo "<div style='width:90%;margin:0 auto;color:#000000;font-size:14px;line-height:24px;'
id='content'>";
echo "this is text, several <br/> tags";
echo "</div>";
Just remember never to put an unescaped double quote (") inside a double-quoted string; use single quotes (') inside double quotes instead.