Suppose we have this input:
<div wrap>1</div>
<div>2</div>
<div wrap>3</div>
<div wrap>4</div>
<div wrap>5</div>
The required output should be:
<div class="wrapper">
<div wrap>1</div>
</div>
<div>2</div>
<div class="wrapper">
<div wrap>3</div>
<div wrap>4</div>
<div wrap>5</div>
</div>
Also, suppose that these elements are direct children of the body element and there can be other unrelated element or text nodes before or after them.
Notice how consecutive elements are grouped inside a single wrapper and not individually wrapped.
How would you handle body's DOMNodeList and insert the wrappers in the correct place?
Following the discussion in the comments about wrapping only direct children of the body element, for this input:
<body>
<div wrap>1
<div wrap>1.1</div>
</div>
<div>2</div>
<div wrap>3</div>
<div wrap>4</div>
<div wrap>5</div>
</body>
The required output should be:
<body>
<div class="wrapper">
<div wrap>1
<div wrap>1.1</div>
<!-- ignored -->
</div>
</div>
<div>2</div>
<div class="wrapper">
<div wrap>3</div>
<div wrap>4</div>
<div wrap>5</div>
</div>
</body>
Notice how elements that are not direct descendants of the body element are totally ignored.
It's been interesting to write and would be good to see other solutions, but here is my attempt anyway.
I've added comments in the code rather than describing the method here as I think the comments make it easier to understand...
// Test HTML
$startHTML = '<div wrap>1</div>
<div>2</div>
<div wrap>3</div>
<div wrap>4</div>
<div wrap>5</div>';
$doc = new DOMDocument();
$doc->loadHTML($startHTML);
$xp = new DOMXPath($doc);
// Find any div with a wrap attribute that either is not immediately preceded
// by an element with a wrap attribute, or is the first div (which has no
// preceding element anyway). The parentheses keep the @wrap test applied to both cases.
$wrapList = $xp->query("//div[@wrap='' and (preceding-sibling::*[1][not(@wrap)]
or position() = 1)]");
// Iterate over each of the first in the list of wrapped nodes
foreach ( $wrapList as $wrap ) {
    // Create new wrapper
    $wrapper = $doc->createElement("div");
    $class = $doc->createAttribute("class");
    $class->value = "wrapper";
    $wrapper->appendChild($class);
    // Copy subsequent wrap nodes (if any)
    $nextNode = $wrap->nextSibling;
    while ( $nextNode ) {
        $next = $nextNode;
        $nextNode = $nextNode->nextSibling;
        // If it's an element (and not a text node etc)
        if ( $next->nodeType == XML_ELEMENT_NODE ) {
            // If it also has a wrap attribute - copy it
            if ( $next->hasAttribute("wrap") ) {
                $wrapper->appendChild($next);
            }
            // If no attribute, then finished copying
            else {
                break;
            }
        }
    }
    // Replace first wrap node with new wrapper
    $wrap->parentNode->replaceChild($wrapper, $wrap);
    // Move the wrap node into the wrapper
    $wrapper->insertBefore($wrap, $wrapper->firstChild);
}
echo $doc->saveHTML();
As it's using HTML, the end result is all wrapped in the standard tags as well, but the output (formatted) is...
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<body>
<div class="wrapper">
<div wrap>1</div>
</div>
<div>2</div>
<div class="wrapper">
<div wrap>3</div>
<div wrap>4</div>
<div wrap>5</div>
</div>
</body>
</html>
Edit:
If you only want it to apply to direct children of the <body> tag, then update the XPath expression to include it as part of the criteria...
$wrapList = $xp->query("//body/div[@wrap='' and (preceding-sibling::*[1][not(@wrap)]
or position() = 1)]");
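Since the question asks how to handle body's DOMNodeList directly, here is a rough sketch of that approach as an alternative to the XPath query above. It is only an illustration under a few assumptions: the input is loaded with loadHTML (so a <body> element exists), text nodes between elements do not break a run, and the names $children and $wrapper are just my own choices.

```php
<?php
// Test HTML, including a nested <div wrap> that must be left alone
$html = '<div wrap>1<div wrap>1.1</div></div>
<div>2</div>
<div wrap>3</div>
<div wrap>4</div>
<div wrap>5</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$body = $doc->getElementsByTagName('body')->item(0);

// childNodes is a live list, so take a snapshot before moving nodes around
$children = iterator_to_array($body->childNodes);

$wrapper = null; // wrapper of the run currently being built, if any
foreach ($children as $child) {
    if ($child->nodeType !== XML_ELEMENT_NODE) {
        continue; // assumption: whitespace/text between elements does not end a run
    }
    if ($child->hasAttribute('wrap')) {
        if ($wrapper === null) {
            // Start a new run: insert the wrapper where its first element was
            $wrapper = $doc->createElement('div');
            $wrapper->setAttribute('class', 'wrapper');
            $body->insertBefore($wrapper, $child);
        }
        $wrapper->appendChild($child); // appendChild moves the node
    } else {
        $wrapper = null; // any other element closes the current run
    }
}

echo $doc->saveHTML($body);
```

Because only the direct children of <body> are visited, the nested 1.1 div is never considered for wrapping.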
I would like to know if there is any way, in php, to match all classes with the same word,
Example:
<div class="classeby_class">
<div class="classos-nope">
<div class="row">
<div class="class-show"></div>
</div>
</div>
</div>
<div class="class-first-one">
<div class="container">
<div class="classes-show">
<div class="class"></div>
<div class="classing"></div>
</div>
</div>
</div>
In the example above, I would like to match every div whose class contains the word "class", but not those where it is only part of a longer word such as "classes",
like,
positive for
<div class="class-show">...</div>
<div class="class-first-one">...</div>
<div class="class">...</div>
<div class="class-first-one">...</div>
but negative for
<div class="classeby_class">...</div>
<div class="classes-show">...</div>
<div class="classing">...</div>
I am using php to display several different html pages.
Regex would not be the appropriate method here, first because of the many page breaks and second because of hosting limitations, so I'm trying to do this with a parser.
All HTML code is stored on the server.
I can eliminate elements with a specific class using the example below.
$doc = new DomDocument();
$xpath = new DOMXPath($doc);
$classtoremove = $xpath->query('//div[contains(@class,"class")]');
foreach($classtoremove as $classremoved){
$classremoved->parentNode->removeChild($classremoved);
}
echo $doc->saveHTML();
I know there are CSS selectors, but when I try to use it in PHP it doesn't work. Possibly because I'm using XPath.
Example:
'[id*="class"],[class*="class"]'
Still, I think that would match more than I need.
Is there any way to match these values with XPath?
The intent is to completely remove the div (or other tags) whose class contains that word.
You could use a regex with word boundaries, \bclass\b, on the class attribute, together with DOMXPath::registerPhpFunctions.
For example
$data = <<<DATA
<div class="classeby_class">
<div class="classos-nope">
<div class="row">
<div class="class-show"></div>
</div>
</div>
</div>
<div class="class-first-one">
<div class="container">
<div class="classes-show">
<div class="class"></div>
<div class="classing"></div>
</div>
</div>
</div>
DATA;
$doc = new DomDocument();
$doc->loadHTML($data);
$xpath = new DOMXPath($doc);
$xpath->registerNamespace("php", "http://php.net/xpath");
$xpath->registerPHPFunctions();
$classtoremove = $xpath->query("//div[1 = php:function('preg_match', '/\bclass\b/', string(@class))]");
foreach ($classtoremove as $a) {
var_dump($a->getAttribute("class"));
}
Output
string(10) "class-show"
string(15) "class-first-one"
string(5) "class"
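The example above only dumps the matches, while the question wants them removed. A minimal sketch of the removal step could look like the following; the shortened test markup and the $removed array are my own additions for illustration, and restricting registerPhpFunctions to preg_match is just a cautious choice, not a requirement.

```php
<?php
$data = '<div class="classeby_class"><div class="class-show"></div></div>
<div class="class-first-one"><div class="classes-show"><div class="class"></div></div></div>
<div class="classing"></div>';

$doc = new DOMDocument();
$doc->loadHTML($data);
$xpath = new DOMXPath($doc);
$xpath->registerNamespace('php', 'http://php.net/xpath');
$xpath->registerPhpFunctions('preg_match'); // expose only the function we need

$matches = $xpath->query(
    "//div[1 = php:function('preg_match', '/\\bclass\\b/', string(@class))]"
);

$removed = []; // class values of the removed elements, kept for inspection
foreach (iterator_to_array($matches) as $node) {
    $removed[] = $node->getAttribute('class');
    // A nested match may sit inside an already removed subtree; that is fine,
    // it still has a parentNode and is simply detached from it.
    if ($node->parentNode !== null) {
        $node->parentNode->removeChild($node);
    }
}

echo $doc->saveHTML();
```

Note that removing an element also removes everything inside it, so children that did not match the pattern disappear along with a matching parent.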
I have a known Div Class name, and I can retrieve the inner html code all good, but how would I retrieve the next Div Class name (not inner from the known Div Class) using php, dom document and xpath?
For example with the code below, if I know the Div class "mobile-container mobile-filter-container", how would I return "mobile-container mobile-cart-content-container"?
<div class="mobile-container mobile-filter-container">
<div class="mobile-wrapper-header"></div>
<div class="mobile-filter-wrapper"></div>
</div>
<div class="mobile-container mobile-cart-content-container">
<div class="mobile-wrapper-header">
Thanks,
I believe this should get you close enough to what you need:
$data = <<<DATA
<html>
<div class="mobile-container mobile-filter-container">
<div class="mobile-wrapper-header"></div>
<div class="mobile-filter-wrapper"></div>
</div>
<div class="mobile-container mobile-cart-content-container">
<div class="mobile-wrapper-header"></div>
<div class="mobile-filter-wrapper"></div>
</div>
<div class="unwanted">
<div class="mobile-wrapper-header"></div>
<div class="mobile-filter-wrapper"></div>
</div>
</html>
DATA;
$doc = new DOMDocument();
$doc->loadHTML($data);
$xpath = new DOMXpath($doc);
$elements = $xpath->query('.//div[@class="mobile-container mobile-filter-container"]/following-sibling::div[1]/@class');
echo $elements[0]->nodeValue;
Output:
mobile-container mobile-cart-content-container
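One caveat: @class="..." compares the whole attribute string, so it stops matching if the classes appear in a different order or with extra whitespace. If that can happen in your markup, a common XPath 1.0 idiom is to match a single class token instead. A small sketch, assuming mobile-filter-container alone is enough to identify the known div (the $token and $nextClass names are only for this example):

```php
<?php
$data = '<html><body>
<div class="mobile-filter-container  mobile-container"></div>
<div class="mobile-container mobile-cart-content-container"></div>
<div class="unwanted"></div>
</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($data);
$xpath = new DOMXPath($doc);

// Pad the normalised class list with spaces so that only whole tokens match
$token = 'mobile-filter-container';
$query = sprintf(
    '//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]/following-sibling::div[1]',
    $token
);

$next = $xpath->query($query)->item(0);
$nextClass = $next instanceof DOMElement ? $next->getAttribute('class') : null;
echo $nextClass, PHP_EOL;
```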
I inherited the following piece of PHP code, which removes elements from the DOM before pushing the content into a page. We only want to show the first 5 records so that the page does not get too long.
Assuming the code retrieves an HTML fragment structured like this:
<div class='year'>2019</div>
<div class='record'>Record A</div>
<div class='record'>Record B</div>
<div class='year'>2018</div>
<div class='record'>Record C</div>
<div class='record'>Record D</div>
<div class='record'>Record E</div>
<div class='year'>2017</div>
<div class='record'>Record F</div>
<div class='year'>2016</div>
<div class='record'>Record G</div>
Now, the below piece of code removes all the extra records:
$dom = new DOMDocument();
// be sure to load the encoding
$dom->loadHTML('<?xml encoding="utf-8" ?>' . $tmp);
// let's use XPath
$finder = new DomXPath($dom);
// set the limit
$limit = 5; $cnt = 0;
// and remove unwanted elements
foreach($finder->query("//*[contains(@class, 'record')]") as $elm ) {
if ($cnt >= $limit)
$elm->parentNode->removeChild($elm);
$cnt++;
}
// finally, echo
echo $dom->saveHTML($dom->documentElement);
Logically, I end up having the following HTML:
<div class='year'>2019</div>
<div class='record'>Record A</div>
<div class='record'>Record B</div>
<div class='year'>2018</div>
<div class='record'>Record C</div>
<div class='record'>Record D</div>
<div class='record'>Record E</div>
<div class='year'>2017</div>
<div class='year'>2016</div>
How could I identify all the elements having the class year and having the next sibling also having this class and delete it? (here that would get the 2017 element)
Then I believe it would only be a matter of checking if the last element has the class year and remove it.
Or is there a cleaner way to achieve that?
You can add an extra foreach after the current one...
foreach($finder->query("//div[@class='year']/following-sibling::div[1][@class='year']")
as $elm ) {
$elm->parentNode->removeChild($elm);
}
The XPath here is looking for a <div class="year"> element and then only looking at the next <div> tag for the same thing (following-sibling::div[1] limits it to just the next div tag after the current one).
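If the goal is to drop every year heading that ends up with no records under it, including a trailing one, a variation is to select each year div whose next element sibling is not a record. This is only a sketch on a shortened version of the sample markup; the $empty and $years variables are just for illustration.

```php
<?php
$html = "<div class='year'>2019</div>
<div class='record'>Record A</div>
<div class='year'>2018</div>
<div class='record'>Record B</div>
<div class='year'>2017</div>
<div class='year'>2016</div>";

$dom = new DOMDocument();
$dom->loadHTML($html);
$finder = new DOMXPath($dom);

// A year heading is treated as empty when the next element sibling is
// missing or is not a record
$empty = $finder->query(
    "//div[@class='year'][not(following-sibling::*[1][@class='record'])]"
);
foreach ($empty as $elm) {
    $elm->parentNode->removeChild($elm);
}

// Collect what is left, to check the result
$years = [];
foreach ($finder->query("//div[@class='year']") as $y) {
    $years[] = $y->textContent;
}
echo implode(',', $years), PHP_EOL; // prints 2019,2018
```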
Here is a plain JS method in case you want to do this on the client instead
const recs = document.querySelectorAll(".record");
const divs = document.querySelectorAll("div");
const lastRec = recs[4];
let found = false;
divs.forEach(div => {
div.classList.toggle("hide",found)
if (div === lastRec) found = true
})
.hide { display:none}
<div class='year'>2019</div>
<div class='record'>Record A</div>
<div class='record'>Record B</div>
<div class='year'>2018</div>
<div class='record'>Record C</div>
<div class='record'>Record D</div>
<div class='record'>Record E</div>
<div class='year'>2017</div>
<div class='record'>Record F</div>
<div class='year'>2016</div>
<div class='record'>Record G</div>
I finally ended up using the following code:
$dom = new DOMDocument();
// be sure to load the encoding
$dom->loadHTML('<?xml encoding="utf-8" ?>' . $tmp);
// let's use XPath
$finder = new DomXPath($dom);
foreach($finder->query("(//*[contains(@class, 'record')])[5]/following-sibling::*") as $elm) {
$elm->parentNode->removeChild($elm);
}
// finally, echo
echo $dom->saveHTML($dom->documentElement);
It allowed me to achieve my goal in one pass, without nested loops.
I wasn't sure how to phrase this question.
Basically I have this php code:
$new_html = preg_replace('!<div.*?id="spotlight".*?>.*?</div>!is', '', $html);
I want this to change html code from this (example, not actual html):
<div id="container">
<div id="spotlight">
<!-- empty -->
</div>
<div id="content">
<!-- lots of content -->
</div>
</div>
To this:
<div id="container">
<div id="content">
<!-- lots of content -->
</div>
</div>
As you can see the php code will do this successfully, because the regex is looking for:
<div{anything}id="spotlight"{anything}>{anything}</div>
However
if the div id="spotlight" contains a child div like so:
<div id="container">
<div id="spotlight">
<div></div>
</div>
<div id="content">
<!-- lots of content -->
</div>
</div>
then the regex will match the end div tag of the child div!
How do I prevent this? How do I tell the regex to ignore a closing div if another div was opened?
Thanks
Use DOMDocument:
$html = '<div id="container">
<div id="spotlight">
<!-- empty -->
</div>
<div id="content">
<!-- lots of content -->
</div>
</div>';
$dom = new DOMDocument;
$dom->loadXML($html);
$xpath = new DOMXPath($dom);
$query = '//div[@id="spotlight"]';
$entries = $xpath->query($query);
foreach($entries as $one){
$one->parentNode->removeChild($one);
}
echo $dom->saveHTML();
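Note that loadXML only works while the markup is well-formed XML. For real-world HTML, loadHTML is usually more forgiving. Here is a sketch of the same removal for the nested-div case from the question, assuming a libxml version that supports the LIBXML_HTML_NOIMPLIED and LIBXML_HTML_NODEFDTD flags and a fragment with a single root element:

```php
<?php
$html = '<div id="container">
<div id="spotlight">
<div></div>
</div>
<div id="content">
<p>lots of content</p>
</div>
</div>';

$dom = new DOMDocument();
// The flags ask libxml not to add <html>/<body> or a doctype around the fragment
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$xpath = new DOMXPath($dom);
foreach ($xpath->query('//div[@id="spotlight"]') as $node) {
    // Removing the element takes its nested child div with it
    $node->parentNode->removeChild($node);
}

$result = $dom->saveHTML();
echo $result;
```

The parser tracks nesting for you, so the inner closing div tag cannot be mistaken for the end of the spotlight div.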
$a = preg_replace('/<div[^>]+>\\s+<\/div>/', '', $a);
I'm trying to use DOMDocument and XPath to search an HTML document using PHP. I want to search by a number such as '022222', and it should return the value of the corresponding h2 tag. Any thoughts on how this would be done?
The HTML document can be found at http://pastie.org/1211369
How about this?
$sxml = simplexml_load_string($data);
$find = "022222";
print_r($sxml->xpath("//li[.='".$find."']/../../../div[@class='content']/h2"));
It returns:
Array
(
[0] => SimpleXMLElement Object
(
[0] => Item 2
)
)
//li[.='xxx'] will locate the li you're searching for. Then we use ../ to step up three levels, before we descend into the content div, as specified by div[@class='content']. Finally we choose the h2 child.
Just FYI, here's how to do it using DOM:
$dom = new DOMDocument();
$dom->loadXML($data);
$find = "022222";
$xpath = new DOMXpath($dom);
$res = $xpath->evaluate("//li[.='".$find."']/../../../div[@class='content']/h2");
if ($res->length > 0) {
$node = $res->item(0);
echo $node->firstChild->wholeText."\n";
}
I want to search by a number such as '022222', and it should return the value of the corresponding h2 tag. Any thoughts on how this would be done?
The HTML document can be found at http://pastie.org/1211369
To start with, the text at the provided link is not a well-formed XML or XHtml document and cannot be directly parsed with XPath.
Therefore I have wrapped it in an <html> element.
On this XML document one of the XPath expressions that selects exactly the wanted text node is:
/*/div[div/ul/li = '022222']/div[@class='content']/h2/text()
Among other advantages, this XPath expression doesn't use any reverse axes and is thus more readable.
The complete XML document on which this XPath expression is evaluated is the following:
<html>
<div class="item">
<div class="content"><h2>Item 1</h2></div>
<div class="phone">
<ul class="phone-single">
<li>01234 567890</li>
</ul>
</div>
</div>
<div class="item">
<div class="content"><h2>Item 2</h2></div>
<div class="phone">
<ul class="phone-multiple">
<li>022222</li>
<li>033333</li>
</ul>
</div>
</div>
<div class="item">
<div class="content"><h2>Item 3</h2></div>
<div class="phone">
<ul class="phone-single">
<li>02345 678901</li>
</ul>
</div>
</div>
<div class="item">
<div class="content"><h2>Item 4</h2></div>
<div class="phone">
<ul class="phone-multiple">
<li>099999999</li>
<li>088888888</li>
</ul>
</div>
</div>
</html>
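For completeness, this is roughly how the forward-axis expression could be evaluated from PHP. It is a sketch on a shortened copy of the document above; $find, $expr and $title are names chosen for the example, and it assumes the search value contains no single quote (otherwise the expression would need escaping).

```php
<?php
$data = '<html>
<div class="item">
<div class="content"><h2>Item 1</h2></div>
<div class="phone"><ul class="phone-single"><li>01234 567890</li></ul></div>
</div>
<div class="item">
<div class="content"><h2>Item 2</h2></div>
<div class="phone"><ul class="phone-multiple"><li>022222</li><li>033333</li></ul></div>
</div>
</html>';

$dom = new DOMDocument();
$dom->loadXML($data); // works because the wrapped document is well-formed XML
$xpath = new DOMXPath($dom);

$find = '022222';
// Select the item div that contains a matching li, then walk down to its h2 text
$expr = "/*/div[div/ul/li = '" . $find . "']/div[@class='content']/h2/text()";

$nodes = $xpath->query($expr);
$title = $nodes->length > 0 ? $nodes->item(0)->nodeValue : null;
echo $title, PHP_EOL; // prints Item 2
```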