Change class of all ancestor li elements with PHP DOMDocument


I have nested bullet lists of ul > li > ul > li, etc.
<ul>
<li>Mammals
<ul>
<li>Canine
<ul>
<li>Fox</li>
<li>Wolf</li>
</ul>
</li>
<li>Feline</li>
</ul>
</li>
<li>Fish</li>
</ul>
How can I apply a class to all "li" elements (recursively) which are ancestors of the target element? I have:
<?php
$list = ob_get_clean();
$dom = new DOMDocument;
$dom->loadHTML($list);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//li');
foreach ($nodes as $object) {
$parts = parse_url($object->nodeValue);
parse_str($parts['query'], $query);
if (true) {
//if certain requirements are met, modify the current object
}
//also modify all ancestor li elements
//$object-> ?? ->setAttribute('class', 'current');
}
?>
There are reasons that the target objects must be identified before searching through each one's ancestors. I stripped this code down to the relevant parts.

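Work up the chain of parent nodes, altering those which are li nodes. Note that nodeName is lowercase ('li') for documents parsed with loadHTML():
<?php
// $node is the matched li ($object in the question); its parent is the ul
$parentNode = $node->parentNode; // ul
while ($parentNode = $parentNode->parentNode) {
if ($parentNode->nodeName == 'li') {
$parentNode->setAttribute('class', 'current');
}
}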
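A minimal, self-contained sketch of the same walk, run against the list from the question. The helper name markAncestorItems and the choice of "Fox" as the target are assumptions made for illustration only. Unlike the snippet above, it appends to any existing class value instead of overwriting it.

```php
<?php
// Hypothetical helper: add $class to every <li> ancestor of $node.
function markAncestorItems(DOMNode $node, string $class): int
{
    $count = 0;
    for ($p = $node->parentNode; $p !== null; $p = $p->parentNode) {
        // loadHTML() yields lowercase element names
        if ($p instanceof DOMElement && $p->nodeName === 'li') {
            $existing = trim($p->getAttribute('class'));
            $p->setAttribute('class', $existing === '' ? $class : "$existing $class");
            $count++;
        }
    }
    return $count;
}

$html = '<ul><li>Mammals<ul><li>Canine<ul><li>Fox</li><li>Wolf</li></ul></li>'
      . '<li>Feline</li></ul></li><li>Fish</li></ul>';
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Assumed target for the demo: the <li> whose text is exactly "Fox".
$target = $xpath->query('//li[normalize-space(.) = "Fox"]')->item(0);
$marked = markAncestorItems($target, 'current');
```

Here the "Canine" and "Mammals" items are the li ancestors of "Fox", so both receive the class; "Wolf", "Feline" and "Fish" are left alone.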

Related

How to get all child nodes from DOMDocument?

I have the following
$string = '<html><head></head><body><ul id="mainmenu">
<li id="1">Hallo</li>
<li id="2">Welt
<ul>
<li id="3">Sub Hallo</li>
<li id="4">Sub Welt</li>
</ul>
</li>
</ul></body></html>';
$dom = new DOMDocument;
$dom->loadHTML($string);
now I want to have all li IDs inside one array.
I tried the following:
$all_li_ids = array();
$menu_nodes = $dom->getElementById('mainmenu')->childNodes;
foreach($menu_nodes as $li_node){
if($li_node->nodeName=='li'){
$all_li_ids[]=$li_node->getAttribute('id');
}
}
print_r($all_li_ids);
As you might see, this will print out [1,2]
How do I get all children (the subchildren as well [1,2,3,4])?

In my test, $dom->getElementById('mainmenu') did not return the element, so I located it with XPath. If getElementById() works for you, you do not need XPath:
$xpath = new DOMXPath($dom);
$ul = $xpath->query("//*[@id='mainmenu']")->item(0);
$all_li_ids = array();
// Find all inner li tags
$menu_nodes = $ul->getElementsByTagName('li');
foreach($menu_nodes as $li_node){
$all_li_ids[]=$li_node->getAttribute('id');
}
print_r($all_li_ids); // ids 1, 2, 3, 4
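As a hedged alternative, the ids can also be collected with a single XPath expression that selects the id attributes directly. This is a sketch using the markup from the question, not the answerer's own code.

```php
<?php
$string = '<html><head></head><body><ul id="mainmenu">'
        . '<li id="1">Hallo</li><li id="2">Welt<ul>'
        . '<li id="3">Sub Hallo</li><li id="4">Sub Welt</li>'
        . '</ul></li></ul></body></html>';
$dom = new DOMDocument;
$dom->loadHTML($string);
$xpath = new DOMXPath($dom);

$all_li_ids = [];
// "//li" below the menu matches li descendants at any depth, in document order.
foreach ($xpath->query('//ul[@id="mainmenu"]//li/@id') as $attr) {
    $all_li_ids[] = $attr->value; // $attr is a DOMAttr
}
print_r($all_li_ids);
```

The array holds the ids as strings: "1", "2", "3", "4".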

One way to do it would be to add another foreach loop over the li elements nested inside each top-level li, i.e.:
foreach($menu_nodes as $node){
if($node->nodeName=='li'){
$all_li_ids[]=$node->getAttribute('id');
// childNodes also contains text nodes, so only search inside li elements
foreach($node->getElementsByTagName('li') as $sub_node){
$all_li_ids[]=$sub_node->getAttribute('id');
}
}
}

DOMDocument, get the text in an element that follow a found element

I'd like to get the text for the ul>li that immediately follows the <h2> with the text ABC. The text in this case would be 123.
<h2>CDE</h2>
<ul>...</ul>
<h2>ABC</h2>
<ul>
<li>
<span>123</span>
</li>
</ul>
This is what I have, but it's not working
$dom = new DOMDocument();
$dom->loadHTML($html); // $html is the code above
$h2_all = $dom->getElementsByTagName('h2');
foreach($h2_all as $h2) {
$h2_text = $h2->textContent;
if (trim(strtolower($h2_text)) == 'abc') {
var_dump($h2->nextSibling);
}
}
I assume it's because $h2 doesn't contain the ul data I need, but I'm not sure how to get it.

You can use an xpath query:
$dom = new DOMDocument;
$dom->loadHTML($html);
$xp = new DOMXPath($dom);
$qry = '//ul[preceding::h2[1] = "ABC"]/li/span';
$result = $xp->query($qry)->item(0)->nodeValue;
query details:
// # the path can start from anywhere in the dom tree
ul
[preceding::h2[1] = "ABC"] # condition: the first preceding h2 has the value "ABC"
/li/span # continue the path down to the span node
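A related sketch that anchors on the h2 and uses the following-sibling axis instead of preceding. It assumes the h2 and ul are siblings, as in the question; the "000" content of the first list is a placeholder invented for the demo. Note that the comparison is case-sensitive, unlike the strtolower() check in the question.

```php
<?php
$html = '<h2>CDE</h2><ul><li><span>000</span></li></ul>'
      . '<h2>ABC</h2><ul><li><span>123</span></li></ul>';
$dom = new DOMDocument;
$dom->loadHTML($html);
$xp = new DOMXPath($dom);

// First <ul> sibling after the <h2> whose trimmed text is "ABC", then its <li>.
$qry = '//h2[normalize-space(.) = "ABC"]/following-sibling::ul[1]/li';
$li = $xp->query($qry)->item(0);
$result = $li !== null ? trim($li->textContent) : null;
echo $result;
```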

Check the siblings and find the first ul:
$ul = null;
foreach($dom->getElementsByTagName('h2') as $h2) {
if(trim(strtolower($h2->textContent)) == "abc") {
$obj = $h2->nextSibling;
while($obj != null) {
if($obj->nodeName == "ul") {
$ul = $obj;
break 2;
}
$obj = $obj->nextSibling;
}
}
}
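//make sure ul has at least one li (firstChild may be a whitespace text node, so look up the li explicitly)
$li = ($ul != null) ? $ul->getElementsByTagName('li')->item(0) : null;
if($li != null) {
echo trim($li->nodeValue);
}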

pick the elements INSIDE the ul NOT the ul itself

this is the php code:
include_once('simple_html_dom.php');
$html = file_get_html('URL');
$elem = $html->find('ul[id=members-list]', 0);
echo $elem;
I would like to be able to pick the inside of the UL so the elements per se, not the ul itself.
html as follows:
<ul id="members-list">
<li>1</li>
<li>2</li>
<li>3</li>
<li>4</li>
</ul>
so when I do echo $elem it returns the ul included. I want to take it out just return :
<li>1</li>
<li>2</li>
<li>3</li>
<li>4</li>

You just forgot to use the children() method. Consider this example:
$ul = $html->find('ul[id="members-list"]', 0)->children();
foreach($ul as $li) {
echo $li;
}
As stated in the manual:
How to traverse the DOM tree? -> Traverse the DOM Tree
mixed $e->children ( [int $index] ) Returns the Nth child object if index is set, otherwise returns an array of children.
Or the much easier way, the ->innertext magic attribute:
$ul = $html->find('ul[id="members-list"]', 0);
echo $ul->innertext;

You can use:
$('#members-list li')
to iterate over them:
$('#members-list li').each(function(){
console.log(this);//object of current li
});

Have a look at this; it will also print the values inside the li tags:
<?php
$html = file_get_contents('2.html');
$dom = new DOMDocument;
$dom->loadHTML($html);
foreach ($dom->getElementsByTagName('ul') as $node) {
foreach($node->childNodes as $childNode){
echo $childNode->nodeValue;
}
}
?>

You have to use .html() for the selected ul element:
include_once('simple_html_dom.php');
$html = file_get_html('URL');
// html(): to get all child elements with their tags
$elem = $html->find('ul[id=members-list]', 0)->html();
echo $elem;
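For comparison, a sketch of the same "inner HTML" idea using only the built-in DOMDocument rather than simple_html_dom. It assumes the markup from the question is available as a string and serializes each child of the ul with saveHTML($node), so the li tags are kept and the ul wrapper is not.

```php
<?php
$html = '<ul id="members-list"><li>1</li><li>2</li><li>3</li><li>4</li></ul>';
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$ul = $xpath->query('//ul[@id="members-list"]')->item(0);

$inner = '';
foreach ($ul->childNodes as $child) {
    // saveHTML($node) serializes just that node, tags included
    $inner .= $dom->saveHTML($child);
}
echo $inner;
```

The exact whitespace between the serialized items depends on the underlying libxml serializer, so treat the output as markup rather than comparing it byte for byte.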

PHP: DOMDocument: Remove Unwanted Text from a Nested Element

I have the following xml document:
<?xml version="1.0" encoding="UTF-8"?>
<header level="2">My Header</header>
<ul>
<li>Bulleted style text
<ul>
<li>
<paragraph>1.Sub Bulleted style text</paragraph>
</li>
</ul>
</li>
</ul>
<ul>
<li>Bulleted style text <strong>bold</strong>
<ul>
<li>
<paragraph>2.Sub Bulleted <strong>bold</strong></paragraph>
</li>
</ul>
</li>
</ul>
I need to remove the numbers preceding the sub-bulleted text (1. and 2. in the given example).
This is the code I have so far:
<?php
class MyDocumentImporter
{
const AWKWARD_BULLET_REGEX = '/(^[\s]?[\d]+[\.]{1})/i';
protected $xml_string = '<some_tag><header level="2">My Header</header><ul><li>Bulleted style text<ul><li><paragraph>1.Sub Bulleted style text</paragraph></li></ul></li></ul><ul><li>Bulleted style text <strong>bold</strong><ul><li><paragraph>2.Sub Bulleted <strong>bold</strong></paragraph></li></ul></li></ul></some_tag>';
protected $dom;
public function processListsText( $loop = null ){
$this->dom = new DomDocument('1.0', 'UTF-8');
$this->dom->loadXML($this->xml_string);
if(!$loop){
//get all the li tags
$li_set = $this->dom->getElementsByTagName('li');
}
else{
$li_set = $loop;
}
foreach($li_set as $li){
//check for child nodes
if(! $li->hasChildNodes() ){
continue;
}
foreach($li->childNodes as $child){
if( $child->hasChildNodes() ){
//this li has children, maybe a <strong> tag
$this->processListsText( $child->childNodes );
}
if( ! ( $child instanceof DOMElement ) ){
continue;
}
if( ( $child->localName != 'paragraph') || ( $child instanceof DOMText )){
continue;
}
if( preg_match(self::AWKWARD_BULLET_REGEX, $child->textContent) == 0 ){
continue;
}
$clean_content = preg_replace(self::AWKWARD_BULLET_REGEX, '', $child->textContent);
//set node to empty
$child->nodeValue = '';
//add updated content to node
$child->appendChild($child->ownerDocument->createTextNode($clean_content));
//$xml_output = $child->parentNode->ownerDocument->saveXML($child);
//var_dump($xml_output);
}
}
}
}
$importer = new MyDocumentImporter();
$importer->processListsText();
The issue I can see is that $child->textContent returns the plain text content of the node, and strips the additional child tags. So:
<paragraph>2.Sub Bulleted <strong>bold</strong></paragraph>
becomes
<paragraph>Sub Bulleted bold</paragraph>
The <strong> tag is no more.
I'm a little stumped... Can anyone see a way to strip the unwanted characters, and retain the "inner child" <strong> tag?
The tag may not always be <strong>, it could also be a hyperlink <a href="#">, or <emphasize>.

Assuming your XML actually parses, you could use XPath to make your queries a lot easier:
$xp = new DOMXPath($this->dom);
foreach ($xp->query('//li/paragraph') as $para) {
$para->firstChild->nodeValue = preg_replace('/^\s*\d+\.\s*/', '', $para->firstChild->nodeValue);
}
It does the text replacement on the first text node instead of the whole tag contents.

You are resetting the paragraph's whole content, but what you want is to alter only the first text node (keep in mind that text nodes are nodes too). You might want to look for the xpath //li/paragraph/text()[position()=1], and work on or replace that DOMText node instead of the whole paragraph content.
$d = new DOMDocument();
$d->loadXML($xml);
$p = new DOMXPath($d);
foreach($p->query('//li/paragraph/text()[position()=1]') as $text){
$text->parentNode->replaceChild(new DOMText(preg_replace(self::AWKWARD_BULLET_REGEX, '', $text->textContent)), $text);
}
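A runnable sketch of the text-node approach outside the class, using a shortened copy of the XML from the question and an inline regex in place of the class constant. It edits the text node in place rather than replacing it. One caveat: if a paragraph starts with an element instead of text, text()[1] selects a later text node, which the ^ anchor would then treat as the start.

```php
<?php
$xml = '<some_tag><ul><li>Bulleted style text <strong>bold</strong><ul><li>'
     . '<paragraph>2.Sub Bulleted <strong>bold</strong></paragraph>'
     . '</li></ul></li></ul></some_tag>';
$d = new DOMDocument();
$d->loadXML($xml);
$p = new DOMXPath($d);

foreach ($p->query('//li/paragraph/text()[1]') as $text) {
    // Only this text node changes; sibling elements such as <strong> stay intact.
    $text->nodeValue = preg_replace('/^\s*\d+\.\s*/', '', $text->nodeValue);
}

$out = $d->saveXML($d->getElementsByTagName('paragraph')->item(0));
echo $out;
```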

PHP Grab HTML Tag Rows/Hierarchy

So I wanted to know if there was a way to grab a specific HTML tag's information using PHP.
Let's say we had this code:
<ul>
<li>First List</li>
<li>Second List</li>
<li>Third List</li>
</ul>
How could I search the HTML and pull the third list item's value into a variable? Or is there a way you could pull the entire unordered list into an array?

This has not been tested, but one way is to create a function that uses PHP's DOMDocument and its getElementsByTagName method, which returns a DOMNodeList whose nodes you can access by index.
function grabAttributes($file, $tag, $index) {
$dom = new DOMDocument();
if (!@$dom->loadHTMLFile($file)) { // loadHTMLFile() parses HTML; load() expects well-formed XML
echo $file . " doesn't exist!\n";
return;
}
$list = $dom->getElementsByTagName($tag); // returns DOMNodeList of given tag
$newElement = $list->item($index)->nodeValue; // initialize variable
return $newElement;
}
If you call grabAttributes("myfile.html", "li", 2), it returns "Third List".
Or you can make a function that puts the text values of all elements with a given tag into an array.
function putAttributes($file, $tag) {
$dom = new DOMDocument();
if (!@$dom->loadHTMLFile($file)) { // loadHTMLFile() parses HTML; load() expects well-formed XML
echo $file . " doesn't exist!\n";
return;
}
$list = $dom->getElementsByTagName($tag); // returns DOMNodeList of given tag
$myArray = array(); // array to contain values.
foreach ($list as $tag) { // loop through node list and add to an array.
$myArray[] = $tag->nodeValue;
}
return $myArray;
}
If you call putAttributes("myfile.html", "li") it would return array("First List", "Second List", "Third List")
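A short sketch of both tasks with XPath, working on the markup as a string instead of a file (an assumption made to keep the example self-contained). XPath positions are 1-based, whereas DOMNodeList::item() is 0-based.

```php
<?php
$html = '<ul><li>First List</li><li>Second List</li><li>Third List</li></ul>';
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// li[3] selects the third <li> child of each <ul>.
$third = $xpath->query('//ul/li[3]')->item(0);
$value = $third !== null ? $third->nodeValue : null;

// Collect the text of every list item into an array.
$all = [];
foreach ($xpath->query('//ul/li') as $li) {
    $all[] = $li->nodeValue;
}
```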
