This is the PHP code:
include_once('simple_html_dom.php');
$html = file_get_html('URL');
$elem = $html->find('ul[id=members-list]', 0);
echo $elem;
I would like to get just the contents of the UL, i.e. the <li> elements themselves, not the <ul> wrapper.
The HTML is as follows:
<ul id="members-list">
<li>1</li>
<li>2</li>
<li>3</li>
<li>4</li>
</ul>
So when I echo $elem, the output includes the <ul> itself. I want to leave it out and return just:
<li>1</li>
<li>2</li>
<li>3</li>
<li>4</li>
You just forgot to use the children() method. Consider this example:
$ul = $html->find('ul[id="members-list"]', 0)->children();
foreach($ul as $li) {
echo $li;
}
As stated in the manual, under "How to traverse the DOM tree?" (Traverse the DOM Tree):
mixed $e->children ( [int $index] ): Returns the Nth child object if index is set, otherwise returns an array of children.
Or, the much easier way: use the ->innertext magic attribute:
$ul = $html->find('ul[id="members-list"]', 0);
echo $ul->innertext;
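If simple_html_dom is not available, roughly the same result as ->innertext can be obtained with PHP's built-in DOM extension. This is a sketch under assumptions, not part of either answer: the helper name innerHtml is hypothetical, and the markup is inlined as a string instead of being fetched from 'URL'.

```php
<?php
// Hypothetical helper: returns the HTML of all child nodes of $element,
// i.e. the element's contents without its own opening and closing tags.
function innerHtml(DOMElement $element): string
{
    $html = '';
    foreach ($element->childNodes as $child) {
        // saveHTML() with a node argument serializes only that node.
        $html .= $element->ownerDocument->saveHTML($child);
    }
    return $html;
}

$markup = '<ul id="members-list"><li>1</li><li>2</li><li>3</li><li>4</li></ul>';

$dom = new DOMDocument();
$dom->loadHTML($markup);
$xpath = new DOMXPath($dom);
$ul = $xpath->query('//ul[@id="members-list"]')->item(0);

echo innerHtml($ul), "\n";
```

Depending on the PHP/libxml version, saveHTML() may insert line breaks between the serialized <li> elements.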
You can use (jQuery, client side):
$('#members-list li')
To iterate over them:
$('#members-list li').each(function(){
console.log(this); // the current li element
});
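The jQuery version runs in the browser. If the same selection is needed on the server in PHP, the CSS selector #members-list li corresponds to the XPath expression //*[@id="members-list"]//li. A minimal sketch with DOMXPath, assuming the HTML is already available as a string:

```php
<?php
$markup = '<ul id="members-list"><li>1</li><li>2</li><li>3</li><li>4</li></ul>';

$dom = new DOMDocument();
$dom->loadHTML($markup);
$xpath = new DOMXPath($dom);

// Equivalent of the CSS selector "#members-list li":
// every li descendant of the element whose id is "members-list".
$values = [];
foreach ($xpath->query('//*[@id="members-list"]//li') as $li) {
    $values[] = $li->textContent; // comparable to $(this).text() in the jQuery loop
}

print_r($values);
```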
Have a look at this; it will also print the values inside the li tags:
<?php
$html = file_get_contents('2.html');
$dom = new DOMDocument;
$dom->loadHTML($html);
foreach ($dom->getElementsByTagName('ul') as $node) {
foreach($node->childNodes as $childNode){
echo $childNode->nodeValue;
}
}
?>
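One caveat with the loop above: childNodes can also contain the whitespace text nodes between the <li> tags, so blank values may be echoed too, and the values run together. A variant that keeps only element children could look like this (the inline $html string stands in for the contents of 2.html, which is an assumption):

```php
<?php
$html = "<ul>\n  <li>1</li>\n  <li>2</li>\n  <li>3</li>\n  <li>4</li>\n</ul>";

$dom = new DOMDocument();
$dom->loadHTML($html);

$values = [];
foreach ($dom->getElementsByTagName('ul') as $node) {
    foreach ($node->childNodes as $childNode) {
        // Skip non-element children such as whitespace text nodes.
        if ($childNode->nodeType !== XML_ELEMENT_NODE) {
            continue;
        }
        $values[] = $childNode->nodeValue;
    }
}

echo implode("\n", $values), "\n";
```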
You have to use .html() on the selected ul element:
include_once('simple_html_dom.php');
$html = file_get_html('URL');
$elem = $html->find('ul[id=members-list]', 0)->html(); // gets all child elements, including their tags
echo $elem;
I have nested bullet lists of ul > li > ul > li, etc.
<ul>
<li>Mammals
<ul>
<li>Canine
<ul>
<li>Fox</li>
<li>Wolf</li>
</ul>
</li>
<li>Feline</li>
</ul>
</li>
<li>Fish</li>
</ul>
How can I apply a class to all "li" elements (recursively) which are ancestors of the target element? I have:
<?php
$list = ob_get_clean();
$dom = new DOMDocument;
$dom->loadHTML($list);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//li');
foreach ($nodes as $object) {
$parts = parse_url($object->nodeValue);
parse_str($parts['query'], $query);
if (true) {
//if certain requirements are met, modify the current object
}
//also modify all ancestor li elements
//$object-> ?? ->setAttribute('class', 'current');
}
?>
There are reasons why the target objects must be identified before searching through each one's ancestors; I just stripped this code down to the relevant parts.
Work up the chain of parent nodes, altering those which are li nodes:
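<?php
// $object is the target li from the loop in the question
$parentNode = $object->parentNode; // ul
while ($parentNode = $parentNode->parentNode) {
    // loadHTML() lower-cases tag names, so compare case-insensitively
    if (strtolower($parentNode->nodeName) === 'li') {
        $parentNode->setAttribute('class', 'current');
    }
}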
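To see the parent-chain walk end to end, here is a self-contained sketch using the nested list from the question. Assumptions: the helper name markAncestorItems is hypothetical, and "Fox" is picked as the target purely for illustration, whereas in the question the target comes from other checks.

```php
<?php
// Hypothetical helper: add class="current" to every <li> ancestor of $node.
function markAncestorItems(DOMNode $node): void
{
    $parentNode = $node->parentNode;
    while ($parentNode !== null) {
        // loadHTML() lower-cases tag names, so compare case-insensitively.
        if ($parentNode instanceof DOMElement && strtolower($parentNode->nodeName) === 'li') {
            $parentNode->setAttribute('class', 'current');
        }
        $parentNode = $parentNode->parentNode;
    }
}

$list = '<ul><li>Mammals<ul><li>Canine<ul><li>Fox</li><li>Wolf</li></ul></li>'
      . '<li>Feline</li></ul></li><li>Fish</li></ul>';

$dom = new DOMDocument();
$dom->loadHTML($list);
$xpath = new DOMXPath($dom);

// Illustrative target: the li whose own first text is "Fox".
$target = $xpath->query('//li[normalize-space(text()[1])="Fox"]')->item(0);
markAncestorItems($target);

$marked = [];
foreach ($xpath->query('//li[@class="current"]') as $li) {
    $marked[] = trim($li->firstChild->nodeValue);
}
print_r($marked); // the Mammals and Canine items
```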
I was looking at this tip:
"PHP DOM get items from first ul element"
But in this case:
<li>First item
<ul>
<li>
First SubItem
</li>
<li>
Second SubItem
</li>
</ul>
</li>
PHP Code:
$DOM = new DOMDocument;
libxml_use_internal_errors(true);
$DOM->loadHTML( $output);
$items = $DOM->getElementsByTagName('ul');
echo '<ul>';
foreach ($items->item(3)->getElementsByTagName('li') as $li) {
var_dump($li);die();
echo '<li>'.$li->nodeValue;
$ul = $li->getElementsByTagName('ul');
echo '<ul>';
echo '--->'.$ul->length.'<br>';
for($u=0;$u<$ul->length;$u++){
foreach ($ul->item($u)->getElementsByTagName('li') as $lii) {
echo '<li>'.$lii->nodeValue.'</li>';
}
}
echo '</ul>';
echo '</li>';
}
echo '</ul>';
The problem is:
For the first node, $li->nodeValue gives me "First itemFirst SubItemSecond SubItem".
I need to get these items separately (including the sub-items).
I'm assuming you just want to retrieve the text values from those <li> tags.
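You can greatly simplify the query with DOMXPath: ->query('//li') fetches all <li> tags in your snippet, and '//li/text()' narrows that to the text nodes directly inside each <li>, so the text of the nested sub-items is no longer lumped into their parent.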
$DOM = new DOMDocument();
$DOM->loadHTML($output);
$xPath = new DOMXPath($DOM);
if($xpResponse = $xPath->query('//li/text()[normalize-space()]')) { // skip whitespace-only text nodes
echo "<ul>\n";
foreach($xpResponse as $xNode) {
echo "<li>" . trim($xNode->nodeValue) . "</li>\n";
}
echo "</ul>\n";
}
This will simply output (as HTML):
First item
First SubItem
Second SubItem
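If you also need to know which text belongs to which <li> (for example, to rebuild the nesting), a variation is to loop over the <li> elements and read only each one's own text nodes with a relative text() query. A minimal sketch, assuming $output holds the snippet from the question wrapped in an outer <ul>; the indentation-by-depth output format is just an illustration:

```php
<?php
$output = "<ul><li>First item\n<ul>\n<li>\nFirst SubItem\n</li>\n<li>\nSecond SubItem\n</li>\n</ul>\n</li></ul>";

$DOM = new DOMDocument();
$DOM->loadHTML($output);
$xPath = new DOMXPath($DOM);

$items = [];
foreach ($xPath->query('//li') as $li) {
    // text() relative to $li: only the li's own text, not that of nested <li>s.
    $ownText = '';
    foreach ($xPath->query('text()', $li) as $textNode) {
        $ownText .= $textNode->nodeValue;
    }
    // Count <li> ancestors to know how deeply this item is nested.
    $depth = $xPath->query('ancestor::li', $li)->length;
    $items[] = str_repeat('  ', $depth) . trim($ownText);
}

echo implode("\n", $items), "\n";
```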
I have the following:
$string = '<html><head></head><body><ul id="mainmenu">
<li id="1">Hallo</li>
<li id="2">Welt
<ul>
<li id="3">Sub Hallo</li>
<li id="4">Sub Welt</li>
</ul>
</li>
</ul></body></html>';
$dom = new DOMDocument;
$dom->loadHTML($string);
Now I want to collect all the li IDs into one array.
I tried the following:
$all_li_ids = array();
$menu_nodes = $dom->getElementById('mainmenu')->childNodes;
foreach($menu_nodes as $li_node){
if($li_node->nodeName=='li'){
$all_li_ids[]=$li_node->getAttribute('id');
}
}
print_r($all_li_ids);
As you can see, this prints [1,2].
How do I get all descendants (the sub-children as well, i.e. [1,2,3,4])?
In my test, $dom->getElementById('mainmenu') doesn't return the element. If it does in yours, you don't need XPath; otherwise, use it to find the ul:
$xpath = new DOMXPath($dom);
$ul = $xpath->query("//*[@id='mainmenu']")->item(0);
$all_li_ids = array();
// Find all inner li tags
$menu_nodes = $ul->getElementsByTagName('li');
foreach($menu_nodes as $li_node){
$all_li_ids[]=$li_node->getAttribute('id');
}
print_r($all_li_ids); // prints the IDs 1, 2, 3, 4
One way to do it would be to add another foreach loop over each li's descendants, i.e.:
foreach($menu_nodes as $node){
if($node->nodeName=='li'){
$all_li_ids[]=$node->getAttribute('id');
// getElementsByTagName() searches all descendants, so nested li's are found too
foreach($node->getElementsByTagName('li') as $sub_node){
$all_li_ids[]=$sub_node->getAttribute('id');
}
}
}
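Another option is to let XPath do the recursion: a // step after the ul matches <li> elements at any depth, and ending the path in /@id returns the id attribute nodes themselves. A minimal sketch using the $string from the question; this is an alternative to the loops above, not a quote of either answer:

```php
<?php
$string = '<html><head></head><body><ul id="mainmenu">
<li id="1">Hallo</li>
<li id="2">Welt
<ul>
<li id="3">Sub Hallo</li>
<li id="4">Sub Welt</li>
</ul>
</li>
</ul></body></html>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument;
$dom->loadHTML($string);
$xpath = new DOMXPath($dom);

// "//li" below the ul matches li elements at any depth;
// "/@id" selects their id attribute nodes directly.
$all_li_ids = [];
foreach ($xpath->query('//ul[@id="mainmenu"]//li/@id') as $attr) {
    $all_li_ids[] = $attr->value;
}

print_r($all_li_ids);
```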
I have the following XML document:
<?xml version="1.0" encoding="UTF-8"?>
<header level="2">My Header</header>
<ul>
<li>Bulleted style text
<ul>
<li>
<paragraph>1.Sub Bulleted style text</paragraph>
</li>
</ul>
</li>
</ul>
<ul>
<li>Bulleted style text <strong>bold</strong>
<ul>
<li>
<paragraph>2.Sub Bulleted <strong>bold</strong></paragraph>
</li>
</ul>
</li>
</ul>
I need to remove the numbers preceding the sub-bulleted text (1. and 2. in the given example).
This is the code I have so far:
<?php
class MyDocumentImporter
{
const AWKWARD_BULLET_REGEX = '/(^[\s]?[\d]+[\.]{1})/i';
protected $xml_string = '<some_tag><header level="2">My Header</header><ul><li>Bulleted style text<ul><li><paragraph>1.Sub Bulleted style text</paragraph></li></ul></li></ul><ul><li>Bulleted style text <strong>bold</strong><ul><li><paragraph>2.Sub Bulleted <strong>bold</strong></paragraph></li></ul></li></ul></some_tag>';
protected $dom;
public function processListsText( $loop = null ){
$this->dom = new DomDocument('1.0', 'UTF-8');
$this->dom->loadXML($this->xml_string);
if(!$loop){
//get all the li tags
$li_set = $this->dom->getElementsByTagName('li');
}
else{
$li_set = $loop;
}
foreach($li_set as $li){
//check for child nodes
if(! $li->hasChildNodes() ){
continue;
}
foreach($li->childNodes as $child){
if( $child->hasChildNodes() ){
//this li has children, maybe a <strong> tag
$this->processListsText( $child->childNodes );
}
if( ! ( $child instanceof DOMElement ) ){
continue;
}
if( ( $child->localName != 'paragraph') || ( $child instanceof DOMText )){
continue;
}
if( preg_match(self::AWKWARD_BULLET_REGEX, $child->textContent) == 0 ){
continue;
}
$clean_content = preg_replace(self::AWKWARD_BULLET_REGEX, '', $child->textContent);
//set node to empty
$child->nodeValue = '';
//add updated content to node
$child->appendChild($child->ownerDocument->createTextNode($clean_content));
//$xml_output = $child->parentNode->ownerDocument->saveXML($child);
//var_dump($xml_output);
}
}
}
}
$importer = new MyDocumentImporter();
$importer->processListsText();
The issue I can see is that $child->textContent returns the plain text content of the node, and strips the additional child tags. So:
<paragraph>2.Sub Bulleted <strong>bold</strong></paragraph>
becomes
<paragraph>Sub Bulleted bold</paragraph>
The <strong> tag is no more.
I'm a little stumped... Can anyone see a way to strip the unwanted characters, and retain the "inner child" <strong> tag?
The tag may not always be <strong>; it could also be a hyperlink <a href="#">, or <emphasize>.
Assuming your XML actually parses, you could use XPath to make your queries a lot easier:
$xp = new DOMXPath($this->dom);
foreach ($xp->query('//li/paragraph') as $para) {
$para->firstChild->nodeValue = preg_replace('/^\s*\d+\.\s*/', '', $para->firstChild->nodeValue);
}
It does the text replacement on the first text node instead of the whole tag contents.
You're resetting the paragraph's whole content, but what you want is to alter only its first text node (keep in mind that text nodes are nodes too). Look for the XPath //li/paragraph/text()[position()=1], and replace that DOMText node instead of the whole paragraph content.
$d = new DOMDocument();
$d->loadXML($xml);
$p = new DOMXPath($d);
foreach($p->query('//li/paragraph/text()[position()=1]') as $text){
$text->parentNode->replaceChild(new DOMText(preg_replace(self::AWKWARD_BULLET_REGEX, '', $text->textContent)), $text); // self:: assumes this runs inside MyDocumentImporter
}
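A standalone sketch of the first-text-node approach, runnable outside the class: it inlines the XML string from the question and uses the regex '/^\s*\d+\.\s*/' instead of AWKWARD_BULLET_REGEX (an assumption for illustration). It edits the DOMText's data in place rather than replacing the node, which has the same effect here.

```php
<?php
$xml = '<some_tag><header level="2">My Header</header><ul><li>Bulleted style text<ul><li>'
     . '<paragraph>1.Sub Bulleted style text</paragraph></li></ul></li></ul><ul><li>'
     . 'Bulleted style text <strong>bold</strong><ul><li><paragraph>2.Sub Bulleted '
     . '<strong>bold</strong></paragraph></li></ul></li></ul></some_tag>';

$d = new DOMDocument();
$d->loadXML($xml);
$p = new DOMXPath($d);

// Only the first text node of each paragraph is changed, so sibling
// elements such as <strong> or <a> stay in place.
foreach ($p->query('//li/paragraph/text()[position()=1]') as $text) {
    $text->data = preg_replace('/^\s*\d+\.\s*/', '', $text->data);
}

$paragraphs = [];
foreach ($p->query('//paragraph') as $para) {
    $paragraphs[] = $d->saveXML($para);
}
echo implode("\n", $paragraphs), "\n";
```

If a paragraph can start with an element (e.g. <paragraph><strong>1.</strong> ...</paragraph>), its first text node is not at the start, and this sketch would need adjusting.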
So I wanted to know if there was a way to grab a specific HTML tag's information using PHP.
Let's say we had this code:
<ul>
<li>First List</li>
<li>Second List</li>
<li>Third List</li>
</ul>
How could I search the HTML and pull the third list item's value into a variable? Or is there a way you could pull the entire unordered list into an array?
This has not been tested, but one way is to create a function that uses DOMDocument and its getElementsByTagName method, which returns a DOMNodeList whose nodes you can access by index.
function grabAttributes($file, $tag, $index) {
$dom = new DOMDocument();
if (!@$dom->loadHTMLFile($file)) { // loadHTMLFile() tolerates non-XML HTML; @ suppresses warnings
echo $file . " doesn't exist!\n";
return;
}
$list = $dom->getElementsByTagName($tag); // returns DOMNodeList of given tag
$newElement = $list->item($index)->nodeValue; // initialize variable
return $newElement;
}
If you call grabAttributes("myfile.html", "li", 2), it will return "Third List" (the index is zero-based).
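grabAttributes() reads ->nodeValue from item($index) directly, but item() returns null when the index is past the end of the list, which triggers a warning in PHP 8. A hedged variant that parses an HTML string instead of a file and returns null in that case (the function name grabTagValue is hypothetical):

```php
<?php
// Hypothetical variant of grabAttributes() that parses an HTML string
// and returns null when there is no element at $index.
function grabTagValue(string $html, string $tag, int $index): ?string
{
    libxml_use_internal_errors(true); // keep parser warnings quiet
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $node = $dom->getElementsByTagName($tag)->item($index); // null if out of range
    return $node === null ? null : $node->nodeValue;
}

$html = '<ul><li>First List</li><li>Second List</li><li>Third List</li></ul>';
var_dump(grabTagValue($html, 'li', 2));  // zero-based: the third <li>
var_dump(grabTagValue($html, 'li', 10)); // no such item
```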
Or you can make a function that puts the values of all elements with a given tag into an array.
function putAttributes($file, $tag) {
$dom = new DOMDocument();
if (!@$dom->loadHTMLFile($file)) { // loadHTMLFile() tolerates non-XML HTML; @ suppresses warnings
echo $file . " doesn't exist!\n";
return;
}
$list = $dom->getElementsByTagName($tag); // returns DOMNodeList of given tag
$myArray = array(); // array to contain values.
foreach ($list as $node) { // loop through the node list and add each value to the array
$myArray[] = $node->nodeValue;
}
return $myArray;
}
If you call putAttributes("myfile.html", "li"), it returns array("First List", "Second List", "Third List").
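Since a DOMNodeList is Traversable, the loop in putAttributes() can also be written with iterator_to_array() and array_map(). A minimal sketch that works on an HTML string instead of a file (the function name tagValues is hypothetical; arrow functions need PHP 7.4 or later):

```php
<?php
// Hypothetical variant of putAttributes() working on an HTML string.
function tagValues(string $html, string $tag): array
{
    libxml_use_internal_errors(true); // keep parser warnings quiet
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    // Convert the DOMNodeList to a plain, zero-indexed array of nodes.
    $nodes = iterator_to_array($dom->getElementsByTagName($tag), false);
    return array_map(fn (DOMNode $node) => $node->nodeValue, $nodes);
}

$html = '<ul><li>First List</li><li>Second List</li><li>Third List</li></ul>';
print_r(tagValues($html, 'li'));
```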