PHP Simple HTML DOM Parser find direct LI elements - php

HTML:
<ul>
<li><a></a>
<ul>
<li></li>
<li></li>
</ul>
</li>
<li>
...
</li>
</ul>
For parent ul:first-of-type, what would be the selector for its (direct) child li elements, in order to parse the descendant li elements separately?

In jQuery you can simply use this selector: ul > li
Update:-
Using Simple DOM:-
<ul class="listitems">
<li><a></a>
<ul>
<li></li>
<li></li>
</ul>
</li>
<li>
...
</li>
</ul>
Simple HTML Dom code to get just the first level li items:
$html = file_get_html( $url );
$first_level_items = $html->find( '.listitems', 0)->children();
foreach ( $first_level_items as $item ) {
// ... do stuff ...
}
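If the Simple HTML DOM library is not available, the same "direct children only" idea can be sketched with PHP's built-in DOM extension. This is only a minimal sketch: the item text in $markup is made up for illustration, and it assumes the outer list is the first ul in the document.

```php
<?php
// Sample markup modeled on the question; the item text is invented for illustration.
$markup = '<ul class="listitems">'
    . '<li><a>first</a><ul><li>nested a</li><li>nested b</li></ul></li>'
    . '<li>second</li>'
    . '</ul>';

$doc = new DOMDocument();
$doc->loadHTML($markup);

// Assumption: the first <ul> in document order is the outer list.
$outerList = $doc->getElementsByTagName('ul')->item(0);

$directItems = [];
foreach ($outerList->childNodes as $child) {
    // childNodes can also contain text nodes, so keep only <li> elements.
    if ($child instanceof DOMElement && $child->nodeName === 'li') {
        $directItems[] = $child;
    }
}

echo count($directItems), " direct items\n";
```

The nested li elements stay reachable through each item's own child ul, so they can be parsed in a separate pass.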

Related

How to get parent and nested elements by DOMDocument?

In a typical HTML as
<ol>
<li>
<span>parent</span>
<ul>
<li><span>nested 1</span></li>
<li><span>nested 2</span></li>
</ul>
</li>
</ol>
I try to get the contents of <li> elements but I need to get the parent and those nested under ul separately.
If I go as
$ols = $doc->getElementsByTagName('ol');
foreach($ols as $ol){
$lis = $ol->getElementsByTagName('li');
// here I need li immediately under <ol>
}
$lis is all li elements including both parent and nested ones.
How can I get li elements one level under ol by ignoring deeper levels?
There are two approaches to this. The first keeps your getElementsByTagName() approach: just pick out the first <li> tag and assume that it is the correct one...
$ols = $doc->getElementsByTagName('ol');
foreach($ols as $ol){
$lis = $ol->getElementsByTagName('li')[0];
echo $doc->saveHTML($lis).PHP_EOL;
}
This echoes...
<li>
<span>parent</span>
<ul>
<li><span>nested 1</span></li>
<li><span>nested 2</span></li>
</ul>
</li>
which should work, but is not always exact enough.
The other method is to use XPath, where you can specify the exact level of the tags you want to retrieve. This uses //ol/li, which selects every <li> element that is a direct child of an <ol> element.
$xp = new DOMXPath($doc);
$lis = $xp->query("//ol/li");
foreach ( $lis as $li ) {
echo $doc->saveHTML($li);
}
this also gives...
<li>
<span>parent</span>
<ul>
<li><span>nested 1</span></li>
<li><span>nested 2</span></li>
</ul>
</li>
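To get the parent and the nested items separately, one option is to run a second XPath query relative to each parent li. A rough sketch, assuming the markup from the question; the $result array and its layout are just an illustration:

```php
<?php
$markup = '<ol><li><span>parent</span><ul>'
    . '<li><span>nested 1</span></li>'
    . '<li><span>nested 2</span></li>'
    . '</ul></li></ol>';

$doc = new DOMDocument();
$doc->loadHTML($markup);
$xp = new DOMXPath($doc);

$result = [];
foreach ($xp->query('//ol/li') as $parentLi) {
    // Text of the <span> that is a direct child of this <li>.
    $label = $xp->evaluate('string(./span)', $parentLi);

    $nested = [];
    // "./ul/li" is resolved relative to $parentLi, so only its own sub-items match.
    foreach ($xp->query('./ul/li', $parentLi) as $childLi) {
        $nested[] = trim($childLi->textContent);
    }
    $result[$label] = $nested;
}

print_r($result);
```

The second argument to query() and evaluate() is the context node, which is what keeps the inner query from matching li elements elsewhere in the document.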

nested ul li categories function

my categories table
ID,PARENT,NAME,ORDER
ul > li > li ... I want to sort my data like this with a PHP function.
First I want to load everything into an array, then use the function.
// ORDER is a reserved word in MySQL, so the column name has to be quoted with backticks
$query = mysql_query("SELECT ID,PARENT,NAME,`ORDER` FROM categories");
$category = array();
if(mysql_num_rows($query)>0){
while ($rs= mysql_fetch_assoc($query))
$category[$rs['PARENT']][$rs['ORDER']] = array('id'=>$rs['ID'],'name'=>$rs['NAME']);
}
After that, how can I print a sorted menu like this?
<ul>
<li>menu1
<ul>
<li>menu1a</li>
<li>menu1b</li>
</ul>
</li>
<li>menu2
<ul>
<li>menu2a</li>
<li>menu2b</li>
</ul>
</li>
<li>menu3
<ul>
<li>menu3a<ul>
<li>menu3a_a</li>
<li>menu3a_b</li>
</ul></li>
<li>menu3b</li>
</ul>
</li>
</ul>
I think you have to use recurse(0, $category); instead of recurse(0, $categories);
and for every category you have to call this function.
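The recurse() function itself is not shown in this thread, so the following is only a guess at what such a function could look like, not the original code. It assumes the $category[PARENT][ORDER] layout built in the question, that top-level rows have PARENT 0, and that the data contains no cycles; the sample rows are made up.

```php
<?php
// Hypothetical data in the shape built by the question's loop:
// $category[PARENT][ORDER] = ['id' => ID, 'name' => NAME]
$category = [
    0 => [2 => ['id' => 2, 'name' => 'menu2'], 1 => ['id' => 1, 'name' => 'menu1']],
    1 => [1 => ['id' => 3, 'name' => 'menu1a'], 2 => ['id' => 4, 'name' => 'menu1b']],
];

// Build the nested <ul> for everything below $parent.
function recurse(int $parent, array $category): string
{
    if (empty($category[$parent])) {
        return ''; // no children: emit no <ul> at all
    }
    $children = $category[$parent];
    ksort($children); // sort siblings by their ORDER key

    $html = '<ul>';
    foreach ($children as $row) {
        $html .= '<li>' . htmlspecialchars($row['name'])
            . recurse((int) $row['id'], $category) . '</li>';
    }
    return $html . '</ul>';
}

echo recurse(0, $category);
```

Because the function calls itself for every row's ID, any depth of nesting is handled by the same code.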

PHP How can I split li items from a ul list by class?

I am trying to use Simple DOM Parser to parse each li item in the ul list by the class name. Below you can see the code I am trying to parse; keep in mind it is an excerpt from the page's HTML source, limited to the part I want to parse.
I have tried with this PHP code, but it doesn't work at all..
include 'simple_html_dom.php';
$url = "http://www.pinterest.com/avast/";
$html = file_get_html( $url );
$first_level_items = $html->find( '.userStats', 0)->children();
foreach ( $first_level_items as $item ) {
print_r($first_level_items);
}
This is the ul list I want to get the li items from.
<ul class="userStats">
<li>
<a href="/avast/boards/" type="button" class="NavigateButton Button Module ButtonBase hasText borderless active BoardCount">
<span class="buttonText">
13 opslagstavler
</span>
</a>
</li>
<li>
<a href="/avast/pins/">
<div class="PinCount Module">
317 pins
</div>
</a>
</li>
<li>
<a href="/avast/likes/">
1 synes om
</a>
</li>
</ul>
Any help would be appreciated!
Have you considered using an xpath query instead?
$xml = new SimpleXMLElement($your_ul_html);
$result = $xml->xpath('//ul[@class="userStats"]/li');
foreach($result as $node) {
print_r($node); // or echo $node->asXML();
}
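SimpleXMLElement expects well-formed XML, which scraped HTML often is not. A sketch of the same XPath query through DOMDocument, which is more forgiving; $markup here is a trimmed, simplified copy of the list from the question, and $stats is just an illustrative name:

```php
<?php
// Trimmed, simplified copy of the list from the question.
$markup = '<ul class="userStats">'
    . '<li><a href="/avast/boards/"><span class="buttonText"> 13 opslagstavler </span></a></li>'
    . '<li><a href="/avast/pins/"> 317 pins </a></li>'
    . '<li><a href="/avast/likes/"> 1 synes om </a></li>'
    . '</ul>';

$doc = new DOMDocument();
// Collect parser warnings internally instead of printing them for messy real-world markup.
libxml_use_internal_errors(true);
$doc->loadHTML($markup);
libxml_clear_errors();

$xp = new DOMXPath($doc);
$stats = [];
foreach ($xp->query('//ul[@class="userStats"]/li') as $li) {
    // textContent concatenates all text inside the <li>, whatever the nesting.
    $stats[] = trim($li->textContent);
}

print_r($stats);
```

For a live page, the string passed to loadHTML() would be the fetched source rather than the inline sample.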

Remove unnecessary li

echo $nav gives code like this:
<ul>
<li class="someclass">sometext
<ul>
<li class="someclass">sometext</li>
<li class="spacer"></li>
<li class="someclass">sometext</li>
<li class="spacer"></li>
<li class="someclass">sometext</li>
<li class="spacer"></li>
<li class="someclass">sometext</li>
<li class="spacer"></li>
</ul>
</li>
<li class="spacer"></li>
<li class="someclass">sometext</li>
<li class="spacer"></li>
</ul>
There are list items with class spacer inside each child ul, after each normal list item.
How do I remove the spacer list items which are grandchildren of the main list, using PHP?
Example: <ul> <li> <ul> <li class="spacer">
I'm searching for a regular expression, which should erase <li class="spacer"></li> only in a child <ul> element.
If you don't have access to the $nav variable to remove it (which you likely do) then I'd just use CSS to hide it, something like this should work:
li ul li.spacer {
display:none;
}
If however you have access to $nav - delete that spacer li from the code. Simples.
Also, on a side note: having empty elements like that on the page as "spacers" is semantically bad. Spacing should be handled via CSS by adding margins or padding to other elements on the page, not with a spacer class; otherwise you may as well go back to using stray <br /> tags everywhere to create spaces.
$xml = new SimpleXMLElement($nav);
$spacers = $xml->xpath('li//li[@class="spacer"]');
foreach($spacers as $i => $n) {
unset($spacers[$i][0]);
}
echo $xml->asXML();
This is converting to XML (use a recent PHP 5.3 version and DOMDocument to export to HTML). Output:
<?xml version="1.0"?>
<ul>
<li class="someclass">sometext
<ul>
<li class="someclass">sometext</li>
<li class="someclass">sometext</li>
<li class="someclass">sometext</li>
<li class="someclass">sometext</li>
</ul>
</li>
<li class="spacer"/>
<li class="someclass">sometext</li>
<li class="spacer"/>
</ul>
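For HTML output rather than XML, roughly the same removal can be sketched with DOMDocument and DOMXPath. The $nav string below is a shortened stand-in for the real menu, and the XPath assumes the spacers to remove sit exactly one list level down:

```php
<?php
// Shortened stand-in for the real $nav markup.
$nav = '<ul>'
    . '<li class="someclass">sometext<ul>'
    . '<li class="someclass">sometext</li><li class="spacer"></li>'
    . '</ul></li>'
    . '<li class="spacer"></li>'
    . '</ul>';

$doc = new DOMDocument();
$doc->loadHTML($nav);
$xp = new DOMXPath($doc);

// Copy the matches into a plain array before modifying the tree.
$spacers = iterator_to_array($xp->query('//ul/li/ul/li[@class="spacer"]'));
foreach ($spacers as $spacer) {
    $spacer->parentNode->removeChild($spacer);
}

// Serialize only the outer <ul>, not the <html><body> wrapper that loadHTML() adds.
$outer = $doc->getElementsByTagName('ul')->item(0);
$cleaned = $doc->saveHTML($outer);
echo $cleaned;
```

The spacer that is a direct child of the outer list is left in place, since the path only matches li elements inside a nested ul.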
How about str_replace?
$nav = str_replace('<li class="spacer"></li>','',$nav);
edited code below
Based on the new requirement this code works. I know it's hacky and sloppy but it works:
$temp = explode("\n",$nav);
$nested_ul = 0; // 1 after a line containing <ul>, 0 after a line containing </ul>
$new_nav = '';
for ($i=0;$i<count($temp);$i++) {
if (strstr($temp[$i],"<ul>")) {
$nested_ul = 1;
}
if (strstr($temp[$i],"</ul>")) {
$nested_ul = 0;
}
if ($nested_ul==0) {
if (!strstr($temp[$i],"spacer")) {
$new_nav .= $temp[$i]."\n";
}
} else {
$new_nav .= $temp[$i]."\n";
}
}
echo $new_nav;
"Easily" is relative. It depends on a few things. If you want, modify where the $nav is getting generated from.
use preg_replace to replace the li tags:
$new_nav = preg_replace('/<li class="spacer"><\/li>/', '', $nav);
echo $new_nav;
There are multiple ways:
Do not create it. It will be easier if you do not create something you do not want. It will be easier to maintain. So if you have any control over what is generated into the $var string, just change it.
Simply replace it like this: str_replace('<li class="spacer"></li>', '', $var).
Use some HTML parser and remove the nodes.
Use JavaScript to remove <li class="spacer"></li> on client side.
Use substr_replace and strpos instead of str_replace, and specify an offset just after the first spacer.
http://www.php.net/manual/en/function.substr-replace.php
http://www.php.net/manual/en/function.strpos.php
Add the following CSS
ul ul li.spacer { display: none; }
Try this:
$nav = str_replace('<li class="spacer"></li>', '', $nav);

How to imitate child selector with Simple HTML DOM?

Fellas!
I have one nasty page to parse but can't figure out how to extract correct data blocks from it using Simple HTML DOM, because it has no CSS child selector support.
HTML:
<ul class="ul-block">
<li>xxx</li>
<li>xxx</li>
<li>
<ul>
<li>xxx2</li>
</ul>
</ul>
How would I extract (direct) child li elements of parent ul.ul-block?
The $node->find('ul[class=ul-block] > li'); doesn't work, and $node->find('ul[class=ul-block] li'); of course also finds the nested descendant li elements :(
I had the same issue, and used the children method to grab just the first level items.
<ul class="my-list">
<li>
Some Text
<ul>
<li>Some Inner Text</li>
<li>Some Inner Text</li>
<li>Some Inner Text</li>
<li>Some Inner Text</li>
</ul>
</li>
<li>
Some Text
<ul>
<li>Some Inner Text</li>
<li>Some Inner Text</li>
<li>Some Inner Text</li>
<li>Some Inner Text</li>
</ul>
</li>
</ul>
And here's the Simple HTML Dom code to get just the first level li items:
$html = file_get_html( $url );
$first_level_items = $html->find( '.my-list', 0)->children();
foreach ( $first_level_items as $item ) {
// ... do stuff ...
}
Simple example with php DOM:
$dom = new DomDocument;
$dom->loadHtml('
<ul class="ul-block">
<li>a</li>
<li>b</li>
<li>
<ul>
<li>c</li>
</ul>
</li>
</ul>
');
$xpath = new DomXpath($dom);
foreach ($xpath->query('//ul[@class="ul-block"]/li') as $liNode) {
echo $liNode->nodeValue, '<br />';
}
