PHP XPath: find node with descendant having specific attribute

I have a multi-level <ul><li> menu structure. I am trying to get the top-level LI that has a descendant LI with a class of "current-menu-line".
My HTML:
<ul id="menu1" class="nav">
<li class="menu-item dorment-menu-item">welcome</li>
<li class="menu-item dorment-menu-item">evidence</li>
<li class="menu-item dorment-menu-item">network</li>
<li class="menu-item dorment-menu-item">Multi page
<ul class="subnav">
<li class="menu-item current-menu-line">page 1</li>
</ul>
</li>
</ul>
I have tried a number of XPath queries to get the topmost <li> that has the descendant, with no success:
$query = '//li[contains(@class, "current-menu-line")]'; // this one works ... makes sure LI.current-menu-line can be found
$query = '/ul/li[li[contains(@class, "current-menu-line")]]'; // returns 0 rows
$query = '/ul/li[//li[contains(@class, "current-menu-line")]]'; // returns 0 rows
$query = '/ul/li[descendant::li[contains(@class, "current-menu-line")]]'; // returns 0 rows
// added after examining "Select elements which has certain descendent using Xpath"
$query = '/ul/li[.//li[contains(@class, "current-menu-line")]]' ; // returns 0 nodes
$query = "/ul/li[descendant::li[@class, 'current-menu-line')]]"; // invalid query
Is someone able to tell me the correct query to get the correct LI?
Cheers

This selection should fit your description:
/ul/li[.//li[contains(@class, 'current-menu-line')]]
I see that you already tried this and it did not return any nodes for you, but I suspect there is a difference between the XML snippet you provided and the actual XML you use. I verified the XPath with an online XPath evaluator, and it returns the correct result even if the nested structure is more than two levels below.
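One possible source of such a difference, offered here as an assumption since the question does not show how the markup was loaded: if the HTML is parsed with DOMDocument::loadHTML(), the fragment ends up wrapped in <html><body>, so an absolute path beginning with /ul has nothing to match at the document root. The sketch below uses a shortened copy of the menu and compares the absolute path with one anchored by //.

```php
<?php
// Assumption: the menu is parsed with DOMDocument::loadHTML(), which wraps
// the fragment in <html><body>.
$html = <<<'HTML'
<ul id="menu1" class="nav">
<li class="menu-item dorment-menu-item">welcome</li>
<li class="menu-item dorment-menu-item">Multi page
<ul class="subnav">
<li class="menu-item current-menu-line">page 1</li>
</ul>
</li>
</ul>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Absolute path: the root element is <html>, not <ul>.
$absolute = $xpath->query('/ul/li[.//li[contains(@class, "current-menu-line")]]');

// Path anchored anywhere in the document, restricted to the top-level menu.
$anywhere = $xpath->query('//ul[@id="menu1"]/li[.//li[contains(@class, "current-menu-line")]]');

echo $absolute->length, PHP_EOL; // 0
echo $anywhere->length, PHP_EOL; // 1
```

If the real document is loaded as XML with <ul> as its root element, the absolute path from the answer should match as written.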

Related

How can I count the amount of lines of an HTML code with PHP?

I have some HTML generated by a WYSIWYG-editor (WordPress).
I'd like to show a preview of this HTML, by only showing up to 3 lines of text (in HTML format).
Example HTML (always formatted with new lines):
<p>Hello, this is some generated HTML.</p>
<ol>
<li>Some list item<li>
<li>Some list item</li>
<li>Some list item</li>
</ol>
I'd like to preview a maximum of 4 lines of text in this formatted HTML.
Example preview to display (numbers represent line numbers, not actual output):
Hello, this is some generated HTML.
Some list item
Some list item
Would this be possible with Regex, or is there any other method that I could use?
I know this would be possible with JavaScript in a 'hacky' way, as questioned and answered on this post.
But I'd like to do this purely on the server-side (with PHP), possibly with SimpleXML?
It's really easy with XPath:
$string = '<p>Hello, this is some generated HTML.</p>
<ol>
<li>Some list item</li>
<li>Some list item</li>
<li>Some list item</li>
</ol>';
// Convert to SimpleXML object
// A root element is required so we can just blindly add this
// or else SimpleXMLElement will complain
$xml = new SimpleXMLElement('<root>'.$string.'</root>');
// Get all the text() nodes
// I believe there is a way to select non-empty nodes here but we'll leave that logic for PHP
$result = $xml->xpath('//text()');
// Loop the nodes and display 4 non-empty text nodes
$i = 0;
foreach( $result as $key => $node )
{
    if(trim($node) !== '')
    {
        echo ++$i.'. '.htmlentities(trim($node)).'<br />'.PHP_EOL;
        if($i === 4)
        {
            break;
        }
    }
}
Output:
1. Hello, this is some generated HTML.<br />
2. Some list item<br />
3. Some list item<br />
4. Some list item<br />
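The comment in the code above suspects that the empty nodes can be filtered out in XPath itself. A minimal sketch of that idea, using the normalize-space() predicate and assuming PHP 7.4 or later for the arrow function; the variable names $nodes and $lines are illustrative only:

```php
<?php
$string = '<p>Hello, this is some generated HTML.</p>
<ol>
<li>Some list item</li>
<li>Some list item</li>
<li>Some list item</li>
</ol>';

$xml = new SimpleXMLElement('<root>' . $string . '</root>');

// normalize-space() yields an empty string for whitespace-only text nodes,
// so the predicate keeps only text nodes with visible content.
$nodes = $xml->xpath('//text()[normalize-space()]');

// Keep at most four entries and reduce each one to its trimmed text.
$lines = array_map(
    static fn ($node) => trim((string) $node),
    array_slice($nodes, 0, 4)
);

foreach ($lines as $i => $line) {
    echo ($i + 1) . '. ' . htmlentities($line) . '<br />' . PHP_EOL;
}
```

With the filtering done in the query, the loop no longer needs its own emptiness check or counter.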
I have personally coded the following function, which isn't perfect, but works fine for me.
function returnHtmlLines($html, $amountOfLines = 4) {
    $lines_arr = array_values(array_filter(preg_split('/\n|\r/', $html)));
    $linesToReturn = array_slice($lines_arr, 0, $amountOfLines);
    return preg_replace('/\s{2,}/m', '', implode('', $linesToReturn));
}
Which returns the following HTML when using echo:
<p>Hello, this is some generated HTML.</p><ol><li>Some list item<li><li>Some list item</li>
Or formatted:
<p>Hello, this is some generated HTML.</p>
<ol>
<li>Some list item<li>
<li>Some list item</li>
Browsers will automatically close the <ol> tag, so it works fine for my needs.
Here is a Sandbox example

PHP simple html dom file_get_html how to extract/break after div

I am trying to fetch something through file_get_html. Everything seems fine, but I am stuck on one part: I am using a foreach loop to extract the <li> elements, and the problem is that it extracts all the li elements under that specific div or ul. Please read the following code.
I am trying to extract from these lines:
<ul class="x-name y-name">
<li>example 1</li>
<li>example 2</li>
<li>example 3</li>
</ul>
using
foreach($html->find("div#master") as $e){
    foreach ($e->find("otherobject]") as $es){
        $val["myuse-".strip_tags($es)] = "";
        foreach($e->find("ul[class='x-name b-name'] li") as $elist){
            $val["myuse-".strip_tags($es)] .= "<li>".strip_tags($elist->innertext)."</li>";
        }
    }
}
The problem is that it does not return after the 3 <li> elements are done; it keeps drilling down into every <li> with the same name. What I want is for it to stop once those <li> elements are complete and return to the previous foreach, so that when the next name is selected it searches for the next ul li.

Get last li content Xpath php

I have the following HTML:
<ul class="pages">
<li class="page-1 active"><a data-page="1" href="/sold?page=1">1</a></li>
<li class="page-2"><a data-page="2" href="/sold?page=2">2</a></li>
<li class="page-3"><a data-page="3" href="/sold?page=3">3</a></li>
<li>...</li>
<li class="page-975"><a data-page="975" href="/sold?page=975">975</a></li>
</ul>
I am trying to get the last li's text, which contains the number of the last page (975 in my example), with the help of XPath.
I've tried something like:
$page_count = $xpath->query(".//ul[@class='pages']/li/a[last()]/text()")->item(0)->textContent;
but it doesn't work.
What would be the correct query to get the last li's text?
Try this one:
$page_count = $xpath->query(".//ul[@class='pages']/li[last()]/a/text()")->item(0)->nodeValue;
First, you need to select the last li and then the text of its a element.
Your version searches all the li elements, takes the last a element within each of them (they only have one), and then its text node. So you basically got a list of all the link texts.
Second, try nodeValue instead of textContent.
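To make the difference between the two placements of last() concrete, here is a small sketch on a shortened copy of the pagination markup. It assumes the HTML is parsed with DOMDocument::loadHTML(); the variable names $perItem and $lastItem are illustrative only.

```php
<?php
$html = <<<'HTML'
<ul class="pages">
<li class="page-1 active"><a data-page="1" href="/sold?page=1">1</a></li>
<li class="page-2"><a data-page="2" href="/sold?page=2">2</a></li>
<li>...</li>
<li class="page-975"><a data-page="975" href="/sold?page=975">975</a></li>
</ul>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// a[last()] is evaluated inside each <li>, so every link qualifies.
$perItem = $xpath->query("//ul[@class='pages']/li/a[last()]");

// li[last()] selects the final <li> first, then its link.
$lastItem = $xpath->query("//ul[@class='pages']/li[last()]/a");

echo $perItem->length, PHP_EOL;                // 3
echo $perItem->item(0)->textContent, PHP_EOL;  // 1
echo $lastItem->item(0)->textContent, PHP_EOL; // 975
```

This is why item(0) on the original query returned the first page number rather than the last one.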

Wrap segments of HTML with divs (and generate table of contents from HTML-tags) with PHP

My original HTML looks something like this:
<h1>Page Title</h1>
<h2>Title of segment one</h2>
<img src="img.jpg" alt="An image of segment one" />
<p>Paragraph one of segment one</p>
<h2>Title of segment two</h2>
<p>Here is a list of blabla of segment two</p>
<ul>
<li>List item of segment two</li>
<li>Second list item of segment two</li>
</ul>
Now, using PHP (not jQuery), I want to alter it, like so:
<h1>Page Title</h1>
<div class="pane">
<h2>Title of segment one</h2>
<img src="img.jpg" alt="An image of segment one" />
<p>Paragraph one of segment one</p>
</div>
<div class="pane">
<h2>Title of segment two</h2>
<p>Here is a list of blabla of segment two</p>
<ul>
<li>List item of segment two</li>
<li>Second list item of segment two</li>
</ul>
</div>
So basically, I wish to wrap all HTML between sets of <h2></h2> tags with <div class="pane" />. The HTML above would already allow me to create an accordion with jQuery, which is fine, but I would like to go a little bit further:
I wish to create a ul of all the <h2></h2> sets that were affected, like so:
<ul class="tabs">
<li>Title of segment one</li>
<li>Title of segment two</li>
</ul>
Please note that I'm using jQuery tools tabs, to implement the JavaScript part of this system, and it does not require that the hrefs of the .tabs point to their specific h2 counterparts.
My first guess would be to use regular expressions, but I've also seen some people talking about DOMDocument.
Two solutions exist for this problem in jQuery, but I really need a PHP equivalent:
https://stackoverflow.com/questions/7968303/wrapping-a-series-of-elements-between-two-h2-tags-with-jquery
Automatically generate nested table of contents based on heading tags
Could anyone please help me with a practical solution?
The DOMDocument can help you with that. I've answered a similar question before:
using regex to wrap images in tags
Update
Full code sample included:
$d = new DOMDocument;
libxml_use_internal_errors(true);
$d->loadHTML($html);
libxml_clear_errors();
$segments = array(); $pane = null;
foreach ($d->getElementsByTagName('h2') as $h2) {
    // first collect all nodes
    $pane_nodes = array($h2);
    // iterate until another h2 or no more siblings
    for ($next = $h2->nextSibling; $next && $next->nodeName != 'h2'; $next = $next->nextSibling) {
        $pane_nodes[] = $next;
    }
    // create the wrapper node
    $pane = $d->createElement('div');
    $pane->setAttribute('class', 'pane');
    // replace the h2 with the new pane
    $h2->parentNode->replaceChild($pane, $h2);
    // and move all nodes into the newly created pane
    foreach ($pane_nodes as $node) {
        $pane->appendChild($node);
    }
    // keep title of the original h2
    $segments[] = $h2->nodeValue;
}
// make sure we have segments (pane is the last inserted pane in the dom)
if ($segments && $pane) {
    $ul = $d->createElement('ul');
    foreach ($segments as $title) {
        $li = $d->createElement('li');
        $a = $d->createElement('a', $title);
        $a->setAttribute('href', '#');
        $li->appendChild($a);
        $ul->appendChild($li);
    }
    // add as sibling of last pane added
    $pane->parentNode->appendChild($ul);
}
echo $d->saveHTML();
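A side note on the final echo: loadHTML() wraps a fragment in a doctype plus <html> and <body>, and saveHTML() without arguments prints that wrapper too. If only the fragment is wanted, one option is to serialize the children of <body> individually. This is a sketch with a shortened input, not part of the answer's code; the variable $fragment is illustrative only.

```php
<?php
$html = '<h2>Title of segment one</h2><p>Paragraph one of segment one</p>';

$d = new DOMDocument;
$d->loadHTML($html);

// Serialize each child of <body> on its own, skipping the doctype and the
// <html>/<body> wrapper that loadHTML() added.
$body = $d->getElementsByTagName('body')->item(0);
$fragment = '';
foreach ($body->childNodes as $child) {
    $fragment .= $d->saveHTML($child);
}

echo $fragment;
```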
Use PHP DOM functions to perform this task.
A good PHP HTML parser is what you need.
This one is good.
It's a PHP equivalent to jQuery.

Regexp to insert string in the beginning of anchor tag?

I need to insert a string directly after the open anchor ends (where the anchor content starts).
Here is my code:
<ul id="menu-topmenu2" class="menu">
<li id="menu-item-5" class="menu-item menu-item-type-post_type menu-item-5">
<a href="http://localhost/domain/barnlager.se/?page_id=2">
About
</a>
</li>
<li id="menu-item-5" class="menu-item menu-item-type-post_type menu-item-5">
<a href="http://localhost/domain/barnlager.se/?page_id=2">
Services
</a>
</li>
</ul>
In this example I need content before "About" and "Services". A short regexp should do it? The HTML code above can be a string called $content.
I use PHP. Thanks!
I'd use a parser, DOM for instance:
$content = '...your html string...';
$doc = new DOMDocument();
$doc->loadHTML('<html><body>'.$content.'</body></html>');
$x = new DOMXPath($doc);
foreach($x->query('//a') as $anchor){
    // strrev(trim($anchor->nodeValue)) is just an example. Put anything you like.
    $anchor->insertBefore(new DOMText(strrev(trim($anchor->nodeValue))),$anchor->firstChild);
}
echo $doc->saveXML($doc->getElementsByTagName('ul')->item(0));
And as an added bonus, it throws a warning that you have defined id="menu-item-5" twice in your HTML, which is not valid.
You can find every anchor tag with /<a.*?>/i. If you want to replace something after that, the call would look like preg_replace("/(<a.*?>)/", '$1YOUR ADDITIONAL TEXT', $content).
If for whatever reason you need a double-quoted string as the replacement argument, make sure to backslash-escape the $1.
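A minimal sketch of the preg_replace approach, with two adjustments that are my own assumptions rather than part of the answer above: the pattern uses \b and [^>]* so that tags such as <abbr> are not matched, and the replacement uses ${1} so that inserted text starting with a digit is not read as part of the group number. The string 'PREFIX ' is a hypothetical placeholder for whatever should be inserted.

```php
<?php
$content = '<li><a href="?page_id=2">
About
</a></li>';

// Single-quoted replacement: no backslash escaping of the group reference
// is needed. ${1} refers to the captured opening <a ...> tag.
$result = preg_replace('/(<a\b[^>]*>)/i', '${1}PREFIX ', $content);

echo $result;
```

The closing </a> is left alone because the pattern requires the tag name to follow the opening angle bracket directly.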
