Get parent of child element with xpath in Symfony2 Crawler - php

<ul id="menu">
<li><a href='#'>First Item</a></li>
<li><a href='#'>Second Item</a></li>
</ul>
I can access all links via xpath query.
$result = $crawler->filterXPath('//ul[@id="menu"]/li/a');
But I wonder whether it is possible to access the parent of a matched element with the filterXPath() method of the PHP DomCrawler, without editing the XPath query.
For example, I want to access //ul[@id="menu"] from the node element inside the each() method.
use Symfony\Component\DomCrawler\Crawler;
$result = $crawler->filterXPath('//ul[@id="menu"]/li/a');
if ($result->count() < 1) {
    exit('Query not found.');
}
$result->each(function (Crawler $node) {
    $parentOfNode = $node->parent(); // ??
    //...
});
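Crawler::each() passes the callback a Crawler that wraps a plain DOMElement, so the parent can be reached with ordinary DOM navigation. The sketch below uses plain PHP DOM (DOMDocument/DOMXPath), not the Crawler API, so it runs without Symfony; inside each() you could presumably get the underlying element with $node->getNode(0) and apply the same calls. Symfony's Crawler also has a parents() method (renamed ancestors() in later versions), so check which one your installed version provides.

```php
<?php
// Sketch with plain PHP DOM (not the Symfony Crawler API): from each matched
// <a>, walk back up to the enclosing <ul>.
$html = <<<HTML
<ul id="menu">
<li><a href='#'>First Item</a></li>
<li><a href='#'>Second Item</a></li>
</ul>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$parents = [];
foreach ($xpath->query('//ul[@id="menu"]/li/a') as $link) {
    // Relative query with $link as context node: its nearest <ul> ancestor
    $ul = $xpath->query('ancestor::ul[1]', $link)->item(0);
    // The same node reached by walking parentNode twice (a -> li -> ul)
    $walked = $link->parentNode->parentNode;
    $parents[] = [$link->textContent, $ul->getAttribute('id'), $ul->isSameNode($walked)];
}

foreach ($parents as [$text, $id, $same]) {
    echo "$text -> ul#$id", $same ? " (same node)" : "", "\n";
}
```

The ancestor::ul[1] form keeps working if the markup gains extra wrappers between the link and the list, while the parentNode walk depends on the exact nesting depth.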

Related

PHP simple html dom file_get_html how to extract/break after div

I am trying to fetch something with file_get_html. Everything seems fine, but I'm stuck on one part: I am using foreach to extract <li> elements, and the problem is that it extracts every li under that specific div or ul. Please read the following code.
I'm trying to extract from these lines:
<ul class="x-name y-name">
<li>example 1</li>
<li>example 2</li>
<li>example 3</li>
</ul>
using:
foreach ($html->find("div#master") as $e) {
    foreach ($e->find("otherobject]") as $es) {
        $val["myuse-" . strip_tags($es)] = "";
        foreach ($e->find("ul[class='x-name b-name'] li") as $elist) {
            $val["myuse-" . strip_tags($es)] .= "<li>" . strip_tags($elist->innertext) . "</li>";
        }
    }
}
The problem is that it doesn't stop after the three <li> elements; it keeps drilling down into every <li> with the same class name. I want it to stop once those <li> items are done, return to the outer foreach, and, when the next name is selected, search the next ul li.
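The inner find() is called on $e, the whole div#master, so every pass matches every ul/li in that div. Scoping the inner search to the current item avoids that. The markup around "otherobject" isn't shown, so the sketch below assumes (hypothetically) that each name is an <h3> followed by its own <ul>, and it uses PHP's DOMXPath instead of Simple HTML DOM, with a query relative to the current heading.

```php
<?php
// Hypothetical markup: each name is an <h3> immediately followed by its own <ul>.
$html = <<<HTML
<div id="master">
  <h3>first</h3>
  <ul class="x-name y-name"><li>example 1</li><li>example 2</li><li>example 3</li></ul>
  <h3>second</h3>
  <ul class="x-name y-name"><li>example 4</li></ul>
</div>
HTML;

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$val = [];
foreach ($xpath->query('//div[@id="master"]/h3') as $heading) {
    $key = 'myuse-' . trim($heading->textContent);
    $val[$key] = '';
    // Relative to $heading: only the first <ul> sibling after it,
    // and only that list's own <li> children
    foreach ($xpath->query('following-sibling::ul[1]/li', $heading) as $li) {
        $val[$key] .= '<li>' . trim($li->textContent) . '</li>';
    }
}
print_r($val);
```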

Find immediate descendants with PHP Simple DOM parser

I would like to be able to do the equivalent of
$html->find("#foo>ul")
But the PHP Simple DOM library doesn't recognize the "immediate descendant" selector > and so finds all <ul> items under #foo including those that are nested deeper in the dom.
What would you recommend as the best way to grab the immediate descendants that are of a specific type?
You can use DomElementFilter to fetch the desired type of nodes under some Dom branch. This is described here:
PHP DOM: How to get child elements by tag name in an elegant manner?
Or do a regular loop over all childNodes and filter them by tag name yourself:
foreach ($parent->childNodes as $node) {
    if ($node->nodeName == "tagname1") {
        // ... handle the matching child
    }
}
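A runnable version of that loop with PHP's DOM extension, on a hypothetical #foo that also contains a nested <ul>: checking each child's nodeName keeps only direct children, whereas getElementsByTagName() returns nested matches too.

```php
<?php
// Hypothetical markup: the second <ul> contains a nested <ul>.
$html = '<div id="foo"><ul><li>1</li></ul><ul><li>2<ul><li>nested</li></ul></li></ul></div>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$foo = (new DOMXPath($doc))->query('//div[@id="foo"]')->item(0);

// Direct element children only: the DOM equivalent of the CSS selector "#foo > ul"
$direct = [];
foreach ($foo->childNodes as $node) {
    if ($node instanceof DOMElement && $node->nodeName === 'ul') {
        $direct[] = $node;
    }
}

// getElementsByTagName() searches all descendants, so the nested <ul> is included
$all = $foo->getElementsByTagName('ul');

echo count($direct), " direct <ul> children, ", $all->length, " <ul> descendants\n";
```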
HTML snippet
<div id="foo">
<ul>
<li>1</li>
</ul>
<ul>
<li>2</li>
</ul>
<ul>
<li>3</li>
</ul>
</div>
PHP code to get the first <ul>:
echo $html->find('#foo>ul', 0);
This will output:
<ul>
<li>1</li>
</ul>
But if you want just the text 1 from the first <ul>:
echo $html->find('#foo>ul', 0)->plaintext;
To share the solutions I found in related posts, in a nutshell:
"Find immediate descendants with PHP Simple DOM parser" works both with...
...PHP Simple DOM:
//if there is only one div containing your searched tag
foreach ($html->find('div.with-given-class')[0]->children() as $div_with_given_class) {
    if ($div_with_given_class->tag == 'tag-you-are-searching-for') {
        $output[] = $div_with_given_class->plaintext; //or whatever you want
    }
}
//if there are more divs with a given class (better solution)
$all_divs_with_given_class =
    $html->find('div.with-given-class');
foreach ($all_divs_with_given_class as $single_div_with_given_class) {
    foreach ($single_div_with_given_class->children() as $children) {
        if ($children->tag == 'tag-you-are-searching-for') {
            $output[] = $children->plaintext; //or whatever you want
        }
    }
}
...and also PHP DOM/xpath:
$all_divs_with_given_class =
    $xpath->query("//div[@class='with-given-class']/tag-you-are-searching-for");
if ($all_divs_with_given_class !== false) { // query() returns false on an invalid expression
    foreach ($all_divs_with_given_class as $tag) {
        $output[] = $tag->nodeValue; //or whatever you want
    }
}
Note that you have to use a single slash "/" in the XPath to find immediate children only; "//" matches descendants at any depth.
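A small check of the difference between "/" and "//", using DOMXPath on hypothetical markup where one <p> is a direct child of the div and another sits inside a nested <div>:

```php
<?php
// Hypothetical markup: one direct <p> child and one nested deeper.
$html = '<div class="with-given-class"><p>direct</p><div><p>nested</p></div></div>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Single slash: child axis, immediate children only
$children = $xpath->query("//div[@class='with-given-class']/p");
// Double slash: descendants at any depth
$descendants = $xpath->query("//div[@class='with-given-class']//p");

foreach ($children as $p) {
    echo "child: ", $p->nodeValue, "\n";
}
foreach ($descendants as $p) {
    echo "descendant: ", $p->nodeValue, "\n";
}
```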

php xpath find node with descendant having specific attribute

I have a multi level <ul><li> menu structure. I am trying to get the top level LI that has a descendant LI with a class of "current-menu-line".
My HTML:
<ul id="menu1" class="nav">
<li class="menu-item dorment-menu-item">welcome</li>
<li class="menu-item dorment-menu-item">evidence</li>
<li class="menu-item dorment-menu-item">network</li>
<li class="menu-item dorment-menu-item">Multi page
<ul class="subnav">
<li class="menu-item current-menu-line">page 1</li>
</ul>
</li>
</ul>
I have tried a number of XPath queries to get the topmost <li> that has the descendant, without success:
$query = '//li[contains(@class, "current-menu-line")]'; // this one works ... makes sure LI.current-menu-line can be found
$query = '/ul/li[li[contains(@class, "current-menu-line")]]'; // returns 0 rows
$query = '/ul/li[//li[contains(@class, "current-menu-line")]]'; // returns 0 rows
$query = '/ul/li[descendant::li[contains(@class, "current-menu-line")]]'; // returns 0 rows
// added after examining "Select elements which has certain descendent using Xpath"
$query = '/ul/li[.//li[contains(@class, "current-menu-line")]]' ; // returns 0 nodes
$query = "/ul/li[descendant::li[@class, 'current-menu-line')]]"; // invalid query
Can someone tell me the correct query to get the right LI?
Cheers
This selection should fit your description:
/ul/li[.//li[contains(@class, 'current-menu-line')]]
I see that you already tried this and it returned no nodes for you, but I suspect there is a difference between the XML snippet you provided and the actual XML you use. I verified the XPath with an online XPath evaluator, and it returns the correct result even if the nested structure is more than two levels deep.
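One likely reason for the empty result, assuming the markup is loaded with DOMDocument::loadHTML(): the HTML parser wraps a fragment in <html><body>, so the document element is <html> and the absolute path /ul never matches. The sketch below runs both forms against a trimmed copy of the question's markup; anchoring the path on //ul[@id='menu1'] lets the same predicate match.

```php
<?php
// Trimmed copy of the question's menu markup.
$html = <<<HTML
<ul id="menu1" class="nav">
<li class="menu-item dorment-menu-item">welcome</li>
<li class="menu-item dorment-menu-item">Multi page
<ul class="subnav">
<li class="menu-item current-menu-line">page 1</li>
</ul>
</li>
</ul>
HTML;

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$predicate = "[.//li[contains(@class, 'current-menu-line')]]";
// Absolute path from the document root, which is <html> here, not <ul>
$absolute = $xpath->query("/ul/li" . $predicate);
// Anchored on the menu itself, wherever it sits in the tree
$anchored = $xpath->query("//ul[@id='menu1']/li" . $predicate);

echo "absolute: ", $absolute->length, "\n";
foreach ($anchored as $li) {
    echo "anchored: ", trim($li->firstChild->nodeValue), "\n"; // the <li>'s own leading text
}
```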

Wrap segments of HTML with divs (and generate table of contents from HTML-tags) with PHP

My original HTML looks something like this:
<h1>Page Title</h1>
<h2>Title of segment one</h2>
<img src="img.jpg" alt="An image of segment one" />
<p>Paragraph one of segment one</p>
<h2>Title of segment two</h2>
<p>Here is a list of blabla of segment two</p>
<ul>
<li>List item of segment two</li>
<li>Second list item of segment two</li>
</ul>
Now, using PHP (not jQuery), I want to alter it, like so:
<h1>Page Title</h1>
<div class="pane">
<h2>Title of segment one</h2>
<img src="img.jpg" alt="An image of segment one" />
<p>Paragraph one of segment one</p>
</div>
<div class="pane">
<h2>Title of segment two</h2>
<p>Here is a list of blabla of segment two</p>
<ul>
<li>List item of segment two</li>
<li>Second list item of segment two</li>
</ul>
</div>
So basically, I want to wrap each <h2> together with all the HTML up to the next <h2> in a <div class="pane">. The HTML above would already let me build an accordion with jQuery, which is fine, but I would like to go a bit further:
I want to create a ul listing all the <h2> titles that were affected, like so:
<ul class="tabs">
<li>Title of segment one</li>
<li>Title of segment two</li>
</ul>
Please note that I'm using jQuery Tools tabs to implement the JavaScript part of this system, and it does not require the hrefs of the .tabs to point to their specific h2 counterparts.
My first guess would be to use regular expressions, but I've also seen people recommending DOMDocument.
Two solutions exist for this problem in jQuery, but I really need a PHP equivalent:
https://stackoverflow.com/questions/7968303/wrapping-a-series-of-elements-between-two-h2-tags-with-jquery
Automatically generate nested table of contents based on heading tags
Could anyone please help me with a practical solution?
The DOMDocument can help you with that. I've answered a similar question before:
using regex to wrap images in tags
Update
Full code sample included:
$d = new DOMDocument;
libxml_use_internal_errors(true);
$d->loadHTML($html);
libxml_clear_errors();
$segments = array(); $pane = null;
foreach ($d->getElementsByTagName('h2') as $h2) {
    // first collect all nodes
    $pane_nodes = array($h2);
    // iterate until another h2 or no more siblings
    for ($next = $h2->nextSibling; $next && $next->nodeName != 'h2'; $next = $next->nextSibling) {
        $pane_nodes[] = $next;
    }
    // create the wrapper node
    $pane = $d->createElement('div');
    $pane->setAttribute('class', 'pane');
    // replace the h2 with the new pane
    $h2->parentNode->replaceChild($pane, $h2);
    // and move all nodes into the newly created pane
    foreach ($pane_nodes as $node) {
        $pane->appendChild($node);
    }
    // keep title of the original h2
    $segments[] = $h2->nodeValue;
}
// make sure we have segments (pane is the last inserted pane in the dom)
if ($segments && $pane) {
    $ul = $d->createElement('ul');
    foreach ($segments as $title) {
        $li = $d->createElement('li');
        $a = $d->createElement('a', $title);
        $a->setAttribute('href', '#');
        $li->appendChild($a);
        $ul->appendChild($li);
    }
    // add as sibling of last pane added
    $pane->parentNode->appendChild($ul);
}
echo $d->saveHTML();
Use PHP DOM functions to perform this task.
A good PHP HTML parser is what you need.
This one is good.
It's a PHP equivalent of jQuery.

Extracting node values using XPath

There is a section of amazon.com from which I want to extract the data (node value only, not the link) for each item.
The value I'm looking for is inside a <span class="narrowValue">:
<ul data-typeid="n" id="ref_1000">
<li style="margin-left: -18px">
<a href="/s/ref=sr_ex_n_0?rh=i%3Aaps%2Ck%3Ahow+to+grow+tomatoes&sort=salesrank&keywords=how+to+grow+tomatoes&ie=UTF8&qid=1327603358">
<span class="expand">Any Department</span>
</a>
</li>
<li style="margin-left: 8px">
<strong>Books</strong>
</li>
<li style="margin-left: 6px">
<a href="/s/ref=sr_nr_n_0?rh=k%3Ahow+to+grow+tomatoes%2Cn%3A283155%2Cp_n_feature_browse-bin%3A618073011%2Cn%3A%211000%2Cn%3A48&bbn=1000&sort=salesrank&keywords=how+to+grow+tomatoes&ie=UTF8&qid=1327603358&rnid=1000">
<span class="refinementLink">Crafts, Hobbies & Home</span><span class="narrowValue">(19)</span>
</a>
</li>
<li style="margin-left: 6px">
<a href="/s/ref=sr_nr_n_1?rh=k%3Ahow+to+grow+tomatoes%2Cn%3A283155%2Cp_n_feature_browse-bin%3A618073011%2Cn%3A%211000%2Cn%3A10&bbn=1000&sort=salesrank&keywords=how+to+grow+tomatoes&ie=UTF8&qid=1327603358&rnid=1000">
<span class="refinementLink">Health, Fitness & Dieting</span><span class="narrowValue">(3)</span>
</a>
</li>
<li style="margin-left: 6px">
<a href="/s/ref=sr_nr_n_2?rh=k%3Ahow+to+grow+tomatoes%2Cn%3A283155%2Cp_n_feature_browse-bin%3A618073011%2Cn%3A%211000%2Cn%3A6&bbn=1000&sort=salesrank&keywords=how+to+grow+tomatoes&ie=UTF8&qid=1327603358&rnid=1000">
<span class="refinementLink">Cookbooks, Food & Wine</span><span class="narrowValue">(2)</span>
</a>
</li>
</ul>
How could I do this with XPath?
The markup comes from an Amazon Kindle search results page.
Currently I am trying:
$rank = array();
$words = $xpath->query('//ul[@id="ref_1000"]/li/a/span[@class="refinementLink"]');
foreach ($words as $word) {
    $rank[] = trim($word->nodeValue);
}
var_dump($rank);
The following expression should work:
//*[@id='ref_1000']/li/a/span[@class='narrowValue']
For better performance you could provide a direct path to the start of this expression, but the one provided is more flexible (given that you probably need this to work across multiple pages).
Keep in mind, also, that your HTML parser might generate a different result tree than the one produced by Firebug (where I tested). Here's an even more flexible solution:
//*[@id='ref_1000']//span[@class='narrowValue']
Flexibility comes with potential performance (and accuracy) costs, but it's often the only choice when dealing with tag soup.
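A sketch that runs the more flexible expression with DOMXPath on a trimmed copy of the question's markup (hrefs shortened to "#") and turns each "(19)"-style value into an integer; the integer conversion is an illustrative extra, not part of the answer.

```php
<?php
// Trimmed copy of the question's markup (hrefs shortened to "#").
$html = <<<HTML
<ul data-typeid="n" id="ref_1000">
<li><strong>Books</strong></li>
<li><a href="#"><span class="refinementLink">Crafts, Hobbies &amp; Home</span><span class="narrowValue">(19)</span></a></li>
<li><a href="#"><span class="refinementLink">Health, Fitness &amp; Dieting</span><span class="narrowValue">(3)</span></a></li>
<li><a href="#"><span class="refinementLink">Cookbooks, Food &amp; Wine</span><span class="narrowValue">(2)</span></a></li>
</ul>
HTML;

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$counts = [];
foreach ($xpath->query("//*[@id='ref_1000']//span[@class='narrowValue']") as $span) {
    // nodeValue is the text only, e.g. "(19)"; strip whitespace and parentheses
    $counts[] = (int) trim($span->nodeValue, " \n()");
}
print_r($counts);
```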
If you need to grab the category names:
// Suppress invalid markup warnings
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->strictErrorChecking = false;
$doc->loadHTML($html); // $html - string fetched by cURL
// Create a SimpleXML object from the DOM
$xml = simplexml_import_dom($doc);
// Find the category nodes
$categories = $xml->xpath("//span[@class='refinementLink']");
EDIT: Using DOMDocument
libxml_use_internal_errors(true); // suppress invalid markup warnings
$doc = new DOMDocument();
$doc->strictErrorChecking = false;
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
// Select the parent node
$categories = $xpath->query("//span[@class='refinementLink']/..");
foreach ($categories as $category) {
    echo '<pre>';
    echo $category->childNodes->item(1)->firstChild->nodeValue, ' ';
    echo $category->childNodes->item(2)->firstChild->nodeValue;
    echo '</pre>';
    // Crafts, Hobbies & Home (19)
}
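The childNodes->item(1) and item(2) indexes depend on the whitespace text node that precedes the first <span>, so they can shift if the markup is formatted differently. A sketch that instead queries each link relative to itself, on a trimmed copy of the question's markup; the name-to-count array is an illustrative choice.

```php
<?php
// Trimmed copy of the question's markup (hrefs shortened to "#").
$html = <<<HTML
<ul id="ref_1000">
<li><a href="#">
<span class="refinementLink">Crafts, Hobbies &amp; Home</span><span class="narrowValue">(19)</span>
</a></li>
<li><a href="#">
<span class="refinementLink">Health, Fitness &amp; Dieting</span><span class="narrowValue">(3)</span>
</a></li>
</ul>
HTML;

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$categories = [];
// Each parent <a> is the context node for two relative queries
foreach ($xpath->query("//span[@class='refinementLink']/..") as $link) {
    $name = $xpath->query("span[@class='refinementLink']", $link)->item(0);
    $count = $xpath->query("span[@class='narrowValue']", $link)->item(0);
    if ($name && $count) {
        $categories[trim($name->nodeValue)] = (int) trim($count->nodeValue, "()");
    }
}
print_r($categories);
```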
I'd highly recommend you check out the phpQuery library. It's essentially the jQuery selector engine for PHP, so to get at the text you want you could do something like:
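phpQuery::newDocumentHTML($html);
foreach (pq('span.refinementLink') as $p) {
    // iteration yields plain DOM nodes, so wrap each one in pq() again
    print pq($p)->text() . "\n";
}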
That should output something like:
Crafts, Hobbies & Home
Health, Fitness & Dieting
Cookbooks, Food & Wine
It's by far the easiest screen-scraping and DOM-parsing tool I know of for PHP.
