This question looks similar to "XPath Get first element of subset", but I think it is a bit different.
Take the following blog:
http://www.mademoiselledeco.com/
I want to get the first picture of each post. For that, I thought of the following XPath query:
//div[contains(@class,'type-post status-publish')]//img/@src
Following the example of the previous post I mentioned, I also tried:
//div[contains(@class,'type-post status-publish')](//img/@src)[1]
but that says
Warning: DOMXPath::query(): Invalid expression
Any idea?
Thanks a lot
OK, I understand, after inspection of the source: each <img> is contained in a <p>, thus img[1] will match all pictures, since they are, in the context of a paragraph, the first image.
In this context, I would rather try getting the first paragraph containing an image:
//div[contains(@class,'type-post status-publish')]//p[img][1]/img/@src
With this XPath I get 9 img/@src.
//div[@class='post-content-container']//p[./img][1]/img
This is not the best solution, but I think it would work.
//div[@class='post-content-container']
should match each post, and
//p[./img][1]/img
should match the first paragraph that contains an image, then select that image.
Actually, the duplicate question you've picked isn't that far off. One of its answers has an explanation that sounds pretty legit:
The [] operator has a higher precedence (binds stronger) than the // abbreviation.
So the //img abbreviation stands in your way. Let's expand it:
/descendant-or-self::node()/child::img
Adding [1] at the end would select each first img child (which is exactly as others have outlined). This is also the reason why there is higher precedence for the predicate here.
The Abbreviated Syntax section in XPath 1.0 actually covers this with a note:
NOTE: The location path //para[1] does not mean the same as the location path /descendant::para[1]. The latter selects the first descendant para element; the former selects all descendant para elements that are the first para children of their parents.
That is: you're not looking for the descendant-or-self axis and the img children of every node on it, but just for the first img element on the descendant axis:
/descendant::img[1]
So the xpath expression in full:
//div[contains(@class,'type-post status-publish')]/descendant::img[1]/@src
Result with your example (10):
src="http://www.mademoiselledeco.com/wp-content/uploads/2015/03/Couleur-FionaLynch-Caroline-St.jpg"
src="http://www.mademoiselledeco.com/wp-content/uploads/2015/02/2-OF-MO-cascade-lumineuse2-1024x398.jpg"
src="https://s-media-cache-ak0.pinimg.com/736x/2e/f7/eb/2ef7eb28dc3e6ac9830cf0f1be7defce.jpg"
src="http://www.mademoiselledeco.com/wp-content/uploads/2015/01/couleur-peinture-flamant-vert-trekking.jpg"
src="http://www.mademoiselledeco.com/wp-content/uploads/2015/01/Lily-of-the-Valley-Designed-by-Marie-Deroudilhe-02.jpg"
src="http://www.mademoiselledeco.com/wp-content/uploads/2015/01/shopping-decoration-jaune-bleu-delamaison-1024x866.jpg"
src="http://www.mademoiselledeco.com/wp-content/uploads/2015/01/wikao-cheminee-berlin-mademoiselledeco4.jpg"
src="http://www.mademoiselledeco.com/wp-content/uploads/2015/01/voeux2015-mademoiselledeco-blog.jpg"
src="http://www.mademoiselledeco.com/wp-content/uploads/2014/12/suite-novotel-constance-guisset-1.jpg"
src="http://www.mademoiselledeco.com/wp-content/uploads/2014/12/wish-list-decoration-noel-2014.jpg"
I hope this sheds some light.
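The difference between the two forms can be checked on a small document. The following is a minimal sketch, assuming a hypothetical miniature of the blog markup (two posts, every image wrapped in its own paragraph); the file names and the srcs() helper are made up for illustration.

```php
<?php
// Hypothetical sample: two posts, each <img> sits inside its own <p>.
$html = <<<HTML
<div class="post type-post status-publish">
  <p><img src="a1.jpg"></p>
  <p><img src="a2.jpg"></p>
</div>
<div class="post type-post status-publish">
  <p>text only</p>
  <p><img src="b1.jpg"></p>
  <p><img src="b2.jpg"></p>
</div>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Illustrative helper: collect the values of the attribute nodes matched by $expr.
function srcs(DOMXPath $xpath, string $expr): array
{
    $out = [];
    foreach ($xpath->query($expr) as $attr) {
        $out[] = $attr->nodeValue;
    }
    return $out;
}

$post = "//div[contains(@class,'type-post status-publish')]";

// "//img[1]" means: every img that is the first img child of its parent.
$perParent = srcs($xpath, $post . "//img[1]/@src");

// "descendant::img[1]" means: the first img descendant of each post.
$firstPerPost = srcs($xpath, $post . "/descendant::img[1]/@src");

print_r($perParent);    // a1.jpg, a2.jpg, b1.jpg, b2.jpg
print_r($firstPerPost); // a1.jpg, b1.jpg
```

Because each paragraph holds exactly one image here, the abbreviated form returns every image, while the descendant axis returns one per post.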
In the following PHP code DOMDocument::getElementById returns the node <a name="test"> instead of the node <div id="test">:
<?php
$doc = new DOMDocument();
$doc->loadHTML('<a name="test"></a><div id="test"></div>'); // triggers duplicate ID warning
echo $doc->getElementById("test")->nodeName; // outputs "a"
?>
This happens only for <a> nodes. Is this intended?
JavaScript handles it as I expected:
<script>
window.addEventListener('DOMContentLoaded', function() {
document.body.innerHTML = '<a name="test"></a><div id="test"></div>';
console.log(document.getElementById('test'));
});
</script>
EDIT (question was marked as duplicate): This question is not about whether I should use name or id, and also not about using both name and id, but about why PHP finds nodes with a name attribute when I search for an id.
As of HTML5, the name attribute isn't supported in a tags so it looks like it's changed to an id attribute.
This is most likely a hold-over from PHP emulating old IE behaviour. In IE 7 and earlier, document.getElementById() did indeed treat name attributes on <a> elements as if they were id attributes and so would match the <a> element rather than the <div> element. IE has long since moved on, but PHP it seems, on this point, has not.
As you can read here
For HTML documents (and the text/html MIME type), the following processing model must be followed to determine what the indicated part of the document is.
Parse the URL, and let fragid be the component of the URL.
If fragid is the empty string, then the indicated part of the document is the top of the document.
If there is an element in the DOM that has an ID exactly equal to fragid, then the first such element in tree order is the indicated part of the document; stop the algorithm here.
If there is an a element in the DOM that has a name attribute whose value is exactly equal to fragid, then the first such element in tree order is the indicated part of the document; stop the algorithm here.
Otherwise, there is no indicated part of the document.
When I pasted your code to PHP sandbox I noticed an interesting warning:
I am trying to parse a html file/strings for two things using php and xpath.
<DIV STYLE="top:110px; left:1280px; width:88px" Class="S0">Aug30</DIV>
I tried to look for an unknown value (here: Aug30) with knowing the style top and left value (here: 110px and 1280px).
And the other way. I know the value Aug30 but want to get its values of top and left.
Perhaps XPATH is not the best way to do this. Any idea on how to solve my problem?
Thanks in advance for your help!
To filter <div> elements by style attribute value in XPath, you can do something like this:
//div[contains(@style, 'top:110px') and contains(@style, 'left:1280px')]
The XPath above searches for <div> nodes whose style attribute value contains both of the given strings.
The other requirement is awkward in XPath 1.0 as far as I can see. We can select the entire value of the style attribute, but a query that returns nodes cannot return just part of it. There are some string functions we can use, but their results are strings rather than nodes, so a node-set query cannot return them.
You'll need to do that using XPath 2.0 or using the host programming language (PHP in this case).
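Both directions can be sketched in PHP along these lines. This is an illustration under assumptions rather than a definitive solution: it takes the single <DIV> from the question as input, and the regular expression and the $props array are my own choices for splitting the style value in the host language.

```php
<?php
$html = '<DIV STYLE="top:110px; left:1280px; width:88px" Class="S0">Aug30</DIV>';

$doc = new DOMDocument();
// libxml's HTML parser lowercases tag and attribute names, so @style matches STYLE.
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Direction 1: known top/left values -> unknown text.
$byStyle = $xpath->query(
    "//div[contains(@style, 'top:110px') and contains(@style, 'left:1280px')]"
);
$text = $byStyle->item(0)->textContent;

// Direction 2: known text -> top/left values, parsed in PHP rather than in XPath.
$byText = $xpath->query("//div[normalize-space(.) = 'Aug30']");
$style = $byText->item(0)->getAttribute('style');

// Split "name:value; name:value" pairs into an associative array.
$props = [];
preg_match_all('/([a-z-]+)\s*:\s*([^;]+)/i', $style, $matches, PREG_SET_ORDER);
foreach ($matches as $pair) {
    $props[strtolower($pair[1])] = trim($pair[2]);
}

echo $text, "\n";                                 // Aug30
echo $props['top'], ' / ', $props['left'], "\n";  // 110px / 1280px
```

Note that contains() is a plain substring test, so 'top:110px' would also match a style such as 'top:110px0'; comparing the parsed $props values is stricter.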
I'm using Mink and the Selenium2 Driver with Behat to run some acceptance tests and for the most part, everything is going well.
However, I'm trying to target an element based on a data-* attribute with XPath, and the test chokes on it.
I've used XPathHelper and FirePath and my XPath checks out in both of those extensions:
//html//@data-search-id='images'
That appears to target the correct element.
However, when I add the following to FeatureContext.php:
$el = $this->getSession()->getPage()->find('xpath', $this->getSession()->getSelectorsHandler()->selectorToXpath('xpath', "//@data-search-id='images'"));
$el->click();
I get the following error from Behat:
invalid selector: Unable to locate an element with the xpath expression
//html//@data-search-id='images' because of the following error:
TypeError: Failed to execute 'evaluate' on 'Document':
The result is not a node set, and therefore cannot be converted
to the desired type.
Your XPath expression is a totally valid expression: it will find all @data-search-id attributes and return true if one of them is 'images', otherwise false.
But you want to click an item, and obviously clicking a boolean value is rather difficult. Query for the item fulfilling the condition instead (that is, move the comparison into a predicate):
//html//*[@data-search-id='images']
Additionally, I'd remove the //html. The HTML node must be the root node anyway, so /html would have been fine (there is no reason to search for it in every subtree). As you're searching for an arbitrary descendant of it, and this will not be the root node (as <html/> is), omitting it completely does not change the meaning of the XPath expression.
//*[@data-search-id='images']
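The boolean-versus-node distinction can be seen outside Behat as well. The sketch below uses PHP's DOMXPath instead of Mink and Selenium, and the two-link markup is a hypothetical stand-in for the real page.

```php
<?php
$doc = new DOMDocument();
// Hypothetical markup: two links carrying a data-search-id attribute.
$doc->loadHTML(
    '<div><a data-search-id="images">Images</a>'
    . '<a data-search-id="videos">Videos</a></div>'
);
$xpath = new DOMXPath($doc);

// A bare comparison is a boolean expression: evaluate() returns true or false.
$asBoolean = $xpath->evaluate("//@data-search-id='images'");

// With the comparison inside a predicate, the expression selects elements.
$asNodes = $xpath->query("//*[@data-search-id='images']");

var_dump($asBoolean);                    // bool(true)
echo $asNodes->length, "\n";             // 1
echo $asNodes->item(0)->nodeName, "\n";  // a
```

Only the second result gives a driver something it can click.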
I think the XPath you're looking for is:
//html//*[@data-search-id='images']
I had a question about something similar at:
getting a parent node and its child, where the child has a certain text
But the situation changed, and I found some problems with my idea above.
I am now trying to find a node with a specific text, because when I search for 29 it also finds nodes that contain 2999 or anything else that has 29 in it.
So my question is, how can I change:
$myvar = $xpath->query('(//*[text()[contains(., "something")]])');
so that it looks for a node whose text is exactly the given value, not a node that merely contains it?
Just use //*[text() = "29"] instead of contains().
Use:
//*[. = 29]
This selects any element whose string value, when converted to a number, is equal to 29.
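The three expressions behave differently on values that merely contain 29 or that carry surrounding whitespace. Here is a small sketch, assuming a made-up <items> document and an illustrative texts() helper:

```php
<?php
$doc = new DOMDocument();
// Hypothetical data: an exact value, two values that contain "29", one padded value.
$doc->loadXML('<items><item>29</item><item>2999</item><item>129</item><item> 29 </item></items>');
$xpath = new DOMXPath($doc);

// Illustrative helper: return the text of every node matched by $expr.
function texts(DOMXPath $xpath, string $expr): array
{
    $out = [];
    foreach ($xpath->query($expr) as $node) {
        $out[] = $node->textContent;
    }
    return $out;
}

// Substring test: matches all four items.
$contains = texts($xpath, '//*[text()[contains(., "29")]]');

// Exact string comparison against a text node: only "29".
$exact = texts($xpath, '//*[text() = "29"]');

// Numeric comparison: the string value is converted to a number first,
// so surrounding whitespace is ignored and " 29 " matches as well.
$numeric = texts($xpath, '//*[. = 29]');

print_r($contains); // "29", "2999", "129", " 29 "
print_r($exact);    // "29"
print_r($numeric);  // "29", " 29 "
```

Which of the two exact forms fits depends on whether padded values should count as a match.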
I am trying to access a specific element of the DOM using XPath.
Here is an example
<table>
<tbody>
<tr>
<td>
<b>1</b> data<br>
<b>2</b> data<br>
<b>3</b> data<br>
</td>
</tr>
</tbody>
</table>
I want to target "table td", so my XPath query is something like
$finder->query('//table/td');
only this doesn't return the td, as it's not a direct child of the table; direct access would be done using
$finder->query('//tr/td');
Is there a better way to write the query which would allow me to use something like the first example, ignoring the elements in between, and return the td?
Is there a better way to write the query which would allow me to use
something like the first example ignoring the elements in-between and
return the TD?
You can write:
//table//td
However, is this really "better"?
In many cases the evaluation of the XPath pseudo-operator // can result in significant inefficiency as it causes the whole subtree rooted in the context-node to be traversed.
Whenever the path to the wanted nodes is statically known, it may be more efficient to replace any // with the specific, known path, thus avoiding the complete subtree traversal.
For the provided XML document, such expression is:
/*/*/tr/td
If there is more than one table element, each a child of the top element, and we want to select only the tds of the first table, a good, specific expression is:
/*/table[1]/*/tr/td
If we want to select only the first td of the first table in the same document, a good way to do this would be:
(/*/table[1]/*/tr//td)[1]
Or if we want to select the first td in the XML document (not knowing its structure in advance), then we could specify this:
(//td)[1]
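The expressions above can be compared side by side. This is a minimal sketch, assuming a hypothetical well-formed document with a <root> element holding two tables; the cell contents are invented so the results are easy to tell apart.

```php
<?php
// Hypothetical document: two tables under one top element, each with a tbody.
$xml = '<root>'
    . '<table><tbody><tr><td>t1c1</td><td>t1c2</td></tr></tbody></table>'
    . '<table><tbody><tr><td>t2c1</td></tr></tbody></table>'
    . '</root>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$finder = new DOMXPath($doc);

// td is not a direct child of table, so the child step finds nothing.
$direct = $finder->query('//table/td')->length;

// The descendant shorthand skips tbody and tr, at the cost of a subtree walk.
$anyDepth = $finder->query('//table//td')->length;

// Fully specified path: only the tds of the first table.
$firstTable = $finder->query('/*/table[1]/*/tr/td')->length;

// Parentheses apply [1] to the whole node-set: the first td in the document.
$firstTd = $finder->query('(//td)[1]')->item(0)->textContent;

echo "$direct $anyDepth $firstTable $firstTd\n"; // 0 3 2 t1c1
```

On a document this small the cost difference between // and an explicit path is negligible; it only starts to matter on large trees.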
What you are looking for is:
$finder->query('//table//td');
Oh boy oh boy, there's something not seen often.
As for your first XPath query, you can get what you want by putting a double slash (//) before each tag name.
But I don't see why you don't just get the tds by tag name.
You can also write it this way:
$finder->query('//td');