Simple HTML DOM Parser find closest Element

I want to find the "nearest element" for a specific string using Simple HTML DOM (simplehtmldom.sourceforge.net).
For example, if I search for the word "Imprint", I want to get the nearest element enclosing it. To do that, I wrote a recursive function:
function getelementofcontent($html,$search){
for($i=0;strpos($html->childNodes($i)->plaintext,$search)==-1;$i++);
getelementofcontent($html->childNodes($i),$search);
}
The idea is that the function keeps calling itself as long as the child still contains the string... what did I do wrong?
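
One likely problem: strpos() returns false (never -1) when the string is missing, so the loop condition never holds and the function always descends into the first child; it also never returns anything. Below is a minimal sketch of the intended recursion. It uses PHP's built-in DOM extension rather than Simple HTML DOM so it runs without the library, and the function name findInnermostElement and the sample markup are made up for illustration.

```php
<?php
// Return the innermost element whose text contains $search, or null if none does.
function findInnermostElement(DOMNode $node, string $search): ?DOMNode
{
    // strpos() yields false when the needle is absent, so compare with === false.
    if (strpos($node->textContent, $search) === false) {
        return null;
    }
    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMElement) {
            $found = findInnermostElement($child, $search);
            if ($found !== null) {
                return $found; // a deeper element still contains the text
            }
        }
    }
    return $node; // no child element contains it, so this one is the closest
}

// Hypothetical sample page.
$doc = new DOMDocument();
$doc->loadHTML('<html><body><div><p>Home</p><ul><li><a href="/imprint">Imprint</a></li></ul></div></body></html>');

$hit = findInnermostElement($doc->documentElement, 'Imprint');
$miss = findInnermostElement($doc->documentElement, 'Contact');
echo $hit->nodeName, "\n";
```

The same shape should carry over to Simple HTML DOM by testing plaintext and iterating over the node's children, but that variant is untested here.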

Related

First Element of XPATH

This question looks similar to "XPath Get first element of subset", but I think it is a bit different.
Take the following blog:
http://www.mademoiselledeco.com/
I want to get the first picture of each post. For that, I thought of the following XPath query:
//div[contains(@class,'type-post status-publish')]//img/@src
Following the example of the previous post I mentioned, I also tried:
//div[contains(@class,'type-post status-publish')](//img/@src)[1]
but that says
Warning: DOMXPath::query(): Invalid expression
Any idea?
Thanks a lot

OK, I understand, after inspection of the source: each <img> is contained in a <p>, thus img[1] will match all pictures, since they are, in the context of a paragraph, the first image.
In this context, I would rather try getting the first paragraph containing an image:
//div[contains(@class,'type-post status-publish')]//p[img][1]/img/@src
With this XPath I get 9 img/@src.

//div[@class='post-content-container']//p[./img][1]/img
This is not the best solution, but I think it would work.
//div[@class='post-content-container']
should get each post, and
//p[./img][1]/img
should get the first paragraph that contains an image and then select that image.

Actually, the duplicate question you've picked isn't that far off. One of its answers has an explanation that sounds pretty legit:
The [] operator has a higher precedence (binds stronger) than the // abbreviation.
So the //img abbreviation stands in your way. Let's expand it:
/descendant-or-self::node()/child::img
Adding [1] at the end selects every img that is the first img child of its parent (exactly as others have outlined), because the predicate binds to the child::img step rather than to the whole path.
The Abbreviated Syntax section of XPath 1.0 actually covers this with a note:
NOTE: The location path //para[1] does not mean the same as the location path /descendant::para[1]. The latter selects the first descendant para element; the former selects all descendant para elements that are the first para children of their parents.
That is: you're not looking for the descendant-or-self axis and the children of every node on it, but just for the first img element on the descendant axis:
/descendant::img[1]
So the XPath expression in full:
//div[contains(@class,'type-post status-publish')]/descendant::img[1]/@src
Result with your example (10 matches):
src="http://www.mademoiselledeco.com/wp-content/uploads/2015/03/Couleur-FionaLynch-Caroline-St.jpg"
src="http://www.mademoiselledeco.com/wp-content/uploads/2015/02/2-OF-MO-cascade-lumineuse2-1024x398.jpg"
src="https://s-media-cache-ak0.pinimg.com/736x/2e/f7/eb/2ef7eb28dc3e6ac9830cf0f1be7defce.jpg"
src="http://www.mademoiselledeco.com/wp-content/uploads/2015/01/couleur-peinture-flamant-vert-trekking.jpg"
src="http://www.mademoiselledeco.com/wp-content/uploads/2015/01/Lily-of-the-Valley-Designed-by-Marie-Deroudilhe-02.jpg"
src="http://www.mademoiselledeco.com/wp-content/uploads/2015/01/shopping-decoration-jaune-bleu-delamaison-1024x866.jpg"
src="http://www.mademoiselledeco.com/wp-content/uploads/2015/01/wikao-cheminee-berlin-mademoiselledeco4.jpg"
src="http://www.mademoiselledeco.com/wp-content/uploads/2015/01/voeux2015-mademoiselledeco-blog.jpg"
src="http://www.mademoiselledeco.com/wp-content/uploads/2014/12/suite-novotel-constance-guisset-1.jpg"
src="http://www.mademoiselledeco.com/wp-content/uploads/2014/12/wish-list-decoration-noel-2014.jpg"
I hope this sheds some light.
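
The difference between //img[1] and descendant::img[1] can be reproduced on a small document. This is only a sketch: the markup and file names below are invented stand-ins for the blog's posts, and srcList is a hypothetical helper.

```php
<?php
// Two made-up "posts"; the first has one image per paragraph, the second has two images in one paragraph.
$html = '<html><body>'
      . '<div class="post type-post status-publish"><p><img src="a1.jpg"></p><p><img src="a2.jpg"></p></div>'
      . '<div class="post type-post status-publish"><p>text</p><p><img src="b1.jpg"><img src="b2.jpg"></p></div>'
      . '</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Hypothetical helper: collect the values of all attribute nodes matched by $expr.
function srcList(DOMXPath $xpath, string $expr): array
{
    $values = [];
    foreach ($xpath->query($expr) as $attr) {
        $values[] = $attr->nodeValue;
    }
    return $values;
}

$post = "//div[contains(@class,'type-post status-publish')]";

// Every img that is the first img child of its own parent.
$firstPerParent = srcList($xpath, $post . "//img[1]/@src");

// Only the first img among all descendants of each post.
$firstPerPost = srcList($xpath, $post . "/descendant::img[1]/@src");

print_r($firstPerParent);
print_r($firstPerPost);
```

On this sample the abbreviated form also returns a2.jpg, because that image is the first img inside its own paragraph, while the descendant axis yields exactly one image per post.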

Using simplehtmldom in PHP how do I get the data-href attribute?

I am using this PHP library to work with a DOM: http://simplehtmldom.sourceforge.net/
I want to access the data-href attribute of a li element on this page: http://www.spareroom.co.uk/flatshare/bristol/
According to the API reference:
http://simplehtmldom.sourceforge.net/manual_api.htm
this code should work, as long as $res represents the li DOM node (which in my case it does):
echo $res->data-href;
However, when I run that, the echo prints "0", when I would expect to see something like:
"/flatshare/fad_click.pl?fad_id=3248085&search_id=&offset=0&city_id=&flatshare_type=offered&search_results=%2Fflatshare%2Fbristol%2F&"
Can somebody please help me understand what I am doing wrong?

$res->data-href is parsed as
$res->data - href
i.e. it's a subtraction, because - is not a valid character in an identifier. Try:
$res->{"data-href"}

Since you are accessing an object, property names with special characters have to be quoted and wrapped in {}.
So:
echo $res->data-href
should be:
echo $res->{"data-href"}
In all honesty, though, it is probably just easier (and safer) to use the method:
$res->getAttribute("data-href");
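
Both access styles can be tried without the library. In the sketch below, the stdClass object is only a stand-in for a Simple HTML DOM node, and the second half uses PHP's built-in DOM extension, whose DOMElement::getAttribute() works the same way; the shortened URL is an assumption for illustration.

```php
<?php
// Stand-in for the library's node: any object with a hyphenated property name.
$res = new stdClass();
$res->{'data-href'} = '/flatshare/fad_click.pl?fad_id=3248085';

// $res->data-href would be parsed as ($res->data) - href, so the name goes in braces.
$viaBraces = $res->{'data-href'};

// Same attribute read through the built-in DOM extension.
$doc = new DOMDocument();
$doc->loadHTML('<ul><li data-href="/flatshare/fad_click.pl?fad_id=3248085">Room</li></ul>');
$li = $doc->getElementsByTagName('li')->item(0);
$viaMethod = $li->getAttribute('data-href');

echo $viaBraces, "\n", $viaMethod, "\n";
```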

PHP xpath check for div style and its value

I am trying to parse an HTML file or string for two things using PHP and XPath.
<DIV STYLE="top:110px; left:1280px; width:88px" Class="S0">Aug30</DIV>
First, I want to look up an unknown value (here: Aug30) when I know the top and left values of the style (here: 110px and 1280px).
Second, the other way round: I know the value Aug30 but want to get its top and left values.
Perhaps XPath is not the best way to do this. Any idea on how to solve my problem?
Thanks in advance for your help!

To filter <div> elements by the value of their style attribute in XPath, you can do something like this:
//div[contains(@style, 'top:110px') and contains(@style, 'left:1280px')]
The XPath above searches for <div> nodes whose style attribute value contains the two specific strings.
The other requirement isn't supported in XPath 1.0 as far as I can see. We can get the entire value of the style attribute, but getting only part of it is a dead end. There are some string functions we can use, but a node-selecting query cannot return a function's result.
You'll need to do that using XPath 2.0 or using the host programming language (PHP in this case).
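
As a rough sketch of both directions with PHP's built-in DOM extension: the first lookup uses the XPath above, and the second selects the element by its text and then splits the style value in PHP. The $position array and the //div[. = 'Aug30'] query are illustrative choices, not part of the question.

```php
<?php
$html = '<DIV STYLE="top:110px; left:1280px; width:88px" Class="S0">Aug30</DIV>';

$doc = new DOMDocument();
$doc->loadHTML($html); // the HTML parser lowercases tag and attribute names
$xpath = new DOMXPath($doc);

// Direction 1: known top/left values, unknown text.
$nodes = $xpath->query("//div[contains(@style, 'top:110px') and contains(@style, 'left:1280px')]");
$text = $nodes->item(0)->textContent;

// Direction 2: known text, unknown top/left values.
$div = $xpath->query("//div[. = 'Aug30']")->item(0);
$position = [];
foreach (explode(';', $div->getAttribute('style')) as $declaration) {
    if (strpos($declaration, ':') === false) {
        continue; // skip empty fragments such as a trailing ";"
    }
    [$property, $value] = array_map('trim', explode(':', $declaration, 2));
    $position[$property] = $value;
}

echo $text, "\n";
echo $position['top'], ' ', $position['left'], "\n";
```

Note that contains() is a plain substring test, so 'top:110px' would also match a style such as 'top:110px0' or a different spacing would be missed; stricter matching would have to happen in PHP.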

XPath finding certain node with certain text

I had a question about something similar:
getting a parent node and its child, where the child has a certain text
But the situation has changed, and I have found some problems with the idea above.
I am now trying to find a node with a specific text, because when I search for 29 it also finds nodes that contain 2999 or anything else that has 29 in it.
So my question is, how can I change:
$myvar = $xpath->query('(//*[text()[contains(., "something")]])');
so that it looks for a node whose text is exactly the given string, not a node that merely contains it?

Just use //*[text() = "29"] instead of contains().

Use:
//*[. = 29]
This selects any element whose string value, when converted to a number, is equal to 29.
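
The three predicates can be compared on a small made-up document. This is a sketch using PHP's built-in DOMXPath; the element names and values are invented for illustration.

```php
<?php
// One exact "29", two values that merely contain 29, and one "29" padded with spaces.
$xml = '<items><price>29</price><price>2999</price><price>129</price><price> 29 </price></items>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Substring match: every text node that contains "29" somewhere.
$containsCount = $xpath->query('//*[text()[contains(., "29")]]')->length;

// Exact string match on a text node: padding or extra digits rule the node out.
$exactCount = $xpath->query('//*[text() = "29"]')->length;

// Numeric match: the string value is converted to a number first, which ignores surrounding whitespace.
$numericCount = $xpath->query('//*[. = 29]')->length;

echo $containsCount, ' ', $exactCount, ' ', $numericCount, "\n";
```

So the two answers differ on values like " 29 ": the string comparison rejects it, the numeric comparison accepts it.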

closest method in phpQuery?

I'm trying to see if there's a way to implement
the "closest" method on phpQuery (just like it works on jQuery).
Is there such a thing?

From Commonly Confused Bits Of jQuery:
CLOSEST(SELECTOR)
This is a bit of a well-kept secret, but very useful. It works like parents(), except that it returns only one parent/ancestor. In my experience, you’ll normally want to check for the existence of one particular element in an element’s ancestry, not a whole bunch of them, so I tend to use this more than parents().
Since parents() exists in phpQuery, you can follow the example from that source:
Tip: you can simulate closest() by using parents() and limiting it to one returned element.
$($('#element1').parents('#element2').get(0)).css('background', '#f90');

To actually spell out the phpQuery syntax... (it took me ages to get this!)
I wanted to extract the 'item' from an RSS feed that contained a specific enclosure (media file link).
function fetch($feed, $fname)
{
    // load the file
    phpQuery::newDocumentFileHTML($feed);
    // find the first enclosure that links to the file,
    // then walk up its ancestors to the enclosing 'item';
    // this is the 'closest' equivalent
    return pq("enclosure[url='" . $fname . "']:first")->parents('item')->xml();
}
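
For comparison, the same "closest ancestor" lookup can be sketched without phpQuery, using PHP's built-in DOMXPath. The ancestor axis is a reverse axis, so ancestor::item[1] is the nearest enclosing item. The feed content below is made up for illustration.

```php
<?php
// Hypothetical miniature feed with two items.
$xml = '<rss><channel>'
     . '<item><title>One</title><enclosure url="a.mp3"/></item>'
     . '<item><title>Two</title><enclosure url="b.mp3"/></item>'
     . '</channel></rss>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Nearest 'item' ancestor of the enclosure that links to the file.
$item = $xpath->query('//enclosure[@url="b.mp3"]/ancestor::item[1]')->item(0);

// Read the title relative to that item to confirm which one was found.
$title = $xpath->evaluate('string(title)', $item);

echo $title, "\n";
echo $doc->saveXML($item), "\n";
```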
