I am trying to parse an HTML file/string for two things using PHP and XPath.
<DIV STYLE="top:110px; left:1280px; width:88px" Class="S0">Aug30</DIV>
I want to find an unknown value (here: Aug30) when I know the style's top and left values (here: 110px and 1280px).
And the other way around: I know the value Aug30 and want to get its top and left values.
Perhaps XPath is not the best way to do this. Any idea on how to solve my problem?
Thanks in advance for your help!
To filter <div> elements by style attribute value in XPath, you can do something like this:
//div[contains(@style, 'top:110px') and contains(@style, 'left:1280px')]
The XPath above selects <div> nodes whose style attribute contains both strings.
The other requirement isn't supported in XPath 1.0 as far as I can see. We can select the entire value of the style attribute, but selecting only part of it is a dead end. XPath 1.0 does have string functions such as substring-before() and substring-after(), but a query that selects nodes cannot return a function's result in place of a node.
You'll need to do that using XPath 2.0 or using the host programming language (PHP in this case).
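A minimal sketch of both directions using the host language, assuming the markup from the question is loaded with DOMDocument (which lowercases tag and attribute names). The variable names and the regular expressions are illustrative assumptions, not part of the original answer. Note that contains(@style, 'top:110px') would also match a value such as margin-top:110px.

```php
<?php
// Sample markup taken from the question.
$html = '<DIV STYLE="top:110px; left:1280px; width:88px" Class="S0">Aug30</DIV>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Direction 1: top/left are known, the text is unknown.
$nodes = $xpath->query("//div[contains(@style, 'top:110px') and contains(@style, 'left:1280px')]");
$text = $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;

// Direction 2: the text is known, top/left are unknown.
// XPath finds the node; PHP extracts the parts of the style value.
$top = null;
$left = null;
$div = $xpath->query("//div[normalize-space(.) = 'Aug30']")->item(0);
if ($div !== null) {
    $style = $div->getAttribute('style');
    // Anchor on start-of-string or ';' so that e.g. "margin-top" is not matched.
    if (preg_match('/(?:^|;)\s*top\s*:\s*([^;]+)/i', $style, $m)) {
        $top = trim($m[1]);
    }
    if (preg_match('/(?:^|;)\s*left\s*:\s*([^;]+)/i', $style, $m)) {
        $left = trim($m[1]);
    }
}

echo "$text $top $left\n";
```

If a pure XPath 1.0 answer is preferred, DOMXPath::evaluate() can return a string result, so an expression built from substring-after() and substring-before() may also work; that variant is not shown here.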
This question looks the same as "XPath Get first element of subset", but I think it's a bit different.
Take the following blog:
http://www.mademoiselledeco.com/
I want to get the first picture of each post. For that, I thought of the following XPath query:
//div[contains(@class,'type-post status-publish')]//img/@src
Following the example of the previous post I mentioned, I also tried:
//div[contains(@class,'type-post status-publish')](//img/@src)[1]
but that gives
Warning: DOMXPath::query(): Invalid expression
Any idea?
Thanks a lot
OK, I understand, after inspection of the source: each <img> is contained in a <p>, thus img[1] will match all pictures, since they are, in the context of a paragraph, the first image.
In this context, I would rather try getting the first paragraph containing an image:
//div[contains(@class,'type-post status-publish')]//p[img][1]/img/@src
With this XPath I get 9 img/@src results.
//div[@class='post-content-container']//p[./img][1]/img
This is not the best solution but I think it would work.
//div[@class='post-content-container']
Should get each post
//p[./img][1]/img
Should get the first paragraph, which contains an image. Then selects the image.
Actually, the duplicate question you've picked isn't that far off. One of its answers has an explanation that sounds pretty legit:
The [] operator has a higher precedence (binds stronger) than the // abbreviation.
So the //img abbreviation stands in your way. Let's expand it:
/descendant-or-self::node()/child::img
Adding [1] at the end selects every img that is the first img child of its parent (exactly as others have outlined), because the predicate binds to the child::img step before the // abbreviation comes into play.
The Abbreviated Syntax section of the XPath 1.0 specification actually covers this with a note:
NOTE: The location path //para[1] does not mean the same as the location path /descendant::para[1]. The latter selects the first descendant para element; the former selects all descendant para elements that are the first para children of their parents.
That is: you're not looking for the descendant-or-self axis and any nodes' children therein, but just for the first img element on the descendant axis:
/descendant::img[1]
So the xpath expression in full:
//div[contains(@class,'type-post status-publish')]/descendant::img[1]/@src
Result with your example (10 matches):
src="http://www.mademoiselledeco.com/wp-content/uploads/2015/03/Couleur-FionaLynch-Caroline-St.jpg"
src="http://www.mademoiselledeco.com/wp-content/uploads/2015/02/2-OF-MO-cascade-lumineuse2-1024x398.jpg"
src="https://s-media-cache-ak0.pinimg.com/736x/2e/f7/eb/2ef7eb28dc3e6ac9830cf0f1be7defce.jpg"
src="http://www.mademoiselledeco.com/wp-content/uploads/2015/01/couleur-peinture-flamant-vert-trekking.jpg"
src="http://www.mademoiselledeco.com/wp-content/uploads/2015/01/Lily-of-the-Valley-Designed-by-Marie-Deroudilhe-02.jpg"
src="http://www.mademoiselledeco.com/wp-content/uploads/2015/01/shopping-decoration-jaune-bleu-delamaison-1024x866.jpg"
src="http://www.mademoiselledeco.com/wp-content/uploads/2015/01/wikao-cheminee-berlin-mademoiselledeco4.jpg"
src="http://www.mademoiselledeco.com/wp-content/uploads/2015/01/voeux2015-mademoiselledeco-blog.jpg"
src="http://www.mademoiselledeco.com/wp-content/uploads/2014/12/suite-novotel-constance-guisset-1.jpg"
src="http://www.mademoiselledeco.com/wp-content/uploads/2014/12/wish-list-decoration-noel-2014.jpg"
I hope this sheds some light.
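The difference between the two forms can be reproduced on a small document. The sketch below uses hypothetical markup (two posts, every image wrapped in its own paragraph) rather than the real blog, and the helper name srcList is made up for illustration.

```php
<?php
// Hypothetical markup: two posts, each image is the only img in its <p>.
$html = '<div class="post type-post status-publish">'
      . '<p>intro</p><p><img src="a1.jpg"></p><p><img src="a2.jpg"></p></div>'
      . '<div class="post type-post status-publish">'
      . '<p><img src="b1.jpg"></p><p><img src="b2.jpg"></p></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Collect the values of the attribute nodes matched by an expression.
function srcList(DOMXPath $xpath, string $expr): array
{
    $out = [];
    foreach ($xpath->query($expr) as $attr) {
        $out[] = $attr->nodeValue;
    }
    return $out;
}

$post = "//div[contains(@class,'type-post status-publish')]";

// //img[1]: every img that is the first img child of its parent.
$perParent = srcList($xpath, $post . "//img[1]/@src");

// descendant::img[1]: the first img descendant of each post.
$perPost = srcList($xpath, $post . "/descendant::img[1]/@src");

print_r($perParent); // a1.jpg, a2.jpg, b1.jpg, b2.jpg
print_r($perPost);   // a1.jpg, b1.jpg
```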
On a PHP+MySQL project, a string of text coming from a MySQL table contains HTML tags, but those tags never get rendered by Google Chrome or any other browser I have tried:
You can see that the HTML tags (p, strong) aren't being interpreted by the browser.
So the result is:
EDIT: HTML/PHP
<div class="winery_description">
<?php echo $this->winery['description']; ?>
</div>
$this->winery being the array result of the SQL Select.
EDIT 2: I'm the dumbest man in the world, the source contains entities. So the new question is: How do I force entities to be interpreted?
Real source:
Any suggestions? Thanks!
You are probably using innerText or textContent to set the content of your div, which just replaces the child nodes of the div with a single text node.
Use innerHTML instead, to have the browser parse the HTML and create the appropriate DOM nodes.
The answer provided by @Paulpro is correct.
Also note that if you are using jQuery, be sure to use the .html() method instead of the .text() method:
$('#your_element').html('<h1>This works!</h1>');
$('#another_element').text('<h2>Wrong; you will see the <h2> in the output');
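Since EDIT 2 says the stored text itself contains entities, another option is to decode it on the server before echoing. This is a minimal sketch, assuming the description is trusted content; the sample string is made up, and decoding untrusted input this way would let arbitrary markup through.

```php
<?php
// Hypothetical value as it might come back from the database.
$stored = '&lt;p&gt;&lt;strong&gt;Our winery&lt;/strong&gt; since 1920&lt;/p&gt;';

// html_entity_decode() turns the entities back into real markup.
$decoded = html_entity_decode($stored, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo '<div class="winery_description">' . $decoded . '</div>';
```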
I'm working with PHP DOM for HTML manipulation and have two questions.
1. I recently read that there is a better way to output special HTML characters (e.g. ©): the DOMDocument::createEntityReference() method. Its main advantage is that you don't need htmlentities; the character is escaped automatically.
For example: $copyright_symbol = $document->createEntityReference("copy");
Now the problem is: where can I find a reference for the character codes? In my case I need the PHP equivalent of &times; (the × symbol).
2. What if I want to set multiple classes on an element? Can I do it like this: $el->setAttribute('class', 'class1 class2 ...')?
Here you can see the character codes as well as the friendly names. For your ×, you would use "times".
As for the second question: yes, you can do it like that.
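Both points can be tried together in a few lines. This is a sketch only: the span element and its text are made up for illustration, and the serialized markup is printed rather than assumed here.

```php
<?php
$document = new DOMDocument('1.0', 'UTF-8');

// Hypothetical element, used only to demonstrate the two calls.
$el = $document->createElement('span');

// Several classes go into a single space-separated attribute value.
$el->setAttribute('class', 'class1 class2');

// Build "3 × 4" with an entity reference named "times".
$el->appendChild($document->createTextNode('3 '));
$el->appendChild($document->createEntityReference('times'));
$el->appendChild($document->createTextNode(' 4'));
$document->appendChild($el);

// Serialize just this element.
$markup = $document->saveHTML($el);
echo $markup, "\n";
```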
There are lots of tutorials around the net, but none of them explains this:
How do I select a single element (in a table, for example) when I have its absolute XPath?
Example:
I have this:
/html/body/table/tbody/tr[2]/td[2]/table/tbody/tr/td/table[3]/tbody/tr/td/table/tbody/tr[3]/td/table/tbody/tr[4]/td[5]/span
What's the PHP function to get the text of that element?
I really could not find an answer. I found lots of guides and hints on getting all the elements of a table, all the buttons of a form, etc., but not what I need.
Thank you.
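// simplexml_load_string() expects well-formed XML and returns false if parsing fails.
$xml = simplexml_load_string($html_content_string);
$arr = $xml->xpath("//body/table/tbody/tr[2]/td[2]/table/tbody/tr/td/table[3]/tbody/tr/td/table/tbody/tr[3]/td/table/tbody/tr[4]/td[5]/span");
// xpath() returns an array of SimpleXMLElement objects; cast one to string, e.g. (string) $arr[0], to get its text.
var_dump($arr);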
Load your HTML document into a DOM object, then make a DOMXPath object from it and let it evaluate your query string.
It's all described in detail here: http://php.net/manual/en/book.dom.php
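The DOM approach can be sketched as follows, using a small hypothetical table instead of the page from the question. One caveat, stated as an assumption about where the path came from: paths copied from browser tools often contain tbody steps that the browser added itself, so if the raw source has no <tbody> tags those steps may need to be removed. Here the sample markup contains tbody explicitly.

```php
<?php
// Hypothetical markup; the real page and its full path are not reproduced here.
$html = '<html><body><table><tbody>'
      . '<tr><td>a</td><td>b</td></tr>'
      . '<tr><td>c</td><td><span>target</span></td></tr>'
      . '</tbody></table></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// An absolute path selects exactly one node in this document.
$nodes = $xpath->query('/html/body/table/tbody/tr[2]/td[2]/span');

// textContent holds the text of the element and all of its descendants.
$text = $nodes->length > 0 ? $nodes->item(0)->textContent : null;

echo $text, "\n"; // target
```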
I'm trying to fetch data from a div (based on its id) using PHP's PCRE. The goal is to fetch the div's contents based on its id, using recursion / depth to get everything inside it. The main problem is handling the other divs nested inside the "main div", because the regex stops at the first </div> it finds after the initial <div id="test">.
I've tried many different approaches and none of them worked. The best solution, in my opinion, is to use the R parameter (recursion), but I never got it to work properly.
Any ideas?
Thanks in advance :D
You'd be much better off using some form of DOM parser - regex really isn't suited to this problem. If all you want is basic HTML dom parsing, something like simplehtmldom would be right up your alley. It's trivial to install (just include a single PHP file) and trivial to use (2-3 lines will do what you need).
include('simple-html-dom.php');
$dom = str_get_html($bunchofhtmlcode);
$testdiv = $dom->find('div#test',0); // 0 for the first occurrence
$testdiv_contents = $testdiv->innertext;
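The same idea also works with PHP's bundled DOM extension, without a third-party library. This is a minimal sketch on made-up markup; the variable names are illustrative. Nested divs are not a problem because the parser, not a pattern, decides where the element ends.

```php
<?php
// Hypothetical markup with divs nested inside the target div.
$html = '<div id="test"><p>outer</p><div class="inner"><div>deep</div></div></div>'
      . '<div id="other">x</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Select the div by its id attribute.
$div = $xpath->query('//div[@id="test"]')->item(0);

// Serialize each child node to get the equivalent of the div's inner HTML.
$inner = '';
if ($div !== null) {
    foreach ($div->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
}

echo $inner, "\n";
```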