I have just learned about XPath and I want to read data from only certain columns in a table.
My current code looks like this:
<?php
$file_contents = file_get_contents('test.html');
$dom_document = new DOMDocument();
$dom_document->loadHTML($file_contents);
// use DOMXPath to navigate the HTML through the DOM
$dom_xpath = new DOMXpath($dom_document);
$elements = $dom_xpath->query("//tr[@class='rowstyle']");
if (!is_null($elements)) {
    foreach ($elements as $element) {
        echo $element->nodeValue . '<br />';
    }
} else {
    echo 'none';
}
?>
I also tried a variation of the query, because my research turned up lots of issues with nested table elements, but it produces the same result:
$elements = $dom_xpath->query("//table[@class='tablestyle']/tbody/tr[@class='rowstyle']");
It does grab the row of data, but it returns the row as a single string: all of the cells are combined and the tags disappear.
What I really want to do is separate those cells and grab a certain row by its number.
I am also curious how to find out which version of XPath I have. My PHP version is 5.3.5.
It's not combining those cells. You're outputting the row's nodeValue, which for an element is the concatenated text of all its descendants (much like innerText in a browser), so the markup is dropped. If you want to work on the cells themselves, either use childNodes or run an XPath query with the row as the context node, then loop over the cells.
Example:
$dom_xpath = new DOMXpath($dom_document);
$elements = $dom_xpath->query("//tr[@class='rowstyle']");
foreach ($elements as $element) {
    foreach ($element->childNodes as $cell) {
        echo $cell->nodeValue . '<br />';
    }
}
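The second option mentioned above, a query that uses the row as its context node, can be sketched as follows. This is only an illustration under assumptions: the sample markup and the helper name extractColumns are made up, and the cells are assumed to be td elements directly inside each row.

```php
<?php
// Hypothetical sample markup standing in for test.html.
$html = '<table class="tablestyle">'
      . '<tr class="rowstyle"><td>A1</td><td>B1</td><td>C1</td></tr>'
      . '<tr class="rowstyle"><td>A2</td><td>B2</td><td>C2</td></tr>'
      . '</table>';

/**
 * Return the text of the requested columns (1-based) for every matching row.
 * extractColumns is an illustrative helper, not part of the DOM extension.
 */
function extractColumns($html, array $columns)
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // keep parser warnings out of the output
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    $rows = array();
    foreach ($xpath->query("//tr[@class='rowstyle']") as $row) {
        $cells = array();
        foreach ($columns as $n) {
            // Relative path: evaluated against $row, not the whole document.
            $cell = $xpath->query('td[' . (int) $n . ']', $row)->item(0);
            $cells[] = $cell ? trim($cell->textContent) : null;
        }
        $rows[] = $cells;
    }
    return $rows;
}

$rows = extractColumns($html, array(1, 3));
echo implode(' | ', $rows[1]), "\n"; // second row, columns 1 and 3: A2 | C2
```

Because the result is a plain array, picking a row by number is just an index lookup ($rows[1] is the second row). As for the version question: DOMXPath is backed by libxml2, which implements XPath 1.0, so predicates such as td[3] and functions such as position() and last() are available, while XPath 2.0 features are not.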
Related
I know there are many questions on parsing HTML in PHP, but I can't seem to find the specific problem I'm experiencing. My code works on other elements in the page, and it also iterates over the inputs and returns each tag name. However, their value property comes back empty, even though two of them definitely have a value. Here is my code:
$html = file_get_contents('http://...sample website...html');
$doc = new DOMDocument;
libxml_use_internal_errors(true);
$doc->loadHTML($html);
$xpath = new DOMXpath($doc);
$elements = $xpath->query("//*/input[@type='hidden']");
if (!is_null($elements)) {
    foreach ($elements as $element) {
        echo "<br/>[" . $element->nodeName . "]";
        echo $element->nodeValue . "\n";
    }
}
$xpath->query("//*/input[@type='hidden']/@value");
instead of
$xpath->query("//*/input[@type='hidden']");
also works well.
Same question, same answers
I figured it out myself. If anyone else has a similar problem: nodeValue returns an element's text content (roughly its "innerHTML" with the tags stripped), which is empty for an input. To read an attribute, use $element->getAttribute("value") (for the "value" attribute).
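A minimal sketch of the getAttribute approach, assuming a made-up form (the field names, values, and the helper name hiddenInputs are hypothetical):

```php
<?php
// Hypothetical form markup; the field names and values are made up.
$html = '<form>'
      . '<input type="hidden" name="token" value="abc123">'
      . '<input type="hidden" name="page" value="2">'
      . '<input type="text" name="q" value="visible">'
      . '</form>';

/**
 * Map each hidden input's name to its value attribute.
 * hiddenInputs is an illustrative helper name.
 */
function hiddenInputs($html)
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // keep parser warnings out of the output
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    $values = array();
    foreach ($xpath->query("//input[@type='hidden']") as $input) {
        // Attributes are read with getAttribute(); nodeValue would be empty here.
        $values[$input->getAttribute('name')] = $input->getAttribute('value');
    }
    return $values;
}

print_r(hiddenInputs($html));
```

Selecting the attribute nodes directly with .../@value, as in the other answer, gives the same values; getAttribute is handy when several attributes of the same element are needed.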
I'm trying to parse this HTML page: http://www.valor.com.br/valor-data/moedas
For a simple start, I'm trying to get all td elements with class="left" and echoing their inner texts. What I'm struggling to understand is why this code:
$finder = new DomXPath($dom);
$tds = $finder->query("//*[@class='left']");
foreach ($tds as $td) {
    echo $td->textContent;
}
gives me the expected output (a bunch of words that belong to those td elements which aren't worth pasting here) while this:
$finder = new DomXPath($dom);
$tds = $finder->query("//td[@class='left']");
foreach ($tds as $td) {
    echo $td->textContent;
}
finds nothing. I've also tried $finder->query("//td") to simply get all td elements, but it's like DomXPath doesn't recognize tag names. Has anyone ever faced this same problem?
I have not tested, but this is probably a namespace issue. Your input page is XHTML and has correctly declared an XHTML namespace. Therefore, you need to register a namespace prefix and use that prefix in your query.
Something like this
$finder = new DomXPath($dom);
$finder->registerNamespace("x", "http://www.w3.org/1999/xhtml");
$tds = $finder->query("//x:td[@class='left']");
foreach ($tds as $td) {
    echo $td->textContent;
}
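To see the effect in isolation, here is a small sketch that assumes the page is parsed as XML (so the XHTML namespace is kept). The snippet, its values, and the helper name leftCells are made up for illustration.

```php
<?php
// Hypothetical XHTML snippet; parsed with loadXML() so the namespace is preserved.
$xhtml = '<html xmlns="http://www.w3.org/1999/xhtml"><body><table>'
       . '<tr><td class="left">USD</td><td class="right">1.00</td></tr>'
       . '<tr><td class="left">EUR</td><td class="right">0.92</td></tr>'
       . '</table></body></html>';

/**
 * Collect the text of td.left cells, with or without a registered prefix.
 * leftCells is an illustrative helper name.
 */
function leftCells($xhtml, $useNamespace)
{
    $dom = new DOMDocument();
    $dom->loadXML($xhtml);
    $finder = new DOMXPath($dom);

    if ($useNamespace) {
        // The prefix "x" is arbitrary; only the namespace URI has to match.
        $finder->registerNamespace('x', 'http://www.w3.org/1999/xhtml');
        $query = "//x:td[@class='left']";
    } else {
        // An unprefixed name only matches elements in no namespace.
        $query = "//td[@class='left']";
    }

    $texts = array();
    foreach ($finder->query($query) as $td) {
        $texts[] = $td->textContent;
    }
    return $texts;
}

echo count(leftCells($xhtml, false)), "\n";        // 0
echo implode(', ', leftCells($xhtml, true)), "\n"; // USD, EUR
```

This also explains why //*[@class='left'] worked in the question: the wildcard matches elements in any namespace, while an unprefixed td does not.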
I want to display the list elements of the ul, but not the last one. I have used DOM, but it takes a long time. Can someone please give me the XPath expression to solve this?
Please provide the whole solution code.
$doc = new DOMDocument();
@$doc->loadHTMLFile($sel_image['snapdeal_content']);
$divs = $doc->getElementsByTagName('ul');
foreach ($divs as $div) {
    if ($div->getAttribute('class') == 'key-features') {
        $li = $div->getElementsByTagName('li');
        for ($j = 0; $j < $li->length - 1; $j++) {
            echo "->" . $li->item($j)->nodeValue;
            echo "<br />";
        }
    }
}
Try this excerpt to replace the for loops in your solution. $li will be a DOMNodeList containing every <li> except the last one from each <ul class="key-features"> in the document.
$xpath = new DOMXPath($doc);
$query = '//ul[@class = "key-features"]/li[position() < last()]';
$li = $xpath->query($query);
Also see http://www.php.net/manual/en/domxpath.query.php and "XSL for-each: how to detect last node?".
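Since the question asked for complete code, here is a self-contained sketch of the same query. The sample list and the helper name allButLastFeature are assumptions made for the example; the real page would be loaded with loadHTMLFile as in the question.

```php
<?php
// Hypothetical markup standing in for the product page.
$html = '<ul class="key-features"><li>One</li><li>Two</li><li>Three</li></ul>'
      . '<ul class="other"><li>Ignored</li></ul>';

/**
 * Return the text of every <li> except the last one in ul.key-features.
 * allButLastFeature is an illustrative helper name.
 */
function allButLastFeature($html)
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // keep parser warnings out of the output
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    // position() and last() are evaluated among the li children of each ul.
    $query = '//ul[@class = "key-features"]/li[position() < last()]';

    $items = array();
    foreach ($xpath->query($query) as $li) {
        $items[] = $li->nodeValue;
    }
    return $items;
}

foreach (allButLastFeature($html) as $text) {
    echo '->', $text, "<br />\n"; // prints One and Two, but not Three
}
```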
I've been hacking at this for a while and just can't seem to get it right.
How can you get the contents of all script elements when the number of script elements is variable? My example markup looks like this:
<div></div>
<iframe></iframe>
<script>xxxx</script>
<script>xxxx</script>
<script>xxxx</script>
What I have so far works only if I keep the number of scripts static, so clearly I'm not iterating over the array correctly, but I'm totally thrown by the DOMXPath documentation as to how to do it. This is what I have so far:
$dom = new DOMDocument();
$dom->preserveWhiteSpace = true;
@$dom->loadHtml($form_content);
$xpath = new DOMXPath($dom);
$items = $xpath->query('//script');
foreach ($items as $item) {
    $scriptContents = $item->previousSibling->previousSibling->nodeValue . "\r\n\r\n";
    $scriptContents .= $item->previousSibling->nodeValue . "\r\n\r\n";
    $scriptContents .= $item->nodeValue . "\r\n\r\n";
}
echo $scriptContents;
How should I go about this? I've been searching SO for a while now, but can't seem to apply a solution that works. Thanks in advance - b
It appears that you are overwriting $scriptContents with each iteration, which is probably not what you are intending. The way the script currently is operating, your output would be limited to the two previous siblings of the last script tag (whether or not they are actually script tags themselves) along with the last script tag.
If you are strictly trying to output the contents of the script tags, you can do this:
$xpath = new DOMXPath($dom);
$items = $xpath->query('//script');
foreach ($items as $item) {
    echo $item->nodeValue . "\r\n\r\n";
}
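If the contents are needed in a variable rather than echoed right away, one option is to collect them in an array and join them afterwards, which avoids the overwriting problem described above. A rough sketch, with made-up markup and a hypothetical helper name scriptContents:

```php
<?php
// Hypothetical markup; any number of script elements works the same way.
$form_content = '<div></div>'
              . '<script>var a = 1;</script>'
              . '<script>var b = 2;</script>';

/**
 * Return the text of every script element, in document order.
 * scriptContents is an illustrative helper name.
 */
function scriptContents($html)
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // keep parser warnings out of the output
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);

    $scripts = array();
    foreach ($xpath->query('//script') as $item) {
        $scripts[] = $item->nodeValue; // append, never overwrite
    }
    return $scripts;
}

// Join once at the end, with a blank line between scripts.
echo implode("\r\n\r\n", scriptContents($form_content)), "\r\n";
```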
I've got this PHP code loading in some html.
$dom = new DOMDocument();
$dom->loadHTML($somehtml);
$xpath = new DOMXPath($dom);
$divContent = $xpath->query('//table[class="defURLP"]');
echo $divContent;
I don't quite understand what needs to happen here. What I want is to populate the variable $divContent with the HTML contents of the table with the class name defURLP.
It's currently just returning
object(DOMNodeList)#3 (0) { }
You need to retrieve the first item from the DOMNodeList returned by your XPath query, since there may be more than one in the list. Also note that an attribute test needs the @ prefix: [class="defURLP"] looks for a child element named class, which is why your list is empty.
// Queries for tables having class defURLP
$tables = $xpath->query('//table[@class="defURLP"]');
// Reference the first one in $divContent
$divContent = $tables->item(0);
// Output its nodeValue (the text content, without tags)
echo $divContent->nodeValue;
Or iterate over the node list with a foreach:
$tables = $xpath->query('//table[@class="defURLP"]');
// Iterate over the whole node list in $tables (if it is multiple nodes)
foreach ($tables as $t) {
    echo $t->nodeValue;
}
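Since nodeValue only gives the text, getting the table's markup needs a serialization step. A possible sketch, assuming PHP 5.3.6 or later (where DOMDocument::saveHTML() accepts a node argument); the sample markup and the helper name tableHtml are made up:

```php
<?php
// Hypothetical markup standing in for $somehtml.
$somehtml = '<div><table class="defURLP"><tr><td>cell</td></tr></table></div>';

/**
 * Return the serialized HTML of the first table.defURLP, or null if none.
 * tableHtml is an illustrative helper name.
 */
function tableHtml($html)
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // keep parser warnings out of the output
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);

    $table = $xpath->query('//table[@class="defURLP"]')->item(0);
    if ($table === null) {
        return null;
    }
    // Passing a node serializes just that subtree (PHP >= 5.3.6).
    return $dom->saveHTML($table);
}

$divContent = tableHtml($somehtml);
echo $divContent;
```

On older PHP versions, $dom->saveXML($table) is a common fallback, at the cost of XML-style serialization.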