DOMXpath query returns null

$doc = new DOMDocument();
$doc->loadHTMLFile("https://www.tipico.com/en/wettschein/bslc-bVysdHEpshHRDMQ7E-Y5Q%3D%3D/");
$xpath = new DOMXpath($doc);
$footer = $xpath->query("//div[@class='t_foot']/div[1]/div[1]");
var_dump($footer->item(0)->nodeValue);
Shouldn't this return 48,37? My other XPath queries work, but this one does not.

The problem is that t_foot is not the only class on the element you are trying to select, so its class attribute is not equal to the string t_foot. Instead, select the element whose class attribute contains t_foot. The XPath expression should be:
$footer = $xpath->query('//div[contains(@class, "t_foot")]/div[1]/div[1]');
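One caveat with contains(@class, "t_foot"): it is a substring test, so it would also match a class such as t_footer. A common XPath 1.0 idiom is to pad the class list with spaces and search for the padded token. The sketch below uses made-up markup, not the real page, so the class names and values are assumptions for illustration only.

```php
<?php
// Hypothetical markup standing in for the real page; only the class names matter.
$html = '<div class="t_foot wide"><div><div>48,37</div></div></div>'
      . '<div class="t_footer"><div><div>other</div></div></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Substring test: matches both "t_foot wide" and "t_footer".
$loose = $xpath->query('//div[contains(@class, "t_foot")]');

// Token test: pad the class list with spaces and look for " t_foot ".
$strict = $xpath->query(
    '//div[contains(concat(" ", normalize-space(@class), " "), " t_foot ")]/div[1]/div[1]'
);

echo $loose->length, "\n";
echo $strict->length, "\n";
echo $strict->item(0)->nodeValue, "\n";
```

If the page only ever uses t_foot as a whole class name, the simpler contains() form is enough.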

Related

php - loadHTML() - every <p> until a certain class

I'm fetching some Wikipedia content in two different ways:
$html = file_get_contents('https://en.wikipedia.org/wiki/Sans-serif');
The first one gets the first paragraph:
$dom = new DomDocument();
@$dom->loadHTML($html);
$p = $dom->getElementsByTagName('p')->item(0)->nodeValue;
echo $p;
The second one gets the first paragraph after a specific $id:
$dom = new DOMDocument();
@$dom->loadHTML($html);
// Pass the variable itself; '$id' in single quotes would be the literal string "$id".
$p = $dom->getElementById($id)->getElementsByTagName('p')->item(0);
echo $p->nodeValue;
I'm looking for a third way that gets the whole first part.
So I was thinking about getting every <p> before the element whose id or class is "toc", which is the table of contents.
Any idea how to do that?

If you're just looking for the intro in plain text, you can simply use Wikipedia's API:
https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro=&explaintext=&titles=Sans-serif
If you want HTML formatting as well (excluding inner images and the like):
https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro=&titles=Sans-serif

You could use DOMDocument and DOMXPath with, for example, an XPath expression like:
//div[@id="toc"]/preceding-sibling::p
$doc = new DOMDocument();
// loadHTMLFile() uses the HTML parser; load() expects well-formed XML.
// The @ suppresses warnings about markup the parser does not recognise.
@$doc->loadHTMLFile("https://en.wikipedia.org/wiki/Sans-serif");
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//div[@id="toc"]/preceding-sibling::p');
foreach ($nodes as $node) {
    echo $node->nodeValue;
}
That would give you the content of the paragraphs preceding the div with id = toc.
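To see the preceding-sibling axis in isolation, here is a minimal sketch on an inline snippet. The markup is a hypothetical stand-in for an article body, not Wikipedia's actual structure. Note that preceding-sibling only finds paragraphs that share a parent with the toc element.

```php
<?php
// Hypothetical stand-in for an article body: intro paragraphs, a TOC, then more text.
$html = '<div id="content">'
      . '<p>Intro one.</p><p>Intro two.</p>'
      . '<div id="toc">Contents</div>'
      . '<p>Later section.</p>'
      . '</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Collect the text of every <p> sibling that comes before the TOC.
$intro = [];
foreach ($xpath->query('//div[@id="toc"]/preceding-sibling::p') as $p) {
    $intro[] = trim($p->nodeValue);
}

print_r($intro);
```

If the real page wraps the table of contents in an extra container, the expression would need adjusting to start from that container instead.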

Get DOMNodeList of elements with only the given class

I am parsing a 3rd party HTML page using PHP DOMDocument and DomXPath.
I use the following code:
$dom = new DOMDocument();
$html = file_get_contents($url);
$dom->loadHTML('<?xml encoding="UTF-8">' . $html);
$dom->encoding = "UTF-8";
$finder = new DomXPath($dom);
Now there are several elements using the same class, but I want to target the one that uses only the given class, for example:
<table class="tbl"></table>
<table class="tbl red"></table>
<table class="tbl large blue"></table>
I use the following selector:
$classname = "tbl";
$nodes = $finder->query("//*[contains(@class, '$classname')]");
This, of course, fetches all three tables given above. Is there a simple way to get only the first one?
Thanks

Yes, there is a way.
Note that with your current XPath query you can already access the desired node this way:
$nodes->item(0);
To select only the first node, modify your pattern like this:
$nodes = $finder->query("(//*[contains(@class, '$classname')])[1]");
But to access the desired node you still need to use this syntax:
$nodes->item(0);
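The [1] approach depends on the single-class table appearing first in the document. If the goal is the element whose class attribute is exactly the given class, wherever it appears, an equality test is an alternative. This is a sketch using the three tables from the question, not something the answer above proposes.

```php
<?php
$html = '<table class="tbl"></table>'
      . '<table class="tbl red"></table>'
      . '<table class="tbl large blue"></table>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$finder = new DOMXPath($dom);

$classname = 'tbl';
// Equality instead of contains(): only class="tbl" matches.
// normalize-space() trims stray whitespace, so class=" tbl " would also match.
$nodes = $finder->query("//*[normalize-space(@class) = '$classname']");

echo $nodes->length, "\n";
echo $nodes->item(0)->getAttribute('class'), "\n";
```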

retrieving certain attributes using DOMDocument

I'm trying to figure out how to parse an HTML page to get a form's action value, the labels within the form tag, and the input field names. I looked at the DOMDocument pages on php.net, which tell me to get a child node, but all that does is give me errors saying it doesn't exist. I also tried print_r on the variable holding the HTML content, and all it shows is length=1. Can someone show me a few samples I can use? The php.net documentation is confusing to follow.
<?php
$content = "some-html-source";
$content = preg_replace("/&(?!(?:apos|quot|[gl]t|amp);|#)/", '&amp;', $content);
$dom = new DOMDocument;
$dom->preserveWhiteSpace = FALSE;
$dom->loadHTML($content);
$form = $dom->getElementsByTagName('form');
print_r($form);

I suggest using DOMXPath instead of getElementsByTagName because it lets you select attribute values directly and, like getElementsByTagName, returns a DOMNodeList. The @ in @action indicates that we're selecting an attribute.
$doc = new DOMDocument();
$doc->loadHTML($content);
$xpath = new DomXPath($doc);
$action = $xpath->query('//form/@action')->item(0);
var_dump($action);
Similarly, to get the first input:
$input = $xpath->query('//form/input')->item(0);
To get all input fields:
// Run the query once and iterate over the resulting DOMNodeList.
foreach ($xpath->query('//form/input') as $input) {
    var_dump($input);
}
If you're not familiar with XPath, I recommend viewing these examples.
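Putting the pieces together for the three things the question asks for (action, labels, input names), a minimal sketch might look like the following. The form markup is hypothetical and exists only to make the example self-contained.

```php
<?php
// Hypothetical form used only for illustration.
$content = '<form action="/login" method="post">'
         . '<label for="user">Username</label><input type="text" name="user" id="user">'
         . '<label for="pass">Password</label><input type="password" name="pass" id="pass">'
         . '</form>';

$doc = new DOMDocument();
$doc->loadHTML($content);
$xpath = new DOMXPath($doc);

// Take the first form and read its action attribute.
$form = $xpath->query('//form')->item(0);
$action = $form->getAttribute('action');

// Queries starting with "." are evaluated relative to the context node ($form).
$labels = [];
foreach ($xpath->query('.//label', $form) as $label) {
    $labels[] = trim($label->nodeValue);
}

$names = [];
foreach ($xpath->query('.//input/@name', $form) as $attr) {
    $names[] = $attr->nodeValue;
}

var_dump($action, $labels, $names);
```

Using .//input rather than /input also finds inputs nested inside wrappers such as <div> or <p> within the form.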

PHP with DOM can not get the description correctly

I am using the following code to get the first description <div> of a page. However, I cannot get it. What am I missing here?
$doc = new DOMDocument();
$doc->loadhtmlfile("");
$xpath = new DOMXpath($doc);
$descr = $xpath->query('//div[@class="description"]');
print_r($descr);

query() returns a DOMNodeList. To get the <div> DOMNode, you need to take it from the list:
$descr = $xpath->query('//div[@class="description"]')->item(0);
Now, $descr contains a DOMNode of the first <div> with class description.
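Keep in mind that item(0) returns null when the query matches nothing, so reading nodeValue on it without a check fails. A small sketch of a guarded lookup, using an inline snippet as an assumed stand-in for the real page:

```php
<?php
$doc = new DOMDocument();
$doc->loadHTML('<div class="description">A short summary.</div>');
$xpath = new DOMXPath($doc);

$found   = $xpath->query('//div[@class="description"]')->item(0);
$missing = $xpath->query('//div[@class="no-such-class"]')->item(0);

// item() returns null when the index is out of range, so test before dereferencing.
$text  = $found !== null ? $found->nodeValue : '(no match)';
$other = $missing !== null ? $missing->nodeValue : '(no match)';

echo $text, "\n";
echo $other, "\n";
```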

DOMNodeList, xPath and PHP

I am parsing an HTML page with DOM and XPath in PHP.
I have to fetch a nested <Table...></table> from the HTML.
I have defined a query using FirePath in the browser which is pointing to
html/body/table[2]/tbody/tr/td[2]/table[2]/tbody/tr/td/table
When I run the code, the DOMNodeList that comes back has length 0. My objective is to output the queried <table> as a string. This is an HTML scraping script in PHP.
Below is the function. Please help me extract the required <table>.
$pageUrl = "http://www.boc.cn/sourcedb/whpj/enindex.html";
getExchangeRateTable($pageUrl);
function getExchangeRateTable($url){
    $htmlTable = "";
    $xPathTable = null;
    $xPathQuery1 = "html/body/table[2]/tbody/tr/td[2]/table[2]/tbody/tr/td/table";
    if(strlen($url)==0){die('Argument exception: method call [getExchangeRateTable] expects a string of URL!');}
    // initialize objects
    $page = tidyit($url);
    $dom = new DOMDocument();
    $dom->loadHTML($page);
    $xpath = new DOMXPath($dom);
    // $elements is appearing as a DOMNodeList
    $elements = $xpath->query($xPathQuery1);
    // print_r($elements);
    foreach($elements as $e){
        $e->firstChild->nodeValue;
    }
}

Have you tried something like this?
$dom = new domDocument;
$dom->loadHTML($tes);
$dom->preserveWhiteSpace = false;
$tables = $dom->getElementsByTagName("table");
$rows = $tables->item(0)->getElementsByTagName("tr");
print_r($rows);

Remove the tbody elements from your XPath query. In most cases they are inserted by your browser, as is the case with the page you are trying to scrape.
/html/body/table[2]/tr/td[2]/table[2]/tr/td/table
This will most likely work.
However, it's probably safer to use a different XPath. The following XPath selects the first th based on its text content, then selects the tr's parent, which is either a tbody or a table:
//th[contains(text(),'Currency Name')]/parent::tr/parent::*
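The following sketch illustrates both points on an inline table: the libxml-based DOMDocument parser does not add a tbody the way a browser does, and DOMDocument::saveHTML() accepts a node, which gives the table as a string. The table contents are invented for the example and do not come from the page being scraped.

```php
<?php
// Hypothetical table; the source markup has no <tbody>.
$page = '<table><tr><th>Currency Name</th><th>Rate</th></tr>'
      . '<tr><td>USD</td><td>1.00</td></tr></table>';

$dom = new DOMDocument();
$dom->loadHTML($page);
$xpath = new DOMXPath($dom);

// No tbody exists in the parsed tree, so a path containing /tbody/ finds nothing.
$tbodyCount = $xpath->query('//table/tbody')->length;

// Anchor on the header text, then walk up to the enclosing table (or tbody).
$table = $xpath->query("//th[contains(text(),'Currency Name')]/parent::tr/parent::*")->item(0);

// Passing a node to saveHTML() serialises just that subtree.
$htmlTable = $table !== null ? $dom->saveHTML($table) : '';
echo $htmlTable;
```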

The XPath query should start with a leading /, like this:
/html/...
