This question already has answers here:
How to retrieve comments from within an XML Document in PHP
(4 answers)
Closed 8 years ago.
I am trying to retrieve content from a p element in this page. As you can see, in the source code there is a paragraph with the content I want:
<p id="qb"><!--
QBlastInfoBegin
Status=READY
QBlastInfoEnd
--></p>
What I actually want is the value of Status.
Here is my PHP code:
@$dom->loadHTML($ncbi->ncbi_request($params));
$XPath = new DOMXpath($dom);
$nodes = $XPath->query('//p[@id="qb"]');
$node = $nodes->item(0)->nodeValue;
var_dump($node);
that returns
["nodeValue"]=> string(0) ""
Any ideas?
Thanks!
It seems that to get comment values you need to use //comment().
I'm not too familiar with XPath, so I'm not sure of the exact syntax.
Sources: https://stackoverflow.com/a/7548089/723139 / https://stackoverflow.com/a/1987555/723139
Update: with working code
<?php
$data = file_get_contents('http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?RID=UY5PPBRH014&CMD=Get');
$dom = new DOMDocument();
@$dom->loadHTML($data);
$XPath = new DOMXpath($dom);
$nodes = $XPath->query('//p[@id="qb"]/comment()');
foreach ($nodes as $comment)
{
var_dump($comment->textContent);
}
I checked the site, and it seems you are after the comment inside the paragraph. To select it, add comment() to your XPath query. Consider this example:
$contents = file_get_contents('http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?RID=UY5PPBRH014&CMD=Get');
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($contents);
libxml_clear_errors();
$xpath = new DOMXpath($dom);
$comment = $xpath->query('//p[@id="qb"]/comment()')->item(0)->nodeValue;
echo '<pre>';
print_r($comment);
Outputs:
QBlastInfoBegin
Status=READY
QBlastInfoEnd
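If only the Status value is needed, the comment text can be narrowed down with a regular expression. This is a minimal sketch, not the answerer's code: the inline `$html` string is a stand-in for the fetched page (assumed to contain the same `<p id="qb">` block), and the `$status` variable and the pattern are illustrative choices.

```php
<?php
// Stand-in for the fetched page; the real response is assumed to
// contain the same <p id="qb"> block shown in the question.
$html = '<html><body><p id="qb"><!--
QBlastInfoBegin
    Status=READY
QBlastInfoEnd
--></p></body></html>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$comments = $xpath->query('//p[@id="qb"]/comment()');

$status = null;
if ($comments !== false && $comments->length > 0) {
    // Capture whatever follows "Status=" up to the next whitespace.
    if (preg_match('/^\s*Status=(\S+)/m', $comments->item(0)->nodeValue, $m)) {
        $status = $m[1];
    }
}

var_dump($status);
```

Checking the result of query() before calling item(0) avoids an error when the paragraph or the comment is missing from the response.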
Related
I'm fetching some Wikipedia content in two different ways:
$html = file_get_contents('https://en.wikipedia.org/wiki/Sans-serif');
The first one gets the first paragraph:
$dom = new DomDocument();
@$dom->loadHTML($html);
$p = $dom->getElementsByTagName('p')->item(0)->nodeValue;
echo $p;
The second one gets the first paragraph inside the element with a specific $id:
$dom = new DOMDocument();
@$dom->loadHTML($html);
$p = $dom->getElementById($id)->getElementsByTagName('p')->item(0);
echo $p->nodeValue;
I'm looking for a third way that returns the whole introductory section.
So I was thinking about selecting all the <p> elements before the element with the id or class "toc", which is the id/class of the table of contents.
Any idea how to do that?
If you're just looking for the intro in plain text, you can simply use Wikipedia's API:
https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro=&explaintext=&titles=Sans-serif
If you want HTML formatting as well (excluding inner images and the likes):
https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro=&titles=Sans-serif
You could use DOMDocument and DOMXPath with, for example, an XPath expression like:
//div[@id="toc"]/preceding-sibling::p
// Wikipedia pages are HTML, so use loadHTMLFile() rather than the XML-only load().
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTMLFile("https://en.wikipedia.org/wiki/Sans-serif");
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//div[@id="toc"]/preceding-sibling::p');
foreach ($nodes as $node) {
echo $node->nodeValue;
}
That would give you the content of the paragraphs preceding the div with id = toc.
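To see how the preceding-sibling axis behaves without hitting the network, here is a small sketch on inline markup. The `$html` string is hypothetical and far simpler than a real Wikipedia page. It assumes the intro paragraphs are direct siblings of the toc element, which is what the expression relies on.

```php
<?php
// Hypothetical, simplified page: two intro paragraphs, a toc, one later paragraph.
$html = '<html><body><div>'
      . '<p>Intro one.</p><p>Intro two.</p>'
      . '<div id="toc">Contents</div>'
      . '<p>Later section.</p>'
      . '</div></body></html>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Only <p> elements that share a parent with the toc and come before it match.
$intro = [];
foreach ($xpath->query('//div[@id="toc"]/preceding-sibling::p') as $p) {
    $intro[] = $p->nodeValue;
}

var_dump($intro);
```

Paragraphs nested inside other wrappers, or placed after the toc, are not selected. If the real page nests its paragraphs differently, the expression needs adjusting.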
This is my code:
$xml = file_get_contents('C:\myxml.xml');
$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);
$xpath->registerNamespace("bme", "http://www.bmecat.org/bmecat/1.2/bmecat_new_catalog");
$expression = 'string(//bme:ARTICLE)';
var_dump($xpath->evaluate($expression));
This will var_dump the first of all the ARTICLE nodes. How can I get all of them?
Thanks for helping!
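One possible approach, sketched here with assumptions: the XPath string() function converts a node set to the string value of its first node only, which is why a single ARTICLE comes back. Using query() with the bare path returns a DOMNodeList that can be iterated. The sample XML below is hypothetical; only the namespace URI comes from the question.

```php
<?php
// Hypothetical sample document; only the namespace URI is taken from the question.
$xml = <<<XML
<BMECAT xmlns="http://www.bmecat.org/bmecat/1.2/bmecat_new_catalog">
  <ARTICLE>first</ARTICLE>
  <ARTICLE>second</ARTICLE>
</BMECAT>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

$xpath = new DOMXPath($dom);
$xpath->registerNamespace('bme', 'http://www.bmecat.org/bmecat/1.2/bmecat_new_catalog');

// query() returns every matching node, not just the first one.
$articles = [];
foreach ($xpath->query('//bme:ARTICLE') as $article) {
    $articles[] = $article->textContent;
}

var_dump($articles);
```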
I am trying to extract a complete table including the HTML tags, with XPath, that I can store in a variable, do a bit of string replacement on, then echo directly to the screen. I have found numerous posts on getting the text out of the table but I want to retain the HTML formatting since I am just going to display it (after minor modification).
At present I am extracting the table using string functions stristr, substr etc. but I would prefer to use XPath.
I can display the contents of the table with the following but it just displays the table TD fields with no formatting. It also does not store it in a variable that I can manipulate.
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$arr = $xpath->query('//table');
foreach($arr as $el) {
echo $el->textContent;
}
I tried this but got no output:
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$arr = $xpath->query('//table');
echo $arr->saveHTML();
Use DOMNode::C14N():
foreach($arr as $el) {
echo $el->C14N();
}
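An alternative worth considering is DOMDocument::saveHTML() with a node argument, which serializes just that subtree as HTML. It is a method of the document, not of the node list, which would explain why `$arr->saveHTML()` produced nothing. A minimal sketch, with a hypothetical inline `$html` and an arbitrary str_replace() standing in for the "minor modification":

```php
<?php
// Hypothetical input; the real $html would come from the page being scraped.
$html = '<html><body><table><tr><td>cell</td></tr></table></body></html>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

$markup = '';
foreach ($xpath->query('//table') as $table) {
    // Passing a node to saveHTML() serializes only that subtree, tags included.
    $markup .= $dom->saveHTML($table);
}

// The markup is now an ordinary string that can be modified before output.
$markup = str_replace('cell', 'changed', $markup);
echo $markup;
```

C14N() produces canonical XML, so saveHTML() may be the closer fit when the output is going straight back into an HTML page.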
This question already has answers here:
Why does my XPath query (scraping HTML tables) only work in Firebug, but not the application I'm developing?
(2 answers)
Closed 8 years ago.
libxml_use_internal_errors(true);
$url = 'http://thepiratebay.is/browse/200/0/7';
$html = file_get_contents($url);
$dom = new \DOMDocument();
$dom->loadHTML($html);
$x = new \DOMXPath($dom);
$nodeList = $x->query('/html/body/div[2]/div[2]/table/tbody/tr');
foreach ($nodeList as $node) {
die(var_dump($node));
}
Gives me the error:
"Invalid argument supplied for foreach()"
Not sure why xpath doesn't work on that domain?
If I'm right you'd like to get all the titles in that table. I'd suggest an easier, yet more specific XPath query, i.e.
$nodeList = $x->query('//div[@class="detName"]');
See it in action
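As the linked duplicate suggests, paths copied from a browser inspector often contain a tbody element that the browser added itself and that is absent from the raw HTML PHP receives. The sketch below illustrates this on hypothetical inline markup (the real page's structure is an assumption): the tbody-based path matches nothing, while the class-based query does not depend on the exact nesting.

```php
<?php
// Hypothetical markup standing in for the real page; the source has no <tbody>.
$html = '<html><body><table>'
      . '<tr><td><div class="detName">Title A</div></td></tr>'
      . '<tr><td><div class="detName">Title B</div></td></tr>'
      . '</table></body></html>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$x = new DOMXPath($dom);

// Inspector-style path: expects a <tbody> that is not in the parsed document.
$viaTbody = $x->query('//table/tbody/tr');

// Class-based query: works regardless of how the rows are nested.
$titles = [];
foreach ($x->query('//div[@class="detName"]') as $node) {
    $titles[] = trim($node->textContent);
}

var_dump($viaTbody->length, $titles);
```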
I am using domDocument hoping to parse this little html code. I am looking for a specific span tag with a specific id.
<span id="CPHCenter_lblOperandName">Hello world</span>
My code:
$dom = new domDocument;
@$dom->loadHTML($html); // the @ is to silence errors and misconfigures of HTML
$dom->preserveWhiteSpace = false;
$nodes = $dom->getElementsByTagName('//span[@id="CPHCenter_lblOperandName"');
foreach($nodes as $node){
echo $node->nodeValue;
}
But for some reason I think something is wrong with either the code or the HTML (how can I tell?):
When I count nodes with echo count($nodes); the result is always 1
I get nothing outputted in the nodes loop
How can I learn the syntax of these complex queries?
What did I do wrong?
You can simply use getElementById:
$dom->getElementById('CPHCenter_lblOperandName')->nodeValue
or, using an XPath selector:
$selector = new DOMXPath($dom);
$list = $selector->query('/html/body//span[@id="CPHCenter_lblOperandName"]');
echo($list->item(0)->nodeValue);
//or
foreach($list as $span) {
$text = $span->nodeValue;
}
Your four part question gets an answer in three parts:
getElementsByTagName does not take an XPath expression, you need to give it a tag name;
Nothing is output because no tag would ever match the tagname you provided (see #1);
It looks like what you want is XPath, which means you need to create an XPath object - see the PHP docs for more;
Also, a better method of controlling the libxml errors is to use libxml_use_internal_errors(true) (rather than the '@' operator, which will also hide other, more legitimate errors). That would leave you with code that looks something like this:
<?php
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
foreach($xpath->query("//span[@id='CPHCenter_lblOperandName']") as $node) {
echo $node->textContent;
}