How do I get the value of an input field like the one below where it does not have an ID attribute using PHP's DOMDocument?
<input type="text" name="make" value="Toyota">
XPath makes it simple, assuming that's the only text input with "make" as its name:
$dom = new DOMDocument();
$dom->loadHTML(...);
$xp = new DOMXpath($dom);
$nodes = $xp->query('//input[@name="make"]');
$node = $nodes->item(0);
$car_make = $node->getAttribute('value');
If there's more than one input with that particular field name on the page (which is entirely possible), then you'll have to do some extra work to narrow down WHICH of those multiple inputs you want.
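One way to do that narrowing is to anchor the XPath on an enclosing element, such as the form that holds the input. The sketch below is only an illustration: the two forms and their ids ("search" and "compare") are made up and are not part of the original question.

```php
<?php
// Hypothetical markup: two forms that both contain an input named "make".
$html = '<form id="search"><input type="text" name="make" value="Toyota"></form>'
      . '<form id="compare"><input type="text" name="make" value="Honda"></form>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xp = new DOMXPath($dom);

// Restrict the match to inputs inside the form with the (assumed) id "compare".
$nodes = $xp->query('//form[@id="compare"]//input[@name="make"]');

// item(0) returns null when nothing matches, so check before dereferencing.
$node = $nodes->item(0);
$car_make = $node instanceof DOMElement ? $node->getAttribute('value') : null;

echo $car_make, PHP_EOL;
```

The same idea works with any distinguishing ancestor or attribute, for example a class on a wrapping div or the input's type.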
$dom = new DOMDocument();
$dom->loadHTML($result);
$xpath = new DOMXpath($dom);
$node = $xpath->query('//input[@name="token"]/attribute::value');
$token = $node->item(0)->nodeValue;
I'm fetching some Wikipedia content in two different ways:
$html = file_get_contents('https://en.wikipedia.org/wiki/Sans-serif');
The first one gets the first paragraph:
$dom = new DomDocument();
@$dom->loadHTML($html);
$p = $dom->getElementsByTagName('p')->item(0)->nodeValue;
echo $p;
The second one gets the first paragraph under the element with a specific $id:
$dom = new DOMDocument();
@$dom->loadHTML($html);
// Pass the variable itself; '$id' in single quotes would be the literal string "$id".
$p = $dom->getElementById($id)->getElementsByTagName('p')->item(0);
echo $p->nodeValue;
I'm looking for a third way that gets the whole first part of the article.
So I was thinking about selecting all the <p> elements before the element with the id or class "toc", which is the id/class of the table of contents.
Any idea how to do that?
If you're just looking for the intro in plain text, you can simply use Wikipedia's API:
https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro=&explaintext=&titles=Sans-serif
If you want HTML formatting as well (excluding inner images and the likes):
https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro=&titles=Sans-serif
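As a rough sketch of how the JSON from those URLs could be consumed in PHP: the response nests the pages under query.pages, keyed by page id, and each page carries an "extract" field. The helper name firstExtract and the sample JSON (page id and text) are made up for illustration; the live call is left commented out because it needs network access.

```php
<?php
/**
 * Return the "extract" of the first page in an API response body,
 * or null if the JSON does not have the expected shape.
 * (Hypothetical helper, not part of any library.)
 */
function firstExtract(string $json): ?string
{
    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data['query']['pages']) || !is_array($data['query']['pages'])) {
        return null;
    }
    // "pages" is keyed by page id, which is not known in advance, so iterate.
    foreach ($data['query']['pages'] as $page) {
        if (isset($page['extract'])) {
            return $page['extract'];
        }
    }
    return null;
}

// Hand-written sample in the shape of the API response; id and text are invented.
$sample = '{"query":{"pages":{"123":{"pageid":123,"title":"Sans-serif","extract":"Intro text."}}}}';
echo firstExtract($sample), PHP_EOL;

// Against the live API (requires allow_url_fopen; the server may expect a User-Agent header):
// $url = 'https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro=&explaintext=&titles=Sans-serif';
// echo firstExtract(file_get_contents($url));
```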
You could use DOMDocument and DOMXPath with for example an xpath expression like:
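//div[@id="toc"]/preceding-sibling::p
$doc = new DOMDocument();
// loadHTMLFile() parses HTML; load() expects well-formed XML.
libxml_use_internal_errors(true);
$doc->loadHTMLFile("https://en.wikipedia.org/wiki/Sans-serif");
libxml_clear_errors();
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//div[@id="toc"]/preceding-sibling::p');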
foreach ($nodes as $node) {
echo $node->nodeValue;
}
That would give you the content of the paragraphs preceding the div with id = toc.
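To see the preceding-sibling axis in isolation, here is a small self-contained sketch. The markup is a simplified stand-in invented for the example; the real Wikipedia page is more complex, and the expression only matches paragraphs that are direct siblings of the "toc" element.

```php
<?php
// Simplified, made-up article body: two intro paragraphs, the TOC, then body text.
$html = '<div id="content">'
      . '<p>First intro paragraph.</p>'
      . '<p>Second intro paragraph.</p>'
      . '<div id="toc">Contents</div>'
      . '<p>Body paragraph.</p>'
      . '</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Collect the text of every <p> that is a sibling of, and comes before, div#toc.
$intro = [];
foreach ($xpath->query('//div[@id="toc"]/preceding-sibling::p') as $p) {
    $intro[] = trim($p->textContent);
}

print_r($intro);
```

If the paragraphs are not siblings of the TOC in the actual page, the expression returns an empty list and a different anchor element would be needed.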
I need to get the "option" values from a site, but the page contains more than one "select". How do I get the "option" values for a specific select by its name attribute? (Get all option values where select name=ctl02$dlOgretimYillari.)
<select name="ctl02$dlOgretimYillari" onchange="javascript:setTimeout('__doPostBack(\'ctl02$dlOgretimYillari\',\'\')', 0)" id="ctl02_dlOgretimYillari" class="NormalBlack">
<option selected="selected" value="-40">2016-2017</option>
</select>
My Code :
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML(mb_convert_encoding($content, 'HTML-ENTITIES', 'UTF-8'));
libxml_clear_errors();
$xpath = new DOMXpath($dom);
$options_value = array();
$options_name = array();
$options_selected = array();
foreach($dom->getElementsByTagName('option') as $option) {
array_push($options_value, $option->getAttribute('value'));
array_push($options_selected, $option->getAttribute('selected'));
array_push($options_name, $option->nodeValue);
}
The solution using DOMXPath::query method:
$content = '<select name="ctl02$dlOgretimYillari" onchange="javascript:setTimeout(\'__doPostBack(\'ctl02$dlOgretimYillari\',\'\')\', 0)" id="ctl02_dlOgretimYillari" class="NormalBlack"> <option selected="selected" value="-40">2016-2017</option> </select>';
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadXML(mb_convert_encoding($content, 'HTML-ENTITIES', 'UTF-8'));
$xpath = new DOMXPath($doc);
$nodes = $xpath->query("//select[@name='ctl02\$dlOgretimYillari']/option/@value");
// outputting first option value
print_r($nodes->item(0)->nodeValue);
The output:
-40
For additional condition: How do I get the value of the option text?
...
$nodes = $xpath->query("//select[@name='ctl02\$dlOgretimYillari']/option/text()");
...
@RomanPerekhrest's answer definitely solved the initial question!
But in my case I needed to get all the selected values, so here's the solution for anyone with the same problem:
$nodes = $xpath->query("//select[@name='ctl02\$dlOgretimYillari']/option[@selected]/@value");
// Printing
foreach ($nodes as $node) {
print_r($node->nodeValue);
}
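If the value, the label and the selected state are all needed, they can be gathered in one pass over the option elements of the named select. This is a minimal sketch: the second option and the second select are invented to show that only the requested select is matched.

```php
<?php
// Markup based on the question; the "-39" option and the "other" select are made up.
$content = '<select name="ctl02$dlOgretimYillari">'
         . '<option selected="selected" value="-40">2016-2017</option>'
         . '<option value="-39">2015-2016</option>'
         . '</select>'
         . '<select name="other"><option value="x">X</option></select>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($content);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$options = [];
// Single-quoted PHP string, so $dlOgretimYillari is not treated as a variable.
foreach ($xpath->query('//select[@name="ctl02$dlOgretimYillari"]/option') as $option) {
    $options[] = [
        'value'    => $option->getAttribute('value'),
        'text'     => trim($option->textContent),
        'selected' => $option->hasAttribute('selected'),
    ];
}

print_r($options);
```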
I'm trying to figure out how to parse an HTML page to get a form's action value, the labels within the form tag, and the input field names. I took a look at DOMDocument on php.net and it tells me to get a child node, but all that does is give me errors saying it doesn't exist. I also tried a print_r of the variable holding the HTML content, and all it shows me is length=1. Can someone show me a few samples that I can use? I find php.net confusing to follow.
<?php
$content = "some-html-source";
// Escape bare ampersands that are not already part of an entity.
$content = preg_replace("/&(?!(?:apos|quot|[gl]t|amp);|#)/", '&amp;', $content);
$dom = new DOMDocument;
$dom->preserveWhiteSpace = FALSE;
$dom->loadHTML($content);
$form = $dom->getElementsByTagName('form');
print_r($form);
I suggest using DOMXPath instead of getElementsByTagName because it allows you to select attribute values directly and returns a DOMNodeList object just like getElementsByTagName. The @ in @action indicates that we're selecting an attribute.
$doc = new DOMDocument();
$doc->loadHTML($content);
$xpath = new DomXPath($doc);
$action = $xpath->query('//form/#action')->item(0);
var_dump($action);
Similarly, to get the first input:
$input = $xpath->query('//form/input')->item(0);
To get all input fields, run the query once and iterate over the result:
foreach ($xpath->query('//form/input') as $input) {
    var_dump($input);
}
If you're not familiar with XPath, I recommend viewing these examples.
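Putting the pieces together for the original question (action, labels and input names), a sketch might look like the following. The form markup is invented for the example, and the queries use .// relative to the form node so that inputs nested deeper than direct children are also found.

```php
<?php
// Made-up form for illustration.
$content = '<form action="/login" method="post">'
         . '<label for="user">Username</label><input type="text" id="user" name="username">'
         . '<label for="pass">Password</label><input type="password" id="pass" name="password">'
         . '</form>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($content);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Take the first form and use it as the context node for the other queries.
$form = $xpath->query('//form')->item(0);
$action = $form->getAttribute('action');

$labels = [];
foreach ($xpath->query('.//label', $form) as $label) {
    $labels[] = trim($label->textContent);
}

$names = [];
foreach ($xpath->query('.//input[@name]', $form) as $input) {
    $names[] = $input->getAttribute('name');
}

print_r(['action' => $action, 'labels' => $labels, 'names' => $names]);
```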
I am trying to retrieve content from a p element in this page. As you can see, the source code contains a paragraph with the content I want:
<p id="qb"><!--
QBlastInfoBegin
Status=READY
QBlastInfoEnd
--></p>
Actually, I want to get the value of Status.
Here is my PHP code.
@$dom->loadHTML($ncbi->ncbi_request($params));
$XPath = new DOMXpath($dom);
$nodes = $XPath->query('//p[@id="qb"]');
$node = $nodes->item(0)->nodeValue;
var_dump($node);
that returns
["nodeValue"]=> string(0) ""
Any idea ?
Thanks!
Seems that to get comment values you need to use //comment()
I'm not too familiar with XPaths so am not too sure on the exact syntax
Sources: https://stackoverflow.com/a/7548089/723139 / https://stackoverflow.com/a/1987555/723139
Update: with working code
<?php
$data = file_get_contents('http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?RID=UY5PPBRH014&CMD=Get');
$dom = new DOMDocument();
@$dom->loadHTML($data);
$XPath = new DOMXpath($dom);
$nodes = $XPath->query('//p[@id="qb"]/comment()');
foreach ($nodes as $comment)
{
var_dump($comment->textContent);
}
I checked the site, and it seems you are after the comment inside the paragraph, so you need to add comment() to your XPath query. Consider this example:
$contents = file_get_contents('http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?RID=UY5PPBRH014&CMD=Get');
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($contents);
libxml_clear_errors();
$xpath = new DOMXpath($dom);
$comment = $xpath->query('//p[@id="qb"]/comment()')->item(0)->nodeValue;
echo '<pre>';
print_r($comment);
Outputs:
QBlastInfoBegin
Status=READY
QBlastInfoEnd
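Since the question asks for the Status value specifically, the comment text still has to be parsed. One possible sketch, using the markup from the question inline instead of the live request and a regular expression that is an assumption about the line format (Status=VALUE on its own line):

```php
<?php
// Markup copied from the question, so no network request is needed.
$html = '<p id="qb"><!--
QBlastInfoBegin
    Status=READY
QBlastInfoEnd
--></p>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// The comment node is a child of the paragraph.
$comment = $xpath->query('//p[@id="qb"]/comment()')->item(0);

// Look for a line of the form "Status=VALUE" inside the comment text.
$status = null;
if ($comment !== null && preg_match('/^\s*Status=(\S+)/m', $comment->nodeValue, $m)) {
    $status = $m[1];
}

echo $status, PHP_EOL;
```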
I am trying to extract a complete table including the HTML tags, with XPath, that I can store in a variable, do a bit of string replacement on, then echo directly to the screen. I have found numerous posts on getting the text out of the table but I want to retain the HTML formatting since I am just going to display it (after minor modification).
At present I am extracting the table using string functions stristr, substr etc. but I would prefer to use XPath.
I can display the contents of the table with the following but it just displays the table TD fields with no formatting. It also does not store it in a variable that I can manipulate.
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$arr = $xpath->query('//table');
foreach ($arr as $el) {
    echo $el->textContent;
}
I tried this but got no output:
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$arr = $xpath->query('//table');
echo $arr->saveHTML();
Use DOMNode::C14N():
foreach ($arr as $el) {
    echo $el->C14N();
}
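An alternative worth considering is DOMDocument::saveHTML() with a node argument. The attempt in the question failed because saveHTML() is a method of the document, not of the DOMNodeList returned by query(). The sketch below uses a made-up table and stores the markup in a variable so it can be modified before output.

```php
<?php
// Made-up table for illustration.
$html = '<div><table><tr><td>Cell A</td><td>Cell B</td></tr></table></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$tables = [];
foreach ($xpath->query('//table') as $el) {
    // saveHTML() is called on the document and given the node to serialize.
    $tables[] = $dom->saveHTML($el);
}

// The markup is now an ordinary string, so string functions apply.
$output = str_replace('Cell A', 'Cell 1', $tables[0]);
echo $output, PHP_EOL;
```

Compared with C14N(), which produces canonical XML, saveHTML() serializes using HTML rules, which may matter for void elements such as br or img.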