How to get child nodes from an xml url? - php

I got this link https://www.ncbi.nlm.nih.gov/gene/7128?report=xml&format=text. I am trying to write a code that gets Interactions and GeneOntology within Gene-commentary_heading from the link. I only succeed using this code when there are the 2 or 3 nodes but in this case there are at least 6 nodes or more. Could someone help me?
Bellow is the example of the information I am looking for (it's to much to visualise so I just showed a part)
<Gene-commentary_heading>GeneOntology</Gene-commentary_heading>
<Gene-commentary_source>
<Other-source>
<Other-source_pre-text>Provided by</Other-source_pre-text>
<Other-source_anchor>GOA</Other-source_anchor>
<Other-source_url>http://www.ebi.ac.uk/GOA/</Other-source_url>
</Other-source>
</Gene-commentary_source>
<Gene-commentary_comment>
<Gene-commentary>
<Gene-commentary_type value="comment">254</Gene-commentary_type>
<Gene-commentary_label>Function</Gene-commentary_label>
<Gene-commentary_comment>
<Gene-commentary>
<Gene-commentary_type value="comment">254</Gene-commentary_type>
<Gene-commentary_source>
<Other-source>
<Other-source_src>
<Dbtag>
<Dbtag_db>GO</Dbtag_db>
<Dbtag_tag>
<Object-id>
<Object-id_id>3677</Object-id_id>
</Object-id>
</Dbtag_tag>
...
`$url = "https://www.ncbi.nlm.nih.gov/gene/7128?report=xml&format=text";
$document_xml = new DOMDocument();
$document_xml->loadXML($url);
$elements = $url->getElementsByTagName('Gene-commentary_heading');
echo $elements;
foreach($element as $node) {
$GO = $node -> getElementsByTagName('GeneOntology');
$Int = $node->getElementsByTagName('Interactions');
}

My answer
$esearch_test = "https://www.ncbi.nlm.nih.gov/gene/7128?report=xml&format=text";
$result = file_get_contents($esearch_test);
$xml = simplexml_load_string($result);
$doc = new DOMDocument();
$doc = DOMDocument::loadXML($xml);
$c = 1;
foreach($doc->getElementsByTagName('Gene-commentary_heading') as $node) {
echo "$c: ".$node->textContent."\n";
$c++;
}

Related

Xpath nodeValue/textContent unable to see <BR> tag

HTML is as follows:
ABC<BR>DEF
However, both nodeValue and textContent attributes show "ABCDEF" as the value.
Any way to show or parse the <BR>?
Maybe this'll help you: DOMNode::C14N
It'll return the HTML of the node.
<?php
$a = 'ABC<BR>DEF';
$doc = new DOMDocument();
#$doc->loadHTML($a);
$finder = new DomXPath($doc);
$nodes = $finder->query("//a");
foreach ($nodes as $node) {
var_dump($node->c14n());
}
Demo
I know you have already solved your problem, but I wanted to add a more direct way of solving it...
$a = 'ABC<BR>DEF';
$doc = new DOMDocument();
$doc->loadHTML($a);
$xp = new DomXPath($doc);
$nodes = $xp->query("//a/node()");
$text = '';
foreach ($nodes as $node) {
$text .= $doc->saveHTML($node);
}
echo $text;
Outputs...
ABC<br>DEF

Why the query doesn't match the DOM?

Here is my code:
$res = file_get_contents("http://www.lenzor.com/photo/search/index/type/user/%D8%B9%D9%84%DB%8C//text/%D9%81%D8%A7%D8%B7%D9%85%D9%87");
$doc = new \DOMDocument();
#$doc->loadHTMLFile($res);
$xpath = new \DOMXpath($doc);
$links = $xpath->query("//ul[#class='user_box']/li");
$result = array();
if (!is_null($links)) {
foreach ($links as $link) {
$href = $link->getAttribute('class');
$result[] = [$href];
}
}
print_r($result);
Here is the content I'm working on. I mean it's the result of echo $res.
Ok well, the result of my code is an empty array. So $links is empty and that foreach won't be executed. Why? Why //ul[#class='user_box']/li query doesn't match the DOM ?
Expected result is an array contains the class attribute of lis.
Try this, Hope this will be helpful. There are few mistakes in your code.
1. You should search like this '//ul[#class="user_box clearfix"]/li' because class="user_box clearfix" class attribute of that HTML source contains two classes.
2. You should use loadHTMLinstead of loadHTMLFile.
<?php
ini_set('display_errors', 1);
libxml_use_internal_errors(true);
$res = file_get_contents("http://www.lenzor.com/photo/search/index/type/user/%D8%B9%D9%84%DB%8C//text/%D9%81%D8%A7%D8%B7%D9%85%D9%87");
$doc = new \DOMDocument();
$doc->loadHTML($res);
$xpath = new \DOMXpath($doc);
$links = $xpath->query('//ul[#class="user_box clearfix"]/li');
$result = array();
if (!is_null($links)) {
foreach ($links as $link) {
$href = $link->getAttribute('class');
$result[] = [$href];
}
}
print_r($result);

DOM Parser grabbing href of <a> tag by class="Decision"

I'm working with a DOM parser and I'm having issues. I'm basically trying to grab the href within the tag that only contain the class ID of 'thumbnail '. I've been trying to print the links on the screen and still get no results. Any help is appreciated. I also turned on error_reporting(E_ALL); and still nothing.
$html = file_get_contents('http://www.reddit.com/r/funny');
$dom = new DOMDocument();
#$dom->loadHTML($html);
$classId = "thumbnail ";
$div = $html->find('a#'.$classId);
echo $div;
I also tried this but still had the same result of NOTHING:
include('simple_html_dom.php');
$html = file_get_contents('http://www.reddit.com/r/funny');
$dom = new DOMDocument();
#$dom->loadHTML($html);
// grab all the on the page
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a");
$ret = $html->find('a[class=thumbnail]');
echo $ret;
You were almost there:
<?php
$dom = new DOMDocument();
#$dom->loadHTMLFile('http://www.reddit.com/r/funny');
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a[contains(concat(' ',normalize-space(#class),' '),' thumbnail ')]");
var_dump($hrefs);
Gives:
class DOMNodeList#28 (1) {
public $length =>
int(25)
}
25 matches, I'd call it success.
This code would probably work:
$html = file_get_contents('http://www.reddit.com/r/funny');
$dom = new DOMDocument();
#$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$hyperlinks = $xpath->query('//a[#class="thumbnail"]');
foreach($hyperlinks as $hyperlink) {
echo $hyperlink->getAttribute('href'), '<br>;'
}
if you're using simple_html_dom, why are you doing all these superfluous things? It already wraps the resource in everything you need -- http://simplehtmldom.sourceforge.net/manual.htm
include('simple_html_dom.php');
// set up:
$html = new simple_html_dom();
// load from URL:
$html->load_file('http://www.reddit.com/r/funny');
// find those <a> elements:
$links = $html->find('a[class=thumbnail]');
// done.
echo $links;
Tested it and made some changes - this works perfect too.
<?php
// load the url and set up an array for the links
$dom = new DOMDocument();
#$dom->loadHTMLFile('http://www.reddit.com/r/funny');
$links = array();
// loop thru all the A elements found
foreach($dom->getElementsByTagName('a') as $link) {
$url = $link->getAttribute('href');
$class = $link->getAttribute('class');
// Check if the URL is not empty and if the class contains thumbnail
if(!empty($url) && strpos($class,'thumbnail') !== false) {
array_push($links, $url);
}
}
// Print results
print_r($links);
?>

How can i get the value of attribute in of a xml node in php?

I'm using simplexml to read a xml file. So far i'm unable to get the attribute value i'm looking for. this is my code.
if(file_exists($xmlfile)){
$doc = new DOMDocument();
$doc->load($xmlfile);
$usergroup = $doc->getElementsByTagName( "preset" );
foreach($usergroup as $group){
$pname = $group->getElementsByTagName( "name" );
$att = 'code';
$name = $pname->attributes()->$att; //not working
$name = $pname->getAttribute('code'); //not working
if($name==$preset_name){
echo($name);
$group->parentNode->removeChild($group);
}
}
}
and my xml file looks like
<presets>
<preset>
<name code="default">Default</name>
<createdBy>named</createdBy>
<icons>somethignhere</icons>
</preset>
</presets>
Try this :
function getByPattern($pattern, $source)
{
$dom = new DOMDocument();
#$dom->loadHTML($source);
$xpath = new DOMXPath($dom);
$result = $xpath->evaluate($pattern);
return $result;
}
And you may use it like (using XPath) :
$data = getByPattern("/regions/testclass1/presets/preset",$xml);
UPDATE
Code :
<?php
$xmlstr = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?><presets><preset><name code=\"default\">Default</name><createdBy>named</createdBy><icons>somethignhere</icons></preset></presets>";
$xml = new SimpleXMLElement($xmlstr);
$result = $xml->xpath("/presets/preset/name");
foreach($result[0]->attributes() as $a => $b) {
echo $a,'="',$b,"\"\n";
}
?>
Output :
code="default"
P.S. And also try accepting answers as #TJHeuvel mentioned; it's an indication that you respect the community (and the community will be more than happy to help you more, next time...)
Actually question in my head includes deleting a node as well , mistakenly i could not add it. So in my point of view this is the complete answer, i a case if someone else find this useful.
This answer doesn't include SimpleXMLElement class because how hard i tried it didn't delete the node with unset(); . So back to where i was , i finally found an answer. This is my code.
and its Simple!!!
if(file_exists($xmlfile)){
$doc = new DOMDocument();
$doc->load($xmlfile);
$presetgroup = $doc->getElementsByTagName( "preset" );
foreach($presetgroup as $group){
$pname = $group->getElementsByTagName( "name" );
$pcode = $pname->item(0)->getAttribute('code');
if($pcode==$preset_name){
echo($preset_name);
$group->parentNode->removeChild($group);
}
}
}
$doc->save($xmlfile);

Simple HTML DOM gets only 1 element

I'm following a simplified version of the scraping tutorial by NetTuts here, which basically finds all divs with class=preview
http://net.tutsplus.com/tutorials/php/html-parsing-and-screen-scraping-with-the-simple-html-dom-library/comment-page-1/#comments
This is my code. The problem is that when I count $items I get only 1, so it's getting only the first div with class=preview, not all of them.
$articles = array();
$html = new simple_html_dom();
$html->load_file('http://net.tutsplus.com/page/76/');
$items = $html->find('div[class=preview]');
echo "count: " . count($items);
Try using DOMDocument and DOMXPath:
$file = file_get_contents('http://net.tutsplus.com/page/76/');
$dom = new DOMDocument();
#$dom->loadHTML($file);
$domx = new DOMXPath($dom);
$nodelist = $domx->evaluate("//div[#class='preview']");
foreach ($nodelist as $node) { print $node->nodeValue; }

Categories