Extract http-equiv content with PHP

I'm trying to extract all meta http-equiv properties from a URL.
Here is the code:
function fetch_http_equiv($url)
{
    $data = file_get_contents($url);
    $dom = new DOMDocument;
    // @ suppresses the warnings libxml raises on malformed real-world HTML
    @$dom->loadHTML($data);
    $xpath = new DOMXPath($dom);
    // select every <meta> element that carries an http-equiv attribute
    $metas = $xpath->query('//meta[@http-equiv]');
    $http_equiv = array();
    foreach ($metas as $meta) {
        $property = $meta->getAttribute('http-equiv');
        $content = $meta->getAttribute('content');
        $http_equiv[$property] = $content;
    }
    return $http_equiv;
}
// fetch the meta http-equiv values
$http_equiv = fetch_http_equiv($link);
// use Content-Language only if it is present and non-empty
if (!empty($http_equiv['Content-Language'])) {
    $meta_content_language = $http_equiv['Content-Language'];
}
As far as I can tell this should work. What did I miss?
Edit:
I found the problem. I changed
$property = $meta->getAttribute('http_equiv');
to
$property = $meta->getAttribute('http-equiv');
Problem solved.

Code works now.
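For reference, here is a minimal sketch of the same extraction run against an inline HTML string instead of a remote URL, so it can be tried offline. The sample markup and the helper name extract_http_equiv are illustrative assumptions, not part of the original code. Keys are lowercased here on the assumption that pages may spell the same http-equiv value with different capitalization.

```php
<?php
// Hypothetical helper: collects every <meta http-equiv="..."> pair from an HTML string.
function extract_http_equiv(string $html): array
{
    $dom = new DOMDocument();
    // Real-world HTML is rarely valid; @ silences libxml parse warnings.
    @$dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    $result = [];
    // Select only <meta> elements that carry an http-equiv attribute.
    foreach ($xpath->query('//meta[@http-equiv]') as $meta) {
        $key = strtolower($meta->getAttribute('http-equiv'));
        $result[$key] = $meta->getAttribute('content');
    }
    return $result;
}

// Sample document (assumed for illustration only).
$html = '<html><head>'
      . '<meta http-equiv="Content-Language" content="en">'
      . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
      . '<meta name="description" content="ignored">'
      . '</head><body></body></html>';

$pairs = extract_http_equiv($html);
echo $pairs['content-language'] ?? 'not set', "\n";
```

With lowercased keys the lookup no longer depends on how a particular page capitalizes Content-Language.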

Related

How to get child nodes from an xml url?

I have this link: https://www.ncbi.nlm.nih.gov/gene/7128?report=xml&format=text. I am trying to write code that gets the Interactions and GeneOntology entries found under Gene-commentary_heading in that document. My code only works when there are 2 or 3 nodes, but in this case there are at least 6. Could someone help me?
Below is an example of the information I am looking for (it is too much to show in full, so this is only a part):
<Gene-commentary_heading>GeneOntology</Gene-commentary_heading>
<Gene-commentary_source>
<Other-source>
<Other-source_pre-text>Provided by</Other-source_pre-text>
<Other-source_anchor>GOA</Other-source_anchor>
<Other-source_url>http://www.ebi.ac.uk/GOA/</Other-source_url>
</Other-source>
</Gene-commentary_source>
<Gene-commentary_comment>
<Gene-commentary>
<Gene-commentary_type value="comment">254</Gene-commentary_type>
<Gene-commentary_label>Function</Gene-commentary_label>
<Gene-commentary_comment>
<Gene-commentary>
<Gene-commentary_type value="comment">254</Gene-commentary_type>
<Gene-commentary_source>
<Other-source>
<Other-source_src>
<Dbtag>
<Dbtag_db>GO</Dbtag_db>
<Dbtag_tag>
<Object-id>
<Object-id_id>3677</Object-id_id>
</Object-id>
</Dbtag_tag>
...
$url = "https://www.ncbi.nlm.nih.gov/gene/7128?report=xml&format=text";
$document_xml = new DOMDocument();
$document_xml->loadXML($url);
$elements = $url->getElementsByTagName('Gene-commentary_heading');
echo $elements;
foreach($element as $node) {
$GO = $node -> getElementsByTagName('GeneOntology');
$Int = $node->getElementsByTagName('Interactions');
}
My answer
$esearch_test = "https://www.ncbi.nlm.nih.gov/gene/7128?report=xml&format=text";
$result = file_get_contents($esearch_test);
// load the fetched XML string straight into DOMDocument; no SimpleXML step is needed
$doc = new DOMDocument();
$doc->loadXML($result);
$c = 1;
// print every heading with a running index
foreach ($doc->getElementsByTagName('Gene-commentary_heading') as $node) {
    echo "$c: " . $node->textContent . "\n";
    $c++;
}
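Listing the headings is only the first step. To reach the data that belongs to one heading, an XPath query can match the heading by its text and then step to its sibling nodes. The sketch below assumes a trimmed sample modelled on the fragment in the question; the Entrezgene wrapper element and the exact nesting are assumptions, so the path may need adjusting for the real document.

```php
<?php
// Trimmed sample modelled on the fragment in the question (structure is assumed).
$xmlString = <<<XML
<Entrezgene>
  <Gene-commentary>
    <Gene-commentary_heading>GeneOntology</Gene-commentary_heading>
    <Gene-commentary_comment>
      <Gene-commentary>
        <Gene-commentary_label>Function</Gene-commentary_label>
      </Gene-commentary>
      <Gene-commentary>
        <Gene-commentary_label>Process</Gene-commentary_label>
      </Gene-commentary>
    </Gene-commentary_comment>
  </Gene-commentary>
  <Gene-commentary>
    <Gene-commentary_heading>Interactions</Gene-commentary_heading>
    <Gene-commentary_comment/>
  </Gene-commentary>
</Entrezgene>
XML;

$doc = new DOMDocument();
$doc->loadXML($xmlString);
$xpath = new DOMXPath($doc);

$labels = [];
// Match the heading by its text, then step to the labels nested in the sibling comment block.
$query = '//Gene-commentary_heading[.="GeneOntology"]'
       . '/following-sibling::Gene-commentary_comment/Gene-commentary/Gene-commentary_label';
foreach ($xpath->query($query) as $label) {
    $labels[] = $label->textContent;
}
print_r($labels);
```

Swapping "GeneOntology" for "Interactions" in the predicate selects the other block the same way, however many nodes it contains.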

How to create looped XML file from HTML in PHP?

I would like to create an XML file from some of the content of an HTML page. I have tried hard but seem to be missing something.
I have created two arrays, set up a DOMDocument, and prepared to save an XML file on the server. I have tried many different foreach loops in various places, but none of them work.
Here is my code:
<?php
$page = file_get_contents('http://www.halfmen.dk/!hmhb8/score.php');
$doc = new DOMDocument();
$doc->loadHTML($page);
$score = $doc->getElementsByTagName('div');
$keyarray = array();
$teamarray = array();
foreach ($score as $value) {
if ($value->getAttribute('class') == 'xml') {
$keyarray[] = $value->firstChild->nodeValue;
$teamarray[] = $value->firstChild->nextSibling->nodeValue;
}
}
print_r($keyarray);
print_r($teamarray);
$doc = new DOMDocument('1.0','utf-8');
$doc->formatOutput = true;
$droot = $doc->createElement('ROOT');
$droot = $doc->appendChild($droot);
$dsection = $doc->createElement('SECTION');
$dsection = $droot->appendChild($dsection);
$dkey = $doc->createElement('KEY');
$dkey = $dsection->appendChild($dkey);
$dteam = $doc->createElement('TEAM');
$dteam = $dsection->appendChild($dteam);
$dkeytext = $doc->createTextNode($keyarray);
$dkeytext = $dkey->appendChild($dkeytext);
$dteamtext = $doc->createTextNode($teamarray);
$dteamtext = $dteam->appendChild($dteamtext);
echo $doc->save('xml/test.xml');
?>
I really like simplicity, thank you.
You need to add each item in one at a time rather than as an array, which is why I build the XML for each div tag rather than as a second pass. I've had to assume that your XML is structured the way I've done it, but this may help you.
$page = file_get_contents('http://www.halfmen.dk/!hmhb8/score.php');
// source document: the scraped HTML page
$html = new DOMDocument();
$html->loadHTML($page);
$score = $html->getElementsByTagName('div');
// target document: the XML file being built
$doc = new DOMDocument('1.0', 'utf-8');
$doc->formatOutput = true;
$droot = $doc->createElement('ROOT');
$droot = $doc->appendChild($droot);
foreach ($score as $value) {
    if ($value->getAttribute('class') == 'xml') {
        // one SECTION per matching div, each with its own KEY and TEAM
        $dsection = $doc->createElement('SECTION');
        $dsection = $droot->appendChild($dsection);
        $dkey = $doc->createElement('KEY', $value->firstChild->nodeValue);
        $dkey = $dsection->appendChild($dkey);
        $dteam = $doc->createElement('TEAM', $value->firstChild->nextSibling->nodeValue);
        $dteam = $dsection->appendChild($dteam);
    }
}
// save() returns the number of bytes written
echo $doc->save('xml/test.xml');
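One detail worth checking with this approach: the PHP manual notes that the value argument of createElement is not escaped, so text containing an ampersand can cause trouble. A hedged variant, sketched below, appends a text node instead. The inline HTML is an assumed stand-in for the real score page, whose markup is not shown here.

```php
<?php
// Sample HTML (assumed): each div.xml holds a key span followed by a team span.
$page = '<html><body>'
      . '<div class="xml"><span>1</span><span>Rock &amp; Roll FC</span></div>'
      . '<div class="other"><span>x</span><span>y</span></div>'
      . '<div class="xml"><span>2</span><span>Half Men</span></div>'
      . '</body></html>';

$html = new DOMDocument();
@$html->loadHTML($page);

$xml = new DOMDocument('1.0', 'utf-8');
$xml->formatOutput = true;
$root = $xml->appendChild($xml->createElement('ROOT'));

foreach ($html->getElementsByTagName('div') as $div) {
    if ($div->getAttribute('class') !== 'xml') {
        continue; // skip divs that are not score rows
    }
    $section = $root->appendChild($xml->createElement('SECTION'));
    // createTextNode escapes &, < and > when the document is serialized.
    $key = $section->appendChild($xml->createElement('KEY'));
    $key->appendChild($xml->createTextNode($div->firstChild->nodeValue));
    $team = $section->appendChild($xml->createElement('TEAM'));
    $team->appendChild($xml->createTextNode($div->firstChild->nextSibling->nodeValue));
}

// saveXML() returns the document as a string; save($path) would write it to disk.
$output = $xml->saveXML();
echo $output;
```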

Get Element by ClassName with DOMdocument() Method

Here is what I am trying to achieve: retrieve all products on a page and put them into an array. Here is the code I am using:
$page2 = curl_exec($ch);
$doc = new DOMDocument();
@$doc->loadHTML($page2);
$nodes = $doc->getElementsByTagName('title');
$noders = $doc->getElementsByClassName('productImage');
$title = $nodes->item(0)->nodeValue;
$product = $noders->item(0)->imageObject.src;
It works for $title but not for the product. For reference, the img tag in the HTML looks like this:
<img alt="" class="productImage" data-altimages="" src="xxxx">
I have been looking at this (PHP DOMDocument how to get element?) but I still don't understand how to make it work.
PS: I get this error:
Call to undefined method DOMDocument::getElementsByclassName()
I finally used the following solution:
$classname = "blockProduct";
$finder = new DOMXPath($doc);
// match any element whose class attribute contains the class name
$spaner = $finder->query("//*[contains(@class, '$classname')]");
https://stackoverflow.com/a/31616848/3068233
Linking this answer as it helped me the most with this problem.
function getElementsByClass($parentNode, $tagName, $className) {
    $nodes = array();
    $childNodeList = $parentNode->getElementsByTagName($tagName);
    for ($i = 0; $i < $childNodeList->length; $i++) {
        $temp = $childNodeList->item($i);
        // case-insensitive substring match, so 'a' also matches class="nav"
        if (stripos($temp->getAttribute('class'), $className) !== false) {
            $nodes[] = $temp;
        }
    }
    return $nodes;
}
That's the function, and here's the usage:
$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadHTML($html);
$content_node=$dom->getElementById("content_node");
$div_a_class_nodes=getElementsByClass($content_node, 'div', 'a');
function getElementsByClassName($dom, $ClassName, $tagName = null) {
    // restrict the search to one tag name when given, otherwise scan every element
    if ($tagName) {
        $Elements = $dom->getElementsByTagName($tagName);
    } else {
        $Elements = $dom->getElementsByTagName("*");
    }
    $Matched = array();
    for ($i = 0; $i < $Elements->length; $i++) {
        $class = $Elements->item($i)->attributes->getNamedItem('class');
        // exact match: the class attribute must equal $ClassName as a whole
        if ($class && $class->nodeValue == $ClassName) {
            $Matched[] = $Elements->item($i);
        }
    }
    return $Matched;
}
// usage
$dom = new \DOMDocument('1.0');
@$dom->loadHTML($html);
$elementsByClass = getElementsByClassName($dom, $className, 'h1');
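The approaches above either match substrings of the class attribute or require the whole attribute to equal the class name. A commonly used XPath idiom sits between the two: it pads the class list with spaces so only whole class tokens match. The sketch below applies it to an assumed sample built around the img tag from the question; the file names and the extra images are made up for illustration.

```php
<?php
// Sample markup (assumed); the first img mirrors the tag shown in the question.
$html = '<html><body>'
      . '<img alt="" class="productImage" data-altimages="" src="a.jpg">'
      . '<img class="thumb productImage" src="b.jpg">'
      . '<img class="productImageLarge" src="c.jpg">'
      . '</body></html>';

$doc = new DOMDocument();
@$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$class = 'productImage';
// Pad the class list with spaces so only whole class tokens match,
// i.e. "productImageLarge" is not picked up.
$query = "//img[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]";

$sources = [];
foreach ($xpath->query($query) as $img) {
    // DOM elements expose attributes through getAttribute(), not JavaScript-style properties.
    $sources[] = $img->getAttribute('src');
}
print_r($sources);
```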

How can I get the value of an attribute of an XML node in PHP?

I'm using SimpleXML to read an XML file. So far I'm unable to get the attribute value I'm looking for. This is my code:
if(file_exists($xmlfile)){
$doc = new DOMDocument();
$doc->load($xmlfile);
$usergroup = $doc->getElementsByTagName( "preset" );
foreach($usergroup as $group){
$pname = $group->getElementsByTagName( "name" );
$att = 'code';
$name = $pname->attributes()->$att; //not working
$name = $pname->getAttribute('code'); //not working
if($name==$preset_name){
echo($name);
$group->parentNode->removeChild($group);
}
}
}
and my XML file looks like this:
<presets>
<preset>
<name code="default">Default</name>
<createdBy>named</createdBy>
<icons>somethignhere</icons>
</preset>
</presets>
Try this :
function getByPattern($pattern, $source)
{
$dom = new DOMDocument();
@$dom->loadHTML($source);
$xpath = new DOMXPath($dom);
$result = $xpath->evaluate($pattern);
return $result;
}
And you may use it like (using XPath) :
$data = getByPattern("/regions/testclass1/presets/preset",$xml);
UPDATE
Code :
<?php
$xmlstr = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?><presets><preset><name code=\"default\">Default</name><createdBy>named</createdBy><icons>somethignhere</icons></preset></presets>";
$xml = new SimpleXMLElement($xmlstr);
$result = $xml->xpath("/presets/preset/name");
foreach($result[0]->attributes() as $a => $b) {
echo $a,'="',$b,"\"\n";
}
?>
Output :
code="default"
P.S. Also try accepting answers, as @TJHeuvel mentioned; it's an indication that you respect the community (and the community will be more than happy to help you more next time).
My actual question also involved deleting a node, which I mistakenly left out. So, from my point of view, this is the complete answer, in case someone else finds it useful.
This answer doesn't use the SimpleXMLElement class because, however hard I tried, unset() did not delete the node. So I went back to DOMDocument and finally found an answer. This is my code,
and it's simple!
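if (file_exists($xmlfile)) {
    $doc = new DOMDocument();
    $doc->load($xmlfile);
    // copy the live node list to an array so removals don't skip elements
    $presetgroup = iterator_to_array($doc->getElementsByTagName("preset"));
    foreach ($presetgroup as $group) {
        $pname = $group->getElementsByTagName("name");
        // getElementsByTagName returns a list, so take the first <name> element
        $pcode = $pname->item(0)->getAttribute('code');
        if ($pcode == $preset_name) {
            echo($preset_name);
            $group->parentNode->removeChild($group);
        }
    }
    // save only when the file was actually loaded
    $doc->save($xmlfile);
}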
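As an alternative sketch, an XPath predicate can pick out the matching preset elements directly, which avoids the inner lookup. The inline XML and the variable $presetName below are assumptions for illustration, and the query assumes the name contains no double quotes.

```php
<?php
// Sample data (assumed), shaped like the presets file in the question.
$xmlString = '<presets>'
           . '<preset><name code="default">Default</name></preset>'
           . '<preset><name code="custom">Custom</name></preset>'
           . '<preset><name code="default">Duplicate</name></preset>'
           . '</presets>';
$presetName = 'default';

$doc = new DOMDocument();
$doc->loadXML($xmlString);
$xpath = new DOMXPath($doc);

// Select every <preset> whose <name> child has the wanted code attribute.
// XPath results are a snapshot rather than a live list, so removing while looping is safe.
$matches = $xpath->query('//preset[name/@code="' . $presetName . '"]');
foreach ($matches as $preset) {
    $preset->parentNode->removeChild($preset);
}

$remaining = $doc->getElementsByTagName('preset')->length;
echo $remaining, "\n";
```

In real use the document would come from $doc->load($xmlfile) and be written back with $doc->save($xmlfile), as in the answer above.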

Getting meta title and description

I am having trouble getting the meta description/title from this specific site.
Here is some code:
$file = file('http://www.thegooddrugsguide.com/lsd/index.htm');
$file = implode("",$file);
if (preg_match('/<title>(.*?)<\/title>/is',$file,$t)) $title = $t[1];
It works with other sites, but not with the site in question. What could be the problem?
This should work fine:
$doc = new DOMDocument;
// @ suppresses warnings from malformed HTML
@$doc->loadHTMLFile('http://example.com');
$titles = $doc->getElementsByTagName('title');
$title = $titles->length > 0 ? $titles->item(0)->nodeValue : null;
$metas = $doc->getElementsByTagName('meta');
foreach ($metas as $meta) {
    if (strtolower($meta->getAttribute('name')) == 'description') {
        // the description text lives in the content attribute
        $description = $meta->getAttribute('content');
    }
}
More info: http://www.php.net/manual/en/book.dom.php
Edit: this shorter version can also work to find the description:
$xpath = new DOMXPath($doc);
// string(...) returns the attribute value itself rather than a node list
$description = $xpath->evaluate('string(//meta[@name="description"]/@content)');
$url = "http://www.thegooddrugsguide.com/lsd/index.htm";
$tags = get_meta_tags($url);
$description = $tags["description"];
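To try the DOM-based approach without a network request, the sketch below runs it on an inline HTML string. The sample markup is an assumption; the translate() call is there because some pages write the attribute as name="Description".

```php
<?php
// Sample page (assumed for illustration).
$html = '<html><head><title>Sample page</title>'
      . '<meta name="Description" content="A short summary.">'
      . '</head><body></body></html>';

$doc = new DOMDocument();
@$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// string(...) makes evaluate() return a plain string ('' when nothing matches).
$title = $xpath->evaluate('string(//title)');
// translate() lowercases the name attribute so "Description" also matches.
$description = $xpath->evaluate(
    'string(//meta[translate(@name, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz")="description"]/@content)'
);
echo $title, "\n", $description, "\n";
```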
