Extract http-equiv content with php

I'm trying to extract all meta http-equiv properties from a URL.
Here is the code:
function fetch_http_equiv($url)
{
    $data = file_get_contents($url);
    $dom = new DOMDocument;
    // @ suppresses the warnings that malformed real-world HTML triggers
    @$dom->loadHTML($data);
    $xpath = new DOMXPath($dom);
    // select every <meta> element that has an http-equiv attribute
    $metas = $xpath->query('//meta[@http-equiv]');
    $http_equiv = array();
    foreach ($metas as $meta) {
        $property = $meta->getAttribute('http-equiv');
        $content = $meta->getAttribute('content');
        $http_equiv[$property] = $content;
    }
    return $http_equiv;
}
// fetch the meta http-equiv values
$http_equiv = fetch_http_equiv($link);
// use Content-Language only if it exists and is not empty
if (!empty($http_equiv['Content-Language'])) {
    $meta_content_language = $http_equiv['Content-Language'];
}
As far as I can tell this should work. What did I miss?
Edit:
I found the problem. I changed
$property = $meta->getAttribute('http_equiv');
to
$property = $meta->getAttribute('http-equiv');
Case solved.

Code works now.
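For anyone who wants to try this without fetching a live page, here is a minimal sketch of the same idea run against an inline HTML string. The helper name extract_http_equiv and the sample markup are made up for illustration, and lower-casing the keys is my own choice so that lookups do not depend on how a page capitalises the attribute value.

```php
<?php
// Hypothetical helper: collects <meta http-equiv="..."> pairs from an HTML string.
function extract_http_equiv(string $html): array
{
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $result = [];
    foreach ($xpath->query('//meta[@http-equiv]') as $meta) {
        // Lower-case the key so "Content-Language" and "content-language" match.
        $key = strtolower($meta->getAttribute('http-equiv'));
        $result[$key] = $meta->getAttribute('content');
    }
    return $result;
}

// Sample markup (assumption, not taken from any real site).
$html = '<html><head>'
      . '<meta http-equiv="Content-Language" content="en">'
      . '<meta http-equiv="refresh" content="30">'
      . '</head><body></body></html>';

$found = extract_http_equiv($html);
echo $found['content-language'] ?? 'not set', "\n";
```

With this sample the lookup yields "en". A page without the tag falls through to 'not set' instead of raising a notice.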

Related

How to get child nodes from an xml url?

I have this link: https://www.ncbi.nlm.nih.gov/gene/7128?report=xml&format=text. I am trying to write code that gets the Interactions and GeneOntology sections, identified by Gene-commentary_heading, from that XML. My code only works when there are 2 or 3 nodes, but in this case there are at least 6. Could someone help me?
Below is an example of the information I am looking for (the full output is too much to show, so this is only a part):
<Gene-commentary_heading>GeneOntology</Gene-commentary_heading>
<Gene-commentary_source>
<Other-source>
<Other-source_pre-text>Provided by</Other-source_pre-text>
<Other-source_anchor>GOA</Other-source_anchor>
<Other-source_url>http://www.ebi.ac.uk/GOA/</Other-source_url>
</Other-source>
</Gene-commentary_source>
<Gene-commentary_comment>
<Gene-commentary>
<Gene-commentary_type value="comment">254</Gene-commentary_type>
<Gene-commentary_label>Function</Gene-commentary_label>
<Gene-commentary_comment>
<Gene-commentary>
<Gene-commentary_type value="comment">254</Gene-commentary_type>
<Gene-commentary_source>
<Other-source>
<Other-source_src>
<Dbtag>
<Dbtag_db>GO</Dbtag_db>
<Dbtag_tag>
<Object-id>
<Object-id_id>3677</Object-id_id>
</Object-id>
</Dbtag_tag>
...
$url = "https://www.ncbi.nlm.nih.gov/gene/7128?report=xml&format=text";
$document_xml = new DOMDocument();
$document_xml->loadXML($url);
$elements = $url->getElementsByTagName('Gene-commentary_heading');
echo $elements;
foreach($element as $node) {
$GO = $node -> getElementsByTagName('GeneOntology');
$Int = $node->getElementsByTagName('Interactions');
}

My answer:
$esearch_test = "https://www.ncbi.nlm.nih.gov/gene/7128?report=xml&format=text";
// download the XML as a string
$result = file_get_contents($esearch_test);
$doc = new DOMDocument();
// loadXML() is an instance method and expects the XML string itself
$doc->loadXML($result);
$c = 1;
// print every heading, numbered
foreach ($doc->getElementsByTagName('Gene-commentary_heading') as $node) {
    echo "$c: ".$node->textContent."\n";
    $c++;
}
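To pick out only the GeneOntology and Interactions sections instead of listing every heading, an XPath query can select the parent of each matching heading. This is only a sketch on a small inline sample. The root element name and the assumption that each heading sits directly inside a Gene-commentary element are mine, so check them against the real response.

```php
<?php
// Inline sample that mimics the structure shown in the question (assumption:
// each heading is a direct child of a <Gene-commentary> element).
$xml = <<<XML
<Entrezgene>
  <Gene-commentary>
    <Gene-commentary_heading>GeneOntology</Gene-commentary_heading>
    <Gene-commentary_comment>
      <Gene-commentary>
        <Gene-commentary_label>Function</Gene-commentary_label>
      </Gene-commentary>
    </Gene-commentary_comment>
  </Gene-commentary>
  <Gene-commentary>
    <Gene-commentary_heading>Interactions</Gene-commentary_heading>
  </Gene-commentary>
  <Gene-commentary>
    <Gene-commentary_heading>Pathways</Gene-commentary_heading>
  </Gene-commentary>
</Entrezgene>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Select the parent element of every heading whose text is one of the wanted values.
$query = '//Gene-commentary_heading[. = "GeneOntology" or . = "Interactions"]/..';
$sections = [];
foreach ($xpath->query($query) as $section) {
    // Read the heading text relative to the matched section node.
    $heading = $xpath->evaluate('string(Gene-commentary_heading)', $section);
    $sections[$heading] = $section;
}

echo implode(', ', array_keys($sections)), "\n";
```

Each entry in $sections is a DOMElement, so further relative queries (for example for Gene-commentary_label) can be run against it, however deep the nesting goes.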

How to create looped XML file from HTML in PHP?

I would like to create an XML file from some of the content of an HTML page. I have tried hard but seem to be missing something.
I have created two arrays, set up a DOMDocument and prepared to save an XML file on the server. I have tried lots of different foreach loops in different places, but none of them work.
Here is my code:
<?php
$page = file_get_contents('http://www.halfmen.dk/!hmhb8/score.php');
$doc = new DOMDocument();
$doc->loadHTML($page);
$score = $doc->getElementsByTagName('div');
$keyarray = array();
$teamarray = array();
foreach ($score as $value) {
if ($value->getAttribute('class') == 'xml') {
$keyarray[] = $value->firstChild->nodeValue;
$teamarray[] = $value->firstChild->nextSibling->nodeValue;
}
}
print_r($keyarray);
print_r($teamarray);
$doc = new DOMDocument('1.0','utf-8');
$doc->formatOutput = true;
$droot = $doc->createElement('ROOT');
$droot = $doc->appendChild($droot);
$dsection = $doc->createElement('SECTION');
$dsection = $droot->appendChild($dsection);
$dkey = $doc->createElement('KEY');
$dkey = $dsection->appendChild($dkey);
$dteam = $doc->createElement('TEAM');
$dteam = $dsection->appendChild($dteam);
$dkeytext = $doc->createTextNode($keyarray);
$dkeytext = $dkey->appendChild($dkeytext);
$dteamtext = $doc->createTextNode($teamarray);
$dteamtext = $dteam->appendChild($dteamtext);
echo $doc->save('xml/test.xml');
?>
I really like simplicity, thank you.

You need to add each item one at a time rather than passing a whole array, which is why I build the XML for each div tag as it is found rather than in a second pass. I've had to assume your XML is structured the way I've done it, but this may help you.
$page = file_get_contents('http://www.halfmen.dk/!hmhb8/score.php');
$doc = new DOMDocument();
$doc->loadHTML($page);
$score = $doc->getElementsByTagName('div');
// start a new document for the XML output
$doc = new DOMDocument('1.0','utf-8');
$doc->formatOutput = true;
$droot = $doc->createElement('ROOT');
$droot = $doc->appendChild($droot);
foreach ($score as $value) {
    if ($value->getAttribute('class') == 'xml') {
        // one SECTION per matching div
        $dsection = $doc->createElement('SECTION');
        $dsection = $droot->appendChild($dsection);
        $dkey = $doc->createElement('KEY', $value->firstChild->nodeValue);
        $dkey = $dsection->appendChild($dkey);
        $dteam = $doc->createElement('TEAM', $value->firstChild->nextSibling->nodeValue);
        $dteam = $dsection->appendChild($dteam);
    }
}
// write the result to disk, as in the question
echo $doc->save('xml/test.xml');
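One caveat with passing the text as the second argument of createElement() is that the value is not escaped, so a team name containing "&" can produce a warning. A sketch of the same loop using createTextNode() follows. The $rows array is stand-in data I made up in place of the scraped values.

```php
<?php
// Sample data standing in for the values scraped from the page (assumption).
$rows = [
    ['key' => '1', 'team' => 'Tom & Jerry'],
    ['key' => '2', 'team' => 'Half Men'],
];

$doc = new DOMDocument('1.0', 'utf-8');
$doc->formatOutput = true;
$root = $doc->appendChild($doc->createElement('ROOT'));

foreach ($rows as $row) {
    // One SECTION element per row.
    $section = $root->appendChild($doc->createElement('SECTION'));
    foreach (['KEY' => $row['key'], 'TEAM' => $row['team']] as $tag => $text) {
        $element = $section->appendChild($doc->createElement($tag));
        // Text nodes are escaped on serialization, so "&" becomes "&amp;".
        $element->appendChild($doc->createTextNode($text));
    }
}

$xml = $doc->saveXML();
echo $xml;
```

saveXML() returns the document as a string. Swapping it for save('xml/test.xml') writes the same content to a file.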

Get Element by ClassName with DOMdocument() Method

Here is what I am trying to achieve: retrieve all products on a page and put them into an array. Here is the code I am using:
$page2 = curl_exec($ch);
$doc = new DOMDocument();
@$doc->loadHTML($page2);
$nodes = $doc->getElementsByTagName('title');
$noders = $doc->getElementsByClassName('productImage');
$title = $nodes->item(0)->nodeValue;
$product = $noders->item(0)->imageObject.src;
It works for $title but not for the product. For reference, the img tag in the HTML looks like this:
<img alt="" class="productImage" data-altimages="" src="xxxx">
I have been looking at this (PHP DOMDocument how to get element?) but I still don't understand how to make it work.
PS: I get this error:
Call to undefined method DOMDocument::getElementsByclassName()

I finally used the following solution:
$classname="blockProduct";
$finder = new DomXPath($doc);
$spaner = $finder->query("//*[contains(@class, '$classname')]");

https://stackoverflow.com/a/31616848/3068233
I'm linking this answer because it helped me the most with this problem.
function getElementsByClass(&$parentNode, $tagName, $className) {
$nodes=array();
$childNodeList = $parentNode->getElementsByTagName($tagName);
for ($i = 0; $i < $childNodeList->length; $i++) {
$temp = $childNodeList->item($i);
if (stripos($temp->getAttribute('class'), $className) !== false) {
$nodes[]=$temp;
}
}
return $nodes;
}
That's the code, and here's the usage:
$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadHTML($html);
$content_node=$dom->getElementById("content_node");
$div_a_class_nodes=getElementsByClass($content_node, 'div', 'a');

function getElementsByClassName($dom, $ClassName, $tagName=null) {
if($tagName){
$Elements = $dom->getElementsByTagName($tagName);
}else {
$Elements = $dom->getElementsByTagName("*");
}
$Matched = array();
for($i=0;$i<$Elements->length;$i++) {
if($Elements->item($i)->attributes->getNamedItem('class')){
if($Elements->item($i)->attributes->getNamedItem('class')->nodeValue == $ClassName) {
$Matched[]=$Elements->item($i);
}
}
}
return $Matched;
}
// usage
$dom = new \DOMDocument('1.0');
@$dom->loadHTML($html);
$elementsByClass = getElementsByClassName($dom, $className, 'h1');
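The answers above differ in how strictly they match: contains() and stripos() also accept longer class names that merely include the text, while the exact comparison misses elements that carry several classes. A common XPath idiom pads the class list with spaces and matches a whole token. Below is a sketch of that idiom. The helper name find_by_class and the sample markup are hypothetical, and it assumes the class name contains no double quotes.

```php
<?php
// Hypothetical helper: finds elements whose class attribute contains $class as a whole token.
function find_by_class(DOMDocument $dom, string $class): array
{
    $xpath = new DOMXPath($dom);
    // Pad the normalized class list with spaces so "productImage"
    // does not match "productImageLarge".
    $query = sprintf(
        '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );
    return iterator_to_array($xpath->query($query));
}

// Sample markup (assumption, loosely based on the img tag in the question).
$html = '<html><body>'
      . '<img class="productImage" src="a.png">'
      . '<img class="thumb productImage" src="b.png">'
      . '<img class="productImageLarge" src="c.png">'
      . '</body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();

$sources = [];
foreach (find_by_class($dom, 'productImage') as $img) {
    // Attributes are read with getAttribute(), not with JavaScript-style properties.
    $sources[] = $img->getAttribute('src');
}
echo implode(', ', $sources), "\n";
```

With the sample markup this matches the first two images and skips the third.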

How can I get the value of an attribute of an XML node in PHP?

I'm using simplexml to read an XML file. So far I'm unable to get the attribute value I'm looking for. This is my code:
if(file_exists($xmlfile)){
$doc = new DOMDocument();
$doc->load($xmlfile);
$usergroup = $doc->getElementsByTagName( "preset" );
foreach($usergroup as $group){
$pname = $group->getElementsByTagName( "name" );
$att = 'code';
$name = $pname->attributes()->$att; //not working
$name = $pname->getAttribute('code'); //not working
if($name==$preset_name){
echo($name);
$group->parentNode->removeChild($group);
}
}
}
My XML file looks like this:
<presets>
<preset>
<name code="default">Default</name>
<createdBy>named</createdBy>
<icons>somethignhere</icons>
</preset>
</presets>

Try this:
function getByPattern($pattern, $source)
{
    $dom = new DOMDocument();
    // @ suppresses parser warnings
    @$dom->loadHTML($source);
    $xpath = new DOMXPath($dom);
    $result = $xpath->evaluate($pattern);
    return $result;
}
You can then use it with an XPath expression like this:
$data = getByPattern("/regions/testclass1/presets/preset",$xml);
UPDATE
Code:
<?php
$xmlstr = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?><presets><preset><name code=\"default\">Default</name><createdBy>named</createdBy><icons>somethignhere</icons></preset></presets>";
$xml = new SimpleXMLElement($xmlstr);
$result = $xml->xpath("/presets/preset/name");
foreach($result[0]->attributes() as $a => $b) {
echo $a,'="',$b,"\"\n";
}
?>
Output:
code="default"
P.S. Also try accepting answers, as @TJHeuvel mentioned; it shows that you respect the community (and the community will be more than happy to help you again next time).

The question I had in mind also included deleting a node, but I forgot to add that part. So from my point of view this is the complete answer, in case someone else finds it useful.
This answer doesn't use the SimpleXMLElement class because, however hard I tried, unset() did not delete the node. So I went back to DOMDocument and finally found an answer. This is my code, and it's simple:
if (file_exists($xmlfile)) {
    $doc = new DOMDocument();
    $doc->load($xmlfile);
    // copy the live node list to an array so removing nodes does not skip items
    $presetgroup = iterator_to_array($doc->getElementsByTagName("preset"));
    foreach ($presetgroup as $group) {
        $pname = $group->getElementsByTagName("name");
        // getElementsByTagName returns a list, so take the first <name> element
        $pcode = $pname->item(0)->getAttribute('code');
        if ($pcode == $preset_name) {
            echo($preset_name);
            $group->parentNode->removeChild($group);
        }
    }
    // save only when the file was actually loaded
    $doc->save($xmlfile);
}

Getting meta title and description

I am having trouble getting the meta description/title from this specific site.
Here is some code:
$file = file('http://www.thegooddrugsguide.com/lsd/index.htm');
$file = implode("",$file);
if (preg_match('/<title>(.*?)<\/title>/is',$file,$t)) $title = $t[1];
It works with other sites, but not with the site in question. What could be the problem?

This should work fine:
$doc = new DOMDocument;
// @ suppresses warnings from malformed HTML
@$doc->loadHTMLFile('http://example.com');
// the first <title> element holds the page title
$title = $doc->getElementsByTagName('title')->item(0)->textContent;
$metas = $doc->getElementsByTagName('meta');
foreach ($metas as $meta) {
    if (strtolower($meta->getAttribute('name')) == 'description') {
        // the description text lives in the content attribute
        $description = $meta->getAttribute('content');
    }
}
More info: http://www.php.net/manual/en/book.dom.php
Edit: this shorter version can also find the description:
$xpath = new DOMXPath($doc);
// string() makes evaluate() return the attribute value instead of a node list
$description = $xpath->evaluate('string(//meta[@name="description"]/@content)');

$url = "http://www.thegooddrugsguide.com/lsd/index.htm";
$tags = get_meta_tags($url);
$description = $tags["description"];
