Getting meta title and description - php

I am having trouble getting the meta description/title from this specific site.
Here is some code:
$file = file('http://www.thegooddrugsguide.com/lsd/index.htm');
$file = implode("",$file);
if (preg_match('/<title>(.*?)<\/title>/is',$file,$t)) $title = $t[1];
It works with other sites, but not with the site in question. What could be the problem?

This should work fine:
$doc = new DOMDocument;
$doc->loadHTMLFile('http://example.com');
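$titles = $doc->getElementsByTagName('title');
// item(0) is the first <title> element; textContent is its text
$title = $titles->length > 0 ? $titles->item(0)->textContent : null;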
$metas = $doc->getElementsByTagName('meta');
foreach ($metas as $meta) {
if (strtolower($meta->getAttribute('name')) == 'description') {
$description = $meta->getAttribute('content');
}
}
More info: http://www.php.net/manual/en/book.dom.php
Edit: this shorter version also finds the description:
$xpath = new DOMXPath($doc);
// string(...) returns the attribute value itself rather than a node list
$description = $xpath->evaluate('string(//meta[@name="description"]/@content)');
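A minimal, self-contained sketch of the approach above, assuming PHP 7 or later with the DOM extension enabled. The inline HTML is a made-up stand-in for the fetched page, and libxml_use_internal_errors() is used here to quiet warnings about sloppy markup. Both values are read with XPath string() expressions, which yield an empty string rather than an error when the tag is missing.

```php
<?php
// Hypothetical sample page, inlined so the sketch runs without network access.
$html = <<<HTML
<html><head>
<title>Sample page</title>
<meta name="Description" content="A short summary.">
</head><body></body></html>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// string(...) returns the text of the first match, or '' when nothing matches.
$title = $xpath->evaluate('string(//title)');

// translate() lower-cases the name attribute so "Description" also matches.
$description = $xpath->evaluate(
    'string(//meta[translate(@name, "DESCRIPTION", "description") = "description"]/@content)'
);

var_dump($title, $description);
```

For a real page, loadHTMLFile($url) could take the place of loadHTML($html); the rest stays the same.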

$url = "http://www.thegooddrugsguide.com/lsd/index.htm";
$tags = get_meta_tags($url);
$description = $tags["description"];

Related

Extract http-equiv content with php

I'm trying to extract all meta http-equiv properties from url.
Here is the code
function fetch_http_equiv($url)
{
$data = file_get_contents($url);
$dom = new DomDocument;
@$dom->loadHTML($data);
$xpath = new DOMXPath($dom);
// select every <meta> element that has an http-equiv attribute
$metas = $xpath->query('//meta[@http-equiv]');
$http_equiv = array();
foreach($metas as $meta){
$property = $meta->getAttribute('http-equiv');
$content = $meta->getAttribute('content');
$http_equiv[$property] = $content;
}
return $http_equiv;
}
// fetch meta http-equiv 's
$http_equiv = fetch_http_equiv($link);
// use Content-Language if the page declares it
if (!empty($http_equiv['Content-Language'])) {
$meta_content_language = $http_equiv['Content-Language'];
}
For the love of God, in my mind it should work. What did I miss?
Edit: I found the problem. I changed
$property = $meta->getAttribute('http_equiv');
to
$property = $meta->getAttribute('http-equiv');
Case solved.
Code works now.
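For reference, here is a small sketch of the same idea that runs against an inline HTML string instead of a URL, assuming PHP 7 or later. The function name http_equiv_from_html and the sample markup are made up for illustration.

```php
<?php
// Returns [http-equiv name => content] for every <meta http-equiv="..."> in $html.
function http_equiv_from_html(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // keep warnings about sloppy markup quiet
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $result = [];
    // [@http-equiv] keeps only <meta> elements that carry the attribute at all.
    foreach ($xpath->query('//meta[@http-equiv]') as $meta) {
        $result[$meta->getAttribute('http-equiv')] = $meta->getAttribute('content');
    }
    return $result;
}

// Hypothetical markup for illustration.
$html = '<html><head>'
      . '<meta http-equiv="Content-Language" content="en">'
      . '<meta name="description" content="ignored">'
      . '</head><body></body></html>';

$httpEquiv = http_equiv_from_html($html);
$language  = $httpEquiv['Content-Language'] ?? null;   // null when the header is absent
var_dump($language);
```

Note that the attribute name is spelled with a hyphen in both the XPath expression and getAttribute(); an underscore silently returns an empty string.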

Parsing RSS with PHP

I'm trying to parse RSS: http://www.mlssoccer.com/rss/en.xml .
$feed = new DOMDocument();
$feed->load($url);
$items = $feed->getElementsByTagName('channel')->item(0)->getElementsByTagName('item');
foreach($items as $key => $item)
{
$title = $item->getElementsByTagName('title')->item(0)->firstChild->nodeValue;
$pubDate = $item->getElementsByTagName('pubDate')->item(0)->firstChild->nodeValue;
$description = $item->getElementsByTagName('description')->item(0)->firstChild->nodeValue;
// do some stuff
}
The thing is: I get $title and $pubDate without a problem, but $description is always empty. What could cause this behaviour, and how do I fix it?
The problem is the CDATA section: use textContent on the <description> element instead of firstChild->nodeValue to retrieve the value between the tags.
<?php
$feed = new DOMDocument();
$feed->load('http://www.mlssoccer.com/rss/en.xml');
$items = $feed->getElementsByTagName('channel')->item(0)->getElementsByTagName('item');
foreach($items as $key => $item)
{
$title = $item->getElementsByTagName('title')->item(0)->firstChild->nodeValue;
$pubDate = $item->getElementsByTagName('pubDate')->item(0)->firstChild->nodeValue;
$description = $item->getElementsByTagName('description')->item(0)->textContent; // textContent
}
There can be whitespace between the opening <description> tag and the opening <![CDATA[. That whitespace is a text node.
So if you access the firstChild of description, you might fetch that whitespace text node.
More generally, you can tell DOMDocument to ignore whitespace-only nodes:
$feed = new DOMDocument();
$feed->preserveWhiteSpace = FALSE;
$feed->load($url);
Additionally, you should check out XPath; it makes reading a DOM much easier:
$xpath = new DOMXPath($feed);
foreach ($xpath->evaluate('//channel/item') as $item) {
$title = $xpath->evaluate('string(title)', $item);
$pubDate = $xpath->evaluate('string(pubDate)', $item);
$description = $xpath->evaluate('string(description)', $item);
// do some stuff
var_dump([$title, $pubDate, $description]);
}
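The whitespace issue described above can be reproduced with a tiny inline feed. This is only a sketch: the XML below is invented for illustration and is not the real MLS feed.

```php
<?php
// Hypothetical one-item feed; note the line break before the CDATA section.
$xml = <<<XML
<rss><channel><item>
<title>Match report</title>
<description>
<![CDATA[<p>Full text</p>]]>
</description>
</item></channel></rss>
XML;

$feed = new DOMDocument();
$feed->loadXML($xml);
$description = $feed->getElementsByTagName('description')->item(0);

// The first child is the whitespace text node, not the CDATA section.
$first = $description->firstChild;
var_dump($first instanceof DOMCdataSection);   // bool(false)
var_dump(trim($first->nodeValue));             // string(0) ""

// textContent concatenates all descendant text, CDATA included.
var_dump(trim($description->textContent));     // string(16) "<p>Full text</p>"
```

Reading textContent (or XPath string()) on the element itself avoids depending on which child node happens to come first.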

Get Element by ClassName with DOMdocument() Method

Here is what I am trying to achieve: retrieve all products on a page and put them into an array. Here is the code I am using:
$page2 = curl_exec($ch);
$doc = new DOMDocument();
@$doc->loadHTML($page2);
$nodes = $doc->getElementsByTagName('title');
$noders = $doc->getElementsByClassName('productImage');
$title = $nodes->item(0)->nodeValue;
$product = $noders->item(0)->imageObject.src;
It works for $title but not for the product. For reference, the img tag in the HTML looks like this:
<img alt="" class="productImage" data-altimages="" src="xxxx">
I have been looking at this (PHP DOMDocument how to get element?) but I still don't understand how to make it work.
PS: I get this error:
Call to undefined method DOMDocument::getElementsByclassName()
I finally used the following solution:
$classname="blockProduct";
$finder = new DomXPath($doc);
$spaner = $finder->query("//*[contains(@class, '$classname')]");
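One caveat with the contains() test: it matches substrings, so a search for blockProduct would also hit an element whose class is blockProductWide. A common XPath 1.0 idiom pads the class list with spaces and searches for the padded name. The sketch below compares the two on made-up markup.

```php
<?php
// Hypothetical markup: only the first and third <div> carry the class "blockProduct".
$html = '<html><body>'
      . '<div class="blockProduct">A</div>'
      . '<div class="blockProductWide">B</div>'
      . '<div class="featured blockProduct">C</div>'
      . '</body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings quiet
$doc->loadHTML($html);
libxml_clear_errors();

$finder    = new DOMXPath($doc);
$classname = 'blockProduct';

// Substring match: also picks up "blockProductWide".
$loose = $finder->query("//*[contains(@class, '$classname')]");

// Token match: pad the class list with spaces and look for " blockProduct ".
$strict = $finder->query(
    "//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]"
);

var_dump($loose->length, $strict->length);   // 3 loose matches, 2 strict matches
```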
https://stackoverflow.com/a/31616848/3068233
Linking this answer as it helped me the most with this problem.
function getElementsByClass(&$parentNode, $tagName, $className) {
$nodes=array();
$childNodeList = $parentNode->getElementsByTagName($tagName);
for ($i = 0; $i < $childNodeList->length; $i++) {
$temp = $childNodeList->item($i);
if (stripos($temp->getAttribute('class'), $className) !== false) {
$nodes[]=$temp;
}
}
return $nodes;
}
That's the code; here's the usage:
$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadHTML($html);
$content_node=$dom->getElementById("content_node");
$div_a_class_nodes=getElementsByClass($content_node, 'div', 'a');
function getElementsByClassName($dom, $ClassName, $tagName=null) {
if($tagName){
$Elements = $dom->getElementsByTagName($tagName);
}else {
$Elements = $dom->getElementsByTagName("*");
}
$Matched = array();
for($i=0;$i<$Elements->length;$i++) {
if($Elements->item($i)->attributes->getNamedItem('class')){
if($Elements->item($i)->attributes->getNamedItem('class')->nodeValue == $ClassName) {
$Matched[]=$Elements->item($i);
}
}
}
return $Matched;
}
// usage
$dom = new \DOMDocument('1.0');
@$dom->loadHTML($html);
$elementsByClass = getElementsByClassName($dom, $className, 'h1');

How do I rename XML values using php?

How do I rename a value in xml using PHP? This is what I've got so far:
<?php
$q = $_GET["q"];
$q = stripslashes($q);
$q = explode('|^', $q);
$old = $q[0];
$dom = new DOMDocument;
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom->Load("test.xml");
$xpath = new DOMXPath($dom);
$query1 = 'channel/item[title="' . $old . '"]/title';
$entries = $xpath->query($query1);
foreach ($entries as $entry)
{
$oldchapter = $entry->parentNode->removeChild($entry);
$item = $dom->getElementsByTagName('item');
foreach ($item as $items)
{
$title = $dom->createElement('title', $q[1]);
$items->appendChild($title);
}
}
$dom->save("test.xml");
Basically, it takes two titles from the URL, the existing title and the one the user wants to change it to (in the form oldtitle|^newtitle), and splits them into an array.
I tried removing the existing title and then creating a new title element with the new value from the URL, but it doesn't seem to work. Where am I going wrong, or is there an easier way of doing this?
The way to do this is with DOMNode::replaceChild(). Most of your code is correct; you've just slightly over-complicated some of the DOM handling.
Try this:
<?php
$q = $_GET["q"];
$q = stripslashes($q);
$q = explode('|^', $q);
$old = $q[0];
$dom = new DOMDocument;
// Do this *before* loading the document
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom->Load("test.xml");
$xpath = new DOMXPath($dom);
$query1 = 'channel/item[title="' . $old . '"]/title';
$entries = $xpath->query($query1);
// This is all you need to do in the loop
foreach ($entries as $oldTitle) {
$newTitle = $dom->createElement('title', $q[1]);
$oldTitle->parentNode->replaceChild($newTitle, $oldTitle);
}
$dom->save("test.xml");
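A runnable sketch of the replaceChild() approach on an inline document, so nothing is read from or written to test.xml. The sample XML and the two title values are invented. Because createElement() does not escape its second argument, the new text is added through createTextNode() here, which matters when the title contains a character such as &.

```php
<?php
// Hypothetical feed with two items; only "Old title" should be renamed.
$xml = '<rss><channel>'
     . '<item><title>Old title</title></item>'
     . '<item><title>Other</title></item>'
     . '</channel></rss>';

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

$old = 'Old title';
$new = 'Tom & Jerry';   // contains a character that needs escaping in XML

foreach ($xpath->query('//channel/item[title="' . $old . '"]/title') as $oldTitle) {
    // Build the element empty and append a text node, so "&" is escaped on output.
    $newTitle = $dom->createElement('title');
    $newTitle->appendChild($dom->createTextNode($new));
    $oldTitle->parentNode->replaceChild($newTitle, $oldTitle);
}

echo $dom->saveXML();
```

The XPath expression is still built by string concatenation, so an old title that itself contains a double quote would break the query; that case is not handled in this sketch.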

Xpath for extracting links

I am building a scraper for an automotive site. First I want to get all manufacturers, and then all model links for each manufacturer, but with the code below I get only the first model in the list. Why?
<?php
$dom = new DOMDocument();
@$dom->loadHTMLFile('http://www.auto-types.com');
$xpath = new DOMXPath($dom);
$entries = $xpath->query("//li[@class='clearfix_center']/a/@href");
$output = array();
foreach($entries as $e) {
$dom2 = new DOMDocument();
@$dom2->loadHTMLFile('http://www.auto-types.com' . $e->textContent);
$xpath2 = new DOMXPath($dom2);
$data = array();
$data['newLinks'] = trim($xpath2->query("//div[@class='modelImage']/a/@href")->item(0)->textContent);
$output[] = $data;
}
echo '<pre>' . print_r($output, true) . '</pre>';
?>
So I need to get mercedes/100, mercedes/200, mercedes/300, but my script currently returns only the first link, mercedes/100.
Please help.
You need to iterate through the results instead of just taking the first item:
$items = $xpath2->query("//div[@class='modelImage']/a/@href");
$links = array();
foreach($items as $item) {
$links[] = $item->textContent;
}
$data['newLinks'] = implode(', ', $links);
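To see the difference without hitting the live site, here is a small sketch that runs the same query against inline markup. The HTML is a made-up stand-in for one manufacturer page, and the links are collected into an array rather than a joined string.

```php
<?php
// Hypothetical model listing standing in for one manufacturer page.
$html = '<html><body>'
      . '<div class="modelImage"><a href="/mercedes/100">100</a></div>'
      . '<div class="modelImage"><a href="/mercedes/200">200</a></div>'
      . '<div class="modelImage"><a href="/mercedes/300">300</a></div>'
      . '</body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings quiet
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

$links = [];
// query() returns every matching attribute node, not only the first.
foreach ($xpath->query("//div[@class='modelImage']/a/@href") as $href) {
    $links[] = trim($href->nodeValue);
}

print_r($links);
```

Keeping the links as an array makes it easier to loop over them later than splitting a comma-separated string again.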
