This question already has answers here:
Getting title and meta tags from external website
(21 answers)
Closed 9 years ago.
I am trying to show RSS feed links on my website. Everything works, but fetching the og:image property via file_get_contents() takes a very long time. Is there another way to fetch meta tag properties?
Would Python be any faster at getting these tags?
This is how I get all the og: tags:
libxml_use_internal_errors(true);
$doc = new DomDocument();
$doc->loadHTML(file_get_contents($url));
$xpath = new DOMXPath($doc);
$query = '//*/meta[starts-with(@property, \'og:\')]';
$metas = $xpath->query($query);
foreach ($metas as $meta) {
    $property = $meta->getAttribute('property');
    $content = $meta->getAttribute('content');
    echo '<h1>Meta '.$property.' <span>'.$content.'</span></h1>';
}
<?php
$page_content = file_get_contents('http://example.com');

$dom_obj = new DOMDocument();
$dom_obj->loadHTML($page_content);

$meta_val = null;
foreach ($dom_obj->getElementsByTagName('meta') as $meta) {
    if ($meta->getAttribute('property') == 'og:image') {
        $meta_val = $meta->getAttribute('content');
    }
}
echo $meta_val;
?>
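If the delay comes mostly from the remote request rather than the parsing, one option is to cap how long file_get_contents() is allowed to wait. A minimal sketch, assuming the page URL is in $url and a 5-second timeout is acceptable:
$context = stream_context_create(array(
    'http' => array(
        'timeout'         => 5, // give up after 5 seconds
        'follow_location' => 1, // follow redirects
    ),
));
$html = file_get_contents($url, false, $context);
if ($html === false) {
    // the request failed or timed out; handle it however suits your feed list
}
An alternative is cURL with CURLOPT_CONNECTTIMEOUT and CURLOPT_TIMEOUT, which also lets you fetch several feeds in parallel via curl_multi_exec().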
This question already has answers here:
Loop over DOMDocument
(4 answers)
Closed 1 year ago.
Is it possible to get all HTML elements (children) with their content using PHP's DOMDocument class? I just can't get the results. Let's say I only know that there will be a <td> tag, but I don't know what tags are inside that <td>.
Example:
$doc = new DOMDocument();
$el = "<td><a href='http://google.hr'>test1</a><div>Test2</div></td>";
$doc->loadHTML($el);
$doc->getElementsByTagName("td")->item(0)->nodeValue; /* I only get plain text */
EDIT: No JavaScript-style solutions, please.
This will give you all element information:
$html = "<td><a href='http://google.hr'>test1</a><div>Test2</div></td>";

$dom = new DOMDocument();
$dom->loadHTML($html);

foreach ($dom->getElementsByTagName('*') as $element) {
    echo "<pre>";
    print_r($element);
    echo "</pre>";
}
To get attribute information, use something like:
$p = $dom->getElementsByTagName('a')->item(0);

if ($p->hasAttributes()) {
    foreach ($p->attributes as $attr) {
        $name  = $attr->nodeName;
        $value = $attr->nodeValue;
        echo "Attribute '$name' :: '$value'<br />";
    }
}
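Since nodeValue only gives you the concatenated text, one way to keep the children's markup is to serialize each child node back to HTML with saveHTML(). A minimal sketch, reusing the <td> fragment from above:
$doc = new DOMDocument();
$doc->loadHTML("<td><a href='http://google.hr'>test1</a><div>Test2</div></td>");

$td = $doc->getElementsByTagName('td')->item(0);
foreach ($td->childNodes as $child) {
    // prints each child element with its tags, i.e. the <a> link and the <div>
    echo $doc->saveHTML($child), "\n";
}
Note that saveHTML() normalizes attribute quoting, so the output may not match the input character for character.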
This question already has answers here:
How do you parse and process HTML/XML in PHP?
(31 answers)
Closed 3 years ago.
This used to work fine for getting text from a certain web page, where the text sits in a div tag, once a user types the id used below:
function get_text($id) {
    $result = file_get_contents('www.site.net/' . $id);
    $regex = '/<div class="x">([^<]*)<\/div>/';
    if (preg_match($regex, $result, $matches) && !empty($matches[1])) {
        return $matches[1];
    } else {
        return 'N/A';
    }
}
Now the text is more difficult to get, because it's situated here:
<div class="X2">
<h2 style="font-family: 'Pacifico', cursive;">TEXT</h2>
</div>
I tried matching both the div and the h2, but it returns nothing. Please help, thank you!
This is quite easily solved using PHP's DOMDocument:
$html = <<<'EOT'
<div class="X2">
<h2 style="font-family: 'Pacifico', cursive;">TEXT</h2>
</div>
EOT;

$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);
$div = $xpath->query('//div[contains(@class, "X2")]')->item(0);
echo $div->textContent;
Output:
TEXT
Demo on 3v4l.org
To fit into your function environment, this should work:
function get_text($id) {
    $html = file_get_contents("www.site.net/$id");

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    $xpath = new DOMXPath($doc);
    $div = $xpath->query('//div[contains(@class, "X2")]');
    if (count($div)) {
        return $div->item(0)->textContent;
    } else {
        return 'N/A';
    }
}
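One caveat: loadHTML() emits PHP warnings for the malformed markup that real pages often contain. A common way to keep the output clean, assuming the same $html as in the function above, is to route those errors through libxml instead:
libxml_use_internal_errors(true); // collect parse errors instead of printing warnings
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();            // drop the collected errors once parsing is done
This only silences the noise; the DOM that loadHTML() builds is the same either way.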
This question already has answers here:
How to get Open Graph Protocol of a webpage by php?
(8 answers)
Closed 8 years ago.
I am trying to retrieve some metadata included in a SimpleXMLElement. I am using XPath and I am struggling to get the values that interest me.
Here is an extract of the webpage header (from http://www.wayfair.de/CleverFurn-Couchtisch-Abby-69318X2-MFE2223.html).
Do you know how I could retrieve all the og: (xmlns:og) data into an array containing:
1) og:type
2) og:url
3) og:image
....
x) og:upc
<meta xmlns:og="http://opengraphprotocol.org/schema/" property="og:title" content="CleverFurn Couchtisch &quot;Abby&quot;" />
And here's my PHP code:
<?php
$html = file_get_contents("http://www.wayfair.de/CleverFurn-Couchtisch-Abby-69318X2-MFE2223.html");

$doc = new DOMDocument();
$doc->strictErrorChecking = false;
$doc->recover = true;
@$doc->loadHTML("<html><body>".$html."</body></html>");

$xpath = new DOMXpath($doc);
$elements = $xpath->query("//*/meta[@property='og:url']");
if (!is_null($elements)) {
    foreach ($elements as $element) {
        echo "<br/>[". $element->nodeName. "]";
        var_dump($element);
        $nodes = $element->childNodes;
        foreach ($nodes as $node) {
            echo $node->nodeValue. "\n";
        }
    }
}
?>
Just found the answer:
How to get Open Graph Protocol of a webpage by php?
<?php
$html = file_get_contents("http://www.wayfair.de/CleverFurn-Couchtisch-Abby-69318X2-MFE2223.html");

libxml_use_internal_errors(true); // yeah, if you are so worried about using @ to silence warnings
$doc = new DomDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);
$query = '//*/meta[starts-with(@property, \'og:\')]';
$metas = $xpath->query($query);

$rmetas = array();
foreach ($metas as $meta) {
    $property = $meta->getAttribute('property');
    $content = $meta->getAttribute('content');
    $rmetas[$property] = $content;
}
var_dump($rmetas);
?>
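From there, individual properties can be pulled straight out of the array. A small usage sketch (the keys assume the page actually exposes those og: tags):
$image = isset($rmetas['og:image']) ? $rmetas['og:image'] : null;
$url   = isset($rmetas['og:url'])   ? $rmetas['og:url']   : null;
echo $image . "\n" . $url;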
This question already has answers here:
convert part of dom element to string with html tags inside of them
(2 answers)
Closed 9 years ago.
I'm trying to echo HTML using PHP DOM:
$doc = new \DomDocument('1.0', 'UTF-8');
$doc->loadHTMLFile("http://www.nu.nl");
$tags = $doc->getElementsByTagName('a');
echo $doc->saveHTML($tags);
This is getting me a blank page. I also tried:
$doc = new DOMDocument();
$doc->loadHTMLFile("http://www.nu.nl");
$links = $doc->getElementsByTagName('a');
foreach ($links as $link) {
    echo $link->getAttribute('href') . '<br />';
}
This is getting me the "href" as plain text. I have Googled for hours now and tried many things but I can't figure out how to output HTML as HTML.
Here is a fix that will add the root URL for relative links:
$pageurl = "http://www.nu.nl";
$html = file_get_contents($pageurl);
$html = str_replace('&', '&amp;', $html); // crude: escape ampersands so libxml does not choke on bare '&'

$doc = new DOMDocument();
@$doc->loadHTML($html);

$links = $doc->getElementsByTagName('a');
foreach ($links as $link) {
    $myLink = $link->getAttribute('href');
    if (substr($myLink, 0, 7) == 'http://') {
        echo '<a href="'.$myLink.'">'.$myLink.'</a><br/>';
    } else {
        echo '<a href="'.$pageurl.$myLink.'">'.$myLink.'</a><br/>';
    }
}
You probably want something like this doing:
$doc = new DOMDocument();
$doc->loadHTMLFile("http://www.nu.nl");
$links = $doc->getElementsByTagName('a');

$thelinks = array();
foreach ($links as $link) {
    $thelinks[] = '<a href="' . $link->getAttribute('href') . '">' . trim(preg_replace('/\s{2,}/', '', $link->textContent)) . '</a>';
}
var_dump($thelinks);
In the foreach, use:
echo $doc->saveHTML($link);
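Put together, a minimal sketch of that approach (same URL as above) would be:
$doc = new DOMDocument();
@$doc->loadHTMLFile("http://www.nu.nl"); // @ hides warnings from messy real-world markup

foreach ($doc->getElementsByTagName('a') as $link) {
    // saveHTML() with a node argument serializes that single element, markup included
    echo $doc->saveHTML($link), '<br />';
}
Note that passing a node to saveHTML() requires PHP 5.3.6 or later; on older versions the node has to be imported into a separate document first.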
This question already has answers here:
Getting DOM elements by classname
(7 answers)
Closed 9 years ago.
I know how to get values based on id and tag name while parsing HTML, but I don't know how to get a value based on class name. This is what I have tried:
$dom = new DOMDocument();
$dom->loadHTML($html);
$data = $dom->getElementsByTagName($identifier);
foreach ($data as $v) {
    $value[] = $v->nodeValue;
}
You have only tried to get elements by tag name. What you need to do is:
1. Get all the elements by tag name.
2. Read the class name from each element, if it has one.
3. Compare that class name with your input class name; if it matches, collect that element's data.
Try this out, it worked for me:
$dom = new DOMDocument();
$dom->loadHTML($html);
$new_data = $dom->getElementsByTagName('*');

$matched = array();
for ($i = 0; $i < $new_data->length; $i++) {
    if (isset($new_data->item($i)->attributes->getNamedItem('class')->nodeValue)) {
        $classname = $new_data->item($i)->attributes->getNamedItem('class')->nodeValue;
        if ($classname == $identifier) {
            $matched[] = $new_data->item($i)->nodeValue;
        }
    }
}
print_r($matched);
print_r($matched);
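An alternative worth mentioning is DOMXPath, which avoids the manual loop and also copes with elements that carry several classes. A sketch, assuming $html and $identifier are the same as in the code above:
$dom = new DOMDocument();
@$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
// match elements whose space-separated class list contains $identifier exactly
$query = '//*[contains(concat(" ", normalize-space(@class), " "), " ' . $identifier . ' ")]';

$matched = array();
foreach ($xpath->query($query) as $node) {
    $matched[] = $node->nodeValue;
}
print_r($matched);
This assumes $identifier contains no quotes; otherwise it would need escaping before being spliced into the XPath expression.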