innerHTML of each link not working - php

I have the following code
function DOMinnerHTML($element)
{
$innerHTML="";
$children=$element->childNodes;
foreach($children as $child)
{
$tmp_dom=new DOMDocument();
$tmp_dom->appendChild($tmp_dom->importNode($child,true));
$innerHTML.=trim($tmp_dom->saveHTML());
}
return $innerHTML;
}
$doc=new DOMDocument();
$doc->loadHtmlFile('http://www.google.com/');
$links=$doc->getElementsByTagName('a');
foreach($links as $m)
{
echo DOMinnerHTML($links[$m]).'<br />';
}
And it outputs nothing.
How can I make it output the content of each link on http://google.com?

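The loop variable $m is already the link element, because foreach over a DOMNodeList yields the nodes themselves, so it cannot be used as an index into $links. This line:
echo DOMinnerHTML($links[$m]).'<br />';
should be just this:
echo DOMinnerHTML($m).'<br />';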
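As a minimal sketch of the corrected loop, the example below parses an inline HTML string instead of fetching a live page. The function name linkInnerHtml and the sample markup are assumptions made for illustration; they are not part of the original question.

```php
<?php
// Hypothetical helper: returns the inner HTML of every <a> element in $html.
function linkInnerHtml(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();

    $result = [];
    foreach ($doc->getElementsByTagName('a') as $link) {
        // $link is already the DOMElement; no indexing into the list is needed.
        $inner = '';
        foreach ($link->childNodes as $child) {
            $inner .= $doc->saveHTML($child);   // serialize each child node
        }
        $result[] = $inner;
    }
    return $result;
}

// Assumed sample markup, used only to demonstrate the helper.
$sample = '<p><a href="/a">First <b>link</b></a> <a href="/b">Second</a></p>';
print_r(linkInnerHtml($sample));
```

Passing a node to saveHTML() avoids building a temporary DOMDocument for each child, which is what the original DOMinnerHTML() does.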

Related

How can I get the span tag value using PHP cURL and Simple HTML DOM Parser? The exact value does not show

code for this:
<span class="file-count-label" ng-init="totalResultCount=304575" ng-show="totalResultCount" style="">304,575</span>
$videos=$html->find('span[class=file-count-label]');
foreach($videos as $e)
{
echo $e->plaintext;
}
Output: {{totalResultCount | number}}, but I want 304,575.
You can use an XPath selector:
<?php
$html = '<span class="file-count-label" ng-init="totalResultCount=304575" ng-show="totalResultCount" style="">304,575</span>';
$doc = new DOMDocument;
$doc->loadHTML($html);
$finder = new DomXPath($doc);
$classname="file-count-label";
$videos = $finder->query("//*[contains(@class, '$classname')]");
foreach($videos as $e)
{
echo $e->nodeValue;
}
Output: https://eval.in/1056186
Reference: https://stackoverflow.com/a/6366390/4248328
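One caveat with contains() on the class attribute is that it matches substrings, so a class such as file-count-label-old would also match. A common XPath idiom pads the attribute with spaces and matches the whole token. The sketch below assumes a hypothetical helper name, textByClass, and sample markup invented for illustration.

```php
<?php
// Hypothetical helper: text of every element whose class list contains $class
// as a whole token (not merely as a substring).
function textByClass(string $html, string $class): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Pad the normalized class attribute with spaces, then look for " token ".
    $query = sprintf(
        "//*[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $class
    );

    $out = [];
    foreach ($xpath->query($query) as $node) {
        $out[] = trim($node->textContent);
    }
    return $out;
}

// Assumed sample markup: the second span only shares a prefix with the class.
$html = '<span class="file-count-label">304,575</span>'
      . '<span class="file-count-label-old">1</span>';
print_r(textByClass($html, 'file-count-label'));
```

The helper interpolates $class directly into the query, so it is only suitable for trusted class names.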
It works for me:
include 'simple_html_dom.php';
$html = str_get_html('<span class="file-count-label" ng-init="totalResultCount=304575" ng-show="totalResultCount" style="">304,575</span>');
$videos=$html->find('span[class=file-count-label]');
foreach($videos as $e)
{
echo $e->plaintext;
// 304,575
}

DomDocument parse Newline works with span but not img

See here: https://ideone.com/bjs3IC
Why does the newline display correctly with the spans but not with the imgs?
<?php
outputImages();
outputSpans();
function outputImages(){
$html = "<div class='test'>
<pre>
<img src='http://d...content-available-to-author-only...e.com/5x5/000/fff'>
<img src='http://d...content-available-to-author-only...e.com/5x5/000/fff'>
<img src='http://d...content-available-to-author-only...e.com/5x5/000/fff'>
</pre>
</div>";
getHtml($html);
}
function outputSpans(){
$html = "<div class='test'>
<pre>
<span>a</span>
<span>b</span>
<span>c</span>
</pre>
</div>";
getHtml($html);
}
function getHtml($html){
$doc = new DOMDocument;
$doc->loadhtml($html);
$xpath = new DOMXPath($doc);
$tags = $xpath->query('//div[@class="test"]');
print(get_inner_html($tags[0]));
}
function get_inner_html( $node ) {
$innerHTML= '';
$children = $node->childNodes;
foreach ($children as $child) {
$innerHTML .= $child->ownerDocument->saveXML( $child );
}
return $innerHTML;
}
The DOMDocument::loadHTML function takes a second parameter, $options. It appears that LIBXML_NOBLANKS is (at least one of) the default values there.
You can use
$doc->loadhtml($html, LIBXML_NOEMPTYTAG);
to override that default value, and your code will then behave the same for the two samples.
P.S.
I am not sure why you use
print(get_inner_html($tags[0]));
The $tags variable is a DOMNodeList, so you should use $tags->item(0) to get the first node.
Your complete code should look like this:
outputImages();
outputSpans();
function outputImages() {
$html = "<div class='test'>
<pre>
<img src='http://d...content-available-to-author-only...e.com/5x5/000/fff'>
<img src='http://d...content-available-to-author-only...e.com/5x5/000/fff'>
<img src='http://d...content-available-to-author-only...e.com/5x5/000/fff'>
</pre>
</div>";
getHtml($html);
}
function outputSpans() {
$html = "<div class='test'>
<pre>
<span>a</span>
<span>b</span>
<span>c</span>
</pre>
</div>";
getHtml($html);
}
function getHtml($html) {
$doc = new DOMDocument;
$doc->loadHTML($html, LIBXML_NOEMPTYTAG);
$xpath = new DOMXPath($doc);
$tags = $xpath->query('//div[@class="test"]');
print(get_inner_html($tags->item(0)));
}
function get_inner_html( $node ) {
$innerHTML= '';
$children = $node->childNodes;
foreach ($children as $child) {
$innerHTML .= $child->ownerDocument->saveXML( $child );
}
return $innerHTML;
}

php dom not able to find any nodes

I'm trying to get the href of all anchor (a) tags using this code:
$obj = json_decode($client->getResponse()->getContent());
$dom = new DOMDocument;
if($dom->loadHTML(htmlentities($obj->data->partial))) {
foreach ($dom->getElementsByTagName('a') as $node) {
echo $dom->saveHtml($node), PHP_EOL;
echo $node->getAttribute('href');
}
}
where the returned JSON is like here, but it doesn't echo anything. The HTML does have a tags, but the body of the foreach never runs. What am I doing wrong?
Just remove the htmlentities() call. It escapes < and > into &lt; and &gt;, so the parser sees plain text instead of tags. Without it, the code works fine:
$contents = file_get_contents('http://jsonblob.com/api/jsonBlob/54a7ff55e4b0c95108d9dfec');
$obj = json_decode($contents);
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($obj->data->partial);
libxml_clear_errors();
foreach ($dom->getElementsByTagName('a') as $node) {
    echo $dom->saveHTML($node) . '<br/>';
    echo $node->getAttribute('href') . '<br/>';
}
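To see the effect of htmlentities() in isolation, the sketch below counts the anchors DOMDocument finds in the same markup, once raw and once escaped. The helper name countAnchors and the sample markup are assumptions for illustration only.

```php
<?php
// Hypothetical helper: counts the <a> elements DOMDocument finds in a string.
function countAnchors(string $markup): int
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
    $dom->loadHTML($markup);
    libxml_clear_errors();
    return $dom->getElementsByTagName('a')->length;
}

// Assumed sample markup standing in for $obj->data->partial.
$partial = '<div><a href="/one">One</a><a href="/two">Two</a></div>';

echo countAnchors($partial), PHP_EOL;                // raw markup: anchors are parsed as elements
echo countAnchors(htmlentities($partial)), PHP_EOL;  // escaped markup: everything is one text run
```

With the escaped input the document contains only text, which is why the foreach in the question never executes.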

Extracting href attribute and the value using php dom parser

From the given markup I have to extract each hyperlink and the title of every hyperlink:
<span></span>
<span>Chapter1</span>
<span>Chapter2</span>
<span>Chapter3</span>
For this I've written the following code, but it's not working:
$doc = new DOMDocument();
$doc->loadHTML($page_links);
$tags = $doc->getElementsByTagName('span');
foreach ($tags as $tag) {
echo '\n'.$tag->nodeValue;
if($tag->hasChildNodes()) {
echo $tag->childNodes->getAttribute('href');
} else {
echo 'default.htm';
}
}
I am expecting this output:
Chapter1 default.htm
Chapter2 page2.htm
Chapter3 page3.htm
and so on
Could you please try this?
$doc = new DOMDocument();
$doc->loadHTML($page_links);
$tags = $doc->getElementsByTagName('span');
for ($i = 0; $i < $tags->length; $i++) {
    echo $tags->item($i)->nodeValue;
    if ($tags->item($i)->hasChildNodes()) {
        if ($tags->item($i)->firstChild->nodeName == 'a') {
            echo " ".$tags->item($i)->firstChild->getAttribute('href').'<br/>';
        } else {
            echo " default.htm<br/>";
        }
    }
}
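A variant of the same idea, sketched under assumptions: the markup in the question appears to have lost its anchors, so the sample below uses hypothetical anchors and file names chosen to match the expected output. The helper name chapterLinks is also invented. Instead of inspecting only firstChild, it looks for any a element inside each span.

```php
<?php
// Hypothetical helper: maps each non-empty span title to the href of the
// anchor inside it, or to $default when the span contains no anchor.
function chapterLinks(string $html, string $default = 'default.htm'): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();

    $out = [];
    foreach ($doc->getElementsByTagName('span') as $span) {
        $title = trim($span->textContent);
        if ($title === '') {
            continue;                    // skip empty spans
        }
        // item(0) returns null when the span holds no <a> element.
        $anchor = $span->getElementsByTagName('a')->item(0);
        $out[$title] = $anchor ? $anchor->getAttribute('href') : $default;
    }
    return $out;
}

// Assumed markup: the anchors and file names here are illustrative only.
$page_links = '<span></span>
<span>Chapter1</span>
<span><a href="page2.htm">Chapter2</a></span>
<span><a href="page3.htm">Chapter3</a></span>';

foreach (chapterLinks($page_links) as $title => $href) {
    echo $title, ' ', $href, PHP_EOL;
}
```

Searching the whole span rather than firstChild keeps the lookup working when whitespace or other nodes precede the anchor.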

how to output text of elementById in DOM?

It should be very simple. I am loading the HTML in PHP via DOMDocument:
$doc = new DOMDocument();
$doc->loadHTML($html);
$el = $doc->getElementById('somethingId');
Let's say I have
<html><head></head><body><div id="somethingId">my
<span style="background:red">something else</span>
information</div></body></html>
Q1. How do I echo what's inside that element ("my information") from $el?
Q2. How do I echo what's inside it, including the span data (like innerHTML in JavaScript)?
Answer to Q2:
$innerHTML = '';
$children = $el->childNodes;
foreach ($children as $child) {
    $innerHTML .= $child->ownerDocument->saveXML( $child );
}
echo $innerHTML;
You should do
echo ($el->nodeValue);
if (!is_null($el)) {
$content = $el->nodeValue;
if (empty($content)) {
$content = $el->textContent;
}
echo $content;
}
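Note that nodeValue and textContent also include the text of descendant elements, so for the markup above they return the span text as well. A minimal sketch covering both questions follows; the helper names directText and innerHtml are hypothetical, and collapsing whitespace in directText is an assumption about the desired output.

```php
<?php
// Hypothetical helper (Q1): text of the direct text-node children only.
function directText(DOMNode $node): string
{
    $text = '';
    foreach ($node->childNodes as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            $text .= $child->nodeValue;
        }
    }
    // Collapse the line breaks left around the skipped <span>.
    return trim(preg_replace('/\s+/', ' ', $text));
}

// Hypothetical helper (Q2): serialized markup of all children, like innerHTML.
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

$doc = new DOMDocument();
$doc->loadHTML('<html><head></head><body><div id="somethingId">my
<span style="background:red">something else</span>
information</div></body></html>');

$el = $doc->getElementById('somethingId');
if ($el !== null) {
    echo directText($el), PHP_EOL;   // text outside the span
    echo innerHtml($el), PHP_EOL;    // text plus the span markup
}
```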
