Extraction of src and value from html not working? - php

The problem is that getElementById() doesn't work here, but if I replace it with getElementsByTagName('img') it works perfectly.
How do I fix this, if possible?
(The HTML is in the file garden.php.)
HTML:
<img id="head" src="images/flowers.png" value="blah">
(The PHP code is at the top of the same garden.php file.)
PHP:
<?php
$html = file_get_contents('garden.php');
$dom = new DOMDocument;
$dom->loadHTML($html);
foreach ($dom->getElementById('head') as $tag) {
echo $tag->getAttribute('value'); // "prints" yellow
echo "<br>";
echo $tag->getAttribute('src'); // prints images/flowers.png
}

You should not be using a foreach loop. IDs are unique, so getElementById returns a single DOMElement (or null if no element has that ID), not a DOMNodeList.
$tag = $dom->getElementById('head');
echo $tag->getAttribute('value') . '<br>' . $tag->getAttribute('src');
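Because getElementById returns null when nothing matches, a slightly more defensive version may be useful. The sketch below is only an illustration: the helper name attrsById and the inline HTML string are assumptions, not part of the original code.

```php
<?php
// Hypothetical helper: returns the requested attributes of the element with
// the given id, or null when no such element exists.
function attrsById(DOMDocument $dom, string $id, array $names): ?array
{
    $el = $dom->getElementById($id);   // a single DOMElement, or null
    if ($el === null) {
        return null;
    }
    $out = [];
    foreach ($names as $name) {
        $out[$name] = $el->getAttribute($name);
    }
    return $out;
}

// Inline sample markup standing in for garden.php (assumption).
$dom = new DOMDocument;
$dom->loadHTML('<img id="head" src="images/flowers.png" value="blah">');

$attrs = attrsById($dom, 'head', ['value', 'src']);
if ($attrs !== null) {
    echo $attrs['value'], '<br>', $attrs['src'];
}
```

The null check avoids a fatal "call to a member function on null" error if the ID is missing or misspelled.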

Related

How to replace getElementsByTagName() By document.getElementsById()

I have this code and I want to get the link of an image on a website by its ID, but the code uses getElementsByTagName('img'):
<?php
$html = file_get_contents('http://example.com/dir/webpage.html');
$dom = new DOMDocument;
@$dom->loadHTML($html);
$links = $dom->getElementsByTagName('img');
foreach ($links as $link){
echo $link->nodeValue;
echo $link->getAttribute('href'), '<br>';
}
?>
And the HTML is:
<a href="/images/image1.png" id="img_1_id">
<div class="download"></div>
</a>
I want to replace getElementsByTagName('img') with getElementById('img_1_id')
so that the script gets the URL of the selected image with the id img_1_id.
If there is another way or other code to do this, please post it :)
Thank you, pros!
getElementById returns a single element, so you don't need a loop.
$link = $dom->getElementById('img_1_id');
echo $link->nodeValue;
echo $link->getAttribute('href');
BTW, img elements don't have an href attribute, they have src. They also don't have anything in their nodeValue, since <img> is not a container element.
You have to put the ID in quotes:
$dom->getElementById("img_1_id");
so you get the element with id = "img_1_id".
What about this?
<?php
$html = file_get_contents('http://example.com/dir/webpage.html');
$dom = new DOMDocument;
@$dom->loadHTML($html);
// getElementById returns one element (or null), so no loop is needed
$link = $dom->getElementById('img_1_id');
if ($link !== null) {
    echo $link->nodeValue;
    echo $link->getAttribute('href'), '<br>';
}
?>

php dom not able to find any nodes

I'm trying to get the href of all anchor(a) tags using this code
$obj = json_decode($client->getResponse()->getContent());
$dom = new DOMDocument;
if($dom->loadHTML(htmlentities($obj->data->partial))) {
foreach ($dom->getElementsByTagName('a') as $node) {
echo $dom->saveHtml($node), PHP_EOL;
echo $node->getAttribute('href');
}
}
where the returned JSON is like here, but it doesn't echo anything. The HTML does have <a> tags, but the foreach body never runs. What am I doing wrong?
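Just remove that htmlentities() call and it will work fine. htmlentities() turns < and > into &lt; and &gt;, so loadHTML() sees plain text instead of markup and finds no <a> elements.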
$contents = file_get_contents('http://jsonblob.com/api/jsonBlob/54a7ff55e4b0c95108d9dfec');
$obj = json_decode($contents);
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($obj->data->partial);
libxml_clear_errors();
foreach ($dom->getElementsByTagName('a') as $node) {
echo $dom->saveHTML($node) . '<br/>';
echo $node->getAttribute('href') . '<br/>';
}
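To see the effect of htmlentities() in isolation, here is a small sketch that parses the same fragment raw and escaped. The sample markup and the helper name countAnchors are made up for illustration; they are not taken from the JSON in the question.

```php
<?php
// Made-up sample fragment standing in for $obj->data->partial.
$partial = '<p><a href="/one">One</a> <a href="/two">Two</a></p>';

// Hypothetical helper: parses a string as HTML and counts its <a> elements.
function countAnchors(string $html): int
{
    $dom = new DOMDocument;
    libxml_use_internal_errors(true);   // keep parser warnings out of the output
    $dom->loadHTML($html);
    libxml_clear_errors();
    return $dom->getElementsByTagName('a')->length;
}

$raw     = countAnchors($partial);                // parsed as real markup
$escaped = countAnchors(htmlentities($partial));  // parsed as plain text

echo "raw: $raw, escaped: $escaped\n";
```

With the escaped input the parser only ever sees character data, which is why the loop in the question never runs.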

php get images only within body tags

I need to do the equivalent of this:
$tags2 = $doc->getElementsByTagName('img');
$mybody = $doc->getElementsByTagName('body');
//if there's a body tag
foreach ($mybody as $bod){
//loop through each img element
foreach ($tags2 as $tag) {
echo '<img src=' . $tag->getAttribute('src') . '/>';
echo "<br/>" . $tag->getAttribute('href') ;
}
}
Here's the context:
$str = file_get_contents('http://somewebsite.html');
$doc = new DOMDocument();
@$doc->loadHTML('<?xml encoding="UTF-8">' . $str);
$tidy = new tidy();
$tidy->parseFile($str);
$tidy->cleanRepair();
if(!empty($tidy->errorBuffer)) {
echo "The following errors or warnings occurred:\n";
echo $tidy->errorBuffer;
}
else {
$str = $tidy;
}
$tags2 = $doc->getElementsByTagName('img');
$mybody = $doc->getElementsByTagName('body');
foreach ($mybody as $bod){
foreach ($tags2 as $tag) {
echo '<img src=' . $tag->getAttribute('src') . '/>';
echo "<br/>" . $tag->getAttribute('href') ;
}
}
This outputs all the images on the page (in the header, on sidebars, etc.) as well as the image in the body. I just want the image in the body. I tried a few other examples I saw here that use recursion, but they were written to get styles or paragraph tags, and I couldn't get them to retrieve the image tag and its src attribute properly.
How can I do an inner loop for any images within the body once I have the body tag?
Thank you.
You just need to reverse two lines and rewrite a smidgen.
$mybody = $doc->getElementsByTagName('body')->item(0);
$tags2 = $mybody->getElementsByTagName('img');
The reason is that the body node is itself a DOMElement, which has its own getElementsByTagName method that searches only that element's descendants.
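Put together, the scoped lookup could look like the following sketch. The sample HTML string is an assumption made so the example runs on its own; in the question the markup comes from file_get_contents().

```php
<?php
// Made-up sample page (assumption); the question loads a remote page instead.
$html = '<html><head><title>t</title></head><body>'
      . '<div><img src="a.png"></div><p><img src="b.png"></p></body></html>';

$doc = new DOMDocument;
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$srcs = [];
// item(0) returns the first <body> element, or null if there is none.
$body = $doc->getElementsByTagName('body')->item(0);
if ($body !== null) {
    // Search only inside <body>, not the whole document.
    foreach ($body->getElementsByTagName('img') as $img) {
        $srcs[] = $img->getAttribute('src');
    }
}
echo implode("\n", $srcs), "\n";
```

The same pattern works for any container element, for example a content div fetched with getElementById, if the goal is to skip images in sidebars that also live inside the body.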

How to return outer html of DOMDocument?

I'm trying to replace video links inside a string - here's my code:
$doc = new DOMDocument();
$doc->loadHTML($content);
foreach ($doc->getElementsByTagName("a") as $link)
{
$url = $link->getAttribute("href");
if(strpos($url, ".flv"))
{
echo $link->outerHTML();
}
}
Unfortunately, outerHTML doesn't work when I'm trying to get the html code for the full hyperlink like <a href='http://www.myurl.com/video.flv'></a>
Any ideas how to achieve this?
As of PHP 5.3.6 you can pass a node to saveHtml, e.g.
$domDocument->saveHtml($nodeToGetTheOuterHtmlFrom);
Previous versions of PHP did not implement that possibility. You'd have to use saveXml(), but that would create XML compliant markup. In the case of an <a> element, that shouldn't be an issue though.
See http://blog.gordon-oheim.biz/2011-03-17-The-DOM-Goodie-in-PHP-5.3.6/
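Applied to the loop from the question, this could look like the sketch below. The sample $content string is invented for illustration, and strpos() is compared against false explicitly so that a match at position 0 is not missed.

```php
<?php
// Invented sample content (assumption).
$content = '<p>Watch <a href="http://www.myurl.com/video.flv">this</a> or '
         . '<a href="http://www.myurl.com/page.html">that</a></p>';

$doc = new DOMDocument;
$doc->loadHTML($content);

$matches = [];
foreach ($doc->getElementsByTagName('a') as $link) {
    $url = $link->getAttribute('href');
    if (strpos($url, '.flv') !== false) {
        // Passing the node serializes only that element, including its own tag.
        $matches[] = $doc->saveHTML($link);
    }
}
echo implode("\n", $matches), "\n";
```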
You can find a couple of suggestions in the user notes of the DOM section of the PHP Manual.
For example, here's one posted by xwisdom:
<?php
// code taken from the Raxan PDI framework
// returns the html content of an element
protected function nodeContent($n, $outer=false) {
$d = new DOMDocument('1.0');
$b = $d->importNode($n->cloneNode(true),true);
$d->appendChild($b); $h = $d->saveHTML();
// remove outer tags
if (!$outer) $h = substr($h,strpos($h,'>')+1,-(strlen($n->nodeName)+4));
return $h;
}
?>
One solution is to define your own function that returns the outer HTML:
function outerHTML($e) {
$doc = new DOMDocument();
$doc->appendChild($doc->importNode($e, true));
return $doc->saveHTML();
}
Then you can use it in your code:
echo outerHTML($link);
Save a file containing the links as links.html, or replace links.html with a URL such as google.com/fly.html that has .flv links in it. Change .flv to .wmv or whatever extension you want to match. If other href attributes match,
it will pick them up as well.
<?php
$contents = file_get_contents("links.html");
$domdoc = new DOMDocument();
$domdoc->preserveWhiteSpace = false;
$domdoc->loadHTML($contents);
$xpath = new DOMXpath($domdoc);
$query = '//@href';
$nodeList = $xpath->query($query);
foreach ($nodeList as $node){
if(strpos($node->nodeValue, ".flv")){
$linksList = $node->nodeValue;
$htmlAnchor = new DOMElement("a", $linksList);
$htmlURL = new DOMAttr("href", $linksList);
$domdoc->appendChild($htmlAnchor);
$htmlAnchor->appendChild($htmlURL);
$domdoc->saveHTML();
echo ("<a href='". $node->nodeValue. "'>". $node->nodeValue. "</a><br />");
}
}
echo("done");
?>

how to handle DOM in PHP

My PHP code
$dom = new DOMDocument();
@$dom->loadHTML($file);
$xpath = new DOMXPath($dom);
$tags = $xpath->query('//div[@class="text"]');
foreach ($tags as $tag) {
echo $tag->textContent;
}
What I'm trying to do here is get the content of the div that has the class 'text'. The problem is that when I loop and echo the results, I only get the text; I can't get the HTML with the images and all the tags such as p, br, img, etc. I also tried $tag->nodeValue, but that didn't work either.
Personally, I like Simple HTML Dom Parser.
include "lib.simple_html_dom.php";
$html = str_get_html($file);
foreach($html->find('div.text') as $e){
echo $e->innertext;
}
Pretty simple, huh? It accommodates selectors like jQuery :)
What you need to do is create a temporary document, add the element to that and then use saveHTML():
foreach ($tags as $tag) {
$doc = new DOMDocument;
$doc->appendChild($doc->importNode($tag, true));
$html = $doc->saveHTML();
}
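If only the inner HTML is wanted (the children of the div, without the div tag itself), one alternative is to serialize each child node with saveHTML($node), which requires PHP 5.3.6 or later. This is a sketch under assumptions: the helper name innerHTML and the sample markup are hypothetical, and the type declarations assume PHP 7+.

```php
<?php
// Hypothetical helper: concatenates the serialized children of a node.
function innerHTML(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Made-up sample markup (assumption).
$file = '<div class="text"><p>Hello <b>world</b></p><img src="x.png"></div>';

$dom = new DOMDocument;
$dom->loadHTML($file);
$xpath = new DOMXPath($dom);

foreach ($xpath->query('//div[@class="text"]') as $tag) {
    echo innerHTML($tag), "\n";
}
```

Unlike textContent or nodeValue, this keeps the nested tags such as p, b and img in the output.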
I found this snippet at http://www.php.net/manual/en/class.domelement.php:
<?php
function getInnerHTML($Node)
{
$Body = $Node->ownerDocument->documentElement->firstChild->firstChild;
$Document = new DOMDocument();
$Document->appendChild($Document->importNode($Body,true));
return $Document->saveHTML();
}
?>
Not sure if it works though.
