Extraction of src and value from html not working?

The problem is that when I use getElementById() it doesn't work, but if I replace it with getElementsByTagName('img') it works perfectly.
How do I fix this, if possible?
The HTML is in the file garden.php:
<img id="head" src="images/flowers.png" value="blah">
The PHP code is at the top of the same garden.php file:
<?php
$html = file_get_contents('garden.php');
$dom = new DOMDocument;
$dom->loadHTML($html);
foreach ($dom->getElementById('head') as $tag) {
    echo $tag->getAttribute('value'); // should print blah
    echo "<br>";
    echo $tag->getAttribute('src'); // should print images/flowers.png
}

You should not be using a foreach loop. IDs are unique, so getElementById returns a single DOMElement (or null if no element has that ID), not a DOMNodeList.
$tag = $dom->getElementById('head');
echo $tag->getAttribute('value') . '<br>' . $tag->getAttribute('src');
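A self-contained sketch of the same fix, assuming the markup is inlined as a string instead of being read from garden.php. It also checks for null, since getElementById() returns null when no element carries the requested ID:

```php
<?php
// Sketch: the HTML is inlined here instead of being read from garden.php.
$html = '<html><body><img id="head" src="images/flowers.png" value="blah"></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the page output
$dom->loadHTML($html);
libxml_clear_errors();

// getElementById() returns a single DOMElement, or null if the ID is not found.
$tag = $dom->getElementById('head');
if ($tag !== null) {
    echo $tag->getAttribute('value') . '<br>' . $tag->getAttribute('src');
} else {
    echo 'No element with id="head"';
}
```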

Related

How to replace getElementsByTagName() with document.getElementById()

I have this code, and I want to get the link of an image on a website by its ID, but this code uses getElementsByTagName():
<?php
$html = file_get_contents('http://example.com/dir/webpage.html');
$dom = new DOMDocument;
@$dom->loadHTML($html);
$links = $dom->getElementsByTagName('img');
foreach ($links as $link) {
    echo $link->nodeValue;
    echo $link->getAttribute('href'), '<br>';
}
?>
And the HTML is:
<a href="/images/image1.png" id="img_1_id">
<div class="download"></div>
</a>
I want to replace getElementsByTagName('img') with something like document.getElementById('img_1_id'),
so the script gets the URL of the selected image with the ID img_1_id.
If there's another way to do this, please post it :)
Thank you, pros!

getElementById returns a single element, so you don't need a loop.
$link = $dom->getElementById('img_1_id');
echo $link->nodeValue;
echo $link->getAttribute('href');
BTW, img elements don't have an href attribute; they have src. They also don't have anything in their nodeValue, since <img> is not a container element.
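A minimal sketch applying this to the markup from the question, assuming it is inlined as a string rather than fetched from the example URL. Because the element with id="img_1_id" is an <a>, the URL is read from its href attribute:

```php
<?php
// Sketch: the question's markup is inlined; in practice it would come from
// file_get_contents('http://example.com/dir/webpage.html').
$html = '<a href="/images/image1.png" id="img_1_id"><div class="download"></div></a>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // a <div> inside an <a> may trigger parser warnings
$dom->loadHTML($html);
libxml_clear_errors();

// The element with this ID is an <a>, so the URL lives in href, not src.
$link = $dom->getElementById('img_1_id');
$url = $link !== null ? $link->getAttribute('href') : null;

echo $url === null ? 'Not found' : $url;
```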

You have to put quotes around the ID:
document.getElementById("img_1_id");
so you get the element with id="img_1_id".

What about this?
<?php
$html = file_get_contents('http://example.com/dir/webpage.html');
$dom = new DOMDocument;
@$dom->loadHTML($html);
// getElementById() returns a single DOMElement (or null), so there is nothing to loop over
$link = $dom->getElementById('img_1_id');
if ($link !== null) {
    echo $link->nodeValue;
    echo $link->getAttribute('href'), '<br>';
}
?>

php dom not able to find any nodes

I'm trying to get the href of all anchor (a) tags using this code:
$obj = json_decode($client->getResponse()->getContent());
$dom = new DOMDocument;
if ($dom->loadHTML(htmlentities($obj->data->partial))) {
    foreach ($dom->getElementsByTagName('a') as $node) {
        echo $dom->saveHtml($node), PHP_EOL;
        echo $node->getAttribute('href');
    }
}
where the returned JSON looks like the one at http://jsonblob.com/api/jsonBlob/54a7ff55e4b0c95108d9dfec, but nothing is echoed. The HTML does contain a tags, yet the foreach body never runs. What am I doing wrong?

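Just remove that htmlentities(). It converts < and > into &lt; and &gt;, so the parser sees plain text instead of tags and finds no a elements. Without it, everything works fine: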
$contents = file_get_contents('http://jsonblob.com/api/jsonBlob/54a7ff55e4b0c95108d9dfec');
$obj = json_decode($contents);
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($obj->data->partial);
libxml_clear_errors();
foreach ($dom->getElementsByTagName('a') as $node) {
    echo $dom->saveHTML($node) . '<br/>';
    echo $node->getAttribute('href') . '<br/>';
}
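To see why htmlentities() breaks the parse, here is a small sketch with a made-up HTML fragment and a hypothetical helper, countAnchors(), that counts the a elements found with and without escaping:

```php
<?php
// Sketch: count <a> elements after parsing, with and without htmlentities().
// countAnchors() and the HTML fragment are invented for illustration.
function countAnchors(string $html): int
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    return $dom->getElementsByTagName('a')->length;
}

$partial = '<p><a href="/one">One</a> <a href="/two">Two</a></p>';

// htmlentities() turns "<a" into "&lt;a", so the parser sees text, not tags.
echo countAnchors(htmlentities($partial)), PHP_EOL; // 0
echo countAnchors($partial), PHP_EOL;               // 2
```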

php get images only within body tags

I need to do the equivalent of this:
$tags2 = $doc->getElementsByTagName('img');
$mybody = $doc->getElementsByTagName('body');
//if there's a body tag
foreach ($mybody as $bod) {
    //loop through each img element
    foreach ($tags2 as $tag) {
        echo '<img src="' . $tag->getAttribute('src') . '" />';
        echo "<br/>" . $tag->getAttribute('href');
    }
}
Here's the context:
$str = file_get_contents('http://somewebsite.html');
$doc = new DOMDocument();
@$doc->loadHTML('<?xml encoding="UTF-8">' . $str);
$tidy = new tidy();
$tidy->parseString($str);
$tidy->cleanRepair();
if (!empty($tidy->errorBuffer)) {
    echo "The following errors or warnings occurred:\n";
    echo $tidy->errorBuffer;
}
else {
    $str = $tidy;
}
$tags2 = $doc->getElementsByTagName('img');
$mybody = $doc->getElementsByTagName('body');
foreach ($mybody as $bod) {
    foreach ($tags2 as $tag) {
        echo '<img src="' . $tag->getAttribute('src') . '" />';
        echo "<br/>" . $tag->getAttribute('href');
    }
}
The code above outputs all the images on the page (in the header, on sidebars, etc.) as well as the image in the body. I only want the image in the body. I tried a few other examples from here that use recursion, but they were for getting styles or paragraph tags, and I couldn't adapt them to retrieve img tags and their src attributes properly.
How can I loop over only the images within the body once I have the body tag?
Thank you.

You just need to reverse two lines and rewrite a smidgen.
$mybody = $doc->getElementsByTagName('body')->item(0);
$tags2 = $mybody->getElementsByTagName('img');
This works because the body element is a DOMElement, and DOMElement has its own getElementsByTagName() method, which searches only that element's descendants.
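A sketch of the same idea as a reusable function (imageSources() is a hypothetical name, and the markup is invented for illustration). One caveat: when libxml parses an HTML page, headers and sidebars usually end up inside <body> too, so narrowing the search to a more specific container, such as a hypothetical id="content" div, may be what actually excludes them:

```php
<?php
// Sketch: any DOMElement can call getElementsByTagName(), and the search is
// limited to that element's descendants. The markup and IDs are hypothetical.
function imageSources(DOMElement $scope): array
{
    $sources = [];
    foreach ($scope->getElementsByTagName('img') as $img) {
        $sources[] = $img->getAttribute('src');
    }
    return $sources;
}

$html = '<html><body>'
      . '<div id="sidebar"><img src="ad.png"></div>'
      . '<div id="content"><img src="photo.png"></div>'
      . '</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$body = $doc->getElementsByTagName('body')->item(0); // DOMElement or null
$content = $doc->getElementById('content');

if ($body !== null) {
    print_r(imageSources($body));    // both ad.png and photo.png
}
if ($content !== null) {
    print_r(imageSources($content)); // only photo.png
}
```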

How to return outer html of DOMDocument?

I'm trying to replace video links inside a string - here's my code:
$doc = new DOMDocument();
$doc->loadHTML($content);
foreach ($doc->getElementsByTagName("a") as $link)
{
    $url = $link->getAttribute("href");
    if (strpos($url, ".flv") !== false)
    {
        echo $link->outerHTML();
    }
}
Unfortunately, outerHTML() doesn't work when I try to get the HTML code for the full hyperlink, like <a href='http://www.myurl.com/video.flv'></a>.
Any ideas how to achieve this?

As of PHP 5.3.6 you can pass a node to saveHtml, e.g.
$domDocument->saveHtml($nodeToGetTheOuterHtmlFrom);
Previous versions of PHP did not support this. You'd have to use saveXML() instead, but that produces XML-compliant markup. For an <a> element, that shouldn't be an issue, though.
See http://blog.gordon-oheim.biz/2011-03-17-The-DOM-Goodie-in-PHP-5.3.6/
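Applied to the question's loop, a sketch might look like this, assuming PHP 5.3.6 or later and a made-up $content string in place of the question's input:

```php
<?php
// Sketch (PHP >= 5.3.6): saveHTML() with a node argument returns that node's outer HTML.
// $content is an invented example string standing in for the question's input.
$content = '<p><a href="http://www.myurl.com/video.flv">Video</a> '
         . '<a href="http://www.myurl.com/page.html">Page</a></p>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($content);
libxml_clear_errors();

$videoLinks = [];
foreach ($doc->getElementsByTagName('a') as $link) {
    // strpos() can return 0 for a match at the start, so compare against false
    if (strpos($link->getAttribute('href'), '.flv') !== false) {
        $videoLinks[] = $doc->saveHTML($link);
    }
}

echo implode(PHP_EOL, $videoLinks);
```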

You can find a couple of suggestions in the user notes of the DOM section of the PHP manual.
For example, here's one posted by xwisdom:
<?php
// code taken from the Raxan PDI framework (originally a protected class method)
// returns the html content of an element
function nodeContent($n, $outer = false) {
    $d = new DOMDocument('1.0');
    $b = $d->importNode($n->cloneNode(true), true);
    $d->appendChild($b);
    $h = $d->saveHTML();
    // remove outer tags
    if (!$outer) $h = substr($h, strpos($h, '>') + 1, -(strlen($n->nodeName) + 4));
    return $h;
}
?>

A straightforward solution is to define your own function that returns the outer HTML:
function outerHTML($e) {
    $doc = new DOMDocument();
    $doc->appendChild($doc->importNode($e, true));
    return $doc->saveHTML();
}
Then you can use it in your code:
echo outerHTML($link);

Save the HTML containing the links as links.html, or change "links.html" to a URL (say, google.com/fly.html) whose page has .flv links in it. You can also change "flv" to "wmv" or whatever extension you want hrefs for.
If there are other matching hrefs, it will pick them up as well.
<?php
$contents = file_get_contents("links.html");
$domdoc = new DOMDocument();
$domdoc->preserveWhiteSpace = false;
$domdoc->loadHTML($contents);
$xpath = new DOMXPath($domdoc);
// select every href attribute in the document
$query = '//@href';
$nodeList = $xpath->query($query);
foreach ($nodeList as $node) {
    // strpos() returns 0 for a match at the start, so compare with false
    if (strpos($node->nodeValue, ".flv") !== false) {
        $url = htmlspecialchars($node->nodeValue);
        echo '<a href="' . $url . '">' . $url . '</a><br />';
    }
}
echo "done";
?>
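As an alternative sketch, the XPath query itself can select the a elements whose href contains ".flv", and saveHTML() (PHP 5.3.6+) then returns each one's outer HTML. The inline markup is an invented stand-in for links.html:

```php
<?php
// Sketch of an XPath variant: select <a> elements whose href contains ".flv"
// and print each one's outer HTML. The inline HTML stands in for links.html.
$contents = '<ul><li><a href="clip.flv">Clip</a></li><li><a href="doc.pdf">Doc</a></li></ul>';

$domdoc = new DOMDocument();
libxml_use_internal_errors(true);
$domdoc->loadHTML($contents);
libxml_clear_errors();

$xpath = new DOMXPath($domdoc);
// contains() matches ".flv" anywhere in the attribute value, like a strpos() check
$anchors = $xpath->query('//a[contains(@href, ".flv")]');

foreach ($anchors as $a) {
    echo $domdoc->saveHTML($a), '<br />', PHP_EOL;
}
```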

how to handle DOM in PHP

My PHP code:
$dom = new DOMDocument();
@$dom->loadHTML($file);
$xpath = new DOMXPath($dom);
$tags = $xpath->query('//div[@class="text"]');
foreach ($tags as $tag) {
    echo $tag->textContent;
}
I'm trying to get the content of the div that has class 'text', but when I loop and echo the results I only get the text. I can't get the HTML code with images and all the HTML tags like p, br, img, etc. I also tried $tag->nodeValue, but that didn't work either.

Personally, I like Simple HTML DOM Parser.
include "lib.simple_html_dom.php";
$html = str_get_html($file);
foreach ($html->find('div.text') as $e) {
    echo $e->innertext;
}
Pretty simple, huh? It supports jQuery-like selectors :)

What you need to do is create a temporary document, add the element to that and then use saveHTML():
foreach ($tags as $tag) {
    $doc = new DOMDocument;
    $doc->appendChild($doc->importNode($tag, true));
    $html = $doc->saveHTML();
    echo $html; // includes the outer <div class="text"> tag itself
}

I found this snippet at http://www.php.net/manual/en/class.domelement.php:
<?php
function getInnerHTML($Node)
{
    $Body = $Node->ownerDocument->documentElement->firstChild->firstChild;
    $Document = new DOMDocument();
    $Document->appendChild($Document->importNode($Body, true));
    return $Document->saveHTML();
}
?>
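Not sure if it works, though: note that it never uses $Node's own children and always serializes $Node->ownerDocument->documentElement->firstChild->firstChild.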
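For the inner HTML that the question asks about, a common approach on PHP 5.3.6+ is to serialize each child node with saveHTML() and concatenate the results. A minimal sketch, with innerHTML() as a hypothetical helper name and sample markup standing in for $file:

```php
<?php
// Sketch of an inner-HTML helper (PHP >= 5.3.6): serialize each child node
// of the element and concatenate the results. innerHTML() is our own name.
function innerHTML(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

$file = '<div class="text"><p>Hello <b>world</b></p><img src="a.png"></div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($file);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
foreach ($xpath->query('//div[@class="text"]') as $tag) {
    echo innerHTML($tag); // the div's markup, without the surrounding <div> tag
}
```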

We Keep Coding
