PHP DOMDocument: Get attribute value from id - php

I would like to extract the value of the attribute "value" using the id tag.
My code:
<?php
$url = 'http://turni.tt-contact.com/Default.aspx';
$contents = htmlentities(file_get_contents($url));
echo $contents."\n"; //html
$dom = new DOMDocument;
$dom->validateOnParse = true;
$dom->loadHTML($contents);
$dom->preserveWhiteSpace = false;
$data = $dom->getElementById("__VIEWSTATE");
echo $data->nodeValue;
?>
I would like the attribute "value" -> "THIS":
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="THIS">
but the code returns only the html code.
What do I need to change?
Also by modifying it to:
$xpath = new DOMXpath($dom);
$data = $xpath->query('//input[@id="__VIEWSTATE"]');
$node = $data->item(0);
echo $node->getAttribute('value');
I get this error:
Fatal error: Call to a member function getAttribute() on null

Try this:
echo $data->getAttribute('value');
See PHP: DOMElement->getAttribute. To dump every attribute of the element:
$attrs = array();
for ($i = 0; $i < $data->attributes->length; ++$i){
$node = $data->attributes->item($i);
$attrs[$node->nodeName] = $node->nodeValue;
}
var_dump($attrs);

Don't use htmlentities(): it converts the document's HTML tags from <html> to &lt;html&gt;, so your document is no longer HTML, just plain text full of &lt; and &gt;, and the methods that look up nodes won't work.
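
To illustrate the point above, here is a minimal sketch that parses the raw markup and reads the attribute. The inline $contents string is a stand-in for the fetched page (an assumption for the example, not the real response), and the instanceof check guards against the "on null" error from the question.

```php
<?php
// Stand-in for the fetched page; the question obtains it with file_get_contents($url).
$contents = '<html><body><form>'
    . '<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="THIS">'
    . '</form></body></html>';

$dom = new DOMDocument;
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom->loadHTML($contents);          // raw markup, no htmlentities()
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$node = $xpath->query('//input[@id="__VIEWSTATE"]')->item(0);

// item(0) returns null when nothing matched, so check before calling getAttribute().
$value = $node instanceof DOMElement ? $node->getAttribute('value') : null;
echo $value, "\n";
```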

Related

Change outerHTML of a php DOMElement?

How do I change the outerHTML of an element using the PHP DOMDocument class? No third-party library, such as Simple PHP Dom, should be used.
For example:
I want to do something like this.
$dom = new DOMDocument;
$dom->loadHTML($html);
$tag = $dom->getElementsByTagName('h3');
foreach ($tag as $e) {
$e->outerHTML = '<h5>Hello World</h5>';
}
libxml_clear_errors();
$html = $dom->saveHTML();
echo $html;
And the output should be like this:
Old Output: <h3>Hello World</h3>
But I need this new output: <p>Hello World</p>
You can copy the element's content and attributes into a new node (with the new name you need) and swap it in with replaceChild().
The code below works only for simple elements (text inside a node); if you have nested elements, you will need to write a recursive function.
$dom = new DOMDocument;
$dom->loadHTML($html);
$titles = $dom->getElementsByTagName('h3');
for($i = $titles->length-1 ; $i >= 0 ; $i--)
{
$title = $titles->item($i);
$titleText = $title->textContent ; // get original content of the node
$newTitle = $dom->createElement('h5'); // create a new node with the correct name
$newTitle->textContent = $titleText ; // copy the content of the original node
// copy the attribute (class, style, ...)
$attributes = $title->attributes ;
for($j = $attributes->length-1 ; $j>= 0 ; --$j)
{
$attributeName = $attributes->item($j)->nodeName ;
$attributeValue = $attributes->item($j)->nodeValue ;
$newAttribute = $dom->createAttribute($attributeName);
$newAttribute->nodeValue = $attributeValue ;
$newTitle->appendChild($newAttribute);
}
$title->parentNode->replaceChild($newTitle, $title); // replace the original node with our copy
}
libxml_clear_errors();
$html = $dom->saveHTML();
echo $html;
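
For nested elements, one possible alternative to a recursive copy is to move the existing child nodes into the new element. The sketch below assumes a small sample $html string (made up for the example) and relies on appendChild() detaching a node from its current parent before attaching it.

```php
<?php
// Sample input with a nested element (hypothetical, for illustration only).
$html = '<div><h3 class="title">Hello <em>World</em></h3></div>';

$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

// Snapshot the live node list first, because replacing nodes changes it.
foreach (iterator_to_array($dom->getElementsByTagName('h3')) as $old) {
    $new = $dom->createElement('h5');

    // Copy the attributes (class, style, ...).
    foreach ($old->attributes as $attr) {
        $new->setAttribute($attr->nodeName, $attr->nodeValue);
    }

    // Move the children across; each appendChild() removes the node from $old.
    while ($old->firstChild) {
        $new->appendChild($old->firstChild);
    }

    $old->parentNode->replaceChild($new, $old);
}

$result = $dom->saveHTML();
echo $result;
```

Because the children are moved rather than re-created, nested markup such as the <em> element is kept as-is.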

DomXPath->query do not get HTML tags

I'm getting the content from a DOMDocument and using query to get a certain part of the DOM.
I get the content perfectly but not the HTML tags, only pure text.
Here is the code I use:
$DOM = file_get_contents($link);
$doc = new DOMDocument();
$doc->preserveWhiteSpace = false;
$doc->loadHTML($DOM);
$finder = new DomXPath($doc);
$founds = 0;
$nodes = $finder->query("//div[contains(@class, 'td-post-content td-pb-padding-side')]");
foreach ( $nodes as $entrada ) {
if ( $founds == 0 ) {
$description = $entrada->nodeValue;
} $founds++;
}
When I echo $description, HTML tags like <p>, <div> and so on are not included.
What's wrong?
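
No answer is recorded here, but the likely cause is that nodeValue returns only the text content of a node. A minimal sketch of one way to keep the markup is to serialize the node with DOMDocument::saveHTML($node); the sample $html string below is an assumption standing in for the fetched page.

```php
<?php
// Stand-in for the downloaded page (hypothetical content).
$html = '<div class="td-post-content td-pb-padding-side"><p>First</p><div>Second</div></div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$finder = new DOMXPath($doc);
$nodes = $finder->query("//div[contains(@class, 'td-post-content td-pb-padding-side')]");

$description = '';
$first = $nodes->item(0);   // only the first match is needed, as in the question
if ($first !== null) {
    // saveHTML($first) would give the outer HTML, wrapper included.
    // Serializing each child instead gives the inner HTML.
    foreach ($first->childNodes as $child) {
        $description .= $doc->saveHTML($child);
    }
}
echo $description, "\n";
```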

PHP DOM Document not parsing / retrieving HTML

I wrote the following:
<?php
$str = 'http://stackoverflow.com';
$DOM = new DOMDocument;
$DOM->loadHTML($str);
//get all H1
$items = $DOM->getElementsByTagName('h1');
//display all H1 text
for ($i = 0; $i < $items->length; $i++)
{
echo $items->item($i)->nodeValue . "<br/>";
}
?>
And just wanted to simply retrieve all the H1 elements of stackoverflow, but can't get it working. Whenever I try filling in the variable $str manually (for example: <h1>hello</h1><div><h1>hello2</h1></div>) it is working. But whenever I try to parse content from another webpage it is not doing anything at all...
Help would be appreciated!
$str = 'http://stackoverflow.com';
$DOM = new DOMDocument;
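$DOM->loadHTMLFile($str); // loadHTML() expects markup, not a URL; loadHTMLFile() fetches and parses it
echo $DOM->saveHTML(); // echo the HTML
$DOM->saveHTMLFile(FILE_NAME); // save the HTML to a file (FILE_NAME is a placeholder for your path)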

PHP XPath to change stylesheet

I use XPath to change the href of the stylesheet <link> in the header.
But it doesn't work at all.
$html=file_get_contents('http://stackoverflow.com');
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$css_links = $xpath->evaluate("//link[@type='text/css']");
for ($i = 0; $i < $css_links->length; $i++)
{
$csslink = $css_links->item($i);
$oldurl = $csslink->getAttribute('href');
$newURL='http://example.com/aaaa.css';
$csslink->removeAttribute('href');
$csslink->setAttribute('href', $newURL);
}
echo $html;
You're using @$doc->loadHTML(html); instead of @$doc->loadHTML($html); (note the $); otherwise it works.
Also use echo $doc->saveHTML() instead of echoing $html, because $html still holds the original, unmodified string.
You can also replace for ($i...) with foreach, because DOMNodeList implements Traversable:
foreach ($css_links as $csslink)
{
$oldurl = $csslink->getAttribute('href');
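
Putting the suggestions above together, a complete version might look like the sketch below. The inline $html string is an assumption used in place of the downloaded page, and the example relies on setAttribute() overwriting an existing value.

```php
<?php
// Stand-in for file_get_contents('http://stackoverflow.com') (hypothetical content).
$html = '<html><head>'
    . '<link rel="stylesheet" type="text/css" href="old.css">'
    . '</head><body><p>Hi</p></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
foreach ($xpath->query("//link[@type='text/css']") as $csslink) {
    // setAttribute() replaces the old value, so removeAttribute() is not needed first.
    $csslink->setAttribute('href', 'http://example.com/aaaa.css');
}

// Serialize the modified tree; the original $html string is unchanged.
$output = $doc->saveHTML();
echo $output;
```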

Replace Tag in HTML with DOMDocument

I'm trying to edit html tags with DOMDocument::loadHTML in php. The html data is a part of html and not the whole page. I followed what this page (PHP - DOMDocument - need to change/replace an existing HTML tag w/ a new one) says.
This should convert pre tags into div tags but it gives "Fatal error: Uncaught exception 'DOMException' with message 'Not Found Error'."
<?php
$contents = <<<STR
<pre>hi</pre>
<pre>hello</pre>
<pre>bye</pre>
STR;
$dom = new DOMDocument;
@$dom->loadHTML($contents);
foreach( $dom->getElementsByTagName("pre") as $nodePre ) {
$nodeDiv = $dom->createElement("div", $nodePre->nodeValue);
$dom->replaceChild($nodeDiv, $nodePre);
}
echo $dom->saveHTML();
?>
[Edit]
While I'm trying to iterate the node object backwards, I get this error, 'Notice: Trying to get property of non-object...'
<?php
$contents = <<<STR
<pre>hi</pre>
<pre>hello</pre>
<pre>bye</pre>
STR;
$dom = new DOMDocument;
@$dom->loadHTML($contents);
$domPre = $dom->getElementsByTagName('pre');
$length = $domPre->length;
For ($i = $length; $i > -1 ; $i--) {
$nodePre = $domPre->item($i);
echo $nodePre->nodeValue . '<br />';
// $nodeDiv = $dom->createElement("div", $nodePre->nodeValue);
// $dom->replaceChild($nodeDiv, $nodePre);
}
// echo $dom->saveHTML();
?>
[Edit]
OK, solved. Since the code in the answer has an error, I'm posting the solution here. Thanks, all.
Solution:
<?php
$contents = <<<STR
<pre>hi</pre>
<pre>hello</pre>
<pre>bye</pre>
STR;
$dom = new DOMDocument;
@$dom->loadHTML($contents);
$domPre = $dom->getElementsByTagName('pre');
$length = $domPre->length;
For ($i = $length - 1; $i > -1 ; $i--) {
$nodePre = $domPre->item($i);
$nodeDiv = $dom->createElement("div", $nodePre->nodeValue);
$nodePre->parentNode->replaceChild($nodeDiv, $nodePre);
}
echo $dom->saveHTML();
?>
The problem is the call to replaceChild(). Rather than
$dom->replaceChild($nodeDiv, $nodePre);
use
$nodePre->parentNode->replaceChild($nodeDiv, $nodePre);
Update:
Here is working code. There seems to be an issue with replacing multiple nodes in a forward loop (more info here: http://php.net/manual/en/domnode.replacechild.php), so you'll have to loop in reverse to replace the elements.
$contents = <<<STR
<pre>hi</pre>
<pre>hello</pre>
<pre>bye</pre>
STR;
$dom = new DOMDocument;
@$dom->loadHTML($contents);
$elements = $dom->getElementsByTagName("pre");
for ($i = $elements->length - 1; $i >= 0; $i --) {
$nodePre = $elements->item($i);
$nodeDiv = $dom->createElement("div", $nodePre->nodeValue);
$nodePre->parentNode->replaceChild($nodeDiv, $nodePre);
}
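
The reverse loop is needed because getElementsByTagName() returns a live list that shrinks as nodes are replaced. As an alternative sketch, the list can be copied into a plain array first and then walked forwards. The use of createTextNode() here is a choice made for this example, since the value passed to createElement() is not escaped.

```php
<?php
$contents = "<pre>hi</pre>\n<pre>hello</pre>\n<pre>bye</pre>";

$dom = new DOMDocument;
libxml_use_internal_errors(true);   // used instead of @ to silence parser warnings
$dom->loadHTML($contents);
libxml_clear_errors();

// Copy the live DOMNodeList into a fixed array before modifying the tree.
$snapshot = iterator_to_array($dom->getElementsByTagName('pre'));

foreach ($snapshot as $nodePre) {
    $nodeDiv = $dom->createElement('div');
    // createTextNode() escapes special characters such as & in the text.
    $nodeDiv->appendChild($dom->createTextNode($nodePre->nodeValue));
    $nodePre->parentNode->replaceChild($nodeDiv, $nodePre);
}

$result = $dom->saveHTML();
echo $result;
```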
Another way with paquettg/php-html-parser (didn't find the way to change name, so had to use hack with re-binding $this):
use PHPHtmlParser\Dom;
use PHPHtmlParser\Dom\HtmlNode;
$dom = new Dom;
$dom->load($text);
/** @var HtmlNode[] $tags */
foreach($dom->find('pre') as $tag) {
$changeTag = function() {
$this->name = 'div';
};
$changeTag->call($tag->tag);
};
echo (string)$dom;
