Here is what I am trying to achieve: retrieve all products on a page and put them into an array. Here is the code I am using:
$page2 = curl_exec($ch);
$doc = new DOMDocument();
@$doc->loadHTML($page2);
$nodes = $doc->getElementsByTagName('title');
$noders = $doc->getElementsByClassName('productImage');
$title = $nodes->item(0)->nodeValue;
$product = $noders->item(0)->imageObject.src;
It works for $title but not for the product. For reference, the img tag in the HTML looks like this:
<img alt="" class="productImage" data-altimages="" src="xxxx">
I have been looking at this (PHP DOMDocument how to get element?) but I still don't understand how to make it work.
PS: I get this error:
Call to undefined method DOMDocument::getElementsByclassName()
I finally used the following solution:
$classname="blockProduct";
$finder = new DomXPath($doc);
$spaner = $finder->query("//*[contains(@class, '$classname')]");
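A fuller sketch of the XPath approach, collecting the `src` of every `img` that carries the `productImage` class into an array. It assumes `$html` holds the fetched page (for example the result of `curl_exec()`); the function name `getProductImageSources` and the sample markup are hypothetical. Padding the class attribute with spaces makes the match token-exact, so `productImageLarge` is not picked up, unlike with a plain `contains(@class, ...)`.

```php
<?php
// Hypothetical helper: returns the src of every <img> whose class list
// contains the exact token $className.
function getProductImageSources(string $html, string $className = 'productImage'): array
{
    $doc = new DOMDocument();
    // Real-world HTML often triggers libxml warnings; collect them instead of printing.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // ' a b c ' contains ' b ' only when b is a whole class token.
    // Note: $className is inserted as-is, so it must not contain quotes.
    $query = "//img[contains(concat(' ', normalize-space(@class), ' '), ' $className ')]";

    $sources = [];
    foreach ($xpath->query($query) as $img) {
        $sources[] = $img->getAttribute('src');
    }
    return $sources;
}

$html = '<div><img alt="" class="productImage" data-altimages="" src="a.jpg">'
      . '<img class="big productImage" src="b.jpg">'
      . '<img class="productImageLarge" src="c.jpg"></div>';
print_r(getProductImageSources($html));
```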
https://stackoverflow.com/a/31616848/3068233
Linking this answer as it helped me the most with this problem.
function getElementsByClass(&$parentNode, $tagName, $className) {
$nodes=array();
$childNodeList = $parentNode->getElementsByTagName($tagName);
for ($i = 0; $i < $childNodeList->length; $i++) {
$temp = $childNodeList->item($i);
if (stripos($temp->getAttribute('class'), $className) !== false) {
$nodes[]=$temp;
}
}
return $nodes;
}
There's the code, and here's the usage:
$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadHTML($html);
$content_node=$dom->getElementById("content_node");
$div_a_class_nodes=getElementsByClass($content_node, 'div', 'a');
function getElementsByClassName($dom, $ClassName, $tagName=null) {
if($tagName){
$Elements = $dom->getElementsByTagName($tagName);
}else {
$Elements = $dom->getElementsByTagName("*");
}
$Matched = array();
for($i=0;$i<$Elements->length;$i++) {
if($Elements->item($i)->attributes->getNamedItem('class')){
if($Elements->item($i)->attributes->getNamedItem('class')->nodeValue == $ClassName) {
$Matched[]=$Elements->item($i);
}
}
}
return $Matched;
}
// usage
$dom = new \DOMDocument('1.0');
@$dom->loadHTML($html);
$elementsByClass = getElementsByClassName($dom, $className, 'h1');
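Both helpers above compare the `class` attribute in a way that can misfire: `stripos()` also matches substrings (searching for `a` matches `nav`), while `==` fails as soon as an element has more than one class. A minimal sketch of a token-exact variant that splits the attribute on whitespace; the name `getElementsByClassToken` and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper: elements (optionally limited to $tagName) whose
// class attribute contains $className as a whole whitespace-separated token.
function getElementsByClassToken(DOMDocument $dom, string $className, string $tagName = '*'): array
{
    $matched = [];
    foreach ($dom->getElementsByTagName($tagName) as $element) {
        // getAttribute() returns '' when there is no class, giving an empty list here
        $classes = preg_split('/\s+/', trim($element->getAttribute('class')), -1, PREG_SPLIT_NO_EMPTY);
        if (in_array($className, $classes, true)) {
            $matched[] = $element;
        }
    }
    return $matched;
}

$dom = new DOMDocument();
@$dom->loadHTML('<h1 class="title main">A</h1><h1 class="subtitle">B</h1><h1 class="main">C</h1>');
foreach (getElementsByClassToken($dom, 'main', 'h1') as $h1) {
    echo $h1->nodeValue, "\n"; // prints A, then C
}
```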
I have some code that returns an HTTP 500 error, and I'm a bit confused. I need to extract the weather forecast numbers from a weather website and add them to my site.
Here is the code:
orai_class.php
<?php
Class orai{
var $url;
function generate_orai($url){
$html = file_get_contents($url);
$classname = 'wi wi-1';
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$results = $xpath->query("//*[@class='" . $classname . "']");
$i=0;
foreach($results as $node)
{
if ($results->length > 0) {
$array[] = $results->item($i)->nodeValue;
}
$i++;
}
return $array;
}
}
?>
index.php
<?php
include("orai.class.php");
$orai = new orai();
print_r($orai->generate_orai('https://orai.15min.lt/prognoze/vilnius'));
?>
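The XPath in `generate_orai()` uses `@class='wi wi-1'`, which only matches when the attribute is exactly that string, and `$array` is never initialised, so the method returns `null` (with a notice) when nothing matches. A minimal sketch of a more defensive extraction step that works on an HTML string, so it can be tested offline; the function name `extractByClasses` and the sample markup are hypothetical, and this is not a diagnosis of the HTTP 500 itself.

```php
<?php
// Hypothetical helper: text of every element carrying ALL of the given class tokens.
function extractByClasses(string $html, array $classes): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings from messy markup
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // One token test per class, e.g. for ['wi', 'wi-1'] both must be present, in any order.
    $conditions = [];
    foreach ($classes as $class) {
        $conditions[] = "contains(concat(' ', normalize-space(@class), ' '), ' $class ')";
    }
    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query('//*[' . implode(' and ', $conditions) . ']');

    $values = []; // initialised, so an empty array (not null) comes back when nothing matches
    foreach ($nodes as $node) {
        $values[] = trim($node->nodeValue);
    }
    return $values;
}

$sample = '<span class="wi wi-1">12</span><span class="wi-1 wi extra">14</span><span class="wi">9</span>';
print_r(extractByClasses($sample, ['wi', 'wi-1']));
```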
Thank You.
How do I change the innerHTML of a PHP DOMElement?
Another solution:
1) create new DOMDocumentFragment from the HTML string to be inserted;
2) remove old content of our element by deleting its child nodes;
3) append DOMDocumentFragment to our element.
function setInnerHTML($element, $html)
{
$fragment = $element->ownerDocument->createDocumentFragment();
$fragment->appendXML($html);
while ($element->hasChildNodes())
$element->removeChild($element->firstChild);
$element->appendChild($fragment);
}
Alternatively, we can replace our element with its clean copy and then append DOMDocumentFragment to this clone.
function setInnerHTML($element, $html)
{
$fragment = $element->ownerDocument->createDocumentFragment();
$fragment->appendXML($html);
$clone = $element->cloneNode(); // Get element copy without children
$clone->appendChild($fragment);
$element->parentNode->replaceChild($clone, $element);
}
Test:
$doc = new DOMDocument();
$doc->loadXML('<div><span style="color: green">Old HTML</span></div>');
$div = $doc->getElementsByTagName('div')->item(0);
echo $doc->saveHTML();
setInnerHTML($div, '<p style="color: red">New HTML</p>');
echo $doc->saveHTML();
// Output:
// <div><span style="color: green">Old HTML</span></div>
// <div><p style="color: red">New HTML</p></div>
I needed to do this for a project recently and ended up with an extension to DOMElement: http://www.keyvan.net/2010/07/javascript-like-innerhtml-access-in-php/
Here's an example showing how it's used:
<?php
require_once 'JSLikeHTMLElement.php';
$doc = new DOMDocument();
$doc->registerNodeClass('DOMElement', 'JSLikeHTMLElement');
$doc->loadHTML('<div><p>Para 1</p><p>Para 2</p></div>');
$elem = $doc->getElementsByTagName('div')->item(0);
// print innerHTML
echo $elem->innerHTML; // prints '<p>Para 1</p><p>Para 2</p>'
// set innerHTML
$elem->innerHTML = 'FF';
// print document (with our changes)
echo $doc->saveXML();
?>
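If pulling in an extension class is not an option, reading innerHTML can be approximated with the built-in `DOMDocument::saveHTML()`, which accepts a node argument and serialises just that node. A minimal sketch; the function name `getInnerHTML` is hypothetical.

```php
<?php
// Hypothetical helper: serialise the children of $element (not the element itself).
function getInnerHTML(DOMNode $element): string
{
    $html = '';
    foreach ($element->childNodes as $child) {
        // saveHTML($node) returns the markup of that single node
        $html .= $element->ownerDocument->saveHTML($child);
    }
    return $html;
}

$doc = new DOMDocument();
$doc->loadHTML('<div><p>Para 1</p><p>Para 2</p></div>');
$div = $doc->getElementsByTagName('div')->item(0);
echo getInnerHTML($div);
```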
I think the best thing you can do is write a function that takes the DOMElement whose innerHTML you want to change, copies it, and replaces it.
In very rough PHP:
function replaceElement($el, $newInnerHTML) {
// Create the replacement in the same document that $el belongs to
$newElement = $el->ownerDocument->createElement($el->nodeName, $newInnerHTML);
$el->parentNode->insertBefore($newElement, $el);
$el->parentNode->removeChild($el);
return $newElement;
}
This doesn't take into account attributes and nested structures, but I think this will get you on your way.
I ended up writing this function using a few functions from other people on this page. I mostly changed Joanna Goch's version the way Peter Brand suggested, and also added some code from Guest and from other places.
This function does not use an extension, and it does not use appendXML (which is very picky and fails on even a single unclosed BR tag). It seems to work well.
function set_inner_html( $element, $content ) {
$DOM_inner_HTML = new DOMDocument();
$internal_errors = libxml_use_internal_errors( true );
$DOM_inner_HTML->loadHTML( mb_convert_encoding( $content, 'HTML-ENTITIES', 'UTF-8' ) );
libxml_use_internal_errors( $internal_errors );
$body = $DOM_inner_HTML->getElementsByTagName('body')->item(0);
while ( $element->hasChildNodes() ) {
$element->removeChild( $element->firstChild );
}
// Import the children of <body>, not <body> itself, so no stray <body> tag ends up inside $element
foreach ( $body->childNodes as $child ) {
$element->appendChild( $element->ownerDocument->importNode( $child, true ) );
}
}
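The point about `appendXML()` rejecting an unclosed `<br>` is easy to check: it parses its input as XML and returns `false` on a parse error, while `loadHTML()` accepts the same string. A small sketch contrasting the two; the sample markup is hypothetical.

```php
<?php
$doc = new DOMDocument();
$div = $doc->appendChild($doc->createElement('div'));

libxml_use_internal_errors(true); // keep the parser warnings out of the output

// XML parsing: <br> is never closed, so the fragment is rejected.
$fragment = $doc->createDocumentFragment();
$xmlOk = $fragment->appendXML('<p>line 1<br>line 2</p>');
var_dump($xmlOk); // bool(false)

// HTML parsing: the same string is accepted and <br> is treated as a void element.
$tmp = new DOMDocument();
$htmlOk = $tmp->loadHTML('<p>line 1<br>line 2</p>');
var_dump($htmlOk); // bool(true)
foreach ($tmp->getElementsByTagName('body')->item(0)->childNodes as $child) {
    $div->appendChild($doc->importNode($child, true));
}
echo $doc->saveHTML($div);

libxml_clear_errors();
```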
It seems that appendXML doesn't always work, for example if you try to append XML nested three levels deep. Here is the function I wrote that always works (it sets $content as the innerHTML of $element):
function setInnerHTML($DOM, $element, $content) {
$DOMInnerHTML = new DOMDocument();
$DOMInnerHTML->loadHTML($content);
// Remove the element's current children first
while ($element->hasChildNodes()) {
$element->removeChild($element->firstChild);
}
// Import every top-level node parsed into <body>, not only the first one
foreach ($DOMInnerHTML->getElementsByTagName('body')->item(0)->childNodes as $child) {
$element->appendChild($DOM->importNode($child, true));
}
return $element;
}
Have a look at this library, PHP Simple HTML DOM Parser: http://simplehtmldom.sourceforge.net/
It looks pretty straightforward. You can change the innertext property of your elements. It might help.
Here is a replace-by-class function I just wrote:
It replaces the innerHTML of every element with the given class. You can also specify the node type, e.g. div, p, a, etc.
function replaceInnerHtmlByClass($html, $replace=null, $class=null, $nodeType=null){
if(!$nodeType){ $nodeType = '*'; }
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query("//{$nodeType}[contains(concat(' ', normalize-space(@class), ' '), ' {$class} ')]");
foreach($nodes as $node) {
while($node->childNodes->length){
$node->removeChild($node->firstChild);
}
$fragment = $dom->createDocumentFragment();
$fragment->appendXML($replace);
$node->appendChild($fragment);
}
return $dom->saveHTML($dom->documentElement);
}
Here is another function I wrote to remove nodes with a specific class while preserving their inner HTML.
Setting $replace to true discards the inner HTML.
Setting $replace to any other content replaces the inner HTML with that content.
function stripTagsByClass($html, $class=null, $nodeType=null, $replace=false){
if(!$nodeType){ $nodeType = '*'; }
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query("//{$nodeType}[contains(concat(' ', normalize-space(@class), ' '), ' {$class} ')]");
foreach($nodes as $node) {
$innerHTML = '';
$children = $node->childNodes;
foreach($children as $child) {
$tmp = new DOMDocument();
$tmp->appendChild($tmp->importNode($child,true));
$innerHTML .= $tmp->saveHTML();
}
$fragment = $dom->createDocumentFragment();
if($replace !== null && $replace !== false){
if($replace === true){ $replace = ''; }
$innerHTML = $replace;
}
$fragment->appendXML($innerHTML);
$node->parentNode->replaceChild($fragment, $node);
}
return $dom->saveHTML($dom->documentElement);
}
These functions can easily be adapted to use other attributes as the selector.
I only needed them to evaluate the class attribute.
Developing on from Joanna Goch's answer, this function will insert either a text node or an HTML fragment:
function nodeFromContent($node, $content) {
//creates a text node, or dom node if content contains html
$lt = strpos($content, '<');
$gt = strrpos($content, '>');
if (!($lt === false || $gt === false) && $gt > $lt) {
//< followed by > means potentially contains HTML
$DOMInnerHTML = new DOMDocument();
$DOMInnerHTML->loadHTML($content);
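$body = $DOMInnerHTML->getElementsByTagName('body')->item(0);
// Collect the parsed children in a fragment so the <body> wrapper itself is not inserted
$newNode = $node->ownerDocument->createDocumentFragment();
foreach ($body->childNodes as $child) {
$newNode->appendChild($node->ownerDocument->importNode($child, true));
}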
} else {
$newNode = $node->ownerDocument->createTextNode($content);
}
return $newNode;
}
usage
$newNode = nodeFromContent($node, $content);
$node->parentNode->insertBefore($newNode, $node);
//or $node->appendChild($newNode) depending on what you require
Here is how you do it:
$doc = new DOMDocument();
$li = $doc->appendChild($doc->createElement('li')); // stand-in for the element you want to fill
$label = $doc->createElement('label');
$label->appendChild($doc->createTextNode('test'));
$li->appendChild($label);
echo $doc->saveHTML();
function setInnerHTML($DOM, $element, $innerHTML) {
$node = $DOM->createTextNode($innerHTML);
$element->appendChild($node);
}
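Note that `createTextNode()` inserts the string as literal text, so any markup in it is escaped on output rather than parsed into elements. A quick check; the variable names and sample string are hypothetical.

```php
<?php
$doc = new DOMDocument();
$div = $doc->appendChild($doc->createElement('div'));

// Appended as a text node: the angle brackets become entities when serialised.
$div->appendChild($doc->createTextNode('<b>bold?</b>'));

$out = $doc->saveHTML($div);
echo $out, "\n"; // the <b> tag appears escaped (&lt;b...)
var_dump($div->getElementsByTagName('b')->length); // int(0): no <b> element was created
```

To insert real elements instead, the string has to be parsed first, for example with a document fragment or a temporary DOMDocument as in the other answers.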
I need to print out my array, but print_r($test) at the end doesn't work...
Here is a simplified version of the code:
$code = '<html><head></head><body><div class="list"><img src="http://google.com/564308080517287.jpg" alt="my title"></div></body></html>'; // Code is simplified here, but imagine you've got much more contents inside
$doc = new DOMDocument();
$doc->loadHTML( $code );
//
$test = array();
foreach($doc->getElementsByTagName('div') as $div){
if($div->getAttribute('class') == "list"){
$ads_count = $div->getElementsByTagName('a')->length;
for ($i=0; $i<=$ads_count; $i++) {
$ad = $div->getElementsByTagName('a')->item($i);
$ad_img = trim($ad->getElementsByTagName('img')->item(0)->getAttribute('src'));
$test[$i]['img'] = $ad_img;
}
}
}
print_r($test); // doesn't work !!
Any idea ?
<?php
$code = '<html><head></head><body><div class="list">
<img src="http://google.com/564308080517287.jpg" alt="my title"></div></body></html>'; // Code is simplified here, but imagine you've got much more contents inside
$dom = new DOMDocument();
$dom->loadHtml($code);
$selector = new DOMXPath($dom);
$parceiltable = $selector->query("//div[@class='list']/a/img");
$test = array(); // initialise so print_r() shows an empty array when nothing matches
foreach($parceiltable as $key=>$tds){
$test[]['img'] = $tds->getAttribute('src');
}
print_r($test);
?>
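In the question's loop, `$i <= $ads_count` runs one step past the end: `item($ads_count)` returns `null`, and calling `getElementsByTagName()` on `null` is a fatal error, so `print_r()` is never reached. A sketch of the same loop using `foreach` plus a guard for links without an image; the sample markup (with `<a>` wrappers, which the simplified `$code` lacks) is hypothetical.

```php
<?php
$code = '<html><body><div class="list">'
      . '<a href="/1"><img src="http://example.com/1.jpg" alt="one"></a>'
      . '<a href="/2">no image here</a>'
      . '<a href="/3"><img src="http://example.com/3.jpg" alt="three"></a>'
      . '</div></body></html>';

$doc = new DOMDocument();
@$doc->loadHTML($code);

$test = array();
foreach ($doc->getElementsByTagName('div') as $div) {
    if ($div->getAttribute('class') == 'list') {
        // foreach never asks for an index past the end of the list
        foreach ($div->getElementsByTagName('a') as $ad) {
            $img = $ad->getElementsByTagName('img')->item(0);
            if ($img === null) {
                continue; // skip links that contain no <img>
            }
            $test[] = array('img' => trim($img->getAttribute('src')));
        }
    }
}
print_r($test);
```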
I'm trying to edit HTML tags with DOMDocument::loadHTML in PHP. The HTML data is only part of a page, not the whole page. I followed what this page (PHP - DOMDocument - need to change/replace an existing HTML tag w/ a new one) says.
This should convert pre tags into div tags, but it gives "Fatal error: Uncaught exception 'DOMException' with message 'Not Found Error'."
<?php
$contents = <<<STR
<pre>hi</pre>
<pre>hello</pre>
<pre>bye</pre>
STR;
$dom = new DOMDocument;
@$dom->loadHTML($contents);
foreach( $dom->getElementsByTagName("pre") as $nodePre ) {
$nodeDiv = $dom->createElement("div", $nodePre->nodeValue);
$dom->replaceChild($nodeDiv, $nodePre);
}
echo $dom->saveHTML();
?>
[Edit]
When I try to iterate over the node list backwards, I get this error: 'Notice: Trying to get property of non-object...'
<?php
$contents = <<<STR
<pre>hi</pre>
<pre>hello</pre>
<pre>bye</pre>
STR;
$dom = new DOMDocument;
@$dom->loadHTML($contents);
$domPre = $dom->getElementsByTagName('pre');
$length = $domPre->length;
For ($i = $length; $i > -1 ; $i--) {
$nodePre = $domPre->item($i);
echo $nodePre->nodeValue . '<br />';
// $nodeDiv = $dom->createElement("div", $nodePre->nodeValue);
// $dom->replaceChild($nodeDiv, $nodePre);
}
// echo $dom->saveHTML();
?>
[Edit]
Okay, solved. Since the code in the answer has an error, I'm posting the solution here. Thanks, all.
Solution:
<?php
$contents = <<<STR
<pre>hi</pre>
<pre>hello</pre>
<pre>bye</pre>
STR;
$dom = new DOMDocument;
@$dom->loadHTML($contents);
$domPre = $dom->getElementsByTagName('pre');
$length = $domPre->length;
for ($i = $length - 1; $i > -1 ; $i--) {
$nodePre = $domPre->item($i);
$nodeDiv = $dom->createElement("div", $nodePre->nodeValue);
$nodePre->parentNode->replaceChild($nodeDiv, $nodePre);
}
echo $dom->saveHTML();
?>
The problem is the call to replaceChild(). Rather than
$dom->replaceChild($nodeDiv, $nodePre);
use
$nodePre->parentNode->replaceChild($nodeDiv, $nodePre);
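Update:
Here is working code. There is an issue with replacing multiple nodes while iterating (more info here: http://php.net/manual/en/domnode.replacechild.php): the list returned by getElementsByTagName() is live, so each replacement removes an element from it and shifts the remaining indexes. Use a backwards loop to replace the elements.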
$contents = <<<STR
<pre>hi</pre>
<pre>hello</pre>
<pre>bye</pre>
STR;
$dom = new DOMDocument;
@$dom->loadHTML($contents);
$elements = $dom->getElementsByTagName("pre");
for ($i = $elements->length - 1; $i >= 0; $i --) {
$nodePre = $elements->item($i);
$nodeDiv = $dom->createElement("div", $nodePre->nodeValue);
$nodePre->parentNode->replaceChild($nodeDiv, $nodePre);
}
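Two refinements, sketched under the assumption that nested markup and attributes inside each `<pre>` should survive: copying the live list into a plain array with `iterator_to_array()` first allows an ordinary forward loop, and moving the child nodes (instead of passing `nodeValue` to `createElement()`) keeps any inner elements. The function name `renameElements` and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper: rename every <$from> element to <$to>, keeping attributes and children.
function renameElements(DOMDocument $dom, string $from, string $to): void
{
    // Snapshot the live DOMNodeList so replacements don't shift the indexes.
    $elements = iterator_to_array($dom->getElementsByTagName($from));
    foreach ($elements as $old) {
        $new = $dom->createElement($to);
        foreach ($old->attributes as $attr) {
            $new->setAttribute($attr->nodeName, $attr->nodeValue);
        }
        // Move (not copy) each child, so nested markup is preserved.
        while ($old->firstChild) {
            $new->appendChild($old->firstChild);
        }
        $old->parentNode->replaceChild($new, $old);
    }
}

$dom = new DOMDocument();
@$dom->loadHTML("<pre>hi</pre>\n<pre class=\"x\">hello <b>there</b></pre>\n<pre>bye</pre>");
renameElements($dom, 'pre', 'div');
echo $dom->saveHTML();
```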
Another way, using paquettg/php-html-parser (I didn't find a way to change the tag name, so I had to use a hack that rebinds $this):
use PHPHtmlParser\Dom;
use PHPHtmlParser\Dom\HtmlNode;
$dom = new Dom;
$dom->load($text);
/** @var HtmlNode[] $tags */
foreach($dom->find('pre') as $tag) {
$changeTag = function() {
$this->name = 'div';
};
$changeTag->call($tag->tag);
}
echo (string)$dom;