Getting 'DOMNode' type object in PHP

I am using PHP's DOMDocument class to load an HTML file and then empty its contents. The problem is that when I call removeChild() it gives me a 'Not Found Error'. Here's my code:
$doc=new DOMDocument();
$doc->loadHTMLFile("a.html");
$body= $doc->getElementsByTagName('body')->item(0);
foreach($body->childNodes as $child)
{
$body->removeChild($child);
}
$child is of type DOMText. Maybe that is because removeChild() expects a DOMNode and not a DOMText? If so, how can I iterate over childNodes such that $child is of type DOMNode?

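DOMText extends DOMNode, so the type is not the problem. childNodes is a live list, so removing nodes while walking it forward shifts the remaining nodes and the loop misses some of them. Use a for loop instead of a foreach loop, counting down from the last index so that a removal never moves the nodes still to be visited.
$doc = new DOMDocument();
$doc->loadHTMLFile("a.html");
$body = $doc->getElementsByTagName('body')->item(0);
$children = $body->childNodes;
// Walk backwards: removing item $i leaves items 0..$i-1 where they were.
for ($i = $children->length - 1; $i >= 0; $i--) {
    $body->removeChild($children->item($i));
}
$html = $doc->saveHTML();
echo $html;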
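Another common way to empty an element is to keep removing its first child until none is left, which sidesteps index bookkeeping altogether. A minimal sketch, assuming an inline HTML string in place of a.html (the markup here is made up for illustration):

```php
<?php
// Hypothetical inline markup standing in for the contents of a.html.
$doc = new DOMDocument();
$doc->loadHTML('<html><body><p>one</p>some text<p>two</p></body></html>');
$body = $doc->getElementsByTagName('body')->item(0);

// Remove the first child repeatedly; the loop ends when the body is empty.
while ($body->firstChild !== null) {
    $body->removeChild($body->firstChild);
}

$remaining = $body->childNodes->length;
echo $remaining, "\n";
```

This works for text nodes and element nodes alike, since both are DOMNode instances.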

Related

change nodevalue with all childnodes in domelement

I have HTML code something like this:
<p><i>i_text</i>,p_text</p>
i_text,p_text
I want to change all the node values in this DOMElement and keep all the tags:
i_changed_text,p_changed_text
My attempts:
$html = '<p><i>i_text</i> p_text</p>';
$dom = new DOMDocument();
$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
$dom->validateOnParse = true;
$elements = $dom->getElementsByTagName('*');
foreach ($elements as $element) {
$element->nodeValue = str_replace('_','_changed_',$element->nodeValue);
}
echo($dom->saveHTML());
Output: i_changed_text,p_changed_text
This code returns the correct text but does not keep the child nodes.
$html = '<p><i>i_text</i>,p_text</p>';
$dom = new DOMDocument();
$dom->loadXML($html);
$dom->preserveWhiteSpace = false;
$dom->validateOnParse = true;
$elements = $dom->getElementsByTagName('*');
$elem = $dom->createElement('dfn', 'tag');
$attr = $dom->createAttribute('text');
$attr->value = 'element';
$elem->appendChild($attr);
$elements = $dom->getElementsByTagName('*');
foreach ($elements as $element) {
while ($element->hasChildnodes()) {
$element = $element->childNodes->item(0);
}
$changed_value = str_replace('_','_changed_',$element->nodeValue);
$element->nodeValue = str_replace("tag", $dom->saveXML($elem), $changed_value);
}
echo ($dom->saveXML());
Output:
i_changed_text,p_text
This code keeps the child nodes and changes their values, but does not change the text in the parent node.
My solution:
i_text,p_text,a_text,another one_text
$html = '<p><i>i_text</i>,p_text<b>,a_text</b>,another one_text</p>';
$dom = new DOMDocument();
$dom->loadXML($html);
$dom->preserveWhiteSpace = false;
$dom->validateOnParse = true;
$elements = $dom->getElementsByTagName('*');
foreach ($elements as $element) {
if($element->hasChildnodes()==true && $element->parentNode->nodeName == '#document'){
foreach($element->childNodes as $element_child){
$element_child->nodeValue = str_replace('_','_changed_', $element_child->nodeValue);
}
}
}
echo ($dom->saveXML());
Output:
i_changed_text,p_changed_text,a_changed_text,another one_changed_text
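An alternative that may generalise better to nested markup is to touch only the text nodes, leaving every element in place. The sketch below is not the asker's method; it assumes the same sample markup and uses DOMXPath with the `//text()` expression to select the text nodes:

```php
<?php
$dom = new DOMDocument();
$dom->loadXML('<p><i>i_text</i>,p_text<b>,a_text</b>,another one_text</p>');

// Select every text node in the document; elements are never modified.
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//text()') as $textNode) {
    $textNode->nodeValue = str_replace('_', '_changed_', $textNode->nodeValue);
}

// Serialise only the root element to skip the XML declaration.
$result = $dom->saveXML($dom->documentElement);
echo $result, "\n";
```

Because only text nodes are rewritten, the `<i>` and `<b>` tags survive regardless of how deeply they are nested.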

php to extract data from a website

I want to get all the <p> elements from the first joke, so I wrote this script:
<?php
$url = "http://sms.hindijokes.co";
$html = file_get_contents($url);
$doc = new DOMDocument;
$doc->strictErrorChecking = false;
$doc->recover = true;
@$doc->loadHTML("<html><body>".$html."
</body> </html>");
$xpath = new DOMXPath($doc);
$query1 = "//h2[@class='entry-title']/a";
$query2 = "//div[@class='entry-content']/p";
$entries1 = $xpath->query($query1);
$entries2 = $xpath->query($query2);
$var1 = $entries1->item(0)->textContent;
$var2 = $entries2->item(0)->textContent;
echo "$var1";
echo "<br>";
$f = 5;
for($i = 0; $i < $f; $i++){
echo $entries2->item($i)->textContent."\n";
}
?>
This time I knew that there are five <p> elements in the first joke, but if I want the script to be automated, there will sometimes be more or fewer than five <p> elements, which would break it.
You need only the first div's p elements, so your query would be:
$entries2 = $xpath->query("(//div[@class='entry-content'])[1]/p");
Now you can iterate over all the p elements with a foreach loop (extracting their HTML contents):
$innerHtml = '';
foreach ($entries2 as $entry) {
$children = $entry->childNodes;
foreach ($children as $child) {
$innerHtml .= $child->ownerDocument->saveXML($child);
}
}
$innerHtml = str_replace(["\r\n", "\r", "\n", "\t"], '', $innerHtml);
DOMXPath::query() returns a DOMNodeList object. Use the DOMNodeList::length property instead of a hard-coded count:
$f = $entries2->length;
Try it this way: it loops until item() returns null. Some jokes have multiple p tags, though, so it is better to find them by your own custom class/id.
$i = 0;
// item() returns null past the end of the list, so test the node itself.
while (($entry = $entries2->item($i)) !== null) {
    echo "<br>";
    echo $i." ".$entry->textContent;
    $i++;
}
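To see the first-div query and the length property working together without hitting the network, here is a small sketch. The markup is invented for illustration and only mimics the `entry-content` class used above; the real page structure may differ.

```php
<?php
// Hypothetical markup: two "jokes", the first with three paragraphs.
$html = '<div class="entry-content"><p>a</p><p>b</p><p>c</p></div>'
      . '<div class="entry-content"><p>d</p></div>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML('<html><body>' . $html . '</body></html>');

// The parentheses make [1] apply to the whole set of matching divs.
$xpath = new DOMXPath($doc);
$paragraphs = $xpath->query("(//div[@class='entry-content'])[1]/p");

$texts = [];
foreach ($paragraphs as $p) {
    $texts[] = $p->textContent;
}
echo $paragraphs->length, ': ', implode(', ', $texts), "\n";
```

With foreach there is no need to know the number of paragraphs in advance; `length` is only needed if you prefer an index-based loop.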

foreach ends before getting through all elements in SimpleXMLElement

I loop over a big XML document and need to remove some nodes. Unfortunately, my foreach breaks after the first removal. Why is that?
$ids = [1, 2];
$data=<<<DNS_TXT
<feed xmlns:g="http://base.google.com/ns/1.0">
<entry><g:id>1</g:id><description>Desc 1</description></entry>
<entry><g:id>2</g:id><description>Desc 2</description></entry>
<entry><g:id>3</g:id><description>Desc 3</description></entry>
<entry><g:id>4</g:id><description>Desc 4</description></entry>
<entry><g:id>5</g:id><description>Desc 5</description></entry>
</feed>
DNS_TXT;
$doc = new SimpleXMLElement($data);
$i = 0;
foreach($doc->entry as $entry)
{
$i++;
$dom = $entry->children('http://base.google.com/ns/1.0');
if(!in_array($dom->id, $ids)) {
$dom = dom_import_simplexml($entry);
$dom->parentNode->removeChild($dom);
}
}
echo $i;
The result is 3 instead of 5...
Of course I can do this:
/.../
$toRemove = array();
foreach($doc->entry as $entry)
{
$dom = $entry->children('http://base.google.com/ns/1.0');
if(!in_array($dom->id, $ids)) {
$dom = dom_import_simplexml($entry);
$toRemove[] = $dom;
}
}
foreach ($toRemove as $dom) {
$dom->parentNode->removeChild($dom);
}
/.../
But why does foreach end in the first case?
In cases like this it's better to loop from the highest index down to the lowest, so that removing an entry does not affect the entries still to be visited. This works:
$entries = $doc->entry;
for($i = count($entries)-1; $i >= 0; $i--)
{
    $entry = $entries[$i];
    $dom = $entry->children('http://base.google.com/ns/1.0');
    if(!in_array($dom->id, $ids)) {
        unset($doc->entry[$i]);
    }
}
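For reference, here is the reverse loop as a complete script that can be run on its own. It is a sketch that reuses the feed from the question (descriptions dropped for brevity) and adds a hypothetical `$visited` counter to confirm that every entry is seen:

```php
<?php
$ids = [1, 2];
$data = <<<XML
<feed xmlns:g="http://base.google.com/ns/1.0">
<entry><g:id>1</g:id></entry>
<entry><g:id>2</g:id></entry>
<entry><g:id>3</g:id></entry>
<entry><g:id>4</g:id></entry>
<entry><g:id>5</g:id></entry>
</feed>
XML;

$doc = new SimpleXMLElement($data);
$visited = 0;

// Count down so that removing entry $i never shifts an unvisited entry.
for ($i = count($doc->entry) - 1; $i >= 0; $i--) {
    $visited++;
    $id = (int) $doc->entry[$i]->children('http://base.google.com/ns/1.0')->id;
    if (!in_array($id, $ids, true)) {
        unset($doc->entry[$i]);
    }
}

$left = count($doc->entry);
echo "visited $visited, left $left\n";
```

Casting the id to int and passing `true` to in_array() is a choice made here to avoid loose comparisons against a SimpleXMLElement; the original code relies on the loose form.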

concatenate innerhtml of div into string variable

I tried to concatenate the innerHTML of a div into a string variable.
The games variable:
$games = '';
DOMinnerHTML function:
function DOMinnerHTML($element)
{
$innerHTML = "";
$children = $element->childNodes;
foreach ($children as $child)
{
$tmp_dom = new DOMDocument();
$tmp_dom->appendChild($tmp_dom->importNode($child, true));
$innerHTML.=trim($tmp_dom->saveHTML());
}
return $innerHTML;
}
ExtractFromType function:
function ExtractFromType($type)
{
$html = file_get_contents('www.site.com/' .$type);
$dom = new domDocument;
@$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
$divs = $dom->getElementsByTagName('div');
foreach ($divs as $div) {
if (strpos($div->getAttribute('style'),'MyString') !== false) {
//////
$games = $games.DOMinnerHTML($div);
//////
}
}
}
code:
ExtractFromType('MyType');
echo $games; // = Nothing.
This code returns nothing.
$games is defined in the global scope, so it's not available inside ExtractFromType. Define it inside the function, then return the value:
function ExtractFromType($type) {
    $html = file_get_contents('www.site.com/' .$type);
    $dom = new domDocument;
    @$dom->loadHTML($html);
    $dom->preserveWhiteSpace = false;
    $divs = $dom->getElementsByTagName('div');
    $games = '';
    foreach ($divs as $div) {
        if (strpos($div->getAttribute('style'),'MyString') !== false) {
            $games = $games.DOMinnerHTML($div);
        }
    }
    // Hand the accumulated markup back to the caller.
    return $games;
}
echo ExtractFromType('MyType');
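The scoping rule can be seen in isolation with a couple of tiny functions. This is only an illustrative sketch; the function names `appendWithoutReturn` and `buildAndReturn` are made up and unrelated to the code above.

```php
<?php
$games = ''; // global variable

// Writes to a local $games; the global one is untouched.
function appendWithoutReturn(string $piece): void
{
    $games = '';
    $games .= $piece;
}

// Builds the string locally and returns it to the caller.
function buildAndReturn(array $pieces): string
{
    $games = '';
    foreach ($pieces as $piece) {
        $games .= $piece;
    }
    return $games;
}

appendWithoutReturn('<b>x</b>');
$result = buildAndReturn(['<b>x</b>', '<i>y</i>']);
echo '[', $games, '] [', $result, "]\n";
```

Returning the value keeps the function independent of global state, which tends to be easier to test than reaching for the `global` keyword.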

PHP: DomElement->getAttribute

How can I get all the attributes of an element? In my example below I can only get one at a time; I want to pull out all of the anchor tag's attributes.
$dom = new DOMDocument();
@$dom->loadHTMLFile("http://www.example.com");
$a = $dom->getElementsByTagName("a");
echo $a->getAttribute('href');
thanks!
// $a must be a single DOMElement here, e.g. $dom->getElementsByTagName("a")->item(0)
$length = $a->attributes->length;
$attrs = array();
for ($i = 0; $i < $length; ++$i) {
    $name = $a->attributes->item($i)->name;
    $value = $a->getAttribute($name);
    $attrs[$name] = $value;
}
print_r($attrs);
"Inspired" by Simon's answer. I think you can cut out the getAttribute call, so here's a solution without it:
$attrs = array();
for ($i = 0; $i < $a->attributes->length; ++$i) {
$node = $a->attributes->item($i);
$attrs[$node->nodeName] = $node->nodeValue;
}
var_dump($attrs);
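getElementsByTagName() returns a DOMNodeList, not a single element, so loop over it and call getAttribute() on each element:
$a = $dom->getElementsByTagName("a");
foreach($a as $element)
{
    echo $element->getAttribute('href');
}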
$html = $data['html'];
$class = '';
if(!empty($html)){
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $datadom = $doc->getElementsByTagName("input");
    foreach($datadom as $element)
    {
        // Collect the class attribute of every input element.
        $class = $class." ".$element->getAttribute('class');
    }
}
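Putting the pieces from these answers together, the following sketch collects every attribute of every anchor in a document. It assumes an inline HTML string (invented for illustration) and iterates `$link->attributes` directly with foreach rather than by index:

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<html><body><a href="/home" class="nav" title="Home">Home</a></body></html>');

$attrsByLink = [];
foreach ($dom->getElementsByTagName('a') as $link) {
    $attrs = [];
    // Each item in the attribute map is a DOMAttr node.
    foreach ($link->attributes as $attr) {
        $attrs[$attr->nodeName] = $attr->nodeValue;
    }
    $attrsByLink[] = $attrs;
}
print_r($attrsByLink);
```

The result is one associative array per anchor, keyed by attribute name.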
