DOMDocument: remove a div and its content by identifier with PHP

Hi, I want to remove an element from an HTML file with PHP.
For example, given this:
<div id="buttons">
<div id="buttonid_4">Button 4</div>
<div id="buttonid_3">Button 3</div>
<div id="buttonid_2">Button 2</div>
<div id="buttonid_1">Button 1</div>
</div>
So I want to remove buttonid_4 and its content,
so that the result looks like this:
<div id="buttons">
<div id="buttonid_3">Button 3</div>
<div id="buttonid_2">Button 2</div>
<div id="buttonid_1">Button 1</div>
</div>
At first I thought it would be easy, but I can't find the answer :|
I tried the "simple" approach:
$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadHTMLFile($The_Path_For_File);
$element = $dom->getElementById('buttonid_'. $Button_Id);
$element->parentNode->removeChild($element);
$dom->saveHTMLFile($The_Path_For_File);
I got
Call to a member function removeChild() on a non-object
every time I tried getElementById, so I continued with XPath:
$xpath = new DOMXpath($dom);
$nodeList = $xpath->query('//div[@id="buttonid'.$Button_Id.'"]');
foreach($nodeList as $element){
$dom->$element->removeChild($element);
}
$dom->saveHTMLFile($The_Path_For_File);
I didn't get an error, and my editor asked to reload the file, but nothing had changed.
Does anyone know how to do this?

The use of getElementById requires a Document Type Declaration (DTD).
From the PHP documentation:
For this function to work, you will need either to set some ID
attributes with DOMElement::setIdAttribute or a DTD which defines an
attribute to be of type ID. In the latter case, you will need to
validate your document with DOMDocument::validate or
DOMDocument::$validateOnParse before using this function.
Notice that your HTML fails validation with $dom->validate().
Just add <!DOCTYPE html> to your HTML and it will work.
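If changing the file's doctype is not an option, an XPath lookup on the id attribute's value might be a workable alternative, since it does not depend on the attribute being ID-typed. A minimal sketch, assuming the markup from the question is held in a string ($html and $buttonId are illustrative names, not from the original post):

```php
<?php
// Sample markup from the question; $buttonId stands in for the asker's $Button_Id.
$html = '<div id="buttons">'
      . '<div id="buttonid_4">Button 4</div>'
      . '<div id="buttonid_3">Button 3</div>'
      . '</div>';
$buttonId = 4;

$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
// Compare the id attribute as a plain string, so no DTD is involved.
$nodes = $xpath->query('//div[@id="buttonid_' . (int) $buttonId . '"]');

// Copy the node list first so removals cannot disturb the iteration.
foreach (iterator_to_array($nodes) as $node) {
    // Removal always goes through the node's own parent.
    $node->parentNode->removeChild($node);
}

$result = $dom->saveHTML();
echo $result;
```

Note that the removal goes through $node->parentNode, and that the id in the query includes the underscore ("buttonid_"), which the XPath attempt in the question leaves out.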


Is there any way in php to select all classes that contain the same word

I would like to know if there is any way, in PHP, to match all classes that contain the same word.
Example:
<div class="classeby_class">
<div class="classos-nope">
<div class="row">
<div class="class-show"></div>
</div>
</div>
</div>
<div class="class-first-one">
<div class="container">
<div class="classes-show">
<div class="class"></div>
<div class="classing"></div>
</div>
</div>
</div>
In the example above, I would like to match all divs whose class contains the word "class", but not those whose class only contains longer words such as "classes".
That is,
positive for
<div class="class-show">...</div>
<div class="class-first-one">...</div>
<div class="class">...</div>
<div class="class-first-one">...</div>
but negative for
<div class="classeby_class">...</div>
<div class="classes-show">...</div>
<div class="classing">...</div>
I am using PHP to display several different HTML pages.
Regex would not be an appropriate method, first because of the many line breaks in the markup and second because of hosting limitations, so I'm trying to do this with a parser.
All HTML code is stored on the server.
I can eliminate elements with a specific class using the example below.
$doc = new DomDocument();
$xpath = new DOMXPath($doc);
$classtoremove = $xpath->query('//div[contains(@class,"class")]');
foreach($classtoremove as $classremoved){
$classremoved->parentNode->removeChild($classremoved);
}
echo $doc->saveHTML();
I know there are CSS selectors, but when I try to use them in PHP they don't work, possibly because I'm using XPath.
Example:
'[id*="class"],[class*="class"]'
Still, I think that selector would match more values than I need.
Is there any way to get these values with XPath?
The intent is to completely remove the div or other tags that contain that word.
You could use a regex with word boundaries, \bclass\b, on the class attribute and make use of DOMXPath::registerPhpFunctions.
For example
$data = <<<DATA
<div class="classeby_class">
<div class="classos-nope">
<div class="row">
<div class="class-show"></div>
</div>
</div>
</div>
<div class="class-first-one">
<div class="container">
<div class="classes-show">
<div class="class"></div>
<div class="classing"></div>
</div>
</div>
</div>
DATA;
$doc = new DomDocument();
$doc->loadHTML($data);
$xpath = new DOMXPath($doc);
$xpath->registerNamespace("php", "http://php.net/xpath");
$xpath->registerPHPFunctions();
$classtoremove = $xpath->query("//div[1 = php:function('preg_match', '/\bclass\b/', string(@class))]");
foreach ($classtoremove as $a) {
var_dump($a->getAttribute("class"));
}
Output
string(10) "class-show"
string(15) "class-first-one"
string(5) "class"
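Since the question asks to remove the matching elements rather than list them, the same query could be combined with removeChild. This is only a sketch on a shortened version of the sample markup; the variable names are illustrative, and passing 'preg_match' to registerPhpFunctions is an optional restriction that limits which PHP functions the XPath expression may call:

```php
<?php
// Shortened sample: "class" and "class-show" should go, the rest should stay.
$html = '<div class="classes-show">'
      . '<div class="class"></div>'
      . '<div class="classing"></div>'
      . '</div>'
      . '<div class="class-show"></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);
$xpath->registerNamespace('php', 'http://php.net/xpath');
// Only allow preg_match to be called from XPath expressions.
$xpath->registerPhpFunctions('preg_match');

$query = "//div[1 = php:function('preg_match', '/\\bclass\\b/', string(@class))]";

// Copy the result list, then detach every match from its parent.
foreach (iterator_to_array($xpath->query($query)) as $node) {
    if ($node->parentNode !== null) {
        $node->parentNode->removeChild($node);
    }
}

// Collect the class names of the divs that are still in the document.
$remaining = [];
foreach ($xpath->query('//div') as $div) {
    $remaining[] = $div->getAttribute('class');
}
print_r($remaining);
```

Keep in mind that \b treats "_" as a word character, which is why "classeby_class" is not matched, while "-" counts as a boundary.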

Finding Child Elements of Div containing specific string [duplicate]

This question already has answers here:
Get DOMElement with specific text PHP / XPath
(2 answers)
Closed 1 year ago.
I am trying to find all child elements of a div that contains a specific string. For example, in the following HTML content, I need to find all child elements of the "Trees" div, including the <div>Trees pair. There are no classes or IDs associated with each div, so I can't search for IDs or classes.
I tried the following code, using an answer from https://stackoverflow.com/a/55989111/1466973 , but the expected content was not returned by the function.
<?php
$html_text = "
<html>
<div>Grass
<div>Good grass
<div>Grass 1</div>
<div>Grass 2</div>
<div>Grass 3</div>
</div>
<div>Weeds
<div>Weeds 2</div>
<div>Weeds 3</div>
<div>Weeds 4</div>
</div>
</div>
<div>Trees
<div>Good Trees
<div>Tree 1</div>
<div>Tree 2</div>
<div>Tree 3</div>
</div>
<div>Tall Trees
<div>Tree 11</div>
<div>Tree 12</div>
<div>Tree 13</div>
</div>
</div>
<div>Fruit
<div>Red
<div>Fruit 1</div>
<div>Fruit 2</div>
<div>Fruit 31</div>
</div>
</div>
</html> ";
echo find_content($html_text); // this should be only the content of the div containing "Trees"
// tried this solution from https://stackoverflow.com/a/55989111/1466973 , didn't work
function find_trees($html_text = "") {
$dom = new DOMDocument();
$dom->loadHTML($html_text);
$xpath = new DOMXpath($dom);
$res = $xpath->document->documentElement->textContent;
$textNodes = explode(PHP_EOL, $res);
$trees_html = "";
foreach ($textNodes as $key => $text) {
if ($text == 'Trees') {
$trees_html .= $textNodes[$key + 1];
break;
}
}
"end of this function<br>";
return $trees_html;
}
Try it this way and see if it works.
Since you are using DOMDocument to parse the HTML, you might as well use its XPath support to specify, succinctly, what you are looking for:
$target = $xpath->query("//div[contains(.,'Trees')]");
That's it. The rest is just a method to output to screen the string representation, in XML format, of what you have located:
$trees = $target[0]->ownerDocument->saveXML($target[0]);
echo $trees;
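For completeness, here is a self-contained sketch of the same idea on a shortened version of the markup. It uses a slightly stricter test than contains(): it compares only the div's first text node to 'Trees', so nested divs such as "Good Trees" are not matched on their own. The variable names and the stricter predicate are illustrative assumptions, not part of the answer above:

```php
<?php
// Shortened version of the question's markup.
$html = '<div>Grass<div>Grass 1</div></div>'
      . '<div>Trees<div>Good Trees<div>Tree 1</div></div></div>'
      . '<div>Fruit<div>Fruit 1</div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Match a div whose own first text node, trimmed, is exactly "Trees".
$target = $xpath->query("//div[normalize-space(text()[1]) = 'Trees']");

$treesHtml = '';
if ($target->length > 0) {
    // Serialize the matched div together with all of its descendants.
    $treesHtml = $dom->saveHTML($target->item(0));
}
echo $treesHtml;
```

From the matched node, $target->item(0)->childNodes would give the direct children if they are needed individually.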

Find next Div Class name after parent div class name, using php/dom/xpath?

I have a known div class name and can retrieve its inner HTML without problems, but how would I retrieve the class name of the next div (the following sibling, not a div inside the known one) using PHP, DOMDocument and XPath?
For example with the code below, if I know the Div class "mobile-container mobile-filter-container", how would I return "mobile-container mobile-cart-content-container"?
<div class="mobile-container mobile-filter-container">
<div class="mobile-wrapper-header"></div>
<div class="mobile-filter-wrapper"></div>
</div>
<div class="mobile-container mobile-cart-content-container">
<div class="mobile-wrapper-header">
Thanks,
I believe this should get you close enough to what you need:
$data = <<<DATA
<html>
<div class="mobile-container mobile-filter-container">
<div class="mobile-wrapper-header"></div>
<div class="mobile-filter-wrapper"></div>
</div>
<div class="mobile-container mobile-cart-content-container">
<div class="mobile-wrapper-header"></div>
<div class="mobile-filter-wrapper"></div>
</div>
<div class="unwanted">
<div class="mobile-wrapper-header"></div>
<div class="mobile-filter-wrapper"></div>
</div>
</html>
DATA;
$doc = new DOMDocument();
$doc->loadHTML($data);
$xpath = new DOMXpath($doc);
$elements = $xpath->query('.//div[@class="mobile-container mobile-filter-container"]/following-sibling::div[1]/@class');
echo $elements[0]->nodeValue;
Output:
mobile-container mobile-cart-content-container
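The exact comparison on @class only matches when the attribute is precisely that string, in that order. If only one of the class names is known, the usual concat/normalize-space idiom may be more forgiving. A sketch under that assumption (the variable names are illustrative):

```php
<?php
$html = '<div class="mobile-container mobile-filter-container"></div>'
      . '<div class="mobile-container mobile-cart-content-container"></div>'
      . '<div class="unwanted"></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Match the div that has "mobile-filter-container" as one of its class
// tokens, then step to its first following sibling div.
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " mobile-filter-container ")]'
       . '/following-sibling::div[1]';

$next = $xpath->query($query)->item(0);
$nextClass = $next instanceof DOMElement ? $next->getAttribute('class') : null;
echo $nextClass;
```

Selecting the sibling element rather than its @class attribute also leaves the element itself available, for example to read its inner content afterwards.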

Count Similar Div : Simple html dom

I have a html layout like :
<div id="pageno">1</div>
<div id="pageno">2</div>
<div id="pageno">3</div>
<div id="pageno">4</div>
<div id="pageno">5</div>
Using an HTML DOM parser, how can I get the inner text of the last div?
Thanks in advance
// Create a new DomDocument.
$dom = new DomDocument();
// Load your HTML into it.
$dom->loadHTML('
<div id="pageno">1</div>
<div id="pageno">2</div>
<div id="pageno">3</div>
<div id="pageno">4</div>
<div id="pageno">5</div>
');
// Obtain a list of the DIVs.
$divList = $dom->getElementsByTagName("div");
// Obtain the last element of the list.
$lastDiv = $divList->item($divList->length - 1);
// Output the inner text.
echo $lastDiv->nodeValue;
However, the HTML you have provided is not valid, as element IDs should be unique. This may cause loadHTML to emit warnings about the duplicate ID.
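If the page contains other divs as well, taking the last div of the whole document may pick the wrong one. An XPath query with last() could narrow it down to the "pageno" divs. A minimal sketch, assuming the markup is available as a string; libxml_use_internal_errors is used here only to keep any parser complaints about the repeated id out of the output:

```php
<?php
$html = '<div id="pageno">1</div>'
      . '<div id="pageno">2</div>'
      . '<div id="pageno">5</div>'
      . '<div id="footer">not a page number</div>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// The parentheses make last() apply to the whole set of matching divs.
$last = $xpath->query('(//div[@id="pageno"])[last()]')->item(0);

$lastText = $last !== null ? trim($last->textContent) : null;
echo $lastText;
```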

Parse HTML with PHP's HTML DOMDocument

I was trying to do it with "getElementsByTagName", but it wasn't working. I'm new to using DOMDocument to parse HTML; I used regex until yesterday, when some kind folks here told me that DOMDocument would be better for the job, so I'm giving it a try :)
I googled around for a while looking for explanations but didn't find anything that helped (not with the class, anyway).
So I want to capture "Capture this text 1" and "Capture this text 2" and so on.
It doesn't look too hard, but I can't figure it out :(
<div class="main">
<div class="text">
Capture this text 1
</div>
</div>
<div class="main">
<div class="text">
Capture this text 2
</div>
</div>
If you want to get :
The text
that's inside a <div> tag with class="text"
that's, itself, inside a <div> with class="main"
I would say the easiest way is not to use DOMDocument::getElementsByTagName -- which will return all tags that have a specific name (while you only want some of them).
Instead, I would use an XPath query on your document, using the DOMXpath class.
For example, something like this should do, to load the HTML string into a DOM object and instantiate the DOMXPath class:
$html = <<<HTML
<div class="main">
<div class="text">
Capture this text 1
</div>
</div>
<div class="main">
<div class="text">
Capture this text 2
</div>
</div>
HTML;
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
And, then, you can use XPath queries, with the DOMXPath::query method, that returns the list of elements you were searching for :
$tags = $xpath->query('//div[@class="main"]/div[@class="text"]');
foreach ($tags as $tag) {
var_dump(trim($tag->nodeValue));
}
And executing this gives me the following output :
string 'Capture this text 1' (length=19)
string 'Capture this text 2' (length=19)
You can use http://simplehtmldom.sourceforge.net/
It is a very simple, easy-to-use DOM parser written in PHP, with which you can easily fetch the content of a div tag.
Something like this:
// Find all <div> elements which have the attribute class=text
$ret = $html->find('div[class=text]');
See the documentation of it for more help.
