How to delete elements in HTML using Simple HTML DOM? - php

I am trying this, but it's not working:
include_once('simple_html_dom.php');
$handle = file_get_html('file name.php');
if (!empty($handle)) {
    $ret = $handle->find('div', 0); // this is able to find the element
    echo $ret;
    $ret->outertext = ''; // this is **NOT** deleting the video
}
Setting outertext = '' is not working and I don't know why.
I have also tried $handle->find('div[id="'.$i.'"]', 0)->outertext = '';
but this didn't work either.
Are there any other ways I haven't tried?

I don't see why this would make a difference, but try doing your modification inline with the find.
$handle->find('div', 0)->outertext = '';

Related

Loop through XML Nodes using XmlStringStreamer in PHP

I have an XML file which is something like this:
(fileName : abc.xml)
<Envelope>
  <Body>
    <SResponse>
      <Body>
        <SResponseDetails>
          <SItem>...</SItem>
          <SItem>...</SItem>
          <SItem>...</SItem>
          <SItem>...</SItem>
        </SResponseDetails>
      </Body>
    </SResponse>
  </Body>
</Envelope>
I want to store all <SItem> in a db table (1 record for each SItem). I am using PHP for it and using XmlStringStreamer.
Here is my code for reading this file and processing it.
$streamer = \Prewk\XmlStringStreamer::createStringWalkerParser(__DIR__ . "/tempFile.xml");
$stream = new Stream\File(__DIR__ . "/abc.xml", 1024);
$parser = new Parser\StringWalker();
$streamer = new XmlStringStreamer($parser, $stream);
while ($node = $streamer->getNode()) {
$simpleXmlNode = simplexml_load_string($node);
//-- code here for getting single node
}
I could not find an answer to this on any forum, and my own attempts did not get me what I want. Can anyone please help?
Thanks a lot.
I have solved it; here is my answer.
To loop over a specific element, we can use the UniqueNode parser in XmlStringStreamer directly:
$stream = new Stream\File(__DIR__ . "/abc.xml", 1024);
$options = array(
"uniqueNode" => "SItem"
);
$parser = new Parser\UniqueNode($options);
// Create the streamer
$streamer = new XmlStringStreamer($parser, $stream);
$countNodes = 0;
while ($node = $streamer->getNode())
{
print_r($node);
}
The only problem is that it converts XML tags to lowercase, so <SItem> becomes <sitem>. Does anyone have an idea about this problem?
I haven't executed your code, but I think you can handle this with plain PHP logic. If nothing else works, the last resort would be to explode $simpleXmlNode into an array structure and then process it as needed. By the way, can you share what you got after $simpleXmlNode = simplexml_load_string($node)?
For others: I can't add comments, so I'm posting this as an answer.
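If the lowercasing is a blocker, one alternative worth trying is PHP's built-in XMLReader, which also streams the document and keeps tag names exactly as written. This is a minimal sketch and not the XmlStringStreamer approach: the inline XML is made-up sample data standing in for abc.xml, and the <Id> child is a hypothetical field, since the real contents of <SItem> are not shown above.

```php
<?php
// Made-up sample data with the same nesting as abc.xml; <Id> is a hypothetical child.
$xml = <<<XML
<Envelope><Body><SResponse><Body><SResponseDetails>
<SItem><Id>1</Id></SItem>
<SItem><Id>2</Id></SItem>
</SResponseDetails></Body></SResponse></Body></Envelope>
XML;

$reader = new XMLReader();
$reader->XML($xml); // for a file on disk, $reader->open($path) would be used instead

$items = [];
while ($reader->read()) {
    // Only react to opening <SItem> tags; the name comparison is case-sensitive.
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'SItem') {
        // readOuterXml() returns the element with its original tag casing.
        $node = simplexml_load_string($reader->readOuterXml());
        $items[] = (string) $node->Id; // a DB insert would go here, one record per SItem
    }
}
$reader->close();

print_r($items);
```

Like the streamer, XMLReader moves through the document node by node, so the whole file is not loaded into memory at once.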

preg_replace with wildcards?

I have HTML markup bearing the form
<div id='abcd1234A'><p id='wxyz1234A'>Hello</p></div>
which I need to replace to bear the form
<div id='abcd1234AN'><p id='wxyz1234AN'>Hello</p></div>
where N may be 1, 2, ...
The best I have been able to do is as follows
function cloneIt($a,$b)
{
return substr_replace($a,$b,-1);
}
$ndx = "1'";
$str = "<div id='abcd1234A'><p id='wxyz1234A'>Hello</p></div>";
preg_match_all("/id='[a-z]{4}[0-9]{4}A'/",$str,$matches);
$matches = $matches[0];
$reps = array_merge($matches);
$ndxs = array_fill(0,count($reps),$ndx);
$reps = array_map("cloneIt",$reps,$ndxs);
$str = str_replace($matches,$reps,$str);
echo htmlspecialchars($str);
which works just fine. However, my REGEX skills are not much to write home about so I suspect that there is probably a better way to do this. I'd be most obliged to anyone who might be able to suggest a neater/quicker way of accomplishing the same result.
You can optimize your regex like this:
/id='[a-z]{4}\d{4}A'/
Sample code
preg_match_all("/id='[a-z]{4}\\d{4}A'/",$str,$matches);
However, an alternative would be to use an HTML parser. Here I'll use Simple HTML DOM:
// Load the HTML from URL or file
$html = file_get_html('http://www.mysite.com/');
// You can also load $html from string: $html = str_get_html($my_string);
// Find div with id attribute
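foreach($html->find('div[id]') as $div) {
    // $div->id holds only the attribute value (e.g. abcd1234A), so match against that
    if (preg_match("/^[a-z]{4}\\d{4}A$/", $div->id)) {
        $div->id = $div->id . $ndx; // concatenate with "."; "+" would do numeric addition
    }
}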
echo $html->save();
Did you notice how elegant, concise and clear the code becomes with an html parser ?
References
Simple Html Dom Documentation
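For completeness, the original string-based approach can probably be collapsed into a single preg_replace call with a capture group. A minimal sketch, assuming the ids always follow the four-letters, four-digits, "A" pattern from the question, and that $ndx holds just the suffix (without the trailing quote used in the question's code):

```php
<?php
$ndx = '1'; // suffix to append; the closing quote is added by the replacement itself
$str = "<div id='abcd1234A'><p id='wxyz1234A'>Hello</p></div>";

// Group 1 captures everything up to and including the trailing "A";
// the replacement re-emits it, then the suffix, then the closing quote.
$result = preg_replace("/(id='[a-z]{4}\\d{4}A)'/", '${1}' . $ndx . "'", $str);

echo $result, PHP_EOL;
// prints <div id='abcd1234A1'><p id='wxyz1234A1'>Hello</p></div>
```

The '${1}' form is used instead of '$1' so that the digit in the suffix is not read as part of the group number.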

Remove tags with Simple HTML DOM parser [duplicate]

I would like to use Simple HTML DOM to remove all images in an article so I can easily create a small snippet of text for a news ticker but I haven't figured out how to remove elements with it.
Basically I would:
1. Get the content as an HTML string
2. Remove all image tags from the content
3. Limit the content to x words
4. Output.
Any help?
There are no dedicated methods for removing elements. You just find all the img elements and then do
$e->outertext = '';
When you only clear the outer text, you delete the HTML content itself, but if you perform another find on the same elements they will still appear in the result. The reason is that the Simple HTML DOM object still keeps its internal structure for the element, only without its actual content. To really delete the element, reload the HTML as a string into the same variable. The object is then rebuilt without the deleted content.
Here is an example function:
public function removeNode($selector)
{
    foreach ($this->find($selector) as $node)
    {
        $node->outertext = '';
    }
    // Re-parse the saved HTML so the internal tree no longer contains the removed nodes
    $this->load($this->save());
}
Put this function inside the simple_html_dom class and you're good.
I think you are having difficulties because you forgot to save (dump the internal DOM tree back into a string).
Try this:
$html = file_get_html("http://example.com");
foreach($html->find('img') as $item) {
    $item->outertext = '';
}
$html->save();
echo $html;
I could not figure out where to put the function, so I just put the following directly in my code:
$html->load($html->save());
It basically locks the changes made in the for loop back into the HTML, as described above.
The proposed solutions are quite expensive and practically unusable in a big loop or other kind of repetition.
I prefer to use "soft deletes":
// 'somecondition', somecheck, $foo and $baz are placeholders for your own selector, test and data
foreach ($html->find('somecondition') as $item) {
    if (somecheck) $item->setAttribute('softDelete', true); // <= set marker to check in further code
    $item->outertext = '';
    foreach ($foo as $bar) {
        if (!$baz->getAttribute('softDelete')) {
            // do something
        }
    }
}
This is working for me:
foreach($html->find('element') as $element){
$element = NULL;
}
Adding new answer since removeNode is definitely a better way of removing it:
$html->removeNode('img');
This method probably was not available when accepted answer was marked. You do not need to loop the html to find each one, this will remove them.
Use outerhtml instead of outertext:
<div id='your_div'>the contents of your div</div>
$your_div->outertext = '';
echo $your_div; // echoes <div id='your_div'></div>
$your_div->outerhtml = '';
echo $your_div; // echoes nothing
Try this:
$dom = new Dom();
$dom->loadStr($text);
foreach ($dom->find('element') as $element) {
$element->delete();
}
This works now:
$element->remove();
You can see the documentation for the method here.
Below I remove the HEADER node and all SCRIPT nodes of the incoming URL, using find() in two different ways. With an index as the second parameter, find() returns a single node; without it, find() returns an array of all matching nodes, which you then loop through.
$clean_html = file_get_html($url);
// Find and remove the 1st instance of the node.
$node = $clean_html->find('header', 0);
$node->remove();
// Find and remove all instances of the node.
$nodes = $clean_html->find('script');
foreach($nodes as $node) {
    $node->remove();
}
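For anyone who would rather avoid the reload cost discussed above, the same image-stripping steps can be sketched with PHP's built-in DOM extension instead of Simple HTML DOM, where removeChild() really detaches the node. This is only a sketch under assumptions: the sample HTML and the $maxWords limit are made up for illustration.

```php
<?php
// Made-up article fragment; in practice this would be the content fetched as an HTML string.
$content = '<p>Breaking <img src="a.png" alt=""> news: the <b>quick</b> brown fox '
         . '<img src="b.png" alt=""> jumps over the lazy dog.</p>';

$doc = new DOMDocument();
// Wrap in a div so the fragment has a single root; @ silences warnings on sloppy markup.
@$doc->loadHTML('<div>' . $content . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// getElementsByTagName() returns a live list, so copy it before removing nodes.
foreach (iterator_to_array($doc->getElementsByTagName('img')) as $img) {
    $img->parentNode->removeChild($img);
}

// Reduce to plain text and keep only the first $maxWords words.
$maxWords = 5; // hypothetical limit
$words = preg_split('/\s+/', trim($doc->textContent), -1, PREG_SPLIT_NO_EMPTY);
$snippet = implode(' ', array_slice($words, 0, $maxWords));

echo $snippet, PHP_EOL; // prints "Breaking news: the quick brown"
```

Because the nodes are actually detached from the tree, a later getElementsByTagName('img') on the same document returns nothing, with no re-parse needed.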

Replace an element with Dom Document PHP

I load an HTML page with PHP DOMDocument:
$doc = new DOMDocument();
#$doc->loadHTMLFile($url);
I search my page for all "a" elements, and if they meet my condition I need to replace, for example, <a href="...">My link is beautiful</a> with just My link is beautiful
Here is my loop:
$liens = $div->getElementsByTagName('a');
foreach($liens as $lien){
if($lien->hasAttribute('href')){
if (preg_match("/metz2/i", $lien->getAttribute('href'))) {
//HERE I NEED TO REPLACE </a>
}
$cpt++;
}
}
Do you have any ideas ? Suggestions ? Thanks :)
Every time I need to manipulate the DOM with PHP, I use a library called PHP Simple HTML DOM Parser.
It's very easy to use; something like this might work for you:
// Create DOM from URL or file
$html = file_get_html('http://www.page.com/');
// Find all links
foreach($html->find('a') as $element) {
    // Do your custom logic here if you need it. For example, this takes the inner contents of the a tag and puts them in place of the tag.
    $inner = $element->innertext;
    $element->outertext = $inner; // outertext is a property, not a method
}
//To echo modified html again:
echo $html;
Could be done with preg_replace as well:
$sText = '<a href="...">Stackoverflow</a>';
$sText = preg_replace( '/<a.*>(.*)<\/a>/', '$1', $sText );
echo $sText;
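Since the question already uses DOMDocument, here is a minimal sketch of how the replacement could be done without switching libraries: swap each matching <a> for a text node holding its text. The sample HTML and the example.com URLs are made up for illustration; the metz2 test is taken from the question's loop.

```php
<?php
// Made-up sample markup: one link matches the "metz2" condition, one does not.
$html = '<div><a href="http://example.com/metz2/page">My link is beautiful</a>'
      . ' and <a href="http://example.com/other">this one stays</a></div>';

$doc = new DOMDocument();
@$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// Copy the live node list first, since replacing nodes changes it during iteration.
foreach (iterator_to_array($doc->getElementsByTagName('a')) as $lien) {
    if ($lien->hasAttribute('href') && preg_match('/metz2/i', $lien->getAttribute('href'))) {
        // Replace the whole <a> element with a text node containing its text content.
        $text = $doc->createTextNode($lien->textContent);
        $lien->parentNode->replaceChild($text, $lien);
    }
}

$result = trim($doc->saveHTML());
echo $result, PHP_EOL;
```

Note that textContent drops any markup nested inside the link; if inner tags such as <b> must be kept, the child nodes would have to be moved out one by one instead.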
