How to remove the parent element using PHP? - php

I want to get the HTML inside the parent element using php. For example, I have this structure:
<p>
<p>this is my first xml file </p>
</p>
and I want to get below text as a result.
<p>this is my first xml file </p>

Make use of a DOM Parser
<?php
$html='<p>
<p>this is my first xml file </p>
</p>';
$dom = new DOMDocument;
@$dom->loadHTML($html); // @ suppresses parser warnings about the malformed markup
foreach ($dom->getElementsByTagName('p') as $tag){
if(!empty($tag->nodeValue)){ echo $tag->nodeValue;}
}
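The loop above prints only the text of each paragraph. If the goal is the markup inside the outer element, one option is to serialize the root's child nodes. This is a minimal sketch, assuming the input is well-formed XML (as the sample is), so that loadXML() keeps the nested <p> intact; the variable names are illustrative only.

```php
<?php
// The sample is well-formed XML, so loadXML() preserves the nesting
// (an HTML parser would not allow a <p> inside a <p>).
$xml = '<p>
<p>this is my first xml file </p>
</p>';

$dom = new DOMDocument();
$dom->loadXML($xml);

// Serialize every child of the root <p> and join the pieces.
$inner = '';
foreach ($dom->documentElement->childNodes as $child) {
    $inner .= $dom->saveXML($child);
}

$inner = trim($inner);
echo $inner; // <p>this is my first xml file </p>
```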

Related

How can I strip html tags except some of them?

I need to remove all html codes from a php string except:
<p>
<em>
<small>
The strip_tags() function is good, but it strips all HTML tags. How can I tell it to remove all HTML except the tags above?
You should check out the manual: Example #1 strip_tags() example
Syntax: strip_tags ( Your-string, Allowable-Tags )
If you pass the second parameter, these tags will not be stripped.
strip_tags($string, '<p><em><small>');
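As a quick illustration of the allow-list, here is a small sketch with a made-up input string (the string is an assumption, not from the question):

```php
<?php
// Hypothetical input, chosen only to show which tags survive.
$string = '<p>Hello <b>bold</b> and <em>emphasis</em> <a href="#">link</a></p>';

// <p>, <em> and <small> are kept; every other tag is stripped,
// but the text inside the stripped tags remains.
$clean = strip_tags($string, '<p><em><small>');

echo $clean; // <p>Hello bold and <em>emphasis</em> link</p>
```

Note that strip_tags() leaves the attributes of allowed tags untouched, so it is not a sanitizer on its own.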
According to your comment, you want to remove HTML elements only if they have some class or attribute. You'll need to build up a DOM then:
<?php
$data = <<<DATA
<div>
<p>These line shall stay</p>
<p class="myclass">Remove this one</p>
<p>I will be deleted as well</p>
<p>But keep this</p>
</div>
DATA;
$dom = new DOMDocument();
$dom->loadHTML($data, LIBXML_HTML_NOIMPLIED);
$xpath = new DOMXPath($dom);
// select every element that carries at least one attribute
$elements_to_be_removed = $xpath->query("//*[count(@*)>0]");
foreach ($elements_to_be_removed as $element) {
$element->parentNode->removeChild($element);
}
// just to check
echo $dom->saveHTML();
?>
To change which elements are removed, change the query. For example, to remove all elements with the class myclass, it must read "//*[@class='myclass']".
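A minimal sketch of the class-based variant follows. The markup is a shortened, hypothetical version of the example above; the important detail is the @ prefix, which tells XPath that class is an attribute rather than a child element.

```php
<?php
// Hypothetical, shortened markup for illustration.
$data = '<div><p>Keep me</p><p class="myclass">Remove me</p><p>Keep me too</p></div>';

$dom = new DOMDocument();
// NOIMPLIED / NODEFDTD keep libxml from adding <html>, <body> and a doctype.
$dom->loadHTML($data, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$xpath = new DOMXPath($dom);
// Matches elements whose class attribute is exactly "myclass".
foreach ($xpath->query("//*[@class='myclass']") as $element) {
    $element->parentNode->removeChild($element);
}

$result = trim($dom->saveHTML());
echo $result;
```

The comparison is an exact string match, so an element with class="myclass other" would not be selected; something like contains(@class, 'myclass') would be needed for that case.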

Extract only first level paragraphs from html

I have the following html:
<div id="myID">
<p>I want this</p>
<p>and I want this</p>
<div>
<p>I don't want this</p>
</div>
</div>
I want to extract only the first level <p>...</p> elements.
I've tried using the excellent simple_html_dom library e.g. $html->find('#myID p') but in the case above, this finds all three <p>...</p> elements
Is there a better way to do this?
Instead of having to use some external library why don't you just use the built in classes to handle the dom?
First create a DOMDocument instance using your HTML:
$dom = new DOMDocument();
$dom->loadHTML($yourHtml);
After that use DOMXPath to select your elements:
$xpath = new DOMXPath($dom);
$nodes = $xpath->query("//*[@id='myID']/p");
var_dump($nodes->length); // outputs 2
This selects all p elements which are direct children of the element with the id myID.
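Put together as a self-contained script, the approach might look like the sketch below. It uses the HTML from the question; the $texts array and the second query are additions for illustration, showing how the child axis (/p) differs from the descendant axis (//p).

```php
<?php
$yourHtml = '<div id="myID">
<p>I want this</p>
<p>and I want this</p>
<div>
<p>I don\'t want this</p>
</div>
</div>';

$dom = new DOMDocument();
$dom->loadHTML($yourHtml);
$xpath = new DOMXPath($dom);

// "/p" matches only direct children of the element with id="myID".
$texts = [];
foreach ($xpath->query("//*[@id='myID']/p") as $p) {
    $texts[] = $p->textContent;
}
print_r($texts); // the two first-level paragraphs

// For comparison, "//p" also matches the nested paragraph.
$allCount = $xpath->query("//*[@id='myID']//p")->length;
echo $allCount; // 3
```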

how to add a custom attributes with PHP Simple HTML DOM Parser

I am working on a project that requires the use of PHP Simple HTML DOM Parser, and I need a way to add a custom attribute to a number of elements based on class name.
I am able to loop through the elements with a foreach loop, and it would be easy to set a standard attribute such as href, but I can't find a way to add a custom attribute.
The closest I can guess is something like:
foreach($html -> find(".myelems") as $element) {
$element->myattr="customvalue";
}
but this doesn't work.
I have seen a number of other questions on similar topics, but they all suggest using an alternative method for parsing html (domDocument etc.). In my case this is not an option, as I must use Simple HTML DOM Parser.
Did you try it? Try this example (Sample: adding data tags).
include 'simple_html_dom.php';
$html_string = '
<style>.myelems{color:green}</style>
<div>
<p class="myelems">text inside 1</p>
<p class="myelems">text inside 2</p>
<p class="myelems">text inside 3</p>
<p>simple text 1</p>
<p>simple text 2</p>
</div>
';
$html = str_get_html($html_string);
foreach($html->find('div p[class="myelems"]') as $key => $p_tags) {
$p_tags->{'data-index'} = $key;
}
echo htmlentities($html);
Output:
<style>.myelems{color:green}</style>
<div>
<p class="myelems" data-index="0">text inside 1</p>
<p class="myelems" data-index="1">text inside 2</p>
<p class="myelems" data-index="2">text inside 3</p>
<p>simple text 1</p>
<p>simple text 2</p>
</div>
I know this is an old post, but I think it may still help somebody like me :)
So in my case I added custom attribute to an image tag
$markup = file_get_contents('pathtohtmlfile');
//Create a new DOM document
$dom = new DOMDocument;
//Parse the HTML. The @ is used to suppress any parsing errors
//that will be thrown if the $markup string isn't valid XHTML.
@$dom->loadHTML($markup);
//Get all images tags
$imgs = $dom->getElementsByTagName('img');
//Iterate over the extracted images
foreach ($imgs as $img)
{
$img->setAttribute('customAttr', 'customAttrVal');
}

Parse between comments in Simple HTML Dom

Can I fetch the data between two html comments using Simple HTML Dom ??
For example, See the below code:
<!-- start of comment -->
link1<br />
link2<br />
link3<br />
link4<br />
<!-- end of comment-->
link5<br />
link6<br />
There are six links in total, and only four of them are enclosed between the "<!-- start of comment -->" and "<!-- end of comment-->" comments.
I just want to get the links between the comment tags.
You can do this:
//get all comments
$comments = $html->find('comment');
...then use next_sibling() to step through the following elements, collecting each anchor tag until you reach the next comment, at which point the loop should stop.
Try this code
$dom = new DOMDocument();
$dom->loadHTML($html);
$elements = $dom->getElementsByTagName('a');
foreach ($elements as $child) {
echo $child->nodeValue;
}
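The snippet above prints every anchor in the document, including the ones after the closing comment. To keep only the links between the two comments, the sibling walk described earlier can also be done with the built-in DOM classes instead of Simple HTML DOM. This is a rough sketch under assumptions: the href values and the wrapping <div> are placeholders (the question does not show them), and the links are assumed to be direct siblings of the comments.

```php
<?php
// Hypothetical input: hrefs and the wrapping <div> are placeholders.
$html = '<div><!-- start of comment -->
<a href="#1">link1</a><br />
<a href="#2">link2</a><br />
<!-- end of comment-->
<a href="#3">link3</a><br /></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Locate the opening comment by its text.
$start = $xpath->query("//comment()[contains(., 'start of comment')]")->item(0);

$links = [];
// Walk the following siblings until the next comment node is reached.
for ($node = $start ? $start->nextSibling : null; $node !== null; $node = $node->nextSibling) {
    if ($node instanceof DOMComment) {
        break; // closing comment: stop collecting
    }
    if ($node instanceof DOMElement && $node->nodeName === 'a') {
        $links[] = $node->textContent;
    }
}

print_r($links); // link1 and link2 only
```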

Rewriting HTML tags with DOM/Xpath (PHP)

I'm parsing a block of HTML with DOM/Xpath in PHP. Within this HTML, there are a few p tags that I want to convert to h4 tags, instead.
Raw HTML =>
<p class="archive">Awesome line of text</p>
Desired HTML =>
<h4>Awesome line of text</h4>
How can I do this with Xpath? I think I need to call on appendChild, but I'm not sure. Thank you for any guidance.
Something along these lines should do it:
<?php
$html = <<<END
<html>
<head>
<title>Test</title>
</head>
<body>
<p>hi</p>
<p class="archive">Awesome line of text</p>
<p>bye</p>
<p class="archive">Another line of <b>text</b></p>
<p>welcome</p>
<p class="archive">Another <u>line</u> of <b>text</b></p>
</body>
</html>
END;
$doc = new DOMDocument();
$doc->loadXML($html);
$xpath = new DOMXPath($doc);
// Find the nodes we want to change
$nodes = $xpath->query("//p[@class = 'archive']");
foreach ($nodes as $node) {
// Create a new H4 node
$h4 = $doc->createElement('h4');
// Move the children of the current node to the new one
while ($node->hasChildNodes())
$h4->appendChild($node->firstChild);
// Replace the current node with the new
$node->parentNode->replaceChild($h4, $node);
}
echo $doc->saveXML();
?>
