Simple way to replace elements with certain attributes in php - php

How can I find an element with a certain attribute and then replace the whole element, including its tags, in an HTML string using PHP?
I tried this:
$html = '<note>
<span data="getThisElement">New Text</span>
<div data="yes">More Text</div>
</note>';
echo$newTxt = str_replace("<? data="getThisElement", "<div>New Div</div>", $html);
The output should be:
<note>
<div>New Div</div>
<div data="yes">More Text</div>
</note>

You can use preg_replace for this. Note that the pattern below replaces every <span> element, regardless of its attributes:
$html = '<note>
<span data="getThisElement">New Text</span>
<div data="yes">More Text</div>
</note>';
$replace = "<div>New Div</div>";
$output = preg_replace("/<span[^>]*>.*?<\/span>/is",$replace,$html);
echo $output;
Or, if you want to replace only one specific element by its attribute value:
$element = "getThisElement";
$output = str_replace('<span data="' . $element . '">New Text</span>', $replace, $html);

I would suggest a more robust technique, such as Simple HTML DOM Parser. Here is your example:
<?php
include "simplehtmldom_1_9_1/simple_html_dom.php";
$html = '<note>
<span data="getThisElement">New Text</span>
<div data="yes">More Text</div>
</note>';
// Create DOM from string
$dom = str_get_html($html);
$dom->find('span[data=getThisElement]', 0)->outertext = '<div>New Div</div>';
echo $dom;
// <note>
// <div>New Div</div>
// <div data="yes">More Text</div>
// </note>
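The same replacement can also be sketched with PHP's built-in DOMDocument and DOMXPath, so no third-party include is needed. This is a minimal sketch, not the answer author's method: the variable names ($doc, $xpath, $old, $new, $out) are illustrative, and it assumes the fragment has a single root element so that LIBXML_HTML_NOIMPLIED can be used safely.

```php
<?php
$html = '<note>
<span data="getThisElement">New Text</span>
<div data="yes">More Text</div>
</note>';

// <note> is not a standard HTML tag, so collect libxml warnings instead of printing them
libxml_use_internal_errors(true);
$doc = new DOMDocument();
// These flags stop libxml from adding a doctype and <html>/<body> wrappers
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Match any element whose data attribute equals "getThisElement"
foreach ($xpath->query('//*[@data="getThisElement"]') as $old) {
    $new = $doc->createElement('div', 'New Div');
    $old->parentNode->replaceChild($new, $old);
}

$out = $doc->saveHTML();
echo $out;
```

Because the XPath query returns a snapshot of the matching nodes, replacing them inside the loop does not disturb the iteration.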

You can get PHP to echo some JS as well as the HTML. The JS then does the replacement in the browser: JS is good at parsing HTML, and you don't have to add extra functionality to PHP. The result is the same. For more generality, you could wrap the JS in a function that PHP echoes wherever it is needed.
<?php
$html = '<note>
<span data="getThisElement">New Text</span>
<div data="yes">More Text</div>
</note>';
$repAttr = 'data="getThisElement"';// The attribute of those elements we want to replace
$repWith = '<div>New Div</div>'; // what we want to replace those elements with
?>
<div id="temp">
<script>
let el = document.createElement( 'div' );
let replaceWith = "<div>New Div</div>";
el.innerHTML = `<?php echo $html; ?>`; //note the use of backticks so the string can span many lines
let elsToReplace = el.querySelectorAll('[<?php echo $repAttr; ?>]'); // gets all the elements within $html that have the given attribute
elsToReplace.forEach(function (repEl) {
repEl.outerHTML = '<?php echo $repWith; ?>';
});
document.getElementById("temp").outerHTML = el.innerHTML; //this will overwrite all this setting-up JS so the DOM will have the content required and nothing else
</script>
</div>

Related

Replace <div> tag with <p> tag using php

<div style = "text-align:left;" class="ref"> Text </div>
I want to replace <div> with <p> without losing attributes.
Any help is appreciated.
Try This:
$str = '<div style = "text-align:left;" class="ref"> Text </div>';
$newstr = preg_replace('/<div [^<]*?class="([^<]*?ref.*?)">(.*?)<\/div>/','<p class="$1">$2</p>',$str);
echo $newstr;
Output : <p class="ref"> Text </p>
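The pattern above keeps only the class attribute, so the style attribute is lost. A minimal sketch that carries over every attribute is to capture the whole attribute string and reuse it unchanged. It assumes the div elements are not nested; for nested markup a DOM parser is the safer choice.

```php
<?php
$str = '<div style = "text-align:left;" class="ref"> Text </div>';

// $1 = everything between "<div" and ">" (all attributes), $2 = the element content
$newstr = preg_replace('/<div\b([^>]*)>(.*?)<\/div>/is', '<p$1>$2</p>', $str);

echo $newstr;
// <p style = "text-align:left;" class="ref"> Text </p>
```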

PHP Regex - Remove text from HTML Tags

How to remove all text between tags.
Input
<div>
<p>testing</p>
<div>my world</div>
</div>
Output
<div>
<p></p>
<div></div>
</div>
You can use either DOMDocument or PHP Simple HTML DOM Parser.
The following example uses the latter, although you may want to use what suits you best.
include("simple_html_dom.php");
$str = '
<div>
<p>testing</p>
<div>my world</div>
</div>
';
$html = str_get_html($str);
foreach($html->find("text") as $ht) {
$ht->innertext = "";
}
$html->save();
echo $html;
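For comparison, here is a minimal sketch of the DOMDocument route mentioned above. It is an illustration under stated assumptions, not the answer author's code: the variable names are made up, and the fragment is assumed to have a single root element so that LIBXML_HTML_NOIMPLIED works as expected.

```php
<?php
$str = '<div>
<p>testing</p>
<div>my world</div>
</div>';

$doc = new DOMDocument();
// Avoid the doctype and <html>/<body> wrappers that loadHTML adds by default
$doc->loadHTML($str, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$xpath = new DOMXPath($doc);
// Select only text nodes with visible content, so the line breaks between tags survive
foreach ($xpath->query('//text()[normalize-space()]') as $textNode) {
    $textNode->parentNode->removeChild($textNode);
}

$out = $doc->saveHTML();
echo $out;
```

The tags stay in place while "testing" and "my world" are removed.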
You could use two capture groups which would eliminate characters between them while replacing:
(\<.+\>).*(\<\/.+\>)
working example: http://ideone.com/Oq14El

Extracting parts of an html code

Let's say I had the below HTML code:
<p>Test text</p>
<p><img src="test.jpg" /></p>
<div id="test"><p>test</p></div>
<div class="block">
<img src="test2.jpg">
</div>
<p>test</p>
Parameters:
There will exist a div block with class "block"
There can be any amount of HTML code above or below the div block with class "block"
There could even be two div blocks with class "block"
I was using PHP's XPath to look at this HTML code using DOM. I want to be able to return two things:
The div block with class "block"
All the rest of the code without the div element with class "block" in it
Something like:
Block Code:
<div class="block">
<img src="test2.jpg">
</div>
Original without block code:
<p>Test text</p>
<p><img src="test.jpg" /></p>
<div id="test"><p>test</p></div>
<p>test</p>
By using DOMDocument you can do it like this :
$content = '<p>Test text</p>'.
'<p><img src="test.jpg" /></p>'.
'<div id="test"><p>test</p></div>'.
'<div class="block">'.
'<img src="test2.jpg">'.
'</div>'.
'<p>test</p>';
$blocks = array();
$doc = new DOMDocument();
$doc->loadHTML($content);
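// copy the live node list to an array so that removing nodes does not skip elements
$elements = iterator_to_array($doc->getElementsByTagName("*"));
foreach ($elements as $element) {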
if($element->hasAttributes()) {
if ($element->getAttribute('class') == 'block') {
//add block HTML to block array
$blocks[]=$doc->saveHTML($element);
//remove block element
$element->parentNode->removeChild($element);
}
}
}
echo '<pre>';
echo $blocks[0]; //iterate or print_r if multiple blocks
echo $doc->saveHTML();
echo '</pre>';
outputs the "block code" :
<div class="block"><img src="test2.jpg"></div>
and the "original without block code" :
<p>Test text</p><p><img src="test.jpg"></p><div id="test"><p>test</p></div><p>test</p>
If you simply can't accept that DOMDocument "enriches" the HTML with a doctype, html and body, which can be very annoying when you print the complete document rather than just some extracts, you can use a helper function such as DOMinnerHTML (user-defined, not built into PHP) and extract the innerHTML of the body with:
echo DOMinnerHTML($doc->getElementsByTagName('body')->item(0));
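DOMinnerHTML is not part of PHP, and the original definition is not shown here. One possible definition, given as a hypothetical sketch, concatenates the serialized child nodes of the element:

```php
<?php
// Hypothetical helper (not a PHP built-in): returns the markup inside $element
function DOMinnerHTML(DOMNode $element): string
{
    $innerHTML = '';
    foreach ($element->childNodes as $child) {
        // saveHTML($node) serializes just that node and its descendants
        $innerHTML .= $element->ownerDocument->saveHTML($child);
    }
    return $innerHTML;
}

$doc = new DOMDocument();
$doc->loadHTML('<p>Test text</p><p>test</p>');

$out = DOMinnerHTML($doc->getElementsByTagName('body')->item(0));
echo $out;
```

This prints the two paragraphs without the doctype, html or body wrappers.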

Retrieve a text node with Simple HTML DOM Parser

I'm quite new to Simple HTML DOM Parser. I want to get a child element from the following HTML:
<div class="article">
<div style="text-align:justify">
<img src="image.jpg" title="image">
<br>
<br>
"Text to grab"
<div>......</div>
<br></br>
................
................
</div>
</div>
I'm trying to get the text "Text to grab"
So far I've tried the following query:
$html->find('div[class=article] div')->children(3);
But it's not working. Any idea how to solve this ?
You don't need simple_html_dom here. It can be done with DOMDocument and DOMXPath. Both are part of the PHP core.
Example:
// your sample data
$html = <<<EOF
<div class="article">
<div style="text-align:justify">
<img src="image.jpg" title="image">
<br>
<br>
"Text to grab"
<div>......</div>
<br></br>
................
................
</div>
</div>
EOF;
// create a document from the above snippet
// if you are loading from a remote url use:
// $doc->load($url);
$doc = new DOMDocument();
$doc->loadHTML($html);
// initialize a XPath selector
$selector = new DOMXPath($doc);
// get the text node (text in XML/HTML is also a node)
$query = '//div[@class="article"]/div/br[2]/following-sibling::text()[1]';
$textToGrab = $selector->query($query)->item(0);
// remove newlines on start and end using trim() and output the text
echo trim($textToGrab->nodeValue);
Output:
"Text to grab"
If it's always in the same place you can do:
$html->find('.article text', 4);

Extract text from html tags in an rss feed

We have following rss feed
<title>THIS IS THE TITLE</title>
<link>http://www.website.com/....</link>
<description>
<div class="primary-image">
<img typeof="foaf:Image" src="http://website.com/" alt="Drink driving" title="Drink driving" />
</div>
<div class="field-group-format group_meta field-group-div group-meta speed-fast effect-none">
<span class="field field-name-field-published-date field-type-datetime field-label-hidden">
<span class="field-item even">
<span class="date-display-single" property="dc:date" datatype="xsd:dateTime" content="2014-01-29T17:43:00+00:00">29 Jan, 2014 5:43pm</span>
</span>
</span>
<span class="field field-name-field-author field-type-node-reference field-label-hidden">
<span class="field-item even">Joe Finnerty</span>
</span>
</div>
<p class="short-desc">TEXT THAT I WANT TO EXTRACT FROM HERE</p>
</description>
I am trying to extract the <p class="short-desc">TEXT THAT I WANT TO EXTRACT FROM HERE</p> element with the following script. I checked some questions here but did not find a practical answer.
I tried adding
$htmlStr = $node->getElementsByTagName('description')->item(0)->nodeValue;
$html = new DOMDocument();
$html->loadHTML($htmlStr);
$xpath = new DOMXPath($html);
$desc = $xpath->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' short-desc')]");
before $item = array ( , within the foreach loop, but it did not work.
Also, &lt; is replacing <, &quot; is replacing ", and &gt; is replacing >.
Please help; I have been trying to find an answer for some days now without success.
Assuming that you are passing the above HTML content to the $html variable ..
$dom = new DOMDocument;
@$dom->loadHTML($html);
foreach ($dom->getElementsByTagName('p') as $tag) {
if ($tag->getAttribute('class') === 'short-desc') {
echo $tag->nodeValue; //"prints" TEXT THAT I WANT TO EXTRACT FROM HERE
}
}
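The class test can also be done with DOMXPath, which matches short-desc even when the element carries several classes. This is a sketch with a shortened, made-up sample string in $htmlStr; note the space on both sides of ' short-desc ' so that only the whole class name matches.

```php
<?php
// Shortened sample of the decoded <description> content (illustrative data)
$htmlStr = '<div class="primary-image"><img src="http://website.com/" alt="Drink driving"></div>
<p class="short-desc">TEXT THAT I WANT TO EXTRACT FROM HERE</p>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($htmlStr);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Pad the class attribute with spaces and look for the padded class name
$query = "//*[contains(concat(' ', normalize-space(@class), ' '), ' short-desc ')]";
$nodes = $xpath->query($query);

// query() returns a node list, so read the text from its first item
$desc = $nodes->length > 0 ? trim($nodes->item(0)->nodeValue) : '';
echo $desc;
```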
If I understand correctly, you want to remove the tags from the feed, so you can try this:
<?php
$text = '<p>Test paragraph.</p><!-- Comment --> Other text';
echo strip_tags($text);
?>
output will be:
Test paragraph. Other text
For more info: http://in3.php.net/strip_tags
Why not use a regex?
$strRegex = '%<p class="short-desc">(.+?)</p>%s';
if (preg_match_all($strRegex, $strContent, $arrMatches))
{
var_dump($arrMatches[1][0]);
}
and to get the content use
$path = 'path/to/file';
$strContent = file_get_contents($path);
