PHP: Replace DOMElement with DOMText node - php

I want to create some customised tags for translating, for instance
<trad>SOMETHING</trad>
I've also got a file with some $GLOBALS variable, like:
$GLOBALS['SOMETHING'] = 'Some text';
$GLOBALS['SOMETHINGELSE'] = 'Some other text';
So I've been able to show my translation in this way:
$string = "<trad>SOMETHING</trad>";
$string = preg_replace('/<trad[^>]*?>([\\s\\S]*?)<\/trad>/','\\1', $string);
echo $GLOBALS[$string];
This works perfectly, but when I've got something more complex like the following code, or when I have more occurrences of this tag, I can't get it to work:
$string = "Lorem ipsum <trad>SOMETHING</trad> <h1>Hello</h1> <trad>SOMETHINGELSE</trad>";
Ideally, I want to create a new variable $string in which each tag is replaced by the value I found for it, so that I can show it with a simple echo.
So I want output like this:
echo $string;
//output: Lorem ipsum Some text <h1>Hello</h1> Some other text
Can you guys help me?

Regex is not a reliable approach for processing an HTML string. Here we use DOMDocument instead to achieve the desired output. The final strip_tags step is only there to produce the desired output from this fragment; it is not needed if a valid HTML string is supplied to loadHTML, in which case saveHTML($node) will do the job.
Try this code snippet:
<?php
ini_set('display_errors', 1);
libxml_use_internal_errors(true);
$array["SOMETHING"] = "some text";
$array["SOMETHINGELSE"] = "some text other";
$string = "Lorem ipsum <trad>SOMETHING</trad> <h1>Hello</h1> <trad>SOMETHINGELSE</trad>";
$domDocument = new DOMDocument();
$domDocument->loadHTML($string, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$results = $domDocument->getElementsByTagName("trad");
// The node list is live, so repeat until every <trad> has been replaced.
do {
    foreach ($results as $result) {
        // Swap the <trad> element for a text node holding its translation.
        $result->parentNode->replaceChild($domDocument->createTextNode($array[trim($result->nodeValue)]), $result);
    }
} while ($results->length > 0);
echo strip_tags($domDocument->saveHTML(), "<h1>");
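The saveHTML($node) route mentioned in the answer can be sketched as follows. This is only an illustration under assumptions: the wrapping <div> and the $translations array are hypothetical additions, not part of the original answer. The fragment is wrapped so that it parses as well-formed HTML, and the wrapper's children are then serialized one by one, which makes the strip_tags step unnecessary.

```php
<?php
libxml_use_internal_errors(true); // <trad> is not a real HTML tag, so silence parser warnings

// Hypothetical translation table standing in for the $GLOBALS entries.
$translations = ['SOMETHING' => 'Some text', 'SOMETHINGELSE' => 'Some other text'];

// Assumption: wrap the fragment in a <div> so the parser sees a single root element.
$string = '<div>Lorem ipsum <trad>SOMETHING</trad> <h1>Hello</h1> <trad>SOMETHINGELSE</trad></div>';

$dom = new DOMDocument();
$dom->loadHTML($string);

// Copy the live node list into an array so nodes can be replaced safely while looping.
foreach (iterator_to_array($dom->getElementsByTagName('trad')) as $trad) {
    $key = trim($trad->textContent);
    // Fall back to the raw key when no translation exists.
    $replacement = $dom->createTextNode($translations[$key] ?? $key);
    $trad->parentNode->replaceChild($replacement, $trad);
}

// Serialize only the children of the wrapper, node by node.
$wrapper = $dom->getElementsByTagName('div')->item(0);
$output = '';
foreach ($wrapper->childNodes as $child) {
    $output .= $dom->saveHTML($child);
}
echo $output;
```

Copying the node list with iterator_to_array is a design choice that avoids the do/while loop, since the copy does not shrink as elements are replaced.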

Related

How to remove scripts tags inside another code by regex

I'm trying to remove script tags from the source code using regular expression.
/<\s*script[^>]*[^\/]>(.*?)<\s*\/\s*script\s*>/is
But I ran into a problem when the script I need to remove is nested inside other code.
Please see this screenshot
I tested it at https://regex101.com/r/R6XaUT/1
How do I correctly create a regular expression so that it can cover all the code?
Sample text:
$text = '<b>sample</b> text with <div>tags</div>';
Result for strip_tags($text):
Output: sample text with tags
Result for strip_tags_content($text):
Output: text with
Result for strip_tags_content($text, '<b>'):
Output: <b>sample</b> text with
Result for strip_tags_content($text, '<b>', TRUE):
Output: text with <div>tags</div>
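The body of strip_tags_content is not shown in this answer. The following is a minimal sketch written to be consistent with the outputs listed above. It assumes the second argument is a string of tags such as '<b>' and the third argument inverts the selection; it is not necessarily the original author's code. Nested elements with the same tag name are not handled.

```php
<?php
// Sketch only: removes tags together with their content.
// $tags lists tags to keep (or, when $invert is true, the only tags to remove).
function strip_tags_content(string $text, string $tags = '', bool $invert = false): string
{
    // Collect tag names from a string such as '<b><i>'.
    preg_match_all('/<\s*(\w+)[^>]*>/', $tags, $m);
    $names = array_unique($m[1]);

    if (count($names) > 0) {
        $list = implode('|', $names);
        $pattern = $invert
            // Remove only the listed elements and their content.
            ? '@<(' . $list . ')\b[^>]*>.*?</\1>@si'
            // Remove every element except the listed ones.
            : '@<(?!(?:' . $list . ')\b)(\w+)\b[^>]*>.*?</\1>@si';
        return preg_replace($pattern, '', $text);
    }

    // No tags given: remove all elements, unless inverted (then nothing is removed).
    return $invert ? $text : preg_replace('@<(\w+)\b[^>]*>.*?</\1>@si', '', $text);
}

$text = '<b>sample</b> text with <div>tags</div>';
$all = strip_tags_content($text);
$keepB = strip_tags_content($text, '<b>');
$onlyB = strip_tags_content($text, '<b>', true);
```

Note that the results keep the surrounding whitespace, so trim() may be needed before comparing or printing them.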
I hope someone finds this useful :)
Simply use the PHP function strip_tags. See
http://php.net/manual/de/function.strip-tags.php
$string = "<div>hello</div>";
echo strip_tags($string);
Will output
hello
You also can provide a list of tags to keep.
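As a small illustration of the list of tags to keep, the second argument of strip_tags takes the allowed tags as a string. The sample string below is made up for this example.

```php
<?php
// Hypothetical sample input for illustration.
$string = '<div>Hello <b>world</b> and <i>friends</i></div>';

// Without a second argument every tag is removed.
$plain = strip_tags($string);

// With '<b>' as the allowed list, only the <b> element survives.
$keepBold = strip_tags($string, '<b>');

echo $plain, "\n", $keepBold, "\n";
```

Keep in mind that strip_tags removes the tags but leaves their inner text in place.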
==
Another approach is this:
// Load a file into $html
$html = file_get_contents('scratch.html');
$matches = [];
preg_match_all("/<\/*([^\s>]*)>/", $html, $matches);
// Build a list containing each tag name only once
$tags = array_unique($matches[1]);
// Find the script tag in the list and remove it
$scriptTagIndex = array_search("script", $tags);
if ($scriptTagIndex !== false) unset($tags[$scriptTagIndex]);
// The tag list must be a string of the form <tagname1><tagname2>...
$allowedTags = array_map(function ($s) { return "<$s>"; }, $tags);
// Strip the HTML and keep all tags except the removed ones (script)
$noScript = strip_tags($html, join("", $allowedTags));
echo $noScript;
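If the goal is to drop the script elements together with their contents, a DOM-based variant may be easier to reason about than either regex or strip_tags. The following is a sketch under assumptions: the sample markup is hypothetical, and the input is assumed to have a single root element so that LIBXML_HTML_NOIMPLIED leaves it unwrapped.

```php
<?php
libxml_use_internal_errors(true); // tolerate imperfect markup

// Hypothetical sample input with a script nested inside other markup.
$html = '<div><p>Hello</p><script>alert("x");</script><p>World</p></div>';

$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// Copy the live node list first, then remove each <script> element.
foreach (iterator_to_array($dom->getElementsByTagName('script')) as $script) {
    $script->parentNode->removeChild($script);
}

$clean = trim($dom->saveHTML());
echo $clean;
```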

Regular expression to remove links with their inner text from a string with PHP

I have the following code:
$string = 'Try to remove the link text from the content links in it Try to remove the link text from the content testme Try to remove the link text from the content';
$string = preg_replace('#(<a.*?>).*?(</a>)#', '$1$2', $string);
$result = preg_replace('/<a href="(.*?)">(.*?)<\/a>/', "\\2", $string);
echo $result; // this will output "I am a lot of text with links in it";
I would like to merge these two preg_replace calls. Any suggestions?
You need to use DOM for these tasks. Here is a sample that removes links from this content of yours:
$str = 'Try to remove the link text from the content links in it Try to remove the link text from the content testme Try to remove the link text from the content';
$dom = new DOMDocument;
// @ suppresses warnings about imperfect markup
@$dom->loadHTML($str, LIBXML_HTML_NOIMPLIED|LIBXML_HTML_NODEFDTD);
$xp = new DOMXPath($dom);
$links = $xp->query('//a');
foreach ($links as $link) {
    $link->parentNode->removeChild($link);
}
echo preg_replace('/^<p>([^<>]*)<\/p>$/', '$1', @$dom->saveHTML());
Since the text node is the only one in the document, PHP DOM creates a dummy p node to wrap the text, so I use preg_replace to remove it. I don't think that applies in your case.
See IDEONE demo
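Because the links in the sample string above did not survive, here is a self-contained sketch of the same DOM approach with a made-up input that actually contains a link. The sample string and the example.com URL are assumptions for illustration only.

```php
<?php
libxml_use_internal_errors(true); // alternative to the @ operator for silencing parser warnings

// Hypothetical sample input; the original question's links are not available.
$str = '<p>Some text <a href="https://example.com">link text</a> and more text</p>';

$dom = new DOMDocument();
$dom->loadHTML($str, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// XPath query results are a static list, so removing nodes inside the loop is safe.
$xp = new DOMXPath($dom);
foreach ($xp->query('//a') as $link) {
    $link->parentNode->removeChild($link);
}

$result = trim($dom->saveHTML());
echo $result;
```

Here the input already has a <p> root, so no dummy wrapper needs to be stripped afterwards.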

How to match the first image element without preceding text?

I need to select the first image tag in an HTML string, but only if it does not have preceding text. So for example, it should match this:
<p><span><img src="some.jpg"></span></p>
But it should not match this:
<p>Text text text<span><img src="some.jpg"></span></p>
nor this:
<p><span>Text text text<img src="some.jpg"></span></p>
I've tried something like:
/(<[^>]+>)<img/is
This lets me select the tags before the img tag, but I'm not able to exclude text that may appear in any tag preceding the img element.
Any thoughts?
Regex solution:
$regex='#^(<[^>]+>)*<img#i';
var_dump(preg_match($regex,'<p><span><img src="some.jpg"></span></p>'));
var_dump(preg_match($regex,'<p>Text text text<span><img src="some.jpg"></span></p>'));
var_dump(preg_match($regex,'<p><span>Text text text<img src="some.jpg"></span></p>'));
Outputs:
int(1)
int(0)
int(0)
Live demo
Edit:
DOM/XPath solution:
foreach (array('<p><span><img src="some.jpg"></span></p>',
    '<p>Text text text<span><img src="some.jpg"></span></p>',
    '<p><span>Text text text<img src="some.jpg"></span></p>') as $html)
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);
    var_dump($xpath->query('//img[string-length(//text())<=0]')->length);
}
Also outputs 1,0,0.
Live demo
Edit #2: The XPath solution still works, but it also excludes the case where text comes after the <img>. Since the question suggests that "preceding" is meant literally, I think regex is the better tool here.
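An XPath query can also be written to look only at text that literally precedes the image, using the preceding axis. This is a sketch rather than part of the original answer; the function name firstImageHasNoPrecedingText is hypothetical, and whitespace-only text is treated as "no text".

```php
<?php
libxml_use_internal_errors(true); // silence warnings about HTML fragments

// Returns true when the first <img> has no non-whitespace text before it.
function firstImageHasNoPrecedingText(string $html): bool
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);
    // (//img)[1] selects the first image in document order; the predicate
    // rejects it if any non-blank text node occurs earlier in the document.
    $query = '(//img)[1][not(preceding::text()[normalize-space()])]';
    return $xpath->query($query)->length === 1;
}

var_dump(firstImageHasNoPrecedingText('<p><span><img src="some.jpg"></span></p>'));
var_dump(firstImageHasNoPrecedingText('<p>Text text text<span><img src="some.jpg"></span></p>'));
var_dump(firstImageHasNoPrecedingText('<p><span>Text text text<img src="some.jpg"></span></p>'));
var_dump(firstImageHasNoPrecedingText('<p><span><img src="some.jpg">Text after</span></p>'));
```

Unlike the string-length(//text()) check, text that follows the image does not affect the result.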
Maybe like this:
$str = '
<p><span><img src="some1.jpg"></span></p>
<p><span>Text text text<img src="some2.jpg"></span></p>
<p><span>Text text text<img src="some3.jpg"></span></p>
<p><span><img src="some4.jpg"></span></p>';
preg_match_all('#<p>\s*<span>\s*<a.*(<img[^>]+>)#U', $str, $match);
echo '<pre>' . htmlspecialchars(print_r($match, 1)) . '</pre>';
$content = strip_tags($yourContent, '<p><img>');
preg_match_all("#<p>(<img[^>]+>)#U", $content, $out);
print_r($out);

Getting content of partial html in DomDocument

I have a string:
$string = 'some text <img src="www">';
I want to get the image source and the text.
Here is what I have:
$doc= new DOMDocument();
$doc->loadHTML($string);
$nodes=$doc->getElementsByTagName ('img');
From $nodes->item(0) I get the image source.
How can I get the "some text"?
Use textContent, or with DOMXPath: $xpath->query('//text()')
For simple cases like this, try:
$doc->documentElement->textContent
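Put together with the code from the question, reading both the image source and the text might look like this. It is a minimal sketch that reuses the question's sample string; the variable names $src and $text are made up here.

```php
<?php
libxml_use_internal_errors(true); // the input is a fragment, so silence parser warnings

$string = 'some text <img src="www">';

$doc = new DOMDocument();
$doc->loadHTML($string);

// Image source, read from the first <img> element.
$src = $doc->getElementsByTagName('img')->item(0)->getAttribute('src');

// All text in the document; trim() removes the space left before the image.
$text = trim($doc->documentElement->textContent);

echo $src, "\n", $text, "\n";
```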
You could do it the way you would with jQuery in JavaScript: wrap the whole string in an element and fetch that element. Then you can get the text node that contains this text.
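$string = 'some text <img src="www">';
$string = '<div id="wrapper">' . $string . '</div>';
$doc = new DOMDocument();
$doc->loadHTML($string);
$nodes = $doc->getElementById('wrapper');
// $nodes->firstChild is the text node that holds the text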

Using PHP to remove a html element from a string

I am having trouble working out how to do this. I have a string that looks something like this...
$text = "<p>This is some example text This is some example text This is some example text</p>
<p><em>This is some example text This is some example text This is some example text</em></p>
<p>This is some example text This is some example text This is some example text</p>";
I basically want to use something like preg_replace and regex to remove
<em>This is some example text This is some example text This is some example text</em>
So I need to write some PHP code that will search for the opening <em> and closing </em> and delete all the text in between.
Hope someone can help,
Thanks.
$text = preg_replace('/([\s\S]*)(<em>)([\s\S]*)(<\/em>)([\s\S]*)/', '$1$5', $text);
In case you are interested in a non-regex solution, the following would work as well:
<?php
$text = "<p>This is some example text This is some example text This is some example text</p>
<p><em>This is some example text This is some example text This is some example text</em></p>
<p>This is some example text This is some example text This is some example text</p>";
$emStartPos = strpos($text, "<em>");
$emEndPos = strpos($text, "</em>");
// strpos returns false when nothing is found; position 0 is a valid match
if ($emStartPos !== false && $emEndPos !== false) {
    $emEndPos += 5; // include the closing </em> tag as well
    $len = $emEndPos - $emStartPos;
    $text = substr_replace($text, '', $emStartPos, $len);
}
?>
This will remove all the content in between tags.
$text = '<p>This is some example text This is some example text This is some example text</p>
<p><em>This is the em text</em></p>
<p>This is some example text This is some example text This is some example text</p>';
preg_match("#<em>(.+?)</em>#", $text, $output);
echo $output[0]; // This will output it with em style
echo '<br /><br />';
echo $output[1]; // This will output only the text between the em
For this example to work, I changed the <em></em> contents a little, otherwise all your text is the same and you cannot really understand if the script works.
However, if you want to get rid of the <em> and not to get the contents:
$text = '<p>This is some example text This is some example text This is some example text</p>
<p><em>This is the em text</em></p>
<p>This is some example text This is some example text This is some example text</p>';
echo preg_replace("/<em>(.+)<\/em>/", "", $text);
Use strpos to find the opening tag and strrpos to find the closing tag.
Use substr to get that part of the string.
Then replace that substring with an empty string in the original string.
format: $text = str_replace('<em>','',$text);
$text = str_replace('</em>','',$text);
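The find, extract and replace steps described in this answer can be sketched as follows. The helper name removeSpan and the shortened sample text are assumptions for illustration. The sketch uses strpos for the first opening tag and strrpos for the last closing tag, so everything between them is removed, including any other markup in that range.

```php
<?php
// Removes everything from the first $open to the last $close, inclusive.
function removeSpan(string $text, string $open, string $close): string
{
    $start = strpos($text, $open);   // position of the first opening tag
    $end = strrpos($text, $close);   // position of the last closing tag
    if ($start === false || $end === false || $end < $start) {
        return $text;                // nothing to remove
    }
    // Extract the part to delete, closing tag included.
    $part = substr($text, $start, $end + strlen($close) - $start);
    // Replace that substring with an empty string in the original.
    return str_replace($part, '', $text);
}

// Hypothetical shortened sample text.
$text = "<p>Before</p>\n<p><em>Remove me</em></p>\n<p>After</p>";
$result = removeSpan($text, '<em>', '</em>');
echo $result;
```

By contrast, the two str_replace calls shown above remove only the <em> and </em> tags themselves and leave the text between them in place.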
