Remove javascript links - php

I'm looking for a regex that will be able to replace all links like Link with a warning. I've been having a play but no success so far! I've always been bad with regex, can someone point me in the right direction? I have this so far:
Edit: To those saying not to use regex: the HTML will be the output of a Markdown parser, with all HTML tags in the Markdown source stripped. I therefore know that every link in the output will be formatted as stated above, so regex should be a good tool in this particular situation. I am not allowing users to enter raw HTML. SO has done something very similar: try creating a javascript link there and it will be removed.
<?php
//Javascript link filter test
if(isset($_POST['jsfilter'])){
$html = " JS Link ";
$pattern = "/ href\\s*?=\\s*?[\"']\\s*?(javascript)\\s*?(:).*?([\"']) /is";
$replacement = "\"javascript: alert('Javascript links have been blocked');\"";
$html = preg_replace($pattern, $replacement, $html);
echo $html;
}
?>
<form method="post">
<input type="text" name="jsfilter" />
<button type="submit">Submit</button>
</form>

The right regex should be :
$pattern = '/href="javascript:[^"]+"/';
$replacement = 'href="javascript:alert(\'Javascript links have been blocked\')"';

Use strip_tags() and htmlspecialchars() to display user-generated content. If you want to let users use specific tags, look into BBCode.

You should handle both single and double quotes, whitespace, etc.
$html = preg_replace( '/href\s*=\s*"javascript:[^"]+"/i' , 'href="#"' , $html );
$html = preg_replace( '/href\s*=\s*\'javascript:[^\']+\'/i' , 'href=\'#\'' , $html );
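The two quote styles can also be handled in one pattern by capturing the opening quote and requiring the same character to close the attribute. This is only a minimal sketch: the function name block_js_links is hypothetical, and it assumes links come out of the Markdown parser in the regular form described in the question.

```php
<?php
// Hypothetical helper: neutralizes href attributes that start with "javascript:".
function block_js_links(string $html): string
{
    // (["\']) captures the opening quote; \1 requires the same closing quote.
    // The i flag ignores case, the s flag lets . match newlines inside the value.
    $pattern = '/href\s*=\s*(["\'])\s*javascript\s*:.*?\1/is';
    return preg_replace($pattern, 'href="#"', $html);
}

echo block_js_links('<a href="javascript:alert(1)">JS Link</a>');
// prints: <a href="#">JS Link</a>
```

Ordinary links are left alone because the pattern only matches when the attribute value begins with javascript followed by a colon.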

Try this code; I think it will help.
<?php
//Javascript link filter test
if(isset($_POST['jsfilter'])){
$html = " JS Link ";
$pattern = '/a href="javascript:(.*?)"/i';
$replacement = 'a href="javascript: alert(\'Javascript links have been blocked\');"';
$html = preg_replace($pattern, $replacement, $html);
echo $html;
}
?>


How to preg_match_all to get the text inside the tags "<h3>" and "<h3> <a/> </h3>"

Hello, I am currently creating an automatic table of contents for my WordPress site. My reference is:
https://webdeasy.de/en/wordpress-table-of-contents-without-plugin/
Problem:
Everything works unless the <h3> tag contains an <a> link. In that case the entry is missing from the $names result.
I think the problem is in this regex:
preg_match_all("/<h[3,4](?:\sid=\"(.*)\")?(?:.*)?>(.*)<\/h[3,4]>/", $content, $matches);
// get text under <h3> or <h4> tag.
$names = $matches[2];
I have tried modifying the regex (which I don't really understand):
preg_match_all("/<h[3,4](?:\sid=\"(.*)\")?(?:.*)?><a(.*)>(.*)<\/a><\/h[3,4]>/", $content, $matches);
// get text under <a> tag.
$names = $matches[4];
The code above finds the text in an <h3><a>a text</a></h3> heading, but then an <h3> tag that doesn't contain an <a> tag becomes the problem.
My question:
How can I combine the two snippets above?
My expectation is that when the first pattern gives no result, the second one is used instead.
Or maybe there is a better solution? Thank you.
Here's a way that will remove any tags inside of header tags
$html = <<<EOT
<h3>Here's an alternative solution</h3> to using regex. <h3>It may <a name='#thing'>not</a></h3> be the most elegant solution, but it works
EOT;
preg_match_all('#<h(.*?)>(.*?)<\/h(.*?)>#si', $html, $matches);
foreach ($matches[0] as $num=>$blah) {
$look_for = preg_quote($matches[0][$num],"/");
$tag = str_replace("<","",explode(">",$matches[0][$num])[0]);
$replace_with = "<$tag>" . strip_tags($matches[2][$num]) . "</$tag>";
$html = preg_replace("/$look_for/", $replace_with,$html,1);
}
echo "<pre>$html</pre>";
The answer by @kinglish is the basis of this solution, thank you very much. I slightly modified and simplified it to fit the article linked in my question. This code worked for me:
preg_match_all('#(\<h[3-4])\sid=\"(.*?)\"?\>(.*?)(<\/h[3-4]>)#si',$content, $matches);
$tags = $matches[0];
$ids = $matches[2];
$raw_names = $matches[3];
/* Clean $raw_names of other HTML tags */
$clean_names= array_map(function($v){
return trim(strip_tags($v));
}, $raw_names);
$names = $clean_names;
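As a variation on the code above, the id attribute can be made optional so that headings without an id are still captured. This is a sketch on made-up sample input, not the code from the linked article; the $content value here is an assumption for illustration.

```php
<?php
// Hypothetical sample: a heading with an id, one with a link inside, one with neither.
$content = '<h3 id="intro">Intro</h3><h4 id="more"><a href="#x">More</a> info</h4><h3>Plain</h3>';

// Group 1: the id value, if an id directly follows the tag name (empty string otherwise).
// Group 2: everything between the opening and closing heading tags.
preg_match_all('#<h[34](?:\sid="(.*?)")?[^>]*>(.*?)</h[34]>#si', $content, $matches);

$ids = $matches[1];
// Remove nested tags such as <a> so only the visible heading text remains.
$names = array_map(function ($v) {
    return trim(strip_tags($v));
}, $matches[2]);

print_r($names); // Intro, More info, Plain
```

Because strip_tags() runs on the captured heading body, it makes no difference whether the heading contains a link.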

preg_replace question

I'm trying to use preg_replace to strip out a section of code but I am having problems getting it to work right.
Code Example:
$str = '<p class="code">some string here</p>';
PHP I'm using:
$pattern = array();
$pattern[0] = '!<p class="code">!';
$pattern[1] = '!</p>!';
preg_replace($pattern,"", $str);
This strips out the code just as I want with the exception of the space between the p and class.
Returns:
some string here //notice the single space at the beginning.
I'm trying to get:
some string here //no space at the beginning.
I have been beating my head against the wall trying to find a solution. The reason I'm trying to strip it out in a chunk instead of breaking the preg_replace into pieces is because I don't want to change anything that may be in the string between the tags. Any ideas?
That does not happen for me (and it shouldn't).
It may be a space output somewhere else (use var_dump() to view the string).
You might want to look into this thread to see if you want to switch to using DOMDocument. It'll save you a great deal of headaches trying to parse through HTML.
Robust and Mature HTML Parser for PHP
test:
<?php
$str = '<p class="code">some string here</p>';
$pattern = array();
$pattern[0] = '!<p class="code">!';
$pattern[1] = '!</p>!';
$result = preg_replace($pattern,"", $str);
var_dump($result);
result:
php pregrep.php
string(16) "some string here"
seems to work just fine.
Alex, I figured out where I was picking up the extra space.
I was putting that code into a text area like this:
$str = '<p class="code">some string here</p>';
$pattern = array();
$pattern[0] = '!<p class="code">!';
$pattern[1] = '!</p>!';
$strip_str = preg_replace($pattern,"", $str);
<textarea id="code_area" class="syntaxhl" name="code" cols="66" rows="5">
<?php echo $strip_str; ?>
</textarea>
This gave me my extra space but when I changed the code to:
<textarea id="code_area" class="syntaxhl" name="code" cols="66" rows="5"><?php echo $strip_str; ?></textarea>
With no line breaks or spaces around the PHP tag, the extra space went away.
Why not use trim()?
$text = trim($text);
This removes white spaces around strings.

Removal of bad hyperlinks and the content inside of them

Ok, basically I have an array of bad urls and I would like to search through a string and strip them out. I want to strip everything from the opening tag to the closing tag, but only if the url in the hyperlink is in the array of bad urls. Here is how I would picture it working but I don't understand regular expressions well.
foreach($bad_urls as $bad_url){
$pattern = "/<a*$bad_url*</a>/";
$replacement = ' ';
preg_replace($pattern, $replacement, $content);
}
Thanks in advance.
Assuming that your 'bad urls' are properly formatted URLs, I would suggest doing something like this:
foreach($bad_urls as $bad_url){
$pattern = '/<a\s[^>]*href\s*=\s*"' . convert_to_pattern($bad_url) . '"[^>]*>.*?<\/a>/is';
$replacement = ' ';
$content = preg_replace($pattern, $replacement, $content);
}
and separately
function convert_to_pattern($url)
{
// Escape every regex metacharacter in the URL, including the "/" delimiter.
return preg_quote($url, '/');
}
Please do not try to parse HTML using regular expressions. Just load up the HTML in a DOM, find all the <a> tags and check the href property. Much simpler and fool-proof.
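A minimal sketch of that DOM approach, using PHP's built-in DOMDocument. The function name remove_bad_links is hypothetical, and the sketch assumes the content is an ASCII HTML fragment; other encodings may need extra handling when loading.

```php
<?php
// Hypothetical helper: drops every <a> element whose href is listed in $bad_urls.
function remove_bad_links(string $content, array $bad_urls): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    // Wrap the fragment in a single <div> so the document has one root element.
    $doc->loadHTML('<div>' . $content . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live node list first, because removing nodes changes it.
    foreach (iterator_to_array($doc->getElementsByTagName('a')) as $a) {
        if (in_array($a->getAttribute('href'), $bad_urls, true)) {
            $a->parentNode->removeChild($a); // removes the link and its text
        }
    }

    // Serialize the children of the wrapper <div>, not the wrapper itself.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$bad_urls = ['http://bad.example/x'];
echo remove_bad_links('a <a href="http://bad.example/x">spam</a> b <a href="http://ok.example/">fine</a>', $bad_urls);
```

The comparison is an exact string match on the href value, so URLs that differ only in case or a trailing slash would have to be normalized first.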

Stripping html tags using php

How can I strip HTML tags, except for the content inside the pre tag?
Code:
$content="
<div id="wrapper">
Notes
</div>
<pre>
<div id="loginfos">asdasd</div>
</pre>
";
When I use strip_tags($content,''), the HTML inside the pre tag is stripped too, but I don't want the HTML inside pre to be stripped.
Try :
echo strip_tags($text, '<pre>');
You may do the following:
Use preg_replace with the 'e' modifier to replace the contents of pre tags with placeholder strings like ###1###, ###2###, etc., while storing the contents in an array (on PHP 7 and later use preg_replace_callback instead, since the 'e' modifier was removed)
Run strip_tags()
Run preg_replace with the 'e' modifier again to restore ###1###, etc. to the original contents.
A bit kludgy but should work.
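The steps above can be sketched with preg_replace_callback, which replaces the 'e' modifier on PHP 7 and later. The function name and the ###N### placeholder format are assumptions for illustration; the sketch also assumes the text does not already contain such placeholders.

```php
<?php
// Hypothetical helper: strips all tags, but leaves the contents of <pre> blocks untouched.
function strip_tags_keep_pre(string $content): string
{
    $blocks = [];

    // Step 1: swap each <pre> body for a numbered placeholder and remember the body.
    $masked = preg_replace_callback(
        '/(<pre[^>]*>)(.*?)(<\/pre>)/is',
        function (array $m) use (&$blocks) {
            $blocks[] = $m[2];
            return $m[1] . '###' . (count($blocks) - 1) . '###' . $m[3];
        },
        $content
    );

    // Step 2: strip every tag except <pre> itself.
    $stripped = strip_tags($masked, '<pre>');

    // Step 3: put the saved bodies back in place of their placeholders.
    return preg_replace_callback(
        '/###(\d+)###/',
        function (array $m) use ($blocks) {
            return $blocks[(int) $m[1]] ?? $m[0];
        },
        $stripped
    );
}

echo strip_tags_keep_pre('<div id="wrapper">Notes</div><pre><div id="loginfos">asdasd</div></pre>');
// prints: Notes<pre><div id="loginfos">asdasd</div></pre>
```

Restoring through a callback instead of a replacement string avoids problems when the saved content contains characters such as $ or backslashes, which preg_replace would otherwise treat as references.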
<?php
$document=html_entity_decode($content);
// The original list ended with a '&#(\d+);'e entry. The /e modifier was removed in PHP 7,
// and html_entity_decode() above already decodes numeric entities, so that entry is dropped.
$search = array ("'<script[^>]*?>.*?</script>'si","'<[/!]*?[^<>]*?>'si","'([\r\n])[\s]+'","'&(quot|#34);'i","'&(amp|#38);'i","'&(lt|#60);'i","'&(gt|#62);'i","'&(nbsp|#160);'i","'&(iexcl|#161);'i","'&(cent|#162);'i","'&(pound|#163);'i","'&(copy|#169);'i");
$replace = array ("","","\\1","\"","&","<",">"," ",chr(161),chr(162),chr(163),chr(169));
$text = preg_replace($search, $replace, $document);
echo $text;
?>
$text = 'YOUR CODE HERE';
$org_text = $text;
// hide content within pre tags
$text = preg_replace( '/(<pre[^>]*>)(.*?)(<\/pre>)/is', '$1###pre###$3', $text );
// filter content
$text = strip_tags( $text, '<pre>' );
// insert back content of pre tags
if ( preg_match_all( '/(<pre[^>]*>)(.*?)(<\/pre>)/is', $org_text, $parts ) ) {
foreach ( $parts[2] as $code ) {
$text = preg_replace( '/###pre###/', $code, $text, 1 );
}
}
print_r( $text );
OK, that leaves only one choice: regular expressions. Nobody likes them, but they sure get the job done. First, replace the problematic text with a placeholder, like this:
$content = preg_replace("#<pre>(.+?)</pre>#s", "||k||", $content);
This will effectively swap your
<pre> blah, blah, bllah....</pre>
for the placeholder. Then call
$content = strip_tags($content);
After that, replace ||k|| (or whatever placeholder you chose) with the original value, which you need to have saved beforehand, and you'll get the desired result.
I think your content is not stored correctly in the $content variable, because the inner double quotes end the string early.
Could you try converting the inner double quotes to single quotes?
$content="
<div id='wrapper'>
Notes
</div>
<pre>
<div id='loginfos'>asdasd</div>
</pre>
";
strip_tags($content, '<pre>');
Could you please write the full code? I understand the idea, but something goes wrong.

How to remove a link from content in php?

How can I remove the link and keep only the remaining text?
text text text. <br><a href='http://www.example.com' target='_blank' title='title' style='text-decoration:none;'>name</a>
like this:
text text text. <br>
I still have a problem:
$text = file_get_contents('http://www.example.com/file.php?id=name');
echo preg_replace('#<a.*?>.*?</a>#i', '', $text);
The text (with the link) is at that URL, but this code doesn't work.
What's wrong? Can someone help me?
I suggest you keep the text in the link:
strip_tags($text, '<br>');
or the hard way:
preg_replace('#<a.*?>(.*?)</a>#i', '\1', $text)
If you don't need to keep the text in the link:
preg_replace('#<a.*?>.*?</a>#i', '', $text)
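To make the difference between the two replacements concrete, here is a small sketch on a shortened version of the sample text from the question. The \b after <a is an addition of mine so that tags such as <abbr> are not matched, and the s flag lets the link span several lines.

```php
<?php
$text = "text text text. <br><a href='http://www.example.com' target='_blank'>name</a>";

// Keep the link text, drop only the <a ...> and </a> tags.
$keep = preg_replace('#<a\b[^>]*>(.*?)</a>#is', '$1', $text);

// Drop the link together with its text.
$drop = preg_replace('#<a\b[^>]*>.*?</a>#is', '', $text);

echo $keep, "\n"; // text text text. <br>name
echo $drop, "\n"; // text text text. <br>
```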
While strip_tags() is capable of basic string sanitization, it's not fool-proof. If the data you need to filter is coming in from a user, and especially if it will be displayed back to other users, you might want to look into a more comprehensive HTML sanitizer, like HTML Purifier. These types of libraries can save you from a lot of headache up the road.
strip_tags() and various regex methods can't and won't stop a user who really wants to inject something.
Try:
preg_replace('/<a.*?<\/a>/','',"test test testa<br> <a href='http://www.example.com' target='_blank' title='title' style='text-decoration:none;'>name</a>");
This is my solution:
function removeLink($str){
$regex = '/<a (.*)<\/a>/isU';
preg_match_all($regex,$str,$result);
foreach($result[0] as $rs)
{
$regex = '/<a (.*)>(.*)<\/a>/isU';
$text = preg_replace($regex,'$2',$rs);
$str = str_replace($rs,$text,$str);
}
return $str;
}
A one-line version compiled from the notes above:
$withoutlink = preg_replace('/<a.*>(.*)<\/a>/isU','$1',$String);
strip_tags() will strip HTML tags.
Try this one. Very simple!
$content = "text text text. <br><a href='http://www.example.com' target='_blank' title='title' style='text-decoration:none;'>name</a>";
echo preg_replace("/<a[^>]+>[^<]*<\/a>/i", "", $content);
Output:
text text text. <br>
Try:
$string = preg_replace( '#<(a)[^>]*?>.*?</\\1>#si', '', $string );
Note:
this code removes the link together with its text.
One more short solution without regexps:
function remove_links($s){
while(TRUE){
@list($pre,$mid) = explode('<a',$s,2);
@list($mid,$post) = explode('</a>',$mid,2);
$s = $pre.$post;
if (is_null($post))return $s;
}
}
?>
