How to remove a link from content in php? - php

How can i remove the link and remain with the text?
text text text. <br><a href='http://www.example.com' target='_blank' title='title' style='text-decoration:none;'>name</a>
like this:
text text text. <br>
i still have a problem.....
$text = file_get_contents('http://www.example.com/file.php?id=name');
echo preg_replace('#<a.*?>.*?</a>#i', '', $text)
in that url was that text(with the link) ...
this code doesn't work...
what's wrong?
Can someone help me?

I suggest you to keep the text in link.
strip_tags($text, '<br>');
or the hard way:
preg_replace('#<a.*?>(.*?)</a>#i', '\1', $text)
If you don't need to keep text in the link
preg_replace('#<a.*?>.*?</a>#i', '', $text)

While strip_tags() is capable of basic string sanitization, it's not fool-proof. If the data you need to filter is coming in from a user, and especially if it will be displayed back to other users, you might want to look into a more comprehensive HTML sanitizer, like HTML Purifier. These types of libraries can save you from a lot of headache up the road.
strip_tags() and various regex methods can't and won't stop a user who really wants to inject something.

Try:
preg_replace('/<a.*?<\/a>/','',"test test testa<br> <a href='http://www.example.com' target='_blank' title='title' style='text-decoration:none;'>name</a>");

this is my solutions :
function removeLink($str){
$regex = '/<a (.*)<\/a>/isU';
preg_match_all($regex,$str,$result);
foreach($result[0] as $rs)
{
$regex = '/<a (.*)>(.*)<\/a>/isU';
$text = preg_replace($regex,'$2',$rs);
$str = str_replace($rs,$text,$str);
}
return $str;}

A version from the above compiled notes:
$withoutlink = preg_replace('/<a.*>(.*)<\/a>/isU','$1',$String);

strip_tags() will strip HTML tags.

Try this one. Very simple!
$content = "text text text. <br><a href='http://www.example.com' target='_blank' title='title' style='text-decoration:none;'>name</a>";
echo preg_replace("/<a[^>]+\>[a-z]+/i", "", $content);
Output:
text text text. <br>

Try:
$string = preg_replace( '#<(a)[^>]*?>.*?</\\1>#si', '', $string );
Note:
this code remove link with text.

One more short solution without regexps:
function remove_links($s){
while(TRUE){
#list($pre,$mid) = explode('<a',$s,2);
#list($mid,$post) = explode('</a>',$mid,2);
$s = $pre.$post;
if (is_null($post))return $s;
}
}
?>

Related

Replacing Relative Links with External Links in PHP String

I am working with an editor that works purely with internal relative links for files which is great for 99% of what I use it for.
However, I am also using it to insert links to files within an email body and relative links don't cut the mustard.
Instead of modifying the editor, I would like to search the string from the editor and replace the relative links with external links as shown below
Replace
files/something.pdf
With
https://www.someurl.com/files/something.pdf
I have come up with the following but I am wondering if there is a better / more efficient way to do it with PHP
<?php
$string = 'A link, some other text, A different link';
preg_match_all('/<a[^>]+href=([\'"])(?<href>.+?)\1[^>]*>/i', $string, $result);
if (!empty($result)) {
// Found a link.
$baseUrl = 'https://www.someurl.com';
$newUrls = array();
$newString = '';
foreach($result['href'] as $url) {
$newUrls[] = $baseUrl . '/' . $url;
}
$newString = str_replace($result['href'], $newUrls, $string);
echo $newString;
}
?>
Many thanks
Lee
You can simply use preg_replace to replace all the occurrences of files starting URLs inside double quotes:
$string = 'A link, some other text, A different link';
$string = preg_replace('/"(files.*?)"/', '"https://www.someurl.com/$1"', $string);
The result would be:
A link, some other text, A different link
You really should use DOMdocument for such job, but if you want to use a regex, this one does the job:
$string = '<a some_attribute href="files/something.pdf" class="abc">A link</a>, some other text, <a class="def" href="files/somethingelse.pdf" attr="xyz">A different link</a>';
$baseUrl = 'https://www.someurl.com';
$newString = preg_replace('/(<a[^>]+href=([\'"]))(.+?)\2/i', "$1$baseUrl/$3$2", $string);
echo $newString,"\n";
Output:
<a some_attribute href="https://www.someurl.comfiles/something.pdf" class="abc">A link</a>, some other text, <a class="def" href="https://www.someurl.com/files/somethingelse.pdf" attr="xyz">A different link</a>

Strip just <p> tags in PHP

I have a set of <p></p> tags wrapping a set of data, that data includes other tags such as <script></script> however that content could contain any number of different tags.
I just need to remove any paragraph tags from the content
Example below
$text = "<p><script>example text inside script.<script></p>";
I see the strip_tags function but I believe that will remove all the tags.
How would I go about just removing the paragraph?
Thanks.
Try,
$text = "<p><script>example text inside script.<script></p>";
$formatted_text = str_replace(['<p>', '</p>'], '', $text);
You can allow tag with strip_tags()
like this example:
$text = "<p><script>example text inside script.<script></p>";
echo strip_tags($text, '<script>');
Try this.
<?php
$text = "<p><script>example text inside script.<script></p>";
$replace = array('<p>','</p>');
echo str_replace($replace,'',$text);
http://sandbox.onlinephpfunctions.com/code/43150c7af4e7e5f572827d91abca5756213ab7ba
Version 2 (works for classes)
echo preg_replace('%<p(.*?)>|</p>%s','',$text);
http://sandbox.onlinephpfunctions.com/code/6c5414773efc1317578b5f0581b68e5acabb9a2b
Hope this helps.
You can use str_replace(),
add this line -
$text = str_replace('<p>','',$text);
$text = str_replace('</p>','',$text);
It will remove both
<p> and </p>

How to remove plain text from a string after using strip_tags()

So i have a string and I used the strip_tags() function to remove all tags except IMG but I still have plain text next to my IMG element. Here a visual example
$myvariable = "This text needs to be removed<a href='blah_blah_blah'>Blah</a><img src='blah.jpg'>"
So using PHP strip_tags() I was able to remove all tags except the <img> tag (which is what I want). But the thing is now it didn't remove the text.
How do I remove the left over text? Text will always either before tag or after tag as well
[ADDED MORE DETAILS]
$description = 'crazy stuff<img src="https://scontent.cdninstagram.com/t51.2885-15/e15/14287934_1389514537744146_673363238_n.jpg?ig_cache_key=MTMzNzM3MzgwNjAyNDY5NDAzMA%3D%3D.2">';
that's what the variable is actually holding.
Thanks in Advance
Instead of replacing something you can very well extract the values you want:
(<(\w+).+</\2>)
To be used with preg_match(), see a demo on regex101.com.
IN PHP:
<?php
$regex = '~(<(\w+).+</\2>)~';
$string = 'crazy stuff<img src="https://scontent.cdninstagram.com/t51.2885-15/e15/14287934_1389514537744146_673363238_n.jpg?ig_cache_key=MTMzNzM3MzgwNjAyNDY5NDAzMA%3D%3D.2">here as well';
if (preg_match($regex, $string, $match)) {
echo $match[1];
}
?>
Please show your whole piece of code with the use of strip_tags.
You can try: preg_replace('~.*(<img[^>]+>)~', '$1', $myvariable);

Remove javascript links

I'm looking for a regex that will be able to replace all links like Link with a warning. I've been having a play but no success so far! I've always been bad with regex, can someone point me in the right direction? I have this so far:
Edit: People saying don't use Regex - the HTML will be the output of a markdown parser with all HTML tags in the markdown stripped. Therefore i know that the output of all links will be formatted as stated above, therefore regex would surely be a good tool in this particular situation. I am not allowing users to enter pure HTML. And SO has done something very similar, try creating a javascript link, and it will be removed
<?php
//Javascript link filter test
if(isset($_POST['jsfilter'])){
$html = " JS Link ";
$pattern = "/ href\\s*?=\\s*?[\"']\\s*?(javascript)\\s*?(:).*?([\"']) /is";
$replacement = "\"javascript: alert('Javascript links have been blocked');\"";
$html = preg_replace($pattern, $replacement, $html);
echo $html;
}
?>
<form method="post">
<input type="text" name="jsfilter" />
<button type="submit">Submit</button>
</form>
The right regex should be :
$pattern = '/href="javascript:[^"]+"/';
$replacement = 'href="javascript:alert(\'Javascript links have been blocked\')"';
Use strip_tags and htmlSpecialChars() to display user generated content. If you want to let users use specific tags, refer to BBcode.
You should test quote and double quotes, handle white spaces, etc...
$html = preg_replace( '/href\s*=\s*"javascript:[^"]+"/i' , 'href="#"' , $html );
$html = preg_replace( '/href\s*=\s*\'javascript:[^i]+\'/i' , 'href=\'#\'' , $html );
Try this code. I think, this would help.
<?php
//Javascript link filter test
if(isset($_POST['jsfilter'])){
$html = " JS Link ";
$pattern = '/a href="javascript:(.*?)"/i';
$replacement = 'a href="javascript: alert(\'Javascript links have been blocked\');"';
$html = preg_replace($pattern, $replacement, $html);
echo $html;
}
?>

What is the best way to strip out all html tags from a string?

Using PHP, given a string such as: this is a <strong>string</strong>; I need a function to strip out ALL html tags so that the output is: this is a string. Any ideas? Thanks in advance.
PHP has a built-in function that does exactly what you want: strip_tags
$text = '<b>Hello</b> World';
print strip_tags($text); // outputs Hello World
If you expect broken HTML, you are going to need to load it into a DOM parser and then extract the text.
What about using strip_tags, which should do just the job ?
For instance (quoting the doc) :
<?php
$text = '<p>Test paragraph.</p><!-- Comment --> Other text';
echo strip_tags($text);
echo "\n";
will give you :
Test paragraph. Other text
Edit : but note that strip_tags doesn't validate what you give it. Which means that this code :
$text = "this is <10 a test";
var_dump(strip_tags($text));
Will get you :
string 'this is ' (length=8)
(Everything after the thing that looks like a starting tag gets removed).
strip_tags is the function you're after. You'd use it something like this
$text = '<strong>Strong</strong>';
$text = strip_tags($text);
// Now $text = 'Strong'
I find this to be a little more effective than strip_tags() alone, since strip_tags() will not zap javascript or css:
$search = array(
"'<head[^>]*?>.*?</head>'si",
"'<script[^>]*?>.*?</script>'si",
"'<style[^>]*?>.*?</style>'si",
);
$replace = array("","","");
$text = strip_tags(preg_replace($search, $replace, $html));

Categories