How to use preg_replace to remove part of url address? - php

I have some HTML code like this:
<a href="http://mysite.com/documentos/Servicios/SUCRE/sucDoc19.pdf&sa=U&ei=sf0JUrmjIc3Nswb154CgDQ&ved=0CCkQFjAA&usg=AFQjCNGfXg_9x83U3pYr6JfkJcWuXv8X0Q">
I need to clean my code to get something like this
<a href="http://mysite.com/documentos/Servicios/SUCRE/sucDoc19.pdf">
using preg_replace.
My code is the following:
$serp = preg_replace('&sa=(.*)" ', '" ', $serp);
and it doesn't work.
By the way, I need the match to stop at the FIRST occurrence, i.e. I need to replace everything from &sa= up to the FIRST ", but right now it matches from &sa= to the LAST "...

You're missing the regex delimiters.
$serp = preg_replace('/&sa=(.*)" /', '" ', $serp);
This fixes the delimiter error. Note that .* is greedy, so the match still runs to the last " followed by a space.

You missed the delimiters.
Your code should look like this:
$serp = preg_replace('/&sa=(.*)" /', '" ', $serp);
If you want to delete everything up to the first quote, you can try the following instead of a regex:
// substr_replace() takes a start offset and a length, so compute both explicitly
$start = strpos($serp, '&sa=');
$end = ($start === false) ? false : strpos($serp, '"', $start);
if ($end !== false) { $serp = substr_replace($serp, '', $start, $end - $start); }

Just another regex to do it :)
$text = '<a href="http://mysite.com/documentos/Servicios/SUCRE/sucDoc19.pdf&sa=U&ei=sf0JUrmjIc3Nswb154CgDQ&ved=0CCkQFjAA&usg=AFQjCNGfXg_9x83U3pYr6JfkJcWuXv8X0Q" target="_blank">';
$text = preg_replace('/(&sa=[^"]*)/', '', $text);
echo $text;
// Output:
<a href="http://mysite.com/documentos/Servicios/SUCRE/sucDoc19.pdf" target="_blank">
You can try it in an online tester (thanks to hjpotter92 for this tool).
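To make the difference between the answers above concrete, here is a minimal sketch comparing a greedy match, a lazy match, and a negated character class. The sample string is a shortened, made-up version of the link in the question, and the variable names are only for illustration.

```php
<?php
// Shortened sample input (an assumption, not the asker's exact string).
$html = '<a href="http://mysite.com/doc.pdf&sa=U&ei=abc" target="_blank">';

// Greedy: .* runs to the LAST double quote, so the target attribute is swallowed too.
$greedy = preg_replace('/&sa=.*"/', '"', $html);

// Lazy: .*? stops at the FIRST double quote after &sa=.
$lazy = preg_replace('/&sa=.*?"/', '"', $html);

// Negated class: [^"]* can never cross a double quote, so no backtracking tricks are needed.
$negated = preg_replace('/&sa=[^"]*/', '', $html);

echo $greedy, "\n";   // <a href="http://mysite.com/doc.pdf">
echo $lazy, "\n";     // <a href="http://mysite.com/doc.pdf" target="_blank">
echo $negated, "\n";  // <a href="http://mysite.com/doc.pdf" target="_blank">
```

The negated class is usually the easiest of the three to reason about, because the stopping character is spelled out in the pattern itself.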

Related

Problems with Regular expressions in PHP

(I want to let my users tag other users by name. Problem: when someone edits a post again, the link shows up in the TinyMCE editor, and when the edits are saved, the script destroys the old link...)
I replace every word in a big string that also appears in an array.
$users = array('this', 'car');
$text = 'hello, this is <a title="this" href="">a test this</a>';
$search = '!\b('.implode('|', $users).')\b!i';
$replace = '<a target="_blank" alt="$1" href="/user/$1">$1</a>';
$text = preg_replace($search, $replace, $text);
As you can see above, I try to replace 'this' and 'car' in $text with
<a target="_blank" alt="$1" href="/user/$1">$1</a>
The problem is that my script also replaces 'this' when it is inside my link:
<a title="this" href="">this</a>
I'm not completely sure, but I think you know what I mean.
So my script destroys my links...
I don't need to detect whether the word is inside an HTML element, because it should still be replaced inside other tags like h1 or p ...
I need something like
a pattern that only matches when the word looks like:
" this "
" this, "
",this "
" this: "...
(No problem if I have to set these manually...)
Another great solution: a string where I can set the HTML tags that are not allowed.
$tags = 'a,e,article';
Greets
This should do it
<.*?this.*?>(*SKIP)(*FAIL)|\b(this)\b
Demo: https://regex101.com/r/fX0pT1/1
More on this regex approach, http://www.rexegg.com/regex-best-trick.html.
PHP Usage:
$users = array('this', 'car');
$text = 'hello, this is <a title="this" href="">a test this</a>';
$terms = '(' . implode('|', $users) . ')';
$search = '!<.*?'.$terms.'.*?>(*SKIP)(*FAIL)|\b(' . $terms . ')\b!i';
echo $search;
$replace = '<a target="_blank" alt="$2" href="/user/$2">$2</a>';
echo preg_replace($search, $replace, $text);
Output of the final echo:
hello, <a target="_blank" alt="this" href="/user/this">this</a> is <a title="this" href="">a test <a target="_blank" alt="this" href="/user/this">this</a></a>
PHP Demo: https://eval.in/415964
...or, if you only want to skip matches inside link tags, use <a.*?this.*?>(*SKIP)(*FAIL)|\b(this)\b (demo: https://regex101.com/r/fX0pT1/2).
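As the output above shows, a name inside the text of an existing link still gets wrapped again. One possible variation on the same (*SKIP)(*FAIL) idea, sketched here under my own assumptions, is to skip whole <a>...</a> elements as well as any other tag. The extra sample text, the ~ delimiter, and the simplified replacement are not from the original answer.

```php
<?php
$users = array('this', 'car');
$text  = 'hello, this is <a title="this" href="">a test this</a> and a <b>car</b>';

// Escape each name so characters like "." or "+" are matched literally.
$names = implode('|', array_map(function ($u) {
    return preg_quote($u, '~');
}, $users));

$search = '~<a\b[^>]*>.*?</a>(*SKIP)(*FAIL)'   // skip whole existing links
        . '|<[^>]*>(*SKIP)(*FAIL)'             // skip any other tag and its attributes
        . '|\b(' . $names . ')\b~is';          // otherwise match a listed name
$replace = '<a target="_blank" href="/user/$1">$1</a>';

$result = preg_replace($search, $replace, $text);
echo $result;
// hello, <a target="_blank" href="/user/this">this</a> is <a title="this" href="">a test this</a> and a <b><a target="_blank" href="/user/car">car</a></b>
```

Using [^>]* instead of .*? inside the tag keeps the skipped part from running past the closing > of the tag. This is still a regex working on HTML, so unusual markup (for example a > inside an attribute value) may not be handled.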

How to replace b tag with # sign

I have a string that looks like this
$t="<b>vist</b>thank you for the follow.";
I am trying to remove the tag b and put an "#" instead of this tag.
I tried this
str_replace("<b></b>","#",$t);
but it doesn't replace the closing tag.
I don't know why it isn't working; maybe something is missing from the code.
Try with
$search = array('<b>','</b>');
$replace = '#';
echo str_replace($search, $replace, $t);
To replace multiple strings with str_replace(), pass arrays of search and replacement values.
You can try this:
$t="<b>vist</b> thank you for the follow";
$pattern=array();
$pattern[0]="<b>";
$pattern[1]="</b>";
$replacement=array();
$replacement[0]="#";
$replacement[1]="";
echo str_replace($pattern,$replacement,$t);
Try with -
$t="<b>vist</b>";
echo str_replace(array("<b>", "</b>"),"#",$t);
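Since this page is mostly about preg_replace, here is a minimal sketch of doing the same thing with a single regex call instead of str_replace. It assumes the goal is a "#" in front of the bold text and nothing where the closing tag was; the variable $result is just for illustration.

```php
<?php
$t = '<b>vist</b>thank you for the follow.';

// Capture the text between the tags and put "#" in front of it.
$result = preg_replace('~<b>(.*?)</b>~s', '#$1', $t);

echo $result; // #vistthank you for the follow.
```

Unlike the original attempt, this searches for the opening and closing tags separately rather than for the literal string "<b></b>", and it keeps the return value.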

How to ignore single quotes in regex using preg_replace function in PHP?

I am basically trying to transform any hash-tagged word in a string into a link:
Here is what my code looks like:
public function linkify($text)
{
// ... generating $url
$text = preg_replace("/\B#(\w+)/", "<a href=" . $url . "/$1>#$1</a>", $text);
return $text;
}
It works pretty well, except when $text contains a single quote. Here are two examples:
Example1:
"What is your #name ?"
Result: "What is your #name?" Works fine.
Example2:
"What's your #name ?"
Result: "What's your #name?" Does not work. I want
this result: "What's your #name?"
Any idea how I can get rid of that single-quote problem using PHP?
EDIT1:
Just for info, both before and after html_entity_decode($text) I get
"What&#39;s your #name?"
Something like this.
$string = "' \'' '";
$string = preg_replace("#[\\\\']#", "\'", $string);
Something is protecting your HTML entities. That can save your life if the string comes from a GET/POST request, but if it's from a trusted source, just use html_entity_decode to convert it back. The &#39; sequence is the HTML entity for the single quote, as you might have realized.
If the problem is HTML entities, then maybe you only need to html_entity_decode your $text before the replacement (a function call written inside the replacement string is not executed):
$text = preg_replace("/\B#(\w+)/", '<a href="' . $url . '/$1">#$1</a>', html_entity_decode($text, ENT_QUOTES));
Thanks all for your suggestions. I finally sorted this out with:
html_entity_decode($str, ENT_QUOTES);
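Putting the accepted fix together with the original function, a rough sketch could look like the following. It is a stand-alone function rather than the asker's method, and $baseUrl is a hypothetical stand-in for the elided "$url" logic.

```php
<?php
// Hypothetical stand-alone version of linkify(); $baseUrl replaces the elided $url.
function linkify(string $text, string $baseUrl): string
{
    // Decode entities such as &#39; first, so the "#39" inside the entity
    // can no longer be mistaken for a hashtag by the regex below.
    $decoded = html_entity_decode($text, ENT_QUOTES, 'UTF-8');

    // \B# only matches a "#" that does not directly follow a word character.
    return preg_replace('/\B#(\w+)/', '<a href="' . $baseUrl . '/$1">#$1</a>', $decoded);
}

$linked = linkify('What&#39;s your #name ?', 'https://example.com/tag');
echo $linked; // What's your <a href="https://example.com/tag/name">#name</a> ?
```

As one of the answers points out, the entities are there for a reason when the text comes from user input, so decoding like this is only reasonable for trusted strings.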

Remove javascript links

I'm looking for a regex that can replace all links like <a href="javascript:...">Link</a> with a warning. I've been having a play but no success so far! I've always been bad with regex; can someone point me in the right direction? I have this so far:
Edit: To the people saying don't use regex: the HTML will be the output of a markdown parser, with all HTML tags in the markdown stripped. Therefore I know that all links in the output will be formatted as stated above, so regex should be a good tool in this particular situation. I am not allowing users to enter pure HTML. SO has done something very similar: try creating a javascript link, and it will be removed.
<?php
//Javascript link filter test
if(isset($_POST['jsfilter'])){
$html = " JS Link ";
$pattern = "/ href\\s*?=\\s*?[\"']\\s*?(javascript)\\s*?(:).*?([\"']) /is";
$replacement = "\"javascript: alert('Javascript links have been blocked');\"";
$html = preg_replace($pattern, $replacement, $html);
echo $html;
}
?>
<form method="post">
<input type="text" name="jsfilter" />
<button type="submit">Submit</button>
</form>
The right regex should be:
$pattern = '/href="javascript:[^"]+"/';
$replacement = 'href="javascript:alert(\'Javascript links have been blocked\')"';
Use strip_tags() and htmlspecialchars() to display user-generated content. If you want to let users use specific tags, refer to BBCode.
You should test both single and double quotes, handle whitespace, etc.
$html = preg_replace( '/href\s*=\s*"javascript:[^"]+"/i' , 'href="#"' , $html );
$html = preg_replace( '/href\s*=\s*\'javascript:[^\']+\'/i' , 'href=\'#\'' , $html );
Try this code. I think this will help.
<?php
//Javascript link filter test
if(isset($_POST['jsfilter'])){
$html = " JS Link ";
$pattern = '/a href="javascript:(.*?)"/i';
$replacement = 'a href="javascript: alert(\'Javascript links have been blocked\');"';
$html = preg_replace($pattern, $replacement, $html);
echo $html;
}
?>
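The two-pattern answer above can also be folded into one pattern with a backreference, so the closing quote must be the same kind as the opening one. This is only a sketch: the sample input is my own (the original test string was not preserved), and it does not cover unquoted attributes or an obfuscated "javascript:" scheme.

```php
<?php
// Sample input is an assumption; the original test string was not preserved.
$html = '<a href="javascript:alert(1)">JS Link</a> '
      . '<a href=\'javascript:void(0)\'>Other</a> '
      . '<a href="/safe">OK</a>';

// (["\']) captures the opening quote; \1 requires the same quote to close the value.
$pattern = '~href\s*=\s*(["\'])\s*javascript:.*?\1~is';

$clean = preg_replace($pattern, 'href="#"', $html);
echo $clean;
// <a href="#">JS Link</a> <a href="#">Other</a> <a href="/safe">OK</a>
```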

Removal of bad hyperlinks and the content inside of them

Ok, basically I have an array of bad urls and I would like to search through a string and strip them out. I want to strip everything from the opening tag to the closing tag, but only if the url in the hyperlink is in the array of bad urls. Here is how I would picture it working but I don't understand regular expressions well.
foreach($bad_urls as $bad_url){
$pattern = "/<a*$bad_url*</a>/";
$replacement = ' ';
preg_replace($pattern, $replacement, $content);
}
Thanks in advance.
Assuming that your 'bad urls' are properly formatted URLs, I would suggest doing something like this:
foreach($bad_urls as $bad_url){
// [^>]* keeps the match inside one opening tag; .*? stops at the first closing </a>
$pattern = '/<a\s[^>]*href="' . convert_to_pattern($bad_url) . '"[^>]*>.*?<\/a>/is';
$replacement = ' ';
$content = preg_replace($pattern, $replacement, $content);
}
and separately
function convert_to_pattern($url)
{
// preg_quote() escapes every regex metacharacter, plus the "/" delimiter
return preg_quote($url, '/');
}
Please do not try to parse HTML using regular expressions. Just load up the HTML in a DOM, find all the <a> tags and check the href property. Much simpler and fool-proof.
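A minimal sketch of that DOM approach, using PHP's bundled DOMDocument, might look like this. The function name, the wrapping <div>, and the exact-match comparison against the bad URL list are my assumptions, not something stated in the answer.

```php
<?php
// Hypothetical helper: removes every <a> element whose href is in $badUrls.
function remove_bad_links(string $content, array $badUrls): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // keep warnings about sloppy HTML quiet
    // Wrap the fragment in one root element so it can be read back out afterwards.
    $doc->loadHTML('<div>' . $content . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();

    // Copy the live node list first, because removing nodes while iterating it skips items.
    foreach (iterator_to_array($doc->getElementsByTagName('a')) as $a) {
        if (in_array($a->getAttribute('href'), $badUrls, true)) {
            $a->parentNode->removeChild($a); // drops the tag and everything inside it
        }
    }

    // Serialize only the children of the wrapper <div>.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$content = '<p>Keep <a href="http://good.example/">this</a>, drop <a href="http://bad.example/x">that</a>.</p>';
$cleaned = remove_bad_links($content, array('http://bad.example/x'));
echo $cleaned;
```

Non-ASCII content may need an explicit encoding hint before loading, which this sketch leaves out.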
