php preg_replace between two needles - php

I know this type of question has been asked and answered before, but I can't isolate the error in my pattern match.
Due to some very screwy legacy DB input, I am trying to remove anything between two HTML special chars, and will then move on to process what remains.
The original code used 1<b>2 to bold anything after 1, but it has ended up stored as 1&lt;b&gt;2.
I would like to be left with either 1&lt;&gt;2 or 1 2.
Am I even close?
Thanks,
Art
$str = '1&lt;b&gt;2';
$output = preg_replace('/&#?[a-z0-9]{2,8};(.*?)\/&#?[a-z0-9]{2,8};/is', '',$str);

Looks like you should remove the slash in the middle
/&#?[a-z0-9]{2,8};[^&]+&#?[a-z0-9]{2,8};/is
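A minimal sketch of that corrected pattern in use, assuming the stored value is the entity-encoded string 1&lt;b&gt;2. The capture groups and the variable names $keepEntities and $dropAll are illustrative additions, not part of the answer above.

```php
<?php
// Assumed input: the legacy value with the tag stored as HTML entities.
$str = '1&lt;b&gt;2';

// An entity, then anything that is not "&", then another entity.
// The groups are added here so the entities themselves can be kept.
$pattern = '/(&#?[a-z0-9]{2,8};)[^&]+(&#?[a-z0-9]{2,8};)/i';

// Keep the two entities and drop only what sits between them.
$keepEntities = preg_replace($pattern, '$1$2', $str);

// Replace the whole run (entities included) with a single space.
$dropAll = preg_replace($pattern, ' ', $str);

echo $keepEntities, PHP_EOL; // 1&lt;&gt;2
echo $dropAll, PHP_EOL;      // 1 2
```

Both target outputs from the question come from the same pattern; only the replacement string differs.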

Have you tried this?
$str = strip_tags(html_entity_decode('1&lt;b&gt;2'));
Or, if you want to replace tags with something else, like a space:
$str = html_entity_decode('1&lt;b&gt;2');
$output = preg_replace('/<\/?[^\>]+>/ui', ' ',$str);

Related

Remove '(' and text which follows from a string

Goal: to trim all text starting with a left parenthesis '(' from a string. I've read through Stack for the last hour, php.net, googled, and tried trim, ltrim, rtrim, strpos, preg_replace, etc. Everything I have found so far deals with replacing the text IF it is a known quantity; mine will vary.
Examples:
Text i want to keep (All of this i want to remove) as well as this...
Example 2:
Text 2 keep (text to remove 123)
Example 3:
Keep Please (123remove)
What is the best way to sanitize this string? The text which follows the first paren will be alphanumeric (letters, numbers, possibly even Exclamation points, etc). The only constant is the first paren '(', anything after i want to trim away/remove.
I am of novice level, I am not yet dealing with classes or jQuery, etc. I wish to do this on the server.
Thank you for any help or guidance.
You can use strpos to find the first parenthesis and substr to get the substring up to that position:
$str = 'Test keep (remove) remove';
$pos = strpos($str, '(');
// Default to the whole string so input without a parenthesis is kept as-is.
$newString = $str;
if ($pos !== false) {
    // Keep everything before the first '('.
    $newString = substr($str, 0, $pos);
}
echo $newString;
Output
Test keep
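The same idea can be written with strstr(), whose third argument returns the part before the needle. This is only a sketch: beforeParen is a hypothetical helper name, and trimming the trailing space is an assumption about the desired output.

```php
<?php
// Hypothetical helper: return the part of $str before the first "(".
function beforeParen(string $str): string
{
    // With $before_needle = true, strstr() returns the text before the needle,
    // or false when the needle does not occur.
    $head = strstr($str, '(', true);

    // No parenthesis: keep the whole string. Trim the space left before "(".
    return rtrim($head === false ? $str : $head);
}

echo beforeParen('Text i want to keep (All of this i want to remove) as well as this...'), PHP_EOL;
echo beforeParen('Text 2 keep (text to remove 123)'), PHP_EOL;
echo beforeParen('Keep Please (123remove)'), PHP_EOL;
echo beforeParen('No parenthesis here'), PHP_EOL;
```

Unlike a bare substr() call, this leaves strings without a parenthesis untouched.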
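You were on the right track with preg_replace. The pattern needs delimiters, and . already matches any character up to the end of the line (the s flag extends that across newlines). You could try the following:
preg_replace('/\(.*/s', $replacement, $subject)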
Tested and works
echo preg_replace('#\(.*#i','',$string_tostrip);
$str =" Text i want to keep (All of this i want to remove)";
$s=explode("(",$str);
$concatinated_str = $s[0];
echo $concatinated_str; // Text i want to keep

str_replace matches incorrect part of string

I'm having some issues with str_replace, when trying to automatically put backticks around table- and fieldnames.
Assuming i have the following arrays:
$match = array('rooms.roomID','r_rooms.roomID');
$replace = array('`rooms`.`roomID`','`r_rooms`.`roomID`');
$subject = 'rooms.roomID = r_rooms.roomID';
str_replace($match,$replace,$subject);
The result that I expect is:
`rooms`.`roomID` = `r_rooms`.`roomID`
But instead I'm getting this:
`rooms`.`roomID` = r_`rooms`.`roomID`
However if i change r_rooms to r_ooms, my result is as expected
`rooms`.`roomID` = `r_ooms`.`roomID`
I've tried the same procedure using preg_replace, but this gives me the same output as well.
Quick fix would be reordering $match and $replace arrays like this...
$match = array('r_rooms.roomID', 'rooms.roomID');
$replace = array('`r_rooms`.`roomID`', '`rooms`.`roomID`');
The problem with the original approach is that str_replace processes the $match array element by element, each time scanning the whole string and immediately replacing every occurrence it finds.
Because the string rooms.roomID matches both [rooms.roomID] and r_[rooms.roomID], the first iteration replaces both, and the second iteration (for r_rooms.roomID) finds nothing left to replace.
As I said, that's only a quick fix. In this case I'd try to use preg_replace instead, surrounding the actual search with \b (word boundary anchors).
Then again, with all due respect I smell XY problem here. Aren't you trying to make your own routine for quoteIdentifier? That was already solved (and asked here a lot of times).
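A rough sketch of the word-boundary suggestion, reusing the arrays from the question. The generic pattern at the end is an extra assumption (that every dotted word pair in the subject is a table.field identifier) and is not something the answer proposes.

```php
<?php
$subject = 'rooms.roomID = r_rooms.roomID';

// Same pairs as in the question, but anchored with \b. Because "_" is a word
// character, there is no boundary between "r_" and "rooms", so the first
// pattern can no longer match inside "r_rooms.roomID".
$patterns     = ['/\brooms\.roomID\b/', '/\br_rooms\.roomID\b/'];
$replacements = ['`rooms`.`roomID`', '`r_rooms`.`roomID`'];
$quoted = preg_replace($patterns, $replacements, $subject);

// Generic variant (assumption: any word.word pair is an identifier).
$generic = preg_replace('/\b(\w+)\.(\w+)\b/', '`$1`.`$2`', $subject);

echo $quoted, PHP_EOL;
echo $generic, PHP_EOL;
```

With the boundaries in place, the order of the array entries no longer matters.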
The behavior is correct. The first search value, rooms.roomID, is replaced with `rooms`.`roomID` twice, because it also occurs inside r_rooms.roomID.
Change the order of the $match and $replace arrays to get the expected result:
$match = array('r_rooms.roomID','rooms.roomID');
$replace = array('`r_rooms`.`roomID`','`rooms`.`roomID`');
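If reordering feels fragile, strtr() with an array of pairs is another option: it tries the longest keys first at each position and never rescans text it has already replaced. This is a sketch of an alternative, not what either answer uses, and $pairs is an illustrative name.

```php
<?php
$subject = 'rooms.roomID = r_rooms.roomID';

// Key order does not matter here: strtr() prefers the longest matching key
// at each position and does not re-examine replaced text.
$pairs = [
    'rooms.roomID'   => '`rooms`.`roomID`',
    'r_rooms.roomID' => '`r_rooms`.`roomID`',
];

$quoted = strtr($subject, $pairs);

echo $quoted, PHP_EOL;
```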

Converting links occurring inside a string

I am attempting to change a URL occurring in a string, e.g. http://www.bbc.co.uk/, so that it appears inside an HTML link, e.g. <a href="http://www.bbc.co.uk/">http://www.bbc.co.uk/</a>.
However, for some reason my regex conversion does not work. Can someone please point me in the right direction?
$text = "I love this website http://www.bbc.co.uk/";
$x = preg_replace("#[a-z]+://[^<>\s]+[[a-z0-9]/]#i", "<a href=\"\\0\">\\0</a>", $text);
var_dump($x);
outputs I love this website http://www.bbc.co.uk/ (No html link)
Your weird character class is at fault:
[[a-z0-9]/]
Double square brackets are for POSIX character classes like [[:digit:]].
You meant to write just:
[a-z0-9/]
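A short sketch of the question's code with only the character class corrected. The anchor markup in the replacement is an assumption about what the asker intended.

```php
<?php
$text = 'I love this website http://www.bbc.co.uk/';

// Same pattern as in the question, with the final class written as [a-z0-9/].
$pattern = '#[a-z]+://[^<>\s]+[a-z0-9/]#i';

// $0 is the whole match; the <a> markup is assumed for illustration.
$x = preg_replace($pattern, '<a href="$0">$0</a>', $text);

echo $x, PHP_EOL;
```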
It is because your regex is not giving you a match: the trailing [[a-z0-9]/] is read as the class [[a-z0-9] followed by a literal /], so the URL would have to end in /]. Try something like this:
$pattern = '#https?://.*\b#i';
$replace = '<a href="$0">$0</a>';
$x = preg_replace($pattern, $replace, $text);
Note that I am not actually trying to validate the URL format here, so I just accept anything starting with http(s):// up to the last word boundary on the line (the .* is greedy). It didn't seem as if you were going for a true URL validation regex anyway (i.e. validating that there is at least one ., that the TLD component has 2-6 characters, etc.), so I figured I would give you the simplest pattern that would match.
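One thing to watch with that pattern is that .* keeps going past the URL when more text follows it. The sketch below compares it with a variant that uses \S* instead. The sample sentence and the \S* variant are assumptions added for illustration.

```php
<?php
$text = 'See http://www.bbc.co.uk/ for news.';

// Greedy .* runs to the end of the line, then backs up to the last word boundary.
preg_match('#https?://.*\b#i', $text, $greedy);

// \S* stops at the first whitespace, then backs up to a word boundary.
preg_match('#https?://\S*\b#i', $text, $bounded);

echo $greedy[0], PHP_EOL;  // http://www.bbc.co.uk/ for news
echo $bounded[0], PHP_EOL; // http://www.bbc.co.uk
```

The second form drops the trailing slash because the last word boundary inside the URL sits just before it.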
Use this:
$x = preg_replace('#http://[?=&a-z0-9._/-]+#i', '<a target="_blank" href="$0">$0</a>', $text);

PHP Regex to remove everything after a character

So I've seen a couple articles that go a little too deep, so I'm not sure what to remove from the regex statements they make.
I've basically got this
foo:bar all the way to anotherfoo:bar;seg98y34g.?sdebvw h segvu (anything goes really)
I need a PHP regex to remove EVERYTHING after the colon. The first part can be any length (but it never contains a colon), so in both cases above I'd end up with
foo and anotherfoo
after doing something like this horrendous example of pseudo-code
$string = 'foo:bar';
$newstring = regex_to_remove_everything_after_":"($string);
EDIT
after posting this, would an explode() work reliably enough? Something like
$pieces = explode(':', 'foo:bar');
$newstring = $pieces[0];
explode would do what you're asking for, but you can make it one step by using current.
$beforeColon = current(explode(':', $string));
I would not use a regex here (that involves some work behind the scenes for a relatively simple action), nor would I use strpos with substr (as that would, effectively, be traversing the string twice). Most importantly, this provides the person who reads the code with an immediate, "Ah, yes, that is what the author is trying to do!" instead of, "Wait, what is happening again?"
The only exception to that is if you happen to know that the string is excessively long: I would not explode a 1 Gb file. Instead:
$beforeColon = substr($string, 0, strpos($string,':'));
I also feel substr isn't quite as easy to read: in current(explode you can see the delimiter immediately with no extra function calls, and the variable appears only once (which makes it less prone to human error). Basically I read current(explode as "I am taking the first piece of whatever comes before this delimiter", as opposed to substr, which is "I am getting a substring starting at position 0 and continuing until this string."
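A middle ground between the two, sketched here as an assumption rather than as something the answer suggests, is to pass a limit to explode() so it stops splitting after the first colon. beforeColon is a hypothetical helper name.

```php
<?php
// Hypothetical helper: everything before the first ":".
function beforeColon(string $string): string
{
    // A limit of 2 makes explode() split at most once, so a very long
    // remainder is not broken into many pieces.
    return explode(':', $string, 2)[0];
}

echo beforeColon('foo:bar'), PHP_EOL;                                  // foo
echo beforeColon('anotherfoo:bar;seg98y34g.?sdebvw h segvu'), PHP_EOL; // anotherfoo
echo beforeColon('nocolon'), PHP_EOL;                                  // nocolon
```

When there is no colon, the whole string comes back unchanged, whereas the substr/strpos version would return an empty string.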
Your explode solution does the trick. If you really want to use regexes for some reason, you could simply do this:
$newstring = preg_replace("/(.*?):(.*)/", "$1", $string);
A bit more succinct than other examples:
current(explode(':', $string));
You can use the regex that m.buettner wrote, but his example returns everything BEFORE ':'. If you want everything after ':', just use $2 instead of $1:
$newstring = preg_replace("/(.*?):(.*)/", "$2", $string);
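To make the difference between the two groups concrete, here is a small sketch with an input that contains more than one colon. The sample strings are assumptions.

```php
<?php
$pattern = '/(.*?):(.*)/';

// The lazy first group stops at the first colon; the second takes the rest.
$before = preg_replace($pattern, '$1', 'a:b:c');
$after  = preg_replace($pattern, '$2', 'a:b:c');

// Without a colon the pattern does not match, so the input is returned as-is.
$unchanged = preg_replace($pattern, '$1', 'nocolon');

echo $before, PHP_EOL;    // a
echo $after, PHP_EOL;     // b:c
echo $unchanged, PHP_EOL; // nocolon
```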
You could use something like the following. demo: http://codepad.org/bUXKN4el
<?php
$s = 'anotherfoo:bar;seg98y34g.?sdebvw h segvu';
// array_shift() takes its argument by reference, so store the array first.
$parts = explode(':', $s);
$result = array_shift($parts);
echo $result;
?>
Why do you want to use a regex?
list($beforeColon) = explode(':', $string);

PHP URL to Link with Regex

I know I've seen this done in a lot of places, but I need something a little different from the norm. Sadly, when I search for this anywhere it gets buried in posts about simply turning the link into an HTML link tag. I want the PHP function to strip out the "http://" and "https://" from the link, as well as anything after the .*, so basically what I am looking for is to turn A into B.
A: http://www.youtube.com/watch?v=spsnQWtsUFM
B: www.youtube.com
If it helps, here is my current PHP regex replace function.
ereg_replace("[[:alpha:]]+://[^<>[:space:]]+[[:alnum:]/]", "\\0", htmlspecialchars($body, ENT_QUOTES)));
It would probably also be helpful to say that I have absolutely no understanding in regular expressions. Thanks!
EDIT: When I entered a comment like blahblah https://www.facebook.com/?sk=ff&ap=1 blah, I get HTML like this: <a class="bwl" href="blahblah https://www.facebook.com/?sk=ff&ap=1 blah">www.facebook.com</a>, which doesn't work at all, as it takes the text around the link with it. It works great if someone comments only a link, however. This is after I changed the function to this:
preg_replace("#^(.*)//(.*)/(.*)$#",'<a class="bwl" href="\0">\2</a>', htmlspecialchars($body, ENT_QUOTES));
This is the simplest and cleanest way:
$str = 'http://www.youtube.com/watch?v=spsnQWtsUFM';
preg_match("#//(.+?)/#", $str, $matches);
$site_url = $matches[1];
EDIT: I assume that the $str had been checked to be a URL in the first place, so I left that out. Also, I assume that all the URLs will contain either 'http://' or 'https://'. In case the url is formatted like this www.youtube.com/watch?v=spsnQWtsUFM or even youtube.com/watch?v=spsnQWtsUFM, the above regexp won't work!
EDIT2: I'm sorry, I didn't realize that you were trying to replace all URLs in a whole text. In that case, this should work the way you want it:
$str = preg_replace('#(\A|[^=\]\'"a-zA-Z0-9])(http[s]?://(.+?)/[^()<>\s]+)#i', '\\1\\3', $str);
I am not a regex whizz either,
^(.*)//(.*)/(.*)$
\2
was what worked for me when I tried to use as find and replace in programmer's notepad.
^(.*)// should extract the protocol - referred to as \1 in the second line.
(.*)/ should extract everything up to the first / - referred to as \2 in the second line.
(.*)$ captures everything till the end of the string. - referred as \3 in the second line.
Added later
^(.*)( )(.*)//(.*)/(.*)( )(.*)$
\1\2\4 \7
This should be a bit better, but will only replace just 1 URL
The \0 is replaced by the entire matched string, whereas \x (where x is a number other than 0 starting at 1) will be replaced by each subpart of your matched string based on what you wrap in parentheses and the order those groups appear. Your solution is as follows:
ereg_replace("[[:alpha:]]+://([^<>[:space:]]+[:alnum:]*)[[:alnum:]/]", "\\1
I haven't been able to test this though so let me know if it works.
I think this should do it (I haven't tested it):
preg_match('/^http[s]?:\/\/(.+?)\/.*/i', $main_url, $matches);
$final_url = ''.$matches[1].'';
I'm surprised no one remembers PHP's parse_url function:
$url = 'http://www.youtube.com/watch?v=spsnQWtsUFM';
echo parse_url($url, PHP_URL_HOST); // displays "www.youtube.com"
I think you know what to do from there.
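One way to go "from there" is to find each URL with preg_replace_callback() and let parse_url() supply the link text. This is a sketch only: linkHosts is a hypothetical helper, the URL pattern is deliberately loose, and the class="bwl" markup is copied from the question.

```php
<?php
// Hypothetical helper (not from the thread): link every http(s) URL in $body,
// showing only the host as the link text.
function linkHosts(string $body): string
{
    // Escape first, as the question does, so the surrounding text is safe HTML.
    $escaped = htmlspecialchars($body, ENT_QUOTES);

    $linked = preg_replace_callback(
        '#https?://[^\s<>]+#i',
        function (array $m): string {
            $host = parse_url($m[0], PHP_URL_HOST);
            // Leave anything parse_url() cannot find a host in untouched.
            if (!is_string($host) || $host === '') {
                return $m[0];
            }
            return '<a class="bwl" href="' . $m[0] . '">' . $host . '</a>';
        },
        $escaped
    );

    // preg_replace_callback() returns null on a regex error; fall back to the escaped text.
    return $linked ?? $escaped;
}

echo linkHosts('blahblah https://www.facebook.com/?sk=ff&ap=1 blah'), PHP_EOL;
```

Because only the matched URL is passed to the callback, the text around the link stays outside the anchor, which is the problem described in the question's EDIT.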
$result = preg_replace('%(http[s]?://)(\S+)%', '\2', $subject);
The code with regex does not work completely.
I made this code. It is much more comprehensive, but it works:
See the result here: http://cht.dk/data/php-scripts/inc_functions_links.php
See the source code here: http://cht.dk/data/php-scripts/inc_functions_links.txt
