Search and Replace with Regex

I am trying to search through text for a specific word and then add an HTML tag around that word. For example, if I had the string "I went to the shop to buy apples and oranges", I would want to add HTML bold tags around apples.
The problem is that the word I search the string with is stored in a text file and can be uppercase, lowercase, etc. When I use preg_replace to do this, I manage to add the tags correctly, but if, for example, I searched for APPLES and the string contained "apples", it would change the formatting from apples to APPLES. I want the format to stay the same.
I have tried using preg_replace, but I can't find a way to keep the same word casing. This is what I have:
foreach ($keywords as $value)
{
    $pattern = "/\b$value\b/i";
    $replacement = "<b>$value</b>";
    $new_string = preg_replace($pattern, $replacement, $string);
}
So again, if $value was APPLES, it would change every case variant of apples in $string to uppercase, because $replacement contains $value, which is "APPLES".
How could I achieve this, keeping the casing the same, without having to do multiple loops with different versions of the casing?
Thanks

Instead of using $value verbatim in the replacement, you can use the literal strings \0 or $0. Just as \n/$n, for some integer n, refers back to the nth capturing group of parentheses, \0/$0 is expanded to the entire match. Thus, you'd have
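$new_string = $string; // start from the original text and accumulate every replacement
foreach ($keywords as $value) {
    $quoted = preg_quote($value, '/'); // escape regex metacharacters in the keyword
    $new_string = preg_replace("/\\b$quoted\\b/i", '<b>$0</b>', $new_string);
}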
Note that '<b>$0</b>' uses single quotes. You can get away with double quotes here, because $0 isn't interpreted as a reference to a variable, but I think this is clearer. In general, you have to be careful with using a $ inside a double-quoted string, as you'll often get a reference to an existing variable unless you escape the $ as \$. Similarly, you should escape the backslash in \b inside the double quotes for the pattern; although it doesn't matter in this specific case, in general backslash is a meaningful character within double quotes.
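A variant of the same idea, sketched as a hypothetical boldKeywords() helper and assuming PHP 7.4+ for the arrow function: all keywords are joined into one alternation, so the text is scanned only once. With one preg_replace call per keyword, a later keyword can match inside tags inserted earlier (a keyword such as b matches the b in <b>); a single pass avoids that.

```php
<?php
// Hypothetical helper: bold every keyword in one pass, keeping the text's own casing.
function boldKeywords(string $text, array $keywords): string
{
    if ($keywords === []) {
        return $text; // an empty alternation would match at every word boundary
    }
    // Escape each keyword so characters such as . or + are matched literally
    $escaped = array_map(fn(string $k): string => preg_quote($k, '/'), $keywords);
    // One alternation means one pass, so inserted <b> tags are never re-scanned
    $pattern = '/\b(?:' . implode('|', $escaped) . ')\b/i';
    // $0 is the matched text itself, so its original casing survives
    return preg_replace($pattern, '<b>$0</b>', $text);
}

echo boldKeywords('I went to the shop to buy Apples and oranges', ['APPLES', 'Oranges']), "\n";
// I went to the shop to buy <b>Apples</b> and <b>oranges</b>
```

The sample sentence and keyword list are made up for illustration; in the question the keywords come from a text file.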

I might have misunderstood your question, but if what you are struggling with is differentiating between upper-case (APPLE) and lower-case (apple) words, the first thing you could do is convert the word to upper case or lower case, and then run the tests to find it and put HTML tags around it. That is just my guess, and maybe I completely misunderstood the question.

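Unrelated to the question, there is another error in the code: $new_string is overwritten on every loop iteration after the first, because each call starts again from the unmodified $string. The final value of $new_string therefore contains only the last replacement.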

Related

Use regex to quote the name in name-value pair of a list of pairs

I am trying to put quotes around the names of name-value pairs separated by commas. I use preg_replace and regex to achieve that. However, my pattern is not working properly.
$str="f1=1,f2='2',f3='a',f4=4,f5='5'";
$newstr=Preg_replace(/'(?.[^=]+)'/,"'$1'",$str);
I expected $newstr to come out like so:
'f1'=1,'f2'='2','f3'='a','f4'=4,'f5'='5'
But it doesn't, and the quotes don't surround the names.
What should the pattern be and how can I use the comma to get all of them correctly?

There are a few issues with your attempt:
PHP does not have a regex-literal syntax as JavaScript does, so starting the regex value with a forward slash is a syntax error. It should be a string, so start with a quote. Maybe you accidentally swapped the slash and the quote at the start and the end.
(?. is not valid. Maybe you intended (?:, but then there is no capture group and $1 is not a valid back reference. To have the capture group, you should not have (?., but just (.
[^=]+ could include substrings like 1,f2. There should be logic to not start matching while still inside a value (whether quoted or not).
I would suggest a regex that matches both parts around the = (key and value), and then in the replacement just reproduces the second part unchanged. This ensures you don't accidentally wrap anything from the value side in quotes:
$newstr = preg_replace("/([^,=]+)=('[^']*'|[^,]*)/","'$1'=$2",$str);
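A quick check of that pattern, wrapped in a hypothetical quoteKeys() helper. The second input is an assumed example whose quoted value contains both a comma and an =; because the '[^']*' alternative consumes the whole quoted value, nothing inside it gets quoted.

```php
<?php
// Hypothetical helper: quote each key and reproduce the value ($2) unchanged.
function quoteKeys(string $str): string
{
    // Group 1: the key (no comma or =). Group 2: a quoted value or anything up to the next comma.
    return preg_replace("/([^,=]+)=('[^']*'|[^,]*)/", "'$1'=$2", $str);
}

echo quoteKeys("f1=1,f2='2',f3='a',f4=4,f5='5'"), "\n";
// 'f1'=1,'f2'='2','f3'='a','f4'=4,'f5'='5'
echo quoteKeys("f1=1,f2='a,b=c',f3=3"), "\n";
// 'f1'=1,'f2'='a,b=c','f3'=3
```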

Basically, match the beginning of the string or a comma (with a positive lookbehind, so it isn't consumed) and then capture everything up to the next =
$reg = "/(?<=^|,)([^=]+)/";
$str = "f1=1,f2='2',f3='a',f4=4,f5='5'";
print_r(preg_replace($reg, "'$1'", $str));
// output:
// 'f1'=1,'f2'='2','f3'='a','f4'=4,'f5'='5'

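This will also work; it's a different approach, but it assumes there is no comma in the values or names except the separators, and that every name is at least two characters long (for a one-character name such as a=1, the first alternative matches and the opening quote is never added).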
$newstr = preg_replace("/(.)(?==)|(?<=,|^)(.)/", "$1'$2", $str);
But I believe plain string and array operations will be faster here, as the regex is getting complex and needs many steps to match. Here is the same output using only array functions:
// explode() with a limit of 2 splits only on the first =, so an = inside a value is kept
$newstr = implode(",", array_map(function($element){ return "'". implode("'=", explode("=", $element, 2)); }, explode(",", $str)));
Regex is not always faster than string or array operations, but it can do complex things with very little code.

Insert a string into a string at a specific point using PHP

I'm wondering if there is a way to insert a string into another string at a specific point (in this case, near the end)? For example:
$string1 = "item one, item two, item three, item four";
$string2 = "AND";
//do something fancy here
echo $string1;
OUTPUT:
item one, item two, item three, AND item four
I need some help with the fancy string work part. Basically inserting the word after the last ", " if possible.

You can use preg_replace for this, and I find it to be terser than other string manipulation methods and also more easily adaptable if your use case should change:
$string1 = "item one, item two, item three, item four";
$string2 = "AND";
$pattern = "/,(?!.*,)/";
$string1 = preg_replace($pattern, ", $string2", $string1);
echo $string1;
Here you pass preg_replace a regex pattern, the replacement string, and the original string. Instead of modifying the original string, preg_replace returns a new string, so you set $string1 equal to the output of preg_replace.
The pattern: You can use any delimiter to mark the beginning and end of the expression. Typically I see / used*, so the expression will be "/pattern/", where the pattern consists of the comma and a negative lookahead (?!.*,), which finds the comma that isn't followed by another comma. It isn't necessary to declare $pattern explicitly. You can use the expression directly in the preg_replace arguments, but sometimes it is a little easier (especially for complex patterns) to separate the pattern declaration from its use.
The replacement: preg_replace replaces the entire match, so you need to prefix your replacement text with the comma (the one being replaced) and a space. Since variables inside double-quoted strings are interpolated, you can put $string2 inside the quotes**.
The target: you just put your original string here.
* I prefer to use ~ as my delimiter, since / starts to get cumbersome when you deal with urls, but you can use anything.
Here is a cheat sheet for regex patterns, but there are plenty of these floating around. Just google regex cheat sheet if you need one.
https://www.rexegg.com/regex-quickstart.html
Also, you can find plenty of online regex testers. Here is one that includes a quick reference and also lets you switch regex engines (there are a few, and some can be just a little bit different than others):
https://regex101.com/
** I prefer to also wrap the variable in curly braces to make it more explicit that I am inserting the value, but it's optional. That would look like ", {$string2}"
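The same preg_replace call with the ~ delimiter and the curly-brace interpolation from the footnotes, wrapped in a hypothetical insertAfterLastComma() helper. If the string contains no comma, the pattern doesn't match and the input comes back unchanged.

```php
<?php
// Hypothetical helper: insert $word after the last comma in $list.
function insertAfterLastComma(string $list, string $word): string
{
    // ,(?!.*,) matches a comma that is not followed by any other comma
    return preg_replace('~,(?!.*,)~', ", {$word}", $list);
}

echo insertAfterLastComma('item one, item two, item three, item four', 'AND'), "\n";
// item one, item two, item three, AND item four
```

Because . does not match a newline by default, a list spread over several lines would need the s modifier (~,(?!.*,)~s) for the lookahead to see commas on later lines.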

Lots of ways to do this - but since you specifically stated "after the last ," then strrpos seemed appropriate:
// insert this line where you indicate 'do something fancy here'
$string1 = substr_replace($string1, " ".$string2, strrpos($string1,",")+1, 0);
Find the right-most comma and insert the $string2 (with a space prepended) one position after it. The last parameter indicates the length of the substring to replace so a 0 means "do not replace anything but only insert."
Note the extra space added to $string2. Obviously you could modify how $string2 is initialized to include the space and eliminate this part.
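One edge case: strrpos() returns false when there is no comma, and false + 1 evaluates to 1, so the one-liner would then insert the text after the first character. A guarded version, sketched as a hypothetical helper:

```php
<?php
// Hypothetical helper around the strrpos()/substr_replace() approach.
function insertAtLastComma(string $list, string $word): string
{
    $pos = strrpos($list, ',');
    if ($pos === false) {
        // No comma: leave the string alone instead of inserting at offset 1
        return $list;
    }
    // Length 0 means nothing is replaced; " $word" is only inserted after the comma
    return substr_replace($list, ' ' . $word, $pos + 1, 0);
}

echo insertAtLastComma('item one, item two, item three, item four', 'AND'), "\n";
// item one, item two, item three, AND item four
```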

You can use a pattern to match the last comma in the string using .*, and then use \K to forget what is matched so far.
In the replacement, use " AND" (with a leading space), stored here in $string2:
$string1 = "item one, item two, item three, item four";
$string2 = " AND";
$string1 = preg_replace("/.*,\K/", $string2, $string1);
echo $string1;
Output
item one, item two, item three, AND item four

Regex grab all text between brackets, and NOT in quotes

I'm attempting to match all text between {brackets}, however not if it is in quotation marks:
For example:
$str = 'value that I {want}, vs value "I do {NOT} want" ';
my results should snatch "want", but omit "NOT". I've searched stackoverflow desperately for the regex that could perform this operation with no luck. I've seen answers that allow me to get the text between quotes but not outside quotes and in brackets. Is this even possible?
And if so how is it done?
So far this is what I have:
preg_match_all('/{([^}]*)}/', $str, $matches);
But unfortunately it only gets all text inside brackets, including {NOT}

It's quite tricky to get this done in one go. I also wanted to make it compatible with nested brackets, so let's use a recursive pattern:
("|').*?\1(*SKIP)(*FAIL)|\{(?:[^{}]|(?R))*\}
Ok, let's explain this mysterious regex:
("|') # match either a single quote or a double quote and put it in group 1
.*? # match anything ungreedy until ...
\1 # match what was matched in group 1
(*SKIP)(*FAIL) # make it skip this match since it's a quoted set of characters
| # or
\{(?:[^{}]|(?R))*\} # match a pair of brackets (even if they are nested)
Some PHP code:
$input = <<<INP
value that I {want}, vs value "I do {NOT} want".
Let's make it {nested {this {time}}}
And yes, it's even "{bullet-{proof}}" :)
INP;
preg_match_all('~("|\').*?\1(*SKIP)(*FAIL)|\{(?:[^{}]|(?R))*\}~', $input, $m);
print_r($m[0]);
Sample output:
Array
(
[0] => {want}
[1] => {nested {this {time}}}
)
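The question asks for the text inside the brackets (want rather than {want}). A small sketch, under a hypothetical helper name, that runs the same pattern and strips the outer braces from each match:

```php
<?php
// Hypothetical helper: braced text outside quotes, without the outer braces.
function bracedOutsideQuotes(string $input): array
{
    // Quoted runs are skipped via (*SKIP)(*FAIL); braces may be nested via (?R)
    preg_match_all('~("|\').*?\1(*SKIP)(*FAIL)|\{(?:[^{}]|(?R))*\}~', $input, $m);
    // substr($s, 1, -1) drops the leading "{" and the trailing "}"
    return array_map(fn(string $s): string => substr($s, 1, -1), $m[0]);
}

print_r(bracedOutsideQuotes('value that I {want}, vs value "I do {NOT} want" '));
```

Because the pattern treats ' as a quote character, an apostrophe can pair with a later one: in it's {a} and it's, the {a} falls between the two apostrophes and is skipped. If single quotes only ever appear as apostrophes in your text, dropping \' from the first group avoids that.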

Personally, I'd process this in two passes: the first strips out everything between double quotes, and the second pulls out the text you want.
Something like this perhaps:
$str = 'value that I {want}, vs value "I do {NOT} want" ';
// Get rid of everything in between double quotes
$str = preg_replace("/\".*\"/U","",$str);
// Now I can safely grab any text between curly brackets
preg_match_all("/\{(.*)\}/U",$str,$matches);
Working example here: http://3v4l.org/SRnva

Regex extra spaces in string not in double or single quotes - PHP

I would like to replace extra spaces (instances of consecutive whitespace characters) with one space, as long as those extra spaces are not in double or single quotes (or any other enclosures I may want to include).
I saw some similar questions, but I could not find a direct response to my needs above. Thank you!

Hope you're still looking, or come back to check! This seems to work for me:
'/\s+((["\']).*?(?=\2)\2)|\s\s+/'
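...and replace with " $1" (a single space followed by $1), so each whitespace run outside quotes becomes one space and a quoted string keeps one space in front of it.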
EDIT
Also, if you need to allow for escaped quotes like \" or \', you could use this expression:
'/\s+((["\'])(\\\\\2|(?!\2).)*?(?=\2)\2)|\s\s+/'
It gets a bit stickier if you want to add support for "balanced" quotes like brackets (e.g. () or {})
END EDIT
Let me know if you find problems or would like some explanation!
HOPEFULLY FINAL EDIT AND WARNINGS
Potential problem: if a quoted string starts at the very beginning of the string variable (or file), it will either not count as a quoted string (and have its whitespace reduced) or it will throw off the whole thing, making anything NOT in quotes get treated as though it were in quotes and vice versa.
A potential change that might remedy this is to use the following match expression
/(?:^|\s+)((["\'])(\\\\\2|(?!\2).)*?(?=\2)\2)|\s\s+/
this replaces \s+ with (?:^|\s+) at the beginning of the expression
this will add a space at the beginning of the variable if the string starts with a quote - just trim() or remove that whitespace to continue
I seem to have used the "line by line" approach (like sed, if I'm not mistaken) to reach my original results. If you use the "whole file" or "whole string" approach instead, a carriage-return/line-feed pair counts as two whitespace characters (\r and \n are each matched by \s), so \s\s+ turns every such newline into a single space (unless it is inside quotes and "dot-matches-newline" is used, of course).
this could be resolved by replacing the . and \s shorthand character classes with the specific characters you want to match, like the following:
/(?:^|[ \t]+)((["\'])(\\\\\2|(?!\2)[\s\S])*?(?=\2)\2)|[ \t]{2,}/
this does not require the dot-matches-newline switch and only replaces multiple spaces or tabs - not newlines - with a single space (and of course, only if they are not quoted)
EXAMPLE
This link shows an example of the first expression and last expression in use on sample text on http://codepad.viper-7.com
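A sketch of the last expression in PHP under a hypothetical collapseOutsideQuotes() name, using a single space followed by $1 as the replacement (that leading space is what the trim() note above refers to). Instead of trim(), it drops only the one space that the ^ branch adds when the string starts with a quoted section, so other leading text is untouched.

```php
<?php
// Hypothetical helper: collapse runs of spaces/tabs outside quotes to one space.
function collapseOutsideQuotes(string $str): string
{
    $pattern = '/(?:^|[ \t]+)((["\'])(\\\\\2|(?!\2)[\s\S])*?(?=\2)\2)|[ \t]{2,}/';
    // Quoted sections keep exactly one space in front; other runs of 2+ become one space
    $out = preg_replace($pattern, ' $1', $str);
    // The ^ branch prepends a space when $str itself starts with a closed quoted section
    $startsWithQuote = $str !== '' && ($str[0] === '"' || $str[0] === "'");
    if ($startsWithQuote && $out !== '' && $out[0] === ' ') {
        $out = substr($out, 1);
    }
    return $out;
}

echo collapseOutsideQuotes('a   "b   c"   d'), "\n";
// a "b   c" d
```

As with the original expression, a quoted section is only recognized when it follows whitespace or the start of the string, so in x="a  b" the spaces inside the quotes are still collapsed.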

You could do it in several steps. Consider the following example:
$str = 'This is a string with "Bunch of extra spaces". Leave them "untouched !".';
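$id = 0;
$buffer = array();
// 1) Swap each double-quoted section for a placeholder (__0, __1, ...) and remember it
$str = preg_replace_callback('|".*?"|', function ($m) use (&$id, &$buffer) {
    $buffer[] = $m[0];
    return '__' . $id++;
}, $str);
// 2) Collapse each remaining whitespace run to a single space
$str = preg_replace('|\s+|', ' ', $str);
// 3) Put the original quoted sections back
$str = preg_replace_callback('|__(\d+)|', function ($m) use ($buffer) {
    return $buffer[$m[1]];
}, $str);
echo $str;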
This will output the string:
This is a string with "Bunch of extra spaces". Leave them "untouched !".
Admittedly, this is not the prettiest solution.

Regular Expressions find and replace

I am having problems with RegEx in PHP and can't seem to find the answer.
I have a string which is 3 letters, all caps, e.g. COS.
The letters will change, but they will always be 3 characters long and in caps; the string will also be in the middle of another string, surrounded by commas.
I need a regex to find 3 capital letters inside a string and change them from COS to 'COS'
(I'm doing this to amend an SQL insert string).
I can't seem to find the regex unless I use specific letters, but the letters will change.
I need something along the lines of
[A-z]{3} then replace with '[A-Z]' (I know this isn't anywhere near correct, just shorthand)
Does anyone have any suggestions?
Cheers
EDIT:
Just wanted to add in case anyone comes across this question at a later date:
the SQL insert string (provided by an external source and FTP'd to my server daily)
contained the 3-capital-letter string twice, once already in single quotes and once without,
so I also had to remove the doubled quotes produced by the first regex:
$sqlString = preg_replace('/([A-Z]{3})/', "'$1'", $sqlString);
$sqlString = preg_replace('/\'\'([A-Z]{3})\'\'/', "'$1'", $sqlString);
Thanks everyone

You were actually very close. You could use:
echo preg_replace('/([A-Z]{3})/', "'$1'", 'COS'); //will output 'COS'
For MySQL statements I would advise using mysqli_real_escape_string() though (the old mysql_real_escape_string() was removed in PHP 7.0).

$string = preg_replace('/([A-Z]{3})/', "'$1'", $string);
http://php.net/manual/en/function.preg-replace.php

Assuming it's as you said, three capital letters surrounded by commas, e.g.
Foo bar,COS,Foo Bar
You can use look-ahead and look-behinds and find the letters:
(?<=,)([A-Z]{3})(?=,)
Then a simple replace to surround with single quotes will be adequate:
'$1'
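Put together in PHP, as a sketch assuming there are no spaces around the commas (quoteCodes() and the SQL-like sample string are hypothetical). Because the commas are only checked by the lookarounds, words like INSERT are not touched, and a value that is already quoted has a ' rather than a comma next to it, so it is not quoted a second time.

```php
<?php
// Hypothetical helper: quote three capital letters only when a comma sits on both sides.
function quoteCodes(string $sql): string
{
    // (?<=,) and (?=,) check for the commas without consuming them
    return preg_replace('/(?<=,)([A-Z]{3})(?=,)/', "'$1'", $sql);
}

echo quoteCodes('Foo bar,COS,Foo Bar'), "\n";
// Foo bar,'COS',Foo Bar
echo quoteCodes("INSERT INTO t VALUES (1,COS,'COS',2)"), "\n";
// INSERT INTO t VALUES (1,'COS','COS',2)
```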

preg_replace('/(^|\b)([A-Z]{3})(\b|$)/', "'\${2}'", $string);
