Regex Group Replace - php

So I'm trying to do a string replace, and something is happening that I wouldn't expect; I'm hoping someone can shed some light on it.
I'm trying to do a regex replace where I replace '| ' if it is present. I'm using a group and the question mark to get it done, but for some reason it's replacing plain spaces as well.
$str = 'x x';
$str = preg_replace('/(| )?/','',$str);
echo $str; // Echoes out 'xx' whereas it should return 'x x'
But when I replace the space with a caret, I get:
$str = 'x^x';
$str = preg_replace('/(|^)?/','',$str);
echo $str; // Echoes out 'x^x' as expected
Is there some special thing with spaces that I'm not remembering? Or should this just work?
I tried the following:
$str = preg_replace('/(|\s)?/','',$str);
$str = preg_replace('/(|[ ])?/','',$str);
Both of them also give the unexpected result. Thoughts?

Oh, didn't know you were waiting for me xD
As per comment, you should escape the pipe with a backslash: \|.
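The | (pipe) is a special character in regex and means 'or', so your first regex matches either nothing or a space. In the second one, an unescaped ^ is not a literal caret but the start-of-string anchor, so it matches either nothing or the start of the string and never consumes the ^ character.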
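A minimal sketch contrasting the two patterns; the input 'x x | y' is made up for illustration. Escaping the pipe turns it into a literal character, and the optional group (?) is dropped here because it only adds empty matches:

```php
<?php
// Unescaped: "(| )?" means "empty string OR a space", so every space is removed.
$broken = preg_replace('/(| )?/', '', 'x x');

// Escaped: "\|" is a literal pipe, so only the two-character sequence "| " is removed.
$fixed = preg_replace('/\| /', '', 'x x | y');

echo $broken, "\n"; // xx
echo $fixed, "\n";  // x x y
```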

Related

Find and replace string with condition in php

I am a PHP newbie. I want to replace certain characters in a string. My code is below:
$str="this 'is' a new 'string and i wanna' replace \"in\" \"it here\"";
$find = [
'\'',
'"'
];
$replace = [
['^', '*'],
['#', '#']
];
$result = null;
$odd = true;
for ($i=0; $i < strlen($str); $i++) {
if (in_array($str[$i], $find)) {
$key = array_search($str[$i], $find);
$result .= $odd ? $replace[$key][0] : $replace[$key][1];
$odd = !$odd;
} else {
$result .= $str[$i];
}
}
echo $result;
The output of the above code is:
this ^is* a new ^string and i wanna* replace #in# #it here#.
But I want the output to be:
this ^is* a new 'string and i wanna' replace #in# "it here".
That means the characters should be replaced for both kinds of quotation marks (left and right; the condition applies to ' and "). A string should not be replaced if it has only a left or only a right quotation mark; it should be replaced only when it has both.
OK, I don't know what all that code is trying to accomplish.
But anyway, here is my go at it:
$str = "this 'is' a new 'string and i wanna' replace \"in\" \"it here\"";
$str = preg_replace(["/'([^']+)'/",'/"([^"]+)"/'], ["^$1*", "#$1#"], $str, 1);
print_r($str);
You can test it here
Output
this ^is* a new 'string and i wanna' replace #in# "it here"
Using preg_replace and a fairly simple regular expression, we can replace the quotes. The trick here is the fourth parameter of preg_replace, $limit, which is defined as:
limit: The maximum possible replacements for each pattern in each subject string. Defaults to -1 (no limit).
Therefore, setting it to 1 limits each pattern to its first match only. Because we pass an array of patterns, each pattern is applied as a separate operation, so each one is allowed $limit replacements, or one match/replacement each in this case. (The fifth parameter, $count, is the one that gets filled with the number of replacements done.)
Now, whether or not this fits every use case you have I cannot say, but it's the most straightforward way to do it for the example you provided.
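To make the $limit / $count distinction concrete, here is a small sketch using the question's string: the fourth argument caps replacements per pattern, while the fifth (passed by reference) reports how many replacements were actually made. Variable names such as $limited and $countAll are just for illustration.

```php
<?php
$str = "this 'is' a new 'string and i wanna' replace \"in\" \"it here\"";
$patterns = ["/'([^']+)'/", '/"([^"]+)"/'];
$replacements = ['^$1*', '#$1#'];

// 4th argument: $limit, applied to EACH pattern separately.
// 5th argument: $count, filled in (by reference) with the total number of replacements.
$limited = preg_replace($patterns, $replacements, $str, 1, $count);
echo $limited, " ($count replacements)\n";

// Default $limit is -1 (no limit): every quoted run is replaced.
$unlimited = preg_replace($patterns, $replacements, $str, -1, $countAll);
echo $unlimited, " ($countAll replacements)\n";
```

With a limit of 1 there are 2 replacements (one per pattern); with the default of -1 all 4 quoted runs are replaced.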
As for the match itself, /'([^']+)'/:
/ opening and closing "delimiters" for the expression (they are required, although they don't have to be /)
' literal match, matches ' one time (the opening quote)
( ... ) capture group (group 1), so we can use it in the replacement as $1
[^']+ negated character set ([^ means "not"): match anything that is not a ', one or more times, greedy
' literal match, matches ' one time (the closing quote)
The replacement "^$1*"
^ literal, adds this char in
$1 use the contents of the capture group (group1)
* literal, adds the char in
Hope that helps understand how it works.
UPDATE
OK, I think I finally deciphered what you want:
the string will be replaced if a word has both a left and a right quotation mark, e.g. 'word' here will be changed, but 'word... (no closing quote) or word' (no opening quote) will not be changed.
It sounds like you mean only "whole" words, with no spaces.
So in that case we have to adjust our regular expression like this:
$str = preg_replace(["/'([-\w]+)'/",'/"([-\w]+)"/'], ["^$1*", "#$1#"], $str);
So we removed the $limit and changed what is in the character set to be more strict:
[-\w]+ \w is the word-character class, in other words a-zA-Z0-9_, and the - is a literal (it should go first in the set so it is not read as a range)
What this says is: match only strings that start and end with a quote (single or double), and only if the text between them consists of word characters plus the hyphen. This does not include the space. For your original example this produces the same result, but if you were to flip it to
//[ORIGINAL] this 'is' a new 'string and i wanna' replace \"in\" \"it here\"
this a new 'string and i wanna' replace 'is' \"it here\" \"in\"
You would get this output:
this a new 'string and i wanna' replace ^is* "it here" #in#
Before this change you would have gotten
this a new ^string and i wanna* replace 'is' #it here# "in"
In other words it would have only replaced the first occurrence, now it will replace anything between the quotes if and only if it's a whole word.
As a final note, you can be even stricter if you only want letters, by changing the character set to [a-zA-Z]+; then it will match only a to z, upper or lower case. The [-\w]+ version above, by contrast, also matches the digits 0 to 9, the - hyphen and the _ underscore, in addition to the letters.
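A quick sketch comparing the two character sets; the flipped string is the one from the example above, and the short sample string is made up to show where the two sets differ:

```php
<?php
$flipped = "this a new 'string and i wanna' replace 'is' \"it here\" \"in\"";

// Whole words only: [-\w]+ allows letters, digits, underscore and hyphen, but no spaces.
$wordOnly = preg_replace(["/'([-\w]+)'/", '/"([-\w]+)"/'], ['^$1*', '#$1#'], $flipped);
echo $wordOnly, "\n";

// Stricter variant: [a-zA-Z]+ allows ASCII letters only.
$sample = "'abc' 'v2' \"x-y\"";
$alphaOnly = preg_replace(["/'([a-zA-Z]+)'/", '/"([a-zA-Z]+)"/'], ['^$1*', '#$1#'], $sample);
$wordSample = preg_replace(["/'([-\w]+)'/", '/"([-\w]+)"/'], ['^$1*', '#$1#'], $sample);
echo $alphaOnly, "\n";  // ^abc* 'v2' "x-y"
echo $wordSample, "\n"; // ^abc* ^v2* #x-y#
```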
Hope that is what you need.

How can I remove the numeric suffix in PHP?

For example, if I want to get rid of the repeating numeric suffix from the end of an expression like this:
some_text_here_1
Or like this:
some_text_here_1_5
and I want to end up with something like this:
some_text_here
What's the best and most flexible solution?
$newString = preg_replace("/_?\d+$/","",$oldString);
It is using regex to match an optional underscore (_?) followed by one or more digits (\d+), but only if they are the last characters in the string ($) and replacing them with the empty string.
To remove any number of trailing _number groups, wrap the whole regex (except the $) in a group and put a + after it:
$newString = preg_replace("/(_?\d+)+$/","",$oldString);
If you only want to remove a numeric suffix when it comes after an underscore (e.g. you want some_text_here14 to stay unchanged, but some_text_here_14 to be changed), then it should be:
$newString = preg_replace("/(_\d+)+$/","",$oldString);
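A small sketch running the three patterns above side by side on the question's inputs (plus some_text_here14 to show the underscore-only variant); the $results array layout is just for illustration:

```php
<?php
$inputs = ['some_text_here_1', 'some_text_here_1_5', 'some_text_here14'];
$results = [];
foreach ($inputs as $s) {
    $results[$s] = [
        'single'   => preg_replace('/_?\d+$/', '', $s),    // one trailing number, "_" optional
        'repeated' => preg_replace('/(_?\d+)+$/', '', $s), // any number of trailing numbers
        'strict'   => preg_replace('/(_\d+)+$/', '', $s),  // only numbers that follow "_"
    ];
}
print_r($results);
```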
Updated to fix more than one suffix
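strrpos() is far better suited than regex to such a simple string problem.
$str = "some_text_here_13_15";
// Keep cutting off the last "_..." segment while it is numeric
while (($pos = strrpos($str, "_")) !== false && is_numeric(substr($str, $pos + 1))) {
    $str = substr($str, 0, $pos);
}
echo $str; // some_text_here
strrpos() finds the last "_" in $str; if everything after it is numeric, that part is removed, and the loop repeats until the last segment is not numeric or there is no "_" left.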
https://3v4l.org/OTdb9
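The same loop wrapped in a function, as a sketch: the name stripNumericSuffixes() is hypothetical, and it swaps is_numeric() for ctype_digit(), since is_numeric() also accepts strings such as "1.5" or "1e3" (so version_1.5 would lose its suffix). It also stops when there is no "_" left:

```php
<?php
// Hypothetical helper wrapping the strrpos() loop; ctype_digit() accepts only 0-9.
function stripNumericSuffixes(string $str): string
{
    // Repeat while there is an "_" and everything after the last one is digits.
    while (($pos = strrpos($str, '_')) !== false && ctype_digit(substr($str, $pos + 1))) {
        $str = substr($str, 0, $pos);
    }
    return $str;
}

foreach (['some_text_here_13_15', 'abc_def_', 'abc_123_def', 'version_1.5'] as $s) {
    echo $s, ' => ', stripNumericSuffixes($s), "\n";
}
```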
To give you an idea of why I say regex is not a good solution here, here is the performance:
Regex:
https://3v4l.org/Tu8o2/perf#output
0.027 seconds for 100 runs.
My code with added numeric check:
https://3v4l.org/dkAqA/perf#output
0.003 seconds for 100 runs.
Oddly enough, this new code performs even better than before; in these runs the regex version took about nine times as long (0.027 s vs 0.003 s).
You be the judge of what is best.
First, you'll want to do a preg_replace() with the regex /\d+/ to remove all digits (note that this removes digits anywhere in the string, not only in the suffix). Then trim any underscores from the right using rtrim(), passing _ as the second parameter.
I've combined the two in the following example:
$string = "some_text_here_1";
echo rtrim(preg_replace('/\d+/', '', $string), '_'); // some_text_here
I've also created an example of this at 3v4l here.
Hope this helps! :)
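One caveat worth checking with a sketch (the input abc2def_1 is made up): because /\d+/ is not anchored, digits in the middle of the string are removed too, whereas an end-anchored pattern such as /(_\d+)+$/ only touches the suffix:

```php
<?php
$input = 'abc2def_1';

// The digit-stripping approach removes digits anywhere in the string...
$digitsStripped = rtrim(preg_replace('/\d+/', '', $input), '_');

// ...whereas an end-anchored pattern touches only the trailing "_<digits>" groups.
$suffixOnly = preg_replace('/(_\d+)+$/', '', $input);

echo $digitsStripped, "\n"; // abcdef
echo $suffixOnly, "\n";     // abc2def
```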
$reg = '#_\d+$#';
$replace = '';
echo preg_replace($reg, $replace, $string);
This would do:
abc_def_ghi_123 > abc_def_ghi
abc_def_1 > abc_def
abc_def_ghi > abc_def_ghi
abc_def_ > abc_def_
abc_123_def > abc_123_def
To get abc_def_123_345 > abc_def as well, one could change the line to:
$reg = '#(?:_\d+)+$#';

Regex rules in an array

Maybe this issue can't be solved the way I want, but maybe you guys can help me.
I have a lot of malformed words in the names of my products.
Some of them have a leading ( and a trailing ), or only one of these; the same goes for / and " signs.
What I do is explode the product name by spaces and examine each word.
I want to replace these signs with nothing. But a hard drive could be 40GB ATA 3.5" hard drive. I need to process every word, but I cannot use the same method for 3.5" as for () or //, because this 3.5" is valid.
So I only need to remove the quotes when they are at the start AND at the end of the string.
$cases = [
'(testone)',
'(testtwo',
'testthree)',
'/otherone/',
'/othertwo',
'otherthree/',
'"anotherone',
'anothertwo"',
'"anotherthree"',
];
$patterns = [
'/^\(/',
'/\)$/',
'~^/~',
'~/$~',
// This is where I can't figure out how to add the rule for `"`
];
$result = preg_replace($patterns, '', $cases);
This works well, but can it be done in one preg_replace() call? If so, can somebody help me with the pattern(s) for the quotes?
The result for the quotes should be:
'"anotherone', //no quote at the end, so leave the leading one
'anothertwo"', //no quote at the start, so leave the trailing one
'anotherthree', //there are quotes at the start and end, so remove them
You may use another approach: rather than defining an array of patterns, use one single alternation-based regex:
preg_replace('~^[(/]|[/)]$|^"(.*)"$~s', '$1', $s)
See the regex demo
Details:
^[(/] - a literal ( or / at the start of the string
| - or
[/)]$ - a literal ) or / at the end of the string
| - or
^"(.*)"$ - a " at the start of the string, then any 0+ characters (due to /s option, the . matches a linebreak sequence, too) that are captured into Group 1, and " at the end of the string.
The replacement pattern is $1 that is empty when the first 2 alternatives are matched, and contains Group 1 value if the 3rd alternative is matched.
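A sketch applying that single pattern to the question's $cases array (preg_replace() accepts an array as the subject and returns an array); the extra '3.5"' entry is added to show that a lone trailing quote is left alone:

```php
<?php
$cases = ['(testone)', '(testtwo', 'testthree)', '/otherone/', '/othertwo', 'otherthree/',
          '"anotherone', 'anothertwo"', '"anotherthree"', '3.5"'];

// One alternation instead of an array of patterns; an unmatched group 1 is replaced by ''.
$result = preg_replace('~^[(/]|[/)]$|^"(.*)"$~s', '$1', $cases);
print_r($result);
```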
Note: In case you need to keep replacing until no match is found, use preg_match together with preg_replace (see demo):
$s = '"/some text/"';
$re = '~^[(/]|[/)]$|^"(.*)"$~s';
$tmp = '';
while (preg_match($re, $s) && $tmp != $s) {
$tmp = $s;
$s = preg_replace($re, '$1', $s);
}
echo $s;
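This also works, with the pattern written as a valid string and the quoted alternative tried first (otherwise the first alternative matches every word and the quotes are never handled):
preg_replace('~^"(.+)"$|^[/(]?(.+?)[/)]?$~', '$1$2', $string)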

Make a username bold in text in PHP

$text = 'Hello #demo here!';
$pattern = '/#(.*?)[ ]/';
$replacement = '<strong>${1}</strong> ';
echo preg_replace($pattern, $replacement, $text);
This works; I get HTML like this: Hello <strong>demo</strong> here!. But it does not work when #demo is at the end of the string, for example $text = 'Hello #demo';. How can I change my pattern so it returns the same output whether or not the tag is at the end of the string?
Question 2:
What if the string is like $text = 'Hello #demo!';? I don't want the ! to be bold. The match should stop at a space, the end of the string, or any non-word character.
Sorry for my bad English; I hope you understand what I need.
In order to select a word beginning with the # symbol, this regex will work:
$pattern = "/#(\w+)\b/";
`\w` is a shorthand character class for `[a-zA-Z0-9_]`. `\b` is an anchor for the beginning or end of a word, in this case the end. So the regex says: match a '#' followed by one or more word characters, up to the end of the word.
Reference: http://www.regular-expressions.info/tutorial.
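A sketch of this pattern applied to the three inputs from the question. Because \b does not consume the following character, the replacement no longer needs the trailing space the original pattern required. The arrow function (fn) needs PHP 7.4 or later:

```php
<?php
$pattern = '/#(\w+)\b/';
// No trailing space in the replacement: the pattern does not consume the next character.
$replacement = '<strong>$1</strong>';

$texts = ['Hello #demo here!', 'Hello #demo', 'Hello #demo!'];
$html = array_map(fn (string $t): string => preg_replace($pattern, $replacement, $t), $texts);
print_r($html);
```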
You could use a word boundary, that's what they're for:
$pattern = '/#(.+?)\b/';
This also works for question 2.
You can add an option to match the end of the string:
#(.*?)(?= |\p{P}?$)
Replace with <strong>$1</strong>.
You can also use \p{P} (any Unicode punctuation symbol) to prevent punctuation from bold formatting.
Here is a demo.
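The pattern above is written without delimiters; as a sketch, in PHP it could be used like this. The /u flag is an addition here, so \p{P} is applied to UTF-8 characters rather than single bytes; fn needs PHP 7.4+:

```php
<?php
// Lazy group, stopped by a lookahead for a space, or optional punctuation at the end.
$pattern = '/#(.*?)(?= |\p{P}?$)/u';

$texts = ['Hello #demo here!', 'Hello #demo', 'Hello #demo!'];
$html = array_map(fn (string $t): string => preg_replace($pattern, '<strong>$1</strong>', $t), $texts);
print_r($html);
```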

How to trim special chars from string?

I want to remove all non-alphanumeric characters from the left and right of the string, leaving the ones in the middle of the string.
I've asked a similar question here, and a good solution was:
$str = preg_replace('/^\W*(.*\w)\W*$/', '$1', $str);
But it also removes characters like ąĄćĆęĘ, which it should not, since they are still letters.
The example above would do:
~~AAA~~ => AAA (OK)
~~AA*AA~~ => AA*AA (OK)
~~ŚAAÓ~~ => AA (BAD)
Make sure you use the u flag (UTF-8 mode) in your regex.
Following works with your input:
$str = preg_replace('/^\W*(.*\w)\W*$/u', '$1', '~~ŚAAÓ~~' );
// str = ŚAAÓ
But this won't work (don't use it):
$str = preg_replace('/^\W*(.*\w)\W*$/', '$1', '~~ŚAAÓ~~' );
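A sketch confirming the u-flag version, plus an alternative that is not from either answer: trimNonAlnum() is a hypothetical helper that strips anything that is not a Unicode letter (\p{L}) or number (\p{N}) from both ends only, leaving the middle untouched as the question asks:

```php
<?php
// The answer's pattern with the u flag: in PHP, \w and \W then recognise non-ASCII letters.
$trimmed = preg_replace('/^\W*(.*\w)\W*$/u', '$1', '~~ŚAAÓ~~');

// Hypothetical helper: remove non-letter, non-number characters at the start and end only.
function trimNonAlnum(string $s): string
{
    return preg_replace('/^[^\p{L}\p{N}]+|[^\p{L}\p{N}]+$/u', '', $s);
}

echo $trimmed, "\n"; // ŚAAÓ
foreach (['~~AAA~~', '~~AA*AA~~', '~~ŚAAÓ~~'] as $s) {
    echo $s, ' => ', trimNonAlnum($s), "\n";
}
```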
You can pass in a list of valid characters and tell the function to replace any character that is not in that list:
$str = preg_replace('/[^a-zA-Z0-9*]+/', '', $str);
The square brackets define a character class (a set of characters). A caret (^) at the start of the class negates it, i.e. "anything not in this set". We then list our valid characters (lowercase a to z, uppercase A to Z, numbers from 0 to 9, and an asterisk). The plus sign after the closing square bracket means one or more characters.
Edit:
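If you also want to keep these letters, add them to the class together with the ranges above, and use the u flag so the multibyte letters are treated as single characters rather than bytes:
$str = preg_replace('/[^a-zA-Z0-9ĄąĆćĘęŁłŃńÓóŚśŹźŻż*]+/u', '', $str);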
