How to detect a string with specific pattern from a larger string? - php

I have a long string in which I want to detect a pattern and replace it with some other text. Suppose my text is 'my first name is #[[Rameez]] and second name is #[[Rami]]'. I want to detect #[[Rameez]] and replace it with Rameez, and do the same dynamically for all similar strings.

You could simply do:
preg_replace('/#\[\[(\w+)\]\]/', "$1", $string);
[ and ] need to be escaped because they have special meaning in a regex.
This will replace any string #[[whatever]] by whatever
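As a quick check, the one-liner above can be run against the sample sentence from the question. This is a minimal sketch; the variable names $string and $result are only illustrative.

```php
<?php
// Sample text taken from the question.
$string = 'my first name is #[[Rameez]] and second name is #[[Rami]]';

// (\w+) captures the name between "#[[" and "]]"; $1 writes it back without the markers.
$result = preg_replace('/#\[\[(\w+)\]\]/', '$1', $string);

echo $result, PHP_EOL;
// my first name is Rameez and second name is Rami
```

Note that \w+ only matches letters, digits and underscores, so a name containing spaces or hyphens would be left untouched by this pattern.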

Specific version
// Find Rameez specifically
$re = '/#\[\[(?<name>Rameez)\]\]/i'; // Use the i flag if you want a case-insensitive search
$str = "my first name is #[[Rameez]] and second name is #[[Rami]].\nDid I forget to mention that my name is #[[rameez]]?";
// Replace each match with the captured name; #[[Rami]] is left untouched
echo preg_replace($re, '$1', $str) . '<br/>' . PHP_EOL;
Generic version
Regex
#\[\[(?<name>.+?)\]\]
Description
(?<name> .. ) represents here a named capturing group. See this answer for details.
Sample code
// Find any name enclosed by #[[ and ]].
$re = '/#\[\[(?<name>.+?)\]\]/i'; // Use the i flag if you want a case-insensitive search
$str = "my first name is #[[Rameez]] and second name is #[[Rami]].\nDid I forget to mention that my name is #[[rameez]]?";
// Replace every #[[...]] marker with the name it encloses
echo preg_replace($re, '$1', $str) . '<br/>' . PHP_EOL;
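If the replacement has to be computed per match rather than copied verbatim, the named group can be read inside a callback. The sketch below assumes preg_replace_callback is acceptable here; the strtoupper call is a hypothetical transformation chosen only for illustration.

```php
<?php
$str = 'my first name is #[[Rameez]] and second name is #[[Rami]]';

// The callback receives the match array; the named group is available as $m['name'].
$result = preg_replace_callback(
    '/#\[\[(?<name>.+?)\]\]/',
    function (array $m): string {
        return strtoupper($m['name']); // hypothetical transformation, for illustration only
    },
    $str
);

echo $result, PHP_EOL;
// my first name is RAMEEZ and second name is RAMI
```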

You can create a regex pattern and then use it to match, find and replace text in a given string. Here's an example:
string input = "This is text with far too much " +
"whitespace.";
string pattern = "\\s+";
string replacement = " ";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(input, replacement);
It's C# code, but the same approach applies in most languages. In your case you can substitute the pattern with something like string pattern = @"#\[\[Rameez\]\]"; (the square brackets must be escaped, otherwise they are read as a character class) and then use a different replacement: string replacement = "Rameez";
I hope that makes sense.
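Since the question is about PHP, a rough equivalent of the C# idea might look like the sketch below. It assumes the text to find is a fixed literal, so preg_quote is used to escape the brackets instead of escaping them by hand; the variable names are illustrative.

```php
<?php
$input = 'my first name is #[[Rameez]] and second name is #[[Rami]]';

// preg_quote escapes regex metacharacters such as "[" and "]", so the literal
// text can be used safely as a pattern. "/" is passed because it is the delimiter.
$pattern = '/' . preg_quote('#[[Rameez]]', '/') . '/';

$result = preg_replace($pattern, 'Rameez', $input);

echo $result, PHP_EOL;
// my first name is Rameez and second name is #[[Rami]]
```

For a purely literal search like this one, str_replace would work as well; the regex form only pays off once the pattern becomes variable.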

Related

Regex to replace a string from one place to another in the same record

I have a record separated by | symbol. I need to replace a string from one place to another in the same record:
My input looks like this:
BANG|ADAR|**285815**|MOTOR|GOOD||INDIA|2.4|SOFTWARE|285816_AKS|SAB_PART|**AKS_PN|285816**
I need to replace 285815 with the string after AKS_PN, in this case I need to replace 285815 with 285816.
With the (([^|]*\|){3})(.*) I am able to fetch 285815, need help in fetching string after AKS_PN in the same regular expression.
I am aware of how to replace 285815 with 285816. I am using PHP.
Regex solution
You need to use capturing groups. In general:
(everything_before)(interesting_part_1)(between)(interesting_part_in_the_end)
Afterwards, just put it together as you wish
(everything_before)(interesting_part_in_the_end)(between)
This leaves (interesting_part_1) out of the final string.
In your specific example this might come down to
^((?:[^|]*\|){2})([^|]*)\|(.*?AKS_PN)\|(.*)
which would need to be replaced by
$1$4|$3
See an example on regex101.com (still not sure what to do with 285815 here).
Everything in PHP:
<?php
$string = "BANG|ADAR|285815|MOTOR|GOOD||INDIA|2.4|SOFTWARE|285816_AKS|SAB_PART|AKS_PN|285816";
$regex = '~^((?:[^|]*\|){2})([^|]*)\|(.*?AKS_PN)\|(.*)~';
$string = preg_replace($regex, "$1$4|$3", $string);
echo $string;
# BANG|ADAR|285816|MOTOR|GOOD||INDIA|2.4|SOFTWARE|285816_AKS|SAB_PART|AKS_PN
?>
Non-regex solution
You don't even need a regular expression here (far too complicated), just split, switch and join afterwards:
<?php
$string = "BANG|ADAR|285815|MOTOR|GOOD||INDIA|2.4|SOFTWARE|285816_AKS|SAB_PART|AKS_PN|285816";
$parts = explode("|", $string);
$parts[2] = $parts[count($parts) - 1];
$string = implode("|", $parts);
echo $string;
?>
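The snippet above assumes that the value to copy is always the last field. A slightly more defensive sketch, assuming the value always sits in the field right after AKS_PN and that the target is the third field, could locate the marker first:

```php
<?php
$string = "BANG|ADAR|285815|MOTOR|GOOD||INDIA|2.4|SOFTWARE|285816_AKS|SAB_PART|AKS_PN|285816";
$parts = explode("|", $string);

// Find the AKS_PN marker; strict comparison avoids loose type matches.
$pos = array_search("AKS_PN", $parts, true);

// Only replace when the marker exists and is followed by a value.
if ($pos !== false && isset($parts[$pos + 1])) {
    $parts[2] = $parts[$pos + 1];
}

$result = implode("|", $parts);
echo $result, PHP_EOL;
// BANG|ADAR|285816|MOTOR|GOOD||INDIA|2.4|SOFTWARE|285816_AKS|SAB_PART|AKS_PN|285816
```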

preg replace would ignore non-letter characters when detecting words

I have an array of words and a string and want to add a hashtag to the words in the string that they have a match inside the array. I use this loop to find and replace the words:
foreach($testArray as $tag){
$str = preg_replace("~\b".$tag."~i","#\$0",$str);
}
Problem: let's say I have the words "is" and "isolated" in my array. I will get ##isolated in the output. This means that the word "isolated" is found once for "is" and once for "isolated". The pattern also ignores the fact that "#isolated" no longer starts with "is", because it now starts with "#".
Here is an example, BUT it is only an example: I don't want to solve just this one case, but every other possibility as well:
$str = "this is isolated is an example of this and that";
$testArray = array('is','isolated','somethingElse');
Output will be:
this #is ##isolated #is an example of this and that
You may build a regex with an alternation group enclosed with word boundaries on both ends and replace all the matches in one pass:
$str = "this is isolated is an example of this and that";
$testArray = array('is','isolated','somethingElse');
echo preg_replace('~\b(?:' . implode('|', $testArray) . ')\b~i', '#$0', $str);
// => this #is #isolated #is an example of this and that
See the PHP demo.
The regex will look like
~\b(?:is|isolated|somethingElse)\b~
See its online demo.
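If the words in the array may contain regex metacharacters, each one probably should be escaped before it is joined into the alternation. The following is a minimal sketch of that variant, assuming the same sample data as above:

```php
<?php
$str = "this is isolated is an example of this and that";
$testArray = array('is', 'isolated', 'somethingElse');

// Escape each word so characters such as "." or "+" are matched literally.
// "~" is passed as the second argument because it is the pattern delimiter.
$quoted = array_map(function ($word) {
    return preg_quote($word, '~');
}, $testArray);

// One alternation, one pass: every word is matched as a whole word only.
$result = preg_replace('~\b(?:' . implode('|', $quoted) . ')\b~i', '#$0', $str);

echo $result, PHP_EOL;
// this #is #isolated #is an example of this and that
```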
If you want to make your approach work, you might add a negative lookbehind after \b: "~\b(?<!#)".$tag."~i","#\$0". The lookbehind will fail all matches that are preceded with #. See this PHP demo.
Another way to do that is to split your string into words and to build an associative array from your original array of words (to avoid the use of in_array):
$str = "this is isolated is an example of this and that";
$testArray = array('is','isolated','somethingElse');
$hash = array_flip(array_map('strtolower', $testArray));
$parts = preg_split('~\b~', $str);
for ($i=1; $i<count($parts); $i+=2) {
$low = strtolower($parts[$i]);
if (isset($hash[$low])) $parts[$i-1] .= '#';
}
$result = implode('', $parts);
echo $result;
This way, your string is processed only once, whatever the number of words in your array.
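The same single-pass idea can also be sketched with preg_replace_callback instead of preg_split. This is an alternative formulation, not the code from the answer above; it assumes that a "word" is any run of \w characters.

```php
<?php
$str = "this is isolated is an example of this and that";
$testArray = array('is', 'isolated', 'somethingElse');

// Lowercased words become keys, so each lookup is a cheap isset() call.
$hash = array_flip(array_map('strtolower', $testArray));

// \w+ visits every word exactly once; the callback decides whether to prefix it.
$result = preg_replace_callback('~\w+~', function (array $m) use ($hash) {
    return isset($hash[strtolower($m[0])]) ? '#' . $m[0] : $m[0];
}, $str);

echo $result, PHP_EOL;
// this #is #isolated #is an example of this and that
```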

PHP Array str_replace Whole Word

I'm doing str_replace on a very long string and my $search is an array.
$search = array(
" tag_name_item ",
" tag_name_item_category "
);
$replace = array(
" tag_name_item{$suffix} ",
" tag_name_item_category{$suffix} "
);
echo str_replace($search, $replace, $my_really_long_string);
The reason why I added spaces on both $search and $replace is because I want to only match whole words. As you would have guessed from my code above, if I removed the spaces and my really long string is:
...
tag_name_item ...
tag_name_item_category ...
...
Then I would get something like
...
tag_name_item_sfx ...
tag_name_item_sfx_category ...
...
This is wrong because I want the following result:
...
tag_name_item_sfx ...
tag_name_item_category_sfx ...
...
So what's wrong?
Nothing really, it works. But I don't like it. Looks dirty, not well coded, inefficient.
I realized I can do something like this using regular expressions using the \b modifier but I'm not good with regex and so I don't know how to preg_replace.
A possible approach using regular expressions would/could look like this:
$result = preg_replace(
'/\b(tag_name_item(_category)?)\b/',
'$1' . $suffix,
$string
);
How it works:
\b: As you say are word boundaries, this is to ensure we're only matching words, not word parts
(: We want to use part of our match in the replacement string (tag_name_item has to be replaced with itself + a suffix). That's why we use a match group, so we can refer back to the match in the replacement string
tag_name_item is a literal match for that string.
(_category)?: Another literal match, grouped and made optional through use of the ? operator. This ensures that we're matching both tag_name_item and tag_name_item_category
): end of the first group (the optional _category match is the second group). This group, essentially, holds the entire match we're going to replace
\b: word boundary again
These matches are replaced with '$1' . $suffix. The $1 is a reference to the first match group (everything inside the outer brackets in the expression). You could refer to the second group using $2, but we're not interested in that group right now.
That's all there is to it really
More generic:
So, you're trying to suffix all strings starting with tag_name, which judging by your example, can be followed by any number of snake_cased words. A more generic regex for that would look something like this:
$result = preg_replace(
'/\b(tag_name[a-z_]*)\b/',
'$1' . $suffix,
$string
);
Like before, the use of \b, () and the tag_name literal remains the same. What changed is this:
[a-z_]*: This is a character class. It matches characters a-z (a to z), and underscores zero or more times (*). It matches _item and _item_category, just as it would match _foo_bar_zar_fefe.
These regexes are case-sensitive. If you want to match things like tag_name_XYZ, you'll probably want to use the i flag (case-insensitive): /\b(tag_name[a-z_]*)\b/i
Like before, the entire match is grouped, and used in the replacement string, to which we add $suffix, whatever that might be
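To see the generic pattern in action, here is a small sketch run against the two lines from the question. The value of $suffix is an assumption taken from the expected output shown there, and ${1} is used instead of $1 so that the group reference stays unambiguous if the suffix ever starts with a digit.

```php
<?php
$suffix = '_sfx'; // assumed value, taken from the expected output in the question
$string = "tag_name_item ...\ntag_name_item_category ...";

// ${1} refers to the first group; the braces keep it separate from the suffix text.
$result = preg_replace('/\b(tag_name[a-z_]*)\b/', '${1}' . $suffix, $string);

echo $result, PHP_EOL;
// tag_name_item_sfx ...
// tag_name_item_category_sfx ...
```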
To avoid the problem, you can use strtr that parses the string only once and chooses the longest match:
$pairs = [ " tag_name_item " => " tag_name_item{$suffix} ",
" tag_name_item_category " => " tag_name_item_category{$suffix} " ];
$result = strtr($str, $pairs);
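Because strtr tries the longest key first and never rescans text it has already replaced, the surrounding spaces are not strictly needed to avoid the double-suffix problem. A minimal sketch, assuming $suffix is '_sfx' as in the question's expected output:

```php
<?php
$suffix = '_sfx'; // assumed value
$str = "tag_name_item ...\ntag_name_item_category ...";

// No padding spaces: the longer key wins wherever both keys could match.
$pairs = [
    'tag_name_item'          => 'tag_name_item' . $suffix,
    'tag_name_item_category' => 'tag_name_item_category' . $suffix,
];
$result = strtr($str, $pairs);

echo $result, PHP_EOL;
// tag_name_item_sfx ...
// tag_name_item_category_sfx ...
```

Keep in mind that strtr still matches substrings, so an unrelated identifier such as tag_name_item_other would also receive the suffix in the middle; the \b-based regex does not have that limitation.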
This function removes whole words only (never substrings of longer words) that match one of the patterns in the array:
<?PHP
function removePrepositions($text) {
    // Each pattern is wrapped in \b so that only whole words are matched
    $patterns = array('/\b,\b/i', '/\bthe\b/i', '/\bor\b/i');
    // preg_replace accepts an array of patterns and applies them in order
    return trim(preg_replace($patterns, '', trim($text)));
}
?>

How to replace an unknown string? (only pattern is known)

I want to replace a string which might appear within a URL.
The string has the following pattern:
%26TID%3D123456
I want to replace the 123456 part, to a specific value such as: 777777.
To be on the safe side though, I don't want to assume that the relevant part of the original string has necessarily 6 digits after the %3D part; I want to assume that the original string might contain a few more or few less characters (and I also can't tell the real value of each digit).
In addition, when I replace the string, since that string will usually appear in the middle of the URL, I need to replace it without modifying the rest of the URL. After that string, there would usually be another %26 string which I want to keep including whatever that is after it, but to be on the safe side, I don't want to assume that the original string is necessarily followed by %26.
What is the best practice to make such a replacement, that would stand up to all my above conditions?
The general rule is to specify the boundary (or an "anchor") (here, a starting one) and then match whatever you want with the more generic pattern.
Here, the "anchor" is the literal text TID%3D. The more generic pattern is one or more digits: \d+.
Since you need to replace the first occurrence, you need to pass 1 as the limit argument value in preg_replace.
So, combining all that:
$re = '~TID%3D\d+~';
$str = "%26TID%3D123456 %26PID%3D123456 %26TID%3D123456";
$subst = 'TID%3D7652';
echo $result = preg_replace($re, $subst, $str, 1);
// => %26TID%3D7652 %26PID%3D123456 %26TID%3D123456
See IDEONE demo
If you do not want to repeat the "anchor" text in the replacement (or do not know it exactly), use a capturing mechanism (demo):
$re = '~(TID%\w{2})\d+~'; // (...) specify a capturing group referenced with ${1} later
$str = "%26TID%3D123456 %26PID%3D123456 %26TID%3D123456";
$subst = '${1}7652';
echo $result = preg_replace($re, $subst, $str, 1);
// => %26TID%3D7652 %26PID%3D123456 %26TID%3D123456
You can also use a lookbehind approach, but it is less efficient:
$re = '~(?<=TID%3D)\d+~'; // (?<=TID%3D) makes sure digits are preceded with TID%3D substring
$str = "%26TID%3D123456 %26PID%3D123456 %26TID%3D123456";
$subst = '7652'; // no group reference needed: the lookbehind does not consume TID%3D
echo $result = preg_replace($re, $subst, $str, 1);
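Putting the pieces together, the replacement could be wrapped in a small helper. This is only a sketch: replaceTid is a hypothetical name, and it assumes the value after %26TID%3D consists of digits and that $newValue contains no "$" or backslash characters (which preg_replace would interpret as references).

```php
<?php
// Hypothetical helper: replaces the digits after the first "%26TID%3D" with $newValue.
function replaceTid(string $url, string $newValue): string
{
    // Group 1 keeps the anchor; \d+ matches any number of digits, whatever follows them.
    // ${1} is used so the reference is not merged with the digits of $newValue.
    return preg_replace('~(%26TID%3D)\d+~', '${1}' . $newValue, $url, 1);
}

// Value in the middle of the string, followed by another %26 parameter.
echo replaceTid('page%3Fa%3D1%26TID%3D123456%26b%3D2', '777777'), PHP_EOL;
// page%3Fa%3D1%26TID%3D777777%26b%3D2

// Value at the very end, with fewer digits and no trailing %26.
echo replaceTid('page%26TID%3D12', '777777'), PHP_EOL;
// page%26TID%3D777777
```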

Check string for defined format and get part of it

How can I check if a string has the format [group|any_title] and give me the title back?
[group|This is] -> This is
[group|just an] -> just an
[group|example] -> example
I would do that with explode, using [group| as the delimiter, and then remove the last ]. If the length (of the explode result) is > 0, then the string has the correct format.
But I don't think that is a very good way, is it?
So you want to check if a string matches a regex?
if(preg_match('/^\[group\|(.+)\]$/', $string, $m)) {
$title = $m[1];
}
If the group part is supposed to be dynamic as well:
if(preg_match('/^\[(.+)\|(.+)\]$/', $string, $m)) {
$group = $m[1];
$title = $m[2];
}
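The first check above could be wrapped in a small function that returns the title or null. The name extractTitle is hypothetical; the pattern is the one from the answer.

```php
<?php
// Hypothetical helper: returns the title, or null if the format does not match.
function extractTitle(string $string): ?string
{
    // ^ and $ anchor the pattern, so the whole string must be "[group|...]".
    if (preg_match('/^\[group\|(.+)\]$/', $string, $m)) {
        return $m[1];
    }
    return null;
}

var_dump(extractTitle('[group|This is]')); // string(7) "This is"
var_dump(extractTitle('group|This is'));   // NULL
```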
Use regular expression matching using PHP function preg_match.
You can use for example regexr.com to create and test a regular expression and when you're done, then implement it in your PHP script (replace the first parameter of preg_match with your regular expression):
$text = '[group|This is]';
// replace "pattern" with regular expression pattern
if (preg_match('/pattern/', $text, $matches)) {
// OK, you have parts of $text in $matches array
}
else {
// $text doesn't contain text in expected format
}
The specific regular expression pattern depends on how strictly you want to check your input string. It can be, for example, something like /^\[.+\|(.+)\]$/ or /\|([A-Za-z ]+)\]$/. The first checks that the string starts with [, ends with ] and contains any characters delimited by | in between. The second only checks that the string ends with | followed by upper and lower case alphabetic characters and spaces, and finally ].
