How would I remove repeating characters (e.g. remove the letter k in cakkkke for it to be cake)?
One straightforward way is to loop through the string and append each character to a new string only if it differs from the previous character.
Here is some code that can do this:
$newString = '';
$oldString = 'cakkkke';
$lastCharacter = '';
for ($i = 0; $i < strlen($oldString); $i++) {
    if ($oldString[$i] !== $lastCharacter) {
        $newString .= $oldString[$i];
    }
    $lastCharacter = $oldString[$i];
}
echo $newString;
Is there a way to do the same thing more concisely using regex or built-in functions?
Use backreferences:
echo preg_replace("/(.)\\1+/", "$1", "cakkke");
Output:
cake
Explanation:
(.) captures any character
\\1 is a backreference to the first capture group, i.e. whatever character (.) matched. (Inside the double-quoted PHP string, \\1 reaches the regex engine as \1.)
+ makes the backreference match at least once (so that the pattern matches aa, aaa, aaaa, but not a)
Replacing it with $1 replaces the complete matched text, kkk in this case, with the first capture group, k in this case.
You want to first match a character, followed by that character repeated: (.)\1+. Replace that with the first character. The parentheses create a capture group holding the first character, which you use both to match the repeated instances (via the backreference \1) and as the replacement text ($1).
preg_replace('/(.)\1+/', '$1', $str);
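A minimal sketch of the same replacement wrapped in a function, to show one side effect worth knowing: the pattern collapses every run of repeated characters, including doubled letters that are spelled correctly. The function name collapseRepeats is made up for this illustration.

```php
<?php
// Hypothetical helper: collapse each run of a repeated character to one character.
function collapseRepeats(string $str): string
{
    return preg_replace('/(.)\1+/', '$1', $str);
}

$collapsed = collapseRepeats('cakkkke');      // the run of k's becomes a single k
$overzealous = collapseRepeats('bookkeeper'); // legitimate doubles are collapsed too

echo $collapsed, "\n";   // cake
echo $overzealous, "\n"; // bokeper
```

If legitimate doubles must survive, the pattern would need to target specific characters or longer runs, for example k{3,} instead of (.)\1+.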
Related
I would like to replace the letters "KELLY" between the "#" characters with the same number of "#" characters (here, five #'s in place of 'KELLY').
$str = "####KELLY#####"; // any alphabet letters can appear here
preg_replace('/(#{3,})[A-Z]+(#{3,})/', "$1$2", $str);
It returns ######### (four hashes then five hashes) without 'KELLY'.
How can I get ############## which is four original leading hashes, then replace each letter with a hash, then the five original trailing hashes?
The \G continue metacharacter makes for a messier pattern, but it enables the ability to use preg_replace() instead of preg_replace_callback().
Effectively, it looks for the leading three-or-more hashes, then makes single-letter replacements until it reaches the finishing sequence of three-or-more hashes.
This technique also allows hash markers to be "shared" -- I don't actually know if this is something that is desired.
Code:
$str = "####KELLY##### and ###ANOTHER###### not ####foo#### but: ###SHARE###MIDDLE###HASHES### ?";
echo $str . "\n";
echo preg_replace('/(?:#{3}|\G(?!^))\K[A-Z](?=[A-Z]*#{3})/', '#', $str);
Output:
####KELLY##### and ###ANOTHER###### not ####foo#### but: ###SHARE###MIDDLE###HASHES### ?
############## and ################ not ####foo#### but: ############################# ?
Breakdown:
/ #starting pattern delimiter
(?: #start non-capturing group
#{3} #match three hash symbols
| # OR
\G(?!^) #continue matching, disallow matching from the start of string
) #close non-capturing group
\K #forget any characters matched up to this point
[A-Z] #match a single letter
(?= #lookahead (do not consume any characters) for...
[A-Z]* #zero or more letters then
#{3} #three hash symbols (enough to confirm a run of three or more)
) #close the lookahead
/ #ending pattern delimiter
Or you can achieve the same result with preg_replace_callback().
Code:
echo preg_replace_callback(
    '/#{3}\K[A-Z]+(?=#{3})/',
    function($m) {
        return str_repeat('#', strlen($m[0]));
    },
    $str
);
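For reference, here is a self-contained sketch of the callback approach, run against a shorter sample string chosen for illustration. The variable names $input and $result are not from the original answer.

```php
<?php
$input = '####KELLY##### not ####foo####';

$result = preg_replace_callback(
    '/#{3}\K[A-Z]+(?=#{3})/',
    function (array $m): string {
        // $m[0] holds only the letters: \K drops the leading hashes from the
        // match, and the lookahead never consumes the trailing ones.
        return str_repeat('#', strlen($m[0]));
    },
    $input
);

echo $result, "\n";
```

The five letters of KELLY become five hashes, giving fourteen hashes in a row, while the lowercase foo is left alone because the pattern has no i modifier.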
I solved the problem with the preg_replace_callback function in PHP.
Thanks CBroe for the tips.
echo preg_replace_callback('/#{3,}([A-Z]+)#{3,}/i', 'replaceLetters', $str);
function replaceLetters($matches) {
    // $matches[0] is the whole match (hashes and letters), so the result has the same length
    $ret = '';
    for ($i = 0; $i < strlen($matches[0]); $i++) {
        $ret .= "#";
    }
    return $ret;
}
Considering this input string:
"this is a Test String to get the last index of word with an uppercase letter in PHP"
How can I get the position of the last uppercase letter that starts a word (in this example, the position of the first "P" of the word "PHP", not the last "P")?
I think this regex works. Give it a try.
https://regex101.com/r/KkJeho/1
$pattern = "/.*\s([A-Z])/";
//$pattern = "/.*\s([A-Z])[A-Z]+/"; pattern to match only all caps word
Edit, to address what Wiktor wrote in the comments: I think you could use str_replace() to replace all newlines in the input string with spaces before applying the regex.
The regex would then treat the input as a single line and still give the correct output.
Not tested, though.
To find the position of the letter/word:
$str = "this is a Test String to get the last index of word with an uppercase letter in PHP";
$pattern = "/.*\s([A-Z])(\w+)/";
//$pattern = "/.*\s([A-Z])([A-Z]+)/"; pattern to match only all caps word
preg_match($pattern, $str, $match);
$letter = $match[1];
$word = $match[1] . $match[2];
$position = strrpos($str, $match[1].$match[2]);
echo "Letter to find: " . $letter . "\nWord to find: " . $word . "\nPosition of letter: " . $position;
https://3v4l.org/sJilv
If you also want to consider a non-regex version: You can try splitting the string at the whitespace character, iterating the resulting string array backwards and checking if the current string's first character is an upper case character, something like this (you may want to add index/null checks):
<?php
$str = "this is a Test String to get the last index of word with an uppercase letter in PHP";
$explodeStr = explode(" ", $str);
$i = count($explodeStr) - 1;
$characterCount = 0; // characters to the right of the current word
while ($i >= 0) {
    $word = $explodeStr[$i];
    // ctype_upper() avoids treating digits or punctuation as uppercase
    if ($word !== '' && ctype_upper($word[0])) {
        echo $word . ' at index: ';
        $idx = strlen($str) - strlen($word) - $characterCount;
        echo $idx;
        break;
    }
    $characterCount += strlen($word) + 1; // +1 for the space
    $i--;
}
This prints "PHP at index: 80"; 80 is indeed the index of the first P in PHP (counting whitespace).
Andreas' pattern looks pretty solid, but this will find the position faster...
.* \K[A-Z]{2,}
Here is the PHP implementation:
$str='this is a Test String to get the last index of word with an uppercase letter in PHP test';
var_export(preg_match('/.* \K[A-Z]{2,}/',$str,$out,PREG_OFFSET_CAPTURE)?$out[0][1]:'fail');
// 80
If you want to see a condensed non-regex method, this will work:
Code:
$str='this is a Test String to get the last index of word with an uppercase letter in PHP test';
$allcaps=array_filter(explode(' ',$str),'ctype_upper');
echo "Position = ",strrpos($str,end($allcaps));
Output:
Position = 80
This assumes that there is an all caps word in the input string. If there is a possibility of no all-caps words, then a conditional would sort it out.
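One possible shape for that conditional, as a sketch only: the function name lastAllCapsWordPosition and the choice of null as the "not found" value are assumptions made here, not part of the original answer.

```php
<?php
// Hypothetical helper: offset of the last all-caps word, or null if there is none.
function lastAllCapsWordPosition(string $str): ?int
{
    $allCaps = array_filter(explode(' ', $str), 'ctype_upper');
    if ($allCaps === []) {
        return null; // no all-caps word in the input
    }
    // Caveat: strrpos() finds the last occurrence of that text, even if it
    // happens to sit inside a later, longer word.
    return strrpos($str, end($allCaps));
}

$found = lastAllCapsWordPosition('this is a Test String to get the last index of word with an uppercase letter in PHP test');
$missing = lastAllCapsWordPosition('no caps here');

var_dump($found);   // int(80)
var_dump($missing); // NULL
```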
Edit: after re-reading the question, I am unsure what exactly makes PHP the targeted substring: whether it is because it is all caps, or just because it is the last word to start with a capital letter.
If it is just the last word starting with an uppercase letter, then this pattern will do: /.* \K[A-Z]/
If the word needs to be all caps, then \b word boundaries may be necessary.
Some more samples and explanation from the OP would be useful.
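To illustrate why the boundary can matter, here is a small sketch using a made-up sample string in which the last run of capitals, the PHP in PHPunit, is not a whole word. The sample string and variable names are assumptions for illustration.

```php
<?php
$sample = 'uses XML and PHPunit here';

// Without a trailing \b, the leading capitals of "PHPunit" count as a match.
preg_match('/.* \K[A-Z]{2,}/', $sample, $loose, PREG_OFFSET_CAPTURE);

// With a trailing \b, the capitals must run to the end of the word.
preg_match('/.* \K[A-Z]{2,}\b/', $sample, $strict, PREG_OFFSET_CAPTURE);

echo $loose[0][0], ' at ', $loose[0][1], "\n";   // PHP at 13
echo $strict[0][0], ' at ', $strict[0][1], "\n"; // XML at 5
```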
Another edit, you can declare a set of characters to exclude and use just two string functions. I am using a-z and a space with rtrim() then finding the right-most space, and adding 1 to it.
$str='this is a Test String to get the last index of word with an uppercase letter in PHP test';
echo strrpos(rtrim($str,'abcdefghijklmnopqrstuvwxyz '),' ')+1;
// 80
How do I match an exact word that contains a special character?
$string = 'Fall in love with #PepsiMoji! Celebrate #WorldEmojiDay by downloading our keyboard # http://bit.ly/pepsiKB & take your text game up a notch. - teacher';
preg_match("/\b#worldemojiday\b/i",$string); //false
I want to match an exact word containing any character. For example, if I search for the word 'download' in this string, it should return false:
preg_match("/\bdownload\b/i",$string); //false
But when I search for 'downloading', it should return true.
Thanks
The problem is the \b word boundary before the non-word character #. \b cannot match at a position between two non-word characters (or between two word characters), so you do not get a match.
A solution is either to remove the first \b, or use \B (a non-word boundary matching between 2 word or 2 non-word characters) instead of it.
\B#worldemojiday\b
Or
#worldemojiday\b
Note that \B also matches at the beginning of a string when the string starts with a non-word character such as #.
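The difference between the two anchors can be checked with a short sketch. The sample string is a shortened version of the one in the question, and the variable names are illustrative.

```php
<?php
$string = 'Celebrate #WorldEmojiDay by downloading our keyboard';

// \b needs a word character on exactly one side; a space and # are both non-word.
$withWordBoundary = preg_match('/\b#worldemojiday\b/i', $string);

// \B succeeds between two non-word characters (the space and the #).
$withNonWordBoundary = preg_match('/\B#worldemojiday\b/i', $string);

// A hashtag glued to a preceding word character is rejected by \B.
$gluedToWord = preg_match('/\B#worldemojiday\b/i', 'abc#WorldEmojiDay');

// Plain words still behave as expected with \b on both sides.
$partialWord = preg_match('/\bdownload\b/i', $string);
$wholeWord = preg_match('/\bdownloading\b/i', $string);

var_dump($withWordBoundary, $withNonWordBoundary, $gluedToWord, $partialWord, $wholeWord);
// int(0) int(1) int(0) int(0) int(1)
```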
Here is a way to build a regex dynamically, adding word boundaries only where necessary:
$srch = "žvolen";
$srch = preg_quote($srch, '/'); // also escape the "/" delimiter used below
if (preg_match('/\w$/u', $srch)) {
    $srch .= '\\b';
}
if (preg_match('/^\w/u', $srch)) {
    $srch = '\\b' . $srch;
}
echo preg_match("/" . $srch . "/ui", "žvolen is used.");
What about using lookarounds:
(?<!\w)#WorldEmojiDay(?!\w)
This ensures that there is no word character immediately before or after the string.
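A sketch of how the lookaround version could be built for an arbitrary search term. The helper name containsExactTerm is hypothetical, and the /u modifier assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper: true when $term occurs with no word character touching it.
function containsExactTerm(string $term, string $subject): bool
{
    // preg_quote() escapes regex metacharacters and the "/" delimiter.
    $pattern = '/(?<!\w)' . preg_quote($term, '/') . '(?!\w)/iu';
    return preg_match($pattern, $subject) === 1;
}

$subject = 'Celebrate #WorldEmojiDay by downloading our keyboard';

var_dump(containsExactTerm('#worldemojiday', $subject)); // bool(true)
var_dump(containsExactTerm('download', $subject));       // bool(false)
var_dump(containsExactTerm('downloading', $subject));    // bool(true)
```

Unlike \b, the lookarounds behave the same whether the term starts or ends with a word character or with a symbol, so no per-term decision about boundaries is needed.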
I'm trying to remove all words of less than 3 characters from a string, specifically with RegEx.
The following doesn't work because it is looking for double spaces. I suppose I could convert all spaces to double spaces beforehand and then convert them back after, but that doesn't seem very efficient. Any ideas?
$text='an of and then some an ee halved or or whenever';
$text=preg_replace('# [a-z]{1,2} #',' ',' '.$text.' ');
echo trim($text);
Removing the Short Words
You can use this:
$replaced = preg_replace('~\b[a-z]{1,2}\b~', '', $yourstring);
Explanation
\b is a word boundary that matches a position where one side is a word character (a letter, digit, or underscore) and the other side is not (for instance a space character, or the beginning of the string)
[a-z]{1,2} matches one or two letters
\b another word boundary
Replace with the empty string.
Option 2: Also Remove Trailing Spaces
If you also want to remove the spaces after the words, we can add \s* at the end of the regex:
$replaced = preg_replace('~\b[a-z]{1,2}\b\s*~', '', $yourstring);
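As a quick check of Option 2, here is a sketch run against the question's sample text. The trim() call, the extra sample inputs, and the variable names are additions for illustration, not part of the original answer.

```php
<?php
$text = 'an of and then some an ee halved or or whenever';

// Drop one- and two-letter words plus the whitespace that follows them.
$replaced = preg_replace('~\b[a-z]{1,2}\b\s*~', '', $text);

// A short word at the very end leaves the preceding space behind, so trim.
$trimmed = trim(preg_replace('~\b[a-z]{1,2}\b\s*~', '', 'stay at it'));

// Caveat: \b also sees the "t" of "don't" as a separate one-letter word.
$apostrophe = preg_replace('~\b[a-z]{1,2}\b\s*~', '', "don't");

echo $replaced, "\n";   // and then some halved whenever
echo $trimmed, "\n";    // stay
echo $apostrophe, "\n"; // don'
```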
Reference
Word Boundaries
You can use the word boundary anchor \b:
Replace \b[a-z]{1,2}\b with ''
Use this
preg_replace('/(\b.{1,2}\s)/','',$your_string);
While some of the solutions here worked, they had a problem with my language's multi-character letters, such as "ch". A simple explode and implode worked for me.
$minWordLength = 3; // words shorter than this are removed
$string = "my super string";
$exploded = explode(" ", $string);
foreach ($exploded as $key => $word) {
    if (mb_strlen($word) < $minWordLength) unset($exploded[$key]);
}
$string = implode(" ", $exploded);
echo $string;
// outputs "super string"
To me, it seems that this works fine with most PHP versions:
$string2 = preg_replace('/\b[a-zA-Z0-9]{1,2}\b/i', '', trim($string1));
where [a-zA-Z0-9] is the accepted range of letters and digits.
Here is my concern,
I have a string and I need to extract characters two by two.
$str = "abcdef" should return array('ab', 'bc', 'cd', 'de', 'ef'). I want to use preg_match_all instead of loops. Here is the pattern I am using.
$str = "abcdef";
preg_match_all('/[\w]{2}/', $str);
The thing is, it returns Array('ab', 'cd', 'ef'). It misses 'bc' and 'de'.
I have the same problem if I want to extract a certain number of words
$str = "ab cd ef gh ij";
preg_match_all('/([\w]+ ){2}/', $str); // returns array('ab cd', 'ef gh'), I'm also missing the last part
What am I missing? Or is it simply not possible to do so with preg_match_all?
For the first problem, what you want to do is match overlapping strings, and this requires a zero-width look-around (one that does not consume text) to grab the characters:
/(?=(\w{2}))/
The regex above will capture the match in the first capturing group.
For the second problem, it seems that you also want overlapping string. Using the same trick:
/(?=(\b\w+ \w+\b))/
Note that \b is added to check the boundary of the word. Since the match does not consume text, the next match will be attempted at the next index (which is in the middle of the first word), instead of at the end of the 2nd word. We don't want to capture from middle of a word, so we need the boundary check.
Note that \b's definition is based on \w, so if you ever change the definition of a word, you need to emulate the word boundary with look-ahead and look-behind with the corresponding character set.
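Both lookahead patterns can be tried with preg_match_all() as sketched here. Because the overall match is empty, the text of interest lands in index 1 of the result array. The variable names are illustrative.

```php
<?php
// Overlapping two-character pairs.
preg_match_all('/(?=(\w{2}))/', 'abcdef', $pairs);
// $pairs[0] contains only empty strings; the captured pairs are in $pairs[1].
print_r($pairs[1]); // ab, bc, cd, de, ef

// Overlapping two-word pairs.
preg_match_all('/(?=(\b\w+ \w+\b))/', 'ab cd ef gh ij', $wordPairs);
print_r($wordPairs[1]); // "ab cd", "cd ef", "ef gh", "gh ij"
```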
If you need a non-regex solution, try this:
<?php
$str = "abcdef";
$len = strlen($str);
$arr = array();
for ($count = 0; $count < ($len - 1); $count++)
{
    $arr[] = $str[$count].$str[$count+1];
}
print_r($arr);
?>