Condensed function to strip double letters away from a string (PHP)

I need to remove every double-letter occurrence from a word (e.g. "attached" has to become "aached").
I wrote this function:
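function strip_doubles($string, $positions) {
    // Split the word into an array of single characters
    $stripped_word = array();
    for ($i = 0; $i < strlen($string); $i++) {
        $stripped_word[] = $string[$i];
    }
    // Drop both letters of every double, given the index of its first letter
    foreach ($positions['word'] as $position) {
        unset($stripped_word[$position], $stripped_word[$position + 1]);
    }
    // Glue the remaining characters back together
    $returned_string = "";
    foreach ($stripped_word as $key => $value) {
        $returned_string .= $stripped_word[$key];
    }
    return $returned_string;
}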
where $string is the word to be stripped and $positions is an array containing the position of the first letter of each double.
It works perfectly, but how would a real programmer write the same function in a more condensed way? I have a feeling it should be possible to do the same thing without three loops and so much code.

Non-regex solution, tested:
$string = 'attached';
$stripped = '';
for ($i=0,$l=strlen($string);$i<$l;$i++) {
$matched = '';
// if current char is the same as the next, skip it
while (substr($string, $i, 1)==substr($string, $i+1, 1)) {
$matched = substr($string, $i, 1);
$i++;
}
// if current char is NOT the same as the matched char, append it
if (substr($string, $i, 1) != $matched) {
$stripped .= substr($string, $i, 1);
}
}
echo $stripped;

You should use a regular expression. It matches on certain characteristics and can replace the matched occurrences with some other string(s).
Something like
$result = preg_replace('#([a-zA-Z]{1})\1#i', '', $string);
should work. It tells the regexp to match one character from a-z followed by the match itself, thus effectively two identical characters after each other. The # characters mark the start and end of the regexp. If you want more characters than just a-z and A-Z, you could use other classes like [a-zA-Z0-9]{1}, or .{1} for any character, or \p{L}\p{M}* (with the u flag) for Unicode letters, including combining marks.
The i flag after the last # means 'case insensitive' and will instruct the regexp to also match combinations with different cases, like 'tT'. If you want only combinations in the same case, so 'tt' and 'TT', then remove the 'i' from the flags.
The '' tells the regexp to replace the matched occurrences (the two identical characters) with an empty string.
See http://php.net/manual/en/function.preg-replace.php and http://www.regular-expressions.info/
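To make the effect of the i flag concrete, here is a minimal sketch. The helper name strip_double_letters() and its $ignoreCase parameter are assumptions made for illustration, not part of the answer above; only preg_replace() itself is taken from it.

```php
<?php
// Hypothetical helper (name and signature are assumptions): removes every
// pair of identical adjacent letters from $word.
function strip_double_letters(string $word, bool $ignoreCase = false): string
{
    // ([a-z]) captures one letter; \1 requires the same letter again.
    $pattern = $ignoreCase ? '/([a-z])\1/i' : '/([a-zA-Z])\1/';

    // preg_replace() returns null on failure, so fall back to the input.
    return preg_replace($pattern, '', $word) ?? $word;
}

echo strip_double_letters('attached'), PHP_EOL;     // aached
echo strip_double_letters('Ttall'), PHP_EOL;        // Tta
echo strip_double_letters('Ttall', true), PHP_EOL;  // a
```

Note that a run of three identical letters loses only one pair, so 'aaa' becomes 'a' with this pattern.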

Related

PHP convert uppercase words to lowercase, but keep ucfirst on lowercase words

An example:
THIS IS A Sentence that should be TAKEN Care of
The output should be:
This is a Sentence that should be taken Care of
Rules
Convert UPPERCASE words to lowercase
Keep the lowercase words with an uppercase first character intact
Set the first character in the sentence to uppercase.
Code
$string = ucfirst(strtolower($string));
Fails
It fails because the ucfirst words are not being kept.
This is a sentence that should be taken care of
You can test each word for those rules:
$str = 'THIS IS A Sentence that should be TAKEN Care of';
$words = explode(' ', $str);
foreach($words as $k => $word){
if(strtoupper($word) === $word || // first rule
ucfirst($word) !== $word){ // second rule
$words[$k] = strtolower($word);
}
}
$sentence = ucfirst(implode(' ', $words)); // third rule
Output:
This is a Sentence that should be taken Care of
A little bit of explanation:
Since you have overlapping rules, you need to individually compare them, so...
Break down the sentence into separate words and check each of them based on the rules;
If the word is UPPERCASE, turn it into lowercase; (THIS, IS, A, TAKEN)
If the word is ucfirst, leave it alone; (Sentence, Care)
If the word is NOT ucfirst, turn it into lowercase, (that, should, be, of)
You can break the sentence down into individual words, then apply a formatting function to each of them:
$sentence = 'THIS IS A Sentence that should be TAKEN Care of';
$words = array_map(function ($word) {
// If the word only has its first letter capitalised, leave it alone
if ($word === ucfirst(strtolower($word)) && $word != strtoupper($word)) {
return $word;
}
// Otherwise set to all lower case
return strtolower($word);
}, explode(' ', $sentence));
// Re-combine the sentence, and capitalise the first character
echo ucfirst(implode(' ', $words));
See https://eval.in/936462
$str = "THIS IS A Sentence that should be TAKEN Care of";
$str_array = explode(" ", $str);
foreach ($str_array as $testcase =>$str1) {
//Check the first word
if ($testcase ==0 && ctype_upper($str1)) {
echo ucfirst(strtolower($str1))." ";
}
//Convert every other uppercase word to lowercase
elseif( ctype_upper($str1)) {
echo strtolower($str1)." ";
}
//Do nothing with lowercase
else {
echo $str1." ";
}
}
Output:
This is a Sentence that should be taken Care of
I find preg_replace_callback() to be a direct tool for this task. Create a pattern that will capture the two required strings:
The leading word
Any non-leading, ALL-CAPS word
Code:
echo preg_replace_callback(
'~(^\pL+\b)|(\b\p{Lu}+\b)~u',
function($m) {
return $m[1]
? mb_convert_case($m[1], MB_CASE_TITLE, 'UTF-8')
: mb_strtolower($m[2], 'UTF-8');
},
'THIS IS A Sentence that should be TAKEN Care of'
);
// This is a Sentence that should be taken Care of
I did not test this with multibyte input strings, but I have tried to build it with multibyte characters in mind.
The custom function works like this:
There will always be either two or three elements in $m. If the first capture group matches the first word of the string, then there will be no $m[2]. When a non-first word is matched, then $m[2] will be populated and $m[1] will be an empty string. There is a modern flag that can be used to force that empty string to be null, but it is not advantageous in this case.
\pL+ means one or more of any letter (single or multi-byte)
\p{Lu}+ means one or more uppercase letters
\b is a word boundary. It is a zero-width assertion: it doesn't match a character, it checks that the position lies between a word character and a non-word character (or at the start or end of the string).
My answer makes just 4 matches/replacements on the sample input string (THIS, IS, A and TAKEN).
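The shape of $m described above can be inspected with a small sketch. It reuses the same pattern and sample string; the $shapes variable is introduced here purely for illustration, and the callback returns the match unchanged because only the contents of $m are of interest.

```php
<?php
// $shapes is a hypothetical collector used only for this illustration.
$shapes = array();

preg_replace_callback(
    '~(^\pL+\b)|(\b\p{Lu}+\b)~u',
    function ($m) use (&$shapes) {
        // Record the matched text together with the number of elements in $m
        $shapes[] = $m[0] . ':' . count($m);
        return $m[0]; // leave the text unchanged; we only inspect $m
    },
    'THIS IS A Sentence that should be TAKEN Care of'
);

echo implode(' ', $shapes), PHP_EOL; // THIS:2 IS:3 A:3 TAKEN:3
```

The leading word yields two elements because the unmatched trailing group is dropped, while every other all-caps word yields three, with $m[1] as an empty string.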
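$string = 'THIS IS A Sentence that should be TAKEN Care of';
$arr = explode(" ", $string);
$stry = ''; // initialise the result to avoid an undefined-variable notice
foreach ($arr as $v)
{
    // Lowercase the word, then capitalise its first letter
    $v = ucfirst(strtolower($v));
    $stry = $stry . ' ' . $v;
}
echo ltrim($stry);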

php regex replace each character with asterisk

I am trying to do something like this: hide user names except for the first 3 characters.
Example:
apple -> app**
google -> goo***
abc12345 -> abc*****
I am currently using php like this:
$string = "abcd1234";
$regex = '/(?<=^(.{3}))(.*)$/';
$replacement = '*';
$changed = preg_replace($regex,$replacement,$string);
echo $changed;
and the result is:
abc*
But I want to replace every single character except for the first 3, like this:
abc*****
How should I do this?
Don't use regex, use substr_replace:
$var = "abcdef";
$charToKeep = 3;
echo strlen($var) > $charToKeep ? substr_replace($var, str_repeat ( '*' , strlen($var) - $charToKeep), $charToKeep) : $var;
Keep in mind that regexes are good for matching patterns in strings, but there are a lot of functions already designed for string manipulation.
Will output:
abc***
Try this function. You can specify how many characters should be visible and which character should be used as the mask:
$string = "abcd1234";
echo hideCharacters($string, 3, "*");
function hideCharacters($string, $visibleCharactersCount, $mask)
{
if(strlen($string) < $visibleCharactersCount)
return $string;
$part = substr($string, 0, $visibleCharactersCount);
return str_pad($part, strlen($string), $mask, STR_PAD_RIGHT);
}
Output:
abc*****
Your regex matches all characters after the first 3 as one single match, so the whole run is replaced with one hard-coded *.
You can use
'~(^.{3}|(?!^)\G)\K.~'
and replace with *.
This regex matches the first 3 characters (with ^.{3}) or the end of the previous successful match (with (?!^)\G, where the (?!^) lookahead excludes the start of the string), then omits the characters matched so far from the match value (with \K) and matches any character but a newline with the final dot.
$re = '~(^.{3}|(?!^)\G)\K.~';
$strs = array("aa","apple", "google", "abc12345", "asdddd");
foreach ($strs as $s) {
$result = preg_replace($re, "*", $s);
echo $result . PHP_EOL;
}
Another possible solution is to concatenate the first three characters with a string of * repeated the correct number of times:
$text = substr($string, 0, 3).str_repeat('*', max(0, strlen($string) - 3));
The max() call is needed to keep str_repeat() from receiving a negative argument, which raises a warning (a ValueError as of PHP 8). This situation happens when the length of $string is less than 3.
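If the user names may contain multibyte characters, the same idea can be sketched with the mb_* functions, which count characters rather than bytes. The helper name mask_after() and its parameters are assumptions for illustration, and the sketch assumes the mbstring extension is available and the input is UTF-8.

```php
<?php
// Hypothetical helper (name and defaults are assumptions); needs mbstring.
function mask_after(string $text, int $visible = 3, string $mask = '*'): string
{
    // Number of characters to hide; never negative for short inputs
    $hidden = max(0, mb_strlen($text, 'UTF-8') - $visible);

    // Keep the first $visible characters and append one mask per hidden one
    return mb_substr($text, 0, $visible, 'UTF-8') . str_repeat($mask, $hidden);
}

echo mask_after('abc12345'), PHP_EOL; // abc*****
echo mask_after('ab'), PHP_EOL;       // ab
echo mask_after('élodie'), PHP_EOL;   // élo***
```

With plain substr() and strlen(), the last example would be cut in the middle of a byte sequence whenever the boundary falls inside a multibyte character.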

match whole word only without regex

Since I can't use preg_match (UTF-8 support is somehow broken: it works locally but breaks in production), I want to find another way to match a word against a blacklist. The problem is that I want to search a string for an exact match only, not the first occurrence of the string.
This is how I do it with preg_match:
preg_match('/\b(badword)\b/', strtolower($string));
Example string:
$string = "This is a string containing badwords and one badword";
I want to match only the "badword" (at the end) and not "badwords".
strpos($string, 'badword') matches the first one.
Any ideas?
Assuming you could do some pre-processing, you could replace all your punctuation marks with white spaces, put everything in lowercase, and then either:
Use strpos with something like strpos($string, ' badword ') in a while loop to keep on iterating through your entire document;
Split the string at white spaces and compare each word with a list of bad words you have.
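So if you were trying the first option, it would look something like this (untested sketch, where $text holds the body of text to process):
// Lowercase the text and pad it with spaces so the first and last words can match
$document = ' ' . strtolower($text) . ' ';
// Replace punctuation marks with spaces (extend the list as needed)
$document = str_replace(str_split('!@#$%^&*(),./'), ' ', $document);
$badWords = array(/* ... */);
$found = array();
foreach ($badWords as $word)
{
    $needle = ' ' . $word . ' ';
    $offset = 0;
    // Keep searching until there are no further occurrences of this word
    while (($index = strpos($document, $needle, $offset)) !== false)
    {
        $found[] = $word;
        $offset = $index + 1;
    }
}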
EDIT: As per #jonhopkins' suggestion, adding a white space at the end should cater for the scenario where the wanted word is at the end of the document and is not followed by a punctuation mark.
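The second option, splitting at white spaces and comparing each word, can be sketched as follows. The function name find_bad_words() and the exact punctuation list are assumptions made for illustration. The sketch uses strtolower(), which only lowercases ASCII letters, so UTF-8 input may need mb_strtolower() instead.

```php
<?php
// Hypothetical helper (name and punctuation list are assumptions): returns
// the entries of $badWords that occur as whole words in $text.
function find_bad_words(string $text, array $badWords): array
{
    // Turn punctuation into spaces so "badword." still counts as "badword"
    $punctuation = str_split('!@#$%^&*(),./?;:');
    $normalized = str_replace($punctuation, ' ', strtolower($text));

    // Split at spaces and drop the empty strings left by repeated spaces
    $words = array_filter(explode(' ', $normalized), 'strlen');

    // Keep only the bad words that appear as complete words
    return array_values(array_intersect($badWords, $words));
}

$string = "This is a string containing badwords and one badword";
print_r(find_bad_words($string, array('badword')));
```

Because whole words are compared for equality, "badwords" does not count as a hit for "badword", which is the behaviour \b provides in the regex version.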
If you want to mimic the \b assertion of regex you can try something like this:
$offset = 0;
$word = 'badword';
$matched = array();
// Characters treated as word boundaries; extend this list as needed
$boundaryChars = array(' ', '-');
$length = strlen($string);
while (($pos = strpos($string, $word, $offset)) !== false) {
    // Index of the first character after the matched word
    $end = $pos + strlen($word);
    // Left boundary: start of the string, or a boundary character before the word
    $leftBoundary = $pos === 0 || in_array($string[$pos - 1], $boundaryChars, true);
    // Right boundary: end of the string, or a boundary character after the word
    $rightBoundary = $end === $length || in_array($string[$end], $boundaryChars, true);
    // If it has both boundaries, we add the index to the matched ones...
    if ($leftBoundary && $rightBoundary) {
        $matched[] = $pos;
    }
    $offset = $end;
}
You can use strrpos() instead of strpos:
strrpos — Find the position of the last occurrence of a substring in a string
$string = "This is a string containing badwords and one badword";
var_dump(strrpos($string, 'badword'));
Output:
int(45)
A simple way to use word boundaries with unicode properties:
preg_match('/(?:^|[^\pL\pN_])(badword)(?:[^\pL\pN_]|$)/u', $string);
In fact, it's much more complicated than this.

Put something after the nth digit, rather than nth character?

I'm working on an autocomplete for SSN-like numbers in PHP. So if the user searches for '123', it should find the number 444123555. I want to bold the results, thus, 444<b>123</b>555. I then, however, want to format it as an SSN - thus creating 444-<b>12-3</b>555.
Is there some way to say 'put the dash after the nth digit'? Because I don't want the nth character, just the nth digit - if I could say 'put a dash after the third digit and the fifth digit, ignoring non-numeric characters like <, b, and >' that would be awesome. Is this doable in a regex?
Or is there a different method that's escaping me here?
Just iterate over the string and check that each character is a digit and count the digits as you go.
That will be so much faster than regex, even if regex were a feasible solution here (which I am not convinced it is).
This will do exactly what you asked for:
$str = preg_replace('/^ ((?:\D*\d){3}) ((?:\D*\d){2}) /x', '$1-$2-', $str);
The (?:\D*\d) will match any number of non-digits, then a digit. By repeating that n times, you match n digits, "ignoring" everything else.
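As a quick check of how the pattern treats markup between the digits, here is a small sketch wrapping the same preg_replace() call. The function name dash_after_digits() is an assumption made for illustration.

```php
<?php
// Hypothetical wrapper (the name is an assumption) around the pattern above:
// puts a dash after the 3rd and the 5th digit, skipping non-digit characters.
function dash_after_digits(string $str): string
{
    // The x flag lets the pattern contain spaces purely for readability.
    $pattern = '/^ ((?:\D*\d){3}) ((?:\D*\d){2}) /x';

    // preg_replace() returns null on failure, so fall back to the input.
    return preg_replace($pattern, '$1-$2-', $str) ?? $str;
}

echo dash_after_digits('444<b>123</b>555'), PHP_EOL; // 444-<b>12-3</b>555
echo dash_after_digits('<b>444</b>123555'), PHP_EOL; // <b>444-</b>12-3555
echo dash_after_digits('12'), PHP_EOL;               // 12
```

Inputs with fewer than five digits do not match the pattern and are returned unchanged.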
Here's a simple function using an iterative approach as Platinum Azure suggests:
function addNumberSeparator($numString, $n, $separator = '-')
{
$numStringLen = strlen($numString);
$numCount = 0;
for($i = 0; $i < $numStringLen; $i++)
{
if(is_numeric($numString[$i]))
{
$numCount++;
//echo $numCount . '-' . $i;
}
if($numCount == $n)
return substr($numString, 0, $i + 1) . $separator . substr($numString, $i + 1);
}
}
$string = '444<b>123</b>555';
$string = addNumberSeparator($string, 3);
$string = addNumberSeparator($string, 5);
echo $string;
This outputs the following:
444-<b>12-3</b>555
That will, of course, only work with a non-numeric separator character. Not the most polished piece of code, but it should give you a start!
Hope that helps.
If you want to get the formatted number and the surrounding text:
<?php
preg_match("/(.*)(\d{3})(12)(3)(.*)/", "assd444123666as555", $match);
$str = $match[1];
if($match[2]!=="") $str.=$match[2]."-<b>";
$str.=$match[3]."-".$match[4]."</b>";
if($match[5]!=="") $str.=$match[5];
echo $str;
?>
If only formatted number:
<?php
preg_match("/(.*)(\d{3})(12)(3)(.*)/", "as444123666as555", $match);
$str = "";
if($match[2]!=="") $str.=$match[2]."-<b>";
$str.=$match[3]."-".$match[4]."</b>";
echo $str;
?>
Sorry, but it is a bit ambiguous.

PHP string letters

We have a variable $string that contains some text like:
About 200 million CAPTCHAs are solved by humans around the world every day.
How can we get the first or last 2-3 letters of each word (whose length is more than 3 letters)?
We will check them for matching text with foreach():
if ('ey' is matched in the end of some word) {
replace 'ey' with 'ei' in this word;
}
Thanks.
First, I'll give you an example of how to loop through a string and work with each word in the string.
Second, I'll explain each part of the code so that you can modify it to your exact needs.
Here is how to switch out the last 2 letters (if they are "ey") of each word that is more than 3 letters long.
<?php
// Example string
$string = 'Hey they ey shay play stay nowhey';
// Create array of words splitting at spaces
$string = explode(" ", $string);
// The search and replace strings
$lookFor = "ey";
$switchTo = "ei";
// Cycle through the words
foreach($string as $key => $word)
{
// If the word has more than 3 letters
if(strlen($word) > 3)
{
// If the last two letters are what we want
if ( substr($word, -2) == $lookFor )
{
// Replace the last 2 letters of the word
$string[$key] = substr_replace($word, $switchTo, -2);
}
}
}
// Recreate string from array
$string = implode(" ", $string);
// See what we got
echo $string;
// The above will print:
// Hey thei ey shay play stay nowhei
?>
I'll explain each function so that you can modify the above to exactly how you want it, since I don't precisely understand all your specifications:
explode() will take a string and split it apart into an array. The first argument is what you use to split it. The second argument is the string, so explode(" ", $string) will split $string by the use of spaces. The spaces will not be included in the array.
foreach() will cycle through each element of an array. foreach($string as $key => $word) will go through each element of $string and for each element it will assign the index number to $key and the value of the element (the word in this case) to $word.
strlen() returns how long a string is.
substr() returns a portion of a string. The first argument is the string, the second argument is where the substring starts, and the optional third argument is the length of the substring. With a negative start, the position is counted back from the end of the string, and without a length the substring runs to the end of the string. In other words, substr($word, -2) returns the last two letters. If you want the first two letters, you would use substr($word, 0, 2), since you're starting at the very beginning and want a length of 2 letters.
substr_replace() will replace a substring within a string. The first argument is the entire string. The second argument is your replacement substring. The third argument is where the replacement starts, and the optional fourth argument is the length of the substring, so substr_replace($word, $switchTo, -2) will take $word and, starting at the penultimate letter, replace what's there with $switchTo. In this case, we'll switch out the last two letters. If you want to replace the first two letters, you would use substr_replace($word, $switchTo, 0, 2).
implode() is the opposite of explode. It takes an array and forms it into a string using the separator specified.
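Putting explode(), strlen() and substr() together, the first and last letters of every longer word can be collected as in the sketch below. The function name word_edges() and the shape of its return value are assumptions for illustration. Words are split at spaces only, so trailing punctuation such as the full stop in "day." stays attached to its word.

```php
<?php
// Hypothetical helper (name and return shape are assumptions): for every word
// longer than 3 characters, collect its first and last $count letters.
function word_edges(string $text, int $count = 2): array
{
    $edges = array();
    foreach (explode(' ', $text) as $word) {
        if (strlen($word) > 3) {
            $edges[] = array(
                'word'  => $word,
                'first' => substr($word, 0, $count), // from the start
                'last'  => substr($word, -$count),   // counted from the end
            );
        }
    }
    return $edges;
}

foreach (word_edges('About 200 million CAPTCHAs') as $edge) {
    echo $edge['word'], ': ', $edge['first'], ' / ', $edge['last'], PHP_EOL;
}
// About: Ab / ut
// million: mi / on
// CAPTCHAs: CA / As
```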
$string = 'About 200 million CAPTCHAs are solved by humans around the world every day.';
$result = array();
$words = explode(" ",$string);
foreach($words as $word){
if(strlen($word) > 3){
$result[] = substr($word,0,3); //first 3 characters, use "-3" for the second parameter if you want the last three
}
}
function get_symbols($str, $reverse = false)
{
$symbols = array();
foreach (explode(' ', $str) as $word)
{
if ($reverse)
$word = strrev($word);
if (strlen($word) > 3)
$word = substr($word, 0, 3);
array_push($symbols, $word);
}
return $symbols;
}
EDIT:
function change_reverse_symbol_in_word($str, $symbol, $replace_to)
{
    $result = "";
    foreach (explode(' ', $str) as $word)
    {
        $rword = "";
        if (strlen($word) > 3)
        {
            // Take the trailing characters, as many as $symbol is long
            $rword = substr($word, -strlen($symbol));
        }
        if ($symbol !== "" && !strcmp($symbol, $rword))
        {
            // Cut the matched ending off and append the replacement
            $word = substr($word, 0, strlen($word) - strlen($rword)) . $replace_to;
        }
        $result .= $word . " ";
    }
    // Drop the trailing space added after the last word
    return rtrim($result);
}
And if you want to use this as in your question, you must call it like this:
$string_malformed = change_reverse_symbol_in_word($str, "ey", "ei");
