We have a variable $string, its contains some text like:
About 200 million CAPTCHAs are solved by humans around the world every day.
How can we get 2-3 last or first letters of each word (which length is more than 3 letters)?
Will check them for matched text with foreach():
if ('ey' is matched in the end of some word) {
replace 'ey' with 'ei' in this word;
}
Thanks.
First, I'll give you an example of how to loop through a string and work with each word in the string.
Second, I'll explain each part of the code so that you can modify it to your exact needs.
Here is how to switch out the last 2 letters (if they are "ey") of each word that is more than 3 letters long.
<?php
// Example string
$string = 'Hey they ey shay play stay nowhey';
// Create array of words splitting at spaces
$string = explode(" ", $string);
// The search and replace strings
$lookFor = "ey";
$switchTo = "ei";
// Cycle through the words
foreach($string as $key => $word)
{
// If the word has more than 3 letters
if(strlen($word) > 3)
{
// If the last two letters are what we want
if ( substr($word, -2) == $lookFor )
{
// Replace the last 2 letters of the word
$string[$key] = substr_replace($word, $switchTo, -2);
}
}
}
// Recreate string from array
$string = implode(" ", $string);
// See what we got
echo $string;
// The above will print:
// Hey thei ey sashei play nowhei
?>
Live example
I'll explain each function so that you can modify the above to exactly how you want it, since I don't precisely understand all your specifications:
explode() will take a string and split it apart into an array. The first argument is what you use to split it. The second argument is the string, so explode(" ", $string) will split $string by the use of spaces. The spaces will not be included in the array.
foreach() will cycle through each element of an array. foreach($string as $key => $word) will go through each element of $string and for each element it will assign the index number to $key and the value of the element (the word in this case) to $word.
strlen() returns how long a string is.
substr() returns a portion of a string. The first argument is the string, the second argument is where the substring starts, and a third optional argument is the length of the substring. With a negative start, the start will be calculated from the end of the string to the end of the string. In other words, substr($word, -2) returns the substring that begins two from the end of the string and goes to the end of the string.... the last two letters. If you want the first two letters, you would use substr($word, 0, 2), since you're starting at the very beginning and want a length of 2 letters.
substr_replace() will replace a substring within a string. The first argument is the entire string. The second argument is your replacement substring. The third argument is where the replacement starts, and the fourth optional argument is the length of the substring, so substr_replace($word, $switchTo, -2) will take $word and starting at the penultimate letter, replace what's there with $switchTo. In this case, we'll switch out the last two letter. If you want to replace the first two letters, you would use substr_replace($word, $switchTo, 0, 2)
implode() is the opposite of explode. It takes an array and forms it into a string using the separator specified.
$string = 'About 200 million CAPTCHAs are solved by humans around the world every day.';
$result = array();
$words = explode(" ",$string);
foreach($words as $word){
if(strlen($word) > 3){
$result[] = substr($word,0,3); //first 3 characters, use "-3" for second paramter if you want last three
}
}
function get_symbols($str, $reverse = false)
{
$symbols = array();
foreach (explode(' ', $str) as $word)
{
if ($reverse)
$word = strrev($word);
if (strlen($word) > 3)
$word = substr($word, 0, 3);
array_push($symbols, $word);
}
return $symbols;
}
EDIT:
function change_reverse_symbol_in_word($str, $symbol, $replace_to)
{
$result = "";
foreach (explode(' ', $str) as $word)
{
$rword = $word;
if (strlen($rword) > 3)
{
$rword = substr($word, 0, -3);
}
if (!strcmp($symbol, $rword))
{
$word = substr($word, 0, strlen($word) - strlen($rword)) . $replace_to;
}
$result .= $word . " ";
}
return $result;
}
And if you want to use this like a your question you must call this like that:
$string_malformed = change_reverse_symbol_in_word($str, "ey", "ei");
Related
An example:
THIS IS A Sentence that should be TAKEN Care of
The output should be:
This is a Sentence that should be taken Care of
Rules
Convert UPPERCASE words to lowercase
Keep the lowercase words with an uppercase first character intact
Set the first character in the sentence to uppercase.
Code
$string = ucfirst(strtolower($string));
Fails
It fails because the ucfirst words are not being kept.
This is a sentence that should be taken care of
You can test each word for those rules:
$str = 'THIS IS A Sentence that should be TAKEN Care of';
$words = explode(' ', $str);
foreach($words as $k => $word){
if(strtoupper($word) === $word || // first rule
ucfirst($word) !== $word){ // second rule
$words[$k] = strtolower($word);
}
}
$sentence = ucfirst(implode(' ', $words)); // third rule
Output:
This is a Sentence that should be taken Care of
A little bit of explanation:
Since you have overlapping rules, you need to individually compare them, so...
Break down the sentence into separate words and check each of them based on the rules;
If the word is UPPERCASE, turn it into lowercase; (THIS, IS, A, TAKEN)
If the word is ucfirst, leave it alone; (Sentence, Care)
If the word is NOT ucfirst, turn it into lowercase, (that, should, be, of)
You can break the sentence down into individual words, then apply a formatting function to each of them:
$sentence = 'THIS IS A Sentence that should be TAKEN Care of';
$words = array_map(function ($word) {
// If the word only has its first letter capitalised, leave it alone
if ($word === ucfirst(strtolower($word)) && $word != strtoupper($word)) {
return $word;
}
// Otherwise set to all lower case
return strtolower($word);
}, explode(' ', $sentence));
// Re-combine the sentence, and capitalise the first character
echo ucfirst(implode(' ', $words));
See https://eval.in/936462
$str = "THIS IS A Sentence that should be TAKEN Care of";
$str_array = explode(" ", $str);
foreach ($str_array as $testcase =>$str1) {
//Check the first word
if ($testcase ==0 && ctype_upper($str1)) {
echo ucfirst(strtolower($str1))." ";
}
//Convert every other upercase to lowercase
elseif( ctype_upper($str1)) {
echo strtolower($str1)." ";
}
//Do nothing with lowercase
else {
echo $str1." ";
}
}
Output:
This is a Sentence that should be taken Care of
I find preg_replace_callback() to be a direct tool for this task. Create a pattern that will capture the two required strings:
The leading word
Any non-leading, ALL-CAPS word
Code: (Demo)
echo preg_replace_callback(
'~(^\pL+\b)|(\b\p{Lu}+\b)~u',
function($m) {
return $m[1]
? mb_convert_case($m[1], MB_CASE_TITLE, 'UTF-8')
: mb_strtolower($m[2], 'UTF-8');
},
'THIS IS A Sentence that should be TAKEN Care of'
);
// This is a Sentence that should be taken Care of
I did not test this with multibyte input strings, but I have tried to build it with multibyte characters in mind.
The custom function works like this:
There will always be either two or three elements in $m. If the first capture group matches the first word of the string, then there will be no $m[2]. When a non-first word is matched, then $m[2] will be populated and $m[1] will be an empty string. There is a modern flag that can be used to force that empty string to be null, but it is not advantageous in this case.
\pL+ means one or more of any letter (single or multi-byte)
\p{Lu}+ means one or more uppercase letters
\b is a word boundary. It is a zero-width character -- it doesn't match a character, it checks that the two consecutive characters change from a word to a non-word or vice versa.
My answer makes just 3 matches/replacement on the sample input string.
$string='THIS IS A Sentence that should be TAKEN Care of';
$arr=explode(" ", $string);
foreach($arr as $v)
{
$v = ucfirst(strtolower($v));
$stry = $stry . ' ' . $v;
}
echo $stry;
I am trying to something like this.
Hiding users except for first 3 characters.
EX)
apple -> app**
google -> goo***
abc12345 ->abc*****
I am currently using php like this:
$string = "abcd1234";
$regex = '/(?<=^(.{3}))(.*)$/';
$replacement = '*';
$changed = preg_replace($regex,$replacement,$string);
echo $changed;
and the result be like:
abc*
But I want to make a replacement to every single character except for first 3 - like:
abc*****
How should I do?
Don't use regex, use substr_replace:
$var = "abcdef";
$charToKeep = 3;
echo strlen($var) > $charToKeep ? substr_replace($var, str_repeat ( '*' , strlen($var) - $charToKeep), $charToKeep) : $var;
Keep in mind that regex are good for matching patterns in string, but there is a lot of functions already designed for string manipulation.
Will output:
abc***
Try this function. You can specify how much chars should be visible and which character should be used as mask:
$string = "abcd1234";
echo hideCharacters($string, 3, "*");
function hideCharacters($string, $visibleCharactersCount, $mask)
{
if(strlen($string) < $visibleCharactersCount)
return $string;
$part = substr($string, 0, $visibleCharactersCount);
return str_pad($part, strlen($string), $mask, STR_PAD_RIGHT);
}
Output:
abc*****
Your regex matches all symbols after the first 3, thus, you replace them with a one hard-coded *.
You can use
'~(^.{3}|(?!^)\G)\K.~'
And replace with *. See the regex demo
This regex matches the first 3 characters (with ^.{3}) or the end of the previous successful match or start of the string (with (?!^)\G), and then omits the characters matched from the match value (with \K) and matches any character but a newline with ..
See IDEONE demo
$re = '~(^.{3}|(?!^)\G)\K.~';
$strs = array("aa","apple", "google", "abc12345", "asdddd");
foreach ($strs as $s) {
$result = preg_replace($re, "*", $s);
echo $result . PHP_EOL;
}
Another possible solution is to concatenate the first three characters with a string of * repeated the correct number of times:
$text = substr($string, 0, 3).str_repeat('*', max(0, strlen($string) - 3));
The usage of max() is needed to avoid str_repeat() issue a warning when it receives a negative argument. This situation happens when the length of $string is less than 3.
Since i cant use preg_match (UTF8 support is somehow broken, it works locally but breaks at production) i want to find another way to match word against blacklist. Problem is, i want to search a string for exact match only, not first occurrence of the string.
This is how i do it with preg_match
preg_match('/\b(badword)\b/', strtolower($string));
Example string:
$string = "This is a string containing badwords and one badword";
I want to only match the "badword" (at the end) and not "badwords".
strpos('badword', $string) matches the first one
Any ideas?
Assuming you could do some pre-processing, you could use replace all your punctuation marks with white spaces and put everything in lowercase and then either:
Use strpos with something like so strpos(' badword ', $string) in a while loop to keep on iterating through your entire document;
Split the string at white spaces and compare each word with a list of bad words you have.
So if you where trying the first option, it would something like so (untested pseudo code)
$documet = body of text to process . ' '
$document.replace('!##$%^&*(),./...', ' ')
$document.toLowerCase()
$arr_badWords = [...]
foreach($word in badwords)
{
$badwordIndex = strpos(' ' . $word . ' ', $document)
while(!badWordIndex)
{
//
$badwordIndex = strpos($word, $document)
}
}
EDIT: As per #jonhopkins suggestion, adding a white space at the end should cater for the scenario where there wanted word is at the end of the document and is not proceeded by a punctuation mark.
If you want to mimic the \b modifier of regex you can try something like this:
$offset = 0;
$word = 'badword';
$matched = array();
while(($pos = strpos($string, $word, $offset)) !== false) {
$leftBoundary = false;
// If is the first char, it has a boundary on the right
if ($pos === 0) {
$leftBoundary = true;
// Else, if it is on the middle of the string, we must check the previous char
} elseif ($pos > 0 && in_array($string[$pos-1], array(' ', '-',...)) {
$leftBoundary = true;
}
$rightBoundary = false;
// If is the last char, it has a boundary on the right
if ($pos === (strlen($string) - 1)) {
$rightBoundary = true;
// Else, if it is on the middle of the string, we must check the next char
} elseif ($pos < (strlen($string) - 1) && in_array($string[$pos+1], array(' ', '-',...)) {
$rightBoundary = true;
}
// If it has both boundaries, we add the index to the matched ones...
if ($leftBoundary && $rightBoundary) {
$matched[] = $pos;
}
$offset = $pos + strlen($word);
}
You can use strrpos() instead of strpos:
strrpos — Find the position of the last occurrence of a substring in a string
$string = "This is a string containing badwords and one badword";
var_dump(strrpos($string, 'badword'));
Output:
45
A simple way to use word boundaries with unicode properties:
preg_match('/(?:^|[^pL\pN_])(badword)(?:[^pL\pN_]|$)/u', $string);
In fact it's much more complicated, have a look at here.
I need to take every double letter occurrence away from a word. (I.E. "attached" have to become: "aached".)
I wrote this function:
function strip_doubles($string, $positions) {
for ($i = 0; $i < strlen($string); $i++) {
$stripped_word[] = $string[$i];
}
foreach($positions['word'] as $position) {
unset($stripped_word[$position], $stripped_word[$position + 1]);
}
$returned_string= "";
foreach($stripped_words $key => $value) {
$returned_string.= $stripped_words[$key];
}
return $returned_string;
}
where $string is the word to be stripped and $positions is an array containing the positions of any first double letter.
It perfectly works but how would a real programmer write the same function... in a more condensed way? I have a feeling it could be possible to do the same thing without three loops and so much code.
Non-regex solution, tested:
$string = 'attached';
$stripped = '';
for ($i=0,$l=strlen($string);$i<$l;$i++) {
$matched = '';
// if current char is the same as the next, skip it
while (substr($string, $i, 1)==substr($string, $i+1, 1)) {
$matched = substr($string, $i, 1);
$i++;
}
// if current char is NOT the same as the matched char, append it
if (substr($string, $i, 1) != $matched) {
$stripped .= substr($string, $i, 1);
}
}
echo $stripped;
You should use a regular expression. It matches on certain characteristics and can replace the matched occurences with some other string(s).
Something like
$result = preg_replace('#([a-zA-Z]{1})\1#i', '', $string);
Should work. It tells the regexp to match one character from a-z followed by the match itself, thus effectively two identical characters after each other. The # mark the start and end of the regexp. If you want more characters than just a-z and A-Z, you could use other identifiers like [a-ZA-Z0-9]{1} or for any character .{1} or for only Unicode characters (including combined characters), use \p{L}\p{M}*
The i flag after the last # means 'case insensitive' and will instruct the regexp to also match combinations with different cases, like 'tT'. If you want only combinations in the same case, so 'tt' and 'TT', then remove the 'i' from the flags.
The '' tells the regexp to replace the matched occurences (the two identical characters) with an empty string.
See http://php.net/manual/en/function.preg-replace.php and http://www.regular-expressions.info/
echo $string can give any text.
How do I remove word "blank", only if it is the last word of the $string?
So, if we have a sentence like "Steve Blank is here" - nothing should not removed, otherwise if the sentence is "his name is Granblank", then "Blank" word should be removed.
You can easily do it using a regex. The \b ensures it's only removed if it's a separate word.
$str = preg_replace('/\bblank$/', '', $str);
As a variation on Teez's answer:
/**
* A slightly more readable, non-regex solution.
*/
function remove_if_trailing($haystack, $needle)
{
// The length of the needle as a negative number is where it would appear in the haystack
$needle_position = strlen($needle) * -1;
// If the last N letters match $needle
if (substr($haystack, $needle_position) == $needle) {
// Then remove the last N letters from the string
$haystack = substr($haystack, 0, $needle_position);
}
return $haystack;
}
echo remove_if_trailing("Steve Blank is here", 'blank'); // OUTPUTS: Steve blank is here
echo remove_if_trailing("his name is Granblank", 'blank'); // OUTPUTS: his name is Gran
Try the below code:
$str = trim($str);
$strlength = strlen($str);
if (strcasecmp(substr($str, ($strlength-5), $strlength), 'blank') == 0)
echo $str = substr($str, 0, ($strlength-5))
Don't use preg_match unless it is not required. PHP itself recommends using string functions over regex functions when the match is straightforward. From the preg_match manual page.
ThiefMaster is quite correct. A technique that doesn't involve the end of line $ regex character would be to use rtrim.
$trimmed = rtrim($str, "blank");
var_dump($trimmed);
^ That's if you want to remove the last characters of the string. If you want to remove the last word:
$trimmed = rtrim($str, "\sblank");
var_dump($trimmed);