The goal is to take a string such as $a = "NewYork" and produce a new string in which every lowercase letter that stands directly before an uppercase letter has been removed.
In this example, the output should be "NeYork".
I tried to do this using the positions of lowercase and uppercase letters in the ASCII table, but it doesn't work. I'm not sure whether it is possible to do this in a similar way, through positions in the ASCII table.
function delete_char($a)
{
global $b;
$a = 'NewYork';
for($i =0; $i<strlen($a); $i++)
{
if( ord($a[$i])< ord($a[$i+1])){//this solves only part of a problem
chop($a,'$a[$i]');
}
else{
$b.=$a[$i];
}
}
return $b;
}
This is something a regular expression handles with ease
<?php
$a ="NewYorkNewYork";
$reg="/[a-z]([A-Z])/";
echo preg_replace($reg, "$1", $a); // NeYorNeYork
The regular expression searches for a lowercase letter followed by an uppercase letter, and captures the uppercase one. preg_replace() then replaces that combination with just the captured letter ($1).
See https://3v4l.org/o43bO
You don't need to capture the uppercase letter and use a backreference in the replacement string.
More simply, match the lowercase letter then use a lookahead for an uppercase letter -- this way you only replace the lowercase character with an empty string. (Demo)
echo preg_replace('~[a-z](?=[A-Z])~', '', 'NewYork');
// NeYork
As for a review of your code, there are multiple issues.
global $b doesn't make sense to me. You need the variable to be instantiated as an empty string within the scope of the custom function only. It more simply should be $b = '';.
The variable and function naming is unhelpful. A function's name should specifically describe the function's action. A variable should intuitively describe the data that it contains. Generally speaking, don't sacrifice clarity for brevity.
As a matter of best practice, you should not repeatedly call a function when you know that the value has not changed. Calling strlen() on each iteration of the loop is not beneficial. Declare $length = strlen($input) and use $length over and over.
$a[$i+1] is going to generate an undefined offset warning on the last iteration of the loop because there cannot possibly be a character at that offset when you already know the length of the string has been fully processed. In other words, the last character of a string will have an offset of "length - 1". There is more than one way to address this, but I'll use the null coalescing operator to set a fallback character that will not qualify the previous letter for removal.
Most importantly, you cannot just check that the current ord value is less than the next ord value. See here that lowercase letters have an ordinal range of 97 through 122 and uppercase letters have an ordinal range of 65 through 90. You will need to check that both letters meet the qualifying criteria for the current letter to be included in the result string.
Rewrite: (Demo)
function removeLowerCharBeforeUpperChar(string $input): string
{
$output = '';
$length = strlen($input);
for ($offset = 0; $offset < $length; ++$offset) {
$currentOrd = ord($input[$offset]);
$nextOrd = ord($input[$offset + 1] ?? '_');
if ($currentOrd < 97
|| $currentOrd > 122
|| $nextOrd < 65
|| $nextOrd > 90
){
$output .= $input[$offset];
}
}
return $output;
}
echo removeLowerCharBeforeUpperChar('MickMacKusa');
// MicMaKusa
Or with ctype_ functions: (Demo)
function removeLowerCharBeforeUpperChar(string $input): string
{
$output = '';
$length = strlen($input);
for ($offset = 0; $offset < $length; ++$offset) {
$nextLetter = $input[$offset + 1] ?? '';
if (ctype_lower($input[$offset]) && ctype_upper($nextLetter)) {
$output .= $nextLetter; // omit current letter, save next
++$offset; // double iterate
} else {
$output .= $input[$offset]; // save current letter
}
}
return $output;
}
To clarify, I would not use the above custom function in a professional script and both snippets are not built to process strings containing multibyte characters.
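Since the loops above are byte-oriented, a multibyte-aware variant may be useful. The following is only a sketch, assuming the input is valid UTF-8; the function name removeLowerBeforeUpperUnicode is hypothetical. It swaps the ASCII classes in the lookahead pattern for the Unicode properties \p{Ll} (lowercase letter) and \p{Lu} (uppercase letter) and adds the u modifier.

```php
<?php
// Sketch: Unicode-aware version of the lookahead replacement.
// Assumes $input is valid UTF-8; the function name is illustrative only.
function removeLowerBeforeUpperUnicode(string $input): string
{
    // \p{Ll} = any lowercase letter, \p{Lu} = any uppercase letter
    $result = preg_replace('~\p{Ll}(?=\p{Lu})~u', '', $input);

    // preg_replace() returns null on error (e.g. malformed UTF-8);
    // fall back to the unchanged input in that case.
    return $result ?? $input;
}

echo removeLowerBeforeUpperUnicode('NewYork'), "\n";    // NeYork
echo removeLowerBeforeUpperUnicode('señorÑandú'), "\n"; // señoÑandú
```

Because the lookahead consumes nothing, the uppercase letter never has to be captured or re-inserted.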
Simply put, I create a new variable $s to hold the string to be returned and loop over the $a string. I use ctype_upper() to check the next character: if it is not uppercase, I append the current character to $s. At the end I return $s concatenated with the last character of the string.
function delete_char(string $a): string
{
if(!strlen($a))
{
return '';
}
$s='';
for($i = 0; $i < strlen($a)-1; $i++)
{
if(!ctype_upper($a[$i+1])){
$s.=$a[$i];
}
}
return $s.$a[-1];
}
echo delete_char("NewYork");//NeYork
Something like this maybe?
<?php
$word = 'NewYork';
preg_match('/.[A-Z].*/', $word, $match);
if($match){
$rlen = strlen($match[0]); //length from character before capital letter
$start = strlen($word)-$rlen; //first lower case before the capital
$edited_word = substr_replace($word, '', $start, 1); //removes character
echo $edited_word; //prints NeYork
}
?>
Related
I have a string formed up by numbers and sometimes by letters.
Example AF-1234 or 345ww.
I have to get the numeric part and increment it by one.
How can I do that? Maybe with a regex?
You can use preg_replace_callback as:
function inc($matches) {
return ++$matches[1];
}
$input = preg_replace_callback("|(\d+)|", "inc", $input);
Basically you match the numeric part of the string using the regex \d+ and replace it with the value returned by the callback function which returns the incremented value.
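Alternatively, on PHP versions before 7.0 this can be done using preg_replace() with the e modifier (deprecated in PHP 5.5 and removed in PHP 7.0, so prefer preg_replace_callback() in current code):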
$input = preg_replace("|(\d+)|e", "$1+1", $input);
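As a sketch of the callback approach with an anonymous function, the snippet below increments only the first run of digits and keeps any leading zeros. The function name incrementFirstNumber and the zero-padding behaviour are assumptions for illustration; the question does not say how padding should be handled.

```php
<?php
// Sketch: increment the first numeric run in a string, preserving zero padding.
// The function name and the padding rule are illustrative assumptions.
function incrementFirstNumber(string $input): string
{
    return preg_replace_callback(
        '/\d+/',
        function (array $m): string {
            $next = (string) ((int) $m[0] + 1);
            // Pad back to the original width so "0009" becomes "0010"
            return str_pad($next, strlen($m[0]), '0', STR_PAD_LEFT);
        },
        $input,
        1 // limit: replace only the first match
    );
}

echo incrementFirstNumber('AF-1234'), "\n"; // AF-1235
echo incrementFirstNumber('345ww'), "\n";   // 346ww
echo incrementFirstNumber('HF0009'), "\n";  // HF0010
```

The cast to int means very long digit runs could exceed PHP_INT_MAX; this sketch does not handle that case.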
If the string ends with numeric characters it is this simple...
$str = 'AF-1234';
echo ++$str; //AF-1235 (pre-increment, so the new value is printed)
That works the same way with '345ww', though the result may not be what you expect.
$str = '345ww';
echo ++$str; //345wx
@tampe125
This example is probably the best method for your needs if you are incrementing strings that end with numbers.
$str = 'XXX-342';
echo ++$str; //XXX-343
Here is an example that worked for me by doing a pre-increment on the value:
$admNo = 'HF0001';
$newAdmNo = ++$admNo;
After the above code runs, $newAdmNo holds HF0002.
If you are dealing with strings that have multiple numeric parts, then it's not so easy to solve with a regex, since a carry might overflow from one numeric part into the next.
For example, the number INV00-10-99 should increment to INV00-11-00.
I ended up with the following:
for ($i = strlen($string) - 1; $i >= 0; $i--) {
if (is_numeric($string[$i])) {
$most_significant_number = $i;
if ($string[$i] < 9) {
$string[$i] = $string[$i] + 1;
break;
}
// The number was a 9, set it to zero and continue.
$string[$i] = 0;
}
}
// If the most significant number was set to a zero it has overflowed so we
// need to prefix it with a '1'.
if ($string[$most_significant_number] === '0') {
$string = substr_replace($string, '1', $most_significant_number, 0);
}
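The same carry logic can be wrapped in a function, which also makes the "no digits at all" case explicit. This is a sketch under the assumption that the string is single-byte ASCII; the name incrementDigits is hypothetical.

```php
<?php
// Sketch: treat all digits in the string as one number and add 1,
// carrying across non-digit separators. Assumes single-byte (ASCII) input.
function incrementDigits(string $s): string
{
    $firstDigit = null; // offset of the left-most digit visited so far

    for ($i = strlen($s) - 1; $i >= 0; $i--) {
        if (!ctype_digit($s[$i])) {
            continue; // skip separators and letters
        }
        $firstDigit = $i;
        if ($s[$i] !== '9') {
            $s[$i] = (string) ((int) $s[$i] + 1);
            return $s; // no carry left, done
        }
        $s[$i] = '0'; // 9 rolls over to 0, carry continues to the left
    }

    // Either there were no digits, or every digit was a 9 and overflowed.
    return $firstDigit === null ? $s : substr_replace($s, '1', $firstDigit, 0);
}

echo incrementDigits('INV00-10-99'), "\n"; // INV00-11-00
echo incrementDigits('A99'), "\n";         // A100
```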
Here's some Python code that does what you ask. Not too great on my PHP, but I'll see if I can convert it for you.
>>> import re
>>> match = re.match(r'(\D*)(\d+)(\D*)', 'AF-1234')
>>> match.group(1) + str(int(match.group(2))+1) + match.group(3)
'AF-1235'
This is similar to the answer above, but contains the code inline and does a full check for the last character.
function replace_title($title) {
$pattern = '/(\d+)(?!.*\d)+/';
return preg_replace_callback($pattern, function($m) { return ++$m[0]; }, $title);
}
echo replace_title('test 123'); // test 124
echo replace_title('test 12 3'); // test 12 4
echo replace_title('test 123 - 2'); // test 123 - 3
echo replace_title('test 123 - 3 - 5'); // test 123 - 3 - 6
echo replace_title('123test'); // 124test
I am trying to generate random voucher code applying the following rules:
Alphanumeric combination 5 characters in capital case (A-Z, 0-9, and take away 1, 0, I, O).
This is my try
<?php
function generateRandomString($length = 5) {
return substr(str_shuffle("23456789ABCDEFGHIJKMNPQRSTUVWXYZ"), 0, $length);
}
echo generateRandomString();
?>
but I am not sure if there is a better way of doing this.
If you need to call this function many times, your current implementation will be slow, because shuffling the whole character set makes many more calls to the random function than necessary (whenever $length < 32). Also, if the set of allowed characters is smaller than the requested length, your implementation returns a result that is too short. Finally, your implementation never repeats a character in the result, although the specification does not forbid repeated characters.
A more accurate solution picks each character independently with array_rand(), so characters may repeat and the result is not limited by the size of the set:
function generateRandomString($length = 5) {
    $allowed = str_split('23456789ABCDEFGHIJKMNPQRSTUVWXYZ'); // it is enough to do it once
    $res = '';
    for ($i = 0; $i < $length; $i++)
        $res .= $allowed[array_rand($allowed)]; // one random key per character
    return $res;
}
function generateRandom($length = 5) {
$possibleChars = '123456789ABCDEFGHJKMNPQRSTUVWXYZ';
$rndString = '';
for ($i = 0; $i < $length; $i++) {
$rndString .= $possibleChars[rand(0, strlen($possibleChars) - 1)];
}
return $rndString;
}
echo generateRandom();
Here you can define the characters which you want to have in your random string.
The problem with your function is that each character can be used only once per call, so it's not really random. The length of the random string is also limited to the number of characters in your set.
For example: AAAAA is not possible with your function, with mine it is.
If you need a string longer than your charset, that method will fail. Please try the code below:
<?php
function generateRandomString($length = 5) {
$chars = "23456789ABCDEFGHIJKMNPQRSTUVWXYZ"; //Your char-set
$charArray = str_split($chars); //Your array representation of chars
$charCount = strlen($chars); //Your char-set length
$result = "";
//Loop through the required `$length`
for($i=1;$i<=$length;$i++)
{
$randChar = rand(0,$charCount-1); //Pick a random char in range of our chars
$result .= $charArray[$randChar]; //Concatenate picked char to result
}
return $result;
}
echo generateRandomString(75);
?>
Here is a working example: https://ideone.com/D1EQ9T
Hope this helps.
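The character sets used in the snippets above do not quite match the stated rules (A-Z and 0-9 without 1, 0, I and O). The sketch below assumes exactly those rules, so its alphabet keeps L and drops I. It also uses random_int() instead of rand(), on the assumption that voucher codes should be hard to predict. The function name generateVoucherCode is hypothetical.

```php
<?php
// Sketch: voucher code generator following the stated rules
// (A-Z and 0-9, excluding 1, 0, I and O). Name and defaults are illustrative.
function generateVoucherCode(int $length = 5): string
{
    $alphabet = '23456789ABCDEFGHJKLMNPQRSTUVWXYZ'; // 32 allowed characters
    $maxIndex = strlen($alphabet) - 1;
    $code = '';

    for ($i = 0; $i < $length; $i++) {
        // random_int() draws from a cryptographically secure source
        $code .= $alphabet[random_int(0, $maxIndex)];
    }

    return $code;
}

echo generateVoucherCode(), "\n";
```

Each position is drawn independently, so repeated characters such as AAAAA are possible and any length can be requested.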
Given a string such as:
$a = '00023407283';
$b = 'f045602345';
Is there a built in function that can count the number of occurrences of a specific character starting at the beginning and continuing until it finds a different character that is not specified?
Given the above, and specifying zero (0) as the character, the expected result would be:
$a = '00023407283'; // 3 (the other zeros don't count)
$b = 'f0045602345'; // 0 (It does not start with zero)
This should do the trick:
function count_leading($haystack,$value) {
$i = 0;
$mislead = false;
while($i < strlen($haystack) && !$mislead) {
if($haystack[$i] == $value) {
$i += 1;
} else {
$mislead = true;
}
}
return $i;
}
//examples
echo count_leading('aaldfkjlk','a'); //returns 2
echo count_leading('dskjheelk','c'); //returns 0
I don't think there are any built-in functions that could do that (it's too specific), but you could write a function to do it:
function repeatChar($string, $char) {
    $pos = 0;
    // isset() stops the loop at the end of the string
    while (isset($string[$pos]) && $string[$pos] == $char) $pos++;
    return $pos;
}
Yes, you want strspn, which counts the number of characters from the second argument at the beginning of the first argument:
echo strspn($a, '0'); // === 3
echo strspn($b, '0'); // === 0
See it live at 3v4l.org. Besides being a built-in (read "fast"), this also accepts any number of single characters to look at the beginning. However, note that the function is byte-oriented, so it will not work as expected for multi-byte characters.
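For the multi-byte case mentioned above, one possible workaround is an anchored regex with the u modifier. This is only a sketch, assuming valid UTF-8 input and a non-empty $char; the function name countLeadingChar is hypothetical.

```php
<?php
// Sketch: count how many times $char repeats at the start of $haystack,
// where $char may be a multi-byte UTF-8 character.
// Assumes valid UTF-8 and a non-empty $char; the name is illustrative.
function countLeadingChar(string $haystack, string $char): int
{
    $pattern = '/^(?:' . preg_quote($char, '/') . ')*/u';
    preg_match($pattern, $haystack, $m);

    // The match consists only of whole repetitions of $char,
    // so dividing the byte lengths gives the repetition count.
    return intdiv(strlen($m[0] ?? ''), strlen($char));
}

echo countLeadingChar('00023407283', '0'), "\n"; // 3
echo countLeadingChar('f0045602345', '0'), "\n"; // 0
echo countLeadingChar('żżża', 'ż'), "\n";        // 3
```

For plain single-byte characters strspn() remains the simpler and faster choice.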
I need to trim words from the beginning and end of a string. The problem is that sometimes the words are abbreviated, i.e. only the first three letters (followed by a dot).
I tried hard to find a suitable regular expression. Basically I need to catch three or more initial characters, up to the length of the replacement, but I cannot find a regular expression that will match a variable length and keep the order of the characters.
For example, if I need to trim 'insurance' from the sentence 'insur. companies are rich', then the pattern \^[insurance]{3,9}\ comes to mind, but this pattern will also catch words like 'sensace', because the order of characters (and their occurrence) inside [] is not important to a regexp.
Also, at the end of the string, I need to remove serial numbers that are abbreviated from the beginning; say 'XK-25F14' is sometimes presented as '25F14'. So I decided to go purely with character-by-character comparison.
Therefore I ended up with the following PHP function:
function trimWords($s, $dirt, $case_insensitive = false, $reverse = true)
{
$pos = 0;
$func = $case_insensitive ? 'strncasecmp' : 'strncmp';
// Get number of initial characters, that match in both strings
while ($func($s, $dirt, $pos + 1) === 0)
$pos++;
// If more than 2 initial characters match, then remove the match
if ($pos > 2)
$s = substr($s, $pos);
// Reverse $s and $dirt so it will trim from the end of string
$s = strrev($s);
if ($reverse)
return trimWords($s, strrev($dirt), $case_insensitive, false);
// After second run return back-reversed string
return trim($s, ' .-');
}
I'm happy with this function, but it has one drawback: it trims only one occurrence of the word. How can I make it trim more occurrences, i.e. remove both 'insurance ' from 'Insurance insur. companies'?
I'm also curious: is there really no regular expression that will match a variable length and respect the order of characters in the pattern?
Final solution
Thanks to mrhobo I have ended up with a function based on a regular expression. This function can be easily improved and should also be the most efficient for this task.
I modified my previous function and it is two times quicker than the regexp, but it can remove only one word per run. To remove the word from both the beginning and the end it has to run twice, at which point its performance is the same as the regexp; to remove more than one occurrence of the word it has to run multiple times, which gets slower and slower.
The final function goes like this.
function trimWords($string, $word, $case_insensitive = false, $min_abbrv = 3)
{
$exc = substr($word, $min_abbrv);
$pat = null;
$i = strlen($exc);
while ($i--)
$pat = '(?>'.preg_quote($exc[$i], '#').$pat.')?';
$pat = substr($word, 0, $min_abbrv).$pat;
$pat = '#(?<begin>^)?(?:\W*\b'.$pat.'\b\W*)+(?(begin)|$)#';
if ($case_insensitive)
$pat .= 'i';
return preg_replace($pat, '', $string);
}
NOTE: with this function, it does not matter whether the abbreviation ends with a dot or not; it wipes out any shorter form of the word and also removes all non-word characters around the word.
EDIT: I just tried building a replace pattern like insu(r|ra|ran|ranc|rance), and the function with atomic groups is faster by ~30%; with longer words it could possibly be even more efficient.
Matching a word and all possible abbreviations from the nth letter isn't quite an easy task in regex.
Here is how I would do it for the word insurance from the 4th letter:
insu(?>r(?>a(?>n(?>c(?>(?<last>e))?)?)?)?)?(?(last)|\.)
http://regex101.com/r/aL2gV4
It works by using atomic groups to force the regex engine as far forward as possible through the trailing 'rance' letters, using the nested pattern (?>a(?>b)?)?. If the last letter is matched, we're not dealing with an abbreviation, so no dot is required; otherwise the dot is required. This is coded by (?(last)|\.).
To trim, I would create a function to build the above regex for an abbreviation. Then you can write a while loop that replaces each match of the abbreviation regex with an empty string until there are no more matches.
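A builder for that nested pattern could look like the sketch below. The function name buildAbbreviationPattern is hypothetical, and it assumes the word is longer than $minLength so that there is at least one optional letter. The surrounding \b and \s* in the usage line are also assumptions, added only to tidy the result.

```php
<?php
// Sketch: build the nested atomic-group pattern for a word whose first
// $minLength letters are mandatory. Assumes strlen($word) > $minLength.
function buildAbbreviationPattern(string $word, int $minLength): string
{
    $head = preg_quote(substr($word, 0, $minLength), '#');
    $tail = str_split(substr($word, $minLength));

    // Innermost group: the final letter, captured under the name "last"
    $nested = '(?>(?<last>' . preg_quote(array_pop($tail), '#') . '))?';

    // Wrap the remaining letters around it, from the inside out
    foreach (array_reverse($tail) as $letter) {
        $nested = '(?>' . preg_quote($letter, '#') . $nested . ')?';
    }

    // A dot is required only when the full word was not matched
    return $head . $nested . '(?(last)|\.)';
}

$pattern = buildAbbreviationPattern('insurance', 4);
echo $pattern, "\n";
// insu(?>r(?>a(?>n(?>c(?>(?<last>e))?)?)?)?)?(?(last)|\.)

echo preg_replace('#\b' . $pattern . '\s*#i', '', 'Insurance insur. companies'), "\n";
// companies
```

Because preg_replace() already replaces every match in one call, an explicit loop is only needed if removing one match can create a new one.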
Non regex version
Here is my non regex version that removes multiple words and abbreviated words from a string:
function trimWords($str, $word, $min_abbrv, $case_insensitive = false) {
$len = 0;
$dot = 0;
$word_len = strlen($word);
$strlen = strlen($str);
$cmp = $case_insensitive ? 'strncasecmp' : 'strncmp'; // function names must be quoted strings
for ($i = 0; $i < $strlen; $i++) {
if ($cmp($str[$i], $word[$len] ?? '', 1) == 0) { // compare exactly one character
$len++;
} else if ($len > 0) {
if ($len == $word_len || ($len >= $min_abbrv && ($dot = $str[$i] == '.'))) {
$i -= $len;
$len += $dot;
$str = substr($str, 0, $i) . substr($str, $i+$len);
$strlen = strlen($str);
$dot = 0;
}
$len = 0;
}
}
return $str;
}
Example:
$string = 'ins. <- "ins." / insu. insuranc. insurance / insurance. <- "."';
echo trimWords($string, 'insurance', 4);
Output is:
ins. <- "ins." / / . <- "."
I wrote a function that constructs the regular expression pattern suggested by mrhobo, plus a simple test, and benchmarked it against my function that uses pure PHP string comparison.
Here is the code:
$string = 'Insur. companies are nasty rich';
$dirt = 'insurance';
$cycles = 500000;
$start = microtime(true);
$i = $cycles;
while ($i) {
$i--;
regexpStyle($string, $dirt, true);
}
$stop = microtime(true);
$i = $cycles;
while ($i) {
$i--;
trimWords($string, $dirt, true);
}
$end = microtime(true);
$res1 = $stop - $start;
$res2 = $end - $stop;
$winner = $res1 < $res2 ? '<<<' : '>>>';
echo 'regexp: '.$res1.' '.$winner.' string operations: '.$res2;
function trimWords($s, $dirt, $case_insensitive = false, $reverse = true)
{
$pos = 0;
$func = $case_insensitive ? 'strncasecmp' : 'strncmp';
// Get number of initial characters, that match in both strings
while ($func($s, $dirt, $pos + 1) === 0)
$pos++;
// If more than 2 initial characters match, then remove the match
if ($pos > 2)
$s = substr($s, $pos);
// After second run return back-reversed string
return trim($s, ' .-');
}
function regexpStyle($s, $dirt, $case_insensitive, $min_abbrev = 3)
{
$ss = substr($dirt, $min_abbrev);
$arr = str_split($ss);
$patt = '(?>(?<last>'.array_pop($arr).'))?';
$i = count($arr);
while ($i)
$patt = '(?>'.$arr[--$i].$patt.')?';
$patt = '#^'.substr($dirt, 0, $min_abbrev).$patt.'(?(last)|\.)#';
$patt .= $case_insensitive ? 'i' : null;
return trim(preg_replace($patt, '', $s));
}
and the winner is... moment of silence... it is...
a draw
regexp: 8.5169589519501 >>> string operations: 8.0951890945435
but I have strong feeling that regexp approach could be better utilized.
How can I swap around / toggle the case of the characters in a string, for example:
$str = "Hello, My Name is Tom";
After I run the code I get a result like this:
$newstr = "hELLO, mY nAME IS tOM";
Is this even possible?
If your string is ASCII only, you can use XOR:
$str = "Hello, My Name is Tom";
print strtolower($str) ^ strtoupper($str) ^ $str;
Outputs:
hELLO, mY nAME IS tOM
OK I know you've already got an answer, but the somewhat obscure strtr() function is crying out to be used for this ;)
$str = "Hello, My Name is Tom";
echo strtr($str,
'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz',
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ');
A quick way to do this for ASCII strings is a bitmask: no string functions or regex are needed, only the bitwise operators (AND, OR, XOR, NOT) that PHP inherits from C:
function swapCase($string) {
    for ($i = 0; $i < strlen($string); $i++) {
        $char = ord($string[$i]); // square brackets: the {} offset syntax was removed in PHP 8.0
        if (($char > 64 && $char < 91) || ($char > 96 && $char < 123)) {
            $string[$i] = chr($char ^ 32);
        }
    }
    return $string;
}
This is what changes it:
$string[$i] = chr($char ^ 32);
We take the Nth character in $string and perform an XOR (^), telling the interpreter to take the integer value of $char and flip the 6th bit (value 32) from a 1 to a 0 or from a 0 to a 1.
Every ASCII letter is exactly 32 away from its opposite-case counterpart (an ingenious part of ASCII's design). Since 32 is a power of 2 (2^5), switching case is a single bit flip. To get the ASCII value of a letter, use the built-in PHP function ord():
ord('A') // 65
ord('a') // 97
// 97 - 65 = 32
So you loop through the string using strlen() as the middle part of the for loop, and it will loop exactly as many times as your string has characters. If the character at position $i is a letter (A-Z (65-90) or a-z (97-122)), it is swapped for its uppercase or lowercase counterpart using the bitmask.
Here's how the bitmask works:
0100 0001 // 65 (uppercase A)
0010 0000 // 32 (bitmask of 32)
--------- // XOR means: we put a 1 if the bits are different, a 0 if they are the same.
0110 0001 // 97 (lowercase a)
We can reverse it:
0110 0001 // 97 (a)
0010 0000 // Bitmask of 32
---------
0100 0001 // 65 (A)
No need for str_replace or preg_replace; we just flip one bit to add or subtract 32 from the ASCII value of the character, which swaps its case. The 6th bit (6th from the right) determines whether a letter is uppercase or lowercase: 0 means uppercase, 1 means lowercase. Changing the bit from a 0 to a 1 adds 32, giving the lowercase chr() value, and changing it from a 1 to a 0 subtracts 32, turning a lowercase letter into uppercase.
swapCase('userId'); // USERiD
swapCase('USERiD'); // userId
swapCase('rot13'); // ROT13
We can also have a function that swaps the case on a particular character:
// $i = position in string
function swapCaseAtChar($string, $i) {
    $char = ord($string[$i]); // square brackets: the {} offset syntax was removed in PHP 8.0
    if (($char > 64 && $char < 91) || ($char > 96 && $char < 123)) {
        $string[$i] = chr($char ^ 32);
    }
    return $string;
}
echo swapCaseAtChar('iiiiiiii', 0); // Iiiiiiii
echo swapCaseAtChar('userid', 4); // userId
// Numbers are no issue
echo swapCaseAtChar('12345qqq', 7); // 12345qqQ
Very similar in function to the answer by Mark.
preg_replace_callback(
'/[a-z]/i',
function($matches) {
return $matches[0] ^ ' ';
},
$str
)
Explanation by @xtempore:
'a' ^ ' ' returns A. It works because A is 0x41 and a is 0x61 (and likewise for all A-Z), and because a space is 0x20. By xor-ing you are flipping that one bit. In simple terms, you are adding 32 to upper case letters making them lower case and subtracting 32 from lower case letters making them upper case.
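The bit flip described here can be checked directly. The sketch below assumes ASCII input; the helper name toggleAsciiCase is hypothetical. It also shows why the three-way XOR of the lowercased, uppercased and original strings toggles case: for a letter the first two differ by exactly one space character, and for anything else they cancel to a NUL byte.

```php
<?php
// Sketch: verifying the XOR relationships for ASCII characters.
var_dump(('a' ^ ' ') === 'A'); // bool(true): XOR with a space flips the case bit
var_dump(('a' ^ 'A') === ' '); // bool(true): a letter's two cases differ by 0x20
var_dump(('5' ^ '5') === "\0"); // bool(true): identical bytes cancel out

// Hypothetical helper wrapping the three-way XOR; assumes ASCII input.
function toggleAsciiCase(string $s): string
{
    // lower ^ upper is " " for letters and "\0" otherwise,
    // so XOR-ing it with the original flips letters and leaves the rest alone.
    return strtolower($s) ^ strtoupper($s) ^ $s;
}

echo toggleAsciiCase('Hello, My Name is Tom'), "\n"; // hELLO, mY nAME IS tOM
```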
You'll need to iterate through the string testing the case of each character, calling strtolower() or strtoupper() as appropriate, adding the modified character to a new string.
I know this question is old - but here's my 2 flavours of a multi-byte implementation.
Multi function version:
(mb_str_split function found here; PHP 7.4+ ships a native mb_str_split(), so omit this definition there):
function mb_str_split( $string ) {
# Split at all position not after the start: ^
# and not before the end: $
return preg_split('/(?<!^)(?!$)/u', $string );
}
function mb_is_upper($char) {
return mb_strtolower($char, "UTF-8") != $char;
}
function mb_flip_case($string) {
$characters = mb_str_split($string);
foreach($characters as $key => $character) {
if(mb_is_upper($character))
$character = mb_strtolower($character, 'UTF-8');
else
$character = mb_strtoupper($character, 'UTF-8');
$characters[$key] = $character;
}
return implode('',$characters);
}
Single function version:
function mb_flip_case($string) {
$characters = preg_split('/(?<!^)(?!$)/u', $string );
foreach($characters as $key => $character) {
if(mb_strtolower($character, "UTF-8") != $character)
$character = mb_strtolower($character, 'UTF-8');
else
$character = mb_strtoupper($character, 'UTF-8');
$characters[$key] = $character;
}
return implode('',$characters);
}
Following script supports UTF-8 characters like "ą" etc.
PHP 7.1+
$before = 'aaAAąAŚĆżź';
$after = preg_replace_callback('/./u', function (array $char) {
[$char] = $char;
return $char === ($charLower = mb_strtolower($char))
? mb_strtoupper($char)
: $charLower;
}, $before);
PHP 7.4+
$before = 'aaAAąAŚĆżź';
$after = implode(array_map(function (string $char) {
return $char === ($charLower = mb_strtolower($char))
? mb_strtoupper($char)
: $charLower;
}, mb_str_split($before)));
$before: aaAAąAŚĆżź
$after: AAaaĄaśćŻŹ
I suppose a solution might be to use something like this :
$str = "Hello, My Name is Tom";
$newStr = '';
$length = strlen($str);
for ($i=0 ; $i<$length ; $i++) {
if ($str[$i] >= 'A' && $str[$i] <= 'Z') {
$newStr .= strtolower($str[$i]);
} else if ($str[$i] >= 'a' && $str[$i] <= 'z') {
$newStr .= strtoupper($str[$i]);
} else {
$newStr .= $str[$i];
}
}
echo $newStr;
Which gets you :
hELLO, mY nAME IS tOM
i.e. you :
loop over each character of the original string
if it's between A and Z, you put it to lower case
if it's between a and z, you put it to upper case
else, you keep it as-is
The problem being this will probably not work nicely with special character like accents :-(
And here is a quick proposal that might (or might not) work for some other characters :
$str = "Hello, My Name is Tom";
$newStr = '';
$length = strlen($str);
for ($i=0 ; $i<$length ; $i++) {
if (strtoupper($str[$i]) == $str[$i]) {
// Putting to upper case doesn't change the character
// => it's already in upper case => must be put to lower case
$newStr .= strtolower($str[$i]);
} else {
// Putting to upper changes the character
// => it's in lower case => must be transformed to upper case
$newStr .= strtoupper($str[$i]);
}
}
echo $newStr;
An idea, now, would be to use mb_strtolower and mb_strtoupper : it might help with special characters, and multi-byte encodings...
For a multibyte/unicode-safe solution, I'd probably recommend toggling the case of each letter based on which capture group contains the letter. This way you don't have to make a multibyte-based check after matching a letter with regex.
Code: (Demo)
$string = 'aaAAąAŚĆżź';
echo preg_replace_callback(
'/(\p{Lu})|(\p{Ll})/u',
function($m) {
return $m[1]
? mb_strtolower($m[1])
: mb_strtoupper($m[2]);
},
$string
);
// AAaaĄaśćŻŹ
See this answer about how to match letters that might be multibyte.