I take a sentence as input like this:
abcd 01234 87 01235
Next, I have to check every word to see if its characters are consecutive in the alphabet. The output looks like this:
abcd 01234
Well, 01235 contains consecutive chars, but the whole word ALSO contains non-consecutive chars (35), so it's not printed on the screen.
So far I wrote this:
function string_to_ascii($string)
{
$ascii = NULL;
for ($i = 0; $i < strlen($string); $i++)
{
$ascii[] = ord($string[$i]);
}
return($ascii);
}
$input = "abcd 01234 87 01235";
//first, we split the sentence into separate words
$input = explode(" ",$input);
foreach($input as $original_word)
{
//we need it clear
unset($current_word);
//convert current word into array of ascii chars
$ascii_array = string_to_ascii($original_word);
//needed for counting how many chars are already processed
$i = 0;
//we also need to count the total number chars in array
$ascii_count = count($ascii_array);
//here we go, checking each character from array
foreach ($ascii_array as $char)
{
//if IT'S THE LAST WORD'S CHAR
if($i+1 == $ascii_count)
{
//IF THE WORD HAS JUST 1 char, output it
if($ascii_count == 1)
{
$current_word .= chr($char);
}
//IF THE WORDS HAS MORE THAN 1 CHAR
else
{
//IF PREVIOUS CHAR CODE IS (CURRENT_CHAR-1) (CONSECUTIVE, OUTPUT IT)
if(($char - 1) == $ascii_array[($i-1)])
{
$current_word .=chr($char);
}
}
}
//IF WE AREN'T YET AT THE ENDING
else
{
//IF NEXT CHAR CODE IS (CURRENT_CHAR+1) (CONSECUTIVE, OUTPUT IT)
if(($char + 1) == ($ascii_array[($i+1)]))
{
$current_word .=chr($char);
}
}
$i++;
}
//FINALLY, WE CHECK IF THE TOTAL NUMBER OF CONSECUTIVE CHARS is the same as THE NUMBER OF CHARS
if(strlen($current_word) == strlen($original_word))
{
$output[] = $current_word;
}
}
//FORMAT IT BACK AS SENTENCE
print(implode(' ',$output));
But maybe there is another way to do this, more simple?
sorry for bad spelling
This works...
$str = 'abcd 01234 87 01235';
$words = explode(' ', $str);
foreach($words as $key => $word) {
if ($word != implode(range($word[0], chr(ord($word[0]) + strlen($word) - 1)))) {
unset($words[$key]);
}
}
echo implode(' ', $words);
CodePad.
Basically, it grabs the first character of each word, and creates the range of characters which would be the value if the word consisted of sequential characters.
It then does a simple string comparison.
For a more performant version...
$str = 'abcd 01234 87 01235';
$words = explode(' ', $str);
foreach($words as $key => $word) {
foreach(str_split($word) as $index => $char) {
$thisOrd = ord($char);
if ($index > 0 AND $thisOrd !== $lastOrd + 1) {
unset($words[$key]);
break;
}
$lastOrd = $thisOrd;
}
}
echo implode(' ', $words);
CodePad.
Both these examples rely on the ordinals of the characters being sequential for sequential characters. This is the case in ASCII, but I am not sure about other characters.
Related
How can I shift characters of string in PHP by 5 spaces?
So say:
A becomes F
B becomes G
Z becomes E
same with symbols:
!##$%^&*()_+
so ! becomes ^
% becomes )
and so on.
Anyway to do this?
The other answers use the ASCII table (which is good), but I've got the impression that's not what you're looking for. This one takes advantage of PHP's ability to access string characters as if the string itself is an array, allowing you to have your own order of characters.
First, you define your dictionary:
// for simplicity, we'll only use upper-case letters in the example
$dictionary = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
Then you go through your input string's characters and replace each of them with it's $position + 5 in the dictionary:
$input_string = 'STRING';
$output_string = '';
$dictionary_length = strlen($dictionary);
for ($i = 0, $length = strlen($input_string); $i < $length; $i++)
{
$position = strpos($dictionary, $input_string[$i]) + 5;
// if the searched character is at the end of $dictionary,
// re-start counting positions from 0
if ($position > $dictionary_length)
{
$position = $position - $dictionary_length;
}
$output_string .= $dictionary[$position];
}
$output_string will now contain your desired result.
Of course, if a character from $input_string does not exist in $dictionary, it will always end up as the 5th dictionary character, but it's up to you to define a proper dictionary and work around edge cases.
Iterate over characters and, get ascii value of each character and get char value of the ascii code shifted by 5:
function str_shift_chars_by_5_spaces($a) {
for( $i = 0; $i < strlen($a); $i++ ) {
$b .= chr(ord($a[$i])+5);};
}
return $b;
}
echo str_shift_chars_by_5_spaces("abc");
Prints "fgh"
Iterate over string, character at a time
Get character its ASCII value
Increase by 5
Add to new string
Something like this should work:
<?php
$newString = '';
foreach (str_split('test') as $character) {
$newString .= chr(ord($character) + 5);
}
echo $newString;
Note that there is more than one way to iterate over a string.
PHP has a function for this; it's called strtr():
$shifted = strtr( $string,
"ABCDEFGHIJKLMNOPQRSTUVWXYZ",
"FGHIJKLMNOPQRSTUVWXYZABCDE" );
Of course, you can do lowercase letters and numbers and even symbols at the same time:
$shifted = strtr( $string,
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!##$%^&*()_+",
"FGHIJKLMNOPQRSTUVWXYZABCDEfghijklmnopqrstuvwxyzabcde5678901234^&*()_+!##$%" );
To reverse the transformation, just swap the last two arguments to strtr().
If you need to change the shift distance dynamically, you can build the translation strings at runtime:
$shift = 5;
$from = $to = "";
$sequences = array( "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz",
"0123456789", "!##$%^&*()_+" );
foreach ( $sequences as $seq ) {
$d = $shift % strlen( $seq ); // wrap around if $shift > length of $seq
$from .= $seq;
$to .= substr($seq, $d) . substr($seq, 0, $d);
}
$shifted = strtr( $string, $from, $to );
I need to trim words from begining and end of string. Problem is, sometimes the words can be abbreviated ie. only first three letters (followed by dot).
I tried hard to find suitable regular expression. Basicaly I need to chatch three or more initial characters up to length of replacement, but I cannot find regular expression, that will match variable length and will keep order of characters.
For example, if I need to trim 'insurance' from sentence 'insur. companies are rich', then pattern \^[insurance]{3,9}\ comes to my mind, but this pattern will also catch words like 'sensace', because order of characters (and their occurance) inside [] is not important for regexp.
Also, at end of string, I need remove serial-numbers, that are abbreviated from beginig - say 'XK-25F14' is sometimes presented as '25F14'. So I decided to go purely with character by character comparison.
Therefore I end with following php function
function trimWords($s, $dirt, $case_insensitive = false, $reverse = true)
{
$pos = 0;
$func = $case_insensitive ? 'strncasecmp' : 'strncmp';
// Get number of initial characters, that match in both strings
while ($func($s, $dirt, $pos + 1) === 0)
$pos++;
// If more than 2 initial characters match, then remove the match
if ($pos > 2)
$s = substr($s, $pos);
// Reverse $s and $dirt so it will trim from the end of string
$s = strrev($s);
if ($reverse)
return trimWords($s, strrev($dirt), $case_insensitive, false);
// After second run return back-reversed string
return trim($s, ' .-');
}
I'm happy with this function, but it has one drawback. It trims only one occurence of word. How to make it trim more occurances, i.e. remove both 'insurance ' from 'Insurance insur. companies'.
And I'm also curious, it realy does not exists such regular expression, that will match variable length and will respect order of characters in pattern?
Final solution
Thanks to mrhobo I have ended with function based on regular expression. This function can be easily improved and shall also be the most efficient for this task.
I have modified my previous function and it is two times quicker than regexp, but it can remove only one word per single run, so to be able to remove word from begin and end, it has to runs itself twice and performance is same as regexp and to remove more than one occurance of word, it has to runs itself multiple times, which will then be more and more slower.
The final function goes like this.
function trimWords($string, $word, $case_insensitive = false, $min_abbrv = 3)
{
$exc = substr($word, $min_abbrv);
$pat = null;
$i = strlen($exc);
while ($i--)
$pat = '(?>'.preg_quote($exc[$i], '#').$pat.')?';
$pat = substr($word, 0, $min_abbrv).$pat;
$pat = '#(?<begin>^)?(?:\W*\b'.$pat.'\b\W*)+(?(begin)|$)#';
if ($case_insensitive)
$pat .= 'i';
return preg_replace($pat, '', $string);
}
NOTE: with this function, it does not matter, if abbreviation ends with dot or not, it wipes out any shorter form of word and also removes all nonword characters around the word.
EDIT: I just tried create replace pattern like insu(r|ra|ran|ranc|rance) and function with atomic groups is faster by ~30% and with longer words it could be possibly even more efficient.
Matching a word and all possible abbreviations from the nth letter isn't quite an easy task in regex.
Here is how I would do it for the word insurance from the 4th letter:
insu(?>r(?>a(?>n(?>c(?>(?<last>e))?)?)?)?)?(?(last)|\.)
http://regex101.com/r/aL2gV4
It works by using atomic groups to force the regex engine as far as possible forward past the last 'rance' letters using the nested pattern (?>a(?>b)?)?. If the last letter letter is matched we're not dealing with an abbreviation thus no dot is required, otherwise the dot is required. This is coded by (?(last)|\.).
To trim, I would create a function to build the above regex for an abbreviation. Then you can write a while loop that replaces each of the abbreviation regexes with empty space until there are no more matches.
Non regex version
Here is my non regex version that removes multiple words and abbreviated words from a string:
function trimWords($str, $word, $min_abbrv, $case_insensitive = false) {
$len = 0;
$word_len = strlen($word);
$strlen = strlen($str);
$cmp = $case_insensitive ? strncasecmp : strncmp;
for ($i = 0; $i < $strlen; $i++) {
if ($cmp($str[$i], $word[$len], $i) == 0) {
$len++;
} else if ($len > 0) {
if ($len == $word_len || ($len >= $min_abbrv && ($dot = $str[$i] == '.'))) {
$i -= $len;
$len += $dot;
$str = substr($str, 0, $i) . substr($str, $i+$len);
$strlen = strlen($str);
$dot = 0;
}
$len = 0;
}
}
return $str;
}
Example:
$string = 'ins. <- "ins." / insu. insuranc. insurance / insurance. <- "."';
echo trimWords($string, 'insurance', 4);
Output is:
ins. <- "ins." / / . <- "."
I wrote function that constructs regular expression pattern according to mrhobo and also simple test and benchmarked it against my function with pure PHP string comparison.
Here is the code:
$string = 'Insur. companies are nasty rich';
$dirt = 'insurance';
$cycles = 500000;
$start = microtime(true);
$i = $cycles;
while ($i) {
$i--;
regexpStyle($string, $dirt, true);
}
$stop = microtime(true);
$i = $cycles;
while ($i) {
$i--;
trimWords($string, $dirt, true);
}
$end = microtime(true);
$res1 = $stop - $start;
$res2 = $end - $stop;
$winner = $res1 < $res2 ? '<<<' : '>>>';
echo 'regexp: '.$res1.' '.$winner.' string operations: '.$res2;
function trimWords($s, $dirt, $case_insensitive = false, $reverse = true)
{
$pos = 0;
$func = $case_insensitive ? 'strncasecmp' : 'strncmp';
// Get number of initial characters, that match in both strings
while ($func($s, $dirt, $pos + 1) === 0)
$pos++;
// If more than 2 initial characters match, then remove the match
if ($pos > 2)
$s = substr($s, $pos);
// After second run return back-reversed string
return trim($s, ' .-');
}
function regexpStyle($s, $dirt, $case_insensitive, $min_abbrev = 3)
{
$ss = substr($dirt, $min_abbrev);
$arr = str_split($ss);
$patt = '(?>(?<last>'.array_pop($arr).'))?';
$i = count($arr);
while ($i)
$patt = '(?>'.$arr[--$i].$patt.')?';
$patt = '#^'.substr($dirt, 0, $min_abbrev).$patt.'(?(last)|\.)#';
$patt .= $case_insensitive ? 'i' : null;
return trim(preg_replace($patt, '', $s));
}
and the winner is... moment of silence... it is...
a draw
regexp: 8.5169589519501 >>> string operations: 8.0951890945435
but I have strong feeling that regexp approach could be better utilized.
I am using the following code to pull some keywords and add them as tags in wordpress.
if (!is_array($keywords)) {
$count = 0;
$keywords = explode(',', $keywords);
}
foreach($keywords as $thetag) {
$count++;
wp_add_post_tags($post_id, $thetag);
if ($count > 3) break;
}
The code will fetch only 4 keywords, However on top of that, I want to pull ONLY if they are higher than 2 characters, so i dont get tags with 2 letters only.
Can someone help me.
strlen($string) will give you the lenght of the string:
if (!is_array($keywords)) {
$count = 0;
$keywords = explode(',', $keywords);
}
foreach($keywords as $thetag) {
$thetag = trim($thetag); // just so if the tags were "abc, de, fgh" then de won't be selected as a valid tag
if(strlen($thetag) > 2){
$count++;
wp_add_post_tags($post_id, $thetag);
}
if ($count > 3) break;
}
Use strlen to check the length.
int strlen ( string $string )
Returns the length of the given string.
if(strlen($thetag) > 2) {
$count++;
wp_add_post_tags($post_id, $thetag);
}
Given a text, how could I count the density / count of word lengths, so that I get an output like this
1 letter words : 52 / 1%
2 letter words : 34 / 0.5%
3 letter words : 67 / 2%
Found this but for python
counting the word length in a file
Index by word length
You could start by splitting your text into words, using either explode() (as a very/too simple solution) or preg_split() (allows for stuff that's a bit more powerful) :
$text = "this is some kind of text with several words";
$words = explode(' ', $text);
Then, iterate over the words, getting, for each one of those, its length, using strlen() ; and putting those lengths into an array :
$results = array();
foreach ($words as $word) {
$length = strlen($word);
if (isset($results[$length])) {
$results[$length]++;
}
else {
$results[$length] = 1;
}
}
If you're working with UTF-8, see mb_strlen().
At the end of that loop, $results would look like this :
array
4 => int 5
2 => int 2
7 => int 1
5 => int 1
The total number of words, which you'll need to calculate the percentage, can be found either :
By incrementing a counter inside the foreach loop,
or by calling array_sum() on $results after the loop is done.
And for the percentages' calculation, it's a bit of maths -- I won't be that helpful, about that ^^
You could explode the text by spaces and then for each resulting word, count the number of letters. If there are punctuation symbols or any other word separator, you must take this into account.
$lettercount = array();
$text = "lorem ipsum dolor sit amet";
foreach (explode(' ', $text) as $word)
{
#$lettercount[strlen($word)]++; // # for avoiding E_NOTICE on first addition
}
foreach ($lettercount as $numletters => $numwords)
{
echo "$numletters letters: $numwords<br />\n";
}
ps: I have not proved this, but should work
You can be smarter about removing punctuation by using preg_replace.
$txt = "Sean Hoare, who was first named News of the World journalist to make hacking allegations, found dead at Watford home. His death is not being treated as suspiciou";
$txt = str_replace( " ", " ", $txt );
$txt = str_replace( ".", "", $txt );
$txt = str_replace( ",", "", $txt );
$a = explode( " ", $txt );
$cnt = array();
foreach ( $a as $b )
{
if ( isset( $cnt[strlen($b)] ) )
$cnt[strlen($b)] += 1;
else
$cnt[strlen($b)] = 1;
}
foreach ( $cnt as $k => $v )
{
echo $k . " letter words: " . $v . " " . round( ( $v * 100 ) / count( $a ) ) . "%\n";
}
My simple way to limit the number of words characters in some string with php.
function checkWord_len($string, $nr_limit) {
$text_words = explode(" ", $string);
$text_count = count($text_words);
for ($i=0; $i < $text_count; $i++){ //Get the array words from text
// echo $text_words[$i] ; "
//Get the array words from text
$cc = (strlen($text_words[$i])) ;//Get the lenght char of each words from array
if($cc > $nr_limit) //Check the limit
{
$d = "0" ;
}
}
return $d ; //Return the value or null
}
$string_to_check = " heare is your text to check"; //Text to check
$nr_string_limit = '5' ; //Value of limit len word
$rez_fin = checkWord_len($string_to_check,$nr_string_limit) ;
if($rez_fin =='0')
{
echo "false";
//Execute the false code
}
elseif($rez_fin == null)
{
echo "true";
//Execute the true code
}
?>
How can I swap around / toggle the case of the characters in a string, for example:
$str = "Hello, My Name is Tom";
After I run the code I get a result like this:
$newstr = "hELLO, mY nAME Is tOM";
Is this even possible?
If your string is ASCII only, you can use XOR:
$str = "Hello, My Name is Tom";
print strtolower($str) ^ strtoupper($str) ^ $str;
Outputs:
hELLO, mY nAME IS tOM
OK I know you've already got an answer, but the somewhat obscure strtr() function is crying out to be used for this ;)
$str = "Hello, My Name is Tom";
echo strtr($str,
'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz',
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ');
The quickest way is with a bitmask. No clunky string functions or regex. PHP is a wrapper for C, so we can manipulate bits quite easily if you know your logical function like OR, NOT, AND, XOR, NAND, etc..:
function swapCase($string) {
for ($i = 0; $i < strlen($string); $i++) {
$char = ord($string{$i});
if (($char > 64 && $char < 91) || ($char > 96 && $char < 123)) {
$string{$i} = chr($char ^ 32);
}
}
return $string;
}
This is what changes it:
$string{$i} = chr($char ^ 32);
We take the Nth character in $string and perform an XOR (^) telling the interpreter to take the integer value of $char and swapping the 6th bit (32) from a 1 to 0 or 0 to 1.
All ASCII characters are 32 away from their counterparts (ASCII was an ingenious design because of this. Since 32 is a power of 2 (2^5), it's easy to shift bits. To get the ASCII value of a letter, use the built in PHP function ord():
ord('a') // 65
ord('A') // 97
// 97 - 65 = 32
So you loop through the string using strlen() as the middle part of the for loop, and it will loop exactly the number of times as your string has letters. If the character at position $i is a letter (a-z (65-90) or A-Z (97-122)), it will swap that character for the uppercase or lowercase counterpart using a bitmask.
Here's how the bitmask works:
0100 0001 // 65 (lowercase a)
0010 0000 // 32 (bitmask of 32)
--------- // XOR means: we put a 1 if the bits are different, a 0 if they are same.
0110 0001 // 97 (uppercase A)
We can reverse it:
0110 0001 // 97 (A)
0010 0000 // Bitmask of 32
---------
0100 0001 // 65 (a)
No need for str_replace or preg_replace, we just swap bits to add or subtract 32 from the ASCII value of the character and we swap cases. The 6th bit (6th from the right) determines if the character is uppercase or lowercase. If it's a 0, it's lowercase and 1 if uppercase. Changing the bit from a 0 to a 1 ads 32, getting the uppercase chr() value, and changing from a 1 to a 0 subtracts 32, turning an uppercase letter lowercase.
swapCase('userId'); // USERiD
swapCase('USERiD'); // userId
swapCase('rot13'); // ROT13
We can also have a function that swaps the case on a particular character:
// $i = position in string
function swapCaseAtChar($string, $i) {
$char = ord($string{$i});
if (($char > 64 && $char < 91) || ($char > 96 && $char < 123)) {
$string{$i} = chr($char ^ 32);
return $string;
} else {
return $string;
}
}
echo swapCaseAtChar('iiiiiiii', 0); // Iiiiiiii
echo swapCaseAtChar('userid', 4); // userId
// Numbers are no issue
echo swapCaseAtChar('12345qqq', 7); // 12345qqQ
Very similar in function to the answer by Mark.
preg_replace_callback(
'/[a-z]/i',
function($matches) {
return $matches[0] ^ ' ';
},
$str
)
Explanation by #xtempore:
'a' ^ ' ' returns A. It works because A is 0x41 and a is 0x61 (and likewise for all A-Z), and because a space is 0x20. By xor-ing you are flipping that one bit. In simple terms, you are adding 32 to upper case letters making them lower case and subtracting 32 from lower case letters making them upper case.
You'll need to iterate through the string testing the case of each character, calling strtolower() or strtoupper() as appropriate, adding the modified character to a new string.
I know this question is old - but here's my 2 flavours of a multi-byte implementation.
Multi function version:
(mb_str_split function found here):
function mb_str_split( $string ) {
# Split at all position not after the start: ^
# and not before the end: $
return preg_split('/(?<!^)(?!$)/u', $string );
}
function mb_is_upper($char) {
return mb_strtolower($char, "UTF-8") != $char;
}
function mb_flip_case($string) {
$characters = mb_str_split($string);
foreach($characters as $key => $character) {
if(mb_is_upper($character))
$character = mb_strtolower($character, 'UTF-8');
else
$character = mb_strtoupper($character, 'UTF-8');
$characters[$key] = $character;
}
return implode('',$characters);
}
Single function version:
function mb_flip_case($string) {
$characters = preg_split('/(?<!^)(?!$)/u', $string );
foreach($characters as $key => $character) {
if(mb_strtolower($character, "UTF-8") != $character)
$character = mb_strtolower($character, 'UTF-8');
else
$character = mb_strtoupper($character, 'UTF-8');
$characters[$key] = $character;
}
return implode('',$characters);
}
Following script supports UTF-8 characters like "ą" etc.
PHP 7.1+
$before = 'aaAAąAŚĆżź';
$after = preg_replace_callback('/./u', function (array $char) {
[$char] = $char;
return $char === ($charLower = mb_strtolower($char))
? mb_strtoupper($char)
: $charLower;
}, $before);
PHP 7.4+
$before = 'aaAAąAŚĆżź';
$after = implode(array_map(function (string $char) {
return $char === ($charLower = mb_strtolower($char))
? mb_strtoupper($char)
: $charLower;
}, mb_str_split($before)));
$before: aaAAąAŚĆżź
$after: AAaaĄaśćŻŹ
I suppose a solution might be to use something like this :
$str = "Hello, My Name is Tom";
$newStr = '';
$length = strlen($str);
for ($i=0 ; $i<$length ; $i++) {
if ($str[$i] >= 'A' && $str[$i] <= 'Z') {
$newStr .= strtolower($str[$i]);
} else if ($str[$i] >= 'a' && $str[$i] <= 'z') {
$newStr .= strtoupper($str[$i]);
} else {
$newStr .= $str[$i];
}
}
echo $newStr;
Which gets you :
hELLO, mY nAME IS tOM
i.e. you :
loop over each character of the original string
if it's between A and Z, you put it to lower case
if it's between a and z, you put it to upper case
else, you keep it as-is
The problem being this will probably not work nicely with special character like accents :-(
And here is a quick proposal that might (or might not) work for some other characters :
$str = "Hello, My Name is Tom";
$newStr = '';
$length = strlen($str);
for ($i=0 ; $i<$length ; $i++) {
if (strtoupper($str[$i]) == $str[$i]) {
// Putting to upper case doesn't change the character
// => it's already in upper case => must be put to lower case
$newStr .= strtolower($str[$i]);
} else {
// Putting to upper changes the character
// => it's in lower case => must be transformed to upper case
$newStr .= strtoupper($str[$i]);
}
}
echo $newStr;
An idea, now, would be to use mb_strtolower and mb_strtoupper : it might help with special characters, and multi-byte encodings...
For a multibyte/unicode-safe solution, I'd probably recommend mutating/toggling the case of each letter based on which capture group contains a letter. This way you don't have to make a multibyte-base check after matching a letter with regex.
Code: (Demo)
$string = 'aaAAąAŚĆżź';
echo preg_replace_callback(
'/(\p{Lu})|(\p{Ll})/u',
function($m) {
return $m[1]
? mb_strtolower($m[1])
: mb_strtoupper($m[2]);
},
$string
);
// AAaaĄaśćŻŹ
See this answer about how to match letters that might be multibyte.