PHP Word Length Density / Count calc for a string


Given a text, how could I count the density / count of word lengths, so that I get an output like this:
1 letter words : 52 / 1%
2 letter words : 34 / 0.5%
3 letter words : 67 / 2%
I found these, but they are for Python:
counting the word length in a file
Index by word length

You could start by splitting your text into words, using either explode() (a very, perhaps too, simple solution) or preg_split() (which allows for somewhat more powerful splitting):
$text = "this is some kind of text with several words";
$words = explode(' ', $text);
Then iterate over the words, get the length of each one with strlen(), and count those lengths in an array:
$results = array();
foreach ($words as $word) {
$length = strlen($word);
if (isset($results[$length])) {
$results[$length]++;
}
else {
$results[$length] = 1;
}
}
If you're working with UTF-8, see mb_strlen().
At the end of that loop, $results would look like this:
array
4 => int 5
2 => int 2
7 => int 1
5 => int 1
The total number of words, which you'll need to calculate the percentage, can be found either :
By incrementing a counter inside the foreach loop,
or by calling array_sum() on $results after the loop is done.
The percentages are then a bit of maths: divide each length's count by the total number of words and multiply by 100 (in the example above, 5 four-letter words out of 9 words gives about 55.6%).
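Putting the steps above together, here is a minimal sketch that also computes the percentages. It assumes a "word" is a run of ASCII letters, digits or apostrophes (everything else, including punctuation, is a separator) and that the text is single-byte; the function name word_length_density() is illustrative, not an existing PHP function.

```php
<?php
// Sketch: tally words by length, then turn each tally into a percentage
// of all words. Assumes single-byte text (use mb_strlen() for UTF-8).
function word_length_density(string $text): array
{
    // Split on anything that is not part of a word; drop empty pieces.
    $words = preg_split("/[^A-Za-z0-9']+/", $text, -1, PREG_SPLIT_NO_EMPTY);

    $counts = array();
    foreach ($words as $word) {
        $length = strlen($word);
        $counts[$length] = ($counts[$length] ?? 0) + 1;
    }
    ksort($counts); // shortest words first

    $total = array_sum($counts); // same as count($words)
    $density = array();
    foreach ($counts as $length => $count) {
        $density[$length] = array(
            'count'   => $count,
            'percent' => round($count * 100 / $total, 1),
        );
    }
    return $density;
}

$text = "this is some kind of text with several words";
foreach (word_length_density($text) as $length => $row) {
    echo "$length letter words : {$row['count']} / {$row['percent']}%\n";
}
```

Each entry holds the raw count and its share rounded to one decimal place; loosen or tighten the character class in the regular expression to change what counts as a word.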

You could explode the text by spaces and then for each resulting word, count the number of letters. If there are punctuation symbols or any other word separator, you must take this into account.
$lettercount = array();
$text = "lorem ipsum dolor sit amet";
foreach (explode(' ', $text) as $word)
{
@$lettercount[strlen($word)]++; // @ for avoiding E_NOTICE on first addition
}
foreach ($lettercount as $numletters => $numwords)
{
echo "$numletters letters: $numwords<br />\n";
}
PS: I have not tested this, but it should work.
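The same tally can be written without any per-word bookkeeping. A sketch, still assuming words are separated by single spaces: array_map() with 'strlen' turns each word into its length, and array_count_values() counts how often each length occurs.

```php
<?php
// Map each word to its length, then count how often each length occurs.
$text = "lorem ipsum dolor sit amet";
$lettercount = array_count_values(array_map('strlen', explode(' ', $text)));
ksort($lettercount); // order the output by number of letters

foreach ($lettercount as $numletters => $numwords) {
    echo "$numletters letters: $numwords<br />\n";
}
```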

This version strips punctuation with str_replace(); you can be smarter about removing punctuation by using preg_replace().
$txt = "Sean Hoare, who was first named News of the World journalist to make hacking allegations, found dead at Watford home. His death is not being treated as suspicious";
$txt = str_replace( "  ", " ", $txt ); // collapse double spaces
$txt = str_replace( ".", "", $txt ); // strip full stops
$txt = str_replace( ",", "", $txt ); // strip commas
$a = explode( " ", $txt );
$cnt = array();
foreach ( $a as $b )
{
if ( isset( $cnt[strlen($b)] ) )
$cnt[strlen($b)] += 1;
else
$cnt[strlen($b)] = 1;
}
foreach ( $cnt as $k => $v )
{
echo $k . " letter words: " . $v . " " . round( ( $v * 100 ) / count( $a ) ) . "%\n";
}
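A sketch of the preg_replace() idea, assuming that anything other than ASCII letters, digits and whitespace is punctuation to be removed (so apostrophes and hyphens disappear too). Splitting on \s+ with preg_split() also makes the double-space cleanup unnecessary.

```php
<?php
$txt = "Sean Hoare, who was first named News of the World journalist to make hacking allegations, found dead at Watford home.";

// Remove everything that is not a letter, digit or whitespace character.
$clean = preg_replace('/[^A-Za-z0-9\s]/', '', $txt);

// Split on runs of whitespace, so repeated spaces cannot produce empty words.
$words = preg_split('/\s+/', trim($clean));

$cnt = array_count_values(array_map('strlen', $words));
ksort($cnt);
$total = count($words);

foreach ($cnt as $k => $v) {
    echo $k . " letter words: " . $v . " " . round($v * 100 / $total) . "%\n";
}
```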

My simple way to check, in PHP, whether any word in a string is longer than a given number of characters:
<?php
function checkWord_len($string, $nr_limit) {
$d = null; // stays null unless a word exceeds the limit
$text_words = explode(" ", $string); // Get the array of words from the text
$text_count = count($text_words);
for ($i = 0; $i < $text_count; $i++) {
$cc = strlen($text_words[$i]); // Get the character length of each word
if ($cc > $nr_limit) // Check the limit
{
$d = "0";
}
}
return $d; // "0" if a word is too long, otherwise null
}
$string_to_check = " here is your text to check"; // Text to check
$nr_string_limit = 5; // Maximum allowed word length
$rez_fin = checkWord_len($string_to_check, $nr_string_limit);
if ($rez_fin === '0')
{
echo "false";
// Execute the false code
}
elseif ($rez_fin === null)
{
echo "true";
// Execute the true code
}
?>
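If all you need is a yes/no answer, a boolean function avoids the "0"/null return convention. A sketch with a hypothetical name, words_within_limit(), which splits on any run of whitespace and compares the longest word against the limit:

```php
<?php
// Hypothetical helper: true when no word in $string is longer than $limit.
function words_within_limit(string $string, int $limit): bool
{
    $words = preg_split('/\s+/', trim($string), -1, PREG_SPLIT_NO_EMPTY);
    if (!$words) {
        return true; // an empty string has no over-long words
    }
    // max() over the word lengths gives the longest word.
    return max(array_map('strlen', $words)) <= $limit;
}

echo words_within_limit(" here is your text to check", 5) ? "true" : "false";
```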

Related

PHP: Shift characters of string by 5 spaces? So that A becomes F, B becomes G etc

How can I shift characters of string in PHP by 5 spaces?
So say:
A becomes F
B becomes G
Z becomes E
same with symbols:
!@#$%^&*()_+
so ! becomes ^
% becomes )
and so on.
Anyway to do this?

The other answers use the ASCII table (which is good), but I've got the impression that's not what you're looking for. This one takes advantage of PHP's ability to access string characters as if the string itself is an array, allowing you to have your own order of characters.
First, you define your dictionary:
// for simplicity, we'll only use upper-case letters in the example
$dictionary = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
Then you go through your input string's characters and replace each of them with the character at its $position + 5 in the dictionary:
$input_string = 'STRING';
$output_string = '';
$dictionary_length = strlen($dictionary);
for ($i = 0, $length = strlen($input_string); $i < $length; $i++)
{
$position = strpos($dictionary, $input_string[$i]) + 5;
// if the searched character is at the end of $dictionary,
// re-start counting positions from 0
if ($position >= $dictionary_length)
{
$position = $position - $dictionary_length;
}
$output_string .= $dictionary[$position];
}
$output_string will now contain your desired result.
Of course, if a character from $input_string does not exist in $dictionary, strpos() returns false, which counts as 0 in the addition, so the character will always end up as the one at index 5 (the sixth dictionary character), but it's up to you to define a proper dictionary and work around edge cases.
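A variant of the dictionary approach, sketched with a hypothetical helper shift_with_dictionary(): the modulo operator handles the wrap-around for any shift (including negative ones, for decoding), and characters missing from the dictionary are passed through unchanged instead of mapping to a fixed character. It assumes a non-empty dictionary in which every character appears only once.

```php
<?php
// Hypothetical helper: shift each character $shift places within $dictionary.
function shift_with_dictionary(string $input, string $dictionary, int $shift): string
{
    $size = strlen($dictionary);
    $shift = (($shift % $size) + $size) % $size; // normalise negative shifts
    $output = '';
    for ($i = 0, $length = strlen($input); $i < $length; $i++) {
        $position = strpos($dictionary, $input[$i]);
        if ($position === false) {
            $output .= $input[$i]; // not in the dictionary: keep as-is
            continue;
        }
        // Modulo wraps positions past the end back to the start.
        $output .= $dictionary[($position + $shift) % $size];
    }
    return $output;
}

echo shift_with_dictionary('STRING', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 5);
```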

Iterate over the characters, get the ASCII value of each character, and append the character for that ASCII code shifted by 5:
function str_shift_chars_by_5_spaces($a) {
$b = ''; // start with an empty result string
for( $i = 0; $i < strlen($a); $i++ ) {
$b .= chr(ord($a[$i])+5); // shift each character's code by 5
}
return $b;
}
echo str_shift_chars_by_5_spaces("abc");
Prints "fgh"
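Shifting with chr(ord(...) + 5) on its own does not wrap around, so "Z" becomes "_" rather than "E". A sketch of a wrapping version for letters only, using a hypothetical caesar_shift() helper; leaving every non-letter byte unchanged is an assumption of this sketch, and the symbol shifting from the question is better served by a dictionary or strtr().

```php
<?php
// Hypothetical helper: shift A-Z and a-z by $shift with wrap-around.
function caesar_shift(string $text, int $shift): string
{
    $shift = (($shift % 26) + 26) % 26; // normalise, so negative shifts decode
    $out = '';
    for ($i = 0, $len = strlen($text); $i < $len; $i++) {
        $code = ord($text[$i]);
        if ($code >= 65 && $code <= 90) {          // 'A'..'Z'
            $out .= chr(($code - 65 + $shift) % 26 + 65);
        } elseif ($code >= 97 && $code <= 122) {   // 'a'..'z'
            $out .= chr(($code - 97 + $shift) % 26 + 97);
        } else {
            $out .= $text[$i];                      // any other byte: unchanged
        }
    }
    return $out;
}

echo caesar_shift("Zebra!", 5); // Ejgwf!
```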

Iterate over string, character at a time
Get the character's ASCII value
Increase by 5
Add to new string
Something like this should work:
<?php
$newString = '';
foreach (str_split('test') as $character) {
$newString .= chr(ord($character) + 5);
}
echo $newString;
Note that there is more than one way to iterate over a string.

PHP has a function for this; it's called strtr():
$shifted = strtr( $string,
"ABCDEFGHIJKLMNOPQRSTUVWXYZ",
"FGHIJKLMNOPQRSTUVWXYZABCDE" );
Of course, you can do lowercase letters and numbers and even symbols at the same time:
$shifted = strtr( $string,
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!@#$%^&*()_+",
"FGHIJKLMNOPQRSTUVWXYZABCDEfghijklmnopqrstuvwxyzabcde5678901234^&*()_+!@#$%" );
To reverse the transformation, just swap the last two arguments to strtr().
If you need to change the shift distance dynamically, you can build the translation strings at runtime:
$shift = 5;
$from = $to = "";
$sequences = array( "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz",
"0123456789", "!@#$%^&*()_+" );
foreach ( $sequences as $seq ) {
$d = $shift % strlen( $seq ); // wrap around if $shift > length of $seq
$from .= $seq;
$to .= substr($seq, $d) . substr($seq, 0, $d);
}
$shifted = strtr( $string, $from, $to );
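Wrapping the runtime-built tables in a function makes the reverse transformation easy to test. A sketch using a hypothetical shift_chars() helper with the same four sequences; decoding simply swaps the last two strtr() arguments, and $shift is assumed to be non-negative.

```php
<?php
// Hypothetical helper: build the strtr() tables for $shift and apply them.
function shift_chars(string $string, int $shift, bool $decode = false): string
{
    $sequences = array('ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz',
                       '0123456789', '!@#$%^&*()_+');
    $from = $to = '';
    foreach ($sequences as $seq) {
        $d = $shift % strlen($seq); // wrap around if $shift > length of $seq
        $from .= $seq;
        $to   .= substr($seq, $d) . substr($seq, 0, $d);
    }
    // Decoding maps the shifted alphabet back onto the original one.
    return $decode ? strtr($string, $to, $from) : strtr($string, $from, $to);
}

echo shift_chars("Zap 9!", 5);
```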

Better / Cleaner method for splitting a string based on number of words?

I was in need of a method to count the number of words (not characters) within PHP, and start a <SPAN> tag within HTML to wrap around the remaining words after the specified number.
I looked into functions such as wordwrap and str_word_count, but those didn't seem to help. I went ahead and modified the code found here: http://php.timesoft.cc/manual/en/function.str-word-count.php#55818
Everything seems to work great, however I wanted to post here as this code is from 2005 and maybe there is a more modern / efficient way of handling what I'm trying to achieve?
<?php
$string = "One two three four five six seven eight nine ten.";
// the first number words to extract
$n = 3;
// extract the words
$words = explode(" ", $string);
// chop the words array down to the first n elements
$first = array_slice($words, 0, $n);
// chop the words array down to the remaining elements
$last = array_slice($words, $n);
// glue the 3 elements back into a spaced sentence
$firstString = implode(" ", $first);
// glue the remaining elements back into a spaced sentence
$lastString = implode(" ", $last);
// display it
echo $firstString;
echo '<span style="font-weight:bold;"> '.$lastString.'</span>';
?>

You could use preg_split() with a regex instead. This is the modified version of this answer with an improved regex that uses a positive lookbehind:
function get_snippet($str, $wordCount) {
$arr = preg_split(
'/(?<=\w)\b/',
$str,
$wordCount*2+1,
PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);
$first = implode('', array_slice($arr, 0, $wordCount));
$last = implode('', array_slice($arr, $wordCount));
return $first.'<span style="font-weight:bold;">'.$last.'</span>';
}
Usage:
$string = "One two three four five six seven eight nine ten.";
echo get_snippet($string, 3);
Output:
One two three<span style="font-weight:bold;"> four five six seven eight nine ten.</span>

Let's make it even simpler. Try this:
<?php
$string = "One two three four five six seven eight nine ten.";
// the first number words to extract
$n = 2;
// extract the words
$words = explode(" ", $string);
$firstString = array();
$lastString = array();
for($i=0; $i<$n; $i++) {
$firstString[] = $words[$i]; // This will collect One, two
}
for($i =$n; $i<count($words); $i++) {
$lastString[] = $words[$i]; // This will collect three four five six seven eight nine ten.
}
print_r($firstString);
print_r($lastString);
?>

I borrowed the code from here:
https://stackoverflow.com/a/18589825/1578471
/**
* Find the position of the Xth occurrence of a substring in a string
* @param $haystack
* @param $needle
* @param $number integer > 0
* @return int
*/
function strposX($haystack, $needle, $number){
if($number == '1'){
return strpos($haystack, $needle);
}elseif($number > '1'){
return strpos($haystack, $needle, strposX($haystack, $needle, $number - 1) + strlen($needle));
}else{
return error_log('Error: Value for parameter $number is out of range');
}
}
$string = "One two three four five six seven eight nine ten.";
$afterThreeWords = strposX($string, " ", 3);
echo substr($string, 0, $afterThreeWords); // first three words

This looks good to me. Here's another way that you might benchmark against it for efficiency.
I have no idea which is quicker; my guess is that yours is quicker for longer strings.
$string = "This is some reasonably lengthed string";
$n = 3;
$pos = 0;
for( $i = 0; $i< $n; $i++ ){
$pos = strpos($string, ' ', $pos + 1);
if( !$pos ){
break;
}
}
if( $pos ){
$firstString = substr($string, 0, $pos);
$lastString = substr($string, $pos + 1);
}else{
$firstString = $string;
$lastString = null;
}
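Another option, sketched with a hypothetical split_after_words() helper: explode() accepts a limit, so splitting into at most $n + 1 parts leaves "everything after the first $n words" in the last part without rejoining it. Like the question's code, it assumes words are separated by single spaces.

```php
<?php
// Hypothetical helper: returns array(first $n words, the rest of the string).
function split_after_words(string $string, int $n): array
{
    // At most $n + 1 parts: $n single words plus one "remainder" part.
    $parts = explode(' ', $string, $n + 1);
    $last = count($parts) > $n ? array_pop($parts) : '';
    return array(implode(' ', $parts), $last);
}

list($firstString, $lastString) = split_after_words("One two three four five six seven eight nine ten.", 3);
echo $firstString . '<span style="font-weight:bold;"> ' . $lastString . '</span>';
```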

Add space after every 4th character

I want to add a space to some output after every 4th character until the end of the string.
I tried:
$str = $rows['value'];
<? echo substr($str, 0, 4) . ' ' . substr($str, 4); ?>
Which just got me the space after the first 4 characters.
How can I make it show after every 4th character?

You can use chunk_split():
$str = chunk_split($rows['value'], 4, ' ');
chunk_split() appends the separator after the last chunk as well, so if you don't want a trailing space, you can pass the result to trim().
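A short sketch of the trailing-space fix with rtrim(); the sample value is illustrative. Note that rtrim() would also remove any whitespace that was already at the end of the input.

```php
<?php
// chunk_split() appends its separator after every chunk, including the last,
// so rtrim() strips that final space.
$value = '1234567890';
$spaced = rtrim(chunk_split($value, 4, ' '));
echo $spaced; // 1234 5678 90
```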

Wordwrap does exactly what you want:
echo wordwrap('12345678' , 4 , ' ' , true );
will output:
1234 5678
If you want, say, a hyphen after every second digit instead, swap the "4" for a "2", and the space for a hyphen:
echo wordwrap('1234567890' , 2 , '-' , true );
will output:
12-34-56-78-90
Reference - wordwrap

Have you already seen this function called wordwrap?
http://us2.php.net/manual/en/function.wordwrap.php
Here is a solution. Works right out of the box like this.
<?php
$text = "Thiswordissoverylong.";
$newtext = wordwrap($text, 4, "\n", true);
echo "$newtext\n";
?>

Here is an example for a string whose length is not a multiple of 4 (or 5 in my case).
function space($str, $step, $reverse = false) {
if ($reverse)
return trim(strrev(chunk_split(strrev($str), $step, ' ')));
return trim(chunk_split($str, $step, ' '));
}
Usage:
echo space("0000000152748541695882", 5);
result: 00000 00152 74854 16958 82
Reverse mode (for the "BVR code" used in Swiss billing):
echo space("1400360152748541695882", 5, true);
result: 14 00360 15274 85416 95882
EDIT 2021-02-09
Also useful for EAN13 barcode formatting:
space("7640187670868", 6, true);
result: 7 640187 670868
Short syntax version:
function space($s=false,$t=0,$r=false){return(!$s)?false:(($r)?trim(strrev(chunk_split(strrev($s),$t,' '))):trim(chunk_split($s,$t,' ')));}
Hope it could help some of you.

One way would be to split the string into 4-character chunks and then join them again with a space between each part.
Since this would not insert a space at the very end when the last chunk has exactly 4 characters, we need to add that one manually:
$chunk_length = 4;
$chunks = str_split($str, $chunk_length);
$last = end($chunks);
if (strlen($last) === $chunk_length) {
$chunks[] = '';
}
$str_with_spaces = implode(' ', $chunks);

one-liner:
$yourstring = "1234567890";
echo implode(" ", str_split($yourstring, 4))." ";
This should give you as output:
1234 5678 90
That's all :D

The function wordwrap() basically does the same; however, this should work as well.
$newstr = '';
$len = strlen($str);
for($i = 0; $i < $len; $i++) {
$newstr.= $str[$i];
if (($i+1) % 4 == 0) {
$newstr.= ' ';
}
}

PHP3 Compatible:
Try this:
$strLen = strlen( $str );
for($i = 0; $i < $strLen; $i += 4){
echo substr($str, $i, 4) . ' ';
}
unset( $strLen );

StringBuilder str = new StringBuilder("ABCDEFGHIJKLMNOP");
int idx = str.length() - 4;
while (idx > 0){
str.insert(idx, " ");
idx = idx - 4;
}
return str.toString();
Explanation: this code adds spaces from right to left:
str = "ABCDEFGH"; int idx = str.length() - 4; // 8-4=4
while (idx > 0) { // 4 > 0
str.insert(idx, " "); // inserts a space at index 4, i.e. after the 4th character
idx = idx - 4; // then decrements: 4-4=0, so the loop stops
}
The final output will be:
ABCD EFGH
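The answer above is Java rather than PHP. A rough PHP translation of the same right-to-left loop, sketched with a hypothetical group_from_right() helper: substr_replace() with a length of 0 inserts a space without removing anything.

```php
<?php
// Hypothetical helper: insert a space every $size characters, counting
// from the right, like the Java StringBuilder loop.
function group_from_right(string $str, int $size = 4): string
{
    // Walk from the right; inserting at $idx never shifts the smaller
    // indexes still to be visited, so they stay valid.
    for ($idx = strlen($str) - $size; $idx > 0; $idx -= $size) {
        $str = substr_replace($str, ' ', $idx, 0); // length 0 = pure insert
    }
    return $str;
}

echo group_from_right("ABCDEFGH"); // ABCD EFGH
```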

doesn't while loop work correctly in PHP?

I want to get the running character count of the words in a string. For example, if my input is "I am John", then the output must be like this:
1 // count of 'I'
4 // count of 'I am'
9 // count of 'I am John'
I use this PHP code for it:
$string = 'I am John';
$words = explode(' ',$string);
$count_words = count($words);
$i =0;
while ($i<=$count_words){
$word_length =0;
$k=0;
while($k<=$i){
$word_length = strlen($words[$k-1]);
$word_length = $word_length + strlen($words[$k]);
$k++;
}
$word_length = $word_length + $i; // there is "$i" means "space"
echo $word_length.'<br/>';
$i++;
}
But it returns this output:
1
4
8
7
Why? Where is my error? What should my code look like?
Thanks in advance!

<?php
$string = 'I am John';
$words = explode(' ',$string);
$count_words = count($words);
$i =0;
while ($i<$count_words){
if($i==0) {
$wordsc[$i] = strlen($words[$i]);
} else {
$wordsc[$i] = strlen($words[$i])+1+$wordsc[$i-1];
}
echo $wordsc[$i]."<br>";
$i++;
}
?>

Your error is here:
$i =0;
while ($i<=$count_words){
//....
}
$count_words is 3, but you iterate 4 times because of <=. Use < instead.

You were looping through too many words. count() returns the number of elements in an array, and remember that array indexes start at 0.
$word_length + strlen($words[$k - 1]); // You were subtracting 1. I think you were trying to cater for the count offset, but subtracting 1 from 0 gives index -1, causing the first word to be missed.
CODE SNIPPET START
//Set up the words
$string = 'I am John';
$words = explode(' ',$string);
$count_words = count($words);
//Loop through the words
$i =0;
while ($i<$count_words){
$word_length =0;
$k=0;
$debugString = '';
//Loop through all the previous words to the current
while($k<= $i){
//don't really need this since we're adding the word length later
//$word_length = strlen($words[$k]);
//if you subtract 1 from 0 you will get an undefined offset notice. You
//want to look at your current word
$word_length = $word_length + strlen($words[$k]);
//A bit of debugging you can delete this once you have seen the results
$debugString = $debugString ." ".$words[$k];
$k++;
}
$word_length = $word_length + $i ; // there is "$i" means "space"
//Added the debugString for debugging so remove it once you have seen the results
echo $word_length." " .$debugString.' <br/>';
$i++;
}
CODE SNIPPET END

I am happy to provide you with a completely different approach for generating your desired data in a very direct way.
var_export(preg_match_all('/ |$/','I am John',$out,PREG_OFFSET_CAPTURE)?array_column($out[0],1):'failure');
Output:
array (
0 => 1,
1 => 4,
2 => 9,
)
Determining the length of each word-incremented substring is effectively the same as determining the offset of each trailing space or, for the final word, the full string length.
preg_match_all() has a built-in "flag" for this: PREG_OFFSET_CAPTURE
Before any array manipulation, preg_match_all() fills $out like this:
array (
0 =>
array (
0 =>
array (
0 => ' ', // <-- the space after 'I' matched by ' '
1 => 1,
),
1 =>
array (
0 => ' ', // <-- the space after 'am' matched by ' '
1 => 4,
),
2 =>
array (
0 => '', // <-- the end of the string (after 'John') matched by '$'
1 => 9,
),
),
)
array_column() is used on $out[0] to extract only the offset values (omitting the useless space and empty-string matches).
Here is another, totally different method:
array_reduce(preg_split('/(?= )/',$string),function($carry,$item){echo $carry+=strlen($item)," "; return $carry;},0);
output: 1 4 9
This splits the string on the "zero-width" string that is followed by a space. This means that in the exploding process, the spaces are not lost -- this maintains the string and substring lengths for simple addition.

Check words if they are composed of Consecutive Alphabetic Characters

I take a sentence as input like this:
abcd 01234 87 01235
Next, I have to check every word to see if its characters are consecutive in the alphabet. The output looks like this:
abcd 01234
Well, 01235 contains consecutive chars, but the whole word ALSO contains non-consecutive chars (35), so it's not printed on the screen.
So far I wrote this:
function string_to_ascii($string)
{
$ascii = NULL;
for ($i = 0; $i < strlen($string); $i++)
{
$ascii[] = ord($string[$i]);
}
return($ascii);
}
$input = "abcd 01234 87 01235";
//first, we split the sentence into separate words
$input = explode(" ",$input);
foreach($input as $original_word)
{
//we need it clear
unset($current_word);
//convert current word into array of ascii chars
$ascii_array = string_to_ascii($original_word);
//needed for counting how many chars are already processed
$i = 0;
//we also need to count the total number chars in array
$ascii_count = count($ascii_array);
//here we go, checking each character from array
foreach ($ascii_array as $char)
{
//if IT'S THE LAST WORD'S CHAR
if($i+1 == $ascii_count)
{
//IF THE WORD HAS JUST 1 char, output it
if($ascii_count == 1)
{
$current_word .= chr($char);
}
//IF THE WORDS HAS MORE THAN 1 CHAR
else
{
//IF PREVIOUS CHAR CODE IS (CURRENT_CHAR-1) (CONSECUTIVE, OUTPUT IT)
if(($char - 1) == $ascii_array[($i-1)])
{
$current_word .=chr($char);
}
}
}
//IF WE AREN'T YET AT THE ENDING
else
{
//IF NEXT CHAR CODE IS (CURRENT_CHAR+1) (CONSECUTIVE, OUTPUT IT)
if(($char + 1) == ($ascii_array[($i+1)]))
{
$current_word .=chr($char);
}
}
$i++;
}
//FINALLY, WE CHECK IF THE TOTAL NUMBER OF CONSECUTIVE CHARS is the same as THE NUMBER OF CHARS
if(strlen($current_word) == strlen($original_word))
{
$output[] = $current_word;
}
}
//FORMAT IT BACK AS SENTENCE
print(implode(' ',$output));
But maybe there is another way to do this, more simple?
sorry for bad spelling

This works...
$str = 'abcd 01234 87 01235';
$words = explode(' ', $str);
foreach($words as $key => $word) {
if ($word != implode(range($word[0], chr(ord($word[0]) + strlen($word) - 1)))) {
unset($words[$key]);
}
}
echo implode(' ', $words);
Basically, it grabs the first character of each word, and creates the range of characters which would be the value if the word consisted of sequential characters.
It then does a simple string comparison.
For a more performant version...
$str = 'abcd 01234 87 01235';
$words = explode(' ', $str);
foreach($words as $key => $word) {
foreach(str_split($word) as $index => $char) {
$thisOrd = ord($char);
if ($index > 0 AND $thisOrd !== $lastOrd + 1) {
unset($words[$key]);
break;
}
$lastOrd = $thisOrd;
}
}
echo implode(' ', $words);
Both these examples rely on the ordinals of the characters being sequential for sequential characters. This is the case in ASCII, but I am not sure about other characters.
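The ordinal check from the second version can also be wrapped in a predicate and passed to array_filter(). A sketch with a hypothetical is_consecutive() function, making the same single-byte (ASCII) assumption as the answer:

```php
<?php
// Hypothetical predicate: true when each byte's code is exactly one more
// than the previous byte's code (single-character words always pass).
function is_consecutive(string $word): bool
{
    for ($i = 1, $len = strlen($word); $i < $len; $i++) {
        if (ord($word[$i]) !== ord($word[$i - 1]) + 1) {
            return false;
        }
    }
    return true;
}

$str = 'abcd 01234 87 01235';
// array_filter() keeps the original keys, but implode() ignores them.
echo implode(' ', array_filter(explode(' ', $str), 'is_consecutive'));
```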

We Keep Coding
