I'm getting "illegal offset type" error for line $wordCountArr[$val]['bytotal'] = $wordCountArr[$val]['count'] / $totalWords; of this code. Here's the code in case anyone can help:
<?php
function extractCommonWords($string)
{
$stopWords = array('i','a','about','an','and','are','as','at','be','by','com','de','en','for','from','how','in','is','it','la','of','on','or','that','the','this','to','was','what','when','where','who','will','with','und','the','www');
$string = preg_replace('/ss+/i', '', $string);
$string = trim($string); // trim the string
$string = preg_replace('/[^a-zA-Z0-9 -]/', '', $string); // only take alphanumerical characters, but keep the spaces and dashes too…
$string = strtolower($string); // make it lowercase
preg_match_all('/\b.*?\b/i', $string, $matchWords);
$matchWords = $matchWords[0];
$totalWords = count($matchWords[0]);
foreach ( $matchWords as $key=>$item ) {
if ( $item == '' || in_array(strtolower($item), $stopWords) || strlen($item) <= 3 ) {
unset($matchWords[$key]);
}
}
$wordCountArr = array();
if ( is_array($matchWords) ) {
foreach ( $matchWords as $key => $val ) {
$val = strtolower($val);
if ( !isset($wordCountArr[$val])) {
$wordCountArr[$val] = array();
}
if ( isset($wordCountArr[$val]['count']) ) {
$wordCountArr[$val]['count']++;
} else {
$wordCountArr[$val]['count'] = 1;
}
}
arsort($wordCountArr);
$wordCountArr = array_slice($wordCountArr, 0, 10);
foreach ( $wordCountArr as $key => $val) {
$wordCountArr[$val]['bytotal'] = $wordCountArr[$val]['count'] / $totalWords;
}
}
return $wordCountArr;
}
$text = "AES algo to encrypt files.";
$words = extractCommonWords($text);
echo implode(',', array_keys($words));
?>
Look your entire foreach loop:
Change the variable $wordCountArr to $val:
foreach ( $wordCountArr as $key => $val) {
$val['bytotal'] = $val['count'] / $totalWords;
}
Hope it helps you.
You should be using $key not $val in your final foreach loop.
foreach ( $wordCountArr as $key => $val) {
$wordCountArr[$key]['bytotal'] = $wordCountArr[$key]['count'] / $totalWords;
}
Related
I have an array of words. e.g.,
$pattern = ['in', 'do', 'mie', 'indo'];
I wanna split a words match by the patterns to some ways.
input =
indomie
to output =
$ in, do, mie
$ indo, mie
any suggests?
*ps sorry for bad english. thank you very much!
it was an very interesting question.
Input:-
$inputSting = "indomie";
$pattern = ['in', 'do', 'mie', 'indo','dom','ie','indomi','e'];
Output:-
in,do,mie
in,dom,ie
indo,mie
indomi,e
Approach to this challenge
Get the pattern string length
Get the all the possible combination of matrix
Check whether the pattern matching.
if i understand your question correctly then above #V. Prince answer will work only for finding max two pattern.
function sampling($chars, $size, $combinations = array()) {
if (empty($combinations)) {
$combinations = $chars;
}
if ($size == 1) {
return $combinations;
}
$new_combinations = array();
foreach ($combinations as $combination) {
foreach ($chars as $char) {
$new_combinations[] = $combination . $char;
}
}
return sampling($chars, $size - 1, $new_combinations);
}
function splitbyPattern($inputSting, $pattern)
{
$patternLength= array();
// Get the each pattern string Length
foreach ($pattern as $length) {
if (!in_array(strlen($length), $patternLength))
{
array_push($patternLength,strlen($length));
}
}
// Get all the matrix combination of pattern string length to check the probablbe match
$combination = sampling($patternLength, count($patternLength));
$MatchOutput=Array();
foreach ($combination as $comp) {
$intlen=0;
$MatchNotfound = true;
$value="";
// Loop Through the each probable combination
foreach (str_split($comp,1) as $length) {
if($intlen<=strlen($inputSting))
{
// Check whether the pattern existing
if(in_array(substr($inputSting,$intlen,$length),$pattern))
{
$value = $value.substr($inputSting,$intlen,$length).',';
}
else
{
$MatchNotfound = false;
break;
}
}
else
{
break;
}
$intlen = $intlen+$length;
}
if($MatchNotfound)
{
array_push($MatchOutput,substr($value,0,strlen($value)-1));
}
}
return array_unique($MatchOutput);
}
$inputSting = "indomie";
$pattern = ['in', 'do', 'mie', 'indo','dom','ie','indomi','e'];
$output = splitbyPattern($inputSting,$pattern);
foreach($output as $out)
{
echo $out."<br>";
}
?>
try this one..
and if this solves your concern. try to understand it.
Goodluck.
<?php
function splitString( $pattern, $string ){
$finalResult = $semiResult = $output = array();
$cnt = 0;
# first loop of patterns
foreach( $pattern as $key => $value ){
$cnt++;
if( strpos( $string, $value ) !== false ){
if( implode("",$output) != $string ){
$output[] = $value;
if( $cnt == count($pattern) ) $semiResult[] = implode( ",", $output );
}else{
$semiResult[] = implode( ",", $output );
$output = array();
$output[] = $value;
if( implode("",$output) != $string ){
$semiResult[] = implode( ",", $output );
}
}
}
}
# second loop of patterns
foreach( $semiResult as $key => $value ){
$stackString = explode(",", $value);
/* if value is not yet equal to given string loop the pattern again */
if( str_replace(",", "", $value) != $string ){
foreach( $pattern as $key => $value ){
if( !strpos(' '.implode("", $stackString), $value) ){
$stackString[] = $value;
}
}
if( implode("", $stackString) == $string ) $finalResult[] = implode(",", $stackString); # if result equal to given string
}else{
$finalResult[] = $value; # if value is already equal to given string
}
}
return $finalResult;
}
$pattern = array('in','do','mie','indo','mi','e', 'i');
$string = 'indomie';
var_dump( '<pre>',splitString( $pattern, $string ) );
?>
I want to extract keywords automatically from Bengali text files using php.I have this code for reading a Bengali text file.
<?php
$target_path = $_FILES['uploadedfile']['name'];
header('Content-Type: text/plain;charset=utf-8');
$fp = fopen($target_path, 'r') or die("Can't open CEDICT.");
$i = 0;
while ($line = fgets($fp, 1024))
{
print $line;
$i++;
}
fclose($fp) or die("Can't close file.");
And I found following codes to extract most common 10 keywords but it's not working for Bengali texts. What changes should I make?
function extractCommonWords($string){
$stopWords = array('i','a','about','an','and','are','as','at','be','by','com','de','en','for','from','how','in','is','it','la','of','on','or','that','the','this','to','was','what','when','where','who','will','with','und','the','www');
$string = preg_replace('/\s\s+/i', '', $string); // replace whitespace
$string = trim($string); // trim the string
$string = preg_replace('/[^a-zA-Z0-9 -]/', '', $string); // only take alphanumerical characters, but keep the spaces and dashes too…
$string = strtolower($string); // make it lowercase
preg_match_all('/\b.*?\b/i', $string, $matchWords);
$matchWords = $matchWords[0];
foreach ( $matchWords as $key=>$item ) {
if ( $item == '' || in_array(strtolower($item), $stopWords) || strlen($item) <= 3 ) {
unset($matchWords[$key]);
}
}
$wordCountArr = array();
if ( is_array($matchWords) ) {
foreach ( $matchWords as $key => $val ) {
$val = strtolower($val);
if ( isset($wordCountArr[$val]) ) {
$wordCountArr[$val]++;
} else {
$wordCountArr[$val] = 1;
}
}
}
arsort($wordCountArr);
$wordCountArr = array_slice($wordCountArr, 0, 10);
return $wordCountArr;
}
Please help :(
You should make simple changes:
replace stopwords in $stopWords array with proper Bengali stopwords
remove this string $string = preg_replace('/[^a-zA-Z0-9 -]/', '', $string); because Bengali sybmols doesn't match this pattern
Full code looks like:
<?php
function extractCommonWords($string){
// replace array below with proper Bengali stopwords
$stopWords = array('i','a','about','an','and','are','as','at','be','by','com','de','en','for','from','how','in','is','it','la','of','on','or','that','the','this','to','was','what','when','where','who','will','with','und','the','www');
$string = preg_replace('/\s\s+/i', '', $string); // replace whitespace
$string = trim($string); // trim the string
// remove this preg_replace because Bengali sybmols doesn't match this pattern
// $string = preg_replace('/[^a-zA-Z0-9 -]/', '', $string); // only take alphanumerical characters, but keep the spaces and dashes too…
$string = strtolower($string); // make it lowercase
preg_match_all('/\s.*?\s/i', $string, $matchWords);
$matchWords = $matchWords[0];
foreach ( $matchWords as $key=>$item ) {
if ( $item == '' || in_array(strtolower(trim($item)), $stopWords) || strlen($item) <= 3 ) {
unset($matchWords[$key]);
}
}
$wordCountArr = array();
if ( is_array($matchWords) ) {
foreach ( $matchWords as $key => $val ) {
$val = trim(strtolower($val));
if ( isset($wordCountArr[$val]) ) {
$wordCountArr[$val]++;
} else {
$wordCountArr[$val] = 1;
}
}
}
arsort($wordCountArr);
$wordCountArr = array_slice($wordCountArr, 0, 10);
return $wordCountArr;
}
$string = <<<EOF
টিপ বোঝে না, টোপ বোঝে না টিপ বোঝে না, কেমন বাপু লোক
EOF;
var_dump(extractCommonWords($string), $string);
Output will be:
array(4) {
["বোঝে"]=>
int(2)
["টোপ"]=>
int(1)
["না"]=>
int(1)
["কেমন"]=>
int(1)
}
string(127) "টিপ বোঝে না, টোপ বোঝে না টিপ বোঝে না, কেমন বাপু লোক"
I have this foreach :
<?php foreach ($manufacturers as $key => $manufacturer) {
if($manufacturer->virtuemart_manufacturercategories_id == 1){
$lang = 'heb'; //Hebrew
} else {
$lang = 'eng'; //English
}
//add to letter list
$letter = mb_substr($manufacturer->mf_name, 0, 1, 'UTF-8');
${'html_letters_'.$lang}[] = $letter;
/*
echo '<pre>';
print_r($manufacturer);
echo '</pre>';*/
$link = JROUTE::_('index.php?option=com_virtuemart&view=category&virtuemart_manufacturer_id=' . $manufacturer->virtuemart_manufacturer_id);
${'html_manufacturers_'.$lang} .= '<div class="manufacturer" data-lang="'.$lang.'" data-letter="'.$letter.'"><a href="'.$link.'">';
?>
<?php
if ($manufacturer->images && ($show == 'image' or $show == 'all' )) {
${'html_manufacturers_'.$lang} .= $manufacturer->images[0]->displayMediaThumb('',false);
}
if ($show == 'text' or $show == 'all' ) {
${'html_manufacturers_'.$lang} .= '<div>'.$manufacturer->mf_name.'</div>';
}
${'html_manufacturers_'.$lang} .= '</a>
</div> <!-- /manufacturer -->';
if ($col == $manufacturers_per_row){
$col = 1;
} else {
$col++;
}
}
?>
How i check if i have more then 2 same letters and unset all others but keep one.
The output for letter is :
AABCHKIUKP
I want this will be :
ABCHKIUKP
How i do this ?
EDIT: I have updated all the foreach code. The issue is if i have more then same start letter in name EG:Aroma,Air the loop the take the first letter A and foreach him 2 times, i want to show only one if there even 10 same start letter in name .
Thanks.
When I understand you right ... just use an array
<?php
$letters = array();
foreach ($manufacturers as $key => $manufacturer) {
....
$letter = mb_substr($manufacturer->mf_name, 0, 1, 'UTF-8');
$letters[] = $letter;
}
$uni = array_unique($letters)
echo implode('',$uni);
$s = 'AABCHKIUKP';
$letters = $letters = preg_split('/(?<!^)(?!$)/u', $s); //utf8
$prev_letter = '';
$temp = '';
foreach($letters as $key => $letter) {
if (!($letter === $prev_letter)) {
$temp .= $letter;
}
$prev_letter = $letter;
}
echo $temp;
Result:
ABCHKIUKP
Info about breaking a utf8 string to an array you can find at comments here mb_split.
To strip ONLY same consecutive chars you can use this:
function stripConsecutiveChars( $string ) {
// creates an array with chars from the string
$charArray = str_split( $string );
// saves the last char thats on the new string
$lastChar = "";
// variable for the new string to return
$returnString = "";
foreach( $charArray as $char ) {
if( $char === $lastChar ) continue;
$lastChar = $char;// save current char
$returnString .= $char;// concat current char to new string
}
return $returnString;
}
$string = "AABCHKIUKP";
echo stripConsecutiveChars( $string );
In your example you could try this:
...
$letter = mb_substr( $manufacturer->mf_name, 0, 1, 'UTF-8' );
$lastLetter = end( ${'html_letters_' . $lang} );//point to last element in array
reset( ${'html_letters_' . $lang} );// reset the pointer
if( $letter === $lastLetter ) {
// i guess you just want to continue?
continue;
}
...
OUTPUT:
ABCHKIUKP
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
I need to extract keywords along with their frequency count from a text file using php. I have found one code that outputs only keywords eg. some, text, machines, vending. I also need frequency count along with these keywords eg. some 3, text 2, machines 1, vending 1. Can you suggest the necessary modifications.
function extractCommonWords($string)
{
$stopWords = array('i','a','about','an','and','are','as','at','be','by','com','de','en','for','from','how','in','is','it','la','of','on','or','that','the','this','to','was','what','when','where','who','will','with','und','the','www');
$string = preg_replace('/ss+/i', '', $string);
$string = trim($string); // trim the string
$string = preg_replace('/[^a-zA-Z0-9 -]/', '', $string); // only take alphanumerical characters, but keep the spaces and dashes too…
$string = strtolower($string); // make it lowercase
preg_match_all('/\b.*?\b/i', $string, $matchWords);
$matchWords = $matchWords[0];
$totalWords = count($matchWords[0]);
foreach ( $matchWords as $key=>$item )
{
if ( $item == '' || in_array(strtolower($item), $stopWords) || strlen($item) <= 3 )
{
unset($matchWords[$key]);
}
}
$wordCountArr = array();
if ( is_array($matchWords) )
{
foreach ( $matchWords as $key => $val )
{
$val = strtolower($val);
if ( !isset($wordCountArr[$val]))
{
$wordCountArr[$val] = array();
}
if ( isset($wordCountArr[$val]['count']) )
{
$wordCountArr[$val]['count']++;
}
else
{
$wordCountArr[$val]['count'] = 1;
}
}
arsort($wordCountArr);
$wordCountArr = array_slice($wordCountArr, 0, 10);
foreach ($wordCountArr as $key => $val)
{
$val['bytotal'] = $val['count'] / $totalWords;
}
}
return $wordCountArr;
}
$text = "This is some text. This is some text. Vending Machines are great.";
$words = extractCommonWords($text);
echo implode(',', array_keys($words));
function extractCommonWords($string)
{
$stopWords = array('i','a','about','an','and','are','as','at','be','by','com','de','en','for','from','how','in','is','it','la','of','on','or','that','the','this','to','was','what','when','where','who','will','with','und','the','www');
$string = preg_replace('/ss+/i', '', $string);
$string = trim($string);
$string = preg_replace('/[^a-zA-Z0-9 -]/', '', $string); // only take alphanumerical characters, but keep the spaces and dashes too…
$string = strtolower($string); // make it lowercase
echo $string."<br>";
preg_match_all('/\b.*?\b/i', $string, $matchWords);
$matchWords = $matchWords[0];
$totalWords = count($matchWords[0]);
foreach ( $matchWords as $key=>$item ){
if ( $item == '' || in_array(strtolower($item), $stopWords) || strlen($item) <= 3 ) {
unset($matchWords[$key]);
}
}
$wordCountArr = array();
if ( is_array($matchWords) ) {
foreach ( $matchWords as $key => $val ) {
$val = strtolower($val);
if (isset($wordCountArr[$val])){
$wordCountArr[$val] += 1;
} else {
$wordCountArr[$val] = 1;
}
}
arsort($wordCountArr);
}
}
$text = "This is some text. This is some text. Vending Machines are great.";
$words = extractCommonWords($text);
foreach ($words as $word => $count){
print ($word . " was found " . $count . " time(s)<br> ");
}
I have some data in this format:
even--heaped<br />
even--trees<br />
hardrocks-cocked<br />
pebble-temple<br />
heaped-feast<br />
trees-feast<br />
and I want to end up with an output so that all lines with the same words get added to each other with no repeats.
even--heaped--trees--feast<br />
hardrocks--cocked<br />
pebbles-temple<br />
i tried a loop going through both arrays but its not the exact result I want. for an array $thing:
Array ( [0] => even--heaped [1] => even--trees [2] => hardrocks--cocked [3] => pebbles--temple [4] => heaped--feast [5] => trees--feast )
for ($i=0;$i<count($thing);$i++){
for ($j=$i+1;$j<count($thing);$j++){
$first = explode("--",$thing[$i]);
$second = explode("--",$thing[$j]);
$merge = array_merge($first,$second);
$unique = array_unique($merge);
if (count($unique)==3){
$fix = implode("--",$unique);
$out[$i] = $thing[$i]."--".$thing[$j];
}
}
}
print_r($out);
but the result is:
Array ( [0] => even--heaped--heaped--feast [1] => even--trees--trees--feast [4] => heaped--feast--trees--feast )
which is not what i want. Any suggestions (sorry about the terrible variable names).
This might help you:
$in = array(
"even--heaped",
"even--trees",
"hardrocks--cocked",
"pebbles--temple",
"heaped--feast",
"trees--feast"
);
$clusters = array();
foreach( $in as $item ) {
$words = explode("--", $item);
// check if there exists a word in an existing cluster...
$check = false;
foreach($clusters as $k => $cluster) {
foreach($words as $word) {
if( in_array($word, $cluster) ) {
// add the words to this cluster
$clusters[$k] = array_unique( array_merge($cluster, $words) );
$check = true;
break;
}
}
}
if( !$check ) {
// create a new cluster
$clusters[] = $words;
}
}
// merge back
$out = array();
foreach( $clusters as $cluster ) {
$out[] = implode("--", $cluster);
}
pr($out);
Try this code:
<?php
$data = array ("1--2", "3--1", "4--5", "2--6");
$n = count($data);
$elements = array();
for ($i = 0; $i < $n; ++$i)
{
$split = explode("--", $data[$i]);
$word_num = NULL;
foreach($split as $word_key => $word)
{
foreach($elements as $key => $element)
{
if(isset($element[$word]))
{
$word_num = $key;
unset($split[$word_key]);
}
}
}
if(is_null($word_num))
{
$elements[] = array();
$word_num = count($elements) - 1;
}
foreach($split as $word_key => $word)
{
$elements[$word_num][$word] = 1;
}
}
//combine $elements into words
foreach($elements as $key => $value)
{
$words = array_keys($value);
$elements[$key] = implode("--", $words);
}
var_dump($elements);
It uses $elements as an array of hashes to store the individual unique words as keys. Then combines the keys to create appropriate words.
Prints this:
array(2) {
[0]=>
string(10) "1--2--3--6"
[1]=>
string(4) "4--5"
}
Here is a solution with a simple control flow.
<?php
$things = array('even--heaped', 'even--trees', 'hardrocks--cocked',
'pebble--temple', 'heaped--feast' ,'trees--feast');
foreach($things as $thing) {
$str = explode('--', $thing);
$first = $str[0];
$second = $str[1];
$i = '0';
while(true) {
if(!isset($a[$i])) {
$a[$i] = array();
array_push($a[$i], $first);
array_push($a[$i], $second);
break;
} else if(in_array($first, $a[$i]) && !in_array($second, $a[$i])) {
array_push($a[$i], $second);
break;
} else if(!in_array($first, $a[$i]) && in_array($second, $a[$i])) {
array_push($a[$i], $first);
break;
} else if(in_array($first, $a[$i]) && in_array($second, $a[$i])) {
break;
}
$i++;
}
}
print_r($a);
?>
It seems you have already selected user4035's answer as best.
But i feel this one is optimized(Please correct me if i am wrong): eval.in
Code:
$array = Array ( 'even--heaped' , 'even--trees' ,'hardrocks--cocked' , 'pebbles--temple' , 'heaped--feast' , 'trees--feast' );
print "Input: ";
print_r($array);
for($j=0;$j < count($array);$j++){
$len = count($array);
for($i=$j+1;$i < $len;$i++){
$tmp_array = explode("--", $array[$i]);
$pos1 = strpos($array[$j], $tmp_array[0]);
$pos2 = strpos($array[$j], $tmp_array[1]);
if (!($pos1 === false) && $pos2 === false){
$array[$j] = $array[$j] . '--'.$tmp_array[1];unset($array[$i]);
}elseif(!($pos2 === false) && $pos1 === false){
$array[$j] = $array[$j] . '--'.$tmp_array[0];unset($array[$i]);
}elseif(!($pos2 === false) && !($pos1 === false)){
unset($array[$i]);
}
}
$array = array_values($array);
}
print "\nOutput: ";
print_r($array);