I am a fairly comfortable PHP programmer and have very little Python experience. I am trying to help a buddy with his project. The code is easy enough to write in PHP, and I have most of it ported over, but I need a bit of help completing the translation if possible.
The goal is to:
- Generate a list of basic objects with UIDs.
- Randomly select a few items to create a second list, keyed by UID, containing new properties.
- Test for intersections between the two lists to alter the response accordingly.
The following is a working example of what I am trying to code in Python:
<?php
srand(3234);
class GameObject {          // Basic item description ("Object" is a reserved name in PHP 7.2+)
    public $x    = null;
    public $y    = null;
    public $name = null;
    public $uid  = null;
}
class Trace {               // Used to update status or move position
    # public $x     = null;
    # public $y     = null;
    # public $floor = null;
    public $display = null; // Currently all we care about is controlling display
}
##########################################################
$objects    = array();
$dirtyItems = array();
#CREATION OF ITEMS########################################
for ($i = 0; $i < 10; $i++) {
    $objects[] = new GameObject();
    $objects[$i]->uid  = rand();
    $objects[$i]->x    = rand(1, 30);
    $objects[$i]->y    = rand(1, 30);
    $objects[$i]->name = "Item$i";
}
##########################################################
#RANDOM ITEM REMOVAL######################################
foreach ($objects as $item) {
    if (rand(1, 10) <= 2) {  // Simulate full code with 20% chance to remove an item.
        $derp = new Trace();
        $derp->display = false;
        $dirtyItems[$item->uid] = $derp; //# <- THIS IS WHERE I NEED THE PYTHON HELP
    }
}
##########################################################
display();
function display() {
    global $objects, $dirtyItems;
    foreach ($objects as $key => $value) {       // Iterate object list
        if (!isset($dirtyItems[$value->uid]))     // Print description
            echo "<br />$value->name is at ($value->x, $value->y) ";
        else                                      // or skip if on second list.
            echo "<br />Player took item $value->uid";
    }
}
?>
So I really have most of it sorted; I am just having trouble with Python's version of an associative array: a list whose keys match the unique IDs of the items in the main list.
The output from the above code should look similar to:
Player took item 27955
Player took item 20718
Player took item 10277
Item3 is at (8, 4)
Item4 is at (11, 13)
Item5 is at (3, 15)
Item6 is at (20, 5)
Item7 is at (24, 25)
Item8 is at (12, 13)
Player took item 30326
My Python skills are still rough, but this is more or less the same code block as above.
I've been looking at and trying to use the list methods .insert() and .__setitem__(), but they are not quite working as expected.
This is my current Python code, not yet fully functional:
import random
import math
# Begin New Globals
dirtyItems = {}  # This is where we store the object info, keyed by UID
class SimpleClass:  # This is what we store the object info as
    pass
# End New Globals
# Existing definitions
objects = []
class Object:
    def __init__(self, x, y, name, uid):
        self.x = x        # X and Y positioning
        self.y = y
        self.name = name  # What will display on a 'look' command.
        self.uid = uid
def do_items():
    for count in range(10):
        X = random.randrange(1, 20)
        Y = random.randrange(1, 20)
        UID = int(math.floor(random.random() * 10000))
        item = Object(X, Y, 'Item' + str(count), UID)
        # This is the new part: we defined the item, now we see if the player has moved it
        if UID in dirtyItems:
            print('Player took', UID)
        else:
            objects.append(item)  # Untouched by the player: back to existing code
# place_items()
random.seed(1234)
do_items()
for key in objects:
    print("%s at %s %s." % (key.name, key.x, key.y))
    if random.randint(1, 10) <= 1:
        print(key.name, 'should be missing below')
        x = SimpleClass()
        x.display = False
        dirtyItems[key.uid] = x
print(' ')
objects = []
random.seed(1234)
do_items()
for key in objects:
    print("%s at %s %s." % (key.name, key.x, key.y))
print('Done.')
Sorry for the long post, but I wanted to be thorough and provide both sets of full code. The PHP works perfectly, and the Python is close. If anyone can point me in the right direction, it would be a huge help.
dirtyItems.insert(key.uid, x) is what I tried to use to make a list work as an associative array.
Edit: minor correction.
You're declaring dirtyItems as a list (an array) instead of a dictionary. In Python they're distinct types.
Do dirtyItems = {} instead.
Make a dictionary instead of a list:
import random
import math
dirtyItems = {}
Then you can use it like this:
dirtyItems[key.uid] = x
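For reference, here is a minimal Python 3 sketch of the whole PHP example using a dict keyed by UID. The class and function names (GameObject, Trace, build_world, display) are illustrative, not from the original project, and because Python's random generator differs from PHP's rand(), the seed will not reproduce the exact PHP output.

```python
import random


class GameObject:
    """Basic item description (mirrors the PHP Object class)."""

    def __init__(self, x, y, name, uid):
        self.x = x
        self.y = y
        self.name = name
        self.uid = uid


class Trace:
    """Status update for an item; only `display` is used here."""

    def __init__(self, display=True):
        self.display = display


def build_world(seed=3234, count=10, removal_chance=0.2):
    """Create `count` items and mark roughly `removal_chance` of them as taken."""
    rng = random.Random(seed)
    objects = [
        GameObject(rng.randint(1, 30), rng.randint(1, 30), f"Item{i}", rng.randrange(1, 32768))
        for i in range(count)
    ]
    dirty_items = {}  # uid -> Trace: Python's version of $dirtyItems[$item->uid] = $derp
    for item in objects:
        if rng.random() < removal_chance:
            dirty_items[item.uid] = Trace(display=False)
    return objects, dirty_items


def display(objects, dirty_items):
    """Return one line per item, like the PHP display() function."""
    lines = []
    for obj in objects:
        if obj.uid in dirty_items:  # key lookup replaces the PHP isset() check
            lines.append(f"Player took item {obj.uid}")
        else:
            lines.append(f"{obj.name} is at ({obj.x}, {obj.y})")
    return lines


for line in display(*build_world()):
    print(line)
```

The `in` test on a dict is the idiomatic replacement for catching KeyError; `dirty_items.get(uid)` works too if you also need the stored Trace object.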
The problem -> I need to find out the percentage of overlap between two routes.
Solution tried so far -> I tried passing the origin and destination (along with the key of course) to the following URL
https://maps.googleapis.com/maps/api/directions/json
and parsed the JSON response accordingly. Here is my code snippet:
$endpoint = 'https://maps.googleapis.com/maps/api/directions/json?origin='.$startLatitude1.','.$startLongitude1.'&destination='.$endLatitude1.','.$endLongitude1.'&key='.$mykey;
$json1 = file_get_contents($endpoint);
$data1 = json_decode($json1);
if ($data1->status === 'OK') {
    $endpoint = 'https://maps.googleapis.com/maps/api/directions/json?origin='.$startLatitude2.','.$startLongitude2.'&destination='.$endLatitude2.','.$endLongitude2.'&key='.$mykey;
    $json2 = file_get_contents($endpoint);
    $data2 = json_decode($json2);
    // Route 2: map each step's encoded polyline to that step's distance (metres)
    $polyline = array();
    if ($data2->status === 'OK') {
        $route2 = $data2->routes[0];
        foreach ($route2->legs as $leg2) {
            foreach ($leg2->steps as $step2) {
                $polyline[$step2->polyline->points] = $step2->distance->value;
            }
        }
    }
    // Route 1: add up the steps whose polyline also appears in route 2
    $overlap = 0;
    $totalDistance = 0;
    $route1 = $data1->routes[0];
    foreach ($route1->legs as $leg1) {
        $totalDistance += $leg1->distance->value; // sum all legs, not just the last one
        foreach ($leg1->steps as $step1) {
            if (array_key_exists($step1->polyline->points, $polyline)) {
                $overlap += $step1->distance->value;
            }
        }
    }
    echo 'Total Distance -> '.$totalDistance.'<br>';
    echo 'Overlap -> '.$overlap.'<br>';
}
So we first traverse route 2 and store each step's polyline as a key in an associative array, with the step's distance as the value. Next we traverse route 1 and check whether each polyline from route 1 is already present in that array.
The problem -> This only works when both routes share whole steps; it fails, for example, when the roads are straight. Let's assume there are 4 points, A, B, C and D, all on a straight line in that order. Person X wants to go from A to D, whereas person Y wants to go from B to C. So there is an overlap of B-C. But because the polylines will never match (the origin and destination being different for X and Y), my code will not detect any overlap.
Is there any other way to do this?
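One possible direction, sketched here as an assumption rather than a Directions API feature, is to stop comparing encoded strings and compare geometry instead: decode the polylines into points, chop route 1 into short pieces, and count a piece as overlapping when it lies within a small tolerance of route 2. The function names, the 15 m tolerance and the 10 m piece length below are all hypothetical choices; the decoder follows Google's published encoded-polyline format. The input could be the decoded step polylines your code already reads, concatenated per route.

```python
import math

EARTH_RADIUS_M = 6371000.0


def decode_polyline(encoded):
    """Decode a Google encoded polyline string into a list of (lat, lng) tuples."""
    points, index, lat, lng = [], 0, 0, 0
    while index < len(encoded):
        deltas = []
        for _ in range(2):  # latitude delta, then longitude delta
            shift = result = 0
            while True:
                b = ord(encoded[index]) - 63
                index += 1
                result |= (b & 0x1F) << shift
                shift += 5
                if b < 0x20:
                    break
            deltas.append(~(result >> 1) if result & 1 else result >> 1)
        lat += deltas[0]
        lng += deltas[1]
        points.append((lat / 1e5, lng / 1e5))
    return points


def to_xy(point, lat0):
    """Project (lat, lng) to local metres (equirectangular; fine for short distances)."""
    lat, lng = point
    return (EARTH_RADIUS_M * math.radians(lng) * math.cos(math.radians(lat0)),
            EARTH_RADIUS_M * math.radians(lat))


def point_segment_distance(p, a, b):
    """Distance from point p to segment a-b in the projected plane."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def overlap_meters(route1, route2, tolerance_m=15.0, piece_m=10.0):
    """Length of route1 (metres) that runs within tolerance_m of route2."""
    lat0 = route1[0][0]
    r1 = [to_xy(p, lat0) for p in route1]
    r2 = [to_xy(p, lat0) for p in route2]
    segments2 = list(zip(r2, r2[1:]))
    overlap = 0.0
    for a, b in zip(r1, r1[1:]):
        length = math.hypot(b[0] - a[0], b[1] - a[1])
        pieces = max(1, math.ceil(length / piece_m))
        for i in range(pieces):
            t = (i + 0.5) / pieces  # test the midpoint of each short piece
            mid = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
            if any(point_segment_distance(mid, c, d) <= tolerance_m for c, d in segments2):
                overlap += length / pieces
    return overlap
```

For the A-B-C-D case, route X = [A, D] and route Y = [B, C] now give an overlap close to the length of B-C, even though no encoded polyline matches. Dividing by the total length of route 1 gives the percentage; the tolerance trades false matches on parallel roads against missed matches caused by GPS-level noise.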
In a previous post, "Using R, how to reference variable variables (or variables variable) a la PHP", I asked a question about something in R analogous to PHP's $$:
Using R stats, I want to access a variable variable scenario similar to PHP double-dollar-sign technique: http://php.net/manual/en/language.variables.variable.php
Specifically, I am looking for a function in R that is equivalent to $$ in PHP.
The get() answer works for strings (characters).
lapply is a way to loop over lists.
Or I can loop over the names and get the values:
for(name in names(vars))  # vars is a named list
{
  val = vars[[name]];
}
The question of an R equivalent of PHP's $$ still hasn't been answered, although lapply solved what I needed at the moment. What I'm after is a `$$` <- function that allows any variable type to be evaluated. That is still the question.
UPDATES
> mlist = list('four'="score", 'seven'="years");
> str = 'mlist$four'
> mlist
$four
[1] "score"
$seven
[1] "years"
> str
[1] "mlist$four"
> get(str)
Error in get(str) : object 'mlist$four' not found
> mlist$four
[1] "score"
Or how about attributes of an object, such as mobj#index?
UPDATES #2
So let's put specific context on the need. I was hacking the texreg package to build custom LaTeX output of 24 regression models for a research paper. I am using plm fixed effects, and the default output of texreg uses dcolumn to center, which I don't like (I prefer r@{}l), so I wanted to write my own template. My purpose in coding this is to write extensible code that I can use again and again. I can rebuild my 24 tables across 4 pages in seconds, so if the data change, or if I want to tweak the function, I immediately have a nice answer. The power of abstraction.
As I hacked this, I wanted to get not only the number of observations but also the number of groups, which can be any user-defined index. In my case it is "country" (wait for it: hence the need for variable variables).
If I look at the structure, what I want is right there: model$model#index$country, which it would be nice to simply call as $$('model$model#index$country'), where I can easily build the string using paste. That doesn't exist, so this is my workaround:
getIndexCount = function(model, key="country")
{
  # the panel index is stored as the "index" attribute of the model frame
  myA = attr(summary(model)$model, "index");
  # position of the key column, or NA if key is not one of the index columns
  idx = match(key, colnames(myA));
  if(!is.na(idx))
  {
    length(unique(myA[, idx]));
  } else {
    FALSE;
  }
}
UPDATES #3
Using R, on the command line, I can type in a string and it gets evaluated. Why can't that internal function be accessed directly, so that the element that gets printed to the screen can be captured instead?
There is no equivalent function in R. get() works for all types, not just strings.
Here is what I came up with, after chatting with the R-bug group, and getting some ideas from them. KUDOS!
`$$` <- function(str)
{
  E = unlist( strsplit(as.character(str),"[#]") );
  k = length(E);
  if(k==1)
  {
    eval(parse(text=str));
  } else {
    # k = 2
    nstr = paste("attributes(",E[1],")",sep="");
    nstr = paste(nstr,'$',E[2],sep="");
    if(k>2) {
      for(i in 3:k)
      {
        nstr = paste("attributes(",nstr,")",sep="");
        nstr = paste(nstr,'$',E[i],sep="");
      }
    }
    `$$`(nstr);
  }
}
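To see exactly which expression `$$` ends up evaluating, the string-rewriting step can be traced outside R. The minimal Python sketch below mirrors only the paste() calls: every '#' segment wraps the expression built so far in attributes(...)$name. The function name rewrite_access_path is hypothetical, and the sketch builds R source text without evaluating anything.

```python
def rewrite_access_path(path):
    """Turn 'a$b#c#d' into the R expression that `$$` finally evaluates."""
    parts = path.split("#")
    expr = parts[0]
    for name in parts[1:]:
        # same as: nstr = paste("attributes(", nstr, ")$", name, sep="")
        expr = f"attributes({expr})${name}"
    return expr


print(rewrite_access_path("myS$terms#factors#dimnames"))
# attributes(attributes(myS$terms)$factors)$dimnames
```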
Below are some example use cases, where I can directly access what str(obj) shows. This extends the utility of the '$' operator by also allowing '#' for attributes.
model = list("four" = "score", "seven"="years");
str = 'model$four';
result = `$$`(str);
print(result);
matrix = matrix(rnorm(1000), ncol=25);
str='matrix[1:5,8:10]';
result = `$$`(str);
print(result);
## Annette Dobson (1990) "An Introduction to Generalized Linear Models".
## Page 9: Plant Weight Data.
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14);
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69);
group <- gl(2, 10, 20, labels = c("Ctl","Trt"));
weight <- c(ctl, trt);
lm.D9 <- lm(weight ~ group);
lm.D90 <- lm(weight ~ group - 1); # omitting intercept
myA = anova(lm.D9); myA; str(myA);
str = 'myA#heading';
result = `$$`(str);
print(result);
myS = summary(lm.D90); myS; str(myS);
str = 'myS$terms#factors';
result = `$$`(str);
print(result);
str = 'myS$terms#factors#dimnames';
result = `$$`(str);
print(result);
str = 'myS$terms#dataClasses#names';
result = `$$`(str);
print(result);
After realizing the backticks can be a bit tedious, I updated the function and called it access:
access <- function(str)
{
  E = unlist( strsplit(as.character(str),"[#]") );
  k = length(E);
  if(k==1)
  {
    eval(parse(text=str));
  } else {
    # k = 2
    nstr = paste("attributes(",E[1],")",sep="");
    nstr = paste(nstr,'$',E[2],sep="");
    if(k>2) {
      for(i in 3:k)
      {
        nstr = paste("attributes(",nstr,")",sep="");
        nstr = paste(nstr,'$',E[i],sep="");
      }
    }
    access(nstr);
  }
}
jQuery 1.6.2 / Firefox 6.0.1
OK, I'm working on this shipment manager interface, and when the page is loaded each table row (tr) is assigned an id "shipment_XXXXX", where XXXXX is the id of the shipment from the database.
All the data regarding the shipment is, in PHP, set to a multidimensional associative array that contains "shipmentItems" and "pkgs", among other things that are irrelevant. The shipmentItems object is a regular numeric array, where each element has several associative values such as "name", "qty" and "price". For example:
$shipmentItems[0]["name"] = "item 1";
$shipmentItems[0]["qty"] = 5;
$shipmentItems[0]["price"] = 20.00;
$shipmentItems[1]["name"] = "item 2";
$shipmentItems[1]["qty"] = 3;
$shipmentItems[1]["price"] = 5.00;
This array indicates all of the items that are part of this shipment in whole.
The other array pkgs is a list of each package in the shipment and each package has a packing_slip object/associative array. Example:
PACKAGE #1
$pkgs[0]['packing_slip'][0]['name'] = "item 1";
$pkgs[0]['packing_slip'][0]['qty'] = 3;
$pkgs[0]['packing_slip'][0]['price'] = 20.00;
$pkgs[0]['packing_slip'][1]['name'] = "item 2";
$pkgs[0]['packing_slip'][1]['qty'] = 1;
$pkgs[0]['packing_slip'][1]['price'] = 50.00;
PACKAGE #2
$pkgs[1]['packing_slip'][0]['name'] = "item 1";
$pkgs[1]['packing_slip'][0]['qty'] = 2;
$pkgs[1]['packing_slip'][0]['price'] = 20.00;
$pkgs[1]['packing_slip'][1]['name'] = "item 2";
$pkgs[1]['packing_slip'][1]['qty'] = 2;
$pkgs[1]['packing_slip'][1]['price'] = 50.00;
You'll see that the pkgs array has the full shipment item list on the packing slip of each package. If you add the index-0 qty from both packages, you'll see that it adds up to the full shipment qty for that item line.
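That invariant (each item's packed quantities add up to its shipment quantity) is easy to check mechanically. The sketch below is in Python rather than the page's PHP/JS, reuses the example data above, and assumes items are matched by "name"; packing_slip_mismatches is a hypothetical helper name.

```python
from collections import defaultdict

shipment_items = [
    {"name": "item 1", "qty": 5, "price": 20.00},
    {"name": "item 2", "qty": 3, "price": 5.00},
]
pkgs = [
    {"packing_slip": [{"name": "item 1", "qty": 3, "price": 20.00},
                      {"name": "item 2", "qty": 1, "price": 50.00}]},
    {"packing_slip": [{"name": "item 1", "qty": 2, "price": 20.00},
                      {"name": "item 2", "qty": 2, "price": 50.00}]},
]


def packing_slip_mismatches(shipment_items, pkgs):
    """Return {name: (shipment qty, packed qty)} for item lines whose totals differ."""
    packed = defaultdict(int)
    for pkg in pkgs:
        for line in pkg["packing_slip"]:
            packed[line["name"]] += line["qty"]
    return {item["name"]: (item["qty"], packed[item["name"]])
            for item in shipment_items
            if packed[item["name"]] != item["qty"]}


print(packing_slip_mismatches(shipment_items, pkgs))  # {} when every line adds up
```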
This set of data gets converted to a JSON string by PHP and tucked into a hidden form element within its corresponding row.
After the page is loaded, jQuery goes through each hidden JSON element, parses the JSON string to an object, then attaches the objects to TR.data('shipmentItems') and TR.data('pkgs') for each shipment in the list.
This is where things get funky...
I'm doing a function where the user can add a new package to the shipment. When they do this they are prompted to specify which package has how many quantities of each item in the whole shipment. They essentially lay out the packing slips.
The function they execute after they have mapped out the quantities recreates the pkgs array (object) on the fly, which it gets from its row's .data('pkgs') container, and then re-attaches the pkgs object to the data('pkgs') container.
I have logged the output of this function heavily and the quantities are all being assigned to the proper values see here:
var shipmentItems = $('#shipment_'+shipid).data('shipmentItems');
var pkgs = $('#shipment_'+shipid).data('pkgs');
var pkgnum = pkgs.length; // always returns one higher than last index.
// add new pkg to array
pkgs[pkgnum] = new Object();
pkgs[pkgnum].weight = weight;
console.log("("+pkgnum+") pkgs length: " + pkgs.length);
// overwrite packing slip data.
for(var x = 0; x < pkgs.length; x++) {
    var curPS = new Array();
    var curins = 0;
    for(var y = 0; y < shipmentItems.length; y++) {
        var curqty = parseInt($('#pkgqty-'+y+'-'+x).val());
        curins += curqty * shipmentItems[y]['price'];
        curPS.push(shipmentItems[y]);
        console.log("["+y+"] before: " + curPS[y]['qty']);
        curPS[y]['qty'] = curqty;
        console.log("["+y+"] after: " + curPS[y]['qty']);
    }
    console.log(curPS[0]['qty'] + ' - ' + curPS[1]['qty']);
    pkgs[x].packing_slip = curPS;
    pkgs[x].insurance = curins;
}
// write pkgs data()
$('#shipment_'+shipid).removeData('pkgs');
$('#shipment_'+shipid).data('pkgs', pkgs);
The log output of the above is as follows:
(1) pkgs length: 2
[0] before: 3
[0] after: 2
[1] before: 4
[1] after: 3
2 - 3 // value of curPS[0]['qty'] and curPS[1]['qty'] for pkg#1 - pkgs[0] is set to curPS at this point.
[0] before: 2
[0] after: 1
[1] before: 3
[1] after: 1
1 - 1 // value of curPS[0]['qty'] and curPS[1]['qty'] for pkg#2 - pkgs[1] is set to curPS at this point.
This looks like it worked, right? Wrong. After the function completes, I have a button I can push that prints out all the data() vars for a row. Not only are the qty values of every single pkg['packing_slip'][x] item set to 1, but if I look at the shipmentItems from this same object log, the qty values for all of the shipmentItems have also been reset to 1. This is odd, because at no point in the code does shipmentItems ever get overwritten, so it should still be exactly the same as it was when the page loaded...
Does anyone have any ideas what's going on here?
Maybe it's because you push shipmentItems[y] by reference: curPS.push(shipmentItems[y]);
Try to pass it by value: curPS.push(shipmentItems[y].slice());
OK, I ended up getting this working thanks to your (Alon's) suggestion pointing me in the right direction. The slice() method did not work outright because the array elements I'm slicing out contained objects, so the deeper objects were still being passed as references rather than being copied. After some searching around, I found that the jQuery.extend() method was what I needed to copy the object arrays. Thanks again!
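The same aliasing happens in any language where objects are passed around by reference. This Python sketch (helper names are hypothetical; data taken from the example above) shows pushing the same dict into a packing slip and then changing qty silently changes the shipment item too, while a deep copy keeps it intact. In jQuery, passing true as the first argument, as in jQuery.extend(true, {}, item), requests the equivalent deep copy.

```python
import copy


def make_shipment_items():
    return [{"name": "item 1", "qty": 5, "price": 20.00},
            {"name": "item 2", "qty": 3, "price": 5.00}]


def build_packing_slip(shipment_items, quantities, copy_items=True):
    """Build one package's slip; copy_items=False reproduces the shared-reference bug."""
    slip = []
    for item, qty in zip(shipment_items, quantities):
        entry = copy.deepcopy(item) if copy_items else item  # same object unless copied
        entry["qty"] = qty
        slip.append(entry)
    return slip


items = make_shipment_items()
build_packing_slip(items, [2, 3], copy_items=False)
print(items[0]["qty"])  # 2: the "original" shipment item was modified

items = make_shipment_items()
build_packing_slip(items, [2, 3])
print(items[0]["qty"])  # 5: the shipment item is untouched
```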
I am trying to convert the following Python code to PHP. Can you please explain to me what is wrong in my PHP code, because I do not get the same results? If you need example data, please let me know.
# Returns a distance-based similarity score for person1 and person2
def sim_distance(prefs,person1,person2):
    # Get the list of shared_items
    si={}
    for item in prefs[person1]:
        if item in prefs[person2]: si[item]=1
    # if they have no ratings in common, return 0
    if len(si)==0: return 0
    # Add up the squares of all the differences
    sum_of_squares=sum([pow(prefs[person1][item]-prefs[person2][item],2)
                        for item in prefs[person1] if item in prefs[person2]])
    return 1/(1+sum_of_squares)
My PHP code:
$sum = 0.0;
foreach($arr[$person1] as $item => $val)
{
    if(array_key_exists($item, $arr[$person2]))
    {
        $p = sqrt(pow($arr[$person1][$item] - $arr[$person2][$item], 2));
        $sum = $sum + $p;
    }
}
$sum = 1 / (1 + $sum);
echo $sum;
Thanks for helping!
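The main difference is that you've added sqrt to the PHP code: sqrt(pow(x, 2)) is just the absolute value of x, so the PHP adds up absolute differences instead of squared differences. The PHP also doesn't handle the special case of no prefs in common, which gives 0 in the Python version and 1 in the PHP version.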
I tested both versions and those are the only differences I found.
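Both differences can be reproduced side by side in Python. The sample prefs below are made up for illustration, and php_style is a hypothetical mirror of the PHP attempt, not code from the question.

```python
import math


def sim_distance(prefs, person1, person2):
    """Python version from the question: 1 / (1 + sum of squared differences)."""
    shared = [item for item in prefs[person1] if item in prefs[person2]]
    if not shared:
        return 0
    sum_of_squares = sum((prefs[person1][i] - prefs[person2][i]) ** 2 for i in shared)
    return 1 / (1 + sum_of_squares)


def php_style(prefs, person1, person2):
    """Mirror of the PHP attempt: sqrt(x ** 2) == abs(x), and no empty-overlap check."""
    total = 0.0
    for item in prefs[person1]:
        if item in prefs[person2]:
            total += math.sqrt((prefs[person1][item] - prefs[person2][item]) ** 2)
    return 1 / (1 + total)


prefs = {"a": {"x": 1, "y": 4}, "b": {"x": 3, "y": 1}, "c": {"z": 2}}
print(sim_distance(prefs, "a", "b"), php_style(prefs, "a", "b"))  # 1/14 vs 1/6
print(sim_distance(prefs, "a", "c"), php_style(prefs, "a", "c"))  # 0 vs 1.0
```

For a and b the squared differences are 4 and 9 (sum 13), while the absolute differences are 2 and 3 (sum 5), which is why the scores diverge.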
This is as close as I could get to a direct translation (untested):
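function sim_distance($prefs, $person1, $person2) {
    // Get the list of shared items (keys present in both people's ratings)
    $si = array();
    foreach($prefs[$person1] as $item => $rating) {
        if(array_key_exists($item, $prefs[$person2])) $si[$item] = 1;
    }
    // If they have no ratings in common, return 0
    if(count($si) == 0) return 0;
    // Add up the squares of all the differences
    $squares = array();
    foreach($prefs[$person1] as $item => $rating) {
        if(array_key_exists($item, $prefs[$person2])) {
            $squares[] = pow($prefs[$person1][$item] - $prefs[$person2][$item], 2);
        }
    }
    $sum_of_squares = array_sum($squares);
    return 1 / (1 + $sum_of_squares);
}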
I don't really know what you're trying to do, or if I've interpreted the indentation correctly...but maybe this'll help. I'm assuming your data structures have the same layout as in the python script.
Oh, and I'm interpreting the Python as this:
def sim_distance(prefs,person1,person2):
    # Get the list of shared_items
    si={}
    for item in prefs[person1]:
        if item in prefs[person2]: si[item]=1
    # if they have no ratings in common, return 0
    if len(si)==0: return 0
    # Add up the squares of all the differences
    sum_of_squares=sum([pow(prefs[person1][item]-prefs[person2][item],2) for item in prefs[person1] if item in prefs[person2]])
    return 1/(1+sum_of_squares)
In my news page project, I have a database table news with the following structure:
- id: [integer] unique number identifying the news entry, e.g.: *1983*
- title: [string] title of the text, e.g.: *New Life in America No Longer Means a New Name*
- topic: [string] category which should be chosen by the classifier, e.g.: *Sports*
Additionally, there's a table bayes with information about word frequencies:
- word: [string] a word which the frequencies are given for, e.g.: *real estate*
- topic: [string] same content as "topic" field above, e.g.: *Economics*
- count: [integer] number of occurrences of "word" in "topic" (incremented when new documents go to "topic"), e.g: *100*
Now I want my PHP script to classify all news entries and assign one of several possible categories (topics) to them.
Is this the correct implementation? Can you improve it?
<?php
include 'mysqlLogin.php';
$get1 = "SELECT id, title FROM ".$prefix."news WHERE topic = '' LIMIT 0, 150";
$get2 = mysql_abfrage($get1);
// pTOPICS BEGIN
$pTopics1 = "SELECT topic, SUM(count) AS count FROM ".$prefix."bayes WHERE topic != '' GROUP BY topic";
$pTopics2 = mysql_abfrage($pTopics1);
$pTopics = array();
while ($pTopics3 = mysql_fetch_assoc($pTopics2)) {
    $pTopics[$pTopics3['topic']] = $pTopics3['count'];
}
// pTOPICS END
// pWORDS BEGIN
$pWords1 = "SELECT word, topic, count FROM ".$prefix."bayes";
$pWords2 = mysql_abfrage($pWords1);
$pWords = array();
while ($pWords3 = mysql_fetch_assoc($pWords2)) {
    if (!isset($pWords[$pWords3['topic']])) {
        $pWords[$pWords3['topic']] = array();
    }
    $pWords[$pWords3['topic']][$pWords3['word']] = $pWords3['count'];
}
// pWORDS END
while ($get3 = mysql_fetch_assoc($get2)) {
    $pTextInTopics = array();
    $tokens = tokenizer($get3['title']);
    foreach ($pTopics as $topic=>$documentsInTopic) {
        if (!isset($pTextInTopics[$topic])) { $pTextInTopics[$topic] = 1; }
        foreach ($tokens as $token) {
            echo '....'.$token;
            if (isset($pWords[$topic][$token])) {
                $pTextInTopics[$topic] *= $pWords[$topic][$token]/array_sum($pWords[$topic]);
            }
        }
        $pTextInTopics[$topic] *= $pTopics[$topic]/array_sum($pTopics); // #documentsInTopic / #allDocuments
    }
    asort($pTextInTopics); // pick topic with lowest value
    if ($chosenTopic = each($pTextInTopics)) {
        echo '<p>The text belongs to topic '.$chosenTopic['key'].' with a likelihood of '.$chosenTopic['value'].'</p>';
    }
}
?>
The training is done manually and isn't included in this code. If the text "You can make money if you sell real estates" is assigned to the category/topic "Economics", then all words (you, can, make, ...) are inserted into the table bayes with "Economics" as the topic and 1 as the standard count. If the word is already there in combination with the same topic, the count is incremented.
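The training rule described above (insert with count 1, otherwise increment) maps directly onto a counter keyed by topic and word. This is a minimal in-memory Python sketch, not the actual SQL: the nested Counter stands in for the bayes table, and tokenize is a hypothetical stand-in for the question's tokenizer(), which isn't shown. Unlike the example entry *real estate*, it only produces single words.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Hypothetical tokenizer: lowercase runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def train(counts, text, topic):
    """Insert each word under `topic` with count 1, or increment it if already there."""
    for word in tokenize(text):
        counts[topic][word] += 1


counts = defaultdict(Counter)  # stands in for the bayes table: counts[topic][word]
train(counts, "You can make money if you sell real estates", "Economics")
print(counts["Economics"]["you"])  # 2
```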
Sample learning data:
word topic count
kaczynski Politics 1
sony Technology 1
bank Economics 1
phone Technology 1
sony Economics 3
ericsson Technology 2
Sample output/result:
Title of the text: Phone test Sony Ericsson Aspen - sensitive Winberry
Politics
....phone
....test
....sony
....ericsson
....aspen
....sensitive
....winberry
Technology
....phone FOUND
....test
....sony FOUND
....ericsson FOUND
....aspen
....sensitive
....winberry
Economics
....phone
....test
....sony FOUND
....ericsson
....aspen
....sensitive
....winberry
Result: The text belongs to topic Technology with a likelihood of 0.013888888888889
Thank you very much in advance!
It looks like your code is correct, but there are a few easy ways to optimize it. For example, you calculate p(word|topic) on the fly for every word, while you could easily calculate these values beforehand. (I'm assuming you want to classify multiple documents here; if you're only classifying a single document, I suppose this is okay, since you don't calculate it for words not in the document.)
Similarly, the calculation of p(topic) could be moved outside of the loop.
Finally, you don't need to sort the entire array to find the maximum.
All small points! But that's what you asked for :)
I've written some untested PHP-code showing how I'd implement this below:
<?php
// Get word counts from database: $nWordPerTopic[topic][word] = count
// (mystery_sql() stands in for your own query code)
$nWordPerTopic = mystery_sql();
// Calculate p(word|topic) = nWord / sum(nWord for every word)
$nTopics = array();
$pWordPerTopic = array();
foreach($nWordPerTopic as $topic => $wordCounts)
{
    // Get total word count in topic
    $nTopic = array_sum($wordCounts);
    // Calculate p(word|topic)
    $pWordPerTopic[$topic] = array();
    foreach($wordCounts as $word => $count)
        $pWordPerTopic[$topic][$word] = $count / $nTopic;
    // Save $nTopic for next step
    $nTopics[$topic] = $nTopic;
}
// Calculate p(topic)
$nTotal = array_sum($nTopics);
$pTopics = array();
foreach($nTopics as $topic => $nTopic)
    $pTopics[$topic] = $nTopic / $nTotal;
// Classify ($documents holds the rows fetched from the news table)
foreach($documents as $document)
{
    $title = $document['title'];
    $tokens = tokenizer($title);
    $pMax = -1;
    $selectedTopic = null;
    foreach($pTopics as $topic => $pTopic)
    {
        $p = $pTopic;
        foreach($tokens as $word)
        {
            if (!array_key_exists($word, $pWordPerTopic[$topic]))
                continue;
            $p *= $pWordPerTopic[$topic][$word];
        }
        if ($p > $pMax)
        {
            $selectedTopic = $topic;
            $pMax = $p;
        }
    }
    echo '<p>'.$title.' -> '.$selectedTopic.' ('.$pMax.')</p>';
}
?>
As for the maths...
You're trying to maximize p(topic|words), so find
$$\arg\max_{\text{topic}}\; p(\text{topic} \mid \text{words})$$
(i.e. the topic for which p(topic|words) is highest).
Bayes' theorem says
$$p(\text{topic} \mid \text{words}) = \frac{p(\text{topic})\, p(\text{words} \mid \text{topic})}{p(\text{words})}$$
So you're looking for
$$\arg\max_{\text{topic}}\; \frac{p(\text{topic})\, p(\text{words} \mid \text{topic})}{p(\text{words})}$$
Since p(words) of a document is the same for every topic, this is the same as finding
$$\arg\max_{\text{topic}}\; p(\text{topic})\, p(\text{words} \mid \text{topic})$$
The naive Bayes assumption (which makes this a naive Bayes classifier) is that
$$p(\text{words} \mid \text{topic}) = p(\text{word}_1 \mid \text{topic})\, p(\text{word}_2 \mid \text{topic}) \cdots$$
So using this, you need to find
$$\arg\max_{\text{topic}}\; p(\text{topic})\, p(\text{word}_1 \mid \text{topic})\, p(\text{word}_2 \mid \text{topic}) \cdots$$
where
$$p(\text{topic}) = \frac{\text{number of words in topic}}{\text{number of words in total}}$$
and
$$\begin{aligned}
p(\text{word} \mid \text{topic}) &= \frac{p(\text{word}, \text{topic})}{p(\text{topic})} = p(\text{word}, \text{topic}) \cdot \frac{1}{p(\text{topic})} \\
&= \frac{\text{number of times word occurs in topic}}{\text{number of words in total}} \cdot \frac{\text{number of words in total}}{\text{number of words in topic}} \\
&= \frac{\text{number of times word occurs in topic}}{\text{number of words in topic}}
\end{aligned}$$
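The final formula can be checked against the sample learning data from the question. This Python sketch (score_topics is a hypothetical name; exact fractions are used so the numbers are easy to compare) computes p(topic) times the product of p(word|topic), skipping words a topic has never seen, just as both PHP versions do. It reproduces the question's Technology score of 0.013888888888889 as exactly 1/72, but taking the arg max as derived above selects Economics (1/3). Skipping unseen words instead of smoothing them is what lets a topic with a single matching word score highest.

```python
from fractions import Fraction

# Sample learning data from the question: (word, topic, count)
SAMPLE = [
    ("kaczynski", "Politics", 1), ("sony", "Technology", 1), ("bank", "Economics", 1),
    ("phone", "Technology", 1), ("sony", "Economics", 3), ("ericsson", "Technology", 2),
]


def score_topics(rows, tokens):
    """p(topic) * product of p(word|topic), skipping words a topic has never seen."""
    word_counts = {}
    for word, topic, count in rows:
        word_counts.setdefault(topic, {})[word] = count
    n_topic = {topic: sum(c.values()) for topic, c in word_counts.items()}
    n_total = sum(n_topic.values())
    scores = {}
    for topic, counts in word_counts.items():
        p = Fraction(n_topic[topic], n_total)  # p(topic)
        for word in tokens:
            if word in counts:
                p *= Fraction(counts[word], n_topic[topic])  # p(word|topic)
        scores[topic] = p
    return scores


tokens = ["phone", "test", "sony", "ericsson", "aspen", "sensitive", "winberry"]
scores = score_topics(SAMPLE, tokens)
best = max(scores, key=scores.get)  # arg max over topics
print(best, scores[best])  # Economics 1/3
```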