How do you rearrange text within a string from a MySQL query? - php

Solution I am looking for:
I would like to rearrange words within the text string results such that the job title is moved from the end of the string to the beginning of the string for each line item.
Currently, I am retrieving data from an external medical database query ($query). However, I cannot make any changes to the database or to the MySQL query statement itself.
The $query is retrieved and I then place the results in a $data array via the following command:
while($row = mysql_fetch_assoc($query)){$data[] = $row;}
I then change all the job titles to uppercase in the $data array as follows:
$job_01 = 'anesthesiologist';
$job_02 = 'dentist';
$job_03 = 'general practitioner';
$job_04 = 'internist';
$job_05 = 'lawyer';
$job_06 = 'manager';
$job_07 = 'pediatrician';
$job_08 = 'psychiatrist';
$replace_01 = 'ANESTHESIOLOGIST';
$replace_02 = 'DENTIST';
$replace_03 = 'GENERAL PRACTITIONER';
$replace_04 = 'INTERNIST';
$replace_05 = 'LAWYER';
$replace_06 = 'MANAGER';
$replace_07 = 'PEDIATRICIAN';
$replace_08 = 'PSYCHIATRIST';
$searchArray = array($job_01, $job_02, $job_03, $job_04, $job_05, $job_06, $job_07, $job_08);
$replaceArray = array($replace_01, $replace_02, $replace_03, $replace_04, $replace_05, $replace_06, $replace_07, $replace_08);
for ($i=0; $i<=count($data)-1; $i++) {
$line[$i] = str_ireplace($searchArray, $replaceArray, $data[$i]);
}
The final output is in the following line item text string format:
Example Query results (4 line items)
California Long time medical practitioner - ANESTHESIOLOGIST 55yr
New York Specializing in working with semi-passive children - PEDIATRICIAN (doctor) 42yr
Nevada Currently working in a new medical office - PSYCHIATRIST 38yr
Texas Represents the medical-liability industry - LAWYER (attorney) 45yr
I would like to rearrange these results such that I can output the data to my users in the following format by moving the job title to the beginning of each line item as in:
Desired results (usually over 1000 items)
ANESTHESIOLOGIST - California Long time medical practitioner - 55yr
PEDIATRICIAN - New York Specializing in working with semi-passive children - (doctor) 42yr
PSYCHIATRIST - Nevada Currently working in a new medical office - psychiatrist 38yr
LAWYER - Texas Represents the medical-liability industry - lawyer (attorney) 45yr
Ideally, if possible, it would also be nice to have the age moved to the beginning of the text string results as follows:
Ideal Results
55yr - ANESTHESIOLOGIST - California Long time medical practitioner
42yr - PEDIATRICIAN - New York Specializing in working with semi-passive children - (doctor)
38yr - PSYCHIATRIST - Nevada Currently working in a new medical office - psychiatrist
45yr - LAWYER - Texas Represents the medical-liability industry - lawyer (attorney)

You could use a regular expression to extract and rearrange the array:
for ($i=0; $i<=count($data)-1; $i++) {
$line[$i] = str_ireplace($searchArray, $replaceArray, $data[$i]);
// variant a, complete line
if(preg_match_all('/(.*)\s+-\s+(.*)\s+(\d+)yr$/', $line[$i],$matches)) {
$line[$i] = $matches[3][0].'yr - '.$matches[2][0].' - '.$matches[1][0];
// variant b, a line with age, but no jobtitle
} elseif(preg_match_all('/(.*)\s+-\s+(\d+)yr$/', $line[$i],$matches)) {
$line[$i] = $matches[2][0].'yr - '.$matches[1][0];
// variant c, no age
} elseif(preg_match_all('/(.*)\s+-\s+(.*)$/', $line[$i],$matches)) {
$line[$i] = $matches[2][0].' - '.$matches[1][0];
}
// in other cases (no age, no jobtitle), the line is not modified at all.
}
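As an alternative to chaining three patterns, the rearrangement can be split into two small steps: strip the trailing age, then look for one of the known job titles. The following is only a minimal sketch under stated assumptions: each line is a plain string, the age always appears at the very end as digits followed by "yr", and the helper name rearrangeLine and the $jobs list are hypothetical (the list would hold the same titles as $searchArray). Unlike some of the desired-output lines in the question, it removes the title from its original position.

```php
<?php
// Hypothetical helper: moves the trailing age and the job title to the front.
// $jobTitles is assumed to hold the same titles as $searchArray in the question.
function rearrangeLine(string $line, array $jobTitles): string
{
    $age = '';
    // Pull a trailing age such as "55yr" off the end of the line.
    if (preg_match('/^(.*?)\s*(\d+yr)$/', $line, $m)) {
        $line = $m[1];
        $age  = $m[2];
    }

    $title = '';
    foreach ($jobTitles as $job) {
        // Case-insensitive search for the title right after the " - " separator.
        $pattern = '/\s+-\s+' . preg_quote($job, '/') . '\b\s*/i';
        if (preg_match($pattern, $line)) {
            $title = strtoupper($job);
            $line  = preg_replace($pattern, ' - ', $line, 1);
            break;
        }
    }

    // Trim a dangling separator left behind when nothing followed the title.
    $line = preg_replace('/\s+-\s*$/', '', $line);

    // Join only the parts that were actually found.
    return implode(' - ', array_filter([$age, $title, $line], 'strlen'));
}

$jobs = ['anesthesiologist', 'pediatrician', 'psychiatrist', 'lawyer'];
echo rearrangeLine('California Long time medical practitioner - ANESTHESIOLOGIST 55yr', $jobs), "\n";
echo rearrangeLine('New York Specializing in working with semi-passive children - PEDIATRICIAN (doctor) 42yr', $jobs), "\n";
```

Lines without an age or without a known title pass through with only the parts that were found, so no separate "variant" branches are needed.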

Related

How to get explode #img::value to lookup in my database

I want to extract the value from markers in my content such as #img::56.
For example, #img::56 should give value=56, which I then use to fetch a record from the database with mysql_query("select * from tblgallery where g_id=56");.
Tara Angkor Hotel is the first 4-Star Luxury Hotel built in the mystical land of Angkor. #img::56 Ideally and #youtube::https://www.youtube.com/watch?v=k4YRWT_Aldo conveniently located, Tara Angkor Hotel is situated only 6 km from the World Heritage site of Angkor Wat Temples, 15 min drive from the Siem Reap International Airport, a few minutes stroll to the Angkor National Museum and #img::41 a short ride to the city town center with an array of Cambodian souvenirs, shopping and culture.
I want to get this result:
Tara Angkor Hotel is the first 4-Star Luxury Hotel built in the mystical land of Angkor. mysql_query("select * from tblgallery where g_id=56") Ideally and <iframe width="560" height="315" src="//www.youtube.com/embed/k4YRWT_Aldo" frameborder="0" allowfullscreen></iframe> conveniently located, Tara Angkor Hotel is situated only 6 km from the World Heritage site of Angkor Wat Temples, 15 min drive from the Siem Reap International Airport, a few minutes stroll to the Angkor National Museum and mysql_query("select * from tblgallery where g_id=41") a short ride to the city town center with an array of Cambodian souvenirs, shopping and culture.
If I understand your question, you want to retrieve the 56 from the string #img::56?
If so it can be done as follows.
$string = '#img::56';
// Split the string on the '::' delimiter.
$components = explode('::', $string);
// Ensure the components array has two elements.
if(count($components) < 2) {
// Do something here to indicate an error has occurred.
//Throw an exception (preferred) or emit an error
}
$id = $components[1];
// You can use even more type checking here to ensure you have the right piece of data.
if(!ctype_digit($id)) {
//Throw an exception (preferred) or emit an error
}
// Accessing the database using a PDO object and prepared statements
$pdo = new \PDO('mysql:host=...', 'username', 'password');
$statement = $pdo->prepare('SELECT * FROM tblgallery WHERE g_id = :id');
$statement->bindParam(':id', $id, \PDO::PARAM_INT);
if(!$statement->execute()) {
//Throw an exception (preferred) or emit an error
}
$result = $statement->fetchAll(\PDO::FETCH_ASSOC);
// End the database connection
$pdo = null;
Now you have the id. If you need the $id to be an integer you can type cast it with (int) $components[1] instead.
Hope this helps.
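If the markers are embedded in a longer text, as in the question, one option is to replace them in a single pass with preg_replace_callback. This is only a sketch: the function name expandTokens and the $renderImage callback are hypothetical, the callback stands in for whatever the database lookup should return, and the iframe markup is copied from the desired result in the question.

```php
<?php
// Replaces #img::<id> and #youtube::<url> markers in $content.
// $renderImage is a hypothetical callback that turns a gallery id into HTML,
// for example by running a prepared statement and formatting the row.
function expandTokens(string $content, callable $renderImage): string
{
    return preg_replace_callback(
        '/#img::(\d+)|#youtube::(\S+)/',
        function (array $m) use ($renderImage): string {
            if ($m[1] !== '') {
                // Image marker: group 1 holds the numeric gallery id.
                return $renderImage((int) $m[1]);
            }
            // YouTube marker: take the video id from the "v" query parameter.
            parse_str((string) parse_url($m[2], PHP_URL_QUERY), $query);
            if (empty($query['v']) || !is_string($query['v'])) {
                return $m[0]; // leave the marker untouched if no id is found
            }
            $videoId = htmlspecialchars($query['v'], ENT_QUOTES);
            return '<iframe width="560" height="315" src="//www.youtube.com/embed/'
                . $videoId . '" frameborder="0" allowfullscreen></iframe>';
        },
        $content
    );
}

$text = 'Built in Angkor. #img::56 Ideally and #youtube::https://www.youtube.com/watch?v=k4YRWT_Aldo conveniently located.';
echo expandTokens($text, function (int $id): string {
    return '[image ' . $id . ']'; // stand-in for the database lookup
});
```

Keeping the lookup in a callback means the text handling can be tried out without a database connection.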

Get live NFL scores/stats to read and manipulate?

I need some sort of database or feed to access live scores(and possibly player stats) for the NFL. I want to be able to display the scores on my site for my pickem league and show the users if their pick is winning or not.
I'm not sure how to go about this. Can someone point me in the right direction?
Also, it needs to be free.
Disclaimer: I'm the author of the tools I'm about to promote.
Over the past year, I've written a couple Python libraries that will do what you want. The first is nflgame, which gathers game data (including play-by-play) from NFL.com's GameCenter JSON feed. This includes active games where data is updated roughly every 15 seconds. nflgame has a wiki with some tips on getting started.
I released nflgame last year, and used it throughout last season. I think it is reasonably stable.
Over this past summer, I've worked on its more mature brother, nfldb. nfldb provides access to the same kind of data nflgame does, except it keeps everything stored in a relational database. nfldb also has a wiki, although it isn't entirely complete yet.
For example, this will output all current games and their scores:
import nfldb
db = nfldb.connect()
phase, year, week = nfldb.current(db)
q = nfldb.Query(db).game(season_year=year, season_type=phase, week=week)
for g in q.as_games():
    print '%s (%d) at %s (%d)' % (g.home_team, g.home_score,
                                  g.away_team, g.away_score)
Since no games are being played, that outputs all games for next week with 0 scores. This is the output with week=1: (of the 2013 season)
CLE (10) at MIA (23)
DET (34) at MIN (24)
NYJ (18) at TB (17)
BUF (21) at NE (23)
SD (28) at HOU (31)
STL (27) at ARI (24)
SF (34) at GB (28)
DAL (36) at NYG (31)
WAS (27) at PHI (33)
DEN (49) at BAL (27)
CHI (24) at CIN (21)
IND (21) at OAK (17)
JAC (2) at KC (28)
PIT (9) at TEN (16)
NO (23) at ATL (17)
CAR (7) at SEA (12)
Both are licensed under the WTFPL and are free to use for any purpose.
N.B. I realized you tagged this as PHP, but perhaps this will point you in the right direction. In particular, you could use nfldb to maintain a PostgreSQL database and query it with your PHP program.
So I found something that gives me MOST of what I was looking for. It has live game stats, but doesn't include current down, yards to go, and field position.
Regular Season:
http://www.nfl.com/liveupdate/scorestrip/ss.xml
Post Season:
http://www.nfl.com/liveupdate/scorestrip/postseason/ss.xml
I'd still like to find a live player stat feed to use to add Fantasy Football to my website, but I don't think a free one exists.
I know this is old, but this is what I use for scores only... maybe it will help someone some day. Note: there are some elements that you will not use and are specific for my site... but this would be a very good start for someone.
<?php
require('includes/application_top.php');
$week = (int)$_GET['week'];
//load source code, depending on the current week, of the website into a variable as a string
$url = "http://www.nfl.com/liveupdate/scorestrip/ss.xml"; //LIVE GAMES
if ($xmlData = file_get_contents($url)) {
$xml = simplexml_load_string($xmlData);
$json = json_encode($xml);
$games = json_decode($json, true);
}
$teamCodes = array(
'JAC' => 'JAX',
);
//build scores array, to group teams and scores together in games
$scores = array();
foreach ($games['gms']['g'] as $gameArray) {
$game = $gameArray['@attributes']; // json_encode() stores XML attributes under the '@attributes' key
//ONLY PULL SCORES FROM COMPLETED GAMES - F=FINAL, FO=FINAL OVERTIME
if ($game['q'] == 'F' || $game['q'] == 'FO') {
$overtime = (($game['q'] == 'FO') ? 1 : 0);
$away_team = $game['v'];
$home_team = $game['h'];
foreach ($teamCodes as $espnCode => $nflpCode) {
if ($away_team == $espnCode) $away_team = $nflpCode;
if ($home_team == $espnCode) $home_team = $nflpCode;
}
$away_score = (int)$game['vs'];
$home_score = (int)$game['hs'];
$winner = ($away_score > $home_score) ? $away_team : $home_team;
$gameID = getGameIDByTeamID($week, $home_team);
if (is_numeric(strip_tags($home_score)) && is_numeric(strip_tags($away_score))) {
$scores[] = array(
'gameID' => $gameID,
'awayteam' => $away_team,
'visitorScore' => $away_score,
'hometeam' => $home_team,
'homeScore' => $home_score,
'overtime' => $overtime,
'winner' => $winner
);
}
}
}
//see how the scores array looks
//echo '<pre>' . print_r($scores, true) . '</pre>';
echo json_encode($scores);
//game results and winning teams can now be accessed from the scores array
//e.g. $scores[0]['awayteam'] contains the name of the away team (['awayteam'] part) from the first game on the page ([0] part)
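The JSON round trip above is not strictly required, since SimpleXML can read the attributes directly. The sketch below assumes the feed layout implied by the code above (a gms element containing g elements with q, h, hs, v and vs attributes); the function name finishedGames and the inline sample are illustrative only, and the site-specific helpers such as getGameIDByTeamID are left out.

```php
<?php
// Illustrative sample in the layout the code above expects; not real feed output.
$xmlData = <<<XML
<ss>
  <gms w="1">
    <g q="F" h="CLE" hs="10" v="MIA" vs="23"/>
    <g q="FO" h="DET" hs="34" v="MIN" vs="24"/>
    <g q="2" h="NE" hs="7" v="BUF" vs="3"/>
  </gms>
</ss>
XML;

// Returns one entry per completed game (q = F or FO).
function finishedGames(string $xmlData): array
{
    $xml = simplexml_load_string($xmlData);
    if ($xml === false) {
        return []; // unparsable feed
    }
    $scores = [];
    foreach ($xml->gms->g as $g) {
        $quarter = (string) $g['q'];
        // F = final, FO = final after overtime; skip games still in progress.
        if ($quarter !== 'F' && $quarter !== 'FO') {
            continue;
        }
        $homeScore = (int) $g['hs'];
        $awayScore = (int) $g['vs'];
        $scores[] = [
            'hometeam'     => (string) $g['h'],
            'homeScore'    => $homeScore,
            'awayteam'     => (string) $g['v'],
            'visitorScore' => $awayScore,
            'overtime'     => $quarter === 'FO' ? 1 : 0,
            'winner'       => $awayScore > $homeScore ? (string) $g['v'] : (string) $g['h'],
        ];
    }
    return $scores;
}

echo json_encode(finishedGames($xmlData));
```

Iterating the elements directly also sidesteps the case where the feed contains a single game, in which the decoded JSON holds one game array instead of a list of them.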
I've spent the last year or so working on a simple CLI tool to easily create your own NFL databases. It currently supports PostgreSql and Mongo natively, and you can programmatically interact with the Engine if you'd like to extend it.
Want to create your own different database (eg MySql) using the Engine (or even use Postgres/Mongo but with your own schema)? Simply implement an interface and the Engine will do the work for you.
Running everything, including the database setup and updating with all the latest stats, can be done in a single command:
ffdb setup
I know this question is old, but I also realize that there's still a need out there for a functional and easy-to-use tool to do this. The entire reason I built this is to power my own football app in the near future, and hopefully this can help others.
Also, because the question is fairly old, a lot of the answers are not working at the current time, or reference projects that are no longer maintained.
Check out the github repo page for full details on how to download the program, the CLI commands, and other information:
FFDB Github Repository
$XML = "http://www.nfl.com/liveupdate/scorestrip/ss.xml";
$lineXML = file_get_contents($XML);
$subject = $lineXML;
//match and capture week then print
$week='/w="([0-9]+)/'; // [0-9]+ so that two-digit weeks are captured in full
preg_match_all($week, $subject, $week);
echo "week ".$week[1][0]."<br/>";
$week2=$week[1][0];
echo $week2;
//capture team, scores in two dimensional array
$pattern = '/hnn="(.+)"\shs="([0-9]+)"\sv="[A-Z]+"\svnn="(.+)"\svs="([0-9]+)/';
preg_match_all($pattern, $subject, $matches);
//enumerate length of array (number games played)
$count= count($matches[0]);
//print array values
for ($x = 0; $x < $count ; $x++) {
echo"<br/>";
//print home team
echo $matches[1][$x]," ",
//print home score
$matches[2][$x]," ",
//print visitor team
$matches[3][$x]," ",
//print visitor score
$matches[4][$x];
echo "<br/>";
}
I was going through problems finding a new source for the 2021 season. Well I finally found one on ESPN.
http://site.api.espn.com/apis/site/v2/sports/football/nfl/scoreboard
Returns the results in JSON format.
I recommend registering at http://developer.espn.com to get access to their JSON API. It took me just 5 minutes, and they have documentation for pretty much any call you need.

Convert tabbed text into tree (SQL insert each line into TreeNode table (TreeNodeID, ParentID, Title)) with PHP

I'm experimenting with ArborJS, attempting to build a Knowledge Tree. Here is my test area (left click to enter a node, right click to get back to the beginning). I have "all" of the "Humanities and the Arts" section fleshed out, so I suggest playing through that area.
I'm building this tree from Wikipedia's List of Academic Disciplines article.
Now, I am pulling data from one mySQL table (via PHP). The table structure is TreeNodeID, ParentID, Title. The "TreeNodeID" is the primary key (autoincrementing), "ParentID" is the node's parent, the "Title" is the text that should be displayed on the node.
I'm now on page 7 of 27 on this article. I feel like I'm not taking advantage of my computer's ability to automate this process of typing in manually.
I just made a text file of all the subjects. It's in the following format:
Anthropology
	Biological Anthropology
		Forensic Anthropology
		Gene-Culture Coevolution
		Human Behavioral Ecology
	Anthropological Linguistics
		Synchronic Linguistics
		Diachronic Linguistics
		Ethnolinguistics
		Sociolinguistics
	Cultural Anthropology
		Anthropology of Religion
		Economic Anthropology
	Archaeology
...
How can I use PHP to go through this and fill my database (with the correct ParentIDs for each node)?
UPDATE #3: The working code (given by correct answer below)
<?php
//echo "Checkpoint 1";
$data = "
Social sciences
	Anthropology
		Biological anthropology
			Forensic anthropology
			Gene-culture coevolution
			Human behavioral ecology
			Human evolution
			Medical anthropology
			Paleoanthropology
			Population genetics
			Primatology
		Anthropological linguistics
			Synchronic linguistics (or Descriptive linguistics)
			Diachronic linguistics (or Historical linguistics)
			Ethnolinguistics
			Sociolinguistics
		Cultural anthropology
			Anthropology of religion
			Economic anthropology
			Ethnography
			Ethnohistory
			Ethnology
			Ethnomusicology
			Folklore
			Mythology
			Political anthropology
			Psychological anthropology
		Archaeology
...(goes on for a long time)
";
//echo "Checkpoint 2\n";
$lines = preg_split("/\n/", $data);
$parentids = array(0 => null);
$db = new PDO("host", 'username', 'pass');
$sql = 'INSERT INTO `TreeNode` SET ParentID = ?, Title = ?';
$stmt = $db->prepare($sql);
//echo "Checkpoint 3\n";
foreach ($lines as $line) {
if (!preg_match('/^([\s]*)(.*)$/', $line, $m)) {
continue;
}
$spaces = strlen($m[1]);
//$level = intval($spaces / 4); //assumes four spaces per indent
$level = strlen($m[1]); // if data is tab indented
$title = $m[2];
$parentid = ($level > 0 ? $parentids[$level - 1] : 1); //All "roots" are children of "Academia" which has an ID of "1";
$rv = $stmt->execute(array($parentid, $title));
$parentids[$level] = $db->lastInsertId();
echo "inserted parentid: " . $parentid . " title: " . $title . "\n";
}
?>
Untested, but this should work for you (uses PDO):
<?php
$data = "
Anthropology
	Biological Anthropology
		Forensic Anthropology
		Gene-Culture Coevolution
		Human Behavioral Ecology
	Anthropological Linguistics
		Synchronic Linguistics
		Diachronic Linguistics
		Ethnolinguistics
		Sociolinguistics
	Cultural Anthropology
		Anthropology of Religion
		Economic Anthropology
	Archaeology
";
$lines = preg_split("/\n/", $data);
$parentids = array(0 => null);
$sql = 'INSERT INTO `table` SET ParentID = ?, Title = ?';
$stmt = $db->prepare($sql);
foreach ($lines as $line) {
if (!preg_match('/^([\s]*)(.*)$/', $line, $m)) {
continue;
}
#$spaces = strlen($m[1]);
#$level = intval($spaces / 4); # if data is space indented
$level = strlen($m[1]); # assumes data is tab indented
$title = $m[2];
$parentid = $level > 0
? $parentids[$level - 1]
: null;
$rv = $stmt->execute(array($parentid, $title));
$parentids[$level] = $db->lastInsertId();
}
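To check the parent bookkeeping without touching the database, the same idea can be run in memory first. This is a sketch under stated assumptions: the input is indented with tabs, ids are handed out sequentially instead of coming from lastInsertId(), and the function name parseIndentedTree is hypothetical. It also skips blank lines, which would otherwise be inserted as empty nodes.

```php
<?php
// Turns tab-indented text into rows of id, parentId and title without a database.
// Ids are assigned sequentially here; with PDO they would come from lastInsertId().
function parseIndentedTree(string $data, ?int $rootParentId = null): array
{
    $rows = [];
    $parentIds = [];   // most recent node id seen at each indent level
    $nextId = 1;
    foreach (preg_split('/\R/', $data) as $line) {
        if (trim($line) === '') {
            continue;  // skip blank lines so no empty nodes are created
        }
        preg_match('/^(\t*)(.*)$/', $line, $m);
        $level = strlen($m[1]); // one tab per indent level
        $parentId = $level > 0
            ? ($parentIds[$level - 1] ?? $rootParentId)
            : $rootParentId;
        $id = $nextId++;
        $rows[] = ['id' => $id, 'parentId' => $parentId, 'title' => trim($m[2])];
        $parentIds[$level] = $id;
    }
    return $rows;
}

$sample = "Anthropology\n\tBiological Anthropology\n\t\tForensic Anthropology\n\tCultural Anthropology\n\tArchaeology";
foreach (parseIndentedTree($sample) as $row) {
    echo $row['id'], ' <- ', var_export($row['parentId'], true), ' ', $row['title'], "\n";
}
```

Passing 1 as the second argument would attach all roots to an existing node with ID 1, as in the working code in the question.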
I would say that it's easier to copy-paste to a text file first, indenting as you have done above. Then parse it:
- Read each line (one at a time); this gives you the node text.
- Each indent is a new child, so the previous node is the parent.
- Check for dedents: count \t's if you've been consistent, or keep a count of the indent level. Watch out for 0-indent (roots).
This will allow you to build an associative array containing each of the disciplines. Then you interpret that. For example:
- Get all the root nodes (1st child of root, depending) and give them an incremental id, parse_id.
- Continue along the array from above, assigning parse_ids to all nodes.
- Then start putting that data into MySQL. As you do, store the result of mysqli_insert_id in the array alongside parse_id, calling it db_id for example. That is used to map the parent's parse_id to the parent_id required in the db.
Assuming you are not trying to check for common studies or unique node text, that should be straightforward enough.
You can try the following:
// parser.php
<?php
include_once './vendor/autoload.php';
use Symfony\Component\DomCrawler\Crawler;
$crawler = new Crawler(file_get_contents('http://en.wikipedia.org/wiki/List_of_academic_disciplines'));
$texts = $crawler->filter('.tocnumber + .toctext');
$numbers = $crawler->filter('.tocnumber');
$last = '';
for ($i=0; $i < count($numbers); $i++) {
$value = $numbers->eq($i)->text();
if(!preg_match('/\d+\.\d+/', $value)) {
// is a root discipline
$last = $texts->eq($i)->text();
} else {
// is a leaf discipline
$disciplines[$last][$texts->eq($i)->text()] = $texts->eq($i)->text();
}
}
var_dump($disciplines);
With this you can go further, for example persisting the result to a database, and the same approach is useful for other DOM parsing tasks.
I used the CssSelector and DomCrawler components from Symfony, which are easy to install.
composer.json
{
"name": "wiki-parser",
"require": {
"php": ">=5.3.3",
"symfony/dom-crawler": "2.1.0",
"symfony/css-selector": "2.1.0"
}
}
Then, in the console:
$ php composer.phar install
See getcomposer.org for details on Composer.

PHP getting the value of the highest dynamic select

I have a series of selects that contain world regions.
For instance select r0 would contain
Africa
North America
Europe
When the user selects North America, a new select named r1 would appear with the following values:
Canada
United States
Mexico
Then the user would select US, r2 would appear with the states and so on.
As the data structure allows, currently, there can be up to 5 boxes (r0-r4)
I am trying to figure out how in php I can determine that there are 4 or 5 selects, and save that value of the highest number select to the database.
Am I going at this the wrong way?
Currently, I don't have any code written, because I'm not sure how to test the range of the $_POST["r#"] arrays, but was thinking something along the lines of:
<?php
$i = 0;
while (isset($_POST['r'.$i])) {
$highest_value = $_POST['r'.$i];
$i++;
}
?>
is there a better way?
I would try this:
$Value = null;
for ($i = 5; $i >= 0; $i--) {
if(isset($_POST['r'.$i]) AND $_POST['r'.$i]){
$Value = $_POST['r'.$i];
break;
}
}
I did not test it out.
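The same backwards scan can be wrapped in a function that takes the posted values as an array, which makes it easy to try without a form. This is a sketch; the function name deepestSelection is hypothetical, and the default of 4 for the highest level is taken from the r0-r4 range mentioned in the question.

```php
<?php
// Returns the value of the deepest non-empty select (r0 .. r{$maxLevel}) in $post,
// or null if none was submitted.
function deepestSelection(array $post, int $maxLevel = 4): ?string
{
    for ($i = $maxLevel; $i >= 0; $i--) {
        $key = 'r' . $i;
        // Ignore missing keys, non-string input and empty placeholder options.
        if (isset($post[$key]) && is_string($post[$key]) && $post[$key] !== '') {
            return $post[$key];
        }
    }
    return null;
}

// r2 was rendered but left empty, so r1 is the deepest real choice.
echo deepestSelection(['r0' => 'North America', 'r1' => 'United States', 'r2' => '']); // United States
```

In a request handler this would be called as deepestSelection($_POST).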

PHP implementation of Bayes classificator: Assign topics to texts

In my news page project, I have a database table news with the following structure:
- id: [integer] unique number identifying the news entry, e.g.: *1983*
- title: [string] title of the text, e.g.: *New Life in America No Longer Means a New Name*
- topic: [string] category which should be chosen by the classifier, e.g.: *Sports*
Additionally, there's a table bayes with information about word frequencies:
- word: [string] a word which the frequencies are given for, e.g.: *real estate*
- topic: [string] same content as "topic" field above, e.g.: *Economics*
- count: [integer] number of occurrences of "word" in "topic" (incremented when new documents go to "topic"), e.g: *100*
Now I want my PHP script to classify all news entries and assign one of several possible categories (topics) to them.
Is this the correct implementation? Can you improve it?
<?php
include 'mysqlLogin.php';
$get1 = "SELECT id, title FROM ".$prefix."news WHERE topic = '' LIMIT 0, 150";
$get2 = mysql_abfrage($get1);
// pTOPICS BEGIN
$pTopics1 = "SELECT topic, SUM(count) AS count FROM ".$prefix."bayes WHERE topic != '' GROUP BY topic";
$pTopics2 = mysql_abfrage($pTopics1);
$pTopics = array();
while ($pTopics3 = mysql_fetch_assoc($pTopics2)) {
$pTopics[$pTopics3['topic']] = $pTopics3['count'];
}
// pTOPICS END
// pWORDS BEGIN
$pWords1 = "SELECT word, topic, count FROM ".$prefix."bayes";
$pWords2 = mysql_abfrage($pWords1);
$pWords = array();
while ($pWords3 = mysql_fetch_assoc($pWords2)) {
if (!isset($pWords[$pWords3['topic']])) {
$pWords[$pWords3['topic']] = array();
}
$pWords[$pWords3['topic']][$pWords3['word']] = $pWords3['count'];
}
// pWORDS END
while ($get3 = mysql_fetch_assoc($get2)) {
$pTextInTopics = array();
$tokens = tokenizer($get3['title']);
foreach ($pTopics as $topic=>$documentsInTopic) {
if (!isset($pTextInTopics[$topic])) { $pTextInTopics[$topic] = 1; }
foreach ($tokens as $token) {
echo '....'.$token;
if (isset($pWords[$topic][$token])) {
$pTextInTopics[$topic] *= $pWords[$topic][$token]/array_sum($pWords[$topic]);
}
}
$pTextInTopics[$topic] *= $pTopics[$topic]/array_sum($pTopics); // #documentsInTopic / #allDocuments
}
asort($pTextInTopics); // pick topic with lowest value
if ($chosenTopic = each($pTextInTopics)) {
echo '<p>The text belongs to topic '.$chosenTopic['key'].' with a likelihood of '.$chosenTopic['value'].'</p>';
}
}
?>
The training is done manually, it isn't included in this code. If the text "You can make money if you sell real estates" is assigned to the category/topic "Economics", then all words (you,can,make,...) are inserted into the table bayes with "Economics" as the topic and 1 as standard count. If the word is already there in combination with the same topic, the count is incremented.
Sample learning data:
word topic count
kaczynski Politics 1
sony Technology 1
bank Economics 1
phone Technology 1
sony Economics 3
ericsson Technology 2
Sample output/result:
Title of the text: Phone test Sony Ericsson Aspen - sensitive Winberry
Politics
....phone
....test
....sony
....ericsson
....aspen
....sensitive
....winberry
Technology
....phone FOUND
....test
....sony FOUND
....ericsson FOUND
....aspen
....sensitive
....winberry
Economics
....phone
....test
....sony FOUND
....ericsson
....aspen
....sensitive
....winberry
Result: The text belongs to topic Technology with a likelihood of 0.013888888888889
Thank you very much in advance!
It looks like your code is correct, but there are a few easy ways to optimize it. For example, you calculate p(word|topic) on the fly for every word while you could easily calculate these values beforehand. (I'm assuming you want to classify multiple documents here, if you're only doing a single document I suppose this is okay since you don't calculate it for words not in the document)
Similarly, the calculation of p(topic) could be moved outside of the loop.
Finally, you don't need to sort the entire array to find the maximum.
All small points! But that's what you asked for :)
I've written some untested PHP-code showing how I'd implement this below:
<?php
// Get word counts from database
$nWordPerTopic = mystery_sql();
// Calculate p(word|topic) = nWord / sum(nWord for every word)
$nTopics = array();
$pWordPerTopic = array();
foreach($nWordPerTopic as $topic => $wordCounts)
{
// Get total word count in topic
$nTopic = array_sum($wordCounts);
// Calculate p(word|topic)
$pWordPerTopic[$topic] = array();
foreach($wordCounts as $word => $count)
$pWordPerTopic[$topic][$word] = $count / $nTopic;
// Save $nTopic for next step
$nTopics[$topic] = $nTopic;
}
// Calculate p(topic)
$nTotal = array_sum($nTopics);
$pTopics = array();
foreach($nTopics as $topic => $nTopic)
$pTopics[$topic] = $nTopic / $nTotal;
// Classify
foreach($documents as $document)
{
$title = $document['title'];
$tokens = tokenizer($title);
$pMax = -1;
$selectedTopic = null;
foreach($pTopics as $topic => $pTopic)
{
$p = $pTopic;
foreach($tokens as $word)
{
if (!array_key_exists($word, $pWordPerTopic[$topic]))
continue;
$p *= $pWordPerTopic[$topic][$word];
}
if ($p > $pMax)
{
$selectedTopic = $topic;
$pMax = $p;
}
}
}
?>
As for the maths...
You're trying to maximize p(topic|words), so find
$$\arg\max_{\text{topic}} \; p(\text{topic} \mid \text{words})$$
(i.e. the argument topic for which p(topic|words) is the highest)
Bayes' theorem says
$$p(\text{topic} \mid \text{words}) = \frac{p(\text{topic}) \, p(\text{words} \mid \text{topic})}{p(\text{words})}$$
So you're looking for
$$\arg\max_{\text{topic}} \; \frac{p(\text{topic}) \, p(\text{words} \mid \text{topic})}{p(\text{words})}$$
Since p(words) of a document is the same for any topic, this is the same as finding
$$\arg\max_{\text{topic}} \; p(\text{topic}) \, p(\text{words} \mid \text{topic})$$
The naive Bayes assumption (which makes this a naive Bayes classifier) is that
$$p(\text{words} \mid \text{topic}) = p(\text{word}_1 \mid \text{topic}) \, p(\text{word}_2 \mid \text{topic}) \cdots$$
So using this, you need to find
$$\arg\max_{\text{topic}} \; p(\text{topic}) \, p(\text{word}_1 \mid \text{topic}) \, p(\text{word}_2 \mid \text{topic}) \cdots$$
Where
$$p(\text{topic}) = \frac{\text{number of words in topic}}{\text{number of words in total}}$$
And
$$\begin{aligned}
p(\text{word} \mid \text{topic}) &= \frac{p(\text{word}, \text{topic})}{p(\text{topic})} = p(\text{word}, \text{topic}) \cdot \frac{1}{p(\text{topic})} \\
&= \frac{\text{number of times word occurs in topic}}{\text{number of words in total}} \cdot \frac{\text{number of words in total}}{\text{number of words in topic}} \\
&= \frac{\text{number of times word occurs in topic}}{\text{number of words in topic}}
\end{aligned}$$
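One practical wrinkle with the product above is that a word never seen in a topic has an estimated probability of zero, and skipping such words instead lets topics with little matching evidence score well. A common remedy, which is an addition here and not part of the code above, is add-one (Laplace) smoothing combined with summing logarithms rather than multiplying small numbers. The sketch below assumes the counts are already loaded into a $counts[topic][word] array; the function name classify is hypothetical.

```php
<?php
// $counts[topic][word] = number of times the word was seen in that topic.
// Add-one (Laplace) smoothing and log-probabilities are additions that are
// not part of the code in the question or the answer above.
function classify(array $tokens, array $counts): ?string
{
    // Vocabulary size: number of distinct words over all topics.
    $vocabulary = [];
    foreach ($counts as $wordCounts) {
        $vocabulary += $wordCounts;
    }
    $vocabSize = count($vocabulary);
    $nTotal = array_sum(array_map('array_sum', $counts));

    $best = null;
    $bestScore = -INF;
    foreach ($counts as $topic => $wordCounts) {
        $nTopic = array_sum($wordCounts);
        // log p(topic), estimated from word counts as in the derivation.
        $score = log($nTopic / $nTotal);
        foreach ($tokens as $word) {
            // log p(word|topic) with add-one smoothing, so unseen words lower the score.
            $score += log((($wordCounts[$word] ?? 0) + 1) / ($nTopic + $vocabSize));
        }
        if ($score > $bestScore) {
            $bestScore = $score;
            $best = $topic;
        }
    }
    return $best;
}

// Counts from the sample learning data in the question.
$counts = [
    'Politics'   => ['kaczynski' => 1],
    'Technology' => ['sony' => 1, 'phone' => 1, 'ericsson' => 2],
    'Economics'  => ['bank' => 1, 'sony' => 3],
];
$tokens = ['phone', 'test', 'sony', 'ericsson', 'aspen', 'sensitive', 'winberry'];
echo classify($tokens, $counts); // Technology
```

With the sample data this picks the topic with the highest score directly, so there is no need to sort the results.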
