Naive Bayes in PHP - php

I’m trying to implement a Naive Bayes classifier in PHP, and I found this script.
I’m running the script on a standard LAMP stack (php-fpm), and I am getting an error:
Fatal error: Call to undefined function 3184791920() in /location/file.php on line 73
But I can't figure out what is causing this, since there is no 3184791920() function. I assume it has something to do with the hashing:
define("LL_NB_HASH_FUNCTION", "crc32");
Here is the full text of my implementation:
<?php
global $LL_NB_STOP_WORDS;
$LL_NB_STOP_WORDS = array("a", "about", "above", "...");
define("LL_NB_HASH_FUNCTION", "crc32");// crc32 is the fastest built in hash function.
// $xs is a bunch of "strings" and ys are their labels.
function ll_naivebayes($xs, $ys, $testStrings) {
$topicWords = array();
foreach($xs as $i=>$x) {
if(isset($topicWords[$ys[$i]]))
$topicWords[$ys[$i]] .= $x;
else
$topicWords[$ys[$i]] = $x;
}
$topicWords = _ll_computeWordCounts($topicWords); // get the number of each word, by topic.
$probWordsGivenTopic = array(); // probability of each word in a given topic.
$countTopics = array();
foreach($topicWords as $topicIndex=>$xWordCounts) {
$totalWordsTopic = array_sum($xWordCounts);
$countTopics[$topicIndex] = $totalWordsTopic;
foreach($xWordCounts as $hash=>$count) {
$probWordsGivenTopic[$topicIndex][$hash] = ($count/$totalWordsTopic);
}
}
$probTopics = array(); // probability of a given topic (number of words / total words), i.e., relative frequency of topics in terms of words
$totalWords = array_sum($countTopics); // total number of words across all topics
foreach($countTopics as $i=>$topicCount) {
$probTopics[$i] = ($topicCount/$totalWords);
}
if(!is_array($testStrings))
$testStrings = array($testStrings);
// process the input testStrings array
$return = array();
foreach($testStrings as $i=>$string) {
$testStringWords = _ll_computeWordCount($string);
$topicsPosterior = array();
foreach($probTopics as $key=>$probTopic) {
$p = $probTopic;
foreach($testStringWords as $hash=>$count) {
if(isset($probWordsGivenTopic[$key][$hash]))
$p *= $probWordsGivenTopic[$key][$hash] * $count;
}
$topicsPosterior[$key] = $p;
}
sort($topicsPosterior);
$return[$i] = $topicsPosterior;
}
return $return;
}
function _ll_computeWordCounts($strings) {
$wcs = array();
foreach($strings as $string) {
$wcs[] = _ll_computeWordCount($string);
}
return $wcs;
}
function _ll_computeWordCount($string) {
$string = trim($string);
$string = explode(' ', $string);
natcasesort($string);
$hash = LL_NB_HASH_FUNCTION;
$words = array();
for($i=0, $count = count($string); $i<$count; $i++) {
$word = trim($string[$i]);
if(preg_match('/[^a-zA-Z\']/', $word))
continue;
$hash = (string) $hash($word);
if(!isset($words[$hash]))
$words[$hash] = 1; //$words[$hash] = array('word'=>$word, 'count'=>1);
else
$words[$hash]++; //$words[$hash]['count']++;
}
return $words;
}
$output = ll_naivebayes(array("Will I marry John", "Marriage is cool", "A string about Windows XP"), array("marriage", "marriage", "windows"), array("this is about marriage"));
?>

It looks like a bug; see my comments in the code:
function _ll_computeWordCount($string) {
$string = trim($string);
$string = explode(' ', $string);
natcasesort($string);
$hash = LL_NB_HASH_FUNCTION; // $hash = crc32
$words = array();
for($i=0, $count = count($string); $i<$count; $i++) {
$word = trim($string[$i]);
if(preg_match('/[^a-zA-Z\']/', $word))
continue;
$hash = (string) $hash($word); // 1st iteration: calls crc32($word), then overwrites $hash with the result
// 2nd iteration: $hash is now a numeric string, so this becomes e.g. 2949202($word) - fatal error
if(!isset($words[$hash]))
$words[$hash] = 1; //$words[$hash] = array('word'=>$word, 'count'=>1);
else
$words[$hash]++; //$words[$hash]['count']++;
}
return $words;
}
Try this, which keeps the function name in its own variable so the loop never overwrites it:
function _ll_computeWordCount($string) {
$string = trim($string);
$string = explode(' ', $string);
natcasesort($string);
$hash_function = LL_NB_HASH_FUNCTION;
$words = array();
for($i=0, $count = count($string); $i<$count; $i++) {
$word = trim($string[$i]);
if(preg_match('/[^a-zA-Z\']/', $word))
continue;
$hash = (string) $hash_function($word);
if(!isset($words[$hash]))
$words[$hash] = 1; //$words[$hash] = array('word'=>$word, 'count'=>1);
else
$words[$hash]++; //$words[$hash]['count']++;
}
return $words;
}
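To see the cause in isolation, here is a minimal sketch of PHP's variable-function call syntax. The variable names and the sample word are illustrative only; the point is that a string holding a function name stops being callable once it is overwritten with the hash result.

```php
<?php
// The function name is stored as a string, as with LL_NB_HASH_FUNCTION above.
$hashFunction = 'crc32';

// Calling through the variable works: PHP resolves the string to crc32().
$first = (string) $hashFunction('marriage');

// Reusing one variable for both the name and the result breaks the next call.
$hash = 'crc32';
$hash = (string) $hash('marriage');    // $hash now holds digits, not a function name

var_dump(is_callable($hashFunction));  // bool(true)
var_dump(is_callable($hash));          // bool(false)
```

Calling $hash('anything') at this point would raise the same "Call to undefined function" fatal error as in the question.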

Related

PHP, Fatal error: Uncaught Error: Using $this when not in object context in

I don't understand why this error occurs.
I want to replace some strings in an array with data from the database, so I used preg_replace_callback. But when I use the following line in the definition of the callback function,
$row = $this->result->fetch_assoc();
the parser reports the error above.
Full code:
public function tRsql() {
$argNums = func_num_args();
$argsArr = func_get_args();
function change($matches) {
if(stripos($matches[0], "sql:")) {
$str = ltrim($matches[0], "#sql:");
$str = rtrim($str, ":");
$row = $this->result->fetch_assoc();
$str = $row[$str];
return $str;
} else {
return $matches[0];
}
}
for($i = 0; $i < $this->numOfRows; $i++) {
$argsArr = preg_replace_callback("/(#sql\:)\S+\:/", "change", $argsArr);
$this->tR(implode(",",$argsArr));
}
}
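Do it this way, with an anonymous function assigned to a variable instead of a named function declared inside the method: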
public function tRsql() {
$argNums = func_num_args();
$argsArr = func_get_args();
$change = function ($matches) {
if(stripos($matches[0], "sql:")) {
$str = ltrim($matches[0], "#sql:");
$str = rtrim($str, ":");
$row = $this->result->fetch_assoc();
$str = $row[$str];
return $str;
} else {
return $matches[0];
}
};
for($i = 0; $i < $this->numOfRows; $i++) {
$argsArr = preg_replace_callback("/(#sql\:)\S+\:/", $change, $argsArr);
$this->tR(implode(",",$argsArr));
}
}
Read more here: http://php.net/manual/en/functions.anonymous.php
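The difference can be shown with a small self-contained sketch. The Greeter class and its placeholder syntax are made up for illustration and are not part of the question's code: a closure created inside a method is bound to the current object, so $this works inside it, whereas a named function declared inside a method is an ordinary global function with no object context.

```php
<?php
// Hypothetical class, used only to demonstrate closure binding.
class Greeter
{
    private $name = 'world';

    public function greetAll(array $inputs): array
    {
        // The closure is created in object context, so $this is available.
        $callback = function (array $matches): string {
            return 'hello ' . $this->name;
        };

        // With an array subject, preg_replace_callback returns an array.
        return preg_replace_callback('/#name/', $callback, $inputs);
    }
}

$result = (new Greeter())->greetAll(['say #name', 'no placeholder']);
print_r($result);
```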

Comma separated string to parent child relationship array php

I have a comma separated string like
$str = "word1,word2,word3";
And I want to make a parent-child relationship array from it.
Here is an example:
Try simply writing your own recursive function:
$str = "word1,word2,word3";
$res = [];
function makeNested($arr) {
if(count($arr)<2)
return $arr;
$key = array_shift($arr);
return array($key => makeNested($arr));
}
print_r(makeNested(explode(',', $str)));
function tooLazyToCode($string)
{
$structure = null;
foreach (array_reverse(explode(',', $string)) as $part) {
$structure = ($structure === null) ? $part : array($part => $structure);
}
return $structure;
}
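Note that the two answers above do not return quite the same shape: the recursive version leaves the last word wrapped in a one-element array, while the loop leaves it as a plain string. A minimal sketch to compare them side by side (the function names here are illustrative, not taken from the answers):

```php
<?php
// Recursive approach: peel off the first part and nest the rest under it.
function nestRecursive(array $parts): array
{
    if (count($parts) < 2) {
        return $parts;               // the last element stays wrapped in an array
    }
    $key = array_shift($parts);
    return [$key => nestRecursive($parts)];
}

// Iterative approach: build the structure from the innermost part outwards.
function nestIterative(string $csv)
{
    $structure = null;
    foreach (array_reverse(explode(',', $csv)) as $part) {
        // The innermost value is the bare string; every other part becomes a key.
        $structure = ($structure === null) ? $part : [$part => $structure];
    }
    return $structure;
}

$str = 'word1,word2,word3';
var_export(nestRecursive(explode(',', $str)));
echo "\n";
var_export(nestIterative($str));
```

Which shape is wanted depends on how the result is consumed, so it is worth checking before picking one.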
Please check the code below; it should take about half the time of the answers above:
<?php
$str = "sports,cricket,football,hockey,tennis";
$arr = explode(',', $str);
$result = array();
$arr_len = count($arr) - 1;
$prev = $arr_len;
for($i = $arr_len; $i>=0;$i--){
if($prev != $i){
$result = array($arr[$i] => $result);
} else {
$result = array ($arr[$i]);
}
$prev = $i;
}
echo '<pre>', print_r($result, true), '</pre>'; // pass true so print_r returns the string instead of printing it
Here is another version, which gives the result in the form you asked for:
<?php
$str = "sports,cricket,football,hockey,tennis";
$arr = explode(',', $str);
$result = array();
$arr_len = count($arr) - 1;
$prev = $arr_len;
for($i = $arr_len; $i>=0;$i--){
if($prev != $i){
if($i == 0){
$result = array($arr[$i] => $result);
}else{
$result = array(array($arr[$i] => $result));
}
} else {
$result = array ($arr[$i]);
}
$prev = $i;
}
echo '<pre>', print_r($result, true), '</pre>'; // pass true so print_r returns the string instead of printing it

Multiple String Replace Based on Index

I need to replace multiple sections of a string based on their indices.
$string = '01234567890123456789';
$replacements = array(
array(3, 2, 'test'),
array(8, 2, 'haha')
);
$expected_result = '012test567haha0123456789';
Indices in $replacements are expected not to have overlaps.
I have been trying to write my own solution: split the original string into multiple pieces based on whether each section needs to be replaced, and finally combine them:
echo str_replace_with_indices($string, $replacements);
// outputs the expected result '012test567haha0123456789'
function str_replace_with_indices ($string, $replacements) {
$string_chars = str_split($string);
$string_sections = array();
$replacing = false;
$section = 0;
foreach($string_chars as $char_idx => $char) {
if ($replacing != (($r_idx = replacing($replacements, $char_idx)) !== false)) {
$replacing = !$replacing;
$section++;
}
$string_sections[$section] = $string_sections[$section] ? $string_sections[$section] : array();
$string_sections[$section]['original'] .= $char;
if ($replacing) $string_sections[$section]['new'] = $replacements[$r_idx][2];
}
$string_result = '';
foreach($string_sections as $s) {
$string_result .= ($s['new']) ? $s['new'] : $s['original'];
}
return $string_result;
}
function replacing($replacements, $idx) {
foreach($replacements as $r_idx => $r) {
if ($idx >= $r[0] && $idx < $r[0]+$r[1]) {
return $r_idx;
}
}
return false;
}
Is there a more efficient way to achieve the same result?
The above solution doesn't look elegant and feels quite long for string replacement.
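Use substr_replace() and track how far earlier replacements have shifted the string. This assumes $rep is sorted by start index:
$str = '01234567890123456789';
$rep = array(array(3,2,'test'), array(8,2,'haha'));
$index = 0;
$index_strlen = 0; // net change in length caused by earlier replacements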
foreach($rep as $s)
{
$index = $s[0]+$index_strlen;
$str = substr_replace($str, $s[2], $index, $s[1]);
$index_strlen += strlen($s[2]) - $s[1];
}
echo $str;
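An alternative sketch, assuming as in the question that the ranges do not overlap: apply the replacements from the highest start index down, so that no edit shifts the offsets of the ones still to come and no running offset is needed. The function name replace_by_indices is hypothetical, and the syntax requires PHP 7.1 or later.

```php
<?php
// Hypothetical helper: each replacement is a [start, length, text] triple.
function replace_by_indices(string $string, array $replacements): string
{
    // Sort by start index, descending, so later edits never move earlier offsets.
    usort($replacements, function (array $a, array $b): int {
        return $b[0] <=> $a[0];
    });

    foreach ($replacements as [$start, $length, $text]) {
        $string = substr_replace($string, $text, $start, $length);
    }
    return $string;
}

$replaced = replace_by_indices('01234567890123456789', [[3, 2, 'test'], [8, 2, 'haha']]);
echo $replaced; // prints 012test567haha0123456789
```

Because the helper sorts its input, the caller does not have to pass the replacements in any particular order.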

Parsing vCard in php

Hi, I want to parse the vCard format into an array. Users may upload vCard 2.1 or vCard 3.0, and I should be able to parse both. I just want the emails and names from the vCard in a PHP array.
I have tried vcardphp.sourceforge.net:
<?php
require("vcard.php");
$cards = parse_vcards(file('sample.txt'));
print_r($cards);
function parse_vcards($lines)
{
$cards = array();
$card = new VCard();
while ($card->parse($lines)) {
$property = $card->getProperty('N');
if (!$property) {
return "";
}
$n = $property->getComponents();
$tmp = array();
if ($n[3]) $tmp[] = $n[3]; // Mr.
if ($n[1]) $tmp[] = $n[1]; // John
if ($n[2]) $tmp[] = $n[2]; // Quinlan
if ($n[4]) $tmp[] = $n[4]; // Esq.
$ret = array();
if ($n[0]) $ret[] = $n[0];
$tmp = join(" ", $tmp);
if ($tmp) $ret[] = $tmp;
$key = join(", ", $ret);
$cards[$key] = $card;
// MDH: Create new VCard to prevent overwriting previous one (PHP5)
$card = new VCard();
}
ksort($cards);
return $cards;
}
?>
Notice: Undefined index: ENCODING in H:\www\vcardphp\vcard.php on line 146
Notice: Undefined index: CHARSET in H:\www\vcardphp\vcard.php on line 149
The sample code given doesn't work at all; it produces too many "Undefined index" errors.
I would take a look at the open source project vCard PHP. Has worked for me!
http://vcardphp.sourceforge.net/
The problem is just that the http://vcardphp.sourceforge.net/ sample doesn't work with the given code. You can modify the code so that it doesn't fail on missing data. Start with parse_vcards() from vbook.php.
Note the added checks of the form: if (!empty($n[*])) $tmp[] = $n[*];
function parse_vcards(&$lines)
{
$cards = array();
$card = new VCard();
while ($card->parse($lines)) {
$property = $card->getProperty('N');
if (!$property) {
return "";
}
$n = $property->getComponents();
$tmp = array();
if (!empty($n[3])) $tmp[] = $n[3]; // Mr.
if (!empty($n[1])) $tmp[] = $n[1]; // John
if (!empty($n[2])) $tmp[] = $n[2]; // Quinlan
if (!empty($n[4])) $tmp[] = $n[4]; // Esq.
$ret = array();
if (!empty($n[0])) $ret[] = $n[0];
$tmp = join(" ", $tmp);
if ($tmp) $ret[] = $tmp;
$key = join(", ", $ret);
$cards[$key] = $card;
// MDH: Create new VCard to prevent overwriting previous one (PHP5)
$card = new VCard();
}
ksort($cards);
return $cards;
}
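Then modify the parse function in vcard.php to accommodate parameters that may be missing. Note that each(), used below, was removed in PHP 8.0, so this code as written needs PHP 7 or earlier.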
function parse(&$lines)
{
while (list(, $line) = each($lines)) {
$line = rtrim($line);
$tmp = split_quoted_string(":", $line, 2);
if (count($tmp) == 2) {
$this->value = $tmp[1];
$tmp = strtoupper($tmp[0]);
$tmp = split_quoted_string(";", $tmp);
$this->name = $tmp[0];
$this->params = array();
for ($i = 1; $i < count($tmp); $i++) {
$this->_parseParam($tmp[$i]);
}
$encoding_defined = array_key_exists('ENCODING', $this->params);
if ($encoding_defined && $this->params['ENCODING'][0] == 'QUOTED-PRINTABLE') {
$this->_decodeQuotedPrintable($lines);
}
$charset_defined = array_key_exists('CHARSET', $this->params);
if ($charset_defined && $this->params['CHARSET'][0] == 'UTF-8') {
$this->value = utf8_decode($this->value);
}
return true;
}
}
return false;
}
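The same kind of guard can also be written with isset() or the null coalescing operator. The following is only a sketch on a made-up $params array shaped like the one the parser builds; it is not a change to vcard.php itself.

```php
<?php
// Hypothetical parsed parameters for a line such as "EMAIL;CHARSET=UTF-8:...".
// ENCODING is deliberately missing.
$params = ['CHARSET' => ['UTF-8']];

// isset() is simply false for a missing key and raises no notice.
$isQuotedPrintable = isset($params['ENCODING'][0])
    && $params['ENCODING'][0] === 'QUOTED-PRINTABLE';

// The null coalescing operator (PHP 7+) supplies a default instead.
$charset = $params['CHARSET'][0] ?? 'unknown';
$encoding = $params['ENCODING'][0] ?? 'none';

var_dump($isQuotedPrintable, $charset, $encoding);
```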

"like" search and highlighting in PHP

I have a list of brands and want to provide a search function with highlighting. For example, there are the following brands:
Apple
Cewe Color
L'Oréal
Microsoft
McDonald's
Tom Tailor
The user then types lor into the search form. I'm using the following snippet for searching:
class search {
private function simplify($str) {
return str_replace(array('&',' ',',','.','?','|','\'','"'), '', iconv('UTF-8', 'ASCII//TRANSLIT', $str));
}
public function do_search($search) {
$search = self::simplify($search);
$found = array();
foreach (self::$_brands as $brand) {
if (mb_strstr(self::simplify($brand['name']), $search) !== false) $found[]= $brand;
}
return $found;
}
}
That gives me:
Cewe Color
L'Oréal
Tom Tailor
How could highlighting be done? Like this:
Cewe Co<b>lor</b>
L'<b>Oré</b>al
Tom Tai<b>lor</b>
By the way, I know that most cases can be handled with str_replace(), but that does not fit my needs in all cases.
$highlighted = str_replace($search, "<b>$search</b>", $brand);
would be the simplest method.
:)
Works with FedEx also ;)
$_brands = array
(
"Apple",
"Cewe Color",
"L'Oréal",
"Microsoft",
"McDonald's",
"Tom Tailor"
);
$q = 'lor';
$search = clean($q);
foreach($_brands as $key => $brand){
$brand = clean($brand);
$x = stripos($brand, $search);
if($x !== false){
$regexp = NULL;
$l = strlen($q);
for($i = 0; $i < $l; $i++){
$regexp .= mb_strtoupper($q[$i]).'.?';
}
$regexp = substr($regexp, 0, strlen($regexp) - 2);
$new = $_brands[$key];
$new = preg_replace('#('.$regexp.')#ui', '<b>$0</b>', $new);
echo $new."<br />";
}
}
function clean($string){
$string = iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $string);
$string = preg_replace('#[^\w]#ui', '', $string);
return $string;
}
self::$_brands contains the result from the database (with the columns name, name_lower, name_translit, name_simplified):
class search {
private function translit($str) {
return iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', str_replace(array('ä', 'ü', 'ö', 'ß'), array('a', 'u', 'o', 's'), mb_strtolower($str)));
}
private function simplify($str) {
return preg_replace('/([^a-z0-9])/ui', '', self::translit($str));
}
public function do_search($simplified) {
$found = array();
foreach (self::$_brands as $brand) {
if (mb_strstr($brand['name_simplified'], $simplified) !== false) $found[]= $brand;
}
return $found;
}
private function actionDefault() {
$search = $_POST['search_fld'];
$simplified = self::simplify($search);
$result = self::do_search($simplified);
$brands = array();
foreach ($result as $brand) {
$hl_start = mb_strpos($brand['name_simplified'], $simplified);
$hl_len = mb_strlen($simplified);
$brand_len = mb_strlen($brand['name']);
$tmp = '';
$cnt_extra = 0;
$start_tag = false;
$end_tag = false;
for ($i = 0; $i < $brand_len; $i++) {
if (($i - $cnt_extra) < mb_strlen($brand['name_simplified']) && mb_substr($brand['name_translit'], $i, 1) != mb_substr($brand['name_simplified'], $i - $cnt_extra, 1)) $cnt_extra++;
if (($i - $cnt_extra) == $hl_start && !$start_tag) {
$tmp .= '<b>';
$start_tag = true;
}
$tmp .= mb_substr($brand['name'], $i, 1);
if (($i - $cnt_extra + 1) == ($hl_start + $hl_len) && !$end_tag) {
$tmp .= '</b>';
$end_tag = true;
}
}
if ($start_tag && !$end_tag) $tmp .= '</b>';
$brands[] = "" . $tmp . "";
}
echo implode(' | ', $brands);
}
}
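The position-mapping idea behind the code above can be sketched more compactly: simplify the name one character at a time, remember which original character each simplified character came from, search in the simplified string, and then wrap the corresponding original span. Everything named here is an assumption made for the sketch: simplify_char() and highlight() are hypothetical helpers, the tiny transliteration table stands in for iconv(), only the first match is highlighted, and no HTML escaping is done.

```php
<?php
// Hypothetical helper: reduce one character to its searchable form
// (a lowercase ASCII letter or digit), or '' if the character is ignored.
function simplify_char(string $ch): string
{
    // Assumed, deliberately small transliteration table for this sketch.
    static $map = ['é' => 'e', 'É' => 'e', 'ä' => 'a', 'ö' => 'o', 'ü' => 'u', 'ß' => 's'];
    $ch = strtolower($map[$ch] ?? $ch);
    return preg_match('/^[a-z0-9]$/', $ch) ? $ch : '';
}

function highlight(string $name, string $query): string
{
    // Split into UTF-8 characters so that offsets count characters, not bytes.
    $chars = preg_split('//u', $name, -1, PREG_SPLIT_NO_EMPTY);

    $simple = '';   // simplified form of $name
    $origin = [];   // $origin[$k] = index in $chars of the k-th simplified character
    foreach ($chars as $i => $ch) {
        $s = simplify_char($ch);
        if ($s !== '') {
            $simple .= $s;
            $origin[] = $i;
        }
    }

    $needle = '';
    foreach (preg_split('//u', $query, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
        $needle .= simplify_char($ch);
    }

    $pos = ($needle === '') ? false : strpos($simple, $needle);
    if ($pos === false) {
        return $name;   // nothing to highlight
    }

    // Map the match in the simplified string back to original character indices.
    $first = $origin[$pos];
    $last  = $origin[$pos + strlen($needle) - 1];

    return implode('', array_slice($chars, 0, $first))
        . '<b>' . implode('', array_slice($chars, $first, $last - $first + 1)) . '</b>'
        . implode('', array_slice($chars, $last + 1));
}

echo highlight('Cewe Color', 'lor'), "\n";   // prints Cewe Co<b>lor</b>
echo highlight('Tom Tailor', 'lor'), "\n";   // prints Tom Tai<b>lor</b>
```

With this mapping, characters that the simplification drops (such as the apostrophe in L'Oréal) fall inside the highlighted span whenever the match runs across them.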
