Why doesn't my testing script (PHP) work? - php

I have a database containing tweets. Furthermore, I have classified these tweets as either being 'negative', 'neutral' or 'positive'. I have done so manually and am now trying to figure out how well my computer could classify them, based on a Naive Bayes classifier.
For testing the accuracy of classification (the amount of tweets classified by the computer in the same way as I did manually divided by the total amount), a script has been written.
I however face a problem with this PHP script. When running it, it gives the error 'Division by zero in C:\wamp\and-so-on'. This is probably due to the fact that the counter is not updated. Furthermore, the amount of 'right classes' do not seem to be updated either. These two parts are essential as the formula for accuracy is: 'right classes' divided by 'counter'.
My question is: what do you think the problem is when looking at the script? And how could I potentially fix it?
The script for testing:
$test_array = array();
$counter = 0;
$timer1 = microtime(true);
$right_classes = 0;
foreach ($test_set as $test_item) {
$tweet_id = $test_item['tweet_id'];
$class_id_shouldbe = $test_item['class_id'];
$tweet = Tweets::loadOne($tweet_id);
// # Preprocess if not done already
// $steps->processTweet($tweet_id, $tweet);
// $tweet = Tweets::loadOne($tweet_id);
if ((int) $tweet['classified'] > 0 || !$tweet['valid']) continue;
if (strlen($tweet['processed_text']) == 0) {
$steps->processTweet($tweet_id, $tweet);
$tweet = Tweets::loadOne($tweet_id);
if (strlen($tweet['processed_text']) == 0) {
echo "Kon tweet '$tweet_id' niet processen. <br>";
continue;
}
}
$class_id = $classifier->classify($tweet['processed_text']);
# Add tweets in database
// Tweets::addClassId($tweet_id, $class_id_shouldbe);
$test_array[$tweet_id] = array(
'what_human_said' => $class_id_shouldbe,
'what_classifier_said' => $class_id,
);
if ($class_id_shouldbe == $class_id) $right_classes++;
$counter++;
if ($counter > 936) break;
echo "$tweet_id,$class_id_shouldbe,$class_id<br>";
}
$timer2 = microtime(true);
echo '<br><br>klaar in '.round($timer2-$timer1, 3).' sec<br>';
echo ($right_classes/$counter)*100 .' %';
exit();

first of all just fix error, and then try to verify why $counter is zero. To fix $counter just verify before division:
if($counter!=0) echo ($right_classes/$counter)*100 .' %'; else echo '0 %';
Then looking at your code, you use continue to get next item in foreach then it's not guaranteed that $counter is reached and then you get Division by zero error.
Hope it helps!

Related

Random generator returning endless duplicates

I am trying to create a random string which will be used as a short reference number. I have spent the last couple of days trying to get this to work but it seems to get to around 32766 records and then it continues with endless duplicates. I need at minimum 200,000 variations.
The code below is a very simple mockup to explain what happens. The code should be syntaxed according to 1a-x1y2z (example) which should give a lot more results than 32k
I have a feeling it may be related to memory but not sure. Any ideas?
<?php
function createReference() {
$num = rand(1, 9);
$alpha = substr(str_shuffle("abcdefghijklmnopqrstuvwxyz"), 0, 1);
$char = '0123456789abcdefghijklmnopqrstuvwxyz';
$charLength = strlen($char);
$rand = '';
for ($i = 0; $i < 6; $i++) {
$rand .= $char[rand(0, $charLength - 1)];
}
return $num . $alpha . "-" . $rand;
}
$codes = [];
for ($i = 1; $i <= 200000; $i++) {
$code = createReference();
while (in_array($code, $codes) == true) {
echo 'Duplicate: ' . $code . '<br />';
$code = createReference();
}
$codes[] = $code;
echo $i . ": " . $code . "<br />";
}
exit;
?>
UPDATE
So I am beginning to wonder if this is not something with our WAMP setup (Bitnami) as our local machine gets to exactly 1024 records before it starts duplicating. By removing 1 character from the string above (instead of 6 in the for loop I make it 5) it gets to exactly 32768 records.
I uploaded the script to our centos server and had no duplicates.
What in our enviroment could cause such a behaviour?
The code looks overly complex to me. Let's assume for the moment you really want to create n unique strings each based on a single random value (rand/mt_rand/something between INT_MIN,INT_MAX).
You can start by decoupling the generation of the random values from the encoding (there seems to be nothing in the code that makes a string dependant on any previous state - excpt for the uniqueness). Comparing integers is quite a bit faster than comparing arbitrary strings.
mt_rand() returns anything between INT_MIN and INT_MAX, using 32bit integers (could be 64bit as well, depends on how php has been compiled) that gives ~232 elements. You want to pick 200k, let's make it 400k, that's ~ a 1/10000 of the value range. It's therefore reasonable to assume everything goes well with the uniqueness...and then check at a later time. and add more values if a collision occured. Again much faster than checking in_array in each iteration of the loop.
Once you have enough values, you can encode/convert them to a format you wish. I don't know whether the <digit><character>-<something> format is mandatory but assume it is not -> base_convert()
<?php
function unqiueRandomValues($n) {
$values = array();
while( count($values) < $n ) {
for($i=count($values);$i<$n; $i++) {
$values[] = mt_rand();
}
$values = array_unique($values);
}
return $values;
}
function createReferences($n) {
return array_map(
function($e) {
return base_convert($e, 10, 36);
},
unqiueRandomValues($n)
);
}
$start = microtime(true);
$references = createReferences(400000);
$end = microtime(true);
echo count($references), ' ', count(array_unique($references)), ' ', $end-$start, ' ', $references[0];
prints e.g. 400000 400000 3.3981630802155 f3plox on my i7-4770. (The $end-$start part is constantly between 3.2 and 3.4)
Using base_convert() there can be strings like li10, which can be quite annoying to decipher if you have to manually type the string.

Knapsack Equation with item groups

Can't call it a problem on Stack Overflow apparently, however I am currently trying to understand how to integrate constraints in the form of item groups within the Knapsack problem. My math skills are proving to be fairly limiting in this situation, however I am very motivated to both make this work as intended as well as figure out what each aspect does (in that order since things make more sense when they work).
With that said, I have found an absolutely beautiful implementation at Rosetta Code and cleaned up the variable names some to help myself better understand this from a very basic perspective.
Unfortunately I am having an incredibly difficult time figuring out how I can apply this logic to include item groups. My purpose is for building fantasy teams, supplying my own value & weight (points/salary) per player but without groups (positions in my case) I am unable to do so.
Would anyone be able to point me in the right direction for this? I'm reviewing code examples from other languages and additional descriptions of the problem as a whole, however I would like to get the groups implemented by whatever means possible.
<?php
function knapSolveFast2($itemWeight, $itemValue, $i, $availWeight, &$memoItems, &$pickedItems)
{
global $numcalls;
$numcalls++;
// Return memo if we have one
if (isset($memoItems[$i][$availWeight]))
{
return array( $memoItems[$i][$availWeight], $memoItems['picked'][$i][$availWeight] );
}
else
{
// At end of decision branch
if ($i == 0)
{
if ($itemWeight[$i] <= $availWeight)
{ // Will this item fit?
$memoItems[$i][$availWeight] = $itemValue[$i]; // Memo this item
$memoItems['picked'][$i][$availWeight] = array($i); // and the picked item
return array($itemValue[$i],array($i)); // Return the value of this item and add it to the picked list
}
else
{
// Won't fit
$memoItems[$i][$availWeight] = 0; // Memo zero
$memoItems['picked'][$i][$availWeight] = array(); // and a blank array entry...
return array(0,array()); // Return nothing
}
}
// Not at end of decision branch..
// Get the result of the next branch (without this one)
list ($without_i,$without_PI) = knapSolveFast2($itemWeight, $itemValue, $i-1, $availWeight,$memoItems,$pickedItems);
if ($itemWeight[$i] > $availWeight)
{ // Does it return too many?
$memoItems[$i][$availWeight] = $without_i; // Memo without including this one
$memoItems['picked'][$i][$availWeight] = array(); // and a blank array entry...
return array($without_i,array()); // and return it
}
else
{
// Get the result of the next branch (WITH this one picked, so available weight is reduced)
list ($with_i,$with_PI) = knapSolveFast2($itemWeight, $itemValue, ($i-1), ($availWeight - $itemWeight[$i]),$memoItems,$pickedItems);
$with_i += $itemValue[$i]; // ..and add the value of this one..
// Get the greater of WITH or WITHOUT
if ($with_i > $without_i)
{
$res = $with_i;
$picked = $with_PI;
array_push($picked,$i);
}
else
{
$res = $without_i;
$picked = $without_PI;
}
$memoItems[$i][$availWeight] = $res; // Store it in the memo
$memoItems['picked'][$i][$availWeight] = $picked; // and store the picked item
return array ($res,$picked); // and then return it
}
}
}
$items = array("map","compass","water","sandwich","glucose","tin","banana","apple","cheese","beer","suntan cream","camera","t-shirt","trousers","umbrella","waterproof trousers","waterproof overclothes","note-case","sunglasses","towel","socks","book");
$weight = array(9,13,153,50,15,68,27,39,23,52,11,32,24,48,73,42,43,22,7,18,4,30);
$value = array(150,35,200,160,60,45,60,40,30,10,70,30,15,10,40,70,75,80,20,12,50,10);
## Initialize
$numcalls = 0;
$memoItems = array();
$selectedItems = array();
## Solve
list ($m4, $selectedItems) = knapSolveFast2($weight, $value, sizeof($value)-1, 400, $memoItems, $selectedItems);
# Display Result
echo "<b>Items:</b><br>" . join(", ", $items) . "<br>";
echo "<b>Max Value Found:</b><br>$m4 (in $numcalls calls)<br>";
echo "<b>Array Indices:</b><br>". join(",", $selectedItems) . "<br>";
echo "<b>Chosen Items:</b><br>";
echo "<table border cellspacing=0>";
echo "<tr><td>Item</td><td>Value</td><td>Weight</td></tr>";
$totalValue = 0;
$totalWeight = 0;
foreach($selectedItems as $key)
{
$totalValue += $value[$key];
$totalWeight += $weight[$key];
echo "<tr><td>" . $items[$key] . "</td><td>" . $value[$key] . "</td><td>".$weight[$key] . "</td></tr>";
}
echo "<tr><td align=right><b>Totals</b></td><td>$totalValue</td><td>$totalWeight</td></tr>";
echo "</table><hr>";
?>
That knapsack program is traditional, but I think that it obscures what's going on. Let me show you how the DP can be derived more straightforwardly from a brute force solution.
In Python (sorry; this is my scripting language of choice), a brute force solution could look like this. First, there's a function for generating all subsets with breadth-first search (this is important).
def all_subsets(S): # brute force
subsets_so_far = [()]
for x in S:
new_subsets = [subset + (x,) for subset in subsets_so_far]
subsets_so_far.extend(new_subsets)
return subsets_so_far
Then there's a function that returns True if the solution is valid (within budget and with a proper position breakdown) – call it is_valid_solution – and a function that, given a solution, returns the total player value (total_player_value). Assuming that players is the list of available players, the optimal solution is this.
max(filter(is_valid_solution, all_subsets(players)), key=total_player_value)
Now, for a DP, we add a function cull to all_subsets.
def best_subsets(S): # DP
subsets_so_far = [()]
for x in S:
new_subsets = [subset + (x,) for subset in subsets_so_far]
subsets_so_far.extend(new_subsets)
subsets_so_far = cull(subsets_so_far) ### This is new.
return subsets_so_far
What cull does is to throw away the partial solutions that are clearly not going to be missed in our search for an optimal solution. If the partial solution is already over budget, or if it already has too many players at one position, then it can safely be discarded. Let is_valid_partial_solution be a function that tests these conditions (it probably looks a lot like is_valid_solution). So far we have this.
def cull(subsets): # INCOMPLETE!
return filter(is_valid_partial_solution, subsets)
The other important test is that some partial solutions are just better than others. If two partial solutions have the same position breakdown (e.g., two forwards and a center) and cost the same, then we only need to keep the more valuable one. Let cost_and_position_breakdown take a solution and produce a string that encodes the specified attributes.
def cull(subsets):
best_subset = {} # empty dictionary/map
for subset in filter(is_valid_partial_solution, subsets):
key = cost_and_position_breakdown(subset)
if (key not in best_subset or
total_value(subset) > total_value(best_subset[key])):
best_subset[key] = subset
return best_subset.values()
That's it. There's a lot of optimization to be done here (e.g., throw away partial solutions for which there's a cheaper and more valuable partial solution; modify the data structures so that we aren't always computing the value and position breakdown from scratch and to reduce the storage costs), but it can be tackled incrementally.
One potential small advantage with regard to composing recursive functions in PHP is that variables are passed by value (meaning a copy is made) rather than reference, which can save a step or two.
Perhaps you could better clarify what you are looking for by including a sample input and output. Here's an example that makes combinations from given groups - I'm not sure if that's your intention... I made the section accessing the partial result allow combinations with less value to be considered if their weight is lower - all of this can be changed to prune in the specific ways you would like.
function make_teams($players, $position_limits, $weights, $values, $max_weight){
$player_counts = array_map(function($x){
return count($x);
}, $players);
$positions = array_map(function($x){
$positions[] = [];
},$position_limits);
$num_positions = count($positions);
$combinations = [];
$hash = [];
$stack = [[$positions,0,0,0,0,0]];
while (!empty($stack)){
$params = array_pop($stack);
$positions = $params[0];
$i = $params[1];
$j = $params[2];
$len = $params[3];
$weight = $params[4];
$value = $params[5];
// too heavy
if ($weight > $max_weight){
continue;
// the variable, $positions, is accumulating so you can access the partial result
} else if ($j == 0 && $i > 0){
// remember weight and value after each position is chosen
if (!isset($hash[$i])){
$hash[$i] = [$weight,$value];
// end thread if current value is lower for similar weight
} else if ($weight >= $hash[$i][0] && $value < $hash[$i][1]){
continue;
// remember better weight and value
} else if ($weight <= $hash[$i][0] && $value > $hash[$i][1]){
$hash[$i] = [$weight,$value];
}
}
// all positions have been filled
if ($i == $num_positions){
$positions[] = $weight;
$positions[] = $value;
if (!empty($combinations)){
$last = &$combinations[count($combinations) - 1];
if ($weight < $last[$num_positions] && $value > $last[$num_positions + 1]){
$last = $positions;
} else {
$combinations[] = $positions;
}
} else {
$combinations[] = $positions;
}
// current position is filled
} else if (count($positions[$i]) == $position_limits[$i]){
$stack[] = [$positions,$i + 1,0,$len,$weight,$value];
// otherwise create two new threads: one with player $j added to
// position $i, the other thread skipping player $j
} else {
if ($j < $player_counts[$i] - 1){
$stack[] = [$positions,$i,$j + 1,$len,$weight,$value];
}
if ($j < $player_counts[$i]){
$positions[$i][] = $players[$i][$j];
$stack[] = [$positions,$i,$j + 1,$len + 1
,$weight + $weights[$i][$j],$value + $values[$i][$j]];
}
}
}
return $combinations;
}
Output:
$players = [[1,2],[3,4,5],[6,7]];
$position_limits = [1,2,1];
$weights = [[2000000,1000000],[10000000,1000500,12000000],[5000000,1234567]];
$values = [[33,5],[78,23,10],[11,101]];
$max_weight = 20000000;
echo json_encode(make_teams($players, $position_limits, $weights, $values, $max_weight));
/*
[[[1],[3,4],[7],14235067,235],[[2],[3,4],[7],13235067,207]]
*/

How to check if data already exists then randomly generate new data from an Array

Okay so, I made an array containing 270+ different strings. The main goal is too echo out 60 strings from that array and combine them..
Ex. 1.String, 1.String2.
The 2 echo'd out strings are a combination, and should be together like 1-1 and 2-2, 3-3, and ect. In total there would be 30 combinations, but 60 strings.
Also, the way I have it is the first generated string should be combo'd with the 31st string generated
I was able to do this using a for loop, but now I had to use a mysql database to check if that combination already exists, then tell the random operator to generate another combination, until it finds a combination that doesn't already exist in the table. Here is what I have so far, but I'm not sure where to go from here.
$input = array("ITEM1", "ITEM2", "ITEM3", "ITEM4", "ITEM5"); //There are 273 items in the list
$rand_keys = array_rand($input, 60);
$con = mysqli_connect($server, $user, $password, $name_db);
for ($i = 0; $i <= 29; $i++) {
$check = "SELECT * FROM contest WHERE name1 = '$input[$rand_keys[$i]]' AND name2 = '$input[$rand_keys[$i+30]]'";
$rs = mysqli_query($con, $check);
$data = mysqli_fetch_array($rs, MYSQLI_NUM);
if ($data[$i] > 1) {
//This combination already exists generate a new one until it generates one that doesn't exist
while ($data[$i] > 1) {
$rand_keys2 = array_rand($input, 2);
$input[$rand_keys2[0]] = $input[$rand_keys[$i]];
$input[$rand_keys2[1]] = $input[$rand_keys[$i + 30]];
}
echo $input[$rand_keys[$i]];
echo '-';
echo $input[$rand_keys[$i + 30]];
} else {
echo $input[$rand_keys[$i]];
echo '-';
echo $input[$rand_keys[$i + 30]];
}
}
That is all I have so far, I'm pretty sure my error is in the mysql statement could anyone fix that for me, and there is probably a bunch of mistakes in the rest of it. I'm just beginning, so don't mind some noob mistakes if you see them. Btw, when I run it, it just comes to a blank page.
In my personal opinion, NEVER try to get data from an array within quotes! Always do it outside of quotes; especially in multi-denominational arrays.
'$input[$rand_keys[$i]]' should be rewritten as '".$input[$rand_keys[$i]]."' OR '{$input[$rand_keys[$i]]}'.
In my opinion it is better to do it outside of quotes instead of using { }.

Add increasing number to filename, how not to loop?

I'm trying to modify a script that generates random numbers to filenames and instead have a pretty increasing counter of +1 each new file
The original function is very simple it looks like this:
$name .= generateRandomString(5);
What I've came up with my mediocre skills is:
$name .= $count = 1; while ($count <= 10) { echo "$count "; ++$count; }
However, what happens when I run the code is that it just keeps on looping. I was looking for a function similar to the: generateRandomString but for increasing numbers, is there any?
Any ideas?
user current timestamp in place of $count, it will always keep increasing.
$name .= time()

PHP Word guessing game (highlight letters in right and wrong position - like mastermind)

Sorry for the long title.
Wanted it to be as descriptive as possible.
Disclaimer : Could find some "find the differences" code here and elsewhere on Stackoverflow, but not quite the functionality I was looking for.
I'll be using these terminoligy later on:
'userguess' : a word that will be entered by the user
'solution' : the secret word that needs to be guessed.
What I need to create
A word guessing game where:
The user enters a word (I'll make sure through Javascript/jQuery that
the entered word contains the same number of letters as the word to
be guessed).
A PHP function then checks the 'userguess' and highlights (in green)
the letters in that word which are in the correct place, and
highlights (in red) the letters that are not yet in the right place,
but do show up somewhere else in the word. The letters that don't
show up in the 'solution' are left black.
Pitfall Scenario : - Let's say the 'solution' is 'aabbc' and the user guesses 'abaac'
In the above scenario this would result in : (green)a(/green)(red)b(/red)(red)a(/red)(black)a(/black)(green)c(/green)
Notice how the last "a" is black cause 'userguess' has 3 a's but 'solution' only has 2
What I have so far
Code is working more or less, but I've got a feeling it can be 10 times more lean and mean.
I'm filling up 2 new Arrays (one for solution and one for userguess) as I go along to prevent the pitfall (see above) from messing things up.
function checkWord($toCheck) {
global $solution; // $solution word is defined outside the scope of this function
$goodpos = array(); // array that saves the indexes of all the right letters in the RIGHT position
$badpos = array(); // array that saves the indexes of all the right letters in the WRONG position
$newToCheck = array(); // array that changes overtime to help us with the Pitfall (see above)
$newSolution = array();// array that changes overtime to help us with the Pitfall (see above)
// check for all the right letters in the RIGHT position in entire string first
for ($i = 0, $j = strlen($toCheck); $i < $j; $i++) {
if ($toCheck[$i] == $solution[$i]) {
$goodpos[] = $i;
$newSolution[$i] = "*"; // RIGHT letters in RIGHT position are 'deleted' from solution
} else {
$newToCheck[] = $toCheck[$i];
$newSolution[$i] = $solution[$i];
}
}
// go over the NEW word to check for letters that are not in the right position but show up elsewhere in the word
for ($i = 0, $j = count($newSolution); $i <= $j; $i++) {
if (!(in_array($newToCheck[$i], $newSolution))) {
$badpos[] = $i;
$newSolution[$i] = "*";
}
}
// use the two helper arrays above 'goodpos' and 'badpos' to color the characters
for ($i = 0, $j = strlen($toCheck), $k = 0; $i < $j; $i++) {
if (in_array($i,$goodpos)) {
$colored .= "<span class='green'>";
$colored .= $toCheck[$i];
$colored .= "</span>";
} else if (in_array($i,$badpos)) {
$colored .= "<span class='red'>";
$colored .= $toCheck[$i];
$colored .= "</span>";
} else {
$colored .= $toCheck[$i];
}
}
// output to user
$output = '<div id="feedbackHash">';
$output .= '<h2>Solution was : ' . $solution . '</h2>';
$output .= '<h2>Color corrected: ' . $colored . '</h2>';
$output .= 'Correct letters in the right position : ' . count($goodpos) . '<br>';
$output .= 'Correct letters in the wrong position : ' . count($badpos) . '<br>';
$output .= '</div>';
return $output;
} // checkWord
Nice question. I'd probably do it slightly differently to you :) (I guess that's what you were hoping for!)
You can find my complete solution function here http://ideone.com/8ojAG - but I'm going to break it down step by step too.
Firstly, please try and avoid using global. There's no reason why you can't define your function as:
function checkWord($toCheck, $solution) {
You can pass the solution in and avoid potential nasties later on.
I'd start by splitting both the user guess, and the solution into arrays, and have another array to store my output in.
$toCheck = str_split($toCheck, 1);
$solution = str_split($solution, 1);
$out = array();
At each stage of the process, I'd remove the characters that have been identified as correct or incorrect from the users guess or the solution, so I don't need to flag them in any way, and the remaining stages of the function run more efficiently.
So to check for matches.
foreach ($toCheck as $pos => $char) {
if ($char == $solution[$pos]) {
$out[$pos] = "<span class=\"green\">$char</span>";
unset($toCheck[$pos], $solution[$pos]);
}
}
So for your example guess/solution, $out now contains a green 'a' at position 0, and a green c at position 4. Both the guess and the solution no longer have these indices, and will not be checked again.
A similar process for checking letters that are present, but in the wrong place.
foreach ($toCheck as $pos => $char) {
if (false !== $solPos = array_search($char, $solution)) {
$out[$pos] = "<span class=\"red\">$char</span>";
unset($toCheck[$pos], $solution[$solPos]);
}
}
In this case we are searching for the guessed letter in the solution, and removing it if it is found. We don't need to count the number of occurrences because the letters are removed as we go.
Finally the only letters remaining in the users guess, are ones that are not present at all in the solution, and since we maintained the numbered indices throughout, we can simply merge the leftover letters back in.
$out += $toCheck;
Almost there. $out has everything we need, but it's not in the correct order. Even though the indices are numeric, they are not ordered. We finish up with:
ksort($out);
return implode($out);
The result from this is:
"<span class="green">a</span><span class="red">b</span><span class="red">a</span>a<span class="green">c</span>"
Here try this, See In Action
Example output:
<?php
echo checkWord('aabbc','abaac').PHP_EOL;
echo checkWord('funday','sunday').PHP_EOL;
echo checkWord('flipper','ripple').PHP_EOL;
echo checkWord('monkey','kenney').PHP_EOL;
function checkWord($guess, $solution){
$arr1 = str_split($solution);
$arr2 = str_split($guess);
$arr1_c = array_count_values($arr1);
$arr2_c = array_count_values($arr2);
$out = '';
foreach($arr2 as $key=>$value){
$arr1_c[$value]=(isset($arr1_c[$value])?$arr1_c[$value]-1:0);
$arr2_c[$value]=(isset($arr2_c[$value])?$arr2_c[$value]-1:0);
if(isset($arr2[$key]) && isset($arr1[$key]) && $arr1[$key] == $arr2[$key]){
$out .='<span style="color:green;">'.$arr2[$key].'</span>';
}elseif(in_array($value,$arr1) && $arr2_c[$value] >= 0 && $arr1_c[$value] >= 0){
$out .='<span style="color:red;">'.$arr2[$key].'</span>';
}else{
$out .='<span style="color:black;">'.$arr2[$key].'</span>';
}
}
return $out;
}
?>

Categories