I am developing a board game in PHP and I'm having trouble writing an algorithm...
The game board is a multidimensional array ($board[10][10]) that defines the rows and columns of the board matrix...
Now I have to loop through the complete board, but with a dynamic start point. For example, the user selects cell [5,6]; this is the start point for the loop. The goal is to find all available board cells around the selected cell, to find the target cells for a move method. I think I need a performant and efficient way to do this. Does anyone know an algorithm to loop through a matrix/vector, visiting every field only once, to find the available and used cells?
Extra rule...
In the appended picture a blue field is selected (drawn a little bigger than the others). The available fields are only on the right side. The fields on the left side are available but not reachable from the currently selected position... I think this extra rule makes the algorithm a little more complicated...
Big thanks so far!
Kind regards
I'm not completely sure that I got the requirements right, so let me restate them:
You want an efficient algorithm to loop through all elements of an n×n matrix with n approximately 10, which starts at a given element (i,j) and is ordered by distance from (i,j)?
I'd loop through a distance variable d from 0 to n/2;
then, for each value of d, loop l through -(2*d) to +(2*d)-1
and pick the cells (i+d, j+l); if l >= 0, also pick (i+l, j-d) and (i+l, j+d).
For each cell you have to apply a modulo n, to map negative indexes back into the matrix.
This basically treats the matrix as a torus, gluing the upper and lower edges as well as the left and right edges together.
If you don't like that, you can let d run up to n and, instead of the modulo operation, just ignore values outside the matrix.
These approaches give you the fields directly in the correct order. For fields this small, I doubt any optimization at this level has much of an effect in most situations; Nicholas' approach might be just as good.
Update
I slightly modified the cells to pick in order to honor the rule 'only consider fields that are right of the current column or in the same column'.
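To illustrate, here is a minimal PHP sketch of the non-torus variant with the right-side rule applied. The cell picks are simplified to a full square ring per distance; the board representation and all names are my assumptions, not the asker's code:

<?php
// Enumerate cells around ($i, $j), ordered by Chebyshev distance,
// skipping cells outside the board and cells left of the start column.
function cellsByDistance(array $board, int $i, int $j): array
{
    $n = count($board);
    $ordered = [];
    for ($d = 1; $d < $n; $d++) {
        // Walk the square ring at Chebyshev distance $d around ($i, $j).
        for ($l = -$d; $l <= $d; $l++) {
            $candidates = [
                [$i - $d, $j + $l], // top edge of the ring
                [$i + $d, $j + $l], // bottom edge
                [$i + $l, $j - $d], // left edge
                [$i + $l, $j + $d], // right edge
            ];
            foreach ($candidates as [$r, $c]) {
                if ($r < 0 || $r >= $n || $c < 0 || $c >= $n) {
                    continue; // outside the board: ignore, no modulo wrap
                }
                if ($c < $j) {
                    continue; // extra rule: only same column or to the right
                }
                $ordered["$r,$c"] = [$r, $c]; // string key: visit each cell once
            }
        }
    }
    return array_values($ordered);
}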
If your map is only 10x10, I'd loop through from [0][0], collecting all the possible spaces for the player to move into, then grade the spaces by distance to the current player position. N is small, so the fact that the algorithm is O(N^2) shouldn't affect your performance much.
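A short sketch of this scan-and-grade approach; it assumes 0 marks a free cell and grades by Chebyshev distance (both are assumptions):

<?php
// Scan the whole board, collect free cells, sort them by distance
// to the player's position ($pi, $pj).
function gradedFreeCells(array $board, int $pi, int $pj): array
{
    $cells = [];
    foreach ($board as $i => $row) {
        foreach ($row as $j => $cell) {
            if ($cell === 0) { // free cell the player could move into
                $cells[] = [
                    'pos'  => [$i, $j],
                    'dist' => max(abs($i - $pi), abs($j - $pj)),
                ];
            }
        }
    }
    usort($cells, fn($a, $b) => $a['dist'] <=> $b['dist']);
    return $cells;
}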
Maybe someone with more background in algorithms has something up their sleeve.
Related
Suppose I am writing a program to objectively select a winner in a competition. There are 'n' human judges secretly assigning 1st, 2nd, and 3rd place rankings to the top three candidates from a pool of 'm' candidates.
The program must then go through the judges' decisions and, based on weights assigned to 1st, 2nd, and 3rd place, rate each candidate on the number of 1st, 2nd, and 3rd place votes they received, multiplied by the appropriate weight for each finishing position.
However, at the start, the program has no idea of what weights are appropriate, so I have created an automated "program" that is intended to "discover" the proper weights based on how the judges would pick the winner from a hypothetical situation.
I present a table where the horizontal axis contains the finishing position, and the judges' codes (e.g. Judge W, Judge X, Judge Y, Judge Z). The vertical axis has three rows (1st place, 2nd place, 3rd place), and at the intersection of each Judge/Row, I have randomly generated a candidate ID (from the set A through F).
After rendering the table, I then ask the judge who THEY would have chosen as the winner (the judge has the option to PASS if there is not sufficient information to choose).
After the judges run through an appropriate number of scenarios, I wish to now take the results of the various runs and use that information to determine the "best fit" for the weighting of 1st, 2nd, and 3rd positions.
Let's say one of the hypothetical grids looks like this:
Position | Judge 'W' | Judge 'X' | Judge 'Y' | Judge 'Z'
-------- | --------- | --------- | --------- | ---------
1st      |     A     |     F     |     C     |     B
2nd      |     D     |     B     |     E     |     D
3rd      |     C     |     E     |     B     |     C
and the human judge picks candidate "B" as the winner. My program should react by deriving the comparisons (w1 + w2 + w3) > (w1 + 2*w3) (i.e. B beats C, who took one 1st and two 3rds) and (w1 + w2 + w3) > (2*w2) (i.e. B beats D, who took two 2nds), etc.
From these various algebraic comparisons, accumulated over a number of "hypothetical scenarios", I want to be able to calculate the optimum values for w1, w2 and w3. And then, at some point when there is enough "good" data, I want to use these "discovered" weights to go back over the training data and identify areas where perhaps the human judges were mistaken.
I am using PHP as the programming language and don't know which functions or possible existing libraries are appropriate to solve this kind of "fuzzy" equation.
I'm looking for some direction to help me tackle this problem.
Thank you for your assistance.
For the winning candidate, count how many times they appear in each position, then do the same for all the other candidates. Then write the following formula for each other candidate J:

goodForJ = (w1*nw1 + w2*nw2 + w3*nw3) > (w1*nj1 + w2*nj2 + w3*nj3)

where nw1-3 are the times the winner appears in each position and nj1-3 are the times candidate J appears in each position.
If goodForJ is true for all the candidates, this means that the tuple of weights is good. Now you just have to try a bunch of weight combinations and find out which one fits. Trying all combinations of weights between 1 and 10 requires only 1000 iterations.
To make things a bit fuzzier, for each try you could count how many times goodForJ is true and choose the weights that produce the highest score.
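A hedged PHP sketch of that brute-force search; the $scenarios input format and all names are my assumptions:

<?php
// $scenarios: list of [positionCounts, winnerId] pairs, where positionCounts
// maps candidate => [n1, n2, n3] (times placed 1st/2nd/3rd in that scenario).
function scoreWeights(array $scenarios, int $w1, int $w2, int $w3): int
{
    $score = 0;
    foreach ($scenarios as [$counts, $winner]) {
        [$nw1, $nw2, $nw3] = $counts[$winner];
        $winnerScore = $w1 * $nw1 + $w2 * $nw2 + $w3 * $nw3;
        foreach ($counts as $candidate => [$n1, $n2, $n3]) {
            if ($candidate === $winner) {
                continue;
            }
            // goodForJ: the human-picked winner outscores candidate J
            if ($winnerScore > $w1 * $n1 + $w2 * $n2 + $w3 * $n3) {
                $score++;
            }
        }
    }
    return $score;
}

// Try all 1000 combinations of weights between 1 and 10, keep the best.
function findBestWeights(array $scenarios): array
{
    $best = [1, 1, 1];
    $bestScore = -1;
    for ($w1 = 1; $w1 <= 10; $w1++) {
        for ($w2 = 1; $w2 <= 10; $w2++) {
            for ($w3 = 1; $w3 <= 10; $w3++) {
                $s = scoreWeights($scenarios, $w1, $w2, $w3);
                if ($s > $bestScore) {
                    $bestScore = $s;
                    $best = [$w1, $w2, $w3];
                }
            }
        }
    }
    return $best;
}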
I have a MySQL table with thousands of data points stored in 3 columns R, G, B. How can I find the data point closest to a given point (a,b,c) using Euclidean distance?
I'm saving the RGB values of colors separately in a table, so the values are limited to 0-255 in each column. What I'm trying to do is find the closest color match by finding the color with the smallest Euclidean distance.
I could obviously run through every point in the table to calculate the distance but that wouldn't be efficient enough to scale. Any ideas?
I think the above comments are all true, but, in my humble opinion, they are not answering the original question. (Correct me if I'm wrong.) So let me add my 50 cents:
You are asking for a select statement. Given that your table is called 'colors', that your columns r, g and b are integers ranged 0..255, and that you are looking for the value in your table closest to a given value, say rr, gg, bb, I would dare to try the following:
select min(sqrt((rr-r)*(rr-r)+(gg-g)*(gg-g)+(bb-b)*(bb-b))) from colors;
Now, this answer is given with a lot of caveats, as I am not sure I got your question right, so please confirm whether it's right, or correct me so that I can be of assistance.
Since you're looking for the minimum distance and not the exact distance, you can skip the square root. I think squared Euclidean distance applies here.
You've said the values are bounded 0-255, so you can make an indexed look-up table of squared differences (note that the difference of two channels ranges from -255 to 255).
Here is what I'm thinking in terms of SQL. r0, g0, and b0 represent the target color. The table vector would hold the squared values mentioned above. This solution visits all the records, but the result set can be reduced to one row by sorting and selecting only the first.
select
    c.r, c.g, c.b,
    mR.dist + mG.dist + mB.dist as squared_dist
from
    colors c,
    vector mR,
    vector mG,
    vector mB
where
    c.r - r0 = mR.point and
    c.g - g0 = mG.point and
    c.b - b0 = mB.point
group by
    c.r, c.g, c.b
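For completeness, a hedged PHP sketch of how the vector look-up table above could be populated, one row per possible channel difference with its square (connection details are placeholders):

<?php
// Build the 'vector' table: point = channel difference, dist = its square.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');
$pdo->exec('CREATE TABLE IF NOT EXISTS vector (point INT PRIMARY KEY, dist INT)');
$insert = $pdo->prepare('INSERT INTO vector (point, dist) VALUES (?, ?)');
for ($d = -255; $d <= 255; $d++) {
    $insert->execute([$d, $d * $d]);
}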
The first level of optimization I see is to square the distance to which you want to limit the query, so that you don't need to perform the square root for each row.
The second level of optimization I would encourage is some preprocessing to alleviate the need for extraneous squaring on each query (which could add run time for large tables of RGBs). You'd have to do some benchmarking to see, but by substituting in values for a, b, c, and d and then performing the query, you could take some stress off MySQL.
Note that the performance difference between the last two lines may be negligible. You'll have to use test queries on your system to determine which is faster.
I just re-read and noticed that you are ordering by distance. In that case, the d should be removed and everything moved to one side. You can still plug in the constants to prevent extra processing on MySQL's end.
I believe there are two options.
You either have to, as you say, iterate across the entire set, comparing each point's distance against the best found so far (initialized to a sentinel value such as -1, meaning 'nothing found yet'). This runs in linear time, since you're only comparing one point against every point in the set.
I'm still thinking of another option... something along the lines of doing a breadth-first search outwards from the input point until a point from the set is encountered, but this requires a bit more thought (I imagine the 3D space would have to be pretty heavily populated for this to be more efficient on average, though).
If you run through every point and calculate the distance, don't use the square root function; it isn't necessary. The smallest sum of squares will be enough.
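As a hedged illustration, the same linear scan can be pushed into MySQL, ordering by the squared distance and fetching one row. The 'colors' table is from the question; the connection details and target color are placeholders:

<?php
// Find the closest stored color to a target color, no sqrt needed.
[$targetR, $targetG, $targetB] = [200, 100, 50]; // example target color
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');
$stmt = $pdo->prepare(
    'SELECT r, g, b,
            POW(r - :tr, 2) + POW(g - :tg, 2) + POW(b - :tb, 2) AS squared_dist
     FROM colors
     ORDER BY squared_dist
     LIMIT 1'
);
$stmt->execute([':tr' => $targetR, ':tg' => $targetG, ':tb' => $targetB]);
$closest = $stmt->fetch(PDO::FETCH_ASSOC);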
This is the nearest-neighbour-search problem you are trying to solve. (In the planar case: select all points sorted by an x, y, or z axis, then use PHP to process them.)
MySQL also has spatial extensions, which may offer this as a function. I'm not positive, though.
There have been a couple of questions very close to this topic, but none really helped me.
I've been programming a graphing library, and I need an algorithm to vertically place labels without overlapping. I've been stuck on this for a couple of days now, and managed to distill it to the most basic function:
Given a series of label positions along the Y axis, say 1 1 2 3 5 6 9, and upper and lower limits of 10 and 0 respectively, I need a way to space out the values, producing the output 1 2 3 4 5 6 9.
Similarly, 3 3 3 4 6 7 should become 2 3 4 5 6 7, weighted to be close to the original coordinates.
This should also work backwards: if values are bunched up at the upper end of the scale, they should be spread out as much as possible (before overflowing).
I'm not looking for a definitive answer, but I'd like some help on how to approach this problem. I'm completely stuck.
My last train of thought was to scan all labels for possible collisions and position them as one big block, aligned to the centre of all their Y coordinates. But this will not work if there are multiple sets of collisions.
EDIT: To put this algorithm in a bigger context, have a look at these two google chart API pie charts:
1) Top stacked labels
2) Bottom Stacked Labels
The labels are almost springy, they avoid collisions by joining together and moving their entire mass to the center of their mass.
Make the set of labels unique by inserting them into an ordered set. Divide the difference between the y-axis upper and lower bounds by the number of elements in the set: this is your spacing increment. Iterate over the set in order and place one label per spacing increment.
You didn't say anything about needing to preserve a scale...
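A minimal sketch of that idea, assuming numeric label positions (all names are illustrative):

<?php
// Deduplicate, sort, and re-place one label per spacing increment.
function spaceLabels(array $ys, float $lower, float $upper): array
{
    $unique = array_values(array_unique($ys));
    sort($unique);
    $step = ($upper - $lower) / count($unique);
    $placed = [];
    foreach ($unique as $i => $y) {
        $placed[$y] = $lower + $i * $step; // original value => new position
    }
    return $placed;
}

// spaceLabels([1, 1, 2, 3, 5, 6, 9], 0, 10) places the six unique labels
// at 0, 1.67, 3.33, 5, 6.67 and 8.33 (evenly spaced; scale not preserved).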
Well, after some thought and advice from other sources, I came up with a solution:
Pseudocode:
foreach labels as label
    if label->collidesWith(labels->lowerLimit)
        label->moveAwayFrom(labels->lowerLimit)
    if label->collidesWith(labels->upperLimit)
        label->moveAwayFrom(labels->upperLimit)
    if label->collidesWith(label->previous)
        label->moveAwayFrom(label->previous)
        label->previous->moveAwayFrom(label)
    if label->collidesWith(label->next)
        label->moveAwayFrom(label->next)
        label->next->moveAwayFrom(label)
endforeach
moveAwayFrom moves 1 pixel at a time. When this loop is run multiple times, it rejiggles the labels until none of them collide. (In reality I'm calling this loop 100 times; I haven't figured out a way to do it more intelligently.)
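A hedged PHP rendering of that pseudocode, with labels reduced to sorted y-centres of a fixed pixel height; the label height and the 100-pass cap are assumptions, as in the answer:

<?php
// Nudge overlapping labels apart, 1px per pass, within [lower, upper].
function relaxLabels(array $ys, float $lower, float $upper, float $h = 10, int $passes = 100): array
{
    sort($ys);
    $n = count($ys);
    for ($pass = 0; $pass < $passes; $pass++) {
        for ($i = 0; $i < $n; $i++) {
            if ($ys[$i] - $h / 2 < $lower) $ys[$i]++;   // collides with lower limit
            if ($ys[$i] + $h / 2 > $upper) $ys[$i]--;   // collides with upper limit
            if ($i > 0 && $ys[$i] - $ys[$i - 1] < $h) { // collides with previous
                $ys[$i]++;
                $ys[$i - 1]--;
            }
            if ($i < $n - 1 && $ys[$i + 1] - $ys[$i] < $h) { // collides with next
                $ys[$i]--;
                $ys[$i + 1]++;
            }
        }
    }
    return $ys;
}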
In color schemes, I would like to sort the hues but avoid 'big gaps', i.e. prefer 350, 354, 2, 10, 15 over 2, 10, 15, 350, 354 (when expressed as 0-360 degree values). What's the best approach to doing that (e.g. in PHP)? Is it finding the 'biggest gap' and starting after that? Any better ideas?
Just find the biggest gap and start right after it:
Sort the array.
Find the biggest gap (loop through the array, finding the biggest distance between two neighbours, remembering that the gap between the last and first values wraps around 360).
Move the gap to the beginning (another loop to rotate the numbers; see the sketch below).
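A small PHP sketch of those three steps (names are illustrative; integer hues assumed):

<?php
// Sort hues, find the biggest circular gap, rotate to start just after it.
function sortHues(array $hues): array
{
    sort($hues);
    $n = count($hues);
    $bestGap = -1;
    $start = 0;
    for ($i = 0; $i < $n; $i++) {
        $next = $hues[($i + 1) % $n];
        $gap = ($next - $hues[$i] + 360) % 360; // wraps 360 -> 0
        if ($gap > $bestGap) {
            $bestGap = $gap;
            $start = ($i + 1) % $n; // sequence starts after the biggest gap
        }
    }
    return array_merge(array_slice($hues, $start), array_slice($hues, 0, $start));
}

// sortHues([2, 10, 15, 350, 354]) returns [350, 354, 2, 10, 15]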
If you don't have that many:
Just sort them in order.
Find the variance (modulo 360), i.e. how far out they are from the 'modulo 360 mean'.
Move the first to the end and check the variance again.
After you have tried all of them, choose the rotation with the smallest variance.
This algorithm is O(N^2) in the size of the list.
The main takeaway is that you only have N 'rotations' here. Decide on a 'gappiness' statistic, brute-force it over all N rotations, and use the arrangement that minimizes the 'gappiness'.
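A hedged sketch of that brute-force rotation search, with variance as the 'gappiness' statistic (integer hues assumed):

<?php
// Try all N rotations of the sorted hues, score each by variance of the
// "unwrapped" sequence, keep the best.
function bestRotation(array $hues): array
{
    sort($hues);
    $n = count($hues);
    $best = $hues;
    $bestVar = INF;
    for ($r = 0; $r < $n; $r++) {
        // Unwrap: values past the wrap point get +360 so the sequence is
        // monotone and a plain mean/variance makes sense.
        $rot = [];
        for ($i = 0; $i < $n; $i++) {
            $v = $hues[($r + $i) % $n];
            $rot[] = ($r + $i) >= $n ? $v + 360 : $v;
        }
        $mean = array_sum($rot) / $n;
        $var = 0;
        foreach ($rot as $v) {
            $var += ($v - $mean) ** 2;
        }
        if ($var < $bestVar) {
            $bestVar = $var;
            $best = array_map(fn($v) => $v % 360, $rot);
        }
    }
    return $best;
}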
I would like to implement Latent Semantic Analysis (LSA) in PHP in order to find out topics/tags for texts.
Here is what I think I have to do. Is this correct? How can I code it in PHP? How do I determine which words to choose?
I don't want to use any external libraries. I already have an implementation of the Singular Value Decomposition (SVD).
Extract all words from the given text.
Weight the words/phrases, e.g. with tf–idf. If weighting is too complex, just take the number of occurrences.
Build up a matrix: the columns are some documents from the database (the more the better?), the rows are all unique words, the values are the numbers of occurrences or the weights (see the sketch after this list).
Do the Singular Value Decomposition (SVD).
Use the values in the matrix S (from the SVD) to do the dimension reduction (how?).
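To make step 3 concrete, here is roughly how I picture building the matrix in PHP (raw counts only, no weighting; all names are mine):

<?php
// Build a term-document matrix of raw counts from an array of documents.
function termDocumentMatrix(array $documents): array
{
    $vocab = [];   // word => row index
    $matrix = [];  // rows = unique words, columns = documents
    foreach ($documents as $d => $text) {
        $words = str_word_count(strtolower($text), 1);
        foreach ($words as $w) {
            if (!isset($vocab[$w])) {
                $vocab[$w] = count($vocab);
                $matrix[$vocab[$w]] = array_fill(0, count($documents), 0);
            }
            $matrix[$vocab[$w]][$d]++;
        }
    }
    return [$vocab, $matrix];
}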
I hope you can help me. Thank you very much in advance!
LSA links:
Landauer (co-creator) article on LSA
the R-project lsa user guide
Here is the complete algorithm. If you have SVD, you are most of the way there. The papers above explain it better than I do.
Assumptions:
your SVD function will give the singular values and singular vectors in descending order. If not, you have to do more acrobatics.
M: corpus matrix, w (words) by d (documents) (w rows, d columns). These can be raw counts, or tfidf or whatever. Stopwords may or may not be eliminated, and stemming may happen (Landauer says keep stopwords and don't stem, but yes to tfidf).
U, Sigma, V = singular_value_decomposition(M)

U: w x w matrix
Sigma: min(w,d) length vector, or a w x d matrix with the diagonal filled in the first min(w,d) spots with the singular values
V: d x d matrix

Thus U * Sigma * V = M

# you might have to do some transposes depending on how your SVD code
# returns U and V. Verify this so that you don't go crazy :)
Then the dimensionality reduction... the actual LSA paper suggests a good approximation for the basis is to keep enough vectors such that their singular values are more than 50% of the total of the singular values.
More succinctly... (pseudocode)

s1 = sum(Sigma)
total = 0
for ii in range(len(Sigma)):
    total += Sigma[ii]
    if total > .5 * s1:
        return ii
This will return the rank of the new basis, which was min(d,w) before and which we now approximate with ii.
(Here, ' means prime, i.e. a new reduced matrix, not transpose.)
We create new matrices U', Sigma', and V', with sizes w x ii, ii x ii, and ii x d.
That's the essence of the LSA algorithm.
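A hedged PHP sketch of that truncation step, given an SVD that returns plain nested arrays with descending singular values; the orientations may need the transposes mentioned above, and all names are assumptions:

<?php
// Pick the rank: keep enough singular values to cover 50% of their total.
function chooseRank(array $sigma): int
{
    $half = 0.5 * array_sum($sigma);
    $total = 0;
    foreach ($sigma as $ii => $val) {
        $total += $val;
        if ($total > $half) {
            return $ii + 1; // number of singular values to keep
        }
    }
    return count($sigma);
}

// Slice U, Sigma, V down to rank $k.
function truncateSvd(array $U, array $sigma, array $V, int $k): array
{
    $Uk = array_map(fn(array $row) => array_slice($row, 0, $k), $U); // w x k
    $sigmaK = array_slice($sigma, 0, $k);                            // k values
    $Vk = array_map(fn(array $row) => array_slice($row, 0, $k), $V); // d x k; transpose for ii x d
    return [$Uk, $sigmaK, $Vk];
}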
The resulting matrix U' * Sigma' * V' can be used for 'improved' cosine-similarity searching, or you can pick the top 3 words for each document in it, for example. Whether this yields more than a simple tf-idf is a matter of some debate.
To me, LSA performs poorly on real-world data sets because of polysemy and data sets with too many topics. Its mathematical / probabilistic basis is unsound (it assumes normal-ish (Gaussian) distributions, which don't make sense for word counts).
Your mileage will definitely vary.
Tagging using LSA (one method!)
Construct the U' Sigma' V' dimensionally reduced matrices using SVD and a reduction heuristic
By hand, look over the U' matrix and come up with terms that describe each "topic". For example, if the biggest parts of that vector were "Bronx, Yankees, Manhattan", then "New York City" might be a good term for it. Keep these in an associative array or list. This step should be reasonable, since the number of vectors will be finite.
Assuming you have a vector (v1) of words for a document, v1 * t(U') will give the strongest 'topics' for that document. Select the 3 highest, then give their "topics" as computed in the previous step.
This answer isn't directed at the poster's question, but at the meta question of how to auto-tag news items. The OP mentions Named Entity Recognition, but I believe they mean something more along the lines of auto-tagging. If they really mean NER, then this response is hogwash :)
Given these constraints (600 items / day, 100-200 characters / item) with divergent sources, here are some tagging options:
By hand. An analyst could easily do 600 of these per day, probably in a couple of hours. Something like Amazon's Mechanical Turk, or making users do it, might also be feasible. Having some number of hand-tagged items, even if it's only 50 or 100, will be a good basis for comparing whatever the auto-generated methods below get you.
Dimensionality reduction, using LSA, topic models (Latent Dirichlet Allocation), and the like... I've had really poor luck with LSA on real-world data sets, and I'm unsatisfied with its statistical basis. I find LDA much better; it has an incredible mailing list with the best thinking on how to assign topics to texts.
Simple heuristics... if you have actual news items, then exploit the structure of the news item. Focus on the first sentence, toss out all the common words (stop words), and select the best 3 nouns from the first two sentences. Or heck, take all the nouns in the first sentence and see where that gets you. If the texts are all in English, then do part-of-speech analysis on the whole shebang and see what that gets you. With structured items like news reports, LSA and other order-independent methods (tf-idf) throw out a lot of information.
Good luck!
(if you like this answer, maybe retag the question to fit it)
That all looks right, up to the last step. The usual notation for SVD is that it returns three matrices, A = USV*. S is a diagonal matrix (meaning all zeros off the diagonal) that, in this case, basically gives a measure of how much each dimension captures of the original data. The numbers ("singular values") go down, and you can look for a drop-off to decide how many dimensions are useful. Otherwise, you'll just have to choose an arbitrary number N of dimensions to take.
Here I get a little fuzzy. The coordinates of the terms (words) in the reduced-dimension space are in either U or V, I think depending on whether they are in the rows or columns of the input matrix. Offhand, I think the coordinates for the words will be the rows of U, i.e. the first row of U corresponds to the first row of the input matrix, i.e. the first word. Then you just take the first N columns of that row as the word's coordinates in the reduced space.
HTH
Update:
This process so far doesn't tell you exactly how to pick out tags. I've never heard of anyone using LSI to choose tags (a machine-learning algorithm might be better suited to the task, e.g. decision trees). LSI tells you whether two words are similar. That's a long way from assigning tags.
There are two tasks: (a) what is the set of tags to use? and (b) how do you choose the best three tags? I don't have much of a sense of how LSI will help you answer (a). You can choose the set of tags by hand, but if you're using LSI, the tags probably should be words that occur in the documents. Then for (b), you want to pick out the tags that are closest to words found in the document. You could experiment with a few ways of implementing that, e.g. choose the three tags that are closest to any word in the document, where closeness is measured by the cosine similarity (see Wikipedia) between the tag's coordinates (its row in U) and the word's coordinates (its row in U).
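For reference, a small PHP sketch of that cosine-similarity measure, taking plain arrays as coordinate vectors:

<?php
// Cosine similarity between two equal-length vectors (e.g. rows of U).
function cosineSimilarity(array $a, array $b): float
{
    $dot = 0.0;
    $na = 0.0;
    $nb = 0.0;
    foreach ($a as $i => $v) {
        $dot += $v * $b[$i];
        $na  += $v * $v;
        $nb  += $b[$i] * $b[$i];
    }
    return $dot / (sqrt($na) * sqrt($nb));
}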
There is an additional SO thread on the perils of doing this all in PHP at link text.
Specifically, there is a link there to this paper on Latent Semantic Mapping, which describes how to get the resultant "topics" for a text.