All possible combinations from sets - php

I have a set of numbers:
1,22
1,46
32,1
1,9
32,22
1,14
1,45
1,33
33,22
45,22
32,46
32,9
3,1
3,9
3,22
3,32
3,46
9,22
46,22
46,45
46,33
15,1
15,46
15,6
15,22
15,3
15,9
15,45
15,33
15,32
15,14
I need to build combinations from them with the rule that a new pair can only be appended if its first number is the same as the last number of the previous pair.
For example, if I have the pair {15,1}, the next one can only be {1,46} and the next {46,45}, and the final pair must end with the first number of the whole set. In this case it could be, for example, {45,1}.
So the end result with a limit of 4 pairs would be
{15,1,1,46,46,45,45,1}
I can do basic power sets and generate all possible combinations from set of numbers but this seems to be too advanced for me.
I can do C, Javascript or PHP so all the help or solutions to this are highly appreciated. And for clarification, this is not a homework, this is just something I would like to learn and understand.

This looks as if some graph data structure, and some graph algorithms, would be appropriate. Your graph would comprise nodes (each of which is a number) and edges (each of which represents one of your pairs). Then write the appropriate routine for walking round the graph. It's not entirely clear from your question what the rules for the walk are, but I guess you know.
EDIT
Of course, I should point out that what you have is already a graph data structure: a list of pairs like this is called an edge list, and grouping the pairs by their first number turns it into an adjacency list. Google around for algorithms and representations.
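As a rough illustration of the graph idea, here is a minimal PHP sketch. It assumes the pairs are directed ("from,to"), which the question does not state, and because the closing rule is ambiguous the required end number is passed in explicitly. The function names buildAdjacency and findChains are made up for this example, and the sample pairs are a small subset of the question's list with 45,1 added so that the example chain can close.

```php
<?php
// Build an adjacency list: number => list of numbers reachable through one pair.
// Assumption: pairs are DIRECTED, i.e. [1, 45] does not also allow 45 -> 1.
function buildAdjacency(array $pairs): array
{
    $adjacency = [];
    foreach ($pairs as [$from, $to]) {
        $adjacency[$from][] = $to;
    }
    return $adjacency;
}

// Return every chain of exactly $limit pairs that starts at $start and ends at $end.
function findChains(array $adjacency, int $start, int $end, int $limit): array
{
    $chains = [];
    $walk = function (int $node, array $chain) use (&$walk, &$chains, $adjacency, $end, $limit): void {
        if (count($chain) === $limit) {
            if ($node === $end) {
                $chains[] = $chain; // chain has the right length and the right last number
            }
            return;
        }
        foreach ($adjacency[$node] ?? [] as $next) {
            $walk($next, array_merge($chain, [[$node, $next]]));
        }
    };
    $walk($start, []);
    return $chains;
}

// Hypothetical sample data, see the note above.
$pairs = [[15, 1], [1, 46], [46, 45], [45, 1], [46, 22], [1, 22]];
$chains = findChains(buildAdjacency($pairs), 15, 1, 4);
foreach ($chains as $chain) {
    echo '{' . implode(',', array_merge(...$chain)) . "}\n";
}
```

With this sample data the only chain found is {15,1,1,46,46,45,45,1}. Note that the walk may revisit numbers; if each pair should be used at most once, the recursion would also need to track which pairs are already in the chain.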

Related

Recursive array analysis (math program)

I'm making a trigonometry tool in PHP that can analyze how best to solve a given problem.
For instance, I know angle A and sides b and c, and I need the computer to calculate which formulas to use in which order to make the best mathematical solution to finding an unknown value.
Right now, I have created an array with numerous options on how to find the unknown value:
Image of array
The array is made like this:
[formula: xxx] is a suggestion on what formula to use to find a previous value.
[target: xxx] is the name of the value we're looking for in order to satisfy the need of the previous formula.
There might be more than one target attached to a formula, because a formula might need more than one information in order to be complete.
At the end of each path there is an array showing a mathematical function which can be solved with the information we have, and if you track backwards from there you have enough information to solve the task which was initially given (which was angle B (cannot be seen on the image, but it is))
So, out of all of these solution options I need to find the shortest solution, the solution which requires the fewest steps.
Keep in mind that a formula might have two unknown variables, which means that we need to calculate the total sum of the shortest paths and end up with an array containing the optimal path.
Bonus question: I'm aware that this only solves the problem using the information provided by the user. Maybe at some point along the way we find a variable that is needed at some other point. Maybe, if a function needs two variables, it is faster to find A before B, because A calculates a "target" which might be needed in B.
I would really like to have a solution for the first part, but I will need to solve the "bonus question" later on.

PHP matrix splitting algorithm

I have the following issue:
Given a square matrix (n×n) I want to be able to split it into a certain number of distinct areas. I did the splitting by diagonals but I'm not happy with the result. I want to be able to split the matrix based on at least 2 functions (chosen randomly). The problem is that I don't have any idea how to implement such splitting. Any suggestion is welcome. Please note that the small parts will have to be recombined later on in the application.

PHP library for word clustering/NLP?

What I am trying to implement is a rather trivial "take search results (as in title & short description), cluster them into meaningful named groups" program in PHP.
After hours of googling and countless searches on SO (yielding interesting results as always, albeit nothing really useful) I'm still unable to find any PHP library that would help me handle clustering.
Is there such a PHP library out there that I might have missed?
If not, is there any FOSS that handles clustering and has a decent API?
Like this:
Use a list of stopwords, get all words or phrases not in the stopwords, count occurrences of each, sort in descending order.
The stopword list needs to contain all common English terms. It should also include punctuation, and you will need to preg_replace all the punctuation to be a separate word first, e.g. "Something, like this." -> "Something , like this ." OR, you can just remove all punctuation.
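$content = strtolower($content);                      // the pattern below only keeps a-z
$content = preg_replace('/[^a-z\s]/', '', $content);  // remove punctuation
$words = preg_split('/\s+/', $content, -1, PREG_SPLIT_NO_EMPTY); // split the text into words
$stopwords = 'the|and|is|your|me|for|where|etc...';
$stopwords = explode('|', $stopwords);
$stopwords = array_flip($stopwords);                  // words become keys for fast isset() lookups
$result = array(); $temp = array();
foreach ($words as $s)
{
    if (isset($stopwords[$s]) OR strlen($s) < 3)
    {
        // a stopword or very short word ends the current phrase
        if (sizeof($temp) > 0)
        {
            $result[] = implode(' ', $temp);
            $temp = array();
        }
    } else $temp[] = $s;
}
if (sizeof($temp) > 0) $result[] = implode(' ', $temp); // flush the last phrase
$phrases = array_count_values($result);               // phrase => number of occurrences
arsort($phrases);                                     // most frequent first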
Now you have an associative array in order of the frequency of terms that occur in your input data.
How you want to do the matches depends upon you, and it depends largely on the length of the strings in the input data.
I would see if any of the top 3 array keys match any of the top 3 from any other in the data. These are then your groups.
Let me know if you have any trouble with this.
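The "top 3 keys" matching described in this answer could be sketched roughly as follows. This is only an illustration under assumptions: the function names topPhrases and groupByTopPhrases, the tiny stopword list, and the sample texts are all made up for the example, and phrases are matched by exact equality.

```php
<?php
// Return the $n most frequent non-stopword phrases of one text.
// $stopwords must have the words as KEYS (e.g. built with array_flip).
function topPhrases(string $text, array $stopwords, int $n = 3): array
{
    $words = preg_split('/[^a-z]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $phrases = [];
    $current = [];
    foreach ($words as $word) {
        if (isset($stopwords[$word]) || strlen($word) < 3) {
            if ($current) { // a stopword ends the current phrase
                $phrases[] = implode(' ', $current);
                $current = [];
            }
        } else {
            $current[] = $word;
        }
    }
    if ($current) {
        $phrases[] = implode(' ', $current);
    }
    $counts = array_count_values($phrases);
    arsort($counts);
    return array_slice(array_keys($counts), 0, $n);
}

// Map each top phrase to the ids of the texts sharing it; phrases seen in one text only are dropped.
function groupByTopPhrases(array $texts, array $stopwords): array
{
    $groups = [];
    foreach ($texts as $id => $text) {
        foreach (topPhrases($text, $stopwords) as $phrase) {
            $groups[$phrase][] = $id;
        }
    }
    return array_filter($groups, fn($ids) => count($ids) > 1);
}

// Hypothetical sample data.
$stopwords = array_flip(['the', 'and', 'for', 'with', 'from']);
$texts = [
    'a' => 'Cheap flights and cheap flights for the summer',
    'b' => 'The airline with cheap flights',
    'c' => 'Pasta recipes from Italy',
];
$groups = groupByTopPhrases($texts, $stopwords);
print_r($groups);
```

Here texts a and b end up in one group named "cheap flights", while c stays ungrouped. Exact phrase equality is strict ("book cheap flights" would not match "cheap flights"), so real data would probably need a looser comparison.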
"... cluster them into meaningful groups" is a bit too vague; you'll need to be more specific.
For starters you could look into K-Means clustering.
Have a look at this page and website:
PHP/ir: Information Retrieval and other interesting topics
EDIT: You could try some data mining yourself by cross referencing search results with something like the open directory dmoz RDF data dump and then enumerate the matching categories.
EDIT2: And here is a dmoz/category question that also mentions "Faceted Search"!
Dmoz/Monster algorithme to calculate count of each category and sub category?
If you're doing this for English only, you could use WordNet: http://wordnet.princeton.edu/. It's a lexicon widely used in research which provides, among other things, sets of synonyms for English words. The shortest distance between two words could then serve as a similarity metric to do clustering yourself as zaf proposed.
Apparently there is a PHP interface to WordNet here: http://www.foxsurfer.com/wordnet/. It came up in this question: How to use word Net with php, but I have not tried it. However, interfacing with a command line tool from PHP yourself is feasible as well.
You could also have a look at Programming Collective Intelligence (Chapter 3 : Discovering Groups) by Toby Segaran which goes through just this use case using Python. However, you should be able to implement things in PHP once you understand how it works.
Even though it is not PHP, the Carrot2 project offers several clustering engines and can be integrated with Solr.
This may be way off but check out OpenCalais. They have a web service which allows you to pass a block of text in and it will pass you back a parseable response of things that it found in the text, such as places, people, facts etc. You could use these categories to build your "clouds" and to choose which results to display.
I've used this library a few times in PHP and it's always been quite easy to work with.
Again, this might not be relevant to what you're trying to do. Maybe you could post an example of what you're trying to accomplish?
If you can pre-define the filters for your faceted search (the named groups) then it will be much easier.
Rather than relying on an algorithm that uses the current searcher's input and their particular results to generate the filter list, you would use an aggregate of the most commonly performed searches by all users and then tag results with them if they match.
You would end up with a table (or something) of URLs in a many-to-many join to a table of tags, so each result url could have several appropriate tags.
When the user searches, you simply match their search against the full index. But for the filters, you take the top results from among the current resultset.
I'll work on query examples if you want.

Pattern matching for people who dont know algorithms - finding adjacent X's in a grid

I'm wondering what the best method would be for me to approach a problem where I need to find adjacent (horizontal, vertical, diagonal) X's in a grid which is provided.
I wanted to know what the recursive way and the nonrecursive way would be. I tried a recursive method of checking each column, and then iterating rows. That gives me X's in one direction. Should I write separate recursive functions for the other directions?
Example grid:
XXX0X
0000X
00X00
XXXX0
0000X
output should be :
(0,0),(1,0),(2,0)
(4,0),(4,1)
(2,2),(0,3),(1,3),(2,3),(3,3)
You may want to check out the Flood Fill algorithm. You can find it on Wikipedia.
I think what you're describing is more or less it. What you do is basically:
For a given position:
    If it is of the desired color (in your case 'O'):
        mark it (say, re-color it to a color 'M'),
        recurse on all desirable directions (run the same algorithm
        on new positions, which are +/-1 away);
    else
        do nothing.
In your case, the result are the positions marked 'M'. If you want to find additional adjacencies, you can always reset the ones marked 'M' and start the algorithm on a different position.
EDIT: According to your examples, it seems you're looking for adjacent 'X's. :)
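A minimal recursive flood fill for the example grid might look like the sketch below. The function names fill and findGroups are made up for this example, coordinates are reported as (column,row) to match the question's output, and adjacency includes diagonals as the question asks.

```php
<?php
// Recursively collect the group of adjacent 'X' cells that contains ($x, $y).
function fill(array &$grid, int $x, int $y, array &$group): void
{
    if (!isset($grid[$y][$x]) || $grid[$y][$x] !== 'X') {
        return; // off the grid, not an X, or already marked
    }
    $grid[$y][$x] = 'M'; // mark the cell so it is not visited twice
    $group[] = [$x, $y];
    for ($dy = -1; $dy <= 1; $dy++) {
        for ($dx = -1; $dx <= 1; $dx++) {
            if ($dx !== 0 || $dy !== 0) {
                fill($grid, $x + $dx, $y + $dy, $group); // all 8 neighbours
            }
        }
    }
}

// Scan the grid and start a fill at every cell; only unmarked X's produce a group.
function findGroups(array $rows): array
{
    $grid = array_map('str_split', $rows); // each row becomes an array of characters
    $groups = [];
    foreach (array_keys($grid) as $y) {
        foreach (array_keys($grid[$y]) as $x) {
            $group = [];
            fill($grid, $x, $y, $group);
            if ($group) {
                $groups[] = $group;
            }
        }
    }
    return $groups;
}

$groups = findGroups(['XXX0X', '0000X', '00X00', 'XXXX0', '0000X']);
foreach ($groups as $group) {
    echo implode(',', array_map(fn($c) => "({$c[0]},{$c[1]})", $group)) . "\n";
}
```

One function handles every direction, so separate recursive functions per direction are not needed. Because diagonals count as adjacent here, the X at (4,4) joins the third group through (3,3), which differs from the output listed in the question. A nonrecursive variant would replace the recursion with an explicit stack or queue of positions still to visit.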

PHP Detect Pages Genre/Category

I was wondering if there was any sort of way to detect a page's genre/category.
Possibly there is a way to find keywords or something?
Unfortunately I don't have any idea so far, so I don't have any code to show you.
But if anybody has any ideas at all, let me know.
Thanks!
EDIT #Nican
Perhaps there is a way to set, let's say, 10 categories (Entertainment, Funny, Tech).
Then create keywords for these categories (Funny = Laughter, Funny, Joke etc).
Then search through a webpage (maybe using cURL) for these keywords and assign it to the right category.
Hope that makes sense.
What you are talking about is basically what Google Adsense and similar services do, and it's based on analyzing the content of a page and matching it to topics. Generally, this kind of stuff is beyond what you would call simple programming / development and would require significant resources to be invested to get it to work "right".
A basic system might work along the following lines:
Get page content
Get X most commonly used words (omitting stuff like "and" "or" etc.)
Get words used in headings
Assign weights to different words according to a set of factors (is used in heading, is used in more than one paragraph, is used in link anchors)
Match the filtered words against a database of words related to a specific "category"
If cumulative score > threshold, classify site as belonging to category
Rinse and repeat
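The scoring part of the steps above can be sketched in a few lines of PHP. This is a toy version under assumptions: the category keywords and weights are invented, the text is assumed to be already fetched and stripped of markup, and heading or link weighting is left out. The function name classify is made up for this example.

```php
<?php
// Hypothetical category => keyword weights; a real system would keep these in a database.
$categories = [
    'Funny' => ['joke' => 2, 'laughter' => 2, 'funny' => 3],
    'Tech'  => ['php' => 3, 'server' => 2, 'code' => 2],
];

// Score plain text against every category and keep those whose score exceeds $threshold.
function classify(string $text, array $categories, int $threshold): array
{
    $words = preg_split('/[^a-z]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $counts = array_count_values($words); // word => number of occurrences
    $matches = [];
    foreach ($categories as $name => $keywords) {
        $score = 0;
        foreach ($keywords as $word => $weight) {
            $score += ($counts[$word] ?? 0) * $weight;
        }
        if ($score > $threshold) {
            $matches[$name] = $score;
        }
    }
    arsort($matches); // best match first
    return $matches;
}

$page = 'A funny joke about PHP: the server told a joke and nobody got it.';
$result = classify($page, $categories, 6);
print_r($result);
```

For this sample text "Funny" scores 7 (two jokes plus one "funny") and passes the threshold of 6, while "Tech" scores 5 and is dropped. Choosing the weights and the threshold is the hard part and would need tuning against real pages.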
Folksonomy may be a way of accomplishing what you're looking for:
http://en.wikipedia.org/wiki/Folksonomy
For instance, in Drupal they have a Folksonomy module:
http://drupal.org/node/19697 (Note this module appears to be dead, see http://drupal.org/taxonomy/term/71)
Couple that with a tag cloud generator, and you may get somewhere:
http://drupal.org/project/searchcloud
Plus, a little more complexity may be able to derive mapped relationships to other terms, especially if you control the structure of the tagging options.
http://intranetblog.blogware.com/blog/_archives/2008/5/22/3707044.html
EDIT
In general, the type of system you're trying to build relies on unique word values on a page. So you would need to...
Get unique word values from your content (index values or create a bot to crawl your site)
Remove all words and symbols you can't use (at, the, or, and, etc...)
Count the number of times the unique words appear on the page
Add them to some type of datastore so you can call them based on the relationships you're mapping
If you have a root label system in place, associate those values with the word counts on the page (such as a query or derived table)
This is very general, and there are a number of ways this can be implemented/interpreted. Folksonomies are meant to "crowdsource" much of the effort for you, in a "natural way", as long as you have a user base that will contribute.
