I want to find Euclidean distance to check similarity of strings.
From above image in a painting object field there are many image types in database. Images is displaying using this paining_object field. Now i want to show related images of one selected image by comparing strings from paining_object field. So i have used Euclidean distance method to find similarities of strings.
But i am facing issue with length. For ex. In first row from database there are four image types in paining_object field and in the second row there are more than four image types. So, how could i measure distance with this method for the arrays having unequal length.
non euclidean distances
The distance between two unordered arrays can be rephrased as distance between sets.
A quick lookup shows there exists several distances representing the similarity between sets such as
the Jaccard distance
d(a,b) = |a inter b| / |a union b|
the maximum difference metric
d(a,b) = 1 - |a inter b| / max(|a|, |b|)
there are more distances (for instance) on the paper Distances between sets on set commonality
still euclidean distance
You can still force it:
Get all your mangas as a vocabulary V, say size n.
Consider the set R^n.
A row of your table can be represented as a vector v of R^n:
if the row contains word i, put v[i] = 1, v[i]=0 otherwise
Finally the euclidean distance can trivially be applied on the vectors of same length.
distance thus be like
d(a,b) = || v_b - v_a ||_2 = sqrt( (v_b[0] - v_a[0])^2 + ... + (v_b[n-1] - v_a[n-1)^2)
Every square is equal to 1 iff v_b[i]!=v_a[i] that is you want to count the elements in a not in b U b not in a idem the symmetric difference of a and b.
You can thus rewrite your distance:
d(a,b) = sqrt(|a Δ b|)
I have did this by using Jaccard distance as below. First created two tables for unique object from where we can collect object from id and second where all object comes together seperated from (,)
1) image_sub_main Table
2) image_main Table
3) PHP file as Wordpress Way
global $wpdb;
$post_id = $wpdb->get_results("SELECT * FROM `image_main`");
$i=1;
$finimgarray = array();
$aa = array();
$bb = array();
$firstarray = array('similarity' =>100 , 'id' => $post_id[0]->id );
foreach($post_id as $key => $post){
if($i < count($post_id)){
$arraya =$post_id[0]->image_types;
$a = explode(",",$arraya);
$arrayb =$post_id[$i]->image_types;
$b = explode(",",$arrayb);
$array = array_unique (array_merge ($a, $b));
$result=array_intersect($a,$b);
$finalres = count($result) / count($array)*100 ;
$finimgarray[] = array('similarity' =>round($finalres, 2) , 'id' => $post_id[$i]->id );
}
$i++;
}
array_push($finimgarray, $firstarray);
arsort($finimgarray);
foreach($finimgarray as $findimgarr){
$id = $findimgarr['id'];
$image = $wpdb->get_row("SELECT * FROM `image_main` WHERE `id` = $id ");
echo "<img src='$image->image'/>";
}
Your output will compare images to first one image and show according to similarity %
We cannot apply Euclidean Distance here because:
The array lengths can be different
The order of strings should not be considered. For example, hellsing
can be at any index in the array. So, we should not compare the first element of the first array with the first element of the second array only.
Instead, we can define a similarity function which takes care of both the above problems - we can use the ratio of number of string matches to the total number of combinations as the similarity score.
// Assuming $firstArr and $secondArr are sets, i.e., don't contain duplicates
function similarityScore($firstArr, $secondArr) {
$matchCount = 0;
foreach ($firstArr as $first) {
foreach($secondArr as $second) {
if ($first == $second) {
$matchCount++;
}
}
}
return $matchCount/(count($firstArr)*count($secondArr));
}
This function returns a real number in the range [0,1] where higher value indicates greater similarity.
Related
Haven't programmed in a while and here I am with this simple problem, basically I need to generate two random numbers that are within a maximum limit, but with set distance apart from each other in order to randomly (each time) call fields from the database using the mysql LIMIT.
ex. say there are 95 database fields ($max), fields returned = 20 ($fields) (number of fields that should be returned) or range limit 20 (ie difference between min - max is 20)
SELECT............... LIMIT $a, $fields
$a - $b = 20
$b < 95
I just use ...
$a = floor(mt_rand(0, ($max-$fields)));
to generate an $a and use the $fields value which gives me all the info I need (ie from where to start getting the records ($a) and how many records to get ($fields))
Any ideas on another way on how to do this?
If your table is always bigger then the required distance of the two random numbers you can go about it like this:
// $row_count is the number of rows you have in the database
$max = mt_rand($distance, $row_count);
$min = $max - $distance;
// now you can use these in sql like
$sql = "select ... offset {$min} limit {$distance}";
References: mt_rand
In a browser game we have items that occur based on their probabilities.
P(i1) = 0.8
P(i2) = 0.45
P(i3) = 0.33
P(i4) = 0.01
How do we implement a function in php that returns a random item based on its probability chance?
edit
The items have a property called rarity which varies from 1 to 100 and represents the probability to occcur. The item that occurs is chosen from a set of all items of a certain type. (e.x the given example above represents all artifacts tier 1)
I don't know if its the best solution but when I had to solve this a while back this is what I found:
Function taken from this blog post:
// Given an array of values, and weights for those values (any positive int)
// it will select a value randomly as often as the given weight allows.
// for example:
// values(A, B, C, D)
// weights(30, 50, 100, 25)
// Given these values C should come out twice as often as B, and 4 times as often as D.
function weighted_random($values, $weights){
$count = count($values);
$i = 0;
$n = 0;
$num = mt_rand(0, array_sum($weights));
while($i < $count){
$n += $weights[$i];
if($n >= $num){
break;
}
$i++;
}
return $values[$i];
}
Example call:
$values = array('A','B','C');
$weights = array(1,50,100);
$weighted_value = weighted_random($values, $weights);
It's somewhat unwieldy as obviously the values and weights need to be supplied separately but this could probably be refactored to suit your needs.
Tried to understand how Bulk's function works, and here is how I understand based on Benjamin Kloster answer:
https://softwareengineering.stackexchange.com/questions/150616/return-random-list-item-by-its-weight
Generate a random number n in the range of 0 to sum(weights), in this case $num so lets say from this: weights(30, 50, 100, 25).
Sum is 205.
Now $num has to be 0-30 to get A,
30-80 to get B
80-180 to get C
and 180-205 to get D
While loop finds in which interval the $num falls.
I have a fully-populated array of values, and I would like to arbitrarily remove elements from this array with more removed towards the far end.
For example, given input ( where a . signifies a populated index )
............................................
I would like something like
....... . ... .. . . .. . .
My first thought was to count the elements, then iterate over the array generating a random number somewhere between the current index and the total size of the array, eg:
if ( mt_rand( 0, $total ) > $total - $current_index )
//remove this element
however, as this entails making a random number each time the loop goes round it becomes very arduous.
Is there a better way of doing this?
One easy way is to flip a weighted coin for each entry with coin flips more weighted towards the end. For example, if the array is size n, for each entry you could choose a random number from 0 to n-1 and only keep the value if the index is less than or equal to the random number. (That is, keep each entry with probability 1 - index/total.) This has the nice advantage that if you're going to be compacting your array anyways, and you're using a good enough but efficient random number generator (could be a simple integer hash over a nonce), it's going to be rather fast for memory access.
On the other hand if you're only blanking out a few items and aren't rearranging the array, you can go with some sort of weighted random number generator that more often chooses numbers that are toward the end of the index. For example, if you have a random number generator that generates floats in the value of [0,1] (closed or open bounds not mattering that much likely), consider obtaining such a random float r and squaring it. This will tend to prefer lower values. You can fix this by flipping it around: 1-r^2. Of course, you need this to be in your index range of 0 to n - 1, so take floor(n * (1 - r^2)) and also round n down to n-1.
There's practically an infinite number of variations on both of these techniques.
This is quite probably not the best/most efficient way to do this, but it is the best I can come up with and it does work.
N.B. the codepad example takes a long time to execute, but this is because of the pretty-print loop I added to the end so you can see it visibly working. If you remove the inner loop, execution time drops to acceptable levels.
<?php
$array = range(0, 99);
for ($i = 0, $count = count($array); $i < $count; $i++) {
// Get array keys
$keys = array_keys($array);
// Get a random number between 0 and count($keys) - 1
$rand = mt_rand(0, count($keys) - 1);
// Cut $rand elements off the beginning of the keys
$keys = array_slice($keys, $rand);
// Unset a random key from the remaining keys
unset($array[$keys[array_rand($keys)]]);
}
This method isn't random- it works by you defining a function, and its inverse. Different functions, with different constant coefficients will have different distribution characteristics.
The results are very pattern like, as expected when mapping a continuous function to a discrete structure like an array.
Here's an example using a quadratic function. You could try varying the constant.
demo: http://codepad.org/ojU3s9xM
#as in y = x^2 / 7;
function y($x) {
return $x * $x / 7;
}
function x($y) {
return 7 * sqrt($y);
}
$theArray = range(0,100);
$size = count($theArray);
//use func inverse to find the max value we can input to $y() without going out of array bounds
$maximumX = x($size);
for ($i=0; $i<$maximumX; $i++) {
$index = (int) y($i);
//unset the index if it still exists, else, the next greatest index
while (!isset($theArray[$index]) && $index < $size) {
$index++;
}
unset($theArray[$index]);
}
for ($i=0; $i<$size; $i++) {
printf("[%-3s]", isset($theArray[$i]) ? $theArray[$i] : '');
}
I am working on a Web Application that includes long listings of names. The client originally wanted to have the names split up into divs by letter so it is easy to jump to a particular name on the list.
Now, looking at the list, the client pointed out several letters that have only one or two names associated with them. He now wants to know if we can combine several consecutive letters if there are only a few names in each.
(Note that letters with no names are not displayed at all.)
What I do right now is have the database server return a sorted list, then keep a variable containing the current character. I run through the list of names, incrementing the character and printing the opening and closing div and ul tags as I get to each letter. I know how to adapt this code to combine some letters, however, the one thing I'm not sure about how to handle is whether a particular combination of letters is the best-possible one. In other words, say that I have:
A - 12 names
B - 2 names
C - 1 name
D - 1 name
E - 1 name
F - 23 names
I know how to end up with a group A-C and then have D by itself. What I'm looking for is an efficient way to realize that A should be by itself and then B-D should be together.
I am not really sure where to start looking at this.
If it makes any difference, this code will be used in a Kohana Framework module.
UPDATE 2012-04-04:
Here is a clarification of what I need:
Say the minimum number of items I want in a group is 30. Now say that letter A has 25 items, letters B, C, and D, have 10 items each, and letter E has 32 items. I want to leave A alone because it will be better to combine B+C+D. The simple way to combine them is A+B, C+D+E - which is not what I want.
In other words, I need the best fit that comes closest to the minimum per group.
If a letter contains more than 10 names, or whatever reasonable limit you set, do not combine it with the next one. However, if you start combining letters, you might have it run until 15 or so names are collected if you want, as long as no individual letter has more than 10. That's not a universal solution, but it's how I'd solve it.
I came up with this function using PHP.
It groups letters that combined have over $ammount names in it.
function split_by_initials($names,$ammount,$tollerance = 0) {
$total = count($names);
foreach($names as $name) {
$filtered[$name[0]][] = $name;
}
$count = 0;
$key = '';
$temp = array();
foreach ($filtered as $initial => $split) {
$count += count($split);
$temp = array_merge($split,$temp);
$key .= $initial.'-';
if ($count >= $ammount || $count >= $ammount - $tollerance) {
$result[$key] = $temp;
$count = 0;
$key = '';
$temp = array();
}
}
return $result;
}
the 3rd parameter is used for when you want to limit the group to a single letter that doesn't have the ammount specified but is close enough.
Something like
i want to split in groups of 30
but a has 25
to so, if you set a tollerance of 5, A will be left alone and the other letters will be grouped.
I forgot to mention but it returns a multi dimensional array with the letters it contains as key then the names it contains.
Something like
Array
(
[A-B-C-] => Array
(
[0] => Bandice Bergen
[1] => Arey Lowell
[2] => Carmen Miranda
)
)
It is not exactly what you needed but i think it's close enough.
Using the jsfiddle that mrsherman put, I came up with something that could work: http://jsfiddle.net/F2Ahh/
Obviously that is to be used as a pseudocode, some techniques to make it more efficient could be applied. But that gets the job done.
Javascrip Version: enhanced version with sort and symbols grouping
function group_by_initials(names,ammount,tollerance) {
tolerance=tollerance||0;
total = names.length;
var filtered={}
var result={};
$.each(names,function(key,value){
val=value.trim();
var pattern = /[a-zA-Z0-9&_\.-]/
if(val[0].match(pattern)) {
intial=val[0];
}
else
{
intial='sym';
}
if(!(intial in filtered))
filtered[intial]=[];
filtered[intial].push(val);
})
var count = 0;
var key = '';
var temp = [];
$.each(Object.keys(filtered).sort(),function(ky,value){
count += filtered[value].length;
temp = temp.concat(filtered[value])
key += value+'-';
if (count >= ammount || count >= ammount - tollerance) {
key = key.substring(0, key.length - 1);
result[key] = temp;
count = 0;
key = '';
temp = [];
}
})
return result;
}
Background;
to create a dropdown menu for a fun gambling game (Students can 'bet' how much that they are right) within a form.
Variables;
$balance
Students begin with £3 and play on the £10 table
$table(there is a;
£10 table, with a range of 1,2,3 etc to 10.
£100 table with a range of 10,20,30 etc to 100.
£1,000 table with a range of 100, 200, 300, 400 etc to 1000.)
I have assigned $table to equal number of zeros on max value,
eg $table = 2; for the £100 table
Limitations;
I only want the drop down menu to offer the highest 12 possible values (this could include the table below -IMP!).
Students are NOT automatically allowed to play on the 'next' table.
resources;
an array of possible values;
$a = array(1,2,3,4,5,6,7,8,9,10,20,30,40,50,60,70,80,90,10,20,30,40,50,60,70,80,90,100,200,300,400,500,600,700,800,900,1000);
I can write a way to restrict the array by table;
(the maximum key for any table is (9*$table) )//hence why i use the zeroes above (the real game goes to $1 billion!)
$arrayMaxPos = (9*$table);
$maxbyTable = array_slice($a, 0, $arrayMaxPos);
Now I need a way to make sure no VALUE in the $maxbyTable is greater than $balance.
to create a $maxBet array of all allowed bets.
THIS IS WHERE I'M STUCK!
(I would then perform "array_slice($maxBet, -12);" to present only the highest 12 in the dropdown)
EDIT - I'd prefer to NOT have to use array walk because it seems unnecessary when I know where i want the array to end.
SECOND EDIT Apologies I realised that there is a way to mathematically ascertain which KEY maps to the highest possible bid.
It would be as follows
$integerLength = strlen($balance);//number of digits in $balance
$firstDigit = substr($balance, 0, 1);
then with some trickery because of this particular pattern
$maxKeyValue = (($integerlength*9) - 10 + $firstDigit);
So for example;
$balance = 792;
$maxKeyValue = ((3*9) - 10 + 7);// (key[24] = 700)
This though works on this problem and does not solve my programming problem.
Optional!
First of all, assuming the same rule applies, you don't need the $a array to know what prices are allowed on table $n
$table = $n; //$n being an integer
for ($i = 1; $i <= 10; $i++) {
$a[] = $i * pow(10, $n);
}
Will generate a perfectly valid array (where table #1 is 1-10, table #2 is 10-100 etc).
As for slicing it according to value, use a foreach loop and generate a new array, then stop when you hit the limit.
foreach ($a as $value) {
if ($value > $balance) { break; }
$allowedByTable[] = $value;
}
This will leave you with an array $allowedByTable that only has the possible bets which are lower then the user's current balance.
Important note
Even though you set what you think is right as options, never trust the user input and always validate the input on the server side. It's fairly trivial for someone to change the value in the combobox using DOM manipulation and bet on sums he's not supposed to have. Always check that the input you're getting is what you expect it to be!