Price Filter Grouping Algorithm - php

I am creating an ecommerce site, and I am having trouble developing a good algorithm to sort products pulled from the database into reasonably appropriate groups. I have tried simply dividing the highest price by 4 and basing each group on that. I also tried standard deviations around the mean. Both can produce price ranges that no product falls into, which isn't a useful filtering option.
I also tried taking quartiles of the products, but my problem is that prices range from $1 to $4,000. The $4,000 items almost never sell and are far less important, but they keep skewing my results.
Any thoughts? I should have paid more attention in stats class ...
Update:
I ended up combining methods a bit. I used the quartile/bucket method, but hacked it a bit by hardcoding certain ranges within which a greater number of price groups would appear.
//Price range algorithm
sort($prices);
//Divide the number of prices into four groups
$quartilelength = count($prices)/4;
//Round to the nearest ...
$simplifier = 10;
//Get the total range of the prices
$range = max($prices)-min($prices);
//Assuming we actually are working with multiple prices
if ($range>0 )
{
    // If there is a decent spread in price, and there are a decent number of prices, give more price groups
    if ($range>20 && count($prices) > 10)
    {
        $priceranges[0] = floor($prices[floor($quartilelength)]/$simplifier)*$simplifier;
    }
    // Always grab the median price
    $priceranges[1] = floor($prices[floor($quartilelength*2)]/$simplifier)*$simplifier;
    // If there is a decent spread in price, and there are a decent number of prices, give more price groups
    if ($range>20 && count($prices) > 10)
    {
        $priceranges[2] = floor($prices[floor($quartilelength*3)]/$simplifier)*$simplifier;
    }
}

Here is an idea: sort the prices into buckets of 10 distinct price points each. Each price is a key in the bucket's array, and the value is a count of how many products are at that price point:
function priceBuckets($prices)
{
    sort($prices);
    $buckets = array(array());
    $a = 0;
    $c = count($prices);
    for($i = 0; $i !== $c; ++$i) {
        // start a new bucket once the current one holds 10 distinct prices
        if(count($buckets[$a]) === 10) {
            ++$a;
            $buckets[$a] = array();
        }
        if(isset($buckets[$a][$prices[$i]])) {
            ++$buckets[$a][$prices[$i]];
        } else if(isset($buckets[$a - 1][$prices[$i]])) {
            // same price as the end of the previous bucket: count it there
            ++$buckets[$a - 1][$prices[$i]];
        } else {
            $buckets[$a][$prices[$i]] = 1;
        }
    }
    return $buckets;
}
//TEST CODE
$prices = array();
for($i = 0; $i !== 50; ++$i) {
    $prices[] = rand(1, 100);
}
var_dump(priceBuckets($prices));
From the result, you can use reset and end (together with key, since the prices are the array keys) to get the min/max of each bucket
Kinda brute force, but might be useful...
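One way to read the price range of a bucket is sketched below. It assumes PHP 7.3 or later for array_key_first() and array_key_last(); the helper name bucketPriceRange and the sample bucket are made up for illustration.

```php
<?php
// Return [min, max] price of one bucket, where keys are prices and values are counts.
function bucketPriceRange(array $bucket): array
{
    // Prices were sorted before bucketing, so keys were inserted in
    // ascending order: the first key is the minimum, the last the maximum.
    return [array_key_first($bucket), array_key_last($bucket)];
}

$bucket = [3 => 2, 7 => 1, 12 => 4]; // sample data, not real output
list($min, $max) = bucketPriceRange($bucket);
```

Note that reset() and end() on their own return the stored counts, not the prices, which is why the keys are read here instead.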

Here is an idea, following the line of thought of my comment:
I assume you have a set of products, each of them tagged by a price and a sales volume estimate (as a percent from the total sales). First, sort all products by their price. Next, start splitting: traverse the ordered list, and accumulate sales volume. Each time you reach about 25%, cut there. If you do so 3 times, it will result in 4 subsets having disjoint price ranges, and a similar sales volume.
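The splitting step could be sketched roughly as follows. The function name splitBySalesVolume and the 'price'/'share' array keys are assumptions made for this example; 'share' can be a percentage or a unit count, as long as all products use the same measure.

```php
<?php
// Hypothetical helper: find price cut points so that each group carries a
// similar share of total sales. Each product is ['price' => ..., 'share' => ...].
function splitBySalesVolume(array $products, int $groups = 4): array
{
    // Sort by price so every group covers a contiguous price range.
    usort($products, function ($a, $b) {
        return $a['price'] <=> $b['price'];
    });

    $total = array_sum(array_column($products, 'share'));
    $cutPrices = [];
    $accumulated = 0.0;

    foreach ($products as $product) {
        $accumulated += $product['share'];
        // Cut when the running share reaches the next 1/$groups boundary.
        $nextBoundary = $total * (count($cutPrices) + 1) / $groups;
        if (count($cutPrices) < $groups - 1 && $accumulated >= $nextBoundary) {
            $cutPrices[] = $product['price'];
        }
    }
    return $cutPrices; // upper price limit of each group except the last
}
```

With this approach a rarely sold $4,000 item adds almost nothing to the running share, so it no longer stretches the ranges. A single product that dominates sales can still make neighbouring groups uneven.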

What exactly are you looking for as your end result (could you give us an example grouping)? If your only goal is for every group to contain a significant number of important enough products, then even a perfect algorithm for your current data set will not necessarily work with tomorrow's data set. Depending on how many groups you need, I would simply make arbitrary groups that fit your needs instead of using an algorithm, e.g. ($1 - $25, $25 - $100, $100+). From a consumer's perspective, my mind naturally sorts products into three different price categories (cheap, midrange and expensive).

I think you're thinking too much.
If you know your products, and you like fine grained results, I would simply hard code those price ranges.
If you think $1 to $10 makes sense for what you are selling, put it in, you don't need an algorithm. Just do a check so that you only show ranges that have results.
If you don't know your products, I would just sort all the products by price, and divide it into 4 groups of equal number of products.
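A minimal sketch of the hard-coded variant, including the check that hides empty ranges. The range boundaries and the function name nonEmptyRanges are placeholders, not values from the question.

```php
<?php
// Hypothetical hard-coded ranges; a null 'max' means "no upper bound".
$ranges = [
    ['min' => 0,   'max' => 25],
    ['min' => 25,  'max' => 100],
    ['min' => 100, 'max' => null],
];

// Count products per range and keep only ranges with at least one product.
function nonEmptyRanges(array $ranges, array $prices): array
{
    $result = [];
    foreach ($ranges as $range) {
        $count = 0;
        foreach ($prices as $price) {
            $aboveMin = $price >= $range['min'];
            $belowMax = $range['max'] === null || $price < $range['max'];
            if ($aboveMin && $belowMax) {
                ++$count;
            }
        }
        if ($count > 0) {
            $result[] = $range + ['count' => $count];
        }
    }
    return $result;
}

$visible = nonEmptyRanges($ranges, [5, 12, 19.99, 150]);
```

Each range includes its lower bound and excludes its upper bound, so a $25 product lands in exactly one group.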

Related

Compare one by one characters from a mysql db with php

I'm trying to compare a row in my DB with another sequence, character by character, and get as a result the id that best fits the given data. For example, my DB has the user David with the sequence AAA, and I want to compare it with a sequence I supply, ABA, so I'd like to receive a match percentage (66.6% in this case).
This is what I have so far, but I don't know how to go on:
$uname = $_POST['sequence'];
$query = "SELECT name FROM dna WHERE sequence = '$uname'";
$result = mysql_query($query);
while($row = mysql_fetch_array($result))
{
echo $row['name'];
}
In order to get the similarity in percent, you might use the PHP function similar_text().
The two strings are compared and the similarity percentage is returned, if the third parameter is passed to the function.
$string_1 = 'AAA';
$string_2 = 'ABA';
similar_text($string_1, $string_2, $percent);
echo $percent;
// 66.666666666667
The database part is a bit more work. A very basic implementation could look like this.
Keep in mind that the real problem is comparing a string against 1 million rows.
In general, one wouldn't do that, because instead of chars there are bits, and to compare bits you would simply use bit shifts. Anyway...
Here, when working with chars/strings, rolling row requests or a limited query could help, too.
That would mean that you ask the db for chunks of, let's say, 500 rows and do the calc work on each chunk.
It depends on the number of rows and the memory use of the dataset.
// incoming via user input
$string_1 = $_POST['sequence'];
// temporary var to store the highest similarity percentage and its row_id
$bestValue = array('row_id' => 0, 'similarity' => 0);
// iterate over all rows fetched from the database
// ($rows is assumed to hold them, keyed by row id)
foreach($rows as $id => $row)
{
    // get a new string_2 from db (the query must select the sequence column)
    $string_2 = $row['sequence'];
    // calculate similarity
    similar_text($string_1, $string_2, $percent);
    // if calculated similarity is higher, then update the "best" value
    if($percent > $bestValue['similarity']) {
        $bestValue = array('row_id' => $id, 'similarity' => $percent);
    }
}
var_dump($bestValue);
After all db rows are processed, $bestValue will contain the highest percentage and its row id.
You can do all kinds of things here, for instance:
switch from first match update (>) to last match update (>=)
stop iteration on first match
store the row_ids which have the same similarity (multi row match)
if you don't need multi row match, you might drop the array and use two vars for row and percent
proper error handling, escaping, mysqli usage
Be warned: this isn't the most efficient approach, especially when working with large datasets. If you need this at a level beyond hobby or homework, use a tool that is optimized for the job, like EMBOSS (http://emboss.sourceforge.net/).
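similar_text() works on common substrings rather than on positions, so it is not strictly a character-by-character comparison. If a purely positional match is wanted, a small helper along these lines might do. The name positionalMatchPercent and the treatment of unequal lengths are assumptions.

```php
<?php
// Hypothetical helper: percentage of positions at which two sequences agree.
// Positions beyond the end of the shorter string count as mismatches.
function positionalMatchPercent(string $a, string $b): float
{
    $longest = max(strlen($a), strlen($b));
    if ($longest === 0) {
        return 100.0; // two empty sequences are treated as identical
    }
    $shortest = min(strlen($a), strlen($b));
    $matches = 0;
    for ($i = 0; $i < $shortest; ++$i) {
        if ($a[$i] === $b[$i]) {
            ++$matches;
        }
    }
    return $matches / $longest * 100;
}

$percent = positionalMatchPercent('AAA', 'ABA'); // 2 of 3 positions match
```

It can replace the similar_text() call inside the loop above without changing the rest of the code.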

Building pricing logic using PHP

Hope y'all are having a wonderful day full of rainbows and roses!
Ok, ok .. I'll get to the point:
I've got a function that accepts multiple parameters and calculates the price of a product based on those parameters. Here's a conceptual example:
public function getPrice( $params ) {
// black magic goes here
return $price;
}
The parameters include:
$params['width'] which ranges all the way from 10 - 72 inches
$params['height'] which ranges from 10 - 44 inches
(There are actually more params, but for the sake of simplicity, I've kept those out).
Now, I have a table in Excel (something like a truth table) with rows that represent the width and columns that represent the height. The value in the corresponding cell is the price.
How could I best implement this pricing strategy in PHP? I thought nested if statements would work but got tired after the 10th if. Help?
You could store the information from Excel in a 2D array.
I.e.
$prices = array( // Columns --
array(1,2,3), // Rows
array(4,5,6), // |
array(7,8,9) // |
);
Then you can look up your price based on width/height by doing:
return $prices[$row][$column];
In your case, the row would be the width and the column would be the height.
Some extra work is required because your ranges start at 10, so you'd need to subtract 10 from each value you enter.
I.e.:
return $prices[$width - 10][$height - 10];
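Put together, the lookup might look like the sketch below. The 3x3 grid is sample data standing in for the Excel table, and lookupPrice and MIN_SIZE are names invented for the example.

```php
<?php
// Both ranges start at 10 inches, so index 0 corresponds to size 10.
const MIN_SIZE = 10;

// Hypothetical grid: rows are widths 10-12, columns are heights 10-12.
$prices = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
];

function lookupPrice(array $prices, int $width, int $height): ?int
{
    $row = $width - MIN_SIZE;
    $column = $height - MIN_SIZE;
    // Return null instead of raising a warning when the size is not in the table.
    return $prices[$row][$column] ?? null;
}

$price = lookupPrice($prices, 11, 12); // second row, third column
```

Returning null for out-of-range sizes lets the caller decide whether to show an error or fall back to a default price.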
Why wouldn't you just put all that data in your database? Have a row for every combination of parameter values, with the corresponding price. It's just 1 query away ...

Calculate distances and sort them

I wrote a function that can calculate the distance between two addresses using the Google Maps API.
The addresses are obtained from the database. What I want to do is calculate the distance using the function I wrote and sort the places according to the distance. Just like "Locate Store Near You" feature in online stores.
I'm going to specify what I want to do with an example:
So, lets say we have 10 addresses in database. And we have a variable $currentlocation. And I have a function called calcdist(), so that I can calculate the distances between 10 addresses and $currentlocation, and sort them. Here is how I do it:
$query = mysql_query("SELECT name, address FROM table");
while ($write = mysql_fetch_array($query)) {
$distance = array(calcdist($currentlocation, $write["address"]));
sort($distance);
for ($i=0; $i<1; $i++) {
echo "<tr><td><strong>".$distance[$i]." kms</strong></td><td>".$write['name']."</td></tr>";
}
}
But this doesn't work very well. It doesn't sort the numbers.
Another challenge:
How can I do this in an efficient way? Imagine there are infinite numbers of addresses; how can I sort these addresses and page them?
$query = mysql_query("SELECT name, address FROM table");
$rows = array();
while ($row = mysql_fetch_array($query)) {
    // store the distance as a plain number so it can be compared and printed
    $row['distance'] = calcdist($currentlocation, $row['address']);
    $rows[$row['name']] = $row;
}
function cmp_distances($a, $b) {
    if($a['distance'] > $b['distance']) return 1;
    elseif($a['distance'] < $b['distance']) return -1;
    else return 0;
}
// sort distances while preserving key=>value associations
uasort($rows, 'cmp_distances');
// iterate over the sorted list and display the entries
foreach($rows as $name => $row) {
    echo '<tr><td><strong>'.$row['distance'].' km</strong></td><td>'.$name.'</td></tr>';
}
In your example you calculate the distance to one address at the time:
$distance = array(calcdist($currentlocation, $write["address"]));
And when you do this
sort($distance);
you only have one item in your array. Basically you are printing the values exactly in the same order they are coming from the db, before the distance calculation.
You could:
1) Calculate all the addresses and put them into an array
2) Sort the array
3) Print out the results
About the other challenge you mentioned: this is a bit more tricky, and I'm sure there are plenty of options. I would start by thinking about how many addresses you really need to compare with each other. Is it really infinite? :)
Is this inside one country or world wide? In your db addresses, you most likely have the postal code. You can use this to narrow the search. Use only the postal codes near by and make the calculations only for those addresses.
But rule of thumb usually is that we worry about the performance too soon. Before it's even a problem.
I think you got some nice answers about your first question.
Concerning the second problem, it depends a bit on what your database looks like. Do you store just a string with an address? I assume that you use some geocoding service to convert the address to a (lat,lon) position and then calculate the distance. Is this right?
If so, you could start saving the coordinates for each geocoded address in your database. That way you would geocode each address only once (you may want to refresh this information now and then, but that is another issue).
Once your table has "Address, lat, lon" columns, you can use SQL to narrow down the search by imposing conditions on (lat,lon). You may even let SQL do the whole job by defining a new column in the result set, for example distance = sqrt((lat-x)^2 + (lon-y)^2), where (x,y) is the starting point (where the user is), and then return the first N results sorted by distance. Note that this is mathematical notation: in MySQL, ^ is bitwise XOR, so squaring needs POW() or plain multiplication.
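If the sorting is done in PHP on stored coordinates, it might look like the sketch below. It swaps the flat sqrt formula for the haversine great-circle distance, which stays accurate over longer distances. The function names and the 'lat'/'lon' keys are assumptions.

```php
<?php
// Great-circle distance in kilometres between two (lat, lon) points in degrees.
function haversineKm(float $lat1, float $lon1, float $lat2, float $lon2): float
{
    $earthRadiusKm = 6371.0;
    $dLat = deg2rad($lat2 - $lat1);
    $dLon = deg2rad($lon2 - $lon1);
    $a = sin($dLat / 2) ** 2
        + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * sin($dLon / 2) ** 2;
    return 2 * $earthRadiusKm * asin(min(1.0, sqrt($a)));
}

// Hypothetical helper: return one page of stores, nearest first.
// Each store is ['name' => ..., 'lat' => ..., 'lon' => ...]; pages start at 1.
function nearestStores(array $stores, float $lat, float $lon, int $page, int $perPage): array
{
    foreach ($stores as &$store) {
        $store['distance'] = haversineKm($lat, $lon, $store['lat'], $store['lon']);
    }
    unset($store); // break the reference left over from the loop

    usort($stores, function ($a, $b) {
        return $a['distance'] <=> $b['distance'];
    });
    return array_slice($stores, ($page - 1) * $perPage, $perPage);
}
```

For a large table this still loads every row, so a bounding-box condition on lat and lon in the SQL query is worth adding first.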

How can I create a specified amount of random values that all equal up to a specified number in PHP?

For example, say I enter '10' for the amount of values, and '10000' as a total amount.
The script would need to randomize 10 different numbers that all equal up to 10000. No more, no less.
But it needs to be dynamic, as well. As in, sometimes I might enter '5' or '6' or even '99' for the amount of values, and any number (up to a billion or even higher) as the total amount.
How would I go about doing this?
EDIT: I should also mention that all numbers need to be a positive integer
The correct answer here is unbelievably simple.
Just imagine a white line, let's say 1000 units long.
You want to divide the line in to ten parts, using red marks.
VERY SIMPLY, CHOOSE NINE RANDOM NUMBERS and put a red paint mark at each of those points.
It's just that simple. You're done!
Thus, the algorithm is:
(1) pick nine random numbers between 0 and 1000
(2) put the nine numbers, a zero, and a 1000, in an array
(3) sort the array
(4) using subtraction get the ten "distances" between array values
You're done.
(Obviously if you want to have no zeros in your final set, in part (1) simply rechoose another random number if you get a collision.)
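The four steps above can be sketched in PHP as follows. This version draws the marks from 1 to total - 1 and re-draws on collisions, so every part is a positive integer. The function name randomParts is made up, and random_int() requires PHP 7 or later.

```php
<?php
// Split $total into $count positive integers that sum to $total by
// choosing $count - 1 distinct cut points in 1 .. $total - 1.
function randomParts(int $total, int $count): array
{
    if ($count < 1 || $total < $count) {
        throw new InvalidArgumentException('Need 1 <= count <= total.');
    }
    $cuts = [];
    // Using the cut point as array key silently discards collisions,
    // so the loop keeps drawing until enough distinct points exist.
    while (count($cuts) < $count - 1) {
        $cuts[random_int(1, $total - 1)] = true;
    }
    $points = array_keys($cuts);
    sort($points);
    $points[] = $total; // the right end of the line

    $parts = [];
    $previous = 0;      // the left end of the line
    foreach ($points as $point) {
        $parts[] = $point - $previous;
        $previous = $point;
    }
    return $parts;
}

$parts = randomParts(10000, 10);
```

The re-draw loop becomes slow when the count is close to the total, since almost every position must then be picked.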
Ideally as programmers, we can "see" visual algorithms like this in our heads -- try to think visually whatever we do!
Footnote - for any non-programmers reading this, just to be clear pls note that this is like "the first thing you ever learn when studying computer science!" i.e. I do not get any credit for this, I just typed in the answer since I stumbled on the page. No kudos to me!
Just for the record, another common approach (depending on the desired outcome, whether you're dealing with real or whole numbers, and other constraints) is also very "ah hah!" elegant. All you do is this: get 10 random numbers and add them up. Then simply multiply or divide them all by the same factor, so that the total is the desired total! It's that easy!
maybe something like this:
set max amount remaining to the target number
loop for 1 to the number of values you want - 1
get a random number from 0 to the max amount remaining
set new max amount remaining to old max amount remaining minus the current random number
repeat loop
you will end up with a 'remainder' so the last number is determined by whatever is left over to make up the original total.
Generate 9 random numbers between 0 and 10000.
Sort them from big to small: r0 to r8
g0 = 10000 - r0
g1 = r0 - r1
...
g8 = r7 - r8
g9 = r8
This will yield 10 random numbers over the full range which add up to 10000.
I believe the answer provided by @JoeBlow is largely correct, but only if the 'randomness' desired requires uniform distribution. In a comment on that answer, @Artefacto said this:
It may be simple but it does not generate uniformly distributed numbers...
It is biased in favor of numbers of size 1000/10 (for a sum of 1000 and 10 numbers).
This raises the question, mentioned previously, of the desired distribution of these numbers. JoeBlow's method does ensure that element 1 has the same chance of being number x as element 2, which means that it must be biased towards numbers of size Max/n. Whether the OP wanted a more likely shot at a single element approaching Max or wanted a uniform distribution was not made clear in the question. [Apologies - I am not sure from a terminology perspective whether that makes a 'uniform distribution', so I refer to it in layman's terms only]
In all, it is incorrect to say that a 'random' list of elements is necessarily uniformly distributed. The missing element, as stated in other comments above, is the desired distribution.
To demonstrate this, I propose the following solution, which contains sequential random numbers of a random distribution pattern. Such a solution would be useful if the first element should have an equal chance at any number between 0-N, with each subsequent number having an equal chance at any number between 0-[Remaining Total]:
[Pseudo code]:
Create Array of size N
Create Integer of size Max
Loop through each element of N Except the last one
N(i) = RandomBetween (0, Max)
Max = Max - N(i)
End Loop
N(N) = Max
It may be necessary to take these elements and randomize their order after they have been created, depending on how they will be used [otherwise, the average size of each element decreases with each iteration].
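A direct PHP rendering of the pseudocode might look like this. The function name sequentialParts is invented, and like the pseudocode it allows zeros, so it would need adjusting if every value must be strictly positive.

```php
<?php
// Sequential method: each value is drawn from whatever total remains,
// and the last value absorbs the remainder.
function sequentialParts(int $total, int $count): array
{
    $parts = [];
    $remaining = $total;
    for ($i = 0; $i < $count - 1; ++$i) {
        $value = random_int(0, $remaining);
        $parts[] = $value;
        $remaining -= $value;
    }
    $parts[] = $remaining;
    // Later draws tend to be smaller, so shuffle to remove the positional bias.
    shuffle($parts);
    return $parts;
}

$parts = sequentialParts(10000, 10);
```

Because the first draw can take most of the total, this tends to produce one or two large values and many small ones.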
Update: @Joe Blow has the perfect answer. My answer has the special feature of generating chunks of approximately the same size (or at least a difference no bigger than (10000 / 10)), so I am leaving it in place for that reason.
The easiest and fastest approach that comes to my mind is:
Divide 10000 by 10 and store the values in an array. (10 times the value 1000)
Walk through every one of the 10 elements in a for loop.
From each element, subtract a random number between 0 and (10000 / 10).
Add that number to the following element.
This will give you a number of random values that, when added, will result in the end value (ignoring floating point issues).
Should be half-way easy to implement.
You'll reach PHP's maximum integer limit at some point, though. Not sure how far this can be used for values towards a billion and beyond.
Related: http://www.mathworks.cn/matlabcentral/newsreader/view_thread/141395
See this MATLAB package. It is accompanied with a file with the theory behind the implementation.
This function generates random, uniformly distributed vectors, x = [x1,x2,x3,...,xn]', which have a specified sum s, and for which we have a <= xi <= b, for specified values a and b. It is helpful to regard such vectors as points belonging to n-dimensional Euclidean space and lying in an n-1 dimensional hyperplane constrained to the sum s. Since, for all a and b, the problem can easily be rescaled to the case where a = 0 and b = 1, we will henceforth assume in this description that this is the case, and that we are operating within the unit n-dimensional "cube".
This is the implementation (© Roger Stafford):
function [x,v] = randfixedsum(n,m,s,a,b)
% Rescale to a unit cube: 0 <= x(i) <= 1
s = (s-n*a)/(b-a);
% Construct the transition probability table, t.
% t(i,j) will be utilized only in the region where j <= i + 1.
k = max(min(floor(s),n-1),0); % Must have 0 <= k <= n-1
s = max(min(s,k+1),k); % Must have k <= s <= k+1
s1 = s - [k:-1:k-n+1]; % s1 & s2 will never be negative
s2 = [k+n:-1:k+1] - s;
w = zeros(n,n+1); w(1,2) = realmax; % Scale for full 'double' range
t = zeros(n-1,n);
tiny = 2^(-1074); % The smallest positive matlab 'double' no.
for i = 2:n
tmp1 = w(i-1,2:i+1).*s1(1:i)/i;
tmp2 = w(i-1,1:i).*s2(n-i+1:n)/i;
w(i,2:i+1) = tmp1 + tmp2;
tmp3 = w(i,2:i+1) + tiny; % In case tmp1 & tmp2 are both 0,
tmp4 = (s2(n-i+1:n) > s1(1:i)); % then t is 0 on left & 1 on right
t(i-1,1:i) = (tmp2./tmp3).*tmp4 + (1-tmp1./tmp3).*(~tmp4);
end
% Derive the polytope volume v from the appropriate
% element in the bottom row of w.
v = n^(3/2)*(w(n,k+2)/realmax)*(b-a)^(n-1);
% Now compute the matrix x.
x = zeros(n,m);
if m == 0, return, end % If m is zero, quit with x = []
rt = rand(n-1,m); % For random selection of simplex type
rs = rand(n-1,m); % For random location within a simplex
s = repmat(s,1,m);
j = repmat(k+1,1,m); % For indexing in the t table
sm = zeros(1,m); pr = ones(1,m); % Start with sum zero & product 1
for i = n-1:-1:1 % Work backwards in the t table
e = (rt(n-i,:)<=t(i,j)); % Use rt to choose a transition
sx = rs(n-i,:).^(1/i); % Use rs to compute next simplex coord.
sm = sm + (1-sx).*pr.*s/(i+1); % Update sum
pr = sx.*pr; % Update product
x(n-i,:) = sm + pr.*e; % Calculate x using simplex coords.
s = s - e; j = j - e; % Transition adjustment
end
x(n,:) = sm + pr.*s; % Compute the last x
% Randomly permute the order in the columns of x and rescale.
rp = rand(n,m); % Use rp to carry out a matrix 'randperm'
[ig,p] = sort(rp); % The values placed in ig are ignored
x = (b-a)*x(p+repmat([0:n:n*(m-1)],n,1))+a; % Permute & rescale x
return

get max value in php (instead of mysql)

I have two MySQL tables, Badges and Events. I use a join to find all the events and return the badge info for that event (title & description) using the following code:
SELECT COUNT(Badges.badge_ID) AS badge_count, title, Badges.description
FROM Badges
JOIN Events ON Badges.badge_id = Events.badge_id
GROUP BY title ASC
In addition to the counts, I need to know the value of the event with the most entries. I thought I'd do this in php with the max() function, but I had trouble getting that to work correctly. So, I decided I could get the same result by modifying the above query by using "ORDER BY badge_count DESC LIMIT 1," which returns an array of a single element, whose value is the highest count total of all the events.
While this solution works well for me, I'm curious if it is taking more resources to make 2 calls to the server (b/c I'm now using two queries) instead of working it out in php. If I did do it in php, how could I get the max value of a particular item in an associative array (it would be nice to be able to return the key and the value, if possible)?
EDIT:
OK, it's amazing what a few hours of rest will do for the mind. I opened up my code this morning, and made a simple modification to the code, which worked out for me. I simply created a variable on the count field and, if the new one was greater than the old one, changed it to the new value (see the "if" statement in the following code):
if ( $c > $highestCount ) {
$highestCount = $c; }
This might again lead to a "religious war", but I would go with the two queries version. To me it is cleaner to have data handling in the database as much as possible. In the long run, query caching, etc.. would even out the overhead caused by the extra query.
Anyway, to get the max in PHP, you simply need to iterate over your $results array:
function getMax($results) {
    if (count($results) == 0) {
        return NULL;
    }
    $max = reset($results);
    foreach($results as $elem) {
        if ($max < $elem) { // need to do specific comparison here
            $max = $elem;
        }
    }
    return $max;
}
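To get both the key and the value from an associative array, the built-in max() and array_search() functions can be combined. The array below is sample data shaped like badge title => event count, not output from the query in the question.

```php
<?php
// Hypothetical result set: badge title => number of events.
$counts = ['Explorer' => 4, 'Veteran' => 9, 'Rookie' => 2];

// Largest value in the array.
$highestCount = max($counts);
// First key that holds that value.
$topBadge = array_search($highestCount, $counts);
```

If several badges can tie for the top count, array_keys($counts, $highestCount) returns all of their keys instead of only the first.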
