I've got a table with 1000 recipes in it, each recipe has calories, protein, carbs and fat values associated with it.
I need to figure out an algorithm in PHP that will allow me to specify value ranges for calories, protein, carbs and fat as well as dictating the number of recipes in each permutation. Something like:
getPermutations($recipes, $lowCal, $highCal, $lowProt, $highProt, $lowCarb, $highCarb, $lowFat, $highFat, $countRecipes)
The end goal is allowing a user to input their calorie/protein/carb/fat goals for the day (as a range, 1500-1600 calories for example), as well as how many meals they would like to eat (count of recipes in each set) and returning all the different meal combinations that fit their goals.
I've tried this previously by populating a table with every possible combination (see: Best way to create Combination of records (Order does not matter, no repetition allowed) in mySQL tables ) and querying it with the range limits, however that proved not to be efficient as I end up with billions of records to scan through and it takes an indefinite amount of time.
I've found some permutation algorithms that are close to what I need, but they don't have the value range constraint for calories/protein/carbs/fat that I'm looking for (see: Create fixed length non-repeating permutation of larger set). I'm at a loss at this point when it comes to this type of logic/math, so any help is MUCH appreciated.
Based on some comment clarification, I can suggest one way to go about it. Specifically, this is my "try the simplest thing that could possibly work" approach to a problem that is potentially quite tricky.
First, the tricky part is that the sum of all meals has to be in a certain range, but SQL does not have a built-in feature that I'm aware of that does specifically what you want in one pass; that's ok, though, as we can just implement this functionality in PHP instead.
So let's say you request 5 meals that will total 2000 calories - we leave the other variables aside for simplicity, but they will work the same way. We then calculate that the 'average' meal is 2000/5=400 calories, but obviously any one meal could be over or under that amount. I'm no dietician, but I assume you'll want no meal that takes up more than 1.25x-2x the average meal size, so we can restrict our initial query to this amount.
$maxCalPerMeal = ($highCal / $countRecipes) * 1.5;
$mealPlanCaloriesRemaining = $highCal; # more on this one in a minute
We then request 1 random meal which is less than $maxCalPerMeal, and 'save' it as our first meal. We then subtract its actual calorie count from $mealPlanCaloriesRemaining. We now recalculate:
$maxCalPerMeal = ($highCal / $countRecipesRemaining) * 1.5; # 1.5 being a maximum deviation from average multiple
Now the next query will ask for a random meal that is less than both $maxCalPerMeal AND $mealPlanCaloriesRemaining, AND is NOT one of the meals you already have saved in this particular meal plan option (thus ensuring unique meals - no mac'n'cheese for breakfast, lunch, and dinner!). And we update the variables as in the last query, until you reach the end. For the last meal requested we don't care about the average and its associated multiple, as thanks to a compound query you'll get what you want anyway and don't need to complicate your control loops.
Assuming the worst case with the 5 meal 2000 calorie max diet:
Meal 1: 600 calories
Meal 2: 437
Meal 3: 381
Meal 4: 301
Meal 5: 281
Or something like that, and in most cases you'll get something a bit nicer and more random. But even in the worst case it still works! Adding more maximums, like for fat and protein, is easy, so let's deal with the lows next.
All we need to do to support "minimum calories per day" is add another set of averages, as such:
$minCalPerMeal = ($lowCal / $countRecipes) * .5; # this time our multiplier is less than one: as we allow meals to be bigger than average, we must allow them to be smaller as well
And you restrict the query to being greater than this calculated minimum, recalculating with each loop, and happiness naturally ensues.
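As a rough illustration of the loop described above, here is a minimal sketch that works on an in-memory array instead of SQL queries and handles calories only. The function name pickMealPlan, the ['id', 'cal'] row layout and the $spread parameter are assumptions made for this example. It also recomputes the per-meal bounds from the calories and slots still remaining, which is just one possible reading of the recalculation step.

```php
<?php
// Hypothetical in-memory version of the selection loop (calories only).
// $recipes is assumed to be a list of ['id' => int, 'cal' => int] rows.
function pickMealPlan(array $recipes, int $lowCal, int $highCal, int $count, float $spread = 0.5): ?array
{
    $plan = [];   // chosen meals, keyed by recipe id so duplicates are easy to exclude
    $total = 0;   // calories used so far
    for ($slotsLeft = $count; $slotsLeft > 0; $slotsLeft--) {
        $minLeft = $lowCal - $total;   // calories still needed to reach the daily minimum
        $maxLeft = $highCal - $total;  // calories still allowed before the daily maximum
        if ($slotsLeft === 1) {
            // Last meal: it only has to land the total inside the requested range.
            $min = $minLeft;
            $max = $maxLeft;
        } else {
            // Allow each meal to deviate from the remaining average by +/- $spread.
            $min = ($minLeft / $slotsLeft) * (1 - $spread);
            $max = min($maxLeft, ($maxLeft / $slotsLeft) * (1 + $spread));
        }
        $candidates = array_values(array_filter($recipes, function ($r) use ($plan, $min, $max) {
            return !isset($plan[$r['id']]) && $r['cal'] >= $min && $r['cal'] <= $max;
        }));
        if (!$candidates) {
            return null; // dead end: the caller can retry or fall back
        }
        $meal = $candidates[random_int(0, count($candidates) - 1)];
        $plan[$meal['id']] = $meal;
        $total += $meal['cal'];
    }
    return array_values($plan);
}
```

A null result means this attempt hit a dead end, so the caller has to decide what to do about it.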
Finally we must deal with the degenerate case - what if using this method you end up needing a meal that is too small or too big to fill the last slot? Well, you can handle this a number of ways. Here's what I'd recommend.
The easiest is just returning fewer than the desired number of meals, but this might be unacceptable. You could also have special low calorie meals that, due to the minimum average dietary content, would only be likely to be returned if someone really had to squeeze in a light meal to make the plan work. I rather like this solution.
The second easiest is to throw out the meal plan you have so far and regenerate from scratch; it might work this time, or it just might not, so you'll need a cap on the number of attempts to make sure you don't get into an infinite work-intensive loop.
The least easy also requires a control loop with a maximum iteration count, but here you use a specific strategy to try to get a more acceptable meal plan. Take the meal with the highest value for the limit you are exceeding and throw it out, then try pulling a smaller meal - perhaps one that is no greater than the newly calculated average. It might make the plan as a whole work, or you might go over on another value, forcing you back into a loop that could be unresolvable - or it might just take a few dozen iterations to get one that works.
Though this sounds like a lot when writing it out, even a very slow computer should be able to churn out hundreds of thousands of suggested meal plans every few seconds without pausing. Your database will be under very little strain even if you have millions of recipes to choose from, and the meal plans you return will be as random as it gets. It would also be easy to make certain multiple suggested meal plans are not duplicates with a simple comparison and another call or two for an extra meal plan to be generated - without fear of noticeable delay!
By breaking things down to small steps with minimal mathematical overhead a daunting task becomes manageable - and you don't even need a degree in mathematics to figure it out :)
(As an aside, I think you have a very nice website built there, so no worries!)
I am working on a project where I need to build data in an optimised format so that it doesn't slow down the framework too much.
The problem is:
Suppose you have ordered 12 units of a product from my ecommerce store, and for that product I have 5 different bundles from which to supply that quantity.
Suppose the array of bundles with serial number as key and max units available in that bundle as value is like:
$arr = array(
array('sr_no1'=>5),
array('sr_no2'=>7),
array('sr_no3'=>2),
array('sr_no4'=>9),
array('sr_no5'=>12)
);
Now there are two main conditions to my greedy approach for giving away the quantity requested by the customer.
There should be minimum wastage: if you order 11 units, you would give 9+2-11=0 wastage instead of supplying from the 12-unit bundle with 12-11=1 wastage.
The quantity should be taken from the minimum number of lots/bundles: if you order 12 units, both 5+7-12=0 and 12-12=0 give zero wastage, so we'll choose array('sr_no5'=>12) to supply the requested quantity.
I have been trying to figure out the solution for the last 3 days.
Consider the test cases for quantity ordered to be like 12 or 11 or 6
or 35 or 30, etc.
What I need as a result is the arrays that we'll choose to distribute the quantities like array('sr_no5'=>12) for giving away 12 units of quantity ordered and arrays array('sr_no3'=>2),array('sr_no4'=>9) for giving away 11 units of quantity.
I have tried knapsack, greedy, and minimum spanning tree while trying to figure out the solution.
Please suggest the most optimised solution, as we do not want to hit a server timeout.
NOTE: all the values above like quantity/unit ordered, no. of
bundles, max available unit in each bundle are variables and can
change for any no. of cases.
I'm not sure of a named algorithm, but you could enumerate all possible outcomes by assigning each bundle a binary flag, 0 or 1. 11000 would equal 5+7+0+0+0 (12), 00010 would equal 0+0+0+9+0 (9), etc.
Then build a master array keyed on that pseudo-binary value, with the total it gives as the value.
Then filter by match (or nearest match) and see which result has the fewest 1's.
It's crude but would work.
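To make the 0/1 idea concrete, here is a minimal sketch. It assumes the bundles are flattened into a single serial-number => units map (a simplification of the nested arrays in the question), and bestBundles is a hypothetical name. It tries every bitmask, skips the ones that cannot cover the order, and keeps the combination with the least waste, breaking ties by the fewest bundles.

```php
<?php
// Sketch of the 0/1 enumeration: bit $i of $mask says whether bundle $i is used.
// $bundles maps a serial number to the units available in that bundle.
function bestBundles(array $bundles, int $ordered): ?array
{
    $keys = array_keys($bundles);
    $n = count($keys);
    $best = null;
    for ($mask = 1; $mask < (1 << $n); $mask++) {
        $total = 0;
        $picked = [];
        for ($i = 0; $i < $n; $i++) {
            if ($mask & (1 << $i)) {
                $picked[$keys[$i]] = $bundles[$keys[$i]];
                $total += $bundles[$keys[$i]];
            }
        }
        if ($total < $ordered) {
            continue; // this combination cannot cover the order
        }
        $waste = $total - $ordered;
        // Prefer less waste, then fewer bundles.
        if ($best === null
            || $waste < $best['waste']
            || ($waste === $best['waste'] && count($picked) < count($best['picked']))) {
            $best = ['waste' => $waste, 'picked' => $picked];
        }
    }
    return $best === null ? null : $best['picked'];
}

$bundles = ['sr_no1' => 5, 'sr_no2' => 7, 'sr_no3' => 2, 'sr_no4' => 9, 'sr_no5' => 12];
$choice = bestBundles($bundles, 11); // the 2-unit and 9-unit bundles
```

The loop visits 2^n masks, so this is only reasonable for a small number of bundles per product; for larger inputs a different technique would be needed.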
It looks like a three-part algorithm:
Get an array with all possible bundle combinations
Filter possible solutions: keep all solutions which give exactly as many items as needed, or more
Order by preference ( PHP: Sorting Arrays ):
First those with minimal waste
First those which use up fewer bundles
First those which use up smaller bundles
You can optimize the code by unifying parts 1 and 2, filtering the possible solutions while creating the array. There are probably many possible programming patterns or even libraries to accomplish such a purpose. Depending on the number of bundles involved in each case, optimizing that much may be overkill.
This is kind of a neat problem and I've enjoyed thinking it through...
Assume that you run a "Widget Rental" website, and in your application you want to allow prospective purchasers to sort the widgets based on price (low to high or high to low).
Each widget can have a different price based on the time of year. Some widgets will have dozens of different prices depending on the season as you get "high" seasons and "low" seasons.
However, the sellers of the "Widgets" are especially mischievous, and have realised that if they set their widget to be really expensive for one day of the year, and also really cheap one day of the year, then they can easily appear at the low and high sort ranges.
Currently, I took a very naive solution in order to calculate the "lowest price" for a Widget, which is to just take the lowest( N ) value from a dataset.
What I would like is to get a "lowest from price" for a widget, which accurately portrays the price from which it could be rented, and to remove the lower/higher-band outliers.
Take a look at this chart. [Chart: price over time. X axis: time (each significant interval is a day); Y axis: price.]
The X axis is time, and the Y axis is the price. Now, this contains a normal distribution, and there aren't any real statistical outliers in that dataset. It's common to see the price between the lowest value and the upper value to fluctuate as much as 200%.
However, take a look at this second chart... It contains a single day tariff, which is only 20 euros...
I've played around with using Grubbs test and it seems to work quite well.
The important thing is that I want to get a "from price". That is to say, I want to be able to say, "You can rent this widget from XXXX". So it should reflect the overall pricing taken as a whole and ignore clear outliers.
PHP bonus points if you point me in the direction of anything that already exists. (But I'm happy to code this myself in PHP).
One issue is that there are multiple definitions for what an outlier actually is. However, for this purpose a straightforward solution seems sufficient.
You could remove outliers by limiting the range of values to either +- some percentage or +- some number of standard deviations (probably one or two, but it could vary) from the average price. You'd probably want to use a combination of both, because if the prices don't vary much, then a discount could be viewed as an outlier, which may or may not be appropriate. In any case, you'd likely have to do some experimenting to determine how sensitive it is. Chances are you'd want to set it so outliers must be at least some percentage away from the mean, even if it's only 5-20 percent. Below are a few percentage based limits based on an average of $500.
90%: $50 to $950
75%: $125 to $875
50%: $250 to $750
30%: $350 to $650
25%: $375 to $625
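A minimal sketch of combining the two limits, assuming a non-empty list of daily prices. The function name fromPrice and the default values for $pct and $sigmas are illustrative assumptions, not recommendations. A price is treated as an outlier only when it lies outside both the percentage band and the standard-deviation band around the mean.

```php
<?php
// Sketch: drop prices that fall outside both a percentage band and a
// standard-deviation band around the mean, then report the lowest remaining price.
// $pct and $sigmas are tuning knobs chosen for illustration only.
function fromPrice(array $prices, float $pct = 0.30, float $sigmas = 2.0): float
{
    $n = count($prices);
    $mean = array_sum($prices) / $n;
    $sumSq = 0.0;
    foreach ($prices as $p) {
        $sumSq += ($p - $mean) ** 2;
    }
    $std = sqrt($sumSq / $n); // population standard deviation
    // Using the wider of the two bands means a price must exceed BOTH limits to be dropped.
    $tolerance = max($pct * $mean, $sigmas * $std);
    $kept = array_filter($prices, function ($p) use ($mean, $tolerance) {
        return abs($p - $mean) <= $tolerance;
    });
    // If the settings are so tight that nothing survives, fall back to the raw prices.
    return (float) min($kept ?: $prices);
}
```

With a set of prices around 500 plus a single 20 tariff, the 20 is dropped and the lowest ordinary price is reported instead.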
If multiple passes are used, then it would be easier to sort the prices, then remove the price that is farthest from the average (perhaps considering the highest price as well as the lowest price) as long as it exceeds the range. This ends up being O(N*D log D) to obtain the result of continuous single passes until they have no effect, instead of O(N*D) for a single pass, where N is the number of items to rent and D is the number of days considered.
You also might find the Ramer–Douglas–Peucker algorithm useful for finding points of interest after a bit of experimenting with how to define the value of epsilon.
I have a PHP rating system (1-5) in which some judges rate some products. I want the results for these products to be fair. What normally happens is that some judges are very strict and rate products only in the range of 1-2, while other judges rate products only in the range of 4-5. Some use the full 1-5 range correctly.
Can someone give an idea or help in creating an algorithm that scales the judges' ratings to compensate for harsh judges and computes the product score?
I thought of taking the mean of the judges' scores on all products, but is that the way to go, or does someone have a good alternative to get fair results?
Edit
The rating system is not for an ecommerce application. Here there are only a few judges, say 10, who rate all the products. The product may be a song in a contest, for example. Some of the judges may be very strict and some very liberal. There may be several contests, so I have to record ratings of these very strict and liberal judges even for other contests and set a rule for them.
Simply put, you assign a weight to a judge based on the range of their typical votes (note, they must not be aware of this weight, or they will throw the system off.) Judges who always vote a single score get the lowest weight. Judges that give things a wide range of scores are considered more accurate.
This also assumes that these judges judge products with a fair range of quality; so if you give them a bunch of good or bad products and expect a range of vote levels, it might be unrealistic.
What you're looking for is the judge with the highest standard deviation (highest variation) in votes having the highest weight, whereas the judge with the lowest would have the least.
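One way to sketch this weighting, assuming each judge's past votes are available: use the standard deviation of a judge's history directly as their weight and score a product as the weighted mean of the votes it received. The names stdDev and weightedScore and the two array layouts are assumptions for the example, and using the raw standard deviation as the weight is only one of several reasonable choices.

```php
<?php
// $history maps judge => list of that judge's past votes.
// $votes maps judge => the vote they gave to the product being scored.
function stdDev(array $xs): float
{
    $mean = array_sum($xs) / count($xs);
    $sumSq = 0.0;
    foreach ($xs as $x) {
        $sumSq += ($x - $mean) ** 2;
    }
    return sqrt($sumSq / count($xs));
}

function weightedScore(array $history, array $votes): float
{
    $weighted = 0.0;
    $totalWeight = 0.0;
    foreach ($votes as $judge => $vote) {
        $w = stdDev($history[$judge]); // wider voting range => more weight
        $weighted += $w * $vote;
        $totalWeight += $w;
    }
    // If every judge has zero spread, fall back to a plain mean.
    return $totalWeight > 0 ? $weighted / $totalWeight : array_sum($votes) / count($votes);
}
```

A judge who always gives the same score ends up with weight zero here, so a softer floor on the weight may be preferable in practice.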
The non-algorithmic solution is (essentially) to run the algorithm on the judges, and then pick, American Idol style, judges that balance each other off to get what feels like an accurate result. In which case, you'd want to note the average vote as well as the standard deviation, and perhaps set three judges, one with the wide standard deviation, and then two narrows, one high and one low (liberal and strict) to judge it. This way they don't feel like they get 'less voice' because they are stricter or looser.
Then again, that could be an impetus for them to be less/more strict - if they are too easy or too hard on the product consistently they 'lose voice'.
It sounds like you may be trying to apply an algorithmic solution to a non-algorithmic problem. I'd think about why some "judges" vote only 1-2 and others vote only 4-5.
One possible cause could be self-selection. For example, people who bought an item online may be more likely to review the item if they were particularly disappointed or particularly pleased with their purchase. If this is your problem, you could try to encourage shoppers to vote more, so that even those who had a non-extreme experience come back to vote.
Another possible issue may be guidance. Maybe your explanation of the rating system isn't clear to the judges. You can try to add a description of what each rating means, and see if that improves the quality of data.
In summary, any kind of a solution to your rating problem will need to have a "human" component and take into account the full story of how the judges choose ratings and why. There is not a whole lot that a ranking algorithm can do if your input data is poor quality. On the other hand, if your data has decent quality, then taking a mean works quite well.
One unrelated problem with taking a mean is that an item with one 5-star rating will rank above an item with a hundred 5-star ratings + one 4-star rating. One simple solution is Laplace Smoothing, which addresses the problem by effectively starting every item with one vote of each value (1,2,3,4,5). You don't display the "smoothed" values, but you use them when sorting. See How Not To Sort By Average Rating post for an alternate solution.
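The smoothing step is small enough to sketch directly. The function name smoothedMean is an assumption; the one-phantom-vote-per-value rule is the one described above.

```php
<?php
// Sketch: start every item with one phantom vote for each value 1..5,
// then take the mean over phantom and real votes together.
function smoothedMean(array $votes): float
{
    $phantom = [1, 2, 3, 4, 5];
    $all = array_merge($phantom, $votes);
    return array_sum($all) / count($all);
}

$single = smoothedMean([5]);                                    // one 5-star vote
$many   = smoothedMean(array_merge(array_fill(0, 100, 5), [4])); // 100 fives and one four
```

Sorting by the smoothed value puts the heavily rated item ahead of the item with a single 5-star vote, while the displayed average can stay unsmoothed.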
How about truncated mean? Here is a good explanation of the idea.
EDIT
Let's say you have votes like: [1,4,3,2,5,1,1,3,2,4].
You need to sort the array in ascending order, giving you: [1,1,1,2,2,3,3,4,4,5].
Then let's say you want to get rid of 25% of the votes at each end, which is 3 votes (2.5 rounded up). You simply discard three votes from the left and three from the right, giving you [2,2,3,3].
Then, use arithmetic mean to get 2.5.
EDIT 2
Depending on your database schema, you could query the database to return the votes in ascending order. Then work out how many votes the percentage corresponds to and use array_slice() to drop them from both ends (read the documentation); after that, calculating the arithmetic mean is the least of your concerns.
EDIT: I'm sorry guys, my explanation of the problem wasn't clear! This should be better:
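Put together, the steps from this answer might look like the sketch below. The function name truncatedMean is an assumption, and $fraction is the share of votes dropped from each end, rounded up as in the worked example.

```php
<?php
// Sketch of a truncated mean: sort, drop a fraction of votes from each end, average the rest.
function truncatedMean(array $votes, float $fraction): float
{
    sort($votes); // sorts the local copy only; the caller's array is untouched
    $cut = (int) ceil(count($votes) * $fraction); // votes dropped from EACH end
    if (2 * $cut >= count($votes)) {
        throw new InvalidArgumentException('fraction removes every vote');
    }
    $kept = array_slice($votes, $cut, count($votes) - 2 * $cut);
    return array_sum($kept) / count($kept);
}

$mean = truncatedMean([1, 4, 3, 2, 5, 1, 1, 3, 2, 4], 0.25); // keeps [2,2,3,3] => 2.5
```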
User sends ID numbers of articles and the max. number of bundles(packages)
API searches for all prices available for the articles and calculates best result for min. number of bundles (limit to max. number provided by customer)
ONE Bundle is one package of items delivered to ONE platform(buyer)
Thanks!
This is a fun little problem. I spent a few hours on it this morning, and while I don't have a complete solution, I think I have enough for you to get started (which I believe was what you asked for).
First of all, I'm assuming these things, based on your description of the problem:
All buyers quote a price for all the items
There's no assumption about the items, they may all be different
The user can only interact with a limited number of buyers
The user wants to sell every item, each to one buyer
The user may sell multiple items to a single buyer
Exact solution -- brute force approach
For this, the first thing to realize is that, for a given set of buyers, it is straightforward to calculate the maximum total revenue, because you can just choose the highest price offered in that set of buyers for each item. Add up all those highest prices, and you have the max total revenue for that set of buyers.
Now all you have to do is make that calculation for every possible combination of buyers. That's a basic combinations problem: "n choose k" where n is the total number of buyers and k is the number of buyers you're limited to. There are functions out there that will generate lists of these combinations (I wrote my own... there's also this PEAR package for php).
Once you have a max total revenue for every combination of chosen buyers, just pick the biggest one, and you've solved the problem.
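A minimal sketch of this brute-force search, not the code mentioned in the answer. It assumes a hypothetical $prices[$buyer][$item] layout in which every buyer quotes every item, and the function names combinations and bestBuyers are made up for the example. Because adding a buyer can never lower the per-item maximum, it is enough to try sets of exactly k buyers.

```php
<?php
// All k-element combinations of $items, as a list of lists.
function combinations(array $items, int $k): array
{
    if ($k === 0) {
        return [[]];
    }
    if (count($items) < $k) {
        return [];
    }
    $first = array_shift($items);
    $with = [];
    foreach (combinations($items, $k - 1) as $combo) {
        $with[] = array_merge([$first], $combo); // combinations that include $first
    }
    return array_merge($with, combinations($items, $k)); // plus those that skip it
}

// Best revenue when selling every item to at most $k buyers.
function bestBuyers(array $prices, int $k): array
{
    $k = min($k, count($prices));
    $best = ['revenue' => -1, 'buyers' => []];
    foreach (combinations(array_keys($prices), $k) as $buyers) {
        $revenue = 0;
        foreach (array_keys($prices[$buyers[0]]) as $item) {
            $offers = [];
            foreach ($buyers as $b) {
                $offers[] = $prices[$b][$item];
            }
            $revenue += max($offers); // sell each item to the best payer in this set
        }
        if ($revenue > $best['revenue']) {
            $best = ['revenue' => $revenue, 'buyers' => $buyers];
        }
    }
    return $best;
}
```

This builds every combination in memory at once, which matches the scaling problems described for larger inputs.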
More elegant algorithm?
However, as I intimated by calling this "brute force", the above is not fast, and scales horribly. My machine runs out of memory with 20 buyers and 20 items. I'm sure a better algorithm exists, and I've got a good one, but it isn't perfect.
It's based on opportunity costs. I calculate the difference between the highest price and the second highest price for each item. That difference is an opportunity cost for not picking the buyer with that highest price.
Then I pick buyers offering high prices for items where the opportunity cost is the highest (thus avoiding the worst opportunity costs), until I have k - 1 buyers (where k is the max I can pick). The final choice is tricky, and instead of writing a much more complicated algorithm, I just run all the possibilities for the final buyer and pick the best revenue.
This strategy picks the best combination most of the time, and if it misses, it doesn't miss by much. It also scales relatively well. It's 10x faster than brute force on small scales, and if I quadruple all the parameters (buyers, buyer limit, and items), calculation time goes up by a factor of 20. Considering how many combinations are involved, that's pretty good.
I've got some code drafted, but it's too long for this post. Let me know if you're interested, and I'll figure out a way to send it to you.
This is a graph problem. It can be solved with Edmonds' blossom algorithm (Blossom V is one implementation). It's a matching algorithm that finds the best pairwise matching, for example in dating programs. You may also want to look at the 1D bin-packing algorithm. In 1D bin packing you have a limited number of items to assign to an unlimited number of boxes or shelves, and the fuller the boxes get, the better.
If I understand the problem correctly, it is NP-complete via reduction from Minimum Set Cover. We can translate an instance of Set Cover into an instance of the OP's problem as follows:
Let an instance of Set Cover be given by a set X of size n and a collection of subsets S_1, S_2, ..., S_m of X. Construct an instance of the OP's problem where the seller has n items to sell to m buyers, where buyer i offers a price of 1 for item j if S_i contains item j and 0 otherwise. A solution to the OP's problem where the number of buyers is limited by k and the total price paid is n corresponds to a solution to the original Set Cover problem with k sets. So, if you had a polynomial-time solution to the OP's problem, you could solve Minimum Set Cover by successively solving it for the case of 1, 2, 3, etc... buyers until you found a solution with total price equal to n.
I'd like to populate the homepage of my user-submitted-illustrations site with the "hottest" illustrations uploaded.
Here are the measures I have available:
How many people have favourited that illustration
votes table includes date voted
When the illustration was uploaded
illustration table has date created
Number of comments (not so good as max comments total about 10 at the moment)
comments table has comment date
I have searched around, but don't want user authority to play a part, but most algorithms include that.
I also need to find out if it's better to do the calculation in the MySQL that fetches the data or if there should be a PHP/cron method every hour or so.
I only need 20 illustrations to populate the home page. I don't need any sort of paging for this data.
How do I weight age against votes? Surely a site with fewer submissions needs less weight on date added?
Many sites that use some type of popularity ranking do so by using a standard algorithm to determine a score and then decaying eternally over time. What I've found works better for sites with less traffic is a multiplier that gives a bonus to new content/activity - it's essentially the same, but the score stops changing after a period of time of your choosing.
For instance, here's a pseudo-example of something you might want to try. Of course, you'll want to adjust how much weight you're attributing to each category based on your own experience with your site. Comments are rare, but take more effort from the user than a favorite/vote, so they probably should receive more weight.
score = (votes / 10) + comments
age = UNIX_TIMESTAMP() - UNIX_TIMESTAMP(date_created)
if(age < 86400) score = score * 1.5
This type of approach would give a bonus to new content uploaded in the past day. If you wanted to approach this in a similar way only for content that had been favorited or commented on recently, you could just add some WHERE constraints on your query that grabs the score out from the DB.
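The pseudo-example translates to PHP almost line for line. The sketch below assumes the vote and comment counts and the creation timestamp have already been fetched; the function name hotScore and the optional $now parameter (handy for testing) are additions for this example.

```php
<?php
// Sketch of the scoring rule: votes count for a tenth of a comment,
// and content younger than one day gets a 1.5x bonus.
function hotScore(int $votes, int $comments, int $createdAt, ?int $now = null): float
{
    $now = $now ?? time();
    $score = ($votes / 10) + $comments;
    $age = $now - $createdAt; // age in seconds
    if ($age < 86400) {
        $score *= 1.5;
    }
    return (float) $score;
}
```

The weights (10 votes per comment, 1.5x for the first day) are the ones from the pseudo-example and would need tuning for a real site.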
There are actually two big reasons NOT to calculate this ranking on the fly.
Requiring your DB to fetch all of that data and do a calculation on every page load just to reorder items results in an expensive query.
Probably a smaller gotcha, but if you have a relatively small amount of activity on the site, small changes in the ranking can cause content to move pretty drastically.
That leaves you with either caching the results periodically or setting up a cron job to update a new database column holding this score you're ranking by.
Obviously there is some subjectivity in this - there's no one "correct" algorithm for determining the proper balance - but I'd start out with something like votes per unit age. MySQL can do basic math so you can ask it to sort by the quotient of votes over time; however, for performance reasons, it might be a good idea to cache the result of the query. Maybe something like
SELECT images.url FROM images ORDER BY (SELECT COUNT(*) FROM votes WHERE votes.image_id = images.id) / (UNIX_TIMESTAMP() - UNIX_TIMESTAMP(images.date)) DESC LIMIT 20
but my SQL is rusty ;-)
Taking a simple average will, of course, bias in favor of new images showing up on the front page. If you want to remove that bias, you could, say, count only those votes that occurred within a certain time limit after the image being posted. For images that are more recent than that time limit, you'd have to normalize by multiplying the number of votes by the time limit then dividing by the age of the image. Or alternatively, you could give the votes a continuously varying weight, something like exp(-time(vote) + time(image)). And so on and so on... depending on how particular you are about what this algorithm will do, it could take some experimentation to figure out what formula gives the best results.
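The continuously varying weight could be sketched as below. The formula exp(-time(vote) + time(image)) is taken from the paragraph above; the time scale $tau (one day by default) is an assumption added so the exponent is not measured in raw seconds, and decayedVotes is a hypothetical name.

```php
<?php
// Sketch: each vote is weighted by exp(-(voteTime - createdAt) / $tau),
// so votes cast soon after upload count close to 1 and later votes count less.
// $tau is an assumed time scale in seconds, not something the formula above specifies.
function decayedVotes(array $voteTimes, int $createdAt, float $tau = 86400.0): float
{
    $sum = 0.0;
    foreach ($voteTimes as $t) {
        $sum += exp(-($t - $createdAt) / $tau);
    }
    return $sum;
}
```

Sorting by this sum instead of the raw vote count is one of the formulas that would need experimentation to tune.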
I've no useful ideas as far as the actual algorithm is concerned, but in terms of implementation, I'd suggest caching the result somewhere, with a periodic update - if the computation results in an expensive query, you probably don't want to slow your response times.
Something like:
(count favorited + k) / time since last activity
The higher k is, the less weight the number of people who favorited it carries.
You could also change the time to something like the time it first appeared + the time of the last activity, this would ensure that older illustrations would vanish with time.