MySQL/PHP - Return Closest Numerical Match

I want to build an order form where the user can choose a clothing size. All available sizes are stored in a database as code, size and category.
The size is formatted like 66/68/76 (waist/hips/leg length), and the user can enter these three values. If the user's size is available, there is no problem. If it is not, I want the site to offer, or switch to, the nearest available size. For example, if the user entered 65/66/74 and that exact size doesn't exist (or is unavailable right now), it would be changed to 66/65/74.

You need to define what you mean by "closest". Along the way, you should also store the three values in three separate columns; storing multiple values in a single column is a bad idea.
Sometimes, though, one is stuck with a particular data format because of someone else's poor design decisions.
One reasonable measure is Euclidean distance: the square root of the sum of the squared differences of the components. For ranking you can skip the square root, because ordering by the squared distance gives the same result. You can calculate this in MySQL:
select t.*
from (select t.*,
             substring_index(size, '/', 1) as waist,
             substring_index(substring_index(size, '/', 2), '/', -1) as hips,
             substring_index(size, '/', -1) as legs
      from t
     ) t
order by pow(waist - $waist, 2) + pow(hips - $hips, 2) + pow(legs - $legs, 2)
limit 1;
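If the sizes are already loaded into PHP, the same nearest-match idea can be sketched without SQL. This is only an illustration: the function name closestSize and the sample list are assumptions, not part of the original schema.

```php
<?php
// Hypothetical helper: pick the available size closest to the requested one.
// $available is a list of "waist/hips/leg" strings, as stored in the table.
function closestSize(array $available, int $waist, int $hips, int $legs): ?string
{
    $best = null;
    $bestDistance = PHP_INT_MAX;
    foreach ($available as $size) {
        // Split "66/68/76" into its three integer components.
        [$w, $h, $l] = array_map('intval', explode('/', $size));
        // Squared Euclidean distance; the square root is not needed for ranking.
        $distance = ($w - $waist) ** 2 + ($h - $hips) ** 2 + ($l - $legs) ** 2;
        if ($distance < $bestDistance) {
            $bestDistance = $distance;
            $best = $size;
        }
    }
    return $best; // null when no sizes are available
}

// Made-up sample data for illustration.
$sizes = ['66/68/76', '66/65/74', '70/72/78'];
echo closestSize($sizes, 65, 66, 74), PHP_EOL; // prints 66/65/74
```

With many sizes it is probably still better to let the database do the ordering; the loop above mainly shows what the ORDER BY expression computes.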

One way would be to store the three values as a single integer, e.g. 666876 for 66/68/76.
Then you can query for the minimum value of productSize - userSize that is greater than 0.
This approach always picks the next larger size, never a smaller one.

Related

MySQL Query Between Two Ranges

I need help with a query. I am taking input from a user where they enter a range between 1-100. So it could be like 30-40 or 66-99. Then I need a query to pull data from a table that has a high_range and a low_range to find a match to any number in their range.
So if a user did 30-40 and the table had entries for 1-80, 21-33, 32-40, 40-41, 66-99, and 1-29 it would find all but the last two in the table.
What is the easiest way to do this?
Thanks

If I understood correctly (i.e. you want any range that overlaps the one entered by the user), I'd say:
SELECT * FROM `table` WHERE low <= $high AND high >= $low

As I understand it, the range is stored in a single column in the format low-high. If that is the case, it is a poor design. I suggest splitting the values into two columns: low and high.
If you already have the values split, you can use a statement like:
SELECT * FROM myTable WHERE low <= $needleHigherBound AND high >= $needleLowerBound
If the values are stored in one column and you insist they stay that way, you might find MySQL's SUBSTRING_INDEX function useful. In that case, however, you will have to write a complicated query that parses the values of every row and then compares them to your search values. That seems like a lot of effort to cover up a design flaw.
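The overlap condition used in both queries can be checked in plain PHP against the sample ranges from the question. The helper name rangesOverlap is made up for this sketch.

```php
<?php
// Two closed ranges overlap when each one starts no later than the other ends.
function rangesOverlap(int $lowA, int $highA, int $lowB, int $highB): bool
{
    return $lowA <= $highB && $highA >= $lowB;
}

// Ranges from the question, as [low, high] pairs.
$rows = [[1, 80], [21, 33], [32, 40], [40, 41], [66, 99], [1, 29]];

// Keep every stored range that overlaps the user's range 30-40.
$matches = array_values(array_filter($rows, function (array $row): bool {
    return rangesOverlap($row[0], $row[1], 30, 40);
}));

foreach ($matches as [$low, $high]) {
    echo "$low-$high", PHP_EOL; // 1-80, 21-33, 32-40, 40-41
}
```

In a real application the filtering belongs in the WHERE clause, ideally with bound parameters rather than interpolated variables.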

Item rankings, order by confidence using Reddit Ranking Algorithms

I am interested in using this ranking class, based on an article by Evan Miller, to rank a table that has upvotes and downvotes. I have a system very similar to Stack Overflow's up/down voting for an events site I am working on, and I feel the results will be more accurate with this ranking class. My question is: how do I order by the function 'hotness'?
private function _hotness($upvotes = 0, $downvotes = 0, $posted = 0) {
    $s = $this->_score($upvotes, $downvotes);
    $order = log(max(abs($s), 1), 10);
    if($s > 0) {
        $sign = 1;
    } elseif($s < 0) {
        $sign = -1;
    } else {
        $sign = 0;
    }
    $seconds = $posted - 1134028003;
    return round($order + (($sign * $seconds)/45000), 7);
}
I suppose that each time a user votes I could recalculate the hotness and store it in a column of my table, then order by that column on the main page. But I would rather do this on the fly using the function above, and I am not sure whether that is possible.
Evan Miller uses:
SELECT widget_id, ((positive + 1.9208) / (positive + negative) -
1.96 * SQRT((positive * negative) / (positive + negative) + 0.9604) /
(positive + negative)) / (1 + 3.8416 / (positive + negative))
AS ci_lower_bound FROM widgets WHERE positive + negative > 0
ORDER BY ci_lower_bound DESC;
But I would rather not do this calculation in SQL, as I feel it is messy and would be difficult to change down the line if I use this code on multiple pages, etc.
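For reference, the lower bound from the query above can be written as a PHP function that mirrors the SQL term by term. The function name wilsonLowerBound is an assumption; the constants are the ones in the query.

```php
<?php
// Mirrors the ci_lower_bound expression from the SQL query above.
// Returns 0.0 when there are no votes, where the SQL simply filters the row out.
function wilsonLowerBound(int $positive, int $negative): float
{
    $n = $positive + $negative;
    if ($n === 0) {
        return 0.0;
    }
    return (($positive + 1.9208) / $n
            - 1.96 * sqrt(($positive * $negative) / $n + 0.9604) / $n)
        / (1 + 3.8416 / $n);
}

// Made-up rows for illustration, sorted best first.
$widgets = [
    ['id' => 1, 'positive' => 1,  'negative' => 0],
    ['id' => 2, 'positive' => 10, 'negative' => 0],
];
usort($widgets, function (array $a, array $b): int {
    return wilsonLowerBound($b['positive'], $b['negative'])
        <=> wilsonLowerBound($a['positive'], $a['negative']);
});
echo $widgets[0]['id'], PHP_EOL; // prints 2
```

Keeping the formula in one function makes it easy to change later, at the cost of loading the vote counts for every row you want to compare.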

Accessing the corresponding "Posts" table for anything (reading, writing, sorting, comparing, etc.) is extremely quick, so relying on the database is the "most on-the-fly" alternative you have for non-temporary data storage (memory/sessions are quicker still but, logically, cannot be used to store this information).
You should be more worried about building a good ranking algorithm that delivers the results you want (you are proposing two different systems, which deliver different results) and about making the code, and the communication between code and database, as efficient as possible.
In principle, small pieces of code that issue simple queries are the quickest and most reliable solution for this kind of situation. Example:
Ranking function (like the first one you propose or any other one built on the ranking rules you want) called every time a vote is given. It writes to the corresponding column(s) in the "Posts" table (the simpler the query, the better: you can create a ranking system as complex as you wish, but try to rely on PHP rather than on queries).
Every time a comparison between posts is required, the "Posts" table is read with a simple SELECT ordering the records by ranking (you can have various "assessing columns" (e.g., up-votes, down-votes, further considerations); but better having one with the definitive ranking).
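The two steps can be sketched in PHP with an in-memory array standing in for the "Posts" table. Everything here is illustrative: applyVote and the array layout are hypothetical, and the score is assumed to be upvotes minus downvotes because _score() is not shown in the question.

```php
<?php
// Standalone version of the asker's _hotness(); the score is ASSUMED to be
// upvotes minus downvotes, since _score() does not appear in the question.
function hotness(int $upvotes, int $downvotes, int $posted): float
{
    $s = $upvotes - $downvotes;
    $order = log10(max(abs($s), 1));
    $sign = $s <=> 0; // -1, 0 or 1
    $seconds = $posted - 1134028003;
    return round($order + ($sign * $seconds) / 45000, 7);
}

// Hypothetical write path: called once per vote. It updates the counters and
// the stored ranking, so reads never have to recompute anything.
function applyVote(array $post, bool $isUpvote): array
{
    $post[$isUpvote ? 'upvotes' : 'downvotes']++;
    $post['hotness'] = hotness($post['upvotes'], $post['downvotes'], $post['posted']);
    return $post;
}

// Stand-in for the "Posts" table.
$posts = [
    ['id' => 1, 'upvotes' => 9, 'downvotes' => 0, 'posted' => 1134028003, 'hotness' => 0.0],
    ['id' => 2, 'upvotes' => 0, 'downvotes' => 0, 'posted' => 1134028003 + 90000, 'hotness' => 0.0],
];
$posts[0] = applyVote($posts[0], true);
$posts[1] = applyVote($posts[1], true);

// Read path: the equivalent of SELECT ... ORDER BY hotness DESC.
usort($posts, function (array $a, array $b): int {
    return $b['hotness'] <=> $a['hotness'];
});
echo $posts[0]['id'], PHP_EOL; // prints 2
```

In a real application applyVote would end in an UPDATE on the post's row, and the read path would be the plain SELECT described above.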

You are right, a query like this is rather messy, and expensive as well.
Mixing PHP and MySQL on the fly is a bad idea too, as you would have to select the values for all posts, calculate the hotness of each, and only then pick the hottest ones. Extremely expensive.
You should consider saving at least part of your calculation to the database. The order should definitely go to the database. It is always better to calculate something once and save it on every save/update than to calculate it each time it is displayed. Try a benchmark to see how much time you save by calculating the order on save/update instead of every time you calculate the hotness. The good thing is that the order never changes unless someone upvotes or downvotes, which you save to the database anyway; the same goes for the sign.
Even if you save the sign to the database, you are still not able to avoid calculating on the fly, due to the posted timestamp parameter.
I would measure where it makes a difference, then calculate the hotness with a CLI script every x amount of time only for those posts where this is crucial, and every y amount of time where it matters less.
With this approach you recalculate the hotness only when necessary, which will make your application much more efficient.

I am not sure whether it is possible with your database and schema, but have you considered writing a UDF (user-defined function) for custom sorting?
A post on Stack Overflow talks about how to do this.

PHP Calculation - getting relevance score

I have an application that stores data in database. I need search functionality to work on this database.
For this to work I need a "relevance" score, a score that is calculated based on a set of criteria to output as a value that can be then used to order a set of data.
Say for instance the user enters three keywords: X, Y and Z - I need to generate a score based on a database entry. I wish the criteria to be related to how many times each appears.
Example:
Database Entry A - X appears 8 times, Y appears once and Z appears once, giving a collective score of 10.
Database Entry B - X appears 24 times, Y does not appear and Z does not appear, giving a collective score of 24.
Here's my problem. Database Entry A IS more relevant to a search for X, Y and Z because it contains all three keywords, not just one, yet a standard calculation would class Database Entry B as more relevant.
I need a way to give each result a numeric score based not just on how many times each keyword appears, but also on how many different keywords appear, weighted exponentially (i.e. entering 10 keywords would rank results where all 10 appear above results with a large number of occurrences of just one).
I need to achieve this with PHP which will be retrieving my database results and feeding them back to my website page.

You could compute two relevance scores: one that rates how many fields provided a match, and then your regular "how many matches were found". From your examples, that would provide:
Example A - field_count: 3, match_count: 10
Example B - field_count: 1, match_count: 24
and then have your query do
ORDER BY field_count DESC, match_count DESC
so that matches with more fields get sorted first.
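Since the question asks for a PHP solution, the same two-key ordering can be sketched with usort. The helpers relevanceKeys and compareRelevance are hypothetical, and each entry is assumed to come with one occurrence count per keyword.

```php
<?php
// Derive both sort keys from a list of per-keyword occurrence counts.
function relevanceKeys(array $keywordCounts): array
{
    return [
        'field_count' => count(array_filter($keywordCounts)), // keywords found at least once
        'match_count' => array_sum($keywordCounts),           // total occurrences
    ];
}

// Sort by field_count first, then by match_count, both descending.
function compareRelevance(array $a, array $b): int
{
    $byFields = $b['field_count'] <=> $a['field_count'];
    return $byFields !== 0 ? $byFields : ($b['match_count'] <=> $a['match_count']);
}

// The two entries from the question: counts for X, Y and Z.
$results = [
    ['entry' => 'B'] + relevanceKeys([24, 0, 0]),
    ['entry' => 'A'] + relevanceKeys([8, 1, 1]),
];
usort($results, 'compareRelevance');
echo $results[0]['entry'], PHP_EOL; // prints A
```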

Since the (first) presence of a keyword is so important, give it a better score than the rest of the occurrences. For example:
$score = 0;
foreach ($keywords as $count) {
    $score += $count==0 ? 0 : 1000000;
    $score += $count;
}
If you apply this algorithm to your example, you will have:
Entry1 ---> (1000000 + 8) + (1000000 + 1) + (1000000 + 1) = 3000010
Entry2 ---> (1000000 + 24) = 1000024
So Entry1 scores better than Entry2 as you wanted.

php - how do I display 5 results from possible 50 randomly but ensure all results are displayed equal amount

In PHP, how do I display 5 results out of a possible 50 at random, while ensuring that all results are displayed an equal number of times?
For example table has 50 entries.
I wish to show 5 of these randomly with every page load but also need to ensure all results are displayed rotationally an equal number of times.
I've spent hours googling for this but can't work it out - would very much like your help please.

Please scroll down to "biased randomness" if you don't want to read everything.
In MySQL you can just use SELECT * FROM `table` ORDER BY RAND() LIMIT 5.
What you want does not work as stated; it is logically contradictory.
You have to understand that complete randomness by definition means an equal distribution only after an infinite period of time.
The longer the selection interval, the more even the distribution.
If you MUST have an even distribution of selections within a fixed interval, for example every 24 hours, you cannot use a purely random algorithm. The two requirements contradict each other.
It really depends on what your goal is.
You could, for example, pick some element at random and then lower the probability of the same element being chosen again on the next run. This gives you a heuristic that produces a more even distribution after a shorter amount of time. But it's not random. Well, certain parts are.
You could also randomly select from your database, mark the elements as selected, and from then on select only from those not yet selected. When no element is left, reset all.
Very trivial, but it might do the job.
You can also do something like that with timestamps to make the distribution a bit more elegant.
This could look something like ORDER BY RAND()*((timestamps-min(timestamps))/(max(timestamps)-min(timestamps))) DESC. Basically, you normalize the timestamp of an entry's last selection over the time window so that it falls between 0 and 1, and then multiply it by rand. Then you have 50% fresh stuff less likely selected and 50% randomness. I am not sure about the formula above; I just typed it down and it is probably wrong, but the principle works.
I think what you want is generally referred to as "biased randomness". There are a lot of papers on that and some questions on SO, for example:
Biased random in SQL?

Copy the 50 results to some temporary place (file, database, whatever you use). Then, every time you need random values, select 5 random values from the 50 and delete them from your temporary data set.
Once your temporary data set is empty, create a new one by copying the original again.
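A minimal sketch of this draw-and-delete idea, using a PHP array as the temporary data set. The function name drawFromBag is made up, and in a real application $bag would have to be persisted between page loads (for example in a table or a cache), which is not shown here.

```php
<?php
// Hypothetical helper: draw $n items from a shuffled "bag" and refill it from
// the full list once it is empty, so every item is shown once per cycle.
// If $n does not divide the number of items evenly, a refill can happen in the
// middle of a draw and one page may then contain a repeated item.
function drawFromBag(array &$bag, array $allItems, int $n): array
{
    $picked = [];
    if ($allItems === []) {
        return $picked; // nothing to draw from
    }
    while (count($picked) < $n) {
        if ($bag === []) {
            $bag = $allItems;
            shuffle($bag);
        }
        $picked[] = array_pop($bag);
    }
    return $picked;
}

// Ten page loads of 5 items each cover all 50 entries exactly once.
$all = range(1, 50);
$bag = [];
$seen = [];
for ($page = 0; $page < 10; $page++) {
    $seen = array_merge($seen, drawFromBag($bag, $all, 5));
}
sort($seen);
var_dump($seen === $all); // bool(true)
```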

Popularity Algorithm

I'd like to populate the homepage of my user-submitted-illustrations site with the "hottest" illustrations uploaded.
Here are the measures I have available:
How many people have favourited that illustration (votes table includes date voted)
When the illustration was uploaded (illustration table has date created)
Number of comments (not so good as max comments total about 10 at the moment; comments table has comment date)
I have searched around, but most algorithms include user authority, and I don't want that to play a part.
I also need to find out if it's better to do the calculation in the MySQL that fetches the data or if there should be a PHP/cron method every hour or so.
I only need 20 illustrations to populate the home page. I don't need any sort of paging for this data.
How do I weight age against votes? Surely a site with fewer submissions needs less weight on the date added?

Many sites that use some type of popularity ranking do so by using a standard algorithm to determine a score and then letting it decay indefinitely over time. What I've found works better for sites with less traffic is a multiplier that gives a bonus to new content/activity. It's essentially the same, but the score stops changing after a period of time of your choosing.
For instance, here's a pseudo-example of something you might want to try. Of course, you'll want to adjust how much weight you're attributing to each category based on your own experience with your site. Comments are rare, but take more effort from the user than a favorite/vote, so they probably should receive more weight.
score = (votes / 10) + comments
age = UNIX_TIMESTAMP() - UNIX_TIMESTAMP(date_created)
if(age < 86400) score = score * 1.5
This type of approach would give a bonus to new content uploaded in the past day. If you wanted to approach this in a similar way only for content that had been favorited or commented on recently, you could just add some WHERE constraints on your query that grabs the score out from the DB.
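The pseudo-example above translates to PHP roughly as follows. The function name and parameters are assumptions; the weights and the one-day bonus window are the ones from the pseudo-code.

```php
<?php
// Score = votes / 10 + comments, with a 1.5x bonus for content younger than
// one day. $createdAt and $now are Unix timestamps.
function popularityScore(int $votes, int $comments, int $createdAt, int $now): float
{
    $score = $votes / 10 + $comments;
    $age = $now - $createdAt;
    if ($age < 86400) {
        $score *= 1.5; // freshness bonus
    }
    return (float) $score;
}

$now = time();
echo popularityScore(20, 3, $now - 3600, $now), PHP_EOL;       // 7.5 (one hour old)
echo popularityScore(20, 3, $now - 7 * 86400, $now), PHP_EOL;  // 5 (one week old)
```

Passing $now in as a parameter instead of calling time() inside the function keeps the score reproducible, which helps if a cron job stores it in a column.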
There are actually two big reasons NOT to calculate this ranking on the fly.
Requiring your DB to fetch all of that data and do a calculation on every page load just to reorder items results in an expensive query.
Probably a smaller gotcha, but if you have a relatively small amount of activity on the site, small changes in the ranking can cause content to move pretty drastically.
That leaves you with either caching the results periodically or setting up a cron job to update a new database column holding this score you're ranking by.

Obviously there is some subjectivity in this - there's no one "correct" algorithm for determining the proper balance - but I'd start out with something like votes per unit age. MySQL can do basic math so you can ask it to sort by the quotient of votes over time; however, for performance reasons, it might be a good idea to cache the result of the query. Maybe something like
SELECT images.url FROM images ORDER BY (SELECT COUNT(*) FROM votes WHERE votes.image_id = images.id) / GREATEST(TIMESTAMPDIFF(SECOND, images.date, NOW()), 1) DESC LIMIT 20
but my SQL is rusty ;-)
Taking a simple average will, of course, bias in favor of new images showing up on the front page. If you want to remove that bias, you could, say, count only those votes that occurred within a certain time limit after the image being posted. For images that are more recent than that time limit, you'd have to normalize by multiplying the number of votes by the time limit then dividing by the age of the image. Or alternatively, you could give the votes a continuously varying weight, something like exp(-time(vote) + time(image)). And so on and so on... depending on how particular you are about what this algorithm will do, it could take some experimentation to figure out what formula gives the best results.
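The continuously varying weight can be sketched like this. The $timescale parameter is an assumption that is not in the formula above: without some scale, raw second differences would make every weight vanish almost immediately.

```php
<?php
// Sum of votes weighted by exp(-(voteTime - imageTime) / $timescale).
// A vote cast right at upload counts as 1; one cast a full $timescale later
// counts as exp(-1), roughly 0.37. $timescale (seconds) is a tuning ASSUMPTION.
function decayedVoteScore(array $voteTimes, int $imageTime, float $timescale = 86400.0): float
{
    $score = 0.0;
    foreach ($voteTimes as $voteTime) {
        $score += exp(-($voteTime - $imageTime) / $timescale);
    }
    return $score;
}

// Made-up timestamps: one vote at upload, one a day later.
$uploaded = 1700000000;
echo decayedVoteScore([$uploaded, $uploaded + 86400], $uploaded), PHP_EOL;
```

Which timescale gives sensible rankings depends on how quickly votes arrive on the site, so expect to tune it.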

I've no useful ideas as far as the actual algorithm is concerned, but in terms of implementation, I'd suggest caching the result somewhere, with a periodic update. If the computation results in an expensive query, you probably don't want it to slow your response times.

Something like:
(count favorited + k) / time since last activity
The higher k is, the less weight the number of people who favorited it has.
You could also change the time to something like the time since it first appeared + the time since the last activity; this would ensure that older illustrations vanish with time.
