I'm trying to display a list of items in groups of 4. If I go from 1-10 in the for loop, it works great and I get the following output:
1 2 3 4
5 6 7 8
9 10
I'm using this code: http://viper-7.com/6soAKr
I actually need to display them in reverse order from 10-1 in the same format
When I try the code in reverse order:
for ($sucid = 10; $sucid > 0; $sucid = $sucid - 1)
I get:
10 9 8
7 6 5 4
3 2 1
And the HTML layout is out of place compared to the output above.
What I need is:
10 9 8 7
6 5 4 3
2 1
I know it's the modulus part that is wrong, but I am having trouble understanding how to change it when I go backwards
You could keep the first for-loop (i.e. the one looping from 1 to 10) and print 11 - $sucid instead of $sucid.
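For example, a minimal sketch of that idea, assuming the linked snippet prints the counter and breaks the row with a modulus check after every 4th item (the original code is only available behind the viper-7 link):

<?php
// Count upwards exactly as before, but print the mirrored value 11 - $sucid.
// The output runs 10 9 8 7 / 6 5 4 3 / 2 1 while the row-break logic stays
// identical to the ascending version.
for ($sucid = 1; $sucid <= 10; $sucid++) {
    echo (11 - $sucid) . ' ';
    if ($sucid % 4 == 0) {
        echo '<br />'; // start a new row after every 4th item
    }
}
?>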
For the MySQL BETWEEN operator, is it necessary for the two bounds to be in numerical order?
like:
BETWEEN -10 AND 10
BETWEEN 10 AND -10
Will both of these work or just the first one?
Also, can I do:
WHERE thing<10 AND thing>-10
Will that work or do I have to use between?
Lastly, can I do:
WHERE -10<thing<10
?
BETWEEN -10 AND 10
This will match any value from -10 to 10, bounds included.
BETWEEN 10 AND -10
This will never match anything.
WHERE thing<10 AND thing>-10
This will match any value from -10 to 10, bounds excluded.
Also, if thing is a non-deterministic expression, it is evaluated once in the case of BETWEEN and twice in the case of the double inequality:
SELECT COUNT(*)
FROM million_records
WHERE RAND() BETWEEN 0.6 AND 0.8;
will return a value around 200,000, since each row's single RAND() value falls between 0.6 and 0.8 with probability 0.2;
SELECT COUNT(*)
FROM million_records
WHERE RAND() >= 0.6 AND RAND() <= 0.8;
will return a value around 320,000, because the two RAND() calls are evaluated independently: the first is >= 0.6 with probability 0.4 and the second is <= 0.8 with probability 0.8, giving 0.4 * 0.8 = 0.32.
The min value must come before the max value. Also note that the end points are included, so BETWEEN is equivalent to:
WHERE thing>=-10 AND thing<=10
Please keep it to one question per post. Anyway:
http://dev.mysql.com/doc/refman/5.0/en/comparison-operators.html#operator_between
BETWEEN min AND max, in that order.
from the link:
This is equivalent to the expression (min <= expr AND expr <= max) if
all the arguments are of the same type
The second alternative will also work, of course.
First question:
Will both of these work or just the first one?
Yes, both are valid syntax, though as noted above BETWEEN 10 AND -10 simply returns an empty result.
Second question:
Will that work or do I have to use between?
That form is also valid; you don't have to use BETWEEN.
Yes, your BETWEEN bounds must be in order to return the expected result.
Let's say you have a table with a column called myNumber that contains 10 rows:
MyNumber
--------
1
2
3
4
5
6
7
8
9
10
So
select * from thistable t where t.myNumber BETWEEN 1 and 5
will return
1
2
3
4
5
but
select * from thistable t where t.myNumber BETWEEN 5 and 1
returns nothing.
Your 2nd question: yes, it is the same thing, but beware that in your example you will have to use <= and >= to match BETWEEN. If not, in our example, you would get
2
3
4
Hope it helps.
I've already seen such things work with integers:
WHERE -10 < thing < 10
But it's better to avoid it. One reason is that it doesn't seem to work well with other types. And MySQL doesn't issue any warning.
I've tried it with datetime columns, and the result was wrong.
My query looked like this:
SELECT *
FROM FACT__MODULATION_CONSTRAINTS constraints
WHERE constraints.START_VALIDITY<= now() < constraints.END_VALIDITY
The result was not as expected: I got twice as many rows as the same query written with two separate inequalities (which returned correct results). Only the first part of the expression was applied, most likely because MySQL evaluates the chained comparison left to right: (START_VALIDITY <= now()) yields 0 or 1, and that 0 or 1 is then compared to END_VALIDITY instead of now(), so the second bound effectively never filters anything.
I'm working with some ocean tide data that's structured like this:
$data = array('date' => array('time' => array('predicted','observed')));
Here's a sample of real data that I'm using: http://pastebin.com/raw.php?i=bRc2rmpG
And this is my attempt at finding the high/low values: http://pastebin.com/8PS1frc0
Current issues with my code:
When the readings fluctuate (as seen in the 11/14/2010=>11:30:00 to 11/14/2010=>11:54:00 span in the sample data), it creates a "wobble" in the direction logic. This creates an erroneous Peak and Trough. How can I avoid/correct this?
Note: My method is very "ad hoc". I assumed I wouldn't need any awesome math stuff since I'm not trying to find any averages, approximations, or future estimations. I'd really appreciate a code example of a better method, even if it means throwing away the code I've written so far.
I've had to perform similar tasks on noisy physiological data. In my opinion, you have a signal conditioning problem. Here is a process that worked for me.
Convert your time values to seconds, i.e. (HH*3600)+(MM*60)+(SS), to generate a numeric "X" value.
Smooth the resulting X and Y arrays with a sliding window, say 10 points in width. You might also consider filtering data with redundant and/or bogus timestamps in this step.
Perform an initial phase detection by comparing the smoothed Y[1] and Y[0]. Similar to the post above, if (Y[1] > Y[0]), you may assume the data are climbing to a peak. If (Y[1] < Y[0]), you may assume the data are descending to a trough.
Once you know the initial phase, peak and trough detection may be performed as described above: if Y[i] > Y[i+1] and Y[i] < Y[i-1], you have encountered a peak.
You can estimate the peak/trough time by projecting the smoothed X value back onto the original X data, taking the sliding window size into account (in order to compensate for the "signal lag" induced by the sliding window). The resulting time value (in seconds) can then be converted back to an HH:MM:SS format for reporting.
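A rough PHP sketch of steps 2 to 4 above, assuming the readings have already been flattened into a plain array in time order (the window width and function names are mine, not from the original post):

<?php
// Simple moving-average smoother: each output point is the mean of $width
// consecutive input points.
function smooth(array $values, $width = 10) {
    $out = array();
    for ($i = 0; $i + $width <= count($values); $i++) {
        $out[] = array_sum(array_slice($values, $i, $width)) / $width;
    }
    return $out;
}

// Scan the smoothed series for local maxima (peaks) and minima (troughs).
function findExtrema(array $y) {
    $peaks = array();
    $troughs = array();
    for ($i = 1; $i < count($y) - 1; $i++) {
        if ($y[$i] > $y[$i - 1] && $y[$i] > $y[$i + 1]) {
            $peaks[] = $i;       // index into the smoothed series
        } elseif ($y[$i] < $y[$i - 1] && $y[$i] < $y[$i + 1]) {
            $troughs[] = $i;
        }
    }
    return array($peaks, $troughs);
}

// Usage: $observed is a flat array of observed heights in time order.
$observed = array(/* ... flattened tide readings ... */);
list($peaks, $troughs) = findExtrema(smooth($observed));
// When mapping the indices back to the original timestamps, add roughly half
// the window width to compensate for the lag mentioned in step 5.
?>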
You're looking for local minima and maxima, I presume? That's really easy to do:
<?php
$data = array(1, 9, 4, 5, 6, 9, 9, 1);

function minima($data, $radius = 2)
{
    $minima = array();
    for ($i = 0; $i < count($data); $i += $radius)
    {
        $minima[] = min(array_slice($data, $i, $radius));
    }
    return $minima;
}

function maxima($data, $radius = 2)
{
    $maxima = array();
    for ($i = 0; $i < count($data); $i += $radius)
    {
        $maxima[] = max(array_slice($data, $i, $radius));
    }
    return $maxima;
}

print_r(minima($data));
print_r(maxima($data));
?>
You just have to specify a radius of search, and it will give you back an array of local minima and maxima of the data. It works in a simple way: it cuts the array into segments of length $radius and finds the minimum of that segment. This process is repeated for the whole set of data.
Be careful with the radius: usually, you want the radius to be the average distance from peak to trough of the data, but you will have to find that manually. It defaults to 2, which only searches for minima/maxima within a radius of 2 and will probably give false positives with your data set. Select the radius wisely.
You'll have to hack it into your script, but that shouldn't be too hard at all.
I haven't read it in detail, but your approach seems very ad hoc. A more correct way would probably be to fit the data to a function
f(A, B, w, p; t) = A*sin(w*t + p) + B
using a method such as non-linear least squares (which unfortunately has to be solved using an iterative method). Looking at your sample data, it seems like it would be a good fit. When you have calculated w and p, it's easy to locate the peaks and valleys by just taking the time derivative of the function and solving for zero:
t = (pi*(1 + 2n) - 2p) / (2w)
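For illustration, once w and p have been fitted, the high and low tide times fall straight out of that formula; the parameter values below are made-up placeholders, not values fitted from the sample data:

<?php
// Stationary points of f(t) = A*sin(w*t + p) + B occur where cos(w*t + p) = 0,
// i.e. t = (pi*(1 + 2n) - 2p) / (2w). For A > 0, even n gives a peak and odd n
// gives a trough.
$w = 2 * M_PI / (12.42 * 3600); // made-up angular frequency (~12.42 h tidal period), rad/s
$p = 0.3;                       // made-up phase, rad

$daySeconds = 24 * 3600;
for ($n = 0; ; $n++) {
    $t = (M_PI * (1 + 2 * $n) - 2 * $p) / (2 * $w);
    if ($t > $daySeconds) {
        break;                  // past the end of the day
    }
    if ($t < 0) {
        continue;
    }
    $kind = ($n % 2 == 0) ? 'peak' : 'trough';
    echo $kind . ' at ' . gmdate('H:i:s', (int) $t) . "\n";
}
?>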
But I suppose that if your code really does what you want, there's no point complicating things. Stop second-guessing yourself. :)
A problem, I think, is that the observations are real measurements and can contain small errors. That at least needs to be accounted for. For example:
Only change direction if at least the next 2 entries are also in the same direction.
Don't let decisions be made on too small a difference; throw away insignificant changes. It will probably work a lot better if you define something like $error = 0.10; and change your conditions to if ($previous - $error > $current), etcetera, as in the sketch below.
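A minimal sketch of that tolerance check; the $error value and variable names are illustrative, not taken from the pastebin code:

<?php
// Ignore changes smaller than $error so sensor wobble does not flip the
// detected direction back and forth.
$error = 0.10;     // tolerance, in the same units as the readings
$direction = null; // 'up', 'down', or null while still undecided

function nextDirection($previous, $current, $direction, $error) {
    if ($current > $previous + $error) {
        return 'up';            // a significant rise
    }
    if ($current < $previous - $error) {
        return 'down';          // a significant drop
    }
    return $direction;          // change too small: keep the previous direction
}

// Example: the small dip from 1.05 to 1.03 does not flip the direction.
$readings = array(1.00, 1.05, 1.03, 1.20, 1.40);
$previous = $readings[0];
foreach (array_slice($readings, 1) as $current) {
    $direction = nextDirection($previous, $current, $direction, $error);
    $previous = $current;
}
echo $direction; // "up"
?>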
How accurate does the peak/valley detection have to be? If you just need to find the exact record where a peak or valley occurs, isn't it enough to check for turning points?
E.g., considering a record at position i: if record[i-1] and record[i+1] are both higher than record[i], you've got a valley, and if record[i-1] and record[i+1] are both lower than record[i], you've got a peak. As long as your sampling rate is faster than the tide changes (look up Nyquist frequency), that process should get you your data's peaks/troughs.
If you need to generate a graph from this and try to extrapolate more accurate time points for the peaks/troughs, then you're in for more work.
One way may be to define an absolute or relative deviation past which you classify further peaks/troughs as new ones rather than fluctuations around an existing peak/trough.
Currently, $direction determines whether you are finding a peak or a trough, so instead of transitioning to the other state (finding the trough or peak) as soon as the derivative changes sign, you can consider changing state only when the deviation from the current peak/trough is "large" enough.
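One possible sketch of that state machine; the threshold value and function name are mine, not from the original script:

<?php
// While looking for a peak we track the highest value seen so far and only
// commit it (switching to looking for a trough) once the series has dropped
// at least $threshold below that candidate, and vice versa for troughs.
function findTurningPoints(array $values, $threshold = 0.15) {
    if (count($values) === 0) {
        return array(array(), array());
    }
    $peaks = array();
    $troughs = array();
    $findingPeak = true;        // assume the series starts out rising
    $candidate = $values[0];
    $candidateIndex = 0;

    foreach ($values as $i => $v) {
        if ($findingPeak) {
            if ($v > $candidate) {                     // new highest point so far
                $candidate = $v;
                $candidateIndex = $i;
            } elseif ($candidate - $v >= $threshold) { // dropped far enough: commit the peak
                $peaks[$candidateIndex] = $candidate;
                $findingPeak = false;
                $candidate = $v;
                $candidateIndex = $i;
            }
        } else {
            if ($v < $candidate) {                     // new lowest point so far
                $candidate = $v;
                $candidateIndex = $i;
            } elseif ($v - $candidate >= $threshold) { // rose far enough: commit the trough
                $troughs[$candidateIndex] = $candidate;
                $findingPeak = true;
                $candidate = $v;
                $candidateIndex = $i;
            }
        }
    }
    return array($peaks, $troughs);
}
?>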
Given that you should never see two maxima or two minima less than about 12 hours apart, a simple solution would be to use a sliding window of 3-5 hours or so and find the max and min. If the extreme ends up being in the first or last 30 minutes of the window, ignore it.
As an example, given the following data:
1 2 3 4 5 6 5 6 7 8 7 6 5 4 3 2 1 2
and a window of size 8, with the first and last 2 positions ignored and only looking at peaks, you would see:
1 2 | 3 4 5 6 | 5 6, max = 6, ignore = Y
2 3 | 4 5 6 5 | 6 7, max = 7, ignore = Y
3 4 | 5 6 5 6 | 7 8, max = 8, ignore = Y
4 5 | 6 5 6 7 | 8 7, max = 8, ignore = Y
5 6 | 5 6 7 8 | 7 6, max = 8, ignore = N
6 5 | 6 7 8 7 | 6 5, max = 8, ignore = N
5 6 | 7 8 7 6 | 5 4, max = 8, ignore = N
6 7 | 8 7 6 5 | 4 3, max = 8, ignore = N
7 8 | 7 6 5 4 | 3 2, max = 8, ignore = Y
8 7 | 6 5 4 3 | 2 1, max = 8, ignore = Y
7 6 | 5 4 3 2 | 1 2, max = 7, ignore = Y
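A literal PHP translation of that table, using the same window of 8 with 2 positions ignored at each edge; troughs would work the same way with min() instead of max(), and the function name is made up:

<?php
// Slide a window across the series and accept the window's maximum as a peak
// only when it does not also occur in the ignored edge positions.
function findPeaks(array $values, $window = 8, $edge = 2) {
    $peaks = array();
    for ($start = 0; $start + $window <= count($values); $start++) {
        $slice = array_slice($values, $start, $window);
        $max   = max($slice);
        $edges = array_merge(
            array_slice($slice, 0, $edge),        // leading edge
            array_slice($slice, $window - $edge)  // trailing edge
        );
        if (!in_array($max, $edges)) {            // the "ignore = N" rows in the table above
            $pos = array_search($max, $slice);
            $peaks[$start + $pos] = $max;         // keyed by position to collapse duplicates
        }
    }
    return $peaks;
}

$data = array(1, 2, 3, 4, 5, 6, 5, 6, 7, 8, 7, 6, 5, 4, 3, 2, 1, 2);
print_r(findPeaks($data)); // the single real peak: value 8 at index 9
?>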
OK, let's suppose we have a members table. There is a field called, let's say, about_member. There will be a string like 1-1-2-1-2 for everybody. Let's suppose member_1 has the string 1-1-2-2-1 and he searches for whoever has the most similar string. For example, if member_2 has the string 1-1-2-2-1 it will be a 100% match, but if member_3 has a string like 2-1-1-2-1 it will be a 60% match. And it has to be ordered by match percentage. What is the most efficient way to do this with MySQL and PHP? It's really hard to explain what I mean, but maybe you get it; if not, ask me. Thanks.
Edit: Please give me ideas that don't use the Levenshtein method. Such an answer will get the bounty. Thanks. (The bounty will be announced when I am able to add one.)
Convert your number sequences to bit masks and use BIT_COUNT(column ^ search) as the similarity function, ranging from 0 (= 100% match, strings are equal) to the bit length (= 0% match, strings are completely different). To convert this similarity value to a percentage, use
100 * (bit_length - similarity) / bit_length
For example, "1-1-2-2-1" becomes "00110" (assuming you have only two states), 2-1-1-2-1 is "10010", bit_count(00110 ^ 10010) = 2, bit-length = 5, and 100 * (5 - 2) / 5 = 60%.
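A possible PHP helper for that conversion, assuming every answer is either 1 or 2 and mapping 1 to a 0 bit and 2 to a 1 bit, as in the example; the resulting integer is what you would store in the about_member column:

<?php
// Turn an answer string like "1-1-2-2-1" into the integer whose bits encode
// the answers: 1 => 0, 2 => 1, so "1-1-2-2-1" => "00110" => 6.
function answersToInt($answers) {
    $bits = '';
    foreach (explode('-', $answers) as $answer) {
        $bits .= ($answer == '2') ? '1' : '0';
    }
    return bindec($bits);
}

echo answersToInt('1-1-2-2-1'); // 6
echo "\n";
echo answersToInt('2-1-1-2-1'); // 18
?>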
Jawa posted this idea originally; here is my attempt.
^ is the XOR function. It compares 2 binary numbers bit-by-bit and returns 0 if both bits are the same, and 1 otherwise.
0 1 0 0 0 1 0 1 0 1 1 1 (number 1)
^ 0 1 1 1 0 1 0 1 1 0 1 1 (number 2)
= 0 0 1 1 0 0 0 0 1 1 0 0 (result)
How this applies to your problem:
// In binary...
1111 ^ 0111 = 1000 // (1 bit out of 4 didn't match: 75% match)
1111 ^ 0000 = 1111 // (4 bits out of 4 didn't match: 0% match)
// The same examples, except now in decimal...
15 ^ 7 = 8 (1000 in binary) // (1 bit out of 4 didn't match: 75% match)
15 ^ 0 = 15 (1111 in binary) // (4 bits out of 4 didn't match: 0% match)
How we can count these bits in MySQL:
BIT_COUNT(b'0111') = 3 // Bit count of binary '0111'
BIT_COUNT(7) = 3 // Bit count of decimal 7 (= 0111 in binary)
BIT_COUNT(b'1111' ^ b'0111') = 1 // (1 bit out of 4 didn't match: 75% match)
So to get the similarity...
// First we focus on calculating mismatch.
(BIT_COUNT(b'1111' ^ b'0111') / YOUR_TOTAL_BITS) = 0.25 (25% mismatch)
(BIT_COUNT(b'1111' ^ b'1111') / YOUR_TOTAL_BITS) = 0 (0% mismatch; 100% match)
// Now, getting the proportion of matched bits is easy
1 - (BIT_COUNT(b'1111' ^ b'0111') / YOUR_TOTAL_BITS) = 0.75 (75% match)
1 - (BIT_COUNT(b'1111' ^ b'1111') / YOUR_TOTAL_BITS) = 1.00 (100% match)
If we could just make your about_member field store data as bits (and be represented by an integer), we could do all of this easily! Instead of 1-2-1-1-1, use 0-1-0-0-0, but without the dashes.
Here's how PHP can help us:
bindec('01000') == 8;
bindec('00001') == 1;
decbin(8) == '1000'; // note: decbin() does not zero-pad
decbin(1) == '1';    // use str_pad($bits, 5, '0', STR_PAD_LEFT) if you need a fixed width
And finally, here's the implementation:
// Setting a member's about_member property...
$about_member = '01100101';
$about_member_int = bindec($about_member);
$query = "INSERT INTO members (name, about_member) VALUES ('$name', $about_member_int)";
// Getting matches...
$total_bits = 8; // The number of bits in the about_member field (8 in this example)
$my_member_about = '00101100';
$my_member_about_int = bindec($my_member_about);
$query = "
    SELECT
        *,
        (1 - (BIT_COUNT(about_member ^ $my_member_about_int) / $total_bits)) AS `match`
    FROM members
    ORDER BY `match` DESC
    LIMIT 10";
This last query will have selected the 10 members most similar to me!
Now, to recap, in layman's terms,
We use binary because it makes things easier; the binary number is like a long line of light switches. We want to save our "light switch configuration" as well as find members that have the most similar configurations.
The ^ operator, given 2 light switch configurations, does a comparison for us. The result is again a series of switches; a switch will be ON if the 2 original switches were in different positions, and OFF if they were in the same position.
BIT_COUNT tells us how many switches are ON, giving us a count of how many switches were different. YOUR_TOTAL_BITS is the total number of switches.
But binary numbers are still just numbers... and so a string of 1's and 0's really just represents a number like 133 or 94. But it's a lot harder to visualize our "light switch configuration" if we use decimal numbers. That's where PHP's decbin and bindec come in.
Learn more about the binary numeral system.
Hope this helps!
The obvious solution is to look at the Levenshtein distance (there isn't an implementation built into MySQL, but there are other implementations accessible, e.g. this one in PL/SQL, and some extensions). However, as usual, the right way to solve the problem would have been to normalise the data properly in the first place.
One way to do this is to calculate the Levenshtein distance between your search string and the about_member fields for each member. Here's an implementation of the function as a MySQL stored function.
With that you can do:
SELECT name, LEVENSHTEIN(about_member, '1-1-2-1-2') AS diff
FROM members
ORDER BY diff ASC
The % of similarity is related to diff; if diff=0 then it's 100%, if diff is the size of the string (minus the amount of dashes), it's 0%.
Having read the clarification comments on the original question, the Levenshtein distance is not the answer you are looking for.
You are not trying to compute the smallest number of edits to change one string into another.
You are trying to compare one set of numbers with another set of numbers. What you are looking for is the minimum (weighted) sum of the differences between the two sets of numbers.
Place each answer in a separate column (Ans1, Ans2, Ans3, Ans4, .... )
Assume you are searching for similarities to 1-2-1-2.
SELECT UserName, Abs( Ans1 - 1 ) + Abs( Ans2 - 2 ) + Abs( Ans3 - 1 ) + Abs( Ans4 - 2 ) AS Difference FROM members ORDER BY Difference ASC
Will list users by similarity to answers 1-2-1-2, assuming all questions are weighted evenly.
If you want to make certain answers more important, just multiply each of the terms by a weighting factor.
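If the answers arrive on the PHP side as a dash string like 1-2-1-2, the query above can be generated from it; a sketch, where the members table and the Ans1..AnsN / UserName columns follow this answer and are assumptions about your schema:

<?php
// Build the per-column difference query from a search pattern like "1-2-1-2".
$pattern = '1-2-1-2';
$answers = explode('-', $pattern);

$terms = array();
foreach ($answers as $i => $answer) {
    $column  = 'Ans' . ($i + 1);
    $terms[] = sprintf('ABS(%s - %d)', $column, (int) $answer);
}

$sql = 'SELECT UserName, ' . implode(' + ', $terms) . ' AS Difference'
     . ' FROM members ORDER BY Difference ASC';

echo $sql; // prints the same query shown above, with one ABS() term per answer
?>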
If the questions will always be yes/no and the number of answers is small enough that all the answers can be fitted into a single integer and all answers are equally weighted, then you could encode all the answers in a single column and use BIT_COUNT as suggested. This would be a faster and more space-efficient implementation.
I would go with the similar_text() PHP built-in. It seems to be exactly what you want:
$percent = 0;
similar_text($string1, $string2, $percent);
echo $percent;
It works as the question expects.
I would go with the Levenshtein distance approach, you can use it within MySQL or PHP.
If you don't have too many fields, you could create an index on the integer representation of about_member. Then you can find the 100% by an exact match on the about_member field, followed by the 80% matches by changing 1 bit, the 60% matches by changing 2 bits, and so on.
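A sketch of how those candidate values could be generated, assuming the 5-answer bitmask encoding and the members / about_member schema used in the other answers; the helper name is mine:

<?php
// All integers that differ from $value in exactly $k of the first $bits bit
// positions, i.e. everything at Hamming distance $k.
function flipKBits($value, $bits, $k, $from = 0) {
    if ($k == 0) {
        return array($value);
    }
    $results = array();
    for ($i = $from; $i <= $bits - $k; $i++) {
        $flipped = $value ^ (1 << $i); // flip bit $i
        $results = array_merge($results, flipKBits($flipped, $bits, $k - 1, $i + 1));
    }
    return $results;
}

$bits = 5; // number of answers encoded in about_member
$mine = 6; // my own bitmask, e.g. "1-1-2-2-1" => 00110 => 6

// Exact matches (100%), then 1 bit changed (80%), then 2 bits changed (60%).
for ($k = 0; $k <= 2; $k++) {
    $candidates = flipKBits($mine, $bits, $k);
    $percent    = 100 * ($bits - $k) / $bits;
    $sql = sprintf('SELECT * FROM members WHERE about_member IN (%s)',
                   implode(',', $candidates));
    echo "-- {$percent}% matches\n{$sql}\n";
}
?>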
If you represent your answer patterns as bit sequences you can use the formula (100 * (bit_length - similarity) / bit_length).
Following the mentioned example, when we convert "1"s to bit off and "2"s to bit on, "1-1-2-2-1" becomes 6 (in base 10; 00110 in binary) and "2-1-1-2-1" becomes 18 (10010 in binary), etc.
Also, I think you should store the answers starting from the least significant bits, but it doesn't matter as long as you are consistent, so that the answers of different members line up.
Here's a sample script to be run against MySQL.
DROP TABLE IF EXISTS `members`;
CREATE TABLE `members` (
`id` VARCHAR(16) NOT NULL ,
`about_member` INT NOT NULL
) ENGINE = InnoDB;
INSERT INTO `members`
(`id`, `about_member`)
VALUES
('member_1', '6'),
('member_2', '18');
SELECT 100 * ( 5 - BIT_COUNT( about_member ^ (
SELECT about_member
FROM members
WHERE id = 'member_1' ) ) ) / 5
FROM members;
The magic 5 in the script is the number of answers (bit_length in the formula above). You should change it to match your situation; it is the number of answers rather than the bit width of the column's data type, since BIT_COUNT has no way of knowing how many of the bits you are actually using.
BIT_COUNT returns the number of bits set and is explained in MySQL manual. ^ is the binary XOR operator in MySQL.
Here member_1's answers are compared with everybody's, including their own, which naturally results in a 100% match.
I have a mathematical problem which I'm unable to convert to PHP or explain differently. Can somebody point me in the right direction?
I have two numbers in a sequence, #1 and #2. I want to somehow compare those two numbers and get a positive number lower than 100 as the result of the comparison. The higher the values, the higher the result should be. However, if #2 is also high, then the result should be "dimmed" accordingly...
These are the "expected results":
#1: 5 #2: 10 => ~ 5
#1: 10 #2: 5 => ~ 8
#1: 50 #2: 60 => ~ 12
#1: 50 #2: 100 => ~ 8
#1: 100 #2: 50 => ~ 20
#1: 500 #2: 500 => ~ 25
#1: 500 #2: 100 => ~ 50
#1: 100 #2: 500 => ~ 15
The values of #1 and #2 range between 0 and 1000. The expected results are just estimates; they will obviously be different. I just want to show how they are related.
First, you should try to set up a clean mathematical model by defining a function f(x, y) = z where x = #1, y = #2 and z = your output. I couldn't derive such a function from your data, but that's the basis you need whenever you want to implement your problem in a programming language.
So let's say you want something like
f(x, y) = {
    50 + (x - y) * 50/y    if y > x
    (y - x) * 50/x         if y <= x
}
This function compares #1 (x) and #2 (y) and gives as a result a number between 50 and 100 or between 0 and 50, depending on whether x or y is bigger.
Implementing a mathematical function like this in a programming language such as PHP is very easy:
function f($x, $y) {
    return ($y > $x) ? ( 50 + ($x - $y) * 50 / $y ) : ( ($y - $x) * 50 / $x );
}
Now you can call that function from your code or e.g. via some HTML form.
Sounds like an opportunity to fit a curve to the data and then use those coefficients to give a result for the two input variables.
To get started, plot out your expected results on graph paper and see if you have a plot that makes sense.