How to sort alphanumeric data in mysql?

Firstly, I want to point out that I have tried almost everything. I have been trying for the last 8 hours to put my list in order, and I have applied dozens of solutions found here.
Here is an SQL Fiddle with the sample data. I have found a page that manages to sort my list in the right order, and that order is:
1
2
2.B3
5
9
10 A-1
10 A-3
10 B-4
10 B-5
11
12
B3-43
B3-44
B3 - 48
B3 - 49
Basztowa 3
Basztowa 4
Basztowa 5
Basztowa 7
Basztowa 9
D.1
D.2
D.10
D.11
D.12
Kabaty ul. Pod lipą 4
But I am not able to reproduce this using MySQL.
I would appreciate any help, as I have no more ideas. I am considering sorting my list in PHP, but as far as I know DBMSs are optimized for this kind of operation, so if possible I would like to avoid doing it in PHP.
UPDATE
Thanks to @Jakumi, I have created two functions that help me solve my problem.
You need to create a column (zeropadded_name) to store your values in a sort-friendly format, and create triggers on INSERT and UPDATE that fill zeropadded_name whenever name changes. That's all! Now just ORDER BY zeropadded_name and enjoy!
Helper functions
regex_replace - sanitizes a value by replacing every character that matches the pattern (here: every non-alphanumeric character) with the replacement string. Replacing with a space rather than deleting the character keeps neighbouring numbers apart, so the 3 and 43 in B3-43 are not merged into 343.
lpad_numbers - left-pads every number in the string with zeros to 4 digits (so it assumes every number is below 10000; LPAD truncates longer values). It's a bit ugly, as I don't know MySQL functions well, but hey, it works, and it's quite fast.
Example (the outer REPLACE drops the separating spaces once the numbers are padded):
SELECT REPLACE(lpad_numbers(regex_replace('[^a-zA-Z0-9]', ' ', 'B3 - A-5')), ' ', '');
# B0003A0005
DROP FUNCTION IF EXISTS regex_replace;
DELIMITER $$
CREATE FUNCTION `regex_replace`(
    pattern VARCHAR(1000) CHARSET utf8 COLLATE utf8_polish_ci,
    replacement VARCHAR(1000) CHARSET utf8 COLLATE utf8_polish_ci,
    original VARCHAR(1000) CHARSET utf8 COLLATE utf8_polish_ci
) RETURNS varchar(1000) CHARSET utf8
DETERMINISTIC
BEGIN
    DECLARE temp VARCHAR(1000) CHARSET utf8 COLLATE utf8_polish_ci;
    DECLARE ch VARCHAR(1) CHARSET utf8 COLLATE utf8_polish_ci;
    DECLARE i INT;
    SET i = 1;
    SET temp = '';
    IF original REGEXP pattern THEN
        loop_label: LOOP
            IF i > CHAR_LENGTH(original) THEN
                LEAVE loop_label;
            END IF;
            -- test one character at a time and replace it if it matches the pattern
            SET ch = SUBSTRING(original, i, 1);
            IF NOT ch REGEXP pattern THEN
                SET temp = CONCAT(temp, ch);
            ELSE
                SET temp = CONCAT(temp, replacement);
            END IF;
            SET i = i + 1;
        END LOOP;
    ELSE
        -- nothing matches, return the value unchanged
        SET temp = original;
    END IF;
    RETURN temp;
END$$
DELIMITER ;
DROP FUNCTION IF EXISTS lpad_numbers;
DELIMITER $$
CREATE FUNCTION `lpad_numbers`(str VARCHAR(256)) RETURNS varchar(256) CHARSET utf8 COLLATE utf8_polish_ci
DETERMINISTIC
BEGIN
    DECLARE i, len SMALLINT DEFAULT 1;
    DECLARE ret VARCHAR(256) DEFAULT '';
    DECLARE num VARCHAR(256) DEFAULT '';
    DECLARE c CHAR(1);
    IF str IS NULL THEN
        RETURN '';
    END IF;
    SET len = CHAR_LENGTH(str);
    WHILE i <= len DO
        SET c = MID(str, i, 1);
        IF c BETWEEN '0' AND '9' THEN
            -- collect the whole run of digits, stopping before the first non-digit
            SET num = '';
            WHILE i <= len AND (MID(str, i, 1) BETWEEN '0' AND '9') DO
                SET num = CONCAT(num, MID(str, i, 1));
                SET i = i + 1;
            END WHILE;
            -- pad the number to 4 digits
            SET ret = CONCAT(ret, LPAD(num, 4, '0'));
        ELSE
            -- copy any other character unchanged
            SET ret = CONCAT(ret, c);
            SET i = i + 1;
        END IF;
    END WHILE;
    RETURN ret;
END$$
DELIMITER ;
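To see the whole recipe (sort-friendly column, insert/update triggers, ORDER BY) working end to end without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in. Assumptions, none of them from the post: SQLite instead of MySQL (SQLite triggers cannot assign to NEW, so AFTER triggers run an UPDATE, whereas in MySQL you would write BEFORE INSERT / BEFORE UPDATE triggers doing SET NEW.zeropadded_name = ...); sort_key is a hypothetical Python re-implementation of the key (non-alphanumerics become spaces, each number is zero-padded to 4 digits, then the spaces are dropped); the table name realestate is borrowed from another answer; every number is below 10000.

```python
import re
import sqlite3


def sort_key(name):
    """Hypothetical Python stand-in for the regex_replace + lpad_numbers key."""
    if name is None:
        return ''
    # Separators become spaces, so "B3-43" keeps 3 and 43 apart.
    cleaned = re.sub(r'[^a-zA-Z0-9]', ' ', name)
    # Left-pad every run of digits to 4 characters (assumes numbers < 10000).
    padded = re.sub(r'[0-9]+', lambda m: m.group().zfill(4), cleaned)
    # Drop the separators so "B3 - 48" and "B3-43" compare on the numbers alone.
    return padded.replace(' ', '')


conn = sqlite3.connect(':memory:')
conn.create_function('sort_key', 1, sort_key)
conn.executescript("""
CREATE TABLE realestate (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    zeropadded_name TEXT
);
-- SQLite cannot SET NEW.col, so recompute the key after the row is written.
CREATE TRIGGER realestate_ai AFTER INSERT ON realestate
BEGIN
    UPDATE realestate SET zeropadded_name = sort_key(NEW.name) WHERE id = NEW.id;
END;
CREATE TRIGGER realestate_au AFTER UPDATE OF name ON realestate
BEGIN
    UPDATE realestate SET zeropadded_name = sort_key(NEW.name) WHERE id = NEW.id;
END;
""")

names = ['D.10', 'Basztowa 9', '10 B-5', '2.B3', 'B3 - 48',
         '1', 'D.2', '10 A-1', 'B3-43', '12']
conn.executemany('INSERT INTO realestate (name) VALUES (?)', [(n,) for n in names])

ordered = [row[0] for row in
           conn.execute('SELECT name FROM realestate ORDER BY zeropadded_name')]
print(ordered)
# ['1', '2.B3', '10 A-1', '10 B-5', '12', 'B3-43', 'B3 - 48', 'Basztowa 9', 'D.2', 'D.10']
```

Renaming a row (for example UPDATE realestate SET name = 'D.1' ...) fires the update trigger, so zeropadded_name never has to be maintained by hand.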

splitting according to underlying structure
Technically, the mysql sorting mechanism works correctly but your strings are formatted in the wrong way. The underlying structure of your data is something like the following (Original column kept for ease of association to the example):
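alpha1   | num1 | alpha2 | num2 | ... | Original
         | 1    |        |      |     | 1
         | 2    |        |      |     | 2
         | 2    | B      | 3    |     | 2.B3
         | 5    |        |      |     | 5
         | 9    |        |      |     | 9
         | 10   | A      | 1    |     | 10 A-1
         | 10   | A      | 3    |     | 10 A-3
         | 10   | B      | 4    |     | 10 B-4
         | 10   | B      | 5    |     | 10 B-5
         | 11   |        |      |     | 11
         | 12   |        |      |     | 12
B        | 3    |        | 43   |     | B3-43
B        | 3    |        | 44   |     | B3-44
B        | 3    |        | 48   |     | B3 - 48
B        | 3    |        | 49   |     | B3 - 49
Basztowa | 3    |        |      |     | Basztowa 3
Basztowa | 4    |        |      |     | Basztowa 4
Basztowa | 5    |        |      |     | Basztowa 5
Basztowa | 7    |        |      |     | Basztowa 7
Basztowa | 9    |        |      |     | Basztowa 9
D        | 1    |        |      |     | D.1
D        | 2    |        |      |     | D.2
D        | 10   |        |      |     | D.10
D        | 11   |        |      |     | D.11
D        | 12   |        |      |     | D.12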
If you sorted them now with ORDER BY alpha1, num1, alpha2, num2, they would be in the order you want. But the already "formatted" version (the Original column) cannot be sorted easily, because the parts that should be sorted alphabetically and the parts that should be sorted numerically are mixed together.
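The split described above can be sketched as a Python sort key. This is an illustration, not Jakumi's code: split_key is a hypothetical helper that keeps runs of ASCII letters and runs of digits (so Polish letters such as ą are dropped), lower-cases letters to mimic a case-insensitive collation, and fills a skipped slot with '' (alpha) or -1 (num) so the alpha1, num1, alpha2, num2 ... positions stay aligned. Shorter keys sort first, playing the role of NULLs in ORDER BY alpha1, num1, alpha2, num2.

```python
import re


def split_key(name):
    """Turn '10 A-1' into ('', 10, 'a', 1): alternating alpha/num slots."""
    key = []
    for token in re.findall(r'[A-Za-z]+|[0-9]+', name):
        is_num = token.isdigit()
        expects_num = len(key) % 2 == 1  # odd positions hold numbers
        if is_num != expects_num:
            # Fill the skipped slot: '' for a missing alpha, -1 for a missing num.
            key.append(-1 if expects_num else '')
        key.append(int(token) if is_num else token.lower())
    return tuple(key)


sample = ['D.10', 'B3 - 48', '10 A-3', '2', 'Basztowa 4', '2.B3', '9', 'B3-43', 'D.1']
print(sorted(sample, key=split_key))
# ['2', '2.B3', '9', '10 A-3', 'B3-43', 'B3 - 48', 'Basztowa 4', 'D.1', 'D.10']
```

Because even positions always hold strings and odd positions always hold integers, Python never has to compare a string with a number, which is the same guarantee the separate alpha/num columns give in SQL.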
zeropadding
There is a somewhat less extensive alternative that needs only one extra column: assume no number ever goes beyond, let's say, 10000, and replace every number (not digit!) with a zero-padded version, so 10 A-1 would become 0010A0001 (that is, 0010 and A and 0001). I don't see this being done on the fly in an ORDER BY clause, though.
For this example, the zero-padded version (assuming every number < 10000) is:
Original   | Zeropadded
1          | 0001
2          | 0002
2.B3       | 0002B0003
5          | 0005
9          | 0009
10 A-1     | 0010A0001
10 A-3     | 0010A0003
10 B-4     | 0010B0004
10 B-5     | 0010B0005
11         | 0011
12         | 0012
B3-43      | B00030043
B3-44      | B00030044
B3 - 48    | B00030048
B3 - 49    | B00030049
Basztowa 3 | Basztowa0003
Basztowa 4 | Basztowa0004
Basztowa 5 | Basztowa0005
Basztowa 7 | Basztowa0007
Basztowa 9 | Basztowa0009
D.1        | D0001
D.2        | D0002
D.10       | D0010
D.11       | D0011
D.12       | D0012
This would be sortable to your wishes with ORDER BY zeropadded.
So in the end, you probably have to sort in PHP or create more columns that help you sort by reformatting, sanitizing or splitting your input.
update
zeropadding explained (simplified)
The main idea behind zeropadding is that the natural format of numbers is different from their format in the computer. In the computer, the number 2 is effectively the sequence of digits 0..0002 (so the leading zeros are included), and similarly 10 is 0..0010. When the computer compares numbers, it goes from left to right until it finds different digits:
0...0002
0...0010
======!. (the ! marks the point where the first digit is different)
And then it will determine which digit is bigger or smaller. In this case 0 < 1, and therefore 2 < 10. (Of course the computer uses binary, but that doesn't change the idea).
Now, a string is technically a sequence of characters. String comparison works slightly differently. When two strings are compared, they are not (left) padded, so the first character of each string is really the first character and not a padding (like a space for example). So technically the string A10 is a sequence of characters A, 1 and 0. And since the string comparison is used, it is "smaller" than A2, because the string comparison doesn't see the numbers as numbers but as characters (that are digits):
A10
A2
=! (the ! marks the point where the first character is different)
and because 1 < 2 as characters, A10 < A2. To circumvent this problem, we force the numbers in the string into the same format they have in numerical comparisons, by padding them to the same length, which aligns the digits according to their place value:
A0010
A0002
===!. (the ! marks the point where the first character is different)
Now it's effectively the same comparison you would expect in numerical comparisons. However, you have to make some assumption about the maximal length of numbers, so that you can choose the padding appropriately. Without that assumption, you'd have a problem.
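These comparisons can be checked directly, since Python compares strings character by character just like the examples above. pad_numbers is a hypothetical helper (not from the answer) that zero-pads every run of digits to a fixed width; the last line shows what goes wrong when a number is longer than the assumed width (zfill leaves it unpadded; MySQL's LPAD would instead truncate it, which is just as wrong).

```python
import re


def pad_numbers(s, width=4):
    """Left-pad every run of digits in s with zeros to `width` characters."""
    return re.sub(r'[0-9]+', lambda m: m.group().zfill(width), s)


# Plain string comparison: '1' < '2' at the second character, so A10 < A2.
print('A10' < 'A2')                                   # True
# After padding, the digits line up by place value: A0002 < A0010.
print(pad_numbers('A2') < pad_numbers('A10'))         # True
# A number wider than the assumed width is not padded, and the order breaks:
# 'A10000' vs 'A9999' differ at the second character, and '1' < '9'.
print(pad_numbers('A10000') < pad_numbers('A9999'))   # True (wrong order)
```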
The only (logical) point that remains: when one string has a letter where the other has a digit, what does the padding change? The answer is: nothing. We don't change numbers into letters, and digits sort before letters, so everything stays in the same order in that case.
The effect of zeropadding is that the "number" comparison in strings behaves like a real number comparison, because the digit characters are aligned according to their place value.

SELECT name FROM realestate ORDER BY name ASC;
This should sort your list alphanumerically... I don't see the issue.
EDIT: OK, I still don't know if I really understood what the goal of this issue is (is it for a contest?), but I can submit this "twisted" query (which I hope I will never use in my career):
SELECT name FROM realestate
ORDER BY IF(SUBSTRING(name, 1, 2) REGEXP '[A-Z]', 100000, CAST(name AS UNSIGNED)) ASC,
SUBSTRING(name, 1, 2) ASC,
CAST(SUBSTRING(name FROM LOCATE('.', name)+1) AS UNSIGNED) ASC,
REPLACE(name, ' ', '') ASC;
Maybe someone can find an easier way, because I admit my answer is a bit complicated. But Kamil's and Jakumi's solutions are much trickier and more complicated.

Related

Reverse ranking order numbers , without an array

Let's say we have a ranking system with integers from 1 to a maximum of 100,000.
I want a function that reverses the rank of an integer,
so that value 100,000 becomes rank 1 and value 1 becomes rank 100,000.
function reverseRank($currentRank,$maxRank){
// create array with numbers 1 till $maxRank.
// reverse order of values and return key of $currentRank...
// but this seems a bit a waste of resources.
return $reversedRank;
}
What would be the best way to do this performance-wise in PHP?

Let's assume for simplicity that you have a range of ranks between 1 and 10.
We need to find a mapping function that will swap
1 -> 10
2 -> 9
3 -> 8
4 -> 7
5 -> 6
6 -> 5
7 -> 4
8 -> 3
9 -> 2
10 -> 1
Now it might be easier to think about the solution.
What function will work here? This function knows a couple of things at runtime:
the lower and upper bounds of the range, 1 and 10 respectively.
We can sketch this in a slightly more formal way:
f(1) -> 10
f(2) -> 9
f(3) -> 8
(...)
f(x) -> y; // 1 and 10 are known to be the limits
Let's try playing with it. For f(1) to be 10, we could try:
def f(x):
    return x * UPPER_LIMIT
It will definitely break as soon as we try it with 2.
f(2) -> 9; looking at this, I can see that I can write it as:
return a number that is as much smaller than the upper limit as x is larger than the lower limit.
def f(x):
    return UPPER_LIMIT - (x - LOWER_LIMIT)
And by running it for more values, it looks like it works.
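The formula needs no array at all, so it runs in constant time and memory. A minimal Python sketch (reverse_rank is a hypothetical name; in PHP the body would be the same single expression, $maxRank - ($currentRank - 1) for ranks starting at 1):

```python
def reverse_rank(current_rank, max_rank, min_rank=1):
    """Map min_rank <-> max_rank in O(1): f(x) = upper - (x - lower)."""
    if not min_rank <= current_rank <= max_rank:
        raise ValueError('rank out of range')
    return max_rank - (current_rank - min_rank)


print(reverse_rank(1, 100_000))        # 100000
print(reverse_rank(100_000, 100_000))  # 1
print(reverse_rank(3, 10))             # 8
```

Applying the function twice gives back the original rank, which is a quick sanity check that the mapping is a true reversal.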
I hope I understood your question and that helps.

Get all n-size combinations with k-size letters list

Could anyone help me? I am trying to find a formula and write a piece of PHP code that does the following.
Imagine we have 3 types of something, k = 1,2,3, and the length of these sequences can vary (n), but neighbouring types must not be the same (no 1,1 or 2,2).
For example
k = 1,2,3
n = 5
Output
1,2,3,1,2 |
1,2,3,1,3 |
1,2,3,2,1 |
1,2,3,2,3 |
1,3,2,1,2 |
1,3,2,1,3 |
1,3,2,3,1 |
1,3,2,3,2
.........
Maybe this problem has a common name; please share it with me and I'll try to find some resources about it.
Thanks

The simplest way to generate such lists is recursively (as long as n and k are not large; note that the number of variants is k*(k-1)^(n-1)).
Pseudocode:
Generate(list, n, k, lastvalue)
    if (list.length = n)
        output(list)
    else
        for i = 1 .. k
            if (i != lastvalue)
                Generate(list + i, n, k, i)
Delphi code
procedure Generate(list: string; n, k, lastvalue: Integer);
var
  i: Integer;
begin
  // one character per value, so this string version assumes k <= 9
  if (Length(list) = n) then
    Memo1.Lines.Add(list)
  else
    for i := 1 to k do
      if (i <> lastvalue) then
        Generate(list + IntToStr(i), n, k, i)
end;

begin
  Generate('', 4, 3, 0);
end;
Output for n=4, k=3
1212 1213 1231 1232 1312 1313 1321 1323
2121 2123 2131 2132 2312 2313 2321 2323
3121 3123 3131 3132 3212 3213 3231 3232
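The same recursion as a Python generator, as a sketch (sequences is a hypothetical name). Using tuples instead of a string removes the Delphi version's implicit k <= 9 limit, and the count matches k*(k-1)^(n-1).

```python
def sequences(n, k, prefix=(), last=0):
    """Yield every length-n sequence over 1..k with no two equal neighbours."""
    if len(prefix) == n:
        yield prefix
        return
    for value in range(1, k + 1):
        if value != last:  # skip the value used in the previous position
            yield from sequences(n, k, prefix + (value,), value)


result = list(sequences(4, 3))
print(len(result))  # 24 == 3 * 2**3
print(''.join(map(str, result[0])), ''.join(map(str, result[-1])))  # 1212 3232
```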

Well, you do a loop inside a loop. Since k has a length, and the numbers in k are the values you move through, they form the outer loop (a for loop).
Now you can walk along the output, because you know n as well.
Place k[1] as the first element n[1]; it doesn't change until the inner loop is over (in this case k[1] is 1). Now run a while loop with a counter variable (a, for example) over the n array you created, which starts as (1, null, null, null, null), while a != n.length(). Check n[a-1] to make sure it doesn't hold the same value. Whenever a reaches n.length, change the last number to the next value in the k array, then go back one spot (n[a-1]), change it, and keep going back recursively all the way to the start until all spots have been changed and n[2] holds the highest value of the k array. To make life easier, you can create another array, say j, which gets a value as soon as the nearest n[a] spot gets its last possible value.
By the way, whenever you go back, reset the spots you passed to null so that all the numbers in the k array are available again. When the j array is full, reset all of it and move on in the for loop.
Hope I was of help; if you have any questions, feel free to ask.
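One way to read the loop-and-backtrack description above is iterative backtracking over a fixed-size array, with 0 playing the role of null. This Python sketch is an interpretation of that idea, not a line-by-line translation of the answer (it does without the extra j array); sequences_iterative is a hypothetical name.

```python
def sequences_iterative(n, k):
    """All length-n sequences over 1..k with no equal neighbours, without recursion."""
    if n == 0:
        return [()]
    result = []
    seq = [0] * n  # 0 marks an empty ("null") slot
    pos = 0
    while pos >= 0:
        # Advance this slot to its next value that differs from the left neighbour.
        value = seq[pos] + 1
        while pos > 0 and value == seq[pos - 1]:
            value += 1
        if value > k:
            seq[pos] = 0  # reset the slot so every value is available again
            pos -= 1      # go back one spot
            continue
        seq[pos] = value
        if pos == n - 1:
            result.append(tuple(seq))  # a complete sequence
        else:
            pos += 1      # move forward; the next slot starts empty
    return result


print(len(sequences_iterative(4, 3)))  # 24
print(sequences_iterative(3, 2))       # [(1, 2, 1), (2, 1, 2)]
```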

Check 2 numbers algorithm

I have 2 fields in the DB, Minor and Major:
Minor, Major
0,0
1,0
2,0
3,0
4,0
5,0
6,0
7,0
8,0
...
65536,0
0,1
1,1
2,1
3,1
4,1
...
65536,1
0,2
What is the best way to compare these? I am doing this in Bookshelf.js, but PHP or Ruby is also welcome. I need to check the current state, take the greatest major and set minor + 1 if minor is not 65536; otherwise minor becomes 0 and major becomes major + 1.
Thanks in advance.
EDIT:
I have to save major and minor in their respective fields. They increment for every registered user.
eg.
Users
id, username,minor,major
1, john , 0, 0
2, mike, 1, 0
....
65537, jeff, 65536,0
Now Tom's major increments because the last minor in the table is 65536.
65538, tom, 0 , 1
I don't know how to explain more.

I'm not at all sure I understand the problem, but here are some ideas about limiting the range of an integer value:
Like many languages, MySQL has an UNSIGNED SMALLINT data type that holds 2-byte values, that is, from 0 to 65535 (not 65536!).
Most programming languages have a modulus operator (% in PHP and MySQL) that returns the remainder of an integer division. For example, ... % 65536 will return a value between 0 and 65535 inclusive. If you really need a value between 0 and 65536 inclusive, write ... % 65537 instead.
You could also use a mask operator (bitwise AND, & in PHP and MySQL). For example, ... & 0xFFFF keeps only the two least significant bytes of a number, which for non-negative numbers is equivalent to a modulo 65536 operation (giving a result between 0 and 65535 inclusive).
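A quick check of these ideas in Python, whose % and & behave like PHP's for non-negative integers (the sample values are arbitrary): % 65536 and & 0xFFFF agree, and only % 65537 can produce 65536.

```python
values = [0, 1, 65535, 65536, 65537, 145324]

mod_65536 = [x % 65536 for x in values]   # always in 0..65535
mask_16 = [x & 0xFFFF for x in values]    # keeps the two lowest bytes
mod_65537 = [x % 65537 for x in values]   # always in 0..65536

print(mod_65536 == mask_16)  # True for non-negative values
print(mod_65537)             # [0, 1, 65535, 65536, 0, 14250]
```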

$magicNumber = 65537; // minor runs from 0 to 65536 inclusive, i.e. 65537 values per major
$sql = "
SELECT
MAX(userIndex) userIndex
FROM (
SELECT
(Minor + (Major * ".$magicNumber.")) AS userIndex
FROM TableName
) AS innerSelect
";
Running the SQL gives you the current highest userIndex; let's say it is 145323.
Now increment it by one, and you have $newIndex = 145324.
From this new index, the fields can be calculated like this:
$major = (int)($newIndex / $magicNumber);
$minor = $newIndex % $magicNumber;
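The same arithmetic as a Python sketch, checked against the rows in the question. Because the question's minor runs from 0 to 65536 inclusive (65537 values per major), this sketch assumes a base of 65537; with a base of 65536, jeff's row (65536, 0) would collide with (0, 1). combine and split_index are hypothetical helper names.

```python
BASE = 65537  # minor takes the values 0..65536 inclusive


def combine(minor, major):
    """Flatten (minor, major) into one increasing index."""
    return minor + major * BASE


def split_index(index):
    """Inverse of combine: returns (minor, major)."""
    major, minor = divmod(index, BASE)
    return minor, major


last = combine(65536, 0)      # jeff, the last row with major 0
print(split_index(last + 1))  # (0, 1): tom starts the next major
```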

mysql between question

For the mysql "between" operator, is it necessary for the before and after value to be numerically in order?
like:
BETWEEN -10 AND 10
BETWEEN 10 AND -10
Will both of these work or just the first one?
Also, can I do:
WHERE thing<10 AND thing>-10
Will that work or do I have to use between?
Lastly, can I do:
WHERE -10<thing<10
?

BETWEEN -10 AND 10
This will match any value from -10 to 10, bounds included.
BETWEEN 10 AND -10
This will never match anything.
WHERE thing<10 AND thing>-10
This will match any value from -10 to 10, bounds excluded.
Also, if thing is a non-deterministic expression, it is evaluated once in case of BETWEEN and twice in case of double inequality:
SELECT COUNT(*)
FROM million_records
WHERE RAND() BETWEEN 0.6 AND 0.8;
will return a value around 200,000;
SELECT COUNT(*)
FROM million_records
WHERE RAND() >= 0.6 AND RAND() <= 0.8;
will return a value around 320,000
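A quick simulation of why the counts differ, with Python's random module standing in for RAND() and the row count reduced to 100,000 to keep it fast (so the expected counts become about 20,000 and 32,000). BETWEEN tests one random value per row; the double inequality draws two independent values, so its probability is P(r1 >= 0.6) * P(r2 <= 0.8) = 0.4 * 0.8 = 0.32.

```python
import random

random.seed(1)
rows = 100_000

# Like BETWEEN: one random value per row, tested against both bounds
# (Python's chained comparison evaluates random.random() only once).
between = sum(1 for _ in range(rows) if 0.6 <= random.random() <= 0.8)

# Like the double inequality: each call draws a fresh value.
double = sum(1 for _ in range(rows)
             if random.random() >= 0.6 and random.random() <= 0.8)

print(between / rows, double / rows)  # roughly 0.2 and 0.32
```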

The min value must come before the max value. Also note that the end points are included, so BETWEEN is equivalent to:
WHERE thing>=-10 AND thing<=10
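The bound order and the equivalence can be checked with Python's built-in sqlite3 module, used here as a stand-in for MySQL (SQLite also defines x BETWEEN y AND z as x >= y AND x <= z). The table t and its values -20..20 are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE t (thing INTEGER)')
conn.executemany('INSERT INTO t VALUES (?)', [(v,) for v in range(-20, 21)])


def count(where):
    """Number of rows of t matching the given WHERE condition."""
    return conn.execute(f'SELECT COUNT(*) FROM t WHERE {where}').fetchone()[0]


print(count('thing BETWEEN -10 AND 10'))      # 21: bounds included
print(count('thing BETWEEN 10 AND -10'))      # 0: min must come first
print(count('thing >= -10 AND thing <= 10'))  # 21: same as BETWEEN
print(count('thing > -10 AND thing < 10'))    # 19: bounds excluded
```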

Please keep it to one question per post. Anyway:
http://dev.mysql.com/doc/refman/5.0/en/comparison-operators.html#operator_between
BETWEEN min AND max, in that order.
From the link:
This is equivalent to the expression (min <= expr AND expr <= max) if all the arguments are of the same type.
The second alternative will also work, of course.

First question:
Will both of these work or just the first one?
Yes, both of these are valid syntax, but BETWEEN 10 AND -10 just returns an empty result.
Second question:
Will that work or do I have to use between?
It is also valid.

Yes, your BETWEEN bounds must be in order to return the expected result.
Let's say you have a table with a column called MyNumber that contains 10 rows:
MyNumber
--------
1
2
3
4
5
6
7
8
9
10
So
SELECT * FROM thistable t WHERE t.MyNumber BETWEEN 1 AND 5
will return
1
2
3
4
5
but
SELECT * FROM thistable t WHERE t.MyNumber BETWEEN 5 AND 1
returns nothing.
Your 2nd question: yes, it is the same thing, but beware: in your example you would have to use <= and >= for it to be the same as BETWEEN. Otherwise, in our example, you would get
2
3
4
Hope it helps.

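I've seen such things appear to work with integers:
WHERE -10 < thing < 10
But it's better to avoid it. MySQL evaluates it left to right as (-10 < thing) < 10; the first comparison returns 0 or 1, which is always less than 10, so the condition is true for every non-NULL value of thing. The same thing goes wrong with other types, and MySQL doesn't issue any warning.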
I've tried it with datetime columns, and the result was wrong.
My request looked like this one:
SELECT *
FROM FACT__MODULATION_CONSTRAINTS constraints
WHERE constraints.START_VALIDITY<= now() < constraints.END_VALIDITY
The result was not as expected. I got twice as many results as the same request with two inequalities (which returned correct results). Only the 1st part of the expression evaluated correctly.
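The left-to-right parsing can be observed with Python's built-in sqlite3 module as a stand-in (SQLite, like MySQL, treats a < b < c as (a < b) < c). The table t and its values -20..20 are made up for the illustration: the chained form matches all 41 rows, while the explicit AND matches only the 19 rows strictly between -10 and 10.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE t (thing INTEGER)')
conn.executemany('INSERT INTO t VALUES (?)', [(v,) for v in range(-20, 21)])

chained = conn.execute('SELECT COUNT(*) FROM t WHERE -10 < thing < 10').fetchone()[0]
explicit = conn.execute(
    'SELECT COUNT(*) FROM t WHERE thing > -10 AND thing < 10').fetchone()[0]
# What the chained form really computes for thing = 50: (-10 < 50) < 10, i.e. 1 < 10.
inner = conn.execute('SELECT (-10 < 50), (-10 < 50) < 10').fetchone()

print(chained, explicit, inner)  # 41 19 (1, 1)
```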

Finding similar number patterns in table

OK, let's suppose we have a members table with a field called, say, about_member. Everybody has a string like 1-1-2-1-2 in it. Let's suppose member_1 has the string 1-1-2-2-1 and searches for members who have the same string, or one as similar as possible. For example, if member_2 has the string 1-1-2-2-1, it is a 100% match, but if member_3 has the string 2-1-1-2-1, it is a 60% match. The results have to be ordered by match percentage. What is the most efficient way to do this with MySQL and PHP? It's really hard to explain what I mean, but maybe you get it; if not, ask me. Thanks.
Edit: Please give me ideas that don't use the Levenshtein method. That answer will get the bounty. Thanks. (I'll announce the bounty as soon as I'm able to.)

Convert your number sequences to bit masks and use BIT_COUNT(column ^ search) as the similarity function, ranging from 0 (= 100% match, the strings are equal) to [bit length] (= 0%, the strings are completely different). To convert this similarity function to a percentage, use
100 * (bit_length - similarity) / bit_length
For example, "1-1-2-2-1" becomes "00110" (assuming you have only two states), 2-1-1-2-1 is "10010", bit_count(00110 ^ 10010) = 2, bit-length = 5, and 100 * (5 - 2) / 5 = 60%.
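The encoding and the formula above as a Python sketch, assuming exactly two answer values (1 maps to bit 0, 2 maps to bit 1) and answer strings of equal length. encode and match_percent are hypothetical helpers; bin(x).count('1') plays the role of MySQL's BIT_COUNT.

```python
def encode(answers):
    """'1-1-2-2-1' -> (0b00110, 5): answer 2 sets a bit, answer 1 clears it."""
    bits = ''.join('1' if a == '2' else '0' for a in answers.split('-'))
    return int(bits, 2), len(bits)


def match_percent(a, b):
    x, bit_length = encode(a)
    y, other_length = encode(b)
    if bit_length != other_length:
        raise ValueError('answer strings must have the same length')
    similarity = bin(x ^ y).count('1')  # BIT_COUNT(column ^ search): differing answers
    return 100 * (bit_length - similarity) / bit_length


print(encode('1-1-2-2-1'))                      # (6, 5)
print(match_percent('1-1-2-2-1', '2-1-1-2-1'))  # 60.0
print(match_percent('1-1-2-2-1', '1-1-2-2-1'))  # 100.0
```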

Jawa posted this idea originally; here is my attempt.
^ is the XOR function. It compares 2 binary numbers bit-by-bit and returns 0 if both bits are the same, and 1 otherwise.
0 1 0 0 0 1 0 1 0 1 1 1 (number 1)
^ 0 1 1 1 0 1 0 1 1 0 1 1 (number 2)
= 0 0 1 1 0 0 0 0 1 1 0 0 (result)
How this applies to your problem:
// In binary...
1111 ^ 0111 = 1000 // (1 bit out of 4 didn't match: 75% match)
1111 ^ 0000 = 1111 // (4 bits out of 4 didn't match: 0% match)
// The same examples, except now in decimal...
15 ^ 7 = 8 (1000 in binary) // (1 bit out of 4 didn't match: 75% match)
15 ^ 0 = 15 (1111 in binary) // (4 bits out of 4 didn't match: 0% match)
How we can count these bits in MySQL:
BIT_COUNT(b'0111') = 3 // Bit count of binary '0111'
BIT_COUNT(7) = 3 // Bit count of decimal 7 (= 0111 in binary)
BIT_COUNT(b'1111' ^ b'0111') = 1 // (1 bit out of 4 didn't match: 75% match)
So to get the similarity...
// First we focus on calculating mismatch.
(BIT_COUNT(b'1111' ^ b'0111') / YOUR_TOTAL_BITS) = 0.25 (25% mismatch)
(BIT_COUNT(b'1111' ^ b'1111') / YOUR_TOTAL_BITS) = 0 (0% mismatch; 100% match)
// Now, getting the proportion of matched bits is easy
1 - (BIT_COUNT(b'1111' ^ b'0111') / YOUR_TOTAL_BITS) = 0.75 (75% match)
1 - (BIT_COUNT(b'1111' ^ b'1111') / YOUR_TOTAL_BITS) = 1.00 (100% match)
If we could just make your about_member field store data as bits (and be represented by an integer), we could do all of this easily! Instead of 1-2-1-1-1, use 0-1-0-0-0, but without the dashes.
Here's how PHP can help us:
bindec('01000') == 8;
bindec('00001') == 1;
decbin(8) == '1000'; // decbin() drops leading zeros
decbin(1) == '1';
str_pad(decbin(8), 5, '0', STR_PAD_LEFT) == '01000';
And finally, here's the implementation:
// Setting a member's about_member property...
$about_member = '01100101';
$about_member_int = bindec($about_member);
// Bind $name and $about_member_int to the placeholders with a prepared statement (PDO or mysqli).
$query = "INSERT INTO members (name, about_member) VALUES (?, ?)";
// Getting matches...
$total_bits = 8; // The maximum length the about_member field can be (8 in this example)
$my_member_about = '00101100';
$my_member_about_int = bindec($my_member_about);
$query = "
SELECT
*,
(1 - (BIT_COUNT(about_member ^ $my_member_about_int) / $total_bits)) AS similarity
FROM members
ORDER BY similarity DESC
LIMIT 10";
This last query selects the 10 members most similar to me!
Now, to recap, in layman's terms,
We use binary because it makes things easier; the binary number is like a long line of light switches. We want to save our "light switch configuration" as well as find members that have the most similar configurations.
The ^ operator, given 2 light switch configurations, does a comparison for us. The result is again a series of switches; a switch will be ON if the 2 original switches were in different positions, and OFF if they were in the same position.
BIT_COUNT tells us how many switches are ON--giving us a count of how many switches were different. YOUR_TOTAL_BITS is the total number of switches.
But binary numbers are still just numbers... and so a string of 1's and 0's really just represents a number like 133 or 94. But it's a lot harder to visualize our "light switch configuration" if we use decimal numbers. That's where PHP's decbin and bindec come in.
Learn more about the binary numeral system.
Hope this helps!

The obvious solution is to look at the Levenshtein distance (there isn't an implementation built into MySQL, but other implementations are available, e.g. this one in PL/SQL and some extensions); however, as usual, the right way to solve the problem would have been to normalise the data properly in the first place.

One way to do this is to calculate the Levenshtein distance between your search string and the about_member fields for each member. Here's an implementation of the function as a MySQL stored function.
With that you can do:
SELECT name, LEVENSHTEIN(about_member, '1-1-2-1-2') AS diff
FROM members
ORDER BY diff ASC
The % of similarity is related to diff; if diff=0 then it's 100%, if diff is the size of the string (minus the amount of dashes), it's 0%.
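For reference, a textbook dynamic-programming Levenshtein distance in Python, together with the diff-to-percentage mapping described above. This is a sketch of the standard algorithm, not the linked MySQL stored function; percent is a hypothetical helper, and it subtracts the dashes because they always line up for equal-length answer strings.

```python
def levenshtein(a, b):
    """Minimum number of single-character inserts, deletes and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                # deletion
                               current[j - 1] + 1,             # insertion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]


def percent(a, b):
    answers = len(a) - a.count('-')  # size of the string minus the dashes
    return 100 * (answers - levenshtein(a, b)) / answers


print(levenshtein('1-1-2-2-1', '2-1-1-2-1'))  # 2
print(percent('1-1-2-2-1', '2-1-1-2-1'))      # 60.0
```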

Having read the clarification comments on the original question, the Levenshtein distance is not the answer you are looking for.
You are not trying to compute the smallest number of edits to change one string into another.
You are trying to compare one set of numbers with another set of numbers. What you are looking for is the minimum (weighted) sum of the differences between the two sets of numbers.
Place each answer in a separate column (Ans1, Ans2, Ans3, Ans4, .... )
Assume you are searching for similarities to 1-2-1-2.
SELECT UserName, Abs( Ans1 - 1 ) + Abs( Ans2 - 2 ) + Abs( Ans3 - 1 ) + Abs( Ans4 - 2) AS Difference FROM members ORDER BY Difference ASC
Will list users by similarity to answers 1-2-1-2, assuming all questions are weighted evenly.
If you want to make certain answers more important, just multiply each of the terms by a weighting factor.
If the questions will always be yes/no and the number of answers is small enough that all the answers can be fitted into a single integer and all answers are equally weighted, then you could encode all the answers in a single column and use BIT_COUNT as suggested. This would be a faster and more space-efficient implementation.
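The weighted sum of differences, sketched in Python over in-memory rows; the member data, the weights and the difference helper are made up for illustration. With all weights equal to 1, this is exactly the Abs(Ans1 - 1) + Abs(Ans2 - 2) + ... expression.

```python
def difference(answers, target, weights=None):
    """Weighted sum of |answer - target| over all questions."""
    if weights is None:
        weights = [1] * len(target)
    return sum(w * abs(a - t) for a, t, w in zip(answers, target, weights))


members = {
    'member_1': [1, 2, 1, 2],
    'member_2': [1, 2, 2, 2],
    'member_3': [2, 1, 2, 1],
}
target = [1, 2, 1, 2]

ranked = sorted(members, key=lambda name: difference(members[name], target))
print(ranked)  # ['member_1', 'member_2', 'member_3']

# Doubling the weight of question 3 makes a mismatch there count twice.
print(difference(members['member_2'], target, weights=[1, 1, 2, 1]))  # 2
```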

I would go with the similar_text() PHP built-in. It seems to be exactly what you want:
$percent = 0;
similar_text($string1, $string2, $percent);
echo $percent;
It works as the question expects.

I would go with the Levenshtein distance approach, you can use it within MySQL or PHP.

If you don't have too many fields, you could create an index on the integer representation of about_member. Then you can find the 100% by an exact match on the about_member field, followed by the 80% matches by changing 1 bit, the 60% matches by changing 2 bits, and so on.
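The bit-flipping lookup can be sketched in Python: for a 5-answer mask, the candidates for the 80% tier are the 5 values one bit away, the 60% tier the 10 values two bits away, and so on; each candidate would then be an exact-match lookup on the indexed integer column. values_at_distance is a hypothetical helper.

```python
from itertools import combinations


def values_at_distance(value, bit_length, distance):
    """All integers that differ from `value` in exactly `distance` of its low bits."""
    results = []
    for positions in combinations(range(bit_length), distance):
        flipped = value
        for p in positions:
            flipped ^= 1 << p  # flip bit p
        results.append(flipped)
    return results


me = 0b00110  # '1-1-2-2-1'
for d in range(3):
    print(f'{100 * (5 - d) // 5}% tier:', sorted(values_at_distance(me, 5, d)))
```

With 5 answers there are only 2^5 = 32 possible values, so even walking every tier stays cheap; the number of lookups grows quickly for longer answer strings, which is presumably why the answer limits this to a small number of fields.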

If you represent your answer patterns as bit sequences you can use the formula (100 * (bit_length - similarity) / bit_length).
Following the mentioned example, when we convert "1"s to bit off and "2"s to bit on "1-1-2-2-1" becomes 6 (as base-10, 00110 in binary) and "2-1-1-2-1" becomes 18 (10010b) etc.
Also, I think you should store the answers' bits in the least significant bits, but it doesn't matter as long as you are consistent, so that the answers of different members align.
Here's a sample script to be run against MySQL.
DROP TABLE IF EXISTS `members`;
CREATE TABLE `members` (
`id` VARCHAR(16) NOT NULL ,
`about_member` INT NOT NULL
) ENGINE = InnoDB;
INSERT INTO `members`
(`id`, `about_member`)
VALUES
('member_1', 6),
('member_2', 18);
SELECT 100 * ( 5 - BIT_COUNT( about_member ^ (
SELECT about_member
FROM members
WHERE id = 'member_1' ) ) ) / 5
FROM members;
The magical 5 in the script is the number of answers (bit_length in the formula above). You should change it according to your situation, regardless of how many bits there are in the actual data type used, as BIT_COUNT doesn't know how many bytes you are using.
BIT_COUNT returns the number of bits set and is explained in the MySQL manual. ^ is the bitwise XOR operator in MySQL.
Here member_1's answers are compared with everybody's, including their own, which naturally results in a 100% match.
