What is the best flexible means of comparing version numbers? - php

I am working with a script to compare version numbers for installed and available applications. I would, on a normal basis, use simple comparison operators. Since I am building this application in a PHP 5.3 environment, I have considered the use of version_compare(), but that doesn't seem to suit my needs as cleanly as I would like.
The version strings I am comparing can follow many formats, but those I have encountered thus far are:
'2.6.18-164.6.1.el5' versus '2.6.18-92.1.13.el5'
'4.3p2' versus '5.1p1'
'5.1.6' versus '5.2.12'
'2.6.24.4-foo.bar.x.i386' versus '2.4.21-40'
As you can see, there really is no consistent format for me to work with.
The one thing I considered doing was splitting each version string on the non-numeric characters, then iterating the resulting arrays and comparing relative indices. However, I'm not sure that would be a good way of doing it, especially in the case of '2.6.24-4-foo.a.12.i386' versus '2.6.24-4-foo.b.12.i386'.
Are there any well-tested methods of comparing very loose version numbers such as this, specifically in a PHP environment?

Splitting by symbol (see preg_split) and comparing each element numerically (if both are numeric) or using string comparison (when both are alphanumeric) works for your examples:
'2.6.18-164.6.1.el5' > '2.6.18-92.1.13.el5'
2 6 18 164 6 1 el5 // higher
2 6 18 92 1 13 el5
^
'4.3p2' < '5.1p1'
4 3 p2
5 1 p1 // higher
^
'5.1.6' < '5.2.12'
5 1 6
5 2 12 // higher
^
'2.6.24.4-foo.bar.x.i386' > '2.4.21-40'
2 6 24 4 foo bar x i386 // higher
2 4 21 40 --- --- - ----
^
Where it potentially falls down is a version like 5.2-alpha-foo vs 5.2.49.4-beta-bar where you must compare a purely numeric sub-string with an alphanumeric sub-string:
5.2-alpha-foo > 5.2.49.4-beta-bar
5 2 alpha foo ---- --- // wrong - ASCII 97 (a) vs 52 (4)
5 2 49 4 beta bar
^
You could solve this by treating the alphanumeric field as 0 any time you have a purely numeric sub-string compared against an alphanumeric sub string.
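A minimal sketch of that approach, written without PHP 7 syntax since the question mentions PHP 5.3. The function names compareLooseVersions() and threeWay() are made up for this example, and the rule that a missing field sorts lowest is an assumption, not something the examples above settle:

```php
<?php
// Hypothetical helper: returns -1, 0 or 1 for two comparable values.
function threeWay($x, $y)
{
    return $x < $y ? -1 : ($x > $y ? 1 : 0);
}

// Hypothetical helper: compares two loose version strings field by field.
function compareLooseVersions($a, $b)
{
    // Split on every run of characters that is neither a letter nor a digit.
    $partsA = preg_split('/[^0-9A-Za-z]+/', $a, -1, PREG_SPLIT_NO_EMPTY);
    $partsB = preg_split('/[^0-9A-Za-z]+/', $b, -1, PREG_SPLIT_NO_EMPTY);

    $count = max(count($partsA), count($partsB));
    for ($i = 0; $i < $count; $i++) {
        // Assumption: the version that runs out of fields first is the lower one.
        if (!isset($partsA[$i])) {
            return -1;
        }
        if (!isset($partsB[$i])) {
            return 1;
        }

        $numericA = ctype_digit($partsA[$i]);
        $numericB = ctype_digit($partsB[$i]);

        if ($numericA || $numericB) {
            // At least one side is purely numeric: an alphanumeric field counts as 0.
            $cmp = threeWay(
                $numericA ? (int) $partsA[$i] : 0,
                $numericB ? (int) $partsB[$i] : 0
            );
        } else {
            // Both sides are alphanumeric: plain string comparison.
            $cmp = threeWay(strcmp($partsA[$i], $partsB[$i]), 0);
        }

        if ($cmp !== 0) {
            return $cmp;
        }
    }

    return 0;
}
```

With this, compareLooseVersions('5.2-alpha-foo', '5.2.49.4-beta-bar') returns -1, because "alpha" is treated as 0 when it meets 49.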

For reference, rpm compares version strings something like this:
Split on all non-alphanumeric characters
Group consecutive numeric characters together and all non-numeric characters together (e.g. 1.12.ab002 is split into 1, 12, ab, 002)
Compare each group left to right
If both versions have a numeric group, they are compared as numbers (e.g. 1 = 001 and 12 > 5)
If either is a non-numeric group, a simple string comparison is performed
The first non-equal comparison is the result
Longer versions are considered greater (e.g. 1.2.3 < 1.2.3.0 and alp < alpha)
This has flaws: 1.2.3rc1 > 1.2.3 and 1.2.3alpha > 1.2.3, which may not be right
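The steps listed above could be sketched in PHP roughly as follows. This only follows the description in this answer; it is not rpm's actual code, and the names tokenizeVersion() and compareRpmStyle() are invented for the example:

```php
<?php
// Hypothetical helper: runs of digits or runs of letters become groups,
// everything else acts as a separator.
function tokenizeVersion($version)
{
    preg_match_all('/[0-9]+|[A-Za-z]+/', $version, $matches);
    return $matches[0];
}

// Hypothetical helper: returns -1, 0 or 1.
function compareRpmStyle($a, $b)
{
    $groupsA = tokenizeVersion($a);
    $groupsB = tokenizeVersion($b);

    $shared = min(count($groupsA), count($groupsB));
    for ($i = 0; $i < $shared; $i++) {
        $x = $groupsA[$i];
        $y = $groupsB[$i];

        if (ctype_digit($x) && ctype_digit($y)) {
            // Both numeric: compare as numbers, so 1 equals 001.
            $cmp = (int) $x - (int) $y;
        } else {
            // Either one is non-numeric: simple string comparison.
            $cmp = strcmp($x, $y);
        }

        if ($cmp != 0) {
            return $cmp < 0 ? -1 : 1;
        }
    }

    // All shared groups are equal: the version with more groups is greater.
    $diff = count($groupsA) - count($groupsB);
    return $diff < 0 ? -1 : ($diff > 0 ? 1 : 0);
}
```

It reproduces the flaw mentioned above: compareRpmStyle('1.2.3rc1', '1.2.3') returns 1.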

Related

Store many numbers as a single unique number

I need to store many numbers (I can decide which numbers) as a single unique number from which I should be able to retrieve the original numbers.
I already know 2 ways to do this:
1) Fundamental theorem of arithmetic (Prime Numbers)
Say I have 5 values, I assign a distinct prime number to each value
a = 2
b = 3
c = 5
d = 7
e = 13
If i want to store a, b and c i can multiply them 2*3*5=30 and i know no other product of primes can be 30. Then to check if a value contains, for example, b, all i need to do is 30 % b == 0
2) Bitmask
Just like Linux permissions, use powers of 2 and sum each value
But these 2 methods grow up fast (1st way faster than 2nd), and using prime numbers requires me to have a lot of primes.
Is there any other method to do this efficiently when you have, for example, a thousand values?
If you are storing, say, base 10 numbers, then do a conversion through base 11 numbers. With the increased base, you have an extra 'digit'. Use that digit as a separator. So, three base 10 numbers "10, 42, 457" become "10A42A457": a single base 11 number (with 'A' as the additional digit).
Whatever base your original numbers are in, increase the base by 1 and concatenate, using the extra digit as a separator. That will give you a single number in the increased base.
That single number can be stored in whatever number base you find convenient: binary, denary or hex for example.
To retrieve your original numbers just convert to base 11 (or whatever) and replace the extra digit with separators.
ETA: You don't have to use base 11. The single number "10A42A457" is also a valid hexadecimal number, so any base of 11 or above could be used. Hex may be easier to work with than base 11.
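A rough sketch of the separator idea in PHP, assuming non-negative integers and a result that still fits in a native integer (for anything longer you would need GMP or BCMath, which this sketch does not use). The function names are made up; 'a' is lowercase because that is what base_convert() produces:

```php
<?php
// Hypothetical helper: joins decimal numbers with the extra base 11 digit
// and reads the result as one base 11 number.
function packWithSeparator(array $numbers)
{
    $digits = implode('a', $numbers);   // e.g. "10a42a457"
    return intval($digits, 11);         // interpret the string as base 11
}

// Hypothetical helper: converts back to base 11 and splits on the extra digit.
function unpackWithSeparator($packed)
{
    $digits = base_convert((string) $packed, 10, 11);
    return array_map('intval', explode('a', $digits));
}

$packed = packWithSeparator(array(10, 42, 457));
print_r(unpackWithSeparator($packed)); // the three original numbers
```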
Is there any other method to do this efficiently when you have, for example, a thousand values?
I am not a mathematician but it's basic math, it all depends on the range
Range 0-1: You want to store 4 numbers 0-1 - it's basically binary system
Number1 + Number2 * 2^1 + Number3 * 2^2 + Number4 * 2^3
Range 0-49: You want to store 4 numbers 0-49
Number1 + Number2 * 50^1 + Number3 * 50^2 + Number4 * 50^3
Range 0-X You want to store N numbers 0-X
Number1 + Number2 * (X+1)^1 + Number3 * (X+1)^2 + ... + NumberN * (X+1)^(N-1)
If you have no pattern for your numbers (so it can get compressed in some way) there is really no other way.
It's also super easy for computer to resolve the number unlike the prime numbers
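For illustration, a small sketch of that formula in PHP. The names packFixedRange() and unpackFixedRange() are invented here, $base stands for X+1, and intdiv() needs PHP 7 or later:

```php
<?php
// Hypothetical helper: Number1 + Number2 * base^1 + ... + NumberN * base^(N-1).
function packFixedRange(array $numbers, $base)
{
    $packed = 0;
    // Number1 is the lowest "digit", so build the value from the last number down.
    foreach (array_reverse($numbers) as $n) {
        $packed = $packed * $base + $n;
    }
    return $packed;
}

// Hypothetical helper: peels the numbers off again, lowest "digit" first.
function unpackFixedRange($packed, $base, $count)
{
    $numbers = array();
    for ($i = 0; $i < $count; $i++) {
        $numbers[] = $packed % $base;
        $packed = intdiv($packed, $base);
    }
    return $numbers;
}

$packed = packFixedRange(array(3, 49, 0, 12), 50);
print_r(unpackFixedRange($packed, 50, 4)); // 3, 49, 0, 12
```

The caller has to know the base and how many numbers were stored, since trailing zeros are indistinguishable from "nothing there".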
Predetermined values
@FlorianK's comment pointed me to a fact I missed
(i can decide which numbers)
The only logical solution is to give your numbers references
0 is 15342
1 is 6547
2 is 76234
3 is "i like stack overflow"
4 is 42141
so you'll work range 0-4 (5 options) and whatever combination length. Use reference when "encoding" and "decoding" the number
a thousand values?
so you'll work with Range 0-999
0 is 62342
1 is 7456345653
2 is 45656234532
...
998 is 7623452
999 is 4324234326453
Let's say you use 64-bit system and programming/db language that works with 64-bit integers
2^64 = 18446744073709551616
your max range is 1000^X < 18446744073709551616 where X is number of numbers you can store in one single 64-bit integer number
Which is only 6.
You can store only 6 separate numbers 0-999 that will fit one 64-bit integer number.
0,0,0,0,0,0 is 0
1,0,0,0,0,0 is 1
0,1,0,0,0,0 is 1000
999,999,999,999,999,999 is ~1e+18
Ok so you want to store "a,b,c" or "a,b" or "a,b,c,d" or "a" etc. (thanks #FlorianK)
in that case you could just use bitwise operators and powers of two
$a = 1 << 0; // 1
$b = 1 << 1; // 2
$c = 1 << 2; // 4
$d = 1 << 3; // 8
.. etc
let's say $flag has $a and $c
$flag = $a | $c; // $flag is integer here
now check it
$ok = ($flag & $a) && ($flag & $c); // true
$ok = ($flag & $a) && ($flag & $b); // false
so in a 64-bit system/language/OS you can use up to 64 flags, which gives you 2^64 combinations
there is really no other option. prime numbers are much worse for this as you skip many numbers in-between, while the binary system uses every single number.
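To go from a stored integer back to the list of values, you can loop over the known flags. A small sketch, where the $flags table and the helper names are assumptions made up for this example:

```php
<?php
// Hypothetical lookup table: one bit per storable value.
$flags = array(
    'a' => 1 << 0, // 1
    'b' => 1 << 1, // 2
    'c' => 1 << 2, // 4
    'd' => 1 << 3, // 8
);

// Hypothetical helper: combines the named values into one integer.
function encodeFlags(array $names, array $flags)
{
    $mask = 0;
    foreach ($names as $name) {
        $mask |= $flags[$name];
    }
    return $mask;
}

// Hypothetical helper: lists every name whose bit is set in $mask.
function decodeFlags($mask, array $flags)
{
    $names = array();
    foreach ($flags as $name => $bit) {
        if (($mask & $bit) !== 0) {
            $names[] = $name;
        }
    }
    return $names;
}

$mask = encodeFlags(array('a', 'c'), $flags); // 5
print_r(decodeFlags($mask, $flags));          // a, c
```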
I see you are using database and you want to store this in DB.
I really think we are dealing here with XY Problem and you should reconsider your application instead of making such workarounds.

Encode/Decode ID Reversal issue

SCENARIO:
A) You have a charset of 100 in which the first characters are A, B, C and the last characters are -, _.
B) The encode function returns a string of length 10.
C) The encode converts a number into the correlating number in the charset
Example: A == 0 || B == 1 || C == 2 || - == 98 || _ == 99
Amount of possibilities: 100 ^ 10 = 1e+20 || 100,000 Quadrillion || 100,000,000,000 Billion.
PROBLEM: How would you figure out whether 999 is iii, _i or i_?
Note: The solution to the problem sketched above should work for every possible situation
looks like homework...
lets have a look at our problem:
999 can not be represented as a single char in our charset
we can encode it in 3 different ways
9 9 9 => I I I
99 9 => _I
9 99 => I_
now... a charset alone does not make an encoding ... at this point you should probably read up about what a "code" is ... http://en.wikipedia.org/wiki/Code
please notice that this has absolutely nothing to do with encryption ...
so ... we need a ruleset for encoding/decoding our code
since we are supposed to make that ruleset, it is our free choice how we handle things, as long as we keep in mind what other key rules we have to follow...
the code shall be 10 characters long ... at max from what i see, or else III wouldn't possibly be a valid example of our code ... AAAAAAAIII would be ... so lets assume that we may drop leading zeros, or As in this case, and further assume that III and AAAAAAAIII are identical
now we have the given fact that our code has 100^10 possible codewords, which can only be achieved if every combination of our charset with a length of 10 is a valid codeword
so all three ... III and I_ and _I ... have to be valid codewords ...
does that mean that all three have the value of 999?
short: no
long:
as mentioned earlier, there is a ruleset needed to give the code a meaning...
since there is no encoding ruleset given, we seem to be free to create one...
lets have a look at the ruleset to encode our regular base 10 numbers ...
we have a charset from 0 to 9 -> 10 digits
the position of a digit in a number contains information...
123 for example can be written as 1*10^2 + 2*10^1 + 3*10^0
if we transfer this to our new encoding ... let's call it base 100 ... it would look like this:
123 -> 1*100^1 + 23*100^0
=> 1=B ... 23=X => 123 -> BX
999 -> 9*100^1 + 99*100^0 -> I_
but who says we have to declare the left most digit in our code to be the most significant digit?
what if we would interpret it otherwise?
isn't 99*100^0 + 9*100^1 = 999 too?
yes ... therefore we could write it as _I too ...
which one is the correct one now? ... that ONLY depends on the ruleset of our code ... if it says the leftmost digit is the most significant one, the answer is I_ ... if the rightmost digit is the most significant one, the answer is _I
as long as the ruleset for the encoding is not specified, this question cannot be answered ... you can only try to make an educated guess, and use the same convention as in our "normal" base 10 encoding ... leftmost digit = most significant digit -> I_
but please keep in mind ... this is a guess ... if i'd get such a question in a test, i'd explain why there is no answer unless the encoding rules have been specified.
tldr:
with the provided information, it's a free choice if it is i_ or _i
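to make the two conventions concrete, here is a small PHP sketch that works on charset indices (0 to 99) instead of characters, since the full charset is not given. the function names and the $mostSignificantFirst switch are made up for this example, and intdiv() needs PHP 7 or later:

```php
<?php
// Hypothetical helper: splits a number into base 100 digits (charset indices).
function toBase100Digits($number, $mostSignificantFirst = true)
{
    $digits = array();
    do {
        $digits[] = $number % 100;          // collects the least significant digit first
        $number = intdiv($number, 100);
    } while ($number > 0);

    return $mostSignificantFirst ? array_reverse($digits) : $digits;
}

// Hypothetical helper: the matching decoder, which must use the same convention.
function fromBase100Digits(array $digits, $mostSignificantFirst = true)
{
    if (!$mostSignificantFirst) {
        $digits = array_reverse($digits);
    }
    $number = 0;
    foreach ($digits as $digit) {
        $number = $number * 100 + $digit;
    }
    return $number;
}

print_r(toBase100Digits(999));        // 9, 99  (leftmost digit most significant)
print_r(toBase100Digits(999, false)); // 99, 9  (rightmost digit most significant)
```

decoding 99, 9 with the wrong convention gives 9909 instead of 999, which is why the ruleset has to be fixed before the question has an answer.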

& as an Arithmetic Operator in PHP [duplicate]

Possible Duplicate:
Reference - What does this symbol mean in PHP?
I am working on some legacy code where I came across the following function:
function is_odd($number) {
    return $number & 1; // 0 = even, 1 = odd
}
I have never seen a method to check if a number is odd written like that before and am just trying to understand what they actually are doing.
& is a bitwise-and, it basically works so that for every bit that is 1 in both operands, it yields 1 in the resulting value and 0 for all other bits. If you convert any number to its bit representation, you quickly see that it's the lowest bit that determines whether a number is even or odd, for example:
5 = 101
10 = 1010
13 = 1101
1030 = 10000000110
The lowest bit (the rightmost one, also called the least significant bit) is 1 for every odd number and 0 for every even number. $n & 1 always yields 0 for every bit other than the lowest, because the number 1 has only one bit set; you can imagine it left-padded with 0s to match the length of the other operand. The operation therefore boils down to comparing the lowest bits: 1 & 1 is 1, and every other combination is 0. So when $n & 1 yields 1 the number is odd, otherwise it is even.
EDIT.
Here's a few examples to demonstrate how the bitwise-and works for the example values I gave earlier, the number in the parenthesis is the original decimal number:
  101 (5)
& 001 (1)
  ---
  001 (1) = true
  1010 (10)
& 0001 (1)
  ----
  0000 (0) = false
  1101 (13)
& 0001 (1)
  ----
  0001 (1) = true
  10000000110 (1030)
& 00000000001 (1)
  -----------
  00000000000 (0) = false
From this you can easily see that the result is only true when both operands' right-most bits are 1.
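If you want to check this yourself, a short script along these lines prints the bit patterns next to the result. isOddBitwise() is just a name picked for this sketch; it returns a boolean instead of the 0/1 the legacy function returns:

```php
<?php
// Hypothetical variant of the legacy function that returns a real boolean.
function isOddBitwise($number)
{
    return ($number & 1) === 1;
}

foreach (array(5, 10, 13, 1030) as $n) {
    // decbin() returns the binary representation as a string.
    printf("%4d = %11s -> %s\n", $n, decbin($n), isOddBitwise($n) ? 'odd' : 'even');
}
```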

Alternating table row styles in PHP - strange usage of bitwise operator

Looking at some code written by another developer, I came across this:
for($i=1; $i<=30; $i++)
{
    if($i&1)
        $color = '#fff';
    else
        $color = '#bbb';
}
This $color variable is used for row background colour later in the code. The alternating colours work fine.
If I was writing this, I would have used the modulus operator (%) rather than the bitwise (&) operator.
Why does the bitwise operator work in this case? Is there any advantage of using this method rather than the modulus operator?
The & operator does a bitwise AND on the two numbers. So if you do
$i & 1
it will then tell you if the '1' flag is set, such as in binary:
001010111010
The last number is the '1' flag (remember, binary goes 1, 2, 4, 8 etc. in reverse order), which in this case is set to 0.
Since 1 is the only odd flag in binary, it will tell you if the number is odd or even.
if $i is 3 for example, then in binary it will be 011 - the last number is a 1 (the 1 flag) and thus $i & 1 will be true.
if $i is 4 for example, then in binary it will be 100 - the last number is a 0 (the 1 flag) and thus $i & 1 will be false.
It works because the lowest (rightmost) bit is always 1 if the number is odd and 0 if the number is even.
1
10
11
100
101
110
111
etc.
In theory bitwise operation is faster than the modulus operation, but it's possible that the interpreter would have optimized the modulus operation down to bitwise operation anyway.
Why the other developer used it, we can only guess: out of habit, copy-pasted from somewhere, doesn't know about the modulus operator, showing off, wanting to optimize...
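A quick way to convince yourself that both approaches pick the same colours for the loop above. The two function names are made up for this sketch. The last two lines show the one place where the operators differ in PHP, negative numbers, which only matters if you compare the modulus result against 1 explicitly:

```php
<?php
// Hypothetical helper: row colour chosen with the bitwise operator.
function rowColorBitwise($i)
{
    return ($i & 1) ? '#fff' : '#bbb';
}

// Hypothetical helper: row colour chosen with the modulus operator.
function rowColorModulo($i)
{
    return ($i % 2) ? '#fff' : '#bbb';
}

for ($i = 1; $i <= 30; $i++) {
    if (rowColorBitwise($i) !== rowColorModulo($i)) {
        echo "mismatch at row $i\n"; // never printed
    }
}

var_dump(-3 & 1); // int(1)
var_dump(-3 % 2); // int(-1): still truthy, but not equal to 1
```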

modulus operand in ruby compared to php

I'm looking for a simple explanation for how Ruby's modulo operator works and why, in Ruby
puts 4 % 3 # 1
puts -4 % 3 # 2 <--why?
puts -4 % -3 # -1
but in PHP:
<?php
echo 4 % 3; # 1
echo -4 % 3; # -1
echo -4 % -3; # -1
Looks to me like -4 % 3 is actually 8 % 3 (8 being the difference between 4 and -4).
They can both be considered correct, depending on your definition. If a % n == r, then it should hold that:
a == q*n + r
where q == a / n (integer division).
Whether r is positive or negative is determined by the value of q. So in your example, either of:
-4 == -1*3 + (-1) // PHP
-4 == -2*3 + 2 // Ruby
To put it another way, the definition of % depends on the definition of /.
See also the table here: http://en.wikipedia.org/wiki/Modulus_operator#Remainder_calculation_for_the_modulo_operation. You'll see that this varies substantially between various programming languages.
Here's a snippet on the topic from The Ruby Programming Language, by Matz and David Flanagan.
When one (but not both) of the operands is negative, Ruby performs the
integer division and modulo operations differently than languages like
C, C++, and Java do (but the same as the languages Python and Tcl).
Consider the quotient -7/3. Ruby rounds toward negative infinity and
returns -3. C and related languages round toward zero instead and
return -2. In Ruby, -a/b equals a/-b but may not equal -(a/b).
Ruby's definition of the modulo operation also differs from that of C
and Java. In Ruby, -7%3 is 2. In C and Java, the result is -1
instead. The magnitude of the result differs, because the quotient
differed. But the sign of the result differs, too. In Ruby, the sign
of the result is always the sign of the second operand. In C and
Java, the sign of the result is always the sign of the first operand.
(Ruby's remainder method behaves like the C modulo operator.)
It actually boils down to the implementation of the language's integer casting/rounding. Since the actual equation is:
a - (n * int(a/n))
It is the int(a/n) portion of the equation that differs. If a == -4 and n == 3, int(a/n) is -1 in PHP (rounding toward zero), while in Ruby it is -2 (rounding toward negative infinity). Now the equation looks like this in Ruby:
-4 - (3 * -2)
and this in PHP
-4 - (3 * -1)
According to Wolfram Alpha, 2 is correct.
edit: Seems you should be asking why PHP works that way?
edit2: PHP defines it as the remainder from the division A/B. Whether you consider it a bug, wrong, or a different way of doing things is up to you, I suppose. Personally, I go for the first 2.
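If you need Ruby's result in PHP, one common trick is to add the divisor and reduce again. A sketch, with flooredMod() being a name invented here rather than a built-in (intdiv() needs PHP 7 or later):

```php
<?php
// Hypothetical helper: modulo whose result takes the sign of the divisor,
// which is how Ruby's % behaves.
function flooredMod($a, $n)
{
    return (($a % $n) + $n) % $n;
}

var_dump(-4 % 3);            // int(-1)  PHP's remainder, sign of the dividend
var_dump(flooredMod(-4, 3)); // int(2)   same as Ruby's -4 % 3

// Both satisfy a == q*n + r, only with different quotients:
var_dump(intdiv(-4, 3));        // int(-1)  rounds toward zero
var_dump((int) floor(-4 / 3));  // int(-2)  rounds toward negative infinity
```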
