Encode/Decode ID Reversal issue - php

SCENARIO:
A) You have a charset of 100 in which the first characters are A, B, C and the last characters are -, _.
B) The encode function returns a string of length 10.
C) The encode converts a number into the correlating number in the charset
Example: A == 0 || B == 1 || C == 2 || - == 98 || _ == 99
Amount of possibilities: 100 ^ 10 = 1e+20 || 100,000 Quadrillion || 100,000,000,000 Billion.
PROBLEM: How would you figure out whether 999 is iii, _i or i_?
Note: The solution to the problem sketched above should work for every possible situation

looks like homework...
lets have a look at our problem:
999 can not be represented as a single char in our charset
we can encode it in 3 different ways
9 9 9 => I I I
99 9 => _I
9 99 => I_
now... a charset alone does not make an encoding ... at this point you should probably read up about what a "code" is ... http://en.wikipedia.org/wiki/Code
please notice that this has absolutely nothing to do with encryption ...
so ... we need a ruleset for encoding/decoding our code
since we are supposed to make that ruleset, it is our free choice how we handle things, as long as we keep in mind what other key rules we have to follow...
the code shall be 10 characters long ... at max from what i see, or else III wouldn't possibly be a valid example of our code ... AAAAAAAIII would be ... so lets assume that we may drop leading zeros, or As in this case, and further assume that III and AAAAAAAIII are identical
now we have the given fact that our code has 100^10 possible codewords, which can only be achieved if every combination of our charset with a length of 10 is a valid codeword
so all three ... III and I_ and _I ... have to be valid codewords ...
does that mean that all three have the value of 999?
short: no
long:
as mentioned earlier, there is a ruleset needed to give the code a meaning...
since there is no encoding ruleset given, we seem to be free to create one...
lets have a look at the ruleset to encode our regular base 10 numbers ...
we have a charset from 0 to 9 -> 10 digits
the position of a digit in a number contains information...
123 for example can be written as 1*10^2 + 2*10^1 + 3*10^0
if we transfer this to our new encoding ... let's call it base 100 ... it would look like this:
123 -> 1*100^1 + 23*100^0
=> 1=B ... 23=X => 123 -> BX
999 -> 9*100^1 + 99*100^0 -> I_
but who says we have to declare the left most digit in our code to be the most significant digit?
what if we would interpret it otherwise?
isn't 99*100^0 + 9*100^1 = 999 too?
yes ... therefore we could write it as _I too ...
which one is the correct one now? ... that ONLY depends on the ruleset of our code ... if it says the leftmost digit is the most significant one, the answer is I_ ... if the rightmost digit is the most significant one, the answer is _I
as long as the ruleset for the encoding is not specified, this question cannot be answered ... you can only try to make an educated guess, and use the same convention as in our "normal" base 10 encoding ... leftmost digit = most significant digit -> I_
but please keep in mind ... this is a guess ... if i'd get such a question in a test, i'd explain why there is no answer unless the encoding rules have been specified.
tldr:
with the provided information, it's a free choice if it is i_ or _i
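A minimal sketch of the "leftmost digit = most significant digit" convention, assuming we work with digit values 0-99 instead of characters (the full 100-character charset is not given, so mapping a value to a character is left as a lookup). The function names are made up for illustration.

```php
<?php
// Convert a non-negative integer to base-100 digit values, most significant first.
function toBase100Digits(int $n): array
{
    if ($n < 0) {
        throw new InvalidArgumentException('only non-negative numbers are supported');
    }
    $digits = [];
    do {
        array_unshift($digits, $n % 100); // peel off the least significant digit
        $n = intdiv($n, 100);
    } while ($n > 0);
    return $digits;
}

// Inverse: read the digit values left to right, most significant first.
function fromBase100Digits(array $digits): int
{
    $n = 0;
    foreach ($digits as $d) {
        $n = $n * 100 + $d;
    }
    return $n;
}

echo implode(' ', toBase100Digits(999)), "\n"; // 9 99
echo fromBase100Digits([9, 9, 9]), "\n";       // 90909
echo fromBase100Digits([99, 9]), "\n";         // 9909
```

With a fixed ruleset every codeword has exactly one value, so the three candidates stand for three different numbers. Note that PHP integers are 64 bit, so the full range of 100^10 codewords would need GMP or BCMath instead of plain ints.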


How to calculate the maximum length of the output of the Laravel encryption method? [duplicate]

Setup
Given the following:
$s = Crypt::encryptString('a');
Is it possible to know, for a string of length 1, the possible range of lengths of $s?
Context
Database storage - need to store an encrypted value, and would like to set validation of the input string so the longest length input string, when encrypted, is inserted into the db without truncation.
Basic tests
Running some very crude tests locally, using the following snippet:
Route::get('/test', function() {
echo '<table>';
for ($i=0; $i < 100; $i++) {
$s = str_repeat('a', $i);
$l1 = strlen($s);
$l2 = strlen(Crypt::encryptString($s));
echo "<tr><td>$l1</td><td>$l2</td></tr>";
}
echo '</table>';
});
I can see the following, but it varies between runs. For example, a string of 'a' will have a length of either 188 or 192 (longer values seem to be between 244 and 248).
So there must be a formula. I have seen output_size = input_size + (16 - (input_size % 16)), but that doesn't account for the variance.
Output
0 192
1 188
2 188
3 192
4 188
5 188
6 188
7 192
8 192
9 188
10 188
11 192
12 192
13 192
14 192
15 192
16 220
17 220
18 216
19 216
20 220
Edit
Ok, so after chatting with @Luke Joshua Park below, the variance in length comes from the laravel encryption function and the way $iv is created, which is random bytes, which can contain /.
$value inside the encryption method can also contain a /.
When values that contain a / are JSON encoded, the / is escaped to \\\/ adding an additional 3 characters per occurrence.
The real problem - can $iv and $value contain more than a single '/'?
Looking through the source code for Crypt::encryptString, we can see that the final result will be a base64 encoded JSON object that has the following structure:
{ "iv": "<128 bits in base64>", "value": "<x bits in base64>", "mac": "<256 bits in hex>" }
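Where the value of x is (floor(n / 128) + 1) * 128, where n is the number of bits in the original plaintext (the padding always adds at least one byte, so a plaintext that is already a multiple of 128 bits gains a full extra block).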
This means that, for an input plaintext of length 1, the size of the output should be:
24 characters for the IV (base64).
24 characters for the ciphertext (base64).
64 characters for the SHA256 mac (hex).
10 characters for the names of the JSON fields.
19 characters of extra JSON characters e.g. {, ", :.
A final round of base64 encoding of the whole thing... (ceil(141 / 3) * 4)
Gives a total of 188. The fluctuations up to 192 are odd - your inputs are not changing in size at all (the padded plaintext is always 16 bytes for input lengths 0 - 15).
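The tally above can be sketched as a small function. This is only a baseline, assuming that no character in the JSON gets escaped and that the cipher pads the plaintext up to the next full 16-byte block (always adding at least one byte). The function names are made up; nothing here calls Laravel itself.

```php
<?php
// Length of the padded base64 encoding of $bytes input bytes.
function base64Length(int $bytes): int
{
    return 4 * intdiv($bytes + 2, 3);
}

// Baseline length of the final payload for an $n-byte plaintext.
// Assumption: nothing inside the JSON needs escaping.
function baselineEncryptedLength(int $n): int
{
    $iv        = base64Length(16);           // 24 characters
    $padded    = (intdiv($n, 16) + 1) * 16;  // padding adds 1..16 bytes
    $value     = base64Length($padded);      // ciphertext in base64
    $mac       = 64;                         // SHA256 mac in hex
    $names     = 10;                         // "iv", "value", "mac"
    $structure = 19;                         // braces, quotes, colons, commas
    return base64Length($iv + $value + $mac + $names + $structure);
}

echo baselineEncryptedLength(1), "\n";  // 188
echo baselineEncryptedLength(16), "\n"; // 216
```

These match the lower of the two values seen in the test output for lengths 1 and 16; the higher values are what the baseline cannot explain.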
The real problem - can $iv and $value contain more than a single '/'?
Sure. Your worst case for the IV is the IV FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF (hex), which has a Base64 value of /////////////////////w==.
21 forward slashes * extra 3 bytes each = 63 extra bytes.
For the HMAC-SHA-2-256, you could get 32 bytes of 0xFF (worst case), which is //////////////////////////////////////////8= in base64.
42 forward slashes => 126 extra bytes.
For the ciphertext, again, the entire output could be (but likely isn't) FF FF ... FF. All one letter inputs (no matter what encoding) are a single block of ciphertext, making the output be /////////////////////w== again (+63).
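The two worst-case base64 strings quoted above are easy to reproduce with PHP's own base64_encode, as a quick sanity check of the slash counts:

```php
<?php
// 16 bytes of 0xFF: the all-ones IV (or a single all-ones ciphertext block).
$block = base64_encode(str_repeat("\xFF", 16));
// 32 bytes of 0xFF: an all-ones 256-bit value.
$wide = base64_encode(str_repeat("\xFF", 32));

echo $block, ' -> ', substr_count($block, '/'), " slashes\n"; // 21 slashes
echo $wide, ' -> ', substr_count($wide, '/'), " slashes\n";   // 42 slashes
```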
The generalized formula for the maximum seems to be
IV: 24 + 63 = 87
HMAC: 24 + 63 = 87
JSON Property Names: 10
JSON Structure: 19
Ciphertext: ceil(ceil((n+1) / 16) * 16 / 3) * 4 * 4 (I used n as bytes. padded ciphertext is ceil((n+1) / blocksize) * blocksize, base64 is 4 * ceil(data / 3), extra *4 is "everything is slashes")
Base64 it all again: 4 * ceil(sum / 3)
= 4 * ceil((4 * 4 * ceil(16 * ceil((n + 1) / 16) / 3) + 203) / 3)
For n=1 that produces 400 bytes. The actual maximum is (I think) 388, because the ciphertext formula is counting 24 slashes as the worst case when 21 is the worst case. So the true supremum needs to call the ciphertext something more complicated involving floor, ceiling, and subtraction.
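For playing with other values of n, here is the formula above transcribed literally into PHP. It is only a transcription, so it inherits every assumption of the formula (including the over-counted ciphertext slashes); the function names are made up.

```php
<?php
// Integer ceil($a / $b) for positive integers.
function ceilDiv(int $a, int $b): int
{
    return intdiv($a + $b - 1, $b);
}

// 4 * ceil((4 * 4 * ceil(16 * ceil((n + 1) / 16) / 3) + 203) / 3), n in bytes.
function statedMaxLength(int $n): int
{
    $padded     = 16 * ceilDiv($n + 1, 16);    // padded ciphertext size in bytes
    $ciphertext = 4 * 4 * ceilDiv($padded, 3); // base64, then "everything is slashes"
    return 4 * ceilDiv($ciphertext + 203, 3);  // base64 it all again
}

echo statedMaxLength(1), "\n"; // 400
```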
Note I'm going to award the bounty to @Luke Joshua Park as he got me closest to what ended up being the (closest thing to a) solution, which is to follow.
(Not a) solution
The answer is, there is no concrete answer, not without unknowns and variance. Across the three people looking at this at the time of writing (myself, Luke, and bartonjs) there was still some doubt to a 100% accurate solution.
The question was posed to figure out a reliable type and size to store encrypted data, ideally in a database independent fashion (I didn't want to specify a particular database, as I wanted to know and understand how to calculate a length regardless of the way it was persisted).
However, even strings of the smallest lengths turned out to be quite long in the worst case scenario (where a random $iv was created containing many slashes - unlikely or not, it was possible). Possible encrypted strings of n=1 possibly being 400 bytes long mean that a varchar will never be the right answer.
So... what should be done?
So, instead, it seems best, most consistent and most reliable to store encrypted data as a text field and not a varchar (in mysql land), regardless of the length of the original string. This is a disappointingly boring answer with no fancy maths involved. It's not the answer I would like to accept, but makes the most sense.
But, what about passwords?
In a brief moment of stupidity, I thought, but what about the password field? That is a varchar. But of course that is a hashed value, not an encrypted value (I hadn't had enough coffee when that thought popped into my head, ok?)

Store many numbers as a single unique number

I have the necessity to store many numbers (i can decide which numbers) as a single unique number from which i should be able to retrieve the original number.
I already know 2 ways to do this:
1) Fundamental theorem of arithmetic (Prime Numbers)
Say i have 5 values, i assign a distinct prime number to each value
a = 2
b = 3
c = 5
d = 7
e = 13
If i want to store a, b and c i can multiply them 2*3*5=30 and i know no other product of primes can be 30. Then to check if a value contains, for example, b, all i need to do is 30 % b == 0
2) Bitmask
Just like Linux permissions, use powers of 2 and sum each value
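For reference, method 1 can be sketched in PHP like this, using the primes from the example. The helper names are made up for illustration.

```php
<?php
// Each value gets its own prime, as in the example.
$primes = ['a' => 2, 'b' => 3, 'c' => 5, 'd' => 7, 'e' => 13];

// Multiply the primes of all stored values into one number.
function packSet(array $names, array $primes): int
{
    $product = 1;
    foreach ($names as $name) {
        $product *= $primes[$name];
    }
    return $product;
}

// A value is contained if its prime divides the packed number.
function containsValue(int $packed, string $name, array $primes): bool
{
    return $packed % $primes[$name] === 0;
}

$packed = packSet(['a', 'b', 'c'], $primes);
echo $packed, "\n";                              // 30
var_dump(containsValue($packed, 'b', $primes));  // bool(true)
var_dump(containsValue($packed, 'd', $primes));  // bool(false)
```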
But both methods grow fast (the first faster than the second), and using prime numbers requires me to have a lot of primes.
Is there any other method to do this efficiently when you have, for example, a thousand values?
If you are storing, say, base 10 numbers, then do a conversion through base 11 numbers. With the increased base, you have an extra 'digit'. Use that digit as a separator. So, three base 10 numbers "10, 42, 457" become "10A42A457": a single base 11 number (with 'A' as the additional digit).
Whatever base your original numbers are in, increase the base by 1 and concatenate, using the extra digit as a separator. That will give you a single number in the increased base.
That single number can be stored in whatever number base you find convenient: binary, denary or hex for example.
To retrieve your original numbers just convert to base 11 (or whatever) and replace the extra digit with separators.
ETA: You don't have to use base 11. The single number "10A42A457" is also a valid hexadecimal number, so any base of 11 or above could be used. Hex may be easier to work with than base 11.
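A minimal PHP sketch of the separator-digit idea, assuming non-negative integers and a result small enough for a native integer. intval() with a base and base_convert() are standard PHP functions, but intval() saturates at PHP_INT_MAX and base_convert() can lose precision on large numbers, so anything longer would need GMP or BCMath. The function names are made up.

```php
<?php
// Join the decimal numbers with the extra digit 'A' and read the
// result as one base-11 number.
function packNumbers(array $numbers): int
{
    return intval(implode('A', $numbers), 11); // e.g. "10A42A457"
}

// Convert back to base 11 and split on the separator digit
// (base_convert returns lowercase letters, hence 'a').
function unpackNumbers(int $packed): array
{
    $base11 = base_convert((string) $packed, 10, 11);
    return array_map('intval', explode('a', $base11));
}

$packed = packNumbers([10, 42, 457]);
print_r(unpackNumbers($packed)); // 10, 42, 457 again
```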
Is there any other method to do this efficiently when you have, for example, a thousand values?
I am not a mathematician but it's basic math, it all depends on the range
Range 0-1: You want to store 4 numbers 0-1 - it's basically binary system
Number1 + Number2 * 2^1 + Number3 * 2^2 + Number4 * 2^3
Range 0-49: You want to store 4 numbers 0-49
Number1 + Number2 * 50^1 + Number3 * 50^2 + Number4 * 50^3
Range 0-X You want to store N numbers 0-X
Number1 + Number2 * (X+1)^1 + Number3 * (X+1)^2 + ... + NumberN * (X+1)^(N-1)
If you have no pattern for your numbers (so it can get compressed in some way) there is really no other way.
It's also super easy for computer to resolve the number unlike the prime numbers
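A short sketch of the positional formula above, where $base is X+1 (so 50 for the range 0-49). The function names are made up, and the whole thing assumes the packed value fits into a native 64-bit integer.

```php
<?php
// Number1 + Number2 * base^1 + Number3 * base^2 + ...
function packValues(array $values, int $base): int
{
    $packed = 0;
    $weight = 1;
    foreach ($values as $value) {
        if ($value < 0 || $value >= $base) {
            throw new InvalidArgumentException("value $value is outside 0.." . ($base - 1));
        }
        $packed += $value * $weight;
        $weight *= $base;
    }
    return $packed;
}

// Reverse it with repeated modulo and integer division.
function unpackValues(int $packed, int $base, int $count): array
{
    $values = [];
    for ($i = 0; $i < $count; $i++) {
        $values[] = $packed % $base;
        $packed = intdiv($packed, $base);
    }
    return $values;
}

$packed = packValues([3, 49, 0, 17], 50);
print_r(unpackValues($packed, 50, 4)); // 3, 49, 0, 17 again
```

The count has to be known (or stored separately) when decoding, because trailing zeros do not change the packed number.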
Predetermined values
@FlorianK's comment pointed me to a fact I missed
(i can decide which numbers)
The only logical solution is give your numbers references
0 is 15342
1 is 6547
2 is 76234
3 is "i like stack overflow"
4 is 42141
so you'll work range 0-4 (5 options) and whatever combination length. Use reference when "encoding" and "decoding" the number
a thousand values?
so you'll work with Range 0-999
0 is 62342
1 is 7456345653
2 is 45656234532
...
998 is 7623452
999 is 4324234326453
Let's say you use 64-bit system and programming/db language that works with 64-bit integers
2^64 = 18446744073709551616
your max range is 1000^X < 18446744073709551616 where X is number of numbers you can store in one single 64-bit integer number
Which is only 6.
You can store only 6 separate numbers 0-999 that will fit one 64-bit integer number.
0,0,0,0,0,0 is 0
1,0,0,0,0,0 is 1
0,1,0,0,0,0 is 1000
999,999,999,999,999,999 is ~1e+18
Ok so you want to store "a,b,c" or "a,b" or "a,b,c,d" or "a" etc. (thanks @FlorianK)
in such a case you could just use bitwise operators and powers of two
$a = 1 << 0; // 1
$b = 1 << 1; // 2
$c = 1 << 2; // 4
$d = 1 << 3; // 8
.. etc
let's say $flag has $a and $c
$flag = $a | $c; // $flag is integer here
now check it
$ok = ($flag & $a) && ($flag & $c); // true
$ok = ($flag & $a) && ($flag & $b); // false
so in 64 bit system/language/os you can use up to 64 flags which gives you a 2^64 combinations
there is really no other option. prime numbers are much worse for this as you skip many numbers in-between while the binary system uses every single number.
I see you are using database and you want to store this in DB.
I really think we are dealing here with XY Problem and you should reconsider your application instead of making such workarounds.

How to calculate check digit for Barcode 128 Auto php

I have a string that mixes Code 128B and Code 128C encoding: ANCV0005YRF01234.
So
ANCV = 128B
0005 = 128C
YRF0 = 128B
1234 = 128C
I can't use Code 128 Auto because it encodes the 0 after F as 128C, which I don't want. At the moment I am using two different scripts and concatenating the barcode images, so I need to calculate the check digit myself. How is the check digit generated?
Any help is appreciated. Thanks
Summing the symbol values multiplied by their position weights for your example string (including the switch codes between sets B and C) gives 5468, and 5468 % 103 = 9, so the checksum is 9 if you want to switch back and forth.
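Here is a sketch of how that sum comes about. It assumes the usual Code 128 symbol values: Start B = 104, Start C = 105, switch to B = 100, switch to C = 99, set B characters are ASCII code minus 32, and set C symbols are the two-digit pairs themselves. The start symbol has weight 1, and every following symbol is weighted by its position. The function names are made up, and the sketch only handles sets B and C.

```php
<?php
const START_B = 104;
const START_C = 105;
const CODE_B  = 100; // switch to set B
const CODE_C  = 99;  // switch to set C

// $segments is a list of [set, text] pairs, e.g. ['B', 'ANCV'].
function code128SymbolValues(array $segments): array
{
    $values = [];
    foreach ($segments as $i => [$set, $text]) {
        if ($i === 0) {
            $values[] = $set === 'B' ? START_B : START_C;
        } else {
            $values[] = $set === 'B' ? CODE_B : CODE_C;
        }
        if ($set === 'B') {
            foreach (str_split($text) as $char) {
                $values[] = ord($char) - 32;  // set B: ASCII minus 32
            }
        } else {
            foreach (str_split($text, 2) as $pair) {
                $values[] = (int) $pair;      // set C: one symbol per digit pair
            }
        }
    }
    return $values;
}

// Start symbol counts once, symbol at position i is multiplied by i.
function code128WeightedSum(array $values): int
{
    $sum = $values[0];
    for ($i = 1, $n = count($values); $i < $n; $i++) {
        $sum += $i * $values[$i];
    }
    return $sum;
}

$values = code128SymbolValues([['B', 'ANCV'], ['C', '0005'], ['B', 'YRF0'], ['C', '1234']]);
$sum = code128WeightedSum($values);
echo $sum, ' % 103 = ', $sum % 103, "\n"; // 5468 % 103 = 9
```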

What is the best flexible means of comparing version numbers?

I am working with a script to compare version numbers for installed and available applications. I would, on a normal basis, use simple comparison operators. Since I am building this application in a PHP 5.3 environment, I have considered the use of version_compare(), but that doesn't seem to suit my needs as cleanly as I would like.
The version strings I am comparing can follow many formats, but those I have encountered thus far are:
'2.6.18-164.6.1.el5' versus '2.6.18-92.1.13.el5'
'4.3p2' versus '5.1p1'
'5.1.6' versus '5.2.12'
'2.6.24.4-foo.bar.x.i386' versus '2.4.21-40'
As you can see, there really is no consistent format for me to work with.
The one thing I considered doing was splitting each version string on the non-numeric characters, then iterating the resulting arrays and comparing relative indices. However, I'm not sure that would be a good way of doing it, especially in the case of '2.6.24-4-foo.a.12.i386' versus '2.6.24-4-foo.b.12.i386'.
Are there any well-tested methods of comparing very loose version numbers such as this, specifically in a PHP environment?
Splitting by symbol (see preg_split) and comparing each element numerically (if both are numeric) or using string comparison (when both are alphanumeric) works for your examples:
'2.6.18-164.6.1.el5' > '2.6.18-92.1.13.el5'
2 6 18 164 6 1 el5 // higher
2 6 18 92 1 13 el5
^
'4.3p2' < '5.1p1'
4 3 p2
5 1 p1 // higher
^
'5.1.6' < '5.2.12'
5 1 6
5 2 12 // higher
^
'2.6.24.4-foo.bar.x.i386' > '2.4.21-40'
2 6 24 4 foo bar x i386 // higher
2 4 21 40 --- --- - ----
^
Where it potentially falls down is a version like 5.2-alpha-foo vs 5.2.49.4-beta-bar where you must compare a purely numeric sub-string with an alphanumeric sub-string:
5.2-alpha-foo > 5.2.49.4-beta-bar
5 2 alpha foo ---- --- // wrong - ascii 97(a) vs 52(4)
5 2 49 4 beta bar
^
You could solve this by treating the alphanumeric field as 0 any time you have a purely numeric sub-string compared against an alphanumeric sub string.
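A rough sketch of this approach, including the "treat the alphanumeric field as 0" rule. The function name is made up, and treating a missing element as lower is my own assumption, so test it against your real version strings before relying on it.

```php
<?php
// Returns -1, 0 or 1, like the spaceship operator.
function looseVersionCompare(string $a, string $b): int
{
    $partsA = preg_split('/[^0-9A-Za-z]+/', $a, -1, PREG_SPLIT_NO_EMPTY);
    $partsB = preg_split('/[^0-9A-Za-z]+/', $b, -1, PREG_SPLIT_NO_EMPTY);
    $length = max(count($partsA), count($partsB));

    for ($i = 0; $i < $length; $i++) {
        if (!isset($partsA[$i])) {
            return -1; // assumption: the version that runs out first is lower
        }
        if (!isset($partsB[$i])) {
            return 1;
        }
        $x = $partsA[$i];
        $y = $partsB[$i];
        $xNumeric = ctype_digit($x);
        $yNumeric = ctype_digit($y);

        if ($xNumeric && $yNumeric) {
            $cmp = (int) $x <=> (int) $y;   // numeric comparison
        } elseif ($xNumeric) {
            $cmp = (int) $x <=> 0;          // alphanumeric side counts as 0
        } elseif ($yNumeric) {
            $cmp = 0 <=> (int) $y;
        } else {
            $cmp = strcmp($x, $y) <=> 0;    // plain string comparison
        }
        if ($cmp !== 0) {
            return $cmp;
        }
    }
    return 0;
}

var_dump(looseVersionCompare('5.2-alpha-foo', '5.2.49.4-beta-bar')); // int(-1)
```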
For reference rpm compare version strings something like this:
Split on all non-alpha-numeric character
Group consecutive numeric characters together and all non-numeric characters together (i.e. 1.12.ab002 is split into 1, 12, ab, 002)
Compare each group left to right
if both versions have numeric group they are compared as numbers (i.e. 1 = 001 and 12 > 5)
if either is a non-numeric group a simple string comparison is performed
The first non-equal comparison is the result
Longer versions are considered greater (i.e. 1.2.3 < 1.2.3.0 and alp < alpha)
This has flaws: 1.2.3rc1 > 1.2.3 and 1.2.3alpha > 1.2.3 which may not be right
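The steps listed above could be sketched in PHP as follows. This only follows the description given here and is not rpm's actual implementation; the function name is made up.

```php
<?php
// Returns -1, 0 or 1.
function rpmStyleCompare(string $a, string $b): int
{
    // Groups of consecutive digits or consecutive letters; everything
    // else acts as a separator ("1.12.ab002" -> 1, 12, ab, 002).
    preg_match_all('/[0-9]+|[A-Za-z]+/', $a, $matchesA);
    preg_match_all('/[0-9]+|[A-Za-z]+/', $b, $matchesB);
    $groupsA = $matchesA[0];
    $groupsB = $matchesB[0];

    $shared = min(count($groupsA), count($groupsB));
    for ($i = 0; $i < $shared; $i++) {
        if (ctype_digit($groupsA[$i]) && ctype_digit($groupsB[$i])) {
            $cmp = (int) $groupsA[$i] <=> (int) $groupsB[$i]; // 1 == 001, 12 > 5
        } else {
            $cmp = strcmp($groupsA[$i], $groupsB[$i]) <=> 0;  // simple string comparison
        }
        if ($cmp !== 0) {
            return $cmp; // first non-equal comparison is the result
        }
    }
    // Everything shared is equal: the longer version is considered greater.
    return count($groupsA) <=> count($groupsB);
}

var_dump(rpmStyleCompare('1.2.3', '1.2.3.0')); // int(-1)
var_dump(rpmStyleCompare('1.2.3rc1', '1.2.3')); // int(1), the flaw mentioned above
```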

Help understanding what this line of PHP is doing

The variables values are listed below
$v['flag'] = 10
kPOSTAGE_HOME = 8
So what the heck does the following line do?!
if(($v['flag']&kPOSTAGE_HOME)==kPOSTAGE_HOME) {
//do something
}
& sets the bits set on both values. Some binary maths:
00001010 | 10
& 00001000 | 8
---------------
= 00001000 | 8
So 10&8 returns 8, and 8==8. Reason is to check whether a flag in that bit mask is set ...
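The same check as a small runnable sketch, using the values from the question. The helper name hasFlag is made up.

```php
<?php
const kPOSTAGE_HOME = 8; // 00001000

// True only if every bit of $mask is also set in $flags.
function hasFlag(int $flags, int $mask): bool
{
    return ($flags & $mask) === $mask;
}

$v = ['flag' => 10]; // 00001010

printf("%08b & %08b = %08b\n", $v['flag'], kPOSTAGE_HOME, $v['flag'] & kPOSTAGE_HOME);
var_dump(hasFlag($v['flag'], kPOSTAGE_HOME)); // bool(true)
var_dump(hasFlag($v['flag'], 4));             // bool(false), bit with value 4 is not set
```

Comparing against the mask matters once a mask has more than one bit: 8 & 10 is 8, which is truthy, but not every bit of 10 is set in 8.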
It checks whether the bit-pattern in $v['flag'] has bit 3 set (counting from 0, i.e. the bit with value 8).
And, for better readability, since the mask is a single bit it can be simplified to the following:
if ( $v['flag'] & kPOSTAGE_HOME ) {
It's masking the '8' bit in the variable. The number '10' in base 10 == 1010 in binary, and 8 == 1000. So this means "does 1010 have the 1000 bit set?" The answer is 'yes'.
It checks whether bit 3 (counting from 0, the bit with value 8) is on in $v['flag'].
The & is the "bitwise and" operator and the binary of 8 is "00001000", so when you do the "bitwise and" all bits except that one will be zero. If that bit is on it will remain, which is why there is the further check for equality.
