I could use some advice -
I'm parsing a binary file in PHP; to be specific, it's a Sega Genesis ROM file. According to the table I have made, certain bytes correspond to characters, while others control various features of the game's text engine.
Some bytes are used for characters and others are "controller" bytes for line breaks, conditions, color and a bunch of other things, so a typical sentence will probably look like this:
FC 03 E7 05 D3 42 79 20 64 6F 69 6E 67 20 73 6F 2C BC BE 08 79 6F 75 20 6A 75 73 74 20 61 63 71 75 69 72 65 64 BC BE 04 61 20 74 65 73 74 61 6D 65 6E 74 20 74 6F 20 79 6F 75 72 BC 73 74 61 74 75 73 20 61 73 20 61 20 77 61 72 72 69 6F 72 21 BD BC
which I can translate to:
<FC><03><E7><05><D3>By doing so,<NL><BE><08>you just acquired<NL><BE><04>a testament to your<NL>status as a warrior!<CURSOR>
I want to specify properties for such a controller byte string, such as its length, and write my own values to certain positions.
Bytes that translate into characters (00 to 7F) or line breaks (BC) consist of a single byte, while others consist of 2 (BE XX). Conditions (FC) even consist of 5 bytes:
FC XX YY (where X and Y refer to offsets which I need to calculate while I put my translated strings together)
I want my parser to recognize such bytes and let me write XX YY dynamically.
Using strtr I can only replace "groups" e.g. when I put the static bytestring into an array.
How would you do this while keeping the parser flexible?
Thanks!
Assuming you have your hex values available as a string, you can use this regex to parse it the way you've described. If you identify more rules besides FC**** or BE**, you can add them directly to the regex below so that they are extracted as well.
(?<fc>FC(\w\w){4})|(?<be>BE(\w\w))|(?<any>(\w\w))
The named groups fc, be and any make it easy to pick out each result set from the match array, e.g. $matches['fc'].
Regex Demo: https://regex101.com/r/kR9kdP/5
$re = '/(?<fc>FC(\w\w){4})|(?P<be>BE(\w\w))|(?P<any>(\w\w))/';
$str = 'FC03E705D3FC0006042842616D20626162612062';
preg_match_all($re, $str, $matches, PREG_PATTERN_ORDER, 0);
// Print the entire match result
print_r(array_filter($matches['fc'])); // Returns an array with all FC****
print_r(array_filter($matches['be'])); // Returns an array with all BE**
print_r(array_filter($matches['any'])); // Returns rest **
PHP Demo: http://ideone.com/qWUaob
Sample Results:
Array
(
[0] => FC03E705D3
[1] => FC00060428
)
Array
(
[50] => BE08
[59] => BE04
[113] => BE08
[132] => BE04
)
Hope this helps!
You can put hex characters in a regexp by using \x##, where ## is the hex code for the character. So you can match FC XX YY with:
preg_match('/(?<=\xfc).{4}/s', $bytes, $match);
$match[0] will then contain the 4 bytes after FC (the s modifier lets . match a 0x0A byte as well). You could split them up into pairs with capture groups:
preg_match('/(?<=\xfc)(..)(..)/s', $bytes, $match);
$match[1] will contain XX and $match[2] will contain YY.
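If the regex grows unwieldy as more control bytes turn up, a table-driven tokenizer may be easier to extend. The following is only a sketch: the function names tokenize() and encode() and the $table layout are made up for illustration, the argument counts for FC (4) and BE (1) are taken from the question, and the assumption that XX and YY are two big-endian 16-bit offsets is mine.

```php
<?php
/**
 * Split raw bytes into control tokens and runs of plain text.
 * $argLengths maps a control byte to the number of argument bytes after it.
 */
function tokenize(string $bytes, array $argLengths): array
{
    $tokens = [];
    $len = strlen($bytes);
    $i = 0;
    while ($i < $len) {
        $byte = ord($bytes[$i]);
        if (isset($argLengths[$byte])) {
            // Control byte: take its fixed number of argument bytes with it.
            $n = $argLengths[$byte];
            $tokens[] = ['op' => $byte, 'args' => (string) substr($bytes, $i + 1, $n)];
            $i += 1 + $n;
            continue;
        }
        // Plain text: collect bytes until the next control byte.
        $start = $i;
        while ($i < $len && !isset($argLengths[ord($bytes[$i])])) {
            $i++;
        }
        $tokens[] = ['text' => substr($bytes, $start, $i - $start)];
    }
    return $tokens;
}

/** Turn a token list back into raw bytes. */
function encode(array $tokens): string
{
    $out = '';
    foreach ($tokens as $t) {
        $out .= isset($t['op']) ? chr($t['op']) . $t['args'] : $t['text'];
    }
    return $out;
}

// Hypothetical table; add further control bytes here as you identify them.
$table = [0xFC => 4, 0xBE => 1, 0xBC => 0, 0xBD => 0];

$tokens = tokenize(hex2bin('FC03E705D34279BCBE0861BD'), $table);
// Write new XX YY values (assumed: two big-endian 16-bit offsets).
$tokens[0]['args'] = pack('nn', 0x0006, 0x0428);
echo strtoupper(bin2hex(encode($tokens))), "\n"; // FC000604284279BCBE0861BD
```

Because the argument bytes are skipped as a unit, an argument that happens to equal a control byte (say BE inside an FC condition) is not misread as a new control code, which is the main weakness of a plain search-and-replace.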
I can't seem to find why I'm getting a
Parse error: syntax error, unexpected 'if' (T_IF) in /Applications/XAMPP/xamppfiles/htdocs/oop/index.php on line 8
I checked for missing semi-colons/parentheses/curly braces but can't find anything! There are a few more other files these are referring to. Should I post them as well?
(getInstance is a static method in my db class)
(count and get are methods in my db class)
Thanks in advance!
<?php
require_once 'core/init.php';
$user = DB::getInstance()->get('users', array('username', '=', 'bob'));
if(!$user->count()) {
echo 'No user';
} else {
echo 'OK';
}
?>
init.php looks like this:
<?php
session_start();
$GLOBALS['config'] = array(
'mysql' => array(
'host' => '127.0.0.1',
'username' => 'root',
'password' => '',
'db' => 'oop'),
'remember' => array(
'cookie_name' => 'hash',
'cookie_expiry' => 604800),
'session' => array(
'session_name' => 'user')
);
spl_autoload_register(function($class) {
require_once 'classes/' . $class . '.php';
});
require_once 'functions/sanitize.php';
?>
Your file contains the byte sequence 0xefbbbf (which happens to be the UTF-8 encoding of U+FEFF, the zero-width non-breaking space) at the end of line 6:
$ hexdump -C index.txt
00000000 3c 3f 70 68 70 0a 0a 72 65 71 75 69 72 65 5f 6f |<?php..require_o|
00000010 6e 63 65 20 27 63 6f 72 65 2f 69 6e 69 74 2e 70 |nce 'core/init.p|
00000020 68 70 27 3b 0a 0a 24 75 73 65 72 20 3d 20 44 42 |hp';..$user = DB|
00000030 3a 3a 67 65 74 49 6e 73 74 61 6e 63 65 28 29 3b |::getInstance();|
00000040 0a 24 75 73 65 72 2d 3e 67 65 74 28 27 75 73 65 |.$user->get('use|
00000050 72 73 27 2c 20 61 72 72 61 79 28 27 75 73 65 72 |rs', array('user|
00000060 6e 61 6d 65 27 2c 20 27 3d 27 2c 20 27 62 6f 62 |name', '=', 'bob|
00000070 27 29 29 3b ef bb bf 0a 0a 69 66 28 21 24 75 73 |'));.....if(!$us|
00000080 65 72 2d 3e 63 6f 75 6e 74 28 29 29 20 7b 0a 09 |er->count()) {..|
00000090 65 63 68 6f 20 27 4e 6f 20 75 73 65 72 27 3b 0a |echo 'No user';.|
000000a0 7d 20 65 6c 73 65 20 7b 0a 09 65 63 68 6f 20 27 |} else {..echo '|
000000b0 4f 4b 27 3b 0a 7d 0a 0a 3f 3e |OK';.}..?>|
000000ba
Most likely your text editor erroneously added such a non-breaking space because it didn't understand that you were working with code.
PHP happily parses it as the name of a(n undefined) constant, and consequently complains when it then encounters an if construct (which is syntactically invalid immediately after a constant).
Remove this non-breaking space from your file.
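If you would rather check for the stray bytes from PHP than from hexdump, something along these lines should do. It is a sketch: the sample string merely imitates the file from the hexdump, and in practice you would load the real file with file_get_contents().

```php
<?php
// Sample source with a stray U+FEFF (bytes EF BB BF) after the semicolon,
// imitating the hexdump above.
$source = "\$user->get('users', array('username', '=', 'bob'));\xEF\xBB\xBF\n\nif (!\$user->count()) {\n}\n";

$feff  = "\xEF\xBB\xBF";                    // UTF-8 encoding of U+FEFF
$count = substr_count($source, $feff);      // how many times it occurs
$clean = str_replace($feff, '', $source);   // the same source without it

echo "Found $count stray U+FEFF sequence(s)\n";
```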
I use the following test program to retrieve a website's content:
<?php
function getData( $url, $output ) {
// set the path for CURL
if (file_exists( '/var/lib'))
$curl = 'curl';
else
$curl = 'curl.exe';
$curl .= ' --trace trace.txt --header "User-Agent: Some-Agent/1.0" ';
echo "\nreading $url...\n";
$buffer = shell_exec( "$curl -i \"$url\"" );
// if this is a 301 redirection URL, follow it one step
if ((preg_match( '~^HTTP.+? 301 ~', $buffer )) and preg_match( '~Location: (.+)~', $buffer, $location )) {
$cmd = "$curl -i \"$location[1]\"";
echo "$cmd\n";
$buffer = shell_exec( $cmd );
}
file_put_contents( $output, $buffer );
}
// test with a URL that will be redirected:
getData( "http://www.onvista.de/aktien/fundamental/EISEN-UND-HUETTENWERKE-AG-Aktie-DE0006055007", "DE0006055007-AG.html" );
On my Windows machine, this code runs fine. On a Linux machine it returns a 500 Internal Server Error.
This is the start of the trace file trace.txt:
== Info: About to connect() to www.onvista.de port 80 (#0)
== Info: Trying 217.11.205.10... == Info: connected
== Info: Connected to www.onvista.de (217.11.205.10) port 80 (#0)
=> Send header, 130 bytes (0x82)
0000: 47 45 54 20 2f 61 6b 74 69 65 6e 2f 66 75 6e 64 GET /aktien/fund
0010: 61 6d 65 6e 74 61 6c 2f 31 53 54 2d 52 45 44 2d amental/1ST-RED-
0020: 41 47 2d 41 6b 74 69 65 2d 44 45 30 30 30 36 30 AG-Aktie-DE00060
0030: 35 35 30 30 37 0d 20 48 54 54 50 2f 31 2e 31 0d 55007. HTTP/1.1.
0040: 0a 48 6f 73 74 3a 20 77 77 77 2e 6f 6e 76 69 73 .Host: www.onvis
0050: 74 61 2e 64 65 0d 0a 41 63 63 65 70 74 3a 20 2a ta.de..Accept: *
0060: 2f 2a 0d 0a 55 73 65 72 2d 41 67 65 6e 74 3a 20 /*..User-Agent: 
0070: 53 6f 6d 65 2d 41 67 65 6e 74 2f 31 2e 30 0d 0a Some-Agent/1.0..
0080: 0d 0a ..
<= Recv header, 36 bytes (0x24)
0000: 48 54 54 50 2f 31 2e 31 20 35 30 30 20 49 6e 74 HTTP/1.1 500 Int
0010: 65 72 6e 61 6c 20 53 65 72 76 65 72 20 45 72 72 ernal Server Err
0020: 6f 72 0d 0a
The only difference between the Windows trace and this one is a CR character after the filename (ending in DE0006055007). How did it get there, and how can I suppress it? (And no, I don't want to use the PHP cURL module, which leads to other problems.)
The HTTP headers you get end with \r\n, as they should. It seems curl on Linux converts them to \n if it detects that the output is a tty, but try redirecting to a file and you'll see the \rs in there.
The . in preg_match also matches the \r character, so it becomes part of $location[1]. A simple solution is to trim it.
This doesn't happen on Windows only because there you can execute curl -i "http://google.com (without the closing quote); the shell ends the quotation automatically after the newline.
And you should really use escapeshellarg.
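Put together, the two suggestions might look roughly like this. It is a sketch only: redirectTarget() is a name invented for this example, the sample headers are made up, and the command is built but not executed.

```php
<?php
/**
 * Return the redirect target from raw "curl -i" output, or null if the
 * response is not a 301 with a Location header.
 */
function redirectTarget(string $buffer): ?string
{
    if (preg_match('~^HTTP\S* 301 ~', $buffer)
        && preg_match('~^Location:\s*(.+)$~mi', $buffer, $m)) {
        return trim($m[1]); // drop the trailing \r that "." matched
    }
    return null;
}

// Made-up response headers; note that every line ends in \r\n.
$buffer = "HTTP/1.1 301 Moved Permanently\r\nLocation: http://example.com/new\r\n\r\n";

$target = redirectTarget($buffer);
// escapeshellarg() quotes the URL for the current platform's shell.
$cmd = 'curl -i ' . escapeshellarg($target);
echo $cmd, "\n";
```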
I have been trying to implement Ciphertext Stealing(CTS) in PHP for CBC.
I am referring to the two links below:
How can I encrypt/decrypt data using AES CBC+CTS (ciphertext stealing) mode in PHP?
and
http://en.wikipedia.org/wiki/Ciphertext_stealing
I am confused and stuck on the last and simplest step, the XOR.
I know this is silly, but having tried all the combinations, I don't know what I am missing.
Code follows.
// 1. Decrypt the second to last ciphertext block, using zeros as IV.
$second_to_last_cipher_block = substr($cipher_text, strlen($cipher_text) - 32, 16);
$second_to_last_plain = @mcrypt_decrypt(MCRYPT_RIJNDAEL_128, $key, $second_to_last_cipher_block, MCRYPT_MODE_CBC);
// 2. Pad the ciphertext to the nearest multiple of the block size using the last B-M
// bits of block cipher decryption of the second-to-last ciphertext block.
$n = 16 - (strlen($cipher_text) % 16);
$cipher_text .= substr($second_to_last_plain, -$n);
// 3. Swap the last two ciphertext blocks.
$cipher_block_last = substr($cipher_text, -16);
$cipher_block_second_last = substr($cipher_text, -32, 16);
$cipher_text = substr($cipher_text, 0, -32) . $cipher_block_last . $cipher_block_second_last;
// 4. Decrypt the ciphertext using the standard CBC mode up to the last block.
$cipher = mcrypt_module_open(MCRYPT_RIJNDAEL_128, '', MCRYPT_MODE_CBC, '');
mcrypt_generic_init($cipher, $key, $iv);
$plain_text = mcrypt_decrypt(MCRYPT_RIJNDAEL_128, $key, $cipher_text, MCRYPT_MODE_CBC , $iv);
// 5. Exclusive-OR the last ciphertext (was already decrypted in step 1) with the second last ciphertext.
// ???
// echo $??? ^ $???;
I find that concrete use cases are very helpful in understanding algorithms. Here are 2 use cases, and a step-by-step walk-through.
Starting point for both Use Cases.
These Use Cases assume that you are decrypting messages using AES-256 with CBC chaining mode and ciphertext stealing for block quantisation. To generate these Use Cases, I used the Delphi 2010 compiler and the TurboPower LockBox3 library (SVN revision 243). In what follows, I use a notation like so...
IV := [16] 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
... to mean that some variable named 'IV' is assigned to be equal to an array of 16 bytes. The left-most byte is the rendering of the Least Significant (lowest address) byte of the array, and the right-most byte the most significant. These bytes are written in hexadecimal, so for example if one puts...
X := [2] 03 10
... it means that the LSB is 3 and the MSB is 16.
Use Case One
Let the AES-256 32 byte compressed key (as defined in the AES standard) be...
key = [32] 0D EE 8F 9F 8B 0B D4 A1 17 59 FA 05 FA 2B 65 4F 23 00 29 26 0D EE 8F 9F 8B 0B D4 A1 17 59 FA 05
With TurboPower LockBox 3, this can be achieved by setting the password ('UTF8Password') property of the TCodec component to...
password = (UTF-8) 'Your lips are smoother than vasoline.'
The plaintext message to be sent will be
Message = (UTF-8) 'Leeeeeeeeeroy Jenkins!'
Encoded this is 22 bytes long. AES-256 has a 16 byte block size, so this is some-where between 1 and 2 blocks long.
Let the IV be 1. (Aside: On the Delphi side, this can be achieved by setting
TRandomStream.Instance.Seed := 1;
just before encryption).
Thus the ciphertext message to be decrypted by PHP will be (with 8 byte IV prepended a la LockBox3) ...
ciphertext = [30] 01 00 00 00 00 00 00 00 17 5C C0 97 FF EF 63 5A 88 83 6C 00 62 BF 87 E5 1D 66 DB 97 2E 2C
(base64 equivalent ='AQAAAAAAAAAXXMCX/+9jWoiDbABiv4flHWbbly4s')
Breaking this down into IV, first ciphertext block (c[0]) and last (partial) ciphertext block (c[1])...
IV = [16] 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c[0] = [16] 17 5C C0 97 FF EF 63 5A 88 83 6C 00 62 BF 87 E5
c[1] = [6] 1D 66 DB 97 2E 2C
Now let's walk-through the decryption with ciphertext stealing.
CV := IV
CV = [16] 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
In general, for the n'th block (except for the last 2 blocks), our normal CBC algorithm is...
m[n] := Decrypt( c[n]) XOR CV;
CV[n+1] := c[n]
where:
m is the output plaintext block;
Decrypt() means AES-256 ECB decryption on that block;
CV is our Carry-Vector. The chaining mode defines how this changes from block to block.
but for the second last block (N-1) (N=2 in Use Case One), the transformation changes to ... (This exception is made due to the selection of ciphertext stealing)
m[n] := Decrypt( c[n]) XOR CV;
CV[n+1] := CV[n] // Unchanged!
Applying to our use case:
CV = [16] 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c[0] = [16] 17 5C C0 97 FF EF 63 5A 88 83 6C 00 62 BF 87 E5
Decrypt(c[0]) = [16] 6F 6B 69 6E 73 21 F0 7B 79 F2 AF 27 B1 52 D6 0B
m[0] := Decrypt(c[0]) XOR CV = [16] 6E 6B 69 6E 73 21 F0 7B 79 F2 AF 27 B1 52 D6 0B
Now to process the last block. It is a partial one, 6 bytes long. In general, the processing of the last block goes like this...
y := c[N-1] | LastBytes( m[N-2], BlockSize-Length(c[N-1]));
m[N-1] := Decrypt( y) XOR CV
Applying to Use Case One:
c[1] = [6] 1D 66 DB 97 2E 2C
y := c[1] | LastBytes( m[0], 10)
y = [16] 1D 66 DB 97 2E 2C F0 7B 79 F2 AF 27 B1 52 D6 0B
Decrypt( y) = [16] 4D 65 65 65 65 65 65 65 65 65 72 6F 79 20 4A 65
m[1] := Decrypt(y) XOR CV
m[1] = [16] 4C 65 65 65 65 65 65 65 65 65 72 6F 79 20 4A 65
The last step in the decryption process is the emission of the last two blocks. We reverse the order, emitting m[N-1] first, and then emit the first part of m[N-2] (the length of which is equal to the length of c[N-1]). Applying to Use Case One...
Emit m[1]
m[1] = [16] 4C 65 65 65 65 65 65 65 65 65 72 6F 79 20 4A 65
Emit the first 6 bytes of m[0]
FirstBytes( m[0], 6) = 6E 6B 69 6E 73 21
Putting it all together, we get a reconstructed plaintext of ...
[22] 4C 65 65 65 65 65 65 65 65 65 72 6F 79 20 4A 65 6E 6B 69 6E 73 21
which is the UTF-8 encoding of 'Leeeeeeeeeroy Jenkins!'
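For PHP readers, the byte shuffling in Use Case One can be replayed as below. This is only a sketch of the assembly steps: the two Decrypt() results are copied from the walk-through rather than computed, so no AES call is made, and the variable names are mine.

```php
<?php
// Values copied from Use Case One.
$cv    = hex2bin('01' . str_repeat('00', 15));                    // CV = IV
$decC0 = hex2bin('6F6B696E7321F07B79F2AF27B152D60B');             // Decrypt(c[0])
$c1    = hex2bin('1D66DB972E2C');                                 // partial block c[1]
$decY  = hex2bin('4D' . str_repeat('65', 9) . '726F79204A65');    // Decrypt(y)
$blockSize = 16;

// m[0] := Decrypt(c[0]) XOR CV  ("^" on two strings XORs byte by byte)
$m0 = $decC0 ^ $cv;

// y := c[1] | LastBytes(m[0], BlockSize - Length(c[1]))
$y = $c1 . substr($m0, strlen($c1) - $blockSize);

// m[1] := Decrypt(y) XOR CV  (CV is unchanged for the last block)
$m1 = $decY ^ $cv;

// Emit m[1] first, then the first Length(c[1]) bytes of m[0].
$plain = $m1 . substr($m0, 0, strlen($c1));

echo strtoupper(bin2hex($y)), "\n"; // 1D66DB972E2CF07B79F2AF27B152D60B
echo $plain, "\n";                  // Leeeeeeeeeroy Jenkins!
```

In a real implementation the two copied Decrypt() values would come from a single-block ECB decryption with the actual key.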
Use Case Two
In this use case, the message is precisely 2 blocks long. This is called the round case. In round cases, there is no partial block to quantise, so it proceeds as if it were normal CBC. The password, key and IV are the same as in Use Case One. The ciphertext message to be decrypted (including the prepended 8 byte IV) is...
Set-up
Ciphertext = [40] 01 00 00 00 00 00 00 00 70 76 12 58 4E 38 1C E1 92 CA 34 FB 9A 37 C5 0A 75 F2 0B 46 A1 DF 56 60 D4 5C 76 4B 52 19 DA 83
which is encoded base64 as 'AQAAAAAAAABwdhJYTjgc4ZLKNPuaN8UKdfILRqHfVmDUXHZLUhnagw=='
This breaks down into IV, first block and second block, like so...
IV = [16] 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c[0] = [16] 70 76 12 58 4E 38 1C E1 92 CA 34 FB 9A 37 C5 0A
c[1] = [16] 75 F2 0B 46 A1 DF 56 60 D4 5C 76 4B 52 19 DA 83
General and 2nd last block
Decrypt(c[0]) = [16] 45 61 6E 63 65 20 74 68 65 6E 2C 20 77 68 65 72
m[0] := Decrypt(c[0]) XOR CV = [16] 44 61 6E 63 65 20 74 68 65 6E 2C 20 77 68 65 72
Next CV := c[0] = [16] 70 76 12 58 4E 38 1C E1 92 CA 34 FB 9A 37 C5 0A
Last Block:
Our last block is round in this use case.
Decrypt(c[1]) = [16] 15 13 64 3D 3C 18 65 8E E7 EA 59 9A E3 17 A7 6F
m[1] := Decrypt(c[1]) XOR CV = [16] 65 65 76 65 72 20 79 6F 75 20 6D 61 79 20 62 65
The last step in the decryption process is the emission of the last two blocks. In the round case, we don't reverse the order. We emit m[N-2] first, and then m[N-1]. Applying to Use Case Two...
Emit m[0]
m[0] = [16] 44 61 6E 63 65 20 74 68 65 6E 2C 20 77 68 65 72
Emit the whole of m[1]
m[1] = [16] 65 65 76 65 72 20 79 6F 75 20 6D 61 79 20 62 65
Putting it all together, we get a reconstructed plaintext of ...
[32] 44 61 6E 63 65 20 74 68 65 6E 2C 20 77 68 65 72 65 65 76 65 72 20 79 6F 75 20 6D 61 79 20 62 65
which is the UTF-8 encoding of 'Dance then, whereever you may be'
Edge Cases to consider.
There are two edge cases, not illustrated by the two Use Cases provided here.
Short messages. A short message is a message, whose length in bytes is:
Not zero; and
Less than one block;
Zero length messages.
In the case of short messages, technically one could still implement ciphertext stealing by using the IV as the prior block of ciphertext. However, IMHO, using ciphertext stealing in this way is not justified, given the lack of research into its impact on cryptographic strength, not to mention the added implementation complexity. In TurboPower LockBox 3, when the message is a short message and the chaining mode is not a key-streaming one, the chaining mode is treated as CFB-8bit. CFB-8bit is a key-streaming mode.
In the case of zero-length messages, it's really simple. A zero-length plaintext message maps one-to-one to a zero-length ciphertext message. No IV is needed, generated or prepended. This mapping is independent of chaining mode and cipher (in the case of block mode ciphers).
Notes on PHP Implementation
Caveat
I am not a PHP programmer. I don't know PHP. Anything I say here should be taken with a grain of salt.
Arrays of bytes
It looks like you are using PHP strings to store arrays of bytes. This looks dangerous to me. What if one of the byte values was zero? Would that shorten the string? How would strlen() behave in that case? If PHP has a native data type which was an array of byte, then this probably would be safer. But I don't really know. I am just bringing this point to your attention, if you are not already aware of it. Possibly it is not really an issue.
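On the zero-byte concern: PHP strings are binary-safe, so a zero byte neither ends the string nor shortens strlen(). A quick check (one caveat: old setups with the mbstring function overloading feature enabled, which was removed in PHP 8, could make strlen() count characters instead of bytes):

```php
<?php
$bytes = "\x00\x01\x00\xFF"; // four bytes, two of them zero

var_dump(strlen($bytes));                 // int(4)
var_dump(bin2hex(substr($bytes, 1, 2)));  // string(4) "0100"
```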
mcrypt_decrypt library
I am not familiar with this library. Does it natively support ciphertext stealing? I assume not. So there are two possible strategies for you.
Call the library's decrypt for all but the last two blocks with CBC mode. Process the last two blocks as I have described to you. But this requires access to the CV. Does the API expose this? If not, then this strategy is not a viable option for you.
Call the library's decrypt for all but the last two blocks with ECB mode, and roll your own CBC chaining. Fairly easy to implement, and by definition, you have access to the CV.
How to do XOR in PHP
Someone else posted an answer to this question, but has currently withdrawn it. But he was right. It looks like, to do an XOR in PHP on an array of bytes, you iterate through the characters one by one and do a byte-level XOR. The technique is shown here.
I was looking for a similar answer for Perl. Perl's libraries were limited to CBC mode. Here's how I got CTS to work using AES 256 CBC mode and CTS method 3. I thought this may be helpful for PHP as well.
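For what it's worth, PHP's ^ operator also accepts two strings and XORs them byte by byte, so an explicit loop is optional. The result is only as long as the shorter operand, which is why the hypothetical helper below (xorBlocks() is my own name) insists on equal lengths:

```php
<?php
/** XOR two equally long byte strings. */
function xorBlocks(string $a, string $b): string
{
    if (strlen($a) !== strlen($b)) {
        throw new InvalidArgumentException('Blocks must have equal length');
    }
    return $a ^ $b; // string ^ string works on the raw bytes
}

$x = xorBlocks(hex2bin('6F6B'), hex2bin('0100'));
echo bin2hex($x), "\n"; // 6e6b
```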
Here's the actual NIST documentation.
Doc ID: NIST800-38A CBC-CS3
Title: "Recommendation for Block Cipher Modes of Operation; Three Variants of Ciphertext Stealing for CBC Mode"
Source: http://csrc.nist.gov/publications/nistpubs/800-38a/addendum-to-nist_sp800-38A.pdf
Here's the code...
use Crypt::CBC;
use Crypt::Cipher::AES;
my $key = pack("H*","0000000000000000000000000000000000000000000000000000000000000000");
my $iv = pack("H*","00000000000000000000000000000000");
my $pt = pack("H*","0000000000000000000000000000000000");
my $ct = aes256_cbc_cts_decrypt( $key, $iv, $pt );
#AES 256 CBC with CTS
sub aes256_cbc_cts_decrypt {
my ($key, $iv, $in) = @_;
my $len_in_bytes = length(unpack("H*", $in)) / 2;
my $in_idx = 0;
my $null_iv = pack( "H32", "00000000000000000000000000000000");
my $cipher = Crypt::CBC->new(
-key => $key,
-iv => $null_iv,
-literal_key => '1',
-keysize => 32,
-blocksize => 16,
-header => 'none',
-cipher => 'Crypt::Cipher::AES');
my $out;
while ( $len_in_bytes >= 16 )
{
my $tmp = substr($in, $in_idx, 16);
my $outblock = $cipher->decrypt($tmp);
if ( ( ($len_in_bytes % 16) eq 0 ) || ( $len_in_bytes > 32 ) )
{
$outblock = $outblock ^ $iv;
$iv = $tmp;
}
$out .= $outblock;
$in_idx += 16;
$len_in_bytes -= 16;
}
if ($len_in_bytes) {
my $tmp = substr($in,$in_idx,$len_in_bytes);
my $out_idx = $in_idx - 16;
$tmp .= substr($out,$out_idx + $len_in_bytes, 16 - $len_in_bytes);
$out .= substr($out, $out_idx, $len_in_bytes) ^ substr($tmp, 0, $len_in_bytes);
substr($out,$out_idx,16) = $iv ^ $cipher->decrypt($tmp);
}
return $out;
}
But why?
if ('i' == 'і')
echo 'good';
else
echo 'bad';
echoes:
>> bad
You should copy this snippet. If you write it by hand, it will work.
It drives me crazy.
You are sneaky! The second i is not a lowercase Latin i. I hexdumped it:
hexdump -C check
00000000 69 66 20 28 27 69 27 20 3d 3d 20 27 d1 96 27 29 |if ('i' == '..')|
00000010 0a 20 20 20 20 65 63 68 6f 20 27 67 6f 6f 64 27 |. echo 'good'|
00000020 3b 0a 65 6c 73 65 0a 20 20 20 20 65 63 68 6f 20 |;.else. echo |
00000030 27 62 61 64 27 3b 20 20 0a 0a |'bad'; ..|
0000003a
I'll let you look up D1 96 :-) Awesome tricksy riddle. +1
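The same check can be done from PHP itself with bin2hex(). In this sketch the look-alike is written as an escape sequence so it can't be confused with the Latin letter; D1 96 is the UTF-8 encoding of U+0456, a Cyrillic letter.

```php
<?php
$latin    = 'i';        // U+0069, one byte
$cyrillic = "\xD1\x96"; // U+0456, the character from the snippet, two bytes

echo bin2hex($latin), "\n";     // 69
echo bin2hex($cyrillic), "\n";  // d196
var_dump($latin === $cyrillic); // bool(false)
```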
Delete the code and retype it :-)
There is an extra character or look-alike nonsense in there (the 'i' == 'i' bit).
With a copy'n'paste -- "bad"
With the line replaced -- "good"
Another way to prove ('i' != 'і') visually!!
http://jsfiddle.net/naeDE/1/
<pre style="font-size:700%">
if ('i' == 'і')
echo 'good';
else
echo 'bad';
</pre>