I am tasked with converting some Fortran code to PHP and am stumbling at the last hurdle.
In essence the Fortran converts a REAL into a binary CHAR(4) which it ultimately writes to file.
The Fortran (which also confuses me) is as follows:
FUNCTION MKS(x)
CHARACTER (LEN=4) :: MKS ! The 4-character string which is returned to
REAL :: x ! The incoming single-precision variable
CHARACTER (LEN=1), DIMENSION(4) :: a ! A working variable
CHARACTER (LEN=4) :: d ! A working variable
CALL MKS1(x,a) ! Send x - get back a(1), a(2), a(3), a(4)
! Note: x will hold the first 32 bits referenced
! and a will hold the next 32 bits
d = a(1) // a(2) // a(3) // a(4) ! concatenate into 1 string (d)
MKS = d ! assign string to variable MKS
END FUNCTION MKS
SUBROUTINE MKS1 (b,c)
IMPLICIT NONE
CHARACTER (LEN=1), DIMENSION(4) :: b ! array with incoming 32 bits
CHARACTER (LEN=1), DIMENSION(4) :: c ! array with each character returned
INTEGER :: i ! DO Loop counter
DO i=1,4
c(i) = b(i)
END DO
END SUBROUTINE MKS1
I have attempted to recreate this function in PHP as follows:
pack('CCCC', $value & 0x000F,
($value>>8) & 0x000F,
($value>>16) & 0x000F,
($value>>24) &0x000F);
But comparing the output values using the *nix od command shows completely different results.
What is the correct way to pack the equivalent to a Fortran REAL into a char[4] Array in PHP?
It turned out to be quite simple.
Your Fortran REAL is stored as an IEEE 754 32-bit floating point number.
The output from your od was misleading. Converting it to hex gives the following.
0115040 0134631 0005077
0x20, 0x9A, 0x99, 0xB9, 0x3f, 0x0a
The first and last bytes of the file are redundant: they are a space (0x20) and a line feed (0x0a) respectively. The part we're after is the middle 4 bytes.
Using pack we can convert from floats (warning: endianness is machine dependent).
The following:
var_dump(bin2hex(pack('f', 1.450)));
Gives us a familiar sequence of bytes.
string(8) "9a99b93f"
So instead of converting to hex, output that to a file with a space at the start and a line feed at the end, and you'll have an identical file (as long as your PHP/machine configuration doesn't do something mad with floats, but even then, if you follow the IEEE 754 spec, you should be able to reproduce it).
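As a minimal sketch of the approach described above, the snippet below packs a float with an explicit byte order instead of relying on the machine's. It assumes PHP 7.0.15 / 7.1.1 or later, where pack() accepts the 'g' (little-endian float) code; the function name mks is only a hypothetical stand-in for the Fortran MKS, and the space / line-feed framing simply mirrors the od dump quoted earlier.

```php
<?php
// Hypothetical PHP counterpart of the Fortran MKS function:
// returns the 4 raw bytes of a single-precision float.
function mks(float $x): string
{
    // 'g' = 32-bit float, always little-endian (PHP >= 7.0.15 / 7.1.1).
    return pack('g', $x);
}

$bytes = mks(1.45);
echo bin2hex($bytes), "\n";   // 9a99b93f

// Framing seen in the od dump: a space, the 4 float bytes, a line feed.
$record = ' ' . $bytes . "\n";
echo bin2hex($record), "\n";  // 209a99b93f0a
```

Using 'g' rather than 'f' keeps the output identical on big-endian hosts; if the Fortran side runs on a big-endian machine, 'G' would be the matching code.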
This might be an extended comment rather than an answer.
You are, in the statement
CALL MKS1(x,a)
passing a REAL argument where the subroutine expects an array of 4 length-1 characters. You deserve all the bad things which happen to you :-) You can only compile this because you haven't required explicit interfaces on your subroutines.
What do you want the 4 characters that your PHP program reads to be? If, for instance, your Fortran wrote the REAL into 4 bytes in a binary file and your PHP read those 4 bytes as 4 single characters, would you get the characters you want? I'm uncertain what your requirement is.
I want to convert byte array to UINT64 using PHP.
I can do this easily in C# but I want to do this in PHP.
Here is C# code.
bytes = Encoding.UTF8.GetBytes(hashed);
BitConverter.ToUInt64(bytes, 0);
I want to convert this to PHP.
I tried to use the pack() function but this does not work.
Let's say this is a byte array.
$bytes = [101,102,54,55,99,55,56,49];
pack("J*","101","102","54","55","99","55","56","49");
This shows a warning.
pack(): 7 arguments unused on line
How can I fix this?
The major issue here (if I understand it correctly) is that you're using PHP numbers to represent a byte array, but unpack requires an input string. If you keep the array as is, PHP just converts the numbers to strings, meaning 101 becomes '101', which in turn is 3 bytes, and that breaks the whole thing.
You need to first convert the numbers to bytes. A byte is essentially an unsigned char, so you could first pack your array into unsigned chars and then unpack them:
$bytes = [101,102,54,55,99,55,56,49];
$unpacked = unpack("J", pack('C*', ...$bytes));
echo current($unpacked);
Explanation:
C is the pack code for unsigned char and * indicates that you need to use all array entries. This will generate a string of characters based on the array. You can then unpack this string using J (if you know for a fact that the bytes were generated in big-endian byte order), P (if you know the bytes were generated in little-endian order) or Q (if you want to use the machine order). If the bytes were generated on the same machine then Q would probably be a better choice than J; otherwise you need to know the endianness.
Example: http://sandbox.onlinephpfunctions.com/code/5cba2c29522f7b9f9a0748b99fac768012e759ce
Note: This is my personal understanding of what is happening so anyone with better pack/unpack knowledge can let me know if I got things wrong.
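To make the difference between the format codes concrete, here is a small sketch that decodes the same 8 bytes with J and with P. It assumes a 64-bit PHP build; the variable names are only illustrative. As far as I know, C#'s BitConverter follows the machine's byte order, which is little-endian on common hardware, so P is likely the code that matches the C# result.

```php
<?php
$bytes = [101, 102, 54, 55, 99, 55, 56, 49];

// Turn the list of integers into an 8-byte binary string.
$raw = pack('C*', ...$bytes);

// unpack() returns an array; an unnamed format lands under key 1.
$big    = unpack('J', $raw)[1];  // big-endian 64-bit
$little = unpack('P', $raw)[1];  // little-endian 64-bit

printf("%016x\n", $big);     // 6566363763373831
printf("%016x\n", $little);  // 3138376337366665
```

PHP integers are signed, so a value with the top bit set would print as a negative number in decimal; the hex form above sidesteps that.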
Is there any reason for this behavior/implementation? Example:
$array = array("index_of_an_array" => "value");
class Foo {
private $index_of_an_array;
function __construct() {}
}
$foo = new Foo();
$array = (array)$foo;
$key = str_replace("Foo", "", array_keys($array)[0]);
echo $array[$key];
Gives us a notice in which the index name is cut off:
NOTICE Undefined index: on line number 9
Example #2:
echo date("Y\0/m/d");
Outputs:
2016
BUT! echo and var_dump(), for example, and some other functions output the string "as it is"; the \0 bytes are just hidden by browsers.
$string = "index-of\0-an-array";
$string2 = "Y\0/m/d";
echo $string;
echo $string2;
var_dump($string);
var_dump($string2);
Outputs:
index-of-an-array
"Y/m/d"
string(18) "index-of-an-array"
string(6) "Y/m/d"
Notice that the length of $string is 18, but only 17 characters are shown.
EDIT
From possible duplicate and php manual:
The key can either be an integer or a string. The value can be of any type.
Strings containing valid integers will be cast to the integer type. E.g. the key "8" will actually be stored under 8. On the other hand "08" will not be cast, as it isn't a valid decimal integer. So in short, any string can be a key. And a string can contain any binary data (up to 2GB). Therefore, a key can be any binary data (since a string can be any binary data).
From php string details:
There are no limitations on the values the string can be composed of;
in particular, bytes with value 0 (“NUL bytes”) are allowed anywhere
in the string (however, a few functions, said in this manual not to be
“binary safe”, may hand off the strings to libraries that ignore data
after a NUL byte.)
But I still do not understand why the language is designed this way. Are there reasons for this behavior/implementation? Why doesn't PHP handle input as binary safe everywhere, rather than just in some functions?
From comment:
The reason is simply that many PHP functions like printf use the C library's implementation behind the scenes, because the PHP developers were lazy.
Aren't echo, var_dump and print_r such functions too? In other words, functions that output something. They are in fact binary safe, if we take a look at my first example. It makes no sense to me to implement some output functions as binary safe and others as binary unsafe, or to use some as they are in the C standard library and write others completely from scratch.
The short answer to "why" is simply history.
PHP was originally written as a way to script C functions so they could be called easily while generating HTML. Therefore PHP strings were just C strings, which are sequences of bytes terminated by a NUL byte. So in modern PHP terms we would say nothing was binary-safe, simply because it wasn't planned to be anything else.
Early PHP was not intended to be a new programming language, and grew organically, with Lerdorf noting in retrospect: "I don’t know how to stop it, there was never any intent to write a programming language […] I have absolutely no idea how to write a programming language, I just kept adding the next logical step on the way."
Over time the language grew to support more elaborate string-processing functions, many taking the string's specific bytes into account and becoming "binary-safe". According to the recently written formal PHP specification:
As to how the bytes in a string translate into characters is unspecified. Although a user of a string might choose to ascribe special semantics to bytes having the value \0, from PHP's perspective, such null bytes have no special meaning. PHP does not assume strings contain any specific data or assign special values to any bytes or sequences.
As a language that has grown organically, there hasn't been a move to universally treat strings in a manner different from C. Therefore functions and libraries are binary-safe on a case-by-case basis.
First Example from Question
Your first example is confusing because it is the error message that terminates at the null character, not the array that handles the string incorrectly. The original code you posted with the error message follows:
$array = array("index-of-an-array" => "value");
$string = "index-of\0-an-array";
echo $array[$string];
Notice: Undefined index: index-of in
Note, the error message above has been truncated to index-of due to the null character. The array is working as expected, because if you try it this way it will work just fine:
$array = array("index-of\0-an-array" => "value");
$string = "index-of\0-an-array";
echo $array[$string];
The error message correctly identified that the two keys were different, which they are:
"index-of\0-an-array" != "index-of-an-array"
The problem is that the error message printed out everything up to the null character. If that's the case then it might be considered a bug by some.
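A short sketch can confirm that the array itself treats the two keys as different strings and never truncates at the NUL byte. The variable names are only illustrative.

```php
<?php
$withNul    = "index-of\0-an-array";
$withoutNul = "index-of-an-array";

// Both strings are used as keys of the same array.
$array = [$withNul => 'a', $withoutNul => 'b'];

var_dump(count($array));              // int(2): two distinct keys
var_dump(strlen($withNul));           // int(18)
var_dump(strlen($withoutNul));        // int(17)
var_dump($array[$withNul]);           // string(1) "a"
var_dump(isset($array["index-of"]));  // bool(false): the key is not cut at the NUL
```

Only the text of the notice is cut short; the lookup compares all 18 bytes.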
The Second Example is starting to plumb the depths of PHP :)
I've added some code to it so we can see what's happening
<?php
class Foo {
public $index_public;
protected $index_prot;
private $index_priv;
function __construct() {
$this->index_public = 0;
$this->index_prot = 1;
$this->index_priv = 2;
}
}
$foo = new Foo();
$array = (array)$foo;
print_r($foo);
print_r($array);
//echo $array["\0Foo\0index_priv"];//This prints 2
//echo $foo->{"\0Foo\0index_priv"};//This fails
var_dump($array);
echo array_keys($array)[0] . "\n";
echo $array["\0Foo\0index_priv"] . "\n";
echo $array["\0*\0index_prot"] . "\n";
The above code's output is
Foo Object
(
[index_public] => 0
[index_prot:protected] => 1
[index_priv:Foo:private] => 2
)
Array
(
[index_public] => 0
[*index_prot] => 1
[Fooindex_priv] => 2
)
array(3) {
'index_public' =>
int(0)
'\0*\0index_prot' =>
int(1)
'\0Foo\0index_priv' =>
int(2)
}
index_public
2
1
The PHP developers chose to use the \0 character as a way to separate member variable types. Note, protected fields use a * to indicate that the member variable may actually belong to many classes. It's also used to protect private access, i.e. this code would not work:
echo $foo->{"\0Foo\0index_priv"}; //This fails
but once you cast it to an array then there is no such protection ie this works
echo $array["\0Foo\0index_priv"]; //This prints 2
Is there any reason for this behavior/implementation?
Yes. On any system, you need to make system calls to interface with it: if you want the current time, or to convert a date, etc., you need to talk to the operating system, and this means calling the OS API. In the case of Linux this API is in C.
PHP was originally developed as a thin wrapper around C. Quite a few languages start out this way and evolve; PHP is no exception.
Is there any reason for this behavior/implementation?
In the absence of any backwards compatibility issues I'd say some of the choices are less than optimal but my suspicion is that backwards compatibility is a large factor.
But I still do not understand why the language is designed this way?
Backwards compatibility is almost always the reason why features that people don't like remain in a language. Over time languages evolve and remove things, but it's incremental and prioritized. If you had asked all the PHP developers whether they wanted better binary string handling for some functions or a faster engine, I think the faster engine might win, and performance is indeed what PHP 7 focused on. Note, the people doing the actual work ultimately decide what they work on, and working on engine performance or a JIT compiler is more fun than fixing libraries that do things in seemingly odd ways.
I'm not aware of any language implementor who doesn't wish they'd done some things differently from the outset. Anyone implementing a compiler before a language is popular is under a lot of pressure to get something that works for them, and that means cutting corners. Not all languages in existence today had a huge company backing them; most often it was a small dedicated team and they made mistakes. Some were lucky enough to get paid to do it. Calling them lazy is a bit unfair.
All languages have dark corners, warts and boils, and features you'll eventually hate. Some more than others, and PHP has a bad rep because it has/had a lot more than most. Note, PHP 5 is a vast leap forward from PHP 4. I'd imagine that PHP 7 will improve things even more.
Anyone who thinks their favorite language is free from problems is delusional and has almost certainly not plumbed the depths of the tool they're using to any great depth.
Functions in PHP which internally operate on C strings are "not binary safe" in PHP terminology. A C string is an array of bytes ending with byte 0. When a PHP function internally uses C strings, it reads one character at a time, and when it encounters byte 0 it treats it as the end of the string. Byte 0 tells C string functions where the string ends, since a C string does not carry any information about its length.
"Not binary safe" means that, if a function which operates on C strings is somehow handed a C string not terminated with byte 0, the behavior is unpredictable, because the function will read or write bytes beyond the end of the string, adding garbage to the string and/or potentially crashing PHP.
In C++, for example, we have the string object. This object also contains an array of characters, but it also has a length field which it updates on any length change. So it does not require byte 0 to tell it where the end is. This is why a string object can contain any number of 0 bytes, although this is generally not valid since it should contain only valid characters.
In order for this to be corrected, the whole PHP core, including any modules which operate on C strings, would need to be rewritten to send "non binary safe" functions to history. The amount of work needed for this is huge, and all the module authors would need to produce new code for their modules. This could introduce new bugs and instabilities.
The issue with byte 0 and "non binary safe" functions is not critical enough to justify rewriting PHP and the PHP modules' code. Maybe in some newer PHP version, where some things need to be coded from scratch, it would make sense to correct this.
Until then, you just need to know that any arbitrary binary data put into a string by using binary-safe functions needs to have byte 0 added at the end. Usually you will notice this when there is unexpected garbage at the end of your string or PHP crashes.
Today I made an interesting discovery while testing what happens with bitwise calculations in PHP such as INF ^ 0 (^ is the bitwise exclusive OR (XOR) operator), which gave me int(-9223372036854775808), the largest possible negative value on a 64-bit system.
But then I asked myself: why does the XOR result go negative when "positive infinity" would mean 9223372036854775807 (63 bits set to 1 with a leading 0) and 0 is 64 bits set to 0 (0 xor 0 = 0)? What is PHP's infinity value, and what is the calculation behind it? And why do I get a (correct?) negative value when I use "negative infinity" (a leading 1 against a leading 0, so 1 xor 0 = 1)?
Another interesting point is that this happens only on PHP version 5.5.9-1, and not, for example, on 5.3.x and 5.6.x (where I've tested it)! Maybe someone has an idea what happens there? I tested it on three versions, but only mine (5.5.9-1) gives those results.
Just to let you guys know, this is just an abstract experiment I've done for fun, but I find it interesting. Maybe someone can help here or point out a mistake in my thinking? Just tell me if you need more information about anything!
EDIT: As jbafford pointed out, it would be great to get a complete answer, so I'll just quote him: why does 5.5 and 5.6 result in PHP_INT_MIN, and everything else return 0?
First off, ^ itself isn't what's special here. If you XOR anything with zero, or OR anything with zero, you just get back the original answer. What you're seeing here is not part of the operation itself, but rather what happens before the operation: the bitwise operators take integers, so PHP converts the float to an integer. It's in the float-to-integer conversion that the weird behaviour appears, and it's not exclusive to the bitwise operators. It also happens for (int), for example.
Why does it produce these weird results? Simply because that's what the C code PHP is written in produces when converting a float to an integer. In the C standard, C's behaviour for float-to-integer conversions is undefined for the special values of INF, -INF and NAN (or, more accurately, for "integral parts" an integer can't represent: §6.3.1.4). This undefined behaviour means the compiler is free to do whatever it wants. It just so happens in this case that the code it generates produces the minimum integer value here, but there's no guarantee that will always happen, and it's not consistent across platforms or compilers. [1]
Why did the behaviour change between 5.4 and 5.5? Because PHP's code for converting floats to integers changed to always perform a modulo conversion. This fixed the undefined behaviour for very large floating-point numbers, [2] but it still didn't check for special values, so for that case it still produced undefined behaviour, just slightly different this time.
In PHP 7, I decided to clean up this part of PHP's behaviour with the Integer Semantics RFC, which makes PHP check for the special values (INF, -INF and NAN) and convert them consistently: they always convert to integer 0. There's no longer undefined behaviour at work here.
[1] For example, a test program I wrote in C to try to convert Infinity to an integer (specifically a C long) has different results on 32-bit and 64-bit builds. The 64-bit build always produces -9223372036854775808, the minimum integer value, while the 32-bit build always produces 0. This behaviour is the same for GCC and clang, so I guess they're both producing very similar machine code.
[2] If you tried to convert a float to an integer, and that float's value was too big to fit in an integer (e.g. PHP_INT_MAX * 2, or PHP_INT_MIN * 2), the result was undefined. PHP 5.5 makes the result consistent, though unintuitive (it acts as if the float was converted to a very large integer, and the most significant bits were discarded).
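The PHP 7 behaviour described above can be checked with a few explicit casts. This is only a sketch, assuming PHP 7.1 or later on a 64-bit build; newer releases may additionally emit a diagnostic for such conversions. The helper toIntOrNull is a hypothetical name, shown as one way to avoid depending on the conversion rule at all.

```php
<?php
// Explicit casts of the special float values (PHP >= 7.0).
var_dump((int) INF);   // int(0)
var_dump((int) -INF);  // int(0)
var_dump((int) NAN);   // int(0)

// Hypothetical helper: refuse values an integer cannot represent.
function toIntOrNull(float $f): ?int
{
    // (float) PHP_INT_MAX rounds up to 2^63, which is already out of range.
    if (!is_finite($f) || $f < PHP_INT_MIN || $f >= PHP_INT_MAX) {
        return null;
    }
    return (int) $f;   // truncates towards zero
}

var_dump(toIntOrNull(INF));   // NULL
var_dump(toIntOrNull(42.9));  // int(42)
```

Returning null makes the caller decide what an out-of-range value should mean, rather than silently getting 0.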
Your float(INF) gets implicitly cast to an integer, and XOR with 0 does not change the first operand. So basically this is just a cast from float to int, which is undefined for values that are not in the integer range (all other values are truncated towards zero).
https://3v4l.org/52bA5
Assume that I have a file named data.txt with the contents "Blah Blah !".
So when I use the code below
$hnd=fopen('data.txt','r');
echo fgets($hnd,2);
it displays just one character "B" instead of "Bl". Later I read the manual stating:
length
Reading ends when length - 1 bytes have been read, or a newline (which is included in the return value), or an EOF (whichever comes first). If no length is specified, it will keep reading from the stream until it reaches the end of the line.
Can anyone explain to me why it is this way? I mean why is it length-1 and not length.
The C fgets() function reads length - 1 bytes, because it has to add a terminating zero to turn the data into a proper string.
My best guess is that PHP's fgets() exhibits the same behaviour because it is either:
a legacy from the bad old days when PHP functions were little more than wrappers around the corresponding C functions, and string functions were binary unsafe (e.g. strings could not contain embedded NUL characters). Changing the behaviour of the fgets() function would introduce new bugs in existing programs. Or,
a deliberate decision to make the PHP function compatible with the C function to avoid unnecessary surprises.
or both.
Interestingly, it looks like PHP internally adds a terminating zero when storing string values, for example in _php_stream_get_line() (called from fgets()) and zend_string_init().
Since _zend_string objects store the string length anyway, it shouldn't be necessary to store the terminating zero, unless there are still binary unsafe functions in PHP.
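The length - 1 rule is easy to observe without a real file. The sketch below uses an in-memory stream holding the question's "Blah Blah !" text plus a newline (the trailing newline is an assumption about the file's contents); the variable names are only illustrative.

```php
<?php
// An in-memory stream stands in for data.txt.
$hnd = fopen('php://memory', 'w+');
fwrite($hnd, "Blah Blah !\n");
rewind($hnd);

$first  = fgets($hnd, 2);  // reads at most 2 - 1 = 1 byte
$second = fgets($hnd, 3);  // reads at most 3 - 1 = 2 bytes
$rest   = fgets($hnd);     // no length: reads up to and including the newline
fclose($hnd);

var_dump($first);   // string(1) "B"
var_dump($second);  // string(2) "la"
var_dump(strlen($rest));  // int(9)
```

So to get "Bl" from the original code, the call would need a length of 3.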
Because PHP, like many C derivatives, counts from 0 and not from 1. They use zero-based numbering.
E.g. for arrays: an array of length/size n has elements 0 to n - 1,
i.e. 0, 1, 2, 3, 4 ... n-1
So an array of length 5 has elements 0, 1, 2, 3, 4
So you will find that whether reading bytes, strings or arrays, they always refer to the (n-1)th element or marker for an n-size structure.
Please use the following code for the question you raised:
$hnd=fopen('E:\\data.txt','r');
echo fgets($hnd,2);
Good day, I am making my own hashing algorithm, and I am rewriting it from PHP to C++.
But the result in C++ is different from the PHP result. The PHP result contains more than 10 characters, the C++ result only 6 to 8 characters. But the last 8 characters of the PHP result are the same as the C++ result.
So here is PHP code:
<?php function JL1($text) {
$text.="XQ";
$length=strlen($text);
$hash=0;
for($j=0;$j<$length;$j++) {
$p=$text[$j];
$s=ord($p);
if($s%2==0) $s+=9999;
$hash+=$s*($j+1)*0x40ACEF*0xFF;
}
$hash+=33*0x40ACEF*0xFF;
$hash=sprintf("%x",$hash);
return $hash; } ?>
And here C++ code:
char * JL1(char * str){
int size=(strlen(str)+3),s=0; //Edit here (+2 replaced with +3)
if(size<=6) //Edit here (<9 replaced with <=6)
size=9;
char *final=new char[size],temp;
strcpy(final,str);
strcat(final,"XQ");
long length=strlen(final),hash=0L;
for(int i=0;i<length;i++){
temp=final[i];
s=(int)temp;
if(s%2==0)s+=9999;
hash+=((s)*(i+1)*(0x40ACEF)*(0xFF));
}
hash+=33*(0x40ACEF)*(0xFF);
sprintf(final,"%x",hash); //to hex string
final[8]='\0';
return final; }
Example of C++ result for word: "Hi!" : 053c81be
And PHP result for this word: 324c053c81be
Does anyone know where the mistake is and how to fix it, whether in the PHP or in the C++ code?
By the way, when I cut off the first letters of the PHP result I get the C++ result, but that won't help, because the C++ result does not have to be 8 characters long; it can be just 6 characters long in some cases.
Where to begin...
Data types do not have fixed guaranteed sizes in C or C++. As such, hash may overflow every iteration, or it may never do so.
chars can be either signed or unsigned, therefore converting one to an integer may result in negative and positive values on different implementations, for the same character.
You may be writing past the end of final when printing the value of hash into it. You may also be cutting the string off prematurely when setting the 9th character to 0.
strcat will write past the end of final if str is at least 7 characters long.
s, a relatively short-lived temporary variable, is declared way too soon. Same with temp.
Your code looks very crowded with almost no whitespace, and is very hard to read.
The expression "33*(0x40ACEF)*(0xFF)" overflows; did you mean 0x4DF48431L?
Consider using std::string instead of char arrays when dealing with strings in C++.
long hash in C++ is most likely limited to 32 bits on your platform. PHP's number isn't.
sprintf(final, "%x", hash) produces a possibly incorrect result. %x interprets the argument as an unsigned int, which is 32 bits on both Windows and Linux x64. So it's interpreting a long as an unsigned int, if your long is more than 32 bits, your result will get truncated.
See all the issues raised by aib. Especially the premature termination of the result.
You will need to deal with the 3rd point yourself, but I can answer the first two. You need to clamp the result to 32 bits: $hash &= 0xFFFFFFFF;.
If you clamp the final value, the php code will produce the same results as the C++ code would on x64 Linux (that means 64 bit integers for intermediate results).
If you clamp it after every computation, you should get the same results as the C++ code would on 32 bit platforms or Windows x64 (32 bit integers for intermediate results).
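The first variant (clamping only the final value) might look like the sketch below. It is the question's JL1 with one extra masking step, assuming a 64-bit PHP build so that the intermediate sums stay integers for short inputs. The name JL1_32 and the zero-padded %08x format are my own assumptions; the original used a plain %x.

```php
<?php
// Sketch: the question's JL1 with the final hash clamped to 32 bits.
function JL1_32(string $text): string   // hypothetical name
{
    $text .= "XQ";
    $hash = 0;
    for ($j = 0, $n = strlen($text); $j < $n; $j++) {
        $s = ord($text[$j]);
        if ($s % 2 == 0) {
            $s += 9999;
        }
        $hash += $s * ($j + 1) * 0x40ACEF * 0xFF;
    }
    $hash += 33 * 0x40ACEF * 0xFF;

    $hash &= 0xFFFFFFFF;            // keep only the low 32 bits
    return sprintf("%08x", $hash);  // zero-padded to 8 hex digits
}

echo JL1_32("Hi!"), "\n";  // 053c81be
```

For "Hi!" the unclamped value is 0x324c053c81be, so the mask leaves exactly the 053c81be quoted for the C++ version.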
There seems to be a bug here...
int size=(strlen(str)+2),s=0;
if(size<9)
size=9;
char *final=new char[size],temp;
strcpy(final,str);
strcat(final,"XQ");
If strlen was say 10, then size will be 12 and 12 chars will be allocated.
You then copy in the original 10 characters, and add XQ, but the final terminating \0 will be outside of the allocated memory.
Not sure if that's your bug or not, but it doesn't look right.