Can anyone explain the length parameter to fgets() in PHP? - php

Assume that I have a file named data.txt with the contents "Blah Blah !".
So when I use the code below
$hnd=fopen('data.txt','r');
echo fgets($hnd,2);
it displays just one character "B" instead of "Bl". Later I read the manual stating:
length
Reading ends when length - 1 bytes have been read, or a newline (which is included in the return value), or an EOF (whichever comes first). If no length is specified, it will keep reading from the stream until it reaches the end of the line.
Can anyone explain to me why it is this way? I mean why is it length-1 and not length.

The C fgets() function reads length - 1 bytes, because it has to add a terminating zero to turn the data into a proper string.
My best guess is that PHP's fgets() exhibits the same behaviour because it is either:
a legacy from the bad old days when PHP functions were little more than wrappers around the corresponding C functions, and string functions were binary unsafe (e.g. strings could not contain embedded NUL characters). Changing the behaviour of the fgets() function would introduce new bugs in existing programs. Or,
a deliberate decision to make the PHP function compatible with the C function to avoid unnecessary surprises.
or both.
Interestingly, it looks like PHP internally adds a terminating zero when storing string values, for example in _php_stream_get_line() (called from fgets()) and zend_string_init().
Since _zend_string objects store the string length anyway, it shouldn't be necessary to store the terminating zero, unless there are still binary unsafe functions in PHP.
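A minimal sketch of the length - 1 behaviour described above. It uses an in-memory stream (php://memory) in place of data.txt, which is an assumption made only to keep the example self-contained:

```php
<?php
// Stand-in for data.txt: an in-memory stream with the same contents.
$hnd = fopen('php://memory', 'w+');
fwrite($hnd, "Blah Blah !\n");
rewind($hnd);

$first  = fgets($hnd, 2); // reads at most 2 - 1 = 1 byte
$second = fgets($hnd, 3); // reads at most 3 - 1 = 2 bytes
$rest   = fgets($hnd);    // no length: reads up to and including the newline

var_dump($first, $second, $rest);
fclose($hnd);
```

Each call continues from where the previous one stopped, so the three results together make up the whole line.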

Because PHP, like many C derivatives, counts from 0 and not from 1. These languages use zero-based numbering.
E.g. for arrays: an array of length/size n has elements 0 to n - 1,
i.e. 0, 1, 2, 3, 4 ... n-1
So an array of length 5 has elements 0, 1, 2, 3, 4
So you will find that whether reading bytes, strings or arrays, they always refer to the (n-1)th element or marker for an n-size structure.

Please use the following code for the question you raised:
$hnd=fopen('E:\\data.txt','r');
echo fgets($hnd,2);

Related

Pack Convert byte array into UINT64

I want to convert byte array to UINT64 using PHP.
I can do this easily in C# but I want to do this in PHP.
Here is C# code.
bytes = Encoding.UTF8.GetBytes(hashed);
BitConverter.ToUInt64(bytes, 0);
I want to convert this to PHP.
I tried to use the pack() function but this does not work.
Let's say this is a byte array.
$bytes = [101,102,54,55,99,55,56,49];
pack("J*","101","102","54","55","99","55","56","49");
This shows a warning.
pack(): 7 arguments unused on line
How can I fix this?
The major issue here (if I understand it correctly) is that you're using PHP numbers to represent a byte array, whereas unpack requires an input string. If you keep the array as is, PHP seems to just convert the numbers to strings, meaning 101 becomes '101', which in turn is 3 bytes, and that breaks the whole thing.
You need to first convert the numbers to bytes. A byte is essentially an unsigned char, so you could first pack your array into unsigned chars and then unpack them:
$bytes = [101,102,54,55,99,55,56,49];
$unpacked = unpack("J", pack('C*', ...$bytes));
echo current($unpacked);
Explanation:
C is the pack code for unsigned char and * indicates that you need to use all array entries. This will generate a string of bytes based on the array. You can then unpack this string using J (if you know for a fact that the bytes were generated in big-endian byte order), P (if you know the bytes were generated in little-endian order) or Q (if you want to use the machine byte order). If the bytes were generated on the same machine then Q would probably be a better choice than J; otherwise you need to know the endianness.
Example: http://sandbox.onlinephpfunctions.com/code/5cba2c29522f7b9f9a0748b99fac768012e759ce
Note: This is my personal understanding of what is happening so anyone with better pack/unpack knowledge can let me know if I got things wrong.
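As a rough illustration of the byte-order point, the sketch below unpacks the same eight bytes with J and with P and prints both results in hex. It assumes a 64-bit PHP build; the variable names are only for illustration. On a little-endian machine, C#'s BitConverter.ToUInt64 reads the bytes in little-endian order, so P is the format expected to match it there:

```php
<?php
$bytes  = [101, 102, 54, 55, 99, 55, 56, 49];
$binary = pack('C*', ...$bytes);          // 8-byte string "ef67c781"

$bigEndian    = unpack('J', $binary)[1];  // first byte is most significant
$littleEndian = unpack('P', $binary)[1];  // first byte is least significant

echo dechex($bigEndian), "\n";    // 6566363763373831
echo dechex($littleEndian), "\n"; // 3138376337366665
```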

PHP pack: do not really understand

I posted this (php pack: problems with data types and verification of my results) and found that I had two problems.
So here again only one issue (I solved the other one) Hopefully this is easy to understand:
I want to use the PHP pack() function.
1) My aim is to convert any integer number into a hex one of length 2 bytes.
Example: 0d37 --> 0x0025
2) Second aim is to toggle high / low byte of each value: 0x0025 --> 0x2500
3) There are many input values which will form 12-Bytes of binary data.
Can anyone help me?
You just have to look up the format table in the pack() manual page and it is quite easy.
2 bytes means 16 bits, or also called a "short". I assume you want that unsigned ... so we get n for big endian (high) and v for little endian (low) byte order.
The only potentially tricky part is figuring out how to combine the format and parameters, as each format character is tied to a value argument:
bin2hex(pack('nv', 34, 34)) // returns 00222200
If you need a variable number of values, you'll need argument unpacking (a PHP language feature, not to be confused with unpack()):
$format = 'nv';
$values = [34, 34];
pack($format, ... $values); // does the same thing
And alternatively, if all of your values should be packed with the same format, you could do this:
pack('v*', ...$values); // will "pack" as many short integers as you want
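Applied to the example from the question, a short sketch (the six input values are made up for illustration):

```php
<?php
// 37 decimal is 0x25; 'n' writes the high byte first, 'v' the low byte first.
echo bin2hex(pack('n', 37)), "\n"; // 0025
echo bin2hex(pack('v', 37)), "\n"; // 2500

// Six 16-bit values packed with one format give 12 bytes of binary data.
$values = [37, 1, 2, 3, 4, 5];     // hypothetical inputs
$binary = pack('v*', ...$values);
echo strlen($binary), "\n";        // 12
```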

Why are there binary safe AND binary unsafe functions in php?

Is there any reason for this behavior/implementation? Example:
$array = array("index_of_an_array" => "value");
class Foo {
private $index_of_an_array;
function __construct() {}
}
$foo = new Foo();
$array = (array)$foo;
$key = str_replace("Foo", "", array_keys($array)[0]);
echo $array[$key];
Gives us an error which is complete:
NOTICE Undefined index: on line number 9
Example #2:
echo date("Y\0/m/d");
Outputs:
2016
BUT! echo or var_dump(), for example, and some other functions, would output the string "as it is", just \0 bytes are being hidden by browsers.
$string = "index-of\0-an-array";
$string2 = "Y\0/m/d";
echo $string;
echo $string2;
var_dump($string);
var_dump($string2);
Outputs:
index-of-an-array
"Y/m/d"
string(18) "index-of-an-array"
string(6) "Y/m/d"
Notice that the length of $string is 18, but only 17 characters are shown.
EDIT
From possible duplicate and php manual:
The key can either be an integer or a string. The value can be of any type.
Strings containing valid integers will be cast to the integer type. E.g. the key "8" will actually be stored under 8. On the other hand "08" will not be cast, as it isn't a valid decimal integer. So in short, any string can be a key. And a string can contain any binary data (up to 2GB). Therefore, a key can be any binary data (since a string can be any binary data).
From php string details:
There are no limitations on the values the string can be composed of;
in particular, bytes with value 0 (“NUL bytes”) are allowed anywhere
in the string (however, a few functions, said in this manual not to be
“binary safe”, may hand off the strings to libraries that ignore data
after a NUL byte.)
But I still do not understand why the language is designed this way. Are there reasons for this behavior/implementation? Why doesn't PHP handle input as binary safe everywhere, rather than just in some functions?
From comment:
The reason is simply that many PHP functions like printf use the C library's implementation behind the scenes, because the PHP developers were lazy.
Aren't those functions such as echo, var_dump and print_r? In other words, functions that output something. They are in fact binary safe if we take a look at my first example. It makes no sense to me to implement some binary-safe and some binary-unsafe functions for output, or to use some as they are in the C standard library and write others completely from scratch.
The short answer to "why" is simply history.
PHP was originally written as a way to script C functions so they could be called easily while generating HTML. Therefore PHP strings were just C strings, which are a set of any bytes. So in modern PHP terms we would say nothing was binary-safe, simply because it wasn't planned to be anything else.
Early PHP was not intended to be a new programming language, and grew organically, with Lerdorf noting in retrospect: "I don’t know how to stop it, there was never any intent to write a programming language […] I have absolutely no idea how to write a programming language, I just kept adding the next logical step on the way."
Over time the language grew to support more elaborate string-processing functions, many taking the string's specific bytes into account and becoming "binary-safe". According to the recently written formal PHP specification:
As to how the bytes in a string translate into characters is unspecified. Although a user of a string might choose to ascribe special semantics to bytes having the value \0, from PHP's perspective, such null bytes have no special meaning. PHP does not assume strings contain any specific data or assign special values to any bytes or sequences.
As a language that has grown organically, there hasn't been a move to universally treat strings in a manner different from C. Therefore functions and libraries are binary-safe on a case-by-case basis.
First Example from Question
Your first example is confusing because the error message is the part that terminates at the null character; the string is not being handled incorrectly by the array. The original code you posted with the error message follows:
$array = array("index-of-an-array" => "value");
$string = "index-of\0-an-array";
echo $array[$string];
Notice: Undefined index: index-of in
Note that the error message above has been truncated to index-of because of the null character. The array is working as expected: if you try it this way, it works just fine:
$array = array("index-of\0-an-array" => "value");
$string = "index-of\0-an-array";
echo $array[$string];
The error message correctly reported that the two keys differ, which they do:
"index-of\0-an-array" != "index-of-an-array"
The problem is that the error message printed out everything up to the null character. If that's the case then it might be considered a bug by some.
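A small sketch of the same point: the two keys differ by one NUL byte, so the array lookup misses, and it succeeds once the NUL is removed. The variable names are only for illustration:

```php
<?php
$array = ["index-of-an-array" => "value"];
$key   = "index-of\0-an-array";            // same text plus one NUL byte

var_dump(strlen($key));                     // int(18): the NUL byte is counted
var_dump(array_key_exists($key, $array));   // bool(false): the keys differ

$stripped = str_replace("\0", "", $key);    // drop the NUL byte
var_dump(array_key_exists($stripped, $array)); // bool(true)
```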
The second example is starting to plumb the depths of PHP :)
I've added some code to it so we can see what's happening:
<?php
class Foo {
public $index_public;
protected $index_prot;
private $index_priv;
function __construct() {
$this->index_public = 0;
$this->index_prot = 1;
$this->index_priv = 2;
}
}
$foo = new Foo();
$array = (array)$foo;
print_r($foo);
print_r($array);
//echo $array["\0Foo\0index_of_an_array2"];//This prints 2
//echo $foo->{"\0Foo\0index_of_an_array2"};//This fails
var_dump($array);
echo array_keys($array)[0] . "\n";
echo $array["\0Foo\0index_priv"] . "\n";
echo $array["\0*\0index_prot"] . "\n";
The above code's output is
Foo Object
(
[index_public] => 0
[index_prot:protected] => 1
[index_priv:Foo:private] => 2
)
Array
(
[index_public] => 0
[*index_prot] => 1
[Fooindex_priv] => 2
)
array(3) {
'index_public' =>
int(0)
'\0*\0index_prot' =>
int(1)
'\0Foo\0index_priv' =>
int(2)
}
index_public
2
1
The PHP developers chose to use the \0 character as a way to split member variable types. Note that protected fields use a * to indicate that the member variable may actually belong to many classes. It's also used to protect private access, i.e. this code would not work:
echo $foo->{"\0Foo\0index_priv"}; //This fails
but once you cast it to an array there is no such protection, i.e. this works:
echo $array["\0Foo\0index_priv"]; //This prints 2
Is there any reason for this behavior/implementation?
Yes. On any system that you need to interface with, you need to make system calls. If you want the current time, or to convert a date, etc., you need to talk to the operating system, and this means calling the OS API; in the case of Linux this API is in C.
PHP was originally developed as a thin wrapper around C. Quite a few languages start out this way and evolve; PHP is no exception.
Is there any reason for this behavior/implementation?
In the absence of any backwards compatibility issues I'd say some of the choices are less than optimal but my suspicion is that backwards compatibility is a large factor.
But I still do not understand why the language is designed this way?
Backwards compatibility is almost always the reason why features that people don't like remain in a language. Over time languages evolve and remove things, but it's incremental and prioritized. If you had asked all the PHP developers whether they wanted better binary string handling for some functions or a JIT compiler, I think a JIT might win (a JIT did eventually ship, in PHP 8.0). Note, the people doing the actual work ultimately decide what they work on, and working on a JIT compiler is more fun than fixing libraries that do things in seemingly odd ways.
I'm not aware of any language implementor who doesn't wish they'd done some things differently from the outset. Anyone implementing a compiler before a language is popular is under a lot of pressure to get something that works for them, and that means cutting corners. Not all languages in existence today had a huge company backing them; most often it was a small dedicated team, and they made mistakes. Some were lucky enough to get paid to do it. Calling them lazy is a bit unfair.
All languages have dark corners, warts and boils, and features you'll eventually hate. Some more than others, and PHP has a bad rep because it has/had a lot more than most. Note, PHP 5 is a vast leap forward from PHP 4. I'd imagine that PHP 7 will improve things even more.
Anyone who thinks their favorite language is free from problems is delusional and has almost certainly not plumbed the depths of the tool they're using to any great depth.
Functions in PHP which internally operate on C strings are "not binary safe" in PHP terminology. A C string is an array of bytes ending with a 0 byte. When a PHP function internally uses C strings, it reads the string one character at a time, and when it encounters a 0 byte it treats it as the end of the string. The 0 byte tells C string functions where the string ends, since a C string does not carry any information about its length.
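To make this concrete, here is a rough emulation in PHP of what code that treats its input as a C string effectively sees. The helper cStringView() is hypothetical and written only for this illustration; it is not a PHP built-in:

```php
<?php
// Hypothetical helper: return the part of $s that C-string code would see,
// i.e. everything before the first NUL byte.
function cStringView(string $s): string
{
    $pos = strpos($s, "\0");
    return $pos === false ? $s : substr($s, 0, $pos);
}

$format = "Y\0/m/d";
var_dump(strlen($format));              // int(6): PHP itself knows the full length
var_dump(cStringView($format));         // string(1) "Y"
var_dump(cStringView("no NUL here"));   // string(11) "no NUL here"
```

This would be consistent with the date("Y\0/m/d") example in the question printing only the year.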
"Not binary safe" means that, if function which operates with C string is somehow handed a C string not terminated with byte 0, behavior is unpredictable because function will read/write bytes beyond end of the string, adding garbage to string and/or potentially crashing PHP.
In C++, for example, we have string object. This object also contains an array of characters, but it has also a length field which it updates on any length change. So it does not require byte 0 to tell it where the end is. This is why string object can contain any number of 0 bytes, although this is generally not valid since it should contain only valid characters.
In order for this to be corrected, the whole PHP core, including any modules which operate with C strings, need to be rewritten in order to send "non binary safe" functions to history. The amount of job needed for this is huge and all the modules' creators need to produce new code for their modules. This can introduce new bugs and instabilities into the whole story.
Issue with byte 0 and "non binary safe" functions is not that much critical to justify rewriting PHP and PHP modules code. Maybe in some newer PHP version where some things need to be coded from scratch it would make sense to correct this.
Until then, you just need to know that any arbitrary binary data put to some string by using binary-safe functions needs to have byte 0 added at the end. Usually you will notice this when there is unexpected garbage at end of your string or PHP crashes.

Php array string or var string?

I've been working with some strings in php to make my own framework...
There's something that "bothers" me.
$var = "hello!";
$arr = array("h","e","l","l","o","!");
Can someone tell me which one ($var or $arr) uses more memory than the other one? And why?
At first sight I would say the array would use more memory since it has to position each character inside the array itself, but I'm not sure.
The array will use more memory than the string
A string and an array are zval structures in their own right, but each element in the array is a string as well, each with its own zval; arrays take a surprising amount of memory. There is also the fact that an array element comprises both a key and a value, each using memory
Take a read of this article to see just how much memory is used by an array structure
The array takes (a lot!) more memory.
A string in PHP is an object in memory that holds (for example) the length and a pointer to the actual data in memory. I think that on most platforms that gives you 32 bits for the length and 64 bits for the pointer. With 16-byte alignment requirements from some CPUs, this means that each string will be at least 32 bytes (descriptor + actual data) - even if it's only a single character.
The array from your example contains 6 strings. That will be 192 bytes plus the overhead of storing an array, which is not insignificant either (count on at least 128 more bytes).
Disclaimer: the numbers used in this answer are a rough approximation - expect far more overhead than mentioned here.
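One rough way to check this empirically is to compare memory_get_usage() before and after building each value. The sketch below uses a longer string (1000 characters, an arbitrary choice) so the difference is easy to see; the exact numbers depend on the PHP version and platform, so none are quoted here:

```php
<?php
$start = memory_get_usage();

$str = str_repeat('a', 1000);   // one string of 1000 bytes
$afterString = memory_get_usage();

$arr = str_split($str);         // array of 1000 one-character strings
$afterArray = memory_get_usage();

$stringBytes = $afterString - $start;
$arrayBytes  = $afterArray - $afterString;

echo "string: {$stringBytes} bytes\n";
echo "array:  {$arrayBytes} bytes\n";
```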

Converting Fortran REAL packing routine to PHP

I am tasked with converting some Fortran code to PHP and am stumbling at the last hurdle.
In essence the Fortran converts a REAL into a binary CHAR(4) which it ultimately writes to file.
The Fortran (which also confuses me) is as follows:
FUNCTION MKS(x)
CHARACTER (LEN=4) :: MKS ! The 4-character string which is returned to
REAL :: x ! The incoming single-precision variable
CHARACTER (LEN=1), DIMENSION(4) :: a ! A working variable
CHARACTER (LEN=4) :: d ! A working variable
CALL MKS1(x,a) ! Send x - get back a(1), a(2), a(3), a(4)
! Note: x will hold the first 32 bits referenced
! and a will hold the next 32 bits
d = a(1) // a(2) // a(3) // a(4) ! concatenate into 1 string (d)
MKS = d ! assign string to variable MKS
END FUNCTION MKS
SUBROUTINE MKS1 (b,c)
IMPLICIT NONE
CHARACTER (LEN=1), DIMENSION(4) :: b ! array with incoming 32 bits
CHARACTER (LEN=1), DIMENSION(4) :: c ! array with each character returned
INTEGER :: i ! DO Loop counter
DO i=1,4
c(i) = b(i)
END DO
END SUBROUTINE MKS1
I have attempted to recreate this function using php as follows
pack('CCCC', $value & 0x000F,
($value>>8) & 0x000F,
($value>>16) & 0x000F,
($value>>24) &0x000F);
But on comparing the output values using the *nix od command shows completely different results.
What is the correct way to pack the equivalent to a Fortran REAL into a char[4] Array in PHP?
It turned out to be quite simple.
Your Fortran REAL is stored as an IEEE 754 32-bit floating point number.
The output from your od was misleading. Converting it to hex gives the following.
0115040 0134631 0005077
0x20, 0x9A, 0x99, 0xB9, 0x3f, 0x0a
The first and last bytes of the file are redundant; they are a space (0x20) and a line feed (0x0a) respectively. The bit we're after is the middle 4 bytes.
Using pack we can convert from floats (warning: endianness is machine dependent).
The following:
var_dump(bin2hex(pack('f', 1.450)));
Gives us a familiar sequence of bytes.
string(8) "9a99b93f"
So instead of converting to hex, output that to a file with a space at the start and a newline at the end, and you'll have an identical file (as long as your PHP/machine configuration doesn't do something mad with floats, but even then, if you follow the IEEE 754 spec, you should be able to reproduce it).
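To sidestep the machine-dependent byte order of 'f', pack() also has explicit-endianness float codes, 'g' (little endian) and 'G' (big endian), available since PHP 7.0.15 / 7.1.1. A short sketch, assuming the Fortran side wrote the REAL in little-endian order as the od output above suggests:

```php
<?php
$value = 1.45;

$little = pack('g', $value);   // IEEE 754 single precision, little endian
$big    = pack('G', $value);   // same value, big endian

echo bin2hex($little), "\n";   // 9a99b93f
echo bin2hex($big), "\n";      // 3fb9999a

// Round trip: the result is the nearest single-precision value to 1.45.
$roundTrip = unpack('g', $little)[1];
var_dump(abs($roundTrip - $value) < 1e-6);  // bool(true)
```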
This might be an extended comment rather than an answer.
You are, in the statement
CALL MKS1(x,a)
passing a REAL argument where the subroutine expects an array of 4 length-1 characters. You deserve all the bad things which happen to you :-) You can only compile this because you haven't required explicit interfaces on your subroutines.
What do you want the 4 characters that your PHP program reads to be? If, for instance, your Fortran wrote the REAL into 4 bytes in a binary file and your PHP read 4 bytes as 4 single characters, would you get the characters you want? I'm uncertain of what your requirement is.
