Why are these pthreads segmentation faults possible?

In this introduction to pthreads I read that:
When the programmer calls Thread::start, a new thread is created, a PHP interpreter context is initialized and then (safely) manipulated to mirror the context that made the call to ::start.
And later in the text the segmentation fault problem is addressed. This example of a segmentation fault is given:
class W extends Worker {
public function run(){}
}
class S extends Stackable {
public function run(){}
}
/* 1 */
$w = new W();
/* 2 */
$j = array(
new S(), new S(), new S()
);
/* 3 */
foreach ($j as $job)
$w->stack($job);
/* 4 */
$j = array();
$w->start();
$w->shutdown();
The above example will always segfault; steps 1-3 are perfectly normal, but before the Worker is started the stacked objects are deleted, resulting in a segfault when the Worker is allowed to start.
The questions are:
Is the whole context that starts the new thread copied into the new thread when start() is called, or only at the time the interpreter sees a reference to a variable of the old context? In other words, is it enough to keep refcounts > 0 until start() is called?
Shouldn't references to the Stackable array entries be stored inside the Worker object, so that their refcount is still 1 after $j is overwritten and no segfault can occur?

1) The whole context is copied when the thread is started. You have to keep refcounts > 0 until the stacked objects are actually executed by the worker thread.
2) The reference counting built into variables in PHP was never prepared for multi-threading: many API functions decrement and increment refcounts, and there is no opportunity to synchronize (lock). For that reason, you are responsible for maintaining a reference to any object that is destined to be executed by another thread.
These rather annoying, but unavoidable, facts can be sidestepped by using the Pool abstraction provided with pthreads. It maintains references for you in the proper way.
http://php.net/Pool

Related

How should a PHP thread store its data?

So I have been googling and reading up and down the internet about PHP pthreads3 and how they are supposed to store data. (Or rather, how they are not)
It seems to me that the only way for a thread to store its data properly is to create a new Threaded object and send it to the thread. The thread can then use this Threaded object to store nearly any data.
My question, and biggest issue with grasping PHP threads:
Is it possible to have the thread create its own storage objects when it wants?
I have no idea how or why, since all the answers I've found on this give a vague, elaborate and confusing "maybe, but no", mostly related to poor performance and memory issues/safety.
This seems like it should be possible, somehow:
class someFantasticThread extends Thread {
public $someData;
function run(){
while(true){
//Create a fresh storage for the new data this iteration
$this->someData = new SomeCoolStorage(); // Can this work somehow without all the issues?
$this->someData[] = 'amazingdata'; // Do something amazing and store the new results in $someData
$this->someData[] = new SomeCoolStorage(); // This would also be desirable, if it can somehow be done
//don't mind the obvious loop issues. Imagine this is a well formed loop
}
}
}
class SomeCoolStorage extends Threaded{}
// Start the thread
$threadObj = new someFantasticThread();
$threadObj->start();
while(true){
// at some point, retrieve the data and do something useful with the contained results
// doSomethingAwesome($threadObj->someData);
}

It seems to me that the only way for a thread to store its data properly is to create a new Threaded object and send it to the thread.
Yes, that is one way to do it.
Is it possible to have the thread create its own storage objects when it wants?
Yes, but only if you manipulate it within that thread (or any child threads it may spawn).
One of the fundamental things to understand when using threads in PHP is that objects of a Threaded class are tied to the context in which they are created. This means that if you create a Threaded object in the main thread, pass this object into a spawned child thread, and then join that spawned child thread, then you may continue to use that Threaded object as normal.
Example 1 (constructor injection):
<?php
$store = new Threaded(); // created in the main thread
$thread = new class($store) extends Thread {
public $store;
public function __construct(Threaded $store)
{
$this->store = $store;
}
public function run()
{
$this->store[] = 1;
$this->store[] = 2;
}
};
$thread->start() && $thread->join();
print_r($store); // continue using it in the main thread
This will output:
Threaded Object
(
[0] => 1
[1] => 2
)
In the example above, we could also have created the Threaded object inside of the constructor, and then performed a var_dump($thread->store); at the end of the script. This works because the Threaded object is still being created in the outermost scope in which it is needed, and thus it is not tied to the scope of any child threads that may have already been destroyed. (The only part of a Thread in PHP that is executed in a separate thread is the Thread::run method.)
Similar to the above example, we could also have used setter injection. (Though, again, only so long as the setter is called by the thread in the outermost scope in which the Threaded object will be used.)
The problem that many developers who are new to threading in PHP seem to encounter arises when they create a Threaded object from inside of a new thread, and then expect to be able to use that Threaded object after they have joined that same thread.
Example:
<?php
$thread = new class() extends Thread {
public $store;
public function run()
{
$this->store = new Threaded(); // created inside of the child thread
$this->store[] = 1;
$this->store[] = 2;
}
};
$thread->start() && $thread->join();
print_r($thread->store); // attempt to use it in the outer context (the main thread)
This will output:
RuntimeException: pthreads detected an attempt to connect to an object which has already been destroyed in %s:%d
This is because the Threaded object in $thread->store has been destroyed when joining the spawned child thread. This problem can be far more subtle, too. For example, creating new arrays inside of Threaded objects will automatically cast them to Volatile objects (which are also Threaded objects).
This means that the following example will not work either:
<?php
$thread = new class() extends Thread {
public $store;
public function run()
{
$this->store = [];
$this->store[] = 1;
$this->store[] = 2;
}
};
$thread->start() && $thread->join();
print_r($thread->store);
Output:
RuntimeException: pthreads detected an attempt to connect to an object which has already been destroyed in %s:%d
To come back to your example code, what you're doing is absolutely fine, but only so long as you do not attempt to use $this->someData outside of that child thread.

How does a class extension or Interface work?

I have come across this many times and am not sure why it happens, so it got me curious. Some classes work before they are declared and others don't:
Example 1
$test = new TestClass(); // top of class
class TestClass {
function __construct() {
var_dump(__METHOD__);
}
}
Output
string 'TestClass::__construct' (length=22)
Example 2
When a class extends another class or implements any interface
$test = new TestClass(); // top of class
class TestClass implements JsonSerializable {
function __construct() {
var_dump(__METHOD__);
}
public function jsonSerialize() {
return json_encode(rand(1, 10));
}
}
Output
Fatal error: Class 'TestClass' not found
Example 3
Let's try the same class above but change the position
class TestClass implements JsonSerializable {
function __construct() {
var_dump(__METHOD__);
}
public function jsonSerialize() {
return json_encode(rand(1, 10));
}
}
$test = new TestClass(); // move this from top to bottom
Output
string 'TestClass::__construct' (length=22)
Example 4 ( I also tested with class_exists )
var_dump(class_exists("TestClass")); //true
class TestClass {
function __construct() {
var_dump(__METHOD__);
}
public function jsonSerialize() {
return null;
}
}
var_dump(class_exists("TestClass")); //true
as soon as it implements JsonSerializable ( Or any other)
var_dump(class_exists("TestClass")); //false
class TestClass implements JsonSerializable {
function __construct() {
var_dump(__METHOD__);
}
public function jsonSerialize() {
return null;
}
}
var_dump(class_exists("TestClass")); //true
Also Checked Opcodes without JsonSerializable
line # * op fetch ext return operands
---------------------------------------------------------------------------------
3 0 > SEND_VAL 'TestClass'
1 DO_FCALL 1 $0 'class_exists'
2 SEND_VAR_NO_REF 6 $0
3 DO_FCALL 1 'var_dump'
4 4 NOP
14 5 > RETURN 1
Also Checked Opcodes with JsonSerializable
line # * op fetch ext return operands
---------------------------------------------------------------------------------
3 0 > SEND_VAL 'TestClass'
1 DO_FCALL 1 $0 'class_exists'
2 SEND_VAR_NO_REF 6 $0
3 DO_FCALL 1 'var_dump'
4 4 ZEND_DECLARE_CLASS $2 '%00testclass%2Fin%2FaDRGC0x7f563932f041', 'testclass'
5 ZEND_ADD_INTERFACE $2, 'JsonSerializable'
13 6 ZEND_VERIFY_ABSTRACT_CLASS $2
14 7 > RETURN 1
Question
I know Example 3 worked because the class was declared before it was instantiated, but why would Example 1 work in the first place?
How does this entire process of extending or interface work in PHP to make one valid and the other invalid?
What Exactly is happening in Example 4?
The opcodes were supposed to make things clear but just made it more complex, because class_exists was called before TestClass but the reverse is the case.

I cannot find a write-up on PHP class definitions; however, I imagine it works precisely the same way as user-defined functions, which is what your experiments indicate.
Functions need not be defined before they are referenced, except when a function is conditionally defined as shown in the two examples below. When a function is defined in a conditional manner, its definition must be processed prior to being called.
<?php
$makefoo = true;
/* We can't call foo() from here
since it doesn't exist yet,
but we can call bar() */
bar();
if ($makefoo) {
function foo()
{
echo "I don't exist until program execution reaches me.\n";
}
}
/* Now we can safely call foo()
since $makefoo evaluated to true */
if ($makefoo) foo();
function bar()
{
echo "I exist immediately upon program start.\n";
}
?>
This is true for classes as well:
Example 1 works because the class is not conditional on anything else.
Example 2 fails because the class is conditional upon JsonSerializable.
Example 3 works because the class is correctly defined prior to being called.
Example 4 gets false the first time because the class is conditional but succeeds later because the class has been loaded.
The class is made conditional by either implementing an interface or extending another class from another file (require). I'm calling it conditional because the definition now relies upon another definition.
Imagine the PHP interpreter takes a first look at the code in this file. It sees a non-conditional class and/or function, so it goes ahead and loads them in memory. It sees a few conditional ones and skips over them.
Then the Interpreter begins to parse the page for execution. In example 4, it gets to the class_exists("TestClass") instruction, checks memory, and says nope, I don't have that. It doesn't have it because it was conditional. It continues executing the instructions, sees the conditional class and executes the instructions to actually load the class into memory.
Then it drops down to the last class_exists("TestClass") and sees that the class does indeed exist in memory.
In reading your opcodes, the TestClass doesn't get called before class_exists. What you see is the SEND_VAL, which is sending the value TestClass so that it is in memory for the next line, which actually calls DO_FCALL on class_exists.
You can then see how it is handling the class definition itself:
ZEND_DECLARE_CLASS - this is loading your class definition
ZEND_ADD_INTERFACE - this fetches JsonSerializable and adds that to your class definition
ZEND_VERIFY_ABSTRACT_CLASS - this verifies everything is sane.
It is that second piece, ZEND_ADD_INTERFACE, that appears to prevent the PHP Engine from merely loading the class on the initial peek at it.
If you desire a more detailed discussion of how the PHP Interpreter compiles and executes the code in these scenarios, I suggest taking a look at StasM's answer to this question; he provides an excellent overview of it in greater depth than this answer goes.
I think we answered all of your questions.
Best Practice: Place each of your classes in its own file and then autoload them as needed. As StasM states in his answer, use a sensible file naming and autoloading strategy, e.g. PSR-0 or something similar. When you do this, you no longer have to be concerned with the order in which the Engine loads them; it just handles that for you automatically.
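To make the autoloading advice concrete, here is a minimal sketch. It assumes a one-class-per-file layout where namespace separators map to directories (loosely in the spirit of PSR-0); the Demo\Basket class and the temporary directory are made up for illustration and are not part of the question.

```php
<?php
// Hypothetical layout: one class per file under a temporary base directory.
$baseDir = sys_get_temp_dir() . '/autoload_demo_' . uniqid();
mkdir($baseDir . '/Demo', 0777, true);

// The class file as it would normally sit on disk. It implements an
// interface, so it is the kind of class that is not early bound.
$source = <<<'PHP'
<?php
namespace Demo;

class Basket implements \Countable
{
    public function count(): int
    {
        return 0;
    }
}
PHP;
file_put_contents($baseDir . '/Demo/Basket.php', $source);

// Map "Demo\Basket" to "<baseDir>/Demo/Basket.php" and load it on demand.
spl_autoload_register(function (string $class) use ($baseDir): void {
    $file = $baseDir . '/' . str_replace('\\', '/', $class) . '.php';
    if (is_file($file)) {
        require $file;
    }
});

// Passing false as the second argument keeps class_exists from autoloading.
$knownBefore = class_exists('Demo\Basket', false);
$basket      = new Demo\Basket();   // the autoloader requires the file here
$knownAfter  = class_exists('Demo\Basket', false);

var_dump($knownBefore, $knownAfter, count($basket));
```

With this in place the position of the class declaration relative to its first use stops mattering, because the file is only loaded at the moment the class is first needed.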

The basic premise is that for class to be used it has to be defined, i.e. known to the engine. This can never be changed - if you need an object of some class, the PHP engine needs to know what the class is.
However, the moment where the engine gains such knowledge can be different. First of all, consuming of the PHP code by the engine consists of two separate processes - compilation and execution. On compilation stage, the engine converts PHP code as you know it to the set of opcodes (which you are already familiar with), on the second stage the engine goes through the opcodes as processor would go through instructions in memory, and executes them.
One of the opcodes is the opcode that defines a new class, which is usually inserted in the same place where the class definition is in the source.
However, when the compiler encounters class definition, it may be able to enter the class into the list of the classes known to the engine before executing any code. This is called "early binding". This can happen if the compiler decides that it already has all the information it needs to create a class definition, and there's no reason to defer the class creation until the actual runtime. Currently, the engine does this only if the class:
has no interfaces or traits attached to it
is not abstract
either does not extend any classes or extends only the class that is already known to the engine
is declared as top statement (i.e. not inside condition, function, etc.)
This behavior can also be modified by compiler options, but those are available only to extensions like APC so should not be a matter of much concern to you unless you are going to develop APC or similar extension.
This also means this would be OK:
class B extends A {}
class A { }
but this would not be:
class C extends B {}
class B extends A {}
class A { }
Since A would be early bound, and thus available for B's definition, but B would be defined only in line 2 and thus unavailable for line 1's definition of C.
In your case, when your class implemented the interface, it was not early bound, and thus became known to the engine at the point when "class" statement was reached. When it was simple class without interfaces, it was early bound and thus became known to the engine as soon as the file compilation was finished (you can see this point as one before the first statement in the file).
In order not to bother with all these weird details of the engine, I would support the recommendation of the previous answer - if your script is small, just declare classes before usage. If you have bigger application, define your classes in individual files, and have sensible file naming and autoloading strategy - e.g. PSR-0 or something similar, as suitable in your case.
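The early-binding rules above can be observed with a small script. This is only a sketch: PlainThing and CountableThing are hypothetical names, it must be run as a top-level script file, and the result for the first check depends on the engine behaving as described in this answer.

```php
<?php
// Both checks run before either declaration is reached at run time.
// The second argument (false) disables autoloading for the check.
$plainKnownEarly     = class_exists('PlainThing', false);
$countableKnownEarly = class_exists('CountableThing', false);

// No parent, no interfaces, no traits, top-level: a candidate for early binding.
class PlainThing
{
}

// Implements an interface, so per the rules above it is declared only when
// execution reaches this statement.
class CountableThing implements Countable
{
    public function count(): int
    {
        return 0;
    }
}

$plainKnownLate     = class_exists('PlainThing', false);
$countableKnownLate = class_exists('CountableThing', false);

var_dump($plainKnownEarly, $countableKnownEarly, $plainKnownLate, $countableKnownLate);
```

If the rules hold on your PHP version, only the interface-implementing class should be unknown in the early check, which matches Example 4 in the question.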

In which order are objects destructed in PHP?

What is the exact order of object deconstruction?
From testing, I have an idea: FIFO for the current scope.
class test1
{
public function __destruct()
{
echo "test1\n";
}
}
class test2
{
public function __destruct()
{
echo "test2\n";
}
}
$a = new test1();
$b = new test2();
Which produces the same results time and time again:
test1
test2
The PHP manual is vague (emphasis mine to highlight uncertainty): "The destructor method will be called as soon as there are no other references to a particular object or in any order during the shutdown sequence."
What is the exact order of deconstruction? Can anyone describe in details the implementation of destruction order that PHP uses? And, if this order is not consistent between all the PHP versions, can anyone pinpoint which PHP versions change in this order?

First of all, a bit on general object destruction order is covered here: https://stackoverflow.com/a/8565887/385378
In this answer I will only concern myself with what happens when objects are still alive during the request shutdown, i.e. if they were not previously destroyed through the refcounting mechanism or the circular garbage collector.
The PHP request shutdown is handled in the php_request_shutdown function. The first step during the shutdown is calling the registered shutdown functions and subsequently freeing them. This can obviously also result in objects being destructed if one of the shutdown functions was holding the last reference to some object (or if the shutdown function itself was an object, e.g. a closure).
After the shutdown functions have run the next step is the one interesting to you: PHP will run zend_call_destructors, which then invokes shutdown_destructors. This function will (try to) call all destructors in three steps:
First PHP will try to destroy the objects in the global symbol table. The way in which this happens is rather interesting, so I reproduced the code below:
int symbols;
do {
symbols = zend_hash_num_elements(&EG(symbol_table));
zend_hash_reverse_apply(&EG(symbol_table), (apply_func_t) zval_call_destructor TSRMLS_CC);
} while (symbols != zend_hash_num_elements(&EG(symbol_table)));
The zend_hash_reverse_apply function will walk the symbol table backwards, i.e. start with the variable that was created last and going towards the variable that was created first. While walking it will destroy all objects with refcount 1. This iteration is performed until no further objects are destroyed with it.
So what this basically does is a) remove all unused objects in the global symbol table b) if there are new unused objects, remove them too c) and so on. This way of destruction is used so objects can depend on other objects in the destructor. This usually works fine, unless the objects in the global scope have complicated (e.g. circular) interrelations.
The destruction of the global symbol table differs significantly from the destruction of all other symbol tables. Normally symbol tables are destructed by walking them forward and just dropping the refcount on all objects. For the global symbol table on the other hand PHP uses a smarter algorithm that tries to respect object dependencies.
The second step is calling all remaining destructors:
zend_objects_store_call_destructors(&EG(objects_store) TSRMLS_CC);
This will walk all objects (in order of creation) and call their destructor. Note that this only calls the "dtor" handler, but not the "free" handler. This distinction is internally important and basically means that PHP will only call __destruct, but will not actually destroy the object (or even change its refcount). So if other objects reference the dtored object, it will still be available (even though the destructor was already called). They will be using some kind of "half-destroyed" object, in a sense (see example below).
In case the execution is stopped while calling the destructors (e.g. due to a die), the remaining destructors are not called. Instead PHP will mark the objects as already destructed:
zend_objects_store_mark_destructed(&EG(objects_store) TSRMLS_CC);
The important lesson here is that in PHP a destructor is not necessarily called. The cases when this happens are rather rare, but it can happen. Furthermore this means that after this point no more destructors will be called, so the remainder of the (rather complicated) shutdown procedure does not matter anymore. At some point during the shutdown all the objects will be freed, but as the destructors have already been called this is not noticeable for userland.
I should point out that this is the shutdown order as it currently is. This has changed in the past and may change in the future. It's not something you should rely on.
Example for using an already destructed object
Here is an example showing that it is sometimes possible to use an object that already had its destructor called:
<?php
class A {
public $state = 'not destructed';
public function __destruct() { $this->state = 'destructed'; }
}
class B {
protected $a;
public function __construct(A $a) { $this->a = $a; }
public function __destruct() { var_dump($this->a->state); }
}
$a = new A;
$b = new B($a);
// prevent early destruction by binding to an error handler (one of the last things that is freed)
set_error_handler(function() use($b) {});
The above script will output destructed.

What is the exact order of deconstruction? Can anyone describe in detail the implementation of destruction order that PHP uses? And, if this order is not consistent between any and all PHP versions, can anyone pinpoint which PHP versions this order changes in?
I can answer three of these for you, in a somewhat roundabout way.
The exact order of destruction is not always clear, but is always consistent given a single script and PHP version. That is, the same script running with the same parameters that creates objects in the same order will basically always get the same destruction order as long as it runs on the same PHP version.
The shutdown process -- the thing that triggers object destruction when script execution has stopped -- has changed in the recent past, at least twice in a way that impacted the destruction order indirectly. One of these two introduced bugs in some old code I had to maintain.
The big one was back in 5.1. Prior to 5.1, the user's session was written to disk at the very start of the shutdown sequence, before object destruction. This meant that session handlers could access anything that was left over object-wise, like, say, custom database access objects. In 5.1, sessions were written after one sweep of object destruction. In order to retain the previous behavior, you had to manually register a shutdown function (which are run in order of definition at the start of shutdown before destruction) in order to successfully write session data if the write routines needed a (global) object.
It is not clear if the 5.1 change was intended or was a bug. I've seen both claimed.
The next change was in 5.3, with the introduction of the new garbage collection system. While the order of operations at shutdown remained the same, the precise order of destruction could now change based on ref counting and other delightful horrors.
NikiC's answer has details on the current (at time of writing) internal implementation of the shutdown process.
Once again, this is not guaranteed anywhere, and the documentation very expressly tells you to never assume a destruction order.
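If the order really matters to your code, one option is to not leave it to shutdown at all and release the objects yourself. The sketch below relies only on the documented rule that a destructor runs as soon as the last reference goes away; the Tracked class and the object names are invented for the illustration.

```php
<?php
class Tracked
{
    /** @var string[] names in the order their destructors ran */
    public static $log = [];

    private $name;

    public function __construct(string $name)
    {
        $this->name = $name;
    }

    public function __destruct()
    {
        self::$log[] = $this->name;
    }
}

$db      = new Tracked('db');
$session = new Tracked('session');
$alias   = $session;              // second reference to the same object

unset($session);                  // one reference remains, nothing is destructed yet
$afterFirstUnset = Tracked::$log;

unset($alias);                    // last reference gone: the destructor runs right here
unset($db);                       // released after the session, by our own choice
$afterExplicitRelease = Tracked::$log;

var_dump($afterFirstUnset, $afterExplicitRelease);
```

Releasing objects explicitly like this sidesteps the shutdown sequence entirely, at the cost of having to track every reference yourself.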

For anyone interested, as of PHP 8.0:
class A {
function __destruct() {
print get_class();
}
}
class B {
private $child;
function __construct() {
$this->child = new A();
}
function __destruct() {
print get_class();
}
}
class C {
private $child;
function __construct() {
$this->child = new B();
}
function __destruct() {
print get_class();
}
}
new C;
results in output of
CBA
i.e. the destructor of the containing object fires before the destructor of the contained object.
To reverse the order if desired, i.e. to ABC, change the destructor in all but A (the innermost class) to be:
function __destruct() {
unset($this->child);
print get_class();
}

What is the point of nulling private variables in destructor?

I have spotted the following pattern in code I'm working with: in some classes, the destructor sets private variables to null, for example:
public function __destruct()
{
foreach($this->observers as $observer)
{
$observer = null;
}
$this->db_build = null;
}
Is there any point in doing this when PHP has GC? Does it somehow improve performance of script?

It's sometimes just for the cleanliness meme. But in your example both $observer and $this->db_build reference sub-objects. So here the intention is to have them destroyed before the destruction of the current object finishes. (Albeit I'm unsure if Zend core really likes being interrupted when it's on a destroying rampage. It probably has a spool list or something.)
Anyway, it's not necessary from the GC point of view. But it might be sensible if the composite subobjects have some interdependencies, e.g. counters or registry references themselves. So, in most cases it's not needed, I'd say.
I've made a silly example to demonstrate the __destruct order:
class dest {
function __construct($name, $sub=NULL) {
$this->name = $name;
$this->sub = $sub;
}
function __destruct() {
print "<destroying $this->name>\n";
$this->sub = NULL;
print "</destroying $this->name>\n";
}
}
$d = new dest("first", new dest("second", new dest("third")));
exit;
Without the $this->sub = NULL the destruction of the objects would each happen individually, not necessarily in instantiation order. With unsetting composite objects manually however PHP destroys the three objects in a nested fashion:
<destroying first>
<destroying second>
<destroying third>
</destroying third>
</destroying second>
</destroying first>

It could be because PHP's garbage collection is based on reference counting, and older versions could not handle cyclical dependencies. Then, in some cases it would have been necessary to manually set references to null to enable the GC to do its work, and there may still be some special cases that the cycle detection algorithm does not catch.
More likely though, it's just an example of cargo cult programming (the Wikipedia entry even explicitly lists this as an example).
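To illustrate the cyclic case mentioned above, here is a small sketch for PHP versions that have the cycle collector (5.3 and later). The Node class is hypothetical; gc_enable() and gc_collect_cycles() are the standard functions.

```php
<?php
class Node
{
    /** @var int how many Node destructors have run so far */
    public static $destructed = 0;

    /** @var Node|null */
    public $peer;

    public function __destruct()
    {
        self::$destructed++;
    }
}

function makeCycle(): void
{
    $a = new Node();
    $b = new Node();
    $a->peer = $b;   // a -> b
    $b->peer = $a;   // b -> a: the two objects now keep each other alive
}

gc_enable();
makeCycle();

// The local variables are gone, but each object is still referenced by the
// other, so plain reference counting has not destructed anything.
$beforeCollect = Node::$destructed;

// Ask the cycle collector to look for unreachable cycles now.
gc_collect_cycles();
$afterCollect = Node::$destructed;

var_dump($beforeCollect, $afterCollect);
```

Setting $a->peer = null before the function returns would break the cycle by hand, which is presumably what such destructor code was trying to achieve on engines without cycle collection.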

First, it's good programming practice; second, it frees the script's memory. If the PHP script terminates right after the destructor is invoked, I see no advantage.

Will the same singleton instance be available after php script command line call?

I have script with defined class (for instance, Singleton.php). This class implements classic singleton pattern as in PHP manual:
class Singleton {
private static $instance;
public static function getInstance()
{
if (!isset(self::$instance)) {
$c = __CLASS__;
self::$instance = new $c;
}
return self::$instance;
}
public function run() {
// bunch of "thread safe" operations
} }
$inst = Singleton::getInstance();
$inst->run();
Question. If I call this script twice from command line ('php Singleton.php'), will run() method be really "thread safe"? It seems that it will not. I used to imitate single-process run via text file where some flag is stored, but it seems that there might be other cases. Your thoughts?

Singletons have nothing to do with thread-safety. They are here to only have one instance of an object per process.
So, to answer your question: no, your script is not thread safe. PHP will start one process (not thread) for each call on the CLI. Both processes will create an instance of your class and both will try to write the file.
The process that writes the file later will win, and overwrite the changes from the first process.

PHP is not threaded - it is process oriented. Each invocation of PHP (whether it be the command line or an Apache instance) is memory independent.
Your singleton will only be unique to that one process.
(oh and instead of doing $c=__CLASS__; $instance = new $c; you should use 'self' like $instance = new self();. Same result, less fuss. Also be sure to set your __construct() private/protected)
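Put together, the suggestions above might look like the following sketch. The class name ProcessLocalSingleton is made up here to stress that the instance is unique per process only.

```php
<?php
final class ProcessLocalSingleton
{
    /** @var self|null the single instance for this process */
    private static $instance = null;

    // Private constructor and clone: no instances from outside the class.
    private function __construct()
    {
    }

    private function __clone()
    {
    }

    public static function getInstance(): self
    {
        if (self::$instance === null) {
            self::$instance = new self();   // instead of $c = __CLASS__; new $c;
        }
        return self::$instance;
    }
}

$first  = ProcessLocalSingleton::getInstance();
$second = ProcessLocalSingleton::getInstance();
var_dump($first === $second);   // same object within this one process
```

A second `php` invocation runs in its own process with its own memory, so it builds its own, unrelated instance.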

If you run this script from the command line twice (concurrently, I guess), you will get two completely distinct processes, therefore the thread safety is not an issue: there are no threads here.
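If the real goal is that only one of the concurrently started processes does the work, that has to be coordinated between processes, for example with an advisory file lock. The following is a rough sketch using fopen and flock; the helper names and the lock file path are assumptions for the example, and advisory locks only help if every participating script uses them.

```php
<?php
/**
 * Try to become the only running instance.
 * Returns the open lock handle on success, or null if the lock is unavailable.
 */
function acquireInstanceLock(string $lockFile)
{
    $handle = fopen($lockFile, 'c');   // create if missing, never truncate
    if ($handle === false) {
        return null;
    }
    // LOCK_NB: do not wait if another process already holds the lock.
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return null;
    }
    return $handle;
}

/** Release the lock so that another process may take it. */
function releaseInstanceLock($handle): void
{
    flock($handle, LOCK_UN);
    fclose($handle);
}

// Hypothetical lock file location; a real script would use a fixed, shared path.
$lockFile = sys_get_temp_dir() . '/singleton_demo_' . uniqid() . '.lock';

$lock    = acquireInstanceLock($lockFile);
$gotLock = ($lock !== null);

if ($gotLock) {
    echo "Lock acquired, doing the work\n";
    // ... the operations that must not run concurrently ...
    releaseInstanceLock($lock);
} else {
    echo "Another instance seems to be running, skipping\n";
}
```

Compared with a flag stored in a text file, a lock held this way is dropped by the operating system when the process exits, so a crashed script does not leave a stale flag behind.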
