PHP Multi Threading - Synchronizing a cache file between threads
I created a script that, for a game situation, tries to find the best possible solution. It does this by simulating each and every possible move and quantifying them, thus deciding which is the best move to take (the one which will result in the fastest victory). To make it faster, I've implemented PHP's pthreads in the following way: each time the main thread needs to find a possible move (let's call this a JOB), it calculates all the possible moves at the current depth, then starts a Pool and adds each possible move to it (let's call these TASKS), so the threads develop the game tree for each move separately, for all the additional depths.
This would look something like this:
(1) Got a new job with 10 possible moves
(1) Created a new pool
(1) Added all the moves as tasks to the pool
(1) The tasks work concurrently, and each returns an integer as a result, stored in a Volatile object
(1) The main thread selects a single move, and performs it
.... the same gets repeated at (1) until the fight is complete
Right now, the TASKS use their own caches, meaning that while they work, they save caches and reuse them, but they do not share caches among themselves, and they do not carry caches over from one JOB to the next. I tried to resolve this, and in a way I managed, but I don't think this is the intended way, because it makes everything WAY slower.
What I tried to do is the following: create a class that stores all the cache hashes in arrays, then, before creating the pool, add it to a Volatile object. Before a task runs, it retrieves this cache and uses it for read/write operations, and when the task finishes, it merges its own copy with the instance held in the Volatile object. This works, as in, the caches made in JOB 1 can be seen in JOB 2, but it makes the whole process much slower than it was when each thread only used its own cache, which was built while building the tree and destroyed when the thread finished. Am I doing this wrong, or is the thing I want simply not achievable? Here's my code:
class BattlefieldWork extends Threaded {
    public $taskId;
    public $innerIterator;
    public $thinkAhead;
    public $originalBattlefield;
    public $iteratedBattlefield;
    public $hashes;

    public function __construct($taskId, $thinkAhead, $innerIterator, Battlefield $originalBattlefield, Battlefield $iteratedBattlefield) {
        $this->taskId = $taskId;
        $this->innerIterator = $innerIterator;
        $this->thinkAhead = $thinkAhead;
        $this->originalBattlefield = $originalBattlefield;
        $this->iteratedBattlefield = $iteratedBattlefield;
    }

    public function run() {
        $result = 0;
        $dataSet = $this->worker->getDataSet();
        $HashClassShared = null;
        $dataSet->synchronized(function ($dataSet) use (&$HashClassShared) {
            $HashClassShared = $dataSet['hashes'];
        }, $dataSet);
        $myHashClass = clone $HashClassShared;

        $thinkAhead = $this->thinkAhead;
        $innerIterator = $this->innerIterator;
        $originalBattlefield = $this->originalBattlefield;
        $iteratedBattlefield = $this->iteratedBattlefield;

        // the actual recursive function that builds the tree and calculates a
        // quantify value for the move; this uses the cloned hash cache
        $result = $this->performThinkAheadMoves($thinkAhead, $innerIterator, $originalBattlefield, $iteratedBattlefield, $myHashClass);

        // retrieve the common cache, and upload the result of this thread
        $HashClassShared = null;
        $dataSet->synchronized(function ($dataSet) use ($result, $myHashClass, &$HashClassShared) {
            // store the result of this thread
            $dataSet['results'][$this->taskId] = $result;
            // merge the data collected in this thread with the data stored in the `Volatile` object
            $HashClassShared = $dataSet['hashes'];
            $HashClassShared = $HashClassShared->merge($myHashClass);
        }, $dataSet);
    }
}
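For reference, the pool below is constructed with a BattlefieldWorker class that I haven't included; it is essentially just a Worker that holds the shared Volatile and hands it to the tasks, roughly:

class BattlefieldWorker extends Worker {
    private $dataSet;

    public function __construct(Volatile $dataSet) {
        // The Volatile carries the shared 'results' and 'hashes' entries.
        $this->dataSet = $dataSet;
    }

    public function getDataSet() {
        return $this->dataSet;
    }
}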
This is how I create my tasks, my Volatile, and my Pool:
class Battlefield {
    /* ... */
    public function step() {
        /* ... */
        /* get the possible moves for the current depth, that is 0, and store them in an array named $moves */
        // $nextInnerIterator is an int which shows which hero must take an action after the current move
        // $StartingBattlefield is the zero-point Battlefield, which will be used in quantification
        foreach ($moves as $moveid => $move) {
            $moves[$moveid]['quantify'] = new BattlefieldWork($moveid, self::$thinkAhead, $nextInnerIterator, $StartingBattlefield, $this);
        }

        $Volatile = new Volatile();
        $Volatile['results'] = array();
        $Volatile['hashes'] = $this->HashClass;

        $pool = new Pool(6, 'BattlefieldWorker', [$Volatile]);
        foreach ($moves as $moveid => $move) {
            if (is_a($moves[$moveid]['quantify'], 'BattlefieldWork')) {
                $pool->submit($moves[$moveid]['quantify']);
            }
        }
        while ($pool->collect());
        $pool->shutdown();

        $this->HashClass = $Volatile['hashes'];
        foreach ($Volatile['results'] as $moveid => $partialResult) {
            $moves[$moveid]['quantify'] = $partialResult;
        }
        /* The moves are ordered based on quantify, one is selected, and then, if the battle is not yet finished, step() is called again */
    }
}
And here is how I am merging two hash classes:
class HashClass {
    public $id = null;
    public $cacheDir;
    public $battlefieldHashes = array();
    public $battlefieldCleanupHashes = array();
    public $battlefieldMoveHashes = array();

    public function merge(HashClass $HashClass) {
        $this->battlefieldCleanupHashes = array_merge($this->battlefieldCleanupHashes, $HashClass->battlefieldCleanupHashes);
        $this->battlefieldMoveHashes = array_merge($this->battlefieldMoveHashes, $HashClass->battlefieldMoveHashes);
        return $this;
    }
}
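A note on the merge itself: array_merge renumbers integer keys and lets the second array's values overwrite the first's on string-key collisions. If these arrays are keyed by the hash strings and the goal is only to add entries that are not yet present, the array union operator does that without overwriting, e.g.:

$this->battlefieldMoveHashes = $this->battlefieldMoveHashes + $HashClass->battlefieldMoveHashes;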
I've benchmarked each part of the code to see where I am losing time, but everything seems fast enough to not warrant the time increase I am experiencing. What I am thinking is that the problem lies in the threads: sometimes it seems that no work is being done at all, as if the tasks are waiting for some other thread. Any insights on what the problem could be would be greatly appreciated.
Related
How to implement a good MVC pattern with PHP/MySQL without losing SQL request time/server memory? (good practices)
I want to implement a real MVC pattern for my PHP controllers. Specifically, I want to split the Model and the API by creating the PHP equivalent of Java "beans" (objects made for business organization) and an API using these business objects. For example, my basic object is the Member. The question is: where do I query my database? Do I fetch all the member's properties right in __construct and simply access them with getters, OR do I do nothing in __construct and call the database in every getter function? People tell me that the 1st solution is better; however, if I only want one specific piece of information in my controller, I will still create a Member with all the information fetched at construction (bad memory management). In the 2nd case, if I want several member properties, I will run several SQL requests, which increases my server execution time.

1st solution:

public function __construct($ID_membre, $sql) {
    $this->ID_membre = $ID_membre;
    $res = mysql_get("select * from membres where ID_membre = $ID_membre", $sql);
    if (!$res)
        throw new Exceptions\MyDefaultException("no member for this Id");
    $this->firstName = $res['first_name'];
    $this->lastName = $res['last_name'];
    $this->mail = $res['mail'];
    $this->gender = $res['gender'];
    // ....
    $this->sql = $sql;
}

public function getLastName() {
    return $this->lastName;
}

public function getMail() {
    return $this->mail;
}

public function getGender() {
    return $this->gender;
}

// ....

2nd solution:

public function __construct($ID_membre, $sql) {
    $this->ID_membre = $ID_membre;
    $res = mysql_get("select count(*) from membres where ID = $ID_membre", $sql);
    if ($res == 0)
        throw new Exceptions\MyDefaultException("no member with this id");
    $this->sql = $sql;
}

public function getLastName() {
    $this->lastName = mysql_get("select name from members where ID = {$this->id}", $this->sql);
    return $this->lastName;
}

public function getMail() {
    $this->mail = mysql_get("select mail from members where ID = {$this->id}", $this->sql);
    return $this->mail;
}

public function getGender() {
    $this->gender = mysql_get("select gender from members where ID = {$this->id}", $this->sql);
    return $this->gender;
}

In this context, good old custom SQL requests within controllers are perfect for not wasting time or memory, because they are tailored to each need. So why are such requests so poorly viewed nowadays? And if big organizations such as Facebook or Google do MVC with databases, how do they manage not to waste any time/memory while splitting Models and controllers?
This is a classic problem, which can even get worse if you want one property of many members. The standard answer is that solution 1 is better. Requesting one row from a database doesn't take much longer than requesting one value, so it makes sense to ask for the whole row at once. That is, unless your database rows get very big, which should not occur in a good database design; if your rows get so big that they hamper efficiency, then it is probably time to split the table.

Now back to the problem I mentioned at the start of this answer: you haven't solved it. My suggestion would be to make two classes: one with solution 1, dealing with one row, and one with solution 2, dealing with multiple rows. So both solutions have their place; it's just that solution 2 is almost always inefficient for dealing with one row, and I haven't even talked about the amount of extra coding it requires.
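To make that suggestion concrete, a rough sketch of the second class (the multi-row one); MemberCollection and getLastNames are made-up names, and mysql_get is assumed to behave as in the question:

class MemberCollection {
    private $ids;
    private $sql;

    public function __construct(array $ids, $sql) {
        $this->ids = $ids;
        $this->sql = $sql;
    }

    // One request fetches one property for ALL members at once,
    // instead of one request per member.
    public function getLastNames() {
        $idList = implode(',', array_map('intval', $this->ids));
        return mysql_get("select ID, name from members where ID in ($idList)", $this->sql);
    }
}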
Migrating data with Doctrine becomes slow
I need to import data from one table in db A to another table in db B (same server) and I've chosen Doctrine to import it. I'm using a Symfony Command, and all is good for the first loop: it takes just 0.04 secs. But then it gets slower and slower, and ends up taking almost half an hour... I'm considering building a shell script that calls this Symfony command with a given offset (I tried that manually and it keeps the same speed). This is running in a Docker service; the php service is at around 100% CPU, while the mysql service is at 10%. Here is part of the script:

class UserCommand extends Command
{
    // ...

    protected function execute(InputInterface $input, OutputInterface $output)
    {
        $container = $this->getApplication()->getKernel()->getContainer();
        $this->doctrine = $container->get('doctrine');
        $this->em = $this->doctrine->getManager();
        $this->source = $this->doctrine->getConnection('source');

        $limit = self::SQL_LIMIT;
        $numRecords = 22690; // Hardcoded for debugging
        $loops = intval($numRecords / $limit);
        $numAccount = 0;

        for ($i = 0; $i < $loops; $i++) {
            $offset = self::SQL_LIMIT * $i;
            $users = $this->fetchSourceUsers($offset);
            foreach ($users as $user) {
                try {
                    $numAccount++;
                    $this->persistSourceUser($user);
                    if (0 === ($numAccount % self::FLUSH_FREQUENCY)) {
                        $this->flushEntities($output);
                    }
                } catch (\Exception $e) {
                    //
                }
            }
        }
        $this->flushEntities($output);
    }

    private function fetchSourceUsers(int $offset = 0): array
    {
        $sql = <<<'SQL'
SELECT email, password, first_name
FROM source.users
ORDER BY id ASC
LIMIT ? OFFSET ?
SQL;

        $stmt = $this->source->prepare($sql);
        $stmt->bindValue(1, self::SQL_LIMIT, ParameterType::INTEGER);
        $stmt->bindValue(2, $offset, ParameterType::INTEGER);
        $stmt->execute();

        return $stmt->fetchAll();
    }
}
If the time it takes to flush gets longer with every flush, then you forgot to clear the entity manager (which, for batch jobs, should happen after each flush). The reason is that you keep accumulating entities in the entity manager, and during every commit Doctrine checks each and every one of them for changes (I assume you're using the default change tracking).

"I need to import data from one table in db A to another table in db B (same server) and I've chosen doctrine to import it."

Unless you have some complex logic related to adding users (i.e. application events, something happening on the other side of the app, basically a need for some other PHP code to be executed), then you've chosen poorly: Doctrine is not designed for batch processing (although it can do just fine if you really know what you're doing). For a "simple" migration, the best choice would be to go with DBAL.
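A minimal sketch of what the fix could look like, assuming flushEntities() is a thin wrapper around the entity manager (its body is not shown in the question, so this is an assumption):

private function flushEntities(OutputInterface $output)
{
    $this->em->flush();
    // Detach all managed entities after the flush so Doctrine's change
    // tracking does not re-check every previously persisted user on the
    // next flush.
    $this->em->clear();
}

For the DBAL route, persistSourceUser() would boil down to something like $this->em->getConnection()->insert('users', $userData); per row, with no unit of work involved at all.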
Updating several records with one request in Laravel
Some quick context: I have an SQL table and an Eloquent model for JobCards, and each JobCard has several Operations belonging to it. I have a table and a model for Operations too. The users of my application browse and edit JobCards, but when I say editing a JobCard, this can include editing the Operations associated with it. I have a page where a user can edit the Operations of a certain JobCard, and I submit the data as an array of Operations. I want a clean way to update the data for the Operations of a JobCard. There are 3 different actions I may or may not need to do:

Update an existing Operation with new data
Create a new Operation
Delete an Operation

I tried dealing with the first 2 and things are getting messy already. I still need a way of deleting an Operation if it is not present in the array sent in the request. Here's my code:

public function SaveOps(Request $a)
{
    $JobCardNum = $a->get('JobCardNum');
    $Ops = $a->get('Ops');

    foreach ($Ops as $Op) {
        $ExistingOp = JobCardOp::GetOp($JobCardNum, $Op['OpNum'])->first();
        if (count($ExistingOp) == 0) {
            $NewOp = new JobCardOp;
            $NewOp->JobCardNum = $JobCardNum;
            $NewOp->fill($Op);
            $NewOp->save();
            $this->UpdateNextOpStatus($JobCardNum, $NewOp);
        } else {
            $ExistingOp->fill($Op);
            $ExistingOp->save();
        }
    }
}

Can anyone help with the deletion part and/or help make my code tidier?
This is how your method could look. Note that I added a new method, getCache($JobCardNum). It returns an array of the Operations of the given JobCard (assuming that your models are related this way), and it goes to the DB only once to get all the Operations needed for this call, instead of fetching them one by one inside the foreach loop. This way the expensive DB call is made only once, and you have the JobCard's Operations ready to compare with the ones coming in the request. The return value is a key => value array, the key being the operation number and the value being the operation object itself.

/**
 * This function will get you an array of the current operations in the given job card.
 * @param $JobCardNum
 * @return array
 */
public function getCache($JobCardNum)
{
    /*
     * Assuming that the relation in your model is built that way. If not, you
     * would have to fall back to JobCardOp::all() (not recommended, because it
     * fetches a lot of unnecessary data).
     */
    $ExistingOps = JobCardOp::where('job_card_id', '=', $JobCardNum)->get();
    $opCache = array();
    foreach ($ExistingOps as $Op) {
        $opCache[(string)$Op->OpNum] = $Op;
    }
    return $opCache;
}

public function SaveOps(Request $a)
{
    $JobCardNum = $a->get('JobCardNum');
    $Ops = $a->get('Ops');
    $opCache = $this->getCache($JobCardNum);

    foreach ($Ops as $Op) {
        $strOpNum = (string)$Op['OpNum'];
        if (!isset($opCache[$strOpNum])) {
            $NewOp = new JobCardOp;
            $NewOp->JobCardNum = $JobCardNum;
            $NewOp->fill($Op);
            $NewOp->save();
            $this->UpdateNextOpStatus($JobCardNum, $NewOp);
        } else {
            $ExistingOp = $opCache[$strOpNum];
            $ExistingOp->fill($Op);
            $ExistingOp->save();
        }
        unset($opCache[$strOpNum]);
    }

    /*
     * At this point, any item left in $opCache must be deleted, because it was
     * not matched in the loop over the requested operations :)
     */
    foreach ($opCache as $op) {
        $op->delete();
    }
}
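As a side note, assuming a Laravel version with collection helpers, the same lookup array can be built in one line with keyBy instead of the manual loop:

$opCache = JobCardOp::where('job_card_id', '=', $JobCardNum)->get()->keyBy('OpNum')->all();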
How to use DataMapper when the data-loading aspect needs to be optimized?
I have a DataMapper that creates an object and loads it with data from the DB. I call the DataMapper in a loop, so the object being created essentially keeps running the same SQL over and over again. How can I cache or reuse the data to ease the load on the database?

Code

$initData = '...';
$result = array();
foreach ($models as $model) {
    $plot = (new PlotDataMapper())->loadData($model, $initData);
    $plot->compute();
    $result[] = $plot->result();
}

class PlotDataMapper
{
    function loadData($model, $initData)
    {
        $plot = new Plot($initData);
        // If the loop above executes 100 times, this SQL executes
        // 100 times as well, even if $model is the same every time
        $data = $db->query("SELECT * FROM .. WHERE .. $model");
        $plot->setData($data);
        return $plot;
    }
}

My Thoughts

My line of thought is that I can use the DataMapper itself as a caching object. If a particular $model number has already been used, I store its results in some table of the PlotDataMapper object and retrieve them when I need them. Does that sound good? Kind of like memoizing data from the DB.
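That is essentially memoization, and it fits this shape of problem. A minimal sketch under the same assumptions as the snippet above ($db in scope, $model usable as an array key; the static $cache property is a made-up name):

class PlotDataMapper
{
    /** Query results already fetched, keyed by model number. */
    private static $cache = array();

    function loadData($model, $initData)
    {
        $plot = new Plot($initData);
        // Hit the DB only the first time this $model is seen;
        // later calls reuse the stored rows.
        if (!isset(self::$cache[$model])) {
            self::$cache[$model] = $db->query("SELECT * FROM .. WHERE .. $model");
        }
        $plot->setData(self::$cache[$model]);
        return $plot;
    }
}

A static property keeps the cache shared across the (new PlotDataMapper()) instances created in the loop; with an instance property, each iteration would start with an empty cache again.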
PHP Garbage Collection for Singleton Properties
(Specifically, I'm working inside Magento, but this is a general PHP question.)

I have a singleton class with an array which stores large objects. Please consider the following:

protected $_categoryObjectCache = array();

/**
 * @param $key
 * @param $value
 */
public function setCategoryObjectCache($key, $value)
{
    $this->_categoryObjectCache[$key] = $value;

    if ($this->nearingMemoryLimit()) {
        unset($this->_categoryObjectCache);
        if (function_exists('gc_collect_cycles')) {
            gc_collect_cycles();
        }
        $this->_categoryObjectCache = array();
    }
}

/**
 * @return bool
 */
protected function nearingMemoryLimit()
{
    $memory_limit = ini_get('memory_limit');
    if (preg_match('/^(\d+)(.)$/', $memory_limit, $matches)) {
        if ($matches[2] == 'M') {
            $memory_limit = $matches[1] * 1024 * 1024; // nnnM -> nnn MB
        } else if ($matches[2] == 'K') {
            $memory_limit = $matches[1] * 1024; // nnnK -> nnn KB
        }
    }
    return ((memory_get_usage() / $memory_limit) > 0.75);
}

(The method setCategoryObjectCache() doesn't get called very often. Usually I'm just reading from that array with a method called getCategoryObjectCache().)

I built this singleton cache as part of an importer which goes through thousands of products and assigns them to various categories. However, this requires loading each category model and using its built-in methods to look up parent/child relationships, status, etc. Looking up each category that a product relates to, over and over for every product, incurred thousands of duplicate model instantiations for the same category. This cache solves that problem.

The issue, though, is that I'm worried about the cache getting too big. Magento is notorious for having performance issues because of these huge model objects, so I wrote the above code to look at the current script's memory usage and clear the cache if the script gets too close to maxing out memory.

What I don't know is whether this is really going to work. PHP garbage collection is a big ball of mystery to me, and I'm not sure if I can just call $this->_categoryObjectCache = array(); and expect PHP to clean up, or if I really need to call unset($this->_categoryObjectCache); and gc_collect_cycles(). How can I make sure to clear out the RAM when my cache starts getting too big?
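For what it's worth, the reassignment question is easy to test in isolation. A standalone sketch (plain PHP, not Magento-specific) that prints memory usage before and after dropping a big array:

// Fill an array with large-ish values, then drop it and compare memory usage.
$cache = array();
for ($i = 0; $i < 100000; $i++) {
    $cache[$i] = str_repeat('x', 200);
}
echo "filled:  " . memory_get_usage() . "\n";

// Reassigning drops the old array's refcount to zero, so PHP frees it
// immediately; a separate unset() is not required for that.
$cache = array();
echo "cleared: " . memory_get_usage() . "\n";

// gc_collect_cycles() only matters when the stored values contain circular
// references (object A holding B holding A); plain strings and flat arrays
// are freed by refcounting alone.
echo "cycles collected: " . gc_collect_cycles() . "\n";

If the "cleared" number drops back down, reassignment alone is enough for the cache's contents; the unset()/gc_collect_cycles() pair would only help if the cached category models reference each other in cycles.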