I am running two threads at the same time, but I have a critical section where I need to put something into a MySQL DB. The problem is that both threads can insert the same thing at the same time.
I have done some calculations showing that after indexing 20000 different news pages, the indexes run from 20000 to 20020 (so roughly 20 duplicates were inserted).
How do I pause one thread while the other is accessing the database?
-----thread.php
class Process extends Thread {
    private $website_url;

    public function __construct($website_url) {
        $this->website_url = $website_url;
    }

    public function run() {
        work($this->website_url);
    }
}
-------------- work
function work($website_url) {
    while (condition) {
        // some work...
        if (something->check) { // if this is not yet in the DB
            mysqli->query("INSERT something INTO db...");
            // prepare, bind, exec...
        }
        // between the check and the insert, the second thread can
        // insert that same element; the critical section is really
        // small, but it happens sometimes...
    }
}
------ main.php
$job1 = new Process($website_url);
$job2 = new Process($website_url);
$job1->start();
$job2->start();
Mutual Exclusion
The simplest way of achieving what you want here is by the use of a single Mutex:
<?php
class Process extends Thread {
    protected $url;
    protected $mutex;

    public function __construct($url, $mutex) {
        $this->url = $url;
        $this->mutex = $mutex;
    }

    public function run() {
        work($this->url, $this->mutex);
    }
}
function work($url, $mutex) {
    while (1) {
        /* some work */

        /* failing to check the return value of calls to acquire
           or release a mutex is bad form; I haven't done so for brevity */
        Mutex::lock($mutex);
        {
            /* critical section */
            printf("working on %s\n", $url);
            /* sleeping here shows you that the critical section is
               not entered by the second thread; this is obviously not needed */
            sleep(1);
        }
        Mutex::unlock($mutex);

        /* breaking here allows the example code to end; not needed */
        break;
    }
}
$website = "stackoverflow.com";
$lock = Mutex::create();

$jobs = [
    new Process($website, $lock),
    new Process($website, $lock)
];

foreach ($jobs as $job)
    $job->start();

foreach ($jobs as $job)
    $job->join();

/* always destroy mutexes when finished with them */
Mutex::destroy($lock);
?>
This code should explain itself; I have added a few comments to guide you through it.
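As a side note (an illustrative aside, not part of the original answer): the duplicate-insert problem can also be attacked at the database level by declaring a UNIQUE key on the indexed column and using MySQL's INSERT IGNORE (or INSERT ... ON DUPLICATE KEY UPDATE), which makes the check-then-insert atomic inside the server. A minimal self-contained sketch of the idea, using SQLite's equivalent INSERT OR IGNORE so it runs without a MySQL server (the table and column names are invented for this example):

```php
<?php
/* hypothetical table/column names; the point is the UNIQUE (here PRIMARY KEY) constraint */
$db = new PDO('sqlite::memory:');
$db->exec('CREATE TABLE pages (url TEXT PRIMARY KEY)');

$stmt = $db->prepare('INSERT OR IGNORE INTO pages (url) VALUES (?)');
$stmt->execute(['http://example.com/news/1']);
$stmt->execute(['http://example.com/news/1']); /* duplicate: silently skipped */

/* only one row survives, no matter how many threads raced on the insert */
var_dump((int)$db->query('SELECT COUNT(*) FROM pages')->fetchColumn());
```

With the constraint in place, losing the race costs nothing: the second insert is simply a no-op instead of a duplicate row.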
Related
I have a small piece of code that demonstrates a race condition in multithreaded PHP.
The idea: my friend and I share a pot for cooking. If the pot already has an ingredient in it, it cannot be used to cook.
class Pot:
class Pot
{
    public $id;
    public $ingredient;

    function __construct()
    {
        $this->id = rand();
    }

    public function cook($ingredient, $who, $time) {
        if ($this->ingredient == null) {
            $this->ingredient = $ingredient;
            print "pot".$this->id.'/'.$who." cooking ".$this->ingredient." time spent: ".$time." \n";
            sleep($time);
            print "pot".$this->id.'/'.$who." had flush ingredient \n";
            $this->ingredient = null;
        } else {
            throw new Exception("Pot still cook ".$this->ingredient);
        }
    }
}
class Friend:
class Friend extends Thread
{
    /**
     * @var Pot
     */
    protected $pot;

    public function __construct($pot)
    {
        $this->pot = $pot;
    }

    function run() {
        Cocking::cleanVegetable("Friend");
        print "Friend will cook: \n";
        $this->pot->cook("vegetable", 'Friend', 4);
        Cocking::digVegetable("Friend");
    }
}
class My:
class My
{
    /**
     * @var Pot
     */
    private $pot;

    public function __construct($pot)
    {
        $this->pot = $pot;
    }

    public function doMyJob() {
        Cocking::cleanRice("I");
        print "I will cook: \n";
        $this->pot->cook("rice", "I", 10);
        Cocking::digRice("I");
    }

    public function playGame(Friend $friend) {
        print "play with friend \n";
    }
}
class Cocking:
<?php
class Cocking
{
    static function cleanRice($who) {
        print $who." is cleaning rice \n";
    }
    static function cleanVegetable($who) {
        print $who."is cleaning vegetable \n";
    }
    static function digRice($who) {
        print $who." is digging rice \n";
    }
    static function digVegetable($who) {
        print $who." is digging vegetable \n";
    }
}
running script:
require_once "Friend.php";
require_once "My.php";
require_once "Cocking.php";
require_once "Pot.php";
$pot = new Pot();
$friend = new Friend($pot);
$my = new My($pot);
$friend->start();
$my->doMyJob();
$friend->join();
$my->playGame($friend);
It is so weird that the output never throws the exception, which I assumed would always happen.
root@e03ed8b56f21:/app/RealLive# php index.php
Friendis cleaning vegetable
I is cleaning rice
Friend will cook:
I will cook:
pot926057642/I cooking rice time spent: 10
pot926057642/Friend cooking vegetable time spent: 4
pot926057642/Friend had flush ingredient
Friend is digging vegetable
pot926057642/I had flush ingredient
I is digging rice
play with friend
The Pot was already in use by me, but my friend could still use it to cook vegetables. Isn't that freaky?
I expected the result to be:
Friend will cook:
I will cook:
pot926057642/I cooking rice time spent: 10
PHP Fatal error: Uncaught Exception: Pot still cook rice in /app/RealLive/Pot.php:23
Stack trace:
#0 /app/RealLive/My.php(14): Pot->cook('rice', 'I', 10)
#1 /app/RealLive/index.php(12): My->doMyJob()
#2 {main}
thrown in /app/RealLive/Pot.php on line 23
PS: my environment is
PHP 7.0.10 (cli) (built: Apr 30 2019 21:14:24) ( ZTS )
Copyright (c) 1997-2016 The PHP Group
Zend Engine v3.0.0, Copyright (c) 1998-2016 Zend Technologies
Many thanks for your comments.
Your assumption seems to be that the if condition followed by the immediate member assignment always runs in one go. However, it is entirely possible that Friend runs this line of code in its thread:
if ($this->ingredient==null){
... and concludes it may go ahead, but before it reaches the next line that assigns $this->ingredient, execution switches back to the My/main thread, which also gets to this line:
if ($this->ingredient==null){
And since Friend has passed the if but not yet actually assigned the ingredient, My can now also pass inside. Whatever runs next doesn't matter: you now have both threads cooking with the pot at the same time.
Additional correction/note: it seems the example also misbehaves because $this->ingredient isn't a Volatile. Fixing that, however, would still leave it prone to the above race condition, and hence it is still a bad idea.
How to do it properly: you really need to use a mutex or a synchronized section for proper synchronization. Also, never assume threads can't switch in the middle of anything, including between any two lines, such as an if followed by a variable assignment that was meant to act as a pair.
Here is the PHP documentation on the synchronized section: https://www.php.net/manual/en/threaded.synchronized.php
Reading and writing a variable in a multithreaded application does not guarantee synchronization; you need some synchronization mechanism. Either the variable is made atomic, so that only one thread at a time can access it for reading or writing and consistency between the two threads is guaranteed, or a mutex is used to synchronize access to the shared resource (lock / trylock / unlock).
What is currently happening is that the two threads run in parallel, the ingredient variable takes arbitrary values depending on the order of execution, and when the longest sleep ends the application exits.
In the following example I used flock, which is one of the simplest mechanisms for synchronizing access between multiple processes. During testing I ran into problems, probably because the Friend constructor is not executed in the same thread as the run method of the same instance... there are a lot of factors to take into consideration; Thread in PHP seems deprecated to me, and the implementation a bit convoluted compared to languages like C.
class Friend extends Thread
{
    protected $pot;

    public function __construct($pot)
    {
        $this->pot = $pot;
    }

    function run() {
        $this->pot->cook("vegetable", 'Friend', 2);
    }
}

class Pot
{
    public $id;
    public $ingredient;

    function __construct()
    {
        $this->id = rand();
    }

    public function cook($ingredient, $who, $time)
    {
        $fp = fopen('/tmp/.flock.pot', 'r');
        if (flock($fp, LOCK_EX|LOCK_NB)) {
            if ($this->ingredient == null) {
                $this->ingredient = $ingredient;
                print "pot".$this->id.'/'.$who." cooking ".$this->ingredient." time spent: ".$time." \n";
                sleep($time);
                print "pot".$this->id.'/'.$who." had flush ingredient \n";
                $this->ingredient = null;
            }
            flock($fp, LOCK_UN);
        } else {
            // throw new Exception("Pot still cook ".$this->ingredient);
            print "ingredient busy for {$this->id}/$who\n";
        }
        fclose($fp);
    }
}

class My
{
    private $pot;

    public function __construct($pot)
    {
        $this->pot = $pot;
    }

    public function run() {
        $this->pot->cook("rice", "I", 3);
    }
}
touch('/tmp/.flock.pot');
$pot = new Pot();
$friend = new Friend($pot);
$my = new My($pot);
$friend->start();
sleep(1); // try comment me
$my->run();
$friend->join();
unlink('/tmp/.flock.pot');
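To see the flock behaviour used above in isolation (an illustrative aside, not part of the original answer): a non-blocking exclusive lock request on an already-locked file simply returns false, which is exactly the "ingredient busy" branch. A minimal single-process sketch using two handles to a temporary file:

```php
<?php
$path = sys_get_temp_dir() . '/.flock.demo';
touch($path);

$a = fopen($path, 'r');
$b = fopen($path, 'r');

/* first handle grabs the exclusive lock */
var_dump(flock($a, LOCK_EX | LOCK_NB)); /* bool(true) */

/* second handle cannot: LOCK_NB makes it fail immediately instead of blocking */
var_dump(flock($b, LOCK_EX | LOCK_NB)); /* bool(false) */

/* once released, the second handle succeeds */
flock($a, LOCK_UN);
var_dump(flock($b, LOCK_EX | LOCK_NB)); /* bool(true) */

fclose($a);
fclose($b);
unlink($path);
```

Without LOCK_NB the second flock call would block until the first handle releases the lock, which is the other common way to use it.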
Each thread of the program has its own memory. In this example the Pot is stored in main memory, and when one of the threads reads and changes it, the change is not reflected back to main memory, so the other threads cannot see it.
So we should make Pot extend Volatile so that changes are reflected back to main memory.
Or make the block synchronized:
if ($this->ingredient == null)
    $this->ingredient = $ingredient;
My web app requires making 7 different SOAP (WSDL) API requests to complete one task (the user has to wait for the results of all the requests). The average response time is 500 ms to 1.7 s per request, so I need to run all these requests in parallel to speed up the process.
What's the best way to do that:
pthreads or
Gearman workers
fork process
curl multi (I would have to build the XML SOAP body myself)
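For reference, and purely as an illustrative aside: the curl multi option from the list above can be sketched roughly like this (fetch_all and the plain-GET usage are placeholders invented for this sketch; for SOAP you would POST your hand-built XML envelope instead):

```php
<?php
/* fetch several URLs in parallel with the curl multi interface */
function fetch_all(array $urls) {
    $mh = curl_multi_init();
    $handles = [];

    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    /* drive all transfers until none are active */
    do {
        $status = curl_multi_exec($mh, $active);
        if ($active) {
            curl_multi_select($mh); /* wait for activity instead of busy-looping */
        }
    } while ($active && $status == CURLM_OK);

    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);

    return $results;
}
```

Each handle would carry your SOAP headers and XML body via CURLOPT_POSTFIELDS; total wall time is then roughly the slowest single request rather than the sum of all seven.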
Well, the first thing to say is that it's never really a good idea to create threads in direct response to a web request; think about how far that will actually scale.
If you create 7 threads for every request that comes along and 100 people turn up, you'll be asking your hardware to execute 700 threads concurrently, which is quite a lot to ask of anything really...
However, scalability is not something I can usefully help you with, so I'll just answer the question.
<?php
/* the first service I could find that worked without authorization */
define("WSDL", "http://www.webservicex.net/uklocation.asmx?WSDL");
class CountyData {
    /* this works around SimpleXMLElement not being safe to share between threads */
    public function __construct(SimpleXMLElement $element) {
        $this->town = (string)$element->Town;
        $this->code = (string)$element->PostCode;
    }

    public function run(){}

    protected $town;
    protected $code;
}

class GetCountyData extends Thread {
    public function __construct($county) {
        $this->county = $county;
    }

    public function run() {
        $soap = new SoapClient(WSDL);
        $result = $soap->getUkLocationByCounty(array(
            "County" => $this->county
        ));

        foreach (simplexml_load_string(
                     $result->GetUKLocationByCountyResult) as $element) {
            $this[] = new CountyData($element);
        }
    }

    protected $county;
}
$threads = [];
$thread = 0;
$threaded = true; # change to false to test without threading

$counties = [ # will create as many threads as there are counties
    "Buckinghamshire",
    "Berkshire",
    "Yorkshire",
    "London",
    "Kent",
    "Sussex",
    "Essex"
];

while ($thread < count($counties)) {
    $threads[$thread] =
        new GetCountyData($counties[$thread]);
    if ($threaded) {
        $threads[$thread]->start();
    } else $threads[$thread]->run();
    $thread++;
}

if ($threaded)
    foreach ($threads as $thread)
        $thread->join();

foreach ($threads as $county => $data) {
    printf(
        "Data for %s %d\n", $counties[$county], count($data));
}
?>
Note that the SoapClient instance is not, and cannot be, shared; this may well slow you down. You might want to enable caching of WSDLs...
I have to analyze a lot of information.
To speed things up, I'll be running multiple instances of the same script at the same moment.
However, there is a big chance the scripts will analyze the same piece of information (a duplicate), which I'd like to avoid as it slows down the process.
When running only one instance I solve this problem with an array (I record what has already been analyzed).
So my question is: how could I sync that array with the other "threads"?
MySQL is an option, but I guess it would be overkill?
I also read about memory sharing, but I'm not sure that's the solution I'm looking for.
So if anyone has some suggestions, let me know.
Regards
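One simple cross-process variant worth noting (an illustrative aside; claimItem and the line-per-item file format are invented for this sketch): keep the "already analyzed" set in a file, and have each script instance take an exclusive lock while it checks and records an item, so the check and the write are atomic across processes:

```php
<?php
/* atomically claim an item across processes; returns true only for the first claimer */
function claimItem($setFile, $item) {
    $fp = fopen($setFile, 'c+'); /* create if missing, don't truncate */
    flock($fp, LOCK_EX);         /* block until we own the set */

    $seen = array_filter(explode("\n", stream_get_contents($fp)));
    $claimed = !in_array($item, $seen, true);
    if ($claimed) {
        fwrite($fp, $item . "\n"); /* record it before anyone else can check */
    }

    flock($fp, LOCK_UN);
    fclose($fp);
    return $claimed;
}

$set = sys_get_temp_dir() . '/analyzed.set';
@unlink($set);

var_dump(claimItem($set, 'page-1')); /* bool(true)  - first instance gets it */
var_dump(claimItem($set, 'page-1')); /* bool(false) - duplicate is refused */
var_dump(claimItem($set, 'page-2')); /* bool(true) */
```

A linear scan of the file won't scale to millions of items, but for a modest set it avoids both MySQL and shared-memory extensions entirely.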
This is a trivial task using real multi-threading:
<?php
/* we want logs to be readable so we are creating a mutex for output */
define("LOG", Mutex::create());

/* basically a thread-safe printf */
function slog($message, $format = null) {
    $format = func_get_args();
    if ($format) {
        $message = array_shift($format);
        if ($message) {
            Mutex::lock(LOG);
            echo vsprintf(
                $message, $format);
            Mutex::unlock(LOG);
        }
    }
}
/* any pthreads descendant would do */
class S extends Stackable {
    public function run(){}
}

/* a thread that manipulates the shared data until it's all gone */
class T extends Thread {
    public function __construct($shared) {
        $this->shared = $shared;
    }

    public function run() {
        /* you could also use ::chunk if you wanted to bite off a bit more work */
        while (($next = $this->shared->shift())) {
            slog(
                "%lu working with item #%d\n", $this->getThreadId(), $next);
        }
    }
}

$shared = new S();

/* fill with dummy data */
while (@$o++ < 10000) {
    $shared[] = $o;
}

/* start some threads */
$threads = array();
while (@$thread++ < 5) {
    $threads[$thread] = new T($shared);
    $threads[$thread]->start();
}

/* join all threads */
foreach ($threads as $thread)
    $thread->join();

/* important: ::destroy what you ::create */
Mutex::destroy(LOG);
?>
The slog() function isn't strictly required for your use case, but I thought it useful to show an executable example with readable output.
The main gist of it is that multiple threads need only a reference to a common set of data in order to manipulate that data...
I've got around 25000 files, varying between 5 MB and 200 MB, scattered around many folders on 2 external hard drives. I need to find out which of these are duplicates, leaving only the unique files on the drives.
Currently I'm running md5_file() over each source file and comparing the hashes to see if the same file has been found before. The issue is that md5_file() can easily take more than 10 seconds to execute, and I've seen it take up to a minute for some files. If I let this script run in its current form, the process will take more than a week to finish.
Note that I'm saving each hash after it has been made, so I don't have to re-hash each file on each run. The thing is that all these files are yet to be hashed.
I'm wondering what I could do to speed this up. I need to finish this in less than 5 days, so a script that takes more than a week is no option. I was thinking multithreading (using pthreads) could be a solution, but as the drives are so slow and my CPU is not the issue, I don't think that would help. What else could I do?
As you guessed, it's hard to tell whether you will see any gains from threading...
However, I decided to write a nice pthreads example based on your idea; I think it illustrates well the things you should do while threading...
Your mileage will vary, but here's the example all the same:
<?php
/* create a mutex for readable logging output */
define("LOG", Mutex::create());

/* log a message to stdout, use as thread-safe printf */
function out($message, $format = null) {
    $format = func_get_args();
    if ($format) {
        $message = array_shift(
            $format);
        Mutex::lock(LOG);
        echo vsprintf(
            $message, $format
        );
        Mutex::unlock(LOG);
    }
}

/*
    Sums is a collection of sum => file shared among workers
*/
class Sums extends Stackable {
    public function run(){}
}

/* Worker to execute sum tasks */
class CheckWorker extends Worker {
    public function run() {}
}

/*
    The simplest version of a job that calculates the checksum of a file
*/
class Check extends Stackable {
    /* all properties are public */
    public $file;
    public $sum;

    /* accept a file and Sums collection */
    public function __construct($file, Sums &$sums) {
        $this->file = $file;
        $this->sums = $sums;
    }

    public function run(){
        out(
            "checking: %s\n", $this->file);

        /* calculate checksum */
        $sum = md5_file($this->file);

        /* check for sum in list */
        if (isset($this->sums[$sum])) {
            /* deal with duplicate */
            out(
                "duplicate file found: %s, duplicate of %s\n", $this->file, $this->sums[$sum]);
        } else {
            /* set sum in shared list */
            $this->sums[$sum] = $this->file;

            /* output some info ... */
            out(
                "unique file found: %s, sum (%s)\n", $this->file, $sum);
        }
    }
}

/* start a timer */
$start = microtime(true);

/* checksum collection, shared across all threads */
$sums = new Sums();

/* create a suitable amount of worker threads */
$workers = array();
$checks = array();
$worker = 0;

/* how many worker threads you have depends on your hardware */
while (count($workers) < 16) {
    $workers[$worker] = new CheckWorker();
    $workers[$worker]->start();
    $worker++;
}

/* scan path given on command line for files */
foreach (scandir($argv[1]) as $id => $path) {
    /* @TODO(u) write code to recursively scan a path */
    $path = sprintf(
        "%s/%s",
        $argv[1], $path
    );

    /* create a job to calculate the checksum of a file */
    if (!is_dir($path)) {
        $checks[$id] = new Check(
            $path, $sums);

        /* @TODO(u) write code to stack to an appropriate worker */
        $workers[array_rand($workers)]->stack($checks[$id]);
    }
}

/* join threads */
foreach ($workers as $worker) {
    $worker->shutdown();
}

/* output some info */
out("complete in %.3f seconds\n", microtime(true) - $start);

/* destroy logging mutex */
Mutex::destroy(LOG);
?>
Play around with it; see how different numbers of workers affect runtime, and implement your own logic to delete files and scan directories (this is basic stuff you should know already, left out to keep the example simple)...
You could try to find possible duplicates by looking only at file size first. Only when multiple files share the same size do you need to hash them. This is probably much faster, since looking up file sizes takes almost no effort.
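That idea can be sketched as follows (findDuplicates is a name invented for this example):

```php
<?php
/* group files by size, then hash only the groups with more than one member */
function findDuplicates(array $paths) {
    $bySize = [];
    foreach ($paths as $path) {
        $bySize[filesize($path)][] = $path;
    }

    $duplicates = [];
    foreach ($bySize as $group) {
        if (count($group) < 2) {
            continue; /* unique size => unique content, no hash needed */
        }
        $byHash = [];
        foreach ($group as $path) {
            $byHash[md5_file($path)][] = $path; /* hash only the candidates */
        }
        foreach ($byHash as $files) {
            if (count($files) > 1) {
                $duplicates[] = $files; /* each entry is one set of identical files */
            }
        }
    }
    return $duplicates;
}
```

On a typical collection most sizes are unique, so the expensive md5_file() calls drop to a small fraction of the file count.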
I'm doing some broad performance investigation in an application I maintain and I've set up a simple solution to track the execution time of requests, but I'm unable to find information to verify if this is going to be satisfyingly accurate.
This appears to have netted some good information and I've already eliminated some performance issues as a result, but I'm also seeing some confusing entries that make me question the accuracy of the recorded execution time.
Should I add an explicit method call at the end of each calling script to mark the end of its execution or is this (rather tidy) approach using the destructor good enough?
The calling code at the top of the requested script:
if( file_exists('../../../bench') )
{
    require('../../../includes/classes/Bench.php');
    $Bench = new Bench;
}
And here is the class definition (reduced for clarity):
require_once('Class.php');

class Bench extends Class
{
    protected $page_log = 'bench-pages.log';
    protected $page_time_end;
    protected $page_time_start;

    public function __construct()
    {
        $this->set_page_time_start();
    }

    public function __destruct()
    {
        $this->set_page_time_end();
        $this->add_page_record();
    }

    public function add_page_record()
    {
        $line = ($this->page_time_end - $this->page_time_start).','.
                base64_encode(serialize($_SERVER)).','.
                base64_encode(serialize($_GET)).','.
                base64_encode(serialize($_POST)).','.
                base64_encode(serialize($_SESSION))."\n";

        $fh = fopen(APP_ROOT . '/' . $this->page_log, 'a');
        fwrite($fh, $line);
        fclose($fh);
    }

    public function set_page_time_end()
    {
        $this->page_time_end = microtime(true);
    }

    public function set_page_time_start()
    {
        $this->page_time_start = microtime(true);
    }
}
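As an aside on the log write itself (an illustrative sketch, not part of the original class; append_record is invented here): under concurrent requests, separate fopen/fwrite calls can interleave partial lines, whereas file_put_contents with FILE_APPEND | LOCK_EX appends the whole record under an exclusive lock in a single call:

```php
<?php
/* append one complete log record atomically with respect to other writers */
function append_record($logFile, array $fields) {
    $line = implode(',', $fields) . "\n";
    /* LOCK_EX takes an advisory lock for the duration of the write */
    return file_put_contents($logFile, $line, FILE_APPEND | LOCK_EX);
}

$log = sys_get_temp_dir() . '/bench-pages.log';
append_record($log, ['0.0123', base64_encode(serialize(['REQUEST_URI' => '/']))]);
```

This keeps each CSV line intact even when many requests finish at the same moment.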
What you need to do is use Xdebug. It is very simple once you set it up, and you don't have to change anything about your code to get full coverage of what took how long.
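For reference, a minimal profiling setup (assuming Xdebug 3; the directive names differ in Xdebug 2) might look like:

```ini
; php.ini - Xdebug 3 profiler, writes cachegrind files you can open in KCachegrind/QCacheGrind
xdebug.mode = profile
xdebug.output_dir = /tmp
xdebug.start_with_request = trigger ; profile only when the XDEBUG_TRIGGER is present
```

With trigger mode you add e.g. ?XDEBUG_TRIGGER=1 to a request to profile just that one page instead of every hit.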