Imagine that you have interfaces which describe the data access layer of your application. You haven't decided yet what kind of storage mechanism you want to use; you just want to make sure that, whatever you choose, it handles concurrent requests well. To verify that, you have to write concurrency tests against those interfaces.
I think a schematic concurrency test should be something like this:
public function testMoneyIsNotLostByConcurrentTransfers()
{
    $accountRepository = DataAccessLayer::getBankAccountRepository();
    $accountOfTom = $accountRepository->create(array(
        'owner' => 'Tom',
        'balance' => new Money(10000)
    ));
    $accountOfBob = $accountRepository->create(array(
        'owner' => 'Bob',
        'balance' => new Money(10000)
    ));
    $accountOfSusanne = $accountRepository->create(array(
        'owner' => 'Susanne',
        'balance' => new Money(10000)
    ));
    $this->concurrentExecution(
        function () use ($accountOfTom, $accountOfBob) {
            $accountOfTom->transfer($accountOfBob, new Money(5000));
        },
        function () use ($accountOfTom, $accountOfSusanne) {
            $accountOfSusanne->transfer($accountOfTom, new Money(5000));
        }
    );
    // PHPUnit expects the expected value first, then the actual one.
    $this->assertEquals(10000, $accountOfTom->getBalanceAmount());
    $this->assertEquals(15000, $accountOfBob->getBalanceAmount());
    $this->assertEquals(5000, $accountOfSusanne->getBalanceAmount());
}
Is it possible to write such tests, or such a test runner, in PHP? Or is there any existing tool that can help with concurrency testing in PHP?
I could not find any test runner for such concurrency tests. I found only paratest, which can run independent tests, such as unit tests, in parallel.
According to PHP - parallel task runner, the best option I think is using pthreads with debug_backtrace. I think it will be hard even with that. I am looking forward to the installation problems, thread safety issues, resource sharing difficulties, backtrace bugs, etc. I will have a great time, I am sure... :S
I found async calls in the pthreads examples.
If I ever manage to solve this, I will share it on github and add a link here. Until then...
Update
I just realized that I don't need multi-threaded or multi-process applications to test concurrency. For example, I can start two transactions with two database connections from the same PHP file. What I need is to add event triggering for the statements the DB driver executes, so I can add breakpoints and wait for the other task wherever I want. File locking works just the same way... So coroutines, or some hand-made multi-tasking with statement logging, are enough.
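To illustrate, here is a minimal sketch of that generator-based approach (the scheduler, its name, and the yield placement are my own illustration, not an existing API; the account objects are the ones from the test above):

// Minimal cooperative scheduler: each task is a generator that yields
// wherever a "breakpoint" should be; the scheduler alternates between
// tasks so their statements interleave deterministically.
function runInterleaved(array $tasks) {
    while ($tasks) {
        foreach ($tasks as $i => $task) {
            if ($task->valid()) {
                $task->next(); // resume this task until its next yield
            } else {
                unset($tasks[$i]); // this task has finished
            }
        }
    }
}

$taskA = function () use ($accountOfTom, $accountOfBob) {
    yield; // breakpoint: give the other task a chance to start first
    $accountOfTom->transfer($accountOfBob, new Money(5000));
};
$taskB = function () use ($accountOfTom, $accountOfSusanne) {
    yield;
    $accountOfSusanne->transfer($accountOfTom, new Money(5000));
};

runInterleaved(array($taskA(), $taskB()));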
Concurrency should be built into your saving mechanism, not the execution layer.
For example, if you are using SQL, instead of setting the variable to a value computed in PHP, use the equivalent of += and -= so the database applies the change atomically (see the sketch below).
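For instance, with PDO (a sketch; the accounts table and its columns are made up for illustration), each relative UPDATE is applied atomically by the database, so two concurrent transfers cannot both read a stale balance and write it back:

// Move money with relative updates instead of read-modify-write in PHP.
$pdo->beginTransaction();
$debit = $pdo->prepare('UPDATE accounts SET balance = balance - :amount WHERE id = :id');
$debit->execute(array('amount' => 5000, 'id' => $fromAccountId));
$credit = $pdo->prepare('UPDATE accounts SET balance = balance + :amount WHERE id = :id');
$credit->execute(array('amount' => 5000, 'id' => $toAccountId));
$pdo->commit();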
Related
I have a php application that gets requests for part numbers from our server. At that moment, we reach out to a third party API to gather pricing information to make sure we have the latest pricing for that particular request. Sometimes the third party API is slow or it might be down, so we have a database that stores the latest pricing requests for each particular part number that we can use as a fallback. I'd like to run the request to the third party API and the database in parallel using Gearman. Here is the idea:
Receive request
Through gearman, create two jobs:
Request to third party API
MySQL database lookup
Wait in a loop and return the results based on the following conditions:
If the third party API has completed, return that result immediately
If a set amount of time has elapsed (e.g. 2 seconds) and the third party API hasn't responded, return the MySQL lookup data
Using gearman, my thoughts were to either run the two tasks in the foreground and break out of runTasks() within the setCompleteCallback() call, or to run them in the background and poll the two tasks in a separate loop using jobStatus().
Unfortunately, I can't get either route to work for me while still getting access to the resulting data. Is there a better way, or are there existing examples of how someone has made this work?
I think you've described a single blocking problem, namely the result of a 3rd-party API lookup. There are two ways you could handle this, from my point of view: either abort the attempt altogether once you decide you've run out of time, or report back to the client that you ran out of time but continue the lookup anyway, just to update your local cache in case the API responds slower than you would like. I'll describe how I would go about the former, because it is easier.
From the client side:
$request = array(
    'productId' => 5,
);
$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);
$results = json_decode($client->doNormal('apiPriceLookup', json_encode($request)));
// property_exists() takes the object and the property name as two arguments.
if ($results && property_exists($results, 'success') && $results->success) {
    // The API lookup succeeded: use the fresh data it returned.
} else {
    // The lookup failed or timed out: fall back to the local MySQL data.
}
This will create a job on the job server with a function name of 'apiPriceLookup' and pass it workload data containing a product id of 5. It will wait for the results to come back and check for a success property. If it exists and is true, the API lookup was successful.
The idea is then to set the timeout condition in the worker task, which depends entirely on how you're implementing the API lookup. If you're using cURL (or some wrapper around cURL), you can see the answer to how to detect a timeout here.
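For illustration, a sketch of what that detection could look like with plain cURL (the URL and the 2-second threshold are made up; error code 28 is cURL's "operation timed out"):

$ch = curl_init('https://api.example.com/price/' . $productId);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 2); // give the API at most 2 seconds
$response = curl_exec($ch);
$timedOut = ($response === false && curl_errno($ch) === 28); // CURLE_OPERATION_TIMEDOUT
curl_close($ch);
if ($timedOut) {
    throw new Exception('API lookup timed out'); // caught by apiPriceLookup()
}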
From the worker side:
$worker = new GearmanWorker();
$worker->addServer();
$worker->addFunction('apiPriceLookup', 'apiPriceLookup');
while ($worker->work());
function apiPriceLookup($job) {
    $payload = json_decode($job->workload());
    try {
        $results = [
            'data' => PerformApiLookupForProductId($payload->productId),
            'success' => true,
        ];
    } catch (Exception $e) {
        $results = ['success' => false];
    }
    return json_encode($results);
}
This just creates a GearmanWorker object and subscribes it to the function apiPriceLookup. The worker will call apiPriceLookup whenever a client submits a task to the job server. That function calls out to another function, PerformApiLookupForProductId, which should be written so as to throw an exception whenever a timeout condition occurs.
I don't think this would be considered using exceptions to control logic flow; timeout conditions generally are (or should be) exceptional events. For instance, Guzzle throws a GuzzleHttp\Exception\RequestException when it has decided to time out.
I've been working on converting an application of mine from CodeIgniter to Phalcon. I've noticed that [query heavy] requests that only took a maximum of 3 or 4 seconds using CI are taking up to 30 seconds to complete using Phalcon!
I've spent days trying to find a solution. I've tried using all the different means of access offered by the framework including submitting raw query strings directly to Phalcon's MySql PDO adapter.
I'm adding my database connection to the service container exactly like it is shown in Phalcon's INVO tutorial:
$di->set('db', function() use ($config) {
    return new \Phalcon\Db\Adapter\Pdo\Mysql(array(
        "host"     => $config->database->host,
        "username" => $config->database->username,
        "password" => $config->database->password,
        "dbname"   => $config->database->name
    ));
});
Using webgrind output I've been able to narrow the bottleneck down to the constructor in Phalcon's PDO adapter class (cost is in milliseconds; the webgrind screenshot is not reproduced here).
I've already profiled and manually tested the relevant SQL to make sure the bottleneck isn't in the database (or my poorly constructed SQL!)
I've discovered the problem, which to me wasn't immediately apparent, so hopefully others will find this useful as well.
Every time a new query was started, the application was getting a new instance of the database adapter. The request which produced the webgrind output above had a total of 20 queries.
While re-reading Phalcon's documentation section on dependency injection, I saw that services can optionally be added to the service container as "shared" services, which effectively forces the object to act as a singleton: once one instance of the class is created, the application simply passes that instance to any subsequent request for the service instead of creating a new one.
There are several methods to force a service to be added as a shared service, details of which can be found here in Phalcon's Documentation:
http://docs.phalconphp.com/en/latest/reference/di.html#shared-services
Changing the code posted above to be added as a shared service looks like this:
$di->setShared('db', function() use ($config) {
    return new \Phalcon\Db\Adapter\Pdo\Mysql(array(
        "host"     => $config->database->host,
        "username" => $config->database->username,
        "password" => $config->database->password,
        "dbname"   => $config->database->name
    ));
});
Here's what the webgrind output looks like for the same query referenced above, but after setting the database service to be shared (cost in milliseconds; screenshot likewise not reproduced).
Notice that the invocation count is now 1 instead of 20, and the invocation cost dropped from 20 seconds down to 1 second!
I hope someone else finds this useful!
In most examples, services are in fact shared, though not in the most apparent way; they are registered via:
$di->set('service', …, true);
The last boolean argument passed to set() makes the service shared, and in 99.9% of cases you'd want your DI services to be that way; otherwise things similar to what @the-notable described would happen, but because they would likely be less "impactful", they would be hard to trace down.
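For instance, the 'db' service from above could equally be registered as shared this way (same effect as setShared() for sharing purposes):

$di->set('db', function() use ($config) {
    return new \Phalcon\Db\Adapter\Pdo\Mysql(array(
        "host"     => $config->database->host,
        "username" => $config->database->username,
        "password" => $config->database->password,
        "dbname"   => $config->database->name
    ));
}, true); // the third argument marks the service as shared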
Despite its inadvisability, using PHP's shell commands to interact with non-php system commands remains a common way of quickly achieving certain results in web applications.
Has anyone abstracted the common use cases into a class library (something in Zend, maybe?) that offers a more sane/common way of handling this? Every time I encounter (or have to produce) this kind of code, it's a bunch of procedural spaghetti, copy-pasted over and over again. I was wondering if (hoping that) the PHP community had come up with a better way of using command line applications in web/PHP applications.
Executing commandline applications is nothing dirty. In fact, it's the Unix way, and mostly it's saner than trying to reimplement e.g. ImageMagick in pure PHP code. (Due to the disparity of its cmdline args, ImageMagick is a bad example case if you are looking for a nice exec() abstraction.)
There isn't much wrapping up you can do. At best you can encapsulate the in-/output handling for your external binary in a method:
function exec($args) {
    // escapeshellarg() quotes each argument individually;
    // escapeshellcmd() is intended for whole command strings.
    $args = implode(" ", array_map("escapeshellarg", func_get_args()));
    $opts = $this->opts();
    return `{$this->bin} {$args} {$opts}`;
}
So you just call ->exec("-o", "$file") where needed. Your code can only be generalized further with specialized exec submethods if the particular cmdline app has an inherent system in its --argument naming scheme.
Depending on your actual use case, you might be able to stash a few standard options away. I did this for pspell, where you have an almost 1:1 relationship of option names to --cmdline=args:
function opts() {
    $map = array(
        "--ignore"           => $this->ignore,
        "--verbose"          => $this->verbose,
        "--dir={$this->dir}" => isset($this->dir),
    );
    // Keep only the options whose condition is truthy.
    return implode(" ", array_keys(array_filter($map)));
}
A very generic abstraction class for exec/popen (for a wide range of cmdline programs) probably doesn't exist.
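For what it's worth, here is a minimal sketch of such a thin wrapper built around the exec()/opts() idea above (the class name and option handling are invented for illustration):

class CommandWrapper
{
    private $bin;
    private $opts = array();

    public function __construct($bin)
    {
        $this->bin = escapeshellcmd($bin); // sanitize the binary itself
    }

    public function setOption($name, $value = null)
    {
        // Collect options such as --lang=en; values are quoted individually.
        $this->opts[] = $value === null ? $name : $name . '=' . escapeshellarg($value);
        return $this;
    }

    public function run()
    {
        $args = implode(' ', array_map('escapeshellarg', func_get_args()));
        $opts = implode(' ', $this->opts);
        return shell_exec("{$this->bin} {$opts} {$args}");
    }
}

// Usage:
$aspell = new CommandWrapper('aspell');
echo $aspell->setOption('--lang', 'en')->run('dump', 'dicts');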
I'm having problems with a batch insertion of objects into a database using symfony 1.4 and doctrine 1.2.
My model has a certain kind of object called "Sector", each of which has several objects of type "Cupo" (usually ranging from 50 up to 200000). These objects are pretty small; just a short identifier string and one or two integers. Whenever a group of Sectors are created by the user, I need to automatically add all these instances of "Cupo" to the database. In case anything goes wrong, I'm using a doctrine transaction to roll back everything. The problem is that I can only create around 2000 instances before php runs out of memory. It currently has a 128MB limit, which should be more than enough for handling objects that use less than 100 bytes. I've tried increasing the memory limit up to 512MB, but php still crashes and that doesn't solve the problem. Am I doing the batch insertion correctly or is there a better way?
Here's the error:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 71 bytes) in /Users/yo/Sites/grifoo/lib/vendor/symfony/lib/log/sfVarLogger.class.php on line 170
And here's the code:
public function save($conn = null)
{
    $conn = $conn ? $conn : Doctrine_Manager::connection();
    $conn->beginTransaction();
    try {
        $evento = $this->object;
        foreach ($evento->getSectores() as $s) {
            for ($j = 0; $j < $s->getCapacity(); $j++) {
                $cupo = new Cupo();
                $cupo->setActivo($s->getActivo());
                $cupo->setEventoId($s->getEventoId());
                $cupo->setNombre($j);
                $cupo->setSector($s);
                $cupo->save();
            }
        }
        $conn->commit();
        return;
    } catch (Exception $e) {
        $conn->rollback();
        throw $e;
    }
}
Once again, this code works fine for less than 1000 objects, but anything bigger than 1500 fails. Thanks for the help.
Tried doing
$cupo->save();
$cupo->free();
$cupo = null;
(but substituting my own code), and I'm still getting memory overflows. Any other ideas, SO?
Update:
I created a new environment in my databases.yml, that looks like:
all:
  doctrine:
    class: sfDoctrineDatabase
    param:
      dsn: 'mysql:host=localhost;dbname=.......'
      username: .....
      password: .....
      profiler: false
The profiler: false entry disables Doctrine's query logging, which normally keeps a copy of every query you make. It didn't stop the memory leakage, but I was able to get about twice as far through my data import as I could without it.
Update 2
I added
Doctrine_Manager::connection()->setAttribute(Doctrine_Core::ATTR_AUTO_FREE_QUERY_OBJECTS, true );
before running my queries, and changed
$cupo = null;
to
unset($cupo);
And now my script has been churning away happily. I'm pretty sure it will finish without running out of RAM this time.
Update 3
Yup. That's the winning combo.
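For reference, here is the loop from the question with the whole combination folded in (profiler: false set in databases.yml, plus the connection attribute, free(), and unset()):

$conn = Doctrine_Manager::connection();
$conn->setAttribute(Doctrine_Core::ATTR_AUTO_FREE_QUERY_OBJECTS, true);
foreach ($evento->getSectores() as $s) {
    for ($j = 0; $j < $s->getCapacity(); $j++) {
        $cupo = new Cupo();
        $cupo->setActivo($s->getActivo());
        $cupo->setEventoId($s->getEventoId());
        $cupo->setNombre($j);
        $cupo->setSector($s);
        $cupo->save();
        $cupo->free();  // release Doctrine's internal references
        unset($cupo);   // drop our own reference
    }
}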
I have just written a "daemonized" script with symfony 1.4, and setting the following stopped the memory hogging:
sfConfig::set('sf_debug', false);
For a symfony task, I also faced this issue and did the following things. It worked for me.
Disable debug mode. Add the following before the DB connection is initialized:
sfConfig::set('sf_debug', false);
Set the auto-free-query-objects attribute on the DB connection:
$connection->setAttribute(Doctrine_Core::ATTR_AUTO_FREE_QUERY_OBJECTS, true );
Free all objects after use:
$object_name->free()
Unset all arrays after use: unset($array_name)
Check all Doctrine queries used in the task and free every query after use: $q->free()
(This is good practice whenever you run queries.)
That's all. Hope it helps someone.
Doctrine leaks and there's not much you can do about it. Make sure you use $q->free() whenever applicable to minimize the effect.
Doctrine is not meant for maintenance scripts. The only way to work around this problem is to break your script into parts that each perform part of the task. One way to do that is to add a start parameter to your script; after a certain number of objects have been processed, the script redirects to itself with a higher start value (see the sketch below). This works well for me, although it makes writing maintenance scripts more cumbersome.
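A sketch of that idea (the chunk size, the query, and the self-redirect are illustrative):

// Process a fixed-size chunk per run, then hand off to a fresh process.
$start = isset($_GET['start']) ? (int) $_GET['start'] : 0;
$chunkSize = 1000;

$records = Doctrine_Query::create()
    ->from('Cupo c')
    ->offset($start)
    ->limit($chunkSize)
    ->execute();
$processed = count($records);

foreach ($records as $record) {
    // ... do the maintenance work on $record ...
    $record->free();
}
$records->free(true); // free the collection and its records

if ($processed === $chunkSize) {
    // More rows may remain: restart ourselves with a higher start value.
    header('Location: ' . $_SERVER['PHP_SELF'] . '?start=' . ($start + $chunkSize));
    exit;
}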
Try unset($cupo); after every save. That should help. Another option is to split the script and do some batch processing.
Try to break the circular references, which usually cause memory leaks, with
$cupo->save();
$cupo->free(); // this call breaks the circular reference
as described in the Doctrine manual.
For me, I just initialized the task like this:
// initialize the database connection
$databaseManager = new sfDatabaseManager($this->configuration);
$connection = $databaseManager->getDatabase($options['connection'])->getConnection();
$config = ProjectConfiguration::getApplicationConfiguration('frontend', 'prod', true);
sfContext::createInstance($config);
(WITH PROD CONFIG)
and used free() after save() on Doctrine objects.
The memory stays stable at 25 MB:
memory_get_usage = 26.884071350098 MB
with PHP 5.3 on Debian Squeeze.
Periodically close and re-open the connection. Not sure why, but it seems PDO is retaining references.
What is working for me is calling the free method like this:
$cupo->save();
$cupo->free(true); // free also the related components
unset($cupo);
My latest idea for handling settings across the PHP project I am building was to store all my settings in a config PHP file; the file will just return an array, like this...
<?php
/**
 * @Filename    app-config.php
 * @description Array to return to our config class
 */
return array(
    'db_host'     => 'localhost',
    'db_name'     => 'socialnetwork',
    'db_user'     => 'root',
    'db_password' => '',
    'site_path'   => 'C:/webserver/htdocs/project/',
    'site_url'    => 'http://localhost/project/',
    'image_path'  => 'C:/webserver/htdocs/fproject/images/',
    'image_url'   => 'http://localhost/project/images/',
    'site_name'   => 'test site',
    'admin_id'    => 1,
    'UTC_TIME'    => gmdate('U', time()),
    'ip'          => $_SERVER['REMOTE_ADDR'],
    'testtttt'    => array(
        'testtttt' => false
    )
);
?>
Please note the actual config array is MUCH MUCH larger, many more items in it...
Then I would have a Config.class.php file that loads my array file and uses the magic method __get($key). I can then autoload my config class and access any site setting like this...
$config->ip;
$config->db_host;
$config->db_name;
$config->db_user;
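For context, a minimal version of such a class could look like this (a sketch based on the description above, not the asker's actual code):

class Config
{
    private $settings;

    public function __construct($file)
    {
        $this->settings = require $file; // the file returns an array, as shown above
    }

    public function __get($key)
    {
        return isset($this->settings[$key]) ? $this->settings[$key] : null;
    }
}

$config = new Config('app-config.php');
echo $config->db_host; // 'localhost'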
So I realize this works great and is very flexible: in my class I can read in a PHP file with an array, as I am doing now, or read an INI, XML, or JSON file into an array. So it is very flexible for future projects, but I am more concerned about performance for this particular project. It will be a social network site like Facebook/MySpace; I ran one before this project, and once it reached around 100,000 users, performance became very important. So I am not "micro-optimizing" or "premature optimizing"; I am strictly looking to do this the best way with performance in mind. It does not need to be flexible, as I will only need it on this project.
With that information in mind, I always read about people trying to eliminate function calls as much as possible, saying function calls cause more overhead, so I want to know what more experienced people think about this. I am new to using classes and objects in PHP: is calling $config->db_user; as costly as a procedural function call like getOption('db_user');? I am guessing it is, since every time I read a setting the __get() method runs.
So for best performance should I go about this a different way? Like just loading my config array into a bootstrap file and accessing items when I need them like this...
$config['db_host'];
$config['db_username'];
$config['db_password'];
$config['ip'];
Please give me your thoughts on this without me having to do a bunch of benchmark tests.
From tests I've seen, I believe Alix Axel's response above is correct with respect to the relative speed of the four methods: using a direct method is fastest, and using any sort of magic method is usually slower.
Also, in terms of optimization: the biggest performance hit for any single request in the system you describe will probably be the parsing of the XML/INI/JSON, rather than accessing it via whichever syntax you decide to go with. If you want to fix this, store the loaded data in APC once you parse it, with one caveat: only store static data in it, not dynamic things like the UTC date.
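A sketch of that APC approach (the cache key is made up, and this assumes the config file holds only static settings):

$settings = apc_fetch('app_config');
if ($settings === false) {
    // Cache miss: parse/load once, then share across subsequent requests.
    $settings = require 'app-config.php';
    apc_store('app_config', $settings);
}
// Dynamic values are computed per request, never cached:
$settings['UTC_TIME'] = gmdate('U', time());
$settings['ip'] = $_SERVER['REMOTE_ADDR'];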
Firstly, instead of an included file that returns an array, I would use an .ini file and PHP's parse_ini_file() to load the settings.
Secondly, you shouldn't worry about function calls in this case. Why? Because you might have 100,000 users but if all 100,000 execute a script and need some config values then your 100,000 function calls are distributed over 100,000 scripts, which will be completely irrelevant as far as performance goes.
Function calls are only an issue if a single script execution, for example, executes 100,000 of them.
So pick whichever is the most natural implementation. Either an object or an array will work equally well. Actually an object has an advantage in that you can do:
$db = $config->database->hostname;
where $config->database can implicitly load just the database section of the INI file and create another config object that returns the hostname entry, if you want to segment your config file this way.
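A sketch of that segmented idea with parse_ini_file() (class and file names invented for illustration):

// config.ini would contain sections such as:
// [database]
// hostname = localhost
class IniConfig
{
    private $sections;

    public function __construct($file)
    {
        $this->sections = parse_ini_file($file, true); // true = keep [sections] nested
    }

    public function __get($section)
    {
        // Wrap the requested section so that $config->database->hostname works.
        return (object) $this->sections[$section];
    }
}

$config = new IniConfig('config.ini');
$db = $config->database->hostname;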
IMO these are the fastest methods (in order):
$config['db_user']
$config->db_user directly
$config->db_user via __get()
getOption('db_user') via __get()
Also, you've already asked a lot of questions about your config system. Not that I mind, but I specifically remember that you asked whether you should use parse_ini_file() or not.
Why are you repeating basically the same questions over and over again?
I think you're taking premature optimization to a whole new level; you should worry about the performance of 100,000 users if and when you get to 50,000 users or so, not now.