I've been working on converting an application of mine from CodeIgniter to Phalcon. I've noticed that [query heavy] requests that only took a maximum of 3 or 4 seconds using CI are taking up to 30 seconds to complete using Phalcon!
I've spent days trying to find a solution. I've tried using all the different means of access offered by the framework including submitting raw query strings directly to Phalcon's MySql PDO adapter.
I'm adding my database connection to the service container exactly like it is shown in Phalcon's INVO tutorial:
$di->set('db', function() use ($config) {
    return new \Phalcon\Db\Adapter\Pdo\Mysql(array(
        "host"     => $config->database->host,
        "username" => $config->database->username,
        "password" => $config->database->password,
        "dbname"   => $config->database->name
    ));
});
Using webgrind output I've been able to narrow the bottleneck down to the constructor in Phalcon's PDO adapter class (cost is in milliseconds):
I've already profiled and manually tested the relevant SQL to make sure the bottleneck isn't in the database (or my poorly constructed SQL!)
I've discovered the problem, which to me wasn't immediately apparent, so hopefully others will find this useful as well.
Every time a new query was started, the application was getting a new instance of the database adapter. The request which produced the webgrind output above had a total of 20 queries.
While re-reading Phalcon's documentation section on dependency injection, I saw that services can optionally be added to the service container as a "shared" service, which effectively forces the object to act as a singleton: once one instance of the class is created, the container passes that same instance back for every subsequent request instead of creating a new one.
There are several methods to force a service to be added as a shared service, details of which can be found here in Phalcon's Documentation:
http://docs.phalconphp.com/en/latest/reference/di.html#shared-services
Changing the code posted above to be added as a shared service looks like this:
$di->setShared('db', function() use ($config) {
    return new \Phalcon\Db\Adapter\Pdo\Mysql(array(
        "host"     => $config->database->host,
        "username" => $config->database->username,
        "password" => $config->database->password,
        "dbname"   => $config->database->name
    ));
});
Here's what the webgrind output looks like for the same query referenced above, but after setting the database service to be added as a shared service (cost in milliseconds):
Notice that the invocation count is now 1 instead of 20, and the invocation cost dropped from 20 seconds down to 1 second!
I hope someone else finds this useful!
In most examples services are effectively registered as shared, though not in the most obvious way, but via:
$di->set('service', …, true);
The last boolean argument passed to set() makes the service shared, and in 99.9% of cases you'd want your DI services registered that way; otherwise similar things would happen as described by #the-notable, but because they are likely to be less "impactful", they would be hard to trace down.
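For clarity, here is the db service from earlier registered with that boolean flag; this sketch should be equivalent to the setShared() version above:

$di->set('db', function() use ($config) {
    return new \Phalcon\Db\Adapter\Pdo\Mysql(array(
        "host"     => $config->database->host,
        "username" => $config->database->username,
        "password" => $config->database->password,
        "dbname"   => $config->database->name
    ));
}, true); // the third argument marks the service as shared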
Related
I've managed to get stable, load-balanced front-end servers that can scale horizontally quite well; the next bottleneck, however, is the database. There was a blog post discussing scaling databases horizontally, but it had very little detail. I'm currently using PostgreSQL, so the only plugin I've found won't work.
Are my only options building my own HAProxy setup or rewriting the PostgreSQL plugin to allow connections to read replicas?
I'm using AWS for all my hosting
Firstly - I'd love to be corrected on this!
Having only had a quick look through some of the ORM classes in a SilverStripe 3.5 site, it looks like, while the ORM does support multiple database connections (see DB::get_conn with its name argument), it was designed with specific use cases in mind. That is to say, you may have a module that needs to write to a specific database, and this would allow it to.
What you want is native and automatic support for this within the framework, so that all reads go to your slave(s) and writes go to your master. Unfortunately, it doesn't look like this comes out of the box. You might be able to achieve it by overloading a couple of the core SQL classes using the injector.
If you were to try it, this answer outlines how you could separate select statements out from the rest and run them through a different database connector.
As a quick example of how you might go about achieving this with SQLSelect, you will notice that it is injectable, which means you can easily overload it.
File: mysite/_config/injector.yml
Injector:
  SQLSelect:
    class: ReadOnlySQLSelect
You need to register a new database connection with the DB class:
File: mysite/_config.php
$readDatabaseConfig = array(/** define your DB credentials here, as with the default $databaseConfig **/);
if (!DB::connect($readDatabaseConfig, 'default_read')) {
    user_error('Failed to connect to read replica DB!', E_USER_ERROR);
}
Now, overload the SQLSelect class and replace the parts of it that call the DB class methods. This class inherits from SQLExpression, which is the class that contains the methods you actually care about in this instance:
File: mysite/code/ReadOnlySQLSelect.php
class ReadOnlySQLSelect extends SQLSelect
{
    public function sql(&$parameters = array())
    {
        // Changed from SQLExpression: third parameter passed as connection name
        $sql = DB::build_sql($this, $parameters, 'default_read');

        if (empty($sql)) {
            return null;
        }

        if ($this->replacementsOld) {
            $sql = str_replace($this->replacementsOld, $this->replacementsNew, $sql);
        }

        return $sql;
    }

    public function execute()
    {
        $sql = $this->sql($parameters);

        // Changed from SQLExpression: skip DB::prepared_query since it doesn't allow
        // you to provide the connection name - replace it with its contents instead.
        $conn = DB::get_conn('default_read');
        return $conn->preparedQuery($sql, $parameters);
    }
}
Note: SQLSelect::unlimitedRowCount should technically be replaced where it calls DB::prepared_query, since that method calls DB::get_conn with no arguments and will therefore always return the default connection. You could replace the DB::prepared_query line in the same way as above:
$conn = DB::get_conn('default_read');
$result = $conn->preparedQuery($sql, $innerParameters);
If you implement the above method, also change new SQLSelect() to SQLSelect::create(); otherwise you'll end up with some queries that still hit the master server, because constructing the class directly bypasses the injector.
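As a quick illustration (the query arguments here are made up):

// Bypasses the injector, so the ReadOnlySQLSelect overload is never used:
$query = new SQLSelect('*', '"Member"');

// Resolved through the injector, so ReadOnlySQLSelect is returned instead:
$query = SQLSelect::create('*', '"Member"');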
There's also an instance in SQLConditionalExpression that you should replace too (::toSelect) but that is likely to affect query transformations from other child implementations of that class, and you won't be able to do much about it without either (A) PRing a fix to the framework or (B) overloading all the other SQL* classes.
At this point you should have everything you need to route select queries to your default_read connection.
Infrastructure
On the infrastructure side, you should be able to set up read replicas through the RDS console. When you do so it will provide you with a DNS endpoint for your replica node(s), which you can use in your _config.php to configure the connection to the read replica database.
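As a sketch of what that might look like, assuming the silverstripe/postgresql module and placeholder RDS credentials, the read connection in _config.php could be registered along these lines (the keys mirror the default $databaseConfig):

$readDatabaseConfig = array(
    'type'     => 'PostgreSQLDatabase',
    'server'   => 'myapp-read-replica.xxxxxxxxxx.us-east-1.rds.amazonaws.com',
    'username' => 'app_read',
    'password' => 'secret',
    'database' => 'myapp',
);

if (!DB::connect($readDatabaseConfig, 'default_read')) {
    user_error('Failed to connect to read replica DB!', E_USER_ERROR);
}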
If this works for you, you should create a module for it and put it up on GitHub - this would definitely be useful for others in future!
You may also consider making pull requests to the framework to add additional arguments to methods like DB::prepared_query to accept a connection name.
Also worth noting is that if you're using the mysqlnd database adapter you may be able to take advantage of read/write splitting, implemented with some sort of injector overloading but all handled at a lower level than the application layer.
Imagine that you have interfaces which describe the data access layer of your application. You haven't decided yet what kind of storing mechanism you want to use, you just want to make sure, that whatever you choose, it will handle concurrent requests well. For that you have to write concurrency tests against those interfaces.
I think a schematic concurrency test should be something like this:
public function testMoneyIsNotLostByConcurrentTransfers(){
    $accountRepository = DataAccessLayer::getBankAccountRepository();

    $accountOfTom = $accountRepository->create(array(
        'owner'   => 'Tom',
        'balance' => new Money(10000)
    ));
    $accountOfBob = $accountRepository->create(array(
        'owner'   => 'Bob',
        'balance' => new Money(10000)
    ));
    $accountOfSusanne = $accountRepository->create(array(
        'owner'   => 'Susanne',
        'balance' => new Money(10000)
    ));

    $this->concurrentExecution(
        function () use ($accountOfTom, $accountOfBob){
            $accountOfTom->transfer($accountOfBob, new Money(5000));
        },
        function () use ($accountOfTom, $accountOfSusanne){
            $accountOfSusanne->transfer($accountOfTom, new Money(5000));
        }
    );

    $this->assertEquals($accountOfTom->getBalanceAmount(), 10000);
    $this->assertEquals($accountOfBob->getBalanceAmount(), 15000);
    $this->assertEquals($accountOfSusanne->getBalanceAmount(), 5000);
}
Is it possible to write such tests, or a test runner for them, in PHP? Or is there any existing tool which can help with concurrency testing in PHP?
I could not find any test runner for such concurrency tests. I found only paratest, which can run independent tests, such as unit tests, in parallel.
According to PHP - parallel task runner, the best option I think is using pthreads with debug_backtrace. I think it will be hard even with that. I am looking forward to the installation problems, thread-safety issues, resource-sharing difficulties, backtrace bugs, etc... I will have a great time, I am sure... :S
I found async calls in the pthreads examples.
If I ever manage to solve this, I will share it on github and add a link here. Until then...
update
I just realized that I don't need multi-threaded or multi-process applications to test concurrency. For example, I can start two transactions over two database connections from the same PHP file. What I need is to add event triggering for the statements the DB driver runs, so I can add breakpoints and wait for the other task wherever I want. File locking works just the same... So coroutines, or some hand-made multitasking plus statement logging, is enough...
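For instance, a minimal sketch of that idea, with two PDO connections acting as two "concurrent" clients from a single script (it assumes a hypothetical InnoDB accounts table; the interleaving points are where breakpoints or event hooks would go):

$connA = new PDO('mysql:host=localhost;dbname=bank_test', 'user', 'pass');
$connB = new PDO('mysql:host=localhost;dbname=bank_test', 'user', 'pass');
$connA->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$connB->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$connA->beginTransaction();   // "task A": Tom -> Bob
$connB->beginTransaction();   // "task B": Susanne -> Tom

// The statements of the two tasks are interleaved by hand; a breakpoint or
// logging hook can be placed between any two of them.
$connA->exec("UPDATE accounts SET balance = balance - 5000 WHERE owner = 'Tom'");
$connB->exec("UPDATE accounts SET balance = balance - 5000 WHERE owner = 'Susanne'");
$connA->exec("UPDATE accounts SET balance = balance + 5000 WHERE owner = 'Bob'");
$connA->commit();

// B's credit to Tom only proceeds once A has released its row lock.
$connB->exec("UPDATE accounts SET balance = balance + 5000 WHERE owner = 'Tom'");
$connB->commit();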
Concurrency should be built into your saving mechanism, not the execution layer.
For example, if you are using SQL, don't read a value and write back a computed total; apply the change relative to the current value instead (the SQL equivalent of += and -=).
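A minimal sketch of the difference, with a made-up accounts table and a PDO handle:

// Race-prone: compute the new balance in PHP and write it back; two concurrent
// transfers can silently overwrite each other's result.
// $pdo->exec("UPDATE accounts SET balance = 5000 WHERE owner = 'Tom'");

// Atomic: let the database apply the delta (the SQL equivalent of -= / +=).
$pdo->exec("UPDATE accounts SET balance = balance - 5000 WHERE owner = 'Tom'");
$pdo->exec("UPDATE accounts SET balance = balance + 5000 WHERE owner = 'Bob'");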
I am running some tests on a test cluster I have set up. Right now, I have a three-node cluster with one master, one slave and one arbiter.
I have a connection string like
mongodb://admin:pass@the_slave_node,the_master_node
I was under the impression that one of the features inherent in the connection string was that supplying more than one host would introduce a certain degree of resiliency on the client side. I was expecting that when I took down the_slave_node, the PHP driver would move on and try connecting to the_master_node; however, this doesn't seem to be the case, and instead I get the error:
The MongoCursor object has not been correctly initialized by its constructor
I know that MongoClient is responsible for making the initial connections, and indeed it is that way in the code. So this error is an indication to me that the MongoClient didn't connect properly and I didn't implement correct error checking. However that is a different issue --
How do I guarantee that the MongoClient will connect to at least one of the hosts in the hosts csv in the event at least one host is up and some hosts are down?
Thank you
The MongoCursor object has not been correctly initialized by its constructor
This error should only ever happen if you are constructing your own MongoCursor and overriding its constructor.
This would happen with, for example:
class MyCursor extends MongoCursor {
    function __construct(MongoClient $connection, $ns, array $query = array(), array $fields = array()) {
        /* Do some work, forgetting to call parent::__construct(....); */
    }
}
If you are not extending any of the classes, then this error is definitely a bug and you should please report it :)
How do I guarantee that the MongoClient will connect to at least one of the hosts
Put at least one member of each datacenter into your seed list.
Tune the various timeout options, and plan for the case when the primary is down (e.g. which servers should you be reading from instead?)
I suspect you may have forgotten to specify the "replicaSet" option, since you mention your connection string without it?
The following snippet is what I recommend (adapt at will), especially when you require full consistency when possible (e.g. always reading from a primary).
<?php
$seedList = "hostname1:port,hostname2:port";

$options = array(
    // If the server is down, don't wait forever
    "connectTimeoutMS"   => 500,
    // When the server goes down in the middle of an operation, don't wait forever
    "socketTimeoutMS"    => 5000,
    "replicaSet"         => "ReplicasetName",
    "w"                  => "majority",
    // Don't wait forever for majority write acknowledgment
    "wtimeout"           => 500,
    // When the primary goes down, allow reading from secondaries
    "readPreference"     => MongoClient::RP_PRIMARY_PREFERRED,
    // When the primary is down, prioritize reading from our local datacenter
    // If that datacenter is down too, fall back to any server available
    "readPreferenceTags" => array("dc:is", ""),
);

try {
    $mc = new MongoClient($seedList, $options);
} catch(Exception $e) {
    /* I always have some sort of "Skynet"/"Ground control"/"Houston, we have a problem"
       system to automate taking down (or putting into automated maintenance mode)
       my webservers in case of epic failure. */
    automateMaintenance($e);
}
PHP's Mongo driver lacks a renameCommand function. There is reference to doing this through the admin database. But it seems more recent versions of the Mongo driver don't let you just "use" the admin database if you don't have login privileges on that database, so this method no longer works. I've also read this doesn't work in sharded environments, although that isn't a concern for me currently.
The other suggestion people seem to have is to iterate through the "from" collection and insert into the "to" collection. With the proper WriteConcern (fire and forget) this could be fairly fast. But it still means pulling down each record over the network into the PHP process and then uploading it back over the network back into the database.
I ideally want a way to do it all server-side. Sort of like an INSERT INTO ... SELECT ... in SQL. This way it is fast, network efficient and a low load on PHP.
I have just tested this, it works as designed ( http://docs.mongodb.org/manual/reference/command/renameCollection/ ):
$mongo->admin->command(array('renameCollection'=>'ns.user','to'=>'ns.e'));
That is how you rename an unsharded collection. One problem with MR is that it will change the shape of the output from the original collection. As such it is not very good at copying a collection. You would be better off copying it manually if your collection is sharded.
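As a rough sketch of such a manual copy using the legacy driver API discussed in this thread (the database and collection names, and the write concern, are illustrative):

$source = $mongo->mydb->users;
$target = $mongo->mydb->users_copy;

foreach ($source->find() as $doc) {
    // w=0 is the "fire and forget" write concern mentioned in the question;
    // it keeps the copy fast at the cost of skipping write acknowledgement.
    $target->insert($doc, array('w' => 0));
}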
As an added note I upgraded to 1.4.2 (which for some reason comes out from the pecl channel into phpinfo() as 1.4.3dev :S) and it still works.
Updates:
Removed my old map/reduce method since I found out (and Sammaye pointed out) that this changes the structure
Made my exec version secondary since I found out how to do it with renameCollection.
I believe I have found a solution. It appears some versions of the PHP driver will auth against the admin database even though it doesn't need to. But there is a workaround where the authSource connection param is used to change this behavior so it doesn't auth against the admin database but instead the database of your choice. So now my renameCollection function is just a wrapper around the renameCollection command again.
The key is to add authSource when connecting. In the below code $_ENV['MONGO_URI'] holds my connection string and default_database_name() returns the name of the database I want to auth against.
$class = 'MongoClient';
if( !class_exists($class) ) $class = 'Mongo';
$db_server = new $class($_ENV['MONGO_URI'].'?authSource='.default_database_name());
Here is my older version that used eval which should also work although some environments don't allow you to eval (MongoLab gives you a crippled setup unless you have a dedicated system). But if you are running in a sharded environment this seems like a reasonable solution.
function renameCollection($old_name, $new_name) {
    db()->$new_name->drop();
    $copy = "function() {db.$old_name.find().forEach(function(d) {db.$new_name.insert(d)})}";
    db()->execute($copy);
    db()->$old_name->drop();
}
You can use this. If the "dropTarget" flag is true, an existing target collection will be dropped.
$mongo = new MongoClient('_MONGODB_HOST_URL_');
$query = array("renameCollection" => "Database.OldName", "to" => "Database.NewName", "dropTarget" => true);
$mongo->admin->command($query);
My latest idea for handling settings across the PHP project I am building is to store all my settings in a config PHP file; the file just returns an array like this...
<?php
/**
 * @filename    app-config.php
 * @description Array to return to our config class
 */
return array(
    'db_host'     => 'localhost',
    'db_name'     => 'socialnetwork',
    'db_user'     => 'root',
    'db_password' => '',
    'site_path'   => 'C:/webserver/htdocs/project/',
    'site_url'    => 'http://localhost/project/',
    'image_path'  => 'C:/webserver/htdocs/fproject/images/',
    'image_url'   => 'http://localhost/project/images/',
    'site_name'   => 'test site',
    'admin_id'    => 1,
    'UTC_TIME'    => gmdate('U', time()),
    'ip'          => $_SERVER['REMOTE_ADDR'],
    'testtttt'    => array(
        'testtttt' => false
    )
);
?>
Please note the actual config array is MUCH MUCH larger, many more items in it...
Then I would have a Config.class.php file that loads the array file and uses the magic method __get($key). I can then autoload my Config class and access any site setting like this...
$config->ip;
$config->db_host;
$config->db_name;
$config->db_user;
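A minimal sketch of what my Config.class.php could look like (simplified; the real class would also handle INI/XML/JSON sources):

class Config
{
    private $settings = array();

    public function __construct($file)
    {
        // The config file simply returns an array, as shown above.
        $this->settings = require $file;
    }

    public function __get($key)
    {
        return isset($this->settings[$key]) ? $this->settings[$key] : null;
    }
}

$config = new Config('app-config.php');
echo $config->db_host; // "localhost"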
So I realize this works great and is very flexible. In my class I can have it read a PHP file returning an array, as I am doing now, or read an INI, XML, or JSON file into an array, so it is very flexible for future projects. But I am more concerned about performance for this particular project: it will be a social network site like Facebook/MySpace, and I have run one before; once it reached around 100,000 users, performance became very important. So I am not "micro-optimizing" or "prematurely optimizing"; I am strictly looking to do this the best way with performance in mind. It does not need to be flexible, as I will only need it on this project.
With that in mind, I always read about people trying to eliminate function calls as much as possible because of the overhead they cause, so I would like to know what more experienced people think. I am new to using classes and objects in PHP; is calling $config->db_user; as costly as a procedural function call like getOption('db_user'); ? I am guessing it is, since every time I access a setting it goes through the __get() method.
So for best performance should I go about this a different way? Like just loading my config array into a bootstrap file and accessing items when I need them like this...
$config['db_host'];
$config['db_username'];
$config['db_password'];
$config['ip'];
Please give me your thoughts on this without me having to do a bunch of benchmark tests.
From tests I've seen, I believe Alix Axel's response above is correct with respect to the relative speed of the four methods. Using direct access is the fastest, and using any sort of magic method is usually slower.
Also, in terms of optimization: the biggest performance hit for any single request in the system you describe will probably be parsing the XML/INI/JSON, rather than accessing it via whichever syntax you decide to go with. If you want to fix this, store the parsed data in APC. The one caveat is that you should only store static data in it, not dynamic things like the UTC date.
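A rough sketch of that idea (the cache key and TTL are arbitrary):

$settings = apc_fetch('app_config');

if ($settings === false) {
    $settings = require 'app-config.php'; // or parse_ini_file()/json_decode(), etc.

    // Only cache the static values; dynamic ones like the UTC timestamp and
    // the visitor IP should still be computed on every request.
    unset($settings['UTC_TIME'], $settings['ip']);

    apc_store('app_config', $settings, 3600);
}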
Firstly, instead of an included file that returns an array I would instead use an .ini file and then use PHP's parse_ini_file() to load the settings.
Secondly, you shouldn't worry about function calls in this case. Why? Because you might have 100,000 users but if all 100,000 execute a script and need some config values then your 100,000 function calls are distributed over 100,000 scripts, which will be completely irrelevant as far as performance goes.
Function calls are only an issue if a single script execution, for example, executes 100,000 of them.
So pick whichever is the most natural implementation. Either an object or an array will work equally well. Actually an object has an advantage in that you can do:
$db = $config->database->hostname;
where $config->database can implicitly load just the database section of the INI file and will create another config object that can return the hostname entry. If you want to segment your config file this way.
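For example, a sketch of that segmented setup (the .ini contents and key names are made up):

/*
app-config.ini might contain:

    [database]
    hostname = "localhost"
    name     = "socialnetwork"
    user     = "root"
*/

// Passing true as the second argument keeps each [section] as its own sub-array,
// which a Config object could wrap lazily and expose as $config->database.
$settings = parse_ini_file('app-config.ini', true);
echo $settings['database']['hostname']; // "localhost"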
IMO these are the fastest methods (in order):
$config['db_user']
$config->db_user directly
$config->db_user via __get()
getOption('db_user') via __get()
Also, you've already asked a lot of questions about your config system, not that I mind but I specifically remembered that you asked a question about whether you should use parse_ini_file() or not.
Why are you repeating basically the same questions over and over again?
I think you're taking premature optimization to a whole new level; you should worry about the performance of 100,000 users if and when you get to 50,000 users or so, not now.