I'm having what seems to be a concurrency problem while using MySQL and PHP + Propel 1.3. Below is a small example of the "save" method of a Propel object.
public function save(PropelPDO $con = null) {
    $con = Propel::getConnection();
    try {
        $con->beginTransaction();
        sleep(3); // ignore this, used for testing only
        parent::save($con);
        $foo = $this->getFoo(); // Propel object, triggers a SELECT
        // stuff is happening here...
        $foo->save($con);
        $con->commit();
    } catch (Exception $e) {
        $con->rollBack();
        throw $e;
    }
}
The problem is the $foo object. Let's say we get two calls of the example method one after another in a very short time. In some cases, if the second transaction reads $foo...
$foo = $this->getFoo();
... before the first transaction has had the chance to save it...
$foo->save($con);
... $foo read by the second transaction will be outdated and bad things will happen.
How can I force the locking of the table Foo objects are stored in so that subsequent transactions can read from it only after the first one has finished its work?
EDIT: The context is a web application. In short, in some cases I want the very first request to do some data modification (which happens between fetching and saving of $foo). All subsequent requests should not be able to do the modification. Whether the modification will occur or not depends on the fetched $foo state (table row attribute). If two transactions fetch the same $foo, the modification will occur twice which causes a problem.
When you load this existing row into the screen/application, load the LastChgDate too. When you save it, append "AND LastChgDate = <the value you loaded>" to the UPDATE. Check the affected row count of the UPDATE; if it is zero, return an error ("someone else has already saved this record") and roll back any other changes. With this logic in place, you can only save a row if it is the same as when you loaded it. For new rows (INSERTs) this is not necessary, because they are new.
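To make that concrete, here is a minimal sketch of the conditional UPDATE using PDO (the table, column, and variable names are illustrative, not from the question):

$stmt = $pdo->prepare(
    'UPDATE foo
        SET bar = :bar, last_chg_date = NOW()
      WHERE id = :id
        AND last_chg_date = :loaded_last_chg_date'
);
$stmt->execute(array(
    ':bar'                  => $newBar,
    ':id'                   => $id,
    ':loaded_last_chg_date' => $loadedLastChgDate, // value read at load time
));
if ($stmt->rowCount() === 0) {
    // someone else saved the row since we loaded it
    $pdo->rollBack();
    throw new Exception('Someone else has already saved this record.');
}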
In MySQL, I think you can use SELECT FOR UPDATE to accomplish the lock.
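For illustration, a rough sketch of how that could look inside the transaction from the question ($fooId is a made-up variable, and the foo table with an id column is an assumption):

$con->beginTransaction();
// Any other transaction issuing SELECT ... FOR UPDATE (or an UPDATE)
// on this row will now block until we commit or roll back.
$stmt = $con->prepare('SELECT * FROM foo WHERE id = :id FOR UPDATE');
$stmt->execute(array(':id' => $fooId));
$row = $stmt->fetch(PDO::FETCH_ASSOC);
// ... decide whether to modify, then save and commit ...
$con->commit();

Note that this requires InnoDB tables; on MyISAM, FOR UPDATE has no row-locking effect.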
Another option is to use the GET_LOCK and RELEASE_LOCK MySQL function calls to create named locks that you would use to control access to the resource.
There are some downsides to these approaches. I haven't used them much myself, and they are MySQL-specific, but they could work for you.
I have a strange problem with \Doctrine\ORM\UnitOfWork::getScheduledEntityDeletions used inside onFlush event
foreach ($unitOfWork->getScheduledEntityDeletions() as $entity) {
    if ($entity instanceof PollVote) {
        $arr = $entity->getAnswer()->getVotes()->toArray();
        dump($arr);
        dump($entity);
        dump(in_array($entity, $arr, true));
        dump(in_array($entity, $arr));
    }
}
And here is the result:
So we see that the object in the collection points to a different instance than the original, and therefore in_array no longer yields the expected result when used with strict comparison (===). Furthermore, the \DateTime object points to a different instance.
The only possible explanation I found is the following (source):
Whenever you fetch an object from the database Doctrine will keep a copy of all the properties and associations inside the UnitOfWork. Because variables in the PHP language are subject to “copy-on-write” the memory usage of a PHP request that only reads objects from the database is the same as if Doctrine did not keep this variable copy. Only if you start changing variables PHP will create new variables internally that consume new memory.
However, I did not change anything (even the created field is kept as it is). The only operations performed on the entity are:
\Doctrine\ORM\EntityRepository::findBy (fetching from DB)
\Doctrine\Common\Persistence\ObjectManager::remove (scheduling for removal)
$em->flush(); (triggering synchronization with DB)
Which leads me to think (I might be wrong) that Doctrine's change-tracking method has nothing to do with the issue I'm experiencing. Which leads me to the following questions:
What causes this?
How to reliably check if an entity scheduled for deletion is inside a collection (\Doctrine\Common\Collections\Collection::contains uses in_array with strict comparison) or which items in a collection are scheduled for deletion?
The problem is that when you tell Doctrine to remove an entity, it is removed from the identity map (here):
<?php
public function scheduleForDelete($entity)
{
    $oid = spl_object_hash($entity);
    // ....
    $this->removeFromIdentityMap($entity);
    // ...
    if ( ! isset($this->entityDeletions[$oid])) {
        $this->entityDeletions[$oid] = $entity;
        $this->entityStates[$oid] = self::STATE_REMOVED;
    }
}
And when you do $entity->getAnswer()->getVotes(), Doctrine does the following:
Loads all votes from the database
For every vote, checks whether it is in the identity map; if it is, uses the existing instance
If it is not in the identity map, creates a new object
Try to call $entity->getAnswer()->getVotes() before you delete the entity. If the problem disappears, then I am right. Of course, I would not suggest this hack as a solution; it is just to make sure we understand what is going on under the hood.
UPD: instead of just calling $entity->getAnswer()->getVotes(), you should probably foreach over all the votes, because of lazy loading. If you only call $entity->getAnswer()->getVotes(), Doctrine probably won't do anything and will load the votes only when you start to iterate over them.
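Given the identity-map behaviour described above, one way to reliably check membership is to compare identifiers instead of instances. A rough sketch (it assumes PollVote has a getId() method; adapt to your actual identifier):

$deletedIds = array();
foreach ($unitOfWork->getScheduledEntityDeletions() as $scheduled) {
    if ($scheduled instanceof PollVote) {
        $deletedIds[] = $scheduled->getId();
    }
}
foreach ($entity->getAnswer()->getVotes() as $vote) {
    // compares scalar IDs, so it works even when the instances differ
    if (in_array($vote->getId(), $deletedIds, true)) {
        // this vote is scheduled for deletion
    }
}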
From the doc:
If you call the EntityManager and ask for an entity with a specific ID twice, it will return the same instance
So calling findOneBy(['id' => 12]) twice should return the exact same instance.
So it all depends on how both instances are retrieved by Doctrine.
In my opinion, the one you get in $arr comes from a One-to-Many association on $votes in the Answer entity, which results in a separate query (maybe an id IN (12)) by the ORM.
Something you could try is to declare this association as EAGER (fetch="EAGER"); it may force the ORM to make a specific query and keep the result in cache, so that the second time you ask for it, the same instance is returned.
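For reference, such a mapping might look like this (a sketch; mappedBy="answer" is my assumption based on the getAnswer() call in the question):

use Doctrine\ORM\Mapping as ORM;

class Answer
{
    /**
     * @ORM\OneToMany(targetEntity="PollVote", mappedBy="answer", fetch="EAGER")
     */
    private $votes;
}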
Could you have a look at the logs and post them here? They may indicate something interesting, or at least relevant for further investigation.
I came up with a very simple job queueing system using PHP, MySQL and cron.
Cron will call a website, which has a function that calls function A() every 2 seconds.
1. A() searches for and retrieves a row from table A.
2. Upon retrieving a row, A() updates that row with value 1 in column working.
3. A() then does something to the data in the retrieved row.
4. A() then inserts a row in table B with the value obtained during the processing in step 3.
Problem: I notice that there are sometimes duplicate values in the table B due to function A() retrieving the same row from table A multiple times.
Which part of the design above is allowing the duplicate processing, and how should it be fixed?
Please don't suggest something like RabbitMQ without at least showing how it can be implemented in more detail. I read some of their docs and did not understand how to implement it. Thanks!
Update: I have a cron job that calls a page (which calls function c()) every minute. Function c() loops 30 times, calling function A() each time and using sleep() to delay.
The supplied answer is good, file locks work well, but, since you're using MySQL, I thought I'd answer as well. With MySQL you can implement cooperative asynchronous locking using GET_LOCK and RELEASE_LOCK.
*DISCLAIMER: The examples below are untested. I have successfully implemented something very close to this before, and the below was the general idea.
Let's say you've wrapped this GET_LOCK function in a PHP class called Mutex:
class Mutex {
    private $_db = null;
    private $_resource = '';

    public function __construct($resource, Zend_Db_Adapter_Abstract $db) {
        $this->_resource = $resource;
        $this->_db = $db;
    }

    // Gets a lock for $this->_resource. GET_LOCK takes a timeout (in
    // seconds) as its second parameter; 0 means "don't wait". You could
    // expose that as an argument instead of hard-coding it.
    public function getLock() {
        return (bool)$this->_db->fetchOne(
            'SELECT GET_LOCK(:resource, 0)',
            array(
                ':resource' => $this->_resource
            ));
    }

    public function releaseLock() {
        // using DO because I really don't care if this succeeds;
        // when the PHP process terminates, the lock is released,
        // so there is no worry about deadlock
        $this->_db->query(
            'DO RELEASE_LOCK(:resource)',
            array(
                ':resource' => $this->_resource
            ));
    }
}
Before A() fetches rows from the table, have it ask for a lock. You can use any string as the resource name.
class JobA {
    private $_db = null;

    public function __construct(Zend_Db_Adapter_Abstract $db) {
        $this->_db = $db;
    }

    public function A() {
        // I'm assuming A() is a class method and that the class somehow
        // acquired access to a MySQL database - pretend $this->_db is a
        // Zend_Db adapter. The resource name can be an arbitrary
        // string - I chose the class name in this case but it could be
        // 'barglefarglenarg' or something.
        $mutex = new Mutex(get_class($this), $this->_db);

        // I choose to throw an exception but you could just as easily
        // die silently and get out of the way for the next process,
        // which often works better depending on the job.
        if (!$mutex->getLock())
            throw new Exception('Unable to obtain lock.');

        // Got a lock; now select the rows you need without fear of
        // any other process running A() getting the same rows as this
        // process - presumably you would update/flag the row so that the
        // next A() process will not select the same row when it finally
        // gets a lock. Once we have our data we release the lock.
        $mutex->releaseLock();

        // Now we do whatever we need to do with the rows we selected
        // while we had the lock.
    }
}
When you engineer a scenario in which multiple processes select and modify the same data, this kind of thing comes in very handy. When using MySQL, I prefer this database approach to the file-locking mechanism for portability: it's easier to host your app on different platforms if the locking mechanism is external to the filesystem. Sure, file locking can be done and works fine, but in my personal experience I found this easier to use.
If you plan on your app being portable across database engines, then this approach will probably not work for you.
One problem could be the processing at first:
Cron will call a function A() that searches and retrieves a row from table A every 2 seconds.
The processing of this part of the script could take longer than two seconds on a table without indexes; as a result, you could pick up the same row in multiple runs.
You could remedy this with an exclusive file lock.
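A minimal sketch of that idea, assuming a writable lock-file path (the path is arbitrary):

// Take an exclusive, non-blocking lock; if another run of A() still
// holds it, skip this run instead of processing the same row twice.
$fp = fopen('/tmp/job_a.lock', 'c');
if ($fp === false || !flock($fp, LOCK_EX | LOCK_NB)) {
    return; // another run is still working
}
A();
flock($fp, LOCK_UN);
fclose($fp);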
I have a feeling there is more to this than just the workflow; if you can show some basic code, there might be a problem in the code as well.
edit
I think it is timing judging by your last update:
Update: I have a cron job that calls a page (which calls function c())
every minute. Function c() loops 30 times, calling function A() each
time and using sleep() to delay.
That's a lot of jumping through hoops, and I think you have a concurrency problem where cron runs overlap.
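Independent of the scheduling, you can also make the row claim itself atomic so that overlapping runs cannot grab the same row. A sketch using the working column from the question (tableA is assumed from the description, and claimed_by is a hypothetical extra column):

// Atomically claim one unprocessed row; only one concurrent A() can win.
$claimed = $pdo->exec(
    "UPDATE tableA SET working = 1, claimed_by = CONNECTION_ID()
     WHERE working = 0 LIMIT 1"
);
if ($claimed === 0) {
    return; // nothing to claim, or another run got there first
}
$row = $pdo->query(
    "SELECT * FROM tableA WHERE working = 1 AND claimed_by = CONNECTION_ID()"
)->fetch(PDO::FETCH_ASSOC);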
I want to make some simple objects/models for unspecified frameworks/systems.
Additionally I want to use MySQL as backend for my data.
My goal is simple implementation - small changes to a configuration file and that's basically it.
My problem is that I'm convinced I need to check, when using my model, that the database tables actually exist - and if not, have the model create the table(s) for me.
I was thinking something like:
<?php
class MyObject
{
    public function __construct()
    {
        $dal->query("SHOW TABLES LIKE 'MyTable'");
        if ($dal->num_rows() == 0) {
            $this->_createTables();
        }
    }
    ...
}
?>
But I'm worried about the performance with this model - I'm looking for either confirmation on the efficiency of my solution or a better solution.
In my opinion, and depending on your application's needs, you might be better off checking for this condition only if an error occurs, and to assume that the table exists otherwise. Something like this would avoid the extra query on every instance (you should, however, put the check into its own method).
public function insert($data, $spiralingToDeath = false)
{
    // (do actual insertion here)
    if ($this->isError) {
        // nothing obvious
        if ($spiralingToDeath) {
            // recursion check
            throw new DBException("Tried to create a table and failed.");
        } else {
            $dal->query("SHOW TABLES LIKE 'MyTable'");
            if ($dal->num_rows() == 0) {
                $this->_createTables();
            }
            // try again:
            $this->insert($data, true);
        }
    }
}
What about CREATE TABLE IF NOT EXISTS?
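That would remove the need for the SHOW TABLES check entirely, since the statement is a no-op when the table already exists. A sketch with an assumed schema:

// Harmless if MyTable already exists (MySQL only raises a note).
$dal->query("
    CREATE TABLE IF NOT EXISTS MyTable (
        id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        name VARCHAR(255) NOT NULL
    ) ENGINE=InnoDB
");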
Require the user to change the configuration through your editor and make your editor modify the table layout before it saves the configuration.
Also, you may want to consider just calling SHOW TABLES once, then caching the result at the class level in a private var, so you can do a simple isset later, e.g.
isset($this->tables_cache[$db_name][$table_name])
This would allow you to scale a bit better with more tables.
You could also save this as a JSON structure or serialized struct in your filesystem, loading it in __construct of your class and (re)saving it in __destruct, as in the sketch below.
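A rough sketch of that caching idea (the class name, file path, and single-database simplification are all mine):

class TableCache
{
    private $tables_cache = array();
    private $cache_file;
    private $db;

    public function __construct(PDO $db, $cache_file = '/tmp/tables_cache.json')
    {
        $this->db = $db;
        $this->cache_file = $cache_file;
        if (is_readable($cache_file)) {
            $this->tables_cache = json_decode(file_get_contents($cache_file), true);
        } else {
            // one SHOW TABLES per process instead of one query per object
            foreach ($db->query('SHOW TABLES')->fetchAll(PDO::FETCH_COLUMN) as $t) {
                $this->tables_cache[$t] = true;
            }
        }
    }

    public function tableExists($table)
    {
        return isset($this->tables_cache[$table]);
    }

    public function __destruct()
    {
        file_put_contents($this->cache_file, json_encode($this->tables_cache));
    }
}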
I have two complicated PHP objects, each of which has data in a few MySQL tables.
Sometimes, I just need to remove one object A from the database, and that takes 3 SQL statements.
Sometimes, I need to remove one object B from the database, which takes 4 SQL statements and also needs to find and remove all of the object A's that object B owns.
So inside the function delete_A(), I execute those statements inside a transaction. Inside delete_B(), I want to run one big transaction that covers the activities inside delete_A(). If the whole atom of deleting a B fails, I need to restore all of its A's in the rollback.
How do I update the definition of delete_A() to only open a new transaction if there isn't already a bigger transaction running?
I expected to be able to do something like this, but the autocommit attribute doesn't appear to be changed by beginTransaction():
function delete_A($a) {
    global $pdo;
    $already_in_transaction = !$pdo->getAttribute(PDO::ATTR_AUTOCOMMIT);
    if (!$already_in_transaction) {
        $pdo->beginTransaction();
    }
    // Delete the A
    if (!$already_in_transaction) {
        $pdo->commit();
    }
}

function delete_B($b) {
    global $pdo;
    $pdo->beginTransaction();
    foreach ($list_of_As as $a) {
        delete_A($a);
    }
    $pdo->commit();
}
PDO::ATTR_AUTOCOMMIT is not an indicator attribute, it's a control attribute. It controls whether SQL statements implicitly commit when they finish.
You can call PDO::inTransaction(), which returns false if you have no transaction in progress and true if you have a transaction outstanding that needs to be committed or rolled back. However, this function is not documented, so it's hard to say whether it's safe to depend on it being present in all future versions of PDO.
I recommend that PHP developers not try to manage transactions within function or class scope. You should manage transactions at the top level of the application.
See also:
How do detect that transaction has already been started?
Multiple Service Layers and Database Transactions
I ended up just using the exception thrown by ->beginTransaction() to figure out whether I was in a transaction, and using that to decide whether to commit in the inner loop. So delete_A() ended up looking like:
function delete_A($a) {
    global $pdo;
    $already_in_transaction = false; // initialize so the check below is safe
    try {
        $pdo->beginTransaction();
    } catch (PDOException $e) {
        $already_in_transaction = true;
    }
    // Delete the A
    if (!$already_in_transaction) {
        $pdo->commit();
    }
}
And delete_B() works without modification.
One way is to create your own PDOConnect class, which has a $hasTransaction variable, and then just check that. An example can be found in the comments on the beginTransaction page on php.net: http://www.php.net/manual/en/pdo.begintransaction.php#81022
That would be my preference. Granted, that example will need tweaking and dressing up, but it should be a good foundation to start from; a minimal sketch of the idea follows.
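This sketch uses a nesting counter instead of a boolean so that inner calls simply join the outer transaction (the class and property names are made up):

class PDOConnect extends PDO
{
    private $txLevel = 0;

    public function beginTransaction()
    {
        if ($this->txLevel === 0) {
            parent::beginTransaction(); // only the outermost call is real
        }
        $this->txLevel++;
        return true;
    }

    public function commit()
    {
        $this->txLevel--;
        if ($this->txLevel === 0) {
            parent::commit(); // only the outermost commit hits the database
        }
        return true;
    }

    public function rollBack()
    {
        // a rollback anywhere aborts the whole nested stack
        $this->txLevel = 0;
        return parent::rollBack();
    }
}

With this in place, delete_A() and delete_B() can both call beginTransaction()/commit() unconditionally.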
Side note: remember that your tables should be InnoDB for transactions to work. And since you need InnoDB for transactions anyway, you should take prodigitalson's advice and use foreign key constraints.
The question is if a database connection should be passed in by reference or by value?
For me I'm specifically questioning a PHP to MySQL connection, but I think it applies to all databases.
I have heard that in PHP, when you pass a variable to a function or object, it is copied in memory and therefore immediately uses twice as much memory. I have also heard that it's only copied once changes have been made to the value (such as a key being added to or removed from an array).
In a database connection, I would think it's being changed within the function, as the query could change things like the last insert id or num rows. (I guess this is another question: are things like num rows and insert id stored within the connection, or is an actual call made back to the database?)
So, does it matter memory- or speed-wise whether the connection is passed by reference or by value? Does it make a difference between PHP 4 and PHP 5?
// $connection is resource
function DoSomething1(&$connection) { ... }
function DoSomething2($connection) { ... }
A PHP resource is a special type that already is a reference in itself. Passing it by value or explicitly by reference won't make a difference (i.e., it's still a reference). You can check this for yourself under PHP 4:
function get_connection() {
    $test = mysql_connect('localhost', 'user', 'password');
    mysql_select_db('db');
    return $test;
}
$conn1 = get_connection();
$conn2 = get_connection(); // "copied" resource under PHP4
$query = "INSERT INTO test_table (id, field) VALUES ('', 'test')";
mysql_query($query, $conn1);
print mysql_insert_id($conn1)."<br />"; // prints 1
mysql_query($query, $conn2);
print mysql_insert_id($conn2)."<br />"; // prints 2
print mysql_insert_id($conn1); // prints 2, would print 1 if this was not a reference
Call-time pass-by-reference is deprecated, so I wouldn't use the method first described. Also, generally speaking, resources are passed by reference in PHP 5 by default, so having any references should not be required, and you should never open more than one database connection unless you really need it.
Personally, I use a singleton-factory class for my database connections, and whenever I need a database reference I just call Factory::localDatabase(); that way I don't have to worry about multiple connections or passing/receiving references.
<?php
class Factory
{
    private static $local_db;

    /**
     * Open new local database connection
     *
     * @return MySql
     */
    public static function localDatabase() {
        if (!is_a(self::$local_db, "MySql")) {
            self::$local_db = new MySql(false);
            self::$local_db->connect(DB_HOST, DB_USER, DB_PASS, DB_DATABASE);
            self::$local_db->debugging = DEBUG;
        }
        return self::$local_db;
    }
}
?>
It isn't the speed you should be concerned with, but the memory.
In PHP 4, things like database connections and resultsets should be explicitly passed by reference. In PHP 5, this is done automatically, so you don't have to make it explicit.
BTW, singleton methods for creating database handles are a good idea: you can do $db = & Database::Connection(); and always get the correct handle. This saves you from using a global and the static method can do extra magic (like opening it automatically) for you. Just be careful of when your application scales enough that it needs multiple databases: then your magic function will have to know how to hand you back the correct one. IME this is not hugely difficult; the basic way to solve that is for the code layer that needs the DB handle to know how to ask for the correct one.
A database connection does not actually hold the underlying values, so you don't have to worry about losing assignments made inside a function. Metaphorically, you can think of a DB connection as, say, a runway number -- "OK, DB Connection 12 is cleared to be used for a query" -- The query and result set use the connection, and may need exclusive access for awhile, but the connection does not know anything about the underlying data.
A few people have said that you don't need to worry about this for PHP 5. This is incorrect if you have a database object that you're using for all access. In that case, you do need to pass by reference; otherwise it instantiates a new DB object, which (often) creates a new connection to the database.
I discovered this using XDebug and WinCacheGrind, which kindly shows all the destructors that get called - in my case, a half-dozen or more database objects.
To clarify: The reason I point this out is that this is a common way of using Database connections, instead of the raw connection resource.
I don't really have a specific answer for PHP, but in general it would seem to me that you would want to pass this by reference unless you are explicitly sure that you won't encounter performance issues when passing by value.
Generally speaking, references are not faster in PHP. It's a common misconception, because they are semantically similar to C pointers, so people familiar with C often assume they work the same way. Not so. In fact, references are a tiny bit slower than copies, unless you actually assign to the variable (which in any case is bad style, unless the variable is an object).
PHP has a mechanism called copy-on-write, which means that a variable isn't actually copied until it needs to be. You can pass a huge data structure to a function; as long as the function just reads from it, it makes no difference. A reference, however, needs an additional entry in the internal registers, so it would actually take some extra processing (though barely noticeable).
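You can watch copy-on-write happen with memory_get_usage(). A rough demonstration (exact numbers will vary by PHP version):

$big = range(1, 100000); // a reasonably large array

function reads_only(array $arr) {
    // no write, so no copy: memory stays roughly flat
    echo 'inside read-only call: ', memory_get_usage(), "\n";
    return count($arr);
}

function writes(array $arr) {
    $arr[] = 1; // the first write triggers the actual copy
    echo 'inside writing call:   ', memory_get_usage(), "\n";
    return count($arr);
}

echo 'baseline:              ', memory_get_usage(), "\n";
reads_only($big);
writes($big); // reports noticeably more memory than the read-only call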