Doctrine is making duplicated document entries

I am trying to insert an embedded document into my document with the following code.
// Add states, for the joining player.
$state = new PlayerState();
$state->setReady(false);
$state->setPlayer($player->getId());
$game->addPlayerState($state);
// Save element.
$dm->persist($game);
$dm->flush();
The problem is that this generates two PlayerState documents, like this:
{ "_id" : ObjectId( "513f50a58ead0ee9ac00000f" ),
"ready" : false,
"player" : "513f509f8ead0e8bac00000b" },
{ "_id" : ObjectId( "513f50af8ead0ecdac000015" ),
"ready" :false,
"player" : "513f509f8ead0e8bac00000b" }
Am I saving this in an incorrect way? Let me know if you need more code.

This seemed to do the trick.
$state = new PlayerState();
$state->setReady(false);
$state->setPlayer($player->getId());
$dm->persist($state);
$dm->flush();
$game->addPlayerState($state);
// Save element.
$dm->flush();
This is hard to explain, but I will give it a try.
You need to persist the embedded document first. Otherwise Doctrine first persists the parent document, creating an embedded document that holds only the values you set, as if it were a simple data container:
$state->setReady(false);
$state->setPlayer($player->getId());
Afterwards, Doctrine persists the embedded document once again, this time treating it as a document object and assigning IDs, default values, and so on.
The result is two entries.

Related

MongoDB: strategies for resolving a race condition

We have a scenario where we need to store multiple feeds under a site model as following:
{
id: site_id
name: site_name
feeds: [
{
url: feed_url_1
date: feed_update_date_1
},
{
url: feed_url_2
date: feed_update_date_2
},
...
]
}
Since feeds is an array, we can update it with $set, $push or $addToSet.
Two different race conditions (write skew) may occur when our concurrent application (queue workers) tries to update the same site model.
If we pick $set and guard against duplicates on the client side, then when two queues write to the same site, one feed may be lost with the following sequence.
Given a WordPress site, extract 2 feeds (RSS and ATOM), dispatch to Q1 and Q2.
Q1: load existing feed, check RSS feed is new
Q2: load existing feed, check ATOM feed is new
Q1: $set feeds => [RSS]
Q2: $set feeds => [ATOM]
Now the RSS feed is lost.
If we pick $push or $addToSet, then the following may happen.
User A added a site, putting RSS feed to Q1
User B added the same site, putting the same RSS feed to Q2
Q1: load existing feed, check RSS feed is new
Q2: load existing feed, check RSS feed is new
Q1: $push RSS
Q2: $push RSS
Now the RSS feed has been duplicated.
If our data model were simply { url }, then $addToSet would safeguard against duplicate feeds. Unfortunately that is not the case: the date attribute may differ, so $addToSet is not much safer than $push.
We have thought of a few possible workarounds for this problem, but none are great given our tight schedule.
Decouple feeds from site into their own collection, safeguard with url alone, and change our model and repository accordingly.
Insert a partial { url } into the site model first, then update it with additional information. This should make $addToSet usable, but may break other queues that require date to always be present (testing needed).
Let the race condition happen as-is: $push the feed first, then use a background queue to detect duplicates and remove them later.
(There might be a 4th solution if upsert worked with positional queries, but as far as I know MongoDB v2.4 doesn't have that yet.)
So I wonder whether there are better alternatives for resolving this kind of race condition, or whether there are some best practices for it.

You might want to have a look at TokuMX, a fork of MongoDB which supports transactions (besides a few other useful things).

You can use a guard on the update selector:
alice(mongod-2.4.8) test> db.foo.save({_id: 12 })
Updated 1 new record(s) in 1ms
alice(mongod-2.4.8) test> db.foo.update({ _id: 12, "feeds.url" : {$ne: "baz"} },
{ $push : { feeds : { url: "baz" } } } )
Updated 1 existing record(s) in 1ms
alice(mongod-2.4.8) test> db.foo.update({ _id: 12, "feeds.url" : {$ne: "baz"} },
{ $push : { feeds : { url: "baz" } } } )
Updated 0 record(s) in 1ms
alice(mongod-2.4.8) test> db.foo.find({_id: 12 })
{
"_id": 12,
"feeds": [
{
"url": "baz"
}
]
}
Fetched 1 record(s) in 1ms -- Index[_id_]
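To make the reasoning behind the guard explicit, here is a minimal in-memory sketch in plain JavaScript (no driver involved). The `guardedPush` helper and the `date` values are hypothetical, made up for illustration. It only mimics what the selector `{ "feeds.url": { $ne: ... } }` combined with `$push` does to one document. On a real server the match and the modification of a single document happen atomically, which is what closes the race between the two queues.

```javascript
// In-memory stand-in for one MongoDB document; this is NOT a driver call.
// guardedPush is a hypothetical helper that mirrors the guarded update above.
function guardedPush(doc, feed) {
  const feeds = doc.feeds || [];
  // Guard: mirrors the selector { "feeds.url": { $ne: feed.url } }.
  // A document with no feeds array also passes the guard.
  const alreadyPresent = feeds.some((f) => f.url === feed.url);
  if (alreadyPresent) {
    return 0; // selector matches nothing, so nothing is modified
  }
  // Mirrors { $push: { feeds: feed } }; the array is created if missing.
  doc.feeds = feeds.concat([feed]);
  return 1; // one document modified
}

const site = { _id: 12 };
// Same url, different (made-up) dates: the case where $addToSet would duplicate.
const first = guardedPush(site, { url: "baz", date: "2014-01-01" });
const second = guardedPush(site, { url: "baz", date: "2014-01-02" });
console.log(first, second, site.feeds.length);
```

Because the guard looks at url only, the differing date no longer matters. The queue that loses the race simply sees zero modified documents and can decide whether to update the existing entry instead.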

Stored function used in insert through PHP runs multiple times

I'm trying to understand a strange behavior using PHP with mongodb 2.4.3 win32.
I try to have server side generated sequence ids.
When inserting documents using a stored function as one of the parameters it seems that the stored function is called several times at each insertion.
Let's say I have a counter initialized like this:
db.counters.insert( { _id: "uqid", seq: NumberLong(0) } );
I have a stored function named getUqid which is defined as
db.system.js.save(
{ _id: "getUqid",
value: function () {
var ret = db.counters.findAndModify(
{ query: { _id: "uqid" },
update: { $inc: { seq: NumberLong(1) } },
new: true
} );
return ret.seq;
}
} );
When I do three insertions like this:
$conn->test->ads->insert(['qid' => new MongoCode('getUqid()') , 'name' => "Sarah C."]);
I get something like that:
db.ads.find()
{ "_id" : ObjectId("51a34f8bf0774cac03000000"), "qid" : 17, "name" : "Sarah C." }
{ "_id" : ObjectId("51a34f8bf0774cac03000001"), "qid" : 20, "name" : "Michel D." }
{ "_id" : ObjectId("51a34f8bf0774cac03000002"), "qid" : 23, "name" : "Robert U." }
Any clue why qid is being stepped by 3? It should mean that my stored function was called three times, right?
Thanks in advance for your help. Regards.
PS: secondary question: is NumberLong still required to be sure we have a 64-bit unsigned integer in MongoDB's internal storage? Is there a command to cross-check that in the shell?

Cross-referencing this question with PHP-841. From the PHP side of things, you're actually storing a BSON code value in the qid field. You can likely verify that when fetching results back from the database or doing a database export with the mongodump command.
The issue is with the JS shell wrongfully evaluating the code type upon display, and that's the point where findAndModify is executed. This fix should be included in a subsequent server release.
In the meantime, Sammaye's suggestion to call findAndModify from PHP is the best option for this sort of functionality. Coincidentally, it is also what is done in Doctrine MongoDB ODM (see: IncrementGenerator). It does require an additional round trip to the server, but that is necessary since MongoDB has no facility for executing JS callbacks during a write operation.
If minimizing the round-trips to MongoDB is of utmost importance, you could insert the documents by executing server-side JS through PHP with MongoDB::execute() and do something like returning the generated ID(s) as the command response. Of course, that's generally not advisable and JS evaluation has its own caveats.

php mongodb count nested attributes

I have the following document structure:
"messages": {
"_id" : ObjectId("515a4de9c1a3c09c19000001"),
"author" : "50fd0d38c1a3c04c27000000",
"recipient" : "5159a292c1a3c01d5b000005",
"conversation" : [
{
"author" : "50fd0d38c1a3c04c27000000",
"date" : ISODate("2013-04-02T03:18:01.204Z"),
"message" : "hello test",
"read" : false
},
{
"message" : "reply test",
"date" : ISODate("2013-04-02T03:36:57.444Z"),
"author" : "5159a292c1a3c01d5b000005",
"read" : true
}....
Is it possible to get the total number of conversation messages where conversation.read is false and the author (conversation.author) of the unread message is not the person who started the conversation?
I'm currently finding documents that have conversation.read as false and then looping through them in PHP to check the conversation.author field. This works, but I'm afraid it will slow down once I have lots of data.
I'm using the PHP MongoDB driver.
I have tried this:
$unread_conversations = $db->messages->find(
array(
'conversation.read'=>false,
'conversation.author'=>array('$ne'=>$_SESSION['account']['_id']->__toString())
));
but that's not working, because I want only the messages that are both unread and not written by that author.

This is not possible with regular MongoDB queries. You have to use map-reduce or MongoDB's built-in aggregation framework.
To accomplish your goal with the current schema, you have to unwind the nested list first.
Another solution is to change your schema and store each conversation message as a separate document (with a parent field that contains the id of the parent conversation).
And just as a side note: don't store id fields as strings; always use ObjectId to store MongoDB ids.
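As a rough sketch of the unwind approach, the stages could look like the `pipeline` below. `$unwind`, `$match` and `$group` are standard aggregation operators; the `me` value and the `countUnread` helper are assumptions for illustration only. The helper is a plain JavaScript, in-memory equivalent of the pipeline, so the logic can be checked without a server. It is not a driver call.

```javascript
// Hypothetical id of the current user (the value held in $_SESSION in the question).
const me = "5159a292c1a3c01d5b000005";

// Stages as they could be passed to an aggregate call on the messages collection.
const pipeline = [
  { $unwind: "$conversation" }, // one output document per conversation entry
  { $match: { "conversation.read": false, "conversation.author": { $ne: me } } },
  { $group: { _id: null, unread: { $sum: 1 } } }, // count the surviving entries
];

// In-memory equivalent of the pipeline above, for illustration only.
function countUnread(docs, userId) {
  let unread = 0;
  for (const doc of docs) {
    // Like $unwind, documents without a conversation array contribute nothing.
    for (const entry of doc.conversation || []) {
      // Both conditions are tested on the SAME array element.
      if (entry.read === false && entry.author !== userId) {
        unread += 1;
      }
    }
  }
  return unread;
}

// Sample data shaped like the document in the question.
const docs = [
  {
    author: "50fd0d38c1a3c04c27000000",
    recipient: "5159a292c1a3c01d5b000005",
    conversation: [
      { author: "50fd0d38c1a3c04c27000000", message: "hello test", read: false },
      { author: "5159a292c1a3c01d5b000005", message: "reply test", read: true },
    ],
  },
];
const unreadForMe = countUnread(docs, me);
console.log(unreadForMe);
```

The key point is that after unwinding, both conditions apply to the same conversation entry. In the original find(), each dotted condition could be satisfied by a different array element.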

How do you update a mongodb document while replacing the entire document?

If I have a document with two values:
{
"name" : "Bob",
"location" : "France"
}
and then pass it an array that contains "name" and "country", how do I ensure that the entire document is updated, with "location" removed?
I will not be aware of the differences between the two documents, so I won't be able to unset a specific field. I am looking for a method to simply replace the document data with the array supplied.

If you pass update a new document containing the fields you want (rather than update operators such as $set), you will replace the document.
For example:
db.so.insert({"name":"Bob", "location": "France"})
db.so.update({"name":"Bob", "location": "France"}, {"name":"Roberto", "country":"Italy"})
db.so.find()
{ "_id" : ObjectId(), "name" : "Roberto", "country" : "Italy" }
Of course, if you have the _id of the document, that is a better way of specifying the update (rather than passing back the document you just inserted, as above).
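To illustrate the difference between replacing a document and modifying fields in place, here is a deliberately simplified in-memory model in plain JavaScript. `applyUpdate` is a hypothetical helper, not part of any driver, and it models only two cases: an update document without operators (full replacement) and `$set`.

```javascript
// Simplified, hypothetical model of update semantics; for illustration only.
function applyUpdate(doc, update) {
  const usesOperators = Object.keys(update).some((key) => key.startsWith("$"));
  if (!usesOperators) {
    // Full replacement: every old field is dropped, only _id survives.
    return Object.assign({ _id: doc._id }, update);
  }
  // Only $set is modelled here: named fields change, the rest is kept.
  return Object.assign({}, doc, update.$set || {});
}

const original = { _id: 1, name: "Bob", location: "France" };
const replaced = applyUpdate(original, { name: "Roberto", country: "Italy" });
const modified = applyUpdate(original, { $set: { name: "Roberto", country: "Italy" } });
console.log(replaced); // no "location" field any more
console.log(modified); // "location" is still present
```

So when the new field set is not known in advance, passing the whole array as the update document is what removes the stale "location" field. With $set it would linger.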

Drupal - How to update a CCK NodeReference field programmatically?

I'm trying to create a node (type B) and assign it to a type A node's CCK nodereference field using the node_save() method.
$node_type_A = node_load($some_nid);
$node_type_A->field_type_B_node_ref[]['nid'] = $node_type_B_nid;
$node_type_A = node_submit($node_type_A);
node_save($node_type_A);
As a result, a new type B node is created, but no reference is assigned to the type A node. Any help would be appreciated.

GApple is right, the format is correct, but there are a couple of things you might want to take care of.
Delta Value
First you need to know the delta value of the latest node reference attached to $node_type_A. The delta is actually a partial index: combined with the vid field of $node_type_A, it forms the index of the node reference table in the database. In other words, it counts the $node_type_B nodes referenced in $node_type_A.
GApple is right again: you have to say exactly where to add the new reference. Once you have that delta value, you can say exactly where to append the new reference (delta+1). Here it is:
function get_current_delta($node_vid){
return db_result(db_query("SELECT delta FROM {content_field_type_A_node_ref}
WHERE vid = '%d'
ORDER BY delta DESC
LIMIT 1", $node_vid));
}
Adding the new reference
We've got the delta! So we can attach the new $node_type_B node to our $node_type_A node:
// Loading type_A node.
$node_type_A = node_load($some_nid);
// Getting current delta value.
$current_delta = get_current_delta($node_type_A->vid);
// "Appending" a node reference based on delta.
$node_type_A->field_type_B_node_ref += array($current_delta + 1 => array('nid' => $node_type_B_nid));
Resaving the updated node
Optionally call node_submit() to populate some essential fields in the node object, then save it with node_save(). Finally, you need to call content_insert() so that the node is saved completely, along with its CCK fields:
// Resaving the updated node.
$node_type_A = node_submit($node_type_A);
node_save($node_type_A);
content_insert($node_type_A);
Flushing the content cache
Probably the most important part; this was killin' me for a couple of days. CCK has a cache table in the database called cache_content (take a look at its structure). After resaving the updated node, you will notice that nothing has changed in the $node_type_A theme output, even though the tables are updated. We have to remove a record from that content cache table, which forces Drupal to show the latest snapshot of the data. You can define the following as a function:
db_query("DELETE FROM {cache_content} WHERE cid = '%s'", 'content:' . $node_type_A->nid . ':' . $node_type_A->vid);
Hope it helps ;)

I just checked one of my own modules that does something similar for the object format, and $node_type_A->field_type_B_node_ref[]['nid'] should be correct.
One thing to check for is that when you load the node, CCK may pre-populate the node reference array with an empty value. If you have configured the field to only allow one value, by using the array append operator (field_type_B_node_ref[]) it will create a second entry that will be ignored (field_type_B_node_ref[1]), instead of overwriting the existing value (field_type_B_node_ref[0]). Try explicitly specifying the array key if possible.

Great post, but one correction: don't flush cache entries by manually querying the DB. In the event someone is using memcache or any other external cache it's going to fail.
cache_clear_all() is your friend for clearing.
Suggested code, direct from the CCK module:
cache_clear_all('content:'. $node_type_A->nid .':'. $node_type_A->vid, content_cache_tablename());

I see CCK storing node references as $node->field_node_reference[0]['items'][0]['nid'], not $node->field_node_reference[0]['nid']. Have you tried mimicking that?

"Flushing the content cache": this works for me, especially if you get the data from node_load().
