MongoDB: strategies for resolving a race condition

We have a scenario where we need to store multiple feeds under a site model, as follows:
{
  id: site_id
  name: site_name
  feeds: [
    {
      url: feed_url_1
      date: feed_update_date_1
    },
    {
      url: feed_url_2
      date: feed_update_date_2
    },
    ...
  ]
}
Since feeds is an array, we can update it with $set, $push or $addToSet.
Two different race conditions (write skew) may occur when our concurrent workers (queues) try to update the same site model.
If we pick $set and guard against duplicates on the client side, then when two queues write to the same site, one feed may be lost in the following sequence.
Given a wordpress site, extract 2 feeds (RSS and ATOM), dispatch to Q1 and Q2.
Q1: load existing feed, check RSS feed is new
Q2: load existing feed, check ATOM feed is new
Q1: $set feeds => [RSS]
Q2: $set feeds => [ATOM]
Now the RSS feed is lost.
If we pick $push or $addToSet, then the following may happen.
User A added a site, putting RSS feed to Q1
User B added the same site, putting the same RSS feed to Q2
Q1: load existing feed, check RSS feed is new
Q2: load existing feed, check RSS feed is new
Q1: $push RSS
Q2: $push RSS
Now the RSS feed has been duplicated.
If our data model were simply { url }, then $addToSet would safeguard against duplicate feeds. Unfortunately that is not the case: the date attribute may differ, so $addToSet is not much safer than $push.
We have thought of a few possible workarounds for this problem, but none are great given our tight schedule.
1. Decouple feeds from site into their own collection, safeguard with url alone, and change our model and repository accordingly.
2. Insert a partial { url } into the site model first, then update it with the additional information. This should make $addToSet usable, but may break other queues that require date to always be present (testing needed).
3. Let the race condition happen as-is: $push the feed first, then use a background queue to detect duplicates and remove them later.
(There might be a 4th solution if upsert worked with the positional operator, but as far as I know MongoDB v2.4 doesn't support that yet.)
So I wonder whether there are better alternatives for resolving this kind of race condition, or whether there are best practices for it.

You might want to have a look at TokuMX, a fork of MongoDB which supports transactions (besides a few other useful things).

You can use a guard on the update selector:
alice(mongod-2.4.8) test> db.foo.save({_id: 12 })
Updated 1 new record(s) in 1ms
alice(mongod-2.4.8) test> db.foo.update({ _id: 12, "feeds.url" : {$ne: "baz"} },
{ $push : { feeds : { url: "baz" } } } )
Updated 1 existing record(s) in 1ms
alice(mongod-2.4.8) test> db.foo.update({ _id: 12, "feeds.url" : {$ne: "baz"} },
{ $push : { feeds : { url: "baz" } } } )
Updated 0 record(s) in 1ms
alice(mongod-2.4.8) test> db.foo.find({_id: 12 })
{
"_id": 12,
"feeds": [
{
"url": "baz"
}
]
}
Fetched 1 record(s) in 1ms -- Index[_id_]
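From PHP, the same guard can be sketched roughly as below. This is only an illustration, not a tested integration: the helper names buildGuardedPush and buildDateRefresh are made up here, and the second helper (refreshing the date of an already stored feed through the positional operator) is an assumption about what you want to happen when the guarded push matches nothing. The arrays would be handed to whatever update call your driver offers (for instance MongoCollection::update in the legacy driver, or updateOne in the newer library).

```php
<?php
// Sketch only: builds the filter/update pairs; sending them is left to the driver.

/**
 * Guarded $push: the filter matches the site only while no feed with this
 * URL exists, so a second concurrent writer matches 0 documents instead of
 * appending a duplicate.
 */
function buildGuardedPush($siteId, array $feed): array
{
    $filter = ['_id' => $siteId, 'feeds.url' => ['$ne' => $feed['url']]];
    $update = ['$push' => ['feeds' => $feed]];
    return [$filter, $update];
}

/**
 * Hypothetical fallback for when the guarded push matched nothing: the URL
 * is already stored, so update its date in place via the positional operator.
 */
function buildDateRefresh($siteId, array $feed): array
{
    $filter = ['_id' => $siteId, 'feeds.url' => $feed['url']];
    $update = ['$set' => ['feeds.$.date' => $feed['date']]];
    return [$filter, $update];
}

// Example values are invented for illustration.
$feed = ['url' => 'baz', 'date' => '2013-11-20'];
list($pushFilter, $pushUpdate) = buildGuardedPush(12, $feed);
list($dateFilter, $dateUpdate) = buildDateRefresh(12, $feed);

echo json_encode($pushFilter), "\n";
echo json_encode($pushUpdate), "\n";
```

Because each update is a single atomic operation on one document, the check and the write cannot be interleaved by another queue, which is what the client-side "load, check, write" sequence could not guarantee.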

Related

Bulk inserting documents into couchbase database with php - how to?

I am currently experimenting a little with Couchbase Server.
I tried reading a MySQL database table, building a document from each row, and inserting each document with an id generated by
uniqid('table_name');
via cURL, using POST.
This worked well until the script had inserted roughly 7050 documents. Then an exception was thrown: "No buffer space".
So far I have not been able to fix this, so I decided to collect, for example, 50 rows of data, build a json_encode()d string, and POST that via cURL instead.
This works as long as I don't set the id, but I can't figure out how to set the id of the inserted documents.
Currently I try to send my documents in a format like this:
{"docs": {
"_id": {
"geodata_de_54476f7e6adc57.14196038": {
"table": "geodata_de",
"country": "DE",
"postal_code": "01945",
"place_name": "Lindenau",
"state_name": "Brandenburg",
"state_code": "BB",
"province_name": "",
"province_code": "00",
"community_name": "Oberspreewald-Lausitz",
"community_code": "12066",
"lat": "51.4",
"lng": "13.7333",
"Xco": "3861.1",
"Yco": "943.614",
"Zco": "4979.07"
}
}, ...
}
}
but this inserts just ONE document containing the above object.
Maybe someone here can point me in the right direction.

I would use the Couchbase PHP SDK to insert these documents instead of using cURL: http://docs.couchbase.com/developer/php-2.0/storing.html
Also, with Couchbase you do not have to set the ID in the document itself. I would instead take the ID from your example ("geodata_de_54476f7e6adc57.14196038") and use it as the key for the object in Couchbase; then you do not necessarily need the _id field. A key in Couchbase can be up to 250 bytes, and you can make it meaningful to your application so that lookups by key are extremely fast.
Another option, if you write your docs to the filesystem, is the cbdocloader utility, which is made specifically for bulk-loading docs. On Linux it is in /opt/couchbase/bin/tools/cbdocloader.
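To make the "key outside the document" idea concrete, here is a rough sketch of preparing the rows in batches keyed by document ID. The function name buildBatches, the $makeKey callback and the sample rows (apart from the Lindenau one from the question) are invented for illustration; the commented-out upsert call assumes a bucket object from the PHP SDK 2.0 linked above.

```php
<?php
/**
 * Turn rows into batches of [key => document]. $makeKey is a hypothetical
 * callback that derives the Couchbase key for one row.
 */
function buildBatches(array $rows, int $batchSize, callable $makeKey): array
{
    $docs = [];
    foreach ($rows as $row) {
        $docs[$makeKey($row)] = $row; // the key lives outside the document body
    }
    // preserve_keys = true keeps the string keys inside each chunk
    return array_chunk($docs, $batchSize, true);
}

// Sample rows; only the first one comes from the question.
$rows = [
    ['table' => 'geodata_de', 'postal_code' => '01945', 'place_name' => 'Lindenau'],
    ['table' => 'geodata_de', 'postal_code' => '00001', 'place_name' => 'ExampleA'],
    ['table' => 'geodata_de', 'postal_code' => '00002', 'place_name' => 'ExampleB'],
];

$makeKey = function (array $row): string {
    return $row['table'] . '_' . $row['postal_code'] . '_' . $row['place_name'];
};

$batches = buildBatches($rows, 2, $makeKey);

foreach ($batches as $batch) {
    foreach ($batch as $key => $doc) {
        // Assumption: $bucket is an opened SDK 2.0 bucket.
        // $bucket->upsert($key, $doc);
        echo $key, "\n";
    }
}
```

A deterministic key like this one also makes re-running the import idempotent, which uniqid() would not.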

Instagram API to fetch all photos in PHP

I'm using this API method:
public function getUserMedia($id = 'self', $limit = 0) {
return $this->_makeCall('users/'.$id.'/media/recent', true, array('count' => $limit));
}
to fetch the photos of a user logged in to my site using PHP.
It works, and all the other API calls work.
The problem is that I want to retrieve ALL the photos of a user (as printstagr.am does).
I've searched the API docs without success: it seems you can only get the recent or the popular media, but the site mentioned above gets everything. Any ideas?
Thanks!

I'm not sure what the max count is (someone mentioned 20 as the max on another question, but I can't find any limit in the docs from a quick scan), but essentially you have to request as many as possible, then follow the pagination links to collect more.
From the API docs, they provide this:
{
...
"pagination": {
"next_url": "https://api.instagram.com/v1/tags/puppy/media/recent?access_token=fb2e77d.47a0479900504cb3ab4a1f626d174d2d&max_id=13872296",
"next_max_id": "13872296"
}
}
Your application needs to store the objects from the request (e.g. in an array), then fire a new request to the "next_url", put those objects into the same store, then follow the link again, until you reach the end or have enough to satisfy your needs.
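A minimal sketch of that loop could look like the following. The function name fetchAllMedia and the $fetchPage callable are made up for this example, and the assumption that the media objects sit under a "data" key is mine (the snippet above elides that part of the response). In a real application $fetchPage would perform the HTTP request and json_decode the body; here it is replaced by canned pages so the sketch runs offline.

```php
<?php
/**
 * Follow pagination.next_url until it disappears or $maxItems is reached.
 * $fetchPage is a hypothetical callable: URL in, decoded response array out.
 */
function fetchAllMedia(string $firstUrl, callable $fetchPage, int $maxItems = PHP_INT_MAX): array
{
    $items = [];
    $url = $firstUrl;
    while ($url !== null && count($items) < $maxItems) {
        $page = $fetchPage($url);
        // Assumption: the media objects are listed under the 'data' key.
        foreach ($page['data'] ?? [] as $item) {
            $items[] = $item;
        }
        // No next_url means the last page has been reached.
        $url = $page['pagination']['next_url'] ?? null;
    }
    return array_slice($items, 0, $maxItems);
}

// Stand-in fetcher with two canned pages (invented data).
$pages = [
    'page1' => ['data' => ['a', 'b'], 'pagination' => ['next_url' => 'page2']],
    'page2' => ['data' => ['c'], 'pagination' => []],
];
$fetch = function (string $url) use ($pages): array {
    return $pages[$url];
};

$all = fetchAllMedia('page1', $fetch);
echo count($all), " items collected\n";
```

Passing a $maxItems limit is worth considering for accounts with many photos, since each page costs one request.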

Stored function used in insert through PHP runs multiple times

I'm trying to understand a strange behavior using PHP with MongoDB 2.4.3 win32.
I am trying to generate sequence ids on the server side.
When inserting documents using a stored function as one of the parameters, the stored function seems to be called several times for each insertion.
Let's say I have a counter initialized like this:
db.counters.insert( { _id: "uqid", seq: NumberLong(0) } );
I have a stored function named getUqid which is defined as
db.system.js.save(
  { _id: "getUqid",
    value: function () {
      var ret = db.counters.findAndModify(
        { query: { _id: "uqid" },
          update: { $inc: { seq: NumberLong(1) } },
          new: true
        } );
      return ret.seq;
    }
  } );
When I do three insertions like this:
$conn->test->ads->insert(['qid' => new MongoCode('getUqid()') , 'name' => "Sarah C."]);
I get something like this:
db.ads.find()
{ "_id" : ObjectId("51a34f8bf0774cac03000000"), "qid" : 17, "name" : "Sarah C." }
{ "_id" : ObjectId("51a34f8bf0774cac03000001"), "qid" : 20, "name" : "Michel D." }
{ "_id" : ObjectId("51a34f8bf0774cac03000002"), "qid" : 23, "name" : "Robert U." }
Any clue why qid is stepping by 3? Does that mean my stored function was called three times?
Thanks in advance for your help. Regards.
PS: secondary question: is NumberLong still required to make sure we have a 64-bit integer in MongoDB's internal storage? Is there a command to cross-check that in the shell?

Cross-referencing this question with PHP-841. From the PHP side of things, you're actually storing a BSON code value in the qid field. You can likely verify that when fetching results back from the database or doing a database export with the mongodump command.
The issue is with the JS shell wrongfully evaluating the code type upon display, and that's the point where findAndModify is executed. This fix should be included in a subsequent server release.
In the meantime, Sammaye's suggestion to call findAndModify from PHP is the best option for this sort of functionality. Coincidentally, it is also what is done in Doctrine MongoDB ODM (see: IncrementGenerator). It does require an additional round trip to the server, but that is necessary since MongoDB has no facility for executing JS callbacks during a write operation.
If minimizing the round-trips to MongoDB is of utmost importance, you could insert the documents by executing server-side JS through PHP with MongoDB::execute() and do something like returning the generated ID(s) as the command response. Of course, that's generally not advisable and JS evaluation has its own caveats.
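For the findAndModify-from-PHP route, a rough sketch is below. The helper name nextSequence is made up, and the FakeCounters class is purely an in-memory stand-in so the example runs without a server; with the legacy driver you would pass the real counters collection (e.g. $conn->test->counters), whose findAndModify method takes the same four arguments.

```php
<?php
/**
 * Ask the counters collection for the next value of a named sequence.
 * $counters only needs a findAndModify($query, $update, $fields, $options)
 * method, as the legacy MongoCollection class provides.
 */
function nextSequence($counters, string $name): int
{
    $doc = $counters->findAndModify(
        ['_id' => $name],
        ['$inc' => ['seq' => 1]],
        null,
        ['new' => true] // return the document as it is after the increment
    );
    return (int) $doc['seq'];
}

// Hypothetical in-memory stand-in; it mimics only what this sketch needs.
class FakeCounters
{
    private $docs = ['uqid' => ['_id' => 'uqid', 'seq' => 0]];

    public function findAndModify(array $query, array $update, $fields = null, array $options = [])
    {
        $id = $query['_id'];
        $this->docs[$id]['seq'] += $update['$inc']['seq'];
        return $this->docs[$id];
    }
}

$counters = new FakeCounters();
$ad = ['qid' => nextSequence($counters, 'uqid'), 'name' => 'Sarah C.'];
// With the real driver the document would then be inserted, e.g.:
// $conn->test->ads->insert($ad);
echo json_encode($ad), "\n";
```

This way qid is stored as a plain integer rather than a BSON code value, so nothing is re-evaluated when the shell displays the document.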

Doctrine is making duplicated document entries

I am trying to insert an embedded document into my document with the following code.
// Add states, for the joining player.
$state = new PlayerState();
$state->setReady(false);
$state->setPlayer($player->getId());
$game->addPlayerState($state);
// Save element.
$dm->persist($game);
$dm->flush();
The problem is that this generates 2 PlayerState documents, like this:
{ "_id" : ObjectId( "513f50a58ead0ee9ac00000f" ),
"ready" : false,
"player" : "513f509f8ead0e8bac00000b" },
{ "_id" : ObjectId( "513f50af8ead0ecdac000015" ),
"ready" :false,
"player" : "513f509f8ead0e8bac00000b" }
Am I saving this incorrectly? Let me know if you need more code.

This seemed to do the trick.
$state = new PlayerState();
$state->setReady(false);
$state->setPlayer($player->getId());
$dm->persist($state);
$dm->flush();
$game->addPlayerState($state);
// Save element.
$dm->flush();
This is hard to explain, but I will give it a try.
You need to persist the embedded document first; otherwise Doctrine will first persist the Document, creating an embedded doc with only the values set, acting like a simple data container:
$state->setReady(false);
$state->setPlayer($player->getId());
Afterwards, Doctrine will persist the embedded document once again, but this time looking at the Document object, assigning IDs, default values, etc.
This results in 2 entries.

Is there PHP treeview with data from database?

I want to know if there is a PHP treeview that uses data from MySQL. I haven't found a suitable one for my project. Do you know of any plugins or code samples?
Thanks a lot.
Edit:
jQuery Treeview's asynchronous example
I found that it can work, but I don't know how to get the source.php. Do you have any ideas or other suggestions?

You would need to run the query yourself, but it's pretty easy. The output the tree expects is an array of objects in JSON format, like the example below.
Your table structure could be:
tree_node (id, title, parent_id)
You would select the root node, then its children, recursively until the tree is complete.
function expandTree($node)
{
    $result = array('text' => $node['title'], 'children' => array());
    // getChildren() is left to you: query all nodes whose parent_id = $node['id']
    $children = getChildren($node);
    foreach ($children as $child) {
        $result['children'][] = expandTree($child);
    }
    return $result;
}
output format:
[
{
"text": "1. Pre Lunch (120 min)",
"expanded": true,
"classes": "important",
"children":
[
{
"text": "1.1 The State of the Powerdome (30 min)"
},
{
"text": "1.2 The Future of jQuery (30 min)"
},
{
"text": "1.2 jQuery UI - A step to richnessy (60 min)"
}
]
},
{
"text": "2. Lunch (60 min)"
},
[...]

Assuming you have a DB with parents and children, have a look at
http://www.ideashower.com/our_solutions/create-a-parent-child-array-structure-in-one-pass/ and http://www.phpriot.com/articles/nested-trees-1
Once you have your data correctly sorted, you can then look at rendering it.
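As a rough illustration of building the structure in memory from a single query result (rather than one query per node), here is a sketch that is not taken from either linked article. It assumes rows shaped like tree_node (id, title, parent_id), that root rows have parent_id = null, and that every referenced parent is present in the result; the function name buildTree and the sample rows are made up.

```php
<?php
/**
 * Build a nested tree from flat rows (id, title, parent_id).
 * Assumes roots have parent_id === null and all parents appear in $rows.
 */
function buildTree(array $rows): array
{
    // First loop: create one node per row, indexed by id.
    $nodes = [];
    foreach ($rows as $row) {
        $nodes[$row['id']] = ['text' => $row['title'], 'children' => []];
    }

    // Second loop: link nodes by reference so that children attached later
    // are still visible through their already linked parents.
    $roots = [];
    foreach ($rows as $row) {
        if ($row['parent_id'] === null) {
            $roots[] = &$nodes[$row['id']];
        } else {
            $nodes[$row['parent_id']]['children'][] = &$nodes[$row['id']];
        }
    }
    return $roots;
}

// Invented sample rows, as they might come back from one SELECT.
$rows = [
    ['id' => 1, 'title' => '1. Pre Lunch', 'parent_id' => null],
    ['id' => 2, 'title' => '1.1 First talk', 'parent_id' => 1],
    ['id' => 3, 'title' => '2. Lunch', 'parent_id' => null],
    ['id' => 4, 'title' => '1.1.1 Questions', 'parent_id' => 2],
];

$tree = buildTree($rows);
echo json_encode($tree), "\n";
```

The result can be passed to json_encode and served to the tree widget; the order of rows does not matter because all nodes exist before any linking happens.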

A treeview is a classic way to present a large amount of data with parent-child relationships. Its major advantage is that it shows more data in less space. Suppose you have a global recruitment portal and want to display job opportunities by country and city; a treeview lets you display countries and their related cities easily. One way to build a PHP treeview from a MySQL database is to have PHP bind the data to HTML ol and li elements on the front end, then apply CSS to provide the expand and collapse effects.
