Duplicate value inserted in the mongodb "_id" field - php

In MongoDB, we can assign our own value to the _id field, and "the _id field value may be of any type, other than arrays, so long as it is unique" -- from the docs.
But in my live database, I can see that some records were duplicated, as follows:
db.memberrecords.find().limit(2).forEach(printjson)
{
    "_id" : "999783",
    "Memberid" : "999783",
    "Name" : "ASHEESH SHARMA",
    "Gender" : "M",
}
{
    "_id" : "999783",
    "Memberid" : "999783",
    "Name" : "Sanwal Singh Meena",
    "Gender" : "M",
}
In the records above, the same _id value was inserted twice into the collection. When I tested with a local database, it did not allow inserting a document with an existing _id and threw the following error:
E11000 duplicate key error index: mongoms.memberrecords.$_id_ dup key: { : "999783" }
Below are the indexes on my live memberrecords collection (for your reference):
db.memberrecords.getIndexes()
[
    {
        "name" : "_id_",
        "ns" : "mongoms.memberrecords",
        "key" : {
            "_id" : 1
        },
        "v" : 0
    },
    {
        "_id" : ObjectId("4f0bcdf2b1513267f4ac227c"),
        "ns" : "mongoms.memberrecords",
        "key" : {
            "Memberid" : 1
        },
        "name" : "Memberid_1",
        "unique" : true,
        "v" : 0
    }
]
Note: this collection is sharded across two shards.
Any suggestions on this, please?

Is your shard key the _id field? You can only have one unique index enforced across a cluster: the shard key (otherwise the server would have to check with every shard on every insert).
So: on a single shard, _id will be unique. However, if it isn't your shard key, all bets are off across multiple shards.
See http://www.mongodb.org/display/DOCS/Sharding+Limits#ShardingLimits-UniqueIndexesDOCS%3AIndexes%23UniqueIndexes.
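In practical terms: if _id must be unique across the whole cluster, make _id itself the shard key. A minimal sketch with the mongodb/mongodb PHP library (the mongos address is an assumption, and the collection would have to be re-created, since a shard key cannot be changed after the fact):
$client = new MongoDB\Client('mongodb://mongos-host:27017'); // router address assumed
$admin  = $client->selectDatabase('admin');

// Shard the collection on _id so its uniqueness is enforced
// cluster-wide, not just per shard.
$admin->command(['enableSharding' => 'mongoms']);
$admin->command([
    'shardCollection' => 'mongoms.memberrecords',
    'key'             => ['_id' => 1], // the shard key is the one cluster-wide unique index
]);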

Related

ISODate format in mongodb version

I restored a MongoDB server from version 4.2.3 to version 4.2.7, and when saving data to the database again I ran into a problem with ISODate values, as below:
{ "_id" : ObjectId("5ed4b193ed6fab6d2272c5c4"), "id" : 1, "timestamp" : ISODate("2020-05-31T05:59:59Z") } #new data run after change db (it must disappear for unique)
{ "_id" : ObjectId("5ed33bef1e499012bf35e412"), "id" : 1, "timestamp" : ISODate("2020-05-31T04:59:59.999Z") } #old data
{ "_id" : ObjectId("5ed4b193ed6fab6d2272c5c3"), "id" : 1, "timestamp" : ISODate("2020-05-31T04:59:59Z") } #new data run after change db (it must disappear for unique)
{ "_id" : ObjectId("5ed32de165269b416f6c7362"), "id" : 1, "timestamp" : ISODate("2020-05-31T03:59:59.999Z") } #old data
{ "_id" : ObjectId("5ed4b193ed6fab6d2272c5c2"), "id" : 1, "timestamp" : ISODate("2020-05-31T03:59:59Z") } #new data run after change db (it must disappear for unique)
{ "_id" : ObjectId("5ed31fcff2a5076cc947bc02"), "id" : 1, "timestamp" : ISODate("2020-05-31T02:59:59.999Z") } #old data
{ "_id" : ObjectId("5ed311bfb0d88300f81e90d2"), "id" : 1, "timestamp" : ISODate("2020-05-31T01:59:59.999Z") } #old data
I have a unique index on id and timestamp, but because the timestamps carry microseconds, the values do not match exactly. Please give me a solution to keep microseconds in an ISODate.
PS: my code did not change. I use PHP and always format dates with 'Y-m-d\TH:i:s.uP'
MongoDB time resolution is 1 millisecond. Values with more precision will be truncated to millisecond precision.
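A quick sketch of what this means with the PHP driver (the variable names are illustrative): the microsecond part of a PHP date is dropped when it becomes a BSON date, so two timestamps that differ only below the millisecond are stored as the same value.
$dt   = new DateTimeImmutable('2020-05-31 04:59:59.999999');
$bson = new MongoDB\BSON\UTCDateTime($dt); // BSON dates hold milliseconds since the epoch

echo $bson->toDateTime()->format('Y-m-d\TH:i:s.uP');
// 2020-05-31T04:59:59.999000+00:00 -- the trailing microseconds are gone
If you genuinely need microsecond precision, you would have to store it in a separate field (for example as an integer) alongside the BSON date.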

Elasticsearch: Trying to find out appropriate searching query algorithm

I want a result from Elasticsearch where the entered search string is matched against multiple fields of each document, and the results are ordered so that the record matching the search string in the most fields comes first and the record matching in the fewest fields comes last.
For example: suppose I am searching for the keyword "test" and each record in the index has more than 12 fields.
Now, if "test" matches in 10 fields of one record,
in 6 fields of another record,
and in 2 fields of yet another,
I want the listing ordered from the record with the maximum number of matching fields down to the one with the least.
In this example, the first record shown would be the one with 10 fields matching the search string, the second the one with 6, the third the one with 2, and so on.
Any good suggestion or example for this would be appreciated.
This is the default behavior of Elasticsearch: documents with a higher number of matches are scored higher.
Query:
{
    "query": {
        "query_string": {
            "default_field": "*", --> search in all fields
            "query": "test"
        }
    }
}
Result:
{
    "_index" : "index18",
    "_type" : "_doc",
    "_id" : "iSCe6nEB8J88APx3YBGn",
    "_score" : 0.9808291, --> scored higher as two fields match
    "_source" : {
        "field1" : "test",
        "field2" : "test"
    }
},
{
    "_index" : "index18",
    "_type" : "_doc",
    "_id" : "iiCe6nEB8J88APx3ghF-",
    "_score" : 0.4700036,
    "_source" : {
        "field1" : "test",
        "field2" : "abc"
    }
}
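Since this thread is PHP-oriented, here is a sketch of issuing the same query with the official elasticsearch-php client; the host and the client version (7.x-style array responses) are assumptions, and the index name is taken from the example above.
$client = Elasticsearch\ClientBuilder::create()
    ->setHosts(['localhost:9200']) // host assumed
    ->build();

$response = $client->search([
    'index' => 'index18',
    'body'  => [
        'query' => [
            'query_string' => [
                'default_field' => '*', // search in all fields
                'query'         => 'test',
            ],
        ],
    ],
]);

// Hits arrive sorted by _score descending, i.e. the documents with the
// most matching fields come first.
foreach ($response['hits']['hits'] as $hit) {
    printf("%s: %f\n", $hit['_id'], $hit['_score']);
}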

How to project more than one value with doctrine mongodb query builder selectElemMatch

I have the following kind of data in my mongo database. The property "values" consists of an array of attributes. "values" is a property of a product, which also has some other properties like "normalizedData". But the structure of "values" is what gives me a headache.
"values" : [
{
"_id" : ObjectId("5a09d88c83218b814a8df57d"),
"attribute" : NumberLong("118"),
"entity" : DBRef("pim_catalog_product", ObjectId("59148ee283218bb8548b45a8"), "akeneo_pim"),
"locale" : "de_AT",
"varchar" : "LED PAR-56 TCL 9x3W Short sw"
},
{
"_id" : ObjectId("5a09d88c83218b814a8df57a"),
"attribute" : NumberLong("118"),
"entity" : DBRef("pim_catalog_product", ObjectId("59148ee283218bb8548b45a8"), "akeneo_pim"),
"locale" : "de_DE",
"varchar" : "LED PAR-56 TCL 9x3W Short sw"
},
{
"_id" : ObjectId("5a09d88c83218b814a8df57c"),
"attribute" : NumberLong("184"),
"entity" : DBRef("pim_catalog_product", ObjectId("59148ee283218bb8548b45a8"), "akeneo_pim"),
"locale" : "de_AT",
"boolean" : false
},
{
"_id" : ObjectId("5a09d88c83218b814a8df585"),
"attribute" : NumberLong("118"),
"entity" : DBRef("pim_catalog_product", ObjectId("59148ee283218bb8548b45a8"), "akeneo_pim"),
"locale" : "fr_FR",
"varchar" : "LED PAR-56 TCL 9x3W Short sw"
},
{
"_id" : ObjectId("5a09d88c83218b814a8df584"),
"attribute" : NumberLong("121"),
"entity" : DBRef("pim_catalog_product", ObjectId("59148ee283218bb8548b45a8"), "akeneo_pim"),
"locale" : "fr_FR",
"varchar" : "Eurolite LED PAR-56 TCL 9x3W Short sw"
},
{
"_id" : ObjectId("5a09d88c83218b814a8df574"),
"attribute" : NumberLong("207"),
"entity" : DBRef("pim_catalog_product", ObjectId("59148ee283218bb8548b45a8"), "akeneo_pim"),
"varchar" : "51913611"
},
]
A couple of things to notice about this extract from the dataset:
attributes with their ID ("attribute") can appear multiple times, like 118 for example.
attributes do not always have the same subset of properties (see 207 and 121 for example).
if an attribute is present multiple times (like 118) it should differ in the "locale" property at least.
Now I need the doctrine mongoDB query builder to project the following result:
I want only those attributes to be present in the result that contain one of the IDs specified by the query (e.g. array(118, 184)).
If the attribute exists multiple times, I want to see it multiple times.
If the attribute exists multiple times, I want to limit the occurrences by a given array of locales.
So an example query would be: return all attributes inside "values" that have either 118 or 184 as the "attribute" property, and (if specified) limit the results to those attributes where the locale is either "de_DE" or "it_IT".
Here is what I have tried so far:
$qb = $productRepository->createQueryBuilder();
$query = $qb
    ->hydrate(false)
    ->select(array('normalizedData.sku'))
    ->selectElemMatch(
        'values',
        $qb->expr()->field('attribute')->in(array(117, 110))->addAnd(
            $qb->expr()->field('locale')->in(array('it_IT', 'de_DE'))
        )
    )
    ->field('_id')->in($entityIds)
    ->field('values')->elemMatch($qb->expr()->field('attribute')->in(array(117, 110)))
    ->limit($limit)
    ->skip($offset);
This query always returns only one attribute (no matter how many times it is present within the "values" array) per product. What am I doing wrong here?
EDIT: My MongoDB version is 2.4.9 and doctrine-mongo-odm is below 1.2. Currently I cannot update either.
You can try the aggregation query below on MongoDB 3.4. $elemMatch by design returns the first matching element.
You will need $filter to return multiple matches.
The $match stage limits the documents to those where values has at least one element with attribute in [118, 184] and locale in ["de_DE", "it_IT"]; the $filter inside a $project stage then keeps only the matching array elements. You can add $limit and $skip stages at the end of the aggregation pipeline, just as you did with the regular query.
db.col.aggregate([
    {"$match": {
        "values": {
            "$elemMatch": {
                "attribute": {"$in": [118, 184]},
                "locale": {"$in": ["de_DE", "it_IT"]}
            }
        }
    }},
    {"$project": {
        "values": {
            "$filter": {
                "input": "$values",
                "as": "item",
                "cond": {
                    "$and": [
                        {"$in": ["$$item.attribute", [118, 184]]},
                        {"$in": ["$$item.locale", ["de_DE", "it_IT"]]}
                    ]
                }
            }
        }
    }}
])
You can use the AggregationBuilder to write this query in Doctrine.
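If upgrading the ODM far enough to get the AggregationBuilder is not possible, here is a sketch of running the same pipeline directly through the mongodb/mongodb library (the database and collection names are assumptions read off the DBRefs above; $filter needs MongoDB 3.2+ and the $in aggregation operator needs 3.4+):
// Database/collection names assumed from the DBRef values above.
$collection = (new MongoDB\Client)->akeneo_pim->pim_catalog_product;

$attributeIds = [118, 184];
$locales      = ['de_DE', 'it_IT'];

$cursor = $collection->aggregate([
    // Keep only products with at least one matching "values" element.
    ['$match' => [
        'values' => ['$elemMatch' => [
            'attribute' => ['$in' => $attributeIds],
            'locale'    => ['$in' => $locales],
        ]],
    ]],
    // Then trim the "values" array down to the matching elements.
    ['$project' => [
        'values' => ['$filter' => [
            'input' => '$values',
            'as'    => 'item',
            'cond'  => ['$and' => [
                ['$in' => ['$$item.attribute', $attributeIds]],
                ['$in' => ['$$item.locale', $locales]],
            ]],
        ]],
    ]],
]);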

MongoDB Query to find out all the array elements of a collection

I have a pretty big MongoDB document that holds all kinds of data. I need to identify the fields that are of type array in a collection so I can remove them from the displayed fields in the grid that I will populate.
My current method consists of retrieving all the field names in the collection with the following map-reduce job, taken from the answer posted here: MongoDB Get names of all keys in collection
mr = db.runCommand({
    "mapreduce" : "Product",
    "map" : function() {
        for (var key in this) { emit(key, null); }
    },
    "reduce" : function(key, stuff) { return null; },
    "out": "things" + "_keys"
})
db[mr.result].distinct("_id")
And then running, for each of the fields, a query like this one:
db.Product.find( { $where : "Array.isArray(this.Orders)" } ).count()
If anything is retrieved, the field is considered an array.
I don't like that I need to run n+2 queries (n being the number of different fields in my collection), and I wouldn't like to hardcode the fields in the model; that would defeat the whole purpose of using MongoDB.
Is there a better method of doing this?
I made a couple of slight modifications to the code you provided above:
mr = db.runCommand({
    "mapreduce" : "Product",
    "map" : function() {
        for (var key in this) {
            if (Array.isArray(this[key])) {
                emit(key, 1);
            } else {
                emit(key, 0);
            }
        }
    },
    "reduce" : function(key, stuff) { return Array.sum(stuff); },
    "out": "Product" + "_keys"
})
Now, the mapper will emit a 1 for keys that contain arrays, and a 0 for any that do not. The reducer will sum these up, so that when you check your end result:
db[mr.result].find()
You will see your field names with the number of documents in which they contain Array values (and a 0 for any that are never arrays).
So this should give you which fields contain Array types with just the map-reduce job.
--
Just to see it with some data:
db.Product.insert({"a":[1,2,3], "c":[1,2]})
db.Product.insert({"a":1, "b":2})
db.Product.insert({"a":1, "c":[2,3]})
(now run the "mr =" code above)
db[mr.result].find()
{ "_id" : "_id", "value" : 0 }
{ "_id" : "a", "value" : 1 }
{ "_id" : "b", "value" : 0 }
{ "_id" : "c", "value" : 2 }

MapReduce for MongoDB on PHP?

I'm new to MongoDB and this is my first use of MapReduce ever.
I have two collections: Shops and Products with the following schema
Products
{'_id', 'type': 'first', 'enabled': 1, 'shop': $SHOP_ID }
{'_id', 'type': 'second', 'enabled': 0, 'shop': $SHOP_ID }
{'_id', 'type': 'second', 'enabled': 1, 'shop': $SHOP_ID }
And
Shops
{'_id', 'name':'L', ... }
{'_id', 'name':'M', ... }
I'm looking for a GROUP BY-like statement for MongoDB, using MapReduce, to retrieve the Shops with name 'L' that have Products with 'enabled' => 1.
How can I do it? Thank you.
It should be possible to retrieve the desired information without a Map Reduce operation.
You could first query the "Products" collection for documents that match {'enabled': 1}, and then take the list of $SHOP_IDs from that query (which I imagine correspond to the _id values in the "Shops" collection), put them in an array, and perform an $in query on the "Shops" collection, combined with the query on "name".
For example, given the two collections:
> db.products.find()
{ "_id" : 1, "type" : "first", "enabled" : 1, "shop" : 3 }
{ "_id" : 2, "type" : "second", "enabled" : 0, "shop" : 4 }
{ "_id" : 3, "type" : "second", "enabled" : 1, "shop" : 5 }
> db.shops.find()
{ "_id" : 3, "name" : "L" }
{ "_id" : 4, "name" : "L" }
{ "_id" : 5, "name" : "M" }
>
First find all of the documents that match {"enabled" : 1}
> db.products.find({"enabled" : 1})
{ "_id" : 1, "type" : "first", "enabled" : 1, "shop" : 3 }
{ "_id" : 3, "type" : "second", "enabled" : 1, "shop" : 5 }
From the above query, generate a list of _ids:
> var c = db.products.find({"enabled" : 1})
> shop_ids = []
[ ]
> c.forEach(function(doc){shop_ids.push(doc.shop)})
> shop_ids
[ 3, 5 ]
Finally, query the shops collection for documents with _id values in the shop_ids array that also match {name:"L"}.
> db.shops.find({_id:{$in:shop_ids}, name:"L"})
{ "_id" : 3, "name" : "L" }
>
Similar questions regarding doing the equivalent of a join operation with Mongo have been asked before. This question provides some links which may provide you with additional guidance:
How to join MongoDB collections in Python?
If you would like to experiment with Map Reduce, here is a link to a blog post from a user who used an incremental Map Reduce operation to combine values from two collections.
http://tebros.com/2011/07/using-mongodb-mapreduce-to-join-2-collections/
Hopefully the above will allow you to retrieve the desired information from your collections.
Short answer: you can't do that (with a single MapReduce command).
Long answer: MapReduce jobs in MongoDB run only on a single collection and cannot reference other collections in the process. So, the JOIN/GROUP BY-like behaviour of SQL is not available here. The new Aggregation Framework also operates on a single collection only.
I propose a two-part solution:
Get all shops with name "L".
Compose and run a map-reduce command that checks every product document against this pre-computed list of shops (see the sketch below).
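A hedged sketch of that second step in PHP (collection and database names assumed): the pre-computed shop ids can be handed to the map function through the map-reduce "scope" parameter.
$db = (new MongoDB\Client)->selectDatabase('mydb'); // database name assumed

// Part 1: pre-compute the ids of shops named "L".
$shopIds = [];
foreach ($db->shops->find(['name' => 'L'], ['projection' => ['_id' => 1]]) as $shop) {
    $shopIds[] = $shop['_id'];
}

// Part 2: map-reduce over products; shopIds is injected via "scope".
$result = $db->command([
    'mapreduce' => 'products',
    'map' => new MongoDB\BSON\Javascript(
        'function() { if (this.enabled === 1 && shopIds.indexOf(this.shop) !== -1) { emit(this.shop, 1); } }'
    ),
    'reduce' => new MongoDB\BSON\Javascript(
        'function(key, values) { return Array.sum(values); }'
    ),
    'scope' => ['shopIds' => $shopIds],
    'out'   => ['inline' => 1],
])->toArray()[0];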
