I want Elasticsearch to match an entered search string against multiple fields of each document and order the results so that the record matching in the largest number of fields is listed first and the record matching in the fewest fields is listed last.
For example: suppose I search for the keyword "test" and each record in the index has more than 12 fields.
Now, "test" matches 10 fields in one record,
6 fields in another record, and
2 fields in a third record.
I want the listing ordered from the maximum number of matching fields down to the minimum.
In this example, the first record shown would be the one with 10 matching fields, the second the one with 6, the third the one with 2, and so on.
Any suggestions or an example would be appreciated.
This is the default behavior of Elasticsearch: documents with a greater number of matches are scored higher.
Query:
{
  "query": {
    "query_string": {
      "default_field": "*", --> search in all fields
      "query": "test"
    }
  }
}
Result:
{
  "_index" : "index18",
  "_type" : "_doc",
  "_id" : "iSCe6nEB8J88APx3YBGn",
  "_score" : 0.9808291, --> scored higher as two fields match
  "_source" : {
    "field1" : "test",
    "field2" : "test"
  }
},
{
  "_index" : "index18",
  "_type" : "_doc",
  "_id" : "iiCe6nEB8J88APx3ghF-",
  "_score" : 0.4700036,
  "_source" : {
    "field1" : "test",
    "field2" : "abc"
  }
}
I have the following kind of data in my mongo database. The property "values" consists of an array of attributes. "values" is a property of a product, which also has some other properties like "normalizedData". But the structure of "values" is what gives me a headache.
"values" : [
{
"_id" : ObjectId("5a09d88c83218b814a8df57d"),
"attribute" : NumberLong("118"),
"entity" : DBRef("pim_catalog_product", ObjectId("59148ee283218bb8548b45a8"), "akeneo_pim"),
"locale" : "de_AT",
"varchar" : "LED PAR-56 TCL 9x3W Short sw"
},
{
"_id" : ObjectId("5a09d88c83218b814a8df57a"),
"attribute" : NumberLong("118"),
"entity" : DBRef("pim_catalog_product", ObjectId("59148ee283218bb8548b45a8"), "akeneo_pim"),
"locale" : "de_DE",
"varchar" : "LED PAR-56 TCL 9x3W Short sw"
},
{
"_id" : ObjectId("5a09d88c83218b814a8df57c"),
"attribute" : NumberLong("184"),
"entity" : DBRef("pim_catalog_product", ObjectId("59148ee283218bb8548b45a8"), "akeneo_pim"),
"locale" : "de_AT",
"boolean" : false
},
{
"_id" : ObjectId("5a09d88c83218b814a8df585"),
"attribute" : NumberLong("118"),
"entity" : DBRef("pim_catalog_product", ObjectId("59148ee283218bb8548b45a8"), "akeneo_pim"),
"locale" : "fr_FR",
"varchar" : "LED PAR-56 TCL 9x3W Short sw"
},
{
"_id" : ObjectId("5a09d88c83218b814a8df584"),
"attribute" : NumberLong("121"),
"entity" : DBRef("pim_catalog_product", ObjectId("59148ee283218bb8548b45a8"), "akeneo_pim"),
"locale" : "fr_FR",
"varchar" : "Eurolite LED PAR-56 TCL 9x3W Short sw"
},
{
"_id" : ObjectId("5a09d88c83218b814a8df574"),
"attribute" : NumberLong("207"),
"entity" : DBRef("pim_catalog_product", ObjectId("59148ee283218bb8548b45a8"), "akeneo_pim"),
"varchar" : "51913611"
}
]
A couple of things to notice about this extract from the dataset:
Attributes, identified by their ID ("attribute"), can appear multiple times; 118, for example.
Attributes do not always have the same subset of properties (compare 207 and 121, for example).
If an attribute is present multiple times (like 118), the occurrences should differ at least in the "locale" property.
Now I need the Doctrine MongoDB query builder to project the following result:
I want only those attributes to be present in the result whose "attribute" is one of the IDs specified by the query (e.g. array(118, 184)).
If the attribute exists multiple times, I want to see it multiple times.
If the attribute exists multiple times, I want to limit the occurrences by a given array of locales.
So an example query would be: return all attributes inside "values" that have either 118 or 184 as the "attribute" property, and (if specified) limit the results to those attributes where the locale is either "de_DE" or "it_IT".
Here is what I have tried so far:
$qb = $productRepository->createQueryBuilder();
$query = $qb
    ->hydrate(false)
    ->select(array('normalizedData.sku'))
    ->selectElemMatch(
        'values',
        $qb->expr()->field('attribute')->in(array(117, 110))->addAnd(
            $qb->expr()->field('locale')->in(array('it_IT', 'de_DE'))
        )
    )
    ->field('_id')->in($entityIds)
    ->field('values')->elemMatch($qb->expr()->field('attribute')->in(array(117, 110)))
    ->limit($limit)
    ->skip($offset);
This query always returns only one attribute (no matter how many times it is present within the "values" array) per product. What am I doing wrong here?
EDIT: My MongoDB version is 2.4.9 and doctrine-mongo-odm is below 1.2. Currently I cannot update either.
You can try the aggregation query below in MongoDB version 3.4. $elemMatch by design returns only the first matching element.
You will need $filter to return multiple matches.
$match limits the documents to those where "values" contains at least one element whose attribute is in [118, 184] and whose locale is in ["de_DE", "it_IT"]; this is followed by a $filter inside a $project stage to keep only the matching elements. You can add $limit and $skip stages at the end of the aggregation pipeline, just as you did with the regular query.
db.col.aggregate([
  {"$match":{
    "values":{
      "$elemMatch":{
        "attribute":{"$in":[118,184]},
        "locale":{"$in":["de_DE","it_IT"]}
      }
    }
  }},
  {"$project":{
    "values":{
      "$filter":{
        "input":"$values",
        "as":"item",
        "cond":{
          "$and":[
            {"$in":["$$item.attribute",[118,184]]},
            {"$in":["$$item.locale",["de_DE","it_IT"]]}
          ]
        }
      }
    }
  }}
])
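To see what the $filter stage keeps, its condition can be replicated in plain JavaScript against the sample "values" array from the question. This is a sketch for illustration only (not shell or pipeline code), with only the relevant fields shown:

```javascript
// Plain-JS equivalent of the $filter condition, run against the
// sample "values" array from the question (irrelevant fields omitted).
const values = [
  { attribute: 118, locale: "de_AT" },
  { attribute: 118, locale: "de_DE" },
  { attribute: 184, locale: "de_AT" },
  { attribute: 118, locale: "fr_FR" },
  { attribute: 121, locale: "fr_FR" },
  { attribute: 207 }, // no "locale" property, like attribute 207 above
];

const attributeIds = [118, 184];
const locales = ["de_DE", "it_IT"];

// Same logic as the $and of the two $in expressions in the $filter stage.
const matched = values.filter(
  (item) => attributeIds.includes(item.attribute) && locales.includes(item.locale)
);

console.log(matched); // only the attribute-118 / de_DE element survives
```

Note that the attribute-207 element is dropped even though its attribute could have matched a broader filter, because a missing "locale" never satisfies the locale $in.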
You can use the AggregationBuilder to write the query in Doctrine.
I have two collections - users and chats. Each chat message has a structure like the following:
_id: ObjectId
from: ObjectId // user _id
to: ObjectId // user _id
message: String
date_created: Date
And each user has:
_id: ObjectId
name: String
username: String
// ... not important stuff
I need to fetch conversations that were sent to me, and the result should be shaped as follows:
{
data: [
{
"id": conversation_id,
"title": username,
"message": message_excerpt
},...
]
}
My problem is getting the username from the reference, because I don't want to make 20 fetch queries to get 20 different usernames. I would have added the username when the conversation was first created, but I can't, because the username can be changed at any time; that would create an inconsistency between the username and the conversation title. How should I handle this problem? This is the first time I've wished there was a JOIN in Mongo.
Two possibilities:
1: Add the username to the chat message documents. Like you said, if the username changes, you need to change it on all of that user's chats.
2: Do an application-level join. You don't need 20 queries to get the 20 names: you can first retrieve all the chats, then collect all of the "from" user ids and fetch them in a single query. For example:
var results = [
{ "_id" : 0, "from" : 43, "to" : 86, "message" : "sup?" },
{ "_id" : 1, "from" : 99, "to" : 86, "message" : "yo" }
]
var from_users = db.users.find({ "_id" : { "$in" : results.map(function(doc) { return doc.from }) } }).toArray()
Now you can use from_users to populate the username into results, or to create your desired document structure. Note that the results of the $in query are not necessarily returned in the order of the elements in the array argument to $in; ordered results are commonly expected/desired, but that's not how it works.
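A minimal sketch of that populate step: build a lookup map keyed by _id so the order of the $in result doesn't matter. The from_users array below stands in for the query result, and the usernames "alice" and "bob" are made up for illustration:

```javascript
// Sample chat documents, as in the example above.
const results = [
  { _id: 0, from: 43, to: 86, message: "sup?" },
  { _id: 1, from: 99, to: 86, message: "yo" },
];

// Stand-in for db.users.find({ "_id": { "$in": [...] } }).toArray();
// the order is deliberately different from `results` to show it doesn't matter.
const from_users = [
  { _id: 99, username: "bob" },   // hypothetical username
  { _id: 43, username: "alice" }, // hypothetical username
];

// Build a lookup map keyed by _id, then populate usernames into
// the desired { id, title, message } shape from the question.
const byId = new Map(from_users.map((u) => [u._id, u]));
const data = results.map((chat) => ({
  id: chat._id,
  title: byId.get(chat.from).username,
  message: chat.message,
}));

console.log(data);
// [ { id: 0, title: "alice", message: "sup?" },
//   { id: 1, title: "bob", message: "yo" } ]
```

The Map makes the join O(chats + users) rather than scanning from_users once per chat.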
I have an Elasticsearch query that I am trying to match properly; the field data itself contains dashes (-), and the string values are GUIDs.
It was not matching properly because the analyzer was splitting the term into separate words at the -.
I have since changed the query to use a match_phrase query like this:
"query": {
  "filtered": {
    "query": {
      "match_phrase": {
        "guid": { "operator": "or", "query": "bd2acb42-cf01-11e2-ba92-12313916f4be" }
      }
    }
  }
}
When I am matching just one GUID, this works just fine.
However, I am trying to match multiple GUIDs.
So it currently looks like
"query": {
  "filtered": {
    "query": {
      "match_phrase": {
        "guid": { "operator": "or", "query": "bd2acb42-cf01-11e2-ba92-12313916f4be d1091f08-ceff-11e2-ba92-12313916f4be" }
      }
    }
  }
}
I assume it's not working because it's trying to match the whole string, not each GUID separately.
I tried adding "analyzer": "whitespace" to the query, but this broke the query entirely.
So what is the best method to ensure the query looks for the whole GUID string while allowing multiple GUIDs to match?
I have been setting the field mapping to not_analyzed for similar purposes.
"guid" : {
"type" : "string",
"index" : "not_analyzed"
}
Building the query manually then works.
{
  "bool" : {
    "should" : [
      { "term" : { "guid" : "bd2acb42-cf01-11e2-ba92-12313916f4be" } },
      { "term" : { "guid" : "d1091f08-ceff-11e2-ba92-12313916f4be" } }
    ],
    "minimum_number_should_match" : 1
  }
}
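If the list of GUIDs is dynamic, the bool/should body above can be generated programmatically rather than built by hand. A small sketch (buildGuidQuery is a hypothetical helper name; the GUID values are the ones from the question):

```javascript
// Build a bool/should query with one term clause per GUID, matching
// the hand-written query shown above.
function buildGuidQuery(guids) {
  return {
    bool: {
      should: guids.map((g) => ({ term: { guid: g } })),
      minimum_number_should_match: 1,
    },
  };
}

const query = buildGuidQuery([
  "bd2acb42-cf01-11e2-ba92-12313916f4be",
  "d1091f08-ceff-11e2-ba92-12313916f4be",
]);

// Serialize for sending as an Elasticsearch request body.
console.log(JSON.stringify(query, null, 2));
```

Because the field is not_analyzed, each term clause compares the whole GUID string exactly, so dashes no longer split anything.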
I'm new to MongoDB and this is my first use of MapReduce ever.
I have two collections: Shops and Products with the following schema
Products
{'_id', 'type': 'first', 'enabled': 1, 'shop': $SHOP_ID }
{'_id', 'type': 'second', 'enabled': 0, 'shop': $SHOP_ID }
{'_id', 'type': 'second', 'enabled': 1, 'shop': $SHOP_ID }
And
Shops
{'_id', 'name':'L', ... }
{'_id', 'name':'M', ... }
I'm looking for a GROUP BY-like statement for MongoDB, using MapReduce, to retrieve the shops with name 'L' that have products with 'enabled' => 1.
How can I do it? Thank you.
It should be possible to retrieve the desired information without a Map Reduce operation.
You could first query the "Products" collection for documents that match {'enabled': 1}, collect the $SHOP_ID values from that query (which I imagine correspond to the _id values in the "Shops" collection) into an array, and then perform an $in query on the "Shops" collection, combined with the query on "name".
For example, given the two collections:
> db.products.find()
{ "_id" : 1, "type" : "first", "enabled" : 1, "shop" : 3 }
{ "_id" : 2, "type" : "second", "enabled" : 0, "shop" : 4 }
{ "_id" : 3, "type" : "second", "enabled" : 1, "shop" : 5 }
> db.shops.find()
{ "_id" : 3, "name" : "L" }
{ "_id" : 4, "name" : "L" }
{ "_id" : 5, "name" : "M" }
>
First find all of the documents that match {"enabled" : 1}
> db.products.find({"enabled" : 1})
{ "_id" : 1, "type" : "first", "enabled" : 1, "shop" : 3 }
{ "_id" : 3, "type" : "second", "enabled" : 1, "shop" : 5 }
From the above query, generate a list of _ids:
> var c = db.products.find({"enabled" : 1})
> shop_ids = []
[ ]
> c.forEach(function(doc){shop_ids.push(doc.shop)})
> shop_ids
[ 3, 5 ]
Finally, query the shops collection for documents with _id values in the shop_ids array that also match {name:"L"}.
> db.shops.find({_id:{$in:shop_ids}, name:"L"})
{ "_id" : 3, "name" : "L" }
>
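The same two-step logic can be sketched in plain JavaScript against the sample documents above. This is an illustration only, with in-memory arrays standing in for the two collections:

```javascript
// In-memory stand-ins for the sample shops and products collections.
const shops = [
  { _id: 3, name: "L" },
  { _id: 4, name: "L" },
  { _id: 5, name: "M" },
];
const products = [
  { _id: 1, type: "first", enabled: 1, shop: 3 },
  { _id: 2, type: "second", enabled: 0, shop: 4 },
  { _id: 3, type: "second", enabled: 1, shop: 5 },
];

// Step 1: collect the shop ids that have at least one enabled product
// (the equivalent of the find({"enabled": 1}) + forEach loop above).
const enabledShopIds = new Set(
  products.filter((p) => p.enabled === 1).map((p) => p.shop)
);

// Step 2: keep shops named "L" whose _id is in that set
// (the equivalent of the $in query on the shops collection).
const result = shops.filter((s) => s.name === "L" && enabledShopIds.has(s._id));

console.log(result); // [ { _id: 3, name: "L" } ]
```

Shop 4 is named "L" but has no enabled product, and shop 5 has an enabled product but is named "M", so only shop 3 survives both filters.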
Similar questions regarding doing the equivalent of a join operation with Mongo have been asked before. This question provides some links which may provide you with additional guidance:
How to join MongoDB collections in Python?
If you would like to experiment with Map Reduce, here is a link to a blog post from a user who used an incremental Map Reduce operation to combine values from two collections.
http://tebros.com/2011/07/using-mongodb-mapreduce-to-join-2-collections/
Hopefully the above will allow you to retrieve the desired information from your collections.
Short answer: you can't do that (with a single MapReduce command).
Long answer: MapReduce jobs in MongoDB run on a single collection only and cannot reference other collections in the process. So, JOIN/GROUP BY-like behaviour from SQL is not available here. The new Aggregation Framework also operates on a single collection only.
I propose a two-part solution:
Get all shops with name "L".
Compose and run a map-reduce command that checks every product document against this pre-computed list of shops.
In MongoDB, we can assign our own value to the _id field, and "the _id field value may be of any type, other than arrays, so long as it is unique" -- from the docs.
But in my live database, I can see that some records were duplicated, as follows:
db.memberrecords.find().limit(2).forEach(printjson)
{
"_id" : "999783",
"Memberid" : "999783",
"Name" : "ASHEESH SHARMA",
"Gender" : "M",
}
{
"_id" : "999783",
"Memberid" : "999783",
"Name" : "Sanwal Singh Meena",
"Gender" : "M",
}
In the records above, the same _id value was inserted twice. When I tested with a local database, it did not allow inserting a record with a duplicate _id and threw the following error:
E11000 duplicate key error index: mongoms.memberrecords.$_id_ dup key: { : "999783" }
Below are the indexes for my live memberrecords collection (for your reference):
db.memberrecords.getIndexes()
[
{
"name" : "_id_",
"ns" : "mongoms.memberrecords",
"key" : {
"_id" : 1
},
"v" : 0
},
{
"_id" : ObjectId("4f0bcdf2b1513267f4ac227c"),
"ns" : "mongoms.memberrecords",
"key" : {
"Memberid" : 1
},
"name" : "Memberid_1",
"unique" : true,
"v" : 0
}
]
Note: this collection is sharded across two shards.
Any suggestions on this, please?
Is your shard key the _id field? You can only have one unique index enforced across a cluster: the shard key (otherwise the server would have to check with every shard on every insert).
So: on a single shard, _id will be unique. However, if it isn't your shard key, all bets are off across multiple shards.
See http://www.mongodb.org/display/DOCS/Sharding+Limits#ShardingLimits-UniqueIndexesDOCS%3AIndexes%23UniqueIndexes.