mongodb aggregation query by using php

I have a MongoDB database named travel with a collection named sample that holds the data below:
`{
"_id": { "id": "2344" },
"places": { "destination": [ "singapore", "delhi" ] },
"dt": { "date": "1434467400" }
},
{
"_id": { "id": "2345" },
"places": { "destination": [ "singapore", "delhi" ] },
"dt": { "date": "1434467445" }
},
{
"_id": { "id": "2354" },
"places": { "code": "7856", "source": [ "beijing", "singapore" ], "destination": [ "newyork", "landon", "sidney" ] },
"dt": { "date": "1434589338" }
}`
I wrote a MongoDB aggregation query to count the documents for each places value:
`db.sample.aggregate([{$group : {_id : "$places", count : {$sum : 1}}}, {$sort: {count: 1}}])`
which shows the result as
{ "_id" : { "destination" : [ "singapore", "delhi" ] }, "count" : 2 }
{ "_id" : { "code" : "7856", "source" : [ "beijing", "singapore" ], "destination" : [ "newyork", "landon", "sidney" ] }, "count" : 1 }
This gives a count for each distinct places value, since the whole places object is used as the group key.
I have two requirements:
1. I want to write this query in PHP using a MongoDB connection.
2. If possible, I want to get counts based on code, source or destination, which are inside places.
Thanks in advance for any help.
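Both requirements can be sketched in PHP as below, assuming the official `mongodb/mongodb` library (loaded through Composer's autoloader) and a server at `mongodb://localhost:27017`; all function names are hypothetical. The first helper mirrors the shell pipeline above. For the second requirement, one common approach is to `$unwind` the field first, so that each destination, source or code is counted on its own, and then `$group` on it. `$unwind` drops documents that lack the field and, on MongoDB 3.2 and later, treats a non-array value such as `code` as a one-element array. The plain-PHP replay lets the expected counts be checked without a server.

```php
<?php
// Requirement 1 (hypothetical helper): the shell pipeline
//   [{$group: {_id: "$places", count: {$sum: 1}}}, {$sort: {count: 1}}]
// as nested PHP arrays. Single quotes keep '$group' etc. from being interpolated.
function groupByPlacesPipeline(): array
{
    return [
        ['$group' => ['_id' => '$places', 'count' => ['$sum' => 1]]],
        ['$sort' => ['count' => 1]],
    ];
}

// Requirement 2 (hypothetical helper): count each value of places.<field>.
function countByPlacesFieldPipeline(string $field): array
{
    $path = '$places.' . $field;               // e.g. '$places.destination'
    return [
        ['$unwind' => $path],                   // one document per array element
        ['$group' => ['_id' => $path, 'count' => ['$sum' => 1]]],
        ['$sort' => ['count' => -1]],           // most frequent first
    ];
}

// Plain-PHP replay of $unwind + $group, to check expected counts offline.
function countByPlacesField(array $docs, string $field): array
{
    $counts = [];
    foreach ($docs as $doc) {
        if (!isset($doc['places'][$field])) {
            continue;                           // $unwind skips documents without the field
        }
        foreach ((array) $doc['places'][$field] as $value) {
            $counts[$value] = ($counts[$value] ?? 0) + 1;
        }
    }
    arsort($counts);                            // most frequent first
    return $counts;
}

// The places values of the three sample documents.
$docs = [
    ['places' => ['destination' => ['singapore', 'delhi']]],
    ['places' => ['destination' => ['singapore', 'delhi']]],
    ['places' => ['code' => '7856', 'source' => ['beijing', 'singapore'],
                  'destination' => ['newyork', 'landon', 'sidney']]],
];
print_r(countByPlacesField($docs, 'destination'));

// Against a real server (assumption: mongodb/mongodb installed and autoloaded).
if (class_exists('MongoDB\Client')) {
    $client = new MongoDB\Client('mongodb://localhost:27017');
    foreach ($client->travel->sample->aggregate(countByPlacesFieldPipeline('destination')) as $row) {
        var_dump($row);
    }
}
```

Passing `groupByPlacesPipeline()` instead answers the first requirement with the same `aggregate()` call; swapping `'destination'` for `'source'` or `'code'` covers the other fields.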

Related

Trying to implement ElasticSearch sorting conditionally

Here is my mapping:
{
  "test": {
    "aliases": {},
    "mappings": {
      "courses": {
        "properties": {
          "is_sponsored": {
            "type": "long"
          },
          "sponsored_end_date": {
            "type": "date",
            "format": "yyyy-MM-dd HH:mm:ss"
          },
          "sponsored_start_date": {
            "type": "date",
            "format": "yyyy-MM-dd HH:mm:ss"
          }
        }
      }
    },
    "settings": {
      "index": {
        "creation_date": "1609945591003",
        "number_of_shards": "5",
        "number_of_replicas": "1",
        "uuid": "3S6mwaIbSFuTKPtuj8sSWw",
        "version": {
          "created": "6070199"
        },
        "provided_name": "test"
      }
    }
  }
}
I want to show courses at the top whose "is_sponsored" value is true and where the current date lies between "sponsored_start_date" and "sponsored_end_date". Once "sponsored_end_date" has passed, the course should go back to its normal position. I'm new to Elasticsearch, so please suggest a way to do this. I'm using PHP.
Thanks

You can use a function_score query to boost certain documents.
Query
{
"query": {
"function_score": {
"query": {
"match_all": {}
},
"functions": [
{
"filter": {
"bool": {
"must": [
{
"term": {
"is_sponsored": 1
}
},
{
"range": {
"sponsored_start_date": {
"lte": "now"
}
}
},
{
"range": {
"sponsored_end_date": {
"gte": "now"
}
}
}
]
}
},
"weight": 10
}
],
"score_mode": "sum"
}
}
}
Result
"hits" : [
{
"_index" : "index45",
"_type" : "_doc",
"_id" : "sdb6L3wBJ0n9LhX2J02r",
"_score" : 10.0,
"_source" : {
"is_sponsored" : 1,
"sponsored_start_date" : "2021-09-01",
"sponsored_end_date" : "2021-09-30"
}
},
{
"_index" : "index45",
"_type" : "_doc",
"_id" : "stb6L3wBJ0n9LhX2PE0c",
"_score" : 1.0,
"_source" : {
"is_sponsored" : 0,
"sponsored_start_date" : "2021-09-01",
"sponsored_end_date" : "2021-09-30"
}
},
{
"_index" : "index45",
"_type" : "_doc",
"_id" : "s9b6L3wBJ0n9LhX2Z00v",
"_score" : 1.0,
"_source" : {
"is_sponsored" : 1,
"sponsored_start_date" : "2021-10-01",
"sponsored_end_date" : "2021-10-30"
}
}
]
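Documents that satisfy all three conditions get the weight of 10, which raises their score (10.0 in the result above, versus 1.0 for the others); the rest of the documents keep the normal score. Since hits are sorted by score by default, sponsored courses inside their date range come first, and once sponsored_end_date has passed they fall back to their normal position.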
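Since the question uses PHP, here is a sketch of the same request body as a PHP array, assuming the official `elasticsearch/elasticsearch` client (7.x-era namespace) and the index name `test` from the question's mapping; the helper name is hypothetical. One detail matters when the array is JSON-encoded: an empty `match_all` should be `new stdClass()`, because an empty PHP array encodes as `[]`, while Elasticsearch expects an object there.

```php
<?php
// Hypothetical helper: builds the function_score body from the answer above.
function sponsoredFirstBody(int $weight = 10): array
{
    return [
        'query' => [
            'function_score' => [
                // new stdClass() encodes as {}; an empty array would encode as [].
                'query' => ['match_all' => new stdClass()],
                'functions' => [[
                    'filter' => ['bool' => ['must' => [
                        ['term'  => ['is_sponsored' => 1]],
                        ['range' => ['sponsored_start_date' => ['lte' => 'now']]],
                        ['range' => ['sponsored_end_date'   => ['gte' => 'now']]],
                    ]]],
                    'weight' => $weight,
                ]],
                'score_mode' => 'sum',
            ],
        ],
    ];
}

echo json_encode(sponsoredFirstBody(), JSON_PRETTY_PRINT), PHP_EOL;

// With the client installed (assumption), the request would be:
// $client = Elasticsearch\ClientBuilder::create()->build();
// $response = $client->search(['index' => 'test', 'body' => sponsoredFirstBody()]);
```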

Sort parent user list by inner hits elasticsearch

I have 200K users in Elasticsearch, and each user has their own inbox. Suppose there are three users, A, B and C, and users A and C send messages to user B. When user B fetches the user list from Elasticsearch, users A and C should be at the top of the list because they most recently sent messages to user B. I wrote the Elasticsearch query below:
{
"_source": [
"db_id",
"username",
"message_privacy"
],
"from": "0",
"size": "40",
"sort": [{"messages_received.created_at" : "desc"}],
"query": {
"bool": {
"must": [
{
"term":{
"type":"user"
}
},
{
"has_child": {
"type": "messages_received",
"inner_hits": {
"sort": [
{
"created_at": "desc"
}
],
"size": 1,
"_source": [
"id",
"user_id",
"object_id",
"created_at"
]
},
"query": {
"bool": {
"must": [
{
"term": {
"object_id": "u-5"
}
}
]
}
}
}
}
]
}
}
}
But when I run the query, I get this error:
{ "error": {
"root_cause": [
{
"type": "query_shard_exception",
"reason": "No mapping found for [messages_received.created_at] in order to sort on",
"index_uuid": "5jsM1khYRrC0cjWbRjsx5A",
"index": "trending"
}
],
I searched for this problem on Google but found no useful solution for my scenario.
Mapping
{
"type": {
"type": "join",
"eager_global_ordinals": true,
"relations": {
"post": [
"comments",
"place",
"media",
"views",
"likes",
"post_box"
],
"box": "posts",
"user": [
"user_views",
"user_likes",
"followers",
"post",
"blocked",
"followings",
"box",
"block",
"notifications",
"messages_received",
"messages_sent"
],
"posts": "posts_views"
}
}}
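The error arises because the top-level `sort` applies to the parent (user) documents, which have no `messages_received.created_at` field; child fields are only visible inside the `has_child` query. The Elasticsearch `has_child` documentation suggests sorting parents by a child field through a `function_score` query inside `has_child`, with `score_mode: max`, and then sorting by `_score`. Below is a sketch of that idea as a PHP array body, not a tested solution: the `exp` decay on `created_at` (origin `now`, scale `30d`) is an assumption chosen so that more recent messages score higher, and the helper name is hypothetical.

```php
<?php
// Hypothetical helper: users sorted by their most recent matching child message.
function usersByLatestMessageBody(string $objectId): array
{
    return [
        '_source' => ['db_id', 'username', 'message_privacy'],
        'from' => 0,
        'size' => 40,
        'query' => [
            'bool' => [
                // Filter context: does not change the score used for sorting.
                'filter' => [['term' => ['type' => 'user']]],
                'must' => [[
                    'has_child' => [
                        'type' => 'messages_received',
                        'score_mode' => 'max',   // parent gets its best child's score
                        'query' => [
                            'function_score' => [
                                'query' => ['term' => ['object_id' => $objectId]],
                                // Assumption: newer messages score higher via exp decay.
                                'exp' => ['created_at' => ['origin' => 'now', 'scale' => '30d']],
                                'boost_mode' => 'replace',
                            ],
                        ],
                        'inner_hits' => [
                            'sort' => [['created_at' => 'desc']],
                            'size' => 1,
                            '_source' => ['id', 'user_id', 'object_id', 'created_at'],
                        ],
                    ],
                ]],
            ],
        ],
        // Sort parents by score instead of the child field they do not have.
        'sort' => [['_score' => 'desc']],
    ];
}

echo json_encode(usersByLatestMessageBody('u-5'), JSON_PRETTY_PRINT), PHP_EOL;
```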

Get nested documents with a filter on Elasticsearch 5

I have the following document mapped in ES 5:
{
"appName" : {
"mappings" : {
"market_audit" : {
"properties" : {
"generation_date": {
"type": "date"
},
"customers" : {
"type" : "nested",
"properties" : {
"customer_id" : {
"type" : "integer"
},
[... other properties ...]
}
Several entries in the "customers" node may have the same customer_id, and I am trying to retrieve only the entries with a specific customer_id (i.e. "1") along with the "generation_date" of the top-level document (only the latest document is to be processed).
I was able to come up with the following query:
{
"query": {},
"sort": [
{ "generation_date": "desc" }
],
"size": 1,
"aggregations": {
"nested": {
"nested": {
"path": "customers"
},
"aggregations": {
"filter": {
"filter": {
"match": {
"customers.customer_id": {
"query": "1"
}
}
},
"aggregations": {
"tophits_agg": {
"top_hits": {}
}
}
}
}
}
}
}
This query gets me the data I'm interested in, located in the "aggregations" array (along with the "hits" array that contains the whole document). The problem is that the framework I use (ONGR's ElasticSearch bundle along with the DSL bundle, on Symfony3) complains that no buckets are available every time I try to access the actual data.
I've read the ES documentation but could not come up with a working query that adds buckets. I'm sure I am missing something; a little help would be more than welcome. If you have an idea of how to modify the query appropriately, I think I can write the PHP code to produce it.
EDIT: since this question got some views but no answer (and I'm still stuck), I would settle for any query that retrieves information about a specific "customer" (by customer_id) from the latest generated document (according to the "generation_date" field). The query I gave is just what I was able to come up with, and I'm pretty sure there's a far better way to do it. Any suggestions?
EDIT 2:
Here's the data sent to ES:
{
"index": {
"_type": "market_data_audit_document"
}
}
{
"customers": [
{
"customer_id": 1,
"colocation_name": "colo1",
"colocation_id": 26,
"device_name": "device 1",
"channels": [
{
"name": "channel1-5",
"multicast":"1.2.1.5",
"sugar_state":4,
"network_state":1
}
]
},
{
"customer_id":2,
"colocation_name":"colo2",
"colocation_id":27,
"device_name":"device 2",
"channels": [
{
"name":"channel2-5",
"multicast":"1.2.2.5",
"sugar_state":4,
"network_state":1
}
]
},
{
"customer_id":3,
"colocation_name":"colo3",
"colocation_id":28,
"device_name":"device 3",
"channels": [
{
"name":"channel3-5",
"multicast":"1.2.3.5",
"sugar_state":4,
"network_state":1
}
]
},
{
"customer_id":4,
"colocation_name":"colo4",
"colocation_id":29,
"device_name":"device 4"
,"channels": [
{
"name":"channel4-5",
"multicast":"1.2.4.5",
"sugar_state":4,
"network_state":1
}
]
},
{
"customer_id":5,
"colocation_name":"colo5",
"colocation_id":30,
"device_name":"device 5",
"channels": [
{
"name":"channel5-5",
"multicast":"1.2.5.5",
"sugar_state":4,
"network_state":1
}
]
}
],
"generation_date":"2017-02-27T10:55:45+0100"
}
Unfortunately, when I sent the query listed in this post, I discovered that the aggregation does not do what I expected: it returns "good" data, but from ALL the stored documents! Here's an output example:
{
"timed_out" : false,
"took" : 60,
"hits" : {
"total" : 2,
"hits" : [
{
"_source" : {
"customers" : [
{
"colocation_id" : 26,
"channels" : [
{
"name" : "channel1-5",
"sugar_state" : 4,
"network_state" : 1,
"multicast" : "1.2.1.5"
}
],
"customer_id" : 1,
"colocation_name" : "colo1",
"device_name" : "device 1"
},
{
"colocation_id" : 27,
"channels" : [
{
"multicast" : "1.2.2.5",
"network_state" : 1,
"name" : "channel2-5",
"sugar_state" : 4
}
],
"customer_id" : 2,
"device_name" : "device 2",
"colocation_name" : "colo2"
},
{
"device_name" : "device 3",
"colocation_name" : "colo3",
"customer_id" : 3,
"channels" : [
{
"multicast" : "1.2.3.5",
"network_state" : 1,
"sugar_state" : 4,
"name" : "channel3-5"
}
],
"colocation_id" : 28
},
{
"channels" : [
{
"sugar_state" : 4,
"name" : "channel4-5",
"multicast" : "1.2.4.5",
"network_state" : 1
}
],
"customer_id" : 4,
"colocation_id" : 29,
"colocation_name" : "colo4",
"device_name" : "device 4"
},
{
"device_name" : "device 5",
"colocation_name" : "colo5",
"colocation_id" : 30,
"channels" : [
{
"sugar_state" : 4,
"name" : "channel5-5",
"multicast" : "1.2.5.5",
"network_state" : 1
}
],
"customer_id" : 5
}
],
"generation_date" : "2017-02-27T11:45:37+0100"
},
"_type" : "market_data_audit_document",
"sort" : [
1488192337000
],
"_index" : "mars",
"_score" : null,
"_id" : "AVp_LPeJdrvi0cWb8CrL"
}
],
"max_score" : null
},
"aggregations" : {
"nested" : {
"doc_count" : 10,
"filter" : {
"doc_count" : 2,
"tophits_agg" : {
"hits" : {
"max_score" : 1,
"total" : 2,
"hits" : [
{
"_nested" : {
"offset" : 0,
"field" : "customers"
},
"_score" : 1,
"_source" : {
"channels" : [
{
"name" : "channel1-5",
"sugar_state" : 4,
"multicast" : "1.2.1.5",
"network_state" : 1
}
],
"customer_id" : 1,
"colocation_id" : 26,
"colocation_name" : "colo1",
"device_name" : "device 1"
}
},
{
"_source" : {
"colocation_id" : 26,
"customer_id" : 1,
"channels" : [
{
"multicast" : "1.2.1.5",
"network_state" : 1,
"name" : "channel1-5",
"sugar_state" : 4
}
],
"device_name" : "device 1",
"colocation_name" : "colo1"
},
"_nested" : {
"offset" : 0,
"field" : "customers"
},
"_score" : 1
}
]
}
}
}
}
},
"_shards" : {
"total" : 13,
"successful" : 1,
"failures" : [
{
"reason" : {
"index" : ".kibana",
"index_uuid" : "bTkwoysSQ0y8Tt9yYFRStg",
"type" : "query_shard_exception",
"reason" : "No mapping found for [generation_date] in order to sort on"
},
"shard" : 0,
"node" : "4ZUgOm4VRry6EtUK15UH3Q",
"index" : ".kibana"
},
{
"reason" : {
"index_uuid" : "lN2mVF9bRjuDtiBF2qACfA",
"index" : "archiv1_log",
"type" : "query_shard_exception",
"reason" : "No mapping found for [generation_date] in order to sort on"
},
"shard" : 0,
"node" : "4ZUgOm4VRry6EtUK15UH3Q",
"index" : "archiv1_log"
},
{
"index" : "archiv1_session",
"shard" : 0,
"node" : "4ZUgOm4VRry6EtUK15UH3Q",
"reason" : {
"type" : "query_shard_exception",
"index" : "archiv1_session",
"index_uuid" : "cmMAW04YTtCb0khEqHpNyA",
"reason" : "No mapping found for [generation_date] in order to sort on"
}
},
{
"shard" : 0,
"node" : "4ZUgOm4VRry6EtUK15UH3Q",
"reason" : {
"reason" : "No mapping found for [generation_date] in order to sort on",
"index" : "archiv1_users_dev",
"index_uuid" : "AH48gIf5T0CXSQaE7uvVRg",
"type" : "query_shard_exception"
},
"index" : "archiv1_users_dev"
}
],
"failed" : 12
}
}

Based on your description:
you store documents in Elasticsearch with a bunch of properties
each document contains a list of customers in an array (nested documents)
you want to extract only the nested documents related to a customer_id
your library does not handle Elasticsearch responses without buckets
you are expecting Elasticsearch to return nested documents
Problem
There are two kinds of aggregations:
bucket aggregations
metrics aggregations
In your case you have two aggregations under the nested aggregation: a filter aggregation and a metrics (top_hits) aggregation.
Filter:
A filter aggregation defines a single bucket of all the matching documents, but its result does not contain a 'buckets' key.
Top hits is a metrics aggregation and does not provide a bucket either.
Workaround:
I doubt that your PHP library will correctly handle the nested aggregation result, but you could use a filters aggregation instead of a filter aggregation to get a bucket list:
{
"aggregations": {
"nested": {
"nested": {
"path": "customers"
},
"aggregations": {
"filters_customer": {
"filters": {
"filters": [
{
"match": {
"customers.customer_id": "1"
}
}
]
},
"aggregations": {
"top_hits_customer": {
"top_hits": {}
}
}
}
}
}
}
}
This will return something like:
{
"aggregations": {
"nested": {
"doc_count": 15,
"filters_customer": {
"buckets": [
{
"doc_count": 3,
"top_hits_customer": {
"hits": {
"total": 3,
"max_score": 1,
"hits": [
{
"_nested": {
"field": "customers",
"offset": 0
},
"_score": 1,
"_source": {
"customer_id": 1,
"foo": "bar"
}
},
{
"_nested": {
"field": "customers",
"offset": 0
},
"_score": 1,
"_source": {
"customer_id": 1,
"foo": "bar"
}
},
{
"_nested": {
"field": "customers",
"offset": 0
},
"_score": 1,
"_source": {
"customer_id": 1,
"foo": "bar"
}
}
]
}
}
}
]
}
}
}
}
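On the PHP side, the bucket list can be walked with ordinary array access once the response is decoded. A minimal sketch, assuming the response is an associative array (as `json_decode($json, true)` produces, or as the official PHP client returns it) and using the aggregation names from the query above; the function name is hypothetical, and this does not use the ONGR bundle's own result classes.

```php
<?php
// Hypothetical helper: collect the customer _source entries from the
// nested -> filters -> top_hits buckets of the response shown above.
function customersFromFiltersAgg(array $response): array
{
    $customers = [];
    $buckets = $response['aggregations']['nested']['filters_customer']['buckets'] ?? [];
    foreach ($buckets as $bucket) {
        foreach ($bucket['top_hits_customer']['hits']['hits'] as $hit) {
            $customers[] = $hit['_source'];
        }
    }
    return $customers;
}

// Trimmed-down response with the same shape as the example above.
$json = '{"aggregations":{"nested":{"doc_count":15,"filters_customer":{"buckets":[
  {"doc_count":2,"top_hits_customer":{"hits":{"total":2,"hits":[
    {"_nested":{"field":"customers","offset":0},"_source":{"customer_id":1,"foo":"bar"}},
    {"_nested":{"field":"customers","offset":0},"_source":{"customer_id":1,"foo":"bar"}}
  ]}}}]}}}}';

print_r(customersFromFiltersAgg(json_decode($json, true)));
```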
Note on your EDIT 2
Elasticsearch will search over all documents, not only the 'TOP 1' document by report date. One way to split your results by report is a terms bucket aggregation on the report date:
{
"query": {},
"size": 0,
"aggregations": {
"grp_report": {
"terms": {
"field": "generation_date"
},
"aggregations": {
"nested_customers": {
"nested": {
"path": "customers"
},
"aggregations": {
"filters_customer": {
"filters": {
"filters": [
{
"match": {
"customers.customer_id": "1"
}
}
]
},
"aggregations": {
"top_hits_customer": {
"top_hits": {}
}
}
}
}
}
}
}
}
}
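If only the most recent report matters, the `terms` bucket can be limited to a single key in descending order. A sketch of that variant as a PHP array, assuming Elasticsearch 5.x, where ordering by the bucket key is written `_term` (6.0 renamed it `_key`); the `top_hits` size of 100 is an arbitrary assumption, and the helper name is hypothetical.

```php
<?php
// Hypothetical helper: entries for one customer_id, from the latest report only.
function latestReportCustomerBody(int $customerId, int $hitsSize = 100): array
{
    return [
        'size' => 0,
        'aggregations' => [
            'grp_report' => [
                'terms' => [
                    'field' => 'generation_date',
                    'size'  => 1,                     // keep a single report...
                    'order' => ['_term' => 'desc'],   // ...the newest one (ES 5.x syntax)
                ],
                'aggregations' => [
                    'nested_customers' => [
                        'nested' => ['path' => 'customers'],
                        'aggregations' => [
                            'filters_customer' => [
                                'filters' => ['filters' => [
                                    ['match' => ['customers.customer_id' => $customerId]],
                                ]],
                                'aggregations' => [
                                    // Without an explicit size, top_hits returns only 3 hits.
                                    'top_hits_customer' => ['top_hits' => ['size' => $hitsSize]],
                                ],
                            ],
                        ],
                    ],
                ],
            ],
        ],
    ];
}

echo json_encode(latestReportCustomerBody(1), JSON_PRETTY_PRINT), PHP_EOL;
```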
Advice:
Avoid complex documents; prefer splitting your report into small documents linked by a shared key (reportId, for example). You will then be able to filter and aggregate easily without any nested documents. Add to each customer document the information you will filter on across all types (redundancy is not a problem in this case).
Use case examples:
reports listing
showing customer information per report
showing the history of a customer across multiple reports
Current document example: /indexName/market_audit
{
"generation_date": "...",
"customers": [
{
"id": 1,
"foo": "bar 1"
},
{
"id": 2,
"foo": "bar 2"
},
{
"id": 3,
"foo": "bar 3"
}
]
}
Reformatted documents:
/indexName/market_audit_report
{
"report_id" : "123456",
"generation_date": "...",
"foo":"bar"
}
/indexName/market_audit_customer documents
{
"report_id" : "123456",
"customer_id": 1,
"foo": "bar 1"
}
{
"report_id" : "123456",
"customer_id": 2,
"foo": "bar 2"
}
{
"report_id" : "123456",
"customer_id": 3,
"foo": "bar 3"
}
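The reshaping itself is a simple transformation done before indexing. A minimal sketch in plain PHP, assuming the report arrives as the nested array shown earlier and that the caller supplies the `report_id`; the function name is hypothetical, and copying `generation_date` onto every customer document is one example of the redundancy suggested above.

```php
<?php
// Hypothetical helper: one report document plus one flat document per customer,
// linked by report_id (nothing here talks to Elasticsearch).
function splitReport(array $report, string $reportId): array
{
    $reportDoc = $report;
    unset($reportDoc['customers']);              // the report keeps only its own fields
    $reportDoc['report_id'] = $reportId;

    $customerDocs = [];
    foreach ($report['customers'] ?? [] as $customer) {
        // Redundant copy of generation_date so customers can be filtered by date.
        $customerDocs[] = array_merge(
            ['report_id' => $reportId, 'generation_date' => $report['generation_date'] ?? null],
            $customer
        );
    }
    return ['report' => $reportDoc, 'customers' => $customerDocs];
}

$report = [
    'generation_date' => '2017-02-27T10:55:45+0100',
    'customers' => [
        ['customer_id' => 1, 'foo' => 'bar 1'],
        ['customer_id' => 2, 'foo' => 'bar 2'],
    ],
];
print_r(splitReport($report, '123456'));
```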
If you know your report id, you will be able to get all your data in one request:
a filter on report id
a terms aggregation on type
a filter on type report
a top_hits aggregation to get the report
a filter aggregation to keep only type customer and customer id 1
a top_hits aggregation to get customer 1's information
Or
a filter on report id
a terms aggregation on type
a filter on type report
a top_hits aggregation to get the report
a terms aggregation on customer id
a top_hits aggregation to retrieve information per customer
Top Hits Aggregation Size
Do not forget to set a size in your top_hits aggregation; otherwise you will get only the top 3 hits (the default size).

Reading the first line of the Elasticsearch aggregations documentation, I think you haven't quite understood how aggregations work:
The aggregations framework helps provide aggregated data based on a search query
Since your query has no filter at all, returning ALL the stored documents in the hits.hits array is the expected result.
Then you use a filter aggregation that gets you the desired documents, but they are in the aggs property of the returned response.
If I'm right, I'd recommend keeping it as simple as you can, so here's my guessed query:
{
  "query": {
    "bool": {
      "filter": {
        "nested": {
          "path": "customers",
          "query": {
            "bool": {
              "must": [
                { "term": { "customers.customer_id": "1" } }
              ]
            }
          }
        }
      }
    }
  },
  "aggregations": {
    "tophits_agg": {
      "top_hits": {}
    }
  }
}

Number format Exception For string type

I have a mapping like this
{
"settings": {
"analysis": {
"filter": {
"nGramFilter": {
"type": "nGram",
"min_gram": 3,
"max_gram": 20,
"token_chars": [
"letter",
"digit",
"punctuation",
"symbol"
]
},
"email" : {
"type" : "pattern_capture",
"preserve_original" : 1,
"patterns" : [
"([^#]+)",
"(\\p{L}+)",
"(\\d+)",
"#(.+)"
]
},
"number" : {
"type" : "pattern_capture",
"preserve_original" : 1,
"patterns" : [
"([^+-]+)",
"(\\d+)"
]
},
"edgeNGramFilter": {
"type": "nGram",
"min_gram": 1,
"max_gram": 10,
"token_chars": [
"letter",
"digit",
"punctuation",
"symbol"
]
}
},
"analyzer": {
"nGramAnalyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"nGramFilter"
]
},
"whitespaceAnalyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase"
]
},
"email" : {
"tokenizer" : "uax_url_email",
"filter" : [
"email",
"lowercase",
"unique"
]
},
"number" : {
"tokenizer" : "whitespace",
"filter" : [ "number", "unique" ]
},
"edgeNGramAnalyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"edgeNGramFilter"
]
}
}
}
},
"users": {
"mappings": {
"user_profiles": {
"properties": {
"firstName": {
"type": "string",
"analyzer": "nGramAnalyzer",
"search_analyzer": "whitespaceAnalyzer"
},
"lastName": {
"type": "string",
"analyzer": "nGramAnalyzer",
"search_analyzer": "whitespaceAnalyzer"
},
"email": {
"type": "string",
"analyzer": "email",
"search_analyzer": "whitespaceAnalyzer"
},
"score" : {
"type": "string"
},
"homeLandline": {
"type": "string",
"analyzer": "number",
"search_analyzer": "whitespaceAnalyzer"
},
"dob": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
},
"mobile": {
"type": "integer"
},
"residenceCity": {
"type": "string",
"analyzer": "edgeNGramAnalyzer",
"search_analyzer": "whitespaceAnalyzer"
},
"created_at": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
}
}
}
}
}
}
The score can be either an integer or "NA", so I mapped its type as string, but when posting data to the index I get a NumberFormatException.
For example:
if I post an integer first and then "NA", I get this exception.
When I check my log file, I see these errors:
[2016-08-29 15:19:01] elasticlog.WARNING: Response ["{\"error\":{\"root_cause\":[{\"type\":\"mapper_parsing_exception\",\"reason\":\"failed to parse [score]\"}],\"type\":\"mapper_parsing_exception\",\"reason\":\"failed to parse [score]\",\"caused_by\":{\"type\":\"number_format_exception\",\"reason\":\"For input string: \"NH\"\"}},\"status\":400}"] []

Your mapping is incorrect. Assuming users is the index name and user_profiles is the type, it should be:
{
"users": {
"mappings": {
"user_profiles": {
"properties": {
"score": {
"type": "string"
}
}
}
}
}
}
You are missing the mappings key before user_profiles.
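The `number_format_exception` most likely means the intended mapping was never applied: Elasticsearch then maps `score` dynamically from the first value it sees, an integer, as a numeric field, and the later string (`"NA"`, or `"NH"` in the log) cannot be parsed as a number. A sketch of creating the index with the mapping in place, assuming the official `elasticsearch/elasticsearch` PHP client, where the index name is passed separately and the body holds `mappings` with the type name `user_profiles` inside it. Only the `score` field is shown, and the `string` type matches the 2.x-era mapping in the question (5.x replaced it with `text`/`keyword`). The helper name is hypothetical.

```php
<?php
// Hypothetical helper: request parameters for creating the "users" index
// with an explicit mapping for the "user_profiles" type.
function createUsersIndexParams(): array
{
    return [
        'index' => 'users',                     // index name goes here, not in the body
        'body'  => [
            'mappings' => [
                'user_profiles' => [            // type name
                    'properties' => [
                        'score' => ['type' => 'string'],   // accepts 42 and "NA" alike
                    ],
                ],
            ],
        ],
    ];
}

echo json_encode(createUsersIndexParams()['body'], JSON_PRETTY_PRINT), PHP_EOL;

// With the client installed (assumption):
// $client = Elasticsearch\ClientBuilder::create()->build();
// $client->indices()->create(createUsersIndexParams());
// An existing field's type cannot be changed in place, so an index that already
// has a numeric score field must be deleted or reindexed first.
```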

How to modify a laravel collection that has been groupedby

I have the following code, which takes a collection of CallRecords, applies a filter and then groups them with groupBy:
$Groups = $CallRecords->filter( function($CallRecord)
{
return isset($CallRecord->meta->reason_not_connected) && true;
})
->groupBy('meta.reason_not_connected');
return $Groups;
Which returns
"example": {
"Reached Voicemail - No Message": [
{
"id": "44",
"phone_number_id": "51",
},
{
"id": "55",
"phone_number_id": "31",
},
],
"Reached Voicemail - Left Message": [
{
"id": "19",
"phone_number_id": "11",
},
{
"id": "20",
"phone_number_id": "21",
},
]
}
How can I transform this collection to show counts like this:
"example": {
"Reached Voicemail - No Message": 2,
"Reached Voicemail - Left Message": 2
}
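With Laravel collections, each group produced by `groupBy` is itself a collection, so mapping every group to its `count()` gives the desired shape, e.g. `$Groups->map(function ($group) { return $group->count(); });`. The sketch below walks through the same filter, group and count steps on plain PHP arrays so it runs without Laravel; the record layout is a simplified assumption based on the example output.

```php
<?php
// Simplified call records (assumed shape): only the fields used below.
$callRecords = [
    ['id' => '44', 'meta' => ['reason_not_connected' => 'Reached Voicemail - No Message']],
    ['id' => '55', 'meta' => ['reason_not_connected' => 'Reached Voicemail - No Message']],
    ['id' => '19', 'meta' => ['reason_not_connected' => 'Reached Voicemail - Left Message']],
    ['id' => '20', 'meta' => ['reason_not_connected' => 'Reached Voicemail - Left Message']],
    ['id' => '21', 'meta' => []],               // no reason: filtered out
];

// filter(): keep records that have a reason.
$filtered = array_filter($callRecords, function ($r) {
    return isset($r['meta']['reason_not_connected']);
});

// groupBy('meta.reason_not_connected'): reason => list of records.
$groups = [];
foreach ($filtered as $r) {
    $groups[$r['meta']['reason_not_connected']][] = $r;
}

// map(count): reason => number of records, the shape asked for.
$counts = array_map('count', $groups);

echo json_encode(['example' => $counts], JSON_PRETTY_PRINT), PHP_EOL;
```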

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.
