Filtering in Elasticsearch? - php

I'm using the date_histogram API to get counts bucketed by an interval (hour/day/week or month). I also have a feature I'm having trouble implementing: a user can filter the results by entering a startDate and endDate (text boxes), which should be matched against a timestamp field. So how can I filter the results by a range on that single field (timestamp) while still using the date_histogram API, or any other API, to achieve my desired result?
In SQL I would just use a BETWEEN operator to get the result, but from what I've read so far there is no BETWEEN operator in Elasticsearch (not sure).
I have this script so far:
curl 'http://anotherdomain.com:9200/myindex/_search?pretty=true' -d '{
  "query" : {
    "filtered" : {
      "filter" : {
        "exists" : {
          "field" : "adid"
        }
      },
      "query" : {
        "query_string" : {
          "fields" : [ "adid", "imp" ],
          "query" : "525826 AND true"
        }
      }
    }
  },
  "facets" : {
    "histo1" : {
      "date_histogram" : {
        "field" : "timestamp",
        "interval" : "day"
      }
    }
  }
}'

In Elasticsearch you can use a range query or filter to achieve that; its gte/lte bounds are the equivalent of SQL's BETWEEN.
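For example, a sketch based on the query above (the example dates are placeholders, and their format must match how timestamp is indexed): the existing exists filter is combined with a range filter via an and filter, so the date_histogram facet only counts documents inside the range.

curl 'http://anotherdomain.com:9200/myindex/_search?pretty=true' -d '{
  "query" : {
    "filtered" : {
      "filter" : {
        "and" : [
          { "exists" : { "field" : "adid" } },
          { "range" : { "timestamp" : { "gte" : "2013-01-01", "lte" : "2013-01-31" } } }
        ]
      },
      "query" : {
        "query_string" : {
          "fields" : [ "adid", "imp" ],
          "query" : "525826 AND true"
        }
      }
    }
  },
  "facets" : {
    "histo1" : {
      "date_histogram" : { "field" : "timestamp", "interval" : "day" }
    }
  }
}'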

Related

elasticsearch aggregations on substring

I have a field indexed as a string in Elasticsearch 5,
for example 20090219, 20100416, etc.
I can run an aggregation on this data, but I want to aggregate on a substring,
that is, on
2009, 2010
I don't want to convert the field to a date. I want to take the first 4 characters and get the count per value.
This is my current code (I'm very new to Elasticsearch):
$params['body']["aggs"]["Year"]["terms"]["field"] = "PublicationDate.keyword";
$params['body']["aggs"]["Year"]["terms"]["size"] = 10;
$params['body']["aggs"]["Year"]["terms"]["order"]["_count"] = "desc";
You can use the Elasticsearch scripting feature to achieve this.
GET my-index/_search
{
  "aggs" : {
    "my-agg" : {
      "terms" : {
        "script" : {
          "inline" : "doc['PublicationDate.keyword'].getValue().substring(0,4)"
        },
        "size" : 10,
        "order" : { "_count" : "desc" }
      }
    }
  }
}
I don't know the equivalent PHP for the above command, but I believe you will be able to make it work in PHP.
This did the task:
$params['body']["aggs"]["PublicationYear"]["terms"]["script"] = "_value.substring(0,4)";
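For completeness, a minimal sketch of the same request in PHP, assuming the official elasticsearch-php client ($client, the index name, and the aggregation name are illustrative):

// Build the request body: a terms aggregation driven by a script.
$params = array('index' => 'my-index');  // hypothetical index name
$params['body']["aggs"]["Year"]["terms"]["script"]["inline"] =
    "doc['PublicationDate.keyword'].getValue().substring(0,4)";
$params['body']["aggs"]["Year"]["terms"]["size"] = 10;
$params['body']["aggs"]["Year"]["terms"]["order"]["_count"] = "desc";

// $client is an Elasticsearch\Client, e.g. from ClientBuilder::create()->build().
$response = $client->search($params);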

Pull mysql database into elasticsearch

I am using Elasticsearch in my project, and my requirement is to pull a large amount of MySQL data into Elasticsearch using the Elasticsearch JDBC river plugin. I need to sync a MySQL table to Elasticsearch, so I'm creating a mapping for the JDBC river index.
curl -XPOST http://localhost:9200/city -d '
{
  "mappings" : {
    "city_type" : {
      "properties" : {
        "domain" : {
          "type" : "multi_field",
          "fields" : {
            "domain" : {
              "type" : "string",
              "index" : "analyzed"
            },
            "exact" : {
              "type" : "string",
              "index" : "not_analyzed"
            }
          }
        },
        "sent_date" : {
          "type" : "date",
          "format" : "dateOptionalTime"
        }
      }
    }
  }
}'
After creating the mapping in Elasticsearch, I want to load the MySQL table data into it, so I'm using the following command:
curl -XPUT 'localhost:9200/river/city/_meta?pretty' -d '{
  "type" : "jdbc",
  "jdbc" : {
    "url" : "jdbc:mysql://localhost:3306/test",
    "user" : "root",
    "password" : "root",
    "sql" : "select id as _id,id as domain from city;",
    "strategy" : "oneshot"
  },
  "index" : {
    "index" : "city",
    "type" : "city_type",
    "bulk_size" : 500
  }
}'
These queries run successfully, but afterwards, when I run the following command to search for the data, Elasticsearch is empty:
http://localhost:9200/river/_search?pretty&q=*
Please check the response of the above query here. Why is the data not showing up in the Elasticsearch query? Please help.
By the way, rivers have been deprecated: https://github.com/elastic/elasticsearch/issues/10345
I would highly recommend the jprante JDBC importer, a standalone Java tool that performs the operations you need: https://github.com/jprante/elasticsearch-jdbc. It is not exactly a river as you have defined one.
Concerning your question, could you please try http://localhost:9200/_search?pretty&q=* ? With your syntax, you are actually searching for data in an index named river. You should search across all indices with the query I wrote, or in the city index: http://localhost:9200/city/city_type/_search?pretty&q=*
If I were in your shoes, I would use Logstash to push the data from MySQL to Elasticsearch. Rivers were deprecated a long time ago, as @Artholl already mentioned.
See https://www.elastic.co/blog/logstash-jdbc-input-plugin
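A minimal pipeline sketch for the jdbc input plugin, reusing the connection settings from the question (the driver path is a placeholder, and the statement mirrors the river's SQL):

input {
  jdbc {
    jdbc_driver_library => "/path/to/mysql-connector-java.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/test"
    jdbc_user => "root"
    jdbc_password => "root"
    statement => "SELECT id, id AS domain FROM city"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "city"
    document_type => "city_type"
    document_id => "%{id}"
  }
}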

Limit returned fields with PHP MongoClient find() query

I am writing a PHP MongoClient model which accesses a MongoDB database that stores deploy logs with GitLab information, server hosts, and Zend restart instructions. I have a Mongo collection called deployAppConfigs. Its document structure looks like this:
{
    "_id" : ObjectId("54de193790ded22d1cd24c36"),
    "app_name" : "ai2_api",
    "name" : "AI2 Admin API",
    "app_directory" : "path_to_app",
    "app_owner" : "www-data:deployers",
    "directories" : [],
    "vcs" : {
        "type" : "git",
        "name" : "input/ai2-api"
    },
    "environments" : {
        "development" : { ... },
        "qa" : { ... },
        "staging" : { ... },
        "production" : { ... },
        "actions" : {
            "post_checkout" : [
                "composer_install"
            ]
        }
    }
}
Because there are many documents in this collection, I would like to query the entire collection for only the "vcs" subdocument and the "app_name". I am able to execute this in Robomongo's mongo shell with the following find() query:
db.deployAppConfigs.find({}, {"vcs": 1, "app_name": 1})
This returns exactly what I want for each document in the collection:
{
    "_id" : ObjectId("54de193790ded22d1cd24c36"),
    "app_name" : "ai2_api",
    "vcs" : {
        "type" : "git",
        "name" : "input/ai2-api"
    }
}
I am having a problem writing a PHP MongoClient equivalent to that mongo shell command. I basically want to make a PHP MongoClient version of this mongo docs example on Limit Fields to Return from a Query
I have tried using an empty array to replace the "{}" in the mongo shell command like this, but it hasn't worked:
$query = array(
    array(),
    array("vcs" => 1, "app_name" => 1)
);
All the documents share vcs.type = "git", so I tried writing a query that selects every document based on that shared value. It looks like this:
$query = array(
    "vcs.type" => "git"
);
But this returns the entire document, which is what I want to avoid.
The alternative could be to do a limit projection find() for the first document in the collection and then use the MongoCursor to iterate through the whole collection, but I'd rather not have to do the extra loop if possible.
Essentially, I am asking how to limit the return fields of a find() query to only one subdocument of each document in the entire collection.
It looks like I was able to find the solution. I will answer my own question and leave it up in case it ends up being useful to anyone else.
What I ended up having to do was alter my MongoClient custom class find() function, which calls the $collection->find() query, to include a $fields parameter.
Now, the MongoClient->find() query looks like this:
$collection->find(
    array("vcs.type" => "git"),
    array("vcs" => 1, "app_name" => 1)
);
Found the answer in the PHP documentation for MongoCollection::find(): here
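Putting it together, a self-contained sketch with the legacy MongoClient driver (the database name deploys is hypothetical):

// Connect and select the collection; the database name is an assumption.
$client = new MongoClient();
$collection = $client->selectDB('deploys')->selectCollection('deployAppConfigs');

// Empty criteria matches every document; the second argument is the
// projection, so only vcs and app_name (plus _id) come back.
$cursor = $collection->find(
    array(),
    array("vcs" => 1, "app_name" => 1)
);

foreach ($cursor as $doc) {
    echo $doc['app_name'], ': ', $doc['vcs']['name'], PHP_EOL;
}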

matching whole string with dashes in elasticsearch

I have an Elasticsearch query which I am trying to match properly; the field data itself contains dashes (-), and the string values are GUIDs.
It was not matching properly because the analyzer was splitting the term into separate words at each -.
I have since changed the query to use a match_phrase query like this:
"query": {
"filtered": {
"query": {
"match_phrase":{
"guid":{"operator" : "or","query":"bd2acb42-cf01-11e2-ba92-12313916f4be"}
}
}
}
}
When I am trying to match just one GUID, this works just fine.
However, I am trying to match multiple GUIDs, so it currently looks like this:
"query": {
"filtered": {
"query": {
"match_phrase":{
"guid":{"operator" : "or","query":"bd2acb42-cf01-11e2-ba92-12313916f4be d1091f08-ceff-11e2-ba92-12313916f4be"}
}
}
}
}
I assume it's not working because it's trying to match the whole string rather than each GUID separately.
I tried adding "analyzer" : "whitespace" to the query, but this broke the query entirely.
So what is the best method to ensure the query looks for the whole GUID string while allowing matching of multiple GUIDs?
I have been setting the field mapping to not_analyzed for similar purposes.
"guid" : {
"type" : "string",
"index" : "not_analyzed"
}
Building the query manually then works.
{
    "bool" : {
        "should" : [
            { "term" : { "guid" : "bd2acb42-cf01-11e2-ba92-12313916f4be" } },
            { "term" : { "guid" : "d1091f08-ceff-11e2-ba92-12313916f4be" } }
        ],
        "minimum_number_should_match" : 1
    }
}
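As a more compact alternative (a sketch that assumes the same not_analyzed mapping), a terms query matches documents containing any of the listed values, without building the bool/should by hand:

{
    "query" : {
        "terms" : {
            "guid" : [
                "bd2acb42-cf01-11e2-ba92-12313916f4be",
                "d1091f08-ceff-11e2-ba92-12313916f4be"
            ]
        }
    }
}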

Sort data in sub array in mongodb

Is it possible to sort data in a sub-array in a Mongo database?
{ "_id" : ObjectId("4e3f8c7de7c7914b87d2e0eb"),
"list" : [
{
"id" : ObjectId("4e3f8d0be62883f70c00031c"),
"datetime" : 1312787723,
"comments" :
{
"id" : ObjectId("4e3f8d0be62883f70c00031d")
"datetime": 1312787723,
},
{
"id" : ObjectId("4e3f8d0be62883f70c00031d")
"datetime": 1312787724,
},
{
"id" : ObjectId("4e3f8d0be62883f70c00031d")
"datetime": 1312787725,
},
}
],
"user_id" : "3" }
For example, I want to sort the comments by the "datetime" field. Thanks. Or is the only option to select all the data and sort it in PHP code? My query uses Mongo's limit, though...
With MongoDB, you can sort the documents or select only some parts of them, but you can't have the server modify (e.g. reorder arrays inside) the documents returned by a search query.
If the current order of your comments can be changed, then the best solution would be to sort them inside the MongoDB documents (find(), then for each doc, sort its comments and update()). If you want to keep the current internal order of comments, then you'll have to sort each document after each query.
In both cases, the sort will be done with PHP. Something like:
foreach ($doc['list'] as &$list) { // iterate by reference so the sort sticks
    // uses a lambda function, PHP 5.3 required
    usort($list['comments'], function($a, $b) { return $a["datetime"] < $b["datetime"] ? -1 : 1; });
}
unset($list); // break the reference left over from the loop
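If you go with the first approach and persist the sorted order, a hedged sketch, assuming $collection is a MongoCollection handle for this collection:

// Write the sorted "list" array back to the same document.
$collection->update(
    array('_id' => $doc['_id']),
    array('$set' => array('list' => $doc['list']))
);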
If you can't use PHP 5.3, replace the lambda function with a normal one, as in the sketch below. See the usort() examples in the PHP manual.
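A pre-5.3 variant with a named comparison function (the function name is illustrative):

// Named comparison function, usable on PHP < 5.3.
function compare_by_datetime($a, $b) {
    return $a["datetime"] < $b["datetime"] ? -1 : 1;
}

foreach ($doc['list'] as &$list) { // by reference so the sort sticks
    usort($list['comments'], 'compare_by_datetime');
}
unset($list);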
