This is just a quick question. I am planning to change my search system to elasticsearch. If I send elasticsearch three pieces of information in my request:
Long
Lat
Distance
And my elasticsearch results are tagged with the following information:
Long
Lat
Is it possible for me to return only results within that distance without needing any additional elasticsearch plugins? Is this functionality built into elasticsearch?
Yes, in three pretty simple steps. You must map the data (like setting up a database schema), insert the data, and then search for it (filter).
You must appropriately map the field so that it is recognized as a geo_point. Note: If you wanted more flexibility to store other types of geolocations (e.g., polygons or linestrings), then you would want to map it as a geo_shape.
{
"mappings" : {
"TYPE_NAME" : {
"properties" : {
"fieldName" : { "type" : "geo_point" }
}
}
}
}
Once mapped, you can insert the location details as a [ longitude, latitude ] array.
{
"fieldName" : [ longitude, latitude ]
}
You can type it in differently to get the same effect, but perhaps more intuitively (rather than [ X, Y ], which is not the most common geospatial order).
{
"fieldName" : {
"lat" : latitude,
"lon" : longitude
}
}
Either way, it will represent the same thing (same goes for below in the filter).
Once you have mapped the field and started to insert points, then you can run a geo_distance filter to find points within the desired distance:
{
"filtered" : {
"query" : { "match_all" : {} },
"filter" : {
"geo_distance" : {
"distance" : "10mi",
"fieldName" : [ longitude, latitude ]
}
}
}
}
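As a rough sketch, the filter above could be assembled from the three request values (longitude, latitude, distance) like this. The helper name buildGeoDistanceSearch, the field name "location", and the sample coordinates are assumptions made up for illustration; the outer "query" key is what a search request body expects around the filtered query.

```javascript
// Hypothetical helper: builds a search body around the geo_distance filter.
// "fieldName" must be a field that is mapped as geo_point.
function buildGeoDistanceSearch(fieldName, lon, lat, distance) {
  if (lat < -90 || lat > 90 || lon < -180 || lon > 180) {
    throw new RangeError("latitude/longitude out of range");
  }
  const geoDistance = { distance: distance };
  // The array form is [ longitude, latitude ], in that order.
  geoDistance[fieldName] = [lon, lat];
  return {
    query: {
      filtered: {
        query: { match_all: {} },
        filter: { geo_distance: geoDistance }
      }
    }
  };
}

// Made-up coordinates, only to show the resulting JSON.
const geoBody = buildGeoDistanceSearch("location", -71.06, 42.36, "10mi");
console.log(JSON.stringify(geoBody, null, 2));
```

The resulting JSON can be sent as the body of a _search request. The range check is there only to catch swapped longitude/latitude values early.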
Related
I am using Elasticsearch in my project, and my requirement is to pull a large amount of MySQL data into Elasticsearch using the Elasticsearch JDBC River plugin. I need to sync a MySQL table to Elasticsearch, so I'm creating a mapping for the JDBC river index.
curl -XPOST http://localhost:9200/city -d '
{
"mappings" : {
"city_type": {
"properties" : {
"domain" : {
"type" : "multi_field",
"fields" : {
"domain" : {
"type" : "string",
"index" : "analyzed"
},
"exact" : {
"type" : "string",
"index" : "not_analyzed"
}
}
},
"sent_date" : {
"type" : "date",
"format" : "dateOptionalTime"
}
}
}
}
}'
After creating the mapping in Elasticsearch, I want to load the MySQL table data into it, so I'm using the following command.
curl -XPUT 'localhost:9200/river/city/_meta?pretty' -d '{
"type" : "jdbc",
"jdbc" : {
"url" : "jdbc:mysql://localhost:3306/test",
"user" : "root",
"password" : "root",
"sql" : "select id as _id,id as domain from city;",
"strategy":"oneshot"
},
"index" :{
"index" : "city",
"type" : "city_type",
"bulk_size":500
}
}'
Both commands run successfully, but when I then run the following search to find the data, the result is empty.
http://localhost:9200/river/_search?pretty&q=*
Please check the response of the above query here. Why is the data not showing up in the Elasticsearch query? Please help.
By the way, rivers have been deprecated: https://github.com/elastic/elasticsearch/issues/10345
I would highly recommend jprante's JDBC importer, a stand-alone Java application that can do the operations you need: https://github.com/jprante/elasticsearch-jdbc. It is not exactly a river as you have defined one.
Concerning your question, could you please try http://localhost:9200/_search?pretty&q=* ? With your syntax, you are actually looking for data in the index named river. You should search across all indices with the query I wrote, or in the city index: http://localhost:9200/city/city_type/_search?pretty&q=*
If I were in your shoes, I would use Logstash to push the data from MySQL to Elasticsearch. Rivers were deprecated a long time ago, as @Artholl already mentioned.
See https://www.elastic.co/blog/logstash-jdbc-input-plugin
Can somebody help me?
I tried to get the distance between two addresses using Google Maps, but when I swap the source and destination, Google Maps gives me a different response.
First Address: 1 Airport Drive, Oakland, CA 94621, USA
Second Address: 44085 Laurel Canyon Way, Fremont, CA 94539, USA
http://maps.googleapis.com/maps/api/directions/json?origin=Oakland+International+Airport+%28OAK%29,+1+Airport+Drive,+Oakland,+CA+94621,+USA&destination=44085+Laurel+Canyon+Way,+Fremont,+CA,+United%20States&sensor=false
Response:
"routes" : [
{
"bounds" : {
"northeast" : {
"lat" : 37.7311528,
"lng" : -121.932651
},
"southwest" : {
"lat" : 37.5066302,
"lng" : -122.2137008
}
},
"copyrights" : "Map data ©2015 Google",
"legs" : [
{
"distance" : {
"text" : "24.6 mi",
"value" : 39536
},
"duration" : {
"text" : "33 mins",
"value" : 1982
},
http://maps.googleapis.com/maps/api/directions/json?origin=44085+Laurel+Canyon+Way,+Fremont,+CA,+United%20States&destination=Oakland+International+Airport+%28OAK%29,+1+Airport+Drive,+Oakland,+CA+94621,+USA&sensor=false
Response:
"routes" : [
{
"bounds" : {
"northeast" : {
"lat" : 37.7325325,
"lng" : -121.932675
},
"southwest" : {
"lat" : 37.5081351,
"lng" : -122.2137008
}
},
"copyrights" : "Map data ©2015 Google",
"legs" : [
{
"distance" : {
"text" : "24.2 mi",
"value" : 38993
},
"duration" : {
"text" : "31 mins",
"value" : 1839
},
In the two response objects, both distance->text and distance->value are different.
I can't understand why this happens when the addresses are the same.
Google Maps does not compute the straight-line distance between two points; instead, it calculates the distance along a route.
There are various routes to reach the destination, so when you swap origin and destination, it may take an alternative route.
If you look at Google Maps (not the API), you can see that the routes are not the same.
This is often the case around airports.
Google's distance algorithm works on routes, so when the origin and destination are swapped, the distance and duration can differ. It also takes traffic into consideration and gives you the best optimized route, so you have no control over the distance and duration the API returns when the same origin and destination are flipped.
One workaround would be to specify waypoints along the route in both directions, so that the route does not change and the API would possibly return the same distance in both directions.
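To make the difference concrete, here is a minimal sketch that reads the route distance out of each response and compares the two directions. The function name routeMeters is hypothetical, and the two objects are trimmed-down stand-ins that contain only the distance values quoted in the question.

```javascript
// Sum the distance (in meters) over all legs of the first route
// in a Directions API JSON response.
function routeMeters(response) {
  const route = response.routes[0];
  return route.legs.reduce((total, leg) => total + leg.distance.value, 0);
}

// Trimmed-down responses holding only the values quoted in the question.
const oaklandToFremont = {
  routes: [{ legs: [{ distance: { text: "24.6 mi", value: 39536 } }] }]
};
const fremontToOakland = {
  routes: [{ legs: [{ distance: { text: "24.2 mi", value: 38993 } }] }]
};

const difference = routeMeters(oaklandToFremont) - routeMeters(fremontToOakland);
console.log(difference + " m"); // 543 m
```

A gap of a few hundred meters over a route of roughly 39 km is what you would expect from one-way ramps or a different road choice in one direction.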
I'm new to the map reduce concept, and even though I'm making some slow progress, I'm running into some issues that I need help with.
I have a simple collection consisting of an id, a city, and a destination, something like this:
{ "_id" : "5230e7e00000000000000000", "city" : "Boston", "to" : "Chicago" },
{ "_id" : "523fe7e00000000000000000", "city" : "New York", "to" : "Miami" },
{ "_id" : "5240e1e00000000000000000", "city" : "Boston", "to" : "Miami" },
{ "_id" : "536fe4e00000000000000000", "city" : "Washington D.C.", "to" : "Boston" },
{ "_id" : "53ffe7e00000000000000000", "city" : "New York", "to" : "Boston" },
{ "_id" : "5740e1e00000000000000000", "city" : "Boston", "to" : "Miami" },
...
(Please do note that this data is just made up for example purposes)
I'd like to group by city the destinations including a count:
{ "city" : "Boston", values : [{"Chicago",1}, {"Miami",2}] }
{ "city" : "New York", values : [{"Miami",1}, {"Boston",1}] }
{ "city" : "Washington D.C.", values : [{"Boston", 1}] }
For this I've started playing with this map function:
function() {
emit(this.city, this.to);
}
which performs the expected grouping. My reduce function is this:
function(key, values) {
var reduced = {"to":[]};
for (var i in values) {
var item = values[i];
reduced.to.push(item);
}
return reduced;
}
which gives roughly the expected output:
{ "_id" : ObjectId("522f8a9181f01e671a853adb"), "value" : { "to" : [ "Boston", "Miami" ] } }
{ "_id" : ObjectId("522f933a81f01e671a853ade"), "value" : { "to" : [ "Chicago", "Miami", "Miami" ] } }
{ "_id" : ObjectId("5231f0ed81f01e671a853ae0"), "value" : "Boston" }
As you can see, I still haven't counted the repeated cities, but as can be seen above, for some reason the last result in the output doesn't look good. I'd expected it to be
{ "_id" : ObjectId("5231f0ed81f01e671a853ae0"), "value" : { "to" : ["Boston"] } }
Has this anything to do with the fact that there is a single item? Is there any way to obtain this?
Thank you.
I see you are asking about a PHP issue, but you are using JavaScript to ask, so I’m assuming a JavaScript answer will help you move things along. As such, here is the JavaScript needed in the shell to run your aggregation. I strongly suggest getting your aggregation working in the shell (or some other JavaScript editor) first and then translating it into the language of your choice. It is a lot easier to see what is going on, and you get there faster, using this method. You can then run:
use admin
db.runCommand( { setParameter: 1, logLevel: 2 } )
to check the BSON output of your selected language against what the shell produces. This will appear in the terminal if mongo is in the foreground; otherwise you’ll have to look in the logs.
Summing the routes in the aggregation framework [AF] with Mongo is fairly straightforward. The AF is faster and easier to use than map reduce [MR]. In this case, though, they both have a similar issue: simply pushing to an array won’t yield a count in and of itself (in MR you either need more logic in your reduce function or a finalize function).
With the AF using the example data provided this pipeline is useful:
db.agg1.aggregate([
{$group:{
_id: { city: "$city", to: "$to" },
count: { $sum: 1 }
}},
{$group: {
_id: "$_id.city",
to:{ $push: {to: "$_id.to", count: "$count"}}
}}
]);
The aggregation framework can only operate on known fields, but a pipeline can contain many operations, so a problem needs to be broken down with that in mind.
Above, the first stage calculates the numbers needed, for which there are 3 fixed fields: the source, the destination, and the count.
The second stage has 2 fixed fields, one of which is an array, which is only being pushed to (all the data for the final form is there).
For MR you can do this:
var map = function() {
var key = {source:this.city, dest:this.to};
emit(key, 1);
};
var reduce = function(key, values) {
return Array.sum(values);
};
A separate function will have to prettify the output, however.
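One possible shape for that separate function, as a sketch in plain JavaScript: it regroups the per-route counts by source city. The name groupByCity and the output field names (city, values, to, count) are assumptions, and the input rows are the counts you would get from the sample documents in the question.

```javascript
// Hypothetical post-processing step: regroup per-route counts by source city.
// Input rows mimic the map-reduce output: { _id: { source, dest }, value: count }.
function groupByCity(rows) {
  const byCity = new Map();
  for (const row of rows) {
    const city = row._id.source;
    if (!byCity.has(city)) {
      byCity.set(city, []);
    }
    byCity.get(city).push({ to: row._id.dest, count: row.value });
  }
  // Map preserves insertion order, so cities appear in first-seen order.
  return Array.from(byCity, ([city, values]) => ({ city, values }));
}

// Counts derived from the six sample documents in the question.
const routeCounts = [
  { _id: { source: "Boston", dest: "Chicago" }, value: 1 },
  { _id: { source: "Boston", dest: "Miami" }, value: 2 },
  { _id: { source: "New York", dest: "Boston" }, value: 1 },
  { _id: { source: "New York", dest: "Miami" }, value: 1 },
  { _id: { source: "Washington D.C.", dest: "Boston" }, value: 1 }
];
console.log(JSON.stringify(groupByCity(routeCounts), null, 2));
```

Note that MongoDB does not call the reduce function for a key that has only a single emitted value, which is why the question's output shows a bare "Boston" for the single-item city. With the map function above, such a key simply keeps the emitted 1, so the counts stay consistent.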
If you have any additional questions please don’t hesitate to ask.
Best,
Charlie
I have a pretty big MongoDB document that holds all kinds of data. I need to identify the fields that are of type array in a collection so I can remove them from the displayed fields in the grid that I will populate.
My current method consists of retrieving all the field names in the collection with the command below, which was taken from the answer posted in "MongoDB Get names of all keys in collection":
mr = db.runCommand({
"mapreduce" : "Product",
"map" : function() {
for (var key in this) { emit(key, null); }
},
"reduce" : function(key, stuff) { return null; },
"out": "things" + "_keys"
})
db[mr.result].distinct("_id")
and then running a query like this one for each of the fields:
db.Product.find( { $where : "Array.isArray(this.Orders)" } ).count()
If there's anything retrieved the field is considered an array.
I don't like that I need to run n+2 queries (n being the number of different fields in my collection), and I wouldn't like to hardcode the fields in the model. It would defeat the whole purpose of using MongoDB.
Is there a better method of doing this?
I made a couple of slight modifications to the code you provided above:
mr = db.runCommand({
"mapreduce" : "Product",
"map" : function() {
for (var key in this) {
if (Array.isArray(this[key])) {
emit(key, 1);
} else {
emit(key, 0);
}
}
},
"reduce" : function(key, stuff) { return Array.sum(stuff); },
"out": "Product" + "_keys"
})
Now, the mapper will emit a 1 for keys that contain arrays, and a 0 for any that do not. The reducer will sum these up, so that when you check your end result:
db[mr.result].find()
You will see your field names with the number of documents in which they contain Array values (and a 0 for any that are never arrays).
So this should give you which fields contain Array types with just the map-reduce job.
--
Just to see it with some data:
db.Product.insert({"a":[1,2,3], "c":[1,2]})
db.Product.insert({"a":1, "b":2})
db.Product.insert({"a":1, "c":[2,3]})
(now run the "mr =" code above)
db[mr.result].find()
{ "_id" : "_id", "value" : 0 }
{ "_id" : "a", "value" : 1 }
{ "_id" : "b", "value" : 0 }
{ "_id" : "c", "value" : 2 }
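From there, picking out the fields to hide from the grid is a small filtering step. A minimal sketch, assuming the result documents have been copied into a plain array (the function name arrayFieldNames is made up for illustration):

```javascript
// Keep only the field names whose summed value is greater than zero,
// i.e. fields that held an array in at least one document.
function arrayFieldNames(results) {
  return results.filter((r) => r.value > 0).map((r) => r._id);
}

// The map-reduce output from the example above, copied as plain objects.
const keyCounts = [
  { _id: "_id", value: 0 },
  { _id: "a", value: 1 },
  { _id: "b", value: 0 },
  { _id: "c", value: 2 }
];
console.log(arrayFieldNames(keyCounts)); // [ 'a', 'c' ]
```

In the shell, the input array could presumably come from db[mr.result].find().toArray(). Note that "a" is reported even though it is an array in only one of the three documents, so decide whether "sometimes an array" should already exclude a field.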
I'm using the date_histogram API to get the actual count per interval (hour, day, week, or month). I also have a feature that I'm having trouble implementing: a user can filter the results by entering a startDate and an endDate (text boxes), which should be matched against a timestamp field. So how can I filter the results on just that one field (TIMESTAMP) while using the date_histogram API, or any other API, to achieve my desired result?
In SQL I would just use a BETWEEN operator to get the result, but from what I've read so far there is no BETWEEN operator in Elasticsearch (not sure).
I have this script so far:
curl 'http://anotherdomain.com:9200/myindex/_search?pretty=true' -d '{
"query" : {
"filtered" : {
"filter" : {
"exists" : {
"field" : "adid"
}
},
"query" : {
"query_string" : {
"fields" : [
"adid", "imp"
],
"query" : "525826 AND true"
}
}
}
},
"facets" : {
"histo1":{
"date_histogram":{
"field":"timestamp",
"interval":"day"
}
}
}
}'
In Elasticsearch you can use a range query or a range filter to achieve that.
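As a sketch of how that could look with the request from the question, the helper below adds a range filter on timestamp next to the existing exists filter by wrapping both in a bool filter. The function name buildHistogramSearch and the date strings are assumptions; the accepted date format depends on how the timestamp field is mapped.

```javascript
// Hypothetical helper: restricts the question's search to a timestamp window
// and keeps the date_histogram facet on the same field.
function buildHistogramSearch(startDate, endDate, interval) {
  return {
    query: {
      filtered: {
        query: {
          query_string: { fields: ["adid", "imp"], query: "525826 AND true" }
        },
        filter: {
          bool: {
            must: [
              { exists: { field: "adid" } },
              // Inclusive bounds on both ends, comparable to SQL BETWEEN.
              { range: { timestamp: { gte: startDate, lte: endDate } } }
            ]
          }
        }
      }
    },
    facets: {
      histo1: { date_histogram: { field: "timestamp", interval: interval } }
    }
  };
}

// Made-up dates, only to show the resulting JSON.
const histogramBody = buildHistogramSearch("2013-09-01", "2013-09-30", "day");
console.log(JSON.stringify(histogramBody, null, 2));
```

Because the range filter sits inside the filtered query, the facet only counts documents inside the date window. A filter placed at the top level of the request would narrow the hits but not the facet counts.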