Elasticsearch won't apply not_analyzed to my mapping - php

When I try to apply "not_analyzed" to my ES mapping, it doesn't work.
I am using the Elasticquent package for ES in Laravel.
My mapping looks like this:
'ad_title' => [
'type' => 'string',
'analyzer' => 'standard'
],
'ad_type' => [
'type' => 'integer',
'index' => 'not_analyzed'
],
'ad_type' => [
'type' => 'integer',
'index' => 'not_analyzed'
],
'ad_state' => [
'type' => 'integer',
'index' => 'not_analyzed'
],
Afterwards I make a GET call to the API to view the mapping, and it outputs:
"testindex": {
"mappings": {
"ad_ad": {
"properties": {
"ad_city": {
"type": "integer"
},
"ad_id": {
"type": "long"
},
"ad_state": {
"type": "integer"
},
"ad_title": {
"type": "string",
"analyzer": "standard"
},
"ad_type": {
"type": "integer"
},
Note that not_analyzed is missing.
I can't see any errors or warnings in my logs either.

In my experience, you must create the mapping before you index any documents. Delete the index you've created, apply your not_analyzed mapping, and then index your documents again; the not_analyzed setting should then appear. Please let me know if this works for you.

There is an easy test to see if it works:
PUT /stack
{
"mappings": {
"try": {
"properties": {
"name": {
"type": "string",
"index": "not_analyzed"
},
"age": {
"type": "integer",
"index": "not_analyzed"
}
}
}
}
}
GET /stack/_mapping
The response keeps not_analyzed on the string field but does not show it on the integer field:
{
"stack": {
"mappings": {
"try": {
"properties": {
"age": {
"type": "integer"
},
"name": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
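To spot which fields lost a setting, the requested mapping can be compared with the one Elasticsearch reports back. The following is only a minimal sketch: `dropped_settings` is a hypothetical helper, not part of Elasticsearch or Elasticquent, and the two dictionaries are copied by hand from the request and response above.

```python
def dropped_settings(requested, returned):
    """Return {field: [setting, ...]} for settings present in the requested
    mapping but absent from the mapping Elasticsearch reports back."""
    dropped = {}
    for field, wanted in requested.items():
        actual = returned.get(field, {})
        missing = sorted(key for key in wanted if key not in actual)
        if missing:
            dropped[field] = missing
    return dropped


# "properties" sent in the PUT /stack request above
requested = {
    "name": {"type": "string", "index": "not_analyzed"},
    "age": {"type": "integer", "index": "not_analyzed"},
}

# "properties" from the GET /stack/_mapping response above
returned = {
    "age": {"type": "integer"},
    "name": {"type": "string", "index": "not_analyzed"},
}

print(dropped_settings(requested, returned))  # {'age': ['index']}
```

Only the integer field is reported, which matches what the response above shows.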

Related

Elasticsearch OR query for comma-separated values

I am saving IDs in the database as a comma-separated string and indexing the same value in Elasticsearch. Now I need to retrieve the row if the user_id matches one of the values.
For example, the column user_ids is stored like this in the index (the database type is varchar(500); in Elasticsearch it is text):
8938,8936,8937
$userId = 8936; // For example expecting to return that row
$whereCondition = [];
$whereCondition[] = [
"query_string" => [
"query"=> $userId,
"default_field" => "user_ids",
"default_operator" => "OR"
]
];
$searchParams = [
'query' => [
'bool' => [
'must' => [
$whereCondition
],
'must_not' => [
['exists' => ['field' => 'deleted_at']]
]
]
],
"size" => 10000
];
User::search($searchParams);
JSON query
{
"query": {
"bool": {
"must": [
[{
"query_string": {
"query": 8936,
"default_field": "user_ids",
"default_operator": "OR"
}
}]
],
"must_not": [
[{
"exists": {
"field": "deleted_at"
}
}]
]
}
},
"size": 10000
}
Mapping details
{
"user_details_index": {
"aliases": {},
"mappings": {
"test_type": {
"properties": {
"created_at": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
},
"deleted_at": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
},
"updated_at": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
},
"user_ids": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
},
"settings": {
"index": {
"creation_date": "1546404165500",
"number_of_shards": "5",
"number_of_replicas": "1",
"uuid": "krpph26NTv2ykt6xE05klQ",
"version": {
"created": "6020299"
},
"provided_name": "user_details_index"
}
}
}
}
I am trying the above logic, but I am unable to retrieve the row. Can someone help with this?
Since the field user_ids is of type text and no analyzer is specified for it, it uses the standard analyzer by default, which won't break 8938,8936,8937 into the terms 8938, 8936 and 8937, and hence the ID can't match.
To solve this, I would suggest storing an array of IDs in the user_ids field instead of a CSV string. While indexing, your JSON input should look like this:
{
...
"user_ids": [
8938,
8936,
8937
]
...
}
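Converting the stored comma-separated string into such an array before indexing could look like the sketch below. `csv_to_ids` and the `row` dictionary are hypothetical names used only for illustration, and the example assumes every segment is a valid integer.

```python
def csv_to_ids(csv_value):
    """Turn a string such as "8938,8936,8937" into a list of integers.

    Empty segments are skipped; whitespace around a number is tolerated.
    """
    return [int(part) for part in csv_value.split(",") if part.strip()]


# Value as it is stored in the database column (assumed shape)
row = {"user_ids": "8938,8936,8937"}

# Document body to send to Elasticsearch
document = {"user_ids": csv_to_ids(row["user_ids"])}

print(document)  # {'user_ids': [8938, 8936, 8937]}
```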
Since the user IDs are integer values, the mapping should be changed as follows:
{
"user_ids": {
"type": "integer"
}
}
The query will now be as follows (note that the field name must match the mapping, user_ids):
{
"query": {
"bool": {
"filter": [
{
"terms": {
"user_ids": [
8936
]
}
}
],
"must_not": [
{
"exists": {
"field": "deleted_at"
}
}
]
}
},
"size": 10000
}

mapper_parsing_exception in the Elasticsearch PHP client

Error
[body] => {"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"failed to parse"}],"type":"mapper_parsing_exception","reason":"failed to parse","caused_by":{"type":"not_x_content_exception","reason":"Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes"}},"status":400}
I'm getting an error while adding the documents to my index.
http://localhost:9595/patient_trimester
{
"patient_trimester": {
"aliases": {
},
"mappings": {
"_default_": {
"_all": {
"enabled": true
},
"dynamic_templates": [
{
"string_fields": {
"mapping": {
"index": "not_analyzed",
"omit_norms": true,
"type": "string"
},
"match": "*",
"match_mapping_type": "string"
}
}
],
"properties": {
"#version": {
"type": "string",
"index": "not_analyzed"
}
}
},
"patient_trimester": {
"_all": {
"enabled": true
},
"dynamic_templates": [
{
"string_fields": {
"mapping": {
"index": "not_analyzed",
"omit_norms": true,
"type": "string"
},
"match": "*",
"match_mapping_type": "string"
}
}
],
"properties": {
"#timestamp": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"#version": {
"type": "string",
"index": "not_analyzed"
},
"last_consult_by": {
"type": "string",
"index": "not_analyzed"
},
"mpi": {
"type": "string",
"index": "not_analyzed"
},
"bill_id": {
"type": "integer"
},
"bill_date": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"site": {
"type": "string",
"index": "not_analyzed"
},
"effective_edd": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"is_converted": {
"type": "integer"
},
"admitting_physician": {
"type": "string",
"index": "not_analyzed"
},
"days": {
"type": "integer"
},
"trim": {
"type": "string",
"index": "not_analyzed"
},
"tags": {
"type": "string",
"index": "not_analyzed"
}
}
}
},
"warmers": {
}
}
}
This is how I created the index through Postman.
In the $result variable I'm sending:
(
[last_consult_by] => xxxxxx
[mpi] => xxxxxxxx
[bill_id] => 176073
[bill_date] => 2018-07-12 12:00:00
[site] => xxx
[effective_edd] => 2018-07-28 12:00:00
[is_converted] => 0
[admitting_physician] => xxxxxxxxx
[days] => 16
[trim] => Array
(
[trim3] => 1
)
)
$params = [
'index' => 'patient_trimester',
'type' => 'patient_trimester',
'body' => $result
];
$res = $client->index($params);
print_r($res); exit;
I don't understand why the mapper_parsing_exception is happening.
Is it because of the data types in my mapping? Is the mapping for effective_edd, bill_date and trim correct?
Please help me resolve this issue.

Elasticsearch match combination in array

I'm adding Elasticsearch to my Laravel application using the PHP package from Elasticsearch.
My application is a small job board, and currently my job document looks like this:
{
"_index":"jobs",
"_type":"job",
"_id":"19",
"_score":1,
"_source":{
"0":"",
"name":"Programmer",
"description":"This is my first job! :)",
"text":"Programming is awesome",
"networks":[
{
"id":1,
"status":"PRODUCTION",
"start":"2015-02-26",
"end":"2015-02-26"
},
{
"id":2,
"status":"PAUSE",
"start":"2015-02-26",
"end":"2015-02-26"
}
]
}
}
As you can see, a job can be attached to multiple networks. In my search query I would like to include WHERE network.id == 1 AND network.status == PRODUCTION.
My current query looks like this. However, it returns documents that have a network with id 1 as long as any of their networks has status PRODUCTION. Is there any way I can enforce both conditions within one network?
$query = [
'index' => $this->index,
'type' => $this->type,
'body' => [
'query' => [
'bool' => [
'must' => [
// the clause type was lost from the post; 'match' is assumed here
['match' => ['networks.id' => 1]],
['match' => ['networks.status' => 'PRODUCTION']]
],
'should' => [
['match' => ['name' => $query]],
['match' => ['text' => $query]],
['match' => ['description' => $query]],
],
],
],
],
];
You need to specify that the objects in the networks array should be stored as individual objects in the index. This allows you to search on individual network objects. You can do so using the nested type in Elasticsearch.
Also, if you are doing exact matches, it is better to use a filter rather than a query: filters skip scoring and can be cached, so they usually perform better.
Create your index with a new mapping, using the nested type for the networks array.
POST /test
{
"mappings": {
"job": {
"properties": {
"networks": {
"type": "nested",
"properties": {
"status": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
}
}
Add a document:
POST /test/job/1
{
"0": "",
"name": "Programmer",
"description": "This is my first job! :)",
"text": "Programming is awesome",
"networks": [
{
"id": 1,
"status": "PRODUCTION",
"start": "2015-02-26",
"end": "2015-02-26"
},
{
"id": 2,
"status": "PAUSE",
"start": "2015-02-26",
"end": "2015-02-26"
}
]
}
As you have a nested type you will need to use a nested filter.
POST /test/job/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"nested": {
"path": "networks",
"filter": {
"bool": {
"must": [
{
"term": {
"networks.id": "1"
}
},
{
"term": {
"networks.status.raw": "PRODUCTION"
}
}
]
}
}
}
}
}
}
}
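Why the nested type is needed can be illustrated with a toy model. The sketch below does not call Elasticsearch; `flattened_match` and `nested_match` are hypothetical functions that only imitate how an array of plain objects loses the link between id and status, while nested objects keep it.

```python
def flattened_match(doc, network_id, status):
    """Model of the default object mapping: each inner field becomes one
    multi-valued field, so the link between id and status is lost."""
    ids = {network["id"] for network in doc["networks"]}
    statuses = {network["status"] for network in doc["networks"]}
    return network_id in ids and status in statuses


def nested_match(doc, network_id, status):
    """Model of the nested type: both conditions must hold inside the
    same network object."""
    return any(
        network["id"] == network_id and network["status"] == status
        for network in doc["networks"]
    )


job = {
    "networks": [
        {"id": 1, "status": "PRODUCTION"},
        {"id": 2, "status": "PAUSE"},
    ]
}

# id 2 and PRODUCTION belong to different networks
print(flattened_match(job, 2, "PRODUCTION"))  # True (unwanted hit)
print(nested_match(job, 2, "PRODUCTION"))     # False
print(nested_match(job, 1, "PRODUCTION"))     # True
```

The first call shows the behaviour described in the question; the other two show what the nested filter above enforces.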

Not sure if my mapping worked in Elasticsearch

I'm using ES with my Laravel app through the Elasticquent package.
My mapping looks like this before I index my DB:
'ad_title' => [
'type' => 'string',
'analyzer' => 'standard'
],
'ad_type' => [
'type' => 'integer',
'index' => 'not_analyzed'
],
'ad_type' => [
'type' => 'integer',
'index' => 'not_analyzed'
],
'ad_state' => [
'type' => 'integer',
'index' => 'not_analyzed'
],
But when I call _mapping?pretty afterwards, my mapping looks like this:
"testindex": {
"mappings": {
"ad_ad": {
"properties": {
"ad_city": {
"type": "integer"
},
"ad_id": {
"type": "long"
},
"ad_state": {
"type": "integer"
},
"ad_title": {
"type": "string",
"analyzer": "standard"
},
"ad_type": {
"type": "integer"
},
Shouldn't I be able to see 'index' => 'not_analyzed' in my mapping afterwards? Or does 'index' => 'not_analyzed' not show up in the mapping structure?
You are correct, the mapping did not get applied. You would see not_analyzed in the output of the mapping API if it had been applied correctly.
Make sure that you apply the mapping BEFORE you write any data. We apply the mapping on application startup to verify that the mapping is always correct and to apply any mapping updates.
Here is a sample of how to apply a mapping:
PUT hilden1
PUT hilden1/type1/_mapping
{
"properties": {
"regular": {
"type": "string"
},
"indexSpecified": {
"type": "string",
"index": "not_analyzed"
}
}
}
To verify the mapping, use the GET API:
GET hilden1/type1/_mapping
You should see that the field "regular" only specifies its type, whereas "indexSpecified" is listed as not_analyzed. Here is the output from my machine running ES 1.4.4:
{
"hilden1": {
"mappings": {
"type1": {
"properties": {
"indexSpecified": {
"type": "string",
"index": "not_analyzed"
},
"regular": {
"type": "string"
}
}
}
}
}
}

Elasticsearch Snowball Analyzer wants exact word

I have been using Elasticsearch for a project, but I find the results of the snowball analyzer a bit strange.
Below is an example of the mapping I use.
$myTypeMapping = array(
'_source' => array(
'enabled' => true
),
'properties' => array(
'id' => array(
'type' => 'integer',
'index' => 'not_analyzed'
),
'name' => array(
'type' => 'string',
'analyzer' => 'snowball',
'boost' => 2.0
),
'food_types' => array(
'type' => 'string',
'analyzer' => 'keyword'
),
'location' => array(
'type' => 'geo_point',
"geohash_precision"=> 4
),
'city' => array(
'type' => 'string',
'analyzer' => 'keyword'
)
)
);
$indexParams['body']['mappings']['online_pizza'] = $myTypeMapping;
// Create the index
$elastic_client->indices()->create($indexParams);
On querying http://localhost:9200/online_pizza/online_pizza/_mapping I get the following result:
{
"online_pizza": {
"properties": {
"city": {
"type": "string",
"analyzer": "keyword"
},
"food_types": {
"type": "string",
"analyzer": "keyword"
},
"id": {
"type": "integer"
},
"location": {
"type": "geo_point",
"geohash_precision": 4
},
"name": {
"type": "string",
"boost": 2,
"analyzer": "snowball"
}
}
}
}
My question: I have data whose name field is "Milano". On querying for "Milano" I get the desired result, but if I query for "Milan" or "Mil" I get no results.
{
"query": {
"query_string": {
"default_field": "name",
"query": "Milan"
}
}
}
I've also tried specifying the snowball analyzer at query time, which didn't help.
{
"query": {
"query_string": {
"default_field": "name",
"query": "Milan",
"analyzer": "snowball"
}
}
}
My second question: keyword search is case sensitive, e.g. Pizza != pizza. How do I get around this?
Thanks.
The snowball stemmer doesn't want exact words. If you try it with jumping, it outputs jump as expected.
However, depending on the word, it may be understemmed because it doesn't match any stemmer rule.
If you use the analyze API endpoint, you will see that analyzing Milano with the snowball analyzer gives you the token milano:
GET _analyze?analyzer=snowball&text=Milano
Output :
{
"tokens": [
{
"token": "milano",
"start_offset": 0,
"end_offset": 6,
"type": "<ALPHANUM>",
"position": 1
}
]
}
Then, using the same snowball analyzer on Mil like this:
GET _analyze?analyzer=snowball&text=Mil
gives you this token :
{
"tokens": [
{
"token": "mil",
"start_offset": 0,
"end_offset": 3,
"type": "<ALPHANUM>",
"position": 1
}
]
}
That's why searching for 'Milan' or 'Mil' won't match 'Milano' documents: neither query term equals the milano term stored in the index.
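The effect can be reproduced with a toy inverted index. This is only a sketch: `analyze` is a stand-in that lowercases words (which is all the snowball analyzer does to "Milano" and "Mil" according to the output above), and `build_index` and `search` are hypothetical helpers, not Elasticsearch APIs.

```python
def analyze(text):
    # Stand-in for the snowball analyzer: for the words used here it is
    # assumed to do nothing beyond lowercasing, as the _analyze output shows.
    return [word.lower() for word in text.split()]


def build_index(docs):
    """Map each analyzed term to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing any analyzed query term exactly."""
    hits = set()
    for term in analyze(query):
        hits |= index.get(term, set())
    return hits


index = build_index({1: "Milano"})

print(search(index, "Milano"))  # {1}
print(search(index, "Milan"))   # set()
print(search(index, "Mil"))     # set()
```

Lookup is by whole term, so a shorter string never matches a longer stored term.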
For your second question, you can define a custom analyzer that combines the keyword tokenizer with a lowercase token filter, which makes your keyword search case-insensitive (provided you use the same analyzer at search time):
POST index_name
{
"analysis": {
"analyzer": {
"case_insensitive_keyword": {
"type": "custom",
"tokenizer": "keyword",
"filter": ["lowercase"]
}
}
}
}
Test:
GET index_name/_analyze?analyzer=case_insensitive_keyword&text=Choo Choo
Output:
{
"tokens": [
{
"token": "choo choo",
"start_offset": 0,
"end_offset": 9,
"type": "word",
"position": 1
}
]
}
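What this analyzer does to a value can be imitated in a few lines. The function below is a hypothetical model of the keyword tokenizer followed by the lowercase filter, not code that talks to Elasticsearch.

```python
def case_insensitive_keyword(text):
    """Model of the custom analyzer above."""
    # keyword tokenizer: the whole input becomes a single token
    tokens = [text]
    # lowercase token filter: each token is lowercased
    return [token.lower() for token in tokens]


print(case_insensitive_keyword("Choo Choo"))  # ['choo choo']

# Index-time and search-time values now produce the same term
print(case_insensitive_keyword("Pizza") == case_insensitive_keyword("pizza"))  # True
```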
I hope my explanations are clear enough :)
