Not sure if my mapping worked in Elasticsearch - php

I'm using ES with my Laravel app via the Elasticquent package.
My mapping looks like this before I index my DB:
'ad_title' => [
'type' => 'string',
'analyzer' => 'standard'
],
'ad_type' => [
'type' => 'integer',
'index' => 'not_analyzed'
],
'ad_type' => [
'type' => 'integer',
'index' => 'not_analyzed'
],
'ad_state' => [
'type' => 'integer',
'index' => 'not_analyzed'
],
But when I call the _mapping?pretty API afterwards, my mapping looks like this:
"testindex": {
"mappings": {
"ad_ad": {
"properties": {
"ad_city": {
"type": "integer"
},
"ad_id": {
"type": "long"
},
"ad_state": {
"type": "integer"
},
"ad_title": {
"type": "string",
"analyzer": "standard"
},
"ad_type": {
"type": "integer"
},
Shouldn't I be able to see 'index' => 'not_analyzed' in my mapping afterwards? Or does 'index' => 'not_analyzed' not show up in the mapping structure?

You are correct, the mapping did not get applied. You would see the not_analyzed in the mapping API if it was applied correctly.
Make sure that you apply the mapping BEFORE you write any data. We apply the mapping on application startup to verify the mapping is always correct and to apply any mapping updates.
Here is a sample of how to apply a mapping:
PUT hilden1
PUT hilden1/type1/_mapping
{
"properties": {
"regular": {
"type": "string"
},
"indexSpecified": {
"type": "string",
"index": "not_analyzed"
}
}
}
To verify the mapping, use the GET API:
GET hilden1/type1/_mapping
You should see that the field "regular" only specifies its type, whereas "indexSpecified" is listed as not_analyzed. Here is the output from my machine running ES 1.4.4:
{
"hilden1": {
"mappings": {
"type1": {
"properties": {
"indexSpecified": {
"type": "string",
"index": "not_analyzed"
},
"regular": {
"type": "string"
}
}
}
}
}
}
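The check described above can also be scripted. Below is a minimal sketch in plain PHP, assuming the _mapping response has already been fetched and is available as a JSON string; the helper name mappingHasSetting is hypothetical and not part of Elasticquent or the Elasticsearch client.

```php
<?php
// Hypothetical helper (not a library function): looks up one setting of one
// field in a decoded _mapping response and compares it to an expected value.
function mappingHasSetting(array $mapping, string $index, string $type, string $field, string $key, $expected): bool
{
    $props = $mapping[$index]['mappings'][$type]['properties'] ?? [];
    return isset($props[$field][$key]) && $props[$field][$key] === $expected;
}

// The response shown above, pasted in as a JSON string for illustration.
$json = <<<'JSON'
{"hilden1":{"mappings":{"type1":{"properties":{
  "indexSpecified":{"type":"string","index":"not_analyzed"},
  "regular":{"type":"string"}
}}}}}
JSON;
$mapping = json_decode($json, true);

var_dump(mappingHasSetting($mapping, 'hilden1', 'type1', 'indexSpecified', 'index', 'not_analyzed')); // bool(true)
var_dump(mappingHasSetting($mapping, 'hilden1', 'type1', 'regular', 'index', 'not_analyzed'));        // bool(false)
```

A check like this could run at application startup, right after the mapping is applied and before any documents are written.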

Related

Elasticsearch PHP add routing to child document

Database Server:
Elasticsearch 7.9.2
Centos 7.7
Dev env:
PHP 7.3.11
MacOS
I am fairly new to Elasticsearch, so please bear with me on this one.
It is driving me crazy though.
I am trying to do something very easy, but since I come from the relational database world, I need some mind bending. I have created a mapping with a parent-child relationship.
Product --> Price
This is the mapping I created:
PUT /products_pc
{
"mappings": {
"properties": {
"datafeed_id": {
"type": "integer"
},
"date_add": {
"type": "date"
},
"description": {
"type": "text"
},
"ean": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"image_url": {
"type": "text",
"index": false
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"sku": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"webshop_id": {
"type": "integer"
},
"price": {
"type": "float"
},
"url": {
"type": "text"
},
"date_mod":{
"type": "date"
},
"product_price" : {
"type":"join",
"relations": {
"product":"price"
}
}
}
}
}
So far so good. When I manually add a product and 2 prices I can get what I would expect: 1 parent with 2 child documents.
Now on to PHP: I am able to index the parent document, but not the child documents. It looks like I am not able to send along a routing parameter (which I can do in Kibana).
This is what I tried in PHP, parent _id = 123
$hosts = ['xxx.xxx.xxx.xxx:9200'];
$client = ClientBuilder::create()
->setHosts($hosts)
->build();
$params['body'][] = [
'create' => [
'_index' => 'products_pc',
'_id' => '123_1'
]
];
$params['body'][] = [
'webshop_id' => 1,
'date_mod' => time(),
'price' => 12,
'url' => '',
'product_price' => [
'name' => 'price',
'parent' => 123
]
];
$client->bulk($params);
But this does not work, as there is no routing set. If I add '_routing' => 123 below the _id field, I get a 400 error telling me the _routing parameter is unknown ("Action/metadata line [3] contains an unknown parameter [_routing]").
I have been searching for 2 days now, running in circles. All the different Elasticsearch versions are slightly different, so I have to admit that I am lost. Is there anybody who can point out my mistake, or give me a hint in the right direction? It is driving me crazy. (I am afraid it will turn out to be too simple...)
Thanks in advance!
So here we are, after 2 more days of searching... But I have found the solution it seems...
After some more hours searching I ended up at this page (again):
https://elastic.co/guide/en/elasticsearch/client/php-api/current/ElasticsearchPHP_Endpoints.html#Elasticsearch_Clientbulk_bulk
And there it was, in the params list of the bulk endpoint:
$params['routing'] = // (string) Specific routing value
Not quite sure how to use this at first, but...
Then I tried this for each of the child documents, which seems to be doing the trick!
$hosts = ['xxx.xxx.xxx.xxx:9200'];
$client = ClientBuilder::create()
->setHosts($hosts)
->build();
// insert price
$params['body'][] = [
'index' => [
'_index' => 'products_pc',
'_id' => '123_1',
'routing' => 123 // <-- Insert routing here.
]
];
$params['body'][] = [
'webshop_id' => 1,
'date_mod' => time(),
'price' => 12,
'url' => '',
'product_price' => [
'name' => 'price',
'parent' => 123 // <-- Parent _id value
]
];
$client->bulk($params);
As thought before, too easy actually. But I guess that is the life of a programmer.
Please be aware, though, that a LOT of documentation mentions the _routing field (even the official docs for version 7.9: https://www.elastic.co/guide/en/elasticsearch/reference/7.9/mapping-routing-field.html, both in the text and in the right-hand submenu under metadata fields), but the parameter in the bulk action line is actually just "routing". Might save you a couple of days ;-)
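When there are many prices per product, the pairs of action line and source line can be generated in a loop. The following is only a sketch in plain PHP: the function name buildPriceBulkBody and the sample price values are made up for illustration, and the actual bulk call is left as a comment because it needs the client built above.

```php
<?php
// Hypothetical helper: builds the 'body' array for $client->bulk() so that
// every child (price) document is routed by its parent's _id.
function buildPriceBulkBody(string $index, string $parentId, array $prices): array
{
    $body = [];
    foreach ($prices as $childId => $fields) {
        // Action/metadata line: the parameter is "routing", not "_routing".
        $body[] = [
            'index' => [
                '_index'  => $index,
                '_id'     => (string) $childId,
                'routing' => $parentId,
            ],
        ];
        // Source line: the join field names the relation and the parent _id.
        $body[] = $fields + [
            'product_price' => ['name' => 'price', 'parent' => $parentId],
        ];
    }
    return $body;
}

// Sample values are invented for this example.
$params = ['body' => buildPriceBulkBody('products_pc', '123', [
    '123_1' => ['webshop_id' => 1, 'price' => 12.0, 'url' => ''],
    '123_2' => ['webshop_id' => 2, 'price' => 11.5, 'url' => ''],
])];

// $client->bulk($params); // needs the elasticsearch-php client from the answer
echo count($params['body']), "\n"; // 4: two action lines and two source lines
```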

Laravel collection formatting key to field

I have a laravel collection that has this format:
{
"24385528032901": [
{
"time": "2020-06-30T22:30:00.000000Z",
"conso_prod": "Prod",
"meter_id": "24385528032901",
"delta": "0",
},
{
"time": "2020-06-30T23:00:00.000000Z",
"conso_prod": "Prod",
"meter_id": "24385528032901",
"delta": "2",
}
],
"24385528032777": [
{
"time": "2020-06-30T22:30:00.000000Z",
"conso_prod": "Prod",
"meter_id": "24385528032777",
"delta": "0",
},
{
"time": "2020-06-30T23:00:00.000000Z",
"conso_prod": "Prod",
"meter_id": "24385528032777",
"delta": "5",
}
], etc.
}
I'm having a hard time converting it to the Chart.js line chart dataset format:
[
[
'label' => '24385528032901',
'data' => $measures->map->delta,
], [
'label' => '24385528032777',
'data' => $measures->map->delta,
],
]
I know there is a collection method to do this, but I can't find it anymore. Anyone?
If I understood your question correctly, you want meter_id and delta as label and data. You can select only the meter_id and delta fields, then use the collection's map helper to map them to the new field names, like so:
Model::get(['meter_id', 'delta'])->map(function (Model $model) {
return [
'label' => $model->meter_id,
'data' => $model->delta,
];
})->toArray();
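Since the data in the question is already grouped by meter id, the conversion can also start from that grouped structure. Here is a rough sketch in plain PHP arrays (so it runs without Laravel); the function name toChartDatasets is hypothetical. On a real Laravel collection, something like $grouped->map(fn ($measures, $meterId) => ['label' => (string) $meterId, 'data' => $measures->pluck('delta')])->values() should produce the same shape, but that variant is untested here.

```php
<?php
// Sample data shaped like the grouped collection in the question (trimmed).
$grouped = [
    '24385528032901' => [
        ['time' => '2020-06-30T22:30:00.000000Z', 'meter_id' => '24385528032901', 'delta' => '0'],
        ['time' => '2020-06-30T23:00:00.000000Z', 'meter_id' => '24385528032901', 'delta' => '2'],
    ],
    '24385528032777' => [
        ['time' => '2020-06-30T22:30:00.000000Z', 'meter_id' => '24385528032777', 'delta' => '0'],
        ['time' => '2020-06-30T23:00:00.000000Z', 'meter_id' => '24385528032777', 'delta' => '5'],
    ],
];

// Hypothetical helper: one dataset per meter, with the deltas as data points.
function toChartDatasets(array $grouped): array
{
    $datasets = [];
    foreach ($grouped as $meterId => $measures) {
        $datasets[] = [
            // PHP may turn numeric string keys into integers, so cast back.
            'label' => (string) $meterId,
            'data'  => array_column($measures, 'delta'),
        ];
    }
    return $datasets;
}

$datasets = toChartDatasets($grouped);
echo json_encode($datasets), "\n";
```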

ElasticSearch match combination in array

I'm implementing ElasticSearch into my Laravel application using the php package from ElasticSearch.
My application is a small job board, and currently my job document looks like this:
{
"_index":"jobs",
"_type":"job",
"_id":"19",
"_score":1,
"_source":{
"0":"",
"name":"Programmer",
"description":"This is my first job! :)",
"text":"Programming is awesome",
"networks":[
{
"id":1,
"status":"PRODUCTION",
"start":"2015-02-26",
"end":"2015-02-26"
},
{
"id":2,
"status":"PAUSE",
"start":"2015-02-26",
"end":"2015-02-26"
}
]
}
}
As you can see a job can be attached to multiple networks. In my search query I would like to include WHERE network.id == 1 AND network.status == PRODUCTION.
My current query looks like this; however, it returns documents that have a network with id 1 as long as any of their networks has status PRODUCTION. Is there any way I can enforce both to be true within one network?
$query = [
    'index' => $this->index,
    'type' => $this->type,
    'body' => [
        'query' => [
            'bool' => [
                // The clause type was lost in the original post; 'match' is assumed here.
                'must' => [
                    ['match' => ['networks.id' => 1]],
                    ['match' => ['networks.status' => 'PRODUCTION']],
                ],
                'should' => [
                    ['match' => ['name' => $query]],
                    ['match' => ['text' => $query]],
                    ['match' => ['description' => $query]],
                ],
            ],
        ],
    ],
];
You need to specify that the objects in the networks array are stored as individual objects in the index; this lets you search on individual network objects. You can do so using the nested type in Elasticsearch.
Also, if you are doing exact matches, it is better to use a filter rather than a query: filters skip relevance scoring and can be cached, so they generally perform better.
Create your index with a new mapping. Use the nested type for the networks array.
POST /test
{
"mappings": {
"job": {
"properties": {
"networks": {
"type": "nested",
"properties": {
"status": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
}
}
Add a document:
POST /test/job/1
{
"0": "",
"name": "Programmer",
"description": "This is my first job! :)",
"text": "Programming is awesome",
"networks": [
{
"id": 1,
"status": "PRODUCTION",
"start": "2015-02-26",
"end": "2015-02-26"
},
{
"id": 2,
"status": "PAUSE",
"start": "2015-02-26",
"end": "2015-02-26"
}
]
}
As you have a nested type you will need to use a nested filter.
POST /test/job/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"nested": {
"path": "networks",
"filter": {
"bool": {
"must": [
{
"term": {
"networks.id": "1"
}
},
{
"term": {
"networks.status.raw": "PRODUCTION"
}
}
]
}
}
}
}
}
}
}
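Since the question uses the PHP client, the same request can be written as a PHP array. This is a sketch only: buildNetworkSearch is a made-up helper name, the search call is left as a comment because no client is constructed here, and the filtered query is the Elasticsearch 1.x syntax used in this answer (it was removed in later major versions).

```php
<?php
// Hypothetical helper: PHP-array form of the nested filter request above,
// in the shape that the client's search() method takes.
function buildNetworkSearch(string $index, string $type, int $networkId, string $status): array
{
    return [
        'index' => $index,
        'type'  => $type,
        'body'  => [
            'query' => [
                'filtered' => [
                    // An empty object, so it encodes as {} rather than [].
                    'query'  => ['match_all' => new \stdClass()],
                    'filter' => [
                        'nested' => [
                            'path'   => 'networks',
                            'filter' => [
                                'bool' => [
                                    // Both terms must hold within the same network object.
                                    'must' => [
                                        ['term' => ['networks.id' => $networkId]],
                                        ['term' => ['networks.status.raw' => $status]],
                                    ],
                                ],
                            ],
                        ],
                    ],
                ],
            ],
        ],
    ];
}

$params = buildNetworkSearch('test', 'job', 1, 'PRODUCTION');
echo json_encode($params['body']), "\n";
// $results = $client->search($params); // $client: the Elasticsearch PHP client
```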

Elasticsearch wont apply not_analyzed into my mapping

When I try to apply "not_analyzed" in my ES mapping, it doesn't work.
I am using the Elasticquent package for ES in Laravel.
My mapping looks like:
'ad_title' => [
'type' => 'string',
'analyzer' => 'standard'
],
'ad_type' => [
'type' => 'integer',
'index' => 'not_analyzed'
],
'ad_type' => [
'type' => 'integer',
'index' => 'not_analyzed'
],
'ad_state' => [
'type' => 'integer',
'index' => 'not_analyzed'
],
Afterwards I make a GET call to the API to view the mapping, and it outputs:
"testindex": {
"mappings": {
"ad_ad": {
"properties": {
"ad_city": {
"type": "integer"
},
"ad_id": {
"type": "long"
},
"ad_state": {
"type": "integer"
},
"ad_title": {
"type": "string",
"analyzer": "standard"
},
"ad_type": {
"type": "integer"
},
Note that not_analyzed is missing.
I can't see any errors or warnings in my logs either.
From my own experience, you must apply the mapping before you do any indexing. Delete the index you've created, apply your not_analyzed mapping, and then index your documents again; the not_analyzed setting should then appear. Please let me know if this works for you. Thank you.
There is an easy test to see if it works:
PUT /stack
{
"mappings": {
"try": {
"properties": {
"name": {
"type": "string",
"index": "not_analyzed"
},
"age": {
"type": "integer",
"index": "not_analyzed"
}
}
}
}
}
GET /stack/_mapping
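and the response is (note that not_analyzed is echoed for the string field only; the integer field comes back without it):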
{
"stack": {
"mappings": {
"try": {
"properties": {
"age": {
"type": "integer"
},
"name": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}

Elasticsearch Snowball Analyzer wants exact word

I have been using Elasticsearch for a project, but I find the results of the snowball analyzer a bit strange.
Below is my example of Mapping used.
$myTypeMapping = array(
'_source' => array(
'enabled' => true
),
'properties' => array(
'id' => array(
'type' => 'integer',
'index' => 'not_analyzed'
),
'name' => array(
'type' => 'string',
'analyzer' => 'snowball',
'boost' => 2.0
),
'food_types' => array(
'type' => 'string',
'analyzer' => 'keyword'
),
'location' => array(
'type' => 'geo_point',
"geohash_precision"=> 4
),
'city' => array(
'type' => 'string',
'analyzer' => 'keyword'
)
)
);
$indexParams['body']['mappings']['online_pizza'] = $myTypeMapping;
// Create the index
$elastic_client->indices()->create($indexParams);
On querying http://localhost:9200/online_pizza/online_pizza/_mapping I get the following result:
{
"online_pizza": {
"properties": {
"city": {
"type": "string",
"analyzer": "keyword"
},
"food_types": {
"type": "string",
"analyzer": "keyword"
},
"id": {
"type": "integer"
},
"location": {
"type": "geo_point",
"geohash_precision": 4
},
"name": {
"type": "string",
"boost": 2,
"analyzer": "snowball"
}
}
}
}
My question is: I have data whose name field is "Milano". When I query for "Milano" I get the desired result, but if I query for "Milan" or "Mil" I get no results.
{
"query": {
"query_string": {
"default_field": "name",
"query": "Milan"
}
}
}
I've also tried specifying the snowball analyzer at query time, but it didn't help.
{
"query": {
"query_string": {
"default_field": "name",
"query": "Milan",
"analyzer": "snowball"
}
}
}
My second question: keyword search is case sensitive, e.g. Pizza != pizza. How do I get around this?
Thanks,
The snowball stemmer doesn't want exact words. If you try it with jumping, it outputs jump as expected.
However, depending on the case, your word may be understemmed because it doesn't match any stemmer rule.
If you use the analyze API endpoint (more info here), you will see that analyzing Milano with the snowball analyzer gives you the token milano:
GET _analyze?analyzer=snowball&text=Milano
Output :
{
"tokens": [
{
"token": "milano",
"start_offset": 0,
"end_offset": 6,
"type": "<ALPHANUM>",
"position": 1
}
]
}
Then, using same snowball analyzer on Mil like this :
GET _analyze?analyzer=snowball&text=Mil
gives you this token :
{
"tokens": [
{
"token": "mil",
"start_offset": 0,
"end_offset": 3,
"type": "<ALPHANUM>",
"position": 1
}
]
}
That's why searching for 'milan' or 'mil' won't match 'Milano' documents: the query term doesn't match the milano term stored in the index.
For your second question, you can define a custom analyzer that combines the keyword tokenizer with a lowercase token filter to make your keyword search case-insensitive (provided you use the same analyzer at search time):
POST index_name
{
"analysis": {
"analyzer": {
"case_insensitive_keyword": {
"type": "custom",
"tokenizer": "keyword",
"filter": ["lowercase"]
}
}
}
}
Test :
GET analyse/_analyze?analyzer=case_insensitive_keyword&text=Choo Choo
Output :
{
"tokens": [
{
"token": "choo choo",
"start_offset": 0,
"end_offset": 9,
"type": "word",
"position": 1
}
]
}
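To see why this makes keyword search case-insensitive, here is a rough imitation of the analyzer in plain PHP. It is not how Elasticsearch implements it, just a sketch of the idea: the keyword tokenizer keeps the whole input as a single token, and the lowercase filter lowercases it. The function names are made up for this example.

```php
<?php
// Rough imitation of keyword tokenizer + lowercase filter:
// the whole input stays one token, which is then lowercased.
function caseInsensitiveKeyword(string $text): string
{
    return strtolower($text); // ASCII lowercasing is enough for this illustration
}

// A keyword match succeeds when both sides produce the same single token.
function keywordMatches(string $indexedValue, string $queryText): bool
{
    return caseInsensitiveKeyword($indexedValue) === caseInsensitiveKeyword($queryText);
}

echo caseInsensitiveKeyword('Choo Choo'), "\n";   // choo choo
var_dump(keywordMatches('Pizza', 'pizza'));       // bool(true)
var_dump(keywordMatches('Choo Choo', 'choo'));    // bool(false): still a whole-value match
```

The last line shows the trade-off: the field stays a single token, so partial values still do not match.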
I hope I'm clear enough in my explanations :)
