Elasticsearch PHP add routing to child document - php

Database Server:
Elasticsearch 7.9.2
Centos 7.7
Dev env:
PHP 7.3.11
MacOS
I am fairly new to Elasticsearch, so please bear with me on this one.
It is driving me crazy though.
I am trying to do something very easy, but since I come from the relational database world, I need some mind bending. I have created a mapping with a parent-child relationship.
Product --> Price
This is the mapping I created:
PUT /products_pc
{
"mappings": {
"properties": {
"datafeed_id": {
"type": "integer"
},
"date_add": {
"type": "date"
},
"description": {
"type": "text"
},
"ean": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"image_url": {
"type": "text",
"index": false
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"sku": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"webshop_id": {
"type": "integer"
},
"price": {
"type": "float"
},
"url": {
"type": "text"
},
"date_mod":{
"type": "date"
},
"product_price" : {
"type":"join",
"relations": {
"product":"price"
}
}
}
}
}
So far so good. When I manually add a product and 2 prices I can get what I would expect: 1 parent with 2 child documents.
Now on to PHP. I am able to index the parent document, but not the child documents. It looks like I am not able to send along a routing parameter (which I can do in Kibana).
This is what I tried in PHP, parent _id = 123
$hosts = ['xxx.xxx.xxx.xxx:9200'];
$client = ClientBuilder::create()
    ->setHosts($hosts)
    ->build();
$params['body'][] = [
    'create' => [
        '_index' => 'products_pc',
        '_id' => '123_1'
    ]
];
$params['body'][] = [
    'webshop_id' => 1,
    'date_mod' => time(),
    'price' => 12,
    'url' => '',
    'product_price' => [
        'name' => 'price',
        'parent' => 123
    ]
];
$client->bulk($params);
But this does not work, as there is no routing set. If I add '_routing' => 123 below the _id field, I get a 400 error telling me the _routing field is wrong ("Action/metadata line [3] contains an unknown parameter [_routing]").
I have been searching for 2 days now, running in circles. All the different Elasticsearch versions are slightly different, so I have to admit that I am lost. Can anybody point out my mistake, or give me a hint in the right direction? It is driving me crazy. (I am afraid it will turn out to be too simple...)
Thanks in advance!

So here we are, after 2 more days of searching... But I have found the solution it seems...
After some more hours searching I ended up at this page (again):
https://elastic.co/guide/en/elasticsearch/client/php-api/current/ElasticsearchPHP_Endpoints.html#Elasticsearch_Clientbulk_bulk
And there it was, in the params list of the bulk endpoint:
$params['routing'] = // (string) Specific routing value
Not quite sure how to use this at first, but...
Then I tried this for each of the child documents, which seems to be doing the trick!
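require 'vendor/autoload.php'; // assumes the client was installed with Composer
use Elasticsearch\ClientBuilder;

$hosts = ['xxx.xxx.xxx.xxx:9200'];
$client = ClientBuilder::create()
    ->setHosts($hosts)
    ->build();
$params = ['body' => []];
// insert price: action line first, document source second
$params['body'][] = [
    'index' => [
        '_index' => 'products_pc',
        '_id' => '123_1',
        'routing' => 123 // <-- Insert routing here.
    ]
];
$params['body'][] = [
    'webshop_id' => 1,
    'date_mod' => time() * 1000, // the default date format reads numbers as epoch milliseconds
    'price' => 12,
    'url' => '',
    'product_price' => [
        'name' => 'price',
        'parent' => 123 // <-- Parent _id value
    ]
];
$client->bulk($params);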
As thought before, too easy actually. But I guess that is the life of a programmer.
Please be aware, though, that a LOT of documentation mentions the _routing field (even the official docs for version 7.9: https://www.elastic.co/guide/en/elasticsearch/reference/7.9/mapping-routing-field.html, both in the text and in the right-hand submenu under metadata fields). That page describes the _routing metadata field; in the bulk action line the parameter is just "routing". Might save you a couple of days ;-)
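For several prices under one product, the same pattern can be wrapped in a small helper. This is only a sketch: buildChildBulkBody() is a made-up name, not part of elasticsearch-php, and the second price is invented sample data. It assumes the products_pc index and the product_price join field from the question.

```php
<?php
// Hypothetical helper (not part of elasticsearch-php): builds the 'body'
// array for $client->bulk() so every child carries the parent's routing.
function buildChildBulkBody(string $index, string $parentId, array $prices): array
{
    $body = [];
    foreach ($prices as $childId => $price) {
        // Action line: 'routing' (no underscore) must equal the parent _id.
        $body[] = [
            'index' => [
                '_index'  => $index,
                '_id'     => (string) $childId,
                'routing' => $parentId,
            ],
        ];
        // Source line: the join field names the relation and the parent.
        $body[] = $price + [
            'product_price' => ['name' => 'price', 'parent' => $parentId],
        ];
    }
    return $body;
}

// Sample data: two prices for parent 123 (the second one is invented).
$params = ['body' => buildChildBulkBody('products_pc', '123', [
    '123_1' => ['webshop_id' => 1, 'price' => 12.0, 'url' => ''],
    '123_2' => ['webshop_id' => 2, 'price' => 11.5, 'url' => ''],
])];
// $client->bulk($params); // needs a configured client, as in the code above
```

Keeping the routing value and the join field's parent in one place makes it harder for the two to drift apart, which would put a child on a different shard than its parent.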

Related

Elasticsearch - Research that returns too many bad results

I have a working Elasticsearch search, but it matches far too broadly: it returns too many results for terms that have nothing to do with the query. I'm looking for a way to refine these results.
On a sample of fake text when I search for the term music, the terms that come out in highlights are :
must, much, alice, inside, patriotic, noticed
I suspect the ngram filter is hurting me here, but I also think I need it to get a better search.
Here is my configuration :
{
"number_of_shards": 1,
"number_of_replicas": 0,
"analysis": {
"analyzer": {
"default": {
"type": "custom",
"tokenizer": "standard",
"filter": ["lowercase", "mySnowball", "myNgram"]
},
"default_search": {
"type": "custom",
"tokenizer": "standard",
"filter": ["standard", "lowercase", "mySnowball", "myNgram"]
}
},
"filter": {
"mySnowball": {
"type": "snowball",
"language": "English"
},
"myNgram": {
"type": "ngram",
"min_gram": 2,
"max_gram": 6
}
}
}
}
Here is my request :
{
"query": {
"bool": {
"should": [{
"match": {
"content": "music"
}
}, {
"match": {
"url": "music"
}
}, {
"match": {
"h1": "music"
}
}, {
"match": {
"h2": "music"
}
}
],
"minimum_should_match": 1
}
},
"min_score": 8
}
My document is quite simple :
content => text,
url => text,
h1 => text,
h2 => text,
And the mapping too:
$configMapping = [
'content' => ['type' => 'text', 'boost' => 6],
'url' => ['type' => 'text', 'boost' => 6],
'h1' => ['type' => 'text', 'boost' => 9],
'h2' => ['type' => 'text', 'boost' => 7]
]
I welcome any modification that will allow me to obtain only consistent results.
As you suspected, analyzing with 'ngram' is the reason you get all these unrelated results.
Each of the highlighted words shares at least one 2-character token (2 is your min_gram) with the query term 'music':
must, much, alice, inside, patriotic, noticed
Start by removing this filter from your analyzer, then keep tuning the results from there.
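To see the overlap concretely, here is a rough PHP sketch that imitates what an ngram filter with min_gram 2 and max_gram 6 emits. It is a simplification: it ignores the tokenizer, lowercasing, and the snowball stemmer that run first in the real analyzer, and the function names are invented for illustration.

```php
<?php
// Simplified stand-in for the ngram token filter: every substring of
// $term whose length is between $min and $max.
function ngrams(string $term, int $min, int $max): array
{
    $grams = [];
    $len = strlen($term);
    for ($n = $min; $n <= $max; $n++) {
        for ($i = 0; $i + $n <= $len; $i++) {
            $grams[] = substr($term, $i, $n);
        }
    }
    return array_values(array_unique($grams));
}

// Grams that an indexed word has in common with the query term.
function sharedGrams(string $query, string $word, int $min = 2, int $max = 6): array
{
    return array_values(
        array_intersect(ngrams($query, $min, $max), ngrams($word, $min, $max))
    );
}

// Print the shared grams for a few of the words from the highlights.
foreach (['must', 'much', 'inside'] as $word) {
    echo $word, ': ', implode(', ', sharedGrams('music', $word)), PHP_EOL;
}
```

A single shared gram is enough for a match query to hit, which is why such short grams at search time pull in so many unrelated words.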

Elasticsearch -- count number of keyword occurrences in a document

Database: Elasticsearch v7.2
Application: Laravel v5.7
Using Elasticsearch/Elasticsearch (https://github.com/elastic/elasticsearch-php) Official PHP Library
I have a query_string query for Elasticsearch with this code to retrieve documents that have a certain phrase as I search throughout my index
[
"query_string" => [
"default_field" => $content,
"query" => $keywords
]
],
and the $keywords variable contains:
("MCU" OR "Marvel" OR "Spiderman")
Now, I want to count the NUMBER OF OCCURRENCES of these words in the documents that I'm about to retrieve.
I used the aggs query with this:
'aggs' => [
'count' => [
'terms' => [
'field' => 'content.keyword'
]
]
]
However, I have no idea how to associate these doc_count values with the hits and display them side by side, because the bucket key is the content itself instead of the document ID.
I'm planning to display the whole document and show how many times the $keywords above occur in each document, as "Mentions".
Is there another way to count occurrences without using aggs in Elasticsearch?
If you only want to count the keywords, you don't have to enable fielddata. Try a filters aggregation alongside your query; note that each bucket's doc_count is the number of matching documents that contain the keyword, not the number of times it occurs within a document.
GET my_index/_search
{
"query": {
"query_string": {
"default_field": "content",
"query": "MCU OR Marvel OR Spiderman"
}
},
"aggs": {
"count": {
"filters": {
"filters": {
"mcu": {
"match": {
"content": "MCU"
}
},
"marvel": {
"match": {
"content": "Marvel"
}
},
"spiderman": {
"match": {
"content": "Spiderman"
}
}
}
}
}
}
}
The result will look like this:
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 4,
"max_score": 1.219939,
"hits": [
....
....
]
},
"aggregations": {
"count": {
"buckets": {
"marvel": {
"doc_count": 2
},
"mcu": {
"doc_count": 2
},
"spiderman": {
"doc_count": 1
}
}
}
}
}
Source : https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-filters-aggregation.html
Thanks to @AshrafulIslam, I came across Elasticsearch's highlight feature. Highlighting only wraps the matching keywords in tags, so I resorted to PHP's substr_count() function to count the <em> tags.
I added this code as a sibling of the ['body']['query'] element:
"highlight" => [
"fields" => [
"content" => ["number_of_fragments" => 0]
],
'require_field_match' => false
]
Then, as I loop through the ['hits']['hits'] array element, I do something like this:
$articles = $client->search($params);
$hits = $articles['hits']['hits'];
for ($i = 0; $i < count($hits); $i++) {
    $hits[$i]['_source']['count_mentions'] = substr_count($hits[$i]['highlight']['content'][0], "<em>");
}
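A slightly more defensive variant of that loop body could look like the sketch below. countMentions() is a hypothetical helper, and the sample hit is made up to mirror the shape of a search response; it assumes the default <em> highlight tags.

```php
<?php
// Hypothetical helper: counts highlighted terms in one search hit by
// counting opening <em> tags across all fragments of the given field.
function countMentions(array $hit, string $field = 'content'): int
{
    // A hit can lack a highlight section, e.g. when it matched another field.
    $fragments = $hit['highlight'][$field] ?? [];
    $total = 0;
    foreach ($fragments as $fragment) {
        $total += substr_count($fragment, '<em>');
    }
    return $total;
}

// Made-up hit in the shape returned by $client->search().
$hit = [
    '_source'   => ['content' => 'Marvel announced a new MCU film. Marvel fans cheered.'],
    'highlight' => ['content' => [
        '<em>Marvel</em> announced a new <em>MCU</em> film. <em>Marvel</em> fans cheered.',
    ]],
];
$hit['_source']['count_mentions'] = countMentions($hit);
```

Summing over all fragments also keeps the count right if number_of_fragments is later changed from 0 and the highlight comes back in several pieces.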
Enabling fielddata may not be the best way to support this on a text field. From the documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/fielddata.html#before-enabling-fielddata
Before you enable fielddata, consider why you are using a text field for aggregations, sorting, or in a script. It usually doesn’t make sense to do so.
A text field is analyzed before indexing so that a value like New York can be found by searching for new or for york. A terms aggregation on this field will return a new bucket and a york bucket, when you probably want a single bucket called New York.
Instead, you should have a text field for full text searches, and an unanalyzed keyword field with doc_values enabled for aggregations, as follows:
PUT my_index
{
"mappings": {
"properties": {
"my_field": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}

Magento REST API tracking number not being updated

I am using Magento 2.2.2 version. I am trying to update tracking information through their rest API. Here is my code:
$tracking_str =
'{
"items": [
{
"extension_attributes": {},
"order_item_id": "'.$orderItemId.'",
"qty": "'.$qty_invoiced.'"
}
],
"notify": false,
"appendComment": true,
"comment": {
"extension_attributes": {},
"comment": "Item(s) has been shipped",
"is_visible_on_front": 0
},
"tracks": [
{
"extension_attributes": {},
"track_number": "'.$TrackingNumber.'",
"title": "'.$ShipTitle.'",
"carrier_code": "'.$carrierCode.'"
}
],
"packages": [
{
"extension_attributes": {}
}
],
"arguments": {
"extension_attributes": {}
}
}';
I pass the above to PHP cURL. The first time, I get a shipment ID in the response, and the order status changes to 'complete' in Magento. However, tracking information such as the carrier code and tracking number is not updated. When I run the code again, I get this response:
res is: stdClass Object
( [message] => Shipment Document Validation Error(s):
The order does not allow a shipment to be created.
You can't create a shipment without products.
)
I don't know where I am going wrong.
Finally, it worked!
$tracking_str = [
    "items" => [
        [
            "order_item_id" => $orderItemId,
            "qty" => $qty_invoiced
        ]
    ],
    "notify" => true,
    "appendComment" => true,
    "comment" => [
        "extension_attributes" => [],
        "comment" => "Item(s) has been shipped",
        "is_visible_on_front" => 0
    ],
    "tracks" => [
        [
            "extension_attributes" => [],
            "track_number" => $TrackingNumber,
            "title" => $ShipTitle,
            "carrier_code" => $carrierCode
        ]
    ]
];
This is the code I used. Someone on the GitHub forums helped me; thanks to them. The tracking number, carrier code, and title are now being updated.
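Building the payload as a PHP array and letting json_encode() serialize it avoids the quoting problems of a hand-concatenated JSON string. The sketch below assumes that approach; buildShipmentPayload() and the sample values are invented for illustration, and the request itself is left out because the endpoint is not shown in the question.

```php
<?php
// Hypothetical helper: assembles the shipment payload as an array so that
// json_encode() takes care of quoting, escaping, and number formatting.
function buildShipmentPayload(
    int $orderItemId,
    int $qty,
    string $trackNumber,
    string $title,
    string $carrierCode
): string {
    $payload = [
        'items'         => [['order_item_id' => $orderItemId, 'qty' => $qty]],
        'notify'        => true,
        'appendComment' => true,
        'comment'       => [
            'comment'             => 'Item(s) has been shipped',
            'is_visible_on_front' => 0,
        ],
        'tracks'        => [[
            'track_number' => $trackNumber,
            'title'        => $title,
            'carrier_code' => $carrierCode,
        ]],
    ];
    // JSON_THROW_ON_ERROR (PHP 7.3+) raises an exception instead of returning false.
    return json_encode($payload, JSON_THROW_ON_ERROR);
}

// Made-up sample values; real ones come from the order being shipped.
$json = buildShipmentPayload(42, 1, '3SCEMW182389201', 'UPS', 'ups');
```

The resulting string can be handed to cURL as the request body with a Content-Type: application/json header.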
This solution works for me:
{
"appendComment": true,
"notify": true,
"comment": {
"comment": "shipment creado via webservice",
"is_visible_on_front": 1
},
"tracks": [
{
"track_number": "3SCEMW182389201",
"title": "UPS",
"carrier_code": "Carrier code n"
}
]
}
In other words, remove the "items" element from the request.

Not sure if my mapping worked in Elasticsearch

I'm using ES with my Laravel app via the Elasticquent package.
My mapping looks like this before I index my DB:
'ad_title' => [
'type' => 'string',
'analyzer' => 'standard'
],
'ad_type' => [
'type' => 'integer',
'index' => 'not_analyzed'
],
'ad_type' => [
'type' => 'integer',
'index' => 'not_analyzed'
],
'ad_state' => [
'type' => 'integer',
'index' => 'not_analyzed'
],
But when I call the _mapping?pretty API afterwards, my mapping looks like this:
"testindex": {
"mappings": {
"ad_ad": {
"properties": {
"ad_city": {
"type": "integer"
},
"ad_id": {
"type": "long"
},
"ad_state": {
"type": "integer"
},
"ad_title": {
"type": "string",
"analyzer": "standard"
},
"ad_type": {
"type": "integer"
},
Shouldn't I be able to see 'index' => 'not_analyzed' in my mapping afterwards? Or does 'index' => 'not_analyzed' not show up in the mapping structure?
You are correct, the mapping did not get applied. You would see the not_analyzed in the mapping API if it was applied correctly.
Make sure that you apply the mapping BEFORE you write any data. We apply the mapping on application startup to verify the mapping is always correct and to apply any mapping updates.
Here is a sample of how to apply a mapping:
PUT hilden1
PUT hilden1/type1/_mapping
{
"properties": {
"regular": {
"type": "string"
},
"indexSpecified": {
"type": "string",
"index": "not_analyzed"
}
}
}
To verify the mapping, use the GET API:
GET hilden1/type1/_mapping
You should see that the field "regular" only specifies its type, whereas "indexSpecified" is listed as not_analyzed. Here is the output from my machine running ES 1.4.4:
{
"hilden1": {
"mappings": {
"type1": {
"properties": {
"indexSpecified": {
"type": "string",
"index": "not_analyzed"
},
"regular": {
"type": "string"
}
}
}
}
}
}

Elasticsearch Snowball Analyzer wants exact word

I have been using Elasticsearch for a project, but I find the results of the snowball analyzer a bit strange.
Below is the mapping I used.
$myTypeMapping = array(
'_source' => array(
'enabled' => true
),
'properties' => array(
'id' => array(
'type' => 'integer',
'index' => 'not_analyzed'
),
'name' => array(
'type' => 'string',
'analyzer' => 'snowball',
'boost' => 2.0
),
'food_types' => array(
'type' => 'string',
'analyzer' => 'keyword'
),
'location' => array(
'type' => 'geo_point',
"geohash_precision"=> 4
),
'city' => array(
'type' => 'string',
'analyzer' => 'keyword'
)
)
);
$indexParams['body']['mappings']['online_pizza'] = $myTypeMapping;
// Create the index
$elastic_client->indices()->create($indexParams);
On querying http://localhost:9200/online_pizza/online_pizza/_mapping I get the following result:
{
"online_pizza": {
"properties": {
"city": {
"type": "string",
"analyzer": "keyword"
},
"food_types": {
"type": "string",
"analyzer": "keyword"
},
"id": {
"type": "integer"
},
"location": {
"type": "geo_point",
"geohash_precision": 4
},
"name": {
"type": "string",
"boost": 2,
"analyzer": "snowball"
}
}
}
}
My question is this: I have data whose name field is "Milano". Querying for "Milano" gives the desired result, but querying for "Milan" or "Mil" returns nothing.
{
"query": {
"query_string": {
"default_field": "name",
"query": "Milan"
}
}
}
I've also tried specifying the snowball analyzer at query time, which did not help.
{
"query": {
"query_string": {
"default_field": "name",
"query": "Milan",
"analyzer": "snowball"
}
}
}
My second question: keyword search is case-sensitive, e.g. Pizza != pizza. How do I get around this?
Thanks,
The snowball stemmer doesn't want exact words. If you try it with jumping, it outputs jump as expected.
However, depending on the case, your word may be understemmed because it doesn't match any stemmer rule.
If you use the analyze API endpoint (more info here), you will see that analyzing Milano with snowball analyzer gives you the token milano :
GET _analyze?analyzer=snowball&text=Milano
Output :
{
"tokens": [
{
"token": "milano",
"start_offset": 0,
"end_offset": 6,
"type": "<ALPHANUM>",
"position": 1
}
]
}
Then, using same snowball analyzer on Mil like this :
GET _analyze?analyzer=snowball&text=Mil
gives you this token :
{
"tokens": [
{
"token": "mil",
"start_offset": 0,
"end_offset": 3,
"type": "<ALPHANUM>",
"position": 1
}
]
}
That's why searching for 'milan' or 'mil' won't match 'Milano' documents : it doesn't match the milano term stored in index.
For your second question, you can prepare a custom analyzer combining keyword tokenizer and a lowercase tokenfilter in order to have your keyword search case-insensitive (if you use the same analyzer at search time) :
POST index_name
{
"analysis": {
"analyzer": {
"case_insensitive_keyword": {
"type": "custom",
"tokenizer": "keyword",
"filter": ["lowercase"]
}
}
}
}
Test :
GET analyse/_analyze?analyzer=case_insensitive_keyword&text=Choo Choo
Output :
{
"tokens": [
{
"token": "choo choo",
"start_offset": 0,
"end_offset": 9,
"type": "word",
"position": 1
}
]
}
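As a rough mental model of that analyzer, the PHP sketch below imitates it: the keyword tokenizer emits the whole input as a single token and the lowercase filter folds its case. The function name is invented, and strtolower() only handles ASCII, whereas the real filter is Unicode-aware.

```php
<?php
// Rough emulation of the case_insensitive_keyword analyzer: one token for
// the whole input (keyword tokenizer), folded to lower case (lowercase filter).
function analyzeCaseInsensitiveKeyword(string $text): array
{
    return [strtolower($text)];
}

// The same analysis has to run at index time and at search time,
// otherwise 'Pizza' and 'pizza' still end up as different terms.
$indexed = analyzeCaseInsensitiveKeyword('Pizza');
$queried = analyzeCaseInsensitiveKeyword('pizza');
var_dump($indexed === $queried);
```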
I hope my explanations are clear enough :)
