elasticsearch: search for parts of words - php

I'm trying to learn how to use Elasticsearch (using elasticsearch-php for queries). I have indexed a few documents, which look something like this:
['id' => 1, 'name' => 'butter', 'category' => 'food'],
['id' => 2, 'name' => 'buttercup', 'category' => 'food'],
['id' => 3, 'name' => 'something else', 'category' => 'butter']
Now I created a search query which looks like this:
$query = [
    'filtered' => [
        'query' => [
            'bool' => [
                'should' => [
                    ['match' => [
                        'name' => [
                            'query' => $val,
                            'boost' => 7
                        ]
                    ]],
                    ['match' => [
                        'category' => [
                            'query' => $val,
                            'boost' => 5
                        ]
                    ]],
                ],
            ]
        ]
    ]
];
where $val is the search term. This works nicely. The only problem I have: when I search for "butter", I find ids 1 and 3, but not 2, because the search term seems to match whole words only. Is there a way to search "within words", or, in MySQL terms, to do something like WHERE name LIKE '%val%'?

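You can try the wildcard query. Note that the pattern is not analyzed, so $val should be lowercased if the field was indexed with the standard analyzer:
$query = [
    'filtered' => [
        'query' => [
            'bool' => [
                'should' => [
                    ['wildcard' => [
                        'name' => [
                            // 'value' holds the wildcard pattern
                            'value' => '*'.$val.'*',
                            'boost' => 7
                        ]
                    ]],
                    ['wildcard' => [
                        'category' => [
                            'value' => '*'.$val.'*',
                            'boost' => 5
                        ]
                    ]],
                ],
            ]
        ]
    ]
];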
or the query_string query.
$query = [
    'filtered' => [
        'query' => [
            'bool' => [
                'should' => [
                    ['query_string' => [
                        'default_field' => 'name',
                        'query' => '*'.$val.'*',
                        'boost' => 7
                    ]],
                    ['query_string' => [
                        'default_field' => 'category',
                        'query' => '*'.$val.'*',
                        'boost' => 5
                    ]],
                ],
            ]
        ]
    ]
];
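Both will work, but neither performs well on large data sets: a pattern with a leading wildcard forces Elasticsearch to check every term indexed for the field.
The better approach is to use a custom analyzer with a standard tokenizer and an ngram token filter, which slices each token into smaller substrings at index time so that an ordinary match query can find parts of words.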
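As a rough sketch of that suggestion, the index settings could look like the following. The helper names (buildNgramIndexParams, ngrams), the filter and analyzer names, the index name 'products', and the gram sizes are all assumptions made for illustration. The second function is plain PHP that only mimics which substrings an ngram filter produces; it assumes single-byte (ASCII) tokens.

```php
<?php
// Hypothetical helper: index-creation parameters with an ngram analyzer.
function buildNgramIndexParams(string $index, int $minGram = 2, int $maxGram = 10): array
{
    return [
        'index' => $index,
        'body' => [
            'settings' => [
                'analysis' => [
                    'filter' => [
                        // assumed name; slices each token into substrings
                        'substring_filter' => [
                            'type' => 'ngram',
                            'min_gram' => $minGram,
                            'max_gram' => $maxGram,
                        ],
                    ],
                    'analyzer' => [
                        // assumed name; standard tokenizer + lowercase + ngrams
                        'substring_analyzer' => [
                            'type' => 'custom',
                            'tokenizer' => 'standard',
                            'filter' => ['lowercase', 'substring_filter'],
                        ],
                    ],
                ],
            ],
        ],
    ];
}

// Plain-PHP illustration of the substrings produced for one token.
function ngrams(string $token, int $minGram, int $maxGram): array
{
    $grams = [];
    $length = strlen($token);
    for ($size = $minGram; $size <= $maxGram; $size++) {
        for ($start = 0; $start + $size <= $length; $start++) {
            $grams[] = substr($token, $start, $size);
        }
    }
    return $grams;
}

$params = buildNgramIndexParams('products');
// With a configured client: $client->indices()->create($params);

// "buttercup" yields the gram "butter", so a search for "butter" can match it.
var_dump(in_array('butter', ngrams('buttercup', 2, 10), true)); // bool(true)
```

The name and category fields would still have to be mapped to this analyzer, ideally with a plain search analyzer such as standard so that the search term itself is not split into grams. The mapping syntax depends on the Elasticsearch version, and recent versions limit the gap between min_gram and max_gram through the index.max_ngram_diff setting.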

Related

Elasticsearch how to correctly provide a negative boost in PHP

I'm trying to give a negative boost to push results down in the ranking if they have 'b-stock' in the title.
Here is my code:
'body' => [
    'size' => 15,
    'query' => [
        'boosting' => [
            'positive' => [
                'bool' => [
                    'should' => [
                        ['query_string' => [
                            'default_field' => 'title_tag',
                            'query' => $term
                        ]],
                        ['query_string' => [
                            'default_field' => 'name',
                            'query' => $term
                        ]],
                        ['query_string' => [
                            'default_field' => 'description',
                            'query' => $term
                        ]],
                    ]
                ],
            ],
            'negative' => [
                'term' => [
                    'name' => 'B-Stock'
                ]
            ],
            'negative_boost' => 2
        ]
    ]
]
However, this seems to have no effect on the results: even if I remove the term array from the 'negative' array, the same result set is returned.
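One possible explanation, offered as a sketch rather than a confirmed fix: negative_boost is documented as a value between 0 and 1 that multiplies the score of documents matching the negative query, so a value of 2 cannot push anything down. In addition, a term query is not analyzed; if name is an analyzed text field using the standard analyzer, 'B-Stock' is indexed as the lowercase tokens b and stock, so the exact term 'B-Stock' would never match. The helper name buildDemotingQuery, the 0.2 factor, and the sample search term are assumptions for illustration.

```php
<?php
// Hypothetical helper: wraps a positive query so that documents whose
// name matches $phrase are scored lower instead of being excluded.
function buildDemotingQuery(array $positive, string $phrase, float $factor = 0.2): array
{
    if ($factor < 0.0 || $factor > 1.0) {
        throw new InvalidArgumentException('negative_boost should be between 0 and 1');
    }

    return [
        'boosting' => [
            'positive' => $positive,
            // match_phrase analyzes the phrase like the indexed text, unlike term
            'negative' => ['match_phrase' => ['name' => $phrase]],
            // scores of documents matching the negative query are multiplied by this
            'negative_boost' => $factor,
        ],
    ];
}

$query = buildDemotingQuery(
    ['query_string' => ['default_field' => 'name', 'query' => 'guitar']],
    'B-Stock'
);
```

The resulting array would go under 'body' => ['query' => ...] in place of the boosting block above.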

How to optimize an Elasticsearch query

I have been reading through the Elasticsearch docs over the last few months and have continued to optimize my query, but I can't seem to get a search query below 500-600ms. Locally, with less data, I can get responses in ~80-200ms.
To outline what I am trying to accomplish:
I have 12 different models in Laravel that are searchable from a single search bar. As someone types, the search runs and the matches are returned in a list of results.
Currently, I have this for my search query. Are there any references for how I can improve this? I looked into multi_match, but I was having issues with partial matches and specifying all fields.
$results = $this->elastic->search([
    'index' => config('scout.elasticsearch.index'),
    'type' => $type ?? implode(',', array_keys($this->permissions, true, true)),
    'body' => [
        'query' => [
            'bool' => [
                'must' => [
                    [
                        'query_string' => [
                            'query' => "$searchQuery*",
                        ],
                    ],
                ],
                'filter' => [
                    [
                        'term' => [
                            'account_id' => $accountId,
                        ],
                    ],
                ],
                'should' => [
                    [
                        'term' => [
                            '_type' => [
                                'value' => 'customers',
                                'boost' => 1.3,
                            ],
                        ],
                    ],
                    [
                        'term' => [
                            '_type' => [
                                'value' => 'contacts',
                                'boost' => 1.3,
                            ],
                        ],
                    ],
                    [
                        'term' => [
                            '_type' => [
                                'value' => 'users',
                                'boost' => 1.3,
                            ],
                        ],
                    ],
                    [
                        'term' => [
                            '_type' => [
                                'value' => 'chart_accounts',
                                'boost' => 1.2,
                            ],
                        ],
                    ],
                ],
            ],
        ],
        'from' => $from,
        'size' => $size,
    ],
]);

Elasticsearch Partial match or fuzzy match, boost partial results

I'm trying to query Elasticsearch with the PHP client so that partial-word matches get priority while fuzzy matches are still included. If I remove the address.company match block, the query works as expected, but with it present the query is broken no matter how I frame it. How should I format the query so that it also includes the fuzzy matches at a lower priority?
$search_data = [
    "from" => (int) $start, "size" => (int) $count,
    'query' => [
        'bool' => [
            'filter' => [
                ['term' => ['active' => 1]],
                ['term' => ['type' => 2]],
            ],
            'must' => [
                'wildcard' => [
                    'address.company' => '*' . $search_query . '*'
                ],
                'match' => [
                    'address.company' => [
                        'query' => $search_query,
                        'operator' => 'and',
                        'fuzziness' => 'AUTO',
                    ],
                ],
            ],
        ],
    ],
];
While I am still new to ES, as is likely apparent, this solution seems to return the data I'm after. I mention that only because there may be a better way, should someone view this in the future. Switching from must to should and wrapping the arrays a bit differently did the trick.
$search_data = [
    "from" => (int) $start, "size" => (int) $count,
    'query' => [
        'bool' => [
            'filter' => [
                ['term' => ['active' => 1]],
                ['term' => ['type' => 2]],
            ],
            'should' => [
                [
                    'match' => ['address.company' => ['query' => $search_query, 'boost' => 10]],
                ],
                [
                    'match' => [
                        'address.company' => [
                            'query' => $search_query,
                            'fuzziness' => 'AUTO',
                        ],
                    ],
                ],
            ],
            'minimum_should_match' => 1,
        ],
    ],
];

Elasticsearch and PHP. Combine match and multi_match in one query

I have three fields: one of integer type (field1) and two of decimal type (field2, field3). I want to be able to query by all fields. These separate queries work fine in my situation:
$params = [
    'index' => 'test_index',
    'type' => 'text_index_type',
    'body' => [
        'query' => [
            'match' => [
                'field1' => '12'
            ]
        ]
    ]
];
and this query works well:
$params = [
    'index' => 'test_index',
    'type' => 'text_index_type',
    'body' => [
        'query' => [
            'multi_match' => [
                'query' => '345',
                'fields' => ['field2', 'field3']
            ]
        ]
    ]
];
If, however, I combine them:
$params = [
    'index' => 'test_index',
    'type' => 'text_index_type',
    'body' => [
        'query' => [
            'match' => [
                'field1' => '12'
            ],
            'multi_match' => [
                'query' => '345',
                'fields' => ['field2', 'field3']
            ]
        ]
    ]
];
I get an error:
Uncaught exception 'Elasticsearch\Common\Exceptions\BadRequest400Exception' ... [match] malformed query, unexpected [FIELD_NAME] found [multi_match]
So, what is wrong with that and how can I fix it?
PS. In terms of SQL, this is what I want to achieve:
SELECT * FROM mytable where field1 = 12 or field2 = 345 or field3 = 345
You can combine them with a bool query:
$params = [
    'index' => 'test_index',
    'type' => 'test_index_type',
    'body' => [
        'query' => [
            'bool' => [
                'should' => [
                    ['match' => ['field1' => '12']],
                    ['multi_match' => [
                        'query' => '345',
                        'fields' => ['field2', 'field3']
                    ]],
                ]
            ]
        ]
    ]
];
should equates to "OR", while must equates to "AND".
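To make the should/must distinction concrete, here is a small sketch that builds both variants from the same list of clauses. The helper name combineClauses is hypothetical and not part of elasticsearch-php.

```php
<?php
// Hypothetical helper: combines query clauses into a bool query.
function combineClauses(array $clauses, bool $requireAll): array
{
    // must: every clause has to match (AND); should: any clause may match (OR)
    $occur = $requireAll ? 'must' : 'should';

    // array_values() guarantees a list, so it serializes as a JSON array
    return ['bool' => [$occur => array_values($clauses)]];
}

$clauses = [
    ['match' => ['field1' => '12']],
    ['multi_match' => ['query' => '345', 'fields' => ['field2', 'field3']]],
];

$or  = combineClauses($clauses, false); // field1 = 12 OR field2/field3 = 345
$and = combineClauses($clauses, true);  // field1 = 12 AND field2/field3 = 345

echo json_encode($or), PHP_EOL;
echo json_encode($and), PHP_EOL;
```

One caveat: should only behaves like a strict "OR" when the bool query has no must or filter clauses. Once those are present, should clauses become optional and only influence scoring unless minimum_should_match is set.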

Querying Elasticsearch with PHP

I'm having a little trouble translating some of the queries I use for elasticsearch into PHP readable queries.
For example this simple query works:
$query = $elastic->search([
    'body' => [
        'query' => [
            'match' => [
                'myfield' => 'mymatchingresult'
            ]
        ]
    ]
]);
But what I'm trying to get to work follows below. There isn't an error; it just doesn't run. I must not be understanding the structure. The same query seems to work when placed in something like the Sense Chrome extension (with the PHP '=>' converted to ':', etc.).
$query = $elastic->search([
    'body' => [
        'query' => [
            'filtered' => [
                'query' => [
                    'query_string' => [
                        'query' => '*',
                        'analyze_wildcard' => 'true'
                    ]
                ],
                'filter' => [
                    'bool' => [
                        'must' => [
                            'query' => [
                                'query_string' => [
                                    'analyze_wildcard' => 'true',
                                    'query' => 'cn:name'
                                ]
                            ],
                            'range' => [
                                '@timestamp' => [
                                    'from' => '2012-05-01',
                                    'to' => '2016-05-01'
                                ]
                            ]
                        ]
                    ]
                ]
            ]
        ]
    ]
]);
Thank you for the help!
-John
As far as I can tell, each constraint in your bool/must filter must be enclosed in its own square brackets, i.e. bool/must should be a list (a numerically indexed array) of clauses, not an associative array.
Like this:
$query = $elastic->search([
    'body' => [
        'query' => [
            'filtered' => [
                'query' => [
                    'query_string' => [
                        'query' => '*',
                        'analyze_wildcard' => 'true'
                    ]
                ],
                'filter' => [
                    'bool' => [
                        'must' => [
                            [
                                'query' => [
                                    'query_string' => [
                                        'analyze_wildcard' => 'true',
                                        'query' => 'cn:name'
                                    ]
                                ]
                            ],
                            [
                                'range' => [
                                    '@timestamp' => [
                                        'from' => '2012-05-01',
                                        'to' => '2016-05-01'
                                    ]
                                ]
                            ]
                        ]
                    ]
                ]
            ]
        ]
    ]
]);
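The difference is easiest to see in the JSON that PHP produces, since the client serializes the PHP array into the request body. A minimal sketch using only json_encode; the clause contents and the field name 'date' are shortened placeholders, not the original query:

```php
<?php
// Associative array: the keys become properties of ONE JSON object,
// so two different query types end up inside a single clause.
$assoc = [
    'query_string' => ['query' => 'cn:name'],
    'range' => ['date' => ['from' => '2012-05-01']],
];

// List of single-key arrays: serialized as a JSON array of separate clauses.
$list = [
    ['query_string' => ['query' => 'cn:name']],
    ['range' => ['date' => ['from' => '2012-05-01']]],
];

echo json_encode($assoc), PHP_EOL; // starts with "{": a single object
echo json_encode($list), PHP_EOL;  // starts with "[": an array of objects
```

A tool like Sense takes hand-written JSON, where the array brackets are typed explicitly, which may be why the same query appeared to work there.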
