Elasticsearch: sort data in fuzzy mode (PHP)

I want to sort data by similarity in Elasticsearch with fuzzy mode.
We have two records:
1. panadol
2. penadol
When I search for panadol or penadol, the first result is (penadol). But when I type (panadol), I want the first result to be (panadol) and the second result to be (penadol), etc.
$params = [
    'index' => 'my_index',
    'type' => 'my_type',
    'body' => [
        "track_scores" => true,
        'sort' => [
            'name' => ['reverse' => true],
            '_score' => ['order' => 'desc'],
        ],
        'query' => [
            'fuzzy' => [
                'name' => [
                    "value" => 'panadol',
                    "fuzziness" => 2,
                ]
            ]
        ],
    ]
];

Fuzziness is not meant for scoring. You can find more info about it in the docs.
If you want to sort the results by relevance to the original phrase you searched for, you can use either the phrase suggester or the completion suggester, depending on your needs (and your data).
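A minimal sketch of the completion-suggester route, assuming Elasticsearch 7.x and a hypothetical `name_suggest` field mapped as `completion` (neither the field nor the version comes from the question). According to the Elasticsearch docs, with fuzzy completion the suggestions that share a longer prefix with the input score higher, so for the input `panadol` the record `panadol` should rank above `penadol`.

```php
<?php
// Sketch only: assumes Elasticsearch 7.x and a hypothetical "name_suggest"
// field of type "completion" that is filled with the product name.

// Index mapping for the assumed completion field (typeless, 7.x style).
$mapping = [
    'mappings' => [
        'properties' => [
            'name' => ['type' => 'text'],
            'name_suggest' => ['type' => 'completion'],
        ],
    ],
];

// Builds a search body that asks the completion suggester for fuzzy matches.
function buildFuzzySuggestBody(string $input, int $fuzziness = 2): array
{
    return [
        'suggest' => [
            'name-suggest' => [
                'prefix' => $input,
                'completion' => [
                    'field' => 'name_suggest',
                    'fuzzy' => ['fuzziness' => $fuzziness],
                ],
            ],
        ],
    ];
}

$body = buildFuzzySuggestBody('panadol');
// With the official client this could be sent as:
// $client->search(['index' => 'my_index', 'body' => $body]);
echo json_encode($body), PHP_EOL;
```

Separately, note that the question's request lists `name` before `_score` under `sort`, so hits are ordered by `name` first and `_score` only breaks ties between equal names.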

Related

How to combine full-text search with other optional matches in MongoDB?

I am trying to query my collection with a single query and three potential search methods:
full-text search
classic search
regex search
These three matches can be executed at the same time, or just one of them.
As we know, the full-text search has to be the first stage of the pipeline. Can this full-text search be optional in my aggregation? If the default search value is "", my query returns no data, and I need data to perform my other optional matches.
Here is my Laravel 8 controller:
Product::raw(function ($collection) use ($filters, $fullText, $likeKey, $likeValue) {
    return $collection->aggregate([
        [
            '$match' => [
                '$text' => ['$search' => $fullText],
            ],
        ],
        [
            '$match' => $filters
        ],
        [
            '$match' => [
                $likeKey => [
                    '$regex' => $likeValue,
                    '$options' => "i"
                ]
            ]
        ],
        [
            '$addFields' => [
                'avgReviews' => ['$avg' => '$reviews.ranking'],
                'price' => ['$min' => '$variants.price'],
                'equipmentsList' => [
                    '$reduce' => [
                        'input' => '$equipments.list.list',
                        'initialValue' => [],
                        'in' => [
                            '$concatArrays' => [
                                '$$value',
                                '$$this'
                            ]
                        ]
                    ]
                ]
            ],
        ]
    ]);
})
->when($operations, function ($products) use ($operations) {
    // Apply every operation; returning inside the loop would stop after the first one
    foreach ($operations as $operation) {
        $products = $products->where($operation[0], $operation[1], $operation[2]);
    }
    return $products;
})
->forPage($page, $limit)
->sortBy($sortBy, SORT_REGULAR, $order == 'desc')
->values();
$filters is an array and works fine when it's the only match. But if I use $filters without $text, it returns no data. And with the third match, nothing works. Can somebody help me with this?
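One possible approach, sketched under assumptions: build the pipeline in PHP and add each `$match` stage only when its input is present, keeping the `$text` stage first because MongoDB requires a `$match` with `$text` to be the first pipeline stage. The helper name `buildProductPipeline` is hypothetical; it assumes `$filters` is an associative array of conditions and that `$likeKey`/`$likeValue` are null or empty when the regex search is unused.

```php
<?php
// Hypothetical helper: builds the aggregation pipeline with each search
// stage added only when its input is present, so an empty search string
// no longer filters everything out.
function buildProductPipeline(string $fullText, array $filters, ?string $likeKey, ?string $likeValue): array
{
    $pipeline = [];

    // A $match that uses $text must be the first stage, so it goes first (if at all).
    if ($fullText !== '') {
        $pipeline[] = ['$match' => ['$text' => ['$search' => $fullText]]];
    }

    // Classic equality filters; skipped when there are none.
    if (!empty($filters)) {
        $pipeline[] = ['$match' => $filters];
    }

    // Case-insensitive regex on one field; skipped unless both key and value are set.
    if ($likeKey !== null && $likeKey !== '' && $likeValue !== null && $likeValue !== '') {
        $pipeline[] = ['$match' => [$likeKey => ['$regex' => $likeValue, '$options' => 'i']]];
    }

    // Computed fields copied from the original controller; always applied.
    $pipeline[] = ['$addFields' => [
        'avgReviews' => ['$avg' => '$reviews.ranking'],
        'price' => ['$min' => '$variants.price'],
        'equipmentsList' => ['$reduce' => [
            'input' => '$equipments.list.list',
            'initialValue' => [],
            'in' => ['$concatArrays' => ['$$value', '$$this']],
        ]],
    ]];

    return $pipeline;
}

// Usage inside the controller (same variables as in the question):
// Product::raw(function ($collection) use ($filters, $fullText, $likeKey, $likeValue) {
//     return $collection->aggregate(buildProductPipeline($fullText, $filters, $likeKey, $likeValue));
// });
```

Skipping empty stages, rather than sending an empty `$match`, means each search method can be used alone or combined without the unused ones filtering out results.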

How can I fetch results from Elasticsearch for one column with different values at the same time?

I am using the REST API from PHP to fetch data from Elasticsearch with the following code:
$params = [
    'index' => $search_index,
    'type' => $search_type,
    'from' => $_POST["from"],
    'size' => $_POST["fetch"],
    'body' => [
        'query' => [
            'bool' => [
                'must' => [
                    [ 'match' => [ 'is_validated' => false ] ],
                    [ 'query_string' => [ 'query' => $search_str, 'default_operator' => 'OR' ] ]
                ]
            ]
        ]
    ]
];
This works perfectly and gives me the results I want.
The data returned from ES has one column (field), "result_source", with predefined values like CNN, BBC or YouTube.
What I need is to filter results on the "result_source" field so that I only fetch results with the values I choose, e.g. only "YouTube", only "BBC and CNN", or only "CNN or YouTube".
I have already tried the "should" option, but it also returns documents with other values that I don't need. I'm not sure how to exclude those "result_source" values when fetching results from ES.
Any help on this will be appreciated.
Thanks
Solved!!
I am replying to my own question because I found a solution. Maybe it can help someone else in the future.
If anyone is looking for a way to search within a specific Elasticsearch field (column), here is what can be done:
[ 'query_string' => [ 'query' => '(' . $search_str . ') AND (result_source:CNN OR result_source:BBC)', 'default_operator' => 'OR' ] ]
"result_source" is the ES field (column) on which the filter is applied, so only results with result_source=BBC or result_source=CNN are returned. The explicit AND matters: with 'default_operator' => 'OR', clauses joined without an operator are optional, so documents from other sources could still match on the search terms alone.
This solved my issue.
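An alternative sketch that keeps the search text and the source restriction separate: a `terms` clause in the bool query's `filter` section. This also suggests why the `should` attempt did not work: when a bool query has `must` clauses, its `should` clauses are optional and only influence scoring. The sketch assumes `result_source` is mapped as an exact-value field (`keyword`, or `not_analyzed` string in 2.x); with default dynamic mapping in 5.x+ that would typically be the `result_source.keyword` subfield instead. The helper name is hypothetical.

```php
<?php
// Hypothetical helper: the question's bool query plus an optional `terms`
// filter that keeps only documents whose result_source is in $allowedSources.
// Assumes result_source is an exact-value (keyword / not_analyzed) field.
function buildSourceFilteredQuery(string $searchStr, array $allowedSources): array
{
    $bool = [
        'must' => [
            ['match' => ['is_validated' => false]],
            ['query_string' => ['query' => $searchStr, 'default_operator' => 'OR']],
        ],
    ];

    if (!empty($allowedSources)) {
        // Filter clauses must match, but they do not contribute to the score.
        $bool['filter'] = [
            ['terms' => ['result_source' => array_values($allowedSources)]],
        ];
    }

    return ['query' => ['bool' => $bool]];
}

// e.g. only "CNN or YouTube"; the search text is illustrative.
$body = buildSourceFilteredQuery('election', ['CNN', 'YouTube']);
// With the official client: $client->search(['index' => $search_index, 'body' => $body]);
```

Because the allowed values are passed as an array, "only YouTube" or "BBC and CNN" are just different arrays rather than different query strings.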

How can I fetch distinct records from Elasticsearch?

I have been working with Elasticsearch (ES) for the last couple of weeks. There are currently millions of records in different search indices in ES.
I have noticed that records are duplicated across different search indices, and this is creating problems.
We could search for duplicate records in code and remove them. Maybe that would work, but I have more than 100 million records, so it would take a lot of time.
My requirement is that, while we fetch records from ES, we can apply different filters. Is there any filter or other way to fetch only distinct records? I am currently using the REST API from PHP.
Here is the code I am currently using; the filters work perfectly.
$params = [
    'index' => 'MyIndex',
    'type' => 'MyType',
    'from' => 0,
    'size' => 10,
    'body' => [
        'query' => [
            'bool' => [
                'must' => [
                    [ 'match' => [ 'image' => true ] ],
                    [ 'simple_query_string' => [ 'query' => 'MyQuery' ] ]
                ]
            ]
        ]
    ]
];
I also tried looking into aggregations, but couldn't find anything related to my requirement.
Quick help will be highly appreciated.
Thanks in advance.
I think what you are looking for is "collapsing".
Elasticsearch supports it from 6.x:
https://www.elastic.co/guide/en/elasticsearch/reference/6.x/search-request-collapse.html
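A minimal sketch of adding field collapsing to the question's params, so that only the top-scoring hit per distinct value of one field is returned. It assumes a hypothetical `content_hash` field (for example, a hash of whatever fields make two records "the same"), which collapsing requires to be a single-valued `keyword` or numeric field with doc values. The index name is lowercased because Elasticsearch index names must be lowercase, and `type` is omitted since mapping types are deprecated from 7.x. Per the docs, `hits.total` still counts matching documents before collapsing.

```php
<?php
// Hypothetical helper: adds field collapsing (Elasticsearch 6.x+) to existing
// search params, so only the top-scoring hit per distinct value of $field is
// returned. $field must be a single-valued keyword or numeric field with doc_values.
function withCollapse(array $params, string $field): array
{
    $params['body']['collapse'] = ['field' => $field];
    return $params;
}

$params = [
    'index' => 'myindex',
    'from' => 0,
    'size' => 10,
    'body' => [
        'query' => [
            'bool' => [
                'must' => [
                    ['match' => ['image' => true]],
                    ['simple_query_string' => ['query' => 'MyQuery']],
                ],
            ],
        ],
    ],
];

// "content_hash" is an assumed field that identifies duplicate records.
$params = withCollapse($params, 'content_hash');
// With the official client: $client->search($params);
```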

Elasticsearch search delays pulling latest data after indexing

I am using the official PHP client to connect to Elasticsearch (v2.3). Every time I index a new document, it takes 5 to 60 seconds before it shows up in my filter results. How can I cut the delay down to zero?
Here is my indexing code:
# Document Body
$data = [];
$data['time'] = $time;
$data['unique'] = 1;
$data['lastACtivity'] = $time;
$data['bucket'] = 20;
$data['permission'] = $this->_user->permission; # Extracts User Permission
$data['ipaddress'] = $this->_client->ipaddress(); # Extracts User IP Address
# Construct Index
$indexRequest = [
'index' => 'gorocket',
'type' => 'log',
'refresh' => true,
'body' => $data
];
# Indexing Document
$confirmation = $client->index( $indexRequest );
And here is my search filter query:
# Query array
$query = [
    'query' => [
        'filtered' => [
            'filter' => [
                'bool' => [
                    'must' => [
                        [
                            'match' => [ 'unique' => 1 ]
                        ],
                        [
                            'range' => [
                                'lastACtivity' => [
                                    'gte' => $from,
                                    'lte' => $to
                                ],
                                '_cache' => false
                            ]
                        ]
                    ],
                    'must_not' => [
                        [ 'match' => [ 'type' => 'share' ] ],
                    ]
                ]
            ]
        ]
    ]
];
# Prepare filter parameters
$filterParams = [
    'index' => 'gorocket',
    'type' => 'log',
    'size' => 20,
    'query_cache' => false,
    'body' => $query
];
$client->search($filterParams);
Thank you.
When you index a new document you can specify the refresh parameter in order to make the new document available immediately for your next search operation.
$params = [
    'index' => 'my-index',
    'type' => 'my-type',
    'id' => 123,
    'body' => $data,
    'refresh' => true // <--- add this
];
$response = $client->index($params);
The refresh parameter is also available on the bulk operation if you're using it.
Be aware, though, that refreshing too often can have negative impacts on performance.
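Since refreshing after every document is costly, one option when indexing many documents is a single bulk request with `refresh` set, so the index is refreshed once per batch rather than once per document. A minimal sketch of the params for the official PHP client's `bulk()` call; the helper name and documents are illustrative, and `_type` matches the question's 2.x setup (mapping types were later deprecated).

```php
<?php
// Hypothetical helper: builds params for one bulk request that indexes many
// documents and asks for a single refresh once the whole batch is applied,
// instead of one refresh per document.
function buildBulkIndexParams(string $index, string $type, array $docs): array
{
    $body = [];
    foreach ($docs as $doc) {
        // Each document is preceded by its action/metadata entry.
        $body[] = ['index' => ['_index' => $index, '_type' => $type]];
        $body[] = $doc;
    }

    return [
        'refresh' => true, // refresh once after the bulk request
        'body' => $body,
    ];
}

$params = buildBulkIndexParams('gorocket', 'log', [
    ['unique' => 1, 'bucket' => 20],
    ['unique' => 1, 'bucket' => 21],
]);
// With the official client: $client->bulk($params);
```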
There is also a refresh_interval index setting, which takes a time value (1s by default) and controls how often the index is refreshed. For example, if you update something in the index, the change is written but is not visible to searches until the index is refreshed.
Setting refresh to true refreshes the index as soon as a change happens. This needs to be thought through carefully, because it often degrades performance: refreshing after every small operation is overkill, and many refreshes during bulk operations can keep the index busy.
Tip: use an Elasticsearch plugin such as kopf to see and configure more options like the refresh rate.
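A minimal sketch of changing `refresh_interval` through the official PHP client's `indices()->putSettings()` call instead of a plugin; the helper name is hypothetical. A value of `'-1'` disables periodic refreshes, which is sometimes done during large bulk loads before setting the interval back.

```php
<?php
// Hypothetical helper: builds params for updating an index's refresh_interval.
// Examples: '1s' (the Elasticsearch default), '30s', or '-1' to disable
// periodic refreshes (e.g. during a large bulk load).
function buildRefreshIntervalParams(string $index, string $interval): array
{
    return [
        'index' => $index,
        'body' => [
            'index' => ['refresh_interval' => $interval],
        ],
    ];
}

$params = buildRefreshIntervalParams('gorocket', '1s');
// With the official client: $client->indices()->putSettings($params);
```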

Elasticsearch exact match field

I have a field called url that is set to not_analyzed when I index it:
'url' => [
    'type' => 'string',
    'index' => 'not_analyzed'
]
Here is my method to determine if a URL already exists in the index:
public function urlExists($index, $type, $url) {
    $params = [
        'index' => $index,
        'type' => $type,
        'body' => [
            'query' => [
                'match' => [
                    'url' => $url
                ]
            ]
        ]
    ];
    $results = $this->client->count($params);
    return ($results['count'] > 0);
}
This seems to work fine; however, I can't be 100% sure it is the correct way to find an exact match, because the docs show another way to do the search, with params like:
$params = [
    'index' => $index,
    'type' => $type,
    'body' => [
        'query' => [
            'filtered' => [
                'filter' => [
                    'term' => [
                        'url' => $url
                    ]
                ]
            ]
        ]
    ]
];
My question is: would both sets of params work the same way for a not_analyzed field?
The second query is the right approach. Term-level queries/filters should be used for exact matches. The biggest advantage is caching: Elasticsearch uses bitsets for this, and you will get quicker response times on subsequent calls.
From the docs:
Exclude as many documents as you can with a filter, then query just the documents that remain.
Also, if you look at your output, you will find that the _score of every document is 1, because scoring is not applied to filters (the same goes for highlighting), whereas with the match query you will see varying _score values. Again, from the docs:
Keep in mind that once you wrap a query as a filter, it loses query features like highlighting and scoring because these are not features supported by filters.
Your first query uses match, which is mainly meant for analyzed fields, e.g. when you want both Google and google to match all your documents containing google (case-insensitive). On a not_analyzed field, match does not analyze the input either, so both queries should match the same documents; the difference lies in scoring and caching.
Hope this helps!!
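For newer versions: the `filtered` query was removed in Elasticsearch 5.0, and `string` with `not_analyzed` became the `keyword` type. A sketch of the same exact-match check for 5.x+ (typeless, 7.x-style), assuming `url` is mapped as `keyword`; the helper name and example URL are hypothetical. The `term` clause sits in `bool.filter`, so it runs in filter context (no scoring, eligible for caching).

```php
<?php
// Mapping for the url field in 5.x+: "keyword" replaces string + not_analyzed.
$urlMapping = ['url' => ['type' => 'keyword']];

// Hypothetical helper: count params for an exact-match check on url.
// The term clause sits in bool.filter, so it is not scored.
function buildUrlExistsParams(string $index, string $url): array
{
    return [
        'index' => $index,
        'body' => [
            'query' => [
                'bool' => [
                    'filter' => [
                        ['term' => ['url' => $url]],
                    ],
                ],
            ],
        ],
    ];
}

$params = buildUrlExistsParams('my_index', 'https://example.com/page');
// With the official client:
// $exists = $client->count($params)['count'] > 0;
```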
