Advanced search with Elasticsearch


I've created a small application with PHP and I use ES.
My request works, but I don't get the right results.
My request looks like this:
link:9200/index/_search?from=0&size=130&q=try:'yes' %2Bbrand:'BMW' %2Bmodel:'SERIE 5' %2Bprice:[500 TO 700000]
When I send this query, ES returns both 'SERIE 3' and 'SERIE 5' models. That's great, but I would like to get only 'BMW' 'SERIE 5' results.
How can I fix this?

First, you should take a look at the documentation to become more familiar with these notions (analysis, and the difference between queries and filters), which are very important for using Elasticsearch well. A good getting-started guide is available in the official Elasticsearch documentation.
Your problem is that your "model" field is a string, which by default is analyzed using the standard analyzer.
It outputs 2 tokens because of the whitespace in the model name, as you can see with the _analyze endpoint:
GET _analyze?analyzer=standard&text='Serie 5'
{
"tokens": [
{
"token": "serie",
"start_offset": 1,
"end_offset": 6,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "5",
"start_offset": 7,
"end_offset": 8,
"type": "<NUM>",
"position": 2
}
]
}
On top of that, you're using a query, and a query returns every document that matches, even partially. So you are certainly getting both cars in your results, but the "SERIE 5" car should rank above the "SERIE 3" car because it matches better, which shows up as a higher _score.
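To see why both cars come back, here is a rough Python approximation of what the standard analyzer does to the model values. It is a sketch under a simplifying assumption: the real analyzer follows Unicode text-segmentation rules, whereas this version just splits on non-alphanumeric characters and lowercases. `approx_standard_analyze` and `shared_terms` are illustrative helpers, not Elasticsearch APIs.

```python
import re


def approx_standard_analyze(text):
    """Approximate the standard analyzer: split on non-alphanumerics, lowercase.

    Simplification: the real analyzer uses Unicode text segmentation.
    """
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def shared_terms(query, field_value):
    """Terms that the analyzed query and the analyzed field value have in common.

    With an OR-style full-text query, one shared term is enough to match;
    more shared terms generally push the _score higher.
    """
    return set(approx_standard_analyze(query)) & set(approx_standard_analyze(field_value))


if __name__ == "__main__":
    print(approx_standard_analyze("'Serie 5'"))  # ['serie', '5']
    for model in ("SERIE 5", "SERIE 3"):
        # "SERIE 3" still shares the term "serie" with the query text.
        print(model, sorted(shared_terms("SERIE 5", model)))
```

Both models share the term "serie", which is why a full-text query cannot exclude "SERIE 3" on its own.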
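You need to use a term filter, which returns only the documents containing exactly the term value you provided.
However, since it works on terms, you have to change the mapping of your field to "not_analyzed" so the value is kept as-is. The analysis settings of an existing field can't be changed in place, so in practice this means defining the mapping on a new index and reindexing your data: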
PUT /test/car/_mapping
{
"properties":{
"model":{
"type": "string",
"index":"not_analyzed"
}
}
}
Finally, the search request will look something like this (with the price criterion as a range filter, and an "and" filter combining both):
GET /test/car/_search
{
"query": {
"filtered": {
"filter": {
"and": {
"filters": [
{
"term": {
"model": "SERIE 5"
}
},
{
"range": {
"price": {
"from": 500,
"to": 700000
}
}
}
]
}
}
}
}
}

Your query (URL-decoded) looks like this:
link:9200/index/_search?from=0&size=130&q=try:'yes' +brand:'BMW' +model:'SERIE 5' +price:[500 TO 700000]
I think you are using '+' incorrectly, so your query is effectively doing an OR operation.
If you want results with try:yes, brand:BMW and model:SERIE 5, then you have to join these clauses with the AND keyword, like this:
link:9200/index/_search?from=0&size=130&q=try:'yes' AND brand:'BMW' AND model:'SERIE 5' AND price:[500 TO 700000]
You should also be careful about the analyzer you choose (in the mapping of your fields), so that values are indexed the way you want.
This should work. Thanks
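If the query string is assembled in code, it helps to let a URL encoder handle the special characters instead of writing %2B by hand. A minimal Python sketch, with `build_search_url` and the base URL as hypothetical placeholders. One adjustment is an assumption on top of the answer: values are wrapped in double quotes, because the Lucene query-string syntax uses double quotes (not single quotes) for phrases, so `model:"SERIE 5"` is treated as one phrase rather than two separate terms.

```python
from urllib.parse import urlencode


def build_search_url(base, criteria, price_range, offset=0, size=130):
    """Build a URI-search URL whose q parameter requires every clause.

    Assumptions: `base` is a placeholder such as "http://localhost:9200/index",
    and values are wrapped in double quotes (the query-string phrase delimiter).
    """
    clauses = ['{}:"{}"'.format(field, value) for field, value in criteria.items()]
    low, high = price_range
    clauses.append("price:[{} TO {}]".format(low, high))
    q = " AND ".join(clauses)
    # urlencode percent-encodes quotes, brackets, colons and any literal '+'
    # (as %2B), and encodes spaces as '+'.
    return base + "/_search?" + urlencode({"from": offset, "size": size, "q": q})


if __name__ == "__main__":
    print(build_search_url(
        "http://localhost:9200/index",
        {"try": "yes", "brand": "BMW", "model": "SERIE 5"},
        (500, 700000),
    ))
```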

Related

Can you use named queries on Elasticsearch with PHP?

I know you can use named queries in Elasticsearch to test which document matched best in Kibana, but I'm running WordPress with Jetpack Search, which uses Elasticsearch PHP (v2.4). I want to be able to test my queries and return the named queries for each result, so I can better understand whether my queries returned what I intended. This is how it's done in Elasticsearch (JSON):
...
"must": [
{
"match": {
"body": {
"query": "Will Smith",
"_name": "match_will_smith"
}
}
}
],
"should": [
{
"match_phrase": {
"body": {
"query": "Will Smith",
"slop": 5,
"_name": "should_match_phrase_will_smith_with_slop"
}
}
}
]...
Result:
"matched_queries" : [
"match_will_smith",
"should_match_phrase_will_smith_with_slop"
]
It would be great if I could get the value of the "matched_queries" array and print it on my PHP page for every result, so I can see what each article is matching. Does anyone know if this is possible?

I think the closest thing is using the explain parameter in your query:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-explain.html
It shows how the score of each document was calculated and which parts of your query contributed to it,
but I would not use it in a production environment.
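The question's own "Result:" excerpt shows named queries coming back as a `matched_queries` list on each hit, and `explain` adds an `_explanation` object per hit. A minimal Python sketch that reads both from an already-decoded response, assuming the client hands back the raw JSON structure (in PHP that would be the equivalent nested array). `describe_hits` and the sample data are hypothetical.

```python
def describe_hits(response):
    """Summarise each hit of a decoded Elasticsearch search response.

    "matched_queries" appears on hits when the query contains clauses with "_name";
    "_explanation" appears when the request sets "explain": true.
    """
    lines = []
    for hit in response.get("hits", {}).get("hits", []):
        names = ", ".join(hit.get("matched_queries", [])) or "-"
        line = "{}: score={} matched={}".format(hit.get("_id"), hit.get("_score"), names)
        if "_explanation" in hit:
            line += " explanation={}".format(hit["_explanation"].get("description", ""))
        lines.append(line)
    return lines


if __name__ == "__main__":
    # Hypothetical response, shaped like the "Result:" excerpt in the question.
    sample = {
        "hits": {
            "hits": [
                {
                    "_id": "42",
                    "_score": 1.7,
                    "matched_queries": [
                        "match_will_smith",
                        "should_match_phrase_will_smith_with_slop",
                    ],
                }
            ]
        }
    }
    for line in describe_hits(sample):
        print(line)
```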

Any way to prevent Doctrine from nesting entities when adding a custom field to QueryBuilder?

I am adding a calculated field named 'distance' to a Doctrine query, but the result then nests the entities as follows:
[
{
"0": {
"name": "Some name",
"id": 3
},
"distance": "10"
},
{
...
]
Is there a way to tell Doctrine to format the response like this instead?
[
{
"name": "Some name",
"id": 3,
"distance": "10"
},
{
...
]
I don't always add this field, as it depends on the search criteria, so the result format is inconsistent.
I can also avoid the issue by selecting the distance field as HIDDEN, but then I lose the distance information, which I would like to keep.
Any help appreciated, thanks.
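This question has no answer on the page. One possible client-side workaround is to post-process each row so the entity fields and the extra scalar end up at the same level. A minimal Python sketch over rows decoded into the shape shown above; `flatten_row` is a hypothetical helper, not a Doctrine API. In PHP the same idea maps to merging `$row[0]` with the remaining scalar keys of `$row`.

```python
def flatten_row(row):
    """Merge a mixed Doctrine-style row into one flat record.

    Assumes the decoded shape from the question: entity fields under the key
    "0" (or 0) and extra selected scalars such as "distance" beside it.
    Rows without that key are returned unchanged (as a copy).
    """
    entity_key = next((key for key in (0, "0") if key in row), None)
    if entity_key is None:
        return dict(row)
    flat = dict(row[entity_key])
    flat.update({key: value for key, value in row.items() if key != entity_key})
    return flat


if __name__ == "__main__":
    rows = [
        {"0": {"name": "Some name", "id": 3}, "distance": "10"},
        {"name": "Other name", "id": 4},  # query without the distance field
    ]
    print([flatten_row(row) for row in rows])
```

Applying it to every row gives the same flat format whether or not the distance field was selected.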

Custom sorting in Elasticsearch

Does anyone know if it's possible to do a custom sort in Elasticsearch?
I have a sort on the category field, which groups all of the records together by category. This works great.
However, could you then give the sort a list, e.g. cars, books, food,
so that it shows the cars first, then books, and finally food?

You can use a function_score query, something like this:
{
"query": {
"function_score": {
"query": { "match_all": {} },
"boost": "5",
"functions": [
{
"filter": { "match": { "category": "cars" } },
"weight": 100
},
{
"filter": { "match": { "category": "books" } },
"weight": 50
},
{
"filter": { "match": { "category": "food" } },
"weight": 1
}
],
"score_mode": "max",
"boost_mode": "replace"
}
}
}
Of course, replace the match_all query with whichever query you are using now, and leave off the sort (the default is by score, which is what you want here).
This replaces the score Elasticsearch normally generates with a custom score for each category. You could experiment with other boost_mode values to get a reasonable ranking within each category. If you need to understand what is happening with the scoring, you can add "explain": true at the top level of the request.
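If the category order comes from configuration, the same request body can be generated from a list. A minimal Python sketch, where `category_order_query` is a hypothetical helper: earlier categories get larger weights, and any strictly decreasing weights work (the answer uses 100, 50 and 1).

```python
def category_order_query(categories, base_query=None):
    """Build a function_score body that ranks documents by a fixed category order.

    Earlier entries in `categories` get larger weights. With score_mode "max"
    and boost_mode "replace", a document's score becomes the weight of the
    category function it matches, so the default sort by _score follows the list.
    """
    count = len(categories)
    functions = [
        {"filter": {"match": {"category": name}}, "weight": count - position}
        for position, name in enumerate(categories)
    ]
    return {
        "query": {
            "function_score": {
                "query": base_query if base_query is not None else {"match_all": {}},
                "functions": functions,
                "score_mode": "max",
                "boost_mode": "replace",
            }
        }
    }


if __name__ == "__main__":
    import json

    print(json.dumps(category_order_query(["cars", "books", "food"]), indent=2))
```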

You can also use a custom script for your own sorting.
More details are in the Script Based Sorting section: https://www.elastic.co/guide/en/elasticsearch/reference/5.5/search-request-sort.html

Elasticsearch: What's the best way to search for a word within a string AND get score?

I'm using Elasticsearch's PHP client, and I find it really difficult to return results with scores whenever I want to search for a word that is "hidden" within a string.
This is an example:
I want to get all the documents where the field "file" has the word "anses" and files are named like this:
axx14anses19122015.zip
What I know about it
I know I should tokenize those words, but I can't figure out how to do it.
Also I've read about aggregations but I'm really new to ES and I have to deliver a working piece ASAP.
What I've tried so far
REGEXP: regular expressions are very expensive and do not return any scores, which are a must-have for narrowing down results and giving the user accurate information.
Wildcards: same thing, slow and no scores
My own script: I keep a dictionary and search for critical words using regexp; if one matches, I create a new field in the matched document containing that word. The point is to create a TOKEN so that future searches can use a regular match with scores. Downside: my boss rejected the dictionary approach outright, so I'm here asking for any ideas.
Thanks in advance.

In your case I suggest an nGram tokenizer; see the example.
I will create an analyzer and a mapping for a doc type:
PUT /test_index
{
"settings": {
"number_of_shards": 1,
"analysis": {
"tokenizer": {
"ngram_tokenizer": {
"type": "nGram",
"min_gram": 4,
"max_gram": 4,
"token_chars": [ "letter", "digit" ]
}
},
"analyzer": {
"ngram_tokenizer_analyzer": {
"type": "custom",
"tokenizer": "ngram_tokenizer",
"filter": [
"lowercase"
]
}
}
}
},
"mappings": {
"doc": {
"properties": {
"text_field": {
"type": "string",
"term_vector": "yes",
"analyzer": "ngram_tokenizer_analyzer"
}
}
}
}
}
After that, I'll insert a document using your file name:
PUT /test_index/doc/1
{
"text_field": "axx14anses19122015"
}
Now I'll just use a match query:
POST /test_index/_search
{
"query": {
"match": {
"text_field": "anses"
}
}
}
and receive a response like this:
{
"took": 8,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.10848885,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 0.10848885,
"_source": {
"text_field": "axx14anses19122015"
}
}
]
}
}
What did I do?
I created an nGram tokenizer that splits the string into 4-character terms and indexes those terms separately, so they match when I search for part of the string.
To learn more, read this article: https://qbox.io/blog/an-introduction-to-ngrams-in-elasticsearch
Hope it helps!
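A small Python sketch of what the 4-gram analyzer above does, assuming it is applied to both the indexed value and the match query text (the default when no separate search analyzer is configured). `ngrams` is an illustrative helper, not part of Elasticsearch.

```python
def ngrams(text, size):
    """Fixed-size character grams, mirroring min_gram == max_gram == 4 plus
    the lowercase filter. The sample is all letters and digits, so
    token_chars never splits it here."""
    text = text.lower()
    return [text[i:i + size] for i in range(len(text) - size + 1)]


if __name__ == "__main__":
    indexed = set(ngrams("axx14anses19122015", 4))
    query = ngrams("anses", 4)  # the match query text is analyzed the same way
    print(query)  # ['anse', 'nses']
    print([gram for gram in query if gram in indexed])  # ['anse', 'nses']
```

Every gram of "anses" is also a gram of the file name, which is why the plain match query finds the document and returns a score.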

OK, after trying so many times, it worked. I'll share the solution in case someone else needs it. Thank you so much to Waldemar; it was a really good approach, though I still can't see why it didn't work for me.
curl -XPUT 'http://ipaddresshere/tokentest' -d '{ "settings":
{ "number_of_shards": 1, "analysis" :
{ "analyzer" : { "myngram" : { "tokenizer" : "mytokenizer" } },
"tokenizer" : { "mytokenizer" : {
"type" : "nGram",
"min_gram" : "3",
"max_gram" : "5",
"token_chars" : [ "letter", "digit" ] } } } },
"mappings":
{ "doc" :
{ "properties" :
{ "field" : {
"type" : "string",
"term_vector" : "yes",
"analyzer" : "myngram" } } } } }'
Sorry for the bad indentation; I'm in a real hurry but wanted to post the solution.
So this takes any string from "field" and splits it into nGrams of length 3 to 5. For example, "abcanses14f.zip" will result in:
abc, abca, abcan, bca, bcan, bcans, etc., until it reaches anses or a similar term, which is matchable and has a score associated with it.
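A rough Python approximation of the "mytokenizer" settings above, useful for checking which grams a file name produces. It assumes that any character outside letters and digits ends a run (the effect of token_chars) and lists grams by start position, then length, as in the listing above. `ngram_tokens` is a hypothetical helper.

```python
import re


def ngram_tokens(text, min_gram=3, max_gram=5):
    """Approximate the "mytokenizer" settings (nGram, 3 to 5, letters and digits).

    Characters such as '.', '_' or '-' end a run and never appear in a gram.
    No lowercasing, because the "myngram" analyzer declares no lowercase filter.
    """
    tokens = []
    for run in re.findall(r"[^\W_]+", text):  # runs of letters and digits
        for start in range(len(run)):
            for size in range(min_gram, max_gram + 1):
                if start + size <= len(run):
                    tokens.append(run[start:start + size])
    return tokens


if __name__ == "__main__":
    tokens = ngram_tokens("abcanses14f.zip")
    print(tokens[:6])  # ['abc', 'abca', 'abcan', 'bca', 'bcan', 'bcans']
    print("anses" in tokens)  # True
```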

Percentage of OR conditions matched in mongodb

My data is in the following format:
{
"_id" : ObjectId("534fd4662d22a05415000000"),
"product_id" : "50862224",
"ean" : "8808992479390",
"brand" : "LG",
"model" : "37LH3000",
"features" : [
{
"key" : "Screen Format",
"value" : "16:9"
}, {
"key" : "DVD Player / Recorder",
"value" : "No"
}, {
"key" : "Weight in kg",
"value" : "12.6"
}
// ... and so on
]
}
I need to compare the features of one product with those of other products and divide the results into separate categories (100% match, 50-99% match) based on the percentage of matching features.
My initial thought was to build a dynamic query with an OR condition for each feature and calculate the percentage in PHP, but that means MongoDB will return even products that have only 1 matching feature. And I think nearly all products in a category might have some feature in common, so I fear I might end up processing a lot of products in PHP.
I basically have two questions:
Is there an alternative approach?
Is the data structure I am using good enough to support the functionality I am looking for, or should I consider changing it?

Your solution really should be MongoDB-specific; otherwise you will end up doing your calculations, and possibly the matching, on the client side, and that is not going to be good for performance.
So what you really want is a way to do that processing on the server side:
db.products.aggregate([
// Match the documents that meet your conditions
{ "$match": {
"$or": [
{
"features": {
"$elemMatch": {
"key": "Screen Format",
"value": "16:9"
}
}
},
{
"features": {
"$elemMatch": {
"key" : "Weight in kg",
// Note: "value" is stored as a string, so $gt/$lt compare lexically ("12.6" sorts before "5")
"value" : { "$gt": "5", "$lt": "8" }
}
}
},
]
}},
// Keep the document and a copy of the features array
{ "$project": {
"_id": {
"_id": "$_id",
"product_id": "$product_id",
"ean": "$ean",
"brand": "$brand",
"model": "$model",
"features": "$features"
},
"features": 1
}},
// Unwind the array
{ "$unwind": "$features" },
// Find the actual elements that match the conditions
{ "$match": {
"$or": [
{
"features.key": "Screen Format",
"features.value": "16:9"
},
{
"features.key" : "Weight in kg",
"features.value" : { "$gt": "5", "$lt": "8" }
},
]
}},
// Count those matched elements
{ "$group": {
"_id": "$_id",
"count": { "$sum": 1 }
}},
// Restore the document and divide the matched elements by the
// number of conditions in the "$or"
{ "$project": {
"_id": "$_id._id",
"product_id": "$_id.product_id",
"ean": "$_id.ean",
"brand": "$_id.brand",
"model": "$_id.model",
"features": "$_id.features",
"matched": { "$divide": [ "$count", 2 ] }
}},
// Sort by the matched percentage
{ "$sort": { "matched": -1 } }
])
Since you know the "length" of the $or condition being applied, you simply need to find out how many of the elements in the "features" array match those conditions. That is what the second $match in the pipeline is all about.
Once you have that count, you simply divide it by the number of conditions that were passed in to your $or. The beauty here is that you can now do something useful with this, like sort by that relevance and even "page" the results on the server side.
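The counting and dividing steps of that pipeline can be mirrored in plain Python to sanity-check the percentages on a few sample documents. A minimal sketch with hypothetical data and helper names (`match_ratio`, `rank_products`); unlike the pipeline, it compares the weight as a number, which is an assumption made for this illustration.

```python
def match_ratio(features, conditions):
    """Share of the $or conditions satisfied by a product's features.

    Mirrors the pipeline: count feature elements that satisfy some condition
    ($unwind + second $match + $group/$sum), then divide by the number of
    conditions ($divide). Each condition is a (key, predicate) pair.
    """
    count = sum(
        1
        for feature in features
        if any(feature["key"] == key and predicate(feature["value"])
               for key, predicate in conditions)
    )
    return count / len(conditions)


def rank_products(products, conditions):
    """Drop products with no matching feature (first $match), sort by ratio."""
    scored = [(match_ratio(p["features"], conditions), p["model"]) for p in products]
    return sorted((s for s in scored if s[0] > 0), reverse=True)


if __name__ == "__main__":
    conditions = [
        ("Screen Format", lambda v: v == "16:9"),
        # Assumption: numeric comparison here; the pipeline compares strings.
        ("Weight in kg", lambda v: 5 < float(v) < 8),
    ]
    products = [  # hypothetical data
        {"model": "A", "features": [{"key": "Screen Format", "value": "16:9"},
                                    {"key": "Weight in kg", "value": "6.5"}]},
        {"model": "B", "features": [{"key": "Screen Format", "value": "16:9"},
                                    {"key": "Weight in kg", "value": "12.6"}]},
        {"model": "C", "features": [{"key": "Screen Format", "value": "4:3"}]},
    ]
    print(rank_products(products, conditions))  # [(1.0, 'A'), (0.5, 'B')]
```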
Of course if you want some additional "categorization" of this, all you would need to do is add another $project stage to the end of the pipeline:
{ "$project": {
"product_id": 1,
"ean": 1,
"brand": 1,
"model": 1,
"features": 1,
"matched": 1,
"category": { "$cond": [
{ "$eq": [ "$matched", 1 ] },
"100",
{ "$cond": [
{ "$gte": [ "$matched", .7 ] },
"70-99",
{ "$cond": [
{ "$gte": [ "$matched", .4 ] },
"40-69",
"under 40"
]}
]}
]}
}}
Or something similar. The $cond operator is what helps you here.
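The nested $cond expressions are easier to check outside MongoDB. This hypothetical `match_category` function mirrors the thresholds of that last $project stage; the boundaries (70 and 40) come from this answer rather than from the question's 50-99 split, so adjust them as needed.

```python
def match_category(matched):
    """Python equivalent of the nested $cond expressions above."""
    if matched == 1:
        return "100"
    if matched >= 0.7:
        return "70-99"
    if matched >= 0.4:
        return "40-69"
    return "under 40"


if __name__ == "__main__":
    print([match_category(m) for m in (1.0, 0.75, 0.5, 0.25)])
    # ['100', '70-99', '40-69', 'under 40']
```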
The structure should be fine as you have it, since you can create a compound index on the "key" and "value" of the entries in your features array, and this should scale well for queries.
Of course, if you actually need something more than that, such as faceted search and results, you can look at solutions like Solr or Elasticsearch. But a full implementation of that would be a bit lengthy for this answer.

I'm assuming that you'd like to compare the rest of the collection to a given product, which is a textbook example of aggregation:
lookingat = db.products.findOne({product_id:'50862224'})
matches = db.products.aggregate([
{ $unwind: '$features' },
{ $match: { features: { $in: lookingat.features }}},
{ $group: { _id: '$product_id', matchedfeatures: { $sum:1 }}},
{ $sort: { matchedfeatures: -1 }},
{ $limit: 5 },
{ $project: { _id:0, product_id: '$_id',
pctmatch: { $multiply: [ '$matchedfeatures',
100/lookingat.features.length ]}
}}
])
Walking through this briefly from the perspective of a product in the collection that has 6 features, and comparing it to the target product ('lookingat') which has 4 features, 3 of which match:
$unwind turns 1 document with 6 features into 6 otherwise-identical documents with 1 feature each
$match looks for that feature in the target's feature array (be aware that two documents are "equal" only if they have the same field names and values, in the same order), discards the 3 that don't match, and passes along the 3 that do
$group consumes those 3 matching documents and produces a new one that tells you there were 3 documents that matched that product_id
$sort and $limit give you the most relevant results and leave behind all those 1-feature matches you were concerned about
$project lets you rename the _id from the $group step back to product_id and also converts the number of matching features into a percentage (we avoided a $divide operation by recognizing that 2 of the 3 terms in our calculation are constants and can be divided in JS)
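The walkthrough above (a 6-feature product, a 4-feature target, 3 shared features) can be reproduced in a few lines of Python. A minimal sketch with hypothetical data and helper names: each feature is modeled as an ordered tuple of (field, value) pairs, because MongoDB only treats embedded documents as equal when they have the same fields and values in the same order.

```python
def feature(key, value):
    """Ordered (field, value) pairs, like an embedded {key, value} document."""
    return (("key", key), ("value", value))


def pct_match(target_features, candidate_features):
    """Percentage of the target's features found in a candidate product.

    The factor 100 / len(target) is computed once, like the constant that the
    answer divides in JS before building the $project stage.
    """
    factor = 100 / len(target_features)
    matched = sum(1 for f in candidate_features if f in target_features)
    return matched * factor


if __name__ == "__main__":
    # Hypothetical target with 4 features and candidate with 6, 3 of them shared.
    target = [feature("Screen Format", "16:9"), feature("DVD Player / Recorder", "No"),
              feature("Weight in kg", "12.6"), feature("HDMI ports", "3")]
    candidate = [feature("Screen Format", "16:9"), feature("DVD Player / Recorder", "No"),
                 feature("Weight in kg", "12.6"), feature("HDMI ports", "2"),
                 feature("USB", "Yes"), feature("Colour", "Black")]
    print(pct_match(target, candidate))  # 75.0
    # Same field names and values in a different order do not count as equal.
    print((("value", "16:9"), ("key", "Screen Format")) in target)  # False
```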

