I have indexed a document in Elasticsearch that contains nested objects like this:
{
    "student": "John",
    "sport": "Soccer",
    "match": {
        "eventType": "League",
        "date": "2013-12-31T11:00:00.000Z"
    }
}
I need to perform a query that searches for, for example, all league matches (i.e., where doc["match"]["eventType"] == "League").
I am using the Elasticsearch-PHP API 1.1.0 and tried querying like this, without success:
$params['body']['query']['match']['match']['eventType'] = 'League';
I also tried:
$params['body']['query']['match']['match']->eventType = 'League';
What is the correct way to do such a search? The documentation has no such examples.
Try converting this JSON into its PHP equivalent:
{
    "query": {
        "match": {
            "match.eventType": "League"
        }
    }
}
I think this will do the job.
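In elasticsearch-php that JSON maps onto nested arrays, and the important part is that the dotted field name stays a single key rather than the nested 'match' and 'eventType' keys from your attempts. A minimal sketch (the index name is a placeholder and $client is assumed to be your Elasticsearch client instance):
$params['index'] = 'students';   // hypothetical index name, use your own
$params['body']['query']['match']['match.eventType'] = 'League';
$results = $client->search($params);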
As a first step, try using a different name for your soccer 'match' field, e.g. 'game', to avoid a collision with the 'match' query keyword.
I know you can use named queries in Elasticsearch to test which clause a document matched in Kibana, but I'm running WordPress with Jetpack Search, which uses elasticsearch-php (v2.4), and I want to be able to test my queries and return the named queries on each result so I can confirm that my queries matched what I intended. This is how it's done in Elasticsearch (JSON):
...
"must": [
{
"match": {
"body": {
"query": "Will Smith",
"_name": "match_will_smith"
}
}
}
],
"should": [
{
"match_phrase": {
"body": {
"query": "Will Smith",
"slop": 5,
"_name": "should_match_phrase_will_smith_with_slop"
}
}
},
]...
Result:
"matched_queries" : [
"match_will_smith",
"should_match_phrase_will_smith_with_slop"
]
It would be awesome if I could get the value of the "matched_queries" field and print it on my PHP page for every result so I can see what each article is matching. Does anyone know if this is possible?
I think the closest thing is using the explain parameter in your query:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-explain.html
It shows how the score of each document was calculated and which sections of your query contributed to it, but I would not use it in a production environment.
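With the PHP client, explain is just another key in the request body. Here is a rough sketch, assuming $client is your Elasticsearch client instance and posts is a placeholder index name:
$params = [
    'index' => 'posts',          // hypothetical index name
    'body'  => [
        'explain' => true,       // ask Elasticsearch to explain each hit's score
        'query'   => [
            'match' => [ 'body' => 'Will Smith' ]
        ]
    ]
];
$results = $client->search($params);

foreach ($results['hits']['hits'] as $hit) {
    // _explanation breaks down which clauses contributed to the score
    print_r($hit['_explanation']);
}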
I implemented Elasticsearch with PHP for binary documents (via FSCrawler). It works just fine with the default settings: I can search the documents for the word I want and I get case-insensitive results. However, I now want to do exact matches, i.e. on top of the current search, if the query is enclosed in quotes, I want to get only results that match the query exactly, even case-sensitively.
My mapping looks like this:
"settings": {
"number_of_shards": 1,
"index.mapping.total_fields.limit": 2000,
"analysis": {
"analyzer": {
"fscrawler_path": {
"tokenizer": "fscrawler_path"
}
},
"tokenizer": {
"fscrawler_path": {
"type": "path_hierarchy"
}
}
}
.
.
.
"content": {
"type": "text",
"index": true
},
My query for the documents looks like this:
if ($q2 == '') {
    $params = [
        'index' => 'trial2',
        'body' => [
            'query' => [
                'match_phrase' => [
                    'content' => $q
                ]
            ]
        ]
    ];
    $query = $client->search($params);
    $data['q'] = $q;
}
For exact matches (does not work):
if ($q2 == '') {
    $params = [
        'index' => 'trial2',
        'body' => [
            'query' => [
                'filter' => [
                    'term' => [
                        'content' => $q
                    ]
                ]
            ]
        ]
    ];
    $query = $client->search($params);
    $data['q'] = $q;
}
The content field is the body of the document. How do I implement an exact match for a specific word or phrase in the content field?
From what I understand, your content field would be significantly large, as many documents may be more than 2-3 MB, and that's a lot of words.
There'd be no point in using a keyword field for exact matching here, as per the answer to your earlier question where I referred to using keyword; you should use the keyword datatype for exact matching only if your data is structured.
What I understand is that your content field is unstructured. In that case you would want to use the Whitespace Analyzer on your content field.
Also, for exact phrase matching, you may take a look at the Match Phrase query.
Below is a sample index, sample documents, and queries that should cover your use case.
Mapping:
PUT mycontent_index
{
    "mappings": {
        "properties": {
            "content": {
                "type": "text",
                "analyzer": "whitespace"     <----- Note this
            }
        }
    }
}
Sample Documents:
POST mycontent_index/_doc/1
{
    "content": """
There is no pain you are receding
A distant ship smoke on the horizon
You are only coming through in waves
Your lips move but I can't hear what you're saying
"""
}

POST mycontent_index/_doc/2
{
    "content": """
there is no pain you are receding
a distant ship smoke on the horizon
you are only coming through in waves
your lips move but I can't hear what you're saying
"""
}
Phrase Match (to search for a sentence with the words in order):
POST mycontent_index/_search
{
    "query": {
        "bool": {
            "must": [
                {
                    "match_phrase": {        <---- Note this for phrase match
                        "content": "There is no pain"
                    }
                }
            ]
        }
    }
}
Match Query:
POST mycontent_index/_search
{
    "query": {
        "bool": {
            "must": [
                {
                    "match": {               <---- Use this for token-based search
                        "content": "there"
                    }
                }
            ]
        }
    }
}
Note that the responses will differ accordingly.
For an exact match on a single word, just use a simple Match query.
Note that when you do not specify any analyzer, ES uses the Standard Analyzer by default, which converts all tokens to lower case before storing them in the inverted index. The Whitespace Analyzer, however, does not lower-case tokens, so There and there are stored as two different tokens in your ES index.
I'm assuming you are aware of the Analysis and Analyzer concepts; if not, I'd suggest you go through the links, as they will help you understand what I'm talking about.
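If you want to see this difference directly, the _analyze API shows the tokens each analyzer produces; a small sketch via the PHP client (the sample text is arbitrary and $client is assumed to be your Elasticsearch client):
// Standard analyzer lower-cases tokens: "There" is indexed as "there"
$standard = $client->indices()->analyze([
    'body' => ['analyzer' => 'standard', 'text' => 'There is no pain']
]);

// Whitespace analyzer keeps the original case: "There" stays "There"
$whitespace = $client->indices()->analyze([
    'body' => ['analyzer' => 'whitespace', 'text' => 'There is no pain']
]);

print_r(array_column($standard['tokens'], 'token'));    // there, is, no, pain
print_r(array_column($whitespace['tokens'], 'token'));  // There, is, no, pain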
Updated Answer:
Having gone through your requirements: you cannot apply multiple analyzers to a single field directly, so you basically have two options:
Option 1: Use multiple indexes
Option 2: Use a multi-field in your mapping, as shown below
Either way, your script or service layer would hold the logic for routing to the right index or field depending on the input value (queries wrapped in double quotes versus plain tokens); a PHP sketch of that routing follows the mapping below.
Multi Field Mapping:
PUT <your_index_name>
{
    "mappings": {
        "properties": {
            "content": {
                "type": "text",              <--- Field with standard analyzer
                "fields": {
                    "whitespace": {
                        "type": "text",      <--- Field with whitespace analyzer
                        "analyzer": "whitespace"
                    }
                }
            }
        }
    }
}
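With that multi-field in place, the routing logic could look roughly like the following in PHP. This is only a sketch, assuming $q is the raw user input, $client is your Elasticsearch client, and the index is trial2 as in your question:
// Quoted input => case-sensitive phrase match on the whitespace sub-field;
// plain input  => case-insensitive match on the standard-analyzed field.
if (preg_match('/^"(.*)"$/s', trim($q), $matches)) {
    $field = 'content.whitespace';
    $text  = $matches[1];
} else {
    $field = 'content';
    $text  = $q;
}

$params = [
    'index' => 'trial2',
    'body'  => [
        'query' => [
            'match_phrase' => [ $field => $text ]
        ]
    ]
];
$query = $client->search($params);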
Ideally I would prefer the first solution, i.e. using multiple indexes with different mappings; however, I would strongly advise you to revisit your use case, because managing queries this way doesn't make much sense, but again it's your call.
Note: a single-node cluster is the worst possible option you can run, especially for production.
I'd suggest you ask that as a separate question, detailing your document count, expected growth over the next 5 years or so, whether your use case is read-heavy or write-intensive, and whether other teams may also want to leverage the cluster. I'd also suggest you read more and discuss with your team or manager to get more clarity on your scenarios.
Hope this helps.
I am trying to write a query that searches for products on two columns called category1 and category2. I am using the Elasticsearch PHP client and tried a match query inside should, but this gives me wrong results because it matches on partial terms of the category name.
I am looking for an exact match with an OR operation on the two columns. I am new to this, please guide me.
$params['index'] = 'furnit';
$params['type'] = 'products';
$params['body']['query']['bool']['should'] = array(
    array('match' => array('category1' => $category->name)),
    array('match' => array('category2' => $category->name)),
);
$results = $this->elasticsearch->search($params);
If you are not doing full-text search, then using a bare bool query in this scenario is not the right way to do it in Elasticsearch. Queries are for when you are searching for something and the relevance of your search keyword and the score of the matching documents matter.
Here you can apply Elasticsearch's bool filter to filter out the desired results. Using filters together with queries (a filtered query) is the right way to do it, as the filter excludes all non-matching documents and you can then search for the desired documents with match queries.
Here's an example of a bool filter:
{
    "from": 0,
    "size": 50,
    "sort": [
        {
            "name": {
                "order": "asc"
            }
        }
    ],
    "query": {
        "filtered": {
            "query": {
                "match_all": {}
            },
            "filter": {
                "bool": {
                    "should": [
                        {
                            "term": {
                                "category1": "category1"
                            }
                        },
                        {
                            "term": {
                                "category2": "category2"
                            }
                        }
                    ]
                }
            }
        }
    }
}
You can refer to the docs as well: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-filter.html
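In elasticsearch-php the same request is just the array form of that JSON; a sketch, reusing the furnit index, products type and $category from your code (the filtered query applies to the ES 1.x line your client version suggests):
$params['index'] = 'furnit';
$params['type']  = 'products';
$params['body']  = array(
    'from'  => 0,
    'size'  => 50,
    'sort'  => array(
        array('name' => array('order' => 'asc')),
    ),
    'query' => array(
        'filtered' => array(
            // match_all needs an empty object, not an empty array, to serialize correctly
            'query'  => array('match_all' => new \stdClass()),
            'filter' => array(
                'bool' => array(
                    'should' => array(
                        array('term' => array('category1' => $category->name)),
                        array('term' => array('category2' => $category->name)),
                    ),
                ),
            ),
        ),
    ),
);
$results = $this->elasticsearch->search($params);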
Maybe your problem is that you used the default analyzer (the standard analyzer).
Could you share your mapping?
I suggest you change the fields to not_analyzed when indexing and use a term filter/query.
You can use the Put Mapping API to configure this for your fields: Put Mapping
Edit: I have created a gist for you, check it here:
Mappings & Terms Filter
I am currently working on parsing Facebook Graph API data. Take the following for example...
{
    "data": [
        {
            "id": "ID",
            "name": "Creative!"
        },
        {
            "id": "ID",
            "name": "a name"
        }
    ],
    "paging": {
        "cursors": {
            "after": "fdsagfsganhdfs==",
            "before": "gfdwiolrukjhteqrfgbh"
        }
    },
    "summary": {
        "total_count": 2
    }
}
This is an example of what is returned when the Graph API is queried for likes. The issue I am having is that I want a clean way to get the total_count out of this data. Oftentimes it will come in without the summary field if there are no likes. This is easy to parse with a few isset() and array_key_exists() checks, but I will be dealing with a lot of data, and this use case applies to many different types of data from FB. Any advice on just getting the total_count field? FQL would work but seems to be deprecated. Thanks.
If the total_count field is in fact just the number of elements in the data array, then you can simply use PHP's count() function to get the length of that array.
$json_array = json_decode($fb_json_string, true);
$total_count = (isset($json_array['total_count']))? $json_array['total_count']:((isset($json_array['summary']['total_count']))? $json_array['summary']['total_count']:0);
I don't know if I could make it any uglier; it's simply a suggestion, though.
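If you are on PHP 7 or later, the null coalescing operator gives the same fallback chain in a more readable form; a sketch assuming the same decoded array:
$json_array = json_decode($fb_json_string, true);

// Prefer a top-level total_count, then the one under summary, then default to 0.
$total_count = $json_array['total_count']
    ?? $json_array['summary']['total_count']
    ?? 0;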
I have an Elasticsearch query that I am trying to match properly; the field data itself contains dashes (-), and the string values are GUIDs.
It was not matching properly because the analyzer was splitting each term into separate tokens at the dashes.
I have since changed the query to use a match_phrase query like this:
"query": {
"filtered": {
"query": {
"match_phrase":{
"guid":{"operator" : "or","query":"bd2acb42-cf01-11e2-ba92-12313916f4be"}
}
}
}
}
When I try to match just one GUID, this works just fine.
However, I am trying to match multiple GUIDs,
so it currently looks like this:
"query": {
"filtered": {
"query": {
"match_phrase":{
"guid":{"operator" : "or","query":"bd2acb42-cf01-11e2-ba92-12313916f4be d1091f08-ceff-11e2-ba92-12313916f4be"}
}
}
}
}
I assume it's not working because it's trying to match the whole string rather than each GUID separately.
I tried adding "analyzer" : "whitespace" to the query, but this broke the query entirely.
So what is the best method to ensure the query looks for the whole GUID string while allowing multiple GUIDs to match?
I have been setting the field mapping to not_analyzed for similar purposes.
"guid" : {
"type" : "string",
"index" : "not_analyzed"
}
Building the query manually then works.
{
    "bool" : {
        "should" : [
            {
                "term" : { "guid" : "bd2acb42-cf01-11e2-ba92-12313916f4be" }
            },
            {
                "term" : { "guid" : "d1091f08-ceff-11e2-ba92-12313916f4be" }
            }
        ],
        "minimum_number_should_match" : 1
    }
}
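If you are sending this through the PHP client, the same bool query is just a nested array; a sketch, assuming $client is your Elasticsearch client and my_index is a placeholder index name:
$params = [
    'index' => 'my_index',    // hypothetical index name
    'body'  => [
        'query' => [
            'bool' => [
                'should' => [
                    ['term' => ['guid' => 'bd2acb42-cf01-11e2-ba92-12313916f4be']],
                    ['term' => ['guid' => 'd1091f08-ceff-11e2-ba92-12313916f4be']],
                ],
                'minimum_number_should_match' => 1
            ]
        ]
    ]
];
$results = $client->search($params);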