Elasticsearch in PHP doesn't recognize dash

I'm working on a project and trying to build a search with Elasticsearch, but my field can contain a dash, and when I search with it I can't find the result I'm looking for. So I tried to change the mapping, but now the index doesn't work at all. I don't get any error message, but I can't find what I indexed, even when searching on a different field. Here is what I did at first:
$params = [
'index' => 'arc',
'type' => 'purchase',
'id' => $purchase['id'],
'body' => $purchase
];
It worked great, except for the field with the dash. My $purchase looks like this:
array:34 [
"id" => 163160
"distant" => "MOR-938BBM28147090"
[...]
]
So when I search for "MOR" I find the result, but when I search for "MOR-" I get nothing. I tried to change the mapping like this:
$params = [
'index' => 'arc',
'type' => 'purchase',
'id' => $purchase['id'],
'body' => [
'mappings' => [
'_default_' => [
'properties' => [
'distant' => [
'type' => 'string',
'index' => 'not_analyzed'
]
]
]
],
$purchase
]
];
But with that, even if I search for "163160" I can't find any result.

The whitespace analyzer could be the right solution in this case. It splits text into tokens on whitespace only, so characters like "-" or "_" are still treated as part of a term.
But if you need partial matching, for example on the "MOR-" token, you need a slightly more complicated mapping.
As I don't know PHP, I'll use plain Elasticsearch syntax. First, create a proper mapping:
PUT http://127.0.0.1:9200/arc
{
"settings": {
"analysis": {
"analyzer": {
"edge_ngram_analyzer": {
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"type": "edge_ngram",
"min_gram": 3,
"max_gram": 18,
"token_chars": [
"letter",
"digit",
"punctuation"
]
}
}
}
},
"mappings": {
"purchase": {
"properties": {
"distant": {
"type": "string",
"analyzer": "edge_ngram_analyzer"
}
}
}
}
}
As you can see, I use the edge n-gram tokenizer here. When you index a document with MOR-938BBM28147090 in the distant field, it creates the following tokens:
[MOR, MOR-, MOR-9, MOR-93, MOR-938, MOR-938B, MOR-938BB, ...]
The key point is the punctuation character class in the token_chars list. It tells Elasticsearch that the dash character (and some others, like ! or ") should be included in a token and not treated as a split character.
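To make the token list above concrete, here is a small PHP sketch that mimics what an edge n-gram tokenizer emits for a single value. It is only an illustration: the function name edgeNgrams is made up, and it assumes the whole value forms one unbroken token (which holds here, because letters, digits and the dash are all covered by token_chars). The real tokenization happens inside Elasticsearch.

```php
<?php
// Hypothetical helper: lists the prefixes an edge n-gram tokenizer would
// emit for one unbroken token, from $minGram up to $maxGram characters.
function edgeNgrams(string $value, int $minGram, int $maxGram): array
{
    $tokens = [];
    $limit = min(strlen($value), $maxGram);
    for ($len = $minGram; $len <= $limit; $len++) {
        $tokens[] = substr($value, 0, $len);
    }
    return $tokens;
}

$tokens = edgeNgrams('MOR-938BBM28147090', 3, 18);
echo implode(', ', array_slice($tokens, 0, 4)), PHP_EOL; // MOR, MOR-, MOR-9, MOR-93
```

Because "MOR-" and "MOR-93" are stored as tokens of their own, an exact term lookup on either prefix can find the document.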
Now when I index the document:
PUT http://127.0.0.1:9200/arc/purchase/163160
{
"distant": "MOR-938BBM28147090"
}
and run a term search query:
POST http://127.0.0.1:9200/arc/purchase/_search
{
"query": {
"bool" : {
"must" : {
"term" : {
"distant": "MOR-93"
}
}
}
}
}
I get in response:
"hits": {
"total": 1,
"max_score": 0.6337049,
"hits": [
{
"_index": "arc",
"_type": "purchase",
"_id": "163160",
"_score": 0.6337049,
"_source": {
"distant": "MOR-938BBM28147090"
}
}
]
}
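Since the question is about PHP, the index definition above could be expressed as a parameter array for the official elasticsearch-php client. This is a sketch that has not been run against a cluster: it assumes the same old Elasticsearch version as the answer (mapping types and the string field type), and the function name buildArcIndexParams is invented for illustration. The array would normally be passed to $client->indices()->create($params).

```php
<?php
// Hypothetical helper: the settings and mapping from the answer above,
// written as a PHP array for an index-creation request.
function buildArcIndexParams(): array
{
    return [
        'index' => 'arc',
        'body' => [
            'settings' => [
                'analysis' => [
                    'analyzer' => [
                        'edge_ngram_analyzer' => ['tokenizer' => 'my_tokenizer'],
                    ],
                    'tokenizer' => [
                        'my_tokenizer' => [
                            'type' => 'edge_ngram',
                            'min_gram' => 3,
                            'max_gram' => 18,
                            'token_chars' => ['letter', 'digit', 'punctuation'],
                        ],
                    ],
                ],
            ],
            'mappings' => [
                'purchase' => [
                    'properties' => [
                        'distant' => [
                            'type' => 'string',
                            'analyzer' => 'edge_ngram_analyzer',
                        ],
                    ],
                ],
            ],
        ],
    ];
}

$params = buildArcIndexParams();
// With a configured client (assumption): $client->indices()->create($params);
echo json_encode($params['body'], JSON_PRETTY_PRINT), PHP_EOL;
```

Note that the mapping goes into the index-creation request. In the question's second attempt it was placed in the body of a document index request, so it was most likely stored as ordinary document fields rather than applied as a mapping.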

Related

Query an array with must match

I have an index of documents. The index contains the body of each document and its type, e.g. pdf, jpeg, png, etc. I can query the index with a word and one document type using must just fine.
$params = [
'index' => 'trial2',
'type' => '_doc',
'body' => [
'query' => [
'bool' => [
'must' => [
[ 'match' => [ 'file.extension' => "png" ] ],
[ 'match' => [ 'content' => "abc" ] ],
]
]
]
]
];
The challenge is that I would like to query the index still using must, but with an array of document types (not only png, but also jpeg, gif, svg, tiff), so that I can classify the result as an image. How do I replace png with an array so that at least one of the values must match?
If file.extension is of text type:
Just add more tokens to the match query; with the default operator, a document matches if any one of the tokens matches. Read up on analysis and analyzers to understand how this works internally.
POST my_png/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"file.extension": "jpg jpeg png" <---- Note this.
}
},
{
"match": {
"content": "abc"
}
}
]
}
}
}
If file.extension is of keyword type (recommended):
If the field is a keyword, or has a keyword sibling at file.extension.keyword, you can use a terms query:
POST my_png/_search
{
"query": {
"bool": {
"must": [
{
"terms": { <---- Terms Query
"file.extension.keyword": [ <---- Or the 'file.extension' field, whichever is of type `keyword`
"jpg",
"jpeg",
"png"
]
}
},
{
"match": {
"content": "abc"
}
}
]
}
}
}
Given your requirement, I think you should use the second option: for exact matches, you need a terms query on a keyword field.
Hope that helps!
You can use a "terms" query:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-terms-query.html
The terms query returns documents that contain one or more exact terms in a provided field.
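In PHP, the terms query from the answers above might be built like this. It is a sketch under a few assumptions: the helper name buildImageSearchParams is invented, the index name trial2 comes from the question, and a keyword sub-field file.extension.keyword is assumed to exist.

```php
<?php
// Hypothetical helper: builds search params that require the content to
// match $text and the extension to be any one of $extensions.
function buildImageSearchParams(string $text, array $extensions): array
{
    return [
        'index' => 'trial2',
        'body' => [
            'query' => [
                'bool' => [
                    'must' => [
                        // Assumes a keyword sub-field exists, as in the answer above.
                        ['terms' => ['file.extension.keyword' => array_values($extensions)]],
                        ['match' => ['content' => $text]],
                    ],
                ],
            ],
        ],
    ];
}

$params = buildImageSearchParams('abc', ['png', 'jpeg', 'gif', 'svg', 'tiff']);
// With a configured client (assumption): $results = $client->search($params);
echo json_encode($params['body']), PHP_EOL;
```

Both clauses stay inside must, so a document has to contain the word and have one of the listed extensions.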

How to ignore query constraints when the parameter is empty or null in elastic search

Elastic noob here. I have the following query for an Elasticsearch index, in which I try to filter records based on the title of the product record.
"query" => [
"bool" => [
"should" => [
[
"nested" => [
"path" => "name",
"query" => [
"multi_match" => [
"query" => (string) $query, // here the $query can be empty string
"fields" => ['name.en', 'name.ar'],
],
],
],
],
],
],
];
Now the parameter $query (see the comment in the sample code) can be an empty string. In that case I get zero results, obviously because I don't have any records with an empty title.
What I would like is to essentially ignore the query when the parameter is empty and get a default result set back.
I have more queries, such as category, tags, reviews, etc., so even when the name/title query is empty I should be able to filter based on the other queries. But right now, if the name part is empty, I get an empty result set.
Please post a comment if more info is needed.
Elasticsearch 7
In a match or multi_match query, an empty string will be ignored if you use the zero_terms_query option, like this:
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "",
"zero_terms_query": "all",
"fields": ["name.en", "name.ar"]
}
}
]
}
}
}
or match version:
{
"query": {
"bool": {
"must": [
{
"match": {
"name": {
"query": "",
"zero_terms_query": "all"
}
}
}
]
}
}
}
See the zero_terms_query option in the match query documentation.
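Applied to the PHP query from the question, the same option could be added as sketched below. The helper name buildNameClause is made up, and whether the option behaves the same inside the nested wrapper has not been verified here, so treat it as a starting point rather than a confirmed fix.

```php
<?php
// Hypothetical helper: the name clause from the question, with
// zero_terms_query set so an empty search string does not exclude everything.
function buildNameClause(string $query): array
{
    return [
        'nested' => [
            'path' => 'name',
            'query' => [
                'multi_match' => [
                    'query' => $query,
                    'zero_terms_query' => 'all',
                    'fields' => ['name.en', 'name.ar'],
                ],
            ],
        ],
    ];
}

$body = ['query' => ['bool' => ['must' => [buildNameClause('')]]]];
echo json_encode($body), PHP_EOL;
```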

Override/Update Variable/Array in php file

In my recent project I'm working on a console command that needs to run various actions described in JSON, following the standard Linux conventions:
what:mv(Move),type:file,dir
what:mkdir(Make Directory)
what:touch(Make File)
what:cp(Copy), type:file,dir
what:echo (Write into File), type:override,append
what:sed (Find and Replace in file)
The param schema would be almost identical to the Linux convention.
Current SetUp (Mkdir, touch)
Json Schema (Array)
"actions" => [
[
'what' => "mkdir",
'param' => [
'name' => "cache",
'in' => "bootstrap",
],
],
[
'what' => "touch",
'param' => [
'name' => ".gitignore",
'in' => "bootstrap/cache",
],
]
],
The command iterates through all actions, resolves an operation class per what type (mkdir, touch), such as MkdirOperation for mkdir, and calls its handle function.
<?php
use Symfony\Component\Filesystem\Filesystem;
use Symfony\Component\Filesystem\Exception\IOExceptionInterface;
class MkdirOperation extends Operation
{
const ATTR_IN = "in";
const ATTR_NAME = "name";
public function handle()
{
$path = $this->_path();
$this->oIO->comment($path);
if ($this->oFileSystem->isAbsolutePath($path)) {
try {
$this->oFileSystem->mkdir($path);
// Report success only when mkdir() did not throw.
$this->oIO->info("Directory created at: ".$path);
} catch (IOExceptionInterface $e) {
echo "An error occurred while creating your directory at "
.$e->getPath();
}
}
}
private function _path()
{
return $this->oConfig->getBaseDir()
.$this->aParam[self::ATTR_IN].DIRECTORY_SEPARATOR
.$this->aParam[self::ATTR_NAME]
.DIRECTORY_SEPARATOR;
}
}
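The dispatch step described above (resolving an operation class from the what key) might look roughly like the following. This is a guess at the mechanism, not the asker's code: the function resolveOperationClass and the naming rule "capitalized what plus Operation" are assumptions based on the MkdirOperation example.

```php
<?php
// Hypothetical resolver: maps the "what" key of an action to a class name,
// e.g. "mkdir" -> "MkdirOperation". The naming rule is an assumption.
function resolveOperationClass(string $what): string
{
    return ucfirst(strtolower($what)) . 'Operation';
}

$actions = [
    ['what' => 'mkdir', 'param' => ['name' => 'cache', 'in' => 'bootstrap']],
    ['what' => 'touch', 'param' => ['name' => '.gitignore', 'in' => 'bootstrap/cache']],
];

$classes = [];
foreach ($actions as $action) {
    // The real command would instantiate the class and call handle() here.
    $classes[] = resolveOperationClass($action['what']);
}
echo implode(', ', $classes), PHP_EOL; // MkdirOperation, TouchOperation
```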
Requirement:
//somefile.php
$path = "/var/www/ins/";
//someotherfile.php
return [
'files' => [
'Path\\To\\NameSpace',
'Path\\To\\NameSpace'
]
];
So, basically, I want to update/override the variable/array shown above according to specific rules. For that purpose, I tried to express the rules in a JSON schema:
"actions": [
{
"what": "sed",
"in": "path/to/somefile.php",
"find": {
"type": "variable",
"value": "path"
},
"replace": {
"type": "variable",
"value": "__DIR__.'/../vendor/compiled.php';"
}
},{
"what": "put",
"value": "Path\\To\\NameSpace",
"in": "path/to/someotherfile.php",
"find": {
"type": "array",
"at": "files"
}
}
]
The components I'm using:
symfony/console
symfony/finder
Symfony/filesystem
Looking for:
A suggestion for organizing the rule set schema so that I can iterate through all actions to update/override a variable, or push/pull an element from an array, and perform the action.
A mechanism to update the value of a specific variable and to push/pull an element from an array/subarray using PHP.
If something is still unclear, let me know.
Thanks in advance.

elasticsearch php search exists

How might one do the following request
GET /giata_index/giata_type/_search/exists
{
"query": {
"bool": {
"must": [
{
"term": {
"status": 2
}
},
{
"term": {
"ids": "26744"
}
}
]
}
}
}
with ElasticSearch's PHP library?
I have played around with the exists endpoint, but as it turns out, it can only check whether a specific uid exists. So I guess I need to do a search. But I can't find a parameter in the search endpoint's whitelist that would allow a simple check for existence.
The reason I would like to avoid fetching the entire document, and just ask whether it exists, is that I have several hundred thousand imports and just as many documents in ES, so I would like it to do as little work as possible.
Note: I have also looked into head requests that are possible via HTTP requests (only retrieve the header of a document - either 200 or 404). But that would probably only exist for requests via HTTP.
If worst comes to worst, I could fire off a curl call from PHP and simply do it via HTTP. But I would prefer to avoid that.
It does seem that there is no search exists endpoint, but I think you can use a simple alternative:
Use an empty "fields" array and count the results of your query. If the count is 0, the answer is false; if it is greater than 0, true.
GET /giata_index/giata_type/_search
{
"fields": [],
"query": {
"bool": {
"must": [
{
"term": {
"status": 2
}
},
{
"term": {
"ids": "26744"
}
}
]
}
}
}
Another alternative is to use _count: https://www.elastic.co/guide/en/elasticsearch/reference/1.6/search-count.html
It should be possible with the latest 2.x version.
A code sample could look something like this:
$clientBuilder = Elasticsearch\ClientBuilder::create();
// Additional client options, hosts, etc.
$client = $clientBuilder->build();
$index = 'your_index';
$type = 'your_type';
$params = [
'index' => $index,
'type' => $type,
'body' => [
'query' => [
'bool' => [
'must' => [
[
'term' => [
"status" => 2
]
],
[
'term' => [
'ids' => "26744"
]
]
]
]
]
]
];
try {
$client->searchExists($params);
} catch (Exception $e) {
// Not found. You might want to return FALSE if wrapped in a function.
// return FALSE;
}
// Found.
It is worth noting that if the search is not wrapped in a try/catch block, it can throw an exception and break execution (status code 4xx if not found).
Also, it cannot be used effectively in future mode.
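The _count alternative mentioned above could be wrapped roughly as follows. This is a sketch with invented helper names (buildCountParams, hasMatches); it assumes the client's count() call returns an array with a count key, and it uses a stand-in response so the snippet runs without a cluster.

```php
<?php
// Hypothetical helper: builds the params for an existence check via _count.
function buildCountParams(string $index, string $type, int $status, string $id): array
{
    return [
        'index' => $index,
        'type' => $type,
        'body' => [
            'query' => [
                'bool' => [
                    'must' => [
                        ['term' => ['status' => $status]],
                        ['term' => ['ids' => $id]],
                    ],
                ],
            ],
        ],
    ];
}

// Interprets a _count response such as ['count' => 3, '_shards' => [...]].
function hasMatches(array $countResponse): bool
{
    return ($countResponse['count'] ?? 0) > 0;
}

$params = buildCountParams('giata_index', 'giata_type', 2, '26744');
// With a configured client (assumption): $response = $client->count($params);
$response = ['count' => 1]; // stand-in response for illustration
var_dump(hasMatches($response)); // bool(true)
```

Unlike the try/catch approach, a zero count is a normal response here, so no exception handling is needed for the "not found" case.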

Is it possible to only use filters without text search in elasticsearch

I am using ES for my Laravel app, and I need to run a search query that contains only filters and no "text search", but I am not sure how to write it.
Must I use match_all, e.g.:
$query = [
'filtered' => [
'query' => [
'match_all' => []
],
'filter'=> [
'bool' => [
'must' => [
[ 'range' => [
'price' => [
'lte' => 9000
]
]
],
],
]
],
],
];
Or like this:
$query = [
'filtered' => [
'filter'=> [
'bool' => [
'must' => [
[ 'range' => [
'price' => [
'lte' => 9000
]
]
],
],
]
],
],
];
What I want is to only use a filtered bool query without text search.
In fact, if you don't specify the query part in your filtered query, a match_all query is used by default. Quoting the docs:
If a query is not specified, it defaults to the match_all query. This
means that the filtered query can be used to wrap just a filter, so
that it can be used wherever a query is expected.
Your second query should do the job: filters must be wrapped in either a filtered or a constant_score query to be used.
If the scoring part isn't useful for you, you can stick to the filtered query.
One last thing: you don't have to nest your filter in a bool filter unless you want to combine it with other filters. In your example, you can write directly:
$query = [
'filtered' => [
'filter'=> [
'range' => [
'price' => [
'lte' => 9000
]
]
]
]
];
Hope this will be helpful :)
It's actually exactly the same thing, since a filtered query without a query clause defaults to match_all.
While in query context, if you need to use a filter without a query (for instance, to match all emails in the inbox), you can just omit the query:
GET /_search
{
"query": {
"filtered": {
"filter": { "term": { "folder": "inbox" }}
}
}
}
If a query is not specified it defaults to using the match_all query, so the preceding query is equivalent to the following:
GET /_search
{
"query": {
"filtered": {
"query": { "match_all": {}},
"filter": { "term": { "folder": "inbox" }}
}
}
}
Check here the official documentation: http://www.elastic.co/guide/en/elasticsearch/guide/current/_combining_queries_with_filters.html
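The answers above target the old filtered query, which was deprecated in Elasticsearch 2.0 and removed in 5.0. On newer versions, a filter-only search is usually written as a bool query with a filter clause. A minimal sketch, with an invented helper name filterOnlyQuery:

```php
<?php
// Hypothetical helper: wraps any filter clause in a bool query's
// non-scoring "filter" context.
function filterOnlyQuery(array $filter): array
{
    return ['bool' => ['filter' => [$filter]]];
}

$query = filterOnlyQuery(['range' => ['price' => ['lte' => 9000]]]);
echo json_encode(['query' => $query]), PHP_EOL;
```

As with the filtered form, no match_all clause is needed when the bool query contains only filters.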
