Elasticsearch and Laravel scout-elasticsearch-driver timestamps malformed error - php

I have successfully configured ES and the babenkoivan/scout-elasticsearch-driver, but run into this error when adding new entries to the DB:
{"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"failed to parse [updated_at.raw]"}],"type":"mapper_parsing_exception","reason":"failed to parse [updated_at.raw]","caused_by":{"type":"illegal_argument_exception","reason":"Invalid format: \"2018-07-13 07:52:02\" is malformed at \" 07:52:02\""}},"status":400}
I have set the format in the mapping like this, and according to the ES docs this format should work:
protected $mapping = [
'properties' => [
'created_at' => [
'type' => 'date',
'format' => 'yyyy-MM-DD HH:mm:ss',
'fields' => [
'raw' => [
'type' => 'date',
'index' => 'not_analyzed'
]
]
],
'updated_at' => [
'type' => 'date',
'format' => 'yyyy-MM-DD HH:mm:ss',
'fields' => [
'raw' => [
'type' => 'date',
'index' => 'not_analyzed'
]
]
]
]
];
https://www.elastic.co/guide/en/elasticsearch/reference/current/date.html#multiple-date-formats
Is there something I'm missing here?

Your mapping defines a custom date format (yyyy-MM-DD HH:mm:ss) for created_at and updated_at. The raw sub-fields are also of type date, but they declare no format, so they fall back to the default (strict_date_optional_time||epoch_millis according to the docs, i.e. ISO 8601 such as yyyy-MM-dd'T'HH:mm:ss).
This means that the former expects 2018-07-13 07:52:02 while the latter expects 2018-07-13T07:52:02, so no single value can satisfy both and indexing is bound to fail on one of them.
Multi-fields are meant to index the same value in different ways, but here raw is just a second date field with essentially the same properties as the base field (apart from the inconsistent format, of course).
So, in my opinion, your options are:
If you don't have any specific use for raw, remove it from the mapping. Sorting and matching work well with the base field:
"created_at": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss"}
If you need to keep the original string (as the name "raw" may suggest), use a keyword type:
"created_at": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss", "fields": {"raw": {"type": "keyword"}}}
If you really need raw to be a date, give it a format that is consistent with the base field:
"created_at": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss", "fields": {"raw": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss"}}}
Note that these examples use dd (day of month). DD, as written in your mapping, means day of year, which is almost certainly not what you want.
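As a rough sketch of the second option in PHP, the mapping could be built from one shared field definition so the two timestamp fields cannot drift apart. The variable names here ($dateFormat, $dateField) are illustrative only; in the model this array would be assigned to the driver's $mapping property as in the question.

```php
<?php
// Assumption: both timestamp fields share one pattern; dd is day of month.
$dateFormat = 'yyyy-MM-dd HH:mm:ss';

// Option 2 from the answer: a date base field plus a keyword sub-field.
$dateField = [
    'type' => 'date',
    'format' => $dateFormat,
    'fields' => [
        'raw' => ['type' => 'keyword'],
    ],
];

$mapping = [
    'properties' => [
        'created_at' => $dateField,
        'updated_at' => $dateField,
    ],
];

// PHP's 'Y-m-d H:i:s' produces strings in the shape the pattern above expects.
$sample = gmdate('Y-m-d H:i:s', 0);
echo $sample, PHP_EOL; // 1970-01-01 00:00:00
echo json_encode($mapping, JSON_PRETTY_PRINT), PHP_EOL;
```

Keeping the format in a single variable means any later change applies to both fields at once.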

Related

Malformed UTF-8 characters, possibly incorrectly encoded Laravel

Laravel 8
I have seen a few of these questions, but the answers are either missing, not for PHP, or some weird hack.
I have a table in a MariaDB database with a field of type LONGTEXT, which is what MariaDB uses for a JSON field.
In my model I do:
protected $casts = [
'event_data' => 'array',
];
public function setEventDataAttribute($value) {
$this->attributes['event_data'] = json_encode($value);
}
The data going into the field is:
array:12 [
"start" => "2022-08-23T00:00:00+00:00"
"end" => "2022-08-23T00:00:00+00:00"
"all_day" => false
"unassigned" => true
"draft" => true
"title" => "ggggg"
"notes" => "test"
"active" => true
"schedule_calendar_id" => null
"jobcode_id" => 122723308
"customfields" => array:2 [
1782352 => "Dirty"
1782354 => "Vacant"
]
"color" => "#888888"
]
When I run json_encode($value) where $value is the above array, I get:
{
"start": "2022-08-23T00:00:00+00:00",
"end": "2022-08-23T00:00:00+00:00",
"all_day": false,
"unassigned": true,
"draft": true,
"title": "ggggg",
"notes": "test",
"active": true,
"schedule_calendar_id": null,
"jobcode_id": 122723308,
"customfields": {
"1782352": "Dirty",
"1782354": "Vacant"
},
"color": "#888888"
}
According to every validator out there, this is valid JSON. However, attempting to set it as the attribute on the field throws:
Malformed UTF-8 characters, possibly incorrectly encoded
Just above the $this->attributes['event_data'] line, I can do:
dump(json_encode($value), json_decode(json_encode($value)));
and get the JSON object listed above plus a stdClass object of the decoded JSON.
So my question is:
If the online JSON formatters say this is valid JSON, and PHP has no issue encoding and decoding it, why can't Laravel insert it? Is it the dates? They must be in ISO 8601 format.
What is going on? I have done JSON encoding like this a thousand times with no issue.
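The message quoted above is the text that PHP's json_last_error_msg() returns for JSON_ERROR_UTF8, so one way to narrow this down is to check every string in the payload for invalid UTF-8 bytes. The sketch below is only a diagnostic under that assumption: findInvalidUtf8 is a hypothetical helper (not part of Laravel), it relies on the mbstring extension, and the sample payload with a stray \xE9 byte is made up for illustration.

```php
<?php
// Hypothetical helper: returns the dotted paths of all strings
// in a nested array that are not valid UTF-8.
function findInvalidUtf8(array $data, string $path = ''): array
{
    $bad = [];
    foreach ($data as $key => $value) {
        $current = $path === '' ? (string) $key : $path . '.' . $key;
        if (is_array($value)) {
            $bad = array_merge($bad, findInvalidUtf8($value, $current));
        } elseif (is_string($value) && !mb_check_encoding($value, 'UTF-8')) {
            $bad[] = $current;
        }
    }
    return $bad;
}

// Made-up payload: "\xE9" is a Latin-1 byte, not a valid UTF-8 sequence.
$payload = [
    'title' => 'ggggg',
    'customfields' => [1782352 => "Dirty\xE9"],
];

$encoded = json_encode($payload);   // false, because encoding fails
$message = json_last_error_msg();   // the UTF-8 error text
$badPaths = findInvalidUtf8($payload);

var_dump($encoded, $message, $badPaths);
```

If this reports nothing for the data being saved, the failing encode is probably happening elsewhere, for example when a response containing other fields is serialized.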

(Elasticsearch) Convert Unix formatted data into timestamp (without changing the mappings)

We're executing an Elasticsearch query like this using PHP API:
$params = [
//please ignore the variables below,
//we made it in dynamic parameter-based in our function,
//that's why they're variables
'index' => $ourIndex,
'type' => $ourType,
'from' => $from,
'size' => $page_size,
'body' => [
"query" => [
'bool' => [
'must' => [
[
"query_string" => [
"default_field" => $content,
"query" => "$keywords"
]
],
[
"range" => [
"#timestamp" => [
"from" => $parseParams['pub_date_start'],
"to" => $parseParams['pub_date_end'],
'format' => "yy-MMM-dd'T'HH:mm:ss.SSS'Z'",
]
]
]
]
]
]
]
];
The query above works with our #timestamp field because its type is date:
"#timestamp" : {
"type" : "date"
}
And a sample value is like this:
"#timestamp" : "2019-06-17T16:53:55.778Z"
However, we want to target the pub_date field in our index, which is mapped as long:
"pub_date" : {
"type" : "long"
},
so its values look like this when we display the documents:
"pub_date" : 1510358400
When we changed the query above to target pub_date instead of #timestamp, it returned an error.
Tried Solutions
I tried to add an additional format epoch_millis in the format property:
[
"range" => [
"pub_date" => [
"from" => $parseParams['pub_date_start'],
"to" => $parseParams['pub_date_end'],
'format' => "yyyy-MM-dd||yy-MMM-dd'T'HH:mm:ss.SSS'Z'||epoch_millis",
]
]
]
But it still fails.
Main Question
I suspect that the range query can't interpret the Unix-formatted values, and that this is why the query fails. Is there a workaround for this that does not require changing the mappings of the index?
The other solutions we found suggest changing the mapping, but we already have around 25 million documents in the index, so we thought converting the values in PHP would be a nicer approach.
Since the field is of type long and stores a Unix timestamp in seconds, simply convert the dates in $parseParams['pub_date_start'] and $parseParams['pub_date_end'] to Unix timestamps with strtotime and drop the format parameter. Update the range query as below:
"range" => [
"pub_date" => [
"from" => strtotime($parseParams['pub_date_start']),
"to" => strtotime($parseParams['pub_date_end']),
]
]
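Because strtotime interprets a date without an explicit offset in the server's default timezone, it may be safer to pin the timezone during the conversion. The following is a minimal sketch under the assumption that the stored pub_date values are UTC epoch seconds; toEpochSeconds is a hypothetical helper, and gte/lte are used as the range bounds.

```php
<?php
// Hypothetical helper: parse a date string as UTC and return epoch seconds.
// If the string carries its own offset, that offset takes precedence.
function toEpochSeconds(string $date): int
{
    $dt = new DateTimeImmutable($date, new DateTimeZone('UTC'));
    return $dt->getTimestamp();
}

// Example inputs standing in for $parseParams['pub_date_start'] / ['pub_date_end'].
$start = '2017-11-11';
$end = '2017-11-12';

$range = [
    'range' => [
        'pub_date' => [
            'gte' => toEpochSeconds($start),
            'lte' => toEpochSeconds($end),
        ],
    ],
];

echo json_encode($range), PHP_EOL;
```

The first bound evaluates to 1510358400, the sample pub_date value shown in the question.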

How can I set a dynamic date for a JSON value?

I'm trying to set up a POST request to a file on my website with some data in JSON. One of the "value" entries needs to be a dynamic date, so that when the request is made the current date is saved in a custom field. How can I set up the code under "value" so the date is generated?
I have tried things like "variable":{ new Date(), } but can't make it work: it always ends up displaying the code instead of a date, or the JSON fails validation if I add [ or {, etc.
{
"version": "v2",
"content": {
"messages": [],
"actions": [{
"action": "set_field_value",
"field_name": "bday_reg_date",
"value": "2019-06-22"
}, {
"action": "set_field_value",
"field_name": "bday_exp_date",
"value": "2019-06-29"
}
]
}
}
That's the code I'm basing mine on. Every time I access the file (apparently a .php file), a new date should be generated in "value" and sent to a custom field called reg_date on another platform. What is the correct way to get that dynamic value?
Thanks.
If you are trying to create a new JSON object in PHP, you have to create an array first and then use the json_encode($arr) function to convert it to JSON.
https://www.php.net/manual/en/function.json-encode.php
For example:
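<?php
// Build both dates at request time so every call gets fresh values.
$dateNow = new DateTime('now');
$dateNextWeek = new DateTime('now');
$dateNextWeek->modify('+1 week');
$arr = [
    'version' => 'v2',
    'content' => [
        'messages' => [],
        'actions' => [
            [
                'action' => 'set_field_value',
                'field_name' => 'bday_reg_date',
                // Y-m-d matches the "2019-06-22" style used in the question.
                'value' => $dateNow->format('Y-m-d'),
            ],
            [
                'action' => 'set_field_value',
                'field_name' => 'bday_exp_date',
                'value' => $dateNextWeek->format('Y-m-d'),
            ],
        ]
    ]
];
// Send the JSON as the response body; a top-level return would output nothing.
header('Content-Type: application/json');
echo json_encode($arr);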

Elasticsearch conflicts with existing mapping in other types

I'm very new to Elasticsearch. I'm implementing it in my Laravel project with the Elasticsearch Scout Driver, but I get an error when inserting a model object into the index.
The model is a Post, which looks like this:
>>> $post = App\Models\Post::first();
=> App\Models\Post {#853
id: 1,
title: "First Post",
description: "My first post",
created_at: "2017-09-13 13:31:51",
updated_at: "2018-02-16 16:23:44",
deleted_at: null,
}
Inside the model class I declare the mapping options:
// Map elements to be saved in Elasticsearch
protected $mapping = [
'properties' => [
'id' => [
'type' => 'integer',
'index' => false
],
'title' => [
'type' => 'text'
],
'description' => [
'type' => 'text'
],
'created_at' => [
'type' => 'date',
'ignore_malformed' => true,
'format' => "yyyy-MM-dd HH:mm:ss"
],
'updated_at' => [
'type' => 'date',
'ignore_malformed' => true,
'format' => "yyyy-MM-dd HH:mm:ss",
],
'deleted_at' => [
'type' => 'date',
'ignore_malformed' => true,
'format' => "yyyy-MM-dd HH:mm:ss",
]
]
];
Every time I call $post->searchable(); to put my model into my Elasticsearch index, I get this error:
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "Mapper for [deleted_at] conflicts with existing mapping in other types:\n[mapper [deleted_at] has different [format] values]"
}
],
"type": "illegal_argument_exception",
"reason": "Mapper for [deleted_at] conflicts with existing mapping in other types:\n[mapper [deleted_at] has different [format] values]"
},
"status": 400
}
I'm guessing that the problem is the null value of the deleted_at property.
I need deleted_at == null because I manage soft deletion with Laravel: any other value marks the record as soft-deleted, so Laravel will not return it from queries.
As you can see, I tried setting ignore_malformed => true, but it doesn't work for me.
I also tried adding the option null_value => NULL, without success.
Where am I going wrong?
How can I insert posts into my Elasticsearch index with the deleted_at attribute set either to null or to a date in the format yyyy-MM-dd HH:mm:ss?
Thanks
PS: I'm using Elasticsearch Version 6.1.2.
Before version 6, an index could contain multiple types (indices created in version 6 are limited to a single type, mainly for this reason). The problem with multiple types is that they cannot use the same field name with different mappings. This has to do with the way fields are stored in Lucene.
Could it be that you are inserting documents into two different types, maybe by accident (a typo in the type name while ingesting documents, for instance)? Dynamic mapping might then try to create the field with a different definition, say as a string, which would cause the exception you mention.

Is it possible to only use filters without text search in elasticsearch

I am using ES for my Laravel app, and I need to run a search query that contains only filters and no "text search", but I am not sure how to write it.
Must I use match_all, e.g.:
$query = [
'filtered' => [
'query' => [
'match_all' => []
],
'filter'=> [
'bool' => [
'must' => [
[ 'range' => [
'price' => [
'lte' => 9000
]
]
],
],
]
],
],
];
Or like this:
$query = [
'filtered' => [
'filter'=> [
'bool' => [
'must' => [
[ 'range' => [
'price' => [
'lte' => 9000
]
]
],
],
]
],
],
];
What I want is to only use a filtered bool query without text search.
In fact, if you don't specify the query part in your filtered query, a match_all query is used by default. Quoting the docs:
If a query is not specified, it defaults to the match_all query. This
means that the filtered query can be used to wrap just a filter, so
that it can be used wherever a query is expected.
Your second query should do the job: filters must be wrapped in either a filtered or a constant_score query to be used.
If scoring isn't useful to you, you can stick with the filtered query.
One last thing: you don't have to nest your filter in a bool filter unless you want to combine it with other filters. In your demo case, you can write directly:
$query = [
'filtered' => [
'filter'=> [
'range' => [
'price' => [
'lte' => 9000
]
]
]
]
];
Hope this will be helpful :)
The two are actually exactly the same, since a filtered query with no query clause defaults to match_all.
In query context, if you need to use a filter without a query (for instance, to match all emails in the inbox), you can just omit the query:
GET /_search
{
"query": {
"filtered": {
"filter": { "term": { "folder": "inbox" }}
}
}
}
If a query is not specified it defaults to using the match_all query, so the preceding query is equivalent to the following:
GET /_search
{
"query": {
"filtered": {
"query": { "match_all": {}},
"filter": { "term": { "folder": "inbox" }}
}
}
}
Check here the official documentation: http://www.elastic.co/guide/en/elasticsearch/guide/current/_combining_queries_with_filters.html
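Note that the filtered query was deprecated in Elasticsearch 2.0 and removed in 5.0. On those versions the same filter-only search is written as a bool query with a filter clause. A minimal sketch of that form for the price example, with the array layout assumed to be passed as the request body:

```php
<?php
// Filter-only search for Elasticsearch 2.x and later:
// a bool query whose filter clause holds the conditions, with no scoring query.
$query = [
    'bool' => [
        'filter' => [
            ['range' => ['price' => ['lte' => 9000]]],
        ],
    ],
];

$body = json_encode(['query' => $query]);
echo $body, PHP_EOL;
// {"query":{"bool":{"filter":[{"range":{"price":{"lte":9000}}}]}}}
```

Clauses under filter do not contribute to the score, which matches the intent of searching without any text query.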
