Understanding Elasticsearch routing - php

I am trying to use the Elasticsearch routing mapping to speed up some queries, but I am not getting the expected result set (I'm not worried about query performance just yet).
I am using Elastica to set up my mapping:
$index->create(array(
    'number_of_shards' => 4,
    'number_of_replicas' => 1,
    'mappings' => array(
        "country" => array("_routing" => array("path" => "countrycode"))
    ),
    'analysis' => array(
        'analyzer' => array(
            'indexAnalyzer' => array(
                'type' => 'keyword',
                'tokenizer' => 'nGram',
                'filter' => array('shingle')
            ),
            'searchAnalyzer' => array(
                'type' => 'keyword',
                'tokenizer' => 'nGram',
                'filter' => array('shingle')
            )
        )
    )
), true);
If I understand correctly, what should happen is that each result should now have a field called "countrycode" with the value of "country" in it.
The results of _mapping look like this:
{
  "postcode": {
    "postcode": {
      "properties": {
        "area1": {"type": "string"},
        "area2": {"type": "string"},
        "city": {"type": "string", "include_in_all": true},
        "country": {"type": "string"},
        "country_iso": {"type": "string"},
        "country_name": {"type": "string"},
        "id": {"type": "string"},
        "lat": {"type": "string"},
        "lng": {"type": "string"},
        "location": {"type": "geo_point"},
        "region1": {"type": "string"},
        "region2": {"type": "string"},
        "region3": {"type": "string"},
        "region4": {"type": "string"},
        "state_abr": {"type": "string"},
        "zip": {"type": "string", "include_in_all": true}
      }
    },
    "country": {
      "_routing": {"path": "countrycode"},
      "properties": {}
    }
  }
}
Once all the data is in the index, if I run this query:
http://localhost:9200/postcode/_search?pretty=true&q=country:au
it responds with 15740 total items.
What I was expecting is that if I run the query like this:
http://localhost:9200/postcode/_search?routing=au&pretty=true
it would also respond with 15740 results.
Instead it returns 120617 results, including results where country != au.
I did note that the number of shards in the results went from 4 to 1, so something is working.
I was also expecting the result set to contain an item called "countrycode" (from the routing mapping), but there isn't one.
So at this point I thought my understanding of routing was wrong. Perhaps all routing does is tell Elasticsearch which shard to look in, but not what to look for? In other words, if other country codes happen to land in that particular shard, will queries written this way just bring back all records in that shard?
So I tried the query again, this time adding the country to the query:
http://localhost:9200/postcode/_search?routing=AU&pretty=true&q=country:AU
I thought this would force the query into giving me just the AU place names, but this time it gave me only 3936 results.
So I am not quite sure what I have done wrong. The examples I have read show the queries changing from needing a filter to just using match_all{}, which I would have thought would only bring back documents matching the au country code.
Thanks for your help in getting this to work correctly.
I almost have this working. It now gives me the correct number of results from a single shard. However, the index creation is not working quite right: it ignores my number_of_shards setting, and possibly other settings too.
$index = $client->getIndex($indexname);
$index->create(array(
    'mappings' => array(
        "$indexname" => array("_routing" => array("required" => true))
    ),
    'number_of_shards' => 6,
    'number_of_replicas' => 1,
    'analysis' => array(
        'analyzer' => array(
            'indexAnalyzer' => array(
                'type' => 'keyword',
                'tokenizer' => 'nGram',
                'filter' => array('shingle')
            ),
            'searchAnalyzer' => array(
                'type' => 'keyword',
                'tokenizer' => 'nGram',
                'filter' => array('shingle')
            )
        )
    )
), true);

I can at least help you with more info on where to look:
http://localhost:9200/postcode/_search?routing=au&pretty=true
That query does indeed translate into "give me all documents on the shard where documents for country:AU should be sent."
Routing is just that: routing. It doesn't filter your results for you.
Also, I noticed you're mixing "au" and "AU", which might mix things up too.
You should try setting required to true on your _routing element, to make sure that your documents are actually stored with routing information when they are indexed.
Actually, to make sure your documents are indexed with proper routing, explicitly set the routing value to lowercase(countrycode) when indexing documents. See if that helps.
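To illustrate the two points above, here is a minimal sketch in plain PHP (no client library) that builds a search URL combining routing with an explicit query term, and lowercases the country code so the same value is used every time. The function name buildRoutedSearchUrl is made up for this example; the host, index and field names are the ones from the question.

```php
<?php
// Hypothetical helper: routing picks the shard, the q term picks the documents.
function buildRoutedSearchUrl(string $baseUrl, string $index, string $countryCode): string
{
    // Normalise the case so "AU" and "au" always produce the same routing value.
    $code = strtolower($countryCode);

    $params = http_build_query([
        'routing' => $code,             // which shard to search
        'q'       => 'country:' . $code, // which documents to match on that shard
        'pretty'  => 'true',
    ]);

    return rtrim($baseUrl, '/') . '/' . rawurlencode($index) . '/_search?' . $params;
}

echo buildRoutedSearchUrl('http://localhost:9200', 'postcode', 'AU'), PHP_EOL;
// prints: http://localhost:9200/postcode/_search?routing=au&q=country%3Aau&pretty=true
```

This only helps if the documents were indexed with the same lowercased routing value, since a differently cased routing value may point at a different shard.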
For more information try reading this blog post:
http://www.elasticsearch.org/blog/customizing-your-document-routing/
Hope this helps :)

Related

Creating campaign for dynamic TextMerge segment fails

I'm trying to send a campaign to a dynamic list segment based on a custom numeric merge field (GMT_OFFSET, in this case) but the code below yields the following error from the MailChimp API:
"errors" => [
0 => [
"field" => "recipients.segment_opts.conditions.item:0"
"message" => "Data did not match any of the schemas described in anyOf."
]
]
My code, using drewm/mailchimp-api 2.4:
$campaign = $mc->post('campaigns', [
    'recipients' => [
        'list_id' => config('services.mailchimp.list_id'),
        'segment_opts' => [
            'conditions' => [
                [
                    'condition_type' => 'TextMerge',
                    'field' => 'GMT_OFFSET',
                    'op' => 'is',
                    'value' => 2,
                ],
            ],
            'match' => 'all',
        ],
    ],
    // Cut for brevity
]);
If I am to take the field description literally (see below), the TextMerge condition type only works on merge0 or EMAIL fields, which is ridiculous considering the Segment Type title says it is a "Text or Number Merge Field Segment". However, other people have reported the condition does work when applied exclusively to the EMAIL field. (API Reference)
I found this issue posted but unresolved on both DrewM's git repo (here) and SO (here) from January 2017. Hoping somebody has figured this out by now, or found a way around it.
Solved it! I passed an integer value which seemed to make sense given that my GMT_OFFSET merge field was of a Number type. MailChimp support said this probably caused the error and suggested I send a string instead. Works like a charm now.
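For anyone hitting the same error, here is a minimal sketch of the fix described above: cast the condition value to a string before sending it. The helper textMergeCondition is hypothetical (it is not part of drewm/mailchimp-api); it only builds the array that goes into segment_opts.conditions.

```php
<?php
// Hypothetical helper: builds one TextMerge condition and always sends the value as a string.
function textMergeCondition(string $field, string $op, $value): array
{
    return [
        'condition_type' => 'TextMerge',
        'field'          => $field,
        'op'             => $op,
        'value'          => (string) $value, // 2 becomes "2"
    ];
}

$condition = textMergeCondition('GMT_OFFSET', 'is', 2);
var_dump($condition['value']); // string(1) "2"
```

The result can be dropped into the 'conditions' array of the campaign request in place of the hand-written condition.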

PHP::MongoCollection->aggregate() Failures

I have a MongoDB query that is verified as 100% working ( Using MongoHub I have connected to the Replica Set and run the query and received results ), but when converting this query to PHP and attempting to run it through MongoCollection->aggregate(), I fail to get a return/result of any kind whatsoever ... not even an error.
Here is the query, as put into a PHP Array ( as MongoCollection requires ):
$query = array(
    '$match' => array(
        '$and' => array(
            'make' => $props[0],
            'model' => $props[1],
            'makeYear' => (integer)$props[2],
            'status' => 'Active'
        )
    ),
    '$group' => array(
        '_id' => null,
        'marketTotal' => array('$sum' => '$price'),
        'count' => array('$sum' => 1)
    )
);
The code to run the query is a simple one-liner calling aggregate.
As I don't get errors ... or a log showing any sort of error ... I'm kind of at a total loss here. Is anyone familiar with using PHP w/ MongoDB able to see what I might be doing wrong?
Turns out I was simply missing a layer of arrays: each condition inside the '$and' array needs to be wrapped in its own array, so array('make' => $props[0]), etc. That made it work.
Fun stuff. MongoDB queries are easy. Translating them into PHP-compatible arrays is apparently very difficult and requires a lot of guesswork, because it's not 1-to-1.
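A sketch of what the corrected structure might look like, with each '$and' condition wrapped in its own array as described above. It also puts '$match' and '$group' into separate pipeline stages, which is how an aggregation pipeline is normally written; that part is an assumption and not something the fix above mentions. The function name buildMarketPipeline and the sample values are made up for illustration.

```php
<?php
// Hypothetical helper: builds the pipeline as a list of stages.
function buildMarketPipeline(array $props): array
{
    return [
        ['$match' => [
            // '$and' takes a list of conditions, so each condition is its own array.
            '$and' => [
                ['make' => $props[0]],
                ['model' => $props[1]],
                ['makeYear' => (int) $props[2]],
                ['status' => 'Active'],
            ],
        ]],
        ['$group' => [
            '_id' => null,
            'marketTotal' => ['$sum' => '$price'],
            'count' => ['$sum' => 1],
        ]],
    ];
}

// Sample values, for illustration only.
$pipeline = buildMarketPipeline(['Ford', 'Focus', '2012']);

// Encoding to JSON makes it easy to compare against the query that works in the shell.
echo json_encode($pipeline), PHP_EOL;
```

Printing the JSON form is a cheap way to check that PHP lists and maps ended up where the shell query has [] and {}.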

Elasticsearch Completion

I have an Elasticsearch index which I update every 10 minutes via a cron job. In this index I have a completion field, which works as expected.
But I have one little problem. Let's say I have an "article" field where I change a value from "a" to "b". After 10 minutes the index has been updated, and the document which held article "a" now holds article "b". Everything as expected.
But my completion field now holds both values, "a" and "b", both with the same id.
How can this happen?
Mapping:
'suggest' => array(
    'type' => 'completion',
    'payloads' => true,
    'preserve_separators' => false,
    'search_analyzer' => 'standard',
    'index_analyzer' => 'standard'
),
How I set the field:
'suggest' => array(
    'input' => array(
        $result["Name"],
        $result["Name"],
        $result["Name2"],
        $result["Name3"],
        $result["Name4"],
        $result["Name5"]
    ),
    'output' => $result["Name"].' (' . $result["Name1"].', '.$result["Name2"].')',
    'payload' => array(
        'id' => $result["ID"]
    )
)
Found the answer in the docs.
The suggest data structure might not reflect deletes on documents immediately. You may need to do an Optimize for that. You can call optimize with the only_expunge_deletes=true to only cater for deletes or alternatively call a Merge operation.
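As a rough sketch of what that call could look like, the snippet below builds the URL for an optimize request with only_expunge_deletes=true, to be sent as a POST after each update run. The helper name buildExpungeDeletesUrl and the index name "articles" are made up for this example. The _optimize endpoint is the one from the Elasticsearch versions this mapping applies to; later versions renamed it to _forcemerge.

```php
<?php
// Hypothetical helper: URL for an optimize call that only expunges deleted documents.
function buildExpungeDeletesUrl(string $baseUrl, string $index): string
{
    return rtrim($baseUrl, '/') . '/' . rawurlencode($index)
        . '/_optimize?' . http_build_query(['only_expunge_deletes' => 'true']);
}

// "articles" is a placeholder index name.
echo buildExpungeDeletesUrl('http://localhost:9200', 'articles'), PHP_EOL;
// prints: http://localhost:9200/articles/_optimize?only_expunge_deletes=true
```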

Query caching not working in Yii framework

I have a table containing more than 27,000 records, and I want to load all of them into a dropdown list. I've implemented caching for that, but it doesn't seem to be working: the page is very slow and sometimes comes up blank (sometimes the browser hangs).
Following is my code (I am using yiiboilerplate):
Configuration of backend/config/main.php in component array:
'cache' => array(
    //'class' => 'system.caching.CMemCache',
    'class' => 'system.caching.CDbCache',
    'connectionID' => 'db',
),
In View page:
$dependency = new CDbCacheDependency('SELECT MAX(bank_id) FROM bank');
$bank = CHtml::listData(
    Bank::model()->cache(1000, $dependency)->findAll('is_active=1', array('order' => 'name')),
    'bank_id',
    'concatened'
);
echo $form->dropDownListRow($model, 'bank_id', $bank, array(
    'empty' => 'Select'
));
I don't think 27,000 records is a lot of data, but it is still very slow, and I want to use caching across my entire application.
Is my configuration correct? Where am I going wrong?
Thanks
I think the parameters you pass to findAll() are incorrect. Its second argument is for bound query parameters, so options such as 'order' belong in a criteria array passed as the first argument.
It should be:
Bank::model()
    ->cache(1000, $dependency)
    ->findAll([
        'select' => 'bank_id',
        'order' => 'name ASC', // if it is in ascending order
        'condition' => 'is_active = 1'
    ]);
I don't know what concatened is, so I just ignored it. But you can always use scopes for your conditions.

How are filters applied in Elastic Search?

In ES, are filters applied before the query?
Say, for example, I am doing a really slow fuzzy search, but only on a small date range. Here is an example (PHP):
$res = $client->search(array('index' => 'main', 'body' => array(
    'query' => array(
        'bool' => array(
            'should' => array(
                array('wildcard' => array('title' => '*123*')),
            )
        )
    ),
    'filter' => array(
        'and' => array(
            array('range' => array('created' => array(
                'gte' => date('c', time() - 3600),
                'lte' => date('c', time() + 3600)
            )))
        )
    ),
    'sort' => array()
)));
Will the filter be applied before trying that slower search?
Logic would dictate that filters are run and then the query but I would like to be sure.
If you use the filtered query, the filters are applied before documents are scored.
This will generally speed things up quite a lot. However, the fuzzy query will still use the input to build a larger query, regardless of the filters.
When you use filter directly on the search object, the query first runs without respecting the filter, and documents are then filtered out of the hits, whereas facets remain unfiltered.
Therefore, you should almost always use the filtered-query, at least when you are not using facets.
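As a rough sketch of what that looks like for the search in the question, the body below moves the range filter inside a filtered query instead of leaving it at the top level. The function name buildFilteredSearchBody is made up for this example, and gmdate() is used instead of date() only so the output does not depend on the server timezone. The filtered query belongs to the Elasticsearch versions of that era; later versions replaced it with a bool query that has a filter clause.

```php
<?php
// Hypothetical helper: wraps the wildcard query and the date range in a filtered query.
function buildFilteredSearchBody(string $pattern, int $now, int $windowSeconds = 3600): array
{
    return [
        'query' => [
            'filtered' => [
                // The (slow) query part, scored only for documents that pass the filter.
                'query' => [
                    'wildcard' => ['title' => $pattern],
                ],
                // The filter part: restricts documents to the date window.
                'filter' => [
                    'range' => ['created' => [
                        'gte' => gmdate('c', $now - $windowSeconds),
                        'lte' => gmdate('c', $now + $windowSeconds),
                    ]],
                ],
            ],
        ],
    ];
}

$body = buildFilteredSearchBody('*123*', time());
echo json_encode($body, JSON_PRETTY_PRINT), PHP_EOL;
```

The resulting array would be passed as the 'body' of the search call from the question, e.g. $client->search(array('index' => 'main', 'body' => $body)).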
