I have two collections: words and phrases.
Each word document has an array of phrase ids, and each phrase can be active or inactive.
For example:
words: {"word" => "hello", "phrases" => [1,2]}, {"word" => "table", "phrases" => [2]}
phrases: {"id" => 1, "phrase" => "hello world!", "active" => 1}, {"id" => 2, "phrase" => "hello, i have already bought new table", "active" => 0}
I need to get the count of active phrases for each word.
In PHP I do it like this:
1. Get all words.
2. For each word, get the count of active phrases with the condition ['active' => 1].
Question: How can I get the words with their active phrase counts in one request? I tried to use MapReduce, but I still need to make a request for each word to get its count of active phrases.
UPD:
My test collection has 92,000 phrases and 23,000 words.
I have tested both variants: a PHP loop that gets the phrase count for each word, and the aggregation function in MongoDB.
I changed the aggregation pipeline given in the comments below because of phrases_data: it is an array, so I can't use $match on it directly. I use $unwind after $lookup.
[ '$unwind' => '$5' ],
[
    '$lookup' => [
        'from' => 'phrases_926ee3bc9fa72b029e028ec90e282072ea0721d1',
        'localField' => '5',
        'foreignField' => '0',
        'as' => 'phrases_data'
    ]
],
[ '$unwind' => '$phrases_data' ],
[ '$match' => [ 'phrases_data.3' => 77 ] ], // phrases_data.3 => 77 is similar to phrases_data.active => 1
[
    '$group' => [
        '_id' => [ 'word' => '$1', 'id' => '$0' ],
        'active_count' => [ '$sum' => 1 ]
    ]
],
[ '$match' => [ 'active_count' => [ '$gt' => 0 ] ] ],
[ '$sort' => [ 'active_count' => -1 ] ]
The problem is that the $group stage takes 80% of the processing time, and it is much slower than the PHP loop. Here are my results for the test collection:
1. PHP loop (get words -> get phrase count for each word): 10 seconds
2. Aggregation function: 20 seconds
db.words.aggregate([
    { "$unwind" : "$phrases" },
    {
        "$lookup": {
            "from": "phrases",
            "localField": "phrases",
            "foreignField": "id",
            "as": "phrases_data"
        }
    },
    { "$match" : { "phrases_data.active" : 1 } },
    {
        "$group" : {
            "_id" : "$word",
            "active_count" : { $sum : 1 }
        }
    }
]);
You can use the aggregation pipeline above:
Unwind the phrases array of each words document into separate documents.
Do a lookup (join) in the phrases collection using the unwound phrase ids.
Filter for active phrases using $match.
Finally, group by word and count using $sum: 1.
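To see what each stage contributes, the same four steps can be traced in plain PHP on the sample data from the question. This is only an in-memory sketch for illustration, not a replacement for the pipeline; the variable names ($words, $phrases, $phrasesById, $activeCount) are assumptions made up for this example.

```php
<?php
// In-memory copies of the two sample collections from the question.
$words = [
    ['word' => 'hello', 'phrases' => [1, 2]],
    ['word' => 'table', 'phrases' => [2]],
];
$phrases = [
    ['id' => 1, 'phrase' => 'hello world!', 'active' => 1],
    ['id' => 2, 'phrase' => 'hello, i have already bought new table', 'active' => 0],
];

// Index phrases by id so the "lookup" step becomes a plain array access.
$phrasesById = array_column($phrases, null, 'id');

$activeCount = [];
foreach ($words as $word) {
    foreach ($word['phrases'] as $phraseId) {              // $unwind
        $phrase = $phrasesById[$phraseId] ?? null;         // $lookup
        if ($phrase === null || $phrase['active'] !== 1) { // $match
            continue;
        }
        // $group with $sum: 1
        $activeCount[$word['word']] = ($activeCount[$word['word']] ?? 0) + 1;
    }
}

print_r($activeCount);
```

With the sample data only "hello" remains, with a count of 1, because the single phrase attached to "table" is inactive. Words without any active phrase drop out entirely, just as they do after the $match stage.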
I want to transform the following PHP array:
"extra_charge_item" => [
0 => "Massage",
1 => "Pool table",
2 => "Laundry"
],
"extra_charge_description" => [
0 => "Paid",
1 => "Paid",
2 => "We wash everything"
],
"extra_charge_price" => [
0 => "200",
1 => "100",
2 => "1000"
],
I haven't been able to solve this for a whole two hours.
This is the expected output:
"new_data" => [
    0 => [
        "Massage", "Paid", "200"
    ],
    1 => [
        "Pool table", "Paid", "100"
    ],
    2 => [
        "Laundry", "We wash everything", "1000"
    ]
]
Rather than doing all the work for you, here are some pointers on one way to approach this:
If you are happy to assume that all three sub-arrays have the same number of items, you can use array_keys to get those keys from whichever you want.
Once you have those keys, you can use a foreach loop to look at each in turn.
For each key, use square bracket syntax to pluck the three items you need.
Use [$foo, $bar, $baz] or array($foo, $bar, $baz) to create a new array.
Assign that array to your final output array, using the key from your foreach loop.
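For reference, the pointers above could be put together roughly as follows. This is a minimal sketch that assumes all three sub-arrays share the same keys; the wrapper variable $data and the result name $newData are chosen for illustration only.

```php
<?php
$data = [
    'extra_charge_item'        => ['Massage', 'Pool table', 'Laundry'],
    'extra_charge_description' => ['Paid', 'Paid', 'We wash everything'],
    'extra_charge_price'       => ['200', '100', '1000'],
];

$newData = [];
// Take the keys from one sub-array (assumed to match the other two).
foreach (array_keys($data['extra_charge_item']) as $key) {
    // Pluck the three related values and store them under the same key.
    $newData[$key] = [
        $data['extra_charge_item'][$key],
        $data['extra_charge_description'][$key],
        $data['extra_charge_price'][$key],
    ];
}

print_r($newData);
```

If the sub-arrays could differ in length, the lookups would need an isset() check or a default value.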
Just use foreach with key => value
$data = [
"extra_charge_item" => [
0 => "Massage",
1 => "Pool table",
2 => "Laundry"
],
"extra_charge_description" => [
0 => "Paid",
1 => "Paid",
2 => "We wash everything"
],
"extra_charge_price" => [
0 => "200",
1 => "100",
2 => "1000"
],
];
$newData = [];
foreach ($data as $value) {
foreach ($value as $k => $v) {
$newData[$k][] = $v;
}
}
var_dump($newData);
I found an answer. It seems someone else had the same problem:
Merge two arrays into one associative array
Here's the solution for my case:
// These arrays are passed from the front end via the name attribute, e.g. extra_charge_item[], etc.
$extra_charge_item_array = $request->input('extra_charge_item');
$extra_charge_description_array = $request->input('extra_charge_description');
$extra_charge_price_array = $request->input('extra_charge_price');
$new_extra_charges_data = array_map(
function ($item, $description, $price) {
return [
'item' => $item,
'description' => $description,
'price'=> $price
];
}, $extra_charge_item_array, $extra_charge_description_array, $extra_charge_price_array);
//save the data
foreach ($new_extra_charges_data as $extra_charge) {
Charge::create([
'item' => $extra_charge['item'],
'description' => $extra_charge['description'],
'price' => $extra_charge['price']
]);
}
I'm fairly new to ElasticSearch, currently using v6.2 and I seem to have run into a problem while trying to add some aggregations to a query. Trying to wrap my head around the various types of aggregation, as well as the best ways to store the data.
When the query runs, I have some variable attributes that I would like to aggregate and then return as filters to the user. For example, one character may have attributes for "size", "shape" and "colour", while another only has "shape" and "colour".
The full list of attributes is unknown so I don't think I would be able to construct the query that way.
My data is currently structured like this:
{
    id : 1,
    title : 'New Character 1',
    group : 1,
    region : 1,
    attrs : {
        moves : 2,
        # These would be dynamic, would only apply to some rows, not others.
        var_colours : ['Blue', 'Green', 'Red'],
        var_shapes : ['Round', 'Square', 'Etc'],
        effects : [
            { id : 1, value: 20 },
            { id : 2, value: 60 },
            { id : 3, value: 10 }
        ]
    }
}
I currently have an aggregation of groups and regions that looks like this. It seems to be working wonderfully and I would like to add something similar for the attributes.
[
'aggs' => [
'group_ids' => [
'terms' => [
'field' => 'group',
'order' => [ '_count' => 'desc' ]
]
],
'region_ids' => [
'terms' => [
'field' => 'region',
'order' => [ '_count' => 'desc' ]
]
]
]
]
I'm hoping to get a result that looks like the one below. I am also not sure whether the data structure is set up in the best way; I can make changes there if necessary.
[aggregations] => [
[groups] => [
[doc_count_error_upper_bound] => 0
[sum_other_doc_count] => 0
[buckets] => [
[0] => [
[key] => 5
[doc_count] => 27
],
[1] => [
[key] => 2
[doc_count] => 7
]
]
],
[var_colours] => [
[doc_count_error_upper_bound] => 0
[sum_other_doc_count] => 0
[buckets] => [
[0] => [
[key] => 'Red'
[doc_count] => 27
],
[1] => [
[key] => 'Blue'
[doc_count] => 7
]
]
],
[var_shapes] => [
[doc_count_error_upper_bound] => 0
[sum_other_doc_count] => 0
[buckets] => [
[0] => [
[key] => 'Round'
[doc_count] => 27
],
[1] => [
[key] => 'Polygon'
[doc_count] => 7
]
]
]
// ...
]
Any insight that anyone could provide would be extremely appreciated.
You should do this within your PHP script.
I can think of the following:
1. Use dynamic field mapping for your index.
By default, when a previously unseen field is found in a document, Elasticsearch will add the new field to the type mapping. This behaviour can be disabled, both at the document and at the object level, by setting the dynamic parameter to false (to ignore new fields) or to strict (to throw an exception if an unknown field is encountered).
2. Get all the existing fields in your index. Use the Get mapping API for this.
3. Loop over the results of step 2 to collect all the existing fields in your index. You can store them in a list (or array), for example.
4. Create a terms aggregation for each of the fields in your list (or array). That is: start from an empty or base query with no terms aggregation and add one terms aggregation for each element you got in step 3.
5. Add the missing parameter to each terms aggregation, with an empty string ("") as its value.
That's it. Following this, you have built the query in such a way that, no matter which index you're searching, you'll get a terms aggregation for every existing field in it.
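The field-collecting and query-building steps might look something like the sketch below. It does not call Elasticsearch at all: $properties is a hand-written, hypothetical fragment shaped like the "properties" section of a Get mapping response, and flattenFields() and buildTermsAggs() are helper names invented for this example. It also assumes every field is a keyword field; other field types may need to be filtered out or given a different missing value.

```php
<?php
/**
 * Flatten the "properties" part of a mapping into dotted field names.
 */
function flattenFields(array $properties, string $prefix = ''): array
{
    $fields = [];
    foreach ($properties as $name => $definition) {
        $path = $prefix === '' ? $name : $prefix . '.' . $name;
        if (isset($definition['properties'])) {
            // Object field: recurse into its sub-fields.
            $fields = array_merge($fields, flattenFields($definition['properties'], $path));
        } else {
            $fields[] = $path;
        }
    }
    return $fields;
}

/**
 * Build one terms aggregation per field, each with an empty-string "missing" value.
 */
function buildTermsAggs(array $fields): array
{
    $aggs = [];
    foreach ($fields as $field) {
        $aggs[$field] = ['terms' => ['field' => $field, 'missing' => '']];
    }
    return ['aggs' => $aggs];
}

// Hypothetical mapping fragment; a real one would come from the Get mapping API.
$properties = [
    'attrs' => ['properties' => [
        'var_colours' => ['type' => 'keyword'],
        'var_shapes'  => ['type' => 'keyword'],
    ]],
];

$body = buildTermsAggs(flattenFields($properties));
print_r($body);
```

The resulting $body can then be merged into the existing query array alongside the group_ids and region_ids aggregations.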
Advantages:
Your terms aggregations will be generated dynamically with all the existing fields.
Documents that do not contain a given field will be counted under an empty-string bucket for that field.
Disadvantages:
Looping through the GET mapping API's result could be a little frustrating (but I trust you).
Performance (time & resources) will be affected for every new field you find in your mappings.
I hope this is helpful! :D
I have a query like
'aggs' => [
'deadline' => [
'date_histogram' => [
'field' => 'deadline',
'interval' => 'month',
'keyed' => true,
'format' => 'MMM'
]
]
]
The results I am getting are buckets with month names as keys.
The problem is that a bucket keyed by a month name from one year is overwritten by the same month of the next year (because the key is the same).
I want the doc_count of each overwritten bucket to be merged with the doc_count of the bucket that replaces it.
You can either add a separate month field during indexing and aggregate on it, or use the script below:
{
"size": 0,
"aggs": {
"deadline": {
"histogram": {
"script": { "inline" : "return doc['deadline'].value.getMonthOfYear()" },
"interval": 1
}
}
}
}
Creating a separate month field will give better performance.
Change the format from MMM to YYYY-MMM as below:
'aggs' => [
'deadline' => [
'date_histogram' => [
'field' => 'deadline',
'interval' => 'month',
'keyed' => true,
'format' => 'YYYY-MMM'
]
]
]
After this, you can handle the merging at the application level.
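The application-level merge could be sketched like this. The $buckets array is made-up sample data standing in for the keyed buckets of the deadline aggregation; a real response carries additional fields per bucket, and only doc_count is used here.

```php
<?php
// Hypothetical keyed buckets, as they might look with 'format' => 'YYYY-MMM'.
$buckets = [
    '2016-Dec' => ['doc_count' => 4],
    '2017-Jan' => ['doc_count' => 3],
    '2017-Dec' => ['doc_count' => 5],
    '2018-Jan' => ['doc_count' => 2],
];

$perMonth = [];
foreach ($buckets as $key => $bucket) {
    // A key looks like "2017-Jan"; keep only the part after the first dash.
    $month = explode('-', $key, 2)[1];
    // Add this bucket's count to the running total for that month.
    $perMonth[$month] = ($perMonth[$month] ?? 0) + $bucket['doc_count'];
}

print_r($perMonth);
```

For the sample data this yields 9 for Dec and 5 for Jan, so counts from different years are summed rather than overwritten.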
I have an array built from a MySQL database.
This is the array:
[
    0 => [
        'id' => '1997',
        'lokasi_terakhir' => 'YA4121'
    ],
    1 => [
        'id' => '1998',
        'lokasi_terakhir' => 'PL2115'
    ],
    2 => [
        'id' => '1999',
        'lokasi_terakhir' => 'PL4111'
    ]
]
How can I count the lokasi_terakhir values grouped by their first character? What is the best way?
This is the goal:
[
    "Y" => 1,
    "P" => 2
]
Please advise.
Here are two refined methods. Which one you choose will come down to your personal preference (you won't find better methods).
In the first, I am iterating the array, declaring the first character of the lokasi_terakhir value as the key in the $result declaration. If the key doesn't yet exist in the output array then it must be declared / set to 1. After it has been instantiated, it can then be incremented -- I am using "pre-incrementation".
The second method first maps a new array using the first character of the lokasi_terakhir value from each subarray, then counts each occurrence of each letter.
Method #1: (foreach)
foreach($array as $item){
    if(!isset($result[$item['lokasi_terakhir'][0]])){
        $result[$item['lokasi_terakhir'][0]]=1; // instantiate
    }else{
        ++$result[$item['lokasi_terakhir'][0]]; // increment
    }
}
var_export($result);
Method #2: (functional)
var_export(array_count_values(array_map(function($a){return $a['lokasi_terakhir'][0];},$array)));
// generate array of single-character elements, then count occurrences
Output: (from either)
array (
'Y' => 1,
'P' => 2,
)
You can group those items like this:
$array = [
0 => [
'id' => '1997',
'lokasi_terakhir' => 'YA4121'
],
1 => [
'id' => '1998',
'lokasi_terakhir' => 'PL2115'
],
2 => [
'id' => '1999',
'lokasi_terakhir' => 'PL4111'
]
];
$result = array();
foreach($array as $item) {
$char = substr($item['lokasi_terakhir'], 0, 1);
if(!isset($result[$char])) {
$result[$char] = array();
}
$result[$char][] = $item;
}
<?php
$array=[
0 => [
'id' => '1997',
'lokasi_terakhir' => 'YA4121'
],
1 => [
'id' => '1998',
'lokasi_terakhir' => 'PL2115'
],
2 => [
'id' => '1999',
'lokasi_terakhir' => 'PL4111'
]
];
foreach($array as $row){
$newArray[]=$row['lokasi_terakhir'][0];
}
print_r(array_flip(array_unique($newArray)));
This code takes the first letter of each lokasi_terakhir value, keeps only the unique letters, and flips the array so that the letters become the keys. Note that the values are then the positions of the first occurrences, not counts.
The output is:
Array ( [Y] => 0 [P] => 1 )
I have the array below and need to append a new array inside $newData['_embedded']['settings']['web/vacation/filters']['data']. How can I access that element and append to it?
$newData = [
"id" => "47964173",
"email" => "abced#gmail.com",
"firstName" => "Muhammad",
"lastName" => "Taqi",
"type" => "employee",
"_embedded" => [
"settings" => [
[
"alias" => "web/essentials",
"data" => [],
"dateUpdated" => "2017-08-16T08:54:11Z"
],
[
"alias" => "web/personalization",
"data" => [],
"dateUpdated" => "2016-07-14T10:31:46Z"
],
[
"alias" => "wizard/login",
"data" => [],
"dateUpdated" => "2016-09-26T07:56:43Z"
],
[
"alias" => "web/vacation/filters",
"data" => [
"test" => [
"type" => "teams",
"value" => [
0 => "09b285ec-7687-fc95-2630-82d321764ea7",
1 => "0bf117b4-668b-a9da-72d4-66407be64a56",
2 => "16f30bfb-060b-360f-168e-1ddff04ef5cd"
],
],
"multiple teams" => [
"type" => "teams",
"value" => [
0 => "359c0f53-c9c3-3f88-87e3-aa9ec2748313"
]
]
],
"dateUpdated" => "2017-07-03T09:10:36Z"
],
[
"alias" => "web/vacation/state",
"data" => [],
"dateUpdated" => "2016-12-08T06:58:57Z"
]
]
]
];
$newData['_embedded']['settings']['web/vacation/filters']['data'] = $newArray;
Any hint on how to append it quickly? I don't want to loop and check for keys inside loops.
The settings subarray is "indexed". You first need to search the alias column of the subarray for web/vacation/filters to find the correct index. Using a foreach loop without a break will mean your code will continue to iterate even after the index is found (bad coding practice).
There is a cleaner way that avoids the loop, condition and break: use array_search(array_column()). It will seek your associative element, return the index, and immediately stop seeking.
You can use the + operator to add the new data to the subarray. This avoids calling a function like array_merge().
Code:
if(($index=array_search('web/vacation/filters',array_column($newData['_embedded']['settings'],'alias')))!==false){
    $newData['_embedded']['settings'][$index]['data']+=$newArray;
}
var_export($newData);
Perhaps a more considered process would be to force the insert of the new data when the search returns no match, rather than just flagging the process as unsuccessful. You may have to tweak the date generation for your specific timezone or whatever...
$newArray=["test2"=>[
"type" =>"teams2",
"value" => [
0 => "09b285ec-7687-fc95-2630-82d321764ea7",
1 => "0bf117b4-668b-a9da-72d4-66407be64a56",
2 => "16f30bfb-060b-360f-168e-1ddff04ef5cd"
],
]
];
if(($index=array_search('web/vacation/filters',array_column($newData['_embedded']['settings'],'alias')))!==false){
//echo $index;
$newData['_embedded']['settings'][$index]['data']+=$newArray;
}else{
//echo "couldn't find index, inserting new subarray";
$dt = new DateTime();
$dt->setTimeZone(new DateTimeZone('UTC')); // or whatever you are using
$stamp=$dt->format('Y-m-d\TH:i:s\Z'); // colons in the time part, matching the existing dateUpdated values
$newData['_embedded']['settings'][]=[
"alias" => "web/vacation/filters",
"data" => $newArray,
"dateUpdated" => $stamp
];
}
You need to find the key that corresponds to web/vacation/filters. For example, you could use this:
$indexOfWVF = null;
foreach ($newData['_embedded']['settings'] as $key => $value) {
    if ($value["alias"]==='web/vacation/filters') {
        $indexOfWVF = $key;
        break; // stop once the matching entry is found
    }
}
if ($indexOfWVF !== null) {
    $newData['_embedded']['settings'][$indexOfWVF]['data'][] = $newArray;
}
Judging from the comments, you want to merge the arrays rather than append them:
$newData['_embedded']['settings'][$indexOfWVF]['data'] = array_merge($newData['_embedded']['settings'][$indexOfWVF]['data'],$newArray);
Or (if it's always Filter1):
$newData['_embedded']['settings'][$indexOfWVF]['data']['Filter1'] = $newArray['Filter1'];
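The difference between appending with [], the + operator and array_merge() is easy to miss, so here is a small standalone sketch. The arrays are cut-down stand-ins for the 'data' subarray and $newArray, with values invented for illustration.

```php
<?php
// Cut-down stand-ins for the existing 'data' subarray and the new data.
$existing = ['test' => ['type' => 'teams']];
$newArray = ['test' => ['type' => 'changed'], 'Filter1' => ['type' => 'teams']];

// Append: nests the whole $newArray under the next integer key (0).
$appended = $existing;
$appended[] = $newArray;

// Union: keeps the existing 'test' entry and only adds the missing 'Filter1'.
$union = $existing + $newArray;

// array_merge: overwrites 'test' with the new value and adds 'Filter1'.
$merged = array_merge($existing, $newArray);

var_export([
    'appended keys' => array_keys($appended),
    'union test'    => $union['test']['type'],
    'merged test'   => $merged['test']['type'],
]);
```

Which one fits depends on whether an existing filter with the same name should be preserved (+) or replaced (array_merge).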