I am currently trying to accept user input so that a user may be able to search the database.
> db.test2.find().pretty()
{
"_id" : ObjectId("55de8a17f8389e208a1e7d7e"),
"name" : "john",
"favorites" : {
"vegetable" : "spinach",
"fruit" : "apple",
}
}
{
"_id" : ObjectId("55de8a17f8389e208a1f6gg4"),
"name" : "becky",
"favorites" : {
"vegetable" : "spinach",
"fruit" : "apple",
}
}
{
"_id" : ObjectId("55e3b6cbec2740181355b809"),
"name" : "liz",
"favorites" : {
"vegetable" : "spinach",
"fruit" : "banana",
}
}
In this example, the user would be able to search for any combination of a person's favorite vegetable, fruit, or both their favorite vegetable and favorite fruit. If the user entered spinach for favorite vegetable, all three would be returned. However, if the user input favorite vegetable = spinach and favorite fruit = apple, only john and becky would be returned.
In MongoDB, you are able to determine which parameter you want to search. I am trying to write my code in a way that if the user leaves a field blank, it should not be searched for.
I have tried
$query = array("favorites.vegetable" => "$userInput", "favorites.fruit" => "$userInput2");
but if either of those fields is left blank, it will not return any results. I thought about trying to use if statements:
if ($vegetable == NULL)
{
$query = array("favorites.fruit" => "$fruit");
}
else if($fruit == NULL)
{
$query = array("favorites.vegetable" => "$vegetable");
}
else
{
$query = array("favorites.vegetable" => "$vegetable", "favorites.fruit" => "$fruit");
}
but if I would like to make my database searchable by more parameters I would have too many conditional statements. Is there any way to make my Mongo search recognize when a field is left blank?
The real question is where the input comes from. If the input has some structure, the code can simply follow a pattern.
As it is, with two basic variables you can clean up the code by constructing your query based on what is in the variables:
$query = array();
if ( $fruit != NULL ) {
$query["favorites.fruit"] = $fruit;
}
if ( $vegetable != NULL ) {
$query["favorites.vegetable"] = $vegetable;
}
This means you end up with a $query that is either empty (matching everything) or contains one or two specific arguments, depending on which variables were null.
If your input has some structure, then you can be a lot more dynamic:
$input = array("fruit" => "apple", "vegetable" => "spinach");
$query = array();
foreach ( $input as $key => $value ) {
$query["favorites.$key"] = $value;
}
Which is doing the same thing by appending to the $query but in a much more dynamic way than with individual variables.
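For illustration, here is a minimal sketch of the same pattern in JavaScript (the answer's own code is PHP). It assumes the form fields arrive as a plain object and treats null, undefined, and blank strings as "left blank"; the helper name buildFavoritesQuery is hypothetical and not part of any driver.

```javascript
// Build a MongoDB filter from user input, skipping fields left blank.
// buildFavoritesQuery is a hypothetical helper name used for illustration.
function buildFavoritesQuery(input) {
  const query = {};
  for (const [key, value] of Object.entries(input)) {
    // Treat null, undefined, and whitespace-only strings as "not supplied".
    if (value === null || value === undefined || String(value).trim() === "") {
      continue;
    }
    query[`favorites.${key}`] = value;
  }
  return query;
}

// Only the supplied field ends up in the filter.
console.log(buildFavoritesQuery({ fruit: "apple", vegetable: "" }));
// With nothing supplied the filter is empty, which matches every document.
console.log(buildFavoritesQuery({ fruit: "", vegetable: null }));
```

Adding another searchable field then only means adding another key to the input, with no extra conditional branches.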
Also note that as far as MongoDB is concerned, then your document structure is not great. It probably really should look like this:
{
"_id" : ObjectId("55de8a17f8389e208a1e7d7e"),
"name" : "john",
"favorites" : [
{ "type": "vegetable", "name": "spinach" },
{ "type": "fruit", "name": "apple" }
]
}
And while the query may initially look a bit more complex, removing the "specific paths" for keys like "vegetable" and "fruit" from your query has a whole lot of benefits that make life easier; primarily, the "data" can be "indexed". Key names are not indexable for search and therefore lose efficiency:
$input = array("fruit" => "apple", "vegetable" => "spinach");
$query = array();
foreach ( $input as $key => $value ) {
$query['$and'][] = array(
'favorites' => array(
'$elemMatch' => array(
'type' => $key, 'name' => $value
)
)
);
}
This is a nice query that uses $elemMatch to find documents whose array contains, for every item in your input list, an element where both the "type" and the "name" match.
It basically looks like this in JSON:
{
"$and": [
{ "favorites": {
"$elemMatch": {
"type": "fruit", "name": "apple"
}
}},
{ "favorites": {
"$elemMatch": {
"type": "vegetable", "name": "spinach"
}
}}
]
}
It all comes down to knowing how to manipulate data structures in your chosen language, and that "data structures" are all MongoDB queries really are, as far as your language is concerned.
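As a rough sketch of that point, the same $and / $elemMatch structure can be assembled in JavaScript from a plain object. This assumes the array-based "favorites" layout with "type" and "name" fields; buildElemMatchQuery is a hypothetical helper name.

```javascript
// Build an $and of $elemMatch clauses from { type: name } pairs.
// buildElemMatchQuery is a hypothetical name used only for this sketch.
function buildElemMatchQuery(input) {
  const clauses = Object.entries(input)
    // Skip fields the user left blank.
    .filter(([, value]) => value !== null && value !== undefined && value !== "")
    .map(([type, name]) => ({ favorites: { $elemMatch: { type, name } } }));
  // MongoDB rejects an empty $and array, so fall back to {} (match all).
  return clauses.length > 0 ? { $and: clauses } : {};
}

console.log(JSON.stringify(buildElemMatchQuery({ fruit: "apple", vegetable: "spinach" }), null, 2));
```

The query is just a nested object, so it can be grown clause by clause in whatever way the input dictates.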
As for the structure change, consider that finding people who have "vegetables" in their "favorites" now becomes this:
{ "favorites.type": "vegetable" }
Which is nice since "favorites.type" can be indexed, as opposed to:
{ "favorites.vegetable": { "$exists": true } }
While this can "technically" use an index, it does not do so nearly as well. So changing the way the schema is represented is desirable and gives more flexibility.
I'm using elasticsearch in my laravel-app and I'm trying to use the range-query. I have an array of companies, which in different periods have different amounts of employees, but I'm only interested in the newest period, which in this case means the last item of the employees array.
so, basically the array looks like this:
"company" => [
"name" => "some company",
"company_number" => "1234567",
"status" => "normal",
"employees" => [
"period_1" => [
"amount" => 10
],
"period_2" => [
"amount" => 15
],
"period_3" => [
"amount" => 24
],
etc etc...
]
]
so, in the frontend, you can enter a minimum and a maximum value to search for companies with certain amounts of employees. In my Controller, I then do this:
"query":{
"bool": {
"should" : [
{ "match" : { "company.status" : "normal" },
{
"range": {
"company.employees": { // I WANT THE LAST ITEM FROM THIS ARRAY
"gte": "'. $min . '",
"lt" : "'.$max .'"
}
}
}
]
}
}
This basically works, but of course, doesn't give me the last record of the employees array.
How can I solve this? Please help...
UPDATE
ok so now I added the code which was suggested:
"query": {
"bool": {
"should" : [
{ "match" : { "company.status" : "normal" },
{
"range": {
"company.employees": { // I WANT THE LAST ITEM FROM THIS ARRAY
"gte": "'. $min . '",
"lt" : "'.$max .'"
}
}
}
]
},
"script": {
"source": """
def period_keys = new ArrayList(ctx._source.company.employees.keySet());
Collections.sort(period_keys);
Collections.reverse(period_keys);
def latest_period = period_keys[0];
def latest_amount = ctx._source.company.employees[latest_period].amount;
ctx._source.company.current_employees = ["period": latest_period, "amount": latest_amount];
"""
}
}
}
But I get the error: Unexpected character ('{' (code 123)): was expecting comma to separate Object entries...
Since I'm still learning I must say, I have no clue what is going on and error messaging from Elasticsearch is horrible.
Anyway, does anyone have a clue? Thanks in advance
Looking up something like this at runtime is quite difficult and under-optimized. Here's an alternative.
I'm assuming a given company's employee counts don't change that often -- meaning when they do change (i.e. you update that document), you can run the following _update_by_query script to get the latest period's employee info and save it on the company level while leaving the employee section untouched:
POST companies_index/_update_by_query
{
"query": {
"match_all": {}
},
"script": {
"source": """
def period_keys = new ArrayList(ctx._source.company.employees.keySet());
Collections.sort(period_keys);
Collections.reverse(period_keys);
def latest_period = period_keys[0];
def latest_amount = ctx._source.company.employees[latest_period].amount;
ctx._source.company.current_employees = ['period': latest_period, 'amount': latest_amount];
"""
}
}
One-liner:
POST companies_index/_update_by_query
{"query":{"match_all":{}},"script":{"source":" def period_keys = new ArrayList(ctx._source.company.employees.keySet());\n Collections.sort(period_keys);\n Collections.reverse(period_keys);\n \n def latest_period = period_keys[0];\n def latest_amount = ctx._source.company.employees[latest_period].amount;\n \n ctx._source.company.current_employees = ['period': latest_period, 'amount': latest_amount];"}}
Note that because the query above is match_all, the script will apply to all docs in your index. But of course you could limit it to one company only.
After that call your documents will look like this:
{
"company" : {
"company_number" : "1234567",
"name" : "some company",
"current_employees" : { <---
"period" : "period_3",
"amount" : 24
},
"employees" : {
...
},
...
}
}
and the range query from above becomes a piece of cake:
...
"range": {
"company.current_employees.amount": { <--
"gte": "'. $min . '",
"lt" : "'.$max .'"
}
}
...
BTW I also assumed that the period keys can be sorted alphabetically but if they contain dates, the script will require an adjustment in the form of a date parsing comparator.
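To illustrate why the sort order matters, here is a small sketch of the "pick the latest period" logic in plain JavaScript (not Painless, so it cannot be pasted into the update script as is). It assumes the keys follow a "period_<number>" pattern, which the question only partly shows; latestPeriod is a hypothetical name. Date-based keys would need a date-parsing comparator instead of the numeric one used here.

```javascript
// Pick the most recent period from an object keyed by "period_<number>".
// latestPeriod is a hypothetical helper; the key format is an assumption.
function latestPeriod(employees) {
  const periodNumber = (key) => Number(key.split("_")[1]);
  // Sort numerically by suffix rather than alphabetically by whole key.
  const keys = Object.keys(employees).sort((a, b) => periodNumber(a) - periodNumber(b));
  const latest = keys[keys.length - 1];
  return { period: latest, amount: employees[latest].amount };
}

const sampleEmployees = {
  period_1: { amount: 10 },
  period_2: { amount: 15 },
  period_10: { amount: 24 },
};

console.log(latestPeriod(sampleEmployees)); // { period: 'period_10', amount: 24 }
// A plain string sort would pick "period_2", because "period_10" < "period_2".
console.log(Object.keys(sampleEmployees).sort().pop()); // period_2
```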
Is it possible to move the preg_match search below into the JMESPATH search filter using contains? I found an example of contains in the JMESPATH tutorial, but I'm not sure if the syntax supports combining filter strings somehow using OR.
$s3_results = $s3_client->getPaginator('ListObjects', ['Bucket' => $config['bucket']]);
// Narrow S3 list down to files containing the strings "css_" or "js_".
foreach ($s3_results->search('Contents[].Key') as $key) {
if (preg_match("/(css_|js_)/", $key)) {
$s3_keys[] = $key;
}
}
Assuming the document is in a format something like this:
{
"Contents": [
{
"Key": "some/path/css_example",
...
},
{
"Key": "another/path/js_example",
...
},
{
"Key": "blah/blah/other_example",
...
}
]
}
(I would check but I don't have AWS credentials to hand and it's surprisingly hard to find examples of the JSON format of an S3 list-objects response.)
Try the query:
Contents[].Key|[?contains(@, 'css_') || contains(@, 'js_')]
In this case, Contents[].Key selects just the keys (as in your example). The pipe | resets the projection so that we operate on the whole list rather than on each key separately. The [?...] filters the list to select only the keys that meet the boolean condition. The contains function can be used in several ways, but here we use it to check whether its first argument, a string (@, the current item in the list), contains the second argument, the substring we're searching for. We combine two uses of contains with the or operator ||, so that an item matches if either condition is true.
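For readers less familiar with JMESPath, the steps described above correspond roughly to a map followed by a filter. The sketch below shows that equivalent in plain JavaScript against the assumed document shape; it does not call any JMESPath library, and filterAssetKeys is a hypothetical name.

```javascript
// Plain-JavaScript equivalent of "select the keys, then keep those that
// contain css_ or js_". filterAssetKeys is a hypothetical helper name.
function filterAssetKeys(response) {
  // Projection step: pull the Key out of every entry in Contents.
  const keys = response.Contents.map((item) => item.Key);
  // Filter step: keep a key if either substring is present.
  return keys.filter((key) => key.includes("css_") || key.includes("js_"));
}

const sampleResponse = {
  Contents: [
    { Key: "some/path/css_example" },
    { Key: "another/path/js_example" },
    { Key: "blah/blah/other_example" },
  ],
};

console.log(filterAssetKeys(sampleResponse));
// [ 'some/path/css_example', 'another/path/js_example' ]
```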
Sure, the or expression is something that does exist in JMESPath.
But then, you could go even further, drop your for loop entirely, and fetch the keys using function expressions.
Given the data:
{
"Contents": [
{
"Key": [
"css_foo",
"js_bar",
"brown_fox"
]
},
{
"Key": [
"js_foo",
"css_bar",
"lazy_dog"
]
}
]
}
Then
$keys = $s3_results->search(
'Contents[].Key[?contains(@, `js_`) == `true` || contains(@, `css_`) == `true`]'
);
This would give you the following two-dimensional array (because there are two Contents entries):
array(2) {
[0] =>
array(2) {
[0] =>
string(7) "css_foo"
[1] =>
string(6) "js_bar"
}
[1] =>
array(2) {
[0] =>
string(6) "js_foo"
[1] =>
string(7) "css_bar"
}
}
Or even simpler, since contains returns true or false, already:
$keys = $s3_results->search(
'Contents[].Key[?contains(@, `js_`) || contains(@, `css_`)]'
);
Background Information
I have the following data in my mongo database:
{ "_id" :
ObjectId("581c97b573df465d63af53ae"),
"ph" : "+17771111234",
"fax" : false,
"city" : "abd",
"department" : "",
"description" : "a test"
}
I am now writing a script that will loop through a CSV file that contains data that I need to append to the document. For example, the data might look like this:
+17771111234, 10:15, 12:15, test@yahoo.com
+17771111234, 1:00, 9:00, anothertest@yahoo.com
Ultimately I want to end up with a mongo document that looks like this:
{ "_id" :
ObjectId("581c97b573df465d63af53ae"),
"ph" : "+17771111234",
"fax" : false,
"city" : "abd",
"department" : "",
"description" : "a test",
"contact_locations": [
{
"stime": "10:15",
"etime": "12:15",
"email": "test@yahoo.com"
},
{
"stime": "1:00",
"etime": "9:00",
"email": "anothertest@yahoo.com"
},
]
}
Problem
The code I've written is actually creating new documents instead of appending to the existing ones. And actually, it's not even creating a new document per row in the CSV file... which I haven't debugged enough yet to really understand why.
Code
For each row in the csv file, I'm running the following logic
while(!$csv->eof() && ($row = $csv->fgetcsv()) && $row[0] !== null) {
//code that massages the $row into the way I need it to look.
$data_to_submit = array('contact_locations' => $row);
echo "proving that the record already exists...: <BR>";
$cursor = $contact_collection->find(array('phnum'=>$row[0]));
var_dump(iterator_to_array($cursor));
echo "now attempting to update it....<BR>";
// $cursor = $contact_collection->update(array('phnum'=>$row[0]), $data_to_submit, array('upsert'=>true));
$cursor = $contact_collection->insert(array('phnum'=>$row[0]), $data_to_submit);
echo "AFTER UPDATE <BR><BR>";
$cursor = $contact_collection->find(array('phnum'=>$row[0]));
var_dump(iterator_to_array($cursor));
}
}
Questions
Is there a way to "append" to documents? Or do I need to grab the existing document, save as an array, merge my contact locations array with the main document and then resave?
how can I query to see if the "contact_locations" object already exists inside a document?
Yes, you can do it!
First you need to find your document and push the new value onto it.
Use findAndModify with $addToSet:
$cursor = $contact_collection->findAndModify(
array("ph" => "+17771111234"),
array('$addToSet' =>
array(
"contact_locations" => array(
"stime"=> "10:15",
"etime"=> "12:15",
"email"=> "test@yahoo.com"
)
)
)
);
The best part is that $addToSet won't add the same value twice, so you will not end up with duplicates :)
Here are the docs: https://docs.mongodb.com/manual/reference/operator/update/addToSet/
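As a rough sketch of how each CSV row could be turned into such an update, the JavaScript below builds the filter and the $addToSet document without touching a database. The column order (phone, start time, end time, email) is taken from the question's sample rows; rowToUpdate and the example address are hypothetical. Note that the filter uses the "ph" field, which is where the sample document stores the number.

```javascript
// Turn one CSV line into a filter plus an $addToSet update document.
// rowToUpdate is a hypothetical helper; nothing here talks to MongoDB.
function rowToUpdate(line) {
  const [ph, stime, etime, email] = line.split(",").map((field) => field.trim());
  return {
    filter: { ph },
    // $addToSet only skips an embedded document that matches an existing
    // element exactly (same fields, same values, same field order).
    update: { $addToSet: { contact_locations: { stime, etime, email } } },
  };
}

// Filter for documents that already have a contact_locations field.
const hasContactLocations = { contact_locations: { $exists: true } };

console.log(JSON.stringify(rowToUpdate("+17771111234, 10:15, 12:15, someone@example.com"), null, 2));
console.log(JSON.stringify(hasContactLocations));
```

The resulting filter and update can then be handed to whichever update call the driver in use provides.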
I'm not sure of the exact syntax in PHP as I've never done it before, but I'm currently doing the same thing in JS with MongoDB, and $push is the operator you're looking for. Also, if I may be a bit nitpicky, I recommend changing $contact_collection to $contact_locations as a variable name: array variable names are usually plural, and being more descriptive is always better. Also make sure you first find the document in MongoDB whose array you want to append to, and that you use the MongoDB "update" command.
I want to query the content (text) inside my dynamic values keys, but I can't figure out the easiest way to do this.
So my mongo collection is like this:
{
"_id" : ObjectId("566aecb8f0e46491068b456c"),
"metadatas" : [
{
"schema_id" : "f645fabef0e464e51e8b4567",
"values" : {
"name" : "Test",
"age" : NumberLong(29),
"address" : "Test1"
},
"updated_on" : ISODate("2015-12-11T00:00:00Z")
},
{
"schema_id" : "d745fabef0e464e51e8b4567",
"values" : {
"something_else" : "lipsum"
},
"updated_on" : ISODate("2016-12-11T00:00:00Z")
}
],
}
How can I dynamically query what's inside my values, since I cannot do $db->collec->find(array('metadatas.values.name' => $regex)) because I might have some other dynamic key instead of name?
Thanks in advance.
I ended up saving my keys uniquely in another collection, then building the query by concatenating the key paths before applying it, based on @Sammaye's idea:
$regex = new \MongoRegex("/^$query/i");
# First get all the dynamic keys you need to filter
$keys_to_search = $this->db->metadata_keys->find();
$this->log($keys_to_search);
$query_builder = array('$or'=>array());
foreach ($keys_to_search as $value){
array_push(
$query_builder['$or'],
array('metadatas.values.' . $value['key'] => $regex)
);
}
$this->log($query_builder);
$search_metadata_name = $this->db->filesfolders->find(
$query_builder, array('sql_fileid' => true)
);
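The same $or construction can be sketched in JavaScript as follows. This is an illustration only: it assumes the list of dynamic keys has already been loaded and is non-empty (MongoDB rejects an empty $or array), and it escapes the search term before building the regex, which the PHP above does not do. buildValuesQuery and escapeRegex are hypothetical names.

```javascript
// Escape regex metacharacters so user input is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build one $or clause per dynamic key, all sharing a prefix regex.
// Assumes keys is a non-empty array of key names.
function buildValuesQuery(keys, term) {
  const regex = new RegExp("^" + escapeRegex(term), "i");
  return {
    $or: keys.map((key) => ({ [`metadatas.values.${key}`]: regex })),
  };
}

const sampleQuery = buildValuesQuery(["name", "address", "something_else"], "te");
console.log(sampleQuery.$or.map((clause) => Object.keys(clause)[0]));
// [ 'metadatas.values.name', 'metadatas.values.address', 'metadatas.values.something_else' ]
```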
I have a pretty big MongoDB document that holds all kinds of data. I need to identify the fields that are of type array in a collection so I can remove them from the displayed fields in the grid that I will populate.
My current method consists of retrieving all the field names in the collection with the following, taken from the answer posted under "MongoDB Get names of all keys in collection":
mr = db.runCommand({
"mapreduce" : "Product",
"map" : function() {
for (var key in this) { emit(key, null); }
},
"reduce" : function(key, stuff) { return null; },
"out": "things" + "_keys"
})
db[mr.result].distinct("_id")
And running for each of the fields a query like this one
db.Product.find( { $where : "Array.isArray(this.Orders)" } ).count()
If there's anything retrieved the field is considered an array.
I don't like that I need to run n+2 queries ( n being the number of different fields in my collection ) and I wouldn't like to hardcode the fields in the model. It would defeat the whole purpose of using MongoDB.
Is there a better method of doing this?
I made a couple of slight modifications to the code you provided above:
mr = db.runCommand({
"mapreduce" : "Product",
"map" : function() {
for (var key in this) {
if (Array.isArray(this[key])) {
emit(key, 1);
} else {
emit(key, 0);
}
}
},
"reduce" : function(key, stuff) { return Array.sum(stuff); },
"out": "Product" + "_keys"
})
Now, the mapper will emit a 1 for keys that contain arrays, and a 0 for any that do not. The reducer will sum these up, so that when you check your end result:
db[mr.result].find()
You will see your field names with the number of documents in which they contain Array values (and a 0 for any that are never arrays).
So this should give you which fields contain Array types with just the map-reduce job.
--
Just to see it with some data:
db.Product.insert({"a":[1,2,3], "c":[1,2]})
db.Product.insert({"a":1, "b":2})
db.Product.insert({"a":1, "c":[2,3]})
(now run the "mr =" code above)
db[mr.result].find()
{ "_id" : "_id", "value" : 0 }
{ "_id" : "a", "value" : 1 }
{ "_id" : "b", "value" : 0 }
{ "_id" : "c", "value" : 2 }
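To make the counting logic easier to follow, here is a plain-JavaScript sketch that performs the same map and reduce steps over an in-memory array instead of a collection. It is only an illustration of what the job computes (the automatically added _id field is left out of the sample data); countArrayFields is a hypothetical name.

```javascript
// For every field name, count the documents in which its value is an array.
// countArrayFields is a hypothetical helper that mirrors the map-reduce job.
function countArrayFields(docs) {
  const counts = {};
  for (const doc of docs) {
    for (const [key, value] of Object.entries(doc)) {
      // "Map": emit 1 for arrays and 0 otherwise; "reduce": sum per key.
      counts[key] = (counts[key] || 0) + (Array.isArray(value) ? 1 : 0);
    }
  }
  return counts;
}

const sampleDocs = [
  { a: [1, 2, 3], c: [1, 2] },
  { a: 1, b: 2 },
  { a: 1, c: [2, 3] },
];

const counts = countArrayFields(sampleDocs);
console.log(counts); // { a: 1, c: 2, b: 0 }

// Fields to hide from the grid: those that are an array in at least one document.
console.log(Object.keys(counts).filter((key) => counts[key] > 0)); // [ 'a', 'c' ]
```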