Suppose I have the following objects in my collection:
{id:'123', tags:['berry', 'apple']}
{id:'456', tags:['salad', 'tomatoe']}
{id:'789', tags:['bread', 'rice']}
My search term is "Strawberry". I want to find all objects where one of the tags is part of the search term. In this case that is the object with id '123', since 'berry' is part of 'Strawberry'.
I wanted to use a regex, like this (I'm using PHP, by the way):
$regex = new MongoRegex("/.*berry.*/i");
$results = $mongodb->data->find(array("tags" => array('$in' => array($regex))));
but the problem is that the regex is applied to the tags and not to the search term. So I'd need something like a reverse regex.
Is a query like this somehow possible? Right now I'm doing it like this:
$search = "Strawberry";
$js = "function() { var i = 0; for (; i < this.tags.length; i++) { if ('".$search."'.indexOf(this.tags[i]) != -1) { return true; } } }";
$results = $mongodb->data->find($js);
That's OK for now, since the dataset isn't very large, but will be slow in the future.
Does anyone have a suggestion? Thanks.
UPDATE:
Sorry if this is still not clear.
My search Term is "Strawberry", not "berry". The php code I posted that contains the Regex was just to show that this is not a solution and does not work.
So again: my search term is "Strawberry" and I want to find all objects where one of the tags is part of the search term, not the other way around.
UPDATE 2:
To make it even clearer, in SQL this would be:
SELECT * FROM data WHERE 'Strawberry' LIKE CONCAT('%', tag, '%')
This query matches any tag that contains "berry", for example "strawberry", if you have it in tags:
db.collection.aggregate(
[
{$unwind: "$tags"},
{$match : {tags: /.*berry.*/i }}
]
)
Tested output:
{
"result" : [
{
"_id" : ObjectId("537373c17c3639c32fe515fb"),
"id" : "123",
"tags" : "berry"
},
{
"_id" : ObjectId("537375337c3639c32fe515fe"),
"id" : "789",
"tags" : "strawberry"
}
],
"ok" : 1
}
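In terms of PHP:
$regex = new MongoRegex('/.*berry.*/i');
$result = $mongodb->data->aggregate(array(
    array(
        // single quotes, so PHP does not interpolate $tags as a variable
        '$unwind' => '$tags',
    ),
    array(
        '$match' => array(
            'tags' => $regex
        ),
    ),
));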
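Note that the pipeline above tests whether a tag contains "berry". For the direction described in the question (a tag contained in the search term), one possible approach is to enumerate the substrings of the search term in the application and pass them to `$in`, which may be able to use an index on `tags`. What follows is only a minimal JavaScript sketch of that idea: the helper names `substrings` and `matchesAnyTag` are hypothetical, the array stands in for the collection, the minimum length of 3 is an arbitrary assumption, and the comparison is case-sensitive.

```javascript
// Hypothetical helper: every distinct substring of `term` with at least `minLen` characters.
function substrings(term, minLen = 1) {
  const out = new Set();
  for (let start = 0; start < term.length; start++) {
    for (let end = start + minLen; end <= term.length; end++) {
      out.add(term.slice(start, end));
    }
  }
  return [...out];
}

// In-memory stand-in for the collection from the question.
const docs = [
  { id: "123", tags: ["berry", "apple"] },
  { id: "456", tags: ["salad", "tomatoe"] },
  { id: "789", tags: ["bread", "rice"] },
];

// Candidate tags: all substrings of the search term (minimum length 3 is an assumption).
const candidates = substrings("Strawberry", 3);

// The filter that could be sent to MongoDB.
const filter = { tags: { $in: candidates } };

// Local emulation of the $in match, only to show which documents would qualify.
function matchesAnyTag(doc, values) {
  return doc.tags.some((tag) => values.includes(tag));
}

const hits = docs.filter((doc) => matchesAnyTag(doc, filter.tags.$in));
console.log(hits.map((doc) => doc.id));
```

The number of substrings grows quadratically with the length of the search term, so this is only reasonable for short terms.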
Related
I need help with MongoDb with PHP driver.
I have 4 collections:
order_aproved :
{
"order_id" : mongoId ,
"user_id":num ,
"order_date":mongoDate ,
"requset" : string
}
orders_rejected :
{
"order_id" : mongoId,
"user_id" : num ,
"order_date" : mongoDate ,
"requset" : string
}
users :
{
"user_id" : mongoId,
"username" : num ,
"last_order" : mongoDate ,
"num_orders" : num,
"last_order"
}
orders_log :
{
"order_id" : mongoId ,
"order_date" : mongoDate ,
"status" : boolean ,
"user_id" : num
}
For every approved/rejected order, I update num_orders on the document of the user who has the new approved/rejected order, so that number is always changing, and I log the order in orders_log.
I need to fetch all approved/rejected orders from orders_log for a list of users [array] with a condition, and for each order also get num_orders and the last order date of the user who placed it.
I am doing it like this:
$cursor = $orders->find()->sort(array("order_date" => -1))->limit(15);
$array = iterator_to_array($cursor,false);
$users_for_aproved = ["123","124","125"];
$users_for_rejcted = ["112","113","114"];
$js = "function() { if ( this.requset ) { return this.requset.length > 0 } }";
$query = array( '$and' => array(
    array("user_id" => array('$in'=> $users_for_aproved)),
    array('$where' => $js )
));
$query1 = array( '$and' => array(
    array("user_id" => array('$in'=> $users_for_rejcted)),
    array('$where' => $js )
));
$query_or = array('$or' => array($query, $query1));
$cursor = $orders_log->find($query_or)->sort(array("order_date" => -1))->limit(15);
$array = iterator_to_array($cursor,false);
for ( $x=0; $x < count($array) ; $x++ ) {
    // one extra query per log entry, to look up the order count of its user
    $query = array( "user_id" => $array[$x]["user_id"] );
    $cursor = $orders->find($query)->limit(1);
    $user = iterator_to_array($cursor,false);
    $order_count = $user[0]["num_orders"];
    $array[$x]["order_count"] = $order_count;
}
return $array;
It works, but it's not very efficient. I need a way to fetch data from another collection and add num_orders to the documents I have found without issuing another query,
like an SQL JOIN but with MongoDB and the PHP driver.
Thanks!
There are two ways to gain performance in this case:
1. Create an index in MongoDB:
db.order_aproved.createIndex( { user_id: 1 } )
You can create the index either as above, or in the background:
db.order_aproved.createIndex( { user_id: 1 }, { background: true } )
In the latter case the build is slower, but it does not block operations currently running on the database. If you can afford it, I think you are better off creating the index in the foreground, especially if you are not running this script on the production database.
2. Redesign the collections so that, instead of separate collections joined by some ID, you embed the related documents inside the main document, which eliminates the need for any JOIN-like operations as in an RDBMS.
Of the two, the first seems to me the simpler and more straightforward solution in your case. Choosing it also avoids the performance cost of updating embedded documents.
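Whichever option is chosen, the loop in the question issues one extra query per log entry. One way this might be reduced is to collect the user ids first, fetch all matching users in a single `$in` lookup, and merge the results in the application. The sketch below only illustrates that merge in plain JavaScript: the arrays stand in for the `orders_log` and `users` collections, and the sample values are made up.

```javascript
// In-memory stand-ins for the `orders_log` and `users` collections (sample data is made up).
const ordersLog = [
  { order_id: "a1", user_id: 123, status: true },
  { order_id: "a2", user_id: 112, status: false },
  { order_id: "a3", user_id: 123, status: true },
];
const users = [
  { user_id: 123, num_orders: 7 },
  { user_id: 112, num_orders: 2 },
];

// Step 1: collect the distinct user ids referenced by the fetched log entries.
const userIds = [...new Set(ordersLog.map((o) => o.user_id))];

// Step 2: one lookup for all of them. Against MongoDB this would be a single
// find({ user_id: { $in: userIds } }) instead of one query per log entry.
const matchedUsers = users.filter((u) => userIds.includes(u.user_id));

// Step 3: index the users by id and attach num_orders to each log entry.
const byId = new Map(matchedUsers.map((u) => [u.user_id, u]));
const enriched = ordersLog.map((o) => ({
  ...o,
  order_count: byId.has(o.user_id) ? byId.get(o.user_id).num_orders : null,
}));

console.log(enriched);
```

This keeps the number of queries at two regardless of how many log entries are returned.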
I have made the following attempt to query documents that have a department value filled in:
$collection = $this->mongo_db->db->selectCollection('widget');
$result = $collection->find(
array("department"=>array('$ne' => null),"department"=> array('$ne' => ""))
)->sort(['department'=>1]);
return iterator_to_array($result);
But this is still returning documents that look like this:
{
"_id" : ObjectId("5824b9376b6347a422aae017"),
"widgetnum" : "1840023",
"last_assigned" : "missing"
}
I thought the
"department"=>array('$ne' => null)
would have filtered this out.
Any suggestions?
The $exists operator is a good fit here.
https://docs.mongodb.com/manual/reference/operator/query/exists/
The query should look something like this (select all documents where department exists and is not equal to ""):
$collection = $this->mongo_db->db->selectCollection('widget');
$result = $collection->find(
    array( "department" => array('$exists' => true, '$nin' => array("")) )
)->sort(['department'=>1]);
return iterator_to_array($result);
Hope this helps.
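A likely reason the original attempt did not filter anything is that the array literal repeats the "department" key, and in PHP a later entry with the same key overwrites the earlier one, so only the `$ne ""` condition reaches the server. The JavaScript sketch below illustrates both points under stated assumptions: the object literal only mimics the PHP array, `hasDepartment` is a hypothetical local emulation of the `$exists`/`$nin` filter, and the second and third widgets are made-up sample data.

```javascript
// A repeated key keeps only the last value, in a JavaScript object literal
// just as in a PHP array literal.
const original = { department: { $ne: null }, department: { $ne: "" } };
console.log(original); // only the { $ne: "" } condition survives

// Local emulation of { department: { $exists: true, $nin: [""] } }.
function hasDepartment(doc) {
  const exists = Object.prototype.hasOwnProperty.call(doc, "department");
  return exists && doc.department !== "";
}

const widgets = [
  { widgetnum: "1840023", last_assigned: "missing" }, // no department field
  { widgetnum: "1840024", department: "" },           // empty department
  { widgetnum: "1840025", department: "sales" },      // filled in
];

const withDepartment = widgets.filter(hasDepartment);
console.log(withDepartment.map((w) => w.widgetnum));
```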
Below is my code for building an index with Elasticsearch. The index is generated successfully. I am using it to generate autosuggestions based on movie name, actor name and genre.
Now my requirement is to match a substring against a particular field. This works fine if I use $params['body']['query']['wildcard']['field'] = '*sub_word*'; (i.e. a search for 'to' gives 'tom kruz', but a search for 'tom kr' returns no result).
This matches only a single word in the string. I want to match a substring that spans multiple words (i.e. 'tom kr' should return 'tom kruz').
I found a few docs saying this is possible using 'ngram'.
But I don't know how to implement it in my code, as I am using array-based configuration for Elasticsearch and all the supporting docs show the configuration in JSON format.
Please help.
require 'vendor/autoload.php';
$client = \Elasticsearch\ClientBuilder::create()
->setHosts(['http://localhost:9200'])->build();
/*************Index a document****************/
$params = ['body' => []];
$j = 1;
for ($i = 1; $i <= 100; $i++) {
$params['body'][] = [
'index' => [
'_index' => 'pvrmod',
'_type' => 'movie',
'_id' => $i
]
];
if ($i % 10 == 0)
$j++;
$params['body'][] = [
'title' => 'salaman khaan'.$j,
'desc' => 'salaman khaan description'.$j,
'gener' => 'movie gener'.$j,
'language' => 'movie language'.$j,
'year' => 'movie year'.$j,
'actor' => 'movie actor'.$j,
];
// Every 10 documents stop and send the bulk request
if ($i % 10 == 0) {
$responses = $client->bulk($params);
// erase the old bulk request
$params = ['body' => []];
unset($responses);
}
}
// Send the last batch if it exists
if (!empty($params['body'])) {
$responses = $client->bulk($params);
}
The problem here lies in the fact that Elasticsearch builds an inverted index. Assuming you use the standard analyser, the sentence "tom kruz is a top gun" gets split into 6 tokens: tom - kruz - is - a - top - gun. These tokens get assigned to the document (with some metadata about their position, but let's leave that aside for now).
If you want to make a partial match, you can, but only on the separate tokens, not across token boundaries as you would like. The suggestion of splitting your search string and building a wildcard query out of the pieces is an option.
Another option would indeed be using an ngram or edge_ngram token filter. What that would do (at index time) is create those partial tokens (like t - to - tom - ... - k - kr - kru - kruz - ...) in advance, so you can just put 'tom kr' in your (match) search and it would match. Be careful though: this will bloat your index (as you can see, it will store A LOT more tokens), and you need custom analysers and probably quite a bit of knowledge about your mappings.
In general, the (edge_)ngram route is a good idea only for things like autocomplete, not for just any text field in your index. There are a few ways to get around your problem, but most involve building separate features to detect misspelled words and trying to suggest the right terms for them.
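To make the edge n-gram idea more concrete, here is a small JavaScript sketch of what happens conceptually at index time and at search time. It is not Elasticsearch configuration: `tokenize`, `edgeNgrams` and `matches` are hypothetical helpers, the whitespace tokenizer is a simplification of the standard analyser, and requiring every query token to match is an assumption made for this illustration.

```javascript
// Split on whitespace and lowercase: a rough simplification of a standard analyser.
function tokenize(text) {
  return text.toLowerCase().split(/\s+/).filter(Boolean);
}

// Edge n-grams of one token: its prefixes from minGram up to maxGram characters.
function edgeNgrams(token, minGram = 1, maxGram = 10) {
  const grams = [];
  for (let len = minGram; len <= Math.min(maxGram, token.length); len++) {
    grams.push(token.slice(0, len));
  }
  return grams;
}

// "Index time": store every edge n-gram of every token of the document.
const indexed = new Set(
  tokenize("tom kruz is a top gun").flatMap((t) => edgeNgrams(t))
);

// "Search time": the query is only tokenized; here every query token must be indexed.
function matches(query) {
  return tokenize(query).every((t) => indexed.has(t));
}

console.log(matches("tom kr")); // "tom" and "kr" are both stored prefixes
console.log(matches("om kr"));  // "om" is not a prefix of any token
```

The size of `indexed` compared with the six original tokens also shows why this approach bloats the index.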
Try to create this JSON:
{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "should": [
            {
              "wildcard": {
                "field": {
                  "value": "tom*",
                  "boost": 1
                }
              }
            },
            {
              "wildcard": {
                "field": {
                  "value": "kr*",
                  "boost": 1
                }
              }
            }
          ]
        }
      }
    }
  }
}
You can explode your search term
$searchTerms = explode(' ', 'tom kruz');
And then create the wildcard for each one
foreach($searchTerms as $searchTerm) {
//create the new array
}
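The body of that loop could build one wildcard clause per term. The JavaScript sketch below shows the shape of the resulting structure, which mirrors the JSON above; `buildWildcardQuery` is a hypothetical helper, and the field name "title" is an assumption taken from the documents indexed in the question.

```javascript
// Build one wildcard clause per whitespace-separated search term.
// The default field name "title" is an assumption; pass another field as needed.
function buildWildcardQuery(search, field = "title") {
  const clauses = search
    .split(" ")
    .filter((term) => term.length > 0)
    .map((term) => ({
      wildcard: { [field]: { value: term.toLowerCase() + "*", boost: 1 } },
    }));
  // Same nesting as the JSON above: query > filtered > query > bool > should.
  return { query: { filtered: { query: { bool: { should: clauses } } } } };
}

console.log(JSON.stringify(buildWildcardQuery("tom kr"), null, 2));
```

In PHP the same structure would be a nested array assigned to `$params['body']`.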
I have a mongodb database which contains two connected collections.
The first one has a dataset which looks like this:
{
"_id": ObjectId("5326d2a61db62d7d2f8c13c0"),
"reporttype": "visits",
"country": "AT",
"channel": "wifi",
"_level": NumberInt(3)
}
The ObjectId is connected to several datasets in the second collection which look like this:
{
"_id": ObjectId("54c905662d0a99627efe17a9"),
"avg": NumberInt(0),
"chunk_begin": ISODate("2015-01-28T12:00:00.0Z"),
"count": NumberInt(15),
"max": NumberInt(0),
"min": NumberInt(0),
"sum": NumberInt(0),
"tag": ObjectId("5326d2a61db62d7d2f8c13c0")
}
As you can see, the "_id" from the first dataset is the same as the "tag" from the second.
I want to write a routine in PHP which gets the ids from the first collection and uses them to find datasets within a certain timeframe in the second collection for deletion.
I get the id from the first collection fine, but I suspect I am using it wrongly in the query for the second collection, because nothing is ever found or deleted.
Code looks like this:
// select a collection (analog to a relational database's table)
$tagCollection = $db->tags;
$orderCollection = $db->orders;
// formulate AND query
$aTagCriteria = array(
'reporttype' => new MongoRegex('/[a-z]+/'),
);
// retrieve only _id keys
$fields = array('_id');
$cursor = $tagCollection->find($aTagCriteria, $fields);
$startOfTimeperiod = new MongoDate(strtotime('2015-01-05 00:00:00'));
$endOfTimeperiod = new MongoDate(strtotime('2015-01-07 13:20:00'));
// iterate through the result set
foreach ($cursor as $obj) {
echo '_id: '.$obj['_id'].' | ';
// Until here all is ok, I get the _id as output.
$aOrdercriteria = array(
'tag' => new MongoId($obj['_id']),
'date' => array(
'$lte' => $endOfTimeperiod,
'$gte' => $startOfTimeperiod
),
);
$iCount = $orderCollection->count($aOrdercriteria);
if ($iCount > 0) {
echo PHP_EOL.$iCount.' document(s) found.'.PHP_EOL;
$result = $orderCollection->remove($aOrdercriteria);
echo __FUNCTION__.'|'.__LINE__.' | '.json_encode($result).PHP_EOL;
echo 'Removed document with ID: '.$aOrdercriteria['tag'].PHP_EOL;
}
}
What is the correct way for the search condition so it looks for tag Objects
with the previously found id?
PS:
I tried
'tag' => $obj['_id'],
instead of
'tag' => new MongoId($obj['_id']),
which didn't work either.
So two things had to be changed.
The first one was like EmptyArsenal hinted:
'tag' => new MongoId($obj['_id']),
is wrong since $obj['_id'] is already an object.
So
'tag' => $obj['_id'],
is correct.
The second: once I changed my condition from "date" to "chunk_begin", yahoo, it works. Stupid me.
I want to find documents where the last element in an array equals some value.
Array elements may be accessed by specific array position:
// i.e. comments[0].by == "Abe"
db.example.find( { "comments.0.by" : "Abe" } )
but how do I search using the last item as the criterion?
i.e.
db.example.find( { "comments.last.by" : "Abe" } )
By the way, I'm using PHP.
I know this question is old, but I found it on google after answering a similar new question. So I thought this deserved the same treatment.
You can avoid the performance hit of $where by using aggregate instead:
db.example.aggregate([
// Use an index to narrow down the documents, which $where cannot do
{$match: { "comments.by": "Abe" }},
// De-normalize the Array
{$unwind: "$comments"},
// The order of the array is maintained, so $last yields the final element per _id
{$group: { _id: "$_id", comments: {$last: "$comments"} }},
// Match only where that last comment is by "Abe"
{$match: { "comments.by": "Abe" }},
// Retain the original _id order
{$sort: { _id: 1 }}
])
And that should run rings around $where since we were able to narrow down the documents that had a comment by "Abe" in the first place. As warned, $where is going to test every document in the collection and never use an index even if one is there to be used.
Of course, you can also maintain the original document using the technique described here as well, so everything would work just like a find().
Just food for thought for anyone finding this.
Update for Modern MongoDB releases
Modern releases have added the $redact pipeline expression as well as $arrayElemAt ( the latter as of 3.2, so that would be the minimal version here ) which in combination would allow a logical expression to inspect the last element of an array without processing an $unwind stage:
db.example.aggregate([
{ "$match": { "comments.by": "Abe" }},
{ "$redact": {
"$cond": {
"if": {
"$eq": [
{ "$arrayElemAt": [ "$comments.by", -1 ] },
"Abe"
]
},
"then": "$$KEEP",
"else": "$$PRUNE"
}
}}
])
The logic here is in the comparison: "$comments.by" resolves to an array of just the values of the "by" property, and $arrayElemAt with index -1 takes the last element of that array. This allows comparison of that single value against the required parameter, "Abe".
Or even a bit more modern using $expr for MongoDB 3.6 and greater:
db.example.find({
"comments.by": "Abe",
"$expr": {
"$eq": [
{ "$arrayElemAt": [ "$comments.by", -1 ] },
"Abe"
]
}
})
This would be by far the most performant solution for matching the last element within an array, and it is actually expected to supersede the usage of $where in most cases, especially here.
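For readers who want to see what the `$expr` condition evaluates, here is a small local emulation in plain JavaScript. It only illustrates the logic and does not talk to MongoDB; `lastCommentBy` is a hypothetical helper and the sample documents are made up.

```javascript
// Local emulation of the $expr condition:
// { $eq: [ { $arrayElemAt: [ "$comments.by", -1 ] }, "Abe" ] }
function lastCommentBy(doc, author) {
  // "$comments.by": the array of just the "by" values
  const authors = (doc.comments || []).map((c) => c.by);
  // index -1: the last element, if there is one
  return authors.length > 0 && authors[authors.length - 1] === author;
}

const examples = [
  { _id: 1, comments: [{ by: "Abe" }, { by: "Bob" }] }, // Abe commented, but not last
  { _id: 2, comments: [{ by: "Bob" }, { by: "Abe" }] }, // Abe commented last
  { _id: 3, comments: [] },                             // no comments at all
];

const matched = examples.filter((d) => lastCommentBy(d, "Abe")).map((d) => d._id);
console.log(matched);
```

Document 1 shows why the plain "comments.by" condition alone is not enough: it would match, even though the last comment is by someone else.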
You can't do this in one go with this schema design. You can either store the length and do two queries, or store the last comment additionally in another field:
{
'_id': 'foo',
'comments': [
{ 'value': 'comment #1', 'by': 'Ford' },
{ 'value': 'comment #2', 'by': 'Arthur' },
{ 'value': 'comment #3', 'by': 'Zaphod' }
],
'last_comment': {
'value': 'comment #3', 'by': 'Zaphod'
}
}
Sure, you'll be duplicating some data, but at least you can set this data with $set together with the $push for the comment.
$comment = array(
'value' => 'comment #3',
'by' => 'Zaphod',
);
$collection->update(
array( '_id' => 'foo' ),
array(
'$set' => array( 'last_comment' => $comment ),
'$push' => array( 'comments' => $comment )
)
);
Finding the last one is easy now!
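As a rough illustration of what the combined update does to a document, the following JavaScript sketch applies the same two changes to a plain object. `addComment` is a hypothetical helper that imitates `$push` plus `$set` locally; it does not use a MongoDB driver.

```javascript
// Apply the same change the combined $set/$push update makes, on a plain object.
function addComment(doc, comment) {
  return {
    ...doc,
    comments: [...(doc.comments || []), comment], // what $push does
    last_comment: comment,                        // what $set does
  };
}

let doc = { _id: "foo", comments: [] };
doc = addComment(doc, { value: "comment #1", by: "Ford" });
doc = addComment(doc, { value: "comment #3", by: "Zaphod" });

// The "last comment by X" check is now a plain equality on one field,
// i.e. a filter such as { "last_comment.by": "Zaphod" }.
console.log(doc.last_comment.by === "Zaphod");
```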
You could do this with a $where operator:
db.example.find({ $where:
'this.comments.length && this.comments[this.comments.length-1].by === "Abe"'
})
The usual slow performance caveats for $where apply. However, you can help with this by including "comments.by": "Abe" in your query:
db.example.find({
"comments.by": "Abe",
$where: 'this.comments.length && this.comments[this.comments.length-1].by === "Abe"'
})
This way, the $where only needs to be evaluated against documents that include comments by Abe and the new term would be able to use an index on "comments.by".
I'm just doing :
db.products.find({'statusHistory.status':'AVAILABLE'},{'statusHistory': {$slice: -1}})
This gets me products for which the last statusHistory item in the array, contains the property status='AVAILABLE' .
I am not sure why my answer above is deleted. I am reposting it. I am pretty sure without changing the schema, you should be able to do it this way.
db.example.find({ "comments:{$slice:-1}.by" : "Abe" })
// ... or
db.example.find({ "comments.by" : "Abe" })
This by default takes the last element in the array.