MongoCursorTimeoutException with sort on _id - php

I have a Mongo collection containing ~7 million events. To get the events that happened for an aggregate, I have the following PHP code:
$client = new MongoClient();
$db = $client->selectDB('db_name');
$collection = $db->selectCollection('events');
foreach ($collection->find([
    'headers.for' => '89d115f8-0b2f-470e-9495-2a07d9dfb942',
])->sort([
    'headers.occurredOn' => 1,
    '_id' => 1,
]) as $event) {
    var_dump($event);
}
When I run the above PHP code, I get a MongoCursorTimeoutException after 30 seconds.
But when I run the same code without the sort on _id:
$client = new MongoClient();
$db = $client->selectDB('db_name');
$collection = $db->selectCollection('events');
foreach ($collection->find([
    'headers.for' => '89d115f8-0b2f-470e-9495-2a07d9dfb942',
])->sort([
    'headers.occurredOn' => 1,
]) as $event) {
    var_dump($event);
}
The error does not occur, and I get the result instantly (a single record).
So why does a MongoCursorTimeoutException occur when a sort on _id is added?
The indexes for the collection look as follows:
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "db.events"
},
{
"v" : 1,
"key" : {
"headers.occurredOn" : NumberLong(1),
"_id" : NumberLong(1)
},
"name" : "headers_occurredOn_1__id_1",
"ns" : "db.events"
},
{
"v" : 1,
"key" : {
"headers.for" : NumberLong(1)
},
"name" : "headers_for_1",
"ns" : "db.events"
}
]

Well, the problem was one of the indexes I had:
{
"v" : 1,
"key" : {
"headers.occurredOn" : NumberLong(1),
"_id" : NumberLong(1)
},
"name" : "headers_occurredOn_1__id_1",
"ns" : "db.events"
}
After I dropped this one and added the following:
{
"v" : 1,
"key" : {
"headers.occurredOn" : NumberLong(1)
},
"name" : "headers.occurredOn_1",
"ns" : "db.events"
}
Everything went smoothly.

Related

Trying to make an trending page with mongo and php, need some thoughts about how to update documents and query them

We have a big database. We collect newsletters and I want to make a trending page. The goal is to make the page realtime and fast! We want to display trending newsletters from the past 2 hours, 4 hours, 24 hours, past week, and past month.
I've worked with MongoDB for a while and I try to keep things simple. I want a new collection, trending, that stores the visitors of the newsletter pages in a time bucket. On every visit, I want to add the information of the newsletter to the object that holds the trending newsletters for that time and $inc the hits field for statistics.
My documents look like this:
{
"_id" : ObjectId("5d4b4ca5a6bba5f7ffb23b39"),
"bucket" : "last2hours",
"language" : "nl",
"time" : "2019-08-08_00",
"newsletters" : {
"5d4b29ba8ddf870fe15628c7" : {
"_id" : ObjectId("5d4b29ba8ddf870fe15628c7"),
"_slug" : "nieuwsbrief-dalstra-reizen-touring-december-2015",
"subject" : "Nieuwsbrief Dalstra Reizen Touring december 2015",
"date" : ISODate("2015-12-04T13:15:03.000+0000"),
"publisher" : {
"_id" : ObjectId("557ebcc54c79597761fd71c2"),
"_slug" : "dalstra-nl",
"name" : "dalstra.nl",
"taal" : "nl"
},
"hits" : NumberInt(1)
},
"5d4b29af8ddf870fe15624ba" : {
"_id" : ObjectId("5d4b29af8ddf870fe15624ba"),
"_slug" : "the-carolina-weddings-show",
"subject" : "The Carolina Weddings Show",
"date" : ISODate("2015-12-04T13:13:54.000+0000"),
"publisher" : {
"_id" : ObjectId("503b950fffa67e2c790007d7"),
"_slug" : "livingsocialcom",
"name" : "Livingsocial.com",
"taal" : "nl"
},
"hits" : NumberInt(1)
},
"5d4b29ad8ddf870fe15623f4" : {
"_id" : ObjectId("5d4b29ad8ddf870fe15623f4"),
"_slug" : "newport-gangster-tour",
"subject" : "Newport Gangster Tour",
"date" : ISODate("2015-12-04T13:13:22.000+0000"),
"publisher" : {
"_id" : ObjectId("503b950fffa67e2c790007d7"),
"_slug" : "livingsocialcom",
"name" : "Livingsocial.com",
"taal" : "nl"
},
"hits" : NumberInt(1)
},
"5d4b29bb8ddf870fe15628f3" : {
"_id" : ObjectId("5d4b29bb8ddf870fe15628f3"),
"_slug" : "springwise-daily-shoe-insoles-control-devices-through-kicking-and-more",
"subject" : "Springwise Daily | Shoe insoles control devices through kicking, and more.",
"date" : ISODate("2015-12-04T13:15:05.000+0000"),
"publisher" : {
"_id" : ObjectId("5581f0b54c7959e82bfd71c2"),
"_slug" : "springwise-com",
"name" : "springwise.com",
"taal" : "nl"
},
"hits" : NumberInt(2)
}
}
}
{
"_id" : ObjectId("5d4b4ca5a6bba5f7ffb23b3b"),
"bucket" : "last2hours",
"language" : "nl",
"time" : "2019-08-08_01",
"newsletters" : {
"5d4b29ba8ddf870fe15628c7" : {
"_id" : ObjectId("5d4b29ba8ddf870fe15628c7"),
"_slug" : "nieuwsbrief-dalstra-reizen-touring-december-2015",
"subject" : "Nieuwsbrief Dalstra Reizen Touring december 2015",
"date" : ISODate("2015-12-04T13:15:03.000+0000"),
"publisher" : {
"_id" : ObjectId("557ebcc54c79597761fd71c2"),
"_slug" : "dalstra-nl",
"name" : "dalstra.nl",
"taal" : "nl"
},
"hits" : NumberInt(1)
},
"5d4b29af8ddf870fe15624ba" : {
"_id" : ObjectId("5d4b29af8ddf870fe15624ba"),
"_slug" : "the-carolina-weddings-show",
"subject" : "The Carolina Weddings Show",
"date" : ISODate("2015-12-04T13:13:54.000+0000"),
"publisher" : {
"_id" : ObjectId("503b950fffa67e2c790007d7"),
"_slug" : "livingsocialcom",
"name" : "Livingsocial.com",
"taal" : "nl"
},
"hits" : NumberInt(1)
},
"5d4b29ad8ddf870fe15623f4" : {
"_id" : ObjectId("5d4b29ad8ddf870fe15623f4"),
"_slug" : "newport-gangster-tour",
"subject" : "Newport Gangster Tour",
"date" : ISODate("2015-12-04T13:13:22.000+0000"),
"publisher" : {
"_id" : ObjectId("503b950fffa67e2c790007d7"),
"_slug" : "livingsocialcom",
"name" : "Livingsocial.com",
"taal" : "nl"
},
"hits" : NumberInt(1)
},
"5d4b29bb8ddf870fe15628f3" : {
"_id" : ObjectId("5d4b29bb8ddf870fe15628f3"),
"_slug" : "springwise-daily-shoe-insoles-control-devices-through-kicking-and-more",
"subject" : "Springwise Daily | Shoe insoles control devices through kicking, and more.",
"date" : ISODate("2015-12-04T13:15:05.000+0000"),
"publisher" : {
"_id" : ObjectId("5581f0b54c7959e82bfd71c2"),
"_slug" : "springwise-com",
"name" : "springwise.com",
"taal" : "nl"
},
"hits" : NumberInt(2)
}
}
}
{
"_id" : ObjectId("5d4b4ca5a6bba5f7ffb23b3d"),
"bucket" : "last4hours",
"language" : "nl",
"time" : "2019-08-08_00",
"newsletters" : {
"5d4b29ba8ddf870fe15628c7" : {
"_id" : ObjectId("5d4b29ba8ddf870fe15628c7"),
"_slug" : "nieuwsbrief-dalstra-reizen-touring-december-2015",
"subject" : "Nieuwsbrief Dalstra Reizen Touring december 2015",
"date" : ISODate("2015-12-04T13:15:03.000+0000"),
"publisher" : {
"_id" : ObjectId("557ebcc54c79597761fd71c2"),
"_slug" : "dalstra-nl",
"name" : "dalstra.nl",
"taal" : "nl"
},
"hits" : NumberInt(1)
},
"5d4b29af8ddf870fe15624ba" : {
"_id" : ObjectId("5d4b29af8ddf870fe15624ba"),
"_slug" : "the-carolina-weddings-show",
"subject" : "The Carolina Weddings Show",
"date" : ISODate("2015-12-04T13:13:54.000+0000"),
"publisher" : {
"_id" : ObjectId("503b950fffa67e2c790007d7"),
"_slug" : "livingsocialcom",
"name" : "Livingsocial.com",
"taal" : "nl"
},
"hits" : NumberInt(1)
},
"5d4b29ad8ddf870fe15623f4" : {
"_id" : ObjectId("5d4b29ad8ddf870fe15623f4"),
"_slug" : "newport-gangster-tour",
"subject" : "Newport Gangster Tour",
"date" : ISODate("2015-12-04T13:13:22.000+0000"),
"publisher" : {
"_id" : ObjectId("503b950fffa67e2c790007d7"),
"_slug" : "livingsocialcom",
"name" : "Livingsocial.com",
"taal" : "nl"
},
"hits" : NumberInt(1)
},
"5d4b29bb8ddf870fe15628f3" : {
"_id" : ObjectId("5d4b29bb8ddf870fe15628f3"),
"_slug" : "springwise-daily-shoe-insoles-control-devices-through-kicking-and-more",
"subject" : "Springwise Daily | Shoe insoles control devices through kicking, and more.",
"date" : ISODate("2015-12-04T13:15:05.000+0000"),
"publisher" : {
"_id" : ObjectId("5581f0b54c7959e82bfd71c2"),
"_slug" : "springwise-com",
"name" : "springwise.com",
"taal" : "nl"
},
"hits" : NumberInt(2)
}
}
}
The goal here is to query only on bucket, language and time. So, if I want to see the trending newsletters of the last 2 hours, I query {bucket: 'last2hours', language: 'nl', time: '2019-08-08_00'}, and then I have all the information I need. No aggregation is needed, and this findOne query is fast.
So I made a method to update the trending collection:
public function setNewsletterTrendingStatistics($newsletter){
    // Buckets
    $trend_buckets = array(
        'last2hours' => array('steps' => 2,  'step' => 'hour', 'format' => 'Y-m-d_H'),
        'last4hours' => array('steps' => 4,  'step' => 'hour', 'format' => 'Y-m-d_H'),
        'last1day'   => array('steps' => 24, 'step' => 'hour', 'format' => 'Y-m-d_H'),
        'lastweek'   => array('steps' => 7,  'step' => 'day',  'format' => 'Y-m-d'),
        'lastmonth'  => array('steps' => 31, 'step' => 'day',  'format' => 'Y-m-d'),
    );
    // $newsletter['date']->toDateTime()->format('U')
    $buckets = array();
    foreach ($trend_buckets as $bucket => $settings) {
        for ($i = 0; $i < $settings['steps']; $i++) {
            $buckets[] = array(
                'bucket'   => $bucket,
                'time'     => date($settings['format'], strtotime('+' . $i . ' ' . $settings['step'])),
                'language' => $newsletter['publisher']['taal'],
            );
        }
    }
    // Add the stats to each bucket
    $prefix = 'newsletters.' . (string) $newsletter['_id'];
    foreach ($buckets as $query) {
        $update = array(
            '$set' => array(
                $prefix . '._id'       => $newsletter['_id'],
                $prefix . '._slug'     => $newsletter['_slug'],
                $prefix . '.subject'   => $newsletter['subject'],
                $prefix . '.date'      => $newsletter['date'],
                $prefix . '.publisher' => array(
                    '_id'   => $newsletter['publisher']['_id'],
                    '_slug' => $newsletter['publisher']['_slug'],
                    'name'  => $newsletter['publisher']['name'],
                    'taal'  => $newsletter['publisher']['taal'],
                ),
            ),
            '$inc' => array(
                $prefix . '.hits' => 1,
            ),
        );
        $options = array('upsert' => true);
        $this->FW->mdb->{$this->config['collections']['newsletters_trending']}->updateOne($query, $update, $options);
    }
}
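The bucket computation in the method above can be pulled out into a pure function, which makes it testable without a database. This is a minimal sketch under two assumptions not in the original: the function name trendingBucketKeys() is hypothetical, and bucket times are formatted in UTC (the original uses date() in the server's local timezone). It takes the visit timestamp as a parameter instead of reading the clock.

```php
<?php
// Hypothetical helper: computes the (bucket, time, language) keys that one
// visit at $visitTime contributes to, mirroring $trend_buckets above.
// Assumption: bucket times are formatted in UTC.
function trendingBucketKeys(int $visitTime, string $language): array
{
    $trendBuckets = [
        'last2hours' => ['steps' => 2,  'step' => 'hour', 'format' => 'Y-m-d_H'],
        'last4hours' => ['steps' => 4,  'step' => 'hour', 'format' => 'Y-m-d_H'],
        'last1day'   => ['steps' => 24, 'step' => 'hour', 'format' => 'Y-m-d_H'],
        'lastweek'   => ['steps' => 7,  'step' => 'day',  'format' => 'Y-m-d'],
        'lastmonth'  => ['steps' => 31, 'step' => 'day',  'format' => 'Y-m-d'],
    ];

    // '@<timestamp>' creates a DateTimeImmutable in UTC (+00:00).
    $base = new DateTimeImmutable('@' . $visitTime);
    $keys = [];
    foreach ($trendBuckets as $bucket => $settings) {
        for ($i = 0; $i < $settings['steps']; $i++) {
            // A visit now also counts towards the next (steps - 1) hour/day buckets.
            $time = $base->modify("+{$i} {$settings['step']}");
            $keys[] = [
                'bucket'   => $bucket,
                'time'     => $time->format($settings['format']),
                'language' => $language,
            ];
        }
    }
    return $keys;
}

// A visit on 2019-08-08 at 00:00 UTC for a Dutch newsletter.
$keys = trendingBucketKeys(gmmktime(0, 0, 0, 8, 8, 2019), 'nl');
```

With the five bucket types above, one visit produces 2 + 4 + 24 + 7 + 31 = 68 keys, i.e. 68 upserts per page view, which is worth keeping in mind when weighing this design.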
First of all, is this a good approach? Is there a better one? Second, I want to count unique hits, so I need to save an IP address. I want to count unique hits in the update query so I don't have to count them in the findOne query. What's the best way to achieve this? I know I can use $addToSet to keep a unique array of IP addresses, but then I still need to count those unique IP addresses.
So I ended up doing this:
I made buckets for each trending container (last 2 hours, last 4 hours, today, last week, last month) for every hour.
I fill these containers on every page view with an update query that does $inc by 1.
Every hour a cron job combines these stats: the 2-hour buckets fill the 4-hour ones, the 4-hour buckets fill today, and so on.
This seems like the best approach, and the stats stay live.
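The hourly cron step amounts to summing hits per newsletter across several smaller buckets. A minimal in-memory sketch, assuming the bucket documents have already been loaded as PHP arrays and that the merged result is written back with a separate upsert; rollUpBuckets() is a hypothetical name, and the sample data is abridged from the documents above.

```php
<?php
// Hypothetical sketch of the hourly cron roll-up: merges the "newsletters"
// maps of several smaller bucket documents into one, summing "hits".
function rollUpBuckets(array $bucketDocs): array
{
    $merged = [];
    foreach ($bucketDocs as $doc) {
        foreach ($doc['newsletters'] ?? [] as $id => $entry) {
            if (!isset($merged[$id])) {
                // First time we see this newsletter: copy its metadata, reset hits.
                $merged[$id] = $entry;
                $merged[$id]['hits'] = 0;
            }
            $merged[$id]['hits'] += $entry['hits'];
        }
    }
    // Most-viewed first; uasort() keeps the newsletter id keys.
    uasort($merged, function ($a, $b) {
        return $b['hits'] <=> $a['hits'];
    });
    return $merged;
}

// Two hourly buckets (abridged sample data).
$hourly = [
    ['time' => '2019-08-08_00', 'newsletters' => [
        '5d4b29ba8ddf870fe15628c7' => ['subject' => 'Nieuwsbrief Dalstra Reizen Touring december 2015', 'hits' => 1],
        '5d4b29bb8ddf870fe15628f3' => ['subject' => 'Springwise Daily', 'hits' => 2],
    ]],
    ['time' => '2019-08-08_01', 'newsletters' => [
        '5d4b29ba8ddf870fe15628c7' => ['subject' => 'Nieuwsbrief Dalstra Reizen Touring december 2015', 'hits' => 3],
    ]],
];
$rolled = rollUpBuckets($hourly);
```

Sorting during the roll-up means the page's findOne result can be rendered in order without extra work at read time.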

FOSElastica nested query

My documents look like this:
{
"_index" : "test_index",
"_type" : "test_type",
"_id" : "10000",
"_score" : 1.0,
"_source" : {
"user_id" : 12,
"index_date" : {
"date" : "2018-02-06 14:25:49.816952",
"timezone_type" : 3,
"timezone" : "UTC"
},
"rating" : null,
"orders" : [
{
"hour" : "08",
"count" : 1
},
{
"hour" : "10",
"count" : 1
}
],
"products" : [
{
"p_id" : 970111,
"count" : 4
},
{
"p_id" : 1280811,
"count" : 1
}
]
}
}
and I tried to access {"hour":"10"}.
My query is:
$query = new Query\Nested();
$query->setPath('orders');
$term = new Term();
$term->setTerm('orders.hour', $order->getCreatedAt()->format('H'));
$query->setQuery($term);
dump($finder->find($query));die;
but I got the following error:
[Elastica\Exception\ResponseException]
failed to create query: {
"nested" : {
"query" : {
"term" : {
"orders.hour" : {
"value" : "12",
"boost" : 1.0
}
}
},
"path" : "orders",
"ignore_unmapped" : false,
"score_mode" : "avg",
"boost" : 1.0
}
} [index: test_index] [reason: all shards failed]
Your documents do not look like they are set up for nested queries: a nested query only works when orders is mapped as the nested type.
I assume that $finder is a repository for your orders; your code should look something like this:
$finder = $this->get('fos_elastica.repository_manager')->getRepository('YourBundle:order');
$boolquery = new Query\BoolQuery();
$term = new Query\Term();
$term->setTerm('hour', $order->getCreatedAt()->format('H'));
$boolquery->addMust($term);
$finder->find($boolquery);

How to use $push in mongodb?

I am trying to push values to an existing record using $push, but I am getting an error stating:
Invalid modifier specified: $push
I am using PHP. Below is my code:
$collection->update(array('_id'=>3,"data._id"=>2),array('$push'=>array('userid','52')));
i.e. adding 52 to userid in the record with _id 3, in the data element with _id 2.
Below is my collection structure in MongoDB:
{ "_id" : 2,
"name" : "test",
"data" :[{"_id" : "1",
"file" : "nic",
"userid" : [1,2 ]
},
{"_id" : "2",
"file" : "nic1",
"userid" : [1 ]
},
{"_id" : 3,
"file" : "nick2",
"userid" : [1,2 ]
}
]},
{ "_id" : 3,
"name" : "test",
"data" : [{"_id" : "1",
"file" : "nic",
"userid" : [1,2 ]
},
{"_id" : "2",
"file" : "nic1",
"userid" : [3,2 ]
}
]}
Use the $ positional operator in your update that identifies an element in the data array to update without explicitly specifying the position of the element:
$collection->update(
    // data._id is stored as the string "2" in this document, so match '2', not 2
    array('_id' => 3, 'data._id' => '2'),
    array('$push' =>
        array('data.$.userid' => 52)
    )
);
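To illustrate which array element 'data.$.userid' targets, here is a small in-memory emulation in plain PHP. positionalPush() is a hypothetical helper, not part of the MongoDB driver, and it only approximates the server's matching rules.

```php
<?php
// Hypothetical in-memory emulation of
//   update({_id: 3, "data._id": X}, {$push: {"data.$.userid": 52}})
// used only to illustrate which array element the positional $ targets.
// Approximation: === mirrors MongoDB's string-vs-number distinction
// ("2" does not match 2), but unlike MongoDB it also treats 2 and 2.0 as different.
function positionalPush(array $doc, $dataId, $value): array
{
    foreach ($doc['data'] as $i => $element) {
        if ($element['_id'] === $dataId) {
            // $ refers to the FIRST element that matched the query condition.
            $doc['data'][$i]['userid'][] = $value;
            return $doc;
        }
    }
    return $doc; // no element matched, so the update changes nothing
}

$doc = ['_id' => 3, 'name' => 'test', 'data' => [
    ['_id' => '1', 'file' => 'nic',  'userid' => [1, 2]],
    ['_id' => '2', 'file' => 'nic1', 'userid' => [3, 2]],
]];

$updated   = positionalPush($doc, '2', 52); // data[1].userid becomes [3, 2, 52]
$unchanged = positionalPush($doc, 2, 52);   // integer 2 matches nothing
```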

MongoDb return only one element from the array

So my collection looks like this:
{
"_id" : ObjectId("52722429d874590c15000029"),
"name" : "Bags",
"products" : [{
"_id" : ObjectId("527225b5d87459b802000029"),
"name" : "Prada",
"description" : "Prada Bag",
"points" : "234",
"validDate" : 1382562000,
"link" : "dasdad",
"code" : "423423424",
"image" : null
}, {
"_id" : ObjectId("5272307ad87459401a00002a"),
"name" : "Gucci",
"description" : "Gucii bag",
"points" : "2342",
"validDate" : 1383170400,
"link" : "dsadada",
"code" : "2342",
"image" : null
}]
}
I want to get only the product with the _id 527225b5d87459b802000029. I tried this:
$this->find(array(
'_id' => new \MongoId('52722429d874590c15000029'),
'products._id' => new \MongoId('527225b5d87459b802000029')
));
But it returns the entire products array for that document, and I only want one element. Can this be done in MongoDB?
As mentioned in the comments, you have to add a projection, more precisely an $elemMatch. There is no need to use the aggregation framework in this case.
Example:
find( { _id: 1, "products._id": 4 }, { products: { $elemMatch: { _id: 4 } } } ).pretty()
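An $elemMatch projection keeps only the first array element that matches its condition and, like any inclusion projection, drops the other fields except _id. With the legacy PHP driver the projection goes in the second argument of MongoCollection::find(). To show what comes back, here is a plain-PHP emulation; elemMatchProjection() is hypothetical, and it uses plain strings where the real documents hold MongoId objects.

```php
<?php
// Hypothetical emulation of the projection
//   {products: {$elemMatch: {_id: $productId}}}
// Only the first matching element is kept; other fields are dropped,
// except _id, which MongoDB includes by default.
function elemMatchProjection(array $doc, string $productId): array
{
    $result = ['_id' => $doc['_id']];
    foreach ($doc['products'] as $product) {
        if ($product['_id'] === $productId) {
            $result['products'] = [$product];
            break; // $elemMatch returns at most one element
        }
    }
    return $result;
}

// Abridged version of the "Bags" document from the question.
$category = [
    '_id' => '52722429d874590c15000029',
    'name' => 'Bags',
    'products' => [
        ['_id' => '527225b5d87459b802000029', 'name' => 'Prada', 'points' => '234'],
        ['_id' => '5272307ad87459401a00002a', 'name' => 'Gucci', 'points' => '2342'],
    ],
];
$projected = elemMatchProjection($category, '527225b5d87459b802000029');
```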

How to search a string in inner array using mongodb?

How do I search for a value in a multidimensional array?
For example, I want to search for the keyword example in the following data in MongoDB.
I fetch all the data with this command:
>db.info.find()
{
"_id" : ObjectId("4f74737cc3a51043d26f4b90"),
"id" : "12345",
"info" : [
{
"sno" : 1,
"name" : "ABC",
"email" : "abc#example.com"
},
{
"sno" : 2,
"name" : "XYZ",
"email" : "xyz#example.com"
},
{
"sno" : 3,
"name" : "XYZ",
"email" : "xyz#demo.com"
},
{
"sno" : 4,
"name" : "ABC",
"email" : "abc#demo.com"
},
{
"sno" : 5,
"name" : "Rohan",
"email" : "rohan#example.com"
}
]
}
Now, to find the data containing example, I used this command:
>db.info.find({"info.email": /example/})
and it returns:
{
"_id" : ObjectId("4f74737cc3a51043d26f4b90"),
"id" : "12345",
"info" : [
{
"sno" : 1,
"name" : "ABC",
"email" : "abc#example.com"
},
{
"sno" : 2,
"name" : "XYZ",
"email" : "xyz#example.com"
},
{
"sno" : 3,
"name" : "XYZ",
"email" : "xyz#demo.com"
},
{
"sno" : 4,
"name" : "ABC",
"email" : "abc#demo.com"
},
{
"sno" : 5,
"name" : "Rohan",
"email" : "rohan#example.com"
}
]
}
But I want only the 3 matching sub-rows out of 5, like this:
{
"_id" : ObjectId("4f74737cc3a51043d26f4b90"),
"id" : "12345",
"info" : [
{
"sno" : 1,
"name" : "ABC",
"email" : "abc#example.com"
},
{
"sno" : 2,
"name" : "XYZ",
"email" : "xyz#example.com"
},
{
"sno" : 5,
"name" : "Rohan",
"email" : "rohan#example.com"
}
]
}
Rohan, MongoDB always returns the whole document that you are searching on. You can't just make it return the array elements in which your keyword was found. If you want to do that, you need to make sure all embedded documents in the "info" field are in their own collection. That might mean you need to link them back to the original document in your "info" collection, perhaps something like:
{
"sno" : 1,
"name" : "ABC",
"email" : "abc#example.com",
"info_id" : "12345"
}
Alternatively, you can of course do post-processing in PHP to obtain only the rows that you want.
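A minimal sketch of that post-processing, assuming the document has already been fetched (for example with findOne()) as a PHP array; filterInfoRows() is a hypothetical name. array_values() re-indexes the result so it still encodes as a JSON array rather than an object.

```php
<?php
// Hypothetical post-processing step: keep only the "info" rows whose email
// contains $keyword, after the document has been fetched from MongoDB.
function filterInfoRows(array $doc, string $keyword): array
{
    $doc['info'] = array_values(array_filter(
        $doc['info'],
        function ($row) use ($keyword) {
            return strpos($row['email'], $keyword) !== false;
        }
    ));
    return $doc;
}

$doc = [
    'id' => '12345',
    'info' => [
        ['sno' => 1, 'name' => 'ABC',   'email' => 'abc#example.com'],
        ['sno' => 2, 'name' => 'XYZ',   'email' => 'xyz#example.com'],
        ['sno' => 3, 'name' => 'XYZ',   'email' => 'xyz#demo.com'],
        ['sno' => 4, 'name' => 'ABC',   'email' => 'abc#demo.com'],
        ['sno' => 5, 'name' => 'Rohan', 'email' => 'rohan#example.com'],
    ],
];
$filtered = filterInfoRows($doc, 'example'); // keeps sno 1, 2 and 5
```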
Perhaps MongoRegex is a good fit here?
http://php.net/manual/en/class.mongoregex.php
I tried a MapReduce function and it works for this type of problem. The code is something like this:
Write a map function:
map = function () {
    var filter = [];
    this.info.forEach(function (s) {
        if (/example/.test(s.email)) {
            filter.push(s);
        }
    });
    emit(this._id, {info: filter});
}
Write a reduce function:
reduce = function (key, values) { return values; } // not called here: each _id is emitted only once
Run the MapReduce:
res = db.info.mapReduce(map, reduce, {out: {inline: 1}})
And the output looks like this:
"results" : [
{
"_id" : ObjectId("4f9a2de0ea4a65c3ab85a9d3"),
"value" : {
"info" : [
{
"sno" : 1,
"name" : "ABC",
"email" : "abc#example.com"
},
{
"sno" : 2,
"name" : "XYZ",
"email" : "xyz#example.com"
},
{
"sno" : 5,
"name" : "Rohan",
"email" : "rohan#example.com"
}
]
}
}
],
"timeMillis" : 1,
"counts" : {
"input" : 3,
"emit" : 3,
"reduce" : 0,
"output" : 3
},
"ok" : 1,
Now you can get your search results from:
printjson(res.results)
Did you try $ (projection)?
db.info.find({"info.email": /example/}, {"info.$": 1})
Note that the positional $ projection returns only the first matching element of info.
