MongoDB, PHP getting unique visitors per day - php

I'm creating an analytics script using PHP and MongoDB and I am a bit stuck. I would like to get the number of unique visitors per day within a certain time frame.
{
"_id": ObjectId("523768039b7e7a1505000000"),
"ipAddress": "127.0.0.1",
"pageId": ObjectId("522f80f59b7e7a0f2b000000"),
"uniqueVisitorId": "0445905a-4015-4b70-a8ef-b339ab7836f1",
"recordedTime": ISODate("2013-09-16T20:20:19.0Z")
}
The fields to filter on are uniqueVisitorId and recordedTime.
I've created a database object in PHP that opens a database connection when it is constructed; the MongoDB PHP functions are simply mapped to public methods that use that connection.
Anyhow, so far I get the number of visitors per day with:
public function GetUniqueVisitorsDiagram() {
// MAP
$map = new MongoCode('function() {
day = new Date(Date.UTC(this.recordedTime.getFullYear(), this.recordedTime.getMonth(), this.recordedTime.getDate()));
emit({day: day, uniqueVisitorId:this.uniqueVisitorId},{count:1});
}');
// REDUCE
$reduce = new MongoCode("function(key, values) {
var count = 0;
values.forEach(function(v) {
count += v['count'];
});
return {count: count};
}");
// STATS
$stats = $this->database->Command(array(
'mapreduce' => 'statistics',
'map' => $map,
'reduce' => $reduce,
"query" => array(
"recordedTime" =>
array(
'$gte' => $this->startDate,
'$lte' => $this->endDate
)
),
"out" => array(
"inline" => 1
)
));
return $stats;
}
How would I filter this data correctly to get unique visitors? Or would it be better to use aggregation? If so, could you be so kind as to help me out with a code snippet?

The $group operator in the aggregation framework was designed for exactly this use case and will likely be ~10 to 100 times faster. Read up on the group operator here: http://docs.mongodb.org/manual/reference/aggregation/group/
The PHP driver implementation is documented here: http://php.net/manual/en/mongocollection.aggregate.php
You can combine the $group operator with other operators to further limit your aggregations. It's probably best you do some reading up on the framework yourself to better understand what's happening, so I'm not going to post a complete example for you.

$m=new MongoClient();
$db=$m->super_test;
$db->gjgjgjg->insert(array(
"ipAddress" => "127.0.0.1",
"pageId" => new MongoId("522f80f59b7e7a0f2b000000"),
"uniqueVisitorId" => "0445905a-4015-4b70-a8ef-b339ab7836f1",
"recordedTime" => new MongoDate(strtotime("2013-09-16T20:20:19.0Z"))
));
var_dump($db->gjgjgjg->find(array('recordedTime'=>array('$lte'=>new MongoDate(),'$gte'=>new MongoDate(strtotime('-1 week')))))->count()); // Prints 1
$res=$db->gjgjgjg->aggregate(array(
array('$match'=>array('recordedTime'=>array('$lte'=>new MongoDate(),'$gte'=>new MongoDate(strtotime('-1 week'))),'uniqueVisitorId'=>array('$ne'=>null))),
array('$project'=>array('day'=>array('$dayOfMonth'=>'$recordedTime'),'month'=>array('$month'=>'$recordedTime'),'year'=>array('$year'=>'$recordedTime'))),
array('$group'=>array('_id'=>array('day'=>'$day','month'=>'$month','year'=>'$year'), 'c'=>array('$sum'=>1)))
));
var_dump($res['result']);
To answer the question entirely:
$m=new MongoClient();
$db=$m->super_test;
$db->gjgjgjg->insert(array(
"ipAddress" => "127.0.0.1",
"pageId" => new MongoId("522f80f59b7e7a0f2b000000"),
"uniqueVisitorId" => "0445905a-4015-4b70-a8ef-b339ab7836f1",
"recordedTime" => new MongoDate(strtotime("2013-09-16T20:20:19.0Z"))
));
var_dump($db->gjgjgjg->find(array('recordedTime'=>array('$lte'=>new MongoDate(),'$gte'=>new MongoDate(strtotime('-1 week')))))->count()); // Prints 1
$res=$db->gjgjgjg->aggregate(array(
array('$match'=>array('recordedTime'=>array('$lte'=>new MongoDate(),'$gte'=>new MongoDate(strtotime('-1 week'))),'uniqueVisitorId'=>array('$ne'=>null))),
array('$project'=>array('day'=>array('$dayOfMonth'=>'$recordedTime'),'month'=>array('$month'=>'$recordedTime'),'year'=>array('$year'=>'$recordedTime'),'uniqueVisitorId'=>1)), // keep uniqueVisitorId so the next stage can group on it
array('$group'=>array('_id'=>array('day'=>'$day','month'=>'$month','year'=>'$year','v'=>'$uniqueVisitorId'), 'c'=>array('$sum'=>1))),
array('$group'=>array('_id'=>array('day'=>'$_id.day','month'=>'$_id.month','year'=>'$_id.year'),'c'=>array('$sum'=>1)))
));
var_dump($res['result']);
Something close to that is what you're looking for, I believe.
It will return a set of documents that have the date as the _id and the count of unique visitors for that day. The first $group collapses each visitor to one entry per day, regardless of how many visits they made, and the second $group counts those entries.
Since you want it per day, you can exchange the date parts for a single $dayOfYear field, I reckon, as long as the time frame does not span more than one year (otherwise keep $year alongside it).
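The two $group stages can be easier to follow when the same logic is written out in plain PHP. This is only a sketch of the idea, not the aggregation framework itself: the sample rows and the uniqueVisitorsPerDay() helper are made up for illustration, and the timestamps are assumed to be UTC.

```php
<?php
// Hypothetical sample rows: [uniqueVisitorId, recordedTime as a UTC string].
$visits = [
    ['v1', '2013-09-16 08:00:00'],
    ['v1', '2013-09-16 20:20:19'],
    ['v2', '2013-09-16 21:00:00'],
    ['v1', '2013-09-17 09:30:00'],
];

function uniqueVisitorsPerDay(array $visits): array
{
    $utc = new DateTimeZone('UTC');
    // Stage 1: one entry per (day, visitor) pair, like the first $group.
    $pairs = [];
    foreach ($visits as [$visitorId, $time]) {
        $day = (new DateTimeImmutable($time, $utc))->format('Y-m-d');
        $pairs[$day][$visitorId] = true;
    }
    // Stage 2: count the distinct visitors of each day, like the second $group.
    $counts = array_map('count', $pairs);
    ksort($counts);
    return $counts;
}

$perDay = uniqueVisitorsPerDay($visits);
print_r($perDay);
```

Visitor v1 appears twice on the first day but is counted once, which is exactly what grouping on the visitor id first achieves.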

Related

DynamoDB Count Group By

We are trying to search a DynamoDB table and need to get a count of objects within a grouping. How can this be done?
I have tried this, but it doesn't work once the second number is added:
$search = array(
'TableName' => 'dev_adsite_rating',
'Select' => 'COUNT',
'KeyConditions' => array(
'ad_id' => array(
'ComparisonOperator' => 'EQ',
'AttributeValueList' => array(
array('N' => 1039722, 'N' => 1480)
)
)
)
);
$response = $client->query($search);
The sql version would look something like this:
select ad_id, count(*)
from dev_adsite_rating
where ad_id in(1039722, 1480)
group by ad_id;
So, is there a way for us to achieve this? I can not find anything on it.
Performing a query like this on DynamoDB is slightly trickier than in the SQL world. You'll need to consider a few things:
EQ only on the hash key: a Query can only match the hash key with EQ, so you'll need to make two queries (i.e. ad_id EQ 1039722 and ad_id EQ 1480).
Paginate through the query: because DynamoDB returns the result set in increments, you'll need to paginate through the results of each query (see the DynamoDB documentation on query pagination).
Running "Count": take the "Count" property from each response and add it to a running total as you paginate through the results of both queries (see the Query API reference).
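The three points above can be sketched as a small loop. This is a minimal sketch rather than the SDK's API: countAllPages() and the $runQuery callable are hypothetical names, and the fake pages stand in for real responses. Count, LastEvaluatedKey and ExclusiveStartKey are the Query API fields used for counting and pagination.

```php
<?php
/**
 * Sums the "Count" field over every page of one query.
 * $runQuery is a hypothetical callable that takes the request array and
 * returns one response page as an array.
 */
function countAllPages(callable $runQuery, array $request): int
{
    $total = 0;
    do {
        $page = $runQuery($request);
        $total += $page['Count'];
        // A page that carries LastEvaluatedKey is not the last one.
        $lastKey = $page['LastEvaluatedKey'] ?? null;
        if ($lastKey !== null) {
            $request['ExclusiveStartKey'] = $lastKey;
        }
    } while ($lastKey !== null);
    return $total;
}

// Stand-in for the real client call: two pages for one ad_id, one for the other.
$fakePages = [
    '1039722' => [
        ['Count' => 3, 'LastEvaluatedKey' => ['ad_id' => ['N' => '1039722']]],
        ['Count' => 2],
    ],
    '1480' => [['Count' => 4]],
];
$fakeQuery = function (array $request) use (&$fakePages) {
    $adId = $request['KeyConditions']['ad_id']['AttributeValueList'][0]['N'];
    return array_shift($fakePages[$adId]);
};

// One query per ad_id, because the hash key only supports EQ.
$counts = [];
foreach (['1039722', '1480'] as $adId) {
    $counts[$adId] = countAllPages($fakeQuery, [
        'TableName' => 'dev_adsite_rating',
        'Select' => 'COUNT',
        'KeyConditions' => ['ad_id' => [
            'ComparisonOperator' => 'EQ',
            'AttributeValueList' => [['N' => $adId]],
        ]],
    ]);
}
print_r($counts);
```

With a real client, the callable would wrap the query call and return its result; everything else stays the same.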
You could add a Lambda function triggered by the DynamoDBStream, to aggregate your data on the fly, in your case add +1 to the relevant counters. Your search function would then simply retrieve the aggregated data directly.
Example: if you have a weekly online voting system where you need to store each vote (also to check that no user votes twice), you could aggregate the votes on the fly using something like this:
import { DynamoDBStreamEvent, DynamoDBStreamHandler } from 'aws-lambda';
export const handler: DynamoDBStreamHandler = async (event: DynamoDBStreamEvent) => {
await Promise.all(event.Records.map(async record => {
if (record.dynamodb?.NewImage?.vote?.S && record.dynamodb?.NewImage?.week?.S) {
await addVoteToResults(record.dynamodb.NewImage.vote.S, record.dynamodb.NewImage.week.S)
}
}))
}
where addVoteToResults is something like:
// dynamoDbClient is assumed to be an AWS SDK v2 DynamoDB.DocumentClient created elsewhere
export const addVoteToResults = async (vote: string, week: string) => {
await dynamoDbClient.update({
TableName: 'table_name',
Key: { week: week },
UpdateExpression: 'add #vote :inc',
ExpressionAttributeNames: {
'#vote': vote
},
ExpressionAttributeValues: {
':inc': 1
}
}).promise();
}
Afterwards, when the voting is closed, you can retrieve the aggregated votes per week with a single get statement. This solution also helps spread the write/read load, rather than causing a huge spike when your search function executes.

How to get full list of Twitter followers using new API 1.1

I am using https://api.twitter.com/1.1/followers/ids.json?cursor=-1&screen_name=sitestreams&count=5000 to list Twitter followers, but I only get a list of 200 followers. How can I retrieve the full list of followers using the new API 1.1?
You must first set up your application:
<?php
$consumerKey = 'Consumer-Key';
$consumerSecret = 'Consumer-Secret';
$oAuthToken = 'OAuthToken';
$oAuthSecret = 'OAuth Secret';
# API OAuth
require_once('twitteroauth.php');
$tweet = new TwitterOAuth($consumerKey, $consumerSecret, $oAuthToken, $oAuthSecret);
You can download the twitteroauth.php from here: https://github.com/elpeter/pv-auto-tweets/blob/master/twitteroauth.php
Then you can retrieve your followers like this:
$tweet->get('followers/ids', array('screen_name' => 'YOUR-SCREEN-NAME-USER'));
If you want to retrieve the next group of 5000 followers, you must pass the next_cursor value returned by the first call as the cursor parameter.
$tweet->get('followers/ids', array('screen_name' => 'YOUR-SCREEN-NAME-USER', 'cursor' => 9999999999));
You can read about: Using cursors to navigate collections in this link: https://dev.twitter.com/docs/misc/cursoring
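The cursoring described above can be sketched as a loop that keeps requesting pages until next_cursor comes back as 0. This is a rough sketch under assumptions: collectFollowerIds() and the $fetchPage callable are hypothetical, the fake pages replace real API responses, and the default cap of 15 pages is an assumption meant to stay inside a rate-limit window, so check your own limits.

```php
<?php
/**
 * Collects ids from a cursored endpoint until next_cursor is 0.
 * $fetchPage is a hypothetical callable: it receives a cursor and returns
 * an object with ->ids and ->next_cursor, like the followers/ids response.
 */
function collectFollowerIds(callable $fetchPage, int $maxPages = 15): array
{
    $ids = [];
    $cursor = -1; // -1 asks for the first page
    for ($page = 0; $page < $maxPages && $cursor !== 0; $page++) {
        $result = $fetchPage($cursor);
        $ids = array_merge($ids, $result->ids);
        $cursor = $result->next_cursor;
    }
    return $ids;
}

// Stand-in for $tweet->get('followers/ids', [...]) with three fake pages.
$pages = [
    -1 => (object) ['ids' => [1, 2], 'next_cursor' => 100],
    100 => (object) ['ids' => [3, 4], 'next_cursor' => 200],
    200 => (object) ['ids' => [5], 'next_cursor' => 0],
];
$allIds = collectFollowerIds(function ($cursor) use ($pages) {
    return $pages[$cursor];
});
print_r($allIds);
```

In real use the callable would wrap the $tweet->get() call shown above and pass the cursor through.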
You can't fetch more than 200 at once. The documentation for the count parameter of followers/list states this clearly:
The number of users to return per page, up to a maximum of 200. Defaults to 20.
(followers/ids, which returns only IDs, allows up to 5,000 per request.)
You can still get the full list via pagination using the cursor parameter. cursor=-1 means page 1: "If no cursor is provided, a value of -1 will be assumed, which is the first 'page.'"
Here's how I run/update the full list of follower ids on my platform. I'd avoid using sleep() like @aphoe's script. It's really bad to keep a connection open that long, and what happens if your user has 1 million followers? Are you going to keep that connection open for a week? lol. If you must, run a cron job or save your progress to Redis/Memcached. Rinse and repeat until you get all the followers.
Note, my code below is a class that's run through a cron command every minute. I'm using Laravel 5.1, so you can probably ignore a lot of this code, as it's unique to my platform: TwitterOAuth is a model that gets all the oAuths I have in the db, TwitterFollowerList is another table where I check if an entry already exists, TwitterFollowersDaily is another table where I store/update the total amount for the day for the user, and TwitterApi is the Abraham\TwitterOAuth package. You can use whatever library though.
This might give you a good sense of how you might do the same, or even figure out a better way. I won't explain all the code, as there's a lot happening, but you should be able to follow it. Let me know if you have any questions.
/**
* Update follower list for each oAuth
*
* @return void
*/
public function updateFollowers()
{
TwitterOAuth::chunk(200, function ($oauths)
{
foreach ($oauths as $oauth)
{
$page_id = $oauth->page_id;
$follower_list = TwitterFollowerList::where('page_id', $page_id)->first();
if (!$follower_list || $follower_list->updated_at < Carbon::now()->subMinutes(15))
{
$next_cursor = isset($follower_list->next_cursor) ? $follower_list->next_cursor : -1;
$ids = isset($follower_list->follower_ids) ? $follower_list->follower_ids : [];
$twitter = new TwitterApi($oauth->oauth_token, $oauth->oauth_token_secret);
$results = $twitter->get("followers/ids", ["user_id" => $page_id, "cursor" => $next_cursor]);
if (isset($results->errors)) continue;
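// Start fresh on the first page; when resuming from a stored cursor, keep the ids saved so far (assumes follower_ids is stored as an array)
$ids = ($next_cursor === -1) ? $results->ids : array_merge($ids, $results->ids);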
if ($results->next_cursor !== 0)
{
$ticks = 0;
do
{
if ($ticks === 13)
{
$ticks = 0;
break;
}
$ticks++;
$results = $twitter->get("followers/ids", ["user_id" => $page_id, "cursor" => $results->next_cursor]);
if (!$results) break;
$more_ids = $results->ids;
$ids = array_merge($ids, $more_ids);
}
while ($results->next_cursor > 0);
}
$stats = [
'page_id' => $page_id,
'follower_count' => count($ids),
'follower_ids' => $ids,
'next_cursor' => ($results->next_cursor > 0) ? $results->next_cursor : null,
'updated_at' => Carbon::now()
];
TwitterFollowerList::updateOrCreate(['page_id' => $page_id], $stats);
TwitterFollowersDaily::updateOrCreate([
'page_id' => $page_id,
'date' => Carbon::now()->toDateString()
],
[
'page_id' => $page_id,
'date' => Carbon::now()->toDateString(),
'follower_count' => count($ids),
]
);
continue;
}
}
});
}

MongoDB search $in _id php

Usually when I search for one related ID I do it like this:
$thisSearch = $collection->find(array(
'relatedMongoID' => new MongoId($mongoIDfromSomewhereElse)
));
How would I do it if I wanted to do something like this:
$mongoIdArray = array($mongoIDfromSomewhereElseOne, $mongoIDfromSomewhereElseTwo, $mongoIDfromSomewhereElseThree);
$thisSearch = $collection->find(array(
'relatedMongoID' => array( '$in' => new MongoId(mongoIdArray)
)));
I've tried it with and without the new MongoId(); I've even tried this, with no luck:
foreach($mongoIdArray as $seprateIds){
$newMongoString .= new MongoId($seprateIds).', ';
}
$mongoIdArray = explode(',', $newMongoString).'0';
How do I search with '$in' on "_id" when new MongoId() needs to be run on each _id?
Hmm, you're trying to do it the SQL way:
foreach($mongoIdArray as $seprateIds){
$newMongoString .= new MongoId($seprateIds).', ';
}
$mongoIdArray = explode(',', $newMongoString).'0';
Instead try:
$_ids = array();
foreach($mongoIdArray as $seprateIds){
$_ids[] = $seprateIds instanceof MongoId ? $seprateIds : new MongoId($seprateIds);
}
$thisSearch = $collection->find(array(
'relatedMongoID' => array( '$in' => $_ids)
));
That should produce a list of ObjectIds that can be used to search that field - relatedMongoID.
This is what I am doing.
Basically, as shown in the documentation ( https://docs.mongodb.org/v3.0/reference/operator/query/in/ ), the $in operator in MongoDB takes an array, so you need to replicate this structure in PHP. The PHP driver maps 1-1 to the documentation on most fronts (except in some areas where you need to use an additional object, for example MongoRegex).
Now, all _ids in MongoDB are ObjectIds by default (unless you supplied your own values), so to complete this query you need to build an array of ObjectIds. The ObjectId class in PHP is MongoId ( http://php.net/manual/en/class.mongoid.php ).
So you need to make an array of MongoIds.
First, I walk through the array (this could also be done with array_walk), wrapping each element's old value in a MongoId:
foreach($mongoIdArray as $seprateIds){
$_ids[] = $seprateIds instanceof MongoId ? $seprateIds : new MongoId($seprateIds);
}
I use a ternary operator here to see if the value is already a MongoId encapsulated value, and if not encapsulate it.
Then I add this new array to the query object to form the $in query array as shown in the main MongoDB documentation:
$thisSearch = $collection->find(array(
'relatedMongoID' => array( '$in' => $_ids)
));
So now when the query is sent to the server it forms a structure similar to:
{relatedMongoID: {$in: [ObjectId(''), ObjectId('')]}}
Which will return results.
Well... I came across the same issue and the solution might not be relevant anymore since the API might have changed. I solved this one with:
$ids = [
new \MongoDB\BSON\ObjectId('5ae0cc7bf3dd2b8bad1f71e2'),
new \MongoDB\BSON\ObjectId('5ae0cc7cf3dd2b8bae5aaf33'),
];
$collection->find([
'_id' => ['$in' => $ids],
]);

Get top 5 documents with newest nested objects

I have the following data structure in MongoDB:
{ "_id" : ObjectId( "xy" ),
"litter" : [
{ "puppy_name" : "Tom",
"birth_timestamp" : 1353963728 },
{ "puppy_name" : "Ann",
"birth_timestamp" : 1353963997 }
]
}
I have many of these "litter" documents with a varying number of puppies. The higher the timestamp number, the younger the puppy is (i.e. born later).
What I would like to do is retrieve the five youngest puppies from the collection across all litter documents.
I tried something along the lines of
find().sort({'litter.birth_timestamp' : -1}).limit(5)
to get the five litters which have the youngest puppies and then to extract the youngest puppy from each litter in the PHP script.
But I am not sure if this will work properly. Any idea how to do this right (without changing the data structure)?
You can use the new Aggregation Framework in MongoDB 2.2 to achieve this:
<?php
$m = new Mongo();
$collection = $m->selectDB("test")->selectCollection("puppies");
$pipeline = array(
// Create a document stream (one per puppy)
array('$unwind' => '$litter'),
// Sort by birthdate descending
array('$sort' => array (
'litter.birth_timestamp' => -1
)),
// Limit to 5 results
array('$limit' => 5)
);
$results = $collection->aggregate($pipeline);
var_dump($results);
?>
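To see what each pipeline stage contributes, the same three steps can be imitated on plain PHP arrays. This is only an illustration of the logic, not how the server executes it: the sample documents and the youngestPuppies() helper are made up for the example.

```php
<?php
// Hypothetical litter documents shaped like the one in the question.
$litters = [
    ['_id' => 'a', 'litter' => [
        ['puppy_name' => 'Tom', 'birth_timestamp' => 1353963728],
        ['puppy_name' => 'Ann', 'birth_timestamp' => 1353963997],
    ]],
    ['_id' => 'b', 'litter' => [
        ['puppy_name' => 'Rex', 'birth_timestamp' => 1353964500],
    ]],
];

function youngestPuppies(array $litters, int $limit): array
{
    // $unwind: one row per puppy, keeping the parent _id.
    $rows = [];
    foreach ($litters as $doc) {
        foreach ($doc['litter'] as $puppy) {
            $rows[] = ['_id' => $doc['_id'], 'litter' => $puppy];
        }
    }
    // $sort: highest birth_timestamp (youngest puppy) first.
    usort($rows, function ($a, $b) {
        return $b['litter']['birth_timestamp'] <=> $a['litter']['birth_timestamp'];
    });
    // $limit: keep only the first $limit rows.
    return array_slice($rows, 0, $limit);
}

$youngest = youngestPuppies($litters, 2);
$names = array_column(array_column($youngest, 'litter'), 'puppy_name');
print_r($names);
```

Because the unwind step runs before the sort, puppies from different litters compete directly, which is what a sort on the un-unwound documents cannot give you.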

Map Reduce To Get Most popular tags

I have a problem that I need some help with, but I feel I'm close. It involves Lithium and MongoDB. The code looks like this:
http://pastium.org/view/0403d3e4f560e3f790b32053c71d0f2b
$db = PopularTags::connection();
$map = new \MongoCode("function() {
if (!this.saved_terms) {
return;
}
for (index in this.saved_terms) {
emit(this.saved_terms[index], 1);
}
}");
$reduce = new \MongoCode("function(previous, current) {
var count = 0;
for (index in current) {
count += current[index];
}
return count;
}");
$metrics = $db->connection->command(array(
'mapreduce' => 'users',
'map' => $map,
'reduce' => $reduce,
'out' => 'terms'
));
$cursor = $db->connection->selectCollection($metrics['result'])->find()->limit(1);
print_r($cursor);
/**
User Data In Mongo
{
"_id" : ObjectId("4e789f954c734cc95b000012"),
"email" : "example@bob.com",
"saved_terms" : [
null,
[
"technology",
" apple",
" iphone"
],
[
"apple",
" water",
" beryy"
]
] }
**/
I have users saving the terms they search on, and I am trying to get the most popular terms,
but I keep getting errors like: Uncaught exception 'Exception' with message 'MongoDB::__construct( invalid name '. Does anyone have any idea how to do this, or some direction?
First off, I would not store this in the user object. MongoDB documents have an upper size limit of 4 or 16 MB (depending on version). This limit is normally not a problem, but when logging inline in one document you might reach it. A more real problem is that every time you act on these documents you need to load them into RAM, which becomes memory-consuming. I don't think you want that on your user objects.
Secondly, arrays in documents are not sortable and have other limitations that might come back to bite you later.
But if you want to keep it like this (a low volume of searches should not really be a problem), the easiest way to solve it is a group query.
A group query is pretty much like GROUP BY in SQL, so there is a slight trick: you need to group on something most objects share (an active field on users, maybe).
So, here's a working group example that will sum the words used, based on your structure.
Just put this method in your model and call MyModel::searchTermUsage() to get a Document object back.
public static function searchTermUsage() {
$reduce = 'function(obj, prev) {
obj.saved_terms.forEach(function(terms) { // field name as in the sample document from the question
terms.forEach(function(term) {
if (!(term in prev)) prev[term] = 0;
prev[term]++;
});
});
}';
return static::all(array(
'initial' => new \stdclass,
'reduce' => $reduce,
'group' => 'common-value-key' // Change this
));
}
There is no protection against non-array types in the terms field (you had a null value in your example). I left it out for simplicity; it's probably better to strip such values before they end up in the database.
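For comparison, here is what the same counting looks like in plain PHP with the missing guard added. It is a sketch on made-up sample data, not Lithium or MongoDB code: termUsage() is a hypothetical helper, and trimming the terms is my own assumption, based on the leading spaces in the sample document.

```php
<?php
// Hypothetical user documents shaped like the sample in the question.
$users = [
    ['saved_terms' => [null, ['technology', ' apple', ' iphone'], ['apple', ' water']]],
    ['saved_terms' => [['apple', 'water']]],
    ['email' => 'no-terms@example.com'],
];

function termUsage(array $users): array
{
    $counts = [];
    foreach ($users as $user) {
        foreach ($user['saved_terms'] ?? [] as $terms) {
            if (!is_array($terms)) {
                continue; // skip null or other non-array entries
            }
            foreach ($terms as $term) {
                $term = trim($term); // " apple" and "apple" count as one term
                $counts[$term] = ($counts[$term] ?? 0) + 1;
            }
        }
    }
    arsort($counts); // most popular first
    return $counts;
}

$usage = termUsage($users);
print_r($usage);
```

The is_array() check is the protection mentioned above; the same test could be added inside the reduce function if the null entries cannot be cleaned out of the database first.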
