MongoDB query to get number of times a Key occurs - php

So I have a MongoDB document that tracks logons to our app. Basic structure appears thusly:
[_id] => MongoId Object
    (
        [$id] => 50f6da28686ba94b49000003
    )
[userId] => 50ef542a686ba95971000004
[action] => login
[time] => 1358354984
Now the challenge is this: there are about 20,000 of these entries. I have been challenged to report the number of times each user logged in (as defined by userId), so I am looking for a good way to do this. There are a couple of possible approaches I've seen. In SQL, for example, I might pull the number of logins by grouping on userId and counting, something like SELECT userId, count(*) FROM ... GROUP BY userId, and then sub-select on that (CASE WHEN or something in the top select).
Anyway, I'm wondering if anyone has suggestions on the best way to do this. Worst case scenario, I can limit the result set and do the grouping in memory, but ideally I would like to get the full answer directly from Mongo.
The other limitation (even after I get past the first set) is that I am looking to do a unique count by date, which will be even tougher!

Now- the challenge is this: there are about 20,000 of these entries.
At 20,000 you will probably be better off with the aggregation framework ( http://docs.mongodb.org/manual/applications/aggregation/ ):
$db->user->aggregate(array(
    array('$group' => array('_id' => '$userId', 'num_logins' => array('$sum' => 1)))
));
That will group ( http://docs.mongodb.org/manual/reference/aggregation/#_S_group ) by userId and count ( via $sum: http://docs.mongodb.org/manual/reference/aggregation/sum/#_S_sum ) how many logins there are per group.
Note: As stated in the comments, the aggregate helper is in version 1.3+ of the PHP driver. Before version 1.3 you must use the command function directly.
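A minimal sketch of that pre-1.3 route, assuming the legacy Mongo extension (the aggregate command returns its documents under a result key):
// pre-1.3 drivers: issue the aggregate command through MongoDB::command()
$response = $db->command(array(
    'aggregate' => 'user',
    'pipeline'  => array(
        array('$group' => array('_id' => '$userId', 'num_logins' => array('$sum' => 1))),
    ),
));
$counts = $response['result']; // one document per userId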

You can use MapReduce to group the results by user ID:
http://docs.mongodb.org/manual/applications/map-reduce/#map-reduce-examples
Or you can use the aggregation framework's $group stage:
db.logins.aggregate(
    { $group : {
        _id : "$userId",
        loginsPerUser : { $sum : 1 }
    }}
);
For MongoDB, 20K documents (or even many more) won't be a problem to walk and combine, so no worries about performance.
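For the asker's follow-up (a unique count by date), one hedged sketch in PHP: assuming time is a Unix timestamp in seconds, as in the sample document, you can truncate it to a day boundary and group twice:
// unique users per day: first collapse to one document per (day, userId),
// then count the distinct users within each day
$response = $db->user->aggregate(array(
    array('$group' => array(
        '_id' => array(
            'day'    => array('$subtract' => array('$time', array('$mod' => array('$time', 86400)))),
            'userId' => '$userId',
        ),
    )),
    array('$group' => array(
        '_id'          => '$_id.day',
        'unique_users' => array('$sum' => 1),
    )),
));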

http://docs.mongodb.org/manual/reference/command/group/
db.user.group({
    key: { userId: 1 },
    reduce: function (curr, result) { result.total++; },
    initial: { total: 0 }
});
I ran this on 191,000 rows in just a couple of seconds, but group is limited to 20,000 unique groupings, so it really isn't a solution for you.

Related

PHP Mongo driver cursor traversal takes too long

I have a query like this:
$results = $collection->find([
        'status'      => "pending",
        'short_code'  => intval($shortCode),
        'create_time' => ['$lte' => time()],
    ])
    ->limit(self::BATCH_NUMBER)
    ->sort(["priority" => -1, "create_time" => 1]);
where BATCH_NUMBER is 70.
and I use the result of the query like below:
foreach ($results as $mongoId => $result) {
    // process each document
}
or try converting it to an array:
iterator_to_array($results);
The Mongo fetch time versus the iteration time is:
FetchTime: 0.003173828125 ms
IteratorTime: 4065.1459960938 ms
As you can see, fetching the data from Mongo is very fast, but iterating (whether via iterator_to_array or foreach) is slow.
It is a queue for sending messages to another server. The destination server accepts at most 70 documents per request, so I am forced to fetch 70 at a time. Anyway, I want to fetch 70 documents out of 1,300,000, and that is where the problem lies.
The query fetches the first 70 documents that match the conditions; they are then sent and finally deleted from the collection.
Can anybody help? Why does it take so long? Is there any configuration that would speed up PHP or Mongo?
One more thing: when the total number of documents is around 100,000 (instead of 1,300,000), traversal is fast. Traversal time grows as the total number of documents increases.
That was because of the sorting.
The problem:
Fetching the data from Mongo was fast, but traversing the iterator with foreach was slow.
The solution:
The query sorts by priority DESC and create_time ASC. Those fields were each indexed ASC separately; indexing priority DESC and create_time ASC together as a compound index fixed the problem:
db.queue_1.createIndex( { "priority" : -1, "create_time" : 1 } )
The order of the fields in the index matters: priority must come first, then create_time, because it has to match the way the query sorts:
.sort({priority : -1, create_time : 1});
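The same index can also be created from PHP; a small sketch, assuming the legacy driver's MongoCollection (createIndex() superseded ensureIndex() in driver 1.5):
// compound index matching the sort: priority DESC, create_time ASC
$collection->createIndex(['priority' => -1, 'create_time' => 1]);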

Extract specific data from a php string

I have a url stored in a database in the following format
index.php?main_page=product_info&cPath=1_11&products_id=568
I want to be able to extract the cPath data, 1_11 in this case, and the products id '568' to two separate variables. Note that the cPath value could vary from being a single number such as 23 to a series of numbers and underscores such as 17_25_31. If extracting the cPath is too difficult I could use the products_id once it's extracted and query the database again, but this isn't ideal as I want to avoid additional requests as much as possible.
I really don't know the best (correct) way to go about this.
A more refined approach, as suggested by Robbie Averill:
//first let's get the query string alone
$string=parse_url('index.php?main_page=product_info&cPath=1_11&products_id=568', PHP_URL_QUERY);
parse_str($string,$moo);
print_r($moo);
Output:
Array
(
    [main_page] => product_info
    [cPath] => 1_11
    [products_id] => 568
)
My original suggestion:
parse_str('index.php?main_page=product_info&cPath=1_11&products_id=568',$moo);
print_r($moo);
Output:
Array
(
    [index_php?main_page] => product_info
    [cPath] => 1_11
    [products_id] => 568
)
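To get the two values into separate variables, and to split cPath (which may hold several underscore-separated IDs), here is a small follow-on sketch using the $moo array from the refined approach above:
$cPath      = $moo['cPath'];             // e.g. "1_11" or "17_25_31"
$productsId = (int) $moo['products_id']; // 568

// cPath can be a single number or several joined by underscores
$categoryIds = array_map('intval', explode('_', $cPath));
print_r($categoryIds); // Array ( [0] => 1 [1] => 11 )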

Doctrine Paginator selects entire table (very slow)?

This is related to a previous question here: Doctrine/Symfony query builder add select on left join
I want to perform a complex join query using Doctrine ORM. I want to select 10 paginated blog posts, left joining a single author, like value for current user, and hashtags on the post. My query builder looks like this:
$query = $em->createQueryBuilder()
    ->select('p')
    ->from('Post', 'p')
    ->leftJoin('p.author', 'a')
    ->leftJoin('p.hashtags', 'h')
    ->leftJoin('p.likes', 'l', 'WITH', 'l.post_id = p.id AND l.user_id = 10')
    ->where("p.foo = bar")
    ->addSelect('a AS post_author')
    ->addSelect('l AS post_liked')
    ->addSelect('h AS post_hashtags')
    ->orderBy('p.time', 'DESC')
    ->setFirstResult(0)
    ->setMaxResults(10);
// FAILS - because the left-joined hashtag collection breaks LIMIT
$result = $query->getQuery()->getResult();
// WORKS - but is extremely slow (count($result) shows over 80,000 rows)
$result = new \Doctrine\ORM\Tools\Pagination\Paginator($query, true);
Strangely, count($result) on the paginator shows the total number of rows in my table (over 80,000) but traversing the $result with foreach outputs 10 Post entities, as expected. Do I need to do some additional configuration to properly limit my paginator?
If this is a limitation of the paginator class what other options do I have? Writing custom paginator code or other paginator libraries?
(bonus): How can I hydrate an array, like $query->getQuery()->getArrayResult();?
EDIT: I left out a stray orderBy in my function. It looks like including both groupBy and orderBy causes the slowdown (using groupBy rather than the paginator). If I omit one or the other, the query is fast. I tried adding an index on the "time" column in my table, but didn't see any improvement.
Things I Tried
// works, but makes the query about 50x slower
$query->groupBy('p.id');
$result = $query->getQuery()->getArrayResult();
// adding an index on the time column (no improvement)
indexes:
    time_idx:
        columns: [ time ]
// the above two solutions don't work because MySQL ORDER BY
// ignores indexes if GROUP BY is used on a different column
// e.g. "GROUP BY p.id ORDER BY p.time" is slow
You should simplify your query; that would shave off some execution time. I can't test your query, but here are a few pointers:
don't sort while executing count()
you could sort by orderBy('p.id', 'DESC'), so an index would be used
instead of leftJoin() you could use join() if at least one record always exists in the joined table; otherwise that record is skipped
KNP/Paginator uses DISTINCT to read only distinct records, but that can lead to a temporary table on disk
$query->getArrayResult() uses array hydration mode, which returns a multidimensional array and is far faster than object hydration for large result sets
you could use a partial select ('partial p.{id, other used fields}'); this way you load only the needed fields and can maybe skip unneeded relations when using object hydration
check the EXPLAIN for the query under the doctrine section of the SF profiler; maybe indexes are not used
check whether p.hashtags and p.likes return only one row or are oneToMany, which multiplies the result rows
maybe some Post design changes would remove some joins:
have a p.hashtags field defined as @ORM\Column(type="array") holding the stored string values of the tags, later maybe using full-text search on the serialized array
have a p.likesCount field defined as @ORM\Column(type="integer") holding the count of likes
I use KnpLabs/KnpPaginatorBundle and can also hit speed issues with complex queries.
Using LIMIT x,y is usually slow for the DB because it runs a COUNT on the whole dataset; if indexes are not used, it is painfully slow.
You could take a different approach and do custom pagination by advancing on ID (see the sketch below), though that complicates things. I have used this with large datasets like SYSLOG tables, but you lose sorting and the total-record-count functionality.
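A minimal sketch of that ID-advancing (keyset) approach, combined with the partial-select and array-hydration tips above; the field names and the $lastSeenId cursor are illustrative, not from the original post:
// page forward by remembering the last seen id instead of using OFFSET
$qb = $em->createQueryBuilder()
    ->select('partial p.{id, text, time}')  // load only the needed fields
    ->from('Post', 'p')
    ->where('p.id < :lastSeenId')           // advance past the previous page
    ->setParameter('lastSeenId', $lastSeenId)
    ->orderBy('p.id', 'DESC')               // the primary-key index is always usable
    ->setMaxResults(10);

$page = $qb->getQuery()->getArrayResult();  // array hydration: fast for read-only pages
$lastSeenId = end($page)['id'];             // cursor for the next page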
At the end of the day, many of the queries used in my application are too complex to make proper use of the Paginator, and I wasn't able to use array hydration mode with the Paginator.
According to MySQL documentation, ORDER BY cannot be resolved by indexes if GROUP BY is used on a different column. Thus, I ended up using a couple post-processing queries to populate my base results (ORDERed and LIMITed) with one-to-many relations (like hashtags).
For joins that load a single row from the joined table, I was able to join the desired values in the base ordered query. For example, when loading the "like status" for a current user, only one like from the set of likes needs to be loaded to indicate whether or not the current post has been liked. Similarly, the presence of only one author for a given post produces a single joined author row. e.g.
$query = $em->createQueryBuilder()
    ->select('p')
    ->from('Post', 'p')
    ->leftJoin('p.author', 'a')
    ->leftJoin('p.likes', 'l', 'WITH', 'l.post_id = p.id AND l.user_id = 10')
    ->where("p.foo = bar")
    ->addSelect('a AS post_author')
    ->addSelect('l AS post_liked')
    ->orderBy('p.time', 'DESC')
    ->setFirstResult(0)
    ->setMaxResults(10);
// SUCCEEDS - because the joins add only a single author and a single like;
// no collections are joined, so LIMIT applies only to the posts, as intended
$result = $query->getQuery()->getArrayResult();
This produces a result in the form:
[
    [0] => [
        ['id'] => 1
        ['text'] => 'foo',
        ['author'] => [
            ['id'] => 10,
            ['username'] => 'username',
        ],
        ['likes'] => [
            [0] => [
                ['post_id'] => 1,
                ['user_id'] => 10,
            ]
        ],
    ],
    [1] => [...],
    ...
    [9] => [...]
]
Then in a second query I load the hashtags for posts loaded in the previous query. e.g.
// we don't care about orders or limits here, we just want all the hashtags
$query = $em->createQueryBuilder()
    ->select('p, h')
    ->from('Post', 'p')
    ->leftJoin('p.hashtags', 'h')
    ->where('p.id IN (:post_ids)')
    ->setParameter('post_ids', $pids);
$hashtagRows = $query->getQuery()->getArrayResult();
Which produces the following:
[
    [0] => [
        ['id'] => 1
        ['text'] => 'foo',
        ['hashtags'] => [
            [0] => [
                ['id'] => 1,
                ['name'] => '#foo',
            ],
            [2] => [
                ['id'] => 2,
                ['name'] => '#bar',
            ],
            ...
        ],
    ],
    ...
]
Then I just traverse the results containing hashtags and append them to the original (ordered and limited) results. This approach ends up being much faster (even though it uses more queries), as it avoids GROUP BY and COUNT, fully leverages MySQL indexes, and allows for more complex queries, such as the one I posted here.
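A small sketch of that merge step, using $result from the first query and $hashtagRows from the second (the helper variable names are mine, not from the original answer):
// index the hashtag rows by post id for O(1) lookup
$tagsByPostId = [];
foreach ($hashtagRows as $row) {
    $tagsByPostId[$row['id']] = $row['hashtags'];
}
// attach the tags to the ordered, limited base results
foreach ($result as &$post) {
    $post['hashtags'] = isset($tagsByPostId[$post['id']]) ? $tagsByPostId[$post['id']] : [];
}
unset($post);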
You can configure the paginator to use a simpler COUNT SQL strategy by applying one or both of the optimizations below.
$paginator = new Paginator($query, false);
$paginator->setUseOutputWalkers(false);
If results are unexpected you may want to do a DISTINCT select (select('DISTINCT p'))
For us it made massive improvements and we had no need to write or use a custom paginator.
More details can be found on this site. Note that I am the owner of that website.

EMongoCriteria : Limit and Group By show less rows

I am using MongoDB, with PHP YII. I have used YiiMongoDbSuite for setting up the criteria for mongoDB Queries.
Currently, I am using Group By and Limit together, but for some reason the queries return fewer rows than expected.
$criteria = new EMongoCriteria();
$criteria->group('col_1');
$criteria->limit(10);
$result = TableName::model()->findAll($criteria);
Can somebody guide me, as I am quite new to MongoDB and YiiMongoDbSuite?
Thanks in advance,
Well to do it using MongoYii (which I maintain):
$result = MongoModel::model()->aggregate(
    array(
        array('$group' => array('_id' => '$col_1')),
        array('$limit' => 10)
    )
);
Note that each pipeline stage must be its own document, and the group key needs the $ prefix to reference the col_1 field.
I am unsure how to do it with YiiMongoDbSuite, in fact there is no group command in its EMongoCriteria from what I see.
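If you must stay on YiiMongoDbSuite, one hedged workaround (assuming the suite exposes the underlying MongoCollection, e.g. via a getCollection() helper) is to run the same pipeline through the raw legacy driver:
// assumption: getCollection() returns the legacy MongoCollection
$collection = TableName::model()->getCollection();
$response = $collection->db->command(array(
    'aggregate' => $collection->getName(),
    'pipeline'  => array(
        array('$group' => array('_id' => '$col_1')),
        array('$limit' => 10),
    ),
));
$result = $response['result'];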

Save php function in mysql

It has been my late-childhood dream to create a game, and now that I actually know how, I thought I should fulfill that dream and started working on a little game project in my spare time. It's basically a combat type of game where you have up to 3 units, as does your opponent, and you take turns (since it's HTTP, you know that feel) attacking each other, casting spells and so on. The issue I came across is abilities and how to store them. If I were to store abilities in an array, it would look something like:
$abilities = array(
    0 => array(
        'name' => 'Fire ball',
        'desc' => 'Hurls a fire ball at your enemy, dealing X damage.',
        'effect' => function($data){
            $data['caster']->damage($data['target'], $data['caster']->magicPower);
        }
    ),
    1 => array(...
);
But if I were to store abilities this way, every time I needed information about a single ability I would have to load the whole array, and it's probably going to get pretty big over time, so that would be a tremendous waste of memory. So I jumped to my other option: saving the abilities in a MySQL table. However, I'm having issues with the effect part. How can I save a function into a MySQL field and be able to run it on demand?
Or if you can suggest another way to save the abilities, I may have missed.
To answer your question about storing arrays in a database like MySQL: I would suggest serializing the array to a string. Normal, direct serialization is not going to work here because it doesn't deal with closures.
You need to use classes like super_closure which can serialize the methods and convert them into string. Read more here
https://github.com/jeremeamia/super_closure
// assumes jeremeamia/super_closure is installed via Composer,
// e.g. use SuperClosure\SerializableClosure; (the namespace varies by version)
$helloWorld = new SerializableClosure(function($data){
    $data['caster']->damage($data['target'], $data['caster']->magicPower);
});
$serializedFunc = serialize($helloWorld);
Now you can create array like this:
$abilities = array(
    0 => array(
        'name' => 'Fire ball',
        'desc' => 'Hurls a fire ball at your enemy, dealing X damage.',
        'effect' => $serializedFunc
    ));
This Array can now be saved directly, serialized or encoded to JSON.
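Reading it back is the reverse; a short sketch, assuming the serialized string was stored in an effect column (SerializableClosure instances are invokable):
// fetch the serialized closure back from MySQL and call it
$effect = unserialize($row['effect']);
$effect(array('caster' => $caster, 'target' => $target));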
I would recommend you to look at Redis or Memcache for caching query results and don't use MySQL to store functions.
You could have three tables:
spell
    id
    name
    description
spell_effect
    id
    name
    serversidescript
spell_effect_binder
    spell_id
    spell_effect_id
This would make sure that your logic stays in PHP files, wherever you would like them to be located, while all the metadata for the spells and effects, and how they bind together, lives in the database. That means you only load the functions/scripts you actually need, and it also lets you attach multiple effects to one spell.
//Firedamage.php
public function calculateEffects($level, $caster, $target) {
    $extraDamage = 5 * $level;
    $randDamage  = rand(10, 50);
    $caster->damage($target, ($randDamage + $extraDamage));
}
spell_effect entry
    id = 1
    name = 'firedamage'
    serversidescript = 'Firedamage.php'
spell entry
    id = 1
    name = 'Fireball'
    description = 'Hurls a fireball at your foe'
spell_effect_binder entry
    spell_id = 1
    spell_effect_id = 1
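A hedged sketch of wiring it together at runtime; the PDO query and the assumption that each serversidescript defines a class named after the effect row are mine, not the answerer's:
// load all effects bound to a spell and execute each one
$stmt = $pdo->prepare(
    'SELECT se.name, se.serversidescript
       FROM spell_effect se
       JOIN spell_effect_binder b ON b.spell_effect_id = se.id
      WHERE b.spell_id = ?'
);
$stmt->execute(array($spellId));

foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $effect) {
    require_once $effect['serversidescript'];  // e.g. Firedamage.php
    $class = $effect['name'];                  // assumed: one class per effect
    $handler = new $class();
    $handler->calculateEffects($level, $caster, $target);
}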
