Question:
I have a collection composed as follows, but with a lot more data.
{
_id: ObjectId("db759d014f70743495ef1000"),
tracked_item_origin: "winword",
tracked_item_type: "Software",
machine_user: "mmm.mmm",
organization_id: ObjectId("a91864df4f7074b33b020000"),
group_id: ObjectId("20ea74df4f7074b33b520000"),
tracked_item_id: ObjectId("1a050df94f70748419140000"),
tracked_item_name: "Word",
duration: 9540,
}
{
_id: ObjectId("2b769d014f70743495fa1000"),
tracked_item_origin: "http://www.facebook.com",
tracked_item_type: "Site",
machine_user: "gabriel.mello",
organization_id: ObjectId("a91864df4f7074b33b020000"),
group_id: ObjectId("3f6a64df4f7074b33b040000"),
tracked_item_id: ObjectId("6f3466df4f7074b33b080000"),
tracked_item_name: "Facebook",
duration: 7920,
}
I've already done an aggregation that returns all this data in a grouped way as follows:
{"_id"=>{"tracked_item_type"=>"Site", "tracked_item_name"=>"Twitter"}, "duration"=>288540},
{"_id"=>{"tracked_item_type"=>"Site", "tracked_item_name"=>"ANoticia"}, "duration"=>237300},
{"_id"=>{"tracked_item_type"=>"Site", "tracked_item_name"=>"Facebook"}, "duration"=>203460},
{"_id"=>{"tracked_item_type"=>"Software", "tracked_item_name"=>"Word"}, "duration"=>269760},
{"_id"=>{"tracked_item_type"=>"Software", "tracked_item_name"=>"Excel"}, "duration"=>204240}
The aggregation code is simple:
AgentCollector.collection.aggregate(
{'$match' => {group_id: '20ea74df4f7074b33b520000'}},
{'$group' => {
_id: {tracked_item_type: '$tracked_item_type', tracked_item_name: '$tracked_item_name'},
duration: {'$sum' => '$duration'}
}},
{'$sort' => {
'_id.tracked_item_type' => 1,
duration: -1
}}
)
My question is: how can I limit the result to only 2 grouped Site items (tracked_item_type: "Site") and 2 grouped Software items (tracked_item_type: "Software")?
Answer:
I thought a lot about a simpler way to do this, but apparently the only way is to write the result of this aggregation to a collection and then filter that stored result.
Saving the report result in a new collection
The "aggregate" command has no option for this, so I replace it with a mapReduce() that does the same grouping but specifies the "out" option:
db.trackeditems.mapReduce(
function() {
var key = {'tracked_item_type': this.tracked_item_type, 'tracked_item_name': this.tracked_item_name};
emit(key, this.duration);
},
function(key, values) {
var totalDuration = 0;
for (var i = 0; i < values.length; ++i) {
totalDuration += values[i];
}
return totalDuration;
},
{
out: "tracked_items_report",
query: {group_id: ObjectId('20ea74df4f7074b33b520000')}
}
)
Note: I don't know the Ruby API, so I'm showing the example only in the mongo shell.
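To make the shape of the stored report concrete, here is a rough sketch in plain JavaScript that imitates what the map and reduce functions above compute. It does not use MongoDB at all: simulateMapReduce is a hypothetical helper written only for this illustration, emit is passed in as an argument instead of being a global, the query filter is left out, and the sample documents are partly made up.

```javascript
// Hypothetical helper: an in-memory stand-in for mapReduce, for illustration only.
function simulateMapReduce(docs, mapFn, reduceFn) {
  const buckets = new Map(); // JSON-serialized key -> { key, values }
  const emit = (key, value) => {
    const id = JSON.stringify(key);
    if (!buckets.has(id)) buckets.set(id, { key: key, values: [] });
    buckets.get(id).values.push(value);
  };
  // Run the map function with each document bound to "this".
  docs.forEach((doc) => mapFn.call(doc, emit));
  // One output document per distinct key, shaped like the "out" collection.
  return Array.from(buckets.values()).map(({ key, values }) => ({
    _id: key,
    // Like MongoDB, skip reduce when a key has only a single value.
    value: values.length === 1 ? values[0] : reduceFn(key, values),
  }));
}

// Same logic as the map function above, with emit passed in explicitly.
function mapFn(emit) {
  const key = {
    tracked_item_type: this.tracked_item_type,
    tracked_item_name: this.tracked_item_name,
  };
  emit(key, this.duration);
}

// Same logic as the reduce function above: sum the durations for one key.
function reduceFn(key, values) {
  return values.reduce((total, duration) => total + duration, 0);
}

// Sample input: the first two durations come from the question, the third is invented.
const sampleDocs = [
  { tracked_item_type: "Software", tracked_item_name: "Word", duration: 9540 },
  { tracked_item_type: "Site", tracked_item_name: "Facebook", duration: 7920 },
  { tracked_item_type: "Software", tracked_item_name: "Word", duration: 60 },
];

const report = simulateMapReduce(sampleDocs, mapFn, reduceFn);
console.log(JSON.stringify(report, null, 2));
```

Each resulting document carries the emitted key in _id and the summed duration in value, which is why queries against the report collection filter on '_id.tracked_item_type' and sort on 'value'.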
Querying the report
After running this command, your database will contain the tracked_items_report collection with the result of the aggregation. Each document in it has the emitted key in _id and the summed duration in value. From there you can run any query, sorting and limiting the records to get the desired result.
// Get the top 2 items of type 'Site'
db.tracked_items_report.find({'_id.tracked_item_type': 'Site'}).sort({'value':-1}).limit(2)
// Get the top 2 items of type 'Software'
db.tracked_items_report.find({'_id.tracked_item_type': 'Software'}).sort({'value':-1}).limit(2)
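As a sanity check on what these two queries should return, the same "top N per type" selection can be sketched in plain JavaScript over the totals shown in the question. This is only an in-memory illustration, not a MongoDB API: topPerType is a hypothetical helper, and the documents are assumed to have the _id/value shape that mapReduce writes.

```javascript
// Hypothetical helper: keep the n largest "value" entries for each tracked_item_type.
function topPerType(reportDocs, n) {
  const byType = {};
  for (const doc of reportDocs) {
    const type = doc._id.tracked_item_type;
    (byType[type] = byType[type] || []).push(doc);
  }
  for (const type of Object.keys(byType)) {
    // Equivalent of .sort({'value': -1}).limit(n) for one type.
    byType[type] = byType[type].sort((a, b) => b.value - a.value).slice(0, n);
  }
  return byType;
}

// Totals taken from the grouped output in the question, deliberately unordered.
const reportDocs = [
  { _id: { tracked_item_type: "Site", tracked_item_name: "Facebook" }, value: 203460 },
  { _id: { tracked_item_type: "Software", tracked_item_name: "Excel" }, value: 204240 },
  { _id: { tracked_item_type: "Site", tracked_item_name: "Twitter" }, value: 288540 },
  { _id: { tracked_item_type: "Software", tracked_item_name: "Word" }, value: 269760 },
  { _id: { tracked_item_type: "Site", tracked_item_name: "ANoticia" }, value: 237300 },
];

const top2 = topPerType(reportDocs, 2);
console.log(top2.Site.map((d) => d._id.tracked_item_name));     // [ 'Twitter', 'ANoticia' ]
console.log(top2.Software.map((d) => d._id.tracked_item_name)); // [ 'Word', 'Excel' ]
```

With the question's data, Facebook drops out of the Site list because it has the lowest total of the three sites, while both Software items are kept.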
Yes, it takes a little more work: there are now 3 database operations instead of just one (which, according to your question, would be ideal). But you also get an additional benefit "for free": the result of the aggregation (possibly a heavy operation) is stored in the database and serves as a cache for later queries.