NoSQL Zone is brought to you in partnership with:

Don Pinto is a Product Marketing Manager with experience in cloud and database technologies. Don is a DZone MVB and is not an employee of DZone and has posted 88 posts at DZone. You can read more from them at their website. View Full User Profile

Couchbase: Understanding group_level in View Queries

01.28.2013
| 1715 views |
  • submit to reddit

Curator's Note:  The content of this article was originally written by Jasdeep Jaitla over at the Couchbase blog.

Querying Views is what views are all about. Our documentation is great and can be found here: http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-views.html

Compound Keys

Just to illustrate how group level works, let's use a Map function that sorts users by the latest login timestamp, converting that timestamp to an array [yyyy,MM,dd,hh,mm,ss] in what we call a "compound key" (or Array Key) as the output of the Map function:

Map Function
function(doc, meta) {
    if (doc.login_timestamp) {
        emit(dateToArray(doc.login_timestamp), doc.username);
    }
}

Reduce Function _count Resulting Rows (with reduce=false)
[2011,10,10,10,10,09] "scalabl3"
[2012,12,24,17,03,59] "scalabl3"
[2013,01,01,08,22,23] "scalabl3"
[2013,01,25,10,38,01] "tgrall"
[2013,01,25,11,02,32] "jzablocki"
[2013,02,01,11,02,32] "scalabl3"

First thing of note and *very* important, even though this is an array output that seems like integers from the javascript Map function, they are not, each of those Index Keys are strings, and are ordered character by character as strings, including the brackets and commas, notice that all single digits are padded with zeros in front, and that is why the order is maintained. It's more like this, so we'll go ahead and keep the quote characters:

["2011","10","10","10","10","09"] "scalabl3"
["2012","12","24","17","03","59"] "scalabl3"
["2013","01","01","08","22","23"] "scalabl3"
["2013","01","25","10","38","01"] "tgrall"
["2013","01","25","11","02","32"] "jzablocki"
["2013","02","01","11","02","32"] "scalabl3"

Quick Illustrative Counter-Example for Key Ordering

If you had the following Map output, notice that it is sorted differently than it would be if the Int parameters were actually Int's, in fact Index Keys are always strings.

[2012,"beer",1] null
[2012,"beer",10] null
[2012,"beer",2] null

Notice that the second "element" of the Index Key is ordered to be before the 3rd because of string compare, these are not integers. Back to the scheduled program…

Continuing with Group Level...

So now, back to the list of login_timestamps in the view above, if we want to see how many people logged in during each year, I add the query parameter "group_level=1" and it splits the string on the first comma and groups the left elements together by string match, and then reduces on the group, producing the count.

["2011"] 1
["2012"] 1
["2013"] 4

Now, if we want to have a finer grain view (year, month), we add 1 to the group_level, group_level=2, which splits on the second comma and takes the left element and groups them, followed by the reduce. This results in the following with the same data:

["2011","10"] 1
["2012","12"] 1
["2013","01"] 3
["2013","02"] 1

Moving along to the next group_level, group_level=3, will group by (yyyy,MM,dd), which is the third element of the compound key, this splits on the 3rd comma, and groups together the left element and reduces:

["2011","10","10"] 1
["2012","12","24"] 1
["2013","01","01"] 1
["2013","01","25"] 2
["2013","02","01"] 1

In this tiny dataset, if we go to group_level=4, which in this case is by hour (yyyy,MM,dd,hh), each element is it's own row result with being able to be grouped and reduced because they are all unique. In a larger and more frequently updated dataset you could probably group by minute and see higher _count numbers:

["2011","10","10","10"] 1
["2012","12","24","17"] 1
["2013","01","01","08"] 1
["2013","01","25","10"] 1
["2013","01","25","11"] 1
["2013","02","01","11"] 1

Understanding Compound Keys

As you can see, compound keys are very useful, but some level of understanding helps you use it correctly. Remembering that this is a string compare will help you avoid creating keys that won't sort the way you expect. In addition it's good to left pad numbers with 0's to ensure they are ordered correctly as well since it is string compare and not by integer "value":

function (doc, meta) {
  if (doc.year && doc.type == "beer" && doc.int_value) {
    var val = doc.int_value.toString();
    var max ="000000000";
    var pval = max.substring(0, max.length - val.length) + val;
    emit([doc.year, doc.type, pval], null);
  }
}
Resulting Rows (with reduce=false)
[2012,"beer","000000001"] 1
[2012,"beer","000000002"] 1
[2012,"beer","000000010"] 1

Of course you can put anything in the array, not just dateToArray! It's a useful tactic for being able to group items together for reductions. Last thing to remember is that if you pad with zero's these are strings, so if you want to do a Range Query on the resulting Index Key, you must also pad that range with 0's as well if you are looking for a specific range, for example:

startkey=[2012,"beer","000000002"]
&endkey=[2012,"beer","000000007"]

Understanding Unicode Collation

For more information about string character ordering, please read my post on Unicode Collation in Views, as it isn't in byte order and applies to compound keys as well: Understanding Letter Ordering in View Queries


 

Published at DZone with permission of Don Pinto, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)