Big Data/Analytics Zone is brought to you in partnership with:

Coming from a background of Aerospace Engineering, John soon discovered that his true interest lay at the intersections of information technology and entrepreneurship (and when applicable - math). In early 2011, John stepped away from his day job to take up software consulting. Finally John found permanent employment at Opensource Connections where he currently consults large enterprises about full-text search and Big Data applications. Highlights to this point have included prototyping the future of search with the US Patent and Trademark Office, implementing the search syntax used by patent examiners, and building a Solr search relevancy tuning framework called SolrPanl. John is a DZone MVB and is not an employee of DZone and has posted 23 posts at DZone. You can read more from them at their website. View Full User Profile

Understanding How CQL3 Maps to Cassandra's Internal Data Structure

07.25.2013
| 3222 views |
  • submit to reddit

This is just a quick reference for now. Later I hope to flesh out the details and discuss some of the cooler features present here:

Maps

Defining:

cqlsh:test> CREATE TABLE phonelists (... user text PRIMARY KEY,... phoneNumbers map<text,text>);

Inserting:

cqlsh:test> INSERT INTO phonelists (user, phonenumbers)... VALUES ('john',{'patricia':'555-4326','doug':'555-1579'});
cqlsh:test> INSERT INTO phonelists (user, phonenumbers)... VALUES ('scott',{'bill':'555-7382','patricia':'555-4326','jane':'555-8743'});

Retrieving:

cqlsh:test> SELECT * FROM phonelists;

 user  | phonenumbers
-------+------------------------------------------------------
 scott |{bill:555-7382, jane:555-8743, patricia:555-4326}
  john |{doug:555-1579, patricia:555-4326}

Updating:

cqlsh:test> UPDATE phonelists  
        ... SET phonenumbers = phonenumbers +{'daniel':'555-0453'}... WHERE user='john';

The internal representation:

[default@test] list phonelists;-------------------RowKey: scott
=>(column=, value=, timestamp=1374684062860000)=>(column=phonenumbers:bill, value='555-7382', timestamp=1374684062860000)=>(column=phonenumbers:jane, value='555-8743', timestamp=1374684062860000)=>(column=phonenumbers:patricia, value='555-4326', timestamp=1374684062860000)-------------------RowKey: john
=>(column=, value=, timestamp=1374683971220000)=>(column=phonenumbers:doug, value='555-1579', timestamp=1374683971220000)=>(column=phonenumbers:patricia, value='555-4326', timestamp=1374683971220000)

Note that the above text components, e.g. ‘doug’ and ’555-1579′, are returned by cassandra-cli in their hex encoding. I’ve taken the liberty to decode them so that the print out is more understandable. I will do this throughout the post.

Lists

Defining:

cqlsh:test> CREATE TABLE friendlists (... user text PRIMARY KEY,... friends list <text>...);

Inserting:

cqlsh:test> INSERT INTO friendlists (user, friends)... VALUES ('john',['doug','patricia','scott']);
cqlsh:test> INSERT INTO friendlists (user, friends)... VALUES ('patricia',['john','lucifer']);

Retrieving:

cqlsh:test> SELECT * FROM friendlists;

 user     | friends
----------+-------------------------
     john |[doug, patricia, scott]
 patricia |[john, lucifer]

Updating:

cqlsh:test> UPDATE friendlists  
        ... SET friends = friends +['matt','eric']... WHERE user='john';
cqlsh:test> UPDATE friendlists  
        ... SET friends = friends -['lucifer']... WHERE user='patricia';

The internal representation:

[default@test] list friendlists;Usingdefault limit of 100Usingdefault column limit of 100-------------------RowKey: john
=>(column=, value=, timestamp=1374687324950000)=>(column=friends:26017c10f48711e2801fdf9895e5d0f8, value='doug', timestamp=1374687206993000)=>(column=friends:26017c11f48711e2801fdf9895e5d0f8, value='patricia', timestamp=1374687206993000)=>(column=friends:26017c12f48711e2801fdf9895e5d0f8, value='scott', timestamp=1374687206993000)=>(column=friends:6c504b60f48711e2801fdf9895e5d0f8, value='matt', timestamp=1374687324950000)=>(column=friends:6c504b61f48711e2801fdf9895e5d0f8, value='eric', timestamp=1374687324950000)-------------------RowKey: patricia
=>(column=, value=, timestamp=1374687352290000)=>(column=friends:3b817b80f48711e2801fdf9895e5d0f8, value='john', timestamp=1374687243064000)

Here the internal column name is more complicated because a UUID is appended to the name of the CQL field “friend”. This is used to keep track of the order of items in the list.

To be determined: Does a list item delete take more time than a list insert? I suspect so – I don’t see how Cassandra can delete an element in the list without reading in all the elements in the list and then deleting column that has the value indicated in the delete.

Set

Defining:

cqlsh:test> CREATE TABLE friendsets (... user text PRIMARY KEY,... friends set<text>...);

Inserting:

cqlsh:test> INSERT INTO friendsets (user, friends)... VALUES ('john',{'doug','patricia','scott'});
cqlsh:test> INSERT INTO friendsets (user, friends)... VALUES ('patricia',{'john','lucifer'});

Retrieving:

cqlsh:test> SELECT * FROM friendsets;

 user     | friends
----------+-------------------------
     john |{doug, patricia, scott}
 patricia |{john, lucifer}

Updating:

cqlsh:test> UPDATE friendsets  
        ... SET friends = friends +{'matt','eric'}... WHERE user='john';
cqlsh:test> UPDATE friendsets  
        ... SET friends = friends -{'lucifer'}... WHERE user='patricia';

The internal representation:

[default@test] list friendsets;Usingdefault limit of 100Usingdefault column limit of 100-------------------RowKey: john
=>(column=, value=, timestamp=1374688135443000)=>(column=friends:'doug', value=, timestamp=1374688108307000)=>(column=friends:'eric', value=, timestamp=1374688135443000)=>(column=friends:'matt', value=, timestamp=1374688135443000)=>(column=friends:'patricia', value=, timestamp=1374688108307000)=>(column=friends:'scott', value=, timestamp=1374688108307000)-------------------RowKey: patricia
=>(column=, value=, timestamp=1374688151386000)=>(column=friends:'john', value=, timestamp=1374688116595000)

- See more at: http://www.opensourceconnections.com/2013/07/24/understanding-how-cql3-maps-to-cassandras-internal-data-structure-sets-lists-and-maps/#sthash.6nKcPjFs.dpuf

Published at DZone with permission of John Berryman, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)