
Adam Kawa is a Hadoop developer and administrator, a co-founder of the Warsaw Hadoop User Group, and a blogger at HakunaMapData.com. Adam is a DZone MVB, is not an employee of DZone, and has posted 3 posts at DZone. You can read more from him at his website.

Recreating an HBase Table Without Violating Region Starting Keys

10.14.2012

Recently, I was using completebulkload to load a large amount of data into HBase. To keep this process well-balanced over the entire cluster (and thus faster), I was loading data into a table with pre-created regions. Since my bulkloading process failed in the middle a couple of times (due to some misconfiguration), I needed to truncate the table over and over.

I have noticed that a command like

hbase(main):017:0> truncate 'table'

will disable, drop and recreate the table with the same name and settings (number of column families, compression, TTL, block size, etc.), but it does not preserve the region boundaries. This means that if you have a table with several regions and then truncate it, the table will be recreated with a single region, e.g.:

hbase(main):018:0> create 't1', 'f1', {SPLITS => ['10', '20', '30', '40']}
hbase(main):019:0> truncate 't1'
Truncating 't1' table (it may take a while):
 - Disabling table...
 - Dropping table...
 - Creating table...

If you look at http://${hbase-master}:60010/table.jsp?name=t1, you will find that truncating leaves table t1 empty and reduces its number of regions from five (four split keys give five regions) to a single one.
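For reference, here is a minimal pure-Ruby model (not HBase code) of how split keys map to regions: N split keys produce N + 1 regions, where the first region has an empty start key and the last one an empty end key. Strings stand in for HBase's byte-array row keys, and `regions_from_splits` is a hypothetical helper used only for illustration.

```ruby
# Hypothetical helper: a pure-Ruby model of region boundaries (no HBase involved).
# Each region is a [start_key, end_key] pair; "" stands for the empty key,
# meaning "from the beginning of the table" as a start key and
# "to the end of the table" as an end key.
def regions_from_splits(split_keys)
  boundaries = [""] + split_keys.sort + [""]
  boundaries.each_cons(2).to_a
end

# The table created with SPLITS => ['10', '20', '30', '40'].
before = regions_from_splits(%w[10 20 30 40])

# A plain truncate recreates the table without any split keys.
after_truncate = regions_from_splits([])

p before          # => [["", "10"], ["10", "20"], ["20", "30"], ["30", "40"], ["40", ""]]
p after_truncate  # => [["", ""]]
```

In this model, a truncate without split keys corresponds to the empty list, which is why only one region covering the whole key space remains.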

Actually, I needed a way to truncate a table without violating its regions and their start keys (I had chosen them carefully earlier to balance the load across all region servers). I've implemented a simple script for this purpose:

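include Java

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.HBaseAdmin
import org.apache.hadoop.hbase.client.HTable
import org.apache.hadoop.hbase.util.Bytes

table_name = ARGV[0]
conf = HBaseConfiguration.create()  # reads hbase-site.xml from the classpath

# Remember the current region boundaries (one start key per region).
table = HTable.new(conf, table_name)
region_start_keys = table.getStartKeys()
table.close()

# The first region always starts at the empty key, which is not a valid
# split key, so keep only the non-empty start keys as a Java byte[][].
split_keys = region_start_keys.to_a.reject { |key| key.length == 0 }
splits = Java::byte[][split_keys.size].new
split_keys.each_with_index { |key, i| splits[i] = key }

# Save the schema, then disable, drop and recreate the (now empty) table.
admin = HBaseAdmin.new(conf)
table_descriptor = admin.getTableDescriptor(Bytes.toBytes(table_name))
admin.disableTable(table_name)
admin.deleteTable(table_name)
admin.createTable(table_descriptor, splits)
admin.close()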

You can use it in the following way:

$ hbase org.jruby.Main region-keys.rb t1

The table should be recreated with the same region boundaries, so it will look the same as before, but empty.

Published at DZone with permission of Adam Kawa, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Cristofer Weber replied on Mon, 2012/10/15 - 11:15am

 

Nice post, Adam!

I had the same problem a few months ago, doing a similar bulk load, and I solved it by writing a script with a drop followed by a create. Your idea (keeping the HTableDescriptor for reuse) could be incorporated into the standard truncate command as an optional parameter. Have you thought about that?

Regards,

Cristofer

 

Adam Kawa replied on Mon, 2012/10/15 - 6:00pm in response to: Cristofer Weber

 

Hi Cristofer,

Thank you for reading the post.

Do you mean proposing an overloaded truncate command to the HBase shell that supports an additional parameter for recreating regions' boundaries?

If yes, I can certainly file a JIRA issue and prepare a simple patch for that ;)

Kind regards,

Adam

 

Cristofer Weber replied on Tue, 2012/10/16 - 7:28am in response to: Adam Kawa

Hi Adam!

Yes, that's what I meant. After re-reading my post I saw it was not clear. It would be really nice to have this feature available.

Regards,

Cristofer 

Federico Gaule ... replied on Thu, 2013/10/17 - 2:13pm

My 2 cents:

In case you have, let's say, 4 regions:

  1. <nothing> - C
  2. D - H
  3. I - Q
  4. P - Z

You will get an error like: java.lang.IllegalArgumentException: Empty split key must not be passed in the split keys.

It's because you have one region (the first one) whose start key is empty.
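The commenter's point can be modeled in plain Ruby: the start keys returned by `getStartKeys()` include the empty start key of the first region, so they have to be filtered before being reused as split keys. This is a minimal sketch, not HBase code; strings stand in for byte arrays, `""` for the empty key, and both helper names are hypothetical.

```ruby
# Hypothetical helpers modeling region boundaries in plain Ruby (no HBase).
# Strings stand in for byte-array row keys; "" stands for the empty key.
def regions_from_splits(split_keys)
  boundaries = [""] + split_keys.sort + [""]
  boundaries.each_cons(2).to_a
end

# Turn per-region start keys into split keys by dropping the empty key,
# which only ever belongs to the first region.
def split_keys_from_start_keys(start_keys)
  start_keys.reject(&:empty?)
end

# Start keys of the four regions listed in the comment: <nothing>, D, I, P.
start_keys = ["", "D", "I", "P"]

split_keys = split_keys_from_start_keys(start_keys)
p split_keys                                                  # => ["D", "I", "P"]

# Recreating from the filtered keys reproduces the original start keys.
p regions_from_splits(split_keys).map(&:first) == start_keys  # => true

# Passing the empty key through unfiltered gives, in this model,
# a degenerate first region whose start and end keys are both empty.
p regions_from_splits(start_keys).first                       # => ["", ""]
```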


