NoSQL Zone is brought to you in partnership with:

Arnon Rotem-Gal-Oz is the director of technology research for Amdocs. Arnon has more than 20 years of experience developing, managing and architecting large distributed systems using varied platforms and technologies. Arnon is the author of SOA Patterns from Manning publications. Arnon is a DZone MVB and is not an employee of DZone and has posted 68 posts at DZone. You can read more from them at their website. View Full User Profile

Killing the HBase Zombie Table

01.16.2013
| 1726 views |
  • submit to reddit

 

zombie

One of our team leaders approached me in the hall today and asked if I could land a hand in troubleshooting something. He and our QA lead were configuring one of our test Hadoop clusters after an upgrade and they had a problem with one table they were trying to set up:

  • When they tried to create the table in HBase shell they got an error that the table exists
  • When they tried to delete the table they got an error that the table does not exist
  • HBase ships with a health-check and fix util called hbck (use: hbase hbck to run. see here for details) – they’ve run hbase reports everything is fine and dandy

Hmm, The first thing I tied to do is to look at the .META. table. This is where HBase keeps the tables and the regions they use. I thought maybe there was some just there. but it didn’t look like that. I tried to do a major compaction for it and that didn’t help either.

The next thing I tried actually found the problem. I ran the Zookeeper client (I used hbase zkcli but you can also run it via zookeeper scripts) and looked at /hbase/table (ls /hbase table) -the zombie table was listed right there with all the legit tables. HBase stores some data schema and state of each table in zookeeper to be able to coordinate between all the regionservers and it seems that during the upgrade process the system was restarted a few times. One of these restarts coincided with a removal of the table and caught it in the middle.

Ok, so that is the problem – what’s the solution? Simple just remove the offending znode from zookeeper (rmr /hbase/table/TABLE_NAME ) and restart the cluster (since the data is cached in the regionservers/hbase master to save trips to zookeeper). Also be careful not to remove any other node or you’d cause problems to other tables.

The role of ZooKeeper in HBase is not documented very well. The only online account of ZooKeeper’s role with HBase I found (save looking at the code itself of course) is really outdated. Hopefully this post will save some head scratching and time for others who find themselves with the same problem.

Anyway, I hope the next post I’ll do on ZooKeeper will be about something much nicer  :) 

Published at DZone with permission of Arnon Rotem-gal-oz, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)