Born from the idea of Google Bigtable



  1. Java 6 or above
  2. SSH - using passwordless login (Google “ssh passwordless login”)
  3. DNS
  4. NTP
  5. ulimit and nproc
  • base需要很多文件句柄,所以系统默认的文件句柄数基本上不够,一般需要设置为1w以上
  • “upping the file descriptors and nproc for the user who is running the HBase process is an operating system configuration, not an HBase configuration.”


Hbase的配置需要在所有结点之间同步, 可以在一台机器上编辑完,然后用rsync同步,或者,专门搞一台发布机,集中管理配置, 然后分发(puppet之类的软件应该是干这个事情的)。

配置hbase的时候, 最基本的需要通过override默认的配置,告知hbase:

  1. 使用什么FileSystem,是local的还是HDFS等;
  2. 要使用的zookeeper部署位置是什么;


HBase will lose data unless it is running on an HDFS that has a durable sync.

HBase Replication

8.6.4. Write Ahead Log (WAL)
    The WAL is in HDFS in /hbase/.logs/ with subdirectories per region.

应该需要了解region server在zk里的ephemeral node,以便在region server failover之后,eromanga也可以转到新的region server上从新的hlog开始抓取变更。

API - http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/replication/package-summary.html#requirements

Once a HRegionServer starts and is opening the regions it hosts it checks if there are some left over log files and applies those all the way down in Store.doReconstructionLog(). Replaying a log is simply done by reading the log and adding the contained edits to the current MemStore. At the end an explicit flush of the MemStore (note, this is not the flush of the log!) helps writing those changes out to disk.

|—-HLog Replay

HBase Replication Ref

  1. http://hbase.apache.org/replication.html
  2. http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/regionserver/wal/HLog.html
  3. http://blog.sematext.com/2011/03/11/hbase-backup-options/ - HBase Backup Options > You might want to check out at MapR’s distro for Apache Hadoop (www.mapr.com). It has consistent point-in-time snapshots, as well the ability to mirror the snapshots to another data-center for disaster-recovery.
  4. https://github.com/mozilla-metrics/akela/blob/master/src/main/java/com/mozilla/hadoop/Backup.java | http://blog.mozilla.com/data/2011/02/04/migrating-hbase-in-the-trenches/ - Mozilla Backup tool
  5. http://javamaster.wordpress.com/2010/03/19/replication-architecture-in-cassandra-and-hbase/
  6. http://koven2049.iteye.com/blog/983633
  7. How Google Serves Data From Multiple Datacenters
  8. Hbase的log管理(一)
  9. Hbase的log管理(二)
  10. HBase异常——当RegionServer Crash之后


  1. When hbase.hlog.split.skip.errors is set to false, we fail the split but thats it
  2. Figure how to deal with eof splitting logs
  3. Multi data center replication

Schema Design

in general its best to avoid using a timestamp or a sequence (e.g. 1, 2, 3) as the row-key.



  1. Apache HBase
  2. Cloudera
  3. Others


  1. hbase 源码解析之master篇1
  2. hbase 源码解析之master篇2


  1. http://www.pigi-project.org/ - Powerful, Invincible, Great Indexing for HBase
  2. ** Bigtable: A Distributed Storage System for Structured Data **
  3. HBase Schema Design - Things you need to know
  4. HBase Architecture 101 - Storage
  5. HBase Architecture 101 - Write-ahead-Log
  6. HBase Schema Design - Things you need to know
  7. http://www.spnguru.com/tag/hbase/ - 趋势科技中国研发中心SPN研发团队hbase tag
  8. Coprocessors: Support aggregate functions
  9. HBase在淘宝的应用和优化小结
  10. http://www.meetup.com/hbaseusergroup/files/
  11. https://www.guru99.com/hbase-architecture-data-flow-usecases.html - Guru99 HBase Tutorial

>>>>>> 更多阅读 <<<<<<


  1. 不但有及时新鲜的AI资讯和深度探讨
  2. 还分享AI工具、产品方法和商业机会
  3. 更有体系化精品付费内容等着你,加入星球(https://t.zsxq.com/0dI3ZA0sL) 即可免费领取。(加入之后一定记得看置顶消息呀!)