“HBase”

fujohnwang

2011-12-20

HBase: Past and Present

Born from the idea of Google Bigtable

Configuring HBase

Prerequisites

Java 6 or above
SSH - using passwordless login (Google “ssh passwordless login”)
DNS
NTP
ulimit and nproc

HBase needs a large number of file handles, and the system default limit is rarely enough; it generally needs to be raised to 10,000 or more.
“upping the file descriptors and nproc for the user who is running the HBase process is an operating system configuration, not an HBase configuration.”
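
A quick way to see whether the account that runs HBase has enough headroom is to read the limits from inside a process started by that user. The following is a minimal sketch using Python's standard `resource` module. The 10,000 threshold for open files comes from the note above; applying the same threshold to nproc is an assumption made only for illustration, and `limit_ok` and `check_current_user` are hypothetical helper names.

```python
import resource


def limit_ok(soft_limit, minimum):
    """Return True if a soft resource limit is at least `minimum`."""
    # An unlimited setting always satisfies the requirement.
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= minimum


def check_current_user(min_nofile=10000, min_nproc=10000):
    """Check the limits of the current process (run it as the HBase user)."""
    report = {}
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    report["nofile"] = limit_ok(soft, min_nofile)
    # RLIMIT_NPROC is not available on every platform, so guard the lookup.
    if hasattr(resource, "RLIMIT_NPROC"):
        soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
        report["nproc"] = limit_ok(soft, min_nproc)
    return report


if __name__ == "__main__":
    for name, ok in check_current_user().items():
        print(name, "ok" if ok else "too low")
```

If a limit is reported as too low, it has to be raised in the operating system's configuration, as the quote above says; nothing in HBase's own configuration changes it.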

Distributed Deployment

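The HBase configuration must be kept in sync across all nodes. You can edit it on one machine and push it out with rsync, or set up a dedicated deployment machine that manages the configuration centrally and distributes it (tools such as Puppet are presumably meant for exactly this job).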
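
The rsync approach can be sketched as a small script that builds one rsync command per node. This is only an illustration: the host names and directory paths are hypothetical, and the script prints the commands rather than running them.

```python
def rsync_command(conf_dir, host, remote_conf_dir):
    """Build the rsync command that mirrors conf_dir to one node."""
    # A trailing slash makes rsync copy the directory contents, not the directory itself.
    src = conf_dir.rstrip("/") + "/"
    dest = f"{host}:{remote_conf_dir.rstrip('/')}/"
    # -a preserves permissions and timestamps, -z compresses in transit,
    # --delete removes remote files that no longer exist locally.
    return ["rsync", "-az", "--delete", src, dest]


def sync_commands(conf_dir, hosts, remote_conf_dir):
    """One command per target node."""
    return [rsync_command(conf_dir, h, remote_conf_dir) for h in hosts]


if __name__ == "__main__":
    # Hypothetical node names and paths.
    nodes = ["node1.example.com", "node2.example.com"]
    for cmd in sync_commands("/opt/hbase/conf", nodes, "/opt/hbase/conf"):
        print(" ".join(cmd))
```

To actually push the files, each command list could be passed to `subprocess.run(cmd, check=True)`, which relies on the passwordless SSH login listed among the prerequisites.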

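When configuring HBase, the bare minimum is to override the default configuration to tell HBase:

which FileSystem to use, for example the local one or HDFS;
where the ZooKeeper deployment it should use is located.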
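
As a rough illustration of those two overrides, the sketch below generates a minimal `hbase-site.xml`. The property names `hbase.rootdir`, `hbase.zookeeper.quorum` and `hbase.cluster.distributed` are standard HBase settings; the third one is not mentioned in these notes and is included on the assumption that the cluster is fully distributed. The host names and the helper name `build_hbase_site` are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_hbase_site(root_dir, zk_quorum, distributed=True):
    """Return the text of a minimal hbase-site.xml."""
    configuration = ET.Element("configuration")
    properties = [
        # Which file system HBase stores its data on (file:// or hdfs://).
        ("hbase.rootdir", root_dir),
        # Comma-separated list of ZooKeeper hosts.
        ("hbase.zookeeper.quorum", ",".join(zk_quorum)),
        # Assumption: a distributed deployment rather than standalone mode.
        ("hbase.cluster.distributed", "true" if distributed else "false"),
    ]
    for name, value in properties:
        prop = ET.SubElement(configuration, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(configuration, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical NameNode and ZooKeeper hosts.
    print(build_hbase_site(
        "hdfs://namenode.example.com:8020/hbase",
        ["zk1.example.com", "zk2.example.com", "zk3.example.com"],
    ))
```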

Cautions

HBase will lose data unless it is running on an HDFS that has a durable sync.

HBase Replication

8.6.4. Write Ahead Log (WAL)
    The WAL is in HDFS in /hbase/.logs/ with subdirectories per region.

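It is probably necessary to understand the ephemeral nodes that region servers register in ZooKeeper, so that after a region server failover, eromanga can also switch to the new region server and start capturing changes from its new HLog.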
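
One way to reason about this is to compare two snapshots of the region servers' ephemeral znodes and see which servers disappeared and which appeared. The sketch below assumes the znode names follow the `host,port,startcode` form and that they live under a path such as `/hbase/rs`; both should be verified against the HBase version in use. It works on plain lists of names, so fetching the children from ZooKeeper is left out, and `parse_server_name` and `diff_region_servers` are hypothetical helpers.

```python
from collections import namedtuple

ServerName = namedtuple("ServerName", "host port start_code")


def parse_server_name(znode_name):
    """Split an assumed 'host,port,startcode' znode name into its parts."""
    host, port, start_code = znode_name.split(",")
    return ServerName(host, int(port), int(start_code))


def diff_region_servers(before, after):
    """Return (gone, joined) between two snapshots of znode names."""
    before_set = {parse_server_name(n) for n in before}
    after_set = {parse_server_name(n) for n in after}
    return sorted(before_set - after_set), sorted(after_set - before_set)


if __name__ == "__main__":
    # Hypothetical snapshots: rs2 restarted and came back with a new start code.
    old = ["rs1.example.com,60020,1324350000000",
           "rs2.example.com,60020,1324350000001"]
    new = ["rs1.example.com,60020,1324350000000",
           "rs2.example.com,60020,1324359999999"]
    gone, joined = diff_region_servers(old, new)
    print("gone:", gone)
    print("joined:", joined)
```

Because the start code is part of the name, a server that restarts on the same host and port shows up as one departure plus one arrival, which is the signal a log-tailing client would need in order to move to the new HLog.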

API - http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/replication/package-summary.html#requirements

Once a HRegionServer starts and is opening the regions it hosts it checks if there are some left over log files and applies those all the way down in Store.doReconstructionLog(). Replaying a log is simply done by reading the log and adding the contained edits to the current MemStore. At the end an explicit flush of the MemStore (note, this is not the flush of the log!) helps writing those changes out to disk.

|—-HLog Replay
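
The replay sequence described in the quote can be mimicked with a toy model: apply every edit from the leftover logs to an in-memory store, then flush that store once at the end. This is not HBase's implementation, only a sketch of the idea; `ToyStore`, `recover` and the `(row, column, value)` edit format are invented for illustration.

```python
class ToyStore:
    """A toy stand-in for a store with a MemStore and flushed files."""

    def __init__(self):
        self.memstore = {}      # (row, column) -> latest value
        self.store_files = []   # each flush appends one sorted snapshot

    def replay(self, log_edits):
        """Re-apply logged edits in order; later edits overwrite earlier ones."""
        for row, column, value in log_edits:
            self.memstore[(row, column)] = value

    def flush(self):
        """Write the MemStore out as a sorted snapshot and clear it."""
        if self.memstore:
            self.store_files.append(sorted(self.memstore.items()))
            self.memstore = {}


def recover(store, leftover_logs):
    """Replay every leftover log, then flush the MemStore explicitly."""
    for log in leftover_logs:
        store.replay(log)
    store.flush()
    return store


if __name__ == "__main__":
    logs = [
        [("r1", "cf:a", "1"), ("r2", "cf:a", "2")],
        [("r1", "cf:a", "3")],
    ]
    recovered = recover(ToyStore(), logs)
    print(recovered.store_files)
```

The flush at the end of `recover` corresponds to the explicit MemStore flush in the quote: once the replayed edits are on disk, the leftover logs are no longer needed to reconstruct them.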