Example Introduction
【Example Summary】
A detailed explanation of how HBase is implemented.
    this.connection.locateRegion(tableName, HConstants.EMPTY_START_ROW);
    this.writeBufferSize = this.configuration.getLong("hbase.client.write.buffer", 2097152);
    this.clearBufferOnFail = true;
    this.autoFlush = true;
    this.currentWriteBufferSize = 0;
    this.scannerCaching = this.configuration.getInt("hbase.client.scanner.caching", 1);
    this.maxKeyValueSize = this.configuration.getInt("hbase.client.keyvalue.maxsize", -1);
    this.closed = false;

When an HTable is instantiated, it obtains an HConnection implementation from the HConnectionManager class. HConnectionManager caches HConnection instances in a Map whose key is an HConnectionKey built from the configuration and whose value is the HConnection implementation class, HConnectionImplementation:

    public class HConnectionManager {
      // An LRU map of HConnectionKey -> HConnection (TableServer). All access must be synchronized.
      static final Map<HConnectionKey, HConnectionImplementation> HBASE_INSTANCES;

      public static HConnection getConnection(Configuration conf) {
        HConnectionKey connectionKey = new HConnectionKey(conf);
        synchronized (HBASE_INSTANCES) {
          HConnectionImplementation connection = HBASE_INSTANCES.get(connectionKey);
          if (connection == null) {
            connection = new HConnectionImplementation(conf, true, null);
            HBASE_INSTANCES.put(connectionKey, connection);
          }
          connection.incCount();   // bump the connection's reference count
          return connection;
        }
      }
    }

HConnection

    public Configuration getConfiguration();

Returns the configuration object used by this HConnection instance.

    public HTableInterface getTable(String tableName);
    public HTableInterface getTable(byte[] tableName);
    public HTableInterface getTable(String tableName, ExecutorService pool);
    public HTableInterface getTable(byte[] tableName, ExecutorService pool);

The table name can be a String or a byte[], optionally together with an ExecutorService thread pool. Implementations of the returned HTableInterface include HTable, PooledHTable and RemoteHTable.

    public ZooKeeperWatcher getZooKeeperWatcher();

Returns a ZooKeeperWatcher that this connection can use, through which the -ROOT- and .META. table information, session information and so on can be obtained.

    public HMasterInterface getMaster();
    public boolean isMasterRunning();

Obtains a connection to the HMaster server, or checks whether it is running.

    public boolean isTableEnabled(byte[] tableName);
    public boolean isTableDisabled(byte[] tableName);
    public boolean isTableAvailable(byte[] tableName);
    public HTableDescriptor[] listTables();

The isTable* methods check a table's state. listTables() returns all user tables: it scans the .META. table and returns an HTableDescriptor[] array with one entry per user table.

    public HTableDescriptor getHTableDescriptor(byte[] tableName);
    public HTableDescriptor[] getHTableDescriptors(List<String> tableNames);

Given a byte[] table name, returns that table's HTableDescriptor; the variant taking a list returns an array.

    public HRegionLocation locateRegion(final byte[] tableName, final byte[] row);
    public HRegionLocation relocateRegion(final byte[] tableName, final byte[] row);
    public HRegionLocation locateRegion(final byte[] regionName);
    public List<HRegionLocation> locateRegions(final byte[] tableName);
    public List<HRegionLocation> locateRegions(final byte[] tableName, final boolean useCache, final boolean offlined);
    HRegionLocation getRegionLocation(byte[] tableName, byte[] row, boolean reload);

Locates region information from a table name and a row. Without a row, the result is the table's list of regions: a table is split into multiple regions, and a given row belongs to exactly one of them.

    public HRegionInterface getHRegionConnection(final String hostname, final int port);
    public HRegionInterface getHRegionConnection(final String hostname, final int port, boolean getMaster);

Given a host and port, returns a connection instance to that RegionServer. The relationship between RegionServer and Region is similar to that between NameNode and BlockLocation, or between DataNode and Block.
RegionServers, like NameNodes and DataNodes, are real physical machines; a connection to a RegionServer is established from its hostname (or IP address) and port.

    public <T> T getRegionServerWithRetries(ServerCallable<T> callable);
    public <T> T getRegionServerWithoutRetries(ServerCallable<T> callable);

Pass in a ServerCallable whose call() method implements your own logic. The "With" variant retries and the "Without" variant does not. The retrying variant manages waiting between attempts and re-finding region information that has been lost (a region that has become invalid must be looked up again).

    public void processBatch(List<? extends Row> actions, final byte[] tableName, ExecutorService pool, Object[] results);
    public <R> void processBatchCallback(List<? extends Row> list, byte[] tableName, ExecutorService pool, Object[] results, Batch.Callback<R> callback);

Batch processing; actions can be Put, Delete and Get. All actions destined for the same RegionServer are sent in a single RPC.

    public void clearRegionCache();
    public void clearRegionCache(final byte[] tableName);
    public void deleteCachedRegionLocation(final HRegionLocation location);
    public void prewarmRegionCache(final byte[] tableName, final Map<HRegionInfo, HServerAddress> regions);
    public void clearCaches(final String serverName);

The client caches region information; these methods pre-populate the cache or clear specific entries or all of it.

Main methods:

    Method                                   Description
    HTableInterface getTable                 Get an HTable instance
    HTableDescriptor getHTableDescriptor     Get a table descriptor
    HMasterInterface getMaster               Connect to the Master server
    HRegionInterface getHRegionConnection    Connect to a RegionServer
    HRegionLocation locateRegion             Locate the region holding a given row of a table
    T getRegionServerWithRetries             Connect to a RegionServer and invoke a ServerCallable
    void processBatch                        Batch processing

HConnectionManager

HConnectionManager is the management class for HConnection; management typically means adding and deleting HConnection instances. HConnectionImplementation is an inner class of HConnectionManager that implements the HConnection interface:

    /** Encapsulates connection to zookeeper and regionservers. */
    static class HConnectionImplementation implements HConnection, Closeable {
      private final Class<? extends HRegionInterface> serverInterfaceClass;   // HRegionServer; instantiated via reflection
      private final long pause;
      private final int numRetries;
      private final int maxRPCAttempts;
      private final int rpcTimeout;
      private final int prefetchRegionLimit;
      private final Object masterLock = new Object();
      private volatile boolean closed;
      private volatile boolean aborted;
      private volatile boolean resetting;
      private volatile HMasterInterface master;   // HMaster
      private volatile ZooKeeperWatcher zooKeeper;   // ZooKeeper reference
      private volatile MasterAddressTracker masterAddressTracker;   // ZooKeeper-based master address tracker
      private volatile RootRegionTracker rootRegionTracker;
      private volatile ClusterId clusterId;
      private final Object metaRegionLock = new Object();
      private final Object userRegionLock = new Object();
      private final Object resetLock = new Object();
      // thread executor shared by all HTableInterface instances created by this connection
      private volatile ExecutorService batchPool = null;
      private volatile boolean cleanupPool = false;
      private final Configuration conf;
      private RpcEngine rpcEngine;
      // Known region HServerAddress.toString() -> HRegionInterface (RegionServer)
      private final Map<String, HRegionInterface> servers = new ConcurrentHashMap<String, HRegionInterface>();
      private final ConcurrentHashMap<String, String> connectionLock = new ConcurrentHashMap<String, String>();
      // Map of table to table HRegionLocations. The table key is made by doing a Bytes#mapKey(byte[]) of the table's name.
      private final Map<Integer, SoftValueSortedMap<byte[], HRegionLocation>> cachedRegionLocations =
          new HashMap<Integer, SoftValueSortedMap<byte[], HRegionLocation>>();
      // The presence of a server in the map implies it's likely that there is an entry in cachedRegionLocations
      // that maps to this server, but the absence of a server in this map guarantees that there is no entry
      // in cache that maps to the absent server.
      private final Set<String> cachedServers = new HashSet<String>();
      // Region cache prefetch is enabled by default. This set contains all tables whose region cache prefetch is disabled.
      private final Set<Integer> regionCachePrefetchDisabledTables = new CopyOnWriteArraySet<Integer>();
      private int refCount;
      private final boolean managed;   // indicates whether this connection's life cycle is managed
      ...
    }

Connecting to the Master:

    public HMasterInterface getMaster() throws MasterNotRunningException, ZooKeeperConnectionException {
      // Check if we already have a good master connection
      if (master != null && master.isMasterRunning()) return master;
      ensureZookeeperTrackers();    // initialize ZooKeeper and the master address tracker
      checkIfBaseNodeAvailable();   // check that the base znode is available
      ServerName sn = null;         // the address tracker supplies the master's address: hostname and port
      synchronized (this.masterLock) {   // take the lock
        if (master != null && master.isMasterRunning()) return master;
        this.master = null;
        // (exception handling omitted in this excerpt)
        for (int tries = 0; !this.closed && this.master == null && tries < numRetries; tries++) {
          sn = masterAddressTracker.getMasterAddress();
          InetSocketAddress isa = new InetSocketAddress(sn.getHostname(), sn.getPort());
          // connect to the Master
          HMasterInterface tryMaster = rpcEngine.getProxy(HMasterInterface.class, HMasterInterface.VERSION, isa, conf, rpcTimeout);
          if (tryMaster.isMasterRunning()) {
            this.master = tryMaster;
            this.masterLock.notifyAll();   // wake up threads waiting on masterLock
            break;   // once connected to the Master there is no need to retry; only failures are retried
          }
          // Cannot connect to master or it is not running. Sleep & retry.
          this.masterLock.wait(ConnectionUtils.getPauseTime(this.pause, tries));
        }
      }
      return this.master;
    }

Connecting to a RegionServer:

    HRegionInterface getHRegionConnection(final String hostname, final int port,
        final InetSocketAddress isa, final boolean master) throws IOException {
      if (master) getMaster();
      HRegionInterface server;
      String rsName = null;
      if (isa != null) {
        rsName = Addressing.createHostAndPortStr(isa.getHostName(), isa.getPort());
      } else {
        rsName = Addressing.createHostAndPortStr(hostname, port);
      }
      ensureZookeeperTrackers();
      // See if we already have a connection (common case)
      server = this.servers.get(rsName);
      if (server == null) {
        // create a unique lock for this RS (if necessary)
        this.connectionLock.putIfAbsent(rsName, rsName);
        // get the RS lock: this synchronizes on a per-RegionServer lock, not on the servers map
        synchronized (this.connectionLock.get(rsName)) {
          // do one more lookup in case we were stalled above
          server = this.servers.get(rsName);
          if (server == null) {
            // definitely a cache miss: establish an RPC connection to this RegionServer
            // Only create isa when we need to
            InetSocketAddress address = isa != null ? isa : new InetSocketAddress(hostname, port);
            server = HBaseRPC.waitForProxy(this.rpcEngine, serverInterfaceClass, HRegionInterface.VERSION,
                address, this.conf, this.maxRPCAttempts, this.rpcTimeout, this.rpcTimeout);
            this.servers.put(Addressing.createHostAndPortStr(address.getHostName(), address.getPort()), server);
          }
        }
      }
      return server;
    }

Region location

As a user table grows, it is split into multiple regions stored on multiple RegionServers. Each region is served by exactly one RegionServer, while one RegionServer can host many regions. To query data, a client first contacts ZooKeeper, then reads the -ROOT- table, then the .META. table, and finally determines where the data lives.

    private HRegionLocation locateRegion(final byte[] tableName, final byte[] row,
        boolean useCache, boolean retry) throws IOException {
      ensureZookeeperTrackers();
      if (Bytes.equals(tableName, HConstants.ROOT_TABLE_NAME)) {
        ServerName servername = this.rootRegionTracker.waitRootRegionLocation(this.rpcTimeout);
        if (servername == null) return null;
        return new HRegionLocation(HRegionInfo.ROOT_REGIONINFO, servername.getHostname(), servername.getPort());
      } else if (Bytes.equals(tableName, HConstants.META_TABLE_NAME)) {
        return locateRegionInMeta(HConstants.ROOT_TABLE_NAME, tableName, row, useCache, metaRegionLock, retry);
      } else {
        // Region not in the cache - have to go to the meta RS
        return locateRegionInMeta(HConstants.META_TABLE_NAME, tableName, row, useCache, userRegionLock, retry);
      }
    }

Locating the region for one row of a user table is a recursive process: locateRegionInMeta calls locateRegion.

    // Search one of the meta tables (-ROOT- or .META.) for the HRegionLocation info
    // that contains the table and row we're seeking.
    private HRegionLocation locateRegionInMeta(final byte[] parentTable, final byte[] tableName,
        final byte[] row, boolean useCache, Object regionLockObject, boolean retry) throws IOException {
      HRegionLocation location;
      // If we are supposed to be using the cache, look in the cache to see if we already have the region.
      if (useCache) {
        location = getCachedLocation(tableName, row);
        if (location != null) return location;
      }
      int localNumRetries = retry ? numRetries : 1;
      // build the key of the meta region we should be looking for.
      // the extra 9's on the end are necessary to allow "exact" matches without knowing the precise region names.
      byte[] metaKey = HRegionInfo.createRegionName(tableName, row, HConstants.NINES, false);
      for (int tries = 0; true; tries++) {
        if (tries >= localNumRetries) {
          throw new NoServerForRegionException("Unable to find region after " + numRetries + " tries.");
        }
        HRegionLocation metaLocation = null;
        // locate the root or meta region
        metaLocation = locateRegion(parentTable, metaKey, true, false);
        // If null still, go around again
        if (metaLocation == null) continue;
        HRegionInterface server = getHRegionConnection(metaLocation.getHostname(), metaLocation.getPort());
        Result regionInfoRow = null;
        // This block guards against two threads trying to load the meta region at the same time.
        // The first will load the meta region and the second will use the value that the first one found.
        synchronized (regionLockObject) {
          // Check the cache again for a hit in case some other thread made the same query while we were waiting on the lock.
          if (useCache) {
            location = getCachedLocation(tableName, row);
            if (location != null) return location;
            // If the parent table is META, we may want to pre-fetch some region info into the global region cache for this table.
            if (Bytes.equals(parentTable, HConstants.META_TABLE_NAME) && getRegionCachePrefetch(tableName)) {
              prefetchRegionCache(tableName, row);
            }
            location = getCachedLocation(tableName, row);
            if (location != null) return location;
          } else {
            // If we are not supposed to be using the cache, delete any existing cached location so it won't interfere.
            deleteCachedLocation(tableName, row);
          }
          // Query the root or meta region for the location of the meta region
          regionInfoRow = server.getClosestRowBefore(metaLocation.getRegionInfo().getRegionName(),
              metaKey, HConstants.CATALOG_FAMILY);
        }
        if (regionInfoRow == null) throw new TableNotFoundException(Bytes.toString(tableName));
        byte[] value = regionInfoRow.getValue(HConstants.CATALOG_FAMILY, HConstants.REGIONINFO_QUALIFIER);
        // convert the row result into the HRegionLocation we need
        HRegionInfo regionInfo = (HRegionInfo) Writables.getWritable(value, new HRegionInfo());
        value = regionInfoRow.getValue(HConstants.CATALOG_FAMILY, HConstants.SERVER_QUALIFIER);
        String hostAndPort = Bytes.toString(value);
        String hostname = Addressing.parseHostname(hostAndPort);
        int port = Addressing.parsePort(hostAndPort);
        // Instantiate the location
        location = new HRegionLocation(regionInfo, hostname, port);
        cacheLocation(tableName, location);
        return location;
      }
    }

This method touches on several points:
1. -ROOT- and .META. are also treated as tables. Their schema has a row key and a column family named info, with columns such as regioninfo and server.
2. An HRegionLocation consists of an HRegionInfo plus the RegionServer's host and port; HRegionInfo itself is made up of several fields.
3. HRegionLocation maps onto the table schema above: HRegionInfo corresponds to info:regioninfo, and host+port corresponds to info:server.
4. Reading the cache (getCachedLocation) and writing it (cacheLocation).
5. Given a row key of a table, how to find the closest row at or before it: the RegionServer's getClosestRowBefore.
6. The recursive calls.

[Figure: layout of the -ROOT-/.META. catalog tables. The row key is the region name (tableName, startKey, regionId); the info family holds regioninfo (the serialized HRegionInfo: tableName, startKey, endKey, family list), server (host+port) and serverstartcode. Example rows: ".META.,tablea,001,1222989,195856" → RS1 and "tablea,001,1223434" → RS2.]

The row key of these catalog tables (-ROOT-, .META.) is in fact the regionName of an HRegionInfo.

The recursive calls correspond to the process of finding a region (simplified; table names are written as strings for readability):

    private HRegionLocation locateRegion(final byte[] tableName, final byte[] row, boolean useCache, boolean retry) {
      if (Bytes.equals(tableName, "-ROOT-")) {
        return new HRegionLocation(HRegionInfo.ROOT_REGIONINFO, servername.getHostname(), servername.getPort());
      } else if (Bytes.equals(tableName, ".META.")) {
        return locateRegionInMeta("-ROOT-", tableName, row, useCache, metaRegionLock, retry);
      } else {
        // Region not in the cache - have to go to the meta RS
        return locateRegionInMeta(".META.", tableName, row, useCache, userRegionLock, retry);
      }
    }

    private HRegionLocation locateRegionInMeta(final byte[] parentTable, final byte[] tableName, final byte[] row, ...) {
      byte[] metaKey = HRegionInfo.createRegionName(tableName, row, HConstants.NINES, false);
      HRegionLocation metaLocation = locateRegion(parentTable, metaKey, true, false);   // locate the root or meta region
      HRegionInterface server = getHRegionConnection(metaLocation.getHostname(), metaLocation.getPort());
      Result regionInfoRow = server.getClosestRowBefore(metaLocation.getRegionInfo().getRegionName(), metaKey, "info");
      byte[] value = regionInfoRow.getValue("info", "regioninfo");   // info:regioninfo -> HRegionInfo
      HRegionInfo regionInfo = (HRegionInfo) Writables.getWritable(value, new HRegionInfo());
      value = regionInfoRow.getValue("info", "server");              // info:server -> RegionServer's host+port
      location = new HRegionLocation(regionInfo, hostname, port);    // instantiate the location
      return location;
    }

Suppose the client wants the HRegionLocation of tableA, row-001 (99999 abbreviates the string of nines in HConstants.NINES):

    locateRegion("tableA", "row-001")
      locateRegionInMeta(".META.", "tableA", "row-001")
        locateRegion(".META.", "tableA,row-001,99999")
          locateRegionInMeta("-ROOT-", ".META.", "tableA,row-001,99999")              (2)
            locateRegion("-ROOT-", ".META.,tableA,row-001,99999,99999")
              return new HRegionLocation(new HRegionInfo(0, "-ROOT-"), host+port)     (3)
            server.getClosestRowBefore(".META.,tableA,row-001,99999,99999")
              Result -> info:regioninfo -> HRegionInfo
              Result -> info:server -> RegionServer's host+port
            return new HRegionLocation(HRegionInfo, host, port)

(4) In the -ROOT- table, the row with row-key ".META.,tableA,row-001,99999,99999" gives RegionServer U1, and U1's host+port is returned.
    server.getClosestRowBefore("tableA,row-001,99999")
(5) In U1's .META. table, the row with row-key "tableA,row-001,99999" gives RegionServer U2.
(6) RegionServer U2 is returned.
Finally, the client reads the row with row-key "row-001" from the user table tableA on U2.

[Figure: how tables map to RegionServers and regions. Panels: "Table → RegionServer → Region", "Table, RegionServer (Regions)", "Table (RegionServer - Region)". Example: tableA is split into key ranges [001-234), [234-456) and [456-…), tableN into [001-189) and [189-255). RegionServer U1 hosts region0 (tableA [001-234)) and region1 (tableN [001-189)); U2 hosts region2 (tableA [234-456)); U3 hosts region3 (tableN [189-255)).]
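The recursive -ROOT- → .META. → user-table walk traced above can be modelled in memory to see why the trailing nines and a "closest row before" lookup are enough to find a region. This is a hypothetical sketch, not HBase code: each simulated server holds catalog tables as TreeMaps from region name to hosting server, TreeMap.floorEntry() stands in for getClosestRowBefore(), the -ROOT- location that HBase reads from ZooKeeper is simply a constructor argument, and region names use the shortened "table,startKey,regionId" form with "99999" as the nines suffix. The server name U0 and all rows are made-up example data.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of the -ROOT- -> .META. -> user-table lookup.
// Not HBase code: all names and data here are illustrative.
class CatalogLookupSketch {
    static final String NINES = "99999"; // shortened, as in the walkthrough

    // server name -> (catalog table name -> sorted rows: region name -> info:server)
    private final Map<String, Map<String, NavigableMap<String, String>>> servers = new HashMap<>();
    private final String rootServer; // stands in for the -ROOT- location kept in ZooKeeper

    CatalogLookupSketch(String rootServer) {
        this.rootServer = rootServer;
    }

    // Store one catalog row (region name -> hosting server) on a simulated server.
    void addCatalogRow(String server, String catalog, String regionName, String regionServer) {
        servers.computeIfAbsent(server, s -> new HashMap<>())
               .computeIfAbsent(catalog, c -> new TreeMap<>())
               .put(regionName, regionServer);
    }

    // Like createRegionName(table, row, NINES, false): "table,row,99999".
    static String metaKey(String table, String row) {
        return table + "," + row + "," + NINES;
    }

    // Returns the server hosting the region of `table` that contains `row`.
    String locateRegion(String table, String row) {
        if (table.equals("-ROOT-")) {
            return rootServer; // base case of the recursion
        }
        String parent = table.equals(".META.") ? "-ROOT-" : ".META.";
        return locateRegionInMeta(parent, table, row);
    }

    private String locateRegionInMeta(String parent, String table, String row) {
        String key = metaKey(table, row);
        String catalogServer = locateRegion(parent, key); // recursive step: who serves `parent`?
        Map<String, NavigableMap<String, String>> tables = servers.get(catalogServer);
        NavigableMap<String, String> rows = tables == null ? null : tables.get(parent);
        // floorEntry = closest row at or before `key`, standing in for getClosestRowBefore()
        Map.Entry<String, String> hit = rows == null ? null : rows.floorEntry(key);
        if (hit == null || !hit.getKey().startsWith(table + ",")) {
            throw new IllegalStateException("table not found: " + table);
        }
        return hit.getValue();
    }

    public static void main(String[] args) {
        CatalogLookupSketch c = new CatalogLookupSketch("U0");
        c.addCatalogRow("U0", "-ROOT-", ".META.,,1", "U1");        // .META. is served by U1
        c.addCatalogRow("U1", ".META.", "tableA,,1", "U2");        // first tableA region on U2
        c.addCatalogRow("U1", ".META.", "tableA,row-234,2", "U3"); // tableA region from row-234 on U3
        System.out.println(c.locateRegion("tableA", "row-001")); // U2
        System.out.println(c.locateRegion("tableA", "row-300")); // U3
    }
}
```

Plain String ordering only approximates HBase's behaviour here: HBase compares catalog row keys with dedicated comparators, so this model works only for simple names like the ones above. The startsWith check plays the role of verifying that the region found really belongs to the requested table.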
【Core Code】
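The per-RegionServer connection caching in getHRegionConnection() (an unlocked lookup first, then a lock per server name and a second lookup) can be sketched on its own. This is a simplified, hypothetical version rather than the HBase implementation: the connector function stands in for HBaseRPC.waitForProxy(), a connection is any value of type C, the "host:port" string mimics Addressing.createHostAndPortStr(), and the host names and port in the demo are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch of the caching pattern in getHRegionConnection():
// one cached connection per "host:port", created at most once per server.
class ServerConnectionCache<C> {
    private final Map<String, C> servers = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, Object> connectionLock = new ConcurrentHashMap<>();
    private final Function<String, C> connector; // stands in for HBaseRPC.waitForProxy(...)

    ServerConnectionCache(Function<String, C> connector) {
        this.connector = connector;
    }

    C getConnection(String hostname, int port) {
        String rsName = hostname + ":" + port; // mimics Addressing.createHostAndPortStr
        C server = servers.get(rsName);        // common case: already connected, no locking
        if (server == null) {
            connectionLock.putIfAbsent(rsName, new Object()); // one lock per RegionServer
            synchronized (connectionLock.get(rsName)) {
                server = servers.get(rsName);  // re-check: another thread may have connected
                if (server == null) {
                    server = connector.apply(rsName);
                    servers.put(rsName, server);
                }
            }
        }
        return server;
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger created = new AtomicInteger();
        ServerConnectionCache<String> cache = new ServerConnectionCache<>(name -> {
            created.incrementAndGet();
            return "conn-" + name;
        });
        ExecutorService pool = Executors.newFixedThreadPool(8);
        List<Future<String>> results = new ArrayList<>();
        for (int i = 0; i < 32; i++) {
            results.add(pool.submit(() -> cache.getConnection("rs1", 60020)));
        }
        for (Future<String> f : results) {
            f.get(); // wait for every request to finish
        }
        pool.shutdown();
        System.out.println(created.get()); // 1: the connection to rs1:60020 was created once
    }
}
```

Because the lock is per server name, concurrent requests for the same RegionServer create the connection only once, while requests for different servers do not block each other. The HBase code stores rsName itself as the lock value; a dedicated Object is used here, a common alternative to synchronizing on a String.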
 
                            