【Preface】
This post describes a scan timeout problem I ran into with HBase during development.
【Problem Description】
I had just finished defining and creating the index table, and both inserts and scans against it passed in unit tests without errors. However, a full update of this table requires continuously writing to it while scanning the weather table, and every time the update reached the 100th row, a scan timeout was thrown. In hindsight, the number 100 is likely no coincidence: it matches a common scan caching value, so the timeout fires exactly when the client goes back to the RegionServer for the next batch. The same failure occurred even when I updated only a single column of a single row (and when fetching only the head of each block, the amount scanned drops sharply), which proved that scan volume itself was not the problem. The exact error is as follows:
Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.hbase.client.ScannerTimeoutException: 1255382ms passed since the last invocation, timeout is currently set to 60000
    at org.apache.hadoop.hbase.client.AbstractClientScanner$1.hasNext(AbstractClientScanner.java:97)
    at com.zxc.fox.dao.IndexDao.updateAll(IndexDao.java:118)
    at com.zxc.fox.dao.IndexDao.main(IndexDao.java:38)
Caused by: org.apache.hadoop.hbase.client.ScannerTimeoutException: 1255382ms passed since the last invocation, timeout is currently set to 60000
    at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:417)
    at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:332)
    at org.apache.hadoop.hbase.client.AbstractClientScanner$1.hasNext(AbstractClientScanner.java:94)
    ... 2 more
Caused by: org.apache.hadoop.hbase.UnknownScannerException: org.apache.hadoop.hbase.UnknownScannerException: Name: 533, already closed?
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2017)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31443)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2031)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
    at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
    at java.lang.Thread.run(Thread.java:745)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
    at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:313)
    at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:241)
    at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:62)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:310)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:291)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.UnknownScannerException): org.apache.hadoop.hbase.UnknownScannerException: Name: 533, already closed?
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2017)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31443)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2031)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
    at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
    at java.lang.Thread.run(Thread.java:745)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1199)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:216)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:300)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:31889)
    at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:200)
    ... 9 more
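
For context, the failing updateAll was structured roughly like the sketch below (HBase 1.x-era client API, matching the stack trace; the index-table name "index", column family "cf", and qualifier "q" are hypothetical stand-ins for my schema). The detail that matters, which only became clear later, is that the outer scanner over weather stays open while all the inner work runs:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class NestedScanSketch {
    public static void updateAll() throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table weatherTable = conn.getTable(TableName.valueOf("weather"));
             Table indexTable = conn.getTable(TableName.valueOf("index"));
             // Outer scanner: stays open for the whole update run.
             ResultScanner weatherRows = weatherTable.getScanner(new Scan())) {
            for (Result row : weatherRows) {
                byte[] cityId = row.getRow(); // assume the rowkey carries the city id
                // Nested scan plus updates: if this block takes longer than the
                // 60s scanner lease, the next weatherRows iteration fails with
                // UnknownScannerException ("Name: 533, already closed?").
                try (ResultScanner hits = indexTable.getScanner(new Scan(cityId))) {
                    for (Result hit : hits) {
                        Put put = new Put(hit.getRow());
                        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), cityId);
                        indexTable.put(put);
                    }
                }
            }
        }
    }
}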
【Resolution Process】
I first looked at the DataNode error logs and then followed two posts (Reference Blog 01, Reference Blog 02): I configured the timeout in code, which did not work, and modified the Hadoop configuration files, which had no effect either. After thinking harder about where the failure occurred and comparing it against methods I had written before, I finally realized that I had nested two scans. So I rewrote that part of the code: first run a single scan to collect all the city ids into a list, then drive the loop from that list instead of keeping the outer scan open. In practice this added no measurable overhead, and it eliminated the scan timeout.
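
Here is a minimal sketch of the refactored version, under the same assumptions as the sketch above (HBase 1.x client; "index", "cf", "q", and the class name are hypothetical; "weather" is the table named in this post):

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class IndexUpdateSketch {
    public static void updateAll() throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table weatherTable = conn.getTable(TableName.valueOf("weather"));
             Table indexTable = conn.getTable(TableName.valueOf("index"))) {

            // Step 1: one short-lived scan that only collects the city ids.
            // No slow work happens while this scanner is open, so its lease
            // is never at risk.
            List<byte[]> cityIds = new ArrayList<>();
            try (ResultScanner weatherRows = weatherTable.getScanner(new Scan())) {
                for (Result row : weatherRows) {
                    cityIds.add(row.getRow());
                }
            }

            // Step 2: drive the updates from the in-memory list; the outer
            // scan is gone, so there is no lease to expire between cities.
            for (byte[] cityId : cityIds) {
                try (ResultScanner hits = indexTable.getScanner(new Scan(cityId))) {
                    for (Result hit : hits) {
                        Put put = new Put(hit.getRow());
                        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), cityId);
                        indexTable.put(put);
                    }
                }
            }
        }
    }
}

The trade-off is that the full list of city ids must fit in client memory; at my data size that was no problem, which is why the change costs essentially nothing in wall-clock time.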
【Takeaways】
This problem can have several different causes, so diagnose before changing anything. Consider the following:
1. Is your own per-row processing during the scan slow? -> Optimize the processing code and tune the scan settings (e.g. setCacheBlocks(false); see the sketch after this list).
2. Is the scan's caching value set too high? -> Lower it (100 is a common starting point; also shown below).
3. Is it a network or machine-load problem? -> Investigate the cluster.
4. Is HBase itself under load? -> Check the RegionServer logs.
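
As a concrete reference for points 1 and 2, here is a minimal sketch of those scan knobs on the HBase 1.x client, together with the scanner-timeout property. Note that hbase.client.scanner.timeout.period is also read by the RegionServer from its own hbase-site.xml for the server-side lease, which is presumably why setting it only in client code (as I first tried) changed nothing:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;

public class ScanTuningSketch {
    public static Scan tunedScan() {
        Scan scan = new Scan();
        scan.setCaching(100);       // rows per RPC: smaller batches renew the lease more often
        scan.setCacheBlocks(false); // keep full scans from churning the block cache
        return scan;
    }

    public static Configuration tunedConf() {
        Configuration conf = HBaseConfiguration.create();
        // Scanner timeout in ms; must also be raised on the RegionServers
        // for the lease itself to be extended.
        conf.set("hbase.client.scanner.timeout.period", "120000");
        return conf;
    }
}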