While running a Hive statement that kicks off a MapReduce job, the following exception suddenly appeared:
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_1510149586421_0024, Tracking URL = http://shizhan:8088/proxy/application_1510149586421_0024/
Kill Command = /root/apps/hadoop-2.6.4/bin/hadoop job -Dmapred.job.tracker=ignorethis -kill job_1510149586421_0024
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-11-08 22:55:28,184 Stage-1 map = 0%, reduce = 0%
2017-11-08 22:55:37,507 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.4 sec
2017-11-08 22:55:38,539 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.4 sec
2017-11-08 22:55:39,564 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.4 sec
2017-11-08 22:55:40,588 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.4 sec
2017-11-08 22:55:41,615 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.4 sec
2017-11-08 22:55:42,644 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.4 sec
2017-11-08 22:55:43,679 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.4 sec
2017-11-08 22:55:44,709 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.4 sec
2017-11-08 22:55:45,737 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.4 sec
2017-11-08 22:55:46,770 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.4 sec
2017-11-08 22:55:47,794 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.4 sec
2017-11-08 22:55:48,823 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.4 sec
2017-11-08 22:55:49,855 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.4 sec
MapReduce Total cumulative CPU time: 1 seconds 400 msec
Ended Job = job_1510149586421_0024 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1510149586421_0024_m_000000 (and more) from job job_1510149586421_0024
Exception in thread "Thread-414" java.lang.IllegalArgumentException: Does not contain a valid host:port authority: ignorethis
	at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:212)
	at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164)
	at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:153)
	at org.apache.hadoop.hive.ql.exec.JobTrackerURLResolver.getURL(JobTrackerURLResolver.java:42)
	at org.apache.hadoop.hive.ql.exec.JobDebugger.showJobFailDebugInfo(JobDebugger.java:209)
	at org.apache.hadoop.hive.ql.exec.JobDebugger.run(JobDebugger.java:92)
	at java.lang.Thread.run(Thread.java:744)
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 1.4 sec HDFS Read: 1125 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 1 seconds 400 msec
hive> date -s '2017-11-8 22:58:00'> ;
FAILED: Parse Error: line 1:0 cannot recognize input near 'date' '-' 's'
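That `date -s` attempt fails because it is a Linux shell command, not HiveQL, so the Hive parser rejects it. It has to run in an ordinary shell on the node (or via the Hive CLI's `!` shell-escape prefix). A minimal sketch, assuming GNU `date`:

```shell
# Non-mutating sanity check that the timestamp string parses (GNU date);
# prints the epoch seconds for that instant, pinned to UTC for determinism.
TZ=UTC date -d '2017-11-08 22:58:00' +%s   # → 1510181880

# To actually set the clock, run from a shell as root:
#   date -s '2017-11-08 22:58:00'
# or, without leaving the Hive CLI, via its shell escape:
#   hive> !date -s '2017-11-08 22:58:00';
```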
Everything had always worked fine before, so this exception coming out of nowhere was rather baffling. Notably, simple queries ran without any problem; as soon as a query got slightly more complex, the exception was thrown. The exception message itself revealed little, so I went through the Hadoop logs, where the cause finally showed up.
The key log entries:
2017-11-08 22:54:52,543 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating info for app: application_1510149586421_0023
2017-11-08 22:54:52,545 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Application application_1510149586421_0023 failed 2 times due to Error launching appattempt_1510149586421_0023_000002. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
This token is expired. current time is 1510181693928 found 1510153492530
Note: System times on machines may be out of sync. Check system time and time zones.
	at sun.reflect.GeneratedConstructorAccessor47.newInstance(Unknown Source)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
	at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
	at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:123)
	at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:251)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
. Failing the application.
2017-11-08 22:54:52,545 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1510149586421_0023 State change from FINAL_SAVING to FAILED
2017-11-08 22:54:52,545 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application application_1510149586421_0023 failed 2 times due to Error launching appattempt_1510149586421_0023_000002. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
This token is expired. current time is 1510181693928 found 1510153492530
Note: System times on machines may be out of sync. Check system time and time zones.
	at sun.reflect.GeneratedConstructorAccessor47.newInstance(Unknown Source)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
	at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
	at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:123)
	at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:251)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
. Failing the application. APPID=application_1510149586421_0023
2017-11-08 22:54:52,545 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary:
Note this line in the log: "Note: System times on machines may be out of sync. Check system time and time zones."
So the root cause was that the clocks on the cluster nodes were out of sync...
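The two epoch-millisecond values in the "token is expired" message quantify the skew. A quick check with GNU `date`, using only the numbers from the log above:

```shell
current=1510181693928   # "current time is ..." from the YARN log
found=1510153492530     # "found ..." (the token's timestamp)

# Render both as UTC timestamps (drop the millisecond digits).
TZ=UTC date -d @$((current / 1000)) '+%Y-%m-%d %H:%M:%S'   # → 2017-11-08 22:54:53
TZ=UTC date -d @$((found / 1000)) '+%Y-%m-%d %H:%M:%S'     # → 2017-11-08 15:04:52

# The gap between the two clocks, in minutes.
echo "skew: $(( (current - found) / 1000 / 60 )) minutes"  # → skew: 470 minutes
```

Roughly 7.8 hours of skew, far beyond the lifetime of a YARN container token, which is why the NodeManager rejected the container-launch request as expired.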
This also explains why some simple statements succeeded while the more complex ones failed.
When I ran a simple statement, the map task and the reduce task landed on the same node, so clock skew between nodes never came into play. A more complex statement may launch one or more reduce tasks: if there are several, they are spread across different nodes whose clocks disagree, so the timestamps diverge when the output data is finally written and the job fails; even with a single reduce task, it evidently ran on a different node than the map task, and the skew alone was enough to trigger the exception (for example, the reduce side's clock could read earlier than the map side's).
Judging from the original exception, though, my own guess is that the failure happened at the stage where the reduce output was written out to its final file (the reduce phase had already reached 100%). Of course, to confirm exactly where it broke, one would have to dig into the source code.
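The lasting fix is to keep every node's clock in sync rather than hand-setting dates. A sketch of a one-shot sync pushed from one machine, assuming passwordless root ssh and `ntpdate` installed; the node list and NTP server below are placeholders, not taken from the original cluster:

```shell
# One-shot clock sync across cluster nodes (placeholders throughout).
NODES="shizhan node-1 node-2"        # substitute your cluster's hostnames
NTP_SERVER="ntp.example.com"         # or an internal NTP source
DRY_RUN=${DRY_RUN:-1}                # set DRY_RUN=0 to really execute

for host in $NODES; do
    cmd="ssh root@$host 'ntpdate -u $NTP_SERVER && hwclock --systohc'"
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "$cmd"                  # dry run: just show what would execute
    else
        eval "$cmd"
    fi
done
```

For a permanent fix, run an `ntpd` or `chronyd` service on each node instead of relying on one-shot syncs.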