Sqoop import from MySQL into Hive fails with an error
ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Hive exited with status 64
Error explanation:
This error indicates that Sqoop hit a problem while importing data into Hive, causing the Hive process to exit abnormally. Exit status 64 is a special exit code that generally points to a basic configuration or environment problem preventing Hive from starting properly.
24/06/25 13:07:56 INFO hive.HiveImport: OK
24/06/25 13:07:56 INFO hive.HiveImport: Time taken: 6.621 seconds
24/06/25 13:07:57 INFO hive.HiveImport: FAILED: NullPointerException null
24/06/25 13:07:58 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Hive exited with status 64
	at org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:389)
	at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:339)
	at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:240)
	at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:514)
	at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
The underlying failure is therefore hive.HiveImport: FAILED: NullPointerException null.
Check the Hive log:
cat $HIVE_HOME/logs/hive.log
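When the Sqoop console only reports "Hive exited with status 64", scanning the Hive log for the last ERROR/FAILED lines gets you to the real cause faster than reading the whole file. A minimal sketch; the helper function, the sample log, and the /tmp paths are illustrative, and the real log path depends on your $HIVE_HOME layout:

```shell
# Print the last few ERROR/FAILED lines of a Hive log, with line numbers.
# Pass the log path as an argument (in this article: $HIVE_HOME/logs/hive.log).
hive_last_errors() {
  grep -n -E 'ERROR|FAILED' "$1" | tail -n 5
}

# Demo on a small sample log (stand-in for the real hive.log).
cat > /tmp/hive_sample.log <<'EOF'
2024-06-25T13:07:57,385 INFO  ql.Driver: Returning Hive schema
2024-06-25T13:07:57,558 ERROR ql.Driver: FAILED: NullPointerException null
EOF
hive_last_errors /tmp/hive_sample.log
```

The line numbers from `grep -n` make it easy to then open the log at the right spot and read the full stack trace below the ERROR line.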
2024-06-25T13:07:57,385 INFO [bfdcbd34-f72f-47c8-8abd-ec287b0cc884 main] ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
2024-06-25T13:07:57,558 ERROR [bfdcbd34-f72f-47c8-8abd-ec287b0cc884 main] ql.Driver: FAILED: NullPointerException null
java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAccessController.getCurrentRoleNames(SQLStdHiveAccessController.java:194)
	at org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAccessControllerWrapper.getCurrentRoleNames(SQLStdHiveAccessControllerWrapper.java:155)
	at org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizationValidator.checkPrivileges(SQLStdHiveAuthorizationValidator.java:132)
	at org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizationValidator.checkPrivileges(SQLStdHiveAuthorizationValidator.java:84)
	at org.apache.hadoop.hive.ql.security.authorization.plugin.HiveAuthorizerImpl.checkPrivileges(HiveAuthorizerImpl.java:87)
	at org.apache.hadoop.hive.ql.Driver.doAuthorizationV2(Driver.java:974)
	at org.apache.hadoop.hive.ql.Driver.doAuthorization(Driver.java:
The stack trace shows the failure happens inside the authorization classes.
Check conf/hive-site.xml:
<property>
  <name>hive.security.authorization.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hive.server2.enable.doAs</name>
  <value>false</value>
</property>
<!-- admin-role users -->
<property>
  <name>hive.users.in.admin.role</name>
  <value>ljr</value>
</property>
<!-- default character set -->
<property>
  <name>default.character.set</name>
  <value>UTF-8</value>
</property>
<!-- default field delimiter -->
<property>
  <name>hive.default.fileformat.magic.code</name>
  <value>\t</value>
</property>
<!-- authorization classes -->
<property>
  <name>hive.security.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory</value>
</property>
<property>
  <name>hive.security.authenticator.manager</name>
  <value>org.apache.hadoop.hive.ql.security.SessionStateUserAuthenticator</value>
</property>
The file configures both hive.security.authorization.manager and hive.security.authenticator.manager.
hive.security.authorization.manager and hive.security.authenticator.manager are two important security-related Hive configuration parameters.
1. hive.security.authorization.manager:
This parameter specifies Hive's authorization manager.
The authorization manager performs Hive's permission checks, for example deciding whether a user is allowed to run a given query or access a given table.
Hive does not necessarily ship a usable authorization manager implementation by default; you can point this parameter at a custom authorization manager class, which must implement the authorization interface Hive provides. To configure it, set hive.security.authorization.manager to your custom class name and place the JAR containing that class on Hive's classpath (e.g. under $HIVE_HOME/lib/).
2. hive.security.authenticator.manager:
This parameter specifies Hive's authentication manager, which verifies the identity of users connecting to Hive.
By default Hive may use Hadoop's default authenticator (such as HadoopDefaultAuthenticator), but you can point this parameter at a custom authentication manager class. You may need to override the authentication manager when you want to track user actions in Hive or implement more complex authentication logic.
Both parameters were configured, but the custom hive.security.authorization.manager class was never actually placed in the lib directory, which is what caused the NullPointerException null.
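A quick way to confirm this kind of misconfiguration is to check whether any jar under $HIVE_HOME/lib actually contains the class named in hive-site.xml. A minimal sketch; the helper function, the demo directory, and the class name are all illustrative, and it assumes unzip is available:

```shell
# Search every jar in a directory for a given fully-qualified class.
# Prints the jar and returns 0 if found; prints NOT FOUND and returns 1 otherwise.
find_class_in_jars() {
  libdir="$1"
  cls="$2"
  # com.example.Foo -> com/example/Foo.class (the entry name inside the jar)
  path="$(echo "$cls" | tr '.' '/').class"
  for jar in "$libdir"/*.jar; do
    if unzip -l "$jar" 2>/dev/null | grep -q "$path"; then
      echo "found in $jar"
      return 0
    fi
  done
  echo "NOT FOUND: $cls"
  return 1
}

# Demo against an empty directory (stand-in for $HIVE_HOME/lib):
mkdir -p /tmp/demo_lib
find_class_in_jars /tmp/demo_lib com.example.MyAuthorizerFactory || true
```

If the class is reported missing, either install the jar or drop the configuration before restarting Hive.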
Unless you genuinely need a custom authorization manager, comment out these two parameters and restart the Hive services.
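After the change, the relevant section of hive-site.xml would look like this (a sketch of the fix just described; the other properties stay as they are):

```xml
<!-- authorization/authentication managers disabled: the configured custom
     class was never installed under $HIVE_HOME/lib, so leaving these set
     caused a NullPointerException during authorization checks -->
<!--
<property>
  <name>hive.security.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory</value>
</property>
<property>
  <name>hive.security.authenticator.manager</name>
  <value>org.apache.hadoop.hive.ql.security.SessionStateUserAuthenticator</value>
</property>
-->
```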
Retry the import:
[ljr@node1 sqoop]$ sqoop import \
  --connect jdbc:mysql://node1:3306/test?zeroDateTimeBehavior=CONVERT_TO_NULL \
  --username root --password 1234 \
  --table student2 --split-by id --num-mappers 1 \
  --hive-import --fields-terminated-by "," \
  --hive-overwrite --hive-table test.student \
  --null-non-string '\\N' --null-string '\\N'
24/06/25 14:20:08 INFO hive.HiveImport: Loading uploaded data into Hive
24/06/25 14:20:23 INFO hive.HiveImport: SLF4J: Class path contains multiple SLF4J bindings.
24/06/25 14:20:23 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/export/server/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
24/06/25 14:20:23 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/export/server/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
24/06/25 14:20:23 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/export/server/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
24/06/25 14:20:23 INFO hive.HiveImport: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
24/06/25 14:20:23 INFO hive.HiveImport: SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
24/06/25 14:20:29 INFO hive.HiveImport:
24/06/25 14:20:29 INFO hive.HiveImport: Logging initialized using configuration in file:/export/server/hive/conf/hive-log4j2.properties Async: true
24/06/25 14:20:37 INFO hive.HiveImport: OK
24/06/25 14:20:37 INFO hive.HiveImport: Time taken: 5.598 seconds
24/06/25 14:20:39 INFO hive.HiveImport: Loading data to table test.student
24/06/25 14:20:39 INFO hive.HiveImport: Moved: 'hdfs://node1:8020/user/hive/warehouse/test.db/student/part-m-00000' to trash at: hdfs://node1:8020/user/ljr/.Trash/Current
24/06/25 14:20:45 INFO hive.HiveImport: OK
24/06/25 14:20:45 INFO hive.HiveImport: Time taken: 7.687 seconds
24/06/25 14:20:45 INFO hive.HiveImport: Hive import complete.
The log ends with "Hive import complete."
Check the target table student:
The data was imported successfully.
This case shows that correct configuration files are essential to a working environment; before writing a configuration parameter, make sure you fully understand what it does.