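1. Using counters
Counters record the progress and status of a job. MapReduce counters give us a window into the runtime details of a MapReduce job. They are very useful for performance tuning: most evaluation of MapReduce optimizations is based on the values these counters report.
MapReduce ships with many default counters. In the log of a running MR program, you may have noticed output like the following: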
Shuffle Errors
BAD_ID=0
CONNECTION=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=89
File Output Format Counters
Bytes Written=86
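The built-in counters include:
File system counters (File System Counters)
Job counters (Job Counters)
MapReduce framework counters (Map-Reduce Framework)
Shuffle error counters (Shuffle Errors)
File input format counters (File Input Format Counters)
File output format counters (File Output Format Counters)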
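Hadoop also supports custom counters. In production code you often need a global count of the malformed records encountered during processing; the global counters provided by the MapReduce framework cover this kind of requirement.
Example code: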
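import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCount {
    static class WordCountMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Custom counter: group "SelfCounters", name "myCounters"
            Counter counter = context.getCounter("SelfCounters", "myCounters");
            String[] words = value.toString().split(",");
            for (String word : words) {
                // Count every occurrence of the word "hello"
                if ("hello".equals(word)) {
                    counter.increment(1);
                }
                context.write(new Text(word), new LongWritable(1));
            }
        }
    }
}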
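To make the "global" part of a global counter concrete, here is a minimal sketch in plain Java with no Hadoop dependency. It is only a simplified model, not how Hadoop implements counters: the class name CounterSketch, the methods runMapTask and aggregate, and the combined key "SelfCounters.myCounters" are all hypothetical names chosen for illustration. Each simulated map task keeps its own local count, and the per-task values are then summed into one job-wide total.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified, Hadoop-free model: every task counts locally,
// then the per-task counters are summed into one job-wide view.
class CounterSketch {
    // Hypothetical key standing in for group "SelfCounters", name "myCounters".
    static final String KEY = "SelfCounters.myCounters";

    // Simulates one map task: counts the word "hello" in its own input split.
    static Map<String, Long> runMapTask(List<String> lines) {
        Map<String, Long> local = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.split(",")) {
                if ("hello".equals(word)) {
                    local.merge(KEY, 1L, Long::sum);
                }
            }
        }
        return local;
    }

    // Sums the counters reported by all tasks into the global totals.
    static Map<String, Long> aggregate(List<Map<String, Long>> perTask) {
        Map<String, Long> total = new TreeMap<>();
        for (Map<String, Long> task : perTask) {
            task.forEach((name, value) -> total.merge(name, value, Long::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Long> task1 = runMapTask(List.of("hello,world", "hello,hadoop"));
        Map<String, Long> task2 = runMapTask(List.of("foo,hello,bar"));
        // prints {SelfCounters.myCounters=3}
        System.out.println(aggregate(List.of(task1, task2)));
    }
}
```

In a real job the framework does this aggregation for you. After the job completes, the driver can typically read the total with job.getCounters().findCounter("SelfCounters", "myCounters").getValue().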
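2. Chaining multiple jobs
A moderately complex processing pipeline often needs several MapReduce programs to run one after another. Chaining multiple jobs can be done with the JobControl class provided by the MapReduce framework.
Example code: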
// Fragment: assumed to sit inside a driver method that returns an int
// exit code and declares "throws Exception".
ControlledJob controlledJob1 = new ControlledJob(job1.getConfiguration());
controlledJob1.setJob(job1);
ControlledJob controlledJob2 = new ControlledJob(job2.getConfiguration());
controlledJob2.setJob(job2);
controlledJob2.addDependingJob(controlledJob1); // job2 depends on job1

JobControl jc = new JobControl(chainName);
jc.addJob(controlledJob1);
jc.addJob(controlledJob2);

// JobControl implements Runnable, so run it on its own thread
Thread jcThread = new Thread(jc);
jcThread.start();

while (true) {
    // Check failures first: allFinished() only means that no job is
    // still pending or running, not that every job succeeded.
    if (jc.getFailedJobList().size() > 0) {
        System.out.println(jc.getFailedJobList());
        jc.stop();
        return 1;
    }
    if (jc.allFinished()) {
        System.out.println(jc.getSuccessfulJobList());
        jc.stop();
        return 0;
    }
    Thread.sleep(500); // poll periodically instead of busy-waiting
}
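The dependency rule behind addDependingJob can be illustrated with a small plain-Java sketch that does not depend on Hadoop. This is a simplified model under stated assumptions, not JobControl's actual implementation: JobChainSketch, SketchJob, and runAll are hypothetical names, jobs run synchronously rather than on a cluster, and the state names only loosely mirror those of ControlledJob. A job runs once all of its dependencies have succeeded, and is marked DEPENDENT_FAILED if any dependency failed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Simplified, Hadoop-free model of dependency-ordered job execution.
class JobChainSketch {
    enum State { WAITING, SUCCESS, FAILED, DEPENDENT_FAILED }

    static final class SketchJob {
        final String name;
        final BooleanSupplier work; // returns true when the job succeeds
        final List<SketchJob> dependsOn = new ArrayList<>();
        State state = State.WAITING;

        SketchJob(String name, BooleanSupplier work) {
            this.name = name;
            this.work = work;
        }

        // This job may only start after "other" has succeeded.
        void addDependingJob(SketchJob other) {
            dependsOn.add(other);
        }
    }

    // Keeps sweeping over the jobs until no job changes state any more.
    static Map<String, State> runAll(List<SketchJob> jobs) {
        boolean progressed = true;
        while (progressed) {
            progressed = false;
            for (SketchJob job : jobs) {
                if (job.state != State.WAITING) {
                    continue;
                }
                boolean anyFailed = job.dependsOn.stream().anyMatch(
                        d -> d.state == State.FAILED || d.state == State.DEPENDENT_FAILED);
                boolean allSucceeded = job.dependsOn.stream().allMatch(
                        d -> d.state == State.SUCCESS);
                if (anyFailed) {
                    job.state = State.DEPENDENT_FAILED; // never runs
                    progressed = true;
                } else if (allSucceeded) {
                    job.state = job.work.getAsBoolean() ? State.SUCCESS : State.FAILED;
                    progressed = true;
                }
            }
        }
        Map<String, State> result = new LinkedHashMap<>();
        for (SketchJob job : jobs) {
            result.put(job.name, job.state);
        }
        return result;
    }

    public static void main(String[] args) {
        SketchJob job1 = new SketchJob("job1", () -> true);
        SketchJob job2 = new SketchJob("job2", () -> true);
        job2.addDependingJob(job1);
        // job2 is listed first but still waits for job1
        System.out.println(runAll(List.of(job2, job1)));
    }
}
```

The sketch also shows why checking the failed list matters in a polling loop: when job1 fails, job2 never runs, yet nothing is left waiting, so "everything has finished" and "everything succeeded" are different conditions.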
Reposted from: https://blog.51cto.com/13587708/2295809