HFileOutputFormat and TotalOrderPartitioner

I recently needed to add random-read support for some data, so I chose to generate HFiles and bulk load them into HBase.

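At run time the map phase finished quickly, but the reduce phase spent a long time in the sort stage. The reducer was KeyValueSortReducer and there was only one of it, so a single reducer doing the whole sort became the bottleneck. I therefore thought of using TotalOrderPartitioner so that the MR job could run multiple reducers, raising parallelism and removing the bottleneck.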
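For context, a total-order partitioner needs a sorted list of split points, one fewer than the number of reducers. The sketch below is a simplified, plain-Java illustration of how split points can be picked from a sample of keys; it is not Hadoop's code, the class and method names are hypothetical, it uses String keys for readability, and it does not handle duplicate sample keys.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Conceptual sketch only: picks evenly spaced split points from sampled keys. */
final class SampledSplitPointsSketch {

    /** Returns numReducers - 1 split points taken at even intervals of the sorted sample. */
    static List<String> chooseSplitPoints(List<String> sampledKeys, int numReducers) {
        if (numReducers < 1) {
            throw new IllegalArgumentException("numReducers must be >= 1");
        }
        if (sampledKeys.size() < numReducers) {
            throw new IllegalArgumentException("Need at least one sampled key per reducer");
        }
        List<String> sorted = new ArrayList<>(sampledKeys);
        Collections.sort(sorted);

        // Step between consecutive split points, measured in sample positions.
        float step = sorted.size() / (float) numReducers;
        List<String> splitPoints = new ArrayList<>();
        for (int i = 1; i < numReducers; i++) {
            splitPoints.add(sorted.get(Math.round(step * i)));
        }
        return splitPoints;
    }
}
```

With a single reducer the list of split points is empty, which is the situation described above: nothing divides the key space, so one reducer sorts everything.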

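So I started writing code, using not only TotalOrderPartitioner but also InputSampler.RandomSampler to generate the partition file. I ran into problems at execution time, though, and while researching them I happened to discover that HFileOutputFormat already uses TotalOrderPartitioner internally to produce the total order: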

public static void configureIncrementalLoad(Job job, HTable table)
    throws IOException {
  Configuration conf = job.getConfiguration();
  Class<? extends Partitioner> topClass;
  try {
    topClass = getTotalOrderPartitionerClass();
  } catch (ClassNotFoundException e) {
    throw new IOException("Failed getting TotalOrderPartitioner", e);
  }
  job.setPartitionerClass(topClass);
  ......


The partition file contains the start keys of the table's regions (with the smallest one removed):

private static void writePartitions(Configuration conf, Path partitionsPath,
    List<ImmutableBytesWritable> startKeys) throws IOException {
  if (startKeys.isEmpty()) {
    throw new IllegalArgumentException("No regions passed");
  }

  // We're generating a list of split points, and we don't ever
  // have keys < the first region (which has an empty start key)
  // so we need to remove it. Otherwise we would end up with an
  // empty reducer with index 0
  // (No row key sorts before the smallest start key, so drop that start key.)
  TreeSet<ImmutableBytesWritable> sorted =
      new TreeSet<ImmutableBytesWritable>(startKeys);

  ImmutableBytesWritable first = sorted.first();
  // If the smallest region start key is not the canonical minimum row key
  // (the empty byte array), throw an exception.
  if (!first.equals(HConstants.EMPTY_BYTE_ARRAY)) {
    throw new IllegalArgumentException(
        "First region of table should have empty start key. Instead has: "
        + Bytes.toStringBinary(first.get()));
  }
  sorted.remove(first);

  // Write the actual file
  FileSystem fs = partitionsPath.getFileSystem(conf);
  SequenceFile.Writer writer = SequenceFile.createWriter(fs,
      conf, partitionsPath, ImmutableBytesWritable.class, NullWritable.class);

  try {
    // Append each remaining start key to the partition file.
    for (ImmutableBytesWritable startKey : sorted) {
      writer.append(startKey, NullWritable.get());
    }
  } finally {
    writer.close();
  }
}

Because my tables were all new, each had only one region, so there could only ever be one reducer.
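To make the link between region start keys and reducers concrete, here is a conceptual sketch in plain Java. It is not the Hadoop or HBase implementation; the class and method names are hypothetical, and it assumes row keys are compared as unsigned bytes in lexicographic order. It drops the empty first start key, as writePartitions does, and then assigns each row key to the reducer whose index equals the number of split points less than or equal to that key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Conceptual sketch of total-order partitioning by region start keys. */
final class SplitPointPartitionSketch {

    /** Sorts the start keys, checks that the first is empty, and returns the rest as split points. */
    static List<byte[]> toSplitPoints(List<byte[]> regionStartKeys) {
        List<byte[]> sorted = new ArrayList<>(regionStartKeys);
        sorted.sort((a, b) -> Arrays.compareUnsigned(a, b));
        if (sorted.isEmpty() || sorted.get(0).length != 0) {
            throw new IllegalArgumentException("First region should have an empty start key");
        }
        return new ArrayList<>(sorted.subList(1, sorted.size()));
    }

    /** Returns the reducer index for rowKey: the count of split points <= rowKey. */
    static int partitionFor(byte[] rowKey, List<byte[]> splitPoints) {
        int lo = 0;
        int hi = splitPoints.size();
        // Binary search for the first split point that is strictly greater than rowKey.
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (Arrays.compareUnsigned(splitPoints.get(mid), rowKey) <= 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }
}
```

A table with a single region yields an empty list of split points, so every row key maps to reducer 0. A table with N regions yields N - 1 split points and therefore N reducers.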

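Given this, when HFileOutputFormat is used the number of reducers equals the number of regions in the HTable. If you import a very large amount of data by bulk loading HFiles, the best approach is to define the regions up front when you create the table. This technique is called Pre-Creating Regions (PCR). PCR brings other benefits too, such as fewer region splits: some of Taobao's optimizations apply PCR and turn off automatic splitting, then split manually when the system is idle, so that a busy system is not burdened further by splits.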

On Pre-Creating Regions, see: http://hbase.apache.org/book.html#precreate.regions

11.7.2. Table Creation: Pre-Creating Regions

Tables in HBase are initially created with one region by default. For bulk imports, this means that all clients will write to the same region until it is large enough to split and become distributed across the cluster. A useful pattern to speed up the bulk import process is to pre-create empty regions. Be somewhat conservative in this, because too-many regions can actually degrade performance. There are two different approaches to pre-creating splits. The first approach is to rely on the default HBaseAdmin strategy (which is implemented in Bytes.split)...

byte[] startKey = ...;       // your lowest key
byte[] endKey = ...;           // your highest key
int numberOfRegions = ...;    // # of regions to create
admin.createTable(table, startKey, endKey, numberOfRegions);

And the other approach is to define the splits yourself...

byte[][] splits = ...;   // create your own splits
admin.createTable(table, splits);
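As an illustration of the second approach, the sketch below builds a set of split keys in plain Java. It rests on an assumption that is not from the original post: that row keys begin with two lowercase hexadecimal characters (for example, a hash prefix), so the key space "00".."ff" can be cut into equal ranges. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Sketch: evenly spaced split keys for row keys that start with two lowercase hex characters. */
final class HexPrefixSplits {

    /** Returns numberOfRegions - 1 split keys, in ascending byte order. */
    static byte[][] splitKeys(int numberOfRegions) {
        if (numberOfRegions < 1 || numberOfRegions > 256) {
            throw new IllegalArgumentException("numberOfRegions must be between 1 and 256");
        }
        byte[][] splits = new byte[numberOfRegions - 1][];
        for (int i = 1; i < numberOfRegions; i++) {
            // Boundary of the i-th range within the 256 possible two-character hex prefixes.
            int boundary = (int) ((256L * i) / numberOfRegions);
            splits[i - 1] = String.format(Locale.ROOT, "%02x", boundary)
                    .getBytes(StandardCharsets.US_ASCII);
        }
        return splits;
    }
}
```

The returned array has the byte[][] shape that admin.createTable(table, splits) takes. With four regions the split keys are "40", "80", and "c0", and a bulk-load job configured through HFileOutputFormat against such a table would then get four reducers.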


Reposted from: https://www.cnblogs.com/aprilrain/archive/2013/03/27/2985064.html
