結構在堆棧和堆上都非常有用。 據我所知,不可能在Java堆棧上模擬此功能。 無法在堆棧上執行此操作真是令人遺憾,因為它極大地限制了某些并行算法的性能,但這又是另一回事。
在Java中,所有用戶定義的類型都必須存在于堆中。 在一般情況下,Java堆由垃圾收集器管理,但是Java進程中的堆更大。 通過引入直接ByteBuffer ,可以分配內存,垃圾回收器不會跟蹤該內存,因為本機代碼可將其用于任務,例如避免為IO向內核復制數據或從內核復制數據。 因此,一種管理結構的方法是在ByteBuffer中偽造它們,這是一種合理的方法。 這可以允許緊湊的數據表示,但是具有性能和大小限制。 例如,不可能有一個大于2GB的ByteBuffer,并且所有訪問都經過邊界檢查,這會影響性能。 存在使用Unsafe的替代方法,它不僅速度更快而且不受ByteBuffer的大小限制。
我要詳細介紹的方法不是傳統的Java。 如果您的問題空間正在處理大數據或極高的性能,那么就會有好處。 如果您的數據集很小,并且性能不是問題,那么請立即逃避以避免陷入本機內存管理的黑手黨。
我將詳細介紹的方法的好處是:
- 顯著改善的性能
- 更緊湊的數據表示
- 能夠處理非常大的數據集,同時避免了令人討厭的GC暫停[1]
各種選擇都有其后果。 通過采用下面詳述的方法,您將自己負責一些內存管理。 弄錯了會導致內存泄漏,或者更糟的是,您可能使JVM崩潰! 謹慎行事...
合適的例子– 貿易數據
財務應用程序面臨的一個共同挑戰是捕獲和處理大量的訂單和交易數據。 對于該示例,我將創建一個很大的內存貿易數據表,可以對它運行分析查詢。 該表將使用兩種對比方法構建。 首先,我將采用傳統的Java方法創建大型數組并引用單個Trade對象。 其次,我保持用法代碼相同,但是用可通過Flyweight模式操作的堆外結構數組替換大數組和Trade對象。
如果對于傳統的Java方法,我使用其他一些數據結構(例如Map或Tree),則內存占用量將更大,而性能會更低。
傳統的Java方法
public class TestJavaMemoryLayout
{private static final int NUM_RECORDS = 50 * 1000 * 1000;private static JavaMemoryTrade[] trades;public static void main(final String[] args){for (int i = 0; i < 5; i++){System.gc();perfRun(i);}}private static void perfRun(final int runNum){long start = System.currentTimeMillis();init();System.out.format('Memory %,d total, %,d free\n',Runtime.getRuntime().totalMemory(),Runtime.getRuntime().freeMemory());long buyCost = 0;long sellCost = 0;for (int i = 0; i < NUM_RECORDS; i++){final JavaMemoryTrade trade = get(i);if (trade.getSide() == 'B'){buyCost += (trade.getPrice() * trade.getQuantity());}else{sellCost += (trade.getPrice() * trade.getQuantity());}}long duration = System.currentTimeMillis() - start;System.out.println(runNum + ' - duration ' + duration + 'ms');System.out.println('buyCost = ' + buyCost + ' sellCost = ' + sellCost);}private static JavaMemoryTrade get(final int index){return trades[index];}public static void init(){trades = new JavaMemoryTrade[NUM_RECORDS];final byte[] londonStockExchange = {'X', 'L', 'O', 'N'};final int venueCode = pack(londonStockExchange);final byte[] billiton = {'B', 'H', 'P'};final int instrumentCode = pack( billiton);for (int i = 0; i < NUM_RECORDS; i++){JavaMemoryTrade trade = new JavaMemoryTrade();trades[i] = trade;trade.setTradeId(i);trade.setClientId(1);trade.setVenueCode(venueCode);trade.setInstrumentCode(instrumentCode);trade.setPrice(i);trade.setQuantity(i);trade.setSide((i & 1) == 0 ? 'B' : 'S');}}private static int pack(final byte[] value){int result = 0;switch (value.length){case 4:result = (value[3]);case 3:result |= ((int)value[2] << 8);case 2:result |= ((int)value[1] << 16);case 1:result |= ((int)value[0] << 24);break;default:throw new IllegalArgumentException('Invalid array size');}return result;}private static class JavaMemoryTrade{private long tradeId;private long clientId;private int venueCode;private int instrumentCode;private long price;private long quantity;private char side;public long getTradeId(){return tradeId;}public void setTradeId(final long tradeId){this.tradeId = tradeId;}public long getClientId(){return clientId;}public void setClientId(final long clientId){this.clientId = clientId;}public int getVenueCode(){return venueCode;}public void setVenueCode(final int venueCode){this.venueCode = venueCode;}public int getInstrumentCode(){return instrumentCode;}public void setInstrumentCode(final int instrumentCode){this.instrumentCode = instrumentCode;}public long getPrice(){return price;}public void setPrice(final long price){this.price = price;}public long getQuantity(){return quantity;}public void setQuantity(final long quantity){this.quantity = quantity;}public char getSide(){return side;}public void setSide(final char side){this.side = side;}}
}
緊湊型堆外結構
import sun.misc.Unsafe;import java.lang.reflect.Field;public class TestDirectMemoryLayout
{private static final Unsafe unsafe;static{try{Field field = Unsafe.class.getDeclaredField('theUnsafe');field.setAccessible(true);unsafe = (Unsafe)field.get(null);}catch (Exception e){throw new RuntimeException(e);}}private static final int NUM_RECORDS = 50 * 1000 * 1000;private static long address;private static final DirectMemoryTrade flyweight = new DirectMemoryTrade();public static void main(final String[] args){for (int i = 0; i < 5; i++){System.gc();perfRun(i);}}private static void perfRun(final int runNum){long start = System.currentTimeMillis();init();System.out.format('Memory %,d total, %,d free\n',Runtime.getRuntime().totalMemory(),Runtime.getRuntime().freeMemory());long buyCost = 0;long sellCost = 0;for (int i = 0; i < NUM_RECORDS; i++){final DirectMemoryTrade trade = get(i);if (trade.getSide() == 'B'){buyCost += (trade.getPrice() * trade.getQuantity());}else{sellCost += (trade.getPrice() * trade.getQuantity());}}long duration = System.currentTimeMillis() - start;System.out.println(runNum + ' - duration ' + duration + 'ms');System.out.println('buyCost = ' + buyCost + ' sellCost = ' + sellCost);destroy();}private static DirectMemoryTrade get(final int index){final long offset = address + (index * DirectMemoryTrade.getObjectSize());flyweight.setObjectOffset(offset);return flyweight;}public static void init(){final long requiredHeap = NUM_RECORDS * DirectMemoryTrade.getObjectSize();address = unsafe.allocateMemory(requiredHeap);final byte[] londonStockExchange = {'X', 'L', 'O', 'N'};final int venueCode = pack(londonStockExchange);final byte[] billiton = {'B', 'H', 'P'};final int instrumentCode = pack( billiton);for (int i = 0; i < NUM_RECORDS; i++){DirectMemoryTrade trade = get(i);trade.setTradeId(i);trade.setClientId(1);trade.setVenueCode(venueCode);trade.setInstrumentCode(instrumentCode);trade.setPrice(i);trade.setQuantity(i);trade.setSide((i & 1) == 0 ? 'B' : 'S');}}private static void destroy(){unsafe.freeMemory(address);}private static int pack(final byte[] value){int result = 0;switch (value.length){case 4:result |= (value[3]);case 3:result |= ((int)value[2] << 8);case 2:result |= ((int)value[1] << 16);case 1:result |= ((int)value[0] << 24);break;default:throw new IllegalArgumentException('Invalid array size');}return result;}private static class DirectMemoryTrade{private static long offset = 0;private static final long tradeIdOffset = offset += 0;private static final long clientIdOffset = offset += 8;private static final long venueCodeOffset = offset += 8;private static final long instrumentCodeOffset = offset += 4;private static final long priceOffset = offset += 4;private static final long quantityOffset = offset += 8;private static final long sideOffset = offset += 8;private static final long objectSize = offset += 2;private long objectOffset;public static long getObjectSize(){return objectSize;}void setObjectOffset(final long objectOffset){this.objectOffset = objectOffset;}public long getTradeId(){return unsafe.getLong(objectOffset + tradeIdOffset);}public void setTradeId(final long tradeId){unsafe.putLong(objectOffset + tradeIdOffset, tradeId);}public long getClientId(){return unsafe.getLong(objectOffset + clientIdOffset);}public void setClientId(final long clientId){unsafe.putLong(objectOffset + clientIdOffset, clientId);}public int getVenueCode(){return unsafe.getInt(objectOffset + venueCodeOffset);}public void setVenueCode(final int venueCode){unsafe.putInt(objectOffset + venueCodeOffset, venueCode);}public int getInstrumentCode(){return unsafe.getInt(objectOffset + instrumentCodeOffset);}public void setInstrumentCode(final int instrumentCode){unsafe.putInt(objectOffset + instrumentCodeOffset, instrumentCode);}public long getPrice(){return unsafe.getLong(objectOffset + priceOffset);}public void setPrice(final long price){unsafe.putLong(objectOffset + priceOffset, price);}public long getQuantity(){return unsafe.getLong(objectOffset + quantityOffset);}public void setQuantity(final long quantity){unsafe.putLong(objectOffset + quantityOffset, quantity);}public char getSide(){return unsafe.getChar(objectOffset + sideOffset);}public void setSide(final char side){unsafe.putChar(objectOffset + sideOffset, side);}}
}
結果
Intel i7-860 @ 2.8GHz, 8GB RAM DDR3 1333MHz,
Windows 7 64-bit, Java 1.7.0_07
=============================================
java -server -Xms4g -Xmx4g TestJavaMemoryLayout
Memory 4,116,054,016 total, 1,108,901,104 free
0 - duration 19334ms
Memory 4,116,054,016 total, 1,109,964,752 free
1 - duration 14295ms
Memory 4,116,054,016 total, 1,108,455,504 free
2 - duration 14272ms
Memory 3,817,799,680 total, 815,308,600 free
3 - duration 28358ms
Memory 3,817,799,680 total, 810,552,816 free
4 - duration 32487msjava -server TestDirectMemoryLayout
Memory 128,647,168 total, 126,391,384 free
0 - duration 983ms
Memory 128,647,168 total, 126,992,160 free
1 - duration 958ms
Memory 128,647,168 total, 127,663,408 free
2 - duration 873ms
Memory 128,647,168 total, 127,663,408 free
3 - duration 886ms
Memory 128,647,168 total, 127,663,408 free
4 - duration 884msIntel i7-2760QM @ 2.40GHz, 8GB RAM DDR3 1600MHz,
Linux 3.4.11 kernel 64-bit, Java 1.7.0_07
=================================================
java -server -Xms4g -Xmx4g TestJavaMemoryLayout
Memory 4,116,054,016 total, 1,108,912,960 free
0 - duration 12262ms
Memory 4,116,054,016 total, 1,109,962,832 free
1 - duration 9822ms
Memory 4,116,054,016 total, 1,108,458,720 free
2 - duration 10239ms
Memory 3,817,799,680 total, 815,307,640 free
3 - duration 21558ms
Memory 3,817,799,680 total, 810,551,856 free
4 - duration 23074msjava -server TestDirectMemoryLayout
Memory 123,994,112 total, 121,818,528 free
0 - duration 634ms
Memory 123,994,112 total, 122,455,944 free
1 - duration 619ms
Memory 123,994,112 total, 123,103,320 free
2 - duration 546ms
Memory 123,994,112 total, 123,103,320 free
3 - duration 547ms
Memory 123,994,112 total, 123,103,320 free
4 - duration 534ms
分析
讓我們將結果與上面承諾的3個好處進行比較。
1.顯著改善性能
這里的證據很明確。 使用堆外結構方法要快一個數量級以上。 最極端的情況是,在Sandy Bridge處理器上運行第五次 ,完成任務所需的時間相差43.2 倍 。 這也很好地說明了Sandy Bridge在可預測的數據訪問模式方面的表現。 性能不僅明顯更好,而且更加一致。 隨著堆變得碎片化,因此訪問模式變得更加隨機,性能會下降,這在以后使用標準Java方法運行時可以看到。
2.更緊湊的數據表示
對于我們的堆外表示,每個對象需要42個字節。 如示例所示,要存儲其中的5000萬個字節,我們需要2100,000,000字節。 JVM堆所需的內存為:
所需內存=總內存–可用內存–基本JVM需求
2,883,248,712 = 3,817,799,680 – 810,551,856 – 123,999,112
這意味著JVM需要多40%的內存來表示相同的數據。 產生這種開銷的原因是對Java對象的引用數組以及對象標頭。 在上一篇文章中,我討論了Java中的對象布局。
當處理非常大的數據集時,此開銷可能成為重要的限制因素。
3.能夠處理非常大的數據集,同時避免令人討厭的GC暫停
上面的示例代碼在每次運行之前強制執行GC循環,并且在某些情況下可以提高結果的一致性。 隨時刪除對System.gc()的調用,并親自觀察其中的含義。 如果運行添加以下命令行參數的測試,則垃圾收集器將詳細輸出發生的情況。
-XX:+ PrintGC -XX:+ PrintGCDetails -XX:+ PrintGCDateStamps -XX:+ PrintTenuringDistribution -XX:+ PrintHeapAtGC -XX:+ PrintGCApplicationConcurrentTime -XX:+ PrintGCApplicationStoppedTime -XX:+ PrintSafepointStatistics
通過分析輸出,我可以看到該應用程序總共進行了29個GC循環。 通過從輸出中提取指示應用程序線程何時停止的行,下面列出了暫停時間。
With System.gc() before each run
================================
Total time for which application threads were stopped: 0.0085280 seconds
Total time for which application threads were stopped: 0.7280530 seconds
Total time for which application threads were stopped: 8.1703460 seconds
Total time for which application threads were stopped: 5.6112210 seconds
Total time for which application threads were stopped: 1.2531370 seconds
Total time for which application threads were stopped: 7.6392250 seconds
Total time for which application threads were stopped: 5.7847050 seconds
Total time for which application threads were stopped: 1.3070470 seconds
Total time for which application threads were stopped: 8.2520880 seconds
Total time for which application threads were stopped: 6.0949910 seconds
Total time for which application threads were stopped: 1.3988480 seconds
Total time for which application threads were stopped: 8.1793240 seconds
Total time for which application threads were stopped: 6.4138720 seconds
Total time for which application threads were stopped: 4.4991670 seconds
Total time for which application threads were stopped: 4.5612290 seconds
Total time for which application threads were stopped: 0.3598490 seconds
Total time for which application threads were stopped: 0.7111000 seconds
Total time for which application threads were stopped: 1.4426750 seconds
Total time for which application threads were stopped: 1.5931500 seconds
Total time for which application threads were stopped: 10.9484920 seconds
Total time for which application threads were stopped: 7.0707230 secondsWithout System.gc() before each run
===================================
Test run times
0 - duration 12120ms
1 - duration 9439ms
2 - duration 9844ms
3 - duration 20933ms
4 - duration 23041msTotal time for which application threads were stopped: 0.0170860 seconds
Total time for which application threads were stopped: 0.7915350 seconds
Total time for which application threads were stopped: 10.7153320 seconds
Total time for which application threads were stopped: 5.6234650 seconds
Total time for which application threads were stopped: 1.2689950 seconds
Total time for which application threads were stopped: 7.6238170 seconds
Total time for which application threads were stopped: 6.0114540 seconds
Total time for which application threads were stopped: 1.2990070 seconds
Total time for which application threads were stopped: 7.9918480 seconds
Total time for which application threads were stopped: 5.9997920 seconds
Total time for which application threads were stopped: 1.3430040 seconds
Total time for which application threads were stopped: 8.0759940 seconds
Total time for which application threads were stopped: 6.3980610 seconds
Total time for which application threads were stopped: 4.5572100 seconds
Total time for which application threads were stopped: 4.6193830 seconds
Total time for which application threads were stopped: 0.3877930 seconds
Total time for which application threads were stopped: 0.7429270 seconds
Total time for which application threads were stopped: 1.5248070 seconds
Total time for which application threads were stopped: 1.5312130 seconds
Total time for which application threads were stopped: 10.9120250 seconds
Total time for which application threads were stopped: 7.3528590 seconds
從輸出中可以看出,垃圾回收器花費了大量時間。 當線程停止時,您的應用程序無響應。 這些測試已使用默認GC設置完成。 可以調整GC以獲得更好的結果,但這可能是一項非常熟練的工作。 我知道,即使在高吞吐量條件下,即使不在高吞吐量條件下也不施加較長的暫停時間,這可以很好地應對JVM。
在對該應用程序進行性能分析時,我可以看到大部分時間都花在分配對象并將它們提升到老一代,因為它們不適合年輕一代。 可以從計時中除去初始化成本,但這是不現實的。 如果采用傳統的Java方法,則需要先建立狀態,然后才能進行查詢。 應用程序的最終用戶必須等待狀態建立和查詢執行。
這個測試確實非常簡單。 想象一下使用100 GB規模的相似數據集。
注意:當垃圾收集器壓縮區域時,可以將彼此相鄰的對象移開很遠。 這可能會導致TLB和其他緩存未命中。
關于序列化的旁注
以這種方式使用堆外結構的一個巨大好處是,如何通過簡單的內存副本將它們很容易地序列化到網絡或存儲中,就像我在上一篇文章中所展示的那樣。 這樣,我們可以完全繞過中間緩沖區和對象分配。
結論
如果您愿意對大型數據集進行一些C風格的編程,則可以通過脫離堆控制Java中的內存布局。 如果這樣做,那么在性能,緊湊性和避免GC問題方面的好處就非常重要。 但是,這種方法不應該用于所有應用程序。 僅對于非常大的數據集或吞吐量和/或延遲的極端性能,才注意到它的優勢。
我希望Java社區可以共同認識到支持在堆和堆棧上的結構的重要性。 John Rose在這方面做了出色的工作 ,定義了如何將元組添加到JVM。 他今年在JVM語言峰會上發表的有關Arrays 2.0的演講確實值得關注。 約翰在演講中討論了結構數組的選擇和數組結構。 如果有John提出的元組可用,則此處描述的測試可以具有可比的性能,并且是更令人愉快的編程風格。 整個結構數組可以在一個動作中分配,從而繞開了跨代的單個對象的副本,并且可以緊湊的連續方式進行存儲。 這將消除此類問題中的重大GC問題。
最近,我比較了Java和.Net之間的標準數據結構。 在某些情況下,當.Net使用本機結構支持時,對于諸如地圖和字典之類的東西,我發現.Net的性能優勢是6-10倍。 讓我們盡快將其納入Java!
從結果也很明顯,如果我們要使用Java對大數據進行實時分析,那么我們的標準垃圾收集器就需要顯著改善并支持真正的并發操作。
[1] –據我所知,唯一處理非常大堆的JVM是Azul Zing祝您編程愉快,別忘了分享!
參考:來自Java的JCG合作伙伴 Martin Thompson在Mechanical Sympathy博客上的緊湊型堆外結構/堆棧在Java中 。
翻譯自: https://www.javacodegeeks.com/2012/10/compact-off-heap-structurestuples-in.html