
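0 Introduction to Stream

Home address: java.util.stream.Stream<T>
Born: it came into the world together with Java 8
Main skills: I could brag about them for three days and three nights...
Main characteristics
- Does not modify its source
- Intermediate operations are lazy (lazy evaluation, deferred execution)
- A stream only does any work once it is consumed by a terminal operation
- Iteration is implicit (internal iteration)
- ...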

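Overall, a Stream is something like an evolved Iterator. The Java 8 source code describes it like this:

A sequence of elements supporting sequential and parallel aggregate operations

It makes complex operations on collections, such as traversal, filtering, mapping, aggregation and slicing, convenient. Intermediate operations return a new Stream and never modify the original data. They are also lazy: as far as possible, all intermediate operations are executed together, in a single pass, when the final terminal operation runs.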
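
To see the laziness for yourself, here is a minimal sketch (the class name LazyDemo and the visited list are illustrative, not from the original). peek() records every element that actually flows through the pipeline: nothing is recorded until the terminal operation runs, and because limit(2) short-circuits, processing stops as soon as two even numbers have been found.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyDemo {
    // Records every element that passes through peek(), so we can observe when work happens
    static final List<Integer> visited = new ArrayList<>();

    static List<Integer> run() {
        visited.clear();
        Stream<Integer> pipeline = Arrays.asList(1, 2, 3, 4, 5, 6).stream()
                .peek(visited::add)        // side effect used only for demonstration
                .filter(i -> i % 2 == 0)   // keep even numbers
                .limit(2);                 // short-circuits after two matches
        // No terminal operation yet: nothing has been visited
        if (!visited.isEmpty()) {
            throw new IllegalStateException("pipeline ran too early");
        }
        // The terminal operation triggers a single pass over the source
        return pipeline.collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(run());   // [2, 4]
        System.out.println(visited); // [1, 2, 3, 4]
    }
}
```

Elements 5 and 6 are never touched, which is the practical payoff of fusing the intermediate operations into one pass.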

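Compared with traditional code that manipulates objects and data directly, Stream focuses on computations over a flow of elements, somewhat like the functional programming you've heard so much about.

Just how much better is it? Try it for yourself.

Given this input data: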

List<Integer> list = Arrays.asList(1, null, 3, 1, null, 4, 5, null, 2, 0);

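Task: compute the sum of the non-null odd numbers in the input, counting each distinct odd number only once.

Back when lambdas weren't even born yet, you might have written something like this: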

int s = 0;
// Put the elements into a Set first to remove duplicates
Set<Integer> set = new HashSet<>(list);
for (Integer i : set) {
    // Skip nulls and keep only odd numbers
    if (i != null && (i & 1) == 1) {
        s += i;
    }
}
System.out.println(s); // 9

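Once lambdas and Stream team up:

int sum = list.stream()
        .filter(e -> e != null && (e & 1) == 1) // keep non-null odd numbers
        .distinct()                             // count each odd number once
        .mapToInt(i -> i)                       // unbox to an IntStream
        .sum();                                 // 9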
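
To check that the two versions really agree, here is a self-contained sketch that wraps both in static methods (the class and method names are illustrative). Testing oddness with (i & 1) == 1 also works for negative numbers, because Java ints use two's complement, whereas i % 2 == 1 would miss them (-3 % 2 is -1 in Java).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class OddSum {
    // Pre-Java 8 style: deduplicate with a Set, then loop
    static int imperative(List<Integer> list) {
        int s = 0;
        Set<Integer> set = new HashSet<>(list);
        for (Integer i : set) {
            if (i != null && (i & 1) == 1) { // non-null and odd
                s += i;
            }
        }
        return s;
    }

    // Stream style: filter, deduplicate, unbox, sum
    static int withStream(List<Integer> list) {
        return list.stream()
                .filter(e -> e != null && (e & 1) == 1)
                .distinct()
                .mapToInt(i -> i)
                .sum();
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, null, 3, 1, null, 4, 5, null, 2, 0);
        System.out.println(imperative(list)); // 9
        System.out.println(withStream(list)); // 9
    }
}
```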

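1 Obtaining a Stream

Getting a Stream from lambda's other good buddies (collections and arrays)

Starting with Java 1.8, interfaces can also contain methods marked default.

java.util.Collection<E> contains the following declarations: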

public interface Collection<E> extends Iterable<E> {
    // Obtain an ordinary (sequential) stream
    default Stream<E> stream() {
        return StreamSupport.stream(spliterator(), false);
    }
    // Obtain a parallel stream
    default Stream<E> parallelStream() {
        return StreamSupport.stream(spliterator(), true);
    }
}

java.util.Arrays contains the following declarations:

public static <T> Stream<T> stream(T[] array) {
    return stream(array, 0, array.length);
}
public static IntStream stream(int[] array) {
    return stream(array, 0, array.length);
}
// Other similar overloads are not listed one by one

Example

List<String> strs = Arrays.asList("apache", "spark");
Stream<String> stringStream = strs.stream();
IntStream intStream = Arrays.stream(new int[] { 1, 25, 4, 2 });
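
The default methods and overloads above can be exercised directly. A minimal sketch (class and method names are illustrative): parallelStream() only changes how the work is split up, so for an associative operation such as integer addition it gives the same result as stream(). The three-argument Arrays.stream(array, from, to) overload streams a sub-range, with from inclusive and to exclusive.

```java
import java.util.Arrays;
import java.util.List;

class GetStreams {
    // Collection.stream(): sequential stream over the list
    static int sequentialSum(List<Integer> xs) {
        return xs.stream().mapToInt(Integer::intValue).sum();
    }

    // Collection.parallelStream(): same source, work may be split across threads
    static int parallelSum(List<Integer> xs) {
        return xs.parallelStream().mapToInt(Integer::intValue).sum();
    }

    // Arrays.stream(array, from, to): from inclusive, to exclusive
    static int rangeSum(int[] array, int from, int to) {
        return Arrays.stream(array, from, to).sum();
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(1, 25, 4, 2);
        System.out.println(sequentialSum(xs));                         // 32
        System.out.println(parallelSum(xs));                           // 32
        System.out.println(rangeSum(new int[] { 1, 25, 4, 2 }, 1, 3)); // 29
    }
}
```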

Obtaining a Stream through the Stream interface

Stream<String> stream = Stream.of("hello", "world");
Stream<String> stream2 = Stream.of("haha");
Stream<HouseInfo> stream3 = Stream.of(new HouseInfo[] { new HouseInfo(), new HouseInfo() });
// Infinite stream: 1, 3, 7, 15, ...
Stream<Integer> stream4 = Stream.iterate(1, i -> 2 * i + 1);
// Infinite stream of random numbers
Stream<Double> stream5 = Stream.generate(() -> Math.random());

Note: Stream.iterate() and Stream.generate() produce infinite streams, so you usually need to cap them yourself with limit().
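
A short sketch of capping both kinds of infinite stream with limit() (class and method names are illustrative). Without the limit, a terminal operation such as collect() or count() would never return.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class InfiniteStreams {
    // Stream.iterate: seed 1, then repeatedly apply i -> 2 * i + 1
    static List<Integer> firstOfIterate(int n) {
        return Stream.iterate(1, i -> 2 * i + 1)
                .limit(n) // without limit, collect() would never finish
                .collect(Collectors.toList());
    }

    // Stream.generate: every element comes from the Supplier
    static long countGenerated(int n) {
        return Stream.generate(Math::random)
                .limit(n)
                .count();
    }

    public static void main(String[] args) {
        System.out.println(firstOfIterate(5)); // [1, 3, 7, 15, 31]
        System.out.println(countGenerated(3)); // 3
    }
}
```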

2 Transforming a Stream

Filtering and slicing

This part is fairly simple and clear; one example is enough:

// Obtain the stream
Stream<String> stream = Stream.of(
        null, "apache", null, "apache", "apache",
        "github", "docker", "java",
        "hadoop", "linux", "spark", "alifafa");
stream
        // Drop nulls and keep the strings that contain "a"
        .filter(e -> e != null && e.contains("a"))
        // Remove duplicates (relies on equals() and hashCode(), of course)
        .distinct()
        // Take only the first three matching elements
        .limit(3)
        // Consume the stream; prints apache, java, hadoop
        .forEach(System.out::println);
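
The example above only uses limit(). The other slicing operation, skip(n), drops the first n elements, and combining the two gives simple paging. A sketch reusing the same data (the class and method names are illustrative):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

class Slicing {
    static final List<String> DATA = Arrays.asList(
            null, "apache", null, "apache", "apache",
            "github", "docker", "java",
            "hadoop", "linux", "spark", "alifafa");

    // Distinct non-null strings containing "a", then skip/limit to cut out one "page"
    static List<String> page(int skip, int size) {
        return DATA.stream()
                .filter(Objects::nonNull)
                .filter(e -> e.contains("a"))
                .distinct()
                .skip(skip)  // drop the first `skip` matches
                .limit(size) // keep at most `size` of the rest
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(page(0, 3)); // [apache, java, hadoop]
        System.out.println(page(3, 3)); // [spark, alifafa]
    }
}
```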

map/flatMap

Stream's map method is declared as follows:

<R> Stream<R> map(Function<? super T, ? extends R> mapper);

In other words, it takes an input (T: the element currently being processed) and outputs a value of another type (R).

Stream.of(null, "apache", null, "apache", "apache",
        "hadoop", "linux", "spark", "alifafa")
        .filter(e -> e != null && e.length() > 0)
        .map(str -> str.charAt(0)) // take the first character: Stream<String> -> Stream<Character>
        .forEach(System.out::println);
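
The heading also mentions flatMap, which the example above does not use. Its declaration is <R> Stream<R> flatMap(Function<? super T, ? extends Stream<? extends R>> mapper): each element is mapped to a whole Stream, and those streams are concatenated into one. A sketch contrasting the two (the input strings and names are illustrative):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class FlatMapDemo {
    // map: exactly one output per input (here, the word count of each line)
    static List<Integer> wordCounts(List<String> lines) {
        return lines.stream()
                .map(line -> line.split(" "))
                .map(words -> words.length)
                .collect(Collectors.toList());
    }

    // flatMap: each input becomes a Stream, and all of them are flattened into one
    static List<String> words(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello world", "apache spark hadoop");
        System.out.println(wordCounts(lines)); // [2, 3]
        System.out.println(words(lines));      // [hello, world, apache, spark, hadoop]
    }
}
```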

sorted

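Sorting is also fairly intuitive. There are two variants:

// Sort by the elements' implementation of the Comparable interface
Stream<T> sorted();
// Sort with a custom Comparator
Stream<T> sorted(Comparator<? super T> comparator);

Example: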

// Lists.newArrayList: presumably Guava's com.google.common.collect.Lists
List<HouseInfo> houseInfos = Lists.newArrayList(
        new HouseInfo(1, "恒大星級公寓", 100, 1), // complex ID, complex name, browse count, distance
        new HouseInfo(2, "匯智湖畔", 999, 2),
        new HouseInfo(3, "張江湯臣豪園", 100, 1),
        new HouseInfo(4, "保利星苑", 23, 10),
        new HouseInfo(5, "北顧小區", 66, 23),
        new HouseInfo(6, "北杰公寓", null, 55),
        new HouseInfo(7, "保利星苑", 77, 66),
        new HouseInfo(8, "保利星苑", 111, 12)
);
// Sort by distance, then by browse count
houseInfos.stream().sorted((h1, h2) -> {
    // Treating null as "equal" keeps this short, but it can violate the Comparator contract
    if (h1 == null || h2 == null)
        return 0;
    if (h1.getDistance() == null || h2.getDistance() == null)
        return 0;
    int ret = h1.getDistance().compareTo(h2.getDistance());
    if (ret == 0) {
        if (h1.getBrowseCount() == null || h2.getBrowseCount() == null)
            return 0;
        return h1.getBrowseCount().compareTo(h2.getBrowseCount());
    }
    return ret;
}).forEach(System.out::println); // sorted() is lazy: without a terminal operation nothing runs
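
The comparator above can be expressed more robustly with the combinators added to Comparator in Java 8. Comparator.nullsLast puts null keys at the end instead of treating them as equal to everything, which keeps the ordering consistent. In this sketch House is a hypothetical, trimmed-down stand-in for the article's HouseInfo (the house name is omitted), and sorting nulls last is an assumption about the desired behavior. sorted() is stable for ordered streams, so ties such as houses 1 and 3 keep their original order.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class SortDemo {
    // Hypothetical stand-in for HouseInfo with only the fields used here
    static final class House {
        private final int houseId;
        private final Integer browseCount;
        private final Integer distance;

        House(int houseId, Integer browseCount, Integer distance) {
            this.houseId = houseId;
            this.browseCount = browseCount;
            this.distance = distance;
        }

        int getHouseId() { return houseId; }
        Integer getBrowseCount() { return browseCount; }
        Integer getDistance() { return distance; }
    }

    // Distance ascending, then browse count ascending; null values sort last
    static final Comparator<House> BY_DISTANCE_THEN_BROWSE =
            Comparator.comparing(House::getDistance, Comparator.nullsLast(Comparator.naturalOrder()))
                    .thenComparing(House::getBrowseCount, Comparator.nullsLast(Comparator.naturalOrder()));

    static List<Integer> sortedIds(List<House> houses) {
        return houses.stream()
                .sorted(BY_DISTANCE_THEN_BROWSE)
                .map(House::getHouseId)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<House> houses = Arrays.asList(
                new House(1, 100, 1), new House(2, 999, 2), new House(3, 100, 1),
                new House(4, 23, 10), new House(5, 66, 23), new House(6, null, 55),
                new House(7, 77, 66), new House(8, 111, 12));
        System.out.println(sortedIds(houses)); // [1, 3, 2, 4, 8, 5, 6, 7]
    }
}
```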

3 Terminating/consuming a Stream

Predicate tests and basic statistics

List<Integer> list = Arrays.asList(1, 2, 3, 4, 5);
// Are all elements greater than zero?
System.out.println(list.stream().allMatch(e -> e > 0)); // true
// Is there any even number?
System.out.println(list.stream().anyMatch(e -> (e & 1) == 0)); // true
// Is no element less than zero?
System.out.println(list.stream().noneMatch(e -> e < 0)); // true
// Find the first element greater than or equal to 4
Optional<Integer> optional = list.stream().filter(e -> e >= 4).findFirst();
// If it exists, run the action passed to ifPresent
optional.ifPresent(System.out::println); // 4
// Count the elements greater than or equal to 4
System.out.println(list.stream().filter(e -> e >= 4).count()); // 2
// Get the minimum
System.out.println(list.stream().min(Integer::compareTo)); // Optional[1]
// Get the maximum
System.out.println(list.stream().max(Integer::compareTo)); // Optional[5]
// After converting to an IntStream, max() needs no comparator
System.out.println(list.stream().mapToInt(i -> i).max()); // OptionalInt[5]
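
One detail worth knowing about these terminal operations is their behavior on an empty stream: allMatch and noneMatch return true (vacuous truth), anyMatch returns false, and findFirst returns an empty Optional rather than null. allMatch, anyMatch, noneMatch and findFirst are also short-circuiting, so they may stop before consuming the whole stream. A small sketch (class and method names are illustrative):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

class MatchDemo {
    static boolean allPositive(List<Integer> xs) {
        return xs.stream().allMatch(e -> e > 0);
    }

    static boolean anyEven(List<Integer> xs) {
        return xs.stream().anyMatch(e -> (e & 1) == 0);
    }

    static boolean noneNegative(List<Integer> xs) {
        return xs.stream().noneMatch(e -> e < 0);
    }

    static Optional<Integer> firstAtLeast(List<Integer> xs, int min) {
        return xs.stream().filter(e -> e >= min).findFirst();
    }

    public static void main(String[] args) {
        List<Integer> empty = Collections.emptyList();
        // On an empty stream allMatch/noneMatch are vacuously true and anyMatch is false
        System.out.println(allPositive(empty));     // true
        System.out.println(noneNegative(empty));    // true
        System.out.println(anyEven(empty));         // false
        // findFirst returns an empty Optional rather than null
        System.out.println(firstAtLeast(empty, 4)); // Optional.empty
        System.out.println(firstAtLeast(Arrays.asList(1, 2, 3, 4, 5), 4)); // Optional[4]
    }
}
```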

reduce

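The name has no single established translation; it is rendered as "reduction" or "aggregation".

Either way, it gathers the data of a stream, after any series of transformations, into a final result, repeatedly applying some reduce function along the way.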

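reduce() has three overloads:

// No identity: the stream may be empty, so the result is wrapped in an Optional
Optional<T> reduce(BinaryOperator<T> accumulator);
// Returns T rather than Optional: for an empty stream the identity itself is returned
T reduce(T identity, BinaryOperator<T> accumulator);
// The result type U may differ from the element type; the combiner merges partial results
<U> U reduce(U identity,
             BiFunction<U, ? super T, U> accumulator,
             BinaryOperator<U> combiner);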

Example:

// Iterate over the elements, repeatedly applying (i, j) -> i + j
Integer reduce = Stream.iterate(1, i -> i + 1) // 1, 2, 3, ..., 10, ...
        .limit(10)
        .reduce(0, (i, j) -> i + j); // 55
Optional<Integer> reduce2 = Stream.iterate(1, i -> i + 1)
        .limit(10)
        .reduce((i, j) -> i + j); // Optional[55]
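
The three-argument form is needed when the result type differs from the element type. A sketch that sums string lengths (names are illustrative): the identity must be a true identity for the combiner (0 for addition), and the accumulator and combiner must be compatible, so that sequential and parallel execution give the same result.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

class ReduceDemo {
    // Three-argument reduce: the result (Integer) has a different type than the elements (String)
    static int totalLength(List<String> words) {
        return words.stream().reduce(
                0,                            // identity
                (acc, s) -> acc + s.length(), // accumulator: fold one String into the running total
                Integer::sum);                // combiner: merge two partial totals (parallel streams)
    }

    // One-argument reduce: no identity, so the result is an Optional
    static Optional<Integer> sum(List<Integer> xs) {
        return xs.stream().reduce(Integer::sum);
    }

    public static void main(String[] args) {
        System.out.println(totalLength(Arrays.asList("apache", "spark"))); // 11
        System.out.println(sum(Arrays.asList(1, 2, 3)));                  // Optional[6]
        System.out.println(sum(Collections.<Integer>emptyList()));        // Optional.empty
    }
}
```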

collect

This one is easy to understand: as the name suggests, it collects the elements of the Stream into one place.

The most general (and least used) collect method

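// The most powerful option is often the least used; this one is simply too hard to follow
<R> R collect(Supplier<R> supplier,
              BiConsumer<R, ? super T> accumulator,
              BiConsumer<R, R> combiner);
// For the meaning of these parameters, see the example below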

The one-argument version

<R, A> R collect(Collector<? super T, A, R> collector);

The key parts of the Collector interface (it is not a functional interface, so it cannot be written as a single lambda) are:

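public interface Collector<T, A, R> {
    // Creates a new mutable result container
    Supplier<A> supplier();
    // Folds one element into the container
    BiConsumer<A, T> accumulator();
    // Merges two partial containers (used by parallel streams)
    BinaryOperator<A> combiner();
    // Transforms the container into the final result
    Function<A, R> finisher();
    // Hints such as CONCURRENT, UNORDERED, IDENTITY_FINISH
    Set<Characteristics> characteristics();
}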
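
Although Collector itself is not a functional interface, its static factory Collector.of accepts lambdas and method references for each part, which avoids writing an anonymous class. A minimal sketch of a hypothetical toArrayList() collector built this way; the three-argument overload used here has no finisher, so the container itself is the result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;

class CollectorOfDemo {
    // Collector.of(supplier, accumulator, combiner): finisher is the identity in this overload
    static <T> Collector<T, ?, List<T>> toArrayList() {
        return Collector.<T, List<T>>of(
                ArrayList::new,                                           // supplier: new container
                List::add,                                                // accumulator: add one element
                (left, right) -> { left.addAll(right); return left; });  // combiner: merge containers
    }

    public static void main(String[] args) {
        List<Integer> doubled = Arrays.asList(1, 2, 3, 4, 5).stream()
                .map(i -> i * 2)
                .collect(toArrayList());
        System.out.println(doubled); // [2, 4, 6, 8, 10]
    }
}
```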

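First, an example of the three-argument collect() method. Unless you have a special case, I guarantee that after seeing it you'll never want to use it again...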

List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
ArrayList<Integer> ret1 = numbers.stream()
        .map(i -> i * 2) // double each element
        .collect(
                () -> new ArrayList<Integer>(),        // argument 1
                (list, e) -> list.add(e),              // argument 2
                (list1, list2) -> list1.addAll(list2)  // argument 3
        );
/*
 * The three arguments of collect() mean the following:
 * 1. () -> new ArrayList<Integer>()
 *        creates a new collection to store the result
 * 2. (list, e) -> list.add(e)
 *        list: the new collection created by argument 1
 *        e: the Stream element currently being processed
 *        adds the element to the newly created collection
 * 3. (list1, list2) -> list1.addAll(list2)
 *        merges two collections (when partial results are combined, e.g. in parallel streams)
 */
ret1.forEach(System.out::println);

Without lambdas, the equivalent code would look like this...

List<Integer> ret3 = numbers.stream()
        .map(i -> i * 2) // double each element
        .collect(new Supplier<List<Integer>>() {
            @Override
            public List<Integer> get() {
                // Only supplies a collection to store the elements
                return new ArrayList<>();
            }
        }, new BiConsumer<List<Integer>, Integer>() {
            @Override
            public void accept(List<Integer> list, Integer e) {
                // Add the current element to the container returned by the first argument
                list.add(e);
            }
        }, new BiConsumer<List<Integer>, List<Integer>>() {
            @Override
            public void accept(List<Integer> list1, List<Integer> list2) {
                // Merge the containers
                list1.addAll(list2);
            }
        });
ret3.forEach(System.out::println);

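Feeling sick yet...

Likewise, calling Spark's API from Java without lambdas is even more painful than the code above...

A free plug while I'm at it: this post of mine implements Spark's HelloWorld in several different versions: http://blog.csdn.net/hylexus/... It proves just how much happier the world is with lambdas...

Once you understand the three-argument collect, though, you can use constructor references and method references to make the code more concise: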

ArrayList<Integer> ret2 = numbers.stream()
        .map(i -> i * 2) // double each element
        .collect(
                ArrayList::new, // supplier
                List::add,      // accumulator
                List::addAll    // combiner
        );
ret2.forEach(System.out::println);
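
The combiner only matters when partial results have to be merged, which is exactly what happens on a parallel stream. The Javadoc of Stream.collect itself uses StringBuilder this way; the sketch below (the method name is illustrative) runs the same call sequentially and in parallel. Because the source is ordered and the combiner appends the right-hand part to the left-hand one, both produce the same string.

```java
import java.util.Arrays;
import java.util.List;

class ThreeArgCollect {
    // Supplier, accumulator and combiner as method references;
    // the combiner is what makes the same call work on a parallel stream
    static String concat(List<String> parts, boolean parallel) {
        return (parallel ? parts.parallelStream() : parts.stream())
                .collect(StringBuilder::new,   // one fresh builder per partial result
                        StringBuilder::append, // append one element
                        StringBuilder::append) // append one builder's contents to another
                .toString();
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("a", "b", "c", "d");
        System.out.println(concat(parts, false)); // abcd
        System.out.println(concat(parts, true));  // abcd (encounter order is preserved)
    }
}
```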

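Using Collectors (advanced statistics)

Both the three-argument and the one-argument collect() methods above are quite complex, and the one-argument version is the one used most often. But implementing that Collector yourself is still painful.

Fortunately, java.util.stream.Collectors already provides Collectors for the common collect operations. A very powerful utility...

All of the following examples operate on this list: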

List<HouseInfo> houseInfos = Lists.newArrayList(
        new HouseInfo(1, "恒大星級公寓", 100, 1), // complex ID, complex name, browse count, distance
        new HouseInfo(2, "匯智湖畔", 999, 2),
        new HouseInfo(3, "張江湯臣豪園", 100, 1),
        new HouseInfo(4, "保利星苑", 111, 10),
        new HouseInfo(5, "北顧小區", 66, 23),
        new HouseInfo(6, "北杰公寓", 77, 55),
        new HouseInfo(7, "保利星苑", 77, 66),
        new HouseInfo(8, "保利星苑", 111, 12)
);

All right, let the showing off begin ^_^ ...

Extracting complex names

// Collect all complex names into a List
List<String> ret1 = houseInfos.stream().map(HouseInfo::getHouseName).collect(Collectors.toList());
ret1.forEach(System.out::println);
// Collect all complex names into a Set to remove duplicates
// (you could also call distinct() first and then collect into a List)
Set<String> ret2 = houseInfos.stream().map(HouseInfo::getHouseName).collect(Collectors.toSet());
ret2.forEach(System.out::println);
// Join all complex names with "_^_"
// 恒大星級公寓_^_匯智湖畔_^_張江湯臣豪園_^_保利星苑_^_北顧小區_^_北杰公寓_^_保利星苑_^_保利星苑
String names = houseInfos.stream().map(HouseInfo::getHouseName).collect(Collectors.joining("_^_"));
System.out.println(names);
// Specify ArrayList as the collection type
ArrayList<String> collect = houseInfos.stream().map(HouseInfo::getHouseName).collect(Collectors.toCollection(ArrayList::new));
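
Since 保利星苑 appears three times in the list, a related pitfall is Collectors.toMap: without a merge function it throws IllegalStateException on a duplicate key. A sketch that counts houses per complex name using the four-argument toMap (merge function plus map supplier); the method name is illustrative, and Collectors.groupingBy(name -> name, Collectors.counting()) would be an alternative that yields Long counts.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class NameCounts {
    // toMap without a merge function throws IllegalStateException on duplicate keys,
    // so Integer::sum is supplied to add up the counts for repeated names
    static Map<String, Integer> countByName(List<String> names) {
        return names.stream()
                .collect(Collectors.toMap(
                        name -> name,   // key: the complex name
                        name -> 1,      // value: one occurrence
                        Integer::sum,   // merge: add counts for duplicate names
                        TreeMap::new)); // sorted map for predictable output
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "恒大星級公寓", "匯智湖畔", "張江湯臣豪園", "保利星苑",
                "北顧小區", "北杰公寓", "保利星苑", "保利星苑");
        Map<String, Integer> counts = countByName(names);
        System.out.println(counts.get("保利星苑")); // 3
        System.out.println(counts.size());         // 6
    }
}
```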

Maximum values

// Get the complex with the highest browse count
Optional<HouseInfo> ret3 = houseInfos.stream()
        .filter(h -> h.getBrowseCount() != null) // filter out null browse counts
        .collect(Collectors.maxBy((h1, h2) -> Integer.compare(h1.getBrowseCount(), h2.getBrowseCount())));
System.out.println(ret3.get()); // the 匯智湖畔 entry (browse count 999)
// Get the highest browse count
Optional<Integer> ret4 = houseInfos.stream()
        .filter(h -> h.getBrowseCount() != null)      // drop null browse counts
        .map(HouseInfo::getBrowseCount)               // extract the browse count
        .collect(Collectors.maxBy(Integer::compare)); // method reference comparing browse counts
System.out.println(ret4.get()); // 999

Count and sum

// Get the total count
// In practice houseInfos.size() does the same; this only demonstrates the syntax
Long total = houseInfos.stream().collect(Collectors.counting());
System.out.println(total); // 8
// Sum of browse counts
Integer ret5 = houseInfos.stream()
        .filter(h -> h.getBrowseCount() != null) // filter out null browse counts
        .collect(Collectors.summingInt(HouseInfo::getBrowseCount));
System.out.println(ret5); // 1641
// Sum of browse counts
Integer ret6 = houseInfos.stream()
        .filter(h -> h.getBrowseCount() != null)
        .map(HouseInfo::getBrowseCount)
        .collect(Collectors.summingInt(i -> i));
System.out.println(ret6); // 1641
// Sum of browse counts
int ret7 = houseInfos.stream()
        .filter(h -> h.getBrowseCount() != null)
        .mapToInt(HouseInfo::getBrowseCount) // convert to an IntStream first, then use its sum()
        .sum();
System.out.println(ret7); // 1641

Average

// Average browse count
Double ret8 = houseInfos.stream()
        .filter(h -> h.getBrowseCount() != null) // filter out null browse counts
        .collect(Collectors.averagingDouble(HouseInfo::getBrowseCount));
System.out.println(ret8); // 205.125
// Average browse count
OptionalDouble ret9 = houseInfos.stream()
        .filter(h -> h.getBrowseCount() != null)
        .mapToDouble(HouseInfo::getBrowseCount) // convert to a DoubleStream first, then use its average()
        .average();
System.out.println(ret9.getAsDouble()); // 205.125

Summary statistics

// Get summary statistics
DoubleSummaryStatistics statistics = houseInfos.stream()
        .filter(h -> h.getBrowseCount() != null)
        .collect(Collectors.summarizingDouble(HouseInfo::getBrowseCount));
System.out.println("avg:" + statistics.getAverage());
System.out.println("max:" + statistics.getMax());
System.out.println("sum:" + statistics.getSum());
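
Because the browse counts are integers, summarizingInt gives an IntSummaryStatistics with exact integer sum, min and max instead of doubles; it also exposes getCount() and getMin(). A sketch over the browse counts copied from the list above (the class and method names are illustrative):

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;

class BrowseStats {
    // A single pass computes count, sum, min, max and average together
    static IntSummaryStatistics stats(List<Integer> browseCounts) {
        return browseCounts.stream()
                .filter(c -> c != null) // skip missing browse counts
                .collect(Collectors.summarizingInt(Integer::intValue));
    }

    public static void main(String[] args) {
        // The browse counts from the houseInfos list
        List<Integer> counts = Arrays.asList(100, 999, 100, 111, 66, 77, 77, 111);
        IntSummaryStatistics s = stats(counts);
        System.out.println(s.getCount());   // 8
        System.out.println(s.getSum());     // 1641
        System.out.println(s.getMin());     // 66
        System.out.println(s.getMax());     // 999
        System.out.println(s.getAverage()); // 205.125
    }
}
```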

Grouping

// Group by browse count
Map<Integer, List<HouseInfo>> ret10 = houseInfos.stream()
        .filter(h -> h.getBrowseCount() != null) // filter out null browse counts
        .collect(Collectors.groupingBy(HouseInfo::getBrowseCount));
ret10.forEach((count, house) -> {
    System.out.println("BrowseCount:" + count + " " + house);
});
// Multi-level grouping:
// first group by browse count, then group each bucket by distance
Map<Integer, Map<String, List<HouseInfo>>> ret11 = houseInfos.stream()
        .filter(h -> h.getBrowseCount() != null && h.getDistance() != null)
        .collect(Collectors.groupingBy(HouseInfo::getBrowseCount,
                Collectors.groupingBy((HouseInfo h) -> {
                    if (h.getDistance() <= 10)
                        return "very close";
                    else if (h.getDistance() <= 20)
                        return "close";
                    return "far";
                })));
ret11.forEach((count, v) -> {
    System.out.println("Browse count:" + count);
    v.forEach((desc, houses) -> {
        System.out.println("\t" + desc);
        houses.forEach(h -> System.out.println("\t\t" + h));
    });
});
/*
 * The output looks roughly like this (groupingBy builds HashMaps, so the key order is not guaranteed):
 * Browse count:66
 *     far
 *         HouseInfo [houseId=5, houseName=北顧小區, browseCount=66, distance=23]
 * Browse count:100
 *     very close
 *         HouseInfo [houseId=1, houseName=恒大星級公寓, browseCount=100, distance=1]
 *         HouseInfo [houseId=3, houseName=張江湯臣豪園, browseCount=100, distance=1]
 * Browse count:999
 *     very close
 *         HouseInfo [houseId=2, houseName=匯智湖畔, browseCount=999, distance=2]
 * Browse count:77
 *     far
 *         HouseInfo [houseId=6, houseName=北杰公寓, browseCount=77, distance=55]
 *         HouseInfo [houseId=7, houseName=保利星苑, browseCount=77, distance=66]
 * Browse count:111
 *     close
 *         HouseInfo [houseId=8, houseName=保利星苑, browseCount=111, distance=12]
 *     very close
 *         HouseInfo [houseId=4, houseName=保利星苑, browseCount=111, distance=10]
 */
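
Collectors.groupingBy(classifier) collects into a HashMap by default, so the key order in output like the sample above should not be relied on. The three-argument overload takes a map factory and a downstream collector; this sketch uses TreeMap for sorted keys and Collectors.counting() to count houses per browse count, with the browse counts copied from the list (class and method names are illustrative).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class GroupCount {
    // groupingBy(classifier, mapFactory, downstream):
    // TreeMap keeps the keys sorted; counting() replaces each group's List with its size
    static Map<Integer, Long> countPerBrowseCount(List<Integer> browseCounts) {
        return browseCounts.stream()
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Integer> counts = Arrays.asList(100, 999, 100, 111, 66, 77, 77, 111);
        System.out.println(countPerBrowseCount(counts)); // {66=1, 77=2, 100=2, 111=2, 999=1}
    }
}
```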

Partitioning

// Partition by distance (into two parts)
Map<Boolean, List<HouseInfo>> ret12 = houseInfos.stream()
        .filter(h -> h.getDistance() != null)
        .collect(Collectors.partitioningBy(h -> h.getDistance() <= 20));
ret12.forEach((t, houses) -> {
    System.out.println(t ? "near" : "far");
    houses.forEach(h -> System.out.println("\t\t" + h));
});
/*
 * Output:
 * far
 *         HouseInfo [houseId=5, houseName=北顧小區, browseCount=66, distance=23]
 *         HouseInfo [houseId=6, houseName=北杰公寓, browseCount=77, distance=55]
 *         HouseInfo [houseId=7, houseName=保利星苑, browseCount=77, distance=66]
 * near
 *         HouseInfo [houseId=1, houseName=恒大星級公寓, browseCount=100, distance=1]
 *         HouseInfo [houseId=2, houseName=匯智湖畔, browseCount=999, distance=2]
 *         HouseInfo [houseId=3, houseName=張江湯臣豪園, browseCount=100, distance=1]
 *         HouseInfo [houseId=4, houseName=保利星苑, browseCount=111, distance=10]
 *         HouseInfo [houseId=8, houseName=保利星苑, browseCount=111, distance=12]
 */
// Nested partitioning: first by distance, then by browse count
Map<Boolean, Map<Boolean, List<HouseInfo>>> ret13 = houseInfos.stream()
        .filter(h -> h.getDistance() != null && h.getBrowseCount() != null)
        .collect(Collectors.partitioningBy(h -> h.getDistance() <= 20,
                Collectors.partitioningBy(h -> h.getBrowseCount() >= 70)));
ret13.forEach((less, value) -> {
    System.out.println(less ? "near" : "far");
    value.forEach((moreCount, houses) -> {
        System.out.println(moreCount ? "\tmore views" : "\tfewer views");
        houses.forEach(h -> System.out.println("\t\t" + h));
    });
});
/*
 * Output:
 * far
 *     fewer views
 *         HouseInfo [houseId=5, houseName=北顧小區, browseCount=66, distance=23]
 *     more views
 *         HouseInfo [houseId=6, houseName=北杰公寓, browseCount=77, distance=55]
 *         HouseInfo [houseId=7, houseName=保利星苑, browseCount=77, distance=66]
 * near
 *     fewer views
 *     more views
 *         HouseInfo [houseId=1, houseName=恒大星級公寓, browseCount=100, distance=1]
 *         HouseInfo [houseId=2, houseName=匯智湖畔, browseCount=999, distance=2]
 *         HouseInfo [houseId=3, houseName=張江湯臣豪園, browseCount=100, distance=1]
 *         HouseInfo [houseId=4, houseName=保利星苑, browseCount=111, distance=10]
 *         HouseInfo [houseId=8, houseName=保利星苑, browseCount=111, distance=12]
 */
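
partitioningBy always returns a map with both true and false keys, even when one side is empty, which is why an empty group still appears in the nested output above. With a downstream collector such as counting(), an empty side simply maps to 0. A sketch over the distances copied from the list (class and method names are illustrative):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PartitionCount {
    // Both keys (true and false) are always present, even if one partition is empty
    static Map<Boolean, Long> nearCounts(List<Integer> distances, int threshold) {
        return distances.stream()
                .collect(Collectors.partitioningBy(d -> d <= threshold, Collectors.counting()));
    }

    public static void main(String[] args) {
        // The distances from the houseInfos list
        List<Integer> distances = Arrays.asList(1, 2, 1, 10, 23, 55, 66, 12);
        Map<Boolean, Long> m = nearCounts(distances, 20);
        System.out.println(m.get(true));                        // 5
        System.out.println(m.get(false));                       // 3
        System.out.println(nearCounts(distances, 0).get(true)); // 0 (the key is still present)
    }
}
```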