Things to watch out for when using String.split in Java

We often use String's split() method to split strings. Three points are worth noting:

1. When the delimiter is a period ("."), it must be escaped:

String.split splits on a regular expression, and in a regular expression a period matches any character.

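// Wrong: an unescaped "." is a regex that matches any character
// String[] words = tmp.split(".");
// Correct: escape the period so it is matched literally
String[] words = tmp.split("\\.");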

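So whenever the delimiter has a special meaning in regular expressions (for example . | * + ? ^ $ and the bracket characters), take extra care: it must be escaped for the split to work as intended.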
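When the delimiter comes from a variable or is not known in advance, escaping it by hand is error-prone. One option is the standard Pattern.quote method, which turns any string into a literal pattern. The following is a minimal sketch under that assumption; the class name QuoteDelimiterDemo, the helper splitLiteral, and the sample strings are illustrative only:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class QuoteDelimiterDemo {
    // Splits on a literal delimiter by quoting it first,
    // so regex metacharacters in the delimiter lose their special meaning.
    static String[] splitLiteral(String text, String delimiter) {
        return text.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitLiteral("www.example.com", "."))); // [www, example, com]
        System.out.println(Arrays.toString(splitLiteral("a|b|c", "|")));           // [a, b, c]

        // Unescaped "." matches every character here, so only empty strings
        // remain, and split() drops trailing empty strings: the array is empty.
        System.out.println("www.example.com".split(".").length);                    // 0
    }
}
```

Quoting always goes through the regex engine, so for a fixed, known delimiter the hand-escaped form "\\." is just as good.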

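2. If the string ends with several consecutive delimiters and the empty fields between them must be kept, call the split(String regex, int limit) overload with a negative limit: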

String abc = "a,b,c,,,";
String[] str = abc.split(",");
System.out.println(Arrays.toString(str) + " " + str.length);
String[] str2 = abc.split(",", -1);
System.out.println(Arrays.toString(str2) + " " + str2.length);

The output is:

[a, b, c] 3
[a, b, c, , , ] 6

Pay particular attention to this when working with CSV files, where trailing empty fields are common.
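The limit argument controls both how many times the pattern is applied and whether trailing empty strings are kept. Here is a small sketch of the three cases, reusing the sample string from above; the class name SplitLimitDemo and the helper describe are illustrative only:

```java
import java.util.Arrays;

public class SplitLimitDemo {
    // Formats the split result the same way as the example above: array, then length.
    static String describe(String input, int limit) {
        String[] parts = input.split(",", limit);
        return Arrays.toString(parts) + " " + parts.length;
    }

    public static void main(String[] args) {
        String abc = "a,b,c,,,";
        // limit == 0: same as split(","), trailing empty strings are dropped
        System.out.println(describe(abc, 0));   // [a, b, c] 3
        // limit < 0: split as often as possible, trailing empty strings are kept
        System.out.println(describe(abc, -1));  // [a, b, c, , , ] 6
        // limit > 0: at most that many elements, the last one holds the unsplit remainder
        System.out.println(describe(abc, 2));   // [a, b,c,,,] 2
    }
}
```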

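3. When splitting speed matters, split() is not the most efficient option. For a general regex, split() comes down to the following call (since JDK 7 it also has a fast path for plain single-character delimiters that skips the Pattern):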

public String[] split(String regex, int limit) {
    return Pattern.compile(regex).split(this, limit);
}

Calling split() repeatedly on this path compiles a new Pattern object every time. To avoid that, compile the Pattern once and reuse it:

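// Create the Pattern object once, outside the loop
Pattern pattern = Pattern.compile(" ");
// Collects the results (declaration added so the snippet compiles)
List<String[]> list = new ArrayList<String[]>();

for (int i = 0; i < 1000000; i++)
{
    String[] split = pattern.split("Hello World", 0);
    list.add(split);
}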
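Outside a single loop, the same idea is often expressed by keeping the compiled Pattern in a static final field, which is safe because Pattern instances are immutable and thread-safe. The following is a sketch under assumptions: the class name PrecompiledSplitter, the helper fields, and the comma-with-optional-whitespace regex are illustrative and not from the original post:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class PrecompiledSplitter {
    // Compiled once when the class is loaded and shared by all callers.
    // Illustrative regex: a comma with optional whitespace on either side.
    private static final Pattern COMMA = Pattern.compile("\\s*,\\s*");

    // Splits one line into fields; the negative limit keeps trailing empty fields.
    static String[] fields(String line) {
        return COMMA.split(line, -1);
    }

    public static void main(String[] args) {
        // Four fields: "a", "b", "c" and a trailing empty string
        System.out.println(Arrays.toString(fields("a , b,c ,")));
    }
}
```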

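In addition, split() is often somewhat slower than splitting with the indexOf() + substring() combination; see this post for details.

In a test on my own machine, indexOf() + substring() was roughly twice as fast as split():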

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;
import java.util.regex.Pattern;

public class SplitBenchmark {
    public static void main(String[] args) {
        // Build the sample: 60 six-digit numbers, each followed by a space
        StringBuilder sb = new StringBuilder();
        for (int i = 100000; i < 100000 + 60; i++)
            sb.append(i).append(' ');
        String sample = sb.toString();

        int runs = 100000;
        // Repeat the whole measurement 5 times so later rounds run on warmed-up code
        for (int i = 0; i < 5; i++) {
            {
                // Approach 1: StringTokenizer
                long start = System.nanoTime();
                for (int r = 0; r < runs; r++) {
                    StringTokenizer st = new StringTokenizer(sample);
                    List<String> list = new ArrayList<String>();
                    while (st.hasMoreTokens())
                        list.add(st.nextToken());
                }
                long time = System.nanoTime() - start;
                System.out.printf("StringTokenizer took an average of %.1f us%n", time / runs / 1000.0);
            }
            {
                // Approach 2: a Pattern compiled once, outside the inner loop
                long start = System.nanoTime();
                Pattern spacePattern = Pattern.compile(" ");
                for (int r = 0; r < runs; r++) {
                    List<String> list = Arrays.asList(spacePattern.split(sample, 0));
                }
                long time = System.nanoTime() - start;
                System.out.printf("Pattern.split took an average of %.1f us%n", time / runs / 1000.0);
            }
            {
                // Approach 3: indexOf() + substring(); relies on the sample ending with a space
                long start = System.nanoTime();
                for (int r = 0; r < runs; r++) {
                    List<String> list = new ArrayList<String>();
                    int pos = 0, end;
                    while ((end = sample.indexOf(' ', pos)) >= 0) {
                        list.add(sample.substring(pos, end));
                        pos = end + 1;
                    }
                }
                long time = System.nanoTime() - start;
                System.out.printf("indexOf loop took an average of %.1f us%n", time / runs / 1000.0);
            }
        }
    }
}

Results on JDK 1.7:

StringTokenizer took an average of 7.2 us
Pattern.split took an average of 7.9 us
indexOf loop took an average of 3.5 us

------------------------------------------
StringTokenizer took an average of 6.8 us
Pattern.split took an average of 5.4 us
indexOf loop took an average of 3.1 us

------------------------------------------
StringTokenizer took an average of 6.0 us
Pattern.split took an average of 5.5 us
indexOf loop took an average of 3.1 us

------------------------------------------
StringTokenizer took an average of 5.9 us
Pattern.split took an average of 5.5 us
indexOf loop took an average of 3.1 us

------------------------------------------
StringTokenizer took an average of 6.4 us
Pattern.split took an average of 5.5 us
indexOf loop took an average of 3.2 us
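
The indexOf loop in the benchmark only collects tokens that are followed by a space, which works there because the sample string ends with one. Below is a minimal sketch of a reusable variant that also keeps the final token and any empty fields, so for a single-character delimiter it behaves like split(regex, -1). The class name IndexOfSplitter and its split method are hypothetical names used for illustration:

```java
import java.util.ArrayList;
import java.util.List;

public class IndexOfSplitter {
    // Splits text on a single-character delimiter without using regular expressions.
    // Empty fields, including trailing ones, are kept.
    static List<String> split(String text, char delimiter) {
        List<String> parts = new ArrayList<>();
        int pos = 0;
        int end;
        while ((end = text.indexOf(delimiter, pos)) >= 0) {
            parts.add(text.substring(pos, end));
            pos = end + 1;
        }
        // Whatever follows the last delimiter (possibly an empty string)
        parts.add(text.substring(pos));
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("Hello World", ' '));      // [Hello, World]
        System.out.println(split("a,b,c,,,", ',').size());  // 6
    }
}
```

Whether this is worth the extra code depends on the workload; the timings above come from one machine and one JDK version.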

End of article.

Reposted from: https://www.cnblogs.com/techyc/p/3709182.html
