I. Chapter 2: A Scientific Calculator
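To check whether two numbers are equal, use all.equal() rather than ==. Floating-point arithmetic introduces tiny rounding errors, so == can return FALSE for values that ought to be equal. Use == only to compare integers, which are stored exactly. all.equal() compares values within a small tolerance: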
> all.equal(sqrt(2)^2, 2)
[1] TRUE
> all.equal(sqrt(2) ^ 2, 3)
[1] "Mean relative difference: 0.5"
> isTRUE(all.equal(sqrt(2) ^ 2, 2))
[1] TRUE
> isTRUE(all.equal(sqrt(2) ^ 2, 3))
[1] FALSE
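The sketch below puts the two comparisons side by side. The variable names are illustrative. It shows that == fails on a floating-point result and works on integer literals.

```r
# Compare a floating-point result with == and with all.equal()
x <- sqrt(2)^2
exact_equal <- x == 2                   # FALSE: sqrt(2)^2 is not exactly 2 in floating point
near_equal  <- isTRUE(all.equal(x, 2))  # TRUE: all.equal() allows a small tolerance
int_equal   <- 5L == 5L                 # TRUE: integers are stored exactly
print(c(exact_equal = exact_equal, near_equal = near_equal, int_equal = int_equal))
```

Wrapping all.equal() in isTRUE() matters in if() conditions, because all.equal() returns a character description instead of FALSE when the values differ.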
II. Chapter 3: Inspecting Variables and Your Workspace
Classes of variables: logical; three numeric classes (numeric, complex, integer); character, for storing text; factor, for storing categorical data; and the rarer raw, for storing binary data.
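The following is a quick check of the list above: one example value per class, inspected with class(). The values are arbitrary examples.

```r
# One example value for each of the seven classes listed above
examples <- list(
  TRUE,           # logical
  1.5,            # numeric
  2L,             # integer (the L suffix makes an integer literal)
  3 + 2i,         # complex
  "text",         # character
  factor("cat"),  # factor
  as.raw(255)     # raw
)
classes <- vapply(examples, class, character(1))
print(classes)
```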
factor: stores categorical data.
> gender = factor(c("male", "female", "male", "female"))
> gender
[1] male   female male   female
Levels: female male
> levels(gender)
[1] "female" "male"
> nlevels(gender)
[1] 2
Under the hood, a factor's values are stored as integers rather than as character strings. Calling as.integer() makes this clear:
> as.integer(gender)
[1] 2 1 2 1
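A small sketch of what those integers mean: each code is a position in levels(). Indexing the level labels by the codes therefore rebuilds the original values. The names codes and rebuilt are illustrative.

```r
gender  <- factor(c("male", "female", "male", "female"))
codes   <- as.integer(gender)        # 2 1 2 1: positions within levels(gender)
rebuilt <- levels(gender)[codes]     # look up the label for each code
print(identical(rebuilt, as.character(gender)))
```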
Storing integer codes instead of repeated text makes a factor noticeably more memory-efficient when a few values repeat many times:
> gender_char = sample(c("female", "male"), 1000, replace = TRUE)
> gender_char
......
> gender_fac = as.factor(gender_char)
> # convert the data to a factor
> object.size(gender_char)   # object.size() returns the memory size of an object
8160 bytes
> object.size(gender_fac)
4560 bytes
Converting a factor to character strings:
> as.character(gender)
[1] "male"   "female" "male"   "female"
Changing the type of an object (casting):
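> x = "123.456"      # use the as* functions to change the type of x
> as.numeric(x)      # equivalently: as(x, "numeric")
[1] 123.456
> is.numeric(x)      # FALSE: as.numeric() returns a new value and leaves x unchanged
[1] FALSE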
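To actually keep the converted value, assign the result. Text that cannot be parsed as a number becomes NA, and R normally issues a coercion warning. The warning is suppressed here to keep the output short. The variable names are illustrative.

```r
x <- "123.456"
y <- as.numeric(x)                         # the new numeric value; x is still character
z <- suppressWarnings(as.numeric("abc"))   # not a number, so the result is NA
print(c(x_is_numeric = is.numeric(x), y_is_numeric = is.numeric(y), z_is_na = is.na(z)))
```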
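The code options(digits = n) sets a global option that controls how many significant digits (not decimal places) are printed for numbers. The default is 7.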
> options(digits = 10)
> (x = runif(5))
[1] 0.040052175522 0.544388080016 0.506369658280
[4] 0.144690239336 0.005838404642
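A minimal sketch that makes the significant-digits behaviour visible on pi. It also restores the old setting afterwards. This relies on options() returning the previous values when you change an option.

```r
old <- options(digits = 3)   # options() invisibly returns the previous settings
print(pi)                    # [1] 3.14
options(digits = 10)
print(pi)                    # [1] 3.141592654
options(old)                 # restore the previous setting
```

format(x, digits = n) gives the same control for a single value without touching the global option.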
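runif(30) generates 30 random numbers uniformly distributed between 0 and 1. The summary function gives summary information suited to each data type. For a numeric variable, for example: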
> num = runif(30)
> summary(num)
       Min.     1st Qu.      Median        Mean
0.001235794 0.199856233 0.475356185 0.475318138
    3rd Qu.        Max.
0.703412558 0.984893506
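Random input makes it hard to check what each statistic means. The following sketch uses a small fixed vector (made-up data) so the results can be verified by hand against quantile() and mean().

```r
v <- c(1, 2, 3, 4, 100)    # fixed example data so the results are predictable
s <- summary(v)
print(s)
print(quantile(v, c(0.25, 0.5, 0.75)))   # the 1st quartile, median and 3rd quartile: 2, 3, 4
print(mean(v))                           # 22: the outlier pulls the mean far above the median
```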
letters and LETTERS are two built-in constants holding the lowercase and uppercase alphabet:
> letters
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l"
[13] "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x"
[25] "y" "z"
> LETTERS
 [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L"
[13] "M" "N" "O" "P" "Q" "R" "S" "T" "U" "V" "W" "X"
[25] "Y" "Z"
sample is the sampling function. Its form is sample(x, size, replace = FALSE). The third argument defaults to FALSE, which means sampling without replacement.
Drawing 30 samples with replacement from a to e:
> fac = factor(sample(letters[1:5], size = 30, replace = T))
> summary(fac)
 a  b  c  d  e
 4  7  2  5 12
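The sketch below contrasts the default (without replacement) with replace = TRUE. Without replacement, sample() returns a permutation. Asking for more items than exist without replacement is an error, which is caught here with tryCatch(). The variable names are illustrative.

```r
x <- letters[1:5]
perm  <- sample(x)                               # default replace = FALSE: a random permutation
draws <- sample(x, size = 30, replace = TRUE)    # 30 draws, values may repeat
too_many <- tryCatch(sample(x, size = 30),       # 30 draws from 5 items without replacement
                     error = function(e) "error") # is impossible, so sample() signals an error
print(table(draws))
```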
> bool = sample(c(TRUE, FALSE, NA), 30, replace = TRUE)
> summary(bool)
   Mode   FALSE    TRUE    NA's
logical      10       8      12
Create a data frame dfr; only its first few rows are shown here:
> dfr = data.frame(num, fac, bool)
> head(dfr)   # shows the first 6 rows by default
            num fac bool
1 0.34019507235   b   NA
2 0.77415443189   e TRUE
3 0.02201034524   d TRUE
4 0.11190012516   e   NA
5 0.18030911358   a   NA
6 0.98489350639   d TRUE
> summary(dfr)
      num              fac      bool
 Min.   :0.001235794   a: 4   Mode :logical
 1st Qu.:0.199856233   b: 7   FALSE:10
 Median :0.475356185   c: 2   TRUE :8
 Mean   :0.475318138   d: 5   NA's :12
 3rd Qu.:0.703412558   e:12
 Max.   :0.984893506
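The str function shows the structure of an object. For vectors it is not very interesting, because they are so simple. For data frames and nested lists, however, str is very useful: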
> str(num)
 num [1:30] 0.34 0.774 0.022 0.112 0.18 ...
> str(dfr)
'data.frame':	30 obs. of  3 variables:
 $ num : num  0.34 0.774 0.022 0.112 0.18 ...
 $ fac : Factor w/ 5 levels "a","b","c","d",..: 2 5 4 5 1 4 1 4 1 5 ...
 $ bool: logi  NA TRUE TRUE NA NA TRUE ...
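Here is a small nested list (hypothetical data) to show str on the kind of object where it helps most. Each level of nesting gets its own indentation.

```r
nested <- list(id = 1L, tags = c("a", "b"), meta = list(ok = TRUE, score = 0.5))
str(nested)   # elements of the inner list are indented and prefixed with ..$
```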
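Each class has its own print method, which controls how objects of that class are displayed in the console. Sometimes this printing hides the internal structure or leaves out useful information. The unclass function gets around this and shows how a variable is built. For example, calling unclass on a factor (here a factor fac whose levels are pet names) shows that it is just an integer vector with an attribute called levels: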
> unclass(fac)
[1] 2 1 4 3
attr(,"levels")
[1] "cat"      "dog"      "goldfish" "hamster"
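The notes do not show how this fac was created. The following is one hypothetical construction that reproduces the output above. factor() sorts the levels alphabetically, so each value's code is its position in that sorted list.

```r
# Hypothetical values chosen so that unclass() gives the codes shown above
fac <- factor(c("dog", "cat", "hamster", "goldfish"))  # levels sorted: cat, dog, goldfish, hamster
print(unclass(fac))   # integer codes 2 1 4 3 plus a "levels" attribute
```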
The attributes function returns a list of all the attributes of an object:
> attributes(fac)
$levels
[1] "cat"      "dog"      "goldfish" "hamster"

$class
[1] "factor"
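Going the other way illustrates the same point. An integer vector plus a levels attribute and a class attribute is all a factor is. The sketch below builds one by hand with structure(). It is for illustration only; factor() is the normal, safer way to create factors.

```r
codes <- c(2L, 1L, 4L, 3L)
pets <- structure(codes,
                  levels = c("cat", "dog", "goldfish", "hamster"),
                  class  = "factor")
print(pets)                     # prints as a factor: dog cat hamster goldfish
print(names(attributes(pets)))  # the two attributes that make it a factor
```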
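The View function (note the capital V) displays a data frame as a spreadsheet. edit and fix are similar, but they also let you change data values by hand.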
View(dfr)             # changes are not allowed
new_dfr = edit(dfr)   # changes are saved to new_dfr
fix(dfr)              # changes are saved to dfr
View(head(dfr, 50))   # view the first 50 rows
III. Chapter 4: Vectors, Matrices, and Arrays
Arrays hold multidimensional rectangular data. A matrix is the special case of a two-dimensional array.
There are many ways to create a sequence. The advantage of seq is that you can set the step size:
> (xulie = seq(1, 15, 2))
[1]  1  3  5  7  9 11 13 15
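For comparison, the colon operator always steps by 1. seq() can take either the step size (by) or the number of elements (length.out). The variable names are illustrative.

```r
a <- 1:8                          # colon operator: the step is always 1
b <- seq(1, 15, by = 2)           # explicit step size, same as seq(1, 15, 2)
d <- seq(0, 1, length.out = 5)    # fix the number of elements instead of the step
print(b)
print(d)
```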
The length() function returns the length of a sequence:
> length(xulie)
[1] 8
Naming the elements of a vector:
> c(apple = 1, banana = 2, "kiwi fruit" = 3, 4)
     apple     banana kiwi fruit
         1          2          3          4
> x = 1:4
> names(x) = c("apple", "banana", "kiwi fruit", "")
> x
     apple     banana kiwi fruit
         1          2          3          4
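Once elements have names, they can be looked up by name as well as by position. This short sketch reuses the vector above. It shows that the unnamed element gets the empty name "".

```r
x <- c(apple = 1, banana = 2, "kiwi fruit" = 3, 4)
print(names(x))          # the unnamed fourth element has the empty name ""
print(x[["banana"]])     # 2: index by name
print(x["kiwi fruit"])   # names containing spaces must be quoted
```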
Creating arrays and matrices:
> three_d_array = array(   # a three-dimensional array
+   1:24,
+   dim = c(4, 3, 2),
+   dimnames = list(
+     c("one", "two", "three", "four"),
+     c("ein", "zwei", "drei"),
+     c("un", "deux")
+   )
+ )
> three_d_array
, , un

      ein zwei drei
one     1    5    9
two     2    6   10
three   3    7   11
four    4    8   12

, , deux

      ein zwei drei
one    13   17   21
two    14   18   22
three  15   19   23
four   16   20   24
> (a_matrix = matrix(   # create a matrix
+   1:12,
+   nrow = 4, byrow = T,
+   dimnames = list(
+     c("one", "two", "three", "four"),
+     c("ein", "zwei", "drei")
+   )
+ ))
      ein zwei drei
one     1    2    3
two     4    5    6
three   7    8    9
four   10   11   12
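The two outputs above differ in fill order. array() and matrix() fill column by column by default, while byrow = TRUE fills row by row. This sketch shows the difference on small inputs. It also indexes one cell of a 3-D array; the values match three_d_array, only without the dimnames.

```r
m_col <- matrix(1:6, nrow = 2)                # default: filled column by column
m_row <- matrix(1:6, nrow = 2, byrow = TRUE)  # byrow = TRUE: filled row by row
arr <- array(1:24, dim = c(4, 3, 2))          # same values as three_d_array, no dimnames
print(m_col[1, ])     # 1 3 5
print(m_row[1, ])     # 1 2 3
print(arr[2, 3, 1])   # 10: row 2, column 3, first layer
```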
Some functions for indexing, repeating, and measuring vectors and arrays:
> x = (1:5) ^ 2
> x
[1]  1  4  9 16 25
> x[c(1, 3, 5)]
[1]  1  9 25
> x[c(-2, -4)]
[1]  1  9 25
> x[c(TRUE, F, T, F, T)]
[1]  1  9 25
> names(x) = c("one", "four", "nine", "sixteen", "twenty five")
> x
        one        four        nine     sixteen twenty five
          1           4           9          16          25
> which(x > 10)
    sixteen twenty five
          4           5
> which.min(x)
one
  1
> which.max(x)
twenty five
          5
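The output above can be misread: which() returns positions, not values. To get the values, index with those positions, or use the logical condition directly. Both routes are sketched below with the same named vector.

```r
x <- c(one = 1, four = 4, nine = 9, sixteen = 16, "twenty five" = 25)
idx <- which(x > 10)   # positions 4 and 5, named after the matching elements
print(x[idx])          # the values themselves: 16 25
print(x[x > 10])       # logical indexing gives the same values directly
```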
> rep(1:5, 3)
 [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
> rep(1:5, each = 3)
 [1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5
> rep(1:5, times = 1:5)
 [1] 1 2 2 3 3 3 4 4 4 4 5 5 5 5 5
> rep(1:5, length.out = 7)
[1] 1 2 3 4 5 1 2
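> rep.int(1:5, 3)    # faster, simplified form of rep(1:5, times = 3)
 [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
> rep_len(1:5, 13)   # faster, simplified form of rep(1:5, length.out = 13)
 [1] 1 2 3 4 5 1 2 3 4 5 1 2 3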
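The sketch below checks the equivalences noted in the comments above. It also shows that each and times can be combined, in which case each is applied first.

```r
v <- 1:5
a <- rep(v, times = 3)             # whole vector repeated 3 times
b <- rep(v, each = 3)              # each element repeated 3 times
d <- rep(v, each = 2, times = 2)   # "each" first, then the result is repeated "times" times
print(identical(rep.int(v, 3), a))
print(identical(rep_len(v, 13), rep(v, length.out = 13)))
print(d)
```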
> dim(three_d_array)
[1] 4 3 2
> dim(a_matrix)
[1] 4 3
> nrow(a_matrix)
[1] 4
> ncol(a_matrix)
[1] 3
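dim() is also an attribute that can be assigned. A plain vector has no dim, and giving it one turns it into a matrix or array. This is a minimal sketch of that; the variable name v is illustrative.

```r
v <- 1:12
print(dim(v))         # NULL: a plain vector has no dim attribute
dim(v) <- c(4, 3)     # assigning dim turns the vector into a 4 x 3 matrix
print(is.matrix(v))   # TRUE
print(length(v))      # 12: length() still counts every element
```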
IV. Chapter 5: Lists and Data Frames
# Create a list
> (main_list = list(
+   element_in_main_list = log10(1:10),
+   middle_list = list(
+     element_in_middle_list = diag(3),
+     inner_list = list(
+       element_in_inner_list = pi ^ 1:4,   # ^ binds tighter than :, so this is (pi^1):4, i.e. just pi; pi ^ (1:4) gives the first four powers
+       another_element_in_inner_list = "a"
+     )
+   )
+ ))
$element_in_main_list
 [1] 0.0000000 0.3010300 0.4771213 0.6020600 0.6989700 0.7781513
 [7] 0.8450980 0.9030900 0.9542425 1.0000000

$middle_list
$middle_list$element_in_middle_list
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1

$middle_list$inner_list
$middle_list$inner_list$element_in_inner_list
[1] 3.141593

$middle_list$inner_list$another_element_in_inner_list
[1] "a"
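This sketch covers two points from the example above. First, operator precedence: pi ^ 1:4 yields a single value, while pi ^ (1:4) yields four. Second, reaching into a nested list by chaining $ or [[ ]]. The list here is a hypothetical miniature, not main_list itself.

```r
powers_wrong <- pi ^ 1:4     # parsed as (pi ^ 1):4, which is just pi
powers_right <- pi ^ (1:4)   # pi, pi^2, pi^3, pi^4
nested <- list(middle = list(inner = list(value = powers_right)))
second <- nested$middle$inner$value[2]               # chain $ to reach an inner element
same   <- nested[["middle"]][["inner"]][["value"]][2] # the same element via [[ ]]
print(c(length(powers_wrong), length(powers_right)))  # 1 4
```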
Indexing a list:
(uk_bank_holidays_2013 <- list(
  Jan = "New Year's Day",
  Feb = NULL,
  Mar = "Good Friday",
  Apr = "Easter Monday",
  May = c("Early May Bank Holiday", "Spring Bank Holiday"),
  Jun = NULL,
  Jul = NULL,
  Aug = "Summer Bank Holiday",
  Sep = NULL,
  Oct = NULL,
  Nov = NULL,
  Dec = c("Christmas Day", "Boxing Day")
))
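# If the elements are not named when the list is created, you can name them afterwards with names().
# month.abb is a built-in constant holding the abbreviated month names "Jan" to "Dec".
names(uk_bank_holidays_2013) = month.abb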
> uk_bank_holidays_2013["Jan"]   # various ways of indexing
$Jan
[1] "New Year's Day"

> uk_bank_holidays_2013[["Jan"]]
[1] "New Year's Day"
> uk_bank_holidays_2013$Jan
[1] "New Year's Day"
> uk_bank_holidays_2013[1]
$Jan
[1] "New Year's Day"

> uk_bank_holidays_2013[[1]]
[1] "New Year's Day"
> uk_bank_holidays_2013[[c(5, 2)]]   # more ways of indexing
[1] "Spring Bank Holiday"
> uk_bank_holidays_2013[[5]][[2]]
[1] "Spring Bank Holiday"
> uk_bank_holidays_2013$May
[1] "Early May Bank Holiday" "Spring Bank Holiday"
> uk_bank_holidays_2013$May[2]
[1] "Spring Bank Holiday"
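The outputs above differ in a way that is easy to miss. Single brackets return a smaller list, while double brackets (and $) return the element itself. The sketch below uses a two-element copy of the holiday data to make the classes explicit. It also shows the recursive [[c(i, j)]] form.

```r
holidays <- list(
  Jan = "New Year's Day",
  May = c("Early May Bank Holiday", "Spring Bank Holiday")
)
sub  <- holidays["Jan"]        # single brackets: a list that contains the element
elt  <- holidays[["Jan"]]      # double brackets: the element itself
deep <- holidays[[c(2, 2)]]    # recursive: element 2, then element 2 inside it
print(class(sub))   # "list"
print(class(elt))   # "character"
print(deep)         # "Spring Bank Holiday"
```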
> uk_bank_holidays_2013$Jan = NULL            # assigning NULL with $ removes the element
> uk_bank_holidays_2013$Feb = NULL
> uk_bank_holidays_2013["Aug"] = list(NULL)   # assigning list(NULL) keeps the element but sets it to NULL
> uk_bank_holidays_2013
$Mar
[1] "Good Friday"

$Apr
[1] "Easter Monday"

$May
[1] "Early May Bank Holiday" "Spring Bank Holiday"

$Jun
NULL

$Jul
NULL

$Aug
NULL

$Sep
NULL

$Oct
NULL

$Nov
NULL

$Dec
[1] "Christmas Day" "Boxing Day"
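The distinction between removing an element and setting it to NULL is easy to get wrong. A minimal sketch on a throwaway list (illustrative names):

```r
x <- list(a = 1, b = 2, c = 3)
x$a <- NULL            # removes element "a" entirely
x["b"] <- list(NULL)   # keeps "b" in the list, with value NULL
print(names(x))        # "b" "c"
print(is.null(x$b))    # TRUE
```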
Creating a data frame:
> (a_data_frame = data.frame(   # create a data frame
+   x = letters[1:5],
+   y = rnorm(5),
+   z = runif(5) > 0.5,
+   row.names = NULL   # if any input variable has names, the row names are taken from
+ ))                   # the first such vector; row.names = NULL overrides this rule
  x          y     z
1 a  0.3067414 FALSE
2 b -2.4637065  TRUE
3 c  0.8443321  TRUE
4 d -0.0163287  TRUE
5 e  0.8291859  TRUE
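The row.names rule in the comment above is easier to see with a named input vector. In this sketch, y is hypothetical example data. Without the argument, the row names come from names(y); row.names = NULL gives the plain integer row names instead.

```r
y <- c(first = 1, second = 2, third = 3)          # a named vector (example data)
with_names    <- data.frame(y)                    # row names taken from names(y)
without_names <- data.frame(y, row.names = NULL)  # override: row names 1, 2, 3
print(rownames(with_names))
print(rownames(without_names))
```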
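Note that each column can have a different type from the other columns, but all elements within a single column must have the same type. Also note that the object's class name is data.frame, with a dot in the middle rather than a space.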
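A short sketch of both points on a small hypothetical data frame. Each column keeps its own class. Putting text into a numeric column converts the whole column to character, because a column can hold only one type.

```r
df <- data.frame(x = letters[1:3], y = c(1.5, 2.5, 3.5), z = c(TRUE, FALSE, TRUE))
print(sapply(df, class))   # one class per column
print(class(df))           # "data.frame"
df$y[2] <- "two"           # text in a numeric column coerces the whole column
print(class(df$y))         # "character"
```

Column x is character in R 4.0 and later. In earlier versions, where stringsAsFactors defaulted to TRUE, it would have been a factor.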
Chapter 5 introduced lists and data frames. Manipulating data frames is a big topic, which is covered in Chapter 13.