
Getting Started with R, Part 3: The Six Basic Data Structures in R

Published: 2024/1/23 | Programming Q&A | 豆豆


Contents

Preface
Vectors
Matrices
Arrays
Factors
Data Frames
    - Building a data frame
    - Inspecting the data
    - Row names and column names
    - Getting rows and columns
    - Adding a column
    - Data type conversion
    - Subset queries
    - Merging data
Lists
Miscellaneous

Preface

This post summarizes the basic concepts and common operations of R's six basic data structures: vector, matrix, array, factor, data frame (data.frame), and list. Together with R's control flow, these six structures are the foundation for writing R scripts. Combined with R's rich set of functions and community-developed packages, they let us do a lot of very cool things with R.

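Vectors

A vector is a one-dimensional array that stores numeric, character, or logical data. The combine function c() creates a vector. Note that all elements of a single vector must share the same type or mode (numeric, character, or logical). For example:

> a = c(1,2,3,4,5)
> mode(a) # this vector is stored as numeric
[1] "numeric"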

Vectors are a common and very simple data structure. The main things to watch are element indexing (indices of R data structures start at 1) and type conversion:

# Create vectors
> a = c(1,2,3,4,5)
> b = c(1:5)
> c_ = c("1","2","3","4","5")
> d = c(TRUE,FALSE,TRUE,TRUE,FALSE)

# Type-related operations
> typeof(a)
[1] "double"
> mode(a)
[1] "numeric"
> class(a)
[1] "numeric"
> is.numeric(a)
[1] TRUE
> is.double(a)
[1] TRUE
> as.character(a)
[1] "1" "2" "3" "4" "5"
> as.character(a) == b
[1] TRUE TRUE TRUE TRUE TRUE

# Index vector elements
> a[1]
[1] 1
> a[2:4]
[1] 2 3 4
> a[c(2,4)]
[1] 2 4
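The same-type rule has a practical consequence worth seeing once: when c() is given mixed types, R silently coerces everything to one common type. A minimal sketch (the names mixed_num and mixed_chr are illustrative only and do not appear in the original post):

```r
# c() coerces mixed inputs to a single common type:
# logical -> numeric -> character
mixed_num <- c(1, TRUE, FALSE)   # logicals become 1 and 0
mixed_chr <- c(1, "a", TRUE)     # everything becomes character

print(typeof(mixed_num))  # "double"
print(mixed_chr)          # "1" "a" "TRUE"

# Negative indices drop elements instead of selecting them
a <- c(1, 2, 3, 4, 5)
print(a[-1])              # 2 3 4 5
```

This is why a stray string in a numeric column can quietly turn the whole vector into text.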

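Matrices

A matrix is a two-dimensional array in which every element has the same mode (numeric, character, or logical). Matrices are created with the matrix function. The general form is:

mymatrix <- matrix(vector, nrow, ncol, byrow=TRUE, dimnames=list(char_vector_rownames, char_vector_colnames))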

Here vector holds the matrix elements, nrow and ncol give the number of rows and columns, and dimnames holds optional row and column names as character vectors. The byrow option states whether the matrix is filled by row (byrow=TRUE) or by column (byrow=FALSE); the default is by column.

> nums = 1:4
> rnames = c('r1','r2')
> cnames = c('c1','c2')

> matrix_obj = matrix(nums,nrow=2,dimnames=list(c(),cnames))
> matrix_obj
     c1 c2
[1,]  1  3
[2,]  2  4

> matrix_obj = matrix(nums,nrow=2,dimnames=list(rnames,cnames))
> matrix_obj
   c1 c2
r1  1  3
r2  2  4
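To make the effect of byrow concrete, here is a small sketch that fills the same values both ways (by_col and by_row are illustrative names, not from the original post):

```r
nums <- 1:6

by_col <- matrix(nums, nrow = 2)                # default: fill column by column
by_row <- matrix(nums, nrow = 2, byrow = TRUE)  # fill row by row

print(by_col)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

print(by_row)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
```

Only the fill order differs; the shape (2 rows, 3 columns) is the same in both cases.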

Use subscripts and square brackets to select rows, columns, or elements of a matrix. X[i,] is row i of matrix X, X[,j] is column j, and X[i,j] is the element in row i, column j. To select several rows or columns, i and j can be numeric vectors.

> a = matrix(1:20,nrow=5)
> a
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20

# Index a single element (a single subscript counts down the columns)
> a[1]
[1] 1
> a[7]
[1] 7

# Index rows
> a[1,]
[1]  1  6 11 16
> matrix_obj['r1',]
c1 c2
 1  3

# Index columns
> a[,1:2]
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10
> matrix_obj[,'c1']
r1 r2
 1  2

# Combined
> a[1:2,2:3]
     [,1] [,2]
[1,]    6   11
[2,]    7   12
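Two details of matrix indexing are easy to miss: selecting a single row or column returns a plain vector unless drop = FALSE is given, and a single subscript is a column-major position. A short sketch of both (row_vec, row_mat, i, and j are illustrative names):

```r
a <- matrix(1:20, nrow = 5)

row_vec <- a[1, ]                # dimensions are dropped: a plain vector
row_mat <- a[1, , drop = FALSE]  # stays a 1 x 4 matrix

print(is.matrix(row_vec))  # FALSE
print(dim(row_mat))        # 1 4

# Element [i, j] sits at linear position i + (j - 1) * nrow(a)
i <- 2
j <- 2
print(a[i + (j - 1) * nrow(a)] == a[i, j])  # TRUE
```

Using drop = FALSE is helpful when later code expects a matrix regardless of how many rows were selected.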

Arrays

An array is similar to a matrix, but it can have more than two dimensions. Arrays are created with the array function, in the form:

myarray <- array(vector, dimensions, dimnames)

Here vector holds the array's data, dimensions is a numeric vector giving the maximum index of each dimension, and dimnames is an optional list of name labels for each dimension:

> dim1 = c('A1','A2')
> dim2 = c('B1','B2','B3')
> dim3 = c('C1','C2','C3','C4')
> z = array(1:24,c(2,3,4),dimnames=list(dim1,dim2,dim3)) # creates a 2*3*4 array

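Pay particular attention to the order in which the values fill the space: this array can be viewed as four 2*3 matrices, each filled column by column. The array therefore looks like this:

> z
, , C1

   B1 B2 B3
A1  1  3  5
A2  2  4  6

, , C2

   B1 B2 B3
A1  7  9 11
A2  8 10 12

, , C3

   B1 B2 B3
A1 13 15 17
A2 14 16 18

, , C4

   B1 B2 B3
A1 19 21 23
A2 20 22 24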

As before, the thing to focus on is indexing, which works almost exactly as it does for vectors and matrices:

# Index an element
> z[1,1,3]
[1] 13

# Combined indexing
> z[1:2,1:3,2]
   B1 B2 B3
A1  7  9 11
A2  8 10 12
> z[c('A1','A2'),c('B1','B2','B3'),'C2']
   B1 B2 B3
A1  7  9 11
A2  8 10 12
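The fill order of an array can be checked numerically. The sketch below computes the column-major position of element [i, j, k] by hand and compares it with the value R stores there; linear_index is a hypothetical helper written for this illustration, not a base R function:

```r
z <- array(1:24, c(2, 3, 4))
d <- dim(z)

# Hypothetical helper: position of [i, j, k] when the first index varies fastest
linear_index <- function(i, j, k, d) {
  i + (j - 1) * d[1] + (k - 1) * d[1] * d[2]
}

idx <- linear_index(1, 1, 3, d)
print(idx)                    # 13
print(z[idx] == z[1, 1, 3])   # TRUE

# Fixing the third index leaves an ordinary 2 x 3 matrix
print(dim(z[, , 2]))          # 2 3
```

Because the array was filled with 1:24, the stored value equals its own linear position, which makes the ordering easy to verify.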

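Factors

Variables can be classified into the following kinds:

Nominal

A nominal variable is a categorical variable with no ordering. Diabetes type, Diabetes (Type1, Type2), is one example. Even if Type1 is coded as 1 and Type2 as 2 in the data, that does not mean the two are ordered.
Ordinal

An ordinal variable expresses an order, not a quantity. Patient status, Status (poor, improved, excellent), is a good example. We know that a patient whose status is poor is doing worse than one whose status is improved, but not by how much.
Continuous

A continuous variable can take any value within some range and expresses both order and quantity. Age is a continuous variable: it can take values such as 14.5 or 22.8 and any value in between.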

Categorical (nominal) and ordered categorical (ordinal) variables are called factors in R. The function factor() stores the category values as an integer vector whose values lie in [1 … k] (where k is the number of unique values of the nominal variable), together with an internal vector of strings (the original values) that these integers map to.

Factors mainly come in the following forms:

Factor for a nominal variable
> diabetes = c("Type1","Type2","Type1","Type2")
> diabetes = factor(diabetes)
> diabetes
[1] Type1 Type2 Type1 Type2
Levels: Type1 Type2
> str(diabetes)
 Factor w/ 2 levels "Type1","Type2": 1 2 1 2
> summary(diabetes)
Type1 Type2
    2     2
Factor for an ordinal variable
# Without levels=, the levels are sorted alphabetically, which is rarely the intended order
> status = c("Poor","Improved","Excellent","Poor")
> status = factor(status,ordered=TRUE)
> status
[1] Poor      Improved  Excellent Poor
Levels: Excellent < Improved < Poor
> str(status)
 Ord.factor w/ 3 levels "Excellent"<"Improved"<..: 3 2 1 3
> summary(status)
Excellent  Improved      Poor
        1         1         2
Custom order of factor levels
> status = c("Poor","Improved","Excellent","Poor")
> status = factor(status,ordered=TRUE,levels=c("Poor","Improved","Excellent"),labels=c("bad","middle","good"))
> status
[1] bad    middle good   bad
Levels: bad < middle < good
> str(status)
 Ord.factor w/ 3 levels "bad"<"middle"<..: 1 2 3 1
> summary(status)
   bad middle   good
     2      1      1
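The integer-plus-labels storage described above can be inspected directly. A minimal sketch, reusing the status values from the examples (the variable codes is an illustrative name):

```r
status <- factor(c("Poor", "Improved", "Excellent", "Poor"),
                 levels = c("Poor", "Improved", "Excellent"),
                 ordered = TRUE)

codes <- as.integer(status)   # the underlying integer codes
print(codes)                  # 1 2 3 1
print(levels(status))         # "Poor" "Improved" "Excellent"

# Ordered factors support comparisons against a level
print(status < "Excellent")   # TRUE TRUE FALSE TRUE
```

The comparison in the last line is only meaningful because ordered=TRUE and the levels were given in the intended order.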

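Data Frames

A data frame (data.frame) can be thought of as a two-dimensional data table in which each row is a record and each column is an attribute. Unlike a matrix, the columns of a data frame may have different data types, which makes it more flexible and widely used; for example, Excel data imported into R is usually handled as a data frame. Working with data frames is somewhat more involved. The sections below cover the basics: building a data frame, row and column names, getting rows and columns, adding a column, data type conversion, subset queries, and merging.

Building a data frame

# The most basic way to initialize a data frame
# (stringsAsFactors=TRUE reproduces the factor columns shown in the output below;
#  since R 4.0.0 the default is FALSE, which would give character columns instead)
students<-data.frame(ID=c(1,2,3),
                     Name=c("jeffery","tom","kim"),
                     Gender=c("male","male","female"),
                     Birthdate=c("1986-10-19","1997-5-26","1998-9-8"),
                     stringsAsFactors=TRUE)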

Inspecting the data

> summary(students)
       ID           Name      Gender       Birthdate
 Min.   :1.0   jeffery:1   female:1   1986-10-19:1
 1st Qu.:1.5   kim    :1   male  :2   1997-5-26 :1
 Median :2.0   tom    :1              1998-9-8  :1
 Mean   :2.0
 3rd Qu.:2.5
 Max.   :3.0
> str(students)
'data.frame':	3 obs. of  4 variables:
 $ ID       : num  1 2 3
 $ Name     : Factor w/ 3 levels "jeffery","kim",..: 1 3 2
 $ Gender   : Factor w/ 2 levels "female","male": 2 2 1
 $ Birthdate: Factor w/ 3 levels "1986-10-19","1997-5-26",..: 1 2 3

Row names and column names

# Get row names and column names
> row.names(students)
[1] "1" "2" "3"
> rownames(students)
[1] "1" "2" "3"
> names(students)
[1] "ID"        "Name"      "Gender"    "Birthdate"
> colnames(students)
[1] "ID"        "Name"      "Gender"    "Birthdate"

# Set row names and column names (each pair is two equivalent ways; the later call wins)
> row.names(students)<-c("001","002","003")
> rownames(students)<-c("001","002","004")
> names(students)<-c("id",'name','gender','birthday')
> colnames(students)<-c("id",'name','gender','birth')

Getting rows and columns

Note that indices in R start at 1.

# Get columns
> students$name
[1] jeffery tom     kim
Levels: jeffery kim tom
> students[,2]
[1] jeffery tom     kim
Levels: jeffery kim tom
> students[[2]]
[1] jeffery tom     kim
Levels: jeffery kim tom
> students[2]
       name
001 jeffery
002     tom
004     kim
> students['name']
       name
001 jeffery
002     tom
004     kim
> students[c('id','name')]
    id    name
001  1 jeffery
002  2     tom
004  3     kim
> students[1:2]
    id    name
001  1 jeffery
002  2     tom
004  3     kim

# Get rows
> students[1,]
    id    name gender      birth
001  1 jeffery   male 1986-10-19

# Get rows and columns together
> students[2:3,2:4]
    name gender     birth
002  tom   male 1997-5-26
004  kim female  1998-9-8
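The column accessors above differ in what they return, which matters when the result is passed to other functions. A small sketch on a reduced, illustrative copy of the students data (built with stringsAsFactors = FALSE here, so the column is character rather than factor):

```r
students <- data.frame(ID = c(1, 2, 3),
                       Name = c("jeffery", "tom", "kim"),
                       stringsAsFactors = FALSE)

col_df  <- students["Name"]     # single bracket: still a data frame
col_vec <- students[["Name"]]   # double bracket (or $): the column itself

print(class(col_df))    # "data.frame"
print(class(col_vec))   # "character"

# Matrix-style indexing drops to a vector unless drop = FALSE is given
kept <- students[, "Name", drop = FALSE]
print(class(kept))      # "data.frame"
```

In short, single brackets keep the container, while double brackets and $ extract its content.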

For more complex operations, the following functions can shorten the code:

# attach, detach
> attach(students)
> name<-name
> detach(students)
> name
[1] jeffery tom     kim
Levels: jeffery kim tom

# with
> with(students,{
+ name<-name
+ })
> print(name)
[1] jeffery tom     kim
Levels: jeffery kim tom

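There is one caveat with with: an assignment made with <- inside {} is local to the with() call and does not change a variable that already exists in the global environment. To assign to such a global variable, use <<- instead:

# 01
> name<-c(1,2,3)
> with(students,{
+ name<-name
+ })
> name # the result differs from the one above
[1] 1 2 3

# 02
> name<-c(1,2,3)
> with(students,{
+ name<<-name
+ })
> name # now the result matches the one above
[1] jeffery tom     kim
Levels: jeffery kim tom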

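Adding a column

# Note: from here on the examples use the original column names
# (ID, Name, Gender, Birthdate) and default row names, i.e. the data frame as first built.

# Option 1: assign with $
> students$Age<-as.integer(format(Sys.Date(),"%Y"))-as.integer(format(as.Date(students$Birthdate),"%Y"))

# Option 2: within() returns a modified copy of the data frame
> students<-within(students,{
+ Age<-as.integer(format(Sys.Date(),"%Y"))-as.integer(format(as.Date(Birthdate),"%Y"))
+ })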

Data type conversion

> students$Name<-as.character(students$Name)
> students$Birthdate<-as.Date(students$Birthdate)
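One conversion deserves extra care: calling as.numeric() directly on a factor returns the internal level codes, not the values printed on screen. A minimal sketch with made-up data (f, codes, and values are illustrative names):

```r
f <- factor(c("10", "20", "10", "30"))

codes  <- as.numeric(f)                 # internal level codes
values <- as.numeric(as.character(f))   # the numbers the labels represent

print(codes)    # 1 2 1 3
print(values)   # 10 20 10 30
```

Going through as.character() first, as the conversion of Name above does, avoids this trap.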

Subset queries

> students[which(students$Gender=="male"),] # rows where the gender is male
> students[which(students$Gender=="male"),"Name"] # names of the male students
[1] jeffery tom
Levels: jeffery kim tom

# The Age values depend on the current date; the output below was produced in 2019
> subset(students,Gender=="male" & Age<30,select=c("Name","Age"))
  Name Age
2  tom  22

# The same query in SQL, using the sqldf package
> library(sqldf)
> result<-sqldf("select Name,Age from students where Gender='male' and Age<30")

Merging data

# inner join
> score<-data.frame(SID=c(1,1,2,3,3),Course=c("Math","English","Math","Chinese","Math"),Score=c(90,80,80,95,96))
> result<-merge(students,score,by.x="ID",by.y="SID")
> result
  ID    Name Gender  Birthdate Age  Course Score
1  1 jeffery   male 1986-10-19  33    Math    90
2  1 jeffery   male 1986-10-19  33 English    80
3  2     tom   male  1997-5-26  22    Math    80
4  3     kim female   1998-9-8  21 Chinese    95
5  3     kim female   1998-9-8  21    Math    96

# rbind: stack rows (both data frames need the same columns)
> student2<-data.frame(ID=c(21,22),Name=c("Yan","Peng"),Gender=c("female","male"),Birthdate=c("1982-2-9","1983-1-16"),Age=c(32,31))
> rbind(student2, students)
  ID    Name Gender  Birthdate Age
1 21     Yan female   1982-2-9  32
2 22    Peng   male  1983-1-16  31
3  1 jeffery   male 1986-10-19  33
4  2     tom   male  1997-5-26  22
5  3     kim female   1998-9-8  21

# cbind: put columns side by side (rows are paired by position, not by key)
> cbind(students, score[1:3,])
  ID    Name Gender  Birthdate Age SID  Course Score
1  1 jeffery   male 1986-10-19  33   1    Math    90
2  2     tom   male  1997-5-26  22   1 English    80
3  3     kim female   1998-9-8  21   2    Math    80
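merge() performs an inner join by default, so students without a score would be dropped. As a sketch of the alternative, the all.x argument of merge() keeps every row of the first data frame; the reduced data below is illustrative and not taken from the original post:

```r
students <- data.frame(ID = c(1, 2, 3), Name = c("jeffery", "tom", "kim"))
score <- data.frame(SID = c(1, 1, 2), Score = c(90, 80, 80))  # no score for ID 3

inner <- merge(students, score, by.x = "ID", by.y = "SID")
left  <- merge(students, score, by.x = "ID", by.y = "SID", all.x = TRUE)

print(nrow(inner))                   # 3
print(nrow(left))                    # 4
print(left[left$ID == 3, "Score"])   # NA
```

The unmatched student appears in the left join with NA in the columns that come from score.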

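Lists

The list is the most complex of R's data types. Generally speaking, a list is an ordered collection of objects (or components). A list lets you gather several (possibly unrelated) objects under a single object name. For example, a list may combine vectors, matrices, data frames, and even other lists. Lists are created with the function list():

mylist <- list(obj1, obj2, ...)
# or
mylist <- list(name1=obj1, name2=obj2, ...)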

The main list operations, including building a list and accessing its elements, are shown below:

# Build a list
> a = 'My First List'
> b = c(1,2,3,4,5)
> c = matrix(1:10, nrow=5)
> d = c("1","2","3","4","5")
> mylist = list(title=a,months=b,c,d)
> mylist
$title
[1] "My First List"

$months
[1] 1 2 3 4 5

[[3]]
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10

[[4]]
[1] "1" "2" "3" "4" "5"

# Indexing (pay close attention to the differences)
> mylist[[1]] # returns the element itself
[1] "My First List"
> mylist[1] # returns a list
$title
[1] "My First List"

> mylist['title'] # returns a list
$title
[1] "My First List"

> mylist[['title']] # returns the element itself
[1] "My First List"
> mylist$title # returns the element itself
[1] "My First List"

# So, as you might guess, a sub-list can be built like this:
> mylist[c('title','months')]
$title
[1] "My First List"

$months
[1] 1 2 3 4 5
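Beyond reading elements, lists are usually modified in place. A short sketch of adding and removing components, using a reduced version of mylist (the note component and the kinds variable are illustrative additions):

```r
mylist <- list(title = "My First List", months = c(1, 2, 3, 4, 5))

mylist$note <- "added later"   # add a named component
mylist[["months"]] <- NULL     # assigning NULL removes a component

print(names(mylist))    # "title" "note"
print(length(mylist))   # 2

# sapply() applies a function to every component
kinds <- sapply(mylist, class)
print(kinds)
```

Assigning NULL with [[ ]] or $ deletes the component rather than storing a NULL value.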

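Miscellaneous

The sample code above touches on several functions that are easy to confuse. They are summarized here:

Context functions

The difference between with and attach is that inside a with context you must use <<- to overwrite the value of a global variable, whereas after attach an ordinary assignment at the top level overwrites it directly. within evaluates code in the same way as with but returns something different: within returns the original data structure (list, data frame, etc.) with all modifications applied, while with returns the value of the evaluated expression.
- with
- attach, detach
- within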
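The difference in return values between with and within can be seen in a few lines. A minimal sketch on made-up data (df, total, and df2 are illustrative names):

```r
df <- data.frame(x = c(1, 2, 3), y = c(10, 20, 30))

total <- with(df, sum(x * y))        # with() returns the value of the expression
df2   <- within(df, { z <- x + y })  # within() returns a modified copy of df

print(total)        # 140
print(names(df2))   # "x" "y" "z"
print(names(df))    # "x" "y"  (the original is unchanged)
```

So with is a good fit for computing a result from columns, and within for deriving new columns.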
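Data type functions

In R, every object has a mode and a class. The mode describes the basic kind of storage the object uses (for example numeric, character, list, or function), while the class is the object's abstract type. typeof reports the more fine-grained internal type; for instance, it distinguishes integer from double.
- typeof
  
  The Type of an Object
- mode
  
  The (Storage) Mode of an Object
- class
  
  R possesses a simple generic function mechanism which can be used for an object-oriented style of programming. Method dispatch takes place based on the class of the first argument to the generic function.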
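Calling the three functions side by side on a few objects shows how they differ. In the sketch below, describe is a hypothetical helper written for this comparison, not a base R function:

```r
# Hypothetical helper: collect typeof, mode, and (first) class of an object
describe <- function(x) {
  c(typeof = typeof(x), mode = mode(x), class = class(x)[1])
}

print(describe(1L))                  # integer  numeric   integer
print(describe(1.5))                 # double   numeric   numeric
print(describe(factor("a")))         # integer  numeric   factor
print(describe(data.frame(x = 1)))   # list     list      data.frame
print(describe(sum))                 # builtin  function  function
```

A factor is a good reminder of why class matters: it is stored as integers, yet it behaves as categorical data because of its class.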
