Adonis结果P值小于0.05,一定代表两组样品物种构成差异显著吗?
前情回顧
方差分析基本概念:方差分析中的“元”和“因素”是什么?
PERMANOVA原理解釋:這個統計檢驗可用于判斷PCA/PCoA等的分群效果是否顯著!
實戰1:畫一個帶統計檢驗的PCoA分析結果
配對檢驗:畫一個帶統計檢驗的PcOA分析結果 (再進一步,配對比較)
你的adonis用對了嗎?不同因素的順序竟然對結果有很大影響
為PERMANOVA/Adonis分析保駕護航,檢驗數據離散度
非參數檢驗也不是什么都不需要關注,比如上面提到的因素順序和方差加和方式是一個需要注意的點。除此之外,非參數多元方差分析應用時還有下面這些注意事項:
PERMANOVA檢驗沒有考慮變量之間的共線性關系,因此也不能夠用于探索這種關系。
嵌套或分層設計 (Nested or hierarchical designs)時需要考慮合適的置換策略。
需要明確哪些樣品是真正可以交換的 (exchangeable)。
PERMANOVA有個假設是balanced designs (不同分組的樣本數相等), 非平衡設計也能處理。
如果不同組的樣品在檢測指標構成的空間的中心點沒有差別,但每個組內檢測指標離散度較大,也會導致獲得顯著性的P值。
在解釋結果時,需要同時評估數據離散度的影響。
vegdist評估數據離散度,再解釋adonis的結果
前面我們用下面的代碼檢驗了Managment對物種組成差異影響的顯著程度,獲得P-value=0.002 < 0.05,表示管理方式對物種組成有顯著影響。但這一影響是否受到每個分組里面數據離散程度的影響呢?
library(vegan) data(dune) data(dune.env) # 基于bray-curtis距離進行計算 set.seed(1) dune.div <- adonis2(dune ~ Management, data = dune.env, permutations = 999, method="bray")dune.div## Permutation test for adonis under reduced model ## Terms added sequentially (first to last) ## Permutation: free ## Number of permutations: 999 ## ## adonis2(formula = dune ~ Management, data = dune.env, permutations = 999, method = "bray") ## Df SumOfSqs R2 F Pr(>F) ## Management 3 1.4686 0.34161 2.7672 0.002 ** ## Residual 16 2.8304 0.65839 ## Total 19 4.2990 1.00000 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1我們還需要利用betadisper評估下每組樣本物種組成的多元一致性 (Multivariate homogeneity of groups dispersions (variances))。如下代碼計算出P=0.168表示不同分組樣品檢測指標的離散度(方差)沒有顯著差異。那么,adonis檢測出的差異就是因為每組數據在空間的中心點不同造成的,進一步說明Management對物種組成有顯著影響。
# 計算加權bray-curtis距離 dune.dist <- vegdist(dune, method="bray", binary=F)# One measure of multivariate dispersion (variance) for a group of samples # is to calculate the average distance of group members to the group centroid # or spatial median in multivariate space. # To test if the dispersions (variances) of one or more groups are different, # the distances of group members to the group centroid are subject to ANOVA. # This is a multivariate analogue of Levene's test for homogeneity of variances # if the distances between group members and # group centroids is the Euclidean distance. dispersion <- betadisper(dune.dist, group=dune.env$Management) permutest(dispersion)## ## Permutation test for homogeneity of multivariate dispersions ## Permutation: free ## Number of permutations: 999 ## ## Response: Distances ## Df Sum Sq Mean Sq F N.Perm Pr(>F) ## Groups 3 0.13831 0.046104 1.9506 999 0.159 ## Residuals 16 0.37816 0.023635從下面的圖上也可以看出,4種管理方式下樣品在空間的中心點相距較遠。(也可以參考前面如何美化這個圖)
plot(dispersion, hull=FALSE, ellipse=TRUE) ##sd ellipseQ: When running adonis (vegan package) I got an r2 = 0.45, andp = 0.001. When I ran the betadisper and ran a subsequent permutation test I got an F = 1 and p = 0.3.
A: A non-significant result in betadisper is not necessarily related to a significant/non-significant result in adonis. The two tests are testing different hypothesis. The former testshomogeneity of dispersion among groups (regions in your case), which is a condition (assumption) for adonis. The latter tests no difference in ‘location’, that is, tests whether composition among groups is similar or not. You may have the centroids of two groups in NMS at a very similar position in the ordination space, but if theirdispersions are quite different, adonis will give you a significant p-value, thus, the result is heavily influenced not by thedifference in composition between groups but bydifferences in composition within groups (heterogeneous dispersion, and thus a measure of beta diversity). In short, your results are fine, you are meeting the ‘one assumption’ for adonis (homogeneous dispersion) and thus you are certain that results from adonis are ‘real’ and not an artifact of heterogeneous dispersions. For more information you can read Anderson (2006) Biometrics 62(1):245-253 and Anderson (2006) Ecology Letters 9(6):683-693. Hope this helps!
https://stats.stackexchange.com/questions/212137/betadisper-and-adonis-in-r-am-i-interpreting-my-output-correctly
數據離散度不同而中心點一致,adonis也可能顯著
下面我們看一個模擬的例子,模擬出3套群體的物種豐度表,群體1、群體2、群體3的物種空間的中心點一致,而物種豐度的離散度依次變小,PERMANOVA檢驗顯著,betadisper結果也顯著,這時解釋數據時就要小心。這個導致顯著的原因是什么。
set.seed(1) num <- 30 # 控制每個物種的均值 mean <- seq(10,120,by=10) # 控制離散度 disp <- c(5,50,200)# 模擬3組樣品的數據;直接是轉置后的物種豐度表 sites.a <- as.data.frame(mapply(rnbinom, n=num, size=disp[1], mu=mean)) rownames(sites.a) <- paste('site.a', 1:num, sep=".") colnames(sites.a) <- paste('Species',letters[1:length(mean)], sep=".")sites.b <- as.data.frame(mapply(rnbinom, n=num, size=disp[1:2], mu=mean)) rownames(sites.b) <- paste('site.b', 1:num, sep=".") colnames(sites.b) <- paste('Species',letters[1:length(mean)], sep=".")sites.c <- as.data.frame(mapply(rnbinom, n=num, size=disp, mu=mean)) rownames(sites.c) <- paste('site.c', 1:num, sep=".") colnames(sites.c) <- paste('Species',letters[1:length(mean)], sep=".")otu_table_t <- rbind(sites.a,sites.b,sites.c) otu_table_t[sample(1:90,5),]## Species.a Species.b Species.c Species.d Species.e Species.f Species.g Species.h Species.i Species.j ## site.c.22 13 15 43 29 49 72 24 102 75 96 ## site.a.26 8 23 46 29 25 15 91 49 58 54 ## site.a.13 14 30 47 56 18 77 111 128 90 53 ## site.a.14 5 15 17 56 37 75 81 59 63 58 ## site.b.21 15 24 8 33 28 42 108 74 76 64 ## Species.k Species.l ## site.c.22 139 142 ## site.a.26 87 129 ## site.a.13 33 47 ## site.a.14 164 183 ## site.b.21 52 103生成Metadata數據,包含樣品的分組信息。目的就是檢驗不同組的物種構成是否有顯著差異。
metadata <- data.frame(Sample=rownames(otu_table_t), Group=rep(c("A","B","C"), each=num)) rownames(metadata) <- metadata$Sample metadata[sample(1:90,5),,drop=F]## Sample Group ## site.a.28 site.a.28 A ## site.b.12 site.b.12 B ## site.a.20 site.a.20 A ## site.b.10 site.b.10 B ## site.a.10 site.a.10 APCoA和NMDS分析可視化不同組樣品物種組成的差異度
統計分析前,先直觀的看一下不同組樣本在物種定義的空間上的分布。
為什么要畫個圖:參考 - 什么是安斯庫姆四重奏?為什么統計分析之前必須要作圖?
# 計算加權bray-curtis距離 otu_dist <- vegdist(otu_table_t, method="bray", binary=F)otu_pcoa <- cmdscale(otu_dist, k=3, eig=T)otu_pcoa_points <- as.data.frame(otu_pcoa$points) sum_eig <- sum(otu_pcoa$eig) eig_percent <- round(otu_pcoa$eig/sum_eig*100,1)colnames(otu_pcoa_points) <- paste0("PCoA", 1:3)otu_pcoa_result <- cbind(otu_pcoa_points, metadata)從PCoA的結果上來看,A,B,C三個組在第一、第二、第三主坐標軸沒有明顯的區分開。
library(ggplot2) library(patchwork)ggplot(otu_pcoa_result, aes(x=PCoA1, y=PCoA2, color=Group)) +labs(x=paste("PCoA 1 (", eig_percent[1], "%)", sep=""),y=paste("PCoA 2 (", eig_percent[2], "%)", sep="")) +geom_point(size=4) + stat_ellipse(level=0.9) +theme_classic() + coord_fixed() +ggplot(otu_pcoa_result, aes(x=PCoA1, y=PCoA3, color=Group)) +labs(x=paste("PCoA 1 (", eig_percent[1], "%)", sep=""),y=paste("PCoA 3 (", eig_percent[3], "%)", sep="")) +geom_point(size=4) + stat_ellipse(level=0.9) +theme_classic() + coord_fixed()從NMDS結果看,A,B,C三組也區分不開。
otu_mds <- metaMDS(otu_table_t, k=5) #using all the defaults## Square root transformation ## Wisconsin double standardization ## Run 0 stress 0.1131245 ## Run 1 stress 0.1131233 ## ... New best solution ## ... Procrustes: rmse 0.0003155417 max resid 0.001341899 ## ... Similar to previous best ## Run 2 stress 0.1131243 ## ... Procrustes: rmse 0.0009154324 max resid 0.00352237 ## ... Similar to previous best ## Run 3 stress 0.1131238 ## ... Procrustes: rmse 0.0002307456 max resid 0.001378836 ## ... Similar to previous best ## Run 4 stress 0.1131239 ## ... Procrustes: rmse 0.0002008885 max resid 0.0008441584 ## ... Similar to previous best ## Run 5 stress 0.1131233 ## ... Procrustes: rmse 0.0004594988 max resid 0.00248363 ## ... Similar to previous best ## Run 6 stress 0.1136538 ## Run 7 stress 0.1131231 ## ... New best solution ## ... Procrustes: rmse 6.187922e-05 max resid 0.0002788433 ## ... Similar to previous best ## Run 8 stress 0.1131234 ## ... Procrustes: rmse 0.000457399 max resid 0.002017475 ## ... Similar to previous best ## Run 9 stress 0.1131243 ## ... Procrustes: rmse 0.0003620819 max resid 0.001329571 ## ... Similar to previous best ## Run 10 stress 0.1131235 ## ... Procrustes: rmse 0.0001788438 max resid 0.0008840311 ## ... Similar to previous best ## Run 11 stress 0.1131248 ## ... Procrustes: rmse 0.0004674201 max resid 0.001960981 ## ... Similar to previous best ## Run 12 stress 0.1131231 ## ... New best solution ## ... Procrustes: rmse 0.0003807188 max resid 0.001578129 ## ... Similar to previous best ## Run 13 stress 0.1131238 ## ... Procrustes: rmse 0.0004016239 max resid 0.002178598 ## ... Similar to previous best ## Run 14 stress 0.113123 ## ... New best solution ## ... Procrustes: rmse 0.0001931854 max resid 0.0007886561 ## ... Similar to previous best ## Run 15 stress 0.1176584 ## Run 16 stress 0.1131244 ## ... Procrustes: rmse 0.000621146 max resid 0.002339344 ## ... Similar to previous best ## Run 17 stress 0.1131237 ## ... Procrustes: rmse 0.0004553297 max resid 0.0019548 ## ... Similar to previous best ## Run 18 stress 0.1131236 ## ... Procrustes: rmse 0.000454603 max resid 0.001894929 ## ... Similar to previous best ## Run 19 stress 0.1131241 ## ... Procrustes: rmse 0.0005855289 max resid 0.002455173 ## ... Similar to previous best ## Run 20 stress 0.113124 ## ... Procrustes: rmse 0.0005247607 max resid 0.001899271 ## ... Similar to previous best ## *** Solution reachedotu_mds_scores <- as.data.frame(scores(otu_mds)) otu_mds_scores <- cbind(otu_mds_scores, metadata)library(ggplot2) ggplot(data=otu_mds_scores, aes(x=NMDS1,y=NMDS2,colour=Group)) + geom_point(size=4) + stat_ellipse(level = 0.9) +theme_classic()進行Adonis檢驗和數據離散度評估
adonis結果顯示Pr(>F)<0.05,統計顯著;不同組之間的物種組成存在顯著差異。這與PCoA和NMDS的結果還是有些不一致的。那這個統計差異是怎么來的呢?
library(vegan) adon.results<-adonis(otu_dist ~ Group, data=metadata, perm=999) print(adon.results)## ## Call: ## adonis(formula = otu_dist ~ Group, data = metadata, permutations = 999) ## ## Permutation: free ## Number of permutations: 999 ## ## Terms added sequentially (first to last) ## ## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F) ## Group 2 0.10752 0.053760 2.4707 0.05375 0.001 *** ## Residuals 87 1.89300 0.021759 0.94625 ## Total 89 2.00052 1.00000 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1betadisper檢驗Pr(>F)<0.05表明不同組的數據在空間分布的離散度顯著不同。這是導致adonis結果顯著的主要原因。不同分組之間物種的構成的顯著不同不是體現在物種空間中心點的變化,而是物種空間離散度的變化。
mod <- betadisper(otu_dist, metadata$Group) permutest(mod)## ## Permutation test for homogeneity of multivariate dispersions ## Permutation: free ## Number of permutations: 999 ## ## Response: Distances ## Df Sum Sq Mean Sq F N.Perm Pr(>F) ## Groups 2 0.157498 0.078749 80.188 999 0.001 *** ## Residuals 87 0.085439 0.000982 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1用一組可視化來展示這個差異的成因
把每組樣本抽提出來,分別繪制下PCoA的樣品分布,可以看出,每組樣品在PCoA定義的空間上中心點是很相近的,而樣品分散程度不同。也就是說分組內樣品的多樣性反應到了不同分組的物種構成差異上了,這個“顯著”的差異是不是我們關注的,需要自己來判斷了。
# extract the centroids and the site points in multivariate space. centroids<-data.frame(grps=rownames(mod$centroids),data.frame(mod$centroids)) vectors<-data.frame(group=mod$group,data.frame(mod$vectors))# to create the lines from the centroids to each point we will put it in a format that ggplot can handle seg.data<-cbind(vectors[,1:3],centroids[rep(1:nrow(centroids),as.data.frame(table(vectors$group))$Freq),2:3]) names(seg.data)<-c("group","v.PCoA1","v.PCoA2","PCoA1","PCoA2")# create the convex hulls of the outermost points grp1.hull<-seg.data[seg.data$group=="A",1:3][chull(seg.data[seg.data$group=="A",2:3]),] grp2.hull<-seg.data[seg.data$group=="B",1:3][chull(seg.data[seg.data$group=="B",2:3]),] grp3.hull<-seg.data[seg.data$group=="C",1:3][chull(seg.data[seg.data$group=="C",2:3]),] all.hull<-rbind(grp1.hull,grp2.hull,grp3.hull)library(gridExtra)panel.a<-ggplot() +geom_polygon(data=all.hull[all.hull=="A",],aes(x=v.PCoA1,y=v.PCoA2),colour="black",alpha=0,linetype="dashed") +geom_segment(data=seg.data[1:30,],aes(x=v.PCoA1,xend=PCoA1,y=v.PCoA2,yend=PCoA2),alpha=0.30) + geom_point(data=centroids[1,1:3], aes(x=PCoA1,y=PCoA2),size=4,colour="red",shape=16) + geom_point(data=seg.data[1:30,], aes(x=v.PCoA1,y=v.PCoA2),size=2,shape=16) +labs(title="A",x="",y="") +coord_cartesian(xlim = c(-0.2,0.2), ylim = c(-0.25,0.2)) +theme_classic() + theme(legend.position="none")panel.b<-ggplot() + geom_polygon(data=all.hull[all.hull=="B",],aes(x=v.PCoA1,y=v.PCoA2),colour="black",alpha=0,linetype="dashed") +geom_segment(data=seg.data[31:60,],aes(x=v.PCoA1,xend=PCoA1,y=v.PCoA2,yend=PCoA2),alpha=0.30) + geom_point(data=centroids[2,1:3], aes(x=PCoA1,y=PCoA2),size=4,colour="red",shape=17) + geom_point(data=seg.data[31:60,], aes(x=v.PCoA1,y=v.PCoA2),size=2,shape=17) +labs(title="B",x="",y="") +coord_cartesian(xlim = c(-0.2,0.2), ylim = c(-0.25,0.2)) +theme_classic() + theme(legend.position="none")panel.c<-ggplot() + geom_polygon(data=all.hull[all.hull=="C",],aes(x=v.PCoA1,y=v.PCoA2),colour="black",alpha=0,linetype="dashed") +geom_segment(data=seg.data[61:90,],aes(x=v.PCoA1,xend=PCoA1,y=v.PCoA2,yend=PCoA2),alpha=0.30) +geom_point(data=centroids[3,1:3], aes(x=PCoA1,y=PCoA2),size=4,colour="red",shape=15) + geom_point(data=seg.data[61:90,], aes(x=v.PCoA1,y=v.PCoA2),size=2,shape=15) + labs(title="C",x="",y="") +coord_cartesian(xlim = c(-0.2,0.2), ylim = c(-0.25,0.2)) +theme_classic() + theme(legend.position="none")panel.d<-ggplot() + geom_polygon(data=all.hull,aes(x=v.PCoA1,y=v.PCoA2),colour="black",alpha=0,linetype="dashed") +geom_segment(data=seg.data,aes(x=v.PCoA1,xend=PCoA1,y=v.PCoA2,yend=PCoA2),alpha=0.30) + geom_point(data=centroids[,1:3], aes(x=PCoA1,y=PCoA2,shape=grps),size=4,colour="red") + geom_point(data=seg.data, aes(x=v.PCoA1,y=v.PCoA2,shape=group),size=2) + labs(title="All",x="",y="") +coord_cartesian(xlim = c(-0.2,0.2), ylim = c(-0.25,0.2)) +theme_classic() + theme(legend.position="none")grid.arrange(panel.a,panel.b,panel.c,panel.d,nrow=1)PERMANOVA的作者對這個問題的看法
Marti Anderson: “[…] Although there is also no explicit assumption regarding the homogeneity of spread within each group, PERMANOVA, like ANOSIM (Clarke 1993), will be sensitive to differences in spread (variability) among groups. Thus, if a significant difference between groups is detected using PERMANOVA, then this could be due to differences in location, differences in spread, or a combinationof the two. Perhaps the best approach is to perform a separate test for homogeneity (e.g., using the program PERMDISP) including pair-wise comparisons, as well as examining the average within and between-group distances and associated MDS plots. This will help to determine the nature of the difference between any pair of groups, whether it be due to location, spread, or a combination of the two. […]”
參考
https://www.scribbr.com/frequently-asked-questions/one-way-vs-two-way-anova/
MANOVA的前提假設 https://www.real-statistics.com/multivariate-statistics/multivariate-analysis-of-variance-manova/manova-assumptions/ ?https://www.statology.org/manova-assumptions/
https://statistics.laerd.com/statistical-guides/one-way-anova-statistical-guide.php
https://www.yunbios.net/h-nd-570.html
https://mp.weixin.qq.com/s/v_k4Yhe9rBWM9y9A3P3wQw
https://mp.weixin.qq.com/s?__biz=MzUzMjA4Njc1MA==&mid=2247484678&idx=1&sn=f95418a311e639704e9848545efc7fd7&scene=21#wechat_redirect
https://chrischizinski.github.io/rstats/vegan-ggplot2/
https://chrischizinski.github.io/rstats/adonis/
https://chrischizinski.github.io/rstats/ordisurf/
https://www.rdocumentation.org/packages/vegan/versions/1.11-0/topics/adonis
https://www.jianshu.com/p/dfa689f7cafd
https://stats.stackexchange.com/questions/312302/adonis-in-vegan-order-of-variables-non-nested-with-one-degree-of-freedom-for
https://stats.stackexchange.com/questions/188519/adonis-in-vegan-order-of-variables-or-use-of-strata?noredirect=1
https://github.com/vegandevs/vegan/issues/229
https://stats.stackexchange.com/questions/476256/adonis-vs-adonis2
清晰解釋Type I, Type II, Type III https://mcfromnz.wordpress.com/2011/03/02/anova-type-iiiiii-ss-explained/
清晰解釋Type I, Type II, Type III https://stats.stackexchange.com/questions/60362/choice-between-type-i-type-ii-or-type-iii-anova
https://thebiobucket.blogspot.com/2011/08/two-way-permanova-adonis-with-custom.html#more
adonis的前提條件 https://thebiobucket.blogspot.com/2011/04/assumptions-for-permanova-with-adonis.html#more
作者的論文 https://static1.squarespace.com/static/580e3c475016e191c523a0e2/t/5813ba8b5016e1a5b61f454a/1477687949842/Anderson_et_al-2013-ANOSIM+vs.+PERMANOVA.pdf
離散度 adonis https://chrischizinski.github.io/rstats/adonis/
往期精品(點擊圖片直達文字對應教程)
機器學習
后臺回復“生信寶典福利第一波”或點擊閱讀原文獲取教程合集
創作挑戰賽新人創作獎勵來咯,堅持創作打卡瓜分現金大獎總結
以上是生活随笔為你收集整理的Adonis结果P值小于0.05,一定代表两组样品物种构成差异显著吗?的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: linux图形界面编程基本知识
- 下一篇: su官网