日韩性视频-久久久蜜桃-www中文字幕-在线中文字幕av-亚洲欧美一区二区三区四区-撸久久-香蕉视频一区-久久无码精品丰满人妻-国产高潮av-激情福利社-日韩av网址大全-国产精品久久999-日本五十路在线-性欧美在线-久久99精品波多结衣一区-男女午夜免费视频-黑人极品ⅴideos精品欧美棵-人人妻人人澡人人爽精品欧美一区-日韩一区在线看-欧美a级在线免费观看

歡迎訪(fǎng)問(wèn) 生活随笔!

生活随笔

當(dāng)前位置: 首頁(yè) > 编程资源 > 编程问答 >内容正文

编程问答

GSVA和生存分析

發(fā)布時(shí)間:2023/12/16 编程问答 26 豆豆
生活随笔 收集整理的這篇文章主要介紹了 GSVA和生存分析 小編覺(jué)得挺不錯(cuò)的,現(xiàn)在分享給大家,幫大家做個(gè)參考.

為什么一定要是預(yù)后相關(guān)基因呢 | 生信菜鳥(niǎo)團(tuán) (bio-info-trainee.com)?

200塊的代碼我的學(xué)徒免費(fèi)送給你,GSVA和生存分析 (qq.com)

我們生信技能樹(shù)B站又免費(fèi)悄咪咪的上線(xiàn)了一個(gè)GSVA生存分析教學(xué)視頻,然后學(xué)徒馬上就學(xué)習(xí)了,居然是主動(dòng)學(xué)習(xí)的,讓我喜出望外!所以馬上分享了他匯報(bào)給我的學(xué)習(xí)筆記,希望對(duì)你有幫助哦!

(現(xiàn)在學(xué)習(xí)量和彈幕都非常少,大家的機(jī)會(huì)來(lái)了哦!)

https://www.bilibili.com/video/av81874183

前奏

最近做的生存分析都是奇奇怪怪的,從來(lái)沒(méi)有重復(fù)出作者的圖。哈哈哈,但是我相信自己的代碼是沒(méi)有錯(cuò)的,只是參數(shù)設(shè)置跟作者不同而已。

最近跟著Jimmy老師的視頻,學(xué)習(xí)用GSVA做生存分析,反反復(fù)復(fù)做了幾次都無(wú)法復(fù)現(xiàn)作者的結(jié)果。

第一次是我找的基因集跟作者不一致,第二次可能參數(shù)不同,也只有一個(gè)顯著性結(jié)果。

看了這篇推文,幫助你節(jié)約200塊錢(qián),可以多搓一頓火鍋了。

看上圖👆,不是我騙你,真的有人代碼賣(mài)200塊。好啦,正文開(kāi)始啦!

學(xué)習(xí)前必須要知道的生物學(xué)知識(shí)

01

生存分析(Survival analysis)是指根據(jù)試驗(yàn)或調(diào)查得到的數(shù)據(jù)對(duì)生物或人的生存時(shí)間進(jìn)行分析和推斷,研究生存時(shí)間和結(jié)局與眾多影響因素間關(guān)系及其程度大小的方法,也稱(chēng)生存率分析存活率分析。常見(jiàn)分析的用壽命表法、Kaplan-Meier,Cox回歸等等. 構(gòu)建在于病人分組后進(jìn)行比較!

如果是多個(gè)基因,可以通過(guò)GSVA來(lái)進(jìn)行分組再去做生存分析。

02

GSVA?(The Gene Set Variation Analysis package for microarray and RNA-seq data),算法太復(fù)雜,詳情搜對(duì)應(yīng)的R包。

我是大標(biāo)題-1.讀文獻(xiàn)

? ? ? ?第一步需要對(duì)文獻(xiàn)進(jìn)行解析,文獻(xiàn)來(lái)源為Cell. 2018 May ?:Chemoresistance Evolution in Triple-Negative Breast Cancer Delineated by Single-Cell Sequencing

? ? ? ? 暫時(shí)跳過(guò)了這一步,因?yàn)闀r(shí)間有限,就先看了Jimmy老師的推文使用單細(xì)胞多組學(xué)探索TNBC病人的新輔助化療療效

大標(biāo)題-2.數(shù)據(jù)庫(kù)查詢(xún)指定基因集的基因列表

  • msigdb數(shù)據(jù)庫(kù)(找出基因集的基因)

  • 根據(jù)一些feature分類(lèi);可能是unique,可能是multi

  • 下載msigd數(shù)據(jù)庫(kù)的all.gmt數(shù)據(jù)后,用Linux處理

2-1.下載gmt數(shù)據(jù)

打開(kāi)對(duì)應(yīng)的官網(wǎng)(http://software.broadinstitute.org/gsea/msigdb/index.jsp)

?

?

👆就是對(duì)應(yīng)的gmt文件,我們下載基因名稱(chēng)的gmt文件

然后打開(kāi)終端,cd到文件夾

ls?-lh

結(jié)果如👇

?

這里做了兩次,我分兩次講解,第二次會(huì)總結(jié)第一次的錯(cuò)誤

下面是第一次的workflow

找到你想要的基因集的基因

###尋找基因集 grep?-i?epithelial_mesenchymal?* #h.all.v6.2.entrez.gmt:HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION????http://www.broadinstitute.org/gsea/msigdb/cards/HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION??1281????1290????12892200????1277????2335????1293????5054????1278????1282????1284????1462????3486????7045????66784060????3915????4015????3918????1490????6876????1294????4017????1292????3685????7058????13074837????7168????1000????4313????1301????7057????2191????633?871?11167???10631???70706696????3371????7980????22795???3693????4314#???7431????4016????10516???1303????2006????10091311????6695????649?9235????3909????7076????5768????7078????7412????3491????10085???800?57692???2192????6443????1893????3908????10272???7169????3624????1601????2014????10409???36784256????7422????2919????7474????6382????5352????5118????26585???3688????50509???388?56542247????6641????1647????4232????131578??4982????966?59??30008???4147????26577???52702817????25890???2517????6586????284217??56937???1296????2201????3485????5217????6385????960?4616????3576????11010???290?64175???7424????4323????6444????5351????4148????10398???6535813?5396????51330???2331????3398????2669????5329????4638????7040????6422????8985????3569333?2199????4487????5806????8325????3725????10979???22943???6591????667?7171????16342697????5376????3487????1314????4035????3673????2316????8076????5744????7049????6424????39565999????1004????6303????4907????1809????5479????7052????6445????3690????8572????115908??18429244????374?3600????4176????2619????5645????23705???5021????7857????6372????4312????7128822?10486???25878???2303????50863???2026????355?627?8038????5817????6387????51599353????4853????79709???2882????7456

這里我下載成entrezID.gmt,后面重新下載。正確基因集的結(jié)果如👇

grep?-i?epithelial_mesenchymal?* #HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION????http://www.gsea-msigdb.org/gsea/msigdb/cards/HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITIONCOL3A1???COL5A2??COL5A1??FBN1????COL1A1??FN1?COL6A3??SERPINE1????COL1A2??COL4A1??COL4A2??VCAN????IGFBP3??TGFBI???SPARC???LUM?LAMC1???LOX?LAMC2???CCN2????TAGLN???COL7A1??LOXL2???COL6A2??ITGAV???THBS2???COL16A1?NNMT????TPM1????CDH2????MMP2????COL11A1?THBS1???FAP?BGN?SERPINH1????FSTL1???POSTN???THY1????SPP1????TNC?TFPI2???NID2????ITGB5???MMP3????VIM?LOXL1???FBLN5???COL12A1?ELN?CDH11???COMP????SPOCK1??BMP1????IL32????LAMA3???TIMP1???QSOX1???TIMP3???VCAM1???CCN1????EDIL3???CALD1???MAGEE1??FBLN1???SGCB????ECM1????LAMA2???FSTL3???TPM2????INHBA???DAB2????EMP3????BASP1???ITGA5???MGP?VEGFA???CXCL1???WNT5A???SDC1????PLOD2???PCOLCE??GREM1???ITGB1???COL5A3??RHOB????HTRA1???FGF2????SNTB1???GADD45A?MEST????LRRC15??TNFRSF11B???CD59????ACTA2???EFEMP2??MATN2???PCOLCE2?SERPINE2????GPC1????ABI3BP??FUCA1???SLIT3???LAMA1???PMEPA1??COL8A2??FBN2????IGFBP2??PFN2????SDC4????CD44????GADD45B?CXCL8???GLIPR1??ANPEP???P3H1????VEGFC???MMP14???SGCD????PLOD1???MATN3???MYL9????SLC6A8??CALU????PRRX1???TNFRSF12A???FMOD????ID2?GEM?PLAUR???MYLK????TGFB1???SFRP1???PLOD3???IL6?APLP1???FBLN2???MSX1????PTX3????FZD8????JUN?FERMT2??DKK1????SNAI2???DST?TPM4????DCN?GJA1????PMP22???IGFBP4??COPA????LRP1????ITGA2???FLNA????MFAP5???PTHLH???TGFBR3??SFRP4???LGALS1??RGS4????CDH6????SAT1????NT5E????DPYSL3??PPIB????TGM2????SGCG????ITGB3???PDLIM4??CTHRC1??ECM2????CRLF1???AREG????IL15????MCM7????GAS1????PRSS2???CADM1???OXTR????SCG2????CXCL6???MMP1????TNFAIP3?CAPG????CAP2????MXRA5???FOXC2???NTM?ENO2????FAS?BDNF????ADAM12??PVR?CXCL12??PDGFRB??SLIT2???NOTCH2??COLGALT1????GPX7????WIPF1 grep?-i?HALLMARK_HYPOXIA?* #msigdb.v7.0.symbols.gmt:HALLMARK_HYPOXIA????http://www.gsea-msigdb.org/gsea/msigdb/cards/HALLMARK_HYPOXIAPGK1???PDK1????GBE1????PFKL????ALDOA???ENO2????PGM1????NDRG1???HK2?ALDOC???GPI?MXI1????SLC2A1??P4HA1ADM????P4HA2???ENO1????PFKP????AK4?FAM162A?PFKFB3??VEGFA???BNIP3L??TPI1????ERO1A???KDM3A???CCNG2???LDHA????GYS1????GAPDH???BHLHE40?ANGPTL4?JUN?SERPINE1????LOX?GCK?PPFIA4??MAFF????DDIT4???SLC2A3??IGFBPNFIL3??FOS?RBPJ????HK1?CITED2??ISG20???GALK1???WSB1????PYGM????STC1????ZNF292??BTG1????PLIN2???CSRP2VLDLR??JMJD6???EXT1????F3??PDK3????ANKZF1??UGP2????ALDOB???STC2????ERRFI1??ENO3????PNRC1???HMOX1???PGF?GAPDHS??CHST2???TMEM45A?BCAN????ATF3????CAV1????AMPD3???GPC3????NDST1???IRS2????SAP30???GAA?SDC4????STBD1IER3???PKLR????IGFBP1??PLAUR???CAVIN3??CCN5????LARGE1??NOCT????S100A4??RRAGD???ZFP36???EGFR????EDN2????IDS?CDKN1A??RORA????DUSP1???MIF?PPP1R3C?DPYSL4??KDELR3??DTNA????ADORA2B?HS3ST1??CAVIN1??NR3C1???KLF6????GPC4????CCN1????TNFAIP3?CA12????HEXA????BGN?PPP1R15A????PGM2????PIM1????PRDX5???NAGK????CDKN1B??BRS3????TKTL1MT1E???ATP7A???MT2A????SDC3????TIPARP??PKP1????ANXA2???PGAM2???DDIT3???PRKCA???SLC37A4?CXCR4???EFNA3???CP??KLF7????CCN2????CHST3???TPD52???LXN?B4GALNT2????PPARGC1A????BCL2????GCNT2???HAS1????KLHL24??SCARBSLC25A1????SDC2????CASP6???VHL?FOXO3???PDGFB???B3GALT6?SLC2A5??SRPX????EFNA1???GLRX????ACKR3???PAM?TGFBIDCN????SIAH2???PLAC8???FBP1????TPST2???PHKG1???MYH9????CDKN1C??GRHPR???PCK1????INHA????HSPA5???NDST2???NEDD4TPBG???XPNPEP1?IL6?SLC6A6??MAP3K1??LDHC????AKAP12??TES?KIF5A???LALBA???COL5A1??GPC1????HDLBP???ILVBLNCAN???TGM2????ETS1????HOXB9???SELENBP1????FOSL2???SULT2B1?TGFB3 grep?-i?HALLMARK_ANGIOGENESIS?* #msigdb.v7.0.symbols.gmt:HALLMARK_ANGIOGENESIS????http://www.gsea-msigdb.org/gsea/msigdb/cards/HALLMARK_ANGIOGENESIS??VCAN????POSTN???FSTL1???LRPAP1??STC1????LPL?VEGFA???PF4?THBD????FGFR1???TNFRSF21????CCND2COL5A2?ITGAV???SERPINA5????KCNJ8???APP?JAG1????COL3A1??SPP1????NRP1????OLR1????PDGFA???PTK2????SLCO2A1?PGLYRP1?VAV2????S100A4??MSX1????VTN?TIMP1???APOH????PRG2????JAG2????LUM?CXCL6 grep?-i?HALLMARK_PI3K_AKT_MTOR_SIGNALING?* #msigdb.v7.0.symbols.gmt:HALLMARK_PI3K_AKT_MTOR_SIGNALING????http://www.gsea-msigdb.org/gsea/msigdb/cards/HALLMARK_PI3K_AKT_MTOR_SIGNALING???MAPK8???PIK3R3??GRB2????NFKBIB??MAP2K6??MAPK9???AKT1????MAPK1???PLCG1???TRIB3GSK3B??MAP2K3??CDKN1A??RAC1????RIPK1???AKT1S1??ACTR2???PRKAR2A?YWHAB???HRAS????PDK1????PIKFYVE?TBK1????ACTR3E2F1???MYD88???ITPR2???SQSTM1??RPS6KA1?PTPN11??MAPKAP1?PLCB1???RAF1????CAMK4???RPTOR???CFL1????CDK4????TRAF2GNGT1??UBE2N???ADCY2???CDKN1B??VAV3????FGF6????ECSIT???RALB????ARF1????MKNK1???CDK1????PTEN????ARHGDIA?GRK2????FGF17???DDIT3???IRAK4???TIAM1???CDK2????SFN?PRKCB???GNA14???EIF4E???CLTC????TSC2????FGF22???PPP1CA??DUSP3HSP90B1????IL4?STAT2???SLA?EGFR????PLA2G12A????MAPK10??CALR????THEM4???RIT1????MKNK2???PPP2R1B?CAB39ARPC3??PITX2???NCK1????IL2RG???PFN1????FASLG???NOD1????DAPP1???UBE2D3??CAB39???AP2M1???MAP3K7??PRKAG1??CSNK2PRKAA2?ATF1????SLC2A1??PIN1????TNFRSF1A????LCK?RPS6KA3?NGF?CXCR4???ACACA???SMAD2???PAK4 grep?-i?BIOCARTA_ECM_PATHWAY?* #msigdb.v7.0.symbols.gmt:BIOCARTA_ECM_PATHWAY????http://www.gsea-msigdb.org/gsea/msigdb/cards/BIOCARTA_ECM_PATHWAY???MAPK3???PIK3CG??HRAS????MYL2????PIK3R1??MAP2K1??RAF1????RHOA????ROCK1???FYN?MAPK1???SHC1????TLN1????PFN1????GSN?DIAPH1??PIK3CA??ARHGAP5?ITGB1

這里有一個(gè)基因集找不到,只有查看來(lái)源文獻(xiàn)Loss of E-cadherin promotes metastasis via multiple downstream transcriptional pathways

結(jié)果是文獻(xiàn)套文獻(xiàn),套娃么。。放棄掉,這5個(gè)應(yīng)該也夠了。

下面是用哪個(gè)metabric下載表達(dá)矩陣

2-2.metabric下載表達(dá)矩陣

官網(wǎng)為[http://www.cbioportal.org/datasets],這里下載需要的BRCA的數(shù)據(jù)

?

下載完成解壓后內(nèi)容如👇

?

主要用到臨床信息、表達(dá)矩陣與體細(xì)胞變異文件

后面就可以放在R語(yǔ)言了完成🌶

2-3.制作GSVA的輸入

主要是制作表達(dá)矩陣與臨床信息

###讀入表達(dá)矩陣與臨床信息 rm(list?=?ls()) wkdir=getwd()?#為工作目錄賦值 options(stringsAsFactors?=?F) ##臨床信息 clin=read.table(file.path(wkdir,'brca_metabric','data_clinical_patient.txt'),header?=?T,sep='\t') ##表達(dá)矩陣 expr=read.table(file.path(wkdir,'brca_metabric','data_expression_median.txt'),header?=?T,sep='\t') expr[1:4,1:4] #??Hugo_Symbol?Entrez_Gene_Id??MB.0362??MB.0346 1????????RERE????????????473?8.676978?9.653589 2??????RNF165?????????494470?6.075331?6.687887 3????CD049690?????????????NA?5.453928?5.454185 4????BC033982?????????????NA?4.994525?5.346010 #需要將第一列作為行名并去掉第一、二列 rownames(expr)=expr$Hugo_Symbol expr=expr[,-c(1,2)] ###這里Jimmy老師為了節(jié)省空間,只取了取前兩位數(shù)expr?<-?apply(expr,?2,?function(x){as.numeric(format(x,digits?=?2))?#digit代表數(shù)字位數(shù)})rownames(expr)=gsexpr=na.omit(expr)?#忽略NA值

接下來(lái)還需要做一些變化,因?yàn)楸磉_(dá)矩陣的樣本名是以"."分割

?

而臨床信息是"-"分割

?

clin$PATIENT_ID?=?gsub('-','.',clin$PATIENT_ID)?##將“-”變?yōu)椤?” expr=clin[match(colnames(expr),clin$PATIENT_ID),]?#防止錯(cuò)位進(jìn)行配對(duì)矯正 ###取TNBC亞型 clin1=clin[clin$CLAUDIN_SUBTYPE%in%c("Basal","claudin-low"),] clin1=expr[,colnames(expr)%in%clin1$PATIENT_ID] save(clin1,clin1,file?=?"GSVA_input.Rdata")

可以看到有427個(gè)病人屬于這個(gè)亞型。如果不看Jimmy老師的視屏,我不會(huì)知道哪些是TNBC的。😃,生物學(xué)背景真的很重要,見(jiàn)推文沒(méi)有生物學(xué)背景的數(shù)據(jù)分析很危險(xiǎn)

2-4.gmt文件制作

用Linux將前面得到的基因集復(fù)制到一個(gè)文檔,后綴為gmt

cat?>?GSVA_input.gmt msigdb.v7.0.symbols.gmt:HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION????http://www.gsea-msigdb.org/gsea/msigdb/cards/HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITIONCOL3A1???COL5A2??COL5A1??FBN1????COL1A1??FN1?COL6A3??SERPINE1????COL1A2??COL4A1??COL4A2??VCAN????IGFBP3??TGFBI???SPARC???LUM?LAMC1???LOX?LAMC2???CCN2????TAGLN???COL7A1??LOXL2???COL6A2??ITGAV???THBS2???COL16A1?NNMT????TPM1????CDH2????MMP2????COL11A1?THBS1???FAP?BGN?SERPINH1????FSTL1???POSTN???THY1????SPP1????TNC?TFPI2???NID2????ITGB5???MMP3????VIM?LOXL1???FBLN5???COL12A1?ELN?CDH11???COMP????SPOCK1??BMP1????IL32????LAMA3???TIMP1???QSOX1???TIMP3???VCAM1???CCN1????EDIL3???CALD1???MAGEE1??FBLN1???SGCB????ECM1????LAMA2???FSTL3???TPM2????INHBA???DAB2????EMP3????BASP1???ITGA5???MGP?VEGFA???CXCL1???WNT5A???SDC1????PLOD2???PCOLCE??GREM1???ITGB1???COL5A3??RHOB????HTRA1???FGF2????SNTB1???GADD45A?MEST????LRRC15??TNFRSF11B???CD59????ACTA2???EFEMP2??MATN2???PCOLCE2?SERPINE2????GPC1????ABI3BP??FUCA1???SLIT3???LAMA1???PMEPA1??COL8A2??FBN2????IGFBP2??PFN2????SDC4????CD44????GADD45B?CXCL8???GLIPR1??ANPEP???P3H1????VEGFC???MMP14???SGCD????PLOD1???MATN3???MYL9????SLC6A8??CALU????PRRX1???TNFRSF12A???FMOD????ID2?GEM?PLAUR???MYLK????TGFB1???SFRP1???PLOD3???IL6?APLP1???FBLN2???MSX1????PTX3????FZD8????JUN?FERMT2??DKK1????SNAI2???DST?TPM4????DCN?GJA1????PMP22???IGFBP4??COPA????LRP1????ITGA2???FLNA????MFAP5???PTHLH???TGFBR3??SFRP4???LGALS1??RGS4????CDH6????SAT1????NT5E????DPYSL3??PPIB????TGM2????SGCG????ITGB3???PDLIM4??CTHRC1??ECM2????CRLF1???AREG????IL15????MCM7????GAS1????PRSS2???CADM1???OXTR????SCG2????CXCL6???MMP1????TNFAIP3?CAPG????CAP2????MXRA5???FOXC2???NTM?ENO2????FAS?BDNF????ADAM12??PVR?CXCL12??PDGFRB??SLIT2???NOTCH2??COLGALT1????GPX7????WIPF1 msigdb.v7.0.symbols.gmt:HALLMARK_HYPOXIA????http://www.gsea-msigdb.org/gsea/msigdb/cards/HALLMARK_HYPOXIAPGK1???PDK1????GBE1????PFKL????ALDOA???ENO2????PGM1????NDRG1???HK2?ALDOC???GPI?MXI1????SLC2A1??P4HA1ADM????P4HA2???ENO1????PFKP????AK4?FAM162A?PFKFB3??VEGFA???BNIP3L??TPI1????ERO1A???KDM3A???CCNG2???LDHA????GYS1????GAPDH???BHLHE40?ANGPTL4?JUN?SERPINE1????LOX?GCK?PPFIA4??MAFF????DDIT4???SLC2A3??IGFBPNFIL3??FOS?RBPJ????HK1?CITED2??ISG20???GALK1???WSB1????PYGM????STC1????ZNF292??BTG1????PLIN2???CSRP2VLDLR??JMJD6???EXT1????F3??PDK3????ANKZF1??UGP2????ALDOB???STC2????ERRFI1??ENO3????PNRC1???HMOX1???PGF?GAPDHS??CHST2???TMEM45A?BCAN????ATF3????CAV1????AMPD3???GPC3????NDST1???IRS2????SAP30???GAA?SDC4????STBD1IER3???PKLR????IGFBP1??PLAUR???CAVIN3??CCN5????LARGE1??NOCT????S100A4??RRAGD???ZFP36???EGFR????EDN2????IDS?CDKN1A??RORA????DUSP1???MIF?PPP1R3C?DPYSL4??KDELR3??DTNA????ADORA2B?HS3ST1??CAVIN1??NR3C1???KLF6????GPC4????CCN1????TNFAIP3?CA12????HEXA????BGN?PPP1R15A????PGM2????PIM1????PRDX5???NAGK????CDKN1B??BRS3????TKTL1MT1E???ATP7A???MT2A????SDC3????TIPARP??PKP1????ANXA2???PGAM2???DDIT3???PRKCA???SLC37A4?CXCR4???EFNA3???CP??KLF7????CCN2????CHST3???TPD52???LXN?B4GALNT2????PPARGC1A????BCL2????GCNT2???HAS1????KLHL24??SCARBSLC25A1????SDC2????CASP6???VHL?FOXO3???PDGFB???B3GALT6?SLC2A5??SRPX????EFNA1???GLRX????ACKR3???PAM?TGFBIDCN????SIAH2???PLAC8???FBP1????TPST2???PHKG1???MYH9????CDKN1C??GRHPR???PCK1????INHA????HSPA5???NDST2???NEDD4TPBG???XPNPEP1?IL6?SLC6A6??MAP3K1??LDHC????AKAP12??TES?KIF5A???LALBA???COL5A1??GPC1????HDLBP???ILVBLNCAN???TGM2????ETS1????HOXB9???SELENBP1????FOSL2???SULT2B1?TGFB3 msigdb.v7.0.symbols.gmt:HALLMARK_ANGIOGENESIS????http://www.gsea-msigdb.org/gsea/msigdb/cards/HALLMARK_ANGIOGENESIS??VCAN????POSTN???FSTL1???LRPAP1??STC1????LPL?VEGFA???PF4?THBD????FGFR1???TNFRSF21????CCND2COL5A2?ITGAV???SERPINA5????KCNJ8???APP?JAG1????COL3A1??SPP1????NRP1????OLR1????PDGFA???PTK2????SLCO2A1?PGLYRP1?VAV2????S100A4??MSX1????VTN?TIMP1???APOH????PRG2????JAG2????LUM?CXCL6 msigdb.v7.0.symbols.gmt:HALLMARK_PI3K_AKT_MTOR_SIGNALING????http://www.gsea-msigdb.org/gsea/msigdb/cards/HALLMARK_PI3K_AKT_MTOR_SIGNALING???MAPK8???PIK3R3??GRB2????NFKBIB??MAP2K6??MAPK9???AKT1????MAPK1???PLCG1???TRIB3GSK3B??MAP2K3??CDKN1A??RAC1????RIPK1???AKT1S1??ACTR2???PRKAR2A?YWHAB???HRAS????PDK1????PIKFYVE?TBK1????ACTR3E2F1???MYD88???ITPR2???SQSTM1??RPS6KA1?PTPN11??MAPKAP1?PLCB1???RAF1????CAMK4???RPTOR???CFL1????CDK4????TRAF2GNGT1??UBE2N???ADCY2???CDKN1B??VAV3????FGF6????ECSIT???RALB????ARF1????MKNK1???CDK1????PTEN????ARHGDIA?GRK2????FGF17???DDIT3???IRAK4???TIAM1???CDK2????SFN?PRKCB???GNA14???EIF4E???CLTC????TSC2????FGF22???PPP1CA??DUSP3HSP90B1????IL4?STAT2???SLA?EGFR????PLA2G12A????MAPK10??CALR????THEM4???RIT1????MKNK2???PPP2R1B?CAB39ARPC3??PITX2???NCK1????IL2RG???PFN1????FASLG???NOD1????DAPP1???UBE2D3??CAB39???AP2M1???MAP3K7??PRKAG1??CSNK2PRKAA2?ATF1????SLC2A1??PIN1????TNFRSF1A????LCK?RPS6KA3?NGF?CXCR4???ACACA???SMAD2???PAK4 msigdb.v7.0.symbols.gmt:BIOCARTA_ECM_PATHWAY????http://www.gsea-msigdb.org/gsea/msigdb/cards/BIOCARTA_ECM_PATHWAY???MAPK3???PIK3CG??HRAS????MYL2????PIK3R1??MAP2K1??RAF1????RHOA????ROCK1???FYN?MAPK1???SHC1????TLN1????PFN1????GSN?DIAPH1??PIK3CA??ARHGAP5?ITGB1

結(jié)果如👇

大標(biāo)題-3.GSVA

有了上面的文件就可以利用GSVA計(jì)算了

library(GSVA) library(stringr) gs=readLines("GSVA_input.gmt")#讀入基因集 gs1?<-?lapply(gs,?function(x)??{x=strsplit(x,"\t")[[1]]???#通過(guò)\t分割y=x[3:length(x)]??????????#不取前兩行,因?yàn)椴皇腔騬eturn(y)})???????????????#返回y值 X=expr es.max?<-?gsva(X,?gs1,mx.diff=FALSE,?verbose=FALSE,?parallel.sz=1) rownames(es.max)=unlist(lapply(gs,?function(x)? {x=strsplit(x,"\t")[[1]][1] x=strsplit(x,":")[[1]][2]})) es.dif?<-?gsva(X,?gs1,?mx.diff=TRUE,?verbose=FALSE,?parallel.sz=1) rownames(es.dif)=rownames(es.max)???????#行名命名 save(es.max,es.dif,gs1,file?=?"es_output.Rdata") save(es.max,es.dif,gs1,file?=?"es_output.Rdata")

👆是兩種算法的差異結(jié)果,具體我也沒(méi)有看太懂,根據(jù)文章設(shè)置參數(shù)就好。

大標(biāo)題-4.生存分析

兩部分:es.max/es.dif

es.max

library(survival) library(survminer) phe$event=ifelse(phe$OS_STATUS=='DECEASED',1,0)?#分為死亡與生存2種情況 phe$time=as.numeric(phe$OS_MONTHS)?#根據(jù)月份劃分 rownames(phe)=gsub("-",".",phe$PATIENT_ID)?#同上面一樣需要變一下分割符 ids=intersect(rownames(phe),colnames(es.max))?#取交集 phe=phe[ids,]?#取子集 es.max=es.max[,ids]?#取子集

?

篩選完成后只剩398個(gè)樣本

根據(jù)文獻(xiàn)大于0.1為“high,小于”-0.1“為low,方法就是用的GSVA,所以這里我們對(duì)其進(jìn)行分組

for?(i?in?1:5){ phe1=phe[abs(es.max[i,])?>?0.1,]? es.max1=es.max[,abs(es.max[i,])?>?0.1] phe1$group_list=ifelse(es.max1[i,]?>?0.1,'high','low') #?利用ggsurvplot快速繪制漂亮的生存曲線(xiàn)圖 sfit?<-?survfit(Surv(time,?event)~group_list,?data=phe1) p=ggsurvplot(sfit,?conf.int=F,?pval=TRUE,legend.title=rownames(es.max)[i]) print(p$plot) ggsave(filename?=?paste0("0.1",rownames(es.max)[i],".pdf"))} dev.off()

結(jié)果如👇,沒(méi)有一個(gè)顯著

?

但當(dāng)我把參數(shù)調(diào)為0.2的時(shí)候居然第一個(gè)基因集ECM顯著,但是并不是文章中顯著的2個(gè)基因集(AKT、HYPOXIA)

?

下面我用另一種方法試一下

es.dif

方法跟👆差不多

phe=clin1[clin1$OS_STATUS?%in%?c('DECEASED','LIVING'),] phe$event=ifelse(phe$OS_STATUS=='DECEASED',1,0) phe$time=as.numeric(phe$OS_MONTHS) colnames(phe) rownames(phe)=gsub("-",".",phe$PATIENT_ID) ids1=intersect(rownames(phe),colnames(es.dif)) phe=phe[ids1,] es.dif=es.dif[,ids1] for?(i?in?1:5){phe2=phe[abs(es.dif[i,])?>?0.1,]?es.dif1=es.dif[,abs(es.dif[i,])?>?0.1]phe2$group_list=ifelse(es.dif1[i,]?>?0.1,'high','low')#?利用ggsurvplot快速繪制漂亮的生存曲線(xiàn)圖sfit?<-?survfit(Surv(time,?event)~group_list,?data=phe2)p=ggsurvplot(sfit,?conf.int=F,?pval=TRUE,legend.title=rownames(es.max)[i])print(p$plot)ggsave(filename?=?paste0("0.1_i",rownames(es.dif)[i],".pdf"))}?

結(jié)果如👇

參數(shù)為0.1,只有第一個(gè)基因集ECM顯著

?

參數(shù)為0.2時(shí),也是只有第一個(gè)基因集ECM顯著


大標(biāo)題-插曲

前面在下載gmt數(shù)據(jù)那里做了兩次,第一次在上面已經(jīng)講解了,下面是第二次的探索過(guò)程,第二次會(huì)總結(jié)第一次的錯(cuò)誤。

那么為什么有如此大的差別呢?方法與文中一致,那么只可能是數(shù)據(jù)出現(xiàn)的問(wèn)題。檢查文章,果然發(fā)現(xiàn)篩選的基因集不一致。前面我們主要篩選的是HALLMARK的基因集,是由多個(gè)已知的基因集構(gòu)成的超基因集

?

而文章里用的是以前文章發(fā)表的基因集

?

那應(yīng)該怎么搜索基因集呢?我找到了規(guī)律,只需要搜索作者名字+基因集名稱(chēng)就可以了

grep?HARRIS_HYPOXIA?*

結(jié)果如下面

?

這就是我們所需要的基因集了,其他基因集方法類(lèi)似

grep?HARRIS_HYPOXIA?* #msigdb.v7.0.symbols.gmt:HARRIS_HYPOXIA????http://www.gsea-msigdb.org/gsea/msigdb/cards/HARRIS_HYPOXIA?PLAUR???PGK1????JUN?VEGFA???XRCC5???RP1?HIF1A???CA9?PGF?BNIP3???PKM?ALDOA???BNIP3L??BHLHE40?HGF?TFF3????ENO1????HK2?F3??CP??SPP1????FOS?HMOX1???DDIT3???HDAC9???AK3?EDN1????P4HA1???TH??EPAS1???IGFBP3??CCL2????STC1????LDHA????COL5A1??FLT1????CA12????PDGFB???TEK?IGF2????PTGS2???CD99????TXN?CDKN1A??PFKL????XRCC6???SAT1????TGFB1???FTL?TFRC????LRP8????CXCL8???NFKB1???TF??TAGLN???CCNG2???TGM2????EDN2????IL6?BIK?PFKP????ANGPT2??TGFA????EPO?SLC2A1??ADM?VIM?SLC2A3??IGFBP1??L1CAM???GAPDH???CDKN1B??MIF?TGFB3???IGFBP2??APEX1???FGF3????PRPS1???HK1?MMP13???ENPEP grep?CREIGHTON_AKT1_SIGNALING?* #msigdb.v7.0.symbols.gmt:CREIGHTON_AKT1_SIGNALING_VIA_MTOR_UP????http://www.gsea-msigdb.org/gsea/msigdb/cards/CREIGHTON_AKT1_SIGNALING_VIA_MTOR_UP???NEDD8???BRMS1???CDK16???SPINT1??DDR1????LASP1???CLSTN1??MVK?PRKCD???UBE2M???AC004156.1??NEU1????CORO1B??CYB561??KCTD5???SYNJ2BP?CDC34???TJP3????HNRNPAB?SLC37A1?NECTIN2?ADIPOR1?BSG?BIK?AKT1????DUSP10??MMP15???POR?GPX4????ARHGEF16????CLDN3???TMED10??TOLLIP??PMPCA #msigdb.v7.0.symbols.gmt:CREIGHTON_AKT1_SIGNALING_VIA_MTOR_DN????http://www.gsea-msigdb.org/gsea/msigdb/cards/CREIGHTON_AKT1_SIGNALING_VIA_MTOR_DN???TNFRSF12A???RGL2????PFKL????PPP4C???CIB1????ATP6V0B?PPP2R1A?KRT8????DHCR7???MRPS7???TUBB4B??YWHAB???ALDOA???PAFAH1B3????GOT1????TOM1????ATP6AP1?CTSA????ATP6V0C?MIF?GPI?ATP6V1F?TSPAN1 grep?ONDER_CDH1?* #msigdb.v7.0.symbols.gmt:ONDER_CDH1_TARGETS_3_UP????http://www.gsea-msigdb.org/gsea/msigdb/cards/ONDER_CDH1_TARGETS_3_UP????TSC22D3?KDELR3??ZBTB16??MAGED1??STC2????CYP1B1??MAGED2??LEPR????ALDH6A1?TCEA2???WIPI1???KCNMA1??RNASE4??S100A8??CRIP2???FTL?PRKCA???KLF9 #msigdb.v7.0.symbols.gmt:ONDER_CDH1_TARGETS_3_DN????http://www.gsea-msigdb.org/gsea/msigdb/cards/ONDER_CDH1_TARGETS_3_DN????CXCL1???EPN3????PTGS2???DSC2????NAV3????EDN1????SOX9????KRT13???CWH43???LCN2????TGFA????SCEL????HS3ST2??C1orf116????FST?HBEGF???CEACAM6?SPRR2D??IL1R2???MFAP5???TMPRSS11E???CRCT1???IVL?KLK10???IL36G???EHF?ADGRE2??SERPINB2????P2RY2???SERPINB13???ATP12A??IL1B????CXCL8???KRT8????MYO5C???KLK11???KLK7????SPAG1???ARL4C???MARC1???CYB5R2??CXCL2???ST6GALNAC5??ANO1????KRT15???PI3?GPRC5A??SERPINB1????TP63????PTHLH???CSF3????DEPP1???DUSP6???ARHGAP25????CXCL3???CSF2????VGLL1???MMP10???S100A7

其他的3個(gè)基因集文章沒(méi)有并給文獻(xiàn),只能默認(rèn)用HALLMARK的基因集

下面是第二次的workflow

其實(shí)跟第一次類(lèi)似,不同的地方做注釋,相似的地方不在贅述。這里以hypoxia基因集作為🌰,其他的基因集類(lèi)似

rm(list?=?ls()) options(stringsAsFactors?=?F) load("data/metabric_clinical.Rdata") load("data/metabric_expression.Rdata") clin1=clin[clin$CLAUDIN_SUBTYPE%in%c("Basal","claudin-low"),] class(expr) class(expr1) library(GSVA) library(stringr) gs=readLines("harris_hypoxia.gmt")?#記得這里要替換成不同基因集的gmt gs1?<-?lapply(gs,?function(x)? {x=strsplit(x,"\t")[[1]] y=x[3:length(x)] return(y)}) X=expr hy.max?<-?gsva(X,?gs1,mx.diff=FALSE,?verbose=FALSE,?parallel.sz=1)#max法 rownames(hy.max)=unlist(lapply(gs,?function(x)? {x=strsplit(x,"\t")[[1]][1] x=strsplit(x,":")[[1]][2]})) hy.dif?<-?gsva(X,?gs1,?mx.diff=TRUE,?verbose=FALSE,?parallel.sz=1)#diff法 rownames(hy.dif)=rownames(hy.max) save(hy.max,hy.dif,gs1,file?=?"hy_hypoxia.Rdata")#存儲(chǔ)一下 load("hy_hypoxia.Rdata") library(survival) library(survminer) library(stringr) dim(clin) clin[1:4,1:4] #?終點(diǎn)事件(outcome?event)又稱(chēng)失效事件(failure?event)?或死亡事件(death?event)?? #?這種分組資料的生存分析常采用壽命表法(life-table?method) #?生存分析也經(jīng)常采用Kaplan-Meier曲線(xiàn)及l(fā)og-rank檢驗(yàn) table(clin$VITAL_STATUS) table(clin$OS_STATUS) #?理論上這個(gè)時(shí)候需要區(qū)分?Died?of?Disease和Other?Caushy phe=clin1[clin1$OS_STATUS?%in%?c('DECEASED','LIVING'),] phe$event=ifelse(phe$OS_STATUS=='DECEASED',1,0) phe$time=as.numeric(phe$OS_MONTHS) colnames(phe) rownames(phe)=gsub("-",".",phe$PATIENT_ID) ids=intersect(rownames(phe),colnames(hy.max)) phe=phe[ids,] hy.max=as.matrix(hy.max)#因?yàn)橹挥幸涣?#xff0c;需要轉(zhuǎn)換成矩陣 hy.max=hy.max[,ids] hy.max=as.matrix(hy.max)#因?yàn)橹挥幸涣?#xff0c;需要轉(zhuǎn)換成矩陣 phe[1:4,1:4] phe1=phe[abs(hy.max[1,])?>?0.1,]? hy.max1=hy.max[,abs(hy.max[1,])?>?0.1] hy.max1=t(as.matrix(hy.max1))?#因?yàn)橹挥幸涣?#xff0c;需要轉(zhuǎn)換成矩陣并轉(zhuǎn)置,不然會(huì)報(bào)錯(cuò) phe1$group_list=ifelse(hy.max1[1,]?>?0.1,'high','low') #?利用ggsurvplot快速繪制漂亮的生存曲線(xiàn)圖 sfit1?<-?survfit(Surv(time,?event)~group_list,?data=phe1) p1=ggsurvplot(sfit1,?conf.int=F,?pval=TRUE,legend.title="hypoxia") p1 dev.off()###diff phe=clin1[clin1$OS_STATUS?%in%?c('DECEASED','LIVING'),] phe$event=ifelse(phe$OS_STATUS=='DECEASED',1,0) phe$time=as.numeric(phe$OS_MONTHS) colnames(phe) rownames(phe)=gsub("-",".",phe$PATIENT_ID) ids1=intersect(rownames(phe),colnames(hy.dif)) phe=phe[ids1,] hy.dif=hy.dif[,ids1] hy.dif=as.matrix(hy.dif) phe2=phe[abs(hy.dif[1,])?>?0.1,]? hy.dif1=hy.dif[,abs(hy.dif[1,])?>?0.1] hy.dif1=t(as.matrix(hy.dif1)) phe2$group_list=ifelse(hy.dif1[1,]?>?0,'high','low') #?利用ggsurvplot快速繪制漂亮的生存曲線(xiàn)圖 sfit2?<-?survfit(Surv(time,?event)~group_list,?data=phe2) p2=ggsurvplot(sfit2,?conf.int=F,?pval=TRUE,legend.title="hypoxia") p2 dev.off()

最終結(jié)果如👇

0.1的閾值,MAX法結(jié)果只有文章中顯著的1個(gè)基因集(AKT)顯著,另一個(gè)HYPOXIA并不顯著

?

0.1的閾值,diff法結(jié)果文章中顯著的1個(gè)基因集(AKT)顯著,另一個(gè)HYPOXIA并不顯著,另外ECM基因集也顯著

總結(jié)

以上是生活随笔為你收集整理的GSVA和生存分析的全部?jī)?nèi)容,希望文章能夠幫你解決所遇到的問(wèn)題。

如果覺(jué)得生活随笔網(wǎng)站內(nèi)容還不錯(cuò),歡迎將生活随笔推薦給好友。