
Querying UCSC database data with cruzdb

Published: 2024/4/15 · Database · 豆豆


https://github.com/Wy2160640/cruzdb

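The UCSC Genome Database is a valuable resource for annotation, regulation, variation, and many other kinds of data, covering a growing number of taxa. This library aims to make that data simple to use, so that we can run sophisticated analyses without resorting to clumsy, error-prone manipulations. As motivation, here is an example of some of its capabilities: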

>>> from cruzdb import Genome
>>> g = Genome(db="hg18")
>>> muc5b = g.refGene.filter_by(name2="MUC5B").first()
>>> muc5b
refGene(chr11:MUC5B:1200870-1239982)
>>> muc5b.strand
'+'

# the first 4 introns
>>> muc5b.introns[:4]
[(1200999L, 1203486L), (1203543L, 1204010L), (1204082L, 1204420L), (1204682L, 1204836L)]

# the first 4 exons.
>>> muc5b.exons[:4]
[(1200870L, 1200999L), (1203486L, 1203543L), (1204010L, 1204082L), (1204420L, 1204682L)]

# note that some of these are not coding because they are < cdsStart
>>> muc5b.cdsStart
1200929L

# the extent of the 5' utr.
>>> muc5b.utr5
(1200870L, 1200929L)

# we can get the (first 4) actual CDS's with:
>>> muc5b.cds[:4]
[(1200929L, 1200999L), (1203486L, 1203543L), (1204010L, 1204082L), (1204420L, 1204682L)]

# the cds sequence from the UCSC DAS server as a list with one entry per cds
>>> muc5b.cds_sequence #doctest: +ELLIPSIS
['atgggtgccccgagcgcgtgccggacgctggtgttggctctggcggccatgctcgtggtgccgcaggcag', ...]

>>> transcript = g.knownGene.filter_by(name="uc001aaa.2").first()
>>> transcript.is_coding
False

# convert a genome coordinate to a local coordinate.
>>> transcript.localize(transcript.txStart)
0L

# or localize to the CDNA position.
>>> print transcript.localize(transcript.cdsStart, cdna=True)
None
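The interval attributes above are derived from a handful of columns in the UCSC gene tables. The sketch below is hypothetical and is not cruzdb's code. It assumes UCSC's 0-based, half-open coordinates and the comma-separated exonStarts/exonEnds encoding, computes introns as the gaps between consecutive exons, and computes CDS pieces by clipping each exon to [cdsStart, cdsEnd). Given the first four MUC5B exons and the cdsStart value from the session, it reproduces the first three introns and the four CDS intervals printed there. Because the session does not show cdsEnd, cds_end is a made-up placeholder.

```python
def parse_ucsc_list(blob):
    """Parse a UCSC exonStarts/exonEnds value such as '10,20,' into ints."""
    return [int(x) for x in blob.split(",") if x]


def exons(exon_starts, exon_ends):
    """Pair the parsed start and end lists into (start, end) tuples."""
    return list(zip(parse_ucsc_list(exon_starts), parse_ucsc_list(exon_ends)))


def introns(exon_list):
    """Introns are the gaps between consecutive exons."""
    return [(prev_end, next_start)
            for (_, prev_end), (next_start, _) in zip(exon_list, exon_list[1:])]


def cds(exon_list, cds_start, cds_end):
    """Clip each exon to [cds_start, cds_end) and drop exons with no coding part."""
    pieces = []
    for start, end in exon_list:
        s, e = max(start, cds_start), min(end, cds_end)
        if s < e:
            pieces.append((s, e))
    return pieces


# First four MUC5B exons and cdsStart, as shown in the session above.
ex = exons("1200870,1203486,1204010,1204420,", "1200999,1203543,1204082,1204682,")
cds_start = 1200929
cds_end = 1239000  # hypothetical placeholder, NOT the real MUC5B cdsEnd

print(introns(ex))
print(cds(ex, cds_start, cds_end))
```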

Command-line usage

python -m cruzdb hg18 input.bed refGene cpgIslandExt

This annotates the intervals in input.bed with the refGene and cpgIslandExt tables from the hg18 assembly.
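To make the idea of annotating intervals against tables concrete, here is a deliberately simplified, hypothetical sketch that does not use cruzdb. It reads BED lines (chrom, start, end, 0-based half-open) and, for each table, lists the names of overlapping features from small in-memory lists. cruzdb queries the real UCSC tables, and its actual output columns may differ. The feature names and coordinates below are made up.

```python
import io


def read_bed(handle):
    """Yield (chrom, start, end) from BED lines, skipping blank and header lines."""
    for line in handle:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        yield chrom, int(start), int(end)


def overlaps(a_start, a_end, b_start, b_end):
    """Half-open intervals overlap when each one starts before the other ends."""
    return a_start < b_end and b_start < a_end


def annotate(bed_handle, tables):
    """For each BED interval, add one column per table listing overlapping names."""
    rows = []
    for chrom, start, end in read_bed(bed_handle):
        row = [chrom, str(start), str(end)]
        for features in tables.values():
            hits = [name for (c, s, e, name) in features
                    if c == chrom and overlaps(start, end, s, e)]
            row.append(",".join(hits) or ".")  # "." marks "no overlap"
        rows.append("\t".join(row))
    return rows


# Made-up features standing in for the refGene and cpgIslandExt tables.
tables = {
    "refGene": [("chr1", 100, 500, "GENE_A"), ("chr1", 800, 900, "GENE_B")],
    "cpgIslandExt": [("chr1", 450, 600, "CpG_1")],
}
bed = io.StringIO("chr1\t400\t700\nchr1\t1000\t1100\n")
for row in annotate(bed, tables):
    print(row)
```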

DataFrames

...are all the rage these days. We can get one from a table like this:

>>> df = g.dataframe('cpgIslandExt')
>>> df.columns #doctest: +ELLIPSIS
Index([chrom, chromStart, chromEnd, name, length, cpgNum, gcNum, perCpg, perGc, obsExp], dtype=object)
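The column names reported by df.columns come from the table's schema. As a small standard-library illustration, and not cruzdb code, the hypothetical helper below lists the columns of a local SQLite table that has the same columns as the cpgIslandExt output above. The table is created empty in memory purely for demonstration.

```python
import sqlite3


def table_columns(conn, table):
    """Return a table's column names in schema order (what df.columns lists)."""
    # The table name is interpolated into the SQL, so it must come from a trusted source.
    # LIMIT 0 fetches no rows, but sqlite3 still fills in cursor.description.
    cur = conn.execute(f'SELECT * FROM "{table}" LIMIT 0')
    return [desc[0] for desc in cur.description]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cpgIslandExt (chrom TEXT, chromStart INTEGER, chromEnd INTEGER, "
    "name TEXT, length INTEGER, cpgNum INTEGER, gcNum INTEGER, "
    "perCpg REAL, perGc REAL, obsExp REAL)"
)
print(table_columns(conn, "cpgIslandExt"))
```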

All of the above can be repeated with knownGene annotations by changing 'refGene' to 'knownGene'. It can also easily be done for a set of genes.

Spatial

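K-nearest-neighbor, upstream, and downstream searches are available. Upstream and downstream searches use the strand of the query feature to determine direction: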

>>> nearest = g.knearest("refGene", "chr1", 9444, 9555, k=6)
>>> up_list = g.upstream("refGene", "chr1", 9444, 9555, k=6)
>>> down_list = g.downstream("refGene", "chr1", 9444, 9555, k=6)
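To make these semantics concrete, here is a hypothetical pure-Python sketch of the three searches over an in-memory list of (chrom, start, end, strand, name) features. It assumes that distance is the gap between half-open intervals (0 when they overlap) and that "upstream" means lower coordinates for a + strand query and higher coordinates for a - strand query. cruzdb runs its searches as SQL against the UCSC tables and may count overlaps or break ties differently. All features here are made up.

```python
def distance(q_start, q_end, f_start, f_end):
    """Gap between two half-open intervals; 0 if they overlap."""
    if f_end <= q_start:
        return q_start - f_end
    if f_start >= q_end:
        return f_start - q_end
    return 0


def knearest(features, chrom, start, end, k=1):
    """The k features on chrom closest to [start, end); ties keep input order."""
    same = [f for f in features if f[0] == chrom]
    return sorted(same, key=lambda f: distance(start, end, f[1], f[2]))[:k]


def upstream(features, chrom, start, end, strand="+", k=1):
    """k nearest features lying entirely 5' of the query, given its strand."""
    if strand == "+":
        side = [f for f in features if f[0] == chrom and f[2] <= start]
    else:
        side = [f for f in features if f[0] == chrom and f[1] >= end]
    return knearest(side, chrom, start, end, k)


def downstream(features, chrom, start, end, strand="+", k=1):
    """Downstream on one strand is upstream on the opposite strand."""
    flipped = "-" if strand == "+" else "+"
    return upstream(features, chrom, start, end, flipped, k)


feats = [
    ("chr1", 1000, 2000, "+", "a"),
    ("chr1", 9000, 9400, "+", "b"),
    ("chr1", 9500, 9600, "-", "c"),
    ("chr1", 12000, 13000, "+", "d"),
]
print([f[4] for f in knearest(feats, "chr1", 9444, 9555, k=2)])
print([f[4] for f in upstream(feats, "chr1", 9444, 9555, k=6)])
print([f[4] for f in downstream(feats, "chr1", 9444, 9555, k=6)])
```

Defining downstream as the mirror image of upstream keeps the strand logic in one place.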

Mirroring

All of the above uses UCSC's MySQL interface. It is now possible to mirror any table from UCSC into a local SQLite database via:

>>> import os
>>> if os.path.exists("/tmp/u.db"): os.unlink('/tmp/u.db')

>>> g = Genome('hg18')
>>> gs = g.mirror(['chromInfo'], 'sqlite:////tmp/u.db')

and then use it as:

>>> gs.chromInfo
<class 'cruzdb.sqlsoup.chromInfo'>
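Conceptually, mirroring copies a table's schema and rows into a local database. The sketch below is a rough standard-library stand-in, not cruzdb's implementation. mirror_table and sqlite_url are hypothetical names, and an in-memory SQLite database plays the role of the remote UCSC MySQL server. The URL helper illustrates SQLAlchemy's SQLite URL convention: "sqlite:///" followed by the file path, so an absolute path such as /tmp/u.db ends up with four slashes. The chromInfo row is made-up data.

```python
import sqlite3


def sqlite_url(path):
    """SQLAlchemy-style SQLite URL: 'sqlite:///' followed by the file path."""
    return "sqlite:///" + path


def mirror_table(src, dst, table):
    """Copy one table's schema and rows from src to dst (both sqlite3 connections)."""
    (create_sql,) = src.execute(
        "SELECT sql FROM sqlite_master WHERE type='table' AND name=?", (table,)
    ).fetchone()
    dst.execute(create_sql)  # recreate the same schema locally
    rows = src.execute(f'SELECT * FROM "{table}"').fetchall()
    if rows:
        marks = ",".join("?" * len(rows[0]))  # one placeholder per column
        dst.executemany(f'INSERT INTO "{table}" VALUES ({marks})', rows)
    dst.commit()


src = sqlite3.connect(":memory:")  # stand-in for the remote server
src.execute("CREATE TABLE chromInfo (chrom TEXT, size INTEGER, fileName TEXT)")
src.execute("INSERT INTO chromInfo VALUES ('chrTest', 1000, '')")  # made-up row
dst = sqlite3.connect(":memory:")
mirror_table(src, dst, "chromInfo")
print(dst.execute("SELECT chrom, size FROM chromInfo").fetchall())
print(sqlite_url("/tmp/u.db"))
```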

Code

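Most of the per-row functionality is implemented in the Feature class in cruzdb/models.py. If you want to add something to a feature (like the existing feature.utr5), add it there.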
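As an example of such an addition, a strand-aware utr3 property could sit next to utr5. The class below is a hypothetical, self-contained stand-in for cruzdb's Feature. It only assumes the UCSC gene-table columns txStart, txEnd, cdsStart, cdsEnd and strand, and it assumes that non-coding transcripts have cdsStart == cdsEnd, as in the UCSC gene tables. In cruzdb itself, the property would go on Feature in cruzdb/models.py. The coordinates in the usage lines are made up.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FeatureSketch:
    """Hypothetical stand-in for cruzdb's Feature, holding refGene-style columns."""
    txStart: int
    txEnd: int
    cdsStart: int
    cdsEnd: int
    strand: str

    @property
    def is_coding(self) -> bool:
        return self.cdsStart != self.cdsEnd

    @property
    def utr5(self) -> Optional[Tuple[int, int]]:
        """5' UTR: before cdsStart on +, after cdsEnd on -."""
        if not self.is_coding:
            return None
        if self.strand == "+":
            return (self.txStart, self.cdsStart)
        return (self.cdsEnd, self.txEnd)

    @property
    def utr3(self) -> Optional[Tuple[int, int]]:
        """The new property: the mirror image of utr5."""
        if not self.is_coding:
            return None
        if self.strand == "+":
            return (self.cdsEnd, self.txEnd)
        return (self.txStart, self.cdsStart)


plus = FeatureSketch(100, 1000, 150, 900, "+")
minus = FeatureSketch(100, 1000, 150, 900, "-")
print(plus.utr5, plus.utr3, minus.utr5, minus.utr3)
```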
The tables are reflected with sqlalchemy and mapped in the __getattr__ method of the Genome class in cruzdb/__init__.py, so a call like:

genome.knownGene

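calls the __getattr__ method with the table argument set to 'knownGene'. That table is then reflected, and an object whose parent classes are Feature and sqlalchemy's declarative_base is returned.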
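The lazy-lookup pattern can be illustrated without a database. In this hypothetical sketch, GenomeSketch builds a class the first time an unknown attribute is accessed. It calls type() with a FeatureMixin base and takes the column list from a plain dictionary. cruzdb instead reflects the real schema with sqlalchemy and mixes in declarative_base, so this only illustrates the __getattr__ mechanism. The table contents are made up.

```python
class FeatureMixin:
    """Hypothetical stand-in for cruzdb's Feature: shared per-row helpers."""

    def __repr__(self):
        return f"{type(self).__name__}({self.chrom}:{self.txStart}-{self.txEnd})"


class GenomeSketch:
    """Hypothetical: maps attribute names to table classes on first access."""

    def __init__(self, schemas):
        self._schemas = schemas  # table name -> list of column names
        self._cache = {}

    def __getattr__(self, table):
        # Only called when normal attribute lookup fails, e.g. for g.knownGene.
        if table.startswith("_") or table not in self._schemas:
            raise AttributeError(table)
        if table not in self._cache:
            columns = self._schemas[table]

            def _row_init(obj, *values):
                for col, val in zip(columns, values):
                    setattr(obj, col, val)

            # Build the class once, with FeatureMixin as its parent, and cache it.
            self._cache[table] = type(
                table, (FeatureMixin,), {"__init__": _row_init, "columns": columns}
            )
        return self._cache[table]


g = GenomeSketch({"knownGene": ["name", "chrom", "strand", "txStart", "txEnd"]})
row = g.knownGene("tx1", "chr1", "+", 100, 200)
print(row)
```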

Contributing

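To start coding, it is polite to grab a local copy of some UCSC tables so as not to overload the UCSC servers. You can run something like:

Genome('hg18').mirror(["refGene", "cpgIslandExt", "chromInfo", "knownGene", "kgXref"], "sqlite:////tmp/hg18.db")

Then the connection would be:

g = Genome("sqlite:////tmp/hg18.db")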

Reposted from: https://www.cnblogs.com/yahengwang/p/10195614.html
