Gene ID mapping using R
Inter-converting gene IDs is one of the most important steps in genomic and proteomic data analysis. Multiple tools are available, each with its own drawbacks. While performing enrichment analysis on mass spectrometry datasets, I have always struggled to prepare the input files required by each R package. It takes some data tweaking and cleanup before the R tools or packages will accept them as input. The struggle is greater with UniProt IDs, because very few applications accept them as input. Although UniProt provides the Retrieve/ID mapping function, it does not preserve the number of rows: any protein or gene ID that cannot be mapped is simply omitted from the output file. This makes it difficult to combine the mapped IDs with the original dataset.
There are numerous tools available for this kind of ID mapping. Here I lay out a few R packages that I have used and that worked smoothly for me.
1. AnnotationDbi package
Use the org.Hs.eg.db package for human and the org.Mm.eg.db package for mouse. mapIds can take any supported ID type as input, such as UniProt ID, HGNC symbol, Ensembl ID or Entrez ID, and convert between them.
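    library(org.Mm.eg.db)  # mouse annotation; use org.Hs.eg.db for human

    # df is assumed to be a data frame with gene symbols as row names
    ensembl <- mapIds(org.Mm.eg.db, keys = rownames(df), column = "ENSEMBL", keytype = "SYMBOL", multiVals = "first")
    entrez <- mapIds(org.Mm.eg.db, keys = rownames(df), column = "ENTREZID", keytype = "SYMBOL", multiVals = "first")
    # stored under its own name so it does not overwrite entrez
    uniprot <- mapIds(org.Mm.eg.db, keys = rownames(df), column = "UNIPROT", keytype = "SYMBOL", multiVals = "first")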
mapIds() returns a named vector of IDs.
The output can be merged into the original dataset using `cbind` for further downstream analysis. The one advantage that I have noticed with mapIds is that it matches the IDs row by row and inserts NA when it cannot find a match for a given input, for example a gene name or symbol for certain UniProt IDs. The result therefore stays aligned with the input, which is a lifesaver when working with huge datasets.
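The row-by-row alignment described above can be illustrated without any annotation database. This is a minimal sketch in base R: `mapped` is a hand-made stand-in for the kind of named vector mapIds returns, and the gene names and IDs are placeholders, not real lookups.

```r
# Toy expression table keyed by gene symbol (names and values are made up).
df <- data.frame(
  logFC = c(1.2, -0.8, 0.3),
  row.names = c("GeneA", "GeneB", "GeneC")
)

# Hypothetical stand-in for a mapIds() result: a named character vector,
# one element per input key, with NA where no match was found.
mapped <- c(GeneA = "ID0001", GeneB = NA, GeneC = "ID0003")

# Guard against misalignment before binding the columns together.
stopifnot(identical(names(mapped), rownames(df)))

# Attach the mapped IDs as a new column; unmapped rows keep an NA.
annotated <- cbind(df, entrez = unname(mapped))
print(annotated)
```

Checking that the names of the mapped vector equal the row names before calling `cbind` is a cheap safeguard, since `cbind` binds by position and not by name.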
2. biomaRt package
    library(biomaRt)

    # data is assumed to be a character vector of MGI symbols
    # the dataset is selected here, so no separate useDataset() call is needed
    mart <- useMart(biomart = "ensembl", dataset = "mmusculus_gene_ensembl")
    mapping <- getBM(attributes = c("mgi_symbol", "ensembl_gene_id", "entrezgene_id"), filters = "mgi_symbol", values = data, mart = mart, uniqueRows = TRUE, bmHeader = TRUE)
Use hgnc_symbol for human and mgi_symbol for mouse.
Generally, biomaRt requires extra work after the initial mapping. Note that biomaRt does not even return the genes in the order in which they were submitted.
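One way to do that extra work is to realign the result to the submitted order with `match()`. The following is only a sketch in base R: `mapping` is a hand-made stand-in for a getBM result, its column names are assumed to be the attribute names (which may differ when descriptive headers are requested), and all names and IDs are placeholders.

```r
# Symbols in the order they were submitted (placeholder names).
submitted <- c("GeneA", "GeneB", "GeneC", "GeneD")

# Hypothetical stand-in for a getBM() result: rows come back in a
# different order and "GeneB" is missing because it could not be mapped.
mapping <- data.frame(
  mgi_symbol      = c("GeneD", "GeneA", "GeneC"),
  ensembl_gene_id = c("ENS_D", "ENS_A", "ENS_C"),
  stringsAsFactors = FALSE
)

# match() gives, for each submitted symbol, the row of its first hit in
# the mapping table (NA if absent), so indexing restores the input order.
idx <- match(submitted, mapping$mgi_symbol)
aligned <- data.frame(
  mgi_symbol      = submitted,
  ensembl_gene_id = mapping$ensembl_gene_id[idx],
  stringsAsFactors = FALSE
)
print(aligned)
```

Because `match()` only returns the first hit, a symbol that maps to several IDs keeps just one of them here; whether that is acceptable depends on the analysis.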
3. bitr from the clusterProfiler package
The clusterProfiler package was developed by Guangchuang Yu for statistical analysis and visualization of functional profiles for genes and gene clusters. Use the org.Hs.eg.db package for human and the org.Mm.eg.db package for mouse. The available key types can be listed by typing keytypes(org.Mm.eg.db).
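    library(clusterProfiler)
    library(org.Mm.eg.db)

    # Signature: bitr(geneID, fromType, toType, OrgDb, drop = TRUE)
    # data is assumed to be a character vector of gene symbols
    ids <- bitr(data, fromType = "SYMBOL", toType = c("UNIPROT", "ENSEMBL", "ENTREZID"), OrgDb = "org.Mm.eg.db")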
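With drop = TRUE, unmapped IDs are left out of the returned table, and a single symbol may come back on several rows when it maps to more than one target ID. The sketch below shows one possible way to bring such a table back to one row per input gene in base R. `ids` is a hand-made stand-in for a bitr result, keeping only the first mapping is an assumption of this example and not a recommendation from the package, and all names and IDs are placeholders.

```r
# Input table with one row per gene symbol (placeholder names and values).
data_tbl <- data.frame(
  SYMBOL = c("GeneA", "GeneB", "GeneC"),
  score  = c(0.9, 0.1, 0.5),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for a bitr() result: "GeneB" is absent (dropped as
# unmapped) and "GeneA" maps to two identifiers.
ids <- data.frame(
  SYMBOL  = c("GeneA", "GeneA", "GeneC"),
  UNIPROT = c("P_A1", "P_A2", "P_C1"),
  stringsAsFactors = FALSE
)

# Keep the first mapping per symbol, then left-join so every input row survives.
ids_first <- ids[!duplicated(ids$SYMBOL), ]
merged <- merge(data_tbl, ids_first, by = "SYMBOL", all.x = TRUE, sort = FALSE)

# merge() does not guarantee row order, so realign to the input explicitly.
merged <- merged[match(data_tbl$SYMBOL, merged$SYMBOL), ]
rownames(merged) <- NULL
print(merged)
```

The left join (`all.x = TRUE`) is what puts the NA rows back, so the merged table has the same number of rows as the input.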
Apart from the R functions listed above, non-programmers can use various other tools for gene ID conversion, such as DAVID and the UCSC gene ID converter.
Translated from: https://medium.com/computational-biology/gene-id-mapping-using-r-14ff50eec9ba