Is it possible to annotate ALL my IDs from Ensembl to symbol using R (biomaRt)?

Asked by Hicham Hboub on 07 April 2022 at 18:49

I have human datasets keyed by Ensembl gene IDs, and I want to annotate the IDs with gene symbols. One of these datasets has exactly 20176 genes. I tried two methods, but both returned NA for some genes.

First method:

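library(biomaRt)
library(org.Hs.eg.db)  # human annotation package; also attaches AnnotationDbi (mapIds, keytypes)

# List the ID types that can be used as keys
keytypes(org.Hs.eg.db)

# The row names of Data are the Ensembl gene IDs
Data <- read.csv("Data.csv", header = TRUE, row.names = 1)
Data$SYMBOL <- mapIds(org.Hs.eg.db, keys = row.names(Data), keytype = "ENSEMBL", column = "SYMBOL")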

but this left exactly 3845 NAs:

# Count the IDs that did not map to a symbol
sum(is.na(Data$SYMBOL))

Second method:

library("EnsDb.Hsapiens.v86")

keytypes(EnsDb.Hsapiens.v86)
# Store the result under its own name so it does not mask the mapIds() function
symbols <- mapIds(EnsDb.Hsapiens.v86, keys = row.names(Data), keytype = "GENEID", column = "SYMBOL")

but this still left 761 NAs.

I'm wondering whether there is a newer version of EnsDb.Hsapiens, or another package, that would return a gene symbol for every ID without any NAs.

My gene IDs: https://docs.google.com/document/d/1VVtveHXbOXt8m02ttcAmjHxF59YTFFgOEvyBhyqw13w/edit?usp=sharing

**Chris** · Answer 1 · 2022-04-08T16:46:28+00:00

After downloading your shared data, I took these steps:

ensem <- read.csv('~/Downloads/Ensembl.txt', header = TRUE, sep = '\n')
# Shortcut: I found the converter at
# https://www.biotools.fr/human/ensembl_symbol_converter,
# pasted the IDs in without the header, and downloaded the result
ensem_symbol <- read.csv('ens_symbol_nohead.txt', header = FALSE, sep = '\n')
# Each line comes back as Ensembl\Symbol, so split on the backslash
# (regex from Wiktor: https://stackoverflow.com/questions/33210280/r-strsplit-on-backslash)
ensem_symb_split <- strsplit(x = ensem_symbol$V1, split = '\\\\|[^[:print:]]', perl = FALSE)
# Bind the ID/symbol pairs into a two-column data frame (V1 = Ensembl ID, V2 = symbol)
en_sy_tst_rbind <- do.call(rbind, ensem_symb_split)
en_sy_df <- as.data.frame(en_sy_tst_rbind)
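
To make the splitting step easier to follow, here is a minimal sketch of the same regex applied to two toy lines. The vector `toy_lines` and the data frame `toy_df` are hypothetical names used only for illustration; the two ID/symbol pairs are taken from the output further down.

```r
# Toy input in the Ensembl\Symbol format returned by the converter.
# In an R string literal, "\\" is a single backslash.
toy_lines <- c("ENSG00000039537\\C6", "ENSG00000042832\\TG")

# Split on a literal backslash or on any non-printable character
toy_split <- strsplit(toy_lines, split = "\\\\|[^[:print:]]")

# Stack the pairs into a two-column data frame (V1 = Ensembl ID, V2 = symbol)
toy_df <- as.data.frame(do.call(rbind, toy_split), stringsAsFactors = FALSE)
print(toy_df)
```

Note that `do.call(rbind, ...)` recycles shorter elements, so this approach assumes every line splits into exactly two parts.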

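The site above does not say explicitly what it returns when no match is found. One would expect NA, so I listed every symbol that is exactly two characters long. All of them are real gene symbols, and none is the string "NA":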

not_defined_sym <- nchar(en_sy_df[, 2])
en_sy_df[which(not_defined_sym == 2), ]
                   V1 V2
498   ENSG00000039537 C6
523   ENSG00000042832 TG
551   ENSG00000047457 CP
554   ENSG00000047597 XK
749   ENSG00000062485 CS
1717  ENSG00000091483 FH
1719  ENSG00000091513 TF
2886  ENSG00000106804 C5
3533  ENSG00000112936 C7
3552  ENSG00000113141 IK
3604  ENSG00000113600 C9
4062  ENSG00000117525 F3
4835  ENSG00000125730 C3
9759  ENSG00000166278 C2
10169 ENSG00000168453 HR
10275 ENSG00000169083 AR
11001 ENSG00000173599 PC
11122 ENSG00000174611 KY
12380 ENSG00000185010 F8
13462 ENSG00000198125 MB
13605 ENSG00000198734 F5
13635 ENSG00000198814 GK
18737 ENSG00000257017 HP

# Final check that every ID is annotated: look for empty symbols
en_sy_df[which(not_defined_sym == 0), ]
[1] V1 V2
<0 rows> (or 0-length row.names)
# every ID has a symbol
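
Whichever source the mapping comes from, the "is anything missing?" check can be done in one pass that catches both `NA` and empty strings. The following is a minimal sketch on made-up toy data: the data frame `toy_map`, its column names, and the two placeholder IDs are hypothetical. It also shows one possible fallback, keeping the Ensembl ID as the label where no symbol exists.

```r
# Toy mapping table; the last two rows are placeholders with no symbol
toy_map <- data.frame(
  ensembl = c("ENSG00000039537", "ENSG00000042832", "PLACEHOLDER_ID_1", "PLACEHOLDER_ID_2"),
  symbol  = c("C6", "TG", NA, ""),
  stringsAsFactors = FALSE
)

# A symbol counts as missing if it is NA or an empty string
is_missing <- is.na(toy_map$symbol) | !nzchar(toy_map$symbol)
n_missing  <- sum(is_missing)

# Fallback: use the Ensembl ID itself as the label when no symbol is available
toy_map$label <- ifelse(is_missing, toy_map$ensembl, toy_map$symbol)

print(n_missing)
print(toy_map)
```

The fallback keeps every row usable downstream (for example as plot labels) without dropping genes that lack a symbol.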

So the recommendation appears to be to update to the annotation version that the site above is running, since it returned a symbol for every ID.
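
Since the question asks about biomaRt, here is a hedged sketch of how the same lookup might be run from R against the current Ensembl release. It assumes the Bioconductor package biomaRt is installed and that the Ensembl servers are reachable. The function name `ensembl_to_symbol` is hypothetical, and I have not run it on the shared ID list, so it may still leave some IDs without a symbol.

```r
# Sketch only: needs the Bioconductor package biomaRt and network access.
# ensembl_to_symbol is a hypothetical helper name, not part of biomaRt.
ensembl_to_symbol <- function(ids) {
  # Connect to the current Ensembl release, human gene dataset
  mart <- biomaRt::useEnsembl(biomart = "ensembl",
                              dataset = "hsapiens_gene_ensembl")

  # Ask for the HGNC symbol of each Ensembl gene ID
  res <- biomaRt::getBM(
    attributes = c("ensembl_gene_id", "hgnc_symbol"),
    filters    = "ensembl_gene_id",
    values     = ids,
    mart       = mart
  )

  # Genes without an HGNC symbol come back as "", so turn those into NA
  res$hgnc_symbol[which(res$hgnc_symbol == "")] <- NA

  # getBM omits IDs it cannot find; a left join keeps every input ID
  merge(data.frame(ensembl_gene_id = ids, stringsAsFactors = FALSE), res,
        by = "ensembl_gene_id", all.x = TRUE)
}

# Usage (Data as in the question, with Ensembl IDs as row names):
# annot <- ensembl_to_symbol(row.names(Data))
# sum(is.na(annot$hgnc_symbol))
```

An ID can map to more than one symbol, in which case the result has more rows than the input, so check for duplicated IDs before joining the result back onto the data.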
