How do I use biomaRt to get the corresponding gene IDs?


I have a tab-delimited text file that lists RefSeq protein IDs alongside peptide sequences. I need to use biomaRt in R to get the corresponding gene IDs for the whole list of RefSeq IDs, and I need to keep each peptide sequence in the final result. How would I do that?

myData <- read.delim("phosphopeptides.txt", header = FALSE, col.names = c("RefSeq", "Peptide"))

Answer by zx8754:

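Use refseq_peptide as both the filter and one of the attributes to match the IDs, then merge the result back onto the original data so the peptide sequences are kept: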

library(biomaRt)

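# Connect to the Ensembl human gene mart
ensembl <- useEnsembl(biomart = "genes", dataset = "hsapiens_gene_ensembl")

# Query each RefSeq protein ID only once
refseq_peptide <- unique(myData$RefSeq)

# Look up the HGNC gene symbol for each RefSeq protein ID
res <- getBM(attributes = c("refseq_peptide", "hgnc_symbol"),
             filters = "refseq_peptide",
             values = refseq_peptide,
             mart = ensembl)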
res
#   refseq_peptide hgnc_symbol
# 1      NP_000007       ACADM
# 2      NP_000009      ACADVL
# 3      NP_000012       PSEN1

# Merge the gene symbols back onto the peptide table (rows with no match are dropped)
merge(myData, res, by.x = "RefSeq", by.y = "refseq_peptide")
#      RefSeq                            Peptide hgnc_symbol
# 1 NP_000007                    R.SDPDPKAPANK.A       ACADM
# 2 NP_000009                    K.SDSHPSDALTR.K      ACADVL
# 3 NP_000012 K.YNAESTERESQDTVAENDDGGFSEEWEAQR.D       PSEN1
# 4 NP_000012            R.AAVQELSSSILAGEDPEER.G       PSEN1
# 5 NP_000012            R.AAVQELSSSILAGEDPEER.G       PSEN1
# 6 NP_000012                  R.S*LGHPEPLSNGR.P       PSEN1
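By default merge() performs an inner join, so any peptide whose RefSeq ID gets no match from biomaRt is silently dropped, and if getBM() returns more than one symbol for the same ID, each peptide row for that ID is repeated once per symbol. The sketch below shows one way to handle both, assuming you would rather keep every peptide and see multiple symbols on a single row. It uses small hypothetical data frames in place of the real myData and getBM() result; the ID NP_UNMATCHED and the symbol SYMBOL_B are made up for illustration.

```r
# Hypothetical stand-in for myData as read from the text file
myData <- data.frame(
  RefSeq  = c("NP_000007", "NP_000012", "NP_UNMATCHED"),
  Peptide = c("R.SDPDPKAPANK.A", "R.S*LGHPEPLSNGR.P", "K.HYPOTHETICAL.R"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the getBM() result; SYMBOL_B is made up
res <- data.frame(
  refseq_peptide = c("NP_000007", "NP_000012", "NP_000012"),
  hgnc_symbol    = c("ACADM", "PSEN1", "SYMBOL_B"),
  stringsAsFactors = FALSE
)

# Collapse all non-empty symbols for each RefSeq ID into one ";"-separated string
res_collapsed <- aggregate(
  hgnc_symbol ~ refseq_peptide, data = res,
  FUN = function(x) paste(unique(x[x != ""]), collapse = ";")
)

# Left join: keep every peptide, with NA where biomaRt found no gene
annotated <- merge(myData, res_collapsed,
                   by.x = "RefSeq", by.y = "refseq_peptide",
                   all.x = TRUE)
annotated
```

Whether to collapse symbols or keep one row per symbol depends on what the downstream analysis expects. Collapsing keeps the row count equal to the number of peptides, which makes it easy to check that nothing was lost.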

Note: when we do not know the correct attribute name, searchAttributes is a useful function for finding it:

searchAttributes(mart = ensembl, pattern = "refseq")
#                        name                 description         page
# 86              refseq_mrna              RefSeq mRNA ID feature_page
# 87    refseq_mrna_predicted    RefSeq mRNA predicted ID feature_page
# 88             refseq_ncrna             RefSeq ncRNA ID feature_page
# 89   refseq_ncrna_predicted   RefSeq ncRNA predicted ID feature_page
# 90           refseq_peptide           RefSeq peptide ID feature_page
# 91 refseq_peptide_predicted RefSeq peptide predicted ID feature_page

searchAttributes(mart = ensembl, pattern = "hgnc")
#               name        description         page
# 64         hgnc_id            HGNC ID feature_page
# 65     hgnc_symbol        HGNC symbol feature_page
# 95 hgnc_trans_name Transcript name ID feature_page
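One more practical check: the refseq_peptide values returned in the output above (e.g. NP_000007) carry no version suffix. If your file stores versioned IDs such as NP_000007.1, they may not match the filter. A minimal sketch, assuming the version is always a trailing ".<digits>", of stripping it with base R before passing the IDs to getBM() (the IDs below are hypothetical examples):

```r
# Hypothetical versioned IDs as they might appear in an input file
ids <- c("NP_000007.1", "NP_000009.2", "NP_000012")

# Remove a trailing ".<digits>" version; IDs without a version are unchanged
ids_unversioned <- sub("\\.[0-9]+$", "", ids)
ids_unversioned
# [1] "NP_000007" "NP_000009" "NP_000012"
```

In the real workflow you would store the stripped IDs in a new column of myData, use that column both as the getBM() values and as by.x in merge(), and keep the original column if you need the versioned IDs in the output.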