How to add annotation to a gene in SBML?

I have a genome-scale stoichiometric metabolic model iMM904.xml and when I open it in a text editor I can see that certain genes have annotation added to them, e.g.

<fbc:geneProduct fbc:id="G_YLR189C" fbc:label="YLR189C" metaid="G_YLR189C">
<annotation>
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
    <rdf:Description rdf:about="#G_YLR189C">
      <bqbiol:isEncodedBy>
        <rdf:Bag>
          <rdf:li rdf:resource="http://identifiers.org/ncbigene/850886" />
          <rdf:li rdf:resource="http://identifiers.org/sgd/S000004179" />
        </rdf:Bag>
      </bqbiol:isEncodedBy>
    </rdf:Description>
  </rdf:RDF>
</annotation>
</fbc:geneProduct>

How can I access and alter this annotation? When I try

import cbmpy as cbm

cmod = cbm.CBRead.readSBML3FBC('iMM904.xml')

gene = cmod.getGene('G_YLR189C')

print(gene.getAnnotations())

I only see an empty dictionary.

In addition, how could I add annotations such as "last modified by" and actual notes to it?

Answer by Cleb (best answer)

In CBMPy, you have three different options for adding annotation to an SBML file:

1) MIRIAM annotation,

2) arbitrary key-value pairs and

3) human-readable notes

which should cover all points you have mentioned in your question. I demonstrate how to use them for the gene entry, but the same commands can be used to annotate species (metabolites) and reactions.

1. MIRIAM annotation

To access the existing MIRIAM annotation - the one you show in your question - you can use:

import cbmpy as cbm

# same model file as in the question
mod = cbm.CBRead.readSBML3FBC('iMM904.xml')

# access gene directly by its locus tag which avoids dealing with the "G_" in the ID
gene = mod.getGeneByLabel('YLR189C')

gene.getMIRIAMannotations()

This will give:

{'encodes': (),
 'hasPart': (),
 'hasProperty': (),
 'hasTaxon': (),
 'hasVersion': (),
 'is': (),
 'isDerivedFrom': (),
 'isDescribedBy': (),
 'isEncodedBy': ('http://identifiers.org/ncbigene/850886',
  'http://identifiers.org/sgd/S000004179'),
 'isHomologTo': (),
 'isPartOf': (),
 'isPropertyOf': (),
 'isVersionOf': (),
 'occursIn': ()}

As you can see, it contains the entries you saw in the SBML file.
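If you want to cross-check what CBMPy reports against the raw file, the same information can be pulled out with the standard library alone. This is only a minimal sketch, assuming the annotation block looks like the one in the question; `miriam_uris` is a made-up helper name and not part of CBMPy.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BQBIOL = "http://biomodels.net/biology-qualifiers/"

# The <annotation> element from the question, copied as-is.
ANNOTATION = """
<annotation>
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
    <rdf:Description rdf:about="#G_YLR189C">
      <bqbiol:isEncodedBy>
        <rdf:Bag>
          <rdf:li rdf:resource="http://identifiers.org/ncbigene/850886" />
          <rdf:li rdf:resource="http://identifiers.org/sgd/S000004179" />
        </rdf:Bag>
      </bqbiol:isEncodedBy>
    </rdf:Description>
  </rdf:RDF>
</annotation>
"""


def miriam_uris(annotation_xml):
    """Map each bqbiol qualifier (e.g. 'isEncodedBy') to a tuple of URIs."""
    root = ET.fromstring(annotation_xml)
    result = {}
    for description in root.iter("{%s}Description" % RDF):
        for qualifier in description:
            # Only biology qualifiers are collected in this sketch.
            if not qualifier.tag.startswith("{%s}" % BQBIOL):
                continue
            name = qualifier.tag.split("}", 1)[1]
            uris = tuple(
                li.get("{%s}resource" % RDF)
                for li in qualifier.iter("{%s}li" % RDF)
            )
            result[name] = result.get(name, ()) + uris
    return result


print(miriam_uris(ANNOTATION))
```

Unlike `getMIRIAMannotations()`, this sketch only returns qualifiers that are actually present, so the empty entries are missing.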

If you now want to add MIRIAM annotation, you can use two approaches:

A) let CBMPy create the URL for you:

gene.addMIRIAMannotation('is', 'UniProt Knowledgebase', 'Q06321')

B) enter the URL yourself:

# made up protein!
gene.addMIRIAMuri('is', 'http://identifiers.org/uniprot/P12345')

If you now check gene.getMIRIAMannotations(), you will see (I cut off a few empty entries):

'is': ('http://identifiers.org/uniprot/Q06321',
  'http://identifiers.org/uniprot/P12345'),
 'isDerivedFrom': (),
 'isDescribedBy': (),
 'isEncodedBy': ('http://identifiers.org/ncbigene/850886',
  'http://identifiers.org/sgd/S000004179'),

So, both of your entries have been added (again: the P12345 entry is just for demonstration, don't use it in your actual model!).
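Both approaches end up storing the same kind of string. As a rough illustration of what option A does for you, assuming the identifiers.org pattern visible in the output above (`http://identifiers.org/<namespace>/<id>`), the URL could be assembled as follows. The helper `build_identifiers_uri` and the `NAMESPACES` table are hypothetical and only cover the one database used in this answer; CBMPy ships its own lookup.

```python
# Hypothetical table: database name -> identifiers.org namespace.
# Only the pair that appears in this answer is listed.
NAMESPACES = {
    "UniProt Knowledgebase": "uniprot",
}


def build_identifiers_uri(entity, identifier):
    """Build an identifiers.org URL for a database name and an accession."""
    try:
        namespace = NAMESPACES[entity]
    except KeyError:
        raise ValueError('"%s" is not a known entity' % entity)
    return "http://identifiers.org/%s/%s" % (namespace, identifier)


print(build_identifiers_uri("UniProt Knowledgebase", "Q06321"))
```

This prints the first of the two `is` entries shown above.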

If you do not know the exact database name that CBMPy expects, it will help you there as well. For example, if you try:

gene.addMIRIAMannotation('is', 'uniprot', 'Q06321')

it will print

"uniprot" is not a valid entity were you looking for one of these:

    UNII
    UniGene
    UniParc
    UniPathway Compound
    UniPathway Reaction
    UniProt Isoform
    UniProt Knowledgebase
    UniSTS
    Unimod
    Unipathway
    Unit Ontology
    Unite
INFO: Invalid entity: "uniprot" MIRIAM entity NOT set

The list includes 'UniProt Knowledgebase', the name used above.
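The suggestions in that output all share the first three letters of the query, ignoring case. How CBMPy actually computes them is not shown here, so the following is only a sketch of one way such a lookup could work. `suggest_entities` and `KNOWN_ENTITIES` are illustrative names, and the list is a shortened sample, not CBMPy's real table.

```python
# Shortened, illustrative list of database names.
KNOWN_ENTITIES = [
    "ChEBI",  # not in the output above; added to show that non-matches are dropped
    "UNII",
    "UniGene",
    "UniParc",
    "UniProt Isoform",
    "UniProt Knowledgebase",
    "Unimod",
    "Unit Ontology",
]


def suggest_entities(query, known=KNOWN_ENTITIES, prefix_len=3):
    """Return known names that share the query's first letters, ignoring case."""
    prefix = query[:prefix_len].lower()
    return [name for name in known if name.lower().startswith(prefix)]


for name in suggest_entities("uniprot"):
    print(name)
```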

2. Adding arbitrary key-value pairs

Not everything can be annotated using the MIRIAM annotation scheme, but you can easily create your own key-value pairs. Using your example,

gene.setAnnotation('last_modified_by', 'Vinz')

The keys and values are fully arbitrary:

gene.setAnnotation('arbitrary key', 'arbitrary value')

If you now call

gene.getAnnotations()

you receive

{'arbitrary key': 'arbitrary value', 'last_modified_by': 'Vinz'}

If you want to access a certain key, you can use

gene.getAnnotation('last_modified_by')

which yields

'Vinz'

3. Adding notes

If you want to write actual comments, neither of the first two options is appropriate, but you can use:

gene.setNotes('This is my favorite gene')

You can access them using

gene.getNotes()

If you now export the model (make sure to use the FBCV2 writer!) with:

cbm.CBWrite.writeSBML3FBCV2(mod, 'iMM904_edited.xml')

and open the model in your text editor, you will see that all the annotation has been added:

<fbc:geneProduct metaid="meta_G_YLR189C" fbc:id="G_YLR189C" fbc:label="YLR189C">
  <notes>
    <html:body>This is my favorite gene</html:body>
  </notes>
  <annotation>
    <listOfKeyValueData xmlns="http://pysces.sourceforge.net/KeyValueData">
      <data id="arbitrary key" value="arbitrary value"/>
      <data id="last_modified_by" value="Vinz"/>
    </listOfKeyValueData>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#" xmlns:vCard4="http://www.w3.org/2006/vcard/ns#" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/" xmlns:bqmodel="http://biomodels.net/model-qualifiers/">
      <rdf:Description rdf:about="#meta_G_YLR189C">
        <bqbiol:is>
          <rdf:Bag>
            <rdf:li rdf:resource="http://identifiers.org/uniprot/Q06321"/>
            <rdf:li rdf:resource="http://identifiers.org/uniprot/P12345"/>
          </rdf:Bag>
        </bqbiol:is>
        <bqbiol:isEncodedBy>
          <rdf:Bag>
            <rdf:li rdf:resource="http://identifiers.org/ncbigene/850886"/>
            <rdf:li rdf:resource="http://identifiers.org/sgd/S000004179"/>
          </rdf:Bag>
        </bqbiol:isEncodedBy>
      </rdf:Description>
    </rdf:RDF>
  </annotation>
</fbc:geneProduct>
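To confirm that the notes and key-value pairs survived the export without going through CBMPy again, the element can be read back with the standard library. This is a minimal sketch under a few assumptions: in a real file the `fbc` and `html` prefixes are declared further up, so they are added to the fragment here to make it parse on its own; the `html` prefix is assumed to map to the usual XHTML namespace; and `read_notes_and_key_values` is a made-up helper name. The RDF part is left out to keep the fragment short.

```python
import xml.etree.ElementTree as ET

KVD = "http://pysces.sourceforge.net/KeyValueData"
XHTML = "http://www.w3.org/1999/xhtml"  # assumed binding of the html prefix

# Fragment of the exported element, with namespace declarations added.
GENE_XML = """
<fbc:geneProduct xmlns:fbc="http://www.sbml.org/sbml/level3/version1/fbc/version2"
                 xmlns:html="http://www.w3.org/1999/xhtml"
                 metaid="meta_G_YLR189C" fbc:id="G_YLR189C" fbc:label="YLR189C">
  <notes>
    <html:body>This is my favorite gene</html:body>
  </notes>
  <annotation>
    <listOfKeyValueData xmlns="http://pysces.sourceforge.net/KeyValueData">
      <data id="arbitrary key" value="arbitrary value"/>
      <data id="last_modified_by" value="Vinz"/>
    </listOfKeyValueData>
  </annotation>
</fbc:geneProduct>
"""


def read_notes_and_key_values(gene_xml):
    """Return (notes text, dict of key-value annotations) for one element."""
    root = ET.fromstring(gene_xml)
    body = root.find(".//{%s}body" % XHTML)
    notes = "".join(body.itertext()).strip() if body is not None else ""
    key_values = {
        data.get("id"): data.get("value")
        for data in root.iter("{%s}data" % KVD)
    }
    return notes, key_values


notes, key_values = read_notes_and_key_values(GENE_XML)
print(notes)
print(key_values)
```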