Converting SMILES to chemical name or IUPAC name using rdkit or other python module

Question

9.3k views Asked by Alex At 13 October 2020 at 05:27

Is there a way to convert SMILES to either chemical name or IUPAC name using RDKit or other python modules?

I couldn't find anything very helpful in other posts.

Thank you very much!

There are 3 answers

Time Step On 14 December 2023 at 03:02

I have been working on some code with RDKit, trying to solve the same problem. Here is a solution that uses pubchempy and runs in Google Colab; the SMILES strings of the molecules are in a list:

!pip install pubchempy #required in google colab
import pubchempy

name = [
"COC1=C(C=C2C(=C1)CC(C2=O)CC3CCN(CC3)CC4=CC=CC=C4)OC",
"C1CCNC2=C(C1)C=CC(=C2)C(=O)CCC3CCN(CC3)CC4=CC=CC=C4",
"C[N+]1(CCC(CC1)CC2CC3=CC(=C(C=C3C2=O)OC)OC)CC4=CC=CC=C4",
"C[N+]1(CCC(CC1)CCC(=O)C2=CC3=C(CCCCN3)C=C2)CC4=CC=CC=C4",
"C[N+]1(CCC(CC1)C(C2CC3=CC(=C(C=C3C2=O)OC)OC)O)CC4=CC=CC=C4",
"C[N+]1(CC[C@@]23C=C[C@@H](CC2OC4=C(C=CC(=C34)C1)OC)O)C",
"CC1=C[C@H]2CC3=C(C=CC(=O)N3)[C@@]4(C1)[C@@H]2CCCN4"]

for smiles in name:
    # get_compounds returns a list of matching Compound objects
    compounds = pubchempy.get_compounds(smiles, namespace='smiles')
    match = compounds[0]  # take the first match
    print(match.iupac_name)

For those who are not very familiar with Google Colab, here is a link to a notebook where the code can be tested. As of this writing Colab is relatively new to me; I have only been using it for a couple of days: https://colab.research.google.com/drive/1fM88FMcTJytuMQwpDm392v2bmNQbZYNg#scrollTo=iNWdZyjEqoyO

Be warned that I am more of a chemist than a programmer. TimeStep

Rag On 16 June 2021 at 17:41

I recently managed this conversion using pubchempy. Here is the code to try.

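import pubchempy as pcp

# inif.txt holds one SMILES string per line
with open("inif.txt", "r") as smiles_file:
    # enumerate supplies the counter i, which was never initialised before
    for i, line in enumerate(smiles_file, start=1):
        smiles = line.strip()  # drop the trailing newline
        if not smiles:
            continue  # skip blank lines
        compounds = pcp.get_compounds(smiles, namespace='smiles')
        match = compounds[0]
        print(i, '$$$', 'the CID is', match.cid, '$$$',
              'The IUPAC name is', match.iupac_name, '$$$',
              'for the SMILES', smiles)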
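
When a file holds many SMILES strings, a single failed lookup should not abort the whole run, and the requests should be spaced out. Below is a minimal sketch of that idea. The names batch_lookup and pubchem_iupac are hypothetical helpers, not part of pubchempy, and the 0.25 s pause is an assumption chosen to stay under PubChem's guideline of about 5 requests per second.

```python
import time


def batch_lookup(smiles_list, lookup, delay=0.25):
    """Map each SMILES to lookup(smiles), or to None if the lookup fails."""
    results = {}
    for smiles in smiles_list:
        try:
            results[smiles] = lookup(smiles)
        except Exception as err:  # e.g. no match or a network error
            print("lookup failed for", smiles, ":", err)
            results[smiles] = None
        time.sleep(delay)  # pause between requests (assumed rate limit)
    return results


def pubchem_iupac(smiles):
    """Look up one SMILES with pubchempy (imported lazily, only when called)."""
    import pubchempy as pcp
    compounds = pcp.get_compounds(smiles, namespace='smiles')
    return compounds[0].iupac_name if compounds else None
```

Calling batch_lookup(list_of_smiles, pubchem_iupac) would then return a dictionary keyed by SMILES, with None for any entry that could not be resolved.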

Oliver Scott On 13 October 2020 at 13:02 (Accepted Answer)

As far as I am aware, this is not possible using RDKit, and I do not know of any Python modules with this ability. If you are OK with using a web service, you could use the NCI resolver.

Here is a naive implementation of a function to retrieve an IUPAC name from a SMILES string:

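from urllib.parse import quote

import requests


CACTUS = "https://cactus.nci.nih.gov/chemical/structure/{0}/{1}"


def smiles_to_iupac(smiles):
    rep = "iupac_name"
    # Percent-encode the SMILES: an unescaped '#' (triple bond) would
    # otherwise be read as the start of a URL fragment
    url = CACTUS.format(quote(smiles), rep)
    response = requests.get(url)
    response.raise_for_status()
    return response.text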


print(smiles_to_iupac('c1ccccc1'))
print(smiles_to_iupac('CC(=O)OC1=CC=CC=C1C(=O)O'))

[Out]:
BENZENE
2-acetyloxybenzoic acid

You could easily extend it to convert between other formats, although the function isn't exactly fast, since every call makes a web request.
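
One way that extension might look is sketched below, with the output representation passed as a parameter. This version swaps requests for the standard-library urllib so it has no third-party dependency. The names build_cactus_url and resolve are hypothetical, and treating an HTTP 404 as "could not be resolved" is an assumption about how the service reports failures.

```python
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

CACTUS = "https://cactus.nci.nih.gov/chemical/structure/{0}/{1}"


def build_cactus_url(identifier, representation):
    """Build the resolver URL, percent-encoding the identifier for the path."""
    return CACTUS.format(quote(identifier), representation)


def resolve(identifier, representation="iupac_name", timeout=10):
    """Return the requested representation as text, or None on HTTP 404."""
    url = build_cactus_url(identifier, representation)
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except HTTPError as err:
        if err.code == 404:  # assumed to mean the input was not resolved
            return None
        raise
```

For example, resolve('c1ccccc1', 'stdinchikey') would request the standard InChIKey instead of the IUPAC name, assuming the resolver accepts that representation keyword.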

Another option is PubChem, whose API you can call through the Python package pubchempy. Bear in mind that a query may return multiple compounds.

import pubchempy


# Use the SMILES you provided
smiles = 'O=C(NCc1ccc(C(F)(F)F)cc1)[C@@H]1Cc2[nH]cnc2CN1Cc1ccc([N+](=O)[O-])cc1'
compounds = pubchempy.get_compounds(smiles, namespace='smiles')
match = compounds[0]
print(match.iupac_name)

[Out]:
(6S)-5-[(4-nitrophenyl)methyl]-N-[[4-(trifluoromethyl)phenyl]methyl]-3,4,6,7-tetrahydroimidazo[4,5-c]pyridine-6-carboxamide
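
Because a query may return several compounds, and a match may have no IUPAC name on record, taking compounds[0] blindly can fail or print None. A minimal sketch of a more defensive approach follows. The helper unique_iupac_names is hypothetical, and the stand-in objects only imitate the iupac_name attribute of pubchempy's Compound so the example runs without network access.

```python
from types import SimpleNamespace


def unique_iupac_names(compounds):
    """Collect distinct, non-empty iupac_name values, preserving order."""
    names = []
    for compound in compounds:
        name = getattr(compound, "iupac_name", None)
        if name and name not in names:
            names.append(name)
    return names


# Stand-in objects for illustration; in practice pass the list returned by
# pubchempy.get_compounds(smiles, namespace='smiles').
fake_compounds = [
    SimpleNamespace(iupac_name="benzene"),
    SimpleNamespace(iupac_name=None),
    SimpleNamespace(iupac_name="benzene"),
]
print(unique_iupac_names(fake_compounds))
```

An empty result then signals that no usable name was found, which the caller can handle explicitly.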
