RDKit ErG: how to get the names of abstract reduced-graph nodes in Python


I am using RDKit to extract the ErG (Extended Reduced Graph) [1] of a particular molecule. After extracting the ErG from the original molecule, I build a graph from it for further processing.

However, I notice that for some atoms/nodes GetSymbol() returns '*'. Since the ErG method introduces a number of abstract nodes, such as the aromatic ring nodes and the lateral hydrophobic nodes, I understand why GetSymbol() cannot return an element symbol for them.
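What I would like to end up with is one readable label per node, roughly like the sketch below. It assumes that each node's pharmacophore-type flags are available as a list of integer indices; whether and how RDKit exposes them (I suspect a private atom property, possibly named `_ErGAtomTypes`, but I have not been able to confirm this) is exactly what I am asking. The index-to-name table `ERG_TYPE_NAMES` is hypothetical: the category names follow the ErG paper [1], but the order is a guess.

```python
# Hypothetical index -> name table for ErG feature flags. The category names
# follow the ErG paper [1]; the index order is an ASSUMPTION and would have to
# be checked against the RDKit version in use.
ERG_TYPE_NAMES = ("Donor", "Acceptor", "Positive", "Negative",
                  "Hydrophobic", "Aromatic")


def erg_node_label(symbol, type_flags):
    """Build a readable label for one reduced-graph node.

    symbol: result of atom.GetSymbol() ('*' for abstract nodes)
    type_flags: list of integer feature indices for this node
                (assumed to come from the node's ErG type property)
    """
    # Translate known indices to names; indices outside the table are ignored.
    names = [ERG_TYPE_NAMES[i] for i in sorted(type_flags)
             if 0 <= i < len(ERG_TYPE_NAMES)]
    if symbol != "*":
        # Ordinary atom: keep the element and append any features it carries.
        return symbol if not names else symbol + ":" + "+".join(names)
    # Abstract node: the features are the only information available.
    return "+".join(names) if names else "Ring"
```

With RDKit, this would be called once per atom of `red`, e.g. `erg_node_label(atom.GetSymbol(), flags)`, where `flags` is whatever per-atom type list can be read from the reduced graph.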

No matter which methods I try, I cannot extract the names (node types) of these nodes. I need them because they will be used as node labels in a graph edit distance calculation.
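To show why the labels matter, here is a small sketch of graph edit distance on labelled graphs: an exact brute-force version with unit costs (node deletion, insertion and relabelling, and edge deletion and insertion, each cost 1). The cost scheme is my own simplification and the brute force is only feasible for a handful of nodes; two nodes with different labels cost a relabelling, so if every abstract node carries the same '*' label, an aromatic ring and a hydrophobic group look identical.

```python
from itertools import permutations


def graph_edit_distance(nodes1, edges1, nodes2, edges2):
    """Exact graph edit distance with unit costs, by brute force.

    nodes1, nodes2: dict mapping node id -> label, e.g. {0: "C", 1: "Aromatic"}
    edges1, edges2: set of frozenset({u, v}) for undirected edges (no self-loops)
    Costs: node deletion/insertion/relabelling = 1, edge deletion/insertion = 1.
    Only feasible for a handful of nodes.
    """
    ids1, ids2 = list(nodes1), list(nodes2)
    # Each node of graph 1 maps to a distinct node of graph 2, or to None (deleted).
    targets = ids2 + [None] * len(ids1)
    best = None
    for image in permutations(targets, len(ids1)):
        fwd = dict(zip(ids1, image))
        inv = {v: u for u, v in fwd.items() if v is not None}
        cost = 0
        for u, v in fwd.items():
            if v is None or nodes1[u] != nodes2[v]:
                cost += 1  # node deleted or relabelled
        cost += len(ids2) - len(inv)  # graph-2 nodes without a pre-image are inserted
        for a, b in map(tuple, edges1):
            img = None if None in (fwd[a], fwd[b]) else frozenset((fwd[a], fwd[b]))
            if img not in edges2:
                cost += 1  # edge deleted
        for c, d in map(tuple, edges2):
            pre = frozenset((inv[c], inv[d])) if c in inv and d in inv else None
            if pre not in edges1:
                cost += 1  # edge inserted
        if best is None or cost < best:
            best = cost
    return best
```

For molecules of realistic size, NetworkX's `graph_edit_distance` with a `node_match` callback comparing the same labels would be the practical route; the brute force above only illustrates the cost model.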

One of the molecules I am testing with is CHEMBL318217, SMILES: CC(C)(C)NC(=O)[C@@H]2C[C@@H]1SCC[C@@H]1CN2C[C@@H](O)[C@H](CSc3ccccc3)NC(=O)OCc4ccccc4

My code is:

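from rdkit import Chem
from rdkit.Chem import rdReducedGraphs
import networkx as nx

def get_graph(mol):  # From reference [2]
    Chem.Kekulize(mol)
    # Atomic number of every node; abstract ErG nodes ('*') give 0
    atoms = [atom.GetAtomicNum() for atom in mol.GetAtoms()]
    am = Chem.GetAdjacencyMatrix(mol, useBO=True)
    # Store the atomic number on the diagonal (becomes a self-loop weight)
    for i, atom in enumerate(atoms):
        am[i, i] = atom
    # nx.from_numpy_matrix was removed in NetworkX 3.0; from_numpy_array replaces it
    G = nx.from_numpy_array(am)
    return G

query = 'CC(C)(C)NC(=O)[C@@H]2C[C@@H]1SCC[C@@H]1CN2C[C@@H](O)[C@H](CSc3ccccc3)NC(=O)OCc4ccccc4'
mol = Chem.MolFromSmiles(query)
red = rdReducedGraphs.GenerateMolExtendedReducedGraph(mol)
G1 = get_graph(red)

for atom in red.GetAtoms():
    print(atom.GetSymbol())  # The last four symbols are *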
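Part of the problem with get_graph is that it writes GetAtomicNum() onto the diagonal, and every '*' node has atomic number 0, so all abstract nodes get the same value whether they stand for an aromatic ring or a hydrophobic group. A minimal sketch of an alternative encoding, assuming I already have a text label per node (obtaining those labels is exactly what I am asking about): real atoms keep their atomic number and each distinct abstract label gets its own code. The starting value 1000 is an arbitrary choice that simply stays clear of real atomic numbers.

```python
def diagonal_codes(nodes):
    """Integer code per node for the adjacency-matrix diagonal.

    nodes: list of (atomic_number, label) pairs; abstract reduced-graph
           nodes have atomic number 0 and a (hypothetical) text label.
    Real atoms keep their atomic number; each distinct abstract label gets
    its own code starting at 1000, so it cannot collide with an element.
    """
    pseudo_codes = {}
    codes = []
    for atomic_num, label in nodes:
        if atomic_num > 0:
            codes.append(atomic_num)
        else:
            # First occurrence of a label reserves the next free code.
            codes.append(pseudo_codes.setdefault(label, 1000 + len(pseudo_codes)))
    return codes
```

In get_graph, these codes would then take the place of the plain atomic numbers in the `am[i, i] = ...` loop.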

I would really appreciate your help!

Some references:

[1] Nikolaus Stiefl, Ian A. Watson, Knut Baumann, and Andrea Zaliani. ErG: 2D Pharmacophore Descriptions for Scaffold Hopping. Journal of Chemical Information and Modeling 2006, 46 (1), 208-220. DOI: 10.1021/ci050457y

[2] http://proteinsandwavefunctions.blogspot.com/2020/01/computing-graph-edit-distance-between.html
