I am using RDKit to extract an ERG [1] (Extended Reduced Graph) for a particular molecule. After extracting the ERG from the original molecule, I build a graph from it for further processing.
However, for some atoms/nodes, GetSymbol() returns '*'. Since the ERG method introduces a number of abstract nodes, such as the aromatic node and the lateral hydrophobic nodes, it is expected that GetSymbol() cannot return a meaningful symbol for them.
No matter which methods I try, I cannot extract the names of these nodes. I need the names because they will be used in a graph edit distance technique.
One of the molecules I am testing with is CHEMBL318217, with SMILES string: CC(C)(C)NC(=O)[C@@H]2C[C@@H]1SCC[C@@H]1CN2C[C@@H](O)[C@H](CSc3ccccc3)NC(=O)OCc4ccccc4
My code is:
from rdkit import Chem
from rdkit.Chem import rdReducedGraphs
import networkx as nx

def get_graph(mol):  # From reference [2]
    Chem.Kekulize(mol)
    atoms = [atom.GetAtomicNum() for atom in mol.GetAtoms()]
    am = Chem.GetAdjacencyMatrix(mol, useBO=True)
    # Store each atomic number on the diagonal of the adjacency matrix
    for i, atom in enumerate(atoms):
        am[i, i] = atom
    # from_numpy_matrix was removed in NetworkX 3.0; from_numpy_array replaces it
    G = nx.from_numpy_array(am)
    return G

query = 'CC(C)(C)NC(=O)[C@@H]2C[C@@H]1SCC[C@@H]1CN2C[C@@H](O)[C@H](CSc3ccccc3)NC(=O)OCc4ccccc4'
mol = Chem.MolFromSmiles(query)
red = rdReducedGraphs.GenerateMolExtendedReducedGraph(mol)
G1 = get_graph(red)
for atom in red.GetAtoms():
    print(atom.GetSymbol())  # The last four symbols are *
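To narrow down what the '*' nodes actually carry, here is a minimal inspection sketch. It relies on the fact that RDKit dummy atoms have atomic number 0 and symbol '*', and it dumps whatever properties are attached to each node with GetPropsAsDict(). The helper name describe_nodes is hypothetical (it is not part of RDKit), and whether the reduced graph stores the abstract node type as an atom property, and under which key, is an assumption that this dump is meant to check rather than something I can confirm.

```python
def describe_nodes(reduced_mol):
    """Return one record per node of an RDKit molecule-like object.

    describe_nodes is a hypothetical helper, not an RDKit function.
    It only calls GetAtoms(), GetIdx(), GetSymbol(), GetAtomicNum()
    and GetPropsAsDict() on the object it is given.
    """
    records = []
    for atom in reduced_mol.GetAtoms():
        records.append({
            "idx": atom.GetIdx(),
            "symbol": atom.GetSymbol(),
            # Dummy atoms have atomic number 0 and are printed as '*'
            "is_dummy": atom.GetAtomicNum() == 0,
            # Properties attached to the atom, if any. Which key (if any)
            # holds the abstract node type is an assumption to verify here.
            "props": dict(atom.GetPropsAsDict()),
        })
    return records


def dummy_indices(records):
    """Indices of the nodes that GetSymbol() reports as '*'."""
    return [rec["idx"] for rec in records if rec["is_dummy"]]
```

Calling describe_nodes(red) on the reduced graph from the code above and printing the records for dummy_indices(...) should show whether those four nodes carry any property that could serve as a node name for the graph edit distance.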
I would really appreciate your help!
Some references:
[1] Nikolaus Stiefl, Ian A. Watson, Knut Baumann, and Andrea Zaliani. ErG: 2D Pharmacophore Descriptions for Scaffold Hopping. Journal of Chemical Information and Modeling 2006, 46 (1), 208-220. DOI: 10.1021/ci050457y
[2] http://proteinsandwavefunctions.blogspot.com/2020/01/computing-graph-edit-distance-between.html