I am trying to create a KNIME workflow that would accept a list of compounds and automatically carry out bioisosteric replacements (the example used here is carboxylic acid to tetrazole).
NOTE: I am using the following workflow as inspiration: RDKit-bioisosteres (myexperiment.org). It takes its SMARTS input from a text file, and I cannot seem to replicate the SMARTS format used there.
For this, I plan to use the RDKit One Component Reaction node, which takes as input the set of compounds to run the reaction on and a SMARTS string that defines the reaction.
My issue is the generation of a working SMARTS string describing the reaction.
I would like to input two SDF files (or another format, I am not particularly attached to SDF): one with the group to replace (carboxylic acid) and one with the list of possible bioisosteric replacements (tetrazole). I would then combine these two in KNIME and generate a SMARTS string for the reaction, to be used in the RDKit One Component Reaction node.
NOTE: The input SDF files have the structures written with an attachment point (*COOH for the carboxylic acid for example) which defines where the group to replace is attached. I suspect this is the cause of many of the issues I am experiencing.
So far, I can easily generate the reactions in RXN format using the Reaction Builder node from the Indigo node package. However, converting this reaction into a SMARTS string that the RDKit One Component Reaction node accepts has proven tricky.
What I have tried so far:
1. Converting RXN to SMARTS (Molecule Type Cast node): gives the following error:
   scanner: BufferScanner::read() error
2. Converting the Source and Target molecules into SMARTS (Molecule Type Cast node): gives the following error:
   SMILES loader: unrecognised lowercase symbol: y
   - Displaying this as a string in KNIME shows that the conversion is not carried out and the string is still in SDF format:
     *filename*.sdf 0 0 0 0 0 0 0 V3000
     M V30 BEGIN
     etc.
3. Converting the Source and Target molecules into RDKit first (RDKit from Molecule node), then from RDKit into SMARTS (RDKit to Molecule node, SMARTS option). This outputs the following SMARTS strings:
   - Carboxylic acid: [#6](-[#8])=[#8]
   - Tetrazole: [#6]1:[#7H]:[#7]:[#7]:[#7]:1
This is as close as I've managed to get. I can then join these two SMARTS strings with >> in between (output: [#6](-[#8])=[#8]>>[#6]1:[#7H]:[#7]:[#7]:[#7]:1) to create a reaction SMARTS string, but the RDKit One Component Reaction node does not accept this as input.
Error message in the KNIME console:
ERROR RDKit One Component Reaction 0:40 Creation of Reaction from SMARTS value failed: null
WARN RDKit One Component Reaction 0:40 Invalid Reaction SMARTS: missing
Note that the SMARTS strings generated by this last option (3.) are very different from the ones used in the myexperiment.org example ([*:1][C:2]([OH])=O>>[*:1][C:2]1=NNN=N1). I also seem to have lost the attachment point information through these conversions, which is likely to cause issues in the rest of the workflow.
Therefore, I am looking for a way to generate SMARTS strings like those in the myexperiment.org example from my own sets of substituents. Doing this by hand is not an option. I would also like this workflow to use only the open-source nodes available in KNIME and not proprietary nodes (Schrödinger etc.).
Hopefully, someone can help me out with this. If you need my current workflow, I am happy to upload it along with the source files.
Thanks in advance for your help,
Stay safe and healthy!
-Antoine
What you're describing is template generation, which has long been an active area of work in reaction prediction and retrosynthesis in cheminformatics. I'm not particularly familiar with KNIME myself, though I know RDKit well. Your last option (3) is closest to what I'd consider a usable workflow. The way I would do this:
1. Convert the before and after substructures to SMARTS with rdkit.Chem.MolToSmarts().
2. Join them as before_substructure>>after_substructure to generate a reaction SMARTS string.
3. Create a reaction object with rxn = rdkit.Chem.AllChem.ReactionFromSmarts().
4. Use the rxn.RunReactants() method to generate your bioisosterically substituted products.

Unfortunately, the error you quote for the RDKit One Component Reaction node cuts off just before the important information. Running
rdkit.Chem.AllChem.ReactionFromSmarts("[#6](-[#8])=[#8]>>[#6]1:[#7H]:[#7]:[#7]:[#7]:1")
produces no errors for me locally, which leads me to believe the problem is specific to the KNIME node.

Note that the difference between
[#6](-[#8])=[#8] and [*:1][C:2]([OH])=O is relatively minimal: the former represents an O-C=O substructure, the latter a ~COOH group. Within the square brackets of the latter, the :num suffix is an optional 'atom map' number, which allows a one-to-one mapping of reactant and product atoms. For example, [C:1][C:3].[C:2][C:4]>>[C:1][C:3][C:4][C:2] allows you to track which carbon is which during a reaction, for situations where it may matter. The token [*:1] means "any atom" and is equivalent to a wavy line in organic chemistry (here it carries map number 1). There are only two situations I can think of where [#6](-[#8])=[#8] and [*:1][C:2]([OH])=O might differ, one of them being the deprotonated acid (COO- != COOH).

Converting these reaction SMARTS to RDKit reaction objects and running them on input molecule objects should then create the substituted products (potentially several per input molecule). Note: typically, in extensive projects, some SMARTS templates will require a degree of manual intervention (indicating attachment points, specifying explicit hydrogens, etc.). If you need any help or have any questions, don't hesitate to drop a comment and I'll do my best to help with specifics.
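Here is a minimal sketch of steps 3 and 4 in plain RDKit (outside KNIME), assuming a reasonably recent RDKit release. It first confirms that the unmapped reaction SMARTS from the question parses. It then runs the atom-mapped template quoted from the myexperiment.org workflow on benzoic acid, which is only a hypothetical test input. Finally, it shows the carboxylate case where the two reactant patterns disagree. The helper name run_template is made up for illustration.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Step 3 on the unmapped string from the question: it parses without error.
unmapped = AllChem.ReactionFromSmarts("[#6](-[#8])=[#8]>>[#6]1:[#7H]:[#7]:[#7]:[#7]:1")
print(unmapped.GetNumReactantTemplates(), unmapped.GetNumProductTemplates())  # 1 1

# Atom-mapped template from the myexperiment.org workflow. [*:1] is the
# attachment atom and [C:2] the carbon that is kept; the unmapped oxygens are
# deleted and the unmapped ring nitrogens are created from the template.
mapped = AllChem.ReactionFromSmarts("[*:1][C:2]([OH])=O>>[*:1][C:2]1=NNN=N1")


def run_template(rxn, smiles):
    """Hypothetical helper for step 4: unique product SMILES for one input."""
    mol = Chem.MolFromSmiles(smiles)
    products = set()
    for product_set in rxn.RunReactants((mol,)):
        for product in product_set:
            Chem.SanitizeMol(product)  # RunReactants returns unsanitized products
            products.add(Chem.MolToSmiles(product))
    return sorted(products)


print(run_template(mapped, "OC(=O)c1ccccc1"))  # benzoic acid -> 5-phenyltetrazole

# Where the two reactant patterns differ: a carboxylate oxygen carries no H,
# so only the element-based pattern matches it.
generic = Chem.MolFromSmarts("[#6](-[#8])=[#8]")
cooh = Chem.MolFromSmarts("[*:1][C:2]([OH])=O")
acetate = Chem.MolFromSmiles("CC(=O)[O-]")
print(acetate.HasSubstructMatch(generic), acetate.HasSubstructMatch(cooh))  # True False
```

The map numbers matter for more than bookkeeping. Matched but unmapped template atoms are removed from the product, so [*:1] and [C:2] are what keep the new ring attached to the rest of the input molecule.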
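The harder part of the question is producing mapped templates like the myexperiment.org one automatically from fragments that carry an attachment point. The sketch below is one possible approach, assuming each fragment is given as SMILES with exactly one * atom (how KNIME or Indigo write the attachment atom from SDF is not something I have checked). The helper names mapped_fragment and reaction_smarts are hypothetical. The approach maps the * atom to 1 and its neighbour to 2. It writes the group being replaced in normal (aromatic) SMILES so it can still match aromatic parents, and writes the replacement in Kekulé form so the new ring gets explicit single/double bonds. It uses MolToSmiles rather than MolToSmarts, so the attachment point comes out as [*:1]; the resulting SMILES strings are then read as reaction SMARTS.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def mapped_fragment(smiles, kekulize=False):
    """Hypothetical helper: SMILES of a fragment with one '*' attachment point,
    with the '*' atom mapped to 1 and the atom bonded to it mapped to 2."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse fragment {smiles!r}")
    dummies = [atom for atom in mol.GetAtoms() if atom.GetAtomicNum() == 0]
    if len(dummies) != 1 or dummies[0].GetDegree() != 1:
        raise ValueError(f"{smiles!r}: expected exactly one '*' with one neighbour")
    dummies[0].SetAtomMapNum(1)
    dummies[0].GetNeighbors()[0].SetAtomMapNum(2)
    if kekulize:
        # explicit single/double bonds so the product template builds a valid ring
        Chem.Kekulize(mol, clearAromaticFlags=True)
    return Chem.MolToSmiles(mol, kekuleSmiles=kekulize)


def reaction_smarts(group, replacement):
    """Hypothetical helper: 'group>>replacement' sharing map numbers 1 and 2."""
    return f"{mapped_fragment(group)}>>{mapped_fragment(replacement, kekulize=True)}"


smarts = reaction_smarts("*C(=O)O", "*c1nnn[nH]1")  # carboxylic acid -> 1H-tetrazole
print(smarts)

rxn = AllChem.ReactionFromSmarts(smarts)
products = set()
for product_set in rxn.RunReactants((Chem.MolFromSmiles("OC(=O)c1ccccc1"),)):
    for product in product_set:
        Chem.SanitizeMol(product)
        products.add(Chem.MolToSmiles(product))
print(sorted(products))
```

For a list of replacements, you would call reaction_smarts once per (group, replacement) pair. The generated reactant side is looser than the hand-written [OH] version: a plain O matches any aliphatic oxygen, so esters would match as well. Tightening such patterns is the kind of manual intervention mentioned in the answer.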