How to use the ChEMBL API to download the ChEMBL descriptors?

I have a .csv file of Molecule ChEMBL IDs, and I can't work out how to download the ChEMBL descriptors for that set of molecules. Specifically, I want to download: 'TPSA', 'NumHAcceptors', 'NumHDonors', 'CX Acidic pKa', 'CX Basic pKa', 'qed'.

There is 1 answer

LePe77it On

My starting point is not a .csv, but all of this information is available through the ChEMBL API (using the Python client):

from chembl_webresource_client.new_client import new_client
import pandas as pd
# Activity API: IC50 values <= 10000 (in the record's standard_units) for one target
activities = new_client.activity.filter(
    target_chembl_id__in=['CHEMBL1824'],  # erbB-2
    standard_type='IC50',
    standard_value__lte=10000,            # the activity field is 'standard_value'
    assay_type='B',                       # only look for binding assays
).only(['molecule_chembl_id', 'standard_value'])
act_df = pd.DataFrame(activities)
#find the list of compounds that are within the act_df dataframe:
cmpd_chembl_ids = list(set(act_df['molecule_chembl_id']))
#molecule API
molecules = new_client.molecule.filter(molecule_chembl_id__in = cmpd_chembl_ids  
                                       ).only([ 'molecule_chembl_id', 'molecule_properties'])
mol_df = pd.DataFrame(molecules)
#mol_df
# Convert nested cells (i.e. those containing a dictionary) to individual columns in the dataframe.
# Rows whose 'molecule_properties' is null are skipped and end up as NaN.
# Other keys such as 'cx_logd' and 'cx_logp' can be added to the list in the same way.
props = mol_df['molecule_properties']
for key in ['qed_weighted', 'cx_most_apka', 'cx_most_bpka', 'hba', 'hbd', 'psa']:
    mol_df[key] = props[props.notnull()].apply(lambda d, key=key: d[key])
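
If you do want to start from the .csv in the question, something along these lines might work. This is only a rough sketch under a few assumptions: the column is called 'Molecule ChEMBL ID' (adjust `column` and `delimiter` to your file), the names from the question map to the `molecule_properties` keys used above, and a chunk size of 50 IDs per request is an arbitrary choice of mine. The helper names (`read_chembl_ids`, `chunked`, `flatten`, `fetch_descriptors`) are made up for illustration and are not part of the ChEMBL client.

```python
import csv
import io

# Assumed mapping from the names in the question to the keys of the
# 'molecule_properties' dictionary (the same keys extracted above).
PROPERTY_KEYS = {
    'TPSA': 'psa',
    'NumHAcceptors': 'hba',
    'NumHDonors': 'hbd',
    'CX Acidic pKa': 'cx_most_apka',
    'CX Basic pKa': 'cx_most_bpka',
    'qed': 'qed_weighted',
}


def read_chembl_ids(handle, column='Molecule ChEMBL ID', delimiter=','):
    """Return the unique, non-empty IDs from one CSV column, in file order."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    ids = ((row.get(column) or '').strip() for row in reader)
    return list(dict.fromkeys(i for i in ids if i))


def chunked(items, size):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def flatten(record):
    """Turn one molecule record into a flat row; missing values become None."""
    props = record.get('molecule_properties') or {}
    row = {'molecule_chembl_id': record.get('molecule_chembl_id')}
    for name, key in PROPERTY_KEYS.items():
        row[name] = props.get(key)
    return row


def fetch_descriptors(chembl_ids, chunk_size=50):
    """Query the molecule endpoint in chunks (needs network access)."""
    # Imported here so the helpers above work without the client installed.
    from chembl_webresource_client.new_client import new_client
    rows = []
    for chunk in chunked(chembl_ids, chunk_size):
        molecules = new_client.molecule.filter(
            molecule_chembl_id__in=chunk
        ).only(['molecule_chembl_id', 'molecule_properties'])
        rows.extend(flatten(m) for m in molecules)
    return rows


# Offline demo with an in-memory CSV (no network needed).
demo_csv = io.StringIO('Molecule ChEMBL ID\nCHEMBL25\nCHEMBL25\nCHEMBL1\n')
demo_ids = read_chembl_ids(demo_csv)  # duplicates are dropped
```

With a real file you would open it with `open(path, newline='')`, pass the handle to `read_chembl_ids`, and then build the table with `pd.DataFrame(fetch_descriptors(ids))`. Querying in chunks keeps each request small; whether 50 is a good size for your data is something to check for yourself.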