Nested MATLAB struct is difficult to parse when fetched in DataJoint Python


From DataJoint Python and DataJoint MATLAB, I have inserted the same values into a longblob attribute. From DataJoint Python it was inserted as a dictionary and from DataJoint MATLAB it was inserted as a struct. The entry that was inserted with DataJoint MATLAB is a recarray when fetched in Python, which is expected. However, this recarray is difficult to parse since there are nested values.

Inserted with DataJoint Python, fetched with DataJoint Python:

{'cat_gt': {'use_cat_gt': 1,
  'cat_gt_params': {'apfilter': ['biquad', 2, 300, 0],
   'gfix': [0.4, 0.1, 0.02],
   'extras': ['prb_fld', 't_miss_ok', 'ap', 'gblcar', 'out_prb_fld']}},
 'process_cluster': 'tiger',
 'clustering_method': 'Kilosort2'}

Inserted with DataJoint MATLAB, fetched with DataJoint Python:

rec.array([[(rec.array([[(array([[1.]]), rec.array([[(MatCell([['biquad'],
                                             [2.0],
                                             [300.0],
                                             [0.0]], dtype=object), array([[0.4 ],
                                           [0.1 ],
                                           [0.02]]), MatCell([['prb_fld'],
                                             ['t_miss_ok'],
                                             ['ap'],
                                             ['gblcar'],
                                             ['out_prb_fld']], dtype='<U11'))     ]],
                                  dtype=[('apfilter', 'O'), ('gfix', 'O'), ('extras', 'O')]))]],
                      dtype=[('use_cat_gt', 'O'), ('cat_gt_params', 'O')]), array(['tiger'], dtype='<U5'), array(['Kilosort2'], dtype='<U9'))]],
          dtype=[('cat_gt', 'O'), ('process_cluster', 'O'), ('clustering_method', 'O')])

Using query.fetch(as_dict=True) did not seem to solve the issue:

[{'preprocess_paramset': rec.array([[(rec.array([[(array([[1.]]), rec.array([[(MatCell([['biquad'],
                                               [2.0],
                                               [300.0],
                                               [0.0]], dtype=object), array([[0.4 ], ...

I could write a recursive function to convert a recarray to a dictionary, but I am wondering whether DataJoint has a native method for fetching this entry and converting it to a dictionary?

Thanks!


There are 2 answers

Dimitri Yatsenko

This is expected behavior. MATLAB structs are not equivalent to lists of dictionaries in Python; they are closer to numpy.recarray. The as_dict flag of fetch applies to the structure of the fetch result (each row becomes a dictionary), not to the contents of a blob.
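To make the recarray comparison concrete, here is a minimal sketch of reaching nested values by field name. It uses only NumPy and builds a small hypothetical stand-in for the fetched blob (1x1 record arrays with object fields, as in the output shown in the question); the variable names are made up for illustration, and nothing here calls DataJoint itself.

```python
import numpy as np

# Hypothetical stand-in for a MATLAB struct fetched in Python:
# a 1x1 record array whose fields all have object dtype.
inner = np.empty((1, 1), dtype=[("use_cat_gt", "O")])
inner["use_cat_gt"][0, 0] = np.array([[1.0]])

outer = np.empty((1, 1), dtype=[("cat_gt", "O"), ("process_cluster", "O")])
outer["cat_gt"][0, 0] = inner.view(np.recarray)
outer["process_cluster"][0, 0] = np.array(["tiger"])
fetched = outer.view(np.recarray)

# Fields can be read by key or by attribute, but every level is wrapped
# in an array, so each step needs an extra index to unwrap it.
process_cluster = fetched["process_cluster"][0, 0][0]
use_cat_gt = fetched.cat_gt[0, 0].use_cat_gt[0, 0][0, 0]

print(process_cluster, use_cat_gt)
```

This works for a value or two, but the repeated [0, 0] unwrapping is what makes deeply nested blobs tedious to read by hand.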

One could write a function to convert nested recarrays to dictionaries. It's hard to make it work universally because MATLAB struct arrays and cell arrays are not mapped easily to native Python types.
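As a rough illustration of such a function, the sketch below recurses over field names rather than over scipy objects. It is not part of DataJoint; to_native and make_struct are hypothetical names, and the sample data is rebuilt with plain NumPy to mimic the structure in the question. It assumes that 1x1 arrays should be unwrapped to scalars and that column vectors should become flat lists, which will not suit every blob.

```python
import numpy as np


def to_native(value):
    """Recursively convert nested record arrays into dicts, lists and scalars."""
    if isinstance(value, np.ndarray):
        if value.dtype.names:
            # Struct array: one dict per element, keyed by field name.
            records = [
                {name: to_native(rec[name]) for name in value.dtype.names}
                for rec in value.ravel()
            ]
            return records[0] if len(records) == 1 else records
        if value.size == 1:
            # Assumption: a 1x1 wrapper holds a scalar (or a nested struct).
            return to_native(value.ravel()[0])
        if value.dtype == object:
            # Cell array: convert each element separately.
            return [to_native(item) for item in value.ravel()]
        # Plain numeric or string array: drop singleton dimensions.
        return np.squeeze(value).tolist()
    if isinstance(value, np.generic):
        return value.item()  # NumPy scalar -> Python scalar
    return value


def make_struct(**fields):
    """Hypothetical helper: build a 1x1 record array with object fields."""
    rec = np.empty((1, 1), dtype=[(name, "O") for name in fields])
    for name, val in fields.items():
        rec[name][0, 0] = val
    return rec.view(np.recarray)


params = make_struct(
    apfilter=np.array([["biquad"], [2.0], [300.0], [0.0]], dtype=object),
    gfix=np.array([[0.4], [0.1], [0.02]]),
    extras=np.array(
        [["prb_fld"], ["t_miss_ok"], ["ap"], ["gblcar"], ["out_prb_fld"]]
    ),
)
fetched = make_struct(
    cat_gt=make_struct(use_cat_gt=np.array([[1.0]]), cat_gt_params=params),
    process_cluster=np.array(["tiger"]),
    clustering_method=np.array(["Kilosort2"]),
)

result = to_native(fetched)
print(result)
```

The unwrapping rule is where this stops being universal: a one-element cell array comes back as a bare value rather than a one-item list, and numbers come back as floats because MATLAB stores them as doubles.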

sneakers-the-rat

I'm not sure whether what DataJoint returns is similar to what SciPy's MATLAB loading function returns, but here's some code, borrowed from this post <3, to at least get started cleaning up MATLAB structs/recarrays. It would be lovely to have clean round trips from DataJoint (the model definitions are identical in both languages, so it seems like what comes back from the storage format should be too, though I haven't looked at the internals), but in the meantime...

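import numpy as np
# SciPy >= 1.8; older releases expose it as scipy.io.matlab.mio5_params.mat_struct
from scipy.io.matlab import mat_struct


def clean_recarray(data:np.recarray) -> dict:
    '''
    Clean up a recarray into python lists, dictionaries, and
    numpy arrays rather than the sort-of hard to work with numpy record arrays.

    Credit to https://stackoverflow.com/a/29126361/13113166

    Note: as borrowed, the helpers below walk scipy mat_struct objects held
    in a dict, as returned by
    scipy.io.loadmat(..., struct_as_record=False, squeeze_me=True).
    A recarray fetched from DataJoint would need recursion over
    dtype.names instead.

    Args:
        data (:class:`numpy.recarray`): Array to be cleaned!
    Returns:
        dict
    '''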
    def _check_keys(d):
        '''
        checks if entries in dictionary are mat-objects. If yes
        todict is called to change them to nested dictionaries
        '''
        for key in d:
            if isinstance(d[key], mat_struct):
                d[key] = _todict(d[key])
            elif _has_struct(d[key]):
                d[key] = _tolist(d[key])
        return d

    def _has_struct(elem):
        """Determine if elem is an array and if any array item is a struct"""
        return isinstance(elem, np.ndarray) and any(isinstance(
                    e, mat_struct) for e in elem)

    def _todict(matobj):
        '''
        A recursive function which constructs from matobjects nested dictionaries
        '''
        d = {}
        for strg in matobj._fieldnames:
            elem = matobj.__dict__[strg]
            if isinstance(elem, mat_struct):
                d[strg] = _todict(elem)
            elif _has_struct(elem):
                d[strg] = _tolist(elem)
            else:
                d[strg] = elem
        return d

    def _tolist(ndarray):
        '''
        A recursive function which constructs lists from cellarrays
        (which are loaded as numpy ndarrays), recursing into the elements
        if they contain matobjects.
        '''
        elem_list = []
        for sub_elem in ndarray:
            if isinstance(sub_elem, mat_struct):
                elem_list.append(_todict(sub_elem))
            elif _has_struct(sub_elem):
                elem_list.append(_tolist(sub_elem))
            else:
                elem_list.append(sub_elem)
        return elem_list
    return _check_keys(data)