I'm working on adding type annotations to an old code base that attempts to provide uniform access to data file formats, including multiple versions of the same format. Generally, the data is a straightforward binary block and the versions differ only in the header, so we have a structure File(Header, Data).
The file classes and the header classes form paired hierarchies: each newer version of a format is a subclass of the previous version, and its header class is likewise a subclass of the previous header class.
Here is an example hierarchy that I am trying to type annotate:
class MetaDataMixin:
    def __init__(self, metadata=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.metadata = {}
        if metadata:
            self.metadata.update(metadata)

class DataFile:
    def __init__(self, header=None, data=None):
        self.header = header or self.header_class()

class HeaderV1:
    magic_number = b'HDR'
    format_version = 1

class DataFileV1(DataFile):
    header_class = HeaderV1

class HeaderV2(HeaderV1, MetaDataMixin):
    format_version = 2

class DataFileV2(DataFileV1):
    header_class = HeaderV2
It would be nice for type checkers to be able to recognize that DataFileV1().header has type HeaderV1 and DataFileV2().header has type HeaderV2. A critical component is that, for backwards compatibility, DataFileV1 must be both an instantiable class and a superclass of DataFileV2.
I tried making DataFile generic on a Header variable that would annotate both the header_class and header variables. Once made concrete with HeaderV1, there doesn't seem to be a way to override with HeaderV2. Here's my attempt:
import typing as ty

class Header:
    pass

HdrT = ty.TypeVar('HdrT', bound=Header)

class MetaDataMixin:
    metadata: dict[str, str]

    def __init__(self, metadata=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.metadata = {}
        if metadata:
            self.metadata.update(metadata)

class DataFile(ty.Generic[HdrT]):
    header: HdrT
    header_class: type[HdrT]

    def __init__(self, header: HdrT | None = None, data: ty.Any = None):
        self.header = header or self.header_class()

class HeaderV1(Header):
    magic_number: bytes = b'HDR'
    format_version: int = 1

class DataFileV1(DataFile[HeaderV1]):
    header_class = HeaderV1

class HeaderV2(HeaderV1, MetaDataMixin):
    format_version = 2

class DataFileV2(DataFileV1, DataFile[HeaderV2]):
    header_class = HeaderV2

file1 = DataFileV1()
file2 = DataFileV2()
print(file2.header.format_version)

if ty.TYPE_CHECKING:
    # Shows HeaderV1, but I would like it to be HeaderV2
    reveal_type(file2.header)
else:
    # Acts like HeaderV2
    print(file2.header.metadata)
I can manually set header: HeaderV2 in DataFileV2, but I'd hoped to eliminate that additional boilerplate by using type variables.
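For reference, the manual approach mentioned above might look roughly like the following. This is a minimal sketch rather than the real code base: the generic base class is dropped, the headers are simplified, and every file class redeclares both header and header_class by hand.

```python
import typing as ty


class HeaderV1:
    format_version: int = 1


class HeaderV2(HeaderV1):
    format_version = 2

    def __init__(self) -> None:
        # Simplified stand-in for MetaDataMixin
        self.metadata: dict[str, str] = {}


class DataFileV1:
    header: HeaderV1
    header_class: type[HeaderV1] = HeaderV1

    def __init__(self, header: ty.Optional[HeaderV1] = None) -> None:
        self.header = header or self.header_class()


class DataFileV2(DataFileV1):
    # The boilerplate in question: both attributes are redeclared
    # with the narrower header type in every new version.
    header: HeaderV2
    header_class: type[HeaderV2] = HeaderV2


file1 = DataFileV1()
file2 = DataFileV2()
print(type(file1.header).__name__, type(file2.header).__name__)
```

Depending on the type checker and its settings, narrowing a mutable attribute in a subclass like this may itself be reported as an incompatible override, which is another reason I would prefer not to rely on it.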
Swapping the order of the superclasses in DataFileV2 breaks the method resolution order, so that's not an option. I had an additional thought of creating a HdrV1T = ty.TypeVar('HdrV1T', bound=HeaderV1) and making class DataFileV1(DataFile[HdrV1T]), but then the header annotations on a plain DataFileV1 become Any.
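A sketch of that second attempt, with the headers reduced to the bare minimum (the mixin is omitted, and the explicit annotation on file1 is only there to illustrate what a caller would have to write):

```python
import typing as ty


class Header:
    pass


class HeaderV1(Header):
    format_version: int = 1


class HeaderV2(HeaderV1):
    format_version = 2


HdrT = ty.TypeVar('HdrT', bound=Header)
HdrV1T = ty.TypeVar('HdrV1T', bound=HeaderV1)


class DataFile(ty.Generic[HdrT]):
    header: HdrT
    header_class: type[HdrT]

    def __init__(self, header: ty.Optional[HdrT] = None) -> None:
        self.header = header or self.header_class()


class DataFileV1(DataFile[HdrV1T]):
    # DataFileV1 stays generic so that subclasses can pick a header type.
    # A checker may object to this assignment, because HeaderV1 is only
    # one of the types that HdrV1T could stand for.
    header_class = HeaderV1


class DataFileV2(DataFileV1[HeaderV2]):
    header_class = HeaderV2


# Without the explicit annotation, the type parameter of a bare
# DataFileV1() is left unsolved, which is where Any comes from.
file1: DataFileV1[HeaderV1] = DataFileV1()
file2 = DataFileV2()
print(file1.header.format_version, file2.header.format_version)
```

At runtime this behaves as intended, and DataFileV2 gets the header type I want, but every direct use of DataFileV1 would need to be spelled DataFileV1[HeaderV1] to keep the annotation, which is not backwards compatible in spirit.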
While my question is specifically about how to retrofit typing onto an old structure, I would also be interested in how someone would design an API like this now, with typing as a first-class consideration.