I have yearly longitudinal data with repeated measures for many of the subjects. I think I need multilevel modeling/regression to deal with the measurements for the same individual, which are sure to be correlated over time. The data are currently in separate tables, one per year.
I was wondering if there is something built into scikit-learn, like LinearRegression(), that can fit a multilevel regression where Level 1 is the individual yearly measurements and Level 2 is the subjects (one cluster per subject, containing that subject's measurements over time). And if so, is it better to have the longitudinal data laid out wide (where each subject's measures over time are all in one row) or stacked (where each measure for each year is its own row)?
Is there a way to do this?
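Estimation of random effects in multilevel models is non-trivial and, as far as I know, scikit-learn does not provide an estimator for it. You typically have to use either Bayesian inference methods or a dedicated mixed-model package.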
I would suggest you look into Bayesian inference packages such as pymc3 or brms (if you know R), where you can specify such a model. Alternatively, look at the lme4 package in R for a fully frequentist implementation of multilevel models.
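If you would rather stay in Python for the frequentist route, statsmodels has a mixed linear model. Note this is a different package from the ones named above, so take it as a minimal sketch rather than a recommendation. The data below is simulated, and the column names (subject, year, y) and all numbers are made up for illustration. It fits a random intercept per subject on stacked (long) data:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated long-format data (names and numbers are illustrative assumptions):
# one row per subject per year.
rng = np.random.default_rng(0)
n_subjects, n_years = 50, 5
subject = np.repeat(np.arange(n_subjects), n_years)
year = np.tile(np.arange(n_years), n_subjects)

# Each subject gets its own intercept shift, which is what makes
# measurements from the same subject correlated.
subject_effect = rng.normal(0.0, 2.0, n_subjects)
y = 10.0 + 1.5 * year + subject_effect[subject] + rng.normal(0.0, 1.0, subject.size)

long_df = pd.DataFrame({"subject": subject, "year": year, "y": y})

# Fixed effect for year, random intercept for each subject (the grouping level).
model = smf.mixedlm("y ~ year", data=long_df, groups=long_df["subject"])
result = model.fit()
print(result.summary())
```

A random slope for year could be added with re_formula="~year" if subjects are expected to differ in their trend and not only in their baseline.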
Also, I think you could get some inspiration from the "sleep-deprivation" dataset (sleepstudy in lme4), which is used as a textbook example of longitudinal data analysis (https://cran.r-project.org/web/packages/lme4/vignettes/lmer.pdf), p. 4.
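On the layout question: the lme4 example data is stored stacked, with one row per subject per day, and that long format is what mixed-model software generally expects. Here is a rough sketch of stacking per-year tables with pandas. The table contents and the column names (subject, score, year) are hypothetical:

```python
import pandas as pd

# Hypothetical per-year tables; subject "c" has no measurement in 2021.
tables = {
    2020: pd.DataFrame({"subject": ["a", "b", "c"], "score": [1.0, 2.0, 3.0]}),
    2021: pd.DataFrame({"subject": ["a", "b"], "score": [1.5, 2.5]}),
}

# Stack into long format: one row per subject per year,
# with the year recorded as its own column.
long_df = pd.concat(
    [df.assign(year=year) for year, df in tables.items()],
    ignore_index=True,
)

# Wide format for comparison: one row per subject, one column per year.
# Missing measurements show up as NaN here.
wide_df = long_df.pivot(index="subject", columns="year", values="score")

print(long_df)
print(wide_df)
```

In the long layout a subject with a missing year simply has fewer rows, whereas the wide layout has to carry the gap as NaN.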
To get started in pymc3 have a look here:
https://github.com/fonnesbeck/Bios8366/blob/master/notebooks/Section4_7-Multilevel-Modeling.ipynb