I am working on COVID-19 data for the state of Texas, USA. I have been given two hypotheses to test:
- A higher hospitalization rate gives a higher fatality rate
- A higher ICU rate gives a higher fatality rate.
Fatality Data - https://dshs.texas.gov/coronavirus/TexasCOVID19DailyCountyFatalityCountData.xlsx
Hospitalization / ICU Data - https://dshs.texas.gov/coronavirus/CombinedHospitalDataoverTimebyTSA.xlsx
So the basic approach to testing these hypotheses should be to compare cumulative or per-day fatality data against cumulative or per-day hospitalization/ICU data.
The main issue is that the fatality data is given as a cumulative total, while the hospitalization/ICU data is the number of active patients per day. Is there any way these two can be compared, and if so, how? Or is there anything else we can do about it?
Cumulative data is the cumulative sum (cumsum) of per-day data; conversely, per-day data is the first difference of cumulative data.
I assume the cumulative fatality count is updated once per day, so you can extract the per-day number of fatalities by differencing (e.g. `np.diff`). This way, every series will be a daily number. Note that you end up with one fewer point, because `np.diff` returns n-1 values for n inputs. You can also decide to accumulate the hospitalization or ICU data with cumsum and compare it with the cumulative number of fatalities. Keep in mind that, since those are active counts, their cumsum measures patient-days rather than total admissions.
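The differencing and accumulation steps described above can be sketched as follows. This is a minimal illustration, not a full analysis: the arrays hold made-up numbers rather than values from the DSHS spreadsheets, the variable names are hypothetical, and the Pearson correlation at the end is just one possible way to compare the two daily series.

```python
import numpy as np

# Hypothetical numbers for illustration only (not taken from the DSHS files).
cumulative_fatalities = np.array([0, 2, 5, 5, 9, 14])     # running total per day
active_hospitalized = np.array([10, 12, 15, 14, 18, 20])  # active patients per day

# Per-day fatalities: difference between consecutive cumulative values.
# np.diff returns n-1 values, so the first day has no daily count.
daily_fatalities = np.diff(cumulative_fatalities)

# Drop the first day of the hospitalization series so both series
# cover the same days.
hospitalized_aligned = active_hospitalized[1:]

# One possible comparison (an assumption, not the only option):
# Pearson correlation between the two daily series.
corr = np.corrcoef(daily_fatalities, hospitalized_aligned)[0, 1]

# Alternative direction: accumulate the active counts instead.
# The result is cumulative patient-days, not total admissions.
cumulative_patient_days = np.cumsum(active_hospitalized)

print(daily_fatalities)
print(cumulative_patient_days)
print(corr)
```

With real data, both series would first need to be brought to the same geographic level and date range before differencing, since the fatality file is by county and the hospital file is by TSA.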