How to compare daily active data to cumulative data?

I am working on COVID-19 data for the state of Texas, USA, and have been given two hypotheses to test:

  1. A higher hospitalization rate gives a higher fatality rate
  2. A higher ICU rate gives a higher fatality rate.

Fatality Data - https://dshs.texas.gov/coronavirus/TexasCOVID19DailyCountyFatalityCountData.xlsx

Hospitalization / ICU Data - https://dshs.texas.gov/coronavirus/CombinedHospitalDataoverTimebyTSA.xlsx

So the basic approach to testing these hypotheses should be to compare cumulative or per-day fatality data against cumulative or per-day hospitalization/ICU data.

The main issue is that the fatality data is given as a cumulative sum (cumsum), while the hospitalization/ICU data is the active number per day. Is there any way these two can be compared? If yes, how? Or is there anything else we can do about it?

There is 1 answer

Leonard On

Cumulative data is the cumsum of per-day data and, conversely, per-day data is the difference (diff) of cumulative data.

I assume the cumulative fatality count is accumulated per day, so you can extract the per-day number of fatalities by differencing (e.g. np.diff). This way, every series will be a daily number. Note that in this case you will end up with one fewer point, because the first day has no previous value to subtract.
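As a minimal sketch of the differencing step, assuming the cumulative counts are already loaded into an array (the numbers below are made up for illustration, not taken from the DSHS file):

```python
import numpy as np

# Hypothetical cumulative fatality counts for five consecutive days
cumulative = np.array([2, 5, 5, 9, 14])

# np.diff returns one fewer value than its input:
# each entry is the change from one day to the next
daily = np.diff(cumulative)

# prepend=0 keeps the original length, under the assumption
# that the count was 0 before the first recorded day
daily_full = np.diff(cumulative, prepend=0)

print(daily)
print(daily_full)
```

Whether prepending 0 is appropriate depends on whether the series really starts from zero; if it does not, dropping the first day from the other series is the safer choice.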

You can also accumulate the hospitalization or ICU numbers with cumsum and compare them with the cumulative number of fatalities.
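A small sketch of the accumulation route, again with made-up numbers. One caveat worth keeping in mind: since the hospital figures are active counts, a patient who stays several days is counted on each of those days, so the running total is closer to patient-days than to a count of distinct admissions.

```python
import numpy as np

# Hypothetical daily active hospitalization counts (not real DSHS values)
active = np.array([10, 12, 11, 15])

# Running total of the active counts (roughly patient-days)
running_total = np.cumsum(active)

# Differencing the running total recovers the daily series,
# which shows that cumsum and diff undo each other
recovered = np.diff(running_total, prepend=0)

print(running_total)
print(recovered)
```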