I have two different data sets: one has the annual unemployment rate by state (listed under a single column), and the second has the minimum wage for each state. Both only have data for 2003-2020.
The problems are:
- They are in different data sets
- The X variable (minimum wage) spans 18 different columns, one per year
Questions:
- How can I run a regression using data from 2 different data sets?
- How can I regress on 18 columns without having to type minwage$`2003` + minwage$`2004` + . . . + minwage$`2020`?
I tried this, but again, it's very inefficient.
unemp_minwage <- lm(unemployment_03_20$`U-3` ~ minwage$`2003` + minwage$`2004` + minwage$`2005` + minwage$`2006` + minwage$`2007` + minwage$`2008` + minwage$`2009` + minwage$`2010` + minwage$`2011` + minwage$`2012` + minwage$`2013` + minwage$`2014` + minwage$`2015` + minwage$`2016` + minwage$`2017` + minwage$`2018` + minwage$`2019` + minwage$`2020`)
Not to mention I got this error code:
Error in model.frame.default(formula = unemployment_03_20$`U-3` ~ minwage$`2003` + :
  variable lengths differ (found for 'minwage$`2003`')
Then I tried just regressing on one year of minimum wage, but got a similar error.
Suggestions?
To get the exact formula in your question:
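A minimal sketch of one way to build that formula in base R, assuming the `minwage` columns are named exactly `2003` through `2020` as in the question. The helper names `years`, `rhs`, and `f` are made up for illustration.

```r
# Assumption: the predictor columns are the years 2003-2020.
years <- 2003:2020

# Build "minwage$`2003` + minwage$`2004` + ..." without typing each term.
rhs <- paste0("minwage$`", years, "`", collapse = " + ")

# Prepend the response and turn the string into a formula object.
f <- as.formula(paste("unemployment_03_20$`U-3` ~", rhs))
f
```

Note that passing `f` to `lm()` would still raise the "variable lengths differ" error unless the response and every predictor vector have the same length.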
So you can do something like this (for clarity):
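A sketch of the same idea broken into steps, using `reformulate()` from base R and a single data frame instead of `$` references. The `wide` data frame, its three year columns, and all of its values are invented purely for illustration.

```r
# Toy stand-in: the response plus three year columns (values are made up).
wide <- data.frame(
  `U-3`  = c(5.1, 6.3, 4.2, 7.0, 5.6),
  `2003` = c(5.15, 6.75, 5.15, 7.16, 6.15),
  `2004` = c(5.15, 6.75, 5.65, 7.16, 6.15),
  `2005` = c(5.15, 7.00, 6.15, 7.35, 6.40),
  check.names = FALSE
)

# Step 1: collect the predictor column names programmatically.
year_cols <- setdiff(names(wide), "U-3")

# Step 2: build the formula; non-syntactic names must be backquoted.
f <- reformulate(sprintf("`%s`", year_cols), response = "`U-3`")

# Step 3: fit the model, pointing lm() at the data frame.
fit <- lm(f, data = wide)
coef(fit)
```

Because every variable comes from the same data frame, the vectors cannot differ in length.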
I strongly suggest merging the data first so you don't inadvertently make an error, and then supplying lm() with that data (rather than individual vectors from multiple datasets).
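A minimal sketch of that merge in base R. The key columns `State` and `Year`, the `MinWage` name, and all values below are assumptions for illustration; substitute whatever your data sets actually use.

```r
# Toy stand-ins for the two data sets (values are made up).
minwage <- data.frame(
  State  = c("AL", "CA"),
  `2003` = c(5.15, 6.75),
  `2004` = c(5.15, 6.75),
  `2005` = c(5.15, 6.75),
  check.names = FALSE
)
unemployment_03_20 <- data.frame(
  State = rep(c("AL", "CA"), each = 3),
  Year  = rep(2003:2005, times = 2),
  `U-3` = c(5.8, 5.6, 4.5, 6.8, 6.2, 5.4),
  check.names = FALSE
)

# Reshape minimum wage from wide (one column per year) to long
# (one row per state-year), so it lines up with the unemployment data.
year_cols <- setdiff(names(minwage), "State")
minwage_long <- data.frame(
  State   = rep(minwage$State, times = length(year_cols)),
  Year    = rep(as.integer(year_cols), each = nrow(minwage)),
  MinWage = unlist(minwage[year_cols], use.names = FALSE)
)

# Merge on state and year, then regress using the single merged data frame.
merged <- merge(unemployment_03_20, minwage_long, by = c("State", "Year"))
fit <- lm(`U-3` ~ MinWage, data = merged)
coef(fit)
```

By default `merge()` keeps only rows that match in both data sets, so it is worth checking `nrow(merged)` against what you expect before fitting.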