Shorter way to reference every previous time-series year of a variable using tsset?

93 views Asked by At

So say I have a dataset like this:

Country year poverty rate sales
Austria 1950 0.54 142
Austria 1951 0.32 12441
Austria 1952 0.32 12441
Bangladesh 1950 0.11 142123123
Bangladesh 1951 0.52 1234
Bangladesh 1952 0.32 12441
Sri Lanka 1950 0.95 4215
Sri Lanka 1951 0.21 142421
Sri Lanka 1952 0.32 12441

And I have a code like this: gen dummy=1 if year==1952&(Sales!=.&L1.Sales!=.&L2.Sales!=.)

It's easy when there are only 3 years, but say there are many more. Instead of copy pasting the L#.Sales!=. for each year, what's a faster way to do this?

1

There are 1 answers

0
JR96 On

So judging by your example data and sample dummy formula, your goal is to identify which countries have a full time series as of 1952 (i.e., a complete and balanced panel); please correct me if this is off base. Your example data always satisfies this condition, so I will add one country that violates this to show what the dummy is identifying.

clear 
input str10 country year poverty_rate Sales
"Austria"   1950    0.54    142
"Austria"   1951    0.32    12441
"Austria"   1952    0.32    12441
"Bangladesh"    1950    0.11    142123123
"Bangladesh"    1951    0.52    1234
"Bangladesh"    1952    0.32    12441
"Sri Lanka" 1950    0.95    4215
"Sri Lanka" 1951    0.21    142421
"Sri Lanka" 1952    0.32    12441
"Canada"    1950    0.95    4215
"Canada"    1951    0.21    .
"Canada"    1952    0.32    12441 
end

* TSSET SET ON COUNTRY (after making a country id) AND YEAR
egen country_id = group(country)
tsset country_id year

* Example dummy
gen dummy= 1 if year==1952&(Sales!=.&L1.Sales!=.&L2.Sales!=.)

     +-------------------------------------------------------------+
     |    country   year   povert~e       Sales   countr~d   dummy |
     |-------------------------------------------------------------|
  1. |    Austria   1950        .54         142          1       . |
  2. |    Austria   1951        .32       12441          1       . |
  3. |    Austria   1952        .32       12441          1       1 |
  4. | Bangladesh   1950        .11   1.421e+08          2       . |
  5. | Bangladesh   1951        .52        1234          2       . |
     |-------------------------------------------------------------|
  6. | Bangladesh   1952        .32       12441          2       1 |
  7. |     Canada   1950        .95        4215          3       . |
  8. |     Canada   1951        .21           .          3       . |
  9. |     Canada   1952        .32       12441          3       . |
 10. |  Sri Lanka   1950        .95        4215          4       . |
     |-------------------------------------------------------------|
 11. |  Sri Lanka   1951        .21      142421          4       . |
 12. |  Sri Lanka   1952        .32       12441          4       1 |
     +-------------------------------------------------------------+

. tabdisp country if dummy == 1, c(year)

-----------------------
   country |       year
-----------+-----------
   Austria |       1952
Bangladesh |       1952
 Sri Lanka |       1952
-----------------------

Because Canada is missing sales in 1951, it does not have dummy == 1.

Now let's look at what happens when we want to add more years. I will give Sri Lanka a missing sales year in addition to Canada. The general strategy will be to track the cumulative number of years, up to and including the current year, that had non-missing sales. Let's first make some example data:

* 10 year example Data
clear 
set seed 1234
input str10 country
"Austria"   
"Bangladesh"    
"Sri Lanka"
"Canada"    
end
egen country_id = group(country)

expand 10
bysort country: gen year = (1952 - _N ) + _n 
gen poverty_rate = runiform(country_id/10, 1)
gen Sales = rnormal(10000 * country_id/10,500)
replace Sales = . if inlist(country, "Canada", "Sri Lanka") & mod(year, country_id + 3) == 0
tsset country_id year

. list

     +------------------------------------------------------+
     |    country   countr~d   year   poverty~e       Sales |
     |------------------------------------------------------|
  1. |    Austria          1   1943   .95250845   1247.2748 |
  2. |    Austria          1   1944   .14700104   868.84461 |
  3. |    Austria          1   1945   .97688645   1183.0619 |
  4. |    Austria          1   1946   .95117353   518.76747 |
  5. |    Austria          1   1947   .26708305   1126.6462 |
     |------------------------------------------------------|
  6. |    Austria          1   1948   .95386004   2142.1245 |
  7. |    Austria          1   1949   .89428386   1161.1905 |
  8. |    Austria          1   1950   .94966985   797.04767 |
  9. |    Austria          1   1951   .18048327   689.44633 |
 10. |    Austria          1   1952   .77549004   1066.0907 |
     |------------------------------------------------------|
 11. | Bangladesh          2   1943   .95879865   3018.9243 |
 12. | Bangladesh          2   1944   .28973012   2582.4464 |
 13. | Bangladesh          2   1945   .58472512   2524.7572 |
 14. | Bangladesh          2   1946    .9810758   2164.6962 |
 15. | Bangladesh          2   1947   .30039802   2364.3507 |
     |------------------------------------------------------|
 16. | Bangladesh          2   1948   .81240204   2407.6086 |
 17. | Bangladesh          2   1949   .22868748   2441.4124 |
 18. | Bangladesh          2   1950   .25618875   1989.3041 |
 19. | Bangladesh          2   1951   .36814293   2479.9563 |
 20. | Bangladesh          2   1952   .72928052   2302.7052 |
     |------------------------------------------------------|
 21. |     Canada          3   1943   .44079668   2673.9421 |
 22. |     Canada          3   1944    .9912414           . |
 23. |     Canada          3   1945   .50897682    1960.373 |
 24. |     Canada          3   1946   .92788352   3217.6927 |
 25. |     Canada          3   1947   .35683626   3401.9663 |
     |------------------------------------------------------|
 26. |     Canada          3   1948   .76214979   3976.6976 |
 27. |     Canada          3   1949    .7398694   2350.2898 |
 28. |     Canada          3   1950   .31369335           . |
 29. |     Canada          3   1951   .44475408   3216.2126 |
 30. |     Canada          3   1952    .8668553   1833.1472 |
     |------------------------------------------------------|
 31. |  Sri Lanka          4   1943   .86803337   3617.7311 |
 32. |  Sri Lanka          4   1944   .40923508   4508.0392 |
 33. |  Sri Lanka          4   1945    .7448494   4093.6019 |
 34. |  Sri Lanka          4   1946   .79308608           . |
 35. |  Sri Lanka          4   1947   .72141991   4233.8767 |
     |------------------------------------------------------|
 36. |  Sri Lanka          4   1948    .6399412   4404.9189 |
 37. |  Sri Lanka          4   1949    .6140176   3711.7107 |
 38. |  Sri Lanka          4   1950   .84002398   3311.1258 |
 39. |  Sri Lanka          4   1951   .74770728    4021.116 |
 40. |  Sri Lanka          4   1952   .89887266   4581.6338 |
     +------------------------------------------------------+

Now we will identify missing sales, take the cumulative sum, and flag in 1952.

gen year_has_sales = !missing(Sales)
bysort country (year): gen years_with_sales = sum(year_has_sales)

by country: gen dummy = (year == 1952) & years_with_sales == _n
tabdisp country if dummy == 1, c(year)


-----------------------
   country |       year
-----------+-----------
   Austria |       1952
Bangladesh |       1952
-----------------------

As expected we see Austria and Bangladesh with dummy == 1. Note that I have assumed you have a balanced panel; the code above could be adapted to leverage the min and max year within a country. If you wanted to check a narrower window, say 5 years, you could do something like

gen years_with_sales_5 = years_with_sales - L5.years_with_sales

and check whether it is equal to 5.