So say I have a dataset like this:
Country | year | poverty rate | sales |
---|---|---|---|
Austria | 1950 | 0.54 | 142 |
Austria | 1951 | 0.32 | 12441 |
Austria | 1952 | 0.32 | 12441 |
Bangladesh | 1950 | 0.11 | 142123123 |
Bangladesh | 1951 | 0.52 | 1234 |
Bangladesh | 1952 | 0.32 | 12441 |
Sri Lanka | 1950 | 0.95 | 4215 |
Sri Lanka | 1951 | 0.21 | 142421 |
Sri Lanka | 1952 | 0.32 | 12441 |
And I have a code like this:
gen dummy=1 if year==1952&(Sales!=.&L1.Sales!=.&L2.Sales!=.)
It's easy when there are only 3 years, but say there are many more. Instead of copy pasting the L#.Sales!=. for each year, what's a faster way to do this?
So judging by your example data and sample
dummy
formula, your goal is to identify which countries have a full time series as of 1952 (i.e., a complete and balanced panel); please correct me if this is off base. Your example data always satisfies this condition, so I will add one country that violates this to show what the dummy is identifying.Because Canada is missing sales in 1951, it does not have
dummy == 1
.Now let's look at what happens when we want to add more years. I will give Sri Lanka a missing sales year in addition to Canada. The general strategy will be to track the cumulative number of years, up to and including the current year, that had non-missing sales. Let's first make some example data:
Now we will identify missing sales, take the cumulative sum, and flag in 1952.
As expected we see Austria and Bangladesh with
dummy == 1
. Note that I have assumed you have a balanced panel; the code above could be adapted to leverage themin
andmax
year within a country. If you wanted to check a narrower window, say 5 years, you could do something likeand check whether it is equal to 5.