How to do a loop merging multiple datasets in Stata?

1k views Asked by At

I am trying to create two lists of files and create two new datasets that merges all those files. To do So I was trying the following:

*** SET FOLDER PATHS ***********************************************************
    global projectFolder  "C:\Users\XXX"
    global codeFolder     "${projectFolder}\code"
    global databaseFolder "${projectFolder}\data"
    global rawFolder      "${databaseFolder}\raw"
    global outputsFolder  "${databaseFolder}\output"
    
*** CREATING VECTORS WITH FILE NAMES *******************************************
global file_all        dir "$outputsFolder" files "*.dta"
di `$file_all' 

global file_monthly    dir "$outputsFolder" files "*_monthly.dta"
di `$file_monthly' 

global file_yearly :  list global file_all - global file_monthly
di `$file_yearly' 

I found a few problems. First, I was not able to create the list of files, and second, I didn't find a way to create this loop without merging twice the first dataset.

*** MERGING YEARLY OUTCOMES ****************************************************
use "$outputsFolder\first_dataset.dta", clear


    foreach file in `file_yearly' {
         merge 1:1 muni_code year using `file', nogen
    }
1

There are 1 answers

0
langtang On

Within your foreach loop over the files, you can conditionally load/use if the first file (in the example below, it requires knowing the name of the "first" file), else merge, like this:

local files: dir "." files "yearly*.dta"
foreach f of local files {
    if "`f'" == "yearly_1.dta" use `f'
    else merge 1:1 year muni using `f', nogen
}
list, clean

Output:

       year   muni       val1       val2       val3  
  1.   2001      1   .3132002   .1924075   .8190824  
  2.   2002      2   .5559791   .1951401   .4882096  
  3.   2003      3   .9382851   .9509598   .2704866  
  4.   2004      4   .7363221   .2904454   .5859706 

Input:

set seed 123
forvalues i = 1/3 {
    clear
    set obs 4
    gen year = 2000 + _n
    gen muni = _n
    gen val`i' = runiform()
    save yearly_`i', replace
}