I have been trying unsuccessfully to merge three Stata files that I originally imported from MS Excel, matching on a 6-character string ID code (e.g. n5fpeb). The IDs are stored as str6 variables in the Stata data files.
I have recorded some other variables that also identify each observation: a numeric participant ID and a school ID number, as each participant is a school pupil. My master dataset is in stacked format, because my data is longitudinal. When I attempt a 1:m merge (i.e. merge 1:m id using "C:\Users ... May.dta", generate(_merge1)), Stata returns the following error message: variable id does not uniquely identify observations in the master data.
I've read various guides, but can't figure out why the datasets won't merge. Could I be using the wrong command? Or perhaps the string variables, or multiple string variables, are confusing Stata? I'd like to learn how to cleanly add future observations to my master dataset.
The solution to your problem depends on what exactly you are trying to merge with your master dataset. From your description, I think your master data has each participant identified by numeric ID or alternatively by the string ID. Since you mention it is a longitudinal, stacked file, I would guess there is also a year variable (or some other time variable).
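One way to check which structure you actually have is to ask Stata which variables uniquely identify observations. This is only a minimal sketch: the file name master.dta and the time variable year are assumptions, so substitute your own.

```stata
* Hypothetical file name; replace with the path to your master dataset
use "master.dta", clear

* Summarise how many copies of each id appear in the stacked data
duplicates report id

* Check whether id and year together identify each row;
* isid exits with an error if they do not, and capture noisily
* shows that error without stopping a do-file
capture noisily isid id year
```

If duplicates report shows several copies of each id, the master data is at the participant-by-period level, which is why a merge that starts with 1: on id alone is rejected.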
If the dataset you are trying to merge in is more observations of the same data (same variables, just more years perhaps), look into the append command.
If the dataset you are trying to merge in is at the participant level, then William is right and you want m:1. This is because you have many observations of the same participant in your master file stacked on top of each other, whereas 1:m expects to find only one copy of each id in the master data.
If the dataset you are trying to merge in is at the participant-year level (i.e. is also longitudinal), then you want merge 1:1 id year ... (or whatever your time variable is). This will work only if each id and year pair appears exactly once (i.e. one record per participant per time period) in both your stack of observations and the file you are merging in. Be warned: if your data is not clean and some pair appears more than once, Stata will again report that the variables do not uniquely identify observations.
Hope this helps!
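For reference, the three cases above might look roughly like this. The file names and the time variable year are hypothetical, and the three commands are alternatives: pick the one that matches the file you are adding rather than running them all.

```stata
* Hypothetical file names throughout; replace with your own paths
use "master.dta", clear

* Case 1: the new file holds more rows of the same variables
append using "new_wave.dta"

* Case 2: the new file has one row per participant
merge m:1 id using "participants.dta", generate(_merge1)

* Case 3: the new file has one row per participant per period;
* look for repeated id-year pairs before merging
duplicates report id year
merge 1:1 id year using "followup.dta", generate(_merge2)
```

After either merge, tabulating the generated variable (e.g. tabulate _merge1) shows how many observations matched and how many came from only one of the files.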