I have biographical data on more than 1600 people. The data include their gender, birth year, hometown, etc., as well as their career trajectories from the year they began working. I'm trying to turn this into panel data so that I can see how their workplaces have changed since they started their jobs. I have the following problems with this dataset:
1) How do I turn this into a panel dataset? The format I would ideally like for each person (id) is:
id gender hometown year job
1 1 1 NY 1990 3
1 1 1 NY 1991 3
1 1 1 NY 1992 3
1 1 1 NY 1993 3
1 1 1 NY 1994 5
2) How do I preserve information if a person held overlapping positions? For instance, a person can hold job 3 and job 5 at the same time. Later I hope to use only the higher of the two jobs, but in the meantime I would like to keep as much information as possible.
Okay, give this a try.
First select a subset of your data.
Next we re-organise the data into the format that I think you are after.
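Since I don't know the exact layout of your raw data, here is only a minimal sketch of the reshaping step in Python with pandas. It assumes one row per position held, with hypothetical columns `start_year` and `end_year` marking each spell; the helper name `spells_to_panel` and the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical spell-level data: one row per position held (assumed layout).
spells = pd.DataFrame({
    "id": [1, 1],
    "gender": [1, 1],
    "hometown": ["NY", "NY"],
    "job": [3, 5],
    "start_year": [1990, 1994],
    "end_year": [1993, 1994],
})


def spells_to_panel(spells: pd.DataFrame) -> pd.DataFrame:
    """Expand each job spell into one row per year it covers."""
    out = spells.copy()
    # Build the list of years covered by each spell (end year inclusive).
    out["year"] = [
        list(range(start, end + 1))
        for start, end in zip(out["start_year"], out["end_year"])
    ]
    # explode() turns each list element into its own row.
    out = out.explode("year")
    out["year"] = out["year"].astype(int)
    out = out.drop(columns=["start_year", "end_year"])
    out = out.sort_values(["id", "year", "job"]).reset_index(drop=True)
    return out[["id", "gender", "hometown", "year", "job"]]


panel = spells_to_panel(spells)
print(panel)
```

If your data are stored differently (for example, one column per year), the reshaping call would change, but the target layout of one row per id and year stays the same.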
We can see that the job field is empty for a few of these records, so we exclude those.
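As a rough illustration of that filtering step in Python with pandas, assuming "empty" means either a missing value or a blank string in a column named `job` (the sample rows are invented):

```python
import pandas as pd

# Invented person-year records; two of them have no usable job value.
panel = pd.DataFrame({
    "id": [1, 1, 1],
    "year": [1990, 1991, 1992],
    "job": [3, None, ""],
})

# Treat both missing values and blank/whitespace-only strings as empty.
is_empty = panel["job"].isna() | (panel["job"].astype(str).str.strip() == "")

# Keep only the records that have a job recorded.
panel_clean = panel[~is_empty].reset_index(drop=True)
print(panel_clean)
```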
Sorting out overlapping positions is a secondary problem. If I know that the above is basically what you are after then we can address that next.
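In the meantime, one possible way to keep overlapping positions without losing information is to allow several rows per id and year, and to flag which of them is the highest. This is only a sketch in Python with pandas; it assumes that a larger job code means a higher position, and the column names `n_jobs` and `is_top_job` are made up here.

```python
import pandas as pd

# Invented example: person 1 holds jobs 3 and 5 at the same time in 1994.
panel = pd.DataFrame({
    "id": [1, 1, 1],
    "year": [1993, 1994, 1994],
    "job": [3, 3, 5],
})

by_person_year = panel.groupby(["id", "year"])["job"]

# Number of positions held in each person-year (more than 1 means overlap).
panel["n_jobs"] = by_person_year.transform("count")

# Flag the highest job in each person-year (assumes larger code = higher job).
panel["is_top_job"] = panel["job"] == by_person_year.transform("max")

# Later, when only the highest job is needed, filter on the flag.
top_only = (
    panel[panel["is_top_job"]]
    .drop(columns=["n_jobs", "is_top_job"])
    .reset_index(drop=True)
)
print(panel)
print(top_only)
```

Keeping the full long table and filtering only at analysis time means no overlapping position is discarded early.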