turning biographical data into panel data

82 views Asked by At

I have biographical data of more than 1600 people. The data includes their gender, birth year, hometowns, etc., as well as their career trajectories from the year they begun their work. I'm trying to turn this into a panel data, so that I have a grip of how their workplaces have changed since they have started their jobs. I have the following problems with this dataset:

1) How do I turn this into a panel dataset? The optimal format I want for each person(id) is:

  id gender hometown year job
1  1      1       NY 1990   3
1  1      1       NY 1991   3
1  1      1       NY 1992   3
1  1      1       NY 1993   3
1  1      1       NY 1994   5

2) How do I save information if the person had overlapping positions? For instance, the person can have job 3 and job 5 at the same time. I'm hoping later to only use the job that is higher than the other, but meanwhile I would like to save as much information as possible.

1

There are 1 answers

2
datawookie On BEST ANSWER

Okay, give this a try.

First select a subset of your data.

> (D = head(origin[, c("id", "name1", "gender", "job1", "job1s", "job1e",
            "job2", "job10")]))
  id                name1 gender job1 job1s job1e job2 job10
1  1 Abulaiti Abureduxiti      1 2305  1980  1991 2303    NA
2  2  Aisihaiti Kelimubai      1 2307  1972  1987 2307    NA
3  3          Ai Zhisheng      1 4509  1996  1997 1075 10103
4  4         An Pingsheng      1 3555  1975  1977 3561  2191
5  5            An Zhiwen      1 2063  1977  1979 1127  2507
6  6             An Ziwen      1 4509  1954  1966 4007  2517

Next we re-organise the data into what the format that I think you are after.

> library(reshape2)
> (D = melt(D, id.vars = c("id", "name1", "gender")))
   id                name1 gender variable value
1   1 Abulaiti Abureduxiti      1     job1  2305
2   2  Aisihaiti Kelimubai      1     job1  2307
3   3          Ai Zhisheng      1     job1  4509
4   4         An Pingsheng      1     job1  3555
5   5            An Zhiwen      1     job1  2063
6   6             An Ziwen      1     job1  4509
7   1 Abulaiti Abureduxiti      1    job1s  1980
8   2  Aisihaiti Kelimubai      1    job1s  1972
9   3          Ai Zhisheng      1    job1s  1996
10  4         An Pingsheng      1    job1s  1975
11  5            An Zhiwen      1    job1s  1977
12  6             An Ziwen      1    job1s  1954
13  1 Abulaiti Abureduxiti      1    job1e  1991
14  2  Aisihaiti Kelimubai      1    job1e  1987
15  3          Ai Zhisheng      1    job1e  1997
16  4         An Pingsheng      1    job1e  1977
17  5            An Zhiwen      1    job1e  1979
18  6             An Ziwen      1    job1e  1966
19  1 Abulaiti Abureduxiti      1     job2  2303
20  2  Aisihaiti Kelimubai      1     job2  2307
21  3          Ai Zhisheng      1     job2  1075
22  4         An Pingsheng      1     job2  3561
23  5            An Zhiwen      1     job2  1127
24  6             An Ziwen      1     job2  4007
25  1 Abulaiti Abureduxiti      1    job10    NA
26  2  Aisihaiti Kelimubai      1    job10    NA
27  3          Ai Zhisheng      1    job10 10103
28  4         An Pingsheng      1    job10  2191
29  5            An Zhiwen      1    job10  2507
30  6             An Ziwen      1    job10  2517

We can see that the job field is empty for a few of these records, so we exclude those.

> (D = D[complete.cases(D),])
   id                name1 gender variable value
1   1 Abulaiti Abureduxiti      1     job1  2305
2   2  Aisihaiti Kelimubai      1     job1  2307
3   3          Ai Zhisheng      1     job1  4509
4   4         An Pingsheng      1     job1  3555
5   5            An Zhiwen      1     job1  2063
6   6             An Ziwen      1     job1  4509
7   1 Abulaiti Abureduxiti      1    job1s  1980
8   2  Aisihaiti Kelimubai      1    job1s  1972
9   3          Ai Zhisheng      1    job1s  1996
10  4         An Pingsheng      1    job1s  1975
11  5            An Zhiwen      1    job1s  1977
12  6             An Ziwen      1    job1s  1954
13  1 Abulaiti Abureduxiti      1    job1e  1991
14  2  Aisihaiti Kelimubai      1    job1e  1987
15  3          Ai Zhisheng      1    job1e  1997
16  4         An Pingsheng      1    job1e  1977
17  5            An Zhiwen      1    job1e  1979
18  6             An Ziwen      1    job1e  1966
19  1 Abulaiti Abureduxiti      1     job2  2303
20  2  Aisihaiti Kelimubai      1     job2  2307
21  3          Ai Zhisheng      1     job2  1075
22  4         An Pingsheng      1     job2  3561
23  5            An Zhiwen      1     job2  1127
24  6             An Ziwen      1     job2  4007
27  3          Ai Zhisheng      1    job10 10103
28  4         An Pingsheng      1    job10  2191
29  5            An Zhiwen      1    job10  2507
30  6             An Ziwen      1    job10  2517

Sorting out overlapping positions is a secondary problem. If I know that the above is basically what you are after then we can address that next.