I'm working with a dataset named company_data_with_beta_2, containing quarterly firm-level data, including financial metrics and firm characteristics. My objective is to use the Synth package in R to create a synthetic control group for a treatment effect study. The dataset includes a Treatment_Control column indicating treated (1) and control (0) firms and a time column as a pre/post-treatment indicator (1 for post-treatment, 0 for pre-treatment).
Predictors for Synthetic Control Group: "Firm_Size" "Fixed_Assets_Ratio" "Leverage" "Revenue_Growth"
Issue: When attempting to implement the Synthetic Control Method, I encounter an "unbalanced panel" error, stating: "Your panel, as described by unit.variable and time.variable, is unbalanced. Balance it and run again."
Steps Taken:
- Created a subset of the dataset (cds) and omitted all NAs.
cds <- subset(company_data_with_beta_2, select = c("gvkey", "firmid", "time", "datadate", "Days_Sales_Outstanding", "NAICS_3digits", "Firm_Size", "Fixed_Assets_Ratio", "Leverage", "Revenue_Growth","Treatment_Control"))
# Remove rows with NA values in any column
company_data_with_beta_2 <- na.omit(cds)
- Ensured each firm (firmid) has data for all time periods, attempting to balance the panel.
Create a table of firmid by time
firm_time_count <- table(company_data_with_beta_2$firmid, company_data_with_beta_2$time)
# Identify firmids that have data for all time periods
complete_firmids <- rownames(firm_time_count)[apply(firm_time_count, 1, function(x) all(x > 0))]
# Filter the dataset
company_data_balanced <- company_data_with_beta_2[company_data_with_beta_2$firmid %in% complete_firmids, ]
# Verify if the panel is balanced
firm_time_table_balanced <- table(company_data_balanced$firmid, company_data_balanced$time)
if(all(firm_time_table_balanced > 0)) {
print("The panel is now balanced.")
} else {
print("The panel is still unbalanced.")
}
Despite these steps, I continue to encounter the unbalanced panel error when executing the Synth package's dataprep function:
company_data_balanced <- as.data.frame(company_data_balanced)
dataprep.out <- dataprep(
foo = company_data_balanced,
predictors = c("Firm_Size", "Fixed_Assets_Ratio", "Leverage", "Revenue_Growth"),
predictors.op = "mean",
dependent = "Days_Sales_Outstanding",
unit.variable = "firmid",
time.variable = "time",
treatment.identifier = single_treated_unit,
controls.identifier = control_units_n,
time.predictors.prior = 0,
time.optimize.ssr = 0,
unit.names.variable = "gvkey",
time.plot = c(0, 1)
)
Additional Context: For testing purposes, I used a single treated unit and multiple control units.
# Get the list of gvkeys for treatment units
treatment_units <- unique(company_data_balanced$gvkey[company_data_balanced$Treatment_Control == 1])
treatment_units_n <- unique(company_data_balanced$firmid[company_data_balanced$Treatment_Control == 1])
# Select a single treated unit for testing
single_treated_unit <- treatment_units[1]
# Get the list of gvkeys for control units
control_units <- unique(company_data_balanced$gvkey[company_data_balanced$Treatment_Control == 0])
control_units_n <- unique(company_data_balanced$firmid[company_data_balanced$Treatment_Control == 0])
And I saved the dataset as a dataframe, following other posts on issues of the Synth Package.
Data Sample:
Here's a look at the first few rows of my dataset:
Output from head(company_data_balanced)
head(company_data_balanced,n = 200)
gvkey firmid time datadate Days_Sales_Outstanding NAICS_3digits Firm_Size Fixed_Assets_Ratio Leverage Revenue_Growth Treatment_Control
1 1045 1045 0 2017-12-31 60.27 481 9.269646 0.97 0.44 -0.03 1
2 1045 1045 0 2018-03-31 63.48 481 9.249657 0.95 0.41 -0.02 1
3 1045 1045 0 2018-06-30 60.85 481 9.362460 0.97 0.42 0.12 1
4 1045 1045 0 2018-09-30 68.52 481 9.355220 0.98 0.42 -0.01 1
5 1045 1045 0 2018-12-31 56.93 481 9.299998 1.00 0.48 -0.05 1
6 1045 1045 0 2019-03-31 64.70 481 9.267099 1.01 0.47 -0.03 1
7 1045 1045 0 2019-06-30 59.30 481 9.389323 1.00 0.48 0.13 1
8 1045 1045 0 2019-09-30 56.69 481 9.385218 1.01 0.48 0.00 1
9 1045 1045 0 2019-12-31 56.46 481 9.333708 1.04 0.48 -0.05 1
10 1045 1045 1 2020-12-31 121.61 481 8.301025 0.91 0.59 -0.64 1
11 1045 1045 1 2021-03-31 88.43 481 8.296048 0.81 0.64 0.00 1
12 1045 1045 1 2021-09-30 54.21 481 9.101529 0.81 0.62 1.24 1
13 1045 1045 1 2021-12-31 58.27 481 9.151333 0.84 0.63 0.05 1
14 1045 1045 1 2022-03-31 63.04 481 9.093694 0.83 0.62 -0.06 1
15 1045 1045 1 2022-06-30 49.90 481 9.504650 0.83 0.61 0.51 1
16 1045 1045 1 2022-09-30 53.98 481 9.507626 0.86 0.61 0.00 1
17 1045 1045 1 2022-12-31 59.17 481 9.487063 0.90 0.60 -0.02 1
18 1045 1045 1 2023-03-31 59.56 481 9.408289 0.88 0.57 -0.08 1
19 1045 1045 1 2023-06-30 53.83 481 9.550733 0.88 0.55 0.15 1
20 1045 1045 1 2023-09-30 54.71 481 9.509111 0.91 0.55 -0.04 1
21 1050 1050 0 2017-12-31 337.44 333 4.297870 0.10 0.25 -0.13 1
22 1050 1050 0 2019-12-31 279.36 333 4.493266 0.12 0.20 0.05 1
23 1050 1050 1 2020-12-31 277.48 333 4.417997 0.11 0.20 0.07 1
24 1050 1050 1 2021-12-31 292.47 333 4.538913 0.11 0.18 0.17 1
25 1050 1050 1 2022-12-31 260.53 333 4.757050 0.11 0.24 0.07 1
26 1075 1075 0 2017-09-30 130.32 221 7.076081 1.12 0.26 0.25 0
27 1075 1075 0 2017-12-31 145.41 221 6.632870 1.16 0.28 -0.36 0
28 1075 1075 0 2018-03-31 110.73 221 6.540617 1.14 0.25 -0.09 0
29 1075 1075 0 2018-06-30 118.89 221 6.881538 1.13 0.24 0.41 0
30 1075 1075 0 2018-12-31 127.31 221 6.628539 1.16 0.26 -0.22 0
31 1075 1075 0 2019-03-31 121.80 221 6.607366 1.14 0.28 -0.02 0
32 1075 1075 0 2019-09-30 108.71 221 7.082370 1.13 0.27 0.61 0
33 1075 1075 0 2019-12-31 140.61 221 6.507861 1.16 0.26 -0.44 0
34 1075 1075 1 2020-06-30 105.19 221 6.834744 1.14 0.33 0.39 0
35 1075 1075 1 2020-09-30 116.19 221 7.134493 1.12 0.34 0.35 0
36 1075 1075 1 2020-12-31 144.72 221 6.607948 1.14 0.33 -0.41 0
37 1075 1075 1 2021-03-31 127.20 221 6.546032 1.13 0.34 -0.06 0
38 1075 1075 1 2021-06-30 122.01 221 6.908004 1.10 0.33 0.44 0
39 1075 1075 1 2021-09-30 112.95 221 7.176449 1.09 0.34 0.31 0
40 1075 1075 1 2022-03-31 139.75 221 6.663811 1.10 0.36 -0.40 0
41 1075 1075 1 2022-06-30 136.54 221 6.967597 1.10 0.35 0.35 0
42 1075 1075 1 2022-09-30 143.83 221 7.292930 1.09 0.35 0.38 0
43 1075 1075 1 2022-12-31 155.30 221 6.917026 1.13 0.37 -0.31 0
44 1075 1075 1 2023-06-30 147.97 221 7.022603 1.11 0.39 0.11 0
45 1075 1075 1 2023-09-30 143.59 221 7.401084 1.10 0.38 0.46 0
46 1078 1078 0 2017-09-30 256.55 334 8.828934 0.20 0.32 0.03 1
47 1078 1078 0 2017-12-31 252.46 334 8.934455 0.20 0.36 0.11 1
48 1078 1078 0 2018-06-30 243.99 334 8.957639 0.22 0.29 0.02 1
49 1078 1078 0 2018-09-30 251.30 334 8.943245 0.22 0.27 -0.01 1
50 1078 1078 0 2018-12-31 243.58 334 8.957382 0.23 0.29 0.01 1
51 1078 1078 0 2019-03-31 258.92 334 8.927314 0.25 0.29 -0.03 1
52 1078 1078 0 2019-06-30 253.79 334 8.984568 0.25 0.29 0.06 1
53 1078 1078 0 2019-09-30 246.32 334 8.996652 0.25 0.27 0.01 1
54 1078 1078 0 2019-12-31 238.17 334 9.025696 0.26 0.26 0.03 1
55 1078 1078 0 2020-03-31 250.01 334 8.952347 0.25 0.25 -0.07 1
56 1078 1078 1 2020-09-30 232.90 334 9.088512 0.26 0.27 0.15 1
57 1078 1078 1 2020-12-31 218.77 334 9.278092 0.27 0.27 0.21 1
58 1078 1078 1 2021-03-31 212.80 334 9.254931 0.26 0.24 -0.02 1
59 1078 1078 1 2021-06-30 218.26 334 9.232395 0.26 0.24 -0.02 1
60 1078 1078 1 2021-09-30 213.93 334 9.299084 0.26 0.24 0.07 1
61 1078 1078 1 2022-03-31 220.29 334 9.383873 0.26 0.23 0.09 1
62 1078 1078 1 2022-06-30 233.42 334 9.328745 0.26 0.22 -0.05 1
63 1078 1078 1 2022-09-30 224.68 334 9.250522 0.27 0.21 -0.08 1
64 1078 1078 1 2023-03-31 225.43 334 9.184715 0.28 0.20 -0.06 1
65 1078 1078 1 2023-06-30 225.77 334 9.208138 0.29 0.19 0.02 1
66 1078 1078 1 2023-09-30 233.87 334 9.224539 0.29 0.20 0.02 1
67 1117 1117 0 2017-12-31 213.99 334 2.243047 0.22 0.00 -0.20 1
68 1117 1117 0 2018-12-31 195.59 334 2.367998 0.27 0.00 -0.20 1
69 1117 1117 0 2019-12-31 196.66 334 1.995652 0.42 0.08 -0.45 1
70 1117 1117 1 2020-12-31 223.64 334 2.356410 0.42 0.08 0.06 1
71 1117 1117 1 2021-12-31 233.96 334 2.552410 0.34 0.06 0.02 1
72 1161 1161 0 2017-06-30 194.71 334 7.048386 0.28 0.41 -0.02 1
73 1161 1161 0 2017-09-30 177.66 334 7.367709 0.27 0.38 0.38 1
74 1161 1161 0 2018-03-31 163.55 334 7.406711 0.26 0.31 0.04 1
75 1161 1161 0 2018-09-30 266.52 334 7.410347 0.24 0.27 0.00 1
76 1161 1161 0 2019-06-30 317.80 334 7.333676 0.28 0.24 -0.07 1
77 1161 1161 0 2019-09-30 282.31 334 7.496097 0.27 0.20 0.18 1
78 1161 1161 0 2019-12-31 322.44 334 7.662468 0.24 0.11 0.18 1
79 1161 1161 0 2020-03-31 349.06 334 7.487734 0.27 0.12 -0.16 1
80 1161 1161 1 2020-06-30 339.87 334 7.566311 0.25 0.11 0.08 1
81 1161 1161 1 2020-09-30 278.60 334 7.937732 0.24 0.08 0.45 1
82 1161 1161 1 2021-03-31 231.50 334 8.144679 0.19 0.05 0.23 1
83 1161 1161 1 2021-06-30 192.08 334 8.255828 0.18 0.05 0.12 1
84 1161 1161 1 2021-09-30 188.64 334 8.369389 0.18 0.02 0.12 1
85 1161 1161 1 2021-12-31 204.81 334 8.481773 0.18 0.03 0.12 1
86 1161 1161 1 2022-03-31 228.23 334 8.680502 0.04 0.03 0.22 1
87 1161 1161 1 2022-06-30 225.85 334 8.787220 0.05 0.04 0.11 1
88 1161 1161 1 2022-09-30 284.65 334 8.624252 0.05 0.04 -0.15 1
89 1161 1161 1 2022-12-31 269.11 334 8.630343 0.05 0.04 0.01 1
90 1161 1161 1 2023-03-31 275.61 334 8.585412 0.05 0.04 -0.04 1
And the structure, in case it helps:
str(company_data_balanced)
'data.frame': 10853 obs. of 11 variables:
$ gvkey : chr "1045" "1045" "1045" "1045" ...
$ firmid : num 1045 1045 1045 1045 1045 ...
$ time : num 0 0 0 0 0 0 0 0 0 1 ...
$ datadate : chr "2017-12-31" "2018-03-31" "2018-06-30" "2018-09-30" ...
$ Days_Sales_Outstanding: num 60.3 63.5 60.9 68.5 56.9 ...
$ NAICS_3digits : chr "481" "481" "481" "481" ...
$ Firm_Size : num 9.27 9.25 9.36 9.36 9.3 ...
$ Fixed_Assets_Ratio : num 0.97 0.95 0.97 0.98 1 1.01 1 1.01 1.04 0.91 ...
$ Leverage : num 0.44 0.41 0.42 0.42 0.48 0.47 0.48 0.48 0.48 0.59 ...
$ Revenue_Growth : num -0.03 -0.02 0.12 -0.01 -0.05 -0.03 0.13 0 -0.05 -0.64 ...
$ Treatment_Control : num 1 1 1 1 1 1 1 1 1 1 ...
- attr(*, "na.action")= 'omit' Named int [1:5905] 1 22 23 24 26 27 28 29 30 31 ...
..- attr(*, "names")= chr [1:5905] "1" "22" "23" "24" ...