I have two functions below. The first is called unlevered_beta_f, and the second is called industry_total_beta_f. The second function uses the polars crate in Rust to manipulate a DataFrame that is read from a CSV file. I want to create a new column using the first function, but I am not sure how to do it.
pub fn unlevered_beta_f(
    levered_beta: f32,
    de_ratio: f32,
    marginal_tax_rate: Option<f32>,
    effective_tax_rate: f32,
    cash_firm_value: f32,
) -> Option<f32> {
    // Do you want to use marginal or effective tax rates in unlevering betas?
    // If marginal tax rate, enter the marginal tax rate to use
    let tax_rate = tax_rate_f(marginal_tax_rate, effective_tax_rate);
    let mut unlevered_beta = levered_beta / (1.0 + (1.0 - tax_rate) * de_ratio);
    unlevered_beta = unlevered_beta / (1.0 - cash_firm_value);
    return Some(unlevered_beta);
}
pub fn industry_total_beta_f(raw_data: DataFrame) -> DataFrame {
    let df = raw_data
        .clone()
        .lazy()
        .with_columns([unlevered_beta_f(
            col("Average of Beta"),
            col("Sum of Total Debt incl leases (in US $)") / col("Sum of Market Cap (in US $)"),
            marginal_tax_rate = marginal_tax_rate,
            col("Average of Effective Tax Rate"),
            col("Sum of Cash") / col("Sum of Firm Value (in US $)"),
        )
        .alias("Average Unlevered Beta")])
        .with_columns([
            (col("Average Unlevered Beta") / col("Average of Correlation with market"))
                .alias("Total Unlevered Beta"),
            (col("Average of Beta") / col("Average of Correlation with market"))
                .alias("Total Levered Beta"),
        ])
        .select([
            col("Industry Name"),
            col("Number of firms"),
            col("Average Unlevered Beta"),
            col("Average of Beta"),
            col("Average of Correlation with market"),
            col("Total Unlevered Beta"),
            col("Total Levered Beta"),
        ])
        .collect()
        .unwrap();
    return df;
}
I tried the code above, and everything works except for the following section:
        .with_columns([unlevered_beta_f(
            col("Average of Beta"),
            col("Sum of Total Debt incl leases (in US $)") / col("Sum of Market Cap (in US $)"),
            marginal_tax_rate = marginal_tax_rate,
            col("Average of Effective Tax Rate"),
            col("Sum of Cash") / col("Sum of Firm Value (in US $)"),
        )
        .alias("Average Unlevered Beta")])
I want to create a column called "Average Unlevered Beta", which takes the columns shown above as inputs, all obtained from a CSV file. In the other section of the code, I successfully created new columns, but I am not sure how to do it using a function.
A general remark: if you can make use of the polars Expression system, do that instead. It results in much more readable code, and is also slightly more performant for larger numbers of records (I did some quick benchmarks, see below).
If you can't (because, for example, the tax_rate_f function in your example is not expressible as a polars Expression), then you can apply a function to a subset of columns via as_struct in combination with map, as explained in another SO question. Note that I'm making use here of a third-party dependency, itertools, to easily iterate over multiple zipped iterators.
Based on the comments you included in your code, I assumed a very simple implementation of the tax_rate_f function. I then also implemented both unlevered_beta_f and tax_rate_f as polars Expression functions, to show the difference in complexity.
I benchmarked both approaches using the divan crate with 10M records. The approach using polars' Expression syntax is slightly faster. For smaller numbers of records, it's actually the other way around. I'm not familiar enough with the internals of polars to explain this observation. Do take these benchmarks with a grain of salt: the random DataFrame generation is part of the benchmark, but I assume the time spent on it is similar for both approaches.
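To make the row-wise idea concrete without any polars machinery, here is a minimal plain-Rust sketch. The tax_rate_f body is only my guess at a very simple implementation (use the marginal rate if one is given, otherwise the effective rate), and unlevered_beta_column is a hypothetical helper that zips column slices with the standard library's zip rather than itertools.

```rust
/// Assumed, very simple rule: prefer the marginal rate when it is provided.
pub fn tax_rate_f(marginal_tax_rate: Option<f32>, effective_tax_rate: f32) -> f32 {
    marginal_tax_rate.unwrap_or(effective_tax_rate)
}

/// Scalar unlevered beta, same arithmetic as in the question.
pub fn unlevered_beta_f(
    levered_beta: f32,
    de_ratio: f32,
    marginal_tax_rate: Option<f32>,
    effective_tax_rate: f32,
    cash_firm_value: f32,
) -> Option<f32> {
    let tax_rate = tax_rate_f(marginal_tax_rate, effective_tax_rate);
    let unlevered_beta = levered_beta / (1.0 + (1.0 - tax_rate) * de_ratio);
    Some(unlevered_beta / (1.0 - cash_firm_value))
}

/// Hypothetical helper: applies the scalar function row by row over zipped columns.
pub fn unlevered_beta_column(
    levered_betas: &[f32],
    de_ratios: &[f32],
    marginal_tax_rate: Option<f32>,
    effective_tax_rates: &[f32],
    cash_firm_values: &[f32],
) -> Vec<Option<f32>> {
    levered_betas
        .iter()
        .zip(de_ratios)
        .zip(effective_tax_rates)
        .zip(cash_firm_values)
        .map(|(((&beta, &de), &eff), &cash)| {
            unlevered_beta_f(beta, de, marginal_tax_rate, eff, cash)
        })
        .collect()
}
```

Inside polars, a closure passed to map would do the same per-row work on the fields of the struct column; the nested zip here plays the role that itertools plays in the approach described above.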