R Learner, Causal Inference and PCA in Production
The tricky part of being senior is having too many demands on your time and too little of it left for the core responsibilities of your role. I saw this decades ago with my PhD supervisor, one of the finest scientists, spending hours approving expenditures, writing budgets and grants, managing funds, and attending faculty and committee meetings. The business world isn’t much different.
The bright side of senior roles, however, is the occasional opportunity to learn from eager and talented team members. It’s a pleasure to sit in a meeting with one of our brightest minds, discussing aspects of causal inference, including the challenge of properly validating causal models when the counterfactual outcome is never observed.
In causal inference, we often want to estimate heterogeneous treatment effects (HTE): how a treatment or intervention’s effect varies across individuals or contexts. Classical approaches include the T-learner (fit separate outcome models for the treated and control groups, then subtract their predictions) and the S-learner (fit a single model with treatment as a feature). The R-learner is a more robust, flexible approach that combines the advantages of these methods: it first estimates the nuisance functions (the outcome model and the propensity score), then fits the treatment effect on the residuals, which makes it orthogonal to errors in those nuisance estimates, reducing bias and improving efficiency, especially with high-dimensional covariates. It can, however, be computationally challenging when the feature count is massive; PCA or regularisation can help.
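To make the R-learner concrete, here is a minimal sketch on synthetic data, assuming scikit-learn is available; the specific model choices (gradient boosting for the outcome, logistic regression for the propensity, ridge for the effect) are illustrative assumptions rather than recommendations.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)

# Synthetic data: the treatment effect varies with the first covariate.
n, p = 2000, 5
X = rng.normal(size=(n, p))
e_true = 1 / (1 + np.exp(-X[:, 0]))        # true propensity score
T = rng.binomial(1, e_true)                # treatment assignment
tau_true = 1 + 2 * X[:, 0]                 # true heterogeneous effect
Y = X[:, 1] + tau_true * T + rng.normal(size=n)

# Step 1: cross-fitted nuisance estimates m(x) = E[Y|X] and e(x) = P(T=1|X).
m_hat = cross_val_predict(GradientBoostingRegressor(), X, Y, cv=5)
e_hat = cross_val_predict(LogisticRegression(), X, T, cv=5,
                          method="predict_proba")[:, 1]
e_hat = np.clip(e_hat, 0.01, 0.99)         # guard against extreme propensities

# Step 2: residual-on-residual regression. Minimising the R-loss
#   sum_i ((Y_i - m_hat_i) - tau(X_i) * (T_i - e_hat_i))^2
# is equivalent to a weighted regression of a pseudo-outcome on X.
t_res = T - e_hat
pseudo = (Y - m_hat) / t_res
weights = t_res ** 2
tau_model = Ridge().fit(X, pseudo, sample_weight=weights)

print("Correlation of estimated and true effects:",
      np.corrcoef(tau_model.predict(X), tau_true)[0, 1])
```

The final ridge step is where the closing caveat bites: with a massive feature count, that effect regression becomes expensive and noisy, which is where dimensionality reduction or stronger regularisation enters.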
It got us thinking about PCA and how it can be productionized. Can we treat it like normalization: fit it on the training set and apply the same transform to the test set? I don’t think PCA works exactly like that, because while we can fix the number of components, the fitted eigenvectors depend on the training data, and refitting can flip their signs, reorder them, or rotate the learned subspace.
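A sketch of what that looks like, assuming scikit-learn as the tooling (the original names no library): once fitted, a PCA object is a fixed linear map that can be persisted and applied to new data exactly like a scaler, and the variability worried about above shows up only when the PCA is refitted on different data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 20))
X_test = rng.normal(size=(100, 20))

# Fit once on training data; from then on the projection is frozen.
pipe = make_pipeline(StandardScaler(), PCA(n_components=5))
Z_train = pipe.fit_transform(X_train)
Z_test = pipe.transform(X_test)   # the same mapping, applied like a scaler

# The caveat: refitting on different data yields a different mapping.
# Components can flip sign or reorder even when the spans are similar.
pca_a = PCA(n_components=5).fit(X_train[:250])
pca_b = PCA(n_components=5).fit(X_train[250:])
print(np.round(pca_a.components_ @ pca_b.components_.T, 2))  # not the identity
```

So the production pattern is the same as for a scaler: persist the fitted transformer (e.g. with joblib) and never refit at serving time. The instability is a retraining concern, not a serving one.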