Propensity Score Matching

Analysis
Kidney Allocation
R
Author

Molly H. White

Published

February 7, 2024

The purpose of propensity score matching is to identify observations where the covariates that determine assignment to a particular group have similar values across different groups. In our example, we want to identify recipients of single kidney transplants whose Expected Post Transplant Survival (EPTS) characteristics are similar to those of recipients of en bloc kidneys. By identifying such patients, we will be able to better compare post-transplant survival outcomes by accounting for other possible confounding factors. In this case, these covariates are age, dialysis, diabetes, and previous transplant.

Note that we did not match based on donor characteristics. This is because after donor kidneys are assigned a sequence, they are treated equally to other kidneys in that sequence, so donor characteristics have no additional influence on assignment to each patient. When propensity score matching it is important to match only on those covariates which impact assignment of the treatment to each group.

Here’s a look at our dataframe:

enbloc_under8 REC_AGE_AT_TX dialysis_duration no_dial rec_diabetes CAN_PREV_TX
1 0 46 1.7397260 0 0 0
34 0 64 0.0000000 1 1 0
47 0 25 0.0000000 1 0 1
74 0 51 0.0000000 1 1 0
81 0 66 2.7315068 0 1 0
103 0 43 1.8246575 0 0 0
106 0 67 0.4958904 0 1 0
108 0 51 0.0000000 1 0 0
109 0 43 0.6136986 0 0 0
111 0 57 0.9260274 0 0 0

In this example, we are going to do 1-to-1 matching, where for each recipient of an ebloc from a donor < 8kg, we are finding 1 recipient of a sequence A SKT with similar characteristics. This is the most common form of propensity score matching but not the only (see Austin 2011)

To perform propensity score matching, we use the MatchIt library in R. To ensure that the covariates are sufficiently similar, we include cross terms that allow the standardized mean differences of the covariates to be < 0.1.

match_obj = matchit(enbloc_under8 ~ REC_AGE_AT_TX + dialysis_duration + no_dial + rec_diabetes + CAN_PREV_TX + REC_AGE_AT_TX*no_dial + REC_AGE_AT_TX*rec_diabetes, data = seq_a_data_under, method = "nearest", distance ="glm",
  ratio = 1,
  replace = FALSE)

Model specifications:

To return a dataset from this matching process, use the match.data() function as follows:

match_data_under = match.data(match_obj)

To visualize the standardized mean differences in the covariates, use the love.plot() function from the cobalt library

love.plot(match_obj, stats = "m", binary = "std", threshold = .1, 
          var.order = "unadjusted", var.names = v)
Warning: No shared levels found between `names(values)` of the manual scale and the
data's fill values.

As we can see in the love plot, without matching, recipients of en blocs from donors < 8kg have significantly different values of EPTS characteristics than recipients of sequence A SKTs. By matching on these characteristics, we are able to identify patients who have similar values for these characteristics, leading to standardized mean differences that are all below 0.1.

References:

Austin PC. An Introduction to Propensity Score Methods for Reducing the Effects of Confounding in Observational Studies. Multivariate Behav Res. 2011;46(3):399-424. doi:10.1080/00273171.2011.568786