Multiple Listings Issue

Data Cleaning
Kidney Allocation
R
Author

Molly White

Published

March 22, 2024

Some patients registered for kidney or other organ transplants are listed at more than one transplant center. Failing to account for these multiple listings can lead to incorrect results in analyses of waitlist outcomes, for example by counting the same period of waiting more than once. In this post, we go over the methods we used to handle multiple listings for our manuscript Association of Race and Ethnicity with Priority for Deceased Donor Kidney Transplant.

Note

All lowercase variables are ones we created; all uppercase variables already exist in the SRTR dataset. For privacy, we have anonymized ID numbers and dates and stored them in lowercase versions of the original variable names.

For all patients, we define the waitlist end date as the transplant date if there is one, otherwise the removal date, otherwise the later of the last active-status and last inactive-status dates (using whichever of the two is available):

library(dplyr)  # mutate(), case_when(), group_by(), filter(), lag(), lead()
library(tidyr)  # fill()

df_cand_kipa <- df_cand_kipa_all %>%
  mutate(waitlist_end_date = case_when(
    !is.na(REC_TX_DT) ~ REC_TX_DT,    # transplanted
    !is.na(CAN_REM_DT) ~ CAN_REM_DT,  # removed from the waitlist
    !is.na(CAN_LAST_INACT_STAT_DT) & CAN_LAST_INACT_STAT_DT > CAN_LAST_ACT_STAT_DT ~ CAN_LAST_INACT_STAT_DT,
    !is.na(CAN_LAST_ACT_STAT_DT) ~ CAN_LAST_ACT_STAT_DT,
    is.na(CAN_LAST_ACT_STAT_DT) & !is.na(CAN_LAST_INACT_STAT_DT) ~ CAN_LAST_INACT_STAT_DT,
    TRUE ~ CAN_LAST_ACT_STAT_DT
  ))
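To make the priority order concrete, here is a minimal base R sketch (no packages needed) that applies the same rule to a few hypothetical rows. The function name waitlist_end() and the toy dates are made up for illustration; the arguments stand in for REC_TX_DT, CAN_REM_DT, CAN_LAST_ACT_STAT_DT, and CAN_LAST_INACT_STAT_DT.

```r
# Hypothetical helper: transplant date > removal date > later status date
waitlist_end <- function(tx, rem, last_act, last_inact) {
  end <- last_act
  # Use the last inactive date when it is later than the last active date,
  # or when there is no last active date at all
  use_inact <- !is.na(last_inact) & (is.na(last_act) | last_inact > last_act)
  end[use_inact] <- last_inact[use_inact]
  # A removal date overrides status dates; a transplant date overrides everything
  end[!is.na(rem)] <- rem[!is.na(rem)]
  end[!is.na(tx)] <- tx[!is.na(tx)]
  end
}

# Toy rows (not SRTR data)
toy_dates <- data.frame(
  tx         = as.Date(c("2010-05-01", NA, NA, NA, NA)),
  rem        = as.Date(c(NA, "2011-02-01", NA, NA, NA)),
  last_act   = as.Date(c("2010-04-01", "2010-12-01", "2012-01-01", NA, "2014-01-01")),
  last_inact = as.Date(c(NA, NA, "2012-06-01", "2013-03-01", "2013-06-01"))
)
toy_dates$waitlist_end_date <- with(toy_dates, waitlist_end(tx, rem, last_act, last_inact))
print(toy_dates)
```

Each row exercises one branch: a transplant, a removal, a later inactive date, a missing active date, and an inactive date that is earlier than the active date.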

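For patients with only one listing, the minimum list date is simply the listing date, and wait time runs from that date to the waitlist end date. We also classify each listing's outcome as a deceased donor kidney transplant (DDKT), a living donor kidney transplant (LDKT), removal or death, or censored: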

single_registrations <- df_cand_kipa %>%
  group_by(PERS_ID) %>%
  mutate(num_list = n()) %>%
  filter(num_list == 1) %>%
  ungroup() %>%
  mutate(min_list_date = CAN_LISTING_DT,
         wait_time = waitlist_end_date - min_list_date,
         ## Donor type is checked first, so transplanted listings are never labeled "removed/died"
         outcome = case_when(
           DON_TY == "C" ~ "DDKT",
           DON_TY == "L" ~ "LDKT",
           !is.na(CAN_REM_CD) ~ "removed/died",
           TRUE ~ "censored"
         ))
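As a sanity check on the outcome categories, the sketch below applies the same precedence to hypothetical rows in base R: donor type is checked before the removal code, so a transplanted listing is labeled DDKT or LDKT even if it also carries a removal code. classify_outcome() is a hypothetical helper, and the removal codes are placeholders rather than real SRTR values.

```r
# Hypothetical helper mirroring the case_when() order above
classify_outcome <- function(don_ty, rem_cd) {
  ifelse(!is.na(don_ty) & don_ty == "C", "DDKT",        # deceased donor transplant
  ifelse(!is.na(don_ty) & don_ty == "L", "LDKT",        # living donor transplant
  ifelse(!is.na(rem_cd), "removed/died", "censored")))
}

# Toy single registrations (assumes R >= 4.0, so strings stay character)
toy_single <- data.frame(
  CAN_LISTING_DT    = as.Date(rep("2010-01-01", 4)),
  waitlist_end_date = as.Date(c("2010-03-02", "2010-01-31", "2011-01-01", "2010-12-31")),
  DON_TY            = c("C", "L", NA, NA),
  CAN_REM_CD        = c(1L, 1L, 2L, NA)   # placeholder codes, not real SRTR values
)
toy_single$min_list_date <- toy_single$CAN_LISTING_DT
# Wait time in days from the (only) listing date to the waitlist end date
toy_single$wait_time <- as.numeric(toy_single$waitlist_end_date - toy_single$min_list_date,
                                   units = "days")
toy_single$outcome <- with(toy_single, classify_outcome(DON_TY, CAN_REM_CD))
print(toy_single[, c("wait_time", "outcome")])
```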

In the SRTR dataset, two codes identify a patient: PERS_ID and PX_ID. PX_ID identifies a single waitlist registration, whereas PERS_ID identifies the person. So one PERS_ID can have several PX_IDs:

pers_id px_id
125259 267233
125259 286319
147634 265234
147634 26018
170660 35422
170660 69221
136502 209937
136502 108
4913 286454
4913 163548
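A minimal base R sketch of counting registrations per person, using the example IDs from the table above. The num_list column mirrors the group_by()/n() step used for single registrations; ave() is simply a package-free way to attach a per-person count to every row.

```r
# Example (anonymized) IDs from the table above
ids <- data.frame(
  pers_id = c(125259, 125259, 147634, 147634, 170660, 170660, 136502, 136502, 4913, 4913),
  px_id   = c(267233, 286319, 265234, 26018, 35422, 69221, 209937, 108, 286454, 163548)
)

# Number of registrations (PX_IDs) for each person, repeated on every row
ids$num_list <- ave(ids$px_id, ids$pers_id, FUN = length)

# Split into single and multiple registrations
single   <- ids[ids$num_list == 1, ]
multiple <- ids[ids$num_list > 1, ]
print(table(ids$pers_id))
```

Every person in this sample has two registrations, so all ten rows fall into the multiple-registration group.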

We classify multiple-listed candidates into two types: concurrent and sequential. Candidates listed at more than one center at the same time are concurrently listed; candidates listed at more than one center one after the other, with no overlap in time, are sequentially listed.

Our goal is to collapse each set of concurrent listings into a single observation while keeping sequential listings as separate observations. A single PERS_ID can therefore still have several observations, as long as those observations cover non-overlapping time on the waitlist.
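The overlap rule can be sketched in base R on hypothetical listings for two patients. The sketch assumes each patient's listings are sorted by listing date and, like a lag/lead comparison, only checks a listing against its immediate neighbors: a listing is concurrent if it starts before the previous listing ends or ends after the next listing starts. classify_listings() and the toy dates are illustrative only.

```r
# Hypothetical listings for two patients, sorted by listing date within patient
toy_lists <- data.frame(
  PERS_ID           = c(1, 1, 1, 2, 2),
  CAN_LISTING_DT    = as.Date(c("2005-01-01", "2005-06-01", "2008-01-01",
                                "2006-01-01", "2007-01-01")),
  waitlist_end_date = as.Date(c("2006-01-01", "2006-06-01", "2009-01-01",
                                "2006-12-01", "2008-01-01"))
)

# Label one patient's listings by comparing each with its neighbors
classify_listings <- function(start, end) {
  start <- as.numeric(start)         # days since 1970-01-01
  end <- as.numeric(end)
  n <- length(start)
  prev_end   <- c(NA, end[-n])       # end of the previous listing (like lag())
  next_start <- c(start[-1], NA)     # start of the next listing (like lead())
  concurrent <- (!is.na(prev_end) & start < prev_end) |
                (!is.na(next_start) & end > next_start)
  ifelse(concurrent, "concurrent", "sequential")
}

# Apply the rule separately within each patient
toy_lists$list_type <- NA_character_
for (idx in split(seq_len(nrow(toy_lists)), toy_lists$PERS_ID)) {
  toy_lists$list_type[idx] <- classify_listings(toy_lists$CAN_LISTING_DT[idx],
                                                toy_lists$waitlist_end_date[idx])
}
print(toy_lists)
```

Patient 1's first two listings overlap and are labeled concurrent, while the third starts after both have ended and is sequential; patient 2's listings never overlap, so both are sequential.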

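For patients with more than one listing, a listing is labeled concurrent if it starts before the patient's previous listing ends, or ends after the patient's next listing starts; otherwise it is labeled sequential. We also count each patient's distinct transplant dates (num_tx) and fill in missing transplant dates within each patient: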

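## Patients with more than one listing, ordered by listing date within each patient
multiple_registrations <- df_cand_kipa %>%
  group_by(PERS_ID) %>%
  mutate(num_list = n()) %>%
  filter(num_list > 1) %>%
  arrange(CAN_LISTING_DT, .by_group = TRUE) %>%
  ## Concurrent: starts before the previous listing ends, or ends after the next one starts
  mutate(list_type = case_when(
    CAN_LISTING_DT < lag(waitlist_end_date) ~ "concurrent",
    waitlist_end_date > lead(CAN_LISTING_DT) ~ "concurrent",
    TRUE ~ "sequential")) %>%
  mutate(REC_TX_DT = as.Date(REC_TX_DT)) %>%
  ## Number of distinct transplant dates per patient
  mutate(num_tx = length(unique(na.omit(REC_TX_DT)))) %>%
  ## Fill missing transplant dates within each patient
  fill(REC_TX_DT, .direction = "downup")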

Some patients receive more than one transplant and therefore have more than one value of REC_TX_DT. To account for this, we sort by PERS_ID and waitlist end date and assign a transplant counter, transplant_num, that increases by one whenever the transplant date changes from one row to the next within the same PERS_ID. A second pass then carries the counter forward: if the row above belongs to the same PERS_ID and already has a counter greater than 1, the current row takes that value, so later rows belonging to a retransplant are not left at 1.

## Sort by patient, then by waitlist end date
multiple_registrations <- multiple_registrations[order(multiple_registrations$PERS_ID, multiple_registrations$waitlist_end_date), ]

## Retransplant counter, starting at 1 for every row
multiple_registrations$transplant_num <- 1

## If the transplant date changed from the previous row but PERS_ID stayed the same, counter + 1
for (i in seq_len(nrow(multiple_registrations))[-1]) {
  if (multiple_registrations$PERS_ID[i - 1] == multiple_registrations$PERS_ID[i] &&
      !is.na(multiple_registrations$REC_TX_DT[i]) &&
      !is.na(multiple_registrations$REC_TX_DT[i - 1]) &&
      multiple_registrations$REC_TX_DT[i - 1] != multiple_registrations$REC_TX_DT[i]) {
    multiple_registrations$transplant_num[i] <- multiple_registrations$transplant_num[i - 1] + 1
  }
}

## Carry counter values greater than 1 forward within the same PERS_ID
for (i in seq_len(nrow(multiple_registrations))[-1]) {
  if (multiple_registrations$PERS_ID[i - 1] == multiple_registrations$PERS_ID[i] &&
      multiple_registrations$transplant_num[i - 1] != multiple_registrations$transplant_num[i] &&
      multiple_registrations$transplant_num[i - 1] != 1) {
    multiple_registrations$transplant_num[i] <- multiple_registrations$transplant_num[i - 1]
  }
}
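To see how the two passes interact, the sketch below wraps the same loop logic in a hypothetical helper, retransplant_counter(), and runs it on toy rows already sorted by PERS_ID and waitlist end date. It skips the increment when either transplant date is missing.

```r
# Hypothetical helper reproducing the two-pass retransplant counter
retransplant_counter <- function(pers_id, tx_dt) {
  n <- length(pers_id)
  counter <- rep(1, n)
  # Pass 1: +1 whenever the transplant date changes within the same person
  for (i in seq_len(n)[-1]) {
    if (pers_id[i - 1] == pers_id[i] && !is.na(tx_dt[i]) &&
        !is.na(tx_dt[i - 1]) && tx_dt[i - 1] != tx_dt[i]) {
      counter[i] <- counter[i - 1] + 1
    }
  }
  # Pass 2: carry counter values greater than 1 forward within the same person
  for (i in seq_len(n)[-1]) {
    if (pers_id[i - 1] == pers_id[i] && counter[i - 1] != counter[i] &&
        counter[i - 1] != 1) {
      counter[i] <- counter[i - 1]
    }
  }
  counter
}

# Toy rows: person 1 has two transplant dates, person 2 has one
toy_tx <- data.frame(
  PERS_ID   = c(1, 1, 1, 1, 2, 2),
  REC_TX_DT = as.Date(c("2003-06-05", "2003-06-05", "2004-07-24", "2004-07-24",
                        "2010-01-01", "2010-01-01"))
)
toy_tx$transplant_num <- retransplant_counter(toy_tx$PERS_ID, toy_tx$REC_TX_DT)
print(toy_tx)
```

Person 1's transplant date changes at the third row, so pass 1 gives that row a 2 and pass 2 copies the 2 onto the fourth row; person 2 keeps a single transplant date and stays at 1.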


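Two clean-up steps follow. First, we set the counter to 0 for sequential listings, since these are kept as separate observations rather than collapsed. Second, filling transplant dates can occasionally assign the wrong transplant date to an earlier listing. For consecutive concurrent listings within the same PERS_ID, we correct this by replacing the earlier transplant date with the later one. We then separate out the sequential listings, which get their wait time and outcome exactly as single registrations do, and compute the minimum list date for each patient and transplant number.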

## Set the counter to 0 for sequential listings
multiple_registrations$transplant_num[multiple_registrations$list_type == "sequential"] <- 0

## Unusual cases where filling the data leads to an incorrect transplant date:
## for consecutive concurrent listings of the same patient, take the later transplant date
for (i in seq_len(max(nrow(multiple_registrations) - 1, 0))) {
  if (multiple_registrations$PERS_ID[i] == multiple_registrations$PERS_ID[i + 1] &&
      multiple_registrations$list_type[i] == "concurrent" &&
      multiple_registrations$list_type[i + 1] == "concurrent" &&
      !is.na(multiple_registrations$REC_TX_DT[i]) &&
      !is.na(multiple_registrations$REC_TX_DT[i + 1]) &&
      multiple_registrations$REC_TX_DT[i] < multiple_registrations$REC_TX_DT[i + 1]) {
    multiple_registrations$REC_TX_DT[i] <- multiple_registrations$REC_TX_DT[i + 1]
  }
}

## Sequential listings keep their own listing date, wait time, and outcome
sequential_lists <- multiple_registrations %>%
  filter(list_type == "sequential") %>%
  mutate(min_list_date = CAN_LISTING_DT,
         wait_time = waitlist_end_date - min_list_date,
         outcome = case_when(
           DON_TY == "C" ~ "DDKT",
           DON_TY == "L" ~ "LDKT",
           !is.na(CAN_REM_CD) ~ "removed/died",
           TRUE ~ "censored"
         ))

## How many transplant counter values we need to loop over
max_retransplants <- max(multiple_registrations$transplant_num)

## Minimum list date for each patient and transplant number
multiple_registrations <- multiple_registrations %>%
  group_by(PERS_ID, transplant_num) %>%
  mutate(min_list_date = min(CAN_LISTING_DT, na.rm = TRUE),
         wait_time = waitlist_end_date - min_list_date)
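The transplant-date correction can be illustrated the same way. fix_tx_dates() is a hypothetical base R helper that mirrors the correction loop on toy rows: within a patient, a concurrent listing whose transplant date is earlier than that of the next concurrent listing takes the later date, while sequential rows and rows belonging to other patients are left alone.

```r
# Hypothetical helper mirroring the correction loop
fix_tx_dates <- function(pers_id, list_type, tx_dt) {
  n <- length(pers_id)
  for (i in seq_len(max(n - 1, 0))) {
    if (pers_id[i] == pers_id[i + 1] &&
        list_type[i] == "concurrent" && list_type[i + 1] == "concurrent" &&
        !is.na(tx_dt[i]) && !is.na(tx_dt[i + 1]) &&
        tx_dt[i] < tx_dt[i + 1]) {
      tx_dt[i] <- tx_dt[i + 1]   # take the later transplant date
    }
  }
  tx_dt
}

# Toy rows (assumes R >= 4.0, so strings stay character)
toy_fix <- data.frame(
  PERS_ID   = c(1, 1, 1, 2, 2),
  list_type = c("concurrent", "concurrent", "sequential", "concurrent", "concurrent"),
  REC_TX_DT = as.Date(c("2003-06-05", "2003-06-07", "2005-01-01",
                        "2010-01-01", "2010-01-01"))
)
toy_fix$REC_TX_DT <- with(toy_fix, fix_tx_dates(PERS_ID, list_type, REC_TX_DT))
print(toy_fix)
```

Because the loop moves forward one row at a time, each date is compared only with the row directly below it.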

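To collapse the concurrent listings, we keep one observation for each patient at each transplant number. Within each group, we fill the transplant date, donor type, donor ID, and removal code upward so that the first row carries them, recompute wait time from the minimum list date (ending at the transplant date if there was one, otherwise at the last waitlist end date in the group), keep only the first row, and cap the last waitlist date at the transplant date.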

## Collapse concurrent listings separately for each transplant counter value
collapsed_list <- vector("list", max_retransplants)
for (i in seq_len(max_retransplants)) {
  collapsed_list[[i]] <- multiple_registrations %>%
    filter(list_type == "concurrent" & transplant_num == i) %>%
    mutate(DON_TY = ifelse(DON_TY == "", NA, DON_TY),
           last_wait_date = max(waitlist_end_date, na.rm = TRUE)) %>%
    ## Fill upward so the first row of each group carries the transplant information
    fill(REC_TX_DT, DON_TY, DONOR_ID, CAN_REM_CD, .direction = "up") %>%
    mutate(wait_time = case_when(
             ## transplant_num is always >= 1 here, so only the transplant date matters
             !is.na(REC_TX_DT) ~ REC_TX_DT - min_list_date,
             TRUE ~ last_wait_date - min_list_date),
           outcome = case_when(
             DON_TY == "C" ~ "DDKT",
             DON_TY == "L" ~ "LDKT",
             !is.na(CAN_REM_CD) ~ "removed/died",
             TRUE ~ "censored")) %>%
    select(-c(waitlist_end_date, CAN_LISTING_DT, CAN_REM_DT)) %>%
    ## Keep one row per patient per transplant number
    filter(row_number() == 1) %>%
    ## A transplant ends the waitlist period, so cap last_wait_date at the transplant date
    mutate(last_wait_date = case_when(
      REC_TX_DT < last_wait_date ~ REC_TX_DT,
      TRUE ~ last_wait_date))
}
collapsed_concurrent_registrations <- bind_rows(collapsed_list)

So this…

pers_id px_id num_list transplant_num list_type rec_tx_dt
129197 211951 4 1 concurrent 2001-11-11
129197 201029 4 2 concurrent 2002-01-19
129197 153197 4 2 concurrent 2002-02-01
248512 89145 6 1 concurrent 2003-06-05
248512 252951 6 1 concurrent 2003-06-05
248512 130133 6 1 concurrent 2003-06-07
248512 189125 6 2 concurrent 2004-07-24
248512 197677 6 2 concurrent 2004-07-25
248512 120614 6 2 concurrent 2004-07-26
27204 180595 4 1 concurrent 2003-06-22
27204 213683 4 1 concurrent 2003-06-25
27204 208840 4 1 concurrent 2004-08-21
27204 43006 4 2 concurrent 2006-05-06


Turns into this:

pers_id px_id rec_tx_dt num_list transplant_num list_type
27204 180595 2003-06-22 4 1 concurrent
27204 43006 2006-05-06 4 2 concurrent
129197 211951 2001-11-11 4 1 concurrent
129197 201029 2002-01-19 4 2 concurrent
248512 89145 2003-06-05 6 1 concurrent
248512 189125 2004-07-24 6 2 concurrent


In some cases, there are multiple rounds of concurrent listings. Here is an example of such an edge case, where a patient has 3 concurrent listings, receives a transplant, and is then relisted concurrently at several centers.

pers_id list_num list_type can_listing_dt waitlist_end_date num_tx rec_tx_dt
248512 1 concurrent 2000-01-01 2003-06-05 2 2003-06-05
248512 2 concurrent 2000-01-03 2003-06-05 2 2003-06-05
248512 3 concurrent 2002-07-10 2003-06-07 2 2003-06-07
248512 4 concurrent 2003-10-07 2004-07-24 2 2004-07-24
248512 5 concurrent 2003-10-08 2004-07-25 2 2004-07-25
248512 6 concurrent 2003-11-26 2004-07-26 2 2004-07-26


This patient's six listings are collapsed into two observations, one for the original time on the waitlist and one for the relisted period:

pers_id list_type can_listing_dt last_wait_date num_tx rec_tx_dt
248512 concurrent 2000-01-01 2003-06-07 2 2003-06-05
248512 concurrent 2003-10-07 2004-07-26 2 2004-07-24
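The core of the collapse, one row per patient and transplant number with the earliest listing date and the latest waitlist end date, can be sketched in base R using this edge case (listing and end dates from the listing table above, transplant numbers from the earlier table for PERS_ID 248512). This simplified sketch omits the upward fill of transplant and donor fields and the cap at the transplant date; it reproduces the listing dates and last waitlist dates shown in the collapsed table.

```r
# Edge-case listings for PERS_ID 248512, copied from the tables above
edge <- data.frame(
  PERS_ID           = rep(248512, 6),
  transplant_num    = c(1, 1, 1, 2, 2, 2),
  CAN_LISTING_DT    = as.Date(c("2000-01-01", "2000-01-03", "2002-07-10",
                                "2003-10-07", "2003-10-08", "2003-11-26")),
  waitlist_end_date = as.Date(c("2003-06-05", "2003-06-05", "2003-06-07",
                                "2004-07-24", "2004-07-25", "2004-07-26"))
)

# One group per patient and transplant number
groups <- split(edge, list(edge$PERS_ID, edge$transplant_num), drop = TRUE)

# Collapse each group to a single row: earliest listing, latest waitlist end date
collapsed <- do.call(rbind, lapply(groups, function(g) {
  data.frame(
    PERS_ID        = g$PERS_ID[1],
    transplant_num = g$transplant_num[1],
    min_list_date  = min(g$CAN_LISTING_DT),
    last_wait_date = max(g$waitlist_end_date)
  )
}))
collapsed <- collapsed[order(collapsed$PERS_ID, collapsed$transplant_num), ]
rownames(collapsed) <- NULL
print(collapsed)
```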


Finally, we recombine the collapsed concurrent registrations with the single registrations and sequential listings to form one dataset.
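The recombination step stacks data frames that do not share all columns; for example, the collapsed concurrent registrations have last_wait_date but no longer have CAN_LISTING_DT or waitlist_end_date. In a dplyr workflow, dplyr::bind_rows() fills missing columns with NA. The base R sketch below shows the same idea with a hypothetical helper, stack_frames(), on toy data (assuming R 4.0 or later); it takes each missing column's type from the first data frame that has it, so Date columns stay Dates.

```r
# Hypothetical helper: rbind data frames, filling missing columns with typed NAs
stack_frames <- function(...) {
  frames <- list(...)
  all_cols <- unique(unlist(lapply(frames, names)))
  # Zero-length prototype of each column, from the first frame that has it
  protos <- lapply(all_cols, function(col) {
    for (f in frames) if (col %in% names(f)) return(f[[col]][0])
  })
  names(protos) <- all_cols
  padded <- lapply(frames, function(f) {
    for (col in setdiff(all_cols, names(f))) {
      f[[col]] <- protos[[col]][rep(NA_integer_, nrow(f))]  # NA of the right type
    }
    f[all_cols]
  })
  out <- do.call(rbind, padded)
  rownames(out) <- NULL
  out
}

# Toy stand-ins for the three pieces (not real data)
single <- data.frame(PERS_ID = 1, min_list_date = as.Date("2010-01-01"), outcome = "DDKT")
sequential <- data.frame(PERS_ID = c(2, 2),
                         min_list_date = as.Date(c("2011-01-01", "2013-01-01")),
                         outcome = c("removed/died", "censored"),
                         list_type = "sequential")
concurrent <- data.frame(PERS_ID = 3, transplant_num = 1,
                         min_list_date = as.Date("2012-05-01"), outcome = "LDKT",
                         list_type = "concurrent",
                         last_wait_date = as.Date("2014-02-01"))

analysis <- stack_frames(single, sequential, concurrent)
print(analysis)
```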