Multiple Listings Issue

Data Cleaning

Kidney Allocation

Author

Molly White

Published

March 22, 2024

Some patients registered for kidney or other organ transplants may be listed for a transplant at multiple centers. Failing to account for these multiple listings can lead to incorrect results in analyses of waitlist outcomes for these patients, for example by counting the same patient’s time on the waitlist more than once. In this post, we go over the methods we used to handle multiple listings for our manuscript “Association of Race and Ethnicity with Priority for Deceased Donor Kidney Transplant.”

Note

All lowercase variables are ones we created; all uppercase variables already exist in the SRTR dataset. For privacy, we have anonymized ID numbers and dates, all of which are stored in lowercase versions of the original variable names.

For all patients, we define the waitlist end date as the first available of the following: the date of transplant, the date of removal, or the later of the last date of active status and the last date of inactive status:

library(dplyr)
library(tidyr)  ## for fill()

df_cand_kipa <- df_cand_kipa_all %>%
  mutate(waitlist_end_date = case_when(
    !is.na(REC_TX_DT) ~ REC_TX_DT,    ## transplanted
    !is.na(CAN_REM_DT) ~ CAN_REM_DT,  ## removed from the waitlist
    ## last inactive status is more recent than last active status
    !is.na(CAN_LAST_INACT_STAT_DT) & CAN_LAST_INACT_STAT_DT > CAN_LAST_ACT_STAT_DT ~ CAN_LAST_INACT_STAT_DT,
    !is.na(CAN_LAST_ACT_STAT_DT) ~ CAN_LAST_ACT_STAT_DT,
    is.na(CAN_LAST_ACT_STAT_DT) & !is.na(CAN_LAST_INACT_STAT_DT) ~ CAN_LAST_INACT_STAT_DT,
    TRUE ~ CAN_LAST_ACT_STAT_DT
  ))
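To make the priority order concrete, here is a minimal base-R sketch that applies the same rule to four made-up registrations. The helper `pick_end_date` and all dates are hypothetical; the sketch only mirrors the `case_when()` above and is not part of our pipeline.

```r
# Hypothetical helper mirroring the case_when() priority order:
# transplant date > removal date > later of last inactive / last active status date.
pick_end_date <- function(tx, rem, last_act, last_inact) {
  end <- last_act                                   # default: last active date (may be NA)
  use_inact <- !is.na(last_inact) & (is.na(last_act) | last_inact > last_act)
  end[use_inact] <- last_inact[use_inact]           # inactive date wins if later, or if active is missing
  end[!is.na(rem)] <- rem[!is.na(rem)]              # removal overrides status dates
  end[!is.na(tx)] <- tx[!is.na(tx)]                 # transplant overrides everything
  end
}

# Made-up registrations, not SRTR data
toy <- data.frame(
  tx         = as.Date(c("2020-05-01", NA, NA, NA)),
  rem        = as.Date(c("2020-06-01", "2021-01-15", NA, NA)),
  last_act   = as.Date(c("2020-04-01", "2020-12-01", "2021-03-01", NA)),
  last_inact = as.Date(c(NA, NA, "2021-07-01", "2019-09-09"))
)
toy$waitlist_end_date <- with(toy, pick_end_date(tx, rem, last_act, last_inact))
print(toy$waitlist_end_date)
```

The four rows exercise, in order, the transplant, removal, later-inactive, and missing-active branches.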

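For patients who only have one listing, the minimum list date, wait time, and outcome are defined as follows:

min_list_date is their CAN_LISTING_DT
wait_time is the difference between their waitlist_end_date and their min_list_date
outcome is “DDKT” (deceased donor kidney transplant) if DON_TY is “C”, “LDKT” (living donor kidney transplant) if DON_TY is “L”, “removed/died” if there is a removal code (CAN_REM_CD), and “censored” otherwise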

single_registrations <- df_cand_kipa %>%
  group_by(PERS_ID) %>%
  mutate(num_list = n()) %>%
  filter(num_list == 1) %>%
  ungroup() %>% 
  mutate(min_list_date = CAN_LISTING_DT,
         wait_time = waitlist_end_date - min_list_date,
         outcome = case_when(
           DON_TY == "C" ~ "DDKT",
           DON_TY == "L" ~ "LDKT",
           is.na(CAN_REM_CD) == FALSE ~ "removed/died",
           TRUE ~ "censored"
         ))
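A worked example of these definitions on made-up data, written in base R so it runs without the SRTR files. The values of `CAN_REM_CD` below are placeholders, not real SRTR removal codes; only their presence or absence matters here.

```r
# Made-up single-registration patients (not SRTR data)
single <- data.frame(
  PERS_ID           = 1:4,
  CAN_LISTING_DT    = as.Date(c("2018-01-10", "2019-03-01", "2017-06-15", "2020-02-01")),
  waitlist_end_date = as.Date(c("2020-01-10", "2019-09-01", "2019-06-15", "2021-02-01")),
  DON_TY            = c("C", "L", NA, NA),
  CAN_REM_CD        = c(1, 1, 1, NA)   # placeholder codes
)

single$min_list_date <- single$CAN_LISTING_DT
# Wait time in days
single$wait_time <- as.numeric(single$waitlist_end_date - single$min_list_date)

# Same precedence as the case_when(): donor type first, then any removal code, else censored
single$outcome <- ifelse(!is.na(single$DON_TY) & single$DON_TY == "C", "DDKT",
                  ifelse(!is.na(single$DON_TY) & single$DON_TY == "L", "LDKT",
                  ifelse(!is.na(single$CAN_REM_CD), "removed/died", "censored")))
print(single[, c("PERS_ID", "wait_time", "outcome")])
```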

In the SRTR dataset, two codes identify a patient: PERS_ID and PX_ID. PX_ID identifies a single transplant registration, whereas PERS_ID identifies the person. So one PERS_ID can have several PX_ID values:

pers_id	px_id
125259	267233
125259	286319
147634	265234
147634	26018
170660	35422
170660	69221
136502	209937
136502	108
4913	286454
4913	163548
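Counting PX_ID values per PERS_ID is what separates single from multiple registrations. As a small check, this base-R sketch applies the same `num_list` logic as the `group_by(PERS_ID) %>% mutate(num_list = n())` step above to the anonymized pairs in the table, with `ave()` standing in for the grouped `n()`.

```r
# Anonymized PERS_ID / PX_ID pairs copied from the table above
ids <- data.frame(
  pers_id = c(125259, 125259, 147634, 147634, 170660, 170660, 136502, 136502, 4913, 4913),
  px_id   = c(267233, 286319, 265234, 26018, 35422, 69221, 209937, 108, 286454, 163548)
)

# Registrations per person, repeated on every row (like group_by() + mutate(n()))
ids$num_list <- ave(ids$px_id, ids$pers_id, FUN = length)

single_ids <- ids[ids$num_list == 1, ]   # would feed the single-registration branch
multi_ids  <- ids[ids$num_list > 1, ]    # would feed the multiple-registration branch
cat(nrow(single_ids), "single-listing rows;",
    length(unique(multi_ids$pers_id)), "people with multiple listings\n")
```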

There are two types of candidates that we classify as “multiple listed”: concurrent and sequential. Candidates listed at multiple centers at the same time are concurrently listed, and those listed at multiple centers one after another are sequentially listed.

Our goal is to consolidate instances of a patient being listed at multiple centers concurrently, but to treat sequential listings as separate observations. So, there may be multiple observations for one PERS_ID, as long as those observations represent non-overlapping time on the waitlist.

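To distinguish between the two types, we keep only patients with more than one registration, order each patient’s registrations by listing date, and label a registration as concurrent if it starts before the previous registration ends or ends after the next registration starts; all other registrations are sequential: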

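## Keep patients with more than one registration, ordered by listing date within patient
multiple_registrations <- df_cand_kipa %>%
  group_by(PERS_ID) %>%
  mutate(num_list = n()) %>%
  filter(num_list > 1) %>%
  arrange(CAN_LISTING_DT, .by_group = TRUE) %>%
  ## lag()/lead() compare each registration with its neighbours for the same PERS_ID
  mutate(list_type = case_when(
    CAN_LISTING_DT < lag(waitlist_end_date) ~ "concurrent",
    waitlist_end_date > lead(CAN_LISTING_DT) ~ "concurrent",
    TRUE ~ "sequential")) %>%
  mutate(REC_TX_DT = as.Date(REC_TX_DT)) %>%
  mutate(num_tx = length(unique(na.omit(REC_TX_DT)))) %>%
  fill(REC_TX_DT, .direction = 'downup') %>%
  ungroup()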
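The labeling rule can be checked on a toy patient. This base-R sketch uses made-up dates for four registrations already sorted by listing date; `prev_end` and `next_start` are base-R stand-ins for `lag()` and `lead()`.

```r
# Made-up registrations for one patient, sorted by listing date
regs <- data.frame(
  CAN_LISTING_DT    = as.Date(c("2010-01-01", "2010-03-01", "2012-05-01", "2015-01-01")),
  waitlist_end_date = as.Date(c("2011-06-01", "2011-02-01", "2014-01-01", "2016-01-01"))
)
n <- nrow(regs)
# Previous registration's end date and next registration's start date (NA at the edges)
prev_end   <- c(as.Date(NA), regs$waitlist_end_date[-n])
next_start <- c(regs$CAN_LISTING_DT[-1], as.Date(NA))

# Concurrent if it starts before the previous one ends, or ends after the next one starts;
# comparisons with NA count as FALSE, as they do in case_when()
is_conc <- (!is.na(prev_end) & regs$CAN_LISTING_DT < prev_end) |
           (!is.na(next_start) & regs$waitlist_end_date > next_start)
regs$list_type <- ifelse(is_conc, "concurrent", "sequential")
print(regs)
```

Like the rule above, this only compares each registration with its immediate neighbours in listing-date order.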

Some patients receive more than one transplant and therefore have more than one value of REC_TX_DT. To account for this, we sort each patient’s registrations by waitlist end date and add a transplant counter. The counter starts at 1, increases by one whenever the (filled) transplant date changes between consecutive rows with the same PERS_ID, and otherwise carries the previous row’s value forward.

## Sort by person, then by waitlist end date
multiple_registrations <- multiple_registrations[order(multiple_registrations$PERS_ID, multiple_registrations$waitlist_end_date), ]


## Retransplant counter, starting at 1 on every row
multiple_registrations$transplant_num <- 1


## Within the same PERS_ID, carry the previous row's counter forward,
## adding 1 whenever the (filled) transplant date changes
for(i in 2:nrow(multiple_registrations)) {
  if(multiple_registrations$PERS_ID[i-1] == multiple_registrations$PERS_ID[i]) {
    tx_changed <- !is.na(multiple_registrations$REC_TX_DT[i]) &&
      !is.na(multiple_registrations$REC_TX_DT[i-1]) &&
      multiple_registrations$REC_TX_DT[i-1] != multiple_registrations$REC_TX_DT[i]
    multiple_registrations$transplant_num[i] <- multiple_registrations$transplant_num[i-1] + tx_changed
  }
}
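The same counter can also be written without an explicit loop. In this base-R sketch (toy data; `count_changes` is a hypothetical helper), each row’s counter is 1 plus the number of transplant-date changes so far within that PERS_ID. Patient 7 has three distinct transplant dates and patient 8 has none.

```r
# Made-up rows sorted by waitlist_end_date within PERS_ID; REC_TX_DT is assumed to be
# already filled within each patient, as fill(.direction = "downup") would leave it
rx <- data.frame(
  PERS_ID   = c(7, 7, 7, 7, 7, 8, 8),
  REC_TX_DT = as.Date(c("2005-01-01", "2005-01-01", "2009-03-03", "2009-03-03",
                        "2014-07-07", NA, NA))
)

# Hypothetical helper: 1 + cumulative number of transplant-date changes within one patient
count_changes <- function(d) {
  n <- length(d)
  if (n < 2) return(rep(1, n))
  changed <- !is.na(d[-1]) & !is.na(d[-n]) & d[-1] != d[-n]
  1 + cumsum(c(FALSE, changed))
}

# as.numeric() so ave() does not coerce the result back to a Date
rx$transplant_num <- ave(as.numeric(rx$REC_TX_DT), rx$PERS_ID, FUN = count_changes)
print(rx)
```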

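Filling the data can sometimes lead to incorrect values, because fill(.direction = 'downup') copies a transplant date onto registrations that had none. In our case, we corrected these transplant dates by replacing the earlier value with the later transplant date for consecutive concurrent observations within the same PERS_ID. We also set the counter to 0 for sequential listings so that they are excluded when the concurrent listings are collapsed.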

## Set the counter to 0 for sequential listings
multiple_registrations$transplant_num[multiple_registrations$list_type == 'sequential'] <- 0


## Unusual cases when filling data leads to an incorrect transplant date
## Correct by taking the later transplant date
for(i in 1:(nrow(multiple_registrations)-1)) {
  if(multiple_registrations$PERS_ID[i] == multiple_registrations$PERS_ID[i+1] &
     multiple_registrations$list_type[i] == 'concurrent' & multiple_registrations$list_type[i+1] == 'concurrent' &
     !is.na(multiple_registrations$REC_TX_DT[i]) & !is.na(multiple_registrations$REC_TX_DT[i+1]) &
     multiple_registrations$REC_TX_DT[i] < multiple_registrations$REC_TX_DT[i+1] ) {
    
    multiple_registrations$REC_TX_DT[i] <- multiple_registrations$REC_TX_DT[i+1] 
     }
}

sequential_lists <- multiple_registrations %>%
  filter(list_type == "sequential") %>%
  mutate(min_list_date = CAN_LISTING_DT,
         wait_time = waitlist_end_date - min_list_date,
         outcome = case_when(
           DON_TY == "C" ~ "DDKT",
           DON_TY == "L" ~ "LDKT",
           is.na(CAN_REM_CD) == FALSE ~ "removed/died",
           TRUE ~ "censored"
         ))

## How many possible transplants do we need to account for
max_retransplants <- max(multiple_registrations$transplant_num)


## Minimum list date for each concurrent transplant
multiple_registrations <- multiple_registrations %>%
  group_by(PERS_ID, transplant_num) %>%
  mutate(min_list_date = min(CAN_LISTING_DT, na.rm=T),
         wait_time = waitlist_end_date - min_list_date)

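To collapse these concurrent listings, we want one observation for each patient at each transplant number. Within each PERS_ID and transplant_num group, we fill the transplant date, donor type, donor ID, and removal code upward so that the first row carries the group’s outcome, compute the wait time from min_list_date to the transplant date (or to the latest waitlist end date if there was no transplant), and keep only that first row.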

collapsed_concurrent_registrations <- NULL
for(i in seq_len(max_retransplants)) {
  
  collapsed_concurrent_registrations <- bind_rows(collapsed_concurrent_registrations,
    multiple_registrations %>%
      filter(list_type == "concurrent" & transplant_num == i) %>% ## Do it separately for each transplant counter number
      mutate(DON_TY = ifelse(DON_TY == "", NA, DON_TY),
             last_wait_date = max(waitlist_end_date, na.rm = TRUE)) %>%
      ## Pull the group's transplant and removal information up to its first row
      fill(REC_TX_DT, DON_TY, DONOR_ID, CAN_REM_CD, .direction = "up") %>%
      mutate(wait_time = case_when(
               !is.na(REC_TX_DT) ~ REC_TX_DT - min_list_date, ## transplanted: wait ends at transplant
               TRUE ~ last_wait_date - min_list_date),
             outcome = case_when(
               DON_TY == "C" ~ "DDKT",
               DON_TY == "L" ~ "LDKT",
               !is.na(CAN_REM_CD) ~ "removed/died",
               TRUE ~ "censored")) %>%
      select(-c(waitlist_end_date, CAN_LISTING_DT, CAN_REM_DT)) %>%
      filter(row_number() == 1) %>% ## one row per (PERS_ID, transplant_num)
      mutate(last_wait_date = case_when(
        REC_TX_DT < last_wait_date ~ REC_TX_DT,
        TRUE ~ last_wait_date)) %>%
      ungroup())
  
}
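A base-R sketch of the collapsing logic for one made-up patient, loosely modeled on the edge case below: two rounds of concurrent registrations, each reduced to one row with the earliest listing date, the transplant date, and the wait time in days. `collapse_group` is a hypothetical helper, rows are assumed to be sorted by waitlist_end_date within each group, and the dplyr code above remains the method we used.

```r
# Made-up concurrent registrations: two rounds (transplant_num 1 and 2),
# sorted by waitlist_end_date within each round
conc <- data.frame(
  PERS_ID           = 5,
  transplant_num    = c(1, 1, 1, 2, 2),
  CAN_LISTING_DT    = as.Date(c("2000-01-01", "2000-01-03", "2002-07-10", "2003-10-07", "2003-11-26")),
  waitlist_end_date = as.Date(c("2003-06-05", "2003-06-05", "2003-06-05", "2004-07-24", "2004-09-30")),
  REC_TX_DT         = as.Date(c(NA, "2003-06-05", NA, "2004-07-24", NA))
)

# Hypothetical helper: reduce one (PERS_ID, transplant_num) group to a single row
collapse_group <- function(g) {
  tx        <- g$REC_TX_DT[!is.na(g$REC_TX_DT)][1]  # first non-missing transplant date (NA if none)
  min_list  <- min(g$CAN_LISTING_DT)
  last_wait <- max(g$waitlist_end_date)
  wait_end  <- if (is.na(tx)) last_wait else tx       # wait time stops at transplant, if any
  if (!is.na(tx) && tx < last_wait) last_wait <- tx   # same adjustment as the final mutate()
  data.frame(PERS_ID = g$PERS_ID[1], transplant_num = g$transplant_num[1],
             min_list_date = min_list, last_wait_date = last_wait, REC_TX_DT = tx,
             wait_time = as.numeric(wait_end - min_list))
}

groups    <- split(conc, list(conc$PERS_ID, conc$transplant_num), drop = TRUE)
collapsed <- do.call(rbind, lapply(groups, collapse_group))
rownames(collapsed) <- NULL
print(collapsed)
```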

For example, consider these concurrent registrations:

pers_id	px_id	num_list	transplant_num	list_type	rec_tx_dt
129197	211951	4	1	concurrent	2001-11-11
129197	201029	4	2	concurrent	2002-01-19
129197	153197	4	2	concurrent	2002-02-01
248512	89145	6	1	concurrent	2003-06-05
248512	252951	6	1	concurrent	2003-06-05
248512	130133	6	1	concurrent	2003-06-07
248512	189125	6	2	concurrent	2004-07-24
248512	197677	6	2	concurrent	2004-07-25
248512	120614	6	2	concurrent	2004-07-26
27204	180595	4	1	concurrent	2003-06-22
27204	213683	4	1	concurrent	2003-06-25
27204	208840	4	1	concurrent	2004-08-21
27204	43006	4	2	concurrent	2006-05-06

After collapsing, each patient has one row per transplant number:

pers_id	px_id	rec_tx_dt	num_list	transplant_num	list_type
27204	180595	2003-06-22	4	1	concurrent
27204	43006	2006-05-06	4	2	concurrent
129197	211951	2001-11-11	4	1	concurrent
129197	201029	2002-01-19	4	2	concurrent
248512	89145	2003-06-05	6	1	concurrent
248512	189125	2004-07-24	6	2	concurrent

In some cases, there are multiple rounds of concurrent listings. The following edge case shows a patient who has three concurrent listings, receives a transplant, and then has three concurrent relistings:

pers_id	list_num	list_type	can_listing_dt	waitlist_end_date	num_tx	rec_tx_dt
248512	1	concurrent	2000-01-01	2003-06-05	2	2003-06-05
248512	2	concurrent	2000-01-03	2003-06-05	2	2003-06-05
248512	3	concurrent	2002-07-10	2003-06-07	2	2003-06-07
248512	4	concurrent	2003-10-07	2004-07-24	2	2004-07-24
248512	5	concurrent	2003-10-08	2004-07-25	2	2004-07-25
248512	6	concurrent	2003-11-26	2004-07-26	2	2004-07-26

This patient’s six registrations are collapsed into two observations, one for the original time on the waitlist and one for the relisted period:

pers_id	list_type	can_listing_dt	last_wait_date	num_tx	rec_tx_dt
248512	concurrent	2000-01-01	2003-06-07	2	2003-06-05
248512	concurrent	2003-10-07	2004-07-26	2	2004-07-24

Finally, we recombine the collapsed concurrent registrations with the single registrations and sequential listings to form one dataset.
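A minimal sketch of this last step, assuming the three data frames share the columns needed downstream (here a hypothetical set `keep`) and that wait_time is stored in the same units in each. Base `rbind()` requires matching columns, so the sketch selects them first; in the dplyr pipeline, `bind_rows()` would instead fill non-shared columns with NA. The `_toy` data frames are made-up stand-ins, not our data.

```r
# Columns assumed (hypothetically) to be needed for the waitlist analysis
keep <- c("PERS_ID", "min_list_date", "wait_time", "outcome")

# Made-up stand-ins for the three pieces built above
single_toy <- data.frame(PERS_ID = 1, min_list_date = as.Date("2018-01-10"),
                         wait_time = 730, outcome = "DDKT")
sequential_toy <- data.frame(PERS_ID = c(2, 2),
                             min_list_date = as.Date(c("2010-01-01", "2015-01-01")),
                             wait_time = c(500, 365), outcome = c("censored", "LDKT"),
                             list_type = "sequential")   # extra column not shared by the others
collapsed_toy <- data.frame(PERS_ID = 3, min_list_date = as.Date("2000-01-01"),
                            wait_time = 1251, outcome = "DDKT")

# Stack the rows after selecting the shared columns
analysis_df <- rbind(single_toy[keep], sequential_toy[keep], collapsed_toy[keep])
print(analysis_df)
```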