Multiple Listings Issue
Some patients registered for kidney or other organ transplants may be listed for a transplant at multiple centers. Failing to account for these multiple listings can lead to incorrect results in analyses of waitlist outcomes for these patients. In this post, we go over the methods we used to deal with multiple listings for our manuscript Association of Race and Ethnicity with Priority for Deceased Donor Kidney Transplant.
All lowercase variables are ones we created; all uppercase variables already exist in the SRTR dataset. For privacy, we have anonymized ID numbers and dates, all of which are stored in lowercase versions of the original variable names.
For all patients, we define the waitlist end date as the date of transplant if there is one, then the date of removal, and otherwise the later of the last active-status and last inactive-status dates:
```r
library(dplyr)  # %>%, mutate(), case_when(), lag(), lead()
library(tidyr)  # fill()

df_cand_kipa <- df_cand_kipa_all %>%
  mutate(waitlist_end_date = case_when(
    # Transplant date takes priority, then removal date
    !is.na(REC_TX_DT) ~ REC_TX_DT,
    !is.na(CAN_REM_DT) ~ CAN_REM_DT,
    # Otherwise use the later of the last inactive and last active status dates
    !is.na(CAN_LAST_INACT_STAT_DT) & CAN_LAST_INACT_STAT_DT > CAN_LAST_ACT_STAT_DT ~ CAN_LAST_INACT_STAT_DT,
    !is.na(CAN_LAST_ACT_STAT_DT) ~ CAN_LAST_ACT_STAT_DT,
    is.na(CAN_LAST_ACT_STAT_DT) & !is.na(CAN_LAST_INACT_STAT_DT) ~ CAN_LAST_INACT_STAT_DT,
    TRUE ~ CAN_LAST_ACT_STAT_DT
  ))
```
For patients who only have one listing, their minimum list date and wait time are defined as follows:
- min_list_date is equivalent to CAN_LISTING_DT
- wait_time is the difference between their waitlist_end_date and their min_list_date
```r
single_registrations <- df_cand_kipa %>%
  group_by(PERS_ID) %>%
  mutate(num_list = n()) %>%  # number of registrations per patient
  filter(num_list == 1) %>%
  ungroup() %>%
  mutate(min_list_date = CAN_LISTING_DT,
         wait_time = waitlist_end_date - min_list_date,
         outcome = case_when(
           DON_TY == "C" ~ "DDKT",
           DON_TY == "L" ~ "LDKT",
           !is.na(CAN_REM_CD) ~ "removed/died",
           TRUE ~ "censored"
         ))
```
In the SRTR dataset, two codes are used to identify a patient: PERS_ID and PX_ID. PX_ID identifies a single transplant registration, whereas PERS_ID identifies the patient. So, for one PERS_ID, there can be several PX_ID codes.
| pers_id | px_id |
|---|---|
| 125259 | 267233 |
| 125259 | 286319 |
| 147634 | 265234 |
| 147634 | 26018 |
| 170660 | 35422 |
| 170660 | 69221 |
| 136502 | 209937 |
| 136502 | 108 |
| 4913 | 286454 |
| 4913 | 163548 |
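As a quick illustration of that relationship, the sketch below counts registrations per patient on a toy data frame built from the first few rows of the table above. The object names registrations and listings_per_person are ours, chosen for illustration only.

```r
library(dplyr)

# Toy data: the first six rows of the table above (anonymized IDs)
registrations <- tibble(
  pers_id = c(125259, 125259, 147634, 147634, 170660, 170660),
  px_id   = c(267233, 286319, 265234, 26018, 35422, 69221)
)

# One row per patient, with the number of distinct registrations they hold
listings_per_person <- registrations %>%
  group_by(pers_id) %>%
  summarise(num_list = n_distinct(px_id), .groups = "drop")

print(listings_per_person)
```

Any patient with num_list greater than 1 is a candidate for the multiple-listing handling described in the rest of this post.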
There are two types of candidates that we classify as “multiple listed”: concurrent and sequential. Those who are listed at multiple centers at once are concurrently listed and those who are listed at multiple centers one after the other are sequentially listed.
Our goal is to consolidate instances of a patient being listed at multiple centers concurrently, but to treat sequential listings as separate observations. So, there may be multiple observations for one PERS_ID, as long as those observations represent non-overlapping time on the waitlist.
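The labelling step works on a data frame called multiple_registrations, whose construction is not shown in this post. Below is a minimal sketch of how it could be built, assuming it mirrors single_registrations: patients with more than one registration, kept grouped by PERS_ID and sorted by listing date. The toy IDs and dates are invented, and the exact filtering and sorting used for the manuscript may differ.

```r
library(dplyr)

# Toy candidate data (hypothetical IDs and dates), waitlist_end_date already computed
df_cand_kipa <- tibble(
  PERS_ID = c(1, 2, 2, 3, 3),
  PX_ID = c(11, 21, 22, 31, 32),
  CAN_LISTING_DT = as.Date(c("2000-01-01", "2001-01-01", "2001-06-01",
                             "2002-01-01", "2004-01-01")),
  waitlist_end_date = as.Date(c("2000-12-31", "2002-01-01", "2002-03-01",
                                "2003-01-01", "2005-01-01"))
)

# Assumption: keep patients with more than one registration, grouped by person
multiple_registrations <- df_cand_kipa %>%
  group_by(PERS_ID) %>%
  mutate(num_list = n()) %>%
  filter(num_list > 1) %>%
  arrange(CAN_LISTING_DT, .by_group = TRUE)

# Because the data is grouped, lag() and lead() only look at the same patient's rows
overlap_check <- multiple_registrations %>%
  mutate(overlaps_previous = CAN_LISTING_DT < lag(waitlist_end_date),
         overlaps_next = waitlist_end_date > lead(CAN_LISTING_DT))

print(overlap_check)
```

The first and last listing of each patient produce NA for one of the two comparisons. case_when() treats an NA condition as not matched, so such rows simply fall through to the next condition.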
To distinguish between each type of listing we will label them as follows:
```r
# Assumes multiple_registrations already holds the candidates with more than
# one registration, grouped by PERS_ID and sorted by listing date
multiple_registrations <- multiple_registrations %>%
  mutate(list_type = case_when(
    CAN_LISTING_DT < lag(waitlist_end_date) ~ "concurrent",   # overlaps the previous listing
    waitlist_end_date > lead(CAN_LISTING_DT) ~ "concurrent",  # overlaps the next listing
    TRUE ~ "sequential")) %>%
  mutate(REC_TX_DT = as.Date(REC_TX_DT)) %>%
  mutate(num_tx = length(unique(na.omit(REC_TX_DT)))) %>%  # number of distinct transplant dates
  fill(REC_TX_DT, .direction = "downup")
```
Some patients receive multiple transplants and therefore have multiple values for REC_TX_DT. To account for this, we implement a counter that increases whenever the transplant date changes while the PERS_ID stays the same. We then carry the counter down the rows: if the previous row belongs to the same PERS_ID and already has a counter above 1, the current row takes that value.
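As a toy illustration of the counter idea, the sketch below numbers transplants within each patient using a grouped cumulative sum. This is an alternative formulation written for illustration, not the loop-based code used in this post, and it is not guaranteed to reproduce the loops' output in every case. The IDs and dates are invented.

```r
library(dplyr)

# Toy data (hypothetical): patient 1 has two transplants, patient 2 has one
toy <- tibble(
  PERS_ID = c(1, 1, 1, 2, 2),
  REC_TX_DT = as.Date(c("2003-06-05", "2003-06-05", "2004-07-24",
                        "2005-01-10", "2005-01-10"))
)

toy_counted <- toy %>%
  arrange(PERS_ID, REC_TX_DT) %>%
  group_by(PERS_ID) %>%
  # TRUE whenever the transplant date differs from the previous row of the same patient
  mutate(date_changed = REC_TX_DT != lag(REC_TX_DT, default = first(REC_TX_DT)),
         transplant_num = cumsum(date_changed) + 1) %>%
  ungroup()

print(toy_counted)
```

Grouping by PERS_ID restarts the count for every patient, so no explicit check that the previous row belongs to the same person is needed.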
```r
## Sort by person ID and waitlist end date
multiple_registrations <- multiple_registrations[order(multiple_registrations$PERS_ID, multiple_registrations$waitlist_end_date), ]

## Retransplant counter
multiple_registrations$transplant_num <- 1

## If transplant date changed from previous row to current row but person ID stayed the same, counter + 1
for (i in 2:nrow(multiple_registrations)) {
  if (multiple_registrations$PERS_ID[i-1] == multiple_registrations$PERS_ID[i] &
      multiple_registrations$REC_TX_DT[i-1] != multiple_registrations$REC_TX_DT[i] &
      !is.na(multiple_registrations$REC_TX_DT[i])) {
    multiple_registrations$transplant_num[i] <- multiple_registrations$transplant_num[i-1] + 1
  }
}

## Carry the counter down: same person, previous row already above 1, current row differs
for (i in 2:nrow(multiple_registrations)) {
  if (multiple_registrations$PERS_ID[i-1] == multiple_registrations$PERS_ID[i] &
      multiple_registrations$transplant_num[i-1] != multiple_registrations$transplant_num[i] &
      multiple_registrations$transplant_num[i-1] != 1) {
    multiple_registrations$transplant_num[i] <- multiple_registrations$transplant_num[i-1]
  }
}
```
Filling the data can sometimes lead to incorrect values. In our case, we had to correct wrong transplant dates by overwriting the earlier value with the later transplant date for consecutive concurrent observations within the same PERS_ID.
```r
## Change the counter value back to 0 for sequential
multiple_registrations$transplant_num[multiple_registrations$list_type == "sequential"] <- 0

## Unusual cases when filling data leads to incorrect transplant date
## Correct by taking the later transplant date
for (i in 1:(nrow(multiple_registrations) - 1)) {
  if (multiple_registrations$PERS_ID[i] == multiple_registrations$PERS_ID[i+1] &
      multiple_registrations$list_type[i] == "concurrent" & multiple_registrations$list_type[i+1] == "concurrent" &
      !is.na(multiple_registrations$REC_TX_DT[i]) & !is.na(multiple_registrations$REC_TX_DT[i+1]) &
      multiple_registrations$REC_TX_DT[i] < multiple_registrations$REC_TX_DT[i+1]) {
    multiple_registrations$REC_TX_DT[i] <- multiple_registrations$REC_TX_DT[i+1]
  }
}

## Sequential listings: each one stays its own observation
sequential_lists <- multiple_registrations %>%
  filter(list_type == "sequential") %>%
  mutate(min_list_date = CAN_LISTING_DT,
         wait_time = waitlist_end_date - min_list_date,
         outcome = case_when(
           DON_TY == "C" ~ "DDKT",
           DON_TY == "L" ~ "LDKT",
           !is.na(CAN_REM_CD) ~ "removed/died",
           TRUE ~ "censored"
         ))

## How many possible transplants do we need to account for
max_retransplants <- max(multiple_registrations$transplant_num)

## Minimum list date for each concurrent transplant
multiple_registrations <- multiple_registrations %>%
  group_by(PERS_ID, transplant_num) %>%
  mutate(min_list_date = min(CAN_LISTING_DT, na.rm = TRUE),
         wait_time = waitlist_end_date - min_list_date)
```
To collapse these concurrent listings, we want one observation for each patient at each “transplant number”.
```r
collapsed_concurrent_registrations <- NULL
for (i in 1:max_retransplants) {
  collapsed_concurrent_registrations <- rbind(
    collapsed_concurrent_registrations,
    multiple_registrations %>%
      filter(list_type == "concurrent" & transplant_num == i) %>%  ## Do it separately for each transplant counter number
      mutate(DON_TY = ifelse(DON_TY == "", NA, DON_TY),
             last_wait_date = max(waitlist_end_date, na.rm = TRUE)) %>%
      fill(REC_TX_DT, .direction = "up") %>%
      fill(DON_TY, .direction = "up") %>%
      fill(DONOR_ID, .direction = "up") %>%
      fill(CAN_REM_CD, .direction = "up") %>%
      mutate(wait_time = case_when(
               !is.na(REC_TX_DT) & transplant_num != 0 ~ REC_TX_DT - min_list_date,  ### Ignore non-transplanted rows
               TRUE ~ last_wait_date - min_list_date),
             outcome = case_when(
               DON_TY == "C" ~ "DDKT",
               DON_TY == "L" ~ "LDKT",
               !is.na(CAN_REM_CD) ~ "removed/died",
               TRUE ~ "censored")
      ) %>%
      select(-c(waitlist_end_date, CAN_LISTING_DT, CAN_REM_DT)) %>%
      filter(row_number() == 1) %>%  # keep one row per patient and transplant number
      mutate(last_wait_date = case_when(
        REC_TX_DT < last_wait_date ~ REC_TX_DT,
        TRUE ~ last_wait_date))
  )
}
```
So this…
| pers_id | px_id | num_list | transplant_num | list_type | rec_tx_dt |
|---|---|---|---|---|---|
| 129197 | 211951 | 4 | 1 | concurrent | 2001-11-11 |
| 129197 | 201029 | 4 | 2 | concurrent | 2002-01-19 |
| 129197 | 153197 | 4 | 2 | concurrent | 2002-02-01 |
| 248512 | 89145 | 6 | 1 | concurrent | 2003-06-05 |
| 248512 | 252951 | 6 | 1 | concurrent | 2003-06-05 |
| 248512 | 130133 | 6 | 1 | concurrent | 2003-06-07 |
| 248512 | 189125 | 6 | 2 | concurrent | 2004-07-24 |
| 248512 | 197677 | 6 | 2 | concurrent | 2004-07-25 |
| 248512 | 120614 | 6 | 2 | concurrent | 2004-07-26 |
| 27204 | 180595 | 4 | 1 | concurrent | 2003-06-22 |
| 27204 | 213683 | 4 | 1 | concurrent | 2003-06-25 |
| 27204 | 208840 | 4 | 1 | concurrent | 2004-08-21 |
| 27204 | 43006 | 4 | 2 | concurrent | 2006-05-06 |
Turns into this:
| pers_id | px_id | rec_tx_dt | num_list | transplant_num | list_type |
|---|---|---|---|---|---|
| 27204 | 180595 | 2003-06-22 | 4 | 1 | concurrent |
| 27204 | 43006 | 2006-05-06 | 4 | 2 | concurrent |
| 129197 | 211951 | 2001-11-11 | 4 | 1 | concurrent |
| 129197 | 201029 | 2002-01-19 | 4 | 2 | concurrent |
| 248512 | 89145 | 2003-06-05 | 6 | 1 | concurrent |
| 248512 | 189125 | 2004-07-24 | 6 | 2 | concurrent |
In some cases, there are multiple rounds of concurrent listings. Here is an example of such an edge case, where a patient has 3 concurrent listings and receives a transplant, and then has 3 concurrent relistings.
| pers_id | list_num | list_type | can_listing_dt | waitlist_end_date | num_tx | rec_tx_dt |
|---|---|---|---|---|---|---|
| 248512 | 1 | concurrent | 2000-01-01 | 2003-06-05 | 2 | 2003-06-05 |
| 248512 | 2 | concurrent | 2000-01-03 | 2003-06-05 | 2 | 2003-06-05 |
| 248512 | 3 | concurrent | 2002-07-10 | 2003-06-07 | 2 | 2003-06-07 |
| 248512 | 4 | concurrent | 2003-10-07 | 2004-07-24 | 2 | 2004-07-24 |
| 248512 | 5 | concurrent | 2003-10-08 | 2004-07-25 | 2 | 2004-07-25 |
| 248512 | 6 | concurrent | 2003-11-26 | 2004-07-26 | 2 | 2004-07-26 |
This patient’s observations are collapsed into two observations, reflecting their original time on the waitlist, as well as their relisted period:
| pers_id | list_type | can_listing_dt | last_wait_date | num_tx | rec_tx_dt |
|---|---|---|---|---|---|
| 248512 | concurrent | 2000-01-01 | 2003-06-07 | 2 | 2003-06-05 |
| 248512 | concurrent | 2003-10-07 | 2004-07-26 | 2 | 2004-07-24 |
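The collapse for this patient can be reproduced from the six rows shown above with a grouped summary. This is a simplified sketch, not the loop used in this post: it uses the transplant counter values from the earlier table (1 for the first three listings, 2 for the relistings), takes the earliest listing date, the earliest transplant date, and the latest waitlist end date within each group, and ignores the donor and removal fields. The object names listings and collapsed are ours.

```r
library(dplyr)

# The six concurrent listings from the example above
listings <- tibble(
  pers_id = 248512,
  transplant_num = c(1, 1, 1, 2, 2, 2),
  can_listing_dt = as.Date(c("2000-01-01", "2000-01-03", "2002-07-10",
                             "2003-10-07", "2003-10-08", "2003-11-26")),
  waitlist_end_date = as.Date(c("2003-06-05", "2003-06-05", "2003-06-07",
                                "2004-07-24", "2004-07-25", "2004-07-26")),
  rec_tx_dt = as.Date(c("2003-06-05", "2003-06-05", "2003-06-07",
                        "2004-07-24", "2004-07-25", "2004-07-26"))
)

# One row per patient and transplant number
collapsed <- listings %>%
  group_by(pers_id, transplant_num) %>%
  summarise(min_list_date = min(can_listing_dt),
            last_wait_date = max(waitlist_end_date),
            rec_tx_dt = min(rec_tx_dt),
            .groups = "drop") %>%
  mutate(wait_time = rec_tx_dt - min_list_date)  # time from first listing to transplant

print(collapsed)
```

The two resulting rows carry the same listing, end, and transplant dates as the collapsed table above.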
Finally, we recombine the collapsed concurrent registrations with the single registrations and sequential listings, to form one dataset.
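A minimal sketch of that last step, assuming the three data frames are stacked with dplyr's bind_rows(). The toy inputs below are stand-ins with only a few columns and invented values, and the name df_waitlist_final is ours. bind_rows() matches columns by name and fills columns missing from an input with NA, which matters here because the collapsed data no longer has waitlist_end_date.

```r
library(dplyr)

# Toy stand-ins (hypothetical values) for the three data frames built in this post
single_registrations <- tibble(
  PERS_ID = 1,
  min_list_date = as.Date("2000-01-01"),
  waitlist_end_date = as.Date("2001-01-01")
)
sequential_lists <- tibble(
  PERS_ID = c(2, 2),
  list_type = "sequential",
  min_list_date = as.Date(c("2001-01-01", "2003-01-01")),
  waitlist_end_date = as.Date(c("2002-01-01", "2004-01-01"))
)
collapsed_concurrent_registrations <- tibble(
  PERS_ID = 3,
  list_type = "concurrent",
  min_list_date = as.Date("2000-01-01"),
  last_wait_date = as.Date("2003-06-05")
)

# Stack the three pieces; ungroup() drops any grouping left over from earlier steps
df_waitlist_final <- bind_rows(
  ungroup(single_registrations),
  ungroup(sequential_lists),
  ungroup(collapsed_concurrent_registrations)
)

print(df_waitlist_final)
```

Because the end-of-waitlist date lives in waitlist_end_date for single and sequential rows and in last_wait_date for collapsed rows, a real pipeline would likely harmonize the two columns before analysis.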
