3 Curate a master UKB event table

This chapter gathers several types of fields from the UKB assessment center data and combines them into a master event table. Along the way, we convert UKB-defined special dates to ordinary calendar dates. We take the following fields (details of which can be searched here) from the UKB assessment center data:

  • first occurrence fields in first_occur_UKB.RDS

  • algorithmically defined outcome fields in demog_UKB.RDS:

    • f.42000.0.0
    • f.42008.0.0
    • f.42010.0.0
    • f.42012.0.0
    • f.42006.0.0
    • f.42026.0.0
  • ICD10 code fields and their date fields in ICD_UKB.RDS:

    • starts with f.41270 (ICD10 code)
    • starts with f.41280 (ICD10 code event date)
    • starts with f.40001 (ICD10 code, primary cause of death)
    • starts with f.40002 (ICD10 code, secondary cause of death)
    • starts with f.40000 (date of death)
  • OPCS4 code fields in OPCS_procedures_UKB.RDS:

    • starts with f.41272 (OPCS4 code)
    • starts with f.41282 (OPCS4 code event date)
  • self-reported condition fields in demog_UKB.RDS:

    • starts with f.20002 (self-reported condition code)
    • starts with f.20008 (self-reported condition code event date)
  • self-reported operation fields in demog_UKB.RDS:

    • starts with f.20004 (self-reported operation code)
    • starts with f.20010 (self-reported operation code event date)
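
Because most of the fields above are identified by a column-name prefix, they can be picked out with a prefix match. The following is a minimal sketch on a toy data frame, not the code used later in this chapter; the toy values and the ID column name f.eid are assumptions made for illustration.

```r
library(dplyr)

# Toy stand-in for ICD_UKB.RDS (hypothetical values).
# Column names follow the f.<field>.<instance>.<array> pattern.
icd_toy <- tibble(
  f.eid       = c(1L, 2L),                      # assumed ID column name
  f.41270.0.0 = c("E11", "I10"),
  f.41270.0.1 = c("H36", NA),
  f.41280.0.0 = as.Date(c("2010-05-01", "2012-03-15")),
  f.41280.0.1 = as.Date(c("2011-07-20", NA))
)

# Keep the ID plus every column whose name starts with the field prefix.
# The trailing dot stops "f.41270." from matching a longer field number.
icd_codes <- icd_toy %>% select(f.eid, starts_with("f.41270."))
icd_dates <- icd_toy %>% select(f.eid, starts_with("f.41280."))
```

Code and date columns share the same instance and array suffix, so they can later be paired on that suffix.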

In addition to these pre-defined fields, we define a custom field called dr_self which is a combination of the following fields:

  • f.5901.0.0
  • f.5901.1.0
  • f.5901.2.0
  • f.5901.3.0

These fields record the age at which diabetic eye disease was diagnosed, as reported at four different time points. Using this age and the subject's date of birth, we define the first occurrence event date for the custom outcome dr_self. Users can define other custom fields by combining multiple pre-defined fields; however, dr_self is the only custom outcome that we define from the UKB assessment center data and include in the master event table.
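
One way to derive such a date is sketched below on toy data. This is an illustration under stated assumptions, not necessarily how dr_self is computed in this chapter: it assumes negative values are special codes that should be treated as missing, takes the earliest reported age across the four fields, and approximates the event date as the date of birth plus that age at 365.25 days per year. The ID column name f.eid and all values are hypothetical.

```r
library(dplyr)

# Toy data (hypothetical): DOB and age at diagnosis reported at four time points.
toy <- tibble(
  f.eid      = c(1L, 2L, 3L),
  DOB        = as.Date(c("1950-03-01", "1960-11-01", "1945-06-01")),
  f.5901.0.0 = c(55, NA, -1),
  f.5901.1.0 = c(NA, 48, NA),
  f.5901.2.0 = c(54, NA, NA),
  f.5901.3.0 = c(NA_real_, NA_real_, NA_real_)
)

dr_self_toy <- toy %>%
  # Assumption: negative ages are special codes, so set them to missing.
  mutate(across(starts_with("f.5901."), ~ if_else(.x < 0, NA_real_, .x))) %>%
  mutate(
    # Earliest reported age; the result is NA only when all four values are NA.
    dr_age = pmin(f.5901.0.0, f.5901.1.0, f.5901.2.0, f.5901.3.0, na.rm = TRUE),
    # Assumed conversion: date of birth plus age in years, at 365.25 days per year.
    dr_self = DOB + round(dr_age * 365.25)
  )
```

Taking the minimum age matches the "first occurrence" reading of the outcome; a subject with no valid report keeps a missing dr_self.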

When converting special dates to “normal” dates, we use the following mapping defined by the UKB study:

  • First occurrence, algorithmically defined outcome and OPCS4 code event date fields:

    Special date    Mapped to
    1900-01-01      Missing
    1901-01-01      Missing
    2037-07-07      Missing
    1902-02-02      DOB of the subject
    1903-03-03      DOB of the subject

  • Self-reported condition and self-reported operation code event date fields:

    Special date           Mapped to
    decimal date < 1900    Missing

The conversion is carried out by the cleandates() function defined in functions.R.
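
To make the mapping concrete, here is a minimal sketch of the two conversions. It is not the cleandates() function from functions.R, whose signature and internals are not shown here; the helper names clean_special_dates() and clean_decimal_dates() are hypothetical, and the sketch assumes the first kind of field is already of class Date and the second is a numeric decimal year.

```r
library(lubridate)

# Hypothetical helper: map special dates in a Date vector.
# x: event dates; dob: dates of birth, same length as x.
clean_special_dates <- function(x, dob) {
  missing_codes <- as.Date(c("1900-01-01", "1901-01-01", "2037-07-07"))
  dob_codes     <- as.Date(c("1902-02-02", "1903-03-03"))
  use_dob <- !is.na(x) & x %in% dob_codes
  x[x %in% missing_codes] <- NA      # special dates that mean "missing"
  x[use_dob] <- dob[use_dob]         # special dates that mean "date of birth"
  x
}

# Hypothetical helper: convert decimal years (e.g. 2005.5) to Date,
# treating any value below 1900 as missing.
clean_decimal_dates <- function(x) {
  out <- rep(as.Date(NA), length(x))
  ok <- !is.na(x) & x >= 1900
  out[ok] <- as.Date(date_decimal(x[ok]))
  out
}

# Usage on toy values
dob    <- as.Date(c("1950-01-01", "1951-02-01", "1952-03-01", "1953-04-01"))
events <- as.Date(c("1900-01-01", "1902-02-02", "2010-05-01", NA))
cleaned_events  <- clean_special_dates(events, dob)
cleaned_decimal <- clean_decimal_dates(c(2005.0, 1899.5, -1, NA))
```

Both helpers return vectors of the same length as their input, so they can be applied column by column inside mutate().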

Load packages.

library(tidyverse)
library(data.table)
library(lubridate)
source("functions.R")

Load reformatted raw UKB assessment data for generating a master UKB event table.

firstoccurs <- readRDS("generated_data/first_occur_UKB.RDS")
ICD <- readRDS("generated_data/ICD_UKB.RDS")
procs <- readRDS("generated_data/OPCS_procedures_UKB.RDS")
demog <- readRDS("generated_data/demog_UKB.RDS")

Define the date of birth and sex.

demog <- demog %>% 
  rename(YOB = f.34.0.0) %>%                           # f.34: year of birth
  rename(MOB = f.52.0.0) %>%                           # f.52: month of birth
  mutate(DOB = lubridate::make_date(YOB, MOB)) %>%     # day defaults to 1
  mutate(SEX = as.character(f.31.0.0))                 # f.31: sex
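
Since only the year and month of birth are used, DOB is set to the first day of the birth month. The short sketch below illustrates that default. It also shows one way to turn month names into month numbers, which would only be needed if the month of birth were stored as names rather than numbers; that storage format is an assumption, not something stated in this chapter.

```r
library(lubridate)

# make_date() fills in day = 1 when no day is supplied.
dob_example <- make_date(year = 1950, month = 7)

# Assumption: if months were stored as English month names,
# match() against the built-in month.name gives the month number.
mob_names  <- factor(c("July", "January"))
mob_number <- match(as.character(mob_names), month.name)
dob_from_names <- make_date(year = c(1950, 1962), month = mob_number)
```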