8 Phenotype outcome and control exclusion events using UKB assessment center data

In this chapter, we use the master UKB event table created in chapter 3 to generate different outcome event tables. We phenotype the following outcomes:

  • Diabetes (DM)
  • Myocardial Infarction (MI)
  • Unstable Angina (UA)
  • Ischemic Stroke (IS)
  • Hemorrhagic Stroke (HS)
  • Stroke (ST)
  • PCI
  • Composite CVD
  • Diabetic retinopathy (DR)
  • Diabetic kidney disease (DKD)

The following event tables will also be created to exclude certain controls from time-to-event tables in later chapters:

  • Diabetic eye disease control exclusion events
  • Diabetic kidney disease control exclusion events
  • Cardiovascular control exclusion events
  • Cerebrovascular control exclusion events
  • Non-coronary revascularization control exclusion events

The general phenotyping procedure is the same for complication outcomes and control exclusion outcomes: we define the patterns we want to search for and match them against the master UKB event table.

The pattern search is abstracted away in the function get_phenotype_tab() in functions.R. The function searches the master UKB event table for regular expression patterns corresponding to an outcome and returns a table containing all events that match those patterns. The function accepts the following types of patterns:

  • UKB-defined field patterns
  • custom-defined field patterns
  • ICD10 code patterns (also accepts more specific ICD10 codes for primary death and secondary death events)
  • OPCS4 code patterns
  • self-reported condition code patterns
  • self-reported operation code patterns

The UKB-defined field patterns match fields associated with the first occurrence fields and the algorithmically defined fields, as defined by the UKB study. The custom field pattern matches a custom field defined in chapter 3. As a reminder, the only custom field we have defined is dr_self, which is used in phenotyping diabetes-related eye disease. The code patterns (ICD10, OPCS4, self-reported condition and self-reported operation) match codes that represent clinical events. Thus, to phenotype an outcome, one should first identify the fields and codes associated with the outcome and then define the corresponding patterns, which are passed as inputs to get_phenotype_tab(). Internally, get_phenotype_tab() uses the base R function grepl() to filter the master event table.
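To make the idea of pattern matching concrete, here is a minimal sketch of regular expression filtering with grepl() on a toy table. It is not the implementation of get_phenotype_tab(): the table toy_events, its column names (f.eid, type, code) and the helper match_events_sketch() are hypothetical and exist only for this illustration. The sketch uses base R only.

```r
# Toy stand-in for the master event table.
# The column names (f.eid, type, code) are hypothetical.
toy_events <- data.frame(
  f.eid = c(1, 1, 2, 3, 4),
  type  = c("ICD10", "OPCS4", "ICD10", "ICD10", "selfrep"),
  code  = c("I210", "K401", "E119", "I639", "1075"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the rows of one code type whose code
# matches a regular expression.
match_events_sketch <- function(events, type, pattern) {
  keep <- events$type == type & grepl(pattern, events$code)
  events[keep, , drop = FALSE]
}

# ICD10 codes beginning with I21 or I22
# (acute and subsequent myocardial infarction).
mi_icd10_pattern <- "^I2[12]"

mi_events <- match_events_sketch(toy_events, "ICD10", mi_icd10_pattern)
print(mi_events)
```

Anchoring the pattern with ^ restricts the match to the start of the code, so a code that merely contains "I21" somewhere in the middle would not be picked up. Restricting by code type first keeps an ICD10 pattern from accidentally matching codes from another coding system.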

Load packages and functions.

library(tidyverse)
library(data.table)
source("functions.R")

Import the master event table.

event_tab <- readRDS("generated_data/all_ukb_events_tab.RDS")