11 Create biomarker trajectory data from primary care data

In this chapter, we create biomarker trajectory data from the primary care data. The biomarkers we phenotype are:

  • urine albumin
  • urine creatinine
  • blood creatinine
  • blood glucose (random)
  • fasting glucose
  • HbA1c
  • HDL
  • LDL
  • triglycerides
  • systolic blood pressure (SBP)
  • diastolic blood pressure (DBP)
  • urine albumin to urine creatinine ratio (UACR)
  • BMI

As in the previous chapter, we phenotype patients’ macroalbuminuria or microalbuminuria status and exclude events with a missing measurement or a missing date. We do not apply sex mismatch filters, however, because those subjects were already filtered out in chapter 5.

Load necessary packages.

library(tidyverse)
library(knitr)

Load reformatted primary care table.

gp_clinical <- data.table::fread("generated_data/entire_gp_clinical_30March2021_formatted.txt")
head(gp_clinical)

Read in the previously generated full code dictionary for primary care terms.

full_dict <- readRDS("generated_data/full_dict.RDS")
tail(full_dict)

To prepare the dictionary for joining with gp_clinical, remove terms with missing descriptions, keep only one description per code, and collapse code and description pairs that are identical across terminologies (Read v2/CTV3).

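full_dict <- full_dict %>%
  select(-terminology) %>%             # terminology is not needed for the join
  distinct() %>%                       # collapse rows duplicated across terminologies
  filter(term_description != "") %>%   # drop terms with a missing description
  distinct(code, .keep_all = TRUE)     # keep the first description for each code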

tail(full_dict)
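
To make the effect of each step concrete, the sketch below applies the same pipeline to a toy dictionary. The codes and descriptions in toy_dict are made up for illustration and are not real dictionary entries; only the column names (code, term_description, terminology) are taken from the code above.

```r
library(dplyr)

# Toy dictionary with made-up codes and descriptions (illustration only)
toy_dict <- data.frame(
  code             = c("code1", "code1", "code1", "code2", "code3"),
  term_description = c("description A", "description A", "description B",
                       "", "description C"),
  terminology      = c("read2", "ctv3", "read2", "read2", "ctv3"),
  stringsAsFactors = FALSE
)

toy_clean <- toy_dict %>%
  select(-terminology) %>%             # rows 1 and 2 become identical
  distinct() %>%                       # ... and are collapsed into one row
  filter(term_description != "") %>%   # code2 has no description and is dropped
  distinct(code, .keep_all = TRUE)     # code1 keeps its first description only

toy_clean
# code1 -> "description A", code3 -> "description C"
```

Note that the final distinct() keeps whichever description appears first for a code, so the retained description depends on the row order of the dictionary.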

Add the term descriptions to gp_clinical.

gp_clinical <- gp_clinical %>%
  left_join(full_dict)  # with no `by`, joins on all columns shared by both tables
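
Because the dictionary has at most one row per code, a left join should leave the number of event rows unchanged. The following is a minimal sketch of that check on toy data; event_id and the toy codes are hypothetical, and we assume here that code is the shared join column.

```r
library(dplyr)

# Toy event table and dictionary (hypothetical values)
toy_events <- data.frame(
  event_id = 1:4,
  code     = c("code1", "code2", "code1", "code9"),
  stringsAsFactors = FALSE
)
toy_dict <- data.frame(
  code             = c("code1", "code2"),
  term_description = c("description A", "description B"),
  stringsAsFactors = FALSE
)

# A dictionary with unique codes cannot duplicate event rows in a left join
stopifnot(anyDuplicated(toy_dict$code) == 0)

toy_joined <- toy_events %>%
  left_join(toy_dict, by = "code")

# The row count is preserved
same_n <- nrow(toy_joined) == nrow(toy_events)

# Events whose code has no dictionary entry get an NA description
n_unmatched <- sum(is.na(toy_joined$term_description))
```

The same two checks could be run on gp_clinical before and after the join to confirm that no events were duplicated and to count codes without a description.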

Define the clinical terms to extract from gp_clinical for each biomarker. Each biomarker gets a string that is used as a grepl pattern: alternative patterns are separated by ‘|’, square brackets enclose the set of characters allowed at a single position, and ‘^’ anchors the match to the beginning of the code.

BP_codes <- "^246[.cdgABCDEFGJNPQRSTVWXY012345679]|^XaF4[abFLOS]|^XaJ2[EFGH]|^XaKF[xw]|^G20"
HDL_codes <- '^44d[23]|^X772M|^44P[5BC]|^XaEVr'
LDL_codes <- '^44d[45]|^44P[6DE]|^XaEVs' 
totchol_codes <- "^44P[.12349HJKZ]|^XE2eD|^XSK14|^XaFs9|^XaIRd|^XaJe9|^XaLux" 
triglyc_codes <- '^44e|^44Q|^X772O|^XE2q9' 
fastgluc_codes <- "^44[fg]1"
randgluc_codes <- "^44[fg][0\\.]|^44TA|^XM0ly"
a1c_codes <- "^XaPbt|^XaERp|^X772q|^42W[12345Z\\.]\\.|^44TB\\."
height_weight_BMI_codes <- "^XaCDR|^XaJJH|^XaJqk|^XaZcl|^22K|^229|^22A|^162[23]|^X76CG|^XE1h4|^XM01G|^Xa7wI"
blood_creatinine_codes <- '^44J3[.0123z]|^44J[CDF]|^XE2q5|^XaERc|^XaERX|^XaETQ|^4Q40.|^X771Q'
urine_creatinine_codes <- '^46M7'
urine_albumin_codes <- '^46N4|^XE2eI|^46N8.|^46W[\\.01]|^XE2bw'
UACR_codes <- '^46TC|^XE2n3|^X773Y|^46TD|^XE2n4'
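
As a minimal sketch of how these strings behave in grepl, the example below applies the two glucose patterns to a handful of toy code strings. The toy strings are chosen only to exercise the pattern syntax and are not meant as a list of valid clinical codes.

```r
# Patterns copied from the definitions above
fastgluc_codes <- "^44[fg]1"
randgluc_codes <- "^44[fg][0\\.]|^44TA|^XM0ly"

# Toy strings chosen to exercise the patterns (illustration only)
toy_codes <- c("44f1.", "44g0.", "44f..", "44TA.", "X44f1", "44h1.")

is_fast <- grepl(fastgluc_codes, toy_codes)
is_rand <- grepl(randgluc_codes, toy_codes)

is_fast
# TRUE FALSE FALSE FALSE FALSE FALSE
is_rand
# FALSE TRUE TRUE TRUE FALSE FALSE

# Outside square brackets an unescaped "." matches any character,
# whereas "\\." matches only a literal dot.
any_char <- grepl("^AB12.",   c("AB12.", "AB123"))  # TRUE TRUE
lit_dot  <- grepl("^AB12\\.", c("AB12.", "AB123"))  # TRUE FALSE
```

"X44f1" is rejected because ‘^’ requires the match to start at the first character, and "44h1." is rejected because "h" is not in the bracketed set [fg].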