1 Clean raw UKB assesment center data

We first reformat raw UKB assessment center data using the R scripts (ending with .r) in raw_data folder for easier downstream data wrangling. These scripts were first auto-generated using the ukbconv utility, which automatically applies encodings to data fields. We have slightly modified the auto-generated scripts to achieve a faster reading of data by using fread() function from data.table package rather than the default read.table() function. We also changed the default naming of the R objects storing each dataset to something more descriptive which reflects the contents. The ukbconv utility can be obtained from the download section of the UKB showcase website. We used the ukbconv with a flag -i to specify a subset of fields to be included in each dataset. For further details on downloading, decrypting, and converting the format of your main dataset(s), see UK Biobank’s insructions.

Load packages.

library(tidyverse)
library(data.table)

Execute the reformatting scripts.

source("raw_data/demog_UKB_MOD.r")
source("raw_data/assessment_center_UKB.r")
source("raw_data/first_occurrences_UKB_MOD.r")
source("raw_data/ICD_UKB_MOD.r")
source("raw_data/OPCS_procedures_UKB_MOD.r")
source("raw_data/sampleQC_UKB_MOD.r")
source("raw_data/labs_UKB_MOD.r")

Save reformatted UKB assessment center data.

saveRDS(demog,"generated_data/demog_UKB.RDS")
saveRDS(bd,"generated_data/assessment_center_UKB.RDS")
saveRDS(firstoccurs,"generated_data/first_occur_UKB.RDS")
saveRDS(procs,"generated_data/OPCS_procedures_UKB.RDS")
saveRDS(ICD,"generated_data/ICD_UKB.RDS")
saveRDS(sampleqc,"generated_data/sampleQC_UKB.RDS")
saveRDS(labs,"generated_data/labs_UKB.RDS")