Biobanks have developed rapidly. They typically combine genetic information, such as WGS (whole genome sequencing), WES (whole exome sequencing), and SNP (single nucleotide polymorphism) data, with a range of other data on the same individuals: health records like GP data, hospitalizations, diagnoses, prescriptions, and MRI (magnetic resonance imaging) scans; lab results from biochemistry and hematology; and patient-reported information such as family history, behavioral history, and socio-demographics. The number and diversity of individuals included have grown as well. Many countries either have or are developing national biobanks, including the UK (UK Biobank), China (Kadoorie), Japan (Jenger), the US (All of Us), and Finland (FinnGen). The UK Biobank (UKBB) has 7,400 categories of phenotypes along with SNP and WES data from 500,000 participants. Organizations are also now working to facilitate more collaboration between biobanks.
The resulting massive, multi-dimensional data sets can be used to run large-scale association analyses that relate specific genetic variants and phenotypes to susceptibility to, or protection from, certain diseases. Against this background, it is not surprising that algorithms, database platforms, and IT tools have needed to evolve too, and the role of the bioinformatician is now, more than ever, center stage.
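To make the idea of an association analysis concrete, here is a minimal sketch of a single-variant test of the kind that biobank-scale studies repeat across millions of variants. It assumes a simple case-control design and a basic allelic chi-square test; the function name `allelic_association` and the allele counts are hypothetical and chosen purely for illustration. Real pipelines typically use regression models that adjust for covariates such as age, sex, and ancestry.

```python
import math


def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 d.f.) and odds ratio for a 2x2 allele-count table.

    Illustrative only: assumes all four counts are non-zero and applies
    no continuity correction or covariate adjustment.
    """
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    # Shortcut formula for a 2x2 table: n * (ad - bc)^2 / (product of margins)
    numerator = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2
    denominator = (
        (case_alt + case_ref)
        * (ctrl_alt + ctrl_ref)
        * (case_alt + ctrl_alt)
        * (case_ref + ctrl_ref)
    )
    chi2 = numerator / denominator
    # Upper tail of a chi-square with 1 d.f.: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    # Odds of carrying the alternate allele in cases relative to controls
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return chi2, p_value, odds_ratio


# Hypothetical allele counts for one variant (not real biobank data)
chi2, p_value, odds_ratio = allelic_association(
    case_alt=300, case_ref=700, ctrl_alt=200, ctrl_ref=800
)
print(f"chi2={chi2:.2f}, p={p_value:.2e}, OR={odds_ratio:.2f}")
```

An odds ratio above 1 suggests the alternate allele is more common in cases (possible susceptibility), while a value below 1 suggests it is more common in controls (possible protection). Because so many variants are tested at once, a stringent significance threshold is normally applied before any single result is taken seriously.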
Read this article to learn how bioinformatics is bringing biobanks to the forefront of scientific breakthroughs.