Individual Ancestry Prediction from Genetic Data

December 15, 2015

This collection contains three experiments that predict the ancestry of subjects in the publicly available Human Genome Diversity Project (HGDP) from their genetic data. Each subject is represented by a set of Single Nucleotide Polymorphism (SNP) genotypes. A SNP is a variation in a single nucleotide at a specific locus in the human genome; SNPs are the small set of genomic positions that vary across the human population, and they can carry signatures of variation across different geographic sub-populations. The experiments consider only seven broad continent-level populations, and the prediction for a subject is an estimated percentage for each of these populations. The algorithm is a simplified version of the well-known ADMIXTURE/STRUCTURE models in population genetics: starting from a pre-selected subset of ancestry-informative SNPs, it aggregates a likelihood score for each major population across all SNPs, assuming the SNPs are independent, and uses the aggregated scores to estimate the population proportions. Created by a Microsoft employee.
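
The aggregation step described above can be illustrated with a minimal Python sketch. It is not the code used in the experiments, and several details are assumptions made for illustration: genotypes are coded as 0, 1, or 2 copies of a reference allele, each population is summarized by a per-SNP allele frequency, genotype likelihoods follow a Hardy-Weinberg (binomial) model, and the summed log-likelihoods are normalized into percentages. The function names, population labels, and frequencies are hypothetical toy values, not HGDP data.

```python
import math


def genotype_log_likelihood(genotype, freq):
    """Log-probability of a genotype (0, 1 or 2 allele copies) given a
    population allele frequency, assuming Hardy-Weinberg proportions."""
    eps = 1e-6  # clamp the frequency so log(0) never occurs
    p = min(max(freq, eps), 1.0 - eps)
    return (math.log(math.comb(2, genotype))
            + genotype * math.log(p)
            + (2 - genotype) * math.log(1.0 - p))


def estimate_proportions(genotypes, allele_freqs):
    """Sum per-SNP log-likelihoods for each population (SNPs treated as
    independent) and normalize the totals into percentages.

    genotypes:    list of 0/1/2 values, or None for a missing call
    allele_freqs: dict mapping population label -> list of frequencies
    """
    scores = {}
    for pop, freqs in allele_freqs.items():
        scores[pop] = sum(
            genotype_log_likelihood(g, f)
            for g, f in zip(genotypes, freqs)
            if g is not None  # skip missing genotypes
        )
    # Subtract the maximum before exponentiating for numerical stability.
    max_score = max(scores.values())
    weights = {pop: math.exp(s - max_score) for pop, s in scores.items()}
    total = sum(weights.values())
    return {pop: 100.0 * w / total for pop, w in weights.items()}


# Toy input: three hypothetical populations and three SNPs.
toy_freqs = {
    "pop_A": [0.9, 0.8, 0.1],
    "pop_B": [0.2, 0.3, 0.7],
    "pop_C": [0.5, 0.5, 0.5],
}
toy_genotypes = [2, 2, 0]

proportions = estimate_proportions(toy_genotypes, toy_freqs)
for pop, pct in sorted(proportions.items()):
    print(f"{pop}: {pct:.1f}%")
```

Working in log space keeps the product over many SNPs from underflowing. Note that normalizing the aggregated likelihoods this way is only one simple choice; the full ADMIXTURE/STRUCTURE models fit per-individual mixture proportions iteratively rather than in a single pass.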