Thursday, 8 September 2016

Worldwide diversity based on 3.2 million X chromosome markers

Genetic diversity tests are usually done using around 300-500 thousand markers.  It is, however, possible to use many more markers (SNPs) by drawing on already available data from the 1000 Genomes Project.  The downside is that we have only a few populations; the upside is that we see the big picture accurately, without the risk of bad sampling.

I made this test using Chromopainter and Finestructure.  Unfortunately, Chromopainter is a rather inefficient tool and cannot make use of the available computing resources (threads, memory).  Without this drawback I would have used 25 million markers instead of only 3.2 million.

The process:

1 Vcftools, parameters --remove-indels --chr 23
2 Haplotyping with HAPI-UR using all samples; run three times and combined into a consensus
3 Manually selected random samples, 10-20 from each population
4 Chromopainter,  without specifying donor haplotypes
5 Finestructure  with run parameters 30000/300000
6 MDS using Past.
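
Step 3 above can be approximated in code. The following is only a minimal sketch, not the procedure actually used (the selection in this post was made by hand): it assumes a hypothetical `panel` mapping of sample ID to population label, draws up to 20 samples per population with a fixed seed, and skips populations with fewer than 10 samples. The function name, the seed and the skip rule are all assumptions made for illustration.

```python
import random


def pick_samples(panel, min_n=10, max_n=20, seed=0):
    """Pick up to max_n random sample IDs per population.

    panel: dict mapping sample ID -> population label (hypothetical input).
    Populations with fewer than min_n samples are skipped (an assumption).
    """
    rng = random.Random(seed)  # fixed seed so the draw is repeatable

    # Group sample IDs by population.
    by_pop = {}
    for sample_id, pop in panel.items():
        by_pop.setdefault(pop, []).append(sample_id)

    chosen = {}
    for pop, ids in sorted(by_pop.items()):
        if len(ids) < min_n:
            continue
        k = min(max_n, len(ids))
        # Sort before sampling so the result does not depend on input order.
        chosen[pop] = sorted(rng.sample(sorted(ids), k))
    return chosen


if __name__ == "__main__":
    # Toy panel with made-up IDs; not real 1000 Genomes sample names.
    panel = {f"FIN{i:03d}": "FIN" for i in range(99)}
    panel.update({f"GBR{i:03d}": "GBR" for i in range(15)})
    panel.update({f"XYZ{i:03d}": "XYZ" for i in range(5)})
    for pop, ids in pick_samples(panel).items():
        print(pop, len(ids))
```

A seeded draw makes the subset reproducible, which a manual pick is not; the resulting ID lists could then be written to a file and used to subset the phased data.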

Additionally, I ran Vcftools with the parameters --keep-only-indels and --chr 23.  The result was filtered and biallelic deletions (CN=0) were counted.  Male results were treated as biallelic, so CN=0 should give the number of effective deletions in both cases, for females and males.
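
The counting step might look roughly like the sketch below. It rests on an interpretation that is not spelled out in the text: a site counts for a sample only when every called allele is a deletion, so a female needs the deletion on both X copies and a male (a single haploid call) needs it on his one copy. Treating an ALT allele as a deletion when it is `<CN0>` or shorter than REF is likewise an assumption, and the function names are made up for illustration.

```python
def is_deletion(ref, alt):
    """True if an ALT allele removes sequence relative to REF (assumed rule)."""
    if alt.startswith("<"):
        return alt == "<CN0>"  # symbolic copy-number-zero allele
    return len(alt) < len(ref)


def count_zero_copy_deletions(vcf_lines):
    """Count, per sample, the sites where every called allele is a deletion."""
    samples = []
    counts = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]
            counts = {s: 0 for s in samples}
            continue
        ref, alts = fields[3], fields[4].split(",")
        # Allele index 0 is REF; ALT alleles are numbered from 1.
        deleted = {str(i + 1) for i, alt in enumerate(alts) if is_deletion(ref, alt)}
        gt_pos = fields[8].split(":").index("GT")
        for sample, call in zip(samples, fields[9:]):
            gt = call.split(":")[gt_pos]
            alleles = gt.replace("|", "/").split("/")
            # Haploid male calls have one allele, female calls have two.
            if all(a in deleted for a in alleles):
                counts[sample] += 1
    return counts


if __name__ == "__main__":
    # Tiny made-up VCF: F1 is a female column, M1 a male (haploid) column.
    vcf = [
        "##fileformat=VCFv4.1",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tF1\tM1",
        "X\t100\t.\tATG\tA\t.\tPASS\t.\tGT\t1|1\t1",
        "X\t200\t.\tC\tCTT\t.\tPASS\t.\tGT\t1|1\t1",
        "X\t300\t.\tG\t<CN0>\t.\tPASS\t.\tGT\t0|1\t1",
        "X\t400\t.\tTA\tT\t.\tPASS\t.\tGT\t.|.\t0",
    ]
    print(count_zero_copy_deletions(vcf))
```

Per-population averages then follow by summing these counts over the samples in each group and dividing by the group size.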


Finestructure

MDS done by Past:

All of the pictures above can be downloaded at higher resolution here.
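
For readers who want to reproduce an MDS plot outside Past, here is a minimal sketch of classical (metric) MDS using NumPy. It is a stand-in technique: which MDS variant and which input matrix were used in Past is not stated in this post, so both the choice of classical MDS and the assumption that a symmetric distance matrix is available are mine.

```python
import numpy as np


def classical_mds(dist, n_components=2):
    """Classical MDS: embed points so that Euclidean distances approximate dist.

    dist: symmetric (n, n) matrix of pairwise distances.
    Returns an (n, n_components) array of coordinates.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-centre the squared distances to obtain a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip tiny negative eigenvalues caused by rounding or non-Euclidean input.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


if __name__ == "__main__":
    # Toy example: four corners of a 3 x 4 rectangle.
    pts = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
    dist = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
    print(classical_mds(dist))
```

The recovered coordinates match the original layout only up to rotation and reflection, which is why MDS plots from different tools can look mirrored while showing the same structure.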

Deletions per 3.2 million markers (averages per sample):

The British subgrouping was gathered from the internet and may be unreliable.  The Finnish subgrouping divides the 99 samples into those with the highest Siberian admixture, the "most Finnish" / local group, those closest to ancient Corded Ware samples, and the rest.  The last Finnish group includes all outliers.
