I have just released my whole genome sequence (WGS) to the public domain (CC0, no rights reserved), via the Harvard Personal Genome Project (PGP). I believe that my data represents both the first $1000 genome-with-analysis ever performed as well as the first $1000 genome released for public use. Thank you to both the PGP and to Veritas Genetics for making this possible. I would like to specifically thank Mirza Cifric, CEO of Veritas Genetics and also Christen Hart of Veritas for acting as my liaison and dealing with my frequent email requests for status updates. From my PGP profile page you can download my genome data (as a BAM file (17.8GB) or in VCF format (383MB)), as well as my 23andMe (v3, pre-FDA letter) SNP chip data and my full mitochondrial DNA sequence as tested by FamilyTreeDNA (since deposited in GenBank as accession ID KU530226).
Why would I do this?
Put simply, I wanted to make a contribution to science. Further, since working for a genomic drug development company in the 2000s where I met, then married, a bioinformatician, I’ve had an interest in the potential applications of genomics, from what some then referred to as the “pharmaceutically tractable genome” to today’s “precision medicine”. That employer spun off an early DNA sequencing platform (454 Life Sciences pyrosequencing, the first company to complete and make public an individual human genome), and I find it fitting that an ex-employee, and one from the IT staff, not even the scientific team, would release the first public $1000 genome.
I would like to see science make some good use of my genetic data. Only a relatively small number of whole genome sequences available for scientific research without privacy or intellectual property encumbrances exist. As a participant in the PGP, by making my genome available I hope not only to directly support scientific research but to aid the PGP’s other research goal to identify the risk and consequences of having one’s genetic data available to the public without any effort at de-identification or obfuscation. I have the benefit of living in one of the few states with genetic information laws that exceed the US Federal Genetic Information Nondiscrimination Act in placing restrictions on life insurance providers and others.
After my first blood labs with my current primary care doctor, she told me that I had the absolute worst blood levels of vitamin D that she had ever seen, along with the best HDL/LDL cholesterol levels she had seen. This comes from a genetic basis, not anything that I have pursued through diet or lifestyle. In fact my cholesterol should be, frankly, terrible, and though I live only a few miles south of the 45th parallel I get enough sun that lack of exposure can’t account for my vitamin D levels alone. My 23andMe data, when run through Promethease, reveals a train wreck throughput the vitamin D pathway, as well as matching many variants known to increase HDL cholesterol. With my whole genome sequence released for any imaginable use, I hope that researchers can either spot something unique enough on its own or work my data into genome wide association studies (GWAS) to tease out some drug targets or relevant alleles.
As a PGP participant I have filled out the PGP’s phenotype surveys to help associate phenotypes with my genotype. I have done the same at OpenHumans and remain willing to provide further phenotype data on request. I will attend the GET Conference and GET Labs 2016 at the end of April and get signed up with some other research studies.
You can also find my autosomal SNP chip data on GEDMatch as kit M205442, my YDNA data at ysearch under id CZVXU, and my full mitochondrial DNA sequence in GenBank as KU530226 (though services report my mtDNA haplogroup as U2e1*, I hope the next build of PhyloTree will note the mtDNA SNPs I carry extraneous to U2e1 and define a new haplogroup as with my deposition several mtDNA sequence motifs now have three independent depositions, enough to justify naming a new U2e1* branch). I have much of my genealogy traced several generations back and several apparent triangulation groups worth of matches. Genealogy traces my surname back to the Paradis in Quebec but hits a brick wall in the mid 1800s, though my YDNA 67-STR results at FTDNA show close matches with other tested Paradis males who have traceable lineages back to Pierre Paradis of Mortagne-au-Perche, France (d. 1675), apparent patriarch of new world Paradis/Pardy lines. Several of my lines go back to early US colonials (Trowbridge provides my nexus to Charlemagne, though I’ve found no Mayflower descendents), as well as mixed ancestry (French/German/more) Creoles along the German Coast in Louisiana. I also have a bit of direct Scottish (Halcro) ancestry along with other Scots-Irish.
How can a security and privacy aware individual choose to release this data?
For me, the recognition that sequencing continues to fall in price and will eventually become ubiquitous to the point of banality, coupled with the fact that we shed DNA all day long convinces me that any genetic privacy we may believe we have now exists only for a disappearing moment in history and only in lieu of a determined adversary willing to put some effort into collection. Setting aside the issue of disclosing one’s unique genetic signature to third parties, simply knowing what secrets sit in one’s own DNA empowers some individuals but makes others uneasy. Some people do not want to know if their genetics give them a high probability of Alzheimers, or a disposition to cancer. Some regulators believe they cannot trust the public to make responsible decisions once given knowledge of the forbidden fruit in their genetic code. Because science does not yet know enough about the complex interactions of all parts of the genome to determine the exact medical significance of every gene or non-gene variant, the interpretation of your static genome can and will change with the ongoing discovery of new genetic associations and with failures to replicate previously reported associations. By donating my sequence to an unencumbered public dataset I hope to help speed up this process and embolden others to take this step to share for science, with eyes wide open as to the limitations of data de-identification and possibilities of personalized medicine. Whether you share your genome through the PGP, your microbiome through uBiome, the next virus you catch through GoViral, your FitBit data through OpenHumans, your direct to consumer SNP chip results through OpenSNP, or any other data through any other platform, each of us has a unique chance to contribute to research to better lives today and our species tomorrow.
What does whole genome sequencing give a non-expert that SNP genotyping doesn’t?
Several years ago I took 23andMe’s genotyping test. As this occurred prior to the FDA sending 23andMe a nastygram barring them from reporting health-relevant results, I received a decent amount of information relevant to health issues. So why bother having a whole genome sequence done? To put it simply, a WGS has more long-term value than a genotyping SNP chip. As 23andMe V2 customers discovered, as time moves on and science learns more about genetic variants, and as new builds of the human genome get released, SNP results based on older data lose their relevance. New genome scaffolds obsolete what we believed we knew about older SNPs. New SNPs get discovered with more meaningful disease associations than those believed to associate with diseases years ago during chip design. With my whole genome sequence in my pocket, I have better positioning for the future as I can look up newly-reported variants going forward whether or not the designer of the probes on a SNP chip foresaw the relevance of that genetic region. If I develop cancer in the future, I or my medical providers can compare the sequence of a tumor cell to my genome sequence, easing the process of identifying genes that may have gone haywire and caused cancer, and potentially informing the selection of anti-cancer drugs that could save my life. Further, by ordering and releasing my whole genome sequence, scientists working with public datasets can perform more useful analyses than those available simply from releasing my SNP chip data.
Go use my data!
Mike Cariaso has graciously run Promethease against my WGS data.
Results here. Unfortunately Promethease results expire after a number of days, rendering this report now inaccessible.