
Genomic Explorer, a new genomic data analysis/interpretation tool for consumer WGS and SNP chip users

I have written before about having my whole genome sequenced through the Personal Genome Project and releasing my results to the public domain, and about the third-party data analysis available to people who have access to their raw whole genome sequencing data. I recently learned of a new option for analyzing consumer WGS raw data, called Genomic Explorer. This new tool provides analysis and interpretation of consumer DNA data, covering both SNP chip results, such as those from 23andMe or AncestryDNA, and whole genome sequence data.

Background

Few third parties offer detailed analysis or interpretation of consumer genomic data. One of the best known, Promethease, has existed for a while but can appear overwhelming to a novice user. Individuals sharing their genome data through OpenHumans also have access to a new tool called Genevieve (see my public WGS Genevieve report here).

Now that enough people have sequenced their genomes, more third-party options have begun to appear. Some, like FGC, specialize in genealogical analysis; others, like DNA.land, provide whole genome imputation based on SNP chip data and help users find others with matching DNA segments (GEDMatch offers another example of a relative-finding tool).

This new tool, Genomic Explorer, focuses primarily on non-medical trait interpretation for users in the United States (e.g. educational attainment, male-pattern baldness, motion sickness, endurance performance, caffeine/alcohol/tobacco use, and personality traits like agreeableness, openness, and neuroticism). Non-US users may have access to medical interpretation, depending on the laws in their jurisdiction. At this time, Genomic Explorer accepts SNP chip data from 23andMe or AncestryDNA and has opened a WGS interpretation trial limited to 30 users. After uploading my 23andMe data, I followed the links to request access to the WGS trial and gave the team a link to my public domain WGS data, sequenced by Veritas Genetics as part of the Harvard Personal Genome Project. Genomic Explorer accepted me into the trial and told me that importing my WGS data will take about a week, so I will hold off on a full review until I have had a chance to run the tool against my whole genome. In the meantime, they offer a demonstration using sample data.

Who can participate?

I contacted the Genomic Explorer team to ask for more details about their offering and who can participate. Their response:

– 23andMe & Ancestry.com users can upload the genome data and use GENOMIC EXPLORER for FREE.
– For existing WGS data holder, the company is offering the trial use of GENOMIC EXPLORER with uploading your WGS for FREE (for the future, there would be some fee to upload), but the spots are limited only for 30 users. If you have your own WGS data and are interested in participating in the trial, you can reach to the team via info@awakens.tokyo.

They have not said whether they can accept and process imputed whole genome data such as DNA.land provides. Imputed genotypes are statistical inferences drawn from the SNPs a chip actually measured plus a reference panel of other genomes, not direct measurements, so I would be cautious about trusting them beyond a certain point anyway. If you have both 23andMe/Ancestry data and imputed DNA.land data based on that SNP chip, I suggest uploading the SNP chip data rather than the imputed WGS dataset.

I hope that other PGP participants, people with Veritas Genetics MyGenome data, and those who have had FGC perform whole genome sequencing will take advantage of this opportunity to put those large data files to further use and check out Genomic Explorer. We all win as this market grows and attracts competition.

Getting started

Visit the Genomic Explorer signup page to create an account, upload your SNP chip data, and ask about the WGS trial. My 23andMe data processed quickly and was usable in the tool within minutes.

What’s the catch?

As a newly released service, Genomic Explorer has asked trial users for feedback on its site, via online submission and potentially a user interview via Skype. This gives anyone interested in trait analysis of consumer DNA results a chance to provide input into the tool's design and user experience, as well as a chance to gain insights otherwise unavailable to them, at a great price (free). In the future they intend to charge a fee for WGS uploads, and I believe they will also move into performing sequencing directly instead of relying only on data produced elsewhere.

Conflicts

I have no affiliation with the company behind Genomic Explorer other than participating as one of the 30 whole genome sequence trial users and offering them feedback.

Take my $1000 genome, please!

I have just released my whole genome sequence (WGS) to the public domain (CC0, no rights reserved) via the Harvard Personal Genome Project (PGP). I believe my data represents both the first $1000 genome-with-analysis ever performed and the first $1000 genome released for public use. Thank you to both the PGP and Veritas Genetics for making this possible. I would specifically like to thank Mirza Cifric, CEO of Veritas Genetics, and Christen Hart of Veritas for acting as my liaison and handling my frequent email requests for status updates. From my PGP profile page you can download my genome data, either as a BAM file (17.8 GB) or in VCF format (383 MB), as well as my 23andMe (v3, pre-FDA letter) SNP chip data and my full mitochondrial DNA sequence as tested by FamilyTreeDNA (since deposited in GenBank as accession ID KU530226).

Why would I do this?

Put simply, I wanted to make a contribution to science. Further, ever since working in the 2000s for a genomic drug development company, where I met and later married a bioinformatician, I’ve had an interest in the potential applications of genomics, from what some then called the “pharmaceutically tractable genome” to today’s “precision medicine”. That employer spun off 454 Life Sciences, developer of an early DNA sequencing platform (pyrosequencing) and the first company to complete and make public an individual human genome, so I find it fitting that an ex-employee, and one from the IT staff rather than the scientific team, would release the first public $1000 genome.

I would like to see science make good use of my genetic data. Relatively few whole genome sequences are available for scientific research free of privacy or intellectual property encumbrances. As a PGP participant, by making my genome available I hope not only to directly support scientific research but also to aid the PGP’s other research goal: identifying the risks and consequences of having one’s genetic data available to the public without any effort at de-identification or obfuscation. I have the benefit of living in one of the few states whose genetic information laws go beyond the US federal Genetic Information Nondiscrimination Act by placing restrictions on life insurance providers and others.

After my first blood work with my current primary care doctor, she told me that I had the absolute worst blood vitamin D levels she had ever seen, along with the best HDL/LDL cholesterol levels she had seen. This has a genetic basis rather than coming from anything I have pursued through diet or lifestyle. In fact, my cholesterol should frankly be terrible, and although I live only a few miles south of the 45th parallel, I get enough sun that lack of exposure alone can’t account for my vitamin D levels. My 23andMe data, when run through Promethease, reveals a train wreck throughout the vitamin D pathway, as well as many variants known to increase HDL cholesterol. With my whole genome sequence released for any imaginable use, I hope that researchers can either spot something unique enough on its own or work my data into genome-wide association studies (GWAS) to tease out drug targets or relevant alleles.

As a PGP participant I have filled out the PGP’s phenotype surveys to help associate phenotypes with my genotype. I have done the same at OpenHumans and remain willing to provide further phenotype data on request. I will attend the GET Conference and GET Labs 2016 at the end of April and sign up for some other research studies.

You can also find my autosomal SNP chip data on GEDMatch as kit M205442, my YDNA data at ysearch under ID CZVXU, and my full mitochondrial DNA sequence in GenBank as KU530226. Services report my mtDNA haplogroup as U2e1*, but I hope the next build of PhyloTree will note the mtDNA SNPs I carry beyond those defining U2e1 and define a new haplogroup: with my deposition, several mtDNA sequence motifs now have three independent depositions, enough to justify naming a new U2e1* branch. I have traced much of my genealogy several generations back and have several apparent triangulation groups’ worth of matches. Genealogy traces my surname back to the Paradis family in Quebec but hits a brick wall in the mid-1800s, though my YDNA 67-STR results at FTDNA show close matches with other tested Paradis males who have traceable lineages back to Pierre Paradis of Mortagne-au-Perche, France (d. 1675), apparent patriarch of the New World Paradis/Pardy lines. Several of my lines go back to early US colonials (Trowbridge provides my nexus to Charlemagne, though I’ve found no Mayflower ancestry), as well as to Creoles of mixed (French, German, and other) ancestry along the German Coast in Louisiana. I also have a bit of direct Scottish (Halcro) ancestry along with other Scots-Irish lines.

How can a security and privacy aware individual choose to release this data?

The recognition that sequencing continues to fall in price and will eventually become ubiquitous to the point of banality, coupled with the fact that we shed DNA all day long, convinces me that any genetic privacy we may believe we have exists only for a disappearing moment in history, and only in the absence of a determined adversary willing to put some effort into collection. Setting aside the issue of disclosing one’s unique genetic signature to third parties, simply knowing what secrets sit in one’s own DNA empowers some individuals but makes others uneasy. Some people do not want to know whether their genetics give them a high probability of Alzheimer’s or a predisposition to cancer. Some regulators believe they cannot trust the public to make responsible decisions once given knowledge of the forbidden fruit in their genetic code.

Because science does not yet know enough about the complex interactions across the genome to determine the exact medical significance of every variant, inside genes or outside them, the interpretation of your static genome can and will change as new genetic associations are discovered and previously reported associations fail to replicate. By donating my sequence to an unencumbered public dataset, I hope to help speed up this process and embolden others to take this step to share for science, with eyes wide open to the limitations of data de-identification and the possibilities of personalized medicine. Whether you share your genome through the PGP, your microbiome through uBiome, the next virus you catch through GoViral, your FitBit data through OpenHumans, your direct-to-consumer SNP chip results through OpenSNP, or any other data through any other platform, each of us has a unique chance to contribute to research to better lives today and our species tomorrow.

What does whole genome sequencing give a non-expert that SNP genotyping doesn’t?

Several years ago I took 23andMe’s genotyping test. Because this occurred before the FDA sent 23andMe a nastygram barring the company from reporting health-relevant results, I received a decent amount of information relevant to health issues. So why bother having a whole genome sequence done? Put simply, a WGS has more long-term value than a genotyping SNP chip. As 23andMe V2 customers discovered, SNP results based on older data lose relevance as science learns more about genetic variants and as new builds of the human genome are released. New genome scaffolds render obsolete what we believed we knew about older SNPs, and newly discovered SNPs show more meaningful disease associations than those believed to associate with diseases years ago, when the chip was designed. With my whole genome sequence in my pocket, I am better positioned for the future: I can look up newly reported variants whether or not the designer of a SNP chip’s probes foresaw the relevance of that genetic region. If I develop cancer in the future, I or my medical providers can compare the sequence of a tumor cell to my genome sequence, easing the process of identifying genes that may have gone haywire and caused the cancer, and potentially informing the selection of anti-cancer drugs that could save my life. Further, because I ordered and released my whole genome sequence, scientists working with public datasets can perform more useful analyses than my SNP chip data alone would allow.

Go use my data!

Updates

Mike Cariaso has graciously run Promethease against my WGS data. Unfortunately, Promethease reports expire after a number of days, so that report is no longer accessible.