Review: Full Genomes Corp third party analysis of Veritas Genetics raw WGS data

In this post, I will provide my review of Full Genomes Corp’s service offering third-party analysis of raw data produced by Veritas Genetics’ $999 whole genome sequencing (Veritas myGenome). After I released my raw genome data to the public domain, FGC contacted me and offered to run my WGS data through their BAM processing pipeline at no cost. I naturally accepted and agreed to write a review.

This service from FGC includes three categories of analysis: mtDNA, YDNA, and autosomal ancestry. As of now, I have received my mtDNA and YDNA results; the autosomal analysis takes longer to produce and I will leave it out of scope for this review.

Getting Started

After creating an account on the FGC site, I needed to provide them with access to the BAM file that Veritas Genetics produced. My participation in the Personal Genome Project made this easy as I only had to give them the URL to my BAM file on the PGP public data repository.

A little more than two weeks later I received an email reporting that my results were ready. When I logged back in to FGC, a prominent link provided access to download all of my results in a single zip archive. This zip archive contained a readme file directing me to two PDF documents with further information: one focused on extracting private SNPs from YDNA results and the second describing the individual data files FGC returns, which I will get to below.

Mitochondrial DNA results

I have already had my full mitochondrial DNA sequenced by FamilyTreeDNA, so I did not expect to learn anything new from FGC’s data analysis, which produced two files. The first file contains a list of variants found in my mtDNA with respect to the Yoruba reference sequence by position. The second file contains my full mtDNA sequence in FASTA format.

The FASTA file took me by surprise, as it indicated a heteroplasmic length variant that FamilyTreeDNA had not come across (or had not informed me of) in their Sanger sequencing. FGC found a deletion at position 310, the loss of a T flanked by C repeats on both sides. I do not know whether this information will turn out to be relevant for me, but I prefer to have it.
[ADDED 20170306: I should have updated this sooner. I contacted the FGC team shortly after receiving my results to ask for more information about this reported heteroplasmy. After reviewing my data in more detail, FGC determined that based on the reads in my BAM file, my mitochondrial DNA does not show any heteroplasmy, and this errant result should not have appeared in my report.]

YDNA results

FGC grouped my YDNA results into two folders: YSTR and YSNP.

YSTR

YSTR results consisted of two output files generated from lobSTR. The first file contains roughly 3000 lines of data reporting identified YSTRs according to NIST/lobSTR standards, with some additional markers FGC has added to lobSTR.

The second file contains a subset of the first file including only those YSTR markers which FamilyTreeDNA tests and reports, counted according to FamilyTreeDNA’s standards. Mine reported values for 95 FTDNA-style markers.

Prior to whole genome sequencing I had only FTDNA’s 67 marker YSTR results combined with 23andMe’s v3 chip Y SNPs with which to determine my YDNA haplogroup, giving nothing more specific than the huge R1b M269 group. I have not yet found my YSTR results from FGC particularly useful, as not very many males from my line appear to have taken YDNA testing, so I do not have many data points to compare to. I do have several close matches on FTDNA’s 67 marker test who share variants of my surname. These have convinced me that I don’t need to consider non-paternity events along my direct male line going back at least 400 years, based on the known years when Paradis YDNA arrived in Canada from France.

Once more Paradis-descended men take YDNA tests like the Veritas myGenome, FGC Y-Elite, FTDNA Big Y or others, I expect this data to have more value in tracing drift across this line.
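
For readers curious how YSTR results from two men might be compared once more data exists, here is a minimal Python sketch. The marker values are invented placeholders, not my results or anyone else's, and the simple step-count distance is a deliberate simplification, not the method FTDNA or FGC actually uses. Multi-copy markers are ignored entirely.

```python
def str_distance(a, b):
    """Compare two YSTR profiles given as {marker_name: repeat_count}.

    Returns (distance, shared), where distance is the sum of absolute
    repeat-count differences over markers present in both profiles and
    shared is how many markers were compared.
    """
    shared = sorted(set(a) & set(b))
    distance = sum(abs(a[m] - b[m]) for m in shared)
    return distance, len(shared)


# Hypothetical profiles; the values are made up for illustration only.
profile_a = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}
profile_b = {"DYS393": 13, "DYS390": 23, "DYS19": 14, "DYS391": 10, "DYS388": 12}

distance, compared = str_distance(profile_a, profile_b)
print(f"{distance} step(s) apart over {compared} shared markers")
```

Only markers reported for both men are compared, so profiles from tests with different marker panels can still be lined up, at the cost of a less informative result.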

YSNP

YSNP results consisted of five separate files: two described as variant discovery reports, two as variant genotyping, and one haplogroup classification report containing output from yKnot that identifies my sample’s place in the ISOGG tree.

Haplogroup Classification

I have provided below a portion of my yKnot file showing the placement of my YDNA on the ISOGG tree back to the R1b M343+ branch. For the moment, I sit on the S1217+/Z295+ branch (ISOGG, Big Tree). I do not match any subclades of S1217+/Z295+ yet identified, but I will follow developments in this area, and, having my genome already sequenced, can place myself on future revised trees without the need for any further SNP testing.

*Extras: Z1518+, Y4010+, 50f2(P)+, Z14907+, PH3244*, Y2550+, P80+, CTS1789+, CTS12019+, L1228+, M3629+, Z3327+, Z28+, FGC5628+, CTS12440+, PF2372+, M162_1*, FGC5085+, Z13028+, P266+, Z12253+, L798+, DYS257_2+, Z28771*, P27.2_2+, Y2252+, CTS616+, CTS2646*, M118+, M236+, Y2754+, FGC20667*, M141+, L665+, L588+, Z14350+, P34_5+, Z6859+, Z889+, Z13537*, Z6171+, Z1237+, FGC756+, BY451+, P19_1*, P79*, PF2276+, Z16986+, M5220+, FGC1920+, Z12467+, Z1842+, V161.1+, V190+, CTS6911+, CTS2518+, FGC4872+, Y5185*, Y2986+, Z1101+, CTS32+, Z15165+, IMS-JST022457+, PF2779+, S730+, S504+, Z836*, Z14050+, IMS-JST029149+, M1994*, L990+, P198+, Z16208+, PF3126+, Z2182*
R1b1a2a1a2a1a1a
|Matches: S1217+, Z295+
|____R1b1a2a1a2a1a1
     |Matches: S230+, Z209+, S356+, Z220+
     |____R1b1a2a1a2a1a
          |Matches: Z272+
          |*No-calls: Z274?, S229?
          |____R1b1a2a1a2a1
               |Matches: Z195+, S227+
               |*No-calls: S355?, Z196?
               |____R1b1a2a1a2a
                    |Matches: DF27+, S250+
                    |____R1b1a2a1a2
                         |Matches: P312+, PF6547+, S116+
                         |____R1b1a2a1a
                              |Matches: L151+, PF6542+, L52+, PF6541+, P310+, PF6546+, S129+, P311+, PF6545+, S128+, PF6539+
                              |*No-calls: (being investigated as to placement: L11?, S127)?
                              |____R1b1a2a1
                                   |Matches: L51+, M412+, PF6536+, S167+
                                   |____R1b1a2a
                                        |Matches: L23+, PF6534+, S141+, L49.1+, S349.1+
                                        |____R1b1a2
                                             |Matches: M269+, CTS623+, CTS2664+, PF6454+, CTS3575+, PF6457+, CTS8728+, L1063+, PF6480+, S13+, CTS12478+, PF6529+, F1794+, PF6455+, L265+, PF6431+, L407+, PF6252+, L478+, PF6403+, L482+, PF6427+, L483+, L500+, PF6481+, L773+, PF6421+, YSC0000276+, L1353+, PF6489+, YSC0000294+, M520+, PF6410+, PF6399+, S10+, PF6404+, PF6505+, YSC0000225+, PF6409+, PF6411+, PF6425+, PF6430+, PF6432+, PF6434+, PF6438+, PF6475+, S17+, YSC0000269+, PF6482+, YSC0000203+, PF6485+, S3+, PF6494+, PF6495+, PF6497+, YSC0000219+, PF6500+, PF6507+, PF6509+, L150.1+, PF6274.1+, S351.1+
                                             |*No-calls: PF6443?
                                             |**Mismatches: CTS8591- (exp. +), CTS8665- (exp. +), FGC464- (exp. +), CTS10834- (exp. +), CTS11468- (exp. +), FGC49- (exp. +)
                                             |____R1b1a
                                                  |Matches: P297+, PF6398+, L320+
                                                  |____R1b1
                                                       |Matches: P25_3+, L278+, M415+, PF6251+
                                                       |**Mismatches: P25_1- (exp. +), P25_2- (exp. +)
                                                       |____R1b
                                                            |Matches: M343+, PF6242+
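
A listing like the one above is easy to turn into structured data. The following is a minimal Python sketch that assumes only the layout visible in the excerpt (a clade name on its own line, followed by "Matches", "No-calls" and "Mismatches" lines); the function name `parse_yknot` is my own, and this is not an official parser for yKnot output. It splits SNP lists on commas, so a parenthesised note such as the one on the R1b1a2a1a No-calls line would need extra handling.

```python
import re

# Matches lines such as "Matches: ...", "*No-calls: ...", "**Mismatches: ..."
FIELD = re.compile(r"\*{0,2}(Matches|No-calls|Mismatches):\s*(.*)")


def parse_yknot(text):
    """Return a list of clade dicts, most specific clade first."""
    clades = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("*Extras"):
            continue
        # Drop the tree-drawing prefix: a leading "|" and any "____" connector.
        line = line.lstrip("|").lstrip("_")
        m = FIELD.match(line)
        if m is None:
            # Anything that is not a field line is treated as a clade name.
            clades.append({"clade": line, "matches": [],
                           "no_calls": [], "mismatches": []})
        elif clades:
            key = m.group(1).lower().replace("-", "_")
            clades[-1][key] = [s.strip() for s in m.group(2).split(",") if s.strip()]
    return clades


SAMPLE = """\
R1b1a2a1a2a1a1a
|Matches: S1217+, Z295+
|____R1b1a2a1a2a1a1
     |Matches: S230+, Z209+, S356+, Z220+
     |____R1b1a2a1a2a1a
          |Matches: Z272+
          |*No-calls: Z274?, S229?
"""

for entry in parse_yknot(SAMPLE):
    print(entry["clade"], entry["matches"], entry["no_calls"])
```

With the data in this shape, checking a revised tree becomes a matter of looking up newly named SNPs against the lists for each clade.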

Variant Genotyping

The first variant genotyping file provides my results at a little over 54,000 known SNPs. The second variant genotyping file provides results for an additional 16,600 SNPs. The results provided include counts of each base called at the SNP position as identified in my BAM file data, the SNP position on the chromosome, and the build 37 reference sequence call at that position. I do not know the criteria used to place each SNP in each file. I consider these files more as an intermediate step in the data analysis, used to generate the other returned files, but I expect I will find some more direct use for them as well.
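
To illustrate how per-base counts and a reference call can be turned into a result for one SNP position, here is a minimal Python sketch. The thresholds (`min_depth`, `min_fraction`) and the function name `call_base` are assumptions of mine for illustration; I do not know the criteria FGC actually applies, and the record layout here is not the layout of their files.

```python
def call_base(ref, counts, min_depth=4, min_fraction=0.9):
    """Call one haploid (Y chromosome) position from read counts.

    ref    -- reference base at this position, e.g. "C"
    counts -- reads supporting each base, e.g. {"A": 0, "C": 12, "G": 0, "T": 0}

    Returns "ref", "alt:<base>", or "no-call" when coverage is too low
    or no single base clearly dominates.
    """
    depth = sum(counts.values())
    if depth < min_depth:
        return "no-call"
    base, supporting = max(counts.items(), key=lambda kv: kv[1])
    if supporting / depth < min_fraction:
        return "no-call"
    return "ref" if base == ref else f"alt:{base}"


# Hypothetical counts, not taken from my files.
print(call_base("C", {"A": 0, "C": 12, "G": 0, "T": 0}))
print(call_base("C", {"A": 0, "C": 1, "G": 0, "T": 14}))
print(call_base("G", {"A": 1, "C": 0, "G": 1, "T": 0}))
```

Because the Y chromosome is present in a single copy, a clean position should show nearly all reads agreeing on one base, which is why the sketch refuses to call mixed positions.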

Variant Discovery

The two variant discovery reports provide the most detailed and useful information in my opinion, as they include quality rankings on variants as well as the specific details of variants such as SNPs and INDELs. Even more usefully, these files contain the results for the kits most similar to mine within FGC’s database, which can help in identifying private variants that originated in much more recent genealogical times. Because these files include data from others as well as my own, I cannot comfortably release them to the general public without redacting other individuals’ data. For public-facing purposes, if someone wanted to run comparisons against my detailed data, I would most likely refer them to the Big Tree (if R1b) or advise that they pursue their own analysis with FGC directly.

The how-to document FGC provides with this analysis (Reading the Full Genomes analysis reports) explains working with this data much better than I could in my own words. The inclusion of quality scores greatly simplifies the process of narrowing down on key SNPs, and I look forward to spending more time with this data, probably after more Paradis males have had next generation YDNA sequencing, as my results appear rather distant from the nearest matching males in any database except for the one Paradis I’ve found with a Big-Y at FTDNA.

Data Sharing

It pleased me to see that FGC offers a very quick and easy method to share your results with any email address you provide. I took advantage of this to share my data with Alex Williamson for inclusion in the Big Tree to aid in reconstructing the phylogeny of the R1b tree under R-P312. For now, my Big Tree entry sits in the R-Z295/S1217 paragroup, awaiting more submissions sharing SNPs with me to help identify a terminal SNP more recent than Z295, which is estimated to be 3,900 years old. I don’t match any SNPs identified as downstream of Z295 on the FTDNA tree, the ISOGG tree, or the YFull tree. I encourage any other Z295 or Paradis/Pardy/Paradee/etc male to get your YDNA analyzed and shared with these projects so we can better place ourselves on the tree.

More Info

If this has interested you, I highly recommend you take a look at another review and description of FGC’s analysis.

6 thoughts on “Review: Full Genomes Corp third party analysis of Veritas Genetics raw WGS data”

  1. Ankit Sharma

    Hi Brian, I am thinking of taking this genome test, but I am not sure whether to go with Veritas, 23andMe or Helix. Also, I don’t know what report will be generated. Can you please share their report? What data do I get from it? And will my physician be able to read it?
    Thanks
    Ankit

    1. Brian Pardy Post author

      Hello Ankit,

      Thank you very much for commenting. First I have to say that I have not used the genome test from Helix and I do not know anybody who has tried their test so far. I do think they have a very interesting business model where you pay a low price for sequencing, and then only purchase the specific analysis that interests you.

      I have used both 23andMe and Veritas Genetics. About Veritas, I love the company and I love that they provide whole genome sequencing, but they can only offer their product directly in certain countries (I know that it is available in the USA and in China). If you are not in the US or China and you want to use Veritas, they provide a list of international distributors that may serve your country, which you can view on their Veritas Genetics Distribution Partners page. I have asked one of my contacts there for a sample report, and once I receive something I would be happy to send it your way if you are still interested. Veritas provides genetic counseling services when rare disease variants are found, and they would provide reports that would be usable by your physician.

      23andMe has a very nice page showing their sample reports, and they are able to directly offer their services in most countries. You could print out your 23andMe report to show your physician, or show it to them using a smartphone or tablet, but they don’t provide a direct way for your physician to log in and view your results unless you show it to them on your own. Many 23andMe users also submit their data to Promethease later to get additional health-related information that 23andMe does not provide.

      I think the most important part of making your decision will be finding out which of the genome testing companies do business in your country. Then you can decide if you want to pay more money for a whole genome sequence like Veritas provides, or less money for a SNP-chip test like 23andMe provides. Good luck!

  2. Pingback: Genomic Explorer, a new genomic data analysis/interpretation tool for consumer WGS and SNP chip users | Pardy DBA

  3. Caroline

    Hi,
    I’m interested in your experience with Veritas. I ordered whole exome sequencing from them back in February, and I have yet to get the report from them. I’ve been in contact with many customer service reps, and every 3-6 weeks, they say it’ll be another “few weeks”. Their tech team was ready to send me the genome raw data file back in July, which means my genome was sequenced by then, just no analysis had been completed. Without the analysis, they could not hand over my raw data file. I have been extremely patient. This whole process started off poorly, with the initial package for my spit sample being lost in the mail for a month. I figured after 12 weeks, their estimate posted on their website at the time I ordered, I would have a genome report. It’s been 38 weeks – I really think I will never get a report, or raw data, and it feels like I just gave a company access to my DNA for nothing in return, and that this is the beginning of them going belly up.
    Your posts seem to indicate you have good contact with the company, but was your experience anything like mine? Sure, they have been very apologetic and mostly quick to reply to my emails requesting updates, but the delay has gotten a bit ridiculous.
    -Caroline

    1. Brian Pardy Post author

      Hi Caroline,

      Thank you for posting. I’ll share what I can about my experience, but I should mention up front that I was literally the first whole genome sequencing customer for Veritas, and I purchased the sequencing about two years ago, so I think it is probably safe to say that my experience won’t be typical for a more recent customer. I believe Veritas first contacted me around October of 2015 (after announcing intent to offer WGS services to Personal Genome Project participants in mid-2015), and then after placing an order in Nov 2015 I received my kit in December, and then received my results in the middle of April 2016. So I would estimate my time from payment to results was around 20 weeks or so.

      I’ve asked around a bit and received an email address for a contact at Veritas that I would like to send your way. It looks like you included an email address in your comment, so I will email that to you directly, hopefully that is acceptable for you.

      Cheers!
      -Brian

  4. Joachim Raese MD

    Brian,

    I have the same experience that Caroline describes.
    Could you forward me the email contact you sent to Caroline?

    Jack
