This is a departure from what I usually write about, but technically it’s also about databases: GEDCOMs and genetic ones. This post will cover a general strategy to get started doing your own genetic genealogy work. I appreciate any comments you may have. If anyone is interested, I may write future posts on suggested tools and other tips.
Briefly, genetic genealogy is the practice of supplementing traditional paper genealogy with genetic information. By doing so you can extend your family tree further, find distant (sometimes extremely distant) relatives and help confirm the details found in your genealogy research. If you were adopted or have known NPEs (non-paternity events) in your line back a few generations, this may be the only way to track down your biological ancestors. In outline, the strategy is:
- Do as much genealogy as you can on paper
- Get yourself (and possibly other close relatives) tested by one of the well-known companies whose tests enable this work
- Make contact with your matches as identified by those companies
- Compare family trees with your matches
- Share your genetic ancestry data in other places to broaden the scope of potential matches
- Extend your tree with the results of research done by your matches on your shared lines
- Make more contacts and use your previously confirmed ancestors to triangulate on your unknown matches
Step 1: Do Genealogy
So many others have written so much about getting started with and getting better at genealogy that I’m not going to cover this step in very much detail here. Do a few web searches, read what others have to say, and check for “how to” articles on any commercial genealogy sites you join.
The best way, in my opinion, to get started with genealogy is to stand on the shoulders of giants. Someone in your family, maybe a grandparent or second cousin, probably already does genealogy research and would be happy to share their data. But in case you can’t find someone like that or just want to get started on your own, here’s a little advice.
Make an account on ancestry.com. They simply have one of the best, easiest-to-use archives of vital records, wills, immigration entries, military records, newspaper articles and so on. You can start with a 14-day free full-access subscription and try to nail down as much as possible, then decide whether to subscribe depending on how much progress you’re making.
The mid-term goal of this genealogy work is to produce a GEDCOM file, which is a database of people, their relationships, and source citations back to primary documents that confirm the relationship claims made in the file. You will then upload this file to various sites to share your research and help others find their match to you. You can optionally privatize the file so that people born after 1900 have their names hidden, to avoid revealing information about other people who may not share your enthusiasm for finding your roots.
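To make the privatization step concrete, here’s a minimal Python sketch that hides the names of individuals born after a cutoff year. It assumes a simplified GEDCOM layout (each record begins with a level-0 line, names sit on `1 NAME` lines, and the birth year appears in a `2 DATE` line under `1 BIRT`). The function name `privatize_gedcom` and the sample people are hypothetical; a real privatize option also has to deal with people who have no recorded birth date, which this sketch leaves untouched.

```python
import re

def privatize_gedcom(lines, cutoff_year=1900):
    """Return a copy of GEDCOM `lines` with names hidden for people born after cutoff_year."""
    # Group the lines into records; every record starts with a level-0 line.
    records, current = [], []
    for line in lines:
        if line.startswith("0 ") and current:
            records.append(current)
            current = []
        current.append(line)
    if current:
        records.append(current)

    out = []
    for record in records:
        # Find the birth year inside a "1 BIRT" / "2 DATE ..." structure.
        birth_year, in_birth = None, False
        for line in record:
            level, _, rest = line.partition(" ")
            if level == "1":
                in_birth = rest.startswith("BIRT")
            elif level == "2" and in_birth and rest.startswith("DATE"):
                years = re.findall(r"\b\d{4}\b", rest)
                if years:
                    birth_year = int(years[-1])
        private = (record[0].endswith(" INDI") and birth_year is not None
                   and birth_year > cutoff_year)

        # Copy the record, replacing the NAME line and dropping its sub-lines
        # (GIVN, SURN, ...) when the person should be private.
        in_name = False
        for line in record:
            level = line.split(" ", 1)[0]
            if level == "1":
                in_name = line.startswith("1 NAME")
            if private and in_name:
                if level == "1":
                    out.append("1 NAME Private /Private/")
                continue
            out.append(line)
    return out

# Hypothetical sample: John is born after 1900, Mary is not.
sample = [
    "0 @I1@ INDI",
    "1 NAME John /Smith/",
    "2 GIVN John",
    "2 SURN Smith",
    "1 BIRT",
    "2 DATE 12 MAR 1925",
    "0 @I2@ INDI",
    "1 NAME Mary /Jones/",
    "1 BIRT",
    "2 DATE ABT 1850",
    "0 TRLR",
]
print("\n".join(privatize_gedcom(sample)))
```

Only the name lines change; relationships and source citations stay intact, which is what keeps the file useful to your matches.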
While you work on your genealogy, go ahead with the next step, DNA testing, because you’ll spend a while waiting for your results.
Step 2: Get tested
You have several choices for testing. The big three companies are 23andMe, FamilyTreeDNA and AncestryDNA, but several other options exist for specialized use. I highly recommend 23andMe, for reasons I’ll explain below, but I’ll give some information about each. All three are based in the USA, so the longer your family has been in the US, the more matches you will find (see digression below).
23andMe
23andMe is simply your best choice. For the same price, $99, you will receive genetic information about your health at the same time you receive information useful for genetic genealogy. 23andMe has busy community forums covering health, ancestry and genealogy, but the best part for our purposes is that they test more markers than the other options (the other companies specifically do not test anything implicated in human health). You can also download your raw genetic data and have it processed by FamilyTreeDNA for a lower fee than having FTDNA test you directly.
The 23andMe test is a saliva test. They will send you a kit including a tube, into which you spit about a teaspoon of saliva, close the top, snap the paraffin seal to release the stabilization/lysing buffer solution and then send it back in a prepaid package. Totally painless unless you have trouble producing saliva or you are trying to test an infant.
FamilyTreeDNA
The strong point of FTDNA is 23andMe’s weak point. People generally sign up for FTDNA only if they’re interested in genealogy, but many of 23andMe’s users are there only for health information and have zero interest in genealogy or in helping you research yours. FTDNA’s other strong point is their “transfer family finder” service, which allows you to upload your 23andMe data file to FTDNA for a better price than testing directly with them. You’ll still receive all the same matches and benefits as if you had tested there directly.
Further, FTDNA has some test offerings the others don’t provide. While 23andMe tests enough single nucleotide polymorphisms (SNPs) on your Y-DNA and mitochondrial DNA to assign a high-level haplogroup, FTDNA provides full mitochondrial sequencing and Y-DNA short tandem repeat (STR) testing. The Y-DNA test can help confirm genealogy along your direct male line, but the mitochondrial sequence is relatively useless for this kind of genealogy. I’ve had a 67-marker Y-DNA STR test done along with a full mitochondrial sequence, plus the family finder transfer of my 23andMe data.
FamilyTreeDNA does provide a way for you to download your raw test results. Their test is done by scraping a cotton swab on the inside of your cheek.
AncestryDNA
AncestryDNA is the newest of these providers. I have not used their testing service, so I have no first-hand knowledge of it. As I understand it, they scan your tree to find genealogical matches with your DNA matches, which simplifies the process of identifying your common ancestors. This sounds great, and it may be the best choice for those who can’t invest much time in this work, but the downside is that Ancestry has many users who aren’t as careful about validating and sourcing the data in their trees as a serious genealogist needs to be. You really have to double-check your matches’ work more carefully than on other sites. Being new, their database is currently the smallest of the big three, but it is growing rapidly.
AncestryDNA does support user download of their raw test result data file. As with 23andMe, their test is performed with a saliva sample.
The quality and number of matches you will find on any of these sites depends significantly on your family background and the backgrounds of others who have elected to test. The majority of users on these sites are American, so if you are the second generation of an immigrant family new to the US, you will find only a few matches. But if you can trace your lines to ancestors in the early US, you’re going to have hundreds or even thousands of matches. And if you come from a highly endogamous population like the Ashkenazi Jews, you will have a lot of matches, but they will be so far back in time that you’ll have a lot of difficulty finding on-paper genealogical links.
Step 3: Make contact
I should call this step “wait”, since no matter which company you use, it will take a few weeks or months to get your results back. Use this time to work on your family tree some more.
Once you do receive your results, the fun starts. If you don’t check your email very frequently or your results have been in a while, you may already have matches starting to contact you. FTDNA contacts are generally made directly through email to the address you share when signing up. For 23andMe users, you can send or receive a “sharing request”, which if accepted allows you and your match to compare your results to each other and your other matches with whom you have an accepted sharing request.
How do you find your matches? On FTDNA you go to the Family Finder Matches tool and review the list of names, their family trees, and the significance of your match. I’ll cover significance later. On 23andMe you go to the DNA Relatives tool and do the same thing, except most of your matches will have chosen not to reveal their name and family tree, so you’ll need to send them one of the sharing requests I mentioned and hope they accept. I haven’t used AncestryDNA, so I can’t say exactly how the process works there.
Discuss your background with your matches and find out what surnames, locations, or other details your families may have in common. You may find a connection immediately, or there may be nothing obvious. File all this information away for later because you never know when you or they will update their family tree and your connection will suddenly be staring you in the face.
What does a match mean anyway?
The simple answer is that they share a portion of your DNA, based on both of you having inherited that portion from a common ancestor. The significance of the match is generally evaluated in terms of four variables:
- How many segments? A person with whom you match five segments on five different chromosomes is likely to be a much closer relative than someone with whom you match one segment on one chromosome.
- How long is the match? You measure the length of a match by examining the start and end positions on the chromosome where a segment matches. A match may be, for example, from position 16 million to position 50 million on chromosome 12. The longer the match, the closer it generally is, but see below.
- How densely tested is the match region? This is reported as a SNP count, the number of consecutive polymorphisms you share with your match on a segment. The more SNPs tested on a matching segment, the closer it generally is, but see below.
- How often does recombination happen in the genomic region where your matching segment exists? Fortunately you don’t have to calculate this yourself. 23andMe and FTDNA convert the start and end positions of each match into a genetic length in centiMorgans (cM), using a map of how frequently each region recombines, and usually report the SNP count alongside it as a separate quality check. The cM length is the main measure of your match’s significance. Researchers disagree on how many cM a matching segment should have to be useful for genealogy, but bigger is definitely better. 5cM and 7cM are common minimum cutoffs. Anything larger than 10cM is quite useful in my opinion.
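Here’s a small Python sketch of how you might filter and summarize the segments you share with one match, using the 7cM cutoff mentioned above. The `Segment` fields mirror the variables in the list; the 500-SNP minimum and all the segment values are made-up illustrations (the first segment reuses the chromosome 12 example positions), not figures from any testing company. The cM values are taken as reported rather than computed.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    chromosome: int
    start: int    # base-pair position where the matching segment begins
    end: int      # base-pair position where it ends
    cm: float     # genetic length in centiMorgans, as reported by the company
    snps: int     # number of tested SNPs inside the segment

def summarize_match(segments, min_cm=7.0, min_snps=500):
    """Drop segments below the cM and SNP thresholds, then summarize the rest."""
    kept = [s for s in segments if s.cm >= min_cm and s.snps >= min_snps]
    return {
        "segments": len(kept),
        "chromosomes": len({s.chromosome for s in kept}),
        "total_cm": round(sum(s.cm for s in kept), 1),
        "largest_cm": max((s.cm for s in kept), default=0.0),
    }

# Hypothetical match: the 5.1 cM segment falls below the 7 cM cutoff.
match = [
    Segment(12, 16_000_000, 50_000_000, 38.2, 7400),
    Segment(3, 1_000_000, 4_000_000, 5.1, 600),
    Segment(7, 90_000_000, 105_000_000, 11.4, 2100),
]
print(summarize_match(match))
```

Raising `min_cm` to 10 would keep only the segments I’d personally lean on; lowering it to 5 trades more matches for more false positives.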
Long Technical Digression
The detailed answer is much more complex. Feel free to skip this part. I’m skipping over some details but what I’ve described below is accurate enough for genetic genealogy.
Each person’s DNA sequence is unique unless they have an identical twin. Our DNA is organized into 23 chromosomes, and we all have two copies of each (except in cases like trisomy, where an individual has a third copy of a chromosome). One copy of each chromosome is inherited from your father and the other copy from your mother. Chromosomes 1-22 are the autosomes, while the 23rd pair are the sex chromosomes. Women have two copies of the X chromosome, designated XX, while men have one copy of the X and one copy of the Y chromosome, designated XY.
Now, when you inherit one copy of each autosome from each parent, you don’t inherit an exact copy of either of that parent’s two chromosomes. The parent’s two copies split and recombine when the egg or sperm is formed. To give an example, your father has two copies of chromosome 1, one from his father and one from his mother. The copy he passed to you may have one third of its sequence from his father’s chromosome 1 and two thirds from his mother’s, while your sibling may have received a completely different mix. The same thing happens again when you pass chromosome 1 on to your own child, mixing your father’s copy with your mother’s. Repeat this over many generations, and sequences break up and rejoin repeatedly over time. Because of this, the fundamental unit of genetic genealogy is the “half IBD segment”, which means “half identical by descent”. The half signifies that only one of your two copies of the chromosome in that region (not the other) is identical to one of someone else’s copies, and that the segments are identical because both of you inherited them from a common ancestor. The alternative is an “IBS”, or “identical by state”, segment, in which you and the other person happened to randomly inherit sequences that match but did NOT come from a common ancestor. You can’t easily identify these false positives in advance, so some proportion of your matches will be type I errors like this, and you will never find a genealogical link for them.
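If it helps to see recombination in action, here’s a toy Python simulation of one chromosome being passed down: the transmitted copy is a patchwork of the parent’s two copies, switching at each crossover. This is a simplified model under my own assumptions (crossover positions are uniform and the caller fixes how many there are), not how crossovers are really distributed along a chromosome; `recombine` is a hypothetical helper, and 249 million base pairs is roughly the length of chromosome 1.

```python
import random

def recombine(copy_a, copy_b, length_bp, n_crossovers, rng):
    """Simulate one transmitted chromosome as (start, end, source) blocks.

    copy_a and copy_b label the parent's two copies of the chromosome.
    Toy model: crossover positions are drawn uniformly at random and the
    starting copy is chosen with equal probability.
    """
    points = sorted(rng.sample(range(1, length_bp), n_crossovers))
    source = rng.choice([copy_a, copy_b])
    blocks, start = [], 0
    for point in points + [length_bp]:
        blocks.append((start, point, source))
        # Each crossover switches to the parent's other copy.
        source = copy_b if source == copy_a else copy_a
        start = point
    return blocks

# Your father's chromosome 1 as passed to you, built from his parents' copies.
rng = random.Random(7)
length = 249_000_000
for start, end, source in recombine("his father's copy", "his mother's copy", length, 2, rng):
    print(f"{start:>11,} to {end:>11,}  {(end - start) / length:6.1%}  {source}")
```

Run it with different seeds and the proportions change every time, which is why two siblings share only about half their DNA with each other.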
It gets even more complicated, though. The commercial testing companies generally do not phase your genetic data. Instead they report the two alleles found at each tested SNP position, one from each copy of your chromosome, but they cannot tell whether a given run of consecutive SNPs came from copy A or copy B. This also contributes to false positive matches. There are ways around this, and if you phase your data you will have much better results with genetic genealogy. The most reliable way to phase your data is to have both of your parents tested with the same test you took. Comparing your father’s DNA to yours, and your mother’s to yours, gives you a much more accurate picture of your own DNA. There are tools online to automate the process for you (such as GEDMatch), but you need to have at least one parent tested. Two are even better.
Unlike the autosomes, most of the Y chromosome does not recombine, so it passes from father to son nearly unchanged. With detailed Y-DNA testing you can compare your direct male ancestor line back thousands of years. My Y-DNA test helped confirm that my male line descends from Pierre Paradis (1604 – 1675), of Mortagne-au-Perche, France, who immigrated to Quebec in 1651, even though my genealogy on that line hits a brick wall with my fifth great grandfather Henry H Paradis, born around 1847 in Rivière-du-Loup, Quebec. See this link on Paradis history if you’re interested in the line.
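For a sense of how two Y-DNA STR results get compared, here’s a Python sketch that computes a simple genetic distance: the sum of the differences in repeat counts over the markers both people tested. The marker names are real Y-STR markers, but the repeat values are invented, and treating every marker with this simple stepwise count is my assumption; testing companies may use a different mutation model for some markers, so their reported distance can differ.

```python
def str_distance(profile_a, profile_b):
    """Return (distance, markers_compared) for two Y-STR profiles.

    Each profile maps a marker name to its repeat count. The distance is the
    sum of absolute repeat-count differences over the markers both people
    tested (a simple stepwise model applied to every marker).
    """
    shared = profile_a.keys() & profile_b.keys()
    distance = sum(abs(profile_a[m] - profile_b[m]) for m in shared)
    return distance, len(shared)

# Hypothetical values on four of the markers in a 67-marker panel.
me = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}
cousin = {"DYS393": 13, "DYS390": 23, "DYS19": 14, "DYS391": 10}
print(str_distance(me, cousin))  # prints (2, 4)
```

Returning the number of markers compared matters: a distance of 2 means much more over 67 markers than over 12.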
For various reasons, particularly the fact that women inherit one X from their mother and one X from their father, the X chromosome is not as useful for genetic genealogy as the Y chromosome. It does not travel an unbroken line of the same sex like the Y does.
Mitochondrial DNA, on the other hand, is passed only along the maternal line. Whether male or female, you inherited it from your mother. Unfortunately mitochondrial DNA changes so slowly that even if someone has an exact match to your full mitochondrial sequence, your common ancestor could still be 20 generations back and extremely difficult to find. My mitochondrial haplogroup, U2e1*, points to early European ancestry and then further back to the Indian subcontinent, but this is somewhere on the order of 5000+ years ago and not useful for what I’m trying to do.
Complicating this further, we’re all related to each other somewhere. The hope is that you find people related closely enough that you can identify your genealogical link. But if, for example, you are of European descent, there’s a better than 95% chance that you descend from Charlemagne, probably along several lines (he was my 38th, 39th, and 40th great grandfather, and quite possibly yours too). Or if you trace back to early Quebec settlers, then you are probably related to 95% of French-Canadians.
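A quick calculation shows why several lines back to Charlemagne aren’t as strange as they sound. Counting your parents as generation 1, your nth great grandparents are generation n + 2, so there are 2^(n + 2) ancestor slots at that level. For a 38th great grandfather that’s 2^40 slots, vastly more than the number of people alive in Charlemagne’s time, so the same individuals must fill many slots in everyone’s tree. This sketch of mine ignores everything except the doubling.

```python
def ancestor_slots(n_great):
    """Ancestor positions at the nth-great-grandparent generation.

    Parents are generation 1 and grandparents generation 2, so an nth great
    grandparent sits at generation n + 2, which has 2 ** (n + 2) slots.
    """
    return 2 ** (n_great + 2)

print(f"{ancestor_slots(38):,}")  # prints 1,099,511,627,776
```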
Step 4: Compare family trees
I believe AncestryDNA does this for you automatically, which is a huge point in their favor. Otherwise you need to review your matches’ surname lists and compare them to yours to find your common link. Sometimes this is easy, if you’ve both done a lot of genealogy work, and sometimes it’s difficult, for example if one of you was adopted, has large gaps in their tree, or simply hasn’t done much genealogical research. There are some third-party ways to simplify this process, which I will get to later.
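The manual surname comparison boils down to a set intersection. Here’s a minimal Python sketch; the normalization (uppercasing and stripping accents, so Thériault and Theriault compare equal) is my own assumption, and the surname lists are illustrative. It won’t catch spelling variants such as Pelletier and Peltier, which is where phonetic schemes like Soundex can help.

```python
import unicodedata

def normalize_surname(name):
    """Uppercase a surname and strip accents, so Thériault matches Theriault."""
    decomposed = unicodedata.normalize("NFKD", name.strip())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).upper()

def shared_surnames(mine, theirs):
    """Return the sorted surnames that appear in both lists after normalization."""
    return sorted({normalize_surname(n) for n in mine} & {normalize_surname(n) for n in theirs})

my_surnames = ["Paradis", "Beecher", "Prindle", "Pelletier", "Thériault"]
their_surnames = ["BEECHER", "Theriault", "Peltier", "Smith"]
print(shared_surnames(my_surnames, their_surnames))  # prints ['BEECHER', 'THERIAULT']
```

A shared surname is only a lead; you still have to find the actual ancestral couple in both trees.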
Step 5: Share your ancestry information
The easiest thing to do here is make sure you fully fill out your user profile on the testing site you use. This will help your matches to do some of the matching work for you, and make them more likely to get in contact with you.
The best thing you can do, though, is upload your raw data to GEDMatch. This is a third party tool run by volunteers for free (they accept donations if you find it useful) that allows users from 23andMe, FTDNA and AncestryDNA to all put their data in one place so that you can compare across vendors. Otherwise you can never be sure if this one guy on FTDNA that you match also matches this one woman from 23andMe and so on.
I can’t stress enough how useful GEDMatch is, and how much you’ll help other genetic genealogists by uploading your data there. The service they provide is in many ways superior to that offered by the commercial testing companies. They also support uploading your GEDCOM and doing the family tree matching for you, but that feature is unavailable for now due to the huge influx of data submitted recently. It will be back someday. Once you’ve used it, it is tough to do this work without it.
Step 6: Extend your tree
If you’re lucky you’ve been able to identify common ancestors with some of your matches by now. Look through their trees, and if they have any details about your ancestors that you don’t, add them to your tree. If they have the line traced back farther, extend the line in your tree. Add the other descendants of your common ancestors to your tree. You’re related to them, if only distantly, and having those surnames in your tree may help you track down your other matches.
Using paper genealogy, I’ve confirmed matches as close as third cousins and as distant as ninth cousins. I have documented ancestors going back to early New World settlers, so I have a LOT of matches, and finding the link with other people who have old confirmed lineages eventually gets quite easy. But there are many more people who descend from these early settlers than there are people who can document their ancestry back to them, so sometimes it can be frustrating.
My easiest matches go back to colonial days in the US, particularly some of the early Connecticut settlers like Eleazer Beecher and Phebe Prindle. Early Quebec settlers like Nicholas Pelletier and Jean de Vouzy are another great source for confirmed matches. I also have some large clusters from early French settlers in Louisiana, as well as Quebec French who immigrated to Louisiana later.
As a reference point, I am sharing with nearly 100 matches on 23andMe. I have confirmed genealogical ancestry with somewhere around ten of them. Your results will vary. One of my most recent matches had a detailed family tree and I found our ancestors in 1780s Louisiana after only about ten minutes of work. I was the first person she shared with, so while I only have a 10% success rate she’s at 100%.
Step 7: Triangulate!
The only way to do this is to share with as many people as possible on 23andMe, manually collate your matches from FTDNA or use GEDMatch. Share with people even if you see no obvious connection besides your matching segment. As you accumulate matches, you will eventually discover multiple people that you match in the same region of the same chromosome.
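Spotting people who match you in the same region is mechanical once your segment lists are in one place. Here’s a rough Python sketch that finds every pair of matches whose segments with you overlap on the same chromosome. The tuple format and match names are hypothetical and the positions are invented. Note that an overlap only tells you the region is shared; whether it’s on the same copy of your chromosome is the question the next step answers.

```python
from collections import defaultdict
from itertools import combinations

def overlapping_matches(segments, min_overlap_bp=1):
    """Find pairs of matches whose segments with you overlap.

    `segments` holds (match_name, chromosome, start, end) tuples from your own
    match lists. Returns (name_a, name_b, chromosome, overlap_start, overlap_end).
    """
    by_chromosome = defaultdict(list)
    for name, chromosome, start, end in segments:
        by_chromosome[chromosome].append((name, start, end))

    overlaps = []
    for chromosome, segs in by_chromosome.items():
        # Compare every pair of segments on this chromosome.
        for (a, s1, e1), (b, s2, e2) in combinations(segs, 2):
            lo, hi = max(s1, s2), min(e1, e2)
            if a != b and hi - lo >= min_overlap_bp:
                overlaps.append((a, b, chromosome, lo, hi))
    return overlaps

my_segments = [
    ("Match A", 12, 16_000_000, 50_000_000),
    ("Match B", 12, 40_000_000, 62_000_000),
    ("Match C", 12, 70_000_000, 90_000_000),
    ("Match D", 3, 1_000_000, 9_000_000),
]
print(overlapping_matches(my_segments))  # prints [('Match A', 'Match B', 12, 40000000, 50000000)]
```

The pairwise comparison is quadratic per chromosome, which is fine for the few hundred matches most of us will have.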
Once you have a list of two or more people you match in the same region, compare them to each other. If you match person A at a particular region, and you match person B at the same spot, compare A to B. If they match each other at the same spot, congratulations. All three of you very likely share a common ancestor. If A and B do not match each other, then most likely you match A on the copy of the chromosome you inherited from your mother and you match B on the other copy, inherited from your father, so that can help you track down the common ancestor you have with each, even though A and B are not related.
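The comparison described in the paragraph above can be written as a small decision function. This Python sketch is my own illustration using hypothetical (chromosome, start, end) tuples. A real check would also apply a cM threshold to the overlapping region, and a pair that doesn’t triangulate can also mean one of the matches is a false positive (IBS) rather than sitting on your other chromosome copy.

```python
def overlap(seg1, seg2):
    """Overlap of two (chromosome, start, end) segments, or None if they don't overlap."""
    if seg1 is None or seg2 is None or seg1[0] != seg2[0]:
        return None
    start, end = max(seg1[1], seg2[1]), min(seg1[2], seg2[2])
    return (seg1[0], start, end) if start < end else None

def classify_pair(you_a, you_b, a_b):
    """Classify matches A and B from your segment with each and theirs with each other.

    a_b is None when A and B don't match each other on that chromosome.
    """
    shared = overlap(you_a, you_b)
    if shared is None:
        return "no shared region"
    if overlap(shared, a_b) is not None:
        return "triangulated"        # likely one common ancestor for all three
    return "different copies"        # likely one maternal and one paternal match

you_a = (12, 16_000_000, 50_000_000)
you_b = (12, 40_000_000, 62_000_000)
print(classify_pair(you_a, you_b, (12, 38_000_000, 55_000_000)))  # prints triangulated
print(classify_pair(you_a, you_b, None))                          # prints different copies
```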
Where it gets really interesting is when you have a cluster of several people who all match you and each other but stubbornly resist identification. Then you find a new match who matches all of them, and you find your common ancestor with this new match based on the quality of their genealogical research. That allows you to positively assign a spot in history to the rest of your cluster and may help with future identification. This was the case for me with the recent Louisiana match I mentioned. This match was on a cluster including a woman in Italy who had only one known ancestor who went to the US. We were quite sure our match was somewhere along this American immigrant’s line, but since my new match places a portion of this segment in 1788 Louisiana, my match with the Italian woman must be older than that, likely somewhere in France, Germany or Luxembourg in the 1600s or earlier, based on the ancestors of this specific Louisiana settler family.
I’m planning another blog post later on ways to leverage the clusters you’ve identified using 23andMe’s Ancestry Finder tool and GEDMatch. The method will be obvious to anyone who has done this a while, but I haven’t seen anybody write it up yet.
Here are links to the companies and sites I’ve mentioned along with a few other reference materials on genetic genealogy.
Other than the 23andMe referral link, I have no employment relationship with any of the sites mentioned or linked, nor have I received any compensation for this post. I am a happy user/member/reader of many of the sites and I will get only the indirect benefit of having your DNA tested and potentially matched to mine.