Saturday, August 24, 2019

My Genetic Genealogy: Pros and Cons of Too Many Matches

I've been working with DNA kits on 23andMe, MyHeritage, and AncestryDNA. One of my first observations, after beginning with 23andMe and seeing about 1000 DNA matches, was that MyHeritage's 3,000 matches were ridiculous. Who will ever have time to go through and try to link 3,000 matches? 23andMe is now providing about 1200 matches and MyHeritage about 8000. Really? But now I've crunched some numbers and am having second thoughts.

The Beginning


Browsing through matches on 23andMe, I started exploring a not-too-distant match for my father: 0.95% shared DNA, about 70cM, somewhere near average for a 3rd cousin. Except that Dad is in his early nineties and the match was middle-aged, so the relationship is more likely to be 2nd cousin twice removed. That would make the common ancestors Dad's great-grandparents, who immigrated to the United States.
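For anyone who wants to check the percent-to-centimorgan arithmetic, here is a minimal sketch. The total of 7440 cM is an assumption on my part, chosen because it reproduces the pairs quoted in this post (0.95% is about 70cM, 0.27% is about 20cM). The function name pct_to_cm is made up for illustration.

```python
# Assumed total genome length in centimorgans. This is NOT a figure
# stated in the post; it is picked because it fits the quoted numbers.
ASSUMED_TOTAL_CM = 7440.0


def pct_to_cm(percent_shared, total_cm=ASSUMED_TOTAL_CM):
    """Convert a percent-shared-DNA figure to approximate centimorgans."""
    return percent_shared / 100.0 * total_cm


# Percentages that appear in this post.
for pct in (0.95, 0.27, 0.15):
    print(f"{pct}% is roughly {pct_to_cm(pct):.0f} cM")
```

A different service may use a different total, so treat the results as rough.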

The Genealogies


Dad's match was able to provide me with his family genealogy back to the early 1800s in Ireland. There was no intersection with my tree, which also goes back this far. Since the DNA match tells us there is a connection, the genealogies indicated that it would have to be one or more generations earlier than the 0.95% shared DNA suggested. Something's not right.

Different Relatives in Common


Comparing notes, we realized that the match's list of Relatives in Common (persons who were DNA matches to both him and Dad) was different from Dad's list. I've noticed this with others, but hadn't delved into the explanation. So, FYI. Both lists were about 35 persons long, but only about 5 persons were the same on both lists. I asked 23andMe for an explanation.

The Relatives in Common list is created by taking your list of DNA matches - about 1200 at 23andMe - and selecting from them those that also share at least 5cM of DNA with the match you are comparing to. To make this less abstract, suppose Dad's match is Dennis. [In what follows, Dennis and Keith are made-up names.] Dennis has a list of 1200 DNA matches, one of which is Dad. When he clicks on Dad, he is presented with a list of about 35 Relatives in Common. This list is created by taking Dennis's 1200 matches and selecting those who share at least 5cM (this is a VERY small piece of DNA) with Dad. If I look at Dad's list of all DNA matches, the very last one shares 0.27% (about 20cM). Dad's list of Relatives in Common must be drawn from his list of matches, all of whom share at least 20cM of DNA with him. The only persons who show up on both Dennis's and Dad's lists are those who share at least 20cM of DNA with both of them (though I don't know Dennis's exact threshold), and that is only about 5 persons. Note that both lists are valid, but this explains why they are different.
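The selection described above can be sketched in a few lines. This is only my reading of the explanation, not 23andMe's actual code. The persons V through Z and all of their cM values are invented, and the 20cM cutoff for Dennis's list is an assumption, since I only know Dad's.

```python
RIC_MIN_CM = 5.0       # minimum shared with the person being compared to
LIST_CUTOFF_CM = 20.0  # assumed cutoff for making anyone's match list


def match_list(shared_cm, cutoff=LIST_CUTOFF_CM):
    """Names that share enough DNA to appear on a person's match list."""
    return {name for name, cm in shared_cm.items() if cm >= cutoff}


def relatives_in_common(viewer_list, shared_with_other, min_cm=RIC_MIN_CM):
    """From the viewer's match list, keep those sharing >= min_cm with the other person."""
    return {n for n in viewer_list if shared_with_other.get(n, 0.0) >= min_cm}


# Invented cM values: how much each person shares with Dad and with Dennis.
cm_with_dad = {"Dennis": 70, "W": 11, "X": 25, "Y": 8, "Z": 30}
cm_with_dennis = {"Dad": 70, "W": 900, "X": 40, "Y": 60, "Z": 6, "V": 50}

dads_view = relatives_in_common(match_list(cm_with_dad), cm_with_dennis)
dennis_view = relatives_in_common(match_list(cm_with_dennis), cm_with_dad)

print("Dad sees:   ", sorted(dads_view))
print("Dennis sees:", sorted(dennis_view))
print("On both:    ", sorted(dads_view & dennis_view))
```

In this toy data only X appears on both lists, because only X clears the 20cM list cutoff with both Dad and Dennis. W and Y reach Dennis's version on the 5cM rule alone, and Z reaches Dad's version the same way.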

Cousin Keith


Dennis mentioned that his first cousin, Keith, was on his Relatives in Common list, though he was not on Dad's. It turns out that Keith shares about 0.15% DNA with Dad, so he doesn't make Dad's list of 1200 matches and therefore doesn't show up on Dad's version of the Relatives in Common. The second thing to note is that two first cousins should, on average, share about the same amount of DNA with Dad, while Dennis and Keith share 0.95% and 0.15%, respectively. This is a reminder that there can be large variations in inherited DNA. One possibility is that Dennis and Keith are related to Dad through different relatives, but further research showed this to be nearly impossible. Compared with the genealogy research we were studying earlier, though, Keith's shared DNA indicates a common ancestor one or two generations further back than our immigrant ancestor, which could fit our observations better. My current hypothesis is that cousin Keith shares a more normal amount of DNA with Dad for the relationship, while Dennis inherited an unusually long strand of DNA.

What Does This Mean?


In this case, I seem to have gotten lucky that Dennis had an unusually long inherited strand of DNA that moved him above Dad's match threshold of about 0.27%. If not, I would not have seen this connection to investigate. This is disappointing. Much of my known genealogy ends with immigrant ancestors who are great-grandparents to my parents (whose DNA I am working with). My findings with cousins Dennis and Keith lead me to believe it is unlikely I will find connections to earlier ancestors in their countries of origin through 23andMe. Remember that my initial thought had been that 1200 DNA matches was more than enough to work with. Now I see that it is not enough for the pre-immigration connections I eventually hope to make.

Not Quite That Bad


So far, in two of my ancestral lines, I was able to connect with many matches through 23andMe whose common ancestor was a pre-immigration family. Fortunately, there are older participants from these "clans" whose relationship to Mom/Dad was 3rd cousin once removed. The average shared DNA for 3rd cousins once removed is about 0.4%, so above the 0.27% threshold for 23andMe matches. But it is important to seek connections with older matches (say, 60 and up). It remains to be seen whether this population will decrease, from natural causes, or increase as more people get their family elders tested.
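To see which relationships clear the 0.27% threshold on average, here is a rough sketch using the very simple model I've been assuming: full cousins, with the expected share halving at each extra generation. Real inheritance varies a lot around these averages, and the function name expected_share_pct is made up for illustration.

```python
THRESHOLD_PCT = 0.27  # share of the last match on Dad's 23andMe list


def expected_share_pct(degree, removed=0):
    """Average percent shared by full cousins under a simple halving model.

    degree=1 is 1st cousin, degree=2 is 2nd cousin, and so on;
    removed is the number of generations "removed".
    """
    # Two common ancestors, each reached through 2*(degree+1)+removed
    # generational steps, gives 2 * 0.5**(2*degree + 2 + removed).
    return 100.0 * 0.5 ** (2 * degree + 1 + removed)


relationships = [
    ("2nd cousin twice removed", 2, 2),
    ("3rd cousin", 3, 0),
    ("3rd cousin once removed", 3, 1),
    ("4th cousin", 4, 0),
]

for label, degree, removed in relationships:
    pct = expected_share_pct(degree, removed)
    side = "above" if pct >= THRESHOLD_PCT else "below"
    print(f"{label}: about {pct:.2f}% ({side} the threshold)")
```

Under this model a 3rd cousin once removed averages just under 0.4%, while a 4th cousin averages under 0.2% and would usually fall off the list.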

What About Other DNA Services?


AncestryDNA: I don't know the numbers for Ancestry. I haven't found a way to harvest their matches, Ancestry does not allow downloads of this information, and I ran out of patience scrolling endlessly through who knows how many matches to find the end.

MyHeritage: It identifies about 8,000 DNA matches, down to about 8cM. Perhaps overwhelming. Perhaps absurd. But it does seem to allow the possibility of connecting back further in time. Identifying the ancestral line that far back from smaller DNA segments will, however, require lots of luck and lots of work.

[I've assumed a very simple relationship between shared DNA and relationship, while in reality, it is not simple. A simple relationship is easier to understand, and I think allows me to make my point.]
