Saturday, August 11, 2018

My DNA Genealogy: Genetic Origin Prediction

Just My Observations

A few months ago I began researching genealogy through three DNA analysis services. There is information all over the Internet about these services, so I don't intend to make a thorough comparison or recommendation. These are just some thoughts, observations and experiences from someone who has read a bit, has good technical and Internet skills, and has done some serious genealogy. Even so, I did not know what I was getting into, so maybe my observations can help give a realistic idea of what to expect if you sign up for one of these services.

I'm currently using Ancestry, MyHeritage and 23andMe: Ancestry as an invited guest, the others as paid test customers. One person's DNA was tested on both MyHeritage and 23andMe, so I'm seeing a lot of different aspects of these services. I'd like to stay away from detailed comparison, so although 23andMe provides significant health-related analysis, I'm just going to concern myself with "genetic origin prediction" (just a mention) and "genetic matching" (my main interest).

"Genetic Origin Prediction"

Just a brief mention of genetic origin prediction. All of these services attempt to tell you what countries your not-too-distant ancestors came from. If you're me, this is boring. Through my genealogy research, I already know where my not-too-distant (say, the last few hundred years) ancestors came from. Lessons learned: 1) genealogy (if you can do the research) is more accurate than current DNA testing, 2) there's a trade-off between precision and confidence, and 3) don't expect the testing service to be upfront about the limitations of their predictions.

Just a few words about each of those points. Predicting genetic origins is very difficult. The services are trying to distinguish between sets of genes that look very much the same. Only by performing a statistical analysis on genes from very large numbers of people from "small" geographic areas might you find subtle differences.

1) So if you're like me, with most of your ancestors coming from the British Isles, where the genes look very much alike, it is unlikely that a service will accurately tell you the difference between your Irish, English, Scottish, Welsh, and maybe even northern European origins. So for me, my genealogy is much more precise about my European origins. Having said that, not everyone has such a homogeneous ancestry. One of my DNA subjects was thought, through genealogy research, to have Italian ancestors in addition to predominantly British Isles origins. I suspected, however, because one of the Italian ancestors had a typically Portuguese name, that a Portuguese ancestor had moved to Italy before one of their descendants emigrated to the United States. The DNA results predicted an ancestor from the Iberian peninsula. And if you understand the math of percent shared DNA and how it changes with each generation, the amount of "shared DNA" was consistent with a full-blooded Portuguese ancestor who emigrated from Italy to the United States. So in that case, the DNA test results provided confidence in what had been a guess at a portion of the ancestry.
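For anyone curious about the "math of percent shared DNA", here is a rough sketch in Python. It assumes the usual rule of thumb that, on average, the share of autosomal DNA inherited from a single ancestor halves with each generation. The function name is my own invention for illustration, and real results drift from these averages because inheritance beyond your parents is partly random.

```python
def expected_shared_fraction(generations_back: int) -> float:
    """Average fraction of autosomal DNA inherited from one ancestor.

    generations_back: 1 = parent, 2 = grandparent, 3 = great-grandparent, ...
    This is only the expected (average) value; actual amounts vary.
    """
    if generations_back < 1:
        raise ValueError("generations_back must be at least 1")
    return 0.5 ** generations_back


if __name__ == "__main__":
    # Parents (1) through 5th-great-grandparents (7).
    for n in range(1, 8):
        pct = expected_shared_fraction(n) * 100
        print(f"{n} generations back: about {pct:.2f}% on average")
```

So a single great-great-grandparent (4 generations back) would be expected to account for about 6.25% of your DNA. Comparing an origin percentage from a test against this table gives a rough idea of how many generations back a single ancestor from that region might sit.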

And not everyone has thoroughly researched their genealogy, whether because they haven't taken the time or because records are not available for their ancestors. So if you don't know where your ancestors are from, testing will give you a broad region. And if your origins are from widely separated areas (Native American, East Asian, Eastern European, South African, etc.), the results will show you distinctly different regions. I believe that some test services can produce more precise predictions for certain regions of the world, so if you have non-European ancestry, you may want to look for recommendations on the best testing for your region of interest.

2) On one of the pages showing estimated origins on the 23andMe service, you can also choose a "confidence level". My memory is that the choices are 50%, 70% and 90% confidence. It's interesting to see that the "best" predictions of DNA origin, meaning a list of several distinct countries or regions with the percent of the DNA that came from each, corresponded to a 50% confidence level. 50% confidence means that the prediction is just as likely to be wrong as it is to be right! When I increased the confidence level to 90%, the countries (Ireland, France, Italy, etc.) disappeared and were replaced by larger generic regions (British Isles, northern Europe, etc.). So they're certain I'm from broad areas, but not so sure about the more specific countries. I have not seen any way to make this adjustment on the other services, nor could I figure out what confidence level they use. My guess is that the default predictions, the ones that look the most interesting to clients, are nearer a 50% confidence level.

3) In fairness to the genealogy services, talk about confidence levels, precision, statistics and reference groups would not attract customers, and many wouldn't understand it even if it were presented more openly. And if you read the test agreement and reference pages, much of this is explained in some way. But I think it should be more apparent that, for now, origin estimates should be taken as broad indications. Ancestry has been claiming lately that they can predict far more origins than any of the others. I don't have numbers handy, but I believe Ancestry has tested far more people than any of the other services. It wouldn't surprise me to learn that they have invested far more money in identifying more origin reference groups or in leveraging some of their members' uploaded genealogy information to improve the accuracy of their analysis.
