Tuesday, July 30, 2019

My Genetic Genealogy: Is It Working?

The short answer is "that depends". Lots of work. Some important progress. So far, I'll give it a "thumbs up": yes, it's working.

It's been about a year and a half, now, that I've been chasing family genealogy through DNA. Here's what I've learned so far.
  1. The power of DNA matching is that it identifies for us persons who share identical segments of DNA, and so are likely related. It also estimates what that relationship is, based on how much DNA is identical and other proprietary tweaks.
  2. The DNA match information is a starting point, but we still must search for our common ancestors, the couple from whom we are both descended. Most of the matches shown are fourth cousins and more distant. Our common ancestors must be five generations or more back. I'll come back to this.
  3. Since fewer than half of DNA matches reply to requests for information, it is often necessary to research several generations of their ancestors, i.e., to do all the research unassisted. Among those who do reply, most have little information beyond their own grandparents, so a lot of work is still required to build their family trees.
  4. Different people undoubtedly have different goals in providing DNA samples for study. I've been researching family genealogy for 25 years and am not interested in finding more distant cousins. My goal is to extend my families back further in time than I have been able to uncover so far. Some have been adopted and are looking for birth families. Some are confirming or refuting rumored infidelities. I don't know what others are doing because they don't reply to my queries.
  5. Even though I'm not interested in fitting more cousins in my family tree, I need to do it anyway. An important clue when trying to extend and connect my ancestry is to at least identify which branch of my ancestry I'm trying to connect with. Second and third cousins allow me to identify which DNA segments come from which already known ancestors. When I find one of these segments in a more distant cousin, it at least helps me to focus my efforts on connecting to a particular ancestor couple.
  6. Genealogy DNA testing services differ. I have been using AncestryDNA, MyHeritage, and 23andMe.
    • AncestryDNA has the largest collection of clients, so may provide the best opportunity to find connections. Also, since Ancestry.com has been a genealogy research service, providing access to lots of indexed historical records and to customers' family trees, the matches are often more knowledgeable about their family history and have well-developed trees. Surprisingly, though, I still get replies to fewer than half of my queries. Ancestry will allow you to download your DNA analysis results, basically a map of your chromosomes, but it will not allow you to download DNA match information to use with third-party services or software. Since I'm not an Ancestry.com subscriber, I did find it frustrating, until recently, that I couldn't view family trees of matches. Ancestry is currently testing a beta version of their service, though. I can now view up to five generations of a tree attached to a DNA match. This has been very helpful. I've been able to see family tree connections now to dozens of DNA matches. (That's about a dozen per DNA kit. I'm working with DNA results for two relatives. Five-generation trees have helped me find connections to about a dozen DNA matches for each of them.) After the initial excitement, I've come to four realizations: (1) most AncestryDNA subscribers don't have well-developed trees; (2) five generations allow me to connect with cousins within my known ancestry, but do not allow me to see connections beyond my current known ancestry; (3) (not really a new realization, but commonly found in family trees) information in a tree is not necessarily true: some is contradicted by my records, and some is often copied from some other tree with no knowledge of where the original information came from; (4) AncestryDNA members seem to be very happy to provide access to their private trees when I explain how we're related and what I hope to see in their tree and send them a link to my own online family tree.
    • MyHeritage is my preferred service because they allowed me to load raw DNA files downloaded from other services so that I can get matches to all four of my DNA files (two parents, two in-laws). While they still allow you to upload DNA files, there are now limits on what information you may access. MyHeritage also allows access to customers' family trees. Most of these trees are either private or contain only a few individuals, but some are quite large, which can make it much easier to find a connection. MyHeritage has a new feature that goes through their subscriber trees, through FamilySearch trees, and other available trees, and proposes connections with matches. It hasn't shown me any "important" connections, yet - and by important I mean one that I don't already know and that helps extend my tree back in time - but it might. It does not propose a lot of connections, yet, but it might be very useful, especially for those whose trees are not yet very well developed.
    • 23andMe is not a genealogy records company. So unlike the above two companies, I never click on a button and get a message that I have to be a subscriber to use that function. They have a variety of interesting gene-related reports, some regarding health predispositions, some regarding physical traits. While they do not have family trees as part of their service, they do permit self-reporting of family surnames and locations, which is often helpful.
    • Note: I've read that the testing services may differ quite a bit in their accuracy with different ethnic groups or geographic origins. My ancestry is white European. I have noticed some inaccuracies that I don't understand. AncestryDNA often predicts a significantly more distant relationship than the true relationship and than I expect from the amount of shared DNA (where I assume a simplistic single path between matches). On the other hand, I'm finding that many cousins estimated to be fairly close (third and fourth) are actually quite distant (6th and 7th). I learn this only after lots of work tracing back so many generations. These cases seem to involve very old American families, where there are multiple paths of relationship over many generations that must accumulate to as much shared DNA as a closer relative.
    • Note 2:
      DNA Matches by Service
      Company      Relative       New matches  Matches to Gr-parents
      23andMe      Mother         37           D & L: 3; C & H: 10 *; H & M: 1.5; L & D: 17.5; [closer: 5]
      23andMe      Father         13           C & C: 3 *; P & D: 1; W & A: 2; W & M: 0; [closer: 7]
      AncestryDNA  Mother-in-law  18           P & C: 7; H & C: 1 *; C & K: 0; K & R: 0; [closer: 10]
      AncestryDNA  Father-in-law  19           M & W: 0; C & McL: 17; M & P: 0; S & B: 0; [closer: 2]
      MyHeritage   Mother         8            D & L: 3; C & H: 0; H & M: 0; L & D: 4; [closer: 1]
      MyHeritage   Father         31           C & C: 2; P & D: 27; W & A: 0; W & M: 0; [closer: 2]
      MyHeritage   Mother-in-law  4            P & C: 2; H & C: 0; C & K: 0; K & R: 0; [closer: 2]
      MyHeritage   Father-in-law  3            M & W: 0; C & McL: 3 *; M & P: 0; S & B: 0; [closer: 0]

  7. Probably the reason that I have been most successful finding connections for my mom is that all of her ancestors immigrated to the US in the early to mid 1800s. So her family history is not that long, at least not in this country. For my dad, it's more complicated. Because most of his ancestral lines go back centuries in the US, it can be much more difficult to research all the way back to our common ancestor. Also, after so many generations, many of them in the northeastern US (or colonies), there has been a lot of mixing of ancestral lines, so there are multiple paths of relationship and, because each path adds inherited DNA, the estimated relationships implied by the amount of shared DNA may be in error by multiple generations.
The numbers in that table show that in the past year and a half I've made about 130 connections to relatives, with (only) one major find in each of our four parental lines (wife's parents and my parents). So, I'm certainly working hard. But I'm not sure I can sustain this level of effort to advance our tree. For now, I'm continuing with an emphasis on finding certain missing family members and specific pre-emigration families in Europe.
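As a quick check on the "about 130" figure, the counts in the table can be added up. This is only a sketch: the numbers are transcribed from the table, and the names `ROWS`, `check_rows`, and `totals_by_company` are made up for illustration.

```python
# Counts transcribed from the "DNA Matches by Service" table.
# Each row: (company, relative, reported new matches,
#            counts per grandparent couple, with the "[closer]" count last)
ROWS = [
    ("23andMe", "Mother", 37, [3, 10, 1.5, 17.5, 5]),
    ("23andMe", "Father", 13, [3, 1, 2, 0, 7]),
    ("AncestryDNA", "Mother-in-law", 18, [7, 1, 0, 0, 10]),
    ("AncestryDNA", "Father-in-law", 19, [0, 17, 0, 0, 2]),
    ("MyHeritage", "Mother", 8, [3, 0, 0, 4, 1]),
    ("MyHeritage", "Father", 31, [2, 27, 0, 0, 2]),
    ("MyHeritage", "Mother-in-law", 4, [2, 0, 0, 0, 2]),
    ("MyHeritage", "Father-in-law", 3, [0, 3, 0, 0, 0]),
]


def check_rows(rows):
    """Return (company, relative) for rows whose breakdown does not sum to the total."""
    return [(c, r) for c, r, total, parts in rows if sum(parts) != total]


def totals_by_company(rows):
    """Add up the reported new matches for each testing company."""
    out = {}
    for company, _relative, total, _parts in rows:
        out[company] = out.get(company, 0) + total
    return out


grand_total = sum(total for _c, _r, total, _p in ROWS)

print(check_rows(ROWS))         # rows that fail to add up, if any
print(totals_by_company(ROWS))  # new matches per company
print(grand_total)              # all new matches
```

Every row's breakdown adds up to its reported total, and the grand total comes to 133, which is where "about 130" comes from.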

DNA Case Study: Hayden Family

So far, my typical DNA connections consist of picking a DNA match and trying to piece together a family tree that connects to my own. This is sometimes successful. Sometimes I ask for help from the DNA match, who sometimes replies. It usually involves lots of work. And as I continue down my list of DNA matches, toward more distant relations, it gets harder and harder.

My Hayden family connection was different. While exploring match profiles on 23andMe, I noticed several that seemed grouped together, frequently showing up as common matches. Almost all replied to my messages. Almost all had researched their genealogies extensively. I fairly quickly established that the common family was the Haydens. Some put me in contact with other Hayden family genealogists. One had attached resources to a Hayden tree on FamilySearch, and also replied to my message. After gathering their information and researching the gaps, I was able to assemble a skeletal family tree, just connecting the DNA matches, not including their families and ancestors' families that I have typically included in my tree. I then tried to connect my own Hayden ancestor to their tree. No census records together, no Irish baptismal record, no FamilySearch, Rootsweb, message board, FindAGrave, Google, or other public data information. None of the matches had among their records any mention of my ancestor.

Anne Hayden Campbell
One of my matches referred me to an article they had written many years ago in which I recognized a photograph that had been hanging on my parents' wall for decades, in what they called the "Rogues Gallery", their photos of their ancestors. While my match had guessed at the identity of the person, ours was labeled Anne Hayden Campbell by one of Anne's grandchildren. So although I was not finding the family connection, this photograph implied that there was a connection and that most likely their Hayden family was my own.

Now I wondered, if my Anne was part of this Hayden family, where would she fit in? All of the others traced back to Martin and Katherine Headen, born in 1796 and ca. 1790, respectively, in Ireland. The baptisms of many of their children took place in the Catholic parish of Myshall in County Carlow, where records state the family lived in the town of Shangarry. The known birth dates were in 1817, 1822, 1825, and 1832. Anne was born between about 1823 and 1826, so would fit nicely into an unusual gap in children. Baptismal records in those early years were infrequent, so she could simply have been missed. But Anne could also have been Martin's niece, in a different branch of the family.

Now I turned to DNA. The amount of DNA shared with matches was about right for Anne as a daughter of Martin. But there can be quite a bit of variation in inherited DNA, so I was not comfortable placing Anne in this tree based simply on shared DNA. Yet. So I constructed the following chart. It requires some explanation.


Hayden DNA Comparison Chart

I identified fourteen DNA matches to my parent on 23andMe who were likely related through the Haydens. Of these, I could place ten on a Hayden family tree. In the chart above I recorded in the lower half the relationships between all these cousins as read from the tree and added the average amount of DNA that should be shared between these cousins, if only a simple single relationship exists. 3c-2/0.2, for instance, is third cousin twice removed, who share an average of 0.2% of their DNA. The four empty lines are the four persons whom I could not place in the tree, and so whose relationships with the others I cannot know. In the upper half I recorded the estimated relationship and the measured amount of shared DNA as reported by 23andMe. The columns/rows of x's are Hayden descendants whose DNA was either not analyzed on 23andMe or who did not show as a match to my parent. The gray boxes are where DNA matches were not detected/reported, even though both were matches to my parent. Finally, I color coded the results. Basically, green shows 23andMe estimates close to true relationships, "red" (purple) not close, and yellow somewhere in between.
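The "average amount of DNA that should be shared" can be sketched with the usual halving rule: full first cousins average 12.5%, each further cousin degree divides that by four, and each generation "removed" divides it by two. The function names here are mine, and the sketch assumes descent from an ancestral couple through a single path, with no other relationship between the two people.

```python
def expected_shared_percent(degree, removed=0):
    """Average % of DNA shared by full cousins related through one path.

    degree:  1 for first cousins, 2 for second cousins, and so on.
    removed: number of generations "removed" (0 if same generation).
    Assumes both descend from the same ancestral couple and have
    no other relationship.
    """
    return 100 * 0.5 ** (2 * degree + 1 + removed)


def combined_percent(paths):
    """Add the single-path averages for several (degree, removed) paths.

    Simple additive approximation for people related more than one way.
    """
    return sum(expected_shared_percent(d, r) for d, r in paths)


# Third cousin twice removed, the "3c-2/0.2" example from the chart
print(round(expected_shared_percent(3, 2), 1))

# Sixteen separate sixth-cousin paths versus one fourth-cousin path
print(combined_percent([(6, 0)] * 16), expected_shared_percent(4))
```

The first line prints 0.2, matching the chart notation. The second shows why the single-path assumption matters: under this simple additive view, sixteen separate sixth-cousin paths average the same shared DNA as one fourth-cousin path, which is the kind of accumulation that can make distant cousins in old, intermarried families look closer than they are.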

First I compared just the matches among themselves. Now I'm down to eight matches: started with fourteen, four I couldn't place in the tree and two did not show up as a match to my parent. Among these eight persons, there are twenty-eight relationships. Of these twenty-eight, eleven (39%) don't show up at all. This is typical for 3rd/4th cousins. Of the seventeen that do appear, eight (47%) are good/green, six (35%) are so-so, and three (18%) are incorrect. Note that by "incorrect" I mean percent shared DNA is different from what I expect by a factor of two or more. This is only half a generation, or, say, the difference between 4c and 4c-1 (fourth cousin vs. fourth cousin once removed). This may not be a huge error, but it is important in determining where Anne might fit into this tree. So the above percentages (good, so-so, and incorrect) are my baseline.
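The baseline arithmetic can be reproduced in a few lines. The counts are the ones given above; the helper `off_by_factor_of_two` is only my reading of the "incorrect" rule, and the sample values passed to it are made up, since the cutoff between green and yellow isn't spelled out.

```python
from math import comb

people = 8
pairs = comb(people, 2)        # every possible pairing of the eight matches
undetected = 11                # pairs with no match reported
detected = pairs - undetected  # pairs that do show up

counts = {"green": 8, "yellow": 6, "red": 3}


def percentages(counts):
    """Convert category counts to whole-number percentages of their total."""
    total = sum(counts.values())
    return {k: round(100 * v / total) for k, v in counts.items()}


def off_by_factor_of_two(measured_pct, expected_pct):
    """True when measured and expected shared DNA differ by 2x or more."""
    ratio = measured_pct / expected_pct
    return ratio >= 2 or ratio <= 0.5


print(pairs, detected, round(100 * undetected / pairs))
print(percentages(counts))
print(off_by_factor_of_two(0.5, 0.2))  # made-up values: 0.5% measured vs 0.2% expected
```

Eight people give 28 possible pairs, 17 of them detected, and the three categories work out to 47%, 35% and 18% of those 17.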

Now I look just at my parent's relationship with the other eight. In order to have relationships from the tree, I have to place her somewhere in the tree. I placed her as a child of Martin and Katherine. There are eight possible relationships. Of these, all show up. That's obvious, because those that don't show up are not visible in my results. Actually, I later discovered that one of the persons who did not show has had her DNA tested on 23andMe, but can't find me among her matches, either. Of the eight that are visible, 50% are green, 37.5% are yellow, and 12.5% are "red". I think these compare very well with the 47%, 35% and 18% baseline. My conclusion is that my ancestor, Anne Hayden Campbell, is the daughter of Martin and Katherine Headen.
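One way to see how close those two sets of percentages are is to ask how many green, yellow and red results the baseline would predict among just eight relationships. In this sketch the counts of 4, 3 and 1 are simply 50%, 37.5% and 12.5% of eight, and the function name is mine.

```python
baseline = {"green": 8, "yellow": 6, "red": 3}  # the 17 detected pairs among the matches
parent = {"green": 4, "yellow": 3, "red": 1}    # the 8 relationships to my parent


def expected_counts(baseline, n):
    """Scale the baseline proportions to a group of n relationships."""
    total = sum(baseline.values())
    return {k: n * v / total for k, v in baseline.items()}


expected = expected_counts(baseline, sum(parent.values()))
for category in baseline:
    print(category, round(expected[category], 2), parent[category])
```

With only eight relationships, each one is worth 12.5 percentage points, so the baseline proportions round to the same 4, 3 and 1 seen for my parent. This is a consistency check on the assumed placement, not a proof of it.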

Now I need to go back and fill in all those quick-and-dirty sources I noted while assembling a family tree ...