Haplotypes determine the fine branches of the tree of Man, on the scale
of families and hundreds of years, just as haplogroups determine the larger
branches, on the scale of populations and tens of thousands of years.
Haplotypes are defined by "STR"s.
STR stands for "Short Tandem Repeat"; STRs are also known as microsatellites.
STRs are a very different type of mutation from SNPs and are distantly
related to insertions or deletions, though they occur by a very
different molecular mechanism. There are areas of DNA along the
chromosome, with no known purpose, which consist of several repeated
copies of some short (2 to 6 bases) motif. For example, the list of
bases in the region known as DYS391 might be
TGTCTGTCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTGCCT
which
has ten copies of the motif TCTA flanked by different patterns.
Sometimes the genetic copy mechanism "slips up" and copies the wrong
number of repeats. This constitutes an STR mutation. The main
measurement made on people in our DNA project consists of the
measurement of the number of these repeats for a large number of
"markers". DYS391 is one such marker. Others are given different names,
many beginning with the designation DYS, others with random sounding
names or names telling the repeat unit (like GATA-A10). The "DYS" is
often left off of the designation of marker names, leaving only a
number. There is also a type known as DYF, which is always listed, and
note carefully that DYS385 and DYF385 are very different markers. Only a
few Clan Donald men have yet tested DYF385. The numbers in our charts
are just these repeat counts.
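As a minimal sketch of what a "repeat count" means, the following Python counts the longest uninterrupted run of a motif in a string of bases. The function name and the flanking bases in the example are made up for illustration; real laboratory allele calling is considerably more involved than this.

```python
import re


def count_repeats(sequence: str, motif: str) -> int:
    """Return the longest uninterrupted run of `motif` in `sequence`, in repeat units."""
    # Find every stretch made of one or more back-to-back copies of the motif.
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)


# Invented flanking bases around ten copies of TCTA (not real DYS391 data).
example = "GGCA" + "TCTA" * 10 + "CCGT"
print(count_repeats(example, "TCTA"))  # 10
```

A "slip" of the copying mechanism that adds one repeat would make the same function return 11 for that man's descendants.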
The list of all the numbers is called a haplotype. The numbers themselves really mean nothing to genealogy; only the differences between people matter. Each marker mutates independently of all the others (but see below for DYS389). Using these mutations we can calculate how long ago two people shared a common ancestor. We can also, to some extent, determine whole genealogies from haplotypes (see Network Tree Charts).
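Since only differences matter, comparing two haplotypes comes down to listing the markers where the repeat counts disagree. The sketch below shows one way to do that; the function name and the repeat counts are hypothetical and are not taken from our charts.

```python
def differing_markers(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """Map each marker tested by both men to the difference (b - a), omitting exact matches."""
    shared = a.keys() & b.keys()  # only markers both men have tested can be compared
    return {m: b[m] - a[m] for m in sorted(shared) if a[m] != b[m]}


# Hypothetical repeat counts, for illustration only.
man_1 = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 10}
man_2 = {"DYS393": 13, "DYS390": 25, "DYS19": 14, "DYS391": 10}
print(differing_markers(man_1, man_2))  # {'DYS390': 1}
```

This simple comparison treats every marker alike, so it does not handle the special cases of DYS389 or the multi-copy markers.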
The marker DYS389 represents a special case in the interpretation of our results. Two numbers are listed for this marker, called 389-1 and 389-2. There are two places on the Y chromosome where this marker occurs. The testing companies have devised two ways of testing their length: one measures the length of just one of them, while the other measures the sum of the two lengths. The 389-1 listing is the single length and the 389-2 listing is the sum. The 389-1 number therefore gives the length of one marker, and the difference between 389-2 and 389-1 gives the length of the other. Hence if one person is listed as having numbers 13 and 29 for 389-1 and 389-2 respectively, the lengths are 13 and 16. If another person is listed at 14 and 30, the real numbers to compare are 14 and 16, so there really is only one mutation, not two.
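The DYS389 arithmetic can be sketched in a few lines of Python. The function names are invented for this illustration, and counting each unit of difference as one mutation is an assumption (a simple stepwise model), not necessarily what any particular program does.

```python
def dys389_segments(dys389_1: int, dys389_2: int) -> tuple[int, int]:
    """Split the listed 389-1 and 389-2 values into the two underlying lengths."""
    return dys389_1, dys389_2 - dys389_1


def dys389_mutations(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Count repeat-unit differences between two men after undoing the sum."""
    seg_a = dys389_segments(*a)
    seg_b = dys389_segments(*b)
    return sum(abs(x - y) for x, y in zip(seg_a, seg_b))


print(dys389_segments(13, 29))             # (13, 16)
print(dys389_segments(14, 30))             # (14, 16)
print(dys389_mutations((13, 29), (14, 30)))  # 1
```

Comparing the listed numbers directly would count two differences (13 vs 14 and 29 vs 30); comparing the separated lengths shows only one.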
Some other markers, DYS385, DYS459, DYS464, DYF371, DYF385, DYF397, DYF399, DYF401, DYF408, DYF411, and YCAII, all having two or
more values, are prone to an entirely different type of mutation called
a recLOH. This causes one of the copies to become a duplicate of the
other. Thus 8-10 at 459 could become 10-10 or 8-8 in one event, not two
separate ones. At DYS464, 13-14-16-18 could become 16-16-18-18 in one
event, not seven. You have to look for these yourself, as our software does not detect them.
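Looking for these by eye can be helped along with a rough screening rule: if every value in one man's result already appears in the other man's result, and the two results are not simply the same values, a recLOH is worth considering. The sketch below implements only that heuristic. The function names are hypothetical, it is not part of our software, and it will also flag some ordinary single-step mutations, so a flagged pair still needs human judgment.

```python
def could_be_overwrite(before: tuple[int, ...], after: tuple[int, ...]) -> bool:
    """True if `after` uses only values present in `before` but in different numbers."""
    if len(before) != len(after) or sorted(before) == sorted(after):
        return False  # different copy number, or no change at all
    return set(after) <= set(before)


def recloh_candidate(a: tuple[int, ...], b: tuple[int, ...]) -> bool:
    """Screen a pair of multi-copy results; the direction of change is unknown, so try both."""
    return could_be_overwrite(a, b) or could_be_overwrite(b, a)


print(recloh_candidate((8, 10), (10, 10)))                  # True
print(recloh_candidate((13, 14, 16, 18), (16, 16, 18, 18)))  # True
print(recloh_candidate((8, 10), (9, 10)))                    # False
```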
By comparing haplotypes we can make rough estimates of how many generations have elapsed since two men shared a common male line ancestor. This is called the TMRCA (Time since Most Recent Common Ancestor). If one tests the 37 FTDNA markers, one expects to see one mutation every 7 generations. It is of great importance to realize that mutations happen at random, like throws of dice, rather than like a precise clock. One family line might actually have 3 mutations in ten generations (assuming one son per generation) while another might have none at all for 20 generations.
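The dice-like behaviour can be put in numbers with a simple model. The sketch below assumes each generation independently has a 1-in-7 chance of producing a mutation somewhere in the 37 markers, and at most one mutation per generation; both are simplifying assumptions made for illustration, and the function name is invented.

```python
from math import comb

RATE = 1 / 7  # one mutation per 7 generations over 37 markers, the figure quoted above


def prob_mutations(k: int, generations: int, rate: float = RATE) -> float:
    """Binomial probability of exactly k mutations in one line of descent."""
    return comb(generations, k) * rate**k * (1 - rate) ** (generations - k)


# Chance of 3 or more mutations in 10 generations, and of none at all in 20.
p_three_plus_in_10 = 1 - sum(prob_mutations(k, 10) for k in range(3))
p_none_in_20 = prob_mutations(0, 20)
print(f"3+ mutations in 10 generations: {p_three_plus_in_10:.3f}")
print(f"no mutations in 20 generations: {p_none_in_20:.3f}")
```

Under these assumptions neither outcome is the typical one, but neither is impossible, which is the point of the comparison with dice.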
On average, men differing by one marker in 37 will have a common ancestor about 4 generations back (roughly half of the 7 mentioned above, since either line could mutate). Our data charts give the number of mutations measured and this TMRCA. This number is actually just a rough mathematical estimate: half the time the real number will be greater, half the time it will be less. Sometimes it will be way off. 10% of the time it will in fact be 2 generations or less, and 10% of the time it will be 12 generations or more. "Rough estimate" really does mean rough. The only way to get better estimates is to pay for more markers. The graph below shows the probability that two men who match at 36 out of 37 markers have their most recent common ancestor a given number of generations back.
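A probability curve of this kind can be sketched with a very simple model: treat the number of mismatches as a Poisson count whose expected value grows with the total length of both lines of descent, and assume beforehand that every number of generations is equally likely. These assumptions, the single overall rate of 1 in 7, and the function name are all choices made for this illustration. Our charts use per-marker rates, so this sketch should not be expected to reproduce the figures quoted above exactly.

```python
from math import exp


def tmrca_posterior(mismatches: int, rate: float = 1 / 7,
                    max_generations: int = 200) -> list[float]:
    """Probability of each TMRCA g = 1..max_generations, given the observed mismatches.

    Flat prior over g; Poisson likelihood with mean 2 * g * rate, because
    mutations can occur in either of the two lines since the common ancestor.
    """
    weights = []
    for g in range(1, max_generations + 1):
        mean = 2 * g * rate
        # The k! term of the Poisson formula cancels when we normalize.
        weights.append(mean**mismatches * exp(-mean))
    total = sum(weights)
    return [w / total for w in weights]


posterior = tmrca_posterior(1)
most_likely = max(range(len(posterior)), key=posterior.__getitem__) + 1
for g in range(1, 13):
    print(g, f"{posterior[g - 1]:.3f}")
```

The printed column rises to a peak and then falls away slowly, which is why the tail toward larger numbers of generations is so long.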
Similar methods can be used to predict when a larger group of people had a common all male line ancestor, or at least when a very small group of related people started a large increase in population. Our results tables give such a result for each subgroup.
Our TMRCA results are in generations since the MRCA. To get to years, you have to know how long a generation is. A good average number for Scotland before 1800 is 31 years. Some TMRCA calculators, such as this one, are in "Transmission Events" rather than in generations since the MRCA. "Transmission events" counts both legs of the tree since the MRCA, and hence is about twice the number of generations; it is exactly twice if the data is for cousins who are not "removed" at all.
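The two conversions are simple arithmetic, sketched here with invented function names. The 31-year generation is the figure given above; the idea that transmission events are the sum of the generations in each leg follows from the description above.

```python
YEARS_PER_GENERATION = 31  # average for Scotland before 1800, as stated above


def generations_to_years(generations: float) -> float:
    """Convert a TMRCA in generations to an approximate number of years."""
    return generations * YEARS_PER_GENERATION


def transmission_events(generations_a: int, generations_b: int) -> int:
    """Count father-to-son steps along both legs of the tree back to the MRCA."""
    return generations_a + generations_b


print(generations_to_years(4))    # 124
print(transmission_events(4, 4))  # 8: cousins not "removed", exactly twice 4
print(transmission_events(4, 5))  # 9: cousins once removed, about twice
```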
Each STR mutates, on average, at a different rate from other STRs. Using the Sorenson Foundation, Ysearch, and Ymatch databases, plus several academic studies, your webmaster has been able to infer the 'consensus' absolute mutation speeds of the Family Tree DNA 37-marker STR panel as well as most other markers we list. These rates are needed for the various calculations. Note that Professor McDonald's rates, as well as those calculated by the Sorenson Foundation itself, are quite a bit slower than those used by FTDNA for their on-line TMRCA "FTDNATiP". The most recent rates, those calculated by John F. Chandler, are similar in speed to other sources for the "FTDNA first 12" but are substantially faster for the "FTDNA 13-25" and "FTDNA 26-37" sets. These Chandler rates are now (March 2007) included in the data we use on this web site. Our new numbers are still slower than the numbers FTDNA itself uses. We believe that ours are more reliable than those of FTDNA. Our March 2007 numbers give larger TMRCA numbers than FTDNA does, but smaller than our old numbers. Our rates for markers beyond the "FTDNA-37" set are unchanged and remain of very dubious accuracy.
It is possible to infer relationships from DNA data, without starting with a paper trail. With the number of markers we have, even if everyone had 118 of them, it is not possible to do so with certainty. Nevertheless we have used three computer methods to process our data. We present the results from one of them (median joining). Our criteria for inclusion in a specific group are that all three of the mathematical methods we tried clearly group some people together, or that two of them clearly group them and they share a surname other than McDonald or McDaniel. For the most common haplogroup, R1b, this leaves a large number of participants in one "unclassified" group. Unclassified people are colored yellow. Larger dots indicate haplotypes shared by several identical people. If you are in one of these, the code shown may be that of someone else. People whose results arrived after September 24, 2006 may not be on these charts.
The imprecision with which these clustering methods operate is illustrated by the difference between the R1b chart made using just 25 markers and the one made using 37. Since more data is better, the one with 37 is more likely correct. But given the amount of difference, clearly we need far more than 37 to get real certainty. Possibly even 118 will not be enough.
These charts illustrate one possible "genealogical path" between two people. The path length is proportional to the number of mutations along it. They operate on a different assumption than simple TMRCA calculations. TMRCA for two people uses just the information for those two people, but the computer programs that make the charts use information on all the people. One way of doing this selects the route between two people that is populated by the most "in-between" people (actually, "in-between" haplotypes). Another method attempts to find ways of making the number of mutations for each marker on the whole chart proportional to the known per-marker mutation rate. These sometimes give different results. As a result of these uncertainties, one can find "pairs of pairs" of people such that the TMRCA of each pair is the same but the distance along the path is quite different. This is just a reflection of the imprecision of TMRCA.
The programs also can calculate a "group TMRCA" for a whole collection of people. This number is more precise than the TMRCA estimates given in our results tables. For the R1a Somerled group (red on the chart) the actual number is in startlingly good agreement with the value known from history. For the R1b charts the number for the smaller groups is only a few generations, while for the larger ones, such as the red and green ones, it is typically 3000 years or more. For the R1b chart as a whole the time of the group TMRCA is well back into the Ice Age.
These charts are reached through the Results page. One of them, including the two subgroups of haplogroup I, is reproduced below.