Permission to use the Database and Concordance for research, and to cite portions of the data in results, is freely given provided that suitable acknowledgement is made.
2 January 2006: It is replaced by the database described in All enquiries should be directed to the FBI or to Dr John Dawson, JLD1@cam.ac.uk.
The concordance is based on a collection of sequences, some from single individuals and some representing more than one individual, stored in a standardized form. It uses the site numbering of the Cambridge Reference Sequence (Anderson numbers) and IUPAC symbols, with the addition of "o" to denote an omitted site. The concordance is in three parts (each of the hypervariable regions separately, then both together), and each part may be accessed by site number.
The standard sites are those most commonly used in studies of the two hypervariable regions. Some researchers sequence fewer than these standard sites, in which case omitted sites are marked as [o]; many researchers sequence more than these standard sites, in which case different colours are used to denote the status of a particular site (see below for details).
The following sample of concordance (taken from the HVR1+HVR2 section) will be used to illustrate the features of the concordance web pages. Some imaginary fields have been added to complete the entry.
|. . .||. . .|
|16024[o] 16051[G] 16114[T] 16189[C] 16192[T] 16223[T] 16293[T] 16311[C] 16316[G] 16355[T] 16362[C] 63-72[o] 73[G] 146[C] 152[C] 195[C] 244[G] 263[G] 309.1[C] 315.1[C] 340[T]||Twgdam; AFAM 11; [HA]; African Amer.(1)|
|. . .||. . .|
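The entry format shown in the left-hand column can be split into its parts mechanically. The following is a minimal Python sketch, not part of the concordance itself: the names `Site` and `parse_sequence` are hypothetical, and the token grammar (site number, optional range, optional insertion suffix, one symbol in square brackets) is inferred only from the sample above.

```python
import re
from typing import List, NamedTuple


class Site(NamedTuple):
    start: int      # first (or only) site number
    end: int        # last site number; equals start for a single site
    insertion: int  # 0 for a basic site, 1 for ".1", 2 for ".2", ...
    symbol: str     # IUPAC symbol, or "o" for an omitted site


# Assumed grammar: 16189[C], 63-72[o], 309.1[C]
_TOKEN = re.compile(r"^(\d+)(?:-(\d+))?(?:\.(\d+))?\[([A-Za-z])\]$")


def parse_sequence(text: str) -> List[Site]:
    """Split a whitespace-separated concordance entry into Site tuples."""
    sites = []
    for token in text.split():
        match = _TOKEN.match(token)
        if match is None:
            raise ValueError(f"unrecognised token: {token!r}")
        start = int(match.group(1))
        end = int(match.group(2)) if match.group(2) else start
        insertion = int(match.group(3)) if match.group(3) else 0
        sites.append(Site(start, end, insertion, match.group(4)))
    return sites
```

For example, `parse_sequence("63-72[o] 309.1[C]")` yields one omitted range covering sites 63 to 72 and one insertion after site 309.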
The large red heading 16189[C] shows the key site, the substitution which is being processed at this point. These headings are in strict numerical and alphabetic order, with the exception that all HVR1 sites precede all HVR2 sites. Care should therefore be taken that following a long section such as 16231[C], less usual substitutions such as 16231[G] and 16231[T] are not overlooked. Insertion sites such as 16193.1, 16193.2, etc. appear after their basic site 16193.
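The heading order described above can be expressed as a sort key. This is only an illustrative sketch: the function `heading_key` is hypothetical, and it assumes that any site numbered 16024 or higher counts as HVR1 for ordering purposes, which the text does not state explicitly.

```python
HVR1_START = 16024  # assumption: sites from here upward sort as HVR1


def heading_key(heading: str):
    """Sort key for headings such as '16231[C]' or '16193.1[C]'."""
    site, _, rest = heading.partition("[")
    symbol = rest.rstrip("]")
    base, _, insertion = site.partition(".")
    number = int(base)
    region = 0 if number >= HVR1_START else 1  # HVR1 before HVR2
    # Insertion index 0 (the basic site) sorts before .1, .2, ...
    return (region, number, int(insertion) if insertion else 0, symbol)


headings = ["73[G]", "16231[T]", "16193.1[C]", "16231[C]", "16193[C]", "16231[G]"]
ordered = sorted(headings, key=heading_key)
```

With this key, 16231[G] and 16231[T] sort directly after 16231[C], and 16193.1[C] follows 16193[C].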
Because omitted sites are included in the sorting of sequences, be aware that a sequence which begins (for example)
73[G] 146[C] . . .
may also appear separately as
63-72[o] 73[G] 146[C] . . .
Generally, standard sites are coloured blue, even if they have been omitted in the sequence being illustrated; sites outside the range of the standard sites (i.e. sites below 63, sites 323-16023 inclusive, and sites above 16324) are coloured black; the key site is coloured red (so it is quite clear which key site is being considered, even if there is no heading visible on the page).
Sites outside the range of standard sites will also appear as key sites, in which case they are coloured purple, both in headings and within sequences. We apologise to anyone who is colour-blind.
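The colour rules can be summarised in a few lines of Python. This is a sketch for illustration only; the function names are hypothetical, and the standard ranges (63-322 and 16024-16324) are simply the complement of the out-of-range sites listed above.

```python
def is_standard(site: int) -> bool:
    """True if the site lies within the HVR2 (63-322) or HVR1 (16024-16324) standard sites."""
    return 63 <= site <= 322 or 16024 <= site <= 16324


def colour(site: int, is_key: bool) -> str:
    """Colour of a site as displayed in the concordance pages."""
    if is_key:
        # Key sites are red, or purple when they fall outside the standard sites.
        return "red" if is_standard(site) else "purple"
    # Other sites are blue when standard (even if omitted), black otherwise.
    return "blue" if is_standard(site) else "black"
```

Applied to the sample entry, `colour(16189, True)` gives red, `colour(16024, False)` gives blue, and `colour(340, False)` gives black.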
In the sequence illustrated:
16024[o] shows that site 16024 has been omitted by the researchers who deposited the sequence, although it forms part of the HVR1 standard sites.
16189[C] is the key site being studied.
The remaining substitutions 16051[G] ... 16316[G] are included in the standard sites for HVR1.
16355[T] and 16362[C] are outside the standard sites, and hence are coloured black.
63-72[o] shows that the researchers did not sequence sites 63 to 72 inclusive, although they form part of the HVR2 standard sites.
73[G] ... 315.1[C] are the substitutions which appear in the standard sites of HVR2.
340[T] lies beyond the standard sites of HVR2, and so is coloured black.
Sequences under the same key site are sorted numerically and alphabetically, with HVR1 preceding HVR2, and [o] preceding [A].
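One way to picture this ordering is to compare sequences token by token. The sketch below is hypothetical and rests on several assumptions not stated in the text: sites numbered 16024 or higher are treated as HVR1, a range such as 63-72 sorts by its first site, and [o] is ranked ahead of every IUPAC symbol.

```python
HVR1_START = 16024  # assumption: sites from here upward sort as HVR1


def token_key(token: str):
    """Sort key for one token such as '16189[C]', '63-72[o]' or '309.1[C]'."""
    site, _, rest = token.partition("[")
    symbol = rest.rstrip("]")
    first = site.split("-")[0]  # assumption: a range sorts by its first site
    base, _, insertion = first.partition(".")
    number = int(base)
    region = 0 if number >= HVR1_START else 1  # HVR1 before HVR2
    symbol_rank = (0, "") if symbol == "o" else (1, symbol)  # [o] before [A]
    return (region, number, int(insertion) if insertion else 0, symbol_rank)


def sequence_key(sequence: str):
    """Sort key for a whole sequence: its tokens compared left to right."""
    return [token_key(token) for token in sequence.split()]


sequences = [
    "16189[C] 73[G] 146[C]",
    "16189[C] 63-72[o] 73[G] 146[C]",
    "16024[o] 16189[C] 73[G]",
    "16024[A] 16189[C] 73[G]",
]
ordered = sorted(sequences, key=sequence_key)
```

Under these assumptions the two sequences that differ only by 63-72[o] receive different keys, which is why such sequences are listed separately.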
In the right-hand column of the table, each distinct citation of this exact sequence is listed, each preceded by a marker symbol (sometimes there are hundreds of citations, listed in alphabetic order of population name). Items within a citation are separated by semicolons. The first item, in this case Twgdam, is the short title of the published paper or other source in which this sequence appears. Complete details of the paper can be found in the bibliography.
The next item AFAM 11 is the sequence identifier assigned to this sequence by the researchers.
If known, a haplogroup number then appears in square brackets, in this case [HA].
There then follows a population name, in this case African Amer., with an absolute frequency in parentheses. In the case of a single individual's sequence, there will only be one population name, and the frequency will always be (1); in the case of a sequence representing more than one individual, there may be more than one population listed, and the absolute frequencies may be greater than one. It is impossible to ascertain these frequencies from some of the published papers, and in these cases the frequency appears as (?).
Population names (sometimes abbreviated) are usually as given in the published literature, and reference should be made to the Population Index and the Geographical Index for identifying the geographical location of less well known groups, for example Yanomama.
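The citation fields described above (short title, sequence identifier, optional haplogroup, then population names with frequencies) can be read with a short parser. This is a minimal sketch with a hypothetical function name, `parse_citation`; it assumes that multiple populations are also separated by semicolons, which the sample entry does not confirm.

```python
import re

# Population item: a name followed by a count or "?" in parentheses.
_POPULATION = re.compile(r"^(.*?)\s*\((\d+|\?)\)$")


def parse_citation(entry: str) -> dict:
    """Parse an entry such as 'Twgdam; AFAM 11; [HA]; African Amer.(1)'."""
    items = [item.strip() for item in entry.split(";") if item.strip()]
    source, identifier, rest = items[0], items[1], items[2:]

    # The haplogroup is optional and appears in square brackets.
    haplogroup = None
    if rest and rest[0].startswith("[") and rest[0].endswith("]"):
        haplogroup = rest[0][1:-1]
        rest = rest[1:]

    populations = []
    for item in rest:
        match = _POPULATION.match(item)
        if match is None:
            raise ValueError(f"unrecognised population item: {item!r}")
        # A frequency of (?) is stored as None.
        count = None if match.group(2) == "?" else int(match.group(2))
        populations.append((match.group(1), count))

    return {
        "source": source,
        "identifier": identifier,
        "haplogroup": haplogroup,
        "populations": populations,
    }
```

An unknown frequency, written (?), is returned as `None` so that it cannot be mistaken for a count.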