Tuesday, July 5, 2011

Genome Update: Representing variation in the LRC on chr. 19q13.4

Human GRCh37 patch release 5 includes eight NOVEL patches representing different haplotypes in the Leukocyte Receptor Complex (LRC) region on chromosome 19q13.4 (GL949746.1, GL949747.1, GL949748.1, GL949749.1, GL949750.1, GL949751.1, GL949752.1, GL949753.1). This region contains multiple clusters of genes belonging to the immunoglobulin superfamily, including killer immunoglobulin-like receptors (KIRs), leukocyte immunoglobulin-like receptors (LILRs) and leukocyte-associated immunoglobulin-like receptors (LAIRs). The LRC is of major importance in a wide range of human diseases. Research efforts have focused in particular on the KIR cluster, since this ~150 kb region displays extensive haplotypic variation due both to differences in coding sequences and to the presence or absence of particular loci.

Several reports indicated problems with the representation of the LRC region in both NCBI36 and GRCh37. In GRCh37, one improvement was made: the NCBI36 chr. 19 unlocalized scaffold NT_113949.1, which contained a second representation of this region, was determined to be mis-assembled and was excluded from the assembly (tracked in HG-196). However, in both assembly versions the chromosome 19 sequence for this variable region is derived from multiple clone libraries, suggesting that it does not represent a single haplotype. Ongoing GRC efforts to replace this region of chromosome 19 in future assembly versions with a new single-haplotype path from the CHORI-17 hydatidiform mole library are being tracked in HG-1079. The NOVEL LRC patches that have now been released provide partial representations of the LRC region for eight different haplotypes.

Four of the NOVEL patch LRC haplotypes are derived from the same PGF and COX cell lines that were used in the Major Histocompatibility Complex (MHC) project (seven haplotypes from the MHC project have already been incorporated into GRCh37 as alternate loci: GL000250.1, GL000251.1, GL000252.1, GL000253.1, GL000254.1, GL000255.1, GL000256.1). However, whilst PGF and COX are homozygous for the HLA region of the MHC, they are heterozygous for the KIR region of the LRC, and hence each cell line is represented here by two haplotypes: PGF1 and PGF2, and COX1 and COX2 (PMID:17092261). The other four LRC haplotypes, named s, t, j and i, are derived from a study by Traherne et al. that identified rare contracted KIR haplotypes in families of European origin (PMID:19959527).

The sequence coverage of the s, t, j and i haplotypes is limited to the KIR region, whilst that of COX1/2 and PGF1/2 extends into the LILR and LAIR clusters. Corresponding manual gene annotation for each of these haplotypes has been generated as part of the Vega project.

Figure 1 (below): Alignment of the 8 LRC region NOVEL patches to GRCh37 chr. 19. The blue bars at the top represent the tiling path of chr. 19 (NC_000019.9). Genes annotated on this sequence are shown in green. The gray tracks below represent the alignments: the thin horizontal lines indicate gaps, while the small vertical red bars indicate mismatches.

Wednesday, March 23, 2011

Updating the genome: the CCL3L1 region of chr17q12

The CCL3L1 and CCL4L1 genes are found in a region of human chromosome 17q12. These genes encode cytokines, and their copy number varies between individuals, with 0-4 copies in European individuals and 3-10 copies in African individuals. Copy number variation of these genes has been associated with various autoimmune diseases and may play a role in rheumatoid arthritis susceptibility [PMID:17604289]. There are conflicting reports concerning how this region influences HIV infection and progression [PMID:15637236 and PMID:19812560].

In the NCBI36 reference, this region was composed of clones from different libraries, and thus from different haplotypes. A user reported that the selected tiling path, which also contained a gap, was unlikely to represent a valid structure at this biomedically important locus. We have tracked the work on this region in HG-75. Despite our best efforts, we could not resolve all problems in this region in time for the release of GRCh37.

Because of the complexity of this region, we chose to produce a new tiling path using a BAC library constructed from a hydatidiform mole (CHORI-17). Complete hydatidiform moles result from a single sperm fertilizing an enucleated egg. The paternal genome then duplicates to generate two identical sets of chromosomes, so the mole contains DNA from a single haplotype. Clones from this resource have proven very useful for resolving highly duplicated genomic regions. The joins between the newly sequenced mole clones are of excellent quality, so we have a higher degree of confidence in the assembly of the new components than in the old, mixed-haplotype assembly.

The resulting path closes the gap and contains a single copy each of CCL3L1 and CCL4L1, providing a valid allele at this locus. The new path also contains five full copies of TBC1D3, two of which flank the CCL genes. Recombination between these two flanking copies of TBC1D3 could provide a reasonable explanation for how the null state (no copies of the CCL genes) arises. Because of the clinical importance of this region, we released this sequence as part of patch release 2 (GL383560.1).
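The recombination explanation above can be illustrated with a toy model of unequal crossing over (non-allelic homologous recombination) between two copies of a repeat. This is a minimal sketch under stated assumptions, not the GRC's analysis: it assumes the two flanking TBC1D3 copies are in direct orientation, reduces the haplotype to a list of segment labels, and ignores the other three TBC1D3 copies and all sequence detail. The function names unequal_crossover and copy_number are hypothetical.

```python
def unequal_crossover(haplotype: list, i: int, j: int) -> tuple:
    """Return (deletion, duplication) products of a crossover between the
    repeat copies at positions i < j.

    Toy model (assumption): copy j on one chromatid mispairs with copy i on
    the other, and both copies are in direct orientation.
    """
    if not 0 <= i < j < len(haplotype):
        raise ValueError("need 0 <= i < j < len(haplotype)")
    if haplotype[i] != haplotype[j]:
        raise ValueError("crossover requires two copies of the same repeat")
    # Keep everything up to copy i, then resume after copy j:
    # the segments between the two repeats are lost.
    deletion = haplotype[: i + 1] + haplotype[j + 1 :]
    # Reciprocal product: the segments between the repeats appear twice.
    duplication = haplotype[: j + 1] + haplotype[i + 1 :]
    return deletion, duplication


def copy_number(haplotype: list, gene: str) -> int:
    """Count how many times a segment label occurs in a haplotype."""
    return haplotype.count(gene)


if __name__ == "__main__":
    # Hypothetical labels: CCL genes flanked by two TBC1D3 copies.
    hap = ["TBC1D3", "CCL3L1", "CCL4L1", "TBC1D3"]
    deletion, duplication = unequal_crossover(hap, 0, 3)
    print(deletion, copy_number(deletion, "CCL3L1"))
    print(duplication, copy_number(duplication, "CCL3L1"))
```

Starting from a haplotype with one copy of each CCL gene, the deletion product carries zero copies (the null state) and the reciprocal product carries two, which is one way copy number could drift between individuals under this simplified model.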

Figure 1: Alignment of GL383560.1 to the chr17 sequence. The top track, represented by the blue line, shows the sequence of GL383560.1. Below that is a track of gene features annotated on this sequence (blue represents transcript features and red represents CDS). The track below this, represented by the gray line with the red vertical bars, is the alignment to chr17 (CM000679.1). Vertical red bars show mismatches, and the thin red lines are gaps in the CM000679.1 sequence. The annotation on chr17 is projected below this alignment so that the resulting change in annotation can be seen.