Wednesday, March 23, 2011

Updating the genome: the CCL3L1 region of chr17q21

The CCL3L1 and CCL4L1 genes are found in a region of human chromosome 17q12. These genes encode cytokines, and their copy number varies between individuals, with 0-4 copies in European individuals and 3-10 copies in African individuals. Copy number variation of these genes has been associated with various autoimmune diseases and may play a role in rheumatoid arthritis susceptibility [PMID:17604289]. There are conflicting reports concerning how this region influences HIV infection and progression [PMID:15637236 and PMID:19812560].

In the NCBI36 reference, this region was composed of clones from different libraries, and thus from different haplotypes. A user reported that the selected tiling path, which also contained a gap, probably did not represent a valid structure at this biomedically important locus. We have tracked the work on this region in HG-75. Despite our best efforts, we could not resolve all problems in this region in time for the release of GRCh37.

Because of the complexity of this region, we chose to produce a new tiling path using a BAC library constructed from a hydatidiform mole (CHORI-17). Complete hydatidiform moles are the result of a single sperm fertilizing an enucleated egg. The sperm genome reduplicates to generate two sets of the paternal chromosomes, so the mole contains DNA from a single haplotype. Clones from this resource have proven very useful for resolving highly duplicated genomic regions. The joins between the newly sequenced mole clones are of excellent quality, so we have a higher degree of confidence in the assembly of the new components than in the old, mixed-haplotype assembly.

The resultant pathway closes the gap and contains a single copy each of CCL3L1 and CCL4L1, providing a valid allele at this locus. The new pathway also contains five full copies of TBC1D3, two of which flank the CCL genes. Recombination between these two flanking copies of TBC1D3 would remove the CCL genes, which could provide a reasonable explanation for how the null state is generated. Because of the clinical importance of this region, we released this sequence as part of patch release 2 (GL383560.1).
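The recombination scenario described above can be illustrated with a toy model. This is only a sketch: the gene order between the flanking TBC1D3 copies, the list representation of the locus, and the function name `unequal_crossover` are assumptions made for illustration and are not taken from the assembly itself.

```python
from typing import List, Tuple


def unequal_crossover(locus: List[str], i: int, j: int) -> Tuple[List[str], List[str]]:
    """Toy model of a crossover between two misaligned copies of a repeat.

    `locus` is an ordered list of gene names on one haplotype (hypothetical
    layout). `i` and `j` (i < j) are the positions of the two repeat copies
    that pair with each other on two chromatids carrying the same haplotype.
    Returns the two reciprocal products: one that loses the segment between
    the repeats and one that gains an extra copy of it.
    """
    if not (0 <= i < j < len(locus)):
        raise ValueError("expected 0 <= i < j < len(locus)")
    if locus[i] != locus[j]:
        raise ValueError("crossover positions must be copies of the same repeat")
    # Chromatid A up to and including repeat i, then chromatid B after repeat j.
    deletion_product = locus[: i + 1] + locus[j + 1 :]
    # Chromatid B up to and including repeat j, then chromatid A after repeat i.
    duplication_product = locus[: j + 1] + locus[i + 1 :]
    return deletion_product, duplication_product


# Assumed layout: two TBC1D3 copies flanking one copy each of the CCL genes.
locus = ["TBC1D3", "CCL3L1", "CCL4L1", "TBC1D3"]
deletion, duplication = unequal_crossover(locus, 0, 3)

print("deletion product:   ", deletion)
print("duplication product:", duplication)
```

In this simplified picture, the deletion product retains a single TBC1D3 copy and no CCL genes, which corresponds to the null state. The reciprocal product carries two copies of each CCL gene.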

Figure 1: Alignment of GL383560.1 to chr17 sequence. The top track, represented by the blue line, shows the sequence of GL383560.1. Below that is a track of gene features annotated on this sequence (blue represents transcript features and red represents CDS). The track below this, represented by the gray line with the red vertical bars, is the alignment to chr17 (CM000679.1). Vertical red bars show mismatches, and the thin red lines are gaps in the CM000679.1 sequence. The annotation on chr17 is projected below this alignment so that the resulting change in annotation can be seen.