Wednesday, January 9, 2013

Genome Update: Highly variant immune regions retiled as single haplotype paths


Genes encoding proteins that make up the immune system are constantly evolving in response to selective pressures from pathogens. This rapid host-pathogen co-evolution has produced large families of highly polymorphic genes, often as a result of gene duplication and diversification. In GRCh37, the current reference assembly, some chromosome regions encompassing such genes are composed of components from several different genomic libraries. The lack of a single haplotype and the excess allelic variation in such regions hinder haplotype inference using traditional linkage-disequilibrium-based methods. In addition, given the polymorphic nature of these genes, paralogs may be missing from the reference assembly. The CHORI-17 BAC library, derived from a hydatidiform mole, is an excellent resource for resolving loci such as these, as it is composed of germline material without any allelic variation. We sequenced clones from CHORI-17 to create a single-haplotype path across two of these loci: the leukocyte receptor complex (LRC) and the immunoglobulin heavy chain locus (IGH). These new paths have now been released as fix patches in GRCh37.p11.

The LRC on chromosome 19q13.4 is approximately 1 Mbp and contains many genes related to immune response, including the LILR (Leukocyte Immunoglobulin-like Receptor) and KIR (Killer Immunoglobulin-like Receptor) gene families (Fig. 1). The products of these genes interact with HLA molecules, making them important components of the innate immune response. The GRC previously released eight novel patches providing partial representation of the LRC region for eight different haplotypes. We have now released a fix patch (KB021647.1) for this region that provides full representation of the CHORI-17 haplotype. In GRCh38, this patch will be incorporated into the reference chromosome, replacing the GRCh37 mixed haplotype. The CHORI-17 haplotype harbors the common 6.8 kbp LILRA3 deletion, which has been associated with multiple autoimmune disorders such as psoriasis and multiple sclerosis. In addition, the KIR haplotype is the A01 haplotype, which contains the 22 bp frameshift deletion variant of the KIR2DS4 gene that inactivates the protein.
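Why a 22 bp deletion is inactivating: a coding sequence is read in consecutive 3-nt codons, so any deletion whose length is not a multiple of 3 shifts every downstream codon out of frame. The minimal Python sketch below illustrates this with a made-up toy sequence (not the actual KIR2DS4 sequence); the helper names `codons`, `delete`, and `is_frameshift` are hypothetical and exist only for this illustration.

```python
# Toy illustration of a frameshift caused by a 22 bp deletion.
# The sequence is invented for illustration; it is NOT the real KIR2DS4 sequence.

def codons(seq: str) -> list[str]:
    """Split a coding sequence into consecutive complete 3-nt codons (trailing partial codon dropped)."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

def delete(seq: str, start: int, length: int) -> str:
    """Return seq with `length` bases removed starting at 0-based index `start`."""
    return seq[:start] + seq[start + length:]

def is_frameshift(deletion_length: int) -> bool:
    """A deletion shifts the downstream reading frame unless its length is a multiple of 3."""
    return deletion_length % 3 != 0

# Hypothetical reference: start codon, 12 GCA codons, TAA stop codon.
reference = "ATG" + "GCA" * 12 + "TAA"
# Remove 22 bases immediately after the start codon.
mutant = delete(reference, 3, 22)

print(is_frameshift(22))       # True, because 22 = 3 * 7 + 1
print(codons(reference)[-1])   # the in-frame stop codon TAA
print(codons(mutant))          # downstream codons are shifted; TAA is no longer read in frame
```

In the toy mutant, the original stop codon is split across the new frame, which is the same mechanism by which a frameshift deletion typically disrupts the encoded protein.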


Fig. 1 LRC CHORI-17 patch
Fig. 1. Top: Alignment of GRCh37 chr. 19 to the LRC region fix patch. Bottom: Alignment of the fix patch and 8 LRC region novel patches to GRCh37 chr. 19. The blue bars represent the tiling paths of chr. 19 (NC_000019.9) and the fix patch (KB021647.1). The region of the fix patch composed of CHORI-17 clones is highlighted in orange. Genes annotated on the chromosome are shown in green. The gray tracks below represent the alignments: the thin horizontal lines indicate gaps, while the small vertical red bars indicate mismatches. The red arrows show the location of the LILRA3 deletion in the CHORI-17 haplotype.

The 1 Mbp IGH locus on chromosome 14q32.33 contains genes that encode the heavy chains of immunoglobulin molecules, which interact with antigen epitopes (Fig. 2). This locus is even more complicated than the LRC because the IGH genes undergo somatic rearrangement, so attempts to reconcile the organization of the locus using B-lymphocyte-derived material have been difficult. The GRC has now released a fix patch (KB021645.1) that provides a single-haplotype representation for the majority of this locus, covering the gene segments that encode the IG variable domain. The CHORI-17 haplotype adds 101 kbp of previously uncharacterized sequence, including functional IGH variable genes and four large germline copy number variants (Watson and Steinberg, in review).


Fig. 2. Top: Alignment of GRCh37 chr. 14 to the IGH region fix patch. Bottom: Alignment of the fix patch to GRCh37 chr. 14. The blue and gray bars represent the tiling paths of chr. 14 (NC_000014.8) and the fix patch (KB021645.1). The region of the fix patch composed of CHORI-17 clones is highlighted in orange. Genes annotated on the chromosome are shown in green. The purple bars below represent the alignments: the thin regions indicate gaps, while the small vertical ticks indicate mismatches.

These two updates highlight the utility of hydatidiform mole BAC libraries for resolving complex, highly duplicated loci of the human genome. Because these updates are released as fix patches to the reference sequence, researchers can use these high-quality sequences to better characterize sequence variation in their own disease association studies ahead of the GRCh38 genome update.