Genes encoding proteins of the immune system are constantly evolving in response to selective pressures from pathogens. This rapid host-pathogen co-evolution has led to large families of genes that are highly polymorphic and are often the result of gene duplication and diversification. In GRCh37, the current reference assembly, some chromosome regions encompassing such genes are composed of components from several different genomic libraries. The lack of a single haplotype and the excess allelic variation at such regions hinder haplotype inference using traditional linkage-disequilibrium-based methodology. In addition, given the polymorphic nature of these genes, paralogs may be missing from the reference assembly. The CHORI-17 BAC library, derived from a hydatidiform mole, is an excellent resource for resolving loci such as these, as it is composed of germline material without any allelic variation. We sequenced clones from CHORI-17 to create a single haplotype across two of these loci: the leukocyte receptor complex (LRC) and the immunoglobulin heavy chain locus (IGH). These new paths have now been released as fix patches in GRCh37.p11.
The LRC on chromosome 19q13.4 spans approximately 1 Mbp and contains many genes related to the immune response, including the LILR (Leukocyte Immunoglobulin-like Receptor) and KIR (Killer Immunoglobulin-like Receptor) gene families (Fig. 1). The products of these genes interact with HLA molecules, making them important components of the innate immune response. The GRC previously released eight novel patches providing partial representation of the LRC region for eight different haplotypes. We have now released a fix patch (KB021647.1) for this region that provides full representation of the CHORI-17 haplotype. In GRCh38, this patch will be incorporated into the reference chromosome, replacing the GRCh37 mixed haplotype. The CHORI-17 haplotype harbors the common 6.8 kbp LILRA3 deletion, which has been associated with several autoimmune disorders, such as psoriasis and multiple sclerosis. In addition, the KIR haplotype is the A01 haplotype, which contains the 22 bp frameshift deletion variant of the KIR2DS4 gene that inactivates the protein.
The 1 Mbp IGH locus on chromosome 14q32.33 contains genes that encode the heavy chain of immunoglobulin molecules, which interact with antigen epitopes (Fig. 2). This locus is even more complicated than the LRC because the IGH genes are subject to somatic rearrangements, and attempts to reconcile the organization of the locus using B-lymphocyte-derived material have proven difficult. The GRC has now released a fix patch (KB021645.1) that provides a single-haplotype representation of the majority of this locus, covering the gene segments that encode the IG variable domain. The CHORI-17 haplotype adds 101 kbp of previously uncharacterized sequence, including functional IGH variable genes and four large germline copy number variants (Watson and Steinberg, in review).
These two updates highlight the utility of hydatidiform mole BAC libraries for resolving complex, highly duplicated loci of the human genome. Because these updates are released as fix patches to the reference sequence, researchers can use these high-quality sequences to better characterize sequence variation in their own disease association studies ahead of the GRCh38 genome update.