Tuesday, July 11, 2017

GRCh38.p11: Update to GCNT2

The GRC prioritizes curation efforts that improve gene representation in the human reference genome assembly. In some cases, such curation takes the form of base-pair level edits. The recent GRCh38.p11 patch release includes a new, curated representation of the GCNT2 gene. In the GRCh38 reference assembly, GCNT2 carries the "C" allele for SNP rs539351 on chromosome 6 (NC_000006.12) at position 10,586,805, which reflects the sequence of the underlying component AL358777.12 (RP11-421M1) (Figure 1, top).

During human development, the fetal blood group antigen (i) is converted to the adult antigen (I) by a beta-1,6-N-acetylglucosaminyltransferase-2 (GCNT2). Alternative splicing of the gene generates three isoforms, which differ only in their first exon. SNP rs539351 lies in the first exon unique to GCNT2 isoform C, the only isoform expressed in red blood cells, where this conversion occurs (NM_145655.3:c.816C>G; NP_663630.2:p.Asp272Glu). A user contacted the GRC with information that the reference allele had previously been described as a rare allele [1].
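The coding and protein coordinates quoted above are related by simple codon arithmetic, which the following minimal Python sketch works through. The function names are illustrative only, and the reference codon "GAC" is not stated in the text: it is inferred from the notation, since a C at the third codon position that encodes Asp can only be GAC in the standard genetic code.

```python
# Subset of the standard genetic code covering the two amino acids involved.
CODON_TO_AA = {"GAT": "Asp", "GAC": "Asp", "GAA": "Glu", "GAG": "Glu"}


def codon_index(cds_pos):
    """Map a 1-based coding position to (codon number, 1-based position in codon)."""
    return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3 + 1


def apply_snv(codon, pos_in_codon, alt_base):
    """Return the codon with the base at the 1-based position replaced."""
    return codon[:pos_in_codon - 1] + alt_base + codon[pos_in_codon:]


# c.816C>G: which codon is affected, and at which position within it?
codon_number, pos_in_codon = codon_index(816)

# Assumption (inferred, not given in the post): the reference codon is GAC.
ref_codon = "GAC"
alt_codon = apply_snv(ref_codon, pos_in_codon, "G")

print(f"codon {codon_number}, position {pos_in_codon}: "
      f"{ref_codon} ({CODON_TO_AA[ref_codon]}) -> "
      f"{alt_codon} ({CODON_TO_AA[alt_codon]})")
```

Under these assumptions, coding position 816 falls on the third base of codon 272, and the C>G change converts an Asp codon to a Glu codon, consistent with p.Asp272Glu.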

Although the reference assembly does not provide the most common allele at every locus, the GRC does try to ensure that reference alleles are not universally rare (defined for reference purposes as those with a global MAF < 5%), provided that it can do so while representing a biologically valid haplotype and a functional allele. Data from the 1000 Genomes Project revealed that the "C" allele in GRCh38 has a global MAF of 0.017. Thus, this allele was in scope for an update.
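The frequency criterion described above can be sketched as a simple threshold check. This is an illustration only, not a GRC tool: the names are hypothetical, and the sketch covers just the MAF test, not the additional requirements that the replacement represent a biologically valid haplotype and a functional allele.

```python
# Threshold from the text: reference alleles with a global MAF below 5%
# are considered "universally rare".
UNIVERSALLY_RARE_MAF = 0.05


def is_universally_rare(global_maf, threshold=UNIVERSALLY_RARE_MAF):
    """Return True if the allele's global frequency falls below the threshold."""
    return global_maf < threshold


# Global frequency of the GRCh38 "C" allele of rs539351, as reported in the post.
grch38_c_allele_maf = 0.017
in_scope = is_universally_rare(grch38_c_allele_maf)

print(f"MAF {grch38_c_allele_maf}: candidate for update = {in_scope}")
```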

The GRC used sequence from ABBA01022081.1, a component of the HuRef assembly, as a new assembly component to provide the more common "G" allele at this position (Figure 1, bottom). We used haplotype information provided by Ensembl to confirm that the new coding representation is biologically valid (GCNT2: 272D>E). This update is now included in the fix patch KZ208911.1. It should make it easier to review results from variation analyses that use the reference assembly as a model. The GRC continues to make base-pair level updates of this kind for GRCh38. If you have questions or concerns about this process, let us know.

Figure 1. Top: Zoomed-in graphical view of the GCNT2 gene in GRCh38. The assembly sequence is shown at the top. GCNT2 is shown in green. The reference allele (D272) is a minor allele (brown box). Bottom: Zoomed-in graphical view of HG2057_PATCH, which represents the more common allele (G) from ABBA01022081.1 (red box).

Reference:

  1. Reid M., et al. The Blood Group Antigen FactsBook (3rd Edition), 603–608 (2012)