GRCh38.p14 (GCA_000001405.29/GCF_000001405.40), the latest update to the human reference assembly, has been released! It adds 69 new patch scaffolds: 51 are FIX patches that update sequences on the GRCh38 reference chromosomes or alternate loci, and 18 are NOVEL patches that provide new alternate representations for complex genomic regions that are inadequately represented by a single sequence. Two previously released FIX patches were also updated. With this release, the reference assembly contains a total of 254 patch scaffolds (164 FIX, 90 NOVEL).
Table 1. Gene representations updated on FIX patches addressing assembly component problems.
Table 2. Coding alleles of polymorphic pseudogenes updated by FIX patches addressing genomic variation.
An example of an important FIX patch in this release is an update to APOB, one of the genes the American College of Medical Genetics and Genomics recommends for reporting of incidental findings in clinical exome and genome sequencing. The patch scaffold provided in GRCh38.p14 represents the common allele.
There are 18 NOVEL patches in this release, providing alternate sequence representations of chromosomal sequences, including 9 genes (Table 3). Other NOVEL patches represent inversion and insertion haplotypes relative to the corresponding chromosomal region.
Table 3. Genes with alternate sequence representation on GRCh38.p14 NOVEL patches.
Notably, 9 of the NOVEL patches used clone sequence generated by Evan Eichler's lab as part of a published study of the evolution and population diversity of human-specific segmental duplications. The GRC also used sequences generated by the Eichler lab to create a FIX patch that improves a GRCh38 chromosome 5 alternate locus scaffold (KI270897.1/NT_187651.1) representing the haplotype from the CHM1 hydatidiform mole at the hypervariable SMA locus. Informed by CHM1 Bionano optical map data, this FIX patch (MU273354.1/NW_025791777.1) corrects component order and adds sequence from several newly sequenced CHM1 BAC clones to the alternate locus scaffold.
This patch release also extends GRC efforts to identify and exclude problematic sequences, such as false redundancies and contamination, from the reference assembly. The companion BED file available from GenBank, which identifies such regions and can be used as a mask to exclude them from analyses, has now been updated. The latest updates reflect curation done in response to reports from GRCh38 analyses performed by the Genome in a Bottle (GIAB) and Telomere-to-Telomere (T2T) consortia. In addition to the chromosome 21p regions previously reported, the file provides coordinates for 7 other regions in which the sequence falsely duplicates other sequence found in the assembly.
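For readers who want to apply such a mask in their own analyses, the following is a minimal Python sketch of one way to do it. It assumes only the standard BED convention (tab-separated chromosome, 0-based start, exclusive end in the first three columns). The function names and the `chrDemo` coordinates are hypothetical and purely illustrative; they are not taken from the GRC file, and this is not the GRC's own tooling.

```python
from collections import defaultdict


def parse_bed(lines):
    """Parse BED lines into {chrom: [(start, end), ...]}.

    Coordinates are kept as BED defines them: 0-based start, exclusive end.
    """
    mask = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines and optional BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        mask[chrom].append((int(start), int(end)))
    return mask


def overlaps_mask(mask, chrom, start, end):
    """Return True if the half-open interval [start, end) overlaps any masked region."""
    return any(start < m_end and m_start < end
               for m_start, m_end in mask.get(chrom, ()))


def filter_records(records, mask):
    """Keep only (chrom, start, end) records that do not overlap the mask."""
    return [r for r in records if not overlaps_mask(mask, *r)]


# Illustrative data only: these are NOT real GRCh38 exclusion coordinates.
demo_bed = [
    "# hypothetical mask",
    "chrDemo\t100\t200\tfalse_duplication",
    "chrDemo\t500\t600\tcontamination",
]
demo_records = [
    ("chrDemo", 50, 100),    # ends exactly where a masked region starts: kept
    ("chrDemo", 150, 250),   # overlaps the first masked region: dropped
    ("chrDemo", 600, 700),   # starts exactly where a masked region ends: kept
    ("chrOther", 150, 160),  # sequence not present in the mask: kept
]

demo_mask = parse_bed(demo_bed)
kept = filter_records(demo_records, demo_mask)
```

The linear scan per record is adequate for a mask with a small number of regions; for large interval sets, an interval tree or a dedicated tool such as bedtools would be a more appropriate choice.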