Wednesday, June 28, 2017

Improvements in the 5S rRNA gene cluster on chromosome 1q42.11-q42.13

Sequence updates that improve the representation of genes in the human reference genome assembly are a priority for the GRC. The recent GRCh38.p11 patch release includes a newly curated representation of the 5S ribosomal RNA (rRNA) gene cluster (RN5S1@) on chromosome 1q42.11-q42.13. 5S rRNA is a component of the large ribosomal subunit in all organisms. In humans, the 5S rRNA cluster consists of individual rRNA genes repeated in head-to-tail orientation and separated by spacer regions of non-rRNA sequence. The number of 5S rRNA repeats per haploid human genome is highly polymorphic, ranging from 35 to 175 copies (1).

The repetitive, clustered nature of the 5S rRNA region has long complicated both its sequencing and its assembly. Consequently, the region is incompletely represented in GRCh37 and GRCh38, the last two major versions of the reference assembly, though in different ways. In both assemblies, the sequence for the 5S rRNA region comes from the underlying components AL139288.15 (RP5-915N17) and AL713899.14 (RP4-621O15). In GRCh37, a false alignment between repeat copies in these two components produced a contiguous but collapsed representation. In GRCh38, the false alignment was broken, and a default 50 kb gap was inserted into chromosome 1 (CM000663.2/NC_000001.11) at 228,558,365 bp as a placeholder for the missing sequence. The GRCh38 representation of the cluster therefore includes only 19 copies of the 5S rRNA gene unit (17 functional genes and 2 pseudogenes) (Figure 1, top).

The fix patch KZ208906.1, included in the GRCh38.p11 release, now provides a contiguous and validated representation of the 5S rRNA genomic region. The patch closes the assembly gap and replaces the 5S rRNA copies from AL139288.15 and AL713899.14 with sequence from AC275639.1 (CH17-275P10), a BAC clone that spans the entire cluster. The patch contains 35 copies of the 5S rRNA gene (34 functional genes and 1 pseudogene) (Figure 1, bottom). The haplotype represented in this clone has been verified with BioNano optical map data from the haploid CHM1 sample, from which the clone library was derived. This new representation should serve as an improved substrate for analyses of the region, including read alignment and variation analysis.


Figure 1. Top: the 5S rRNA region in GRCh38, where an assembly gap leaves the 5S rRNA gene cluster incompletely represented. Bottom: the 5S rRNA fix patch in GRCh38.p11, which closes the gap and provides a complete representation of the 5S rRNA gene cluster.

Reference:
  1. Stults DM, et al. Genome Res. 18(1):13-18 (2008)