Friday, April 4, 2014

Chromosome 9 peri-centromeric assembly improvement

With the release of the GRCh38 reference assembly, we are highlighting areas where improvements to the genome have been made.

The chromosome 9 peri-centromeric region has changed significantly in GRCh38. Assembly-assembly alignments between GRCh37 and GRCh38 reveal some of the differences in this region. As shown below, some sequences that were on the q-arm in GRCh37 are now on the p-arm in GRCh38. Why were these and other changes made?

Peri-centromeric regions of Chr. 9 in GRCh37 (top) and GRCh38 (bottom).
Blue horizontal bar: chromosome sequence. Blue/green fragments: individual clone and WGS components in the assembly tiling path. Purple bars: assembly-assembly alignments. The p- and q-arms, as well as the locations of the centromere and adjacent heterochromatin gaps, are marked. Note: in GRCh38, the centromere gap was replaced with sequence. The vertical bars through the alignments highlight sequence from the q-arm of GRCh37 chr. 9 that is now found on the p-arm of GRCh38.
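As a rough illustration of how arm changes like those highlighted in the figure could be picked out of assembly-assembly alignments, here is a minimal Python sketch. It assumes each alignment is reduced to a pair of half-open coordinate intervals (one per assembly) and that the centromere interval in each assembly is known. The `Alignment` type, the function names and all coordinates are hypothetical and are not taken from GRC data or tools.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    """One alignment block; coordinates are 0-based, half-open (hypothetical format)."""
    old_start: int  # start on the older assembly
    old_end: int    # end on the older assembly
    new_start: int  # start on the newer assembly
    new_end: int    # end on the newer assembly


def arm(start, end, cen_start, cen_end):
    """Return 'p', 'q' or 'cen' for an interval relative to a centromere interval."""
    if end <= cen_start:
        return "p"
    if start >= cen_end:
        return "q"
    return "cen"  # overlaps the centromere interval


def arm_switches(alignments, old_cen, new_cen):
    """List alignments whose sequence sits on different arms in the two assemblies."""
    switched = []
    for aln in alignments:
        old_arm = arm(aln.old_start, aln.old_end, *old_cen)
        new_arm = arm(aln.new_start, aln.new_end, *new_cen)
        # Ignore blocks that overlap a centromere interval in either assembly.
        if old_arm != new_arm and "cen" not in (old_arm, new_arm):
            switched.append((aln, old_arm, new_arm))
    return switched


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    blocks = [
        Alignment(100, 200, 100, 200),
        Alignment(2100, 2200, 1200, 1300),
    ]
    for aln, old_arm, new_arm in arm_switches(blocks, (1000, 2000), (1500, 2500)):
        print(aln, old_arm, "->", new_arm)
```

Blocks that overlap a centromere interval are skipped rather than assigned to an arm, which keeps the sketch conservative.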

In the GRCh37 release, the region was highly fragmented, with little evidence for the order and orientation of the contigs placed within it. The optical map data were consistent with a tiling path problem: they suggested that several contigs were misplaced and that the assembled path did not represent a valid chromosome structure.

Optical map alignments to GRCh37, highlighting the fragmented and discordant pathway.
Track legend:
Pink: clone path; Green: gap; Blue: in silico SwaI fragments.
Aligned optical map track legend:
Gold: Concordant fragment; Red: Missing fragment (seen where the optical map (OM) consensus spans a gap); Grey: Unaligned fragment
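The "in silico SwaI fragments" in the track legend are the restriction fragments predicted from the reference sequence itself, which can then be compared with the fragment sizes measured in the optical maps. A minimal Python sketch of such a digest follows. It assumes the SwaI recognition site ATTTAAAT with a blunt cut in the middle (ATTT^AAAT) and a linear sequence; the function name and the example sequence are illustrative and are not the pipeline used for the figure.

```python
SWAI_SITE = "ATTTAAAT"  # SwaI recognition sequence (palindromic)
SWAI_CUT_OFFSET = 4     # blunt cut after the fourth base: ATTT^AAAT


def in_silico_digest(sequence, site=SWAI_SITE, cut_offset=SWAI_CUT_OFFSET):
    """Return the ordered fragment lengths from cutting a linear sequence at every site."""
    seq = sequence.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        # Advance by one base so overlapping sites are also found.
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    toy = "GG" + "ATTTAAAT" + "CCCC" + "ATTTAAAT" + "G"
    print(in_silico_digest(toy))
```

Because the site is palindromic, scanning one strand is sufficient. Real comparisons would also need to account for sizing error and for fragments too small to be observed in the optical map, which this sketch ignores.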

Using analyses from optical mapping, strand sequencing and admixture mapping, we have improved the representation of the region.

These data sets have allowed us to alter the tiling path with a degree of confidence, and the GRCh38 release now provides a near-complete representation of the chromosome 9 short arm.

Admixture mapping data provided by GRC collaborator Giulio Genovese confirmed localisation of clones to chromosome 9 and, in several instances, their positioning on the long or short arm. Strand sequencing data from GRC collaborators Mark Hills and Peter Lansdorp identified contigs on the GRCh37 reference assembly that sat in incorrect orientations.
Aligning these sequences to the optical map data from three cell lines, we were able to confirm results from the other analyses and place clone contigs in the correct order, creating longer contigs.
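To give a feel for how strand sequencing data can expose contigs in the wrong orientation, here is a toy Python sketch. It assumes that, for each single-cell library, every contig on the chromosome has already been given a strand call of 'W' (Watson) or 'C' (Crick), that mixed-strand cells have been excluded, and that most contigs are correctly oriented, so a contig that repeatedly disagrees with the per-cell majority is a candidate for being inverted. The input format, the function name and the 0.8 threshold are hypothetical; this is not the method used by the collaborators named above.

```python
from collections import Counter


def flag_misoriented(calls, min_fraction=0.8):
    """Flag contigs whose strand call disagrees with the per-cell majority.

    calls: {cell_id: {contig_name: 'W' or 'C'}} (hypothetical input format).
    Returns a sorted list of contigs that disagree in at least `min_fraction`
    of the cells in which they were assessed.
    """
    disagree = Counter()
    seen = Counter()
    for contig_calls in calls.values():
        counts = Counter(contig_calls.values())
        if not counts:
            continue
        majority, majority_n = counts.most_common(1)[0]
        # Skip cells with no strict majority; they are uninformative here.
        if majority_n * 2 <= len(contig_calls):
            continue
        for contig, strand in contig_calls.items():
            seen[contig] += 1
            if strand != majority:
                disagree[contig] += 1
    return sorted(c for c in seen if disagree[c] / seen[c] >= min_fraction)


if __name__ == "__main__":
    # Made-up calls for three cells and three contigs.
    example = {
        "cell1": {"ctgA": "W", "ctgB": "W", "ctgC": "C"},
        "cell2": {"ctgA": "C", "ctgB": "C", "ctgC": "W"},
        "cell3": {"ctgA": "W", "ctgB": "W", "ctgC": "C"},
    }
    print(flag_misoriented(example))
```

In this made-up example only ctgC is flagged, because it carries the opposite strand call to its neighbours in every cell.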

Optical map alignments to a pre-release GRCh38 pathway containing unfinished clones.

Although the heterochromatic region on chr. 9 is still underrepresented in GRCh38, improvements have also been made to the long arm. Several contigs localizing to the peri-centromeric region are now ordered, thus providing a better representation of the chromosome.