Thursday, February 19, 2015

GRC Website: Individual Genome Issues

Are you looking for the latest status updates from the GRC on the human, mouse or zebrafish reference genome assemblies? As a companion to our previous post, we now explain how to use the Individual Genome Issue reports on the GRC website. As described in our last blog post, you can filter and search for issues of interest using the organism-specific "Issues Under Review" pages. To provide an example for this post, we applied the following filtering options on the human "Issues Under Review" page: issue location = GRCh38.p2, chromosome = chr3, type = variation and scaffold type = ALT (alternate loci). We then selected HG-1291 from column 1 of the results table to go to the individual issue page, shown below in Figure 1. On this and other individual issue pages, you'll find the following information:
  • Summary fields describing the issue and its latest status updates or resolution (blue box)
  • Ideogram showing the issue's genomic location (green box)
  • Patch and/or alternate loci status and history (orange box)
  • Graphical view of the genomic region to which the issue is mapped (red box)
    • Note: graphical views are provided for all mapped locations in the previous and current assembly versions. For example, HG-1291 has been mapped to chr. 3 and an alternate locus scaffold in GRCh38.p2, and to chr. 3 and a novel patch in GRCh37.p13. Use the radio buttons to toggle the display between the different sequence locations.
Figure 1. GRC Issue page for HG-1291, with page features highlighted.
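
The filtering step described at the start of this post can also be pictured in code. The following is only a minimal sketch of the idea, not the GRC website's implementation: the `Issue` record, its field names and the `filter_issues` helper are all hypothetical. The HG-1291 record uses the values given in this post, and the `EXAMPLE-*` records are invented placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    """Hypothetical record holding the four fields used as filters in this post."""
    issue_id: str
    assembly: str
    chromosome: str
    issue_type: str
    scaffold_type: str


# HG-1291 uses the values described in this post; the EXAMPLE-* rows are invented.
ISSUES = [
    Issue("HG-1291", "GRCh38.p2", "chr3", "variation", "ALT"),
    Issue("EXAMPLE-1", "GRCh38.p2", "chr3", "gap", "PRIMARY"),
    Issue("EXAMPLE-2", "GRCh38.p2", "chr9", "variation", "ALT"),
    Issue("EXAMPLE-3", "GRCh37.p13", "chr3", "variation", "ALT"),
]


def filter_issues(issues, **criteria):
    """Keep only the issues whose fields equal every supplied criterion."""
    return [
        issue
        for issue in issues
        if all(getattr(issue, field) == value for field, value in criteria.items())
    ]


matches = filter_issues(
    ISSUES,
    assembly="GRCh38.p2",
    chromosome="chr3",
    issue_type="variation",
    scaffold_type="ALT",
)
print([issue.issue_id for issue in matches])
```

Each filter narrows the list independently, so applying them in any order gives the same result.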

Figure 2 below shows the graphical view of the GRCh38.p2 alternate locus scaffold to which HG-1291 has been mapped (NW_003871060.2). Default tracks in the graphical views provide you with additional information about the assembly composition and quality. They include:

  • Assembly components
  • Alignments of alternate loci/patch scaffolds to the primary assembly
  • Annotated component assembly problems
  • All GRC issues mapped to the region
  • NCBI Gene annotation
  • Ensembl Gene annotation

In this image of HG-1291, review of the Genes and Alignment tracks reveals two exons in a region of the alternate locus that has no alignment to the chromosome (arrow and circle). This annotation supports the description in the Issue Summary fields. You can further configure the tracks or upload your own data files to the graphical view by clicking on the "Configure" button at the top right of the viewer (red box).
Figure 2. Graphical view of NW_003871060.2, the GRCh38.p2 alternate locus scaffold to which issue HG-1291 is mapped. The exons captured by the additional sequence in the scaffold are highlighted.

If you have questions about any of the issues you see, please contact the GRC and reference the issue number. If you know of a genome issue that isn't found on these pages, please report the issue to the GRC.

Tuesday, February 3, 2015

GRC Website Update: Genome Issues Under Review

GRC "Genome Issues under Review" webpage update!

Do you know how to find genome issues on the GRC website? To get started, select an organism from the top of the GRC homepage, and in the corresponding organism overview page select the link for "Issues Under Review". These pages provide you with the latest information about potential problems and other issues related to the human, mouse and zebrafish reference genome assemblies that the GRC are working on. Recent updates to these pages make them more interactive, informative and easier to navigate so you can pinpoint issues relevant to your research interests. Some of the page features are highlighted in Figure 1, which shows "Human Genome Issues".
  • Show issue locations on (blue box): Use this to define the assembly version on which you want to see mapped issues. We support issue mapping to the current assembly and the last release of the prior assembly version.
  • Ideogram (green box): The histogram above the ideogram shows the number of issues related to each chromosome, and the annotations show issue locations. Looking for issues related to a single chromosome? Click on a chromosome or histogram bar of interest to see a more detailed ideogram with annotated issues (more on this below).
  • Search (purple box): Use this to find issues related to a specific gene, clone, accession number or chromosomal location.
  • Data table: Provides a summary of issues. Within this table, click on issue ID (brown box) to go to web pages for specific issues or View in browsers (brown box) to see the relevant genome regions in browsers at Ensembl, NCBI, and UCSC.
Figure 1. Human Genome Issues overview
Additional page features shown below in Figure 2 will help you identify the issues that interest you most:
  • Filter: Located to the left of the data table, this section contains various display filters, including issue type and issue status, to help you find GRC issues meeting specified criteria.
  • Issue Annotations: In the single chromosome ideogram displays, issues are annotated below the figure.
    • Tool-tips: Click on any annotation for a summary and a link to the issue page
    • Bar charts: Click on either of the interactive bar charts below the ideogram to re-categorize the issue annotation display by Type or Status.
Figure 2. Chromosome 1 genome issues
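
The per-chromosome histogram and the Type/Status bar charts are, in essence, counts of the same issue list grouped by different fields. The sketch below illustrates that idea under stated assumptions: the records, the type and status labels and the `count_by` helper are hypothetical, and none of this reflects how the GRC pages are actually built.

```python
from collections import Counter

# Invented placeholder records; the type and status labels are illustrative only.
issues = [
    {"id": "EXAMPLE-1", "chromosome": "chr1", "type": "gap", "status": "open"},
    {"id": "EXAMPLE-2", "chromosome": "chr1", "type": "variation", "status": "open"},
    {"id": "EXAMPLE-3", "chromosome": "chr1", "type": "variation", "status": "resolved"},
    {"id": "EXAMPLE-4", "chromosome": "chr2", "type": "gap", "status": "resolved"},
]


def count_by(records, field):
    """Count how many records share each value of the given field."""
    return Counter(record[field] for record in records)


# Histogram above the ideogram: number of issues per chromosome.
per_chromosome = count_by(issues, "chromosome")

# Single-chromosome view: regroup the same issues by Type or by Status.
chr1_issues = [record for record in issues if record["chromosome"] == "chr1"]
chr1_by_type = count_by(chr1_issues, "type")
chr1_by_status = count_by(chr1_issues, "status")

print(per_chromosome, chr1_by_type, chr1_by_status)
```

Switching between the Type and Status charts changes only the grouping field; the underlying set of issues stays the same.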
If you have questions about any of the issues you see, please contact the GRC and reference the issue number. If you know of a genome issue that isn't found on these pages, please report the issue to the GRC.

Wednesday, January 28, 2015

GRCh38: Patching the ABO gene

GRCh38 has started receiving patch updates, and this blog post describes a FIX patch to the ABO gene, located on chr. 9. You may be aware that the GRC released a FIX patch to ABO for GRCh37. So why is there an ABO FIX patch for GRCh38 as well?

In GRCh37, the ABO gene was annotated on sequence derived from two RP11 library clones, AL732364.9 (RP11-244N20) and AL158826.23 (RP11-430N14). However, the RP11 library is derived from a diploid genome and analysis demonstrated that the two sequenced clones represented two different Type O ABO alleles. As a result, the GRCh37 chr. 9 representation of ABO was an invalid haplotype for the gene (Fig. 1, top panel).
Fig. 1 Top: ABO region in GRCh37. The gene is derived from 2 components, resulting in an invalid haplotype not seen in any individuals. Bottom: ABO fix patch. The gene is derived from a single component and represents a known Type O haplotype.

To address this issue, we identified a clone from the CalTech human BAC library D that captures the complete ABO gene (CTD-2612A24). The sequence for this clone was finished (AL772161.10) and inserted into the chr. 9 tiling path, replacing RP11 component AL158826.23. By setting the switch points between AL732364.9 (RP11-244N20) and AL772161.10 (CTD-2612A24) so that the full insert sequence of the new component contributed to the scaffold, we were able to provide a complete and valid ABO Type A1.02 representation for the gene. This update was provided as a FIX patch scaffold (GL339450.1) for GRCh37 (Fig. 1, bottom panel).
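
To make the role of switch points concrete, here is a minimal sketch of a tiling path as a list of component segments, with a check for whether a gene falls within a single component. The component accessions come from this post, but every coordinate below is invented for illustration and does not correspond to real chr. 9 positions. The `Segment` type and both helper functions are hypothetical.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """One component's contribution to a scaffold (1-based, inclusive coordinates)."""
    component: str
    start: int
    end: int


def components_over(tiling_path, gene_start, gene_end):
    """Return the components whose segments overlap the gene interval, in path order."""
    return [
        seg.component
        for seg in tiling_path
        if seg.start <= gene_end and seg.end >= gene_start
    ]


def is_single_component(tiling_path, gene_start, gene_end):
    """True when exactly one component contributes all of the gene's sequence."""
    return len(set(components_over(tiling_path, gene_start, gene_end))) == 1


# Invented coordinates: the switch point falls inside the gene.
original_path = [
    Segment("AL732364.9", 1, 100_000),
    Segment("AL158826.23", 100_001, 250_000),
]

# Invented coordinates: the switch point is moved upstream of the gene, so the
# replacement component supplies the whole gene.
patched_path = [
    Segment("AL732364.9", 1, 80_000),
    Segment("AL772161.10", 80_001, 250_000),
]

GENE_START, GENE_END = 90_000, 115_000  # hypothetical gene interval

print(components_over(original_path, GENE_START, GENE_END))
print(is_single_component(patched_path, GENE_START, GENE_END))
```

A gene spanning a switch point takes sequence from two clones, which is how a mixed haplotype can arise when the clones carry different alleles.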

Unfortunately, this update is not reflected in GRCh38. Subsequent to the final GRCh37 patch release (GRCh37.p13) and the release of GRCh38, the sequence for RP11-244N20 was updated (AL732364.10) and inserted into the chr. 9 tiling path. The switch points between the updated sequence AL732364.10 and AL772161.10 were set incorrectly (Fig. 2). This resulted in an invalid haplotypic representation for ABO. Whereas the GRCh37 representation was a Type O/O mix, in GRCh38 it is a Type A/O mix.
Fig. 2 Top: ABO fix patch. Gene is derived from a single component. Bottom: ABO region in GRCh38. The gene is derived from 2 components, creating an invalid haplotype. This is fixed by the GRCh38 FIX patch.

The GRCh38 FIX patch scaffold KN196479.1 corrects this switch point and provides the same single haplotype representation for ABO that was present in the GRCh37 FIX patch scaffold. This re-patching of the ABO gene again restores the functionality of the gene with the valid Type A1.02 haplotype.