Tuesday, January 14, 2014

GRCh38: Incorporating Modeled Centromere Sequence

Centromeres are specialized chromatin structures that are required for cell division. The composition of these regions is complex, as they are made up of a series of tandem repeats that are arranged into nearly identical multi-megabase arrays. The size and repetitive nature of these regions mean they are typically not represented in reference assemblies. The Human Genome Project (HGP) employed a clone-based strategy (largely BAC clones) to produce the reference assembly, but cloning centromere sequences generally requires special effort, and the approach isn't readily applicable to all human centromeres (see Kouprina et al., 2003 for one such effort). With the recent widespread adoption of whole genome sequencing (WGS), alpha-satellite sequences are clearly present in the reads produced, but assembling these sequences into faithful representations of centromeres using standard techniques is impossible due to their repetitive nature. In all previous versions of the human reference assembly, each centromere region has been represented by a 3 Mb gap (that is, a stretch of 3 million Ns). Recent efforts by Karen Miga and her colleagues are helping us improve centromere representation in the reference assembly. The GRCh38 reference assembly incorporates centromere models created by Miga and colleagues, along with their model of one of the heterochromatic regions on the long arm of chromosome 7. These models replace the multi-megabase gaps that are in GRCh37.
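To make the "stretch of Ns" representation concrete, here is a minimal Python sketch that reports runs of N in a sequence string. The function name `find_n_gaps` and the toy sequence are made up for illustration; this is not a tool used by the GRC, and a real analysis would read chromosome sequences from a FASTA file instead.

```python
import re
from typing import Iterator, Tuple


def find_n_gaps(sequence: str, min_length: int = 1) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) for each run of N, as 0-based half-open coordinates."""
    for match in re.finditer(r"[Nn]+", sequence):
        if match.end() - match.start() >= min_length:
            yield match.start(), match.end()


# Toy sequence (not real genome data): 10 bases, a 20-base gap, then 5 bases.
toy_chromosome = "ACGTACGTAC" + "N" * 20 + "GGCTA"

gaps = list(find_n_gaps(toy_chromosome, min_length=10))
print(gaps)  # [(10, 30)]
```

Run against an assembly that uses fixed-size centromere gaps, a scan like this would be expected to report a multi-megabase run of Ns at each centromere position.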

As described in Miga et al., 2013, Karen and her colleagues used the whole genome shotgun (WGS) reads that were generated as part of the Venter sequencing project (Levy et al., 2007) to build centromere models (Fig. 1). They started by identifying sequence reads containing alpha-satellite centromere sequences. They then used these reads to construct models representing the approximate repeat number and order for each of the centromeric alpha-satellite higher order arrays in the genome. Because there are two copies of each centromere for each autosome, these centromere models represent an average of the two centromere copies. On the acrocentric chromosomes, where there is extreme inter-chromosomal array sequence homogeneity, the array models found in GRCh38 include data from all four acrocentric regions. The team was also able to use read pair information to link the modeled scaffold arrays to the adjacent euchromatic sequence present in the Venter assembly.
Fig. 1
Schematic of modeled centromere sequence. Centromeres are composed of higher order array sequences, which consist of alpha-satellite interrupted by various repeat elements (such as SINE or LINE elements), and inter-array (euchromatic) sequences.
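The first step described above, picking out reads that contain alpha-satellite sequence, could be sketched as a simple k-mer screen. This is only an illustration under stated assumptions: the post does not say how the reads were actually identified, `TOY_MONOMER` is a made-up placeholder rather than a real alpha-satellite consensus, and the value of `k` and the 0.5 threshold are arbitrary. For brevity, the sketch also ignores reverse-complement matches.

```python
def kmers(sequence: str, k: int) -> set:
    """Return the set of all k-length substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def shared_kmer_fraction(read: str, reference_kmers: set, k: int) -> float:
    """Fraction of the read's distinct k-mers that also occur in the reference."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return 0.0
    return len(read_kmers & reference_kmers) / len(read_kmers)


# Hypothetical placeholder sequence, NOT a real alpha-satellite consensus.
TOY_MONOMER = "ACGGTTCAGGATCCATTGACGTAGCTTAGC"
K = 8  # arbitrary k-mer size for this toy example

toy_reads = {
    "read_1": TOY_MONOMER[5:25],  # copied from the placeholder monomer
    "read_2": "T" * 20,           # unrelated low-complexity read
}

reference_kmers = kmers(TOY_MONOMER, K)
selected = [
    name
    for name, read in toy_reads.items()
    if shared_kmer_fraction(read, reference_kmers, K) >= 0.5  # arbitrary cutoff
]
print(selected)  # ['read_1']
```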
The model centromere sequences are not exact representations of the centromeres found in the Venter genome. The sequence diversity and complexity of these regions make constructing exact copies of each centromere with current sequencing technologies impossible. Each model represents variants and monomer ordering in proportion to those observed in the initial read database, but the long-range ordering of the repeats and the ordering of the linked euchromatic contigs represent only an inferred sequence. However, inclusion of these models in the reference assembly will be beneficial for the research community. Even for those not interested in centromere biology, it is likely that inclusion of these models will improve overall read alignments in individual re-sequencing efforts. Reads containing centromeric sequences are generated in whole genome sequencing experiments, and providing an alignment target for these reads will reduce the number of off-target alignments and unaligned reads. For those interested in centromere biology, Karen and her colleagues provide evidence in their manuscript that these models can be used to study sequence diversity in these regions.
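The idea that a model can preserve local monomer ordering in proportion to what the reads show, while its long-range order is only inferred, can be illustrated with a toy Python sketch. It counts which monomer follows which within each read, then samples a linear array from those counts (a simple first-order Markov chain). This is an illustration only and should not be taken as the method used by Miga and colleagues; the monomer labels, the toy reads, and the function names are all made up.

```python
import random
from collections import Counter, defaultdict


def count_transitions(read_monomers):
    """Count how often each monomer label is directly followed by another."""
    transitions = defaultdict(Counter)
    for monomers in read_monomers:
        for current, following in zip(monomers, monomers[1:]):
            transitions[current][following] += 1
    return transitions


def sample_array(transitions, start, length, seed=0):
    """Sample a linear monomer order, weighting each step by observed counts."""
    rng = random.Random(seed)
    array = [start]
    while len(array) < length:
        options = transitions.get(array[-1])
        if not options:  # no observed successor: stop early
            break
        labels = list(options)
        weights = [options[label] for label in labels]
        array.append(rng.choices(labels, weights=weights, k=1)[0])
    return array


# Toy reads, each already reduced to a list of hypothetical monomer labels.
toy_reads = [
    ["A", "B", "C", "A"],
    ["B", "C", "A", "B"],
    ["C", "A", "B", "C"],
    ["A", "B", "D", "A"],  # a rarer variant: B followed by D
]

transitions = count_transitions(toy_reads)
model = sample_array(transitions, start="A", length=12, seed=1)
print(" ".join(model))
```

In this sketch every adjacent pair in the sampled array was seen in some read, and common pairs appear more often than rare ones, but the overall order of the twelve monomers is just one of many arrangements consistent with the reads.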

8 comments:

  1. So how many base pairs are there per chromosome?
