Irys long-molecule, single-molecule fluorescence analysis technology
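A major breakthrough for BioNano Irys in structural variation research: comprehensive mapping of structural variation in the CEPH trio genomes with nanochannel chip technology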
Published: 2015-10-30

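As is well known, observing structural variation across a whole genome remains difficult with current conventional methods. Because of its short read lengths and its limitations in resolving repetitive sequence (for example, in diploid genomes), sequencing cannot detect genomic structural variation efficiently. Recently, Angel C. Y. Mak and colleagues used the nanochannel chip technology of the BioNano Irys system to map structural variation comprehensively in the genomes of the three individuals of the CEPH trio (NA12878, NA12891, NA12892). The study has been published in GENETICS, the flagship journal of the Genetics Society of America (GSA), and showcases BioNano's fluorescence nanochannel chip technology and analysis methods.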


Comprehensive whole genome structural variation detection is challenging with current approaches. With diploid cells as the DNA source and the presence of numerous repetitive elements, short-read DNA sequencing cannot be used to detect structural variation efficiently. In this report, we show that genome mapping with long, fluorescently labeled DNA molecules imaged on nanochannel arrays can be used for whole genome structural variation detection without sequencing. While whole genome haplotyping is not achieved, local phasing (across >150 kb regions) is routine, as molecules from the parental chromosomes are examined separately. In one experiment, we generated genome maps from a trio from the 1000 Genomes Project, compared the maps against those derived from the reference human genome, and identified structural variants that are >5 kb in size. We find that these individuals have many more structural variants than those published, including some with the potential of disrupting gene function or regulation.



Whole genome “short-read” sequencing is now routine and affordable. However, three challenges remain in genome analysis: genome sequence assembly, structural variation detection, and separation of the two parental genomes. In addition to the fact that humans are diploid, with cells harboring two genomes from the parents, the presence of numerous repetitive elements that are longer than the usual sequencing library insert size makes it close to impossible to assemble genome sequences with short-read sequencing alone (El-Metwally et al. 2013). Consequently, almost all whole genome sequencing projects map the sequencing reads onto the reference human genome sequence without performing whole genome assemblies (Ley et al. 2008). When whole genome assembly is attempted, it is done by the laborious and expensive approach of generating paired-end sequencing of cloned genomic DNA fragments to provide scaffolds for sequence assembly (Siegel et al. 2000). Alignment of short sequencing reads to the human reference sequence reveals single nucleotide variation and small indels in the individuals sequenced, but larger structural variants and repetitive regions in the genome are more difficult to detect. As structural variation can disrupt genes or regulatory elements, whole genome sequencing without assembly and detection of structural variation produces an incomplete picture of the genome. Recently, clone-free approaches (e.g. Hi-C scaffolding) have been used to generate sequence motif maps or long sequences to serve as scaffolds for the assembly of highly accurate short-read sequences (Burton et al. 2013; Kaplan and Dekker 2013), including the de novo assembly of a diploid human genome (Pendleton et al. 2015).
These “hybrid assembly” approaches rely on three sets of data (short-read sequences, long-read sequences of 5-20 kb, and genome maps of 150-500 kb) to overcome repetitive elements and duplicated regions larger than the typical contigs assembled from short-read sequences.

A fully assembled and phased diploid genome makes it possible to identify all structural variants present with direct access to the breakpoints involved. However, high-quality human genome sequence assembly with base-pair resolution, while feasible, is still a costly and laborious endeavor. In this report, we demonstrate the utility of genome mapping, an approach based on massively parallel analysis of extremely long single DNA molecules fluorescently labeled at specific sequence motifs in nanochannel arrays, in genome-wide identification of structural variation at 5 kb resolution without sequencing. In contrast to the short reads (hundreds of bases) used in next-generation sequencing (NGS) approaches, genome mapping analyzes individual DNA molecules of hundreds of thousands of base-pairs, thus preserving the long-range genome architecture and enabling direct interrogation of structural variants. 
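To make the idea of detecting size changes from label patterns concrete, here is a minimal Python sketch. It is not the authors' pipeline. It assumes the labels on a sample map have already been matched one-to-one with labels on the reference map, which a real alignment step would have to establish, and it flags any inter-label interval whose length differs from the reference by at least 5 kb. The names `call_size_changes`, `SizeChange`, and `MIN_SV_SIZE` are hypothetical.

```python
from typing import NamedTuple

MIN_SV_SIZE = 5_000  # bp; the resolution quoted in the text (assumed threshold)


class SizeChange(NamedTuple):
    ref_start: int  # reference position of the left flanking label
    ref_end: int    # reference position of the right flanking label
    kind: str       # "insertion" or "deletion" relative to the reference
    size: int       # absolute size difference in bp


def call_size_changes(ref_labels, sample_labels, min_size=MIN_SV_SIZE):
    """Compare matched label positions and report large interval differences.

    Assumption: ref_labels[i] and sample_labels[i] are the same label site,
    i.e. the two maps are already aligned label-to-label.
    """
    if len(ref_labels) != len(sample_labels):
        raise ValueError("label lists must be matched one-to-one")
    pairs = list(zip(ref_labels, sample_labels))
    calls = []
    # Walk over consecutive label pairs and compare interval lengths.
    for (r0, s0), (r1, s1) in zip(pairs, pairs[1:]):
        delta = (s1 - s0) - (r1 - r0)
        if abs(delta) >= min_size:
            kind = "insertion" if delta > 0 else "deletion"
            calls.append(SizeChange(r0, r1, kind, abs(delta)))
    return calls
```

For example, `call_size_changes([0, 8000, 20000, 31000], [0, 8100, 27200, 38150])` reports a single insertion of 7100 bp between reference labels at 8000 and 20000. The 100 bp and 50 bp differences in the other intervals fall below the threshold and are ignored. The event is localized only to the interval between its flanking labels, which is why breakpoint resolution depends on label spacing.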


Genome mapping has been used in several previous studies to provide scaffolds for genome sequence assembly (Cao et al. 2014; English et al. 2015; Hastie et al. 2013; Pendleton et al. 2015; Usher et al. 2015; Xiao et al. 2015). The DNA sample is prepared with a protocol that preserves the integrity of the DNA. Because native DNA is used, no amplification bias is present. Currently, analyzing a genome by genome mapping (at >60X coverage) takes less than 2 days and costs less than $1000. While whole genome haplotyping is not achieved with genome mapping alone, local phasing across regions of at least 150 kb is routine with our single-molecule analysis approach, as molecules derived from the parental chromosomes are examined separately.

We generated genome maps from a trio from the 1000 Genomes Project whose members have been sequenced to high depth and whose structural variants have been published previously. We compared the genome maps obtained from the trio against those derived from the reference human genome and identified all structural variants larger than 5 kb. Comparing the genome maps of the parents and the child allows us to check for consistency in Mendelian inheritance and separate the haplotypes. Our study shows that these individuals have many more structural variants than those published and that some of these variants have the potential to disrupt gene function or regulation. Using nicking endonuclease Nt.BspQI (GCTCTTCN^), one label is observed about every 8 kb in the human genome. Without sequencing or cloning, we are able to map breakpoints of the structural variation within 8 kb, making this a novel and efficient approach to whole genome structural variation analysis. Furthermore, our maps pinpoint the Epstein-Barr Virus (EBV) integration sites in the lymphoblastoid cell lines used and provide size estimates of two-thirds of the large “N-base gaps” in the hg38 human reference genome sequence.
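The label density quoted above can be checked with a short calculation, and a reference map can be sketched by locating the recognition motif in a sequence. The Python below is an illustrative sketch, not the software used in the study. It matches only the 7-base core GCTCTTC on both strands, ignoring the trailing N and the exact nick position, and its spacing estimate assumes a random sequence with equal base frequencies, which the human genome is not. The function names are hypothetical.

```python
# 7-base core of the Nt.BspQI recognition sequence given in the text (GCTCTTCN^).
MOTIF = "GCTCTTC"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq: str, motif: str = MOTIF) -> list[int]:
    """Return sorted 0-based start positions of the motif on either strand.

    Sites on the reverse strand are reported at the forward-strand position
    where the motif's reverse complement begins.
    """
    seq = seq.upper()
    sites = set()
    for pattern in {motif, reverse_complement(motif)}:
        start = seq.find(pattern)
        while start != -1:
            sites.add(start)
            start = seq.find(pattern, start + 1)
    return sorted(sites)


def expected_spacing(motif_len: int = len(MOTIF)) -> float:
    """Expected bp between sites in a uniform random sequence, both strands."""
    return 4 ** motif_len / 2
```

Under these simplifying assumptions a 7-base motif occurs once per 4^7 = 16384 bases on one strand, so counting both strands gives one site per 8192 bp, in line with the roughly 8 kb spacing stated in the text. Real spacing varies with local base composition.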




