Characterization of Missing Human Genome Sequences and Copy-number Polymorphic Insertions

Kidd, J.M.; Sampas, N.; Antonacci, F.; Graves, T.; Fulton, R.; Hayden, H.S.; Alkan, C.; Malig, M.; Ventura, M.; Giannuzzi, G.; Kallicki, J.; Anderson, P.; Tsalenko, A.; Yamada, N.A.; Tsang, P.; Kaul, R.; Wilson, R.K.; Bruhn, L.; Eichler, E.E.

doi:10.1038/NMETH.1451

The extent of human genomic structural variation suggests that there must be portions of the genome yet to be discovered, annotated and characterized at the sequence level. We present a resource and analysis of 2,363 new insertion sequences corresponding to 720 genomic loci. We found that a substantial fraction of these sequences are either missing, fragmented or misassigned when compared to recent de novo sequence assemblies from short-read next-generation sequence data. We determined that 18-37% of these new insertions are copy-number polymorphic, including loci that show extensive population stratification among Europeans, Asians and Africans. Complete sequencing of 156 of these insertions identified new exons and conserved noncoding sequences not yet represented in the reference genome. We developed a method to accurately genotype these new insertions by mapping next-generation sequencing datasets to the breakpoint, thereby providing a means to characterize copy-number status for regions previously inaccessible to single-nucleotide polymorphism microarrays.

Characterization of Missing Human Genome Sequences and Copy-number Polymorphic Insertions / J. Kidd, N. Sampas, F. Antonacci, T. Graves, R. Fulton, H. Hayden, C. Alkan, M. Malig, M. Ventura, G. Giannuzzi, J. Kallicki, P. Anderson, A. Tsalenko, N. Yamada, P. Tsang, R. Kaul, R. Wilson, L. Bruhn, E. Eichler. - In: NATURE METHODS. - ISSN 1548-7091. - 7:5(2010), pp. 365-371. [10.1038/NMETH.1451]

Scientific-disciplinary sectors of the article (display only): Sector BIO/18 - Genetics (Genetica)
Publication date: 2010
Journal in ANCE: NATURE METHODS
DOI: https://dx.doi.org/10.1038/NMETH.1451
Type: Article (author)
Appears in types: 01 - Journal article (Articolo su periodico)

Files in this record:

File	Size	Format
nmeth.1451.pdf (open access; type: Publisher's version/PDF)	1.18 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/863062

IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca