Past few years to identify structural variants by paired end mapping

Past few years to identify structural variants by paired-end mapping [14-18], and [19] reviews many of the recent techniques for accomplishing this goal. In addition, when the sequencing coverage is high, the number of aligned reads [20] or concordant pairs [21] provides an estimate of the number of copies of segments of the cancer genome.

In this paper we address the problem of reconstructing the organization of the cancer genome(s) present in a cancer DNA sample from the adjacency and copy number information revealed by the concordant and discordant pairs from a paired-end resequencing approach. We define the Copy Number and Adjacency Genome Reconstruction Problem, a general formulation of the problem, which we solve as a convex optimization problem. Our approach adapts and generalizes techniques that have been employed previously in genome assembly [22-24], ancestral genome reconstruction and genome rearrangement analysis in the presence of duplicated genes [25], and prediction of copy number variants [26]. In contrast to these works, we focus on the particular features and challenges of cancer genome reconstruction, including a broad class of rearrangements, aneuploidy, heterogeneity, and the availability of an “ancestral” reference genome. We apply our algorithm for solving the Copy Number and Adjacency Genome Reconstruction Problem, called Paired-end Reconstruction of Genome Organization (PREGO), to simulated cancer genome data and to real sequencing data from 5 ovarian cancer genomes from The Cancer Genome Atlas (TCGA). We identify numerous rearrangements, or structural variants, in these genomes, analyze reciprocal vs.
non-reciprocal rearrangements, and identify rearrangements consistent with known mechanisms of duplication such as tandem duplications and breakage/fusion/bridge (B/F/B) cycles.

Methods

Intervals, adjacencies, and cancer genome reconstruction

Suppose the cancer genome is derived from the germline genome through a series of somatic rearrangements. We perform paired-end DNA sequencing on a cancer DNA sample S. We assume that the sample S contains a genome sequence derived from the reference genome through some series of somatic structural rearrangements of blocks of DNA (we do not consider single nucleotide mutations). From the alignments of paired reads to the reference genome, we derive three pieces of information.

First, we derive a partition of the reference genome into a sequence of intervals I = (I_1, I_2, …, I_n). Each interval I_j = [s_j, t_j] is the DNA segment from the positive strand of the reference genome that starts at coordinate s_j and ends at coordinate t_j. Since intervals can also appear in the opposite direction in a cancer genome (e.g. due to an inversion), we denote by I_{-j} = [t_j, s_j] the inverted DNA segment.

Second, concurrently with the definition of I, we derive a set A of novel adjacencies in the cancer genome. Each adjacency (I_j, I_k) indicates that the end t_j of interval I_j is adjacent to the start s_k of interval I_k in the cancer genome. Thus A ⊆ {(I_j, I_k) : j, k ∈ {±1, ±2, …, ±n}}.

Oesper et al. BMC Bioinformatics 2012, 13(Suppl 6):S10 http://www.biomedcentral.com/1471-2105/13/S6/S

The partition I and the associated set of adjacencies A are obtained by clustering discordant paired reads whose distance or orientation suggests a rearrangement in the cancer genome [13]. Any existing algorithm can be used to create such input and therefore, the decision about what data to use (i.e. ambigu.