Gabor Marth
Publications (18)
Publication
Journal: Bioinformatics
January/13/2010
Abstract
SUMMARY
The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments.
AVAILABILITY
http://samtools.sourceforge.net.
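As an illustration of the format described in this abstract, the following is a minimal sketch of reading one SAM alignment line in Python. It assumes the tab-separated layout with eleven mandatory fields and the field names used in the current SAM specification (the abstract itself does not list them). The `SamRecord` type, the helper names, and the example read are hypothetical, and this is not how SAMtools itself is implemented.

```python
from typing import Dict, NamedTuple, Union


class SamRecord(NamedTuple):
    """One alignment line; field names follow the current SAM specification."""
    qname: str   # read (query) name
    flag: int    # bitwise flag
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # CIGAR string
    rnext: str   # reference name of the mate/next read
    pnext: int   # position of the mate/next read
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # base qualities (ASCII, Phred+33)
    tags: Dict[str, Union[int, str]]  # optional TAG:TYPE:VALUE fields


def parse_sam_line(line: str) -> SamRecord:
    """Parse a single non-header SAM line (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs 11 mandatory fields")
    tags: Dict[str, Union[int, str]] = {}
    for item in fields[11:]:
        tag, typ, value = item.split(":", 2)
        # Only integer tags are converted here; other types stay as strings.
        tags[tag] = int(value) if typ == "i" else value
    return SamRecord(
        fields[0], int(fields[1]), fields[2], int(fields[3]), int(fields[4]),
        fields[5], fields[6], int(fields[7]), int(fields[8]),
        fields[9], fields[10], tags,
    )


def is_unmapped(rec: SamRecord) -> bool:
    return bool(rec.flag & 0x4)    # flag bit 0x4: read is unmapped


def is_reverse(rec: SamRecord) -> bool:
    return bool(rec.flag & 0x10)   # flag bit 0x10: read is on the reverse strand


# Made-up example line, for illustration only.
example = "read1\t16\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII\tNM:i:0"
rec = parse_sam_line(example)
```

Header lines, which start with `@`, are not handled by this sketch and would need to be skipped before calling `parse_sam_line`.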
Publication
Journal: Nature
October/25/2015
Abstract
Structural variants are implicated in numerous diseases and make up the majority of varying nucleotides among human genomes. Here we describe an integrated set of eight structural variant classes comprising both balanced and unbalanced variants, which we constructed using short-read DNA sequencing data and statistically phased onto haplotype blocks in 26 human populations. Analysing this set, we identify numerous gene-intersecting structural variants exhibiting population stratification and describe naturally occurring homozygous gene knockouts that suggest the dispensability of a variety of human genes. We demonstrate that structural variants are enriched on haplotypes identified by genome-wide association studies and exhibit enrichment for expression quantitative trait loci. Additionally, we uncover appreciable levels of structural variant complexity at different scales, including genic loci subject to clusters of repeated rearrangement and complex structural variants with multiple breakpoints likely to have formed through individual mutational events. Our catalogue will enhance future studies into structural variant demography, functional impact and disease association.
Publication
Journal: Science
October/28/2013
Abstract
Interpreting variants, especially noncoding ones, in the increasing number of personal genomes is challenging. We used patterns of polymorphisms in functionally annotated regions in 1092 humans to identify deleterious variants; then we experimentally validated candidates. We analyzed both coding and noncoding regions, with the former corroborating the latter. We found regions particularly sensitive to mutations ("ultrasensitive") and variants that are disruptive because of mechanistic effects on transcription-factor binding (that is, "motif-breakers"). We also found variants in regions with higher network centrality tend to be deleterious. Insertions and deletions followed a similar pattern to single-nucleotide variants, with some notable exceptions (e.g., certain deletions and enhancers). On the basis of these patterns, we developed a computational tool (FunSeq), whose application to ~90 cancer genomes reveals nearly a hundred candidate noncoding drivers.
Publication
Journal: American Journal of Human Genetics
October/27/2002
Abstract
We report the identification and characterization of 2,000 human diallelic insertion/deletion polymorphisms (indels) distributed throughout the human genome. Candidate indels were identified by comparison of overlapping genomic or cDNA sequences. Average confirmation rate for indels with a ≥2-nt allele-length difference was 58%, but the confirmation rate for indels with a 1-nt length difference was only 14%. The vast majority of the human diallelic indels were monomorphic in chimpanzees and gorillas. The ratio of deletion:insertion mutations was 4.1. Allele frequencies for the indels were measured in Europeans, Africans, Japanese, and Native Americans. New alleles were generally lower in frequency than old alleles. This tendency was most pronounced for the Africans, who are likely to be closest among the four groups to the original modern human population. Diallelic indels comprise approximately 8% of all human polymorphisms. Their abundance and ease of analysis make them useful for many applications.
Publication
Journal: Genome Research
October/30/2008
Abstract
The emergence of high-throughput next-generation sequencing technologies (e.g., 454 Life Sciences [Roche], Illumina sequencing [formerly Solexa sequencing]) has dramatically sped up whole-genome de novo sequencing and resequencing. While the low cost of these sequencing technologies provides an unparalleled opportunity for genome-wide polymorphism discovery, the analysis of the new data types and huge data volume poses formidable informatics challenges for base calling, read alignment and genome assembly, polymorphism detection, as well as data visualization. We introduce a new data integration and visualization tool EagleView to facilitate data analyses, visual validation, and hypothesis generation. EagleView can handle a large genome assembly of millions of reads. It supports a compact assembly view, multiple navigation modes, and a pinpoint view of technology-specific trace information. Moreover, EagleView supports viewing coassembly of mixed-type reads from different technologies and supports integrating genome feature annotations into genome assemblies. EagleView has been used in our own lab and by over 100 research labs worldwide for next-generation sequence analyses. The EagleView software is freely available for not-for-profit use at http://bioinformatics.bc.edu/marthlab/EagleView.
Publication
Journal: Genome Biology
July/9/2017
Abstract
High-throughput sequencing enables unbiased profiling of microbial communities, universal pathogen detection, and host response to infectious diseases. However, computation times and algorithmic inaccuracies have hindered adoption.
We present Taxonomer, an ultrafast web tool for comprehensive metagenomics data analysis and interactive results visualization. Taxonomer is unique in providing integrated nucleotide and protein-based classification and simultaneous host messenger RNA (mRNA) transcript profiling. Using real-world case studies, we show that Taxonomer detects previously unrecognized infections and reveals antiviral host mRNA expression profiles. To facilitate data sharing across geographic distances in outbreak settings, Taxonomer is publicly available through a web-based user interface.
Taxonomer enables rapid, accurate, and interactive analyses of metagenomics data on personal computers and mobile devices.
Publication
Journal: Nature Communications
November/13/2017
Abstract
Metastatic breast cancer remains challenging to treat, and most patients ultimately progress on therapy. This acquired drug resistance is largely due to drug-refractory sub-populations (subclones) within heterogeneous tumors. Here, we track the genetic and phenotypic subclonal evolution of four breast cancers through years of treatment to better understand how breast cancers become drug-resistant. Recurrently appearing post-chemotherapy mutations are rare. However, bulk and single-cell RNA sequencing reveal acquisition of malignant phenotypes after treatment, including enhanced mesenchymal and growth factor signaling, which may promote drug resistance, and decreased antigen presentation and TNF-α signaling, which may enable immune system avoidance. Some of these phenotypes pre-exist in pre-treatment subclones that become dominant after chemotherapy, indicating selection for resistance phenotypes. Post-chemotherapy cancer cells are effectively treated with drugs targeting acquired phenotypes. These findings highlight cancer's ability to evolve phenotypically and suggest a phenotype-targeted treatment strategy that adapts to cancer as it evolves.
Publication
Journal: Proceedings of the National Academy of Sciences of the United States of America
February/20/2003
Abstract
Single-nucleotide polymorphisms (SNPs) constitute the great majority of variations in the human genome, and as heritable variable landmarks they are useful markers for disease mapping and resolving population structure. Redundant coverage in overlaps of large-insert genomic clones, sequenced as part of the Human Genome Project, comprises a quarter of the genome, and it is representative in terms of base compositional and functional sequence features. We mined these regions to produce 500,000 high-confidence SNP candidates as a uniform resource for describing nucleotide diversity and its regional variation within the genome. Distributions of marker density observed at different overlap length scales under a model of recombination and population size change show that the history of the population represented by the public genome sequence is one of collapse followed by a recent phase of mild size recovery. The inferred times of collapse and recovery are Upper Paleolithic, in agreement with archaeological evidence of the initial modern human colonization of Europe.
Publication
Journal: PLoS Pathogens
August/30/2015
Abstract
The simultaneous targeting of host and pathogen processes represents an untapped approach for the treatment of intracellular infections. Hypoxia-inducible factor-1 (HIF-1) is a host cell transcription factor that is activated by and required for the growth of the intracellular protozoan parasite Toxoplasma gondii at physiological oxygen levels. Parasite activation of HIF-1 is blocked by inhibiting the family of closely related Activin-Like Kinase (ALK) host cell receptors ALK4, ALK5, and ALK7, which was determined in part by use of an ALK4,5,7 inhibitor named SB505124. Besides inhibiting HIF-1 activation, SB505124 also potently blocks parasite replication under normoxic conditions. To determine whether SB505124 inhibition of parasite growth was exclusively due to inhibition of ALK4,5,7 or because the drug inhibited a second kinase, SB505124-resistant parasites were isolated by chemical mutagenesis. Whole-genome sequencing of these mutants revealed mutations in the Toxoplasma MAP kinase, TgMAPK1. Allelic replacement of mutant TgMAPK1 alleles into wild-type parasites was sufficient to confer SB505124 resistance. SB505124 independently impacts TgMAPK1 and ALK4,5,7 signaling since drug resistant parasites could not activate HIF-1 in the presence of SB505124 or grow in HIF-1 deficient cells. In addition, TgMAPK1 kinase activity is inhibited by SB505124. Finally, mice treated with SB505124 had significantly lower tissue burdens following Toxoplasma infection. These data therefore identify SB505124 as a novel small molecule inhibitor that acts by inhibiting two distinct targets, host HIF-1 and TgMAPK1.
Publication
Journal: BMC Genomics
July/24/2003
Abstract
BACKGROUND
Short tandem repeat polymorphisms (STRPs) are powerful tools for gene mapping and other applications. A STRP genome scan of 10 cM is usually adequate for mapping single gene disorders. However, mapping studies involving genetically complex disorders, and especially association (linkage disequilibrium) studies, often require higher STRP density.
RESULTS
We report the development of two separate 10 cM human STRP Screening Sets (Sets 12 and 52) which span all chromosomes. When combined, the two Sets contain a total of 782 STRPs, with average STRP spacing of 4.8 cM, average heterozygosity of 0.72, and total sex-average coverage of 3535 cM. The current Sets are comprised almost entirely of STRPs based on tri- and tetranucleotide repeats. We also report correction of primer sequences for many STRPs used in previous Screening Sets. Detailed information for the new Screening Sets is available from our web site: http://research.marshfieldclinic.org/genetics.
CONCLUSIONS
Our new human STRP Screening Sets will improve the quality and cost effectiveness of genotyping for gene mapping and other applications.
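The average heterozygosity quoted in this abstract can be illustrated with a short worked example. The sketch below assumes the conventional expected-heterozygosity estimator, H = 1 − Σ pᵢ², computed from allele frequencies; the abstract does not say which estimator the authors used, and the allele frequencies shown are hypothetical.

```python
from typing import Sequence


def expected_heterozygosity(freqs: Sequence[float]) -> float:
    """Expected heterozygosity H = 1 - sum(p_i^2) for one marker.

    `freqs` are the allele frequencies at the marker and must sum to 1.
    """
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)


# Hypothetical tetranucleotide-repeat marker with four alleles.
freqs = [0.4, 0.3, 0.2, 0.1]
h = expected_heterozygosity(freqs)  # 1 - (0.16 + 0.09 + 0.04 + 0.01) = 0.70

# Four equally frequent alleles give the maximum for a four-allele marker.
h_uniform = expected_heterozygosity([0.25, 0.25, 0.25, 0.25])  # 0.75
```

Under this estimator, a marker needs several alleles at reasonably balanced frequencies to reach a heterozygosity near the reported set average of 0.72.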
Publication
Journal: Bioinformatics
August/4/2013
Abstract
BACKGROUND
High-throughput biological research requires simultaneous visualization as well as analysis of genomic data, e.g. read alignments, variant calls and genomic annotations. Traditionally, such integrative analysis required desktop applications operating on locally stored data. Many current terabyte-size datasets generated by large public consortia projects, however, are already only feasibly stored at specialist genome analysis centers. As even small laboratories can afford very large datasets, local storage and analysis are becoming increasingly limiting, and it is likely that most such datasets will soon be stored remotely, e.g. in the cloud. These developments will require web-based tools that enable users to access, analyze and view vast remotely stored data with a level of sophistication and interactivity that approximates desktop applications. As rapidly dropping cost enables researchers to collect data intended to answer questions in very specialized contexts, developers must also provide software libraries that empower users to implement customized data analyses and data views for their particular application. Such specialized, yet lightweight, applications would empower scientists to better answer specific biological questions than possible with general-purpose genome browsers currently available.
RESULTS
Using recent advances in core web technologies (HTML5), we developed Scribl, a flexible genomic visualization library specifically targeting coordinate-based data such as genomic features, DNA sequence and genetic variants. Scribl simplifies the development of sophisticated web-based graphical tools that approach the dynamism and interactivity of desktop applications.
AVAILABILITY
Software is freely available online at http://chmille4.github.com/Scribl/ and is implemented in JavaScript with all modern browsers supported.
Publication
Journal: Genetics in Medicine
February/14/2019
Abstract
EPHB4 variants were recently reported to cause capillary malformation-arteriovenous malformation 2 (CM-AVM2). CM-AVM2 mimics RASA1-related CM-AVM1 and hereditary hemorrhagic telangiectasia (HHT), as clinical features include capillary malformations (CMs), telangiectasia, and arteriovenous malformations (AVMs). Epistaxis, another clinical feature that overlaps with HHT, was reported in several cases. Based on the clinical overlap of CM-AVM2 and HHT, we hypothesized that patients considered clinically suspicious for HHT with no variant detected in an HHT gene (ENG, ACVRL1, or SMAD4) may have an EPHB4 variant.
Exome sequencing or a next-generation sequencing panel including EPHB4 was performed on individuals with previously negative molecular genetic testing for the HHT genes and/or RASA1.
An EPHB4 variant was identified in ten unrelated cases. Seven cases had a pathogenic EPHB4 variant, including one with mosaicism. Three cases had an EPHB4 variant of uncertain significance. The majority had epistaxis (6/10 cases) and telangiectasia (8/10 cases), as well as CMs. Two of ten cases had a central nervous system AVM.
Our results emphasize the importance of considering CM-AVM2 as part of the clinical differential for HHT and other vascular malformation syndromes. Yet, these cases highlight significant differences in the cutaneous presentations of CM-AVM2 versus HHT.
Publication
Journal: JAMA Otolaryngology - Head and Neck Surgery
July/6/2017
Abstract
Sensorineural hearing loss (SNHL) is commonly caused by conditions that affect cochlear structures or the auditory nerve, and the genes identified as causing SNHL to date only explain a fraction of the overall genetic risk for this debilitating disorder. It is likely that other genes and mutations also cause SNHL.
To identify a candidate gene that causes bilateral, symmetric, progressive SNHL in a large multigeneration family of Northern European descent.
In this prospective genotype and phenotype study performed from January 1, 2006, through April 1, 2016, a 6-generation family of Northern European descent with 19 individuals having reported early-onset hearing loss suggestive of an autosomal dominant inheritance were studied at a tertiary academic medical center. In addition, 179 unrelated adult individuals with SNHL and 186 adult individuals reporting nondeafness were examined.
Sensorineural hearing loss.
Nine family members (5 women [55.6%]) provided clinical audiometric and medical records that documented hearing loss. The hearing loss is characterized as bilateral, symmetric, progressive SNHL that reached severe to profound loss in childhood. Audiometric configurations demonstrated a characteristic dip at 1000 to 2000 Hz. All affected family members wear hearing aids or have undergone cochlear implantation. Exome sequencing and linkage and association analyses identified a fully penetrant sequence variant (rs35725509) on chromosome 12q21 (logarithm of odds, 3.3) in the TMTC2 gene region that segregates with SNHL in this family. This gene explains the SNHL occurrence in this family. The variant is also associated with SNHL in a cohort of 363 unrelated individuals (179 patients with confirmed SNHL and 184 controls, P = 7 × 10⁻⁴).
A previously uncharacterized gene, TMTC2, has been identified as a candidate for causing progressive SNHL in humans. This finding identifies a novel locus that causes autosomal dominant SNHL and therefore provides a more detailed understanding of the genetic basis of SNHL. Because TMTC2 has not been previously reported to regulate auditory function, the discovery reveals a potentially new, uncharacterized mechanism of hearing loss.
Publication
Journal: BMC Genomics
November/24/2015
Abstract
BACKGROUND
Identifying insertion/deletion polymorphisms (INDELs) with high confidence has been intrinsically challenging in short-read sequencing data. Here we report our approach for improving INDEL calling accuracy by using a machine learning algorithm to combine call sets generated with three independent methods, and by leveraging the strengths of each individual pipeline. Utilizing this approach, we generated a consensus exome INDEL call set from a large dataset generated by the 1000 Genomes Project (1000G), maximizing both the sensitivity and the specificity of the calls.
RESULTS
This consensus exome INDEL call set features 7,210 INDELs, from 1,128 individuals across 13 populations included in the 1000 Genomes Phase 1 dataset, with a false discovery rate (FDR) of about 7.0%.
CONCLUSIONS
In our study we further characterize the patterns and distributions of these exonic INDELs with respect to density, allele length, and site frequency spectrum, as well as the potential mutagenic mechanisms of coding INDELs in humans.
Publication
Journal: Nature Communications
November/9/2018
Abstract
The originally published version of this Article contained an error in Figure 4. In panel a, grey boxes surrounding the subclones associated with patients #2 and #4 obscured adjacent portions of the heatmap. This error has now been corrected in both the PDF and HTML versions of the Article.
Publication
Journal: Journal of Molecular Diagnostics
May/11/2017
Abstract
A national workgroup convened by the Centers for Disease Control and Prevention identified principles and made recommendations for standardizing the description of sequence data contained within the variant file generated during the course of clinical next-generation sequence analysis for diagnosing human heritable conditions. The specifications for variant files were initially developed to be flexible with regard to content representation to support a variety of research applications. This flexibility permits variation with regard to how sequence findings are described and this depends, in part, on the conventions used. For clinical laboratory testing, this poses a problem because these differences can compromise the capability to compare sequence findings among laboratories to confirm results and to query databases to identify clinically relevant variants. To provide for a more consistent representation of sequence findings described within variant files, the workgroup made several recommendations that considered alignment to a common reference sequence, variant caller settings, use of genomic coordinates, and gene and variant naming conventions. These recommendations were considered with regard to the existing variant file specifications presently used in the clinical setting. Adoption of these recommendations is anticipated to reduce the potential for ambiguity in describing sequence findings and facilitate the sharing of genomic data among clinical laboratories and other entities.
Publication
Journal: BMC Medical Genomics
December/12/2019
Abstract
When ordering genetic testing or triaging candidate variants in exome and genome sequencing studies, it is critical to generate and test a comprehensive list of candidate genes that succinctly describe the complete and objective phenotypic features of disease. Significant efforts have been made to curate gene:disease associations both in academic research and commercial genetic testing laboratory settings. However, many of these valuable resources exist as islands and must be used independently, generating static, single-resource gene:disease association lists. Here we describe genepanel.iobio (https://genepanel.iobio.io), an easy-to-use, free and open-source web tool for generating disease- and phenotype-associated gene lists from multiple gene:disease association resources, including the NCBI Genetic Testing Registry (GTR), Phenolyzer, and the Human Phenotype Ontology (HPO). We demonstrate the utility of genepanel.iobio by applying it to complex, rare and undiagnosed disease cases that had reached a diagnostic conclusion. We find that genepanel.iobio is able to correctly prioritize the gene containing the diagnostic variant in roughly half of these challenging cases. Importantly, each component resource contributed diagnostic value, showing the benefits of this aggregate approach. We expect genepanel.iobio will improve the ease and diagnostic value of generating gene:disease association lists for genetic test ordering and whole genome or exome sequencing variant prioritization.
Publication
Journal: Journal of Medical Genetics
November/22/2018
Abstract
BACKGROUND
Hereditary haemorrhagic telangiectasia (HHT) is a genetically heterogeneous disorder caused by mutations in the genes ENG, ACVRL1, and SMAD4. Yet the genetic cause remains unknown for some families even after exhaustive exome analysis. We hypothesised that non-coding regions of the known HHT genes may harbour variants that disrupt splicing in these cases.
METHODS
DNA from 35 individuals with clinical findings of HHT and 2 healthy controls from 13 families underwent whole genome sequencing. Additionally, 87 unrelated cases suspected to have HHT were evaluated using a custom designed next-generation sequencing panel to capture the coding and non-coding regions of ENG, ACVRL1 and SMAD4. Individuals from both groups had tested negative previously for a mutation in the coding region of known HHT genes. Samples were sequenced on a HiSeq2500 instrument and data were analysed to identify novel and rare variants.
RESULTS
Eight cases had a novel non-coding ACVRL1 variant that disrupted splicing. One family had an ACVRL1 intron 9:chromosome 3 translocation, the first reported case of a translocation causing HHT. The other seven cases had a variant located within a ~300 bp CT-rich 'hotspot' region of ACVRL1 intron 9 that disrupted splicing.
CONCLUSIONS
Despite the difficulty of interpreting deep intronic variants, our study highlights the importance of non-coding regions in the disease mechanism of HHT, particularly the CT-rich hotspot region of ACVRL1 intron 9. The addition of this region to HHT molecular diagnostic testing algorithms will improve clinical sensitivity.