Results 1 - 20 of 68
1.
Proc Natl Acad Sci U S A ; 119(51): e2210773119, 2022 12 20.
Article in English | MEDLINE | ID: mdl-36512494

ABSTRACT

A prevalent and persistent biodiversity concern is that modern cropping systems lead to an erosion in crop genetic diversity. Although certain trait uniformity provides advantages in crop management and marketing, farmers facing risks from change in climate, pests, and markets are also incentivized to adopt new varieties to address complex and spatially variable genetics, environment, and crop management interactions to optimize crop performance. In this study, we applied phylogenetically blind and phylogenetically informed diversity metrics to reveal significant increases in both the spatial and temporal diversity of the US wheat crop over the past century. Contrary to commonly held perceptions on the negative impact of modern cropping systems on crop genetic diversity, our results demonstrated a win-win outcome where the widespread uptake of scientifically selected varieties increased both crop production and crop diversity.
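
As a rough illustration of what a phylogenetically blind diversity metric computes, the sketch below evaluates a Shannon index and the corresponding effective number of varieties from planted-area shares. The abstract does not name the specific metrics used; the choice of the Shannon index, the function names, and the acreage figures are all assumptions made purely for illustration.

```python
import math


def shannon_diversity(areas):
    """Shannon index H = -sum(p_i * ln p_i) over variety area shares.

    `areas` maps a variety name to its planted area (any consistent unit).
    """
    total = sum(areas.values())
    shares = [a / total for a in areas.values() if a > 0]
    return -sum(p * math.log(p) for p in shares)


def effective_number(areas):
    """Number of equally common varieties that would give the same H."""
    return math.exp(shannon_diversity(areas))


# Hypothetical acreage figures, not data from the study.
early = {"variety_A": 90.0, "variety_B": 10.0}
late = {"variety_A": 25.0, "variety_B": 25.0, "variety_C": 25.0, "variety_D": 25.0}

print(round(effective_number(early), 2))
print(round(effective_number(late), 2))
```

A metric of this kind ignores how closely related the varieties are; a phylogenetically informed metric would additionally weight varieties by their genetic distance.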


Subject(s)
Crop Production , Triticum , Humans , Triticum/genetics , Farmers , Biodiversity , Agriculture
2.
G3 (Bethesda) ; 12(2)2022 02 04.
Article in English | MEDLINE | ID: mdl-34897429

ABSTRACT

The zebra mussel, Dreissena polymorpha, continues to spread from its native range in Eurasia to Europe and North America, causing billions of dollars in damage and dramatically altering invaded aquatic ecosystems. Despite these impacts, there are few genomic resources for Dreissena or related bivalves. Although the D. polymorpha genome is highly repetitive, we have used a combination of long-read sequencing and Hi-C-based scaffolding to generate a high-quality chromosome-scale genome assembly. Through comparative analysis and transcriptomics experiments, we have gained insights into processes that likely control the invasive success of zebra mussels, including shell formation, synthesis of byssal threads, and thermal tolerance. We identified multiple intact steamer-like elements, a retrotransposon that has been linked to transmissible cancer in marine clams. We also found that D. polymorpha have an unusual 67 kb mitochondrial genome containing numerous tandem repeats, making it the largest observed in Eumetazoa. Together these findings create a rich resource for invasive species research and control efforts.


Subject(s)
Dreissena , Animals , Dreissena/genetics , Ecosystem , Genome , Genomics , Introduced Species
4.
Hortic Res ; 8(1): 202, 2021 Sep 01.
Article in English | MEDLINE | ID: mdl-34465774

ABSTRACT

Pedigree information is of fundamental importance in breeding programs and related genetics efforts. However, many individuals have unknown pedigrees. While methods to identify and confirm direct parent-offspring relationships are routine, those for other types of close relationships have yet to be effectively and widely implemented with plants, due to complications such as asexual propagation and extensive inbreeding. The objective of this study was to develop and demonstrate methods that support complex pedigree reconstruction via the total length of identical by state haplotypes (referred to in this study as "summed potential lengths of shared haplotypes", SPLoSH). A custom Python script, HapShared, was developed to generate SPLoSH data in apple and sweet cherry. HapShared was used to establish empirical distributions of SPLoSH data for known relationships in these crops. These distributions were then used to estimate previously unknown relationships. Case studies in each crop demonstrated various pedigree reconstruction scenarios using SPLoSH data. For cherry, a full-sib relationship was deduced for 'Emperor Francis' and 'Schmidt', a half-sib relationship for 'Van' and 'Windsor', and the paternal grandparents of 'Stella' were confirmed. For apple, 29 cultivars were found to share an unknown parent, the pedigree of the unknown parent of 'Cox's Pomona' was reconstructed, and 'Fameuse' was deduced to be a likely grandparent of 'McIntosh'. Key genetic resources that enabled this empirical study were large genome-wide SNP array datasets, integrated genetic maps, and previously identified pedigree relationships. Crops with similar resources are also expected to benefit from using HapShared for empowering pedigree reconstruction.
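
The idea of summing the lengths of identical-by-state haplotype segments can be sketched in a few lines. This is not the HapShared script and makes no claim about its algorithm or interface: it is a simplified illustration that compares a single pair of haplotypes on one chromosome, and the function names, the `min_length` filter, the allele strings, and the map positions are all hypothetical.

```python
def shared_segments(hap_a, hap_b, positions):
    """Return (start, end) map positions of maximal runs of identical alleles."""
    if not (len(hap_a) == len(hap_b) == len(positions)):
        raise ValueError("haplotypes and positions must have equal length")
    segments = []
    run_start = None
    for i, (a, b) in enumerate(zip(hap_a, hap_b)):
        if a == b:
            if run_start is None:
                run_start = i  # a new identical-by-state run begins here
        elif run_start is not None:
            segments.append((positions[run_start], positions[i - 1]))
            run_start = None
    if run_start is not None:  # close a run that reaches the last marker
        segments.append((positions[run_start], positions[-1]))
    return segments


def summed_shared_length(hap_a, hap_b, positions, min_length=0.0):
    """Sum the lengths of shared segments that are at least `min_length` long."""
    lengths = (end - start for start, end in shared_segments(hap_a, hap_b, positions))
    return sum(length for length in lengths if length >= min_length)


# Hypothetical marker alleles and genetic-map positions (cM).
hap_a = "AACGTTAG"
hap_b = "AACCTTAG"
positions = [0.0, 1.5, 3.0, 4.0, 6.5, 8.0, 9.0, 12.0]

print(shared_segments(hap_a, hap_b, positions))
print(summed_shared_length(hap_a, hap_b, positions))
```

In practice such totals would be computed across all chromosomes and haplotype pairs of two individuals and then compared with the empirical distributions for known relationships described in the abstract.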

5.
Patterns (N Y) ; 1(7): 100105, 2020 Oct 09.
Article in English | MEDLINE | ID: mdl-33205138

ABSTRACT

Heterogeneous and multidisciplinary data generated by research on sustainable global agriculture and agrifood systems require quality data labeling or annotation in order to be interoperable. As recommended by the FAIR principles, data, labels, and metadata must use controlled vocabularies and ontologies that are popular in the knowledge domain and commonly used by the community. Despite the existence of robust ontologies in the Life Sciences, there is currently no comprehensive set of ontologies recommended for data annotation across agricultural research disciplines. In this paper, we discuss the added value of the Ontologies Community of Practice (CoP) of the CGIAR Platform for Big Data in Agriculture for harnessing relevant expertise in ontology development and identifying innovative solutions that support quality data annotation. The Ontologies CoP stimulates knowledge sharing among stakeholders, such as researchers, data managers, domain experts, experts in ontology design, and platform development teams.

6.
BMC Res Notes ; 13(1): 71, 2020 Feb 12.
Article in English | MEDLINE | ID: mdl-32051026

ABSTRACT

OBJECTIVES: Advanced tools and resources are needed to efficiently and sustainably produce food for an increasing world population in the context of variable environmental conditions. The maize genomes to fields (G2F) initiative is a multi-institutional effort that seeks to approach this challenge by developing a flexible and distributed infrastructure addressing emerging problems. G2F has generated large-scale phenotypic, genotypic, and environmental datasets using publicly available inbred lines and hybrids evaluated through a network of collaborators that are part of the G2F's genotype-by-environment (G × E) project. This report covers the public release of datasets for 2014-2017. DATA DESCRIPTION: Datasets include inbred genotypic information; phenotypic, climatic, and soil measurements and metadata information for each testing location across years. For a subset of inbreds in 2014 and 2015, yield component phenotypes were quantified by image analysis. Data released are accompanied by README descriptions. For genotypic and phenotypic data, both raw data and a version without outliers are reported. For climatic data, a version calibrated to the nearest airport weather station and a version without outliers are reported. The 2014 and 2015 datasets are updated versions from the previously released files [1] while 2016 and 2017 datasets are newly available to the public.


Subject(s)
Genome, Plant/genetics , Plant Breeding , Zea mays/genetics , Datasets as Topic , Genotype , Phenotype
7.
Nat Commun ; 10(1): 5068, 2019 11 07.
Article in English | MEDLINE | ID: mdl-31699975

ABSTRACT

Parasexuality contributes to diversity and adaptive evolution of haploid (monokaryotic) fungi. However, non-sexual genetic exchange mechanisms are not defined in dikaryotic fungi (containing two distinct haploid nuclei). Newly emerged strains of the wheat stem rust pathogen, Puccinia graminis f. sp. tritici (Pgt), such as Ug99, are a major threat to global food security. Here, we provide genomics-based evidence supporting that Ug99 arose by somatic hybridisation and nuclear exchange between dikaryons. Fully haplotype-resolved genome assembly and DNA proximity analysis reveal that Ug99 shares one haploid nucleus genotype with a much older African lineage of Pgt, with no recombination or chromosome reassortment. These findings indicate that nuclear exchange between dikaryotes can generate genetic diversity and facilitate the emergence of new lineages in asexual fungal populations.


Subject(s)
Basidiomycota/genetics , Genome, Fungal/genetics , Basidiomycota/physiology , Evolution, Molecular , Genetic Variation , Haplotypes , Reproduction , Sequence Homology, Nucleic Acid , Triticum/microbiology
8.
mBio ; 10(6)2019 11 19.
Article in English | MEDLINE | ID: mdl-31744914

ABSTRACT

Fungi dominate the recycling of carbon sequestered in woody biomass. This process of organic turnover was first evolved among "white rot" fungi that degrade lignin to access carbohydrates and later evolved multiple times toward more efficient strategies to selectively target carbohydrates ("brown rot"). The brown rot adaptation was often explained by mechanisms to deploy reactive oxygen species (ROS) to oxidatively attack wood structures. However, its genetic basis remains unclear, especially in the context of gene contractions of conventional carbohydrate-active enzymes (CAZYs) relative to white rot ancestors. Here, we hypothesized that these apparent gains in brown rot efficiency despite gene losses were due, in part, to upregulation of the retained genes. We applied comparative transcriptomics to multiple species of both rot types grown across a wood wafer to create a gradient of progressive decay and to enable tracking temporal gene expression. Dozens of "decay-stage-dependent" ortho-genes were isolated, narrowing a pool of candidate genes with time-dependent regulation unique to brown rot fungi. A broad comparison of the expression timing of CAZY families indicated a temporal regulatory shift of lignocellulose-oxidizing genes toward early stages in brown rot compared to white rot, enabling the segregation of oxidative treatment ahead of hydrolysis. These key brown rot ROS-generating genes with iron ion binding functions were isolated. Moreover, transcription energy was shifted to be invested in the retained GHs in brown rot fungi to strengthen carbohydrate conversion. Collectively, these results support the hypothesis that gene regulation shifts played a pivotal role in brown rot adaptation. IMPORTANCE: Fungi dominate the turnover of wood, Earth's largest pool of aboveground terrestrial carbon. Fungi first evolved this capacity by degrading lignin to access and hydrolyze embedded carbohydrates (white rot). Multiple lineages, however, adapted faster reactive oxygen species (ROS) pretreatments to loosen lignocellulose and selectively extract sugars (brown rot). This brown rot "shortcut" often coincided with losses (>60%) of conventional lignocellulolytic genes, implying that ROS adaptations supplanted conventional pathways. We used comparative transcriptomics to further pursue brown rot adaptations, which illuminated the clear temporal expression shift of ROS genes, as well as the shift toward synthesizing more GHs in brown rot relative to white rot. These imply that gene regulatory shifts, not simply ROS innovations, were key to brown rot fungal evolution. These results not only reveal an important biological shift among these unique fungi, but they may also illuminate a trait that restricts brown rot fungi to certain ecological niches.


Subject(s)
Adaptation, Biological , Biomass , Fungal Proteins/genetics , Fungi/genetics , Fungi/metabolism , Gene Expression Regulation, Fungal , Plants/microbiology , Biodegradation, Environmental , Computational Biology/methods , Gene Expression Profiling , Hydrolysis , Plants/metabolism , Wood/chemistry , Wood/metabolism , Wood/microbiology
9.
Microbiol Resour Announc ; 8(23)2019 Jun 06.
Article in English | MEDLINE | ID: mdl-31171611

ABSTRACT

The hospital-acquired methicillin-resistant Staphylococcus aureus (HA-MRSA) strain WCUH29 has been intensively and widely used as a model system for identification and evaluation of novel antibacterial targets and pathogenicity. In this announcement, we report the complete genome sequence of HA-MRSA WCUH29 (NCIMB 40771).

10.
Mol Genet Metab Rep ; 19: 100464, 2019 Jun.
Article in English | MEDLINE | ID: mdl-30891420

ABSTRACT

Clinical laboratories have adopted next generation sequencing (NGS) as a gold standard for the diagnosis of hereditary disorders because of its analytic accuracy, high throughput, and potential for cost-effectiveness. We describe the implementation of a single broad-based NGS sequencing assay to meet the genetic testing needs at the University of Minnesota. A single hybrid capture library preparation was used for each test ordered, data was informatically blinded to clinically-ordered genes, and identified variants were reviewed and classified by genetic counselors and molecular pathologists. We performed 2509 sequencing tests from August 2012 till December 2017. The diagnostic yield has remained steady at 25%, but the number of variants of uncertain significance (VUS) included in a patient report decreased over time with 50% of the patient reports including at least one VUS in 2012 and only 22% of the patient reports reporting a VUS in 2017 (p = .002). Among the various clinical specialties, the diagnostic yield was highest in dermatology (60% diagnostic yield) and ophthalmology (42% diagnostic yield) while the diagnostic yield was lowest in gastrointestinal diseases and pulmonary diseases (10% detection yield in both specialties). Deletion/duplication analysis was also implemented in a subset of panels ordered, with 9% of samples having a diagnostic finding using the deletion/duplication analysis. We have demonstrated the feasibility of this broad-based NGS platform to meet the needs of our academic institution by aggregating a sufficient sample volume from many individually rare tests and providing a flexible ordering for custom, patient-specific panels.

11.
New Phytol ; 222(3): 1538-1550, 2019 05.
Article in English | MEDLINE | ID: mdl-30664233

ABSTRACT

Symbiotic nitrogen fixation in legumes is mediated by an interplay of signaling processes between plant hosts and rhizobial symbionts. In legumes, several secreted protein families have undergone expansions and play key roles in nodulation. Thus, identifying lineage-specific expansions (LSEs) of nodulation-associated genes can be a strategy to discover candidate gene families. Using bioinformatic tools, we identified 13 LSEs of nodulation-related secreted protein families, each unique to either Glycine, Arachis or Medicago lineages. In the Medicago lineage, nodule-specific Polycystin-1, Lipoxygenase, Alpha Toxin (PLAT) domain proteins (NPDs) expanded to five members. We examined NPD function using CRISPR/Cas9 multiplex genome editing to create Medicago truncatula NPD knockout lines, targeting one to five NPD genes. Mutant lines with differing combinations of NPD gene inactivations had progressively smaller nodules, earlier onset of nodule senescence, or ineffective nodules compared to the wild-type control. Double- and triple-knockout lines showed dissimilar nodulation phenotypes but coincided in upregulation of a DHHC-type zinc finger and an aspartyl protease gene, possible candidates for the observed disturbance of proper nodule function. By postulating that gene family expansions can be used to detect candidate genes, we identified a family of nodule-specific PLAT domain proteins and confirmed that they play a role in successful nodule formation.


Subject(s)
Medicago truncatula/metabolism , Phylogeny , Plant Proteins/chemistry , Plant Proteins/metabolism , Plant Root Nodulation , Root Nodules, Plant/metabolism , Amino Acid Sequence , Gene Expression Regulation, Plant , Genotype , Medicago truncatula/genetics , Medicago truncatula/microbiology , Phenotype , Plant Root Nodulation/genetics , Protein Domains , Rhizobium/physiology , Root Nodules, Plant/microbiology
12.
Nanoscale ; 11(7): 3112-3116, 2019 Feb 14.
Article in English | MEDLINE | ID: mdl-30556551

ABSTRACT

A highly conductive graphene derivative was produced by using a low-defect form of graphene oxide, oxo-G, in conjunction with voltage-reduction, a simple and environmentally-benign procedure for removing oxygen-containing functional groups. A low temperature coefficient of resistance was achieved, making this material promising for temperature-stable electronics and sensors.

13.
Cancer Res ; 78(9): 2343-2355, 2018 05 01.
Article in English | MEDLINE | ID: mdl-29437708

ABSTRACT

Tumor-associated macrophages (TAM) play a critical role in cancer development and progression. However, the heterogeneity of TAM presents a major challenge to identify clinically relevant markers for protumor TAM. Here, we report that expression of adipocyte/macrophage fatty acid-binding protein (A-FABP) in TAM promotes breast cancer progression. Although upregulation of A-FABP was inversely associated with breast cancer survival, deficiency of A-FABP significantly reduced mammary tumor growth and metastasis. Furthermore, the protumor effect of A-FABP was mediated by TAM, in particular, in a subset of TAM with a CD11b+F4/80+MHCII-Ly6C- phenotype. A-FABP expression in TAM facilitated protumor IL6/STAT3 signaling through regulation of the NFκB/miR-29b pathway. Collectively, our results suggest A-FABP as a new functional marker for protumor TAM. Significance: These findings identify A-FABP as a functional marker for protumor macrophages, thus offering a new target for tumor immunotherapy. Cancer Res; 78(9); 2343-55. ©2018 AACR.


Subject(s)
Breast Neoplasms/genetics , Breast Neoplasms/pathology , Fatty Acid-Binding Proteins/genetics , Gene Expression Regulation, Neoplastic , Macrophages/metabolism , Animals , Biomarkers, Tumor , Cell Line, Tumor , Cytokines/metabolism , Disease Models, Animal , Disease Progression , Fatty Acid-Binding Proteins/metabolism , Female , Humans , Immunohistochemistry , Macrophages/pathology , Mice , Mice, Knockout , MicroRNAs/genetics , MicroRNAs/metabolism , NF-kappa B/metabolism , Neoplasm Metastasis
14.
BMC Genomics ; 18(1): 578, 2017 08 04.
Article in English | MEDLINE | ID: mdl-28778149

ABSTRACT

BACKGROUND: Third generation sequencing technologies, with sequencing reads in the tens of kilobases, facilitate genome assembly by spanning ambiguous regions and improving continuity. This has been critical for plant genomes, which are difficult to assemble due to high repeat content, gene family expansions, segmental and tandem duplications, and polyploidy. Recently, high-throughput mapping and scaffolding strategies have further improved continuity. Together, these long-range technologies enable quality draft assemblies of complex genomes in a cost-effective and timely manner. RESULTS: Here, we present high quality genome assemblies of the model legume plant, Medicago truncatula (R108) using PacBio, Dovetail Chicago (hereafter, Dovetail) and BioNano technologies. To test these technologies for plant genome assembly, we generated five assemblies using all possible combinations and ordering of these three technologies in the R108 assembly. While the BioNano and Dovetail joins overlapped, they also showed complementary gains in continuity and join numbers. Both technologies spanned repetitive regions that PacBio alone was unable to bridge. Combining technologies, particularly Dovetail followed by BioNano, resulted in notable improvements compared to Dovetail or BioNano alone. A combination of PacBio, Dovetail, and BioNano was used to generate a high quality draft assembly of R108, a M. truncatula accession widely used in studies of functional genomics. As a test for the usefulness of the resulting genome sequence, the new R108 assembly was used to pinpoint breakpoints and characterize flanking sequence of a previously identified translocation between chromosomes 4 and 8, identifying more than 22.7 Mb of novel sequence not present in the earlier A17 reference assembly. CONCLUSIONS: Adding Dovetail followed by BioNano data yielded complementary improvements in continuity over the original PacBio assembly. This strategy proved efficient and cost-effective for developing a quality draft assembly compared to traditional reference assemblies.


Subject(s)
Genomics/methods , Genomics/standards , Medicago truncatula/genetics , Chromosomes, Plant/genetics , Cost-Benefit Analysis , Genome, Plant/genetics , Genomics/economics , Quality Control , Reference Standards , Time Factors
15.
BMC Bioinformatics ; 18(1): 367, 2017 Aug 10.
Article in English | MEDLINE | ID: mdl-28797229

ABSTRACT

BACKGROUND: Rapid generation of omics data in recent years has resulted in vast amounts of disconnected datasets without systemic integration and knowledge building, while individual groups have made customized, annotated datasets available on the web with few ways to link them to in-lab datasets. With so many research groups generating their own data, the ability to relate it to the larger genomic and comparative genomic context is becoming increasingly crucial to make full use of the data. RESULTS: The Omics Database Generator (ODG) allows users to create customized databases that utilize published genomics data integrated with experimental data which can be queried using a flexible graph database. When provided with omics and experimental data, ODG will create a comparative, multi-dimensional graph database. ODG can import definitions and annotations from other sources such as InterProScan, the Gene Ontology, ENZYME, UniPathway, and others. This annotation data can be especially useful for studying new or understudied species for which transcripts have only been predicted, and rapidly give additional layers of annotation to predicted genes. In better studied species, ODG can perform syntenic annotation translations or rapidly identify characteristics of a set of genes or nucleotide locations, such as hits from an association study. ODG provides a web-based user-interface for configuring the data import and for querying the database. Queries can also be run from the command-line and the database can be queried directly through programming language hooks available for most languages. ODG supports most common genomic formats as well as a generic, easy-to-use tab-separated value format for user-provided annotations. CONCLUSIONS: ODG is a user-friendly database generation and query tool that adapts to the supplied data to produce a comparative genomic database or multi-layered annotation database. ODG provides rapid comparative genomic annotation and is therefore particularly useful for non-model or understudied species. For species for which more data are available, ODG can be used to conduct complex multi-omics, pattern-matching queries.


Subject(s)
Databases, Nucleic Acid , Genomics , Software , Molecular Sequence Annotation
16.
BMC Genomics ; 18(1): 541, 2017 07 19.
Article in English | MEDLINE | ID: mdl-28724409

ABSTRACT

BACKGROUND: Long-read and short-read sequencing technologies offer competing advantages for eukaryotic genome sequencing projects. Combinations of both may be appropriate for surveys of within-species genomic variation. METHODS: We developed a hybrid assembly pipeline called "Alpaca" that can operate on 20X long-read coverage plus about 50X short-insert and 50X long-insert short-read coverage. To preclude collapse of tandem repeats, Alpaca relies on base-call-corrected long reads for contig formation. RESULTS: Compared to two other assembly protocols, Alpaca demonstrated the most reference agreement and repeat capture on the rice genome. On three accessions of the model legume Medicago truncatula, Alpaca generated the most agreement to a conspecific reference and predicted tandemly repeated genes absent from the other assemblies. CONCLUSION: Our results suggest Alpaca is a useful tool for investigating structural and copy number variation within de novo assemblies of sampled populations.


Subject(s)
Genes, Plant/genetics , Genomics/methods , DNA Copy Number Variations , Medicago truncatula/genetics , Multigene Family/genetics , Oryza/genetics , Phenotype , Tandem Repeat Sequences/genetics
17.
Clin Cancer Res ; 23(16): 4704-4715, 2017 Aug 15.
Article in English | MEDLINE | ID: mdl-28473535

ABSTRACT

Purpose: Androgen receptor (AR) variant AR-V7 is a ligand-independent transcription factor that promotes prostate cancer resistance to AR-targeted therapies. Accordingly, efforts are under way to develop strategies for monitoring and inhibiting AR-V7 in castration-resistant prostate cancer (CRPC). The purpose of this study was to understand whether other AR variants may be coexpressed with AR-V7 and promote resistance to AR-targeted therapies. Experimental Design: We utilized complementary short- and long-read sequencing of intact AR mRNA isoforms to characterize AR expression in CRPC models. Coexpression of AR-V7 and AR-V9 mRNA in CRPC metastases and circulating tumor cells was assessed by RNA-seq and RT-PCR, respectively. Expression of AR-V9 protein in CRPC models was evaluated with polyclonal antisera. Multivariate analysis was performed to test whether AR variant mRNA expression in metastatic tissues was associated with a 12-week progression-free survival endpoint in a prospective clinical trial of 78 CRPC-stage patients initiating therapy with the androgen synthesis inhibitor, abiraterone acetate. Results: AR-V9 was frequently coexpressed with AR-V7. Both AR variant species were found to share a common 3' terminal cryptic exon, which rendered AR-V9 susceptible to experimental manipulations that were previously thought to target AR-V7 uniquely. AR-V9 promoted ligand-independent growth of prostate cancer cells. High AR-V9 mRNA expression in CRPC metastases was predictive of primary resistance to abiraterone acetate (HR = 4.0; 95% confidence interval, 1.31-12.2; P = 0.02). Conclusions: AR-V9 may be an important component of therapeutic resistance in CRPC. Clin Cancer Res; 23(16); 4704-15. ©2017 AACR.


Subject(s)
Androstenes/therapeutic use , Gene Expression Regulation, Neoplastic/drug effects , Genetic Variation , Prostatic Neoplasms, Castration-Resistant/drug therapy , Receptors, Androgen/genetics , Cell Line, Tumor , Cell Proliferation/genetics , Disease-Free Survival , Drug Resistance, Neoplasm/genetics , Humans , Male , Neoplasm Metastasis , Prospective Studies , Prostatic Neoplasms, Castration-Resistant/genetics , Prostatic Neoplasms, Castration-Resistant/metabolism , Protein Isoforms/genetics , Protein Isoforms/metabolism , RNA Interference , Receptors, Androgen/metabolism
18.
Nat Immunol ; 18(6): 694-704, 2017 06.
Article in English | MEDLINE | ID: mdl-28369050

ABSTRACT

The transcription factor STAT5 has a critical role in B cell acute lymphoblastic leukemia (B-ALL). How STAT5 mediates this effect is unclear. Here we found that activation of STAT5 worked together with defects in signaling components of the precursor to the B cell antigen receptor (pre-BCR), including defects in BLNK, BTK, PKCβ, NF-κB1 and IKAROS, to initiate B-ALL. STAT5 antagonized the transcription factors NF-κB and IKAROS by opposing regulation of shared target genes. Super-enhancers showed enrichment for STAT5 binding and were associated with an opposing network of transcription factors, including PAX5, EBF1, PU.1, IRF4 and IKAROS. Patients with a high ratio of active STAT5 to NF-κB or IKAROS had more-aggressive disease. Our studies indicate that an imbalance of two opposing transcriptional programs drives B-ALL and suggest that restoring the balance of these pathways might inhibit B-ALL.


Subject(s)
Adaptor Proteins, Signal Transducing/genetics , B-Lymphocytes , Gene Expression Regulation, Neoplastic , Ikaros Transcription Factor/genetics , Pre-B Cell Receptors/genetics , Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics , STAT5 Transcription Factor/metabolism , Agammaglobulinaemia Tyrosine Kinase , Animals , Chromatin Immunoprecipitation , Flow Cytometry , Humans , Interferon Regulatory Factors/genetics , Mice , Multiplex Polymerase Chain Reaction , NF-kappa B p50 Subunit/genetics , PAX5 Transcription Factor/genetics , Precursor Cell Lymphoblastic Leukemia-Lymphoma/metabolism , Precursor Cell Lymphoblastic Leukemia-Lymphoma/mortality , Prognosis , Protein Kinase C beta/genetics , Protein-Tyrosine Kinases/genetics , Proto-Oncogene Proteins/genetics , Real-Time Polymerase Chain Reaction , Signal Transduction , Survival Rate , Trans-Activators/genetics
19.
BMC Genomics ; 18(1): 261, 2017 03 27.
Article in English | MEDLINE | ID: mdl-28347275

ABSTRACT

BACKGROUND: Previous studies exploring sequence variation in the model legume, Medicago truncatula, relied on mapping short reads to a single reference. However, read-mapping approaches are inadequate to examine large, diverse gene families or to probe variation in repeat-rich or highly divergent genome regions. De novo sequencing and assembly of M. truncatula genomes enables near-comprehensive discovery of structural variants (SVs), analysis of rapidly evolving gene families, and ultimately, construction of a pan-genome. RESULTS: Genome-wide synteny based on 15 de novo M. truncatula assemblies effectively detected different types of SVs indicating that as much as 22% of the genome is involved in large structural changes, altogether affecting 28% of gene models. A total of 63 million base pairs (Mbp) of novel sequence was discovered, expanding the reference genome space for Medicago by 16%. Pan-genome analysis revealed that 42% (180 Mbp) of genomic sequences is missing in one or more accessions, while examination of de novo annotated genes identified 67% (50,700) of all ortholog groups as dispensable - estimates comparable to recent studies in rice, maize and soybean. Rapidly evolving gene families typically associated with biotic interactions and stress response were found to be enriched in the accession-specific gene pool. The nucleotide-binding site leucine-rich repeat (NBS-LRR) family, in particular, harbors the highest level of nucleotide diversity, large effect single nucleotide change, protein diversity, and presence/absence variation. However, the leucine-rich repeat (LRR) and heat shock gene families are disproportionately affected by large effect single nucleotide changes and even higher levels of copy number variation. CONCLUSIONS: Analysis of multiple M. truncatula genomes illustrates the value of de novo assemblies to discover and describe structural variation, something that is often under-estimated when using read-mapping approaches. Comparisons among the de novo assemblies also indicate that different large gene families differ in the architecture of their structural variation.
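
The core versus dispensable split reported for ortholog groups can be illustrated with a small presence/absence calculation. The sketch below assumes the definition used in the abstract (a group is dispensable if it is missing from one or more accessions); the function names, accession labels, and ortholog group identifiers are hypothetical and do not come from the study.

```python
def classify_ortholog_groups(presence, accessions):
    """Split ortholog groups into core and dispensable lists.

    `presence` maps an ortholog group ID to the set of accessions that
    carry at least one member of the group.
    """
    all_accessions = set(accessions)
    core, dispensable = [], []
    for group, found_in in sorted(presence.items()):
        if all_accessions <= set(found_in):
            core.append(group)  # present in every accession
        else:
            dispensable.append(group)  # missing from one or more accessions
    return core, dispensable


def dispensable_fraction(presence, accessions):
    """Fraction of ortholog groups that are missing from some accession."""
    _, dispensable = classify_ortholog_groups(presence, accessions)
    return len(dispensable) / len(presence)


# Hypothetical presence/absence data for three accessions.
accessions = ["acc1", "acc2", "acc3"]
presence = {
    "OG1": {"acc1", "acc2", "acc3"},
    "OG2": {"acc1", "acc3"},
    "OG3": {"acc2"},
}

core, dispensable = classify_ortholog_groups(presence, accessions)
print(core, dispensable)
print(round(dispensable_fraction(presence, accessions), 2))
```

Passing the accession list explicitly, rather than inferring it from the data, keeps the classification correct even when an accession carries none of the listed groups.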


Subject(s)
DNA Copy Number Variations/genetics , Genome, Plant , Medicago truncatula/genetics , Comparative Genomic Hybridization , Heat-Shock Proteins/genetics , High-Throughput Nucleotide Sequencing , Leucine-Rich Repeat Proteins , Plant Proteins/genetics , Proteins/genetics , RNA, Plant/chemistry , RNA, Plant/isolation & purification , RNA, Plant/metabolism , Sequence Alignment , Sequence Analysis, DNA
20.
J Mol Diagn ; 18(6): 872-881, 2016 11.
Article in English | MEDLINE | ID: mdl-27597741

ABSTRACT

Simultaneous detection of small copy number variations (CNVs) (<0.5 kb) and single-nucleotide variants in clinically significant genes is of great interest for clinical laboratories. The analytical variability in next-generation sequencing (NGS) and artifacts in coverage data because of issues with mappability along with lack of robust bioinformatics tools for CNV detection have limited the utility of targeted NGS data to identify CNVs. We describe the development and implementation of a bioinformatics algorithm, copy number variation-random forest (CNV-RF), that incorporates a machine learning component to identify CNVs from targeted NGS data. Using CNV-RF, we identified 12 of 13 deletions in samples with known CNVs, two cases with duplications, and identified novel deletions in 22 additional cases. Furthermore, no CNVs were identified among 60 genes in 14 cases with normal copy number and no CNVs were identified in another 104 patients with clinical suspicion of CNVs. All positive deletions and duplications were confirmed using a quantitative PCR method. CNV-RF also detected heterozygous deletions and duplications with a specificity of 50% across 4813 genes. The ability of CNV-RF to detect clinically relevant CNVs with a high degree of sensitivity along with confirmation using a low-cost quantitative PCR method provides a framework for providing comprehensive NGS-based CNV/single-nucleotide variant detection in a clinical molecular diagnostics laboratory.


Subject(s)
DNA Copy Number Variations , Genetic Testing , High-Throughput Nucleotide Sequencing , Algorithms , Computational Biology/methods , Female , Gene Deletion , Gene Duplication , Genetic Markers , Genetic Testing/methods , High-Throughput Nucleotide Sequencing/methods , Humans , Male , Real-Time Polymerase Chain Reaction , Reproducibility of Results , Sensitivity and Specificity