Results 1 - 5 of 5
1.
Environ Monit Assess ; 195(10): 1161, 2023 Sep 07.
Article in English | MEDLINE | ID: mdl-37676354

ABSTRACT

Biodiversity loss on agricultural land is a major concern. Comprehensive monitoring is needed to quantify the ongoing changes and assess the effectiveness of agri-environmental measures. However, current approaches to monitoring biodiversity on agricultural land are limited in their ability to capture the complex pattern of species and habitats. Using a real-world example of plant and habitat monitoring on Swiss agricultural land, we show how meaningful and efficient sampling can be achieved at the relevant scales. The multi-stage sampling design of this approach uses unequal probability sampling in combination with intermediate small-scale habitat sampling to ensure broad representation of regions, landscape types, and plant species. To achieve broad coverage of temporary agri-environmental measures, the baseline survey on permanent plots is complemented by dynamic sampling of these specific areas. Sampling efficiency and practicality are ensured at all stages of sampling through modern sampling techniques, such as unequal probability sampling with fixed sample size, self-weighting, spatial spreading, balancing on additional information, and stratified balancing. In this way, the samples are well distributed across ecological and geographic space. Despite the high complexity of the sampling design, simple estimators are provided. The effects of stratified balancing and clustering of samples are demonstrated in Monte Carlo simulations using modelled habitat data. A power analysis based on actual survey data is also presented. Overall, the study could serve as a useful example for improving future biodiversity monitoring networks on agricultural land at multiple scales.
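As an illustration of unequal probability sampling with fixed sample size, the following is a minimal sketch and not the study's actual design. It assumes systematic sampling with inclusion probabilities proportional to a size measure, which is one common way to obtain a fixed sample size; the abstract does not say which algorithm was used. The function names and the toy data are hypothetical, and floating-point edge cases are not handled. A Horvitz-Thompson total is included as one example of a simple estimator.

```python
import random

def inclusion_probabilities(sizes, n):
    """Inclusion probabilities proportional to size, summing to n."""
    total = sum(sizes)
    probs = [n * s / total for s in sizes]
    if max(probs) > 1:
        raise ValueError("a unit is too large for this sample size")
    return probs

def systematic_pps_sample(probs, n, rng):
    """Select n distinct units by systematic sampling with a random start.

    Unit i is selected when one of the points u, u+1, ..., u+n-1 falls
    inside its interval of the cumulated inclusion probabilities.
    """
    u = rng.random()
    selected, cumulative, k = [], 0.0, 0
    for i, p in enumerate(probs):
        cumulative += p
        # Because every p <= 1, at most one point can fall in each interval.
        if k < n and u + k < cumulative:
            selected.append(i)
            k += 1
    return selected

def horvitz_thompson_total(values, probs, sample):
    """Estimate the population total by weighting each sampled value by 1/pi."""
    return sum(values[i] / probs[i] for i in sample)

# Hypothetical size measure for six units (e.g. area of a habitat type).
sizes = [1, 2, 3, 4, 5, 5]
probs = inclusion_probabilities(sizes, n=2)
sample = systematic_pps_sample(probs, n=2, rng=random.Random(42))
estimate = horvitz_thompson_total(sizes, probs, sample)
```

When the variable of interest is exactly proportional to the size measure, as in this toy example, every sampled unit contributes the same weighted value and the estimate equals the true total regardless of which units are drawn.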


Subject(s)
Biodiversity , Environmental Monitoring , Agriculture , Cluster Analysis , Monte Carlo Method
2.
BMC Bioinformatics ; 11 Suppl 12: S6, 2010 Dec 21.
Article in English | MEDLINE | ID: mdl-21210985

ABSTRACT

BACKGROUND: An important focus of genomic science is the discovery and characterization of all functional elements within genomes. In silico methods are used in genome studies to discover putative regulatory genomic elements (called words or motifs). Although a number of methods have been developed for motif discovery, most of them lack the scalability needed to analyze large genomic data sets. METHODS: This manuscript presents WordSeeker, an enumerative motif discovery toolkit that utilizes multi-core and distributed computational platforms to enable scalable analysis of genomic data. A controller task coordinates activities of worker nodes, each of which (1) enumerates a subset of the DNA word space and (2) scores words with a distributed Markov chain model. RESULTS: A comprehensive suite of performance tests was conducted to demonstrate the performance, speedup and efficiency of WordSeeker. The scalability of the toolkit enabled the analysis of the entire genome of Arabidopsis thaliana; the results of the analysis were integrated into The Arabidopsis Gene Regulatory Information Server (AGRIS). A public version of WordSeeker was deployed on the Glenn cluster at the Ohio Supercomputer Center. CONCLUSION: WordSeeker effectively utilizes concurrent computing platforms to enable the identification of putative functional elements in genomic data sets. This capability facilitates the analysis of the large quantity of sequenced genomic data.
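The two worker steps described above (enumerating a subset of the word space and scoring words with a Markov chain model) can be sketched on a single machine as follows. This is a hedged illustration, not WordSeeker's code: it assumes a first-order Markov model estimated from the sequence itself and partitions the word space by prefix, and all function names are hypothetical.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"

def words_with_prefix(prefix, k):
    """Enumerate the slice of the k-letter word space owned by one worker."""
    for tail in product(ALPHABET, repeat=k - len(prefix)):
        yield prefix + "".join(tail)

def count_words(seq, k):
    """Count every overlapping k-letter word in the sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def expected_count(seq, word):
    """Expected occurrences of `word` under a first-order Markov model.

    Assumption: the start probability comes from single-letter frequencies
    and the transitions from dinucleotide counts of the same sequence.
    """
    mono = count_words(seq, 1)
    di = count_words(seq, 2)
    prob = mono[word[0]] / len(seq)
    for a, b in zip(word, word[1:]):
        outgoing = sum(di[a + c] for c in ALPHABET)
        if outgoing == 0:
            return 0.0
        prob *= di[a + b] / outgoing
    return prob * (len(seq) - len(word) + 1)

def score_words(seq, prefix, k):
    """Observed and expected counts for each observed word in one slice."""
    observed = count_words(seq, k)
    return {
        w: (observed[w], expected_count(seq, w))
        for w in words_with_prefix(prefix, k)
        if observed[w] > 0
    }

scores = score_words("ACGTACGTACGT", prefix="A", k=4)
```

In a distributed setting, each worker would call something like `score_words` with a different prefix, and a controller would merge the resulting dictionaries.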


Subject(s)
DNA/chemistry , Genomics/methods , Regulatory Sequences, Nucleic Acid , Software , Algorithms , Arabidopsis/genetics , Genome, Plant , Markov Chains , Sequence Analysis, DNA
3.
BMC Genomics ; 10: 463, 2009 Oct 08.
Article in English | MEDLINE | ID: mdl-19814816

ABSTRACT

BACKGROUND: Genome sequences can be conceptualized as arrangements of motifs or words. The frequencies and positional distributions of these words within particular non-coding genomic segments provide important insights into how the words function in processes such as mRNA stability and regulation of gene expression. RESULTS: Using an enumerative word discovery approach, we investigated the frequencies and positional distributions of all 65,536 different 8-letter words in the genome of Arabidopsis thaliana. Focusing on promoter regions, introns, and 3' and 5' untranslated regions (3'UTRs and 5'UTRs), we compared word frequencies in these segments to genome-wide frequencies. The statistically interesting words in each segment were clustered with similar words to generate motif logos. We investigated whether words were clustered at particular locations or were distributed randomly within each genomic segment, and we classified the words using gene expression information from public repositories. Finally, we investigated whether particular sets of words appeared together more frequently than others. CONCLUSION: Our studies provide a detailed view of the word composition of several segments of the non-coding portion of the Arabidopsis genome. Each segment contains a unique word-based signature. The respective signatures consist of the sets of enriched words, 'unwords', and word pairs within a segment, as well as the preferential locations and functional classifications for the signature words. Additionally, the positional distributions of enriched words within the segments highlight possible functional elements, and the co-associations of words in promoter regions likely represent the formation of higher order regulatory modules. This work is an important step toward fully cataloguing the functional elements of the Arabidopsis genome.
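The figure of 65,536 words follows from the four-letter DNA alphabet: 4^8 = 65,536. The sketch below enumerates that word space and compares word frequencies in one segment with genome-wide frequencies. It is a simplified illustration with hypothetical function names; the plain frequency ratio used here is an assumption and stands in for whatever statistical test the study applied.

```python
from collections import Counter
from itertools import product

def all_words(k, alphabet="ACGT"):
    """Generate every word of length k over the alphabet."""
    return ("".join(letters) for letters in product(alphabet, repeat=k))

def word_frequencies(seq, k):
    """Relative frequency of each overlapping k-letter word in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def enrichment(segment, genome, k):
    """Ratio of segment frequency to genome-wide frequency per word.

    Words that never occur genome-wide are skipped to avoid division by zero.
    """
    seg = word_frequencies(segment, k)
    background = word_frequencies(genome, k)
    return {w: f / background[w] for w, f in seg.items() if w in background}

n_words = sum(1 for _ in all_words(8))
ratios = enrichment(segment="AAAA", genome="AAAACCCC", k=2)
```

A ratio above 1 marks a word that is more frequent in the segment than in the genome as a whole; a real analysis would also need a significance threshold.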


Subject(s)
Arabidopsis/genetics , Computational Biology/methods , Genome, Plant , Models, Statistical , 3' Untranslated Regions , 5' Untranslated Regions , DNA, Plant/genetics , Gene Expression Regulation, Plant , Introns , Markov Chains , Promoter Regions, Genetic , Sequence Analysis, DNA
4.
BMC Genomics ; 10 Suppl 1: S18, 2009 Jul 07.
Article in English | MEDLINE | ID: mdl-19594877

ABSTRACT

BACKGROUND: DNA repair genes provide an important contribution towards the surveillance and repair of DNA damage. These genes produce a large network of interacting proteins whose mRNA expression is likely to be regulated by similar regulatory factors. Full characterization of promoters of DNA repair genes and the similarities among them will more fully elucidate the regulatory networks that activate or inhibit their expression. To address this goal, the authors introduce a technique to find regulatory genomic signatures, which represents a specific application of the genomic signature methodology to classify DNA sequences as putative functional elements within a single organism. RESULTS: The effectiveness of the regulatory genomic signatures is demonstrated via analysis of promoter sequences for genes in DNA repair pathways of humans. The promoters are divided into two classes, the bidirectional promoters and the unidirectional promoters, and distinct genomic signatures are calculated for each class. The genomic signatures include statistically overrepresented words, word clusters, and co-occurring words. The robustness of this method is confirmed by the ability to identify sequences that exist as motifs in TRANSFAC and JASPAR databases, and in overlap with verified binding sites in this set of promoter regions. CONCLUSION: The word-based signatures are shown to be effective by finding occurrences of known regulatory sites. Moreover, the signatures of the bidirectional and unidirectional promoters of human DNA repair pathways are clearly distinct, exhibiting virtually no overlap. In addition to providing an effective characterization method for related DNA sequences, the signatures elucidate putative regulatory aspects of DNA repair pathways, which are notably under-characterized.


Subject(s)
Computational Biology/methods , DNA Repair , Promoter Regions, Genetic , Base Composition , Cluster Analysis , Databases, Genetic , Humans , Models, Statistical
5.
In Silico Biol ; 9(1-2): 11-22, 2009.
Article in English | MEDLINE | ID: mdl-19537158

ABSTRACT

In this paper we describe conditions of use for a recently published tool that offers two basic functions for the classical problem of discovering motifs in a set of promoter sequences. The first, CHECKPROMOTER, assumes that not all of the sequences necessarily possess a common motif of given length l; in this case, it allows an exact identification of maximal subsets of related promoters. The purpose of this program is to recognize putatively co-regulated genes. The second, CHECKMOTIF, solves the problem of checking whether the given promoters have a common motif. It uses a fast approximation algorithm for which we were able to derive non-trivial low performance bounds (defined as the ratio of the Hamming distance of the obtained solution to that of a theoretically best solution) for the computed outputs. Both programs use a novel weighted Hamming distance paradigm for evaluating the similarity of sets of l-mers, and we are able to compute performance bounds for the proposed motifs. A set of Arabidopsis thaliana (At) promoters was used as a benchmark for a comparative test against five known tools. It could be verified that SiteSeeker significantly outperformed these tools.
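To make the distance and the performance ratio concrete, here is a minimal sketch. The abstract does not specify how the weighted Hamming distance is defined, so per-position weights are assumed purely for illustration; the function names and example l-mers are hypothetical and do not come from the tool.

```python
def weighted_hamming(a, b, weights=None):
    """Sum of position weights where two equal-length l-mers differ.

    Assumption: one weight per position; unit weights give the plain
    Hamming distance.
    """
    if len(a) != len(b):
        raise ValueError("l-mers must have the same length")
    if weights is None:
        weights = [1.0] * len(a)
    if len(weights) != len(a):
        raise ValueError("need exactly one weight per position")
    return sum(w for x, y, w in zip(a, b, weights) if x != y)

def total_distance(candidate, lmers, weights=None):
    """Distance from a candidate motif to a set of l-mers, summed."""
    return sum(weighted_hamming(candidate, m, weights) for m in lmers)

def performance_ratio(obtained, best, lmers, weights=None):
    """Ratio of the obtained solution's distance to the best solution's."""
    best_distance = total_distance(best, lmers, weights)
    if best_distance == 0:
        raise ValueError("ratio is undefined when the best distance is zero")
    return total_distance(obtained, lmers, weights) / best_distance

lmers = ["ACGT", "ACGA", "TCGT"]
ratio = performance_ratio(obtained="ACGA", best="ACGT", lmers=lmers)
```

A ratio close to 1 means the approximate solution is nearly as good as the best one; in practice the best solution is unknown, which is why a derived bound on this ratio is useful.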


Subject(s)
Arabidopsis/genetics , Gene Expression Regulation, Plant , Promoter Regions, Genetic/genetics , Regulatory Sequences, Nucleic Acid , Algorithms , Computational Biology