Results 1 - 20 of 29
1.
J Theor Biol ; 574: 111625, 2023 Oct 07.
Article in English | MEDLINE | ID: mdl-37748534

ABSTRACT

Understanding spatially varying survival is crucial for understanding the ecology and evolution of migratory animals, which may ultimately help to conserve such species. We develop an approach to estimate an annual survival probability function that varies continuously in geographic space, provided the recovery probability is constant over space. This estimate is based on a density function over continuous geographic space and the discrete age at death obtained from dead recovery data. From the same density function, we obtain an estimate of animal distribution in space corrected for survival, i.e., migratory connectivity. This is possible when migratory connectivity can be separated from recovery probability. In this article, we present a method for obtaining spatially and continuously varying survival and the migratory connectivity corrected for survival when a constant recovery probability can reasonably be assumed. The model is a stepping stone toward a model that disentangles spatially heterogeneous survival and migratory connectivity corrected for survival from a spatially heterogeneous recovery probability. We implement the method using kernel density estimates in the R package CONSURE. Any other density estimation technique can be used as an alternative. In a simulation study, the estimators are unbiased but show edge effects in survival and migratory connectivity. Applying the method to a real-world data set of European robins Erithacus rubecula results in biologically reasonable continuous heat maps for survival and migratory connectivity.

2.
Article in English | MEDLINE | ID: mdl-35254989

ABSTRACT

In life sciences, high-throughput techniques typically lead to high-dimensional data and often the number of covariates is much larger than the number of observations. This inherently comes with multicollinearity challenging a statistical analysis in a linear regression framework. Penalization methods such as the lasso, ridge regression, the group lasso, and convex combinations thereof, which introduce additional conditions on regression variables, have proven themselves effective. In this study, we introduce a novel approach by combining the lasso and the standardized group lasso leading to meaningful weighting of the predicted ("fitted") outcome which is of primary importance, e.g., in breeding populations. This "fitted" sparse-group lasso was implemented as a proximal-averaged gradient descent method and is part of the R package "seagull" available at CRAN. For the evaluation of the novel method, we executed an extensive simulation study. We simulated genotypes and phenotypes which resemble data of a dairy cattle population. Genotypes at thousands of genomic markers were used as covariates to fit a quantitative response. The proximity of markers on a chromosome determined grouping. In the majority of simulated scenarios, the new method revealed improved prediction abilities compared to other penalization approaches and was able to localize the signals of simulated features.


Subjects
Genome, Animals, Cattle, Genome/genetics, Genotype, Computer Simulation, Linear Models, Phenotype
3.
J Theor Biol ; 543: 111108, 2022 06 21.
Article in English | MEDLINE | ID: mdl-35367238

ABSTRACT

Spatial variation in survival has individual fitness consequences and influences population dynamics. Which space animals use during the annual cycle determines how they are affected by this spatial variability. Therefore, knowing spatial patterns of survival and space use is crucial to understand demography of migrating animals. Extracting information on survival and space use from observation data, in particular dead recovery data, requires explicitly identifying the observation process. We build a fully stochastic model for animals marked in populations of origin, which were found dead in spatially discrete destination areas. The model acts on the population level and includes parameters for use of space, survival and recovery probability. It is based on the division coefficient and the multinomial reencounter model. We use a likelihood-based approach, derive Restricted Maximum Likelihood-like estimates for all parameters and prove their existence and uniqueness. In a simulation study we demonstrate the performance of the model by using Bayesian estimators derived by the Markov chain Monte Carlo method. We obtain unbiased estimates for survival and recovery probability if the sample size is large enough. Moreover, we apply the model to real-world data of European robins Erithacus rubecula ringed at a stopover site. We obtain annual survival estimates for different spatially discrete non-breeding areas. Additionally, we can reproduce already known patterns of use of space for this species.


Subjects
Likelihood Functions, Animals, Bayes Theorem, Computer Simulation, Markov Chains, Monte Carlo Method, Population Dynamics
4.
Front Vet Sci ; 8: 620327, 2021.
Article in English | MEDLINE | ID: mdl-33614764

ABSTRACT

Analysis of volatile organic compounds (VOCs) is a novel approach to accelerate bacterial culture diagnostics of Mycobacterium avium subsp. paratuberculosis (MAP). In the present study, cultures of fecal and tissue samples from MAP-infected and non-suspect dairy cattle and goats were explored to elucidate the effects of sample matrix and of animal species on VOC emissions during bacterial cultivation and to identify early markers for bacterial growth. The samples were processed following standard laboratory procedures, and culture tubes were incubated for different time periods. Headspace volume of the tubes was sampled by needle trap-micro-extraction and analyzed by gas chromatography-mass spectrometry. Analysis of MAP-specific VOC emissions considered potential characteristic VOC patterns. To address variation of the patterns, a flexible and robust machine learning workflow was set up, based on random forest classifiers and comprising three steps: variable selection, parameter optimization, and classification. Only a few substances originated either from a certain matrix or could be assigned to one animal species. These additional emissions were not considered informative by the variable selection procedure. Classification accuracy for MAP-positive versus MAP-negative cultures was 0.98 for bovine feces and 0.88 for caprine feces. Six compounds indicating MAP presence were selected in all four settings (cattle vs. goat, feces vs. tissue): 2-methyl-1-propanol, 2-methyl-1-butanol, 3-methyl-1-butanol, heptanal, isoprene, and 2-heptanone. Classification accuracies for MAP growth-scores ranged from 0.82 for goat tissue to 0.89 for cattle feces. Misclassification occurred predominantly between related scores. Seventeen compounds indicating MAP growth were selected in all four settings, including the 6 compounds indicating MAP presence. The concentration levels of 2,3,5-trimethylfuran, 2-pentylfuran, 1-propanol, and 1-hexanol were indicative of MAP cultures before visible growth was apparent. Thus, very accurate classification of the VOC samples was achieved and the potential of VOC analysis to detect bacterial growth before colonies become visible was confirmed. These results indicate that diagnosis of paratuberculosis can be optimized by monitoring VOC emissions of bacterial cultures. Further validation studies are needed to increase the robustness of indicative VOC patterns for early MAP growth as a prerequisite for the development of VOC-based diagnostic analysis systems.

5.
BMC Bioinformatics ; 21(1): 407, 2020 Sep 15.
Article in English | MEDLINE | ID: mdl-32933477

ABSTRACT

BACKGROUND: Statistical analyses of biological problems in life sciences often lead to high-dimensional linear models. To solve the corresponding system of equations, penalization approaches are often the methods of choice. They are especially useful in case of multicollinearity, which appears if the number of explanatory variables exceeds the number of observations or for some biological reason. Then, the model goodness of fit is penalized by some suitable function of interest. Prominent examples are the lasso, group lasso and sparse-group lasso. Here, we offer a fast and numerically cheap implementation of these operators via proximal gradient descent. The grid search for the penalty parameter is realized by warm starts. The step size between consecutive iterations is determined with backtracking line search. Finally, seagull, the R package presented here, produces complete regularization paths. RESULTS: Publicly available high-dimensional methylation data are used to compare seagull to the established R package SGL. The results of both packages enabled a precise prediction of biological age from DNA methylation status. But even though the results of seagull and SGL were very similar (R² > 0.99), seagull computed the solution in a fraction of the time needed by SGL. Additionally, seagull enables the incorporation of weights for each penalized feature. CONCLUSIONS: The following operators for linear regression models are available in seagull: lasso, group lasso, sparse-group lasso and Integrative LASSO with Penalty Factors (IPF-lasso). Thus, seagull is a convenient envelope of lasso variants.


Subjects
Linear Models, Machine Learning/standards, Algorithms, Humans
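
The proximal gradient scheme named in the abstract of record 5 (soft-thresholding steps, a backtracking line search for the step size, and warm starts along a grid of penalty values) can be sketched for the plain lasso as follows. This is a minimal pure-Python illustration, not the seagull implementation or its API: the function names, the 1/(2n) scaling of the loss, and the default step parameters are all assumptions made for this example.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |v| for a scalar v."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def predict(X, b):
    """Matrix-vector product X b for X given as a list of rows."""
    return [sum(xij * bj for xij, bj in zip(row, b)) for row in X]


def smooth_loss(X, y, b):
    """Least-squares part of the objective: ||y - X b||^2 / (2 n)."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, predict(X, b))) / (2 * len(y))


def gradient(X, y, b):
    """Gradient of smooth_loss with respect to b."""
    n = len(y)
    resid = [fi - yi for fi, yi in zip(predict(X, b), y)]
    return [sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(len(b))]


def lasso_prox_grad(X, y, lam, b0, step=1.0, shrink=0.5, max_iter=1000, tol=1e-10):
    """Minimise smooth_loss(b) + lam * sum(|b_j|), starting from b0 (hypothetical helper)."""
    b = list(b0)
    for _ in range(max_iter):
        g = gradient(X, y, b)
        f_old = smooth_loss(X, y, b)
        t = step
        for _attempt in range(60):
            # Gradient step followed by the soft-thresholding proximal step.
            new = [soft_threshold(bj - t * gj, t * lam) for bj, gj in zip(b, g)]
            d = [nj - bj for nj, bj in zip(new, b)]
            # Backtracking: accept t once the quadratic upper bound holds.
            bound = (f_old + sum(gj * dj for gj, dj in zip(g, d))
                     + sum(dj * dj for dj in d) / (2 * t))
            if smooth_loss(X, y, new) <= bound + 1e-15:
                break
            t *= shrink
        b = new
        if max(abs(dj) for dj in d) < tol:
            break
    return b


def lasso_path(X, y, lambdas):
    """Regularization path from large to small penalty, warm-starting each fit."""
    b = [0.0] * len(X[0])
    path = []
    for lam in sorted(lambdas, reverse=True):
        b = lasso_prox_grad(X, y, lam, b)  # warm start from the previous solution
        path.append((lam, b))
    return path
```

Starting each fit from the previous solution is what makes the grid search cheap: neighbouring penalty values tend to give similar coefficient vectors, so few iterations are needed per grid point.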
6.
Sci Rep ; 10(1): 125, 2020 01 10.
Article in English | MEDLINE | ID: mdl-31924851

ABSTRACT

Fluorescence-tags, commonly used to visualize the spatial distribution of proteins within cells, can influence the localization of the tagged proteins by affecting their stability, interaction with other proteins or the induction of oligomerization artifacts. To circumvent these obstacles, a protocol was developed to generate 50 nm thick serial sections suitable for immunogold labeling and subsequent reconstruction of the spatial distribution of immuno-labeled native proteins within individual bacterial cells. Applying this method, we show a cellular distribution of the staphylococcal alkaline shock protein 23 (Asp23), which is compatible with filament formation, a property of Asp23 that we also demonstrate in vitro.


Subjects
Bacterial Proteins/chemistry, Three-Dimensional Imaging, Protein Multimerization, Fluorescence Microscopy, Quaternary Protein Structure
7.
J Math Biol ; 78(1-2): 413-439, 2019 01.
Article in English | MEDLINE | ID: mdl-30094616

ABSTRACT

The Rosenzweig-MacArthur system is a particular case of the Gause model, which is widely used to describe predator-prey systems. In the classical derivation, the interaction terms in the differential equation are essentially derived from considering handling time vs. search time, and moreover there exist derivations in the literature which are based on quasi-steady state assumptions. In the present paper we introduce a derivation of this model from first principles and singular perturbation reductions. We first establish a simple stochastic mass action model which leads to a three-dimensional ordinary differential equation, and systematically determine all possible singular perturbation reductions (in the sense of Tikhonov and Fenichel) to two-dimensional systems. Among the reductions obtained we find the Rosenzweig-MacArthur system for a certain choice of small parameters as well as an alternative to the Rosenzweig-MacArthur model, with density dependent death rates for predators. The arguments to obtain the reductions are intrinsically mathematical; no heuristics are employed.


Subjects
Food Chain, Biological Models, Predatory Behavior, Algorithms, Animals, Computational Biology, Ecosystem, Mathematical Concepts, Stochastic Processes, Systems Biology
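
For orientation, the Rosenzweig-MacArthur system discussed in record 7 is commonly written in the following textbook form, with logistic prey growth and a Holling type II functional response. The abstract itself gives no equations, so the symbols here are assumptions chosen for illustration: prey density x, predator density y, prey growth rate r, carrying capacity K, attack rate a, handling time h, conversion efficiency e, and predator death rate m.

```latex
\[
\begin{aligned}
\frac{dx}{dt} &= r x \left(1 - \frac{x}{K}\right) - \frac{a x y}{1 + a h x}, \\
\frac{dy}{dt} &= \frac{e\, a x y}{1 + a h x} - m y .
\end{aligned}
\]
```

The saturating term a x / (1 + a h x) is the part that the classical derivation obtains from handling time versus search time.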
8.
Heliyon ; 5(12): e02943, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31890941

ABSTRACT

The spatio-temporal reduction and oxidation of protein thiols is an essential mechanism in signal transduction in all kingdoms of life. Thioredoxin (Trx) family proteins efficiently catalyze thiol-disulfide exchange reactions and the proteins are widely recognized for their importance in the operation of thiol switches. Trx family proteins have a broad and at the same time very distinct substrate specificity, a prerequisite for redox switching. Despite multiple efforts, the true nature of this specificity is still under debate. Here, we comprehensively compare the classification/clustering of various redoxins from all domains of life based on their similarity in amino acid sequence, tertiary structure, and their electrostatic properties. We correlate these similarities with the existence of common interaction partners, identified in various previous studies and suggested by proteomic screenings. These analyses confirm that primary and tertiary structure similarity, and thereby all common classification systems, do not correlate with the target specificity of the proteins as thiol-disulfide oxidoreductases. Instead, a number of examples clearly demonstrate the importance of electrostatic similarity for their target specificity, independent of their belonging to the Trx or glutaredoxin subfamilies.

9.
Biom J ; 60(6): 1096-1109, 2018 11.
Article in English | MEDLINE | ID: mdl-30101421

ABSTRACT

Genomic information can be used to study the genetic architecture of some trait. Not only the size of the genetic effect captured by molecular markers and their position on the genome but also the mode of inheritance, which might be additive or dominant, and the presence of interactions are interesting parameters. When searching for interacting loci, estimating the effect size and determining the significant marker pairs increases the computational burden in terms of speed and memory allocation dramatically. This study revisits a rapid Bayesian approach (fastbayes). As a novel contribution, a measure of evidence is derived to select markers with effect significantly different from zero. It is based on the credibility of the highest posterior density interval next to zero in a marginalized manner. This methodology is applied to simulated data resembling a dairy cattle population in order to verify the sensitivity of testing for a given range of type-I error levels. A real data application complements this study. Sensitivity and specificity of fastbayes were similar to a variational Bayesian method, and a further reduction of computing time could be achieved. More than 50% of the simulated causative variants were identified. The most complex model containing different kinds of genetic effects and their pairwise interactions yielded the best outcome over a range of type-I error levels. The validation study showed that fastbayes is a dual-purpose tool for genomic inferences - it is applicable to predict future outcome of not-yet phenotyped individuals with high precision as well as to estimate and test single-marker effects. Furthermore, it allows the estimation of billions of interaction effects.


Subjects
Biometry/methods, Genomics, Animals, Bayes Theorem, Mice, Single Nucleotide Polymorphism, Software
10.
Bull Math Biol ; 80(3): 493-518, 2018 03.
Article in English | MEDLINE | ID: mdl-29297144

ABSTRACT

We present a new class of metrics for unrooted phylogenetic X-trees inspired by the Gromov-Hausdorff distance for (compact) metric spaces. These metrics can be efficiently computed by linear or quadratic programming. They are robust under NNI operations, too. The local behaviour of the metrics shows that they are different from any previously introduced metrics. The performance of the metrics is briefly analysed on random weighted and unweighted trees as well as random caterpillars.


Subjects
Phylogeny, Algorithms, Animals, Biological Evolution, Computer Simulation, Humans, Mathematical Concepts, Biological Models
11.
J Breath Res ; 11(4): 047105, 2017 Nov 01.
Article in English | MEDLINE | ID: mdl-28768897

ABSTRACT

Modern statistical methods which were developed for pattern recognition are increasingly being used for data analysis in studies on emissions of volatile organic compounds (VOCs). With the detection of disease-related VOC profiles, novel non-invasive diagnostic tools could be developed for clinical applications. However, it is important to bear in mind that not all statistical methods are equally suitable for the investigation of VOC profiles. In particular, univariate methods are not able to discover VOC patterns as they consider each compound separately. The present study demonstrates this fact in practice. Using VOC samples from a controlled animal study on paratuberculosis, the random forest classification method was applied for pattern recognition and disease prediction. This strategy was compared with a prediction approach based on single compounds. Both methods were framed within a cross-validation procedure. A comparison of both strategies based on these VOC data reveals that random forests achieves higher sensitivities and specificities than predictions based on single compounds. Therefore, it will most likely be more fruitful to further investigate VOC patterns instead of single biomarkers for paratuberculosis. All methods used are thoroughly explained to aid the transfer to other data analyses.


Subjects
Algorithms, Breath Tests/methods, Paratuberculosis/diagnosis, Volatile Organic Compounds/analysis, Animals, Biomarkers/analysis, Decision Trees, Animal Disease Models, Exhalation, Feces/chemistry, Goats, Sensitivity and Specificity
12.
Mol Biosyst ; 12(10): 3196-208, 2016 10 20.
Article in English | MEDLINE | ID: mdl-27507577

ABSTRACT

The biological relationships both between and within the functions, processes and pathways that operate within complex biological systems are only poorly characterized, making the interpretation of large scale gene expression datasets extremely challenging. Here, we present an approach that integrates gene expression and biological annotation data to identify and describe the interactions between biological functions, processes and pathways that govern a phenotype of interest. The product is a global, interconnected network, not of genes but of functions, processes and pathways, that represents the biological relationships within the system. We validated our approach on two high-throughput expression datasets describing organismal and organ development. Our findings are well supported by the available literature, confirming that developmental processes and apoptosis play key roles in cell differentiation. Furthermore, our results suggest that processes related to pluripotency and lineage commitment, which are known to be critical for development, interact mainly indirectly, through genes implicated in more general biological processes. Moreover, we provide evidence that supports the relevance of cell spatial organization in the developing liver for proper liver function. Our strategy can be viewed as an abstraction that is useful to interpret high-throughput data and devise further experiments.


Subjects
Computational Biology/methods, Gene Expression Profiling/methods, Animals, Cell Differentiation/genetics, Cluster Analysis, Embryonic Development/genetics, Gene Expression Regulation, Developmental Gene Expression Regulation, Gene Regulatory Networks, Humans, Mice, Molecular Sequence Annotation, Organogenesis/genetics, Regeneration/genetics, Signal Transduction
13.
Sci Rep ; 6: 28172, 2016 06 27.
Article in English | MEDLINE | ID: mdl-27344979

ABSTRACT

Absolute protein quantification was applied to follow the dynamics of the cytoplasmic proteome of Staphylococcus aureus in response to long-term oxygen starvation. For 1,168 proteins, the majority of all expressed proteins, molecule numbers per cell have been determined to monitor the cellular investments in single branches of bacterial life for the first time. In the presence of glucose the anaerobic protein pattern is characterized by increased amounts of glycolytic and fermentative enzymes such as Eno, GapA1, Ldh1, and PflB. Interestingly, the ferritin-like protein FtnA is among the most abundant proteins during anaerobic growth. Depletion of glucose finally leads to an accumulation of different enzymes such as ArcB1, ArcB2, and ArcC2 involved in the arginine deiminase pathway. Concentrations of 29 exo- and 78 endometabolites were comparatively assessed and have been integrated into the metabolic networks. Here we provide an almost complete picture of the response to oxygen starvation, from signal transduction pathways to gene expression pattern, from metabolic reorganization after oxygen depletion to beginning cell death and lysis after glucose exhaustion. This experimental approach can be considered a proof of principle for combining cell physiology with quantitative proteomics for a new dimension in understanding simple life processes as an entity.


Subjects
Bacterial Proteins/metabolism, Proteome/metabolism, Staphylococcus aureus/metabolism, Anaerobiosis, High Pressure Liquid Chromatography, Glucose/metabolism, Mass Spectrometry, Metabolome, Oxygen/metabolism, Proteome/analysis, Proteomics, Staphylococcus aureus/growth & development
14.
Comput Med Imaging Graph ; 48: 9-20, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26741125

ABSTRACT

Intensity inhomogeneity (bias field) is a common artefact in magnetic resonance (MR) images, which hinders successful automatic segmentation. In this work, a novel algorithm for simultaneous segmentation and bias field correction is presented. The proposed energy functional allows for explicit regularization of the bias field term, making the model more flexible, which is crucial in the presence of strong inhomogeneities. An efficient minimization procedure, attempting to find the global minimum, is applied to the energy functional. The algorithm is evaluated qualitatively and quantitatively using a synthetic example and real MR images of different organs. Comparisons with several state-of-the-art methods demonstrate the superior performance of the proposed technique. Desirable results are obtained even for images with strong and complicated inhomogeneity fields and sparse tissue structures.


Subjects
Artifacts, Image Enhancement/methods, Computer-Assisted Image Interpretation/methods, Magnetic Resonance Imaging/methods, Automated Pattern Recognition/methods, Subtraction Technique, Algorithms, Humans, Reproducibility of Results, Sensitivity and Specificity, Computer-Assisted Signal Processing
15.
BMC Med Genomics ; 8: 61, 2015 Oct 14.
Article in English | MEDLINE | ID: mdl-26462558

ABSTRACT

BACKGROUND: Non-cellular blood circulating microRNAs (plasma miRNAs) represent a promising source for the development of prognostic and diagnostic tools owing to their minimally invasive sampling, high stability, and simple quantification by standard techniques such as RT-qPCR. So far, the majority of association studies involving plasma miRNAs were disease-specific case-control analyses. In contrast, in the present study, plasma miRNAs were analysed in a sample of 372 individuals from a population-based cohort study, the Study of Health in Pomerania (SHIP). METHODS: Quantification of miRNA levels was performed by RT-qPCR using the Exiqon Serum/Plasma Focus microRNA PCR Panel V3.M covering 179 different miRNAs. Of these, 155 were included in our analyses after quality control. Associations between plasma miRNAs and the phenotypes age, body mass index (BMI), and sex were assessed via a two-step linear regression approach per miRNA. The first step regressed out the technical parameters and the second step determined the remaining associations between the respective plasma miRNA and the phenotypes of interest. RESULTS: After regressing out technical parameters and adjusting for the respective other two phenotypes, 7, 15, and 35 plasma miRNAs were significantly (q < 0.05) associated with age, BMI, and sex, respectively. Additional adjustment for the blood cell parameters identified 12 and 19 miRNAs to be significantly associated with age and BMI, respectively. Most of the BMI-associated miRNAs likely originate from liver. Sex-associated differences in miRNA levels were largely determined by differences in blood cell parameters. Thus, only 7 as compared to originally 35 sex-associated miRNAs displayed sex-specific differences after adjustment for blood cell parameters. CONCLUSIONS: These findings emphasize that circulating miRNAs are strongly impacted by age, BMI, and sex. Hence, these parameters should be considered as covariates in association studies based on plasma miRNA levels. The established experimental and computational workflow can now be used in future screening studies to determine associations of plasma miRNAs with defined disease phenotypes.


Subjects
Aging/blood, Body Mass Index, MicroRNAs/blood, Sex Characteristics, Adult, Aged, Aging/genetics, Blood Cells/metabolism, Cohort Studies, Female, Humans, Male, Middle Aged, Phenotype, Young Adult
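
The two-step regression idea described in the METHODS of record 15 (first regress out technical parameters, then test the adjusted miRNA level against a phenotype) can be illustrated with a deliberately reduced sketch. It assumes a single technical covariate and a single phenotype, both handled by one-predictor least squares; the study itself used several technical parameters and phenotypes, and the function names below are hypothetical.

```python
def ols_fit(x, y):
    """Slope and intercept of a one-predictor least-squares fit of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(x, y):
    """Part of y that is not explained by a linear fit on x."""
    slope, intercept = ols_fit(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def two_step_association(technical, phenotype, mirna_level):
    """Step 1 removes the technical covariate; step 2 fits the phenotype."""
    adjusted = residuals(technical, mirna_level)   # step 1
    slope, _ = ols_fit(phenotype, adjusted)        # step 2
    return slope
```

When the technical covariate and the phenotype are correlated, the two-step estimate generally differs from a joint fit of both predictors, which is one reason to treat this as an illustration of the workflow rather than a reproduction of the published analysis.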
16.
Proteomics Clin Appl ; 9(11-12): 1003-11, 2015 Dec.
Article in English | MEDLINE | ID: mdl-25676254

ABSTRACT

PURPOSE: The mortality rate of patients with Staphylococcus aureus infections is alarming and urgently demands new strategies to attenuate the course of these infections or to detect them at earlier stages. EXPERIMENTAL DESIGN: To study the adaptive immune response to S. aureus antigens in healthy human volunteers, a protein microarray containing 44 S. aureus proteins was developed using the ArrayStrip platform technology. RESULTS: In plasma samples from 15 S. aureus carriers and 15 noncarriers, 21 immunogenic S. aureus antigens were identified. Seven antigens were recognized by antibodies present in at least 60% of the samples, representing the core S. aureus immunome of healthy individuals. S. aureus-specific serum immunoglobulin G (IgG) levels were significantly lower in noncarriers than in carriers; specifically, anti-IsaA, anti-SACOL0479, and anti-SACOL0480 IgGs were found at lower frequencies and quantities. Twenty-two antigens present on the microarray were encoded by all S. aureus carrier isolates. Nevertheless, the immune system of the carriers was responsive to only eight of them and with different intensities. CONCLUSION AND CLINICAL RELEVANCE: The established protein microarray allows a broad profiling of the S. aureus-specific antibody response and can be used to identify S. aureus antigens that might serve as vaccines or diagnostic markers.


Subjects
Antibacterial Antibodies/immunology, Bacterial Antigens/immunology, Protein Array Analysis, Staphylococcus aureus/immunology, Healthy Volunteers, Humans, Humoral Immunity, Immunoglobulin G/immunology, Species Specificity
17.
PLoS One ; 9(11): e112709, 2014.
Article in English | MEDLINE | ID: mdl-25422942

ABSTRACT

Breast density is a risk factor associated with the development of breast cancer. Usually, breast density is assessed on two-dimensional (2D) mammograms using the American College of Radiology (ACR) classification. Magnetic resonance imaging (MRI) is a non-radiation based examination method, which offers a three-dimensional (3D) alternative to classical 2D mammograms. We propose a new framework for automated breast density calculation on MRI data. Our framework consists of three steps. First, a recently developed method for simultaneous intensity inhomogeneity correction and breast tissue and parenchyma segmentation is applied. Second, the obtained breast component is extracted, and the breast-air and breast-body boundaries are refined. Finally, the fibroglandular/parenchymal tissue volume is extracted from the breast volume. The framework was tested on 37 randomly selected MR mammographies. All images were acquired on a 1.5T MR scanner using an axial, T1-weighted time-resolved angiography with stochastic trajectories sequence. The results were compared to manually obtained ground truth. Dice's Similarity Coefficient (DSC) as well as Bland-Altman plots were used as the main tools for evaluation of similarity between automatic and manual segmentations. The average Dice's Similarity Coefficient values were 0.96±0.0172 and 0.83±0.0636 for breast and parenchymal volumes, respectively. Bland-Altman plots showed a mean bias (%) ± standard deviation of 5.36±3.9 for breast volumes and -6.9±13.14 for parenchyma volumes. The automated framework produced sufficient results and has the potential to be applied to the analysis of breast volume and breast density in large data sets in clinical and research settings.


Subjects
Breast Neoplasms/diagnosis, Computer-Assisted Image Interpretation/methods, Computer-Assisted Image Processing/methods, Magnetic Resonance Imaging/methods, Aged, Algorithms, Female, Humans, Middle Aged
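
The two agreement measures named in record 17 can be sketched in a few lines. The example below assumes that each segmentation is represented as a set of voxel coordinates and that the Bland-Altman summary is taken on raw volume differences (automatic minus manual); the abstract reports the bias in percent, and the exact normalisation used there is not stated, so that part is left out. Function names are hypothetical.

```python
from statistics import mean, stdev


def dice_coefficient(auto_voxels, manual_voxels):
    """Dice's Similarity Coefficient: 2 |A n B| / (|A| + |B|)."""
    a, b = set(auto_voxels), set(manual_voxels)
    if not a and not b:
        return 1.0  # two empty segmentations agree trivially
    return 2 * len(a & b) / (len(a) + len(b))


def bland_altman_bias(auto_volumes, manual_volumes):
    """Mean and sample standard deviation of paired differences."""
    diffs = [a - m for a, m in zip(auto_volumes, manual_volumes)]
    return mean(diffs), stdev(diffs)
```

A DSC of 1 means the automatic and manual masks coincide exactly, and 0 means they share no voxel.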
18.
Cell Commun Signal ; 11: 85, 2013 Nov 08.
Article in English | MEDLINE | ID: mdl-24206562

ABSTRACT

BACKGROUND: Small molecule effects can be represented by active signaling pathways within functional networks. Identifying these can help to design new strategies to utilize known small molecules, e.g. to trigger specific cellular transformations or to reposition known drugs. RESULTS: We developed CellFateScout that uses the method of Latent Variables to turn differential high-throughput expression data and a functional network into a list of active signaling pathways. Applying it to Connectivity Map data, i.e., differential expression data describing small molecule effects, we then generated a Human Small Molecule Mechanisms Database. Finally, using a list of active signaling pathways as query, a similarity search can identify small molecules from the database that may trigger these pathways. We validated our approach systematically, using expression data of small molecule perturbations, yielding better predictions than popular bioinformatics tools. CONCLUSIONS: CellFateScout can be used to select small molecules for their desired effects. The CellFateScout Cytoscape plugin, a tutorial and the Human Small Molecule Mechanisms Database are available at https://sourceforge.net/projects/cellfatescout/ under LGPLv2 license.


Subjects
Chemical Databases, Biological Models, Statistical Models, Signal Transduction, Small Molecule Libraries/chemistry, Animals, Computational Biology, Humans, Internet, Mice
19.
PLoS One ; 8(10): e76561, 2013.
Article in English | MEDLINE | ID: mdl-24146889

ABSTRACT

Microarrays have been useful in understanding various biological processes by allowing the simultaneous study of the expression of thousands of genes. However, the analysis of microarray data is a challenging task. One of the key problems in microarray analysis is the classification of unknown expression profiles. Specifically, the often large number of non-informative genes on the microarray adversely affects the performance and efficiency of classification algorithms. Furthermore, the skewed ratio of sample to variable poses a risk of overfitting. Thus, in this context, feature selection methods become crucial to select relevant genes and, hence, improve classification accuracy. In this study, we investigated feature selection methods based on gene expression profiles and protein interactions. We found that in our setup, the addition of protein interaction information did not contribute to any significant improvement of the classification results. Furthermore, we developed a novel feature selection method that relies exclusively on observed gene expression changes in microarray experiments, which we call "relative Signal-to-Noise ratio" (rSNR). More precisely, the rSNR ranks genes based on their specificity to an experimental condition, by comparing intrinsic variation, i.e. variation in gene expression within an experimental condition, with extrinsic variation, i.e. variation in gene expression across experimental conditions. Genes with low variation within an experimental condition of interest and high variation across experimental conditions are ranked higher, and help in improving classification accuracy. We compared different feature selection methods on two time-series microarray datasets and one static microarray dataset. We found that the rSNR performed generally better than the other methods.


Subjects
Genes, Oligonucleotide Array Sequence Analysis/methods, Algorithms, Animals, Arabidopsis/genetics, Genetic Databases, Mice, Reproducibility of Results, Signal-To-Noise Ratio, Time Factors
20.
Math Biosci ; 237(1-2): 38-48, 2012 May.
Article in English | MEDLINE | ID: mdl-22430560

ABSTRACT

Methods of phylogenetic inference use more and more complex models to generate trees from data. However, even simple models and their implications are not fully understood. Here, we investigate the two-state Markov model on a tripod tree, inferring conditions under which a given set of observations gives rise to such a model. This type of investigation has been undertaken before by several scientists from different fields of research. In contrast to other work we fully analyse the model, presenting conditions under which one can infer a model from the observation or at least get support for the tree-shaped interdependence of the leaves considered. We also present all conditions under which the results can be extended from tripod trees to quartet trees, a step necessary to reconstruct at least a topology. Apart from finding conditions under which such an extension works we discuss example cases for which such an extension does not work.


Subjects
Markov Chains, Genetic Models, Phylogeny
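
Record 20 asks when a set of observations at three leaves can arise from a two-state Markov model on a tripod tree. The forward direction of that question, computing the joint leaf distribution that a given model produces, is easy to write down and may help in reading the abstract. The sketch below assumes the tree is rooted at its central node, with a root distribution and one 2x2 transition matrix per edge; this parametrisation and the function name are assumptions for illustration, not taken from the paper.

```python
from itertools import product


def tripod_leaf_distribution(root_dist, edge_matrices):
    """Joint distribution of the three leaf states.

    root_dist[r] is the probability of state r at the central node;
    edge_matrices[k][r][s] is the probability that leaf k is in state s
    given that the central node is in state r.
    """
    dist = {}
    for pattern in product((0, 1), repeat=3):
        # Leaves are conditionally independent given the central state.
        dist[pattern] = sum(
            root_dist[r]
            * edge_matrices[0][r][pattern[0]]
            * edge_matrices[1][r][pattern[1]]
            * edge_matrices[2][r][pattern[2]]
            for r in (0, 1)
        )
    return dist
```

The inverse problem treated in the paper is to decide whether an observed table of eight pattern frequencies can be written in this form at all.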