1.
Front Artif Intell ; 7: 1330257, 2024.
Article in English | MEDLINE | ID: mdl-38962502

ABSTRACT

The world surrounding us is subject to constant change. These changes, frequently described as concept drift, influence many industrial and technical processes. As they can lead to malfunctions and other anomalous behavior, which may be safety-critical in many scenarios, detecting and analyzing concept drift is crucial. In this study, we provide a literature review focusing on concept drift in unsupervised data streams. While many surveys focus on supervised data streams, so far, there is no work reviewing the unsupervised setting. However, this setting is of particular relevance for monitoring and anomaly detection which are directly applicable to many tasks and challenges in engineering. This survey provides a taxonomy of existing work on unsupervised drift detection. In addition to providing a comprehensive literature review, it offers precise mathematical definitions of the considered problems and contains standardized experiments on parametric artificial datasets allowing for a direct comparison of different detection strategies. Thus, the suitability of different schemes can be analyzed systematically, and guidelines for their usage in real-world scenarios can be provided.
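
As a rough illustration of what unsupervised drift detection on a data stream can look like, the following sketch compares a reference window with each subsequent window using the two-sample Kolmogorov-Smirnov statistic. It is not taken from the survey: the function names, the window size, and the coefficient `c_alpha` are assumptions chosen for illustration only.

```python
from bisect import bisect_right
from math import sqrt


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    gap = 0.0
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


def detect_drift(stream, window=50, c_alpha=1.63):
    """Return start indices of windows whose distribution differs from the reference.

    c_alpha=1.63 is the usual asymptotic coefficient for a significance level of
    about 0.01 (an illustrative choice, not a recommendation from the survey).
    """
    threshold = c_alpha * sqrt(2.0 / window)  # critical value for two equal-sized windows
    alarms = []
    reference = list(stream[:window])
    start = window
    while start + window <= len(stream):
        current = list(stream[start:start + window])
        if ks_statistic(reference, current) > threshold:
            alarms.append(start)
            reference = current  # continue monitoring from the post-drift window
        start += window
    return alarms
```

No labels are needed at any point, which is what makes this kind of scheme applicable to monitoring tasks; it only handles one-dimensional data and non-overlapping windows, so it should be read as a toy example.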

2.
Sci Rep ; 14(1): 16567, 2024 Jul 17.
Article in English | MEDLINE | ID: mdl-39019933

ABSTRACT

Serine proteases are important regulators of airway epithelial homeostasis. Altered serum or cellular levels of two serpins, Scca1 and Spink5, have been described for airway diseases but their function beyond antiproteolytic activity is insufficiently understood. To close this gap, we generated fly lines with overexpression or knockdown for each gene in the airways. Overexpression of both fly homologues of Scca1 and Spink5 induced the growth of additional airway branches, with more variable results for the respective knockdowns. Dysregulation of Scca1 resulted in a general delay in fruit fly development, with increases in larval and pupal mortality following overexpression of this gene. In addition, the morphological changes in the airways were concomitant with lower tolerance to hypoxia. In conclusion, the observed structural changes of the airways evidently had a strong impact on the airway function in our model as they manifested in a lower physical fitness of the animals. We assume that this is due to insufficient tissue oxygenation. Future work will be directed at the identification of key molecular regulators following the airway-specific dysregulation of Scca1 and Spink5 expression.


Subjects
Asthma , Drosophila melanogaster , Serpins , Trachea , Animals , Drosophila melanogaster/metabolism , Drosophila melanogaster/genetics , Trachea/metabolism , Trachea/pathology , Asthma/metabolism , Asthma/pathology , Asthma/genetics , Serpins/metabolism , Serpins/genetics , Drosophila Proteins/metabolism , Drosophila Proteins/genetics , Oxygen/metabolism
3.
Neural Comput Appl ; 35(11): 8423-8436, 2023.
Article in English | MEDLINE | ID: mdl-36568475

ABSTRACT

Transfer learning schemes based on deep networks which have been trained on huge image corpora offer state-of-the-art technologies in computer vision. Here, supervised and semi-supervised approaches constitute efficient technologies which work well with comparably small data sets. Yet, such applications are currently restricted to application domains where suitable deep network models are readily available. In this contribution, we address an important application area in the domain of biotechnology, the automatic analysis of CHO-K1 suspension growth in microfluidic single-cell cultivation, where data characteristics are very dissimilar to existing domains and trained deep networks cannot easily be adapted by classical transfer learning. We propose a novel transfer learning scheme which expands a recently introduced Twin-VAE architecture, which is trained on realistic and synthetic data, and we modify its specialized training procedure to the transfer learning domain. In the specific domain, often only few to no labels exist and annotations are costly. We investigate a novel transfer learning strategy, which incorporates a simultaneous retraining on natural and synthetic data using an invariant shared representation as well as suitable target variables, while it learns to handle unseen data from a different microscopy technology. We show the superiority of the variation of our Twin-VAE architecture over the state-of-the-art transfer learning methodology in image processing as well as classical image processing technologies, which persists, even with strongly shortened training times and leads to satisfactory results in this domain. The source code is available at https://github.com/dstallmann/transfer_learning_twinvae, works cross-platform, is open-source and free (MIT licensed) software. We make the data sets available at https://pub.uni-bielefeld.de/record/2960030.

4.
World J Methodol ; 13(5): 390-398, 2023 Dec 20.
Article in English | MEDLINE | ID: mdl-38229943

ABSTRACT

Evidence-based literature reviews play a vital role in contemporary research, facilitating the synthesis of knowledge from multiple sources to inform decision-making and scientific advancements. Within this framework, de-duplication emerges as a part of the process for ensuring the integrity and reliability of evidence extraction. This opinion review delves into the evolution of de-duplication, highlights its importance in evidence synthesis, explores various de-duplication methods, discusses evolving technologies, and proposes best practices. By addressing ethical considerations this paper emphasizes the significance of de-duplication as a cornerstone for quality in evidence-based literature reviews.

5.
Environ Pollut ; 309: 119696, 2022 Sep 15.
Article in English | MEDLINE | ID: mdl-35780997

ABSTRACT

Early life environmental influences such as exposure to cigarette smoke (CS) can disturb molecular processes of lung development and thereby increase the risk for later development of chronic respiratory diseases. Among the latter, asthma and chronic obstructive pulmonary disease (COPD) are the most common. The airway epithelium plays a key role in their disease pathophysiology but how CS exposure in early life influences airway developmental pathways and epithelial stress responses or survival is poorly understood. Using Drosophila melanogaster larvae as a model for early life, we demonstrate that CS enters the entire larval airway system, where it activates cyp18a1, which is homologous to human CYP1A1, to metabolize CS-derived polycyclic aromatic hydrocarbons and further induces heat shock protein 70. RNASeq studies of isolated airways showed that CS dysregulates pathways involved in oxidative stress response, innate immune response, xenobiotic and glutathione metabolic processes as well as developmental processes (BMP, FGF signaling) in both sexes, while other pathways were exclusive to females or males. Glutathione S-transferase genes were further validated by qPCR showing upregulation of gstD4, gstD5 and gstD8 in respiratory tracts of females, while gstD8 was downregulated and gstD5 unchanged in males. ROS levels were increased in airways after CS. Exposure to CS further resulted in higher larval mortality, lower larval-pupal transition, and hatching rates in males only as compared to air-exposed controls. Taken together, early life CS induces airway epithelial stress responses and dysregulates pathways involved in the fly's branching morphogenesis as well as in mammalian lung development. CS further affected fitness and development in a highly sex-specific manner.


Subjects
Chronic Obstructive Pulmonary Disease , Tobacco Smoke Pollution , Animals , Cultured Cells , Drosophila melanogaster , Epithelial Cells/metabolism , Female , Humans , Lung/metabolism , Male , Mammals , Chronic Obstructive Pulmonary Disease/metabolism , Signal Transduction , Nicotiana
6.
IEEE Trans Neural Netw Learn Syst ; 33(6): 2575-2585, 2022 06.
Article in English | MEDLINE | ID: mdl-34255637

ABSTRACT

Differentiable neural computers (DNCs) extend artificial neural networks with an explicit memory without interference, thus enabling the model to perform classic computation tasks, such as graph traversal. However, such models are difficult to train, requiring long training times and large datasets. In this work, we achieve some of the computational capabilities of DNCs with a model that can be trained very efficiently, namely, an echo state network with an explicit memory without interference. This extension enables echo state networks to recognize all regular languages, including those that contractive echo state networks provably cannot recognize. Furthermore, we demonstrate experimentally that our model performs comparably to its fully trained deep version on several typical benchmark tasks for DNCs.


Subjects
Memory , Computer Neural Networks , Computers , Language
7.
Article in English | MEDLINE | ID: mdl-34886409

ABSTRACT

Emerging research suggests environmental exposures before conception may adversely affect allergies and lung diseases in future generations. Most studies are limited as they have focused on single exposures, not considering that these diseases have a multifactorial origin in which environmental and lifestyle factors are likely to interact. Traditional exposure assessment methods fail to capture the interactions among environmental exposures and their impact on fundamental biological processes, as well as individual and temporal factors. A valid estimation of exposure preconception is difficult since the human reproductive cycle spans decades and the access to germ cells is limited. The exposome is defined as the cumulative measure of external exposures on an organism (external exposome), and the associated biological responses (endogenous exposome) throughout the lifespan, from conception and onwards. An exposome approach implies a targeted or agnostic analysis of the concurrent and temporal multiple exposures, and may, together with recent technological advances, improve the assessment of the environmental contributors to health and disease. This review describes the current knowledge on preconception environmental exposures as related to respiratory health outcomes in offspring. We discuss the usefulness and feasibility of using an exposome approach in this research, advocating for the preconception exposure window to become included in the exposome concept.


Subjects
Exposome , Hypersensitivity , Lung Diseases , Environmental Exposure/statistics & numerical data , Humans , Life Style , Lung Diseases/chemically induced
8.
Neural Netw ; 144: 699-725, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34673323

ABSTRACT

Decentralization is a central characteristic of biological motor control that allows for fast responses relying on local sensory information. In contrast, the current trend of Deep Reinforcement Learning (DRL) based approaches to motor control follows a centralized paradigm using a single, holistic controller that has to untangle the whole input information space. This raises the question of whether decentralization as seen in biological control architectures might also be beneficial for embodied sensori-motor control systems when using DRL. To answer this question, we provide an analysis and comparison of eight control architectures for adaptive locomotion that were derived for a four-legged agent, but with their degree of decentralization varying systematically between the extremes of fully centralized and fully decentralized. First, our comparison shows that learning speed is significantly enhanced in distributed architectures, while still reaching the same high performance level as centralized architectures, due to smaller search spaces and local costs providing more focused information for learning. Second, we find an increased robustness of the learning process in the decentralized cases: it is less sensitive to hyperparameter selection and less prone to becoming trapped in poor local minima. Finally, when examining generalization to uneven terrains (not used during training), we find the best performance for an intermediate architecture that is decentralized but integrates only local information from both neighboring legs. Together, these findings demonstrate beneficial effects of distributing control into decentralized units and relying on local information. This appears to be a promising approach towards more robust DRL and better generalization towards adaptive behavior.

9.
Ther Adv Psychopharmacol ; 11: 20451253211015070, 2021.
Article in English | MEDLINE | ID: mdl-34221348

ABSTRACT

OBJECTIVE: Clozapine remains the most effective intervention for treatment resistant schizophrenia; however, its use is prohibited following neutropenias. We review neutrophil biology as applied to clozapine and describe the strategies to initiate clozapine following neutropenia used in a case series of 14 consecutive patients rechallenged in a United Kingdom (UK) high-secure psychiatric hospital. We examine outcomes including the use of seclusion and transfer. METHODS: A case series of 14 male patients with treatment resistant schizophrenia treated with clozapine despite previous episodes of neutropenia between 2006 and 2015 is presented. Data were collected during 2015 and 2019. Using this routinely collected clinical data, we describe the patient characteristics, causes of neutropenia, the strategies used for rechallenging with clozapine and clinical outcomes. RESULTS: Previous neutropenias were the result of benign ethnic neutropenia, clozapine, other medications and autoimmune-related causes. Our risk mitigation strategies included: granulocyte-colony stimulating factor (G-CSF), lithium and watch-and-wait. There were no serious adverse events; at follow-up half of the patients had improved sufficiently to transfer them to conditions of lesser security. There were dramatic reductions in the use of seclusion. CONCLUSION: Even in this extreme group, clozapine can be safely and effectively re/initiated following neutropenias, resulting in marked benefits for patients. This requires careful planning based on an understanding of neutrophil biology and the aetiology of the specific episode of neutropenia.

10.
Int J Obes (Lond) ; 45(7): 1623-1627, 2021 07.
Article in English | MEDLINE | ID: mdl-34002034

ABSTRACT

BACKGROUND: Active smoking has been reported among 7% of teenagers worldwide, with ages ranging from 13 to 15 years. An epidemiological study suggested that preconceptional paternal smoking is associated with adolescent obesity in boys. We developed a murine adolescent smoking model before conception to investigate the paternal molecular causes of changes in offspring's phenotype. METHOD: Male and female C57BL/6J mice were exposed to increasing doses of mainstream cigarette smoke (CS) from onset of puberty for 6 weeks and mated with room air (RA) controls. RESULTS: Thirteen miRNAs were upregulated and 32 downregulated in the spermatozoa of CS-exposed fathers, while there were no significant differences in the count and morphological integrity of spermatozoa, as well as the proliferation of spermatogonia between CS- and RA-exposed fathers. Offspring from preconceptional CS-exposed mothers had lower body weights (p = 0.007). Moreover, data from offspring from CS-exposed fathers suggested a potential increase in body weight (p = 0.062). CONCLUSION: We showed that preconceptional paternal CS exposure regulates spermatozoal miRNAs, and possibly influences the body weight of F1 progeny in early life. The regulated miRNAs may modulate transmittable epigenetic changes to offspring and thus influence the development of respiratory- and metabolic-related diseases such as obesity, a mechanism that warrants further study.


Subjects
Body Weight/drug effects , MicroRNAs/genetics , Paternal Exposure , Spermatozoa/chemistry , Tobacco Smoking/adverse effects , Animals , Genetic Epigenesis/genetics , Female , Male , Mice , Pregnancy , Transcriptome/genetics
11.
Bioinformatics ; 37(20): 3632-3639, 2021 Oct 25.
Article in English | MEDLINE | ID: mdl-34019074

ABSTRACT

MOTIVATION: Innovative microfluidic systems carry the promise to greatly facilitate spatio-temporal analysis of single cells under well-defined environmental conditions, allowing novel insights into population heterogeneity and opening new opportunities for fundamental and applied biotechnology. Microfluidics experiments, however, are accompanied by vast amounts of data, such as time series of microscopic images, for which manual evaluation is infeasible due to the sheer number of samples. While classical image processing technologies do not lead to satisfactory results in this domain, modern deep-learning technologies, such as convolutional networks can be sufficiently versatile for diverse tasks, including automatic cell counting as well as the extraction of critical parameters, such as growth rate. However, for successful training, current supervised deep learning requires label information, such as the number or positions of cells for each image in a series; obtaining these annotations is very costly in this setting. RESULTS: We propose a novel machine-learning architecture together with a specialized training procedure, which allows us to infuse a deep neural network with human-powered abstraction on the level of data, leading to a high-performing regression model that requires only a very small amount of labeled data. Specifically, we train a generative model simultaneously on natural and synthetic data, so that it learns a shared representation, from which a target variable, such as the cell count, can be reliably estimated. AVAILABILITY AND IMPLEMENTATION: The project is cross-platform, open-source and free (MIT licensed) software. We make the source code available at https://github.com/dstallmann/cell_cultivation_analysis; the dataset is available at https://pub.uni-bielefeld.de/record/2945513.

12.
Sensors (Basel) ; 21(6)2021 Mar 17.
Article in English | MEDLINE | ID: mdl-33803030

ABSTRACT

Reliable object tracking that is based on video data constitutes an important challenge in diverse areas, including, among others, assisted surgery. Particle filtering offers a state-of-the-art technology for this challenge. Because a particle filter is based on a probabilistic model, it provides explicit likelihood values; in theory, the question of whether an object is reliably tracked can be addressed based on these values, provided that the estimates are correct. In this contribution, we investigate the question of whether these likelihood values are suitable for deciding whether the tracked object has been lost. An immediate strategy uses a simple threshold value to reject settings with a likelihood that is too small. We show in an application from the medical domain (object tracking in assisted surgery in the domain of Robotic Osteotomies) that this simple threshold strategy does not provide a reliable reject option for object tracking, in particular if different settings are considered. However, it is possible to develop reliable and flexible machine learning models that predict a reject based on diverse quantities that are computed by the particle filter. Modeling the task in the form of a regression enables a flexible handling of different demands on the tracking accuracy; modeling the challenge as an ensemble of classification tasks further improves on these results while offering the same flexibility.


Subjects
Algorithms
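
To make the "simple threshold" reject strategy from the abstract above concrete, here is a minimal sketch of such a baseline, together with one example of a quantity a particle filter can expose to a learned reject model. The function names are hypothetical, and the effective sample size is an assumed example of such a quantity; the abstract does not state which quantities the authors used.

```python
def threshold_reject(likelihoods, min_likelihood):
    """Baseline reject rule: report the track as lost if even the best
    particle likelihood falls below a fixed threshold."""
    return max(likelihoods) < min_likelihood


def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2) over normalized weights.

    Equals the particle count for uniform weights and approaches 1 when a
    single particle dominates; it could serve as one input feature for a
    learned reject model (an assumption, not the authors' feature set).
    """
    total = sum(weights)
    normalized = [w / total for w in weights]
    return 1.0 / sum(w * w for w in normalized)
```

A fixed `min_likelihood` has to be retuned whenever the scene or sensor setting changes, which is consistent with the abstract's finding that the threshold alone is not a reliable reject option across settings.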
14.
Front Med (Lausanne) ; 7: 571003, 2020.
Article in English | MEDLINE | ID: mdl-33240904

ABSTRACT

Smokers with apparently "healthy" lungs suffer from more severe and frequent viral respiratory infections, but the mechanisms underlying this observation are still unclear. Epithelial cells and dendritic cells (DC) form the first line of defense against inhaled noxious agents such as smoke or viruses. We therefore aimed to obtain insight into how cigarette smoke affects DCs and epithelial cells and how this influences the response to viral infection. Female C57BL/6J mice were exposed to cigarette smoke (CS) for 1 h daily for 24 days and then challenged i.n. with the viral mimic and Toll-like receptor 3 (TLR3) ligand poly (I:C) after the last exposure. DC subpopulations were analyzed 24 h later in whole lung homogenates by flow cytometry. Calu-3 cells or human precision-cut lung slices (PCLS) cultured at air-liquid interface were exposed to CS or air and subsequently inoculated with influenza H1N1. At 48 h post infection cytokines were analyzed by multiplex technology. Cytotoxic effects were measured by release of lactate dehydrogenase (LDH) and confocal imaging. In Calu-3 cells the trans-epithelial electrical resistance (TEER) was assessed. Smoke exposure of mice increased numbers of inflammatory and plasmacytoid DCs in lung tissue. Additional poly (I:C) challenge further increased the population of inflammatory DCs and conventional DCs, especially CD11b+ cDCs. Smoke exposure led to a loss of the barrier function in Calu-3 cells, which was further exaggerated by additional influenza H1N1 infection. Influenza H1N1-induced secretion of antiviral cytokines (IFN-α2a, IFN-λ, interferon-γ-induced protein 10 [IP-10]), pro-inflammatory cytokine IL-6, as well as T cell-associated cytokines (e.g., I-TAC) was completely suppressed in both Calu-3 cells and human PCLS after smoke exposure. In summary, cigarette smoke exposure increased the number of inflammatory DCs in the lung and disrupted epithelial barrier functions, both of which were further enhanced by viral stimulation. Additionally, the antiviral immune response to influenza H1N1 was strongly suppressed by smoke. These data suggest that smoke impairs protective innate mechanisms in the lung, which could be responsible for the increased susceptibility to viral infections in "healthy" smokers.

15.
BMC Psychiatry ; 20(1): 279, 2020 06 05.
Article in English | MEDLINE | ID: mdl-32503471

ABSTRACT

Clozapine remains the only drug treatment likely to benefit patients with treatment resistant schizophrenia. Its use is complicated by an increased risk of neutropenia and so there are stringent monitoring requirements and restrictions in those with previous neutropenia from any cause or from clozapine in particular. Despite these difficulties clozapine may yet be used following neutropenia, albeit with caution. Having had involvement with 14 cases of clozapine use in these circumstances we set out our approach to the assessment of risks and benefits, risk mitigation and monitoring with a practical guide.


Subjects
Antipsychotic Agents/pharmacology , Clozapine/adverse effects , Clozapine/pharmacology , Neutropenia/chemically induced , Schizophrenia/drug therapy , Antipsychotic Agents/administration & dosage , Antipsychotic Agents/adverse effects , Clozapine/administration & dosage , Humans
16.
IEEE Trans Neural Syst Rehabil Eng ; 27(5): 956-962, 2019 05.
Article in English | MEDLINE | ID: mdl-30908234

ABSTRACT

Research on machine learning approaches for upper-limb prosthesis control has shown impressive progress. However, translating these results from the lab to patients' everyday lives remains a challenge because advanced control schemes tend to break down under everyday disturbances, such as electrode shifts. Recently, it has been suggested to apply adaptive transfer learning to counteract electrode shifts using as little newly recorded training data as possible. In this paper, we present a novel, simple version of transfer learning and provide the first user study demonstrating the effectiveness of transfer learning to counteract electrode shifts. For this purpose, we introduce the novel Box and Beans test to evaluate prosthesis proficiency and compare user performance with an initial simple pattern recognition system, the system under electrode shifts, and the system after transfer learning. Our results show that transfer learning could significantly alleviate the impact of electrode shifts on user performance in the Box and Beans test.


Subjects
Artificial Limbs , Electrodes , Electromyography/instrumentation , Machine Learning , Algorithms , Amputees , Humans , Patient Satisfaction , Automated Pattern Recognition , Prosthesis Design , Computer-Assisted Signal Processing , Transfer (Psychology) , Upper Extremity
17.
Clin Exp Allergy ; 48(11): 1378-1390, 2018 11.
Article in English | MEDLINE | ID: mdl-30244507

ABSTRACT

BACKGROUND: The prevalence of asthma and chronic obstructive pulmonary disease (COPD) has risen markedly over the last decades and is reaching epidemic proportions. However, underlying molecular mechanisms are not fully understood, hampering the urgently needed development of approaches to prevent these diseases. It is well established from epidemiological studies that prenatal exposure to cigarette smoke is one of the main risk factors for aberrant lung function development or reduced fetal growth, but also for the development of asthma and possibly COPD later in life. Of note, recent evidence suggests that the disease risk can be transferred across generations, that is, from grandparents to their grandchildren. While initial studies in mouse models on in utero smoke exposure have provided important mechanistic insights, there are still knowledge gaps that need to be filled. OBJECTIVE: Thus, in this review, we summarize current knowledge on this topic derived from mouse models, while also introducing two other relevant animal models: the fruit fly Drosophila melanogaster and the zebrafish Danio rerio. METHODS: This review is based on an intensive review of PubMed-listed transgenerational animal studies from 1902 to 2018 and focuses in detail on selected literature due to space limitations. RESULTS: This review gives a comprehensive overview of mechanistic insights obtained in studies with the three species, while highlighting the remaining knowledge gaps. We will further discuss potential (dis)advantages of all three animal models. CONCLUSION/CLINICAL RELEVANCE: Many studies have already addressed transgenerational inheritance of disease risk in mouse, zebrafish or fly models. We here propose a novel strategy for how these three model organisms can be synergistically combined to achieve a more detailed understanding of in utero cigarette smoke-induced transgenerational inheritance of disease risk.


Subjects
Asthma/etiology , Cross Reactions/immunology , Maternal Exposure/adverse effects , Prenatal Exposure Delayed Effects , Smoking/adverse effects , Allergens/immunology , Animals , Asthma/epidemiology , Animal Disease Models , Female , Humans , Phenotype , Pregnancy
18.
Bioinformatics ; 34(13): 2245-2253, 2018 07 01.
Article in English | MEDLINE | ID: mdl-29462241

ABSTRACT

Motivation: Identification of cell populations in flow cytometry is a critical part of the analysis and lays the groundwork for many applications and research discovery. The current paradigm of manual analysis is time consuming and subjective. A common goal of users is to replace manual analysis with automated methods that replicate their results. Supervised tools provide the best performance in such a use case; however, they require fine parameterization to obtain the best results. Hence, there is a strong need for methods that are fast to set up, accurate and interpretable. Results: flowLearn is a semi-supervised approach for the quality-checked identification of cell populations. Using a very small number of manually gated samples, through density alignments it is able to predict gates on other samples with high accuracy and speed. On two state-of-the-art datasets, our tool achieves median(F1)-measures exceeding 0.99 for 31%, and 0.90 for 80% of all analyzed populations. Furthermore, users can directly interpret and adjust automated gates on new sample files to iteratively improve the initial training. Availability and implementation: FlowLearn is available as an R package on https://github.com/mlux86/flowLearn. Evaluation data is publicly available online. Details can be found in the Supplementary Material. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology/methods , Flow Cytometry/methods , Software
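
The median(F1) figures quoted in the flowLearn abstract above can be read as follows: for each cell population, an F1 score compares the predicted gate membership with the manual gate on every sample, and the median is taken over samples. The sketch below illustrates that computation under this reading; it is written in Python rather than R, the function names are hypothetical, and it is not the package's evaluation code.

```python
from statistics import median


def f1_score(predicted, truth):
    """F1 = 2*TP / (2*TP + FP + FN) for boolean per-cell gate membership."""
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)
    denom = 2 * tp + fp + fn
    # Convention assumed here: an empty population predicted as empty counts as perfect.
    return 2 * tp / denom if denom else 1.0


def median_f1(samples):
    """Median F1 over (predicted, truth) pairs, one pair per sample."""
    return median(f1_score(pred, truth) for pred, truth in samples)
```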
19.
Entropy (Basel) ; 20(10)2018 Oct 10.
Article in English | MEDLINE | ID: mdl-33265863

ABSTRACT

We introduce a modeling framework for the investigation of on-line machine learning processes in non-stationary environments. We exemplify the approach in terms of two specific model situations: In the first, we consider the learning of a classification scheme from clustered data by means of prototype-based Learning Vector Quantization (LVQ). In the second, we study the training of layered neural networks with sigmoidal activations for the purpose of regression. In both cases, the target, i.e., the classification or regression scheme, is considered to change continuously while the system is trained from a stream of labeled data. We extend and apply methods borrowed from statistical physics which have been used frequently for the exact description of training dynamics in stationary environments. Extensions of the approach allow for the computation of typical learning curves in the presence of concept drift in a variety of model situations. First results are presented and discussed for stochastic drift processes in classification and regression problems. They indicate that LVQ is capable of tracking a classification scheme under drift to a non-trivial extent. Furthermore, we show that concept drift can cause the persistence of sub-optimal plateau states in gradient based training of layered neural networks for regression.
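
For readers unfamiliar with prototype-based Learning Vector Quantization, a single on-line update in the basic LVQ1 variant can be sketched as below. The abstract does not specify which LVQ variant or learning rate the authors analyze, so the choice of LVQ1, the function name, and the parameter `lr` are assumptions for illustration.

```python
def lvq1_step(prototypes, labels, x, y, lr=0.1):
    """One on-line LVQ1 update on a labeled example (x, y).

    The prototype closest to x (squared Euclidean distance) is moved toward x
    if its label matches y and away from x otherwise. Prototypes are updated
    in place; the index of the winning prototype is returned.
    """
    dists = [sum((p_i - x_i) ** 2 for p_i, x_i in zip(p, x)) for p in prototypes]
    winner = dists.index(min(dists))
    sign = 1.0 if labels[winner] == y else -1.0
    prototypes[winner] = [
        p_i + sign * lr * (x_i - p_i) for p_i, x_i in zip(prototypes[winner], x)
    ]
    return winner
```

Because each update uses only the current example, the prototypes can keep following a target that changes while the stream is processed, which is the drift setting the abstract studies.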

20.
BMC Bioinformatics ; 17(1): 543, 2016 Dec 20.
Article in English | MEDLINE | ID: mdl-27998267

ABSTRACT

BACKGROUND: A major obstacle in single-cell sequencing is sample contamination with foreign DNA. To guarantee clean genome assemblies and to prevent the introduction of contamination into public databases, considerable quality control efforts are put into post-sequencing analysis. Contamination screening generally relies on reference-based methods such as database alignment or marker gene search, which limits the set of detectable contaminants to organisms with closely related reference species. As genomic coverage in the tree of life is highly fragmented, there is an urgent need for a reference-free methodology for contaminant identification in sequence data. RESULTS: We present acdc, a tool specifically developed to aid the quality control process of genomic sequence data. By combining supervised and unsupervised methods, it reliably detects both known and de novo contaminants. First, 16S rRNA gene prediction and the inclusion of ultrafast exact alignment techniques allow sequence classification using existing knowledge from databases. Second, reference-free inspection is enabled by the use of state-of-the-art machine learning techniques that include fast, non-linear dimensionality reduction of oligonucleotide signatures and subsequent clustering algorithms that automatically estimate the number of clusters. The latter also enables the removal of any contaminant, yielding a clean sample. Furthermore, given the data complexity and the ill-posedness of clustering, acdc employs bootstrapping techniques to provide statistically profound confidence values. Tested on a large number of samples from diverse sequencing projects, our software is able to quickly and accurately identify contamination. Results are displayed in an interactive user interface. Acdc can be run from the web as well as from a dedicated command-line application, which allows easy integration into large sequencing project analysis workflows. CONCLUSIONS: Acdc can reliably detect contamination in single-cell genome data. In addition to database-driven detection, it complements existing tools by its unsupervised techniques, which allow for the detection of de novo contaminants. Our contribution has the potential to drastically reduce the amount of resources put into these processes, particularly in the context of limited availability of reference species. As single-cell genome data continues to grow rapidly, acdc adds to the toolkit of crucial quality assurance tools.


Subjects
DNA Contamination , Genome , Machine Learning , DNA Sequence Analysis/methods , Single-Cell Analysis/methods , Cluster Analysis , DNA/analysis , DNA/genetics , Quality Control
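
The "oligonucleotide signatures" mentioned in the acdc abstract above are, in general terms, vectors of relative k-mer frequencies computed per sequence. The sketch below shows one plausible way to compute such a vector; the function name, the default `k=4`, and the handling of ambiguous bases are assumptions and do not reproduce acdc's actual implementation. The subsequent dimensionality reduction and clustering steps are not shown.

```python
from collections import Counter
from itertools import product


def kmer_signature(sequence, k=4):
    """Relative frequency of every DNA k-mer, in fixed lexicographic order.

    Returns a vector of length 4**k. K-mers containing characters other than
    A, C, G, T (for example N) are ignored.
    """
    seq = sequence.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]
```

Because the vector depends only on the sequence itself, contigs from different organisms can be compared without any reference database, which is the property the reference-free part of the abstract relies on.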