Results 1 - 9 of 9
1.
J Bioinform Comput Biol ; 20(3): 2250011, 2022 06.
Article in English | MEDLINE | ID: mdl-35802463

ABSTRACT

Karyotyping is a genetic test used to detect chromosomal defects. In a karyotype test, an image of the chromosomes is captured during cell division, and cytogeneticists then analyze the captured images to detect possible chromosomal defects. In this paper, we propose an automated pipeline for the analysis of karyotype images. Karyotype image analysis has three main steps: image enhancement, image segmentation, and chromosome classification. We propose a novel chromosome segmentation algorithm that decomposes overlapping chromosomes. We also propose a CNN-based classifier that outperforms all existing classifiers. Our classifier is trained on a dataset of about 162,000 human chromosome images. We also introduce a novel post-processing algorithm that improves the classification results. The success rate of our segmentation algorithm is 95%. In addition, our experimental results show that our classifier achieves 92.63% accuracy on human chromosomes and that our novel post-processing algorithm increases the classification accuracy to 94%.


Subjects
Algorithms; Chromosomes, Human; Humans; Image Processing, Computer-Assisted/methods; Karyotype; Karyotyping
2.
Sci Rep ; 10(1): 9148, 2020 06 04.
Article in English | MEDLINE | ID: mdl-32499577

ABSTRACT

The study of salt-tolerance mechanisms in halophytes can provide valuable information for crop breeding and plant engineering programs. The aim of the present study was to perform a whole-transcriptome analysis of Aeluropus littoralis in response to salinity stress (200 and 400 mM NaCl) by de novo RNA sequencing. To assemble the transcriptome, Trinity v2.4.0 and Bridger were compared, each with two k-mer sizes (25 and 32 bp). The transcriptome assembled de novo by Bridger (k-mer 32) was chosen as the final assembly for subsequent analysis. In total, 103,290 transcripts were obtained. Differential expression analysis (log2FC > 1 and FDR < 0.01) showed that 1,861 transcripts were differentially expressed, including 169 up- and 316 down-regulated transcripts under the 200 mM NaCl treatment and 1,035 up- and 430 down-regulated transcripts under the 400 mM NaCl treatment, compared with the control. In addition, 89 transcripts were common to both treatments. The most important over-represented terms in the GO analysis of differentially expressed genes (FDR < 0.05) were chitin response, response to abscisic acid, and regulation of the jasmonic acid-mediated signaling pathway under the 400 mM NaCl treatment, and cell cycle, cell division, and mitotic cell cycle process under the 200 mM treatment. The phosphatidylcholine biosynthetic process term was common to both salt treatments. Interestingly, under the 400 mM salt treatment, the PRC1 complex, which contributes to chromatin remodeling, was also enriched, along with the vacuole as a general salinity-stress-responsive cell component. Among enriched pathways, the MAPK signaling pathway (ko04016) and phytohormone signal transduction (ko04075) were significantly enriched under the 400 mM NaCl treatment, whereas DNA replication (ko03032) was the only pathway significantly enriched under the 200 mM NaCl treatment. Finally, our findings indicate salt-concentration-dependent responses in A. littoralis: well-known salinity-stress-related pathways are induced at 400 mM NaCl, while less-studied pathways, such as cell cycle and DNA replication, are highlighted under the 200 mM NaCl treatment.
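The differential-expression filter described above (log2FC > 1 and FDR < 0.01) can be illustrated with a small sketch. It assumes that the FDR is a Benjamini-Hochberg adjusted p-value and that the fold-change cutoff applies to the absolute value, so down-regulated transcripts (log2FC < -1) also pass; the abstract names neither the adjustment method nor the tool used. The Transcript record and the demo values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Transcript:
    # Hypothetical record: transcript ID, log2 fold change versus the
    # control, and the raw (unadjusted) p-value from some DE test.
    name: str
    log2fc: float
    pvalue: float


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values are monotone in the raw p-values and capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def split_de(transcripts: List[Transcript], lfc_cutoff: float = 1.0,
             fdr_cutoff: float = 0.01) -> Tuple[List[str], List[str]]:
    """Return (up, down) transcript names passing both cutoffs."""
    fdr = benjamini_hochberg([t.pvalue for t in transcripts])
    up, down = [], []
    for t, q in zip(transcripts, fdr):
        if q >= fdr_cutoff:
            continue  # not significant after multiple-testing correction
        if t.log2fc > lfc_cutoff:
            up.append(t.name)
        elif t.log2fc < -lfc_cutoff:
            down.append(t.name)
    return up, down


if __name__ == "__main__":
    # Made-up values, for illustration only.
    demo = [
        Transcript("t1", 2.5, 1e-6),
        Transcript("t2", -1.8, 2e-5),
        Transcript("t3", 0.4, 1e-8),
        Transcript("t4", 3.0, 0.2),
    ]
    up, down = split_de(demo)
    print("up:", up, "down:", down)  # up: ['t1'] down: ['t2']
```

Applying the split separately to each treatment-versus-control comparison and intersecting the resulting sets is one way to obtain counts of the kind reported above, such as the transcripts shared by both treatments.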


Subjects
Poaceae/genetics; RNA, Plant/metabolism; Salt Stress/physiology; Plant Extracts/metabolism; Plant Growth Regulators/metabolism; Plant Proteins/genetics; Plant Proteins/metabolism; Principal Component Analysis; RNA, Plant/chemistry; Salt-Tolerant Plants/genetics; Sequence Analysis, RNA; Signal Transduction/drug effects; Sodium Chloride/pharmacology; Transcriptome
3.
IEEE Trans Pattern Anal Mach Intell ; 42(8): 1928-1941, 2020 Aug.
Article in English | MEDLINE | ID: mdl-30908258

ABSTRACT

Finding algorithms that achieve optimal performance when learning statistical models in distributed, communication-limited systems is of fundamental importance. To characterize the optimal strategies, we consider learning Gaussian processes (GPs) in distributed systems as a pivotal example. We first address a very basic problem: how many bits are required to estimate the inner products of Gaussian vectors held on distributed machines? Using information-theoretic bounds, we obtain an optimal solution to this problem based on vector quantization. Two suboptimal but more practical schemes are also presented as substitutes for the vector quantization scheme. In particular, we show that the performance of one of these practical schemes, called per-symbol quantization, is very close to optimal. The schemes for inner-product calculation are incorporated into our proposed distributed learning methods for GPs. Experimental results show that, spending only a few bits per symbol in our communication scheme, our proposed methods outperform previous zero-rate distributed GP learning schemes such as the Bayesian Committee Machine (BCM) and Product of Experts (PoE).
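To make the idea of per-symbol quantization concrete, the sketch below has each machine quantize every coordinate of its Gaussian vector independently with a b-bit uniform scalar quantizer and then computes the inner product of the quantized vectors. The abstract does not specify the quantizer, so the uniform cells, the clipping range of [-3, 3], and the midpoint reconstruction are all assumptions; this is one simple instance of per-symbol quantization, not the paper's scheme.

```python
import random
from typing import List


def quantize(x: float, bits: int, clip: float = 3.0) -> float:
    """Uniform scalar quantizer with 2**bits cells on [-clip, clip].

    Values outside the range are clipped first; each value is then replaced
    by the midpoint of its cell, so only the cell index (bits bits) is sent.
    """
    levels = 2 ** bits
    step = 2 * clip / levels
    x = max(-clip, min(clip, x))
    index = min(int((x + clip) / step), levels - 1)
    return -clip + (index + 0.5) * step


def quantized_inner_product(u: List[float], v: List[float], bits: int) -> float:
    """Estimate <u, v> when each machine sends its vector at `bits` bits/symbol."""
    return sum(quantize(a, bits) * quantize(b, bits) for a, b in zip(u, v))


if __name__ == "__main__":
    rng = random.Random(0)
    n = 1000
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # v is standard normal with correlation 0.6 to u.
    v = [0.6 * a + 0.8 * rng.gauss(0.0, 1.0) for a in u]
    exact = sum(a * b for a, b in zip(u, v))
    for bits in (1, 2, 4, 8):
        estimate = quantized_inner_product(u, v, bits)
        print(f"{bits} bits/symbol: exact={exact:.2f} estimate={estimate:.2f}")
```

This naive estimator is noticeably biased at very low rates (at 1 bit it sees only signs); the abstract's claim is that a well-designed per-symbol scheme comes close to the vector-quantization optimum, which this sketch does not attempt to reproduce.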

4.
Sci Rep ; 9(1): 2342, 2019 02 20.
Article in English | MEDLINE | ID: mdl-30787315

ABSTRACT

Understanding cell identity is an important task in many biomedical areas. Expression patterns of specific marker genes have been used to characterize a limited number of cell types, but exclusive markers are not available for many cell types. A second approach is to use machine learning to discriminate cell types based on whole gene expression profiles (GEPs). The accuracy of simple classification algorithms, such as linear discriminators or support vector machines, is limited by the complexity of biological systems. We used deep neural networks to analyze 1,040 GEPs from 16 different human tissues and cell types. After comparing different architectures, we identified a specific deep autoencoder structure that can encode a GEP into a vector of 30 numeric values, which we call the cell identity code (CIC). The original GEP can be reproduced from the CIC with an accuracy comparable to that of technical replicates of the same experiment. Although we train the autoencoder in an unsupervised manner, we show that different values of the CIC are connected to different biological aspects of the cell, such as different pathways or biological processes. The network can use the CIC to reproduce the GEPs of cell types it never saw during training. It can also tolerate some noise in GEP measurements. Furthermore, we introduce the classifier autoencoder, an architecture that can accurately identify cell type based on the GEP or the CIC.


Subjects
Cells/metabolism; Deep Learning; Gene Expression Profiling; Neural Networks, Computer; Algorithms; Cell Compartmentation; Humans; Organ Specificity/genetics
5.
Comput Biol Med ; 106: 106-113, 2019 03.
Article in English | MEDLINE | ID: mdl-30708219

ABSTRACT

BACKGROUND: Nutrigenomics has revolutionized our understanding of nutrition. As plants make up a noticeable part of our diet, in the present study we chose microRNAs of edible plants and investigated whether they perfectly match human genes, which would indicate potential regulatory functions. METHODS: miRNAs were obtained from the PNRD database. Edible plants were separated, and microRNAs shared by at least four of them were included in our analysis. Using vmatchPattern, these 64 miRNAs went through four refinement steps to improve target prediction: alignment to the whole genome (2,581 results), filtering for those in gene regions (1,371 results), filtering for exon regions (66 results), and finally alignment to the human CDS (41 results). The identified genes were further analyzed in silico to determine their functions and relations to human diseases. RESULTS: Four common plant miRNAs were found to match 22 human transcripts perfectly. The identified target genes are involved in a broad range of body functions, from muscle contraction to tumor suppression. We could also point to some connections between these findings and folk herbology and botanical medicine. CONCLUSIONS: The food we regularly eat has great potential to affect our genome and alter body functions. Plant miRNAs can provide a means of designing drugs for a vast range of health problems, including obesity and cancer, since they target genes involved in the cell cycle (CCNC), digestion (GIPR), and muscle contraction (MYLK). They can also target CDS regions for which we still lack sufficient information, helping to expand our knowledge of the human genome.
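The first three refinement steps (whole-genome exact matches, then matches inside gene regions, then matches inside exons) can be sketched as below. vmatchPattern is a function of the Bioconductor Biostrings package in R; this Python sketch only mimics its exact-match behaviour (zero mismatches) using str.find, and stops after the exon filter, so the final CDS alignment step is not shown. Converting U to T, searching both strands, and the helper names are assumptions made for illustration, and the test sequences and coordinates are toy data.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), 0-based coordinates


def to_dna(rna: str) -> str:
    """Convert an RNA sequence (with U) to its DNA form (with T)."""
    return rna.upper().replace("U", "T")


def reverse_complement(dna: str) -> str:
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(dna))


def exact_hits(pattern: str, sequence: str) -> List[Interval]:
    """All exact (possibly overlapping) occurrences on either strand."""
    hits = []
    for p in {pattern, reverse_complement(pattern)}:
        start = sequence.find(p)
        while start != -1:
            hits.append((start, start + len(p)))
            start = sequence.find(p, start + 1)
    return sorted(set(hits))


def inside_any(hit: Interval, regions: List[Interval]) -> bool:
    """True if the hit lies entirely within one of the regions."""
    return any(s <= hit[0] and hit[1] <= e for s, e in regions)


def refine(mirnas: Dict[str, str], genome: str, genes: List[Interval],
           exons: List[Interval]) -> Dict[str, Tuple[int, int, int]]:
    """Count hits per miRNA: whole genome -> within genes -> within exons."""
    result = {}
    for name, rna in mirnas.items():
        hits = exact_hits(to_dna(rna), genome)
        in_genes = [h for h in hits if inside_any(h, genes)]
        in_exons = [h for h in in_genes if inside_any(h, exons)]
        result[name] = (len(hits), len(in_genes), len(in_exons))
    return result


if __name__ == "__main__":
    # Toy genome and annotation, for illustration only.
    print(refine({"m1": "AAC"}, "GTTAAC", genes=[(2, 6)], exons=[(3, 6)]))
```

Each filter only ever shrinks the hit set, which matches the monotonically decreasing counts reported above (2,581, then 1,371, then 66).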


Subjects
Computer Simulation; Databases, Nucleic Acid; Food; Genome, Human; Nutrigenomics; RNA, Plant; Sequence Analysis, RNA; Humans; RNA, Plant/genetics; RNA, Plant/metabolism
6.
BMC Bioinformatics ; 20(1): 51, 2019 Jan 24.
Article in English | MEDLINE | ID: mdl-30678641

ABSTRACT

BACKGROUND: Long reads provide valuable information about the sequence composition of genomes. Long reads are usually very noisy, which makes aligning them to the reference genome a daunting task. Processing a dataset large enough to sequence a human genome may take days on a single node. Hence, it is of primary importance to have an aligner that can operate on distributed clusters of computers with high accuracy and speed. RESULTS: In this paper, we present IMOS, an aligner for mapping noisy long reads to the reference genome. It can be used on a single node as well as on distributed nodes. In its single-node mode, IMOS is an Improved version of Meta-aligner (IM) that enhances both its accuracy and speed; IM is up to 6x faster than the original Meta-aligner. IMOS is also implemented to run IM and Minimap2 on Apache Spark for deployment on a cluster of nodes. Moreover, multi-node IMOS is faster than SparkBWA when executing both IM (1.5x) and Minimap2 (25x). CONCLUSION: In this paper, we proposed an architecture for mapping long reads to a reference. Owing to its implementation, IMOS's speed can increase almost linearly with the number of nodes in a cluster. It is also a multi-platform application that runs on Linux, Windows, and macOS.


Subjects
Computational Biology; Databases, Genetic; Sequence Alignment; Algorithms; Chromosome Mapping; Databases, Factual; Genome, Human; Genomics; Humans; Sequence Analysis, DNA; Software; Workflow
7.
Article in English | MEDLINE | ID: mdl-29990264

ABSTRACT

Association mapping of genetic diseases has attracted extensive research interest in recent years. However, most of the methodologies introduced so far suffer from spurious inference of associated sites due to population inhomogeneities. In this paper, we introduce a statistical framework that compensates for this shortcoming by equipping current methodologies with a state-of-the-art clustering algorithm widely used in population genetics applications. The proposed framework jointly infers the disease-associated factors and the hidden population structure. To this end, a Markov chain Monte Carlo (MCMC) procedure is employed to assess the posterior probability distribution of the model parameters. We have implemented the proposed framework in a software package whose performance is extensively evaluated on a number of synthetic datasets and compared with well-known existing methods such as STRUCTURE. In extreme scenarios, an improvement of up to 10-15 percent in inference accuracy is achieved with a moderate increase in computational complexity.
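The abstract names MCMC but not the sampler or the full model. As a generic illustration of how MCMC assesses a posterior distribution, the sketch below runs a random-walk Metropolis-Hastings sampler on a toy one-parameter model (an allele frequency with a uniform prior and a binomial likelihood). This model, the step size, and the burn-in length are assumptions chosen because the exact posterior, Beta(k + 1, n - k + 1), is known and can be used as a check; it is not the paper's joint association and population-structure model.

```python
import math
import random
from typing import List


def log_posterior(p: float, k: int, n: int) -> float:
    """Unnormalized log posterior of allele frequency p (toy model).

    Uniform prior on (0, 1); k copies of the allele observed among n
    sampled chromosomes (binomial likelihood).
    """
    if not 0.0 < p < 1.0:
        return -math.inf
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def metropolis_hastings(k: int, n: int, steps: int = 20000,
                        step_size: float = 0.05, seed: int = 1) -> List[float]:
    rng = random.Random(seed)
    p = 0.5
    current = log_posterior(p, k, n)
    samples = []
    for _ in range(steps):
        proposal = p + rng.gauss(0.0, step_size)  # symmetric random walk
        candidate = log_posterior(proposal, k, n)
        # Accept with probability min(1, posterior ratio). The proposal is
        # symmetric, so no Hastings correction term is needed.
        if math.log(1.0 - rng.random()) < candidate - current:
            p, current = proposal, candidate
        samples.append(p)
    return samples


if __name__ == "__main__":
    draws = metropolis_hastings(k=30, n=100)[2000:]  # discard burn-in
    # The exact posterior is Beta(31, 71), whose mean is 31/102.
    print(sum(draws) / len(draws), 31 / 102)
```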


Subjects
Computational Biology/methods; Genetics, Population/methods; Genome-Wide Association Study/methods; Models, Statistical; Algorithms; Cluster Analysis; Humans; Markov Chains; Models, Genetic; Monte Carlo Method
8.
BMC Bioinformatics ; 18(1): 126, 2017 Feb 23.
Article in English | MEDLINE | ID: mdl-28231760

ABSTRACT

BACKGROUND: Sequencing technologies are developing towards generating longer and noisier reads. Accurate alignment of these reads plays an important role in any downstream analysis. Likewise, the overall cost of sequencing is tied to the running time of the aligner. The trade-off between accuracy and speed is the main challenge in designing long-read aligners. RESULTS: We propose Meta-aligner, which aligns long and very long reads to the reference genome efficiently and accurately. Meta-aligner incorporates available short- and long-read aligners as subcomponents and uses statistics from the reference genome to increase performance. Meta-aligner estimates statistics from the reads and the reference genome automatically. Meta-aligner is implemented in C++ and runs on popular POSIX-like operating systems such as Linux. CONCLUSIONS: Meta-aligner achieves high recall and precision, especially for long reads and high error rates. It also improves alignment performance for PacBio long reads in comparison with traditional schemes.


Subjects
Algorithms; Genome, Human; Sequence Alignment/methods; DNA/chemistry; DNA/metabolism; DNA Copy Number Variations; High-Throughput Nucleotide Sequencing; Humans; Sequence Analysis, DNA; Software
9.
PLoS One ; 11(11): e0164888, 2016.
Article in English | MEDLINE | ID: mdl-27806058

ABSTRACT

Lander-Waterman's coverage bound establishes the total number of reads required to cover a whole genome of G bases. In fact, their bound is a direct consequence of the well-known solution to the coupon collector's problem, which proves that for such a genome the total number of bases to be sequenced should be O(G ln G). Although the result leads to a tight bound, it rests on the tacit assumption that the set of reads is first collected through a sequencing process and then processed through a computation process, i.e., that there are two different machines: one for sequencing and one for processing. In this paper, we present a significant improvement over Lander-Waterman's result and prove that by combining the sequencing and computing processes, one can re-sequence the whole genome with as few as O(G) sequenced bases in total. Our approach also dramatically reduces the computational power required for the combined process. Simulations were performed on real genomes with different sequencing error rates. The results support our theory, which predicts a log G improvement in the coverage bound and a corresponding reduction in the total number of bases that must be sequenced.
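The coupon-collector argument behind the G ln G term can be checked numerically. This sketch assumes the idealized model in which each draw reveals one uniformly random base position (reads of length 1). The expected number of draws needed to see all G positions is then G times the harmonic number H_G, which is approximately G ln G. The Lander-Waterman setting with longer reads and the paper's combined sequencing and computing scheme are not modeled here.

```python
import math
import random


def expected_draws(g: int) -> float:
    """Exact expected number of draws to see all g coupons: g * H_g."""
    return g * sum(1.0 / i for i in range(1, g + 1))


def simulate_draws(g: int, rng: random.Random) -> int:
    """Draw uniformly random positions until all g positions have been seen."""
    seen = set()
    draws = 0
    while len(seen) < g:
        seen.add(rng.randrange(g))
        draws += 1
    return draws


if __name__ == "__main__":
    rng = random.Random(42)
    for g in (100, 1000, 10000):
        trials = [simulate_draws(g, rng) for _ in range(10)]
        print(g, round(expected_draws(g)), round(g * math.log(g)),
              round(sum(trials) / len(trials)))
```

The ratio of the expected draws to G grows like ln G, which is the per-base overhead that the abstract's O(G) result removes.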


Subjects
Algorithms; Base Composition; Genome; Genomics/methods; High-Throughput Nucleotide Sequencing; Humans; Sequence Analysis, DNA