Results 1 - 17 of 17
1.
BMC Genomics ; 23(1): 504, 2022 Jul 12.
Article in English | MEDLINE | ID: mdl-35831808

ABSTRACT

BACKGROUND: Using single-cell RNA sequencing (scRNA-seq) data to diagnose disease is an effective technique in medical research. Several statistical methods have been developed for the classification of RNA sequencing (RNA-seq) data, including Poisson linear discriminant analysis (PLDA), negative binomial linear discriminant analysis (NBLDA), and zero-inflated Poisson logistic discriminant analysis (ZIPLDA). Nevertheless, few existing methods perform well for large-sample scRNA-seq data, in particular when the distributional assumption is also violated. RESULTS: We propose a deep learning classifier (scDLC) for large-sample scRNA-seq data, based on long short-term memory recurrent neural networks (LSTMs). scDLC does not require prior knowledge of the data distribution; instead, it takes into account the dependencies among the most outstanding feature genes in the LSTM model. An LSTM is a special recurrent neural network that can learn long-term dependencies in a sequence. CONCLUSIONS: Simulation studies show that scDLC performs consistently better than the existing methods in a wide range of settings with large sample sizes. Four real scRNA-seq datasets are also analyzed, and the results agree with the simulations in that scDLC always performs best. The code, named "scDLC", is publicly available at https://github.com/scDLC-code/code .


Subjects
Deep Learning , Discriminant Analysis , Gene Expression Profiling/methods , RNA/genetics , RNA-Seq , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods
2.
Stat Methods Med Res ; 30(7): 1640-1653, 2021 07.
Article in English | MEDLINE | ID: mdl-34134561

ABSTRACT

For a nonparametric Behrens-Fisher problem, a directional-sum test is proposed based on a division-combination strategy. A one-layer wild bootstrap procedure is given to calculate its statistical significance. We conduct simulation studies with data generated from lognormal, t and Laplace distributions to show that the proposed test controls the type I error rate properly and is more powerful than the existing rank-sum and maximum-type tests under most of the considered scenarios. Applications to a dietary intervention trial further illustrate the performance of the proposed test.


Subjects
Diet , Research Design , Computer Simulation , Models, Statistical
3.
Stat Methods Med Res ; 30(1): 112-128, 2021 01.
Article in English | MEDLINE | ID: mdl-32726188

ABSTRACT

Hidden Markov models are useful in simultaneously analyzing a longitudinal observation process and its dynamic transition. Existing hidden Markov models focus on mean regression for the longitudinal response. However, the tails of the response distribution are as important as the center in many substantive studies. We propose a quantile hidden Markov model to provide a systematic method to examine the entire conditional distribution of the response given the hidden state and potential covariates. Instead of considering homogeneous hidden Markov models, which assume that the probabilities of between-state transitions are independent of subject- and time-specific characteristics, we allow the transition probabilities to depend on exogenous covariates, thereby yielding nonhomogeneous Markov chains and making the proposed model more flexible than its homogeneous counterpart. We develop a Bayesian approach coupled with efficient Markov chain Monte Carlo methods for statistical inference. Simulations are conducted to assess the empirical performance of the proposed method. The proposed methodology is applied to a cocaine use study to provide new insights into the prevention of cocaine use.


Subjects
Models, Statistical , Bayes Theorem , Markov Chains , Monte Carlo Method
4.
PLoS One ; 15(6): e0234094, 2020.
Article in English | MEDLINE | ID: mdl-32589640

ABSTRACT

An important inferential task in functional linear models is to test the dependence between the response and the functional predictor. The traditional testing theory is built on functional principal component analysis, which requires estimating the covariance operator of the functional predictor. Due to the intrinsic high dimensionality of functional data, the sample is often not large enough to allow accurate estimation of the covariance operator, which leaves the follow-up test underpowered. To avoid the expensive estimation of the covariance operator, we propose a nonparametric method called Functional Linear models with U-statistics TEsting (FLUTE) to test the dependence assumption. We show that the FLUTE test is more powerful than the current benchmark methods (Kokoszka P, 2008; Patilea V, 2016) in the small or moderate sample case. We further prove the asymptotic normality of our test statistic under both the null hypothesis and a local alternative hypothesis. The merit of our method is demonstrated by both simulation studies and real examples.


Subjects
Models, Statistical , Canada , Linear Models , Statistics, Nonparametric , Weather
5.
PLoS One ; 13(8): e0201586, 2018.
Article in English | MEDLINE | ID: mdl-30086146

ABSTRACT

DNA methylation is an essential epigenetic modification involved in regulating the expression of mammalian genomes. A variety of experimental approaches to generate genome-wide or whole-genome DNA methylation data have emerged in recent years. Methylated DNA immunoprecipitation followed by sequencing (MeDIP-seq) is one of the major tools used in whole-genome epigenetic studies. However, analyzing these data with adequate accuracy, sensitivity, and speed remains an important challenge. Existing methods, such as BATMAN and MEDIPS, analyze MeDIP-seq data by dividing the whole genome into equal-length windows and assuming that every CpG in the same window has the same methylation level. More precise work is necessary to estimate the methylation level of each CpG site in the whole genome. In this paper, we propose Statistical Inferences with MeDIP-seq Data (SIMD), a method to infer the methylation level of each CpG site. In addition, we analyze a real dataset for DNA methylation. The results show that our method displays improved precision in detecting differentially methylated CpG sites compared to the existing method. To meet the demands of the application, we have developed an R package called "SIMD", which is freely available at https://github.com/FocusPaka/SIMD.


Subjects
DNA Methylation , Epigenomics/methods , Whole Genome Sequencing/methods , Algorithms , CpG Islands , Epigenesis, Genetic , Gene Expression Regulation , Humans , Internet
6.
Bioinformatics ; 34(8): 1329-1335, 2018 04 15.
Article in English | MEDLINE | ID: mdl-29186294

ABSTRACT

Motivation: With the development of high-throughput techniques, RNA-sequencing (RNA-seq) is becoming increasingly popular as an alternative for gene expression analysis, such as RNA profiling and classification. Identifying which disease type a new patient belongs to from RNA-seq data has been recognized as a vital problem in medical research. As RNA-seq data are discrete, statistical methods developed for classifying microarray data cannot be readily applied to RNA-seq data classification. Witten proposed a Poisson linear discriminant analysis (PLDA) to classify RNA-seq data in 2011. Note, however, that count datasets are frequently characterized by excess zeros in real RNA-seq or microRNA sequence data (for example, when the sequencing depth is insufficient, or for small RNAs 18-30 nucleotides in length). Therefore, a new model is needed to analyze RNA-seq data with an excess of zeros. Results: In this paper, we propose a Zero-Inflated Poisson Logistic Discriminant Analysis (ZIPLDA) for RNA-seq data with an excess of zeros. The new method assumes that the data are from a mixture of two distributions: one is a point mass at zero, and the other follows a Poisson distribution. We then consider a logistic relation between the probability of observing zeros and the mean of the genes and the sequencing depth in the model. Simulation studies show that the proposed method performs better than, or at least as well as, the existing methods in a wide range of settings. Two real datasets, a breast cancer RNA-seq dataset and a microRNA-seq dataset, are also analyzed, and the results agree with the simulations in that our proposed method outperforms the existing competitors. Availability and implementation: The software is available at http://www.math.hkbu.edu.hk/~tongt. Contact: xwan@comp.hkbu.edu.hk or tongt@hkbu.edu.hk. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, RNA/methods , Software , Breast Neoplasms/genetics , Discriminant Analysis , Female , Humans , MicroRNAs
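
The mixture described in the ZIPLDA abstract above (a point mass at zero combined with a Poisson component) can be written down in a few lines. The following is only a minimal sketch of a zero-inflated Poisson probability mass function and log-likelihood, not the authors' software. The function names `zip_pmf` and `zip_log_likelihood` and the parameter names `lam` and `pi` are hypothetical, and the logistic link to gene mean and sequencing depth mentioned in the abstract is not modeled here.

```python
import math


def zip_pmf(k: int, lam: float, pi: float) -> float:
    """P(X = k) for a zero-inflated Poisson (illustrative names).

    With probability pi the count is a structural zero; otherwise it is
    drawn from Poisson(lam).
    """
    if k < 0:
        return 0.0
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # Zeros come from both the point mass and the Poisson component.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


def zip_log_likelihood(counts, lam: float, pi: float) -> float:
    """Log-likelihood of independent counts under one shared ZIP model."""
    return sum(math.log(zip_pmf(k, lam, pi)) for k in counts)
```

With `pi = 0` the sketch reduces to an ordinary Poisson model, so increasing `pi` only adds probability to the zero count.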
7.
J Comput Biol ; 24(11): 1099-1111, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28414553

ABSTRACT

High-throughput techniques bring novel tools and also statistical challenges to genomic research. Identifying which disease type a new patient belongs to has been recognized as an important problem. For high-dimensional data with small sample sizes, the classical discriminant methods suffer from the singularity problem and are, therefore, no longer applicable in practice. In this article, we propose a geometric diagonalization method for regularized discriminant analysis. We then consider a bias correction to further improve the proposed method. Simulation studies show that the proposed method performs better than, or at least as well as, the existing methods in a wide range of settings. A microarray dataset and an RNA-seq dataset are also analyzed, and they demonstrate the superiority of the proposed method over the existing competitors, especially when the number of samples is small or the number of genes is large. Finally, we have developed an R package called "GDRDA", which is available upon request.


Subjects
Algorithms , Biomarkers, Tumor/genetics , Breast Neoplasms/genetics , Gene Expression Profiling , High-Throughput Nucleotide Sequencing/methods , Computer Simulation , Discriminant Analysis , Female , Humans
8.
Cancer Immunol Immunother ; 66(6): 717-729, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28246881

ABSTRACT

Non-Hodgkin lymphoma (NHL) is an incurable lymphoproliferative cancer, and patients with NHL have a poor prognosis. The present study explored the regulatory mechanism of expression and possible roles of the immunosuppressive B7-H4 molecule in human NHL. For functional studies, NHL-reactive T cell lines were generated via the isolation of allogeneic CD3+ T cells from healthy donors and repeated in vitro stimulation with irradiated NHL cells isolated from patients. B7-H4 was found to be distributed in NHL cells and tissues, and its surface protein expression levels were further upregulated by the incubation of NHL cells with interleukin (IL)-6, IL-10, or interferon-γ. Additionally, the supernatants of tumor-associated macrophages (tMφs) upregulated B7-H4 surface expression by producing IL-6 and IL-10. B7-H4 expressed in NHL cells inhibited the cytotoxic activity of NHL-reactive T cells. Conversely, the inhibition of B7-H4 in NHL cells promoted T cell immunity and sensitized NHL cells to cytolysis. Furthermore, tMφ-induced B7-H4 promoted NHL cell evasion of the T cell immune response. In conclusion, this study shows that NHL-expressed B7-H4 is an important immunosuppressive factor that inhibits host anti-tumor immunity to NHL. Targeting tumor-expressed B7-H4 may thus provide a new treatment strategy for NHL patients.


Subjects
Interleukin-10/metabolism , Interleukin-6/metabolism , Lymphoma, Non-Hodgkin/immunology , Lymphoma, Non-Hodgkin/metabolism , Macrophages/immunology , T-Lymphocytes, Regulatory/immunology , Tumor Escape , V-Set Domain-Containing T-Cell Activation Inhibitor 1/metabolism , Cell Communication/immunology , Humans , Lymphoma, Non-Hodgkin/pathology , Tumor Cells, Cultured
9.
PLoS One ; 11(7): e0159084, 2016.
Article in English | MEDLINE | ID: mdl-27416030

ABSTRACT

Representation-based classification methods, such as Sparse Representation Classification (SRC) and Linear Regression Classification (LRC), have been applied successfully to the face recognition problem. However, most of these methods use the original face images for recognition without any preprocessing. Thus, their performance may be affected by problematic factors in the face images, such as illumination and expression variations. To overcome this limitation, this paper proposes a novel supervised filter learning algorithm for representation-based face recognition. The underlying idea of our algorithm is to learn a filter so that the within-class representation residuals of the faces' Local Binary Pattern (LBP) features are minimized and the between-class representation residuals of the faces' LBP features are maximized. Therefore, the LBP features of filtered face images are more discriminative for representation-based classifiers. Furthermore, we also extend our algorithm to the heterogeneous face recognition problem. Extensive experiments are carried out on five databases, and the experimental results verify the efficacy of the proposed algorithm.


Subjects
Artificial Intelligence , Facial Recognition , Pattern Recognition, Automated/methods , Algorithms , Databases, Factual , Face , Image Interpretation, Computer-Assisted/methods , Lighting
10.
BioData Min ; 7(1): 30, 2014.
Article in English | MEDLINE | ID: mdl-25503379

ABSTRACT

[This corrects the article DOI: 10.1186/1756-0381-7-15.].

11.
PLoS One ; 9(11): e113198, 2014.
Article in English | MEDLINE | ID: mdl-25419662

ABSTRACT

Recently, Sparse Representation-based Classification (SRC) has attracted a lot of attention for its applications to various tasks, especially in biometric techniques such as face recognition. However, factors such as lighting, expression, pose and disguise variations in face images will decrease the performances of SRC and most other face recognition techniques. In order to overcome these limitations, we propose a robust face recognition method named Locality Constrained Joint Dynamic Sparse Representation-based Classification (LCJDSRC) in this paper. In our method, a face image is first partitioned into several smaller sub-images. Then, these sub-images are sparsely represented using the proposed locality constrained joint dynamic sparse representation algorithm. Finally, the representation results for all sub-images are aggregated to obtain the final recognition result. Compared with other algorithms which process each sub-image of a face image independently, the proposed algorithm regards the local matching-based face recognition as a multi-task learning problem. Thus, the latent relationships among the sub-images from the same face image are taken into account. Meanwhile, the locality information of the data is also considered in our algorithm. We evaluate our algorithm by comparing it with other state-of-the-art approaches. Extensive experiments on four benchmark face databases (ORL, Extended YaleB, AR and LFW) demonstrate the effectiveness of LCJDSRC.


Subjects
Algorithms , Face/anatomy & histology , Image Interpretation, Computer-Assisted/methods , Pattern Recognition, Automated/methods , Biometric Identification/methods , Humans , Reproducibility of Results
12.
BMC Genomics ; 15: 868, 2014 Oct 06.
Article in English | MEDLINE | ID: mdl-25286960

ABSTRACT

BACKGROUND: Aberrant DNA methylation is a hallmark of many cancers. Classically there are two types of endometrial cancer, endometrioid adenocarcinoma (EAC), or Type I, and uterine papillary serous carcinoma (UPSC), or Type II. However, the whole-genome DNA methylation changes in these two classical types of endometrial cancer are still unknown. RESULTS: Here we describe complete genome-wide DNA methylome maps of EAC, UPSC, and normal endometrium, obtained by applying a combined strategy of methylated DNA immunoprecipitation sequencing (MeDIP-seq) and methylation-sensitive restriction enzyme digestion sequencing (MRE-seq). We discovered distinct genome-wide DNA methylation patterns in EAC and UPSC: 27,009 and 15,676 recurrent differentially methylated regions (DMRs) were identified, respectively, compared with normal endometrium. Over 80% of DMRs were in intergenic and intronic regions. The majority of these DMRs were not interrogated on the commonly used Infinium 450K array platform. Large-scale demethylation of chromosome X was detected in UPSC, accompanied by decreased XIST expression. Importantly, we discovered that the majority of the DMRs harbored promoter or enhancer functions and are specifically associated with genes related to uterine development and disease. Among these, abnormal methylation of transposable elements (TEs) may provide a novel mechanism to deregulate normal endometrium-specific enhancers derived from specific TEs. CONCLUSIONS: DNA methylation changes are an important signature of endometrial cancer and regulate gene expression by affecting not only proximal promoters but also distal enhancers.


Subjects
Endometrial Neoplasms/genetics , Endometrial Neoplasms/physiopathology , Enhancer Elements, Genetic/genetics , Promoter Regions, Genetic/genetics , Uterine Neoplasms/genetics , Uterine Neoplasms/physiopathology , Adaptor Proteins, Signal Transducing/genetics , Aldehyde Dehydrogenase 1 Family , Carcinoma, Papillary/genetics , Carcinoma, Papillary/metabolism , Chromosomes, Human, X , CpG Islands , DNA (Cytosine-5-)-Methyltransferases/genetics , DNA (Cytosine-5-)-Methyltransferases/metabolism , DNA Methylation , DNA Transposable Elements/genetics , Female , Humans , Kruppel-Like Factor 4 , Kruppel-Like Transcription Factors/genetics , MutL Protein Homolog 1 , Nuclear Proteins/genetics , Polymorphism, Single Nucleotide , RNA, Long Noncoding/genetics , Retinal Dehydrogenase/genetics , Sequence Analysis, DNA
13.
BioData Min ; 7: 15, 2014.
Article in English | MEDLINE | ID: mdl-25285156

ABSTRACT

BACKGROUND: Next generation sequencing technologies are powerful new tools for investigating a wide range of biological and medical questions. Statistical and computational methods are key to analyzing massive and complex sequencing data. In order to derive gene expression measures and compare these measures across samples or libraries, we first need to normalize read counts to adjust for varying sample sequencing depths and other potentially technical effects. RESULTS: In this paper, we develop a normalization method based on iterating median of M-values (IMM) for detecting differentially expressed (DE) genes. Compared to a previous approach, TMM, the IMM method improves the accuracy of DE detection. Simulation studies show that the IMM method outperforms other methods for sample normalization. We also examine real data and find that the genes detected by IMM but not by TMM are much more accurate than the genes detected by TMM but not by IMM. In addition, we found that the gene UNC5C is highly associated with kidney cancer, among other findings.
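
The abstract above does not spell out how IMM iterates, so the following is only a rough sketch of the underlying building block: the M-value (the log2 ratio of depth-scaled counts between a sample and a reference) and a scaling factor taken as the median of those M-values. The function names and the choice to skip genes with a zero count are assumptions made for illustration. This is a single pass, not the published IMM or TMM procedure.

```python
import math
from statistics import median


def m_values(sample, reference):
    """log2 ratios of depth-scaled counts, gene by gene (illustrative).

    Genes with a zero count in either library are skipped because the
    log ratio is undefined for them (an assumption of this sketch).
    """
    n_sample, n_reference = sum(sample), sum(reference)
    return [
        math.log2((s / n_sample) / (r / n_reference))
        for s, r in zip(sample, reference)
        if s > 0 and r > 0
    ]


def median_m_scaling_factor(sample, reference):
    """Scaling factor 2**median(M) of `sample` relative to `reference`."""
    return 2.0 ** median(m_values(sample, reference))
```

Two libraries with the same composition but different sequencing depths, for example `[10, 20, 30, 40]` against `[20, 40, 60, 80]`, give M-values of zero and hence a factor of 1 in this sketch.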

14.
Genome Res ; 23(9): 1522-40, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23804400

ABSTRACT

DNA methylation plays key roles in diverse biological processes such as X chromosome inactivation, transposable element repression, genomic imprinting, and tissue-specific gene expression. Sequencing-based DNA methylation profiling provides an unprecedented opportunity to map and compare complete DNA methylomes. This includes one of the most widely applied technologies for measuring DNA methylation: methylated DNA immunoprecipitation followed by sequencing (MeDIP-seq), coupled with a complementary method, methylation-sensitive restriction enzyme sequencing (MRE-seq). A computational approach that integrates data from these two different but complementary assays and predicts methylation differences between samples has been unavailable. Here, we present a novel integrative statistical framework M&M (for integration of MeDIP-seq and MRE-seq) that dynamically scales, normalizes, and combines MeDIP-seq and MRE-seq data to detect differentially methylated regions. Using sample-matched whole-genome bisulfite sequencing (WGBS) as a gold standard, we demonstrate superior accuracy and reproducibility of M&M compared to existing analytical methods for MeDIP-seq data alone. M&M leverages the complementary nature of MeDIP-seq and MRE-seq data to allow rapid comparative analysis between whole methylomes at a fraction of the cost of WGBS. Comprehensive analysis of nineteen human DNA methylomes with M&M reveals distinct DNA methylation patterns among different tissue types, cell types, and individuals, potentially underscoring divergent epigenetic regulation at different scales of phenotypic diversity. We find that differential DNA methylation at enhancer elements, with concurrent changes in histone modifications and transcription factor binding, is common at the cell, tissue, and individual levels, whereas promoter methylation is more prominent in reinforcing fundamental tissue identities.


Subjects
Algorithms , DNA Methylation , Genome, Human , Sequence Analysis, DNA/methods , Data Interpretation, Statistical , High-Throughput Nucleotide Sequencing/methods , Humans , Organ Specificity
15.
Beijing Da Xue Xue Bao Yi Xue Ban ; 44(3): 437-43, 2012 Jun 18.
Article in Chinese | MEDLINE | ID: mdl-22692318

ABSTRACT

OBJECTIVE: To investigate the tissue distribution characteristics of 1,3-diphenyl-1,3-propanedione (DPPD) in mice. METHODS: Male ICR mice were dosed with DPPD 500 mg/kg via oral gavage, and tissue samples of the heart, liver, spleen, lungs, kidneys and muscle of each mouse were collected as scheduled. At each time point, the concentrations of DPPD in the mouse tissues were measured by high performance liquid chromatography (HPLC). The main pharmacokinetic parameters were calculated with Thermo Kinetica 4.4.1 software. RESULTS: DPPD was absorbed rapidly after oral administration. Concentrations of DPPD were higher in the liver and the kidney (liver: AUC(tot)=41.92 µg·h/g, kidney: AUC(tot)=40.40 µg·h/g). The drug distributed rapidly to the liver and lungs after oral administration (T(max)=0.32 h and 0.33 h, respectively), whereas in the muscle the maximum was reached at 3.85 h. The maximum concentration of DPPD was found in the liver (C(max)=31.20 µg/g), which was also the highest tissue concentration among all the subjects. DPPD could be detected at low concentrations within 24 h in all the tissues examined. CONCLUSION: DPPD was distributed unevenly across tissues: drug concentrations were higher in the liver, kidney and muscle, and lower in the lungs and spleen.


Subjects
Chalcones/pharmacokinetics , Animals , Chromatography, High Pressure Liquid , Male , Mice , Mice, Inbred ICR , Tissue Distribution
16.
BMC Bioinformatics ; 10: 146, 2009 May 15.
Article in English | MEDLINE | ID: mdl-19445669

ABSTRACT

BACKGROUND: Time-course microarray experiments produce vector gene expression profiles across a series of time points. Clustering genes based on these profiles is important in discovering functionally related and co-regulated genes. Early clustering algorithms do not take advantage of the ordering in a time-course study, explicit use of which should allow more sensitive detection of genes that display a consistent pattern over time. Peddada et al. [1] proposed a clustering algorithm that can incorporate the temporal ordering using order-restricted statistical inference. This algorithm is, however, very time-consuming and hence inapplicable to most microarray experiments that contain a large number of genes. Its computational burden also makes it difficult to assess the clustering reliability, which is a very important measure when clustering noisy microarray data. RESULTS: We propose a computationally efficient information criterion-based clustering algorithm, called ORICC, that also takes account of the ordering in time-course microarray experiments by embedding the order-restricted inference into a model selection framework. Each gene is assigned to the profile it best matches, as determined by a newly proposed information criterion for order-restricted inference. In addition, we developed a bootstrap procedure to assess ORICC's clustering reliability for every gene. Simulation studies show that the ORICC method is robust, always gives better clustering accuracy than Peddada's method, and reduces computation time by a factor of hundreds. Under some scenarios, its accuracy is also better than that of some other existing clustering methods for short time-course microarray data, such as STEM [2] and Wang et al. [3]. It is also computationally much faster than Wang et al. [3]. CONCLUSION: Our ORICC algorithm, which takes advantage of the temporal ordering in time-course microarray experiments, provides good clustering accuracy and is much faster than Peddada's method. Moreover, the clustering reliability for each gene can also be assessed, which is unavailable in Peddada's method. In a real data example, the ORICC algorithm identifies new and interesting genes that previous analyses failed to reveal.


Subjects
Cluster Analysis , Gene Expression Profiling/methods , Models, Genetic , Models, Statistical , Oligonucleotide Array Sequence Analysis/methods , Algorithms , Breast Neoplasms , Computer Simulation , Databases, Factual , Female , Genes , Humans , Research Design
17.
Comput Med Imaging Graph ; 32(8): 685-98, 2008 Dec.
Article in English | MEDLINE | ID: mdl-18818051

ABSTRACT

Image segmentation is often required as a preliminary and indispensable stage in computer-aided medical image processing, particularly during the clinical analysis of magnetic resonance (MR) brain images. In this paper, we present a modified fuzzy c-means (FCM) algorithm for MRI brain image segmentation. In order to reduce the noise effect during segmentation, the proposed method incorporates both the local spatial context and the non-local information into the standard FCM clustering algorithm, using a novel dissimilarity index in place of the usual distance metric. The efficiency of the proposed algorithm is demonstrated by extensive segmentation experiments using both simulated and real MR images and by comparison with other state-of-the-art algorithms.


Subjects
Brain/anatomy & histology , Cluster Analysis , Fuzzy Logic , Image Enhancement/methods , Magnetic Resonance Imaging/methods , Humans , Imaging, Three-Dimensional/methods , Information Storage and Retrieval/methods , Pattern Recognition, Automated/methods , Weights and Measures
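
For orientation, the standard FCM algorithm that the abstract above modifies alternates between a membership update and a center update. The following is a minimal sketch of that textbook procedure on one-dimensional intensities with the ordinary absolute-difference distance. It does not include the paper's spatial context, non-local information, or dissimilarity index, and the function name, the fixed iteration count and the default fuzzifier `m = 2` are assumptions made for illustration.

```python
def fcm_1d(values, centers, m=2.0, iterations=50):
    """Standard fuzzy c-means on scalar values (illustrative sketch).

    Returns the final centers and the membership rows computed in the
    last iteration (one row per value, one column per cluster).
    """
    centers = list(centers)
    memberships = []
    exponent = -2.0 / (m - 1.0)
    for _ in range(iterations):
        # Membership update: u_ij is proportional to d_ij ** (-2 / (m - 1)).
        memberships = []
        for x in values:
            dists = [abs(x - c) for c in centers]
            if any(d == 0.0 for d in dists):
                # A value sitting exactly on a center belongs fully to it.
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
            else:
                row = [d ** exponent for d in dists]
            total = sum(row)
            memberships.append([r / total for r in row])
        # Center update: mean of the values weighted by u_ij ** m.
        for j in range(len(centers)):
            weights = [row[j] ** m for row in memberships]
            weight_sum = sum(weights)
            if weight_sum > 0.0:
                centers[j] = sum(w * x for w, x in zip(weights, values)) / weight_sum
    return centers, memberships
```

For two well-separated groups of intensities, the centers in this sketch settle near the two group means, and each membership row sums to 1.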