Results 1 - 7 of 7
1.
BMC Bioinformatics ; 21(Suppl 2): 79, 2020 Mar 11.
Article in English | MEDLINE | ID: mdl-32164526

ABSTRACT

BACKGROUND: Disease gene prediction is a critical and challenging task. Many computational methods have been developed to predict disease genes, which can reduce the money and time spent on experimental validation. Since proteins (products of genes) usually work together to achieve a specific function, biomolecular networks, such as the protein-protein interaction (PPI) network and gene co-expression networks, are widely used to predict disease genes by analyzing the relationships between known disease genes and other genes in the networks. However, existing methods commonly use a universal static PPI network, which ignores the fact that PPIs are dynamic and that PPIs in different patients should also differ. RESULTS: To address these issues, we develop an ensemble algorithm to predict disease genes from clinical sample-based networks (EdgCSN). The algorithm first constructs a single sample-based network for each case sample of the disease under study. Then, these single sample-based networks are merged into several fused networks based on the clustering results of the samples. After that, logistic models are trained with centrality features extracted from the fused networks, and an ensemble strategy is used to predict the final probability of each gene being disease-associated. EdgCSN is evaluated on breast cancer (BC), thyroid cancer (TC) and Alzheimer's disease (AD) and obtains AUC values of 0.970, 0.971 and 0.966, respectively, which are much better than those of the competing algorithms. Subsequent de novo validations also demonstrate the ability of EdgCSN to predict new disease genes. CONCLUSIONS: In this study, we propose EdgCSN, an ensemble learning algorithm for predicting disease genes with models trained on centrality features extracted from clinical sample-based networks. Results of the leave-one-out cross validation show that EdgCSN performs much better than the competing algorithms in predicting BC-associated, TC-associated and AD-associated genes. De novo validations also show that EdgCSN is valuable for identifying new disease genes.
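
The ensemble step described in this abstract can be illustrated with a minimal sketch. The abstract does not say how the per-network logistic models are combined, so plain averaging of their predicted probabilities is an assumption here, and the names `predict_proba` and `ensemble_proba` are hypothetical.

```python
import math


def logistic(z):
    """Standard logistic (sigmoid) function."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Probability from one logistic model given one gene's centrality features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return logistic(z)


def ensemble_proba(models, features_per_network):
    """Average the probabilities of several logistic models.

    models: list of (weights, bias) pairs, one per fused network.
    features_per_network: the gene's centrality features in each fused network.
    Averaging is an assumed combination rule, not one stated in the abstract.
    """
    probs = [
        predict_proba(weights, bias, features)
        for (weights, bias), features in zip(models, features_per_network)
    ]
    return sum(probs) / len(probs)
```

Each model sees only the features from its own fused network, so a gene that is central in some patient clusters but not others still receives a single combined score.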


Subjects
Alzheimer Disease/genetics , Breast Neoplasms/genetics , Protein Interaction Maps , Thyroid Neoplasms/genetics , Alzheimer Disease/pathology , Area Under Curve , Breast Neoplasms/pathology , Cluster Analysis , Female , Humans , Logistic Models , Theoretical Models , Proteins/metabolism , ROC Curve , Thyroid Neoplasms/pathology
2.
Front Genet ; 10: 270, 2019.
Article in English | MEDLINE | ID: mdl-31001321

ABSTRACT

Complex diseases are known to be associated with disease genes. Uncovering disease-gene associations is critical for diagnosis, treatment, and prevention of diseases. Computational algorithms that effectively predict candidate disease-gene associations prior to experimental proof can greatly reduce the associated cost and time. Most existing methods are disease-specific and can only predict genes associated with one specific disease at a time. Similarities among diseases are not used during the prediction. Meanwhile, most methods predict new disease genes based on known associations, making them unable to predict disease genes for diseases without known associated genes. In this study, a manifold learning-based method is proposed for predicting disease-gene associations by assuming that the geodesic distance between any disease and its associated genes should be shorter than that of other non-associated disease-gene pairs. The model maps the diseases and genes into a lower dimensional manifold based on the known disease-gene associations, disease similarity and gene similarity to predict new associations in terms of the geodesic distance between disease-gene pairs. In the 3-fold cross-validation experiments, our method achieves scores of 0.882 and 0.854 in terms of the area under the receiver operating characteristic (ROC) curve (AUC) for diseases with more than one known associated gene and diseases with only one known associated gene, respectively. Further de novo studies on Lung Cancer and Bladder Cancer also show that our model is capable of identifying new disease-gene associations.
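
The ranking idea in this abstract (associated genes should lie at a shorter geodesic distance from the disease) can be sketched as follows. This is not the authors' manifold learning model: it only approximates geodesic distance by shortest-path length over a weighted neighborhood graph, which is an assumption, and `geodesic_distances` and `rank_genes` are hypothetical names.

```python
import heapq


def geodesic_distances(graph, source):
    """Shortest-path distances from source using Dijkstra's algorithm.

    graph: dict mapping node -> {neighbor: non-negative edge length}.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def rank_genes(graph, disease, genes):
    """Order candidate genes by increasing graph distance from the disease node."""
    dist = geodesic_distances(graph, disease)
    return sorted(genes, key=lambda g: dist.get(g, float("inf")))
```

Genes that cannot be reached from the disease node are given infinite distance and therefore sort last.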

3.
Proteomics ; 19(5): e1800129, 2019 03.
Article in English | MEDLINE | ID: mdl-30650262

ABSTRACT

Cellular functions are always performed by protein complexes. At present, many approaches have been proposed to identify protein complexes from protein-protein interaction (PPI) networks. Some approaches focus on detecting local dense subgraphs in PPI networks, which are regarded as protein-complex cores, and then identify protein complexes by including local neighbors. However, gene expression profiles at different time points or in different tissues show that proteins are dynamic. Identifying dynamic protein complexes is therefore important and meaningful. In this study, a novel core-attachment-based method named CO-DPC is presented to detect dynamic protein complexes. First, CO-DPC selects active proteins according to gene expression profiles and the 3-sigma principle, and constructs dynamic PPI networks based on the co-expression principle and PPI networks. Second, CO-DPC detects local dense subgraphs as the cores of protein complexes and then attaches close neighbors of these cores to form protein complexes. To evaluate the method, it and the existing algorithms are applied to yeast PPI networks. The experimental results show that CO-DPC performs much better than the existing methods. In addition, the identified dynamic protein complexes can match very well and thus become more meaningful for future biological study.
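
The attachment step of a core-attachment approach can be sketched as below. The abstract does not define what counts as a "close neighbor", so the rule used here (a neighbor joins if it interacts with more than a given fraction of the core) is a hypothetical choice, as are the name `attach_neighbors` and the parameter `min_fraction`.

```python
def attach_neighbors(adjacency, core, min_fraction=0.5):
    """Extend a core with neighbors linked to more than min_fraction of it.

    adjacency: dict mapping protein -> set of interacting proteins.
    core: non-empty iterable of proteins forming a dense subgraph.
    The attachment rule is an illustrative assumption, not CO-DPC's stated rule.
    """
    core = set(core)
    candidates = set()
    for protein in core:
        candidates |= adjacency.get(protein, set())
    candidates -= core
    attached = {
        c
        for c in candidates
        if len(adjacency.get(c, set()) & core) / len(core) > min_fraction
    }
    return core | attached
```

Raising `min_fraction` yields tighter complexes, while lowering it admits peripheral proteins that touch only a small part of the core.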


Subjects
Protein Interaction Mapping/methods , Proteomics/methods , Transcriptome , Algorithms , Animals , Gene Expression Profiling/methods , Humans , Protein Interaction Maps , Proteins/genetics , Proteins/metabolism
4.
BMC Bioinformatics ; 16 Suppl 5: S8, 2015.
Article in English | MEDLINE | ID: mdl-25860434

ABSTRACT

MOTIVATION: A variety of biological applications have been developed on top of next-generation genome sequencing technologies, and alignment is the first step once the sequencing reads are obtained. In recent years, many software tools have been developed to efficiently and accurately align short reads to the reference genome. However, many reads still cannot be mapped to the reference genome because they exceed the allowable number of mismatches. Moreover, besides the unmapped reads, reads with low mapping qualities are also excluded from downstream analysis, such as variant calling. If we can take advantage of the confident segments of these reads, not only can the alignment rates be improved, but more information will also be provided for the downstream analysis. RESULTS: This paper proposes a method, called RAUR (Re-align the Unmapped Reads), to re-align the reads that cannot be mapped by alignment tools. First, it takes advantage of the base quality scores (reported by the sequencer) to find the most confident and informative segments of the unmapped reads by controlling the number of possible mismatches in the alignment. Then, combined with an alignment tool, RAUR re-aligns these segments of the reads. We run RAUR on both simulated data and real data with different read lengths. The results show that many reads which fail to be aligned by the most popular alignment tools (BWA and Bowtie2) can be correctly re-aligned by RAUR, with a similar Precision. Even compared with BWA-MEM and the local mode of Bowtie2, which perform local alignment for long reads to improve the alignment rate, RAUR shows advantages in Alignment rate and Precision in some cases. Therefore, the trimming strategy used in RAUR is useful for improving the Alignment rate of alignment tools for next-generation genome sequencing. AVAILABILITY: All source code is available at http://netlab.csu.edu.cn/bioinformatics/RAUR.html.
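
The trimming idea can be illustrated with a minimal sketch that is not RAUR's actual procedure. It assumes Phred-scaled base qualities, where a quality Q corresponds to an error probability of 10^(-Q/10), and it picks the longest contiguous segment whose expected number of sequencing errors stays under a budget. The budget rule and the names `most_confident_segment` and `max_expected_mismatches` are hypothetical.

```python
def error_prob(q):
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-q / 10)


def most_confident_segment(quals, max_expected_mismatches=1.0):
    """Return (start, end) of the longest segment within the error budget.

    quals: list of Phred quality scores for one read.
    The segment is quals[start:end]; ties keep the leftmost segment.
    The expected-error budget is an assumed selection rule.
    """
    probs = [error_prob(q) for q in quals]
    best = (0, 0)
    start = 0
    total = 0.0
    for end, p in enumerate(probs):
        total += p
        # Shrink the window from the left until it fits the budget again.
        while total > max_expected_mismatches and start <= end:
            total -= probs[start]
            start += 1
        if end + 1 - start > best[1] - best[0]:
            best = (start, end + 1)
    return best
```

The returned half-open interval can be used to slice both the read sequence and its quality string before handing the segment to an aligner.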


Subjects
Human Genome , High-Throughput Nucleotide Sequencing , Sequence Alignment/methods , DNA Sequence Analysis/methods , Software , Chromosome Mapping/methods , Humans , Programming Languages , Quality Control
5.
BMC Genomics ; 16 Suppl 3: S1, 2015.
Article in English | MEDLINE | ID: mdl-25707432

ABSTRACT

Essential proteins are vitally important for cellular survival and development, and identifying essential proteins is meaningful research work in the post-genome era. The rapid increase of available protein-protein interaction (PPI) data has made it possible to detect protein essentiality at the network level. A series of centrality measures have been proposed to discover essential proteins based on PPI networks. However, the PPI data obtained from large-scale, high-throughput experiments generally contain false positives, so the original PPI data alone are insufficient for identifying essential proteins. Improving the accuracy has become the focus of essential protein identification. In this paper, we propose a framework for identifying essential proteins from active PPI networks constructed with dynamic gene expression. First, we process the dynamic gene expression profiles using a time-dependent model and a time-independent model. Second, we construct an active PPI network based on co-expressed genes. Lastly, we apply six classical centrality measures to the active PPI network. For comparison, other prediction methods are also used to identify essential proteins based on the active PPI network. The experimental results on a yeast network show that identifying essential proteins based on the active PPI network can considerably improve the performance of centrality measures in terms of the number of identified essential proteins and identification accuracy. The results also indicate that most essential proteins are active.
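
A minimal sketch of the last two steps, under stated assumptions: an interaction is kept only when both proteins are active at some common time point (the abstract says only that the network is based on co-expressed genes), and degree centrality stands in for the six centrality measures, which the abstract does not name. `active_ppi_network` and `degree_centrality` are hypothetical names.

```python
def active_ppi_network(interactions, active_times):
    """Keep interactions whose two proteins share at least one active time point.

    interactions: iterable of (protein_a, protein_b) pairs.
    active_times: dict mapping protein -> set of time points where it is active.
    The shared-time-point rule is an illustrative assumption.
    """
    edges = set()
    for a, b in interactions:
        if active_times.get(a, set()) & active_times.get(b, set()):
            edges.add(frozenset((a, b)))
    return edges


def degree_centrality(edges):
    """Count the number of retained interactions per protein."""
    degree = {}
    for edge in edges:
        for protein in edge:
            degree[protein] = degree.get(protein, 0) + 1
    return degree
```

Proteins that lose all their interactions in the filtering step do not appear in the degree table, which is one simple way false-positive interactions stop contributing to a protein's score.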


Subjects
Computer Simulation , Protein Interaction Maps , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Transcriptome , Computational Biology
6.
BMC Syst Biol ; 7: 28, 2013 Mar 28.
Article in English | MEDLINE | ID: mdl-23537347

ABSTRACT

BACKGROUND: Identifying protein complexes from protein-protein interaction networks is fundamental for understanding the mechanisms of cellular components and protein function. At present, many methods for identifying protein complexes are based mainly on topological characteristics or functional similarity features, neglecting the facts that proteins must be in their active forms to interact with others and that the formation of protein complexes follows a just-in-time mechanism. RESULTS: This paper first presents a protein complex formation model based on the just-in-time mechanism. By investigating known protein complexes combined with gene expression data, we find that most protein complexes can be formed in continuous time points, and that the average overlapping rate of the known complexes during formation is large. A method is proposed to refine the protein complexes predicted by clustering algorithms based on the protein complex formation model and the properties of known protein complexes. After refinement, the number of known complexes matched by predicted complexes, Sensitivity, Specificity, and f-measure are significantly improved compared with those of the original predicted complexes. CONCLUSION: The refining method can discard spurious proteins based on protein activity and generate new complexes through the just-in-time assembly mechanism, which can enhance the ability to predict complexes.


Subjects
Algorithms , Computational Biology/methods , Biological Models , Multiprotein Complexes/biosynthesis , Protein Interaction Mapping/methods , Proteins/metabolism , Cluster Analysis , Protein Databases , Sensitivity and Specificity , Yeasts
7.
Proteome Sci ; 11(Suppl 1): S20, 2013 Nov 07.
Article in English | MEDLINE | ID: mdl-24565281

ABSTRACT

BACKGROUND: Protein interaction networks (PINs) are known to be useful for detecting protein complexes. However, most available PINs are static and cannot reflect the dynamic changes in real networks. Some researchers have tried to construct dynamic networks by combining time-course (dynamic) gene expression data with PINs. However, gene expression arrays inevitably contain background noise, which can degrade the quality of dynamic networks. Therefore, contaminated gene expression data need to be filtered out before further data integration and analysis. RESULTS: First, we adopt a dynamic model-based method to filter noisy data from dynamic expression profiles. Then a new method is proposed for identifying active proteins from dynamic gene expression profiles. An active protein at a time point is defined as a protein whose corresponding gene has an expression level at that time point higher than a threshold determined by a threshold function that involves the standard variance. Furthermore, a noise-filtered active protein interaction network (NF-APIN) is constructed. To demonstrate the efficiency of our method, we detect protein complexes from the NF-APIN and compare them with those detected from other dynamic PINs. CONCLUSION: A dynamic model-based method can effectively filter out noise in dynamic gene expression data. Our method for computing a threshold to determine the active time points of noise-filtered genes can make the dynamic construction more accurate and provide a high-quality framework for network analysis, such as protein complex prediction.
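
The active-protein definition can be sketched as a per-gene threshold test. The abstract does not give the threshold function, so the form used here (mean plus `k` times the population standard deviation of the gene's own time course) is an assumption chosen for illustration, and `active_time_points` and `k` are hypothetical names.

```python
from statistics import mean, pstdev


def active_time_points(expression, k=1.0):
    """Return indices of time points where a gene counts as active.

    expression: list of expression levels for one gene across time points.
    A time point is active when its level exceeds mean + k * std.
    This threshold form is an assumption, not the paper's exact function.
    """
    threshold = mean(expression) + k * pstdev(expression)
    return [t for t, level in enumerate(expression) if level > threshold]
```

For example, with the time course `[1, 1, 1, 1, 6]` the mean is 2 and the population standard deviation is 2, so the threshold is 4 and only the last time point is reported as active.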
