Results 1 - 20 of 26
1.
Article in English | MEDLINE | ID: mdl-37971919

ABSTRACT

This brief is concerned with the prediction problem of product popularity under a social network (SN) with positive-negative diffusion (PND). First, a PND model is proposed to enable the simulation of product diffusion, and three user states are defined. Second, an optimal and precise feature vector of every user is extracted through a multi-agent-system-based attention mechanism (MASAM) that is devised. The weight matrix shared in the mechanism of all agents is learned using a distributed learning algorithm provided in MASAM. Third, an MAS model for product diffusion on SN is established based on the feature representations from MASAM. Rules for agent interaction during PND diffusion are suggested, which accelerate the simulation of information spread in SN. Finally, comprehensive experiments are conducted to verify the effectiveness and efficiency of the proposed models and algorithms in prediction and to compare their performance with baseline methods. Furthermore, a case study is provided to illustrate the applicability and extendibility of the developed algorithm.

2.
IEEE Trans Cybern ; 53(9): 6004-6016, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37018298

ABSTRACT

This article is concerned with the influence maximization (IM) problem under a network with probabilistically unstable links (PULs) via graph embedding for multiagent systems (MASs). First, two diffusion models, the unstable-link independent cascade (UIC) model and the unstable-link linear threshold (ULT) model, are designed for the IM problem under the network with PULs. Second, the MAS model for the IM problem with PULs is established and a series of interaction rules among agents are built for the MAS model. Third, the similarity of the unstable structure of the nodes is defined and a novel graph embedding method, termed the unstable-similarity2vec (US2vec) approach, is proposed to tackle the IM problem under the network with PULs. According to the embedding results of the US2vec approach, the seed set is figured out by the developed algorithm. Finally, extensive experiments are conducted to: 1) verify the validity of the proposed model and the developed algorithms and 2) illustrate the optimal solution for IM under different scenarios with PULs.
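
The abstract above does not spell out the UIC model, so the following is only a minimal sketch of one plausible reading: a standard independent cascade in which every activation attempt also requires the link to be available, with an assumed per-link probability. The function names, the graph layout, and the `link_stability` parameter are all hypothetical and are not taken from the article.

```python
import random


def unstable_independent_cascade(graph, seeds, rng=None):
    """Run one cascade and return the set of activated nodes.

    graph maps node -> list of (neighbor, activation_prob, link_stability).
    link_stability is an ASSUMED probability that the link is usable at the
    moment an activation attempt is made over it.
    """
    rng = rng or random.Random()
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor, activation_prob, link_stability in graph.get(node, []):
                if neighbor in active:
                    continue
                # The attempt succeeds only if the link is up AND the usual
                # independent-cascade coin flip succeeds.
                if rng.random() < link_stability and rng.random() < activation_prob:
                    active.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return active


def estimate_spread(graph, seeds, trials=1000, seed=0):
    """Monte Carlo estimate of the expected number of activated nodes."""
    rng = random.Random(seed)
    total = sum(
        len(unstable_independent_cascade(graph, seeds, rng)) for _ in range(trials)
    )
    return total / trials
```

A seed-selection procedure would call something like `estimate_spread` for candidate seed sets. The article instead picks seeds from US2vec embeddings, which this sketch does not attempt to reproduce.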

3.
Comput Biol Chem ; 99: 107735, 2022 Aug.
Article in English | MEDLINE | ID: mdl-35850048

ABSTRACT

The development of third-generation sequencing technology has brought significant changes to genomics. Compared with second-generation sequencing methods, third-generation technologies produce reads around 100 times longer, revealing new genomic variations and closing long-standing gaps in the human reference genome. However, the excessive length and high error rate of these reads greatly increase the amount of data and the cost of alignment. Traditional data analysis platforms and serial sequence alignment methods cannot deal effectively with large-scale long-read alignment. There is a critical need for a novel data analysis platform that can deliver fast alignment of large-scale sequences. High-performance computing platforms and efficient, scalable algorithms based on these platforms have significant potential to impact sequence analysis approaches. This paper presents minimapR, a multi-level parallel long-read alignment tool based on minimap2, a popular third-generation read aligner. MinimapR is built on the high-performance distributed framework Ray. Ray fully integrates with the Python environment and can be easily installed with pip. MinimapR can utilize the power of multiple computing nodes, significantly accelerating alignment without sacrificing sensitivity. The minimapR tool was tested on 64 nodes and demonstrated a 50-fold increase in speed with 78% parallel efficiency. The source code and user manual of minimapR are freely available at https://github.com/Geehome/minimapR.


Subjects
High-Throughput Nucleotide Sequencing , Software , Algorithms , Genome, Human , High-Throughput Nucleotide Sequencing/methods , Humans , Sequence Alignment , Sequence Analysis, DNA/methods
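
The speedup and efficiency figures quoted for minimapR are related by the usual definition of parallel efficiency, speedup divided by the number of workers. The small helper below (its name is illustrative, not part of minimapR) shows the arithmetic.

```python
def parallel_efficiency(speedup, workers):
    """Return parallel efficiency = speedup / number of workers."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    return speedup / workers


# Figures from the abstract: 50-fold speedup on 64 nodes.
efficiency = parallel_efficiency(50, 64)
print(f"{efficiency:.0%}")  # 50 / 64 = 0.78125, printed as 78%
```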
4.
Interdiscip Sci ; 14(1): 1-14, 2022 Mar.
Article in English | MEDLINE | ID: mdl-34487327

ABSTRACT

Rapid advances in sequencing technology have led to an explosion of sequence data. Sequence alignment is the central and fundamental problem in many sequence analysis procedures, and local alignment is often the kernel of these algorithms. The Smith-Waterman algorithm is usually used to find the best subsequence match between given sequences. However, its high time complexity makes it slow in practice. Many approaches have been developed to accelerate and parallelize it, such as vector-level parallelization, thread-level parallelization, process-level parallelization, and heterogeneous acceleration, but current research appears unsystematic, which hinders further work on parallelizing the algorithm. In this paper, we summarize the current state of research on parallel local alignment and describe the data layouts used in these works. Based on this survey, we emphasize large-scale genomic comparisons. By surveying the performance of some typical alignment tools, we discuss possible future directions. We hope our work will give developers of alignment tools a grounding in the underlying technical principles and help researchers choose appropriate alignment tools.


Subjects
Algorithms , Software , Genomics , Sequence Alignment , Sequence Analysis/methods
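
For readers unfamiliar with the kernel that this survey discusses, here is a minimal serial sketch of Smith-Waterman local alignment scoring. The scoring values and the linear gap penalty are illustrative assumptions, and the sketch returns only the best score, not the alignment itself.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Uses a linear gap penalty and keeps only two rows of the dynamic
    programming matrix, so memory is O(len(b)) while time stays
    O(len(a) * len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Each cell depends on its left, upper, and upper-left neighbors, so cells on the same anti-diagonal are independent of one another. That independence is what vector-level parallelization schemes typically exploit.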
5.
BMC Bioinformatics ; 22(1): 344, 2021 Jun 24.
Article in English | MEDLINE | ID: mdl-34167459

ABSTRACT

BACKGROUND: VISPR is an interactive visualization and analysis framework for CRISPR screening experiments. However, it only supports the output of MAGeCK, and requires installation and manual configuration. Furthermore, VISPR is designed to run on a single computer, and data sharing between collaborators is challenging. RESULTS: To make the tool easily accessible to the community, we present VISPR-online, a web-based general application allowing users to visualize, explore, and share CRISPR screening data online with a few simple steps. VISPR-online provides an exploration of screening results and visualization of read count changes. Apart from MAGeCK, VISPR-online supports two more popular CRISPR screening analysis tools: BAGEL and JACKS. It provides an interactive environment for exploring gene essentiality, viewing guide RNA (gRNA) locations, and allowing users to resume and share screening results. CONCLUSIONS: VISPR-online allows users to visualize, explore and share CRISPR screening data online. It is freely available at http://vispr-online.weililab.org , while the source code is available at https://github.com/lemoncyb/VISPR-online .


Subjects
Clustered Regularly Interspaced Short Palindromic Repeats , Software , Internet , RNA, Guide, Kinetoplastida , Research
6.
IEEE/ACM Trans Comput Biol Bioinform ; 18(4): 1464-1473, 2021.
Article in English | MEDLINE | ID: mdl-31675339

ABSTRACT

Because tumors have a variety of subtypes, personalized treatment requires identifying the subtype of a tumor as accurately as possible. The development of DNA microarrays provides an opportunity to predict tumor classification. One strategy is to use gene expression profiling to extend current biological insights into the disease. However, most machine learning methods overfit when classifying tumor gene expression profile data, which are characterized by high dimensionality, small sample sizes, and nonlinearity. As a newer machine learning method, dictionary learning has become a more effective algorithm for gene expression profile classification. Here, a new method called discriminant projection shared dictionary learning (DPSDL) is proposed for classifying tumor subtypes using LINCS gene expression profile data. The method trains a shared dictionary and embeds Fisher discriminant criteria to obtain class-specific sub-dictionaries and coding coefficients. At the same time, a projection matrix is trained to widen the distance between different classes of samples. Experimental results show that our method classifies gene expression profiles better than other dictionary learning methods and machine learning methods.


Subjects
Computational Biology/methods , Machine Learning , Neoplasms/classification , Transcriptome/genetics , Algorithms , Databases, Genetic , Gene Expression Profiling , Humans , Neoplasms/genetics
7.
BMC Genomics ; 21(Suppl 1): 872, 2020 Mar 05.
Article in English | MEDLINE | ID: mdl-32138651

ABSTRACT

BACKGROUND: The Type II clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated proteins (Cas) system is a powerful genome editing technology that is increasingly popular in gene function analysis. In CRISPR/Cas, RNA guides the Cas nuclease to the target site to perform DNA modification. RESULTS: The performance of CRISPR/Cas depends on well-designed single guide RNA (sgRNA). However, the off-target effect of sgRNA leads to undesired mutations in the genome and limits the use of CRISPR/Cas. Here, we present OffScan, a universal and fast CRISPR off-target detection tool. CONCLUSIONS: OffScan is not limited by the number of mismatches and allows a custom protospacer-adjacent motif (PAM), the motif recognized by the Cas protein at the target site. In addition, OffScan adopts the FM-index, which improves query speed and reduces memory consumption.


Subjects
CRISPR-Cas Systems , Computational Biology/methods , Gene Editing/methods , RNA, Guide, Kinetoplastida/genetics , Algorithms , Animals , Caenorhabditis elegans/genetics , Clustered Regularly Interspaced Short Palindromic Repeats , Endonucleases/metabolism , Humans , Mice , Mutation , Zebrafish/genetics
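
To illustrate the data structure named in the abstract above, the following is a small textbook-style sketch of an FM-index with exact-match backward search. It is not OffScan's implementation: the abstract does not describe how OffScan handles mismatches or PAMs, and this sketch builds the suffix array naively, which is only suitable for short strings. All function names are illustrative.

```python
from collections import Counter


def build_fm_index(text):
    """Build a toy FM-index (first-column offsets and occurrence table)."""
    text += "$"  # sentinel; assumed absent from text and smaller than all symbols
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: the character preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in suffix_array)
    counts = Counter(text)
    # first[c]: number of characters in text strictly smaller than c.
    first, total = {}, 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return first, occ, len(bwt)


def count_occurrences(index, pattern):
    """Count exact occurrences of a non-empty pattern via backward search."""
    first, occ, n = index
    lo, hi = 0, n  # half-open range of matching rows in the sorted suffixes
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo
```

Each query takes time proportional to the pattern length, independent of the genome length, which is the property that makes the FM-index attractive for scanning many candidate sites.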
8.
Article in English | MEDLINE | ID: mdl-29994638

ABSTRACT

Molecular dynamics (MD) is a computer simulation method for studying the physical movements of atoms and molecules that provides detailed microscopic sampling at the molecular scale. With continuous effort and improvement, MD simulation has gained popularity in materials science, biochemistry, and biophysics, with a growing range of application areas and an expanding data scale. Assisted Model Building with Energy Refinement (AMBER) is one of the most widely used software packages for conducting MD simulations. However, the speed of AMBER MD simulations for systems with millions of atoms on the microsecond scale still needs to be improved. In this paper, we propose a parallel acceleration strategy for AMBER on the Tianhe-2 supercomputer. The parallel optimization of AMBER is carried out on three levels: fine-grained OpenMP parallelization on a single CPU, single-node CPU/MIC parallel optimization, and multi-node multi-MIC collaborative parallel acceleration. With these three levels of parallel acceleration, we achieved a peak speedup of 25-33 times compared with the original program.


Subjects
Computational Biology/instrumentation , Computational Biology/methods , Molecular Dynamics Simulation , Algorithms , Computers
9.
Brief Funct Genomics ; 18(1): 41-57, 2019 02 14.
Article in English | MEDLINE | ID: mdl-30265280

ABSTRACT

Omics fields, such as genomics, transcriptomics, and proteomics, have been affected by the era of big data. The huge amount of high-dimensional, complexly structured data means that conventional machine learning algorithms are no longer applicable. Fortunately, deep learning technology can contribute toward resolving these challenges. There is evidence that deep learning can handle omics data well and resolve omics problems. This survey aims to provide an entry-level guideline for researchers to understand and use deep learning to solve omics problems. We first introduce several deep learning models and then discuss several research areas that have combined omics and deep learning in recent years. In addition, we summarize the general steps involved in using deep learning, which have not yet been systematically discussed in the existing literature on this topic. Finally, we compare the features and performance of current mainstream open-source deep learning frameworks and present the opportunities and challenges involved in deep learning. This survey will be a good starting point and guideline for omics researchers to understand deep learning.


Subjects
Computational Biology/methods , Deep Learning , Genomics/methods , Proteomics/methods , Transcriptome , Algorithms , Guidelines as Topic , Humans , Surveys and Questionnaires
10.
Article in English | MEDLINE | ID: mdl-30387739

ABSTRACT

Co-evolution exists ubiquitously in biological systems. At the molecular level, interacting proteins, such as ligands and their receptors and components in protein complexes, co-evolve to maintain their structural and functional interactions. Many proteins contain multiple functional domains interacting with different partners, making co-evolution of interacting domains occur more prominently. Multiple methods have been developed to predict interacting proteins or domains within proteins by detecting their co-variation. This strategy neglects the fact that interacting domains can be highly co-conserved due to their functional interactions. Here we report a novel algorithm COPCOP to detect signals of both co-positive selection (co-variation) and co-purifying selection (co-conservation). Results show that our algorithm performs well and outperforms the popular co-variation analysis program CAPS. We also design and implement a multi-level parallel acceleration strategy for COPCOP based on Tianhe-2 CPU-MIC heterogeneous supercomputer system to meet the need of large-scale co-evolutionary domain detection.

11.
BMC Bioinformatics ; 19(Suppl 9): 282, 2018 Aug 13.
Article in English | MEDLINE | ID: mdl-30367570

ABSTRACT

BACKGROUND: Detection of novel sequence motifs is becoming increasingly essential in computational biology. However, high computational cost greatly constrains the efficiency of most motif discovery algorithms. RESULTS: In this paper, we accelerate the MEME algorithm on the Intel Many Integrated Core (MIC) Architecture and present a parallel implementation of MEME, called MIC-MEME, based on a hybrid CPU/MIC computing framework. Our method focuses on parallelizing the starting-point search and improving the iteration updating strategy of the algorithm. MIC-MEME achieved average speedups of 26.6 for the ZOOPS model and 30.2 for the OOPS model in overall runtime when benchmarked on an experimental platform with two Xeon Phi 3120 coprocessors. CONCLUSIONS: Furthermore, MIC-MEME has been compared with state-of-the-art methods and shows good scalability with respect to dataset size and the number of MICs. Source code: https://github.com/hkwkevin28/MIC-MEME .


Subjects
Computational Biology/methods , Nucleotide Motifs , Promoter Regions, Genetic , Regulatory Elements, Transcriptional , Software , Algorithms , Computer Graphics , Databases, Genetic , Humans , Internet , Transcription Factors/metabolism
12.
BMC Bioinformatics ; 19(Suppl 4): 98, 2018 05 08.
Article in English | MEDLINE | ID: mdl-29745832

ABSTRACT

BACKGROUND: Frequent subgraph mining is a significant problem in many practical domains. Solutions to this kind of problem can be used in large-scale drug molecule or biological libraries to help find drugs or core biological structures rapidly and to predict the toxicity of unknown compounds. The main challenge is efficiency, as (i) it is computationally intensive to test for graph isomorphisms, and (ii) the graph collection to be mined and the mining results can be very large. Existing solutions often require days to derive mining results from biological networks, even with a relatively low support threshold. In addition, the full mining results often cannot be stored in the memory of a single node. RESULTS: In this paper, we implement a parallel acceleration tool for the classical frequent subgraph mining algorithm, called cmFSM. The core idea is to parallelize extension tasks in order to reduce computation time. We also employ a multi-node strategy to overcome memory constraints. The parallel optimization of cmFSM is carried out on three levels: fine-grained OpenMP parallelization on a single node, multi-node multi-process parallel acceleration, and CPU-MIC collaborative parallel optimization. CONCLUSIONS: Evaluation results show that cmFSM clearly outperforms existing state-of-the-art miners even with only a few parallel computing resources. This means that cmFSM provides a practical solution to the frequent subgraph mining problem when the number of mining results is huge. Specifically, our solution is up to one order of magnitude faster than the best CPU-based approach on a single node and shows promising scalability for massive mining tasks in multi-node scenarios. Source code: https://github.com/ysycloud/cmFSM .


Subjects
Algorithms , Data Mining , Drug Evaluation, Preclinical , Software , Databases as Topic
13.
Interdiscip Sci ; 10(2): 455-465, 2018 Jun.
Article in English | MEDLINE | ID: mdl-29644494

ABSTRACT

The adaptive immune system of bacteria and archaea, Clustered Regularly Interspaced Short Palindromic Repeats/CRISPR-associated (CRISPR/Cas), has been adapted as a powerful gene editing tool and has found broad application in genome research owing to its ease of use and cost-effectiveness. The performance of CRISPR/Cas relies on well-designed single-guide RNA (sgRNA), so many bioinformatic tools have been developed to assist the design of highly active and specific sgRNA. These tools vary in design specifications, parameters, supported genomes, and so on. To help researchers choose appropriate tools, we reviewed various sgRNA design tools, focusing mainly on their on-target efficiency prediction models and off-target detection algorithms.


Subjects
CRISPR-Cas Systems/genetics , Genetic Techniques , RNA, Guide, Kinetoplastida/metabolism , Animals , Gene Editing , Humans
14.
IEEE Trans Neural Netw Learn Syst ; 29(6): 2337-2351, 2018 06.
Article in English | MEDLINE | ID: mdl-28436893

ABSTRACT

As data sets become larger and more complicated, an extreme learning machine (ELM) that runs in a traditional serial environment cannot realize its ability to be fast and effective. Although a parallel ELM (PELM) based on MapReduce learns from large-scale data more efficiently than identical ELM algorithms in a serial environment, some operations, such as storing intermediate results on disk and keeping multiple copies for each task, are unavoidable, and these operations create a large amount of extra overhead and degrade the learning speed and efficiency of PELMs. In this paper, an efficient ELM based on the Spark framework (SELM), which includes three parallel subalgorithms, is proposed for big data classification. By partitioning the corresponding data sets reasonably, the hidden layer output matrix calculation algorithm and the two matrix decomposition algorithms perform most of the computations locally. At the same time, they retain the intermediate results in distributed memory and cache the diagonal matrix as broadcast variables instead of keeping several copies for each task, which greatly reduces costs and strengthens the learning ability of the SELM. Finally, we implement our SELM algorithm to classify large data sets. Extensive experiments have been conducted to validate the effectiveness of the proposed algorithms. As shown, our SELM achieves speedups on clusters with 10, 15, 20, 25, 30, and 35 nodes.

15.
J Comput Biol ; 24(12): 1230-1242, 2017 Dec.
Article in English | MEDLINE | ID: mdl-29116822

ABSTRACT

Multiple sequence alignment (MSA) is an essential prerequisite and dominant method for deducing biological facts from a set of molecular biological sequences. It refers to a series of algorithmic solutions for the alignment of evolutionarily related sequences while taking into account evolutionary events such as mutations, insertions, deletions, and rearrangements under certain conditions. These methods can be applied to DNA, RNA, or protein sequences. In this work, we take advantage of a center-star strategy to reduce the MSA problem to pairwise alignments, and we use a suffix tree to match identical substrings between the two sequences of each pair. Multiple sequence alignment based on a suffix tree and center-star strategy (MASC) can accomplish MSA in O(mn), which is linear time complexity, where m is the number of sequences and n is the average length of sequences. Furthermore, we execute our method on the Spark distributed parallel framework to deal with ever-increasing massive data sets. Our method is significantly faster than previous techniques, with no loss in accuracy for highly similar nucleotide sequences such as homologous sequences, which we demonstrate experimentally. Compared with mainstream MSA tools (e.g., MAFFT), MASC could finish the alignment of 67,200 sequences, each longer than 10,000 bp, in 9 minutes, a task that takes MAFFT more than 3.5 days.


Subjects
DNA/chemistry , Proteins/chemistry , RNA/chemistry , Sequence Alignment/methods , Software , Algorithms , Computational Biology/methods , Humans
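
The center-star idea mentioned in the abstract above reduces MSA to aligning every sequence against a single center sequence. The sketch below shows only the classic way of choosing that center, by minimizing the summed pairwise distance. It substitutes plain edit distance for MASC's suffix-tree matching, and its all-pairs selection is quadratic in the number of sequences, so it should not be read as the O(mn) method reported in the article. Function names are illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance between a and b, computed with two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def choose_center(sequences):
    """Return the index of the sequence with the smallest total distance
    to all other sequences (ties go to the earliest index)."""
    best_index, best_total = 0, None
    for i, candidate in enumerate(sequences):
        total = sum(
            edit_distance(candidate, other)
            for j, other in enumerate(sequences)
            if j != i
        )
        if best_total is None or total < best_total:
            best_index, best_total = i, total
    return best_index
```

Once a center is fixed, only m - 1 pairwise alignments (center versus each other sequence) are needed, and these are independent of one another, which makes the strategy a natural fit for a distributed framework.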
16.
BMC Genomics ; 18(Suppl 2): 134, 2017 03 14.
Article in English | MEDLINE | ID: mdl-28361696

ABSTRACT

BACKGROUND: A growing number of studies use whole-genome DNA methylation detection, one of the most important parts of epigenetics research, to find significant relationships between DNA methylation and several typical diseases, such as cancers and diabetes. In many of those studies, mapping bisulfite-treated sequences to the whole genome has been the main method for studying DNA cytosine methylation. However, most current tools suffer from inaccuracy and long running times. RESULTS: In our study, we designed a new DNA methylation prediction tool ("Hint-Hunt") to solve this problem. Using an optimal complex alignment computation and Smith-Waterman matrix dynamic programming, Hint-Hunt can analyze and predict DNA methylation status. However, when Hint-Hunt is applied to large-scale datasets, it still suffers from slow speed and low temporal-spatial efficiency. To address the cost of Smith-Waterman dynamic programming and the low temporal-spatial efficiency, we further designed a deeply parallelized whole-genome DNA methylation detection tool ("P-Hint-Hunt") on the Tianhe-2 (TH-2) supercomputer. CONCLUSIONS: To the best of our knowledge, P-Hint-Hunt is the first parallel DNA methylation detection tool with a high speed-up for processing large-scale datasets, and it can run on both CPUs and Intel Xeon Phi coprocessors. Moreover, we deployed and evaluated Hint-Hunt and P-Hint-Hunt on the TH-2 supercomputer at different scales. The experimental results show that our tools eliminate the deviation caused by bisulfite treatment in the mapping procedure and that the multi-level parallel program yields a 48-fold speed-up with 64 threads. P-Hint-Hunt achieves deep acceleration on the CPU and Intel Xeon Phi heterogeneous platform, taking full advantage of both multi-core (CPU) and many-core (Phi) hardware.


Subjects
Chromosome Mapping/methods , Computational Biology/methods , DNA Methylation , Epigenesis, Genetic , Software , Amino Acid Sequence , Base Sequence , Cytosine/metabolism , Genome, Human , Humans , Sequence Alignment , Sequence Analysis, DNA
17.
Interdiscip Sci ; 8(1): 28-34, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26358141

ABSTRACT

Sequence alignment, in which raw sequencing data are mapped to a reference genome, is the central process in sequence analysis. The large amount of data generated by NGS is far beyond the processing capabilities of existing alignment tools. Consequently, sequence alignment becomes the bottleneck of sequence analysis. Intensive computing power is required to address this challenge. Intel recently announced the MIC coprocessor, which can provide massive computing power. Tianhe-2, currently the world's fastest supercomputer, is equipped with three MIC coprocessors per compute node. A key feature of sequence alignment is that different reads are independent. Considering this property, we proposed a MIC-oriented three-level parallelization strategy to speed up BWA, a widely used sequence alignment tool, and developed our ultrafast parallel sequence aligner, B-MIC. B-MIC contains three levels of parallelization: first, parallelization of data I/O and read alignment through a three-stage parallel pipeline; second, parallelization enabled by MIC coprocessor technology; third, inter-node parallelization implemented with MPI. In this paper, we demonstrate that B-MIC outperforms BWA through a combination of these techniques, using an Inspur NF5280M server and the Tianhe-2 supercomputer. To the best of our knowledge, B-MIC is the first sequence alignment tool to run on Intel MIC, and it achieves more than a fivefold speedup over the original BWA while maintaining alignment precision.


Subjects
Computers , Sequence Alignment/instrumentation , Sequence Analysis, DNA/instrumentation , Software , Algorithms
18.
Interdiscip Sci ; 8(2): 169-176, 2016 Jun.
Article in English | MEDLINE | ID: mdl-26403255

ABSTRACT

The de novo assembly of DNA sequences is increasingly important for biological research in the genomic era. More than a decade after the Human Genome Project, some challenges still exist, and new solutions are being explored to improve de novo assembly of genomes. String graph assembler (SGA), based on string graph theory, is a new method/tool developed to address these challenges. In this paper, based on an in-depth analysis of SGA, we prove that SGA-based de novo sequence assembly is an NP-complete problem. According to our analysis, SGA outperforms other similar methods/tools in memory consumption but costs much more time, of which 60-70% is spent on index construction. Building on this analysis, we introduce a hybrid parallel optimization algorithm and implement it in the TianHe-2 parallel framework. Simulations are performed with different datasets. For small data sets the optimized solution is 3.06 times faster than before, and for medium-sized data sets it is 1.60 times faster. The results demonstrate an evident performance improvement, with linear scalability for parallel FM-index construction. These results thus contribute significantly to improving the efficiency of de novo assembly of DNA sequences.


Subjects
Algorithms , Computational Biology/methods , Genomics , Sequence Analysis, DNA , Software