Results 1 - 7 of 7
1.
PLoS One ; 16(3): e0248861, 2021.
Article in English | MEDLINE | ID: mdl-33780482

ABSTRACT

In this paper, we use network approaches to analyze the relations between protein sequence features for the top hierarchical classes of CATH and SCOP. We use fundamental connectivity measures such as correlation (CR), normalized mutual information rate (nMIR), and transfer entropy (TE) to analyze the pairwise relationships between the protein sequence features, and use centrality measures to analyze weighted networks constructed from the relationship matrices. In the centrality analysis, we find both commonalities and differences between the different protein 3D structural classes. Results show that all top hierarchical classes of CATH and SCOP present strong non-deterministic interactions for the composition and arrangement features of Cysteine (C), Methionine (M), and Tryptophan (W), and also for the arrangement features of Histidine (H). The different protein 3D structural classes present different preferences in terms of their centrality distributions and significant features.
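
The pipeline described in this abstract (pairwise relationship matrix, then a weighted network, then a centrality score) can be illustrated with a minimal sketch. It assumes Pearson correlation as the pairwise measure and node strength (the sum of incident edge weights) as the centrality; the abstract does not say which centrality measures were used, and the function names and toy feature values below are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant feature carries no linear relationship
    return cov / math.sqrt(var_x * var_y)


def correlation_network(features):
    """Build an undirected weighted network from a dict of feature columns.

    features maps a feature name to its values across proteins.
    The edge weight is |r| (an assumption made for this sketch).
    """
    names = list(features)
    weights = {name: {} for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            w = abs(pearson(features[a], features[b]))
            weights[a][b] = w
            weights[b][a] = w
    return weights


def strength_centrality(weights):
    """Node strength: the sum of the weights of all edges at each node."""
    return {node: sum(nbrs.values()) for node, nbrs in weights.items()}


# Toy data: three hypothetical sequence features measured on four proteins.
features = {
    "comp_C": [1.0, 2.0, 3.0, 4.0],
    "comp_M": [2.0, 4.0, 6.0, 8.0],
    "arr_H": [1.0, -1.0, 1.0, -1.0],
}
network = correlation_network(features)
strength = strength_centrality(network)
```

Replacing `pearson` with an information-theoretic estimator would give the other relationship matrices mentioned in the abstract; for a directed measure such as transfer entropy the weights would no longer be symmetric.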


Subject(s)
Algorithms , Proteins/chemistry , Amino Acid Sequence , Entropy
2.
Sci Rep ; 10(1): 21773, 2020 Dec 10.
Article in English | MEDLINE | ID: mdl-33303802

ABSTRACT

Protein structure can provide insights that help biologists to predict and understand protein functions and interactions. However, the number of known protein structures has not kept pace with the number of protein sequences determined by high-throughput sequencing. Current techniques used to determine the structure of proteins are complex and require a lot of time to analyze the experimental results, especially for large protein molecules. The limitations of these methods have motivated us to create a new approach for protein structure prediction. Here we describe a new approach to predict protein structures and structure classes from amino acid sequences. Our prediction model performs well in comparison with previous methods when applied to the structural classification of two CATH datasets with more than 5000 protein domains. The average accuracy is 92.5% for structure classification, which is higher than that of previous research. We also used our model to predict four known protein structures with a single amino acid sequence, while many other existing methods could only obtain one possible structure for a given sequence. The results show that our method provides a new, effective, and reliable tool for protein structure prediction research.


Subject(s)
Amino Acids/chemistry , Protein Conformation , Protein Folding , Proteins/chemistry , Amino Acid Sequence , Protein Domains
3.
PLoS One ; 14(12): e0226768, 2019.
Article in English | MEDLINE | ID: mdl-31869390

ABSTRACT

Proteins are diverse in their sequences, structures and functions, so it is important to study the relations between them. In this paper, we survey the relations between protein sequences and their structures. We use the natural vector (NV) and the averaged property factor (APF) features to represent protein sequences as feature vectors, and use the multi-class MSE and the convex hull methods to separate proteins of different structural classes into different regions. We found that proteins from different structural classes are separable by hyper-planes and convex hulls in the natural vector feature space, where the feature vectors of different structural classes are separated into disjoint regions or convex hulls in the high-dimensional feature spaces. The natural vector outperforms the averaged property factor method in identifying the structures, and the convex hull method outperforms the multi-class MSE in separating the feature points. These outcomes confirm the strong connections between protein sequences and their structures, and may imply that the amino acid composition and sequence arrangement represented by the natural vectors have a greater influence on the structures than the averaged physical property factors of the amino acids.
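
To make the natural vector representation concrete, here is a minimal sketch. The abstract does not spell out the definition, so this assumes the commonly used 60-dimensional form: for each of the 20 standard amino acids, the count n_k, the mean position mu_k, and the normalized second central moment D2_k = sum((s_i - mu_k)^2) / (n_k * n), with 1-based positions. The function name and the ordering of the output are choices made for this illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def natural_vector(sequence):
    """Return a 60-dimensional vector: 20 counts, 20 mean positions,
    then 20 normalized second central moments (assumed definition)."""
    seq = sequence.upper()
    n = len(seq)
    counts, means, moments = [], [], []
    for aa in AMINO_ACIDS:
        # 1-based positions at which this residue occurs
        positions = [i + 1 for i, ch in enumerate(seq) if ch == aa]
        n_k = len(positions)
        if n_k == 0:
            # Absent residues contribute zeros in all three blocks
            counts.append(0.0)
            means.append(0.0)
            moments.append(0.0)
            continue
        mu_k = sum(positions) / n_k
        d2_k = sum((p - mu_k) ** 2 for p in positions) / (n_k * n)
        counts.append(float(n_k))
        means.append(mu_k)
        moments.append(d2_k)
    return counts + means + moments


vec = natural_vector("ACA")
```

Each protein then becomes a point in a 60-dimensional space, which is the setting in which separation by hyper-planes or convex hulls can be tested. Non-standard residue codes are ignored in this sketch.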


Subject(s)
Algorithms , Proteins/chemistry , Proteomics , Amino Acid Sequence , Animals , Databases, Protein , Humans , Protein Conformation , Proteins/classification , Proteomics/methods
4.
PLoS One ; 13(12): e0208423, 2018.
Article in English | MEDLINE | ID: mdl-30521578

ABSTRACT

As big data science develops, efficient methods are needed for many kinds of data analysis. Granger causality provides the prime model for quantifying causal interactions. However, this theoretical model does not meet the requirements of real-world data analysis, because real-world time series are diverse and their underlying models are usually unknown. Therefore, model-free measures such as information transfer measures are strongly desired. Here, we propose a multi-scale extension of conditional mutual information measures using the Morlet wavelet, named the WM and WPM. The proposed measures are computationally efficient and interpret information transfer across multiple scales. We use both synthetic data and real-world examples to demonstrate the efficiency of the new methods. The results of the new methods are robust and reliable. In simulation studies, we found that the new methods outperform the wavelet extension of transfer entropy (WTE) in both computational efficiency and accuracy. The features and properties of the proposed measures are also discussed.
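
For orientation, the sketch below shows the plain, single-scale quantities that this abstract builds on: a plug-in estimate of conditional mutual information I(X;Y|Z) for discrete series, and transfer entropy written as a special case of it. This is not the WM or WPM measure, and it involves no wavelet decomposition. The history length of one sample and the function names are assumptions made for this illustration.

```python
import math
from collections import Counter


def conditional_mutual_information(x, y, z):
    """Plug-in estimate of I(X;Y|Z) in bits for discrete sequences.

    Uses I(X;Y|Z) = sum p(x,y,z) * log2( p(x,y,z) p(z) / (p(x,z) p(y,z)) ),
    with probabilities replaced by empirical frequencies.
    """
    n = len(x)
    c_xyz = Counter(zip(x, y, z))
    c_xz = Counter(zip(x, z))
    c_yz = Counter(zip(y, z))
    c_z = Counter(z)
    total = 0.0
    for (a, b, c), count in c_xyz.items():
        # The 1/n factors cancel inside the ratio, so raw counts suffice
        ratio = count * c_z[c] / (c_xz[(a, c)] * c_yz[(b, c)])
        total += (count / n) * math.log2(ratio)
    return total


def transfer_entropy(source, target):
    """Transfer entropy from source to target with history length 1
    (an assumption): I(target[t+1]; source[t] | target[t])."""
    return conditional_mutual_information(target[1:], source[:-1], target[:-1])


# Toy example: the target copies the source with a one-step delay.
src = [0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
tgt = [0] + src[:-1]
te_forward = transfer_entropy(src, tgt)
```

Plug-in estimates of this kind are biased for short series and require discretized data, which is one reason more elaborate estimators are used in practice.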


Subject(s)
Big Data , Computer Simulation , Algorithms , Entropy , Humans , Models, Theoretical
5.
PLoS One ; 12(3): e0174386, 2017.
Article in English | MEDLINE | ID: mdl-28350835

ABSTRACT

Protein classification is one of the critical problems in bioinformatics. Early studies used geometric distances and phylogenetic trees to classify proteins. These methods use binary trees to present protein classification. In this paper, we propose a new protein classification method, whereby theories of information and networks are used to classify the multivariate relationships of proteins. In this study, the protein universe is modeled as an undirected network, where proteins are classified according to their connections. Our method is unsupervised, multivariate, and alignment-free. It can be applied to the classification of both protein sequences and structures. Nine examples are used to demonstrate the efficiency of our new method.


Subject(s)
Algorithms , Proteins/classification , Proteomics/methods , Animals , HIV/chemistry , HIV Infections/virology , Humans , Influenza A virus/chemistry , Mitochondrial Proteins/chemistry , Mitochondrial Proteins/classification , Multivariate Analysis , Orthomyxoviridae Infections/virology , Protein Conformation , Protein Kinase C/chemistry , Protein Kinase C/classification , Proteins/chemistry , Viral Proteins/chemistry , Viral Proteins/classification , beta-Globins/chemistry , beta-Globins/classification
6.
Mol Phylogenet Evol ; 98: 271-9, 2016 May.
Article in English | MEDLINE | ID: mdl-26926946

ABSTRACT

The free-living SAR11 clade is a globally abundant group of oceanic Alphaproteobacteria, with small genome sizes and rich genomic A+T content. However, the taxonomy of SAR11 has become controversial recently. Some researchers argue that SAR11 is a sister group to Rickettsiales. Other researchers advocate that SAR11 is located within free-living lineages of Alphaproteobacteria. Here, we use the natural vector representation method to identify the evolutionary origin of the SAR11 clade. This alignment-free method does not depend on any model assumptions. With this approach, the correspondence between proteome sequences and their natural vectors is one-to-one. After fixing a set of proteins, each bacterium is represented by a set of vectors. The Hausdorff distance is then used to compute the dissimilarity between two bacteria. The phylogenetic tree can be reconstructed based on these distances. Using our method, we systematically analyze four data sets of alphaproteobacterial proteomes in order to reconstruct the phylogeny of Alphaproteobacteria. From this we can see that the phylogenetic position of the SAR11 group is within a group of other free-living lineages of Alphaproteobacteria.
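
The distance step described above, in which each bacterium is a set of vectors and two sets are compared with the Hausdorff distance, can be sketched as follows. Euclidean distance between individual vectors is an assumption of this sketch, and the function names and toy point sets are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def directed_hausdorff(set_a, set_b):
    """Largest distance from a point of set_a to its nearest point in set_b."""
    return max(min(euclidean(a, b) for b in set_b) for a in set_a)


def hausdorff(set_a, set_b):
    """Symmetric Hausdorff distance between two non-empty sets of vectors."""
    return max(directed_hausdorff(set_a, set_b),
               directed_hausdorff(set_b, set_a))


# Toy example: each "organism" is a small set of 2-D vectors.
organism_1 = [(0.0, 0.0), (1.0, 0.0)]
organism_2 = [(0.0, 0.0), (4.0, 0.0)]
d = hausdorff(organism_1, organism_2)  # 3.0
```

Computing `hausdorff` for every pair of organisms gives a distance matrix, which a standard distance-based tree-building method can take as input.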


Subject(s)
Alphaproteobacteria/classification , Aquatic Organisms/classification , Phylogeny , Alphaproteobacteria/genetics , Alphaproteobacteria/metabolism , Aquatic Organisms/genetics , Aquatic Organisms/metabolism , Bacterial Proteins/metabolism , Proteome/metabolism
7.
PLoS One ; 9(12): e112776, 2014.
Article in English | MEDLINE | ID: mdl-25489852

ABSTRACT

We present an EEG study of two music improvisation experiments. Professional musicians with a high level of improvisation skill were asked to perform music either according to notes (composed music) or in improvisation. Each piece of music was performed in two different modes: strict mode and "let-go" mode. Synchronized EEG data was measured from both musicians and listeners. We used one of the most reliable causality measures, conditional Mutual Information from Mixed Embedding (MIME), to analyze directed correlations between different EEG channels, and combined it with network theory to construct both intra-brain and cross-brain networks. Differences were identified in intra-brain neural networks between composed music and improvisation and between strict mode and "let-go" mode. Particular brain regions such as frontal, parietal and temporal regions were found to play a key role in differentiating the brain activities between different playing conditions. By comparing the level of degree centralities in intra-brain neural networks, we found a difference between the response of musicians and the listeners when comparing the different playing conditions.


Subject(s)
Brain/physiology , Electroencephalography , Music , Nerve Net/physiology , Humans , Signal Processing, Computer-Assisted