
Inter-domain linker prediction using amino acid compositional index.

Shatnawi, Maad; Zaki, Nazar.

Comput Biol Chem; 55: 23-30, 2015 Apr.

Article in English | MEDLINE | ID: mdl-25677918

ABSTRACT

Protein chains are generally long and consist of multiple domains. Domains are distinct structural units of a protein that can evolve and function independently. The accurate and reliable prediction of protein domain linkers and boundaries is often considered to be the initial step of protein tertiary structure and function predictions. In this paper, we introduce CISA as a method for predicting inter-domain linker regions solely from the amino acid sequence information. The method first computes the amino acid compositional index from the protein sequence dataset of domain-linker segments and the amino acid composition. A preference profile is then generated by calculating the average compositional index values along the amino acid sequence using a sliding window. Finally, the protein sequence is segmented into intervals and a simulated annealing algorithm is employed to enhance the prediction by finding the optimal threshold value for each segment that separates domains from inter-domain linkers. The method was tested on two standard protein datasets and showed considerable improvement over the state-of-the-art domain linker prediction methods.
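The first two steps described above (a per-residue compositional index, then a sliding-window preference profile) can be sketched as follows. This is a minimal illustration, not the authors' CISA implementation: the abstract does not give the index formula, so the log-ratio form `-ln(f_linker / f_domain)`, the pseudocount, the window handling at sequence ends, and all function names are assumptions. The simulated annealing step that tunes a threshold per segment is not reproduced here; a single fixed threshold stands in for it.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def compositional_index(linker_segments, domain_segments, pseudocount=1.0):
    """Per-residue index, ASSUMED to be -ln(f_linker / f_domain).

    Negative values mark residues that are more frequent in linker
    segments than in domain segments. The pseudocount avoids log(0).
    """
    linker_counts = Counter("".join(linker_segments))
    domain_counts = Counter("".join(domain_segments))
    smoothing = pseudocount * len(AMINO_ACIDS)
    linker_total = sum(linker_counts[a] for a in AMINO_ACIDS) + smoothing
    domain_total = sum(domain_counts[a] for a in AMINO_ACIDS) + smoothing

    index = {}
    for a in AMINO_ACIDS:
        f_linker = (linker_counts[a] + pseudocount) / linker_total
        f_domain = (domain_counts[a] + pseudocount) / domain_total
        index[a] = -math.log(f_linker / f_domain)
    return index


def preference_profile(sequence, index, window=5):
    """Average the index over a centered sliding window.

    Windows are truncated at the sequence ends (an assumption);
    residues missing from the index contribute 0.0.
    """
    half = window // 2
    values = [index.get(a, 0.0) for a in sequence]
    profile = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        profile.append(sum(values[lo:hi]) / (hi - lo))
    return profile


def label_linkers(profile, threshold):
    """Flag positions whose averaged index falls below the threshold."""
    return [value < threshold for value in profile]


# Toy data, invented for illustration only.
toy_index = compositional_index(["PPPP"], ["AAAA"])
toy_profile = preference_profile("AAPPPAA", toy_index, window=3)
toy_labels = label_linkers(toy_profile, threshold=0.0)
```

In this sketch a lower profile value means a more linker-like neighborhood, so the threshold separates linker positions (below) from domain positions (above). In the method described in the abstract, that threshold is optimized separately for each sequence segment rather than fixed.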

Subject(s)

Amino Acid Sequence; Software; Computer Simulation; Machine Learning; Models, Molecular; Protein Conformation; Protein Structure, Tertiary

Protein inter-domain linker prediction using Random Forest and amino acid physiochemical properties.

Shatnawi, Maad; Zaki, Nazar; Yoo, Paul D.

BMC Bioinformatics; 15 Suppl 16: S8, 2014.

Article in English | MEDLINE | ID: mdl-25521329

ABSTRACT

BACKGROUND: Protein chains are generally long and consist of multiple domains. Domains are distinct structural units of a protein that can evolve and function independently. The accurate prediction of protein domain linkers and boundaries is often regarded as the initial step of protein tertiary structure and function predictions. Such information not only enhances protein-targeted drug development but also reduces the experimental cost of protein analysis by allowing researchers to work on a set of smaller and independent units. In this study, we propose a novel and accurate domain-linker prediction approach based on protein primary structure information only. We utilize a nature-inspired machine-learning model called Random Forest along with a novel domain-linker profile that contains physiochemical and domain-linker information of amino acid sequences.

RESULTS: The proposed approach was tested on two well-known benchmark protein datasets and achieved 68% sensitivity and 99% precision, which is better than any existing protein domain-linker predictor. Without applying any data balancing technique such as class weighting and data re-sampling, the proposed approach is able to accurately classify inter-domain linkers from highly imbalanced datasets.

CONCLUSION: Our experimental results prove that the proposed approach is useful for domain-linker identification in highly imbalanced single- and multi-domain proteins.
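Because linker residues are a small minority of all residues, the abstract reports sensitivity and precision rather than plain accuracy. The sketch below shows how those two measures are computed from per-residue labels and why accuracy alone can mislead on imbalanced data. The function name and the toy labels are invented for illustration; they are not the paper's data or results.

```python
def sensitivity_precision(true_labels, predicted_labels):
    """Return (sensitivity, precision) for boolean per-residue labels.

    True marks a linker residue. Sensitivity = TP / (TP + FN);
    precision = TP / (TP + FP). Undefined ratios are reported as 0.0.
    """
    tp = fp = fn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth and predicted:
            tp += 1
        elif predicted:
            fp += 1
        elif truth:
            fn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, precision


def accuracy(true_labels, predicted_labels):
    """Fraction of positions where prediction and truth agree."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


# Toy, hypothetical labels: 100 residues, 10 of them linkers.
truth = [True] * 10 + [False] * 90

# A predictor that labels every residue as domain.
all_domain = [False] * 100

# A predictor that finds 7 of the 10 linkers with no false positives.
partial = [True] * 7 + [False] * 93

trivial_accuracy = accuracy(truth, all_domain)
trivial_scores = sensitivity_precision(truth, all_domain)
partial_scores = sensitivity_precision(truth, partial)
```

On this toy set the all-domain predictor agrees with the truth at 90 of 100 positions yet detects no linkers at all, which is why sensitivity and precision on the linker class are the more informative measures for this task.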

Subject(s)

Algorithms; Amino Acids/chemistry; Models, Statistical; Proteins/chemistry; Sequence Analysis, Protein/methods; Amino Acid Sequence; Chemical Phenomena; Datasets as Topic; Humans; Hydrophobic and Hydrophilic Interactions; Molecular Sequence Data; Sequence Homology, Amino Acid
