Search | VHL Regional Portal

1.

ProtTrans and multi-window scanning convolutional neural networks for the prediction of protein-peptide interaction sites.

Le, Van-The; Zhan, Zi-Jun; Vu, Thi-Thu-Phuong; Malik, Muhammad-Shahid; Ou, Yu-Yen.

J Mol Graph Model ; 130: 108777, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38642500

ABSTRACT

This study delves into the prediction of protein-peptide interactions using advanced machine learning techniques, comparing sequence-based models, standard CNNs, and traditional classifiers. Leveraging pre-trained language models and multi-view window scanning CNNs, our approach yields significant improvements, with ProtTrans, pre-trained on 2.1 billion protein sequences and 393 billion amino acids, standing out. The integrated model demonstrates remarkable performance, achieving an AUC of 0.856 and 0.823 on the PepBCL Set_1 and Set_2 datasets, respectively. Additionally, it attains a Precision of 0.564 on PepBCL Set_1 and 0.527 on PepBCL Set_2, surpassing the performance of previous methods. Beyond this, we explore the application of this model in cancer therapy, particularly in identifying peptide interactions for selective targeting of cancer cells, and in other fields. The findings of this study contribute to bioinformatics, providing valuable insights for drug discovery and therapeutic development.

Subject(s)

Computational Biology; Neural Networks, Computer; Peptides; Proteins; Peptides/chemistry; Proteins/chemistry; Computational Biology/methods; Humans; Machine Learning; Protein Binding; Binding Sites; Algorithms; Databases, Protein
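
The two metrics quoted in the abstract above, AUC and precision, can be illustrated with a minimal plain-Python sketch. The labels, scores, and the 0.5 decision threshold are made-up toy values, not data from the paper, and the function names `roc_auc` and `precision` are hypothetical helpers introduced only for this illustration.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def precision(labels: Sequence[int], scores: Sequence[float],
              threshold: float = 0.5) -> float:
    """Fraction of predicted positives (score >= threshold) that are true positives."""
    predicted_positive = [y for y, s in zip(labels, scores) if s >= threshold]
    if not predicted_positive:
        return 0.0
    return sum(predicted_positive) / len(predicted_positive)


# Toy per-residue labels (1 = binding site) and hypothetical model scores.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.1]
print(roc_auc(toy_labels, toy_scores))
print(precision(toy_labels, toy_scores))
```

AUC is threshold-free, whereas precision depends on the chosen cut-off, which is one reason studies of this kind usually report both.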

2.

DeepPLM_mCNN: An approach for enhancing ion channel and ion transporter recognition by multi-window CNN based on features from pre-trained language models.

Le, Van-The; Malik, Muhammad-Shahid; Tseng, Yi-Hsuan; Lee, Yu-Cheng; Huang, Cheng-I; Ou, Yu-Yen.

Comput Biol Chem ; 110: 108055, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38555810

ABSTRACT

Accurate classification of membrane proteins like ion channels and transporters is critical for elucidating cellular processes and drug development. We present DeepPLM_mCNN, a novel framework combining Pretrained Language Models (PLMs) and multi-window convolutional neural networks (mCNNs) for effective classification of membrane proteins into ion channels and ion transporters. Our approach extracts informative features from protein sequences by utilizing various PLMs, including TAPE, ProtT5_XL_U50, ESM-1b, ESM-2_480, and ESM-2_1280. These PLM-derived features are then input into an mCNN architecture to learn conserved motifs important for classification. When evaluated on ion transporters, our best-performing model utilizing ProtT5 achieved 90% sensitivity, 95.8% specificity, and 95.4% overall accuracy. For ion channels, we obtained 88.3% sensitivity, 95.7% specificity, and 95.2% overall accuracy using ESM-1b features. Our proposed DeepPLM_mCNN framework demonstrates significant improvements over previous methods on unseen test data. This study illustrates the potential of combining PLMs and deep learning for accurate computational identification of membrane proteins from sequence data alone. Our findings have important implications for membrane protein research and drug development targeting ion channels and transporters. The data and source code in this study are publicly available at the following link: https://github.com/s1129108/DeepPLM_mCNN.

Subject(s)

Ion Channels; Neural Networks, Computer; Ion Channels/metabolism; Ion Channels/chemistry; Deep Learning; Ion Transport
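
The multi-window idea described in the abstract above can be sketched in plain Python: filters of several widths slide over a per-residue feature matrix, and each filter's strongest response is kept (global max pooling). This is only an assumed, simplified reading of a multi-window CNN layer, not the authors' implementation. The window sizes, the all-ones kernels, and the two-dimensional toy features are hypothetical; a trained model would learn its kernels and operate on much wider PLM embeddings.

```python
from typing import Dict, List, Sequence

Matrix = Sequence[Sequence[float]]


def conv1d_max(features: Matrix, kernel: Matrix) -> float:
    """Slide `kernel` (window x dim) along `features` (length x dim).

    Returns the maximum dot-product response over all valid positions,
    i.e. a 1D convolution followed by global max pooling.
    """
    window = len(kernel)
    if len(features) < window:
        raise ValueError("sequence is shorter than the convolution window")
    best = float("-inf")
    for start in range(len(features) - window + 1):
        response = sum(
            features[start + i][d] * kernel[i][d]
            for i in range(window)
            for d in range(len(kernel[i]))
        )
        best = max(best, response)
    return best


def multi_window_features(features: Matrix,
                          kernels_by_window: Dict[int, List[Matrix]]) -> List[float]:
    """Concatenate the pooled responses of every kernel, ordered by window size."""
    pooled = []
    for window in sorted(kernels_by_window):
        for kernel in kernels_by_window[window]:
            pooled.append(conv1d_max(features, kernel))
    return pooled


# Toy input: 5 residues, each with a hypothetical 2-dimensional embedding.
toy_features = [[1, 0], [0, 1], [2, 2], [0, 0], [1, 1]]
# Hypothetical kernels: one all-ones filter for window 2 and one for window 3.
toy_kernels = {
    2: [[[1, 1], [1, 1]]],
    3: [[[1, 1], [1, 1], [1, 1]]],
}
print(multi_window_features(toy_features, toy_kernels))
```

In a full model, the concatenated vector would typically feed one or more dense layers that produce the final class score; using several window widths lets the network respond to motifs of different lengths.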

3.

Categorization of tweets for damages: infrastructure and human damage assessment using fine-tuned BERT model.

Malik, Muhammad Shahid Iqbal; Younas, Muhammad Zeeshan; Jamjoom, Mona Mamdouh; Ignatov, Dmitry I.

PeerJ Comput Sci ; 10: e1859, 2024.

Article in English | MEDLINE | ID: mdl-38435619

ABSTRACT

Identification of infrastructure and human damage assessment tweets is beneficial to disaster management organizations as well as victims during a disaster. Most prior work focused on detecting informative/situational tweets and infrastructure damage; only one study focused on human damage. This study presents a novel approach for detecting damage assessment tweets involving infrastructure and human damage. We investigated the potential of the Bidirectional Encoder Representations from Transformers (BERT) model to learn universal contextualized representations, aiming to demonstrate its effectiveness for binary and multi-class classification of disaster damage assessment tweets. The objective is to exploit a pre-trained BERT as a transfer learning mechanism after fine-tuning important hyper-parameters on the CrisisMMD dataset containing seven disasters. The effectiveness of fine-tuned BERT is compared with five benchmarks and nine comparable models by conducting exhaustive experiments. The findings show that the fine-tuned BERT outperformed all benchmarks and comparable models and achieved state-of-the-art performance, with macro-F1 scores of up to 95.12% and 88% for binary and multi-class classification, respectively. Specifically, the improvement in the classification of human damage is promising.
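
The macro-F1 score reported above averages the per-class F1 scores with equal weight, so a rare class counts as much as a frequent one. A minimal plain-Python sketch follows; the class names and predictions are invented toy data, not taken from CrisisMMD, and `macro_f1` is a hypothetical helper name.

```python
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1, with F1 = 2*TP / (2*TP + FP + FN)."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for cls in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denominator = 2 * tp + fp + fn
        # A class with no true and no predicted instances contributes 0.0.
        f1_scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy three-class example with made-up labels.
toy_true = ["infra", "human", "none", "infra", "human", "none"]
toy_pred = ["infra", "human", "none", "none", "human", "infra"]
print(macro_f1(toy_true, toy_pred))
```

Here the "human" class is classified perfectly while "infra" and "none" each reach an F1 of 0.5, so the macro average is pulled down by the two weaker classes.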

4.

How to detect propaganda from social media? Exploitation of semantic and fine-tuned language models.

Malik, Muhammad Shahid Iqbal; Imran, Tahir; Jamjoom, Mona Mamdouh.

PeerJ Comput Sci ; 9: e1248, 2023.

Article in English | MEDLINE | ID: mdl-37346552

ABSTRACT

Online propaganda is a mechanism to influence the opinions of social media users. It is a growing menace to public health, democratic institutions, and public society. The present study proposes a propaganda detection framework as a binary classification model based on a news repository. Several feature models are explored to develop a robust model, including part-of-speech, LIWC, word uni-gram, Embeddings from Language Models (ELMo), FastText, word2vec, latent semantic analysis (LSA), and char tri-gram feature models. Moreover, fine-tuning of BERT is also performed. Three oversampling methods are investigated to handle the class imbalance of the Qprop dataset. SMOTE Edited Nearest Neighbors (ENN) presented the best results. Fine-tuning BERT revealed that a sequence length of 320 (BERT-320) gives the best model. As a standalone model, the char tri-gram presented superior performance compared to other features. Robust performance is observed for the combinations of char tri-gram + BERT and char tri-gram + word2vec, which outperformed the two state-of-the-art baselines. In contrast to prior approaches, the addition of feature selection further improves the performance, achieving more than 97.60% recall, f1-score, and AUC on the dev and test parts of the dataset. The findings of the present study can be used to organize news articles for various public news websites.
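
The char tri-gram feature model named in the abstract above represents a text by its overlapping three-character substrings. A minimal sketch under stated assumptions: lowercasing and the absence of boundary padding are illustrative choices, since the abstract does not say how the authors preprocess text, and both function names are hypothetical.

```python
from collections import Counter
from typing import List


def char_ngrams(text: str, n: int = 3) -> List[str]:
    """Return all overlapping character n-grams of `text`, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def char_ngram_counts(text: str, n: int = 3) -> Counter:
    """Count character n-grams after lowercasing (an assumed preprocessing step)."""
    return Counter(char_ngrams(text.lower(), n))


print(char_ngrams("propaganda"))
print(char_ngram_counts("Banana"))
```

The resulting counts can serve as a sparse feature vector for a classifier; character-level features of this kind tend to be robust to misspellings and morphological variation.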

5.

Rumour identification on Twitter as a function of novel textual and language-context features.

Ali, Ghulam; Malik, Muhammad Shahid Iqbal.

Multimed Tools Appl ; 82(5): 7017-7038, 2023.

Article in English | MEDLINE | ID: mdl-35974894

ABSTRACT

Social microblogs are one of the popular platforms for information spreading. However, despite their advantages, these platforms are also used to spread rumours. At present, the majority of existing approaches identify rumours at the topic level instead of at the tweet/post level. Moreover, prior studies used sentiment and linguistic features for rumour identification without considering discrete positive and negative emotions and effective part-of-speech features in content-based approaches. Similarly, the majority of prior studies used content-based approaches for feature generation, and recent context-based approaches were not explored. To cope with these challenges, a robust framework for rumour detection at the tweet level is designed in this paper. The model uses word2vec embeddings and bidirectional encoder representations from transformers (BERT) as context-based features, and discrete emotions, linguistic, and metadata characteristics as content-based features. To our knowledge, we are the first to use these features for rumour identification at the tweet/post level. The framework is tested on four real-life Twitter microblog datasets. The results show that the detection model is capable of detecting 97%, 86%, 85%, and 80% of rumours on the four datasets, respectively. In addition, the proposed framework outperformed the three latest state-of-the-art baselines. The BERT model presented the best performance among context-based approaches, and linguistic features performed best among content-based approaches as a stand-alone model. Moreover, the utilization of two-step feature selection further improves the detection model performance.

6.

Identification of offensive language in Urdu using semantic and embedding models.

Hussain, Sajid; Malik, Muhammad Shahid Iqbal; Masood, Nayyer.

PeerJ Comput Sci ; 8: e1169, 2022.

Article in English | MEDLINE | ID: mdl-37346307

ABSTRACT

Automatic identification of offensive/abusive language is necessary to curb unwanted behavior. However, it is more challenging to generalize the solution due to the different grammatical structures and vocabulary of each language. Most prior work targeted Western languages; however, one study targeted a low-resource language (Urdu). That prior study used basic linguistic features and a small dataset. This study designed a new dataset (collected from popular Pakistani Facebook pages) containing 7,500 posts for offensive language detection in Urdu. The proposed methodology used four types of feature engineering models: three are frequency-based and the fourth is an embedding model. The frequency-based features are determined by term frequency-inverse document frequency (TF-IDF), bag-of-words, or word n-gram feature vectors. The fourth is generated by a word2vec model whose Urdu embeddings were trained on a corpus of 196,226 Facebook posts. The experiments demonstrate that the stacking-based ensemble model with word2vec shows the best performance as a standalone model by achieving 88.27% accuracy. In addition, the wrapper-based feature selection method further improves performance. The hybrid combination of TF-IDF, bag-of-words, and word2vec feature models achieved 90% accuracy and 97% AUC. In addition, it outperformed the baseline with an improvement of 3.55% in accuracy, 3.68% in recall, 3.60% in f1-measure, 3.67% in precision, and 2.71% in AUC. The findings of this research provide practical implications for commercial applications and future research.
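
The TF-IDF weighting mentioned in the abstract above can be sketched in plain Python. This uses the textbook form, term frequency times log(N / document frequency); several variants exist (smoothing, normalization), and the abstract does not state which one the authors used. The tokens are placeholder strings and `tf_idf` is a hypothetical helper name.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def tf_idf(docs: Sequence[Sequence[str]]) -> List[Dict[str, float]]:
    """Compute tf * log(N / df) for every term of every tokenized document."""
    n_docs = len(docs)
    document_frequency = Counter()
    for doc in docs:
        # Count each term once per document, however often it repeats.
        document_frequency.update(set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / document_frequency[term])
            for term, count in counts.items()
        })
    return weights


# Toy tokenized posts with placeholder tokens.
toy_docs = [["good", "post"], ["bad", "post"], ["bad", "bad", "word"]]
for row in tf_idf(toy_docs):
    print(row)
```

A term that occurs in every document receives weight zero under this formula, which is how TF-IDF down-weights uninformative common words relative to plain bag-of-words counts.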
