Search | VHL Regional Portal

Bioinformatics approaches for unveiling virus-host interactions.

Iuchi, Hitoshi; Kawasaki, Junna; Kubo, Kento; Fukunaga, Tsukasa; Hokao, Koki; Yokoyama, Gentaro; Ichinose, Akiko; Suga, Kanta; Hamada, Michiaki.

Comput Struct Biotechnol J ; 21: 1774-1784, 2023.

Article in English | MEDLINE | ID: mdl-36874163

ABSTRACT

The coronavirus disease-2019 (COVID-19) pandemic has exposed major limitations in the capacity of medical and research institutions to manage emerging infectious diseases appropriately. Unveiling virus-host interactions through host range prediction and protein-protein interaction prediction can improve our understanding of infectious diseases. Although many algorithms have been developed to predict virus-host interactions, many issues remain unsolved, and the overall interaction network remains largely unknown. In this review, we comprehensively survey algorithms used to predict virus-host interactions. We also discuss current challenges, such as dataset biases toward highly pathogenic viruses, and potential solutions. Complete prediction of virus-host interactions remains difficult; however, bioinformatics can contribute to progress in research on infectious diseases and human health.

Representation learning applications in biological sequence analysis.

Iuchi, Hitoshi; Matsutani, Taro; Yamada, Keisuke; Iwano, Natsuki; Sumi, Shunsuke; Hosoda, Shion; Zhao, Shitao; Fukunaga, Tsukasa; Hamada, Michiaki.

Comput Struct Biotechnol J ; 19: 3198-3208, 2021.

Article in English | MEDLINE | ID: mdl-34141139

ABSTRACT

Although remarkable advances have been reported in high-throughput sequencing, the ability to appropriately analyze the substantial amounts of rapidly generated biological (DNA/RNA/protein) sequencing data remains a critical hurdle. To tackle this issue, the application of natural language processing (NLP) to biological sequence analysis has received increasing attention. In this approach, biological sequences are regarded as sentences, while individual nucleotides/amino acids or k-mers in these sequences are regarded as words. Embedding, the conversion of these words into vectors, is an essential step in NLP; representation learning is an approach used for this transformation, and it can be applied to biological sequences. Vectorized biological sequences can then be used for function and structure estimation, or as input to other probabilistic models. Given the importance of, and growing trend toward, applying representation learning to biological research, in the present study we review existing knowledge of representation learning for biological sequence analysis.
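To make the sentence/word analogy concrete, the following is a minimal Python sketch, not taken from the review, of how a DNA sequence might be split into overlapping k-mer "words", mapped to integer IDs, and looked up in an embedding table. The function names (`kmer_tokenize`, `build_vocab`, `init_embeddings`, `embed_sequence`) and the choices of k = 3, stride 1, and 4-dimensional vectors are illustrative assumptions. The embedding table here is randomly initialized and untrained; a representation learning method such as word2vec would instead learn these vectors from data.

```python
import random


def kmer_tokenize(seq, k=3, stride=1):
    """Split a sequence (the 'sentence') into k-mers (the 'words').

    With stride=1 the k-mers overlap; stride=k gives non-overlapping k-mers.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(sentences):
    """Assign each distinct k-mer an integer ID in order of first appearance."""
    vocab = {}
    for words in sentences:
        for w in words:
            vocab.setdefault(w, len(vocab))
    return vocab


def init_embeddings(vocab, dim=4, seed=0):
    """Hypothetical untrained lookup table: one random dense vector per k-mer.

    Representation learning (e.g. word2vec) would adjust these values during training.
    """
    rng = random.Random(seed)
    return {w: [rng.gauss(0.0, 1.0) for _ in range(dim)] for w in vocab}


def embed_sequence(words, table):
    """Average the k-mer vectors to obtain one fixed-length vector per sequence."""
    if not words or not table:
        raise ValueError("need at least one k-mer and a non-empty table")
    dim = len(next(iter(table.values())))
    total = [0.0] * dim
    for w in words:
        for j, x in enumerate(table[w]):
            total[j] += x
    return [x / len(words) for x in total]


if __name__ == "__main__":
    sentences = [kmer_tokenize("ATGCGTAC"), kmer_tokenize("ATGCCC")]
    vocab = build_vocab(sentences)
    table = init_embeddings(vocab)
    print(sentences[0])  # ['ATG', 'TGC', 'GCG', 'CGT', 'GTA', 'TAC']
    print(embed_sequence(sentences[0], table))
```

Averaging k-mer vectors is only one simple way to obtain a fixed-length representation of a whole sequence, and it discards k-mer order.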

Jonckheere-Terpstra-Kendall-based non-parametric analysis of temporal differential gene expression.

Iuchi, Hitoshi; Hamada, Michiaki.

NAR Genom Bioinform ; 3(1): lqab021, 2021 Mar.

Article in English | MEDLINE | ID: mdl-33796851

ABSTRACT

Time-course experiments using massively parallel sequencers have the potential to uncover gradual changes in cells over time that cannot be observed in a two-point comparison. An essential step in time-series data analysis is the identification of temporal differentially expressed genes (TEGs) under two conditions (e.g. control versus case). Model-based approaches, which are typical TEG detection methods, often fix a single parameter (e.g. polynomial degree or degrees of freedom) for an entire dataset. This risks modeling linearly increasing genes with higher-order functions or fitting cyclic gene expression with linear functions, thereby leading to false positives or false negatives. Here, we present a Jonckheere-Terpstra-Kendall (JTK)-based non-parametric algorithm for TEG detection. Benchmarks using simulation data show that the JTK-based approach outperforms existing methods, especially in long time-series experiments. Additionally, applying JTK to time-series RNA-seq data from seven tissue types across developmental stages in mouse and rat suggested that JTK identifies TEGs on the basis of differences in the wave pattern of expression, not differences in expression levels. This suggests that JTK is a suitable algorithm when the focus is on expression patterns over time rather than expression levels, such as in comparisons between different species. These results show that JTK is an excellent candidate for TEG detection.
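The "Kendall" in JTK refers to Kendall's rank correlation (tau). The sketch below is a simplified illustration, not the authors' TEG algorithm: it scores an expression profile against hypothetical cosine reference waveforms at different phases using Kendall's tau-a, in the spirit of the rhythm-detection method JTK_CYCLE. The Jonckheere-Terpstra component and the two-condition comparison are omitted, and the names `kendall_tau`, `reference_wave`, and `best_phase` are hypothetical. Because tau depends only on the ordering of values, the score reflects the shape of a profile rather than its expression level, which is the property highlighted above.

```python
import math
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / total pairs.

    Pairs tied in either series contribute 0. Only the ordering of values
    matters, so an increasing rescaling or shift of a series leaves tau unchanged.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length >= 2")
    score = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        if prod > 0:
            score += 1  # concordant pair
        elif prod < 0:
            score -= 1  # discordant pair
    return score / (n * (n - 1) / 2)


def reference_wave(n_points, period, phase):
    """Hypothetical reference pattern: a cosine peaking at time index `phase`."""
    return [math.cos(2 * math.pi * (t - phase) / period) for t in range(n_points)]


def best_phase(profile, period):
    """Score the profile against every integer phase; return (phase, tau) with the highest tau."""
    scores = [
        (p, kendall_tau(profile, reference_wave(len(profile), period, p)))
        for p in range(period)
    ]
    return max(scores, key=lambda s: s[1])


if __name__ == "__main__":
    profile = reference_wave(12, 12, 3)           # a profile peaking at t = 3
    scaled = [10 * v + 100 for v in profile]      # same shape, different level
    print(best_phase(profile, 12)[0], best_phase(scaled, 12)[0])
```

For example, a profile peaking at time index 3 is matched to phase 3, and multiplying it by 10 and adding 100 (a change in expression level only) selects the same phase.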

MICOP: Maximal information coefficient-based oscillation prediction to detect biological rhythms in proteomics data.

Iuchi, Hitoshi; Sugimoto, Masahiro; Tomita, Masaru.

BMC Bioinformatics ; 19(1): 249, 2018 Jun 28.

Article in English | MEDLINE | ID: mdl-29954316

ABSTRACT

BACKGROUND: Circadian rhythms consist of oscillating molecular interactions, and disruption of their homeostasis can cause various disorders. To understand this phenomenon systematically, an accurate technique for identifying oscillating molecules in omics datasets is needed; however, its development is still impeded by many difficulties, such as experimental noise and attenuated amplitude.

RESULTS: To address these issues, we developed a new algorithm, Maximal Information Coefficient-based Oscillation Prediction (MICOP), a sine curve-matching method. The performance of MICOP in labeling time series as oscillating or non-oscillating was compared with that of four reported methods using Matthews correlation coefficient (MCC) values. The numerical experiments used time-series data that (1) mimicked decaying molecular oscillations, (2) had high noise and low sampling frequency, and (3) covered only one cycle. The first experiment revealed that MICOP could accurately identify the rhythmicity of decaying molecular oscillations (MCC > 0.7). The second revealed that MICOP was robust against high-level noise (MCC > 0.8), even with low-sampling-frequency data. The third revealed that MICOP could accurately identify the rhythmicity of noisy one-cycle data (MCC > 0.8). As an application, we used MICOP to analyze time-series proteome data from mouse liver. MICOP identified 14 and 30 novel oscillating candidates for C57BL/6 and C57BL/6J, respectively.

CONCLUSIONS: In this paper, we presented MICOP, an MIC-based algorithm for predicting periodic patterns in large-scale time-resolved protein expression profiles. A performance test using artificially generated simulation data revealed that MICOP performed better on decaying data than existing widely used methods. It can reveal novel findings from time-series data and may contribute to biologically significant results. This study suggests that MICOP is an ideal approach for detecting and characterizing oscillations in time-resolved omics datasets.
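The Matthews correlation coefficient summarizes a binary confusion matrix as MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)) and ranges from −1 to 1. A minimal Python sketch of how oscillating (1) versus non-oscillating (0) labels might be scored against ground truth follows; the function names are hypothetical, and returning 0 when the denominator is zero is a common convention rather than something stated in the abstract.

```python
import math


def confusion_counts(truth, pred):
    """Count TP, TN, FP, FN for binary labels (1 = oscillating, 0 = not)."""
    if len(truth) != len(pred):
        raise ValueError("truth and pred must have the same length")
    tp = sum(1 for t, p in zip(truth, pred) if t and p)
    tn = sum(1 for t, p in zip(truth, pred) if not t and not p)
    fp = sum(1 for t, p in zip(truth, pred) if not t and p)
    fn = sum(1 for t, p in zip(truth, pred) if t and not p)
    return tp, tn, fp, fn


def mcc(truth, pred):
    """Matthews correlation coefficient of predicted vs. true labels."""
    tp, tn, fp, fn = confusion_counts(truth, pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention: a row or column of the confusion matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    truth = [1, 1, 1, 1, 0, 0, 0, 0]
    pred = [1, 1, 1, 0, 0, 0, 0, 1]  # one missed oscillator, one false call
    print(mcc(truth, pred))  # 0.5
```

For instance, with eight series of which four truly oscillate, one missed oscillator and one false call give TP = 3, TN = 3, FP = 1, FN = 1 and MCC = (9 − 1)/16 = 0.5.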

Subject(s)

Circadian Rhythm/genetics, Proteomics/methods, Humans
