Search | VHL Regional Portal

Implementation of machine learning in DNA barcoding for determining the plant family taxonomy.

Riza, Lala Septem; Zain, Muhammad Iqbal; Izzuddin, Ahmad; Prasetyo, Yudi; Hidayat, Topik; Abu Samah, Khyrina Airin Fariza.

Heliyon ; 9(10): e20161, 2023 Oct.

Article in English | MEDLINE | ID: mdl-37767518

ABSTRACT

DNA barcoding has been used extensively in taxonomy and phylogenetics. Differences in certain DNA sequences can distinguish organisms and help classify them into taxa, and the approach has been applied to taxonomic disputes where morphology alone is insufficient. This research aimed to use hierarchical clustering, an unsupervised machine learning method, to resolve disputes in plant family taxonomy. The case study is Leguminosae, which some authors have historically split into three families (Fabaceae, Caesalpiniaceae, and Mimosaceae) and others treat as a single family (Leguminosae). The study has four phases: (i) data collection, (ii) data preprocessing, (iii) finding the best distance method, and (iv) determining the disputed family. Data were collected from several sources, including the National Center for Biotechnology Information (NCBI), journals, and websites. The validation data, collected from NCBI, were used to determine the best distance method for differentiating families or genera. The data for the Leguminosae case study were collected from journals and a website. In our experiments, the Pearson method was the best distance method for clustering plant ITS sequences, in both accuracy and computational cost. We then used the Pearson method to resolve the disputed family classification within Leguminosae and found that, based on our research, the group should be treated as one family.
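
As a rough illustration of clustering sequences with a Pearson-based distance, the Python sketch below may help. It is not the authors' pipeline: the abstract does not say how sequences are turned into numeric vectors, so the k-mer count profile, the average-linkage rule, the toy sequences, and all function names (`kmer_profile`, `pearson_distance`, `average_linkage`) are assumptions made for this example only.

```python
from itertools import product
from math import sqrt


def kmer_profile(seq, k=2):
    """Count overlapping k-mers over the alphabet ACGT, in a fixed order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip k-mers containing ambiguous bases such as N
            counts[kmer] += 1
    return [counts[m] for m in kmers]


def pearson_distance(x, y):
    """1 - Pearson correlation: 0 if perfectly correlated, 2 if anti-correlated."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("Pearson correlation is undefined for a constant vector")
    return 1.0 - cov / (sx * sy)


def average_linkage(profiles, n_clusters):
    """Naive agglomerative clustering: merge the closest pair of clusters
    (mean pairwise Pearson distance) until n_clusters remain."""
    clusters = [[i] for i in range(len(profiles))]

    def dist(a, b):
        pairs = [(i, j) for i in a for j in b]
        total = sum(pearson_distance(profiles[i], profiles[j]) for i, j in pairs)
        return total / len(pairs)

    while len(clusters) > n_clusters:
        a, b = min(
            ((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))),
            key=lambda pq: dist(clusters[pq[0]], clusters[pq[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy sequences (invented for illustration, not real ITS data).
toy_sequences = ["ATATATATATAT", "ATATATTTATAT", "GCGCGCGCGCGC", "GCGCGGGCGCGC"]
toy_profiles = [kmer_profile(s) for s in toy_sequences]
toy_clusters = average_linkage(toy_profiles, n_clusters=2)
```

With these toy inputs, the two AT-rich sequences end up in one cluster and the two GC-rich sequences in the other. A real analysis would more likely start from aligned ITS sequences and use an established clustering library.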

Automatic generation of short-answer questions in reading comprehension using NLP and KNN.

Riza, Lala Septem; Firdaus, Yahya; Sukamto, Rosa Ariani; Abu Samah, Khyrina Airin Fariza.

Multimed Tools Appl ; : 1-28, 2023 Apr 12.

Article in English | MEDLINE | ID: mdl-37362718

ABSTRACT

Preparing evaluations generally takes a lot of time, especially in devising the questions and answers. Research on automatic question generation is therefore carried out in the hope that it can serve as a tool for generating question and answer sentences and so save that time. This research focuses on automatically generating short-answer questions for reading comprehension using Natural Language Processing (NLP) and K-Nearest Neighbors (KNN). The questions are generated from news articles with reliable grammar. To maintain the quality of the generated questions, machine learning is also used, namely by training on existing questions. In outline, the stages of this research are simple sentence extraction, problem classification, question sentence generation, and finally comparison of candidate questions with training data to determine eligibility. In the experiment, the Grammatical Correctness parameter reached 59.52%, the Answer Existence parameter 95.24%, and the Difficulty Index parameter 34.92%, for an average of 63.23%. This software is therefore a reasonable alternative for automatically creating reading comprehension questions.
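
The final stage, comparing a candidate question with training data to decide eligibility, can be pictured as a k-nearest-neighbour vote. The Python sketch below is only a minimal illustration under stated assumptions: the abstract does not describe the features or distance used, so the token-set representation, the Jaccard distance, the labels, and the tiny training set are all hypothetical.

```python
from collections import Counter


def tokens(text):
    """Lower-case the text, drop question marks, and return the set of words."""
    return set(text.lower().replace("?", " ").split())


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two token sets (0 means identical sets)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def knn_label(candidate, training, k=3):
    """Return the majority label among the k training questions closest to candidate."""
    cand = tokens(candidate)
    ranked = sorted(training, key=lambda item: jaccard_distance(cand, tokens(item[0])))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training questions with hand-assigned labels (illustration only).
training_questions = [
    ("Who wrote the report?", "eligible"),
    ("Who signed the agreement?", "eligible"),
    ("When was the bridge opened?", "eligible"),
    ("The of who is?", "ineligible"),
    ("Is and the when?", "ineligible"),
    ("Of was the and?", "ineligible"),
]

verdict = knn_label("Who opened the bridge?", training_questions, k=3)
```

Here the three nearest training questions are all labelled eligible, so the candidate is accepted. Any practical system would need a much larger training set and richer features than shared words.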

SAX and Random Projection Algorithms for the Motif Discovery of Orbital Asteroid Resonance Using Big Data Platforms.

Riza, Lala Septem; Fazanadi, Muhammad Naufal; Utama, Judhistira Aria; Samah, Khyrina Airin Fariza Abu; Hidayat, Taufiq; Nazir, Shah.

Sensors (Basel) ; 22(14), 2022 Jul 06.

Article in English | MEDLINE | ID: mdl-35890751

ABSTRACT

The phenomenon of big data has occurred in many fields of knowledge, one of which is astronomy. One example of a large dataset in astronomy is the numerically integrated time series of asteroid orbital elements over time spans of millions to billions of years. For example, the mean motion resonance (MMR) data of an asteroid are used to find out how long the asteroid was in a resonance state with a particular planet. For this reason, this research designs a computational model to obtain the mean motion resonance quickly and effectively by modifying and implementing the Symbolic Aggregate Approximation (SAX) algorithm and the random projection algorithm for motif discovery on big data platforms (i.e., Apache Hadoop and Apache Spark). The model has five steps: (i) saving data into the Hadoop Distributed File System (HDFS); (ii) importing files into Resilient Distributed Datasets (RDD); (iii) preprocessing the data; (iv) calculating the motif discovery by executing the User-Defined Function (UDF) program; and (v) gathering the results from the UDF into the HDFS and a .csv file. The results indicated a very significant reduction in computational time when the big data platform was used instead of the standalone method. The proposed computational model obtained an average accuracy of 83% when compared against the SwiftVis software.

Subject(s)

Algorithms, Big Data, Data Collection, Software
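
For readers unfamiliar with SAX, the following Python sketch shows the standard, unmodified form of the transformation on a single in-memory series: z-normalisation, piecewise aggregate approximation (PAA), and mapping of segment means to letters using equiprobable Gaussian breakpoints. It does not reproduce the authors' modified algorithm, the random projection step, or the Hadoop/Spark integration; the function name `sax`, the four-letter alphabet, and the toy series are assumptions for illustration.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def sax(series, n_segments, alphabet="abcd"):
    """Convert a numeric series into a SAX word of n_segments letters.

    Assumes len(series) is divisible by n_segments.
    """
    if len(series) % n_segments != 0:
        raise ValueError("series length must be divisible by n_segments")
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        raise ValueError("cannot z-normalise a constant series")

    # Step 1: z-normalise so the values have mean 0 and standard deviation 1.
    z = [(v - mu) / sigma for v in series]

    # Step 2: PAA, i.e. the mean of each equal-length segment.
    seg_len = len(z) // n_segments
    paa = [mean(z[i * seg_len:(i + 1) * seg_len]) for i in range(n_segments)]

    # Step 3: breakpoints that cut the standard normal into equiprobable regions.
    a = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / a) for i in range(1, a)]

    # Step 4: map each segment mean to the letter of the region it falls in.
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in paa)


# Toy series (invented): a steadily rising signal maps to rising letters.
word = sax([1, 1, 2, 2, 3, 3, 4, 4], n_segments=4)
```

For this toy series the resulting word is `abcd`. On a cluster, a function like this would typically be applied per series or per window inside a distributed job, but that wiring is not shown here.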
