Search | VHL Regional Portal

Genome assembly composition of the String "ACGT" array: a review of data structure accuracy and performance challenges.

Magdy Mohamed Abdelaziz Barakat, Sherif; Sallehuddin, Roselina; Yuhaniz, Siti Sophiayati; R Khairuddin, Raja Farhana; Mahmood, Yasir.

PeerJ Comput Sci ; 9: e1180, 2023.

Article in English | MEDLINE | ID: mdl-37547391

ABSTRACT

Background: The development of sequencing technology has increased the number of genomes being sequenced. However, obtaining a high-quality genome sequence remains a challenge in genome assembly, which must assemble a massive number of short strings (reads) in the presence of repetitive sequences (repeats). Computer algorithms for genome assembly construct the entire genome from reads using two approaches. The de novo approach concatenates reads based on exact matches between their suffixes and prefixes (overlapping). The reference-guided approach orders reads based on their offsets in a well-known reference genome (read alignment). Repeats increase ambiguity, leaving the algorithm unable to distinguish between reads; this results in misassembly and reduces the accuracy of the assembly approach. In addition, the massive number of reads poses a major assembly performance challenge. Method: Repeat identification methods were introduced to reduce misassembly by identifying repetitive sequences beforehand and building a repeat knowledge base that reduces ambiguity during assembly, thus enhancing the accuracy of the assembled genome. Also, hybridization between the assembly approaches, with the aid of the reference genome, resulted in a lower degree of misassembly. Assembly performance is optimized through data structure indexing and parallelization. The primary aim and contribution of this article is an extensive review that eases other researchers' search for genome assembly studies. The study also highlights the most recent developments and limitations in genome assembly accuracy and performance optimization. Results: Our findings show the limitations of the available repeat identification methods, which can detect only repeats of specific lengths and may not perform well when various types of repeats are present in a genome.
We also found that most hybrid assembly approaches, whether starting with de novo or reference-guided assembly, have limitations in handling repetitive sequences, as doing so is more computationally costly and time-intensive. Although the hybrid approach was found to outperform individual assembly approaches, optimizing its performance remains a challenge. Also, parallelization of overlapping and read alignment has yet to be fully implemented in the hybrid assembly approach. Conclusion: We suggest combining multiple repeat identification methods to improve the accuracy of repeat identification as an initial step of the hybrid assembly approach, and combining genome indexing with parallelization to further optimize its performance.
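The two assembly approaches described in this abstract can be illustrated with a toy sketch. The Python below is not taken from the review: `suffix_prefix_overlap`, `greedy_assemble`, `build_kmer_index`, and `align_read` are hypothetical names, the greedy merge stands in for the de novo overlap step, and a plain hash map of k-mers stands in for the data-structure indexes the review surveys. The k-mer `ACGT`, which occurs twice in the sample reference, shows how a repeat makes a read's placement ambiguous.

```python
from collections import defaultdict


def suffix_prefix_overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that exactly equals a prefix of `b`
    (0 if no such overlap of at least `min_len` characters exists)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy de novo assembly: repeatedly merge the pair of contigs with the
    longest exact suffix-prefix overlap until no overlap remains."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = suffix_prefix_overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: remaining contigs stay separate
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer of the reference to the offsets where it occurs."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


def align_read(read: str, reference: str, index: dict[str, list[int]], k: int) -> list[int]:
    """Reference-guided placement: look up the read's first k-mer (seed) and
    keep the offsets where the whole read matches exactly. More than one
    offset means the read lies in a repeat and its position is ambiguous."""
    candidates = index.get(read[:k], [])
    return [p for p in candidates if reference[p:p + len(read)] == read]


print(greedy_assemble(["ATGCG", "GCGTA", "GTACC"]))  # ['ATGCGTACC']
reference = "GGACGTCCACGTAA"
idx = build_kmer_index(reference, 4)
print(align_read("CCACG", reference, idx, 4))  # [6]     unique placement
print(align_read("ACGT", reference, idx, 4))   # [2, 8]  repeat: ambiguous
```

Greedy merging is used here only because it is short; a repeat longer than the minimum overlap can make it join the wrong reads, which is the misassembly risk the abstract describes.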

Ensemble filters with harmonize PSO-SVM algorithm for optimal hearing disorder prediction.

Hamid, Tengku Mazlin Tengku Ab; Sallehuddin, Roselina; Yunos, Zuriahati Mohd; Ali, Aida.

Neural Comput Appl ; 35(14): 10473-10496, 2023.

Article in English | MEDLINE | ID: mdl-36747886

ABSTRACT

Early discovery of a hearing disorder is critical for reducing the effects of hearing loss, so that approaches to improve the remaining hearing ability can be implemented and human communication can develop successfully. Recently, the explosive growth in dataset features has made it more complex for audiologists to decide on the proper treatment for a patient. In most cases, data with irrelevant features and improper classifier parameters have a crucial influence on the accuracy of the audiometry system. This is because the two processes are interdependent: classification accuracy can worsen if feature selection and parameter tuning are conducted independently. Although filter algorithms can eliminate irrelevant features, they do not account for dependencies between features, which results in a poor selection of significant features. Improper kernel parameter settings may also contribute to poor accuracy. In this paper, an ensemble filter feature selection method based on Information Gain (IG), Gain Ratio (GR), Chi-squared (CS), and Relief-F (RF), combined with harmonized optimization of Particle Swarm Optimization (PSO) and Support Vector Machine (SVM), is presented to mitigate these problems. Ensemble filters are used so that the initial top dominant features relevant for classification can be considered. Then, PSO and SVM are optimized simultaneously to achieve the optimal solution. Results on a standard Audiology dataset show that the proposed method achieves 96.50% accuracy with the optimal solution, compared with classical SVM, which indicates that the proposed method is effective in handling high-dimensional data for hearing disorder prediction.
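As a rough illustration of the ensemble filter step, the sketch below scores discrete features with three of the four filters named in the abstract (IG, GR, and CS; Relief-F is omitted for brevity) and takes the union of each filter's top-ranked features. The union rule, the function names, and the toy data are assumptions for illustration, not the authors' aggregation scheme, and the PSO-SVM stage is not shown.

```python
import math
from collections import Counter


def entropy(values) -> float:
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(column, labels) -> float:
    """IG = H(class) - H(class | feature)."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(column).items():
        subset = [y for x, y in zip(column, labels) if x == value]
        conditional += (count / n) * entropy(subset)
    return entropy(labels) - conditional


def gain_ratio(column, labels) -> float:
    """IG normalised by the feature's own entropy (split information)."""
    split_info = entropy(column)
    return information_gain(column, labels) / split_info if split_info > 0 else 0.0


def chi_squared(column, labels) -> float:
    """Pearson chi-squared statistic of the feature/class contingency table."""
    n = len(labels)
    feat_counts, class_counts = Counter(column), Counter(labels)
    observed = Counter(zip(column, labels))
    stat = 0.0
    for v, cv in feat_counts.items():
        for y, cy in class_counts.items():
            expected = cv * cy / n
            stat += (observed[(v, y)] - expected) ** 2 / expected
    return stat


FILTERS = (information_gain, gain_ratio, chi_squared)


def ensemble_select(rows, labels, top_k: int) -> list[int]:
    """Assumed aggregation: union of each filter's top_k feature indices."""
    columns = [list(col) for col in zip(*rows)]
    selected = set()
    for score in FILTERS:
        scores = [score(col, labels) for col in columns]
        ranked = sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)
        selected.update(ranked[:top_k])
    return sorted(selected)


# Toy data: feature 0 determines the class, feature 1 is constant, feature 2 is noise.
rows = [["a", "x", "p"], ["a", "x", "q"], ["b", "x", "p"], ["b", "x", "q"]]
labels = ["yes", "yes", "no", "no"]
print(ensemble_select(rows, labels, top_k=1))  # [0]
```

Taking a union keeps any feature that at least one filter ranks highly; an intersection or a rank-averaging rule would be stricter alternatives.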

Hybrid support vector regression and autoregressive integrated moving average models improved by particle swarm optimization for property crime rates forecasting with economic indicators.

Alwee, Razana; Shamsuddin, Siti Mariyam Hj; Sallehuddin, Roselina.

ScientificWorldJournal ; 2013: 951475, 2013.

Article in English | MEDLINE | ID: mdl-23766729

ABSTRACT

Crime forecasting is an important area in the field of criminology. Linear models, such as regression and econometric models, are commonly applied in crime forecasting. However, real crime data commonly consist of both linear and nonlinear components, and a single model may not be sufficient to capture all the characteristics of the data. The purpose of this study is to introduce a hybrid model that combines support vector regression (SVR) and autoregressive integrated moving average (ARIMA) for crime rate forecasting. SVR is robust with small training sets and high-dimensional problems, while ARIMA can model several types of time series. However, the accuracy of the SVR model depends on the values of its parameters, and ARIMA is not robust when applied to small data sets. To overcome these problems, particle swarm optimization is used to estimate the parameters of the SVR and ARIMA models. The proposed hybrid model is used to forecast property crime rates in the United States based on economic indicators. The experimental results show that the proposed hybrid model produces more accurate forecasts than the individual models.

Subject(s)

Crime/statistics & numerical data , Crime/trends , Forecasting , Models, Economic , Models, Statistical , Support Vector Machine , Computer Simulation , Malaysia , Regression Analysis , Systems Integration
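The hybrid SVR-ARIMA abstract above does not specify the ARIMA orders, the SVR kernel, or how the two models are combined, so the sketch below only illustrates the parameter-estimation idea: a global-best particle swarm searches for the coefficients of a pure AR(2) model (the autoregressive part of ARIMA, with no differencing or moving-average terms) by minimising one-step-ahead squared error on a synthetic series. The function names, swarm settings, and data are hypothetical.

```python
import random


def ar_mse(coeffs, series) -> float:
    """Mean squared one-step-ahead error of an AR(p) model:
    x_t is predicted as sum_i coeffs[i] * x_{t-1-i}."""
    p = len(coeffs)
    total = 0.0
    for t in range(p, len(series)):
        pred = sum(c * series[t - 1 - i] for i, c in enumerate(coeffs))
        total += (series[t] - pred) ** 2
    return total / (len(series) - p)


def pso_minimize(objective, dim, bounds=(-1.0, 1.0), n_particles=20, iters=150, seed=0):
    """Global-best particle swarm optimisation with inertia weight w and
    cognitive/social coefficients c1, c2; positions are clipped to bounds."""
    rng = random.Random(seed)
    lo, hi = bounds
    w, c1, c2 = 0.7298, 1.49618, 1.49618  # widely used constriction-style values
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:  # new personal best
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:  # new swarm-wide best
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Synthetic AR(2) series x_t = 0.5 x_{t-1} + 0.3 x_{t-2} + noise (illustrative only).
rng = random.Random(42)
series = [0.0, 0.0]
for _ in range(200):
    series.append(0.5 * series[-1] + 0.3 * series[-2] + rng.gauss(0.0, 1.0))

coeffs, mse = pso_minimize(lambda c: ar_mse(c, series), dim=2)
print([round(c, 2) for c in coeffs], round(mse, 3))
```

For an AR model the least-squares coefficients could be computed in closed form; a swarm search becomes more useful when, as with SVR's kernel and penalty parameters, the error surface has no such solution.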