1.
Article in English | MEDLINE | ID: mdl-38277246

ABSTRACT

Recently, the massive growth of IoT devices and Internet data, which are widely used in many applications, including industry and healthcare, has dramatically increased the amount of free unlabeled data collected. However, this unlabeled data is useless for training supervised machine learning models, and the expense and time that labeling requires make the problem even more challenging. Here, the active learning (AL) technique provides a solution by labeling a small number of highly informative and representative data points, which guarantees a high degree of generalizability over the space and improves classification performance on previously unseen data. The task is more difficult when the active learner has no predefined knowledge, such as initial training data, and when the obtained data is incomplete (i.e., contains missing values). In previous studies, the missing data are first imputed. Then, the active learner selects from the available unlabeled data, regardless of whether the points were originally observed or imputed. However, selecting inaccurately imputed data points would negatively affect the active learner and prevent it from selecting informative and/or representative points, thus reducing the overall classification performance of the prediction models. This motivated us to introduce a novel query selection strategy that accounts for imputation uncertainty when querying new points. For this purpose, we first introduce a novel multiple imputation method that considers feature importance in selecting the most promising feature groups for missing value estimation. This multiple imputation method makes it possible to quantify the imputation uncertainty of each imputed data point. Furthermore, in each of the two phases of the proposed active learner (exploration and exploitation), imputation uncertainty is taken into account to reduce the probability of selecting points with high imputation uncertainty. We tested the effectiveness of the proposed active learner on different binary and multiclass datasets with different missing rates.
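
The abstract does not state how imputation uncertainty is computed or how it enters the query score. As a rough sketch only, one common way to read "multiple imputation uncertainty" is the spread of a point's imputed values across several imputed copies of the dataset. The function names, the variance-based score, and the `penalty` weighting below are all assumptions for illustration, not the authors' method.

```python
import statistics

def imputation_uncertainty(imputed_versions):
    """Mean per-feature variance of each point across M imputed datasets.

    imputed_versions: list of M datasets with identical shape, each a list
    of feature vectors. Observed values are identical in every copy, so
    they contribute zero variance. (Assumed definition, not from the paper.)
    """
    n_points = len(imputed_versions[0])
    n_features = len(imputed_versions[0][0])
    scores = []
    for i in range(n_points):
        per_feature_var = [
            statistics.pvariance([version[i][j] for version in imputed_versions])
            for j in range(n_features)
        ]
        scores.append(sum(per_feature_var) / n_features)
    return scores

def penalized_query_scores(informativeness, uncertainty, penalty=1.0):
    """Down-weight candidates whose imputed values disagree across copies."""
    return [s / (1.0 + penalty * u) for s, u in zip(informativeness, uncertainty)]

# Three hypothetical imputations of two points; only the second point
# had a missing value (its second feature).
versions = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[1.0, 2.0], [3.0, 6.0]],
    [[1.0, 2.0], [3.0, 8.0]],
]
u = imputation_uncertainty(versions)
scores = penalized_query_scores([0.5, 0.9], u)
best = max(range(len(scores)), key=scores.__getitem__)
```

In this toy case the second point looks more informative (0.9 versus 0.5), but its imputed feature varies widely, so the fully observed first point is queried instead.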

2.
Comput Methods Programs Biomed ; 197: 105702, 2020 Dec.
Article in English | MEDLINE | ID: mdl-32818915

ABSTRACT

BACKGROUND AND OBJECTIVES: Toxicity testing is an important step in developing new drugs, and animals are widely used in this step by exposing them to the toxicants. Zebrafish are widely used for measuring and detecting toxicity. However, measuring and testing toxicity manually is not feasible due to the large number of embryos. This work presents an automated model to investigate the toxicity of two toxicants, 3,4-Dichloroaniline (34DCA) and p-Tert-Butylphenol (PTBP). METHODS: The proposed model consists of three steps. In the first step, a set of features is extracted from microscopic images of zebrafish embryos using the Segmentation-Based Fractal Texture Analysis (SFTA) technique. In the second step, a novel rough set-based model using Social ski-driver (SSD) is used to find a global minimal subset of features that preserves the important information of the original features. In the third step, the AdaBoost classifier is used to classify an unknown sample as alive or coagulant after exposing the embryo to a toxic compound. RESULTS: For detecting toxicity, the proposed model is compared with (i) three deterministic rough set reduction algorithms and (ii) a particle swarm optimization (PSO)-based algorithm. The classification performance rate of our model ranged from 97.1% to 99.5%, and it outperformed the other algorithms. CONCLUSIONS: The results of our experiments showed that the proposed drug toxicity model is efficient for rough set-based feature selection and that it obtains a high classification performance.
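
The abstract does not give the fitness function that the SSD search optimizes. In rough set theory, feature subsets are commonly scored by the dependency degree: the fraction of samples whose equivalence class (samples sharing the same values on the chosen features) carries a single decision label. The sketch below shows that standard measure on a toy table; it assumes discretized features, the function name and data are made up for illustration, and the SSD search itself is not shown.

```python
from collections import defaultdict

def dependency_degree(rows, labels, feature_idx):
    """Fraction of samples lying in label-consistent equivalence classes.

    Two samples are equivalent when they agree on every feature in
    feature_idx. A class is consistent when all its samples share a label.
    """
    class_labels = defaultdict(set)
    class_sizes = defaultdict(int)
    for row, label in zip(rows, labels):
        key = tuple(row[j] for j in feature_idx)
        class_labels[key].add(label)
        class_sizes[key] += 1
    consistent = sum(
        class_sizes[key] for key, labs in class_labels.items() if len(labs) == 1
    )
    return consistent / len(rows)

# Toy table with three discretized features (hypothetical values).
rows = [(0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1)]
labels = ["alive", "alive", "coagulant", "coagulant"]

full = dependency_degree(rows, labels, [0, 1, 2])
only_first = dependency_degree(rows, labels, [0])
only_third = dependency_degree(rows, labels, [2])
```

Here the first feature alone preserves the full dependency degree, so `[0]` would be a valid reduct of this toy table, while the third feature alone separates nothing.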


Subject(s)
Algorithms , Drug-Related Side Effects and Adverse Reactions , Animals , Zebrafish
3.
Comput Biol Chem ; 70: 198-210, 2017 Oct.
Article in English | MEDLINE | ID: mdl-28923545

ABSTRACT

Vitamin D deficiency is prevalent in the Arabian Gulf region, especially among women. Recent studies show that vitamin D deficiency is associated with a patient's mineral status. Therefore, it is important to assess the mineral status of the patient to reveal the hidden mineral imbalance associated with vitamin D deficiency. A well-known test such as the red blood cell test is fairly expensive, invasive, and less informative. On the other hand, a hair mineral analysis can be considered an accurate, highly informative tool to measure the mineral imbalance associated with vitamin D deficiency. In this study, 118 apparently healthy Kuwaiti women were assessed for their mineral levels and vitamin D status by a hair mineral analysis (HMA). This information was used to build a computerized model that would predict vitamin D deficiency based on its association with the levels and ratios of minerals. The first phase of the proposed model introduces a novel hybrid optimization algorithm, which can be considered an improvement of the Bat Algorithm (BA), to select the most discriminative features. The improvement uses the mutation process of the Genetic Algorithm (GA) to update the positions of the bats with the aim of speeding up convergence, thus making the algorithm more feasible for a wider range of real-world applications. Due to the imbalanced class distribution in our dataset, in the second phase, different sampling methods such as Random Under-Sampling, Random Over-Sampling, and Synthetic Minority Oversampling Technique are used to address the class imbalance. In the third phase, an AdaBoost ensemble classifier is used to predict vitamin D deficiency. The results showed that the proposed model achieved good results in detecting vitamin D deficiency.
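
The abstract does not specify how the GA mutation is applied to bat positions. One plausible reading, assumed here, is that each bat's position is a binary mask over features and that mutation flips each bit with a small probability. The function name, the mask encoding, and the rate of 0.2 are illustrative assumptions; the rest of the Bat Algorithm (frequency, velocity, loudness updates) is omitted.

```python
import random

def mutate(mask, rate, rng):
    """Flip each bit of a binary feature mask with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in mask]

rng = random.Random(0)
bat_position = [1, 0, 1, 1, 0, 0]  # 1 = feature kept, 0 = feature dropped

unchanged = mutate(bat_position, 0.0, rng)  # rate 0: nothing flips
flipped = mutate(bat_position, 1.0, rng)    # rate 1: every bit flips
candidate = mutate(bat_position, 0.2, rng)  # typical small perturbation
```

A small rate keeps most of a good feature subset intact while still letting the search escape a local optimum.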


Subject(s)
Algorithms , Hair/chemistry , Machine Learning , Minerals/analysis , Vitamin D Deficiency/diagnosis , Vitamin D/analysis , Female , Humans , Mutation , Vitamin D Deficiency/genetics
4.
J Biomed Inform ; 68: 132-149, 2017 04.
Article in English | MEDLINE | ID: mdl-28286029

ABSTRACT

Measuring toxicity is an important step in drug development. Nevertheless, the current experimental methods used to estimate drug toxicity are expensive and time-consuming, which makes them unsuitable for large-scale evaluation of drug toxicity in the early stage of drug development. Hence, there is a high demand for computational models that can predict drug toxicity risks. In this study, we used a dataset that consists of 553 drugs that are biotransformed in the liver. Four toxic effects were calculated for these data: mutagenic, tumorigenic, irritant, and reproductive effects. Each drug is represented by 31 chemical descriptors (features). The proposed model consists of three phases. In the first phase, the most discriminative subset of features is selected using rough set-based methods to reduce the classification time while improving the classification performance. In the second phase, different sampling methods such as Random Under-Sampling, Random Over-Sampling, Synthetic Minority Oversampling Technique (SMOTE), BorderLine SMOTE, and Safe Level SMOTE are used to address the imbalanced dataset. In the third phase, the Support Vector Machine (SVM) classifier is used to classify an unknown drug as toxic or non-toxic. SVM parameters such as the penalty parameter and the kernel parameter have a great impact on the classification accuracy of the model. In this paper, the Whale Optimization Algorithm (WOA) is proposed to optimize the parameters of the SVM so that the classification error can be reduced. The experimental results showed that the proposed model achieved high sensitivity for all toxic effects. Overall, the high sensitivity of the WOA+SVM model indicates that it could be used for the prediction of drug toxicity in the early stage of drug development.
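
Among the sampling methods named above, SMOTE creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The following is a minimal sketch of that basic idea only (not BorderLine or Safe Level SMOTE); the function name, `k`, the seed, and the toy points are assumptions for illustration.

```python
import math
import random

def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating toward neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x, excluding x itself
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(p, x),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

# Hypothetical minority-class points in a two-feature space.
minority = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
new_points = smote(minority, n_new=6)
```

Because every synthetic point lies on a segment between two real minority points, the new samples stay inside the region the minority class already occupies.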


Subject(s)
Drug-Related Side Effects and Adverse Reactions , Support Vector Machine , Algorithms , Chemical and Drug Induced Liver Injury , Drug Discovery , Forecasting , Humans
5.
Sci Rep ; 6: 38660, 2016 12 09.
Article in English | MEDLINE | ID: mdl-27934950

ABSTRACT

Measuring toxicity is one of the main steps in drug development. Hence, there is a high demand for computational models to predict the toxicity effects of potential drugs. In this study, we used a dataset that covers four toxicity effects: mutagenic, tumorigenic, irritant, and reproductive effects. The proposed model consists of three phases. In the first phase, rough set-based methods are used to select the most discriminative features for reducing the classification time and improving the classification performance. Due to the imbalanced class distribution, in the second phase, different sampling methods such as Random Under-Sampling, Random Over-Sampling, and Synthetic Minority Oversampling Technique are used to address the class imbalance. An ITerative Sampling (ITS) method is proposed to avoid the limitations of those methods. The ITS method has two steps. The first step (sampling step) iteratively modifies the prior distribution of the minority and majority classes. In the second step, a data cleaning method is used to remove the overlap produced by the first step. In the third phase, a Bagging classifier is used to classify an unknown drug as toxic or non-toxic. The experimental results showed that the proposed model performed well in classifying the unknown samples according to all toxic effects in the imbalanced datasets.
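
The abstract gives too little detail to reconstruct the ITS method, so the sketch below covers only the two baseline methods it is compared against: Random Over-Sampling duplicates minority samples until the classes match, and Random Under-Sampling discards majority samples until they match. Function names, the seed, and the toy data are assumptions for illustration.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate random samples of smaller classes up to the largest class size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y

def random_undersample(samples, labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_x, out_y = [], []
    for cls in counts:
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for x in rng.sample(pool, target):
            out_x.append(x)
            out_y.append(cls)
    return out_x, out_y

# Hypothetical imbalanced data: six non-toxic drugs, two toxic drugs.
samples = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.9], [1.0]]
labels = ["non-toxic"] * 6 + ["toxic"] * 2

over_x, over_y = random_oversample(samples, labels)
under_x, under_y = random_undersample(samples, labels)
```

Over-sampling keeps all the original data but repeats minority samples, while under-sampling throws away majority samples; both limitations are the kind that a more elaborate sampling scheme tries to avoid.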


Subject(s)
Drug-Related Side Effects and Adverse Reactions , Inactivation, Metabolic , Liver/metabolism , Models, Biological , Algorithms , Humans , ROC Curve , Reproducibility of Results