1.
Stat Med ; 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38951867

ABSTRACT

For survival analysis applications we propose a novel procedure for identifying subgroups with large treatment effects, with focus on subgroups where treatment is potentially detrimental. The approach, termed forest search, is relatively simple and flexible. All possible subgroups are screened and selected based on hazard ratio thresholds indicative of harm with assessment according to the standard Cox model. By reversing the role of treatment one can seek to identify substantial benefit. We apply a splitting consistency criterion to identify a subgroup considered "maximally consistent with harm." The type-1 error and power for subgroup identification can be quickly approximated by numerical integration. To aid inference we describe a bootstrap bias-corrected Cox model estimator with variance estimated by a jackknife approximation. We provide a detailed evaluation of operating characteristics in simulations and compare to virtual twins and generalized random forests where we find the proposal to have favorable performance. In particular, in our simulation setting, we find the proposed approach favorably controls the type-1 error for falsely identifying heterogeneity with higher power and classification accuracy for substantial heterogeneous effects. Two real data applications are provided for publicly available datasets from clinical trials in oncology and HIV.

2.
BMC Med Res Methodol ; 24(1): 123, 2024 Jun 03.
Article in English | MEDLINE | ID: mdl-38831346

ABSTRACT

In contemporary society, depression has emerged as a prominent mental disorder that exhibits exponential growth and exerts a substantial influence on premature mortality. Although numerous studies have applied machine learning methods to forecast signs of depression, only a limited number have taken into account the severity level as a multiclass variable. Besides, an equal distribution of data among all classes rarely occurs in practice, so the inevitable class imbalance across multiple classes is a substantial challenge in this domain. This research therefore emphasizes the significance of addressing class imbalance in the multiclass setting. We introduced a new approach, feature group partitioning (FGP), in the data preprocessing phase, which effectively reduces the dimensionality of features to a minimum. This study utilized synthetic oversampling techniques, specifically Synthetic Minority Over-sampling Technique (SMOTE) and Adaptive Synthetic (ADASYN), for class balancing. The dataset used in this research was collected from university students by administering the Burns Depression Checklist (BDC). For methodological modifications, we implemented heterogeneous ensemble learning stacking, homogeneous ensemble bagging, and five distinct supervised machine learning algorithms. The issue of overfitting was mitigated by evaluating the accuracy of the training, validation, and testing datasets. To justify the effectiveness of the prediction models, balanced accuracy, sensitivity, specificity, precision, and F1-score indices are used. Overall, comprehensive analysis demonstrates the discrimination between the Conventional Depression Screening (CDS) and FGP approaches. In summary, the results show that the stacking classifier for FGP with the SMOTE approach yields the highest balanced accuracy, with a rate of 92.81%.
The empirical evidence has demonstrated that the FGP approach, when combined with SMOTE, is able to produce better performance in predicting the severity of depression. Most importantly, the optimization of the training time of the FGP approach for all of the classifiers is a significant achievement of this research.
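
The oversampling idea behind SMOTE can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `smote_like_oversample`, the toy points, and the choice of `k` are all assumptions made for illustration. It only shows the core step of interpolating between a minority-class sample and one of its nearest minority-class neighbours.

```python
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Hypothetical two-feature minority class with four samples.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like_oversample(minority, n_new=6)
```

Because every synthetic point is a convex combination of two real minority samples, it always lies within the range spanned by the observed minority data.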


Subjects
Algorithms , Depression , Machine Learning , Humans , Depression/diagnosis , Severity of Illness Index , Sensitivity and Specificity , Female
3.
J Dairy Sci ; 2024 Jun 12.
Article in English | MEDLINE | ID: mdl-38876215

ABSTRACT

Feed efficiency is important for economic profitability of dairy farms; however, recording daily dry matter intakes (DMI) is expensive. Our objective was to investigate the potential use of milk mid-infrared (MIR) spectral data to predict proxy phenotypes for DMI based on different cross-validation schemes. We were specifically interested in comparisons between a model that included only MIR data (Model M1), a model that incorporated different energy sink predictors, such as body weight, body weight change, and milk energy (Model M2), and an extended model that incorporated both energy sinks and MIR data (Model M3). Models M2 and M3 also included various cow level variables (stage of lactation, age at calving, parity) such that any improvement in model performance from M2 to M3, whether through a smaller root mean squared error (RMSE) or a greater squared predictive correlation (R2), could indicate a potential benefit of MIR to predict residual feed intake. The data used in our study originated from a multi-institutional project on the genetics of feed efficiency in US Holsteins. Analyses were conducted on 2 different trait definitions based on different period lengths: averaged across weeks vs. averaged across 28-d. Specifically, there were 19,942 weekly records on 1,812 cows across 46 experiments or cohorts and 3,724 28-d records on 1,700 cows across 43 different cohorts. The cross-validation analyses involved 3 different k-fold schemes. First, a 10-fold cow-independent cross-validation was conducted whereby all records from any one cow were kept together in either training or test sets. Similarly, a 10-fold experiment-independent cross-validation kept entire experiments together whereas a 4-fold herd-independent cross-validation kept entire herds together in either training or test sets. 
Based on cow-independent cross-validation for both weekly and 28-d DMI, adding MIR predictors to energy sinks (Models M3 vs M2) significantly (P < 10⁻¹⁰) reduced average RMSE to 1.59 kg and increased average R2 to 0.89. However, adding MIR to energy sinks (M3) to predict DMI either within an experiment-independent or herd-independent cross-validation scheme seemed to demonstrate no merit (P > 0.05) compared with an energy sink model (M2) for either R2 or RMSE (respectively, 0.68 and 2.55 kg for M2 in herd-independent scheme). We further noted that with broader cross-validation schemes, i.e., from cow-independent to experiment-independent to herd-independent schemes, the mean and slope bias increased. Given that proxy DMI phenotypes for cows would need to be almost entirely generated in herds having no DMI or training data of their own, herd-independent cross-validation assessments of predictive performance should be emphasized. Hence, more research on predictive algorithms suitable for broader cross-validation schemes and a more earnest effort on calibration of spectrophotometers against each other should be considered.
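
The cow-, experiment- and herd-independent schemes all follow the same pattern: every record sharing a grouping key stays on one side of the split. The sketch below is an illustration under assumptions, not the study's code; `group_k_fold`, the toy `cows` and `herds` labels, and the greedy fold assignment are hypothetical.

```python
from collections import defaultdict


def group_k_fold(groups, k):
    """Yield (train_idx, test_idx) pairs so that no group spans both sides."""
    by_group = defaultdict(list)
    for idx, g in enumerate(groups):
        by_group[g].append(idx)
    # Greedy balancing: place the largest groups first into the smallest fold.
    folds = [[] for _ in range(k)]
    for g in sorted(by_group, key=lambda g: -len(by_group[g])):
        min(folds, key=len).extend(by_group[g])
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f in folds[:i] + folds[i + 1:] for j in f)
        yield train, test


# Hypothetical records: one cow label and one herd label per record.
cows = ["c1", "c1", "c2", "c2", "c2", "c3", "c4", "c4"]
herds = ["hA", "hA", "hA", "hA", "hA", "hB", "hB", "hB"]

cow_splits = list(group_k_fold(cows, k=2))    # cow-independent folds
herd_splits = list(group_k_fold(herds, k=2))  # herd-independent folds
```

Changing only the grouping key (cow, experiment, or herd) moves from the narrowest to the broadest scheme described in the abstract.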

4.
BMC Genom Data ; 25(1): 60, 2024 Jun 14.
Article in English | MEDLINE | ID: mdl-38877416

ABSTRACT

BACKGROUND: Forest geneticists typically use provenances to account for population differences in their improvement schemes; however, the historical records of the imported materials might not be very precise or well-aligned with the genetic clusters derived from advanced molecular techniques. The main objective of this study was to assess the impact of marker-based population structure on genetic parameter estimates related to growth and wood properties and their trade-offs in Norway spruce, by either incorporating it as a fixed effect (model-A) or excluding it entirely from the analysis (model-B). RESULTS: Our results indicate that models incorporating population structure significantly reduce estimates of additive genetic variance, resulting in substantial reduction of narrow-sense heritability. However, these models considerably improve prediction accuracies. This was particularly significant for growth and solid-wood properties, which showed the highest population genetic differentiation (QST) among the studied traits. Additionally, although the pattern of correlations remained similar across the models, their magnitude was slightly lower for models that included population structure as a fixed effect. This suggests that selection, consistently performed within populations, might be less affected by unfavourable genetic correlations compared to mass selection conducted without pedigree restrictions. CONCLUSION: We conclude that the results of models properly accounting for population structure are more accurate and less biased compared to those neglecting this effect. This might have practical implications for breeders and forest managers, where decisions based on imprecise selections can pose a high risk to economic efficiency.


Subjects
Picea , Wood , Picea/genetics , Picea/growth & development , Wood/genetics , Genetic Markers/genetics , Models, Genetic , Genetics, Population/methods , Genetic Variation/genetics
5.
J Arrhythm ; 40(3): 560-577, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38939795

ABSTRACT

Background: Remote monitoring (RM) of cardiac implantable electrical devices (CIEDs) can detect various events early. However, the diagnostic ability of CIEDs has not been sufficient, especially for lead failure. The first notification of lead failure was, in most cases, a noise event that the CIED detected as an arrhythmia. A human must analyze the intracardiac electrogram to accurately detect lead failure. However, the number of arrhythmic events is too large for human analysis. Artificial intelligence (AI) seems to be helpful in the early and accurate detection of lead failure before human analysis. Objective: To test whether a neural network can be trained to precisely identify noise events in the intracardiac electrogram of RM data. Methods: We analyzed 21 918 RM data consisting of 12 925 and 1884 Medtronic and Boston Scientific data, respectively. Among these, 153 and 52 Medtronic and Boston Scientific data, respectively, were diagnosed as noise events by human analysis. In Medtronic, 306 events, including 153 noise events and 153 randomly selected out of 12 692 nonnoise events, were analyzed in a five-fold cross-validation with a convolutional neural network. The Boston Scientific data were analyzed similarly. Results: The precision rate, recall rate, F1 score, accuracy rate, and the area under the curve were 85.8 ± 4.0%, 91.6 ± 6.7%, 88.4 ± 2.0%, 88.0 ± 2.0%, and 0.958 ± 0.021 in Medtronic and 88.4 ± 12.8%, 81.0 ± 9.3%, 84.1 ± 8.3%, 84.2 ± 8.3% and 0.928 ± 0.041 in Boston Scientific. Five-fold cross-validation with a weighted loss function could increase the recall rate. Conclusions: AI can accurately detect noise events. AI analysis may be helpful for detecting lead failure events early and accurately.
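
The balanced design in the Methods (153 noise events paired with 153 randomly drawn nonnoise events) and the reported precision, recall and F1 score can be sketched as follows. This is a simplified illustration, not the authors' code: the function names, the random seed, and the label encoding (1 = noise, 0 = nonnoise) are assumptions; only the counts 153 and 12 692 come from the abstract.

```python
import random


def balance_by_undersampling(labels, seed=0):
    """Keep every positive and an equally sized random sample of negatives."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    kept_neg = rng.sample(neg, len(pos))
    return sorted(pos + kept_neg)


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Counts taken from the abstract: 153 noise and 12 692 nonnoise events.
labels = [1] * 153 + [0] * 12692
subset = balance_by_undersampling(labels)
```

Undersampling the majority class gives each fold a 1:1 class ratio; the weighted loss function mentioned in the Results is an alternative that keeps all data and reweights errors instead.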

6.
Open Respir Arch ; 6(Suppl 2): 100313, 2024 Oct.
Article in English | MEDLINE | ID: mdl-38828405

ABSTRACT

Introduction: This study aims to create an artificial intelligence (AI) based machine learning (ML) model capable of predicting a spirometric obstructive pattern using variables with the highest predictive power derived from an active case-finding program for COPD in primary care. Material and methods: A total of 1190 smokers, aged 30-80 years old with no prior history of respiratory disease, underwent spirometry with bronchodilation. The sample was analyzed using AI tools. Based on an exploratory data analysis (EDA), independent variables (according to mutual information analysis) were trained using a gradient boosting algorithm (GBT) and validated through cross-validation. Results: With an area under the curve close to unity, the model predicted a spirometric obstructive pattern using variables with the highest predictive power: FEV1_theoretical_pre values. Sensitivity: 93%. Positive predictive value: 94%. Specificity: 97%. Negative predictive value: 96%. Accuracy: 95%. Precision: 94%. Conclusion: An ML model can predict the presence of an obstructive pattern in spirometry in a primary care smoking population with no prior diagnosis of respiratory disease using the FEV1_theoretical_pre values with an accuracy and precision exceeding 90%. Further studies including clinical data and strategies for integrating AI into clinical workflow are needed.
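
The figures reported in the Results all derive from one 2×2 confusion matrix, which the following sketch makes explicit. The counts used here are hypothetical and are not the study's data; `diagnostic_metrics` is an illustrative name.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Derive the usual diagnostic metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # also called recall
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),          # identical to precision
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical counts chosen only to illustrate the arithmetic.
metrics = diagnostic_metrics(tp=90, fp=5, fn=10, tn=95)
```

Positive predictive value and precision are the same quantity, which is consistent with the abstract reporting 94% for both.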



7.
Psychiatry Res Neuroimaging ; 343: 111845, 2024 Jun 17.
Article in English | MEDLINE | ID: mdl-38908302

ABSTRACT

BACKGROUND: The incidence rate of posttraumatic stress disorder (PTSD) is currently increasing due to wars, terrorism, and pandemic disease situations. Accurate detection of PTSD is therefore crucial for the treatment of patients. For this purpose, the present study aims to classify individuals with PTSD versus healthy controls. METHODS: The resting-state functional MRI (rs-fMRI) scans of 19 PTSD and 24 healthy control male subjects were used to identify the activation pattern in the most affected brain regions using group-level independent component analysis (ICA) and t-tests. To classify PTSD-affected subjects versus healthy controls, six machine learning techniques (random forest, Naive Bayes, support vector machine, decision tree, K-nearest neighbor, and linear discriminant analysis) and a deep learning three-dimensional convolutional neural network (3D-CNN) were applied to the data and compared. RESULTS: The rs-fMRI scans of the 11 most commonly investigated regions of trauma-exposed and healthy brains are analyzed to observe their level of activation. The amygdala and insula are determined to be the most activated of the regions of interest in the brains of PTSD subjects. In addition, machine learning techniques were applied to the components extracted from ICA, but the models provided low classification accuracy. The ICA components are also fed into the 3D-CNN model, which is trained with a 5-fold cross-validation method. The 3D-CNN model demonstrated high average accuracies of 98.12%, 98.25%, and 98.00% on the training, validation, and testing datasets, respectively. CONCLUSION: The findings indicate that the 3D-CNN outperforms the other six techniques considered and helps to recognize PTSD patients accurately.

8.
Anal Chim Acta ; 1307: 342574, 2024 Jun 08.
Article in English | MEDLINE | ID: mdl-38719419

ABSTRACT

BACKGROUND: Metabolomics is nowadays considered one of the most powerful analytical tools for the discovery of metabolic dysregulations associated with the onset of cancer, given the reprogramming of the cell metabolism to meet the bioenergetic and biosynthetic demands of the malignant cell. Notwithstanding, several challenges still exist regarding quality control, method standardization, data processing, and compound identification. Therefore, there is a need for effective and straightforward approaches for the untargeted analysis of structurally related classes of compounds, such as acylcarnitines, that have been widely investigated in prostate cancer research for their role in energy metabolism and transport and β-oxidation of fatty acids. RESULTS: In the present study, an innovative analytical platform was developed for the straightforward albeit comprehensive characterization of acylcarnitines based on high-resolution mass spectrometry, Kendrick mass defect filtering, and confirmation by prediction of their retention time in reversed-phase chromatography. In particular, a customized data processing workflow was set up on Compound Discoverer software to enable the Kendrick mass defect filtering, which allowed filtering out more than 90% of the initial features resulting from the processing of 25 tumoral and adjacent non-malignant prostate tissues collected from patients undergoing radical prostatectomy. Later, a partial least squares-discriminant analysis model validated by repeated double cross-validation was built on the dataset of 74 annotated acylcarnitines, with classification rates higher than 93% for both groups, and univariate statistical analysis helped elucidate the individual role of the annotated metabolites.
SIGNIFICANCE: Hydroxylation of short- and medium-chain minor acylcarnitines appeared to be a significant variable in describing tissue differences, suggesting the hypothesis that neoplastic growth is linked to oxidation phenomena on selected metabolites and reinforcing the need for effective methods for the annotation of minor metabolites.


Subjects
Carnitine , Prostatic Neoplasms , Male , Carnitine/analogs & derivatives , Carnitine/metabolism , Carnitine/chemistry , Carnitine/analysis , Prostatic Neoplasms/metabolism , Prostatic Neoplasms/pathology , Humans , Workflow , Metabolomics , Mass Spectrometry
9.
Front Plant Sci ; 15: 1293307, 2024.
Article in English | MEDLINE | ID: mdl-38726298

ABSTRACT

Sweet corn breeding programs, like field corn, focus on the development of elite inbred lines to produce commercial hybrids. For this reason, genomic selection models can help the in silico prediction of hybrid crosses from the elite lines, which is hypothesized to improve the test cross scheme, leading to higher genetic gain in a breeding program. This study aimed to explore the potential of implementing genomic selection in a sweet corn breeding program through hybrid prediction in a within-site across-year and across-site framework. A total of 506 hybrids were evaluated in six environments (California, Florida, and Wisconsin, in the years 2020 and 2021). A total of 20 traits from three different groups were measured (plant-, ear-, and flavor-related traits) across the six environments. Eight statistical models were considered for prediction, as the combination of two genomic prediction models (GBLUP and RKHS) with two different kernels (additive and additive + dominance), and in a single- and multi-trait framework. Also, three different cross-validation schemes were tested (CV1, CV0, and CV00). The different models were then compared based on the correlation between the estimated breeding values/total genetic values and phenotypic measurements. Overall, heritabilities and correlations varied among the traits. The models implemented showed good accuracies for trait prediction. The GBLUP implementation outperformed RKHS in all cross-validation schemes and models. Models with additive plus dominance kernels presented a slight improvement over the models with only additive kernels for some of the models examined. In addition, models for within-site across-year and across-site performed better in the CV0 than the CV00 scheme, on average. Hence, GBLUP should be considered as a standard model for sweet corn hybrid prediction. 
In addition, we found that the implementation of genomic prediction in a sweet corn breeding program presented reliable results, which can improve the testcross stage by identifying the top candidates that will reach advanced field-testing stages.

10.
Bioresour Technol ; 402: 130793, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38703965

ABSTRACT

This study aimed to clarify the statistical accuracy assessment approaches used in recent biogas prediction studies using a state-of-the-art ensemble machine learning approach evaluated by 10-fold cross-validation in 100 repetitions. Three thermally pretreated harvest residue types (maize stover, sunflower stalk and soybean straw) and manure were anaerobically co-digested, measuring biogas and methane yield alongside eight thermal preprocessing and biomass covariates. These were the inputs to an ensemble machine learning approach for biogas and methane yield prediction, employing three feature selection approaches. The Support Vector Machine prediction with Recursive Feature Elimination resulted in the highest prediction accuracy, achieving coefficients of determination of 0.820 and 0.823 for biogas and methane yield prediction, respectively. This study demonstrated an extreme dependency of prediction accuracy on input dataset properties, which could only be mitigated with ensemble machine learning, and strongly suggested that the split-sample approach, often used in previous studies, should be avoided.
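
The evaluation scheme named in the abstract, 10-fold cross-validation in 100 repetitions, amounts to reshuffling the data before each repetition and cycling through the folds. The sketch below shows only the index bookkeeping under assumptions: `repeated_k_fold`, the sample size of 40, and the seed are hypothetical, and no model is fitted.

```python
import random


def repeated_k_fold(n_samples, k=10, repeats=100, seed=0):
    """Yield (train_idx, test_idx) for k-fold CV repeated with fresh shuffles."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a new random partition for every repetition
        for fold in range(k):
            test = order[fold::k]  # every k-th shuffled index forms one fold
            test_set = set(test)
            train = [i for i in order if i not in test_set]
            yield train, test


# Hypothetical dataset of 40 samples: 100 repetitions x 10 folds.
splits = list(repeated_k_fold(n_samples=40, k=10, repeats=100))
```

Averaging a score over all 1,000 splits reduces the dependence on any single partition, which is the weakness of a one-off split-sample evaluation that the abstract warns against.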


Subjects
Biofuels , Machine Learning , Manure , Anaerobiosis , Methane , Support Vector Machine , Biomass
11.
New Phytol ; 243(1): 111-131, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38708434

ABSTRACT

Leaf traits are essential for understanding many physiological and ecological processes. Partial least squares regression (PLSR) models with leaf spectroscopy are widely applied for trait estimation, but their transferability across space, time, and plant functional types (PFTs) remains unclear. We compiled a novel dataset of paired leaf traits and spectra, with 47 393 records for > 700 species and eight PFTs at 101 globally distributed locations across multiple seasons. Using this dataset, we conducted an unprecedented comprehensive analysis to assess the transferability of PLSR models in estimating leaf traits. While PLSR models demonstrate commendable performance in predicting chlorophyll content, carotenoid, leaf water, and leaf mass per area prediction within their training data space, their efficacy diminishes when extrapolating to new contexts. Specifically, extrapolating to locations, seasons, and PFTs beyond the training data leads to reduced R2 (0.12-0.49, 0.15-0.42, and 0.25-0.56) and increased NRMSE (3.58-18.24%, 6.27-11.55%, and 7.0-33.12%) compared with nonspatial random cross-validation. The results underscore the importance of incorporating greater spectral diversity in model training to boost its transferability. These findings highlight potential errors in estimating leaf traits across large spatial domains, diverse PFTs, and time due to biased validation schemes, and provide guidance for future field sampling strategies and remote sensing applications.


Subjects
Plant Leaves , Plant Leaves/physiology , Plant Leaves/anatomy & histology , Least-Squares Analysis , Quantitative Trait, Heritable , Chlorophyll/metabolism , Seasons , Models, Biological , Water , Carotenoids/metabolism
12.
Epigenomes ; 8(2)2024 May 11.
Article in English | MEDLINE | ID: mdl-38804368

ABSTRACT

We consider the newly developed multinomial mixed-link models for a high-risk intestinal metaplasia (IM) study with DNA methylation data. Different from the traditional multinomial logistic models commonly used for categorical responses, the mixed-link models allow us to select the most appropriate link function for each category. We show that the selected multinomial mixed-link model (Model 1) using the total number of stem cell divisions (TNSC) based on DNA methylation data outperforms the traditional logistic models in terms of cross-entropy loss from ten-fold cross-validations with significant p-values 8.12×10⁻⁴ and 6.94×10⁻⁵. Based on our selected model, the significance of TNSC's effect in predicting the risk of IM is justified with a p-value less than 10⁻⁶. We also select the most appropriate mixed-link models (Models 2 and 3) when an additional covariate, the status of gastric atrophy, is available. When the status is negative, mild, or moderate, we recommend Model 2; otherwise, we prefer Model 3. Both Models 2 and 3 can predict the risk of IM significantly better than Model 1, which justifies that the status of gastric atrophy is informative in predicting the risk of IM.

13.
Sensors (Basel) ; 24(9)2024 Apr 30.
Article in English | MEDLINE | ID: mdl-38732969

ABSTRACT

The recent scientific literature abounds in proposals of seizure forecasting methods that exploit machine learning to automatically analyze electroencephalogram (EEG) signals. Deep learning algorithms seem to achieve a particularly remarkable performance, suggesting that the implementation of clinical devices for seizure prediction might be within reach. However, most of the research evaluated the robustness of automatic forecasting methods through randomized cross-validation techniques, while clinical applications require much more stringent validation based on patient-independent testing. In this study, we show that automatic seizure forecasting can be performed, to some extent, even on independent patients who have never been seen during the training phase, thanks to the implementation of a simple calibration pipeline that can fine-tune deep learning models, even on a single epileptic event recorded from a new patient. We evaluate our calibration procedure using two datasets containing EEG signals recorded from a large cohort of epileptic subjects, demonstrating that the forecast accuracy of deep learning methods can increase on average by more than 20%, and that performance improves systematically in all independent patients. We further show that our calibration procedure works best for deep learning models, but can also be successfully applied to machine learning algorithms based on engineered signal features. Although our method still requires at least one epileptic event per patient to calibrate the forecasting model, we conclude that focusing on realistic validation methods allows a more reliable comparison of different machine learning approaches for seizure prediction, enabling the implementation of robust and effective forecasting systems that can be used in daily healthcare practice.


Subjects
Algorithms , Deep Learning , Electroencephalography , Seizures , Humans , Electroencephalography/methods , Seizures/diagnosis , Seizures/physiopathology , Calibration , Signal Processing, Computer-Assisted , Epilepsy/diagnosis , Epilepsy/physiopathology , Machine Learning
14.
J Med Invest ; 71(1.2): 141-147, 2024.
Article in English | MEDLINE | ID: mdl-38735710

ABSTRACT

CapeOX is a regimen used as postoperative adjuvant chemotherapy for the treatment of advanced recurrent colorectal cancer. If early adverse events occur, treatment may not progress as planned and further dose reduction may be necessary. In this study, we investigated whether pre-treatment medical records could be used to predict adverse events in order to prevent adverse events caused by CapeOX treatment. The 178 patients were classified into two groups (97 in the adverse event positive group and 81 in the adverse event-negative group) based on withdrawal or postponement of four or fewer courses. In univariate analysis, age, height, weight, body surface area (BSA), creatinine clearance, muscle mass, and lean body mass were associated with early adverse events (P<0.05). The area under the receiver operating characteristic curve obtained by Stepwise logistic regression analysis using the Akaike information criterion method was 0.832. For nested k-fold cross validation, the accuracy rates of the support vector machine, random forest, and logistic regression algorithms were 0.71, 0.70, and 0.75, respectively. The results of the present study suggest that a logistic regression prediction model may be useful in predicting early adverse events caused by CapeOX therapy in patients with colorectal cancer. J. Med. Invest. 71 : 141-147, February, 2024.


Subjects
Colorectal Neoplasms , Humans , Colorectal Neoplasms/drug therapy , Male , Female , Aged , Middle Aged , Capecitabine/adverse effects , Capecitabine/administration & dosage , Antineoplastic Combined Chemotherapy Protocols/adverse effects , Antineoplastic Combined Chemotherapy Protocols/therapeutic use , Antineoplastic Combined Chemotherapy Protocols/administration & dosage , Aged, 80 and over , Adult , Retrospective Studies
15.
Data Brief ; 54: 110418, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38708311

ABSTRACT

Type 2 Diabetes (T2D) exerts a substantial impact on mortality rates. According to 2023 statistics, more than half a billion individuals are experiencing the effects of T2D, making it one of the top 10 leading contributors to worldwide deaths. Multiple factors contribute to the onset of T2D, such as obesity, poor diet and lifestyle, mutations in specific genes, and many more. Among these factors, genetics is a pivotal aspect. Due to the significant influence of genes in the initiation and advancement of various phases of T2D, our focus lies on exploring the association between T2D and genes. In the present article, we have curated standard disease-gene association data consisting of evidence (reference) sentences that carry disease-gene association information. Each sentence is classified into one of 4 classes: Yes, No, Ambiguous and X, corresponding to positive, negative, ambiguous and not related disease-gene associations, respectively. For the purpose of this work, we downloaded T2D-related abstracts from PubMed using EDirect and pre-processed this abstract data to extract the reference sentences. These sentences were then manually validated twice to compile the disease-gene association data. The data produced in this article serves as reference data for training text mining-based classifiers of biological literature. Such classifiers can then be used to predict classes of published literature, not just for T2D, but also for a wide range of diseases and their complications. The compilation of positively linked genes derived from these predictions can then be utilized for in-depth system-level analysis of T2D.

16.
Front Neurosci ; 18: 1373515, 2024.
Article in English | MEDLINE | ID: mdl-38765672

ABSTRACT

A growing number of studies apply deep neural networks (DNNs) to recordings of human electroencephalography (EEG) to identify a range of disorders. In many studies, EEG recordings are split into segments, and each segment is randomly assigned to the training or test set. As a consequence, data from individual subjects appears in both the training and the test set. Could high test-set accuracy reflect data leakage from subject-specific patterns in the data, rather than patterns that identify a disease? We address this question by testing the performance of DNN classifiers using segment-based holdout (in which segments from one subject can appear in both the training and test set), and comparing this to their performance using subject-based holdout (where all segments from one subject appear exclusively in either the training set or the test set). In two datasets (one classifying Alzheimer's disease, and the other classifying epileptic seizures), we find that performance on previously-unseen subjects is strongly overestimated when models are trained using segment-based holdout. Finally, we survey the literature and find that the majority of translational DNN-EEG studies use segment-based holdout. Most published DNN-EEG studies may dramatically overestimate their classification performance on new subjects.
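
The contrast between segment-based and subject-based holdout can be made concrete with a short sketch. It is an illustration under assumptions, not the authors' evaluation code: the function names, the 25% test fraction, and the toy cohort of 8 subjects with 20 segments each are all hypothetical.

```python
import random


def segment_based_holdout(subject_ids, test_fraction=0.25, seed=0):
    """Split individual segments at random, ignoring who they came from."""
    rng = random.Random(seed)
    idx = list(range(len(subject_ids)))
    rng.shuffle(idx)
    n_test = int(len(idx) * test_fraction)
    return sorted(idx[n_test:]), sorted(idx[:n_test])


def subject_based_holdout(subject_ids, test_fraction=0.25, seed=0):
    """Hold out whole subjects so that none appears in both sets."""
    rng = random.Random(seed)
    subjects = sorted(set(subject_ids))
    rng.shuffle(subjects)
    n_test = max(1, int(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train = [i for i, s in enumerate(subject_ids) if s not in test_subjects]
    test = [i for i, s in enumerate(subject_ids) if s in test_subjects]
    return train, test


def shared_subjects(subject_ids, train, test):
    """Return the subjects whose segments occur on both sides of a split."""
    return {subject_ids[i] for i in train} & {subject_ids[i] for i in test}


# Hypothetical cohort: 8 subjects, each contributing 20 EEG segments.
segments = [f"s{n}" for n in range(8) for _ in range(20)]

seg_train, seg_test = segment_based_holdout(segments)
sub_train, sub_test = subject_based_holdout(segments)
leaky = shared_subjects(segments, seg_train, seg_test)
clean = shared_subjects(segments, sub_train, sub_test)
```

A non-empty `leaky` set is the leakage path the abstract describes: a classifier can score well on the test set by recognising subject-specific patterns rather than disease-related ones.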

17.
Comput Methods Programs Biomed ; 249: 108157, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38582037

ABSTRACT

BACKGROUND AND OBJECTIVE: T-wave alternans (TWA) is a fluctuation in the repolarization morphology of the ECG. It is associated with cardiac instability and sudden cardiac death risk. Diverse methods have been proposed for TWA analysis. However, TWA detection in ambulatory settings remains a challenge due to the absence of standardized evaluation metrics and detection thresholds. METHODS: In this work we use traditional signal processing-based TWA analysis methods for feature extraction, and two machine learning (ML) methods, namely K-nearest-neighbor (KNN) and random forest (RF), for TWA detection, addressing hyper-parameter tuning and feature selection. The final goal is the detection of short, non-sustained and sparse TWA events in ambulatory recordings. RESULTS: We train ML methods to detect a wide variety of alternant voltages from 20 to 100 µV, i.e., ranging from non-visible micro-alternans to TWA of higher amplitudes, to recognize a wide range in concordance with risk stratification. In classification, RF achieves significantly higher recall than the signal processing methods, at the expense of a small loss in precision. Although ambulatory detection is an imbalanced-class setting, the trained ML systems always outperform signal processing methods. CONCLUSIONS: We propose a comprehensive integration of multiple variables inspired by TWA signal processing methods to feed learning-based methods. ML models consistently outperform the best signal processing methods, yielding superior recall scores.
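
The recall-versus-precision trade-off mentioned in the results can be made concrete with a small sketch. The labels and the two detectors below are invented toy values, not data or results from the study; they only show how a detector can gain recall while giving up some precision when positive events are rare.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 marks an event."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical imbalanced setting: 10 event windows among 100.
y_true = [1] * 10 + [0] * 90

# Detector A (conservative): finds 5 events, raises no false alarms.
pred_a = [1] * 5 + [0] * 95
# Detector B (more sensitive): finds 9 events, raises 3 false alarms.
pred_b = [1] * 9 + [0] * 1 + [1] * 3 + [0] * 87

print("A:", precision_recall(y_true, pred_a))
print("B:", precision_recall(y_true, pred_b))
```

In this toy case detector B misses far fewer events than detector A, at the cost of a few false alarms.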


Subjects
Cardiac Arrhythmias , Ambulatory Electrocardiography , Humans , Ambulatory Electrocardiography/methods , Heart Rate , Cardiac Arrhythmias/diagnosis , Sudden Cardiac Death , Computer-Assisted Signal Processing , Electrocardiography/methods
18.
BMC Med Res Methodol ; 24(1): 83, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38589775

ABSTRACT

BACKGROUND: The timing of treatment is an essential factor in the efficacy of cancer therapy. Patients who will not respond to current therapy should therefore receive a different treatment as early as possible. Machine learning models can be built to classify responders and non-responders. Such classification models predict the probability of a patient being a responder. Most methods use a probability threshold of 0.5 to convert the probabilities into binary group membership. However, the cutoff of 0.5 is not always the optimal choice. METHODS: In this study, we propose a novel data-driven approach to select a better cutoff value based on the optimal cross-validation technique. To illustrate our method, we applied it to three clinical trial datasets of small-cell lung cancer patients. We used two different datasets to build a scoring system to segment patients. The models were then applied to segment patients in the test data. RESULTS: We found that, in test data, the predicted responders and non-responders had significantly different long-term survival outcomes. Our proposed method segments patients better than the standard approach using a cutoff of 0.5. Comparing clinical outcomes of responders versus non-responders, our method had a p-value of 0.009 with a hazard ratio of 0.668 for grouping patients using the Cox proportional hazards model and a p-value of 0.011 using the accelerated failure time model, which indicated a significant difference between responders and non-responders. In contrast, the standard approach had a p-value of 0.194 with a hazard ratio of 0.823 using the Cox proportional hazards model and a p-value of 0.240 using the accelerated failure time model, indicating that the responders and non-responders do not differ significantly in survival. CONCLUSION: In summary, our novel prediction method can successfully segment new patients into responders and non-responders.
Clinicians can use our prediction to decide if a patient should receive a different treatment or stay with the current treatment.
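
The idea of replacing the fixed 0.5 cutoff with a data-driven one can be sketched as follows. This is only an illustration, not the authors' procedure: it assumes that out-of-fold predicted probabilities from cross-validation are already available, and it uses balanced accuracy as the selection criterion, which the abstract does not specify. The labels and probabilities are toy values.

```python
def balanced_accuracy(y_true, probs, cutoff):
    """Mean of sensitivity and specificity when predicting 1 for prob >= cutoff."""
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, probs):
        pred = 1 if p >= cutoff else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred == 0:
            tn += 1
        else:
            fp += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return (sensitivity + specificity) / 2

def select_cutoff(y_true, oof_probs, candidates):
    """Pick the candidate cutoff with the best criterion on out-of-fold predictions."""
    return max(candidates, key=lambda c: balanced_accuracy(y_true, oof_probs, c))

# Toy out-of-fold predictions (hypothetical): 1 = responder, 0 = non-responder.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
oof_probs = [0.9, 0.8, 0.45, 0.4, 0.35, 0.3, 0.2, 0.2, 0.1, 0.1]
candidates = [i / 10 for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9

best = select_cutoff(y_true, oof_probs, candidates)
print("selected cutoff:", best)
print("balanced accuracy at 0.5:", balanced_accuracy(y_true, oof_probs, 0.5))
print("balanced accuracy at selected cutoff:", balanced_accuracy(y_true, oof_probs, best))
```

Selecting the cutoff on out-of-fold predictions rather than on the training fit keeps the choice from simply reflecting overfitting; a separate test set is still needed to judge the final segmentation.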


Subjects
Lung Neoplasms , Small Cell Lung Carcinoma , Humans , Lung Neoplasms/diagnosis , Lung Neoplasms/therapy , Small Cell Lung Carcinoma/therapy , Treatment Outcome , Research Design
19.
Stat Med ; 43(13): 2487-2500, 2024 Jun 15.
Article in English | MEDLINE | ID: mdl-38621856

ABSTRACT

Precision medicine aims to identify specific patient subgroups that may benefit more from a particular treatment than the whole population. Existing definitions for the best subgroup in subgroup analysis are based on a single outcome and do not consider multiple outcomes, in particular outcomes of different types. In this article, we introduce a definition for the best subgroup under a multiple-outcome setting with continuous, binary, and censored time-to-event outcomes. Our definition provides a trade-off between the subgroup size and the conditional average treatment effects (CATE) in the subgroup with respect to each of the outcomes while taking the relative contribution of the outcomes into account. We conduct simulations to illustrate the proposed definition. By examining the outcomes of urinary tract infection and renal scarring in the RIVUR clinical trial, we identify a subgroup of children that would benefit the most from long-term antimicrobial prophylaxis.


Subjects
Computer Simulation , Precision Medicine , Urinary Tract Infections , Humans , Urinary Tract Infections/drug therapy , Treatment Outcome , Statistical Models , Child
20.
Meat Sci ; 213: 109505, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38579509

ABSTRACT

Volatile organic compounds (VOCs) indicative of pork microbial spoilage can be quantified rapidly at trace levels using selected-ion flow-tube mass spectrometry (SIFT-MS). Packaging atmosphere is one of the factors influencing VOC production patterns during storage. On this basis, machine learning can help to process complex volatolomic data and predict pork microbial quality efficiently. This study focused on (1) investigating model generalizability based on different nested cross-validation settings, and (2) comparing the predictive power and feature importance of nine algorithms, including Artificial Neural Network (ANN), k-Nearest Neighbors, Support Vector Regression, Decision Tree, Partial Least Squares Regression, and four ensemble learning models. The datasets contain the concentrations of 37 VOCs (input) and total plate counts (TPC, output) of 350 pork samples with different storage times, including 225 pork loin samples stored under three high-O2 and three low-O2 conditions, and 125 commercially packaged products. An appropriate choice of cross-validation strategy resulted in trustworthy and relevant predictions. When trained on all possible selections of two high-O2 and two low-O2 conditions, ANNs produced satisfactory TPC predictions for unseen test scenarios (one high-O2 condition, one low-O2 condition, and the commercial products). ANN-based bagging outperformed the other models when TPC exceeded ca. 6 log CFU/g. VOCs including benzaldehyde, 3-methyl-1-butanol, ethanol and methyl mercaptan were identified as having high feature importance. This detailed case study illustrates the promise of real-time detection techniques and machine learning in meat quality prediction. Further investigation into handling low VOC levels would enhance model performance and decision making in commercial meat quality control.
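
The train/test scheme described above (train on two high-O2 and two low-O2 conditions, test on the held-out conditions plus the commercial products) can be enumerated with a short sketch. The condition labels and the function name are hypothetical placeholders; the abstract does not name the individual storage conditions.

```python
from itertools import combinations

# Hypothetical labels for the three high-O2 and three low-O2 storage conditions.
high_o2 = ["high_O2_A", "high_O2_B", "high_O2_C"]
low_o2 = ["low_O2_A", "low_O2_B", "low_O2_C"]

def condition_holdout_splits(high, low, n_train_each=2):
    """List every split that trains on n_train_each conditions per atmosphere type.

    The remaining conditions and the commercial products form the test set.
    """
    splits = []
    for train_high in combinations(high, n_train_each):
        for train_low in combinations(low, n_train_each):
            train = list(train_high) + list(train_low)
            test = [c for c in high + low if c not in train] + ["commercial"]
            splits.append((train, test))
    return splits

splits = condition_holdout_splits(high_o2, low_o2)
for train, test in splits:
    print("train:", train, "| test:", test)
```

Holding out whole storage conditions, rather than random samples, tests whether a model generalizes to packaging atmospheres it has never seen.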


Subjects
Food Microbiology , Machine Learning , Mass Spectrometry , Volatile Organic Compounds , Animals , Volatile Organic Compounds/analysis , Swine , Mass Spectrometry/methods , Food Storage , Food Packaging/methods , Neural Networks, Computer , Pork Meat/analysis , Pork Meat/microbiology , Oxygen/analysis