Results 1 - 20 of 35
1.
BMC Med Res Methodol ; 24(1): 83, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38589775

ABSTRACT

BACKGROUND: The timing of treating cancer patients is an essential factor in the efficacy of treatment. So, patients who will not respond to current therapy should receive a different treatment as early as possible. Machine learning models can be built to classify responders and nonresponders. Such classification models predict the probability of a patient being a responder. Most methods use a probability threshold of 0.5 to convert the probabilities into binary group membership. However, the cutoff of 0.5 is not always the optimal choice. METHODS: In this study, we propose a novel data-driven approach to select a better cutoff value based on the optimal cross-validation technique. To illustrate our novel method, we applied it to three clinical trial datasets of small-cell lung cancer patients. We used two different datasets to build a scoring system to segment patients. Then the models were applied to segment patients in the test data. RESULTS: We found that, in test data, the predicted responders and non-responders had significantly different long-term survival outcomes. Our proposed novel method segments patients better than the standard approach using a cutoff of 0.5. Comparing clinical outcomes of responders versus non-responders, our novel method had a p-value of 0.009 with a hazard ratio of 0.668 for grouping patients using the Cox proportional hazards model and a p-value of 0.011 using the accelerated failure time model, which confirmed a significant difference between responders and non-responders. In contrast, the standard approach had a p-value of 0.194 with a hazard ratio of 0.823 using the Cox proportional hazards model and a p-value of 0.240 using the accelerated failure time model, indicating that the responders and non-responders do not differ significantly in survival. CONCLUSION: In summary, our novel prediction method can successfully segment new patients into responders and non-responders. 
Clinicians can use our prediction to decide if a patient should receive a different treatment or stay with the current treatment.
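
The idea of replacing the default 0.5 cutoff with a data-driven one can be sketched as follows. This is only an illustration, not the authors' procedure: the abstract does not state the selection criterion, so the sketch assumes Youden's J (sensitivity + specificity - 1) computed on out-of-fold predicted probabilities, and the names `youden_j` and `select_cutoff` are hypothetical.

```python
def youden_j(probs, labels, cutoff):
    """Sensitivity + specificity - 1 when prob >= cutoff is called a responder."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= cutoff and y == 1)
    fn = sum(1 for p, y in zip(probs, labels) if p < cutoff and y == 1)
    tn = sum(1 for p, y in zip(probs, labels) if p < cutoff and y == 0)
    fp = sum(1 for p, y in zip(probs, labels) if p >= cutoff and y == 0)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens + spec - 1.0


def select_cutoff(oof_probs, labels, candidates=None):
    """Pick the candidate cutoff with the highest Youden's J.

    oof_probs are assumed to be out-of-fold (cross-validated) probabilities,
    so the cutoff is not tuned on predictions the model has already seen.
    The criterion itself is an assumption made for this sketch.
    """
    if candidates is None:
        candidates = [i / 100 for i in range(5, 96)]  # 0.05, 0.06, ..., 0.95
    return max(candidates, key=lambda c: youden_j(oof_probs, labels, c))


if __name__ == "__main__":
    probs = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6]
    labels = [0, 0, 1, 1, 1, 1]
    print(select_cutoff(probs, labels), youden_j(probs, labels, 0.5))
```

In this toy data most true responders receive probabilities below 0.5, so the default cutoff misses them while a lower cutoff separates the two groups.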


Subject(s)
Lung Neoplasms , Small Cell Lung Carcinoma , Humans , Lung Neoplasms/diagnosis , Lung Neoplasms/therapy , Small Cell Lung Carcinoma/therapy , Treatment Outcome , Research Design
2.
Sci Total Environ ; 933: 172817, 2024 Jul 10.
Article in English | MEDLINE | ID: mdl-38688372

ABSTRACT

Shellfish poisonings have posed severe risks to human health globally. The Canadian Shellfish Sanitation Program was established in 1948 to monitor the toxin levels at shellfish harvesting sites along the coast of six provinces in Canada. Domoic acid (DA) has been a causal toxin for amnesic shellfish poisoning, and a macro-scale analysis of the temporal and spatial variation of domoic acid along Canada's coast was conducted in this study. We aggregated the toxin levels by week in blue mussel (Mytilus edulis) and soft-shell clam (Mya arenaria) samples, respectively, over a one-year scale. The subsequent application of Functional Principal Component Analysis unveiled that magnitudes of seasonal variation and peak DA levels around early summer, spring, or mid-fall formed the largest variation in the toxin levels in blue mussels along the coastlines of British Columbia and Prince Edward Island and in soft-shell clams along those of New Brunswick and Nova Scotia. In Quebec, the DA levels were low and varied mostly in terms of the overall magnitude from spring to fall. Downstream correlation analyses in British Columbia further discovered that, at most sites, the strongest correlations were negative between precipitation as well as inorganic nutrients (including nitrate, nitrite, phosphate, and silicate) on one side and DA a few weeks afterward on the other. These findings indicated associations between amnesic shellfish poisoning and environmental stresses.


Subject(s)
Environmental Monitoring , Kainic Acid , Water Pollutants, Chemical , Kainic Acid/analogs & derivatives , Kainic Acid/analysis , Animals , Canada , Water Pollutants, Chemical/analysis , Marine Toxins/analysis , Bivalvia , Mytilus edulis , Shellfish Poisoning , Seasons
3.
Environ Res ; 252(Pt 2): 118944, 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38636647

ABSTRACT

Paralytic shellfish toxins (PST) in shellfish products have led to severe risks to human health. To monitor the risk, the Canadian Shellfish Sanitation Program has been collecting longitudinal PST measurements in blue mussel (Mytilus edulis) and soft-shell clam (Mya arenaria) samples in six coastal provinces of Canada. The spatial distributions of major temporal variation patterns were studied via Functional Principal Component Analysis. Seasonal increases in PST contamination were found to vary the most in terms of magnitude along the coastlines, which provides support for location-specific management of the time-sensitive PST contamination. In British Columbia, the first functional principal component (FPC1) indicated the variance among the magnitudes, while FPC2 indicated the seasonality of the PST levels. The temporal variations tended to be positively correlated with the abundance of the dinoflagellates Alexandrium spp., and negatively with precipitation and inorganic nutrients. These findings indicate the underlying mechanism of PST variation in various geographical settings. In New Brunswick, Prince Edward Island, and Nova Scotia, the top FPCs indicated that the PST contamination differed mostly in the seasonal increase of the PST level during summer.


Subject(s)
Marine Toxins , Seasons , Animals , Longitudinal Studies , Marine Toxins/analysis , Canada , Environmental Monitoring , Mytilus edulis , Bivalvia , Principal Component Analysis , Dinoflagellida , Shellfish Poisoning
4.
Cells ; 12(24)2023 12 05.
Article in English | MEDLINE | ID: mdl-38132091

ABSTRACT

BACKGROUND: Macrophages and monocytes orchestrate inflammatory processes in the lungs. However, their role in the pathogenesis of chronic obstructive pulmonary disease (COPD), an inflammatory condition, is not well known. Here, we determined the characteristics of these cells in lungs of COPD patients and identified novel therapeutic targets. METHODS: We analyzed the single-cell RNA sequencing (scRNA-seq) data of explanted human lung tissue from COPD (n = 18) and control (n = 28) lungs and found 16 transcriptionally distinct groups of macrophages and monocytes. We performed pathway and gene enrichment analyses to determine the characteristics of macrophages and monocytes from COPD (versus control) lungs and to identify the therapeutic targets, which were then validated using data from a randomized controlled trial of COPD patients (DISARM). RESULTS: In the alveolar macrophages, 176 genes were differentially expressed (83 up- and 93 downregulated; Padj < 0.05, |log2FC| > 0.5) and were enriched in downstream biological processes predicted to cause poor lipid uptake and impaired cell activation, movement, and angiogenesis in COPD versus control lungs. Classical monocytes from COPD lungs harbored a differential gene set predicted to cause the activation, mobilization, and recruitment of cells and a hyperinflammatory response to influenza. In silico, the corticosteroid fluticasone propionate was one of the top compounds predicted to modulate the abnormal transcriptional profiles of these cells. In vivo, a fluticasone-salmeterol combination significantly modulated the gene expression profiles of bronchoalveolar lavage cells of COPD patients (p < 0.05). CONCLUSIONS: COPD lungs harbor transcriptionally distinct lung macrophages and monocytes, reflective of a dysfunctional and hyperinflammatory state. Inhaled corticosteroids and other compounds can modulate the transcriptomic profile of these cells in patients with COPD.


Subject(s)
Macrophages, Alveolar , Monocytes , Pulmonary Disease, Chronic Obstructive , Humans , Adrenal Cortex Hormones/pharmacology , Adrenal Cortex Hormones/therapeutic use , Lung/metabolism , Macrophages/metabolism , Macrophages, Alveolar/metabolism , Monocytes/metabolism , Non-Randomized Controlled Trials as Topic , Pulmonary Disease, Chronic Obstructive/drug therapy , Pulmonary Disease, Chronic Obstructive/genetics , Pulmonary Disease, Chronic Obstructive/metabolism
5.
Contemp Clin Trials Commun ; 36: 101229, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38034840

ABSTRACT

This short communication concerns a biomarker adaptive Phase 2/3 design for new oncology drugs with an uncertain biomarker effect. Depending on the outcome of an interim analysis for adaptive decision, a Phase 2 study that starts in a biomarker enriched subpopulation may continue to the end without expansion to Phase 3, expand to Phase 3 in the same population or expand to Phase 3 in a broader population. Each path can enjoy full alpha for hypothesis testing without inflating the overall Type I error.

7.
J Biomed Inform ; 146: 104501, 2023 10.
Article in English | MEDLINE | ID: mdl-37742781

ABSTRACT

BACKGROUND: We often must conduct diagnostic tests on a massive volume of samples within a limited time during outbreaks of infectious diseases (e.g., COVID-19 screening) or repeat such tests routinely (e.g., regular and massive screening for plant virus infections in farms). These tests aim to obtain the diagnostic result of all samples within a limited time. In such scenarios, the limitation of testing resources and human labor drives the need to pool individual samples and test them together to improve testing efficiency. When a pool is positive, further testing is required to identify the affected individuals; whereas when a pool is negative, we conclude all individuals in the pool are negative. How one splits the samples into pools is a critical factor affecting testing efficiency. OBJECTIVE: We aim to find the optimal strategy that adaptively guides users on optimally splitting the sample cohort into test-pools. METHODS: We developed an algorithm that minimizes the expected number of tests needed to obtain the diagnostic results of all samples. Our algorithm dynamically updates the critical information according to the result of the most recent test and calculates the optimal pool size for the next test. We implemented our novel adaptive sample pooling strategy into a web-based application, ADSP (https://ADSP.uvic.ca). ADSP interactively guides users on how many samples to pool for the current test, asks users to report the test result back and uses it to update the best strategy on how many samples to pool for the next test. RESULTS: We compared ADSP with other popular pooling methods in simulation studies, and found that ADSP requires fewer tests to diagnose a cohort and is more robust to an inaccurate initial estimate of the test cohort's disease prevalence. CONCLUSION: Our web-based application can help researchers decide how to pool their samples for grouped diagnostic tests. It improves test efficiency when grouped tests are conducted.
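
The trade-off behind pool size can be illustrated with the classical two-stage (Dorfman) scheme, in which every member of a positive pool is retested individually. This is a minimal sketch and not the ADSP algorithm, which is adaptive and is not specified in the abstract. It assumes a perfectly accurate test and independent samples, and the function names are hypothetical.

```python
def expected_tests_per_sample(prevalence, pool_size):
    """Expected tests per sample under two-stage (Dorfman) pooling.

    One pooled test is shared by pool_size samples; if the pool is positive
    (probability 1 - (1 - prevalence) ** pool_size), each sample is retested.
    Assumes a perfect test and independent samples.
    """
    if pool_size == 1:
        return 1.0  # individual testing, no pooling
    return 1.0 / pool_size + 1.0 - (1.0 - prevalence) ** pool_size


def best_pool_size(prevalence, max_pool_size=32):
    """Pool size in 1..max_pool_size that minimizes expected tests per sample."""
    return min(
        range(1, max_pool_size + 1),
        key=lambda k: expected_tests_per_sample(prevalence, k),
    )


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.5):
        k = best_pool_size(p)
        print(p, k, round(expected_tests_per_sample(p, k), 4))
```

At low prevalence, larger pools save many tests. At high prevalence, nearly every pool is positive and individual testing (pool size 1) is the better choice, which is why a poor initial prevalence estimate can hurt a fixed pooling scheme.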


Subject(s)
COVID-19 , Diagnostic Techniques and Procedures , Humans , COVID-19/diagnosis , COVID-19/epidemiology , COVID-19 Testing , Sensitivity and Specificity
9.
Bioinform Adv ; 3(1): vbad030, 2023.
Article in English | MEDLINE | ID: mdl-36949780

ABSTRACT

Motivation: Single-cell RNA-sequencing (scRNA-seq) technology enables researchers to investigate a genome at the cellular level with unprecedented resolution. An organism consists of a heterogeneous collection of cell types, each of which plays a distinct role in various biological processes. Hence, the first step of scRNA-seq data analysis is often to distinguish cell types so they can be investigated separately. Researchers have recently developed several automated cell-type annotation tools, requiring neither biological knowledge nor subjective human decisions. Dropout is a crucial characteristic of scRNA-seq data widely used in differential expression analysis. However, no current cell annotation method explicitly utilizes dropout information. Fully utilizing dropout information motivated this work. Results: We present scAnnotate, a cell annotation tool that fully utilizes dropout information. We model every gene's marginal distribution using a mixture model, which describes both the dropout proportion and the distribution of the non-dropout expression levels. Then, using an ensemble machine learning approach, we combine the mixture models of all genes into a single model for cell-type annotation. This combining approach can avoid estimating numerous parameters in the high-dimensional joint distribution of all genes. Using 14 real scRNA-seq datasets, we demonstrate that scAnnotate is competitive against nine existing annotation methods. Furthermore, because of its distinct modelling strategy, scAnnotate's misclassified cells differ greatly from competitor methods. This suggests using scAnnotate together with other methods could further improve annotation accuracy. Availability and implementation: We implemented scAnnotate as an R package and made it publicly available from CRAN: https://cran.r-project.org/package=scAnnotate. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
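
A per-gene mixture of this kind can be sketched as a point mass at zero (dropout) plus a distribution for the non-zero expression values. The sketch below is not the scAnnotate implementation: it assumes a normal distribution for non-dropout values and combines genes by simply summing per-gene log-likelihoods (a naive-Bayes-style simplification), whereas the abstract describes an ensemble learning step. All function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def fit_gene(values, min_sd=0.1):
    """Estimate (dropout proportion, mean, sd) for one gene in one cell type."""
    nonzero = [v for v in values if v > 0]
    # Smoothed proportion keeps the estimate strictly between 0 and 1.
    pi = (len(values) - len(nonzero) + 1) / (len(values) + 2)
    mu = mean(nonzero) if nonzero else 0.0
    sd = max(pstdev(nonzero), min_sd) if len(nonzero) > 1 else min_sd
    return pi, mu, sd


def gene_loglik(x, params):
    """Log-likelihood of one expression value under the dropout/normal mixture."""
    pi, mu, sd = params
    if x == 0:
        return math.log(pi)
    z = (x - mu) / sd
    return math.log(1 - pi) - math.log(sd) - 0.5 * math.log(2 * math.pi) - 0.5 * z * z


def fit_cell_type(cells):
    """cells: list of expression vectors (one per cell) from a single cell type."""
    return [fit_gene(list(column)) for column in zip(*cells)]


def annotate(cell, models):
    """Return the cell type whose per-gene models give the highest summed log-likelihood."""
    scores = {
        cell_type: sum(gene_loglik(x, p) for x, p in zip(cell, params))
        for cell_type, params in models.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    type_a = [[0, 5.0], [0, 5.5], [0, 4.5], [1.0, 5.0]]
    type_b = [[3.0, 0], [3.5, 0], [2.5, 0], [3.0, 1.0]]
    models = {"A": fit_cell_type(type_a), "B": fit_cell_type(type_b)}
    print(annotate([0, 5.2], models), annotate([3.1, 0], models))
```

Here a zero count is informative in its own right: a gene that is usually silent in one cell type and usually expressed in another separates the two even before the non-zero expression level is considered.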

10.
Mar Pollut Bull ; 189: 114712, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36827773

ABSTRACT

The vast coastline provides Canada with a flourishing seafood industry including bivalve shellfish production. To sustain a healthy bivalve molluscan shellfish production, the Canadian Shellfish Sanitation Program was established to monitor the health of shellfish harvesting habitats, and fecal coliform bacteria data have been collected at nearly 15,000 marine sample sites across six coastal provinces in Canada since 1979. We applied Functional Principal Component Analysis and subsequent correlation analyses to find annual variation patterns of bacteria levels at sites in each province. The overall magnitude and the seasonality of fecal contamination were modelled by functional principal component one and two, respectively. The amplitude was related to human and warm-blooded animal activities; the seasonality was strongly correlated with river discharge driven by precipitation and snow melt in British Columbia, but such correlation in provinces along the Atlantic coast could not be properly evaluated due to lack of data during winter.


Subject(s)
Bivalvia , Animals , Humans , Seasons , Shellfish , Gram-Negative Bacteria , British Columbia
11.
Financ Innov ; 9(1): 39, 2023.
Article in English | MEDLINE | ID: mdl-36687790

ABSTRACT

Full electronic automation in stock exchanges has recently become popular, generating high-frequency intraday data and motivating the development of near real-time price forecasting methods. Machine learning algorithms are widely applied to mid-price stock predictions. Processing raw data as inputs for prediction models (e.g., data thinning and feature engineering) can substantially affect the performance of the prediction methods. However, researchers rarely discuss this topic. This motivated us to propose three novel modelling strategies for processing raw data. We illustrate how our novel modelling strategies improve forecasting performance by analyzing high-frequency data of the Dow Jones 30 component stocks. In these experiments, our strategies often lead to statistically significant improvement in predictions. The three strategies improve the F1 scores of the SVM models by 0.056, 0.087, and 0.016, respectively. Supplementary Information: The online version contains supplementary material available at 10.1186/s40854-022-00431-9.

12.
Front Genet ; 13: 992070, 2022.
Article in English | MEDLINE | ID: mdl-36212148

ABSTRACT

Deep Learning (DL) has been broadly applied to solve big data problems in biomedical fields and has been most successful in image processing. Recently, many DL methods have been applied to analyze genomic studies. However, genomic data usually has too small a sample size to fit a complex network. Unlike images, such data do not have common structural patterns that would allow them to utilize pre-trained networks or take advantage of convolution layers. The concern of overusing DL methods motivates us to evaluate DL methods' performance versus popular non-deep Machine Learning (ML) methods for analyzing genomic data with a wide range of sample sizes. In this paper, we conduct a benchmark study using the UK Biobank data and its many random subsets with different sample sizes. The original UK Biobank data has about 500k participants. Each participant has comprehensive patient characteristics, disease histories, and genomic information, i.e., the genotypes of millions of Single-Nucleotide Polymorphisms (SNPs). We are interested in predicting the risk of three lung diseases: asthma, COPD, and lung cancer. A total of 205,238 participants have recorded disease outcomes for these three diseases. Five prediction models are investigated in this benchmark study, including three non-deep machine learning methods (Elastic Net, XGBoost, and SVM) and two deep learning methods (DNN and LSTM). Besides the most popular performance metrics, such as the F1-score, we promote the hit curve, a visual tool to describe the performance of predicting rare events. We discovered that DL methods frequently fail to outperform non-deep ML in analyzing genomic data, even in large datasets with over 200k samples. The experiment results suggest not overusing DL methods in genomic studies, even with biobank-level sample sizes. The performance differences between DL and non-deep ML decrease as the sample size of data increases. 
This suggests that when the sample size is large, further increasing it leads to greater performance gains for DL methods. Hence, DL methods could perform better on genomic datasets larger than the one analyzed in this study.

13.
Chaos ; 32(5): 053127, 2022 May.
Article in English | MEDLINE | ID: mdl-35649972

ABSTRACT

User opinion affects the performance of network reconstruction greatly since it plays a crucial role in the network structure. In this paper, we present a novel model for reconstructing the social network with community structure by taking into account the Hegselmann-Krause bounded confidence model of opinion dynamics and the compressive sensing method of network reconstruction. Three types of user opinion, including the random opinion, the polarity opinion, and the overlap opinion, are constructed. First, in Zachary's karate club network, the reconstruction accuracies are compared among three types of opinions. Second, the synthetic networks, generated by the Stochastic Block Model, are further examined. The experimental results show that the user opinions play a more important role than the community structure for the network reconstruction. Moreover, the polarity of opinions can increase the inter-community reconstruction accuracy, and the overlap of opinions can improve the intra-community reconstruction accuracy. This work helps reveal the mechanism between information propagation and social relation prediction.
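
For readers unfamiliar with the Hegselmann-Krause bounded confidence model, its update rule can be sketched in a few lines. This is the classical fully mixed version, in which every agent can observe every other agent; how the paper couples the rule to the network structure is not stated in the abstract, so that coupling is omitted here. The function names and the parameter `epsilon` (the confidence bound) are illustrative.

```python
def hk_step(opinions, epsilon):
    """One synchronous Hegselmann-Krause update.

    Each agent moves to the average opinion of all agents (itself included)
    whose opinion lies within epsilon of its own.
    """
    updated = []
    for x in opinions:
        close = [y for y in opinions if abs(x - y) <= epsilon]
        updated.append(sum(close) / len(close))
    return updated


def hk_run(opinions, epsilon, max_steps=100, tol=1e-9):
    """Iterate hk_step until opinions stop changing or max_steps is reached."""
    for _ in range(max_steps):
        nxt = hk_step(opinions, epsilon)
        if max(abs(a - b) for a, b in zip(opinions, nxt)) < tol:
            return nxt
        opinions = nxt
    return opinions


if __name__ == "__main__":
    start = [0.0, 0.1, 0.9, 1.0]
    print(hk_run(start, epsilon=0.2))  # two separate opinion clusters
    print(hk_run(start, epsilon=1.0))  # a single consensus
```

A small confidence bound leaves the population split into polarized clusters, while a large bound drives everyone to a common opinion.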


Subject(s)
Attitude , Mental Processes , Social Networking
14.
Brief Bioinform ; 23(3)2022 05 13.
Article in English | MEDLINE | ID: mdl-35368077

ABSTRACT

Survival analysis is a technique for identifying prognostic biomarkers and genetic vulnerabilities in cancer studies. Large-scale consortium-based projects have profiled >11 000 adult and >4000 pediatric tumor cases with clinical outcomes and multiomics approaches. This provides a resource for investigating molecular-level cancer etiologies using clinical correlations. Although cancers often arise from multiple genetic vulnerabilities and have deregulated gene sets (GSs), existing survival analysis protocols can report only on individual genes. Additionally, there is no systematic method to connect clinical outcomes with experimental (cell line) data. To address these gaps, we developed cSurvival (https://tau.cmmt.ubc.ca/cSurvival). cSurvival provides a user-adjustable analytical pipeline with a curated, integrated database and offers three main advances: (i) joint analysis with two genomic predictors to identify interacting biomarkers, including new algorithms to identify optimal cutoffs for two continuous predictors; (ii) survival analysis not only at the gene, but also the GS level; and (iii) integration of clinical and experimental cell line studies to generate synergistic biological insights. To demonstrate these advances, we report three case studies. We confirmed findings of autophagy-dependent survival in colorectal cancers and of synergistic negative effects between high expression of SLC7A11 and SLC2A1 on outcomes in several cancers. We further used cSurvival to identify high expression of the Nrf2-antioxidant response element pathway as a main indicator for lung cancer prognosis and for cellular resistance to oxidative stress-inducing drugs. Altogether, these analyses demonstrate cSurvival's ability to support biomarker prognosis and interaction analysis via gene- and GS-level approaches and to integrate clinical and experimental biomedical studies.


Subject(s)
Biomarkers, Tumor , Lung Neoplasms , Adult , Biomarkers, Tumor/genetics , Biomarkers, Tumor/metabolism , Cell Line , Child , Gene Expression Regulation, Neoplastic , Humans , Lung Neoplasms/genetics , Survival Analysis
15.
Front Genet ; 13: 836798, 2022.
Article in English | MEDLINE | ID: mdl-35281805

ABSTRACT

The new technology of single-cell RNA sequencing (scRNA-seq) can yield valuable insights into gene expression and give critical information about the cellular compositions of complex tissues. In recent years, vast numbers of scRNA-seq datasets have been generated and made publicly available, and this has enabled researchers to train supervised machine learning models for predicting or classifying various cell-level phenotypes. This has led to the development of many new methods for analyzing scRNA-seq data. Despite the popularity of such applications, there has as yet been no systematic investigation of the performance of these supervised algorithms using predictors from various sizes of scRNA-seq datasets. In this study, 13 popular supervised machine learning algorithms for cell phenotype classification were evaluated using published real and simulated datasets with diverse cell sizes. This benchmark comprises two parts. In the first, real datasets were used to assess the computing speed and cell phenotype classification performance of popular supervised algorithms. The classification performances were evaluated using the area under the receiver operating characteristic curve, F1-score, Precision, Recall, and false-positive rate. In the second part, we evaluated gene-selection performance using published simulated datasets with a known list of real genes. The results showed that ElasticNet with interactions performed the best for small and medium-sized datasets. The NaiveBayes classifier was found to be another appropriate method for medium-sized datasets. With large datasets, the performance of the XGBoost algorithm was found to be excellent. Ensemble algorithms were not found to be significantly superior to individual machine learning methods. Including interactions in the ElasticNet algorithm caused a significant performance improvement for small datasets. 
The linear discriminant analysis algorithm was found to be the best choice when speed is critical; it is the fastest method, it can scale to handle large sample sizes, and its performance is not much worse than the top performers.
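
The threshold-based metrics named above all derive from the confusion matrix. A minimal sketch for the binary case follows; the benchmark's own multi-class handling and averaging scheme are not described in the abstract, and `binary_metrics` is a hypothetical helper.

```python
def binary_metrics(y_true, y_pred):
    """Precision, recall, F1-score and false-positive rate for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "fpr": fpr}


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
    print(binary_metrics(y_true, y_pred))
```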

16.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34791019

ABSTRACT

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is responsible for millions of deaths around the world. To help contribute to the understanding of crucial knowledge and to further generate new hypotheses relevant to SARS-CoV-2 and human protein interactions, we make use of the information-abundant Biomine probabilistic database and extend the experimentally identified SARS-CoV-2-human protein-protein interaction (PPI) network in silico. We generate an extended network by integrating information from the Biomine database, the PPI network and other experimentally validated results. To generate novel hypotheses, we focus on the high-connectivity sub-communities that overlap most with the integrated experimentally validated results in the extended network. Therefore, we propose a new data analysis pipeline that can efficiently compute core decomposition on the extended network and identify dense subgraphs. We then evaluate the identified dense subgraph and the generated hypotheses in three contexts: literature validation for uncovered virus targeting genes and proteins, gene function enrichment analysis on subgraphs and literature support on drug repurposing for identified tissues and diseases related to COVID-19. The major types of the generated hypotheses are proteins with their encoding genes and we rank them by sorting their connections to the integrated experimentally validated nodes. In addition, we compile a comprehensive list of novel genes and proteins potentially related to COVID-19, as well as novel diseases which might be comorbidities. Together with the generated hypotheses, our results provide novel knowledge relevant to COVID-19 for further validation.
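
Core decomposition assigns each node the largest k such that the node belongs to a subgraph in which every node has at least k neighbours. A minimal sketch of the standard peeling procedure on an ordinary undirected graph is given below. It ignores the edge probabilities that a probabilistic database such as Biomine carries, so it should be read as an illustration of the concept rather than the paper's pipeline. The function names are hypothetical.

```python
def core_numbers(adj):
    """Core number of every node by repeatedly peeling a minimum-degree node.

    adj: dict mapping node -> set of neighbours (undirected, no self-loops).
    This simple version scans for the minimum each round, which is quadratic
    in the number of nodes; bucket-based variants run in linear time.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        v = min(remaining, key=degree.get)
        k = max(k, degree[v])  # core numbers never decrease during peeling
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core


def k_core(adj, k):
    """Nodes of the k-core, i.e. a dense subgraph candidate."""
    return {v for v, c in core_numbers(adj).items() if c >= k}


if __name__ == "__main__":
    # Triangle a-b-c with a pendant node d attached to a.
    graph = {
        "a": {"b", "c", "d"},
        "b": {"a", "c"},
        "c": {"a", "b"},
        "d": {"a"},
    }
    print(core_numbers(graph), k_core(graph, 2))
```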


Subject(s)
COVID-19 , Computer Simulation , Models, Biological , Protein Interaction Maps , COVID-19/genetics , COVID-19/metabolism , Humans , SARS-CoV-2/chemistry , SARS-CoV-2/genetics , SARS-CoV-2/metabolism
17.
EClinicalMedicine ; 38: 101035, 2021 Aug.
Article in English | MEDLINE | ID: mdl-34308301

ABSTRACT

BACKGROUND: Many countries have implemented lockdowns to reduce COVID-19 transmission. However, there is no consensus on the optimal timing of these lockdowns to control community spread of the disease. Here we evaluated the relationship between timing of lockdowns, along with other risk factors, and the growth trajectories of COVID-19 across 3,112 counties in the US. METHODS: We ascertained dates for lockdowns and implementation of various non-pharmaceutical interventions at a county level and merged these data with those of US census and county-specific COVID-19 daily cumulative case counts. We then applied a Functional Principal Component (FPC) analysis on this dataset to generate FPC scores, which were used as a surrogate variable to describe the trajectory of daily cumulative case counts for each county. We used machine learning methods to identify risk factors including the timing of lockdown that significantly influenced the FPC scores. FINDINGS: We found that the first eigen-function accounted for most (>92%) of the variations in the daily cumulative case counts. The impact of lockdown timing on the total daily case count of a county became significant beginning approximately 7 days prior to that county reporting at least 5 cumulative cases of COVID-19. Delays in lockdown implementation after this date led to a rapid acceleration of COVID-19 spread in the county over the first ~50 days from the date with at least 5 cumulative cases, and higher case counts across the entirety of the follow-up period. Other factors such as total population, median family income, Gini index, median age, and within-county mobility also had a substantial effect. When adjusted for all these factors, the timing of lockdowns was the most significant risk factor associated with the county-specific daily cumulative case counts. INTERPRETATION: Lockdowns are an effective way of controlling the spread of COVID-19 in communities. 
Significant delays in lockdown cause a dramatic increase in the cumulative case counts. Thus, the timing of the lockdown relative to the case count is an important consideration in controlling the pandemic in communities. FUNDING: The study period is from June 2020 to July 2021. Dr. Xuekui Zhang is a Tier 2 Canada Research Chair (Grant No. 950231363) and funded by Natural Sciences and Engineering Research Council of Canada (Grant No. RGPIN201704722). Dr. Li Xing is funded by Natural Sciences and Engineering Research Council of Canada (Grant Number: RGPIN 202103530). This research was enabled in part by support provided by WestGrid (www.westgrid.ca) and Compute Canada (www.computecanada.ca). The computing resource is provided by Compute Canada Resource Allocation Competitions #3495 (PI: Xuekui Zhang) and #1551 (PI: Li Xing). Dr. Don Sin is a Tier 1 Canada Research Chair in COPD and holds the de Lazzari Family Chair at the Heart Lung Innovation, Vancouver, Canada.

18.
Stat Med ; 40(7): 1752-1766, 2021 03 30.
Article in English | MEDLINE | ID: mdl-33426649

ABSTRACT

As a future trend of healthcare, personalized medicine tailors medical treatments to individual patients. It requires identifying a subset of patients with the best response to treatment. The subset can be defined by a biomarker (eg, expression of a gene) and its cutoff value. Topics on subset identification have received massive attention. There are over two million hits by keyword searches on Google Scholar. However, designing clinical trials that utilize the discovered uncertain subsets/biomarkers is not trivial and rarely discussed in the literature. This leads to a gap between research results and real-world drug development. To fill in this gap, we formulate the problem of clinical trial design into an optimization problem involving high-dimensional integration, and propose a novel computational solution based on Monte Carlo and smoothing methods. Our method utilizes the modern techniques of general purpose computing on graphics processing units for large-scale parallel computing. Compared to a published method in three-dimensional problems, our approach is more accurate and 133 times faster. This advantage increases when dimensionality increases. Our method is scalable to higher dimensional problems since the precision bound of our estimated study power is a finite number not affected by dimensionality. To design clinical trials incorporating the potential biomarkers, users can use our software "DesignCTPB". This software can be found on GitHub and will be available as an R package on CRAN. Although our research is motivated by the design of clinical trials, the method can be used widely to solve other optimization problems involving high-dimensional integration.
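
The idea of estimating study power by Monte Carlo integration can be illustrated in one dimension, where a closed form is available for comparison. The sketch assumes a one-sided z-test with known unit variance; it is a toy stand-in for the high-dimensional problem the paper addresses, not the DesignCTPB method, and the function names and default values are hypothetical.

```python
import random
from statistics import NormalDist


def analytic_power(effect, n, alpha=0.025):
    """Power of a one-sided z-test with standardized effect size and n subjects."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return 1 - NormalDist().cdf(z_alpha - effect * n ** 0.5)


def monte_carlo_power(effect, n, alpha=0.025, n_sim=20000, seed=0):
    """Estimate the same power as the fraction of simulated trials that reject.

    Each simulated trial draws its test statistic from N(effect * sqrt(n), 1).
    """
    rng = random.Random(seed)
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    shift = effect * n ** 0.5
    rejections = sum(1 for _ in range(n_sim) if rng.gauss(shift, 1.0) > z_alpha)
    return rejections / n_sim


if __name__ == "__main__":
    print(analytic_power(0.3, 100), monte_carlo_power(0.3, 100))
```

The Monte Carlo error shrinks with the square root of the number of simulated trials regardless of the dimension of the integral, which is why the approach remains usable when grid-based integration does not.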


Subject(s)
Computer Graphics , Software , Algorithms , Biomarkers , Humans , Monte Carlo Method
19.
Bioinformation ; 16(5): 393-397, 2020.
Article in English | MEDLINE | ID: mdl-32831520

ABSTRACT

Genome-wide association study (GWAS) is a popular approach to investigate relationships between genetic information and diseases. A number of associations are tested in a study and the results are often corrected using multiple-testing adjustment methods. It is observed that GWAS studies suffer from inadequate statistical power for reliability. Hence, we document known models for reliability assessment using improved statistical power in GWAS analysis.
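
Two widely used multiple-testing adjustments, Bonferroni and Benjamini-Hochberg, can be sketched as follows. The abstract does not say which adjustment methods the article covers, so these two are chosen only as common examples; they also show why a stricter correction reduces power.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject each hypothesis whose p-value is at most alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate.

    Find the largest rank r (1-based, in ascending p-value order) with
    p_(r) <= alpha * r / m, then reject the r smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[i] = True
    return rejected


if __name__ == "__main__":
    pvals = [0.001, 0.012, 0.025, 0.039, 0.6]
    print(bonferroni(pvals))
    print(benjamini_hochberg(pvals))
```

On this toy input Bonferroni keeps only the smallest p-value, while Benjamini-Hochberg keeps four, illustrating how the choice of adjustment affects the number of associations a study can detect.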

20.
Bioinformatics ; 36(1): 65-72, 2020 01 01.
Article in English | MEDLINE | ID: mdl-31263871

ABSTRACT

MOTIVATION: HIV is difficult to treat because its virus mutates at a high rate and mutated viruses easily develop resistance to existing drugs. If the relationships between mutations and drug resistances can be determined from historical data, patients can be provided personalized treatment according to their own mutation information. The HIV Drug Resistance Database was built to investigate the relationships. Our goal is to build a model using data in this database, which simultaneously predicts the resistance of multiple drugs using mutation information from sequences of viruses for any new patient. RESULTS: We propose two variations of a stacking algorithm which borrow information among multiple prediction tasks to improve multivariate prediction performance. The most attractive feature of our proposed methods is the flexibility with which complex multivariate prediction models can be constructed using any univariate prediction models. Using cross-validation studies, we show that our proposed methods outperform other popular multivariate prediction methods. AVAILABILITY AND IMPLEMENTATION: An R package is being developed. In the meantime, R code can be requested by email. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Drug Resistance, Viral , HIV Infections , HIV-1 , Computational Biology/methods , Drug Resistance, Viral/genetics , HIV Infections/virology , HIV-1/drug effects , HIV-1/genetics , Humans , Mutation , Software