1.
Biom J ; 66(5): e202300278, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38988195

ABSTRACT

Rapid advances in high-throughput DNA sequencing technologies have enabled large-scale whole genome sequencing (WGS) studies. Before performing association analysis between phenotypes and genotypes, preprocessing and quality control (QC) of the raw sequence data need to be performed. Because many biostatisticians have not worked with WGS data so far, we first sketch Illumina's short-read sequencing technology. Second, we explain the general preprocessing pipeline for WGS studies. Third, we provide an overview of important QC metrics, which are applied to WGS data: on the raw data, after mapping and alignment, after variant calling, and after multisample variant calling. Fourth, we illustrate the QC with the data from the GENEtic SequencIng Study Hamburg-Davos (GENESIS-HD), a study involving more than 9000 human whole genomes. All samples were sequenced on an Illumina NovaSeq 6000 with an average coverage of 35× using a PCR-free protocol. For QC, one Genome in a Bottle (GIAB) trio was sequenced in four replicates, and one GIAB sample was successfully sequenced 70 times in different runs. Fifth, we provide empirical data on the compression of raw data using the DRAGEN original read archive (ORA). The most important quality metrics in the application were genetic similarity, sample cross-contamination, deviations from the expected Het/Hom ratio, relatedness, and coverage. The compression ratio of the raw files using DRAGEN ORA was 5.6:1, and compression time was linear in genome coverage. In summary, the preprocessing, joint calling, and QC of large WGS studies are feasible within a reasonable time, and efficient QC procedures are readily available.
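
One of the QC metrics named in this abstract, the Het/Hom ratio, can be illustrated with a minimal sketch. It assumes genotypes are given as VCF-style strings such as "0/1" or "1|1" and defines the ratio as heterozygous calls divided by homozygous non-reference calls; the function name `het_hom_ratio` is hypothetical, and the exact definition used in the study or in any particular QC tool may differ.

```python
def het_hom_ratio(genotypes):
    """Ratio of heterozygous to homozygous-alternate calls for one sample.

    Assumption: each genotype is a diploid VCF-style string ("0/1", "1|1", ...).
    Missing calls ("./.") and non-diploid entries are skipped.
    """
    het = 0
    hom_alt = 0
    for gt in genotypes:
        # Treat phased ("|") and unphased ("/") separators the same way.
        alleles = gt.replace("|", "/").split("/")
        if len(alleles) != 2 or "." in alleles:
            continue
        first, second = alleles
        if first != second:
            het += 1          # two different alleles, e.g. 0/1 or 1/2
        elif first != "0":
            hom_alt += 1      # two copies of the same non-reference allele
    if hom_alt == 0:
        raise ValueError("no homozygous-alternate calls; ratio is undefined")
    return het / hom_alt


if __name__ == "__main__":
    sample = ["0/1", "0|1", "1/1", "0/0", "./.", "1/2"]
    print(het_hom_ratio(sample))
```

A sample whose ratio deviates strongly from the rest of the cohort would then be flagged for closer inspection, for example for possible contamination.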


Subject(s)
Quality Control , Whole Genome Sequencing , Humans , Biometry/methods , Biostatistics/methods , High-Throughput Nucleotide Sequencing
2.
Invest Ophthalmol Vis Sci ; 65(8): 7, 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38958969

ABSTRACT

Purpose: To describe and demonstrate sample size and power calculation for ophthalmic studies with a binary outcome from one or both eyes. Methods: We describe sample size and power calculation for four commonly used eye designs: (1) one-eye design or person-design: one eye per subject or outcome is at person-level; (2) paired design: two eyes per subject and two eyes are in different treatment groups; (3) two-eye design: two eyes per subject and both eyes are in the same treatment group; and (4) mixture design: mixture of one eye and two eyes per subject. For each design, we demonstrate sample size and power calculations in real ophthalmic studies. Results: Using formulas and commercial or free statistical packages including SAS, STATA, R, and PS, we calculated sample size and power. We demonstrated that different statistical packages require different parameters and provide similar, yet not identical, results. We emphasize that studies using data from two eyes of a subject need to account for the intereye correlation for appropriate sample size and power calculations. We demonstrate the gain in efficiency in designs that include two eyes of a subject compared to one-eye designs. Conclusions: Ophthalmic studies use different eye designs that include one or both eyes in the same or different treatment groups. Appropriate sample size and power calculations depend on the eye design and should account for intereye correlation when two eyes from some or all subjects are included in a study. Calculations can be executed using formulas and commercial or free statistical packages.
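
The role of the intereye correlation can be sketched with a standard design-effect approximation. This is an illustration under stated assumptions, not the formulas or software used in the article: it uses the usual normal-approximation sample size for comparing two proportions and, for a two-eye design with both eyes in the same group, inflates the number of eyes by (1 + ρ), where ρ is the assumed intereye correlation. All function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def eyes_per_group(p1, p2, alpha=0.05, power=0.80):
    """Independent eyes needed per group to compare two proportions
    (two-sided test, normal approximation, unpooled variance)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return z ** 2 * variance / (p1 - p2) ** 2


def subjects_one_eye_design(p1, p2, alpha=0.05, power=0.80):
    """One eye per subject, so subjects equal eyes."""
    return ceil(eyes_per_group(p1, p2, alpha, power))


def subjects_two_eye_design(p1, p2, rho, alpha=0.05, power=0.80):
    """Two eyes per subject, both in the same group.

    Assumption: the variance is inflated by the design effect 1 + rho
    for clusters of size two, and each subject contributes two eyes.
    """
    inflated_eyes = eyes_per_group(p1, p2, alpha, power) * (1 + rho)
    return ceil(inflated_eyes / 2)


if __name__ == "__main__":
    print(subjects_one_eye_design(0.5, 0.3))
    print(subjects_two_eye_design(0.5, 0.3, rho=0.5))
```

Under these assumptions, detecting 50% versus 30% with 80% power needs 91 subjects per group in a one-eye design and 68 per group in a two-eye design with ρ = 0.5. When ρ = 1 the second eye adds no information and the two designs need the same number of subjects.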


Subject(s)
Biostatistics , Ophthalmology , Humans , Sample Size , Biostatistics/methods , Research Design , Eye Diseases/diagnosis
4.
Pharm Stat ; 23(4): 495-510, 2024.
Article in English | MEDLINE | ID: mdl-38326967

ABSTRACT

We present the motivation, experience, and learnings from a data challenge conducted at a large pharmaceutical corporation on the topic of subgroup identification. The data challenge aimed at exploring approaches to subgroup identification for future clinical trials. To mimic a realistic setting, participants had access to four Phase III clinical trials to derive a subgroup and predict its treatment effect in a future study not accessible to challenge participants. A total of 30 teams with around 100 participants registered for the challenge, primarily from the Biostatistics organization. We outline the motivation for running the challenge, the challenge rules, and logistics. Finally, we present the results of the challenge, the participant feedback, and the learnings. We also present our view on the implications of the results for exploratory analyses related to treatment effect heterogeneity.


Subject(s)
Clinical Trials, Phase III as Topic , Motivation , Humans , Clinical Trials, Phase III as Topic/methods , Drug Industry , Research Design , Treatment Outcome , Biostatistics/methods , Data Interpretation, Statistical
5.
Curr Opin Allergy Clin Immunol ; 24(4): 237-242, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-38236908

ABSTRACT

PURPOSE OF REVIEW: The aim of this review was to present recent articles indicating the need to implement statistical recommendations in the daily work of biomedical journals. RECENT FINDINGS: The most recent literature shows that the percentage of journals using specialized statistical review has not changed over 20 years. The difficulty of finding statistical reviewers, the impractical way in which biostatistics is taught, and the nonimplementation of published statistical recommendations all contribute to the fact that only a small percentage of accepted manuscripts contain correctly performed analyses. The statistical recommendations published for authors and editorial board members in recent years contain important advice, but more emphasis should be placed on their practical and rigorous implementation. Otherwise, we will continue to experience low reproducibility of research. SUMMARY: The level of statistical reporting is currently low. Recommendations related to the statistical review of submitted manuscripts should be followed more rigorously.


Subject(s)
Research Design , Humans , Reproducibility of Results , Research Design/standards , Periodicals as Topic , Data Interpretation, Statistical , Biostatistics/methods
6.
Nucleic Acids Res ; 52(D1): D963-D971, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37953384

ABSTRACT

Polygenic score (PGS) is an important tool for the genetic prediction of complex traits. However, there are currently no resources providing comprehensive PGSs computed from published summary statistics, and it is difficult to implement and run different PGS methods due to the complexity of their pipelines and parameter settings. To address these issues, we introduce a new resource called PGS-Depot containing the most comprehensive set of publicly available disease-related GWAS summary statistics. PGS-Depot includes 5585 high quality summary statistics (1933 quantitative and 3652 binary trait statistics) curated from 1564 traits in European and East Asian populations. A standardized best-practice pipeline is used to implement 11 summary statistics-based PGS methods, each with different model assumptions and estimation procedures. The prediction performance of each method can be compared for both in- and cross-ancestry populations, and users can also submit their own summary statistics to obtain custom PGS with the available methods. Other features include searching for PGSs by trait name, publication, cohort information, population, or the MeSH ontology tree and searching for trait descriptions with the experimental factor ontology (EFO). All scores, SNP effect sizes and summary statistics can be downloaded via FTP. PGS-Depot is freely available at http://www.pgsdepot.net.
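
At its core, a polygenic score is a weighted sum of allele dosages, with weights taken from SNP effect sizes. The sketch below shows only that final scoring step under simple assumptions; it is not the PGS-Depot pipeline or any of its 11 methods, and the function name and SNP identifiers are hypothetical.

```python
def polygenic_score(dosages, effect_sizes):
    """Weighted sum of allele dosages for one individual.

    dosages: SNP id -> effect-allele dosage (0, 1, 2, or an imputed value)
    effect_sizes: SNP id -> per-allele effect size
    Assumption: SNPs without a dosage for this individual are skipped.
    """
    score = 0.0
    for snp, beta in effect_sizes.items():
        if snp in dosages:
            score += beta * dosages[snp]
    return score


if __name__ == "__main__":
    # Hypothetical SNP ids and effect sizes, for illustration only.
    effects = {"snp_a": 0.5, "snp_b": -0.25, "snp_c": 1.0}
    person = {"snp_a": 2, "snp_b": 1}
    print(polygenic_score(person, effects))
```

The PGS methods the abstract refers to differ mainly in how they derive the effect sizes from summary statistics; the scoring step itself stays this simple.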


Subject(s)
Biostatistics , Multifactorial Inheritance , Genome-Wide Association Study , Multifactorial Inheritance/genetics , Phenotype , Polymorphism, Single Nucleotide , Biostatistics/methods
8.
Biostatistics ; 25(3): 666-680, 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38141227

ABSTRACT

With rapid development of techniques to measure brain activity and structure, statistical methods for analyzing modern brain-imaging data play an important role in the advancement of science. Imaging data that measure brain function are usually multivariate high-density longitudinal data and are heterogeneous across both imaging sources and subjects, which lead to various statistical and computational challenges. In this article, we propose a group-based method to cluster a collection of multivariate high-density longitudinal data via a Bayesian mixture of smoothing splines. Our method assumes each multivariate high-density longitudinal trajectory is a mixture of multiple components with different mixing weights. Time-independent covariates are assumed to be associated with the mixture components and are incorporated via logistic weights of a mixture-of-experts model. We formulate this approach under a fully Bayesian framework using Gibbs sampling where the number of components is selected based on a deviance information criterion. The proposed method is compared to existing methods via simulation studies and is applied to a study on functional near-infrared spectroscopy, which aims to understand infant emotional reactivity and recovery from stress. The results reveal distinct patterns of brain activity, as well as associations between these patterns and selected covariates.


Subject(s)
Bayes Theorem , Humans , Longitudinal Studies , Brain/physiology , Brain/diagnostic imaging , Spectroscopy, Near-Infrared/methods , Data Interpretation, Statistical , Models, Statistical , Infant , Multivariate Analysis , Biostatistics/methods
9.
Int J Biostat ; 19(2): 309-331, 2023 Nov 01.
Article in English | MEDLINE | ID: mdl-37192544

ABSTRACT

In this work, we examine recently developed methods for Bayesian inference of optimal dynamic treatment regimes (DTRs). DTRs are a set of treatment decision rules aimed at tailoring patient care to patient-specific characteristics, thereby falling within the realm of precision medicine. In this field, researchers seek to tailor therapy with the intention of improving health outcomes; therefore, they are most interested in identifying optimal DTRs. Recent work has developed Bayesian methods for identifying optimal DTRs in a family indexed by ψ via Bayesian dynamic marginal structural models (MSMs) (Rodriguez Duque D, Stephens DA, Moodie EEM, Klein MB. Semiparametric Bayesian inference for dynamic treatment regimes via dynamic regime marginal structural models. Biostatistics; 2022. In press); we review the proposed estimation procedure and illustrate its use via the new BayesDTR R package. Although the methods in Rodriguez Duque et al. (Biostatistics; 2022) can estimate optimal DTRs well, they may lead to biased estimators when the model for the value function (the expected outcome if everyone in a population were to follow a given treatment strategy) is misspecified or when a grid search for the optimum is employed. We describe recent work that uses a Gaussian process (GP) prior on the value function as a means to robustly identify optimal DTRs (Rodriguez Duque D, Stephens DA, Moodie EEM. Estimation of optimal dynamic treatment regimes using Gaussian processes; 2022. Available from: https://doi.org/10.48550/arXiv.2105.12259). We demonstrate how a GP approach may be implemented with the BayesDTR package and contrast it with other value-search approaches to identifying optimal DTRs. We use data from an HIV therapeutic trial to illustrate a standard analysis with these methods, using both the original observed trial data and an additional simulated component to showcase a longitudinal (two-stage DTR) analysis.


Subject(s)
Models, Statistical , Precision Medicine , Humans , Bayes Theorem , Precision Medicine/methods , Biostatistics/methods
10.
Rev Esp Enferm Dig ; 115(4): 157-159, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36779473

ABSTRACT

Statistical tests are the foundation on which data analysis for all types of clinical, basic and epidemiological research relies. Clinicians need to understand these tests and the basics of biostatistics to correctly report the data and results from their research and to understand the results of other scientific publications. The first article in this three-part editorial series aims to present, in a clear way, some essential concepts of biostatistics that can help the gastroenterologist understand scientific research and support an evidence-based clinical practice.


Subject(s)
Gastroenterologists , Humans , Biostatistics/methods
11.
Lifetime Data Anal ; 29(3): 508-536, 2023 Jul.
Article in English | MEDLINE | ID: mdl-36624222

ABSTRACT

The progression of disease for an individual can be described mathematically as a stochastic process. The individual experiences a failure event when the disease path first reaches or crosses a critical disease level. This occurrence defines a failure event and a first hitting time, or time-to-event, both of which are important in medical contexts. When the context involves explanatory variables, there is usually an interest in incorporating regression structures into the analysis, and the methodology known as threshold regression comes into play. To date, most applications of threshold regression have been based on parametric families of stochastic processes. This paper presents a semiparametric form of threshold regression that requires the stochastic process to have only one key property, namely, stationary independent increments. As this property is frequently encountered in real applications, this model has potential for use in many fields. The mathematical underpinnings of this semiparametric approach for estimation and prediction are described. The basic data element required by the model is a pair of readings representing the observed change in time and the observed change in disease level, arising from either a failure event or survival of the individual to the end of the data record. An extension is presented for applications where the underlying disease process is unobservable but component covariate processes are available to construct a surrogate disease process. Threshold regression, used in combination with a data technique called Markov decomposition, allows the methods to handle longitudinal time-to-event data by uncoupling a longitudinal record into a sequence of single records. Computational aspects of the methods are straightforward. An array of simulation experiments that verify computational feasibility and statistical inference is reported in an online supplement. Case applications based on longitudinal observational data from The Osteoarthritis Initiative (OAI) study are presented to demonstrate the methodology and its practical use.
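
The basic data element described in this abstract, a pair of readings for the change in time and the change in disease level together with whether the record ended in failure, can be sketched for a discretely observed path. This is a minimal illustration that assumes the path starts at its first reading and that failure means reaching or exceeding a fixed threshold; the function name is hypothetical and this is not the authors' estimation procedure.

```python
def summarize_record(times, levels, threshold):
    """Reduce one observed disease path to (delta_time, delta_level, failed).

    Assumptions: times and levels are equal-length, time-ordered sequences;
    the first reading is the baseline; failure occurs at the first reading
    whose level is at or above the threshold (the first hitting time on
    the observation grid).
    """
    if len(times) != len(levels) or not times:
        raise ValueError("times and levels must be non-empty and equal in length")
    start_time, start_level = times[0], levels[0]
    for t, level in zip(times, levels):
        if level >= threshold:
            # Failure: record the changes up to the first hitting time.
            return (t - start_time, level - start_level, True)
    # Survival: record the changes up to the end of the data record.
    return (times[-1] - start_time, levels[-1] - start_level, False)


if __name__ == "__main__":
    times = [0, 1, 2, 3]
    levels = [0.0, 0.4, 1.2, 0.9]
    print(summarize_record(times, levels, threshold=1.0))
    print(summarize_record(times, levels, threshold=5.0))
```

With the threshold at 1.0 the path fails at the third reading; with the threshold at 5.0 the individual survives to the end of the record.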


Subject(s)
Biostatistics , Models, Statistical , Humans , Stochastic Processes , Computer Simulation , Time Factors , Biostatistics/methods
13.
Rev. esp. enferm. dig ; 115(4): 157-159, 2023. ilus
Article in English | IBECS | ID: ibc-218581
14.
Rev. cuba. inform. méd ; 14(2): e529, Jul.-Dec. 2022. tab, graf
Article in Spanish | LILACS, CUMED | ID: biblio-1408550

ABSTRACT

The use of mobile devices is essential in modern life because of the advantages they provide, offering new possibilities and delivering established services virtually. The fact that students in Cuba have more mobile phones than computers motivated the development of this application. The objective of this article is to describe the application Cálculos estadísticos y tasas en salud (Calc. Tasas, version 1.7), built to perform calculations in a Biostatistics course, covering a large part of the content of this subject in undergraduate teaching at medical universities, as well as other content of interest in this field. It also incorporates a database with demographic and health information on Cuba and its provinces for the period 2013-2020. As a result, technological independence was achieved by no longer relying on foreign programs, and portability improved, since the application runs on both mobile phones and computers using an Android emulator.


Subject(s)
Humans , Male , Female , Mathematical Computing , Medical Informatics Applications , Programming Languages , Biostatistics/methods , Mobile Applications , Cuba
15.
Rev. chil. infectol ; 39(5): 630-639, Oct. 2022. ilus, tab
Article in Spanish | LILACS | ID: biblio-1431692

ABSTRACT

Statistics is one of the pillars of science, especially for describing and analyzing data, and it has progressed exponentially in recent years. It should be understood as a fundamental support for decision-making in different disciplines. Exploratory analyses are described as the first step in statistics, seeking to organize, represent and describe the data, but this process often becomes complex, so an exhaustive review of the data matrix is important. The aim of this manuscript is to describe a methodological proposal for database validation applied to the health sciences. The methodology consists of six stages: four of them compulsory and successive, and the other two optional. In the medical literature these procedures are generally overlooked; our purpose is therefore to emphasize the importance of this process as a step prior to exploratory analyses.


Subject(s)
Humans , Biostatistics/methods , Databases as Topic , Health Sciences , Data Accuracy
16.
PLoS One ; 17(7): e0272007, 2022.
Article in English | MEDLINE | ID: mdl-35867721

ABSTRACT

Interval estimation with accurate coverage for risk difference (RD) in a correlated 2 × 2 table with structural zero is a fundamental and important problem in biostatistics. The score test-based and Bayesian tail-based confidence intervals (CIs) have good coverage performance among the existing methods. However, as approximation approaches, they have coverage probabilities lower than the nominal confidence level for finite and moderate sample sizes. In this paper, we propose three new CIs for RD based on the fiducial, inferential model (IM) and modified IM (MIM) methods. The IM interval is proven to be valid. Moreover, simulation studies show that the CIs of fiducial and MIM methods can guarantee the preset coverage rate even for small sample sizes. More importantly, in terms of coverage probability and expected length, the MIM interval outperforms other intervals. Finally, a real example illustrates the application of the proposed methods.


Subject(s)
Biostatistics , Models, Statistical , Bayes Theorem , Biometry , Biostatistics/methods , Confidence Intervals , Sample Size
17.
Parasit Vectors ; 15(1): 35, 2022 Jan 24.
Article in English | MEDLINE | ID: mdl-35073988

ABSTRACT

Dose-response relationships reflect the effects of a substance on organisms, and are widely used in broad research areas, from medicine and physiology, to vector control and pest management in agronomy. Furthermore, reporting on the response of organisms to stressors is an essential component of many public policies (e.g. public health, environment), and assessment of xenobiotic responses is an integral part of World Health Organization recommendations. Building upon an R script that we previously made available, and considering its popularity, we have now developed a software package in the R environment, BioRssay, to efficiently analyze dose-response relationships. It has more user-friendly functions and more flexibility, and proposes an easy interpretation of the results. The functions in the BioRssay package are built on robust statistical analyses to compare the dose/exposure-response of various bioassays and effectively visualize them in probit-graphs.


Subject(s)
Biological Assay/statistics & numerical data , Biostatistics , Software , Animals , Biostatistics/instrumentation , Biostatistics/methods , Dose-Response Relationship, Drug , Humans , Lethal Dose 50 , Public Health/statistics & numerical data