Results 1 - 10 of 10
1.
Drug Saf ; 45(5): 549-561, 2022 05.
Article in English | MEDLINE | ID: mdl-35579817

ABSTRACT

INTRODUCTION: Coding medicinal products described on adverse event (AE) reports to specific entries in standardised drug dictionaries, such as WHODrug Global, is a time-consuming step in case processing activities despite its potential for automation. Many organisations are already partially automating drug coding using text-processing methods and synonym lists; however, addressing challenges such as misspellings, abbreviations or ambiguous trade names requires more advanced methods. WHODrug Koda is a drug coding engine that uses text-processing algorithms, built-in coding rules and machine learning to code drug verbatims to WHODrug Global. OBJECTIVE: Our aim was to evaluate the drug coding performance of WHODrug Koda on AE reports from VigiBase, the World Health Organization's global database of individual case safety reports, in terms of level of automation and coding quality. METHODS: Koda was evaluated on 4.8 million drug entries from VigiBase. Automation level was computed as the proportion of drug entries automatically coded by Koda and was compared to that of a simple case-insensitive text-matching algorithm. Coding quality was evaluated in terms of coding accuracy, by comparing Koda's predictions with the WHODrug entries found on the AE reports in VigiBase. To better understand the cases in which Koda's coding results did not match the WHODrug entries in VigiBase, a manual assessment of 600 samples of disagreeing encodings was performed by two teams of expert drug coders. RESULTS: Compared with a simple direct-match baseline, Koda can increase the automation level from 61% to 89%, while providing high coding quality with an accuracy of 97%. CONCLUSIONS: Even though Koda was designed for use in clinical trials, it achieves an automation level and coding quality for drug coding of AE reports comparable with the performance observed in a previous evaluation of Koda on clinical trial data. Koda can thus help organisations to automate their drug coding of AE reports to a large degree.
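
The direct-match baseline and the automation level described in the abstract above can be illustrated with a minimal sketch. The dictionary entries and codes below are hypothetical placeholders, not WHODrug Global content, and stripping surrounding whitespace is an assumption; the abstract only states that the baseline is case-insensitive.

```python
from typing import Dict, List, Optional

# Hypothetical toy dictionary: case-folded drug name -> made-up code.
TOY_DICTIONARY: Dict[str, str] = {
    "paracetamol": "D001",
    "ibuprofen": "D002",
}


def direct_match(verbatim: str, dictionary: Dict[str, str]) -> Optional[str]:
    """Return a code only when the whole verbatim equals a dictionary name,
    ignoring case (and, as an assumption, surrounding whitespace)."""
    return dictionary.get(verbatim.strip().casefold())


def automation_level(verbatims: List[str], dictionary: Dict[str, str]) -> float:
    """Proportion of drug entries that were coded automatically."""
    if not verbatims:
        return 0.0
    coded = sum(1 for v in verbatims if direct_match(v, dictionary) is not None)
    return coded / len(verbatims)


if __name__ == "__main__":
    entries = ["Paracetamol", "IBUPROFEN", "paracetmol", "ibuprofen 400mg"]
    # The misspelling and the entry with a strength suffix stay uncoded.
    print(automation_level(entries, TOY_DICTIONARY))
```

The two uncoded entries in the usage example are the kind of input (misspellings, extra text) that the abstract says requires more advanced methods than direct matching.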


Subjects
Algorithms, Artificial Intelligence, Automation, Factual Databases, Humans, Machine Learning
2.
Mol Biol Evol ; 37(10): 2944-2954, 2020 10 01.
Article in English | MEDLINE | ID: mdl-32697301

ABSTRACT

The southern African indigenous Khoe-San populations harbor the most divergent lineages of all living peoples. Exploring their genomes is key to understanding deep human history. We sequenced 25 full genomes from five Khoe-San populations, revealing many novel variants and showing that 25% of variants are unique to the Khoe-San and that the Khoe-San group harbors the greatest level of diversity across the globe. In line with previous studies, we found several gene regions with extreme values in genome-wide scans for selection, potentially caused by natural selection in the lineage leading to Homo sapiens and more recently. These gene regions included immunity-, sperm-, brain-, diet-, and muscle-related genes. When accounting for recent admixture, all Khoe-San groups display genetic diversity approaching the levels in other African groups and a reduction in effective population size starting around 100,000 years ago. Hence, all human groups show a reduction in effective population size commencing around the time of the Out-of-Africa migrations, which coincides with changes in the paleoclimate records, changes that potentially impacted all humans at the time.


Subjects
Biological Evolution, Human Genome, Human Migration, Indigenous Peoples/genetics, Population Density, Africa South of the Sahara, Humans, Phylogeography
3.
Drug Saf ; 43(8): 797-808, 2020 08.
Article in English | MEDLINE | ID: mdl-32410156

ABSTRACT

INTRODUCTION: A large number of studies on systems to detect and sometimes normalize adverse events (AEs) in social media have been published, but evidence of their practical utility is scarce. This raises the question of the transferability of such systems to new settings. OBJECTIVES: The aims of this study were to develop an AE recognition system, prospectively evaluate its performance on an external benchmark dataset and identify potential factors influencing the transferability of AE recognition systems. METHODS: A pipeline based on dictionary lookups and logistic regression classifiers was developed using a proprietary dataset of 196,533 Tweets manually annotated for AE relations. The system was then prospectively evaluated on the publicly available WEB-RADR reference dataset, exploring different aspects affecting transferability. RESULTS: Our system achieved 0.53 precision, 0.52 recall and 0.52 F1-score on the development test set; however, when applied to the WEB-RADR reference dataset, system performance dropped to 0.38 precision, 0.20 recall and 0.26 F1-score. Similarly, a previously published method aiming at automatically detecting adverse event posts reported 0.5 precision, 0.92 recall and 0.65 F1-score on another dataset, while its performance on the WEB-RADR reference dataset was reduced to 0.37 precision, 0.63 recall and 0.46 F1-score. We identified four potential factors leading to poor transferability: overfitting, selection bias, label bias and prevalence. CONCLUSION: We warn the community about a potentially large discrepancy between the expected performance of automated AE recognition systems based on published results and the actual observed performance on independent data. This study highlights the difficulty of implementing an all-purpose system for automatic adverse event recognition in Twitter, which could explain the lack of such systems in practical pharmacovigilance settings. Our recommendation is to use independent benchmark datasets, such as the WEB-RADR reference, to investigate the transferability of adverse event recognition systems and ultimately enforce rigorous comparisons across studies on the task.
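
The F1-scores quoted in the abstract above follow from the reported precision and recall values, since F1 is their harmonic mean. The short sketch below recomputes them; the function name is illustrative and the inputs are the rounded figures from the abstract, so small rounding differences are possible.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reported = {
        "development test set": (0.53, 0.52),
        "WEB-RADR reference": (0.38, 0.20),
        "previously published method, own dataset": (0.5, 0.92),
    }
    for name, (p, r) in reported.items():
        print(f"{name}: F1 = {f1_score(p, r):.2f}")
```

For the previously published method on WEB-RADR, the rounded inputs (0.37, 0.63) give about 0.47 rather than the reported 0.46, which is presumably an effect of rounding the published precision and recall.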


Subjects
Adverse Drug Reaction Reporting Systems/standards, Drug-Related Side Effects and Adverse Reactions/epidemiology, Social Media, Factual Databases, Drug-Related Side Effects and Adverse Reactions/classification, Humans, Logistic Models, Pharmacovigilance, Prevalence, Prospective Studies, Reproducibility of Results, Selection Bias
4.
Drug Saf ; 43(5): 479-487, 2020 05.
Article in English | MEDLINE | ID: mdl-32008183

ABSTRACT

INTRODUCTION: Uncovering safety signals through the collection and assessment of individual case reports remains a core pharmacovigilance activity. Despite the widespread use of disproportionality analysis in signal detection, recommendations are lacking on the minimum size of databases or subsets of databases required to yield robust results. OBJECTIVE: This study aims to investigate the relationship between database size and robustness of disproportionality analysis, with regard to limiting spurious associations. METHODS: Three types of subsets were created from the global database VigiBase: random subsets (500 replicates each of 11 fixed subset sizes between 250 and 100,000 reports), country-specific subsets (all 131 countries available in the original VigiBase extract) and subsets based on the Anatomical Therapeutic Chemical classification. For each subset, a spuriousness rate was computed as the ratio between the number of drug-event combinations highlighted by disproportionality analysis in a permuted version of the subset and the corresponding number in the original subset. In the permuted data, all true reporting associations between drugs and adverse events were broken. Subsets with fewer than five original associations were excluded. Additionally, the disproportionately over-reported drug-event combinations in three specific countries at three different time points were clinically assessed for labelledness. These time points corresponded to database sizes of less than 10,000, 5000 and 1000 reports, respectively. All disproportionality analyses were based on the Information Component (IC), implemented as IC025 > 0. RESULTS: Spuriousness rates were below 0.15 for all 110 included countries regardless of subset size, with only seven countries (6%) exceeding the empirical threshold of 0.10 observed for large subsets. All 21 excluded countries had < 500 reports. For random subsets containing 3000-5000 or more reports, the higher end of observed spuriousness rates was close to 0.10. In the clinical assessment, the proportion of labelled or otherwise known drug-event combinations was very high (87-100%) across all countries and time points studied. CONCLUSIONS: To mitigate the risk of highlighting spurious associations with disproportionality analysis, a minimum size of 500 reports is recommended for national databases. For databases or subsets that are not country-specific, our recommendation is 5000 reports. This study does not consider sensitivity, which is expected to be poor in smaller databases.
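
The spuriousness rate defined in the abstract above (highlighted combinations in permuted data divided by highlighted combinations in the original data) can be sketched as follows. This is an illustration under simplifying assumptions, not the study's implementation: each report is reduced to a single (drug, event) pair, and the highlighting rule is a stand-in that thresholds a shrunk observed-to-expected log ratio at an arbitrary `min_ic`, whereas the study uses IC025 > 0. The function names and the `min_original` parameter are hypothetical.

```python
import math
import random
from collections import Counter
from typing import List, Optional, Set, Tuple

Report = Tuple[str, str]  # (drug, event); one pair per report is an assumption


def highlighted(reports: List[Report], min_ic: float = 1.0) -> Set[Report]:
    """Drug-event pairs whose shrunk log2 observed/expected ratio exceeds min_ic.

    Stand-in criterion for illustration only; the study uses IC025 > 0.
    """
    n = len(reports)
    drug_n = Counter(d for d, _ in reports)
    event_n = Counter(e for _, e in reports)
    hits = set()
    for (drug, event), observed in Counter(reports).items():
        expected = drug_n[drug] * event_n[event] / n
        ic = math.log2((observed + 0.5) / (expected + 0.5))
        if ic > min_ic:
            hits.add((drug, event))
    return hits


def spuriousness_rate(
    reports: List[Report], min_original: int = 5, seed: int = 0
) -> Optional[float]:
    """Highlighted pairs in permuted data divided by those in the original data.

    Returns None when the original data has fewer than min_original highlighted
    pairs, loosely mirroring the exclusion rule in the abstract.
    """
    original = len(highlighted(reports))
    if original < min_original:
        return None
    rng = random.Random(seed)
    events = [e for _, e in reports]
    rng.shuffle(events)  # breaks every true drug-event reporting association
    permuted = [(d, e) for (d, _), e in zip(reports, events)]
    return len(highlighted(permuted)) / original


if __name__ == "__main__":
    toy = [(f"drug{i}", f"event{i}") for i in range(6) for _ in range(20)]
    print(spuriousness_rate(toy))
```

Shuffling the event column keeps the marginal counts of drugs and events intact while removing their pairing, which is why anything still highlighted in the permuted data can be treated as spurious.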


Subjects
Adverse Drug Reaction Reporting Systems/standards, Statistical Data Interpretation, Factual Databases, Drug-Related Side Effects and Adverse Reactions/epidemiology, False Positive Reactions, Pharmacovigilance, Global Health, Humans
5.
Drug Saf ; 43(5): 467-478, 2020 05.
Article in English | MEDLINE | ID: mdl-31997289

ABSTRACT

INTRODUCTION AND OBJECTIVE: Social media has been suggested as a source for safety information, supplementing existing safety surveillance data sources. This article summarises the activities undertaken, and the associated challenges, to create a benchmark reference dataset that can be used to evaluate the performance of automated methods and systems for adverse event recognition. METHODS: A retrospective analysis of public English-language Twitter posts (Tweets) was performed. We sampled 57,473 Tweets out of 5,645,336 Tweets created between 1 March, 2012 and 1 March, 2015 that mentioned at least one of six medicinal products of interest (insulin glargine, levetiracetam, methylphenidate, sorafenib, terbinafine, zolpidem). Products, adverse events, indications, product-event combinations, and product-indication combinations were extracted and coded by two independent teams of safety reviewers. RESULTS: The benchmark reference dataset consisted of 1056 positive controls ("adverse event Tweets") and 56,417 negative controls ("non-adverse event Tweets"). The 1056 adverse event Tweets contained 1396 product-event combinations referring to personal adverse event experiences, comprising 292 different MedDRA® Preferred Terms. Of these combinations, 1171 (83.9%) were confined to four MedDRA® System Organ Classes. Of the adverse event Tweets, 195 (18.5%) contained indication information, comprising 25 different Preferred Terms. CONCLUSIONS: A manually curated benchmark reference dataset based on Twitter data has been created and is made available to the research community to evaluate the performance of automated methods and systems for adverse event recognition in unstructured free-text information.


Subjects
Adverse Drug Reaction Reporting Systems/standards, Benchmarking, Drug-Related Side Effects and Adverse Reactions/epidemiology, Social Media, Factual Databases, Humans, Pharmacovigilance, United States/epidemiology
6.
BMC Bioinformatics ; 16: 242, 2015 Jul 31.
Article in English | MEDLINE | ID: mdl-26227424

ABSTRACT

BACKGROUND: In ecology and forensics, some population assignment techniques use molecular markers to assign individuals to known groups. However, assigning individuals to known populations can be difficult if the level of genetic differentiation among populations is small. Most assignment studies handle independent markers, often by pruning markers in linkage disequilibrium (LD), ignoring the information contained in the correlation among markers due to LD. RESULTS: To improve the accuracy of population assignment, we present an algorithm, implemented in the HaploPOP software, that combines markers into haplotypes, without requiring independence. The algorithm is based on the Gain of Informativeness for Assignment, which provides a measure to decide whether or not a pair of markers should be combined into haplotypes in order to improve assignment. Because complete exploration of all possible solutions for constructing haplotypes is computationally prohibitive, our approach uses a greedy algorithm based on windows of fixed sizes. We evaluate the performance of HaploPOP in assigning individuals to populations using a split-validation approach. We investigate both simulated SNP data and dense genotype data from individuals from Spain and Portugal. CONCLUSIONS: Our results show that constructing haplotypes with HaploPOP can substantially reduce assignment error. The HaploPOP software is freely available as a command-line tool at www.ieg.uu.se/Jakobsson/software/HaploPOP/.
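
The idea of a greedy, window-based search can be illustrated with a small sketch. This is not HaploPOP's actual procedure, which the abstract does not spell out; it is one plausible reading in which markers are split into fixed-size windows and, within each window, the adjacent pair of blocks with the largest positive gain is merged until no merge has a positive gain. The `gain` callback stands in for the Gain of Informativeness for Assignment, and all names are hypothetical.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, ...]  # marker indices merged into one haplotype block


def greedy_blocks(
    n_markers: int, window: int, gain: Callable[[Block, Block], float]
) -> List[Block]:
    """Greedily merge adjacent blocks inside fixed-size windows.

    Within a window, repeatedly merge the adjacent pair with the largest
    strictly positive gain; stop when no merge is beneficial.
    """
    blocks: List[Block] = []
    for start in range(0, n_markers, window):
        current: List[Block] = [
            (i,) for i in range(start, min(start + window, n_markers))
        ]
        while True:
            best_gain, best_k = 0.0, None
            for k in range(len(current) - 1):
                g = gain(current[k], current[k + 1])
                if g > best_gain:
                    best_gain, best_k = g, k
            if best_k is None:
                break
            # Replace the two adjacent blocks with their concatenation.
            current[best_k:best_k + 2] = [current[best_k] + current[best_k + 1]]
        blocks.extend(current)
    return blocks


if __name__ == "__main__":
    # Toy gain: merging pays off only across these adjacent marker pairs.
    linked = {(0, 1), (1, 2)}

    def toy_gain(a: Block, b: Block) -> float:
        return 1.0 if (a[-1], b[0]) in linked else -1.0

    print(greedy_blocks(5, 3, toy_gain))
```

Restricting merges to a window bounds the number of gain evaluations, which is the usual motivation for a greedy search when exhaustive exploration is too expensive.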


Subjects
Genomics, Software, Algorithms, Population Genetics, Genotype, Haplotypes, Humans, Internet, Linkage Disequilibrium, Single Nucleotide Polymorphism, Principal Component Analysis
7.
Mol Biol Evol ; 32(6): 1544-55, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25739736

ABSTRACT

Adaptation drives genomic changes; however, evidence of specific adaptations in humans remains limited. We found that inhabitants of the northern Argentinean Andes, an arid region where elevated arsenic concentrations in available drinking water are common, have unique arsenic metabolism, with efficient methylation and excretion of the major metabolite dimethylated arsenic and lower excretion of the highly toxic monomethylated metabolite. We genotyped women from this population for 4,301,332 single nucleotide polymorphisms (SNPs) and found a strong association between the AS3MT (arsenic [+3 oxidation state] methyltransferase) gene and mono- and dimethylated arsenic in urine, suggesting that AS3MT functions as the major gene for arsenic metabolism in humans. We found strong genetic differentiation around AS3MT in the Argentinean Andes population, compared with a highly related Peruvian population (FST = 0.014) from a region with much less environmental arsenic. Also, 13 of the 100 SNPs with the highest genome-wide Locus-Specific Branch Length occurred near AS3MT. In addition, our examination of extended haplotype homozygosity indicated a selective sweep in the Argentinean Andes population, in contrast to Peruvian and Colombian populations. Our data show that adaptation to tolerate the environmental stressor arsenic has likely driven an increase in the frequencies of protective variants of AS3MT, providing the first evidence of human adaptation to a toxic chemical.


Subjects
Physiological Adaptation/genetics, Arsenic/analysis, Methyltransferases/genetics, Adolescent, Adult, Aged, Alleles, Argentina, Arsenic/toxicity, Arsenic/urine, Genetic Databases, Female, Population Genetics, Genome-Wide Association Study, Genotype, Haplotypes, Humans, Linear Models, Methyltransferases/metabolism, Middle Aged, Peru, Phenotype, Single Nucleotide Polymorphism, Genetic Selection, Young Adult
8.
Mol Ecol ; 24(2): 328-45, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25482153

ABSTRACT

Approximate Bayesian computation (ABC) is a powerful tool for model-based inference of demographic histories from large genetic data sets. For most organisms, its implementation has been hampered by the lack of sufficient genetic data. Genotyping-by-sequencing (GBS) provides cheap genome-scale data to fill this gap, but its potential has not fully been exploited. Here, we explored power, precision and biases of a coalescent-based ABC approach where GBS data were modelled with either a population mutation parameter (θ) or a fixed site (FS) approach, allowing single or several segregating sites per locus. With simulated data ranging from 500 to 50 000 loci, a variety of demographic models could be reliably inferred across a range of timescales and migration scenarios. Posterior estimates were informative with 1000 loci for migration and split time in simple population divergence models. In more complex models, posterior distributions were wide and almost reverted to the uninformative prior even with 50 000 loci. ABC parameter estimates, however, were generally more accurate than an alternative composite-likelihood method. Bottleneck scenarios proved particularly difficult, and only recent bottlenecks without recovery could be reliably detected and dated. Notably, minor-allele-frequency filters - usual practice for GBS data - negatively affected nearly all estimates. With this in mind, we used a combination of FS and θ approaches on empirical GBS data generated from the Atlantic walrus (Odobenus rosmarus rosmarus), collectively providing support for a population split before the last glacial maximum followed by asymmetrical migration and a high Arctic bottleneck. Overall, this study evaluates the potential and limitations of GBS data in an ABC-coalescence framework and proposes a best-practice approach.
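
For readers unfamiliar with ABC, the basic rejection scheme can be sketched in a few lines. This toy example is not the coalescent-based approach of the study: a binomial model stands in for the simulator, the summary statistic is a single count, and the prior, tolerance and function names are all illustrative assumptions.

```python
import random
from typing import List


def simulate(p: float, n: int, rng: random.Random) -> int:
    """Toy simulator: number of successes in n Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(n))


def abc_rejection(
    observed: int, n: int, draws: int = 20000, tolerance: int = 2, seed: int = 1
) -> List[float]:
    """Keep prior draws whose simulated summary lies within tolerance of the data."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(draws):
        p = rng.random()  # draw the parameter from a uniform prior on [0, 1)
        if abs(simulate(p, n, rng) - observed) <= tolerance:
            accepted.append(p)
    return accepted


if __name__ == "__main__":
    posterior = abc_rejection(observed=30, n=100)
    mean = sum(posterior) / len(posterior)
    print(f"accepted {len(posterior)} draws, posterior mean ~ {mean:.3f}")
```

The accepted draws approximate the posterior distribution. When the data carry little information about a parameter, the accepted values stay spread out like the prior, which is the behaviour the abstract reports for the more complex demographic models.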


Subjects
Bayes Theorem, Population Genetics, Genetic Models, Walruses/genetics, Animals
9.
Science ; 338(6105): 374-9, 2012 Oct 19.
Article in English | MEDLINE | ID: mdl-22997136

ABSTRACT

The history of click-speaking Khoe-San, and African populations in general, remains poorly understood. We genotyped ~2.3 million single-nucleotide polymorphisms in 220 southern Africans and found that the Khoe-San diverged from other populations ≥100,000 years ago, but population structure within the Khoe-San dated back to about 35,000 years ago. Genetic variation in various sub-Saharan populations did not localize the origin of modern humans to a single geographic region within Africa; instead, it indicated a history of admixture and stratification. We found evidence of adaptation targeting muscle function and immune response; potential adaptive introgression of protection from ultraviolet light; and selection predating modern human diversification, involving skeletal and neurological development. These new findings illustrate the importance of African genomic diversity in understanding human evolutionary history.


Subjects
Biological Adaptation/genetics, Biological Evolution, Black People/genetics, Human Genome/genetics, Population/genetics, Animals, Botswana, Human Chromosomes Pair 10/genetics, Human Chromosomes Pair 6/genetics, Genomics, Haplotypes, Homozygote, Humans, Skeletal Muscle/physiology, Pan troglodytes, Single Nucleotide Polymorphism
10.
Genetics ; 190(1): 159-74, 2012 Jan.
Article in English | MEDLINE | ID: mdl-21868606

ABSTRACT

High-throughput genotyping and sequencing technologies can generate dense sets of genetic markers for large numbers of individuals. For most species, these data will contain many markers in linkage disequilibrium (LD). To utilize such data for population structure inference, we investigate the use of haplotypes constructed by combining the alleles at single-nucleotide polymorphisms (SNPs). We introduce a statistic derived from information theory, the gain of informativeness for assignment (GIA), which quantifies the additional information for assigning individuals to populations using haplotype data compared to using individual loci separately. Using a two-loci-two-allele model, we demonstrate that combining markers in linkage equilibrium into haplotypes always leads to nonpositive GIA, suggesting that combining the two markers is not advantageous for ancestry inference. However, for loci in LD, GIA is often positive, suggesting that assignment can be improved by combining markers into haplotypes. Using GIA as a criterion for combining markers into haplotypes, we demonstrate for simulated data a significant improvement in assigning individuals to candidate populations. For the many cases that we investigate, incorrect assignment was reduced by between 26% and 97% using haplotype data. For empirical data from French and German individuals, the number of incorrectly assigned individuals can, for example, be decreased by 73% using haplotypes. Our results can be useful for challenging population structure and assignment problems, in particular for studies where large-scale population-genomic data are available.
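
A minimal sketch of the GIA idea for the two-loci-two-allele case follows. It assumes, as a reading of the abstract rather than a quotation of the paper's formula, that GIA is the informativeness for assignment of the haplotype marker minus the sum of the informativeness values of the two loci taken separately, with informativeness computed as the entropy of the population-averaged allele frequencies minus the average within-population entropy. Function names and the toy frequencies are hypothetical.

```python
import math
from typing import List, Sequence


def informativeness(freqs: Sequence[Sequence[float]]) -> float:
    """Informativeness for assignment of one marker.

    freqs[i][j] is the frequency of allele (or haplotype) j in population i.
    """
    k = len(freqs)
    total = 0.0
    for j in range(len(freqs[0])):
        mean = sum(pop[j] for pop in freqs) / k
        if mean > 0:
            total -= mean * math.log(mean)
        total += sum(pop[j] * math.log(pop[j]) for pop in freqs if pop[j] > 0) / k
    return total


def gia(hap_freqs: Sequence[Sequence[float]]) -> float:
    """Gain from treating two biallelic loci as one haplotype marker.

    hap_freqs[i] = [f_AB, f_Ab, f_aB, f_ab] in population i.
    """
    locus1: List[List[float]] = [[h[0] + h[1], h[2] + h[3]] for h in hap_freqs]
    locus2: List[List[float]] = [[h[0] + h[2], h[1] + h[3]] for h in hap_freqs]
    return (
        informativeness(hap_freqs)
        - informativeness(locus1)
        - informativeness(locus2)
    )


if __name__ == "__main__":
    # Linkage equilibrium: haplotype frequencies are products of allele frequencies.
    equilibrium = [[0.24, 0.56, 0.06, 0.14], [0.12, 0.08, 0.48, 0.32]]
    # Strong LD: the allele frequencies are identical in both populations,
    # but the haplotypes differ completely.
    disequilibrium = [[0.5, 0.0, 0.0, 0.5], [0.0, 0.5, 0.5, 0.0]]
    print(gia(equilibrium), gia(disequilibrium))
```

In the second toy case neither locus alone distinguishes the populations, yet the haplotypes do, so the gain is positive. In the first case the gain is nonpositive, in line with the linkage-equilibrium result stated in the abstract.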


Subjects
Genetic Markers, Haplotypes, Genetic Models, Computational Biology/methods, Computer Simulation, Gene Frequency, Population Genetics, Humans, Linkage Disequilibrium, Single Nucleotide Polymorphism, White People/genetics